Problem Scenario 90 : You have been given the two files below.
course.txt
id,course
1,Hadoop
2,Spark
3,HBase
fee.txt
id,fee
2,3900
3,4200
4,2900
Accomplish the following activities.
1. Select all the courses and their fees, whether a fee is listed or not.
2. Select all the available fees and their respective courses. If a course does not exist, still list the fee.
3. Select all the courses and their fees, whether a fee is listed or not; however, ignore records whose fee is null.
Answer : A
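A minimal PySpark sketch of one way to do this, assuming a pyspark shell where sc is defined and both files sit in the working HDFS directory (the paths and variable names are illustrative, not part of the problem):

course = sc.textFile("course.txt") \
    .filter(lambda line: not line.startswith("id,")) \
    .map(lambda line: line.split(",")) \
    .map(lambda rec: (int(rec[0]), rec[1]))
fee = sc.textFile("fee.txt") \
    .filter(lambda line: not line.startswith("id,")) \
    .map(lambda line: line.split(",")) \
    .map(lambda rec: (int(rec[0]), int(rec[1])))

# 1. All courses with their fee, whether or not a fee is listed (left outer join).
course.leftOuterJoin(fee).collect()
# 2. All fees with their course, whether or not the course exists (right outer join).
course.rightOuterJoin(fee).collect()
# 3. All courses with their fee, dropping records whose fee is missing (None).
course.leftOuterJoin(fee).filter(lambda kv: kv[1][1] is not None).collect()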
Problem Scenario 74 : You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of orders table : (order_id, order_date, order_customer_id, order_status)
Columns of order_items table : (order_item_id, order_item_order_id, order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price)
Please accomplish the following activities.
1. Copy the "retail_db.orders" and "retail_db.order_items" tables to HDFS in the respective directories p89_orders and p89_order_items.
2. Join these datasets on order_id in Spark using Python.
3. Now fetch selected columns from the joined data: order_id, order_date, and the amount collected on each order.
4. Calculate the total orders placed for each date, and produce the output sorted by date.
Answer : A
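A hedged sketch of one way to accomplish this on the quickstart VM. Step 1 uses Sqoop from a terminal; steps 2-4 assume a pyspark shell where sc is defined. The variable names, and using order_item_subtotal as the "amount collected", are assumptions:

# Step 1: import both tables with Sqoop (run from a terminal, not Spark):
#   sqoop import --connect jdbc:mysql://quickstart:3306/retail_db \
#     --username retail_dba --password cloudera \
#     --table orders --target-dir p89_orders
#   sqoop import --connect jdbc:mysql://quickstart:3306/retail_db \
#     --username retail_dba --password cloudera \
#     --table order_items --target-dir p89_order_items

# Step 2: load both directories and join on order_id.
orders = sc.textFile("p89_orders") \
    .map(lambda line: line.split(",")) \
    .map(lambda r: (int(r[0]), r[1]))          # (order_id, order_date)
items = sc.textFile("p89_order_items") \
    .map(lambda line: line.split(",")) \
    .map(lambda r: (int(r[1]), float(r[4])))   # (order_item_order_id, order_item_subtotal)
joined = orders.join(items)                    # (order_id, (order_date, subtotal))

# Step 3: order_id, order_date, and the amount collected on each order.
per_order = joined.map(lambda kv: ((kv[0], kv[1][0]), kv[1][1])) \
    .reduceByKey(lambda a, b: a + b)           # ((order_id, order_date), total_amount)

# Step 4: total orders placed per date, sorted by date.
per_date = orders.map(lambda kv: (kv[1], 1)) \
    .reduceByKey(lambda a, b: a + b) \
    .sortByKey()
per_date.collect()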
Problem Scenario 73 : You have been given data in JSON format as below.
{"first_name":"Ankit", "last_name":"Jain"}
{"first_name":"Amir", "last_name":"Khan"}
{"first_name":"Rajesh", "last_name":"Khanna"}
{"first_name":"Priynka", "last_name":"Chopra"}
{"first_name":"Kareena", "last_name":"Kapoor"}
{"first_name":"Lokesh", "last_name":"Yadav"}
Do the following activities.
1. Create the employee.json file locally.
2. Load this file into HDFS.
3. Register this data as a temp table in Spark using Python.
4. Write a select query and print this data.
5. Now save this selected data back in JSON format.
Answer : A
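A sketch of one way to do this, assuming the Spark 1.x pyspark shell on the quickstart VM where sc and sqlContext are predefined; the relative HDFS path and the output directory name are assumptions:

# Steps 1-2, from a terminal: create employee.json locally, then
#   hadoop fs -put employee.json employee.json
emp = sqlContext.read.json("employee.json")        # load the JSON from HDFS
emp.registerTempTable("employee")                  # step 3: register a temp table
result = sqlContext.sql("SELECT first_name, last_name FROM employee")
for row in result.collect():                       # step 4: run the query and print
    print(row)
result.write.json("employee_out")                  # step 5: save back as JSON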
Problem Scenario 70 : Write a Spark application using Python that reads a file "Content.txt" (on HDFS) with the following content. Do the word count and save the results in a directory called "problem85" (on HDFS).
Content.txt
Apache Spark Training
This is Spark Learning Session
Spark is faster than MapReduce
Answer : B
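A minimal word-count sketch in a pyspark shell (sc assumed); the input path and output directory come from the problem statement, while splitting on single spaces is an assumption:

counts = sc.textFile("Content.txt") \
    .flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile("problem85")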
Problem Scenario 69 : Write a Spark application using Python
that reads a file "Content.txt" (on HDFS) with the following content,
filters out words shorter than 2 characters, and ignores all empty lines.
Once done, store the filtered data in a directory called "problem84" (on HDFS).
Content.txt
Apache Spark Training
This is Spark Learning Session
Spark is faster than MapReduce
Answer : A
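A sketch of one reading of the requirement (keep words of at least 2 characters, drop empty lines), again assuming a pyspark shell where sc is defined:

words = sc.textFile("Content.txt") \
    .filter(lambda line: len(line.strip()) > 0) \
    .flatMap(lambda line: line.split(" ")) \
    .filter(lambda word: len(word) >= 2)
words.saveAsTextFile("problem84")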
Problem Scenario 52 : You have been given the code snippet below.
val b = sc.parallelize(List(1,2,3,4,5,6,7,8,2,4,2,1,1,1,1,1))
Operation_xyz
Write a correct code snippet for Operation_xyz which will produce the output below.
scala.collection.Map[Int,Long] = Map(5 -> 1, 8 -> 1, 3 -> 1, 6 -> 1, 1 -> 6, 2 -> 3, 4 -> 2, 7 -> 1)
Answer : A
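One operation that produces exactly this result is countByValue, which returns a Map from each distinct element of the RDD to the number of times it occurs (as a Long):

b.countByValue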
Problem Scenario 51 : You have been given the code snippet below.
val a = sc.parallelize(List(1, 2,1, 3), 1)
val b = a.map((_, "b"))
val c = a.map((_, "c"))
Operation_xyz
Write a correct code snippet for Operation_xyz which will produce the output below.
Output:
Array[(Int, (Iterable[String], Iterable[String]))] = Array(
(2,(ArrayBuffer(b),ArrayBuffer(c))),
(3,(ArrayBuffer(b),ArrayBuffer(c))),
(1,(ArrayBuffer(b, b),ArrayBuffer(c, c)))
)
Answer : B
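One operation that produces this output is cogroup, which groups the values of both pair RDDs by key into a pair of Iterables, one per RDD:

val d = b.cogroup(c)
d.collect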