Hortonworks Data Platform Certified Developer Exam Practice Test

Total 108 questions
Question 1

Which HDFS command displays the contents of the file x in the user's HDFS home directory?



Answer : C


Question 2

You want to perform analysis on a large collection of images. You want to store this data in HDFS and process it with MapReduce but you also want to give your data analysts and data scientists the ability to process the data directly from HDFS with an interpreted high-level programming language like Python. Which format should you use to store this data in HDFS?



Answer : B


Question 3

When can a reduce class also serve as a combiner without affecting the output of a MapReduce program?



Answer : A

You can use your reducer code as a combiner if the operation performed is commutative and associative.
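As a hedged illustration of that rule (the class name and job wiring below are assumptions for the sketch, not part of the exam material), a reducer that sums counts is both commutative and associative, so the identical class can be registered as the combiner:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical sum reducer: integer addition is commutative and associative,
// so running it as a combiner on partial map output does not change the result.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

In the driver this would be wired with job.setCombinerClass(SumReducer.class) alongside job.setReducerClass(SumReducer.class). An operation such as averaging, by contrast, does not combine correctly over partial results, so its reducer could not be reused this way.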


Question 4

Which best describes what the map method accepts and emits?



Answer : D

public class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
extends Object

Maps input key/value pairs to a set of intermediate key/value pairs.

Maps are the individual tasks which transform input records into intermediate records. The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.


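As a minimal sketch of those generics in use (the class and field names are illustrative assumptions, not taken from the exam), a mapper that accepts a LongWritable byte offset with a Text line and emits zero or more (Text, IntWritable) pairs:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: accepts a LongWritable offset and a Text line, and emits
// one (Text, IntWritable) pair per token -- the intermediate types need not
// match the input types.
public class TokenCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);   // zero or many emits per input pair
            }
        }
    }
}

An empty input line here produces no output at all, while a long line produces many pairs, matching the "zero or many output pairs" behaviour described above.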

Question 5

Workflows expressed in Oozie can contain:



Answer : A

An Oozie workflow is a collection of actions (e.g. Hadoop MapReduce jobs, Pig jobs) arranged in a control-dependency DAG (Directed Acyclic Graph) that specifies the sequence in which the actions execute. This graph is specified in hPDL (an XML Process Definition Language).

hPDL is a fairly compact language, using a limited set of flow-control and action nodes. Control nodes define the flow of execution and include the beginning and end of a workflow (start, end, and fail nodes) as well as mechanisms to control the workflow execution path (decision, fork, and join nodes).

Note: Oozie is a Java web application that runs in a Java servlet container (Tomcat) and uses a database to store:

Workflow definitions

Currently running workflow instances, including instance states and variables

Question 6

Which process describes the lifecycle of a Mapper?



Answer : B

For each map instance that runs, the TaskTracker creates a new instance of your mapper.

Note:

* The Mapper is responsible for processing Key/Value pairs obtained from the InputFormat. The mapper may perform a number of Extraction and Transformation functions on the Key/Value pair before ultimately outputting zero, one, or many Key/Value pairs of the same or a different Key/Value type.

* With the new Hadoop API, mappers extend the org.apache.hadoop.mapreduce.Mapper class. This class defines an 'Identity' map function by default - every input Key/Value pair obtained from the InputFormat is written out.

Examining the run() method, we can see the lifecycle of the mapper:

/**
 * Expert users can override this method for more complete control over the
 * execution of the Mapper.
 * @param context
 * @throws IOException
 */
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKeyValue()) {
        map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
    cleanup(context);
}

setup(Context) - Perform any setup for the mapper. The default implementation is a no-op method.

map(Key, Value, Context) - Perform a map operation on the given Key/Value pair. The default implementation calls Context.write(Key, Value).

cleanup(Context) - Perform any cleanup for the mapper. The default implementation is a no-op method.
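To make that lifecycle concrete, a small hypothetical mapper (the class name and counter names are assumptions for illustration) that overrides all three hooks called by run():

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper illustrating the lifecycle driven by run():
// setup() once, map() once per input pair, cleanup() once at the end.
public class LifecycleMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    private long records;

    @Override
    protected void setup(Context context) {
        records = 0;                      // one-time initialization
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        records++;                        // called for every key/value pair
        context.write(value, NullWritable.get());
    }

    @Override
    protected void cleanup(Context context) {
        // one-time teardown: report how many records this map task processed
        context.getCounter("lifecycle", "records").increment(records);
    }
}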


Question 7

In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?



Answer : D

FileInputFormat is the base class for all file-based InputFormats. This provides a generic implementation of getSplits(JobContext). Subclasses of FileInputFormat can also override the isSplitable(JobContext, Path) method to ensure input-files are not split-up and are processed as a whole by Mappers.
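A minimal sketch of that technique, assuming a text-based input (the class name here is hypothetical): subclass an existing FileInputFormat such as TextInputFormat and return false from isSplitable, so each file yields exactly one split and therefore one map task.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical input format: refusing to split files means each input file
// is handed to a single map task, regardless of how many HDFS blocks it spans.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;
    }
}

The job would then select it with job.setInputFormatClass(WholeFileTextInputFormat.class).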

