Review the following data and Pig code:
What command to define B would produce the output (M,62,95102) when invoking the DUMP operator on B?
Answer : A
Which TWO of the following statements are true regarding Hive? Choose 2 answers
Answer : A, C
Examine the following Hive statements:
Assuming the statements above execute successfully, which one of the following statements is true?
Answer : B
A client application creates an HDFS file named foo.txt with a replication factor of 3. Identify which best describes the file access rules in HDFS if the file has a single block that is stored on data nodes A, B and C?
Answer : D
HDFS keeps three copies of a block on three different datanodes to protect against true data corruption. HDFS also tries to distribute these three replicas across more than one rack to protect against data availability issues. The fact that HDFS actively monitors for failed datanodes and, upon detecting a failure, immediately schedules re-replication of blocks (if needed) implies that three copies of data on three different nodes are sufficient to avoid corrupted files.
Note:
HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time. The NameNode makes all decisions regarding replication of blocks. HDFS uses a rack-aware replica placement policy. In the default configuration there are three copies of each data block in HDFS: two copies are stored on datanodes in the same rack and the third copy on a different rack.
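The placement rule described in the note can be illustrated with a small simulation. This is only a sketch of the simplified rule as stated above (two replicas in one rack, the third in another), not HDFS's actual placement code; the function name `place_replicas`, the rack names, and the datanode names are hypothetical.

```python
import random

def place_replicas(racks, first_rack, rng):
    """Pick three datanodes: two in first_rack, one in a different rack.

    racks maps a rack name to the list of datanode names in that rack.
    This follows the simplified rule from the note, not the real HDFS policy.
    """
    # Two replicas on two distinct datanodes of the same rack.
    first, second = rng.sample(racks[first_rack], 2)
    # The third replica goes to a datanode on any other rack.
    other_rack = rng.choice([r for r in racks if r != first_rack])
    third = rng.choice(racks[other_rack])
    return [first, second, third]

# Hypothetical two-rack cluster.
racks = {"rack1": ["A", "B"], "rack2": ["C", "D"]}
replicas = place_replicas(racks, "rack1", random.Random(0))
on_rack1 = sum(node in racks["rack1"] for node in replicas)
on_rack2 = sum(node in racks["rack2"] for node in replicas)
print(replicas, on_rack1, on_rack2)
```

With this layout, losing a single datanode leaves two copies, and losing a whole rack still leaves at least one.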
You need to create a job that does frequency analysis on input data. You will do this by writing a Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into individual characters. For each one of these characters, you will emit the character as a key and an IntWritable as the value. As this will produce proportionally more intermediate data than input data, which two resources should you expect to be bottlenecks?
Answer : B
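To see why this Mapper produces more intermediate data than it reads, the following plain-Python sketch mimics it outside Hadoop. It is an illustration only: `map_line` and `intermediate_bytes` are hypothetical helpers, and the sizes of 1 byte per key and 4 bytes per integer value are rough assumptions that ignore Hadoop's real serialization overhead.

```python
from collections import Counter

def map_line(line):
    """Emit one (character, 1) pair per character of the input line."""
    return [(ch, 1) for ch in line]

def intermediate_bytes(pairs, key_bytes=1, value_bytes=4):
    """Rough size estimate; the per-key and per-value sizes are assumptions."""
    return len(pairs) * (key_bytes + value_bytes)

lines = ["hello world", "hdfs"]
pairs = [pair for line in lines for pair in map_line(line)]

input_bytes = sum(len(line.encode("utf-8")) for line in lines)
out_bytes = intermediate_bytes(pairs)

# The reduce side would sum the values per character.
counts = Counter()
for ch, one in pairs:
    counts[ch] += one

print(input_bytes, out_bytes, counts["l"])
```

Under these assumptions every input character becomes a key-value pair several times its own size, so the map output that has to be stored and moved to the reducers is larger than the input itself.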
Which one of the following statements regarding the components of YARN is FALSE?
Answer : D
Given the following Pig command:
logevents = LOAD 'input/my.log' AS (date:chararray, level:string, code:int, message:string);
Which one of the following statements is true?
Answer : B