Big Data Quiz
Big Data Quiz: This beginner-level Big Data and Hadoop quiz contains a set of 60 questions that will help you prepare for any exam designed for beginners.
1) Big Data refers to datasets that grow so large that they are difficult to capture, store, manage, share, analyze and visualize with typical database software tools.
- TRUE
- FALSE
2) The default block size in HDFS is ____________
- 128 KB
- 64 MB
- 64 KB
- 128 MB
3) Which of the following statements is/are TRUE regarding Hadoop?
i) Performs best with a ‘modest’ number of large files
ii) Performs best with a large number of small files
- i)
- ii)
- Both i) & ii)
- none of the above
4) By default, each block is replicated _______ times
- 1
- 2
- 3
- 4
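The block-size and replication questions above come down to simple arithmetic, which a minimal Python sketch can make concrete. The helper name `hdfs_footprint` and the sample values (a 1000 MB file, 128 MB blocks, three replicas) are assumptions chosen for illustration; both settings are configurable on a real cluster, and HDFS does this accounting itself.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb, replication):
    """Return (number of blocks, raw storage in MB) for one file.

    Hypothetical helper for illustration only. The last block of a file
    occupies only as much space as the data it holds, so raw storage is
    the file size times the replication factor.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    raw_storage_mb = file_size_mb * replication
    return blocks, raw_storage_mb

# Assumed example values: a 1000 MB file, 128 MB blocks, 3 replicas.
blocks, raw = hdfs_footprint(1000, 128, 3)
print(blocks, raw)
```

Passing the block size and replication factor as parameters keeps the sketch independent of any particular Hadoop version or cluster configuration.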
5) A large block size makes data transfer more efficient.
- TRUE
- FALSE
6) Which of the following is NOT a daemon process?
- secondarynamenode
- jobtracker
- tasktracker
- mapreducer
7) A SPOF (single point of failure) can be handled by using _________
- secondarynamenode
- backupserver
- jobtracker
- passive nodes
8) Clients access the blocks directly from ________ for reads and writes
- data nodes
- name node
- secondarynamenode
- none of the above
9) Information about locations of the blocks of a file is stored at __________
- data nodes
- name node
- secondarynamenode
- none of the above
10) What makes data into Big Data?
- volume
- velocity
- variety
- all of the above
11) Which of the following statement(s) is/are TRUE?
i) Hadoop is comprised of five separate daemons.
ii) Each daemon runs in its own Java Virtual Machine (JVM).
- Only ii)
- Only i)
- Only i) & ii)
- All i), ii) & iii)
12) Hadoop is ‘Rack-aware’ and HDFS replicates data blocks on nodes on different racks
- TRUE
- FALSE
13) Which node stores the checksum?
- datanode
- secondarynamenode
- namenode
- all of the above
14) The MapReduce programming model is _____________
- Platform-dependent but not language-specific
- Neither platform- nor language-specific
- Platform-independent but language-specific
- Platform-dependent and language-specific
15) Which is optional in a MapReduce program?
- Mapper
- Reducer
- both are optional
- both are mandatory
16) TaskTrackers reside on _________ and run ________ tasks.
- datanode, map/reduce
- datanode, reducer
- datanode, mapper
- namenode, map/reduce
17) The Hadoop API uses its own types, such as LongWritable, Text and IntWritable, in place of basic Java types. They have almost the same features as the default Java classes. What are these Writable data types optimized for?
- file system storage
- network transmissions
- data retrieval
- all of the above
18) What is the default input format?
- sequencefileformat
- BinaryFileFormat
- TextInputFormat
- none of the above
19) Which is TRUE about HIVE?
- No support for update and delete
- No support for singleton inserts
- Correlated sub queries are not supported
- all of the above
20) Sqoop is a tool that
- Imports tables from an RDBMS into HDFS
- Exports files from HDFS into RDBMS tables
- Uses a JDBC interface
- all of the above
21) Which tool can be used to transfer data from Microsoft SQL Server databases to Hadoop or HIVE?
- HBASE
- PIG
- SQOOP
- Flume
22) _________ is a distributed, reliable, available service for efficiently moving large amounts of data as it is produced
- FLUME
- SQOOP
- PIG
- HIVE
23) ________ is a workflow engine that runs on a server, typically outside the cluster
- Oozie
- Zookeeper
- Chukwa
- Mahout
24) Custom OutputFormats must provide a __________ implementation
- InputWriter
- RecordWriter
- OutWriter
- WritableComparable
25) Which of the following is TRUE about a Combiner?
- It is like a ‘mini-Reducer’
- It runs locally on a single Mapper’s output
- Its output is sent to the Reducers
- all of the above
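The combiner behaviour described in the options above can be simulated in plain Python. This is only a sketch under stated assumptions: the functions `mapper`, `combiner` and `reducer`, and the two sample input splits, are hypothetical stand-ins and not the Hadoop API. It counts how many records cross from the map side to the reduce side when each mapper's output is pre-aggregated locally.

```python
from collections import Counter, defaultdict

def mapper(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]

def combiner(pairs):
    """Sum counts locally, using one mapper's output only."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return sorted(local.items())

def reducer(word, counts):
    """Sum all partial counts received for one key."""
    return word, sum(counts)

# Two assumed input splits, one per simulated mapper.
splits = ["big data big cluster", "data data node"]

shuffled = defaultdict(list)   # stands in for shuffle and sort
records_sent = 0               # records leaving the map side
for split in splits:
    combined = combiner(mapper(split))
    records_sent += len(combined)
    for word, count in combined:
        shuffled[word].append(count)

result = dict(reducer(w, c) for w, c in sorted(shuffled.items()))
print(result)
print(records_sent)
```

In this run the mappers emit seven (word, 1) pairs in total, but only five combined records reach the simulated shuffle, while the final counts are unchanged.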
26) For better load balancing and to avoid potential performance issues, which of the following should be used?
- custom Partitioner
- custom combiner
- custom reducer
- use more reducers
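How keys are assigned to reducers determines how evenly the work is spread. The Python sketch below compares a hash-style assignment with a hand-written one for a skewed key distribution. It is an illustration under assumptions, not Hadoop code: `hash_partition`, `custom_partition`, the `hot_key` value and the sample keys are all hypothetical, and `zlib.crc32` merely stands in for a stable hash function.

```python
import zlib
from collections import Counter

def hash_partition(key, num_reducers):
    """Hash-style partitioning: a stable hash of the key, modulo reducers."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def custom_partition(key, num_reducers, hot_key="popular"):
    """Give one known heavy key its own reducer; hash the rest over the others.

    `hot_key` is a hypothetical value chosen for this example.
    """
    if key == hot_key:
        return 0
    return 1 + zlib.crc32(key.encode("utf-8")) % (num_reducers - 1)

# Assumed skewed map output: one key dominates.
keys = ["popular"] * 6 + ["a", "b", "c", "d"]
num_reducers = 3

# Count how many records each reducer would receive under each scheme.
hash_load = Counter(hash_partition(k, num_reducers) for k in keys)
custom_load = Counter(custom_partition(k, num_reducers) for k in keys)
print(dict(hash_load))
print(dict(custom_load))
```

With the custom function, the heavy key never shares a reducer with other keys, so its load is isolated; every record with the same key still goes to a single reducer in both schemes.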
27) Anything written using the OutputCollector.collect method will be written to __________
- Local file system
- HDFS
- Windows file systems only
- none of the above
28) Which component of the HIVE architecture submits the individual map-reduce jobs from the DAG to the Execution Engine?
- compiler
- optimizer
- driver
- none of the above
29) Which HIVE command will load data from an HDFS file/directory to the table?
- LOAD DATA INPATH '/user/myname/AB.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
- LOAD DATA LOCAL INPATH '/user/myname/AB.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
- Both statements are correct
- none of the above
30) Which HIVE command will display tables created by user?
- show table;
- select * from tab;
- show tables;
- none of the above
31) Which HIVE file format is not splittable after compression?
- RCFILE
- SEQUENCEFILE
- TEXTFILE
- all of the above
32) What does the following HIVE command do? LOAD DATA INPATH '/user/myname/log.txt' INTO TABLE mylog;
- Load the data from local file ‘/user/myname/log.txt’ to table mylog
- Load the data from HDFS file ‘/user/myname/log.txt’ to table mylog
- Overwrite the data from local file ‘/user/myname/log.txt’ to table mylog
- none of the above
33) What does the following HIVE command do? LOAD DATA LOCAL INPATH '/examples/files/ab1.txt' OVERWRITE INTO TABLE sample;
- Load the data from local file ‘/examples/files/ab1.txt’ to table sample
- Load the data from HDFS file ‘/examples/files/ab1.txt’ to table sample
- Overwrites the data from local file ‘/examples/files/ab1.txt’ to table sample
- all of the above
34) A join operation is performed at ___________
- mapper
- reducer
- shuffle and sort
- none of the above
35) When is the earliest that the reduce() method of any reduce task in a given job is called?
- Immediately after all map tasks have completed
- As soon as a map task emits at least one record
- As soon as at least one map task has finished processing its complete input split
- none of the above
36) You have built a MapReduce job that denormalizes a very large table, resulting in an extremely large amount of output data. Which two cluster resources will your job stress the most?
- RAM, network
- Network, disk I/O
- CPU, RAM
- all of the above
37) You have 10 files in the directory /user/foo/example. Each file is 640 MB. You submit a MapReduce job with /user/foo/example as the input path.
- all files in the directory
- A single file
- A single input split
- none of the above
38) ___________ ensures that no (key, value) pair is processed more than once
- InputSplit
- RecordReader
- mapper
- reducer
39) ___________ reads the record and passes it to the mapper
- RecordReader
- reducer
- InputSplit
- none of the above
40) Which of the following is the correct sequence of operations for a MR job?
- RecordReader, shuffle and sort, mapper, reducer, InputSplit
- InputSplit, RecordReader, mapper, shuffle and sort, reducer
- InputSplit, RecordReader, reducer, shuffle and sort, mapper
- none of the above
41) Which daemon distributes the individual tasks to the data nodes?
- tasktracker
- jobtracker
- namenode
- datanode
42) The __________ allows the mapper to interact with the rest of the Hadoop system
- Context object
- InputSplit
- RecordReader
- Shuffle and Sort
43) How many instances of JobTracker can run on a Hadoop Cluster?
- only one
- maximum two
- any number but should not be more than number of datanodes
- none of the above
44) How many instances of TaskTracker run on a Hadoop cluster?
- unlimited TaskTracker on each datanode
- one TaskTracker for each datanode
- maximum two TaskTrackers for each datanode
- none of the above
45) Which PIG LATIN statement is used for per-record transformation of data (projection)?
- JOIN
- FOREACH – GENERATE
- FLATTEN
- FILTER
46) Which PIG statement is used to remove nesting?
- JOIN
- FOREACH – GENERATE
- FILTER
- none of the above
47) Consider a relation that has a tuple of the form (a,(b,c)). What is the output if we apply the statement GENERATE $0, FLATTEN($1)?
- (a,b,c)
- (a,b) and (a,c)
- invalid operation
- none of the above
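The effect of un-nesting a field can be modelled in a few lines of Python. This is a rough sketch, not Pig itself: the helper `flatten_field` is hypothetical, tuples are modelled as Python tuples, and bags are modelled as Python lists of tuples. It shows the two cases that are easy to confuse, a nested tuple and a nested bag.

```python
def flatten_field(record, index):
    """Mimic un-nesting one field of a record.

    Tuples are modelled as Python tuples and bags as lists of tuples.
    Flattening a tuple splices its fields into the record; flattening a
    bag yields one output record per tuple in the bag.
    """
    head, field, tail = record[:index], record[index], record[index + 1:]
    if isinstance(field, tuple):
        return [head + field + tail]
    if isinstance(field, list):
        return [head + item + tail for item in field]
    return [record]  # scalar field: nothing to un-nest

# A record whose second field is a nested tuple.
tuple_case = flatten_field(("a", ("b", "c")), 1)
# A record whose second field is a bag of two single-field tuples.
bag_case = flatten_field(("a", [("b",), ("c",)]), 1)
print(tuple_case)
print(bag_case)
```

The tuple case produces a single wider record, while the bag case produces one record per element of the bag.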
48) Which command invokes Grunt using the local file system?
- pig
- pig -x local
- pig local
- all of the above
49) ______is currently a better choice for low-latency access.
- HBase
- HIVE
- PIG
- all of the above
50) The port number for viewing namenode and dfshealth information in the browser is ________
- 50070
- 50060
- 50030
- none of the above
51) To view the jobtracker in the browser, use ________
- http://localhost:50070/
- http://localhost:50060/
- http://localhost:50030/
- none of the above
52) To view the tasktracker in the browser, use ________
- http://localhost:50070/
- http://localhost:50060/
- http://localhost:50030/
- none of the above
53) Which daemon processes must run on the namenode?
- tasktracker and jobtracker
- namenode and jobtracker
- namenode and secondarynamenode
- none of the above
54) Which daemon processes must run on a datanode?
- tasktracker and datanode
- namenode and jobtracker
- datanode and secondarynamenode
- tasktracker and jobtracker
55) Which daemon process must run on the secondarynamenode?
- tasktracker
- namenode
- secondarynamenode
- datanode
56) Hadoop was named after the toy elephant of Doug Cutting’s son.
- TRUE
- FALSE
57) Which of the following accurately describe Hadoop?
- distributed computing approach
- open source
- java based
- all of the above
58) We can update and delete rows of a table in HIVE.
- TRUE
- FALSE
59) HIVE is NOT designed for
- OLTP
- low latency applications
- user facing/interactive applications
- all of the above
60) In HIVE, which of these will create a directory in HDFS?
- table
- partition
- bucket
- all of the above