Big Data Quiz
Big Data Quiz: This expert-level Big Data and Hadoop quiz contains a set of 60 questions that will help you prepare for any exam designed for experts.
1) What is an identity mapper in Hadoop?
- When the same mapper runs on different datasets in the same job.
- When the same mapper runs on different datasets in different jobs.
- When no mapper is defined for a job, the identity mapper class is used.
- Both b and c are correct.
2) Which is the default scheduler used in the MapReduce framework?
- Capacity scheduler.
- Fair scheduler.
- Job scheduler.
- Lazy scheduler.
3) You are executing Pig in MapReduce mode and you want to execute in local mode. How can you achieve it?
- pig -x local
- pig -x mapreduce
- Both are correct
- None of these
4) The process in Hadoop 2.0 that scales the NameNode horizontally is known as
- Federation
- Fencing
- Namenode HA
- None of these.
5) What is true about the partitioner in a MapReduce job?
- The number of partitions is the same as the number of reducers.
- The default partitioner in MapReduce is the hash partitioner.
- All of these.
- None of these.
6) Which is not a role of the reporter in a MapReduce program?
- Reporting the progress of mappers and reducers.
- Setting application-level status messages.
- Updating counters.
- Helping relaunch failed jobs with the help of counters.
7) What does mapred.tip.id refer to at debugging time?
- Id of the mapper currently running.
- Id of the reducer currently running.
- Id of the task currently running.
- Id of the job currently running.
8) Which scheduler below supports multiple queues?
- Fair scheduler
- Capacity scheduler.
- Lazy scheduler.
- Default scheduler.
9) How can you set a debug script in a Hadoop MapReduce job?
- JobConf.setMapDebugScript(String)
- JobConf.setReduceDebugScript(String)
- JobConf.setJobScript(String)
- None of these.
10) What is the default data type in Pig?
- Bytearray
- Chararray
- Textarray
- None
11) Which operator in Pig is not associated with loading and storing?
- Load
- Store
- Dump
- Split
12) How can you enable the MemStore-Local Allocation Buffer, a feature that works to prevent heap fragmentation under heavy write loads?
- hbase.hregion.memstore.mslab.enabled
- hbase.hregion.memstore.enabled
- hbase.hregion.memstore.mslab.job.enabled
- none
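For reference, the MSLAB feature is toggled with that property in hbase-site.xml; a minimal sketch of the setting (surrounding configuration omitted):

```xml
<!-- hbase-site.xml: enable the MemStore-Local Allocation Buffer (MSLAB) -->
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
</property>
```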
13) Which hashing algorithm is used for the hash function?
- Murmur
- Lazy
- Default
- None.
14) Which is the policy configuration file used by RPC servers to make authorization decisions on client requests?
- hadoop.policy.file= hbase-policy.xml
- hadoop.policy.file.apache= hbase-policy.xml
- hadoop.policy.file.enable= hbase-policy.xml
- none
15) How can you specify the destination directory in Sqoop?
- --target-dir <dir>
- --destination-dir <dir>
- --hdfs-dir <dir>
- All of the above
16) How can you enable compression in Sqoop?
- -z
- --compress
- Both a and b
- None.
17) Say you are importing from PostgreSQL through Sqoop in direct mode; you can split the import into separate files after individual files reach a certain size. How can you do it?
- --direct-split-size
- --input-split-size
- --postgresql-split-size
- None
18) What is the default port to access name node web UI?
- 50060
- 50050
- 50070
- None of the above
19) How many methods does the Writable interface define?
- Two
- Three
- Six
- None of the above
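For context, Hadoop's Writable interface defines exactly two methods, write(DataOutput) and readFields(DataInput). A rough Python analogy of that two-method contract (the IntPair class and its field layout are illustrative, not part of any Hadoop API):

```python
import struct
from io import BytesIO

class IntPair:
    """Python analogy of the Writable contract: one method to serialize
    the object's fields, one to repopulate them from a stream."""
    def __init__(self, first=0, second=0):
        self.first, self.second = first, second

    def write(self, out):
        # Mirrors Writable.write(DataOutput): serialize fields in order.
        out.write(struct.pack(">ii", self.first, self.second))

    def read_fields(self, inp):
        # Mirrors Writable.readFields(DataInput): repopulate this object.
        self.first, self.second = struct.unpack(">ii", inp.read(8))

buf = BytesIO()
IntPair(3, 7).write(buf)
buf.seek(0)
p = IntPair()
p.read_fields(buf)
print(p.first, p.second)  # 3 7
```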
20) What are supported programming languages for Map Reduce?
- The most common programming language is Java, but scripting languages are also supported via Hadoop streaming.
- Any programming language that can comply with Map Reduce concept can be supported.
- Only Java supported since Hadoop was written in Java.
- Currently Map Reduce supports Java, C, C++ and COBOL.
21) What is a map-side join?
- Map-side join is a technique in which data is eliminated at the map step
- Map-side join is done in the map phase and done in memory
- Map-side join is a form of map-reduce API which joins data from different locations
- None of these answers are correct
22) What is a reduce-side join?
- Reduce-side join is a technique to eliminate data from initial data set at reduce step
- Reduce-side join is a technique for merging data from different sources based on a specific key. There are no memory restrictions
- Reduce-side join is a set of API to merge data from different sources.
- None of these answers are correct
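To make the reduce-side join idea concrete, here is a minimal Python sketch of the pattern: records from two sources are tagged, grouped by key (as the shuffle phase would do), and merged per key in the reduce step. All dataset contents below are made up for illustration:

```python
from collections import defaultdict

# Records from two "sources", tagged so the reduce step knows their origin.
users  = [("u1", "Alice"), ("u2", "Bob")]
orders = [("u1", "book"), ("u1", "pen"), ("u2", "mug")]

# Shuffle phase analogy: group every tagged record by its join key.
grouped = defaultdict(list)
for k, v in users:
    grouped[k].append(("user", v))
for k, v in orders:
    grouped[k].append(("order", v))

# Reduce phase analogy: for each key, combine user info with order info.
joined = []
for k, records in sorted(grouped.items()):
    names = [v for tag, v in records if tag == "user"]
    items = [v for tag, v in records if tag == "order"]
    for name in names:
        for item in items:
            joined.append((k, name, item))

print(joined)  # [('u1', 'Alice', 'book'), ('u1', 'Alice', 'pen'), ('u2', 'Bob', 'mug')]
```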
23) What is Avro?
- Avro is a Java serialization library
- Avro is a Java compression library
- Avro is a Java library that creates splittable files
- None of these answers are correct
24) Can you run MapReduce jobs directly on Avro data?
- Yes, Avro was specifically designed for data processing via Map-Reduce
- Yes, but additional extensive coding is required
- No, Avro was specifically designed for data storage only
- Avro specifies metadata that allows easier data access. This data cannot be used as part of map-reduce execution, rather input specification only.
25) Can a custom data type for MapReduce processing be implemented?
- No, Hadoop does not provide techniques for custom data types.
- Yes, but only for mappers.
- Yes, custom data types can be implemented as long as they implement the Writable interface.
- Yes, but only for reducers.
26) Which node acts as an access point for the external applications, tools, and users that need to utilize the Hadoop environment?
- DataNode
- NameNode
- JobTracker
- N/A
27) Which object can be used to get the progress of a particular job?
- Map
- Reducer
- Context
- Progress
28) Which node performs housekeeping functions for the NameNode?
- DataNode
- NameNode
- Secondary NameNode
- Edge Node
29) Which of the following utilities allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer?
- Oozie
- Sqoop
- Hadoop Streaming
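As background, Hadoop Streaming works by piping input records to any executable's stdin and reading tab-separated key/value pairs from its stdout. A minimal word-count mapper sketch in Python (the job itself would be launched with the streaming jar; the sample input line below is made up):

```python
import sys

def map_line(line):
    """Word-count mapper step: emit one tab-separated 'word<TAB>1' pair
    per word, the key/value format Hadoop Streaming expects on stdout."""
    return [f"{word}\t1" for word in line.split()]

def run(stream=sys.stdin):
    # In a real streaming job, Hadoop pipes the input split to stdin
    # and collects the pairs this script prints to stdout.
    for line in stream:
        for pair in map_line(line):
            print(pair)

# Simulate a tiny input split here instead of reading real stdin.
run(["big data big"])
```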
30) Which MapReduce stage serves as a barrier, where all previous stages must be completed before it may proceed?
- Combine
- Group
- Reduce
- Write
31) Which TACC resource has support for Hadoop MapReduce?
- Ranger
- Longhorn
- Lonestar
- Spur
32) What is the implementation language of the Hadoop MapReduce framework?
- Java
- C
- FORTRAN
- Python
33) Which MapReduce phase is theoretically able to utilize features of the underlying file system in order to optimize parallel execution?
- Split
- Map
- Combine
- None of the above
34) The _______ shell is used to execute Pig Latin statements.
- Execute
- Run
- Grunt
- N/A
35) The _______ operator is used to view the logical, physical, and MapReduce execution plans used to compute a relation.
- Show
- Describe
- Display
- Explain
36) Pig was developed by
- Facebook
- Yahoo
- LinkedIn
37) Pig is a
- Declarative language
- Data flow language
- Both
- N/A
38) Which of the following is not a daemon of YARN?
- Resource Manager
- Node Manager
- Application Master
- JobTracker
39) What happens when a Map task crashes while running a MapReduce job on a cluster configured with MapReduce version 1 (MRv1)?
- The framework closes the JVM instance and restarts
- The job immediately fails
- The JobTracker attempts to re-run the task on the same node
- The JobTracker attempts to re-run the task on a different node
40) Which daemon reports available slots for scheduling a Map or Reduce operation in MapReduce version 1 (MRv1)?
- TaskTracker
- JobTracker
- Secondary NameNode
- DataNode
41) How is the number of mappers determined for a MapReduce job?
- The number of mappers is calculated by the NameNode based on the number of HDFS blocks in the files.
- The developer specifies the number in the job configuration.
- The JobTracker chooses the number based on the number of available nodes.
- The number of mappers is equal to the number of InputSplits calculated by the client submitting the job.
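As a rough illustration of how split counts arise: with the default FileInputFormat, one split is produced per HDFS block, so the number of mappers for a single large file is approximately the file size divided by the block size (the sizes below are made up):

```python
import math

block_size = 128 * 1024 * 1024   # a common HDFS block size: 128 MB
file_size  = 300 * 1024 * 1024   # hypothetical 300 MB input file

# One InputSplit (and hence one mapper) per block, rounding up.
num_splits = math.ceil(file_size / block_size)
print(num_splits)  # 3
```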
42) Which daemon instantiates Java Virtual Machines in a cluster running MapReduce v1 (MRv1)?
- ResourceManager
- TaskTracker
- JobTracker
- DataNode
43) How many files does the reduce task generate?
- One file altogether
- One file per reducer
- Depends on the input file
- None
44) If the NameNode is down and a job is submitted
- It will connect to the Secondary NameNode to process the job
- It waits until the NameNode comes up
- It gets files from the local disk
- The job will fail
45) What is partitioning in MapReduce?
- Making map output into equal partitions
- When map output exceeds the limit, creating a new one
- Assigning map output keys to reducers
- None of the above
46) The Reducer class defines
- How to process one key at a time
- How to process multiple keys together
- Depends on the logic; anything can be done
- Depends on the number of keys
47) By default, the number of map tasks depends upon
- Number of machines
- Number of files
- Configurable
- Number of splits
48) How does MapReduce hide data dispersion?
- Through Map Reduce components
- By defining data as keys and values
- By clustering machines
- HDFS takes care of it
49) What is the input format for Hadoop Archive files?
- TextInputFormat
- SequenceFileInputFormat
- None of these
- There is no suitable Input Format Type
50) Which object does the map() method use to send output to the MapReduce framework?
- JobClient
- Config
- Context
- It can directly write
51) The number of partitions is equal to
- Number of Reducers
- Number of Mappers
- Number of Input Split
- Number of output directories
52) A combiner class can be created by extending
- Combiner Class
- Mapper class
- Reducer Class
- Partitioner Class
53) The distributed cache can be used to add
- A data file
- A jar file library
- Both 1 and 2
- None of the above
54) For custom partitioning, one needs to implement
- Logic to be written in the Mapper
- Logic to be written in the Reducer
- Partitioner
- Combiner
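For reference, the behavior a custom Partitioner overrides is the default HashPartitioner rule: mask the key's hash to a non-negative value, then take it modulo the number of reduce tasks. A Python sketch of that rule (Python's hash stands in for Java's hashCode, so the exact assignments differ from Hadoop's):

```python
def get_partition(key, num_reduce_tasks):
    # Equivalent in spirit to Hadoop's default
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    return (hash(key) & 0x7FFFFFFF) % num_reduce_tasks

# Each key maps to exactly one reducer index in [0, num_reduce_tasks).
parts = [get_partition(k, 4) for k in ("apple", "banana", "cherry")]
print(parts)
```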
55) The _______________ file controls debugging metrics in Hadoop.
- core-site.xml
- properties
- hadoop-env.sh
- hadoop-metrics.properties
56) What is the default input key type for TextInputFormat?
- LongWritable
- ShortWritable
- NullWritable
- Text
57) The output of the reducer is written to
- temp directory
- HDFS
- Local disk
- None of the above
58) How do you specify UNIX time in milliseconds in Flume?
- %u
- %b
- %t
- None
59) How do you specify the long month name (January, February) in Flume?
- %b
- %B
- %M
- %l
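As context for the two Flume questions above: escape sequences such as %t (Unix time in milliseconds) and %B (the locale's long month name) expand from the event's timestamp header, typically inside an HDFS sink path. A sketch of such a configuration (the agent and sink names are illustrative):

```properties
# Illustrative Flume HDFS sink path using timestamp escape sequences:
# %B expands to the long month name, %t to Unix time in milliseconds.
agent.sinks.k1.type = hdfs
agent.sinks.k1.hdfs.path = /flume/events/%B/%t
```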