Big Data Quiz
This intermediate-level Hadoop quiz contains a set of 67 Big Data questions that will help you prepare for any exam designed for the intermediate level.
1) What is the command for checking disk usage in Hadoop? (example below)
- hadoop fs -disk -space
- hadoop fs -diskusage
- hadoop fs -du
- None of the above
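For reference, a minimal usage sketch of the du command; the path is hypothetical:

```
# Show disk usage per file/directory under a hypothetical path
hadoop fs -du /user/data

# -s summarizes totals, -h prints human-readable sizes
hadoop fs -du -s -h /user/data
```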
2) How do you set the replication factor of a file? (example below)
- hadoop fs -setrep -w 3 -R path
- hadoop fs -repset -w 3 -R path
- hadoop fs -setrep -e 3 -R path
- hadoop fs -repset -e 3 -R path
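A minimal sketch with a hypothetical path; -w waits until the new replication level is reached, and -R is accepted only for backwards compatibility (replication is a per-file property):

```
# Set the replication factor to 3 and wait for the change to complete
hadoop fs -setrep -w 3 /user/data/file.txt
```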
3) How do you enable an automatic map-side join in Hive? (example below)
- SET hive.exec.auto.map=true;
- SET hive.auto.convert.join=true;
- SET hive.mapred.auto.map.join=true;
- SET hive.map.auto.convert=true;
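A minimal session sketch, assuming hypothetical tables big_table and small_table; with the property enabled, Hive converts a common join to a map-side join when one side is small enough to broadcast:

```
hive -e "
SET hive.auto.convert.join=true;
-- the small side is loaded into memory and shipped to the mappers
SELECT a.id, b.name
FROM big_table a
JOIN small_table b ON a.id = b.id;
"
```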
4) If a database has tables with data and you want to delete it, which one is the correct command? (example below)
- DROP DATABASE database_name NONRESTRICT
- DROP DATABASE database_name CASCADE
- DROP SCHEMA database_name NONCASCADE
- DROP DATABASE database_name
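A minimal sketch with a hypothetical database name; CASCADE drops the contained tables as well, which a plain DROP DATABASE refuses to do:

```
# Drops the database and all tables inside it
hive -e "DROP DATABASE IF EXISTS demo_db CASCADE;"
```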
5) What is the default SerDe used in Hive?
- Lazy SerDe
- Default SerDe
- Binary SerDe
- None of the above.
6) Create table (id int, dt string, ip int) // line 1
partitioned by (dt string) // line 2
stored as rcfile; // line 3
- Error in line 1
- Error in line 2
- Error in line 3
- No error
7) How can you add a cache file to a job? (example below)
- DistributedCache.addCacheFile()
- DistributedCache.addCacheArchive()
- DistributedCache.setCacheFiles()
- All of the above.
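The options above are Java API calls on DistributedCache; the same effect is available from the command line through the generic -files option, assuming the job's driver uses ToolRunner. The jar, class, and paths here are hypothetical:

```
# Ships lookup.txt to every task's local working directory
hadoop jar myjob.jar com.example.MyJob \
  -files hdfs:///user/data/lookup.txt \
  /input /output
```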
8) Which one is not a master daemon?
- NameNode
- JobTracker
- TaskTracker
- None of these.
9) How can you check the available space and total space in a Hadoop system? (example below)
- hdfs dfsadmin -action
- hdfs dfsadmin -property
- hdfs dfsadmin -report
- None of these
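A minimal sketch; the report lists configured capacity, DFS used, and DFS remaining for the cluster as a whole and for each DataNode:

```
# Prints cluster-wide capacity and per-DataNode usage
hdfs dfsadmin -report
```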
10) Job history is used to support job recovery after a JobTracker restart. Which parameter do you need to set?
- mapred.jobtracker.restart.recover
- mapred.jobtracker.set.recover
- mapred.jobtracker.restart.recover.history
- None of the above
11) What is TTL in HBase? (example below)
- HBase will automatically delete rows once the expiration time is reached.
- HBase will automatically disable rows once the expiration time is reached.
- It is just the time taken to execute a job.
- None.
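A minimal HBase shell sketch with a hypothetical table and column family; cells older than the TTL are removed during major compaction:

```
hbase shell <<'EOF'
# Set a TTL of one day (86400 seconds) on column family cf
alter 'demo_table', { NAME => 'cf', TTL => 86400 }
EOF
```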
12) Does HDFS allow appends to files?
- True
- False
13) In which file can you set HBase environment variables?
- hbase-env.sh
- hbase-var.sh
- hbase-update.sh
- None.
14) Which file do you need to edit to change the rate at which HBase log files are rolled and the level at which HBase logs messages?
- log4j.properties
- zookeeper.properties
- hbase.properties
- None
15) What is the default block size in Apache HDFS?
- 64MB
- 128MB
- 512MB
- 1024MB
16) What is the default port for the JobTracker web UI?
- 50050
- 50060
- 50070
- 50030
17) HDFS works on the principle of
- Write Once, Read Many
- Write Many, Read Many
- Write Many, Read Once
- None
18) The DataNode decides where to store the data.
- True
- False
19) SSH is the communication channel between the DataNode and the NameNode.
- True
- False
20) In HDFS, reading is parallel but writing is not.
- True
- False
21) Which command checks for various inconsistencies in HDFS?
- FSCK
- FETCHDT
- SAFEMODE
- SAFEANDRECOVERY
22) Hive provides
- SQL
- HQL
- PL/SQL
- PL/HQL
23) What does HQL stand for?
- Hibernate Query Language
- Historical Query Language
- Health Query Language
- Hive Query Language
24) Hive is ____________
- A data mart on Hadoop
- A data warehouse on Hadoop
- A database on Hadoop
- None
25) HQL allows _________ programmers
- C# programmers
- Java programmers
- MapReduce programmers
- Python programmers
26) Hive data is organized into
- Databases
- Tables
- Buckets/Clusters
- All of the above
27) HQL has the statements
- DDL, DCL
- DML, TCL
- DML, DDL
- DCL, TCL
28) The DECIMAL data type has _____ digits of precision in Hive
- 4
- 8
- 16
- N/A
29) How many bytes does TINYINT take in Hive?
- 1
- 2
- 4
- 8
30) The output of regexp_replace('sairam', 'ai|am', '') is (example below)
- sai|ram
- sai
- sr
- ram
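A minimal sketch; the pattern deletes every match of ai or am from the input string:

```
# 'sairam' minus 'ai' and 'am' leaves 'sr'
hive -e "SELECT regexp_replace('sairam', 'ai|am', '');"
```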
31) If an explicit conversion fails, the cast operator returns
- Zero
- One
- FALSE
- NULL
32) Which clause can be used to filter rows from a table in HQL?
- group by
- order by
- where
- having
33) Which one of the following can we use to list the columns and all properties of a table?
- DESCRIBE EXTENDED table_name;
- DESCRIBE table_name;
- DESCRIBE PROPERTIES table_name;
- DESCRIBE EXTENDED PROPERTIES table_name;
34) Which clause can be used to restrict the query to a fraction of the buckets in the table rather than the whole table?
- SAMPLE
- TABLESAMPLE
- RESTRICTTABLE
- NONE
35) The TABLESAMPLE syntax is (example below)
- TABLESAMPLE(BUCKET x OUT OF(Y))
- TABLESAMPLE(BUCKET x OUT OF y)
- TABLESAMPLE(BUCKET x IN y)
- TABLESAMPLE(BUCKET x IN(y))
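A minimal sketch against a hypothetical table bucketed by id; this samples the first of four buckets:

```
hive -e "
SELECT * FROM bucketed_table
TABLESAMPLE(BUCKET 1 OUT OF 4 ON id) t;
"
```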
36) By default, how many dynamic partitions in total can be created by one DML statement, per the hive.exec.max.dynamic.partitions parameter? (example below)
- 10
- 100
- 1000
- N/A
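A minimal session sketch of the related dynamic-partition settings; here the default total cap is raised for a hypothetical large insert:

```
hive -e "
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- the default cap is 1000 dynamic partitions per DML statement
SET hive.exec.max.dynamic.partitions=2000;
"
```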
37) When using a Derby database for the Metastore, how many client instances can connect to Hive?
- 1
- 10
- Any
- Cannot Say
38) In Hadoop 2.0, the NameNode High Availability feature is present.
- TRUE
- FALSE
39) The NameNode is horizontally scalable due to NameNode Federation.
- TRUE
- FALSE
40) How will you identify when the last checkpoint was done in a cluster?
- Using the NameNode web UI
- Using the Secondary NameNode UI
- Using the hadoop dfsadmin -report command
- Using the hadoop fsck command
41) The hadoop fsck command is used to (example below)
- Check the integrity of HDFS
- Check the status of DataNodes in the cluster
- Check the status of the NameNode in the cluster
- Check the status of the Secondary NameNode
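A minimal sketch; fsck reports on the health of files, blocks, and replication. Checking the root path covers the whole namespace:

```
# Check the entire namespace, printing files, blocks, and their locations
hdfs fsck / -files -blocks -locations
```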
42) How can you determine the available HDFS space in your cluster?
- Using the hadoop dfsadmin -report command
- Using the hadoop fsck / command
- Using the Secondary NameNode web UI
- Using the DataNode web UI
43) An existing Hadoop cluster has 20 slave nodes with quad-core CPUs and 24TB of hard drive space each. You plan to add 5 new slave nodes. How much disk space can your new nodes contain?
- New nodes may have any amount of hard drive space
- New nodes must have at least 24TB of hard drive space
- New nodes must have exactly 24TB of hard drive space
- New nodes must not have more than 24TB of hard drive space
44) Which is a recommended configuration of disk drives for a DataNode?
- 10 1TB disk drives in a RAID configuration
- 10 2TB disk drives in a JBOD configuration
- One 3TB disk drive
- 48 2TB disk drives in a RAID configuration
45) How does the HDFS architecture provide data reliability?
- Reliance on SAN devices as a DataNode interface.
- Storing multiple replicas of data blocks on different DataNodes
- DataNodes make copies of their data blocks, and put them on different local disks.
- Reliance on RAID on each DataNode.
46) HCatalog has APIs to connect to HBase.
- TRUE
- FALSE
47) The path in which HDFS data will be stored is specified in which file?
- hdfs-site.xml
- yarn-site.xml
- mapred-site.xml
- core-site.xml
48) Accessing the web user interface for a specific daemon requires which details?
- The setting for dfs.http.address for the NameNode
- The IP address or DNS/hostname of the NameNode in the cluster
- The SSL password used to log in to the Hadoop Admin Console
- The server IP address or DNS/hostname where the daemon is running and the TCP/IP port
49) What is the default partitioning mechanism?
- Round Robin
- User needs to configure
- Hash Partitioning
- None
50) Is it possible to change the HDFS block size? (example below)
- TRUE
- FALSE
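A minimal sketch; the cluster default comes from dfs.blocksize in hdfs-site.xml, and individual writes can override it. The values and paths here are hypothetical:

```
# Write a file with a 256MB block size instead of the cluster default
hadoop fs -D dfs.blocksize=268435456 -put localfile.dat /user/data/

# Cluster-wide default, set in hdfs-site.xml:
#   <property>
#     <name>dfs.blocksize</name>
#     <value>134217728</value>
#   </property>
```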
51) The NameNode contains
- Metadata and all data blocks
- Metadata and recently used blocks
- Metadata only
- None of the above
52) What does variety mean in Big Data?
- Related data from different sources in different formats
- Unrelated data from different sources.
53) Where do you specify the HDFS file system and host location? (example below)
- hdfs-site.xml
- core-site.xml
- mapred-site.xml
- hive-site.xml
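A minimal sketch of the relevant core-site.xml entry; the hostname and port are hypothetical:

```
# fs.defaultFS in core-site.xml names the HDFS file system and its host:
#   <property>
#     <name>fs.defaultFS</name>
#     <value>hdfs://namenode-host:8020</value>
#   </property>
```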
54) Which file do you use to configure the JobTracker?
- core-site.xml
- mapred-site.xml
- hdfs-site.xml
- job-tracker.xml
55) Which file is used to define worker nodes?
- core-site.xml
- mapred-site.xml
- master-slave.xml
- None
56) The NameNode can be formatted at any time without data loss.
- TRUE
- FALSE
57) How do you list the files in an HDFS directory? (example below)
- ls
- hadoop ls
- hadoop fs -ls
- hadoop ls -fs
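A minimal sketch; the path is hypothetical:

```
# List a directory; -R lists its contents recursively
hadoop fs -ls /user/data
hadoop fs -ls -R /user/data
```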
58) Formatting the NameNode for the first time will result in
- Formatting the NameNode disk
- Cleaning the HDFS data directory
- Just creating the directory structure on the DataNode machine
- None of the above
59) What creates the empty directory structure on the NameNode?
- Configuring it in hdfs-site.xml
- Starting the NameNode daemon
- Formatting the NameNode
- None of the above
60) Hadoop's answer to the Big Data challenge is
- Job Tracker and Name Node
- Name Node and Data Node
- Data blocks, keys and value
- HDFS and MapReduce
61) HDFS achieves high availability and fault tolerance through
- Splitting files into blocks
- Keeping a copy of frequently accessed data blocks in the NameNode
- Replicating blocks on multiple DataNodes in the cluster
- None of the above
62) The NameNode keeps metadata and data files.
- TRUE
- FALSE
63) Big Data poses a challenge to traditional systems in terms of
- Network bandwidth
- Operating system
- Storage and processing
- None of the above
64) What is the function of the Secondary NameNode?
- Backup to the NameNode
- Helping the NameNode merge the fsimage and edits files
- Serving file system requests when the NameNode is busy
- None of the above
65) Hadoop data types are optimized for
- Data processing
- Encryption
- Compression
- Network transmission
66) HCatalog uses the Hive metastore for schema operations.
- TRUE
- FALSE
67) An HDFS file can be executed.
- TRUE
- FALSE