Our pass rate is as high as 98.9%, and the similarity between our CCA-505 study guide and the real exam is about 90%, based on our seven years of training experience. Do you want to pass the Cloudera CCA-505 exam on your first try? Try the latest Cloudera CCA-505 practice questions and answers below.
Q1. You have a cluster running with the Fair Scheduler enabled. There are currently no jobs running on the cluster, and you submit Job A, so that only Job A is running on the cluster. A while later, you submit Job B. Now Job A and Job B are running on the cluster at the same time. How will the Fair Scheduler handle these two jobs?
A. When Job A is submitted, it consumes all the task slots.
B. When Job A is submitted, it does not consume all the task slots.
C. When Job B is submitted, Job A has to finish before Job B can be scheduled.
D. When Job B is submitted, it will be assigned tasks, while Job A continues to run with fewer tasks.
Answer: D
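For intuition, here is a minimal Python sketch (an illustrative simplification, not Hadoop's actual scheduler code) of how the Fair Scheduler's instantaneous fair share changes when a second job is submitted:

```python
# Hypothetical sketch of fair-share division: with one job running it may use
# the whole cluster; when a second job arrives, freed slots are assigned to it
# until both jobs run with roughly equal shares.

def fair_shares(total_slots, num_jobs):
    """Each running job's instantaneous fair share of the cluster."""
    if num_jobs == 0:
        return []
    share = total_slots // num_jobs
    return [share] * num_jobs

# Job A alone: it may consume the entire cluster.
print(fair_shares(100, 1))  # [100]

# Job B arrives: as Job A's tasks finish, the scheduler assigns freed slots
# to Job B, so Job A continues to run, just with fewer tasks.
print(fair_shares(100, 2))  # [50, 50]
```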
Q2. Your Hadoop cluster contains nodes in three racks. You have NOT configured the dfs.hosts property in the NameNode's configuration file. What is the result?
A. No new nodes can be added to the cluster until you specify them in the dfs.hosts file
B. Presented with a blank dfs.hosts property, the NameNode will permit DataNodes specified in mapred.hosts to join the cluster
C. Any machine running the DataNode daemon can immediately join the cluster
D. The NameNode will update the dfs.hosts property to include machines running the DataNode daemon on the next NameNode reboot or with the command dfsadmin -refreshNodes
Answer: C
Q3. A user comes to you, complaining that when she attempts to submit a Hadoop job, it fails. There is a directory in HDFS named /data/input. The JAR is named j.jar, and the driver class is named DriverClass. She runs the command:
hadoop jar j.jar DriverClass /data/input /data/output
The error message returned includes the line:
PrivilegedActionException as:training (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/data/input
What is the cause of the error?
A. The Hadoop configuration files on the client do not point to the cluster
B. The directory name is misspelled in HDFS
C. The name of the driver has been spelled incorrectly on the command line
D. The output directory already exists
E. The user is not authorized to run the job on the cluster
Answer: A
Q4. Given:
You want to clean up this list by removing jobs where the state is KILLED. What command do you enter?
A. yarn application -kill application_1374638600275_0109
B. yarn rmadmin -refreshQueues
C. yarn application -refreshJobHistory
D. yarn rmadmin -kill application_1374638600275_0109
Answer: A
Explanation: Reference: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1-latest/bk_using-apache-hadoop/content/common_mrv2_commands.html
Q5. Which YARN daemon or service negotiates map and reduce Containers from the Scheduler, tracking their status and monitoring for progress?
A. ResourceManager
B. ApplicationMaster
C. NodeManager
D. ApplicationManager
Answer: B
Explanation: Reference: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1-latest/bk_using-apache-hadoop/content/yarn_overview.html
Q6. You use the hadoop fs -put command to add a file "sales.txt" to HDFS. This file is small enough that it fits into a single block, which is replicated to three nodes in your cluster (with a replication factor of 3). One of the nodes holding this file (a single block) fails. How will the cluster handle the replication of this file in this situation?
A. The cluster will re-replicate the file the next time the system administrator reboots the NameNode daemon (as long as the file's replication doesn't fall below two)
B. This file will be immediately re-replicated and all other HDFS operations on the cluster will halt until the cluster’s replication values are restored
C. The file will remain under-replicated until the administrator brings that node back online
D. The file will be re-replicated automatically after the NameNode determines it is under replicated based on the block reports it receives from the DataNodes
Answer: D
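HDFS's self-healing behavior can be sketched as follows. This is an assumed simplification for illustration, not actual NameNode source: the NameNode counts live replicas from DataNode block reports and schedules re-replication for any block below its target.

```python
# Illustrative sketch of under-replication detection from block reports.
# block_reports maps each live DataNode to the set of block ids it holds.

def under_replicated(block_reports, target_replication=3):
    """Return blocks whose live replica count is below the target."""
    counts = {}
    for blocks in block_reports.values():
        for b in blocks:
            counts[b] = counts.get(b, 0) + 1
    return {b for b, n in counts.items() if n < target_replication}

# Node "dn3" fails and stops reporting, leaving only two live replicas of
# blk_1, so the NameNode would schedule re-replication automatically.
reports = {"dn1": {"blk_1"}, "dn2": {"blk_1"}}
print(under_replicated(reports))  # {'blk_1'}
```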
Q7. Your cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. What is the result when you execute: hadoop jar samplejar.jar MyClass on a client machine?
A. SampleJar.jar is sent to the ApplicationMaster, which allocates a container for SampleJar.jar
B. SampleJar.jar is serialized into an XML file which is submitted to the ApplicationMaster
C. SampleJar.jar is sent directly to the ResourceManager
D. SampleJar.jar is placed in a temporary directory in HDFS
Answer: D
Q8. You have installed a cluster running HDFS and MapReduce version 2 (MRv2) on YARN. You have no dfs.hosts entries in your hdfs-site.xml configuration file. You configure a new worker node by setting fs.default.name in its configuration files to point to the NameNode on your cluster, and you start the DataNode daemon on that worker node.
What do you have to do on the cluster to allow the worker node to join, and start storing HDFS blocks?
A. Nothing; the worker node will automatically join the cluster when the DataNode daemon is started.
B. Without creating a dfs.hosts file or making any entries, run the command hadoop dfsadmin -refreshHadoop on the NameNode
C. Create a dfs.hosts file on the NameNode, add the worker node's name to it, then issue the command hadoop dfsadmin -refreshNodes on the NameNode
D. Restart the NameNode
Answer: A
Q9. Identify two features/issues that YARN is designed to address:
A. Standardize on a single MapReduce API
B. Single point of failure in the NameNode
C. Reduce complexity of the MapReduce APIs
D. Resource pressures on the JobTracker
E. Ability to run frameworks other than MapReduce, such as MPI
F. HDFS latency
Answer: D,E
Q10. You decide to create a cluster which runs HDFS in High Availability mode with automatic failover, using Quorum-based Storage. What is the purpose of ZooKeeper in such a configuration?
A. It manages the Edits file, which is a log of changes to the HDFS filesystem.
B. It monitors an NFS mount point and reports if the mount point disappears
C. It both keeps track of which NameNode is Active at any given time, and manages the Edits file, which is a log of changes to the HDFS filesystem
D. It only keeps track of which NameNode is Active at any given time
E. Clients connect to ZooKeeper to determine which NameNode is Active
Answer: D
Explanation: Reference: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/PDF/CDH4-High-Availability-Guide.pdf (page 15)
Q11. You have a Hadoop cluster running HDFS, and a gateway machine external to the cluster from which clients submit jobs. What do you need to do in order to run Impala on the cluster and submit queries from the command line of the gateway machine?
A. Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster and on the gateway node
B. Install the impalad daemon on each machine in the cluster, the statestored daemon and catalogd daemon on one machine in the cluster, and the impala shell on your gateway machine
C. Install the impalad daemon and the impala shell on your gateway machine, and the statestored daemon and catalogd daemon on one of the nodes in the cluster
D. Install the impalad daemon, the statestored daemon, the catalogd daemon, and the impala shell on your gateway machine
E. Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster, and the impala shell on your gateway machine
Answer: B
Q12. Your cluster has the following characteristics:
✑ A rack-aware topology is configured and enabled
✑ Replication is set to 3
✑ Cluster block size is set to 64 MB
Which best describes the file read process when a client application connects to the cluster and requests a 50 MB file?
A. The client queries the NameNode which retrieves the block from the nearest DataNode to the client and then passes that block back to the client.
B. The client queries the NameNode for the locations of the block, and reads from a random location in the list it retrieves to reduce network I/O load by balancing which nodes it retrieves data from at any given time.
C. The client queries the NameNode for the locations of the block, and reads all three copies. The first copy to complete transfer to the client is the one the client reads as part of Hadoop’s speculative execution framework.
D. The client queries the NameNode for the locations of the block, and reads from the first location in the list it receives.
Answer: D
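As background on the read path, here is a hedged sketch (the function name and distance values are invented for illustration) of how the NameNode returns replica locations ordered by network distance to the client, and the client streams the block from the first entry:

```python
# Sketch of HDFS replica selection: the NameNode sorts replica locations by
# topology distance from the client; the client reads from the first one.

def order_replicas(replicas, distance):
    """Sort replica locations by network distance from the client."""
    return sorted(replicas, key=distance)

replicas = ["rack2/dn5", "rack1/dn2", "rack3/dn9"]
# Made-up distances: 0 = same rack as the client, larger = farther away.
dist = {"rack1/dn2": 0, "rack2/dn5": 2, "rack3/dn9": 4}

ordered = order_replicas(replicas, dist.get)
print(ordered[0])  # rack1/dn2 -- the client streams the block from here
```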
Q13. A slave node in your cluster has four 2 TB hard drives installed (4 x 2TB). The DataNode is configured to store HDFS blocks on the disks. You set the value of the dfs.datanode.du.reserved parameter to 100 GB. How does this alter HDFS block storage?
A. A maximum of 100 GB on each hard drive may be used to store HDFS blocks
B. All hard drives may be used to store HDFS blocks as long as at least 100 GB in total is available on the node
C. 100 GB on each hard drive may not be used to store HDFS blocks
D. 25 GB on each hard drive may not be used to store HDFS blocks
Answer: C
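The arithmetic is worth spelling out: dfs.datanode.du.reserved applies per volume (per configured data directory), so 100 GB is held back on each disk. A quick sketch for the node in this question (binary units assumed for simplicity):

```python
# Per-volume reserved space: dfs.datanode.du.reserved keeps 100 GB free on
# EACH of the four disks, not 100 GB in total across the node.

GB = 1024 ** 3
disks = 4
disk_size = 2 * 1024 * GB   # 2 TB per disk
reserved = 100 * GB         # dfs.datanode.du.reserved, applied per volume

usable_per_disk = disk_size - reserved
total_usable = disks * usable_per_disk

print(total_usable // GB)   # 7792 GB usable for HDFS blocks (8192 - 4*100)
```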
Q14. Which is the default scheduler in YARN?
A. Fair Scheduler
B. FIFO Scheduler
C. Capacity Scheduler
D. YARN doesn't configure a default scheduler. You must first assign an appropriate scheduler class in yarn-site.xml
Answer: C
Explanation: Reference: http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
Q15. Which process instantiates user code, and executes map and reduce tasks on a cluster running MapReduce V2 (MRv2) on YARN?
A. NodeManager
B. ApplicationMaster
C. ResourceManager
D. TaskTracker
E. JobTracker
F. DataNode
G. NameNode
Answer: A
Q16. Which YARN process runs as "container 0" of a submitted job and is responsible for resource requests?
A. ResourceManager
B. NodeManager
C. JobHistoryServer
D. ApplicationMaster
E. JobTracker
F. ApplicationManager
Answer: D
Q17. Assuming a cluster running HDFS, MapReduce version 2 (MRv2) on YARN with all settings at their default, what do you need to do when adding a new slave node to a cluster?
A. Nothing, other than ensuring that DNS (or /etc/hosts files on all machines) contains an entry for the new node.
B. Restart the NameNode and ResourceManager deamons and resubmit any running jobs
C. Increase the value of dfs.number.of.needs in hdfs-site.xml
D. Add a new entry to /etc/nodes on the NameNode host.
E. Restart the NameNode daemon.
Answer: A
Q18. You observe that the number of spilled records from map tasks far exceeds the number of map output records. Your child heap size is 1 GB and your io.sort.mb value is set to 100 MB. How would you tune your io.sort.mb value to achieve a maximum memory-to-disk I/O ratio?
A. Decrease the io.sort.mb value to 0
B. Increase the io.sort.mb to 1GB
C. For a 1 GB child heap size, an io.sort.mb of 128 MB will always maximize the memory-to-disk I/O ratio
D. Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records
Answer: D
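A rough model (an assumption for illustration, not Hadoop internals) shows why a small io.sort.mb drives up spill counts: each time the in-memory sort buffer fills past its spill threshold, the map task writes a spill file to disk, and those files must later be merged and re-read.

```python
# Approximate spill-count model for a single map task. The spill threshold
# (80% here) mirrors the default io.sort.spill.percent; exact behavior in
# Hadoop depends on record metadata overhead and is more involved.

import math

def estimated_spills(map_output_mb, io_sort_mb, spill_percent=0.8):
    """Approximate number of spill files a map task writes."""
    usable = io_sort_mb * spill_percent
    return max(1, math.ceil(map_output_mb / usable))

print(estimated_spills(400, 100))  # 5 spills with a 100 MB sort buffer
print(estimated_spills(400, 512))  # 1 spill once the buffer holds all output
```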
Q19. Which three basic configuration parameters must you set to migrate your cluster from MapReduce1 (MRv1) to MapReduce v2 (MRv2)?
A. Configure the NodeManager hostname and enable services on YARN by setting the following property in yarn-site.xml:
<name>yarn.nodemanager.hostname</name>
<value>your_nodeManager_hostname</value>
B. Configure the number of map tasks per job on YARN by setting the following property in mapred-site.xml:
<name>mapreduce.job.maps</name>
<value>2</value>
C. Configure MapReduce as a framework running on YARN by setting the following property in mapred-site.xml:
<name>mapreduce.framework.name</name>
<value>yarn</value>
D. Configure the ResourceManager hostname and enable node services on YARN by setting the following property in yarn-site.xml:
<name>yarn.resourcemanager.hostname</name>
<value>your_resourceManager_hostname</value>
E. Configure a default scheduler to run on YARN by setting the following property in mapred-site.xml:
<name>mapreduce.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
F. Configure the NodeManager to enable MapReduce services on YARN by adding the following property in yarn-site.xml:
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
Answer: C,D,F
Q20. Which YARN daemon or service monitors a Container's per-application resource usage (e.g., memory, CPU)?
A. NodeManager
B. ApplicationMaster
C. ApplicationManagerService
D. ResourceManager
Answer: A
Explanation: Reference: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/bk_using-apache-hadoop/content/ch_using-apache-hadoop-4.html (4th para)
Q21. In CDH4 and later, which file contains a serialized form of all the directory and files inodes in the filesystem, giving the NameNode a persistent checkpoint of the filesystem metadata?
A. fstime
B. VERSION
C. Fsimage_N (Where N reflects all transactions up to transaction ID N)
D. Edits_N-M (Where N-M specifies transactions between transaction ID N and transaction ID M)
Answer: C
Explanation: Reference: http://mikepluta.com/tag/namenode/
Q22. You have a 20 node Hadoop cluster, with 18 slave nodes and 2 master nodes running HDFS High Availability (HA). You want to minimize the chance of data loss in your cluster. What should you do?
A. Add another master node to increase the number of nodes running the JournalNode which increases the number of machines available to HA to create a quorum
B. Configure the cluster’s disk drives with an appropriate fault tolerant RAID level
C. Run the ResourceManager on a different master node from the NameNode in order to load-share HDFS metadata processing
D. Run a Secondary NameNode on a different master node from the NameNode in order to provide automatic recovery from a NameNode failure
E. Set an HDFS replication factor that provides data redundancy, protecting against failure
Answer: A
Q23. You are configuring a cluster running HDFS, MapReduce version 2 (MRv2) on YARN running Linux. How must you format the underlying filesystem of each DataNode?
A. They must not be formatted; HDFS will format the filesystem automatically
B. They may be formatted in any Linux filesystem
C. They must be formatted as HDFS
D. They must be formatted as either ext3 or ext4
Answer: D
Q24. During the execution of a MapReduce v2 (MRv2) job on YARN, where does the Mapper place the intermediate data of each Map task?
A. The Mapper stores the intermediate data on the node running the job's ApplicationMaster so that it is available to YARN's ShuffleService before the data is presented to the Reducer
B. The Mapper stores the intermediate data in HDFS on the node where the Map task ran, in the /usercache/[user]/appcache/application_[appid] directory for the user who ran the job
C. YARN holds the intermediate data in the NodeManager’s memory (a container) until it is transferred to the Reducers
D. The Mapper stores the intermediate data on the underlying filesystem of the local disk in the directories yarn.nodemanager.local-dirs
E. The Mapper transfers the intermediate data immediately to the Reducers as it is generated by the Map task
Answer: D
Q25. You are running a Hadoop cluster with MapReduce version 2 (MRv2) on YARN. You consistently see that MapReduce map tasks on your cluster are running slowly because of excessive garbage collection in the JVM. How do you increase the JVM heap size to 3 GB to optimize performance?
A. yarn.application.child.java.opts=-Xmx3072m
B. yarn.application.child.java.opts=-3072m
C. mapreduce.map.java.opts=-Xmx3072m
D. mapreduce.map.java.opts=-Xms3072m
Answer: C
Explanation: Reference: http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
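As a side note on sizing, a common rule of thumb (a convention, not an official formula) is to keep the -Xmx heap in mapreduce.map.java.opts comfortably below the container size set in mapreduce.map.memory.mb, leaving headroom for non-heap JVM memory:

```python
# Rule-of-thumb sketch: size the map task heap at roughly 75% of the YARN
# container allocation, so the JVM's non-heap usage does not push the
# container over its limit and get it killed by the NodeManager.

def java_opts_for_container(container_mb, heap_fraction=0.75):
    """Suggest a -Xmx value (in MB) for a given container size."""
    heap_mb = int(container_mb * heap_fraction)
    return "-Xmx%dm" % heap_mb

# For a 3 GB heap as in this question, the container itself must be larger
# than 3072 MB -- e.g. a 4096 MB container:
print(java_opts_for_container(4096))  # -Xmx3072m
```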
Q26. Your Hadoop cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. Can you configure a worker node to run a NodeManager daemon but not a DataNode daemon and still have a functioning cluster?
A. Yes. The daemon will receive data from the NameNode to run Map tasks
B. Yes. The daemon will get data from another (non-local) DataNode to run Map tasks
C. Yes. The daemon will receive Reduce tasks only
Answer: B
Q27. You are planning a Hadoop cluster and considering implementing 10 Gigabit Ethernet as the network fabric. Which workloads benefit the most from a faster network fabric?
A. When your workload generates a large amount of output data, significantly larger than the amount of intermediate data
B. When your workload generates a large amount of intermediate data, on the order of the input data itself
C. When your workload consumes a large amount of input data, relative to the entire capacity of HDFS
D. When your workload consists of processor-intensive tasks
Answer: B
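A back-of-the-envelope calculation (illustrative numbers only) shows why shuffle-heavy workloads gain the most from 10 GbE: intermediate map output may all cross the network during the shuffle, while HDFS input reads are largely node- or rack-local:

```python
# Time to move an intermediate data set across the network fabric, ignoring
# protocol overhead and parallelism -- a deliberately crude comparison.

def shuffle_seconds(intermediate_gb, network_gbps):
    """Seconds to transfer the intermediate data over the given fabric."""
    bits = intermediate_gb * 8   # gigabytes -> gigabits
    return bits / network_gbps

print(shuffle_seconds(1000, 1))   # 8000.0 s to shuffle 1 TB over 1 GbE
print(shuffle_seconds(1000, 10))  # 800.0 s over 10 GbE -- a 10x reduction
```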