Practice questions for the EMC E20-007 Data Science Associate exam.

Q46. A data scientist is given an R data frame, “empdata”, with the columns Age, Salary, Occupation, Education, and Gender. The data scientist would like to examine only the Salary and Occupation columns for ages greater than 40. Which command extracts the appropriate rows and columns from the data frame? 

A. empdata[empdata$Age > 40,c("Salary","Occupation")] 

B. empdata[c("Salary","Occupation"),empdata$Age > 40] 

C. empdata[Age > 40,("Salary","Occupation")] 

D. empdata[,c("Salary","Occupation")]$Age > 40 

Answer:
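In R, `df[rows, cols]` selects rows with the expression before the comma and columns with the one after it, so a logical row condition comes first and the vector of column names second. The stdlib-only Python sketch below mirrors that filter-then-project step on a list of dicts. The sample records and the `select` helper are hypothetical illustrations, not part of the question.

```python
# Hypothetical stand-in for the "empdata" data frame: one dict per row.
empdata = [
    {"Age": 35, "Salary": 52000, "Occupation": "Analyst", "Education": "BS", "Gender": "F"},
    {"Age": 45, "Salary": 81000, "Occupation": "Manager", "Education": "MS", "Gender": "M"},
    {"Age": 52, "Salary": 67000, "Occupation": "Engineer", "Education": "BS", "Gender": "F"},
]

def select(rows, row_condition, columns):
    """Keep rows that satisfy row_condition, then keep only the named columns,
    analogous to R's df[condition, c(...)]."""
    return [{col: row[col] for col in columns} for row in rows if row_condition(row)]

over_40 = select(empdata, lambda row: row["Age"] > 40, ["Salary", "Occupation"])
print(over_40)
```

In pandas the same selection would be written `empdata.loc[empdata["Age"] > 40, ["Salary", "Occupation"]]`.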


Q47. Consider a database with 4 transactions: 

Transaction 1: {cheese, bread, milk} 

Transaction 2: {soda, bread, milk} 

Transaction 3: {cheese, bread} 

Transaction 4: {cheese, soda, juice} 

The minimum support is 25%. Which rule has a confidence equal to 50%? 

A. {bread,milk} => {cheese} 

B. {bread} => {milk} 

C. {juice} => {soda} 

D. {bread} => {cheese} 

Answer:
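Confidence of a rule X => Y is support(X ∪ Y) / support(X), where support is the fraction of transactions containing the itemset. The sketch below computes both quantities for the four candidate rules directly from the four transactions in the question.

```python
# The four transactions from the question.
transactions = [
    {"cheese", "bread", "milk"},
    {"soda", "bread", "milk"},
    {"cheese", "bread"},
    {"cheese", "soda", "juice"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs):
    """conf(lhs => rhs) = support(lhs | rhs) / support(lhs)."""
    return support(lhs | rhs) / support(lhs)

rules = {
    "A": ({"bread", "milk"}, {"cheese"}),
    "B": ({"bread"}, {"milk"}),
    "C": ({"juice"}, {"soda"}),
    "D": ({"bread"}, {"cheese"}),
}

for label, (lhs, rhs) in rules.items():
    print(label, f"support={support(lhs | rhs):.2f}", f"confidence={confidence(lhs, rhs):.2f}")
```

Every combined itemset here meets the 25% minimum support, so only the confidence values distinguish the four rules.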


Q48. You are given 10,000,000 user profile pages of an online dating site in XML files, and they are stored in HDFS. You are assigned to divide the users into groups based on the content of their profiles. You have been instructed to try K-means clustering on this data. How should you proceed? 

A. Run MapReduce to transform the data, and find relevant key-value pairs. 

B. Divide the data into sets of 1,000 user profiles, and run K-means clustering in RHadoop iteratively. 

C. Run a Naive Bayes classification as a pre-processing step in HDFS. 

D. Partition the data by XML file size, and run K-means clustering in each partition. 

Answer:
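K-means operates on numeric feature vectors, so the XML profiles first have to be transformed into (user_id, vector) records. Once they are, one K-means iteration maps naturally onto MapReduce: the map step assigns each vector to its nearest centroid, and the reduce step averages each cluster into a new centroid. The sketch below simulates a single iteration over in-memory lists; the feature vectors, starting centroids, and function names are hypothetical, and a real job would run the two steps on a Hadoop cluster and repeat until the centroids stop moving.

```python
import math
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def map_step(records, centroids):
    """Emit (cluster_index, vector) for every (user_id, vector) record."""
    for _user_id, vector in records:
        yield nearest(vector, centroids), vector

def reduce_step(pairs):
    """Average the vectors emitted for each cluster index to get new centroids."""
    groups = defaultdict(list)
    for key, vector in pairs:
        groups[key].append(vector)
    return {key: tuple(sum(dim) / len(vs) for dim in zip(*vs)) for key, vs in groups.items()}

# Hypothetical 2-D feature vectors already extracted from the XML profiles.
records = [("u1", (1.0, 1.0)), ("u2", (1.5, 2.0)), ("u3", (8.0, 8.0)), ("u4", (9.0, 9.5))]
centroids = [(0.0, 0.0), (10.0, 10.0)]

new_centroids = reduce_step(map_step(records, centroids))
print(new_centroids)
```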


Q49. Which word or phrase completes the statement? Mahout is to Hadoop as MADlib is to ___. 

A. PostgreSQL 

B. R 

C. Excel 

D. SAS 

Answer:


Q50. You are using the Apriori algorithm to determine the likelihood that a person who owns a home has a good credit score. You have determined that the confidence for the rules used in the algorithm is > 75%. You calculate lift = 1.011 for the rule, "People with good credit are homeowners". What can you determine from the lift calculation? 

A. Support for the association is low 

B. Leverage of the rules is low 

C. The rule is coincidental 

D. The rule is true 

Answer:
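Lift compares how often the two sides of a rule occur together with how often they would co-occur if they were independent: lift(X => Y) = support(X ∪ Y) / (support(X) × support(Y)). A lift near 1 means the antecedent says almost nothing about the consequent. The sketch below uses hypothetical counts (1,000 customers, not data from the question), chosen so that confidence exceeds 75% while lift is exactly 1.

```python
def rule_stats(n_both, n_lhs, n_rhs, n_total):
    """Return (confidence, lift) for the rule LHS => RHS from raw counts."""
    confidence = n_both / n_lhs
    # lift = P(LHS and RHS) / (P(LHS) * P(RHS)), rearranged to keep integer arithmetic
    lift = (n_both * n_total) / (n_lhs * n_rhs)
    return confidence, lift

# Hypothetical counts: 700 customers with good credit (LHS), 800 homeowners (RHS),
# 560 who are both, out of 1,000 customers.
conf, lift = rule_stats(n_both=560, n_lhs=700, n_rhs=800, n_total=1000)
print(f"confidence={conf:.2f}, lift={lift:.3f}")
```

Here 80% of good-credit customers are homeowners, but 80% of all customers are homeowners too, so the high confidence only reflects how common home ownership is. A lift of 1.011 describes essentially the same situation.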


Q51. What is the primary bottleneck in text classification? 

A. The availability of tagged training data. 

B. The ability to parse unstructured text data. 

C. The high dimensionality of text data. 

D. The fact that text corpora are dynamic. 

Answer:


Q52. Refer to the exhibit. 

You have run a linear regression model against your data, and have plotted true outcome versus predicted outcome. The R-squared of your model is 0.75. What is your assessment of the model? 

A. The R-squared may be biased upwards by the extreme-valued outcomes. Remove them and refit to get a better idea of the model's quality over typical data. 

B. The R-squared is good. The model should perform well. 

C. The extreme-valued outliers may negatively affect the model's performance. Remove them to see if the R-squared improves over typical data. 

D. The observations seem to come from two different populations, but this model fits them both equally well. 

Answer:
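R-squared is 1 - SS_res / SS_tot. Because SS_tot grows with each outcome's squared distance from the mean, a few extreme outcomes that the model happens to predict well can push R-squared up even when the fit over typical values is poor. The exhibit's data is not available here, so the sketch below uses hypothetical numbers to show the effect.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical typical outcomes with mediocre predictions.
typical_actual = [10, 11, 12, 13, 14]
typical_pred = [12, 10, 13, 11, 14]

# Two hypothetical extreme outcomes that the model predicts exactly.
extreme_actual = [100, -80]
extreme_pred = [100, -80]

r2_typical = r_squared(typical_actual, typical_pred)
r2_all = r_squared(typical_actual + extreme_actual, typical_pred + extreme_pred)
print(f"typical only: {r2_typical:.3f}, with extremes: {r2_all:.3f}")
```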


Q53. The web analytics team uses Hadoop to process access logs. They now want to correlate this data with structured user data residing in a production single-instance JDBC database. They collaborate with the production team to import the data into Hadoop. Which tool should they use? 

A. Sqoop 

B. Pig 

C. Chukwa 

D. Scribe 

Answer:


Q54. Refer to the exhibit. 

You are given a list of predefined association rules: 

A) RENTER => BAD CREDIT

B) RENTER => GOOD CREDIT

C) HOME OWNER => BAD CREDIT

D) HOME OWNER => GOOD CREDIT

E) FREE HOUSING => BAD CREDIT

F) FREE HOUSING => GOOD CREDIT 

For your next analysis, you must limit your dataset based on rules with confidence greater than 60%. 

Which of the rules will be kept in the analysis? 

A. Rules B and D 

B. Rules A and F 

C. Rules C and E 

D. Rules D and E 

Answer:
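The confidence of a rule such as RENTER => GOOD CREDIT is the share of renters who have good credit. The exhibit's counts are not reproduced here, so the sketch below uses made-up counts purely to show the filtering step; the rules it keeps are not the answer to the question. Because each housing group splits into only good or bad credit, the two confidences for one antecedent sum to 100%, so at most one rule per antecedent can exceed 60%.

```python
# Made-up counts per housing group (NOT the exhibit's values).
counts = {
    "RENTER": {"GOOD CREDIT": 30, "BAD CREDIT": 70},
    "HOME OWNER": {"GOOD CREDIT": 80, "BAD CREDIT": 20},
    "FREE HOUSING": {"GOOD CREDIT": 50, "BAD CREDIT": 50},
}

# The six rules from the question, labelled A to F as in its rule list.
rules = {
    "A": ("RENTER", "BAD CREDIT"),
    "B": ("RENTER", "GOOD CREDIT"),
    "C": ("HOME OWNER", "BAD CREDIT"),
    "D": ("HOME OWNER", "GOOD CREDIT"),
    "E": ("FREE HOUSING", "BAD CREDIT"),
    "F": ("FREE HOUSING", "GOOD CREDIT"),
}

def confidence(antecedent, consequent):
    """Share of records with the antecedent that also have the consequent."""
    group = counts[antecedent]
    return group[consequent] / sum(group.values())

kept = [label for label, (lhs, rhs) in rules.items() if confidence(lhs, rhs) > 0.60]
print(kept)
```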


Q55. In data visualization, what is used to focus the audience on a key part of a chart? 

A. Emphasis colors 

B. Detailed text 

C. Pastel colors 

D. A data table 

Answer:


Q56. What would be considered "Big Data"? 

A. An OLAP Cube containing customer demographic information about 100,000,000 customers 

B. Daily Log files from a web server that receives 100,000 hits per minute 

C. Aggregated statistical data stored in a relational database table 

D. Spreadsheets containing monthly sales data for a Global 100 corporation 

Answer:
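A quick volume estimate for the web-server logs in option B makes the scale concrete: 100,000 hits per minute is 144 million log entries per day. The 200-byte average entry size used below is an assumption for illustration, not a figure from the question.

```python
HITS_PER_MINUTE = 100_000
MINUTES_PER_DAY = 60 * 24
ASSUMED_BYTES_PER_LINE = 200  # hypothetical average size of one log entry

lines_per_day = HITS_PER_MINUTE * MINUTES_PER_DAY
bytes_per_day = lines_per_day * ASSUMED_BYTES_PER_LINE
print(f"{lines_per_day:,} lines/day, about {bytes_per_day / 1e9:.1f} GB/day")
```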


Q57. Which word or phrase completes the statement? Emphasis color is to standard color as ___. 

A. Main message is to context 

B. Main message is to key findings 

C. Frequent item set is to item 

D. Pie chart is to proportions 

Answer:


Q58. You are asked to create a model to predict the total number of monthly subscribers for a specific magazine. You are provided with 1 year's worth of subscription and payment data, user demographic data, and 10 years' worth of content of the magazine (articles and pictures). Which algorithm is the most appropriate for building a predictive model for subscribers? 

A. Linear regression 

B. Logistic regression 

C. Decision trees 

D. TF-IDF 

Answer:
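The target here, a monthly subscriber count, is a continuous numeric quantity. Logistic regression, by contrast, predicts class probabilities, and TF-IDF is a text-weighting scheme rather than a predictive model. The sketch below fits an ordinary least squares line on a single predictor (month number) using hypothetical counts; a real model would also draw on the demographic and payment features.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical subscriber counts for six months (not data from the question).
months = [1, 2, 3, 4, 5, 6]
subscribers = [1010, 1040, 1110, 1150, 1190, 1260]

intercept, slope = fit_line(months, subscribers)
forecast = intercept + slope * 7  # predicted subscribers for month 7
print(f"slope={slope:.1f}/month, month 7 forecast={forecast:.0f}")
```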

Q98. Which word or phrase completes the statement? Structured data is to OLAP data as quasi-structured data is to ___.

A. Clickstream data 

B. XML data 

C. Text documents 

D. Image files 

Answer:


Q59. On analyzing your time series data, you suspect that the data, represented as $y_1, y_2, y_3, \ldots, y_{n-1}, y_n$, may have a trend component that is quadratic in nature. Which pattern of data will indicate that the trend in the time series data is quadratic in nature? 

A. $(y_3 - y_2) - (y_2 - y_1) = \cdots = (y_n - y_{n-1}) - (y_{n-1} - y_{n-2})$

B. $(y_2 - y_1) = (y_3 - y_2) = \cdots = (y_n - y_{n-1})$

C. $\frac{y_2 - y_1}{y_1} \times 100\% = \cdots = \frac{y_n - y_{n-1}}{y_{n-1}} \times 100\%$

D. $(y_4 - y_2) - (y_3 - y_1) = \cdots = (y_n - y_{n-2}) - (y_{n-1} - y_{n-3})$

Answer:
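For a quadratic trend $y_t = at^2 + bt + c$, the first differences grow linearly and the second differences equal the constant $2a$. A linear trend instead has constant first differences, and an exponential trend has a constant percentage change. The short check below builds a hypothetical quadratic series and computes both difference sequences.

```python
def differences(series):
    """Successive differences y[t] - y[t-1]."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Hypothetical quadratic series y_t = 2t^2 + 3t + 1 for t = 1..6.
y = [2 * t**2 + 3 * t + 1 for t in range(1, 7)]
first = differences(y)
second = differences(first)
print(first)   # [9, 13, 17, 21, 25]
print(second)  # [4, 4, 4, 4]
```

The constant second difference of 4 equals $2a$ for $a = 2$.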


Q60. Refer to the Exhibit. 

In the Exhibit, the table shows the values for the input Boolean attributes "A", "B", and "C". It also shows the values for the output attribute "class". Which decision tree is valid for the data? 

A. Tree B 

B. Tree A 

C. Tree C 

D. Tree D 

Answer: