NEW QUESTION 1
A company has branch offices in eight regions. Customers within each region are classified as either "High Value" or "Medium Value" and are coded using the variable name VALUE. In the last year, the total amount of purchases per customer is used as the response variable.
Suppose there is a significant interaction between REGION and VALUE. What can you conclude?

• A. More high value customers are found in some regions than others.
• B. The difference between average purchases for medium and high value customers depends on the region.
• C. Regions with higher average purchases have more high value customers.
• D. Regions with higher average purchases have more medium value customers.

NEW QUESTION 2
CORRECT TEXT
A linear model has the following characteristics:
* A dependent variable (y)
* One continuous variable (x1), including a quadratic term (x1^2)
* One categorical predictor variable (d, with 3 levels) and an interaction term (d by x1)

How many parameters, including the intercept, are associated with this model?

Explanation:
7
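
The count of 7 can be reproduced with a short tally. This is a minimal sketch in Python, assuming full-rank (reference-cell) coding, so that a categorical predictor with k levels contributes k - 1 parameters. The function name `count_parameters` and its arguments are illustrative and do not come from the question.

```python
def count_parameters(n_levels, continuous_terms):
    """Tally parameters for a model with an intercept, some continuous terms,
    one categorical predictor, and an interaction between that predictor
    and a single continuous term (x1)."""
    intercept = 1
    categorical = n_levels - 1        # assumption: reference-cell coding
    interaction = (n_levels - 1) * 1  # d by x1 only, not d by x1^2
    return intercept + continuous_terms + categorical + interaction


# x1 and x1^2 are the two continuous terms; d has 3 levels.
total = count_parameters(n_levels=3, continuous_terms=2)
print(total)  # 7
```

Under this coding the tally is 1 (intercept) + 2 (x1, x1^2) + 2 (d) + 2 (d by x1).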

NEW QUESTION 3
Refer to the exhibit. Based on the control plot, which conclusion is justified regarding the means of the response?

• A. All groups are significantly different from each other.
• B. 2XL is significantly different from all other groups.
• C. Only XL and 2XL are not significantly different from each other.
• D. No groups are significantly different from each other.

NEW QUESTION 4
Suppose training data are oversampled in the event group to make the number of events and non-events roughly equal. A logistic regression is run and the probabilities are output to a data set NEW and given the variable name PE. A decision rule considered is, "Classify data as an event if probability is greater than 0.5." Also the data set NEW contains a variable TG that indicates whether there is an event (1=Event, 0=No event).
The following SAS program was used. What does this program calculate?

• A. Depth
• B. Sensitivity
• C. Specificity
• D. Positive predictive value

NEW QUESTION 5
Refer to the confusion matrix: Calculate the sensitivity. (0 - negative outcome, 1 - positive outcome)

• A. 25/48
• B. 58/102
• C. 25/89
• D. 58/81
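
Because the confusion matrix exhibit is not reproduced here, the following is only a sketch of how these ratios are formed from the four cells. The counts are hypothetical and are not the ones from the exhibit. The function names are illustrative.

```python
from fractions import Fraction


def sensitivity(tp, fn):
    """True positives divided by all actual positives."""
    return Fraction(tp, tp + fn)


def specificity(tn, fp):
    """True negatives divided by all actual negatives."""
    return Fraction(tn, tn + fp)


def positive_predictive_value(tp, fp):
    """True positives divided by all predicted positives."""
    return Fraction(tp, tp + fp)


# Hypothetical cell counts (NOT taken from the exhibit).
tp, fn, tn, fp = 30, 20, 40, 10

print(sensitivity(tp, fn))                # 3/5
print(specificity(tn, fp))                # 4/5
print(positive_predictive_value(tp, fp))  # 3/4
```

Sensitivity uses only the row of actual positives, so its denominator is the count of actual events, not the count of predicted events.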

NEW QUESTION 6
Refer to the ROC curve: As you move along the curve, what changes?

• A. The priors in the population
• B. The true negative rate in the population
• C. The proportion of events in the training data
• D. The probability cutoff for scoring
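
To illustrate how an ROC curve is traced, here is a minimal sketch that computes one (false positive rate, true positive rate) point per probability cutoff. The scores and targets are hypothetical, and `roc_points` is an illustrative name, not a SAS or library function.

```python
def roc_points(probs, targets, cutoffs):
    """Return (cutoff, false_positive_rate, true_positive_rate) per cutoff.
    An observation is classified as an event when its probability
    is strictly greater than the cutoff."""
    positives = sum(targets)
    negatives = len(targets) - positives
    points = []
    for cutoff in cutoffs:
        tp = sum(1 for p, t in zip(probs, targets) if p > cutoff and t == 1)
        fp = sum(1 for p, t in zip(probs, targets) if p > cutoff and t == 0)
        points.append((cutoff, fp / negatives, tp / positives))
    return points


# Hypothetical predicted probabilities and actual outcomes.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
targets = [1, 1, 0, 1, 0, 0]

points = roc_points(probs, targets, cutoffs=[0.0, 0.5, 1.0])
for cutoff, fpr, tpr in points:
    print(cutoff, round(fpr, 3), round(tpr, 3))
```

The data and the fitted model stay fixed in this sketch; only the cutoff varies from one point to the next.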

NEW QUESTION 7
Screening for non-linearity in binary logistic regression can be achieved by visualizing:

• A. A scatter plot of binary response versus a predictor variable.
• B. A trend plot of empirical logit versus a predictor variable.
• C. A logistic regression plot of predicted probability values versus a predictor variable.
• D. A box plot of the odds ratio values versus a predictor variable.
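
As an illustration of what an empirical logit is, the sketch below bins a predictor into equal-count groups and computes a smoothed log-odds per bin. The 0.5 added to each count is one common smoothing choice and is an assumption here, as are the data and the name `empirical_logits`.

```python
import math


def empirical_logits(x, y, n_bins):
    """Sort by x, split into n_bins groups of near-equal size, and return
    (mean of x, empirical logit) per group. Assumes len(x) >= n_bins."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    bins = [[] for _ in range(n_bins)]
    for i, pair in enumerate(pairs):
        bins[i * n_bins // n].append(pair)

    result = []
    for group in bins:
        events = sum(t for _, t in group)
        nonevents = len(group) - events
        mean_x = sum(v for v, _ in group) / len(group)
        # 0.5 keeps the logit finite when a bin has no events or no non-events.
        logit = math.log((events + 0.5) / (nonevents + 0.5))
        result.append((mean_x, logit))
    return result


# Hypothetical predictor values and binary responses.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

logits = empirical_logits(x, y, n_bins=2)
for mean_x, logit in logits:
    print(mean_x, round(logit, 3))
```

Plotting the logit against the bin mean and looking for departures from a straight line is the screening idea; with real data more than two bins would be used.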

NEW QUESTION 8
Refer to the exhibit: Based upon the comparative ROC plot for two competing models, which is the champion model and why?

• A. Candidate 1, because the area outside the curve is greater
• B. Candidate 2, because the area under the curve is greater
• C. Candidate 1, because it is closer to the diagonal reference curve
• D. Candidate 2, because it shows less over fit than Candidate 1

NEW QUESTION 9
Refer to the lift chart: At a depth of 0.1, Lift = 3.14. What does this mean?

• A. Selecting the top 10% of the population scored by the model should result in 3.14 times more events than a random draw of 10%.
• B. Selecting the observations with a response probability of at least 10% should result in 3.14 times more events than a random draw of 10%.
• C. Selecting the top 10% of the population scored by the model should result in 3.14 times greater accuracy than a random draw of 10%.
• D. Selecting the observations with a response probability of at least 10% should result in 3.14 times greater accuracy than a random draw of 10%.
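
The definition of lift at a given depth can be sketched as follows: rank observations by predicted probability, take the top fraction, and compare its event rate with the overall event rate. The data are hypothetical and `lift_at_depth` is an illustrative name.

```python
def lift_at_depth(probs, targets, depth):
    """Event rate among the top `depth` fraction of observations ranked by
    predicted probability, divided by the overall event rate."""
    ranked = sorted(zip(probs, targets), key=lambda row: row[0], reverse=True)
    k = max(1, round(len(ranked) * depth))
    top_rate = sum(t for _, t in ranked[:k]) / k
    base_rate = sum(targets) / len(targets)
    return top_rate / base_rate


# Hypothetical scores for 10 observations, 2 of which are events.
probs = [0.95, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
targets = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

lift = lift_at_depth(probs, targets, depth=0.1)
print(lift)
```

Depth refers to a fraction of the ranked population, not to a probability threshold, and lift compares event rates rather than accuracy.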

NEW QUESTION 10
An analyst has a sufficient volume of data to perform a 3-way partition of the data into training, validation, and test sets to perform honest assessment during the model building process.
What is the purpose of the test data set?

• A. To provide an unbiased measure of assessment for the final model.
• B. To compare models and select and fine-tune the final model.
• C. To reduce total sample size to make computations more efficient.
• D. To build the predictive models.

NEW QUESTION 11
The selection criterion used in the forward selection method in the REG procedure is:

• B. SLE
• C. Mallows' Cp
• D. AIC

NEW QUESTION 12
In order to perform honest assessment on a predictive model, what is an acceptable division between training, validation, and testing data?

• A. Training: 50% Validation: 0% Testing: 50%
• B. Training: 100% Validation: 0% Testing: 0%
• C. Training: 0% Validation: 100% Testing: 0%
• D. Training: 50% Validation: 50% Testing: 0%

NEW QUESTION 13
Refer to the exhibit: The box plot was used to analyze daily sales data following three different ad campaigns. The business analyst concludes that one of the assumptions of ANOVA was violated.
Which assumption has been violated and why?

• A. Normality, because Prob > F < .0001.
• B. Normality, because the interquartile ranges are different in different ad campaigns.
• C. Constant variance, because Prob > F < .0001.
• D. Constant variance, because the interquartile ranges are different in different ad campaigns.

NEW QUESTION 14
This question will ask you to provide missing code segments.
A logistic regression model was fit on a data set where 40% of the outcomes were events (TARGET=1) and 60% were non-events (TARGET=0). The analyst knows that the population where the model will be deployed has 5% events and 95% non-events. The analyst also knows that the company's profit margin for correctly targeted events is nine times higher than the company's loss for incorrectly targeted non-events.
Given the following SAS program: What X and Y values should be added to the program to correctly score the data?

• A. X=40, Y=10
• B. X=.05, Y=10
• C. X=.05, Y=.40
• D. X=.10, Y=.05
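
The SAS program is not reproduced here, so the following Python sketch only illustrates the two pieces of arithmetic the scenario involves: correcting a probability from the sample event rate to the population event rate, and deriving a break-even cutoff from the profit-to-loss ratio. It assumes profit accrues only on correctly targeted events and loss only on incorrectly targeted non-events. The function names are illustrative.

```python
def adjust_for_priors(p_sample, sample_event_rate, population_event_rate):
    """Rescale a probability estimated on a sample with one event rate
    so that it reflects a population with a different event rate."""
    rho1, pi1 = sample_event_rate, population_event_rate
    event_part = p_sample * pi1 / rho1
    nonevent_part = (1 - p_sample) * (1 - pi1) / (1 - rho1)
    return event_part / (event_part + nonevent_part)


def break_even_cutoff(profit_to_loss_ratio):
    """Targeting pays when p * profit > (1 - p) * loss,
    that is, when p > 1 / (1 + profit / loss)."""
    return 1 / (1 + profit_to_loss_ratio)


# Sample has 40% events; deployment population has 5% events.
adjusted = adjust_for_priors(0.40, sample_event_rate=0.40,
                             population_event_rate=0.05)
cutoff = break_even_cutoff(9)

print(round(adjusted, 4))  # 0.05
print(cutoff)              # 0.1
```

A sample probability equal to the sample event rate maps back to the population event rate, which serves as a quick sanity check on the correction.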

NEW QUESTION 15
A confusion matrix is created for data that were oversampled due to a rare target. What values are not affected by this oversampling?

• A. Sensitivity and PV+
• B. Specificity and PV-
• C. PV+ and PV-
• D. Sensitivity and Specificity
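
The effect of oversampling on these four measures can be checked numerically. The sketch below uses hypothetical counts and models oversampling as multiplying the event row of the confusion matrix by a constant factor, which is a simplifying assumption.

```python
from fractions import Fraction


def rates(tp, fn, fp, tn):
    """Return the four standard confusion-matrix rates as exact fractions."""
    return {
        "sensitivity": Fraction(tp, tp + fn),
        "specificity": Fraction(tn, tn + fp),
        "pv_plus": Fraction(tp, tp + fp),
        "pv_minus": Fraction(tn, tn + fn),
    }


# Hypothetical population counts: 50 events, 950 non-events.
population = rates(tp=40, fn=10, fp=100, tn=850)

# Oversample events tenfold: only the event row (tp, fn) is scaled.
oversampled = rates(tp=400, fn=100, fp=100, tn=850)

for name in population:
    print(name, population[name], oversampled[name])
```

Rates computed within a row of actual outcomes keep the same value when that row is scaled, while rates computed within a column of predicted outcomes mix both rows and therefore shift.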

NEW QUESTION 16
An analyst knows that the categorical predictor, store_id, is an important predictor of the target. However, store_id has too many levels to be a feasible predictor in the model. The analyst wants to combine stores and treat them as members of the same class level. What are the two most effective ways to address the problem? (Choose two.)

• A. Eliminate store_id as a predictor in the model because it has too many levels to be feasible.
• B. Cluster by using Greenacre's method to combine stores that are similar.
• C. Use subject matter expertise to combine stores that are similar.
• D. Randomly combine the stores into five groups to keep the stochastic variation among the observations intact.

NEW QUESTION 17
The question will ask you to provide a missing statement. Given the following SAS program: Which SAS statement will complete the program to correctly score the data set NEW_DATA?

• A. Scoredata data=MYDIR.NEW_DATA out=scores;
• B. Scoredata data=MYDIR.NEW_DATA output=scores;
• C. Scoredata=MYDIR.NEW_DATA output=scores;
• D. Scoredata=MYDIR.NEW_DATA out=scores;

NEW QUESTION 18
Which SAS program will detect collinearity in a multiple regression application?

• A. Option A
• B. Option B
• C. Option C
• D. Option D

NEW QUESTION 19
Given the following GLM procedure output: Which statement is correct at an alpha level of 0.05?

• A. School*Gender should be removed because it is non-significant.
• B. Gender should be removed because it is non-significant.
• C. School should be removed because it is significant.
• D. Gender should not be removed due to its involvement in the significant interaction.

NEW QUESTION 20
Refer to the exhibit. Given alpha=0.02, which conclusion is justified regarding percentage of body fat, comparing small (S), medium (M), and large (L) wrist sizes?

• A. Medium wrist size is significantly different than small wrist size.
• B. Large wrist size is significantly different than medium wrist size.
• C. Large wrist size is significantly different than small wrist size.
• D. There is no significant difference due to wrist size.

NEW QUESTION 21
Refer to the following odds ratio table: What is a correct interpretation of the estimate?

• A. The odds of the event are 1.142 greater for each one dollar increase in salary.
• B. The odds of the event are 1.142 greater for each one thousand dollar increase in salary.
• C. The probability of the event is 1.142 greater for each one dollar increase in salary.
• D. The probability of the event is 1.142 greater for each one thousand dollar increase in salary.