Databricks Certified Professional Data Scientist Exam Practice Test

Question 1

In statistics, maximum-likelihood estimation (MLE) is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters. The normalizing constant is usually ignored in MLE because:

A. The normalizing constant is always very close to 1

B. The normalizing constant only has a small impact on the maximum likelihood

C. The normalizing constant is often zero and can cause division by zero

D. The normalizing constant doesn't impact the maximizing value
A normalizing constant is positive, and multiplying or dividing a series of values by the same positive number does not change which of them is the largest. Maximum likelihood estimation is concerned only with finding the parameter value that maximizes the likelihood, not with the value of the likelihood itself, so normalizing constants can be ignored.
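A small numerical check makes the argument concrete. The sketch below uses a hypothetical data set of 7 heads in 10 coin flips and treats the binomial coefficient as the constant factor that does not depend on the parameter p. It runs a simple grid search (an illustrative choice, not how MLE is usually solved) with and without that constant and confirms both pick the same p.

```python
import math

def likelihood(p, heads, n, include_constant=True):
    """Binomial likelihood of `heads` successes in `n` trials at success probability p."""
    # The binomial coefficient C(n, heads) is positive and does not depend on p.
    constant = math.comb(n, heads) if include_constant else 1
    return constant * p**heads * (1 - p) ** (n - heads)

heads, n = 7, 10  # hypothetical data: 7 heads in 10 flips
grid = [i / 100 for i in range(1, 100)]  # candidate values of p: 0.01 .. 0.99

best_with = max(grid, key=lambda p: likelihood(p, heads, n, include_constant=True))
best_without = max(grid, key=lambda p: likelihood(p, heads, n, include_constant=False))
print(best_with, best_without)  # 0.7 0.7
```

Scaling every likelihood value by the same positive constant (here 120) changes the height of the curve but not the location of its peak.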

Answer : D

Question 2

Which of the following questions fall under the data science category?

A. What happened in the last six months?

B. How many products were sold in the last month?

C. Where is the problem with sales?

D. Which is the optimal scenario for selling this product?

E. What happens if this scenario continues?
This question checks your understanding of the difference between BI (business intelligence) and data science. BI already exists in many organizations and analytics teams already use it, but they need to learn data science techniques to solve certain problems. The options can be confusing, yet they are easy to sort if you have worked in BI or as a data scientist. The first three options can be answered with a reporting solution: what sales happened in the last six months, how many products were sold last month, where the sales problem lies.
The last two options, however, require data science techniques. To find which scenarios are optimal for product sales, or what happens if a scenario continues, you need to collect data and apply techniques such as optimization, predictive modeling, and statistical analysis on structured and unstructured data.

Answer : D, E

Question 3

In which of the following scenarios can naive Bayes be used for classification?

A. Classify whether a given person is male or female based on measured features. The features include height, weight and foot size.

B. Classify whether an email is spam or not spam

C. Identify whether a fruit is an orange or not based on features like diameter, color and shape
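Naive Bayes classifiers have worked quite well in many real-world situations, most famously document classification and spam filtering. They require only a small amount of training data to estimate the necessary parameters. Continuous features such as height, weight or diameter can be handled by modeling each feature with a distribution, typically a normal distribution, within each class.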
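Scenario A can be sketched as a Gaussian naive Bayes classifier. This is a minimal from-scratch illustration, not a library API: the training rows below are hypothetical toy measurements (height in feet, weight in pounds, foot size in inches), and each feature is assumed to be normally distributed and independent within a class. Log probabilities are summed instead of multiplying raw densities to avoid numerical underflow.

```python
import math
from statistics import mean, variance

# Hypothetical toy training data: (height, weight, foot size) per class.
TRAIN = {
    "male": [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
    "female": [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
}

def gaussian_log_pdf(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def fit(train):
    """Estimate a class prior and a per-feature (mean, sample variance) for each class."""
    total = sum(len(rows) for rows in train.values())
    model = {}
    for label, rows in train.items():
        columns = list(zip(*rows))  # one tuple per feature
        stats = [(mean(col), variance(col)) for col in columns]
        model[label] = (math.log(len(rows) / total), stats)
    return model

def predict(model, sample):
    """Return the class with the highest log posterior (up to a shared constant)."""
    def score(label):
        log_prior, stats = model[label]
        # "Naive" assumption: features are independent given the class.
        return log_prior + sum(
            gaussian_log_pdf(x, mu, var) for x, (mu, var) in zip(sample, stats)
        )
    return max(model, key=score)

model = fit(TRAIN)
print(predict(model, (6.0, 130, 8)))  # female
```

The shared evidence term P(features) is dropped when comparing classes, for the same reason normalizing constants can be ignored in Question 1.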

Answer : A, B, C

Question 4

Refer to Exhibit

In the exhibit, the x-axis represents the derived probability of a borrower defaulting on a loan. The pink represents borrowers known not to have defaulted on their loan, and the blue represents borrowers known to have defaulted on their loan. Which analytical method could produce the probabilities needed to build this exhibit?

A. Linear Regression

B. Logistic Regression

C. Discriminant Analysis

D. Association Rules

Answer : B
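Logistic regression passes a linear score through the sigmoid function, so every output lies strictly between 0 and 1 and can be read as a probability of default. The sketch below fits a one-feature model with plain batch gradient descent; the feature (a debt-to-income ratio), the data, the learning rate and the epoch count are all hypothetical choices for illustration, and real libraries use more robust solvers.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. the score
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical borrowers: debt-to-income ratio and label (1 = defaulted).
xs = [0.1, 0.2, 0.25, 0.3, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
probs = [sigmoid(w * x + b) for x in xs]
for x, y, p in zip(xs, ys, probs):
    print(f"ratio={x:.2f} defaulted={y} P(default)={p:.2f}")
```

Plotting these probabilities on the x-axis, colored by the known outcome, gives the kind of exhibit described in the question. Linear regression, by contrast, can produce values below 0 or above 1.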

Question 5

Consider the following two cases of movie ratings:

1. You recommend a movie to a user with a predicted rating of four stars, but he really doesn't like it and would rate it two stars.

2. You recommend a movie with a predicted rating of three stars, but the user loves it and would rate it five stars.

Which statement correctly applies?

A. In both cases, the contribution to the RMSE is the same

B. In both cases, the contribution to the RMSE is different

C. In both cases, the contribution to the RMSE could vary

D. None of the above

Answer : A
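RMSE squares each error before averaging, so only the size of the miss matters, not its direction. Both cases miss by exactly two stars (4 vs. 2 and 3 vs. 5), so each contributes a squared error of 4. A minimal check, with the two cases hard-coded from the question:

```python
import math

def rmse(predicted, actual):
    """Root-mean-squared error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# (predicted stars, rating the user would give)
case1 = (4, 2)  # overestimated by 2 stars
case2 = (3, 5)  # underestimated by 2 stars

contrib1 = (case1[0] - case1[1]) ** 2
contrib2 = (case2[0] - case2[1]) ** 2
print(contrib1, contrib2)  # 4 4
print(rmse([case1[0], case2[0]], [case1[1], case2[1]]))  # 2.0
```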

Question 6

Select the case where regression algorithms are not the best fit.

A. The dimensions of an object are given

B. The weight of a person is given

C. The temperature in the atmosphere is given

D. Employee status is given
Regression algorithms are usually employed when the data points are inherently numerical variables, such as the dimensions of an object, the weight of a person, or the temperature in the atmosphere. Unlike Bayesian algorithms, they are not very good for categorical data, such as employee status or credit score description.

Answer : D

Question 7

You have used k-means clustering to classify the behavior of 100,000 customers for a retail store. You decide to use household income, age, gender and yearly purchase amount as measures. You have chosen to use 8 clusters and notice that 2 clusters have only 3 customers assigned. What should you do?

A. Decrease the number of measures used

B. Increase the number of clusters

C. Decrease the number of clusters

D. Identify additional measures to add to the analysis
K-means uses an iterative algorithm that minimizes the sum of squared distances from each object to its cluster centroid, over all clusters. The algorithm moves objects between clusters until the sum cannot be decreased further. The result is a set of clusters that are compact and well separated, although the algorithm only reaches a local minimum that depends on the initial centroids. Most k-means implementations expose optional parameters to control the minimization, such as the initial values of the cluster centroids and the maximum number of iterations.
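Clustering is primarily an exploratory technique for discovering hidden structure in data, possibly as a prelude to more focused analysis or decision processes. Specific applications of k-means include image processing, medical applications and customer segmentation. Clustering is often used as a lead-in to classification: once the clusters are identified, labels can be applied to each cluster to classify each group based on its characteristics. Marketing and sales groups use k-means to better identify customers who have similar behaviors and spending patterns.
In this scenario, two of the eight clusters hold only 3 of 100,000 customers, which suggests that the number of clusters is too large and the algorithm is splitting off a handful of outlying customers. Decreasing the number of clusters lets those customers merge into larger, more meaningful segments.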
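The assignment and update loop described above (Lloyd's algorithm) can be sketched in a few lines. This is a minimal illustration on hypothetical 2-D points, not a production implementation. To keep it deterministic it uses a farthest-first initialization, which is an assumption made here for reproducibility; many libraries use k-means++ or random restarts instead. Returning the member lists makes it easy to check cluster sizes, which is how you would spot tiny clusters like the 3-customer ones in the question.

```python
import math

def farthest_first_init(points, k):
    """Start at points[0], then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until nothing moves."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, clusters = kmeans(points, k=2)
print(centroids)                      # [(0.5, 0.5), (10.5, 10.5)]
print([len(c) for c in clusters])     # [4, 4]
```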

Answer : C