Databricks Certified Professional Data Scientist Exam Practice Test

Page: 1 / 14
Total 138 questions
Question 1

You are working on an email spam filtering assignment. While working on it, you find a new word, e.g. HadoopExam, in an email. Your solution has never come across this word before, so the estimated probability of this word appearing in either class of email could be zero. Which of the following algorithms can help you avoid zero probabilities?



Answer : B
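
The answer options are not reproduced here, so the following is only an illustration of one common remedy for zero probabilities in a naive Bayes spam filter: additive (Laplace) smoothing. It is a minimal sketch; the toy word counts, the vocabulary, and the function name `smoothed_word_probability` are assumptions made for the example.

```python
from collections import Counter


def smoothed_word_probability(word, class_word_counts, vocabulary_size, alpha=1.0):
    """Estimate P(word | class) with additive (Laplace) smoothing.

    alpha is added to every word count, so a word that was never seen
    in training still receives a small non-zero probability.
    """
    total_words = sum(class_word_counts.values())
    count = class_word_counts.get(word, 0)
    return (count + alpha) / (total_words + alpha * vocabulary_size)


# Hypothetical toy training data for the "spam" class.
spam_counts = Counter("win money now win prize".split())

# Hypothetical vocabulary: training words plus words seen only at prediction time.
vocabulary = set(spam_counts) | {"hadoopexam", "meeting"}

unseen = smoothed_word_probability("hadoopexam", spam_counts, len(vocabulary))
seen = smoothed_word_probability("win", spam_counts, len(vocabulary))

print(unseen)  # non-zero even though "hadoopexam" never appeared in training
print(seen)
```

Without smoothing (alpha = 0), the unseen word would get probability zero and wipe out the whole product of word probabilities for that class.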


Question 2

Let A denote the event 'student is female' and let B denote the event 'student is French'. In a class of 100 students, suppose 60 are French, and suppose that 10 of the French students are female. Find the probability that if I pick a French student, it will be a girl; that is, find P(A|B).



Answer : C
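
The calculation behind this question can be checked with a few lines of Python. This is a sketch using only the numbers given in the question; the variable names are illustrative, and since the answer options are not reproduced here, the result is not matched to an option letter.

```python
from fractions import Fraction

total_students = 100
french = 60             # students for whom event B holds
french_and_female = 10  # students for whom both A and B hold

p_b = Fraction(french, total_students)
p_a_and_b = Fraction(french_and_female, total_students)

# Definition of conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

print(p_a_given_b)  # 1/6
```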


Question 3

What is the probability that the total of two dice will be greater than 8, given that the first die is a 6?



Answer : B
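
One way to verify this kind of question is to enumerate the sample space. The sketch below assumes two fair six-sided dice and counts outcomes directly; the variable names are illustrative, and the result is not matched to an option letter because the options are not reproduced here.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# Condition on the first die showing a 6.
first_is_six = [(a, b) for a, b in outcomes if a == 6]

# Within that reduced sample space, keep totals greater than 8.
favorable = [(a, b) for a, b in first_is_six if a + b > 8]

p_total_gt_8_given_six = Fraction(len(favorable), len(first_is_six))

print(p_total_gt_8_given_six)  # 2/3
```

Given a first die of 6, the second die must show 3, 4, 5, or 6, which is 4 of the 6 possible values.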


Question 4

If E1 and E2 are two events, how do you represent the conditional probability that E2 occurs given that E1 has occurred?



Answer : C
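
For reference, the standard notation and definition of this conditional probability can be written as follows. The answer options are not reproduced here, so this is the textbook form rather than a quotation of option C.

```latex
P(E_2 \mid E_1) = \frac{P(E_1 \cap E_2)}{P(E_1)}, \qquad P(E_1) > 0
```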


Question 5

There are 5000 balls of different colors, of which 1200 are pink. What is the maximum likelihood estimate for the proportion of "pink" items in the test set of colored balls?



Answer : C

In general, for a fixed set of data and an underlying statistical model, the method of maximum likelihood selects the set of values of the model parameters that maximizes the likelihood function. Intuitively, this maximizes the 'agreement' of the selected model with the observed data, and for discrete random variables it indeed maximizes the probability of the observed data under the resulting distribution. Maximum-likelihood estimation gives a unified approach to estimation, which is well-defined in the case of the normal distribution and many other problems. However, in some complicated problems difficulties do occur: in such problems, maximum-likelihood estimators are unsuitable or do not exist.
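
For the numbers in this question, the maximum likelihood estimate can be sketched as below. The sketch assumes each ball is an independent Bernoulli trial ("pink" or "not pink"), in which case the estimate is simply the observed proportion. The function names are illustrative.

```python
import math
from fractions import Fraction


def bernoulli_mle(successes, trials):
    """MLE of a Bernoulli proportion: the observed fraction of successes."""
    return Fraction(successes, trials)


def log_likelihood(p, successes, trials):
    """Log-likelihood of proportion p, up to a constant that does not depend on p."""
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)


pink, total = 1200, 5000
p_hat = bernoulli_mle(pink, total)

print(p_hat, float(p_hat))  # 6/25 0.24

# Sanity check: nearby values of p give a lower log-likelihood.
best = log_likelihood(float(p_hat), pink, total)
print(best > log_likelihood(0.23, pink, total))
print(best > log_likelihood(0.25, pink, total))
```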


Question 6

What is the best way to evaluate the quality of the model found by an unsupervised algorithm like k-means clustering, given metrics for the cost of the clustering (how well it fits the data) and its stability (how similar the clusters are across multiple runs over the same data)?



Answer : A


Question 7

Which of the following metrics are useful in measuring the accuracy and quality of a recommender system?



Answer : C

