CompTIA DY0-001 Exam Practice Test

Question 1

The term "greedy algorithms" refers to machine-learning algorithms that:

A. update priors as more data is seen.

B. examine every node of a tree before making a decision.

C. apply a theoretical model to the distribution of the data.

D. make the locally optimal decision.

Answer : D

Greedy algorithms build the solution iteratively by choosing at each step the option that appears best at that moment, without reconsidering earlier choices.
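
As a rough illustration of "locally optimal at each step", the sketch below makes change by always taking the largest coin that still fits. The function name greedy_change and the coin sets are hypothetical, chosen only for this example; coin change is a generic greedy problem, not a machine-learning method from the exam material.

```python
def greedy_change(amount, denominations):
    """Make change by repeatedly taking the largest coin that still fits.

    Each step picks the locally best option and never revisits it.
    Assumes positive integer coin values; without a coin of value 1,
    some remainder may be left uncovered.
    """
    coins = []
    for coin in sorted(denominations, reverse=True):
        while amount >= coin:
            amount -= coin
            coins.append(coin)
    return coins


# With these coins the greedy choice happens to be optimal.
print(greedy_change(63, [25, 10, 5, 1]))

# With these coins it is not: greedy uses three coins, while 3 + 3 needs two.
print(greedy_change(6, [4, 3, 1]))
```

The second call shows the trade-off: a greedy procedure is fast and simple, but because it never reconsiders earlier choices it can miss the globally best solution.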

Question 2

Which of the following distributions would be best to use for hypothesis testing on a data set with 20 observations?

A. Power law

B. Normal

C. Uniform

D. Student's t

Answer : D

With only 20 observations and an unknown population variance, the t-distribution (with n - 1 = 19 degrees of freedom) properly accounts for the extra uncertainty in the standard error when performing hypothesis tests.
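
A minimal sketch of a one-sample t-test on 20 observations, using only the Python standard library. The sample values, the null mean of 50.0, and the helper name one_sample_t are made up for illustration. The critical value 2.093 is the standard two-sided 5% table value for 19 degrees of freedom; it is larger than the normal distribution's 1.96, which is how the t-distribution reflects the extra uncertainty.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)          # estimated standard error of the mean
    return (mean - mu0) / se, n - 1


# Hypothetical data: 20 measurements, tested against a null mean of 50.0.
sample = [51.2, 49.8, 52.1, 50.5, 48.9, 51.7, 50.2, 49.5, 52.4, 50.9,
          51.1, 49.2, 50.7, 51.9, 50.1, 49.9, 51.4, 50.6, 52.0, 50.3]

t_stat, df = one_sample_t(sample, mu0=50.0)

T_CRIT_19 = 2.093  # two-sided 5% critical value for 19 df, from a t table
reject_null = abs(t_stat) > T_CRIT_19
print(f"t = {t_stat:.3f}, df = {df}, reject H0 at 5%: {reject_null}")
```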

Question 3

A data scientist built several models that perform about the same but vary in the number of features. Which of the following models should the data scientist recommend for production according to Occam's razor?

A. The model with the fewest features and highest performance

B. The model with the fewest features and the lowest performance

C. The model with the most features and the lowest performance

D. The model with the most features and the highest performance

Answer : A

According to Occam's razor, when models perform equivalently, you choose the simplest one: here, the model that achieves the needed performance with the fewest features.
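
One way to turn this rule into a selection step is sketched below: keep every model whose score is within a small tolerance of the best, then pick the one with the fewest features. The model records, the score values, and the 0.01 tolerance are all hypothetical; what counts as "about the same" performance is a judgment call for the project.

```python
def simplest_adequate(models, tolerance=0.01):
    """Among models scoring within `tolerance` of the best, return the one
    with the fewest features."""
    best = max(m["score"] for m in models)
    candidates = [m for m in models if best - m["score"] <= tolerance]
    return min(candidates, key=lambda m: m["n_features"])


# Hypothetical candidates: two perform about the same, one is clearly worse.
models = [
    {"name": "m1", "n_features": 40, "score": 0.912},
    {"name": "m2", "n_features": 12, "score": 0.910},
    {"name": "m3", "n_features": 5, "score": 0.850},
]

print(simplest_adequate(models)["name"])
```

Here m3 has the fewest features but falls outside the tolerance, so the choice is between m1 and m2, and the simpler m2 is selected.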

Question 4

The most likely concern with a one-feature, machine-learning model is high error due to:

A. bias.

B. dimensionality.

C. variance.

D. probability.

Answer : A

A model with only one feature is unlikely to capture the true complexity of the data's underlying relationships, leading to systematic underfitting, i.e., high bias.
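
The toy example below sketches what underfitting looks like in practice. A straight line is fitted on a single feature x to a target that is actually y = x^2; the data and the helper names fit_line and mse are invented for the illustration. The error is measured on the training data itself, so it cannot be blamed on variance: the model form is simply too limited.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: the true relationship is curved, the model is a line.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
train_error = mse(xs, ys, slope, intercept)
print(slope, intercept, train_error)
```

The best line here is flat (slope 0, intercept 4), leaving a training error of 12, which is the same as predicting the mean of y for every point.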

Question 5

A data scientist is preparing to brief a non-technical audience that is focused on analysis and results. During the modeling process, the data scientist produced the following artifacts:

Which of the following artifacts should the data scientist include in the briefing? (Choose two.)

A. Final charts and dashboards

B. Model selection, justification, and purpose

C. Code documentation

D. Mathematical descriptions of clustering algorithms included in the selected model

E. Model performance statistics (accuracy, precision, recall, F1 score, etc.)

F. Data dictionary

Answer : A, E

For a non-technical audience centered on results, polished visualizations (charts and dashboards) and clear, high-level performance metrics (accuracy, precision, recall, F1 score) best convey the key takeaways. The deeper technical details (code documentation, data dictionaries, and algorithm math) should be omitted at this level.

Question 6

A data scientist wants to predict a person's travel destination. The options are:

Which of the following models would best fit this use case?

A. Linear discriminant analysis

B. k-means modeling

C. Latent semantic analysis

D. Principal component analysis

Answer : A

You need a supervised multiclass classification model to predict one of the four labeled destinations. Linear Discriminant Analysis is designed for such tasks, finding the linear boundaries that best separate the known destination classes.

Question 7

A data scientist is standardizing a large data set that contains website addresses. A specific string inside some of the web addresses needs to be extracted. Which of the following is the best method for extracting the desired string from the text data?

A. Regular expressions

B. Named-entity recognition

C. Large language model

D. Find and replace

Answer : A
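
Regular expressions describe a text pattern once and apply it uniformly across a large data set, which suits pulling a known substring out of URLs. The sketch below uses Python's re module; the URLs, the utm_campaign parameter, and the helper name extract_campaign are hypothetical stand-ins, since the question does not say which string is wanted.

```python
import re

# Capture the value of a (hypothetical) utm_campaign query parameter:
# everything after "utm_campaign=" up to the next "&" or "#".
CAMPAIGN_RE = re.compile(r"[?&]utm_campaign=([^&#]+)")


def extract_campaign(url):
    """Return the campaign value from a URL, or None if it is absent."""
    match = CAMPAIGN_RE.search(url)
    return match.group(1) if match else None


urls = [
    "https://example.com/shop?utm_source=mail&utm_campaign=spring_sale",
    "https://example.com/about",
]

for url in urls:
    print(url, "->", extract_campaign(url))
```

Addresses that lack the target string simply return None, so the same function can be mapped over the whole column without special cases.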

CompTIA DY0-001 CompTIA DataX Certification Exam Practice Test