Snowflake SnowPro Advanced: Data Scientist Certification DSA-C02 Exam Questions

Question 1

Which of the following metrics are used to evaluate classification models?



Answer : D

Evaluation metrics are tied to machine learning tasks: there are different metrics for classification and for regression, and some, like precision-recall, are useful for multiple tasks. Classification and regression are examples of supervised learning, which constitutes the majority of machine learning applications. By using different metrics for performance evaluation, we can improve a model's overall predictive power before rolling it out for production on unseen data. Without a proper evaluation of the machine learning model using different evaluation metrics, depending on accuracy alone can cause problems when the model is deployed on unseen data and may result in poor predictions.

Classification metrics are evaluation measures used to assess the performance of a classification model. Common metrics include accuracy (proportion of correct predictions), precision (true positives over total predicted positives), recall (true positives over total actual positives), F1 score (harmonic mean of precision and recall), and area under the receiver operating characteristic curve (AUC-ROC).

Confusion Matrix

A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. It is a table of combinations of predicted and actual values.

It is extremely useful for computing recall, precision, accuracy, and the AUC-ROC curve.
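As an illustration, a confusion matrix can be tabulated directly in SQL. The sketch below assumes a hypothetical table PREDICTIONS with columns ACTUAL and PREDICTED holding the true and predicted class labels; the names are for illustration only.

-- Tabulate the confusion matrix: one row per (actual, predicted) combination.
select actual, predicted, count(*) as num_rows
from predictions
group by actual, predicted
order by actual, predicted;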

The four commonly used metrics for evaluating classifier performance are:

1. Accuracy: The proportion of correct predictions out of the total predictions.

2. Precision: The proportion of true positive predictions out of the total positive predictions (precision = true positives / (true positives + false positives)).

3. Recall (Sensitivity or True Positive Rate): The proportion of true positive predictions out of the total actual positive instances (recall = true positives / (true positives + false negatives)).

4. F1 Score: The harmonic mean of precision and recall, providing a balance between the two metrics (F1 score = 2 * ((precision * recall) / (precision + recall))).

These metrics help assess the classifier's effectiveness in correctly classifying instances of different classes.
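For a binary classifier, these four metrics can be computed with plain SQL aggregates. The sketch below again assumes the hypothetical PREDICTIONS table, here with BOOLEAN ACTUAL and PREDICTED columns, and relies on Snowflake's IFF function and lateral column aliases.

select
    avg(iff(actual = predicted, 1, 0))           as accuracy,   -- correct / total
    sum(iff(predicted and actual, 1, 0))
        / nullif(sum(iff(predicted, 1, 0)), 0)   as prec,       -- TP / (TP + FP)
    sum(iff(predicted and actual, 1, 0))
        / nullif(sum(iff(actual, 1, 0)), 0)      as rec,        -- TP / (TP + FN)
    2 * prec * rec / nullif(prec + rec, 0)       as f1_score    -- harmonic mean
from predictions;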

Understanding how well a machine learning model will perform on unseen data is the main purpose behind working with these evaluation metrics. Metrics like accuracy, precision, and recall are good ways to evaluate classification models on balanced datasets, but if the data is imbalanced, methods like ROC/AUC do a better job of evaluating model performance.

An ROC curve isn't just a single number but a whole curve, so it provides nuanced detail about the classifier's behavior; the flip side is that it is hard to quickly compare many ROC curves to each other, which is why the area under the curve (AUC) is often used as a single-number summary.


Question 2

Consider a data frame df with columns ['A', 'B', 'C', 'D'] and rows ['r1', 'r2', 'r3']. What does the expression df[lambda x : x.index.str.endswith('3')] do?



Answer : D

It filters the DataFrame down to the row labelled r3. When a callable is passed inside df[...], pandas applies it to the DataFrame and uses the returned boolean mask to select rows; here x.index.str.endswith('3') is True only for the index label 'r3'.


Question 3

Which of the following are types of visualization used for data exploration in data science?



Answer : A, D, E

Type of visualization used for exploration:

* Correlation heatmap

* Class distributions by feature

* Two-Dimensional density plots.

All the visualizations are interactive, as is standard for Plotly.

For more details, please refer to the link below:

https://towardsdatascience.com/data-exploration-understanding-and-visualization-72657f5eac41


Question 4

Which of the following are correct rules when using a data science model created via an external function in Snowflake?



Answer : A, B, C, D

From the perspective of a user running a SQL statement, an external function behaves like any other UDF. External functions follow these rules:

External functions return a value.

External functions can accept parameters.

An external function can appear in any clause of a SQL statement in which other types of UDF can appear. For example:

select my_external_function_2(column_1, column_2)
from table_1;

select col1
from table_1
where my_external_function_3(col2) < 0;

create view view1 (col1) as
select my_external_function_5(col1)
from table9;

An external function can be part of a more complex expression:

select upper(zipcode_to_city_external_function(zipcode))
from address_table;

The returned value can be a compound value, such as a VARIANT that contains JSON.

External functions can be overloaded; two different functions can have the same name but different signatures (different numbers or data types of input parameters).
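For context, an external function is created with CREATE EXTERNAL FUNCTION and bound to a remote service through an API integration. The sketch below is illustrative only: the function name, API integration, and endpoint URL are assumed placeholders, not objects from this exam.

-- A minimal sketch of defining an external function.
-- MY_API_INTEGRATION and the endpoint URL are assumed to exist already.
create or replace external function my_external_function_2(column_1 varchar, column_2 varchar)
    returns variant                       -- can return a compound value, such as JSON
    api_integration = my_api_integration
    as 'https://example.com/model/predict';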


Question 5

Which of the following are not types of window functions in Snowflake?



Answer : C, D

Window Functions

A window function operates on a group (a "window") of related rows.

Each time a window function is called, it is passed a row (the current row in the window) and the window of rows that contain the current row. The window function returns one output row for each input row. The output depends on the individual row passed to the function and the values of the other rows in the window passed to the function.

Some window functions are order-sensitive. There are two main types of order-sensitive window functions:

* Rank-related functions.

* Window frame functions.

Rank-related functions list information based on the "rank" of a row. For example, if you rank stores in descending order by profit per year, the store with the most profit will be ranked 1; the second-most profitable store will be ranked 2, etc.

Window frame functions allow you to perform rolling operations, such as calculating a running total or a moving average, on a subset of the rows in the window.
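Here is a short sketch of both kinds of order-sensitive window functions, assuming a hypothetical STORE_ANNUAL_PROFIT table with STORE_ID, FISCAL_YEAR, and PROFIT columns:

select
    store_id,
    fiscal_year,
    -- rank-related: rank stores by profit within each year (most profitable = 1)
    rank() over (partition by fiscal_year order by profit desc) as profit_rank,
    -- window frame: running total of profit per store across years
    sum(profit) over (partition by store_id order by fiscal_year
                      rows between unbounded preceding and current row) as running_total,
    -- window frame: 3-year moving average of profit per store
    avg(profit) over (partition by store_id order by fiscal_year
                      rows between 2 preceding and current row) as moving_avg_3yr
from store_annual_profit;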


Question 6

Which object records data manipulation language (DML) changes made to tables, including inserts, updates, and deletes, as well as metadata about each change, so that actions can be taken using the changed data in data science pipelines?



Answer : C

A stream object records data manipulation language (DML) changes made to tables, including inserts, updates, and deletes, as well as metadata about each change, so that actions can be taken using the changed data. This process is referred to as change data capture (CDC). An individual table stream tracks the changes made to rows in a source table. A table stream (also referred to as simply a "stream") makes a "change table" available of what changed, at the row level, between two transactional points of time in a table. This allows querying and consuming a sequence of change records in a transactional fashion.

Streams can be created to query change data on the following objects:

* Standard tables, including shared tables.

* Views, including secure views

* Directory tables

* Event tables
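As a minimal sketch of this CDC pattern, using assumed names for illustration (an ORDERS source table, an ORDERS_HISTORY target, and ORDER_ID/AMOUNT columns):

-- Create a stream to start tracking changes on the source table.
create or replace stream orders_stream on table orders;

-- Query pending change records; each row carries METADATA$ACTION,
-- METADATA$ISUPDATE, and METADATA$ROW_ID alongside the table columns.
select * from orders_stream;

-- Consuming the stream inside a DML statement advances its offset,
-- so each change record is processed exactly once.
insert into orders_history (order_id, amount)
    select order_id, amount
    from orders_stream
    where metadata$action = 'INSERT';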


Question 7

The skewness of a normal distribution is ___________.



Answer : C

Since the normal curve is symmetric about its mean, its skewness is zero. For a formal proof, refer to a standard statistics textbook.
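In symbols (a brief sketch of the standard argument), skewness is the standardized third central moment, which vanishes under symmetry:

\[
\gamma_1 \;=\; \mathbb{E}\!\left[\left(\frac{X-\mu}{\sigma}\right)^{3}\right],
\qquad
X \sim \mathcal{N}(\mu,\sigma^{2})
\;\Longrightarrow\; \mathbb{E}\!\left[(X-\mu)^{3}\right] = 0
\;\Longrightarrow\; \gamma_1 = 0.
\]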

