Snowflake DSA-C02 SnowPro Advanced: Data Scientist Certification Exam Practice Test

Page: 1 / 14
Total 65 questions
Question 1

Mark the correct steps for saving the contents of a DataFrame to a Snowflake table as part of Moving Data from Spark to Snowflake.



Answer : C

Moving Data from Spark to Snowflake

The steps for saving the contents of a DataFrame to a Snowflake table are similar to writing from Snowflake to Spark:

1. Use the write() method of the DataFrame to construct a DataFrameWriter.

2. Specify SNOWFLAKE_SOURCE_NAME using the format() method.

3. Specify the connector options using either the option() or options() method.

4. Use the dbtable option to specify the table to which data is written.

5. Use the mode() method to specify the save mode for the content.

Examples

df.write
    .format(SNOWFLAKE_SOURCE_NAME)
    .options(sfOptions)
    .option("dbtable", "t2")
    .mode(SaveMode.Overwrite)
    .save()
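The example above is Scala. A minimal PySpark sketch of the same write, assuming df is an existing Spark DataFrame, sfOptions is a Python dict of connection parameters, and the Snowflake Spark connector is available on the cluster, could look like this:

# Assumption: sfOptions holds sfURL, sfUser, sfPassword, sfDatabase, sfSchema, sfWarehouse, ...
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

df.write \
    .format(SNOWFLAKE_SOURCE_NAME) \
    .options(**sfOptions) \
    .option("dbtable", "t2") \
    .mode("overwrite") \
    .save()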


Question 2

Which of the following is not a type of window function in Snowflake?



Answer : C, D

Window Functions

A window function operates on a group ("window") of related rows.

Each time a window function is called, it is passed a row (the current row in the window) and the window of rows that contain the current row. The window function returns one output row for each input row. The output depends on the individual row passed to the function and the values of the other rows in the window passed to the function.

Some window functions are order-sensitive. There are two main types of order-sensitive window functions:

Rank-related functions.

Window frame functions.

Rank-related functions list information based on the "rank" of a row. For example, if you rank stores in descending order by profit per year, the store with the most profit will be ranked 1; the second-most profitable store will be ranked 2, etc.

Window frame functions allow you to perform rolling operations, such as calculating a running total or a moving average, on a subset of the rows in the window.
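As a rough analogy in pandas (illustrative only, not Snowflake SQL), the two order-sensitive kinds can be sketched with toy data:

import pandas as pd

sales = pd.DataFrame({
    "store": ["A", "B", "C", "D"],
    "profit": [500, 900, 900, 100],
})

# Rank-related: rank stores in descending order by profit (most profitable = 1)
sales["profit_rank"] = sales["profit"].rank(method="min", ascending=False).astype(int)

# Window frame: a rolling operation, here a running total over the ordered rows
ordered = sales.sort_values("profit", ascending=False)
ordered["running_total"] = ordered["profit"].cumsum()

print(ordered)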


Question 3

Which of the following process best covers all of the following characteristics?

* Collecting descriptive statistics like min, max, count and sum.

* Collecting data types, length and recurring patterns.

* Tagging data with keywords, descriptions or categories.

* Performing data quality assessment, risk of performing joins on the data.

* Discovering metadata and assessing its accuracy.

* Identifying distributions, key candidates, foreign-key candidates, functional dependencies, embedded value dependencies, and performing inter-table analysis.



Answer : C

Data processing and analysis cannot happen without data profiling---reviewing source data for content and quality. As data gets bigger and infrastructure moves to the cloud, data profiling is increasingly important.

What is data profiling?

Data profiling is the process of reviewing source data, understanding structure, content and interrelationships, and identifying potential for data projects.

Data profiling is a crucial part of:

* Data warehouse and business intelligence (DW/BI) projects---data profiling can uncover data quality issues in data sources, and what needs to be corrected in ETL.

* Data conversion and migration projects---data profiling can identify data quality issues, which you can handle in scripts and data integration tools copying data from source to target. It can also uncover new requirements for the target system.

* Source system data quality projects---data profiling can highlight data which suffers from serious or numerous quality issues, and the source of the issues (e.g. user inputs, errors in interfaces, data corruption).

Data profiling involves:

* Collecting descriptive statistics like min, max, count and sum.

* Collecting data types, length and recurring patterns.

* Tagging data with keywords, descriptions or categories.

* Performing data quality assessment, risk of performing joins on the data.

* Discovering metadata and assessing its accuracy.

* Identifying distributions, key candidates, foreign-key candidates, functional dependencies, embedded value dependencies, and performing inter-table analysis.
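A few of these checks can be sketched in pandas with toy data (hypothetical column names, for illustration only):

import pandas as pd

df = pd.DataFrame({
    "customer_id": [101, 102, 103, 103],
    "amount": [250.0, 99.5, 13.2, 13.2],
    "country": ["US", "DE", "US", "US"],
})

print(df.describe())                  # descriptive statistics: count, mean, min, max, ...
print(df.dtypes)                      # data types of each column
print(df["country"].value_counts())   # distribution of values in a column
print(df.nunique())                   # uniqueness; a key candidate has nunique == len(df)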


Question 4

Which command is used to install Jupyter Notebook?



Answer : A

Jupyter Notebook is a web-based interactive computational environment.

The command used to install Jupyter Notebook is pip install jupyter.

The command used to start Jupyter Notebook is jupyter notebook.


Question 5

Consider a data frame df with 10 rows and index [ 'r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']. What does the expression g = df.groupby(df.index.str.len()) do?



Answer : D

df.index.str.len() returns the length of each index label (2 for 'r1', 4 for 'row4', 5 for 'row10'), so the expression groups the DataFrame's rows by those label lengths; g is a DataFrameGroupBy object whose group keys are 2, 4 and 5.
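A minimal sketch that reproduces the question's setup (a hypothetical value column is added so the frame has data):

import pandas as pd

df = pd.DataFrame(
    {"value": range(10)},
    index=["r1", "r2", "r3", "row4", "row5", "row6", "r7", "r8", "r9", "row10"],
)

g = df.groupby(df.index.str.len())  # group keys are the label lengths: 2, 4, 5
print(g.size())
# 2    6   -> 'r1', 'r2', 'r3', 'r7', 'r8', 'r9'
# 4    3   -> 'row4', 'row5', 'row6'
# 5    1   -> 'row10'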


Question 6

Which Python method can be used by a Data Scientist to remove duplicates?



Answer : D

The drop_duplicates() method removes duplicate rows.

dataframe.drop_duplicates(subset, keep, inplace, ignore_index)

Remove duplicate rows from the DataFrame:

import pandas as pd

data = {
    'name': ['Peter', 'Mary', 'John', 'Mary'],
    'age': [50, 40, 30, 40],
    'qualified': [True, False, False, False]
}

df = pd.DataFrame(data)
newdf = df.drop_duplicates()
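A short follow-on sketch using the same df, showing the subset and keep parameters from the signature above:

# Drop rows that duplicate an earlier "name", keeping the last occurrence
newdf_by_name = df.drop_duplicates(subset=["name"], keep="last", ignore_index=True)
print(newdf_by_name)  # Peter, John and the later Mary row remain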


Question 7

In which of the following ways can a Data Scientist query, process, and transform data using Snowpark Python? [Select 2]



Answer : A, C

Query and process data with a DataFrame object. Refer to Working with DataFrames in Snowpark Python.

Convert custom lambdas and functions to user-defined functions (UDFs) that you can call to process data.

Write a user-defined tabular function (UDTF) that processes data and returns data in a set of rows with one or more columns.

Write a stored procedure that you can call to process data, or automate with a task to build a data pipeline.
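A minimal Snowpark Python sketch of the first two of these (DataFrame processing and a UDF), assuming placeholder connection parameters that you replace with your own account details:

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, udf
from snowflake.snowpark.types import IntegerType

connection_parameters = {        # placeholder values, replace with your own
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}
session = Session.builder.configs(connection_parameters).create()

# Query and process data with a DataFrame object
df = session.create_dataframe([[1], [2], [3]], schema=["n"])

# Convert a Python function to a UDF and call it to process data
@udf(return_type=IntegerType(), input_types=[IntegerType()])
def square(n: int) -> int:
    return n * n

df.select(col("n"), square(col("n")).alias("n_squared")).show()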

