Databricks Certified Data Analyst Associate Exam Questions

Page: 1 / 14
Total 65 questions
Question 1

A data analysis team is working with the table_bronze SQL table as a source for one of its most complex projects. A stakeholder of the project notices that some of the downstream data is duplicative. The analysis team identifies table_bronze as the source of the duplication.

Which of the following queries can be used to deduplicate the data from table_bronze and write it to a new table table_silver?

A)

CREATE TABLE table_silver AS
SELECT DISTINCT *
FROM table_bronze;

B)

CREATE TABLE table_silver AS
INSERT *
FROM table_bronze;

C)

CREATE TABLE table_silver AS
MERGE DEDUPLICATE *
FROM table_bronze;

D)

INSERT INTO TABLE table_silver
SELECT * FROM table_bronze;

E)

INSERT OVERWRITE TABLE table_silver
SELECT * FROM table_bronze;



Answer : A
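
The behavior of option A can be checked locally with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for a Databricks SQL warehouse, and the two-column schema and sample rows are hypothetical; only the CREATE TABLE ... AS SELECT DISTINCT statement comes from the question.

```python
import sqlite3

# In-memory SQLite database standing in for a Databricks SQL warehouse (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows; the question does not describe table_bronze's columns.
conn.execute("CREATE TABLE table_bronze (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO table_bronze VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (3, "c")],
)

# Option A: create the new table from the distinct rows of the source table.
conn.execute("CREATE TABLE table_silver AS SELECT DISTINCT * FROM table_bronze")

bronze_count = conn.execute("SELECT COUNT(*) FROM table_bronze").fetchone()[0]
silver_rows = conn.execute("SELECT * FROM table_silver ORDER BY id").fetchall()

print(bronze_count)  # 5 rows in the source, duplicates included
print(silver_rows)   # [(1, 'a'), (2, 'b'), (3, 'c')]
```

SELECT DISTINCT * treats two rows as duplicates only when every column matches, so rows that differ in any single column are both kept.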


Question 2

A data analyst needs to use the Databricks Lakehouse Platform to quickly create SQL queries and data visualizations. It is a requirement that the compute resources in the platform can be made serverless, and it is expected that data visualizations can be placed within a dashboard.

Which of the following Databricks Lakehouse Platform services/capabilities meets all of these requirements?



Answer : E


Question 3

A stakeholder has provided a data analyst with a lookup dataset in the form of a 50-row CSV file. The data analyst needs to upload this dataset for use as a table in Databricks SQL.

Which approach should the data analyst use to quickly upload the file into a table for use in Databricks SQL?



Answer : A


Question 4

In which circumstance will there be a substantial difference between a variable's mean and median values?



Answer : D

The mean is sensitive to extreme values, often called outliers, which can pull the average well away from the center of the data. The median is a measure of central tendency that resists such outliers because it depends only on the middle value(s) of the ordered data. Therefore, when a variable contains many extreme outliers, the mean and the median can differ substantially. According to Databricks data analysis materials, this is a fundamental concept when choosing summary statistics for reporting.
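
The effect of outliers on the two statistics can be illustrated with a small sketch using Python's standard `statistics` module. The sample values are hypothetical and chosen only to make the gap visible.

```python
import statistics

# Hypothetical sample with no outliers: values cluster tightly around 13.
typical = [10, 11, 12, 13, 14, 15, 16]

# Same sample with two extreme outliers appended.
with_outliers = typical + [1_000, 2_000]

mean_typical = statistics.mean(typical)
median_typical = statistics.median(typical)
mean_out = statistics.mean(with_outliers)
median_out = statistics.median(with_outliers)

print(mean_typical, median_typical)  # 13 13
print(round(mean_out, 2), median_out)  # 343.44 14
```

Without outliers the mean and median coincide; adding two extreme values moves the median by one unit while the mean grows by more than an order of magnitude.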


Question 5

Which of the following approaches can be used to connect Databricks to Fivetran for data ingestion?



Answer : C


Question 6

Query History provides Databricks SQL users with a lot of benefits. A data analyst has been asked to share all of these benefits with their team as part of a training exercise. One of the benefit statements the analyst provided to their team is incorrect.

Which statement about Query History is incorrect?



Answer : C

Query History in Databricks SQL is intended for reviewing executed queries, understanding their execution plans, and identifying performance issues or errors for debugging purposes. It allows users to analyze query duration, resources used, and potential bottlenecks. However, Query History does not provide any capability to automate the execution of queries across multiple warehouses; automation must be handled through jobs or external orchestration tools, not through the Query History feature itself.


Question 7

What describes the variance of a set of values?



Answer : D

