Databricks-Certified-Data-Analyst-Associate Exam Practice Test Instant Access

Question 1

What describes the variance of a set of values?

A. Variance is a measure of how far a single observed value is from a set of values.

B. Variance is a measure of how far an observed value is from the variable's maximum or minimum value.

C. Variance is a measure of central tendency of a set of values.

D. Variance is a measure of how far a set of values is spread out from the set's central value.
Variance is a statistical measure that quantifies the dispersion or spread of a set of values around their mean (central value). It is calculated by taking the average of the squared differences between each value and the mean of the dataset. A higher variance indicates that the data points are more spread out from the mean, while a lower variance suggests that they are closer to the mean. This measure is fundamental in statistics to understand the degree of variability within a dataset.
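
As a rough illustration of the calculation described above, the following sketch computes variance over a small inline data set. The values and the column name x are made up for this example. It assumes the Databricks SQL aggregate functions var_pop and var_samp; var_pop matches the "average of the squared differences" definition, while var_samp divides by n - 1 instead of n.

```sql
-- Hypothetical sample values (not from the exam); their mean is 5.
SELECT
  avg(x)      AS mean_x,
  var_pop(x)  AS population_variance,  -- sum of squared deviations / n
  var_samp(x) AS sample_variance       -- sum of squared deviations / (n - 1)
FROM VALUES (2), (4), (4), (4), (5), (5), (7), (9) AS t(x);
```

For these eight values the squared deviations from the mean sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7, roughly 4.57.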

Answer : D

Question 2

Where in the Databricks SQL workspace can a data analyst configure a refresh schedule for a query when the query is not attached to a dashboard or alert?

A. Data Explorer

B. The Visualization editor

C. The Query Editor

D. The Dashboard Editor
In Databricks SQL, to configure a refresh schedule for a query that is not attached to a dashboard or alert, a data analyst should use the Query Editor. Within the Query Editor, there is an option to set up scheduled executions for queries. This feature enables the query to run at specified intervals, ensuring that the results are updated regularly. By scheduling queries in this manner, analysts can automate data refreshes and maintain up-to-date query results without manual intervention.

Answer : C

Question 3

A data analyst needs to use the Databricks Lakehouse Platform to quickly create SQL queries and data visualizations. It is a requirement that the compute resources in the platform can be made serverless, and it is expected that data visualizations can be placed within a dashboard.

Which of the following Databricks Lakehouse Platform services/capabilities meets all of these requirements?

A. Delta Lake

B. Databricks Notebooks

C. Tableau

D. Databricks Machine Learning

E. Databricks SQL
Databricks SQL is a serverless data warehouse on the Lakehouse that lets you run all of your SQL and BI applications at scale with your tools of choice, at a fraction of the cost of traditional cloud data warehouses [1]. Databricks SQL allows you to create SQL queries and data visualizations using the SQL Analytics UI or the Databricks SQL CLI [2]. You can also place your data visualizations within a dashboard and share it with other users in your organization [3]. Databricks SQL is powered by Delta Lake, which provides reliability, performance, and governance for your data lake [4].
References:
[1] Databricks SQL
[2] Query data using SQL Analytics
[3] Visualizations in Databricks notebooks
[4] Delta Lake

Answer : E

Question 4

A data scientist has asked a data analyst to create histograms for every continuous variable in a data set. The data analyst needs to identify which columns are continuous in the data set.

What describes a continuous variable?

A. A quantitative variable that never stops changing

B. A quantitative variable that can take on a finite or countably infinite set of values

C. A quantitative variable that can take on an uncountable set of values

D. A categorical variable in which the number of categories continues to increase over time
A continuous variable is a type of quantitative variable that can assume an infinite number of values within a given range. This means that between any two possible values, there can be an infinite number of other values. For example, variables such as height, weight, and temperature are continuous because they can be measured to any level of precision, and there are no gaps between possible values. This is in contrast to discrete variables, which can only take on specific, distinct values (e.g., the number of children in a family). Understanding the nature of continuous variables is crucial for data analysts, especially when selecting appropriate statistical methods and visualizations, such as histograms, to accurately represent and analyze the data.

Answer : C

Question 5

Which of the following layers of the medallion architecture is most commonly used by data analysts?

A. None of these layers are used by data analysts

B. Gold

C. All of these layers are used equally by data analysts

D. Silver

E. Bronze
The gold layer of the medallion architecture contains data that is highly refined and aggregated, and powers analytics, machine learning, and production applications. Data analysts typically use the gold layer to access data that has been transformed into knowledge, rather than just information. The gold layer represents the final stage of data quality and optimization in the lakehouse. Reference: What is the medallion lakehouse architecture?

Answer : B

Question 6

Which statement about subqueries is correct?

A. Subqueries are not available in Databricks SQL

B. Subqueries can be used like other user-defined functions to transform data into different data types.

C. Subqueries can retrieve data without requiring the creation of a table or view.

D. Subqueries can be used like other built-in functions to transform data into different data types.
In Databricks SQL, a subquery is a nested query within a larger SQL query that allows for the retrieval of data without the necessity of creating a table or view. This is particularly useful for simplifying complex queries by breaking them down into more manageable parts. Subqueries can be employed in various clauses such as SELECT, FROM, and WHERE to perform operations like filtering, transforming, and aggregating data on-the-fly. This flexibility enhances query efficiency and readability without the overhead of persisting intermediate results as separate tables or views.
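
To make the point above concrete, here is a minimal sketch of a subquery in the FROM clause. The orders data, its column names, and the threshold of 20 are invented for illustration and are supplied inline through VALUES, so nothing is persisted as a table or view.

```sql
-- Hypothetical inline data: (customer_id, amount). No table or view is created.
SELECT customer_id, total_amount
FROM (
  -- Subquery: aggregate order amounts per customer on the fly.
  SELECT customer_id, sum(amount) AS total_amount
  FROM VALUES (1, 10.0), (1, 30.0), (2, 5.0) AS orders(customer_id, amount)
  GROUP BY customer_id
) AS customer_totals
WHERE total_amount > 20;
```

With this sample data, customer 1 has a total of 40 and customer 2 a total of 5, so only customer 1 passes the outer filter.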

Answer : C

Question 7

A data analyst is attempting to drop a table my_table. The analyst wants to delete all table metadata and data.

They run the following command:

DROP TABLE IF EXISTS my_table;

While the object no longer appears when they run SHOW TABLES, the data files still exist.

Which of the following describes why the data files still exist and the metadata files were deleted?

A. The table's data was larger than 10 GB

B. The table did not have a location

C. The table was external

D. The table's data was smaller than 10 GB

E. The table was managed
An external table is a table that is registered in the metastore but whose data files are stored at a location the user specifies, such as a path in S3, ADLS, or GCS. When an external table is dropped, only the metadata is deleted from the metastore; the data files are not affected. This is different from a managed table, whose storage is managed by Databricks and whose data files are deleted when the table is dropped. To delete the data files of an external table, the analyst must remove them from the storage system directly. Reference: DROP TABLE, Drop Delta table features, Best practices for dropping a managed Delta Lake table
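
The contrast between the two table types can be sketched as follows. The table names managed_demo and external_demo and the storage path are placeholders made up for this example, and exactly when managed data files are removed may depend on the metastore configuration. DESCRIBE TABLE EXTENDED is one way to check whether an existing table is managed or external before dropping it.

```sql
-- Check the table type first: the output includes a Type row (MANAGED or EXTERNAL).
DESCRIBE TABLE EXTENDED my_table;

-- Managed table: no LOCATION clause, so Databricks manages where the data is stored.
CREATE TABLE IF NOT EXISTS managed_demo (id INT);

-- External table: LOCATION points at storage the user controls.
-- The path below is a placeholder, not a real location.
CREATE TABLE IF NOT EXISTS external_demo (id INT)
LOCATION 's3://<bucket>/<path>/external_demo';

DROP TABLE IF EXISTS managed_demo;   -- removes the metadata and the managed data files
DROP TABLE IF EXISTS external_demo;  -- removes the metadata only; files remain at LOCATION
```

In the scenario from the question, my_table behaves like external_demo: the table disappears from SHOW TABLES while its files stay in place.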

Answer : C
