Databricks Certified Data Engineer Associate Exam Practice Test

Total 91 questions
Question 1

A data analyst has developed a query that runs against a Delta table. They want help from the data engineering team to implement a series of tests to ensure the data returned by the query is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which of the following operations could the data engineering team use to run the query and operate with the results in PySpark?
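Although the answer options are not reproduced here, the operation the question is pointing at is one that runs a SQL query from PySpark and hands the results back as a DataFrame; spark.sql() does exactly that. A minimal sketch, with a hypothetical table name `sales` and column `revenue`:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Run the analyst's SQL query from Python; the result is a DataFrame
# that the data engineering team can test with ordinary Python code.
result_df = spark.sql("SELECT * FROM sales WHERE revenue IS NOT NULL")

# Example of a cleanliness test expressed in Python rather than SQL.
assert result_df.filter("revenue < 0").count() == 0, "Found negative revenue"
```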



Question 2

Which of the following statements regarding the relationship between Silver tables and Bronze tables is always true?
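For context on the Bronze/Silver relationship the question probes: in the medallion architecture, Silver tables are built by cleaning and refining Bronze data. A minimal sketch of a typical Bronze-to-Silver step, using hypothetical table names orders_bronze and orders_silver:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Read the raw Bronze table (hypothetical name).
bronze_df = spark.read.table("orders_bronze")

silver_df = (
    bronze_df
    .dropDuplicates(["order_id"])                   # remove re-ingested duplicates
    .filter(F.col("order_ts").isNotNull())          # enforce a basic quality rule
    .withColumn("order_date", F.to_date("order_ts"))  # add a refined column
)

# Write the cleaned view as the Silver table.
silver_df.write.format("delta").mode("overwrite").saveAsTable("orders_silver")
```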



Question 3

Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?



Answer : C

Option C is the correct answer because Parquet files embed a well-defined schema within the data itself. The column names and data types are therefore detected and preserved automatically when an external table is created from Parquet files, and the data can be queried immediately with SQL and other structured query languages. CSV files, by contrast, carry no embedded schema: the schema must be specified explicitly or inferred from the data when creating an external table from them, which can introduce errors or inconsistencies in data types and column names and adds processing time and complexity.
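To make the difference concrete, here is a hedged sketch (paths and table names are placeholders) contrasting a CTAS over Parquet, which needs no schema options, with the extra configuration a CSV source requires:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Parquet embeds its schema, so a CTAS can read the files directly:
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_parquet
    AS SELECT * FROM parquet.`/mnt/raw/sales/`
""")

# CSV carries no schema; header and type handling must be configured,
# e.g. via explicit read options and a temporary view:
csv_df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/mnt/raw/sales_csv/"))
csv_df.createOrReplaceTempView("sales_csv_staged")
spark.sql("CREATE TABLE IF NOT EXISTS sales_csv AS SELECT * FROM sales_csv_staged")
```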


Question 4

Which of the following approaches should be used to send the Databricks Job owner an email in the case that the Job fails?



Answer : B

To send the Databricks Job owner an email in the case that the Job fails, the best approach is to set up an Alert in the Job page. There, the Job owner can configure the email address and the notification type for the Job failure event. The other options are either not feasible, not reliable, or not relevant for this task. Manually programming an alert system in each cell of the Notebook is tedious and error-prone. Setting up an Alert in the Notebook is not possible, as Alerts are only available for Jobs and Clusters. There is a way to notify the Job owner in the case of Job failure, so option D is incorrect. MLflow Model Registry Webhooks are used for model lifecycle events, not Job events, so option E is not applicable.

References:

Add email and system notifications for job events

Alerts

MLflow Model Registry Webhooks
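The same failure notification can also be attached programmatically. A hedged sketch using the Databricks Jobs API 2.1 update endpoint, where the workspace host, token, job ID, and email address are all placeholders:

```python
import requests

# Placeholders: substitute your workspace URL, a personal access token,
# and the numeric ID of an existing job.
host = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"

payload = {
    "job_id": 123,
    "new_settings": {
        # on_failure emails fire whenever a run of the job fails.
        "email_notifications": {"on_failure": ["owner@example.com"]}
    },
}

resp = requests.post(
    f"{host}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()
```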


Question 5

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to a data analytics dashboard for a retail use case. The job has a Databricks SQL query that returns the number of store-level records where sales is equal to zero. The data engineer wants their entire team to be notified via a messaging webhook whenever this value is greater than 0.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of stores with $0 in sales is greater than zero?



Answer : D

A webhook alert destination is a notification destination that allows Databricks to send HTTP POST requests to a third-party endpoint when an alert is triggered. This lets the data engineer integrate Databricks alerts with their preferred messaging or collaboration platform, such as Slack, Microsoft Teams, or PagerDuty.

To set up a webhook alert destination, the data engineer creates and configures a webhook connector in their messaging platform, then adds the webhook URL to the Databricks notification destination. After that, the data engineer can create an alert for their Databricks SQL query and select the webhook alert destination as the notification destination. The alert can be configured with a custom condition, such as the number of stores with $0 in sales being greater than zero, and a custom message template, such as "Alert: {number_of_stores} stores have $0 in sales". The alert can also be configured with a recurrence interval, such as every hour, to check the query result periodically. When the alert condition is met, the data engineer and their team receive a notification via the messaging webhook, with the custom message and a link to the Databricks SQL query.

The other options are either not suitable for sending notifications via a messaging webhook (A, B, E) or not suitable for sending recurring notifications (C).

References: Databricks Documentation - Manage notification destinations; Databricks Documentation - Create alerts for Databricks SQL queries; Databricks Documentation - Configure alert conditions and messages.
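For illustration, the check the alert automates looks roughly like the following. The table name, query, and webhook URL are placeholders, and in practice Databricks SQL runs this check and POST on the configured schedule rather than user code:

```python
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Count store-level records where sales is exactly zero (placeholder table).
zero_sales = spark.sql(
    "SELECT count(*) AS n FROM store_sales WHERE sales = 0"
).first()["n"]

# When the count exceeds zero, POST a message to the team's webhook
# (placeholder URL), mirroring what the alert destination does.
if zero_sales > 0:
    requests.post(
        "https://hooks.example.com/services/TEAM/CHANNEL",
        json={"text": f"Alert: {zero_sales} stores have $0 in sales"},
    )
```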


Question 6
Question 7

A data engineer is working with two tables, each displayed below in its entirety. The data engineer runs the following query to join these tables together. Which of the following will be returned by the above query?



Answer : A

Option A is the correct answer because it shows the result of an INNER JOIN between the two tables. An INNER JOIN returns only the rows that have matching values in both tables based on the join condition. In this case, the join condition is ON a.customer_id = c.customer_id, meaning that only rows with the same customer ID in both tables are included in the output. The output has four columns (customer_id, name, account_id, and overdraft_amt) and four rows, one for each customer who has an account in the account table.
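Since the original tables and query are not reproduced here, the following sketch recreates the INNER JOIN semantics with hypothetical toy data matching the column names in the explanation:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy data; the real tables from the question are not shown in this excerpt.
customers = spark.createDataFrame(
    [(1, "Ava"), (2, "Ben"), (3, "Cal")],
    ["customer_id", "name"],
)
accounts = spark.createDataFrame(
    [(100, 1, 0), (101, 2, 250)],
    ["account_id", "customer_id", "overdraft_amt"],
)

# Only customers 1 and 2 appear in the result:
# customer 3 has no matching account row, so the INNER JOIN drops it.
joined = customers.join(accounts, on="customer_id", how="inner")
joined.show()
```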

