Microsoft DP-750 Exam Questions

Question 1

You need to deploy Databricks Asset Bundles to a development environment. The solution must support automated and repeatable deployments across environments.

What should you use?

A. the Azure Developer CLI (azd)

B. Git folders

C. the Databricks CLI

D. the Azure Command-Line Interface (CLI)

Answer: C

CORRECT ANSWER: C - The Databricks CLI.

According to Microsoft Learn on Databricks Asset Bundles deployment, the Databricks CLI (version 0.205+) is the official tool for deploying DABs to any environment. The deployment commands 'databricks bundle deploy' and 'databricks bundle run' are part of the CLI and support automated, repeatable, and environment-aware deployments. The CLI reads the databricks.yml configuration and deploys the bundle resources to the specified target environment. Option A (Azure Developer CLI / azd) is for deploying Azure infrastructure and does not natively support Databricks Asset Bundles. Option B (Git folders) is a workspace feature for syncing notebook code from Git but does not handle full DAB deployment. Option D (Azure CLI) manages Azure infrastructure and resources but does not have native support for deploying Databricks Asset Bundles.
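As a rough illustration of how the two CLI steps might be scripted for repeatable deployments, the Python sketch below builds the `databricks bundle validate` and `databricks bundle deploy` commands for a target and only executes them when asked. The target name `dev` and the helper names `bundle_command` and `deploy_bundle` are assumptions made for this example; they are not part of the CLI.

```python
import shutil
import subprocess
from typing import List


def bundle_command(action: str, target: str) -> List[str]:
    """Build a Databricks CLI bundle command for the given target."""
    if action not in {"validate", "deploy"}:
        raise ValueError(f"unsupported bundle action: {action}")
    return ["databricks", "bundle", action, "--target", target]


def deploy_bundle(target: str, dry_run: bool = True) -> List[List[str]]:
    """Validate, then deploy, the bundle found in the current directory.

    With dry_run=True (the default) the commands are only returned,
    so the sketch can be inspected without a workspace or the CLI.
    """
    commands = [bundle_command("validate", target), bundle_command("deploy", target)]
    if not dry_run:
        if shutil.which("databricks") is None:
            raise RuntimeError("Databricks CLI not found on PATH")
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands


if __name__ == "__main__":
    # "dev" is a hypothetical target name defined in databricks.yml.
    for cmd in deploy_bundle("dev"):
        print(" ".join(cmd))
```

Running the same script with a different target name is what makes the deployment repeatable across environments; a CI job would typically call it with `dry_run=False`.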

Question 2

You have an Azure Databricks workspace that is enabled for Unity Catalog and contains a managed Delta table named Table1. Table1 stores customer data.

You need to implement a data retention solution that meets the following requirements:

* Deleted data must be retained for 30 days to support audits.

* Deleted data that is older than 30 days must be removed permanently.

* The solution must minimize administrative effort.

Which two properties should you configure? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

A. delta.timeUntilArchived

B. delta.deletedFileRetentionDuration

C. delta.autoOptimize.autoCompact

D. delta.logRetentionDuration

E. delta.enableDeletionVectors

Answer: B, D

CORRECT ANSWER: B - delta.deletedFileRetentionDuration; D - delta.logRetentionDuration.

According to Microsoft Learn on Delta Lake data retention, two table properties control how long data is retained after deletion. The delta.deletedFileRetentionDuration property controls how long data files that are no longer referenced by the current table version (logically deleted files) are retained before the VACUUM command can remove them; setting it to 30 days keeps deleted data available for the 30-day audit window. The delta.logRetentionDuration property controls how long the Delta transaction log history is kept, which time-travel queries need in order to reach earlier table versions within that window. Both properties must therefore be set to 30 days to meet the full requirement. Option A (delta.timeUntilArchived) relates to archival support for cloud storage lifecycle policies and does not control how long deleted data is retained. Option C (delta.autoOptimize.autoCompact) controls file compaction, not retention. Option E (delta.enableDeletionVectors) enables deletion vectors for faster deletes but does not control data retention duration.
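One way to set both properties together is a single ALTER TABLE statement. The sketch below only builds the SQL text; the helper name `retention_ddl` is hypothetical, and on Databricks the resulting string would presumably be passed to `spark.sql(...)` or run in a SQL editor.

```python
def retention_ddl(table: str, days: int) -> str:
    """Return an ALTER TABLE statement that sets both retention properties."""
    if days <= 0:
        raise ValueError("days must be positive")
    interval = f"interval {days} days"
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.deletedFileRetentionDuration' = '{interval}', "
        f"'delta.logRetentionDuration' = '{interval}')"
    )


if __name__ == "__main__":
    print(retention_ddl("Table1", 30))
```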

Question 3

You have an Azure Databricks workspace that is enabled for Unity Catalog.

You have an Apache Spark Structured Streaming job that writes data to a Delta table.

After the cluster restarts, the streaming job reprocesses previously ingested data.

You need to prevent the streaming job from reprocessing the data after the cluster restarts.

What should you do?

A. Increase the trigger interval of the streaming query.

B. Configure a checkpoint location for the streaming query.

C. Configure a watermark for the streaming query.

D. Enable change data feed (CDF) for the target table.

Answer: B

CORRECT ANSWER: B - Configure a checkpoint location for the streaming query.

According to Microsoft Learn on Apache Spark Structured Streaming, checkpointing is the mechanism that enables fault tolerance and exactly-once processing semantics. The checkpoint stores the streaming query's progress, including the offset of the last successfully processed batch, in a durable storage location (typically ADLS Gen2 or DBFS). When the cluster restarts, the streaming query reads the checkpoint to determine the last committed offset and resumes from that point, preventing reprocessing of already-ingested data. Option A (increase trigger interval) controls how frequently micro-batches run but does not prevent reprocessing on restart. Option C (watermark) handles late-arriving data in event-time processing but does not prevent reprocessing on restart. Option D (enable CDF) tracks changes to a Delta table but does not affect streaming source offset management.
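The resume-from-offset idea can be shown with a toy model in plain Python. This is not Spark code: `run_stream`, the `offset.json` file, and the list-based source and sink are all invented for illustration. In Structured Streaming the corresponding setting is the `checkpointLocation` option on the stream writer.

```python
import json
import os
import tempfile
from typing import List


def run_stream(source: List[str], sink: List[str], checkpoint_dir: str,
               batch_size: int = 2) -> None:
    """Process unread records in micro-batches, committing the offset after each batch."""
    path = os.path.join(checkpoint_dir, "offset.json")
    offset = 0
    if os.path.exists(path):
        # A previous run left a checkpoint: resume where it stopped.
        with open(path) as f:
            offset = json.load(f)["next_offset"]
    while offset < len(source):
        batch = source[offset:offset + batch_size]
        sink.extend(batch)                      # "write" the micro-batch
        offset += len(batch)
        with open(path, "w") as f:              # persist progress durably
            json.dump({"next_offset": offset}, f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ckpt:
        source, sink = ["e1", "e2", "e3"], []
        run_stream(source, sink, ckpt)
        source.append("e4")
        run_stream(source, sink, ckpt)          # simulated restart
        print(sink)
```

If the second call were given a fresh, empty checkpoint directory, it would start again from offset 0 and write `e1` to `e3` a second time, which is the reprocessing symptom described in the question.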

Question 4

You have an Azure Databricks workspace that is enabled for Unity Catalog.

You have a Lakeflow Spark Declarative Pipelines (SDP) pipeline that writes numerical data to a table named Table1 by using a data quality validation rule named rule1.

You need to modify rule1 to meet the following requirements:

* Ensure that amount is always greater than 0.

* Prevent an update to Table1 from being committed when data that violates rule1 is detected.

Which statement should you execute?

A. @dlt.expect_all_or_drop({'rule1': 'amount > 0'})

B. @dlt.expect_or_drop('rule1', 'amount > 0')

C. @dlt.expect_or_fail('rule1', 'amount > 0')

D. @dlt.expect('rule1', 'amount > 0')

Answer: C

CORRECT ANSWER: C - @dlt.expect_or_fail('rule1', 'amount > 0')

According to Microsoft Learn on Lakeflow Spark Declarative Pipelines (formerly Delta Live Tables) data quality, expectations support three behaviors on violation, each with its own decorator. @dlt.expect records violations as metrics but continues processing and writes all records. @dlt.expect_or_drop drops violating records but allows the pipeline to continue. @dlt.expect_or_fail fails the pipeline update and prevents the commit to the table when a violation is detected; this matches the requirement to 'Prevent an update to Table1 from being committed when data that violates rule1 is detected.' @dlt.expect_all_or_drop is the multi-rule form of the drop behavior: it accepts a dictionary of rules and drops violating rows but does not halt the pipeline. Because the requirement explicitly states that the update must not be committed on violation, @dlt.expect_or_fail is the only option that provides this fail-fast, transactional guarantee.
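The difference between the three behaviors can be modeled in a few lines of plain Python. This is a simulation only, not the `dlt` library: `apply_expectation`, `ExpectationFailed`, and the mode names `warn`, `drop`, and `fail` are made up for the example.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


class ExpectationFailed(Exception):
    """Raised when a 'fail' expectation sees a violating row."""


def apply_expectation(rows: List[Row], predicate: Callable[[Row], bool],
                      on_violation: str) -> Tuple[List[Row], int]:
    """Return (rows to write, violation count) for the chosen behavior."""
    if on_violation not in {"warn", "drop", "fail"}:
        raise ValueError(f"unknown behavior: {on_violation}")
    violations = [row for row in rows if not predicate(row)]
    if on_violation == "fail" and violations:
        # Nothing is returned, so nothing can be committed.
        raise ExpectationFailed(f"{len(violations)} row(s) violate the rule")
    if on_violation == "drop":
        kept = [row for row in rows if predicate(row)]
    else:
        kept = list(rows)  # "warn" keeps everything and only counts violations
    return kept, len(violations)


if __name__ == "__main__":
    data = [{"amount": 5.0}, {"amount": -1.0}]
    rule1 = lambda row: row["amount"] > 0
    print(apply_expectation(data, rule1, "warn"))
    print(apply_expectation(data, rule1, "drop"))
    try:
        apply_expectation(data, rule1, "fail")
    except ExpectationFailed as exc:
        print("update aborted:", exc)
```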

Question 5

You use Databricks Asset Bundles to manage two jobs and an app.

You need to deploy the bundle to development and production environments. The solution must meet the following requirements:

* Deploy the app to both environments.

* Deploy only one job to development.

* Minimize administrative effort.

What should you use?

A. a resources node in a databricks.yml file

B. separate databricks.yml files for each environment

C. a variables node in a databricks.yml file

D. a targets node in a databricks.yml file

Answer: D

CORRECT ANSWER: D - A targets node in a databricks.yml file.

According to Microsoft Learn on Databricks Asset Bundles (DAB), the targets node in databricks.yml defines environment-specific configurations (development, staging, production). Each target can carry its own resources mapping, which is combined with the top-level resources node, so a target can add resources or override their settings. The requirements to 'deploy the app to both environments' and 'deploy only one job to development' while minimizing administrative effort are best met by a single databricks.yml with a targets node, for example by declaring the app and the shared job at the top level and the second job only under the production target. Option A (resources node) defines all resources but doesn't handle environment-specific differences on its own. Option B (separate databricks.yml files) requires maintaining multiple files and increases administrative effort. Option C (variables node) handles parameterization but not which resources a target deploys.
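A simplified model of how per-target resources could combine with top-level resources is sketched below. The dictionary mirrors the shape of a databricks.yml file, but the resource names (`app1`, `job1`, `job2`), the target names, and the `effective_resources` helper are assumptions for illustration; the real merge is performed by the Databricks CLI and has more rules than this.

```python
from typing import Dict, List

# Hypothetical bundle layout: the app and job1 are shared,
# job2 is declared only under the production target.
bundle = {
    "resources": {
        "apps": {"app1": {}},
        "jobs": {"job1": {}},
    },
    "targets": {
        "dev": {},
        "prod": {"resources": {"jobs": {"job2": {}}}},
    },
}


def effective_resources(config: dict, target: str) -> Dict[str, List[str]]:
    """List the resource names a target would deploy in this simplified model."""
    merged = {kind: dict(items) for kind, items in config.get("resources", {}).items()}
    target_resources = config["targets"][target].get("resources", {})
    for kind, items in target_resources.items():
        merged.setdefault(kind, {}).update(items)  # target entries add to or override shared ones
    return {kind: sorted(items) for kind, items in merged.items()}


if __name__ == "__main__":
    for name in ("dev", "prod"):
        print(name, effective_resources(bundle, name))
```

Keeping everything in one file means a change to the shared app or job is made once and picked up by both targets.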

Question 6

You have an Azure Databricks workspace named Workspace1 that contains a lakehouse and is enabled for Unity Catalog.

You have a connection to a Microsoft SQL Server database named DB1.

You need to expose the schemas and tables of DB1 to meet the following requirements:

* The schemas and tables can be queried in Databricks.

* The schemas and tables appear alongside other Unity Catalog objects.

* The data is NOT copied into Databricks-managed storage.

Solution: You create a new native catalog in Unity Catalog. Does this meet the goal?

A. Yes

B. No

Answer: B

CORRECT ANSWER: B - No.

According to Microsoft Learn on Unity Catalog catalog types, creating a native catalog creates a standard Unity Catalog-managed catalog that stores metadata and data within Databricks-managed storage. A native catalog does NOT federate or expose external database objects from SQL Server. The requirements specify that 'the data is NOT copied into Databricks-managed storage' and that DB1's schemas/tables appear alongside Unity Catalog objects; this requires a foreign catalog, not a native catalog. A foreign catalog uses Lakehouse Federation to create a read-only, virtual representation of an external database within Unity Catalog without moving or copying the data. Therefore, creating a native catalog does not meet the goal, as it has no connection to the SQL Server database DB1.
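For contrast, the statement that would create a foreign catalog over the existing connection looks roughly like the output of the sketch below. The catalog name `db1_catalog`, the connection name `sqlserver_conn`, and the helper `foreign_catalog_ddl` are hypothetical; only the database name DB1 comes from the question.

```python
def foreign_catalog_ddl(catalog: str, connection: str, database: str) -> str:
    """Return a CREATE FOREIGN CATALOG statement for a Lakehouse Federation connection."""
    return (
        f"CREATE FOREIGN CATALOG IF NOT EXISTS {catalog} "
        f"USING CONNECTION {connection} "
        f"OPTIONS (database '{database}')"
    )


if __name__ == "__main__":
    # Catalog and connection names are placeholders for this example.
    print(foreign_catalog_ddl("db1_catalog", "sqlserver_conn", "DB1"))
```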

Question 7

You need to configure compute for the ingestion of telemetry data. The solution must meet the data ingestion and processing requirements.

What should you do?

A. Enable Photon acceleration for a job compute cluster.

B. Move the ingestion pipelines to shared compute.

C. Increase an all-purpose cluster to a larger fixed node type.

D. Disable autoscaling for a job compute cluster.

Answer: A

CORRECT ANSWER: A - Enable Photon acceleration for a job compute cluster.

According to Microsoft Learn and the Azure Databricks documentation, Photon is a high-performance vectorized query engine written in C++ that accelerates Apache Spark workloads, especially ingestion and SQL operations. The Contoso technical requirement states: 'Ensure that production ingestion workloads run on compute clusters that can scale automatically during telemetry spikes' and 'Provide fast and consistent performance for BI workloads.' Photon on a job compute cluster directly addresses both speed and consistency for ingestion pipelines. Option B is incorrect because moving ingestion to shared compute would violate the requirement to isolate production from development. Option C is incorrect because increasing a fixed-node all-purpose cluster does not provide autoscaling. Option D is incorrect because disabling autoscaling would prevent the cluster from handling bursty telemetry workloads, directly contradicting the stated requirements.
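As a sketch of what such a job cluster definition might contain, the helper below assembles a cluster specification with Photon enabled and autoscaling bounds. The runtime version, node type, and worker counts are placeholder values, and `photon_job_cluster` is a hypothetical helper; the field names follow the cluster specification used by the Databricks Jobs and Clusters APIs.

```python
import json


def photon_job_cluster(spark_version: str, node_type_id: str,
                       min_workers: int, max_workers: int) -> dict:
    """Build a job cluster spec with Photon and autoscaling enabled."""
    if not 0 < min_workers <= max_workers:
        raise ValueError("need 0 < min_workers <= max_workers")
    return {
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "runtime_engine": "PHOTON",  # turn on the Photon engine
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
    }


if __name__ == "__main__":
    # Version and node type are illustrative placeholders only.
    spec = photon_job_cluster("15.4.x-scala2.12", "Standard_D4ds_v5", 2, 8)
    print(json.dumps(spec, indent=2))
```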

Microsoft Implementing Data Engineering Solutions Using Azure Databricks DP-750 Exam Questions