Microsoft DP-203 Data Engineering on Microsoft Azure Exam Practice Test

Page: 1 / 14
Total 316 questions
Question 1

You have an Azure data factor/ connected to a Git repository that contains the following branches:

* mam: Collaboration branch

* abc: Feature branch

* xyz: Feature branch

You save charges to a pipeline in the xyz branch.

You need to publish the changes to the live service

What should you do first?



Answer : D


Question 2

You have an Azure Synapse Analytics dedicated SQL pool named Pool1.

Pool! contains two tables named SalesFact_Stagmg and SalesFact. Both tables have a matching number of partitions, all of which contain data.

You need to load data from SalesFact_Staging to SalesFact by switching a partition.

What should you specify when running the alter TABLE statement?



Answer : B


Question 3

You have an Azure data factory that connects to a Microsoft Purview account. The data 'factory is registered in Microsoft Purview.

You update a Data Factory pipeline.

You need to ensure that the updated lineage is available in Microsoft Purview.

What should you do first?



Answer : D


Question 4

You have two Azure Blob Storage accounts named account1 and account2?

You plan to create an Azure Data Factory pipeline that will use scheduled intervals to replicate newly created or modified blobs from account1 to account?

You need to recommend a solution to implement the pipeline. The solution must meet the following requirements:

* Ensure that the pipeline only copies blobs that were created of modified since the most recent replication event.

* Minimize the effort to create the pipeline.

What should you recommend?



Answer : A


Question 5

You have an Azure data factory named ADM that contains a pipeline named Pipelwe1

Pipeline! must execute every 30 minutes with a 15-minute offset.

Vou need to create a trigger for Pipehne1. The trigger must meet the following requirements:

* Backfill data from the beginning of the day to the current time.

* If Pipeline1 fairs, ensure that the pipeline can re-execute within the same 30-mmute period.

* Ensure that only one concurrent pipeline execution can occur.

* Minimize de4velopment and configuration effort

Which type of trigger should you create?



Answer : D


Question 6

You are designing an Azure Data Lake Storage solution that will transform raw JSON files for use in an analytical workload.

You need to recommend a format for the transformed files. The solution must meet the following requirements:

Contain information about the data types of each column in the files.

Support querying a subset of columns in the files.

Support read-heavy analytical workloads.

Minimize the file size.

What should you recommend?



Answer : D

Parquet, an open-source file format for Hadoop, stores nested data structures in a flat columnar format.

Compared to a traditional approach where data is stored in a row-oriented approach, Parquet file format is more efficient in terms of storage and performance.

It is especially good for queries that read particular columns from a ''wide'' (with many columns) table since only needed columns are read, and IO is minimized.


Question 7

You are designing a dimension table in an Azure Synapse Analytics dedicated SQL pool.

You need to create a surrogate key for the table. The solution must provide the fastest query performance.

What should you use for the surrogate key?



Answer : C

Use IDENTITY to create surrogate keys using dedicated SQL pool in AzureSynapse Analytics.

Note: A surrogate key on a table is a column with a unique identifier for each row. The key is not generated from the table data. Data modelers like to create surrogate keys on their tables when they design data warehouse models. You can use the IDENTITY property to achieve this goal simply and effectively without affecting load performance.


Page:    1 / 14   
Total 316 questions