
Why not try Lead4Pass DP-203 exam dumps | 100% pass exam

Lead4Pass is a leader in exam certification. With many years of exam experience, we have helped many people realize their dreams and successfully pass their exams!
The Microsoft DP-203 exam, "Data Engineering on Microsoft Azure", is just one of the Microsoft exams we cover.
We offer a full range of exam content and products, and you can also recommend other products to friends to help them pass their exams.
Microsoft DP-203 exam dumps: https://www.lead4pass.com/dp-203.html (PDF + VCE). All exam questions are the latest updates and guaranteed to be valid.
With a 99.5% exam pass rate, you can rest assured that choosing us will help you obtain your certificate.
All content on this site is shared for free by Lead4Pass. You can practice the test online.

Microsoft DP-203 exam PDF online download

The Microsoft DP-203 exam PDF shared on this site is part of the Lead4Pass DP-203 exam questions and answers. For the complete exam PDF, please visit Lead4Pass

Online practice test: the latest updated Microsoft DP-203 exam questions

QUESTION 1
HOTSPOT
Which Azure Data Factory components should you recommend using together to import the daily inventory data from
the SQL server to Azure Data Lake Storage? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

microsoft dp-203 exam questions q1

Correct Answer:

microsoft dp-203 exam questions q1-1

Explanation:
Box 1: Self-hosted integration runtime
A self-hosted IR is capable of running copy activities between cloud data stores and a data store in a private network.
Box 2: Schedule trigger
Schedule every 8 hours
Box 3: Copy activity
Scenario:
Customer data, including name, contact information, and loyalty number, comes from Salesforce and can be imported
into Azure once every eight hours. Row modified dates are not trusted in the source table.
Product data, including product ID, name, and category, comes from Salesforce and can be imported into Azure once
every eight hours. Row modified dates are not trusted in the source table.

 

QUESTION 2
HOTSPOT
You use Azure Data Factory to prepare data to be queried by Azure Synapse Analytics serverless SQL pools.
Files are initially ingested into an Azure Data Lake Storage Gen2 account as 10 small JSON files. Each file contains the
same data attributes and data from a subsidiary of your company.
You need to move the files to a different folder and transform the data to meet the following requirements:
Provide the fastest possible query times.
Automatically infer the schema from the underlying files.
How should you configure the Data Factory copy activity? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

microsoft dp-203 exam questions q2

Correct Answer:

microsoft dp-203 exam questions q2-1

Box 1: Preserve hierarchy
Compared to the flat namespace on Blob storage, the hierarchical namespace greatly improves the performance of
directory management operations, which improves overall job performance.
Box 2: Parquet
Azure Data Factory parquet format is supported for Azure Data Lake Storage Gen2.
Parquet supports the schema property.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction
https://docs.microsoft.com/en-us/azure/data-factory/format-parquet

 

QUESTION 3
HOTSPOT
You need to design the partitions for the product sales transactions. The solution must meet the sales transaction
dataset requirements.
What should you include in the solution? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

microsoft dp-203 exam questions q3

Correct Answer:

microsoft dp-203 exam questions q3-1

Box 1: Sales date
Scenario: Contoso requirements for data integration include:
Partition data that contains sales transaction records. Partitions must be designed to provide efficient loads by month.
Boundary values must belong to the partition on the right.
Box 2: An Azure Synapse Analytics Dedicated SQL pool
Scenario: Contoso requirements for data integration include:
Ensure that data storage costs and performance are predictable.
The size of a dedicated SQL pool (formerly SQL DW) is determined by Data Warehousing Units (DWU).
Dedicated SQL pool (formerly SQL DW) stores data in relational tables with columnar storage. This format significantly
reduces the data storage costs, and improves query performance.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-overview-what-is
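For readers who want to see what such a design looks like, here is a minimal T-SQL sketch of a monthly RANGE RIGHT partitioned fact table in a dedicated SQL pool. The table, columns, and boundary values are illustrative assumptions, not taken from the case study; with RANGE RIGHT, each boundary value (the first day of a month) belongs to the partition on its right, which matches the requirement.

-- Illustrative sketch only; names and boundary values are assumed.
CREATE TABLE dbo.FactSalesTransactions
(
    SalesTransactionId BIGINT        NOT NULL,
    StoreId            INT           NOT NULL,
    ProductId          INT           NOT NULL,
    SalesDate          DATE          NOT NULL,
    SalesAmount        DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(SalesTransactionId),
    CLUSTERED COLUMNSTORE INDEX,
    -- RANGE RIGHT: the boundary value belongs to the partition on its right,
    -- so each partition holds one calendar month and can be loaded or switched per month.
    PARTITION ( SalesDate RANGE RIGHT FOR VALUES ('2021-01-01', '2021-02-01', '2021-03-01') )
);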

 

QUESTION 4
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following
three workloads:
A workload for data engineers who will use Python and SQL.
A workload for jobs that will run notebooks that use Python, Scala, and SQL.
A workload that data scientists will use to perform ad hoc analysis in Scala and R.
The enterprise architecture team at your company identifies the following standards for Databricks environments:
The data engineers must share a cluster.
The job cluster will be managed by using a request process whereby data scientists and data engineers provide
packaged notebooks for deployment to the cluster.
All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity.
Currently, there are three data scientists.
You need to create the Databricks clusters for the workloads.
Solution: You create a High Concurrency cluster for each data scientist, a High Concurrency cluster for the data
engineers, and a Standard cluster for the jobs.
Does this meet the goal?
A. Yes
B. No
Correct Answer: B
Each data scientist should be assigned a Standard cluster that terminates automatically, not a High Concurrency cluster.
Standard clusters are recommended for a single user and can run workloads developed in any language: Python, R, Scala, and SQL.
A High Concurrency cluster is a managed cloud resource. The key benefits of High Concurrency clusters are that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum query latencies; however, High Concurrency clusters do not support Scala, which the data scientists require.
Reference:
https://docs.azuredatabricks.net/clusters/configure.html

 

QUESTION 5
You have an Azure Synapse Analytics dedicated SQL pool that contains a large fact table. The table contains 50
columns and 5 billion rows and is a heap.
Most queries against the table aggregate values from approximately 100 million rows and return only two columns.
You discover that the queries against the fact table are very slow.
Which type of index should you add to provide the fastest query times?
A. nonclustered columnstore
B. clustered columnstore
C. nonclustered
D. clustered
Correct Answer: B
Clustered columnstore indexes are one of the most efficient ways you can store your data in dedicated SQL pool.
Columnstore tables won't benefit a query unless the table has more than 60 million rows.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool
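As a rough illustration of the fix, the heap can be rebuilt with a clustered columnstore index in a single statement. The table and index names below are hypothetical, since the question does not name the fact table.

-- Hypothetical names; converts the existing heap to columnstore storage.
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
    ON dbo.FactSales;
-- Columnstore compression and segment elimination let aggregations that read only
-- two of the fifty columns avoid scanning the full 5-billion-row heap.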

 

QUESTION 6
You have a table in an Azure Synapse Analytics dedicated SQL pool. The table was created by using the following
Transact-SQL statement.

microsoft dp-203 exam questions q6

You need to alter the table to meet the following requirements:
Ensure that users can identify the current manager of employees.
Support creating an employee reporting hierarchy for your entire company.
Provide fast lookup of the managers' attributes such as name and job title.
Which column should you add to the table?
A. [ManagerEmployeeID] [int] NULL
B. [ManagerEmployeeID] [smallint] NULL
C. [ManagerEmployeeKey] [int] NULL
D. [ManagerName] [varchar](200) NULL
Correct Answer: A
Use the same definition as the EmployeeID column.
Reference: https://docs.microsoft.com/en-us/analysis-services/tabular-models/hierarchies-ssas-tabular
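A minimal sketch of the change, assuming the table in the exhibit is named dbo.DimEmployee (the exhibit is not reproduced here):

-- Hypothetical table name; the new column mirrors the EmployeeID data type
-- so each row can reference its manager's row in the same table.
ALTER TABLE dbo.DimEmployee
    ADD [ManagerEmployeeID] [int] NULL;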

 

QUESTION 7
HOTSPOT
You build an Azure Data Factory pipeline to move data from an Azure Data Lake Storage Gen2 container to a database
in an Azure Synapse Analytics dedicated SQL pool.
Data in the container is stored in the following folder structure.
/in/{YYYY}/{MM}/{DD}/{HH}/{mm}
The earliest folder is /in/2021/01/01/00/00. The latest folder is /in/2021/01/15/01/45.
You need to configure a pipeline trigger to meet the following requirements:
Existing data must be loaded.
Data must be loaded every 30 minutes.
Late-arriving data of up to two minutes must be included in the load for the time at which the data should have arrived.
How should you configure the pipeline trigger? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

microsoft dp-203 exam questions q7

Correct Answer:

microsoft dp-203 exam questions q7-1

Box 1: Tumbling window
To be able to use the Delay parameter we select Tumbling window.
Box 2:
Recurrence: 30 minutes, not 32 minutes
Delay: 2 minutes.
The amount of time to delay the start of data processing for the window. The pipeline run is started after the expected
execution time plus the amount of delay. The delay defines how long the trigger waits past the due time before
triggering a
new run. The delay doesn't alter the window startTime.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-tumbling-window-trigger

 

QUESTION 8
You have an Azure Synapse Analytics dedicated SQL pool.
You need to ensure that data in the pool is encrypted at rest. The solution must NOT require modifying applications that
query the data.
What should you do?
A. Enable encryption at rest for the Azure Data Lake Storage Gen2 account.
B. Enable Transparent Data Encryption (TDE) for the pool.
C. Use a customer-managed key to enable double encryption for the Azure Synapse workspace.
D. Create an Azure key vault in the Azure subscription and grant access to the pool.
Correct Answer: B
Transparent Data Encryption (TDE) helps protect against the threat of malicious activity by encrypting and decrypting
your data at rest. When you encrypt your database, associated backups and transaction log files are encrypted without
requiring any changes to your applications. TDE encrypts the storage of an entire database by using a symmetric key
called the database encryption key.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-overview-manage-security
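A minimal sketch, assuming a dedicated SQL pool named MyDedicatedSqlPool: TDE can be switched on with T-SQL from the logical server's master database, and the encryption state can be checked in sys.databases.

-- Placeholder pool name; run against the master database of the logical server.
ALTER DATABASE [MyDedicatedSqlPool] SET ENCRYPTION ON;

-- Verify (is_encrypted = 1 once encryption completes).
SELECT [name], is_encrypted
FROM sys.databases
WHERE [name] = 'MyDedicatedSqlPool';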

 

QUESTION 9
You need to implement the surrogate key for the retail store table. The solution must meet the sales transaction dataset
requirements. What should you create?
A. a table that has an IDENTITY property
B. a system-versioned temporal table
C. a user-defined SEQUENCE object
D. a table that has a FOREIGN KEY constraint
Correct Answer: A
Scenario: Implement a surrogate key to account for changes to the retail store addresses.
A surrogate key on a table is a column with a unique identifier for each row. The key is not generated from the table
data. Data modelers like to create surrogate keys on their tables when they design data warehouse models. You can
use the IDENTITY property to achieve this goal simply and effectively without affecting load performance.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-identity
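A minimal sketch of a retail store dimension that uses an IDENTITY surrogate key; every name below is an assumption rather than the case study's actual schema.

-- Hypothetical dimension table for the retail stores.
CREATE TABLE dbo.DimRetailStore
(
    RetailStoreSK  INT IDENTITY(1,1) NOT NULL,  -- surrogate key generated on load
    RetailStoreId  INT               NOT NULL,  -- business key from the source system
    StoreAddress   NVARCHAR(200)     NOT NULL   -- address changes can create new rows without touching the key
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED COLUMNSTORE INDEX
);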

 

QUESTION 10
HOTSPOT
You are creating dimensions for a data warehouse in an Azure Synapse Analytics dedicated SQL pool.
You create a table by using the Transact-SQL statement shown in the following exhibit.

microsoft dp-203 exam questions q10

Use the drop-down menus to select the answer choice that completes each statement based on the information
presented in the graphic. NOTE: Each correct selection is worth one point.
Hot Area:

microsoft dp-203 exam questions q10-1

Correct Answer:

microsoft dp-203 exam questions q10-2

Box 1: Type 2
A Type 2 SCD supports versioning of dimension members. Often the source system doesn't store versions, so the
data warehouse load process detects and manages changes in a dimension table. In this case, the dimension table
must use
a surrogate key to provide a unique reference to a version of the dimension member. It also includes columns that
define the date range validity of the version (for example, StartDate and EndDate) and possibly a flag column (for
example,
IsCurrent) to easily filter by current dimension members.
Incorrect Answers:
A Type 1 SCD always reflects the latest values, and when changes in source data are detected, the dimension table
data is overwritten.
Box 2: a business key
A business key or natural key is an index which identifies uniqueness of a row based on columns that exist naturally in a
table according to business rules. For example business keys are customer code in a customer table, composite of
sales
order header number and sales order item line number within a sales order details table.
Reference:
https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types
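To make the Type 2 pattern concrete, here is a hedged sketch of what such a dimension table might look like; the table and column names are assumptions, not the exhibit's definition.

-- Hypothetical Type 2 slowly changing dimension.
CREATE TABLE dbo.DimCustomer
(
    CustomerSK    INT IDENTITY(1,1) NOT NULL,  -- surrogate key referenced by fact tables
    CustomerCode  NVARCHAR(20)      NOT NULL,  -- business (natural) key from the source system
    CustomerName  NVARCHAR(100)     NOT NULL,
    StartDate     DATE              NOT NULL,  -- start of this version's validity
    EndDate       DATE              NULL,      -- NULL (or a far-future date) for the current version
    IsCurrent     BIT               NOT NULL   -- quick filter for current dimension members
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED COLUMNSTORE INDEX
);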

 

QUESTION 11
You have an Azure Databricks workspace named workspace1 in the Standard pricing tier.
You need to configure workspace1 to support autoscaling all-purpose clusters. The solution must meet the following
requirements:
Automatically scale down workers when the cluster is underutilized for three minutes.
Minimize the time it takes to scale to the maximum number of workers.
Minimize costs.
What should you do first?
A. Enable container services for workspace1.
B. Upgrade workspace1 to the Premium pricing tier.
C. Set Cluster Mode to High Concurrency.
D. Create a cluster policy in workspace1.
Correct Answer: B
For clusters running Databricks Runtime 6.4 and above, optimized autoscaling is used by all-purpose clusters in the
Premium plan.
Optimized autoscaling:
Scales up from min to max in 2 steps.
Can scale down even if the cluster is not idle by looking at shuffle file state.
Scales down based on a percentage of current nodes.
On job clusters, scales down if the cluster is underutilized over the last 40 seconds.
On all-purpose clusters, scales down if the cluster is underutilized over the last 150 seconds.
The spark.databricks.aggressiveWindowDownS Spark configuration property specifies in seconds how often a cluster
makes down-scaling decisions. Increasing the value causes a cluster to scale down more slowly. The maximum value
is
600.
Note: Standard autoscaling
Starts with adding 8 nodes. Thereafter, scales up exponentially, but can take many steps to reach the max. You can
customize the first step by setting the spark.databricks.autoscaling.standardFirstStepUp Spark configuration property.
Scales down only when the cluster is completely idle and it has been underutilized for the last 10 minutes.
Scales down exponentially, starting with 1 node.
Reference:
https://docs.databricks.com/clusters/configure.html

 

QUESTION 12
You are creating an Azure Data Factory data flow that will ingest data from a CSV file, cast columns to specified types
of data, and insert the data into a table in an Azure Synapse Analytics dedicated SQL pool. The CSV file contains three
columns named username, comment, and date.
The data flow already contains the following:
A source transformation.
A Derived Column transformation to set the appropriate types of data.
A sink transformation to land the data in the pool.
You need to ensure that the data flow meets the following requirements:
All valid rows must be written to the destination table.
Truncation errors in the comment column must be avoided proactively.
Any rows containing comment values that will cause truncation errors upon insert must be written to a file in blob
storage.
Which two actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. To the data flow, add a sink transformation to write the rows to a file in blob storage.
B. To the data flow, add a Conditional Split transformation to separate the rows that will cause truncation errors.
C. To the data flow, add a filter transformation to filter out rows that will cause truncation errors.
D. Add a select transformation to select only the rows that will cause truncation errors.
Correct Answer: AB
B: Example:
1.
This conditional split transformation defines the maximum length of "title" to be five. Any row that is less than or equal to five will go into the GoodRows stream. Any row that is larger than five will go into the BadRows stream.
2.

microsoft dp-203 exam questions q12

A:
3.
Now we need to log the rows that failed. Add a sink transformation to the BadRows stream for logging. Here, we'll
"auto-map" all of the fields so that we have logging of the complete transaction record. This is a text-delimited CSV file
output to a single file in Blob Storage. We'll call the log file "badrows.csv".
4.
The completed data flow is shown below. We are now able to split off error rows to avoid the SQL truncation errors and
put those entries into a log file. Meanwhile, successful rows can continue to write to our target database.

microsoft dp-203 exam questions q12-1 microsoft dp-203 exam questions q12-2

Reference: https://docs.microsoft.com/en-us/azure/data-factory/how-to-data-flow-error-rows

 

QUESTION 13
HOTSPOT
You are building an Azure Stream Analytics job to identify how much time a user spends interacting with a feature on a
webpage.
The job receives events based on user actions on the webpage. Each row of data represents an event. Each event has
a type of either 'start' or 'end'.
You need to calculate the duration between start and end events.
How should you complete the query? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

microsoft dp-203 exam questions q13

Correct Answer:

microsoft dp-203 exam questions q13-1

Box 1: DATEDIFF
DATEDIFF function returns the count (as a signed integer value) of the specified datepart boundaries crossed between
the specified startdate and enddate.
Syntax: DATEDIFF ( datepart , startdate, enddate )
Box 2: LAST
The LAST function can be used to retrieve the last event within a specific condition. In this example, the condition is an
event of type Start, partitioning the search by PARTITION BY user and feature. This way, every user and feature is
treated independently when searching for the Start event. LIMIT DURATION limits the search back in time to 1 hour
between the End and Start events.
Example:
SELECT [user], feature, DATEDIFF( second, LAST(Time) OVER (PARTITION BY [user], feature LIMIT
DURATION(hour, 1) WHEN Event = 'start'), Time) as duration
FROM input TIMESTAMP BY Time
WHERE Event = 'end'
Reference: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-stream-analytics-query-patterns


Passing the Microsoft DP-203 exam is not difficult. You can practice the test online first and then choose the Lead4Pass DP-203 exam dumps https://www.lead4pass.com/dp-203.html (Total Questions: 107 Q&A).
All exam questions and answers are updated in real time! They are guaranteed to be effective immediately.

P.S.
The Microsoft DP-203 exam PDF shared on this site is part of the Lead4Pass DP-203 exam questions and answers. For the complete exam PDF, please visit Lead4Pass


[September 2020] New Microsoft DP-201 Brain dumps and online practice tests are shared from Lead4Pass (latest Updated)

The latest Microsoft DP-201 dumps from Lead4Pass help you pass the DP-201 exam on the first try! Lead4Pass has updated the Microsoft DP-201 VCE dumps and DP-201 PDF dumps, with exam questions updated and answers corrected! Get the latest Lead4Pass DP-201 dumps with VCE and PDF: https://www.lead4pass.com/dp-201.html (Q&As: 164 dumps)

[Free DP-201 PDF] Microsoft DP-201 Dumps PDF can be collected on Google Drive shared by Lead4Pass:
https://drive.google.com/file/d/1OnBb9I2qIbSg248c63fHtchcZUFR1lQk/

[Lead4pass DP-201 Youtube] Microsoft DP-201 Dumps can be viewed on Youtube shared by Lead4Pass

Microsoft DP-201 Online Exam Practice Questions

QUESTION 1
You need to design the solution for analyzing customer data.
What should you recommend?
A. Azure Databricks
B. Azure Data Lake Storage
C. Azure SQL Data Warehouse
D. Azure Cognitive Services
E. Azure Batch
Correct Answer: A
Customer data must be analyzed using managed Spark clusters.
You create Spark clusters through Azure Databricks.
References:
https://docs.microsoft.com/en-us/azure/azure-databricks/quickstart-create-databricks-workspace-portal

 

QUESTION 2
You are designing an Azure Cosmos DB database that will support vertices and edges. Which Cosmos DB API should you include in the design?
A. SQL
B. Cassandra
C. Gremlin
D. Table
Correct Answer: C
The Azure Cosmos DB Gremlin API can be used to store massive graphs with billions of vertices and edges.
References: https://docs.microsoft.com/en-us/azure/cosmos-db/graph-introduction

 

QUESTION 3
You are designing a real-time stream solution based on Azure Functions. The solution will process data uploaded to
Azure Blob Storage. The solution requirements are as follows:
1. New blobs must be processed with as little delay as possible.
2. Scaling must occur automatically.
3. Costs must be minimized.
What should you recommend?
A. Deploy the Azure Function in an App Service plan and use a Blob trigger.
B. Deploy the Azure Function in a Consumption plan and use an Event Grid trigger.
C. Deploy the Azure Function in a Consumption plan and use a Blob trigger.
D. Deploy the Azure Function in an App Service plan and use an Event Grid trigger.
Correct Answer: C
Create a function, with the help of a blob trigger template, which is triggered when files are uploaded to or updated in
Azure Blob storage. You use a consumption plan, which is a hosting plan that defines how resources are allocated to
your function app. In the default Consumption Plan, resources are added dynamically as required by your functions. In
this serverless hosting, you only pay for the time your functions run. When you run in an App Service Plan, you must
manage the scaling of your function app.
References: https://docs.microsoft.com/en-us/azure/azure-functions/functions-create-storage-blob-triggered-function

 

QUESTION 4
You need to optimize storage for CONT_SQL3.
What should you recommend?
A. AlwaysOn
B. Transactional processing
C. General
D. Data warehousing
Correct Answer: B
CONT_SQL3 has the SQL Server role, a 100 GB database size, and is a Hyper-V VM to be migrated to an Azure VM. The storage should be configured to optimize storage for database OLTP workloads.
Azure SQL Database provides three basic in-memory capabilities (built into the underlying database engine) that can contribute in a meaningful way to performance improvements:
1. In-Memory Online Transactional Processing (OLTP)
2. Clustered columnstore indexes intended primarily for Online Analytical Processing (OLAP) workloads
3. Nonclustered columnstore indexes geared towards Hybrid Transactional/Analytical Processing (HTAP) workloads
References: https://www.databasejournal.com/features/mssql/overview-of-in-memory-technologies-of-azure-sql-database.html 

 

QUESTION 5
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains
a unique solution that might meet the stated goals. Some question sets might have more than one correct solution,
while
others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not
appear on the review screen.
You are designing an HDInsight/Hadoop cluster solution that uses Azure Data Lake Gen1 Storage.
The solution requires POSIX permissions and enables diagnostics logging for auditing.
You need to recommend solutions that optimize storage.
Proposed Solution: Implement compaction jobs to combine small files into larger files.
Does the solution meet the goal?
A. Yes
B. No
Correct Answer: A
Depending on what services and workloads are using the data, a good size to consider for files is 256 MB or greater. If
the file sizes cannot be batched when landing in Data Lake Storage Gen1, you can have a separate compaction job that
combines these files into larger ones.
Note: POSIX permissions and auditing in Data Lake Storage Gen1 come with an overhead that becomes apparent when working with numerous small files. As a best practice, you must batch your data into larger files versus writing thousands or millions of small files to Data Lake Storage Gen1. Avoiding small file sizes can have multiple benefits, such as:
1. Lowering the authentication checks across multiple files
2. Reduced open file connections
3. Faster copying/replication
4. Fewer files to process when updating Data Lake Storage Gen1 POSIX permissions
References: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-best-practices

 

QUESTION 6
You need to ensure that emergency road response vehicles are dispatched automatically.
How should you design the processing system? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

lead4pass dp-201 exam questions q6

Correct Answer:

lead4pass dp-201 exam questions q6-1

Explanation:
Box 1: API App

lead4pass dp-201 exam questions q6-2

Events generated from the IoT data sources are sent to the stream ingestion layer through Azure HDInsight Kafka as a
stream of messages. HDInsight Kafka stores a stream of data in topics for a configurable period of time.
Kafka consumer, Azure Databricks, picks up the message in real-time from the Kafka topic, to process the data based
on the business logic and can then send to the Serving layer for storage.
Downstream storage services, like Azure Cosmos DB, Azure SQL Data warehouse, or Azure SQL DB, will then be a data source for presentation and action layer.
Business analysts can use Microsoft Power BI to analyze warehoused data. Other applications can be built upon the
serving layer as well. For example, we can expose APIs based on the serving layer data for third-party use.
Box 2: Cosmos DB Change Feed
Change feed support in Azure Cosmos DB works by listening to an Azure Cosmos DB container for any changes. It
then outputs the sorted list of documents that were changed in the order in which they were modified.
The change feed in Azure Cosmos DB enables you to build efficient and scalable solutions for each of these patterns,
as shown in the following image:

lead4pass dp-201 exam questions q6-3

References: https://docs.microsoft.com/bs-cyrl-ba/azure/architecture/example-scenario/data/realtime-analytics-vehicle-iot?view=azurermps-4.4.1

 

QUESTION 7
You need to design the disaster recovery solution for customer sales data analytics.
Which three actions should you recommend? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. Provision multiple Azure Databricks workspaces in separate Azure regions.
B. Migrate users, notebooks, and cluster configurations from one workspace to another in the same region.
C. Use zone redundant storage.
D. Migrate users, notebooks, and cluster configurations from one region to another.
E. Use Geo-redundant storage.
F. Provision a second Azure Databricks workspace in the same region.
Correct Answer: ADE
Scenario: The analytics solution for customer sales data must be available during a regional outage.
To create your own regional disaster recovery topology for Databricks, follow these requirements:
Provision multiple Azure Databricks workspaces in separate Azure regions
Use Geo-redundant storage.
Once the secondary region is created, you must migrate the users, user folders, notebooks, cluster configuration, jobs
configuration, libraries, storage, init scripts and reconfigure access control.
Note: Geo-redundant storage (GRS) is designed to provide at least 99.99999999999999% (16 9's) durability of objects
over a given year by replicating your data to a secondary region that is hundreds of miles away from the primary region.
If
your storage account has GRS enabled, then your data is durable even in the case of a complete regional outage or a
disaster in which the primary region isn't recoverable.
References:
https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy-grs

 

QUESTION 8
A company is developing a mission-critical line of business app that uses Azure SQL Database Managed Instance.
You must design a disaster recovery strategy for the solution. You need to ensure that the database automatically recovers when the full or partial loss of the Azure SQL Database
service occurs in the primary region.
What should you recommend?
A. Failover-group
B. Azure SQL Data Sync
C. SQL Replication
D. Active geo-replication
Correct Answer: A
Auto-failover groups is a SQL Database feature that allows you to manage replication and failover of a group of
databases on a SQL Database server or all databases in a Managed Instance to another region (currently in public
preview for Managed Instance). It uses the same underlying technology as active geo-replication. You can initiate
failover manually or you can delegate it to the SQL Database service based on a user-defined policy.
References: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-auto-failover-group

 

QUESTION 9
You are designing a data storage solution for a database that is expected to grow to 50 TB. The usage pattern is
singleton inserts, singleton updates, and reporting. Which storage solution should you use?
A. Azure SQL Database elastic pools
B. Azure SQL Data Warehouse
C. Azure Cosmos DB that uses the Gremlin API
D. Azure SQL Database Hyperscale
Correct Answer: D
A Hyperscale database is an Azure SQL database in the Hyperscale service tier that is backed by the Hyperscale scale-out storage technology. A Hyperscale database supports up to 100 TB of data and provides high throughput and
performance, as well as rapid scaling to adapt to the workload requirements. Scaling is transparent to the application: connectivity, query processing, and so on work like any other Azure SQL database.
Incorrect Answers:
A: SQL Database elastic pools are a simple, cost-effective solution for managing and scaling multiple databases that
have varying and unpredictable usage demands. The databases in an elastic pool are on a single Azure SQL Database
server and share a set number of resources at a set price. Elastic pools in Azure SQL Database enable SaaS
developers to optimize the price performance for a group of databases within a prescribed budget while delivering
performance elasticity for each database.
B: Rather than SQL Data Warehouse, consider other options for operational (OLTP) workloads that have large numbers
of singleton selects.
References: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-service-tier-hyperscale-faq

 

QUESTION 10
A company manufactures automobile parts. The company installs IoT sensors on manufacturing machinery.
You must design a solution that analyzes data from the sensors.
You need to recommend a solution that meets the following requirements:
1. Data must be analyzed in real-time.
2. Data queries must be deployed using continuous integration.
3. Data must be visualized by using charts and graphs.
4. Data must be available for ETL operations in the future.
5. The solution must support high-volume data ingestion.
Which three actions should you recommend? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. Use Azure Analysis Services to query the data. Output query results to Power BI.
B. Configure an Azure Event Hub to capture data to Azure Data Lake Storage.
C. Develop an Azure Stream Analytics application that queries the data and outputs to Power BI. Use Azure Data
Factory to deploy the Azure Stream Analytics application.
D. Develop an application that sends the IoT data to an Azure Event Hub.
E. Develop an Azure Stream Analytics application that queries the data and outputs to Power BI. Use Azure Pipelines to
deploy the Azure Stream Analytics application.
F. Develop an application that sends the IoT data to an Azure Data Lake Storage container.
Correct Answer: BCD

 

QUESTION 11
You are designing an application. You plan to use Azure SQL Database to support the application.
The application will extract data from the Azure SQL Database and create text documents. The text documents will be placed into a cloud-based storage solution. The text storage solution must be accessible from an SMB network share.
You need to recommend a data storage solution for the text documents.
Which Azure data storage type should you recommend?
A. Queue
B. Files
C. Blob
D. Table
Correct Answer: B
Azure Files enables you to set up highly available network file shares that can be accessed by using the standard
Server Message Block (SMB) protocol.
Incorrect Answers:
A: The Azure Queue service is used to store and retrieve messages. It is generally used to store lists of messages to be
processed asynchronously.
C: Blob storage is optimized for storing massive amounts of unstructured data, such as text or binary data. Blob storage
can be accessed via HTTP or HTTPS but not via SMB.
D: Azure Table storage is used to store large amounts of structured data. Azure tables are ideal for storing structured,
non-relational data.
References: https://docs.microsoft.com/en-us/azure/storage/common/storage-introduction https://docs.microsoft.com/en-us/azure/storage/tables/table-storage-overview

 

QUESTION 12
You need to recommend the appropriate storage and processing solution.
What should you recommend?
A. Enable auto-shrink on the database.
B. Flush the blob cache using Windows PowerShell.
C. Enable Apache Spark RDD (RDD) caching.
D. Enable Databricks IO (DBIO) caching.
E. Configure the reading speed using Azure Data Studio.
Correct Answer: C
Scenario: You must be able to use a file system view of data stored in a blob. You must build an architecture that will
allow Contoso to use the DBFS filesystem layer over a blob store.
Databricks File System (DBFS) is a distributed file system installed on Azure Databricks clusters. Files in DBFS persist
to Azure Blob storage, so you won't lose data even after you terminate a cluster. The Databricks Delta cache, previously named Databricks IO (DBIO) caching, accelerates data reads by creating copies
of remote files in the nodes' local storage.

 

QUESTION 13
Inventory levels must be calculated by subtracting the current day's sales from the previous day's final inventory.
Which two options provide Litware with the ability to quickly calculate the current inventory levels by store and product?
Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
A. Consume the output of the event hub by using Azure Stream Analytics and aggregate the data by store and product.
Output the resulting data directly to Azure SQL Data Warehouse. Use Transact-SQL to calculate the inventory levels.
B. Output Event Hubs Avro files to Azure Blob storage. Use Transact-SQL to calculate the inventory levels by using
PolyBase in Azure SQL Data Warehouse.
C. Consume the output of the event hub by using Databricks. Use Databricks to calculate the inventory levels and
output the data to Azure SQL Data Warehouse.
D. Consume the output of the event hub by using Azure Stream Analytics and aggregate the data by store and product.
Output the resulting data into Databricks. Calculate the inventory levels in Databricks and output the data to Azure Blob
storage.
E. Output Event Hubs Avro files to Azure Blob storage. Trigger an Azure Data Factory copy activity to run every 10
minutes to load the data into Azure SQL Data Warehouse. Use Transact-SQL to aggregate the data by store and
product.
Correct Answer: AE
A: Azure Stream Analytics is a fully managed service providing low-latency, highly available, scalable complex event
processing over streaming data in the cloud. You can use your Azure SQL Data Warehouse database as an output sink
for your Stream Analytics jobs.
E: Event Hubs Capture is the easiest way to get data into Azure. Using Azure Data Lake, Azure Data Factory, and
Azure HDInsight, you can perform batch processing and other analytics using familiar tools and platforms of your
choosing, at
any scale you need.
Note: Event Hubs Capture creates files in Avro format.
Captured data is written in Apache Avro format: a compact, fast, binary format that provides rich data structures with the inline schema. This format is widely used in the Hadoop ecosystem, Stream Analytics, and Azure Data Factory.
Scenario: The application development team will create an Azure event hub to receive real-time sales data, including
store number, date, time, product ID, customer loyalty number, price, and the discount amount, from the point of sale (POS)
system and output the data to data storage in Azure.
Reference: https://docs.microsoft.com/bs-latn-ba/azure/sql-data-warehouse/sql-data-warehouse-integrate-azure-stream-analytics https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-capture-overview
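For option A, a Stream Analytics query along the following lines could aggregate the event hub output by store and product before writing to the SQL Data Warehouse sink. The input, output, and field names are assumptions based on the scenario description, not the actual lab definitions.

-- Hedged sketch: PosEventHubInput, SqlDwOutput, and SaleTime are hypothetical names.
SELECT
    StoreNumber,
    ProductID,
    System.Timestamp() AS WindowEnd,
    COUNT(*) AS UnitsSold              -- one POS event per item sold, aggregated per day
INTO
    SqlDwOutput                        -- output alias bound to Azure SQL Data Warehouse
FROM
    PosEventHubInput TIMESTAMP BY SaleTime
GROUP BY
    StoreNumber, ProductID, TumblingWindow(day, 1)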


The latest updated Microsoft DP-201 exam questions come from the Lead4Pass DP-201 dumps! 100% pass the DP-201 exam! Download the Lead4Pass DP-201 VCE and PDF dumps: https://www.lead4pass.com/dp-201.html (Q&As: 164 dumps)

Get free Microsoft DP-201 dumps PDF online: https://drive.google.com/file/d/1OnBb9I2qIbSg248c63fHtchcZUFR1lQk/


[September 2020] New Microsoft DP-200 Brain dumps and online practice tests are shared from Lead4Pass (latest Updated)

The latest Microsoft DP-200 dumps from Lead4Pass help you pass the DP-200 exam on the first try! Lead4Pass has updated the Microsoft DP-200 VCE dumps and DP-200 PDF dumps, with exam questions updated and answers corrected! Get the latest Lead4Pass DP-200 dumps with VCE and PDF: https://www.lead4pass.com/dp-200.html (Q&As: 207 dumps)

[Free DP-200 PDF] Microsoft DP-200 Dumps PDF can be collected on Google Drive shared by Lead4Pass:
https://drive.google.com/file/d/1b-hvJSM68TxBQmB_fv8lvGJusCiCZdrX/

[Lead4pass DP-200 Youtube] Microsoft DP-200 Dumps can be viewed on Youtube shared by Lead4Pass

Microsoft DP-200 Online Exam Practice Questions

QUESTION 1
A company plans to analyze a continuous flow of data from a social media platform by using Microsoft Azure Stream
Analytics. The incoming data is formatted as one record per row.
You need to create the input stream.
How should you complete the REST API segment? To answer, select the appropriate configuration in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

lead4pass dp-200 exam questions q1 lead4pass dp-200 exam questions q1-1

Correct Answer:

lead4pass dp-200 exam questions q1-2 lead4pass dp-200 exam questions q1-3

 

QUESTION 2
You have an Azure SQL database named DB1 that contains a table named Table1. Table1 has a field named
Customer_ID that is varchar(22).
You need to implement masking for the Customer_ID field to meet the following requirements:
The first two prefix characters must be exposed.
The last four prefix characters must be exposed.
All other characters must be masked.
Solution: You implement data masking and use an email function mask.
Does this meet the goal?
A. Yes
B. No
Correct Answer: B
Must use Custom Text data masking, which exposes the first and last characters and adds a custom padding string in
the middle.
References: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dynamic-data-masking-get-started
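A minimal sketch of the Custom Text (partial) mask described in the explanation, applied to the table and column named in the question; the padding string itself is an arbitrary choice.

-- partial(prefix, padding, suffix): expose the first 2 and last 4 characters,
-- replace everything in between with the padding string.
ALTER TABLE dbo.Table1
    ALTER COLUMN Customer_ID ADD MASKED WITH (FUNCTION = 'partial(2,"XXXXXXXXXXXXXXXX",4)');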

 

QUESTION 3
You need to ensure that phone-based polling data can be analyzed in the PollingData database.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions
to the answer area and arrange them in the correct order.
Select and Place: 

lead4pass dp-200 exam questions q3

Correct Answer:

lead4pass dp-200 exam questions q3-1

Explanation/Reference:
All deployments must be performed by using Azure DevOps.
Deployments must use templates used in multiple environments.
No credentials or secrets should be used during deployments.

 

QUESTION 4
Contoso, Ltd. plans to configure existing applications to use the Azure SQL Database.
When security-related operations occur, the security team must be informed.
You need to configure Azure Monitor while minimizing administrative effort.
Which three actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. Create a new action group to email [email protected]
B. Use [email protected] as an alert email address.
C. Use all security operations as a condition.
D. Use all Azure SQL Database servers as a resource.
E. Query audit log entries as a condition.
Correct Answer: ACD
References: https://docs.microsoft.com/en-us/azure/azure-monitor/platform/alerts-action-rules

 

QUESTION 5
You manage a solution that uses Azure HDInsight clusters.
You need to implement a solution to monitor cluster performance and status.
Which technology should you use?
A. Azure HDInsight .NET SDK
B. Azure HDInsight REST API
C. Ambari REST API
D. Azure Log Analytics
E. Ambari Web UI
Correct Answer: E
Ambari is the recommended tool for monitoring utilization across the whole cluster. The Ambari dashboard shows easily
glanceable widgets that display metrics such as CPU, network, YARN memory, and HDFS disk usage. The specific
metrics shown depend on the cluster type. The “Hosts” tab shows metrics for individual nodes so you can ensure the load
on your cluster is evenly distributed.
The Apache Ambari project is aimed at making Hadoop management simpler by developing software for provisioning,
managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management
web UI backed by its RESTful APIs.
References: https://azure.microsoft.com/en-us/blog/monitoring-on-hdinsight-part-1-an-overview/
https://ambari.apache.org/

 

QUESTION 6
You manage the Microsoft Azure Databricks environment for a company. You must be able to access a private Azure
Blob Storage account. Data must be available to all Azure Databricks workspaces. You need to provide data
access. Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of
actions to the answer area and arrange them in the correct order.
Select and Place: 

lead4pass dp-200 exam questions q6

Step 1: Create a secret scope
Step 2: Add secrets to the scope
Note: dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>") gets the key that has been stored as a secret in a secret scope.
Step 3: Mount the Azure Blob Storage container
You can mount a Blob Storage container or a folder inside a container through Databricks File System – DBFS. The mount is a pointer to a Blob Storage container, so the data is never synced locally.
Note: To mount a Blob Storage container or a folder inside a container, use the following command:
Python:
dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"<conf-key>":dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})
where:
dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>") gets the key that has been stored as a secret in a secret scope.
References:
https://docs.databricks.com/spark/latest/data-sources/azure/azure-storage.html

 

QUESTION 7
What should you implement to optimize SQL Database for Race Central to meet the technical requirements?
A. the sp_updatestats stored procedure
B. automatic tuning
C. Query Store
D. the DBCC CHECKDB command
Correct Answer: A
Scenario: The query performance of Race Central must be stable, and the administrative time it takes to perform
optimizations must be minimized.
UPDATE STATISTICS updates query optimization statistics on a table or indexed view. By default, the query optimizer already
updates statistics as necessary to improve the query plan; in some cases, you can improve query performance by using
UPDATE STATISTICS or the stored procedure sp_updatestats to update statistics more frequently than the default
updates.
Incorrect Answers:
D: DBCC CHECKDB checks the logical and physical integrity of all the objects in the specified database.
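As a quick usage sketch, statistics can be refreshed either database-wide or per table; the table name below is hypothetical.

-- Refresh out-of-date statistics for all user tables in the current database.
EXEC sp_updatestats;

-- Or refresh statistics for one (hypothetical) table with a full scan.
UPDATE STATISTICS dbo.RaceResults WITH FULLSCAN;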

 

QUESTION 8
You plan to implement an Azure Cosmos DB database that will write 100,000 JSON documents every 24 hours. The database will
be replicated in three regions. Only one region will be writable.
You need to select a consistency level for the database to meet the following requirements:
Guarantee monotonic reads and writes within a session.
Provide the fastest throughput.
Provide the lowest latency.
Which consistency level should you select?

A. Strong
B. Bounded Staleness
C. Eventual
D. Session
E. Consistent Prefix
Correct Answer: D
Session: Within a single client session reads are guaranteed to honor the consistent-prefix (assuming a single “writer”
session), monotonic reads, monotonic writes, read-your-writes, and write-follows-reads guarantees. Clients outside of
the session performing writes will see eventual consistency.
References: https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels

 

QUESTION 9
You have an Azure SQL database that has masked columns.
You need to identify when a user attempts to infer data from the masked columns.
What should you use?
A. Azure Advanced Threat Protection (ATP)
B. custom masking rules
C. Transparent Data Encryption (TDE)
D. auditing
Correct Answer: D
Dynamic Data Masking is designed to simplify application development by limiting data exposure in a set of pre-defined
queries used by the application. While Dynamic Data Masking can also be useful to prevent accidental exposure of
sensitive data when accessing a production database directly, it is important to note that unprivileged users with ad-hoc
query permissions can apply techniques to gain access to the actual data. If there is a need to grant such ad-hoc
access, Auditing should be used to monitor all database activity and mitigate this scenario.
References: https://docs.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking
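If auditing is already writing to Blob storage, the audit records can be inspected with sys.fn_get_audit_file; the storage path below is a placeholder, and the LIKE filter is just one illustrative way to look for repeated probing of a masked column.

-- Placeholder storage URL; point it at the container that holds the .xel audit files.
SELECT event_time, server_principal_name, statement
FROM sys.fn_get_audit_file(
        'https://<storageaccount>.blob.core.windows.net/sqldbauditlogs/<server>/<database>/',
        DEFAULT, DEFAULT)
WHERE statement LIKE '%Customer_ID%';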

 

QUESTION 10
Which two metrics should you use to identify the appropriate RU/s for the telemetry data? Each correct answer presents
part of the solution. NOTE: Each correct selection is worth one point.
A. Number of requests
B. Number of requests exceeded capacity

C. End-to-end observed read latency at the 99th percentile
D. Session consistency
E. Data + Index storage consumed
F. Avg Throughput/s
Correct Answer: AE
Scenario: The telemetry data must be monitored for performance issues. You must adjust the Cosmos DB Request
Units per second (RU/s) to maintain a performance SLA while minimizing the cost of the Ru/s.
With Azure Cosmos DB, you pay for the throughput you provision and the storage you consume on an hourly basis.
While you estimate the number of RUs per second to provision, consider the following factors:
Item size: As the size of an item increases, the number of RUs consumed to read or write the item also increases.

 

QUESTION 11
A company has a SaaS solution that uses Azure SQL Database with elastic pools. The solution contains a dedicated
database for each customer organization. Customer organizations have peak usage at different periods during the
year.
You need to implement the Azure SQL Database elastic pool to minimize cost.
Which option or options should you configure?
A. Number of transactions only
B. eDTUs per database only
C. Number of databases only
D. CPU usage only
E. eDTUs and max data size
Correct Answer: E
The best size for a pool depends on the aggregate resources needed for all databases in the pool. This involves
determining the following:
Maximum resources utilized by all databases in the pool (either maximum DTUs or maximum vCores depending on your
choice of resourcing model).
Maximum storage bytes utilized by all databases in the pool.
Note: Elastic pools enable the developer to purchase resources for a pool shared by multiple databases to
accommodate unpredictable periods of usage by individual databases. You can configure resources for the pool based
either on the
DTU-based purchasing model or the vCore-based purchasing model.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-elastic-pool 

 

QUESTION 12
You are developing a data engineering solution for a company. The solution will store a large set of key-value pair data
by using Microsoft Azure Cosmos DB.
The solution has the following requirements:
Data must be partitioned into multiple containers.
Data containers must be configured separately.
Data must be accessible from applications hosted around the world.
The solution must minimize latency.
You need to provision Azure Cosmos DB.
A. Cosmos account-level throughput.
B. Provision an Azure Cosmos DB account with the Azure Table API. Enable geo-redundancy.
C. Configure table-level throughput.
D. Replicate the data globally by manually adding regions to the Azure Cosmos DB account.
E. Provision an Azure Cosmos DB account with the Azure Table API. Enable multi-region writes.
Correct Answer: E
Scale read and write throughput globally: you can enable every region to be writable and elastically scale reads and
writes all around the world. The throughput that your application configures on an Azure Cosmos database or a
container is guaranteed to be delivered across all regions associated with your Azure Cosmos account. The provisioned
throughput is guaranteed by financially backed SLAs.
References: https://docs.microsoft.com/en-us/azure/cosmos-db/distribute-data-globally

 

QUESTION 13
Your company uses Azure SQL Database and Azure Blob storage.
All data at rest must be encrypted by using the company's own key. The solution must minimize administrative effort
and the impact on applications that use the database.
You need to configure security.
What should you implement? To answer, select the appropriate option in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

lead4pass dp-200 exam questions q13

Correct Answer:

lead4pass dp-200 exam questions q13-1


The latest updated Microsoft DP-200 exam questions come from the Lead4Pass DP-200 dumps! 100% pass the DP-200 exam! Download the Lead4Pass DP-200 VCE and PDF dumps: https://www.lead4pass.com/dp-200.html (Q&As: 207 dumps)

Get free Microsoft DP-200 dumps PDF online: https://drive.google.com/file/d/1b-hvJSM68TxBQmB_fv8lvGJusCiCZdrX/