Posted in dp-100 Designing and Implementing a Data Science Solution on Azure Microsoft Microsoft DP-100 microsoft dp-100 dumps microsoft dp-100 pdf microsoft dp-100 practice test Microsoft Role-based

[MAR 2021] Microsoft DP-100 exam dumps and online practice questions are available from Lead4Pass

The latest updated Microsoft DP-100 exam dumps and free DP-100 exam practice questions and answers! Latest updates from Lead4Pass Microsoft DP-100 Dumps PDF and DP-100 Dumps VCE, Lead4Pass DP-100 exam questions updated and answers corrected! Get the full Microsoft DP-100 dumps from https://www.lead4pass.com/dp-100.html (VCE&PDF)

Latest DP-100 PDF for free

Download the Microsoft DP-100 dumps PDF for free. This excerpt from the Lead4Pass DP-100 dumps is shared by Lead4Pass on Google Drive:
https://drive.google.com/file/d/1dCTFiaHIqtM7a36PFW_mildIQPzQetSI/

The latest updated Microsoft DP-100 Exam Practice Questions and Answers Online Practice Test is free to share from Lead4Pass (Q1-Q13)

QUESTION 1
HOTSPOT
You write code to retrieve an experiment that is run from your Azure Machine Learning workspace.
The run used the model interpretation support in Azure Machine Learning to generate and upload a model explanation.
Business managers in your organization want to see the importance of the features in the model.
You need to print out the model features and their relative importance in an output that looks similar to the following.

[2021.3] lead4pass dp-100 practice test q1

How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

[2021.3] lead4pass dp-100 practice test q1-1

Correct Answer:

[2021.3] lead4pass dp-100 practice test q1-2

Box 1: from_run_id
from_run_id(workspace, experiment_name, run_id)
Create the client with the factory method given a run ID.
Returns an instance of the ExplanationClient.
Parameters:
workspace (Workspace): An object that represents a workspace.
experiment_name (str): The name of an experiment.
run_id (str): A GUID that represents a run.
Box 2: list_model_explanations
list_model_explanations returns a dictionary of metadata for all model explanations available.
Returns
A dictionary of metadata such as id, data type, method, model type, and upload time, sorted by upload time
Box 3:
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-contribinterpret/azureml.contrib.interpret.explanation.explanation_client.explanationclient?view=azure-ml-py
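
Once an explanation has been retrieved, printing the features with their relative importance comes down to sorting a name-to-score mapping. The sketch below assumes the importances are already available as a plain dictionary; the helper name, feature names, and scores are made up for illustration, and the Azure ML client calls are deliberately left out.

```python
def format_feature_importance(importances):
    """Return 'feature: score' lines sorted from most to least important."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [f"{name}: {score}" for name, score in ranked]


# Hypothetical importances standing in for values read from an explanation.
example = {"feature_a": 0.12, "feature_b": 0.45, "feature_c": 0.30}
for line in format_feature_importance(example):
    print(line)
```

Sorting in descending order puts the most influential feature first, which is usually what a business audience wants to see.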


QUESTION 2
HOTSPOT
You have a dataset that contains 2,000 rows. You are building a machine learning classification model by using Azure
Machine Learning Studio. You add a Partition and Sample module to the experiment.
You need to configure the module. You must meet the following requirements:
1. Divide the data into subsets
2. Assign the rows into folds using a round-robin method
3. Allow rows in the dataset to be reused
How should you configure the module? To answer, select the appropriate options in the dialog box in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

[2021.3] lead4pass dp-100 practice test q2

Correct Answer:

[2021.3] lead4pass dp-100 practice test q2-1

Use the Split data into partitions option when you want to divide the dataset into subsets of the data. This option is also
useful when you want to create a custom number of folds for cross-validation or to split rows into several groups.
Add the Partition and Sample module to your experiment in Studio (classic), and connect the dataset.
For Partition or sample mode, select Assign to Folds.
Use replacement in the partitioning: Select this option if you want the sampled row to be put back into the pool of rows
for potential reuse. As a result, the same row might be assigned to several folds.
If you do not use a replacement (the default option), the sampled row is not put back into the pool of rows for potential
reuse. As a result, each row can be assigned to only one fold.
Randomized split: Select this option if you want rows to be randomly assigned to folds.
If you do not select this option, rows are assigned to folds using the round-robin method.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample
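
To make the difference between round-robin and randomized fold assignment concrete, here is a minimal sketch in plain Python. It is not the Studio module's implementation; the function name and parameters are assumptions, and the replacement option is not modelled.

```python
import random


def assign_folds(n_rows, n_folds, randomized=False, seed=None):
    """Return a fold index for every row.

    Round-robin (the default here) sends row i to fold i % n_folds.
    With randomized=True the same fold labels are shuffled, so fold
    sizes stay balanced but the assignment order is random.
    """
    folds = [i % n_folds for i in range(n_rows)]
    if randomized:
        random.Random(seed).shuffle(folds)
    return folds


print(assign_folds(6, 3))  # round-robin: 0, 1, 2, 0, 1, 2
```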

 

QUESTION 3
You are retrieving data from a large data store by using Azure Machine Learning Studio.
You must create a subset of the data for testing purposes using a random sampling seed based on the system clock.
You add the Partition and Sample module to your experiment.
You need to select the properties for the module.
Which values should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

[2021.3] lead4pass dp-100 practice test q3

Correct Answer:

[2021.3] lead4pass dp-100 practice test q3-1

Box 1: Sampling
Create a sample of data. This option supports simple random sampling or stratified random sampling.
This is useful if you want to create a smaller representative sample dataset for testing.
1. Add the Partition and Sample module to your experiment in Studio, and connect the dataset.
2. Partition or sample mode: Set this to Sampling.
3. Rate of sampling. See box 2 below.
Box 2: 0
Random seed for sampling: Optionally, type an integer to use as a seed value.
This option is important if you want the rows to be divided the same way every time. The default value is 0, meaning that
a starting seed is generated based on the system clock. This can lead to slightly different results each time you run the
experiment.
References: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample
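
The seed convention described above (0 means "use the system clock", anything else is repeatable) can be sketched with the standard library. This is only an illustration of the idea, not how the module is implemented; `sample_rows` and its parameters are hypothetical names.

```python
import random
import time


def sample_rows(rows, rate, seed=0):
    """Return a simple random sample of about rate * len(rows) rows.

    seed=0 derives the seed from the system clock, so each call may
    differ; any other integer gives the same sample every time.
    """
    if seed == 0:
        seed = time.time_ns()
    rng = random.Random(seed)
    return rng.sample(rows, round(len(rows) * rate))


rows = list(range(100))
print(sample_rows(rows, 0.1, seed=42) == sample_rows(rows, 0.1, seed=42))
```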

 

QUESTION 4
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains
a unique solution that might meet the stated goals. Some question sets might have more than one correct solution,
while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not
appear in the review screen.
You are using Azure Machine Learning Studio to perform feature engineering on a dataset.
You need to normalize values to produce a feature column grouped into bins.
Solution: Apply an Entropy Minimum Description Length (MDL) binning mode.
Does the solution meet the goal?
A. Yes
B. No
Correct Answer: A
Entropy MDL binning mode: This method requires that you select the column you want to predict and the column or
columns that you want to group into bins. It then makes a pass over the data and attempts to determine the number of
bins that minimizes the entropy. In other words, it chooses a number of bins that allow the data column to best predict
the target column. It then returns the bin number associated with each row of your data in a column named quantized.
References: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-data-into-bins
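
As a rough illustration of entropy-driven binning, the sketch below finds the single cut point that minimizes the weighted class entropy of the two resulting bins. It is a heavily simplified stand-in: the recursive splitting and the MDL stopping criterion of the actual method are omitted, and all function names are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_cut_point(values, labels):
    """Return the threshold minimizing the weighted entropy of the two bins."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_threshold


def quantize(values, threshold):
    """Assign bin 1 to values at or below the threshold, bin 2 to the rest."""
    return [1 if v <= threshold else 2 for v in values]
```

For example, with values `[1, 2, 3, 10, 11, 12]` and labels `[0, 0, 0, 1, 1, 1]`, the cut falls between 3 and 10, because that split leaves both bins perfectly pure.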

 

QUESTION 5
You are evaluating a completed binary classification machine learning model.
You need to use precision as the evaluation metric.
Which visualization should you use?
A. Violin plot
B. Gradient descent
C. Box plot
D. Binary classification confusion matrix
Correct Answer: D
Incorrect Answers:
A: A violin plot is a visual that traditionally combines a box plot and a kernel density plot.
B: Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local
minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or
approximate gradient) of the function at the current point.
C: A box plot lets you see basic distribution information about your data, such as median, mean, range, and quartiles but
doesn't show you how your data looks throughout its range.
References: https://machinelearningknowledge.ai/confusion-matrix-and-performance-metrics-machine-learning/
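
Precision is read directly off the confusion matrix as TP / (TP + FP). A minimal sketch in plain Python, with made-up labels and hypothetical helper names:

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification result."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


tp, fp, fn, tn = confusion_counts([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
print(precision(tp, fp))
```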

 

QUESTION 6
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains
a unique solution that might meet the stated goals. Some question sets might have more than one correct solution,
while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not
appear in the review screen.
You have a Python script named train.py in a local folder named scripts. The script trains a regression model by using
scikit-learn. The script includes code to load a training data file which is also located in the scripts folder.
You must run the script as an Azure ML experiment on a compute cluster named aml-compute.
You need to configure the run to ensure that the environment includes the required packages for model training. You
have instantiated a variable named aml-compute that references the target compute cluster.
Solution: Run the following code:

[2021.3] lead4pass dp-100 practice test q6

Does the solution meet the goal?
A. Yes
B. No
Correct Answer: B
There is a missing line: conda_packages=['scikit-learn'], which is needed.
Correct example:
sk_est = Estimator(source_directory='./my-sklearn-proj',
                   script_params=script_params,
                   compute_target=compute_target,
                   entry_script='train.py',
                   conda_packages=['scikit-learn'])
Note:
The Estimator class represents a generic estimator to train data using any supplied framework.
This class is designed for use with machine learning frameworks that do not already have an Azure Machine Learning
pre-configured estimator. Pre-configured estimators exist for Chainer, PyTorch, TensorFlow, and SKLearn.
Example:
from azureml.train.estimator import Estimator
script_params = {
    # to mount files referenced by mnist dataset
    '--data-folder': ds.as_named_input('mnist').as_mount(),
    '--regularization': 0.8
}
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.estimator.estimator

 

QUESTION 7
HOTSPOT
You are performing sentiment analysis using a CSV file that includes 12,000 customer reviews written in a short
sentence format.
You add the CSV file to Azure Machine Learning Studio and configure it as the starting point dataset of an experiment.
You add the Extract N-Gram Features from the Text module to the experiment to extract key phrases from the customer
review column in the dataset.
You must create a new n-gram text dictionary from the customer review text and set the maximum n-gram size to
trigrams.
You need to configure the Extract N-Gram Features from the Text module.
What should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:[2021.3] lead4pass dp-100 practice test q7

Correct Answer:

[2021.3] lead4pass dp-100 practice test q7-1
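
For readers unfamiliar with the terminology, the sketch below shows what building an n-gram dictionary with a maximum size of three (trigrams) means. It uses naive whitespace tokenization and hypothetical function names; the Studio module's own tokenization, weighting, and filtering options are not reproduced.

```python
from collections import Counter


def extract_ngrams(text, max_n=3):
    """Return all word n-grams of size 1..max_n, smallest sizes first."""
    tokens = text.lower().split()
    ngrams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams


def build_dictionary(reviews, max_n=3):
    """Count in how many reviews each n-gram occurs (document frequency)."""
    counts = Counter()
    for review in reviews:
        counts.update(set(extract_ngrams(review, max_n)))
    return counts


print(extract_ngrams("great fast delivery"))
```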

 

QUESTION 8
You plan to provision an Azure Machine Learning Basic edition workspace for a data science project.
You need to identify the tasks you will be able to perform in the workspace.
Which three tasks will you be able to perform? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
A. Create a Compute Instance and use it to run code in Jupyter notebooks.
B. Create an Azure Kubernetes Service (AKS) inference cluster.
C. Use the designer to train a model by dragging and dropping pre-defined modules.
D. Create a tabular dataset that supports versioning.
E. Use the Automated Machine Learning user interface to train a model.
Correct Answer: ABD
Incorrect Answers:
C, E: The UI is included in the Enterprise edition only.
Reference:
https://azure.microsoft.com/en-us/pricing/details/machine-learning/

 

QUESTION 9
You are analyzing a dataset by using Azure Machine Learning Studio.
You need to generate a statistical summary that contains the p-value and the unique count for each feature column.
Which two modules can you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
A. Compute Linear Correlation
B. Export Count Table
C. Execute Python Script
D. Convert to Indicator Values
E. Summarize Data
Correct Answer: BE
The Export Count Table module is provided for backward compatibility with experiments that use the Build Count Table
(deprecated) and Count Featurizer (deprecated) modules.
E: Summarize Data statistics are useful when you want to understand the characteristics of the complete dataset. For
example, you might need to know:
How many missing values are there in each column?
How many unique values are there in a feature column?
What is the mean and standard deviation for each column? The module calculates the important scores for each
column and returns a row of summary statistics for each variable (data column) provided as input.
Incorrect Answers:
A: The Compute Linear Correlation module in Azure Machine Learning Studio is used to compute a set of Pearson
correlation coefficients for each possible pair of variables in the input dataset.
C: With Python, you can perform tasks that aren't currently supported by existing Studio modules such as:
Visualizing data using matplotlib
Using Python libraries to enumerate datasets and models in your workspace
Reading, loading, and manipulating data from sources not supported by the Import Data module
D: The purpose of the Convert to Indicator Values module is to convert columns that contain categorical values into a
series of binary indicator columns that can more easily be used as features in a machine learning model.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/export-count-table
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/summarize-data
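
The per-column statistics mentioned above (missing values, unique values, mean, and standard deviation) can be sketched with the standard library. This is only an illustration of the kind of summary involved: the function name is an assumption, `None` stands in for a missing value, and the p-value is not computed here.

```python
import statistics


def summarize_column(values):
    """Return basic summary statistics for one numeric column."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),  # sample standard deviation
    }


print(summarize_column([1, 2, 2, None, 3]))
```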

 

QUESTION 10
You need to implement a model development strategy to determine a user's tendency to respond to an ad.
Which technique should you use?
A. Use a Relative Expression Split module to partition the data based on centroid distance.
B. Use a Relative Expression Split module to partition the data based on distance traveled to the event.
C. Use a Split Rows module to partition the data based on distance traveled to the event.
D. Use a Split Rows module to partition the data based on centroid distance.
Correct Answer: A
Split Data partitions the rows of a dataset into two distinct sets.
The Relative Expression Split option in the Split Data module of Azure Machine Learning Studio is helpful when you
need to divide a dataset into training and testing datasets using a numerical expression.
Relative Expression Split: Use this option whenever you want to apply a condition to a number column. The number
could be a date/time field, a column containing age or dollar amounts, or even a percentage. For example, you might
want to divide your data set depending on the cost of the items, group people by age ranges, or separate data by a
calendar date.
Scenario:
Local market segmentation models will be applied before determining a user's propensity to respond to an
advertisement.
The distribution of features across training and production data is not consistent.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/split-data
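
Conceptually, a relative expression split partitions rows by a numeric condition on one column. The sketch below shows that idea in plain Python; the function name, the "greater than" condition, and the sample column are assumptions, and the module's own expression syntax is not reproduced.

```python
def relative_expression_split(rows, column, threshold):
    """Split rows into (matching, rest) using row[column] > threshold."""
    matching = [row for row in rows if row[column] > threshold]
    rest = [row for row in rows if not row[column] > threshold]
    return matching, rest


# Hypothetical rows with a numeric column to split on.
rows = [{"distance": 0.4}, {"distance": 1.7}, {"distance": 2.2}]
near, far = relative_expression_split(rows, "distance", 1.0)[::-1]
print(len(near), len(far))
```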

 

QUESTION 11
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains
a unique solution that might meet the stated goals. Some question sets might have more than one correct solution,
while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not
appear in the review screen.
You are creating a model to predict the price of a student's artwork depending on the following variables:
the student's length of education, degree type, and art form.
You start by creating a linear regression model.
You need to evaluate the linear regression model.
Solution: Use the following metrics: Accuracy, Precision, Recall, F1 score, and AUC.
Does the solution meet the goal?
A. Yes
B. No
Correct Answer: B
Those are metrics for evaluating classification models. Instead, use: Mean Absolute Error, Root Mean Squared Error,
Relative Absolute Error, Relative Squared Error, and the Coefficient of Determination.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-model
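
The regression metrics named above follow standard textbook definitions, sketched here in plain Python (MAE, RMSE as root mean squared error, RAE, RSE, and the coefficient of determination R²). The function name and the sample values are illustrative only.

```python
import math


def regression_metrics(actual, predicted):
    """Compute common regression metrics from two equal-length lists."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    sq_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    rse = sum(sq_errors) / sq_dev
    return {
        "mae": sum(abs_errors) / n,
        "rmse": math.sqrt(sum(sq_errors) / n),
        "rae": sum(abs_errors) / abs_dev,
        "rse": rse,
        "r2": 1 - rse,
    }


print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```

Unlike accuracy or precision, all of these measure how far numeric predictions fall from the true values, which is why they suit a price-prediction model.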

 

QUESTION 12
You plan to create a speech recognition deep learning model.
The model must support the latest version of Python.
You need to recommend a deep learning framework for speech recognition to include in the Data Science Virtual
Machine (DSVM).
What should you recommend?
A. Rattle
B. TensorFlow
C. Weka
D. Scikit-learn
Correct Answer: B
TensorFlow is an open-source library for numerical computation and large-scale machine learning. It uses Python to
provide a convenient front-end API for building applications with the framework. TensorFlow can train and run deep
neural networks for handwritten digit classification, image recognition, word embeddings, recurrent neural networks,
sequence-to-sequence models for machine translation, natural language processing, and PDE (partial differential
equation) based simulations.
Incorrect Answers:
A: Rattle is the R analytical tool that gets you started with data analytics and machine learning.
C: Weka is visual data mining and machine learning software written in Java.
D: Scikit-learn is one of the most useful libraries for machine learning in Python. Built on NumPy, SciPy, and matplotlib, this
library contains a lot of efficient tools for machine learning and statistical modeling including classification, regression,
clustering, and dimensionality reduction.
Reference: https://www.infoworld.com/article/3278008/what-is-tensorflow-the-machine-learning-library-explained.html

 

QUESTION 13
You define a datastore named ml-data for an Azure Storage blob container. In the container, you have a folder named
train that contains a file named data.csv. You plan to use the file to train a model by using the Azure Machine Learning
SDK.
You plan to train the model by using the Azure Machine Learning SDK to run an experiment on local compute.
You define a DataReference object by running the following code:

[2021.3] lead4pass dp-100 practice test q13

You need to load the training data. Which code segment should you use?

[2021.3] lead4pass dp-100 practice test q13-1

A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Correct Answer: E
Example:
data_folder = args.data_folder
# Load Train and Test data
train_data = pd.read_csv(os.path.join(data_folder, 'data.csv'))
Reference:
https://www.element61.be/en/resource/azure-machine-learning-services-complete-toolbox-ai


Fulldumps shares the latest updated Microsoft DP-100 exam exercise questions, DP-100 dumps pdf for free.
All exam questions and answers come from the Lead4pass exam dumps shared part! Lead4pass updates throughout the year and shares a portion of your exam questions for free to help you understand the exam content and enhance your exam experience! Get the full Microsoft DP-100 exam dumps questions at https://www.lead4pass.com/dp-100.html (pdf&vce)



[Nov 2020] The latest update Microsoft DP-100 dumps and online practice tests from Lead4Pass

The latest Microsoft DP-100 dumps by Lead4Pass help you pass the DP-100 exam on the first attempt! Lead4Pass has updated its Microsoft DP-100 VCE and PDF dumps, with exam questions updated and answers corrected. Get the latest Lead4Pass DP-100 dumps with VCE and PDF: https://www.lead4pass.com/dp-100.html (Q&As: 220 dumps)

[Free DP-100 PDF] Microsoft DP-100 Dumps PDF can be collected on Google Drive shared by Lead4Pass:
https://drive.google.com/file/d/1QXv3KvzeDJVuXG6t3ug2JwblUwDGba9m/


Microsoft DP-100 Online Exam Practice Questions

QUESTION 1
A set of CSV files contains sales records. All the CSV files have the same data schema.
Each CSV file contains the sales record for a particular month and has the filename sales.csv. Each file is stored in a
folder that indicates the month and year when the data was recorded. The folders are in an Azure blob container for
which a datastore has been defined in an Azure Machine Learning workspace. The folders are organized in a parent
folder named sales to create the following hierarchical structure:

lead4pass dp-100 practice test q1

At the end of each month, a new folder with that month's sales file is added to the sales folder.
You plan to use the sales data to train a machine learning model based on the following requirements:
1. You must define a dataset that loads all of the sales data to date into a structure that can be easily converted to a
data frame.
2. You must be able to create experiments that use only data that was created before a specific previous month, ignoring
any data that was added after that month.
3. You must register the minimum number of datasets possible.
You need to register the sales data as a dataset in the Azure Machine Learning service workspace.
What should you do?
A. Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file
every month. Register the dataset with the name sales_dataset each month, replacing the existing dataset and
specifying a tag named month indicating the month and year it was registered. Use this dataset for all experiments.
B. Create a tabular dataset that references the datastore and specifies the path 'sales/*/sales.csv', register the
dataset with the name sales_dataset and a tag named month indicating the month and year it was registered, and use
this dataset for all experiments.
C. Create a new tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv'
file every month. Register the dataset with the name sales_dataset_MM-YYYY each month with appropriate MM and
YYYY values for the month and year. Use the appropriate month-specific dataset for experiments.
D. Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file.
Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating
the month and the year it was registered. Use this dataset for all experiments, identifying the version to be used based on
the month tag as necessary.
Correct Answer: B
Specify the path.
Example:
The following code gets the existing workspace and the desired datastore by name, and then passes the
datastore and file locations to the path parameter to create a new TabularDataset, weather_ds.
from azureml.core import Workspace, Datastore, Dataset
datastore_name = 'your datastore name'
# get an existing workspace
workspace = Workspace.from_config()
# retrieve an existing datastore in the workspace by name
datastore = Datastore.get(workspace, datastore_name)
# create a TabularDataset from 3 file paths in datastore
datastore_paths = [(datastore, 'weather/2018/11.csv'),
                   (datastore, 'weather/2018/12.csv'),
                   (datastore, 'weather/2019/*.csv')]
weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)
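
To see why a wildcard path keeps working as new monthly folders appear, the sketch below matches a list of made-up blob paths against the pattern 'sales/*/sales.csv' using the standard library. The helper name is an assumption, and `fnmatch` semantics are only an approximation of how a datastore resolves wildcards (for instance, its `*` also matches across '/').

```python
from fnmatch import fnmatchcase


def match_paths(paths, pattern):
    """Return the paths that match a shell-style wildcard pattern."""
    return [p for p in paths if fnmatchcase(p, pattern)]


# Hypothetical blob paths; a new month's folder is picked up automatically.
blob_paths = [
    "sales/01-2019/sales.csv",
    "sales/02-2019/sales.csv",
    "sales/readme.txt",
]
print(match_paths(blob_paths, "sales/*/sales.csv"))
```

Because the pattern is evaluated against whatever folders exist at load time, a single registered dataset covers all months to date.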

 

QUESTION 2
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains
a unique solution that might meet the stated goals. Some question sets might have more than one correct solution,
while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not
appear on the review screen.
You are creating a model to predict the price of a student's artwork depending on the following variables:
the student's length of education, degree type, and art form.
You start by creating a linear regression model.
You need to evaluate the linear regression model.
Solution: Use the following metrics: Accuracy, Precision, Recall, F1 score, and AUC.
Does the solution meet the goal?
A. Yes
B. No
Correct Answer: B
Those are metrics for evaluating classification models. Instead, use: Mean Absolute Error, Root Mean Squared Error,
Relative Absolute Error, Relative Squared Error, and the Coefficient of Determination.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-model

 

QUESTION 3
You deploy a model as an Azure Machine Learning real-time web service using the following code.

lead4pass dp-100 practice test q3

The deployment fails.
You need to troubleshoot the deployment failure by determining the actions that were performed during deployment and
identifying the specific action that failed.
Which code segment should you run?
A. service.get_logs()
B. service.state
C. service.serialize()
D. service.update_deployment_state()
Correct Answer: A
You can print out detailed Docker engine log messages from the service object. You can view the log for ACI, AKS, and
Local deployments. The following example demonstrates how to print the logs.
# if you already have the service object handy
print(service.get_logs())
# if you only know the name of the service (note there might be multiple services with the same name but different
# version number)
print(ws.webservices['mysvc'].get_logs())
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-deployment

 

QUESTION 4
DRAG DROP
You create a training pipeline using the Azure Machine Learning designer. You upload a CSV file that contains the data
from which you want to train your model.
You need to use the designer to create a pipeline that includes steps to perform the following tasks:
1. Select the training features using the pandas filter method.
2. Train a model based on the naive_bayes.GaussianNB algorithm.
3. Return only the Scored Labels column by using the query SELECT [Scored Labels] FROM t1;
Which modules should you use? To answer, drag the appropriate modules to the appropriate locations. Each module
name may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to
view content.
NOTE: Each correct selection is worth one point.
Select and Place:

lead4pass dp-100 practice test q4

Correct Answer:

lead4pass dp-100 practice test q4-1

 

QUESTION 5
You register a file dataset named csv_folder that references a folder. The folder includes multiple comma-separated
values (CSV) files in an Azure storage blob container.
You plan to use the following code to run a script that loads data from the file dataset. You create and instantiate the
following variables:

lead4pass dp-100 practice test q5

You have the following code:

lead4pass dp-100 practice test q5-1

You need to pass the dataset to ensure that the script can read the files it references. Which code segment should you
insert to replace the code comment?
A. inputs=[file_dataset.as_named_input('training_files')],
B. inputs=[file_dataset.as_named_input('training_files').as_mount()],
C. inputs=[file_dataset.as_named_input('training_files').to_pandas_dataframe()],
D. script_params={'--training_files': file_dataset},
Correct Answer: B
Example:
from azureml.train.estimator import Estimator
script_params = {
    # to mount files referenced by mnist dataset
    '--data-folder': mnist_file_dataset.as_named_input('mnist_opendataset').as_mount(),
    '--regularization': 0.5
}
est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                environment_definition=env,
                entry_script='train.py')
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-train-models-with-aml

 

QUESTION 6
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains
a unique solution that might meet the stated goals. Some question sets might have more than one correct solution,
while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not
appear on the review screen.
You create an Azure Machine Learning service datastore in a workspace. The datastore contains the following files:
1. /data/2018/Q1.csv
2. /data/2018/Q2.csv
3. /data/2018/Q3.csv
4. /data/2018/Q4.csv
5. /data/2019/Q1.csv
All files store data in the following format:
id,f1,f2,I
1,1,2,0
2,1,1,1
3,2,1,0
4,2,2,1
You run the following code:

lead4pass dp-100 practice test q6

Does the solution meet the goal?
A. Yes
B. No
Correct Answer: B
Use two file paths.
Use Dataset.Tabular.from_delimited_files instead of Dataset.File.from_files, as the data isn't cleansed.
Note:
A FileDataset references single or multiple files in your datastores or public URLs. If your data is already cleansed, and
ready to use in training experiments, you can download or mount the files to your compute as a FileDataset object.
A TabularDataset represents data in a tabular format by parsing the provided file or list of files. This provides you with
the ability to materialize the data into a pandas or Spark DataFrame so you can work with familiar data preparation and
training libraries without having to leave your notebook. You can create a TabularDataset object from .csv, .tsv,
.parquet, .jsonl files, and from SQL query results.
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets

 

QUESTION 7
You create a multi-class image classification deep learning model that uses a set of labeled images. You create a script
file named train.py that uses the PyTorch 1.3 framework to train the model.
You must run the script by using an estimator. The code must not require any additional Python libraries to be installed
in the environment for the estimator. The time required for model training must be minimized.
You need to define the estimator that will be used to run the script.
Which estimator type should you use?
A. TensorFlow
B. PyTorch
C. SKLearn
D. Estimator
Correct Answer: B
For PyTorch, TensorFlow, and Chainer tasks, Azure Machine Learning provides respective PyTorch, TensorFlow, and
Chainer estimators to simplify using these frameworks.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-ml-models

 

QUESTION 8
HOTSPOT
You need to identify the methods for dividing the data according to the testing requirements.
Which properties should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

lead4pass dp-100 practice test q8

 

QUESTION 9
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains
a unique solution that might meet the stated goals. Some question sets might have more than one correct solution,
while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not
appear on the review screen.
You train a classification model by using a logistic regression algorithm.
You must be able to explain the model's predictions by calculating the importance of each feature, both as an overall
global relative importance value and as a measure of local importance for a specific set of predictions.
You need to create an explainer that you can use to retrieve the required global and local feature importance values.
Solution: Create a PFIExplainer.
Does the solution meet the goal?
A. Yes
B. No
Correct Answer: B
Permutation Feature Importance Explainer (PFI): Permutation Feature Importance is a technique used to explain
classification and regression models. At a high level, the way it works is by randomly shuffling data one feature at a time
for the entire dataset and calculating how much the performance metric of interest changes. The larger the change, the
more important that feature is. PFI can explain the overall behavior of any underlying model but does not explain
individual predictions. Because the goal also requires local importance values, a PFIExplainer does not meet it.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability
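
The shuffling procedure described in the explanation above can be sketched in plain Python. This is a minimal illustration of the idea, not the Azure ML PFIExplainer itself; the toy model, the generated data, and the helper names `accuracy` and `permutation_importance` are assumptions made for the example.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows the model labels correctly."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)

def permutation_importance(model, rows, labels, n_features, seed=42):
    """Return the drop in accuracy caused by shuffling each feature column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        # Shuffle one column across the whole dataset, leaving the others intact.
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
        importances.append(baseline - accuracy(model, shuffled, labels))
    return importances

# Hypothetical toy model: predicts 1 when feature 0 exceeds 0.5 and ignores feature 1.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0

data_rng = random.Random(0)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [toy_model(row) for row in rows]

scores = permutation_importance(toy_model, rows, labels, n_features=2)
print(scores)  # feature 0 shows a large drop; feature 1 shows no drop
```

Note that the result is one score per feature for the whole dataset, which is why this technique yields global importance only.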

 

QUESTION 10
You create a batch inference pipeline by using the Azure ML SDK. You run the pipeline by using the following code:
from azureml.pipeline.core import Pipeline
from azureml.core.experiment import Experiment
pipeline = Pipeline(workspace=ws, steps=[parallelrun_step])
pipeline_run = Experiment(ws, 'batch_pipeline').submit(pipeline)
You need to monitor the progress of the pipeline execution.
What are two possible ways to achieve this goal? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
lead4pass dp-100 practice test q10

A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Correct Answer: DE
A batch inference job can take a long time to finish. This example monitors progress by using a Jupyter widget. You can
also manage the job's progress by using:
1. Azure Machine Learning Studio.
2. Console output from the PipelineRun object.
from azureml.widgets import RunDetails
RunDetails(pipeline_run).show()
pipeline_run.wait_for_completion(show_output=True)
Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-parallel-run-step#monitor-the-parallel-run-job

 

QUESTION 11
You are solving a classification task.
You must evaluate your model on a limited data sample by using k-fold cross-validation. You start by configuring a k
parameter as the number of splits.
You need to configure the k parameter for the cross-validation.
Which value should you use?
A. k=0.5
B. k=0.01
C. k=5
D. k=1
Correct Answer: C
K is the number of splits, so it must be a whole number of at least 2. This rules out 0.5, 0.01, and 1.
Leave One Out (LOO) cross-validation
Setting K = n (the number of observations) yields n-fold and is called leave-one-out cross-validation (LOO), a special
case of the K-fold approach.
LOO CV is sometimes useful but typically doesn't shake up the data enough. The estimates from each fold are highly
correlated and hence their average can have high variance. This is why the usual choice is K=5 or 10. It provides a
good compromise for the bias-variance tradeoff.
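
To make the role of k concrete, here is a minimal sketch of how k-fold splitting divides sample indices. The function name `k_fold_indices` is hypothetical, and the folds are contiguous and unshuffled for simplicity; library implementations usually offer shuffling and stratification as well.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not isinstance(k, int) or k < 2 or k > n_samples:
        raise ValueError("k must be an integer with 2 <= k <= n_samples")
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for train, test in folds:
    print("test:", test, "train size:", len(train))
```

With k=5 every sample is used for testing exactly once and for training four times. Passing k=1 or a fractional value raises an error here because no valid split exists.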

 

QUESTION 12
You plan to create a speech recognition deep learning model.
The model must support the latest version of Python.
You need to recommend a deep learning framework for speech recognition to include in the Data Science Virtual
Machine (DSVM).
What should you recommend?
A. Rattle
B. TensorFlow
C. Weka
D. Scikit-learn
Correct Answer: B
TensorFlow is an open-source library for numerical computation and large-scale machine learning. It uses Python to
provide a convenient front-end API for building applications with the framework. TensorFlow can train and run deep
neural networks for handwritten digit classification, image recognition, word embeddings, recurrent neural networks,
sequence-to-sequence models for machine translation, natural language processing, and PDE (partial differential
equation) based simulations.
Incorrect Answers:
A: Rattle is the R analytical tool that gets you started with data analytics and machine learning.
C: Weka is visual data mining and machine learning software written in Java.
D: Scikit-learn is one of the most useful libraries for machine learning in Python. Built on NumPy, SciPy, and matplotlib,
the library contains many efficient tools for machine learning and statistical modeling, including classification, regression,
clustering, and dimensionality reduction.
Reference: https://www.infoworld.com/article/3278008/what-is-tensorflow-the-machine-learning-library-explained.html

 

QUESTION 13
You are retrieving data from a large datastore by using Azure Machine Learning Studio.
You must create a subset of the data for testing purposes using a random sampling seed based on the system clock.
You add the Partition and Sample module to your experiment.
You need to select the properties for the module.
Which values should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area: lead4pass dp-100 practice test q13

Correct Answer:

lead4pass dp-100 practice test q13-1

Box 1: Sampling
Create a sample of data: This option supports simple random sampling or stratified random sampling.
This is useful if you want to create a smaller representative sample dataset for testing.
1. Add the Partition and Sample module to your experiment in Studio, and connect the dataset.
2. Partition or sample mode: Set this to Sampling.
3. Rate of sampling. See box 2 below.
Box 2: 0
Random seed for sampling: Optionally, type an integer to use as a seed value.
This option is important if you want the rows to be divided the same way every time. The default value is 0, meaning that
a starting seed is generated based on the system clock. This can lead to slightly different results each time you run the
experiment.
References: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample
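
The seed behavior described above can be imitated with Python's standard `random` module. This is only a sketch of the convention, not the module's implementation: the function `sample_rows` is hypothetical, and mapping seed 0 to "no fixed seed" is an assumption made to mirror the description.

```python
import random

def sample_rows(rows, rate, seed=0):
    """Simple random sample of rows; seed 0 mimics the 'use the system clock' default."""
    # Assumption: 0 means "no fixed seed". random.Random(None) seeds itself from
    # OS randomness or the current time, so repeated runs can differ.
    rng = random.Random(seed if seed != 0 else None)
    n = max(1, round(len(rows) * rate))
    return rng.sample(rows, n)

rows = list(range(100))
first = sample_rows(rows, rate=0.1, seed=123)
second = sample_rows(rows, rate=0.1, seed=123)
print(first == second)                     # True: a fixed non-zero seed repeats the sample
print(len(sample_rows(rows, rate=0.1)))    # 10 rows, chosen differently on each run
```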


latest updated Microsoft DP-100 exam questions from the Lead4Pass DP-100 dumps! 100% pass the DP-100 exam! Download Lead4Pass DP-100 VCE and PDF dumps: https://www.lead4pass.com/dp-100.html (Q&As: 220 dumps)

Get free Microsoft DP-100 dumps PDF online: https://drive.google.com/file/d/1QXv3KvzeDJVuXG6t3ug2JwblUwDGba9m/


[September 2020] New Microsoft DP-100 Brain dumps and online practice tests are shared from Lead4Pass (latest Updated)

The latest Microsoft DP-100 dumps by Lead4Pass helps you pass the DP-100 exam for the first time! Lead4Pass Latest Update Microsoft DP-100 VCE Dump and DP-100 PDF Dumps, Lead4Pass DP-100 Exam Questions Updated, Answers corrected! Get the latest LeadPass DP-100 dumps with Vce and PDF: https://www.lead4pass.com/dp-100.html (Q&As: 218 dumps)

[Free DP-100 PDF] Microsoft DP-100 Dumps PDF can be collected on Google Drive shared by Lead4Pass:
https://drive.google.com/file/d/1NyvH6g8U6EVoM_GGGdFTDqSc-NXheNLh/

[Lead4pass DP-100 Youtube] Microsoft DP-100 Dumps can be viewed on Youtube shared by Lead4Pass

Microsoft DP-100 Online Exam Practice Questions

QUESTION 1
HOTSPOT
You are performing sentiment analysis using a CSV file that includes 12,000 customer reviews written in a short
sentence format. You add the CSV file to Azure Machine Learning Studio and configure it as the starting point dataset
of an experiment. You add the Extract N-Gram Features from the Text module to the experiment to extract key phrases from the
customer review column in the dataset.
You must create a new n-gram text dictionary from the customer review text and set the maximum n-gram size to
trigrams.
You need to configure the Extract N-Gram Features from the Text module.
What should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area: lead4pass dp-100 exam questions q1

Correct Answer:

lead4pass dp-100 exam questions q1-1

 

QUESTION 2
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not
appear on the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Principal Components Analysis (PCA) sampling mode.
Does the solution meet the goal?
A. Yes
B. No
Correct Answer: B
Instead, use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.
Note: SMOTE is used to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.
Incorrect Answers:
The Principal Component Analysis module in Azure Machine Learning Studio (classic) is used to reduce the
dimensionality of your training data. The module analyzes your data and creates a reduced feature set that captures all
the information contained in the dataset, but in a smaller number of features.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/principal-component-analysis
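
The difference between SMOTE and plain duplication is easier to see in code. The sketch below shows the core interpolation step only, under simplifying assumptions: the function name `smote`, the tiny dataset, and the neighbor count are hypothetical, and this is not the Studio module's implementation.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic

# Hypothetical minority-class samples with two features each.
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3], [1.1, 1.1]]
new_points = smote(minority, n_new=6)
print(len(new_points))  # 6
```

Each synthetic point lies between two existing minority samples, so the new cases are similar to the originals without being exact copies.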

 

QUESTION 3
You are using a decision tree algorithm. You have trained a model that generalizes well at a tree depth equal to 10.
You need to select the bias and variance properties of the model with varying tree depth values.
Which properties should you select for each tree depth? To answer, select the appropriate options in the answer area.
Hot Area: 

lead4pass dp-100 exam questions q3

Correct Answer:

lead4pass dp-100 exam questions q3-1

In decision trees, the depth of the tree determines the variance. A complicated decision tree (e.g. deep) has low bias
and high variance.
Note: In statistics and machine learning, the bias-variance tradeoff is the property of a set of predictive models whereby
models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples
and vice versa. Increasing the bias will decrease the variance. Increasing the variance will decrease bias.
References: https://machinelearningmastery.com/gentle-introduction-to-the-bias-variance-trade-off-in-machine-learning/

 

QUESTION 4
You are building a recurrent neural network to perform binary classification.
The training loss, validation loss, training accuracy, and validation accuracy of each training epoch have been provided.
You need to identify whether the classification model is overfitted.
Which of the following is correct?
A. The training loss stays constant and the validation loss stays at a constant value close to the training loss value
when training the model.
B. The training loss decreases while the validation loss increases when training the model.
C. The training loss stays constant and the validation loss decreases when training the model.
D. The training loss increases while the validation loss decreases when training the model.
Correct Answer: B
An overfit model is one where performance on the train set is good and continues to improve, whereas performance on
the validation set improves to a point and then begins to degrade.
References: https://machinelearningmastery.com/diagnose-overfitting-underfitting-lstm-models/
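
The pattern in answer B can be checked programmatically from the per-epoch losses. The following is a rough sketch with made-up loss values; the function name `overfitting_epoch` and the `patience` threshold are assumptions, not part of any framework.

```python
def overfitting_epoch(train_loss, val_loss, patience=2):
    """Return the first epoch index at which validation loss starts rising for
    `patience` consecutive epochs while training loss keeps falling, else None."""
    streak = 0
    for epoch in range(1, len(train_loss)):
        train_improving = train_loss[epoch] < train_loss[epoch - 1]
        val_worsening = val_loss[epoch] > val_loss[epoch - 1]
        streak = streak + 1 if (train_improving and val_worsening) else 0
        if streak >= patience:
            return epoch - patience + 1
    return None

# Hypothetical loss curves for illustration only.
train = [0.90, 0.70, 0.55, 0.45, 0.38, 0.33]
val = [0.95, 0.80, 0.72, 0.74, 0.79, 0.85]
print(overfitting_epoch(train, val))  # 3: validation loss turns upward at epoch index 3
```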

 

QUESTION 5
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains
a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while
others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not
appear on the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.
Does the solution meet the goal?
A. Yes
B. No
Correct Answer: A
SMOTE is used to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a
better way of increasing the number of rare cases than simply duplicating existing cases.
References: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote

 

QUESTION 6
DRAG DROP
You need to define a process for penalty event detection.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions
to the answer area and arrange them in the correct order.
Select and Place:

lead4pass dp-100 exam questions q6

Correct Answer:

lead4pass dp-100 exam questions q6-1

 

QUESTION 7
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains
a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while
others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not
appear on the review screen.
You are analyzing a numerical dataset that contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set.
You need to analyze a full dataset to include all values.
Solution: Remove the entire column that contains the missing data point.
Does the solution meet the goal?
A. Yes
B. No
Correct Answer: B
Use the Multiple Imputation by Chained Equations (MICE) method.
References:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data

 

QUESTION 8
You are evaluating a completed binary classification machine learning model.
You need to use the precision as the evaluation metric.
Which visualization should you use?
A. Violin plot
B. Gradient descent
C. Box plot
D. Binary classification confusion matrix
Correct Answer: D
Incorrect Answers:
A: A violin plot is a visual that traditionally combines a box plot and a kernel density plot.
B: Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local
minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or
approximate gradient) of the function at the current point.
C: A box plot lets you see basic distribution information about your data, such as median, mean, range, and quartiles but
doesn't show you how your data looks throughout its range.
References: https://machinelearningknowledge.ai/confusion-matrix-and-performance-metrics-machine-learning/ 
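
Precision is read directly off the confusion matrix as TP / (TP + FP). A small sketch with made-up labels shows the calculation; the helper names `confusion_counts` and `precision` are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)     # 3 1 1 3
print(precision(tp, fp))  # 0.75
```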

 

QUESTION 9
HOTSPOT
You need to configure the Permutation Feature Importance module for the model training requirements.
What should you do? To answer, select the appropriate options in the dialog box in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area: lead4pass dp-100 exam questions q9

Correct Answer:

lead4pass dp-100 exam questions q9-1

Box 1: 500
For Random seed, type a value to use as seed for randomization. If you specify 0 (the default), a number is generated
based on the system clock.
A seed value is optional, but you should provide a value if you want reproducibility across runs of the same experiment.
Here we must replicate the findings.
Box 2: Mean Absolute Error
Scenario: Given a trained model and a test dataset, you must compute the Permutation Feature Importance scores of
feature variables. You need to set up the Permutation Feature Importance module to select the correct metric to
investigate the model's accuracy and replicate the findings.
Regression. Choose one of the following: Precision, Recall, Mean Absolute Error, Root Mean Squared Error, Relative
Absolute Error, Relative Squared Error, Coefficient of Determination
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/permutation-feature-importance

 

QUESTION 10
You have a feature set containing the following numerical features: X, Y, and Z.
The Pearson correlation coefficient (r-value) of X, Y, and Z features is shown in the following image:

lead4pass dp-100 exam questions q10

Use the drop-down menus to select the answer choice that answers each question based on the information presented
in the graphic. NOTE: Each correct selection is worth one point.
Hot Area:

lead4pass dp-100 exam questions q10-1

Correct Answer:

lead4pass dp-100 exam questions q10-2

Box 1: 0.859122
Box 2: a positive linear relationship
+1 indicates a strong positive linear relationship, -1 indicates a strong negative linear correlation, and 0 denotes no
linear relationship between the two variables.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/compute-linear-correlation
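
For reference, the r-value interpreted above is the Pearson correlation coefficient: the covariance of two variables divided by the product of their standard deviations. A minimal sketch with made-up data (the function name `pearson_r` is hypothetical):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # y rises with x: r is close to +1
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises: r is close to -1
print(r_up, r_down)
```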

 

QUESTION 11
You need to replace the missing data in the AccessibilityToHighway columns.
How should you configure the Clean Missing Data module? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Hot Area: lead4pass dp-100 exam questions q11 lead4pass dp-100 exam questions q11-1

Correct Answer:

lead4pass dp-100 exam questions q11-2 lead4pass dp-100 exam questions q11-3

Box 1: Replace using MICE
Replace using MICE: For each missing value, this option assigns a new value, which is calculated by using a method
described in the statistical literature as “Multivariate Imputation using Chained Equations” or “Multiple Imputation by
Chained Equations”. With a multiple imputation method, each variable with missing data is modeled conditionally using
the other variables in the data before filling in the missing values.
Scenario: The AccessibilityToHighway column in both datasets contains missing values. The missing data must be
replaced with new data so that it is modeled conditionally using the other variables in the data before filling in the
missing values.
Box 2: Propagate
Cols with all missing values: Indicates whether columns that consist entirely of missing values should be preserved in the output.
References:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data

 

QUESTION 12
HOTSPOT
You need to identify the methods for dividing the data according to the testing requirements.
Which properties should you select? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area: lead4pass dp-100 exam questions q12 lead4pass dp-100 exam questions q12-1

Correct Answer:

lead4pass dp-100 exam questions q12-2 lead4pass dp-100 exam questions q12-3

Scenario: Testing
You must produce multiple partitions of a dataset based on sampling using the Partition and Sample module in Azure
Machine Learning Studio.
Box 1: Assign to folds
Use Assign to folds option when you want to divide the dataset into subsets of the data. This option is also useful when
you want to create a custom number of folds for cross-validation, or to split rows into several groups.
Not Head: Use Head mode to get only the first n rows. This option is useful if you want to test a pipeline on a small
number of rows and don't need the data to be balanced or sampled in any way.
Not Sampling: The Sampling option supports simple random sampling or stratified random sampling. This is useful if
you want to create a smaller representative sample dataset for testing.
Box 2: Partition evenly
Specify the partitioner method: Indicate how you want data to be apportioned to each partition, using these options:
Partition evenly: Use this option to place an equal number of rows in each partition. To specify the number of output
partitions, type a whole number in the Specify number of folds to split evenly into text box.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/partition-and-sample

 

QUESTION 13
You are using C-Support Vector classification to do a multi-class classification with an unbalanced training dataset. The
C-Support Vector classification uses the Python code shown below:

lead4pass dp-100 exam questions q13

You need to evaluate the C-Support Vector classification code.
Which evaluation statement should you use? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

lead4pass dp-100 exam questions q13-1

Correct Answer:

lead4pass dp-100 exam questions q13-2

Box 1: Automatically adjust weights inversely proportional to class frequencies in the input data
The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in
the input data as n_samples / (n_classes * np.bincount(y)).
Box 2: Penalty parameter
Parameter: C : float, optional (default=1.0)
Penalty parameter C of the error term.
References:
https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
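
The "balanced" weighting formula quoted in Box 1, n_samples / (n_classes * np.bincount(y)), can be reproduced without any library to see what it does. The function name `balanced_class_weights` and the label counts below are made up for illustration.

```python
from collections import Counter

def balanced_class_weights(y):
    """Weight per class computed as n_samples / (n_classes * count_of_class)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Hypothetical unbalanced labels: six of class 0, three of class 1, one of class 2.
y = [0] * 6 + [1] * 3 + [2] * 1
weights = balanced_class_weights(y)
print(weights)  # the rarest class receives the largest weight
```

Because the weight is inversely proportional to class frequency, errors on rare classes cost more during training, which offsets the imbalance.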

