Amazon AWS Certified Machine Learning Engineer - Associate MLA-C01 Exam Questions

Page: 1 / 14
Total 207 questions
Question 1

A company wants to deploy an Amazon SageMaker AI model that can queue requests. The model needs to handle payloads of up to 1 GB that take up to 1 hour to process. The model must return an inference for each request. The model also must scale down when no requests are available to process.

Which inference option will meet these requirements?



Answer : A

Amazon SageMaker Asynchronous Inference is specifically designed for long-running inference requests and large payloads. It supports payload sizes up to 1 GB and processing times of up to 1 hour, while automatically queuing requests.

Asynchronous inference stores results in Amazon S3 and allows clients to retrieve inference outputs after processing completes. It also supports auto scaling down to zero when there are no incoming requests, reducing cost.
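The flow above can be sketched as a request builder. The endpoint name and S3 URI below are hypothetical placeholders; in practice the resulting dictionary would be passed to `boto3.client("sagemaker-runtime").invoke_endpoint_async(**params)`.

```python
def build_async_request(endpoint_name: str, input_s3_uri: str) -> dict:
    """Assemble parameters for a SageMaker asynchronous inference invocation.

    The payload is not sent inline; it must already be uploaded to Amazon S3,
    which is how asynchronous inference supports request bodies up to 1 GB.
    The endpoint name and S3 path here are illustrative, not from the scenario.
    """
    return {
        "EndpointName": endpoint_name,
        "InputLocation": input_s3_uri,  # S3 object holding the request payload
        "ContentType": "application/json",
    }

params = build_async_request(
    "fraud-model-async", "s3://example-bucket/payloads/req-001.json"
)
print(params["InputLocation"])  # s3://example-bucket/payloads/req-001.json
```

After invocation, the service writes the result to the configured S3 output location and the client polls or subscribes for completion, rather than holding a connection open for up to an hour.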

Batch transform is intended for offline, bulk inference and does not return per-request results in an asynchronous request-response pattern. Serverless and real-time inference have strict payload size and timeout limits that do not support 1-hour processing.

Therefore, asynchronous inference is the only SageMaker inference option that meets all stated requirements.


Question 2

A company wants to develop an ML model by using tabular data from its customers. The data contains meaningful ordered features with sensitive information that should not be discarded. An ML engineer must ensure that the sensitive data is masked before another team starts to build the model.

Which solution will meet these requirements?



Answer : B

AWS Glue DataBrew provides an easy-to-use interface for preparing and transforming data, including masking or obfuscating sensitive information. It offers built-in data masking features, allowing the ML engineer to handle sensitive data securely while retaining its structure and meaning. This solution is efficient and requires minimal coding, making it ideal for ensuring sensitive data is masked before model building begins.
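DataBrew applies transformations like this through its visual interface; the snippet below only illustrates the kind of substitution masking it performs, preserving the record's structure while hiding the sensitive value. The column names are illustrative, not from the scenario.

```python
import re

def mask_account_number(value: str) -> str:
    """Replace all but the last four digits with '#'.

    This mirrors DataBrew's substitution-masking transform: the field keeps
    a recognizable shape for downstream modeling, but the sensitive digits
    are no longer recoverable.
    """
    digits = re.sub(r"\D", "", value)          # strip separators
    return "#" * (len(digits) - 4) + digits[-4:]

row = {"customer_tier": 3, "account_number": "1234-5678-9012"}
row["account_number"] = mask_account_number(row["account_number"])
print(row)  # ordered feature kept, sensitive field masked: ########9012
```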


Question 3

An ML engineer is training an XGBoost regression model in Amazon SageMaker AI. The ML engineer conducts several rounds of hyperparameter tuning with random grid search. After these rounds of tuning, the error rate on the test hold-out dataset is much larger than the error rate on the training dataset.

The ML engineer needs to make changes before running the hyperparameter grid search again.

Which changes will improve the model's performance? (Select TWO.)



Answer : B, D

The scenario describes a classic overfitting problem: the XGBoost model performs well on the training dataset but poorly on the test hold-out dataset. According to AWS Machine Learning and XGBoost documentation, overfitting occurs when a model is too complex and learns noise and patterns specific to the training data rather than generalizable relationships.

One effective way to address overfitting is to reduce model complexity. Option B, reducing the number of features, simplifies the hypothesis space and lowers the risk of fitting spurious correlations. Feature reduction is a recommended best practice when the model shows a large generalization gap between training and test error.

Another effective method is to increase regularization. Option D, increasing the L2 regularization parameter (lambda in XGBoost), penalizes large weights and discourages overly complex trees. AWS documentation explicitly notes that L2 regularization helps improve generalization by smoothing model parameters and reducing variance.
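The shrinkage effect of L2 regularization can be seen in a toy one-feature ridge regression (not XGBoost itself, but the same penalty; in XGBoost the analogous knob is the `lambda` hyperparameter). The closed-form weight is w = Σxy / (Σx² + λ), so a larger λ pulls the weight toward zero, trading a little bias for lower variance.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form one-feature ridge (L2) regression weight:
    w = sum(x*y) / (sum(x^2) + lambda).
    Larger lambda shrinks w toward zero, reducing variance."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]          # roughly y = 2x with noise
for lam in (0.0, 1.0, 10.0):
    print(lam, round(ridge_weight(xs, ys, lam), 3))
# 0.0 -> 2.036, 1.0 -> 1.9, 10.0 -> 1.188: the weight shrinks as lambda grows
```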

Option A would worsen overfitting by increasing complexity. Option C is incorrect because reducing the number of training samples generally increases overfitting risk. Option E would decrease regularization strength and further degrade test performance.

Therefore, reducing feature complexity and increasing L2 regularization are the correct changes.


Question 4

A company has a large collection of chat recordings from customer interactions after a product release. An ML engineer needs to create an ML model to analyze the chat data. The ML engineer needs to determine the success of the product by reviewing customer sentiments about the product.

Which action should the ML engineer take to complete the evaluation in the LEAST amount of time?



Answer : C

Amazon Comprehend is a fully managed natural language processing (NLP) service that includes a built-in sentiment analysis feature. It can quickly and efficiently analyze text data to determine whether the sentiment is positive, negative, neutral, or mixed. Using Amazon Comprehend requires minimal setup and provides accurate results without the need to train and deploy custom models, making it the fastest and most efficient solution for this task.
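For a large chat archive, the recordings would be analyzed in batches. Amazon Comprehend's BatchDetectSentiment API accepts up to 25 documents per call, so a small helper (stdlib only; the actual call via `boto3.client("comprehend").batch_detect_sentiment(TextList=batch, LanguageCode="en")` is noted in the comment rather than executed) can split the workload:

```python
def batch_chats(chats, batch_size=25):
    """Split chat transcripts into batches for Comprehend's
    BatchDetectSentiment API, which accepts up to 25 documents per call.
    Each batch would be sent as TextList with LanguageCode='en' using
    boto3.client('comprehend').batch_detect_sentiment(...).
    """
    return [chats[i:i + batch_size] for i in range(0, len(chats), batch_size)]

chats = [f"chat transcript {i}" for i in range(60)]
batches = batch_chats(chats)
print(len(batches), [len(b) for b in batches])  # 3 [25, 25, 10]
```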


Question 5

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

Which AWS service or feature can aggregate the data from the various data sources?



Answer : A

Problem Description:

The dataset includes multiple data sources:

Transaction logs and customer profiles in Amazon S3.

Tables in an on-premises MySQL database.

There is a class imbalance in the dataset and interdependencies among features that need to be addressed.

The solution requires data aggregation from diverse sources for centralized processing.

Why AWS Lake Formation?

AWS Lake Formation is designed to simplify the process of aggregating, cataloging, and securing data from various sources, including S3, relational databases, and other on-premises systems.

It integrates with AWS Glue for data ingestion and ETL (Extract, Transform, Load) workflows, making it a robust choice for aggregating data from Amazon S3 and on-premises MySQL databases.

How It Solves the Problem:

Data Aggregation: Lake Formation collects data from diverse sources, such as S3 and MySQL, and consolidates it into a centralized data lake.

Cataloging and Discovery: Automatically crawls and catalogs the data into a searchable catalog, which the ML engineer can query for analysis or modeling.

Data Transformation: Prepares data using Glue jobs to handle preprocessing tasks such as addressing class imbalance (e.g., oversampling, undersampling) and handling interdependencies among features.

Security and Governance: Offers fine-grained access control, ensuring secure and compliant data management.
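As a concrete example of the class-imbalance preprocessing mentioned above, a Glue job could apply naive random oversampling. The sketch below is a minimal stdlib illustration of that one step; the `is_fraud` field name is hypothetical.

```python
import random

def oversample_minority(rows, label_key="is_fraud", seed=0):
    """Duplicate minority-class rows at random until both classes have
    equal counts. This is the simplest oversampling strategy; the field
    name 'is_fraud' is illustrative, not from the scenario."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key]]
    neg = [r for r in rows if not r[label_key]]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra

data = [{"amount": a, "is_fraud": f}
        for a, f in [(10, 0), (25, 0), (500, 1), (30, 0), (40, 0)]]
balanced = oversample_minority(data)
print(sum(r["is_fraud"] for r in balanced), len(balanced))  # 4 8
```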

Steps to Implement Using AWS Lake Formation:

Step 1: Set up Lake Formation and register data sources, including the S3 bucket and on-premises MySQL database.

Step 2: Use AWS Glue to create ETL jobs to transform and prepare data for the ML pipeline.

Step 3: Query and access the consolidated data lake using services such as Athena or SageMaker for further ML processing.

Why Not Other Options?

Amazon EMR Spark jobs: While EMR can process large-scale data, it is geared toward big data analytics and transformation workloads; it does not provide the built-in cataloging, cross-source registration, and governance that Lake Formation offers for aggregating data into a centralized data lake.

Amazon Kinesis Data Streams: Kinesis is designed for real-time streaming data, not batch data aggregation across diverse sources.

Amazon DynamoDB: DynamoDB is a NoSQL database and is not suitable for aggregating data from multiple sources like S3 and MySQL.

Conclusion: AWS Lake Formation is the most suitable service for aggregating data from S3 and on-premises MySQL databases, preparing the data for downstream ML tasks, and addressing challenges like class imbalance and feature interdependencies.

AWS Lake Formation Documentation

AWS Glue for Data Preparation


Question 6

A company has a binary classification model in production. An ML engineer needs to develop a new version of the model.

The new model version must maximize correct predictions of positive labels and negative labels. The ML engineer must use a metric to recalibrate the model to meet these requirements.

Which metric should the ML engineer use for the model recalibration?



Answer : A

Accuracy measures the proportion of correctly predicted labels (both positive and negative) out of the total predictions. It is the appropriate metric when the goal is to maximize the correct predictions of both positive and negative labels. However, it assumes that the classes are balanced; if the classes are imbalanced, other metrics like precision, recall, or specificity may be more relevant depending on the specific needs.
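In confusion-matrix terms, accuracy = (TP + TN) / (TP + TN + FP + FN), which directly rewards correct predictions of both classes. The counts below are hypothetical:

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / all predictions: the fraction of positive AND
    negative labels predicted correctly, which is exactly what this
    requirement asks to maximize."""
    return (tp + tn) / (tp + tn + fp + fn)

# illustrative confusion-matrix counts for a recalibrated model
print(accuracy(tp=80, tn=90, fp=10, fn=20))  # 170 / 200 = 0.85
```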


Question 7

A construction company is using Amazon SageMaker AI to train specialized custom object detection models to identify road damage. The company uses images from multiple cameras. The images are stored as JPEG objects in an Amazon S3 bucket.

The images need to be pre-processed by using computationally intensive computer vision techniques before the images can be used in the training job. The company needs to optimize data loading and pre-processing in the training job. The solution cannot affect model performance or increase compute or storage resources.

Which solution will meet these requirements?



Answer : D

AWS documentation recommends using RecordIO format with lazy loading to optimize data input pipelines for image-based training workloads. RecordIO is a binary data format that enables sequential reads, reducing I/O overhead and improving throughput during training.

By converting JPEG images into RecordIO format, the training job can read data more efficiently from Amazon S3. Lazy loading ensures that only the required data is loaded into memory when needed, which optimizes CPU utilization during computationally intensive preprocessing steps.
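The I/O advantage comes from the record layout. The snippet below is a deliberately simplified sketch of the length-prefixed binary idea behind RecordIO-style formats (each record is a 4-byte length header followed by the payload, so records can be read strictly sequentially); real pipelines would use a tool such as MXNet's `im2rec` rather than this toy code.

```python
import io
import struct

def write_records(stream, payloads):
    """Write length-prefixed binary records: a 4-byte little-endian length
    followed by the record bytes. A simplified sketch of the RecordIO idea,
    not the actual on-disk format."""
    for p in payloads:
        stream.write(struct.pack("<I", len(p)))
        stream.write(p)

def read_records(stream):
    """Read records back sequentially; sequential reads with no per-file
    round trips are what make this layout efficient for streaming
    training data."""
    out = []
    while True:
        header = stream.read(4)
        if not header:
            return out
        (n,) = struct.unpack("<I", header)
        out.append(stream.read(n))

buf = io.BytesIO()
write_records(buf, [b"jpeg-bytes-1", b"jpeg-bytes-2"])
buf.seek(0)
print(read_records(buf))  # [b'jpeg-bytes-1', b'jpeg-bytes-2']
```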

Option A (file mode) results in many small S3 GET requests, which can become a bottleneck for large image datasets. Option B changes training behavior and can negatively affect convergence and performance. Option C reduces image quality, which directly impacts model accuracy and violates the requirement.

AWS SageMaker documentation explicitly highlights RecordIO and lazy loading as best practices for high-performance image training pipelines, especially when preprocessing is CPU-intensive.

Therefore, Option D is the correct and AWS-aligned solution.

