A retail company wants to build a recommendation system for the company's website. The system needs to provide recommendations for existing users and needs to base those recommendations on each user's past browsing history. The system also must filter out any items that the user previously purchased.
Which solution will meet these requirements with the LEAST development effort?
Answer : C
Amazon Personalize is a fully managed machine learning service that makes it easy for developers to create personalized user experiences at scale. It uses the same recommender system technology that Amazon uses to create its own personalized recommendations.

Amazon Personalize provides several pre-built recipes that can be used to train models for different use cases. The USER_PERSONALIZATION recipe is designed to provide personalized recommendations for existing users based on their past interactions with items. The PERSONALIZED_RANKING recipe is designed to re-rank a list of items for a user based on their preferences. The USER_PERSONALIZATION recipe is more suitable for this use case because it can generate recommendations for each user without requiring a list of candidate items.

To filter out the items that the user previously purchased, a real-time filter can be created and applied to the campaign. A real-time filter is a dynamic filter that uses the latest interaction data to exclude items from the recommendations. By using Amazon Personalize, the development effort is minimized because the service handles the data processing, model training, and deployment automatically. The web application can use the GetRecommendations API operation to get real-time recommendations from the campaign.

References:
USER_PERSONALIZATION recipe
PERSONALIZED_RANKING recipe
Filtering recommendations
GetRecommendations API operation
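As an illustration of the call described above, here is a minimal sketch with boto3; the ARNs, user ID, and filter name are hypothetical, not from the question:

```python
# A minimal sketch (the ARNs and user ID are illustrative assumptions) of
# requesting real-time recommendations with a purchase-exclusion filter applied.
import boto3

personalize_runtime = boto3.client("personalize-runtime")

# The filter would have been created beforehand with an expression such as:
#   EXCLUDE ItemID WHERE Interactions.EVENT_TYPE IN ("purchase")
response = personalize_runtime.get_recommendations(
    campaignArn="arn:aws:personalize:us-east-1:123456789012:campaign/retail-recs",
    userId="user-42",
    filterArn="arn:aws:personalize:us-east-1:123456789012:filter/exclude-purchased",
    numResults=10,
)

for item in response["itemList"]:
    print(item["itemId"])
```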
A machine learning specialist is applying a linear least squares regression model to a dataset with 1,000 records and 50 features. Prior to training, the specialist notices that two features are perfectly linearly dependent.
Why could this be an issue for the linear least squares regression model?
Answer : B
In linear least squares regression, the design matrix (often denoted X) must have full column rank to ensure a unique solution. When two or more features are perfectly linearly dependent, the data exhibits perfect multicollinearity, causing the matrix XᵀX to become singular (non-invertible). This singularity prevents the computation of a unique solution for the regression coefficients.
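As a quick illustration (a synthetic NumPy sketch, not part of the original question), the following shows that perfect linear dependence between two columns makes XᵀX rank-deficient:

```python
# A minimal sketch (NumPy, synthetic data) showing why perfect linear
# dependence between features makes the Gram matrix X^T X singular.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=(1000, 1))
x2 = 3.0 * x1                       # perfectly linearly dependent on x1
X = np.hstack([x1, x2])             # two of the dataset's features

gram = X.T @ X
print(np.linalg.matrix_rank(gram))  # prints 1, not 2: rank-deficient
# np.linalg.inv(gram) would raise numpy.linalg.LinAlgError: Singular matrix
```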
A company operates large cranes at a busy port. The company plans to use machine learning (ML) for predictive maintenance of the cranes to avoid unexpected breakdowns and to improve productivity.
The company already uses sensor data from each crane to monitor the health of the cranes in real time. The sensor data includes rotation speed, tension, energy consumption, vibration, pressure, and temperature for each crane. The company contracts AWS ML experts to implement an ML solution.
Which potential findings would indicate that an ML-based solution is suitable for this scenario? (Select TWO.)
Answer : D, E
References:
1: Machine Learning Techniques for Predictive Maintenance
2: A Guide to Predictive Maintenance & Machine Learning
3: Machine Learning for Predictive Maintenance: Reinventing Asset Upkeep
4: Predictive Maintenance with Machine Learning: A Complete Guide
5: Machine Learning for Predictive Maintenance - AWS Online Tech Talks
A company is building a demand forecasting model based on machine learning (ML). In the development stage, an ML specialist uses an Amazon SageMaker notebook to perform feature engineering during work hours that consumes low amounts of CPU and memory resources. A data engineer uses the same notebook to perform data preprocessing once a day on average that requires very high memory and completes in only 2 hours. The data preprocessing is not configured to use GPU. All the processes are running well on an ml.m5.4xlarge notebook instance.
The company receives an AWS Budgets alert that the billing for this month exceeds the allocated budget.
Which solution will result in the MOST cost savings?
Answer : C
The best solution to reduce the cost of the notebook instance and the data preprocessing job is to change the notebook instance type to a smaller general-purpose instance, stop the notebook when it is not in use, and run the data preprocessing on an ml.r5 instance with the same memory size as the ml.m5.4xlarge instance by using Amazon SageMaker Processing. This solution results in the most cost savings because the feature engineering consumes little CPU and memory, so a smaller general-purpose instance is sufficient for interactive work; stopping the notebook outside work hours eliminates charges for idle time; and SageMaker Processing provisions the memory-optimized ml.r5 instance only for the roughly 2-hour daily job, so the high-memory capacity is billed only while the job runs rather than around the clock.
The other options are not as effective as option C.
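As a rough sketch of what option C looks like in practice (the role ARN, script name, and S3 paths are illustrative assumptions, not from the question), the daily job can be submitted through the SageMaker Python SDK so the memory-optimized instance exists only while the job runs:

```python
# A minimal sketch (names and paths are illustrative assumptions) of moving the
# daily preprocessing off the notebook and onto a memory-optimized instance
# with Amazon SageMaker Processing.
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput

processor = SKLearnProcessor(
    framework_version="1.2-1",
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical role
    instance_type="ml.r5.2xlarge",  # 64 GiB of memory, same as ml.m5.4xlarge
    instance_count=1,
)

# The instance exists only for the ~2-hour run, so it is billed only for that time.
processor.run(
    code="preprocessing.py",  # hypothetical preprocessing script
    inputs=[ProcessingInput(
        source="s3://example-bucket/raw/",
        destination="/opt/ml/processing/input",
    )],
    outputs=[ProcessingOutput(
        source="/opt/ml/processing/output",
        destination="s3://example-bucket/processed/",
    )],
)
```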
References:
Manage Notebook Instances - Amazon SageMaker
Amazon EC2 Pricing - Reserved Instances
A manufacturing company wants to monitor its devices for anomalous behavior. A data scientist has trained an Amazon SageMaker scikit-learn model that classifies a device as normal or anomalous based on its 4-day telemetry. The 4-day telemetry of each device is collected in a separate file and is placed in an Amazon S3 bucket once every hour. The total time to run the model across the telemetry for all devices is 5 minutes.
What is the MOST cost-effective solution for the company to use to run the model across the telemetry for all the devices?
Answer : A
The task involves periodic inference (hourly batches of telemetry data) that can be processed in bulk, with the total processing time under 5 minutes. The most cost-effective solution for this batch prediction scenario is SageMaker Batch Transform, as it avoids the need for always-on endpoints.
''Use Batch Transform when you: ... Don't need a persistent endpoint and want to process batches of data at once. This is typically the most cost-efficient way to process large volumes of input data periodically.''
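A minimal sketch of such a batch job with the SageMaker Python SDK follows; the model name, instance type, and S3 paths are illustrative assumptions, not from the question:

```python
# A minimal sketch (model name and paths are illustrative assumptions) of
# running the hourly scoring as a SageMaker Batch Transform job instead of
# keeping a persistent endpoint.
from sagemaker.transformer import Transformer

transformer = Transformer(
    model_name="device-anomaly-sklearn",  # hypothetical registered model
    instance_count=1,
    instance_type="ml.m5.large",
    output_path="s3://example-bucket/predictions/",
)

# Scores every telemetry file under the prefix, then tears the instance down;
# compute is billed only for the few minutes the job runs.
transformer.transform(
    data="s3://example-bucket/telemetry/",
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()
```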
A Machine Learning Specialist is assigned a TensorFlow project using Amazon SageMaker for training, and needs to continue working for an extended period with no Wi-Fi access.
Which approach should the Specialist use to continue working?
Answer : B
Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. SageMaker provides a variety of tools and frameworks to support the entire machine learning workflow, from data preparation to model deployment.
One of the tools that SageMaker offers is the Amazon SageMaker Python SDK, which is a high-level library that simplifies the interaction with SageMaker APIs and services. The SageMaker Python SDK allows you to write code in Python and use popular frameworks such as TensorFlow, PyTorch, MXNet, and more. You can use the SageMaker Python SDK to create and manage SageMaker resources such as notebook instances, training jobs, endpoints, and feature stores.
If you need to continue working on a TensorFlow project using SageMaker for training without Wi-Fi access, the best approach is to download the TensorFlow Docker container used in SageMaker from GitHub to your local environment, and use the SageMaker Python SDK to test the code. This way, you can ensure that your code is compatible with the SageMaker environment and avoid any potential issues when you upload your code to SageMaker and start the training job. You can also use the same code to deploy your model to a SageMaker endpoint when you have Wi-Fi access again.
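One hedged way to sketch this workflow is the SageMaker Python SDK's local mode, which runs the downloaded TensorFlow training container with Docker instead of launching cloud instances; the entry-point script, role ARN, and data path below are illustrative assumptions:

```python
# A minimal sketch (entry-point script, role ARN, and data path are illustrative
# assumptions) of testing TensorFlow training code offline in the SageMaker
# Python SDK's local mode, which runs the training container locally in Docker.
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="train.py",                               # hypothetical training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical; a full ARN avoids an IAM lookup
    instance_count=1,
    instance_type="local",                                # run in the local Docker container
    framework_version="2.13",
    py_version="py310",
)

# Train against local files; no AWS compute is launched.
estimator.fit({"training": "file:///home/user/data/train"})
```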
References:
SageMaker Docker GitHub repository
SageMaker Studio Image Build CLI
SageMaker Python SDK installation guide
SageMaker Python SDK TensorFlow documentation
An online store is predicting future book sales by using a linear regression model that is based on past sales data. The data includes duration, a numerical feature that represents the number of days that a book has been listed in the online store. A data scientist performs an exploratory data analysis and discovers that the relationship between book sales and duration is skewed and non-linear.
Which data transformation step should the data scientist take to improve the predictions of the model?
Answer : C
Quantile binning is a data transformation technique that can be used to handle skewed and non-linear numerical features. It splits a feature's values into bins at the percentiles of the data, so each bin contains approximately the same number of observations, and each value is replaced by its bin identifier. This transforms the feature into a more uniform distribution that can improve the performance of linear models. Quantile binning can also reduce the impact of outliers and noise in the data.
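As a quick illustration of quantile binning (a synthetic pandas sketch, not part of the original question):

```python
# A minimal sketch (synthetic data, pandas) of quantile binning a skewed
# numerical feature such as duration before fitting a linear model.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"duration": rng.exponential(scale=30, size=1000)})  # right-skewed

# qcut splits on percentiles, so each of the 10 bins holds ~100 rows.
df["duration_bin"] = pd.qcut(df["duration"], q=10, labels=False)

print(df["duration_bin"].value_counts().sort_index())  # roughly uniform counts
```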
One-hot encoding, Cartesian product transformation, and normalization are not suitable for this scenario. One-hot encoding is used to transform categorical features into binary features. Cartesian product transformation is used to create new features by combining existing features. Normalization is used to scale numerical features to a standard range, but it does not change the shape of the distribution.

References:
Data Transformations for Machine Learning
Quantile Binning Transformation