A manufacturing company wants to create product descriptions in multiple languages.
Which AWS service will automate this task?
A. Amazon Translate
B. Amazon Transcribe
C. Amazon Kendra
D. Amazon Polly
Answer : A
The manufacturing company needs to create product descriptions in multiple languages, which requires automated language translation. Amazon Translate is a fully managed service that uses machine learning to provide high-quality translation between languages, making it the ideal solution for this task.
Exact Extract from AWS AI Documents:
From the Amazon Translate Developer Guide:
'Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. It can be used to automatically translate text, such as product descriptions, into multiple languages to reach a global audience.'
(Source: Amazon Translate Developer Guide, Introduction to Amazon Translate)
Detailed Explanation:
Option A: Amazon Translate. This is the correct answer. Amazon Translate automates the translation of text into multiple languages, directly addressing the company's need to create product descriptions in different languages.
Option B: Amazon Transcribe. Amazon Transcribe converts speech to text, which is unrelated to translating text into multiple languages. This option is incorrect.
Option C: Amazon Kendra. Amazon Kendra is an intelligent search service that uses machine learning to provide answers from documents, not for translating text. This option is irrelevant.
Option D: Amazon Polly. Amazon Polly is a text-to-speech service that generates spoken audio from text, not for translating text into other languages. This option does not meet the requirements.
Amazon Translate Developer Guide: Introduction to Amazon Translate (https://docs.aws.amazon.com/translate/latest/dg/what-is.html)
AWS AI Practitioner Learning Path: Module on Natural Language Processing Services
AWS Documentation: Language Translation with Amazon Translate (https://aws.amazon.com/translate/)
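As a sketch of what this looks like in code, the snippet below calls the boto3 Translate client's `translate_text` operation once per target language. The helper name, the sample text, and the chosen language codes are illustrative, not from the question:

```python
def build_translation_requests(text, source="en", targets=("es", "de", "ja")):
    # One TranslateText request per target language; codes here are examples.
    return [
        {"Text": text, "SourceLanguageCode": source, "TargetLanguageCode": target}
        for target in targets
    ]

def translate_description(text, targets=("es", "de", "ja")):
    import boto3  # deferred import so build_translation_requests works offline
    client = boto3.client("translate")
    return {
        req["TargetLanguageCode"]: client.translate_text(**req)["TranslatedText"]
        for req in build_translation_requests(text, targets=targets)
    }
```

Calling `translate_description("Stainless steel water bottle, 750 ml")` would return a dict mapping each target language code to its translated description.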
A company is developing an ML model to predict customer churn.
Which evaluation metric will assess the model's performance on a binary classification task such as predicting churn?
A. F1 score
B. Mean squared error (MSE)
C. R-squared
D. Time used to train the model
Answer : A
The company is developing an ML model to predict customer churn, a binary classification task (churn or no churn). The F1 score is an evaluation metric that balances precision and recall, making it suitable for assessing the performance of binary classification models, especially when dealing with imbalanced datasets, which is common in churn prediction.
Exact Extract from AWS AI Documents:
From the Amazon SageMaker Developer Guide:
'The F1 score is a metric for evaluating binary classification models, combining precision and recall into a single value. It is particularly useful for tasks like churn prediction, where class imbalance may exist, ensuring the model performs well on both positive and negative classes.'
(Source: Amazon SageMaker Developer Guide, Model Evaluation Metrics)
Detailed Explanation:
Option A: F1 score. This is the correct answer. The F1 score is ideal for binary classification tasks like churn prediction, as it measures the model's ability to correctly identify both churners and non-churners.
Option B: Mean squared error (MSE). MSE is used for regression tasks to measure the average squared difference between predicted and actual values, not for binary classification.
Option C: R-squared. R-squared is a metric for regression models, indicating how well the model explains the variability of the target variable. It is not applicable to classification tasks.
Option D: Time used to train the model. Training time is not an evaluation metric for model performance; it measures the duration of training, not the model's accuracy or effectiveness.
Amazon SageMaker Developer Guide: Model Evaluation Metrics (https://docs.aws.amazon.com/sagemaker/latest/dg/model-evaluation.html)
AWS AI Practitioner Learning Path: Module on Model Performance and Evaluation
AWS Documentation: Metrics for Classification (https://aws.amazon.com/machine-learning/)
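The balance between precision and recall that the F1 score captures, F1 = 2PR / (P + R), is easy to see in a small pure-Python sketch. The labels below are made-up toy data (1 = churned, 0 = retained) chosen to mimic the class imbalance typical of churn:

```python
def f1_score(y_true, y_pred):
    # Counts over the positive (churn) class only.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Imbalanced toy labels: few churners, many retained customers.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
score = f1_score(y_true, y_pred)  # precision 1.0, recall 2/3 -> F1 0.8
```

Note that plain accuracy on these labels would be 0.9 despite the model missing a third of the churners, which is exactly why F1 is preferred here.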
An AI practitioner is developing a prompt for an Amazon Titan model. The model is hosted on Amazon Bedrock. The AI practitioner is using the model to solve numerical reasoning challenges. The AI practitioner adds the following phrase to the end of the prompt: "Show your work by explaining your reasoning step by step."
Which prompt engineering technique is the AI practitioner using?
A. Chain-of-thought prompting
B. Prompt injection
C. Few-shot prompting
D. Prompt templating
Answer : A
Chain-of-thought prompting is a prompt engineering technique where you instruct the model to explain its reasoning step by step, which is particularly useful for tasks involving logic, math, or reasoning.
A is correct: Asking the model to 'explain its reasoning step by step' directly invokes chain-of-thought prompting, as documented in AWS and generative AI literature.
B is unrelated (prompt injection is a security concern).
C (few-shot) provides examples, but doesn't specifically require step-by-step reasoning.
D (templating) is about structuring the prompt format.
'Chain-of-thought prompting elicits step-by-step explanations from LLMs, which improves performance on complex reasoning tasks.' (Reference: Amazon Bedrock Prompt Engineering Guide, AWS Certified AI Practitioner Study Guide)
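A minimal sketch of applying this technique when invoking a Titan text model on Bedrock is shown below. The request-body shape (`inputText`, `textGenerationConfig`) follows the Titan text model format; the exact suffix wording and the helper name are illustrative:

```python
import json

# Chain-of-thought suffix appended to the end of the prompt.
COT_SUFFIX = "Show your work and explain your reasoning step by step."

def build_titan_body(question, max_tokens=512):
    # Titan text models on Bedrock take a single inputText string plus
    # generation settings; temperature 0 keeps reasoning output stable.
    prompt = f"{question}\n\n{COT_SUFFIX}"
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {"maxTokenCount": max_tokens, "temperature": 0.0},
    })
```

The resulting string would be passed as `body` to the `bedrock-runtime` client's `invoke_model` call with a Titan model ID such as `amazon.titan-text-express-v1`.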
A company wants to identify harmful language in the comments section of social media posts by using an ML model. The company will not use labeled data to train the model. Which strategy should the company use to identify harmful language?
Answer : B
Amazon Comprehend toxicity detection is a managed NLP capability that can analyze text for harmful or toxic language using pre-trained models, so no labeled data or custom training is needed.
B is correct: Comprehend's toxicity detection API is designed for this use case, works out-of-the-box, and requires no data labeling or model training.
A (Rekognition) is for image and video content moderation.
C would require labeled data for training.
D (Polly) is for text-to-speech, not content moderation.
'Amazon Comprehend can detect toxicity in text with pre-trained models, requiring no labeled training data.' (Reference: Amazon Comprehend Toxicity Detection, AWS AI Practitioner Official Guide)
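A sketch of this flow with boto3's Comprehend `detect_toxic_content` operation is below. The helper names and the threshold value are illustrative, and the 10-segment cap in the comment is a conservative assumption, so check the service quotas for the actual per-request limit:

```python
def build_toxicity_request(comments, language="en"):
    # DetectToxicContent takes a list of text segments; capping at 10
    # per call here is an assumed batch limit, not a documented value.
    return {
        "TextSegments": [{"Text": comment} for comment in comments[:10]],
        "LanguageCode": language,
    }

def flag_harmful_comments(comments, threshold=0.5):
    import boto3  # deferred import so build_toxicity_request works offline
    client = boto3.client("comprehend")
    response = client.detect_toxic_content(**build_toxicity_request(comments))
    # Each result carries an overall Toxicity score between 0 and 1.
    return [
        comment
        for comment, result in zip(comments, response["ResultList"])
        if result["Toxicity"] >= threshold
    ]
```

Because the model is pre-trained, the application only supplies raw comment text: there is no labeling or training step anywhere in the pipeline.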
A company wants to build a lead prioritization application for its employees to contact potential customers. The application must give employees the ability to view and adjust the weights assigned to different variables in the model based on domain knowledge and expertise.
Which ML model type meets these requirements?
A. Logistic regression model
B. Deep learning model built on principal components
C. K-nearest neighbors (k-NN) model
D. Neural network
Answer : A
The company needs an ML model for a lead prioritization application where employees can view and adjust the weights assigned to different variables based on domain knowledge. Logistic regression is a linear model that assigns interpretable weights to input features, making it easy for users to understand and modify these weights. This interpretability and adjustability make it suitable for the requirements.
Exact Extract from AWS AI Documents:
From the AWS AI Practitioner Learning Path:
'Logistic regression is a supervised learning algorithm used for classification tasks. It is highly interpretable, as it assigns weights to each feature, allowing users to understand and adjust the importance of different variables based on domain expertise.'
(Source: AWS AI Practitioner Learning Path, Module on Machine Learning Algorithms)
Detailed Explanation:
Option A: Logistic regression model. This is the correct answer. Logistic regression provides interpretable coefficients (weights) for each feature, enabling employees to view and adjust them based on domain knowledge, meeting the application's requirements.
Option B: Deep learning model built on principal components. Deep learning models, even when using principal components, are complex and lack interpretability. The weights in such models are not easily adjustable by users, making this option unsuitable.
Option C: K-nearest neighbors (k-NN) model. k-NN is a non-parametric model that does not assign explicit weights to features. It relies on distance metrics, which are not easily adjustable based on domain knowledge, so it does not meet the requirements.
Option D: Neural network. Neural networks are highly complex and lack interpretability, as their weights are not directly tied to input features in a human-understandable way. Adjusting weights based on domain knowledge is impractical, making this option incorrect.
AWS AI Practitioner Learning Path: Module on Machine Learning Algorithms
Amazon SageMaker Developer Guide: Logistic Regression (https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html)
AWS Documentation: Interpretable Machine Learning Models (https://aws.amazon.com/machine-learning/)
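The interpretability argument above is concrete in code: a logistic regression score is just a sigmoid over a weighted sum, so each weight is a plain number an employee can read and edit. The feature names and weight values below are invented for illustration:

```python
import math

def lead_score(features, weights, bias=0.0):
    # Logistic regression: sigmoid of a weighted sum of the input features.
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

lead = {"site_visits": 3.0, "emails_opened": 1.0, "demo_requested": 1.0}

# Weights an employee can inspect directly, one per variable.
weights = {"site_visits": 0.2, "emails_opened": 0.1, "demo_requested": 1.5}
baseline = lead_score(lead, weights)

# Domain expertise says demo requests matter even more: raise that weight.
weights["demo_requested"] = 2.5
adjusted = lead_score(lead, weights)  # strictly higher than baseline
```

The same edit against a neural network would mean changing thousands of weights with no one-to-one mapping to input variables, which is why options B and D fail the requirement.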
Why does overfitting occur in ML models?
A. The training dataset does not represent all possible input values.
B. The model contains a regularization method.
C. The model training stops early because of an early stopping criterion.
D. The training dataset contains too many features.
Answer : A
Overfitting occurs when an ML model learns the training data too well, including noise and patterns that do not generalize to new data. A key cause of overfitting is when the training dataset does not represent all possible input values, leading the model to over-specialize on the limited data it was trained on, failing to generalize to unseen data.
Exact Extract from AWS AI Documents:
From the Amazon SageMaker Developer Guide:
'Overfitting often occurs when the training dataset is not representative of the broader population of possible inputs, causing the model to memorize specific patterns, including noise, rather than learning generalizable features.'
(Source: Amazon SageMaker Developer Guide, Model Evaluation and Overfitting)
Detailed Explanation:
Option A: The training dataset does not represent all possible input values. This is the correct answer. If the training dataset lacks diversity and does not cover the range of possible inputs, the model overfits by learning patterns specific to the training data, failing to generalize.
Option B: The model contains a regularization method. Regularization methods (e.g., L2 regularization) are used to prevent overfitting, not cause it. This option is incorrect.
Option C: The model training stops early because of an early stopping criterion. Early stopping is a technique to prevent overfitting by halting training when performance on a validation set degrades. It does not cause overfitting.
Option D: The training dataset contains too many features. While too many features can contribute to overfitting (e.g., by increasing model complexity), this is less directly tied to overfitting than a non-representative dataset. The dataset's representativeness is the primary cause.
Amazon SageMaker Developer Guide: Model Evaluation and Overfitting (https://docs.aws.amazon.com/sagemaker/latest/dg/model-evaluation.html)
AWS AI Practitioner Learning Path: Module on Model Performance and Evaluation
AWS Documentation: Understanding Overfitting (https://aws.amazon.com/machine-learning/)
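A deliberately extreme toy model makes the failure mode visible: a 1-nearest-neighbour "memorizer" trained on a non-representative sample scores perfectly on its training data and collapses on unseen inputs. The parity task and the input values are invented purely for illustration:

```python
def predict(x, train):
    # 1-nearest-neighbour: pure memorization of the training set.
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def accuracy(data, train):
    return sum(predict(x, train) == y for x, y in data) / len(data)

def target(x):
    return x % 2  # the true concept: parity of the input

# Non-representative training set: only even inputs were ever observed,
# so the model never sees a single positive (odd) example.
train = [(x, target(x)) for x in (0, 2, 4, 6, 8)]
test = [(x, target(x)) for x in (1, 3, 5, 7)]

train_accuracy = accuracy(train, train)  # perfect on the data it memorized
test_accuracy = accuracy(test, train)    # fails on every unseen odd input
```

No regularization or early stopping can rescue this model: the gap comes entirely from the training sample not covering the input space, which is the point of option A.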
A company has petabytes of unlabeled customer data to use for an advertisement campaign. The company wants to classify its customers into tiers to advertise and promote the company's products.
Which methodology should the company use to meet these requirements?
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. Reinforcement learning from human feedback (RLHF)
Answer : B
Unsupervised learning is the correct methodology for classifying customers into tiers when the data is unlabeled, as it does not require predefined labels or outputs.
Unsupervised Learning:
This type of machine learning is used when the data has no labels or pre-defined categories. The goal is to identify patterns, clusters, or associations within the data.
In this case, the company has petabytes of unlabeled customer data and needs to classify customers into different tiers. Unsupervised learning techniques like clustering (e.g., K-Means, Hierarchical Clustering) can group similar customers based on various attributes without any prior knowledge or labels.
Why Option B is Correct:
Handling Unlabeled Data: Unsupervised learning is specifically designed to work with unlabeled data, making it ideal for the company's need to classify customer data.
Customer Segmentation: Techniques in unsupervised learning can be used to find natural groupings within customer data, such as identifying high-value vs. low-value customers or segmenting based on purchasing behavior.
Why Other Options are Incorrect:
A. Supervised learning: Requires labeled data with input-output pairs to train the model, which is not suitable since the company's data is unlabeled.
C. Reinforcement learning: Focuses on training an agent to make decisions by maximizing some notion of cumulative reward, which does not align with the company's need for customer classification.
D. Reinforcement learning from human feedback (RLHF): Similar to reinforcement learning but involves human feedback to refine the model's behavior; it is also not appropriate for classifying unlabeled customer data.
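The clustering step described above can be sketched with a tiny one-dimensional K-Means in pure Python. The spend figures are synthetic, the deterministic initialization is a simplification of real K-Means (which usually randomizes starts), and the function assumes k >= 2:

```python
def kmeans_1d(values, k, iterations=20):
    # Deterministic init: centroids at evenly spaced points of the sorted
    # data (assumes k >= 2); real K-Means typically uses random restarts.
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: each centroid moves to the mean of its members.
        centroids = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Annual spend per customer (synthetic): three natural tiers emerge
# with no labels ever being provided.
spend = [20, 25, 30, 480, 500, 520, 990, 1010]
centroids, tiers = kmeans_1d(spend, k=3)
```

At petabyte scale the company would run a managed equivalent (for example the SageMaker K-Means algorithm) over many customer attributes rather than one, but the unsupervised principle is the same: tiers fall out of the data's own structure.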