Amazon AWS Certified Generative AI Developer - Professional AIP-C01 Exam Questions

Page: 1 / 14
Total 107 questions
Question 1

An ecommerce company is developing a generative AI (GenAI) solution that uses Amazon Bedrock with Anthropic Claude to recommend products to customers. Customers report that some recommended products are not available for sale or are not relevant. Customers also report long response times for some recommendations.

The company confirms that most customer interactions are unique and that the solution recommends products not present in the product catalog.

Which solution will meet these requirements?



Answer : C

Option C is the correct solution because it directly addresses both correctness and performance issues by grounding the model's responses in authoritative product data using Retrieval-Augmented Generation (RAG). Amazon Bedrock Knowledge Bases are designed to connect foundation models to trusted enterprise data sources, ensuring that generated responses are constrained to known, validated content.

By ingesting the product catalog into a knowledge base, the GenAI application retrieves only products that actually exist in the catalog. This prevents hallucinated or unavailable recommendations, which is a common issue when models rely solely on prompt instructions without retrieval grounding. RAG ensures that the model's output is based on retrieved facts rather than learned generalizations.

Setting the latency field of the performanceConfig parameter to optimized enables Amazon Bedrock to prioritize lower-latency retrieval and inference paths, improving responsiveness for real-time recommendation scenarios. This directly addresses the reported performance issues without requiring provisioned throughput or caching strategies, which are ineffective when most interactions are unique.
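As an illustration of how the two pieces fit together, the sketch below assembles a RetrieveAndGenerate-style request that grounds generation in a knowledge base and asks for latency-optimized inference. The knowledge base ID and model ARN are placeholders, and the exact placement of performanceConfig inside generationConfiguration is an assumption based on the API shape; the actual boto3 call is shown only as a comment.

```python
# Hypothetical sketch: a retrieval-grounded request with latency-optimized
# inference. KB_PLACEHOLDER and MODEL_ARN_PLACEHOLDER are not real values.

def build_recommendation_request(query: str, kb_id: str, model_arn: str) -> dict:
    """Assemble kwargs for a bedrock-agent-runtime RetrieveAndGenerate call."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,    # product catalog ingested here
                "modelArn": model_arn,       # e.g. a Claude model ARN
                "generationConfiguration": {
                    # Request the lower-latency inference path.
                    "performanceConfig": {"latency": "optimized"},
                },
            },
        },
    }

request = build_recommendation_request(
    "Recommend waterproof hiking boots", "KB_PLACEHOLDER", "MODEL_ARN_PLACEHOLDER"
)
# In production:
# client = boto3.client("bedrock-agent-runtime")
# response = client.retrieve_and_generate(**request)
```

Because retrieval is scoped to the knowledge base, the model can only cite products that exist in the ingested catalog.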

Option A improves safety and latency predictability but does not ensure recommendations are limited to valid products. Option B relies on prompt constraints, which are not sufficient to prevent hallucinations. Option D introduces additional validation and caching layers but increases complexity and does not improve generation relevance.

Therefore, Option C best resolves both relevance and latency challenges using AWS-native, low-maintenance GenAI integration patterns.


Question 2

A company is developing a generative AI (GenAI) application that uses Amazon Bedrock foundation models. The application has several custom tool integrations. The application has experienced unexpected token consumption surges despite consistent user traffic.

The company needs a solution that uses Amazon Bedrock model invocation logging to monitor InputTokenCount and OutputTokenCount metrics. The solution must detect unusual patterns in tool usage and identify which specific tool integrations cause abnormal token consumption. The solution must also automatically adjust thresholds as traffic patterns change.

Which solution will meet these requirements?



Answer : C

Option C best meets the requirements by combining native Amazon Bedrock logging with adaptive monitoring and minimal operational overhead. Amazon Bedrock model invocation logging can be sent directly to CloudWatch Logs, where detailed fields such as InputTokenCount, OutputTokenCount, and tool invocation metadata are captured for each request.

CloudWatch metric filters allow extraction of structured metrics from logs, including tool-specific token consumption patterns. By defining filters per tool integration, the company can isolate which tools are responsible for increased token usage without building custom log-processing pipelines.

CloudWatch anomaly detection provides automatic baseline modeling and dynamic thresholds based on historical traffic patterns. Unlike static alarms, anomaly detection adapts as usage evolves, making it ideal for applications with changing workloads or seasonal usage patterns. This directly satisfies the requirement to automatically adjust thresholds as traffic patterns change.
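The two mechanisms above can be sketched as request parameters: one metric filter per tool integration, and an alarm whose threshold is an anomaly-detection band rather than a static value. The log group name, metric namespace, and the toolName field in the filter pattern are assumptions about how the invocation logs are structured; inputTokenCount and outputTokenCount are real fields in Bedrock invocation log records.

```python
# Hypothetical sketch: per-tool token metrics plus a dynamic anomaly alarm.

def metric_filter_params(tool_name: str) -> dict:
    """Kwargs for logs.put_metric_filter: extract output tokens for one tool."""
    return {
        "logGroupName": "/aws/bedrock/modelinvocations",      # assumed name
        "filterName": f"{tool_name}-output-tokens",
        # Match records for this tool and pull out the token count.
        "filterPattern": (
            f'{{ $.output.outputTokenCount = * && $.toolName = "{tool_name}" }}'
        ),
        "metricTransformations": [{
            "metricName": f"{tool_name}OutputTokens",
            "metricNamespace": "GenAI/TokenUsage",            # assumed namespace
            "metricValue": "$.output.outputTokenCount",
        }],
    }

def anomaly_alarm_params(tool_name: str) -> dict:
    """Kwargs for cloudwatch.put_metric_alarm with an anomaly-detection band."""
    return {
        "AlarmName": f"{tool_name}-token-anomaly",
        "ComparisonOperator": "GreaterThanUpperThreshold",
        "EvaluationPeriods": 3,
        "Metrics": [
            {"Id": "m1", "MetricStat": {
                "Metric": {"Namespace": "GenAI/TokenUsage",
                           "MetricName": f"{tool_name}OutputTokens"},
                "Period": 300, "Stat": "Sum"}},
            # Band width of 2 standard deviations, learned from history.
            {"Id": "ad1", "Expression": "ANOMALY_DETECTION_BAND(m1, 2)",
             "Label": "expected token range"},
        ],
        "ThresholdMetricId": "ad1",   # alarm compares m1 against the band
    }
```

Because the threshold is the ANOMALY_DETECTION_BAND expression, no one re-tunes alarm values as traffic patterns shift.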

When abnormal token consumption occurs, anomaly detection alarms trigger immediately, enabling rapid investigation and remediation. Because this solution uses fully managed AWS services without custom analytics jobs or manual threshold tuning, it significantly reduces operational effort.

Option A fails to adapt to changing patterns. Option B introduces batch analysis and delayed insights. Option D requires manual intervention and custom code, increasing maintenance burden.

Therefore, Option C provides the most scalable, adaptive, and low-maintenance solution for monitoring and controlling token consumption in Amazon Bedrock-based applications.


Question 3

A financial services company is building a customer support application that retrieves relevant financial regulation documents from a database based on semantic similarity to user queries. The application must integrate with Amazon Bedrock to generate responses. The application must search documents in English, Spanish, and Portuguese. The application must filter documents by metadata such as publication date, regulatory agency, and document type.

The database stores approximately 10 million document embeddings. To minimize operational overhead, the company wants a solution that minimizes management and maintenance effort while providing low-latency responses for real-time customer interactions.

Which solution will meet these requirements?



Answer : A

Option A is the optimal solution because it provides scalable semantic search, rich metadata filtering, and tight integration with Amazon Bedrock while minimizing operational overhead. Amazon OpenSearch Serverless is designed for high-volume, low-latency search workloads and removes the need to manage clusters, capacity planning, or scaling policies.

With support for vector search and structured metadata filtering, OpenSearch Serverless enables efficient similarity search across 10 million embeddings while applying constraints such as language, publication date, regulatory agency, and document type. This is critical for financial services use cases where relevance and compliance depend on precise filtering.
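The combination of similarity search and metadata constraints can be sketched as a single OpenSearch k-NN query with an embedded filter. The field names (embedding, agency, doc_type, published) are illustrative placeholders for whatever mapping the index actually uses, and the query vector is truncated for brevity.

```python
# Hypothetical sketch: OpenSearch k-NN search with metadata pre-filtering.

def build_knn_query(query_vector: list, agency: str,
                    doc_type: str, since: str) -> dict:
    """Top-5 semantic neighbors restricted by metadata constraints."""
    return {
        "size": 5,
        "query": {
            "knn": {
                "embedding": {                 # assumed vector field name
                    "vector": query_vector,    # embedding of the user query
                    "k": 5,
                    # Restrict candidates before the similarity search runs.
                    "filter": {
                        "bool": {
                            "must": [
                                {"term": {"agency": agency}},
                                {"term": {"doc_type": doc_type}},
                                {"range": {"published": {"gte": since}}},
                            ]
                        }
                    },
                }
            }
        },
    }

q = build_knn_query([0.1, 0.2, 0.3], "SEC", "regulation", "2020-01-01")
```

Embedding the filter inside the knn clause (rather than post-filtering results) keeps recall stable: the engine still returns k valid neighbors instead of discarding matches after the fact.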

Integrating OpenSearch Serverless with Amazon Bedrock Knowledge Bases enables a fully managed RAG workflow. The knowledge base handles embedding generation, retrieval, and context assembly, while Amazon Bedrock generates responses using a foundation model. This significantly reduces custom glue code and operational complexity.

Multilingual support is handled at the embedding and retrieval layer, allowing documents in English, Spanish, and Portuguese to be searched semantically without language-specific query logic. OpenSearch's distributed architecture ensures consistent low-latency responses for real-time customer interactions.

Option B increases operational overhead by requiring database tuning and scaling for vector workloads. Option C does not support advanced metadata filtering, which is a key requirement. Option D introduces unnecessary complexity and is not optimized for large-scale semantic document retrieval.

Therefore, Option A best meets the requirements for performance, scalability, multilingual support, and minimal management effort in an Amazon Bedrock-based RAG application.


Question 4

Example Corp provides a personalized video generation service that millions of enterprise customers use. Customers generate marketing videos by submitting prompts to the company's proprietary generative AI (GenAI) model. To improve output relevance and personalization, Example Corp wants to enhance the prompts by using customer-specific context such as product preferences, customer attributes, and business history.

The customers have strict data governance requirements. The customers must retain full ownership and control over their own data. The customers do not require real-time access. However, semantic accuracy must be high and retrieval latency must remain low to support customer experience use cases.

Example Corp wants to minimize architectural complexity in its integration pattern. Example Corp does not want to deploy and manage services in each customer's environment unless necessary.

Which solution will meet these requirements?



Answer : A

Option A is the correct solution because Amazon Q Business is explicitly designed to provide secure, governed access to enterprise data while preserving customer ownership and control. Each customer maintains their own Amazon Q Business index, which ensures that data never leaves the customer's control boundary unless explicitly shared through approved access mechanisms.

By designating Example Corp as a data accessor, customers can allow controlled, auditable access to their indexed content through secure APIs. This model satisfies strict data governance requirements, including data ownership, access transparency, and revocation capability. Customers do not need to expose raw data or deploy infrastructure in Example Corp's environment.

Amazon Q Business provides high semantic accuracy through managed indexing, ranking, and retrieval optimizations. Because real-time access is not required, this approach avoids the complexity and latency challenges of live federated retrieval while still delivering fast query performance suitable for customer experience use cases.

Option B introduces unnecessary operational complexity by requiring real-time MCP servers per customer. Option C requires customers to manage Amazon Bedrock knowledge bases and enable cross-account access, which increases integration complexity and governance risk. Option D requires shared Amazon Kendra indexes across accounts, which complicates access control and data ownership boundaries.

Therefore, Option A provides the cleanest, lowest-overhead architecture that meets data governance, accuracy, performance, and scalability requirements while minimizing operational burden for both Example Corp and its customers.


Question 5

A company upgraded its Amazon Bedrock-powered foundation model (FM) that supports a multilingual customer service assistant. After the upgrade, the assistant exhibited inconsistent behavior across languages. The assistant began generating different responses in some languages when presented with identical questions.

The company needs a solution to detect and address similar problems for future updates. The evaluation must be completed within 45 minutes for all supported languages. The evaluation must process at least 15,000 test conversations in parallel. The evaluation process must be fully automated and integrated into the CI/CD pipeline. The solution must block deployment if quality thresholds are not met.

Which solution will meet these requirements?



Answer : D

Option D is the correct solution because it directly evaluates multilingual output consistency and quality in an automated, scalable, and deployment-gating workflow. Amazon Bedrock model evaluation jobs are designed to run large-scale, repeatable evaluations against defined datasets and to produce quantitative metrics that can be used as objective release criteria.

The core issue is semantic inconsistency across languages for equivalent inputs. The most reliable way to detect this is to create standardized test conversations where each language version expresses the same intent and constraints. Running those tests through the updated model and comparing results with similarity metrics (for example, semantic similarity between expected and actual answers, or between language variants) surfaces regressions that infrastructure testing cannot detect.
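The comparison step described above reduces to computing similarity between each language variant's answer and a baseline, then gating on a threshold. The sketch below uses toy vectors and an assumed 0.85 threshold; in a real pipeline the vectors would come from an embedding model and the function's return value would decide whether the CI/CD stage blocks deployment.

```python
import math

# Hypothetical sketch: gate on cross-language semantic consistency.
# Embeddings here are toy values, not real model output.

def cosine_similarity(a: list, b: list) -> float:
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def consistent_across_languages(embeddings: dict, threshold: float = 0.85) -> bool:
    """True only if every language's answer embedding stays within the
    similarity threshold of the English baseline."""
    baseline = embeddings["en"]
    return all(
        cosine_similarity(baseline, vec) >= threshold
        for lang, vec in embeddings.items() if lang != "en"
    )

answers = {
    "en": [0.9, 0.1, 0.2],
    "es": [0.88, 0.12, 0.22],   # near-identical meaning -> passes
    "pt": [0.1, 0.9, 0.3],      # divergent answer -> fails the gate
}
# A CI/CD stage would block deployment when this returns False.
print(consistent_across_languages(answers))   # False (pt diverges)
```

Running 15,000 such comparisons is embarrassingly parallel, which is how the 45-minute budget becomes feasible.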

Bedrock evaluation jobs support running evaluations at scale and are well suited for processing large datasets quickly. By parallelizing evaluation runs across languages and conversations, the company can meet the 45-minute requirement while executing at least 15,000 conversations. Because the process is standardized, it also allows consistent baseline comparisons across releases.

Applying hallucination thresholds ensures that answers remain grounded and do not introduce fabricated details, which is particularly important when language-specific behavior shifts after a model upgrade. Integrating evaluation jobs into the CI/CD pipeline enables fully automated execution on every model or configuration update. The pipeline can enforce a hard quality gate that blocks deployment if thresholds are not met, preventing regressions from reaching production.

Option A focuses on performance and infrastructure bottlenecks, not multilingual response quality. Option B is post-deployment and too slow to prevent regressions. Option C normalizes inputs but does not measure multilingual output equivalence or provide robust, quantitative gating.

Therefore, Option D best meets the automation, scale, timing, and deployment-blocking requirements.


Question 6

A company is using Amazon Bedrock to develop an AI-powered application that uses a foundation model (FM) that supports cross-Region inference and provisioned throughput. The application must serve users in Europe and North America with consistently low latency. The application must comply with data residency regulations that require European user data to remain within Europe-based AWS Regions.

During testing, the application experiences service degradation when Regional traffic spikes reach service quotas. The company needs a solution that maintains application resilience and minimizes operational complexity.

Which solution will meet these requirements?



Answer : B

Option B is the most appropriate solution because it directly uses Amazon Bedrock cross-Region inference profiles, which are designed to provide resilience and load distribution while respecting data residency boundaries. Cross-Region inference profiles allow applications to distribute inference requests across multiple Regions within a defined geographic boundary, such as Europe or North America, without requiring custom failover logic.

By specifying geographical codes in the inference profile ID, the application ensures that European user data is processed only within Europe-based Regions, satisfying regulatory requirements. At the same time, Bedrock automatically routes requests to healthy Regions within that geography when traffic spikes or service quotas are reached, improving availability and maintaining low latency.
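The geographic scoping described above comes down to choosing the right profile prefix for each user. The sketch below uses the real eu./us. prefix convention and a real Claude 3 Haiku model ID, but the mapping table and function name are illustrative.

```python
# Hypothetical sketch: pick a geography-scoped cross-Region inference
# profile ID so European traffic never leaves Europe-based Regions.

GEO_PREFIX = {"europe": "eu", "north_america": "us"}   # assumed mapping
BASE_MODEL = "anthropic.claude-3-haiku-20240307-v1:0"

def inference_profile_id(user_geography: str) -> str:
    """Prefix the model ID with the cross-Region profile code for the
    user's geography; Bedrock then routes only within that boundary."""
    prefix = GEO_PREFIX[user_geography]
    return f"{prefix}.{BASE_MODEL}"

print(inference_profile_id("europe"))
# eu.anthropic.claude-3-haiku-20240307-v1:0
```

Each regional API Gateway stage would pass its geography's profile ID as the modelId in invocation requests, so no custom failover code is needed.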

Using separate Amazon API Gateway HTTP APIs for Europe and North America provides a clean, simple routing layer that directs users to the appropriate regional inference profile. This avoids complex custom routing or retry logic in application code and minimizes operational overhead.

Option A relies on custom routing and manual monitoring, which increases complexity and does not provide automatic resilience. Option C introduces custom retry and fallback logic that risks violating data residency requirements if misconfigured. Option D requires significant application-level failover logic and adds operational burden with Global Accelerator configuration.

Therefore, Option B best meets the requirements for low latency, data residency compliance, resilience during traffic spikes, and minimal operational complexity.


Question 7

A company is using Amazon Bedrock and Anthropic Claude 3 Haiku to develop an AI assistant. The AI assistant normally processes 10,000 requests each hour but experiences surges of up to 30,000 requests each hour during peak usage periods. The AI assistant must respond within 2 seconds while operating across multiple AWS Regions.

The company observes that during peak usage periods, the AI assistant experiences throughput bottlenecks that cause increased latency and occasional request timeouts. The company must resolve the performance issues.

Which solution will meet this requirement?



Answer : B

Option B is the correct solution because it directly addresses both throughput bottlenecks and latency requirements using native Amazon Bedrock performance optimization features that are designed for real-time, high-volume generative AI workloads.

Amazon Bedrock supports cross-Region inference profiles, which allow applications to transparently route inference requests across multiple AWS Regions. During peak usage periods, traffic is automatically distributed to Regions with available capacity, reducing throttling, request queuing, and timeout risks. This approach aligns with AWS guidance for building highly available, low-latency GenAI applications that must scale elastically across geographic boundaries.

Token batching further improves efficiency by combining multiple inference requests into a single model invocation where applicable. AWS Generative AI documentation highlights batching as a key optimization technique to reduce per-request overhead, improve throughput, and better utilize model capacity. This is especially effective for lightweight, low-latency models such as Claude 3 Haiku, which are designed for fast responses and high request volumes.
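The batching idea can be sketched as a simple client-side step that groups pending prompts before invocation. The batch size of 4 is an arbitrary assumption; real limits depend on the model's context window and per-request latency budget.

```python
# Hypothetical sketch: group queued prompts into fixed-size batches so
# each model invocation amortizes per-request overhead.

def batch_prompts(prompts: list, batch_size: int = 4) -> list:
    """Split a queue of prompts into consecutive fixed-size batches."""
    return [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]

pending = [f"request-{n}" for n in range(10)]
batches = batch_prompts(pending)
print(len(batches))   # 3 batches: sizes 4, 4, and 2
```

Each batch would then be sent through the cross-Region inference profile, keeping individual responses within the 2-second budget while raising effective throughput.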

Option A does not meet the requirement because purchasing provisioned throughput in a single Region creates a regional bottleneck and does not address multi-Region availability or traffic spikes beyond reserved capacity. Retries increase load and latency rather than resolving the root cause.

Option C improves application-layer scaling but does not solve model-side throughput limits. Client-side round-robin routing lacks awareness of real-time model capacity and can still send traffic to saturated Regions.

Option D is unsuitable because batch inference with asynchronous retrieval is designed for offline or non-interactive workloads. It cannot meet a strict 2-second response time requirement for an interactive AI assistant.

Therefore, Option B provides the most effective and AWS-aligned solution to achieve low latency, global scalability, and high throughput during peak usage periods.

