A company is using Amazon Bedrock to develop an AI-powered application that uses a foundation model (FM) that supports cross-Region inference and provisioned throughput. The application must serve users in Europe and North America with consistently low latency. The application must comply with data residency regulations that require European user data to remain within Europe-based AWS Regions.
During testing, the application experiences service degradation when Regional traffic spikes reach service quotas. The company needs a solution that maintains application resilience and minimizes operational complexity.
Which solution will meet these requirements?
Answer : B
Option B is the most appropriate solution because it directly uses Amazon Bedrock cross-Region inference profiles, which are designed to provide resilience and load distribution while respecting data residency boundaries. Cross-Region inference profiles allow applications to distribute inference requests across multiple Regions within a defined geographic boundary, such as Europe or North America, without requiring custom failover logic.
By specifying geographical codes in the inference profile ID, the application ensures that European user data is processed only within Europe-based Regions, satisfying regulatory requirements. At the same time, Bedrock automatically routes requests to healthy Regions within that geography when traffic spikes or service quotas are reached, improving availability and maintaining low latency.
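As a rough illustration of the geography codes described above, the sketch below maps a user's geography to a geography-prefixed inference profile ID. The base model ID, the mapping keys, and the helper name are assumptions made for the example. The eu and us prefixes follow the naming pattern of Bedrock geographic inference profiles, but the profiles actually available should be confirmed in the account before relying on them.

```python
# Hypothetical helper: choose a geography-scoped inference profile ID.
# BASE_MODEL_ID is an assumed example; substitute the model the application uses.
BASE_MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

# Assumed mapping from the application's own geography labels to profile prefixes.
GEO_PREFIX = {"europe": "eu", "north_america": "us"}


def inference_profile_id(user_geography: str, base_model_id: str = BASE_MODEL_ID) -> str:
    """Return a profile ID whose prefix keeps inference inside one geography."""
    if user_geography not in GEO_PREFIX:
        # Fail closed: never fall back to another geography for unknown users.
        raise ValueError(f"no inference profile configured for {user_geography!r}")
    return f"{GEO_PREFIX[user_geography]}.{base_model_id}"
```

The returned string would be used wherever the application currently passes a model ID. Raising on an unknown geography, rather than defaulting to some profile, is a deliberate choice so that a misrouted European request cannot silently leave Europe.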
Using separate Amazon API Gateway HTTP APIs for Europe and North America provides a clean, simple routing layer that directs users to the appropriate regional inference profile. This avoids complex custom routing or retry logic in application code and minimizes operational overhead.
Option A relies on custom routing and manual monitoring, which increases complexity and does not provide automatic resilience. Option C introduces custom retry and fallback logic that risks violating data residency requirements if misconfigured. Option D requires significant application-level failover logic and adds operational burden with Global Accelerator configuration.
Therefore, Option B best meets the requirements for low latency, data residency compliance, resilience during traffic spikes, and minimal operational complexity.
A company is using Amazon Bedrock and Anthropic Claude 3 Haiku to develop an AI assistant. The AI assistant normally processes 10,000 requests each hour but experiences surges of up to 30,000 requests each hour during peak usage periods. The AI assistant must respond within 2 seconds while operating across multiple AWS Regions.
The company observes that during peak usage periods, the AI assistant experiences throughput bottlenecks that cause increased latency and occasional request timeouts. The company must resolve the performance issues.
Which solution will meet this requirement?
Answer : B
Option B is the correct solution because it directly addresses both throughput bottlenecks and latency requirements using native Amazon Bedrock performance optimization features that are designed for real-time, high-volume generative AI workloads.
Amazon Bedrock supports cross-Region inference profiles, which allow applications to transparently route inference requests across multiple AWS Regions. During peak usage periods, traffic is automatically distributed to Regions with available capacity, reducing throttling, request queuing, and timeout risks. This approach aligns with AWS guidance for building highly available, low-latency GenAI applications that must scale elastically across geographic boundaries.
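To put the traffic figures from the question in perspective, the following sketch converts the hourly volumes into per-second rates and estimates how many requests are in flight at once. It assumes requests arrive evenly within the hour, which real surges do not, and it applies Little's law with the 2-second limit as the time each request spends in the system. The function names are illustrative only.

```python
def requests_per_second(requests_per_hour: int) -> float:
    """Convert an hourly request volume to an average per-second rate."""
    return requests_per_hour / 3600


def in_flight_requests(requests_per_hour: int, latency_seconds: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return requests_per_second(requests_per_hour) * latency_seconds


baseline = requests_per_second(10_000)              # about 2.8 requests per second
peak = requests_per_second(30_000)                  # about 8.3 requests per second
peak_concurrency = in_flight_requests(30_000, 2.0)  # about 16.7 requests in flight
```

The peak is three times the baseline, so any capacity sized for normal load in a single Region is likely to throttle during surges unless requests can spill over to other Regions.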
Token batching further improves efficiency by combining multiple inference requests into a single model invocation where applicable. AWS Generative AI documentation highlights batching as a key optimization technique to reduce per-request overhead, improve throughput, and better utilize model capacity. This is especially effective for lightweight, low-latency models such as Claude 3 Haiku, which are designed for fast responses and high request volumes.
Option A does not meet the requirement because purchasing provisioned throughput in a single Region creates a regional bottleneck and does not address multi-Region availability or traffic spikes beyond reserved capacity. Retries increase load and latency rather than resolving the root cause.
Option C improves application-layer scaling but does not solve model-side throughput limits. Client-side round-robin routing lacks awareness of real-time model capacity and can still send traffic to saturated Regions.
Option D is unsuitable because batch inference with asynchronous retrieval is designed for offline or non-interactive workloads. It cannot meet a strict 2-second response time requirement for an interactive AI assistant.
Therefore, Option B provides the most effective and AWS-aligned solution to achieve low latency, global scalability, and high throughput during peak usage periods.
A company uses Amazon Bedrock to generate technical content for customers. The company has recently experienced a surge in hallucinated outputs when the company's model generates summaries of long technical documents. The model outputs include inaccurate or fabricated details. The company's current solution uses a large foundation model (FM) with a basic one-shot prompt that includes the full document in a single input.
The company needs a solution that will reduce hallucinations and meet factual accuracy goals. The solution must process more than 1,000 documents each hour and deliver summaries within 3 seconds for each document.
Which combination of solutions will meet these requirements? (Select TWO.)
Answer : B, C
The correct answers are B and C because they directly address hallucination reduction while maintaining high throughput and low latency.
Option B reduces hallucinations at their source by grounding model outputs in verified content through Retrieval Augmented Generation (RAG). Using an Amazon Bedrock knowledge base with semantic chunking ensures that long technical documents are broken into semantically coherent sections. The application can then retrieve only the most relevant chunks and pass them to the model, rather than sending an entire document in one pass. This improves factual accuracy because the prompt no longer carries large amounts of irrelevant text. This approach scales efficiently and supports processing more than 1,000 documents per hour.
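A minimal sketch of what a semantic chunking setting might look like when a data source is attached to a knowledge base is shown below. It only builds the configuration dictionary. The field names follow the chunking configuration of the bedrock-agent CreateDataSource API as commonly documented, and the default values are assumptions chosen for illustration rather than recommendations.

```python
def semantic_chunking_config(max_tokens: int = 300,
                             buffer_size: int = 0,
                             breakpoint_percentile: int = 95) -> dict:
    """Build an ingestion configuration that requests semantic chunking.

    All default values here are illustrative assumptions.
    """
    return {
        "chunkingConfiguration": {
            "chunkingStrategy": "SEMANTIC",
            "semanticChunkingConfiguration": {
                # Upper bound on the size of each chunk, in tokens.
                "maxTokens": max_tokens,
                # Number of neighboring sentences considered around each sentence.
                "bufferSize": buffer_size,
                # Dissimilarity percentile at which a new chunk starts.
                "breakpointPercentileThreshold": breakpoint_percentile,
            },
        }
    }
```

The resulting dictionary would be supplied as the vector ingestion configuration when creating the data source, for example through the boto3 bedrock-agent client.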
Option C adds a defense-in-depth safety layer by using Amazon Bedrock guardrails to detect and block hallucination-like output patterns. Guardrails operate at inference time with minimal performance overhead, making them suitable for low-latency requirements. While guardrails do not eliminate hallucinations entirely, they effectively prevent unsafe or clearly fabricated outputs from reaching users.
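One way a guardrail can screen for ungrounded output is a contextual grounding check. The sketch below assumes that this is the guardrail feature the option has in mind, which the explanation does not state. It builds the keyword arguments for a guardrail creation request. The thresholds, the guardrail name, and the blocked-response message are hypothetical values.

```python
def grounding_guardrail_request(name: str,
                                grounding_threshold: float = 0.75,
                                relevance_threshold: float = 0.75) -> dict:
    """Build keyword arguments for creating a guardrail with grounding checks.

    Thresholds and messages are assumed example values.
    """
    blocked_message = "The response was withheld because it could not be verified."
    return {
        "name": name,
        "contextualGroundingPolicyConfig": {
            "filtersConfig": [
                # Block responses that are not supported by the source text.
                {"type": "GROUNDING", "threshold": grounding_threshold},
                # Block responses that do not address the user's query.
                {"type": "RELEVANCE", "threshold": relevance_threshold},
            ]
        },
        "blockedInputMessaging": blocked_message,
        "blockedOutputsMessaging": blocked_message,
    }
```

These arguments could be passed to the create_guardrail call of the boto3 bedrock client. Higher thresholds block more responses, so the values would need tuning against real summaries.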
Option A increases latency and cost due to explicit reasoning steps and does not scale well for high-throughput workloads. Option D increases randomness and worsens hallucinations. Option E repeats the existing flawed approach.
Therefore, Options B and C together provide scalable grounding and runtime protection that meet accuracy, performance, and throughput requirements.
An ecommerce company is building an internal platform to develop generative AI applications by using Amazon Bedrock foundation models (FMs). Developers need to select models based on evaluations that are aligned to ecommerce use cases. The platform must display accuracy metrics for text generation and summarization in dashboards. The company has custom ecommerce datasets to use as standardized evaluation inputs.
Which combination of steps will meet these requirements with the LEAST operational overhead? (Select TWO.)
Answer : B, C
The approach with the least operational overhead is to use managed Amazon Bedrock model evaluation workflows with datasets stored in Amazon S3, and then publish results into Amazon CloudWatch for dashboards. That is exactly what options B and C combine.
Step B correctly places standardized evaluation inputs in Amazon S3 and focuses on granting the evaluation workflow the right permissions to read those datasets. In practice, the key requirement is controlled access to the S3 objects used as evaluation datasets. Establishing IAM permissions and private access patterns (for example, VPC connectivity where the organization's networking posture calls for it) aligns with enterprise requirements and avoids building custom storage or data distribution systems for evaluators.
Step C then operationalizes the evaluation lifecycle with minimal infrastructure: a scheduled AWS Lambda function starts evaluation jobs using the S3 dataset location, and a second Lambda function checks job status and pushes results and operational signals to CloudWatch. This meets the platform requirement to surface accuracy metrics in dashboards because CloudWatch metrics/logs can be visualized in dashboards and queried through CloudWatch Logs Insights. It also supports continuous, standardized comparisons across models without requiring developers to run ad-hoc experiments.
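The second Lambda function's publishing step can be sketched as follows. This is a hedged illustration that only shapes the payload for a CloudWatch PutMetricData call. The namespace, the metric names, and the assumption that evaluation results arrive as a simple name-to-score mapping are all hypothetical, since the actual evaluation output format is not described here.

```python
def evaluation_metric_data(model_id: str, scores: dict,
                           namespace: str = "Custom/ModelEvaluation") -> dict:
    """Shape evaluation scores as keyword arguments for put_metric_data.

    The namespace and the score names are assumed, not taken from Bedrock.
    """
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": name,
                # One dimension per model so dashboards can compare models.
                "Dimensions": [{"Name": "ModelId", "Value": model_id}],
                "Value": float(value),
                "Unit": "None",
            }
            for name, value in sorted(scores.items())
        ],
    }
```

A Lambda handler could pass this dictionary as keyword arguments to the put_metric_data call of the boto3 cloudwatch client once an evaluation job reports completion.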
The alternatives introduce more operational burden. Options D and E rely on Amazon SageMaker-based tooling, notebook jobs, and open source evaluation frameworks, which require more environment management, dependency control, scaling considerations, and maintenance over time. Option A includes CORS, which is primarily a browser-access concern and does not address how Bedrock-managed evaluation jobs securely access S3 in the typical service-to-service pattern.
Therefore, Options B and C together achieve standardized model evaluation, automated scheduling, and dashboard-ready observability with the smallest operations footprint.
A healthcare company is developing an application to process medical queries. The application must answer complex queries with high accuracy by reducing semantic dilution. The application must refer to domain-specific terminology in medical documents to reduce ambiguity in medical terminology. The application must be able to respond to 1,000 queries each minute with response times less than 2 seconds.
Which solution will meet these requirements with the LEAST operational overhead?
Answer : B
Option B provides the least operational overhead because it keeps the solution primarily inside managed Amazon Bedrock capabilities, minimizing custom orchestration code and infrastructure to operate. The core requirements are domain grounding, reduced semantic dilution for complex questions, and consistent low-latency responses at high request volume. A Bedrock knowledge base is purpose-built for Retrieval Augmented Generation by ingesting domain documents, chunking content, generating embeddings, and retrieving the most relevant passages at runtime. This directly addresses the need to reference domain-specific medical terminology from authoritative documents to reduce ambiguity and improve factual accuracy.
Reducing semantic dilution typically requires improving the retrieval query so that the retriever focuses on the most relevant concepts, especially for long or multi-intent questions. Enabling query decomposition allows the system to break a complex medical query into smaller, more targeted sub-queries. This increases retrieval precision and recall for each sub-question, which helps the model generate a more accurate synthesized response grounded in the retrieved medical context.
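For illustration, query decomposition can be requested on a knowledge base query roughly as sketched below. The sketch builds a request for the RetrieveAndGenerate API rather than a flow definition, so it shows the setting itself and not the orchestration the option describes. The knowledge base ID and model ARN in the usage are placeholders, and the function name is hypothetical.

```python
def decomposed_rag_request(query: str, knowledge_base_id: str, model_arn: str) -> dict:
    """Build a RetrieveAndGenerate request with query decomposition enabled."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
                "orchestrationConfiguration": {
                    # Ask the service to split a complex query into sub-queries.
                    "queryTransformationConfiguration": {"type": "QUERY_DECOMPOSITION"}
                },
            },
        },
    }


# Placeholder identifiers, for illustration only.
request = decomposed_rag_request(
    "How do drug A and drug B interact, and what dose adjustment applies?",
    knowledge_base_id="KB_ID_PLACEHOLDER",
    model_arn="MODEL_ARN_PLACEHOLDER",
)
```

The dictionary could be passed as keyword arguments to the retrieve_and_generate call of the boto3 bedrock-agent-runtime client. A two-part question like the one above is the kind of input where separate sub-queries tend to retrieve more focused passages.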
Amazon Bedrock Flows provide a managed way to orchestrate multi-step generative AI workflows, such as preprocessing the input, performing retrieval against the knowledge base, invoking a foundation model, and formatting the final response. Because flows are managed, the company avoids maintaining custom state machines, multiple Lambda functions, or bespoke routing logic. This reduces operational overhead while still supporting repeatable, observable execution.
Compared with the alternatives, option A introduces an agent plus API Gateway routing and multiple model choices, increasing configuration and runtime complexity. Option C requires hosting and scaling custom models on SageMaker AI, which adds significant operational burden and latency risk. Option D relies on multiple Lambda functions orchestrated by an agent, which adds more moving parts and increases cold-start and integration overhead. Option B most directly meets the requirements with the smallest operational footprint.
A financial services company is deploying a generative AI (GenAI) application that uses Amazon Bedrock to help customer service representatives provide personalized investment advice to customers. The company must implement a comprehensive governance solution that follows responsible AI practices and meets regulatory requirements.
The solution must detect and prevent hallucinations in recommendations. The solution must have safety controls for customer interactions. The solution must also monitor model behavior drift in real time and maintain audit trails of all prompt-response pairs for regulatory review. The company must deploy the solution within 60 days. The solution must integrate with the company's existing compliance dashboard and respond to customers within 200 ms.
Which solution will meet these requirements with the LEAST operational overhead?
Answer : A
Option A is the correct solution because it uses native Amazon Bedrock governance and evaluation capabilities to meet regulatory, performance, and deployment timeline requirements with the least operational overhead.
Amazon Bedrock guardrails provide built-in safety controls that enforce responsible AI policies directly during inference. Custom content filters and toxicity detection protect customer interactions and prevent disallowed investment guidance patterns without requiring custom application logic. Guardrails operate inline and are optimized for low latency, which helps meet the strict 200 ms response-time requirement.
Hallucination detection is addressed through Amazon Bedrock Model Evaluation, which supports automated evaluation at scale using LLM-as-a-judge techniques. This enables the company to detect factual inaccuracies and policy violations systematically, without building custom evaluation pipelines or requiring extensive human review. Evaluation outputs can be surfaced as metrics.
Storing all prompt-response pairs in Amazon DynamoDB provides a low-latency, highly scalable audit store that supports regulatory review. Setting a Time to Live (TTL) attribute on each item lets DynamoDB delete records automatically after the retention period ends, which enforces data retention policies without custom cleanup jobs and reduces compliance risk and storage overhead.
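A minimal sketch of an audit record with a TTL attribute follows. DynamoDB TTL expects a numeric attribute holding an expiration time in Unix epoch seconds. The attribute names and the seven-year retention period are assumptions for the example, and the real retention period would come from the applicable regulation.

```python
import time
import uuid

RETENTION_DAYS = 7 * 365  # assumed retention period, not taken from the source


def audit_item(prompt: str, response: str, now=None,
               retention_days: int = RETENTION_DAYS) -> dict:
    """Build an audit record whose expires_at attribute drives DynamoDB TTL."""
    created = int(now if now is not None else time.time())
    return {
        "interaction_id": str(uuid.uuid4()),  # hypothetical partition key
        "created_at": created,
        "prompt": prompt,
        "response": response,
        # TTL attribute: epoch seconds after which the item may be deleted.
        "expires_at": created + retention_days * 86_400,
    }
```

The item could be written with the put_item call of a boto3 DynamoDB Table resource, with TTL enabled on the expires_at attribute. TTL deletion is a background process, so expired items can remain readable for some time after the timestamp passes.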
Amazon CloudWatch custom metrics integrate seamlessly with existing compliance dashboards, allowing near-real-time monitoring of safety interventions, hallucination rates, and drift indicators. CloudWatch anomaly detection can be applied to these metrics to surface behavior changes quickly.
Option B relies on custom Lambda logic and S3-based auditing, increasing latency and operational complexity. Option C introduces additional services that increase setup time and may exceed the 60-day deployment window. Option D uses non-Bedrock-native monitoring and adds unnecessary infrastructure layers.
Therefore, Option A provides the most complete, compliant, and low-overhead governance solution for a regulated GenAI financial services application.
A company deploys multiple Amazon Bedrock-based generative AI (GenAI) applications across multiple business units for customer service, content generation, and document analysis. Some applications show unpredictable token consumption patterns. The company requires a comprehensive observability solution that provides real-time visibility into token usage patterns across multiple models. The observability solution must support custom dashboards for multiple stakeholder groups and provide alerting capabilities for token consumption across all the foundation models that the company's applications use.
Which combination of solutions will meet these requirements with the LEAST operational overhead? (Select TWO.)
Answer : C, D
The combination of Options C and D delivers comprehensive, real-time observability for Amazon Bedrock workloads with the least operational overhead by relying on native integrations and managed services.
Amazon Bedrock publishes built-in CloudWatch metrics for model invocations and token usage. Option C leverages these native metrics directly, allowing teams to build centralized CloudWatch dashboards without additional data pipelines or custom processing. CloudWatch alarms provide threshold-based alerting for token consumption, enabling proactive cost and usage control across all foundation models. This approach aligns with AWS guidance to use native service metrics whenever possible to reduce operational complexity.
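The threshold-based alerting described above might be set up along the lines of the sketch below, which builds the arguments for a CloudWatch alarm on one model's output token count. It assumes the AWS/Bedrock namespace with the OutputTokenCount metric and the ModelId dimension. The alarm name format, the threshold, and the five-minute period are illustrative choices.

```python
def token_alarm(model_id: str, threshold: float,
                metric_name: str = "OutputTokenCount",
                period_seconds: int = 300) -> dict:
    """Build keyword arguments for an alarm on Bedrock token consumption."""
    return {
        "AlarmName": f"bedrock-{metric_name}-{model_id}",  # hypothetical naming scheme
        "Namespace": "AWS/Bedrock",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        # Sum the tokens consumed within each period.
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        # Idle models emit no data points; do not treat that as a breach.
        "TreatMissingData": "notBreaching",
    }


# One alarm definition per model the applications use (example IDs only).
alarms = [token_alarm(m, threshold=500_000)
          for m in ("anthropic.claude-3-haiku-20240307-v1:0",
                    "amazon.titan-text-express-v1")]
```

Each dictionary could be passed as keyword arguments to the put_metric_alarm call of the boto3 cloudwatch client. Generating one alarm per model ID keeps coverage uniform as new models are adopted.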
Option D complements CloudWatch by enabling advanced, stakeholder-specific visualizations through Amazon Managed Grafana. The zero-ETL integration allows Bedrock and CloudWatch metrics to be visualized directly in Grafana without building ingestion pipelines or managing storage layers. Grafana dashboards are particularly well suited for serving different audiences, such as engineering, finance, and product teams, each with customized views of token usage and trends.
Option A introduces unnecessary complexity by adding a business intelligence layer that is better suited for historical analytics than real-time operational monitoring. Option B is useful for deep log analysis but requires query maintenance and does not provide efficient real-time dashboards at scale. Option E involves multiple services and custom data flows, significantly increasing operational overhead compared to native metric-based observability.
By combining CloudWatch dashboards and alarms with Managed Grafana's zero-ETL visualization capabilities, the company achieves real-time visibility, flexible dashboards, and automated alerting across all Amazon Bedrock foundation models with minimal operational effort.