You are developing a retail application on Vertex AI that uses a large language model to give customers real-time product recommendations. The app uses text, code, images, audio, and video prompts to communicate with customers. You need to minimize the application's latency to ensure a responsive and delightful user experience. What should you do?
Answer : A
According to Google Cloud Vertex AI documentation on Generative AI, Streaming is the recommended technique to reduce perceived latency in user-facing applications. When using large language models (LLMs), generating the full response can take several seconds depending on the complexity and length. By enabling streaming, the model sends the output in chunks (tokens) as they are generated, rather than waiting for the entire sequence to complete. This allows the application UI to display text to the customer progressively, creating a 'real-time' feel and a more responsive experience.
Options B, C, and D do not directly address latency in a reliable or architectural way. Increasing temperature (Option B) affects the randomness and creativity of the output but not the speed of generation. While setting a lower max_output_tokens (Option C) might technically result in shorter (and thus faster) responses, 'avoiding' the parameter altogether gives the model more freedom to produce long, slow responses, which is counterproductive. Similarly, avoiding system instructions (Option D) sacrifices the quality and safety of the model's behavior without providing a predictable latency benefit. For a high-quality 'conversational commerce' experience as defined in the Cymbal Retail case study, streaming is the standard architectural pattern for performance.
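To make the latency benefit concrete, here is a minimal sketch of streaming versus waiting for a full response. The generator `fake_model_stream` is a hypothetical stand-in for a Vertex AI model that yields tokens as they are produced; none of these names come from the Vertex AI SDK.

```python
import time

def fake_model_stream(prompt):
    """Stand-in for an LLM that yields output chunks as they are generated."""
    for token in ["Our", " top", " pick", " for", " you", " is", " ..."]:
        time.sleep(0.01)  # simulated per-token generation delay
        yield token

def render_progressively(prompt):
    """Show each chunk as soon as it arrives instead of waiting for the full reply."""
    start = time.monotonic()
    first_chunk_at = None
    reply = []
    for chunk in fake_model_stream(prompt):
        if first_chunk_at is None:
            first_chunk_at = time.monotonic() - start  # time to first token
        reply.append(chunk)
    total = time.monotonic() - start
    return "".join(reply), first_chunk_at, total

reply, ttft, total = render_progressively("recommend a product")
# With streaming, the user sees output after `ttft` seconds rather than `total`.
print(f"time to first token: {ttft:.3f}s, full response: {total:.3f}s")
```

The time to first token is a fraction of the total generation time, which is exactly the "perceived latency" that streaming reduces: the user starts reading while the model is still generating.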
You are designing an observability strategy for a new microservices application running on Google Kubernetes Engine (GKE). The application consists of multiple services (e.g., frontend, orders, payments). During load testing, you observe an error in the frontend service's logs, but you cannot find the corresponding logs in the downstream services to investigate the root cause because the logs are not correlated. You need to implement a solution that allows you to follow a single user request across all microservices involved in the transaction. The solution must not require developers to manually add correlation logic to their application code. What should you do?
Answer : C
To follow a user request through a maze of microservices, you need Distributed Tracing.
Cloud Trace: By ensuring the traceparent (W3C standard) header is propagated (Option C), Google Cloud can automatically link logs from the Frontend, Orders, and Payments services.
Zero Manual Effort: Modern libraries (like OpenTelemetry) handle the 'heavy lifting' of extracting and injecting these headers into outgoing requests, satisfying the requirement to avoid manual correlation logic.
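To make the propagation concrete, the sketch below shows the W3C Trace Context `traceparent` header format (`version-traceid-spanid-flags`) that Option C relies on. In practice, OpenTelemetry instrumentation injects and extracts this header automatically; these helpers are purely illustrative.

```python
import re
import secrets

def make_traceparent(trace_id=None, span_id=None, sampled=True):
    """Build a W3C Trace Context traceparent header: version-traceid-spanid-flags."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 lowercase hex chars
    span_id = span_id or secrets.token_hex(8)     # 16 lowercase hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

def parse_traceparent(header):
    """Extract the trace id so downstream services can log it for correlation."""
    m = re.fullmatch(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})", header)
    if not m:
        raise ValueError("malformed traceparent header")
    return {"trace_id": m.group(1), "span_id": m.group(2), "flags": m.group(3)}

# The frontend creates the header; orders and payments parse it from incoming
# requests and attach the same trace id to their logs, which is what lets
# Cloud Trace and Cloud Logging correlate all entries for one user request.
header = make_traceparent()
fields = parse_traceparent(header)
```

Because every service in the call chain forwards the same `trace_id`, filtering logs by that one value reconstructs the entire transaction.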
You need to ensure reliability for your application and operations by supporting reliable task scheduling for compute on GCP. Leveraging Google best practices, what should you do?
Answer : B
https://cloud.google.com/solutions/reliable-task-scheduling-compute-engine
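The pattern in the linked solution has Cloud Scheduler publish task messages to a Pub/Sub topic, from which Compute Engine workers pull and acknowledge them. Below is a minimal in-process sketch of that flow, with a `queue.Queue` as a stand-in for Pub/Sub; all names are illustrative, not GCP APIs.

```python
import queue

topic = queue.Queue()  # stand-in for a Pub/Sub topic + subscription

def scheduler_tick(task_names):
    """Stand-in for Cloud Scheduler: publish one message per scheduled task."""
    for name in task_names:
        topic.put({"task": name})

def worker_drain():
    """Stand-in for a Compute Engine worker: pull, process, then ack each message."""
    done = []
    while True:
        try:
            msg = topic.get_nowait()
        except queue.Empty:
            break
        done.append(msg["task"])  # process the task
        topic.task_done()         # ack: only now is the message considered handled
    return done

scheduler_tick(["nightly-report", "cache-refresh"])
processed = worker_drain()
```

The reliability comes from the ack step: in real Pub/Sub, a message that is pulled but never acknowledged is redelivered, so a worker crash mid-task does not lose the task.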
You write a Python script to connect to Google BigQuery from a Google Compute Engine virtual machine. The script is printing errors that it cannot connect to BigQuery. What should you do to fix the script?
Answer : B
The error is most likely caused by an access scope issue. When you create a new instance, it runs as the Compute Engine default service account, but access to most services, including BigQuery, is not enabled by default. In other words, the instance has a service account but lacks the required access scope. To fix this, stop the instance, edit it to enable the BigQuery access scope, and restart it. Alternatively, running the script on a new virtual machine created with the BigQuery access scope enabled also works.
https://cloud.google.com/compute/docs/access/service-accounts
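On a running instance, the active scopes can be read from the metadata server path `computeMetadata/v1/instance/service-accounts/default/scopes` (with the `Metadata-Flavor: Google` header). As an offline sketch, the helper below checks whether such a scope list would permit BigQuery calls; the function name is illustrative, but the scope URLs are the real ones.

```python
# Either the BigQuery scope or the broad cloud-platform scope allows BigQuery calls.
BIGQUERY_SCOPE = "https://www.googleapis.com/auth/bigquery"
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"

def has_bigquery_access(scopes):
    """Return True if the instance's access scopes permit BigQuery API calls."""
    return any(s in (BIGQUERY_SCOPE, CLOUD_PLATFORM_SCOPE) for s in scopes)

# Typical default scopes on a new VM omit BigQuery, which is why the script fails:
default_scopes = [
    "https://www.googleapis.com/auth/devstorage.read_only",
    "https://www.googleapis.com/auth/logging.write",
    "https://www.googleapis.com/auth/monitoring.write",
]
print(has_bigquery_access(default_scopes))                     # False
print(has_bigquery_access(default_scopes + [BIGQUERY_SCOPE]))  # True
```

Stopping the instance, adding the BigQuery (or cloud-platform) scope, and restarting it changes the second case from failure to success without touching the script.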
Your company is forecasting a sharp increase in the number and size of Apache Spark and Hadoop jobs being run in your local datacenter. You want to utilize the cloud to help you scale for this upcoming demand with the least amount of operations work and code change. Which product should you use?
Answer : B
Google Cloud Dataproc is a fast, easy-to-use, low-cost and fully managed service that lets you run the Apache Spark and Apache Hadoop ecosystem on Google Cloud Platform. Cloud Dataproc provisions big or small clusters rapidly, supports many popular job types, and is integrated with other Google Cloud Platform services, such as Google Cloud Storage and Stackdriver Logging, thus helping you reduce TCO.
To reduce costs, the Director of Engineering has required all developers to move their development infrastructure resources from on-premises virtual machines (VMs) to Google Cloud Platform. These resources go through multiple start/stop events during the day and require state to persist. You have been asked to design the process of running a development environment in Google Cloud while providing cost visibility to the finance department. Which two steps should you take? Choose 2 answers
Answer : A, D
https://cloud.google.com/billing/docs/how-to/export-data-bigquery
Your company has successfully migrated to the cloud and wants to analyze their data stream to optimize operations. They do not have any existing code for this analysis, so they are exploring all their options. These options include a mix of batch and stream processing, as they are running some hourly jobs and live-processing some data as it comes in. Which technology should they use for this?
Answer : B
Dataflow processes both batch and stream workloads.
Cloud Dataflow is a fully-managed service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness -- no more complex workarounds or compromises needed.
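The key point is that a single pipeline definition serves both modes. The pure-Python sketch below (a stand-in for an Apache Beam pipeline running on Dataflow, with all names illustrative) applies the same transform to a bounded batch and to a simulated unbounded stream.

```python
def enrich(event):
    """The transform is written once and reused in both batch and streaming modes."""
    return {**event, "amount_usd": event["amount_cents"] / 100}

def run_batch(events):
    """Batch mode: process a bounded, historical collection (e.g. an hourly job)."""
    return [enrich(e) for e in events]

def run_stream(event_source):
    """Streaming mode: process each event as it arrives from an unbounded source."""
    for event in event_source:
        yield enrich(event)

batch = [{"amount_cents": 250}, {"amount_cents": 1999}]
batch_results = run_batch(batch)

def live_events():  # simulated unbounded source of incoming events
    yield {"amount_cents": 99}

stream_results = list(run_stream(live_events()))
```

This mirrors why Dataflow fits the scenario: the hourly jobs and the live processing share the same transform code, so there is no separate batch and streaming codebase to maintain.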