Google Cloud Associate Data Practitioner Exam Practice Test

Total 106 questions
Question 1

Your company's customer support audio files are stored in a Cloud Storage bucket. You plan to analyze the audio files' metadata and file content within BigQuery, and to run inference on them by using BigQuery ML. You need to create a corresponding table in BigQuery that represents the bucket containing the audio files. What should you do?



Answer : D

To analyze audio files stored in a Cloud Storage bucket and represent them in BigQuery, you should create an object table. Object tables in BigQuery are designed to represent objects stored in Cloud Storage, including their metadata. This enables you to query the metadata of audio files directly from BigQuery without duplicating the data. Once the object table is created, you can use it in conjunction with other BigQuery ML workflows for inference and analysis.
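
As a rough illustration, an object table is created with a CREATE EXTERNAL TABLE statement whose object_metadata option is set to 'SIMPLE'. The sketch below runs that DDL through the BigQuery Python client; the project, dataset, connection, and bucket names are hypothetical, and an existing Cloud resource connection is assumed:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project ID
# An object table is an external table over Cloud Storage objects; it exposes
# object metadata (uri, size, content_type, ...) as queryable columns.
ddl = """
CREATE OR REPLACE EXTERNAL TABLE `support_audio.audio_objects`
WITH CONNECTION `us.gcs-connection`
OPTIONS (
  object_metadata = 'SIMPLE',
  uris = ['gs://my-support-audio-bucket/*.wav']
)
"""
client.query(ddl).result()  # runs the DDL as a query job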


Question 2

You work for a healthcare company that has a large on-premises data system containing patient records with personally identifiable information (PII) such as names, addresses, and medical diagnoses. You need a standardized managed solution that de-identifies PII across all your data feeds prior to ingestion to Google Cloud. What should you do?



Answer : B

Using Cloud Data Fusion is the best solution for this scenario because:

Standardized managed solution: Cloud Data Fusion is a fully managed, cloud-native data integration service for building ETL/ELT pipelines visually. Its standardized, visual interface makes it easy to create and manage pipelines across many different data sources and feeds.

Compliance: It offers built-in transformations and prebuilt connectors, including those suitable for data masking and de-identification, so PII is de-identified prior to ingestion into Google Cloud, adhering to regulatory requirements for healthcare data.

Ease of use: Cloud Data Fusion is designed for data integration, transformation, and preparation, making it a managed and user-friendly tool for this purpose. A sketch of the underlying de-identification step follows this list.
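
In practice, the masking step in such a pipeline is commonly backed by Sensitive Data Protection (Cloud DLP). The following minimal Python sketch shows what that de-identification call looks like; the project ID, sample record, and infoTypes are illustrative assumptions, not part of the question:

from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()
parent = "projects/my-healthcare-project"  # hypothetical project ID
item = {"value": "Patient John Doe, 123 Main St, diagnosed with flu."}
inspect_config = {"info_types": [{"name": "PERSON_NAME"}, {"name": "STREET_ADDRESS"}]}
# Replace each finding with its infoType name, e.g. [PERSON_NAME].
deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {"primitive_transformation": {"replace_with_info_type_config": {}}}
        ]
    }
}
response = client.deidentify_content(
    request={
        "parent": parent,
        "inspect_config": inspect_config,
        "deidentify_config": deidentify_config,
        "item": item,
    }
)
print(response.item.value)  # de-identified text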


Question 3

Your company is setting up an enterprise business intelligence platform. You need to limit data access between many different teams while following the Google-recommended approach. What should you do first?



Answer : D


For an enterprise BI platform with data access control across teams, Google recommends Looker (Google Cloud core) over Looker Studio for its robust access management. The 'first' step focuses on setting up the foundation.

Option A: Looker Studio reports are lightweight but lack granular access control beyond sharing. Creating separate reports per team is inefficient and unscalable.

Option B: One Looker Studio report with multiple pages and data sources doesn't enforce team-level access control natively; users could access all pages and data.

Option C: Creating a Looker instance with separate dashboards per team is a step forward but skips the foundational access control setup (groups), reducing scalability.

Option D: Setting up a Looker instance and configuring groups aligns with Google's recommendation for enterprise BI. Groups allow role-based access control (RBAC) at the model, Explore, or dashboard level, ensuring teams see only their data. This is the scalable, foundational step per Looker's 'Access Control' documentation. Reference: Looker Documentation - 'Managing Users and Groups' (https://cloud.google.com/looker/docs/admin-users-groups).

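As a hedged sketch of that foundational step, groups can also be created programmatically with the official Looker Python SDK (looker_sdk); the group name and user ID below are hypothetical, and content access would then be granted to these groups through roles, model sets, or content access settings:

import looker_sdk
from looker_sdk import models40

sdk = looker_sdk.init40()  # reads API credentials from looker.ini or environment variables
group = sdk.create_group(body=models40.WriteGroup(name="finance-team"))  # one group per team
sdk.add_group_user(
    group_id=group.id,
    body=models40.GroupIdForGroupUserInclusion(user_id="42"),  # illustrative user ID
)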


Question 4

You work for a gaming company that collects real-time player activity data. This data is streamed into Pub/Sub and needs to be processed and loaded into BigQuery for analysis. The processing involves filtering, enriching, and aggregating the data before loading it into partitioned BigQuery tables. You need to design a pipeline that ensures low latency and high throughput while following a Google-recommended approach. What should you do?



Answer : C


Why C is correct: Dataflow is the recommended service for real-time stream processing on Google Cloud.

It provides scalable and reliable processing with low latency and high throughput.

Dataflow's streaming API is optimized for Pub/Sub integration and BigQuery streaming inserts.

Why the other options are incorrect: A: Cloud Composer is for batch orchestration, not real-time streaming.

B: Dataproc and Spark streaming are more complex and not as efficient as Dataflow for this task.

D: Cloud Run functions are for stateless, event-driven applications, not continuous stream processing.
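
As an illustrative sketch (with assumed subscription, table, and field names), a minimal Dataflow streaming pipeline written with the Apache Beam Python SDK might look like this:

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # add --runner=DataflowRunner to run on Dataflow
with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/player-events"  # hypothetical
        )
        | "Parse" >> beam.Map(json.loads)
        | "Filter" >> beam.Filter(lambda e: e.get("event_type") == "match_result")
        | "Enrich" >> beam.Map(lambda e: {**e, "ingest_source": "pubsub"})
        # A windowed aggregation step would go here before the write.
        | "Write" >> beam.io.WriteToBigQuery(
            "my-project:gaming.player_activity",  # hypothetical partitioned table
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )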


Dataflow Streaming: https://cloud.google.com/dataflow/docs/concepts/streaming-pipelines

Pub/Sub to BigQuery with Dataflow: https://cloud.google.com/dataflow/docs/tutorials/pubsub-to-bigquery

Question 5

You are working on a project that requires analyzing daily social media data. You have 100 GB of JSON formatted data stored in Cloud Storage that keeps growing.

You need to transform and load this data into BigQuery for analysis. You want to follow the Google-recommended approach. What should you do?



Answer : C


Why C is correct: Dataflow is a fully managed service for transforming and enriching data in both batch and streaming modes.

Dataflow is Google's recommended way to transform large datasets.

It is designed for parallel processing, making it suitable for large datasets.

Why the other options are incorrect: A: Manual downloading and scripting are not scalable or efficient.

B: Cloud Run functions are for stateless applications, not large data transformations.

D: While Cloud Data Fusion could work, Dataflow is more optimized for large-scale data transformation.
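
For illustration, the batch counterpart of such a pipeline (with a hypothetical bucket, table, and record shape, and assuming newline-delimited JSON) could be sketched with the Apache Beam Python SDK as follows:

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions()  # add --runner=DataflowRunner to run on Dataflow
with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://my-social-bucket/daily/*.json")  # hypothetical
        | "Parse" >> beam.Map(json.loads)  # one JSON object per line assumed
        | "Transform" >> beam.Map(
            lambda r: {"user": r["user"], "post_count": len(r.get("posts", []))}
        )
        | "Load" >> beam.io.WriteToBigQuery(
            "my-project:social.daily_activity",  # hypothetical table
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )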


Dataflow: https://cloud.google.com/dataflow/docs


Question 6

Your company uses Looker as its primary business intelligence platform. You want to use LookML to visualize the profit margin for each of your company's products in your Looker Explores and dashboards. You need to implement a solution quickly and efficiently. What should you do?



Answer : B

Defining a new measure in LookML to calculate the profit margin using the existing revenue and cost fields is the most efficient and straightforward solution. This approach allows you to dynamically compute the profit margin directly within your Looker Explores and dashboards without needing to pre-calculate or create additional tables. The measure can be defined using LookML syntax, such as:

measure: profit_margin {
  # Assumes revenue and cost are existing fields in the same view.
  type: number
  sql: (${revenue} - ${cost}) / NULLIF(${revenue}, 0) ;;
  value_format: "0.0%"
}

This method is quick to implement and integrates seamlessly into your existing Looker model, enabling accurate visualization of profit margins across your products.


Question 7

You manage a large amount of data in Cloud Storage, including raw data, processed data, and backups. Your organization is subject to strict compliance regulations that mandate data immutability for specific data types. You want to use an efficient process to reduce storage costs while ensuring that your storage strategy meets retention requirements. What should you do?



Answer : D

Using object holds and lifecycle management rules is the most efficient and compliant strategy for this scenario because:

Immutability: Object holds (temporary or event-based) ensure that objects cannot be deleted or overwritten, meeting strict compliance regulations for data immutability.

Cost efficiency: Lifecycle management rules automatically transition objects to more cost-effective storage classes based on their age and access patterns.

Compliance and automation: This approach ensures compliance with retention requirements while reducing manual effort, leveraging built-in Cloud Storage features.
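
As a rough sketch with the google-cloud-storage Python client (the bucket and object names are hypothetical), an event-based hold plus a lifecycle rule might be configured like this:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-compliance-bucket")  # hypothetical bucket
# Cost efficiency: transition objects older than 365 days to Coldline.
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=365)
bucket.patch()
# Immutability: an event-based hold blocks deletes and overwrites until released.
blob = bucket.get_blob("raw/2024-01-01/export.json")  # hypothetical object
blob.event_based_hold = True
blob.patch()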

