Google Cloud Associate Data Practitioner Exam Practice Test

Page: 1 / 14
Total 106 questions
Question 1

Your organization has decided to move their on-premises Apache Spark-based workload to Google Cloud. You want to be able to manage the code without needing to provision and manage your own cluster. What should you do?



Answer : A

Migrating the Spark jobs to Dataproc Serverless is the best approach because it allows you to run Spark workloads without the need to provision or manage clusters. Dataproc Serverless automatically scales resources based on workload requirements, simplifying operations and reducing administrative overhead. This solution is ideal for organizations that want to focus on managing their Spark code without worrying about the underlying infrastructure. It is cost-effective and fully managed, aligning well with the goal of minimizing cluster management.
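As an illustration, a minimal PySpark job like the sketch below could move to Dataproc Serverless largely unchanged; the bucket paths and file names are hypothetical, and the gcloud command in the top comment is one way to submit it as a serverless batch.

```python
# job.py -- minimal PySpark job; GCS paths below are hypothetical.
# Submit without provisioning a cluster (from a shell):
#   gcloud dataproc batches submit pyspark gs://example-bucket/code/job.py --region=us-central1
from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.appName("sales-aggregation").getOrCreate()

    # Read raw CSV data from Cloud Storage and aggregate revenue per store.
    sales = spark.read.option("header", True).csv("gs://example-bucket/raw/sales/*.csv")
    totals = sales.groupBy("store_id").agg({"revenue": "sum"})

    # Write the result back to Cloud Storage as Parquet.
    totals.write.mode("overwrite").parquet("gs://example-bucket/curated/sales_totals/")
    spark.stop()

if __name__ == "__main__":
    main()
```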


Question 2

Your retail company wants to predict customer churn using historical purchase data stored in BigQuery. The dataset includes customer demographics, purchase history, and a label indicating whether the customer churned or not. You want to build a machine learning model to identify customers at risk of churning. You need to create and train a logistic regression model for predicting customer churn, using the customer_data table with the churned column as the target label. Which BigQuery ML query should you use?



Answer : B

Comprehensive and Detailed Explanation

Why B is correct: BigQuery ML expects the target column to be named label (unless input_label_cols is set in the OPTIONS clause).

EXCEPT(churned) selects all columns except the churned column, which becomes the features.

churned AS label renames the churned column to label, which BigQuery ML uses as the target by default.

logistic_reg is the correct model_type option.

Why other options are incorrect: A: Does not rename the target column to label, and has a typo in the model type.

C: Only selects the target label, not the features.

D: Has a syntax error with the single quote before except.
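To make the correct pattern concrete, the sketch below runs a CREATE MODEL statement of the shape described above through the BigQuery Python client; the project, dataset, and model names are hypothetical.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client(project="example-project")  # hypothetical project ID

# EXCEPT(churned) keeps every column except the target as features, and
# churned AS label exposes the target under the default name BigQuery ML expects.
query = """
CREATE OR REPLACE MODEL `example-project.customer_ds.churn_model`
OPTIONS (model_type = 'logistic_reg') AS
SELECT
  * EXCEPT (churned),
  churned AS label
FROM `example-project.customer_ds.customer_data`
"""

client.query(query).result()  # blocks until the model training job finishes
print("Model trained.")
```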


BigQuery ML Logistic Regression: https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-logistic-regression

BigQuery ML Syntax: https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create

Question 3

Your organization's website uses an on-premises MySQL as a backend database. You need to migrate the on-premises MySQL database to Google Cloud while maintaining MySQL features. You want to minimize administrative overhead and downtime. What should you do?



Answer : B

Comprehensive and Detailed Explanation

Why B is correct: Database Migration Service (DMS) is designed for migrating databases to Cloud SQL with minimal downtime and administrative overhead.

Cloud SQL for MySQL is a fully managed MySQL service, which aligns with the requirement to minimize administrative overhead.

Why other options are incorrect: A: Installing MySQL on Compute Engine requires manual management of the database instance, which increases administrative overhead.

C: BigQuery is not a direct replacement for a relational MySQL database. It's an analytical data warehouse.

D: Spanner is a globally distributed, scalable database, but it requires schema conversion, is not a drop-in replacement for MySQL, and is more complex to adopt than Cloud SQL.
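As a hedged illustration of the end state: once DMS has migrated the database, the application can keep using standard MySQL drivers against Cloud SQL. The sketch below uses the Cloud SQL Python Connector with a hypothetical instance connection name and credentials.

```python
# pip install "cloud-sql-python-connector[pymysql]"
from google.cloud.sql.connector import Connector

connector = Connector()

# "project:region:instance" is the Cloud SQL instance connection name (hypothetical here).
conn = connector.connect(
    "example-project:us-central1:website-mysql",
    "pymysql",
    user="app_user",          # hypothetical credentials
    password="app_password",
    db="website",
)

with conn.cursor() as cursor:
    # The same MySQL SQL the on-premises application used keeps working after migration.
    cursor.execute("SELECT COUNT(*) FROM orders")
    print(cursor.fetchone())

conn.close()
connector.close()
```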


Database Migration Service: https://cloud.google.com/database-migration

Cloud SQL for MySQL: https://cloud.google.com/sql/docs/mysql

Question 4

Your retail company collects customer data from various sources: online transactions, customer feedback, and social media activity.

You are designing a data pipeline to extract this data. Which Google Cloud storage system(s) should you select for further analysis and ML model training?



Answer : B

Online transactions: Storing the transactional data in BigQuery is ideal because BigQuery is a serverless data warehouse optimized for querying and analyzing structured data at scale. It supports SQL queries and is suitable for structured transactional data.

Customer feedback: Storing customer feedback in Cloud Storage is appropriate as it allows you to store unstructured text files reliably and at a low cost. Cloud Storage also integrates well with data processing and ML tools for further analysis.

Social media activity: Storing real-time social media activity in BigQuery is optimal because BigQuery supports streaming inserts, enabling real-time ingestion and analysis of data. This allows immediate analysis and integration into dashboards or ML pipelines.
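A minimal sketch of how these choices could be wired up in Python is shown below: streaming a social media event into BigQuery and landing a feedback text file in Cloud Storage. All project, dataset, table, and bucket names are hypothetical.

```python
# pip install google-cloud-bigquery google-cloud-storage
from google.cloud import bigquery, storage

# Stream a social media activity record into BigQuery (streaming insert).
bq = bigquery.Client(project="example-project")
errors = bq.insert_rows_json(
    "example-project.analytics.social_activity",  # hypothetical table
    [{"user_id": "u123", "platform": "x", "action": "mention", "ts": "2024-05-01T12:00:00Z"}],
)
if errors:
    raise RuntimeError(f"Streaming insert failed: {errors}")

# Store an unstructured customer feedback file in Cloud Storage.
gcs = storage.Client(project="example-project")
bucket = gcs.bucket("example-feedback-bucket")  # hypothetical bucket
bucket.blob("feedback/2024-05-01/u123.txt").upload_from_string(
    "Great checkout experience, but shipping was slow."
)
```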


Question 5

Your company is adopting BigQuery as their data warehouse platform. Your team has experienced Python developers. You need to recommend a fully-managed tool to build batch ETL processes that extract data from various source systems, transform the data using a variety of Google Cloud services, and load the transformed data into BigQuery. You want this tool to leverage your team's Python skills. What should you do?



Answer : C

Comprehensive and Detailed Explanation

The tool must be fully managed, support batch ETL, integrate with multiple Google Cloud services, and leverage Python skills.

Option A: Dataform is SQL-focused for ELT within BigQuery, not Python-centric, and lacks broad service integration for extraction.

Option B: Cloud Data Fusion is a visual ETL tool, not Python-focused, and requires more UI-based configuration than coding.

Option C: Cloud Composer (managed Apache Airflow) is fully managed, supports batch ETL via DAGs, integrates with various Google Cloud services (e.g., BigQuery, GCS) through operators, and allows custom Python code in tasks (see the DAG sketch below). It is well suited to Python developers per the Cloud Composer documentation.

Option D: Dataflow excels at streaming and batch processing but focuses on Apache Beam (Python SDK available), not broad service orchestration. Pre-built templates limit customization. Reference: Google Cloud Documentation - 'Cloud Composer Overview' (https://cloud.google.com/composer/docs).

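For illustration, a minimal Cloud Composer (Airflow) DAG might look like the sketch below, combining a custom Python transformation task with a Google-provided BigQuery operator; the project, dataset, and table names are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

def extract_and_clean(**context):
    # Custom Python extraction/cleaning logic would go here
    # (e.g. pull from a source system and stage files in Cloud Storage).
    pass

with DAG(
    dag_id="batch_etl_to_bigquery",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # batch ETL, run once per day
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="extract_and_clean",
        python_callable=extract_and_clean,
    )

    load = BigQueryInsertJobOperator(
        task_id="load_into_bigquery",
        configuration={
            "query": {
                # Hypothetical transformation that loads staged data into a reporting table.
                "query": "SELECT * FROM `example-project.staging.daily_extract`",
                "destinationTable": {
                    "projectId": "example-project",
                    "datasetId": "reporting",
                    "tableId": "daily_extract",
                },
                "useLegacySql": False,
            }
        },
    )

    extract >> load
```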


Question 6

Your team wants to create a monthly report to analyze inventory data that is updated daily. You need to aggregate the inventory counts by using only the most recent month of data, and save the results to be used in a Looker Studio dashboard. What should you do?



Answer : A

Creating a materialized view in BigQuery with the SUM() function and the DATE_SUB() function is the best approach. Materialized views allow you to pre-aggregate and cache query results, making them efficient for repeated access, such as monthly reporting. By using the DATE_SUB() function, you can filter the inventory data to include only the most recent month. This approach ensures that the aggregation is up-to-date with minimal latency and provides efficient integration with Looker Studio for dashboarding.
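As an illustration of the aggregation logic the answer describes, the sketch below runs the month-scoped SUM query through the BigQuery Python client; the project, dataset, table, and column names are hypothetical, and this SELECT is the query the materialized view definition would wrap.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client(project="example-project")  # hypothetical project

# Aggregate inventory counts for the most recent month only.
query = """
SELECT
  product_id,
  SUM(inventory_count) AS total_inventory
FROM `example-project.retail.inventory`          -- hypothetical table
WHERE snapshot_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH)
GROUP BY product_id
"""

for row in client.query(query).result():
    print(row.product_id, row.total_inventory)
```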


Question 7

You need to create a data pipeline for a new application. Your application will stream data that needs to be enriched and cleaned. Eventually, the data will be used to train machine learning models. You need to determine the appropriate data manipulation methodology and which Google Cloud services to use in this pipeline. What should you choose?



Answer : A

Comprehensive and Detailed Explanation

Streaming data requiring enrichment and cleaning before ML training suggests an ETL (Extract, Transform, Load) approach, with a focus on real-time processing and a data warehouse for ML.

Option A: ETL with Dataflow (streaming transformations) and BigQuery (storage/ML training) is Google's recommended pattern for streaming pipelines. Dataflow handles enrichment and cleaning, and BigQuery supports ML model training through BigQuery ML (see the pipeline sketch below).

Option B: ETL with Cloud Data Fusion to Cloud Storage is batch-oriented and lacks streaming focus. Cloud Storage isn't ideal for ML training directly.

Option C: ELT (load then transform) with Cloud Storage to Bigtable is misaligned: Bigtable is a NoSQL store, not suited to ML training or post-load transformation.

Option D: ELT with Cloud SQL to Analytics Hub is for relational data and data sharing, not streaming or ML. Reference: Google Cloud Documentation - 'Dataflow: ETL Patterns' (https://cloud.google.com/dataflow/docs/guides), 'BigQuery ML' (https://cloud.google.com/bigquery-ml).

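A minimal Apache Beam (Python SDK) sketch of this ETL pattern is shown below: it reads from a hypothetical Pub/Sub subscription, applies a simple cleaning/enrichment step, and streams the results into a hypothetical BigQuery table.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def clean_and_enrich(message: bytes) -> dict:
    """Parse a raw event and add a derived field (the enrichment step)."""
    event = json.loads(message.decode("utf-8"))
    event["amount_usd"] = round(float(event.get("amount", 0)), 2)  # example cleaning rule
    return event

options = PipelineOptions(streaming=True)  # runner/project flags omitted for brevity

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            subscription="projects/example-project/subscriptions/events-sub"  # hypothetical
        )
        | "CleanAndEnrich" >> beam.Map(clean_and_enrich)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "example-project:analytics.events",  # hypothetical table
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```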

