Amazon AWS Certified Data Analytics - Specialty DAS-C01 Exam Questions

Page: 1 / 14
Total 207 questions
Question 1

A company needs to collect streaming data from several sources and store the data in the AWS Cloud. The dataset is heavily structured, but analysts need to perform several complex SQL queries and need consistent performance. Some of the data is queried more frequently than the rest. The company wants a solution that meets its performance requirements in a cost-effective manner.

Which solution meets these requirements?



Answer : B


Question 2

A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as .csv files that are stored in Amazon S3. The company's data analysts are using Amazon Athena to perform SQL queries against a recent subset of the overall data.

The amount of data that is ingested into Amazon S3 has increased to 5 PB over time. The query latency also has increased. The company needs to segment the data to reduce the amount of data that is scanned.

Which solutions will improve query performance? (Select TWO.)



Answer : B, C

This solution will improve query performance because:

Apache Parquet is a columnar storage format that is optimized for analytics and supports compression [1]. Parquet files can reduce the amount of data that Athena scans and transfers, which improves performance and reduces cost [1].

The Athena CREATE TABLE AS SELECT (CTAS) statement allows you to create a new table from the results of a SELECT query [2]. You can use this statement to convert the CSV files to Parquet format and store them in a different location in S3 [2]. You can also specify partitioning keys for the new table, which can further improve query performance by filtering out irrelevant data [2].

Querying the Parquet data will be faster and cheaper than querying the CSV data, as Parquet files are more efficient for analytical queries [1].
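The CTAS conversion described above can be sketched as a small Python helper that builds the statement text. This is a minimal illustration, not the exam's reference solution: the table names, column names, and bucket path are hypothetical, and the helper only assembles the SQL string (submitting it to Athena is left out).

```python
def build_ctas(new_table, source_table, external_location,
               other_cols, partition_cols, where=None):
    """Return an Athena CTAS statement that writes partitioned Parquet."""
    # Athena expects the partition columns to come last in the SELECT list.
    select_list = ", ".join(list(other_cols) + list(partition_cols))
    partitions = ", ".join(f"'{col}'" for col in partition_cols)
    sql = (
        f"CREATE TABLE {new_table}\n"
        f"WITH (\n"
        f"  format = 'PARQUET',\n"
        f"  external_location = '{external_location}',\n"
        f"  partitioned_by = ARRAY[{partitions}]\n"
        f")\n"
        f"AS SELECT {select_list}\n"
        f"FROM {source_table}"
    )
    if where:
        sql += f"\nWHERE {where}"
    return sql


# Hypothetical names, used only for illustration.
ctas_sql = build_ctas(
    new_table="sales_parquet",
    source_table="sales_csv",
    external_location="s3://example-bucket/sales_parquet/",
    other_cols=["order_id", "amount"],
    partition_cols=["dt"],
)
print(ctas_sql)
```

Queries that filter on the partition column (here the assumed `dt`) then scan only the matching partitions of the Parquet table.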

C) Run a daily AWS Glue ETL job to convert the data files to Apache Parquet format and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data each day.

This solution will improve query performance because:

AWS Glue is a fully managed extract, transform, and load (ETL) service that can be used to prepare and load data for analytics [3]. You can use AWS Glue to create a job that copies the CSV files from the source S3 bucket to a new S3 bucket, and converts them to Apache Parquet format [3].
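To make the partitioning part of option C concrete, here is a minimal sketch of a Hive-style, date-partitioned S3 key layout, which is the kind of layout a Glue crawler can register as table partitions. The `converted` prefix, the `year`/`month`/`day` partition names, and the file name are assumptions chosen for illustration; the source does not specify a layout.

```python
from datetime import date


def partitioned_key(prefix, day, filename):
    """Build a Hive-style partitioned object key for one day's output file."""
    # key=value path segments let engines such as Athena prune by partition.
    return f"{prefix}/year={day:%Y}/month={day:%m}/day={day:%d}/{filename}"


# Hypothetical prefix and file name.
key = partitioned_key("converted", date(2021, 7, 7), "part-0000.parquet")
print(key)  # converted/year=2021/month=07/day=07/part-0000.parquet
```

With a layout like this, a query restricted to a recent date range reads only the objects under the matching prefixes.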


Question 3

A company currently uses Amazon Athena to query its global datasets. The regional data is stored in Amazon S3 in the us-east-1 and us-west-2 Regions. The data is not encrypted. To simplify the query process and manage it centrally, the company wants to use Athena in us-west-2 to query data from Amazon S3 in both Regions. The solution should be as low-cost as possible.

What should the company do to achieve this goal?



Answer : B


Question 4

A global pharmaceutical company receives test results for new drugs from various testing facilities worldwide. The results are sent in millions of 1 KB-sized JSON objects to an Amazon S3 bucket owned by the company. The data engineering team needs to process those files, convert them into Apache Parquet format, and load them into Amazon Redshift for data analysts to perform dashboard reporting. The engineering team uses AWS Glue to process the objects, AWS Step Functions for process orchestration, and Amazon CloudWatch for job scheduling.

More testing facilities were recently added, and the time to process files is increasing.

What will MOST efficiently decrease the data processing time?



Answer : A


Question 5

A company collects and transforms data files from third-party providers by using an on-premises SFTP server. The company uses a Python script to transform the data.

The company wants to reduce the overhead of maintaining the SFTP server and storing large amounts of data on premises. However, the company does not want to change the existing upload process for the third-party providers.

Which solution will meet these requirements with the LEAST development effort?



Answer : C

This solution meets the requirements because:

AWS Transfer Family is a fully managed service that enables secure file transfers to and from Amazon S3 or Amazon EFS using standard protocols such as SFTP, FTPS, and FTP [1]. By using AWS Transfer Family, the company can reduce the overhead of maintaining the on-premises SFTP server and storing large amounts of data on premises.

The company can create an SFTP-enabled server with a publicly accessible endpoint using AWS Transfer Family. This endpoint can be accessed by the third-party providers over the internet using their existing SFTP clients. The company can also change the server name to match the name of the on-premises SFTP server, so that the existing upload process for the third-party providers does not change. For more information, see Create an SFTP-enabled server.

The company can configure the new SFTP server to use Amazon S3 as the storage service. This way, the data files uploaded by the third-party providers will be stored in an Amazon S3 bucket. The company can also use AWS Identity and Access Management (IAM) roles and policies to control access to the S3 bucket and its objects. For more information, see Using Amazon S3 as your storage service.

The company can schedule a Python shell job in AWS Glue to use the existing Python script to run periodically and transform the uploaded files. AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load data for analytics [2]. A Python shell job is a type of job that runs Python scripts as a shell in AWS Glue, without an Apache Spark environment [3]. The company can use AWS Glue triggers to schedule the Python shell job based on time or events [4]. For more information, see Working with Python shell jobs.
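The scheduling step can be sketched as the request parameters for a Python shell job and a scheduled trigger. This is a minimal sketch under stated assumptions: the job name, role ARN, script path, and cron schedule are hypothetical, and the code only builds dictionaries shaped for boto3's Glue `create_job` and `create_trigger` calls without contacting AWS.

```python
def python_shell_job_args(job_name, role_arn, script_s3_path):
    """Parameters for a Glue Python shell job that runs an existing script."""
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",  # Python shell job type, not a Spark job
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3.9",
        },
        "MaxCapacity": 0.0625,  # smallest capacity offered for Python shell jobs
    }


def scheduled_trigger_args(trigger_name, job_name, cron_fields):
    """Parameters for a time-based Glue trigger that starts the job."""
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron_fields})",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


# Hypothetical names and schedule (daily at 02:00 UTC).
job_args = python_shell_job_args(
    "transform-uploads",
    "arn:aws:iam::123456789012:role/example-glue-role",
    "s3://example-bucket/scripts/transform.py",
)
trigger_args = scheduled_trigger_args(
    "transform-uploads-daily", "transform-uploads", "0 2 * * ? *"
)

# Assumed usage with boto3, not executed here:
#   glue = boto3.client("glue")
#   glue.create_job(**job_args)
#   glue.create_trigger(**trigger_args)
```

Keeping the parameters in plain dictionaries makes them easy to review before any AWS call is made.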


Question 6

A utility company wants to visualize data for energy usage on a daily basis in Amazon QuickSight. A data analytics specialist at the company has built a data pipeline to collect and ingest the data into Amazon S3. Each day, the data is stored in an individual .csv file in an S3 bucket. This is an example of the naming structure:

20210707_data.csv 20210708_data.csv

To allow for data querying in QuickSight through Amazon Athena, the specialist used an AWS Glue crawler to create a table with the path "s3://powertransformer/20210707_data.csv". However, when the data is queried, it returns zero rows.

How can this issue be resolved?



Answer : D


Question 7

A transportation company uses IoT sensors attached to trucks to collect vehicle data for its global delivery fleet. The company currently sends the sensor data in small .csv files to Amazon S3. The files are then loaded into a 10-node Amazon Redshift cluster with two slices per node and queried using both Amazon Athena and Amazon Redshift. The company wants to optimize the files to reduce the cost of querying and also improve the speed of data loading into the Amazon Redshift cluster.

Which solution meets these requirements?



Answer : D

