A company wants to use a data lake that is hosted on Amazon S3 to provide analytics services for historical data. The data lake consists of 800 tables but is expected to grow to thousands of tables. More than 50 departments use the tables, and each department has hundreds of users. Different departments need access to specific tables and columns.
Which solution will meet these requirements with the LEAST operational overhead?
Answer : C
A company wants to use automatic machine learning (ML) to create and visualize forecasts of complex scenarios and trends.
Which solution will meet these requirements with the LEAST management overhead?
Answer : B
A company has an application that ingests streaming data. The company needs to analyze this stream over a 5-minute timeframe to evaluate the stream for anomalies with Random Cut Forest (RCF) and summarize the current count of status codes. The source and summarized data should be persisted for future use.
Which approach would enable the desired outcome while keeping data persistence costs low?
Answer : B
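To make the 5-minute summarization in this scenario concrete, here is a minimal sketch of a tumbling-window count of status codes. It is plain Python, not the API of any AWS service; the `(epoch_seconds, status_code)` event shape, the `window_counts` function, and the sample data are all hypothetical, and the RCF anomaly scoring step is not shown.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 300  # 5-minute tumbling window (assumed window size from the scenario)


def window_counts(events, window_seconds=WINDOW_SECONDS):
    """Group (epoch_seconds, status_code) events into fixed windows and count each code."""
    counts = defaultdict(Counter)
    for ts, status in events:
        # Align the timestamp to the start of its window.
        window_start = ts - (ts % window_seconds)
        counts[window_start][status] += 1
    # Return plain dicts, ordered by window start, for easy persistence.
    return {start: dict(c) for start, c in sorted(counts.items())}


# Hypothetical sample events: two windows' worth of status codes.
sample = [(0, 200), (10, 200), (299, 500), (300, 200), (450, 404)]
result = window_counts(sample)
```

Each window yields one small summary record, which is the part that would be persisted alongside the raw source stream.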
A marketing company has an application that stores event data in an Amazon RDS database. The company is replicating this data to Amazon Redshift for reporting and
business intelligence (BI) purposes. New event data is continuously generated and ingested into the RDS database throughout the day and captured by a change data
capture (CDC) replication task in AWS Database Migration Service (AWS DMS). The company requires that the new data be replicated to Amazon Redshift in near-real
time.
Which solution meets these requirements?
Answer : A
A company is designing a data warehouse to support business intelligence reporting. Users will access the executive dashboard heavily each Monday and Friday morning
for 1 hour. These read-only queries will run on the active Amazon Redshift cluster, which runs on dc2.8xlarge compute nodes 24 hours a day, 7 days a week. There are
three queues set up in workload management: Dashboard, ETL, and System. The Amazon Redshift cluster needs to process the queries without wait time.
What is the MOST cost-effective way to ensure that the cluster processes these queries?
Answer : D
A company receives datasets from partners at various frequencies. The datasets include baseline data and incremental data. The company needs to merge and store all the datasets without reprocessing the data.
Which solution will meet these requirements with the LEAST development effort?
Answer : C
A financial company uses Amazon Athena to query data from an Amazon S3 data lake. Files are stored in the S3 data lake in Apache ORC format. Data analysts recently introduced nested fields in the data lake ORC files, and noticed that queries are taking longer to run in Athena. The data analysts discovered that more data than is required is being scanned for the queries.
What is the MOST operationally efficient solution to improve query performance?
Answer : B
This solution meets the requirement because:
By using Athena engine version 2 and pushing the query filter down to the source ORC files, the data analysts can take advantage of predicate pushdown for nested fields and avoid scanning more data than the queries require. This improves query performance without changing the data format or the partitioning strategy.
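The idea behind predicate pushdown can be sketched in a few lines. This is an illustrative model only, not Athena's or ORC's actual implementation: the `Stripe` class, the `scan` function, and the `order.amount` column are hypothetical stand-ins for a block of rows that carries min/max statistics for a nested field.

```python
from dataclasses import dataclass


@dataclass
class Stripe:
    # Hypothetical stand-in for a block of rows with per-column statistics.
    stats: dict  # e.g. {"order.amount": (min_value, max_value)}
    rows: list   # each row is a nested dict


def scan(stripes, column, lower_bound):
    """Return rows where column > lower_bound, skipping blocks the statistics rule out."""
    matched, stripes_read = [], 0
    outer, inner = column.split(".")
    for stripe in stripes:
        _, col_max = stripe.stats[column]
        if col_max <= lower_bound:
            continue  # pushdown: no row in this block can satisfy the filter
        stripes_read += 1
        matched.extend(r for r in stripe.rows if r[outer][inner] > lower_bound)
    return matched, stripes_read


stripes = [
    Stripe({"order.amount": (1, 50)},
           [{"order": {"amount": 1}}, {"order": {"amount": 50}}]),
    Stripe({"order.amount": (60, 900)},
           [{"order": {"amount": 60}}, {"order": {"amount": 900}}]),
]
rows, stripes_read = scan(stripes, "order.amount", 100)
```

In this toy run only one of the two blocks is read. Without pushdown on the nested field, both blocks would be scanned and filtered afterwards, which is the extra scanning the analysts observed.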