PeopleCert Site Reliability Engineering Foundation v1.2 (DevOps SRE) Exam Questions

Page: 1 / 14
Total 80 questions
Question 1

Which of the following BEST describes an advantage of a container-based structure?



Answer : A

Comprehensive and Detailed Explanation From Exact Extract:

Containers provide a major advantage that aligns with SRE: portability and environment consistency. The SRE Workbook describes containers as "lightweight, portable units that encapsulate applications and dependencies, ensuring consistent behavior across environments." This independence from the host OS environment enables predictable deployments and simplifies automation, scaling, and orchestration, especially when used with Kubernetes.

Option A captures this exact benefit: portability and independence from the host OS.

Option B is incorrect: containers do not reduce the number of developers required.

Option C incorrectly claims that efficiency comes from virtual machines; containers are typically more efficient because they avoid VM overhead rather than leveraging it.

Option D is incorrect: containers do not "inherit" security automatically; in fact, they require additional security controls.

Thus, A is the correct answer.


The Site Reliability Workbook, Sections on containers, Docker, and Kubernetes.

Site Reliability Engineering, containerization and orchestration discussions.

Question 2

"Problem-solving with a group of people with different skillsets."

Which of the following concepts is BEST inferred by the above statement?



Answer : B

Comprehensive and Detailed Explanation From Exact Extract:

The SRE model heavily emphasizes cross-functional teamwork. In the SRE Workbook and chapters addressing incident management, Google defines collaboration as "bringing together individuals with diverse expertise to jointly solve problems and make decisions." Collaboration implies active engagement, shared goals, and joint execution, which is exactly what the statement describes.

Option B, Collaboration, fits perfectly because effective problem-solving during incidents, launches, or reliability engineering work requires engineers from multiple disciplines (e.g., SRE, developers, network teams, product teams) to work together directly.

Option A (Coordination) is more about task alignment, not joint problem-solving.

Option C (Communication) is necessary but insufficient for solving problems together.

Option D (Cooperation) implies helpfulness, not necessarily integrated problem-solving.

Thus, B is the correct concept.


The Site Reliability Workbook, Chapter: "Effective Incident Management."

Site Reliability Engineering, Sections on teamwork and cross-functional collaboration.

Question 3

Known workarounds represent what type of toil?



Answer : D

Comprehensive and Detailed Explanation From Exact Extract:

Known workarounds represent toil that has no enduring value, one of the key characteristics of toil defined by the SRE framework.

From the Site Reliability Engineering Book, Chapter "Eliminating Toil":

"Toil is work that is manual, repetitive, automatable, tactical, has no enduring value, and scales linearly with service size."

Known workarounds fit this definition because:

They solve the same recurring problems repeatedly

They do not permanently fix the underlying issue

They consume engineer time without contributing long-term improvements

These activities lack enduring value and should be eliminated through automation or engineering fixes.

Why the other options are incorrect:

A. Linear scaling: Many forms of toil scale linearly, but this does not specifically describe workarounds.

B. Tactical: Tactical means short-term, but not all tactical work is a workaround.

C. Automatable: While some workarounds can be automated, not all are.

D. No enduring value: This is the defining trait of workaround-type toil.

Therefore, option D is correct.


Site Reliability Engineering Book, "Eliminating Toil"

SRE Workbook, "Toil Reduction Strategies"

Question 4

What is the MOST widely tracked Service Level Objective (SLO)?



Answer : D

Comprehensive and Detailed Explanation From Exact Extract:

Availability is the most widely tracked and commonly understood SLO across nearly all digital services. It measures whether users are able to successfully access and use the system. Because unavailability directly impacts user experience, revenue, trust, and reliability, it is the primary SLO used across industries.

The Site Reliability Engineering Book, Chapter "Service Level Objectives," states:

"Availability is one of the most common and important SLOs since it reflects the basic ability of the service to function for users."

The SRE Workbook also notes:

"Availability targets (e.g., 99.9%, 99.99%) are the most widely used form of SLOs and form the foundation of error budget policies."

While performance SLOs are also common, availability SLOs are almost universal and foundational.
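To make the availability targets mentioned above concrete, the following is a minimal sketch of the arithmetic that links an availability SLO to its error budget. The function names and the 30-day window are illustrative assumptions, not something the exam material or the SRE books prescribe.

```python
def availability(successful_requests: int, total_requests: int) -> float:
    """Fraction of requests served successfully (a simple availability SLI)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return successful_requests / total_requests


def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Downtime, in minutes, that an availability target permits over a window.

    The 30-day default window is an assumption chosen for illustration.
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the range (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


if __name__ == "__main__":
    for target in (0.999, 0.9999):
        budget = error_budget_minutes(target)
        print(f"{target:.2%} target allows about {budget:.2f} minutes of downtime")
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43.2 minutes of allowable downtime, and a 99.99% target leaves roughly 4.3 minutes.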

Thus, D. Availability is the correct answer.


Site Reliability Engineering Book, "Service Level Objectives"

SRE Workbook, "Implementing SLOs"

Question 5

In a safety culture, engineers are allowed to do more with the production environment without fear of repercussions.

What else do engineers need to do?



Answer : B

Comprehensive and Detailed Explanation From Exact Extract:

In a safety culture, SRE emphasizes psychological safety so engineers can work effectively in production without fear of blame. However, safety never removes accountability. Engineers must take responsibility for their actions, decisions, and assumptions, particularly during incidents.

The Site Reliability Engineering Book, Chapter "Postmortem Culture," states:

"Blamelessness does not eliminate accountability. Individuals must still explain the context, assumptions, and reasoning behind their decisions so that the organization can learn."

Google stresses that:

Engineers must feel safe to act and report issues

Engineers must remain responsible and accountable

Accountability enables learning, not punishment

Why other options are incorrect:

A. Sharing incidents on social media violates confidentiality.

C. Blameless postmortems are required, not skipped.

D. Avoiding on-call is contrary to SRE responsibilities.

Thus, B is correct.


Site Reliability Engineering Book, "Postmortem Culture"

SRE Workbook, "Learning from Incidents"

Question 6

Which of these approaches can alleviate linear scaling toil?



Answer : B

Comprehensive and Detailed Explanation From Exact Extract:

Linear-scaling toil refers to work whose effort increases proportionally to service growth, such as manually provisioning servers or handling capacity expansion. The Google SRE Book, Chapter "Eliminating Toil," explains:

"Toil is work that scales linearly with the size of your service. A core strategy for reducing toil is to introduce automation that breaks the linear relationship."

Auto-scaling capabilities directly address linear-scaling toil by automating resource allocation based on load or demand. This prevents engineers from repeatedly and manually adjusting infrastructure as usage grows.

The SRE Workbook also emphasizes:

"Infrastructure automation such as auto-scaling removes a major source of linear scaling toil by ensuring that capacity adjusts automatically as services grow."
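As a rough illustration of how auto-scaling replaces a manual capacity adjustment, the sketch below computes a replica count from observed load. It uses a proportional rule similar in spirit to the one documented for the Kubernetes Horizontal Pod Autoscaler; the function name, parameters, and bounds are hypothetical and chosen only for this example.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_load_per_replica: float,
    target_load_per_replica: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Scale the replica count in proportion to observed load.

    All names and default bounds here are illustrative assumptions.
    """
    if target_load_per_replica <= 0:
        raise ValueError("target_load_per_replica must be positive")
    # Proportional rule: keep per-replica load near the target.
    raw = math.ceil(
        current_replicas * current_load_per_replica / target_load_per_replica
    )
    # Clamp to configured bounds so scaling cannot run away in either direction.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Load is 50% above target, so the fleet grows from 4 to 6 replicas.
    print(desired_replicas(4, 90.0, 60.0))
```

The point of the sketch is that the decision an engineer would otherwise make by hand each time traffic grows is encoded once and evaluated automatically, which is what breaks the linear relationship between service size and effort.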

Why the other options are incorrect:

A. Manual scaling is linear-scaling toil, not a solution.

C. Outsourcing development does not reduce operational toil.

D. Switching cloud providers alone does not solve toil unless automation is introduced.

Thus, B is the correct answer.


Site Reliability Engineering Book, "Eliminating Toil"

SRE Workbook, "Toil Reduction Strategies"

Question 7

Which of the following terms is BEST described by the definition below?

"The probability that the system will meet certain performance standards and yield correct output for a specific time."



Answer : B

Comprehensive and Detailed Explanation From Exact Extract:

The SRE Book defines reliability as "the probability that a system will perform its intended function correctly for a specified period of time" (SRE Book, Introduction). Reliability focuses on correctness and consistent performance, not simply uptime. Availability (option A) refers to system uptime or accessibility. Durability (option C) refers to long-term data persistence. Throughput (option D) measures volume of work processed over time.

Because the definition explicitly mentions probability of meeting performance standards and correct output over time, it directly matches the SRE definition of reliability.
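The "probability over a period of time" framing can be sketched numerically. The example below assumes a constant failure rate (an exponential model parameterized by mean time between failures); this modeling choice is an assumption made purely for illustration and does not come from the exam material.

```python
import math


def reliability(duration_hours: float, mtbf_hours: float) -> float:
    """Probability of running failure-free for duration_hours.

    Assumes a constant failure rate (exponential model): R(t) = exp(-t / MTBF).
    Both the model and the parameter names are illustrative assumptions.
    """
    if duration_hours < 0 or mtbf_hours <= 0:
        raise ValueError("duration must be >= 0 and MTBF must be > 0")
    return math.exp(-duration_hours / mtbf_hours)


if __name__ == "__main__":
    # Chance of a failure-free 24 hours for a system with a 1000-hour MTBF.
    print(f"{reliability(24, 1000):.4f}")
```

Under this model, reliability always refers to a stated time window: the longer the window, the lower the probability of completing it without failure, which is what distinguishes the term from a point-in-time measure such as availability.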

Thus, B is correct.


Site Reliability Engineering, Introduction section on reliability definitions.

The Site Reliability Workbook, Reliability fundamentals.
