PeopleCert Site Reliability Engineering Foundation v1.2 DevOps SRE SRE Exam Questions

Page: 1 / 14
Total 80 questions
Question 1

Which of the following is NOT an SRE principle?



Answer : C

Comprehensive and Detailed Explanation From Exact Extract:

The statement "Toil is not important work" is not an SRE principle; it contradicts the official Google SRE documentation. The Site Reliability Engineering Book treats toil as a critical concept, because identifying and reducing toil frees time for engineering work and enables reliability improvements. The book's position is that toil must be taken seriously and systematically reduced, not dismissed.

From the SRE Book, Chapter "Eliminating Toil":

"Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows."

The SRE book further emphasizes:

"SRE teams should measure toil, track it, and make constant efforts to reduce it."

This demonstrates that toil is significant and should not be ignored. Therefore, any suggestion that "toil is not important work" contradicts the documentation.
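As one illustration of what measuring and tracking toil could look like, the sketch below computes the share of logged hours spent on toil. The category names, the sample hours, and the `toil_fraction` and `exceeds_toil_cap` helpers are hypothetical and do not come from the SRE book; the 50% default cap reflects the book's stated goal of keeping operational work below half of each SRE's time.

```python
# Hypothetical time log for one engineer over a sprint, in hours.
# Category names and numbers are illustrative assumptions.
TOIL_CATEGORIES = {"manual_deploys", "ticket_queue", "repeat_alerts"}


def toil_fraction(hours_by_category: dict[str, float]) -> float:
    """Return the share of total logged hours that counts as toil."""
    total = sum(hours_by_category.values())
    if total <= 0:
        raise ValueError("no hours logged")
    toil = sum(
        hours for category, hours in hours_by_category.items()
        if category in TOIL_CATEGORIES
    )
    return toil / total


def exceeds_toil_cap(hours_by_category: dict[str, float], cap: float = 0.5) -> bool:
    """True when the toil share is above the cap (default 50%)."""
    return toil_fraction(hours_by_category) > cap


sprint = {
    "manual_deploys": 10.0,
    "ticket_queue": 14.0,
    "repeat_alerts": 8.0,
    "engineering_projects": 48.0,
}

print(f"toil share: {toil_fraction(sprint):.0%}")
print("over cap" if exceeds_toil_cap(sprint) else "within cap")
```

Tracking this number sprint over sprint is what makes a reduction effort visible; a single snapshot says little on its own.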

The other answer choices are actual SRE principles:

Operations is a software problem. From the SRE Book Introduction:

"SRE's approach starts with the belief that operations is fundamentally a software engineering problem."

Automate what is currently done manually: automation is a central SRE philosophy for reducing toil.

Reduce the cost of failure: error budgets and controlled risk-taking are core SRE concepts designed to reduce the cost of failure.

Thus, the only option that is NOT an SRE principle is C.


Site Reliability Engineering Book, "Introduction" and "Eliminating Toil" Chapters

SRE Workbook, "Eliminating Toil" Section

Question 2

Before getting into the technical details of a Service Level Objective, what should be done?



Answer : C

Comprehensive and Detailed Explanation From Exact Extract:

Before defining any technical details of an SLO, the SRE guidance is clear: the conversation must start from the customer's point of view. SLOs exist to represent the level of reliability users genuinely require, not internal assumptions or engineering preferences.

The SRE Workbook, Chapter "Implementing SLOs," states:

"The process must begin by understanding what your users need from the service and what good performance actually means from the user's perspective."

Likewise, in the Site Reliability Engineering Book:

"SLOs capture the reliability target that makes sense for the users and the product, which is why defining them must begin with understanding the user experience."

This means that SLO development begins with analyzing:

What users value

What reliability thresholds they notice

What failures matter to them most

Only after this understanding is established should teams discuss metrics, thresholds, SLIs, and error budgets.

Why the other options are incorrect:

A. Identify toil: relevant to operations, not SLO creation.

B. Evaluate automation: important for reducing toil, but unrelated to initial SLO definition.

D. Assess resources: planning happens after SLO definition, not before.

Thus, the correct answer is C.


SRE Workbook, Chapter: "Implementing SLOs"

Site Reliability Engineering Book, Chapter: "Service Level Objectives"

Question 3

Which of the following BEST describes observability?



Answer : C

Comprehensive and Detailed Explanation From Exact Extract:

The term observability comes directly from control theory and refers to the ability to infer the internal state of a system from its external outputs. Modern SRE and observability practices adopt this definition.

Google's Site Reliability Engineering guidance (SRE Book Addendum on Observability) states:

"Observability is a property of a system that allows operators to understand its internal state by examining its outputs such as logs, metrics, and traces."

This aligns exactly with Option C, the formal definition.

Why the other options are incorrect:

A. Monitoring is part of observability, but observability is much broader.

B. Health checks are simply one signal; they do not represent observability.

D. Data collection is a mechanism, not the definition of observability itself.

Thus, C is the correct and academically accurate definition.


Site Reliability Engineering Book Addendum: Observability

Google Cloud Architecture Framework: Observability Principles

Question 4

What does the term "wisdom of production" mean?



Answer : B

Comprehensive and Detailed Explanation From Exact Extract:

The term "wisdom of production" refers to the insights gained from real systems running under actual production conditions. Only production environments exhibit real user behavior, real workloads, true performance characteristics, and authentic failure modes. This concept is rooted in the SRE philosophy that production is the ultimate source of truth for understanding system behavior.

From the SRE Workbook, Chapter "Monitoring":

"Only production provides the full truth about how a system behaves under real workloads. Production is the ultimate source of wisdom about the system."

This makes clear that wisdom gained from production is indispensable. Testing and staging environments cannot reproduce all real-world variables, usage patterns, and failure pathways.

Why the other options are incorrect:

A describes engineering approaches but does not define "wisdom of production."

C is incorrect because staging environments do not provide production wisdom.

D relates to automation strategy, not production insights.

Thus, the accurate meaning of the term is B: the wisdom gained from something running in production.


Site Reliability Engineering Workbook, "Monitoring" Chapter

Site Reliability Engineering Book, "Practical Alerting" and "Production Readiness" Sections

Question 5

Why would some Service Level Indicators require client-side data?



Answer : A

Comprehensive and Detailed Explanation From Exact Extract:

SLIs must measure user experience, and sometimes server-side metrics alone do not show the full picture. Client-side data may reveal issues such as:

Slow networks

Browser rendering delays

Mobile device limitations

CDN performance issues

Last-mile latency

The Site Reliability Engineering Book, Chapter "Service Level Indicators," states:

"Server-side metrics do not always fully capture the user experience. In many cases, client-side measurements are required to understand the actual reliability delivered to users."

The SRE Workbook reinforces:

"Some SLIs require client instrumentation because user-visible performance problems may not be observable from backend systems alone."
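The gap between the two viewpoints can be shown with a small sketch. It computes the same success-ratio SLI twice over one set of requests, once from what the server recorded and once from what the client observed. The `RequestOutcome` type, its fields, and the sample data are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestOutcome:
    """One request as seen from both ends (hypothetical record shape)."""
    server_ok: bool  # the server logged a successful response
    client_ok: bool  # the client received and rendered the response in time


def success_ratio(outcomes: list[RequestOutcome], viewpoint: str) -> float:
    """Good events divided by total events, from 'server' or 'client' viewpoint."""
    if viewpoint not in ("server", "client"):
        raise ValueError("viewpoint must be 'server' or 'client'")
    if not outcomes:
        raise ValueError("no requests recorded")
    flag = "server_ok" if viewpoint == "server" else "client_ok"
    good = sum(1 for outcome in outcomes if getattr(outcome, flag))
    return good / len(outcomes)


# Illustrative data: every request succeeded on the server, but two of the
# ten were lost or too slow on the way to the user (for example, last-mile
# latency or a CDN problem).
requests = [RequestOutcome(True, True)] * 8 + [RequestOutcome(True, False)] * 2

print(f"server-side SLI: {success_ratio(requests, 'server'):.0%}")
print(f"client-side SLI: {success_ratio(requests, 'client'):.0%}")
```

In this sample the server-side figure looks perfect while the client-side figure does not, which is the situation the explanation describes.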

Why the other options are incorrect:

B. SLA negotiation has nothing to do with SLI selection.

C. Automation engineering is unrelated to client-side measurement needs.

D. Achievability of SLOs does not determine whether client-side data is needed; accuracy of user-experience measurement does.

Thus, the correct answer is A.


Site Reliability Engineering Book, "Service Level Indicators"

SRE Workbook, "Choosing the Right SLIs"

Question 6

What is the benefit of strategically burning the Error Budget to zero every month?



Answer : A

Comprehensive and Detailed Explanation From Exact Extract:

Burning the error budget to zero strategically, rather than accidentally, helps ensure the correct balance between release velocity and system stability, which is the fundamental purpose of error budgets. Error budgets exist to encourage a healthy level of risk-taking, up to but not beyond the point at which user experience would suffer.

From the Site Reliability Engineering Book, SLO chapter:

"Error budgets provide a mechanism for balancing innovation and reliability by allowing measured risk-taking while ensuring user expectations are met."

The SRE Workbook adds:

"Teams should aim to use their full error budget. Not using it implies missed opportunities to deliver features or improvements."

This means that strategically burning the error budget to zero ensures:

Teams are shipping value at maximum safe velocity

Reliability goals are still respected

Risk is managed and intentional
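To make "burning the budget to zero" concrete, the sketch below works through the arithmetic for an availability SLO. The function names, the 99.9% target, the 30-day window, and the downtime figure are assumptions chosen for illustration; the only relation used is that the error budget equals one minus the SLO target, applied to the measurement window.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(
    slo_target: float, downtime_minutes: float, window_days: int = 30
) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return 1.0 - downtime_minutes / budget


# Illustrative figures: a 99.9% SLO over 30 days with 21.6 minutes of
# downtime so far in the window.
budget = error_budget_minutes(0.999)
left = budget_remaining(0.999, downtime_minutes=21.6)

print(f"monthly budget: {budget:.1f} minutes")
print(f"budget remaining: {left:.0%}")
```

A team that ends each window near zero remaining, without going negative, has spent its allowed unreliability on releases and experiments rather than leaving it unused.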

Why other options are incorrect:

B. Capacity measurement is unrelated to error budget consumption.

C. Error budgets should not be continually revised unless business needs change.

D. Conversations with partners may occur, but this is not the primary benefit.

Thus, the correct answer is A.


Site Reliability Engineering Book, "Service Level Objectives"

SRE Workbook, "SLO Engineering"

Question 7

When outages are repetitive and similar, they become a form of toil.

Which of the following describes the MOST compelling reason to adopt advanced technologies and artificial intelligence (AI)?



Answer : A

Comprehensive and Detailed Explanation From Exact Extract:

SRE defines toil as "manual, repetitive, automatable, tactical work tied to running a service" (SRE Book, "Eliminating Toil"). Repetitive outages are specifically noted as a form of operational toil. The SRE Book and SRE Workbook emphasize adopting automation, intelligent tooling, and machine-learning-assisted systems to reduce toil and decrease Mean Time to Repair (MTTR) and Mean Time to Restore Service (MTRS). The books state: "Reducing MTTR directly increases system reliability more effectively than attempting to eliminate all failures." (SRE Book, Chapter "Managing Incidents").

AI and advanced automation help detect issues faster, classify patterns, trigger automated remediation, and reduce human intervention, delivering reliability gains through faster repair rather than perfect uptime.
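The effect of faster repair can be illustrated with the standard steady-state availability relation, availability = MTBF / (MTBF + MTTR). This formula is a general reliability-engineering relation rather than something quoted in the explanation above, and the `availability` helper and the hour figures below are assumptions chosen for illustration.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative figures: one failure every 720 hours, 4 hours to repair.
baseline = availability(mtbf_hours=720.0, mttr_hours=4.0)

# Option 1: automation halves the repair time.
faster_repair = availability(mtbf_hours=720.0, mttr_hours=2.0)

# Option 2: failures are made half as frequent instead.
fewer_failures = availability(mtbf_hours=1440.0, mttr_hours=4.0)

print(f"baseline:       {baseline:.4%}")
print(f"faster repair:  {faster_repair:.4%}")
print(f"fewer failures: {fewer_failures:.4%}")
```

Under this model, halving MTTR yields the same availability as doubling MTBF, which is why shortening repair for recurring, well-understood outages is an attractive target for automation.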

Option A is the only option aligned with SRE's reliability philosophy.

Options B and C incorrectly suggest increasing MTTR/MTRS.

Option D refers to "perfect MTRS," which is impossible and contradicts SRE's acceptance of failure.

Thus, A is correct.


Site Reliability Engineering, Chapters: "Eliminating Toil," "Managing Incidents."

The Site Reliability Workbook, ML/automation case studies.
