Which Alertmanager feature prevents duplicate notifications from being sent?
Answer : C
Deduplication in Alertmanager ensures that identical alerts from multiple Prometheus servers or rule evaluations do not trigger duplicate notifications.
Alertmanager identifies each alert by a fingerprint derived from its label set. When an alert with an identical label set arrives again, Alertmanager refreshes the existing alert instead of creating a second one, so only one notification is sent.
This mechanism is essential in high-availability setups where multiple Prometheus instances monitor the same targets.
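The idea can be illustrated with a minimal Python sketch. It is not Alertmanager's actual implementation: the `fingerprint` and `deduplicate` helpers, the SHA-256 hash, and the sample alerts are all assumptions made for illustration. It only shows how identity derived from the label set collapses copies of the same alert sent by two Prometheus replicas.

```python
import hashlib


def fingerprint(labels):
    """Hypothetical fingerprint: a hash over the sorted label pairs only."""
    canonical = ",".join(f"{key}={value}" for key, value in sorted(labels.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(alerts):
    """Keep one alert per distinct label set; a later copy refreshes the entry."""
    active = {}
    for alert in alerts:
        active[fingerprint(alert["labels"])] = alert
    return list(active.values())


# Two replicas (prom-a, prom-b) report the same alert; a third alert differs by instance.
incoming = [
    {"labels": {"alertname": "HighLatency", "instance": "server1"}, "source": "prom-a"},
    {"labels": {"alertname": "HighLatency", "instance": "server1"}, "source": "prom-b"},
    {"labels": {"alertname": "HighLatency", "instance": "server2"}, "source": "prom-a"},
]

unique = deduplicate(incoming)
print(len(unique))
```

The `source` field is deliberately excluded from the fingerprint, which is why the two replicas' copies count as one alert.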
How do you calculate the average request duration during the last 5 minutes from a histogram or summary called http_request_duration_seconds?
Answer : A
In Prometheus, histograms and summaries expose metrics with _sum and _count suffixes to represent total accumulated values and sample counts, respectively. To compute the average request duration over a given time window (for example, 5 minutes), you divide the rate of increase of _sum by the rate of increase of _count:
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
Here, http_request_duration_seconds_sum is the total accumulated request time and http_request_duration_seconds_count is the number of requests observed.
Dividing the two rates gives the average duration per request over the specified time range.
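The arithmetic behind this division can be checked with a small Python sketch. The counter readings are made-up values, and the `rate` helper is a simplification that ignores counter resets and the extrapolation Prometheus applies at window boundaries.

```python
def rate(samples, window_seconds):
    """Per-second increase between the first and last sample in the window.

    Simplified: no counter-reset handling and no boundary extrapolation.
    """
    return (samples[-1] - samples[0]) / window_seconds


# Hypothetical counter readings at the start and end of a 300-second window.
duration_sum = [120.0, 180.0]    # http_request_duration_seconds_sum
request_count = [400.0, 700.0]   # http_request_duration_seconds_count

# 60 seconds of request time spread over 300 requests.
average = rate(duration_sum, 300) / rate(request_count, 300)
print(average)
```

Because both rates use the same window, the window length cancels out and the result is simply the increase in _sum divided by the increase in _count.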
Extracted and verified from Prometheus documentation -- Querying Histograms and Summaries, PromQL Rate Function, and Metric Naming Conventions sections.
Which function would you use to calculate the 95th percentile latency from histogram data?
Answer : B
To calculate a percentile (e.g., 95th percentile) from histogram data in Prometheus, the correct function is histogram_quantile(). It estimates quantiles based on cumulative bucket counts.
Example:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
This estimates the 95th percentile request duration, aggregated across all instances, over the last 5 minutes.
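To make the estimation concrete, here is a simplified Python sketch of quantile estimation from cumulative buckets using linear interpolation inside the matching bucket. It is an approximation of the idea, not the Prometheus implementation: it assumes 0 < q < 1, non-negative bucket bounds, a final +Inf bucket, and the bucket values shown are invented.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    Simplified sketch: assumes 0 < q < 1, non-negative bounds, and that the
    last bucket is +Inf. Interpolates linearly inside the matching bucket.
    """
    if not 0 < q < 1:
        raise ValueError("q must be between 0 and 1 (exclusive)")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan
    rank = q * buckets[-1][1]
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical cumulative bucket counts (le, count) for a 5-minute window.
buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, buckets)
print(p95)
```

With these values the 95th observation out of 100 lands halfway through the 0.5 to 1.0 bucket, so the estimate is 0.75 seconds. The accuracy of any such estimate depends on how well the bucket boundaries match the real latency distribution.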
With the following metrics over the last 5 minutes:
up{instance="localhost"} 1 1 1 1 1
up{instance="server1"} 1 0 0 0 0
What does the following query return:
min_over_time(up[5m])
Answer : A
The min_over_time() function in PromQL returns the minimum sample value observed within the specified time range for each time series.
In the given data:
For up{instance="localhost"}, all samples are 1. The minimum value over 5 minutes is therefore 1.
For up{instance="server1"}, the sequence is 1 0 0 0 0. The minimum observed value is 0.
Thus, the query min_over_time(up[5m]) returns two series, one per instance:
{instance="localhost"} 1
{instance="server1"} 0
This query is commonly used to check uptime consistency. If the minimum value over the time window is 0, it indicates at least one scrape failure (target down).
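The same per-series reduction can be reproduced in a few lines of Python. The `min_over_time` helper here is a stand-in written for illustration; it takes the samples from the question keyed by instance label.

```python
def min_over_time(range_vector):
    """Return the minimum sample per series, mirroring the per-series reduction."""
    return {series: min(values) for series, values in range_vector.items()}


# Samples from the question, keyed by the instance label.
up = {
    "localhost": [1, 1, 1, 1, 1],
    "server1": [1, 0, 0, 0, 0],
}

result = min_over_time(up)
# Instances with at least one failed scrape in the window.
flapping = [instance for instance, value in result.items() if value == 0]
print(result, flapping)
```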
Verified from Prometheus documentation -- PromQL Range Vector Functions, min_over_time() definition, and up Metric Semantics sections.
If the range vector selector foo[5m] contains 1 1 NaN, what would max_over_time(foo[5m]) return?
Answer : B
In PromQL, range vector functions like max_over_time() compute an aggregate value (in this case, the maximum) over all samples within a specified time range. The function ignores NaN (Not-a-Number) values when computing the result.
Given the range vector foo[5m] containing samples [1, 1, NaN], the maximum value among the valid numeric samples is 1. Therefore, max_over_time(foo[5m]) returns 1.
Because NaN samples are skipped whenever a real number is present, the result stays stable when a series occasionally contains NaN. If every sample in the range is NaN, the result is NaN. If the range contains no samples at all, the function returns no result for that series rather than raising an error.
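A short Python sketch of this comparison logic follows. It is written from the behaviour described above and is an assumption about how the maximum is selected, not a copy of the Prometheus source. An explicit loop is used because Python's built-in max() gives order-dependent results when NaN is present.

```python
import math


def max_over_time(samples):
    """Maximum of the samples, preferring any real number over NaN.

    Returns None for an empty range (no result for that series).
    """
    if not samples:
        return None
    current = samples[0]
    for value in samples:
        # Replace the running maximum if it is NaN or if a larger value appears.
        # A NaN value never compares greater, so it cannot displace a real number.
        if value > current or math.isnan(current):
            current = value
    return current


print(max_over_time([1.0, 1.0, math.nan]))
```

For the samples in the question the result is 1.0, and the same holds if the NaN comes first. Only a range made up entirely of NaN samples yields NaN.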
Verified from Prometheus documentation -- PromQL Range Vector Functions, Aggregation Over Time Functions, and Handling NaN Values in PromQL sections.
Which of the following PromQL queries is invalid?
Answer : B
The max operator in PromQL is an aggregation operator, not a binary vector matching operator. Therefore, the valid syntax for aggregation uses by() or without(), not on().
max by (instance) up: valid; returns the maximum value per instance.
max without (instance) up and max without (instance, job) up: valid; both aggregate over all labels except those listed.
max on (instance) (up): invalid; on() is only valid in binary operations (e.g., +, -, and, or, unless), where two vectors are matched on specific labels.
Hence, max on (instance) (up) is a syntax error in PromQL because on() cannot be used directly with aggregation operators.
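The difference between by() and without() can be sketched in Python as two ways of building the grouping key. The `aggregate_max` helper and the sample series are hypothetical and only model the label handling, not the PromQL parser.

```python
def aggregate_max(series, by=None, without=None):
    """Group series by a label key and keep the maximum value per group.

    by:      keep only the listed labels in the grouping key.
    without: keep every label except the listed ones.
    """
    groups = {}
    for labels, value in series:
        if by is not None:
            kept = {k: v for k, v in labels.items() if k in by}
        else:
            dropped = set(without or ())
            kept = {k: v for k, v in labels.items() if k not in dropped}
        key = tuple(sorted(kept.items()))
        groups[key] = max(value, groups.get(key, value))
    return groups


# Hypothetical up series with two labels each.
up = [
    ({"instance": "a", "job": "node"}, 1),
    ({"instance": "b", "job": "node"}, 0),
    ({"instance": "a", "job": "api"}, 0),
]

per_instance = aggregate_max(up, by=["instance"])
per_job = aggregate_max(up, without=["instance"])
print(per_instance, per_job)
```

Grouping by instance keeps only that label, while grouping without instance leaves job as the remaining key. Neither involves matching two vectors, which is the only context where on() applies.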
Verified from Prometheus documentation -- Aggregation Operators, Vector Matching -- on()/ignoring(), and PromQL Language Syntax Reference sections.
You'd like to monitor a short-lived batch job. What Prometheus component would you use?
Answer : B
Prometheus normally operates on a pull-based model, where it scrapes metrics from long-running targets. However, short-lived batch jobs (such as cron jobs or data processing tasks) often finish before Prometheus can scrape them. To handle this scenario, Prometheus provides the Pushgateway component.
The Pushgateway allows ephemeral jobs to push their metrics to an intermediary gateway. Prometheus then scrapes these metrics from the Pushgateway like any other target. This ensures short-lived jobs have their metrics preserved even after completion.
The Pushgateway should not be used for continuously running applications: pushed metrics persist until they are explicitly deleted, and Prometheus can no longer track each job's health through the up metric. It is intended for metrics from transient jobs such as backups or CI/CD tasks.
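As an illustration, a batch job could push a metric with a plain HTTP request using only the Python standard library. This is a sketch under assumptions: the gateway address localhost:9091, the job name nightly_backup, and the metric name are all hypothetical, and the helper names are not part of any Prometheus client library.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway, job, metrics):
    """Build a PUT request for the Pushgateway's /metrics/job/<job> endpoint.

    The body uses the text exposition format: one "name value" line per
    metric, terminated by a newline.
    """
    url = f"http://{gateway}/metrics/job/{urllib.parse.quote(job, safe='')}"
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    return urllib.request.Request(url, data=body.encode("utf-8"), method="PUT")


def push(gateway, job, metrics):
    """Send the request; call this as the last step of the batch job."""
    request = build_push_request(gateway, job, metrics)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Hypothetical gateway address, job name, and metric.
request = build_push_request(
    "localhost:9091",
    "nightly_backup",
    {"backup_last_success_timestamp_seconds": 1700000000},
)
print(request.get_method(), request.full_url)
```

PUT replaces all metrics previously pushed for the same grouping key, which suits a job that reports its complete state on each run. Prometheus then scrapes the gateway like any other target.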
Verified from Prometheus documentation -- Pushing Metrics -- The Pushgateway and Use Cases for Short-Lived Jobs sections.