Skip to main content

2 posts tagged with "observability"

View All Tags

Prometheus vendor death match

· 13 min read
TL;DR

We evaluated a number of observability vendors, with a focus on metrics, and did detailed PoCs with both Chronosphere and Grafana Cloud. Both are excellent products, and have slightly different strengths.

Death match

At work, we're in the process of rebuilding our metrics pipeline, as we've outgrown our old self-managed TIG (Telegraf, InfluxDB, Grafana) solution. We've had this solution in place for many years, and it's served us well. Especially given the increasingly predatory pricing models of observability vendors, it's been extraordinarily cost-effective.

But over the last couple years, as we've grown, we've started to hit the limits of what we can handle with a single, vertically scaled instance of InfluxDB (especially using InfluxDB v1). It was increasingly stressful to keep it running smoothly, and we had to be very vigilant about cardinality, as it's very easy to accidentally introduce a cardinality explosion that can bring down the entire database.

Fun with OTEL collectors and metrics

· 6 min read
OpenTelemetry Logo

As part of an evaluation of Prometheus compatible monitoring solutions, I found the need to push our use of the OTEL Collector to handle some use cases like creating metrics allowlists, renaming metrics, or adding and modifying labels.

Here's some examples, based on what I learned, of the crazy and powerful things you can do with OTEL collector processors to manipulate metrics.