Skip to main content

10 posts tagged with "platform-engineering"

View All Tags

Multi-platform container builds with BuildKit

· 13 min read
TL;DR

At SimpliSafe, we wanted to take advantage of the cost savings and performance improvements of AWS's Graviton processors (ARM64), but wanted to do it incrementally to manage risk.

We built an autoscaling, docker-compatible build service using BuildKit, which could build multi-platform container images, and then used Karpenter to auto-provision Graviton-based nodes.

This is paying off- we're seeing the expected 30% better performance per EC2 dollar spent, as well as some surprising benefits to developer productivity.

A polar bear using an ARM64 laptop

AWS's Graviton is a 64 bit ARM-based CPU available on EC2. Why, you ask, would one want to use Graviton-based instances when trusty old x86 instances have served us so well in the past?

Well, for one, you'ds see a roughly 30% improvement in price/performance versus instances with x86 chips. This performance difference is even more pronounced at higher utilization, because unlike x86 chips, a Graviton vCPU is an actual CPU core, NOT a hyperthread you're sharing on a core with some rando. This should allow you to scale down, increase CPU utilization more than would be safe with hyperthreads, and still handle the same load.

Incremental IPv6 with Kubernetes

· 11 min read
TL;DR

Due to looming IP address exhaustion, we've been migrating my company's Kubernetes workloads to IPv6. While IPv6 has its sharp edges, AWS EKS's new IPv6-only mode and better OSS ecosystem support has made it possible to adopt incrementally.

Here's a bunch of tricks I've picked up in the process.

An full parking lot

At my work, we've been struggling a bit over the past few years with decisions made (almost 10 years ago now) about our AWS network design. While we have a full class A private network (16,777,216 IPv4 addresses), we've managed to paint ourselves into the very sad corner of looming IP address exhaustion.

There's a few reasons:

  • Our integration with cell network carriers (to support our home security systems) requires a huge chunk of our IP space
  • Our decision to use a multi-account architecture in AWS, and that we chose to use a flat IP space across our accounts. This means our IP space is fragmented across accounts, regions, and availability zones, making a lot of that address space effectively unusable.

Even with all of this, we might have been fine... until we went big on Kubernetes.

What would an OSS developer platform even look like?

· 15 min read
TL;DR

My team has built a developer platform that our developers really like, and is providing a ton of value for my company. But I'm struggling to figure out if and how we might open-source it. I'm looking for advice from you.

A toolbox

As a platform engineer, I enjoy the benefits of working in a field with a vibrant ecosystem of open source infrastructure and developer tools. I've spent much of the last decade building developer platforms by curating and assembling these tools, and after a number of iterations, I seem to have hit on something that's working really well for my current company (SimpliSafe).

As our platform's adoption has grown, we've gotten more and more frequent, really positive, heartwarming feedback from our developers who really like it. This is absolutely freaking delightful, and honestly never stops surprising me.

I often get asked by our developers if we should consider open-sourcing the platform. I've spent some cycles entertaining the idea, but I usually don't get very far before it seems unworkable.

This post is an experiment in thinking in public; I'd like to brain dump my thoughts on the challenges of building an open-source developer PaaS, in the hopes that the platform engineering community might provide some insight to get me past this block.

Developer experience is a product

· 14 min read
TL;DR

The most important feature of an internal developer platform is that the team that builds it has to compete to win over their users.

Figure out your initial value proposition, build a minimum viable product, get it in front of customers, listen, learn, and iterate.

Platforms imposed by a top-down mandate tend to fail.

Developer Experience Soda

Over the past 15 years, I've been working on one form or another of internal developer platform. Even long before, while working at small startups, I inevitably ended up building (or curating) some little web framework, a build system, and slapping together scripts to package and deploy our stuff reliably. No one ever told me to do this, it was just obviously necessary.

In these cases, I was building a product for myself and my immediate team members, so it was a pretty tight feedback loop with the customer. I'd put a little extra effort to make things nice for other developers on my team, and also out of a bit of pride in making something that felt elegant.

Prometheus vendor death match

· 13 min read
TL;DR

We evaluated a number of observability vendors, with a focus on metrics, and did detailed PoCs with both Chronosphere and Grafana Cloud. Both are excellent products, and have slightly different strengths.

Death match

At work, we're in the process of rebuilding our metrics pipeline, as we've outgrown our old self-managed TIG (Telegraf, InfluxDB, Grafana) solution. We've had this solution in place for many years, and it's served us well. Especially given the increasingly predatory pricing models of observability vendors, it's been extraordinarily cost-effective.

But over the last couple years, as we've grown, we've started to hit the limits of what we can handle with a single, vertically scaled instance of InfluxDB (especially using InfluxDB v1). It was increasingly stressful to keep it running smoothly, and we had to be very vigilant about cardinality, as it's very easy to accidentally introduce a cardinality explosion that can bring down the entire database.

Fun with OTEL collectors and metrics

· 6 min read
OpenTelemetry Logo

As part of an evaluation of Prometheus compatible monitoring solutions, I found the need to push our use of the OTEL Collector to handle some use cases like creating metrics allowlists, renaming metrics, or adding and modifying labels.

Here's some examples, based on what I learned, of the crazy and powerful things you can do with OTEL collector processors to manipulate metrics.

DevOps is a stew

· 7 min read
Irish Stew (10320713316)

When learning a new recipe, especially when dabbling in cuisine from different cultures, I find it really important to make sure one is really precise in their understanding the words used in the recipe. I've had a few unfortunate misunderstandings that resulted in... gastronomic disaster.

Similarly, I find that I can't responsibly use the word "DevOps" without testing that the person I'm talking to know which meaning I'm using. Here's some examples of what someone may think I mean when I say "DevOps":

Karpenter, you complete me

· 9 min read

Every once in a while, some new product comes along that solves a problem you didn't know you had, and does it so well that after you've had it, you can't imagine how you ever lived without it.

This is how I've come to feel about Karpenter. I guess you could say that the category it lives in already existed, given it's designed to replace the Kubernetes Cluster Autoscaler, but the effect it's had on my life as an EKS cluster operator and platform engineer makes me feel like the comparison cheapens it.

Kubernetes might not be for you

· 8 min read

Most mornings, after pulling myself out of bed, I put some semblance of a breakfast together. While eating, I usually take in the news (via the Android app of a traditional newspaper). I timebox this to about 10 minutes, which fits my breakfast-eating pace, and balances my desire to be an educated, responsible citizen with my tolerance for the existential dread I'm going to feel after reading about US politics.

Pepper the Dog
Pepper and coffee

Once I'm sufficiently fed/educated/terrified, I head over to the couch, where the dog joins me for a cuddle while I sip my coffee. At this point, I usually switch over to Hacker News. I've found that Hacker News is a pretty reliable purveyor of articles on topics that overlap my interests. I also appreciate that it gives me an nudge to get outside my go-to subjects, into pretty niche topics in tech, science, math, culture, philosophy (and interesting people who recently died) - all with a taint of delightful nerdiness.