All Cloud, No Cattle Weekly #9
Tech
Preparing to Issue 200 Million Certificates in 24 Hours
From the Let’s Encrypt blog:
On a normal day Let’s Encrypt issues nearly two million certificates. When we think about what essential infrastructure for the Internet needs to be prepared for though, we’re not thinking about normal days. We want to be prepared to respond as best we can to the most difficult situations that might arise. In some of the worst scenarios, we might want to re-issue all of our certificates in a 24 hour period in order to avoid widespread disruptions. That means being prepared to issue 200 million certificates in a day, something no publicly trusted CA has ever done.
This is so mind-blowing that I honestly have nothing else to say. Read it. It’s worth your time.
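To put those numbers in perspective, here is a quick back-of-the-envelope calculation of the sustained issuance rate each scenario implies. It treats "nearly two million" as a flat 2,000,000 and assumes issuance is spread evenly across the day, which real traffic almost certainly is not.

```python
# Back-of-the-envelope issuance rates. Assumptions: "nearly two million"
# is rounded to 2,000,000 and load is spread evenly over 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(certs_per_day: int) -> float:
    """Average certificates per second for a given daily volume."""
    return certs_per_day / SECONDS_PER_DAY


normal = per_second(2_000_000)
worst_case = per_second(200_000_000)

print(f"normal day:    ~{normal:.0f} certs/s")
print(f"full re-issue: ~{worst_case:.0f} certs/s")
print(f"ratio:         {worst_case / normal:.0f}x")
```

That works out to roughly 23 certificates per second on a normal day against roughly 2,300 per second for a full re-issue, a hundredfold jump.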
Metrics Catalog
The metrics-catalog is a declarative approach to monitoring the GitLab.com application, intended to improve the consistency and reduce repetition of configuration in our monitoring suite.
Andrew Newdigate gave a talk on this approach at ScaleConf in 2020, which provides a good overview of the approach we use at GitLab.
This is a really amazing approach to using Prometheus, and I wish I’d discovered it a couple of years ago. Config management for our Prometheus instances at Booking was suboptimal at best, but coming up with something better always seemed like a herculean task.
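For a feel of what "declarative" buys you, here is a minimal Python sketch of the general idea: describe each service once, then generate the repetitive Prometheus recording rules from that description. This is not GitLab’s actual schema or tooling. The `SLI` and `Service` classes, the `recording_rules` helper, the rule naming scheme, and the metric names are all hypothetical, made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SLI:
    """Hypothetical service-level indicator backed by two counters."""
    name: str
    requests_metric: str  # counter of all requests
    errors_metric: str    # counter of failed requests


@dataclass(frozen=True)
class Service:
    """Hypothetical catalog entry: one service and its indicators."""
    name: str
    slis: tuple[SLI, ...]


def recording_rules(service: Service, window: str = "5m") -> list[dict]:
    """Expand a catalog entry into Prometheus-style recording rule dicts."""
    selector = f'{{service="{service.name}"}}'
    rules = []
    for sli in service.slis:
        rules.append({
            # Made-up naming scheme, not a GitLab convention.
            "record": f"sli:{sli.name}:error_ratio_{window}",
            "labels": {"service": service.name},
            "expr": (
                f"sum(rate({sli.errors_metric}{selector}[{window}]))"
                f" / sum(rate({sli.requests_metric}{selector}[{window}]))"
            ),
        })
    return rules


# Example catalog entry with illustrative metric names.
web = Service(
    name="web",
    slis=(SLI("http", "http_requests_total", "http_request_errors_total"),),
)

for rule in recording_rules(web):
    print(rule["record"], "=", rule["expr"])
```

The point is that adding a service or an indicator means adding a few lines of data, and every rule built from it stays consistent because it comes out of the same template.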
On Not Being a Cog in the Machine
Fred Hebert at Honeycomb:
I found myself looking at Honeycomb’s job ad (PDF) for their first SRE position. A few things instantly stood out.
It didn’t name any specific technology, nor did it necessarily ask for any specific prior titles or education. Instead, it mentioned characteristics of the person they wanted:
- Someone who can debug both automated and human processes
- Someone who can work in both software engineering and automation
- Someone able to find balance in all things
- Someone who enjoys teaching and practice
- Someone with some experience of managing stateful services
This led me to apply despite never having held the SRE title before and not having worked extensively with the main backend language used there.
As someone who a) recently changed jobs, b) has experience as an SRE, and c) has repeatedly joined companies whose tech stack I wasn’t already familiar with, I found that this entire post really spoke to me. I especially like the direction that Honeycomb went with their job description here.
Sometimes alerts have inobvious reasons for existing
Chris Siebenmann:
So, alerts have intentions, and we should make sure to document those intentions. Without the intentions, any alert can look stupid.
Speaking of job descriptions, I think these two sentences could practically be one.
@thockingoog commenting on HN
TL;DR burstable CPU is a safety net. It has risks and requires some discipline to use properly, but for most users (even at Google) it is better than the alternative. But don’t take it for granted!
Hear, hear! We spent months and months discussing how to handle this, and I think ultimately a lot of good could have been done had we just landed on this side of the argument over CPU limits.
How I monitor my OpenWrt router with Grafana Cloud and Prometheus
Matthew Helmke writes:
My internet router runs OpenWrt, which is a free/open source Linux operating system designed to replace the software provided by the router’s manufacturer. You can install OpenWrt on a wide range of supported devices. When you do, you frequently end up with enhanced stability along with additional configuration options not available in the stock software.
In this post, I will describe how to monitor the functionality of my Linksys WRT1900AC router running OpenWrt using Grafana Cloud.
I’m a big believer in Grafana, so I love seeing it used for stuff like this.
Don’t Stop Releasing
Click it. You won’t be disappointed.
Grab Bag
We Interrupt This Blog for Something Really Awesome
Paul Lukas at Uniwatch took a short diversion from sports uniform news:
The Baltimore-specific phenomenon of salt boxes, which are these curbside yellow boxes set up on street corners each fall by the city’s department of transportation (DOT). The idea is that local residents can scoop out some salt to use on their sidewalks, driveways, and so on. I’d never heard of anything like that (we certainly don’t have salt boxes here in NYC, or anyplace else I’ve lived), but I liked the idea of it: Socialized sodium, community chloride!
There’s a Baltimore craft artist who’s sort of locally famous for making stuff out of broken dishware. Her name is Juliet Ames, and her website’s URL is actually ibreakplates.com. Back in December, Ames got it in her head that the old, battered salt boxes could use some sprucing up and that she should decorate one of them with broken-plate lettering.
Kristin is a Baltimorean and she assumed that salt boxes were normal everywhere. That these have suddenly become a community art project is really cool.
Darl McBride Files for Bankruptcy
As noted by a reddit poster:
It looks like former SCO CEO Darl McBride, who tried to sue Linux out of existence, has filed for chapter 13 bankruptcy.