ACNC Weekly #23

Welcome to All Cloud, No Cattle Weekly #23.

Last week was unexpectedly an off week, literally because it was a slow news week.

Tech

Building a Healthy On-Call Culture

SoundCloud’s Christine Patton:

The optimal frequency for being on call is about three days a month. More than that and people risk burning out over time. Less than that and people get rusty and aren’t as effective at dealing with incidents. This means the optimal size for a rotation is between eight and twelve engineers, with ten being just about perfect. In fact, I was once part of a rotation that had a waitlist to join because we collectively agreed to not grow bigger than twelve people.

I never joined the global on-call rotation at Booking largely because of this very concern: after looking at the cadence, I knew that I would not be able to be an effective on-call engineer over time. I would have been on call for only about a weekend per quarter and, as a Team Lead who spent only about 20% of his time on technical contributions, I did not feel I would be able to be productive.
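
The arithmetic behind the "eight to twelve engineers" figure can be sketched in a few lines. This is only a back-of-the-envelope illustration: it assumes a round 30-day month, exactly one engineer on call at a time, and shifts shared evenly, none of which the quoted article spells out. The function name is made up for the example.

```python
DAYS_PER_MONTH = 30  # assumption: a round 30-day month


def on_call_days_per_month(rotation_size: int,
                           days_per_month: float = DAYS_PER_MONTH) -> float:
    """Average on-call days per engineer per month.

    Assumes one engineer is on call at any time and that the
    rotation shares the load evenly.
    """
    if rotation_size <= 0:
        raise ValueError("rotation_size must be positive")
    return days_per_month / rotation_size


if __name__ == "__main__":
    # The rotation sizes mentioned in the quote.
    for size in (8, 10, 12):
        days = on_call_days_per_month(size)
        print(f"{size} engineers: {days:.2f} days/month")

    # "About a weekend per quarter" is roughly 2 days every 3 months.
    print(f"weekend per quarter: {2 / 3:.2f} days/month")
```

Under these assumptions a rotation of ten lands on three days a month, while a weekend per quarter works out to well under one day a month, far below the cadence the quote recommends.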

Please don’t count outages

Rachel by the Bay:

This one is subtle, but it has a lot to do with the way people behave in the face of a measurement. In short, if you start counting them, it’s probably because you’re going to start making reports which say “we had X outages in this span of time”. There might even be a gasp trend line showing it going up or going down.
This is terrible. You think it’s going to help, but it’s not. At best, it will have no effect on things, but at worst, it will tell the people in the trenches that “opening a SEV (outage, …) is baaaaaad”, and they will shy away from doing it. Worse still, they may not even realize this avoidance behavior as a conscious thing. It just might not occur to them to hit the create button when it’s time.

100%. To set the right culture, we have to be very careful about the messages we send and the incentives we set.

Don’t worry so much about how many outages we have; instead, worry about our overall reliability and resilience.

Using Civo Kubernetes to gamify Twitter with Prometheus and Grafana

Wiard van Rij:

It started with a Tweet from Julien Pivotto (@roidelapluie): He had created a setup that enables you to graph your Twitter followers with Prometheus and Grafana via json_exporter. This is available on Github.
I wanted to extend this creativity by automating my Twitter banner so that it would display the Grafana panel. It also should update this automatically every n-period - in my case, every minute. This way I have a little gamification for my followers. If one would follow me, the graph should go up at the next update interval. Seems pretty neat!

This is brilliant and I’m already busy thinking of ways to do something similar myself.
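
For a rough idea of the first step in that pipeline, here is a minimal sketch of turning a JSON document into a single sample in the Prometheus text format, which is the kind of mapping a JSON exporter performs. It is not json_exporter's actual code or configuration, and the JSON field names (`screen_name`, `followers_count`), the metric name, and the sample payload are all hypothetical.

```python
import json


def followers_metric(payload: str, metric: str = "twitter_followers") -> str:
    """Render a follower count from a JSON document as one
    Prometheus text-format sample line.

    The field names below are hypothetical. The label value is
    assumed to contain no quotes or backslashes, so no escaping
    is done here.
    """
    data = json.loads(payload)
    name = data["screen_name"]             # hypothetical field
    count = int(data["followers_count"])   # hypothetical field
    return f'{metric}{{screen_name="{name}"}} {count}'


# Hypothetical payload standing in for whatever the real API returns.
sample = '{"screen_name": "example", "followers_count": 1234}'

if __name__ == "__main__":
    print(followers_metric(sample))
```

Prometheus would scrape a line like this at each interval, and Grafana would then graph the resulting time series.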

Should Perl die gracefully?

Mark Gardner:

You have no right to demand Perl stands still and “dies gracefully” any more than anyone has the right to demand that of you.

Well said.

Grab Bag

we can’t both be right

n-gate.com:

An internet lectures passersby about webshit. The lectures are sprinkled with advertisements for an HTTP server that runs as root. We are expected to take security advice from this person seriously.
We do not.
The arguments are copied here for posterity. For the reading impaired: the other site’s text is in block quotes.

Talk about hills to die on…
