Making sense of SRE and observability, one week at a time.
What is site reliability engineering (SRE) really about? How can I make sense of it in my organisation? How do I cut through the buzzwords and actually improve the lives of my colleagues and customers?
Latest episode
When you first start implementing SRE, it's a good idea to find early wins. I'm currently experimenting with monitoring the four golden signals plus availability, to give our SRE team momentum and to pave the way for SLOs and more advanced observability.
In this solo episode I share my experiences, including:
💛 What are the four golden signals?
🦥 Observability in an async processing context
📏 The power of tracking availability
🤷 Why SLOs can be challenging to kick off on day 1
🚪 An invitation to come on the podcast
...and much more.
You can buy Slight Reliability merch here (Note: you cannot order the mugs outside of New Zealand): https://slightreliability.digitees.co.nz/
You can find Stephen on:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Bluesky: https://bsky.app/profile/slightreliability.bsky.social
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
About the host
Stephen has a background in SRE and performance engineering. He has worked in the industry for 15 years as both an external consultant and an internal engineer.
Our industry is full of buzzwords and exaggerations, so it can be hard to know what is real. Stephen strives to take complex technical concepts and simplify them, presenting them in a way everyone can understand and apply (and to call out when something is too good to be true).
Stephen lives in Auckland, New Zealand, and currently works as a Developer Advocate for SquaredUp, where he also promotes and improves observability and SRE practices internally.
