Skip to Main Content

Making sense of SRE and observability, one week at a time

What is site reliability engineering (SRE) really about? How can I make sense of it in my organisation? How do I cut through the buzzwords and actually improve the lives of my colleagues and customers?

Episode 85 - Feeling SaaSsy
Watch now
Episode 84 - Clinical Troubleshooting with Dan Slimmon

This week I chat with Dan Slimmon about applying the approach doctors use to treat patient symptoms during incident response. You can find Dan's blog at or connect with him on LinkedIn here: You can find the official Slight Reliability podcast website at: You can find Stephen at: LinkedIn: Twitter: YouTube: Instagram: TikTok: This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to to sign up for your free account.

Latest episodes
Stephen Townshend

About the host

Stephen has a background in SRE and performance engineering. He has worked in the industry for 15 years as both an external consultant and an internal engineer.

Our industry is full of buzzwords and exaggerations, it can be hard to know what is real or not. Stephen strives to take these complex technical concepts and to simplify and present them in a way everyone can understand and apply (and to call out when something is too good to be true).

Stephen lives in Auckland, New Zealand and currently works as a Developer Advocate for SquaredUp, as well as promoting and improving observability and SRE practices internally in the organisation.