
Site Reliability Engineering Lead - London
Greater London, South East, England
Apply by 29 Apr 2026
£90,000 - £110,000 per annum
Job Ref.: BH-56921-1
Job Description
You’ll lead and mentor the SRE team, setting direction and raising the bar for reliability across our systems. You’ll take end-to-end ownership of production, ensuring availability, performance, and effective incident response, while defining SLIs and partnering with Product on meaningful SLOs and error budgets.
In practice, that means you’ll:
- Own production systems (availability, performance, incident response)
- Define SLIs/SLOs and use error budgets to guide decisions
- Run incident management, on-call, and blameless postmortems
- Get hands-on with code (PHP, Java/.NET) to troubleshoot and improve reliability
- Drive automation and reduce operational toil
- Build observability that gives real insight into system health
- Partner with engineers to embed reliability into the SDLC
We’re looking for someone who brings strong technical leadership, communicates clearly (especially during incidents), and takes real ownership of problems through to resolution. You should be comfortable operating at scale, have deep experience with SLIs/SLOs, incident management, and observability tooling, and be at home working with Linux, databases, cloud platforms (ideally Azure), Kubernetes, and Infrastructure as Code.
Just as importantly, you should enjoy tackling complex, imperfect systems and turning them into something reliable, scalable, and well understood.