Monday, June 20, 2022

Measuring SRE vs SWE Impact

It's time to start a sprint/planning meeting.

As always, more work exists than engineers. Work must be triaged and prioritized.

The impact of the work is often a factor in deciding which cards rise to the top.

How should Site Reliability Engineers (SRE) measure impact compared to Software Engineers (SWE)?

SWE work is feature-focused.

  • What have I added to this project?
  • What bug have I fixed?
  • How many story points am I taking on?
  • What Epic will I advance with this task?

SWE is a product-driven approach, which is valuable but often at odds with reliability work.

SRE work often focuses on tasks that answer these questions:

  • What Toil will this reduce?
  • Does this work empower other engineers to self-serve a solution?
  • Will this work resolve a near miss that we've been fortunate to avoid?

Often SRE work looks like the reduction of tech debt.

That's not to say that SRE is responsible for tech debt, or should be the "infra grease" for an organization.  

SREs tend to be experienced engineers with a background in SWE, System Administration or even product management.  SREs often engineer solutions to sociotechnical problems.  The most effective use of SRE teams is to position them to rapidly identify and eliminate existing Toil, while driving efforts to increase reliability.

From a systems perspective, reliability is a spectrum.  If the choices that increase reliability outweigh the choices that reduce it, the system is trending towards stability.

SRE teams should measure the impact of the projects they take on by the amount those projects increase reliability.
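
One way to make this concrete in a planning meeting is to score each card against the three SRE questions above. The sketch below is illustrative only: the Card fields, the bonus weights, and the example cards are all assumptions made up for this example, not an established metric.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """A planning card with hypothetical reliability estimates."""
    title: str
    toil_hours_saved_per_month: float  # assumed team estimate
    enables_self_service: bool         # lets other engineers self-serve?
    resolves_near_miss: bool           # closes a known near miss?


def impact_score(card: Card,
                 self_service_bonus: float = 10.0,
                 near_miss_bonus: float = 20.0) -> float:
    """Combine the three SRE questions into one number.

    The bonus weights are arbitrary placeholders; a team would tune them.
    """
    score = card.toil_hours_saved_per_month
    if card.enables_self_service:
        score += self_service_bonus
    if card.resolves_near_miss:
        score += near_miss_bonus
    return score


def prioritize(cards: list[Card]) -> list[Card]:
    """Return cards ordered from highest to lowest estimated impact."""
    return sorted(cards, key=impact_score, reverse=True)


# Made-up example cards for a single planning meeting.
cards = [
    Card("Rename dashboard", 0.0, False, False),
    Card("Self-serve DB provisioning", 8.0, True, False),
    Card("Automate cert rotation", 12.0, False, True),
]

ranked = prioritize(cards)
for card in ranked:
    print(card.title, impact_score(card))
```

The point is not the specific numbers but that reliability work gets an explicit, comparable estimate, the same way feature work gets story points.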
