Site Reliability Engineering (SRE)

I often get the impression that “continuous X” and DevOps are treated as the magic terms for closing the gap between development and operations. Let us go back to the year 2006, to this interview here and the CTO's directive: “You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service”.

SRE (the Google book on it can be found here) was invented at more or less the same time (2003) with the same goal in mind, making ITIL, which was invented in the 1990s, more bottom-up from an organizational perspective. The pressure on developers to get rid of operational work has certainly helped to improve automation, which we know today as, e.g., CI/CD. The communication and collaboration between (often still separated) teams has improved. Together with the “Agile Manifesto” (external link) activities around the year 2000, it has been polished into a “super weapon”, but it still remains a question of the service concept, primarily driven by two root concerns:

  • RTO (Recovery Time Objective), definition taken from NIST (external link): The overall length of time an information system’s components can be in the recovery phase before negatively impacting the organization’s mission or mission/business processes.
  • RPO (Recovery Point Objective), definition taken from NIST (external link): The point in time to which data must be recovered after an outage.
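
To make the two definitions above concrete, here is a minimal sketch that checks a single outage against both objectives. The function name `check_recovery`, the timestamps, and the one-hour targets are hypothetical illustrations, not values taken from NIST. It also assumes that the last backup is the only recovery point available.

```python
from datetime import datetime, timedelta


def check_recovery(last_backup, outage_start, service_restored, rto, rpo):
    """Return (rto_met, rpo_met) for one outage (hypothetical helper)."""
    # RTO: how long the system was in the recovery phase.
    downtime = service_restored - outage_start
    # RPO: data written after the last backup cannot be recovered.
    data_loss_window = outage_start - last_backup
    return downtime <= rto, data_loss_window <= rpo


# Hypothetical incident: backup at 11:30, outage at 12:00, restored at 13:30.
result = check_recovery(
    last_backup=datetime(2024, 1, 10, 11, 30),
    outage_start=datetime(2024, 1, 10, 12, 0),
    service_restored=datetime(2024, 1, 10, 13, 30),
    rto=timedelta(hours=1),
    rpo=timedelta(hours=1),
)
print(result)  # (False, True): recovery took too long, data loss was acceptable
```

In this example the RPO is met (30 minutes of data lost against an allowed hour), but the RTO is missed (90 minutes of recovery against an allowed hour).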

These concerns mark the distinction between DevOps and SRE:

  • It happened vs. before it happens: SRE is for the time after a failure has happened; DevOps does everything to avoid failure. A new trend is the role of “Platform Engineering”: people who train and coach engineers to continuously get better with the “platform” they are working on or using. A complementary approach is called “Projects 2 Products”. It emphasizes that development “never” stops.
  • Deployment vs. delivery: SRE looks at uptime and availability after an application is deployed; DevOps looks at efficient development and delivery.
  • Keep it in the market vs. get it in the market: SRE is about robustness; DevOps is about rolling out new features and faster release cycles.

A few words on commonly used terms:

SRE is the thinking behind “operational level agreements” (OLA). An OLA is about the internal organization (contracts). Don’t mix it up with the “service level agreement” (SLA), which is directed at the customer. Each promise made in an SLA is called a “service level objective” (SLO). Whether the SLA is met is shown by the “service level indicator” (SLI), which provides the metric. At the latest when the SLI is outside the given thresholds, the alarm bells ring and SRE has to react.
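
The relationship between SLI and SLO can be illustrated with a small sketch. It assumes an availability SLI defined as the share of successful requests, which is one common choice but not the only one. The function names, the request counts, and the 99.9% target are hypothetical.

```python
def availability_sli(successful_requests, total_requests):
    """SLI: fraction of requests served successfully (assumed definition)."""
    if total_requests == 0:
        return 1.0  # no traffic, so nothing has failed
    return successful_requests / total_requests


def slo_breached(sli, slo_target):
    """True when the measured SLI falls below the promised SLO."""
    return sli < slo_target


# Hypothetical measurement window: 100,000 requests, 150 of them failed.
sli = availability_sli(successful_requests=99_850, total_requests=100_000)
breached = slo_breached(sli, slo_target=0.999)
print(sli, breached)  # 0.9985 True: below the 99.9% target, SRE has to react
```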
