Channel: Kumar Chinnakali – dataottam

Leaf #1 – The 7 Key Principles of Site Reliability Engineering (SRE)

Dear friends, in this week’s Leaf (Learning from this week) episode, let’s talk about “The 7 Key Principles of Site Reliability Engineering (SRE)”. I first came across the term SRE at the last Google Cloud Platform conference I attended in Stockholm, and it got me excited to dig deeper into the role. This post draws on the free book Site Reliability Engineering: https://landing.google.com/sre/book.html

The easy way to understand SRE is: class SRE implements DevOps. Wikipedia describes Site Reliability Engineering (SRE) as a discipline that incorporates aspects of software engineering and applies them to IT operations problems. The main goals are to create ultra-scalable and highly reliable software systems. According to Ben Treynor, founder of Google’s Site Reliability Team, SRE is “what happens when a software engineer is tasked with what used to be called operations.”

A site reliability engineer (SRE) will spend up to 50% of their time on “ops” work such as issues, on-call, and manual intervention. Since the software system that an SRE oversees is expected to be highly automated and self-healing, the SRE should spend the other 50% of their time on development tasks such as new features, scaling, or automation. The ideal SRE candidate is a programmer who also has operational, systems, or networking knowledge, and who likes to whittle down complex tasks. As it has come to be defined at Google, SRE is what happens when you ask a software engineer to design an operations team.
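To make the 50% split concrete, here is a minimal sketch of how a team might check a week of logged hours against that cap. The `WeekLog` record, the `OPS_CAP` constant, and the function names are hypothetical and made up for illustration; they do not come from the SRE book or from any tool.

```python
from dataclasses import dataclass

# Upper bound on the share of time spent on ops work (the 50% figure above).
OPS_CAP = 0.50


@dataclass
class WeekLog:
    """Hours one engineer logged in a week (hypothetical record)."""
    ops_hours: float  # issues, on-call, manual intervention
    dev_hours: float  # new features, scaling, automation


def ops_fraction(log: WeekLog) -> float:
    """Return the share of logged hours that went to ops work."""
    total = log.ops_hours + log.dev_hours
    if total == 0:
        return 0.0  # nothing logged, so nothing to flag
    return log.ops_hours / total


def over_ops_cap(log: WeekLog, cap: float = OPS_CAP) -> bool:
    """True when ops work exceeded the cap for this week."""
    return ops_fraction(log) > cap


if __name__ == "__main__":
    week = WeekLog(ops_hours=24, dev_hours=16)
    print(f"ops share: {ops_fraction(week):.0%}, over cap: {over_ops_cap(week)}")
```

A week that lands exactly on 50% is treated as within the cap here, since the text says "up to 50%".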

While the nuances of workflows, priorities, and day-to-day operations vary from SRE team to SRE team, all share a set of basic responsibilities for the services they support, and adhere to the same core tenets. In general, an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of its services.

Below are the 7 key principles of a Site Reliability Engineering (SRE) team:

  1. Embracing Risk
  2. Service Level Objectives
  3. Eliminating Toil
  4. Monitoring Distributed Systems
  5. Automation
  6. Release Engineering
  7. Simplicity
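
The first two principles meet in the idea of an error budget: if a service targets an availability SLO below 100%, the gap is the amount of unreliability the team is allowed to spend. Below is a small sketch of that arithmetic, assuming a time-based availability SLO measured over a rolling window. The function names and the 30-day default are assumptions chosen for illustration.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window.

    slo is a fraction, e.g. 0.999 for "three nines".
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in (0, 1]")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo) * window_minutes


def budget_remaining(slo: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Minutes of budget left; a negative value means the SLO was missed."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


if __name__ == "__main__":
    print(f"budget: {error_budget_minutes(0.999):.1f} min")
    print(f"left after 30 min of downtime: {budget_remaining(0.999, 30):.1f} min")
```

For a 99.9% SLO over 30 days, the budget comes to about 43.2 minutes (0.1% of 43,200 minutes).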
