Running Applications in Production Workshop
The most important feature of any system is its reliability. Using Site Reliability Engineering (SRE) concepts and the principles of DevOps, teams can learn how to define objectives that describe and measure expectations of both value and reliability. In this workshop, you'll learn how Service Level Objectives (SLOs) and Error Budgets can be used to manage risk, reduce tension, and enable organizations to move faster in a data-driven, user-focused manner.
The first part of the workshop is based on SRE/DevOps concepts and theoretical exercises. The workshop then shifts to focus entirely on building demonstrable value within the team's organization.
Studies show that 40 to 90 percent of the total cost of software is incurred after launch. This workshop explores the foundations critical to running software in production:
- Identify what "reliable" looks like
- Explore objectives that define user value and expectation
- Review metrics that validate how the objectives are met
- Manage risk to enable product teams to move faster
- Understand how to measure user expectations
- How this applies to DevOps
- Why your services need SLOs
- Spending your error budget
- Choosing a good SLI
- Developing SLOs and SLIs
- Blameless postmortems
- Define SLO targets based on user journeys
- Identify and break down a valuable user journey in your system
- Define an SLO for this service based on current expectations
- Define SLIs to measure this SLO
- If possible, put this measurement into production
- Influence data-driven conversations within your organization
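To make the relationship between an SLI, an SLO target, and an error budget concrete, here is a minimal sketch of the arithmetic. It assumes a request-based SLI (good events divided by total events) measured over a single window. The names `SloReport` and `evaluate_slo` and the sample numbers are illustrative assumptions, not part of the workshop material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    sli: float              # observed fraction of good events in the window
    allowed_bad: float      # bad events the SLO target permits at this volume
    budget_consumed: float  # fraction of the error budget spent (may exceed 1.0)


def evaluate_slo(good_events: int, total_events: int, slo_target: float) -> SloReport:
    """Compare an observed request-based SLI against an SLO target.

    Hypothetical helper: assumes the SLI is good_events / total_events and
    that slo_target is a fraction such as 0.999 (that is, 99.9%).
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")

    bad_events = total_events - good_events
    sli = good_events / total_events
    # The error budget is the share of events the target allows to fail.
    allowed_bad = (1.0 - slo_target) * total_events
    budget_consumed = bad_events / allowed_bad
    return SloReport(sli=sli, allowed_bad=allowed_bad, budget_consumed=budget_consumed)


if __name__ == "__main__":
    # Made-up numbers: 1,000,000 requests, 500 failures, 99.9% target.
    report = evaluate_slo(good_events=999_500, total_events=1_000_000, slo_target=0.999)
    print(f"SLI: {report.sli:.4%}")
    print(f"Error budget consumed: {report.budget_consumed:.1%}")
```

With these sample numbers the target allows roughly 1,000 failed requests, so 500 failures spend about half of the budget. A team in that position still has room to take on risk, such as shipping a change, before the SLO is breached.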
Who should attend?
This is a technical workshop designed for development and operations engineers and their immediate management. However, the best outcomes result when technical product people and business leaders participate as well. SLO targets and Error Budgets need to be set with users in mind, and the consequences for exceeding an Error Budget need to have executive backing.