In September 2013 I joined the group known today as WWT Application Services and have been learning about DevOps, mostly through project experiences, since I began. I'm fortunate to have served in development roles commonly associated with DevOps for WWT custom software teams as well as consultant opportunities within teams external to WWT. Participating in transformational projects and improvement initiatives has taught me that DevOps can be a subjective if not arbitrary term. Desired IT performance characteristics such as "quick to market," "technical agility" and "operational stability" are intriguing aspects to pursue and thrilling experiences to observe, but how is success determined? What are the identifiable characteristics of a successful DevOps implementation?
In my consultative opinion, answering these questions and working towards alignment with customer expectations for a successful outcome is one of the more difficult aspects of providing DevOps-related services. At times, customers provide an opinion of tooling implementations as a measure of DevOps success. Eventual tooling utilization is the opinionated goal of the project, not necessarily an intended influence for the larger organization. I'm also of the belief these efforts often support a biased objective which benefits selective departments or roles rather than a targeted optimization for the organization as a whole. From the perspective of a company executive or director, roles which provide influential opinion, this is an understandable strategy. Whichever opinion provides the direction for an initiative, influencing organizational performance is difficult when department-centric improvement is pursued.
Identifying key performance-related metrics throughout an organization's planning, development and operational processes is a crucial, and commonly overlooked, component of improvement initiatives. Such data points provide a performance baseline (initial status) and, over time, relational data for hypothesized improvement efforts.
Which metrics should be used to measure organizational performance? How do we know if improvement has happened? Maybe more importantly, how could an outside consultant evaluate an organization's performance and determine what's needed for realized improvement? Rather than initially focusing on each unique and intricate integration within a company, we can generalize an organization as a system when evaluating performance.
Ideas or initiatives enter the system, work is done by the system, a product or service results as an output, and (ideally) a yield of value is realized. We can evaluate the system by measuring certain key characteristics:
- The amount of process time for the system to provide output relative to input.
- The frequency of output.
- The frequency of providing the expected output.
- The amount of time to resolve failures or defects caused by output.
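As a rough illustration, the four characteristics above can be computed from a simple log of deliveries. The sketch below is a minimal example and not a standard implementation: the `Delivery` record, its field names and the `system_metrics` function are all hypothetical, and a real organization would need to decide what counts as "started," "delivered" and "failed" in its own context.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Delivery:
    """One unit of output from the system (hypothetical record shape)."""
    started: datetime                    # work entered the system
    delivered: datetime                  # output reached its consumer
    failed: bool = False                 # output caused a failure or defect
    restored: Optional[datetime] = None  # when that failure was resolved


def system_metrics(deliveries: list[Delivery], window: timedelta) -> dict:
    """Summarize the four characteristics for deliveries observed in `window`."""
    if not deliveries:
        raise ValueError("at least one delivery is required")
    lead_times = [d.delivered - d.started for d in deliveries]
    failures = [d for d in deliveries if d.failed]
    restore_times = [d.restored - d.delivered for d in failures if d.restored]
    return {
        # process time for the system to provide output relative to input
        "median_lead_time": median(lead_times),
        # frequency of output
        "deliveries_per_week": len(deliveries) / (window / timedelta(weeks=1)),
        # frequency of providing the expected output
        "expected_output_rate": 1 - len(failures) / len(deliveries),
        # time to resolve failures or defects caused by output
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Made-up sample data covering a two-week observation window.
t0 = datetime(2024, 1, 1)
sample = [
    Delivery(t0, t0 + timedelta(days=2)),
    Delivery(t0 + timedelta(days=3), t0 + timedelta(days=7), failed=True,
             restored=t0 + timedelta(days=7, hours=6)),
    Delivery(t0 + timedelta(days=8), t0 + timedelta(days=14)),
    Delivery(t0 + timedelta(days=10), t0 + timedelta(days=14)),
]
metrics = system_metrics(sample, window=timedelta(weeks=2))
```

Collected once, these numbers serve as the baseline. Collected repeatedly, they show whether a hypothesized improvement actually moved the system.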
As previously mentioned, customers may bring preconceived opinions regarding their desired improvement, which at times can make engagements wonderfully awkward. They believe agile, cloud computing and Infrastructure as Code are solutions for improvement and rely on experienced technical roles to implement their transition (see Dunning-Kruger effect). What may not be anticipated, however, is that they are pursuing a symptom rather than a cause, and the cause is difficult to influence with tooling. While extremely useful, tools will not provide the solution for resolving organizational performance issues. Only the organization has that capability.
Implementing an agile framework innately provides as much agility for an organization as a tape measure provides for building a home. Linux containers enable development velocity as much as a mixing bowl enables a delicious cake. Tooling can complement improvement efforts but does not adapt goals, incentives, processes and communication structures. In my experience, tools are irrelevant when performance is at the forefront of organizational improvement.
A good example is the legendary Velocity Conference talk given by John Allspaw and Paul Hammond more than a decade ago. While discussing the characteristics of enabling ten deploys per day at Flickr, they introduced key values and methodologies which provided the foundation of the current DevOps model. Tooling is certainly mentioned; however, the primary lesson is Flickr's example of utilizing tools to enable innovative development and operational processes rather than define them. If tooling were the key to improving organizational performance, how could a company from 2009 outperform enterprise organizations of today that have the CNCF tooling landscape available for their technology solution needs? The answer: Flickr constructed a better system for completing work. A cultivated system which prioritized people over processes and processes over tooling. An organization which optimized its collective systems to the benefit of the larger organization.
Maximizing optimization of exclusive sub-systems within an organization typically does not improve performance. As an example, let's say there is an organization implementing an agile transformation for its development teams. After a given amount of time the Scrum framework is adopted, the teams adapt, and velocity metrics begin to show the teams are "completing" their targeted objectives in 2-week sprints. However, when a team is ready to deliver completed work to production, the organization still requires approval from a monthly occurring change-advisory board (CAB) process.
Despite the metrics reported by the newly formed Scrum teams, the downstream release process limits the organization to delivering at roughly half the rate at which the teams report completed work. Essentially, the teams' agile practices are an over-optimization in comparison to the organization's overall performance. Many real-world teams presented with this dilemma do not focus on the formal release process (after all, it's not their responsibility) but instead are motivated to increase their delivery batch sizes to match the cadence constraint. This pattern then begins a downward spiral in the quality, performance and stability of the organization's technical solutions.
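The arithmetic behind this mismatch can be sketched in a few lines. The example below is a simplified model and not a description of any real release process: the function name `release_batches` is hypothetical, a "monthly" CAB is assumed to meet every 4 weeks, and CAB meetings are assumed to fall on sprint boundaries.

```python
def release_batches(sprint_weeks: int, release_weeks: int, horizon_weeks: int) -> list[dict]:
    """Return, per release, how many sprints of finished work ship together
    and how many weeks each sprint's work waited for that release."""
    releases = []
    pending = []  # weeks at which a sprint finished but its work has not shipped
    for week in range(1, horizon_weeks + 1):
        if week % sprint_weeks == 0:
            pending.append(week)          # a sprint ends; work is "completed"
        if week % release_weeks == 0 and pending:
            releases.append({
                "week": week,
                "sprints_in_batch": len(pending),
                "weeks_waiting": [week - done for done in pending],
            })
            pending = []                  # everything pending ships together
    return releases


# Assumption: 2-week sprints, a CAB every 4 weeks, observed for 8 weeks.
releases = release_batches(sprint_weeks=2, release_weeks=4, horizon_weeks=8)
for r in releases:
    print(r)
```

Under these assumptions every release carries two sprints of work, and half of that work has already waited two weeks before any customer can use it. Lengthening the release interval in the model grows both the batch and the wait.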
The highest priority in agile development
While participating on teams who've identified as agile, I've worked with a range of popular agile frameworks. At times, with both experienced agile teams and teams who've just begun their agile journey, organizational roles, processes and systems were modeled according to the framework's documentation to supply confidence in team agility. Boards were used to visualize work. There were regularly scheduled meetings for stand-up, story planning, backlog grooming, retrospectives and stakeholder demonstrations. Vanity metrics were tracked in relation to interpreted goals derived from the framework guide. Despite these efforts to improve agility and performance, the teams were still releasing work to stakeholders in month- or even year-sized batches.
What is commonly not considered is team performance in relation to the 12 principles of agile and the importance of customer feedback for work considered "done." The first principle states (regardless of framework), "our highest priority is to satisfy the customer through early and continuous delivery of valuable software." If customer value cannot be determined until a solution is delivered and used, how much organizational agility exists in delivery cadences of 6, 12 or 18 months? The question does not attempt to distinguish agile legitimacy based on batch size; rather, it helps to determine a preferred performance. Would the organization see a greater benefit if completed work was validated in days, hours or minutes?
Working in small batches
In my experience, improving organizational performance is not a prescriptive process because the eventual solutions are as technically unique as the organization. Improvement initiatives begin in unique contexts, desired transformations target unique objectives, and the skillsets of the roles involved vary drastically. There are, however, patterns which are common amongst the high-performing teams I've worked with, and these patterns typically enable one key ability: completing work in small batches.
Completing work in small batches is an enabler of performance for an organization. It directly influences The Four Key Metrics of performance, provides a construct for implementing The Three Ways of DevOps and is a requirement for any process an organization would like to classify as "continuous" (see CI Theater). That being said, the transition to completing work in small batches typically requires significant change which influences an organization's cultural context and systemic processes.
The envelope exercise
Organizational agility and performance are typically not constrained by technical capabilities but by the amount of work considered "in flight," also known as work in process (WIP). Large amounts of WIP may initially seem harmless, perhaps even a positive interpretation of performance, but completing work in large batch sizes greatly restricts flow and efficiency for any system requiring a series of stages to complete an objective. The Envelope Exercise (a demonstration of continuous flow) is a good example of how process alone directly correlates to a system's efficiency and performance. The exercise involves preparing 10 copies of a letter for mail delivery using the following process:
- Fold the letter to fit inside the envelope.
- Place the letter in the envelope.
- Seal the envelope.
- Address the envelope.
- Place envelope for delivery.
Common intuition for completing this task is to divide the work by its stages (folding all letters, stuffing all envelopes, sealing all envelopes, etc.) and to create a collection of inventory which moves to the next stage only once the full batch of 10 has completed the current one. In practice, however, performing each step of the process one-by-one (fold one letter, stuff an envelope, seal the envelope, etc.) has been proven to provide the quickest process for completing the task as well as for discovering potential issues in the planned later stages. Learning to plan, develop and deliver abstract inventory in limited batch sizes is the comparable pattern for efficient flow of work in software.
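Part of the exercise can be sketched as a small simulation. This is a simplified model under stated assumptions, not a faithful reproduction of the exercise: there is one worker, every step takes exactly one unit of time, and the handling overhead of moving stacks between stages is ignored. Because of that last assumption, total time comes out equal for both approaches here. What the sketch isolates is how soon the first finished envelope appears and how early a later stage is first exercised. The names `schedule` and `first_time` are hypothetical.

```python
STEPS = ["fold", "insert", "seal", "address", "place for delivery"]


def schedule(copies: int, batch_size: int) -> list[tuple[int, int, str]]:
    """Return (finish_time, letter, step) events for a single worker.

    batch_size == copies models "fold all, then insert all, ...".
    batch_size == 1 models finishing one envelope before starting the next.
    Assumption: every step takes one unit of time and hand-offs are free.
    """
    events = []
    clock = 0
    for start in range(0, copies, batch_size):
        batch = range(start, min(start + batch_size, copies))
        for step in STEPS:
            for letter in batch:
                clock += 1
                events.append((clock, letter, step))
    return events


def first_time(events: list[tuple[int, int, str]], step: str) -> int:
    """Time at which `step` is first completed for any letter."""
    return min(t for t, _, s in events if s == step)


large_batch = schedule(copies=10, batch_size=10)
single_piece = schedule(copies=10, batch_size=1)

for name, events in [("batch of 10", large_batch), ("one-by-one", single_piece)]:
    print(name,
          "| first envelope delivered at t =", first_time(events, "place for delivery"),
          "| 'insert' first attempted by t =", first_time(events, "insert"))
```

In this model the first envelope is ready at t = 5 when working one-by-one and at t = 41 when working in a batch of 10. A letter that does not fit its envelope would surface at t = 2 rather than t = 11, after only one letter has been folded instead of all ten.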
While flow of work is important, feedback loops are the mechanisms which ultimately determine a system's performance. Systems which plan for flow without frequent feedback are eventually crippled by "rework," the additional work created to resolve defects according to stakeholder expectations or customer experience. When feedback is not considered in relation to flow, IT systems become overwhelmed by planned as well as unplanned work and many systems struggle in stressful contention to complete either. The larger the batch size of delivery (WIP), the greater amount of risk for the business initiative.
Where should an organization start for a transition to working in small batches? Which aspect should an organization first focus on to construct systems which gain efficiency, increase quality, reduce risk and improve employee satisfaction? I will again revert to the people, process and tools pattern. Since working in small batches is a process implementation and people are the prerequisite, we must provide an alignment of beliefs, characteristics and practices for the psychology of the system. In other words, we must have an organizational culture which values working in small batches.
When we utilize a perspective of systems thinking, we're able to hypothesize and validate solutions for very complex and rigid problem sets. As mentioned, it's difficult to provide a prescriptive set of steps for transitioning to working in small batches. I can, however, provide an initial direction, as I strongly believe pursuing these organizational characteristics is a good first step to begin the journey.
Early in my career I believed showing vulnerability related to technical domains of responsibility was a sign of weakness — a professional faux pas. I'd consistently done my best to hide implications of incompetence and for a long time primarily only pursued tasks which related to my Subject Matter Expert (SME) label because that's what kept me comfortable and confident. Once empowered by a company culture which prioritized trust, empathy and learning, I began to realize my "hiding strategy" was actually restricting my domain knowledge and technical skillset. I've learned leaning into an exposure of vulnerability provides great opportunities for learning and often the best lessons.
Is there a stigma tied to ignorance within your organization? Perhaps also a stigma tied to failure? Improvement is unlikely if it must be predictable and perfect. Designing systems to safely allow for and expect failure is an enabler for improvement. Failure provides the opportunity for learning, learning leads to iteration and iteration leads to efficiency and stability. Until ignorance and failure are freed from shame, change is viewed as risk and is not likely to provide improvement.
Become a learning organization
Confronting our collective vulnerabilities provides an opportunity for learning, not necessarily through formal training programs but through lessons related to our current experiences within organizational goals and processes. Organizational learning enables the adaptation of systems to align visions and goals, propagate tribal knowledge, communicate context and cultivate participants who embrace change in pursuit of improvement. Learning organizations are composed of groups who desire accountability, autonomy, collaboration, curiosity, diversity, empathy, humility and transparency, and who are continuously encouraged to question the systems they work in. Learning organizations tend to generate talented IT professionals, and talented IT professionals are drawn to learning organizations.
Quality is the key
What is quality? Can one person find quality where another does not? If there is a disagreement about quality, on what basis can it be debated?
It's difficult to describe quality without including lessons resulting in subjective or socially accepted value; the ability to provide an experience which is reliable and preferred over others. From a business perspective, quality can provide lower costs with greater performance which could then be utilized as a competitive advantage within a given market. It's common for businesses to have technical departments, roles and formal processes primarily focused on quality, yet still struggle to implement change, reliability and a preferred user experience. Why is this?
Conway's Law states organizations will design systems which mirror their communication structures. Technical roles build solutions which reflect their understanding of purpose in engineering cultures commonly established by leadership. Leaders are the personification of an organization's culture, so the quality realized in a company product can be correlated to the quality of the organizational systems used to create the product. As W. Edwards Deming stated, "quality is everyone's responsibility," but also, "quality starts in the boardroom." A successful DevOps implementation increases both the quality of a product and the quality of life for those working within the system.