DevOps Handbook Digest

Part 1 — The Three Ways

Fernando Villalba
8 min read · Apr 16, 2019

The Phoenix Project and, later, The DevOps Handbook are two great books with which to begin a journey in DevOps and to understand the steps required to build an organisation that breathes a DevOps culture.

What follows is a digest of the first part of The DevOps Handbook. It is mostly meant for people who have read the book and want a quick reference or refresher on the topics it covers. If you haven’t read the book, I strongly recommend that you do.

History

DevOps derives from

The Agile Manifesto came first. Inspired by this set of ideas, a talk called “10 Deploys per Day: Dev and Ops Cooperation at Flickr” was given at the Velocity Conference. The term DevOps was coined shortly after by Patrick Debois, who had been greatly inspired by this talk.

From Lean and Toyota Kata, the most important lesson is that organisations must continuously improve and refine their processes; stagnant processes lead to a stagnant organisation that does not improve or evolve. In fact, this improvement kata must be at the core of the organisation’s culture, and every employee must actively contribute to improving processes and outcomes.

Lead Time

One of the fundamental concepts in Lean is the value stream.

The book Value Stream Mapping: How to Visualize Work defines a value stream as “the sequence of activities an organisation undertakes to deliver upon a customer request”.

Lead time is a measure taken within the value stream: it is the time it takes for a request to be fulfilled, counted from the moment the request is made. Processing time is the time during which the request is actually being worked on; it does not include the time the request spends waiting in a queue.
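To make the difference between the two measures concrete, here is a minimal sketch in Python. The `Request` class, its field names and the dates are all made up for illustration and do not come from the book. It also assumes, for simplicity, that work is continuous once it has started.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Request:
    """A single work item; every field name here is an illustrative assumption."""
    created: datetime       # when the customer made the request
    work_started: datetime  # when someone first began working on it
    delivered: datetime     # when the request was fulfilled

    @property
    def lead_time(self) -> timedelta:
        # Lead time: from the request being made until it is fulfilled.
        return self.delivered - self.created

    @property
    def processing_time(self) -> timedelta:
        # Processing time: only the time the request was being worked on.
        return self.delivered - self.work_started

    @property
    def queue_time(self) -> timedelta:
        # Time the request spent waiting before anyone picked it up.
        return self.work_started - self.created


# Made-up example: requested on 1 April, picked up on 8 April, delivered on 10 April.
request = Request(
    created=datetime(2019, 4, 1, 9, 0),
    work_started=datetime(2019, 4, 8, 9, 0),
    delivered=datetime(2019, 4, 10, 9, 0),
)

print(request.lead_time.days)        # 9
print(request.processing_time.days)  # 2
print(request.queue_time.days)       # 7
```

In this example most of the lead time is queue time, which is usually where the biggest improvements are to be found.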

Traditionally, lead times for getting a change into production were measured in months, due to tightly coupled, monolithic applications, a lack of automated testing, and long and complicated change management processes.

On the other hand, a DevOps organisation can have lead times of minutes. This can be achieved by doing the following:

  • Checking small code changes into the repository.
  • Performing automated and exploratory testing against it.
  • Deploying to production.

Because the changes are small and the testing is thorough, the risk of breaking the production environment is greatly reduced. Also, reverting a small change is a lot easier than reverting a very big one.

This is easier to achieve when the architecture is modular, well encapsulated and loosely coupled, so that small teams are able to work with a great degree of autonomy and failures cause only localised downtime.

The Three Ways

The Phoenix Project presents the Three Ways.

  1. The Principles of Flow: Build in small batches and optimise for fast delivery from development to production.
  2. The Principles of Feedback: There needs to be a constant flow of feedback from right to left, enabling faster detection and recovery and better communication between teams.
  3. The Principles of Continual Learning and Experimentation: Work as a generative organisation with a high-trust culture that supports a dynamic, disciplined and scientific approach to experimentation and risk taking.

The First Way
The Principles of Flow

The First Way is about a fast and smooth flow of work from development to operations, so that value is delivered to the customer quickly.

This can be achieved by:

Making the work visible

If we do not have good visibility of the work being done at the moment, it is very difficult to know where the bottlenecks and impediments are and it is quite easy to lose track and sight of the overall picture.

One of the best ways of doing this is by using kanban boards.

Limit work in progress (WIP)

Interrupting technology workers has consequences that are not easily appreciated. Programming requires intense concentration and constant interruptions can be very disruptive.

Multitasking significantly degrades the quality of the work being done. Disruptions should therefore be mitigated and work in progress limited, so that multiple tasks are not taken on at the same time.

  • Limit the number of cards in the “in progress” column of your kanban board. Nothing should be worked on until it is in the “in progress” column.
  • Minimise disruptions to your engineers.
  • If nothing can be done with the tickets in the “in progress” column, first find out why these tasks are blocked and what can be done about it.
  • “Stop starting, start finishing!”
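The WIP limit described above can be sketched as a tiny kanban board in Python. This is only an illustration under assumed names: `KanbanBoard`, `WipLimitExceeded` and the card titles are hypothetical, and real kanban tools enforce limits in their own ways.

```python
class WipLimitExceeded(Exception):
    """Raised when starting a card would exceed the WIP limit."""


class KanbanBoard:
    """A three-column board that refuses to exceed its WIP limit."""

    def __init__(self, wip_limit):
        self.wip_limit = wip_limit
        self.todo = []
        self.in_progress = []
        self.done = []

    def add(self, card):
        # New work always lands in the "to do" column first.
        self.todo.append(card)

    def start(self, card):
        # "Stop starting": no new card may be pulled while the column is full.
        if len(self.in_progress) >= self.wip_limit:
            raise WipLimitExceeded(
                f"cannot start {card!r}: {len(self.in_progress)} cards already in progress"
            )
        self.todo.remove(card)
        self.in_progress.append(card)

    def finish(self, card):
        # "Start finishing": finishing a card frees a slot for the next one.
        self.in_progress.remove(card)
        self.done.append(card)


board = KanbanBoard(wip_limit=2)
for card in ("fix login bug", "add audit log", "upgrade database"):
    board.add(card)

board.start("fix login bug")
board.start("add audit log")
try:
    board.start("upgrade database")  # a third card exceeds the limit of 2
except WipLimitExceeded as error:
    print(error)

board.finish("fix login bug")
board.start("upgrade database")      # now there is room again
```

The point of the design is that the only way to start something new is to finish something first.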

Reduce batch sizes

  • Large batch sizes result in long lead times and poor quality. If an issue is found in the code, you may need to do a huge revert.
  • Small batches mean that your features reach the customer sooner and in an incremental fashion.

Reduce the number of handoffs

  • If you bounce a unit of work around multiple departments and teams, you lose visibility and the people working on it lose context.
  • Make your teams as self-sufficient as possible. If they are in complete control of the release process, your changes will come to fruition a lot quicker.

Continually identify and elevate our constraints

Any work that is not directed at eliminating a constraint (or bottleneck) in the workflow is pointless. Apply these five steps to eliminate the constraint:

  1. Identify the system’s constraint.
  2. Decide how to exploit the system’s constraint.
  3. Subordinate everything else to the above decisions.
  4. Elevate the system’s constraint.
  5. Once the constraint has been broken, go back to step 1: find the new constraint, and keep eliminating constraints as they appear.
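As a rough illustration of steps 1, 4 and 5, the sketch below models a value stream as a set of stages, each with a weekly capacity. The stage names and numbers are invented, and treating the constraint as simply "the stage with the lowest capacity" is a simplifying assumption, not a definition taken from the book.

```python
from typing import Dict


def find_constraint(capacity: Dict[str, float]) -> str:
    """Return the stage with the lowest capacity, i.e. the current constraint."""
    return min(capacity, key=capacity.get)


def system_throughput(capacity: Dict[str, float]) -> float:
    """The whole system can go no faster than its slowest stage."""
    return min(capacity.values())


# Hypothetical capacities, in changes per week.
capacity = {"development": 20.0, "testing": 5.0, "deployment": 12.0}

first_constraint = find_constraint(capacity)
print(first_constraint, system_throughput(capacity))   # testing 5.0

# Elevate the constraint, for example by automating the tests.
capacity["testing"] = 25.0

# Start again: the constraint has moved to another stage.
second_constraint = find_constraint(capacity)
print(second_constraint, system_throughput(capacity))  # deployment 12.0
```

Note that raising the capacity of "development" in the first step would have left the throughput at 5.0, which is why work that is not aimed at the constraint does not improve the system.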

The constraints in a DevOps transformation are typically found in one of these areas:

  • Environment Creation
  • Code Deployment
  • Test setup and run
  • Overly tightly coupled architecture

Once the above constraints are broken, the constraint is likely to shift to either development or product owners. Having the constraint shifted to the creative process is a good thing, because then you are only limited by your imagination, not by ineffective organisational structures or poor processes.

It’s also good to bear in mind that you should not have any one individual as a constraint. If one of your team members is a poor communicator or does work that is difficult to follow for others, that’s also a constraint that needs to be dealt with.

Eliminate hardships and waste in the value stream

The following kinds of waste in software development should be mitigated or eliminated:

  • Partially done work: This becomes obsolete and loses value as time progresses.
  • Extra processes: Processes should be refined or eliminated based on their utility. If you have complicated change management or documentation steps that add no value to anyone, eliminate them immediately.
  • Extra features: Fancy features that don’t add significant value waste time and increase application bloat.
  • Task switching: When people are constantly interrupted, given multiple tasks or assigned to multiple work streams, their efficacy is greatly reduced.
  • Waiting: Any delays or constraints that impede work up or down the line should be addressed, so people don’t sit idle waiting to complete their work.
  • Motion: Handoffs, communication across departments, and teams that are not co-located or not in the same time zone can all waste valuable time.
  • Defects: Defects that are not addressed accrue as technical debt and cost more time down the line, so they should be fixed. These can be defects in processes, documentation, code, etc.
  • Nonstandard or manual work: Anything done by hand that could be automated is a waste of time, and it introduces errors and snowflake setups.
  • Heroics: Organisations don’t need heroes; they need good systems in place. If you need to perform daily heroic acts, such as working until past midnight, then there are deeper issues that need to be addressed.

The Second Way
The Principles of Feedback

Feedback is an essential component of an ever-improving system, so everyone should be responsible for enabling a continuous flow of information back from the actions taken. That way, errors do not persist and are corrected as soon as possible.

Working safely with complex systems

It is very difficult for a single person to see a complex system as a whole and understand how all its pieces fit together. The behaviour of a complex system is difficult to predict accurately, especially when its components are tightly coupled.

See problems as they occur

  • Every work operation is measured and monitored, and any defects or significant deviations are acted upon quickly.
  • Feedback and feedforward loops must exist wherever work is performed, at all stages of the value stream. This includes automation and test processes that report defects quickly, so they can be solved quickly.
  • Pervasive telemetry and observability are key to achieving this.
  • Testing is merely one type of feedback.
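One simple way to turn "significant deviations" into something a monitoring system can act on is to compare each new measurement with a baseline. The sketch below flags any value more than three standard deviations away from the baseline mean. The three-sigma threshold, the function name and the response times are assumptions made for this example, not something the book prescribes.

```python
from statistics import mean, stdev


def is_significant_deviation(baseline, value, threshold=3.0):
    """Return True if value is more than `threshold` standard deviations from the baseline mean."""
    centre = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation; needs at least two points
    if spread == 0:
        # A perfectly flat baseline: any different value counts as a deviation.
        return value != centre
    return abs(value - centre) > threshold * spread


# Hypothetical response times in milliseconds collected during normal operation.
baseline_ms = [120.0, 118.0, 125.0, 122.0, 119.0, 121.0, 123.0, 120.0]

print(is_significant_deviation(baseline_ms, 124.0))  # False: within normal variation
print(is_significant_deviation(baseline_ms, 480.0))  # True: act on this quickly
```

In practice the same check would run continuously against live telemetry, so that a deviation raises an alert at the stage where it occurs rather than being discovered downstream.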

Swarm and solve problems to build new knowledge

  • The goal of swarming is to contain problems before they have a chance to spread, and to learn from the problem at root cause so it does not occur again.
  • Instead of working around a problem or scheduling a fix when there is more time, swarm and fix it immediately.

Swarming is necessary for the following reasons:

  • It prevents the issue from growing exponentially and technical debt from accumulating.
  • It prevents new work from being started that could make the issue even harder to solve or create more problems.
  • If the problem is not addressed, it could easily happen again.

An “andon cord” is an excellent tool to have around: teams can pull it to flag an issue so that it is addressed immediately.

Keep pushing quality closer to the source

The effectiveness of approval processes decreases as we push decision-making further away from where the work is performed. This also increases cycle time.

Some examples of ineffective quality controls:

  • Requiring another team to do tedious and error-prone tasks by hand when they could easily be automated by the team itself, for example delegating all tests to QA or all operational and configuration tasks to the operations team.
  • Requiring approvals from people who are not familiar with the work being done.
  • Creating large volumes of documentation full of intricate details that quickly become obsolete, or that are hard and ponderous to read and follow.
  • Pushing large batches of work to other teams and committees and then waiting for responses.

Instead, use peer reviews within your team to review and approve changes before releasing.

Enable optimising for downstream work centres

According to Lean, the most important customer is the next step downstream. So you should always package your work nicely for the next person who needs to inherit it.

The Third Way
The Principles of Continual Learning and Experimentation

High performing organisations actively promote learning and experimenting, by applying a scientific approach to both process improvement and product development.

Enabling Organisational learning and a safety culture

If you constantly chastise your employees for their mistakes they will do their best to avoid telling you next time. This prevents inquiry and learning and it can be problematic for the value stream.

Institutionalise the improvement of daily work

In the absence of improvements, processes don’t stay the same — due to chaos and entropy, they degrade over time.

Even more important than daily work is the improvement of daily work.

We improve daily work by explicitly reserving time to pay down technical debt, fix defects and refactor.

Transform Local Discoveries into Global Improvements

  • Make post-mortems readable by the whole company
  • When you learn something that can benefit the whole company, share, document and divulge!

Inject resilience patterns into our daily work

Stress test all your processes and systems to stretch their capacity. Don’t just strengthen one area; look at the whole picture.

Leaders reinforce a learning culture

A leader’s role is to create the conditions so their team can discover greatness in their daily work.

Leaders are not close enough to the work, while frontline workers do not have the broader organisational context or the authority to make changes outside their area of work.

Leaders must elevate the value of learning and disciplined problem solving.

Institute a scientific approach to problem solving by asking the right questions.
