A few years ago I went through a period of massive frustration. My team aligned with the rest of the company in adopting the Agile methodology.
Initially the team decided to use Scrum, then it switched to a sort of kanban. However, both approaches failed miserably, dropping the team into utter chaos.
I remember the daily standups. It was like playing a role on stage: people stated their tasks for the day knowing that those were not the tasks they would actually work on.
The periodic planning was even worse. Hundreds of post-its, each with an estimate score from the other team members. A complete mess that nobody cared about after the planning.
The estimates were what annoyed me the most. Getting estimates, for example, from a network manager for a database task is like asking a plumber how long the electrician will take to replace a light bulb.
The main mistake, of course, was applying a software development methodology to system administration.
Out of my frustration I tried to find the pain points and a way to improve the workflow.
Systems are not agile
Sysadmins don’t like surprises. However, like the Force in Star Wars, system administration has two sides: the day-to-day administration and the emergency handling.
During day-to-day administration all tasks are planned well in advance. After a system is put into production, changes are kept minimal unless a major change is required (e.g. an operating system upgrade). In that case planning and testing are imperative before going live.
During emergency handling the slow approach is dropped in favour of highly reactive behaviour. However, that doesn’t mean that people start guessing solutions. Rule number zero in an emergency is to never do guesswork. Guessing is a one-way ticket to disaster. In particular, if there is a database involved in the emergency, all actions should go through the wise practice of RTFM.
So, anybody involved in operations is a Jedi most of the time, occasionally becoming a Sith when an emergency strikes.
The things sysadmins like are few and very simple to understand.
- predictability, using the same configurations will result in the same behaviour
- stability, minimal need for downtime when patching and installing updates
- documentation, clear and exhaustive
- simple upgrade procedures, a specific case of predictability
- simple troubleshooting, when you have a problem the worst situation is searching for the data in a maze
- planning, in general the event horizon of a sysadmin is at least 1 year.
The Scrum or kanban workflow, as I can testify, can easily degenerate into micromanagement if applied to the operational world.
Out of my frustration I decided to write some guidelines for a hypothetical agile workflow which doesn’t clash with the ops workflow.
Disclaimer: please note that I’m not an agile expert. What I wrote is just an idea that needs further improvement. In that sense any feedback is more than welcome.
Le peloton is the main group of riders in a road bicycle race. Riders in a group save energy by riding close (drafting or slipstreaming) near (particularly behind) other riders. The reduction in drag is dramatic; in the middle of a well-developed group it can be as much as 40%.
Taking inspiration from le peloton, it is possible to build an agile approach that suits the sysadmin/devops workflow.
Le peloton is a mutable entity which adapts to different needs automatically without a fixed leader. One or more teams decide the course of the peloton, which follows that direction temporarily. We can use some of the peloton’s concepts for our needs.
- points of authority
- single point of truth
- goal focused action.
Le peloton is organised in sub teams. Across the sub teams there is no general team leader and everybody can act as a team leader on demand. The leader is decided by the task’s area (e.g. the network manager takes the leadership when the task involves decisions affecting the network). When needed, the expertise leader can virtually hire members from other teams for tasks they can’t do alone (e.g. the network manager needs help from whoever purchases the hardware to get new switches in place).
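As a rough illustration of leadership by task area, here is a minimal Python sketch. Everything in it (the `SubTeam` and `Task` classes, the `leading_team` and `borrow` helpers, the member names) is made up for the example and is not part of any existing tool. It only shows the idea that the task’s area picks the leader and that the leader can borrow people from another sub team.

```python
from dataclasses import dataclass, field


@dataclass
class SubTeam:
    name: str
    areas: set       # areas of expertise covered by this sub team
    members: list    # names of the people in the sub team


@dataclass
class Task:
    title: str
    area: str
    helpers: list = field(default_factory=list)  # people borrowed from other teams


def leading_team(task, teams):
    """Return the sub team whose area of expertise covers the task."""
    for team in teams:
        if task.area in team.areas:
            return team
    raise LookupError(f"no sub team covers the area {task.area!r}")


def borrow(task, member, from_team):
    """Temporarily attach a member of another sub team to the task."""
    if member not in from_team.members:
        raise ValueError(f"{member} is not a member of {from_team.name}")
    task.helpers.append(member)


# Hypothetical teams and people, mirroring the switches example above.
network = SubTeam("network", {"network"}, ["nadia"])
purchasing = SubTeam("purchasing", {"hardware"}, ["paul"])

task = Task("install new switches", area="network")
leader = leading_team(task, [network, purchasing])  # the network sub team leads
borrow(task, "paul", purchasing)                    # purchasing helps with the hardware
```

The leadership here is a lookup and not a role stored on a person, which is the point: nobody is the leader until a task in their area shows up.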
Points of authority
Each sub team covers a specific area of expertise and is completely independent in any action or decision. Each team is also responsible for keeping the other teams informed of any unexpected change from what was agreed in the planning meeting. Each team relies on one or more members acting as the team’s captain. The captains are responsible for building the relationship with the other teams and they are the single point of truth for any information related to the tasks assigned to the team. They also handle the communication with the stakeholders.
Goal focused action
Le peloton focuses on goals. Each goal is owned by one team, which is responsible for its progress. The team will also provide the estimate when possible. Tasks should be created on demand, and a task doesn’t affect the goal’s progress unless it is explicitly included in the main goal’s story.
The general planning happens on a quarterly basis and sets just the high level goals, with a discussion which is understandable by any team member. Technical details are the responsibility of the owning team, which also manages any doubt or specific need using adhocracy.
Two week sprints
The workflow is organised in sprints of two weeks. Each sprint consists of a maximum of three goals taken from the backlog. At the end of the sprint each team should create a very short report on the progress of its goals. This will be used in the retrospective and for the eventual estimate, if applicable. As with the general planning, the discussion should be understandable by any team member.
Before a new sprint starts there will be a quick retrospective of the previous sprint. Any goal not completed is carried forward automatically to the next sprint unless something more urgent comes up. In that case the deprioritised goal is put on the top of the backlog, ready to be worked on when a goal slot becomes available. Each goal will be listed in the backlog and moved into the sprint only if it is actively worked on.
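The sprint rules above can be sketched in a few lines of Python. This is only my reading of the rules, under some assumptions: the function name `plan_next_sprint`, the use of a `deque` for the backlog, and the choice to fill free slots from the top of the backlog are all illustrative, and the sketch assumes there are never more urgent goals than sprint slots.

```python
from collections import deque

MAX_GOALS = 3  # a sprint holds a maximum of three goals


def plan_next_sprint(previous, completed, backlog, urgent=()):
    """Return the goals of the next sprint, updating the backlog in place.

    previous:  goals of the sprint that just ended, in priority order
    completed: goals finished during that sprint
    backlog:   deque of waiting goals, leftmost item = top of the backlog
    urgent:    goals that came up and must enter the next sprint
    """
    urgent = list(urgent)
    if len(urgent) > MAX_GOALS:
        # Assumption of this sketch: urgent goals always fit in one sprint.
        raise ValueError("more urgent goals than sprint slots")

    # Unfinished goals are carried forward automatically...
    carried = [goal for goal in previous if goal not in completed]

    # ...unless something more urgent takes their slot.
    sprint = urgent
    free = MAX_GOALS - len(sprint)
    sprint += carried[:free]

    # Deprioritised goals go on top of the backlog, keeping their order.
    for goal in reversed(carried[free:]):
        backlog.appendleft(goal)

    # Any slot still free is filled from the top of the backlog.
    while len(sprint) < MAX_GOALS and backlog:
        sprint.append(backlog.popleft())
    return sprint


backlog = deque(["archive old logs"])
next_sprint = plan_next_sprint(
    previous=["upgrade OS", "new switches", "tune backups"],
    completed={"tune backups"},
    backlog=backlog,
    urgent=["patch database", "replace failed disk"],
)
# next_sprint is ["patch database", "replace failed disk", "upgrade OS"]
# backlog is now deque(["new switches", "archive old logs"])
```

In this example "new switches" loses its slot to the two urgent goals and waits on top of the backlog, ahead of anything that was already queued.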
Estimates are encouraged. However, an estimate should refer only to the goal and not to its tasks. In particular, the estimate should be provided only when enough information has been gathered to make a reasonable prediction. The estimation is a matter for the team that owns the goal. Any external involvement in the estimation is forbidden.
As I stated before, this is just an idea that needs debate, improvement and tests.
In that sense I found that something similar has been proposed on a different scale by Holacracy. The method is better elaborated and applies to organisations that want to keep a flat hierarchy and survive the danger of being flat.
And, regarding the flat hierarchy topic, I warmly recommend reading the blog post Flat Will Kill You, Eventually: Why Every Company Needs Structure.
Thanks for reading.