Talk on Continuous Delivery at CodeCentric event in Hamburg

At the end of 2013, the general manager of CodeCentric in Munich asked me to give a presentation on our views, achievements and experience with Continuous Delivery. The first in a series of events took place in Hamburg on 26 November at a nice venue.

I put the presentation on SlideShare to make it accessible to others as well. Its title is “Continuous Delivery? Nett oder nötig?” (“Continuous Delivery? Nice or necessary?”).

The presentation covers our goals and why we decided to introduce continuous delivery as the way of delivering software to our business. It shares some of the experience we gained and describes the challenges we faced in transforming our architecture to fit the new way of delivering, along with the tools we use and more.

If you have any questions, don’t hesitate to contact me at michael (at) agile-minds.com.

10 rules to prevent a web site from high scalability

On highscalability.com there is a great post on things that will keep your web site from scaling: “The 10 Deadly Sins Against Scalability”. The post points to Sean Hull, who tweets and writes quite frequently on scalability topics (surprise, surprise).

Sean Hull wrote on his blog about “5 things toxic to scalability” (2011) and “Five More Things Deadly to Scalability” (2013). Both entries are definitely worth reading on high scalability and its common pitfalls.

Book by Martin L. Abbott and Michael T. Fisher on scalability rules.

In the context of this topic, Sean also recommends a book: “Scalability Rules for managers and startups”.

Very good reading to avoid all high-scalability pitfalls right from the beginning!

Software releases without damages and poor user experience @ Facebook

Recently I did some research on software releases and how large, successful companies manage to release their software without causing system failures and poor user experience.

I found this question answered by a former Facebook release engineer: “How do big companies like Facebook, Google manage software releases without causing system outages and poor user experience?”. In addition to this great answer, there is also an interview at Ars Technica: “Exclusive: a behind-the-scenes look at Facebook release engineering”.

Since the very beginning, Facebook has followed the principle of zero downtime and no service interruption. How do they accomplish this, even now that they’re big?

Pushing releases in multiple phases

Facebook pushes its releases in waves. The initial phase, named “latest”, always contains the latest code changes (hence the name). All engineers are connected to the “latest” staging system, gather in an IRC channel when the initial push happens and watch logs and error messages meticulously. If the build proves to be okay, it is pushed to a small number of production servers (phase p1). Again, all developers concerned with the release gather in the IRC channel to watch the release and track KPI changes, log messages and errors. The next stage, phase p2, covers roughly 5% of all live servers and is again monitored thoroughly. Finally, when phase p2 proves to be good as well, the deployment goes out to all servers. Deployment @ Facebook means copying a 1.5 GB binary to all servers, which is done with a customized BitTorrent distribution system.

If an error occurs? Well, the developers are on IRC and held accountable to fix their piece of code. If you crash it, you repair it!
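
To make the phased push more tangible, here is a minimal Python sketch of the idea. It is not Facebook's actual tooling. The `Phase` class, the `rollout` function and the `deploy` / `healthy` callbacks are hypothetical names. The 1% share for p1 is an assumption (the articles only say "some servers"), while the roughly 5% for p2 comes from the description above. The "latest" staging step is left out because it does not touch production servers.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Phase:
    name: str
    fraction: float  # cumulative share of the production fleet on the new build


# p1 share is an assumed placeholder; p2 is "roughly 5%" as described above.
PHASES = [
    Phase("p1", 0.01),
    Phase("p2", 0.05),
    Phase("full", 1.0),
]


def rollout(
    servers: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[Sequence[str]], bool],
    phases: Sequence[Phase] = PHASES,
) -> List[str]:
    """Deploy phase by phase and stop at the first phase that looks unhealthy.

    Returns the names of the phases that completed successfully.
    """
    completed: List[str] = []
    done = 0  # number of servers that already run the new build
    for phase in phases:
        target = max(1, round(len(servers) * phase.fraction))
        for server in servers[done:target]:
            deploy(server)
        done = max(done, target)
        # In real life this is the part where people watch logs and KPIs.
        if not healthy(servers[:done]):
            break
        completed.append(phase.name)
    return completed
```

Called with 100 servers and a health check that always passes, `rollout` walks through p1, p2 and the full fleet. If the check fails after p2, the remaining 95 servers are never touched, which is the whole point of pushing in waves.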

Multiple versions of code running simultaneously

Executing code in Facebook’s server environment automatically means running multiple versions of the code simultaneously, and supporting this takes extra effort. The hardest part is migrating database schemas from one version to the next.

Features with on-off toggles

Facebook uses a tool named “Gatekeeper” to switch features on and off and to throttle them in real time. Only a few code changes are needed, and Facebook operations can control the traffic and which features are available. Code in such an environment needs to be highly decoupled, with no dependencies between features.
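
The articles do not describe Gatekeeper's real interface, so the following is only a sketch of what an on/off toggle with throttling could look like. `FeatureGate`, `set_rollout` and `is_enabled` are hypothetical names, and bucketing users by a hash of feature name and user id is my assumption, not necessarily what Facebook does.

```python
import hashlib
from typing import Dict


class FeatureGate:
    """Hypothetical feature toggle with percentage-based throttling."""

    def __init__(self) -> None:
        self._rollout: Dict[str, int] = {}  # feature name -> percent of users

    def set_rollout(self, feature: str, percent: int) -> None:
        """Set the share of users (0 to 100) that should see the feature."""
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[feature] = percent

    def is_enabled(self, feature: str, user_id: str) -> bool:
        """Unknown features are off; known ones are on for a stable user subset."""
        percent = self._rollout.get(feature, 0)
        # Hashing keeps the decision stable per user and independent per feature.
        digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < percent
```

Application code then only needs a single `if gate.is_enabled("new_feed", user_id):` around the new code path. Operations can raise the percentage step by step or set it back to 0 without a new deployment.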

Versioned static resources across the web tier

All the web servers in Facebook’s server farm are able to serve the static content of every deployed version. This means that all servers are equipped with all resources prior to the phase p1 deployment, which allows the whole web tier to remain stateless.
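
One simple way to picture this is to put the version into the URL of every static resource. The sketch below assumes exactly that. The `StaticStore` class and the `/static/<version>/<path>` URL layout are made up for illustration and are not taken from the articles.

```python
from typing import Dict, Optional, Tuple


class StaticStore:
    """Hypothetical store that holds the static files of every deployed version."""

    def __init__(self) -> None:
        self._files: Dict[Tuple[str, str], bytes] = {}

    def publish(self, version: str, path: str, content: bytes) -> None:
        """Make a file available under a given release version."""
        self._files[(version, path)] = content

    @staticmethod
    def url(version: str, path: str) -> str:
        """Build the versioned URL that pages of this release link to."""
        return f"/static/{version}/{path}"

    def serve(self, url: str) -> Optional[bytes]:
        """Return the file for a versioned URL, or None if it is unknown."""
        parts = url.split("/", 3)  # ["", "static", version, path]
        if len(parts) != 4 or parts[0] != "" or parts[1] != "static":
            return None
        return self._files.get((parts[2], parts[3]))
```

If every web server carries the same store, a page rendered by the old release and a page rendered by the new release can both fetch their matching scripts and styles from any server, no matter which server answers the request.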

If you condense what’s said in the articles, it comes down to these points:

  1. Automate everything!
  2. Test early, test often!
  3. Hold developers responsible and let them fix live errors.
  4. Each release has an owner involved from all stakeholder teams.
  5. The product is designed to be rolled back. From the beginning.
  6. The product is designed to execute multiple versions at the same time.
  7. Run multiple environments!
  8. Deploy incrementally!

Principles & rules – foundation for a good engineering culture @ Google

Principles and rules seem to be a good foundation for a good (or great) engineering culture. In a recent question on Quora on “How do Google, Facebook, Apple and Dropbox maintain excellent code quality at scale?” there was an interesting link towards the engineering culture established at Google.

According to the source, Google established the principles below and has followed them ever since.

1. All developers work out of a ~single source depot; shared infrastructure!
2. A developer can fix bugs anywhere in the source tree.
3. Building a product takes 3 commands (“get, config, make”)
4. Uniform coding style guidelines across company
5. Code reviews mandatory for all checkins
6. Pervasive unit testing, written by developers
7. Unit tests run continuously, email sent on failure
8. Powerful tools, shared company-wide
9. Rapid project cycles; developers change projects often; 20% time
10. Peer-driven review process; flat management structure
11. Transparency into projects, code, process, ideas, etc.
12. Dozens of offices around world => hire best people regardless of location

In the Quora answers it becomes obvious: huge code bases are maintainable only when a culture of ownership and pride is established. The first step, however, is to establish a set of rules as the foundation for the engineering culture.

Seeding, growing, harvesting!

Focus points when growing your Engineering organization

Via Twitter I came across a talk by Kris Gale, VP Engineering at Yammer. Kris talks about his experience of scaling an engineering organization from 2 people to more than 30 engineers.

“Why Yammer believes the traditional engineering organizational structure is dead”, Kris Gale – VP Engineering

My take-aways:

1) Small interdisciplinary teams ship faster. True. I have experienced this myself. Don’t specialize too much; let people mix and keep the team at a certain size.

2) Don’t organize yourself in specialized domains (e.g. back-end, front-end, middleware, …)

3) Let the experts make engineering decisions as soon as possible. This needs trust. Hire people who are more expert than you are. Let them decide and keep the process flowing without any pauses. The experts are far better decision makers than managers.

“I don’t think you should be building a product. I think you should be building an organization that builds a product.”

4) Yammer builds features with three core metrics in mind:

  • Virality (attract customer)
  • Engagement (retain customer)
  • Monetization (sell to customer)

All features have to improve one or more metrics. Otherwise they change the product for no reason.

5) The 2 and 10 rule. Yammer assigns 2 to 10 people and lets a project run for 2 to 10 weeks. All other setups proved wrong and led to failure.

6) Avoid code ownership. Everybody owns the code. No heroes defending their great code.

7) People assignment works with a “Big Board”. Every engineer has two magnetic buttons, “now” and “future”. The board lists all projects. Every engineer is asked to put the “now” button on the project they are currently working on and the “future” button on the project they plan to work on next. This is great for transparency and requires the organization to FOCUS.

Agile defined.

How is Agile defined? What does it mean?

In recent discussions within my company and with other people, I noticed that people use the term Agile a lot. It is also obvious that people sometimes have a thoroughly different understanding of what Agile means.

When reading about the topic on the internet, following some people on Twitter and reading books about agility and similar topics, it becomes apparent that Agile has become a mass movement. It is no longer well understood or sharply defined. The term is more or less a buzzword. Think of “SOA”, “Test Driven Development”, “High Availability” or “Big Data”. All of them have reached buzzword level. Anyway, that’s hard to change.

Agile defined

Looking up “agile” in a dictionary returns:

agile, adjective

  1. quick and well-coordinated in movement
  2. active; lively: an agile person.
  3. marked by an ability to think quickly; mentally acute or aware

So, we’re talking about an adjective, a closer description of the state of something. Agile doesn’t stand on its own. It refers to something. Interesting. But what does it refer to?

Agile in organizations – defined

Agile is not a framework

In the context of agile organizations, the term very commonly gets confused with other agile things. A lot of people refer to their organization as agile because they introduced SCRUM or Kanban as their software development framework. These people confuse Agile with frameworks that share comparable core values and motivations. In SCRUM the core values are commitment, openness, focus, respect and courage. However, SCRUM as a framework focusses only on a certain aspect of the overall Agile movement.

Agile is not a methodology

Furthermore, some people think Agile is a methodology. If you follow well-known process steps, always applying the same pattern to certain situations, you apply a methodology to arrive at a goal. But Agile is not a collection of best practices or a rule set that you apply and you’re fine.

Agile is not a goal

Others look at their organization with the sole ambition to become Agile. But there is no state you can arrive at and claim – “Now I’m Agile”. Agile is the path, not the goal.

So what is a definition of Agile in an organizational context?

Agile defined

Read Jeff Patton’s great blog post “Agile development is more culture than process”. The Agile Manifesto is also a great source of insights.

SCRUM & Planning2 – HOW will we do this?

SCRUM Planning2: Architecting the unknown – The HOW.

In SCRUM Planning2 the focus is entirely on the team identifying the HOW of the WHAT being introduced by the product owner during the course of Planning1.

In Planning1 the product owner described WHAT the user stories actually contain and WHAT the result of the finished stories should look like. Now the team has the challenge of discussing HOW each user story should be implemented. Planning2 is really essential for the team to discuss, plan, architect, argue and agree on how the business user story should be implemented in software. Both Planning1 and Planning2 are essential activities for the team to finally commit to what they plan to deliver within this sprint iteration.

In our company, we had user stories where individual software developers spent days and days trying to solve one huge technological question. Before entering Planning2 we still had one user story worth around 40 story points. During Planning2 the whole team came together and discussed the findings of the individual developers. And magically, they came to a solution valued at 8 story points, which was implemented during that one sprint and delivered as committed. The miracle of communication!

Planning2 – in essence – reserves real quality time for the team to think through the challenges of the user stories. Allow time to discuss, to argue, to joke, to de-focus, to be creative. A good result of Planning2 is a whiteboard full of tasks associated with the user stories.

SCRUM Planning Board

Ideally, the whiteboard contains a collection of tasks identified during the discussions in Planning2. The sum of the tasks finally delivers the user story and its associated business value. In our Planning2 sessions we introduced a color coding scheme to reflect the kind of work to be done. Green, for example, stands for quality assurance testing tasks. The more experienced the team and the better the team members know and trust each other, the more trustworthy the result of Planning2.

When the team is finished with Planning2 they continue with the commitment. During commitment they discuss which user stories they are able and willing to commit to. The commitment is “carved in stone”. The scrum master pushes the team to deliver their commitment and the product owner trusts the team to actually deliver what they committed to.

The commitment is all about trust and honor.

Lessons Learned?

  • Allow the team to discuss and actually understand HOW they will deliver the WHAT
  • Leave the team alone in a room – create an atmosphere of intimacy and privacy
  • The team commits to deliver – the commitment is “carved in stone”
  • Use color coding schemes for user story tasks

SCRUM & Planning1 – determine the WHAT

SCRUM Planning1: Well prepared Chaos – The WHAT.

The goal of SCRUM Planning1 is to explain to the team WHAT should be accomplished during the course of this sprint.

In theory, the product owner has a well-structured backlog (actually usually a spreadsheet 🙂 ) containing user stories that are estimated, well thought through and prioritized according to their business value. The product owner is hence very well prepared to guide the team through the first SCRUM step of the sprint: Planning1.

Well, after 2 years of SCRUM, we are close to that. Close, but not yet there.

Initially, we struggled heavily to get the company-wide backlog set up at all. We had lots of stories in there, ranging from small stories (1 to 13 story points) up to giant stories (estimated with 100 to ?), all still in the backlog. We had stories in there with clear business value and others claimed to be “strategic”. We also had stories in there which weren’t good user stories at all (“Fixing the XYZ issue on technical element ABC”). So, we had all the ingredients for a really bad Planning1, and we had several of them. Some of them got to the point where people left the room, thoroughly agitated. The early days.

Now, we have product owners who are on top of their backlogs and user stories. Stories estimated higher than 13 need to be broken down into smaller pieces. Why? Experience shows that stories estimated at more than 13 are too complex to be estimated reliably. All stories need to be estimated. Why? The team needs to know what to expect in the Planning1 session, and the product owner is forced to think the whole story through. One point which is still tricky is the actual metric to measure business value. For the time being, it is still a mix of discussions between our VP Product and the product owners, but at least it’s an identified way to prioritize the backlog.
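
The two rules above (every story is estimated, nothing above 13 points enters Planning1) are easy to check mechanically before the meeting. Here is a small Python sketch of such a check. The `Story` class and the `planning_issues` function are invented for this example; we did not use such a script, the backlog lived in a spreadsheet.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_POINTS = 13  # stories above this size have to be split first


@dataclass
class Story:
    title: str
    points: Optional[int] = None  # None means "not estimated yet"


def planning_issues(backlog: List[Story]) -> List[str]:
    """Return one message per story that is not ready for Planning1."""
    issues: List[str] = []
    for story in backlog:
        if story.points is None:
            issues.append(f"{story.title}: not estimated")
        elif story.points > MAX_POINTS:
            issues.append(
                f"{story.title}: {story.points} points, split into smaller stories"
            )
    return issues
```

An empty result means the backlog is ready for Planning1 as far as these two rules go. It says nothing about business value, which remains the tricky part.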

Lessons Learned?

  • The product owner needs to be on top of his backlog (be it the company backlog or the team backlog).
  • Stories need to be small enough to be understood quickly and easily.
  • The team needs to be involved as early as possible (pre-estimation ideally).
  • You need a properly filled backlog and ideally you find a metric to measure business value in an objective way.

SCRUM Sprints – The mechanics.

SCRUM Sprints: One Sprint in a Life.

The typical structure of SCRUM, the mechanics as I call them, is quite common. There are no big adaptations in the industry. I still want to go through the various elements to highlight the tiny little details we decided to change, and where and why.

SCRUM Sprint Cycle

The graphic shows the structure of our SCRUM mechanics. We have all the typical elements: Planning1, Planning2, daily sprint stand-ups, estimations, review and retrospective. Since we’re developing software and came from a project / waterfallish work model in the past, we still have some actions for final release testing and the actual software release.

I talk about the various elements in greater detail in separate posts:

Lessons Learned?

  • The mechanics of SCRUM form the foundation of a potentially successful working model. You just need to get the details right 🙂