Reaching goals – lots of micro steps actually make the goal!

During Facebook’s developer conference f8 in 2014, Edwin Smith of the High-Performance Server Infrastructure team shared some insights on HHVM, the PHP runtime project built around performance (27:37 onwards). In his talk he also described how the team almost failed to reach a very ambitious goal but finally managed it … with 1% micro steps. They actually overachieved.

What happened? By October 2012 the team had spent nearly 2 years of development time on a virtual machine / just-in-time compiler meant to boost Facebook’s execution performance. As early as April 2012 they had realized that the new project was 3 times slower than the existing execution environment, and the plan was to go live by the end of 2012. In October 2012 the team concluded that continuing with their existing working model would not get them to their goal.

So, the need to improve execution performance by a factor of 3+ (ambitious goal) met a hard deadline to go live (time box).
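To get a feel for what "1% micro steps" means against a factor of 3, here is a small back-of-the-envelope sketch. It assumes that every successful step yields exactly 1% and that the gains compound multiplicatively. Neither assumption comes from the talk, and the function name `steps_needed` is made up for illustration.

```python
import math


def steps_needed(target_factor: float, gain_per_step: float) -> int:
    """Number of compounding steps needed to reach target_factor.

    Assumption (not from the talk): every step multiplies performance
    by (1 + gain_per_step).
    """
    return math.ceil(math.log(target_factor) / math.log(1.0 + gain_per_step))


print(steps_needed(3.0, 0.01))  # 111
```

Under these assumptions it takes roughly 111 successful steps to close a 3x gap, and steps without a measurable gain come on top of that.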

At that point, the team stopped working the way they had before and switched to a drastically different model.

New work model to achieve performance goals

They changed from a project working model to a kanban-like working model and started focusing on micro-steps. Each step shouldn’t take longer than a day or two. If the result was measurable and positive, great. If not, the team simply documented the effort and moved on (“furiously iterate”).
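The keep-or-document decision can be written down in a few lines. This is only a sketch of the idea, not the team's tooling: the `Step` and `Log` types, the field names and the return strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    days_spent: float
    gain: float  # measured relative change; 0.01 means 1% faster


@dataclass
class Log:
    kept: list = field(default_factory=list)        # measurable, positive result
    documented: list = field(default_factory=list)  # no or negative result


def review(step: Step, log: Log) -> str:
    """File a finished step and return the decision taken."""
    if step.gain > 0:
        log.kept.append(step)
        return "keep"
    # No measurable win: write it down and go on to the next idea.
    log.documented.append(step)
    return "document and move on"


log = Log()
print(review(Step("idea A", days_spent=1, gain=0.012), log))   # keep
print(review(Step("idea B", days_spent=2, gain=-0.003), log))  # document and move on
```

Both outcomes end up in the log, so a failed experiment still leaves a trace for whoever considers the same idea later.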

The backlog of ideas for HHVM performance improvements

Before starting work on the final period from October to December, the team held a brainstorming session to fill up their backlog. Each of these micro 1% performance improvement steps was documented. The backlog was organized along two axes: impact from left to right, with the least impact on the right, and effort from top to bottom, with the least effort at the top. Ideally, all steps would sit in the top left (low effort but high impact). Those, however, were already covered.
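The two-axis board can be flattened into a simple work order. The sketch below is one possible reading: highest impact first, with ties broken by lowest effort. The `Idea` type, its fields and the sample values are invented for illustration, and the talk does not say how the team actually picked the next item.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: float  # estimated gain, higher is better (left on the board)
    effort: float  # estimated days, lower is better (top of the board)


def prioritize(backlog):
    """Highest impact first; ties are broken by lowest effort."""
    return sorted(backlog, key=lambda idea: (-idea.impact, idea.effort))


backlog = [
    Idea("idea A", impact=0.01, effort=2.0),
    Idea("idea B", impact=0.02, effort=2.0),
    Idea("idea C", impact=0.02, effort=1.0),
]
print([idea.name for idea in prioritize(backlog)])  # ['idea C', 'idea B', 'idea A']
```

Ranking by impact divided by effort would be an equally reasonable alternative. The point is only that the board position translates directly into an ordering.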

Tasks done during HHVM performance tuning period

The team documented the finished tasks with positive and no / negative impact on the board as well. A great learning experience.

Validation of the impact was done with a fine-grained measuring tool that allowed the team to identify even the smallest performance improvements.

Facebook HHVM result

The result of the effort is amazing. By focusing on these micro-steps, the team reached its goal and even went beyond it.

The team has kept this working model ever since. They have periods of hard, focused work. They pick a goal and divide the path towards it into micro steps. They work on one of these steps for a short time and use metrics (validation) to decide whether to pivot (learning: wrong direction) or to persevere (learning: right direction). When the goal is reached, the team fine-tunes the achievements or goes on vacation. Afterwards, they continue with another iteration.

 

Facebook and their mobile release process

Releasing software in a timely manner is highly business critical. The mobile release process in particular is critical when moving towards a mobile-first strategy. The talk “Hacker Way: Releasing and Optimizing Mobile Apps for the World” (by Chuck Rossi at Facebook’s f8 conference in 2014) describes how Facebook restructured its organization. This move was necessary to reflect the importance of mobile for Facebook’s future. Chuck heads the company’s release team and is responsible for all releases.

Impact of Mobile strategy on organization

Before re-prioritizing everything within Facebook and focusing on mobile, the development team was organized mainly around channels:

Development Organization of Facebook before moving towards mobile

This distribution of developers led to heavy prioritization problems. The different product teams, focused on desktop web, prioritized their topics and came up with a numbered list of items. These priorities were then handed over to the platform experts, who faced the problem of the #1 priority item of the “Messages team” competing with the #1 priority item of, e.g., the “Events team”.

Facebook overcame this organizational issue by organizing their development differently:

Development Organization of Facebook after moving towards mobile

Now, the Facebook engineering team mixes product and platform experts who work on features across all platforms.

Software Releases at Facebook

Facebook has some simple rules, simple but set in stone:

  1. WE SHIP ON TIME
    A release cannot be postponed. If a feature can’t make it in time, it will not be part of this release.
  2. MAKE USERS NO WORSE OFF
    Facebook is data driven. KPIs are watched thoroughly after a release. If they don’t develop as expected, a change needs to happen (e.g. fix forward or modification).
  3. THERE’S ALWAYS THE NEXT ONE
    Since releases follow a fixed schedule, there is always a next release. If you can’t get your feature in today, it will be part of tomorrow’s release. This relaxes the overall organization and takes away much of the pain experienced when the next release is months away.
  4. RETREAT TO SAFETY
    The release team is responsible for delivering a stable product. When the team picks the fully developed items (30 to 300 for a daily release), they carefully take the stories into the release candidate. The selection is described as “subjective”. They follow a simple rule when building the release package: “If in doubt, there is no doubt”.
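The “If in doubt, there is no doubt” rule from the list above boils down to a filter. In the sketch below the `Change` type and its `in_doubt` flag are hypothetical stand-ins for the release engineer's subjective judgment. The talk does not describe any such data structure.

```python
from dataclasses import dataclass


@dataclass
class Change:
    change_id: str
    in_doubt: bool  # hypothetical flag: the release engineer is not fully sure


def build_candidate(ready_changes):
    """Split ready changes into those picked now and those deferred."""
    picked = [c for c in ready_changes if not c.in_doubt]
    # Any doubt at all means the change waits for the next release.
    deferred = [c for c in ready_changes if c.in_doubt]
    return picked, deferred


picked, deferred = build_candidate([
    Change("change-1", in_doubt=False),
    Change("change-2", in_doubt=True),
    Change("change-3", in_doubt=False),
])
print([c.change_id for c in picked])    # ['change-1', 'change-3']
print([c.change_id for c in deferred])  # ['change-2']
```

Deferring is cheap here precisely because of rule 3: a change left out today can go into the next scheduled release.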

Facebook releases its web platform following a plan:

Facebook’s desktop web platform release plan

On Sunday at 6 p.m. the release team tags the next release branch directly from trunk. The release branch is stabilized until Tuesday, 4 p.m., and then shipped as one big release of 4000 to 6000 changes, which is one week of development. Monday through Friday, Facebook also does two releases a day. These consist of cherry-picked changes, around 30 to 300 per release.
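The weekly web plan is easy to express with date arithmetic. The sketch below derives the fixed points of one week from the Sunday of the branch cut. The function name and the dictionary keys are made up, and because the talk summary gives no times of day for the two daily releases, only their dates are listed. Mapping the daily releases to the Monday through Friday right after the cut is also an assumption.

```python
from datetime import date, datetime, time, timedelta


def weekly_web_plan(sunday: date) -> dict:
    """Fixed points of one web release week, starting at the branch cut."""
    if sunday.weekday() != 6:  # Monday is 0, Sunday is 6
        raise ValueError("expected a Sunday")
    return {
        # Sunday, 6 p.m.: release branch is tagged from trunk.
        "branch_cut": datetime.combine(sunday, time(18, 0)),
        # Tuesday, 4 p.m.: the stabilized weekly release goes out.
        "weekly_release": datetime.combine(sunday + timedelta(days=2), time(16, 0)),
        # Monday to Friday: two cherry-picked releases per day.
        "daily_release_days": [sunday + timedelta(days=d) for d in range(1, 6)],
        "daily_releases_per_day": 2,
    }


plan = weekly_web_plan(date(2014, 5, 4))
print(plan["branch_cut"])      # 2014-05-04 18:00:00
print(plan["weekly_release"])  # 2014-05-06 16:00:00
```

With two daily releases of 30 to 300 changes each, the cherry-picks add up to somewhere between 300 and 3000 changes per week on top of the 4000 to 6000 in the weekly release.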

For mobile the plan obviously differs a bit:

Facebook’s native web platform release plan

On mobile the overall release principle is the same as described above. The development cycle is 4 weeks: on the day the previous release ships to the various app stores, the next release candidate is taken from master. The candidate then spends 3.5 weeks in stabilization. Each candidate picks up a further 100 to 120 cherry-picks during this stabilization period. When stabilization is over, the release candidate is tested and not touched any more.
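The mobile cycle can be sketched the same way. The snippet assumes that the ship date falls exactly 4 weeks after the candidate cut and that the 3.5 weeks of stabilization are counted from the cut. Both are readings of the description above rather than confirmed details, and the names are made up.

```python
from datetime import datetime, timedelta

CYCLE = timedelta(weeks=4)            # one development cycle
STABILIZATION = timedelta(weeks=3.5)  # candidate is stabilized, cherry-picks allowed


def mobile_cycle(candidate_cut: datetime) -> dict:
    """Key dates of one mobile release cycle, derived from the candidate cut."""
    return {
        "candidate_cut": candidate_cut,
        "stabilization_ends": candidate_cut + STABILIZATION,
        "ship": candidate_cut + CYCLE,
        # The next candidate is cut on the day this one ships.
        "next_candidate_cut": candidate_cut + CYCLE,
    }


cycle = mobile_cycle(datetime(2014, 5, 1))
print(cycle["stabilization_ends"])  # 2014-05-25 12:00:00
print(cycle["ship"])                # 2014-05-29 00:00:00
```

Under this reading, the remaining half week between the end of stabilization and the ship date is the window in which the frozen candidate is tested.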