Reaching goals – lots of micro steps actually make the goal!

During Facebook’s developer conference f8 in 2014 Edwin Smith with the High-Performance Server Infrastructure team shared some insights on the HHVM – the PHP runtime project built around performance (27:37 onwards). In his talk he also described how the team almost failed reaching a very ambitious goal – but finally managed it … with 1% micro steps. They actually overachieved.

What happened? In October 2012 the team was in a position where they had spent nearly 2 years of development time to create a virtual machine / just-in-time compiler to boost Facebook’s execution performance. Already in April 2012 they realized that the newly created project was 3 times slower than the current execution environment – and plan to go live was end 2012. In October 2012 the team realized that following the working model as they did so far will not allow them to make their goal.

So, the need to improve the execution performance by factor 3+ (ambitious goal) meets a hard deadline to go live (time box).

At the time, the team stopped working like they did before and changed to a drastically different model.

New work model to achieve performance goals

They changed from a project working model towards a kanban-like working model. Now, they started focusing on micro-steps. Each of these steps shouldn’t take longer than a day or two. If the success was measurable and positive, great. If not, the team simply documented the effort and moved on (Furiously iterate).

The backlog of ideas for HHVM performance improvements

Prior to starting the work on the final period from October to December the team started with a brainstorming session filling up their backlog. Each of these micro 1% performance improvment steps were documented. The backlog organized like: left–>right impact – with least impact right, top–>bottom effort with least effort in top. Ideally, all steps were located top left (low effort but high impact). Those, however, were already covered.

Tasks done during HHVM performance tuning period

The team documented the finished tasks with positive and no / negative impact on the board as well. A great learning experience.

Validation of the impact was done utilizing a fine grained measuring tool allowing the team to identify even smallest performance improvements.Facebook HHVM result

The result of the effort is amazing. The team managed – focusing on these micro-steps – to get to their goal – and even further.

The team did change to this working model since. They have periods of hard and focused work. They pick a goal and divide the path towards this goal into micro steps. They work for a small amount of time on one of these steps and decide on metrics (validation) to pivot (learning: wrong direction) or to persevere (learning: right direction). When the goal is reached the team does further fine-tuning on the achievements – or goes on vacation. Afterwards, they continue with another iteration.

 

Facebook and their mobile release process

The process of releasing software in a timely manner is highly business critical. Especially, the mobile release process is critical when moving towards a mobile-first strategy. The talk “Hacker Way: Releasing and Optimizing Mobile Apps for the World” (by Chuck Rossi @Facebook’s f8 conference in 2014) describes how Facebook turned its organization structure. This move was necessary to reflect the importance of mobile for Facebook’s future. Chuck heads the company’s release team and is responsible for all releases.

Impact of Mobile strategy on organization

Before re-prioritizing everything within Facebook and focusing on mobile the development team was organized mainly around channels:

Development Organization of Facebook before moving towards mobile

This developer distribution led actually to heavy prioritization problems. The different product teams with focus on Desktop Web did prioritize their topics coming up with a numbered list of items. This prioritization were then handed over to the platform experts. They had the problem of seeing number #1 priority item of the “Messages team” competing with number #1 priority item of e.g. the “Events team”.

Facebook came over this organization issue by organizing their development differently:Development Organization of Facebook after moving towards mobile

Now, the Facebook engineering team has product and platform experts mixed working on features across all platforms.

Software Releases at Facebook

Facebook has some simple rules – simple but made of stone:

  1. WE SHIP ON TIME
    A
    release can not be postponed. If a feature can’t make it it will not make it into this release.
  2. MAKE USERS NO WORSE OFF
    Facebook is data driven. KPI’s are watched thorougly after a release. If they don’t develop as expected, a change needs to happen (e.g. fix forward or modification).
  3. THERE’S ALWAYS THE NEXT ONE
    Since the releases are already dated there is always the next release. If you can’t get your feature in today, it will be part of the release tomorrow. This relaxes the overall organization and takes away a lot of the pain experienced when the next release is month away.
  4. RETREAT TO SAFETY
    The release team is responsible for delivering a stable product. When the team actually picks the ready developed items (30 to 300 on a daily release) they carefully take the stories into the release candidate. It’s described as “subjective”. They follow a simple rule when building the release package: “If in doubt, there is no doubt”.

Facebook releases their web platform following a plan:Facebooks desktop web plattform release plan

Sunday, 6 p.m. the release team tags the next release branch. That’s done directly from the trunk. The release branch is stabilized until Tuesday, 4 p.m. and then released as a big release including 4000 to 6000 changes – 1 week of development. On Monday, Tuesday, Wednesday, Thursday, Friday, Facebook does two releases a day. These are cherry-picked changes – around 30 to 300 each release.

For Mobile the plan differs obviously a bit:Facebooks native web plattform release plan

On mobile the overall release principle is actually the same as described above. The development cycle is 4 weeks – on the day the previous release gets shipped to the various app stores, the next release candidate is taken from the master. The candidate is then 3,5 weeks into stabilization. Each candidate includes further 100-120 cherry picks taken during this 3 weeks stabilization period. When stabilization is over, the Release Candidate is tested and not touched any more.

Best decision ever – skip the architect

The architect role is sold as outstanding important in product development efforts – especially when IT is involved. But I’ve learned some lessons.

The secret.de case – skip the architect

Roughly 3 years ago we started a new business – a high class and exclusive casual dating site focusing exclusively at women. The technical decision was soon done by picking Ruby on Rails v3 as web framework and mongoDB as persistence layer. It was a radical shift away from our current technology stack – pure java and postreSQL.

When we started with Sprint 0 we hired external people to support us. One person acted as the Ruby on Rails trainer for 1 week – to get our people up to speed. At the time we started we had 1 skilled Ruby on Rails person focusing on frontend development and one not-so-experienced person with Ruby on Rails focusing on backend development. The remaining team were skilled java developers. The other external people were one Rails nerd and an architect. During the sprints, it turned out that the architect didn’t have any clue about Rails nor pragmatic architecture. He started to document our project with ARC42 templates … so, we decided to put him aside soon. Leaving the team – without lead. No architect, no direction, no guidance – no hope?

Not at all. What happened? The team started to accept the fact that there’s no over-brain available. No-one making decisions for them. No-one giving direction. And, magically, they took over the ownership for the overall project. Each and every design decision was discussed within the team. Planning II got a total new meaning to the team. Sure, quite some mistakes were made – but most of them due to non-experience with the new technology stack. The Rails nerd was out-phased as soon as the platform went live – after 3,5 month of development.

So, in self-empowered teams there is no need for an explicit architect role. Naturally in team configurations there are more experienced people and less experienced people. A good team will distribute the overall responsibility for good architecture work over all heads. Everybody will carry a piece fitting their experience and willingness to contribute. Plus – you will not run into knowledge distribution problems. Everybody is involved. Knowledge is flowing. New persons can be introduced without a lot effort. The teams credo “fix it if it breaks” led to a low-maintenance and up-to-date system. So, I’ve a fast, fast, fast running application and a real high-performing team.

For me – in the end – the best decision ever was to skip the architect.

Page Load time – how to get it into the organization?

Page load time is crucial. Business and technology people as individuals start understanding the importance of this topic almost immediately and are willing to support any effort to get fast pages out of your service.

But how can you foster a culture of performance and make people aware of the importance of this single important topic – amongst hundred other important topics?

That was one of the challenges early 2013. Management and myself were convinced that 2013 one of our key focus topics is around web performance. t4t_optimizedThe birthday of “T4T”. The acronym stands for …

  • Two – Deliver any web page within 2 seconds to our customers.
  • 4 – Deliver any mobile web page within 4 seconds to our customers over 3G.
  • Two hundred – Any request over the REST API is answered below 200 milliseconds.

So, early 2013 we started T4T as an initiative to bring our page load times down to good values. To measure the page load time we experimented with two tools: Compuware’s Gomez APM tool and New Relic’s APM tool. Gomez was used initially for our Java based platform and New Relic for our Ruby on Rails platform. But we were able to measure and track-down some really nasty code segments (i.e. blocking threads in Java or 900 database requests in Ruby where 2 finally did the same job).

How did we get the idea of T4T into the organization? Any gathering of people with presentation character was used to hammer the message of web performance to the people. Any insight on the importance, any tip, hint, workshop, conference, article, blog post, presentation, anything was shared with the team. Furthermore, T4T was physically visible everywhere in the product development department:

T4T_closeup

THE LOGO – visible … everywhere … creepy!

T4T_Corner

T4T logo and information on page load and web performance at the relax area for software developers and product owners …

T4T_VPOffice

T4T at the VP office door.

For me, especially the endless talking about the topic, raising the importance, questioning of e.g. JPG picture sizes, special topic discussions on CSS sprites vs. standalone images or the usage of web-fonts for navigation elements helped a lot to raise the curiosity of people. Furthermore, giving them some room and time for research work helped a lot.

What did we achieve? Well, one of our platforms – based on Ruby on Rails started with page load time of 2,96s in January 2013. End 2013, the platform was at an impressive 2,15s page load time. In the same time, the amount of page views increased by factor 1,5!

Loadtime_secret_2013

Page Load time over the year 2013

During the same time period, the App server response time dropped from 365ms to 275ms end of year – this time doubling the amount of requests in the same time.

Response_time_secret_2013

App server response time over the year 2013

Most interesting, we had one single release with a simple reshuffling of our external tags. Some of them now load asynchronously – or even after the onLoad() event. This helped us drop the page load time from around 2,5s to 2,1s – 400ms saved!

Impact_of_one_event_secret_adtags_after_onload

Impact of one single release and the move of adtags after the onLoad() event.

So, my takeaways on how to foster such a performance culture?

  1. You need a tangible, easy to grasp goal!
  2. Talk about the topic and the goal. Actually, never stop talking about this specific goal.
  3. Make the goal visible to anybody involved – use a logo.
  4. Measure your success.
  5. Celebrate success!
  6. Be patient. It took us 12 month …

Page Load time is crucial for a web service. Why?

For a technical person, page load time feels like being important. It’s a natural tendency, an instinct almost, to make everything perform best. Unfortunately, from a business perspective that’s not really a driver to impress people or make people being responsible for revenues to re-think the importance of page load time.

Why it might be of interest – also for business people?

There is a tight coupling between revenue and page load time in e-commerce businesses. The Infographic “How Loading Time Affects Your Bottom Line” shows slower page response time resulting in increased abandonment rates,  a Forrester study shows that 14% of online shoppers go to another site if they have to wait for a page to load – 23% will stop shopping, Shopzilla redesigned their site to load 5 seconds faster resulting in 10% revenue increase, Bing reported from a trial that a 2-second slowdown of page load time resulted in reduced revenues per user by 4,3%, Amazon measured a relation of 1% sales decrease for every 100 millisecond lost in page speed, and Google reported a revenue decrease by 20% for every 500 millisecond page performance loss, the Mozilla corporation behind Firefox managed to reduce the page load time of their download pages by 2.2 seconds resulting in 60 million additional downloads per year,

Why is it important for people? Why should web sites simply be fast?

There is this article “Our Need For Web Speed: It’s about neuroscience, not entitlement” from radware / strangeloop. It gives deeper insights into human nature and why it is important to run fast websites. A really good motivation for technical and non-technical people to think about the nature of performance. On web performance today, there is this infographic “This is your brain on a slow website” which picks up some of the arguments of the article in a displayable way.

Jakob Nielsen wrote in 2010 already in “Website Response Times” about the impact of slow web pages on humans and gives good reasons why we should definitely try to avoid this bad user experience.

Also good source of information to get an impression of the current state of the union: “Ecommerce Page Speed & Web Performance“.

Furthermore, there is this article on “How Facebook satisfied a need for speed“. Robert Johnson, director of engineering explains how Facebook boosted their speed by factor 2.

Another poster “Visualizing Web Performance” by strangeloop.

Page load time is critical – how to make your site run fast?

Page load time is critical. A lot of people highlight the importance of fast web sites. Amongst them are Steve Souders, Patrick Meenan, Tammy Everts, Stoyan Stefanov and others.

What to do to make your site run fast? There are tons of pages, blogs, hints, tips, tricks and other stuff around in the web. Here’s my favorite collection:

At FriendScout24, we follow this idea of fast web pages as well. I talk in another post in greater details about our goals, achievements and how we actually did it.

Talk on Continuous Delivery at CodeCentric event in Hamburg

End 2013, the general manager of CodeCentric in Munich asked me to do a presentation on our view / achievements / experience in the context of Continuous Delivery. The first of a series of events happened in Hamburg on 26th of November in a nice location.

I put the presentation on SlideShare to make it accessible to others as well. The title of the presentation “Continuous Delivery? Nett oder nötig?”.

The presentation talks about our goals, why we decided to introduce continuous delivery as a way of delivering software to our business, shows some of the experience we made, tells somethings about the challenges we had transforming our architecture to fit the new delivery ways, tools and some more.

In case you have any questions, don’t hesitate to get back to me on michael (at) agile-minds.com.

Focus points when growing your Engineering organization

Twitter brought the talk from Kris Gale, VP Engineering at Yammer to me. Kris talks about his experience on how to scale an engineering organization from 2 people up to more than 30 engineers.

“Why Yammer believes the traditional engineering organizational structure is dead”, Kris Gale – VP Engineering

My take-aways:

1) Small interdisciplinary teams ship faster. True. Experienced on my own. Don’t specialize to much – let people mix and keep the team at a certain size.

2) Don’t organize yourself in specialized domains (e.g. back-end, front-end, middleware, …)

3) Let the experts make engineering decisions as soon as possible. This needs trust. Hire people who are more expert than you are. Let them decide and keep the process flowing – not allowing any pauses in the flow. The experts are ways better decision makers than managers.

“I don’t think you should be building a product. I think you should be building an organization that builds a product.”

4) Yammer build features with three core metrics in mind:

  • Virality (attract customer)
  • Engagement (retain customer)
  • Monetization (sell to customer)

All features have to improve one or more metrics. Otherwise they change the product for no reason.

5) The 2 and 10 rule. Yammer assigns 2 to 10 people and let a project run 2 to 10 weeks. All other attempts proved wrong and created failure.

6) Avoid code ownership. Everybody owns the code. No heros defending their great code.

7) People assignment works with a “Big Board”. Every engineer has a magnetic button “now” and “future”. The board has all projects listed. Every engineer is asked to put his “now” button on where he’s working currently and his “future” button where he plans to work next. This is great to improve transparency and needs the organization to FOCUS.

Company values – Short-term revenue versus long-term success?

Company values are discussed controversial. Some people (especially managers) like them – others (especially employees) hate them. Often enough they turn out to be not more than words – but if used wisely they guide a whole business and turn managers into leaders.

One question I repeatedly discuss and hear is

“Why should a company focus on values, culture and all this fluffy stuff? Revenue is what counts!”.

Well, allowing 30 seconds think time about this sentence leads you – perhaps – to agreement with the message in this sentence. If you allow some more time and take sustainability, long-term success and reasonable growth into account, the answer might be different.

Here is a great blog entry by Brent Gleeson: Never Sacrifice Values for Growth

My takeaway:

Core values of a company – or guiding principles – set a framework for your employees and yourself when acting in the mind of your business. Sit down at least once in your organization and think about what you really care about.

However, core values tend to become very easily buzz words. To prevent this, consider:

  1. Be authentic – only define the core values if you’re really willing to follow them and let them guide your daily work and behavior. They will form your company culture!
  2. Recruit wisely – let the values guide you in recruiting processes. Only if the candidate fits into your core value system – hire him/her.
  3. Share with clients – talk about your core values – share them with your customers. Like-minded customers will become even more loyal.
  4. Live the values – define them, know them and let them guide you!

Agile defined.

How is Agile defined? What does it mean?

In recent discussion within my company and also in discussion with other people, I recognized that people use the term Agile a lot. It is also obvious that people partly have a thoroughly different understanding of the meaning of Agile.

When reading about the topic in the internet, following some people on twitter and reading books about agility and similar topics it becomes apparent that Agile has arrived at mass-movement. It is no longer well-understood and sharply defined. The term is more or less a buzzword. Think of “SOA”, “Test Driven Development”, “High Availability” or “Big Data”. All of them arrived at the buzzword-level. Anyway, that’s hard to change.

Agile defined

Looking up Agile at a dictionary it returns:

agile, adjective

  1. quick and well-coordinated in movement
  2. active; lively: an agile person.
  3. marked by an ability to think quickly; mentally acute or aware

So, we’re talking about an adjective – a closer description of the state of something. Agile doesn’t stand on its own. It refers to something. Interesting. But what does it refer?

Agile in organizations – defined

Agile is not a framework

In the context of agile organizations the term very commonly gets confused with other agile things. A lot of people refer to their organization as agile since they introduced SCRUM or Kanban as their software development frameworks. These people confuse Agile with frameworks with comparable core values and motivations. In SCRUM the core values are commitment, openness, focus, respect and courage. However, SCRUM as a framework focusses only on a certain aspect of the overall Agile movement.

Agile is not a methodology

Furthermore, some people think Agile is a methodology. If you follow well-known process steps, applying always the same pattern to certain situations you apply a certain methodology to arrive at a goal. But Agile is not a collection of best-practices, a rule-set and you’re fine.

Agile is not a goal

Others look at their organization with the sole ambition to become Agile. But there is no state you can arrive at and claim – “Now I’m Agile”. Agile is the path, not the goal.

So what is a definition for Agile in organization context?

Agile defined

Read the great blog post from Jeff Patton about Agile development is more culture than process. Also a great source of insights is the Agile Manifesto.