Website performance – best practices for improvement

Website performance comes in various flavours – but where do you start with improvements? How do you improve performance? Which best practices should you follow?

In his blog post “An engineer’s guide to optimization”, Tony Gentilcore (@tonygentilcore) identifies five steps to follow.

Step 1: Identify the metric. 

Identify a scenario worth optimizing – meaning one that moves a business metric. If – after all the thinking and number crunching – you cannot identify a scenario with a clear relationship between the optimization and a business metric, look for more pressing problems first and revisit the performance issue later.

Step 2: Measure the metric.

After you’ve identified the metric, establish a repeatable benchmark of this scenario / metric. Include this measurement in your continuous integration / delivery pipeline and watch out for regressions. Start with synthetic benchmarks and later include the real world (Real User Monitoring).
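
As an illustration, here is a minimal sketch of what such a regression gate in a delivery pipeline could look like (written in TypeScript for a Node.js job; the budget value and the way the metric is obtained are stand-ins for whatever your synthetic benchmark tool actually reports):

    // Fail the pipeline when the benchmarked metric exceeds its budget.
    // The budget and the measurement below are illustrative stand-ins.
    const PAGE_LOAD_BUDGET_MS = 2000;

    async function measurePageLoadMs(url: string): Promise<number> {
      // Stand-in for a synthetic benchmark run (e.g. a headless-browser measurement).
      const start = Date.now();
      await fetch(url);
      return Date.now() - start;
    }

    async function main(): Promise<void> {
      const loadTimeMs = await measurePageLoadMs('https://www.example.com/');
      console.log(`measured page load: ${loadTimeMs} ms (budget: ${PAGE_LOAD_BUDGET_MS} ms)`);
      if (loadTimeMs > PAGE_LOAD_BUDGET_MS) {
        process.exit(1); // a non-zero exit code marks the CI build as failed
      }
    }

    main();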

Step 3: Identify the theoretical optimum of your metric.

Think about your scenario and construct the absolute best case. What is the maximum performance gain the scenario offers? If everything worked perfectly, what would the top performance figure be?

Step 4: Approach your optimum.

Identify the bottlenecks preventing you from reaching the optimum. Work on these bottlenecks – tackle the one with the biggest impact first. Don’t stop optimizing until you reach the point where the effort outweighs the benefit.

Step 5: Profit from your achievements!

Agile – criteria to look for in an Agile business

In February 2013, I made an attempt to define Agile on my blog. I ended up with a quite high-level definition of Agile:

“Agile is a collection of values and principles that encourage a certain type of behaviour: focus on value generation and collaboration.”

Just recently, I found an interesting blog post by Rouan Wilsenach of ThoughtWorks, in which he goes one level deeper and explains the “Four Attributes of an Agile Business”.

1. Feedback – “This is what my customer wants.”

In my definition, this relates to the focus on value generation. Companies need to put the customer at the center of all their efforts. Rouan refers to an interesting list / toolbox on experience design. Only if organizations take this attribute seriously will they ultimately succeed in applying the principles of agility.

2. A responsive team – “Yes. We can do that.”

In my definition, this very much relates to the values and principles supporting collaboration. Responsiveness comes only with co-location. In my experience, true collaboration over a distance can only be achieved within a trusted environment. People need to trust and respect each other to become a distributed team. If they don’t know each other or follow slightly different goals, failure is very likely. So, team spirit is created most easily by forming interdisciplinary teams which are ideally located in the same room. Rouan refers to Conway’s law (the correspondence between code structures and organizational patterns).

3. A responsive code base – “It’s ready. Shall we release it?”

For me, this is again a perfect example of value generation. A feature is ready? Why not ship it immediately? It’s a great entry point for principles like a “Zero Bug Policy” or “Continuous Deployment”.

Furthermore, a responsive code base is a clean code base. People love to work on such a code base. It’s almost bug free, easy to read, simple to understand, and documented. People are proud of what they have created.

4. Continuous direction – “What do we do next?”

Again, this point falls under value generation. Why employ a group of people working hard on software development, building great products, if there is no direction behind it, no strategy?

Clearly one of the major tasks of managers in an agile environment is to set the strategy. Product people break this strategy down into epics and stories. So, what to do next? Pick the next story from the prioritized backlog.

So, for me Rouan picked some really great criteria and attributes for an agile business. They’re in line with my previous definition (which I like) and take it to a less abstract level.

Reducing staff leads to higher productivity – proof for Brooks’ law?

Recently, my team lead for agile methods pointed me to an interesting correlation. After a discussion we arrived at a really mind-boggling observation: a settled organization facing change will increase productivity when reducing staff headcount – in other words: “Reducing staff leads to higher productivity”.

Bang! But let me explain. For software development tracking and planning purposes we use JIRA – so we can track the number of created and closed software development tasks quite easily. The headcount in software development, product management and UX/design can be obtained simply by counting (we actually asked our HR department for support).
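
For illustration, counting resolved issues per period can be pulled from JIRA’s REST search endpoint roughly like this (a TypeScript sketch for Node.js; base URL, project key, credentials and dates are placeholders, and this is not the exact script we used):

    // Count JIRA issues resolved in a given date range via the REST search API.
    // Base URL, project key and credentials are placeholders.
    async function countResolvedIssues(from: string, to: string): Promise<number> {
      const jql = `project = PROD AND resolved >= "${from}" AND resolved < "${to}"`;
      const url = 'https://jira.example.com/rest/api/2/search' +
                  `?jql=${encodeURIComponent(jql)}&maxResults=0`;
      const auth = Buffer.from('user:password').toString('base64');
      const response = await fetch(url, { headers: { Authorization: `Basic ${auth}` } });
      const body = await response.json();
      return body.total; // the search response reports the total number of matches
    }

    countResolvedIssues('2013-06-01', '2013-07-01')
      .then(total => console.log(`tasks closed in June 2013: ${total}`));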

In our case we ended up with this curve – an impressive illustration of Brooks’ law.

Brooks’ law in action

The graph shows a period from June 2013 (left) to July 2014 (right) and focuses only on one of our products at the time.

Green shows the number of software developers working on the product. Purple shows the number of product and UX/design people involved. The two curves are declining.

Red shows the number of releases done in the period. The curve is flattening out towards a stable level.

Blue shows the number of tickets closed and turquoise the number of bugs closed. Both curves decrease, but settle on a plateau far higher than the declining headcount would suggest.

What happened? In June 2013 we had a dedicated architecture team and quite a few software developers, organized in three teams, working on our product. Around this time, uncertainty hit the organization due to rumors of an expected acquisition of the whole company. Uncertainty often translates into people leaving the organization, which can clearly be seen in the middle of the graph. Surprisingly, the curves associated with work results don’t drop as sharply as one would expect. So the productivity of the remaining staff increased! The rightmost date in the graph is July 2014 and reflects an organization with one product team and one support team left. The architecture team disappeared completely, and the remaining teams are staffed far more lightly.

What might we learn? Brooks’ law indicates that adding people to a project doesn’t automatically make the overall project finish faster – it might even lead to a longer project runtime. We’ve seen that removing people can result in better productivity. We think that removing whole teams dedicated to special topics (in our case, the architecture team) helped to increase productivity in two ways. First, the responsibility for architecture decisions was pushed back into the teams, and deadlock situations where person A waits for a decision from person B simply disappeared. Second, the time needed to communicate, agree and disagree vanished. Decisions had to be taken, and nobody needed to be asked.

Is this a general pattern? I personally think this effect only lasts for a certain period of time. The drawback of such a leaner organization is obviously the lack of work on technical debt and architectural shortcomings. People in such a situation focus on the obvious and postpone maintenance work until later. For me, it is definitely an observation worth sharing. Perhaps you have had similar experiences? If so, please let me know.

Is there really a relation between the number of staff members and the productivity? Where is the limit? How can one push the limit?

Clean Code Developer Initiative – a structured learning path to skill improvement

“Clean Code – A Handbook of Agile Software Craftsmanship” by Robert C. Martin is a classic read for software developers. Great, but how do you get the spirit he describes into the heads of your whole software development organization? The Clean Code Developer Initiative by Ralf Westphal and Stefan Lieser outlines an interesting way to improve the skills of your software developers with the spirit of Uncle Bob Martin’s book in mind.

The Clean Code Developer Initiative (CCDI) proposes a grade-based system with color coding. It borrows concepts from game design by introducing levels: the entry level is marked with black, and you level up via red, orange, yellow, green and blue to white.

For each level, principles and practices are listed which a software developer – or a whole department – should follow to reach the next level.

Urs Enzler from planetgeek.ch produced a nice Clean Code Cheat Sheet – it’s worth a look.

Clean Code Cheat sheet

There is a nice poster available from “unclassified software development”. It’s even possible to print it in A0 format – huge!

Clean Code Developer Poster

A developer’s current level can be shown through wristbands. Allow them to be proud of their achievements!

Clean Code Developer wristbands

I personally haven’t tried to introduce the principles of Clean Code into a software development department – but it looks like a feasible effort. Accompanied by training and some team events, it should be possible to appeal to developers’ professional pride – and improve your software quality standards.

Page Load time – how to get it into the organization?

Page load time is crucial. Business and technology people alike understand the importance of this topic almost immediately and are willing to support any effort to get fast pages out of your service.

But how can you foster a culture of performance and make people aware of the importance of this one topic – amongst a hundred other important topics?

That was one of our challenges in early 2013. Management and I were convinced that web performance had to be one of our key focus topics for 2013. This was the birth of “T4T”. The acronym stands for …

  • Two – Deliver any web page within 2 seconds to our customers.
  • 4 – Deliver any mobile web page within 4 seconds to our customers over 3G.
  • Two hundred – Any request over the REST API is answered below 200 milliseconds.

So, in early 2013 we started T4T as an initiative to bring our page load times down to good values. To measure page load time we experimented with two tools: Compuware’s Gomez APM tool and New Relic’s APM tool. Gomez was used initially for our Java-based platform and New Relic for our Ruby on Rails platform. With both we were able to measure and track down some really nasty code segments (e.g. blocking threads in Java, or 900 database requests in Ruby where 2 ultimately did the same job).
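
On the measurement side, the raw page load time can also be read directly in the browser via the Navigation Timing API – roughly what the real-user-monitoring parts of such tools report. A generic sketch (not taken from either product), checked here against the 2-second T4T budget:

    // Read the page load time from the Navigation Timing API and compare it
    // against the T4T budget of 2 seconds for web pages.
    window.addEventListener('load', () => {
      // Defer one tick so loadEventEnd is populated before we read it.
      setTimeout(() => {
        const t = performance.timing;
        const pageLoadMs = t.loadEventEnd - t.navigationStart;
        console.log(`page load: ${pageLoadMs} ms, within T4T budget: ${pageLoadMs <= 2000}`);
      }, 0);
    });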

How did we get the idea of T4T into the organization? Any gathering with a presentation slot was used to hammer home the message of web performance. Any insight on its importance – any tip, hint, workshop, conference, article, blog post, presentation – was shared with the team. Furthermore, T4T was physically visible everywhere in the product development department:

T4T_closeup

THE LOGO – visible … everywhere … creepy!

T4T_Corner

T4T logo and information on page load and web performance in the relaxation area for software developers and product owners …

T4T_VPOffice

T4T at the VP office door.

For me, the endless talking about the topic, raising its importance, questioning e.g. JPEG image sizes, and the special-topic discussions on CSS sprites vs. standalone images or the use of web fonts for navigation elements helped a lot to raise people’s curiosity. Furthermore, giving them some room and time for research work helped a lot.

What did we achieve? Well, one of our platforms – based on Ruby on Rails – started with a page load time of 2.96s in January 2013. By the end of 2013, the platform was at an impressive 2.15s page load time. Over the same period, the number of page views increased by a factor of 1.5!

Loadtime_secret_2013

Page Load time over the year 2013

During the same period, the app server response time dropped from 365ms to 275ms by the end of the year – while the number of requests doubled.

Response_time_secret_2013

App server response time over the year 2013

Most interestingly, we had one single release with a simple reshuffling of our external ad tags. Some of them now load asynchronously – or even after the onload event. This helped us drop the page load time from around 2.5s to 2.1s – 400ms saved!
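
The pattern itself is simple. A minimal sketch (the tag URL is a placeholder, not one of our actual partners): the third-party script is only injected once the window load event has fired, so it no longer holds back the page load time.

    // Inject a third-party tag only after the page has finished loading.
    // The URL is a placeholder for illustration.
    function loadTagAfterOnload(src: string): void {
      window.addEventListener('load', () => {
        const script = document.createElement('script');
        script.src = src;
        script.async = true;
        document.body.appendChild(script);
      });
    }

    loadTagAfterOnload('https://tags.example.com/adtag.js');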

Impact_of_one_event_secret_adtags_after_onload

Impact of a single release that moved the ad tags after the onload event.

So, my takeaways on how to foster such a performance culture?

  1. You need a tangible, easy to grasp goal!
  2. Talk about the topic and the goal. Actually, never stop talking about this specific goal.
  3. Make the goal visible to anybody involved – use a logo.
  4. Measure your success.
  5. Celebrate success!
  6. Be patient. It took us 12 months …

Talk on Continuous Delivery at CodeCentric event in Hamburg

In late 2013, the general manager of CodeCentric in Munich asked me to give a presentation on our view, achievements and experience in the context of Continuous Delivery. The first of a series of events took place in Hamburg on 26 November in a nice location.

I put the presentation on SlideShare to make it accessible to others as well. The title of the presentation is “Continuous Delivery? Nett oder nötig?” (“Continuous Delivery? Nice or necessary?”).

The presentation covers our goals, why we decided to introduce continuous delivery as the way of delivering software to our business, some of the experience we gained, the challenges we faced in transforming our architecture to fit the new delivery approach, our tools, and more.

In case you have any questions, don’t hesitate to get back to me on michael (at) agile-minds.com.

10 ways to prevent your web site from scaling

On highscalability.com there is a great post on things that will keep your web site from scaling: “The 10 Deadly Sins Against Scalability”. The post points to Sean Hull, who tweets and writes quite frequently on scalability topics (surprise, surprise).

Sean Hull wrote in his blog about “5 things toxic to scalability” (2011) and “Five More Things Deadly to Scalability” (2013). Both are definitely worth reading on high scalability – and its common pitfalls.

A book by Martin L. Abbott and Michael T. Fisher on scalability rules.

In the context of this topic, Sean also recommends a book: “Scalability Rules for managers and startups”.

Very good reading to avoid all high-scalability pitfalls right from the beginning!

Software releases without outages or poor user experience @ Facebook

Recently I did some research on software releases and how huge, successful companies manage to release their software without causing system failures or a poor user experience.

I found this question answered by a former Facebook release engineer: “How do big companies like Facebook, Google manage software releases without causing system outages and poor user experience?”. In addition to this great answer, there is also an interview at Ars Technica: “Exclusive: a behind-the-scenes look at Facebook release engineering”.

Facebook has followed the principle of zero downtime and no service interruption since the very beginning. How do they accomplish this – even now that they’re big?

Pushing releases in multiple phases

Facebook pushes its releases in waves. The initial phase, named “latest”, always contains the latest code changes (hence the name). All engineers are connected to the “latest” staging system, gather in an IRC channel when the initial push happens, and watch logs and error messages meticulously. If the build proves to be okay, a push to a small set of production servers happens (phase p1). Again, all developers concerned with the release gather in the IRC channel to watch the release, KPI changes, log messages and errors. The next stage, phase p2, includes roughly 5% of all live server systems – again thoroughly monitored. Finally, when phase p2 also proves to be good, the deployment takes place on all server systems. Deployment @ Facebook means copying a 1.5GB binary to all servers – done with a customized BitTorrent distribution system.
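
A rough sketch of the mechanics (the phase names and the ~5% figure come from the article; the other fractions and the health check are purely illustrative): each phase targets a larger slice of the fleet, and the push only advances while the previous phase looks healthy.

    // Phased push: "latest" (staging), p1 (a few production servers),
    // p2 (~5% of production), then the full fleet.
    interface Phase {
      name: string;
      fraction: number; // share of production servers receiving the build
    }

    const phases: Phase[] = [
      { name: 'latest', fraction: 0 },  // internal staging only
      { name: 'p1', fraction: 0.001 },  // a handful of servers (illustrative figure)
      { name: 'p2', fraction: 0.05 },   // roughly 5% of production
      { name: 'full', fraction: 1.0 },
    ];

    // Advance to the next phase only while error rates and KPIs look healthy.
    async function pushInPhases(
      deploy: (fraction: number) => Promise<void>,
      healthy: () => Promise<boolean>,
    ): Promise<void> {
      for (const phase of phases) {
        await deploy(phase.fraction);
        if (!(await healthy())) {
          throw new Error(`problems detected in phase ${phase.name} – stop and fix`);
        }
      }
    }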

And if an error occurs? Well, the developers are on IRC and are held accountable for fixing their piece of code. If you crash it, you repair it!

Multiple versions of code running simultaneously

Executing code in Facebook’s server environment automatically means running multiple versions of the code simultaneously. Supporting this principle requires extra effort. The hardest part is migrating database schemas from one version to the next.
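
The article does not explain how they solve this, but a common generic approach (an illustration, not Facebook’s actual mechanism) is to make each code version tolerate both the old and the new schema until the migration has completed everywhere:

    // Illustrative expand/contract pattern: two code versions read the same
    // user table while a new "display_name" column replaces the old "name".
    interface UserRow {
      name?: string;          // old column, still present during the migration
      display_name?: string;  // new column, written by the newer code version
    }

    function displayName(row: UserRow): string {
      // Newer code prefers the new column but still works against old rows.
      return row.display_name ?? row.name ?? 'unknown';
    }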

Features with on-off toggles

Facebook utilizes a tool named “Gatekeeper” to allow real-time on/off switching and throttling of features. Only a few code changes need to be introduced, and Facebook operations can control the traffic and decide which features are available. Code in such an environment needs to be highly decoupled – no dependencies between features …
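
Gatekeeper itself is proprietary, but a minimal sketch of what such a gate looks like from the application side (the feature name, configuration and hashing scheme below are invented for illustration) could be:

    // Illustrative feature gate: on/off switch plus percentage throttling.
    interface FeatureConfig {
      enabled: boolean;
      rolloutPercent: number; // 0..100, share of users who see the feature
    }

    // In a real system this configuration would be changeable at runtime.
    const featureConfig: Record<string, FeatureConfig> = {
      newsFeedRedesign: { enabled: true, rolloutPercent: 5 },
    };

    // Deterministic bucket per (feature, user) so a user keeps the same experience.
    function bucket(feature: string, userId: string): number {
      let hash = 0;
      for (const ch of `${feature}:${userId}`) {
        hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
      }
      return hash % 100;
    }

    function isFeatureEnabled(feature: string, userId: string): boolean {
      const config = featureConfig[feature];
      if (!config || !config.enabled) return false;
      return bucket(feature, userId) < config.rolloutPercent;
    }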

Versioned static resources across the web tier

All the web servers in Facebook’s server farm are able to serve the static content of all deployed versions. This means all servers are equipped with all resources before the phase-p1 deployment. This allows the whole web tier to remain stateless.
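
One simple way to picture this (an illustration, not Facebook’s actual URL scheme): static resources carry the release version in their path, so any web server can serve the assets belonging to whichever code version rendered the page.

    // Illustrative: asset URLs embed the release version, so pages rendered by
    // the previous release keep resolving their resources during a push.
    function assetUrl(releaseVersion: string, path: string): string {
      return `/static/${releaseVersion}/${path}`;
    }

    assetUrl('2014-07-03.1', 'css/main.css'); // "/static/2014-07-03.1/css/main.css"
    assetUrl('2014-07-02.2', 'js/app.js');    // "/static/2014-07-02.2/js/app.js"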

If you condense what’s said in the articles, it comes down to these points:

  1. Automate everything!
  2. Test early, test often!
  3. Hold developers responsible and let them fix live errors.
  4. Each release has an owner involved from all stakeholder teams.
  5. The product is designed to be rolled back. From the beginning.
  6. The product is designed to execute multiple versions at the same time.
  7. Run multiple environments!
  8. Deploy incrementally!

Principles & rules – the foundation of a good engineering culture @ Google

Principles and rules seem to be a good foundation for a good (or great) engineering culture. In a recent question on Quora on “How do Google, Facebook, Apple and Dropbox maintain excellent code quality at scale?” there was an interesting link towards the engineering culture established at Google.

According to the source, Google established – and has followed ever since – the principles listed below.

1. All developers work out of a ~single source depot; shared infrastructure!
2. A developer can fix bugs anywhere in the source tree.
3. Building a product takes 3 commands (“get, config, make”)
4. Uniform coding style guidelines across company
5. Code reviews mandatory for all checkins
6. Pervasive unit testing, written by developers
7. Unit tests run continuously, email sent on failure
8. Powerful tools, shared company-wide
9. Rapid project cycles; developers change projects often; 20% time
10. Peer-driven review process; flat management structure
11. Transparency into projects, code, process, ideas, etc.
12. Dozens of offices around world => hire best people regardless of location

In the Quora answers it becomes obvious that huge code bases are maintainable only when a culture of ownership and pride is established. The first step, however, is obviously to establish a set of rules – the foundation of the engineering culture.

Seeding, growing, harvesting!

Focus points when growing your Engineering organization

Twitter brought a talk by Kris Gale, VP Engineering at Yammer, to my attention. Kris talks about his experience scaling an engineering organization from 2 people up to more than 30 engineers.

“Why Yammer believes the traditional engineering organizational structure is dead”, Kris Gale – VP Engineering

My take-aways:

1) Small interdisciplinary teams ship faster. True. I have experienced this myself. Don’t specialize too much – let people mix and keep the team at a certain size.

2) Don’t organize yourself in specialized domains (e.g. back-end, front-end, middleware, …)

3) Let the experts make engineering decisions as soon as possible. This needs trust. Hire people who are more expert than you are. Let them decide and keep the process flowing – not allowing any pauses in the flow. The experts are far better decision makers than managers.

“I don’t think you should be building a product. I think you should be building an organization that builds a product.”

4) Yammer builds features with three core metrics in mind:

  • Virality (attract customer)
  • Engagement (retain customer)
  • Monetization (sell to customer)

All features have to improve one or more of these metrics. Otherwise they change the product for no reason.

5) The 2-and-10 rule. Yammer assigns 2 to 10 people to a project and lets it run for 2 to 10 weeks. Everything outside these bounds proved wrong and led to failure.

6) Avoid code ownership. Everybody owns the code. No heroes defending their great code.

7) People assignment works with a “Big Board”. Every engineer has two magnetic buttons, “now” and “future”. The board lists all projects. Every engineer is asked to place the “now” button on the project he is currently working on and the “future” button on the one he plans to work on next. This is great for transparency and requires the organization to FOCUS.