Customer & Service Provider – silver lining on the horizon

Customer & Service Provider interface … epic fails and how to overcome them

In a recent post I was talking about an interesting situation regarding the collaboration with our service provider (see https://www.agile-minds.com/customer-service-provider-epic-fails/). After some weeks I’d like to take a chance to draw some conclusions and report back some successes and learnings. I definitely see some silver lining on the horizon regarding the collaboration part – but also discovered some real black holes within our organisation. It’s always the same: process insufficiencies will always show up – latest – in the IT department …

The Modified Setting

Now in the development partnership with our service provider we pay for 2,5 software developers + 0,75 project management person. This means we added significant ressources to the aspect of project planning, communication and management.

Our project management person and the pendant at the service provider really met twice a week to discuss progress, issues and need for further planning on specific tasks. In addition to that it’s me and the general manager of the service provider meeting almost weekly to discuss issues and problems. This approach helped to improve the communication a lot. We’re now in a continuous dialog. Understanding of each other has raised enormously. A great step!

The New Situation

We are still not happy with the outcome – but the reasons have changed dramatically. Previously, we as an organization were not able to articulate our needs and get this information over to our service provider. We changed our communication paradigm and introduced the face-to-face communicaion twice a week. Now, the service provider has a ways better view of our requirements and is eager to start implementation. However, a lot of our stories lack quality from a subject matter view. How did we come to this point?

Multiple stakeholders talking to the service provider – We only have one person talking about new requirements – our project manager. He’s responsible to collect requirements in-house and communicate them back to our service provider. This works quite good. We’ve setup internal bi-weekly calls to collect and prioritize requirements. We’ve had our first quarterly face-to-face meeting where we started to build and organize the company backlog.

No clear communication of priorities, requirements and goals – Flag it as solved with the previous point.

Fundamental changes in design after going live – This is still an ongoing topic. But it’s a question of educating our internal people. Since we’ve one single person doing the communication towards the service provider it’s one of his duties to explain people that if there’s a change in requirements after going live we’ll treat this change as a new requirement – and this needs to get prioritized and implemented. Education.

The New Symptoms

No progress over all – Nope. Having progress. Or eager to go on. But …

Bad requirement quality – This was not transparent so far. But it came up when we introduced the new way of working. It became clearer and clearer that our subject matter experts – the stakeholder – didn’t exactly know what they wanted – not even at the point when they were talking to the service provider. Or they knew what they wanted – but didn’t consider the other business units in the design phase.

The Idea … did work out so far

The two aspects I’ve decided to implement

  • Setting up a stable and reliable single face of contact on working level
  • Introducing basic agile principles to get to stable requirements

worked out smoothly so far. As usual the introduction of agile aspects increased transparency of involved processes and especially the quality of the different artifacts a lot. And new issues turned up.

Quality of Requirements

As said previously the introduction of agile principles – well, I’d say of new communication patterns – produced process insufficiencies of our own organization.

Silo thinking – our business is separated in different business units. The different units have their own targets and try hard to reach them. There is no competition between the units – but everybody takes care of their own business. Two of the business units eventually use the same instance of the student information system. Especially between those units we have a lot of cases where e.g. the layout of the exam certificates differs, rule sets for course acceptance differ, specializations of courses differ and so on. Small differences – which might be needed … or not – which have a big, big impact on implementation capacities. It’s a difference to create different rule sets for courses – double the implementation work. And why? Ah, the business stakeholder didn’t have time to discuss their individual needs and combine them into mutual needs.

Low quality – not well thought-out – Another point we found is the quality and depth of the various business requirements. A lot of our business requirements arrive at our project manager at a stage where it says “We need a proper implementation of …”. Well, what does “proper” mean? Many times we experience developers from the service provider receiving last-minute changes – still. That’s one of the focus points for future improvements.

No consistent prioritization – Our exec management wants to review the implementation prioritization. Fair enough. We’ve agreed on having a call/meeting every second month. In this call we’ll run through the company backlog and listen carefully to what exec management sees as top priorities. In parallel we have our working calls with the business stakeholder every second week. Together, I believe, we’ll arrive at a prioritization which puts most important topics first.

My biggest learning out of this so far?

No matter how big the issue – start introducing agile patterns. Here especially the communication, company backlog and the face-to-face meeting helped a lot to unveil the real issues behind the “bad customer / service provider” interface.

The nice part about the agile pattern is the transparency. It automatically points you to the real pain points – through transparency.

Stay tuned and read on – to be continued.

Customer & Service Provider – Epic fails

Customer & Service Provider interface … epic fails and how to overcome them

Just recently I decided to change positions – moving away from a CTO role with a whole development department attached towards a CTO role with all software development tasks being outsourced to a service provider / product company. And I felt like being in a time-warp … not into the bright new and shiny future – no … directly into middle age – the darkest version of it.

See why – and how I plan to maneuver all of us out of this.

The Setting

The product company is very specialized on delivering a specific product aiming at university campus management software. We’re paying 2,5 developers to work full-time on our product branch – we call it a development partnership.

The product company develops in PHP, does deployments with puppet – however utilizes a full tomcat environment and PostgreSQL as database. They claim to work in an agile manner … and utilize JIRA (yeah!). Peeking into their JIRA unveils a per-person-association-to-tasks (well …). Visiting the development team produces clean white walls, no empty bottles or cans around, no architecture diagram scribbles, and no nothing. Suspect environment for full-heart agile developers. But let’s see.

The Situation

We are not happy with the performance of the service provider. Why? They don’t deliver on time, in quality and make promises all the time. Hmmm. After few visits and talking in-house and the service provider the situation became clearer to me.

Multiple stakeholders talking to the service provider – We had multiple parties of our business units talking to various people at the service provider. All of us spoke with different voices and everything was super-important. In the end not the feature with highest business impact made it live – it was the one of the person shouting loudest…

No clear communication of priorities, requirements and goals – In our discussions with the various people at the service provider we were not able to communicate a clear order of priorities nor were the requirements defined – not talking about documented or clear – and goals of the implementation tasks were sometimes not even clear to our internal stakeholders.

Fundamental changes in design after going live – I think that’s a real classic: The whole implementation is done, the product change is live and the stakeholder suddenly realizes that the color coding of a logo needed to change. So, the stakeholders didn’t even realize that changes in the live product are the most costly changes – compared to those done on powerpoint or photoshop level right in the beginning.

The Symptoms

No progress over all – Having not communicated a goal, a strategy or anything with a guiding function it’s no wonder there is no felt progress in the cooperation with our service provider. Sure, everybody was giving his/her best and we were moving. But nobody was satisfied at all. We didn’t get what we wanted and the service provider felt bashed at every move they made.

Communication went really personal – I personally attended the at-the-time two-weekly phone call with our service provider. We had the CEO, 3 business unit leads, one project manager and another 3 subject matter experts on the call – together with one of the managing directors of the service provider. As usually, I learnt afterwards, the call ended in real bad communication on a personal level. There was no common sense, every word was political, no commitment to anything – it was protect my a** everywhere.

The Idea

This needed to end. So, after my first week I had a meeting with the board of the service provider. Board means two people – the founder and investors. I presented them with an idea to improve the situation. At the core of the idea there were two main principles:

  • Setting up a stable and reliable single face of contact on working level
  • Introducing basic agile principles to get to stable requirements

They bought into this approach and we agreed to set this up.

Stable and reliable single face of contact on working level – I made my project manager to spend 80% of his time to manage the interface to our service provider. The service provider put 40% of a person into this task.

The two of them meet twice a week. Idea behind is to get closer communication flowing. The sooner we know about problems, the faster we can solve them. Anything not clear? Any issue to solve? Too less resources? Whatever. On Tuesday, they meet to discuss the current stories in implementation and issues. Furthermore, they discuss future stories to get feedback on expected implementation issues and too less specific requirements. On Friday, they meet to discuss the same topics and in addition I and the managing director meet to discuss and decide on topics to be resolved.

Internally, we made clear that it’s my person and the project manager who speaks to the service provider – nobody else. NOBODY! This is the only way to remove distracting conversations on non-important topics and puzzling requirements. Not to talk about setting other priorities on tasks. This all happens when the internal stakeholder have direct access to the service provider. Now, the project manager collects the user stories and requirements. We meet on a frequent basis – at least once a week, usually every day – and exchange our knowledge on user stories.

Introducing basic agile principles to get to stable requirements – Obviously, we had XLS-sheets of various formats and contents. All of them full with requirements and sorted to various priorities. We kindly removed all of them and started our Company Backlog.

We agreed with the service provider to have sprints of 2 weeks. The service provider still does releases on a 4 week basis – but we work in 2 week sprints. This allows us to communicate delays of features in terms of 2 weeks instead of month. “Well, it didn’t make it into this sprint – we will put it into the next sprint. This starts in 2 weeks.”

The Company Backlog is now organized in JIRA. We attach all relevant information to the user stories and hand them over to our service provider. Unfortunately, we couldn’t agree on using the same JIRA instance. Initially, I thought to use a Company Backlog view where we organize the next sprint which is then handed over to development and they start associating tasks to the user stories.

The daily standup is a bi-weekly get-together of the relevant people and starts to work out pretty good. During these meetings the two of them do a kind of backlog grooming and discuss the next stories to be implemented.

So, to summarize. Agile elements are:

  • The Company Backlog
  • The Product Owner (on our side – still named project manager)
  • The Scrum Master (on service provider side – also named project manager)
  • Sprints of two weeks duration
  • Backlog grooming
  • Planning 1

Does it work? Well, I’m sure it will be an improvement after all. Is it optimal? Let’s see!

 

Website performance – best practice to improve

Website performance comes in various flavours – but where to start with improvements? How to improve the performance? What are best practices to follow?

Tony Gentilcore (@tonygentilcore) talks in a blogpost about “An engineer’s guide to optimization“. Tony basically identified 5 steps to follow.

Step 1: Identify the metric. 

Identify a scenario worth being optimized – means it moves a business metric. If – after all thinking and crunching – you’re not able to identify a scenario with a clear relationship between the optimization and any business metric you should look for more pressing problems first – and revisit the performance issue later.

Step 2: Measure the metric.

After you’ve identified the metric – establish a repeatable benchmark of this scenario / metric. Include this metric measurement into your continuous integration / delivery pipelines and watch out for regressions. First, start with synthetic benchmarks and later include the real world (Real User Monitoring).

Step 3: Identify the theoretical optimum of your metric.

Think about your scenario and create a best-best-best-case. What would be the benefit, the performance maximum to gain out of the scenario. Given everything works really well, what would be the top-performance figure?

Step 4: Approach your optimium.

Identify the bottlenecks preventing you to reach the optimum. Work on these bottlenecks – start with the biggest impact first. Don’t stop optimizing until you reach a point where you have to invest more than you benefit.

Step 5: Profit from your achievements!

Agile – criteria to look for in an Agile business

In February 2013, I made an attempt to define Agile on my blog. I ended in a quite high-level definition of Agile:

Agile is a collection of values and principles that encourage a certain type of behaviour: focus on value generation and collaboration.”

Just recently, I found an interesting blog by Rouan Wilsenach working with ThoughtWorks where he went one level deeper and explained “Four Attributes of an Agile Business

1. Feedback – “This is what my customer wants.”

In my definition this relates to focus on value generation. Companies need to put the customer in the center of all their efforts. Rouan refers to an interesting list / toolbox on experience design. Only if organizations take this attribute serious they will finally be successful applying principles of agility.

2. A responsive team – “Yes. We can do that.”

In my definition this very much relates to values and principles supporting collaboration. Responsiveness comes only with co-location. In my experience, true collaboration over a distance can only be achieved within a trusted environment. People need to trust and respect each other to become a distributed team. If they don’t know each other of follow slightly different goals, it’s very likely to fail. So, the team spirit can be created most easily when forming inter-disciplinary teams which are located ideally in the same room. Rouan refers to Conway’s law (comparison of code structures with organizational patterns)

3. A responsive code base – “It’s ready. Shall we release it?”

For me, this again a perfect example of value generation. Have a feature ready? Why not shipping it immediately? Great entry for principles like “Zero Bug Policy” or “Continuous Deployment”.

Furthermore, a responsive code base is a clean code base. People love to work on such a code base. It’s almost bug free, easy to read, simple to understand, documented. People are proud of what they’ve developed, created.

4. Continuous direction – “What do we do next?”

Again, this point goes as value generation. Why employing a bunch of people working hard on software development, building great products if there is no direction behind, no strategy?

Clearly one of the major tasks of managers in an agile environment is to set the strategy. Product people break this strategy down into epics and stories. So, what to do next? Pick the next story from the prioritized backlog.

So, for me Rouan picked some really great criteria and attributes for an agile business. They’re in line with my previous definition (which I like) and take it to a less abstract level.

Reducing staff leads to higher productivity – proof for Brooks’ law?

Recently, my teamlead agile methods pointed me to an interesting corelation. After a discussion we concluded with a really mind-boggling observation. A settled organization facing change will increase productivity when reducing staff headcount – in other words: “Reducing staff leads to higher productivity”.

Bang! But let me explain. For software development tracking and planing purpose we use JIRA – so we’re able to track the amount of created and closed software development related tasks quite easily. The headcount in software development, product management and UX/design can be obtained easily by – counting (we actually asked our HR department for support).

In our case we ended with this curve – impressively proving Brooks’s law.

Brooks Law in action

The graph shows a period from June 2013 (left) to July 2014 (right) and focuses only on one of our products at the time.

Green shows the number of software developers working on the product. Purple shows the number of product and UX/design people involved. The two curves are declining.

Red shows the amount of releases done in the period.The curve is trending towards a stable, flat curve.

Blue shows the amount of tickets closed and turquoise the amount of bugs closed. The two curves are decreasing trending towards a way higher plateau.

What happened? In June 2013 time frame we had a whole architecture team and quite some software developers organized in three teams working on our product. Around this point in time uncertainty hit the organization due to rumors of to-be-expected acquisition of our whole company. Uncertainty translates often to people leaving the organization. This can clearly be seen in the middle of the graphic. Surprisingly, the curves associated to work results don’t decline – as one would expect – but decrease even. So, productivity of the remaining staff increased! The most right date in the graph is July 2014 and reflects an organization which has one product team and one support team left. The architecture team disappeared completely, the team is ways lighter equipped with people.

What might we learn? Brooks’ law indicates that adding people to a project doesn’t automatically make the overall project finish faster. It might even end in a longer project run time. We’ve seen that removing people might result in better productivity. We think the fact of removing whole teams dedicated to special topics (in our case: the architecture team) helped to increase productivity in two ways. First, the responsibility of architecture decisions was pushed back into the teams and dead-lock situations where person A waits for a decision from person B just disappeared. Second, the time needed to communicate, agree and disagree simply vanished. Decisions had to be taken and nobody needed to be asked.

Is this a general pattern? I personally think this effect does only appear for a certain period of time. The drawback of such leaner organizations is obviously the lack of work on technical debt and architecture drawbacks. People in such situation focus on the obvious and postpone work on maintenance tasks for later. For me, it is definitely an observation I’d like to share with you. Perhaps you have similar experience? If so, please let me know.

Is there really a relation between the number of staff members and the productivity? Where is the limit? How can one push the limit?

Clean Code Developer Initiative – a structured learning way to skill improvement

“Clean Code – A Handbook of Agile Software Craftmanship” by Robert C. Martin is a classical reading for software developers. Great, but how do you get the spirit he described into the heads of your whole software development organization? The Clean Code Developer Initiative by Ralf Westphal and Stefan Lieser drafts an interesting way to improve the skills of your software developers with the spirit of Uncle Bob Martin’s book in mind.

The Clean Code Developer Initiative (CCDI) proposes a grade based system with color coding. It borrows concepts from game design by introducing a level concept. Entry level is marked with black and levels up via red, orange, yellow, green, blue to white.

For each level there are principles and practices listed which a software developer – or a whole department – should follow to get to the next level.

Urs Enzler from planetgeek.ch produced a nice Clean Code Cheat Sheet – it’s worth a look.

Clean Code Cheat sheet

There is a nice poster available by “unclassified software development“. It’s even possible to print it out in A0 format – huge!

Clean Code Developer Poster

The whole level of a developer can be shown through wristbands. Allow them to be proud on their achievements!

Clean Code Developer wristbands

I personally haven’t tried the effort to introduce the principles of Clean Code into a software development department – but it looks like it’s a feasible effort. Accompanied by trainings and some team events, it should be possible to appeal to the developer’s honor – and improve your software quality standards.

Page Load time – how to get it into the organization?

Page load time is crucial. Business and technology people as individuals start understanding the importance of this topic almost immediately and are willing to support any effort to get fast pages out of your service.

But how can you foster a culture of performance and make people aware of the importance of this single important topic – amongst hundred other important topics?

That was one of the challenges early 2013. Management and myself were convinced that 2013 one of our key focus topics is around web performance. t4t_optimizedThe birthday of “T4T”. The acronym stands for …

  • Two – Deliver any web page within 2 seconds to our customers.
  • 4 – Deliver any mobile web page within 4 seconds to our customers over 3G.
  • Two hundred – Any request over the REST API is answered below 200 milliseconds.

So, early 2013 we started T4T as an initiative to bring our page load times down to good values. To measure the page load time we experimented with two tools: Compuware’s Gomez APM tool and New Relic’s APM tool. Gomez was used initially for our Java based platform and New Relic for our Ruby on Rails platform. But we were able to measure and track-down some really nasty code segments (i.e. blocking threads in Java or 900 database requests in Ruby where 2 finally did the same job).

How did we get the idea of T4T into the organization? Any gathering of people with presentation character was used to hammer the message of web performance to the people. Any insight on the importance, any tip, hint, workshop, conference, article, blog post, presentation, anything was shared with the team. Furthermore, T4T was physically visible everywhere in the product development department:

T4T_closeup

THE LOGO – visible … everywhere … creepy!

T4T_Corner

T4T logo and information on page load and web performance at the relax area for software developers and product owners …

T4T_VPOffice

T4T at the VP office door.

For me, especially the endless talking about the topic, raising the importance, questioning of e.g. JPG picture sizes, special topic discussions on CSS sprites vs. standalone images or the usage of web-fonts for navigation elements helped a lot to raise the curiosity of people. Furthermore, giving them some room and time for research work helped a lot.

What did we achieve? Well, one of our platforms – based on Ruby on Rails started with page load time of 2,96s in January 2013. End 2013, the platform was at an impressive 2,15s page load time. In the same time, the amount of page views increased by factor 1,5!

Loadtime_secret_2013

Page Load time over the year 2013

During the same time period, the App server response time dropped from 365ms to 275ms end of year – this time doubling the amount of requests in the same time.

Response_time_secret_2013

App server response time over the year 2013

Most interesting, we had one single release with a simple reshuffling of our external tags. Some of them now load asynchronously – or even after the onLoad() event. This helped us drop the page load time from around 2,5s to 2,1s – 400ms saved!

Impact_of_one_event_secret_adtags_after_onload

Impact of one single release and the move of adtags after the onLoad() event.

So, my takeaways on how to foster such a performance culture?

  1. You need a tangible, easy to grasp goal!
  2. Talk about the topic and the goal. Actually, never stop talking about this specific goal.
  3. Make the goal visible to anybody involved – use a logo.
  4. Measure your success.
  5. Celebrate success!
  6. Be patient. It took us 12 month …

Talk on Continuous Delivery at CodeCentric event in Hamburg

End 2013, the general manager of CodeCentric in Munich asked me to do a presentation on our view / achievements / experience in the context of Continuous Delivery. The first of a series of events happened in Hamburg on 26th of November in a nice location.

I put the presentation on SlideShare to make it accessible to others as well. The title of the presentation “Continuous Delivery? Nett oder nötig?”.

The presentation talks about our goals, why we decided to introduce continuous delivery as a way of delivering software to our business, shows some of the experience we made, tells somethings about the challenges we had transforming our architecture to fit the new delivery ways, tools and some more.

In case you have any questions, don’t hesitate to get back to me on michael (at) agile-minds.com.

10 rules to prevent a web site from high scalability

On highscalability.com there is a great post on things you should do to prevent your web site from high scalability: “The 10 Deadly Sins Against Scalability“. The post points to Sean Hull who twitters and writes quite frequently on scalability topics (surprise, surprise).

Sean Hull wrote in his blog about “5 things toxic to scalability” (2011) and “Five More Things Deadly to Scalability” (2013). Definitely worth reading entries on high scalability – and common pitfalls.

Book by Martin L. Abbott, Mihcael T. Fisher on scalability rules.

In the context of this topic, Sean also recommends a book: “Scalability Rules for managers and startups”.

Very good reading to avoid all high-scalability pit-falls right from the beginning!

 

 

 

Software releases without damages and poor user experience @ Facebook

Recently I did some research on software releases and how huge successful companies manage to release their software without causing system failures and poor user experience.

I found this question being answered by a former Facebook release engineer: “How do big companies like Facebook, Google manage software releases without causing system outages and poor user experience?“. In addition to this great answer, there is also an interview at arstechnica “Exclusive: a behind-the-scenes look at Facebook release engineering“.

Facebook follows since the very beginning the principle of zero downtime and no service interruption. How do they accomplish this – even now, when they’re big?

Pushing releases in multiple phases

Facebook pushes their releases in waves. The initial phase, named “latest” always contains the latest code changes (hence the name). All engineers are connected to the “latest” staging system, gather in an IRC channel when the initial push happens and watch logs and error messages meticulously. If the build proves to be okay a push to some servers in production happens (p1-phase). Again, all developers concerned with the release collect in the IRC-channel to watch the release and gather KPI changes, log messages and errors. The next stage, phase p2, includes roughly 5% of all live server systems – again thoroughly monitored. Finally, when phase p2 proves to be good again, the deployment takes place on all server systems. Deployment @ Facebook means copying a 1,5GB binary to all servers – done with a customized bit torrent distribution system.

If an error occurs? Well, the developers are on IRC and held accountable to fix their piece of code. If you crash it, you repair it!

Multiple versions of code running simultaneously

Executing code in Facebooks’ server environment means automatically running multiple versions of code simultaneously. This means an extra effort to address this principle. Hardest is to migrate database schemes from one to another.

Features with on-off toggles

Facebook utilizes a tool named “Gatekeeper” to allow real-time on/off switching and throttling of features. Only few code changes need to be introduced and Facebook operations can control the traffic and which features are available. Code in such environments need to be highly decoupled – no dependencies between features …

Versioned static resources across the web tier

All the web servers in Facebooks server farm are able to serve all static content of all versions being deployed. This means that all servers are equipped with all resources prior to phase p1 deployment. This allows the whole web tier to remain stateless.

If you condense down what’s said in the articles it comes to these points:

  1. Automate everything!
  2. Test early, test often!
  3. Hold developers responsible and let them fix live errors.
  4. Each release has an owner involved from all stakeholder teams.
  5. The product is designed to be rolled back. From the beginning.
  6. The product is designed to execute multiple versions at the same time.
  7. Run multiple environments!
  8. Deploy incremental!