Archive

software

Over the years I have worked in software development, one discussion I’ve often observed is the benefits of structuring software in projects, as most of the companies still do today. It seems that the de facto standard for software in an enterprise still is (rough sketch, this process varies a lot from company to company):

  1. Someone has an idea
  2. Idea is considered good enough to be invested in
  3. A team is assembled, and they discuss the vision and set up a plan
  4. Plan is executed with software being written
  5. Team hands over the project to a support team
  6. Software is being used (hopefully)
  7. Support team keeps software alive and kicking throughout the years

Admittedly, this is just a simplified version of what actually happens. Organisations differ in how they execute all of these phases and how long it takes to go from one to the other, but my main point is that there is usually a project team, that creates and puts a product live, and they will leave once everything is done. In some cases there will even be a country difference, where a local team will do the development, handing over to an offshore support team with the aim to keep the overall cost low.

As a consultant, I’ve seen the consequence of this behaviour too many times. Since no one is actively working on that product anymore, it keeps decaying, with some patches made once in a while to add new functionality. After a few years everyone realizes that it is cheaper to just throw the big ball of mud that they currently have away and rewrite the product from scratch… and the cycle starts again.

Now this helps to keep developers employed, so I should be happy about it, but from a company’s perspective, there are a few problems with this model:

It assumes that a software project is something static, that you can write, finish and then just use it for ever. Since it isn’t, the result is that bugs are being fixed by people that didn’t write the code and don’t have the same understanding of it, which results in poor support and probably more bugs.

It assumes that not only software is static, but also that the business is. So instead of thinking about software as something that is there to help an ever evolving business, it delivers a package that the business or their customers have to adapt to, probably resulting in worse business performance.

With the devops movement gaining momentum around the world, this scenario is currently changing in the development phase of a project, as Jen described here. We are starting to see more and more cases where development team support the application while they are building it, which is definitely a step forward, but it is not all that needs to happen. Supporting a product during its initial development phase is one thing, evolving and supporting it throughout its existence is another.

If a product is live and being used, than it should be evolving as the needs within that user group (and even new user groups) evolve. And if that’s the case, it should be treated as a first class citizen, where maintenance and evolution of the code walk together, guaranteeing that code doesn’t turn into legacy. It doesn’t mean it needs a team the same size as the one used to build it in the first place, but it needs a team that is in contact with the customer and aiming at evolving the product, not just patching it and keeping it running.

We should stop using support as a bad word. Actually, we should stop using the word support completely. We should start talking about software evolution.

More than few years ago I’ve read the book Agile Modeling, by Scott Ambler, and it was quite a revelation for me. I was beginning to look into extreme programming and TDD at the time, and the fact that you could (and should) write software in an evolving manner instead of the usual big architecture up front that I had studied in university was quite refreshing and empowering.

(as a side note, I actually was going to use the title Agile infrastructure for this, but I’ve promised I’m not to use the word anymore)

Not sure how many people have read the book, but it basically goes through principles and techniques that enable you to write software one piece at a time, solving the problem at hand today, and worrying about tomorrow’s one tomorrow.

If I remember correctly, there was a sentence that went something like this (please don’t quote me on that):

Your first responsibility is to make the code work for the problem you are solving now. The second one is the problem you are solving next.

Many years have passed since the book has been written. Nowadays growing software while developing is what (almost) everyone is doing. The idea of drawing some kind of detailed architecture that will be implemented in a few months or years is completely foreign in most sensible organisations.

Basically, evolving software is almost not interesting anymore. People do it, and know how to do it (as I wrote this I’ve realised it isn’t actually true, a lot of companies don’t do it or know how to, but let’s keep in mind the ones that do…).

In the meantime, a lot has evolved and new areas that were completely static in the past are becoming increasingly dynamic, the current trendy one being IT infrastructure.

The uprise of virtual infrastructure and the so called devops movement have developed tools and practices that make it possible to create thousands of instances on demand and automatically deploy packages whenever and wherever you want. However the thinking behind infrastructure within most IT departments is the equivalent of waterfall for software.

I’m not just talking about auto-scaling here, since that seems to be a concept that’s easy to grasp. What I don’t quite get is why the same thinking that we have when writing software can’t be applied when creating the servers that will run it.

In other words:

  1. Start writing your application in one process, one server*, and put it live to a few users.
  2. Try to increase the number of users until you hit a performance bottleneck
  3. Solve the problem by making it better. Maybe multiple processes? Maybe more servers? Maybe you need some kind of service that will scale separately from the main app?
  4. Repeat until you get to the next bottleneck

* ok, two for redundancy…

The tools and practices are definitely there. We can automate every part of the deployment process, we can test it to make sure it’s working and we can refactor without breaking everything. However, there are a few common themes that come back when talking about this idea:

“If we do something like this we will do things too quickly and create low quality infrastructure”

This is the equivalent of “if we don’t write an UML diagram, how do we know what we are building?” argument that used to happen when evolving software was still mystery to most people. It’s easy to misunderstand simplicity as low quality, but that doesn’t need to (and shouldn’t) be the case. As with application code, once you put complexity in, is a major pain to take it out, and unnecessary complexity just increases the chance for problems. Simple solutions are and will always be more reliable and robust.

“We have lots of users so we know what we need in terms of performance”

If a new software project is being developed, it is pretty much understood nowadays that nobody knows what is going to happen and how it is going to evolve over time. So pretending that we know it in infrastructure land is just a pipe dream in my opinion.

“We have SLA’s to comply to”

SLA’s are the IT infrastructure equivalent of software regulations and certifications, sometimes useful, sometimes just a something we can use to justify spending money. If there are SLA’s, deal with it, but still in the simplest possible way. If you need 99.9% uptime, then provide 99.9% uptime, but don’t do that and also use a CDN to make things faster (or cooler) just in case.

As it’s said about test code, infrastructure code is code. Treat it the same way.

If you ever worked with me you would know I’m not a big fan of estimates, mostly for the reasons better explained here, here and here, but there are still moments within a project where there are a bunch of stories written and teams need to have a guess on how much time will be needed

  • the project might be beginning and we need to know what is realistic or not
  • there might be go to market activities that need to be synchronised in advance
  • there might be a fixed deadline and we need to understand if there is any chance of making it or not

In cases like these, I’m still not a big fan of using planning poker or similar practices. First of all, it takes a _lot_ of time. Whoever has experienced a long session of estimation can probably remember people rolling their eyes as soon as we get to card number 54 (or around that…).

And handling the short attention span of tech people (which could probably be increased for the better) is not the only problem here. In every project there will be a lot of similar cards, and reestimating similar things over and over is probably not the most productive thing a software team could be doing, and also tests the patience of anyone involved.

Instead, what I’ve used in the past is a simple technique for group estimation (that I’m sure I saw somewhere before, so don’t credit it me for it) that will allow a group to get to some numbers with less time and effort.

1. Write all the stories you have in cards, and put them on top of a table.

2. Create 3 separate areas within the table, based on different timeframes. What I normally use is 1-5 days, 1-2 weeks and “too big”.

3. Ask the team to go over the stories and position in the categories they find appropriate. Let individual people move (and move again) cards however they want for a few minutes.

4. Let everyone go through the table and look at the cards, and observe the ones that are being moved between categories frequently.

5. Get the unstable cards and the ones in the “too big” category and discuss them within the team. Rewrite cards as appropriate.

6. Rinse and repeat if needed.

Is it precise? Probably not that much. Are any estimates precise? Definitely not. So far every time I’ve used we got a good level of results that were in the right timescale, which is probably the most you will get from software estimates anyway.

Regarding the fact that every story is not individually discussed within the team, a common argument in favour of detailes estimates, I believe there are better times to do that than when looking at all the cards with no context or experience working on them. Time to establish some story kick-offs maybe?

To close my participation at LAST Conference, I’ve presented a follow up of the talk I’ve done at LESS 2011, talking about why I believe most organisations are not set up for learning.

In the presentation I’ve explained my thoughts on why I believe change programs are often unfair to employees, asking them to embrace change, but only the one that management is putting forward.

I’ve also talked about learning within organisations, product teams, and how management teams should step back and understand their new role as leaders instead of controllers of the company.

If it sounds interesting to you, there is more info here.

After working in IT for a while, one thing I’ve noticed is that there is a lot of legacy code hanging around. Some companies are better than others in dealing with it, but there are a lot of reasons while code becomes old and very hard to maintain (some discussed here), making it quite a difficult task to keep an organisations’ codebase up do date.

Since every problem must have a solution, legacy code comes hand in hand with “technical” projects,  where the main goal is to rewrite an application (or part of it) in a new technology, or in a “better” way.

There are different reasons for this. Sometimes the company traditionally developed things in a specific technology and has since evolved, leaving behind a few systems that were never updated. Sometimes the original code was developed without long term aspirations (some would say hacked together) and it became more useful than expected, surviving years running in production, but also with high maintanance costs and lacking in efficiency.

Just making it clear, I agree with the need of retiring legacy code. I do believe it’s better not to create it in the first place, but once it’s there, something has to be done. However, these kind of projects come with more risks than usual in my opinion, and as John once told me,

“any project thas starts with a laundry list of problems is not a good project”.

First of all, how can you tell what has to be delivered? If something got to the point of being called legacy code, you can be sure it has stayed a few months (or years!) without being properly mantained and evolved. In the current times, any customer needs that were being solved a few years ago are most likely not current anymore. In other words, if you are rewriting the same thing again you are definitely writing the wrong thing.

Besides the business issues, any legacy code that has been around in an organisation for a decent amount of time has lived in the imagination of most developers long enough for everyone to be sure they know what needs for it to be developed perfectly this time. And as in any technical solution that has no customer goals at the end, teams end up spinning their wheels and polishing the codebase over and over again, until they reach perfection or the money runs out (which, sadly, usually happens first).

But how to approach this by not focusing on the problems to be solved? Go back to the origin and find out what problem you are trying to solve. Then find the customers who care about it and start from there. There is nothing different from any other customer facing project organisations deliver.

Every line of code should exist and be mantained for a reason, and if no reason can be found, then it might as well be deleted.

In my current project we are working with automation of deployments, so a lot of our work involves automating builds and (trying) to create a pipeline using build tasks (currently using Jenkins).

One of the things that started to annoy us is that as a standard build tasks are part of the main code repository for an application, even when the tasks itself don’t depend at all on the codebase. In our case, we are using Rake with Ruby, and using RPM’s to deploy, so our Rakefiles would contain tasks that involved testing and deploying RPM’s, but still were included in the application.

I can see the point of keeping the everything related to an app inside the same codebase, but add that to a slow git and gem server, and our build tasks would spend 90% of their time downloading and bundling the application.

So one simple practice that we are trying now is to create a separate repository for build tasks. If all you need is a Rakefile, why do we have to download more than that ?

Just to exemplify how simple it gets…

That’s it. Quite simple, but thought it was worth mentioning!

In different opportunities IT organisations need to develop internal tools/code to be used for its developers. The reasons for it are a lot of times questionable, like in some cases of creating shared code, but there are moments when they are the right option. With the rise of continuous delivery, deployment automation tools is one of the examples where common tools can be useful for a company.

When that happens, internal teams are often assembled to deal with this problem, and create solutions to be used internally. Unfortunately, these teams seem to suffer from a “too much introspection problem”, and often forget lessons learned about product development when going on their mission.

There is a perception that because something is developed internally and “approved” by the head office (after all, they are putting money on it), other people will be much more receptive and understanding about problems and poor customer service.

Since there are no real paying customers and everyone will have to use the tool at the end (at least we hope for), the need for short product iteration and constant feedback doesn’t seem that necessary, plans are made to architect something that could solve all the problems, just not the one that the users are having.

The problem is that, in reality, market laws operate internally to a company as much as they do in the outside world, and poor over architected solutions that are forced upon users are still not a big hit, even between the organisation’s walls.

For real products (the ones that people pay for), the current solution are lean startup techniques and product teams, and I believe the frame of mind when entering the world of internal development should be the same.

Start with a vision, not a complete solution – Even when the problem is supposed to be known and understood, it usually isn’t as much as we hope. Over-architecting is a problem here as in any other software project and the only way to avoid the plan to fail when facing reality is to start small and validate your idea as you go along. Yes, Im talking about a MVP.

Get out of the building – In this case, your users are probably inside the same building, so there are really no reasons to avoid them. Walk around the floor, find the people that you will help with your product and get them to help defining what needs to be executed from your vision.

Iterate fast, close to the customer - Get your solution out there to be used from the beginning. Sit with your users, help them adopt it and treat feedback appropriately (as in hear it and do something about it).

Use metrics, track adoption and problems – Metrics are there to be used, so why not get them to help understanding how your tool is being used and adopted by your colleagues (or not)?

In summary, despite not having paying users, internal software is also supposed to help somehow, and if it doesn’t, it will be left behind and forgotten, so companies should give it the importance it deserves.

A common topic when discussing IT in organisations is how to structure software teams, more specifically, how do we divide work when a company grows enough that one team is not enough anymore, and how do we deal we shared code?

This is not a simple question and probably deserves a few blog posts about it, but now I wanted to focus on one specific aspect of it, which is the duplication of effort.

A simple example would be if my company has multiple teams delivering software, they will eventually find the same problems, as in how to deploy code, how to do logging (the classic example!), monitoring and other things like that.

And in almost all organisations I have been, the response is unanimous: we should invest in building shared tools/capability to do that! It’s all in the name of uniformity and standardisation, which must be a good thing.

Well, as you might imagine at this point, I disagree.

Building shared applications in IT is the equivalent of economies of scale for software, and it is as outdated in IT as it is in other industries.

In knowledge work (as we all agree software development is at this point), no effort is really duplicated. You can develop the same thing 100 times and you are most likely getting 100 different results, some quite similar but yet different. And that small difference between each other is where innovation lies.

This is the trade off that companies should be aware of. If people don’t have freedom to experiment and look for different solutions for problems, is very easy to get to a place where everyone is stuck in the old way of thinking. And that’s what companies are really doing when making people use common tools, shared applications, etc.. They are putting cost-saving in front of innovation, a lot of times without realising it.

And I don’t mean we can’t all learn from each other and reuse solutions when they are appropriate. After all, there is still a place for economies of scale when appropriate.

We are all standing on the shoulders of people that came before us and should keep doing it, even internally to an organisation. But that should be an organic and evolutionary process, not one that is defined by an architecture team.

If you have ever been in an Agile project, or something that looks like it, you should have heard about the concept of spikes.

For those who haven’t, spikes are the agile response to reducing technical risk in a project. In case you are not sure how to build something or fear that some piece of functionality might take much more than expected, and you have to gain more confidence about it, it’s time to run a spike. Usually timeboxed, spikes are technical investigations with the objective of clarifying a certain aspect of the project.

In my current team, we are developing a few tools that are quite specific to our context and were not sure on how to solve a few issues, so we have been quite frequently playing spike cards.

This is all not new and I’m sure most of you have done that before. If you did the way I used to do, you would write a spike card, something as “investigate technology X for problem Y“, would spend 2 days doing it and would have a response for it in your head once you were finished.

In our current context, team members were rotating quite quickly, so we were worried that any knowledge we would get from spikes could have been lost if it was just…let’s say.. in our heads.

Not wanting jut to write the findings up, as we first thought about doing, we decided to tackle the problem with the golden hammer of agile development: tests!

So, instead of writing tests to decide how we should write our code, we started writing tests to state the assumptions we had about the things we were investigating, being able to verify them (or not) and have an executable documentation to show to other people.

For example, here is some code we wrote to investigate how ActiveRecord would work in different situations:

it 'should execute specific migration' do
  CreateProducts.migrate(:up)
  table_exists?("products", @db_name).should be_true
  table_exists?("items", @db_name).should be_false
 end
it 'should execute migrations to a specific version' do
  ActiveRecord::Migrator.migrate(ActiveRecord::Migrator.migrations_paths, 02) { |migration| true }
  table_exists?("products", @db_name).should be_true
  table_exists?("items", @db_name).should be_true
  table_exists?("customers", @db_name).should be_false
 end
it 'should not execute following migrations if one of them fails' do
  begin
    ActiveRecord::Migrator.migrate(ActiveRecord::Migrator.migrations_paths, nil) { |migration| true }
  rescue StandardError => e
      puts "Error: #{e.inspect}"
  end
  table_exists?("invalid", @db_name).should be_true
  m = ActiveRecord::Migrator.new(:up, ActiveRecord::Migrator.migrations_paths, nil)
  m.current_version.should == 3
  table_exists?("products", @db_name).should be_true
  table_exists?("items", @db_name).should be_true
  table_exists?("customers", @db_name).should be_true
  table_exists?("another_table", @db_name).should be_false
 end

We have used this technique just a few times and I won’t guarantee it will always be the best option, but so far the result for us is having code that can be executed, demonstrated and easily extended by others, making it easier to transfer knowledge between our team.

Recently I’ve started working often with Puppet, using it to provision environments for the project I’m working on. One of the things I’ve quickly realised when using it was how long the feedback loop between committing code and actually verifying that the manifest was working appropriately. In my situation, it would be something like this:

  1. Work on puppet manifests, making a few changes
  2. Commit code to repository
  3. Wait for build to finish, which just verified for correct syntax
  4. Wait for latest version to be published on the puppet master
  5. Wait for next sync between master and client
  6. Check that configuration was applied correctly on the client

As you can see, not very simple. If you also consider that I am not very experienced with puppet, you can imagine how I ended up having to retry things in this very long loop, which can end up with anyone’s patience.

Testing Infrastructure Code

Coming from a development background and being used to having very fast feedback about code that I write made me go into a search for testing tools that could ease my pain.

Unfortunately most of the tools I’ve found where not ideal since they focused on unit testing code, as rspec-puppet. Not sure what others think of it, but in the case of puppet manifests and chef recipes, unit testing doesn’t make much sense to me, since there is no code to be executed, and tests end up looking like some version of reverse programming, where you just assert what you wanted to write, but it doesn’t guarantee that the code actually works.

Introducing Toft

Luckily, one of the options I’ve found was Toft, which is a library aimed writing integration tests for infrastructure code using Linux containers. The main idea is that you write cucumber features verifying what you expect the target machine to have (packages, files, folders, etc..) and Toft starts a linux container, applies your configuration and runs your tests against it.

It also can be run from a vagrant box, so you can have your tests running on your mac, which is quite handy.

Features can be created using normal Cucumber steps, and mostly rely on ssh’ing into the target machine and verifying what’s going on in it, so are quite easy to extend and adapt to your needs. Here is an example of a feature verifying if a specific package has been installed.

Scenario: Check that package was installed on centos box
Given I have a clean running node n1
When I run puppet manifest “manifests/test_install.pp” on node “n1″
Then Node “n1″ should have package “zip” installed in the centos box

We’ve started using it in our team when writing new manifests and have also setup a ci build with it, which is quite useful to guarantee our manifests still work over time.

Toft is still in its beginning but I believe it has quite a lot of potential. If you are using chef or puppet you should definitely check it out at:

http://github.com/exceedhl/toft

Follow

Get every new post delivered to your Inbox.

Join 836 other followers