A couple of us went to QCon London last week, which as usual had some excellent speakers and some cutting-edge material. QCon bills itself as an “enterprise software development conference designed for team leads, architects and project management”, but it has a reputation for being an awful lot more interesting than that. In particular it covers a lot of new work in architecture.
Scale, scale, scale
What that means in 2010 is scale, scale, scale – how do you service a bazillion people? In summary, nobody really has a clue. There were presentations from Facebook, Skype, BBC, Sky and others on how they’ve scaled out, as well as presentations on various architectural patterns that lend themselves to scale.
Everyone has done it differently using solutions tailored to their specific problem-space, pretty much all using Open Source technology but generally building something in-house to help them manage scale. This is unfortunate – it would be lovely to have a silver bullet for the scale problem.
From the academics there is a strong consensus that functional languages are the way forward, with loads of people championing Erlang. I’m a big fan of Erlang myself, and we’ve got a few Erlang coders here at Isotoma.
There was also some interesting stuff on other functional approaches to concurrency, in Haskell specifically and in general. One of the great benefits of functional languages is their ability to defer execution through lazy evaluation, which showed some remarkable performance benefits compared with more traditional data synchronisation approaches. I’d have to wave my hands to explain it better, sorry.
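To make the lazy evaluation idea a little more concrete, here is a minimal sketch. It is in Python rather than Haskell, it is not taken from any of the talks, and the function names are made up for illustration. It only shows the "defer execution until a value is needed" part, not the concurrency benefits: the generator describes an infinite sequence, but only the handful of values actually asked for are ever computed.

```python
from itertools import count, islice


def squares():
    """Lazily yield 0, 1, 4, 9, ... one value at a time, forever."""
    for n in count():
        yield n * n


def first_even_squares(k):
    """Return the first k even squares.

    Nothing is computed until islice pulls values through the pipeline,
    and the infinite sequence is never materialised in full.
    """
    evens = (s for s in squares() if s % 2 == 0)
    return list(islice(evens, k))


if __name__ == "__main__":
    print(first_even_squares(5))
```

In Haskell this behaviour is the default rather than something you opt into with generators, which is part of why it came up in the concurrency talks.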
Erlang is being used in production in some big scale-outs too: the BBC are using CouchDB (which is written in Erlang), and gave it a glowing report.
Skype are using Postgres (our preferred RDBMS here) and achieving remarkable scale using pretty simple technologies like pgbouncer. The architect speaking for Skype said one of their databases had 60 billion rows, spread over 64 servers, and that it was performing fine. That’s a level of scale that’s outside what you’d normally consider sane.
They did need a dedicated team of seriously clever people though – and that’s one of the themes from all the really big shops who talked, that they needed large, dedicated teams of very highly-paid engineers. Serious scale right now is not an off-the-shelf option.
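To put the Skype numbers above in perspective, here is a rough back-of-the-envelope sketch. The even split and the hash-based routing are my own assumptions for illustration, not a description of how Skype actually partition their data, and `shard_for` is a hypothetical helper.

```python
import hashlib

TOTAL_ROWS = 60_000_000_000  # figure quoted in the talk
NUM_SERVERS = 64             # figure quoted in the talk


def rows_per_server(total_rows=TOTAL_ROWS, servers=NUM_SERVERS):
    """Average rows per server, assuming a perfectly even spread."""
    return total_rows // servers


def shard_for(key, servers=NUM_SERVERS):
    """Pick a server index for a key using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % servers


if __name__ == "__main__":
    print(rows_per_server())
    print(shard_for("some-user-name"))
```

Even spread perfectly evenly, that is still the best part of a billion rows on every server, which is why the dedicated team matters.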
Erlang starred in one of the other big themes being discussed, NoSQL databases. We’ve had our own experience with these here, specifically using Oracle’s dbXML, with not fantastic results. XML is really not suited to large scale performance unfortunately. Some of the other databases being talked about now, though, are Cassandra from Facebook, CouchDB, and Voldemort from LinkedIn (modelled on Amazon’s Dynamo).
None of these are silver bullets either though – many of them do very little heavy lifting for you – often your application needs custom consistency or transaction handling, or you get unpredictable caching (i.e. “eventual consistency”). You need to architect around your users’ actual requirements; you can’t use an off-the-shelf architecture and deploy it for everyone.
The need to design around your users was put very eloquently by Udi Dahan in his Command-Query Responsibility Segregation talk. This was excellent, and it was pleasant to discover that an architecture we’d already derived ourselves from first principles (which I can’t talk about yet) had an actual name and everything! In particular he concentrated on divining User Intent rather than throwing in your normal GUI toolkit for building UIs – he took data grids to pieces, and championed the use of asynchronous notification. The idea of a notification stream as part of a call-centre automation system, rather than hitting F5 to reload repeatedly, was particularly well told.
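For anyone who hasn’t met the pattern, here is a minimal sketch of the general shape of Command-Query Responsibility Segregation. It is my own toy illustration, not Udi Dahan’s code and not our architecture; every name in it (`ChangeAddress`, `CustomerStore` and so on) is hypothetical. Commands capture what the user intends and go down one path, queries read from a separate model built for display, and subscribers are pushed changes rather than polling for them. To keep it short the notifications here are plain synchronous callbacks; in a real system they would go over a queue or similar.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeAddress:
    """A command: records the user's intent, not just a changed grid cell."""
    customer_id: str
    new_address: str


@dataclass
class CustomerStore:
    addresses: dict = field(default_factory=dict)    # write model
    summaries: dict = field(default_factory=dict)    # read model for display
    subscribers: list = field(default_factory=list)  # notification callbacks

    def handle(self, command):
        """Command side: apply the change, update the read model, notify."""
        self.addresses[command.customer_id] = command.new_address
        event = ("address_changed", command.customer_id, command.new_address)
        self._project(event)
        for notify in self.subscribers:
            notify(event)  # push to listeners instead of making them reload

    def _project(self, event):
        """Keep the denormalised read model in step with the write model."""
        _, customer_id, address = event
        self.summaries[customer_id] = f"{customer_id} lives at {address}"

    def query_summary(self, customer_id):
        """Query side: reads only, never changes state."""
        return self.summaries.get(customer_id)


if __name__ == "__main__":
    store = CustomerStore()
    store.subscribers.append(print)
    store.handle(ChangeAddress("c1", "1 High Street"))
    print(store.query_summary("c1"))
```

The point of the split is that the read side can be shaped around what the user actually needs to see, independently of how changes are validated and stored.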
DevOps, Agile and Kanban
Some of the other tracks were particularly relevant to us. The DevOps movement attempts to make it easier for development and operations teams to work closely together. For anyone who has worked in this industry this will be a familiar issue – development and ops have different definitions of success, and different expectations from their customers. When these come into conflict, everyone gets hurt.
There was a great presentation from Simon Stewart of WebDriver fame about his role as a Software Engineer in Test at Google, where they have around one SET to 7 or 8 developers to help productionise the software, provide a proper test plan and generally improve the productivity and quality of code by applying ops and automated testing principles to development.
One of the things we’ve experienced a lot here over the last year, as we’ve grown, is that there are a lot of bottlenecks, pinch points and pain in areas outside development too. Agile addresses a lot of the issues in a development team, but doesn’t address any of the rest of the process of going from nothing to running software in production. We’ve experienced this with pain in QA, productionisation, documentation, project management, specification – in fact every area outside actual coding!
Lean Kanban attempts to address this, with methods adopted from heavy industry. I’m not going to talk about it here, but there’s definitely a role for this kind of process management, if you can get your customer on-side.
Training and Software Craftsmanship
Finally, what I think was the most interesting talk of the conference, and one directly relevant to my current work: Jason Gorman spoke about a training scheme he is running with the BBC to improve software craftsmanship using peer review. I’ll be trying this out at Isotoma, and I’ll blog about it too!