Live video mixing with the BBC: Lessons learned

Doug Winter, Founder and CTO

In this post I am going to reflect on some of the more interesting aspects of this project and the lessons they might provide for other projects.

This post is one of a series talking about our work on the SOMA video mixing application for the BBC. The previous posts in the series are:

In my view there are three broad areas where this project has some interesting lessons.

Novel domains

First is the novel domain.

This isn’t unfamiliar – we often work in novel domains that we have little to no knowledge of. It is the nature of a technical agency, in fact – while we have some domains that we’ve worked in for many years, such as healthcare and education, there are always novel businesses with entirely new subjects to wrap our heads around. (To give you some idea, a few recent examples include store-and-forward television broadcasting, horse racing odds, medical curricula, epilepsy diagnosis, clustering automation and datacentre hardware provisioning.)

Over the years this has been the thing that I have most enjoyed out of every aspect of our work. Plunging into an entirely new subject with a short amount of time to understand it and make a useful contribution is exhilarating.

Although it might sound a bit insane to throw a team who know nothing about a domain at a problem, what we’re very good at is designing and building products. As long as our customers can provide the domain expertise, we can bring the product build. It is easier for us to learn the problem domain than it is for a domain expert to learn how to build great products.

The greatest challenge with a new domain is the assumptions. We all have these in our work – the things we think are so well understood that we don’t even mention them. These are a terrible trap for software developers, because we can spend weeks building completely the wrong thing with no idea that we’re doing so.

We were very lucky in this respect to be working with a technical organisation within the BBC: Research & Development. They were aware of this risk and did a very good job of arranging our briefing, which included a visit to a vision mixing gallery. This is the kind of exercise that delivers a huge amount in tacit understanding, and allows us to ask the really stupid questions in the right setting.

I think of the core problem as a “Rumsfeld”. Although he got a lot of criticism for his comments about “unknown unknowns”, I think they’re bizarrely insightful. There really are unknown unknowns, and what the hell do you do about them? You can often sense that they exist, but how do you turn them into known unknowns?

For many of these issues the challenge is not the answer, which is obvious once it has been found, but facilitating the conversation to produce the answer. It can be a long and frustrating process, but critical to success.

I’d encourage everyone to try and get the software team into the existing environment of the target stakeholder groups to try and understand at a fundamental level what they need.

The Iron Triangle

The timescale for this project was extraordinarily difficult – nine weeks from a standing start. Much of the scope was also quite fixed – we were largely building core functionality that, if missing, would have rendered the application useless. On top of that, we wanted to achieve the level of finish for the UX that we generally deliver.

This was extremely ambitious, and in retrospect we bit off more than we could reasonably chew.

Time is the greatest enemy of software projects because of the challenges in estimation. Estimation for software projects is somewhere between an ineffable art reserved only for the angels, and completely impossible.

Iron triangle, with the three sides made up of quality, scope and time

When estimates are impossible, time becomes an even greater challenge. One of the truisms of our industry is the “Iron Triangle” of time, scope and quality. Like a good Chinese buffet, you can only choose two. If you want a fixed time and scope, it is quality that will suffer.

Building good software takes thought and planning. Also, the first version of a component is rarely the best – it takes time to assemble, then consider it, and then perhaps shape it into something near its final form.

Quality is, itself, an aggregate quality. Haste lowers the standards for each part and so, by a process of multiplication, lowers far more the overall quality of a product. The only way to achieve a very high quality for the finished product is for every single part to be of similarly high quality. This is generally our goal.
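To make the multiplication concrete, here is a minimal sketch. It assumes, purely for illustration, that each part's quality can be scored between 0 and 1 and that overall quality is simply the product of those scores. The function name `overallQuality` and the scores are hypothetical, not measurements from the project.

```typescript
// Illustrative model only: each part's quality is a score in [0, 1],
// and the overall quality is taken to be the product of the parts.
function overallQuality(partScores: number[]): number {
  for (const score of partScores) {
    if (score < 0 || score > 1) {
      throw new RangeError(`score out of range: ${score}`);
    }
  }
  return partScores.reduce((product, score) => product * score, 1);
}

// Twenty parts, each built carefully versus each built in haste.
const careful = overallQuality(new Array<number>(20).fill(0.99));
const hasty = overallQuality(new Array<number>(20).fill(0.95));

console.log(careful.toFixed(2), hasty.toFixed(2)); // prints "0.82 0.36"
```

Under these assumptions, a four-point drop in the quality of each part more than halves the quality of the whole.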

However. Whisper it. It is possible to “manage” quality, if you understand your process and know the goal. Different kinds of testing can provide different levels of certainty of code quality. Manual testing, when done exhaustively, can substitute in some cases for perfection in code.

We therefore managed our quality, and I think actually did well here.

Asynchronous integration components had to be as close to perfect as we could make them, because any bugs would result in a general lack of stability that would be impossible to trace. The only way to build these is carefully, with a design, and the only way to test them is exhaustively, with unit and integration tests.

On the other hand, there were a lot of aspects of the UI where it was crucial that they performed and looked excellent, but the code could be rougher around the edges, and could just be hacked out. This was my area of the application, and my goal was to deliver features as fast as possible with just acceptable quality. Some of the code was quite embarrassing but we got the project over the line in the time, with the scope, and it all worked. This was sufficient for those areas.

Experimental technologies

I often talk about our approach using the concept of an innovation curve, and our position on it (I think I stole the idea from Ian Jindal – thanks Ian!).

In practical terms this can be translated into “how likely I am to find the answer to my problems on Stack Overflow”.

At the very left, everything has been seen and done before, so there is no challenge from novelty – but you are almost certainly not making the most of available technologies.

At the far right, you are hand crafting your software from individual photons and you have to conduct high-energy physics experiments to debug your code. You are able to mould the entire universe to your whim – but it takes forever and costs a fortune.

There is no correct place to sit on this curve – where you sit is a strategic (and emotional) decision that depends on the forces at play in your particular situation.

Isotoma endeavours to be somewhere on the shoulder of the curve. The software we build generally needs to last 5+ years, so we can’t pick flash-in-the-pan technologies that will be gone in 18 months. But similarly we need to be relatively recent so it doesn’t become obsolete. This is sometimes called “leading edge”. Almost bleeding edge, but not so close you get cut. With careful choice of tools it is possible to maintain a position like this successfully.

This BBC project was off to the right of this curve, far closer to the bleeding edge than we’d normally choose, and we definitely suffered.

Some of the technologies we had to use had some serious issues:

To use IPStudio, a properly cutting edge product developed internally by BBC R&D, we routinely had to read the C++ source code of the product to find answers to integration questions.
We needed dozens of coordinated asynchronous streams running, for which we used RxJS. This was interesting enough to justify two posts on this blog on its own.
WebRTC, which was the required delivery mechanism for the video, is absolutely not ready for this use case. The specification is unclear, browser implementation is incomplete and it is fundamentally unsuited at this time to synchronised video delivery.
The video compositing technologies in browsers actually work quite well, but were entirely new to us and it took considerable time to gain sufficient expertise to do a good job. Also, browser implementations still have surprisingly sharp edges (only 16 WebGL contexts are allowed! Why 16? I dunno.)
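One common way to live with a hard cap like the WebGL context limit is to reuse contexts rather than create new ones. The sketch below is an assumption about how that could look, not a description of what SOMA actually does. `BoundedPool` and its methods are hypothetical names, and the limit is passed in because the figure of 16 is the one we ran into, not a guarantee across browsers.

```typescript
// Hypothetical helper: caps how many expensive resources
// (for example WebGL contexts) are ever created, and reuses released ones.
class BoundedPool<T> {
  private readonly idle: T[] = [];
  private created = 0;
  private readonly limit: number;
  private readonly create: () => T;

  constructor(limit: number, create: () => T) {
    this.limit = limit;
    this.create = create;
  }

  // Returns an idle resource if there is one, creates a new one if still
  // under the limit, and otherwise returns undefined so the caller can wait.
  acquire(): T | undefined {
    const reused = this.idle.pop();
    if (reused !== undefined) {
      return reused;
    }
    if (this.created < this.limit) {
      this.created += 1;
      return this.create();
    }
    return undefined;
  }

  // Hands a resource back so that a later acquire() can reuse it.
  release(resource: T): void {
    this.idle.push(resource);
  }
}

// In a browser the factory might be something like:
//   () => document.createElement("canvas").getContext("webgl")
// Here a counter stands in so the sketch runs anywhere.
let nextId = 0;
const pool = new BoundedPool<number>(16, () => ++nextId);
const first = pool.acquire();
console.log(first); // prints 1
```

Returning `undefined` instead of throwing leaves it to the caller to decide whether to queue, degrade, or fail when the cap is reached.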

Any one of these issues could have sunk our project, so I am very proud that we shipped good software in spite of all four.

Lessons learned? Task allocation is the key to this one I think.

One person, Alex, devoted his time to the IPStudio and WebRTC work for pretty much the entire project, and Ricey concentrated on video mixing.

Rather than try and skill up several people, concentrate the learning in a single brain. Although this is generally a terrible idea (because then you have a hard dependency on a single individual for a particular part of the codebase), in this case it was the only way through, and it worked.

Also, don’t believe any documentation, or in fact anything written in any human language. When working on the bleeding edge you must “Use The Source, Luke”. Go to the source code and get your head around it. Everything else lies.

Summary

I am proud, justifiably I think, that we delivered this project successfully given all the constraints above. It was used at the Edinburgh Festival, and actual real live television was mixed using our product.

The lessons?

Spend the time and effort to make sure your entire team understand the tacit requirements of the problem domain and the stakeholders.
Have an approach to managing appropriate quality that delivers the scope and timescale, if these are heavily constrained.
Understand your position on the innovation curve and choose a strategic approach to managing this.

The banner image at the top of the article, taken by Chris Northwood, shows SOMA in use during the 2017 Edinburgh Festival.