Imint Video Primer – Testing, testing, one two

Welcome to the seventh lesson of the Imint Video Primer! Let’s take a look at the heroic and continuous efforts to test our video software products, to ensure maximum quality at all times. Spoiler: It’s not easy.


Welcome to the testing lair

In the otherwise light-coloured, glassy, and buzzing office just off the main walking street in Uppsala, Sweden, a multitude of computer screens and phones light up even the darkest of Swedish winter evenings. They also provide rich amounts of heat, when that fourth cup of Earl Grey just isn’t enough.

But one of the corner rooms is dark and different. Its walls are painted black, and the few windows have special blinds to completely block out all external light. The room has all sorts of gadgets, from simple IKEA desk lights to professional studio lighting, and from simple screwdrivers to expensive shake rigs, and the decorations are all sorts of abstract art. The optics lab is a must for a company like Imint and will be described further in the following sections.

At the other end of the office is a small room filled with blinking lights, computers, and a plethora of different phones, all humming steadily. It’s loud and warm in there, but never too warm: monitoring systems and cooling ventilation make sure of that. The server room is a common sight in most software companies, but this one has some special parts to it that we’ll get back to.

Writing a high-performance video technology SDK like Vidhance is very hard, especially when dealing with the cutting edge. No major performer would go on stage without rehearsing. A major part of the development cycle consists of similar tests, taking the time to make sure our software maintains the quality that users expect without consuming too much energy. Testing and careful calibration, both automatic and manual, are required.

Testing automatically

A well-known company in the camera testing industry is DxOMark. It has become very popular in the smartphone industry as camera quality has become more and more of a selling point. DxO is behind a lot of editing software, but DxOMark specifically ranks devices and awards numeric scores for a range of areas of both photo and video quality, such as autofocus, noise, and stabilization. These scores are later weighted together into a final numeric score that describes the camera’s overall prowess. Remember how we said in the last text that it’s not all about the megapixels? It may be (at least partly) all about the DxO score.
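To make "weighted together" concrete, here is a minimal sketch of a weighted average over category scores. The category names come from the text above, but the numbers, the weights, and the function `overall_score` are made up for illustration. This is not DxOMark’s actual formula, which isn’t described here.

```python
def overall_score(sub_scores, weights):
    """Combine per-category scores into one number using a weighted average."""
    total_weight = sum(weights[name] for name in sub_scores)
    weighted_sum = sum(sub_scores[name] * weights[name] for name in sub_scores)
    return weighted_sum / total_weight


# Hypothetical sub-scores and weights, for illustration only.
sub_scores = {"autofocus": 80.0, "noise": 70.0, "stabilization": 90.0}
weights = {"autofocus": 1.0, "noise": 1.0, "stabilization": 2.0}

score = overall_score(sub_scores, weights)
print(score)
```

Whoever picks the weights decides what "overall quality" means, which is one reason a single score can never be entirely objective.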

A hexapod is a high-precision device for movement in all six degrees of freedom, a necessity for reproducible stabilization tests.

The black-painted camera lab holds equipment worth in the range of a million SEK. We’ve got a shake rig, lighting gear, moving trains, paintings of TV test views, and a plethora of other gadgets, with use cases such as calibration, comparing different software versions and other manufacturers, testing new algorithms, and reproducing the DxO setup. We have worked with DxO and been on location for different device tests in the past. For better or worse, the DxO score is the only agreed-upon customer-and-industry standard for camera quality. It’s not necessarily all objective: some things certainly are subjective, and somewhere a cut-off decision is reached about how to measure something.

Although we carry the same equipment, there is still some margin of error measuring locally as compared to at DxO in France. There may be slight differences in the software used, and it may be too easy to optimize for what you are good at when you’ve written the software yourself. More embarrassing entries on this list include accidentally moving the lights or some stickers on the floor when cleaning, corrupting future tests until it’s discovered. Anyway, general scores mean nothing for products like Auto Zoom as there is no industry standard for – and likely no way to quantify – success, and yet we still need rigorous testing. Make way for the human subjects.

Testing manually

Sometimes there’s no objective truth. Although we can agree on how to measure “shakiness” and to a degree how good something like automatic white balance is, designing a user interface is a complex process. Deciding what works best in the end requires actual human testers.

After determining exactly what we want to know, we ask people to perform certain tasks, like using Live Auto Zoom to create a video of a moving LEGO train. We simultaneously record all screen interactions and comments. Our biggest test so far involved 16 people. Several quantifiable values fall out of this approach, such as the time it takes to perform the given task and the number of mistakes or recovery paths taken along the way. The goal, of course, is to minimize these numbers. But other qualitative observations can be far more valuable, such as noticing – and putting into words – what users expect to happen and find intuitive.
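As a rough illustration of the quantifiable side, the sketch below summarizes a handful of test sessions by completion time and mistake count. The `Session` record, its field names, and the sample values are all hypothetical. They are not data from our tests.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Session:
    tester: str                 # anonymized tester label
    seconds_to_complete: float  # time needed to finish the given task
    mistakes: int               # mistakes or recovery paths along the way


def summarize(sessions):
    """Reduce a list of sessions to a few numbers worth tracking over time."""
    times = [s.seconds_to_complete for s in sessions]
    errors = [s.mistakes for s in sessions]
    return {
        "testers": len(sessions),
        "median_seconds": median(times),
        "mean_mistakes": mean(errors),
    }


# Made-up sample sessions, for illustration only.
sessions = [Session("A", 42.0, 1), Session("B", 58.5, 3), Session("C", 37.0, 0)]
summary = summarize(sessions)
print(summary)
```

The median is used for time because one tester who gets badly stuck would otherwise dominate the average.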

Model trains aren’t just a kid’s toy, they’re a valuable tool for testing Live Auto Zoom. If you haven’t found a Christmas present for our lab crew yet…

For reasons of secrecy, we invite external test subjects only when a product is already known and more or less released. Before that, only employees here at Imint are used. This set of people, also known as “us”, is obviously biased. We are all professionals in our field, and know exactly what we want the end result to be like. It’s like predicting the World Cup winners by only asking people from one country. Fortunately, so far we’ve seen that the results of internal tests and discussions are well reflected in what other people say and do at later stages.

We also test our products by mounting smartphones on drones. The shake and movement induced by the propellers are quite different from holding something in your hands. Side-by-side comparisons and tests in “real life” scenarios, such as walking around the beautiful cathedral in Uppsala or along the streets of Shanghai, matter more than raw numbers from standardized scores. It is in the real world that our users will use the technology. We believe this passion for real-life scenarios, combined with lab equipment that lets us reach high industry-standard scores, is what puts us at the top of our field.

Helping the developers

Like most companies, Imint’s customers make demands on their subcontractors. They, like us, need to ensure that what they deliver holds to the standards their users expect. They test the software they buy, of course, but the later a bug is discovered, the more difficult it may be to fix. It therefore behooves us to be diligent during development. DxO testing happens late in the process, and the Vidhance integration needs to be great already by then. Even worse is a serious bug discovered in the hands of consumers.

Unreleased prototype phones sure aren’t pretty, and the software isn’t always stable, but we make sure the video always has both these qualities.

Testing software is a science all on its own. At Imint, it can become even more complicated, not because we are very rigorous – so are other companies – but because of the hardware involved. We need to test on many different phones from different manufacturers, using different operating systems, chipsets, CPU architectures, etc. That requires a lot of preparation, especially for the integration tests when all components are supposed to come together. We care deeply about variables like battery time and heat dissipation. Both are closely monitored and related: higher temperatures mean more energy used and worse performance. You might say we spend a lot of effort being “cool”.

Other software on the phone isn’t necessarily reliable, especially during the early development cycle. To decrease dependencies and possible crashes, we use newly-flashed phones with only the bare minimum installed. Building Android from source can require upwards of 100 GB of disk space (although the end result is much smaller) and a fair bit of time, just for one of many versions. How can we handle such sizes efficiently?

We found a clever way. All necessary versions are kept on a PC with plenty of disk space, with as much as possible precompiled and the relevant parts ready to be replaced with Vidhance code. If these were later moved to a new computer, the file modification dates would change, and the build system would think everything had changed and rebuild everything from scratch. To avoid that, the code is stored in a Docker container, which works much like a lightweight virtual machine with its own file system. That file system fools the build system, so only the new parts are built before assembling the build. The Docker image also ensures the build environment is the same every time. Fast and efficient.
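The timestamp problem is easy to reproduce in miniature. The sketch below assumes a make-style rule where a file is rebuilt when its source is newer than the built artifact. The helper `needs_rebuild` and the file names are invented for the example, and this is not our actual build setup. It only shows why a careless move can make everything look changed.

```python
import os
import shutil
import tempfile


def needs_rebuild(source, artifact):
    """Timestamp rule of make-style tools: rebuild if the source is newer."""
    return os.path.getmtime(source) > os.path.getmtime(artifact)


with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
    src = os.path.join(old, "module.c")
    obj = os.path.join(old, "module.o")
    for path in (src, obj):
        with open(path, "w") as f:
            f.write("placeholder")

    # Pretend the source was edited long ago and compiled shortly afterwards.
    os.utime(src, (1_000_000_000, 1_000_000_000))
    os.utime(obj, (1_000_000_100, 1_000_000_100))
    up_to_date_before = not needs_rebuild(src, obj)

    # A plain copy gives the source a fresh modification time ("now"),
    # while the artifact is copied with its old timestamp preserved.
    moved_src = shutil.copy(src, os.path.join(new, "module.c"))
    moved_obj = shutil.copy2(obj, os.path.join(new, "module.o"))
    stale_after_plain_copy = needs_rebuild(moved_src, moved_obj)

    # Copying with metadata (copy2) keeps the original modification time.
    shutil.copy2(src, moved_src)
    fresh_after_preserving_copy = not needs_rebuild(moved_src, moved_obj)

print(up_to_date_before, stale_after_plain_copy, fresh_after_preserving_copy)
```

Keeping the whole tree inside a container image sidesteps this bookkeeping, since the files and their metadata travel together as one unit.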

I’ve looked at clouds from both sides now

Testing, and evaluating one’s testing procedures, is a continuous process. It’s a constant trade-off between time and the granularity of your bug-catching net. The later a problem is discovered, the more expensive and stressful it becomes. All the software testing we have talked about here is done locally, i.e. on premises. Some fine-tuning and calibration can be done on-site at the customer’s, depending on the feature and our agreements.

Some of the automatic tests can certainly be performed in the cloud, but with our needs for specialized on-demand phones and GPUs it’s harder to find solutions. Services like AWS Device Farm allow testing of apps, but we often want to reprogram the lower levels of the phone itself, and mostly on phones not yet released. No such service is known to us; perhaps a business idea for someone? In any event, it is of no concern to us, as the bottleneck is usually the tests’ runtime on the phone, and the cloud also opens up a new attack vector in terms of IT security, which is why we haven’t pursued this avenue.

I hope this gave you an insight into what happens inside the very busy Imint office! It’s been almost a year since my first blog text back in January. Let me wish you a nice end to this year, and stay tuned to see what we have in store for 2019!


Marcus Näslund