Imint Video Primer – Software: It’s all layers upon layers

Welcome to the fifth lesson of the Imint Video Primer! Today, we will take a look at the sweet details of Android itself, and the know-how needed for writing video enhancement software like Vidhance. I apologize in advance for all the food references. Do not read while hungry.


Just like a cake

In the old days, a new computer had a new operating system and everything else tailor-built exactly for that hardware. This has significant advantages: for example, everything can run exceptionally fast even with the limited hardware of the time.

The problem with this approach is that it scales very badly, since everything has to be remade as technology advances. Progress becomes slow and expensive, and technology remains difficult to use. The solution? Standardization and abstraction. Just like those delicious birthday things called cakes, technical devices today are layers upon layers upon layers, and it’s turtles, err, code all the way down. At the bottom sits the raw hardware made up of individual transistors, wires and resistors. At the top are the abstract notions of a video app with Auto Zoom and your interactions with it. What takes place in between?

Figure 1: The green Android robot, the logo of the Android operating system. Fun fact – each version is named after some delicious dessert.

Hardware is a beast all on its own, and one I am likely not competent enough to ramble on about. It, too, has smaller components grouped into larger and larger circuits, and eventually into large-scale components such as a processor, a camera, and a gyro sensor. I have already written about this earlier, so let’s assume it’s all there and focus on the software this time.

A stack of pancakes, err, software

The Android operating system is split into several layers, stacked on top of each other. This is why all the parts related to, for example, the camera are referred to as the “camera stack”. Each piece of a layer defines a specific interface (an application programming interface, API) that describes how other pieces may interact with it. The main purpose of all these layers is to abstract away the differences between the hardware underneath and to provide a common connection to the layer above. Isolating those connections makes it easier to port Android to new hardware (other phones) or to run it on completely new kinds of hardware (TVs, cars, etc.) by just adapting the required layer. Everything on the upper layers of the stack remains unedited. Likewise, Vidhance abstracts as much as possible from the system it runs on, making it easy to port.
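To make the layering idea a bit more concrete, here is a minimal Python sketch. Every name in it (CameraDriver, PhoneCameraDriver, TvCameraDriver, CameraService) is made up for illustration and is not an Android API. The upper layer is written against an interface only, so porting to new hardware means swapping the lower layer and nothing else.

```python
from abc import ABC, abstractmethod
from typing import List


class CameraDriver(ABC):
    """Hypothetical lower-layer interface; not a real Android API."""

    @abstractmethod
    def read_frame(self) -> List[int]:
        """Return one frame as a flat list of pixel values."""


class PhoneCameraDriver(CameraDriver):
    """Pretend driver for a phone camera (hard-coded toy data)."""

    def read_frame(self) -> List[int]:
        return [10, 20, 30, 40]


class TvCameraDriver(CameraDriver):
    """Pretend driver for completely different hardware."""

    def read_frame(self) -> List[int]:
        return [200, 100]


class CameraService:
    """Upper layer: depends only on the CameraDriver interface."""

    def __init__(self, driver: CameraDriver) -> None:
        self._driver = driver

    def average_brightness(self) -> float:
        frame = self._driver.read_frame()
        return sum(frame) / len(frame)


# Porting means passing in another driver; CameraService is untouched.
phone = CameraService(PhoneCameraDriver())
tv = CameraService(TvCameraDriver())
```

Here CameraService plays the role of the upper layers: it never learns which hardware sits underneath it.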

This is, unfortunately, easy to deviate from. Previously, code written by the smartphone vendors (lower in the stack) was co-mingled with edits to the Android core which made updating the software to newer Android versions hard. With an Android version upgrade you then need to redo your previous work on those levels, alongside updating your code in lower levels. Project Treble, where the vendor implementations are now completely separated from the Android implementation, is a new Android initiative to ease the process of updating Android on already released devices.

Figure 2: The Linux kernel acts as interface to the hardware, with applications on top of core Android libraries, on top of Linux.

At the bottom of Android sits the Linux kernel, originally developed in 1991. It provides a level of abstraction between the raw hardware and the upper layers. It was originally developed for use in desktop computers and servers. It is a testament to the power of today’s mobile devices that we find this software at the heart of the Android software stack. But apps don’t run directly on Linux. Each Android app runs within its own instance of a virtual machine, an intermediate layer (historically the Dalvik virtual machine, which the Android Runtime, ART, replaced as of Android 5.0). Running applications in virtual machines provides a number of advantages. Firstly, applications cannot interfere (intentionally or otherwise) with the operating system or other applications, nor can they directly access the device hardware. Secondly, this enforced level of abstraction makes applications platform neutral in that they are never tied to any specific hardware. Port the lower level (the virtual machine), and the applications on the higher level resting on this new foundation will still work perfectly.

We know all these details because Android is “open source”, meaning all the source code is openly available to anyone for free. The same is true for Linux, Firefox, VLC, WordPress, and much other well-known software. More specialized code that differentiates one smartphone brand from another is generally not open source. Here, explicit deals with the manufacturers regulate access to the source code and to the devices for testing prior to a release with Vidhance. But even though the main product may be proprietary, many large IT companies open source parts of their infrastructure and tooling, like Google’s open source TensorFlow library for machine learning.

The same is true for Imint. For example, ooc-kean is a collection of mathematics and graphics code that we have used but no longer actively use for new things. This is in some sense an act of giving back to the community. Like Newton, we’re all standing on the shoulders of giants.

Add a slice of Vidhance

Some products can be fully integrated in the application layer, although in most cases only user interaction takes place here. Input data is collected from the user and sent down to lower levels of the camera stack. The video is edited and returned as output up the stack to the user. The Hardware Abstraction Layer, or HAL, is the interface between the Android software and the hardware, such as the camera. The Camera HAL API is defined by Android, and this is where the manufacturer-specific code begins. The original HAL is implemented by the chipset vendor, but can be customized by the smartphone manufacturer since the source code is delivered with the chipset. This means most of the integration is identical for devices using the same chipset, but there might be minor differences due to changes by the manufacturer.

We generally distinguish between two types of integrations. A shallow integration is independent of the vendor, implemented using official Android APIs only. This is ideal for scalability because it’s simple and easy, but it is rarely good enough: too little control of the underlying hardware is available, which severely impacts performance. A deep integration takes place in the vendor implementation of the camera stack, such as the HAL. It takes more time to develop because it is more tightly coupled to the specific chipset and other hardware, but it also allows more freedom. Below the HAL are the specific chipset drivers. Integration there is sometimes necessary, but it is only possible when the chipset vendor shares the source code for the driver, which is not always the case.
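As a rough illustration of the difference, here is a toy Python model. It is a sketch under made-up assumptions, not how Vidhance or Android are actually built: Frame, VendorHal, App, make_probe and the vendor_meta field are all hypothetical names. The only point it makes is that a filter hooked in at the vendor layer can see details that have been stripped away by the time a frame reaches the application layer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Frame:
    pixels: List[int]
    # Hypothetical chipset-specific data, assumed visible only inside the vendor layer.
    vendor_meta: Optional[dict] = None


FrameFilter = Callable[[Frame], Frame]


class VendorHal:
    """Toy stand-in for the vendor part of the camera stack."""

    def __init__(self, deep_filter: Optional[FrameFilter] = None) -> None:
        self._deep_filter = deep_filter

    def capture(self) -> Frame:
        frame = Frame(pixels=[1, 2, 3], vendor_meta={"exposure_us": 8000})
        if self._deep_filter is not None:
            frame = self._deep_filter(frame)  # deep integration hooks in here
        # Assumption of this toy model: the public API passes on pixels only.
        return Frame(pixels=frame.pixels)


class App:
    """Toy application layer that sits on top of the public API."""

    def __init__(self, hal: VendorHal, shallow_filter: Optional[FrameFilter] = None) -> None:
        self._hal = hal
        self._shallow_filter = shallow_filter

    def record(self) -> Frame:
        frame = self._hal.capture()
        if self._shallow_filter is not None:
            frame = self._shallow_filter(frame)  # shallow integration hooks in here
        return frame


def make_probe(log: List[str], name: str) -> FrameFilter:
    """Build a pass-through filter that logs whether it can see vendor data."""

    def probe(frame: Frame) -> Frame:
        seen = "yes" if frame.vendor_meta is not None else "no"
        log.append(f"{name}: vendor_meta={seen}")
        return frame

    return probe
```

Running one probe as a shallow filter and another as a deep filter shows the trade-off: the shallow one works with any VendorHal but sees pixels only, while the deep one sees more but has to live inside the vendor code.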

Different smartphone vendors will not just have different requirements, but will also require custom modifications to the HAL implementation and solutions adapted to their unique camera pipelines. Vidhance must adjust to these special challenges, and may even need to be adapted to work well together with other software algorithms, or to be combined with hardware stabilizers like OIS.

The wrap-up

Software used to be unique and specialized for each piece of technical equipment, but that’s very inefficient. Nowadays, pancake stacks are just layers of lasagna all the way down. Wait, that’s not right.

Software is generally abstracted into layers, forming stacks, with code all the way down. That’s it.

The same is true for all technical devices, where a bit of performance is traded away for much simpler and faster innovation. Without it, a new modern camera would take many more years to develop, and adding Vidhance would be a huge task. Separating the system, both horizontally (different stacks) and vertically (different layers), is a lesson learned in development and applied to the design of Vidhance itself. Knowledge of the layers of the underlying system, like Android, is necessary to incorporate Vidhance features properly, and to understand and solve potential problems.

Today, an integration job is modularized as much as possible to minimize the effort needed for new and existing customers alike, and for new models as well as new types of hardware. We are constantly trying to improve our architecture and levels of abstraction, working towards minimizing the time required for new customers. Time is money, especially in business. It’s all about efficiency.

Thanks for reading! Now, let’s eat. (But as with e-mail, avoid the spam.)


Marcus Näslund