Imint Video Primer – Video Compression

Welcome to the first actual lesson of the Imint Video Primer! This first part will discuss what video compression is, why it is absolutely essential, and how it relates to Vidhance.


Video still means very large amounts of data

High-quality video files, even short clips, take up large amounts of storage. A single second of raw 4K-resolution video amounts to hundreds of megabytes!

A video is essentially a series of images, normally around 25-30 for each second of video. Each image is called a frame. Storing or transmitting a video file really entails storing or transmitting a potentially very large set of frames. Naturally, we want to minimize the time and cost of doing so, which is why we compress videos: we sacrifice a little quality for much smaller file sizes.
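To put rough numbers on the "hundreds of megabytes" figure, here is a small back-of-the-envelope sketch. The resolution (3840x2160), the frame rate (30 frames per second) and the bytes-per-pixel values are assumptions chosen for illustration; actual camera pipelines vary.

```python
def raw_video_bytes_per_second(width, height, fps, bytes_per_pixel):
    """Uncompressed data rate: every pixel of every frame is stored in full."""
    return width * height * bytes_per_pixel * fps

# Assumed example: 4K UHD at 30 frames per second.
rgb = raw_video_bytes_per_second(3840, 2160, 30, 3)       # 8-bit RGB: 3 bytes per pixel
yuv420 = raw_video_bytes_per_second(3840, 2160, 30, 1.5)  # 8-bit YUV 4:2:0: 1.5 bytes per pixel

print(f"RGB:       {rgb / 1e6:.0f} MB per second")     # 746 MB per second
print(f"YUV 4:2:0: {yuv420 / 1e6:.0f} MB per second")  # 373 MB per second
```

Either way, one second of raw 4K video lands in the hundreds of megabytes.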

How compression works

Video is compressed by a video encoder for a specific video format, and for playback the corresponding decoder turns the compressed data back into viewable video. Both steps must be accurate and fast enough to keep up with live recording and live playback.

Figure 1: The camera delivers raw video frames via Vidhance to an encoder, which compresses the video before storing it. The compressed video must then be decoded before it can be viewed.

Images can themselves be compressed, for example by storing them in formats like JPEG and PNG. Compressing every frame on its own in this way is simple, but it ignores how much consecutive frames have in common. Video compression usually works instead by computing and compressing only the differences between frames, called the delta. If a frame is similar to the previous one, as is often the case in videos, describing how it changed involves less data than describing the full frame.

Figure 2: The red box moves from one corner to another. If we express this difference by the change in each pixel, we end up with a big delta (making all the top-left pixels yellow, and making all the bottom-right pixels red). Good compression needs something better.

As a digital image consists of pixels, the delta is an accumulation of differences in corresponding pixels. The delta between two pixels can be computed by taking the difference between the colors of those pixels. A frame delta consists of all its pixel deltas. We can visualize this delta by drawing it as its own image. If a pixel has the exact same color in both images, that pixel will be black, since the delta is 0 and black color is represented by (0,0,0) in the RGB color space. The larger the difference between corresponding pixels, the brighter their delta pixel will be. The more different the two frames are, the more color and details will be seen in the delta. (Footnote: Raw video on smartphones is often not in RGB but in YUV, but the principle is the same.)
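The pixel-by-pixel delta described here can be sketched in a few lines. This is a minimal illustration, not how Vidhance or any particular encoder is implemented: frames are assumed to be nested lists of (R, G, B) tuples, and the function names are made up for the example. It takes the absolute difference so that the result can be drawn as an image; an encoder would also need to keep the sign of each difference.

```python
def pixel_delta(before, after):
    """Per-channel absolute difference between two (R, G, B) pixels."""
    return tuple(abs(a - b) for a, b in zip(after, before))

def frame_delta(prev_frame, curr_frame):
    """Delta image: one pixel delta for every pair of corresponding pixels."""
    return [
        [pixel_delta(p, c) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]

GREEN = (0, 255, 0)
WHITE = (255, 255, 255)

prev_frame = [[GREEN, GREEN], [GREEN, GREEN]]
curr_frame = [[GREEN, WHITE], [GREEN, GREEN]]  # only the top-right pixel changed

delta = frame_delta(prev_frame, curr_frame)
print(delta[0][0])  # (0, 0, 0): unchanged pixel, drawn black
print(delta[0][1])  # (255, 0, 255): green became white, drawn purple
```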

Figure 3: Center: One frame of a shaky video. Left: Delta (changes of each pixel) to the previous frame without stabilizer. Right: Delta with stabilizer, much smaller and therefore easier to compress.

The above image shows one frame of an example video in the center. To the left is the delta to the previous frame without any stabilization, and to the right the same delta with Vidhance stabilization. Not a lot is happening in the video, so the delta should be mostly black. But note how the outlines of the girl’s hair can be seen as purple lines. This is because adding purple (255,0,255) to what was previously green (0,255,0) results in new white (255,255,255) pixels. As the video was a bit shaky, there are unwanted camera movements resulting in an unnecessarily bright delta, which makes compressing harder and the video file larger. Compare this to the steadier video on the right: less data is required and the file becomes smaller.

A video’s bitrate is the amount of data used per second of video, which in turn determines how much data may describe each frame. It may be either constant or variable. A higher bitrate enables better quality but also means a larger file. When encoding video at a constant bitrate, the quality is limited by the amount of data allowed for each frame, so quality may be sacrificed to keep each delta below the limit. With a variable bitrate, a quality setting determines how much (or little) detail may be sacrificed on each frame, and the file size is secondary. There is also lossless video compression, which gives a smaller file size without any loss of quality, but it only goes so far compared to the far more common lossy methods.
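The relationship between bitrate, file size and the data budget per frame is simple arithmetic. The sketch below uses assumed example values (8 Mbit/s, a one-minute clip, 30 frames per second); real encoders spread the budget unevenly across frames, so the per-frame figure is only an average.

```python
def file_size_bytes(bitrate_bps, duration_s):
    """Size of a clip encoded at a constant bitrate (bits per second)."""
    return bitrate_bps * duration_s / 8

def average_bits_per_frame(bitrate_bps, fps):
    """Average data budget available to describe one frame."""
    return bitrate_bps / fps

# Assumed example values, for illustration only.
bitrate = 8_000_000  # 8 Mbit/s
size = file_size_bytes(bitrate, duration_s=60)
per_frame = average_bits_per_frame(bitrate, fps=30)

print(f"One minute of video: {size / 1e6:.0f} MB")               # 60 MB
print(f"Average budget per frame: {per_frame / 8 / 1000:.1f} kB")  # 33.3 kB
```

Compared with the raw data rate of the camera, the encoder has only a tiny fraction of the original data to spend on each frame, which is why small deltas matter.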

Why stabilization helps, and how

Although professional video is often very stable to begin with, the same cannot be said for amateur video captured using a handheld camera or a smartphone. What if, because of camera movement, all pixels moved down one step? If we just compare pixel to pixel, the delta will be large, even though what happened is very easy to describe in a smarter way.

Encoders try to take advantage of this by describing not only pixel deltas but also how blocks of pixels have moved between frames. Most modern video encoders (one example is H.264) have this form of motion compensation built in. But it is designed to stay as faithful to the original raw video as possible, and does not make the video any more stable.
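The "all pixels moved down one step" case can be sketched as follows. This is a deliberately simplified illustration: it searches for a single motion vector for the whole frame, whereas real encoders such as H.264 search per block and are far more elaborate. Frames are assumed to be small grayscale grids, and the function names are hypothetical.

```python
def shifted_difference(prev, curr, dy, dx):
    """Sum of absolute differences between curr and prev moved by (dy, dx),
    counted only where the two frames overlap."""
    h, w = len(prev), len(prev[0])
    total = 0
    for y in range(max(0, dy), min(h, h + dy)):
        for x in range(max(0, dx), min(w, w + dx)):
            total += abs(curr[y][x] - prev[y - dy][x - dx])
    return total

def best_motion_vector(prev, curr, search=2):
    """Try every shift within +/- search pixels and keep the best match."""
    candidates = [(dy, dx)
                  for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)]
    return min(candidates, key=lambda v: shifted_difference(prev, curr, *v))

SIZE = 6
# A small synthetic grayscale frame with a smooth gradient.
prev = [[y * 37 + x * 11 for x in range(SIZE)] for y in range(SIZE)]
# The same frame with every row moved down one step (top row repeated).
curr = [prev[0]] + prev[:-1]

naive = shifted_difference(prev, curr, 0, 0)           # plain pixel-to-pixel delta
vector = best_motion_vector(prev, curr)                # (dy, dx) that matches best
compensated = shifted_difference(prev, curr, *vector)  # delta after compensation
print(naive, vector, compensated)  # 1110 (1, 0) 0
```

Describing the change as "everything moved down by one" leaves nothing in the delta, while the plain pixel-to-pixel comparison reports a difference in almost every pixel.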

With stabilized video, the small and unnecessary frame-to-frame changes that otherwise occur in recorded video are minimized, effectively making each frame more similar to the one before it. This allows video compression algorithms to more easily minimize the file size and bandwidth required to store or transfer video files. In turn, this makes video easier to store on, and send from, small devices with limited storage space, such as smartphones. At the same time, video quality can improve markedly in live scenarios such as video conferencing, as well as in private video recordings, because less bandwidth is required to stream the video at a given quality.
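A toy experiment shows why more similar frames compress better. It is only an illustration under strong assumptions: the general-purpose zlib compressor stands in for a video encoder, the frame is random noise (which exaggerates the effect), and the "shake" is a plain one-row shift.

```python
import random
import zlib

def delta_bytes(prev, curr):
    """Per-pixel difference, wrapped into 0..255 so each value fits in one byte."""
    return bytes((c - p) % 256 for p, c in zip(prev, curr))

WIDTH, HEIGHT = 64, 64
rng = random.Random(0)  # fixed seed so the example is repeatable
frame = bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))

steady_next = frame                          # the camera did not move
shaky_next = frame[-WIDTH:] + frame[:-WIDTH] # every row moved down one step (wrapping around)

steady_size = len(zlib.compress(delta_bytes(frame, steady_next)))
shaky_size = len(zlib.compress(delta_bytes(frame, shaky_next)))

print(f"Compressed delta, steady camera: {steady_size} bytes")
print(f"Compressed delta, shaky camera:  {shaky_size} bytes")
```

The steady delta is all zeros and shrinks to almost nothing, while the shaky delta is nearly as expensive to store as the frame itself.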

Applying video stabilization to a video before compressing it can noticeably reduce file size for medium and low quality video. This translates into higher possible video quality at the same file size, or less bandwidth usage at the same quality, which eases the load on the network. Our initial studies showed file size reductions typically in the range of 5% to 20% for standard encoding qualities. As a rule, the lower the quality setting, the greater the benefit of stabilization.

Not so easy

Vidhance processes output directly from the smartphone camera before any compression takes place: it works not on normal video files, but on the massive raw data streams mentioned in the first section. It must do so with a minimal footprint on performance and energy consumption. If compression were performed first, the video would have to be decoded, stabilized and then compressed again, and significant video quality would be lost. Our new object tracking software used for Live Auto Zoom also benefits from getting the raw video, before any compression losses.

All in all, stabilizing video not only produces a more watchable video, it also yields better video quality for a smaller file size and lower bandwidth requirements. Best of all, Vidhance handles all of this for you automatically, in the background.


Marcus Näslund