Imint Video Primer – Smartphone Motion Sensors

    Welcome to the third lesson of the Imint Video Primer! This part dives deeper into hardware: from how sensors measure motion to how mathematics helps us use all this information for stabilization!

    Six degrees of freedom

    Let’s start with this common phrase. Our world consists of three spatial dimensions: length, width, and height. The word freedom refers to free movement in these three dimensions.

    With “movement”, we mean two things: changing location by moving along an axis through space, known as translation, and rotating about an axis while remaining at the same coordinates. Consider how you would still say you are moving when turning around, even though you remain on the same spot.

    A smartphone and its camera have six degrees of freedom. Moving forward/backward, up/down, or left/right is translation. Rotation around an axis is termed pitch, yaw, or roll, depending on which axis we rotate about. Although some of these movements are more prevalent and affect video quality more strongly than others, video stabilization software has to understand, process, and potentially handle all six types of movement. But first, it all has to be measured very precisely.
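
    To make this concrete, here is a minimal sketch of how the six degrees of freedom could be grouped in code. The structure and names are purely illustrative, not Vidhance’s actual API:

    ```python
    from dataclasses import dataclass

    @dataclass
    class Pose:
        # Translation along the three spatial axes
        x: float      # left/right
        y: float      # up/down
        z: float      # forward/backward
        # Rotation about the three axes, in radians
        pitch: float  # tilting up/down
        yaw: float    # turning left/right
        roll: float   # tilting sideways
    ```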

    Smartphone sensors

    Two types of sensors are of interest when tracking movement. An accelerometer detects the g-force associated with the current movement. This sensor is a very tiny chip with extremely small moving parts (around half a millimeter in size) made of silicon. Watch a more detailed explanation here. The gyroscope is more of an orientation tool for your smartphone: the roll, pitch, and yaw of your phone are automatically detected by the gyroscope. A typical micro-electromechanical gyroscope inside a smartphone is on the order of a few millimeters in size.

    Information is pulled from these sensors at least a hundred times per second, sometimes faster! Compare this to a typical video at 30 frames per second. A valid question is why we need sensor updates more often than there are new frames in the video. This is answered further down.

    Movement can also be estimated visually, by analyzing motion in the image itself with methods such as optical flow and tracking technology, as we discussed in the previous post. All of this translation and rotation information is provided to Vidhance. From this data, Vidhance calculates the total movement and, by transforming each frame appropriately, turns it into the scene you intended: separating intended motion from unintended motion, keeping only the former and canceling the latter. A rough sketch of the idea follows below.
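
    As a simple illustration of that separation (and only an illustration; Vidhance’s actual motion model is far more sophisticated), a low-pass filter can split a slow, intended camera path from fast, unintended jitter:

    ```python
    import numpy as np

    def smooth_path(angles, alpha=0.1):
        """angles: measured per-frame camera angle (e.g. yaw in radians).
        Returns the smoothed (intended) path and per-frame corrections."""
        smoothed = np.empty_like(angles)
        smoothed[0] = angles[0]
        for i in range(1, len(angles)):
            # Exponential moving average: follows slow trends (intended
            # motion) while suppressing fast jitter (unintended motion).
            smoothed[i] = (1 - alpha) * smoothed[i - 1] + alpha * angles[i]
        return smoothed, smoothed - angles  # rotate each frame by the difference

    # A slow intended pan plus small, fast hand-shake jitter
    t = np.linspace(0, 5, 150)                      # 5 seconds at 30 fps
    measured = 0.2 * t + 0.02 * np.sin(2 * np.pi * 9 * t)
    path, corrections = smooth_path(measured)
    ```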

    But how do we go from raw data to a motion model?

    “I’m afraid we need to use… math!”*

    Cue dramatic music.

    For all of this to work, the data reported by the sensors must be expressed and analyzed mathematically. Although images are two-dimensional, the world we and our cameras live in is not. We have six degrees of freedom in our 3D world. A translation can easily be stated with a 3D vector, and rotation can be expressed with advanced matrix algebra, or with four-dimensional numbers. Wait… what?

    Yes. Just as the complex numbers are a two-dimensional extension of the real numbers, quaternions are four-dimensional numbers. They can be added, subtracted, multiplied, and divided according to their own special rules, which extend the arithmetic of the real numbers, with one notable twist: quaternion multiplication is not commutative, so the order of the factors matters. Read more details about quaternions here.

    As it turns out, a rotation in 3D space can be represented by a unit quaternion. Unit quaternions provide a convenient mathematical notation for representing rotations of 3D objects. They are easy to compose, and compared to rotation matrices they are more numerically stable and possibly more efficient, though not as easy for a human to interpret. Representing rotations with unit quaternions also avoids an infamous problem in mechanical engineering known as gimbal lock.
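
    To show this is less scary than it sounds, here is a small, self-contained sketch (plain NumPy, not Vidhance code) that rotates a 3D vector using a unit quaternion:

    ```python
    import numpy as np

    def quat_mult(q, r):
        # Hamilton product of two quaternions stored as (w, x, y, z)
        w1, x1, y1, z1 = q
        w2, x2, y2, z2 = r
        return np.array([
            w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2,
        ])

    def quat_from_axis_angle(axis, angle):
        # Unit quaternion representing a rotation of `angle` radians about `axis`
        axis = np.asarray(axis, dtype=float)
        axis /= np.linalg.norm(axis)
        half = angle / 2.0
        return np.concatenate(([np.cos(half)], np.sin(half) * axis))

    def rotate(v, q):
        # Rotate the 3D vector v by the unit quaternion q: v' = q (0, v) q*
        p = np.concatenate(([0.0], v))
        q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
        return quat_mult(quat_mult(q, p), q_conj)[1:]

    # A 90-degree yaw (rotation about the vertical axis) takes the
    # x-axis onto the y-axis:
    q = quat_from_axis_angle([0.0, 0.0, 1.0], np.pi / 2)
    print(rotate(np.array([1.0, 0.0, 0.0]), q))  # ~[0, 1, 0]
    ```

    Composing two rotations is just one quaternion multiplication, which hints at why this representation is so convenient.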

    Even if this seems straightforward, the actual computations are not. Sensor data is “noisy”, i.e. it comes with a certain inaccuracy, and can only be polled a limited number of times per second. Turning all this data into one meaningful mathematical expression, especially on low-power hardware, is part of the Vidhance magic. At the same time, hardware can compensate for some of this, as we will see in the next part.

    * This fabulous catchphrase is unfortunately not my own, but comes from the TV show Futurama. 

    What’s the big deal?

    What are some of the main problems?

    Rolling shutter effects arise because a complete video frame is not recorded in a single instant. Instead, the pixels are recorded line by line, either vertically or horizontally. With the camera in motion, the last pixels may record a different scene than the one the first pixels started with, causing a skewed and distorted image.
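
    This is also where the question from earlier gets its answer: sensor updates are needed much more often than once per frame because, with a rolling shutter, every line of the frame is exposed at a slightly different time, so every line needs its own orientation estimate. A minimal sketch of the idea (the names are illustrative, not Vidhance’s API):

    ```python
    def row_timestamp(frame_start, readout_time, row, num_rows):
        # Rows are read out evenly across the frame's readout window
        return frame_start + readout_time * row / num_rows

    def angle_at(t, samples):
        """samples: (timestamp, angle) gyroscope readings, sorted by time.
        Linearly interpolates the camera angle at time t."""
        for (t0, a0), (t1, a1) in zip(samples, samples[1:]):
            if t0 <= t <= t1:
                w = (t - t0) / (t1 - t0)
                return (1 - w) * a0 + w * a1
        return samples[-1][1]

    # 200 Hz gyroscope readings (one every 5 ms) covering a ~30 ms readout
    samples = [(i * 0.005, 0.002 * i) for i in range(7)]
    t_row = row_timestamp(frame_start=0.0, readout_time=0.030, row=540, num_rows=1080)
    print(angle_at(t_row, samples))  # orientation for the middle row
    ```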

    Somewhat related, motion blur is the apparent streaking of rapidly moving objects, which occurs when the scene itself changes while a frame is being recorded. This can be used for great photographic effects, such as very long exposure times that show stars moving across the night sky, but a large unintentional addition of blur is rarely good.

    Example showing stabilization with and without compensating for in-frame motion artifacts.

    Optical Image Stabilization (OIS) is used to counteract some of these disturbances. It works by physically moving the camera lens ever so slightly, according to high-speed information it receives from the device’s sensors. In smartphones, the lens is shifted (translated) small distances, not rotated. This makes it useful for still imaging but bad for video, as we shall see below.

    Since OIS compensates for rotations by translating the lens, its corrections actually cause perspective distortions, which become very evident in video.

    Different companies use different names for this technology, but nearly all high-end smartphones use OIS for both photo and video. It was invented to at least partially address in-frame issues such as motion blur and rolling shutter effects. It is a tool for correcting problems that, for example, a bigger lens and a better sensor would not have, but such hardware is too costly or too bulky to incorporate into a smartphone. The net result of its work is improved sharpness in motion.

    Because of the limits on its compensation angle, OIS cannot compensate for the larger-amplitude motion between frames. In short, OIS can enhance the quality of each individual frame, but not the overall stability of the video when, for example, you are out walking with the camera in your hand. Vidhance must then also adapt to this motion and eliminate the distorted perspective, along with everything else above. The sketch below shows where that perspective error comes from.
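
    Consider a simple pinhole camera (a back-of-the-envelope sketch with illustrative numbers, not a Vidhance model). Rotating the camera by an angle theta moves a point at image coordinate x to f·tan(atan(x/f) + theta), while a lens shift moves every point by roughly the same amount, f·tan(theta); the mismatch grows toward the image edges:

    ```python
    import numpy as np

    f = 1500.0               # focal length in pixels (typical for a phone camera)
    theta = np.radians(1.0)  # one degree of hand-shake rotation (yaw)

    for x in [0.0, 500.0, 1000.0]:  # pixels from the image center
        true_shift = f * np.tan(np.arctan(x / f) + theta) - x  # shift caused by the rotation
        ois_shift = f * np.tan(theta)                          # uniform shift applied by OIS
        print(f"x = {x:6.0f} px   residual error = {true_shift - ois_shift:5.2f} px")
    ```

    Even a one-degree rotation leaves an error of several pixels near the frame edges, which shows up as a wobbling perspective once the frames are played back in sequence.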

    Vidhance brings it all together

    Let’s zoom out now. Phew, that was a lot. And there are plenty of topics we have not even covered here, such as lens distortions or hardware synchronization issues. Still, it probably suffices to say that good video stabilization software needs to account for all of the above, and more!

    Vidhance Video Stabilization technology performs both in-frame stabilization, such as rolling shutter elimination, and inter-frame stabilization, analyzing and removing unintended motion. All wrapped up in one package, Vidhance stabilizes and perfects video in all real-world scenarios.

    With additional features such as Live Auto Zoom and Auto Curate, Vidhance goes beyond mere quality enhancement and starts to improve the very aesthetics and experience of recorded video. No hardware in the world can do that!


    Marcus Näslund

    marcus.naslund@vidhance.com