Imint Video Primer – It’s not about the megapixels

    6 November 2018 · Blog

    Welcome to the sixth lesson of the Imint Video Primer! Today we’ll talk about smartphone camera hardware. How do these small cameras work, how can they be so small, and how do they keep getting better and better?

    Smaller, better, faster, stronger

    In the earlier days of computers, up to the late 2000s, newer processors were faster because they packed more transistors and ran at a higher clock frequency: a measure of how many operations, one at a time, can be performed every second, expressed in megahertz (MHz, millions per second) or gigahertz (GHz, billions). A higher frequency was an easy way to market new computers, and to compare them.

    Eventually, physics got in the way: higher frequencies demand more power and generate more heat. There are two ways forward. CPUs nowadays go faster by performing more complex operations less often. GPUs instead perform a very large number of very small operations at the same time (in parallel), which is perfect for graphics or machine learning, but that’s a topic for another time. Both approaches help keep Moore’s law alive.

    The same type of “speed” arms race emerged with the advent of digital cameras, and later again with smartphone cameras. While “photo quality” is a broad and ambiguous term, stating a camera’s megapixel count quickly became a marketer’s dream – as with megahertz, it is a single number that seems to describe how good a camera is. Obviously, it’s a lot more complicated than that, but since digital cameras were so new, a higher resolution generally did mean a newer, improved camera. It was good enough. Until, of course, it wasn’t, and physics caught up yet again.

    Quite a few qualities important to good still photography and video recording become significantly harder to achieve at the small physical scales required to fit inside a smartphone. This must be compensated for with clever software. A good analogy is using software-based Vidhance stabilization to improve a video instead of a hardware tripod, or using Vidhance Auto Zoom instead of a scripted, professional TV studio. Let’s look at some finer details.

    Software to bridge the hardware gap

    Obviously, manufacturing processes keep improving, and camera modules keep getting more capable. But you cannot change the laws of physics, captain. Even if you’re ready to pay several fortunes, some qualities simply don’t manifest themselves at such small scales. Fortunately, with plenty of processing power available, software can minimize or even eliminate some defects. We’ve already covered hardware augmentations such as Optical Image Stabilization (OIS) in a previous text, so we will focus on software here.

    First, there really is no “objective” way of looking at the world, even for electronic devices. How scenes are captured depends greatly on well-known, similar-sounding features (autofocus, auto white balance, auto exposure, and so on) that together try to capture a good photo. They do not necessarily replicate what you see with your own eyes; it is common, for example, to make these algorithms saturate the colors a bit “extra” to make them “pop”. These features are also a requirement for video recording, where the time dimension plays a critical role. While these settings may be retuned (for better or worse) between individual photos, they must change only smoothly and gently between subsequent frames in a video. Various algorithms for computing these settings exist and are the subject of ongoing research.
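    One simple, common way to get that smooth frame-to-frame behavior is exponential smoothing: each frame, the applied setting moves only a fraction of the way toward the newly metered target. This is a minimal sketch of the idea, not Vidhance’s or any vendor’s actual algorithm; the smoothing factor and the EV numbers are made up for illustration.

```python
def smooth_exposure(current, target, alpha=0.1):
    # Move a fraction (alpha) of the remaining distance toward the
    # metered target each frame, so the exposure never jumps abruptly.
    return current + alpha * (target - current)

# Simulated scene change: the meter suddenly wants EV 8 instead of EV 12.
exposure = 12.0
trace = []
for _ in range(30):  # 30 frames = one second of 30 fps video
    exposure = smooth_exposure(exposure, target=8.0)
    trace.append(exposure)
# The applied exposure glides from 12 toward 8 instead of snapping.
```

    A still camera could simply jump straight to the new target between shots; in video, the same jump would show up as a visible flicker, which is why the gradual update matters.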

    Something recently popularized in smartphones is creating a more-or-less fake bokeh effect in portrait shots. The camera distinguishes between foreground and background and blurs the background, creating an emphasis on the subject (usually a person). Cameras with large lenses and sensors can narrow the depth of field and create this effect naturally, but the small cameras in smartphones cannot. Instead, machine learning can to some degree be used to identify the foreground subject and its contours, and blur the rest of the image. Better, and increasingly common, is to use a dual camera: the smartphone uses the offset between the two lenses to calculate a depth map of the scene, just as our brain does with a pair of eyes.
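    The depth calculation itself is classic triangulation: a nearby point shifts more between the two lenses than a distant one, so depth is inversely proportional to that pixel offset (the disparity). Here is a toy sketch of the principle; the baseline, focal length and disparity values are hypothetical, and real phones do far more work to match pixels between the two images.

```python
def depth_from_disparity(disparity_px, baseline_mm, focal_px):
    # Triangulation: depth = (camera spacing * focal length) / disparity.
    # A large disparity means the point is close to the phone.
    if disparity_px <= 0:
        raise ValueError("point not matched between the two cameras")
    return baseline_mm * focal_px / disparity_px

# Hypothetical disparities along one scanline (large value = close pixel).
disparities = [30, 28, 29, 5, 4, 5]
depths_mm = [depth_from_disparity(d, baseline_mm=10, focal_px=1500)
             for d in disparities]
# Everything farther than ~1 m becomes a background candidate to blur.
background = [z > 1000 for z in depths_mm]
```

    With a depth value per pixel, the portrait mode only has to threshold the map and blur the “far” pixels, which is exactly where the hair-and-glasses artifacts discussed below creep in.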

    The bokeh effect: The subject in the foreground is in focus while the background is blurry, which helps to emphasize the subject. Original photo by carlosluis on Flickr.

    A synthetic process like this has flaws that don’t occur naturally. Individual hairs are sometimes mistakenly blurred together with the background. Eyeglasses are correctly treated as foreground, but the background seen through their lenses then stays sharp, whereas a true optical bokeh would blur it. So it’s not perfect, but generally good enough. Some phones also put their multiple cameras to additional uses. For example, the second sensor can be a high-resolution black-and-white sensor that supplements the primary sensor with additional brightness information, improving image quality.

    Specializing beyond a general depth map, a trained AI and dual cameras can construct a 3D map of a face and re-light it with fake studio lighting, highlighting points like the nose, cheeks and chin that external studio lights would have emphasized. This gives the image a dimensionality you could normally only achieve with external lighting solutions or a lot of post-processing.

    Zoom functionality is harder to emulate. Although some optical zoom may be available, especially with multiple cameras, most smartphone zoom is digital. Digital zoom simply crops the already available image; as there is no new information to use, quality is reduced. All hope is not lost, as AI may be able to fill in details on the cropped image, as an artist improvises on a canvas, based on prior experience of other photos. Laughing at “zoom and enhance” on TV crime shows may become a thing of the past, after all.

    Related to zooming, any distortions caused by movement are amplified by the amount of zoom. Vidhance certainly helps here, both with video stabilization and by creating a smoother zoom experience with Live Auto Zoom. At Imint, we realized a long time ago that good quality is not just about clever engineering, but about helping the user accomplish tasks, like an accurate and smooth zoom. This, and more, is the subject of the next section.
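    The crop-and-upscale nature of digital zoom is easy to see in code. This toy sketch (a tiny grayscale frame as nested lists, nearest-neighbour scaling; real pipelines use better interpolation) shows why quality drops: the output has the same pixel count as the input, but every value is just a repeated copy of a pixel from the central crop.

```python
def digital_zoom(image, factor):
    # Crop the central 1/factor portion of the frame...
    h, w = len(image), len(image[0])
    ch, cw = int(h / factor), int(w / factor)
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = [row[left:left + cw] for row in image[top:top + ch]]
    # ...then scale it back to full size with nearest-neighbour
    # sampling. Pixels are repeated; no new detail appears.
    return [[crop[y * ch // h][x * cw // w] for x in range(w)]
            for y in range(h)]

frame = [[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11],
         [12, 13, 14, 15]]
zoomed = digital_zoom(frame, factor=2)
# Each surviving pixel from the 2x2 center now covers a 2x2 block.
```

    AI-based “super resolution” replaces that dumb pixel repetition with plausible invented detail, which is exactly the fill-in-the-canvas idea described above.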

    Even with parts of the photo removed, the AI was in some examples able to reproduce the original with decent quality.

    Helping the user

    There is a lot of computing power available in smartphones and drones. All that power can be used to help with the art of photography, rather than just the shooting itself. Many of these features are available today; others are sure to become more common. Having the capability is not enough, though: true help is automatic and quick.

    Editing color and exposure, applying filters, and other post-shot improvements are very popular. With modern AI methods, this can be done automatically, to highlight the important parts. Understanding which parts of an image are important is the subject of a research area called image saliency. It can be used to crop images by removing uninteresting parts anywhere in the frame, not just at the edges – a technique called seam carving that made the rounds on the internet some years ago.
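    The core of seam carving is a short dynamic-programming search: find the connected top-to-bottom path of pixels with the lowest total “energy” (importance), and delete it, shrinking the image by one column while leaving salient regions untouched. A minimal sketch on a tiny hand-made energy grid (real implementations derive energy from image gradients or a saliency map):

```python
def cheapest_vertical_seam(energy):
    # cost[y][x] = cheapest total energy of any connected seam from
    # the top row down to pixel (x, y); each step may move at most
    # one column left or right.
    h, w = len(energy), len(energy[0])
    cost = [list(energy[0])]
    for y in range(1, h):
        prev = cost[-1]
        cost.append([energy[y][x] + min(prev[max(0, x - 1):x + 2])
                     for x in range(w)])
    # Backtrack from the cheapest bottom pixel up to the top.
    x = min(range(w), key=lambda i: cost[-1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        x = min(range(max(0, x - 1), min(w, x + 2)),
                key=lambda i: cost[y - 1][i])
        seam.append(x)
    return seam[::-1]  # one column index per row, top to bottom

def remove_seam(image, seam):
    # Delete one pixel per row, following the seam.
    return [row[:x] + row[x + 1:] for row, x in zip(image, seam)]

# Low-energy (boring) middle column; the seam should run through it.
energy = [[9, 1, 9],
          [9, 1, 9],
          [8, 2, 9]]
seam = cheapest_vertical_seam(energy)
carved = remove_seam(energy, seam)
```

    Repeating this once per removed column is what lets seam carving narrow an image without squeezing or cropping the interesting parts.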

    Image recognition techniques can be applied to auto-shoot on a smile, or even to auto-select the best photo from a burst. This is helpful when photographing a child, a pet, or a group of people. This is also where software beats hardware: when filming yourself (vlogging, video messaging, video conferencing, etc.), a physical gimbal stabilizer would stabilize the surrounding world, but your head would still swerve around. Image analysis can instead stabilize the image around your face, something Vidhance already supports with its selfie mode, just as it can automatically generate a time lapse from a long video.
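    Picking the best frame from a burst does not always need deep learning; a common simple heuristic is to score each frame’s sharpness (a motion-blurred frame has weaker local contrast) and keep the sharpest one. This is an illustrative sketch of that heuristic, not any particular phone’s selection logic; real pipelines also weigh in smiles, open eyes and exposure.

```python
def sharpness(gray):
    # Sum of squared differences between neighbouring pixels: sharp
    # frames have strong local contrast, blurred frames do not.
    h, w = len(gray), len(gray[0])
    score = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                score += (gray[y][x + 1] - gray[y][x]) ** 2
            if y + 1 < h:
                score += (gray[y + 1][x] - gray[y][x]) ** 2
    return score

def best_of_burst(frames):
    # Keep the frame with the highest sharpness score.
    return max(frames, key=sharpness)

# Tiny grayscale frames: one crisp checkerboard, one washed-out blur.
sharp  = [[0, 250], [250, 0]]
blurry = [[120, 130], [130, 120]]
chosen = best_of_burst([blurry, sharp])
```

    The same contrast-based score is essentially what contrast-detection autofocus maximizes while hunting for focus.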

    Unfortunately, it doesn’t stop with capturing the right moment and composition. The final photo or video may also be perceived differently on different screens, no different from how a great sound recording loses something on cheap headphones. Android phones in particular vary quite a lot, as there is no broad common standard, but that is a problem which hopefully is being fixed.

    Only you can prevent forest fires – err – bad photos

    For a lot of people, the smartphone has become the primary means of photography and video recording, and we are doing more and more of it. Therefore, camera quality has been, and will long remain, a strong selling point in all smartphone market segments. This, of course, is not news if you’re familiar with Imint. But with all these fancy technologies, what colorful compositions or cinematographic expressions can you, as the user, assemble?

    Smartphone camera sensors are much smaller than those in other compact cameras, and the most distinct disadvantage in image quality shows in dim lighting. Some window light will do more to improve your photos than a new phone or camera. If you’re indoors, try to set up your shot so there’s light falling on your subject. Good light, preferably natural, always beats your phone’s underpowered LED flash.

    We talked about autofocus and auto exposure earlier. You can make a big difference by tweaking exposure settings manually to brighten or darken a scene. Find these settings and experiment! Use them to brighten the shot of your fancy Instagram dinner, or to darken the shadows in a portrait for a more dramatic look, and your images will start to stand out from the crowd.

    From working with smartphone cameras, all lessons learned, all knowledge assembled, and all software perfected carry over to a multitude of related areas. Most of the same problems apply to drones, for example, and smart software is already in professional DSLR cameras too. The number of photos and videos on the internet is projected to skyrocket for a long time to come, and improving their quality in every aspect is what we’re dedicated to.

    There are also plenty of people talking specifically about smartphone cameras on the internet, both still photography and video making. Invest some time and check them out. Of course, getting a camera with Vidhance certainly helps you a lot as well. 🙂

    Marcus Näslund