Taking object tracking for cameras in motion to the next level

Object tracking has many key uses and faces special challenges in cameras in motion. To understand these challenges and what it will take to overcome them, we’ll go over the history of object tracking, different object tracking approaches and the gap in object tracking research. Then we’ll show you a real-life example of the potential of advanced auto zoom technology to take your object tracking to the next level.


The history of object tracking

Object tracking is the process of locating and following one or more objects over time using a camera. It has a variety of uses, including human-computer interaction, security and surveillance, video communication, augmented reality, traffic control, medical imaging, video editing, and even compression: our blog post on compression discussed detecting movement from frame to frame.

This is quite different from object detection, generally used to automatically detect a predetermined set of types of objects in a video, and object recognition, used to recognize and identify objects in a video – distinguishing people from cars and so on. It is interesting to note that machine learning (ML) and deep learning (DL) methods completely dominate detection and recognition algorithms today, whereas they play a smaller role in object tracking, although this is bound to increase over time.

Object tracking, object detection and object recognition are still open problems in image and video analysis, each with many different approaches. Both detection and recognition build upon object tracking techniques, but this blog post will focus only on object tracking.

Trackers are steadily becoming better and better, but common problems remain. Trackers can easily fail in cluttered scenes, occlusion (the tracked object disappears behind another object) is a difficult problem, and good initialization (the tracker’s starting region) still matters a great deal.

Different object tracking approaches

As stated above, there are many approaches to object tracking and many variations within each approach. Which method is “best” is largely dependent on a specific application. The cutting-edge methods constantly change with new research and breakthroughs in related fields.

Below, four very different object tracking fields will be outlined on the basis of feature clouds and correlation filters. The aim is to visualize different techniques and make them easier to understand. Typically, the art of finding the location of the object and the art of estimating its new size (if it’s moving towards or away from the camera) are considered separate processes.

Different trackers are constantly benchmarked. For example, see the “Object Tracking Benchmark” report in IEEE Transactions on Pattern Analysis and Machine Intelligence, and a more recent unpublished study on GitHub.

Color histogram trackers

Color histogram trackers are commonly used for auto focus, where precision does not matter at all. They don’t stand a chance in competitions because they can easily drift and start following something else, but their fast redetection works even if the object has changed shape completely. They are not robust against lighting changes or backgrounds with similar color patterns.
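To make the idea concrete, here is a minimal sketch of a color histogram tracker. It assumes each frame is already a 2-D grid of quantized color indices (for example hue bins), and it uses histogram intersection as the similarity measure. The function names, the fixed window size and the brute-force neighborhood search are illustrative choices for this post, not a description of any particular product or library.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # assumed input: grid of quantized color indices


def histogram(frame: Frame, x: int, y: int, w: int, h: int, bins: int) -> List[float]:
    """Normalized color histogram of the w x h window with top-left corner (x, y)."""
    counts = [0] * bins
    for row in frame[y:y + h]:
        for value in row[x:x + w]:
            counts[value] += 1
    total = float(w * h)
    return [c / total for c in counts]


def similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(p, q))


def track(frame: Frame, model: Sequence[float], prev_xy: Tuple[int, int],
          size: Tuple[int, int], bins: int, radius: int) -> Tuple[Tuple[int, int], float]:
    """Return the window position near prev_xy whose histogram best matches the model."""
    w, h = size
    px, py = prev_xy
    best_xy, best_score = prev_xy, -1.0
    # Clamp the search area so every candidate window stays inside the frame.
    for y in range(max(0, py - radius), min(len(frame) - h, py + radius) + 1):
        for x in range(max(0, px - radius), min(len(frame[0]) - w, px + radius) + 1):
            score = similarity(model, histogram(frame, x, y, w, h, bins))
            if score > best_score:
                best_xy, best_score = (x, y), score
    return best_xy, best_score
```

Because the histogram discards where each color sits inside the window, the match survives a complete change of shape. For the same reason, any background patch with a similar color mix scores just as well, which is the drift problem described above.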

Motion-based trackers

Motion-based trackers are well suited for mounted cameras in security systems and for following fast-moving objects. They essentially follow a detected region that differs from the trained background. They are better at handling changes in illumination but cannot detect objects that are not moving.
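The sketch below illustrates this with the simplest background model we could think of: an exponential running average per pixel. It assumes grayscale frames stored as lists of rows. The names `update_background` and `foreground_box`, the blend factor `alpha` and the threshold are hypothetical values chosen for illustration; production trackers typically use more elaborate background models.

```python
from typing import List, Optional, Tuple

Gray = List[List[float]]  # assumed input: grayscale frame as rows of intensities


def update_background(background: Gray, frame: Gray, alpha: float) -> Gray:
    """Blend the new frame into the background model (exponential running average)."""
    return [[(1.0 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def foreground_box(background: Gray, frame: Gray,
                   threshold: float) -> Optional[Tuple[int, int, int, int]]:
    """Inclusive bounding box (x0, y0, x1, y1) of the pixels that differ from the
    background by more than threshold, or None if no pixel does."""
    xs, ys = [], []
    for y, (brow, frow) in enumerate(zip(background, frame)):
        for x, (b, f) in enumerate(zip(brow, frow)):
            if abs(f - b) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))
```

If the same frame is fed to `update_background` repeatedly, a stationary object is gradually absorbed into the model and `foreground_box` eventually returns `None`. That is one way to see why this family of trackers loses objects that stop moving.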

The gap in object tracking research

Most research for visual object tracking is currently done on high-performance PCs. There are plenty of competitions online and offline that generally focus on accuracy, and most algorithms can perform better by simply increasing the available computing power and/or ignoring any time constraints. Some competitions have sub-competitions for the best real-time trackers, but overall accuracy is the main point of research.

Additionally, most object trackers are benchmarked on datasets made up primarily of video from professional still cameras. This is a vastly different use case from cameras in motion on smartphones, drones, bodycams and similar devices. On such comparatively low-power hardware, you don’t have the luxury of half-measures. Every clock cycle counts, not just for performance but also for battery consumption.

Having to optimize for performance without compromising on accuracy makes achieving effective tracking even more difficult. This doesn’t just mean taking state-of-the-art algorithms and making them faster. It also requires inventing new methods to achieve the same or even better performance with less effort.

There is very little (public) research about trackers for cameras in motion. A paper published in Pervasive and Mobile Computing [2] presents how a Kernelized correlation filter (KCF) tracker can be used in real time (~30 frames per second) on a phone. It is hard to perform a direct comparison as the paper is short on exact implementation details. However, it is possible to track objects at many more frames per second, and a KCF tracker does not perform any scaling analysis. This means that the tracker size and shape stay the same during the whole video, whereas an ideal tracker should be able to adapt to changes in size and shape.

Overcoming object tracking challenges in cameras in motion

Zooming in on moving objects with cameras in motion, such as smartphones, is practically pointless these days: it is hard to zoom smoothly, hard to aim at the object and hard to keep it in focus. On top of this, any unintended motion is amplified with the level of zoom. To illustrate how to overcome these challenges, we’ll look at a representation of probability-based optical flow using the Live Composer feature of Vidhance, our own video enhancement software.

Vidhance outperforms state-of-the-art algorithms in quality. At the same time, it consumes only 0.08 ms per frame on a mobile CPU, compared to the 200 ms of the common deterministic version, which also results in poor quality. All the user has to do is select the desired object, and Vidhance will automatically track and follow it, zooming smoothly and composing the video professionally as the scene changes.
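To put those per-frame costs in perspective, the short calculation below compares them with the time available per frame. It assumes a 30 frames per second video stream, in line with the roughly 30 frames per second mentioned earlier. The helper names are ours, and the calculation counts only the tracker’s own processing time.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def budget_share(cost_ms: float, fps: float) -> float:
    """Fraction of the per-frame budget used by a step costing cost_ms."""
    return cost_ms / frame_budget_ms(fps)


def max_fps(cost_ms: float) -> float:
    """Highest frame rate sustainable if the step were the only work per frame."""
    return 1000.0 / cost_ms


if __name__ == "__main__":
    for cost in (0.08, 200.0):
        print(f"{cost} ms/frame: {budget_share(cost, 30.0):.2%} of a 30 fps budget, "
              f"at most {max_fps(cost):.0f} fps")
```

At 30 frames per second the budget is about 33.3 ms per frame. A 200 ms step takes roughly six times that and caps out at 5 frames per second, while a 0.08 ms step uses about 0.24% of the budget and leaves nearly all of it for the rest of the pipeline.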

Video: One of our engineers validating the robustness of our tracker against changes in size, movement, and heavy shaking.

Video input may be huge, but each frame is often downsampled (resized to a smaller resolution) before processing, which makes the quality of the original resolution less of a concern to us. Running software directly integrated into the lower levels of a camera in motion fortunately also comes with certain advantages, such as a highly accurate motion estimate from the stabilizer module.
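As a rough illustration of downsampling before processing, the sketch below averages each block of pixels into one output pixel and maps a tracked box back to the original resolution. It assumes a grayscale frame whose dimensions are multiples of the factor. The function names and the box filter are illustrative assumptions, not a description of how Vidhance resizes frames.

```python
from typing import List, Tuple

Gray = List[List[float]]  # assumed input: grayscale frame as rows of intensities


def downsample(frame: Gray, factor: int) -> Gray:
    """Box-filter downsampling: average each factor x factor block into one pixel."""
    height = len(frame) // factor
    width = len(frame[0]) // factor
    out = []
    for by in range(height):
        row = []
        for bx in range(width):
            block = [frame[by * factor + dy][bx * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def to_full_resolution(box: Tuple[int, int, int, int], factor: int) -> Tuple[int, int, int, int]:
    """Map an (x, y, w, h) box found on the downsampled frame back to original pixels."""
    x, y, w, h = box
    return (x * factor, y * factor, w * factor, h * factor)
```

Downsampling by a factor of 4 in each direction cuts the number of pixels the tracker has to touch by a factor of 16, and the tracked box can still be applied to the full-resolution frame for cropping and zooming.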

Live Composer takes object tracking to the next level

Powered by advanced object tracking and auto zoom functionality, our Vidhance Live Composer feature is trusted for critical use cases in drones, security cameras, smartphones and even air traffic control solutions. Combined with our powerful video stabilization algorithms, Vidhance is up to the toughest life-and-death tasks, whether that is identifying a dangerous felon on a security camera or tracking people in dire need of rescue during a wildfire from a drone.

Contact us to learn more about how we can help you take your object tracking to the next level in cameras in motion. We’d be happy to discuss your needs and book a demo.