Welcome to the fourth lesson of the Imint Video Primer, and sorry for the extra wait! This part will look more closely at the work we put in to make Vidhance as general-purpose as possible, and what those other purposes may be for video technology.
It’s not rocket science, but still…
Writing software is hard. Writing a high-performance video technology SDK like Vidhance is very hard. And taking the time to make sure your code doesn’t just solve one problem on one particular set of hardware takes even more effort. Fortunately, these are exactly the skills we have assembled at Imint.
At a high level, Vidhance is a system that takes video input, processes it, and outputs either a modified frame (e.g. stabilization), metadata about the frame (e.g. location of a tracked object), or both. This is a generic description, and for a good reason. Our work neither started, nor ends, with where we are now.
Imint actually started its video software business in the defense industry over ten years ago. Video from unmanned vehicles underwater or in the air quickly needed the best possible enhancements, such as contrast optimization and stabilization. The technology was later expanded to UAV applications, drones, and solutions for air traffic control. In 2013, smartphone manufacturers realized that our technology for real-time video could fit perfectly in a smartphone. Soon after that, the first version of the Android-based video stabilization software Vidhance was released.
Although smartphones are the current focus and we settle for nothing less than the world’s best in our key areas, we are also constantly looking towards the future and the next big thing – whatever that may be. Looking towards the future doesn’t just mean keeping an eye out for what to do next. It also means planning ahead for the software we’re writing now, instead of only making easy gains for the current, specific problem. This doesn’t mean “slow and steady like the turtle” – we stay smart and efficient while planning ahead.
For example, drones are among the hottest products in technology today. Their future seems bright, both as consumer products and in other expanding commercial applications. Capabilities that require vision processing include, for example, collision avoidance, broader autonomous navigation, terrain analysis and subject tracking. Collision avoidance is not only relevant for fully autonomous navigation, but also for “copilot” assistance when the drone is primarily controlled by a human being, analogous to today’s driver-assistance systems in cars.
These key features may expand the drone market of tomorrow by making drones more capable and easy to use. The algorithms that will be used, whether they exist today or will be researched in the years to come, are typically applicable to a wide range of problems, making Vidhance a great cornerstone in the growing video market. But a general problem domain isn’t enough. The implementation – the software – must also be general.
How does software stay general?
Stabilizing video is a very general problem. Doing it for surveillance aircraft in the defense industry is quite different from doing it on top-quality smartphone hardware, which is rather different from budget smartphone hardware, which is quite different from ATC towers or drones or … .
For Kontigo Care’s advanced eHealth platform Previct®, Vidhance is used in an app to detect drug-induced effects on the iris of the eye. Albeit a specific use case, the Vidhance SDK should make it easy to add features such as Object Tracker, Live Auto Zoom or Video Stabilization should that become necessary, or to reuse parts of these modules to solve new problems. It all needs to be easy and fast, because time is money.
While algorithms may stay the same (with small tweaks) for a long time, hardware and sensor data tend to look and work very differently across different devices. Staying general doesn’t just mean being open to different problems and use cases; it also means decreasing the time required to integrate Vidhance into completely new hardware. For example, much of what we do is shuffling a large number of pixels around and manipulating them. Depending on the hardware, this is accomplished in very different ways. A 4K video frame contains over 8 million pixels – a lot of data.
The word hardcoding is frequently used in software development to denote something written specifically for exactly one configuration, and it is generally frowned upon. A simple CPU has to modify the pixels one by one, which can take a lot of time. The easy way out is to hardcode this option into the software: it will always work, since there is always a CPU. But it is also a very slow option.
Most CPUs have multiple cores, meaning they can carry out several instructions at once. We can then divide the big chunk of data into smaller subsets, one per core, that can be processed simultaneously, considerably reducing the time required. Some devices have graphics cards. Some devices have even more specialized hardware, like FPGAs or DSPs. For smartphones as well as drones, the cost, performance and power consumption of the different subsystems are taken into account when designing a product; size and weight become even more important for drones. Different technologies deliver different tradeoffs, and software like Vidhance needs to be easily adjusted for this – preferably even auto-adjusting.
Figure 1: Vidhance works for general input video, is configured with the right toolset for the current hardware, and outputs its result.
The optimal way to process the data also depends on other factors, for example whether other software is running at the same time, also in need of system resources. In short, setting up a render pipeline in a smart way takes more time than a hardcoded solution in the short run, but in the long run it unfolds into the competent, scalable, state-of-the-art platform we call Vidhance. Our SDK always offers the integrator a chance to balance performance against computational cost.
Time waits for nobody
Staying general also means using modern programming techniques and languages that allow for complex functionality while still remaining compatible with older hardware. Let’s take a look at a small example of what this may entail. How about getting the current (precise) time and date? That’s simple enough, right? Well, yes, this is done quite easily in both Linux (the base of Android and most IoT systems) and Windows. On Linux, this can be done with a couple of lines of code:
time_t now = time(NULL);
result = asctime(localtime(&now));
It’s not too bad in Windows either. Unfortunately, Windows reports the time and date in another format! Since the output (or internal workings) of Vidhance should work exactly the same regardless of operating system, we need to be proactive about even small details like these. Therefore, if we are building Vidhance for a Windows platform, our build system swaps out the above code for something like this:
int dateLength = GetDateFormat(LOCALE_USER_DEFAULT, 0, NULL, NULL, NULL, 0);
int timeLength = GetTimeFormat(LOCALE_USER_DEFAULT, 0, NULL, NULL, NULL, 0);
char* date = (char*)calloc(1, dateLength);
char* time = (char*)calloc(1, timeLength);
GetDateFormat(LOCALE_USER_DEFAULT, 0, NULL, NULL, date, dateLength);
GetTimeFormat(LOCALE_USER_DEFAULT, 0, NULL, NULL, time, timeLength);
There’s no need to understand the details here. The main takeaway is that even a small detail like this requires effort. When Vidhance is built for a customer, information about the target system is provided along with specific customer requests, and the correct pieces of code are automatically selected and compiled, here and in a thousand other places, all ensuring the best possible performance and compatibility.
Remember the first sentence: Writing (any kind of) software is hard. We strive towards making Vidhance as general-purpose as possible, to leverage our knowledge in smartphones, in drones, and other technologies down the line. A typical implementation project often requires more than just installing a packaged product. There’s normally more work needed for integration and fine-tuned algorithms. Our clients may request everything from customized products to pre-testing or characterization evaluation in our DxO lab. The results are distilled down to a few important key variables. These are input into Vidhance which automatically adjusts to make the best video enhancements and analysis possible.
As this text hopefully explains, a “semi-automatic” process like this takes some effort in the short run, but certainly pays off when scaling up to many more clients, more features for those clients, and different devices from those clients. And we’ve barely scratched the surface.
Future texts may touch upon more specific details in hardware and software (the entire “camera stack”), or the fascinating generality in the mathematics behind the things Vidhance can accomplish. Please get in touch about which direction you think this blog should take!
There really are so many exciting details. Unfortunately, Vidhance does not provide a time machine to give me infinite time to share it all with you in this post. Not yet, anyway.