The Rise Of Computer Vision

See the World in more clarity – Across a wider spectrum

Who would have thought that a few 1960s experiments to detect the edges of objects and categorize simple shapes would spawn arguably the biggest trend in AI? Those first forays into the field of computer vision inspired modern neural networks that are now supporting an explosion of use cases in artificial intelligence. You can find computer vision in crime prevention and security – recognizing faces with incredible accuracy, even from lamppost-mounted cameras as suspects speed down roads in cars. Retailers can help you search for jeans worn by your favorite celebrity. In-store personalized advertising and quicker checkout times are transforming the customer experience. Insurance firms can expedite claims handling. Google will help you around foreign locations by translating signs and other text captured by your smartphone.

For CIOs, data scientists and analytics leaders whose organizations are considering the potential of computer vision, note what a significant growth area this application of AI is. Revenues from computer vision software, hardware and services are forecast to boom from $1.1 billion in 2016 to $26.2 billion by 2025. This means many of your competitors are either planning to adopt computer vision as part of their AI strategy or are already working with it.

The availability of affordable high-performance compute power, the huge variety of visual data that pervades everyday life and more sophisticated algorithms are driving computer vision uptake. Think visual data is only generated in relatively small quantities? Forbes reports that 300 million photos are uploaded to Facebook every day and more than 95 million photos and videos are shared on Instagram.

To make full use of this visual data, whether to augment existing human processes, to radically change the way your organization operates, or to come up with wholly new services, your teams, your lines of business and your senior leaders must understand the use cases of computer vision.

What Is Computer Vision?

In essence, computer vision is a subset of artificial intelligence that enables computers to see and make sense of the world. It is the AI application that allows a computer to learn to analyze information from photos to video to thermal and infrared data, amongst other sources, and then make decisions or come to a clear understanding of the environment or situation based on that information.

Computer vision is an incredibly sophisticated tool. The best way to appreciate this complexity is to consider how human sight works. Human eyes have some six to seven million cone cells, each of which contains one of three color-sensitive proteins, called opsins. When light photons make contact with these opsins a reaction occurs that generates electrical signals. This message is transmitted to the brain where it’s interpreted.

Building an application capable of replicating that process is phenomenally intricate. Such a system would have to be capable of processing and interpreting visual information so that it can be used for pattern and object recognition, and for reconstructing 3D information from 2D images of our 3D world. Yet in just a decade, object identification accuracy has increased from 50% to an incredible 99%. This makes computer vision more accurate than humans at reacting rapidly to visual data.

How Does Computer Vision Work?

Computer vision is based on deep learning, a form of machine learning that trains computers to perform human-like tasks. Currently, these broadly include understanding natural language, identifying and classifying objects in images or making predictions, but other uses are evolving. Rather than running data through a series of predefined equations, deep learning establishes the structure of the model through some basic parameters concerning the data, then trains the computer to learn on its own by recognizing patterns using many layers of processing.

To emulate human sight, machines need to acquire, process, analyze and understand images, which is made possible by the iterative learning process of deep neural networks.

The process begins with a curated set of images or video data, known as training data. This is used to help the machines learn certain things about a certain topic; think damage to vehicles for the insurance industry or prohibited luggage items for the airline industry.

In the airline example, the training data must include sample images of luggage containing prohibited items, such as aerosols, weapons, and liquid containers, as well as images of luggage containing only permitted items. Each image will be tagged with metadata indicating the correct answer – in this case, permitted or prohibited.

A neural network will process the visual data, using pattern recognition to identify the many different components of an image. Its outputs, or ‘answers’ as to whether an item is allowed or not, are fed back into the system allowing it to learn and improve in accuracy. So, instead of a human attributing certain characteristics to items, the machine learns from the images it receives.
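The feedback loop described above can be sketched in a few lines. This is a minimal illustration under loud assumptions, not a production vision model: it trains a single artificial neuron (logistic regression) rather than a deep network, and the two numeric "features" per bag are made-up stand-ins for what a real network would extract from pixels.

```python
import math
import random


def predict(weights, bias, features):
    """Return the model's estimated probability that an item is prohibited."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=200, learning_rate=0.5):
    """Adjust weights by repeatedly comparing outputs with the tagged answers."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            # The gap between the output and the tag is the feedback signal.
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical tagged training data: label 1 = prohibited, 0 = permitted.
random.seed(0)
training_data = (
    [([random.uniform(0.6, 1.0), random.uniform(0.6, 1.0)], 1) for _ in range(50)]
    + [([random.uniform(0.0, 0.4), random.uniform(0.0, 0.4)], 0) for _ in range(50)]
)

weights, bias = train(training_data)
correct = sum((predict(weights, bias, f) >= 0.5) == bool(y)
              for f, y in training_data)
accuracy = correct / len(training_data)
print(f"training accuracy: {accuracy:.2f}")
```

A deep network repeats the same idea across many layers and millions of weights, but the principle is unchanged: compare the output with the tagged answer and nudge the weights to shrink the difference.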

DeepStream SDK

Build and deploy AI-powered Intelligent Video Analytics apps and services. DeepStream offers a multi-platform scalable framework with TLS security to deploy on the edge and connect to any cloud.

There are billions of cameras and sensors worldwide, capturing an abundance of data that can be used to generate business insights, unlock process efficiencies and improve revenue streams. Whether it’s at a traffic intersection to reduce vehicle congestion, health and safety monitoring at hospitals, surveying retail aisles for better customer satisfaction, sports analytics or at a manufacturing facility to detect component defects – every application demands reliable, real-time Intelligent Video Analytics (IVA).

Powerful & Flexible SDK

A unified SDK suitable for a multitude of use cases across a broad set of industries.

Real-time Insights

Understand rich and multimodal sensor data at the edge.

Managed AI Services

Deploy AI services in cloud native containers and orchestrate using Kubernetes.

Reduced TCO

Train with Transfer Learning Toolkit and use DeepStream to increase stream density.

NVIDIA’s DeepStream SDK delivers a complete streaming analytics toolkit for AI-based multi-sensor processing, video and image understanding. DeepStream is for vision AI developers, software partners, startups and OEMs building IVA apps and services.

Advanced video analysis solutions are in great demand across multiple industries. Popular use cases include analyzing retail aisles to understand customer brand sentiment, occupancy analytics for crowd management in mass transit locations, optimizing vehicle traffic patterns in cities, monitoring social distancing protocols in hospitals and malls, and defect detection in manufacturing facilities.

Building and deploying these solutions involves massive engineering effort: gathering relevant datasets, extensively training AI models for high accuracy, achieving real-time performance in large-scale deployments, and keeping those deployments manageable.

Transfer Learning Toolkit helps novice AI application developers and software developers accelerate AI training by providing various pre-trained AI models and in-built capabilities such as transfer learning, pruning, fine-tuning, and Quantization Aware Training. Developers can build highly accurate AI for several popular use cases using purpose-built models such as PeopleNet, VehicleMakeNet, TrafficCamNet, DashCamNet, and more. 

NVIDIA DeepStream SDK helps developers and companies build performant vision AI apps and services that can be deployed at scale and managed with ease using Kubernetes and Helm Charts.

Achieving Higher Accuracy & Real-Time Performance Using DeepStream 

DeepStream offers exceptional throughput for a wide variety of AI models based on object detection, image classification and instance segmentation. To reduce development efforts and increase throughput, developers can use highly accurate pre-trained models from Transfer Learning Toolkit (TLT) and deploy with DeepStream. The following table shows the end-to-end inference performance on a 1080p/30fps input stream. Note that running on the DLAs for Jetson Xavier NX and Jetson AGX Xavier frees up the GPU for other tasks.

With DeepStream SDK you can apply AI to streaming video and can simultaneously optimize video decode/encode, image scaling and conversion and edge-to-cloud connectivity for complete end-to-end performance optimization.

This plot summarizes stream density achieved at 1080p/30 FPS across various NVIDIA products. 
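Stream density is the number of camera streams a single device can process in real time. As a rough back-of-the-envelope sketch, it can be estimated by dividing a pipeline's sustained end-to-end throughput by the frame rate of each stream. The device names and throughput figures below are hypothetical and are not NVIDIA benchmark results.

```python
def stream_density(pipeline_fps: float, stream_fps: float = 30.0) -> int:
    """Estimate how many real-time streams fit within a given throughput."""
    if pipeline_fps < 0 or stream_fps <= 0:
        raise ValueError("pipeline_fps must be >= 0 and stream_fps must be > 0")
    # Only whole streams count: a partial stream would drop frames.
    return int(pipeline_fps // stream_fps)


# Hypothetical end-to-end throughputs in frames per second.
example_devices = {"device A": 250.0, "device B": 1000.0}
for name, fps in example_devices.items():
    print(f"{name}: {stream_density(fps)} streams at 30 FPS")
```

In practice the achievable density also depends on decode capacity, memory and the model used, so measured figures should take precedence over this estimate.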

Seamless Development

Sovereign can build seamless streaming pipelines for AI-based video and image analytics using DeepStream.

DeepStream is built for enterprises and offers extensive AI model support for popular state-of-the-art object detection and segmentation models such as SSD, YOLO, FasterRCNN, and MaskRCNN.

DeepStream supports everything from rapid prototyping to full production-level solutions and lets you choose your inference path. With native integration to NVIDIA Triton Inference Server, you can deploy models in native frameworks such as PyTorch and TensorFlow for inference, or achieve the best possible performance using NVIDIA TensorRT for high-throughput inference with options for multi-GPU, multi-stream and batching support.

Managed IVA Apps & Services

For a real-world IVA app or service deployment, remote management and control of applications is critical. DeepStream SDK can run in any cloud and at the edge, which makes it a powerful SDK for handling IoT requirements such as effective bi-directional messaging between edge and cloud, security, smart recording and Over-the-Air AI model updates.

  • With bi-directional messaging between edge and cloud, you can add greater control for use cases such as remote triggers for event recording, changing operating parameters and app configurations, or requesting system logs.
  • The smart record feature in the DeepStream app allows you to save valuable disk space on the edge with selective recording that enables faster searchability. You can use cloud-to-edge messaging to quickly trigger recording from the cloud.
  • Seamless Over-the-Air (OTA) updates for the entire app or individual AI models from any cloud registry let you continuously improve accuracy with zero downtime.
  • For secure IoT device communication, DeepStream provides two-way TLS authentication based on SSL certificates and encrypted communication based on public key authentication.

DeepStream offers an IoT integration interface with Kafka, MQTT and AMQP and turnkey integration with AWS IoT and Microsoft Azure IoT.

You can build high performance DeepStream cloud native applications with NVIDIA NGC containers. By using DeepStream, you can deploy at scale and manage containerized apps with Kubernetes and Helm Charts.

Powerful End-to-End AI Solutions

Speed up overall development efforts and unlock greater real-time performance by building an end-to-end vision AI system with NVIDIA Transfer Learning Toolkit (TLT) and production-quality vision AI models, then deploying at the edge using DeepStream. DeepStream offers turnkey integration of several detection and segmentation models trained with TLT, including SSD, MaskRCNN, YOLOv3, RetinaNet and more.

Amazon Rekognition

Automate your image and video analysis with machine learning.

Amazon Rekognition makes it easy to add image and video analysis to your applications using proven, highly scalable, deep learning technology that requires no machine learning expertise to use. With Amazon Rekognition, you can identify objects, people, text, scenes, and activities in images and videos, as well as detect any inappropriate content. Amazon Rekognition also provides highly accurate facial analysis and facial search capabilities that you can use to detect, analyze, and compare faces for a wide variety of user verification, people counting, and public safety use cases.

With Amazon Rekognition Custom Labels, you can identify the objects and scenes in images that are specific to your business needs. For example, you can build a model to classify specific machine parts on your assembly line or to detect unhealthy plants. Amazon Rekognition Custom Labels takes care of the heavy lifting of model development for you, so no machine learning experience is required. You simply need to supply images of objects or scenes you want to identify, and the service handles the rest.

FEATURES

Labels

With Amazon Rekognition, you can identify thousands of objects (such as bike, telephone, building), and scenes (such as parking lot, beach, city). When analyzing video, you can also identify specific activities such as “delivering a package” or “playing soccer”.
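As a minimal sketch of label detection, the helper below wraps the `detect_labels` operation of the AWS SDK for Python (boto3). The helper names, the thresholds and the file path passed to `main` are illustrative assumptions; running `main` requires the boto3 package and configured AWS credentials.

```python
def top_labels(client, image_bytes, max_labels=10, min_confidence=80.0):
    """Return (name, confidence) pairs for the labels detected in an image."""
    response = client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]


def main(path):
    """Print the labels for a local image file (path is supplied by the caller)."""
    import boto3  # imported here so top_labels can be used without boto3 installed

    client = boto3.client("rekognition")
    with open(path, "rb") as image_file:
        for name, confidence in top_labels(client, image_file.read()):
            print(f"{name}: {confidence:.1f}%")
```

Passing the client in as a parameter keeps the helper easy to exercise with a stand-in object before any real AWS calls are made.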

Custom labels

With Amazon Rekognition Custom Labels, you can extend the detection capabilities of Amazon Rekognition to extract information from images that is uniquely helpful to your business. For example, you can find your corporate logo in social media, identify your products on store shelves, classify your machine parts in an assembly line, or detect your animated characters in videos.

Content moderation

Amazon Rekognition helps you identify potentially unsafe or inappropriate content across both image and video assets and provides you with detailed labels that allow you to accurately control what you want to allow based on your needs. Use Amazon A2I to enhance the accuracy of Amazon Rekognition image moderation predictions using human review.
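One way to act on the moderation labels is sketched below, using the `detect_moderation_labels` operation in boto3. The `check_image` helper, the confidence threshold and the idea of a caller-supplied set of blocked categories are assumptions for illustration; the category names you block should come from the label taxonomy in the current Rekognition documentation.

```python
def check_image(client, image_bytes, blocked_categories, min_confidence=60.0):
    """Return (allowed, flagged) where flagged lists the blocked categories found."""
    response = client.detect_moderation_labels(
        Image={"Bytes": image_bytes},
        MinConfidence=min_confidence,
    )
    found = set()
    for label in response["ModerationLabels"]:
        found.add(label["Name"])
        # Second-level labels carry their top-level category in ParentName.
        if label.get("ParentName"):
            found.add(label["ParentName"])
    flagged = sorted(found & set(blocked_categories))
    return (not flagged), flagged
```

A caller would create the client with `boto3.client("rekognition")` and pass in the image bytes together with the categories its own policy disallows.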

Text detection

In photos and videos, text appears very differently than neat words on a printed page. Amazon Rekognition can read skewed and distorted text to capture information like store names, forced narratives overlaid on media, street signs, and text on product packaging.
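Reading text from an image can be sketched with the `detect_text` operation in boto3, which returns both whole lines and individual words. The `detected_lines` helper and its confidence cut-off are illustrative assumptions, not part of the service.

```python
def detected_lines(client, image_bytes, min_confidence=90.0):
    """Return the lines of text found in an image, skipping low-confidence hits."""
    response = client.detect_text(Image={"Bytes": image_bytes})
    return [
        detection["DetectedText"]
        for detection in response["TextDetections"]
        # Each detection is typed as a LINE or a WORD; keep whole lines only.
        if detection["Type"] == "LINE" and detection["Confidence"] >= min_confidence
    ]
```

Filtering on lines rather than words avoids returning the same text twice, since every word is also reported as part of its parent line.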

Face detection and analysis

With Amazon Rekognition, you can easily detect when faces appear in images and videos and get attributes such as gender, age range, eyes open, glasses, and facial hair for each. In video, you can also measure how these face attributes change over time, such as constructing a timeline of the emotions expressed by an actor.

Face search and verification

Amazon Rekognition provides fast and accurate face search, allowing you to identify a person in a photo or video using your private repository of face images. You can also verify identity by analyzing a face image against images you have stored for comparison.
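Verification against a stored image can be sketched with the `compare_faces` operation in boto3, which compares the largest face in a source image with the faces in a target image. The `best_similarity` helper and the 90% threshold are assumptions chosen for illustration; an appropriate threshold depends on the use case.

```python
def best_similarity(client, source_bytes, target_bytes, threshold=90.0):
    """Return the highest similarity score at or above threshold, or None."""
    response = client.compare_faces(
        SourceImage={"Bytes": source_bytes},
        TargetImage={"Bytes": target_bytes},
        SimilarityThreshold=threshold,
    )
    # FaceMatches only contains faces that met the similarity threshold.
    similarities = [match["Similarity"] for match in response["FaceMatches"]]
    return max(similarities) if similarities else None
```

A return value of `None` means no face in the target image met the threshold, which a caller would treat as a failed verification.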

Celebrity recognition

You can quickly identify well-known people in your video and image libraries to catalog footage and photos for marketing, advertising, and media industry use cases.

Pathing

You can capture the path of people in the scene when using Amazon Rekognition with video files. For example, you can use the movement of athletes during a game to identify plays for post-game analysis.

USES

Media Analysis

Make content searchable

Amazon Rekognition automatically extracts metadata from your image and video files, capturing objects, faces, text and more. This metadata can be used to easily search your images and videos with keywords, or to find the right assets for content syndication.

Enable digital identity verification

Using Amazon Rekognition, you can create scalable authentication workflows for automated payments and other identity verification scenarios. Amazon Rekognition lets you easily perform face verification for opted-in users by comparing a photo or selfie with an identifying document such as a driver’s license.

Respond quickly to public safety challenges

Amazon Rekognition allows you to create applications that help find missing persons in images and videos. By searching for their faces against a database of missing persons that you provide, you can accurately flag potential matches and speed up a rescue operation.

Identify products, landmarks and brands

App developers can use Amazon Rekognition Custom Labels to identify specific items in social media and photo apps. For example, you could train a custom model to identify famous landmarks in a city to provide tourists with information about its history, operating hours, and ticket prices by simply taking a photo.
