Category Archives : Media Services & CDN



Using AI to automatically redact faces in videos

In the last few years, many law enforcement agencies have adopted body-worn cameras. In this blog post, I will provide some background on what is driving that growth and discuss how AI can help law enforcement agencies process the videos captured by body-worn cameras.

Background on body-worn cameras

A body-worn camera is a wearable audio, video, or photographic recording system. Law enforcement agencies are not the only consumers of body-worn cameras. Other consumers include journalists, medical professionals, athletes, and so on. The forecast unit shipments of body-worn cameras can be seen on this webpage published by Statista.

The National Institute of Justice (NIJ), the research, development, and evaluation agency of the US Department of Justice, conducted research on body-worn cameras for law enforcement and a market survey on body-worn cameras for criminal justice. The survey, updated in 2016, aggregates and summarizes information on a number of makes and models of body-worn cameras available today, including the approximate cost of each unit. The full market survey on body-worn camera technologies can be found on NIJ’s website.

Freedom of Information Act (FOIA)

FOIA is defined on as a law that gives citizens the right




Brand Detection in Microsoft Video Indexer

We are delighted to announce a new capability in Microsoft Video Indexer: Brand Detection from speech and from visual text! If you are not yet familiar with Video Indexer, you may want to take a look at a few examples on our portal.

Having brands in the video index gives you insights into the names of products and organizations that appear in a video or audio asset, without having to watch it. In particular, it enables you to search over large amounts of video and audio. Customers find Brand Detection useful in a wide variety of business scenarios such as content archive and discovery, contextual advertising, social media analysis, retail competitive analysis, and many more.

Out of the box brand detection

Let us take a look at an example. In this Microsoft Build 2017 Day 2 presentation, the brand “Microsoft Windows” appears multiple times: sometimes in the transcript and sometimes as visual text, but never verbatim. Video Indexer detects with high precision that a term is indeed a brand based on the context. It covers over 90k brands out of the box, and the list is constantly updated. At 02:25, Video Indexer detects the brand from speech and then again at 02:40 from visual text, which is




How is AI for video different from AI for images

Extracting insights from video using AI technologies presents an additional set of challenges and opportunities for optimization compared to images. There is a misconception that AI for video is simply extracting frames from a video and running computer vision algorithms on each video frame. While you can certainly do that, it would not help you get the insights that you are truly after. In this blog post, I will use a few examples to explain the shortcomings of processing only individual video frames. I will not be going over the details of the additional algorithms that are required to overcome these shortcomings. Video Indexer implements several such video-specific algorithms.

Person presence in the video

Look at the first 25 seconds of this video.

Notice that Doug is present for the entire 25 seconds.

If I were to draw a timeline for when Doug is present in the video, it should be something like this.


Note that Doug is not always facing the camera. Seven seconds into the video, he is looking at Emily. The same thing happens at 23 seconds.
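To make the timeline idea concrete, here is a minimal sketch of how per-frame results could break up a presence timeline. Everything in it is an assumption for illustration: the per-second detection flags are made-up data (with misses at seconds 7 and 23, when Doug turns away), `presence_intervals` is a hypothetical helper, and the gap-bridging rule is a simple stand-in, not the algorithm Video Indexer uses.

```python
from typing import List, Tuple


def presence_intervals(detected: List[bool], max_gap: int = 0) -> List[Tuple[int, int]]:
    """Turn per-second detection flags into (start, end) intervals, end exclusive.

    Runs of missed seconds no longer than max_gap are bridged, so a brief
    miss does not split one appearance into two.
    """
    intervals: List[Tuple[int, int]] = []
    for second, seen in enumerate(detected):
        if not seen:
            continue
        # Extend the last interval if this detection is close enough to it.
        if intervals and second - intervals[-1][1] <= max_gap:
            intervals[-1] = (intervals[-1][0], second + 1)
        else:
            intervals.append((second, second + 1))
    return intervals


# Hypothetical per-second face detection results for the first 25 seconds:
# the detector misses Doug at seconds 7 and 23, when he looks away.
detected = [True] * 25
detected[7] = False
detected[23] = False

# Frame-by-frame view: three separate appearances.
print(presence_intervals(detected))             # [(0, 7), (8, 23), (24, 25)]

# Bridging one-second gaps: a single continuous appearance.
print(presence_intervals(detected, max_gap=1))  # [(0, 25)]
```

The frame-by-frame result reports three separate appearances, while the bridged result matches the single continuous timeline described above. A fixed gap threshold is only the simplest possible fix; it cannot tell a person who briefly turns away from one who actually leaves the scene.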

If you were to run face detection at




Bring your own vocabulary to Microsoft Video Indexer

Self-service customization for speech recognition

Video Indexer (VI) now supports industry and business specific customization for automatic speech recognition (ASR) through integration with the Microsoft Custom Speech Service!

ASR is an important audio analysis feature in Video Indexer. Speech recognition is artificial intelligence at its best, mimicking the human cognitive ability to extract words from audio. In this blog post, we will learn how to customize ASR in VI, to better fit specialized needs.

Before we get into the technical details, let’s take inspiration from a situation we have all experienced. Try to recall your first days on a job. You can probably remember feeling flooded with new words, product names, cryptic acronyms, and ways to use them. After some time, however, you came to understand all these new words. You adapted yourself to the vocabulary.

ASR systems are great, but when it comes to recognizing a specialized vocabulary, ASR systems are just like humans. They need to adapt. Video Indexer now supports a customization layer for speech recognition, which allows you to teach the ASR engine new words, acronyms, and how they are used in your business context.

How does Automatic Speech Recognition work? Why is customization needed?

Roughly speaking,