Your large archive of videos to index is ever-expanding. You have been evaluating Microsoft Video Indexer and have decided to take your relationship with it to the next level by scaling up.
In general, scaling shouldn’t be difficult, but when you first face such a process you may not be sure of the best way to do it. Questions like “Are there any technological constraints I need to take into account?”, “Is there a smart and efficient way of doing it?”, and “Can I avoid spending excess money in the process?” may cross your mind. So, here are six best practices for using Video Indexer at scale.
1. When uploading videos, prefer URL over sending the file as a byte array
Video Indexer does give you the choice to upload videos either from a URL or directly by sending the file as a byte array, but remember that the latter comes with some constraints.
First, it has file size limitations: a byte-array upload is limited to 2 GB, compared with the 30 GB upload size limit when uploading from a URL.
Second, and more importantly for your scaling, sending the file as a byte array requires a multi-part upload, which means a high dependency on your network’s stability and bandwidth for the entire duration of the transfer.
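To make the preferred path concrete, here is a minimal sketch of an upload-by-URL call against the Video Indexer REST API. The location, account id, access token, and video URL below are placeholders, and the exact parameter set may vary by API version; treat this as an illustration rather than a definitive client.

```python
import requests

# Placeholders (assumed for illustration): your account's Azure region,
# account id, and a valid access token from the access token API.
LOCATION = "trial"
ACCOUNT_ID = "<your-account-id>"
ACCESS_TOKEN = "<your-access-token>"

# Upload by URL: Video Indexer fetches the file server-side, so there is
# no multi-part transfer from your machine and the 30 GB limit applies.
response = requests.post(
    f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}/Videos",
    params={
        "accessToken": ACCESS_TOKEN,
        "name": "my-video",
        "privacy": "Private",
        "videoUrl": "https://example.com/videos/my-video.mp4",
    },
)
response.raise_for_status()
print(response.json()["id"])  # id of the newly created video
```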
We are pleased to introduce the ability to export high-resolution keyframes from Azure Media Services’ Video Indexer. Whereas keyframes were previously exported at a reduced resolution compared to the source video, high-resolution keyframe extraction gives you original-quality images and allows you to use the image-based artificial intelligence models provided by the Microsoft Computer Vision and Custom Vision services to gain even more insights from your video. This unlocks a wealth of pre-trained and custom model capabilities. You can use the keyframes extracted from Video Indexer, for example, to identify logos for monetization and brand-safety needs, to add scene descriptions for accessibility needs, or to accurately identify very specific objects relevant to your organization, such as a type of car or a place.
Let’s look at some of the use cases we can enable with this new introduction.
Using keyframes to get image descriptions automatically
You can automate the process of “captioning” different visual shots of your video through the image description model within Computer Vision, making the content more accessible to people with visual impairments. This model provides multiple description suggestions for an image, each with a confidence value. You can take the descriptions it returns and, for example, attach the highest-confidence caption to the corresponding shot as metadata.
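As an illustration, the sketch below sends a keyframe image URL to the Computer Vision describe operation and prints the caption candidates with their confidence values. The endpoint and key are placeholders for your own Computer Vision resource.

```python
import requests

# Placeholders (assumed): your Computer Vision resource endpoint and key.
CV_ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
CV_KEY = "<your-computer-vision-key>"

def describe_keyframe(image_url: str) -> list:
    """Return caption candidates (text plus confidence) for a keyframe URL."""
    response = requests.post(
        f"{CV_ENDPOINT}/vision/v3.2/describe",
        headers={"Ocp-Apim-Subscription-Key": CV_KEY},
        params={"maxCandidates": 3},
        json={"url": image_url},
    )
    response.raise_for_status()
    return response.json()["description"]["captions"]

# Example: caption one exported keyframe and keep the best suggestion.
captions = describe_keyframe("https://example.com/keyframes/frame_001.jpg")
best = max(captions, key=lambda c: c["confidence"])
print(f'{best["text"]} (confidence {best["confidence"]:.2f})')
```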
Preview: live transcription with Azure Media Services
Azure Media Services provides a platform with which you can broadcast live events. You can use our APIs to ingest, transcode, and dynamically package and encrypt your live video feeds for delivery via industry-standard protocols like HTTP Live Streaming (HLS). (https://azure.microsoft.com/blog/preview-live-transcription-with-azure-media-services/)
Animated character recognition, multilingual speech transcription and more now available
At Microsoft, our mission is to empower every person and organization on the planet to achieve more. The media industry exemplifies this mission. We live in an age where more content is being created and consumed in more ways and on more devices than ever. At IBC 2019, we’re delighted to share the latest innovations we’ve been working on and how they can help transform your media workflows. Read on to learn more, or join our product teams and partners at Hall 1 Booth C27 at the RAI in Amsterdam from September 13th to 17th.
Video Indexer adds support for animation and multilingual content
We made our award-winning Azure Media Services Video Indexer generally available at IBC last year, and this year it’s getting even better. Video Indexer automatically extracts insights and metadata, such as spoken words, faces, emotions, topics, and brands, from media files, without you needing to be a machine learning expert. Our latest announcements include previews for two highly requested and differentiated capabilities, animated character recognition and multilingual speech transcription, as well as several additions to the existing models available today in Video Indexer.
SIGGRAPH is back in Los Angeles and so is Microsoft Azure! I hope you can join us at Booth #1351 to hear from leading customers and innovative partners.
Teradici, Bebop, Support Partners, Blender, and more will be there to showcase the latest in cloud-based rendering and media workflows:
- See a real-time demonstration of Teradici’s PCoIP Workstation Access Software, showcasing how it enables a world-class end-user experience for graphics-accelerated applications on Azure’s NVIDIA GPUs.
- Experience a live demonstration of industry-standard visual effects, animation, and other post-production tools on the BeBop platform, the leading solution for cloud-based media and entertainment workflows, creativity, and collaboration.
- Learn more about how cloud integrator Support Partners enables companies to run complex and exciting hybrid workflows in Azure.
- Be the first to hear about Azure’s integration with Blender’s render manager Flamenco and how users can easily deploy a completely virtual render farm and file server. The Azure Flamenco Manager will be freely available on GitHub, and we can’t wait to hear how it is being used and get your feedback.
We’re also demonstrating how you can simplify the creation and management of hybrid cloud rendering environments, getting the most out of your on-prem investments while bursting to Azure when you need additional capacity.
Video Indexer (VI), the AI service for Azure Media Services, enables the customization of language models by allowing customers to upload examples of sentences or words belonging to the vocabulary of their specific use case. Since speech recognition can sometimes be tricky, VI enables you to train and adapt its models to your specific domain. Harnessing this capability allows organizations to improve the accuracy of the transcriptions Video Indexer generates in their accounts.
Over the past few months, we have worked on a series of enhancements to make this customization process even more effective and easy to accomplish. Enhancements include automatically capturing any transcript edits done manually or via API, as well as allowing customers to add closed caption files to further train their custom language models.
The idea behind these additions is to create a feedback loop in which organizations begin with a base, out-of-the-box language model and gradually improve its accuracy through manual edits and other resources over time, resulting in a model that is fine-tuned to their needs with minimal effort.
Custom language models, and all the enhancements this blog shares, are private to each account and are not shared between accounts.
In the following sections, I will walk through these enhancements and how you can use them.
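To ground the starting point of that feedback loop, here is a minimal sketch of seeding a custom language model with domain vocabulary through the Video Indexer customization API. The account details and file name are placeholders, and the modelName/language parameter names are assumptions based on the API’s documented shape, which may differ by version.

```python
import requests

# Placeholders (assumed): account details and a plain-text file with one
# domain-specific sentence or phrase per line.
LOCATION = "trial"
ACCOUNT_ID = "<your-account-id>"
ACCESS_TOKEN = "<your-access-token>"

with open("domain_vocabulary.txt", "rb") as vocabulary_file:
    response = requests.post(
        f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}"
        "/Customization/Language",
        params={
            "accessToken": ACCESS_TOKEN,
            "modelName": "newsroom-vocabulary",  # assumed parameter name
            "language": "en-US",
        },
        files={"domain_vocabulary.txt": vocabulary_file},
    )
response.raise_for_status()
print(response.json()["id"])  # id of the new language model
```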
Putting the intelligent cloud to work for content creators, owners and storytellers.
Stories entertain us, make us laugh and cry, and are the lens through which we perceive our world. In that world, increasingly overloaded with information, they catch our attention and, if they catch our hearts, we engage. This makes stories powerful, and it’s why so many large technology companies are investing heavily in content – creating it and selling it.
At Microsoft, we’re not in the business of content creation.
Why? Our mission is to help every person and organization on the planet achieve more. So instead of creating or owning content, we want to provide platforms to help content creators and owners achieve more, from the Intelligent Cloud to the Intelligent Edge, with industry-leading artificial intelligence (AI). We’re excited to see that mission come to life through customers such as Endemol Shine, Multichoice, RTL, Ericsson and partners like Avid, Akamai, Haivision, Pipeline FX and Verizon Digital Media Services. And we are excited to announce new Azure rendering, Azure Media Services, Video Indexer, and Azure Networking capabilities to help you achieve more at NAB Show 2019. Cue scene.
Fix it in post: higher resolution, less
After sweeping up multiple awards with the general availability release of Azure Media Services’ Video Indexer, including the 2018 IABM award for innovation in content management and the prestigious Peter Wayne award, the team has remained focused on building a wealth of new features and models that allow any organization with a large archive of media content to unlock insights from that content, and to use those insights to improve searchability, enable new user scenarios and accessibility, and open new monetization opportunities.
At NAB Show 2019, we are proud to announce a wealth of new enhancements to Video Indexer’s models and experiences, including:
- A new AI-based editor that allows you to create new content from existing media within minutes
- Enhancements to our custom people recognition, including central management of models and the ability to train models from images
- Language model training based on transcript edits, allowing you to effectively improve your language model to include your industry-specific terms
- A new scene segmentation model (preview)
- New ending rolling credits detection models
- Availability in 9 different regions worldwide
- ISO 27001, ISO 27018, SOC 1, 2, and 3, HITRUST, FedRAMP, HIPAA, and PCI certifications
- The ability to take your data and trained models with you when moving from trial to paid
Want to train Video Indexer to recognize people relevant specifically to your account? We have great news for you!
Face detection and recognition are both widely used insights that Video Indexer provides. The face recognition feature includes the ability to recognize around one million celebrity faces out of the box and to train account-level custom Person models to recognize non-celebrity people who are relevant to a customer’s specific organization. We have received multiple requests from customers to further enhance the capabilities of custom Person models. Today, we are happy to announce a wealth of enhancements that make custom Person model training and management faster and easier.
These enhancements include a centralized custom Person model management page that allows you to create multiple models in your account, each of which can hold up to one million different people. From this page, you can create new models and add new people to existing models; you can also review, rename, and delete your models if needed. On top of that, you can now train your account to identify people based on images of their faces even before you upload any video to your account (public preview). For instance, organizations that already have a collection of labeled face images can use it to train their Person models ahead of indexing any content.
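As an illustration of that flow, the sketch below creates a Person model, adds a person, and registers a face image, all through the Video Indexer customization API. The account details, names, and image URL are placeholders, and the payload shapes are assumptions that may differ by API version.

```python
import requests

# Placeholders (assumed): account details; endpoint and payload shapes
# follow the Video Indexer customization API but may vary by version.
LOCATION = "trial"
ACCOUNT_ID = "<your-account-id>"
ACCESS_TOKEN = "<your-access-token>"
BASE = f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}/Customization"

# 1. Create a custom Person model for your organization.
model = requests.post(
    f"{BASE}/PersonModels",
    params={"accessToken": ACCESS_TOKEN, "name": "our-people"},
).json()

# 2. Add a person to that model.
person = requests.post(
    f"{BASE}/PersonModels/{model['id']}/Persons",
    params={"accessToken": ACCESS_TOKEN, "name": "Jane Doe"},
).json()

# 3. Register face images for the person, before indexing any video.
requests.post(
    f"{BASE}/PersonModels/{model['id']}/Persons/{person['id']}/Faces",
    params={"accessToken": ACCESS_TOKEN},
    json={"urls": ["https://example.com/faces/jane_doe_01.jpg"]},
).raise_for_status()
```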
Video Indexer is an Azure service designed to extract deep insights from video and audio files offline, that is, to analyze media files that were already created in advance. However, for some use cases it is important to get the media insights from a live feed as quickly as possible, to unlock operational and other time-sensitive scenarios. For example, such rich metadata on a live stream could be used by content producers to automate TV production, as in our example of Endemol Shine Group, by journalists in a newsroom to search live feeds, to build content-based notification services, and more.
To that end, I joined forces with Victor Pikula, a Cloud Solution Architect at Microsoft, to architect and build a solution that allows customers to use Video Indexer in near real time on live feeds. The indexing delay can be as low as four minutes with this solution, depending on the size of the chunks of data being indexed, the input resolution, the type of content, and the compute power used for the process.
Figure 1 – Sample player displaying the Video Indexer metadata on the live stream
The stream analysis solution at hand uses Azure Media Services together with serverless Azure components to split the live feed into short chunks, index each chunk with Video Indexer, and surface the combined insights alongside the live stream.
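To illustrate the per-chunk indexing step, here is a minimal sketch of a loop that uploads one recorded chunk by URL and polls Video Indexer until its insights are ready. The account details, chunk URL, and function shape are placeholders; this is one possible form of the chunk-processing step, not the exact implementation of the solution described above.

```python
import time
import requests

# Placeholders (assumed): account details and the URL of one recorded
# chunk of the live feed.
LOCATION = "trial"
ACCOUNT_ID = "<your-account-id>"
ACCESS_TOKEN = "<your-access-token>"
BASE = f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}/Videos"

def index_chunk(chunk_url: str, name: str) -> dict:
    """Upload one live-stream chunk and poll until its insights are ready."""
    video_id = requests.post(
        BASE,
        params={"accessToken": ACCESS_TOKEN, "name": name, "videoUrl": chunk_url},
    ).json()["id"]

    while True:
        index = requests.get(
            f"{BASE}/{video_id}/Index", params={"accessToken": ACCESS_TOKEN}
        ).json()
        if index["state"] == "Processed":
            return index  # merge these insights into the live timeline
        time.sleep(10)  # small chunks keep end-to-end delay to a few minutes
```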