We are pleased to introduce the ability to export high-resolution keyframes from Azure Media Service’s Video Indexer. Whereas keyframes were previously exported in reduced resolution compared to the source video, high resolution keyframes extraction gives you original quality images and allows you to make use of the image-based artificial intelligence models provided by the Microsoft Computer Vision and Custom Vision services to gain even more insights from your video. This unlocks a wealth of pre-trained and custom model capabilities. You can use the keyframes extracted from Video Indexer, for example, to identify logos for monetization and brand safety needs, to add scene description for accessibility needs or to accurately identify very specific objects relevant for your organization, like identifying a type of car or a place.
Let’s look at some of the use cases we can enable with this new introduction.
Using keyframes to get image description automatically
You can automate the process of “captioning” different visual shots of your video through the image description model within Computer Vision, in order to make the content more accessible to people with visual impairments. This model provides multiple description suggestions along with confidence values for an image. You can take the descriptions
This post is co-authored by Anny Dow, Product Marketing Manager, Azure Cognitive Services.
In an age where low-latency and data security can be the lifeblood of an organization, containers make it possible for enterprises to meet these needs when harnessing artificial intelligence (AI).
Since introducing Azure Cognitive Services in containers this time last year, businesses across industries have unlocked new productivity gains and insights. The combination of both the most comprehensive set of domain-specific AI services in the market and containers enables enterprises to apply AI to more scenarios with Azure than with any other major cloud provider. Organizations ranging from healthcare to financial services have transformed their processes and customer experiences as a result.
These are some of the highlights from the past year:
Employing anomaly detection for predictive maintenance
Airbus Defense and Space, one of the world’s largest aerospace and defense companies, has tested Azure Cognitive Services in containers for developing a proof of concept in predictive maintenance. The company runs Anomaly Detector for immediately spotting unusual behavior in voltage levels to mitigate unexpected downtime. By employing advanced anomaly detection in containers without further burdening the data scientist team, Airbus can scale this critical capability across
Multi-language speech transcription was recently introduced into Microsoft Video Indexer at the International Broadcasters Conference (IBC). It is available as a preview capability and customers can already start experiencing it in our portal. More details on all our IBC2019 enhancements can be found here.
Multi-language videos are common media assets in the globalization context, global political summits, economic forums, and sport press conferences are examples of venues where speakers use their native language to convey their own statements. Those videos pose a unique challenge for companies that need to provide automatic transcription for video archives of large volumes. Automatic transcription technologies expect users to explicitly determine the video language in advance to convert speech to text. This manual step becomes a scalability obstacle when transcribing multi-language content as one would have to manually tag audio segments with the appropriate language.
Microsoft Video Indexer provides a unique capability of automatic spoken language identification for multi-language content. This solution allows users to easily transcribe multi-language content without going through tedious manual preparation steps before triggering it. By that, it can save anyone with large archive of videos both time and money, and enable discoverability and accessibility scenarios.
Multi-language audio transcription in Video
This post was co-authored by Tina Coll, Sr Product Marketing Manager, Azure Cognitive Services.
Innovate at no cost to you, with out-of-the box AI services that are newly available for Azure free account users. Join the 1.3 million developers who have been using Cognitive Services to build AI powered apps to date. With the broadest offering of AI services in the market, Azure Cognitive Services can unlock AI for more scenarios than other cloud providers. Give your apps, websites, and bots the ability to see, understand, and interpret people’s needs — all it takes is an API call — by using natural methods of communication. Businesses in various industries have transformed how they operate using the very same Cognitive Services now available to you with an Azure free account.
These examples are just a small handful of what you can make possible with these services:
Improve app security with face detection: With Face API, detect and compare human faces. See how Uber uses Face API to authenticate drivers. Automatically extract text and detect languages: Easily and accurately detect the language of any text string, simplifying development
Who spends their summer at the Microsoft Garage New England Research & Development Center (or “NERD”)? The Microsoft Garage internship seeks out students who are hungry to learn, not afraid to try new things, and able to step out of their comfort zones when faced with ambiguous situations. The program brought together Grace Hsu from Massachusetts Institute of Technology, Christopher Bunn from Northeastern University, Joseph Lai from Boston University, and Ashley Hong from Carnegie Mellon University. They chose the Garage internship because of the product focus—getting to see the whole development cycle from ideation to shipping—and learning how to be customer obsessed.
Microsoft Garage interns take on experimental projects in order to build their creativity and product development skills through hacking new technology. Typically, these projects are proposals that come from our internal product groups at Microsoft, but when Stanley Black & Decker asked if Microsoft could apply image recognition for asset management on construction sites, this team of four interns accepted the challenge of creating a working prototype in twelve weeks.
Starting with a simple request for leveraging image recognition, the team conducted market analysis and user research to ensure the product would stand out and prove useful.
Data extraction from printed forms is by now a tried and true technology. Form Recognizer extracts key value pairs, tables and text from documents such as W2 tax statements, oil and gas drilling well reports, completion reports, invoices, and purchase orders. However, real-world businesses often rely on a variety of documents for their day-to-day needs that are not always cleanly printed.
We are excited to announce the addition of handwritten and mixed-mode (printed and handwritten) support. Starting now, handling handwritten and mixed-mode forms is the new norm.
Extracting data from handwritten and mixed-mode content with Form Recognizer
Entire data sets that were inaccessible in the past due to the limitations of extraction technology now become available. The handwritten and mixed-mode capability of Form Recognizer is available in preview and enables you to extract structured data out of handwritten text filled in forms such as:
Medical forms: New patient information, doctor notes. Financial forms: Account opening forms, credit card applications. Insurance: Claim forms, liability forms. Manufacturing forms: Packaging slips, testing forms, quality forms. And more.
By using our vast experience in optical character recognition (OCR) and machine learning for form analysis, our experts created a state-of-the-art solution that goes beyond printed forms. The OCR technology
Video Indexer (VI), the AI service for Azure Media Services enables the customization of language models by allowing customers to upload examples of sentences or words belonging to the vocabulary of their specific use case. Since speech recognition can sometimes be tricky, VI enables you to train and adapt the models for your specific domain. Harnessing this capability allows organizations to improve the accuracy of the Video Indexer generated transcriptions in their accounts.
Over the past few months, we have worked on a series of enhancements to make this customization process even more effective and easy to accomplish. Enhancements include automatically capturing any transcript edits done manually or via API as well as allowing customers to add closed caption files to further train their custom language models.
The idea behind these additions is to create a feedback loop where organizations begin with a base out-of-the-box language model and improve its accuracy gradually through manual edits and other resources over a period of time, resulting with a model that is fine-tuned to their needs with minimal effort.
Accounts’ custom language models and all the enhancements this blog shares are private and are not shared between accounts.
In the following sections I
At Build, we highlighted a few customers who are building conversational experiences using the Bot Framework to transform their customer experiences. For example, BMW discussed its work on the BMW Intelligent Personal Assistant to deliver conversational experiences across multiple canvases by leveraging the Bot Framework and Cognitive Services. LaLiga built their own virtual assistant which allows fans to experience and interact with LaLiga across multiple platforms.
With the Bot Framework release in July, we are happy to share new releases of Bot Framework SDK 4.5 and preview of 4.6, updates to our developer tools, and new channels in Azure Bot Service. We’ll use the opportunity to provide additional updates for the Conversational AI releases from Microsoft.
Bot Framework channels
We continue to expend channels support and functionality for Bot Framework and Azure Bot Service.
Voice-first bot applications: Direct Line Speech preview
The Microsoft Bot Framework lets you connect with your users wherever your users are. We offer thirteen supported channels, including popular messaging apps like Skype, Microsoft Teams, Slack, Facebook Messenger, Telegram, Kik, as well as a growing number of community adapters.
Today, we are happy to share the preview of Direct Line Speech channel. This is a new channel
One of the newest members of the Azure AI portfolio, Form Recognizer, applies advanced machine learning to accurately extract text, key-value pairs, and tables from documents. With just a few samples, it tailors its understanding to supplied documents, both on-premises and in the cloud.
Introducing the new pre-built receipt capability
Form Recognizer focuses on making it simpler for companies to utilize the information hiding latent in business documents such as forms. Now we are making it easier to handle one the most commonplace documents in a business, receipts, “out of the box.” Form Recognizer’s new pre-built receipt API identifies and extracts key information on sales receipts, such as the time and date of the transaction, merchant information, the amounts of taxes, totals, and more, with no training required.
Streamlining expense reporting
Business expense reporting can be cumbersome for everyone involved in the process. Manually filling out and approving expense reports is a significant time sink for both employees and managers. Aside from productivity lost to expense reporting, there are also pain points around auditing expense reports. A solution to automatically extract merchant and transaction information from receipts can significantly reduce the manual effort of reporting and auditing expenses.
Data is rarely simple. Not every piece of data we have can fit nicely into a single Excel worksheet of rows and columns. Data has many diverse relationships such as the multiple locations and phone numbers for a single customer or multiple authors and genres of a single book. Of course, relationships typically are even more complex than this, and as we start to leverage AI to understand our data the additional learnings we get only add to the complexity of relationships. For that reason, expecting customers to have to flatten the data so it can be searched and explored is often unrealistic. We heard this often and it quickly became our number one most requested Azure Search feature. Because of this we were excited to announce the general availability of complex types support in Azure Search. In this post, I want to take some time to explain what complex types adds to Azure Search and the kinds of things you can build using this capability.
Azure Search is a platform as a service that helps developers create their own cloud search solutions.
What is complex data?
Complex data consists of data that includes hierarchical or nested substructures that do