Category Archives: Cognitive Services

23 Aug

Speech Services August 2018 update

We are pleased to announce the release of another update to the Cognitive Services Speech SDK (version 0.6.0). With this release, we have added support for Java on Windows 10 (x64) and Linux (x64), and we have extended .NET Standard 2.0 support to the Linux platform. The changes are highlighted in the table below. The samples section of the SDK has been updated with samples showcasing the newly supported languages. UWP support was added in the Speech SDK 0.5.0 release, and starting now, UWP apps built with the Speech SDK can be published to the Microsoft Store.

We also included several bug fixes reported by early adopters. Most notably, these fix errors in long-running speech transcriptions and reduce the number of in-use socket connections and threads.

Other functional changes, breaking changes, and bug fixes can be found in the Speech SDK’s release notes. For questions about the Speech SDK and Speech Services, please visit our support page.

There are also changes that impact the Speech Devices SDK. To provide a bit of background, the Speech Devices SDK is for our devices solution. It

21 Aug

Logic Apps and Flow connectors will make automating Video Indexer simpler than ever

Video Indexer recently released a new and improved Video Indexer V2 API. This RESTful API supports both server-to-server and client-to-server communication and enables Video Indexer users to integrate video and audio insights easily into their application logic, unlocking new experiences and monetization opportunities.

To make the integration even easier, we also added new Logic Apps and Flow connectors that are compatible with the new API. Using the new connectors, you can now set up custom workflows to effectively index and extract insights from large numbers of video and audio files, without writing a single line of code! Furthermore, using the connectors for your integration gives you better visibility into the health of your flow and an easy way to debug it.

To help you get started quickly with the new connectors, we’ve added Microsoft Flow templates that use the new connectors to automate extraction of insights from videos. In this blog, we will walk you through those example templates.
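Under the hood, the connector templates call the Video Indexer V2 REST API's upload operation. As a rough sketch of what that request looks like (the location, account ID, and token values below are placeholders, and this is an illustration rather than the connector's exact implementation):

```python
# Sketch of the Video Indexer V2 "Upload Video" call that a Flow template
# issues on your behalf when a new file lands in the watched OneDrive folder.
from urllib.parse import urlencode

def build_upload_url(location, account_id, access_token, video_name, video_url):
    """Build the V2 upload request URL (sent as a POST with an empty body)."""
    base = f"https://api.videoindexer.ai/{location}/Accounts/{account_id}/Videos"
    params = urlencode({
        "accessToken": access_token,
        "name": video_name,
        "videoUrl": video_url,  # a publicly reachable link, e.g. a OneDrive share link
        "privacy": "Private",
    })
    return f"{base}?{params}"

url = build_upload_url("trial", "<account-id>", "<access-token>",
                       "demo", "https://example.com/video.mp4")
# POST this URL with any HTTP client; the response JSON includes the new video's id,
# which later requests use to poll indexing state and fetch the insights.
```

In the no-code version, the Flow template performs this call for you when the OneDrive trigger fires.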

Upload and index your video automatically

This scenario comprises two flows that work together. The first flow is triggered when a new file is added to a designated folder in a OneDrive account. It uploads the new

09 Aug

Speech Devices SDK and Dev Kits news (August 2018)

We released v0.5.0 of the Speech Devices SDK to its download site a few days ago. If you want access to it, please apply via the Microsoft Speech Devices SDK Sign Up Form.

Please use the latest version of the Speech Devices SDK with its matching sample app. The Speech Devices SDK consumes the Speech SDK; in addition, it applies an advanced audio processing algorithm and enables a custom Key Word Spotting (KWS) feature. The Speech Devices SDK’s version matches the version of the Speech SDK, so that all the APIs are consistent. Right now only Java is supported; see the Java API reference. Please see the Speech Devices SDK’s release notes for details. We currently have one developer-centric Java sample app; its source code is posted in the GitHub repository under the Samples/Android example. We will post additional Java sample apps as they become available.

The microphone array dev kits from our hardware provider Roobo have also gone on sale recently. Please visit ROOBO Dev Kits for Microsoft Speech Services for the hardware specs and product details. If you have questions regarding the device hardware, including ordering and

02 Jul

Azure Search – Announcing the general availability of synonyms

Today we are announcing the general availability of synonyms. Synonyms allow Azure Search to associate equivalent terms that implicitly expand the scope of a query, without the user having to provide the alternate terms.

A good example of this capability was demonstrated at the recent Microsoft Build conference, where we showed how NBA.com searches their vast photo library of players, owners, and celebrities. In this application Azure Search synonyms are used to enable nicknames of Lebron James such as “The King” or “King James” to be returned regardless of which of the three terms are used in the query.

In Azure Search, synonym support is based on synonym maps that you define and upload to your search service. These maps constitute an independent resource, like indexes or data sources, and can be used by any searchable field in any index in your search service. Synonym maps use the Apache Solr format, as outlined in the example synonym map below:

POST https://[servicename].search.windows.net/synonymmaps?api-version=2017-11-11
api-key: [admin key]

{
  "name": "mysynonymmap",
  "format": "solr",
  "synonyms": "USA, United States, United States of America\nWashington, Wash., WA => WA\n"
}

In the above example, you can see there are two types of synonyms that are
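A synonym map takes effect only once an index field references it. Below is a minimal sketch of such a field definition, assuming the same 2017-11-11 API version; the index name and field name here are hypothetical:

```python
# Sketch: once "mysynonymmap" exists, reference it from a searchable field in the
# index definition so that queries against that field expand synonyms implicitly.
import json

field = {
    "name": "description",
    "type": "Edm.String",
    "searchable": True,
    "synonymMaps": ["mysynonymmap"],  # the map uploaded in the request above
}

# PUT https://[servicename].search.windows.net/indexes/myindex?api-version=2017-11-11
body = json.dumps({"name": "myindex", "fields": [field]})
```

With this in place, a query for "USA" against the description field would also match documents containing "United States".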

28 Jun

Get video insights in (even) more languages!

For those of you who might not have tried it yet, Video Indexer is a cloud application and platform built on media AI technologies to make it easier to extract insights from video and audio files. As a starting point for extracting the textual part of the insights, the solution creates a transcript based on the speech in the file; this process is referred to as Speech-to-text. Today, Video Indexer’s Speech-to-text supports ten languages: English, Spanish, French, German, Italian, Chinese (Simplified), Portuguese (Brazilian), Japanese, Arabic, and Russian.

However, if the content you need is not in one of the languages above, fear not! Video Indexer partners with other transcription service providers to extend its Speech-to-text capabilities to many more languages. One of those partnerships is with Zoom Media, which extends Speech-to-text to Dutch, Danish, Norwegian, and Swedish.

A great example of using Video Indexer and Zoom Media is the Dutch public broadcaster AVROTROS, which uses Video Indexer to analyze videos and allow editors to search through them. Finus Tromp, Head of Interactive Media at AVROTROS, shared, “We use Microsoft Video Indexer on a daily basis to supply our videos with relevant metadata. The gathered

13 Jun

Bing Visual Search and Entity Search APIs for video apps

In this blog, I will go over how you can use the Bing Visual Search API, in combination with Bing Entity Search API to build an enhanced viewing experience in your video app.

General availability of the Bing Visual Search API was announced at Build 2018, in this blog. The Bing Visual Search API enables you to use an image as a query to get information about the entities in the image, along with a list of visually similar images from the image index built by Bing. GA of Bing Entity Search was announced in this blog, published on March 1st, 2018. The Bing Entity Search API enables you to bring rich contextual information about people, places, things, and local businesses to any application, blog, or website for a more engaging user experience.
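As a rough sketch of how the two services chain together: the paused frame goes to Visual Search, and the recognized entity's name then feeds an Entity Search lookup. Endpoints and header names below follow the Bing v7.0 REST APIs; the subscription key is a placeholder, and the response-handling steps are summarized in comments rather than implemented:

```python
# Sketch of chaining Bing Visual Search and Bing Entity Search for a video app.
from urllib.parse import urlencode

VISUAL_SEARCH = "https://api.cognitive.microsoft.com/bing/v7.0/images/visualsearch"
ENTITY_SEARCH = "https://api.cognitive.microsoft.com/bing/v7.0/entities"
HEADERS = {"Ocp-Apim-Subscription-Key": "<your-key>"}

def entity_search_url(entity_name, market="en-US"):
    """Build the Entity Search request for a name returned by Visual Search."""
    return ENTITY_SEARCH + "?" + urlencode({"q": entity_name, "mkt": market})

# 1) POST the paused frame's bytes as multipart/form-data (form field "image")
#    to VISUAL_SEARCH with HEADERS.
# 2) In the response, scan tags[*].actions[*] for entries with
#    actionType == "Entity" to get recognized entity names.
# 3) GET entity_search_url(name) to fetch rich contextual info to overlay
#    in the video app's paused view.
```

The NBA.com example from the post fits this flow: a frame containing LeBron James yields an entity name from Visual Search, and Entity Search then returns the contextual card to display.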

By combining the power of these two APIs, you can build a more engaging experience in your video app by following the steps listed below:

1. Write a JavaScript function that triggers when the user clicks the pause button in your video app. In this JavaScript function, grab the paused video frame as an image. Take a look at this discussion to learn more about how to do this.
2. Pass the

04 Jun

Speech services now in preview

This blog post was authored by the Microsoft Speech Services team​.

At Microsoft Build 2018, the Microsoft Speech Services team announced the following new and improved products and services.

- Speech service, as a preview, including Speech to Text with custom speech, Text to Speech with custom voice, and Speech Translation.
- Speech SDK, as a preview, which will replace the old Bing Speech APIs when generally available in fall 2018. It will be the single SDK for most of our speech services and will require only one Azure subscription key for speech recognition and LUIS (Language Understanding service). With simplified APIs, the Speech SDK makes development easy for new and experienced speech developers.
- Speech Devices SDK, as a restricted preview, which includes an advanced multi-microphone array audio processing algorithm fine-tuned to the backend Speech Services, works great on Roobo’s dev kits for exceptional speech experiences, and offers the ability to customize the wake word to strengthen your brand.

To learn more, please read the ZDNet article highlighting these products and services.

We also demonstrated our speech recognition capabilities in Satya Nadella’s vision keynote at Microsoft Build 2018. You can skip to the 1:22:40 mark if you want to jump to

10 May

Fully integrated experience simplifying Language Understanding in conversational AI systems

Creating an advanced conversational system is now a simple task with the powerful tools integrated into Microsoft’s Language Understanding Service (LUIS) and Bot Framework. LUIS brings together cutting-edge speech, machine translation, and text analytics on the most enterprise-ready platform for creating conversational systems. In addition to these features, LUIS is GDPR, HIPAA, and ISO compliant, enabling it to deliver exceptional service across global markets.

Talk or text?

Bots and conversational AI systems are quickly becoming a ubiquitous technology, enabling natural interactions with users. Speech remains one of the most widely used input forms and comes naturally to mind when thinking of conversational systems. This requires integrating speech recognition with language understanding in conversational systems. Individually, speech recognition and language understanding are among the most difficult problems in cognitive computing. Introducing the context of language understanding improves the quality of speech recognition. Through intent-based speech priming, an utterance’s context is interpreted using the language model, improving the performance of both speech recognition and language understanding. Intent-based speech recognition priming uses the utterances and entity tags in your LUIS models to improve accuracy and relevance while converting audio to text. Incorrectly recognized spoken phrases or