Image Analysis 4.0 with new API endpoint and OCR model in preview

25

Oct

Image Analysis 4.0 with new API endpoint and OCR model in preview

Enterprises and hobbyists alike have been using Azure Computer Vision’s Image Analysis API to garner various insights from their images. These insights help power scenarios such as digital asset management, search engine optimization (SEO), image content moderation, and alt text for accessibility among others.

Newly improved features including read (OCR)

We are thrilled to announce the preview release of Computer Vision Image Analysis 4.0 which combines existing and new visual features such as read optical character recognition (OCR), captioning, image classification and tagging, object detection, people detection, and smart cropping into one API. One call is all it takes to run all these features on an image.

The OCR feature integrates more deeply with the Computer Vision service and includes performance improvements that are optimized for image scenarios that make OCR easy to use for user interfaces and near real-time experiences. Read now supports 164 languages including Cyrillic, Arabic, and Hindi.

Tested at scale and ready for deployment

Microsoft’s own products from PowerPoint, Designer, Word, Outlook, Edge, and LinkedIn are using Vision APIs to power design suggestions, alt text for accessibility, SEO, document processing, and content moderation.

You can get started with the preview by trying out the visual features with your images on Vision Studio. Upgrading from a previous version of the Computer Vision Image Analysis API to V4.0 is simple with these instructions.

We will continue to release breakthrough vision AI through this new API over the coming months, including capabilities powered by the Florence foundation model featured in this year’s premiere computer vision conference keynote at CVPR.

Additional Computer Vision services

Spatial Analysis is also in preview. You can use the spatial analysis feature to create apps that can count people in a room, understand dwell times in front of a retail display, and determine wait times in lines. Build solutions that enable occupancy management and social distancing, optimize in-store and office layouts, and accelerate the checkout process. By processing video streams from physical spaces, you're able to learn how people use them and maximize the space's value to your organization.

The Azure Face service provides AI algorithms that detect, recognize, and analyze human faces in images. Facial recognition software is important in many different scenarios, such as identity verification, touchless access control, and face blurring for privacy. Face service access is limited based on eligibility and usage criteria in order to support our Responsible AI principles. Face service is only available to Microsoft managed customers and partners. Use the Face Recognition intake form to apply for access. For more information, see the Face limited access page.

Computer Vision and Responsible AI

We are excited to see how our customers use Computer Vision’s Image Analysis API with these new and updated features. Our technology advancements are also guided by Microsoft’s Responsible AI process, and our principles of fairness, inclusiveness, reliability and safety, transparency, privacy and security, and accountability. We put these ethical standards into practice through the Office of Responsible AI (ORA)—which sets our rules and governance processes, the AI Ethics and Effects in Engineering and Research (Aether) Committee—which advises our leadership on the challenges and opportunities presented by AI innovations, and Responsible AI Strategy in Engineering (RAISE)—a team that enables the implementation of Microsoft Responsible AI rules across engineering groups.

Get started

Start improving how you analyze images with Image Analysis 4.0 with a unified API endpoint and a new OCR Model.

Computer Vision documentation.
Image Analysis documentation.
Quick Start for Image Analysis.
Vision Studio for demoing product solutions.

« Azure Scales 530B Parameter GPT-3 Model with NVIDIA NeMo Megatron Introducing Vision Studio, a UI-based demo interface for Computer Vision »