Category Archives: Data Science



Azure Machine Learning service now supports NVIDIA’s RAPIDS

Azure Machine Learning service is the first major cloud ML service to support NVIDIA’s RAPIDS, a suite of software libraries for accelerating traditional machine learning pipelines with NVIDIA GPUs.

Just as GPUs revolutionized deep learning through unprecedented training and inferencing performance, RAPIDS enables traditional machine learning practitioners to unlock game-changing performance with GPUs. With RAPIDS on Azure Machine Learning service, users can accelerate the entire machine learning pipeline, including data processing, training, and inferencing, with GPUs from the NC_v3, NC_v2, ND, or ND_v2 families. Users can unlock performance gains of more than 20X (with 4 GPUs), slashing training times from hours to minutes and dramatically reducing time-to-insight.

The following figure compares training times on CPU and GPUs (Azure NC24s_v3) for a gradient boosted decision tree model using XGBoost. As shown below, performance gains increase with the number of GPUs. In the Jupyter notebook linked below, we’ll walk through how to reproduce these results step by step using RAPIDS on Azure Machine Learning service.
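The notebook itself is not reproduced here. As a hedged illustration of the setting that moves XGBoost training onto a GPU, the sketch below builds an XGBoost parameter dictionary. The function name `xgboost_gpu_params`, the binary objective, and the depth and learning-rate values are illustrative assumptions, not taken from the notebook, and the multi-GPU setup behind the 4-GPU result is not shown.

```python
def xgboost_gpu_params(xgb_version: str, extra=None) -> dict:
    """Return XGBoost training parameters that request GPU histogram training.

    XGBoost releases before 2.0 select GPU training with tree_method="gpu_hist";
    2.0 and later use tree_method="hist" together with device="cuda".
    """
    major = int(xgb_version.split(".")[0])
    params = {
        "objective": "binary:logistic",  # hypothetical task: binary classification
        "max_depth": 8,                  # illustrative value, not tuned
        "eta": 0.1,                      # illustrative learning rate
    }
    if major >= 2:
        params.update({"tree_method": "hist", "device": "cuda"})
    else:
        params["tree_method"] = "gpu_hist"
    if extra:
        # Caller-supplied overrides win over the defaults above.
        params.update(extra)
    return params
```

In a GPU environment with XGBoost installed, this would be used as `xgb.train(xgboost_gpu_params(xgb.__version__), xgb.DMatrix(X, label=y), num_boost_round=100)`, where `X` and `y` stand in for your own training data. Keying on the installed version keeps the same code working across the parameter change in XGBoost 2.0.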

How to use RAPIDS on Azure Machine Learning service

Everything you need to use RAPIDS on Azure Machine Learning service can be found on GitHub.

The above repository consists of a master Jupyter Notebook that uses




ONNX Runtime integration with NVIDIA TensorRT in preview

Today we are excited to open source the preview of the NVIDIA TensorRT execution provider in ONNX Runtime. With this release, we are taking another step towards open and interoperable AI by enabling developers to easily leverage industry-leading GPU acceleration regardless of their choice of framework. Developers can now tap into the power of TensorRT through ONNX Runtime to accelerate inferencing of ONNX models, which can be exported or converted from PyTorch, TensorFlow, and many other popular frameworks.

Microsoft and NVIDIA worked closely to integrate the TensorRT execution provider with ONNX Runtime and have validated support for all the ONNX models in the ONNX Model Zoo. With the TensorRT execution provider, ONNX Runtime delivers better inferencing performance on the same hardware than generic GPU acceleration. We have seen up to 2X improved performance using the TensorRT execution provider on internal workloads from Bing MultiMedia services.
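In recent `onnxruntime` releases, the execution providers for a session are passed as a priority-ordered list, so TensorRT can be tried first with CUDA and CPU as fallbacks. The preview build described in this post may have been configured differently. The helper below is a minimal sketch of that ordering. `select_providers` and `PREFERRED_PROVIDERS` are hypothetical names; the provider strings are the identifiers ONNX Runtime reports.

```python
# Preferred order: TensorRT first, then plain CUDA, then CPU as a last resort.
PREFERRED_PROVIDERS = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def select_providers(available, preferred=PREFERRED_PROVIDERS):
    """Return the preferred providers that this ONNX Runtime build offers, in priority order."""
    available = set(available)
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise RuntimeError("none of the preferred execution providers are available")
    return chosen
```

With `onnxruntime-gpu` and the TensorRT libraries installed, a session would be created with `ort.InferenceSession("model.onnx", providers=select_providers(ort.get_available_providers()))`, where `model.onnx` is a placeholder path. Filtering against the providers the installed build reports avoids requesting one it was not compiled with.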

How it works

ONNX Runtime, together with its TensorRT execution provider, accelerates the inferencing of deep learning models by parsing the graph and allocating specific nodes for execution by the TensorRT stack on supported hardware. The TensorRT execution provider interfaces with the TensorRT libraries that are preinstalled on the platform to process the ONNX sub-graph




Intel and Microsoft bring optimizations to deep learning on Azure

This post is co-authored with Ravi Panchumarthy and Mattson Thieme from Intel.

We are happy to announce that Microsoft and Intel are partnering to bring optimized deep learning frameworks to Azure. These optimizations are available in a new offering on the Azure Marketplace called the Intel Optimized Data Science VM for Linux (Ubuntu).

Over the last few years, deep learning has become the state of the art for several machine learning and cognitive applications. Deep learning is a machine learning technique that leverages neural networks with multiple layers of non-linear transformations, so that the system can learn from data and build accurate models for a wide range of machine learning problems. Computer vision, language understanding, and speech recognition are all examples of deep learning at play today. Innovations in deep neural networks in these domains have enabled these algorithms to reach human-level performance in vision, speech recognition, and machine translation. Advances in this field continually excite data scientists, organizations, and media outlets alike. Yet for many organizations and data scientists, doing deep learning well at scale remains challenging because of technical limitations.

Often, default builds of popular deep learning frameworks like TensorFlow are not fully optimized for training and




PyTorch on Azure: Deep learning in the oil and gas industry

This blog post was co-authored by Jürgen Weichenberger, Chief Data Scientist, Accenture, and Mathew Salvaris, Senior Data Scientist, Microsoft.

Drilling for oil and gas is one of the most dangerous jobs on Earth. Workers are exposed to the risk of events ranging from small equipment malfunctions to entire offshore rigs catching fire. Fortunately, the application of deep learning in predictive asset maintenance can help prevent natural and human-made catastrophes.

We have more information than ever on our equipment thanks to sensors and IoT devices, but we are still working on ways to process the data so it is valuable for preventing these catastrophic events. That’s where deep learning comes in. Data from multiple sources can be used to train a predictive model that helps oil and gas companies predict imminent disasters, enabling them to take a proactive approach.
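The post does not describe the model Accenture built, so the following is not their method. One common way to frame "predict imminent disasters" as a supervised problem is to label each time step by whether a failure event follows within a fixed horizon. A model, in PyTorch or elsewhere, is then trained on sensor features against those labels. This minimal sketch assumes failure events are recorded as a 0/1 sequence. The function name `label_imminent_failure` and the horizon are hypothetical.

```python
from typing import List, Sequence


def label_imminent_failure(failure_flags: Sequence[int], horizon: int) -> List[int]:
    """Label step t as 1 if a failure occurs in steps t+1 .. t+horizon, else 0.

    failure_flags[t] is assumed to be 1 when a failure was recorded at step t.
    Looking strictly ahead means the label never leaks the current step's event.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    labels = []
    for t in range(len(failure_flags)):
        window = failure_flags[t + 1 : t + 1 + horizon]  # empty near the end of the series
        labels.append(1 if any(window) else 0)
    return labels


if __name__ == "__main__":
    # Two recorded failures (steps 3 and 8); warn up to 2 steps ahead.
    print(label_imminent_failure([0, 0, 0, 1, 0, 0, 0, 0, 1], horizon=2))
```

The horizon sets the trade-off: a longer one gives earlier warnings but more positive labels that are far from the actual event.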

Using the PyTorch deep learning framework on Microsoft Azure, Accenture helped a major oil and gas company implement such a predictive asset maintenance solution. This solution will go a long way in protecting their staff and the environment.

What is predictive asset maintenance?

Predictive asset maintenance is a core element of the digital transformation of chemical plants. It




Build your own deep learning models on Azure Data Science Virtual Machines

As a modern developer, you may be eager to build your own deep learning models but aren’t quite sure where to start. If this is you, I recommend you take a look at the deep learning course from This new course helps software developers start building their own state-of-the-art deep learning models. Developers who complete this course will become proficient in deep learning techniques in multiple domains including computer vision, natural language processing, recommender algorithms, and tabular data.

You’ll also want to learn about Microsoft’s Azure Data Science Virtual Machine (DSVM). Azure DSVM empowers developers like you with the tools you need to be productive with this course today on Azure, with virtually no setup required. Using fast cloud-based GPU virtual machines (VMs), at the most competitive rates, Azure DSVM saves you time that would otherwise be spent in installation, configuration, and waiting for deep learning models to train.

Here is how you can effectively run the course examples on Azure.

Running the deep learning course on Azure DSVM

While there are several ways in which you can use Azure for your deep learning course, one of the easiest ways is to leverage




Microsoft joins the SciKit-learn Consortium

As part of our ongoing commitment to open and interoperable artificial intelligence, Microsoft has joined the SciKit-learn consortium as a platinum member and released tools to enable increased usage of SciKit-learn pipelines.
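For readers new to the term, a SciKit-learn pipeline chains preprocessing steps and a final estimator into a single object that is fitted, scored, and shipped as one unit. That is why pipelines are a natural target for production tooling. The sketch below uses the Iris dataset bundled with scikit-learn. The step names, the scaler, and the choice of logistic regression are illustrative assumptions, not the tools Microsoft released.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small bundled dataset, so the example runs offline.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling and classification travel together: fit() learns the scaler's
# statistics on the training data only, and predict() reapplies them.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)

accuracy = pipeline.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Because preprocessing lives inside the pipeline, the fitted object can be serialized or converted as a whole, instead of re-implementing the scaling step at serving time.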

Initially launched in 2007 by members of the Python scientific community, SciKit-learn has attracted a large community of active developers who have turned it into a first-class, open source library used by many companies and individuals around the world for scenarios ranging from fraud detection to process optimization. Following SciKit-learn’s remarkable success, the SciKit-learn consortium was launched in September 2018 by Inria, the French national institute for research in computer science, to foster growth and sustainability of the library, employing central contributors to maintain high standards and develop new features. We are extremely supportive of what the SciKit-learn community has accomplished so far and want to see it continue to thrive and expand. By joining the newly formed SciKit-learn consortium, we will support central contributors to ensure that SciKit-learn remains a high-quality project while also tackling new features in conjunction with the fabulous community of users and developers.

In addition to supporting SciKit-learn development, we are committed to helping SciKit-learn users in training and production scenarios through




Analyze data in Azure Data Explorer using KQL magic for Jupyter Notebook

Exploring data is like solving a puzzle. You create queries and get instant satisfaction when you discover insights, just like adding pieces to complete a puzzle. Imagine you have to repeat the same analysis multiple times, use libraries from an open-source community, share your steps and output with others, and save your work as an artifact. Notebooks give you one place to write your queries, add documentation, and save your work as output in a reusable format.

Jupyter Notebook allows you to create and share documents that contain live code, equations, visualizations, and explanatory text. Its uses include data cleaning and transformation, numerical simulation, statistical modeling, and machine learning.

We are excited to announce KQL magic commands, which extend the functionality of the Python kernel in Jupyter Notebook. KQL magic allows you to write KQL queries natively and query data from Microsoft Azure Data Explorer. You can easily switch between Python and KQL, and visualize data using a rich library integrated with KQL render commands. KQL magic supports Azure Data Explorer, Application Insights, and Log Analytics as data sources to run queries against.
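A KQL query is a table name followed by operators chained with `|`, and a query that ends in a `render` command is the kind KQL magic turns into a chart. As a hedged illustration of that shape, the hypothetical helper below assembles such a query as a string, which you could paste into a `%%kql` cell. `StormEvents` is the sample table commonly used in Azure Data Explorer tutorials. The filter, the bin size, and the helper itself are illustrative assumptions, not part of KQL magic.

```python
from typing import Optional, Sequence


def build_kql(table: str, filters: Sequence[str] = (), summarize: Optional[str] = None,
              render: Optional[str] = None) -> str:
    """Compose a KQL query string: the table first, then one piped operator per line."""
    parts = [table]
    parts += [f"where {condition}" for condition in filters]
    if summarize:
        parts.append(f"summarize {summarize}")
    if render:
        # A trailing render operator asks the client to chart the result.
        parts.append(f"render {render}")
    return "\n| ".join(parts)


if __name__ == "__main__":
    print(build_kql(
        "StormEvents",
        filters=["State == 'TEXAS'"],
        summarize="count() by bin(StartTime, 7d)",
        render="timechart",
    ))
```

Putting each operator on its own line mirrors how KQL is usually written and keeps the query readable when it is dropped into a notebook cell.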

Use a single magic “%kql” to run a single line query, or use cell magic “%%kql” to




Azure Data Explorer plugin for Grafana dashboards

Are you using Azure Data Explorer to query vast amounts of data? Are you following business metrics and KPIs with Grafana dashboards? Creating a Grafana data source with Azure Data Explorer has never been easier.

Grafana is leading open-source software designed for visualizing time series analytics. It is an analytics and metrics platform that enables you to query and visualize data and to create and share dashboards based on those visualizations. Combining Grafana’s beautiful visualizations with Azure Data Explorer’s snappy ad hoc queries over massive amounts of data creates impressive potential.

The Grafana and Azure Data Explorer teams have created a dedicated plugin which enables you to connect to and visualize data from Azure Data Explorer using its intuitive and powerful Kusto Query Language. In just a few minutes, you can unlock the potential of your data and create your first Grafana dashboard with Azure Data Explorer.

Once you build an Azure Data Explorer data source in Grafana, you can create a dashboard panel and select Edit to add your query.

Kusto Query Language is available for executing queries in the Metrics tab. The built-in IntelliSense, which proposes query term completions, assists in query formulation. You run




AI is the new normal: Recap of 2018

The year 2018 was a banner year for Azure AI, as over a million Azure developers, customers, and partners engaged in the conversation on digital transformation. The next generation of AI capabilities is now infused across Microsoft products and services, including AI capabilities for Power BI.

Here are the top 10 Azure AI highlights from 2018 at a glance, across AI services, tools and frameworks, and infrastructure:

AI services

1. Azure Machine Learning (AML) service with new automated machine learning capabilities.

2. Historic milestones in Cognitive Services, including the unified Speech service.

3. Microsoft is first to enable Cognitive Services in containers.

4. Cognitive Search and basketball

5. Bot Framework v4 SDK, offering broader language support (C#, Python, Java, and JavaScript) and extensibility models.

AI tools and frameworks

6. Data science features in Visual Studio Code.

7. Open Neural Network Exchange (ONNX) Runtime is now open source.

8. ML.NET and AI Platform for Windows developers.

AI infrastructure

9. Azure Databricks

10. Project Brainwave, integrated with AML.

With so many exciting developments, why are these moments the highlights? Read on, as this blog begins to explain why each of them matters.

AI services

These services span pre-built




Announcing the general availability of Azure Data Box Disk

Since our preview announcement, hundreds of customers have been moving recurring workloads, media captures from automobiles, incremental transfers for ongoing backups, and archives from remote and branch offices (ROBOs) to Microsoft Azure. We’re excited to announce the general availability of Azure Data Box Disk, an SSD-based solution for offline data transfer to Azure. Data Box Disk is now available in the US, EU, Canada, and Australia, with more countries/regions to be added over time. Also, be sure not to miss the announcement of the public preview for Blob Storage on Azure Data Box below!
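Data Box Disk comes in fixed units: each disk is a nominal 8 TB SSD, and one order holds up to five disks (40 TB). A rough sketch of how many disks and orders a transfer would need is below. It treats capacity as the nominal 8 TB by default. Usable space after formatting is lower, so pass your own per-disk figure. The function and constant names are hypothetical.

```python
import math

DISK_CAPACITY_TB = 8      # nominal per-disk size quoted in the announcement
MAX_DISKS_PER_ORDER = 5   # up to five disks (40 TB nominal) per order


def plan_data_box_disk(data_tb: float, usable_tb_per_disk: float = DISK_CAPACITY_TB):
    """Return (disks, orders) needed to move data_tb terabytes."""
    if usable_tb_per_disk <= 0:
        raise ValueError("usable_tb_per_disk must be positive")
    if data_tb <= 0:
        return 0, 0
    disks = math.ceil(data_tb / usable_tb_per_disk)    # partial disks round up
    orders = math.ceil(disks / MAX_DISKS_PER_ORDER)    # partial orders round up
    return disks, orders


if __name__ == "__main__":
    print(plan_data_box_disk(40))      # exactly one full order at nominal capacity
    print(plan_data_box_disk(41))      # one terabyte more spills into a second order
```

Rounding up at both levels reflects that disks and orders are indivisible: 41 TB at 8 TB per disk needs 6 disks, and 6 disks need 2 orders.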

Top three reasons customers use Data Box Disk

Easy to order and use: Each disk is an 8 TB SSD. You can easily order packs of up to five disks from the Azure portal, for a total capacity of 40 TB per order. The small form factor provides the right balance of capacity and portability to collect and transport data in a variety of use cases. Support is available for Windows and Linux.

Fast data transfer: These SSD disks copy data at up to USB 3.1 speeds and support the SATA II and III interfaces. Simply mount the disks as drives and use any tool of choice such