Category Archives: Data Science

16 Oct

ONNX Runtime for inferencing machine learning models now in preview

We are excited to release the preview of ONNX Runtime, a high-performance inference engine for machine learning models in the Open Neural Network Exchange (ONNX) format. ONNX Runtime is compatible with ONNX version 1.2 and comes in Python packages that support both CPU and GPU, enabling inferencing with Azure Machine Learning service and on any Linux machine running Ubuntu 16.

ONNX is an open source model format for deep learning and traditional machine learning. Since we launched ONNX in December 2017, it has gained support from more than 20 leading companies in the industry. ONNX gives data scientists and developers the freedom to choose the right framework for their task, as well as the confidence to run their models efficiently on a variety of platforms with the hardware of their choice.

The ONNX Runtime inference engine provides comprehensive coverage and support of all operators defined in ONNX. Developed with extensibility and performance in mind, it leverages a variety of custom accelerators based on platform and hardware selection to provide minimal compute latency and resource usage. Given the platform, hardware configuration, and operators defined within a model, ONNX Runtime can utilize the most efficient execution provider to deliver the…
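
As a sketch of what inferencing with the Python package looks like (the model file name, input shape, and batch below are illustrative, not from the announcement):

```python
# Minimal ONNX Runtime inferencing sketch; install with `pip install onnxruntime`
# (or `onnxruntime-gpu` for the GPU package). "model.onnx" is a hypothetical path.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")            # load model, pick an execution provider
input_name = session.get_inputs()[0].name               # query the model's declared input
x = np.random.rand(1, 3, 224, 224).astype(np.float32)   # example image-shaped batch
outputs = session.run(None, {input_name: x})            # None = return all model outputs
print(outputs[0].shape)
```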

11 Sep

GPUs vs CPUs for Deployment of Deep Learning Models

This blog post was co-authored by Mathew Salvaris, Senior Data Scientist, Azure CAT and Daniel Grecoe, Senior Software Engineer, Azure CAT

Choosing the right type of hardware for deep learning tasks is a widely discussed topic. An obvious conclusion is that the decision should depend on the task at hand and on factors such as throughput requirements and cost. It is widely accepted that GPUs should be used for deep learning training due to their significant speed advantage over CPUs. For tasks like inference, which are not as resource-heavy as training, CPUs are usually believed to be sufficient and more attractive because of their cost savings. However, when inference speed is a bottleneck, using GPUs provides considerable gains from both financial and time perspectives. In a previous tutorial and blog, Deploying Deep Learning Models on Kubernetes with GPUs, we provide step-by-step instructions to go from loading a pre-trained convolutional neural network model to creating a containerized web application hosted on a Kubernetes cluster with GPUs using Azure Container Service (AKS).

Expanding on this previous work, as a follow-up analysis, here we provide a detailed comparison of…
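
As a rough illustration of the kind of measurement such a comparison rests on (this is not the methodology of the post; PyTorch, ResNet-50, and the batch size are stand-ins chosen here only to keep the example self-contained):

```python
# Hedged sketch: time the same pre-trained model on CPU and, if available, GPU.
import time
import torch
import torchvision.models as models

model = models.resnet50(pretrained=True).eval()
batch = torch.randn(8, 3, 224, 224)

def mean_latency(model, batch, n=50, cuda=False):
    with torch.no_grad():
        for _ in range(5):               # warm-up runs, excluded from timing
            model(batch)
        if cuda:
            torch.cuda.synchronize()     # drain queued GPU work before starting the clock
        start = time.time()
        for _ in range(n):
            model(batch)
        if cuda:
            torch.cuda.synchronize()     # wait for the last kernels before stopping the clock
    return (time.time() - start) / n

print(f"CPU: {mean_latency(model, batch) * 1000:.1f} ms/batch")
if torch.cuda.is_available():
    model_gpu, batch_gpu = model.cuda(), batch.cuda()
    print(f"GPU: {mean_latency(model_gpu, batch_gpu, cuda=True) * 1000:.1f} ms/batch")
```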

04 Sep

Finish your insurance actuarial modeling in hours, not days

Because of existing and upcoming regulations, insurers perform a great deal of analysis of their assets and liabilities. Actuaries need time to review and correct results before reviewing the reports with regulators. Today, it is common for quarterly reporting to require thousands of hours of compute time. Companies that offer variable annuity products must follow Actuarial Guideline XLIII, which requires several compute-intensive tasks, including nested stochastic modeling. Solvency II requires substantial computational analysis to determine the Solvency Capital Requirement and the Minimum Capital Requirement. International Financial Reporting Standard 17 requires analysis of each policy, reviews of overall profitability, and more. Actuarial departments everywhere work to make sure that their financial and other models produce results that can be used to evaluate their business for regulatory and internal needs.
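
To give a feel for why nested stochastic modeling is so compute-hungry, here is a deliberately toy sketch (all dynamics and figures are invented for illustration; real actuarial models are far richer): each outer economic scenario spawns its own inner simulation, so the cost multiplies.

```python
# Toy nested stochastic simulation: outer scenarios x inner paths x time steps.
import numpy as np

rng = np.random.default_rng(42)

def inner_valuation(base_rate, n_inner=1000, horizon=30):
    # Inner loop: simulate discount factors under one outer economic scenario.
    shocks = rng.normal(0.0, 0.02, size=(n_inner, horizon))
    rate_paths = base_rate + shocks.cumsum(axis=1)
    discount = np.exp(-np.clip(rate_paths, 0.0, None).cumsum(axis=1))
    return discount[:, -1].mean()

outer_rates = rng.normal(0.03, 0.01, size=500)      # outer economic scenarios
values = [inner_valuation(r) for r in outer_rates]
# 500 outer x 1,000 inner x 30 steps = 15 million simulated steps for one figure,
# which is why quarterly reporting can consume thousands of compute hours.
print(f"mean discount factor across scenarios: {np.mean(values):.4f}")
```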

With all this reporting, actuaries get pinched for time. They need time for things like:

Development: Actuaries code the models in their favorite software or in custom solutions. Anything they can do to shorten the Code-Test-Review cycle helps deliver the actuarial results sooner.
Data preparation: Much of the source data is initially entered by hand. Errors need to be identified and fixed. If the errors can…

02 Aug

Real example: improve accuracy, reduce training times for existing R codebase

When you buy an item on a favorite website, does the site show you pictures of what others have bought? That’s the result of a recommendation system. Retailers have been building such systems for years, many of them using the programming language R. For older implementations of recommender systems, it’s time to consider improving performance and scalability by moving these systems to the cloud: the Azure cloud.

Problem: to re-host and optimize an existing R model in Azure

Recently, we were asked to help a customer improve the performance and process surrounding the R implementation of their recommender solution and host the model in Azure. Many of their early analytic products were built in R, and they wanted to preserve that investment. After a review of their solution, we identified bottlenecks that could be vanquished. We worked together to find a way to significantly improve the model training time using parallel R algorithms. Then we worked to streamline how they operationalized their R model. All the work was done using libraries available with Microsoft Machine Learning Server (R Server). 

The architecture: Azure SQL + Machine Learning Server

There are several components and steps to the solution. We needed a database…

02 Aug

Improve collaborative care and clinical data sharing with blockchain

Currently, the healthcare industry suffers major inefficiencies due to diverse, uncoordinated, and unconnected data sources and systems. Collaboration is vital to improving healthcare outcomes. With digitized health data, the exchange of healthcare information across healthcare organizations is essential to support effective care collaboration. Traditional health information exchanges have had limited success.

Blockchain offers new capabilities to greatly improve health information exchange. At Microsoft we are working to maximize the benefits of solutions that have the potential to improve patient outcomes, reduce healthcare costs, and enhance the experiences of patients and healthcare workers. With that in mind, I’d like to announce a new partner solution and pilot for a better health information exchange that uses blockchain and runs in Microsoft Azure.

Interoperability and exchanges

Grapevine World, one of the leaders in the application of blockchain technology, makes use of the Integrating the Healthcare Enterprise (IHE) methodology for interoperability. It employs multiple blockchains to track data provenance and provides a crypto token as a means of exchange within its ecosystem.

Grapevine World is a decentralized ecosystem for the seamless exchange and utilization of health data in a standardized, secure manner. In collaboration with the University of Southampton and Tiani Spirit, it has developed a new blockchain-enabled…

26 Jul

Avoid Big Data pitfalls with Azure HDInsight and these partner solutions

According to a Gartner 2017 prediction, “60 percent of big data projects will fail to go beyond piloting and experimentation; these projects will be abandoned”.

Whether you have worked on an analytics project or are starting one, it is a challenge on any cloud. You need to juggle the intricacies of cloud provider services, open source frameworks, and the apps in the ecosystem. Apache Hadoop and Spark are vibrant open source ecosystems that have enabled enterprises to digitally transform their businesses using data. According to Matt Turck, VC at FirstMark, it has been an exciting but complex year in the data world: “The data tech ecosystem has continued to fire on all cylinders. If nothing else, data is probably even more front and center in 2018, in both business and personal conversations”.

However, with great power comes great responsibility. A successful project takes a lot more than just using open source or a managed platform. You have to deal with:

The complexity of combining all the open source frameworks.
Architecting a data lake to get insights for data engineers, data scientists, and BI users.
Meeting enterprise regulations such as security, access control, data sovereignty &…

26 Jul

Orchestrating production-grade workloads with Azure Kubernetes Service

Happy Birthday Kubernetes! In the three short years that Kubernetes has been around, it has become the industry standard for orchestrating containerized workloads. In Azure, we have spent those three years helping customers run Kubernetes in the cloud. As much as Kubernetes simplifies the task of orchestration, there’s plenty of setup and management that needs to take place for you to take full advantage of it. This is where Azure Kubernetes Service (AKS) comes in. With Microsoft’s unique knowledge of the requirements of an enterprise, and our heritage of empowering developers, this managed service takes care of many of the complexities and delivers the best Kubernetes experience in the cloud.

In this blog post, I will dig into the top scenarios that Azure customers are building on Azure Kubernetes Service. After that, we will blow out the candles and have some cake.

If you are new to AKS, check out the Azure Kubernetes Service page and this video to learn more.

Lift and shift to containers

Organizations typically want to move to the cloud quickly, and it is often not possible to rewrite applications to take full advantage of cloud-native features right from the beginning. Containerizing applications makes…

10 Jul

Azure HDInsight now supports Apache Spark 2.3

Apache Spark 2.3.0 is now available for production use on the managed big data service Azure HDInsight. Ranging from bug fixes (more than 1400 tickets were fixed in this release) to new experimental features, Apache Spark 2.3.0 brings advancements and polish to all areas of its unified data platform.

Data engineers relying on Python UDFs get 10 to 100 times more speed, thanks to revamped object serialization between the Spark runtime and Python. Data scientists will be delighted by better integration of deep learning frameworks like TensorFlow with Spark machine learning pipelines. Business analysts will welcome the availability of a fast vectorized reader for the ORC file format, which finally makes interactive analytics in Spark practical over this popular columnar data format. Developers building real-time applications may be interested in experimenting with the new Continuous Processing mode in Spark Structured Streaming, which brings event-processing latency down to the millisecond level.

Vectorized object serialization in Python UDFs

It is worth mentioning that PySpark is already fast and takes advantage of vectorized data processing in the core Spark engine as long as you are using the DataFrame APIs. This is good news, as that represents the majority of use cases if you follow best practices for…
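
To make the feature concrete, here is a minimal sketch of the vectorized (pandas) UDF style that Spark 2.3 introduces (the column name and the +1 transform are arbitrary; pyarrow must be installed):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.appName("pandas-udf-sketch").getOrCreate()
df = spark.range(0, 1000000).toDF("x")

# A scalar pandas UDF receives a whole pandas Series per batch (via Arrow)
# instead of one Python object per row, which is where the speedup comes from.
@pandas_udf("double", PandasUDFType.SCALAR)
def plus_one(x):
    return (x + 1).astype("float64")

df.select(plus_one("x")).show(5)
```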

10 Jul

Streaming analytics use cases with Spark on Azure

Sensors, IoT devices, social networks, and online transactions are all generating data that needs to be monitored constantly and acted on quickly. As a result, the need for large-scale, real-time stream processing is more evident now than ever before.

With Azure Databricks running on top of Spark, Spark Streaming enables data scientists and data engineers to build powerful interactive and analytical applications across both streaming and historical data, while inheriting Spark’s ease of use and fault-tolerance characteristics. Azure Databricks readily integrates with a wide variety of popular data sources, including HDFS, Flume, Kafka, and Twitter.

There are four main use cases in which Spark Streaming is used today; a sketch of the first follows the list:

Streaming ETL — Data is continuously cleaned and aggregated before being pushed into data stores.
Triggers — Anomalous behavior is detected in real time and further downstream actions are triggered accordingly. For example, unusual behavior of sensor devices generating actions.
Data enrichment — Live data is enriched with more information by joining it with a static dataset, allowing for a more complete real-time analysis.
Complex sessions and continuous learning — Events related to a live session (e.g. user activity after logging into a website or application) are grouped together and analyzed. In some cases,…
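
Below is a minimal sketch of the streaming ETL pattern in Structured Streaming (the built-in rate source and console sink keep it self-contained; a real pipeline would read from Kafka or Event Hubs and write to a data store):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

spark = SparkSession.builder.appName("streaming-etl-sketch").getOrCreate()

# The built-in "rate" source emits (timestamp, value) rows; it stands in for a
# real stream such as Kafka, Event Hubs, or a socket feed.
events = (spark.readStream
               .format("rate")
               .option("rowsPerSecond", 100)
               .load())

# Continuously clean and aggregate: drop odd values (a stand-in cleaning rule)
# and count events per 10-second window before they would be pushed to a store.
counts = (events
          .filter(col("value") % 2 == 0)
          .groupBy(window(col("timestamp"), "10 seconds"))
          .count())

query = (counts.writeStream
               .outputMode("complete")   # emit the full updated aggregates
               .format("console")        # a real job would write to a data store
               .start())
query.awaitTermination()
```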
