Category Archives : Data Science



Analyze data in Azure Data Explorer using KQL magic for Jupyter Notebook

Exploring data is like solving a puzzle. You create queries and receive instant satisfaction when you discover insights, just like adding pieces to complete a puzzle. Imagine you have to repeat the same analysis multiple times, use libraries from an open-source community, share your steps and output with others, and save your work as an artifact. Notebooks help you create one place to write your queries, add documentation, and save your work as output in a reusable format.

Jupyter Notebook allows you to create and share documents that contain live code, equations, visualizations, and explanatory text. Its uses include data cleaning and transformation, numerical simulation, statistical modeling, and machine learning.

We are excited to announce KQL magic commands, which extend the functionality of the Python kernel in Jupyter Notebook. KQL magic allows you to write KQL queries natively and query data from Microsoft Azure Data Explorer. You can easily switch between Python and KQL, and visualize data using a rich library integrated with KQL render commands. KQL magic supports Azure Data Explorer, Application Insights, and Log Analytics as data sources to run queries against.

Use a single magic “%kql” to run a single line query, or use cell magic “%%kql” to




Azure Data Explorer plugin for Grafana dashboards

Are you using Azure Data Explorer to query vast amounts of data? Are you following business metrics and KPIs with Grafana dashboards? Creating a Grafana data source with Azure Data Explorer has never been easier.

Grafana is leading open source software designed for visualizing time series analytics. It is an analytics and metrics platform that enables you to query and visualize data, and to create and share dashboards based on those visualizations. Combining Grafana’s beautiful visualizations with Azure Data Explorer’s snappy ad hoc queries over massive amounts of data creates impressive usage potential.

The Grafana and Azure Data Explorer teams have created a dedicated plugin which enables you to connect to and visualize data from Azure Data Explorer using its intuitive and powerful Kusto Query Language. In just a few minutes, you can unlock the potential of your data and create your first Grafana dashboard with Azure Data Explorer.

Once you build an Azure Data Explorer data source in Grafana, you can create a dashboard panel and select Edit to add your query.

Kusto Query Language is available for executing queries in the Metrics tab. The built-in IntelliSense, which proposes query term completion, assists in query formulation. You run




AI is the new normal: Recap of 2018

The year 2018 was a banner year for Azure AI as over a million Azure developers, customers, and partners engaged in the conversation on digital transformation. The next generation of AI capabilities is now infused across Microsoft products and services, including AI capabilities for Power BI.

Here are the top 10 Azure AI highlights from 2018, across AI Services, tools and frameworks, and infrastructure at a glance:

AI services

1. Azure Machine Learning (AML) service with new automated machine learning capabilities.

2. Historical milestones in Cognitive Services including unified Speech service.

3. Microsoft is first to enable Cognitive Services in containers.

4. Cognitive Search and basketball

5. Bot Framework v4 SDK, offering broader language support (C#, Python, Java, and JavaScript) and extensibility models.

AI tools and frameworks

6. Data science features in Visual Studio Code.

7. Open Neural Network Exchange (ONNX) runtime is now open source.

8. ML.Net and AI Platform for Windows developers.

AI infrastructure

9. Azure Databricks

10. Project Brainwave, integrated with AML.

With so many exciting developments, why are these moments the highlights? Read on, as this blog begins to explain the importance of these moments.

AI services

These services span pre-built




Announcing the general availability of Azure Data Box Disk

Since our preview announcement, hundreds of customers have been moving recurring workloads, media captures from automobiles, incremental transfers for ongoing backups, and archives from remote office/branch offices (ROBOs) to Microsoft Azure. We’re excited to announce the general availability of Azure Data Box Disk, an SSD-based solution for offline data transfer to Azure. Data Box Disk is now available in the US, EU, Canada, and Australia, with more countries/regions to be added over time. Also, be sure not to miss the announcement of the public preview for Blob Storage on Azure Data Box below!

Top three reasons customers use Data Box Disk

Easy to order and use: Each disk is an 8 TB SSD. You can easily order packs of up to five disks from the Azure portal for a total capacity of 40 TB per order. The small form factor provides the right balance of capacity and portability to collect and transport data in a variety of use cases. Support is available for Windows and Linux.

Fast data transfer: These SSD disks copy data at up to USB 3.1 speeds and support the SATA II and III interfaces. Simply mount the disks as drives and use any tool of choice such




Connect Azure Data Explorer to Power BI for visual depiction of data

Do you want to analyze vast amounts of data, create Power BI dashboards and reports to help you visualize your data, and share insights across your organization? Azure Data Explorer (ADX), a lightning-fast indexing and querying service, helps you build near real-time and complex analytics solutions for vast amounts of data. ADX can connect to Power BI, a business analytics solution that lets you visualize your data and share the results across your organization. The various methods of connection to Power BI allow for interactive analysis of organizational data, such as tracking and presentation of trends.

Simple and intuitive native connector

The native connector to Power BI unlocks the power of Azure Data Explorer in only a minute. In a very intuitive process, you add your cluster name and let the connector take care of the rest. Provide the database and table name to focus your analysis on specific data. You can use import mode for snappy interaction with the data, or direct query mode for filtering large datasets and near real-time updates. To use the native connector method, read our documentation, “Quickstart: Visualize data using the Azure Data Explorer connector for Power BI.”

Imported query

A specific Azure Data Explorer query can




Power BI and Azure Data Services dismantle data silos and unlock insights

Learn how to connect Power BI and Azure Data Services to share data and unlock new insights with a new tutorial. Business analysts who use Power BI dataflows can now share data with data engineers and data scientists, who can leverage the power of Azure Data Services, including Azure Databricks, Azure Machine Learning, Azure SQL Data Warehouse, and Azure Data Factory for advanced analytics and AI.

With the recently announced preview of Power BI dataflows, Power BI has enabled self-service data prep for business analysts. Power BI dataflows can ingest data from a large array of transactional and observational data sources, and cleanse, transform, enrich, schematize, and store the result. Dataflows are reusable and can be refreshed automatically and daisy-chained to create powerful data preparation pipelines. Power BI now supports storing dataflows in Azure Data Lake Storage (ADLS) Gen2, including both the data and the dataflow definition. By storing dataflows in Azure Data Lake Storage Gen2, business analysts using Power BI can now collaborate with data engineers and data scientists using Azure Data Services.

Data silos inhibit data sharing

The ability of organizations to extract intelligence from business data provides a key competitive advantage; however, attempting this




Time series analysis in Azure Data Explorer

Azure Data Explorer (ADX) is a lightning-fast service optimized for data exploration. It gives users instant visibility into very large raw datasets in near real-time to analyze performance, identify trends and anomalies, and diagnose problems.

ADX performs an on-going collection of telemetry data from cloud services or IoT devices. This data can then be analyzed for various insights such as monitoring service health, physical production processes, and usage trends. The analysis can be performed on sets of time series for selected metrics to find a deviation in the pattern of the metrics relative to their typical baseline patterns.

ADX contains native support for creation, manipulation, and analysis of time series. It empowers us to create and analyze thousands of time series in seconds and enable near real-time monitoring solutions and workflows. In this blog post, we are going to describe the basics of time series analysis in Azure Data Explorer.

Time series capabilities

The first step for time series analysis is to partition and transform the original telemetry table to a set of time series using the make-series operator. Using various functions, ADX then offers the following capabilities for time series analysis:

Filtering – Used for noise reduction,
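To make the partition-and-transform step more concrete, here is a minimal Python sketch of the idea behind it. It is only a rough analogue of what the make-series operator does, not ADX's implementation: the function names `make_series` and `moving_average`, the sample rows, and the choice of averaging values within a bin are all assumptions made for illustration. Raw telemetry rows are bucketed into regular time bins per key, empty bins get a default value, and a simple moving average stands in for a noise-reduction filter.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def make_series(rows, start, stop, step, default=0.0):
    """Bucket (timestamp, key, value) rows into one regular series per key.

    Each series has one slot per time bin in [start, stop). Values that land
    in the same bin are averaged; empty bins are filled with `default`.
    Assumes (stop - start) is a whole multiple of `step`.
    """
    n_bins = int((stop - start) / step)
    sums = defaultdict(lambda: [0.0] * n_bins)
    counts = defaultdict(lambda: [0] * n_bins)
    for ts, key, value in rows:
        if start <= ts < stop:
            i = int((ts - start) / step)  # index of the bin this row falls in
            sums[key][i] += value
            counts[key][i] += 1
    return {
        key: [s / c if c else default for s, c in zip(sums[key], counts[key])]
        for key in sums
    }


def moving_average(series, window=3):
    """Centered moving average; edge points average only the neighbors that exist."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical telemetry: (timestamp, service name, metric value).
t0 = datetime(2019, 1, 1)
rows = [
    (t0 + timedelta(minutes=10), "web", 10.0),
    (t0 + timedelta(minutes=40), "web", 20.0),
    (t0 + timedelta(hours=1, minutes=30), "web", 30.0),
    (t0 + timedelta(hours=3, minutes=5), "web", 60.0),
    (t0 + timedelta(minutes=20), "db", 5.0),
]

series = make_series(rows, t0, t0 + timedelta(hours=4), timedelta(hours=1))
smoothed_web = moving_average(series["web"], window=3)
print(series["web"])   # hourly averages for the "web" partition
print(smoothed_web)    # the same series after smoothing
```

Each key ends up with a series of equal length on the same time grid, which is what allows many series to be compared or filtered together.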




ONNX Runtime for inferencing machine learning models now in preview

We are excited to release the preview of ONNX Runtime, a high-performance inference engine for machine learning models in the Open Neural Network Exchange (ONNX) format. ONNX Runtime is compatible with ONNX version 1.2 and comes in Python packages that support both CPU and GPU to enable inferencing using Azure Machine Learning service and on any Linux machine running Ubuntu 16.

ONNX is an open source model format for deep learning and traditional machine learning. Since we launched ONNX in December 2017 it has gained support from more than 20 leading companies in the industry. ONNX gives data scientists and developers the freedom to choose the right framework for their task, as well as the confidence to run their models efficiently on a variety of platforms with the hardware of their choice.

The ONNX Runtime inference engine provides comprehensive coverage and support of all operators defined in ONNX. Developed with extensibility and performance in mind, it leverages a variety of custom accelerators based on platform and hardware selection to provide minimal compute latency and resource usage. Given the platform, hardware configuration, and operators defined within a model, ONNX Runtime can utilize the most efficient execution provider to deliver the
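As a simplified illustration of choosing an execution provider per operator, the sketch below assigns each operator in a model to the first provider, in priority order, that claims to support it. This is an assumption-laden toy, not ONNX Runtime's code or API: the function `assign_nodes`, the provider names, and the supported-operator sets are all hypothetical.

```python
def assign_nodes(model_ops, providers):
    """Assign each operator to the first provider that supports it.

    `providers` is a list of (name, supported_ops) pairs, highest priority
    first. Raises ValueError if no provider supports an operator.
    """
    assignment = {}
    for op in model_ops:
        for name, supported in providers:
            if op in supported:
                assignment[op] = name
                break
        else:
            raise ValueError(f"no provider supports operator {op!r}")
    return assignment


# Hypothetical capabilities, for illustration only: an accelerator that
# handles a few heavy operators, and a CPU provider that handles the rest.
providers = [
    ("gpu", {"Conv", "MatMul", "Relu"}),
    ("cpu", {"Conv", "MatMul", "Relu", "Softmax", "Reshape"}),
]

plan = assign_nodes(["Conv", "Relu", "Reshape", "MatMul", "Softmax"], providers)
for op, provider in plan.items():
    print(f"{op:8s} -> {provider}")
```

In this toy model, putting the accelerator first in the list means it takes every operator it can run, and the CPU entry picks up whatever is left.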




GPUs vs CPUs for Deployment of Deep Learning Models

This blog post was co-authored by Mathew Salvaris, Senior Data Scientist, Azure CAT and Daniel Grecoe, Senior Software Engineer, Azure CAT

Choosing the right type of hardware for deep learning tasks is a widely discussed topic. An obvious conclusion is that the decision should depend on the task at hand and on factors such as throughput requirements and cost. It is widely accepted that for deep learning training, GPUs should be used due to their significant speed advantage over CPUs. However, due to their higher cost, for tasks like inference, which are not as resource heavy as training, it is usually believed that CPUs are sufficient and are more attractive due to their cost savings. Yet when inference speed is a bottleneck, using GPUs provides considerable gains from both financial and time perspectives. In a previous tutorial and blog, Deploying Deep Learning Models on Kubernetes with GPUs, we provided step-by-step instructions to go from loading a pre-trained Convolutional Neural Network model to creating a containerized web application that is hosted on a Kubernetes cluster with GPUs using Azure Container Service (AKS).
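The financial argument comes down to simple arithmetic: a more expensive node can still be cheaper per prediction if its throughput is high enough. The sketch below shows that calculation. The throughput and price figures are invented purely for illustration and are not measurements from this post, and `cost_per_million` is a hypothetical helper.

```python
def cost_per_million(images_per_second, price_per_hour):
    """Cost of scoring one million inputs on a single node at a steady rate."""
    seconds = 1_000_000 / images_per_second
    return price_per_hour * seconds / 3600


# Hypothetical figures, NOT benchmark results: a cheap, slow CPU node
# versus a GPU node that costs 4x as much per hour but is 20x faster.
cpu_cost = cost_per_million(images_per_second=10, price_per_hour=0.50)
gpu_cost = cost_per_million(images_per_second=200, price_per_hour=2.00)

print(f"CPU: ${cpu_cost:.2f} per million inferences")
print(f"GPU: ${gpu_cost:.2f} per million inferences")
print(f"GPU is {cpu_cost / gpu_cost:.1f}x cheaper per inference in this scenario")
```

Under these assumed numbers the GPU node wins on both time and cost; with a smaller throughput gap or a larger price gap the conclusion would flip, which is why measured throughput matters.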

Expanding on this previous work, as a follow up analysis, here we provide a detailed comparison of




Finish your insurance actuarial modeling in hours, not days

Because of existing and upcoming regulations, insurers perform quite a bit of analysis over their assets and liabilities. Actuaries need time to review and correct results before reviewing the reports with regulators. Today, it is common for quarterly reporting to require thousands of hours of compute time. Companies which offer variable annuity products must follow Actuarial Guideline XLIII which requires several compute intensive tasks, including nested stochastic modeling. Solvency II requires quite a bit of computational analysis to understand the Solvency Capital Requirement and the Minimum Capital Requirement. International Financial Reporting Standard 17 requires analysis of each policy, reviews of overall profitability, and more. Actuarial departments everywhere work to make sure that their financial and other models produce results which can be used to evaluate their business for regulatory and internal needs.

With all this reporting, actuaries get pinched for time. They need time for things like:

Development: Actuaries code the models in their favorite software or in custom solutions. Anything they can do to reduce the cycle of Code-Test-Review helps deliver the actuarial results sooner.

Data preparation: Much of the source data is initially entered by hand. Errors need to be identified and fixed. If the errors can