Category Archives : Data Science

22

Mar

Unlock your data’s potential with Azure SQL Data Warehouse and Azure Databricks

Getting the most out of your data is critical for any business in a competitive environment. Businesses need the ability to get the right data into the right hands at the right time. Azure Databricks and Azure SQL Data Warehouse can help you do just that through a Modern Data Warehouse.

Azure SQL Data Warehouse is an elastic, globally available, cloud data warehouse that leverages Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data. Azure SQL Data Warehouse provides a familiar interface for your analysts who know SQL and want to drive action in your business.

Azure Databricks combines the best of Databricks and Azure to help customers accelerate innovation with one-click setup, streamlined workflows, and an interactive workspace, powered by Apache Spark, that enables collaboration between data scientists, data engineers, and business analysts.

With the general availability of the Azure Databricks service comes built-in support for Azure SQL Data Warehouse. This gives any data scientist or data engineer a seamless experience connecting their Azure Databricks cluster to their Azure SQL Data Warehouse when building advanced ETL (extract, transform, and load) pipelines for Modern Data Warehouse architectures or accessing relational data for Machine

13

Feb

Microsoft partners with National Science Foundation to empower data science breakthroughs

Over the past decade, Microsoft has partnered with the National Science Foundation (NSF) on three separate programs: first in 2010, and more recently through a commitment of $6M in cloud credits across two NSF-supported data science programs, the Big Data Regional Innovation Hubs and the NSF BigData solicitation.

The engagement with NSF has helped Microsoft reach diverse research groups such as the Big Data Hubs [1], which bring together communities of data scientists to spark and nurture collaborations between domain experts, researchers, communities, state partners, nonprofits, and industry.

As of today, Microsoft has provided 17 cloud credit awards to Principal Investigators (PIs) who benefit from NSF-supported programs. These collaborations are already producing interesting breakthroughs in research on the human body, microbial diseases, and even everyday communication:

Franco Pestilli, Assistant Professor in Psychology, Neuroscience and Cognitive Science at Indiana University, is an Azure awardee and PI through the Midwest Big Data Hub [2]. His group has used the Azure award to build a platform called Brainlife, with the goal of fostering collaboration with sixty-six different global scientific communities such as developmental and learning sciences, network science, computer science, engineering, psychology, statistics, traumatic brain injury, and vision science. Chirag

31

Jan

Three new reasons to love the TSI explorer

Today we’re pleased to announce three new Time Series Insights (TSI) explorer capabilities that we think our users are going to love. 

First, we are delighted to share that the TSI explorer, the visualization service of TSI, is now generally available and backed by our SLA. Second, we’ve made the TSI explorer more accessible and easier to use for those with visual and fine-motor disabilities. And finally, we’ve made it easy to export aggregate event data to other analytics tools like Microsoft Excel.

Now that the TSI explorer is generally available, users will notice that the explorer is backed by TSI’s service level agreement (SLA), and we’ve removed the preview moniker from the backsplash when the explorer is loading. We have many customers using TSI in production environments and we’re thrilled to offer them the same SLA that backs the rest of the product. The ActionPoint IoT-PREDICT solution is a great example of one of those customers using the TSI explorer to enable their customers to explore and analyze time series data quickly. Check out their solution below.

There are no limits to what people can achieve when technology reflects the diversity of everyone who uses it. Transparency, accountability, and

25

Jan

Accelerated Spark on GPU-enabled clusters in Azure

The ability to run Spark on a GPU-enabled cluster demonstrates a unique convergence of big data and high-performance computing (HPC) technologies. In the past several years, we’ve seen the GPU market explode as companies all over the world integrate AI and other HPC workflows into their businesses. TensorFlow, a framework designed to use GPUs for numerical computation and neural networks, has skyrocketed in popularity, a testament to the rise of AI and the resulting demand for GPUs. Simultaneously, the need for big data and powerful data processing engines has never been greater as hundreds of companies start to collect data in the petabyte range.

Infrastructure that pairs high-performance hardware such as GPUs with big data engines such as Spark lets data scientists and data engineers enable many scenarios that would otherwise be difficult to achieve.

Along with the recent release of our latest GPU SKUs, I’m excited to share that we now support running Spark on a GPU-enabled cluster using the Azure Distributed Data Engineering Toolkit (AZTK). In a single command, AZTK allows you to provision on-demand, GPU-enabled Spark clusters on top of Azure Batch’s infrastructure, helping you take your high-performance implementations that are usually

04

Jan

Using Qubole Data Service on Azure to analyze retail customer feedback

It has been a busy season for many retailers. During this time, retailers are using Azure to analyze various types of data to help accelerate purchasing decisions. The Azure cloud gives retailers not only the compute capacity to handle peak times, but also the data analytics tools to better understand their customers.

Many retailers have a treasure trove of information in the thousands, or millions, of product reviews provided by their customers. Often, it takes time for particular reviews to show their value because customers “vote” for helpful or not helpful reviews over time. Using machine learning, retailers can automate identifying useful reviews in near real-time and leverage that insight quickly to build additional business value.

But how might a retailer without deep big data and machine learning expertise even begin to conduct this type of advanced analytics on such a large quantity of unstructured data? We will be holding a workshop in January to show you how easy that can be through the use of Azure and Qubole’s big data service.

Using these technologies, anyone can quickly spin up a data platform and train a machine learning model utilizing Natural Language Processing (NLP) to identify the most useful reviews.
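To make the idea of scoring reviews for usefulness more concrete, here is a minimal sketch in plain Python. It is not the workshop's pipeline and does not use Qubole or Spark; it swaps in a small multinomial naive Bayes text classifier as one possible NLP technique. The class name, the labels, and the six toy reviews are all invented for illustration; in practice the labels would come from the helpful or not helpful votes customers leave over time.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesReviewClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical helper for illustration only; not part of any Azure or
    Qubole API.
    """

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training reviews
        self.vocab = set()           # every token seen during training

    def fit(self, reviews, labels):
        for text, label in zip(reviews, labels):
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Return an unnormalized log-probability score for each label."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Start from the log prior of the label.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                # Words unseen for this label fall back to a count of 1.
                score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy data: real labels would be derived from customer votes.
TOY_REVIEWS = [
    "The battery lasts two days and the screen is bright, but the case scratches easily.",
    "Runs small, so order one size up; the stitching held after ten washes.",
    "Setup took five minutes and the battery charges fully in an hour.",
    "Love it!!!",
    "Bad. Do not buy.",
    "Great product, love it.",
]
TOY_LABELS = ["helpful"] * 3 + ["unhelpful"] * 3

if __name__ == "__main__":
    clf = NaiveBayesReviewClassifier().fit(TOY_REVIEWS, TOY_LABELS)
    for review in ["The battery lasts a full day and the stitching held up.",
                   "Love it, great!"]:
        print(review, "->", clf.predict(review))
```

A model this small only illustrates the mechanics. With thousands or millions of reviews, the same train-then-score pattern would typically run on a distributed engine with a richer feature set.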

03

Jan

Azure Data Lake Tools integrates with VSCode Data Lake Explorer and Azure Account

If you are a data scientist who wants to explore your data and understand what is stored and how the folders are organized, try Data Lake Explorer in VSCode ADL Tools. If you are a developer looking for easier navigation inside ADLS, Data Lake Explorer can help you as well. The VSCode Data Lake Explorer enhances your Azure login experience, lets you manage your ADLA metadata in a tree-like hierarchy, and enables easier file exploration for ADLS resources under your Azure subscriptions. You can also preview, delete, download, and upload files through the context menu. With the integration of VSCode explorer, you can choose your preferred way to manage your U-SQL databases and your ADLS storage accounts in addition to the existing ADLA and ADLS commands.

If you have difficulty logging in to Azure and are looking for a simpler sign-in process, the Azure Data Lake Tools integration with VSCode Azure Account enables automatic sign-in and greatly improves the Azure integration experience. If you are an Azure multi-tenant user, the integration with Azure Account unblocks you and lets you navigate your Azure subscription resources across tenants.

If your source