Category Archives: Data Warehouse



Azure Databricks, industry-leading analytics platform powered by Apache Spark™

This blog post was co-authored by Ali Ghodsi, CEO, Databricks.

The confluence of cloud, data, and AI is driving unprecedented change. The ability to utilize data and turn it into breakthrough insights is foundational to innovation today. Our goal is to empower organizations to unleash the power of data and reimagine possibilities that will improve our world.

To enable this journey, we are excited to announce the general availability of Azure Databricks, a fast, easy, and collaborative Apache® Spark™-based analytics platform optimized for Azure.

Fast, easy, and collaborative

Over the past five years, Apache Spark has emerged as the open source standard for advanced analytics, machine learning, and AI on Big Data. With a massive community of over 1,000 contributors and rapid adoption by enterprises, we see Spark’s popularity continue to rise.

Azure Databricks is designed in collaboration with Databricks, whose founders started the Spark research project at UC Berkeley that later became Apache Spark. Our goal with Azure Databricks is to help customers accelerate innovation and simplify the process of building Big Data & AI solutions by combining the best of Databricks and Azure.

To meet this goal, we developed Azure Databricks with three design principles.

First, enhance



Microsoft creates industry standards for datacenter hardware storage and security

Today I’m speaking at the Open Compute Project (OCP) U.S. Summit 2018 in San Jose where we are announcing a next generation specification for solid state device (SSD) storage, Project Denali. We’re also discussing Project Cerberus, which provides a critical component for security protection that to date has been missing from server hardware: protection, detection and recovery from attacks on platform firmware. Both storage and security are the next frontiers for hardware innovation, and today we’re highlighting the latest advancements across these key focus areas to further the industry in enabling the future of the cloud.

A new standard for cloud SSD storage

Storage paradigms have performed well on-premises, but they haven’t delivered the performance gains and cost efficiencies that cloud-based models need. For this reason, we’re setting out to define a new standard for flash storage specifically targeted at cloud-based workloads, and I’m excited to reveal Project Denali, which we’re establishing with CNEX Labs. Fundamentally, Project Denali standardizes the SSD firmware interfaces by disaggregating the functionality for software-defined data layout and media management. With Project Denali, customers can achieve greater levels of performance while leveraging the cost-reduction economics that come at cloud scale.

Project Denali is



Gartner reaffirms Microsoft as a leader in Data Management Solutions for Analytics

We are excited to announce that Microsoft has once again been positioned as a leader in Gartner’s 2018 Magic Quadrant for Data Management Solutions for Analytics (DMSA). Gartner has also positioned Microsoft as a leader in the Magic Quadrant for Analytics and Business Intelligence Platforms, and in the Magic Quadrant for Operational Database Management Systems. This is an exciting milestone, and it is Microsoft’s perspective that this underscores our global leadership and relentless commitment to innovation across the data estate.

Gartner defines DMSA as a complete software system that supports and manages data in one or more file management systems (usually databases). DMSA includes specific optimizations to support analytical processing. This includes, but is not limited to, support for relational processing, nonrelational processing (such as graph processing), and machine learning and programming languages such as Python and R.

Source: Gartner (February, 2018)*

At Microsoft, we’ve championed a data platform evolution to make big data processing and analytics simpler and more accessible, helping you transform data into intelligent action. We do this through SQL Server 2017 and key Azure services: Azure SQL Data Warehouse (a fully managed, MPP architecture cloud data warehouse), Azure Databricks (an Apache Spark-based analytics platform)



Unlock Query Performance with SQL Data Warehouse using Graphical Execution Plans

The Graphical Execution Plan feature within SQL Server Management Studio (SSMS) is now supported for SQL Data Warehouse (SQL DW)! With a click of a button, you can create a graphical representation of a distributed query plan for SQL DW.

Before this enhancement, query troubleshooting for SQL DW was often a tedious process that required you to run the EXPLAIN command. SQL DW customers can now seamlessly and visually debug query plans to identify performance bottlenecks directly within the SSMS window. This extends the query troubleshooting experience by displaying costly data movement operations, which are the most common reason for slow distributed query plans. Below is a simple example of troubleshooting a distributed query plan with SQL DW using the Graphical Execution Plan.

The view below displays the estimated execution plan for a query. As we can see, this is an incompatible join, which occurs when there is a join between two tables distributed on different columns. An incompatible join will create a ShuffleMove operation, where temp tables will be created on every distribution to satisfy the join locally before streaming the results back to the user. The ShuffleMove has become a performance bottleneck for this query.
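To make the idea of an incompatible join more concrete, here is a minimal Python sketch of why data movement is needed. It is a toy model, not the SQL DW engine: the modulo function stands in for the real hash function, the distribution count of 4 is chosen only for readability, and the `orders` table and its column names are hypothetical.

```python
# Toy model of hash distribution; NOT the SQL DW engine.
NUM_DISTRIBUTIONS = 4  # illustrative value chosen for readability


def distribution_of(value):
    """Stand-in for the engine's hash function: map a key to a distribution."""
    return value % NUM_DISTRIBUTIONS


def rows_to_move(rows, distribution_column, join_column):
    """Count rows that must change distribution so the join can run locally."""
    return sum(
        1
        for row in rows
        if distribution_of(row[distribution_column]) != distribution_of(row[join_column])
    )


# Hypothetical fact table: eight orders spread over three customers.
orders = [{"order_id": i, "customer_id": i % 3} for i in range(8)]

# Distributed on order_id but joined on customer_id: an incompatible join.
moved_incompatible = rows_to_move(orders, "order_id", "customer_id")

# Distributed on customer_id and joined on customer_id: a compatible join.
moved_compatible = rows_to_move(orders, "customer_id", "customer_id")

print(moved_incompatible, moved_compatible)
```

In this toy data set, five of the eight rows would have to be shuffled when the table is distributed on a column other than the join column, and none when the distribution column matches the join column. That is the intuition behind the cost of a ShuffleMove.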




Simplify cloud adoption with Azure SQL Data Warehouse and Datometry

We increasingly see enterprises formulating, if not already executing, a cloud-first strategy for their on-premises enterprise data management to benefit from the inherent elasticity, flexibility, and performance of a cloud data warehouse like Azure SQL Data Warehouse (Azure SQL DW).

The common challenge in moving to Azure SQL DW is the complexity of shifting decades of on-premises data management to the cloud. Over the years, enterprises have built complex, disparate suites of applications, such as point of sale, logistics, analytics, and reporting, that communicate with a central database. Many apps simply can’t use any database other than the one they were originally written for.

Microsoft has partnered with Datometry to simplify our customers’ journey to the cloud. Re-platforming from Teradata Data Warehouse to SQL DW can be completed in weeks, not years, and at a fraction of the cost of a traditional migration. With Datometry’s Adaptive Data Virtualization technology, existing Teradata applications can run instantly and natively on Azure SQL DW without rewriting or redesigning the legacy applications.

In the Case Study discussed in this blog post, a Fortune 100 retailer was looking to move their custom business intelligence application with close to 40 million application queries executed per



New Azure Data Factory self-paced hands-on lab for UI

A few weeks back, we announced the public preview release of the new browser-based V2 UI experience for Azure Data Factory. We’ve since partnered with Pragmatic Works, long-time experts in the Microsoft data integration and ETL space, to create a new set of hands-on labs that you can now use to learn how to build data integration (DI) patterns using ADF V2.

In that repo, you will find data files and scripts in the Deployment folder. There are also lab manual folders for each lab module, as well as an overview presentation to walk you through the labs. Below you will find more details on each module.

The repo also includes a series of PowerShell and database scripts as well as Azure ARM templates that will generate resource groups that the labs need in order for you to successfully build out an end-to-end scenario, including some sample data that you can use for Power BI reports in the final Lab Module 9.

Here is how the individual labs are divided:

Lab 1 – Setting up ADF and Resources: Start here to get all of the ARM resource groups and database backup files loaded properly.
Lab 2 – Lift



December 2017 Leaderboard of Database Systems contributors on MSDN

Congratulations to our December top 10 contributors! Alberto Morillo and Visakh Murukesan maintain their top positions.

This Leaderboard initiative was started in October 2016 to recognize the top Database Systems contributors on MSDN forums. The following continues to be the points hierarchy (in decreasing order of points):



#Azure #SQLDW, the cost benefits of on-demand data warehousing

Prices illustrated below are based on East US 2 as of December 18, 2017. For pricing updates, visit the Azure Analysis Services, SQL Database, and SQL Data Warehouse pricing pages.

Azure SQL Data Warehouse is Microsoft’s SQL analytics platform, the backbone of your Enterprise Data Warehouse (EDW). The service is designed to let customers scale compute and storage elastically and independently. It acts as a hub for your data marts and cubes, so you can optimize and tailor the performance of your EDW. Azure SQL DW offers guaranteed 99.9 percent availability, petabyte (PB) scale, compliance, advanced security, and tight integration with upstream and downstream services so you can build a data warehouse that fits your needs. Azure SQL DW is the only data warehouse service enabling enterprises to gain insights from data everywhere with global availability in more than 30 regions.

This is the last blog post in our series detailing the benefits of Hub and Spoke data warehouse architecture on Azure. On-premises, a Hub and Spoke architecture was hard and expensive to maintain. In the cloud, the cost of such architecture can be much lower as you can dynamically adjust compute capacity to what you need, when you



November 2017 Leaderboard of Database Systems contributors on MSDN

Congratulations to our November top 10 contributors! Alberto Morillo maintains the first position in the cloud ranking, while Visakh Murukesan maintains the top position in the All Databases ranking.

This Leaderboard initiative was started in October 2016 to recognize the top Database Systems contributors on MSDN forums. The following continues to be the points hierarchy (in decreasing order of points):



#AzureSQLDW cost savings with Autoscaler – part 2

This blog post was co-authored by Eldad Hagashi, Software Engineering Manager, and Feng Tan, Software Engineer, Microsoft Education Data Service.

Azure SQL Data Warehouse is Microsoft’s SQL analytics platform, the backbone of your enterprise data warehouse (EDW). The service is