Category Archives : Data Warehouse



Now available for preview: Workload importance for Azure SQL Data Warehouse

Azure SQL Data Warehouse is a fast, flexible and secure analytics platform for enterprises of all sizes. Today we are announcing the preview availability of workload importance on the Gen2 platform to help customers manage resources more efficiently. Workload importance gives data engineers the ability to use importance to classify requests. Requests with higher importance are guaranteed quicker access to resources which helps meet SLAs.

“More with less” is often the motto when it comes to operating data warehousing solutions. The ability to easily scale up compute resources gives data engineers tremendous flexibility. However, when there is budget pressure and scaling down is required, problems can arise.  Workload importance allows high business value work to meet SLAs in a shared environment with fewer resources.

An example of workload importance is shown below. The CEO’s request was submitted last and classified with high importance. Because the CEO’s request has high importance, it is granted access to resources before the Analyst requests allowing it to complete sooner.

Get started now classifying requests with importance

Classifying requests is done with the new CREATE WORKLOAD CLASSIFIER syntax. Below is an example that maps the login for the ExecutiveReports role to ABOVE_NORMAL importance and




Spinning up cloud-scale analytics is even more compelling with Talend and Microsoft

Special thanks to Lee Schlesinger and the Talend team for their contribution to this blog post.

Following the significant announcement around the continued price-performance leadership of Azure Data Warehouse in February 2019, Talend announced support of Stitch Data Loader for Azure SQL Data Warehouse. Stich Data Loader is Talend’s recent addition to its offering portfolio small and mid-market customers. With Stitch Data Loader, customers can load 5 million rows/month into Azure SQL Data Warehouse for free or scale up to an unlimited number of rows with a subscription.

All across the industry, there is a rapid shift to the cloud. Utilizing fast, flexible, and secure cloud data warehouse is an important first step in that journey. With Microsoft Azure SQL Data Warehouse and Stitch Data Loader companies can get started faster than ever. The fact that ADW can be up to 14x faster, and 94 percent less expensive than similar options in the marketplace, should only help further accelerate adoption of cloud scale analytics by customers of all sizes.

Building pipelines to the cloud with Stitch Data Loader

The Stitch team built the Azure SQL Data Warehouse integration with the help of Microsoft engineers. The solution leverages Azure Blob Storage




Stay informed about service issues with Azure Service Health

When your Azure resources go down, one of your first questions is probably, “Is it me or is it Azure?” Azure Service Health helps you stay informed and take action when Azure service issues like incidents and planned maintenance affect you by providing a personalized health dashboard, customizable alerts, and expert guidance.

In this blog, we’ll cover how you can use Azure Service Health’s personalized dashboard to stay informed about issues that could affect you now or in the future.

Monitor Azure service issues and take action to mitigate downtime

You may already be familiar with the Azure status page, a global view of the health of all Azure services across all Azure regions. It’s a good reference for major incidents with widespread impact, but we recommend using Azure Service Health to stay informed about Azure incidents and maintenance. Azure Service Health only shows issues that affect you, provides information about all incidents and maintenance, and has richer capabilities like alerting, shareable updates and RCAs, and other guidance and support.

Azure Service Health tracks three types of health events that may impact you:

Service issues: Problems in Azure services that affect you right now. Planned maintenance: Upcoming maintenance that




Microsoft continues to build the case for data estate modernization on Azure

Special thanks to Rik Tamm-Daniels and the Informatica team for their contribution to this blog post. ​

With the latest release of Azure SQL Data Warehouse, Microsoft doubles-down on Azure SQL DW as one of the core data services for digital transformation on Azure. In addition to the fundamental benefits of agility, on-demand scaling and unlimited compute availability, the most recent price-to-performance metrics from the GigaOM report are one of several the compelling arguments they have made for customers to adopt Azure SQL DW. Interestingly, Microsoft is also announcing the general availability of Azure Data Lake Gen 2 and Azure Data Explorer. Along with Power BI for rich visualization, these enhanced set of capabilities cement Microsoft’s leadership position around Cloud Scale Analytics.

Every day, I speak with joint Informatica and Microsoft customers who are looking to transform their approach to their data estate with a cohesive data lake and cloud data warehousing solution architecture. These customers range from global logistics companies, to auto manufacturers to the world’s largest insurers, and all of them see the tremendous potential of the Microsoft modern data estate approach; in fact, just via Informatica’s iPaaS (integration platform-as-a-service) offering, Informatica Intelligent Cloud Services, we’ve seen a




Individually great, collectively unmatched: Announcing updates to 3 great Azure Data Services

As Julia White mentioned in her blog today, we’re pleased to announce the general availability of Azure Data Lake Storage Gen2 and Azure Data Explorer. We also announced the preview of Azure Data Factory Mapping Data Flow. With these updates, Azure continues to be the best cloud for analytics with unmatched price-performance and security. In this blog post we’ll take a closer look at the technical capabilities of these new features.

Azure Data Lake Storage – The no compromise Data Lake

Azure Data Lake Storage (ADLS) combines the scalability, cost effectiveness, security model, and rich capabilities of Azure Blob Storage with a high-performance file system that is built for analytics and is compatible with the Hadoop Distributed File System. Customers no longer have to tradeoff between cost effectiveness and performance when choosing a cloud data lake.

One of our key priorities was to ensure that ADLS is compatible with the Apache ecosystem. We accomplished this by developing the Azure Blob File System (ABFS) driver. The ABFS driver is officially part of Apache Hadoop and Spark and is incorporated in many commercial distributions. The ABFS driver defines a URI scheme that allows files and folders to be distinctly addressed in the




Announcing the general availability of Query Store for Azure SQL Data Warehouse

Since our preview announcement, hundreds of customers have been enabling Query Store to provide insight on query performance. We’re excited to share the general availability of Query Store worldwide for Azure SQL Data Warehouse.

Query Store automatically captures a history of queries, plans, and runtime statistics and retains them for your review when monitoring your data warehouse. Query Store separates data by time windows so you can see database usage patterns and understand when plan changes happen.

Top three reasons to use Query Store right now

1. Find the full text of any query: Using the sys.query_store_query and sys.query_store_query_text catalog views, you can see the full text of queries executed against your data warehouse over the last 7 days.

SELECT q.query_id , t.query_sql_text FROM sys.query_store_query q JOIN sys.query_store_query_text t ON q.query_text_id = t.query_text_id;

2. Finding your top executing queries: Query Store tracks all query executions for your review. On a busy data warehouse, you may have thousands or millions of queries executed daily. Using the Query Store catalog views, you can get the top executing queries for further analysis:

SELECT TOP 10 q.query_id [query_id] , t.query_sql_text [command] , SUM(rs.count_executions) [execution_count] FROM sys.query_store_query q JOIN sys.query_store_query_text t ON q.query_text_id




Teradata to Azure SQL Data Warehouse migration guide
Teradata to Azure SQL Data Warehouse migration guide

With the increasing benefits of cloud-based data warehouses, there has been a surge in the number of customers migrating from their traditional on-premises data warehouses to the cloud. Microsoft Azure SQL Data Warehouse (SQL DW) offers the best price to performance when compared to its cloud-based data warehouse competitors. Teradata is a relational database management system and is one of the legacy on-premises systems that customers are looking to migrate from.

The Teradata to SQL DW migrations involve multiple steps. These steps include analyzing the existing workload, generating the relevant schema models, and performing the ETL operation. The intent of this discussed whitepaper is to provide guidance for these aforesaid migrations with emphasis on the migration workflow, the architecture, technical design considerations, and best practices.

Migration Phases

The Teradata migration should pivot on the following six areas. Though recommended, proof of concept is an alternative step. With the benefit of Azure, you can quickly provision Azure SQL Data Warehouses for your development team to start business object migration before the data is migrated and speed up the migration process.

Phase one – Fact finding

Through a question and answers session you can define what your inputs and outputs are




Deploying Apache Airflow in Azure to build and run data pipelines

Apache Airflow is an open source platform used to author, schedule, and monitor workflows. Airflow overcomes some of the limitations of the cron utility by providing an extensible framework that includes operators, programmable interface to author jobs, scalable distributed architecture, and rich tracking and monitoring capabilities. Since its addition to Apache foundation in 2015, Airflow has seen great adoption by the community for designing and orchestrating ETL pipelines and ML workflows. In Airflow, a workflow is defined as a Directed Acyclic Graph (DAG), ensuring that the defined tasks are executed one after another managing the dependencies between tasks.

A simplified version of the Airflow architecture is shown below. It consists of a web server that provides UI, a relational metadata store that can be a MySQL/PostgreSQL database, persistent volume that stores the DAG files, a scheduler, and worker process.

The above architecture can be implemented to run in four execution modes, including:

Sequential Executor – This mode is useful for dev/test or demo purpose. It serializes the operations and allows only a single task to be executed at a time. Local Executor – This mode supports parallelization and is suitable for small to medium size workload. It doesn’t support




Power BI and Azure Data Services dismantle data silos and unlock insights

Learn how to connect Power BI and Azure Data Services to share data and unlock new insights with a new tutorial. Business analysts who use Power BI dataflows can now share data with data engineers and data scientists, who can leverage the power of Azure Data Services, including Azure Databricks, Azure Machine Learning, Azure SQL Data Warehouse, and Azure Data Factory for advanced analytics and AI.

With the recently announced preview of Power BI dataflows, Power BI has enabled self-service data prep for business analysts. Power BI dataflows can ingest data from a large array of transactional and observational data sources, and cleanse, transform, enrich, schematize, and store the result. Dataflows are reusable and can be refreshed automatically and daisy-chained to create powerful data preparation pipelines. Power BI is now making available support for storing dataflows in Azure Data Lake Storage (ADLS) Gen2, including both the data and dataflow definition. By storing dataflows in Azure Data Lake Storage Gen2, business analysts using Power BI can now collaborate with data engineers and data scientists using Azure Data Services.

Data silos inhibit data sharing

The ability for organizations to extract intelligence from business data provides a key competitive advantage, however attempting this




Azure Data Lake Storage Gen2 preview – More features, more performance, better availability

Since we announced the limited public preview of Azure Data Lake Storage (ADLS) Gen2 in June, the response has been resounding. Customers participating in the ADLS Gen2 preview have directly benefitted from the scale, performance, security, manageability, and cost-effectiveness inherent in the ADLS Gen2 offering. Today, we are very pleased to announce significant updates to the preview that will allow an even greater experience for customers.

Today’s announcements include additional features that preview customers have been asking for:

Enterprise-class security features integrated into Azure Databricks and Azure HDInsight (available shortly) Azure Storage Explorer support to view and manage data in ADLS Gen2 accounts, including data exploration and access control management Support for connecting external tables in SQL Data Warehouse, including when Storage Firewalls are active on the account Power BI and SQL Data Warehouse supporting the Common Data Model for entities stored in ADLS Gen2 Storage Firewall and Virtual Network rules integration for all analytics services Encryption of data at rest using either Microsoft or customer supplied keys as well as encryption in transit via TLS 1.2 Ability to mount an ADLS Gen2 filesystem into the Databricks File System (DBFS)

Additionally, as of today, the ADLS Gen2 public preview is