Category Archives : Data Warehouse

31

Jan

Announcing the general availability of Query Store for Azure SQL Data Warehouse

Since our preview announcement, hundreds of customers have been enabling Query Store to provide insight on query performance. We’re excited to share the general availability of Query Store worldwide for Azure SQL Data Warehouse.

Query Store automatically captures a history of queries, plans, and runtime statistics and retains them for your review when monitoring your data warehouse. Query Store separates data by time windows so you can see database usage patterns and understand when plan changes happen.

Top three reasons to use Query Store right now

1. Find the full text of any query: Using the sys.query_store_query and sys.query_store_query_text catalog views, you can see the full text of queries executed against your data warehouse over the last 7 days.

SELECT q.query_id , t.query_sql_text FROM sys.query_store_query q JOIN sys.query_store_query_text t ON q.query_text_id = t.query_text_id;

2. Finding your top executing queries: Query Store tracks all query executions for your review. On a busy data warehouse, you may have thousands or millions of queries executed daily. Using the Query Store catalog views, you can get the top executing queries for further analysis:

SELECT TOP 10 q.query_id [query_id] , t.query_sql_text [command] , SUM(rs.count_executions) [execution_count] FROM sys.query_store_query q JOIN sys.query_store_query_text t ON q.query_text_id

Share

07

Jan

Teradata to Azure SQL Data Warehouse migration guide
Teradata to Azure SQL Data Warehouse migration guide

With the increasing benefits of cloud-based data warehouses, there has been a surge in the number of customers migrating from their traditional on-premises data warehouses to the cloud. Microsoft Azure SQL Data Warehouse (SQL DW) offers the best price to performance when compared to its cloud-based data warehouse competitors. Teradata is a relational database management system and is one of the legacy on-premises systems that customers are looking to migrate from.

The Teradata to SQL DW migrations involve multiple steps. These steps include analyzing the existing workload, generating the relevant schema models, and performing the ETL operation. The intent of this discussed whitepaper is to provide guidance for these aforesaid migrations with emphasis on the migration workflow, the architecture, technical design considerations, and best practices.

Migration Phases

The Teradata migration should pivot on the following six areas. Though recommended, proof of concept is an alternative step. With the benefit of Azure, you can quickly provision Azure SQL Data Warehouses for your development team to start business object migration before the data is migrated and speed up the migration process.

Phase one – Fact finding

Through a question and answers session you can define what your inputs and outputs are

Share

10

Dec

Deploying Apache Airflow in Azure to build and run data pipelines

Apache Airflow is an open source platform used to author, schedule, and monitor workflows. Airflow overcomes some of the limitations of the cron utility by providing an extensible framework that includes operators, programmable interface to author jobs, scalable distributed architecture, and rich tracking and monitoring capabilities. Since its addition to Apache foundation in 2015, Airflow has seen great adoption by the community for designing and orchestrating ETL pipelines and ML workflows. In Airflow, a workflow is defined as a Directed Acyclic Graph (DAG), ensuring that the defined tasks are executed one after another managing the dependencies between tasks.

A simplified version of the Airflow architecture is shown below. It consists of a web server that provides UI, a relational metadata store that can be a MySQL/PostgreSQL database, persistent volume that stores the DAG files, a scheduler, and worker process.

The above architecture can be implemented to run in four execution modes, including:

Sequential Executor – This mode is useful for dev/test or demo purpose. It serializes the operations and allows only a single task to be executed at a time. Local Executor – This mode supports parallelization and is suitable for small to medium size workload. It doesn’t support

Share

10

Dec

Power BI and Azure Data Services dismantle data silos and unlock insights

Learn how to connect Power BI and Azure Data Services to share data and unlock new insights with a new tutorial. Business analysts who use Power BI dataflows can now share data with data engineers and data scientists, who can leverage the power of Azure Data Services, including Azure Databricks, Azure Machine Learning, Azure SQL Data Warehouse, and Azure Data Factory for advanced analytics and AI.

With the recently announced preview of Power BI dataflows, Power BI has enabled self-service data prep for business analysts. Power BI dataflows can ingest data from a large array of transactional and observational data sources, and cleanse, transform, enrich, schematize, and store the result. Dataflows are reusable and can be refreshed automatically and daisy-chained to create powerful data preparation pipelines. Power BI is now making available support for storing dataflows in Azure Data Lake Storage (ADLS) Gen2, including both the data and dataflow definition. By storing dataflows in Azure Data Lake Storage Gen2, business analysts using Power BI can now collaborate with data engineers and data scientists using Azure Data Services.

Data silos inhibit data sharing

The ability for organizations to extract intelligence from business data provides a key competitive advantage, however attempting this

Share

06

Dec

Azure Data Lake Storage Gen2 preview – More features, more performance, better availability

Since we announced the limited public preview of Azure Data Lake Storage (ADLS) Gen2 in June, the response has been resounding. Customers participating in the ADLS Gen2 preview have directly benefitted from the scale, performance, security, manageability, and cost-effectiveness inherent in the ADLS Gen2 offering. Today, we are very pleased to announce significant updates to the preview that will allow an even greater experience for customers.

Today’s announcements include additional features that preview customers have been asking for:

Enterprise-class security features integrated into Azure Databricks and Azure HDInsight (available shortly) Azure Storage Explorer support to view and manage data in ADLS Gen2 accounts, including data exploration and access control management Support for connecting external tables in SQL Data Warehouse, including when Storage Firewalls are active on the account Power BI and SQL Data Warehouse supporting the Common Data Model for entities stored in ADLS Gen2 Storage Firewall and Virtual Network rules integration for all analytics services Encryption of data at rest using either Microsoft or customer supplied keys as well as encryption in transit via TLS 1.2 Ability to mount an ADLS Gen2 filesystem into the Databricks File System (DBFS)

Additionally, as of today, the ADLS Gen2 public preview is

Share

06

Dec

General availability of VNet Service Endpoints for Azure SQL Data Warehouse

This blog post was co-authored by Anitha Adusumilli, Principal Program Manager, Azure Networking.

We’re excited to share the general availability of Virtual Network (VNet) Service Endpoints for Azure SQL Data Warehouse in all Azure regions. Azure SQL Data Warehouse is a fast, flexible, and secure cloud data warehouse tuned for running complex queries fast and across petabytes of data.

VNet Service Endpoints enable you to isolate connectivity to your logical server from a given subnet or set of subnets within your virtual network. The traffic to Azure SQL Data Warehouse from your VNet will always stay within the Azure backbone network. This direct route will be preferred over any specific routes that take Internet traffic through virtual appliances or on-premises. There is no additional billing for virtual network access through service endpoints. Current pricing model for Azure SQL Data Warehouse applies as is.

With this release, we also enabled PolyBase connectivity to Azure Data Lake Storage Gen2 (ADLS) via Azure Blob File System (ABFS) driver. Azure Data Lake Storage Gen2 brings all the qualities that are required for the full lifecycle of analytics data to Azure Storage. It is the result of converging the capabilities of our two existing storage

Share

05

Dec

Azure SQL Data Warehouse Gen2 now supports lower compute tiers

Microsoft is helping drive down the entry-level cost of running a data warehouse capable of handling demanding queries by adding lower compute tiers for the blazing fast Azure SQL Data Warehouse Gen2. Customers can experience Azure SQL Data Warehouse’s leading performance, flexibility, and security features starting with 100 cDWU (Data Warehouse Units) and scale to 30,000 cDWU in minutes. Starting mid-December 2018, customers can benefit from Gen2 performance and flexibility with lower compute tiers in 15 regions, with remaining regions available during 2019.

By dropping the entry point for next-generation data warehousing, Microsoft opens the doors to value-driven customers who want to evaluate all the benefits of a secure, high-performance data warehouse without guessing which trial environment is best for them. Customers will be able to start as low as 100 cDWU, down from the current 500 cDWU entry point. SQL Data Warehouse Gen2 continues to support pause and resume operations and goes beyond just the flexibility in compute. Gen2 also supports unlimited column-store storage capacity along with 2.5 times more memory per query, up to 128 concurrent queries and adaptive caching features bringing experiences on average of 5 times more performance compared to the same Data Warehouse Unit on

Share

07

Nov

Azure SQL Data Warehouse introduces new productivity and security capabilities

SQL Data Warehouse continues to provide a best in class price to performance offering, leading others in TPC-H and TPC-DS benchmarks based on independent testing. As a result, we are seeing customers, including more than 50 percent of Fortune 1000 enterprise such as Anheuser Busch InBev, Thomson Reuters, and ThyssenKrupp build new analytics solutions on Azure. 

With the launch of SQL Data Warehouse Gen2 in April 2018, customers have benefited tremendously from query performance and concurrency enhancements. To support our customers’ exponentially growing data volume and resulting analytics workloads, today we are sharing new SQL Data Warehouse features. Enhanced workload management, row-level security, and improved operational experiences.

Enhanced workload management

SQL Data Warehouse will offer workload management capabilities that optimize query execution to ensure that high-value work gets priority access to system resources. With features such as workload importance, customers can use a single SQL Data Warehouse database to more efficiently run multiple workloads, taking away the complexity of separate data warehouses for each solution. With this new capability, SQL Data Warehouse enables better control, utilization, and optimization over deployed resources. Workload importance will be available for all SQL Data Warehouse customers later this year at no additional cost.

Share

07

Nov

Workload insights into SQL Data Warehouse delivered through Microsoft Azure Monitor diagnostic logs

SQL Data Warehouse (SQL DW) now enables enhanced insights into analytical workloads by integrating directly with Microsoft Azure Monitor diagnostic logs. This new capability enables developers to analyze workload behavior over an extended time period and make informed decisions on query optimization or capacity management. SQL DW is a flexible, secure, and fully managed analytics platform for the enterprise optimized for running complex queries quickly across petabytes of data.

Today, customers leverage Dynamic Management Views (DMVs) to get insights into their data warehouse workload. These DMVs have a limit of 10,000 rows that can easily be exceeded for intensive enterprise data warehouse workloads with heavy query activity. Relying solely on DMVs hinders or blocks many query troubleshooting scenarios for active workloads. To work around this DMV limitation, custom logging solutions were required which consumed internal system resources, increased the total cost of the data warehouse solution, and introduced additional development complexities and maintenance effort.

We have now introduced an external logging process through Azure Monitor diagnostic logs, which provides additional insights into your data warehouse workload. With a single click of a button, you are now able to configure diagnostic logs for historical query performance troubleshooting capabilities using Log Analytics.

Share

07

Nov

https://azure.microsoft.com/blog/automatically-discover-workload-insights-for-advanced-performance-tuning-directly-in-the-azure-portal/Advanced tuning for Azure SQL Data Warehouse (SQL DW) just got simpler with additional data warehouse recommendations and metrics. SQL DW is a flexible, secure, and fully managed analytics platform for enterprises optimized for running complex queries quickly across petabytes READ MORE

Share