Category Archives: Data Warehouse

12 Jul

Lightning fast query performance with Azure SQL Data Warehouse

Azure SQL Data Warehouse is a fast, flexible and secure analytics platform for enterprises of all sizes. Today we announced significant query performance improvements for Azure SQL Data Warehouse (SQL DW) customers, enabled by enhancements to the distributed query execution layer.

Analytics workload performance is determined by two major factors: I/O bandwidth to storage, and repartitioning speed, also known as shuffle speed. In a previous blog post, we described how SQL DW caches relevant data to take advantage of NVMe-based local storage. In this post, we go under the hood of SQL DW to see how shuffle speed has improved.

Data movement is an operation in which parts of the distributed tables are moved to different nodes during query execution. It is required when the data is not available on the target node, most commonly when the joined tables do not share the same distribution key. The most common data movement operation is shuffle. During a shuffle, SQL DW computes a hash value from the join columns of each input row and sends that row to the node that owns that hash value. Either one or both sides of the join can participate in the shuffle. The diagram below
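To make the shuffle concrete, here is a minimal Python sketch of a hash-partitioned shuffle followed by a local join. It is a toy model under stated assumptions, not SQL DW's implementation: the four-node cluster, the use of `zlib.crc32` as the hash function, and the helper names `owner_node`, `shuffle`, and `local_join` are all hypothetical. Both sides of the join are shuffled here; because rows with equal join-key values always hash to the same node, each node can then join its own rows without further communication.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # assumption: a small toy cluster, not a real SQL DW configuration


def owner_node(join_key, num_nodes=NUM_NODES):
    """Return the node that owns the hash bucket for this join-key value."""
    # zlib.crc32 is deterministic across processes, unlike Python's salted str hash()
    return zlib.crc32(str(join_key).encode("utf-8")) % num_nodes


def shuffle(rows, key_column, num_nodes=NUM_NODES):
    """Route every row to the node that owns hash(row[key_column])."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[owner_node(row[key_column], num_nodes)].append(row)
    return dict(nodes)


def local_join(left_nodes, right_nodes, key):
    """Join each node's partitions locally; equal keys are already co-located."""
    joined = []
    for node_id, left_rows in left_nodes.items():
        for left_row in left_rows:
            for right_row in right_nodes.get(node_id, []):
                if left_row[key] == right_row[key]:
                    joined.append({**left_row, **right_row})
    return joined


# Hypothetical sample tables that are not distributed on the join key
orders = [
    {"customer_id": "c1", "order_id": 1},
    {"customer_id": "c2", "order_id": 2},
    {"customer_id": "c1", "order_id": 3},
]
customers = [
    {"customer_id": "c1", "name": "Ada"},
    {"customer_id": "c2", "name": "Lin"},
]

result = local_join(
    shuffle(orders, "customer_id"), shuffle(customers, "customer_id"), "customer_id"
)
print(sorted((r["order_id"], r["name"]) for r in result))
# [(1, 'Ada'), (2, 'Lin'), (3, 'Ada')]
```

Shuffling only one side works the same way when the other table is already distributed on the join key with the same hash function, since its rows already sit on the nodes that own their hash values.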

12 Jul

Azure sets new performance benchmarks with SQL Data Warehouse

As the amount of data grows exponentially, the pressure to quickly harness it for insights to share across the organization also increases rapidly. As we continue to evolve Microsoft's analytics portfolio, we are committed to delivering a data warehouse solution that provides a fast, flexible, and secure analytics platform in the cloud.

Today we are excited to announce that Azure SQL Data Warehouse has set new performance benchmarks for cloud data warehousing, delivering query performance at least 2x faster than before. The key to this technical innovation is instant data movement, a capability that moves data extremely efficiently between data warehouse compute nodes. At the heart of every distributed database system is the need to align two or more tables that are partitioned on different keys to produce a final or intermediate result set. Instant data movement in SQL Data Warehouse accelerates this alignment, resulting in faster queries. You can learn more about how your query performance will improve in this blog post.

Insights in minutes with Azure SQL Data Warehouse

We know that data makes every decision better, but decisions need to be timely to be competitive in the market. Fast decisions need

12 Jul

Announcing new offers and capabilities that make Azure the best place for all your apps, data, and infrastructure

Just a few months back, I wrote about how we are helping companies move to the cloud with a flexible hybrid approach and a cost-effective path to Azure. To help our customers realize even greater savings and simplify their migration to Azure, I’m pleased to share some exciting updates that address these needs.

Migrate to Azure with free Windows Server and SQL Server 2008 Extended Security Updates

Windows Server and SQL Server 2008/2008 R2 were hugely popular when they launched nearly 10 years ago and have been deployed in millions of instances worldwide. Our customers continue to run many of their business applications on these servers. As they approach their tenth anniversary, both versions are nearing end of support: SQL Server 2008 in July 2019 and Windows Server 2008 in January 2020. We know many customers want to keep running these servers, and we are here to help ensure all our customers have great options.

Today, we are announcing that we will provide Extended Security Updates free of charge for three years past the end-of-support dates when these servers run in Azure. This means customers get the efficiency of running servers in Azure,

10 Jul

Azure HDInsight now supports Apache Spark 2.3

Apache Spark 2.3.0 is now available for production use on Azure HDInsight, the managed big data service. Ranging from bug fixes (more than 1,400 tickets were fixed in this release) to new experimental features, Apache Spark 2.3.0 brings advancements and polish to all areas of its unified data platform.

Data engineers relying on Python UDFs get 10x to 100x speedups, thanks to revamped object serialization between the Spark runtime and Python. Data scientists will be delighted by better integration of deep learning frameworks such as TensorFlow with Spark machine learning pipelines. Business analysts gain a fast vectorized reader for the ORC file format, which finally makes interactive analytics in Spark practical over this popular columnar data format. Developers building real-time applications may want to experiment with the new Continuous Processing mode in Spark Structured Streaming, which brings event-processing latency down to the millisecond level.
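Much of the Python UDF speedup comes from moving data between the JVM and Python workers in batches (Spark 2.3 uses Apache Arrow for this and exposes it through `pyspark.sql.functions.pandas_udf`) instead of serializing one row at a time. The sketch below is not Spark code: it is a standard-library toy model, using `pickle` as a stand-in serializer, that simply counts serialization round trips for a row-at-a-time UDF versus a batch UDF. The function names and the batch size of 1,000 are assumptions for illustration.

```python
import pickle


def plus_one(x):
    """Scalar UDF: invoked once per row."""
    return x + 1.0


def plus_one_batch(xs):
    """Batch-style UDF: invoked once per batch of values."""
    return [x + 1.0 for x in xs]


def run_row_at_a_time(values, udf):
    """Serialize, 'ship', and deserialize each row separately."""
    out, round_trips = [], 0
    for v in values:
        payload = pickle.dumps(v)  # one serialization per row
        round_trips += 1
        out.append(udf(pickle.loads(payload)))
    return out, round_trips


def run_batched(values, batch_udf, batch_size):
    """Serialize, 'ship', and deserialize a whole batch at once."""
    out, round_trips = [], 0
    for start in range(0, len(values), batch_size):
        payload = pickle.dumps(values[start:start + batch_size])  # one per batch
        round_trips += 1
        out.extend(batch_udf(pickle.loads(payload)))
    return out, round_trips


values = [float(i) for i in range(10_000)]
row_out, row_trips = run_row_at_a_time(values, plus_one)
batch_out, batch_trips = run_batched(values, plus_one_batch, batch_size=1_000)
print(row_trips, batch_trips, row_out == batch_out)  # 10000 10 True
```

Both paths compute the same result; the batched path pays per-call overhead (serialization, transfer, and Python function dispatch) once per batch instead of once per row. Actual speedups in Spark depend on the workload.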

Vectorized object serialization in Python UDFs

It is worth mentioning that PySpark is already fast and takes advantage of vectorized data processing in the core Spark engine as long as you use the DataFrame APIs. This is good news, as DataFrame APIs cover the majority of use cases if you follow best practices for

27 Jun

Enterprises get deeper insights with Hadoop and Spark updates on Azure HDInsight

Azure HDInsight is one of the most popular services among enterprises for open source Hadoop and Spark analytics on Azure. With the price cut of more than 50 percent on HDInsight, customers moving to the cloud are reaping more savings than ever.

“PROS is a pioneer in using machine learning to give companies accurate and profitable pricing. The PROS Guidance product runs enormously complex pricing calculations based on variables that comprise multiple terabytes of data. In Azure HDInsight, a process that formerly took several days now takes just a few minutes.” – Ed Gonzalez, Product Manager, PROS

Today we are announcing updates to Apache Spark, Apache Kafka, ML Services, Azure Data Lake Storage Gen2, and enhancements to the Enterprise Security Package. These new capabilities will continue to drive savings for many of our customers. In addition, Microsoft continues to deepen its commitment to the Apache Hadoop ecosystem and has extended its partnership with Hortonworks to bring the best of Apache Hadoop and open source big data analytics to the cloud.

Continued investment in Open Source for new capabilities and reliability

Reliable Open Source

Microsoft is contributing to the Apache Hadoop ecosystem and also ensuring Azure is the most reliable place

20 Jun

Column-Level Security is now supported in Azure SQL Data Warehouse

Today we’re announcing Column-Level Security (CLS) for Azure SQL Data Warehouse, an additional capability for managing security for sensitive data. Azure SQL Data Warehouse is a fast, flexible and secure cloud data warehouse tuned for running complex queries fast across petabytes of data.

As you move data to the cloud, securing your data assets is critical to building trust with your customers and partners. With CLS, you can control who can view sensitive data by restricting user access to specific columns in your tables, without having to redesign your data warehouse. This simplifies the overall security implementation, because the access restriction logic lives in the database tier itself rather than in a separate application away from the data. CLS also eliminates the need to introduce views that filter out columns for access control.

Some examples of how this is being used today:

A financial services firm allows only account managers to access customer Social Security numbers (SSN), phone numbers, and other personally identifiable information (PII).

A health care provider allows only doctors and nurses to access sensitive medical records, while members of the billing department cannot view this data.
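In T-SQL, CLS is configured with column-scoped `GRANT SELECT` statements, for example `GRANT SELECT ON dbo.Customers (name, balance) TO billing;` (the table, column, and principal names are hypothetical). The Python sketch below is only a toy model of those semantics for the financial services example above, not SQL DW code: a principal may read just the columns it was granted, and, modeled on SQL Server's behavior, a query that touches any other column fails instead of silently dropping it. The `ColumnGrants` class and the sample data are hypothetical.

```python
class ColumnGrants:
    """Toy model of column-level SELECT permissions (not SQL DW code)."""

    def __init__(self):
        # (principal, table name) -> set of columns that principal may read
        self._grants = {}

    def grant_select(self, principal, table, columns):
        """Record a column-scoped grant, like GRANT SELECT ON t (cols) TO p."""
        self._grants.setdefault((principal, table), set()).update(columns)

    def select(self, principal, table, rows, columns):
        """Project the requested columns, or fail if any are not granted."""
        allowed = self._grants.get((principal, table), set())
        denied = [c for c in columns if c not in allowed]
        if denied:
            # Modeled behavior: the whole query is rejected
            raise PermissionError(
                f"SELECT denied on {table} column(s) {denied} for {principal}"
            )
        return [{c: row[c] for c in columns} for row in rows]


# Hypothetical customer data with obviously fake SSNs
customers = [
    {"name": "Ada", "ssn": "000-00-0001", "phone": "555-0100", "balance": 120.0},
    {"name": "Lin", "ssn": "000-00-0002", "phone": "555-0101", "balance": 75.5},
]

acl = ColumnGrants()
acl.grant_select("account_manager", "Customers", ["name", "ssn", "phone", "balance"])
acl.grant_select("billing", "Customers", ["name", "balance"])

print(acl.select("billing", "Customers", customers, ["name", "balance"]))
try:
    acl.select("billing", "Customers", customers, ["name", "ssn"])
except PermissionError as err:
    print(err)
```

Keeping the rule next to the data, rather than in a filtering view or application layer, is what lets the same table serve both roles without redesign.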

14 Jun

Quick Recovery Time with SQL Data Warehouse using User-Defined Restore Points

We are excited to announce that SQL Data Warehouse (SQL DW) now supports User-Defined Restore Points! SQL DW is a flexible and secure analytics platform for the enterprise optimized for running complex queries fast across petabytes of data.

Previously, SQL DW supported only automated snapshots, guaranteeing an eight-hour recovery point objective (RPO). While this snapshot policy provided high levels of protection, customers asked for more control over restore points so they could manage their data warehouses more efficiently and recover more quickly from workload interruptions or user errors.

Now, with user-defined restore points, in addition to the automated snapshots, you can initiate snapshots before and after significant operations on your data warehouse. More granular restore points let you ensure that each restore point is logically consistent, limiting the impact and reducing the recovery time should you need to restore the data warehouse. User-defined restore points can also be labeled so they are easy to identify afterwards.
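The value of a restore point taken just before a risky operation can be shown with a little arithmetic. The Python sketch below picks the most recent restore point taken before a faulty bulk load and measures how much good work a rollback discards. The schedule is an assumption for illustration (automated snapshots exactly every eight hours, consistent with the eight-hour RPO, plus one labeled user-defined point at 13:55 ahead of a 14:00 load), and the `RestorePoint` type and `latest_restore_point` helper are hypothetical; real restore points are created and listed through Azure tooling such as PowerShell.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RestorePoint:
    created: datetime
    label: str
    user_defined: bool


def latest_restore_point(points, before):
    """Return the most recent restore point taken at or before `before`."""
    eligible = [p for p in points if p.created <= before]
    return max(eligible, key=lambda p: p.created, default=None)


# Assumptions: an arbitrary day, automated snapshots exactly every 8 hours,
# and a faulty bulk load that starts at 14:00.
day = datetime(2018, 1, 1)
automated = [
    RestorePoint(day + timedelta(hours=h), f"auto-{h:02d}00", False)
    for h in (0, 8, 16)
]
before_load = RestorePoint(
    day + timedelta(hours=13, minutes=55), "before-bulk-load", True
)
bad_load_start = day + timedelta(hours=14)

auto_only = latest_restore_point(automated, bad_load_start)
with_user_point = latest_restore_point(automated + [before_load], bad_load_start)

# Good work discarded by rolling back to each restore point
print(auto_only.label, bad_load_start - auto_only.created)              # auto-0800 6:00:00
print(with_user_point.label, bad_load_start - with_user_point.created)  # before-bulk-load 0:05:00
```

With only automated snapshots, the rollback lands on the 08:00 snapshot and discards six hours of good work; the labeled user-defined point cuts that to five minutes and is easy to find by name.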

You can create a restore point with a single PowerShell statement, so it’s easy to integrate into your data warehouse management operations. You can have up to 42 restore points at any time, and as all
