Category Archives : Data Warehouse

02

Aug

Automatic intelligent insights to optimize performance with SQL Data Warehouse

We are excited to announce that SQL Data Warehouse (SQL DW) now surfaces intelligent performance insights within the Azure portal! SQL DW is a flexible, secure, and fully managed analytics platform for the enterprise, optimized for running complex queries fast across petabytes of data.

As part of our continued delivery of fully managed experiences, with this release customers no longer need to monitor their data warehouse to detect data skew and suboptimal table statistics. Both are common issues that can degrade the performance of your data warehouse if left unchecked. At no additional cost, SQL DW surfaces intelligent insights for all Gen2 data warehouses and is tightly integrated with Azure Advisor to deliver best practice recommendations. SQL DW collects telemetry from your data warehouse and surfaces recommendations based on your active workload. This analysis runs daily, and you can download recommendations, configure which subscriptions are analyzed, or postpone recommendations from being generated.
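To make "data skew" concrete, here is a minimal Python sketch. It is not the analysis SQL DW performs: it assumes you already have the row count of one table in each distribution, and the names `skew_ratio` and `is_skewed` as well as the 10 percent threshold are illustrative assumptions.

```python
from typing import Sequence


def skew_ratio(rows_per_distribution: Sequence[int]) -> float:
    """Return (max - min) / max of the per-distribution row counts.

    0.0 means the rows are spread perfectly evenly; values close to 1.0
    mean at least one distribution holds far fewer rows than the largest.
    """
    if not rows_per_distribution:
        raise ValueError("need at least one distribution")
    largest = max(rows_per_distribution)
    if largest == 0:
        return 0.0  # an empty table cannot be skewed
    return (largest - min(rows_per_distribution)) / largest


def is_skewed(rows_per_distribution: Sequence[int], threshold: float = 0.10) -> bool:
    """Flag a table whose skew ratio exceeds the (assumed) threshold."""
    return skew_ratio(rows_per_distribution) > threshold
```

For example, a table with 1,000 rows in each of 59 distributions and 5,000 rows in the remaining one would be flagged, because queries have to wait for the one overloaded distribution to finish.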

To check for recommendations, visit the Azure Advisor portal.

To generate these recommendations yourself, you can run the following T-SQL script and identify the specific tables being impacted by skew and statistics. For feedback on recommendations, please reach out


01

Aug

Microsoft Azure Data welcomes Data Platform Summit attendees

I am extremely honored to be delivering the keynote next week at the Data Platform Summit (DPS) in Bangalore. DPS is one of the events that I am looking forward to participating in as it will provide me with a pulse on data and analytics in the Asia Pacific region. More importantly, the event gives me a chance to connect with you — our customers, partners, and overall community.

As a preface, I want to share with you some of the exciting work happening on the Azure Data team at Microsoft and invite you to take a closer look. It is no secret that today’s mobile-first, cloud-first world is evolving to the intelligent cloud and intelligent edge, the new frontier for computing and data management. And IT decision makers and developers are at the forefront, driving this evolution.

We see three key trends that are defining and shaping the evolution of data management. 

First, data continues to grow exponentially and it’s not stopping any time soon. The explosion of data creates an urgency to deliver insights. Second, the rate of cloud adoption is increasing. Investments in public cloud technologies drive the need to collect data and innovate further at a


26

Jul

Avoid Big Data pitfalls with Azure HDInsight and these partner solutions

According to a Gartner 2017 prediction, “60 percent of big data projects will fail to go beyond piloting and experimentation, these projects will be abandoned”.

Whether you have already worked on an analytical project or are just starting one, it is a challenge on any cloud. You need to juggle the intricacies of cloud provider services, open source frameworks, and the apps in the ecosystem. Apache Hadoop & Spark are vibrant open source ecosystems that have enabled enterprises to digitally transform their businesses using data. According to Matt Turck, VC at FirstMark, it has been an exciting but complex year in the data world. “The data tech ecosystem has continued to fire on all cylinders. If nothing else, data is probably even more front and center in 2018, in both business and personal conversations”.

However, with great power comes greater responsibility. A successful project takes a lot more than just using open source or a managed platform. You have to deal with:

The complexity of combining all the open source frameworks.

Architecting a data lake to get insights for data engineers, data scientists and BI users.

Meeting enterprise regulations such as security, access control, data sovereignty &


23

Jul

Accelerated and Flexible Restore Points with SQL Data Warehouse

We are thrilled to announce that SQL Data Warehouse (SQL DW) has released accelerated and flexible restore points for fast data recovery. SQL DW is a fully managed and secure analytics platform for the enterprise, optimized for running complex queries fast across petabytes of data.

The ability to quickly restore a data warehouse protects customers against accidental corruption and deletion and supports disaster recovery. We have also seen scenarios where compliance requirements, or the need for multiple test and development environments of a data warehouse, call for stricter capabilities in this area. To continue delivering first-class data protection and recovery, we have released the following critical improvements, which are seamlessly integrated within the Azure portal.

Finer granularity for cross region and server restores

You can now restore across regions and servers using any restore point instead of selecting geo-redundant backups, which are taken every 24 hours. Cross-region and cross-server restore is supported for both user-defined and automatic restore points, enabling finer granularity for additional data protection. With more restore points available, you can be assured that your data warehouse will be logically consistent when restoring across regions.

Fast restore with Enhanced Restore Points

You can now restore your



12

Jul

Lightning fast query performance with Azure SQL Data Warehouse

Azure SQL Data Warehouse is a fast, flexible and secure analytics platform for enterprises of all sizes. Today we announced significant query performance improvements for Azure SQL Data Warehouse (SQL DW) customers enabled through enhancements in the distributed query execution layer.

Analytics workload performance is determined by two major factors: I/O bandwidth to storage and repartitioning speed, also known as shuffle speed. In a previous blog post, we described how SQL DW caches relevant data to take advantage of NVMe-based local storage. In this blog post, we will go under the hood of SQL DW to see how the shuffling speed has improved.

Data movement is an operation where parts of the distributed tables are moved to different nodes during query execution. This operation is required when the data is not available on the target node, most commonly when the tables do not share the distribution key. The most common data movement operation is shuffle. During shuffle, for each input row, SQL DW computes a hash value using the join columns and then sends that row to the node that owns that hash value. Either one or both sides of the join can participate in the shuffle. The diagram below
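The shuffle step described above can be sketched in a few lines of Python. This is an illustration only, not SQL DW's implementation: the `owner_node` and `shuffle` names, the use of CRC32 as the hash, and the modulo mapping from hash value to node are all assumptions made for the example.

```python
import zlib
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, ...]


def owner_node(join_key: Hashable, node_count: int) -> int:
    """Map a join key to the node that owns its hash value.

    CRC32 over the key's repr is used only because it is stable
    across runs; the real hash function is not described in the post.
    """
    return zlib.crc32(repr(join_key).encode("utf-8")) % node_count


def shuffle(rows: Iterable[Row], key_index: int, node_count: int) -> Dict[int, List[Row]]:
    """Send every row to the node that owns the hash of its join column."""
    per_node: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        per_node[owner_node(row[key_index], node_count)].append(row)
    return dict(per_node)
```

Shuffling both sides of a join with the same key column and node count places rows with equal join keys on the same node, so each node can then join its share locally. If one table is already distributed on the join key, only the other side needs to go through `shuffle`.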


12

Jul

Azure sets new performance benchmarks with SQL Data Warehouse

As the amount of data grows exponentially, the pressure to quickly harness it for insights to share across the organization also increases rapidly. As Microsoft continues to evolve our analytics portfolio, we are committed to delivering a data warehouse solution that provides a fast, flexible, and secure analytics platform in the cloud.

Today we are excited to announce that Azure SQL Data Warehouse has set new performance benchmarks for cloud data warehousing by delivering at least 2x faster query performance than before. The key to this technical innovation is instant data movement, a capability that allows for extremely efficient movement between data warehouse compute nodes. At the heart of every distributed database system is the need to align two or more tables that are partitioned on different keys to produce a final or intermediate result set. Instant data movement in SQL Data Warehouse now accelerates this movement, resulting in faster query performance. You can learn more about how your query performance will improve from this blog.

Insights in minutes with Azure SQL Data Warehouse

We know that data makes every decision better, but decisions need to be timely to be competitive in the market. Fast decisions need


12

Jul

Announcing new offers and capabilities that make Azure the best place for all your apps, data, and infrastructure

Just a few months back, I wrote about how we are helping companies move to the cloud with a flexible hybrid approach and cost-effective path to Azure. To help our customers realize even greater savings and simplify the migration to Azure, I’m pleased to share some exciting updates to address these customer needs.

Migrate to Azure with free Windows Server and SQL Server 2008 Extended Security Updates

Windows Server and SQL Server 2008/2008 R2 were hugely popular when they launched nearly 10 years ago and have been deployed in millions of instances worldwide. Our customers continue to run many of their business applications on these servers. With the decade anniversary coming, both these versions are nearing end of support – with SQL Server 2008 end of support in July 2019 and Windows Server 2008 end of support in January 2020. We know many customers want to continue with these servers, and we are here to help ensure all our customers have great options.

Today, we are announcing that we will provide extended security updates for three years past the end of support dates for free when running in Azure. This means customers get the efficiency of running servers in Azure,


10

Jul

Azure HDInsight now supports Apache Spark 2.3

Apache Spark 2.3.0 is now available for production use on the managed big data service Azure HDInsight. Ranging from bug fixes (more than 1400 tickets were fixed in this release) to new experimental features, Apache Spark 2.3.0 brings advancements and polish to all areas of its unified data platform.

Data engineers relying on Python UDFs get 10 to 100 times more speed, thanks to revamped object serialization between the Spark runtime and Python. Data scientists will be delighted by better integration of deep learning frameworks like TensorFlow with Spark Machine Learning pipelines. Business analysts will welcome the availability of a fast vectorized reader for the ORC file format, which finally makes interactive analytics in Spark practical over this popular columnar data format. Developers building real-time applications may be interested in experimenting with the new Continuous Processing mode in Spark Structured Streaming, which brings event processing latency down to the millisecond level.
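To give a rough feel for why batching the serialization between a runtime and a Python function helps, here is a small standard-library sketch. It is not Spark code and not how Spark does it (Spark 2.3's vectorized UDFs build on Apache Arrow and pandas); `pickle` stands in for the serialization layer, and the function names are hypothetical.

```python
import pickle
from typing import Callable, List, Sequence, Tuple


def apply_row_at_a_time(values: Sequence[int], func: Callable[[int], int]) -> Tuple[List[int], int]:
    """Serialize, call, and deserialize once per value."""
    results: List[int] = []
    round_trips = 0
    for value in values:
        received = pickle.loads(pickle.dumps(value))  # runtime -> Python
        returned = pickle.dumps(func(received))       # Python -> runtime
        results.append(pickle.loads(returned))
        round_trips += 1
    return results, round_trips


def apply_in_batches(values: Sequence[int], func: Callable[[int], int], batch_size: int) -> Tuple[List[int], int]:
    """Serialize a whole batch at once, so the fixed cost is paid per batch."""
    results: List[int] = []
    round_trips = 0
    for start in range(0, len(values), batch_size):
        batch = pickle.loads(pickle.dumps(list(values[start:start + batch_size])))
        returned = pickle.dumps([func(value) for value in batch])
        results.extend(pickle.loads(returned))
        round_trips += 1
    return results, round_trips
```

Both functions return the same results together with the number of serialization round trips they needed. With ten values and a batch size of four, the batched version makes three round trips instead of ten, which is the overhead that vectorized serialization is meant to cut.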

Vectorized object serialization in Python UDFs

It is worth mentioning that PySpark is already fast and takes advantage of the vectorized data processing in the core Spark engine as long as you are using DataFrame APIs. This is good news as it represents the majority of the use cases if you follow best practices for


27

Jun

Enterprises get deeper insights with Hadoop and Spark updates on Azure HDInsight

Azure HDInsight is one of the most popular services among enterprises for open source Hadoop & Spark analytics on Azure. With the price cut of more than 50 percent on HDInsight, customers moving to the cloud are reaping more savings than ever.

“PROS is a pioneer in using machine learning to give companies an accurate and profitable pricing. PROS Guidance product runs enormously complex pricing calculations based on variables that comprise multiple terabytes of data. In Azure HDInsight, a process that formerly took several days now takes just a few minutes.” – Ed Gonzalez, Product Manager, PROS

Today we are announcing updates to Apache Spark, Apache Kafka, ML Services, and Azure Data Lake Storage Gen2, as well as enhancements to the Enterprise Security Package. These new capabilities will continue to drive savings for many of our customers. In addition, Microsoft is continuing to deepen its commitment to the Apache Hadoop ecosystem and has extended its partnership with Hortonworks to bring the best of Apache Hadoop and open source big data analytics to the cloud.

Continued investment in Open Source for new capabilities and reliability

Reliable Open Source

Microsoft is contributing to the Apache Hadoop ecosystem and also ensuring Azure is the most reliable place
