Category Archives : Database



Sync SQL data in large scale using Azure SQL Data Sync
Sync SQL data in large scale using Azure SQL Data Sync

Azure SQL Data Sync allows users to synchronize data between Azure SQL Databases and SQL Server databases in one-direction or bi-direction. This feature was first introduced in 2012. By that time, people didn’t host a lot of large databases in Azure. Some size limitations were applied when we built the data sync service, including up to 30 databases (five on-premises SQL Server databases) in a single sync group, and up to 500 tables in any database in a sync group.

Today, there are more than two million Azure SQL Databases and the maximum database size is 4TB. But those limitations of data sync are still there. It is mainly because that syncing data is a size of data operation. Without an architectural change, we can’t ensure the service can sustain the heavy load when syncing in a large scale. We are working on some improvements in this area. Some of these limitations will be raised or removed in the future. In this article, we are going to show you how to use data sync to sync data between large number of databases and tables, including some best practices and how to temporarily work around database and table limitations.

Sync data




Migrating to Azure SQL Database with zero downtime for read-only workloads

Special thanks to MSAsset engineering team’s Peter Liu (Senior Software Engineer), Vijay Kannan (Software Engineer), Sathya Muhandiramalage (Senior Software Engineer), Bryan Castillo (Principal Software Engineer) and Shail Batra (Principal Software Engineering Manager) for sharing their migration story with the Azure SQL Database product team.

Microsoft uses an internally written service called MSAsset to manage all Microsoft data center hardware around the world. MSAsset is used for tracking Microsoft’s servers, switches, storage devices, and cables across the company and requires 24/7 availability to accommodate break-fix requirements.

Before migrating to Azure SQL Database last year, MSAsset’s data tier consisted of a 107 GB database with 245 tables on SQL Server. The database was part of a SQL Server Always On Availability Group topology used for high availability and the scaling out of read-activity.

The MSAsset engineering team faced the following issues:

Aging hardware was not keeping up with stability and scale requirements. There was an increase in high severity data-tier incidents and no database administrator on staff to help with troubleshooting, mitigation, root cause analysis and ongoing maintenance. MSAsset’s database ran on SQL Server 2012. Developers and internal customers were increasingly requesting access to new SQL Server functionality.

After exploring various options




Azure Redis Cache geo-replication is now generally available

We are very happy to announce the general availability of geo-replication support for Azure Redis Cache. Redis Cache is Microsoft Azure’s Cache-as-a-Service offering, based on the popular open source Redis in-memory key-value store. With geo-replication support, Redis Cache joins a growing list of Azure services that enable developers and IT pros to build disaster recovery plans. This ensures the availability of mission-critical applications running on our cloud, even in the unlikely event of a widespread regional failure. In fact, customers can already design disaster resilient solutions on Azure, using Virtual Machines with Azure Site Recovery, Traffic Manager, and data services such as Cosmos DB, SQL Database, and now Redis Cache.

While announcing the general availability of geo-replication for Redis Cache, we would also like to take the opportunity to express our gratitude to everyone who has participated in the public preview. Your feedback has been invaluable to us and helped validate our implementation. Thank you!

Setting up geo-replication in Redis Cache

Geo-replication is a feature of the premium tier of Azure Redis Cache. You need a pair of premium cache instances before you can use geo-replication. If you already have a premium cache, you just need to add another one




Azure #CosmosDB Graph API now generally available
Azure #CosmosDB Graph API now generally available

Azure Cosmos DB Graph API is the first cloud database to provide graph functionality over a globally distributed managed service. This has enabled users to explore new ways of consuming their data with the use of the Gremlin language while still benefitting from global distribution, elastic scalability in storage and throughput, guaranteed low latency, consistency models, and enterprise-ready SLAs of Azure Cosmos DB.

In December, Azure Cosmos DB Graph API became generally available. This release includes several critical updates to the performance and latency, as well as expanding the application platforms that can be used with it. Here is a brief recap of the features included in the general availability release of Azure Cosmos DB Graph API.

Increased service performance and stability

Several performance and stability improvements have been applied to the Azure Cosmos DB Graph API service. These updates benefit the Gremlin query processing performance, as well as the connectivity experience when using any of the open-source Gremlin connectors. Additional fixes were also applied to the previously known Gremlin error parsing issues that used to be experienced.

Newly added support for Python and PHP application platforms!

Azure Cosmos DB Graph API now supports connections from Python and PHP applications




Build email notifications for SQL Database Automatic tuning recommendations

After reading this blogpost, you will be able to build your own custom email notifications for SQL Database Automatic tuning recommendations. We have listened to our customers requesting this functionality and have created a custom solution based on readily available technologies on Azure.

SQL Database performance tuning recommendations are generated by Azure SQL Database Automatic tuning. This solution provides peak performance and stable workloads through continuous database performance tuning utilizing Artificial Intelligence (AI).

Tuning recommendations are provided for each individual SQL database on Azure subscription for which Automatic tuning is enabled. Recommendations are related to index creation, index deletion, and optimization of query execution plans. Tuning recommendations are provided only in cases when AI considers them as beneficial to database performance.

Email notifications for Automatic tuning

Some of our customers have indicated a need to receive automated email notifications with suggested SQL Database Automatic tuning recommendations to be able to view and build automated alerts. For example, when the solution recommends that an index should be dropped to improve database performance, some customers would prefer to be notified of such event. Another customer scenario is, for example, emailing automated tuning recommendations to different database administrators in charge of different database




Azure #CosmosDB and Microsoft’s Project Olympus honored in InfoWorld’s 2018 Technology of the Year Awards

The word is out, and the industry is taking notice. Azure Cosmos DB is the world’s first globally distributed, multi-model database service with native NoSQL support. Designed for the cloud, Azure Cosmos DB enables you to build planet-scale applications that bring data to where your users are with SLA guarantees low latency, throughput, and 99.99% availability.

The experts at IDG’s InfoWorld recently recognized Azure Cosmos DB in the InfoWorld Technology of the Year Awards, zeroing in on its “innovative approach to the complexities of building and managing distributed systems,” which includes recognition for leveraging the work of Turing Award winner Leslie Lamport to deliver multiple consistency models. Azure Cosmos DB was also recognized for delivering a globally distributed system where users anywhere in the world can see the same version of data, no matter their location.

In addition, InfoWorld complimented the flexibility and variety of use cases with Azure Cosmos DB, from JSON-based document stores to support for MongoDB APIs and a SQL query option for Azure’s Table Storage.


“Do you need a distributed NoSQL database with a choice of APIs and consistency models? That would be Microsoft’s Azure Cosmos DB.”—InfoWorld, Technology of the Year 2018: The best hardware, software,




Lambda Architecture using Azure #CosmosDB: Faster performance, Low TCO, Low DevOps

Azure Cosmos DB provides a scalable database solution that can handle both batch and real-time ingestion and querying and enables developers to implement lambda architectures with low TCO. Lambda architectures enable efficient data processing of massive data sets. Lambda architectures use batch-processing, stream-processing, and a serving layer to minimize the latency involved in querying big data.

To implement a lambda architecture, you can use a combination of the following technologies to accelerate real-time big data analytics:

Azure Cosmos DB, the industry’s first globally distributed, multi-model database service. Apache Spark for Azure HDInsight, a processing framework that runs large-scale data analytics applications Azure Cosmos DB change feed, which streams new data to the batch layer for HDInsight to process The Spark to Azure Cosmos DB Connector

We wrote a detailed article that describes the fundamentals of a lambda architecture based on the original multi-layer design and the benefits of a “rearchitected” lambda architecture that simplifies operations.

What is a lambda architecture?

The basic principles of a lambda architecture are depicted in the figure above:

All data is pushed into both the batch layer and speed layer. The batch layer has a master dataset (immutable, append-only set of raw data)




Using the MySQL sys schema to optimize and maintain a database

The MySQL sys schema, which is fully enabled in Azure Database for MySQL 5.7, provides a powerful collection of user friendly views in a read-only database. Building on the MySQL Performance and Information Schemas, you can use the MySQL sys schema to troubleshoot performance issues and manage resources efficiently.

The MySQL Performance Schema, first available in MySQL 5.5, provides instrumentation for many vital server resources such as memory allocation, stored programs, metadata locking, etc. However, the Performance Schema contains more than 80 tables, and getting the necessary information often requires joining tables within the Performance Schema, as well as tables from the Information Schema. Let’s look more closely at how to use the MySQL sys schema.

There are 52 views in the sys schema, and each view is prefixed by one of the following:

Host_summary or IO: I/O related latencies. Innodb: Innodb buffer status and locks. Memory:Memory usage by the host and users. Schema: Schema related information, such as auto increment, indexes, etc. Statement: Information on SQL statements; this can be statements that resulted in a full table scan or long query time. User: Resources consumed and grouped by users. Examples are file I/Os, connections, and memory. Wait: Wait




Using EXPLAIN to profile slow queries in Azure Database for MySQL

Azure Database for MySQL is a PaaS (Platform as a Service) solution that Microsoft offers on Azure. Using Azure managed services for MySQL (and PostgreSQL), enables one to easily build an intelligent and secure application.

Though Microsoft has done a lot of work to optimize database performance, sometimes a simple query can easily become a bottle neck impacting overall database performance. Luckily, MySQL integrates a handy tool – the EXPLAIN statement – that can profile client queries and thus help you identify the root cause of a slow query. You can use an EXPLAIN statement to get information about how SQL statements are executed. With this information, you can profile which queries are running slow and why.

The output below shows an example of the execution of an EXPLAIN statement.

mysql> EXPLAIN SELECT * FROM tb1 WHERE id=100G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: tb1 partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 995789 filtered: 10.00 Extra: Using where

As you can see from this example, the value of key is NULL. This means that MySQL cannot find any indexes optimized for the query and it performs a full table




December 2017 Leaderboard of Database Systems contributors on MSDN

Congratulations to our December top 10 contributors! Alberto Morillo and Visakh Murukesan maintain their top positions.

This Leaderboard initiative was started in October 2016 to recognize the top Database Systems contributors on MSDN forums. The following continues to be the points hierarchy (in decreasing order of points):