Category Archives : Big Data

17

Jul

Intelligent Healthcare with Azure Bring Your Own Key (BYOK) technology

Sensitive health data processed by hospitals and insurers is under constant attack from malicious actors who try to gain access to health care systems with the goal to steal or extort personal health information. Change Healthcare has implemented a Bring Your Own Key (BYOK) solution based on Microsoft Azure Cloud services and introduces Intelligent Healthcare today.

Change Healthcare is enabling payers and providers to have immediate and granular control over their data by transferring the ownership of encryption keys used to encrypt data at rest. This allows Change Healthcare customers to make security changes without involvement by Change Healthcare personnel and have their cloud-based systems re-encrypted and operational without service interruptions. The BYOK management capabilities include revoking access to encryption keys and rotating or deleting encryption keys on demand and at the time of a potential compromise. 
 
For the Intelligent Healthcare solution, Change Healthcare implemented Azure SQL Database Transparent Data Encryption (TDE) with BYOK support. TDE with BYOK encrypts databases, log files and backups when written to disk, which protects data at rest from unauthorized access. TDE with BYOK support integrates with Azure Key Vault, which provides highly available and scalable secure storage for RSA cryptographic keys backed by

17

Jul

Blockchain as a tool for anti-fraud

Healthcare costs are skyrocketing. In 2016, healthcare costs in the US are estimated at nearly 18 percent of the GDP! Healthcare is becoming less affordable worldwide, and a serious chasm is widening between those that can afford healthcare and those that cannot. There are many factors driving the high cost of healthcare, one of them is fraud. In healthcare, there are several types of fraud including prescription fraud, medical identity fraud, financial fraud, and occupational fraud. The National Health Care Anti-Fraud Association estimates conservatively that health care fraud costs the US about $68 billion annually, which is about three percent of the US total $2.26 trillion in overall healthcare spending. There are two root vulnerabilities in healthcare organizations: insufficient protection of data integrity, and a lack of transparency.

Insufficient protection of data integrity enables fraudulent modification of records

Cybersecurity involves safeguarding the confidentiality, availability, and integrity of data. Often cybersecurity is mistakenly equated with protecting just the confidentiality of data to prevent unauthorized access. However, equally important is protecting the availability of data. That is, you must secure timely and reliable access to data, as well as the integrity of the data. You must ensure records are accurate, complete,

12

Jul

Welcome our newest family member – Data Box Disk

Last year at Ignite, I talked to you about the preview of Azure Data Box, a ruggedized, portable, and simple way to move large datasets into Azure. So far, the response has been phenomenal. Customers have used Data Box to move petabytes of data into Azure.

While our customers and partners love Data Box, they told us that they also wanted a lower capacity, even easier-to-use option. They cited examples such as moving data from Remote/Office Branch Offices (ROBOs), which have smaller data sets and minimal on-site tech support. They said they needed an option for recurring, incremental transfers for ongoing backups and archives. And they said it needed to have the same traits as Data Box – namely fast, simple, and secure.

Got it. We hear the message loud and clear. So, I’m here today with our partners at Inspire 2018 to announce a new addition to the Data Box family: Azure Data Box Disk.

How it works

Data Box Disk leverages the same infrastructure and management experience as Azure Data Box. You can receive up to five 8TB disks, totaling 40TB per order. Data Box Disk is fast, utilizing SSD technology, and is shipped overnight, so you can

12

Jul

Lightning fast query performance with Azure SQL Data Warehouse

Azure SQL Data Warehouse is a fast, flexible and secure analytics platform for enterprises of all sizes. Today we announced significant query performance improvements for Azure SQL Data Warehouse (SQL DW) customers enabled through enhancements in the distributed query execution layer.

Analytics workload performance is determined by two major factors, I/O bandwidth to storage and repartitioning speed, also known as shuffle speed. In this previous blog post, we described how SQL DW caches relevant data to take advantage of NVMe based local storage. In this blog post, we will go under the hood of SQL DW, to see how the shuffling speed has improved.

Data movement is an operation where parts of the distributed tables are moved to different nodes during query execution. This operation is required where the data is not available on the target node, most commonly when the tables do not share the distribution key. The most common data movement operation is shuffle. During shuffle, for each input row, SQL DW computes a hash value using the join columns and then sends that row to the node that owns that hash value. Either one or both sides of join can participate in the shuffle. The diagram below

11

Jul

Kafka 1.0 on HDInsight lights up real time analytics scenarios

Data engineers love Kafka on HDInsight as a high-throughput, low-latency ingestion platform in their real time data pipeline. They already leverage Kafka features such as message compression, configurable retention policy, and log compaction. With the release of Apache Kafka 1.0 on HDInsight, customers now get key features that make it easy to implement the most demanding scenarios. Here is a quick introduction:

Idempotent producers so that you don’t have to deduplicate

Consider a cellular billing system, in which the producer writes the amount of data consumed by users to a Kafka topic called data-consumption-events. If the broker or the connection fails, the producer will not get an acknowledgment of a message write and will retry that message. This will lead to duplicate writes to the system, causing users to be overbilled.

In critical scenarios like above, data engineers had to write and maintain custom deduplication logic, such as hashing and saving message ids. However, with idempotent producers turned on, Kafka handles that logic for you. Records include unique producer ids and the sequence number of the message. Kafka brokers will only accept a message from a producer if the sequence number is exactly one more than the committed sequence number

10

Jul

Azure HDInsight now supports Apache Spark 2.3

Apache Spark 2.3.0 is now available for production use on the managed big data service Azure HDInsight. Ranging from bug fixes (more than 1400 tickets were fixed in this release) to new experimental features, Apache Spark 2.3.0 brings advancements and polish to all areas of its unified data platform.

Data engineers relying on Python UDFs get 10 times to a 100 times more speed, thanks to revamped object serialization between Spark runtime and Python. Data Scientist will be delighted by better integration of Deep Learning frameworks like TensorFlow with Spark Machine Learning pipelines. Business Analysts will find liberating availability of fast vectorized reader for ORC file format which finally makes interactive analytics in Spark practical over this popular columnar data format. Developers building real-time applications may be interested in experimenting with new Continuous Processing mode in Spark Structured Streaming which brings event processing latency to millisecond level.

Vectorized object serialization in Python UDFs

It is worth mentioning that PySpark is already fast and takes advantage of the vectorized data processing in core Spark engine as long as you are using DataFrame APIs. This is good news as it represents majority of the use cases if you follow best practices for

09

Jul

Power BI Embedded dashboards with Azure Stream Analytics

Azure Stream Analytics is a fully managed “serverless” PaaS service in Azure built for running real-time analytics on fast moving streams of data. Today, a significant portion of Stream Analytics customers use Power BI for real-time dynamic dashboarding. Support for Power BI Embedded has been a repeated ask from many of our customers, and today we are excited to share that it is now generally available.

What is Power BI Embedded?

Power BI Embedded simplifies how ISVs and developers can quickly add stunning visuals, reports, and dashboards to their apps. By enabling easy-to-navigate data exploration in their apps, ISVs help their customers make quick, informed decisions in context. This also enables faster time to market and competitive differentiation for all parties.

Additionally, Power BI Embedded enables users to work within the familiar development environments, Visual Studio or Azure.

Using Azure Stream Analytics with Power BI Embedded

Using Power BI with Azure Stream Analytics allows users of Power BI Embedded dashboards to easily visualize insights from streaming data within the context of the apps they use every day. With Power BI Embedded, users can also embed real-time dashboards right in their organization’s web apps.

No changes are required for your existing

09

Jul

Power BI Embedded dashboards with Azure Stream Analytics

Azure Stream Analytics is a fully managed “serverless” PaaS service in Azure built for running real-time analytics on fast moving streams of data. Today, a significant portion of Stream Analytics customers use Power BI for real-time dynamic dashboarding. Support for Power BI Embedded has been a repeated ask from many of our customers, and today we are excited to share that it is now generally available.

What is Power BI Embedded?

Power BI Embedded simplifies how ISVs and developers can quickly add stunning visuals, reports, and dashboards to their apps. By enabling easy-to-navigate data exploration in their apps, ISVs help their customers make quick, informed decisions in context. This also enables faster time to market and competitive differentiation for all parties.

Additionally, Power BI Embedded enables users to work within the familiar development environments, Visual Studio or Azure.

Using Azure Stream Analytics with Power BI Embedded

Using Power BI with Azure Stream Analytics allows users of Power BI Embedded dashboards to easily visualize insights from streaming data within the context of the apps they use every day. With Power BI Embedded, users can also embed real-time dashboards right in their organization’s web apps.

No changes are required for your existing

03

Jul

IP filtering for Event Hubs and Service Bus

For scenarios in which Azure Event Hubs or Azure Service Bus is only accessible from certain well-known sites, the IP Filter feature enables you to configure rules for accepting or rejecting traffic originated from specify IP addresses, for instance the addresses that come under corporate NAT gateway. The Azure team is happy to announce the public preview of IP Filtering for Service Bus Premium and Event Hubs Standard and Dedicated price plans.

This feature allows users to control which IPs are accessing their resources. Some characteristics of this feature:

Rules allow you to specify accept and reject actions on IP masks. The rules work with IPv4 addresses. Rules are applied to the namespace level. You can have multiple rules and they are applied in order. The first rule that matches the IP address determines the accept or reject action. Requests from IPs that are rejected receive an unauthorized response.

Today these features are available in the Azure portal as shown in the screenshot. You can find them at the Event Hubs or Service Bus namespace level or via an ARM template.

The below ARM template shows how you can use this feature. This template takes the following parameters:

ipFilterRuleName

03

Jul

IP filtering for Event Hubs and Service Bus

For scenarios in which Azure Event Hubs or Azure Service Bus is only accessible from certain well-known sites, the IP Filter feature enables you to configure rules for accepting or rejecting traffic originated from specify IP addresses, for instance the addresses that come under corporate NAT gateway. The Azure team is happy to announce the public preview of IP Filtering for Service Bus Premium and Event Hubs Standard and Dedicated price plans.

This feature allows users to control which IPs are accessing their resources. Some characteristics of this feature:

Rules allow you to specify accept and reject actions on IP masks. The rules work with IPv4 addresses. Rules are applied to the namespace level. You can have multiple rules and they are applied in order. The first rule that matches the IP address determines the accept or reject action. Requests from IPs that are rejected receive an unauthorized response.

Today these features are available in the Azure portal as shown in the screenshot. You can find them at the Event Hubs or Service Bus namespace level or via an ARM template.

The below ARM template shows how you can use this feature. This template takes the following parameters:

ipFilterRuleName