Category Archives : Data Science

02

Aug

Real example: improve accuracy, reduce training times for existing R codebase

When you buy an item on a favorite website, does the site show you pictures of what others have bought? That’s the result of a recommendation system. Retailers have been building such systems for years, many of them in the programming language R. For older implementations of recommender systems, it’s time to consider improving performance and scalability by moving them to the cloud, specifically Azure.

Problem: to re-host and optimize an existing R model in Azure

Recently, we were asked to help a customer improve the performance and process surrounding the R implementation of their recommender solution and host the model in Azure. Many of their early analytic products were built in R, and they wanted to preserve that investment. After a review of their solution, we identified bottlenecks that could be eliminated. We worked together to significantly reduce the model training time using parallel R algorithms. Then we worked to streamline how they operationalized their R model. All the work was done using libraries available with Microsoft Machine Learning Server (R Server).

The architecture: Azure SQL + Machine Learning Server

There are several components and steps to the solution. We needed a database

02

Aug

Improve collaborative care and clinical data sharing with blockchain

Currently, the healthcare industry suffers major inefficiencies due to diverse, uncoordinated, and unconnected data sources and systems. Collaboration is vital to improve healthcare outcomes. With digitized health data, the exchange of healthcare information across healthcare organizations is essential to support effective care collaboration. Traditional health information exchanges have had limited success.

Blockchain offers new capabilities to greatly improve health information exchange. At Microsoft we are working to maximize the benefits of solutions that have the potential to improve patient outcomes, reduce healthcare costs, and enhance the experiences of patients and healthcare workers. With that, I’d like to announce a new partner solution and pilot for a better health information exchange that uses blockchain and runs in Microsoft Azure.

Interoperability and exchanges

Grapevine World, one of the leaders in the application of blockchain technology, makes use of the Integrating the Healthcare Enterprise (IHE) methodology for interoperability. They employ multiple blockchains for tracking data provenance, and provide a crypto token as a means of exchange within their ecosystem.

Grapevine World is a decentralized ecosystem for the seamless exchange and utilization of health data in a standardized, secure manner. In collaboration with the University of Southampton and Tiani Spirit, they have developed a new blockchain-enabled

26

Jul

Avoid Big Data pitfalls with Azure HDInsight and these partner solutions

According to a Gartner 2017 prediction, “60 percent of big data projects will fail to go beyond piloting and experimentation, these projects will be abandoned”.

Whether you have worked on an analytical project or are starting one, it is a challenge on any cloud. You need to juggle the intricacies of cloud provider services, open source frameworks, and the apps in the ecosystem. Apache Hadoop and Spark are very vibrant open source ecosystems which have enabled enterprises to digitally transform their businesses using data. According to Matt Turck, VC at FirstMark, it has been an exciting but complex year in the data world. “The data tech ecosystem has continued to fire on all cylinders. If nothing else, data is probably even more front and center in 2018, in both business and personal conversations”.

However, with great power comes greater responsibility from the ecosystem. A successful project takes a lot more than just using open source or a managed platform. You have to deal with:

The complexity of combining all the open source frameworks.
Architecting a data lake to get insights for data engineers, data scientists and BI users.
Meeting enterprise regulations such as security, access control, data sovereignty &

26

Jul

Orchestrating production-grade workloads with Azure Kubernetes Service

Happy Birthday Kubernetes! In the short three years that Kubernetes has been around, it has become the industry standard for orchestration of containerized workloads. In Azure, we have spent the last three years helping customers run Kubernetes in the cloud. As much as Kubernetes simplifies the task of orchestration, there’s plenty of setup and management that needs to take place for you to take full advantage of Kubernetes. This is where Azure Kubernetes Service (AKS) comes in. With Microsoft’s unique knowledge of the requirements of an enterprise, and our heritage of empowering developers, this managed service takes care of many of the complexities and delivers the best Kubernetes experience in the cloud.

In this blog post, I will dig into the top scenarios that Azure customers are building on Azure Kubernetes Service. After that, we will blow out the candles and have some cake.

If you are new to AKS, check out the Azure Kubernetes Service page and this video to learn more.

Lift and shift to containers

Organizations typically want to move to the cloud quickly and it is often not possible to re-write applications to take full advantage of cloud-native features right from the beginning. Containerizing applications makes

10

Jul

Azure HDInsight now supports Apache Spark 2.3

Apache Spark 2.3.0 is now available for production use on the managed big data service Azure HDInsight. Ranging from bug fixes (more than 1400 tickets were fixed in this release) to new experimental features, Apache Spark 2.3.0 brings advancements and polish to all areas of its unified data platform.

Data engineers relying on Python UDFs get 10 to 100 times more speed, thanks to revamped object serialization between the Spark runtime and Python. Data scientists will be delighted by better integration of deep learning frameworks like TensorFlow with Spark Machine Learning pipelines. Business analysts will appreciate the availability of a fast vectorized reader for the ORC file format, which finally makes interactive analytics in Spark practical over this popular columnar data format. Developers building real-time applications may be interested in experimenting with the new Continuous Processing mode in Spark Structured Streaming, which brings event processing latency down to the millisecond level.
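To see why serialization matters for Python UDFs, here is a minimal sketch in plain Python. It is not Spark code and not how Spark implements the feature. It uses `pickle` as a stand-in for the serialization step, and all function names (`run_row_at_a_time`, `run_batched`, `plus_one`, `plus_one_batch`) are hypothetical names chosen for illustration. It only shows that handing values to a user function in batches cuts the number of serialization round trips.

```python
import pickle


def plus_one(x):
    """Row-at-a-time user function: receives a single value."""
    return x + 1


def plus_one_batch(batch):
    """Batched user function: receives a whole list of values."""
    return [x + 1 for x in batch]


def run_row_at_a_time(values, udf):
    """Serialize, call the function, and deserialize once per value."""
    round_trips = 0
    out = []
    for v in values:
        payload = pickle.dumps(v)                        # stand-in: runtime -> Python
        result = udf(pickle.loads(payload))
        out.append(pickle.loads(pickle.dumps(result)))   # stand-in: Python -> runtime
        round_trips += 1
    return out, round_trips


def run_batched(values, batch_udf, batch_size):
    """Serialize, call the function, and deserialize once per batch."""
    round_trips = 0
    out = []
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        payload = pickle.dumps(batch)
        result = batch_udf(pickle.loads(payload))
        out.extend(pickle.loads(pickle.dumps(result)))
        round_trips += 1
    return out, round_trips


values = list(range(10_000))
row_out, row_trips = run_row_at_a_time(values, plus_one)
batch_out, batch_trips = run_batched(values, plus_one_batch, batch_size=1_000)
```

Both paths produce the same results, but the batched path makes one round trip per batch instead of one per value. The batch size of 1,000 is an arbitrary assumption for the sketch.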

Vectorized object serialization in Python UDFs

It is worth mentioning that PySpark is already fast and takes advantage of the vectorized data processing in the core Spark engine as long as you are using DataFrame APIs. This is good news as it represents the majority of the use cases if you follow best practices for

10

Jul

Streaming analytics use cases with Spark on Azure

Sensors, IoT devices, social networks, and online transactions are all generating data that needs to be monitored constantly and acted on quickly. As a result, the need for large-scale, real-time stream processing is more evident now than ever before.

With Azure Databricks running on top of Spark, Spark Streaming provides data scientists and data engineers with powerful interactive and analytical applications across both streaming and historical data, while inheriting Spark’s ease of use and fault tolerance characteristics. Azure Databricks readily integrates with a wide variety of popular data sources, including HDFS, Flume, Kafka, and Twitter.

There are four main use cases for which Spark Streaming is being used today:

Streaming ETL: Data is continuously cleaned and aggregated before being pushed into data stores.
Triggers: Anomalous behavior is detected in real time and further downstream actions are triggered accordingly. For example, unusual behavior of a sensor device can trigger an action.
Data enrichment: Live data is enriched with more information by joining it with a static dataset, allowing for a more complete real-time analysis.
Complex sessions and continuous learning: Events related to a live session (e.g. user activity after logging into a website or application) are grouped together and analyzed. In some cases,

09

Jul

Power BI Embedded dashboards with Azure Stream Analytics

Azure Stream Analytics is a fully managed “serverless” PaaS service in Azure built for running real-time analytics on fast moving streams of data. Today, a significant portion of Stream Analytics customers use Power BI for real-time dynamic dashboarding. Support for Power BI Embedded has been a repeated ask from many of our customers, and today we are excited to share that it is now generally available.

What is Power BI Embedded?

Power BI Embedded simplifies how ISVs and developers can quickly add stunning visuals, reports, and dashboards to their apps. By enabling easy-to-navigate data exploration in their apps, ISVs help their customers make quick, informed decisions in context. This also enables faster time to market and competitive differentiation for all parties.

Additionally, Power BI Embedded enables users to work within familiar development environments such as Visual Studio and Azure.

Using Azure Stream Analytics with Power BI Embedded

Using Power BI with Azure Stream Analytics allows users of Power BI Embedded dashboards to easily visualize insights from streaming data within the context of the apps they use every day. With Power BI Embedded, users can also embed real-time dashboards right in their organization’s web apps.

No changes are required for your existing

02

Jul

Azure Databricks provides the best Apache Spark™-based analytics solution for data scientists and engineers

Azure Databricks provides a fast, easy, and collaborative Apache Spark-based analytics platform to accelerate and simplify the process of building Big Data and AI solutions that drive the business forward, all backed by industry leading SLAs.

I am excited to announce the availability of a set of new features and regions, which enable our customers to accelerate their AI journey with Azure Databricks.

RStudio integration generally available with Azure Databricks

Today, we are announcing the ability to use RStudio with Azure Databricks. Customers can now analyze data with RStudio while taking advantage of the scale and flexibility of Azure Databricks.

RStudio offers a rich IDE that is very popular with data scientists in the R community. With this integration, RStudio runs directly inside Azure Databricks. This enables data scientists to continue to use the familiar and powerful RStudio IDE while gaining the ability to build their solutions at unprecedented scale. Azure Databricks provides the flexibility to start with small jobs and automatically scale up to production workloads in the same environment.

Setting up RStudio in Azure Databricks is simple and fast. Learn how to get started today.

Azure Databricks available in Australia and UK

We are excited