Tags : #AzureML

25 Jul

Lessons learned benchmarking fast machine learning algorithms

This post is authored by Miguel Fierro, Data Scientist, Mathew Salvaris, Data Scientist, Guolin Ke, Associate Researcher, and Tao Wu, Principal Data Science Manager, all at Microsoft.

Boosted decision trees are responsible for more than half of the winning solutions in machine learning challenges hosted at Kaggle, according to KDnuggets. In addition to superior performance, these algorithms have practical appeal because they require minimal tuning. In this post, we evaluate two popular tree boosting software packages, XGBoost and LightGBM, including their GPU implementations. Our results, based on tests on six datasets, are summarized as follows:

XGBoost and LightGBM achieve similar accuracy metrics.

LightGBM has lower training time than XGBoost and its histogram-based variant, XGBoost hist, for all test datasets, on both CPU and GPU implementations. The training time difference between the two libraries depends on the dataset, and can be as large as 25 times.

The XGBoost GPU implementation does not scale well to large datasets and ran out of memory in half of the tests.

XGBoost hist may be significantly slower than the original XGBoost when feature dimensionality is high.

All our code is open-source and can be found in this repo. We will explain the algorithms behind these libraries

10 Jul

Announcing Cortana Intelligence Solution Evaluation Tool

This post is authored by Avi Bathula, Sr. Program Manager Lead, and Jamie Olson, Senior SDE on the Microsoft Cloud & AI Ecosystem team.

Most of you know about Microsoft’s Cortana Intelligence and how solutions/apps built with Cortana Intelligence are already helping organizations transform their data into actionable insights. As noted in the linked blog post, AppSource is a single destination for Business Decision Makers (BDMs) to discover and seamlessly try business apps built by partners and verified by Microsoft. In May 2017 AppSource onboarded Office apps, expanding its opportunity space to over 100 million commercial workers.

In recent months, the Cloud and AI ecosystem team has evaluated and onboarded ~26 Cortana Intelligence solutions to AppSource spanning various functions and industry verticals. For more information on onboarding and the evaluation process, see the Cortana Intelligence AppSource publishing guide.

Ensuring high-quality solutions for customers

We’re excited to announce the public preview of the Cortana Intelligence solution evaluation tool, an automated assessment of advanced analytics solutions against Microsoft’s recommended best practices. The tool gives partners actionable feedback on how their solutions fare and pointers on how to address gaps.

Going forward, we will need partners to run this tool against their

29 Mar

Microsoft Makes Big Data and Analytics Easier in the Cloud

This post is by Joseph Sirosh, Corporate Vice President of the Data Group at Microsoft.

This week I’m joining thousands of people attending Strata + Hadoop World in San Jose to explore the technology and business of big data and data science. As part of our participation in the conference, we are announcing several important investments to continue delivering on our commitment to make big data processing and analytics simpler and more accessible:

Advanced analytics at scale with R Server for HDInsight and the latest version of Spark for HDInsight are now available in preview: Customers can leverage their existing R skills and reuse current code to run at scale. R Server for HDInsight offers popular scalable R algorithms and the ability to parallelize any existing R function. We are also releasing the latest version of Spark for HDInsight, which can deliver 7x performance over MapReduce for most analytics. These capabilities give our customers the ability to train and run advanced analytics and ML models on larger datasets, and much faster than previously possible in the cloud.

Out-of-the-box application integration, providing easier access to popular big data apps: Customers can now discover and deploy popular big data applications with HDInsight…

21 Mar

Hadoop is famously scalable. Cloud computing is famously scalable. But R – the preferred software and lingua franca of data scientists worldwide – not so much. But what if we seamlessly combined Hadoop with the cloud and R to create a scalable data science platform? Imagine exploring, transforming, modeling, and scoring data at any scale from the comfort of your favorite R environment. Now, imagine calling a simple R function to operationalize your predictive model as a scalable, cloud-based web service. 

Learn how to leverage the magic of Hadoop on-premises or in the cloud to run your R code, with thousands of open source R extension packages, and distributed implementations of the most popular machine learning algorithms, at scale. Click here or on the image below to register for this free webinar.

ML Blog Team
