Get up to speed with Azure HDInsight: The comprehensive guide
Azure HDInsight is an easy, cost-effective, enterprise-grade service for open source analytics. With HDInsight, you get managed clusters for various Apache big data technologies, such as Spark, MapReduce, Kafka, Hive, HBase, Storm, and ML Services, backed by a 99.9% SLA. In addition, you can take advantage of HDInsight’s rich ISV application ecosystem to tailor the solution for your specific scenario.
HDInsight covers a wide variety of big data technologies, and we have received many requests for a detailed guide. Whether you want to just get started with HDInsight, or become a Big Data expert, this post has you covered with all the latest resources.
The HDInsight team has been working hard releasing new features, including the launch of HDInsight 4.0. We make major product announcements on the Azure HDInsight and Big Data blogs. Here is a selection of the most recent updates:
- Launch of HDInsight 4.0 at Microsoft Ignite 2018 (Session Video)
- Azure HDInsight brings next generation Apache Hadoop 3.0 and enterprise security to the cloud
- Deep dive into Azure HDInsight 4.0
- HDInsight Enterprise Security Package now generally available
- Exciting new capabilities on Azure HDInsight
- 6-part best practice guide for on premises Hadoop to cloud migration
- Azure Toolkit for IntelliJ – Spark Interactive Console
- Secure incoming traffic to HDInsight clusters in a virtual network with private endpoint
- Apache Spark jobs gain up to 9x speed up with HDInsight IO Cache
- Bring Your Own Keys for Apache Kafka on HDInsight
- New Azure HDInsight management SDK now in public preview
HDInsight Developer Guide
The HDInsight Developer Guide covers both basic and advanced scenarios for developers, data scientists, and data engineers who are getting started with Azure HDInsight or want to learn more about it. This step-by-step guide starts with a basic overview and use cases, followed by best practices on how to configure clusters, plan capacity, and develop applications for different workloads such as Hive, Spark, HBase, and others. The guide concludes with advanced use cases and scenarios, along with samples.
HDInsight training resources
In addition to the guide, we would like to highlight the other resources available for learning more about HDInsight. These include self-paced training, documentation, videos, and more, all listed below.
Self-paced online trainings
edX, an online learning destination, offers self-paced, high-quality courses from the world’s best universities and institutions to learners everywhere. These courses are available for free as part of the Microsoft Professional Program for Big Data, or you can add a verified certificate for a fee. The courses have been updated; the three that focus on HDInsight are listed below.
- Processing Big Data in Azure HDInsight: This course teaches you how to use the Hadoop technologies in Microsoft Azure HDInsight to build batch processing solutions that cleanse and reshape data for analysis.
- Implementing Real Time Analytics in Azure HDInsight: In this course, you’ll learn how to implement low-latency and streaming big data solutions using Hadoop technologies like HBase, Storm, and Spark on Microsoft Azure HDInsight.
- Implementing Predictive Analytics in Azure HDInsight: In this course, learn how to implement predictive analytics solutions for big data using Apache Spark in Microsoft Azure HDInsight.
Also see the self-paced online training on Microsoft Virtual Academy (MVA), which provides free online training by world-class experts to help you build your technical skills and advance your career. Ready to continue your big data deep dive? Below are the in-depth courses for exploring Hadoop and Spark on HDInsight, which are a key part of the analytics portion of the MVA Data Series.
HDInsight Documentation: This is the landing page for the HDInsight documentation, which is useful to any developer, data scientist, or big data administrator. It covers everything from getting started to specific scenarios and use cases with HDInsight. You can download the complete documentation using the “Download as PDF” option at the bottom left of the page, or search for specific topics using the search box at the top left.
HDInsight Troubleshooting Guide: We are constantly updating the troubleshooting guide so that you can easily debug or troubleshoot issues.
Instructor-led training
Whether you’re looking to enhance your proficiency in specific technologies like Azure Machine Learning Studio or in the overall architecture of Big Data and Analytics, we’ve likely got a course that can get you on your way. The instructor-led and self-paced video courses range from short webinars to multi-day workshops to longer-term, on-demand deep dives. Check back frequently because new offerings are regularly added by Microsoft and our training partners.
The following videos are a great way to learn about the scope and features of HDInsight.
- Deep Dive on Apache Spark Performance Tuning on HDInsight: Part 1, Part 2, Part 3, and Part 4
- New Spark UI extensions for better job performance analysis
- Optimizing HBase Performance in HDInsight
- Introduction to Apache Kafka on Azure HDInsight
- Fine-grained security with Apache Ranger on HDInsight Kafka
- Bring your own keys on Apache Kafka with Azure HDInsight
- HDInsight: Fast Interactive Queries with Hive on LLAP
- Introducing ML Services 9.3 in Azure HDInsight
- Compliance Standards on HDInsight
- Big Data Partner Program
- How to use Machine Learning on Azure Government with HDInsight
- StreamSets on Azure HDInsight
2017-18 conference recordings
- Gaining deeper insights from big data using open source analytics on Azure HDInsight
- Five essential new enhancements in Azure HDInsight
DataWorks Summit 2018
- Building a Modern Data Warehouse on Microsoft Azure with Azure HDInsight and Azure Databricks
- Zero ETL analytics with LLAP in Azure HDInsight
- Ingestion in data pipelines with Managed Kafka Clusters in Azure HDInsight
- ISV Showcase: End-to-end Machine Learning using H2O on Azure
- Interactive ad-hoc analysis at petabyte scale with HDInsight Interactive Query
- Real-time data streams with Apache Kafka and Spark
Hands-on labs
- Data science lab: This lab specifically focuses on the Spark ML component of Spark and highlights its value proposition in the Apache Spark Big Data processing framework.
- Hive lab: This lab focuses on how customers can leverage HDInsight Hive to analyze big data stored in Azure Blob Storage.
Get Microsoft certified on HDInsight
- Perform Data Engineering on Microsoft Azure HDInsight
- Designing and Implementing Big Data Analytics Solutions
We hope that you will find the developer guide and all the other resources helpful. If you have any feedback or questions, feel free to send us an email at AskHDInsight@microsoft.com. We’d love to hear from you. You can also stay up to date on the latest Azure HDInsight news and features by following us on Twitter at #HDInsight and @AzureHDInsight.