Use AU Analyzer for faster, lower cost Data Lake Analytics
Do you use Data Lake Analytics and wonder how many Analytics Units (AUs) your jobs should have been assigned? Do you want to see if your job could consume a little less time or money? The recently announced AU Analyzer tool can help you today!
See our recent announcement of the AU Analyzer, available in both Visual Studio and the Azure Portal. Using this feature with our cost-saving guide will help you get the most out of your Data Lake Analytics spend.
Optimize for cost
To see a simple recommendation for how much your job may cost in a balanced setting, click on the Balanced recommendation in the AU Analysis tab for your job. You’ll see the estimated running time and cost of your job if you assign the specified number of AUs, barring other changes to the job or any of its dependencies.
Optimize for speed
To see an estimate of how fast your job can reasonably run, click on the Fast recommendation. Just as before, you’ll see the estimated running time and cost of your job if you assign the specified number of AUs, barring other changes to the job or any of its dependencies.
To try out different scenarios, click the Custom card in the AU Analysis tab for your job and move the slider to assign a custom number of Analytics Units.
For the following job, it looks like Balanced is the best option for me. The next time I submit this job, I’ll choose 185 AUs, and spend 3.63 USD (13%) more to reduce my job’s running time by 16 minutes.
How does the AU Analyzer work?
The AU Analyzer looks at all the vertices (or nodes) in your job, analyzes how long they ran and their dependencies, then models how long the job might run if a certain number of vertices could run at the same time. Each vertex may have to wait for input or for its spot in line to run. The AU Analyzer isn’t 100% accurate, but it provides general guidance to help you choose the right number of AUs for your job.
You’ll notice that there are diminishing returns when assigning more AUs, mainly because of input dependencies and the running times of the vertices themselves. So, a job with 10,000 total vertices likely won’t be able to use 10,000 AUs at once, since some will have to wait for input or for dependent vertices to complete.
The graph below shows what the modeler might produce when considering the different options. Notice that once the job is assigned 1427 AUs, assigning more won’t reduce the running time. 1427 is the “peak” number of AUs that the job can use.
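To make the modeling idea concrete, here is a minimal Python sketch of how a job's running time could be estimated for a given number of AUs. It is not the AU Analyzer's actual code. The function name `model_running_time`, the sample job, and the first-come, first-served scheduling rule are all assumptions made for illustration, and the sketch treats one AU as running one vertex at a time.

```python
import heapq
from collections import deque

def model_running_time(durations, deps, aus):
    """Estimate running time when at most `aus` vertices run at once.

    durations: dict mapping vertex name -> running time (any unit)
    deps:      dict mapping vertex name -> vertices it must wait for
    Hypothetical model: ready vertices start in first-come, first-served order.
    """
    if aus < 1:
        raise ValueError("aus must be at least 1")
    waiting_on = {v: len(deps.get(v, ())) for v in durations}
    dependents = {v: [] for v in durations}
    for vertex, parents in deps.items():
        for parent in parents:
            dependents[parent].append(vertex)

    ready = deque(v for v in durations if waiting_on[v] == 0)
    running = []  # min-heap of (finish_time, vertex)
    now = 0.0
    while ready or running:
        # Start ready vertices while there is a free AU.
        while ready and len(running) < aus:
            vertex = ready.popleft()
            heapq.heappush(running, (now + durations[vertex], vertex))
        # Jump ahead to the next vertex that finishes.
        now, finished = heapq.heappop(running)
        for child in dependents[finished]:
            waiting_on[child] -= 1
            if waiting_on[child] == 0:
                ready.append(child)
    return now

# Made-up job: four extract vertices feed two aggregates, which feed one output.
durations = {"e1": 10, "e2": 10, "e3": 10, "e4": 10, "a1": 5, "a2": 5, "out": 2}
deps = {
    "a1": ["e1", "e2", "e3", "e4"],
    "a2": ["e1", "e2", "e3", "e4"],
    "out": ["a1", "a2"],
}

for aus in (1, 2, 3, 4, 5):
    print(aus, model_running_time(durations, deps, aus))
```

For this made-up job, the modeled time drops from 52 with 1 AU to 27 with 2 AUs, stays at 27 with 3 AUs, and reaches 17 with 4 AUs. A fifth AU changes nothing, because at most four vertices are ever ready at the same time, so 4 is this job's peak.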
How do we calculate the recommendations?
We generate the Balanced recommendation by first modeling the job’s running time multiple times for various selections of AUs. We then walk through each of the options, from 1 AU upwards, and identify the option which, compared with one fewer AU, offers a performance boost equal to or greater than the increase in cost.
The approach for the Fast recommendation is similar. We walk through each of the same options, from 1 AU upwards, and identify the option which, compared with one fewer AU, offers a performance boost equal to or greater than half the increase in cost.
To make sure we’re giving you quality recommendations, we may update the recommendation logic for Balanced and Fast and add more recommendations in the future.
For example, if increasing from 374 AUs to 375 AUs might give a 3% speed boost for only a 3% increase in cost, we recommend 375 AUs as the Balanced option. If increasing from 825 AUs to 826 AUs might give a 2% speed boost for a 4% increase in cost, we recommend 826 AUs as the Fast option.
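The walk described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the shipped recommendation logic. The function `recommend_aus` and the sample times are hypothetical. The sketch measures "performance boost" as the relative drop in running time and assumes cost is proportional to AUs multiplied by running time, so the price per AU-hour cancels out. It also keeps the highest AU count that passes the test, which is a guess about how to choose when several options qualify.

```python
def recommend_aus(modeled_times, cost_weight=1.0):
    """Pick an AU count from modeled running times.

    modeled_times[i] is the modeled running time with i + 1 AUs.
    cost_weight is 1.0 for a Balanced-style pick and 0.5 for a Fast-style pick.
    Assumption: cost is proportional to AUs * running time.
    Assumption: the highest AU count that passes the test is kept.
    """
    best = 1
    for aus in range(2, len(modeled_times) + 1):
        prev_time = modeled_times[aus - 2]
        cur_time = modeled_times[aus - 1]
        prev_cost = (aus - 1) * prev_time
        cur_cost = aus * cur_time
        # Relative speed-up and relative cost increase versus one fewer AU.
        boost = (prev_time - cur_time) / prev_time
        cost_increase = (cur_cost - prev_cost) / prev_cost
        if boost >= cost_weight * cost_increase:
            best = aus
    return best

# Made-up modeled running times (minutes) for 1 through 7 AUs.
times = [120, 60, 42, 35, 32, 31, 31]

balanced = recommend_aus(times, cost_weight=1.0)
fast = recommend_aus(times, cost_weight=0.5)
print("Balanced:", balanced, "Fast:", fast)
```

With these made-up times, the Balanced-style pick is 4 AUs and the Fast-style pick is 5 AUs. Going from 4 to 5 AUs buys roughly a 9% speed boost for roughly a 14% cost increase, which fails the Balanced test but passes the Fast one.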
Try this feature today!
Give this feature a try and let us know your feedback in the comments.
Interested in any other samples, features, or improvements? Let us know and vote for them on our UserVoice page!