Last week I attended the 2016 PASS Business Analytics Conference in San Jose. I attended my first BA Conference last year and had a blast – this year was just as good. However, I had a really interesting week in the lead-up to this year’s conference. As you may have read last week, Rob had a mishap a couple of weeks ago that has left him less mobile than he would like. Seriously, I have seen the photos of the “temporary realignment” of his leg and I didn’t eat for 3 days after – it kinda looked (a lot) like this.
Anyway, I got a call last week from Rob asking if I would fill in for him at the PASS BA Pre-Conference session that he and Avi were scheduled to deliver. I was planning to be in San Jose 1 day early anyway, so the timing was good. Besides, I would never miss an opportunity to present on the topic of DAX. I thought today I would share with you one of the topics I covered last week: a workbook that demonstrates a custom time intelligence pattern for smoothing out your data. This is not the most complex topic you will find about DAX on the Web. But what I learnt from talking to people at the PASS BA Conference last week is that there are many people who would like to see more simple but practical “worked through examples” to read, study and learn from. I am definitely planning to take this feedback on board and share more practical worked through examples of real life scenarios in the future, but now on to today’s topic.
Average Monthly Sales Rolling Quarter
I have used Adventure Works to produce a simple line chart in Power BI showing total sales by month over time. The raw chart looks like this.
You can download the workbook here if you want to take a look, or simply look at the embedded version I have pinned at the bottom of the post.
Notice the spikes in sales in different months in the chart above? These spikes are very common in sales data, and in my experience they can be even more prevalent in weekly sales data. These spikes make it difficult to analyse trends in the data. You could put a trend line into the chart (thanks to the April update to Power BI), but a standard linear trend line is too simplistic to really see what is happening in your data, particularly if there are seasonal changes.
One good way to look at the trends in your data is to add an Average Monthly Sales Rolling Quarter trend line to the chart. You simply take the total sales of the last 3 months and then divide by 3. If you were doing a weekly trend, take the last 13 weeks and divide by 13. When you overlay this Avg Monthly Sales RQ line on the original chart, it looks like this.
Notice how the black “trend line” provides a “smoother” view of what is happening in the data? The longer the time horizon of your rolling period, the smoother the curve will appear. This next image is for Average Monthly Sales Rolling 6 months.
The rest of this blog will cover how to produce a report like this.
Preparing the Calendar Table
The first thing I did was to add a unique monthly ID in the form YYYYMM plus an integer ID column to my calendar table. It is not possible (in Power BI) to nest the Year Column and the Month Column onto the X axis, so you need to create a column for this. One of the many things I love about Power BI Desktop is the tight integration of Power Query with Power Pivot. If I was doing this in Excel, I would probably just write a Calculated Column in DAX, but the tight integration with Power Query makes it just as easy (maybe easier actually) to write the column in Power Query.
Add a Unique Month Name
I clicked “Edit Queries”, then “Add Custom Column”, and inserted the following code.
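Here is a sketch of that custom column in Power Query (M). The column names [Year] and [MonthNumber] are my assumptions based on a typical calendar table – adjust to match your own:

```
// Unique month in YYYYMM form: shift the year left two digits,
// then add the month number into the gap.
= [Year] * 100 + [MonthNumber]
```

For July 2001 this returns 200107.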
The “times by 100” simply shifts the year two digit positions to the left, making room for the month. Then I can simply add the month number to the “year x 100” to give me a unique month description in the format YYYYMM.
Add a Unique ID column
The next thing I did was to add a unique Integer ID column for months. Notice how the integer column simply starts at 1 and then increments by 1 for every period (month in this case) without ever resetting.
An Integer ID column is invaluable when you come to write custom time intelligence (see the formulas and explanation further down in this post), and I highly recommend you add one for every period in your calendar table that you want to use in your time intelligence patterns (one of the many things I learnt from Rob). You can read all of my tips for creating a good calendar table from my Knowledge Base here.
My PQL code (Power Query Language – pronounced Peequel) for this ID column is shown below.
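A sketch of that column in M (again assuming [Year] and [MonthNumber] column names, and a calendar that starts in July 2001):

```
// Unique, never-resetting month ID.
// (Year - 2000) gives 1, 2, 3...; times 12 plus the month number
// gives a unique integer; minus 18 forces the first period to be 1.
= ([Year] - 2000) * 12 + [MonthNumber] - 18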
First I subtracted 2000 from the year. Given my calendar started in 2001, this had the effect of giving me 1 for the first year, 2 for the second year etc. I then multiplied this number by 12 so that when I added the month number, I ended up with a totally unique ID. After doing this, the first ID number was 19 (2001 converted to 1, times 12 + 7 (July) = 19). I could have left 19 as being the first number and it would all be fine. But to tidy it up a bit, I simply subtracted 18 from the formula to force the ID column to start at 1. I couldn’t simply insert an ID column from the menu because the calendar table is at the day level.
Rename your Steps
One more thing I did was to rename the significant steps in the Power Query pane.
Notice how much easier it is to see what you did when it says “Added YYMM column”, “Added Unique ID” compared to “Added Custom”, “Added Custom 1” etc? The easiest way to rename the steps is to simply right-click the step and then click “Rename”.
OK, then I clicked “Close and Apply”, then on to the DAX.
Average Sales Rolling Quarter DAX Formula
Here is my custom time intelligence formula for Average Sales Rolling Quarter.
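A sketch of the pattern follows. The table, column and measure names match the ones used in this post, but treat the exact syntax as illustrative – per the explanation below, MAX reads the last month in the current filter context and MIN reads the first:

```
Avg Monthly Sales RQ =
CALCULATE (
    [Total Sales],
    FILTER (
        ALL ( 'Calendar' ),
        'Calendar'[MonthID] >= MAX ( 'Calendar'[MonthID] ) - 2
            && 'Calendar'[MonthID] <= MIN ( 'Calendar'[MonthID] )
    )
) / 3
```

On a line chart, each data point filters the Calendar to a single month, so MAX and MIN both return that month’s ID and the window covers the current month plus the 2 before it.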
Edit: Actually, rather than dividing by 3, a better approach is to divide by the distinct count of months. That way if it is the first period ever, it will divide by 1 instead of 3 etc.
The FILTER function in this formula does 3 things.
- It removes all filters that have been applied in the visual with the ALL(Calendar) portion of the formula. If you think about it, this is a mandatory step. The line chart filters the data model so that only 1 month is displayed for each point on the chart. Given the table is filtered for a single month, it would be impossible to go back in time and fetch 2 more months and add them to the data table unless the filter is removed first.
- It then “reads” the current filter context from the data point with the MAX(Calendar[MonthID]) portion of the formula. Any time you see an aggregation function inside a FILTER function like this, you need to simply realise that the aggregator is “reading the current filter context”. I could have used any aggregating function for this “reading the current filter context” portion of the formula (for this specific chart) and it would still work. The reason I used MAX for the first part and MIN for the second part is that this makes the formula more usable in other parts of the workbook. If you had an instance where 2 months were selected in the filter context and aggregated into a single value (say in a Matrix), the MIN function would detect the first month and the MAX function would detect the last month in that filter context.
- The last thing the formula does is to reapply new filters to the table. Whenever you see a naked column ( Calendar[MonthID] in this case ), then think about the underlying table in the data model. Notice how I am using the unique integer ID column to “step back in time”. No matter where I am in the Calendar table, I can always subtract 2 from the ID in the current filter context to fetch the time period that was “2 months ago” and so on. Even though it is the YYYYMM column that is in the visualisation, you can refer to any of the columns in the table – in this sense you should learn to “think in tables” of data. When the table is filtered, the YYYYMM value is visible in the chart, but the entire row is “visible” in the underlying data model – including the ID column. You can use this to your advantage and use this ID column instead of the YYYYMM column in your formulas. There would be no point subtracting 2 from the YYYYMM column as it won’t work when you cross year-end periods.
The FILTER function above returns a “filtered” table and passes that table of filtered values to CALCULATE. CALCULATE then propagates the new filter across to the connected tables (Sales in this case) and then “re-calculates” the measure [Total Sales] with this new filter context from the newly filtered data model.
I have extensive coverage, explanation and practice examples on how to write custom time intelligence functions in my book http://xbi.com.au/learndax if you are looking for a more comprehensive coverage.
Now for Something More Funky
Wouldn’t it be neat if I could let the user set the time horizon for the rolling average monthly period? I created a disconnected slicer to accept input from the user on how many periods they want to go back in time. The end user experience is really engaging, don’t you think?
Creating the Disconnected Slicer
I created the disconnected table by simply using the “Enter Data” option in Power BI. I named the table TimeHorizon and manually added the values 2 through 6 into the table.
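If you prefer DAX over “Enter Data”, a calculated table like this (a sketch – the table and column names follow the post) produces the same result:

```
TimeHorizon =
DATATABLE (
    "Time Horizon", INTEGER,
    { { 2 }, { 3 }, { 4 }, { 5 }, { 6 } }
)
```

Either way, the key point is that this table has no relationship to any other table in the model – hence “disconnected”.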
Write a Harvester Measure
The next step is to write a harvester measure to “harvest” the selection from the user. The objective here is to capture the value selected in the slicer by the user. It would be nice to be able to write the following formula:
Selected Time Wrong = TimeHorizon[Time Horizon]
But do you see what is wrong with this measure? The measure has a “naked column”, and that is not allowed unless the function has a row context. If there is no row context, then you cannot use a naked column. So what to do?
One possible correct formula for the harvester measure is as follows:
Selected Time = MAX(TimeHorizon[Time Horizon])
To understand how and why this measure works, you need to think about the underlying table and “imagine” what is happening to it behind the scenes when the user clicks on the slicer. What happens is the slicer is filtering the underlying table. You can’t “see” it being filtered, but indeed it is being filtered. So when the user clicks “2”, the underlying table is filtered with only 1 row showing – the row with the value 2 of course is still “visible”. So now that only a single row in the table is “visible”, what is the maximum visible value in the column? Well it is 2 of course! In fact the minimum value is 2, the average value is 2, the sum of all visible values is 2 and so on. So when you write a harvester measure like this, you can use any aggregation function to “extract” the single value that has been selected from the slicer by the user, and this trick therefore gets around the “naked column” issue mentioned above.
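Putting it all together, the rolling-average measure can then use the harvester instead of a hard-coded 3. This is one possible sketch – the variables and DIVIDE are my additions for readability, not necessarily the exact formula in the workbook:

```
Avg Monthly Sales Rolling N =
VAR Periods = [Selected Time]
VAR CurrentMonth = MAX ( 'Calendar'[MonthID] )    // last month in the current filter context
RETURN
    DIVIDE (
        CALCULATE (
            [Total Sales],
            FILTER (
                ALL ( 'Calendar' ),
                'Calendar'[MonthID] > CurrentMonth - Periods
                    && 'Calendar'[MonthID] <= CurrentMonth
            )
        ),
        Periods
    )
```

Note that the variables are evaluated in the outer filter context, before FILTER starts iterating, which is exactly the “reading the current filter context” behaviour described above.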
The last step is to add a slicer to the visual from the TimeHorizon table, and then wow your users with a great interactive report. Here is an embedded version of the workbook using the Power BI public sharing service. You can read about Power BI public sharing here.
If you have a business scenario that you would like to see as a worked through example, add a comment below and I will consider it for a future blog post.