An 11,000-Row Calendar Table Spanning from 2000 to 2030:
Most of the Time This is Harmless Overkill
A 60x Speed Improvement From a Most Ordinary Place
I’ve been doing some work lately for a client who really pushes the DAX envelope. One of the top-three models I’ve ever worked on in terms of complexity, no doubt. And really, my role is just to help fine-tune it and add a few bells and whistles. They built this sucker themselves and I am way impressed.
Crazy stuff. Formulas that use outlier dates from one Data table (“fact” table) to then subsequently filter another Data table (via its related Calendar table), but then wrap that up inside a MAXX inside a SUMX… and it all makes perfect business sense. It’s magic.
But speed ain’t its strong suit. We tried all the usual tricks – “de-importing” unneeded columns, replacing calculated columns with imported versions, etc.
And it was still way too slow. Then we tried something even simpler, and things got 60x faster.
Oh The Nuances of FILTER()
In this model, we do a lot of FILTER(DateTable, <complex filter expression) in our measures.
And the FILTER function, when we have sufficiently complex filter expressions, has two particular characteristics worth noting:
- It picks up each row in the specified table, one at a time, to determine if the row should be kept or not. No bulk scans where it can examine blocks of rows at once. If that sounds slow to you, you’re on the right track.
- It forces the calculation into single-threaded mode. What is single-threaded mode, you ask? It’s geekspeak for “running your engine on one cylinder.” It means that much of your computer’s CPU horsepower is left on the sidelines, watching a single “Core” (a mini-CPU, essentially) maxing itself out to do all the work.
Divide The Number “1” By the Number of Cores on Your Computer. If Your CPU Usage
is Pegged Near that Percentage for a Long Time, You Are in Single-Threaded Land.
(Note in the picture above that all four cores show activity, which might make you think they are all being used at once. Don’t be fooled! The operating system is just rotating the DAX engine’s work around amongst the 4 cores, so as not to burn one core out. It rotates things around many times per second, so it LOOKS LIKE sharing. But at any given moment, trust me, a single core is working at 100% capacity).
In short, the FILTER function is often “at fault” when performance tanks. Used clumsily against large data sets and/or complex calculations, it can grab a single core (which is basically a CPU) and hold it hostage for long periods of time.
Punchline: 1,000 Row FILTER() is 11x Faster Than 11,000 Row FILTER()
You may have already guessed the answer here. We have less than 3 years worth of actual data, but our calendar/date tables had 30 years in them – about 11x as many days as we needed.
I trimmed those calendar tables down to “fit” the range of our actual data. And that 11x reduction in calendar rows made things 11x faster!
Our other tables with millions of rows? Untouched. The formulas themselves? Untouched. Just smaller calendar tables. That’s it.
“Wait, You Said 60X!”
Yes. Without getting too deep into boring detail, I’ll tell you that there are TWO calendar tables in this model. One of them was about 11x as big as it needed to be, but the other was only about 5x bigger.
Since they were both being FILTER’d in a single measure, in a manner that “nested” one FILTER inside another, the speed-up effects were multiplied, and things got about 60x faster overall as a result.
“What If I’m Not Using FILTER()?”
Great question. What if you’re not using FILTER against your calendar table, but instead using the built-in time intelligence functions like DATEADD, DATESYTD, or DATESBETWEEN?
Um, I dunno for sure. I’d have to ask my buddies at Microsoft, or run some tests, or convince someone smarter to drop in and answer (Marco / Alberto / Chris). But my guess is that sometimes those functions will also be a lot faster on smaller calendar tables. Just maybe not all the time, and maybe not by as much.
Unlike FILTER, which has no choice but to look at every single row, I suspect the time intelligence functions may have shortcuts in them. DATEADD(…, –1, Year), for instance, doesn’t really have to look at every row in the calendar – it could be smarter and “jump” straight to the rows it needs. I have no idea if it does, however, so maybe that’s a post for another time.