Speed: Another Reason to “Trim” Calendar Tables

image

An 11,000-Row Calendar Table Spanning from 2000 to 2030:
Most of the Time This is Harmless Overkill

A 60x Speed Improvement From a Most Ordinary Place

I’ve been doing some work lately for a client who really pushes the DAX envelope.  One of the top-three models I’ve ever worked on in terms of complexity, no doubt.  And really, my role is just to help fine-tune it and add a few bells and whistles.  They built this sucker themselves and I am way impressed.

Crazy stuff.  Formulas that use outlier dates from one Data table (“fact” table) to then subsequently filter another Data table (via its related Calendar table), but then wrap that up inside a MAXX inside a SUMX…  and it all makes perfect business sense.  It’s magic.

But speed ain’t its strong suit.  We tried all the usual tricks – “de-importing” unneeded columns, replacing calculated columns with imported versions, etc.

And it was still way too slow.  Then we tried something even simpler, and things got 60x faster.

 

Oh The Nuances of FILTER()

In this model, we do a lot of FILTER(DateTable, <complex filter expression) in our measures.

And the FILTER function, when we have sufficiently complex filter expressions, has two particular characteristics worth noting:

  1. It picks up each row in the specified table, one at a time, to determine if the row should be kept or not.  No bulk scans where it can examine blocks of rows at once.  If that sounds slow to you, you’re on the right track.
  2. It forces the calculation into single-threaded mode.  What is single-threaded mode, you ask?  It’s geekspeak for “running your engine on one cylinder.”  It means that much of your computer’s CPU horsepower is left on the sidelines, watching a single “Core” (a mini-CPU, essentially) maxing itself out to do all the work.

image

Divide The Number “1” By the Number of Cores on Your Computer.  If Your CPU Usage
is Pegged Near that Percentage for a Long Time, You Are in Single-Threaded Land.

(Note in the picture above that all four cores show activity, which might make you think they are all being used at once.  Don’t be fooled!  The operating system is just rotating the DAX engine’s work around amongst the 4 cores, so as not to burn one core out.  It rotates things around many times per second, so it LOOKS LIKE sharing.  But at any given moment, trust me, a single core is working at 100% capacity).

In short, the FILTER function is often “at fault” when performance tanks.  Used clumsily against large data sets and/or complex calculations, it can grab a single core (which is basically a CPU) and hold it hostage for long periods of time.

Punchline:  1,000 Row FILTER() is 11x Faster Than 11,000 Row FILTER()

You may have already guessed the answer here.  We have less than 3 years worth of actual data, but our calendar/date tables had 30 years in them – about 11x as many days as we needed.

I trimmed those calendar tables down to “fit” the range of our actual data.  And that 11x reduction in calendar rows made things 11x faster!

Our other tables with millions of rows?  Untouched.  The formulas themselves?  Untouched.  Just smaller calendar tables.  That’s it.

“Wait, You Said 60X!”

Yes.  Without getting too deep into boring detail, I’ll tell you that there are TWO calendar tables in this model.  One of them was about 11x as big as it needed to be, but the other was only about 5x bigger.

Since they were both being FILTER’d in a single measure, in a manner that “nested” one FILTER inside another, the speed-up effects were multiplied, and things got about 60x faster overall as a result.

“What If I’m Not Using FILTER()?”

Great question. What if you’re not using FILTER against your calendar table, but instead using the built-in time intelligence functions like DATEADD, DATESYTD, or DATESBETWEEN?

Um, I dunno for sure.  I’d have to ask my buddies at Microsoft, or run some tests, or convince someone smarter to drop in and answer (Marco / Alberto / Chris).  But my guess is that sometimes those functions will also be a lot faster on smaller calendar tables.  Just maybe not all the time, and maybe not by as much.

Unlike FILTER, which has no choice but to look at every single row, I suspect the time intelligence functions may have shortcuts in them.  DATEADD(…, –1, Year), for instance, doesn’t really have to look at every row in the calendar – it could be smarter and “jump” straight to the rows it needs.  I have no idea if it does, however, so maybe that’s a post for another time.

9 Responses to Speed: Another Reason to “Trim” Calendar Tables

  1. Marco Russo says:

    As usual, it depends. CALCULATETABLE might be faster than FILTRE, but only if the cardinality within the filter argument is lower. I’m writing an article about that – I hope to publish it within a week on sqlbi.

  2. sam says:

    @Rob,
    You could always have a Calendar Table spanning 30 years, but a query that pulls data up to today or some times till the end of a period (month/year) – and the PowerPivot Model pulling data from that Query rather than the Table.

  3. toufiq1 says:

    @marco, I tend to favor CALCULATETABLE over FILTER, so looking forward to your article to understand impact in different scenarios
    @sam agreed and easy to maintain if you use a view as a decoupling layer over the table with a condition such as WHERE [Year] <= DATEPART(YEAR(GETDATE()) for facts up to the current date (a bit different for facts containing future periods or multi-fact models)

  4. Reader says:

    I would love to read more about calender tables, as of now I create them in Excel and import them into PowerPivot. But perhaps there are good reasons for creating them in PowerPivot from a single date columm?

  5. My date tables are in a SQL database so that everyone (and SharePoint) can update the model without referencing a local table or loading the whole table into Excel. Using SQL to bring in just the columns and date ranges needed for the current model keeps everything lean and fast, not to mention keeping those extra years out of the slicers.

  6. Renato Falcon Lyke says:

    Is there a way that we could reduce the file size. My Data comes a database and i am creating measure to get a trend. The file size is more than the raw data size.

  7. Mat says:

    question not related; why the background is grey? I have this on one of my file, don’t know why.

Leave a Comment or Question