Stars and Snowflakes and Bears, Oh My!

 
Stars and Snowflakes in PowerPivot.  Snowflakes are slow and evil.

A PowerPivot Model with “Single-Step” Relationships is a “Star,” and one with “Daisy-Chained” Relationships is a “Snowflake.”  Snowflakes are Often Slow.  And Evil.  Don’t trust them.
(Images taken from page 194 of the book).

“Hey, Why Did Things Get So Slow?”

Got a great question this week from Dave Boylan:

“I think I read in your book that you should always use dimension (lookup table) data in your slicers and page fields (even if the same data is in your fact/data table) because they behave better and perhaps use less memory.  This makes sense as my fact table has 2MM rows and the slicer has three distinct values.”

Yes – if you have a lookup table, you should use fields from that lookup table on your pivot, rather than their equivalent fields from your data table.  My recent post on using fields from your date table on the pivot is just one example of this.

Dave continues:

I have three tables:  Timesheet, Projects, and Initiatives.  They are linked together like this:

      Initiatives (600 rows) –> Projects (17k rows) –> Timesheet (2M rows)

(where the arrows point in the direction that filters flow, so Initiatives filters Projects and Projects filters Timesheet).

When I add a slicer of initiative type from the Initiatives table and select a value, my workbook (45MB on disk) went from 300MB to 800MB in memory, in 5 minutes, and then crashed.

So, he’s following my advice and using his Lookup table field on the pivot, but the slicer then takes forever to slice, and then crashes.  This is impossible of course – my advice is never flawed!  (Joke).

But when he uses =RELATED() to “fetch” the right field from the Inititiaves table down into his Timesheet table, as a calculated column, THAT field works great on his slicer:

So, I brute force “double related” the initiative type directly onto the 2M-row Timesheet table and used that as the field for the slicer.  The slicer buttons now recalc the pivot in one second.

Interesting huh?  In fairness to me, I covered this problem in the book.  In fairness to Dave, it’s near the end of the book, in Chapter 19 – Performance: How to keep things running fast.

Snowflake is the Culprit!

Situations like the one Dave ran into are the reason why I advocate always using Star-shaped models (single-hop relationships only), and never using Snowflake-shaped models (chains of relationships).

PowerPivot does not “like” large intermediate tables in its relationship chains:

        Initiatives (600 rows) –> Projects (17k rows) –> Timesheet (2M rows)

That Projects table is large enough (17k rows) to cause PowerPivot trouble if it “gets in the middle” of a slicer (or Row field, etc.) trying to filter the Timesheet table.

Viewed more graphically, this is his original setup:

image

Dave’s Original Setup:  Two Lookup Tables Chained Together (Snowflake)
Gives Him a Large-ish Intermediate Table in the Chain (Projects)

What’s the fix?

Eliminate a step in the chain.  Combine the Initiatives table into the Projects table.  Do this in the database (if you have one) or use =RELATED() calculated columns.

You end up with something like this:

How to convert a snowflake to a star in PowerPivot.

Convert your Snowflake into a Star by Adding the “Parent” Columns into the “Child” table.  Now you can ignore the parent (Initiatives) table and just use the Projects table fields on your pivot.

Is this always necessary?

No, it is not always necessary to “flatten” your snowflakes into stars.  “If it ain’t broke, don’t fix it.”  And sometimes you have no choice but to use a snowflake anyway. 

But if things are running slowly, this is a good fix to try. 

(As are the other tips in Chapter 19 on slicers, table shape, imported columns versus calc columns, common “slow” formulas, etc. – don’t skip chapter 19!)

11 Responses to Stars and Snowflakes and Bears, Oh My!

  1. My daughter is going to freak out when I tell her that snowflakes are evil. I can hardly wait!

    • powerpivotpro says:

      Tell her that snowflakes were proven to be evil via careful analysis… using the world’s most powerful analytic software.

  2. Dave Boylan says:

    I’m famous! Well….in an Excel geeky perspective. Thanks Rob for covering this in such detail! Very much appreciated!!

    • powerpivotpro says:

      Dave – this is the only kind of fame that people like you and me will ever experience, so we may as well enjoy it :)

    • Tom boylan says:

      Your question leads me to believe you did not read to the end of the book or you read the Cliffs notes version:)

      • powerpivotpro says:

        Given your last name, I take it you are a family member heckling Dave in public.

        Sounds a lot like something my family would do :)

        • dave boylan says:

          He’s my know it all brother! Ironically, he calls me with Excel questions. My next To Boylan solution will involve obfuscation using both the indirect function and named ranges. He’ll never follow the spaghetti logic I’m going to throw down. Always in good fun!

  3. David Hager says:

    I would guess that looking out of the window inspired this post.

  4. Richard says:

    Thanks Rob, this is very useful info. I had a similar problem on one of my models. It wasn’t very big, however every time I tried to use the slicer I noticed that it kept “building” the cache. I used a very simple SQL script (with help from a mate of mine – Toufiq) to pull the two dim tables into the model as one table and it worked perfectly. May I add that Related also work well.

  5. Eric Hutton says:

    That is only possible when you don’t have other data tables attached to that primary table which you also need to filter on.
    For example, I have a table of sectors of activity for businesses, then I have a table that links a business ID to their sector of activity, finally I have a table of data from the various businesses – lets say it is monthly sales data. In addition, there is a table with statistical data on national activity by sector connected to the sectors table – so a real “snowflake” patter of tables. So then I still need to base my filters on that sector table, because there is no way to connect the business ID table to the statistical table – it would have a many-to-many relationship. So instead of filtering on the business ID table, I could take the related sector of activity field from my businesses table, move that into the business activity table (using a RELATED column, based on the business ID), and then link from that field back to the sectors table.

Leave a Comment or Question