Filters CAN Flow Up Hill – Via Formulas That Is

By Matt Allington

Intro from Rob

imageMany of you already know of Matt Allington and his background as BI Director for Coca-Cola Asia Pacific.  Matt recently flew around the world (literally) to attend my training course with the purpose of becoming accredited by me as the official PowerPivotPro University Trainer for Australia (we even have a picture to prove it – displayed at right).

Now that Matt has my approval to teach my material, I have invited Matt to be a regular blogger on PowerPivotPro.com.  This is Matt’s first post in this capacity, so over to you Matt.

Confessions of a DAX student

Matt here:  I want to share with you a simple mistake I made early on in my DAX journey, and also create awareness of how easy it is to fall into a similar trap. This post will explain the mistake and provide the solution to how you can get filters to flow up hill – via formulae that is.

If you learnt DAX the Rob Collie way (like I did) you would be very familiar with Rob’s best practice of placing the lookup tables above the data tables in the PowerPivot Diagram View.

schema

The reason Rob teaches it this way is because it is very easy for the reader to visualise the flow of a filtered table – filters flow down hill.

In Rob’s PowerPivotPro University Online Training and also in his first book , the message is drilled into the student/reader – Filters ONLY flow down hill from the ‘one-side’ of the relationship to the ‘many-side’ of the relationship, they can’t flow up hill.  More recently I read the book by Marco Russo and Alberto Ferrari (aka The Italians) – Microsoft Excel 2013 – Building Data Models with PowerPivot.  To quote The Italians:

Filter context is automatically propagated from the one side of the relationship to the many side, whereas it is not propagated from the many side to the one side.

Both of these statements are completely true, and I received the message loud and clear during my learning.  But the problem was an error in the mental model I created for myself, which was “you can’t filter a lookup table from information stored in a data table”.  This mental model (and hence my learning) was NOT correct. Now I am not saying for one minute that Rob (or anyone else) encouraged me to follow my mental model – I did that all myself inside my own world of ignorance.  What I am saying is that this is how I interpreted the filter propagation message, no doubt in large part due to my partial understanding of DAX at the time.  The real problem is that these mental models we build for ourselves stick around as we grow, so if you learn it wrong early on, it can become a limitation in your ability to advance later.

Russo and Ferrari actually go on to say the following in their book (however I didn’t read this until much later):

Note: We said that the filter context propagation happens automatically and there is no way to force this behaviour [the other direction].  This is not completely true…

Let’s look at the specific example that challenged my understanding

I discovered I had a problem with my understanding when I needed to filter a sales data table (2) based on “brochure” information in a forecast data table (1).   [Note:  My forecast table only contains information about promotional forecasts for each week a product is on promotion – it does not contain sales forecasts for products that are not on promotion during any given week.  Hence if there is a record in the forecast table for a given week, then the product is “on promotion” by definition.]

promotions model 2

I knew I couldn’t directly filter from 1 through 3 to 2 (because filter propagation only flows from the one side of a relationship to the many side of a relationship), and I also knew that I had to use a common lookup table above the data tables and then propagate the filter context ‘down hill’ to both tables to get the result I wanted (same one side to many side propagation rule applies).  But I had burnt into my brain that you can’t filter a lookup table from information in a data table – which as I said I later learnt was not true.  My early mental model had me doing lots of strange things to solve the problem, like importing multiple copies of the forecast table, one for the data table purpose and a second copy for a lookup up table purpose. It was only when I sought help with the problems associated with the multiple tables that I flushed out my Mental Model as the ACTUAL root cause of the problem.

But with CALCULATE and FILTER, You Can Do Nearly Anything

You can indeed filter based on information in the forecast data table. It is not done with filter propagation but instead it is done in a DAX formula using the universal Swiss Army Knife DAX Function – good old CALCULATE( ).

The problem was that my mental rule was preventing me from even looking for this solution using CALCULATE( ).  Once I had erased my flawed mental model and replaced it with the new corrected information, the following DAX measure appeared from the Gods.

Sales from Products on Promotion:=
CALCULATE( [Total Sales]
,FILTER(Products
,CALCULATE(COUNTROWS(Forecast))>0
)
)

i.e. give me the sales data for all products that have at least 1 promotional forecast record in the data table for the period.  In effect I am filtering the sales data table based on information in the forecast data table – exactly what I was after.  Once I have this new Measure, I can then use the column Forecast[Brochure] in my pivot tables.  The filter context will BEHAVE as if it were flowing up hill from the Forecast table, through the Product table and back to the Sales table, but it only works for DAX formulae that include the correct filtering expression like this one.  While it has the effect of seemingly flowing up hill, technically what is happening is that the Products Table is filtered based on the existence of a record in the forecast table.  Once this filter is applied to the product table in the filter context for this measure, the filter then flows down hill from the one-side of the relationships (product table) to the many-side of the relationships (both the forecast table and the sales table).

It is probably worth explaining the extra CALCULATE in this Measure

You will notice in the Measure above that I have added an extra CALCULATE( ) function [in bold for illustration purposes] to wrap around the inner COUNTROWS(Forecast) function.  I have created some debugging DAX formulae to help illustrate what is happening here.

First consider the following part of Measure without the extra CALCULATE ( )


FILTER(Products,COUNTROWS(Forecast) > 0 )

 

This FILTER( ) function will return a table.  I can’t actually see the table, hence for debugging purposes I am going to wrap the entire Function in another COUNTROWS( ) function just for this test.


Test:=COUNTROWS(
FILTER(Products
,COUNTROWS(Forecast)>0
)
)

This extra outer COUNTROWS( ) function allows me to get a count of how many rows there are in the table returned from the FILTER ( ) function.  As you can see below, this new Test Measure returns the same value for all coordinates in the Pivot Table.

test

EDIT: 10 Aug
The rows shown in the pivot table above (“Back Page”, “Brochure Page: 1…” etc) are on the many side of the relationship with the Products table and as a result their filter context will not propagate to the Product table automatically.  The Test measure I have created therefore will only be filtered if there is some other new filter context introduced in the DAX measure itself. [You can see the Test measure is not filtered by the rows, and this is confirmation that there is no new filter context in the DAX].

If we analyse the Test DAX formula, the Filter ( ) function creates a new row context so it can iterate the Products table.  The Filter ( ) function goes to the first product in the products table and asks “ how many rows are visible in my Forecast table when I just look at you product 1 ?”.  The answer is “the same as it was before” – because nothing has changed to the filter context. Filter ( ) does not create a new corresponding filter context when it creates the row context – which is a bit confusing given the name of the function is FILTER ( ).  FILTER ( ) iterates in its own little ‘row context’ world until it has completed the iteration process. Then, and only then does it apply the filter to the Products table – and even then it is due to its role as a filter input in the CALCULATE function.

When I wrap COUNTROWS(Forecast) in a CALCUATE ( ) Function, “context transition” occurs and the row context created by FILTER ( )  is transformed into a new additional equivalent filter context that gets applied during the iteration process.   


Test:=COUNTROWS(FILTER(Products
,CALCULATE(COUNTROWS(Forecast))>0
)
)

In this new measure, the FILTER ( ) function iterates through each product in the Product table AND the product in the current row context is ALSO used to filter the Products table.  This new filter on the Products table is then propagated to the Forecast table using the relationship in the data model (from the one side to the many side) before COUNTROWS ( ) is evaluated.   Under this new condition, COUNTROWS(Forecast) will only be >0 when the product being iterated exists in the Forecast table

Now the Brochure locations in the rows of the pivot table below (eg “Back Page” etc) will filter the Forecast table directly (because they are in the same table and do not rely on filter propagation), and the new Test DAX formula is filtering the Products table and then checking to see if the >0 test is TRUE or FALSE on each iteration.  If the result is true, then we know that the product from that iteration step must exist in the Forecast Table, so that product is kept.  FILTER ( ) continues until it has gone through the entire Product table, and then the last step is the outer COUNTROWS finally returns the total number of rows (ie the number of products that survived the iteration test) to the pivot table.

test2

So taking this learning from the debugging formulae, I end up with the following Measure which includes an additional CALCULATE( ) function with the new filter context.


Sales from Products on Promotion:=
CALCULATE([Total Sales]
,FILTER(Products
,CALCULATE(COUNTROWS(Forecast))>0
)
)

Using debugging formulae like this is a really good way to solve problems and learn more about how DAX works.

Sometimes we need help to break free from our paradigms

I wouldn’t have broken out of my flawed mental model if I wasn’t able to ask someone else what I was doing wrong – someone to share my problem with and help me realise that my mental model was flawed.  One great way to do this is to participate in a community forum like http://powerpivotforum.com.au  By participating you can ask questions, explain your problems and provide help to other DAX users like yourself.  In addition to getting help with your own problems, in my experience helping others is a fabulous way to learn yourself.  The practise of teaching is one of the best ways of deepening your own understanding, so I encourage you to get involved as both someone that ‘asks’ and also someone that ‘answers’ questions.

I hope my explanation of this problem and how I fell into this mental trap helps others to broaden their understanding of what is possible when filtering tables, help you to be aware of the risks of flawed mental models and the value of sharing your problems with others.


Matt Allington is Principal Personal and Team BI Consultant at Excelerator CBS based in Sydney Australia. Matt’s Blog and Web Site
LinkedIn

25 Responses to Filters CAN Flow Up Hill – Via Formulas That Is

  1. Mike Girvin says:

    Great post today. It always seems like the most solidly learned lessons come from our biggest mistakes! Well written and very useful. Thanks, Matt!

  2. Joseph Looney says:

    Great write up with a great example. This will come in handy! thanks

  3. Chris Gilbert says:

    Matt,
    Thanks so much for the great explanation. With your and Rob’s help, this is finally starting to come together.
    It’s scary, but I actually wish that I could spend MORE time working with this PowerPivot / DAX stuff!

    I’m a big fan of those intermediate “test” calculations, as well.
    Keep up the great work!

    • Avichal Singh says:

      Great post Matt! Looking forward to seeing your future posts on PowerPivotPro. I’ll second the thought of using intermediate calculations to build your formulas. For instance you could rewrite the formula in two steps as below. (you would not need a nested Calculate in this case)

      ForecastRowCount:=COUNTROWS(Forecast)

      Sales from Products on Promotion:=
      CALCULATE( [Total Sales]
      ,FILTER(Products
      ,[ForecastRowCount]>0

      )
      )

  4. Marco Russo says:

    Hi Matt, welcome to the Power Pivot blogger gang! :-)
    The step by step approach is nice. In case you have performance issue (might happen if you have many products…), remember you can use black magic:

    Sales from Products on Promotion:= CALCULATE( [Total Sales], Forecast )

    Faster, shorter to write, much harder to explain!

    • Avichal Singh says:

      Marco, love your black magic stuff. I would not pretend to understand how the formula you provided works. But I know it works. I first picked it up from your site http://www.daxpatterns.com/distinct-count/ It has helped me many times.

        • scottsen says:

          How have I not come across your videos? There goes my weekend…

        • Marco, I watched the video you posted above – wow, what a great show. I was glued to it like an episode of Game of Thrones! I am now working through the other videos on your site.

          One comment/question from the mere mortal community – I really do prefer Rob’s approach of putting the filter tables at the top and the data tables at the bottom. I notice you and Alberto don’t use that practice. What is your view on the value of this approach? I guess you guys come from a more traditional Multi Dimensional world of star schemas and snowflake schemas etc, and in that world there are mainly highly skilled techies. The interesting thing about PowerPivot is it is making the technology available to others who don’t have such a traditional RDBMS up bringing, and visual clues like Rob’s approach really helps (me anyway)

          • Marco Russo says:

            Oh yes, there are different approaches and it’s great that we speak in different “visual” languages, because in this way I, Alberto, Bob and other reach different audience! In learning experience, no size fits all and it’s also a question of background.

            I really never used the Rob’s approach and I see now why it could be another point of view – and you confirm it’s simpler for people that come from a different background. Very good to know!

    • chandan chauhan says:

      omg..omg.. this is some voodoo stuff.. I just tried it and it works…

    • Oxenskiold says:

      Hi Marco,
      The black magic of CALCULATE([Total Sales], Forecast) is easy to explain:

      The Forecast table has common columns with the sales table namely the columns of the products table. That’s the magic. CALCULATE takes care of the rest :-D

  5. Thanks for the comments and the builds from Mark, Joseph, Chris, Avi.

    Marco, I have spent the last 2 weeks reading chapters 7 and 8 of your book (link in the post above) with Alberto over and over. If my Kindle had paper edges, indeed these sections would be dog-eared. These chapters certainly have taken my understanding of evaluation context to the next level. Thanks for the links, I will study the simplified “Marco formula” and try to get my head around it.

    • Marco Russo says:

      Thanks for the feedback! :)

    • John M says:

      It seems that there are many ways to accomplish the same goals within PowerPivot. It’s as much an art as a science. That’s what makes it so fun. Thanks for your post Matt. It was informative and instructive. I look forward to more posts from you in the future. I also enjoyed the insight shared by others. A side note to Rob – I just made Powerpivotpro.com my homepage. Why should I automatically open Google when this is where I want to go anyway?

  6. chandan chauhan says:

    I was trying to so something similar today. I wanted On Hand for only those items whose sales was greater than zero in the current filter context (last 10 days in the date-time slicer). I used

    SALES QTY:= CALCULATE ( SUM ( SOLD[QTY] ) )

    OH QTY:= CALCULATE ( SUM ( OH[QTY] ) )

    OH STOCK FOR SOLD ITEMS ONLY:= IF ( [SALES QTY] > 0, [OH QTY] )

    but the performance of this was terrible for some reason, since each of those table has millions of rows, for larger item selections in the pivot table, excel just gave up ( something like memory allocation error , upgrade to 64 bit blah blah..)

    I just tried the way mentioned above and it ran much faster for some reason. Here SIZE is the reference table that connects the two (I was expecting this to run slower since there’s the FILTER() formula and SIZE table itself contains about 1 million reference items)

    OH STOCK FOR SOLD ITEMS ONLY:=CALCULATE( [OH QTY] ,
    FILTER(SIZE , CALCULATE( COUNTROWS( SOLD ) ) > 0
    )
    )

    • chandan chauhan says:

      nevermind… I though I had discovered the holy grail.. but Marco’s black magic formula ran even faster….

      OH STOCK FOR SOLD ITEMS ONLY:= CALCULATE ( [OH QTY], SOLD)

      can’t believe that was it, would never have thought of it…

  7. Calculate() is such a simple formula to write but to really understand it is something else. Who created this magic?

  8. Oxenskiold says:

    Matt,

    When you are explaining the DAX formula:

    Test:=COUNTROWS(FILTER(Products
    ,COUNTROWS(Forecast)>0
    )
    )

    You are writing:

    “But here’s the problem – the FILTER ( ) function that surrounds it is an iterator, and as such it creates a new row context for the inner COUNTROWS(Forecast) function. Without the normal filter context, there are no pivot table coordinates passed to the COUNTROWS(Forecast) function, and hence the table returned by FILTER( ) will always be the full list of products (completely unfiltered).”

    This is actually NOT true. While it’s true that FILTER() is an iterator and it creates a row context it’s not true that the old filter context just goes away by itself because of the iterator. When COUNTROWS(forecast) is evaluated the original filter context is still there and COUNTROWS(forecast) is evaluated under that filter context.

    Here is what happens (on the logical level):

    a. the products table is materialized (all rows are returned). The products table is then evaluated under the current filter context. The current filter context consists of a single column from the forecast table. The Brochure column. If we use row 2 in your example the contents would be ‘Back Page’. However the products table does not have that column. The table does not have that column because it’s on the top of the many-to-one chain and hence is not extended by the columns of the forecast table in the materializing stage. As a result the products table is not filtered by the current filter context and is still consisting of all rows.

    b. COUNTROWS(Forecast) is now evaluated for each row in the products table. The row context is not converted or transitioned into a filter context since there is no CALCULATE to this. The filter context STILL consists of a single column from the forecast table (Brochure). The forecast table is materialized and then filtered by the contents of the Brochure column. The forecast table has that column and can therefore be filtered by it. The result of the COUNTROWS() is the number of rows from that table that is equal to the contents of the Brochure column. In row 2 ‘Back Page’ for instance. If the number of rows are greater than zero (it will always be greater than zero) the products row that is currently iterated on is included in the result of the FILTER(). So since the products table is never filtered by the forecast table and COUNTROWS() always will be true the resultant number from the formula is the number of rows in the products table.

    I hope this can help clarify the misunderstanding.

  9. Thanks for the comment. I have changed the explanation to reflect the correct process.

  10. Cabby says:

    There is an older but still very helpful article about how calculate and filter context works on Jeffrey Wang’s blog.

    http://mdxdax.blogspot.de/2011/03/logic-behind-magic-of-dax-cross-table.html

Leave a Comment or Question