FILTER() – When, Why, & How to Use It

A recent comment/question alerted me to the fact that I’ve never devoted a post just to this very useful (and often misunderstood) function.  Time to correct that.

The #1 Reason to Use FILTER – When CALCULATE Breaks Down

Does this Mysterious Error Look Familiar?

A function 'CALCULATE' has been used in a Boolean expression that is used as a table filter expression. This is not allowed.

Different formula, same error: I got that exact message from both of the formulas below.

What’s Wrong With Those Formulas?

What both of those formulas have in common is that they use a measure, [Avg Sighting Length in Mins], in a filter argument of the CALCULATE function.  That measure is the offending part in both examples.

CALCULATE([Sightings per Year], [Avg Sighting Length in Mins]>6)

CALCULATE([Sightings per Year],
   Observations[TTL Min]>[Avg Sighting Length in Mins])

In the first formula, I was trying to use a measure on the left side of the comparison, and in the second, I was trying to use a measure on the right side of the comparison.  Both are illegal.

CALCULATE expects each filter argument to take the form Column = Fixed Value (or > Fixed Value, <= Fixed Value, and so on), where “Fixed Value” is a specific number (like 6), a specific text string (like “Regular”), or a specific date.  So my first formula violates the rule that a column name is required on the left, and my second formula violates the rule that a fixed value (not an expression or a measure) is required on the right.

CALCULATE refuses to let you use variable expressions like measures in these filter arguments largely because “vanilla” CALCULATE is intended to always be fast, and once you start including expressions in these comparisons, your formulas might run a LOT slower.  So this is a good rule really – it forces you to stop and think before accidentally doing something bad.  The error message, of course, could and should be a lot better.

For a bit more explanation on this, see this brief post.

What’s the Solution?

If you look at those two illegal formulas above, they both reflect a perfectly valid intent.  The first formula is attempting to ask for “how many sightings per year would I report if we just counted sightings that lasted more than 6 minutes” and the second is asking for “how many sightings per year are above average in length.”

I’m almost regretting my selection of those examples, because they are a bit more complex than necessary to make the fundamental points.  But hey, too late now to change them, so I’ll move quickly.

In the first example, the [Avg Sighting Length in Mins] measure is actually based on a column in my Observations table: each UFO sighting has a [TTL Min] column.  So I could rewrite that filter in the CALCULATE as Observations[TTL Min] > 6 and everything is fixed.
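Here is a rough way to picture why that simple form is allowed. A filter like Observations[TTL Min] > 6 only ever tests the distinct values of a single column against a constant, and DAX treats it as shorthand for FILTER(ALL(Observations[TTL Min]), Observations[TTL Min] > 6). The Python below is a toy model of that idea, not how the engine is implemented. The sighting rows and the simple_column_filter helper are hypothetical.

```python
# Toy model: sightings as dicts; "TTL Min" is the sighting length in minutes.
# The rows are hypothetical sample data.
observations = [
    {"State": "OH", "TTL Min": 2},
    {"State": "OH", "TTL Min": 10},
    {"State": "TX", "TTL Min": 7},
    {"State": "TX", "TTL Min": 3},
    {"State": "WA", "TTL Min": 12},
]


def simple_column_filter(table, column, predicate):
    """Model of a simple CALCULATE filter such as Observations[TTL Min] > 6.

    Only the distinct values of one column are tested against a fixed
    condition (like FILTER(ALL(column), condition)), so no measure has to be
    evaluated. The result is the set of allowed values for that column.
    """
    distinct_values = {row[column] for row in table}
    return {value for value in distinct_values if predicate(value)}


allowed = simple_column_filter(observations, "TTL Min", lambda value: value > 6)

# Apply the allowed values to the rows and count the sightings that survive.
long_sightings = [row for row in observations if row["TTL Min"] in allowed]
print(sorted(allowed), len(long_sightings))
```

Because the test never depends on a measure, the set of allowed values can be worked out once, up front. That is the cheap case “vanilla” CALCULATE is built for.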

But let’s say I wanted to keep only the States where the average sighting length was > 6.  Since I don’t have a column in my States table that holds that, it’s sensible to use the measure.  That forces me to use FILTER, because FILTER does allow me to use measures in my comparisons:

CALCULATE([Sightings per Year],
   FILTER(States, [Avg Sighting Length in Mins]>6)
)

See that?  I took one of the filter arguments to CALCULATE and replaced it with a call to the FILTER function.  The syntax of FILTER is pretty simple, and it is explained below.
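Here is a toy Python model of what FILTER(States, [Avg Sighting Length in Mins]>6) does. It visits each State row, evaluates the measure for that one state, and keeps the state if the result is above 6. The data and the avg_sighting_length helper are hypothetical. Counting sightings stands in for [Sightings per Year].

```python
from statistics import mean

# Hypothetical sample data: States is a small lookup table, Observations the sightings.
states = [{"State": "OH"}, {"State": "TX"}, {"State": "WA"}]
observations = [
    {"State": "OH", "TTL Min": 2},
    {"State": "OH", "TTL Min": 10},   # OH average is 6, which is not > 6
    {"State": "TX", "TTL Min": 7},
    {"State": "TX", "TTL Min": 9},    # TX average is 8
    {"State": "WA", "TTL Min": 12},   # WA average is 12
]


def avg_sighting_length(state_row):
    """Stand-in for the [Avg Sighting Length in Mins] measure, evaluated
    for a single State row (the per-row evaluation FILTER performs)."""
    return mean(o["TTL Min"] for o in observations if o["State"] == state_row["State"])


# FILTER(States, [Avg Sighting Length in Mins]>6): one measure evaluation per row.
kept_states = [s for s in states if avg_sighting_length(s) > 6]
kept_names = {s["State"] for s in kept_states}

# CALCULATE then evaluates its expression only over sightings in the kept states.
sightings_in_kept_states = [o for o in observations if o["State"] in kept_names]
print(sorted(kept_names), len(sightings_in_kept_states))
```

Note the cost model this implies: the measure runs once per row of States, which is cheap for a few dozen states and expensive for a big table.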

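In my second example, where a measure was used on the right side of the comparison, the formula gets rewritten as:

CALCULATE([Sightings per Year],
   FILTER(Observations,
      Observations[TTL Min]>AVERAGE(Observations[TTL Min])
   )
)

Note that here I used the AVERAGE(Observations[TTL Min]) expression the measure is based on, rather than the [Avg Sighting Length in Mins] measure itself.  Inside FILTER, a measure reference is evaluated separately for each row of Observations (a “context transition”), so it would just return that row’s own length, and no row is ever longer than itself.  A plain AVERAGE over the column ignores the row FILTER is currently on and is computed over whatever the pivot has filtered.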
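Inside FILTER, an expression like AVERAGE(Observations[TTL Min]) and a measure reference are evaluated differently, and that matters for this second formula. The toy Python model below contrasts the two. It is not DAX and not how the engine works internally. The sighting rows and the measure_avg_for_row helper are hypothetical.

```python
from statistics import mean

# Hypothetical sample sightings, one row per sighting ("Id" makes each row unique).
observations = [
    {"Id": 1, "TTL Min": 2},
    {"Id": 2, "TTL Min": 4},
    {"Id": 3, "TTL Min": 9},
    {"Id": 4, "TTL Min": 13},
]

# AVERAGE(Observations[TTL Min]) ignores FILTER's row context, so it is computed
# once over the rows in the current filter context: (2 + 4 + 9 + 13) / 4 = 7.
context_avg = mean(row["TTL Min"] for row in observations)
above_average = [row for row in observations if row["TTL Min"] > context_avg]


def measure_avg_for_row(current):
    """Model of [Avg Sighting Length in Mins] referenced inside FILTER: context
    transition narrows the average down to the row being tested."""
    return mean(row["TTL Min"] for row in observations if row["Id"] == current["Id"])


# Every row is compared with its own length, and x > x is never true.
per_row_version = [row for row in observations if row["TTL Min"] > measure_avg_for_row(row)]

print(context_avg, [row["Id"] for row in above_average], per_row_version)
```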

So there you go.  When you want to use a measure, or an expression like AVERAGE(Observations[TTL Min]), in a comparison, you have to call in the FILTER function.  More details follow, starting with the simplest information and moving to the most subtle characteristics.

How does FILTER() Work?

The syntax for the FILTER function is FILTER(TableToFilter, FilterExpression). Pretty simple.
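Here is a minimal Python sketch of those semantics. It assumes a hypothetical Periods table with one row per month, and a pivot sliced to 2009. dax_filter is an illustrative helper, not a real API. It walks the table one row at a time. The MAX(Periods[Year]) part is computed from the pivot's filter context rather than from the row being tested.

```python
def dax_filter(table_to_filter, filter_expression):
    """Toy FILTER(): test each row in turn and keep those where the expression is true."""
    return [row for row in table_to_filter if filter_expression(row)]


# Hypothetical Periods table: one row per month for 2008 and 2009 (ALL(Periods)).
all_periods = [{"Year": year, "Month": month} for year in (2008, 2009) for month in range(1, 13)]

# The pivot is sliced to Year=2009, so that is the outer filter context.
periods_in_pivot = [row for row in all_periods if row["Year"] == 2009]

# MAX(Periods[Year]) is not narrowed by FILTER's row context; it sees the
# pivot's filter context, so it evaluates to the same value for every row.
max_year = max(row["Year"] for row in periods_in_pivot)

# FILTER(ALL(Periods), Periods[Year]=MAX(Periods[Year])-1)
prior_year = dax_filter(all_periods, lambda row: row["Year"] == max_year - 1)
print(max_year, len(prior_year))
```

The result is the 12 rows for 2008, even though the pivot only shows 2009. That happens because the table being walked is ALL(Periods), not the sliced Periods.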

For simple purposes, if you understand the gist of the above, and then points 1 and 2 below, you are good to go. If you want to understand more of the details over time, I recommend revisiting points 3-5.

  1. FILTER() takes a TableToFilter and a FilterExpression, and returns all rows from that TableToFilter that match the FilterExpression.
    1. For example, in FILTER(ALL(Periods), Periods[Year]=MAX(Periods[Year])-1), TableToFilter is ALL(Periods)
    2. and FilterExpression is Periods[Year]=MAX(Periods[Year])-1
  2. FILTER() steps through the TableToFilter one row at a time.
    1. And for each row, it evaluates the FilterExpression. If the expression evaluates to true, the row is “kept.” If not, it is filtered out.
    2. Because FILTER() goes one row at a time, it can be quite slow if you use it against a large table. When I say “large” that is of course subjective. A few thousand rows is fine in my experience. A million is not. Do not use FILTER() against your fact table.
  3. The FilterExpression typically takes the form of Table[Column] = <expression>
    1. The comparison operator doesn’t have to be “=”. It can also be <, >, <=, >=, <>
    2. The expression on the right hand side of FilterExpression can be “rich.” This is VERY useful. In a simple CALCULATE, the right side of each filter expression has to be simple, like a literal number (9) or a string (“Standard”). The fact that FILTER() allows for rich expressions here is one of the most common reasons I use FILTER().
    3. The Table[Column] in the filter expression is a column in the TableToFilter. If you are filtering the Periods table, it makes sense that you are testing some property of each row in Periods. I can’t think of a sensible reason to use a column here that is NOT from TableToFilter. (Insert “boot signal” here, maybe the Italians can address this).
  4. Inside CALCULATE, each FILTER() ignores the other filter arguments and acts completely on its own.
    1. For example, say a formula passes ALL(Periods) as the first filter argument to CALCULATE, followed by one or more FILTER()s.
    2. The FILTER()s that come after it do NOT pay any attention to the other arguments, including that ALL(Periods).
    3. In other words, the FILTER() functions are still operating against the original filter context from the pivot! If the pivot is sliced to Year=2009, then a FILTER() over Periods starts with the Periods table already pre-filtered to just 2009.
    4. This is why I use ALL(Periods) as the TableToFilter in each of those FILTER()s. I have to repeat the “expand” step so that each FILTER() is also working from a clean slate.
  5. Even though each FILTER() operates on its own, their results then “stack up” in the overall formula.
    1. Even though FILTER() RETURNS a set of rows that matched the FilterExpression, it actually REMOVES rows from the overall filter context.
    2. This sounds tricky but really it isn’t.
    3. Let’s say our TableToFilter contains 6 rows: A, B, C, D, E, and F.
    4. And our overall formula contains two FILTER() clauses that both operate on the same TableToFilter (for example, two FILTER()s over ALL(Periods) in the same CALCULATE).
    5. Let’s also say that the first FILTER() returns rows A, B, C, and D.
    6. And the second FILTER() returns rows C, D, E, and F.
    7. The net result is that only rows C and D are left “alive” in the overall filter context of the formula.
    8. So one way to think of this is that FILTER()s “stack up” on top of each other.
    9. Another way to think of it is that even though the first filter RETURNED rows A, B, C, and D, its real effect was to REMOVE all other rows (E and F) from consideration.
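Points 4 and 5 can be modeled with plain Python sets, using rows named A through F as in point 5. This is a toy sketch under simplifying assumptions: every filter argument targets the same table, and a slicer leaves only A, B and C visible. The helper names and the FilterExpression are hypothetical.

```python
# Toy model of points 4 and 5. All data and names here are hypothetical.
ALL_ROWS = {"A", "B", "C", "D", "E", "F"}      # ALL(Table)
PIVOT_ROWS = {"A", "B", "C"}                    # Table, as filtered by the pivot


def dax_filter(table_to_filter, filter_expression):
    """FILTER(): keep the rows of table_to_filter where the expression is true."""
    return {row for row in table_to_filter if filter_expression(row)}


def calculate_rows(*filter_results):
    """Rows left alive after CALCULATE applies its filter arguments.

    Assumes every argument filters this same table, so together they replace
    the pivot's filter on it; their results are intersected ("stack up").
    """
    alive = set(ALL_ROWS)
    for rows in filter_results:
        alive &= rows
    return alive


def in_b_d_f(row):
    # Hypothetical FilterExpression.
    return row in "BDF"


# Point 4: each argument is evaluated against the ORIGINAL pivot context, so a
# FILTER() over the plain table never sees the ALL() passed next to it.
without_all = calculate_rows(ALL_ROWS, dax_filter(PIVOT_ROWS, in_b_d_f))
with_all = calculate_rows(ALL_ROWS, dax_filter(ALL_ROWS, in_b_d_f))

# Point 5: the first FILTER keeps A-D, the second keeps C-F; only C and D survive.
first = dax_filter(ALL_ROWS, lambda row: row in "ABCD")
second = dax_filter(ALL_ROWS, lambda row: row in "CDEF")
stacked = calculate_rows(first, second)

print(sorted(without_all), sorted(with_all), sorted(stacked))
```

Without the repeated ALL(), only B survives, because that FILTER() started from the pivot's A, B and C. With it, B, D and F survive. The stacked case leaves exactly C and D, matching the walk-through above.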

9 Responses to FILTER() – When, Why, & How to Use It

  1. ColinBanfield says:

    Nice post!

    “Do not use FILTER() against your fact table.”

    “…the FILTER() functions are still operating against the original filter context from the pivot! If the pivot is sliced to Year=2009, then the FILTER() function starts with the Periods table already pre-filtered to just 2009.”

    The above statements are worth highlighting.

  2. Mathilde says:

    Wow, thanks so much for this post. It all looks obvious now! DAX takes time to get used to when coming from an Excel background, but it’s definitely worth the effort. This is a bit like being able to write dynamic SQL queries directly in Excel cells. Thanks for your very clear explanations and (infectious) enthusiasm!

  3. Mike H. says:

    I am using 5 separate slicers to specify unrelated parameters (e.g. x < Age < y, Gender = [Male and/or Female], etc.). This results in a pretty complex filter function, but it works.

    My code seems very inefficient though, as I have a collection of measures which all require the same filter function that I have cut and pasted into each definition. Is there a way to resolve the filter just once and use the result across multiple measures?

    • powerpivotpro says:

      I am unaware of a bulk way to define the same FILTER and re-use it, at least not in PowerPivot.

      Just making sure though: do you need FILTER() in all of those cases? I mean, I only use FILTER() for slicer purposes when I am explicitly doing “disconnected slicers” which is an advanced technique.

      Your Gender slicer seems like it might just be doable as a connected table. (But the x < Age < y part does seem to be a textbook disconnected example).

  4. Trisha says:

    Can Powerpivot filters be used by regular Excel users who do not have PowerPivot?

    • powerpivotpro says:

      In Excel 2013, yes. In 2010, you need PowerPivot for SharePoint. Actually, the SharePoint option is pretty good for 2013 too.
