Writing a RANK measure (and living to tell the tale)

 
Beavis_and_Butthead

“Number One, I order you to take…  a number two.”

-Captain Jean-Luc Butthead

 

 

Before I get started, do you have any idea how hard it is to find cartoon images from the pre-Internet era?  I just lost ninety minutes, first looking for the Far Side cartoon of “Custer’s Last Group Photo” (where everyone is holding up the “we’re number 1!” finger – get it?) and then struggling to find a version of the Beavis and Butthead image above at a better size.  No dice.  OK, enough of that…

DAX v1 has no RANK() function

That’s right.  If you want to write a pivot measure that tells you where a particular store, or product, or customer ranks according to another measure, and according to current filter context, well, a RANK() function would be nice.  But there wasn’t time to implement it in PowerPivot v1.

Never fear, there’s a workaround…

OK, yes, there’s a workaround – the FAQ says so right here.

But “never fear” might be an overstatement.  In fact, I should pause right here and say…

PowerPivot is simple, this technique is not

I’ve been doing this blog since October.  I’ve covered a lot of ground.  If you read this post and fear that you have to understand it in order to get great results with PowerPivot, well, then I’ve done you a great disservice.

Because in many ways, PowerPivot is actually easier than Excel itself.  If you are new here, I strongly recommend going back to the Great Football Project and starting from the beginning.

OK, back to the power technique

I don’t know about you, but personally, my first impression of the formula referenced in the FAQ sounded a lot like “double yoo tee eff.”  Check it out:

  COUNTROWS(
      FILTER( 
          ALL(DimCustomer[CustomerName]), 
          DimCustomer[CustomerSales]
              (Values(DimCustomer[CustomerName]))
                  
< DimCustomer[CustomerSales]
          && DimCustomer[CustomerName] <>  
                   Values(DimCustomer[CustomerName]) 
      )
  )
+1

Explaining how it works

That formula has been bothering me for a long time.  I knew it wasn’t going to be easy to break it down, but on our little DAX adventure together, we must leave no stone unturned, even the really heavy ones.

So today, I dove in.  And whenever I get uncomfortable, I run to familiar ground.  Which means, of course, oh yes…

The Great Football Project Returns!

Quick!  Someone get me some early-2000’s running backs!

  94801734-marshall-faulk   Edgerrin-James_1 Shaun_Alexander

Phew, I feel better already.  OK, let’s dig in.  A long time ago I defined a [RushingYards] measure using the CALCULATE function.  Let’s park that in a pivot:

PowerPivot Rushing Yards 
I want to write a measure that gives me the rank of each player, by Rushing Yards, no matter how I have sliced or filtered my pivot.

First step:  convert the sample formula over to my table/column names

When I substitute my football workbook’s names into that formula, I get:

   COUNTROWS(FILTER(ALL(Players[FullName]),
   [Rushing Yards](Values(Players[FullName]))
   <[Rushing Yards] && Players[FullName]<>
   Values(Players[FullName])))+1

I won’t bother pretty-printing that monster just yet, because I’m going to dissect it step by step.

Outside functions:  COUNTROWS + 1

The formula is basically:

   COUNTROWS(all kinds of crazy stuff) + 1

Which starts to make a little bit of sense, if you’re feeling sleuthy.  Really, what this is doing is:

   COUNTROWS(# of Player Rows where Player has more yards) + 1

Make sense?  Find how many players outrank the current player, add one, and that’s your rank.  OK, so how does the middle stuff determine the number of players that outrank the current player?

Inside functions – FILTER and ALL

   FILTER(ALL(Players[FullName]),
   [Rushing Yards](Values(Players[FullName]))<[Rushing Yards]
    && Players[FullName]<>Values(Players[FullName])))

Recall that the syntax for FILTER is:

   FILTER(Table Expression, Filter Clause)

In this case, our table expression is:

   ALL(Players[FullName])

Which makes sense.  Remember, on a particular row of our pivot above, the Players table is going to only contain the row from that player, for purposes of measure calculation.

Ranking Edgerrin James against himself, for instance, isn’t very interesting, unless you want to compare the Colts version to the Cards version, or heaven forbid the Seahawks version.  (The wear and tear on RB’s really is astounding).

So in order to have any players to rank the current player against, we need a table where the filter has been cleared.  That’s why we use ALL.

That leaves us with the Filter Clause, and it deserves its own section.

Inside Functions – the “Filter Clause” parameter to FILTER

OK, so we have a table that consists of all player names.  And we want to filter that down to just those players that have more Rushing Yards than the current player.

The FILTER function goes through that table of player names, one by one, and for each row, evaluates the filter clause to determine whether that row is “in” or “out.”

Our filter clause is really two clauses connected by the logical “and” operator – &&.  So a row must meet BOTH criteria to be included.

First clause:

   [Rushing Yards](Values(Players[FullName]))<[Rushing Yards]

First Row of PivotThis gets tricky, so let’s anchor it in an example.  Let’s say we are evaluating this measure for the very first row of our pivot, Edgerrin James:

Player Table

And the FILTER function is going to step through  our table of all player names, at right.  (There are about 4,000 players in my players table).  So, FILTER starts with Rabih Abdullah.

The right side of the clause above, [Rushing Yards], is the [Rushing Yards] measure, evaluated in the context of this source row, Rabih Abdullah.  Well, he totaled 172 yards in his rushing career.

But what about the left side of the clause:

  [Rushing Yards](Values(Players[FullName]))

That uses the context from the *pivot*, not the source table.  the VALUES function returns the list of valid values of the specified column, according to current pivot context.  Since the specified column is on the rows axis of our pivot, it only returns one value in this case, which is Edgerrin James.

Edge has 9,842 rushing yards.  So the clause becomes:

   9842 < 172

Which is of course FALSE.  Rabih’s row gets excluded by the FILTER function.

Wash rinse, repeat.  And then repeat.

FILTER then moves on to the next source row:  Khalid Abdullah.  The pivot context remains the same (Edge James), so the left side of the comparison is still 9842.  Khalid has less than that, so again, excluded.

Well, in my data set, no one has more yards than Edge, he’s the top dog in my data set.  So when FILTER finishes its pass, there are no rows left included.  COUNTROWS then returns 0, the + 1 from the end of the formula gets added, and Edge gets his proper rank of 1.

Second Row of PivotThe measure then moves on to the next pivot context, the second row, which is Curtis Martin:

And then FILTER starts over again, running through the entire Players table again, row by row, starting with our friend Rabih.

Two loops tucked inside each other.  Wow.

The other half of the filter clause

Remember this?

    && Players[FullName]<>Values(Players[FullName])))

That just makes sure we don’t count the player against himself for purposes of rank.  Off the top of my head, I’m not sure that’s 100% necessary.  I’ll have to noodle on that a bit.

OK, let’s see it in action

I add my new measure to the pivot, and I get:

Number Two Tops the List

What the heck?  Is my measure broken?

No, it’s not.  I’ll explain what’s going on in my next post, since this one has run so long already.

Next Football Post:  RANK Measure Continued >>

6 Responses to Writing a RANK measure (and living to tell the tale)

  1. John says:

    … we’re not worthy.. we’re not worthy!!

  2. […] Remember my Rank report that started with #2? […]

  3. […] It wasn’t until recently that I looked back and realized that I was building features, like the Rank measure, that were never in the original project at […]

  4. Jared says:

    Great post.

    Let’s say you wanted to do this ranking by year (i.e. Year, Player, Yards, Rank). How could you do this ranking by year so you have something like:

    Year Player Yards Rank
    2009 Edgerrin James 1000 1
    2009 Curtis Martin 900 2
    2008 Shawn Alexander 1200 1
    2008 Edgerrin James 1100 2
    etc…

    FYI… in my data I have sales by week by store. I’m trying to rank on sales by store by week.

  5. Derek Rickard says:

    I’ve been battling with Rankx doing terrible things to my workbook. It’s a very straight forward function, and was behaving as expected on it’s own (that is, showing the rank). But, when I’d try to use it in other measures it took a serious dump on my hopes and dreams (limited to the workbook). This little treasure saved my bacon. I replaced the rankx measure with this method, and presto! Everything is squeaky clean now…

Leave a Comment or Question