More Fun With IF(VALUES())

March 12, 2011

 
Cast of Superbad:  Clearly watching a video on amazing PowerPivot DAX techniques

“I mean, IF() I’m paying top dollar, I want a little production VALUE()”

I bet you thought I was out of movie quotes for IF(VALUES()) didn’t you!  If I had a fake ID, my name would be “McPivot.”

Bag of Tricks

OK, once you have a reasonable level of comfort with the workings of IF(VALUES()), whether that comfort is “I understand it” or even just “I can make it give me the right results,” you are in for some treats.  There are a number of ways to use IF(VALUES()) to produce different desired effects.  I’m going to blast through a few of them here real quick.

Note:  If you have not read the previous post, you should do so before reading this one.

Returning BLANK() for Subtotals and Grand Totals

Sometimes you end up with a measure that simply doesn’t make sense unless you are in the context of a single item.  Returning to the football project, let’s say I write a measure that for “Personal Best Single Game Rushing Yards,” which, for a given football player, returns their personal best single-game total over all time:

Best Single Week Performance Measure in PowerPivot

Cool huh?  (My data ends before Adrian Peterson broke Jamal Lewis’ record).  By the way, the formula for that measure is:

   =MAXX(ALL(DimProschedule[GameDate]),[Rushing Yards])

(To understand how that measure works, I refer you to one of my personal favorite posts, the Five-Point-Palm, Exploding Function Technique.)

Now, if you put another field on the pivot, you have a decision to make:  do you want that measure to be evaluated at levels other than a single player?  How about for an entire team?  Or all players of a given position?  You may decide that doesn’t make sense, and prefer that the measure NEVER return a value in those cases.

Good thing we have IF(VALUES()).  We can create another measure based on the existing measure:

   =IF(COUNTROWS(VALUES( CleanPlayers[PlayerID] ))=1,
       [Best Single Gm Rush Yds], BLANK())

Best Single Week Performance But Player Only Measure in PowerPivot

Notice how the original measure does not reflect the best player total?  In the week that Lewis ran for 295, apparently the Ravens had other players run for a total of 39 additional yards.  Depending on the semantics desired, you could argue that the Ravens’ number should be 295, for best single player performance in a game.

Or you can just return blank and forget about it, as my second measure does.

Returning different measures for different cases (aka “Branching”)

If we look at the picture above, you may say “hey it’s not fair to compare QB’s against RB’s in rushing yards, shouldn’t each position be measured by its primary metric – pass yards for QB, rush yards for RB, and receiving yards for WR’s and TE’s?”

Yeah, let’s do that.  First let’s create similar “best game ever” measures for the other two kinds of yardage – passing (throwing) and receiving (catching).  (Rushing is running, btw).

Best Single Week Performance All Three Measures in PowerPivot

OK, cool, so we have all three measures, but it results in a scraggly pivot.  I want a single column of numbers, and for the number to reflect the position (primary role) of the particular player.

IF(VALUES()) to the rescue.  A new measure that “branches” into the right measure for each position:

   =IF(COUNTROWS(VALUES( CleanPlayers[PlayerID] ))=1,
      IF(VALUES(CleanPlayers[PositionCode])="QB", [Best Single Game Pass Yards],
      IF(VALUES(CleanPlayers[PositionCode])="RB", [Best Single Game Rush Yards],
      IF(VALUES(CleanPlayers[PositionCode])="WR", [Best Single Game Rec Yards],
      IF(VALUES(CleanPlayers[PositionCode])="TE", [Best Single Game Rec Yards],
      BLANK())
      ))),
      BLANK()
   )

Best Single Week Performance Branching Measure in PowerPivot

Cool, huh?  A few things to note:

  1. IF(VALUES()) is used to check the value of [PositionCode], but [PositionCode] isn’t even on the pivot!  This works because each PlayerID does correspond, in the CleanPlayers table, to a single position code, because each PlayerID is unique.
  2. That said, it IS hard to tell which player is which position, especially for the 49ers, who had awful teams during the years for which I have data.  So I’ll add Position to the pivot below.
  3. I’m pretty sure that Johnnie Morton had his 153 yard receiving day when he played for the Lions, not the 49ers, but solving that problem is best saved for another day.

Here’s the pivot with position added in for clarity:

Best Single Week Performance Branching Measure in PowerPivot 2

(I decided to screenshot the Bears section rather than 49ers because the Bears had better players in those years.)


The Magic of IF(VALUES())

March 9, 2011

 
Dan Marino Wraps His VALUEs in IFs and So Should You

“I’m Dan Marino, and IF() anyone knows the VALUE() of protection, it’s me.”

Boom!  An Ace Ventura quote finally graces the blog.  Yes folks, that fine work of American cinema did indeed feature Dan Marino reprising his real-world ad for Isotoner.

 

(UPDATE:  I had the val_if_true, val_if_false arguments reversed in my sample formulas, even though I had them correct in Excel.  I’m rusty Smile.  Fixed now.)

But more importantly, I wanted to revisit this little two-function combo:  IF(VALUES()).  I’ve covered it before a few times, but generally in the context of covering something else.

And since this is one of my favorite techniques, I think it deserves its own dedicated post.  A series of posts, actually.  Let’s dig in.

In Measures, You Can’t Just use IF()!

In regular Excel, the IF() function is a familiar tool to most of us:

      IF(logical_test, value_if_true, value_if_false)

In PowerPivot, you can use precisely that same version of IF() in calculated columns.  So, for example:

      IF(Table1[Column1]>6, 0, 1)

In a measure formula, however, you cannot do that.  If you do, you will receive the following error:

PowerPivot Error Value for Column Cannot be Determined in the Current Context

PowerPivot Error Value for Column Cannot be Determined in the Current Context

To better understand why that happens, let’s get ourselves an example…

“Whoa, did you say DimDown???”

Yeah that’s right, the Great Football Project is back after a long hiatus! I had a phone call yesterday with a Pittsburgh Steelers fan.  You know Pittsburgh?  It’s a drinking town with a football problem.

Anyway, he requested a return to the football project, under the umbrella of “how do the Steelers win another Super Bowl?”  Given that I lived in Seattle for 13 years, and witnessed his Steelers literally “steel” a Super Bowl from Seattle via lopsided referee calls, I decided that we should look at what should have been the Seahawks’ championship season:  2005.

image

[Pct Successful Plays] is a measure that I’ve been developing with Hugh Millen.  I’m not going to reveal the “secret sauce” behind said measure, but you can see that in Week 21, when the Seahawks played Pittsburgh in the title game, only 31.2% of their plays were successful – well under their 38.1% total for the year, and their 6th-worst percentage of the year.

Applying Different Formulas in Different Circumstances

Now let’s say that I want to return a different value for this measure in Week 21, which is Super Bowl week.  You know…  I want to “correct” for bad referee calls.

I’ll create a new measure, [Corrected Pct Successful Plays].  In normal cases, I want that measure to just return the same value as would be returned by the original [Pct Successful Plays] measure.

But in week 21, I want to double it.  Just to set things straight.  (In real business, there are many legitimate examples of this technique, like applying a different formula in the West region versus the East region of a sales territory).

If I just use IF() by itself, I will get the error I showed in the previous section.

But if I nest VALUES() inside of the IF(), it will work!

      IF(VALUES(Schedule[WeekNbr])=21, [Pct Successful Plays]* 2,
       [Pct Successful Plays] )

image

Neat, huh?  Returns the same value as [Pct Successful Plays], except it doubles in Week 21.

A Return to the Golden Rules of DAX Measures:  No Naked Columns!

So why is VALUES() required?  Because you cannot have “naked” columns in your measures.  This is one of the “golden rules” for DAX measures that I first introduced over a year ago – see this video for an explanation.

If you’re not interested in watching that video right now, fine, just take my word for it:  in most places, you cannot just include column references in your measures, you have to wrap them in a valid function.  (And go watch the video later, as it’s a concise explanation of the fundamentals of DAX measures, and something I cover on every consulting/training engagement).

VALUES(Table[Column]), however, IS legal in a measure, and VALUES returns the list of values for the specified column in the current context.  So in the last row of the pivot above, it returns 21, and the formula becomes IF(21=21, 2 * [Pct Successful Plays], [Pct Successful Plays]).

Make sense?  OK, good, because I am lying to you.  There’s one more trick I have to show you, as the formula above actually STILL doesn’t work.

But…  IF(VALUES()) only works when there is only a single value!

OK, here’s the formula from above, repeated here:

      IF(VALUES(Schedule[WeekNbr])=21, [Pct Successful Plays] * 2,
       [Pct Successful Plays])

On each individual row of the pivot above, [WeekNbr] DOES have a single value (1, 2, 3, …  21).

There are actually many cases, however, where [WeekNbr] does NOT have a single value.

For instance, in the Grand Total cell of the pivot:

image

In the grand total cell, [WeekNbr] is NOT a single value.  It is actually ALL weeks.  In that case, the VALUES function returns a full column of values, and comparing a whole column to a single value doesn’t work out too well.

I like to visualize that problem in my head:

IF(VALUES(Column)) yields an error for pivot cells where there is more than one value of Column

      IF(VALUES(Column)) yields an error when there is more than one value of Column

And by the way, that’s not an error that you will just see displayed in the pivot’s grand total cell.  It will “tank” the evaluation of the entire pivot:

MDXScript(Sandbox) A table of multiple values was supplied where a single value was expected

Protecting against multi-value situations

Protecting against this is pretty simple.  You just use another IF() to “guard” your IF(VALUES()).

The “guard” if is highlighted below, wrapped around the original formula in normal font.

  =IF(COUNTROWS(VALUES(Schedule[WeekNbr]))=1,
        IF(VALUES(Schedule[WeekNbr])=21, [Pct Successful Plays] * 2,
         [Pct Successful Plays]), 
   [Pct Successful Plays])

In essence, you never let your IF(VALUES()) get evaluated unless there is only one value!  If there is more than one value (as there is in the grand total, or in a subtotal), the formula above just returns the original measure (that’s the “[Pct Successful Plays]” at the end).

Making it simple (seriously, just follow the pattern!)

A lot to digest?

It may seem bad, yes.  But you don’t actually have to grasp it fully.  You can just treat this as a “pattern” that you use.

For example, here is the formula turned into a pattern.  You just substitute your columns and values in the highlighted spots:

  =IF(COUNTROWS(VALUES( <Your Column Here> ))=1,
        IF(VALUES( <Your Column Here> ) =  <Test Value Here> ,
         <Result if True> ,
         <Result if False> ),
   <Result for subtotals and grandtotals> )

The stage is set for some serious fun!

OK, with all of that covered, that sets us up for a series of quick, simple, and powerful techniques that I can share in the next blog post.

 


Six Months With PowerPivot, part one

April 23, 2010

simply_red1

   
“If you don’t know me by now…  you will never never never know me…”

In honor of PowerPivot v1’s public release, I thought I’d offer up my thoughts on my last six months working with the product. 

Remember, I physically left Redmond in August and then officially left Microsoft in February, but have been putting PowerPivot through its paces that whole time, with a wide variety of data sets and business scenarios.

So in some sense, you can’t really get a closer view of PowerPivot than I have, while still maintaining any sense of neutrality about the product.  So here it is, the honest truth…

It passed the Great Football Project Challenge

When I started the Great Football Project back in October, I really did not know what to expect.  Honestly, I wasn’t sure that it was a great idea – was I going to get 5 days in and discover that PowerPivot simply wasn’t suited to that problem?  There was genuine potential for embarrassment.  But I needed something to blog about, and I was anxious to get started, so I just dove in.

After a few weeks, I was still holding my breath a bit.  I was past the basics but hadn’t really pushed the envelope at all.

At some point though I just stopped wondering.  It wasn’t until recently that I looked back and realized that I was building features, like the Rank measure, that were never in the original project at all. 

I can say now with confidence that PowerPivot can absolutely handle the business logic phase of the football project, which the consultant back in 2006 described as one of the most complex cubes he’d ever seen.

I call that a success.  Note how I specified the “business logic phase?”  That brings me to the next topic…

It is NOT a data cleaner/shaper

OK, this is perhaps obvious since the PowerPivot tables cannot be edited, and there are no features/functions that let you reshape rows.  Yes, you can add calc columns and that often does the trick, but there are other cases (like for instance, removing dupe rows) that PowerPivot can’t do at all.

The whole “shape it in normal Excel and then paste it over to PowerPivot window” workaround is ok for one-off work, or for tables that you don’t expect to change very often.  But like many of you, I am striving to gain the benefits of automatic nightly data refresh that the server provides, and the Excel workaround doesn’t translate to that system at all.

So, even more than ever, you need a clean and properly-shaped data source to start with.  So the first half of the work that the consultant did for the football project in 2006, where the text files were imported into SQL and turned into a decent schema, is still very much required.

For a production system, I don’t think this is a bad thing at all.  It forces you to cooperate with IT (or whoever owns your databases) to give you what you need.  And it forces them to listen to you more clearly, as long as they care about “taming the Excel beast” that is always their favorite thing to complain about :)

For a production system, that cooperation is essential for robust results.  And if it’s not a production system, then yeah, the Excel shaping workaround is great.

“No, it can’t do that.  Oh, wait. Nevermind.  Yeah, it CAN do that.”

Having worked on the product from the beginning, I’m more closely familiar with its limitations than most people.  In some ways that’s an advantage of course – I don’t dive into projects only to find out later that it can’t be done.

Oddly though, so far, knowing the limitations has largely just been a hindrance.  Every time my initial answer was “no, not in PowerPivot v1,” I’ve turned out to be wrong.

I have succeeded using PowerPivot to solve every single real-life problem I’ve encountered in my first six months of using it.

Market basket analysis?  Ranking measures?  Standard deviation?  Many to many relationship problems?  Godawful horrible data sources?  Measures that calc according to different formulas at different levels of the pivot?  Iterating over variables that aren’t even in the view?  PowerPivot has defeated them all.  Well, more accurately, I have defeated them all with PowerPivot.  It’s not like I sit back and watch PowerPivot do its thing.  It is not always easy.  Which brings me to the next point:

Challenging and Rewarding

You know those rare occasions where you suddenly find yourself in the fast lane?  When your brain is forced to expand?  When you are truly challenged, in a good way?

I’m talking about a specific kind of challenge, the good kind.  Not the kind like when you take a new job and are overwhelmed by all the new rules and bureaucracy you have to learn.  And not even the kind when you’re learning most new technologies (HTML and XML come to mind).

The best examples of this “positive challenge” vibe from my life are the ones where I’ve been pushed by a mentor.  A couple of teachers come to mind.  Some specific coworkers as well – Zeke Koch, David Gainer, and Amir Netz most prominently.  Being around those guys everyday basically was a wakeup call – “Rob, you’ve been asleep.  Wake up, it’s time to grow, to be excellent.”

PowerPivot, oddly, has felt like that.  My brain has been expanding again, after a period of stagnation.  More specifically, PowerPivot combined with the problems I’ve been tackling has done this.

And it flows over to other areas too.  Example:  years ago when I needed to estimate the incoming query load for the football stats project, I asked around for advice, got none, and then basically just guessed.  It was a very hollow experience.  Non-excellent.

Then recently, I was presented with essentially the same challenge.  But this time, I didn’t guess, I modeled it:  estimated how many reports (and queries per report) each user would exercise at peak, built a spreadsheet, re-taught myself the Poisson and Binomial functions, and voila – a “users per server” estimate I could believe. 

I’m positive that working with PowerPivot is the difference between the “lazy guess” mentality from before and the “it can be done” mentality today.  I love it :)

Carrot, not stick (but sometimes the carrot is too big for one sitting)

I don’t want you to interpret the above as “PowerPivot is a harsh Pai Mei figure, kicking Rob’s ass day in, day out.”

Nope, I worked with PowerPivot for months, and did things I thought were pretty damn cool, without ever stretching my brain the way I have been lately.  If you’re an Excel pro, you will find PowerPivot to be a very welcoming environment.

Excel veterans:  you will never be forced to do anything uncomfortable with PowerPivot.

In fact you will do amazing things in your first week that will actually deliver greater results than standalone Excel, and it will actually seem easier than normal Excel.  Check out the CALCULATE function and you will see what I mean.

But boy, sooner or later, it will TEMPT you to try something bigger.

You’ll be sitting there one day thinking, “Gee, I sure wish I could build an analysis that showed X.”  And then something hits you – “hey wait, I bet THAT’s what that function I looked at the other day does” or “Didn’t I see a blog post last month about something similar?”

Two hours later, you’re still heads-down over your DAX formulas and relationships, feeling like you’re 30 seconds away from a breakthrough that will change your professional life.

You might not even succeed that first day.  You may have to come back tomorrow with a fresh perspective and a clear head.

And you love it.  Every minute of it.

But that’s when you realize that you have left the reservation.  You are not in Kansas anymore.  Time to take off the training wheels.  Pick your analogy.  Make no mistake – the power of DAX in particular can challenge you immensely.  Jon Udell described the examples I’ve covered here on the blog as “magic,” and he’s right – you won’t pick up everything overnight.  You should be prepared for that.

It’s called learning.  And you’ve almost forgotten what that feels like.  It’s what drew you to Excel in the first place, years ago – that fleeting glimpse of capabilities and the results it could deliver, but also frankly because you were hooked, addicted, to mastering it.

That feeling is back :)


Quick Tip: Don’t Over-Use FILTER()

April 7, 2010

The other day I was working on an updated version of my Team Playcalling report, which, by the way, look FABULOUS with Slicers:

NFL PlayCalling in PowerPivot

Only problem with it was that it was taking 90 seconds to refresh.  Unacceptable.

Howie Dickerman and Marius Dumitru, two of the Superheroes of DAX at Microsoft, pointed out that my measures were using the FILTER function in places that were not necessary.

For instance:

  [New Measure] = [Original Measure] ( 
                      FILTER(DataTable[PlayType]=”Pass”)
                                              )

Can be rewritten without the FILTER function:

  [New Measure] = [Original Measure] ( 
                      DataTable[PlayType]=”Pass”
                                              )

When I switched over to using those expressions directly, rather than the FILTER function, my 90 second refresh time dropped to about 3 seconds.

That’s a pretty significant boost in DAX measure performance from a very simple change.

Why so much faster?

The explanation from Howie made a lot of sense.  When the PowerPivot engine is evaluating a measure, it already has to take filter context from the pivot itself – row fields, slicer selections, etc.

And since that’s pretty much the #1 use case for the engine, well, applying filter context to a measure is highly optimized and fast.

So it’s not a big deal for the engine to inject another filter like I have specified in the rewritten example – it gets treated much the same as if that filter came from the pivot, as if a slicer had been set to PlayType=”Pass”

But the FILTER function, on the other hand…  well, it creates a brand-new table in memory.  I know my example just filters by one column, but the FILTER function can do some pretty amazing things, dynamically responding to current context.

Because of that power, the FILTER function must either create or update that dynamic table for every cell in the pivot where FILTER is used.  I have 256 cells in this particular pivot (32 teams times 8 measures), and originally all of them used FILTER.  But given that some of my measures are based on other measures that aren’t displayed in the pivot, my actual cell count was even higher.

Now imagine what would happen in a pivot with 5,000 rows :)

Next Football Post:  Grading the Football Project  >>


RANK measure finished

March 21, 2010

 
Angel Eyes Always Sees the Job Through

 

“…I always see the job through.”

-Angel Eyes

Yep, I said I would return and finish off the Rank measure.  Time to deliver.

 
(And this makes three straight Spaghetti Western movie quotes!)

Where we left off – Unknown ranks #1

Last time, I’d written a mostly-working version of the measure that ranks a player by his Rushing Yards value, and that doesn’t return an error when I leave the Unknown player in the pivot:

DAX Rank Measure that Does Not Error 
But the Unknown/Blank player is still messing up my rank because of all the small-time players it lumps together into a huge number of rushing yards – I want Edgerrin James to be first.

Must remove the Unknown player from rank consideration

This turns out to be pretty tricky.  Here’s the formula again:

   COUNTROWS(
     FILTER(
       ALL(Players[FullName]),
       [RushYards](Values(Players[FullName])) < [RushYards]
     )
   ) + 1

As explained last time, that’s counting the number of players that have more rush yards than the current player in the pivot, and then adding one.

The heart of that formula is this comparison test – players only get counted if this evals to true:

   [RushYards](Values(Players[FullName])) < [RushYards]

I want to add a condition to that.  It’s not enough for a player to have more rush yards.  They also must NOT be Unknown/Blank.  So I want it to be something like:

   [RushYards](Values(Players[FullName])) < [RushYards]
      && PlayerIsNonBlank

Filling in that green part with a real formula is the trick.

Why not just remove the + 1 and be done with it?

Indeed, I could remove the “+ 1” at the end of the formula and voila! – all is right with the rankings.

But that’s only true right now, without the pivot filtered in any way.  There will definitely be cases where, as I slice the pivot down to narrow cases, no Unknown player recorded any Rush Yards.  And suddenly, the top real player would be ranked as –1.

“PlayerIsNonBlank” – the DAX solution

First I had to define a new measure, [NonBlank Player Count].  It is defined as follows:

   COUNTROWS(DISTINCT(CleanPlayers[FullName]))

It turns out that DISTINCT does not return Unknown members.  DISTINCT and VALUES are exactly the same function, except that VALUES includes Unknown and DISTINCT does not.  (Colin, sorry I never replied to your comment awhile back, but yes, that is the difference).

So that’s pretty handy.  When I am in the context of a real player, this measure returns 1.  And so I can use that measure inside my Rank measure, to weed out the Unknown player from rank consideration.  Replacing the green pseudoformula from above my condition is now:

   [RushYards](Values(Players[FullName])) < [RushYards]
      && [NonBlank Player Count] = 1

Which makes my overall measure:

   COUNTROWS(
     FILTER(
       ALL(Players[FullName]),
       [RushYards](Values(Players[FullName])) < [RushYards]
         && [NonBlank Player Count] = 1
     )
   ) + 1

Does it work?

Sort of.  Here ya go:

DAX Rank Measure - Fixed Numbers But Unknown Still Shows Up

So… the numbers are right!  Edgerrin James is now #1 as he should be, and everyone follows after.

But that pesky Unknown player.  It won’t go away…  it’s like Rasputin or something.

It’s ugly, and I want my reports to be pretty.

Furthermore, look at the Grand Total:

DAX Rank Measure - Grand Total Shows 1

Again, ugly.  The grand total cell should show nothing.

The fix:  making the measure return BLANK()

To solve this, I took the entire measure above and wrapped it in two nested IF’s:

=IF(COUNTROWS(VALUES(CleanPlayers[FullName]))=1,
   IF(NOT(ISBLANK(Values(CleanPlayers[FullName]))),
      <The entire measure formula from above>,
      BLANK()
   ),
   BLANK()
)

What that means is this:

  1. If the current pivot context corresponds to exactly one player, AND that player is NOT a Blank player, then go ahead and evaluate the measure expression. 
  2. If the current pivot context corresponds to multiple players (as what happens in a Grand Total, or a Subtotal), return BLANK()
  3. If the current pivot context corresponds to a Blank player, again return BLANK()

This works great!  Check out the results at the top and bottom of the pivot:

DAX Rank Measure - BLANK for Unknown Player Means It No Longer Shows Up DAX Rank Measure - BLANK for Totals Too

When a measure returns BLANK(), that row disappears from the pivot, even the Grand Total.  Yay!

And if I add another measure that is non-blank for those rows, the Blank player comes back, but my Rank measure will display a blank cell rather than a number.

Are we done?  Depends on the intent

I said this was tricky remember?  Look what happens when I add another field to rows, like College Attended:

DAX Rank Measure - Rank Evaluates By Parent Not Overall

Neat!  Now my Rank measure treats each parent value (each College in this case) as a separate world in which to rank players.

Shaun Alexander is indeed the #1 NFL rusher from Alabama in my database.  Perhaps that’s exactly the behavior I want, and in some cases it will be.

At some point I will explore how to NOT do this – how to still display Shaun’s overall rank while nesting him under his college name.  But for now, this is enough to digest I think.

Next Football Post:  Do Not Over-Use FILTER()  >>


The Great Broken Links

March 15, 2010

Just discovered this morning that all of my links to the Great Football Project were broken.  Ugh, that’s the best intro to PowerPivot since it’s where I started blogging back in the Fall.

Links fixed now.


Rank continued – dealing with unknown members

March 11, 2010

 
Unknown Player Right Next to Edgerrin James

 

TUCO:  “There’s no name on it!”

BLONDIE:  “Well, there’s no name here, either.”

 

 

Ode to YouTube HD

When I used to work on Bing and would look at search logs, I was always puzzled by some of the search strings that went by.  Well today I gave them one of my own –“Clint Eastwood holding rock,” among others.  I hope that confuses someone.

YouTube in HD, though – you are my new best friend for finding obscure screenshots from movies.

The problem from last time

Remember my Rank report that started with #2?

Number Two Tops the List

Filtering out Unknown PlayersWell, the reason for that is that I have some unknown players.  I’ve been filtering them out using the Row filter feature, shown here at right.  Note the last checkbox in the list, with the blank name unchecked.

(Also note the appearance of Zeron Flemister, two checkboxes from the bottom.  My friends and I had a tradition of always picking him in the last round of fantasy drafts just because we thought his name was so funny.)

 

 

 

 

 

Anyway, if I clear that filter and try to include that blank player in the pivot, I get the lovely error below.

image

Actually, that’s a pretty good error string, since it tells me pretty much exactly why the measure failed to evaluate.

Question 1: Where do the blank players come from?

My first hunch here was that my CleanPlayers[Full Name] column contained some blank values.

But then I remembered that the whole point of the CleanPlayers table was to make sure all unnamed players were removed.

So if that column does not contain any blank player names, why on earth do I get a blank player name in the list when I put that column in my pivot?

Answer: The Plays table sneaks them back in

alien-ceiling-large

 

“They must have found another way in, something we missed!”

-Ripley

(Best picture I could find.  Seriously.)

Here’s the thing:  when I removed all of the “blank name” players from my Clean Players table, I didn’t go back and remove all rows from the Plays table that involved those players.

So those rows from the Plays table now point to *nothing* in the Clean Players table.

To demonstrate, I sorted both the Plays and CleanPlayers table by [PlayerID]:

Plays Table First Player ID is 4 CleanPlayers Table First Player ID is 70

The first player in the Plays table is ID=4.  The first player in CleanPlayers is ID=70.

So when I add a measure from the Plays table (like Rushing Yards, or even my Rank measure), that forces a blank/unknown item to appear in my Clean Players list, even though there aren’t any such blank rows in the CleanPlayers table.

Step one of the fix:  find the error

OK, I spent a LONG time trying to figure out WHERE the error occurred in my measure.  Along the way I sent mail to my buddies in Redmond and begged them for a “step into” debugger feature.  But I know that it’s an incredibly difficult feature to build.

Let’s make a long story short.  Remember that second filter clause from the last post that I wasn’t sure was needed?  The one that made sure we weren’t comparing a player to himself?  Well, it’s actually the place where the error occurs, too:

   Players[FullName]<> Values(Players[FullName])

I remove that from the formula and now just have this for my rank measure:

   COUNTROWS(
     FILTER(
       ALL(Players[FullName]),
       [RushYards](Values(Players[FullName])) < [RushYards]
     )
   ) + 1

And that no longer errors with the unknown player in the pivot:

DAX Rank Measure that Does Not Error

Good news: not only does the measure no longer error, I verified that it returns the exact same results for all 4,000 players as the measure did before I removed the second clause.  It was like my measure had appendicitis – a critical problem in a needless organ :)

Bad news:  That unknown player is still grabbing the top rank, demoting Edge to 2.

Good news:  I figured out how to fix that.

Bad news:  Man is the fix ugly.  I’m going to see if there’s a better way before I blog about my Rube Goldberg solution.

I really did intend to finish this off for you guys today but don’t want to spread bad methodology.  Stay tuned.


Writing a RANK measure (and living to tell the tale)

March 8, 2010

 
Beavis_and_Butthead

“Number One, I order you to take…  a number two.”

-Captain Jean-Luc Butthead

 

 

Before I get started, do you have any idea how hard it is to find cartoon images from the pre-Internet era?  I just lost ninety minutes, first looking for the Far Side cartoon of “Custer’s Last Group Photo” (where everyone is holding up the “we’re number 1!” finger – get it?) and then struggling to find a version of the Beavis and Butthead image above at a better size.  No dice.  OK, enough of that…

DAX v1 has no RANK() function

That’s right.  If you want to write a pivot measure that tells you where a particular store, or product, or customer ranks according to another measure, and according to current filter context, well, a RANK() function would be nice.  But there wasn’t time to implement it in PowerPivot v1.

Never fear, there’s a workaround…

OK, yes, there’s a workaround – the FAQ says so right here.

But “never fear” might be an overstatement.  In fact, I should pause right here and say…

PowerPivot is simple, this technique is not

I’ve been doing this blog since October.  I’ve covered a lot of ground.  If you read this post and fear that you have to understand it in order to get great results with PowerPivot, well, then I’ve done you a great disservice.

Because in many ways, PowerPivot is actually easier than Excel itself.  If you are new here, I strongly recommend going back to the Great Football Project and starting from the beginning.

OK, back to the power technique

I don’t know about you, but personally, my first impression of the formula referenced in the FAQ sounded a lot like “double yoo tee eff.”  Check it out:

  COUNTROWS(
      FILTER( 
          ALL(DimCustomer[CustomerName]), 
          DimCustomer[CustomerSales]
              (Values(DimCustomer[CustomerName]))
                  
< DimCustomer[CustomerSales]
          && DimCustomer[CustomerName] <>  
                   Values(DimCustomer[CustomerName]) 
      )
  )
+1

Explaining how it works

That formula has been bothering me for a long time.  I knew it wasn’t going to be easy to break it down, but on our little DAX adventure together, we must leave no stone unturned, even the really heavy ones.

So today, I dove in.  And whenever I get uncomfortable, I run to familiar ground.  Which means, of course, oh yes…

The Great Football Project Returns!

Quick!  Someone get me some early-2000’s running backs!

  94801734-marshall-faulk   Edgerrin-James_1 Shaun_Alexander

Phew, I feel better already.  OK, let’s dig in.  A long time ago I defined a [RushingYards] measure using the CALCULATE function.  Let’s park that in a pivot:

PowerPivot Rushing Yards 
I want to write a measure that gives me the rank of each player, by Rushing Yards, no matter how I have sliced or filtered my pivot.

First step:  convert the sample formula over to my table/column names

When I substitute my football workbook’s names into that formula, I get:

   COUNTROWS(FILTER(ALL(Players[FullName]),
   [Rushing Yards](Values(Players[FullName]))
   <[Rushing Yards] && Players[FullName]<>
   Values(Players[FullName])))+1

I won’t bother pretty-printing that monster just yet, because I’m going to dissect it step by step.

Outside functions:  COUNTROWS + 1

The formula is basically:

   COUNTROWS(all kinds of crazy stuff) + 1

Which starts to make a little bit of sense, if you’re feeling sleuthy.  Really, what this is doing is:

   COUNTROWS(# of Player Rows where Player has more yards) + 1

Make sense?  Find how many players outrank the current player, add one, and that’s your rank.  OK, so how does the middle stuff determine the number of players that outrank the current player?

Inside functions – FILTER and ALL

   FILTER(ALL(Players[FullName]),
   [Rushing Yards](Values(Players[FullName]))<[Rushing Yards]
    && Players[FullName]<>Values(Players[FullName])))

Recall that the syntax for FILTER is:

   FILTER(Table Expression, Filter Clause)

In this case, our table expression is:

   ALL(Players[FullName])

Which makes sense.  Remember, on a particular row of our pivot above, the Players table is going to only contain the row from that player, for purposes of measure calculation.

Ranking Edgerrin James against himself, for instance, isn’t very interesting, unless you want to compare the Colts version to the Cards version, or heaven forbid the Seahawks version.  (The wear and tear on RB’s really is astounding).

So in order to have any players to rank the current player against, we need a table where the filter has been cleared.  That’s why we use ALL.

That leaves us with the Filter Clause, and it deserves its own section.

Inside Functions – the “Filter Clause” parameter to FILTER

OK, so we have a table that consists of all player names.  And we want to filter that down to just those players that have more Rushing Yards than the current player.

The FILTER function goes through that table of player names, one by one, and for each row, evaluates the filter clause to determine whether that row is “in” or “out.”

Our filter clause is really two clauses connected by the logical “and” operator – &&.  So a row must meet BOTH criteria to be included.

First clause:

   [Rushing Yards](Values(Players[FullName]))<[Rushing Yards]

First Row of PivotThis gets tricky, so let’s anchor it in an example.  Let’s say we are evaluating this measure for the very first row of our pivot, Edgerrin James:

Player Table

And the FILTER function is going to step through  our table of all player names, at right.  (There are about 4,000 players in my players table).  So, FILTER starts with Rabih Abdullah.

The right side of the clause above, [Rushing Yards], is the [Rushing Yards] measure, evaluated in the context of this source row, Rabih Abdullah.  Well, he totaled 172 yards in his rushing career.

But what about the left side of the clause:

  [Rushing Yards](Values(Players[FullName]))

That uses the context from the *pivot*, not the source table.  the VALUES function returns the list of valid values of the specified column, according to current pivot context.  Since the specified column is on the rows axis of our pivot, it only returns one value in this case, which is Edgerrin James.

Edge has 9,842 rushing yards.  So the clause becomes:

   9842 < 172

Which is of course FALSE.  Rabih’s row gets excluded by the FILTER function.

Wash rinse, repeat.  And then repeat.

FILTER then moves on to the next source row:  Khalid Abdullah.  The pivot context remains the same (Edge James), so the left side of the comparison is still 9842.  Khalid has less than that, so again, excluded.

Well, in my data set, no one has more yards than Edge, he’s the top dog in my data set.  So when FILTER finishes its pass, there are no rows left included.  COUNTROWS then returns 0, the + 1 from the end of the formula gets added, and Edge gets his proper rank of 1.

Second Row of PivotThe measure then moves on to the next pivot context, the second row, which is Curtis Martin:

And then FILTER starts over again, running through the entire Players table again, row by row, starting with our friend Rabih.

Two loops tucked inside each other.  Wow.

The other half of the filter clause

Remember this?

    && Players[FullName]<>Values(Players[FullName])))

That just makes sure we don’t count the player against himself for purposes of rank.  Off the top of my head, I’m not sure that’s 100% necessary.  I’ll have to noodle on that a bit.

OK, let’s see it in action

I add my new measure to the pivot, and I get:

Number Two Tops the List

What the heck?  Is my measure broken?

No, it’s not.  I’ll explain what’s going on in my next post, since this one has run so long already.

Next Football Post:  RANK Measure Continued >>


Introducing CUBESET() and CUBERANKEDMEMBER()

February 2, 2010

batman04xf9

“Where does he get such WONDERFUL toys???”

-Kasper de Jonge

(To be honest, Kasper probably looks nothing like this.  I’ve never been to the Netherlands, but based on the fashions in Sweden, I am fairly certain that the Dutch don’t dress like that, at least not when they visit Sweden).

 

Poor Kasper, though.  The mystery of how the parameterized, sliced, auto-sorting report was built has simply been too much for him to bear, so the other day I sent him an advanced preview of the workbook.  I’ll share the same workbook (without the underlying data, which is licensed) with all of you, but first, some explanation.

Two posts back, I showed you the functions CUBEMEMBER and CUBEVALUE.  Think of CUBEMEMBER cells as composing the axes of a report, and CUBEVALUE cells as composing the numerical data in the report.

But CUBEMEMBER does not easily lend itself to dynamically-sorted reports.  Each CUBEMEMBER function fetches a fixed value (an NFL player, in my case) from the database.  To get a nifty auto-sorting report, we need to introduce…

CUBESET()

As far as Excel functions go, CUBESET is a strange one.  It’s 100% useless by itself, but forms the foundation for some very interesting things.  Let’s look at its syntax:

CUBESET(connection, set_expression, , [sort_order],
       [sort_by])

  1. Connection – the name of the connection, from Excel’s perspective.  In CTP3 builds, this is always “Sandbox” for PowerPivot connections.  This, BTW, is the first argument to every cube function.
  2. Set_expression – rather than return a single “member” like CUBEMEMBER, CUBESET returns a whole “Set” of values.  There are many different syntaxes for representing a set, but the one you will use most often is <ColumnName>.children, which returns all unique values of that column.
  3. Caption (optional) – there is no way for Excel to display all values of a set in a single cell, so you generally make up your own caption.  “” is a common caption, since it doesn’t clutter your report, but you can return any string you want, like “All NFL Players”.
  4. Sort_order (optional) – an integer from 0 to 7, representing “No Sorting,” “Sort Ascending,” “Sort Descending,” “Alpha Ascending,” “Alpha Descending,” “DB order ascending,” “DB order descending.”
  5. Sort_by (optional) – the MEASURE you want to sort by.  (Must be a measure, not a column!)  This argument is ignored unless you specify 1 or 2 for sort_order (ascending or descending).

Using CUBESET() in the football report

In my football report, I use CUBESET behind the scenes, with the following arguments:

=CUBESET("Sandbox",
         "[CleanPlayers].[FullName].children",
         "All Players”,
         2,
         "[Measures].[Rushing Yards]")

Ignoring the connection and caption arguments, what this formula means is:  “Return all unique values from the [FullName] column in the [CleanPlayers] table, sorted descending by the Rushing Yards measure.”

(Note that in Cube formulas, table names are wrapped in square brackets, whereas in DAX they are wrapped in single quotes.  It would be nice if cube functions could use single quotes for tables to distinguish them from columns as DAX does, but there’s an excellent historical reason for that.)

Fetching values from that set using CUBERANKEDMEMBER()

This is where it gets fun :)

The cell that contains the CUBESET formula only displays a caption, but behind that cell, the entire set is stored, invisibly.  CUBERANKEDMEMBER is how you fetch values from that set.

(Whether you chose to specify a sort order or not, that set IS ordered, even if the database decided to return the values in somewhat random order.  Hence the “RANKED” in its name.)

The syntax for CUBERANKEDMEMBER is quite simple:

=CUBERANKEDMEMBER(
         Connection,
         Set_Expression,
         Rank,
         [Caption]
)

For set expression, just give it the cell address of the CUBESET formula.  For rank, give it any integer: 1 to return the first member of the set, 2 for the second, etc.

In my football report, this is:

The syntax for CUBERANKEDMEMBER is quite simple:

=CUBERANKEDMEMBER(
         “Sandbox”,
         $B$3,
         ROW(A1)
)

B3 is the address of my CUBESET formula, but I used the absolute reference syntax $B$3 because I plan to copy this CUBERANKEDMEMBER formula down a column, and I don’t want Excel adjusting B3 to B4, B5, etc., since that won’t reference the right cell anymore.

OK, what the heck is ROW(A1)???

It’s an Excel trick.  If I just enter 1 for this argument, that will NOT increment to 2, 3, etc. when I copy this formula down a column.

And the whole reason to copy down a column is so that I get the 2nd, 3rd, 4th, etc. members of the set as I go down the column.

If I use ROW(A1), that will return 1 for the first instance of the formula.  And when I copy down the column, Excel increments that to ROW(A2), ROW(A3), etc., which return 2, 3, etc. – just what I want!

Comparison to the Named Set Method

Denny Lee’s series of posts on Top N reports goes a different direction than what I have here.  He defines a Named Set in the Excel UI (using the MDX language), then uses a PivotTable to return that top N set.

In comments on the first post, Colin Banfield asked how to order that set.  Neither Denny nor Colin have an answer at the time of this writing.  And if Denny and Colin are stumped on MDX, well, there’s no WAY that I am going to figure it out :)

But with CUBESET, you can see that the Sort_order and Sort_by params give me a simple method for ordering the set by a variety of criteria.  Then CUBERANKEDMEMBER lets me fetch any number of those sorted items, in whatever precise order I’d like.

So… no MDX required, (except for the .children trick I used in CUBESET, which technically is MDX), AND a simple solution to the ordering problem.

The Named Set feature is a good one, and it definitely has its uses, but for custom Top N reports, I’ll take CUBESET and stay away from that MDX madness.

And what about Top N filters in pivots?

Actually, if Top N is all you want, a Pivot is gonna be your best bet at least 90% of the time.  Remember, the only reason I went down the Cube Formulas path is that I wanted custom formatting in my report, and parameterization.

I’ll post again shortly about how pivots might even be better than you expect at Top N reports.

Don’t worry if I am losing you, this is not required…  nor easy

This whole topic of cube formulas is pure gravy, take-it-or-leave-it kind of stuff.  It took me several attempts to get comfy with it myself, and I was part of the team building the feature!

Think of this as a 200-level topic.  I’m including it for the handful of you who consider yourselves hardcore, to test your mettle and stretch your Excel skills :)  Everyone else, just file this away for later and come back some day when you find yourself needing precisely these tricks.

When it comes to building PowerPivot reports, I’d break things out like this:

100 Level Topics

  1. PivotTables, PivotCharts, and Slicers
  2. Converting a PivotTable to formulas and then custom arranging/formatting the report

200 Level

  1. Customizing CUBEVALUE cells to do custom arithmetic and parameterization
  2. Writing CUBESET and CUBERANKEDMEMBER formulas

300 Level

  1. Dynamic sorting using CUBESET and CUBERANKEDMEMBER
  2. MDX-Driven Named Sets, as Denny demonstrated

…which means that my next post on cube formulas is even more optional/ignorable than this one :)

Wake up Denny!

Denny, of course, well…  Denny HAS to pay attention, because I challenged his geek-hood in my last post.

Some of you might say that wasn’t fair, since I have more Excel background than Denny and he has more SQL background than I.  Sun Tzu once suggested that choosing the field and terms of battle was more important than the fighting itself, and I do love me some Sun Tzu :)

Are you like Kasper?  Want to skip ahead?  Download the workbook!

I have not covered every technique yet, but if you are getting anxious like Kasper and can’t wait any longer, you can download the workbook here and inspect every last formula.

Remember, the source data itself is licensed, so I’ve removed that from the workbook.  You won’t be able to fetch fresh values from the PowerPivot db if you try to refresh, or modify formulas.  I’m sharing this so that you can see all the formulas and understand how the report is built.

And no, Kasper didn’t get the source data, either :)

Next Football Post:  Writing a “Rank” Measure >>


Parameterizing the report

February 1, 2010

In the Parameterized Report, how do I get the “Fan Pts” column on the far right of the cube formula report to respond to those parameter cells in the top left?

PowerPivot Report Params

Easy.  The report is just formulas, so I can combine them with the parameter cells using Excel arithmetic.

Naming the Parameter Cells:  a Readability Convenience

The parameters themselves live in cells C3, C4, and C5.  I can reference them that way, but to make things easier to read, I’m going to give them names. 

C3To give a name to cell C3, I select it.  Note the control at the upper left corner of the grid that displays the address C3:

I just click inside that control and start typing to give C3 another name.  In this case, I name it YardsPerRushPt, below:

YardsPerRushPt

That will make my formulas easier to read, and also provides an added benefit later then I publish to SharePoint.

I then repeat this process for the other two parameter cells, C4 and C5, which I name YardsPerRecPt and PtsPerTD, respectively:

 

                  YardsPerRecPtAnd PtsPerTD

The Formula

The formula was always going to be easy to write, but now it will also be easier for us to read and understand.  Here is the formula in the Fan Pts column:

=CUBEVALUE("Sandbox",F3,Slicer_DayNightName,Slicer_WeekNbr,
           "[Measures].[Rush Yards]") /
RushYardsPerPt

+CUBEVALUE("Sandbox",F3,Slicer_DayNightName,Slicer_WeekNbr,
           "[Measures].[Rec Yards]") / RecYdsPerPt

+CUBEVALUE("Sandbox",F3,Slicer_DayNightName,Slicer_WeekNbr,
           "[Measures].[Rush TD]") * PtsPerTD

All three CUBEVALUE terms are identical except for the measure they reference, and the parameter they either divide or multiply by – I have highlighted the differences in blue.

Note how I have used the cell names I defined above, rather than cell references.  This is an under-used and beneficial feature in Excel, and not at all specific to PowerPivot.

Dealing with #N/A – The Excel IFERROR() Function

Sometimes, a player will not have recorded any of a particular statistic.  This is especially true when you start slicing the report and the circumstances get narrower.

When that happens, Excel will often display #N/A in the cell instead of 0.  (In a future post I will delve into what determines 0 vs. #N/A).

Sometimes #N/A is ok to display.  But in the Fan Pts column, it’s a problem.  A player will often record yards but no TD’s, for instance.  But once a single term in the formula above goes to #N/A, the entire formula goes to #N/A, which is incorrect since that player DID have other statistics. 

We just want #N/A terms to be treated as 0 in cases like this.  And the IFERROR() function in Excel is precisely designed for this.  If we wrap each term in IFERROR, we can specify an alternate value to be returned when an error like #N/A is encountered.

For example:

=IFERRROR(
         
CUBEVALUE"Sandbox",F3,Slicer_DayNightName,
                  
Slicer_WeekNbr,"[Measures].[Rush Yards]")
                    /
RushYardsPerPt
,0
)

Get to know IFERRORThe IFERROR wrapper is highlighted in blue.  If no error is encountered, then the “meat” inside the IFERROR function (the CUBEVALUE divided by RushYardsPerPt) is returned as usual.

IFERROR is a very useful function overall, and all PowerPivot Pros should get to know it (a phrase that conjures memories of Jon Lovitz on SNL).  It can also be used to return custom strings in place of errors like “No data recorded.”

                                                                                                                               GET to know me!

Parameters Exposed in SharePoint-Published Reports!

Time for that “extra” benefit of named cells.  In my last video, remember how there was that expandable task pane rendered in Excel Services, that allowed me to change the parameter values interactively?

PowerPivot Web Params

I said that required no coding, and that’s the truth :)  To activate that feature, you have to tunnel down a bit in the Excel UI, because hey, only hardcore folks like readers of this site would ever need such a thing, right? 

First you go to the File tab in Excel and choose the “Share” option:

Share Option 

Then pick the “Publish to Excel Services” suboption, and click the button of the same name that I have highlighted:

Excel Services

The dialog that comes up looks exactly like the Save As dialog.  And that’s what it is.  But there’s a new button there, highlighted below.  Click it!

Save As Dialog

Almost there, don’t give up!  There are two tabs on the resulting dialog.  The default tab, the Show tab, is very useful, but let’s skip that for now and switch to the Parameters tab:

Parameters Tab

Clicking the highlighted “Add” button yields the following, final dialog in the chain:

Add Params

Every named range in the workbook appears here as a checkbox.  (Slicers are treated as named ranges, too, so that they can be referenced in formulas as well).  I select all three of the names representing my parameters, and then tunnel back out, clicking OK as I go.

(I can then save the file locally or save it to SharePoint, or cancel the Save As dialog and do a normal save later – in each case, the parameter settings will now be saved).

When I view this report on SharePoint, using the browser, the parameters pane will now appear automatically.

Parameters as Web Part Interface

The parameters pane is the simplest way to leverage these exposed parameters.  But those three cells are now also exposed as part of the web part framework, enabling other controls in SharePoint to send input values to the report.

That’s an advanced topic and only mentioned here in case you happen to be a SharePoint programmer as well :)

What Next?  Aren’t We Done?

Nope, we are not done.  We haven’t yet covered how to support dynamic sorting!  So that’s coming up.

And keep in mind that this whole foray into cube formulas is a bit of an advanced topic, and it gets trickier as you add more features into the report.  None of this is necessary for building PowerPivot reports – pivots are going to be your answer most of the time anyway.

Denny Lee, Your Time Has Come!

My esteemed colleague, Denny Lee, has found his way into the deep end of the pool lately with a series of posts about Top N PowerPivot reports.

I have a lot of respect for Denny.  After all, the man can write MDX, and the existence of MDX is in some ways the reason why we had to create PowerPivot (…to replace MDX with something less intimidating).  And no one ends a sentence with “eh!” like Denny :)

But apparently Denny did not get the internal executive memo:  high-end Excel techniques are the domain of one Rob Collie, the PowerPivotPro!

Denny, you see, is now on my turf.  As they say in Excel power users circles, it’s, ahem, “on” now.

So my next few posts now have two purposes:  to explain how I did the dynamic sorting, and to give you some (cough cough, superior) alternatives to those MDX-backed techniques outlined by Denny.


PowerPivot for SharePoint Video

January 28, 2010

Royale with Cheese

“You know the funny thing about Europe?  It’s the little differences.  I mean, they got the same shit we got here…  just a LITTLE different.”

-Vincent Vega

Greetings from Stockholm!  This place…  I never want to leave.

Many firsts have been recorded here.  My first-ever presentation to a large audience outside the states.  First time having dinner in a restaurant straight out of a 1930’s Hollywood movie.  First time seeing an American guy, a British guy, a Japanese guy, and a Norwegian guy all get tossed out of a hotel bar for being drunk.  First time getting half-trapped by a blizzard in a foreign country.

…and oh yeah, my first time having a mandatory, cannot-postpone-it Windows Update pop up near the end of a presentation and occupy my demo laptop for a full hour.

You can’t make this stuff up.

I have to hand it to Marcus Gullberg and company from MS Sweden though, as well as Johan Åhlén and the entire Swedish SQL Server Users group.  The former for their quick improvisation – finding an Amir Netz video of the SharePoint stuff I was unable to show, and the latter for their patience and excellence questions while we figured it out.  It ended up being pretty funny, actually.

Their loss yesterday is now your gain – as a make-up, I recorded a video of the stuff I couldn’t show yesterday, and posted it to YouTube.

(This also marks my first-ever recording in another country).

Oh, and you know what they call a Quarter Pounder, in Sweden?

A quarter pounder.


Pulling back the curtain: Intro to Cube Formulas

January 18, 2010

OK, last week I showed you a parameterized PowerPivot report that contained no pivots anywhere.  How did I build it?

Best Tutorial:  Start with a Pivot

The easiest way to show you is to start with a PivotTable, even though once you’ve mastered this technique, you won’t always want to start this way.

Here I’ve got 4 measures from the Plays table broken out by ‘Clean Players’.[FullName], and the report is sorted by the [Rushing Yards] measure:

basic pivot

First, I want to get rid of that blank value at the top – over the years, an awful lot of “nobodies” amassed a lot of stats, and apparently STATS did not care enough to capture their names.  Accordingly, I don’t care enough to want them in my report.

Filter Out Blanks

So, I’ll just filter them out.  Easy enough, as shown here at right.

But I also want to limit my report to the top 20 players.  How do I do that, now that I have already applied a filter to this field?

When I try to add a Top 10 filter to this field, it clears my filter that removed the blank, and I get that blank, unknown player back in the report.

So how do I do this?  Well, I use a feature of PivotTables that Allan Folting argued for, and I argued against, back in 2006.  He was right, I was wrong, and the feature has helped me numerous times since :)

That feature is hidden under the Options button on the PivotTable Options ribbon tab.  It defaults to off.  Once enabled, I can go back to the Pivot and add a Top 10 filter, set to show the top 20 players as sorted by Rushing Yards:

Multiple Filters Per Field

Checking that checkbox allows me to then add my Top 20 filter on top of the original:

Top 20

Then I add some slicers, which yields the basic starting point that I want:

top 20 results

Enough with the pivot, Rob! We want the pivot-LESS report!

Indeed.  Time to blow this pivot into a hundred pieces.  Literally.

On the Pivot Options tab of the ribbon, under a dropdown called OLAP Tools, there is a button named “Convert to Formulas.”  On normal pivots, this whole dropdown is disabled…  because normal pivots are not OLAP pivots.

But PowerPivot-backed pivots are OLAP, because under the hood, the PowerPivot db engine is an OLAP engine.  So this button is enabled:

OLAP Tools Convert to Formulas

Watch what happens when you click that button:

Recap

Briefly, let’s recap what we see in the video:

  1. Every cell in the pivot has been converted to a formula
  2. Each formula returns the same result that the cell previously contained (when it was part of the pivot)
  3. Slicers still work – the formulas return different numbers reflecting slicer selections
  4. The formulas can be treated just like regular spreadsheet cells – moved, rearranged, deleted, etc.
  5. There are two functions involved – CUBEMEMBER() and CUBEVALUE()
  6. Generically, we call this feature “Cube Formulas”

Cube Formulas are NOT DAX!

I know that cube formulas are new to most of you, and seeing how I am introducing them in the context of PowerPivot, it’s natural to suspect that these formulas are part of DAX.

But they are not.  They are built-in to regular Excel and are there before you install PowerPivot.  Remember, DAX can only be used on tables in the PowerPivot window, and in the New Measure dialog.  DAX cannot be used in normal Excel spreadsheet cells.

Cube formulas are just another way for Excel to interact with certain kinds of data sources.  And PowerPivot is one data source that qualifies.  That’s good news for Excel pros – a brand-new way to build reports.  It’s a free benefit of PowerPivot :)

CUBEMEMBER() and CUBEVALUE()

As shown in the video, let’s inspect the formulas in three specific cells to show you what’s going on here.

First, the Edgerrin James label cell:

Edgerrin James CUBEMEMBER

Brushing over syntax for a moment, that formula says “this cell now represents Edgerrin James” – the string “Edgerrin James” was returned from the PowerPivot database in fact.

Then the “Rushing Yards” header cell:

CUBEMEMBER Rushing Yards Measure

Again brushing over syntax, that formula means “this cell now represents the Rushing Yards measure.”

And lastly, the 9842 value, which is Edgerrin James’ total Rushing Yards:

Rushing Yards CUBEMEMBER

I put this one in edit mode to show you that it references the Edgerrin James and Rushing Yards cells.  So this CUBEVALUE formula shows the intersection of Edgerrin James, and Rushing Yards, just like it did in the PivotTable.

Notice how it also references the two Slicers.  That’s how the formula returns different values as the slicers are manipulated.  Again, just like it did in the PivotTable.

More to come

Cube formulas are indeed the core trick behind my “pivot-less” report.  But there are several elements yet to be explained, like the parameterization, and even more importantly, how the sort order of the report can change in response to parameters and slicers.

I also want to show you what happens with this report when published to SharePoint.

I think this is enough for one post though :)

Next Football Post:  CUBESET() and CUBERANKEDMEMBER() >>