Pivots, I have not forgotten ye!

February 4, 2010

vizzini A Pivot can do THAT???  In-con-CEIV-able!

-Vizzini

OK, I’ve spent a lot of digital ink lately on the wonders of cube formulas, an alternative to pivots.  And Denny’s recent series of posts may suggest to you that you can’t really use Pivots effectively in Top N scenarios without employing MDX and named sets.

So are pivots a weak technology?

No, they most certainly are not, and can handle > 90% of the scenarios you throw at them (maybe more as you discover more of their feature set).  So I thought I’d take a quick break from cube formulas to point out a few strengths.

Starting Point

OK, here I’ve got a PowerPivot workbook with some tables loaded from AdventureWorks.  I’ve got a simple PivotTable defined, where unique customer email addresses (all 18,000+ of them) appear on rows, and Sum of Sales Amount on values.  The pivot is sorted descending by sales.

PowerPivot AdventureWorks Start

All right, I want to see the top 10.  Easy, if you use the Top 10 value filter feature (activate this popup UI using the highlighted dropdown on Row Labels):

PowerPivot Top 10 entry point

PowerPivot Top 10 Dlg

So far, I’m doing exactly what Denny did in his first post.  This yields:

PowerPivot Top 10 Basic Pivot

OK, no problems so far.  Time to clear up a potential misconception, though.

Misconception #1:  PowerPivot-backed Pivots Do NOT Filter Locally

When you have a pivot based on PowerPivot data, when you select Top N (or any other label or value filter, or even just the checkboxes), Excel does NOT fetch all of the values and then perform the filtering itself.

Instead, it constructs a query (it’s an MDX query, but you never have to see it), and sends that query over to the PowerPivot engine.  The PowerPivot engine performs the filtering and then only returns the matching values.

So no matter how many customers you have in the db, Excel only gets 10 back from the PowerPivot engine.  Performance is therefore still good, and you don’t run into Excel’s 1 million row limit.

…but when you first add the field to the pivot, before you’ve had a chance to define the top N filter…  ok, there’s a place where it’s gonna be slow (and potentially truncated).

A Misconception About the Misconception :) – Sorting IS Local

Clearly, if Excel lets the PowerPivot engine do the filtering, it must also let PowerPivot do the sorting, right?

Actually, no.  Excel DOES do the sorting itself.  Go figure.  (It makes sense, but it’s a long story…  and I happen to have forgotten the answer.  But it made sense to me when I remembered.)

Problem:  Top 10 Filtering By a Different Column

In Denny’s example, though, he didn’t just need to show Sales, and the top 10 customers by Sales.  He actually wanted to base his top 10 set of customers off of one column, and then for those customers, see what the values were like in a second column.

Misconception #2:  When I Want That, I’m Screwed

OK, let’s say that, for the top 10 customers in overall sales, I want to see the Quantities those customers ordered in each Product Category.  Add a couple fields to the pivot and it looks like:

PowerPivot Top 10 Noisy Pivot

Good news:  it’s still the same set of customers, and the information I want about quantity per category is there.

Bad news:  there’s a lot of extra information there that is not needed.  I don’t really care about the dollar amount purchased by these customers – I just care that they are my top 10 by dollars, and then I only want to see the quantity information.

I want to get rid of Sales.  But don’t I have to display Sales, since I am “top 10-ing” by Sales?

Turns out that pivots have another trick up their sleeve.  Let’s go back to that Top 10 dialog.  Check out the highlighted dropdown:

PowerPivot Top 10 Dlg Highlighted

Interesting, eh?  (See that Denny, I used your trademark.  Nothing is sacred when ‘Softies play the feud!)

Look what I find in that dropdown – a measure that’s not currently used in my pivot!

PowerPivot Top 10 Other Fields

Now THAT is even MORE interesting.  That tends to suggest that I can remove Sales from my pivot and yet preserve the “top 10 by Sales” filter.

(Those “_Count” measures are a necessary implementation evil in PowerPivot v1.  You can ignore them, which also means you HAVE to ignore them, but you get the idea.)

Yup, I can indeed remove Sales and the filter is preserved.  Check it out:

PowerPivot Top 10 By Non-Visible Measure

OK, I lost the sort order – it used to be sorted by Sales, now it appears to be alphabetical.  But it IS the same ten customers.  So we kept the filter.

Kinda makes sense that we lost the sort, given that Excel sorts locally.  That might be a problem in some cases, in other cases not.  They haven’t figured out the MDX to order the set in the named set approach either, so I’m gonna give myself a pass on this one :)

Advanced Topics – 2 Quick Questions

1) Does that Top N respect other filtering?

Why yes.  Yes it does.  So if I use a slicer, for instance, to filter to a year prior to Adriana19 even becoming a customer, my top 10 will change to be the top 10 for that year only.

2) If I want a top N filter on Sales, can I define that top N based on just sales of Bikes? 

YES!  You can do that by creating a “Bike Sales” measure:

=CALCULATE(Sum(‘SalesTable’[Sales Amount]),
           ‘CategoryTable’[EnglishName]=”Bikes”)

…and then do my Top N filter based on that!

(Exclamation point because value filters based on anything other than a grand total of a measure have always been a bit of a holy grail for, well, me.  And Allan Folting of the Excel team.  Maybe a few others.)

Quick Summary

The key takeaways from this post:

  1. Pivots have a Top N filtering capability built-in
  2. That filtering takes place in the PowerPivot engine for excellent performance
  3. You can filter on measures that aren’t displayed in the pivot (!)
  4. The top N DOES respect other filters on the pivot
  5. When combined with CALCULATE and custom measures, you can do just about anything
  6. Excel DOES perform pivot sorting itself (in contrast to filtering)

Link to cube formulas spreadsheet fixed

February 2, 2010

Sorry about that link below.  It is corrected now in the original article below, but here it is as well:

http://cid-470312ee93cc790d.skydrive.live.com/self.aspx/.Public/Football%20Cube%20Formulas.xlsx


Introducing CUBESET() and CUBERANKEDMEMBER()

February 2, 2010

batman04xf9

“Where does he get such WONDERFUL toys???”

-Kasper de Jonge

(To be honest, Kasper probably looks nothing like this.  I’ve never been to the Netherlands, but based on the fashions in Sweden, I am fairly certain that the Dutch don’t dress like that, at least not when they visit Sweden).

 

Poor Kasper, though.  The mystery of how the parameterized, sliced, auto-sorting report was built has simply been too much for him to bear, so the other day I sent him an advanced preview of the workbook.  I’ll share the same workbook (without the underlying data, which is licensed) with all of you, but first, some explanation.

Two posts back, I showed you the functions CUBEMEMBER and CUBEVALUE.  Think of CUBEMEMBER cells as composing the axes of a report, and CUBEVALUE cells as composing the numerical data in the report.

But CUBEMEMBER does not easily lend itself to dynamically-sorted reports.  Each CUBEMEMBER function fetches a fixed value (an NFL player, in my case) from the database.  To get a nifty auto-sorting report, we need to introduce…

CUBESET()

As far as Excel functions go, CUBESET is a strange one.  It’s 100% useless by itself, but forms the foundation for some very interesting things.  Let’s look at its syntax:

CUBESET(connection, set_expression, , [sort_order],
       [sort_by])

  1. Connection – the name of the connection, from Excel’s perspective.  In CTP3 builds, this is always “Sandbox” for PowerPivot connections.  This, BTW, is the first argument to every cube function.
  2. Set_expression – rather than return a single “member” like CUBEMEMBER, CUBESET returns a whole “Set” of values.  There are many different syntaxes for representing a set, but the one you will use most often is <ColumnName>.children, which returns all unique values of that column.
  3. Caption (optional) – there is no way for Excel to display all values of a set in a single cell, so you generally make up your own caption.  “” is a common caption, since it doesn’t clutter your report, but you can return any string you want, like “All NFL Players”.
  4. Sort_order (optional) – an integer from 0 to 7, representing “No Sorting,” “Sort Ascending,” “Sort Descending,” “Alpha Ascending,” “Alpha Descending,” “DB order ascending,” “DB order descending.”
  5. Sort_by (optional) – the MEASURE you want to sort by.  (Must be a measure, not a column!)  This argument is ignored unless you specify 1 or 2 for sort_order (ascending or descending).

Using CUBESET() in the football report

In my football report, I use CUBESET behind the scenes, with the following arguments:

=CUBESET("Sandbox",
         "[CleanPlayers].[FullName].children",
         "All Players”,
         2,
         "[Measures].[Rushing Yards]")

Ignoring the connection and caption arguments, what this formula means is:  “Return all unique values from the [FullName] column in the [CleanPlayers] table, sorted descending by the Rushing Yards measure.”

(Note that in Cube formulas, table names are wrapped in square brackets, whereas in DAX they are wrapped in single quotes.  It would be nice if cube functions could use single quotes for tables to distinguish them from columns as DAX does, but there’s an excellent historical reason for that.)

Fetching values from that set using CUBERANKEDMEMBER()

This is where it gets fun :)

The cell that contains the CUBESET formula only displays a caption, but behind that cell, the entire set is stored, invisibly.  CUBERANKEDMEMBER is how you fetch values from that set.

(Whether you chose to specify a sort order or not, that set IS ordered, even if the database decided to return the values in somewhat random order.  Hence the “RANKED” in its name.)

The syntax for CUBERANKEDMEMBER is quite simple:

=CUBERANKEDMEMBER(
         Connection,
         Set_Expression,
         Rank,
         [Caption]
)

For set expression, just give it the cell address of the CUBESET formula.  For rank, give it any integer: 1 to return the first member of the set, 2 for the second, etc.

In my football report, this is:

The syntax for CUBERANKEDMEMBER is quite simple:

=CUBERANKEDMEMBER(
         “Sandbox”,
         $B$3,
         ROW(A1)
)

B3 is the address of my CUBESET formula, but I used the absolute reference syntax $B$3 because I plan to copy this CUBERANKEDMEMBER formula down a column, and I don’t want Excel adjusting B3 to B4, B5, etc., since that won’t reference the right cell anymore.

OK, what the heck is ROW(A1)???

It’s an Excel trick.  If I just enter 1 for this argument, that will NOT increment to 2, 3, etc. when I copy this formula down a column.

And the whole reason to copy down a column is so that I get the 2nd, 3rd, 4th, etc. members of the set as I go down the column.

If I use ROW(A1), that will return 1 for the first instance of the formula.  And when I copy down the column, Excel increments that to ROW(A2), ROW(A3), etc., which return 2, 3, etc. – just what I want!

Comparison to the Named Set Method

Denny Lee’s series of posts on Top N reports goes a different direction than what I have here.  He defines a Named Set in the Excel UI (using the MDX language), then uses a PivotTable to return that top N set.

In comments on the first post, Colin Banfield asked how to order that set.  Neither Denny nor Colin have an answer at the time of this writing.  And if Denny and Colin are stumped on MDX, well, there’s no WAY that I am going to figure it out :)

But with CUBESET, you can see that the Sort_order and Sort_by params give me a simple method for ordering the set by a variety of criteria.  Then CUBERANKEDMEMBER lets me fetch any number of those sorted items, in whatever precise order I’d like.

So… no MDX required, (except for the .children trick I used in CUBESET, which technically is MDX), AND a simple solution to the ordering problem.

The Named Set feature is a good one, and it definitely has its uses, but for custom Top N reports, I’ll take CUBESET and stay away from that MDX madness.

And what about Top N filters in pivots?

Actually, if Top N is all you want, a Pivot is gonna be your best bet at least 90% of the time.  Remember, the only reason I went down the Cube Formulas path is that I wanted custom formatting in my report, and parameterization.

I’ll post again shortly about how pivots might even be better than you expect at Top N reports.

Don’t worry if I am losing you, this is not required…  nor easy

This whole topic of cube formulas is pure gravy, take-it-or-leave-it kind of stuff.  It took me several attempts to get comfy with it myself, and I was part of the team building the feature!

Think of this as a 200-level topic.  I’m including it for the handful of you who consider yourselves hardcore, to test your mettle and stretch your Excel skills :)   Everyone else, just file this away for later and come back some day when you find yourself needing precisely these tricks.

When it comes to building PowerPivot reports, I’d break things out like this:

100 Level Topics

  1. PivotTables, PivotCharts, and Slicers
  2. Converting a PivotTable to formulas and then custom arranging/formatting the report

200 Level

  1. Customizing CUBEVALUE cells to do custom arithmetic and parameterization
  2. Writing CUBESET and CUBERANKEDMEMBER formulas

300 Level

  1. Dynamic sorting using CUBESET and CUBERANKEDMEMBER
  2. MDX-Driven Named Sets, as Denny demonstrated

…which means that my next post on cube formulas is even more optional/ignorable than this one :)

Wake up Denny!

Denny, of course, well…  Denny HAS to pay attention, because I challenged his geek-hood in my last post.

Some of you might say that wasn’t fair, since I have more Excel background than Denny and he has more SQL background than I.  Sun Tzu once suggested that choosing the field and terms of battle was more important than the fighting itself, and I do love me some Sun Tzu :)

Are you like Kasper?  Want to skip ahead?  Download the workbook!

I have not covered every technique yet, but if you are getting anxious like Kasper and can’t wait any longer, you can download the workbook here and inspect every last formula.

Remember, the source data itself is licensed, so I’ve removed that from the workbook.  You won’t be able to fetch fresh values from the PowerPivot db if you try to refresh, or modify formulas.  I’m sharing this so that you can see all the formulas and understand how the report is built.

And no, Kasper didn’t get the source data, either :)

Next Football Post:  Writing a “Rank” Measure >>


Parameterizing the report

February 1, 2010

In the Parameterized Report, how do I get the “Fan Pts” column on the far right of the cube formula report to respond to those parameter cells in the top left?

PowerPivot Report Params

Easy.  The report is just formulas, so I can combine them with the parameter cells using Excel arithmetic.

Naming the Parameter Cells:  a Readability Convenience

The parameters themselves live in cells C3, C4, and C5.  I can reference them that way, but to make things easier to read, I’m going to give them names. 

C3To give a name to cell C3, I select it.  Note the control at the upper left corner of the grid that displays the address C3:

I just click inside that control and start typing to give C3 another name.  In this case, I name it YardsPerRushPt, below:

YardsPerRushPt

That will make my formulas easier to read, and also provides an added benefit later then I publish to SharePoint.

I then repeat this process for the other two parameter cells, C4 and C5, which I name YardsPerRecPt and PtsPerTD, respectively:

 

                  YardsPerRecPtAnd PtsPerTD

The Formula

The formula was always going to be easy to write, but now it will also be easier for us to read and understand.  Here is the formula in the Fan Pts column:

=CUBEVALUE("Sandbox",F3,Slicer_DayNightName,Slicer_WeekNbr,
           "[Measures].[Rush Yards]") /
RushYardsPerPt

+CUBEVALUE("Sandbox",F3,Slicer_DayNightName,Slicer_WeekNbr,
           "[Measures].[Rec Yards]") / RecYdsPerPt

+CUBEVALUE("Sandbox",F3,Slicer_DayNightName,Slicer_WeekNbr,
           "[Measures].[Rush TD]") * PtsPerTD

All three CUBEVALUE terms are identical except for the measure they reference, and the parameter they either divide or multiply by – I have highlighted the differences in blue.

Note how I have used the cell names I defined above, rather than cell references.  This is an under-used and beneficial feature in Excel, and not at all specific to PowerPivot.

Dealing with #N/A – The Excel IFERROR() Function

Sometimes, a player will not have recorded any of a particular statistic.  This is especially true when you start slicing the report and the circumstances get narrower.

When that happens, Excel will often display #N/A in the cell instead of 0.  (In a future post I will delve into what determines 0 vs. #N/A).

Sometimes #N/A is ok to display.  But in the Fan Pts column, it’s a problem.  A player will often record yards but no TD’s, for instance.  But once a single term in the formula above goes to #N/A, the entire formula goes to #N/A, which is incorrect since that player DID have other statistics. 

We just want #N/A terms to be treated as 0 in cases like this.  And the IFERROR() function in Excel is precisely designed for this.  If we wrap each term in IFERROR, we can specify an alternate value to be returned when an error like #N/A is encountered.

For example:

=IFERRROR(
         
CUBEVALUE"Sandbox",F3,Slicer_DayNightName,
                  
Slicer_WeekNbr,"[Measures].[Rush Yards]")
                    /
RushYardsPerPt
,0
)

Get to know IFERRORThe IFERROR wrapper is highlighted in blue.  If no error is encountered, then the “meat” inside the IFERROR function (the CUBEVALUE divided by RushYardsPerPt) is returned as usual.

IFERROR is a very useful function overall, and all PowerPivot Pros should get to know it (a phrase that conjures memories of Jon Lovitz on SNL).  It can also be used to return custom strings in place of errors like “No data recorded.”

                                                                                                                               GET to know me!

Parameters Exposed in SharePoint-Published Reports!

Time for that “extra” benefit of named cells.  In my last video, remember how there was that expandable task pane rendered in Excel Services, that allowed me to change the parameter values interactively?

PowerPivot Web Params

I said that required no coding, and that’s the truth :)   To activate that feature, you have to tunnel down a bit in the Excel UI, because hey, only hardcore folks like readers of this site would ever need such a thing, right? 

First you go to the File tab in Excel and choose the “Share” option:

Share Option 

Then pick the “Publish to Excel Services” suboption, and click the button of the same name that I have highlighted:

Excel Services

The dialog that comes up looks exactly like the Save As dialog.  And that’s what it is.  But there’s a new button there, highlighted below.  Click it!

Save As Dialog

Almost there, don’t give up!  There are two tabs on the resulting dialog.  The default tab, the Show tab, is very useful, but let’s skip that for now and switch to the Parameters tab:

Parameters Tab

Clicking the highlighted “Add” button yields the following, final dialog in the chain:

Add Params

Every named range in the workbook appears here as a checkbox.  (Slicers are treated as named ranges, too, so that they can be referenced in formulas as well).  I select all three of the names representing my parameters, and then tunnel back out, clicking OK as I go.

(I can then save the file locally or save it to SharePoint, or cancel the Save As dialog and do a normal save later – in each case, the parameter settings will now be saved).

When I view this report on SharePoint, using the browser, the parameters pane will now appear automatically.

Parameters as Web Part Interface

The parameters pane is the simplest way to leverage these exposed parameters.  But those three cells are now also exposed as part of the web part framework, enabling other controls in SharePoint to send input values to the report.

That’s an advanced topic and only mentioned here in case you happen to be a SharePoint programmer as well :)

What Next?  Aren’t We Done?

Nope, we are not done.  We haven’t yet covered how to support dynamic sorting!  So that’s coming up.

And keep in mind that this whole foray into cube formulas is a bit of an advanced topic, and it gets trickier as you add more features into the report.  None of this is necessary for building PowerPivot reports – pivots are going to be your answer most of the time anyway.

Denny Lee, Your Time Has Come!

My esteemed colleague, Denny Lee, has found his way into the deep end of the pool lately with a series of posts about Top N PowerPivot reports.

I have a lot of respect for Denny.  After all, the man can write MDX, and the existence of MDX is in some ways the reason why we had to create PowerPivot (…to replace MDX with something less intimidating).  And no one ends a sentence with “eh!” like Denny :)

But apparently Denny did not get the internal executive memo:  high-end Excel techniques are the domain of one Rob Collie, the PowerPivotPro!

Denny, you see, is now on my turf.  As they say in Excel power users circles, it’s, ahem, “on” now.

So my next few posts now have two purposes:  to explain how I did the dynamic sorting, and to give you some (cough cough, superior) alternatives to those MDX-backed techniques outlined by Denny.