User-Friendly Report Sorting With Slicers!

October 13, 2011

 
image

Yeah, It Really Works!

With great Excel skill comes…  insensitivity to others’ lack of it Smile

Let’s say you are a monster Excel pro.  You’re a pivot master.  Nothing is beyond you – even the more complex features of Excel seem easy.  That’s obviously a huge strength, an asset.  And PowerPivot magnifies those powers – it gives you a much bigger stage, makes you more important, and extends the reach of your work to a lot more people than before.  Great stuff.

But your skill level can also blind you.  The people who consume your PowerPivot applications and reports are not NEARLY as Excel-savvy as you.  The things you take for granted are often hard for them, sometimes even scary to them.

That can be frustrating of course, but remember:  if they understood Excel as well as you’d like them to, there wouldn’t be so much need for your skills.

Failure to understand a report is not THEIR fault.  It’s YOURS.

When you share a report with someone and they can’t figure it out, your first response may very well be to groan or sigh (inwardly), and mutter to yourself about how some people can’t seem to tie their own shoes.  Then you put on a helpful face and go explain to them how to use the report.  You may even say something like “hey, it’s actually pretty easy once you understand.”

That’s a tempting trap.  I’m not above it, trust me.  But I know that’s the wrong first instinct, to explain to them the mechanics of how to do it, or to tell them it’s actually pretty easy.  The right first instinct, the one I am constantly reinforcing with myself, is to think “how can I make the report easier to understand?”

And as your work becomes more important, and makes its way further up the leadership hierarchy of your organization, it becomes even more critical to have the right first instinct.

Example:  “I Can’t Sort the Report!”

Let’s say you have published the following mission-critical pivot report on UFO Sightings in the United States:

PowerPivot Report on UFO Sightings

And one of the report consumers says to you “great, but how do I sort by Average Sighting Length instead?”

Well, you and I (the Excel pros) both know about that little dropdown don’t we?

image

This Dropdown Scares Most People.  Seriously, it Does.

But that dropdown is scary.  Seriously.  The only people who don’t find it scary are Excel nerds like us.

And we, the Excel nerds, also know that we can right click in the Avg Sighting Len column and choose a sort option.  Normal people don’t know that.  Furthermore, that doesn’t work on SharePoint.  And really, the report consumer is used to simply clicking on column headers to sort – in just about every single application they have ever used…  except for Excel.

So in cases where sorting is important, can we give them something a little friendlier?  Yes we can.

Step 1:  Create Two Dummy Tables for Slicers

image  image

First table just lists all the measures you’d like the user to be able to sort by.  Second table is just Ascending/Descending (although as an added boost to friendliness, I came back and changed those to Largest to Smallest/Smallest to Largest because Ascending/Descending often confuses even me!)

Now you can add them as slicers on the report, even though they don’t do anything yet:

image

Step 2:  Create Measures that Detect User Selections on Those Slicers

[SelectedSortMeasure]=

IF(COUNTROWS(VALUES(SortBy[Sort Table By]))=1,
   VALUES(SortBy[Sort Table By]),
   “Total Sightings per Year”
)

[Selected Sort Order]=

IF(COUNTROWS(VALUES(SortOrder[Sort Order]))=1,
   VALUES(SortOrder[Sort Order]),
   “Largest to Smallest”
)

Both of those merely return the caption of whatever is selected.  And if more than one thing is selected on a slicer, it returns a default value – Total Sightings per Year in the first measure.

Step 3:  Create a 1,-1 Measure Based on Sort Order

[SortOrderMultiplier]=

IF([SelectedSortOrder]=”Smallest to Largest”,-1,1)

If the SortOrder measure defined above returns “Smallest to Largest” then this measure returns –1.  Otherwise it returns 1.

Step 4:  Create a Branching Measure Based of the “Sort Table By” Slicer

[HiddenSortMeasure] =

IF([SelectedSortMeasure]=”Avg Sighting Len (Mins)”,
   [Avg Sighting Length in Mins],
   IF([SelectedSortMeasure]=”Sightings per 100K Residents”,
      [Sightings per 100K],
      [Sightings per Year]
   )
)* [SortOrderMultiplier]

This measure returns an entirely different value based on whatever the user selects on Sort Table By.  Sometimes it “mimics” one measure, other times another.

And note that last line – it multiplies [SortOrderMultiplier], which is 1 or –1, by the whole thing.

Step 5 – Add The HiddenSortMeasure to the Pivot, Sort By It

image

Sort the pivot by that measure.  Notice how it is the negative version of the Total Sightings per Year measure?  That’s expected based on the slicer selections.

Step 6 – Hide that Column of the Spreadsheet

image

And the result:

image

This is actually really easy.  Took a lot longer to write this post than it did to add to the report.

THIS TOPIC CONTINUED:

      Adding “sort by state name” to this report

      Try this report out live in your browser!

Next step, of course, is to make this thing look better, but that’s another post.


Slicers For Selecting Last "X" Periods

October 11, 2011

 

Guest post by David Churchward

Using Time Intelligence in PowerPivot can appear scary when you first start using it and I’ve seen some weird and wonderful ways of attacking it, some that look like we’re calling into question the validity of the global phenomenon that is time and others that are quite simply brilliant.

In this post, I hope to provide a simple and understandable approach whilst adding another flexible dimension called “X” where “X” is a variable that can be set by a user in a slicer.  This might be, for example, a P&L report where the user wants to see the last 4 months instead of the normal pre-determined timeframes such as year to date or quarters.  Alternatively, the report may be reviewing trends and, again, year to date or one of the other standard timeframes may not be the most relevant timeframe.

Incidentally – if you’re in the mood for conquering time intelligence there’s a host of brilliant information on this site including posts such as Running Totals Without a Traditional Calendar Table and PowerPivot time intelligent functions: why use ALL() and how to work around it which give some excellent direction on Time Intelligence and some of the complexities.

In this post, I’ll explain how you can put all of the control in the hands of the user by creating an X Periods measure. That is to say that the user can review the last X periods up to a date that they control.

Paving the Way

The crux of dealing with time intelligence in this way is to create a time table that links to your dataset and then a further time selection set of tables from which the user can select their required time parameters.

I’ve created so many of these where the fiscal year isn’t a conventional calendar year that I’ve found that it’s best to deal with the fact that some of the built-in time intelligence PowerPivot functions may not be valid. However, with last X periods, it doesn’t matter anyway.

I need the following tables in my dataset:

FACT_Tran- this is my fact table dataset that I wish to analyse.

clip_image002

Dates- this is an unbroken list of dates representing the timeframe of my dataset. Time Intelligence works better when your fact table is linked to a dates table, chronologically ordered without any breaks.

clip_image002

Year- a list of years in the dataset

clip_image003

Period - a list of periods in the dataset. This would normally be 1 to 12

clip_image004

Year_Period- this is more aligned to a proper time dimension in that it is a combination of the years and periods representing my dataset and carries additional information such as month end dates that we need for our measures (this version has been simplified for the purposes of this post)

clip_image005

X- this is a single column table that allows users to select the number of periods that they wish to see. This could be a numeric sequential number or specific timeframes that are most likely to be selected as in my example here.

clip_image006

Note – I’m using a predetermined set of values for X which contradicts my opening remarks but this is purely for the purposes of showing this example and it really depends on the situation!

Links

clip_image007

You’ll notice that my Year_Period table is linked to my Year and Period tables. My fact table is separately linked to my Dates table. For this method of Time Intelligence to work, you shouldn’t link the two sets of tables together otherwise filtering will occur when we don’t want it to.  You can get around this, but there’s no need to link the two sets of tables anyway.

Determining what has been Selected

Users will be given slicers to select the year and period that they are working on. In terms of X periods, we would consider this to drive the end date of the period to which X relates.  We then work back X periods from that date.

I need to create a series of measures to determine what the user has selected. These are as follows:

Selected_Month_End_Date = LASTDATE(Year_Period[Month_End_Date])

Selected_X = MAX(X[X])

Selected_X_Months_Start_Date = LASTDATE(DATEADD(Year_Period[Next_Month_Start_Date],[Selected_X]*-1,MONTH))

clip_image008

As I’m using Fiscal Periods and my financial year starts in July, period 3 for the year ending 2010 (as selected above) is actually September 2009. I’ve selected 9 as a value for X which means that I’m looking back 9 months from September 2009. My start date is therefore 1st Jan 2009.

Building These Dates into my Financial Measure

Now that we know what the user has selected in terms of X and the date range to which X relates, our financial measure is now very simple.

X_Periods_Value

=CALCULATE

(

SUM(FACT_Tran[Value]),

DATESBETWEEN(Dates[Date],[Selected_X_Months_Start_Date],[Selected_Month_End_Date])

)

We now have a dynamic measure which calculates the start and end dates that we wish to see and applies those dates to our value column.

This can be massaged to calculate year to date, prior year and many other date values as required and it is totally dynamic without any back end calculations.

Taking Things to the Next Level

The method above is what I use regularly to allow for year, period and X selectors to drive the analysis.  This works a treat if your primary reporting timeframe is a month.  However, what if days or weeks are a relevant timeframe?

Let’s adapt what we’ve already done by creating a new dates table.  Let’s call this table Dates2.  This is a sequential date column similar (if not identical) to Dates but NOT linked to our fact table.  Create 3 new measures on this table which are variations of the Selected_X_Months_Start_Date measure that we created previously.

Selected_X_Days_Start_Date

= DATEADD(LASTDATE(DATEADD(Dates2[Date],[Selected_X]*-1,DAY)),1,DAY)

 

Selected_X_Weeks_Start_Date

= DATEADD(LASTDATE(DATEADD(Dates2[Date],[Selected_X]*-7,DAY)),1,DAY)

 

Selected_X_Months_Start_Date

= DATEADD(LASTDATE(DATEADD(Dates2[Date],[Selected_X]*-1,MONTH)),1,DAY)

So why do we need two DATEADD functions in each of these measures? Simply put, we’re essentially using dates here. That’s a simple statement that doesn’t answer anything! What I mean is that we’re essentially not using datetime fields. Each date is therefore a value in it’s own right. Therefore, if we treat 30th September as a day only and therefore a value of 30, when we subtract one day, we would get 29. As a result, we would be reviewing 29th and 30th when in reality we only want to review one day. Since 30th is one day in it’s own right, we always have to add one day to each answer.

Note – take care to ensure that your fact table dates are in the same format.  Time can skew the answer if you’re not careful.

In the previous solution using year and period slicers to select months above, this problem was dealt with by using the Next_Month_Start_Date value which is one day ahead of our month end date selected and held as a value in our Year_Period table.

Our weeks solution is a derivative of the days function where we simply adjust by days * 7 to get to weeks.

I now create another selection table called Time_Type.  This is designed to select which timeframe type the user is selecting.

image

I create a slicer on my pivot table using the Type_Name and create a measure which determines which item has been selected by the user.

Selected_TimeType

= MAX(TimeType[Type_Code])

I then create another measure which uses the selected time type to determine which of the 3 measures to use:

Selected_X_TimeType_Start_Date

= IF([Selected_TimeType]=1,[Selected_X_Days_Start_Date],

IF([Selected_TimeType]=2,[Selected_X_Weeks_Start_Date],

IF([Selected_TimeType]=3,[Selected_X_Months_Start_Date],

   BLANK()

  )

   )

     )

 

image

We need one further measure to determine which date has been selected

Selected_Date2

= LASTDATE(Dates2[Date])

These new measures can now be used in our value measure:

X_Periods_Value

=CALCULATE

(

SUM(FACT_Tran[Value]),

DATESBETWEEN(Dates[Date],[Selected_X_TimeType_Start_Date],[Selected_Date2])

)

What if I Always Want my Report to be X Periods to Today’s Date?

This is simple.  We simply need to substitute Selected_Date2 for a new measure called Todays_Date in our X_Periods_Value measure and substitute Dates2[Date] with Todays_Date in Selected_X_Days_Start_Date and Selected_X_Weeks_Start_Date and Selected_X_Months_Start_Date measures

Todays_Date

= TODAY()

You do however need to take care about what your data content is.  As an example, I use a data warehouse that is built every night.  When I refer to today, I actually mean last night which means that my measure should in fact be

Todays_Date_LastNight

= TODAY() -1

What Does This Mean?

By using this method, we have passed the requirement to determine the timeframe to which a trend or report relates across to the user.  Trend timeframes could change and do we really want to have to recreate reports and dashboards as a result?  I hope that this solution solves that problem.

Having learnt from my previous post, YOU CAN DOWNLOAD THE WORKBOOK HERE.


Creating Accurate Percentile Measures in DAX – Part II

October 1, 2011

Intro from Rob:  today, Colin finishes in Percentile Measures topic from last week.

Note that 1) this technique requires the new release of PowerPivot that is currently still in beta (the Denali release)…

…and 2) I would rate this one a solid 3 or 4 on the spicy function scale.  Not one of the beginner techniques, in other words Smile

Guest Post by Colin Banfield

October 01, 2011

In Part I, I discussed my approach to creating percentile measures in DAX. In this part, we will add an extra dimension to our dataset and create a box & whisker PivotChart. The expanded dataset is shown in figure 1, with the source table on the left, and the resulting PowerPivot PivotTable on the right.

clip_image001

Figure 1 – Expanded dataset with Set column

In the expanded dataset, I have added a column called Set. The goal is to create percentile measures that are constrained to each set. Following the 4-step approach outlined in Part I, we start with the rank calculation. Recall that our rank formula in Part I was:

=RANKX(ALL(Data),[Sum of Value],,1)

This formula works when we are working with the entire table. However, with the addition of the Set column, we need to calculate values for the sets A, B, & C in a new filter context (i.e. one that excludes the ID column). We need RANKX to rank the values for each set independently. We employ the CALCULATETABLE function to achieve the desired result:

=RANKX(

                  CALCULATETABLE(
                                                       Data,
                                                      ALL(Data[ID])
                                                    ),
                 [Sum of Value],,1
               )

To show the ranking within each set more easily, I’ve sorted the source table by set and value, both in ascending order. Next, I created a calculated column that uses a lookup formula to get the rank measure from the PivotTable.

clip_image002

Figure 2 – Rank measure

With the inclusion of the Set column in the PivotTable, we must also modify other measures used in Part I.

Rank for 25th percentile ([25PctRank_INC])

From (Part I):

=CALCULATE(
                           (COUNTROWS(Data)-1)*25/100+1,
                           ALL(Data)
                         )

To:

=CALCULATE(
                           (COUNTROWS(Data)-1)*25/100+1,
                           ALL(Data[ID])
                         )

PercentileDown

From (Part I):

=MAXX(
                FILTER(
                             ALL(Data),
                             [Rank] = ROUNDDOWN([25PctRank_INC],0)
                           ),
              [Sum of Value]
            )

To:

=MAXX(
                FILTER(
                               ALL(Data[ID]),
                               [Rank] = ROUNDDOWN([25PctRank_INC],0)
                             ),
               [Sum of Value]
            )

PercentileUp

From (Part I):

=MAXX(
                  TOPN(
                               ROUNDUP([25PctRank_INC],0),
                              ALL(Data),
                             [Sum of Value],
                            1
                         ),
               [Sum of Value]
            )

To:

=MAXX(
                TOPN(
                             ROUNDUP([25PctRank_INC],0),
                             CALCULATETABLE(
                                                                   Data,
                                                                   ALL(Data[ID])
                                                                ),
                            [Sum of Value],
                            1
                          ),
               [Sum of Value]
            )

Or:

=CALCULATE(
                            MAXX(
                                          TOPN(
                                                       ROUNDUP([25PctRank_INC],0),
                                                       Data,
                                                       [Sum of Value],
                                                      1
                                                   ),
                                       [Sum of Value]
                                     ),
                         ALL(Data[ID])
                      )

The preceding alternative formulas are provided to illustrate how the same scalar result can be obtained using either CALCULATE or CALCULATETABLE. Similarly, we can use CALCULATETABLE instead of CALCULATE in the rank for the 25th percentile formula. I point this out because using CALCULATETABLE as an alternative to CALCULATE in applicable situations is rarely, if ever discussed.

25th, 50th, and 75th Percentiles

Recall the interpolation formula from Part I:

=[25thPercentileDown]+([25thPercentileUp]-[25thPercentileDown])*([25PctRank_INC]-ROUNDDOWN([25PctRank_INC],0))

After completing the measures for the 50th and 75th percentiles, our table is as shown in figure 3.

clip_image003

Figure 3 – Percentile measures, with comparisons to values returned by Excel’s PERCENTTILE.INC function

Creating a Box & Whisker PivotChart from Percentile Measures

It is difficult, if not impossible to analyze data directly in a table – whether or not the data is summarized. The purpose of a chart is (or should be) to provide us with insights into the data that we cannot derive by staring at the numbers. One such chart that takes advantage of percentile measures is the box & whisker plot (hereafter called a box plot). This chart is commonly used in areas such as statistical process control (SPC) and Six Sigma, but they can be quite useful for general business analysis. This chart is not native to Excel, so it’s not surprising that its use is not explored more often. The box plot visually shows the distribution of values in a dataset. The most common plots show the minimum value, maximum value, the range between the 25th and 75th percentile (known as the interquartile range), the median value and the mean (average value). The “box” portion is the interquartile range, and the two “whisker” parts are the range between the min and the 25th percentile, and the range between the 75th percentile and the max. The box plot is most effective when comparing the distribution of values across several data sets. A few examples of box plots for general business use include:

· Comparing the distribution of call hold times of agents over a period

· Comparing the distribution of price fluctuations across several products, or product parts

· Comparing the distribution of shipping weights for different service levels (e.g. normal, express, overnight). This data could provide useful intelligence for a shipping company

· Comparing the distribution of actual salaries across different salary scales. An excellent example is provided in this article here. The article explains how to interpret the plot and also illustrates a very important concept for making charts more meaningful i.e. adding additional context to a chart by providing reference information.

Basically, any related entities in your business that could benefit from analyzing a distribution of their values could benefit from a box plot.

The first step in creating the box plot PivotChart is to arrange the data appropriately. The actual construction of the plot is beyond the scope of this post, but the technique is explained in this tutorial by Excel charting guru, Jon Peltier. If you’re new to creating box plots in Excel, you should read the tutorial before continuing. Jon’s excellent charting tutorials have been around for a long time, and recently he has been updating them. In addition to his tutorials, Jon also has a few chart utilities for sale (and no, I don’t earn a commission for this plug).

The box plot PivotChart using our dataset is shown in figure 4

clip_image004

Figure 4 – Box plot PivotChart

Notes

1. I created a perspective called BoxPlot Data to contain only the measures required for the box plot

2. The measures are numbered to indicate the order that they must be added to the PivotTable

3. The box plot measure formulas are as follows:

[1-25th Percentile] = [25thPercentile_INC]

[2-50th Percentile] = [50thPercentile_INC]-[25thPercentile_INC]

[3-75th Percentile] = [75thPercentile_INC]-[50thPercentile_INC]

[4-Minimum] = [25thPercentile_INC]- MINX(ALL(Data[ID]),[Sum of Value])

[5-Maximum] = MAXX(ALL(Data[ID]),[Sum of Value])- [75thPercentile_INC]

[6-Mean] = AVERAGEX(ALL(Data[ID]),[Sum of Value])

4. I’ve created a chart template that can be used to create box plots. When creating a new plot, the dimension and measures should be added to the PivotTable before selecting the chart template from the Insert Chart dialog box.

5. After you create the chart from the template, the whiskers will not appear. This occurs because the whiskers are created using error bars, and external ranges are used to specify the data for the error bars. Unlike the rest of the chart, the template doesn’t know which ranges to use for error bars. You must add these manually, using the technique detailed in Jon’s tutorial.

6. The white median lines you see in figure 4 were created using a bit of formatting trickery. I used a white border for the bars. The outside borders don’t show because they are set against a white background.

Downloads

You can download a copy of the workbook & chart template from the following location:

http://powerpivotfaq.com/PowerPivot%20Samples/Forms/AllItems.aspx


Creating Accurate Percentile Measures in DAX – Part I

September 29, 2011

Guest Post By Colin Banfield

29th September, 2011

Equivalents of Excel’s Percentile, Quartile, and Median functions are perhaps the most significant omissions in Denali’s DAX statistical function library. Quartile and Median are actually special cases of percentile, and in this post, we calculate these special cases. Some of the most insightful BI analysis involves the use of statistics, and PowerPivot provides an excellent environment for developing statistical measures. Perhaps percentile and its cohorts will be considered for inclusion in PowerPivot V3.

Percentile measures are not difficult to create in DAX, but the process is also not trivial. Adding to the complexity is the need to interpolate values if you want to duplicate the accuracy of Excel’s percentile functions. My approach to creating percentile measures in PowerPivot is as follows:

  1. Calculate the ascending rank of values in the dataset under exploration.
  2. Calculate the rank corresponding to required percentile. I use the formula n=P(N-1)/100+1, where n is the rank for a given percentile P, and N is the number of rows in the dataset. This formula provides the rank corresponding to Excel’s PERCENTILE or PERCENTILE.INC function. For a calculation equivalent to Excel’s PERCENTILE.EXC, the rank for a given percentile can be found by using the formula n=P(N+1)/100. In most cases, the rank calculated will not be an integer. Therefore, for a generalized solution, we must use the integer rank below and above the calculated value and interpolate. For example, the first formula in (1) calculates a rank of 11.25 for P=25 (25th percentile) and N=42. We must therefore interpolate between the dataset ranks of 11 and 12.
  3. Find the data values corresponding to the integer ranks derived from step 2. For convenience, let’s call these values PPercentileDown and PPercentileUp.
  4. Calculate the percentile using linear interpolation.

Let’s start with a simple dataset to illustrate the concepts. In figure 1, the original dataset is in the table on the left. The corresponding PowerPivot PivotTable is on the right.

Figure 1- Original Dataset

Step 1 – Calculate the ascending rank of values in the dataset

Fortunately, the newly minted RANK function in Denali makes the calculation a lot easier than the rank formula we had to create in V1. The rank formula is simply:

=RANKX(ALL(Data),[Sum of Value],,1)

Figure 2 – Rank Measure

Step 2 – Calculate the rank corresponding to required percentile

We will first calculate the rank corresponding to the 25th percentile (inclusive) in the dataset. Recalling the formula n=P(N-1)/100+1, this translates into the following DAX formula:

=CALCULATE(

      (COUNTROWS(Data)-1)*25/100+1,


      ALL(Data)

          )

In this case, the calculated rank is 11.25.

Figure 3 – Rank of 25th percentile

Step 3 – Find the data values corresponding to the integer ranks derived from step 2

Our 25th percentile value is somewhere between the values corresponding to the 11th and 12th rank, as shown in figure 3. We need formulas to extract these values as measures. The 25thPercentileDown formula is:

=MAXX(

      FILTER(

             ALL(Data),

             [Rank] = ROUNDDOWN([25PctRank_INC],0)

            ),

      [Sum of Value]

     )

We are returning the Max of Sum of Value, filtered so that [Rank] is equal to the rank of the 25th percentile, rounded down to the nearest integer rank. In other words, we are filtering the table to return only values where [Rank] = 11. In the case of ties, there will be more than one value in the filtered table (all identical). This is the reason for using MAXX. However, MINX, or even AVERAGEX would calculate the same result.

For 25thPercentileUp, you may be inclined to use a similar formula, substituting the ROUNDDOWN function for ROUNDUP. However, in the event of ties for the 25th percentile, the filter will be incorrect. This is so because [Rank] will calculate the same rank for ties (11 in this case), and ROUNDUP([25PctRank_INC],0)will be = 12. We can get around the problem by filtering the table using the TOPN function instead:

=MAXX(

      TOPN(

           ROUNDUP([25PctRank_INC],0),

           ALL(Data),

           [Sum of Value],

     1

          ),

     [Sum of Value]

    )

The formula simply finds the max of the top 12 values in the dataset (i.e. 11.25 rounded up to 12). The result is not affected by ties in the rank. Figure 4 shows the resultant measures.

Figure 4 – Rankup and Rankdown

Step 4 – Calculate the percentile using linear interpolation

Now that we have the values between which we need to interpolate to find the actual percentile, we use these values in a simple interpolation formula:

=[25thPercentileDown]+([25thPercentileUp]-[25thPercentileDown])*([25PctRank_INC]-ROUNDDOWN([25PctRank_INC],0))

= 1891.07 + (2073.29-1891.07)*(11.25-11) = 1891.07 + 182.22*0.25 = 1936.63

In figure 5, I’ve completed the calculations for the 50th percentile and 75th percentile and removed intermediate formulas from the PivotTable.

Figure 5 – Percentiles

Figure 5 illustrates that that values we calculated are identical to values calculated by using Excel’s PERCENTILE.INC function. Note that:

  • The 25th percentile is also the first quartile
  • The 50th percentile is the median (i.e. the same value returned by Excel’s MEDIAN function)
  • The 75th percentile is also the third quartile

Alternative median calculation

If we need to calculate the median without regard for percentiles, we could use the following formula:

=IF(

    MOD(

        COUNTROWS(ALL(Data)),

        2

  )=0,

    AVERAGEX(

             TOPN(

                  2,

                  TOPN(

                       (COUNTROWS(ALL(Data))/2)+1,

                       ALL(Data),

                       [Sum of Value],

                       1

                      ),

                  [Sum of Value]

                 ),

             [Sum of Value]

            ),

   MAXX(

        TOPN(

             COUNTROWS(ALL(Data))/2,

             ALL(Data),

             [Sum of Value],

            ),

        [Sum of Value]

       )

   )

Essentially, we want to return the value in the middle of the range, after sorting the range in ascending order. The calculation we use depends on whether there are an even or odd number of values in our dataset. The first part of the formula, MOD(COUNTROWS(ALL(Data)),2)=0, determines if the dataset has an even number of values. I would have preferred to use ISEVEN (and thus eliminate an extra function call), but alas, Excel’s ISEVEN and ISODD functions didn’t make it into DAX.

If the number of values is even, we need to get the top value in the first half of the dataset, plus the next higher value (because we have to average these two values to get the median). We accomplish this goal in the following part of the formula:

TOPN(

     (COUNTROWS(ALL(Data))/2)+1,

     ALL(Data),

     [Sum of Value],

     1

    )

Based on our sample dataset, the preceding formula will return a table with values where ID=1 to ID=22. Next, we get the top 2 values from this new table (ID=21 and ID=22). To accomplish this task, we wrap another TOPN function around the new table and sort in descending order:

TOPN(
     2,

     TOPN(

          (COUNTROWS(ALL(Data))/2)+1,

          ALL(Data),

          [Sum of Value],

          1

         ),

    [Sum of Value]

   ),

Finally, we average the two values:

AVERAGEX(

         TOPN(

              2,

              TOPN(

                   (COUNTROWS(ALL(Data))/2)+1,

                   ALL(Data),

                   [Sum of Value],

                   1

                  ),

            [Sum of Value]

           ),

        [Sum of Value]

       ),

If the number of values in the dataset is odd, we just return the middle number, using the formula in the final argument of IF:

MAXX(

     TOPN(

          COUNTROWS(ALL(Data))/2,

          ALL(Data),

          [Sum of Value],

         ),

    [Sum of Value]

   )

Note that TOPN rounds a number up or down arithmetically, if the first argument is a decimal value. Dividing an odd number by 2 will therefore always round the number to the next higher value.

Percentile INC vs. EXC

At the beginning of this post, I mentioned that to compute percentiles corresponding to Excel’s PERCENTILE.EXC, you use the formula n=P(N+1)/100 to get the rank for a given percentile. When, therefore, should you use one versus the other? The answer is similar to the use of STDEV.S/STDEV.P or VAR.S/VAR.P. PERCENTILE.EXC, which is a new function in Excel 2010, generally provides better results when we are using a sample from a larger dataset (where it’s unlikely that the sample would include the dataset’s zero and one-hundredth percentile values – and thus these percentiles are excluded). PERCENTILE.INC is better when we’re working with the entire dataset.

My rants of the day

Rant #1

I can’t help but end this post with a couple of rants. I’ll start with my biggest irritation in all of PowerPivot. WE CAN’T HIDE MEASURES. That’s bold, underlined, and highlighted. This situation is quite unfortunate. Imagine if you couldn’t prevent a VBA function from surfacing in the user interface (by declaring the function as private). Imagine if you couldn’t reuse routines in your development work, but had to rewrite them over and over. Imagine if you couldn’t validate complex expressions and improve performance by maintaining intermediate formulas. Without being able to hide measures from client tools, we are faced with all of the preceding issues. We can hide table columns, so why are measures any different? In Analysis Services, when you create a calculated measure, you have the option to have the measure hidden or displayed in client tools. Why does this logic make sense for calculated measures in AS, but not in PowerPivot?

Look at my field list for percentiles in figure 6 (some not mentioned in this post):

Figure 6 – PowerPivot Field List

Of all the measures shown, I want the client to see about half the number of measures in the list. The rest are intermediate results for development purposes only. To be honest, I don’t know which I find more baffling; that the development team can’t see the absolute necessity of this feature, or that I seem to be the only being on the planet that has a problem with this issue. Grrr. Unlike some very brilliant folks, I am unable to construct a half-page worth of a complex DAX formula, without the need to validate intermediate results.

On this matter, I think that I’m in good company. Flipping through the second edition of “The Microsoft Data Warehouse Toolkit,” written by authors from the highly regarded Kimball Group, I came across the following statement:

The best way to create calculations and measures in PowerPivot is to break long, complex formulas up into their component parts and build them incrementally. This allows you to experiment with the data and quickly see the results of your formulas.

Great advice, but incredibly, they don’t address how you are supposed to keep all of those intermediate results from the client’s view. The only solution currently available, and it’s a poor one from a performance and maintenance standpoint, is to merge the intermediate formulas into one mega-formula. Using perspectives don’t help either, since you can’t hide the original list of measures.

Rant #2

I’m glad to see that Denali includes TOPN, as this is a very useful function. But where is TOPNPERCENT? Can you name a product where these two functions don’t appear together, like Gemini or Siamese twins? Excel conditional formatting? Both there. Microsoft Access? Both there. T-SQL? Both there. MDX? Ditto. I find this level of inconsistency irritating. You can create a TOPN percent measure by using a formula like TOPN(COUNTROWS(Data)*N%), which is understandably trivial. Well, you can create a distinct count measure by using COUNT(DISTINCT), which is also trivial, but we now have DISTINCTCOUNT because creating distinct count measures are so common (and because the BI Pros all requested it J).

On the positive side, there is the “issue” of TOPN returning more than N records if there are ties in the Nth record. Microsoft’s products are somewhat varied in behavior. PowerPivot, Excel Conditional Formatting, and Access always return more than N values if there is a tie in the Nth value. MDX in Analysis Services never returns more than N values, so if there are ties in N, values are arbitrarily dropped (in contrast, SAS Analytics provides an option in MDX TopCount/BottomCount, which allows you to specify whether or not to return ties in N). T-SQL’s TOP predicate provides a WITH TIES qualifier to give you control over returning ties in value N. Personally, I see no logic in dropping tied values in the Nth record. Which records do you drop, and which record do you keep? Also, if there are ties in any of the values within the top N, all the ties are returned. Therefore, I think that PowerPivot, Excel and Access provide the correct logic.

Next…

In Part II, I will continue the discussion of percentiles using a slightly more complex example and demonstrate how to use percentile measures to create a box & whisker PivotChart!


DAX Autocomplete Stops Working–A Workable “Fix”

September 27, 2011

 
image

“There, good as new.”

By Now, You’ve Probably Seen This

So you’re typing along, writing a DAX measure.  And suddenly, poof!  Autocomplete stops working.

This happens to me all day, every day.  Especially when I use the [Measure]( filter expression ) syntax.  Here’s a quick “before and after” example:

PowerPivot autocomplete is working

Autocomplete is Working for Table/Column Names

PowerPivot autocomplete is NOT working

Same Table Name, But Autocomplete is NOT Working

Grab Your Duct Tape…

A guy named Scott Kaylie showed me this about six months ago and I am just now getting around to posting it.  Yes, my backlog of posts has in fact grown to be at least six months long Sad smile

Anyway, here’s the trick.  Just put a * in between the measure and the open parenthesis, as if you were going to multiply the measure by another expression:

Tricking autocomplete into working with *

OK, so then just go ahead and finish writing the measure, leaving the * in there:

Tricking autocomplete into working with *

And then, at the very end, go back and remove the *…

Tricking autocomplete into working with *

Voila.  Ugly yes, but much better than having no workaround.  Thanks Scott.

Any other questions please consult the following home repair manual:

From GraphJam


Detecting ALL() – Detecting When Not Filtered

September 20, 2011

 
image

 

“I like to carry it, you never know when you’re gonna need it.”

-The  much-missed John Candy as Uncle Buck

 

 

A technique that you may need someday

File this under “you may never need this, but when you do, you’ll know it immediately.”  I’d call this one a solid 4 on the DAX Spicy Scale.

Why I needed this:  I had two different calendar tables, one at the Date level and one that was a custom calendar, Periods table.  Most of the measures in this model are written to be used with one table or the other, and I never have to “cross the streams.”

But then I ran into a case where a measure I had written to be used with the Periods table, suddenly needed to also be used on a pivot that was only filtered by Dates.  And I didn’t want to write a new version of this measure (for reasons that are mostly irrelevant here).

Of course, when I put the Period-focused measure on a Date-focused pivot, and there were no fields from Periods on the pivot, well…  the Period-focused measure returned junk.

What I decided to do, then, was detect if Periods was the aptly-named “Sir Not Appearing in This Film” and then assign a Period in that case.  In other words, detect if the Periods table was not on the pivot, and if not, FORCE a Period value into the evaluation of my measure.

Detecting ALL(), or the Absence of Filter on a Field or Table

Here is the final measure formula I used, color-coded for identify its parts:

IF(COUNTROWS(Periods) = COUNTROWS(ALL(Periods))-1,
   CALCULATE(My Original Measure,
      FILTER(Periods, Periods[Period] = [LatestPeriod])
   ),
  
My Original Measure
)

Let’s go part-by-part:

The Detector

IF(COUNTROWS(Periods) = COUNTROWS(ALL(Periods))-1

The “detector” counts the rows of Periods in the current pivot context, and compares that to the number of rows in the Periods table with all filters removed by ALL().  Pretty straightforward right?

So…  why the –1 at the end?  The reasons for that are slightly academic…  academic enough that I don’t really want to understand in great depth.  Let’s go with  the short version:  in this case, when I counted the rows of ALL(Periods), ALL() was kind enough to include the “blank” row of the Periods table.

What’s that?  You say you don’t HAVE a blank row in your Periods table???  Well, neither do I.  But you MIGHT have some rows in your Sales table that have a blank value for the [PeriodID] column.  Or maybe you have rows in the Sales table that DO have a [PeriodID] value, but that value does not appear in your Periods table.  Either way, you implicitly DO have blanks in your Periods table, and COUNTROWS(ALL()) decides to tell you about it.  So you’ll subtract one and like it, soldier!

(I have not tested, honestly, whether you always need – 1 in this detector.  If your Sales table is perfectly clean, maybe the – 1 is not needed, and maybe it is.  Someone let me know OK?  And I bet three-to-one that the answer comes from Italy.)

The Original Measure

My Original Measure

OK, this part IS straightforward.  Whatever my original measure was named, or perhaps its full original formula, appears here.  Moving on…

The Filter for the “No Periods Selected” Case

FILTER(Periods, Periods[Period] = [LatestPeriod])

OK, in the case that Periods is absent from the pivot, I take the original measure and then use the FILTER function to pretend that the Periods[Period] column IS on the pivot, and filtered to a value matching the most recent period, as calculated by my [LatestPeriod] measure.

But really, this part is going to be VERY different based on the circumstance.  Maybe you want to use a completely different measure, for instance.  Or set Periods to the first period this month.  Or the period corresponding to the current filter context from the Date table.

I merely included the “meat” of what I did here to drive home the intent.

Two Notes

One – note that there is NO difference between “the Periods table is not used on the pivot at all” and “the Periods table IS on the pivot but unfiltered in the current context.”  So if you have Periods on a slicer but nothing is selected, the detector will “go off.”  And if you have Periods on rows, the detector will still “go off” in the grand total cell of the pivot (and maybe in certain subtotal cells as well).

Two – I was messing around in this area when I discovered the need for the long-simmering Precedence Project.  In other words, when you start messing around with overriding filter context like I did with the FILTER() function above, and you’ve got a number of tables and relationships in play, every now and then you see something you don’t expect.  For that reason, I plan to return to the Precedence Project shortly.


The Incredible “Iffer-Blanker-Filter” Measure

September 6, 2011

Replica of Shawshank Postcard

“Not long after the warden deprived us of his company, I got a postcard in the mail.
It was BLANK(), but the postmark said… Fort Hancock, Texas.”

Omitting/Filtering Out Pivot Rows Based on an IF()

I’ve had this on my list to share for a long time, but David Churchward’s recent guest post bubbled it to the top.  At one point in that post, he had a nested IF measure that basically looked like this:

IF( <Non-Trivial Conditional Test>,
[Return an Existing Measure],
BLANK()
)

I find myself doing something like that once a week or so.  You just want the pivot to return certain rows – customers of a certain profit margin in David’s case, but it can be stores whose sales reflect certain characteristics, etc.

If it was just a static filter criteria, like “products that are blue,” then you wouldn’t need to put an IF in your measure at all.  You’d just set a filter in the pivot itself and use normal measures.

The key about these examples is that the filter criteria is dynamic – it takes into account selections made on slicers for instance.  And only measure logic can be dynamic like that.

But it’s Very Tedious to Write a Bunch of Measures Like That

OK, so you’ve written one measure that returns, say, [Sales] if the criteria is met, and BLANK() otherwise.  But your pivot needs to contain 6-7 different measures.  Do you want to go write that ugly IF measure another 6 times?  Doubtful.  And it would really suck if, sometime later, you realized that you needed to change the filter criteria.

David faced this problem in his post, and decided to write a single “flag” measure that returned 0 or 1 based on the conditional test, and then added a “Value” filter to the pivot based on that measure:

Pretty clever.  That way he doesn’t have to write the conditional test over and over in multiple new measures.  And given that it’s a “value” filter, it IS dynamic.

It caught my eye, because I’ve been doing this a different way.

My Approach – A “Flag” Measure that Returns 1 or BLANK()

I start out much the same way, with a single “flag” measure that evaluates the conditional test using an IF(), and that returns 1 if the condition is true.  But when it’s false, it returns BLANK() instead of 0:

[IfferBlanker]=
IF( <Non-Trivial Conditional Test>,
1,
BLANK()
)

Next, with my approach, I DO have to create multiple measures, which is a drawback.  However, each of these measures is very simple, like this:

[Filtered Sales]=
[Sales] * [IfferBlanker]

[Filtered Profit Margin]=
[Profit Margin] * [IfferBlanker]

Those resulting measures then return blanks if the condition isn’t met, and if every measure on a row returns blank, the pivot doesn’t show it (unless you change the default setting for blanks).  So it has the same end result as David’s approach.

Which Method is Superior?

I don’t think my way is better.  I plan to use David’s trick a lot actually – it’s a lot quicker to write one measure and apply a value filter than to write 6 or 7 measures, even if those measures are simple.  But I do think both methods have their usefulness.

For instance, if the report consumers are prone to fiddling, they may remove the value filter, perhaps even accidentally, and not realize that they are now seeing customers (or stores etc.) that do NOT fit the criteria.  And if the consumers of the model are  building their own pivots from scratch, they may not know how you applied the filter.  You may also sometimes want to mix and match filtered and unfiltered measures in a single pivot, which the value filter approach doesn’t support.  And if you’re going to build a LOT of pivots that require this filtering, investing in a handful of filtered measures may pay off in terms of effort, too.

I think I’m going to start using David’s approach about half the time, and stick with “Iffer-Blanker-Filter” measures for the other half, based on the situation.  Always good to have multiple tricks Smile


Guest Post – Greater Than/Less Than/Between Slicers

September 2, 2011

 
I get a reasonable number of questions in email each week, and rarely have time to respond to all of them.  Last week though I received a question that I couldn’t resist – it was just too “close to home” for me, on a theme that I like to call “with DAX, you can make slicers do ANYTHING.”  It reminded me a lot of the technique I used in the A/B Campaign Analysis post, but in a more generalized sense.

Pressed for time, I just dashed off a few hints in a reply and hoped that would be enough.  Well the bright dude on the other end took about 10 minutes to digest the hints and solve his problem, and then asked if there was something he could do to pay me back.  I asked if he’d be willing to write up the whole solution as a guest post, he said yes, and later that same day I received the following.

Slicing Data With Greater Than, Less Than, or Between

Guest Post by David Churchward

With PowerPivot’s ability to digest such huge datasets, it’s easy to get carried away and present dashboards that display too much information for users to digest.  Having fallen into this trap, I also found that as usage increases, users come up with more and more exceptions that need to be handled by the dashboard to present their required view on life.

Here’s a request I get all the time:  “I just want to see customers in my report if they have a margin pct greater than X – I don’t want to see any other customers.”  And of course, that also takes the form of “customers with margin less than Y” and “customers with margin between X and Y.”

In order to solve this issue, I started by formulating “Customer Sets” where I would condition Gross Margin bands, for example, Negative Margin, Margin < 5% and so on. I then realised that whichever sets I came up with, one of our users would inevitably find a requirement for a new one.  As a result, I worked on passing all of the onus onto the users by giving them a series of parameters from which they could select their own datasets.

The outcome was a series of slicers as below:

clip_image002

This gives users the means to select customers that fit within parameters that are within their control.

How It’s Done

Firstly three tables need to be created and then linked through to PowerPivot

clip_image003

The first table, let’s call it “ParameterType” contains three entries each with a numeric code associated to it. This code is simply to make the measures that we will create easier to write, less prone to typo errors and allows us to use “Max” or “Min” functions to ensure that we evaluate to one result and avoid that fantastic error message of “…value cannot be determined in the current context…..”.

The second and third tables, let’s call them “Parameter X” and “Parameter Y”, simply contain a series of GM percentages together with a Percentage Name.  The Percentage Name s are simply used for aesthetic reasons on the slicer as users don’t normally want to see 0.05 to represent 5%.

Once the tables are in place, there’s no need to link them together.  We’re simply going to use them as a means to collect the users parameters.

We now need to create 3 measures to determine which items have been selected by the user and make them available to our ultimate Gross Margin measure.  These are as follows:

GM_Parameter_Type  =MAX(ParameterType[GM_Parameter_Code])

Min_GM_Param_X =MIN(ParameterX[GM_Param_X])

Max_GM_Param_Y =MAX(ParameterY[GM_Param_Y])

This is all straight forward enough, but then I realised that I have to accommodate the fact that users are unpredictable and could select a parameter X that is higher than parameter Y.  In addition, because PowerPivot refreshes for each slicer operation, it will be too easy to enter a backwards range.  Therefore, we simply have to deal with it using two new measures

Selected_Max_PC

= IF(
    [Min_GM_Param_X] > [Max_GM_Param_Y],
    [Min_GM_Param_X],
    [Max_GM_Param_Y]
 
  )

Selected_Min_PC

= IF(
    [Max_GM_Param_Y] < [Min_GM_Param_X],
    [Max_GM_Param_Y],
    [Min_GM_Param_X]
 
  )

We’re now ready to construct our measures to deliver the correct Gross Margin values filtered by the ranges selected in our slicers.  In this example, my Gross Margin transactions are held in a table called FACT_TRANS and I have two measures called Margin and Marg_PCT (Gross Margin %).  I need three measures to evaluate each of the potential parameter types that could be selected – “Between X% and Y%” (measure called GM_BetweenXandY), “Greater Than X%” (measure called GM_GreaterThanX) and “Less Than Y%” (measure called GM_LessThanY).

GM_BetweenXandY

=IF(
[MARG_PCT]>=[Selected_Min_PC] &&
[Marg_PCT]<=[Selected_Max_PC],
[Margin],
BLANK()
  
)

GM_GreaterThanX

=IF(
[Marg_PCT]>=[Min_GM_Param_X],
[Margin],
BLANK()
  
)

GM_LessThanY

=IF(
[Marg_PCT]<=[Max_GM_Param_Y],
[Margin],
BLANK()
   )

We now have our three measures that can be called depending on the users selection in the ParameterType slicer.  The measure that does this is

Marg_XY

=if(
   [GM_Parameter_Type]=1,
   [GM_GreaterThanX],
   If(
    
[GM_Parameter_Type]=2,
     [GM_BetweenXandY],
     If(
       [GM_Parameter_Type]=3,
       [GM_LessThanY],
       BLANK()
       )

     )

   )

My dashboard is using GM% to filter those customers to be shown.  However, I want to show the Sales value as well and I want the filter to apply to those measures as well.  Therefore, I have two choices:

1. Construct further measures similar to Marg_XY but representing the sales measure (that returns BLANK() in cases that I don’t want it to show up) or

2. Create a flag to highlight the records to be shown and then filter on that flag.

I’m going to opt for number 2.  The measure used to create the flag is as follows:

Marg_XY_FLAG

=IF(FACT_TRANS[Marg_XY]=BLANK(),0,1)

I then set a simple value filter for Marg_XY_FLAG = 1 on the pivot table.

clip_image005

The slicers used don’t need cross filtering as they stand in isolation.  For performance sake, don’t forget to deselect the “Visually indicate items with no data” and “Show items with no data last” as shown below

clip_image007

The result when included in my customer sales analysis, subject to a bit of tidying up, is shown below.  I’ve used a random code to disguise Customer Names.

clip_image008

With 1.2million records in my fact table, the refresh rate on each slicer operation is under 1 second.  This would be even quicker if I had a linked customer dimension table but as I’m using random values I didn’t consider this worthwhile.


Distinct Count in PowerPivot v2 – MUCH Faster

July 28, 2011

A quick update, as tonight I managed to sit down and experiment a bit with the V2 Beta for the first time in many days.

(OK, actually, I am standing up…  because I have a fantastic new treadmill desk and have walked nearly 8 miles today while working, but that’s a topic for another post).

I was working with one of our HostedPowerPivot clients today who was observing slower-than-expected slicer click performance in one of their reports.  And after some sleuthing, I found that they had a “distinct count” measure in their model, something like this:

[Measure] = COUNTROWS(DISTINCT(Table[Column]))

Now, that’s a pretty useful measure in many situations, like “does product X sell every day?”  Simply taking a count of the date column doesn’t cut it, because if it sells twice on one day and not at all the next day, the count is still 2, and you want it to be 1.

Trouble is, in PowerPivot v1, that operation can be slow.  I won’t bore you with all the details, because frankly, I don’t know all of them, heh heh.  But I do know that PowerPivot v1 does end up creating all kinds of temporary new tables in memory behind the scenes when it is evaluating the measure.

Anyway, I’d heard a long time ago that PowerPivot V2 was going to make distinct counts a lot faster, so I thought I’d try it out.

The Test Case

First I needed to create a test case that was challenging.  I don’t want to compare “fast query vs. fast query” because that often gets skewed by fixed overhead.  Much better to test a “worst case scenario.”

So, I took a 15 million row table, and added a calculated column to it.  Calculated columns in PowerPivot are much less compressed than imported columns, and are therefore more challenging for the measure engine to scan through, looking for duplicates etc. which is required for a distinct count.

And then, to make matters worse, I just used the RAND() function so that there are many unique values:

image

Cutting to the Chase:  The Results

A distinct count measure in PowerPivot v1 took 35 seconds to complete in my test pivot.  This was a big part of why our client was seeing slow perf today.

That same measure, in that same pivot, took less than a second in the V2 beta.

Wow.  This is gonna be nice.

One Last Note:  Two Ways to Write the Formula

PowerPivot v1 did not have a dedicated function for distinct count, you had to do the countrows of distinct thing like what I showed above.

But in PowerPivot v2, there IS a DISTINCTCOUNT() function.  So you can write your distinct count measures in two ways:

[Measure] = COUNTROWS(DISTINCT(Table[Column]))

[Measure] = DISTINCTCOUNT(Table[Column])

That new function is nice, makes for a more readable formula.  BUT…  the performance is still the same.  Either way I wrote it, the measure was sub-second fast.  Very nice.


Guest Post: TOPN in PowerPivot V2

July 20, 2011

Well folks I haven’t had much time to play with the V2 Beta (aka CTP3) yet.  A lot going on over the past week:

  1. Authoring a brand new suite of models and reports for our Duane Reade DR-Direct solution
  2. A nasty crashing bug in PowerPivot for SharePoint that we’ve now learned to catch and repair
  3. I submitted a PowerPivot article to CIMA Insight magazine (for August)
  4. Running refresh tests with PowerPivot servers in Chicago pulling data from SQL in Dallas (over Internet VPN – more on this later) for the HostedPowerPivot offering

So in the meantime, David Hager has a short article to share on the new TOPN function.  I haven’t had a chance to try it yet, and I’m a little surprised that it doesn’t have special handling for ties (as he’s discovered).

David Hager on TOPN

Using the New PowerPivot Version 2 TOPN Function: Up, Down and Between

One of the new DAX functions included in the Denali PowerPivot CTP3 release is the TOPN function. This function returns a table which can be aggregated for the top n values. Prior to the release of this function, workarounds were available to achieve the same results, mainly through DAX ranking formulas (and Denali PowerPivot now has native functions for ranking too!). So, a typical DAX formula which uses the TOPN function is:

=SUMX(TOPN(N,SalesTable,SalesTable[SalesTotal]),SalesTable[SalesTotal])

where SalesTable is the table and SalesTotal is the column in that table containing the top n results to be analyzed.

Well, that was easy enough. However, there is a systemic problem here and it’s mainly data-related. In order to understand why the TOPN function don’t work the way you might expect, read the following that comes from the Denali PowerPivot CTP3 help documentation on TOPN.

“If there is a tie, in order_by values, at the Nth row of the table, then all tied rows are returned. Then, when there are ties at the Nth row the function might return more than n rows.”

In English, that means if there are duplicate values in your data, the TOPN function will return the WRONG answer if the Nth value is adjacent to duplicates. In order to get a meaningful result then, the data must be converted so that each data point has a unique value. The question is, how do you do that and retain the integrity of the data? An Excel trick can be used here as a workaround. This method requires that each value be incremented by a very small but unique value. The best implementation of this comes in the form of a calculated column with the following formula

SalesTable[UniqueIncremValue] = 
IF(RANDBETWEEN(-1,1)=-1,-RAND()*2,RAND())*0.000001

The goal of this formula is to create a column of unique numbers add up basically to zero. There is a balance that is required in this formula between affecting data and insuring that all values in the dataset are unique. The RAND function returns a 15-place decimal, which is the limit in Excel. However, if all of the decimal places were used, the value of the number could be as high as 1 or as low as -2. By multiplying the random numbers by a very small number, the numbers will not interfere with the individual values if they themselves are not small numbers. However, if the multiplier is made too small, not all of the random numbers may be unique depending on the size of the dataset. So, if you are going to use this workaround, you need to test [UniqueIncremValue] to see if all of the values are unique. The best way to do this is to compare its distinct count with the distinct count of a primary key column in the table. If they are equal, TOPN will work for any value of N.

Now, the original formula will work fine if the new calculated column is used in place of SalesTable[SalesTotal].

SalesTable[SalesTotalMod]=
SalesTable[SalesTotal]+ SalesTable[UniqueIncremValue]

It should also be noted that the TOPN function can used in a DAX like the one shown below to afford a BOTTONM result. Note that this formula takes advantage of the new DISTICTCOUNT function.

=SUMX(SalesTable,[SalesTotalMod])
-SUMX(
     TOPN(
         DISTINCTCOUNT(SalesTable[SalesTotalMod])- N,
         SalesTable,
         SalesTable[SalesTotalMod]
         ),
     SalesTable[SalesTotalMod]
    
)

I think you can see that the TOPN function is a valuable new item for your DAX toolbox. Have fun!