Creating Accurate Percentile Measures in DAX – Part I

September 29, 2011

Guest Post By Colin Banfield

29th September, 2011

Equivalents of Excel’s Percentile, Quartile, and Median functions are perhaps the most significant omissions in Denali’s DAX statistical function library. Quartile and Median are actually special cases of percentile, and in this post, we calculate these special cases. Some of the most insightful BI analysis involves the use of statistics, and PowerPivot provides an excellent environment for developing statistical measures. Perhaps percentile and its cohorts will be considered for inclusion in PowerPivot V3.

Percentile measures are not difficult to create in DAX, but the process is also not trivial. Adding to the complexity is the need to interpolate values if you want to duplicate the accuracy of Excel’s percentile functions. My approach to creating percentile measures in PowerPivot is as follows:

  1. Calculate the ascending rank of values in the dataset under exploration.
  2. Calculate the rank corresponding to required percentile. I use the formula n=P(N-1)/100+1, where n is the rank for a given percentile P, and N is the number of rows in the dataset. This formula provides the rank corresponding to Excel’s PERCENTILE or PERCENTILE.INC function. For a calculation equivalent to Excel’s PERCENTILE.EXC, the rank for a given percentile can be found by using the formula n=P(N+1)/100. In most cases, the rank calculated will not be an integer. Therefore, for a generalized solution, we must use the integer rank below and above the calculated value and interpolate. For example, the first formula in (1) calculates a rank of 11.25 for P=25 (25th percentile) and N=42. We must therefore interpolate between the dataset ranks of 11 and 12.
  3. Find the data values corresponding to the integer ranks derived from step 2. For convenience, let’s call these values PPercentileDown and PPercentileUp.
  4. Calculate the percentile using linear interpolation.

Let’s start with a simple dataset to illustrate the concepts. In figure 1, the original dataset is in the table on the left. The corresponding PowerPivot PivotTable is on the right.

Figure 1- Original Dataset

Step 1 – Calculate the ascending rank of values in the dataset

Fortunately, the newly minted RANK function in Denali makes the calculation a lot easier than the rank formula we had to create in V1. The rank formula is simply:

=RANKX(ALL(Data),[Sum of Value],,1)

Figure 2 – Rank Measure

Step 2 – Calculate the rank corresponding to required percentile

We will first calculate the rank corresponding to the 25th percentile (inclusive) in the dataset. Recalling the formula n=P(N-1)/100+1, this translates into the following DAX formula:

=CALCULATE(

      (COUNTROWS(Data)-1)*25/100+1,


      ALL(Data)

          )

In this case, the calculated rank is 11.25.

Figure 3 – Rank of 25th percentile

Step 3 – Find the data values corresponding to the integer ranks derived from step 2

Our 25th percentile value is somewhere between the values corresponding to the 11th and 12th rank, as shown in figure 3. We need formulas to extract these values as measures. The 25thPercentileDown formula is:

=MAXX(

      FILTER(

             ALL(Data),

             [Rank] = ROUNDDOWN([25PctRank_INC],0)

            ),

      [Sum of Value]

     )

We are returning the Max of Sum of Value, filtered so that [Rank] is equal to the rank of the 25th percentile, rounded down to the nearest integer rank. In other words, we are filtering the table to return only values where [Rank] = 11. In the case of ties, there will be more than one value in the filtered table (all identical). This is the reason for using MAXX. However, MINX, or even AVERAGEX would calculate the same result.

For 25thPercentileUp, you may be inclined to use a similar formula, substituting the ROUNDDOWN function for ROUNDUP. However, in the event of ties for the 25th percentile, the filter will be incorrect. This is so because [Rank] will calculate the same rank for ties (11 in this case), and ROUNDUP([25PctRank_INC],0)will be = 12. We can get around the problem by filtering the table using the TOPN function instead:

=MAXX(

      TOPN(

           ROUNDUP([25PctRank_INC],0),

           ALL(Data),

           [Sum of Value],

     1

          ),

     [Sum of Value]

    )

The formula simply finds the max of the top 12 values in the dataset (i.e. 11.25 rounded up to 12). The result is not affected by ties in the rank. Figure 4 shows the resultant measures.

Figure 4 – Rankup and Rankdown

Step 4 – Calculate the percentile using linear interpolation

Now that we have the values between which we need to interpolate to find the actual percentile, we use these values in a simple interpolation formula:

=[25thPercentileDown]+([25thPercentileUp]-[25thPercentileDown])*([25PctRank_INC]-ROUNDDOWN([25PctRank_INC],0))

= 1891.07 + (2073.29-1891.07)*(11.25-11) = 1891.07 + 182.22*0.25 = 1936.63

In figure 5, I’ve completed the calculations for the 50th percentile and 75th percentile and removed intermediate formulas from the PivotTable.

Figure 5 – Percentiles

Figure 5 illustrates that that values we calculated are identical to values calculated by using Excel’s PERCENTILE.INC function. Note that:

  • The 25th percentile is also the first quartile
  • The 50th percentile is the median (i.e. the same value returned by Excel’s MEDIAN function)
  • The 75th percentile is also the third quartile

Alternative median calculation

If we need to calculate the median without regard for percentiles, we could use the following formula:

=IF(

    MOD(

        COUNTROWS(ALL(Data)),

        2

  )=0,

    AVERAGEX(

             TOPN(

                  2,

                  TOPN(

                       (COUNTROWS(ALL(Data))/2)+1,

                       ALL(Data),

                       [Sum of Value],

                       1

                      ),

                  [Sum of Value]

                 ),

             [Sum of Value]

            ),

   MAXX(

        TOPN(

             COUNTROWS(ALL(Data))/2,

             ALL(Data),

             [Sum of Value],

            ),

        [Sum of Value]

       )

   )

Essentially, we want to return the value in the middle of the range, after sorting the range in ascending order. The calculation we use depends on whether there are an even or odd number of values in our dataset. The first part of the formula, MOD(COUNTROWS(ALL(Data)),2)=0, determines if the dataset has an even number of values. I would have preferred to use ISEVEN (and thus eliminate an extra function call), but alas, Excel’s ISEVEN and ISODD functions didn’t make it into DAX.

If the number of values is even, we need to get the top value in the first half of the dataset, plus the next higher value (because we have to average these two values to get the median). We accomplish this goal in the following part of the formula:

TOPN(

     (COUNTROWS(ALL(Data))/2)+1,

     ALL(Data),

     [Sum of Value],

     1

    )

Based on our sample dataset, the preceding formula will return a table with values where ID=1 to ID=22. Next, we get the top 2 values from this new table (ID=21 and ID=22). To accomplish this task, we wrap another TOPN function around the new table and sort in descending order:

TOPN(
     2,

     TOPN(

          (COUNTROWS(ALL(Data))/2)+1,

          ALL(Data),

          [Sum of Value],

          1

         ),

    [Sum of Value]

   ),

Finally, we average the two values:

AVERAGEX(

         TOPN(

              2,

              TOPN(

                   (COUNTROWS(ALL(Data))/2)+1,

                   ALL(Data),

                   [Sum of Value],

                   1

                  ),

            [Sum of Value]

           ),

        [Sum of Value]

       ),

If the number of values in the dataset is odd, we just return the middle number, using the formula in the final argument of IF:

MAXX(

     TOPN(

          COUNTROWS(ALL(Data))/2,

          ALL(Data),

          [Sum of Value],

         ),

    [Sum of Value]

   )

Note that TOPN rounds a number up or down arithmetically, if the first argument is a decimal value. Dividing an odd number by 2 will therefore always round the number to the next higher value.

Percentile INC vs. EXC

At the beginning of this post, I mentioned that to compute percentiles corresponding to Excel’s PERCENTILE.EXC, you use the formula n=P(N+1)/100 to get the rank for a given percentile. When, therefore, should you use one versus the other? The answer is similar to the use of STDEV.S/STDEV.P or VAR.S/VAR.P. PERCENTILE.EXC, which is a new function in Excel 2010, generally provides better results when we are using a sample from a larger dataset (where it’s unlikely that the sample would include the dataset’s zero and one-hundredth percentile values – and thus these percentiles are excluded). PERCENTILE.INC is better when we’re working with the entire dataset.

My rants of the day

Rant #1

I can’t help but end this post with a couple of rants. I’ll start with my biggest irritation in all of PowerPivot. WE CAN’T HIDE MEASURES. That’s bold, underlined, and highlighted. This situation is quite unfortunate. Imagine if you couldn’t prevent a VBA function from surfacing in the user interface (by declaring the function as private). Imagine if you couldn’t reuse routines in your development work, but had to rewrite them over and over. Imagine if you couldn’t validate complex expressions and improve performance by maintaining intermediate formulas. Without being able to hide measures from client tools, we are faced with all of the preceding issues. We can hide table columns, so why are measures any different? In Analysis Services, when you create a calculated measure, you have the option to have the measure hidden or displayed in client tools. Why does this logic make sense for calculated measures in AS, but not in PowerPivot?

Look at my field list for percentiles in figure 6 (some not mentioned in this post):

Figure 6 – PowerPivot Field List

Of all the measures shown, I want the client to see about half the number of measures in the list. The rest are intermediate results for development purposes only. To be honest, I don’t know which I find more baffling; that the development team can’t see the absolute necessity of this feature, or that I seem to be the only being on the planet that has a problem with this issue. Grrr. Unlike some very brilliant folks, I am unable to construct a half-page worth of a complex DAX formula, without the need to validate intermediate results.

On this matter, I think that I’m in good company. Flipping through the second edition of “The Microsoft Data Warehouse Toolkit,” written by authors from the highly regarded Kimball Group, I came across the following statement:

The best way to create calculations and measures in PowerPivot is to break long, complex formulas up into their component parts and build them incrementally. This allows you to experiment with the data and quickly see the results of your formulas.

Great advice, but incredibly, they don’t address how you are supposed to keep all of those intermediate results from the client’s view. The only solution currently available, and it’s a poor one from a performance and maintenance standpoint, is to merge the intermediate formulas into one mega-formula. Using perspectives don’t help either, since you can’t hide the original list of measures.

Rant #2

I’m glad to see that Denali includes TOPN, as this is a very useful function. But where is TOPNPERCENT? Can you name a product where these two functions don’t appear together, like Gemini or Siamese twins? Excel conditional formatting? Both there. Microsoft Access? Both there. T-SQL? Both there. MDX? Ditto. I find this level of inconsistency irritating. You can create a TOPN percent measure by using a formula like TOPN(COUNTROWS(Data)*N%), which is understandably trivial. Well, you can create a distinct count measure by using COUNT(DISTINCT), which is also trivial, but we now have DISTINCTCOUNT because creating distinct count measures are so common (and because the BI Pros all requested it J).

On the positive side, there is the “issue” of TOPN returning more than N records if there are ties in the Nth record. Microsoft’s products are somewhat varied in behavior. PowerPivot, Excel Conditional Formatting, and Access always return more than N values if there is a tie in the Nth value. MDX in Analysis Services never returns more than N values, so if there are ties in N, values are arbitrarily dropped (in contrast, SAS Analytics provides an option in MDX TopCount/BottomCount, which allows you to specify whether or not to return ties in N). T-SQL’s TOP predicate provides a WITH TIES qualifier to give you control over returning ties in value N. Personally, I see no logic in dropping tied values in the Nth record. Which records do you drop, and which record do you keep? Also, if there are ties in any of the values within the top N, all the ties are returned. Therefore, I think that PowerPivot, Excel and Access provide the correct logic.

Next…

In Part II, I will continue the discussion of percentiles using a slightly more complex example and demonstrate how to use percentile measures to create a box & whisker PivotChart!


Pivotstream is Hiring – SSIS/SQL Pros

September 28, 2011

 
image

***UPDATE***:  We have filled all available positions for the moment.  We may open new positions in the future of course, but also, we expect to be in a position to refer business to 3rd party SSIS/SQL professionals, so please still feel free to submit resumes in the meantime.

At a high level, we have two separate but related businesses at Pivotstream: 

  1. “Full-Service” BI Subscriptions – an example of which is covered in a blog post and then as a Microsoft case study.
  2. Self-Service PowerPivot Environments – including low-cost, optimized hosting as well as onsite training and software.

Both revolve around PowerPivot (and, in the near future, Tabular BISM). 

Both require a lot of backend SQL work to ingest, shape, and prepare data before it can be optimally consumed in PowerPivot.

And both businesses are booming.

If you are a SQL/SSIS professional with five years or more experience, have a friendly and flexible attitude, and want to witness firsthand the transformation of the BI industry, please drop us a resume:

info@pivotstream.com

Relocation is not required – we are very “telecommute friendly.”  We do ask that you be a US resident, however.


Slides from SQL Sat 85

September 27, 2011

 
Had a great time this past weekend in Orlando, speaking at SQL Saturday 85. 

Slides from the event are uploaded here.


DAX Autocomplete Stops Working–A Workable “Fix”

September 27, 2011

 
image

“There, good as new.”

By Now, You’ve Probably Seen This

So you’re typing along, writing a DAX measure.  And suddenly, poof!  Autocomplete stops working.

This happens to me all day, every day.  Especially when I use the [Measure]( filter expression ) syntax.  Here’s a quick “before and after” example:

PowerPivot autocomplete is working

Autocomplete is Working for Table/Column Names

PowerPivot autocomplete is NOT working

Same Table Name, But Autocomplete is NOT Working

Grab Your Duct Tape…

A guy named Scott Kaylie showed me this about six months ago and I am just now getting around to posting it.  Yes, my backlog of posts has in fact grown to be at least six months long Sad smile

Anyway, here’s the trick.  Just put a * in between the measure and the open parenthesis, as if you were going to multiply the measure by another expression:

Tricking autocomplete into working with *

OK, so then just go ahead and finish writing the measure, leaving the * in there:

Tricking autocomplete into working with *

And then, at the very end, go back and remove the *…

Tricking autocomplete into working with *

Voila.  Ugly yes, but much better than having no workaround.  Thanks Scott.

Any other questions please consult the following home repair manual:

From GraphJam


Mr. Excel Makes His HostedPowerPivot Site Public

September 22, 2011

 
image

PowerPivot Workbooks as Interactive Web Apps – Try it Yourself!

Bill Jelen and I talk a lot about a few recurring topics.  One of our favorites:  it’s one thing to be told that PowerPivot workbooks can be transformed into web applications with a simple Save As, and quite another thing to see it for yourself.

I don’t know why there’s a difference, but there is – you’re looking at something in your browser, it’s an interactive application, and then you remind yourself, “oh yeah, I built this in Excel,” and a light bulb comes on.  I see that effect every day when I show people for the first time.

Try Out Bill’s Apps, and Send in Your Own

To test that on a broader scale, Bill decided to take his site on HostedPowerPivot.com and open it up to the community, which I think is really quite cool.

Right now he only has two applications shared out publicly – he’s going to let YOU send in workbooks that he will review and publish.  He’s also got some PowerPivot gear to give away.

Watch the short video here for a quick tour, explanation, and instructions on submitting workbooks

Then, go check out the site itself:

http://mrexcel.hostedpowerpivot.com

Username:  MRXL@pivotstream.com
Password:  P0werP!vot

NOTE:  that first “0” in the password is a zero, not a letter.

Have fun!


Detecting ALL() – Detecting When Not Filtered

September 20, 2011

 
image

 

“I like to carry it, you never know when you’re gonna need it.”

-The  much-missed John Candy as Uncle Buck

 

 

A technique that you may need someday

File this under “you may never need this, but when you do, you’ll know it immediately.”  I’d call this one a solid 4 on the DAX Spicy Scale.

Why I needed this:  I had two different calendar tables, one at the Date level and one that was a custom calendar, Periods table.  Most of the measures in this model are written to be used with one table or the other, and I never have to “cross the streams.”

But then I ran into a case where a measure I had written to be used with the Periods table, suddenly needed to also be used on a pivot that was only filtered by Dates.  And I didn’t want to write a new version of this measure (for reasons that are mostly irrelevant here).

Of course, when I put the Period-focused measure on a Date-focused pivot, and there were no fields from Periods on the pivot, well…  the Period-focused measure returned junk.

What I decided to do, then, was detect if Periods was the aptly-named “Sir Not Appearing in This Film” and then assign a Period in that case.  In other words, detect if the Periods table was not on the pivot, and if not, FORCE a Period value into the evaluation of my measure.

Detecting ALL(), or the Absence of Filter on a Field or Table

Here is the final measure formula I used, color-coded for identify its parts:

IF(COUNTROWS(Periods) = COUNTROWS(ALL(Periods))-1,
   CALCULATE(My Original Measure,
      FILTER(Periods, Periods[Period] = [LatestPeriod])
   ),
  
My Original Measure
)

Let’s go part-by-part:

The Detector

IF(COUNTROWS(Periods) = COUNTROWS(ALL(Periods))-1

The “detector” counts the rows of Periods in the current pivot context, and compares that to the number of rows in the Periods table with all filters removed by ALL().  Pretty straightforward right?

So…  why the –1 at the end?  The reasons for that are slightly academic…  academic enough that I don’t really want to understand in great depth.  Let’s go with  the short version:  in this case, when I counted the rows of ALL(Periods), ALL() was kind enough to include the “blank” row of the Periods table.

What’s that?  You say you don’t HAVE a blank row in your Periods table???  Well, neither do I.  But you MIGHT have some rows in your Sales table that have a blank value for the [PeriodID] column.  Or maybe you have rows in the Sales table that DO have a [PeriodID] value, but that value does not appear in your Periods table.  Either way, you implicitly DO have blanks in your Periods table, and COUNTROWS(ALL()) decides to tell you about it.  So you’ll subtract one and like it, soldier!

(I have not tested, honestly, whether you always need – 1 in this detector.  If your Sales table is perfectly clean, maybe the – 1 is not needed, and maybe it is.  Someone let me know OK?  And I bet three-to-one that the answer comes from Italy.)

The Original Measure

My Original Measure

OK, this part IS straightforward.  Whatever my original measure was named, or perhaps its full original formula, appears here.  Moving on…

The Filter for the “No Periods Selected” Case

FILTER(Periods, Periods[Period] = [LatestPeriod])

OK, in the case that Periods is absent from the pivot, I take the original measure and then use the FILTER function to pretend that the Periods[Period] column IS on the pivot, and filtered to a value matching the most recent period, as calculated by my [LatestPeriod] measure.

But really, this part is going to be VERY different based on the circumstance.  Maybe you want to use a completely different measure, for instance.  Or set Periods to the first period this month.  Or the period corresponding to the current filter context from the Date table.

I merely included the “meat” of what I did here to drive home the intent.

Two Notes

One – note that there is NO difference between “the Periods table is not used on the pivot at all” and “the Periods table IS on the pivot but unfiltered in the current context.”  So if you have Periods on a slicer but nothing is selected, the detector will “go off.”  And if you have Periods on rows, the detector will still “go off” in the grand total cell of the pivot (and maybe in certain subtotal cells as well).

Two – I was messing around in this area when I discovered the need for the long-simmering Precedence Project.  In other words, when you start messing around with overriding filter context like I did with the FILTER() function above, and you’ve got a number of tables and relationships in play, every now and then you see something you don’t expect.  For that reason, I plan to return to the Precedence Project shortly.


SQL Saturday 85, Orlando FL, September 24

September 16, 2011

 
Just a quick note:  I will be in Orlando next weekend, September 24, speaking on PowerPivot.

It’s a free event, and you don’t even have to be a SQL pro.  If you’d like to drop in just for my session and then take off, you can, but there are definitely other sessions worth checking out even if you’re not a SQL pro.

Anyway, if you’re in the Orlando area and want to come out, I’d love to meet you.

Event link:  http://sqlsaturday.com/85/eventhome.aspx

Also note that I will be in Tampa, presenting multiple sessions on PowerPivot, on November 5th.


Profit & Loss–The Art of Cascading Subtotals Workbook

September 15, 2011

For those of you who want to look into “The Art of Cascading Subtotals” further, you’ll no doubt find the workbook useful.  I’ve added a link into the original article, but for those of you who have read through and didn’t find the workbook first time around (because it wasn’t there until the first request came in………..), YOU CAN DOWNLOAD THE WORKBOOK HERE.


Profit & Loss–The Art of the Cascading Subtotal

September 15, 2011

I occasionally look through the logs of what people were Googling (or Binging etc.) when they found their way to the blog.  Turns out that there’s a lot of accounting and finance related topics.  And while I know my way around Excel and PowerPivot pretty well, I have never directly worked in those industries.  I could never, ever, produce a credible post on something like a Balance Sheet, or P&L, or PB&J.  (OK, actually, I can handle that last one).

So I asked David Churchward if he’d be willing to write an occasional post on these topics.  He has graciously agreed, and what appears below is the first such post, on Profit and Loss in PowerPivot.

Guest Post from David Churchward:  P&L in PowerPivot

PandL Screenshot1As a fellow of CIMA, the first report that I wanted to create in PowerPivot was the infamous Profit & Loss (or P&L if you prefer).

It turns out, this wasn’t the simplest report to conquer first as there’s some hidden complexities. In this post, I’ll walk through how to create one version of a P&L but you should find some of the techniques useful elsewhere and transferable to other Profit & Loss layouts.

P&L reports can take a number of forms, but the approach that I will walk through here is easily transferable to any layout that may be required. To illustrate this, I’ll create the simple layout shown to the left (with a bunch of fake data).

In reality, this report should carry percentages and comparatives to prior year and budget, as well as show periodic values and year to date values. Perhaps I’ll come to all of that in a future post, but for now, the backbone of the report is in the calculations in the Cascade_Value_All column of this report.

The Problem

As you can see above, each section of the report can behave in a different manner to the others. To explain this, I’ll briefly explain the different section types:

Sales and Cost of Sales – these are broken into subsets based on the product groups sold (Air, Rail, Sea in this example). This is a simple sum of the underlying transactions within these categories.

Gross Margin – this is a subtotal of Sales and Cost of Sales (ie Sales less Cost of Sales or Sales plus Cost of Sales if these groups have different sign conventions) and, once again, this is broken down by the same product types held in our sales and Cost of Sales sections.

Sales Costs, Administration Costs and Directors Costs are a sum of underlying transactions and are broken down by cost categories such as Salaries, Travel expenses and so on.

Operating Profit is Gross Margin less Sales Costs, Administration Costs and Directors Costs. A breakdown of this value is possible, but probably wouldn’t be shown on a P&L report so we simply require one subtotal.

Profit Before Interest & Tax = Operating Profit less Non Operating Costs

Profit Before Tax = Profit Before Interest & Tax less Interest costs

Net Profit After Tax = Profit Before Tax less Tax costs

The above feels like a whole lot of different measures especially since each section can carry a further breakdown or simply show a subtotal.  In fact, it can all be done with one relatively simple measure and a little bit of data conditioning. Whilst this can be written as one measure, I’ll build this up as a series of measures to illustrate the method but the report ultimately displays as one measure.

The Art of the Cascading Subtotals

Given the above problem, let’s break this down into simple “Excel” type equations. Our dataset essentially contains 6 types of transactions.

[Note - Sales and costs have different sign conventions- Sales are negative and costs are positive - I'll come onto this - it's an accountancy thing!]

The 6 types are Sales, Cost of Sales, Costs (although we have 3 types being Sales Costs, Administration Costs and Directors Costs), Non Operating Costs, Interest, Tax

The values we want to show are as follows

Sales = sum(Sales)

Cost of Sales = sum(Cost of Sales)

Gross Margin = sum(Sales) + sum(Cost of Sales)

Costs = sum(Costs)

Operating Profit = Gross Margin + sum(Costs)

Just to interrupt the flow – can we therefore write Operating Profit as below?

Operating Profit = sum(Sales) + sum(Cost of Sales) +sum(Costs)

The answer is yes which means that our report is essentially summing all transactions in our report that precede it. Could we therefore use a derivative of the running subtotal concept? My method here is very similar to that of a running subtotal and this is what I tend to refer to as the Cascading Subtotal.

How It’s Done

We have our fact table which is a series of transactions representing sales or costs and each carry identifiers to determine which type of sale or cost they are. My fact table is called FACT_Tran

clip_image001

Heading1_Code is linked to a heading dimension table called DIM_Heading1 as shown below. DIM_Heading1 carries all of my primary report headings.

clip_image002

I’ll come onto the meaning of the columns Heading1_Summary and Heading1_Show_Detail fields in these tables shortly.

The field Heading2_Code in my fact table is linked to a heading dimension table called DIM_Heading2 as shown below and this table carries all of my secondary report headings.

clip_image003

You can go beyond this with third, fourth, fifth……… levels if you wish.

clip_image004

Aside from that, we’re only carrying dates and a customer field in our fact table. These will be referenced when time intelligence and slicing our Profit and Loss comes into play which I’m not going into for now.

First let’s deal with the back to front nature of Accounting Transactions

Nothing revolutionary here, but we have to deal with the fact that sales transactions are processed in the accounts as negative values and cost transactions are processed as positive transactions. We simply create a base measure which reverses this. Let’s call the measure Value_Corrected

Value_Corrected = SUM(FACT_Tran[Value]) * -1

I won’t dwell on this, it just needs dealing with. Let’s get on with some proper DAX!

But First – A quick trick!

Accountants like to be able to switch their reports between whole numbers, numbers represented in thousands and sometimes in Millions (I’ve never got to Billions but watch this space!).

Create a table called Divide_By with the values that you will divide by (therefore a good name for the table) and give each record a name.

clip_image005

You don’t need to link this table to anything but create the following measure called Selected_DivideBy on that table.

clip_image006

On our fact table, create another simple measure, let’s call it Report_Value, which sums our value column and divides by the result of our Selected_DivideBy measure.

clip_image007

For any other measures, reference this Report_Value measure and you’ll then be able to apply a slicer to all values that will change how they’re represented.

clip_image008

This can be used for numerous applications including currency translation although a Slowly Changing Dimension is probably more suited to currency and I believe this can be done in PowerPivot but I haven’t tried as yet.

Now Some Proper Measures

If I simply use my Report_Value measure, I get the following once I bring in my headings from DIM_Heading1 and DIM_Heading2

clip_image009

My transactions are summing correctly, but I don’t have any values for Gross Margin, Operating Profit or any other key P&L subtotals. I need to therefore create a measure which creates values for these subtotals. Consider our DIM_Heading1 table again and refer to “The Art of Cascading Subtotals

clip_image010

You’ll note that my first field in this table is an increasing integer. If we take Gross Margin as an example, which has a code 3, we need to sum everything that precedes that code (ie Sales and Cost of Sales). That therefore implies that I have to override the Heading1_Code link using an ALL function and then filter the dataset where Heading1_Code is less than 3. This measure looks like this:

Cascade_Subtotals

=CALCULATE

(

[Report_Value],

ALL(DIM_Heading1[Heading1_Name]),

DIM_Heading1[Heading1_Code] < VALUES(DIM_Heading1[Heading1_Code])

)

You’ll notice that we first remove the link dependency using the filter

ALL(DIM_Heading1[Heading1_Name])

and then re-apply a filter that will return a dataset where Heading1_Code is less than the code being evaluated on our report

DIM_Heading1[Heading1_Code] < VALUES(DIM_Heading1[Heading1_Code])

So, what happens when we use this measure on our report?

clip_image011

Ummm, it appears to have all gone wrong! Actually, it hasn’t. In isolation, this measure won’t work as DIM_Heading1[Heading1_Name] is presenting more than one answer to each equation evaluation. However, we can deal with this by instructing the measure to only evaluate when there is only one Heading1_Name on the report using Countrows and Values. I won’t go into these functions in detail here as Rob has already done an excellent job on these here and here and here.

 

Cascade_Subtotals

=IF(

COUNTROWS(VALUES(DIM_Heading1[Heading1_Name])) = 1,

CALCULATE

    (

[Report_Value],

ALL(DIM_Heading1[Heading1_Name]),

DIM_Heading1[Heading1_Code] < VALUES(DIM_Heading1[Heading1_Code])

     ),

BLANK()

    )

With this in place, we get the following:

clip_image012

So we now have a bigger mess, or so it seems. Our subtotals such as Gross Margin and Operating Profit are now calculating correctly, but we’re getting a lot of mess in between. For instance, Sales Costs are actually displaying Gross Margin values and Cost of Sales is showing Sales Values. Operating Profit, which I would like to see as just one total is actually populated with a secondary level of detail that I don’t want. If I expand the line, I get the following:

clip_image013

This may have some value in certain reports, but not on my Profit & Loss report here.

It should be noted that one of the values on each key row on this report is correct as it stands. We therefore simply need to evaluate when to use each one. This is where our Heading1_Summary field comes into play on our DIM_Heading1 table.

clip_image014

You’ll notice that I’ve flagged all of the key Profit & Loss subtotals with a 1. I know, that when my Heading 1 is carrying this flag, I want to use my Cascade_Subtotals measure. I can code this using a simple IF statement as follows:

Cascade_Value_Heading1_Summary

=IF(

MAX(DIM_Heading1[Heading1_Summary]) = 1,

[Cascade_Subtotals],

[Report_Value]

    )

This gives the following effect on my report

clip_image015

All of my key subtotals are now calculating correctly and I’m also showing the correct values in sections where a cascaded subtotal isn’t required. Job done? Well, not quite. You’ll notice that my key subtotals such as Operating Profit are still carrying details underneath them that I don’t want to show.

To get around this, I need to explicitly tell my measure that I want my Report_Value measure to execute at levels below my key subtotals (ie for Heading2_Name such as Air, Rail and so on). I sense that COUNTROWS may be able to do this for me so let’s try a quick measure to see what this would deliver using the following measure:

Countrows_DIM_Heading2_Name

=COUNTROWS(VALUES(DIM_Heading2[Heading2_Name]))

I’ve collapsed some of the screenshot below to show the key areas

clip_image016

You’ll notice that this measure evaluates to a value of 1 for each heading 2 (irrespective of whether there is a valid value on the report or not) and a value of 9 for all subtotals. I can therefore distinguish between a subtotal and a heading 2 line on my report using

COUNTROWS(VALUES(DIM_Heading2[Heading2_Name])) > 1

Let’s build this into our measure

Cascade_Value_Countrows

=IF(

MAX(DIM_Heading1[Heading1_Summary]) = 1

&&COUNTROWS(VALUES(DIM_Heading2[Heading2_Name])) > 1,

[Cascade_Subtotals],

[Report_Value]

    )

clip_image017

Great, my Operating Profit section now doesn’t carry any detail. However, I’ve also lost the lower level detail for Gross Margin. I need to put that back. To do this, I have a flag on my DIM_Heading2 table called Heading1_Show_Detail.

clip_image018

I can now tell my measure to behave differently for my Gross Margin section by adding the condition:

MAX(DIM_Heading1[Heading1_Show_Detail]) = 1

When I build all of this together, I get a measure that looks like this:

Cascade_Value_All

=if(

MAX(DIM_Heading1[Heading1_Summary]) = 1

&&(MAX(DIM_Heading1[Heading1_Show_Detail]) = 1

||COUNTROWS(VALUES(DIM_Heading2[Heading2_Name])) > 1

     ),

[Cascade_Subtotals],

[Report_Value]

    )

clip_image019

My measure is now correct. When I remove all of the others and tidy up, I get the following:

clip_image020

Quick Recap

In summary, we’ve basically been through the following steps

Value_Corrected = Created a base measure which reverses the accounting signage of transactions

Report_Value = Taking our Value_Corrected measure, we’ve adjusted for the fact that our report should be able to flex between GBP, Thousands and Millions

Cascade_Subtotals = we’ve created an alternative measure to Report_Value that we can use at Subtotal levels in our reporting using the Cascading Subtotals method

Cascade_Value_All = we’ve then conditioned our report to decide when to use the Cascade_Subtotals measure and when to use the Report_Value measure by using flags on our heading dimension tables and COUNTROWS functions to highlight subtotal levels.

CLICK HERE TO DOWNLOAD THE WORKBOOK

What Next

There’s still a lot we can do to this report such as showing comparatives such as Budget and Prior Year. We can also overlay time intelligence to show periodic and YTD values based on a time slicer. We should include key percentages such as Gross Margin percentage and Return on Sales for key sections of the report. When this is done, you can create a report similar to one I created some time ago which looks something like this (although hacked to preserve sensitive data)

clip_image021

One Last Point to Note

There’s numerous ways to construct these reports. In truth, I do a lot of data conditioning in SQL before I even touch PowerPivot which opens up a whole raft of extra possibilities. Taking account of Rob’s excellent post Less Columns, More Rows = Speed, an alternative is to simply accept that rows are cheap in terms of performance if your measures are optimised so there’s the possibility of bringing in the same dataset multiple times but displayed as different report headings. This is a method that I use widely and perhaps I’ll run through this in a future post if there’s the demand for it and Rob allows me to waffle on again.


CIMA Part 2 – The Hidden Costs of Traditional BI

September 13, 2011

 
Finding an image for "dark matter" should have been more challenging

“Requirements transmission is the ‘dark matter’ of BI Projects.”

-Me, SQL Saturday Cleveland (in one of my wittier moments)

I’ve always loved the concept:  we know the universe is essentially a lot “heavier” than all the things we can see.  There’s a lot of mass out there that we just can’t see – so-called dark matter.

On paper, BI projects seem pretty simple.  What kind of data do you collect.  What are the kinds of questions you need to answer.  Simple, we draw a line from A to B and off we go.  But then the project runs for a very long time – where does all the time go?

I’ve mentioned this before – the time vanishes in communication, and it vanishes in “ok now that I have what I want, I realize I don’t want that.” 

But in my second post to CIMA Insight, I explain in a bit more depth where and how time and budget manage to disappear in traditional BI projects.

Of course, it’s no surprise that the NEXT part explains how that is greatly reduced with PowerPivot.  I originally planned to include that in the current installment, but those folks at CIMA are strict about that 750 word limit Smile

And yes, the next part ALSO explains how I think traditional BI pros are going to become even more important than they are today.  So if you read this one, traditional BI pros, and want to come rip my head off, please wait until next month Smile

image

From the article:  Diagramming some of the hidden costs

Click Here to Read the Article