Creating Accurate Percentile Measures in DAX – Part I

September 30, 2011 at 12:25 pm

Which records do you drop, and which records do you keep? Also, if there are ties in any of the values within the top N, all the ties are returned. Therefore, I think that PowerPivot, Excel and Access provide the correct logic.

If you want to aggregate exactly N records, you have to have exactly N records. Thus, you cannot use the TOPN function for this without a workaround.

September 30, 2011 at 3:17 pm

Hi David,

Thanks for the comment. I suppose that I’m used to calculating top-N values within some context (i.e. the top 10 of something: products, customers, and so on). I also have trouble wrapping my head around the meaning of an aggregation based on, say, exactly the top 10 values in a dataset. Suppose the first nine values in the dataset are tied and all count in the aggregation, but the next 20 values are also tied and only one of them counts.

However, I don’t wish to turn the discussion into one about TOPN, so I’ll accept that there are probably valid scenarios where only exactly N values make sense.

Whether or not you can use TOPN to return exactly N values depends on the other columns in the table. If you have a column of unique values (e.g. a key column), you can add it to the TOPN formula as a second sort key, e.g. =TOPN(5,ALL(Table1),[Sum of Value],1,Table1[ID],1) will return exactly 5 records, even if there are ties in the 5th value of [Sum of Value].

If your table doesn’t have a unique data column, and you can’t add one, then you are correct – you must use a workaround.
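As a minimal sketch of how the tiebreaker could be used inside an aggregation, the measure below sums exactly five rows. The measure name is hypothetical; Table1, Table1[ID] and [Sum of Value] are the names from the formula above, and Table1[ID] is assumed to be unique per row.

```dax
-- Hypothetical measure name; table, column and measure names come from the formula above.
Exactly 5 Rows Total :=
SUMX (
    TOPN (
        5,
        ALL ( Table1 ),
        [Sum of Value], 1,   -- primary sort key, 1 = ascending
        Table1[ID], 1        -- unique key as tiebreaker, so ties cannot add extra rows
    ),
    [Sum of Value]           -- evaluated once per returned row
)
```

Without the second orderBy pair, a tie on the 5th value would return more than five rows and the sum would cover all of them.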

October 1, 2011 at 10:25 am

BTW, great article!

October 1, 2011 at 6:47 pm

Well I’ll be…I only just noticed your article about TOPN on this site. Not sure how I missed that one. Sometimes I’m out of touch for periods, and so depend heavily on Vidas Matelis’s PowerPivot-info site to catch up with PowerPivot posts (thanks Vidas!).

One interesting thing about TOPN that I discovered but failed to mention in my post is that, because TOPN can take multiple orderBy_expression/order pairs, you can use a second pair to influence which records are returned. If the second orderBy_expression is a unique ID column, you will get exactly N values, as I mentioned in my previous message. However, even if you don’t have a unique ID column readily available, you can create a random number column like the [UniqueIncremValue] in your post, and use it for the same purpose.

November 21, 2011 at 8:22 am

Update – As of RC0, you can hide measures from client tools. A big “THANK YOU” to the Analysis Services Team!!!

December 13, 2011 at 11:03 am

Hi Colin,

Is there a way to get the value at a specific index in a set in DAX? I am finding that matching the rank to the expected row position in the set does not work very well when the set has lots of duplicates and there are large gaps in the ranking values. It is also quite likely that I am doing something wrong.

Thanks,

Rich

May 15, 2012 at 11:58 pm

This is a great post, Colin. By far the best walkthrough to set people up to be successful with this common problem using the latest Denali functions like rankx().

One contribution I would make here is that you can make these formulas respond to the filters applied in a pivot table if you use ALLSELECTED() wherever you are currently using ALL().

Powerful stuff. Keep up the good posts.
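A rough sketch of that ALLSELECTED() substitution, applied to the TOPN-based 25thPercentileDown formula that appears later in this thread. The measure name is hypothetical, and the sketch assumes that [25PctRank_INC] is also rewritten to count rows over ALLSELECTED(Data); otherwise the rank position and the table being ranked would not match.

```dax
-- Hypothetical measure name; Data, [Sum of Value] and [25PctRank_INC] are names used in this thread.
-- Assumption: [25PctRank_INC] is itself based on ALLSELECTED ( Data ) rather than ALL ( Data ).
25thPercentileDown Selected :=
MAXX (
    TOPN (
        ROUNDDOWN ( [25PctRank_INC], 0 ),
        ALLSELECTED ( Data ),   -- keeps slicer and pivot filters, ignores the current cell
        [Sum of Value], 1       -- 1 = ascending
    ),
    [Sum of Value]
)
```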

May 16, 2012 at 11:54 pm

Richard:
I apologize for taking so long to respond to your comment. The problem is that I’m not getting notifications, and I wasn’t expecting to see comments posted months after the initial post. It’s sheer coincidence that I ended up on this page now and noticed the comment.

I don’t know if you’ve since sorted out the issue, but the problem lies with the “PercentileDown” formulas. For example, I formulated 25thPercentileDown as:

=MAXX(FILTER(ALL(Data), [Rank] = ROUNDDOWN([25PctRank_INC],0)), [Sum of Value])

However, if a duplicate happens to occur such that the required rank is skipped (in this case 11), the formula returns an empty value because the [Rank] can’t be found.

Instead the formula should be:

=MAXX(TOPN(ROUNDDOWN([25PctRank_INC],0),ALL(Data),[Sum of Value],1),[Sum of Value])

and the same goes for 50thPercentileDown and 75thPercentileDown.
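For illustration, the 50th percentile version might look like the sketch below. It assumes a measure named [50PctRank_INC] defined in the same way as [25PctRank_INC]; that name is a guess by analogy and may differ from the one in the post.

```dax
-- Assumption: [50PctRank_INC] is a hypothetical name, by analogy with [25PctRank_INC].
50thPercentileDown :=
MAXX (
    TOPN (
        ROUNDDOWN ( [50PctRank_INC], 0 ),  -- number of lowest rows to keep
        ALL ( Data ),
        [Sum of Value], 1                  -- 1 = ascending, so the lowest values are kept
    ),
    [Sum of Value]                         -- largest of the kept rows = value at that position
)
```

This works even when duplicates cause a rank to be skipped, because TOPN selects by position rather than by matching a rank value. If there is a tie at the last position, TOPN may return extra rows, but they all carry the same value, so the MAXX result is unchanged.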

Thank you for bringing this issue to my attention. I will do a follow-up “Errata” post. Great blog you have by the way!

Trevor:
Good point about ALLSELECTED()!…and thanks for the kind comments.

April 18, 2015 at 5:59 pm

This thread is old, but it comes up first in Google searches, so I thought I would post my thoughts. I tried implementing this and the complexity was a little much, so I came up with an alternative. This method returns a discrete percentile, which means the result is an actual value from the sample and is not interpolated. This suits my own organization’s purposes best. An interpolated percentile would be more difficult to calculate, but for our practical purposes the two give the same answer. For an explanation of percentile discrete see: https://msdn.microsoft.com/en-us/library/hh231327.aspx

1) Create a measure for the number of cases: [Visit Count] = COUNTROWS('EDLvl3')
2) Calculate the percentile record index (90th percentile in this case). This is the position of the record that will be picked in the next step: [90th Percentile Record Index] = ROUNDDOWN(0.1*[Visit Count],0)+1
3) Calculate the percentile by using TOPN. You need to order by both the variable and a unique identifier of the record. The unique identifier is required so that ties do not skew the result. Finally, the minimum of that TOPN table must be returned using the MINX function:
[90th Percentile ED LOS]
=MINX(
TOPN(
[90th Percentile Record Index] --Top N using percentile record index calculated above
,'EDLvl3' --uses EDLvl3 table
,'EDLvl3'[GRH ED LOS],0 --Sorts variable in descending order (0 option for sort order)
,'EDLvl3'[zzAbstractLink] --Sorts next by any distinct column to handle ties
)
,'EDLvl3'[GRH ED LOS]
)
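To make the index arithmetic concrete, here is a small worked example. The visit count of 25 is made up purely for illustration.

```latex
% Assumed sample size for illustration only: N = 25 visits
k = \left\lfloor 0.1 \times 25 \right\rfloor + 1 = \lfloor 2.5 \rfloor + 1 = 2 + 1 = 3
```

With 25 visits, [90th Percentile Record Index] evaluates to 3, so TOPN keeps the 3 longest stays and MINX returns the 3rd longest. Two of the 25 values (8%) rank above it and 23 of 25 (92%) are at or below it.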

October 20, 2015 at 3:00 am

Great post, Colin. It really helped in making the percentile calculation code simple and efficient.

Josh, I stumbled upon the same issue with ties, and TopNSKIP does not seem to be supported in tabular. Using a unique key with TopN resolved the issue. Thank you!

April 21, 2018 at 1:29 pm

I am trying to get PERCENTILE.INC not for all values, but for the values in each category. Kind of difficult to achieve!

I have a table with categories (A, B, C) and values, like in this example. This PERCENTILE.INC shown here only works if my table has no categories.
