Anakin asked questions about Power Pivot's STDEVX.P

“Only a master of evil formulas, Darth!”
”…hmm, which kinda DOES make you the master doesn’t it?  Crap, I give up, just
chop me in half and get it over with, but make sure Luke sees you do it ok?”

Anakin’s Second Line of Questioning

Two weeks ago I posted about a particular student’s knack for making me learn new stuff, specifically the “bubble up ranks” method.  (Ooh, I forgot to circle back and cover the “bubble up exceptions” part – making a note of that now).

Anyway, he also pressed me to teach STDDEVX().  That’s a first.  “No worries,” I said, “it’s just another X function – but only you will know if the numbers are correct, because I don’t know what Standard Deviation truly means.”

Embarrassing isn’t it?  To be a numbers guy and not know what Standard Deviation means?  I was on the varsity calculus team in high school.  I even have a Math major (to go along with my Computer Science and Philosophy majors – jack of all trades, master of none).  How did I slip through the cracks on Standard Deviation?

OK yeah, it measures deviation, variability.  I get that from the name.  And yeah I even “learned” it in Statistics class in college, to the extent I ever showed up at 8 am.  But I didn’t grok it.  Didn’t breathe it in deeply.  Until now. 

It’s all fun and games until the Football Project Returns!


Standard Deviation / STDDEV in Power Pivot

Standard Deviation (in Red) Only “Makes Sense” When Compared to the Average (Green)

Turns out, Standard Deviation is a simple concept!  It has real world tangibility!  No one should be scared of it, at all.

Before I wax philosophical, here are the formulas displayed above:

[Yards per Week]=

  [Yards Measure] / DISTINCTCOUNT(Schedule[WeekNumber])

      In English:  “take the total number of yards this player earned over the season, and divide by the
      number of weeks in the season.”

[StdDev Yards per Week]=

  STDEVX.P(VALUES(Schedule[WeekNumber]), [Yards Measure])

      In English:  “step through each week of the schedule and see how many yards this player gained in
      each week.  That results in a set of numbers.  Then take the standard deviation of that set of numbers.

Simple huh?

LaDainian Tomlinson had a Standard Deviation of 65.3 in 2003 – What Does that MEAN in the Real World?  Can we TOUCH It?

This is the cool part.  YES, we can absolutely touch it!  Two crucial points “unlocked” the mystery for me forever:

  1. Standard Deviation is “measured” in the same units as the average.  In our example, our average is measured in yards per week.  (He averaged 139.4 yards per week).  We can also say that Tomlinson had a standard deviation of 65.3 yards per week!
  2. We should expect “normal” occurrences to fall in a range that is plus or minus 1 standard deviation from the average.  In our example, in a “normal” game from 2003, we should expect Tomlinson to gain somewhere between (139.4 – 65.3 = 74.2) and (139.4 + 65.3 = 204.7) yards!

An Aside on the Value of Education…

dollarGreenAnakin, my student, taught me those two things in about 15 seconds.  Why (and how) was that simple explanation kept from me all these years?

Let’s put that in perspective.  My four-year private school college tuition from 1992-1996 was valued at $72k (I was on scholarship & didn’t pay a dime, no way my family could have afforded that).  Today that same university charges about $180k in tuition for 4 years.

And I estimate that I learned no more than 12 useful lessons over those four years. 

So in today’s dollars, my student taught me a lesson worth $15k.  Let’s not tell him that.  (OK he’s probably reading this).

Let’s Put That In Practice:  Upper and Lower Bounds

Let’s take point #2 from above and “implement” it in two new measures:

[Yards per Week Expected Upper Bound] =

  [Yards per Week] + [StdDev Yards per Week]

[Yards per Week Expected Lower Bound] =

  [Yards per Week] – [StdDev Yards per Week]

STDDEV Applied:  Dare We Say This Now Gives Us Some Primitive Amount of Predictive Power in Power Pivot?

Tomlinson’s High Average is Built on Big Games, But If You Had to Stake your Life on Someone Gaining at Least 85 yards in a week, Priest Holmes is Your Guy.

Hrm, this smells like another technique that’s relevant to charting.  More on this later.

Important:  Average and STD DEV Must Have the Same “Granularity”

One last point.  In this example, I was taking an average per week.  And my std dev measure stepped through individual weeks.  That is important:

  [Yards Measure] / DISTINCTCOUNT(Schedule[WeekNumber])

  STDEVX.P(VALUES(Schedule[WeekNumber]), [Yards Measure])

If I wanted to measure variability of yards per season, or yards per play, I would need to change both of these metrics to that same “per thing” methodology before I could use them together.

In other words, standard deviation is just as “tied” to a particular granularity as an average would be.  It is not a “universal” measurement.

Rob Collie

One of the founding engineers behind Power Pivot during his 14-year career at Microsoft, and creator of the world’s first cloud Power Pivot service, Rob is one of the foremost authorities on self-service business intelligence and next-generation spreadsheet technology.

This Post Has 26 Comments

  1. Very cool article, but there is a common problem made here, I think. If you are going by top running backs based on yards/week in 2003 then where is Clinton Portis? If I remember right, he should be number 1 or 2.

  2. Averages with a high std dev suggest a lack of control. Whether you’re looking at a pro sports player or class grades or chemical analyses, a high std dev relative to the average should immediately make the question “why?” pop into your head. In your numbers above you might conclude that Holmes is a more consistent player than Tomlinson. A student with erratic grades may be smart, but lacking in focus. A chemical analysis that has a high deviation may need better sampling methodology or more sensitive test equipment to become more reliable. Standard Deviation is such a wonderful tool! Obi-wan definitely used it, probably enhanced by the force, when training Luke. It’s all about control!

  3. You have to be careful here. The divide by distinct count of schedule weeks is not right. Some players do not play every week due to injury, PED suspension, etc. PowerPivot helped me prove every metric wrong at my company because they used this same logic. This is also why you are missing Clinton Portis. He averaged more yards per game than any of these guys that year, but only played 13 games. That kind of insight is pretty valuable. As an operations analyst, I found that managers knew this and would only hire people during the very first week of the month. If they did not, it would bring their monthly performance/employee/day metrics down unfairly. I’ll try and post the correct logic whenever I get to a PC.

  4. Two small points.
    1. The standard deviation can be seen as an estimate of the average deviation between the actual values and the average.
    2. In most cases (e.g. Statistical Process Control), statisticians use ±3 SD as the bounds of normal behavior, which means that more than 99% of the expected deviation is included. With a sample size of 16 games, ±3 SD is enormous.

  5. A useful post to note, thanks. We recall that std dev is (really) important but shy away from using it because we forget how to apply it – to make it tangible… Your example, along with @jonpeltier’s and the other comments, show how readily that can be done. Thanks again.

  6. im dying trying to figure out what is going on with this whole dax thing. i have tried writing this data so many different types of ways. i keep getting a message saying things like data is to large or too small. usually when i have a problem i google them but its not working. im trying to figure out the Loadfactor of certain flights on specific days. but some days dont have load factors on them because the id changed . will this effect my data? and why cant i get the SD? any help will be greatly apreciated

    heres my error

    ERROR – CALCULATION ABORTED: MdxScript(Sandbox) (13, 62) An argument of function ‘SQRT’ has the wrong data type or the result is too large or too small.

    heres the calculation i tried

    CALCULATE(SQRT((SUM([BLF])-(COUNT([BLF])*((SUM([BLF])/COUNT([BLF]))^2/COUNT([BLF])-1)))))

  7. heres te second try
    =(SQRT((SUM([BLF])-(COUNTROWS(([BLF])*((SUM([BLF])/COUNTROWS(([BLF]))^2/COUNTROWS(([BLF])-1))))))))

    gave me
    ERROR – CALCULATION ABORTED: MdxScript(Sandbox) (13, 81) The value for ‘BLF’ cannot be determined. Either ‘BLF’ doesn’t exist, or there is no current row for a column named ‘BLF’.

  8. so the its still not working. do you mind doing me a favor and writing it for me to copy? the table is called “vic-file june” and the colum that i want the data is “BLF”

  9. Not Sure if you would do it this way, but replace F:F with any column that will have a value in it…
    =(SQRT((SUM(F:F)-(SUMPRODUCT(–(F:F””))*((SUM(F:F)/SUMPRODUCT(–(F:F””))^2/SUMPRODUCT(–(F:F””))-1))))))

  10. What if I am trying to use the value I get for standard deviation in the context of a filter. I am trying to use standard deviation to filter out the outliers that show up in my graph. I currently only have a 2 column table. One with the transaction date of a service receipt, the other being the unit cost for that particular transaction. Depending on QTY ordered, occasionally unit cost is more then 3 standard deviations away. I only want the value to show in my table if it is within 1 or 2 standard deviations. Is something like this possible? My current attempt at the formula simply flattens my entire table down to 0 rows.

    =CALCULATE([Unit Cost],FILTER(SERVICE_RECEIPT,[Unit Cost]>[STD Unit Cost]))

  11. Happy to see that some old post are still visited. Also a bit stuck today trying to calculate a standard dev in a project of calculating monthly High-Low sales based on the past 3 years’ data & by doing so, check whether the current monthly forecast is outside the historic range (Hi or Lo). Very interesting project, but I’ll be happy when I’ll find the solution 🙂 One side comment overall, i read all over the website (which I follow since a year or so) that PowerPivot is “magical, fast and easy”. I do agree overall, but please, do not underestimate the learning curve and that it often takes a bit more than “15 minutes” (ref to this post and today’s post) to make things perfect and work great. Cheers to all and by the way, I am interested by a PowerPivot group in Belgium (around Brussels) if s.o. reads this… thx

Leave a Comment or Question