Running Totals Without a Traditional Calendar Table

June 30, 2011

PowerPivot provides a host of great functions like DATESYTD, DATESMTD, DATESBETWEEN, etc. that are useful for calculating many things, including a running total.

But with the exception of DATESBETWEEN, I seldom get to use any of those “time intelligence” functions, for the simple reason that our clients almost never operate on a traditional calendar.

I very often find myself working with a “calendar” that looks like this:

Non Traditional Calendar

Can’t Use Time Intelligence Functions With a Calendar Like This

And sometimes I even find myself with “calendars” that don’t have any date columns in them at all:

Another Non Traditional Calendar

This Table DOES Truly Represent the Business Calendar But Has No Dates in It!

So what do you do when you still need a “Year to Date” Total?

Silly humans.  They don’t care that the data is structured one way or another under the hood.  They just keep insisting on seeing useful things, like Year to Date totals.  They don’t want to hear how the blender is constructed, they just want their daiquiri.

So, what’s a report designer to do?  Give up?  No way.  We make daiquiris anyway.

I’m going to use that second calendar above, the one that has no dates in it.  In fact this is the same data set I have been using for the Precedence Project.  (Really, this is Part Three in disguise).  So let’s return to a familiar pivot:

image

Familiar Starting Point

It has Year and MonthNum on rows, and the simple measure Total Sales.  I want to end up with this:

image

Desired Result

Getting to that desired result, in my experience, is something you either stumble upon quickly or flail around forever and never find.  In fact, finding myself in a situation like this is what triggered me to start the Precedence Project in the first place.

Cutting to the chase:  this formula works

To keep this post short and sweet, I’m just going to share a working formula.  I hesitate to call it “the” formula, because there are multiple variations that work, and some fit certain biz requirements better than others.

So here is one that works.  It has some quirks that I will iron out in the next post.

[YTD Sales]=
[Total Sales](
                FILTER(
                  ALL(Periods),
                  Periods[PeriodNum]<=MAX(Periods[PeriodNum])
                ),
                VALUES(Periods[Year])
             )

In short, this  measure clears all filters on the entire Periods table, then adds back two filters – restores the current context for Year (using VALUES), and limits the PeriodNum to be less than or equal to the current context (I will explain later why I used MAX and not just VALUES).   Note that I am using PeriodNum (which keeps increasing across years and is unique to a given Month/Year combo), not MonthNum (which is on the pivot and “resets” to 1 for January of each year).

Next Post:  Explaining the Formula, and Showing How NOT to do it

Explaining that relatively simple formula, if I do a thorough job, will consume its own post.  And I also want to cover some other approaches – ones that seem like they should work but do not.  Because the things that DON’T work are even more educational than the things that do.


Homer Simpson has more accurate computing than us

June 24, 2011

 

“Hmmmm…  tasty numbers.”

Fun topic for a Friday!

Thought I’d take a break from the Precedence Project and just share something that I find both amusing and fascinating at the same time.  Your mileage may vary of course.

Recently, someone copied me on an email they sent to a colleague, explaining something kinda funky about Excel.  Actually, it’s something funky about ALL spreadsheets, and all computers in general really.

“Bob – as we discussed, Excel should sum 1.12-1.23+0.11 to zero but as you can see, certain orders actually produce the wrong answer.  This is not an excel formatting issue: Excel is actually calculating the wrong answer.  Try it for yourself.  The first time I discovered this it was with a more complex data set and it drove me nuts trying to figure out why my model wasn’t working exactly as it should.  It turns out it is a known limitation of the way in which computers represent "Double Precision Floating Point" numbers, whatever that means.  Garner has tried to explain it to me but it went over my head.  The simpler explanation is to just blame Rob Collie.”

And he included the following example:

clip_image001

Of course, I have nothing to do with this – I wasn’t even in high school at the time Excel was built.  But in my time at Microsoft I DID get embroiled in a very bitter battle related to this topic, so I am at least somewhat qualified to talk about it.  And I like to think that no one is better qualified to make analogies between this topic and cartoon characters – I’m your guy for that.

Why does this happen?  (Condensed Version)

We all know that the fraction “1/3” does not “fit” into our base-10 number system:  .3333 repeating infinitely.

Well, that isn’t peculiar to base 10 – converting fractional numbers between ANY two number systems can result in values that don’t “fit.”  Try fitting the fraction “1/7” into base-5 for example.

And alternatively, if you wanted to convert “1/3” into base-3, it WOULD fit nicely – you would represent “1/3” as “0.1” in base-3 – that looks weird, I know, but it’s true.

Well, computers use a base-2 number system.  So guess what?  Our friendly little base-10 fractions like 0.1 very often don’t “fit” into computer accurately.  They get stored as approximate values, but the difference between the value we expect to see and the value that is truly stored is cleverly hidden in digits that you never see.

image

Converting 0.1 to Binary Yields a Non-terminating Number, so Computers Approximate It

For example, I think Excel limits you to 15 visible digits of precision.  But under the hood, it’s operating on more than 15 digits.  So any errors that occur based on the base-10 vs. base-2 conversion most often only appear in those “dark” digits that you never see.  The difference is INCREDIBLY tiny, like trillionths of a percent, and it’s hidden, so you never really care.

Sometimes though, as you do arithmetic in Excel, that incredibly tiny difference, out at the 16th decimal place or beyond, can ripple into the visible digits, as it does in the example that was sent to me.

Do all spreadsheets have this problem?

Yes, they do.  Well, every spreadsheet but the first one.  VisiCalc didn’t have this problem, but in hindsight they wished they’d had it,  Their solution was much slower in terms of performance:

“At its heart, VisiCalc is about numbers. One of the early decisions we made was to use decimal arithmetic so that the errors would be the same one that an accountant would see using a decimal calculator. In retrospect this was a bad decision because people turn out to not care and it made calculations much slower than they would have been in binary.

We did want to have enough precision to handle large numbers for both scientific calculations and in the unrealistic case it would be used to calculate the United States budget. Of course, as it turned out, that was one of the real applications.”

From http://www.frankston.com/public/?name=ImplementingVisiCalc 

All Right, why is Homer Better Off?

You know why human civilization adopted base-10 numbers?  Because we all start out counting with ten fingers.  There isn’t anything all that special about 10 other than that it matches our finger count.

Here’s the punchline:  in most cartoons, animators have discovered that characters still look ok if you only draw four fingers on a hand.  And that, of course, is faster to do.  So it’s become basically an industry standard.  8-fingered cartoon characters.

 the_simpsons_movie_torrent_bittorrent_download MickeyMousefred_flintstone

 

 

 

 

 

So…  cartoon characters would use a Base-8 number system in their daily lives, not base-10.  And because 8 is really just 2*2*2, ALL fractions expressed in base-8 could be expressed 100% accurately in a base-2, binary computer.

Spreadsheets in Homer’s world don’t have these weird exception cases.


Precedence Part 3: ALLEXCEPT vs. ALL w/ VALUES

June 22, 2011

 
Snatch - A Damn Fine Movie

 

“However…  you do have ALL() the characteristics of a dog, Gary.  ALLEXCEPT() loyalty…”

 

 

 

This will be a quick one.  I think.

In part two, we left off with the observation that VALUES() can “trump” ALL() even when VALUES() is applied to a column that is not on the pivot:

[ALL then VALUES of field not on pivot] =
     [Total Sales](ALL(Periods),VALUES(Periods[Period Num]))

image_thumb5

One thing I have often puzzled over is this:  how does a combination of ALL(Table), VALUES(Table1[ColumnA]) compare to using ALLEXCEPT(Table1, Table1[ColumnA])?

In essence, those should do the same things right?  In one case, you set the whole table to ALL(), but then “restore” ColumnA to its pivot context using VALUES().  In the other case, you set every field in the table except ColumnA to ALL(), leaving ColumnA in the context set by the pivot.

They should result in the same thing, right?  Let’s check.

ALLEXCEPT() using columns that are on the pivot

[ALLEXCEPT fields on Pivot]=
[Total Sales](ALLEXCEPT(Periods,Periods[Year],
              Periods[FiscalQuarter]))

There are two fields on the pivot from the Periods table – Year and FiscalQuarter, and both of those are excluded from being “all’d” in the measure.  Results:

image

OK, good news.  Those are indeed the same results as the unfiltered base measure, and that’s what we got from ALL() plus VALUES() as well.  So from this, we can say that ALL() plus VALUES() is the same as ALLEXCEPT().

Not so fast though…

ALLEXCEPT() using columns that are NOT on the pivot

Let’s try ALLEXCEPT() using a field that is NOT on the pivot.  Remember, VALUES() trumped ALL() in this case, too, and the results were the same as the unfiltered base measure.  Will ALLEXCEPT follow suit?

[ALLEXCEPT field NOT on Pivot]=
[Total Sales](ALLEXCEPT(Periods,Periods[Period Num]))

image

OK, that was a bit unexpected.  ALLEXCEPT() returned the same results as pure ALL().  So ALLEXCEPT() does *not* preserve the pivot context of columns NOT on the pivot, whereas VALUES() *does* pick those up.

I’m not sure if that’s intentional on the part of the PowerPivot team, or if it’s just a quirk.

Summary

1) ALLEXCEPT() does behave precisely like ALL(), VALUES() as long as the “exempted” columns are columns on the pivot.

2) ALLEXCEPT() does NOT preserve pivot context, however, on columns that are not on the pivot.

Oh, and one more…

No need to use ALLEXCEPT() instead of ALL() when using VALUES()

In the past, whenever I wanted to do an ALL() with a VALUES(), I typically used an ALLEXCEPT() instead of the ALL().

For instance, I would do something like this:

[Total Sales](ALLEXCEPT(Periods,Periods[ColumnA]),
              VALUES(Periods[ColumnA]))

That turns out to have been an overly-careful habit.  As we’ve seen here, and in part two, VALUES() trumps the ALL(), and does not need the “help” of ALLEXCEPT() instead.

Yes, I’m sure many of you will say that’s obvious from everything covered so far, but just in case you find yourself tempted to use ALLEXCEPT() to exempt the column you are preserving with VALUES(), there’s no need.


Precedence Project Part Two: VALUES()

June 16, 2011

 
OK, picking up from part one

Let’s start with a simplified version of last post’s pivot – remove one of the row fields, and all of the measures but the base Total Sales measure:

image

Now let’s add a measure that sets Year to ALL:

[ALL Year Sales] = [Total Sales](ALL(Periods[Year]))

PowerPivot ALL Year Measure

Note that all of the orange cells have the same value – they are all Q1 cells and differ only on Year, so the all Year measure means each orange cell will display the total sales for Q1 across all years.  So the green cells all add up to the value in each orange cell.

Rule #3:  VALUES() Selectively “Shuts Off” ALL()

OK, we’ve played with VALUES() a few times on the blog, most notably here and here.  But previously, we’ve never used it like we are going to use it this time.  We are going to use it to preserve the original filter context and “fight back” against ALL().  Exciting huh?

Let’s add another of our previous measures back to the pivot – the one that applied ALL() to the entire Periods table:

[ALL Period Sales] = [Total Sales](ALL(Periods))

image

OK, now let’s add VALUES() to the mix in a new measure that has ALL applied to the whole Periods table, and VALUES applied to just one column in Periods:

[ALL with VALUES] =
     [Total Sales](ALL(Periods), VALUES(Periods[FiscalQuarter]))

ALL Against Whole Table, VALUES Against One Field

This time, again the Q1 cells are all the sum of every Q1 in the base Sales measure.  So the VALUES function, applied to the FiscalQuarter column in this measure, overrode the ALL which was applied to every column in the Periods table (including FiscalQuarter!)

So this new measure differs from the pure [ALL Period Sales] measure at the Quarter level.  But note the blue cells.  It’s still identical to [ALL Period Sales] at the Year level.

So is ALL() still “winning” at the Year level?  No.  It’s just that, at the Year level, VALUES(Periods[FiscalQuarter]) returns  the list of all four quarter values, because all four Quarter values truly due correspond to that cell’s coordinates in the pivot (the subtotal for the Year does indeed imply all quarters).

Neat huh?

Clarification/Correction

Derek pointed out in a comment that the two values in blue actually are NOT equal.  They are off by $281.  This is because there are blank values for year – some noise in the data – and I have filtered those out using the little Row Labels dropdown:

image

If I clear that filter and allow blanks to show up in the pivot, I get:

image

Note that the two values in blue are still off by $281.  But the mysterious missing $281 is indeed accounted for by the blank year.

So…  the pure ALL(Periods) measure catches that $281.  That’s expected.  What’s unexpected, at least initially, is that the ALL(Periods), VALUES(Periods[Fiscal Quarter]) measure MISSES that $281. 

Here’s why:  note that there’s only a blank Quarter under the blank year.  There are no blank quarters associated with the blue cells.  So that $281 is omitted.

Great catch Derek, this was awesome.  I probably should spend a post just explaining where those blanks Years and Quarters came from.

Back to the original post…

We can even confirm that VALUES() beats ALL() in a heads-up fight.  Rather than use ALL() against a whole table and VALUES() against one column, let’s use them both against the same single column:

[All vs VALUES Heads Up] =
      [Total Sales](ALL(Periods[Year]),VALUES(Periods[Year]))

VALUES Beats ALL Heads Up

Hey look!  Our new measure returns the EXACT same results as the base sales measure.  VALUES completely shut off ALL() in a head to head fight.

Rule #4:  VALUES() Wins Even if the “Targeted” Field Is Not on the Pivot!

Maybe this one isn’t so much a precedence rule as it is surprisingly cool.  Let’s try using VALUES() against a column from the Periods table that we don’t have on the pivot…

Periods Table

Let’s try that [Period Num] column.  Notice how it uniquely identifies each period, regardless of Year?  It’s the unique key for this table, so if VALUES() works, that *should* beat ALL() even if ALL() is applied to the entire table.

Let’s try it:

[ALL then VALUES of field not on pivot] =
     [Total Sales](ALL(Periods),VALUES(Periods[Period Num]))

image

…and the result is IDENTICAL to the base sales measure.  The ALL() is completely overridden.  Cool huh?  VALUES operates against the underlying tables, and merely takes into account the filters from the current pivot cell.

Good thing to know.  I’ve been taking advantage of that for a long time, it is very helpful.

OK, this post has run long so I will save ALLEXCEPT for the next one.


The Great Precedence Project (Filter Rules for DAX)

June 14, 2011

 
Richard Pryor wrote most of Blazing Saddles, explaining why it's the best movie Mel Brroks ever made

“Wait a minute. There might be legal precedent. Of course. Land snatching. Let’s see, land, la-land…see snatch. snatch…snatch…ah! Haley vs. United states. Haley 7, United States nothing. You see, it can be done!”

 

 

Keeping filters straight as they pile up

Some example questions to ponder:

  1. Within a FILTER or a CALCULATE, does the order of the filter arguments ever make a difference?
  2. Does using an ALL inside of a FILTER yield different results than using an ALL inside of a CALCULATE?
  3. Measures that are used inside of a FILTER or DATESBETWEEN are never impacted by external filter-setting forces are they?  Or are they?

I went through several months without learning anything new about DAX, and that seemed to me like a Good Sign.  A sign that perhaps I now knew everything I needed to know – not everything I could know, but everything I would ever really need to know.

But then a series of new scenarios presented by clients opened up my eyes to some very “squishy” places in my understanding.  I got everything working for the clients, but I took a note to dig in later and see if I could capture everything into a series of new “rules.”

In the meantime I fired off a few emails – specifically to the product team and “The Italians” – and got some initial advice (the Italians were particularly helpful).  But both of those parties understand this stuff almost TOO well – and what Excel people like me need, I think, is to learn from someone who struggled initially.

So I propose to be that person – who struggles with questions like the above, then triumphs, then explains.  I am going to post my findings as I go, but rest assured that it’s not an entirely selfless act – forcing myself to explain things is the way I best learn things myself Smile

It’s gonna take a series of posts to cover this.  And frankly, I’m not done discovering all the rules yet.  Let’s dig in.

Setting the stage:  two simple measures

We need a base to start from.  So, here’s a simple pivot:

Starting Point for PowerPivot Filter Precedence

The measure there is nothing fancy – just a simple SUM.  All of the fields on rows are from the Periods table:

Periods Table

Now let’s add an ALL(Periods) version of the measure:

[ALL Period Sales] = [Total Sales](ALL(Periods))

Starting Point for PowerPivot Filter Precedence with ALL

And if the syntax I am using, [Measure](<filters>), makes you scratch your head, remember that it is just shorthand syntax for CALCULATE([Measure], <filters>) – they are 100% the same, I just prefer the shorthand syntax for readability.  (This is a controversial topic within the DAX community however, and others disagree with me, but I’m right, damnit! Smile)

Rule #1:  “ALL()” and “Table[Col]=Val” Work Together

Let’s add a third measure, one that has the ALL() from above, but also sets Year=2009:

[ALL Period Sales Set to Year 2009] =

     [Total Sales](ALL(Periods),Periods[Year]=2009)

And the results:

ALL and Simple Filters Work Together

Neither clause “overruled” the other.  They were both applied.  We can think of this as “all Periods filters were completely removed, but then Year was set to 2009.”

Of course, thinking of it that way makes you wonder…  what if I reverse the clauses and have the ALL second?  Will that then override the simple filter for 2009?  I mean, I hope that doesn’t matter.  Marco and Alberto say it doesn’t matter.  But I need to see it myself…

Rule #2:  Filter order does NOT make a difference within a CALCULATE

OK, let’s reverse the order:

[ALL Period Sales Set to Year 2009] =

     [Total Sales](Periods[Year]=2009, ALL(Periods))

And yes, same result:

Order does NOT matter in filter clauses

OK, that’s reassuring.  But it gets more interesting from here.  In the next post, I’m going to show you that when ALL() meets VALUES(), there arguably IS a winner, and that ALLEXCEPT() and VALUES() yield outcomes that I did not expect.


Two tips from “the Old Country”

June 10, 2011

   

“Don Corleone, we have some information for you.”

For awhile now I’ve been meaning to share a few tips from who I call “the Italians” – Alberto Ferrari and Marco Russo.  I reviewed their book awhile back, if you recall.  Over the past several months they have discovered a few things that I think you will appreciate.

 

 

Tip 1 – <CTRL>-Scroll Wheel to zoom the measure window

OK, we all know that the formula measure editor window uses a small font:

PowerPivot Measure Editing Font is Small

PowerPivot Measure Editing Font is Small

Alberto discovered that if you hold down the <CTRL> key on your keyboard, and scroll the mouse wheel, you can increase the font size!

Bigger Font in the PowerPivot Measure Editor

Bigger Font in the PowerPivot Measure Editor

Neat huh?  Funny thing is, this does not work on all of my computers.  I use different mice, different versions of Windows, etc. – I scramble all of the variables.  So it’s hard to know why it doesn’t work everywhere.  (In fact, the two screenshots above were taken on different machines).

While we’re on the topic, you may also want to consider writing your measures *outside* of this dialog, so take a look at this post on using Notepad++ written by Colin Banfield.

Tip 2– Sort data before it’s imported to improve perf

Another Alberto discovery:  Believe it or not, when PowerPivot imports data, the sort order of the incoming data actually has an impact.

The original post is here.  Bottom line:  you can shave 25% or more off your workbook size if the data coming in is sorted ahead of time, and Alberto’s results suggested that it doesn’t matter much which column you sorted by.

NOTE:  We are NOT talking about sorting data AFTER it’s been imported.  We mean sorting it during the import process.  Sorting after import has ZERO effect.

Tip 2a – Smaller workbooks are faster workbooks

He also noted that pivottables built on those smaller workbooks are faster than on the workbooks created from unsorted data.

That 100% matches my expectations, because PowerPivot’s compression is not just an on-disk compression – that compression is also maintained when the data is loaded into memory, AND is used to speed up queries!

So I have been telling people for a long time:  smaller workbooks are generally faster workbooks.  You can use workbook size as a rough indicator of how fast it will be.

Tip 2b – The column you choose DOES matter, if…

At Pivotstream we recently had a SQL table that just refused to import.  PowerPivot basically just locked up every time we tried to import it.

The table was sorted by a column with millions of distinct values (the dollars column).  When we sorted it by basically ANY other column, it imported just fine.

And this was only a 1.5M row table.  It seems that certain data sets can just chew up massive amounts of RAM during import if they are not sorted properly.

This discovery was what triggered me to go back and share all of the tips above, so it was a fortunate find.


“These are my numbers. I make them go up.”

June 6, 2011

 

image

“This is my rifle KPI dashboard.  There are many like it but this one is mine.”

A number of posts rattling around in my head were delayed by the launch of HostedPowerPivot.com, but with that behind us, I’m heading out on the road for a bit.  Paradoxically that means I actually have more time for the blog.  Buckle up, I aim to post 2-3 times this week Smile

In this post I’m gonna take what seems like a brief detour into the irrelevant, but trust me, it has a point and I will get to it pronto.

Warcraft:  How to waste years of your productive capacity

I’m not really what you would call a “gamer.”  Seriously, I mean that.  But for several years when I was at Microsoft, I had a significant addiction to World of Warcraft, aka WoW.  All told, I think I spent the equivalent of several MONTHS at the keyboard playing the game (yes, the game reports that figure to you, almost as if to taunt you).  That’s months as in “not sleeping, not eating, not taking breaks – solid MONTHS of time.”  I shudder to think what I could have done with that time if I’d only had a productive side project back then.

But I wasn’t alone.  Many of my Microsoft colleagues, some of them at executive levels, played too, and sometimes MUCH more than I did.  In fact, they recruited me to play in the first place.

You are probably wondering “Why, why, WHY would we do such a thing?”  There are a few reasons, such as “how else do you interact with your friends from 9 pm to 1 am every single night?” 

But the more relevant answer is this:  because we all loved making numbers go up.

*MY* numbers!  Must…  go… higher…

Warcraft is basically a series of dashboards with a game built over the top.  Yes, you are playing a game.  But the game is a detail – the game is just the mechanism by which you make your numbers on your dashboards look more impressive.  I’m absolutely serious.

image

A typical overview dashboard describing a single player

These dashboards are YOURS.  They describe YOU, in a sense.  And they are not private to you.  Not at all.  Anyone else in the game can click on you and inspect your dashboard.  If that’s not enough, they can visit your online dashboard that Blizzard updates every day.

Your dashboards are your entire face and reputation to the online world, much as some people view their Facebook pages.

One of my childhood friends is a talk radio producer in Florida.  He’s also a poker pro who recently won a $10,000 entry into the World Series of Poker Main Event next month in Vegas.  He’s awfully good at making numbers go up.

Unsurprisingly, he also happens to have the highest overall numbers of anyone I knew in Warcraft:

achievements2 
Yes, I introduced him to the game.  Someday he may forgive me.

And of course, people build all kinds of spreadsheets and other applications to help them optimize their numbers:

166j5o8 
No, I did not make this.  But I like it :)

The moral of the story

There are actually two.  First of all, if you are reading this blog, you are a numbers geek.  And my advice to you is to stay the hell away from games like the above :)

The other moral is to recognize that we ALL like making numbers go up, and that can be a powerful force for good (and not just an insanely addictive and lucrative business for Blizzard).

And Now for a Productive Example

On a recent client visit I was invited to attend their end of day “KPI” meeting on two consecutive days.  It’s a 30 minute meeting held EVERY weekday at 5 PM.  Executives, team leaders, and the BI team all attend.

And the entire agenda of the meeting is to go over about 20 charts.  Yes, the same 20 charts.  Every day.

Each chart shows about 30 days of data.  So in effect, every day, they review the same 20 charts, each of which has changed by only 3% from the prior day.  Based on that description, you’d think this was the dullest, least dynamic and productive recurring meeting in history.

But it is anything but.  I was astonished at how many productive conversations broke out as the result of reviewing the data.  On day one I was convinced that it was a fluke, and that the next day would revert to quiet and dull.  But day two was again incredibly productive for them.

It was like being dragged to an opera and then having them show The Matrix instead.

Why this works

So I asked them what their secret was.  Why was this such a dynamic and engaged meeting, day after day?

These clever folks then killed me with my own sword.  You see, at lunch the day before, they had asked me why Warcraft was so addictive, and I had explained the whole “numbers going up” thing.

“These are OUR numbers Rob.  They are what we do.”

I loved it.  It inspired this whole post.

And when I got home, what was my first project?  Yes, that’s right – I started building a new internal KPI report for Pivotstream’s multiple businesses.  Because what gets measured gets done. 

And numbers you see… are numbers that you make go up.