Friday Bonus: UFO Sightings & Hallucinogen Use

October 7, 2011

 

I didn’t have this ready in time for the contest Microsoft was running, and I’m not sure they would have wanted this to represent PowerPivot in an “official” sense so it likely wouldn’t have won anyway.

But here it is folks – a data mashup of UFO sightings versus Hallucinogenic drug usage.  All sourced from real data, for the United States only, between 1930 and 2000.

To place everything on the same scale, all measures are “indexed” against their maximum.  UFO sightings peaked in 1999 and declined slightly in 2000, for instance, so 1999 is where the green line hits 100%.  And I indexed LSD and Ecstasy against total Hallucinogen usage (of all types), so neither of those lines ever hits 100%.
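In case you’re wondering what that sort of indexing looks like as a measure, here’s a minimal sketch – assuming a base [Sightings] measure and a Dates table with a Year column (those names are mine, not from the actual workbook):

[Sightings Index] = [Sightings] / MAXX(ALL(Dates[Year]), [Sightings])

MAXX evaluates [Sightings] for every year, ignoring the year currently on the axis, and returns the largest value – so the peak year lands at exactly 100%.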

Before you go concluding that drug usage leads to UFO sightings that are merely hallucinations, remember that correlation is not causality.  It is just as likely that UFO pilots are drawn to drug usage.  They are particularly fond of spying on raves, apparently.


Couldn’t Resist… Parameterized PowerPivot Report!

July 20, 2011

In the intro to David’s guest post below I mentioned some new applications we are building.  Check out a PowerPivot report where, in the browser, you can type in a SKU number and get a full dashboard for just that product!

Parameterized PowerPivot Report in SharePoint

Click for Larger Version

Much respect to our resident Data Junkie Monkey, aka DJ Monkey, for pulling this together.


A Lesson in A-G-I-L-I-T-Y

May 11, 2011


A case study I just have to share

Over the past year, “Agility” has become quite a theme on this blog.  It’s fair to say that I just keep hammering it, over and over.  But I recently had my eyes opened as to what “extreme” PowerPivot agility can look like.

Let’s put it in football terms:  it’s like I’ve been going around telling everyone how they should be less like the 400+ pound Aaron Gibson and more like the nimble, versatile Marshall Faulk.

And then someone comes along like Barry Sanders:  agile in ways that you’d never think to recommend, but once you see it, you recognize it immediately.  And then you watch it over and over in awed slow-motion (seriously, click Barry’s name above for a video). 

That’s what this is like.

Dynamic Data Warehousing!

You don’t truly know what you need to know, until you are delivered what you THOUGHT you needed to know (I sound like Rumsfeld).  That was the theme of this blog post from last year.  That blog post concluded with my clients at the time realizing that the data they truly needed…  wasn’t even being collected.

I think it’s basic human nature to assume that yesterday’s gaps in your understanding are just that (yesterday’s), and that you won’t suffer from that in the future.  Marcellus Wallace recognized this tendency as pride, and rightly took a dim view of its helpfulness. (Video definitely NSFW!)

I’ve sung the praises of those folks (my clients) before, and I’m going to do it again:  Pride didn’t get in the way of their thinking on this issue.  Instead of telling themselves that they’d be better at anticipating their needs in the future, they decided to bake uncertainty into the CORE of their BI planning. 

They built a system that enables the following:  If they decide they are missing a set of data points, they can start collecting them, warehousing them, and analyzing them, end to end, in as little as two to four days’ time (that’s my estimate).

Here’s a diagram to illustrate:


Click for larger version

More Detail

Here’s some detail that was hard to fit into the diagram.  Basically these folks have millions of devices out “in the wild,” and those devices are instrumented to collect data about usage patterns.  When I first visited them in the early Fall, those devices were hardwired to collect only fixed data points, and we discovered that they needed to collect new data.

When I had the opportunity to drop back in on them recently, however, they revealed this new system.  Now, the only thing hardwired is flexibility.  The devices all call home once a day and see if there are new instructions awaiting them – brand new script written by their development team.  To make things painless and error-free for the development team, they also have built an internal portal that the developer visits to register the new message type that they are adding to the devices’ instrumentation scripts.  That portal takes care of configuring the data warehouse – new tables, retention policies, aggregation rules, as well as configuring the incoming message ports and mapping them into the right import processes.  Boom.

They are even experimenting with ways to allow automatic generation and/or modification of PowerPivot models based on selections made in the portal.

It’s worth taking a step back and marveling at.  On Monday they can realize they have a blind spot in their radar.  On Tuesday and Wednesday they develop and test new instrumentation code.  On Thursday they roll it out to the devices.  And on Friday, they are literally collecting and analyzing MILLIONS of data points per day that they lacked at the beginning of the week.

It’s not like those new data points are on an island, either.  Via DeviceID, they are linked to multiple lookup/dimension tables and therefore can be integrated into the analysis performed on other fact/data point tables as well.  They can literally write measures that compare the newly-collected data against data they were already collecting – ratios, deltas, etc.  They can put the new metrics side by side with old metrics in a single pivot.  And, in theory, they could use the new data points to generate new lookup/dimension tables by tagging devices that exhibit high or low amounts of the newly-instrumented behaviors (although we did not discuss that on site – it just struck me as a possibility).

And late last year this organization had made zero investment, ZERO, in business intelligence.

Can this work for everyone?

Of course not.  Not everyone has the luxury of reprogramming their production systems at high frequency like this.  Not everyone can afford the risk or performance hit of having their production systems writing back directly to their data warehousing systems either – standard practice is to have your warehousing efforts “spying” on your transactional systems and taking occasional snapshots.  It’s a pull, not a push, which is why the “E” in “ETL” is Extract.

But this is definitely food for thought, for everyone.  “Why not?” is one of my favorite questions, because even if you can’t ultimately do something, examining the “why not” in detail is often very enlightening.

I know one of the standard objections is going to be, essentially, “Data Gone Wild:  No discipline.  Mile-long lists of tables and fields.”  Bah, I say!  Good problems to have!  Storage is cheap, flying blind is expensive.  And when you reach the point of being blinded by too much information, well, that’s an opportunity for a new set of tools and disciplines.

More to come

Last week was my first ever “doubleheader” – two consulting/training clients in a single week.  That can be hard on a blogger, heh heh.  But I look to be home for the next week or two, so you should expect to see a renewed flow of content here.  Got several things rattling around in my head.


A/B Campaign Analysis with Start & End Date Slicers

April 13, 2011

 
“Because every good story has a beginning, a middle, and an end.”

Awhile back I posted about promotional campaign analysis factoring in seasonal trends.  Now let’s look at another flavor of campaign analysis:  comparing results when a campaign is active versus when it was not.  Sometimes this is referred to as “A/B Testing.”

Let’s start by showing what the results can look like:  a report that has two date slicers – one where you select the Start Date of a campaign, and another where you select the End date:

Pick Start and End Date, See How Sales Performed On vs. Off Program

The report then shows Sales results “On Program,” which are the sales that occurred between the start date and end date (inclusive) versus the Sales results “Off Program” (sales on all other dates).

Specifically, it shows Sales per Day for On vs. Off Program (because programs run for short periods of time, Sales per Day is a much more “apples to apples” comparison than Total Sales), and then the % change in Sales per Day when On Program versus Off (labeled “Program Delta” above).

(From a quick glance at Program Delta, it’s obvious that this was one awful program that ran from 5/20/03 to 7/8/03, but hey, that’s what we get when we use AdventureWorks as our data set).

Oh, and guess what?  Writing this blog post consumed a lot more time than building the report :)

How This Was Built, Step One:  Date Slicers and Date Measures

The first things you need are two single-column tables of dates – these are used to populate the Start and End date slicers:


These tables are NOT related to ANY other tables in the model.  They stand alone, intentionally.  Also, MAKE SURE THEY ARE OF DATA TYPE DATE!  Otherwise the following steps will give you strange results.  Also, make sure none of the date columns in your model have time components lurking in them.

OK, now you need to define a measure on each of those two tables.  It’s the same formula, but I name one [Start Date] and the other [End Date], and assign each one to a different table:

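(The two formula screenshots don’t survive here.  As a rough sketch – assuming the slicer tables are named StartDate and EndDate, each with a single Date column, names of my choosing – the shared pattern might be:

[Start Date] = MIN(StartDate[Date])

[End Date] = MIN(EndDate[Date])

With a single date selected on a slicer, MIN simply returns that selected date.)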

By the way, the technique I’m showing here is a variation of a technique covered in prior blog posts:  one by Kasper and one by me.  Read Kasper’s in particular if I’m moving a little too fast for your taste in this post.

For grins (and to test this out), you can now slap both slicers and both measures on a pivot and inspect what each of those measures returns:


We’ll never place those measures on an actual report, but it’s good to see that they serve their purpose, which is to capture the dates that the user selects on the two slicers.  Note that I changed the caption on each slicer to make its intended use clear (originally they both just had “Date” as the caption, since that was the column name in each table).

Step Two:  Sales Measures that are filtered by those Date Measures

Assuming we already have our base [Sales] measure defined, let’s start with [Sales on Program]:

[Sales on Program] = [Sales](DATESBETWEEN(Dates[FullDate],
      [Start Date],[End Date]))

We’re using the DATESBETWEEN function as a means of filtering our base [Sales] measure, and the [Start Date] and [End Date] measures that we defined previously supply the start and end of the date range.

Note the use of the Dates table in the formula.  That is a third Date table, separate from the two slicer date tables, and that table IS related to the Sales table.

So…  we are picking up the user’s date selections from two tables that are NOT related to anything, and using the dates they selected to filter the “real” Dates table.  That filter then gets applied to the Sales table because it is related to the Sales table.

[Sales Off Program] is a bit trickier than “On” Program.  I suppose we COULD just subtract “On Program” sales from total [Sales], but just in case someday we need to build some date sensitivity into the base [Sales] measure, let’s define [Sales Off Program] as [Sales] filtered to dates OUTSIDE of the selected date range:

[Sales Off Program] = [Sales](FILTER(Dates,
   Dates[FullDate]<[Start Date] || 
   Dates[FullDate]>[End Date]))

First, notice that we are using the FILTER function this time rather than DATESBETWEEN.  There is no DATESNOTBETWEEN or DATESOUTSIDE function, so we have to express the logic more directly ourselves.  But that’s not a big deal – I’m pretty sure that DATESBETWEEN is really just a “skin” over the FILTER function anyway – I’ve used FILTER and DATESBETWEEN somewhat interchangeably over the past year and I always get the same results (and the same performance).
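To see that interchangeability for yourself, here’s how the “On Program” measure from above could be rewritten with FILTER – a sketch of my own, not something from the original workbook:

[Sales on Program v2] = [Sales](FILTER(Dates,
   Dates[FullDate]>=[Start Date] &&
   Dates[FullDate]<=[End Date]))

Same inclusive date range, same results – just expressed with FILTER and an explicit AND (&&) instead of DATESBETWEEN.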

Also note the use of the “||” operator, also known as OR.  Rows from the Dates table are included if they are before the [Start Date] OR after the [End Date].

One more note:  you always want to use FILTER and DATESBETWEEN against the smallest tables you can.  Use them against your Dates table, for instance, rather than against the Date column in your Sales table, because they are MUCH slower against larger tables.

Here’s what we get with those new measures on our pivot, and with some Product hierarchy on rows:


Now you see why “per day” versions of these measures are required – the programs are so short that they are dwarfed by the “off” dates.

Note:  These two date slicers are great candidates for disabling cross-filtering and thereby improving the performance (update time) of this report.  See this blog post for an explanation.

Step Three:  Creating the Sales-Per-Day Measures

OK, first we need a [Day Count] measure that we can use as a denominator:

[Day Count] = COUNTROWS(DISTINCT(Sales[OrderDate]))

(Note that I am explicitly counting the Date column from the Sales table rather than the Dates table to account for product lines that did not exist for the duration of the entire Dates table, and therefore did not sell at all – there are tradeoffs here that I won’t go into, but I think this is the right thing to do in most cases when calculating a per-day measure).

[Sales per Day], then, is straightforward:

[Sales per Day] = [Sales] / [Day Count]

Now I need to create “On Program” and “Off Program” versions of that measure.  These formulas parallel the On/Off Sales measures from above:

[Sales per Day on Program] = [Sales per Day]
   (DATESBETWEEN(Dates[FullDate],
    [Start Date],[End Date]))

[Sales per Day Off Program] = [Sales per Day]
   (FILTER(Dates,
    Dates[FullDate]<[Start Date] || 
    Dates[FullDate]>[End Date]))

And lastly, let’s add the [Program Delta] measure, which is really Pct Change in Sales Per Day:

[Program Delta] = IF([Sales per Day on Program]=0,BLANK(),
   ([Sales per Day on Program] - [Sales per Day Off Program])/
   [Sales per Day Off Program])

The IF in there is just to catch the case where a product did not sell during the selected dates.  If I leave that IF out, the measure will return –100% for those cases.  Sometimes that is what you want and sometimes it is not – it’s a case by case judgment.

Add some conditional formatting, and here’s the resulting report, repeated again from above:


Alternate Approach:  Using a Promotions Table

Rather than use two date slicers like we have here, you COULD have a single Promotions table that lists each promotion and its Start and End date.  Something like this:


Now, I just have to go back to my [Start Date] and [End Date] measures and change their definitions to reference the Start and End columns in this table, and “attach” them to this table instead of those slicer tables:

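(Those two formula screenshots are also lost.  As a hedged sketch – assuming the new table is named Promotions, with Start and End columns – the redefined measures might look like:

[Start Date] = MIN(Promotions[Start])

[End Date] = MAX(Promotions[End])

With a single promotion selected on the slicer, these return that promotion’s start and end dates.)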

I don’t have to make ANY other changes.  Now I can remove the Start and End date slicers, and replace them with a single Promo Name slicer:


Or, move Promo Name to Rows and some of the product stuff to slicers:


Other Fun Stuff

There are many, MANY other things you can do here too.  For instance, a measure that calculates sales per day for the month leading up to a promotion starting.  Or the month after it ends.  Or the exact same period 1 year ago.  PowerPivot truly does open doors that you’d never consider in traditional Excel (and probably would never get to in traditional BI).
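For example, that first idea – sales per day for the month leading up to a promotion – might look something like this (my own sketch, using a simple 30-day window rather than a true calendar month):

[Sales per Day Month Before] = [Sales per Day]
   (DATESBETWEEN(Dates[FullDate],
    [Start Date]-30,[Start Date]-1))

The same [Start Date] measure keeps doing the heavy lifting; only the window changes.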


Guest Post: Nth Occurrence DAX Formula

March 31, 2011

David Hager is At It Again

A short two weeks after his truly creative Measure Catalog Macro post, David Hager is back with a heavyweight DAX post.

This time he is exploring an area of the product that I have very little experience with, namely the EARLIER() function.  On the DAX spicy scale, EARLIER() is a 5 in my opinion, but the other reason I haven’t done much with it is because I’m pretty sure it’s only useful in calc columns.  At Pivotstream we do all of our calc columns in SQL (for several reasons I won’t go into here).

But not everyone has colleagues ready to write calc columns in SQL for them now, do they?  In fact, the Great PowerPivot Survey is reinforcing that for me (I’m going to wait until next week before I start summarizing results, but it sure has been interesting so far).

OK David, take it away.

CREATING AN Nth OCCURRENCE DAX FORMULA

By David Hager

CLICK HERE TO DOWNLOAD THE WORKBOOK IN THIS POST

When you have a growing database of customer records (such as purchases at a store), it is useful to have a method for tracking the previous visit for each customer.

A formula of the following type can be used for this.

=CALCULATE(MAX(Table1[DATE]),
    FILTER(Table1,
        EARLIER(Table1[Customer])=Table1[Customer] &&
        EARLIER(Table1[DATE])>Table1[DATE]))

(See http://gavinrussell.wordpress.com/2010/09/23/powerpivot-dax-session-notes-2-previous-row/ for the original idea behind this post).


However, note that the previous visit date is returned for every customer visit (except the initial one) for each customer. In some cases, this may be the desired result, but usually only the last customer visit record would add the previous visit as useful information.

While trying to make a formula that would do this, I thought that it would also be nice if I could expand that concept to find an Nth previous visit. Since I did not want to hard-code an Nth value that would have to be updated manually in the PowerPivot cube, I had to invent a method for creating a dynamic constant.

In order to create a dynamic constant for PowerPivot (which would simply be an updatable cell value in Excel), the first step is to create an Excel table. The table consists of a Date field which can be used to create a relationship with the fact table Date field and a constant field. In each row of the constant field, the same formula needs to point to the update cell (=$D$2).


Now, the table can be linked to PowerPivot, using the Create Linked Table feature.

Note from Rob:  Here’s another great example of a feature that I never use (Linked Table), because that linkage doesn’t refresh on the server.  But it has tremendous utility in desktop PowerPivot, so I’m glad David is showing it off here.

Two more steps remain before the dynamic constant can be used in a calculated column formula. By clicking Create Relationship in the PowerPivot window’s Design ribbon, a relationship can be established between the Date field in Table1 and the same field in Table2. Finally, a calculated column named NTH is added to Table1 with the formula =RELATED(Table2[NTH]).

Now it is time to create the formula. The full formula is too complex to view in a completely displayed form, so two parts of it are entered as calculated columns and used as inputs into the formula shown below.

=IF(Table1[IsLastRecord],
    CALCULATE(MAX(Table1[DATE]),
        FILTER(Table1,
            EARLIER(Table1[Customer])=Table1[Customer] &&
            EARLIER(Table1[DATE])>Table1[DATE] &&
            Table1[IsNthPreviousRecord])),
    BLANK())

where

Table1[IsLastRecord]

=COUNTROWS(FILTER(Table1, EARLIER(Table1[Customer])=Table1[Customer]))
    = (COUNTROWS(FILTER(Table1,
        EARLIER(Table1[Customer])=Table1[Customer] &&
        EARLIER(Table1[DATE])>Table1[DATE])) + 1)

and

Table1[IsNthPreviousRecord]

=(COUNTROWS(FILTER(Table1, EARLIER(Table1[Customer])=Table1[Customer]))
    - (COUNTROWS(FILTER(Table1,
        EARLIER(Table1[Customer])=Table1[Customer] &&
        EARLIER(Table1[DATE])>Table1[DATE])) + 1)
    = Table1[NTH])

IsLastRecord counts the number of records for each customer and compares that number to the number of records previous to the last record, plus 1. If the numbers are the same, the expression is true, and the last record in row context for each customer will have a date depending on whether it had an Nth previous visit (otherwise a blank).

IsNthPreviousRecord is the 3rd filter in the main DAX formula. This Boolean expression is true if the number of records for each customer, minus the number of records previous to the current record plus 1, is equal to the desired Nth previous visit.

When Nth is equal to 2 in the example table, only customers a & d have 3 or more visits, so those are the only records for which a previous-visit date is returned.

I have not gone into great detail in explaining exactly how this DAX formula works, mainly because if you work through this yourself you will gain a greater understanding of how to construct your own formulas. Note from Rob:  Heh heh, now I STILL don’t understand EARLIER().  Nor have I built a pivot table that exploits the power of this formula. I will also leave that as an exercise to the reader.

As you might imagine, there are many ways that this formula methodology for Nth occurrence can be used. Also, the technique for using a dynamic constant is pretty neat. I hope this helps in your future PowerPivot model-building projects!


Campaign Analysis: Seasonally-Adjusted Measures

January 16, 2011

 
Toilet Paper Sales At Halloween - Effective Promos or Just Seasonal?

 
“We sure sold a lot of toilet paper around Halloween this year.  That promotion we were running must’ve been REALLY effective!”

-Retail Manager Extraordinaire

 

 

 

 

   Halloween Mischief Has Nothing to Do With It!

A Common Question:  What Caused the Change?

Thought I’d take a break from the “Refresh on SharePoint” series and talk about an age-old problem…  and how easy it is to address with PowerPivot.  There isn’t much complexity behind it, and yet it’s still much more useful and re-useable than anything you can do with Excel alone.

Let’s say you work for the fictional AdventureWorks bicycle company.  And for the past few weeks, you’ve been running a special promotion:  “Buy a Bike, Get a Free Water Bottle.”  And so far, it’s looking pretty good, as evidenced by the highlighted region on this chart:

Basic PowerPivot PivotChart - Sales Qty Per Day Viewed By Week

 
OK, great.  But hey, those free water bottles are expensive and cut into your profits.  So if you want to be truly responsible, you need to make sure there aren’t other variables driving this bump instead.

(Oh, and for readers outside the US – are we the only country where teenagers engage in “toilet papering” other teens’ houses on Halloween?)

Getting a Baseline From Prior Years

The most common source of “noise” in sales like this is just normal seasonal fluctuation.  Weeks 17 thru 21 of the year – are those normally hot weeks for bike sales every year, whether there are promotions running or not?

What I am going to show is just one technique for taking that into account.  There are many others, but this should get you thinking.

Step One:  Baseline PivotTable

In the chart above, I’m just looking at bike sales in 2004.  To get a baseline, I’m going to look at the three prior years:

Baseline Pivot for Seasonal Adjustment in PowerPivot

OK, yes, the numbers are smaller in prior years – these are the dangers of using a sample data set like AdventureWorks.  The inspiration for this blog post, however, is a real project I’m working on at Pivotstream…  and for client confidentiality reasons I obviously can’t use that data set.  So bear with me – the technique here is the important thing, not whether my sample data set is realistic.

Step Two:  Create a Weekly Adjustment Factor

At the bottom of that pivot, the grand total represents the average sales per day over the entire 2001-2003 period:

Grand Total Historical Baseline

The ratio of a given week’s sales per day vs. that grand total is an excellent indicator of how active that week of the year is on an ongoing basis.

So, let’s calculate it.  And since I only intend to calculate these ratios as one-time fixed values, I’m just going to use normal Excel formulas to do it (as opposed to creating a measure for it):

Ratio of Each Week Vs Grand Total Creates a Seasonal Adjustment Factor  
Calculating the Seasonal Adjustment Ratio for Each Week
($F$57 is a Fixed Reference to the Grand Total Cell)

Fill that formula down for all weeks and we get:

Ratio of Each Week Vs Grand Total Creates a Seasonal Adjustment Factor 2

Side Note:  Building the Right Reference Pivot

Before I move on, I think it’s worth mentioning that it pays to think about the makeup of your reference pivot.  What measure should you use?  For instance, if the price you charge for your products fluctuates a lot for random reasons (EX:  if you sell farm produce, which is subject to worldwide random price changes), then a pure dollar figure measure may not be a clean baseline.  So my Qty per Day measure would be better in that situation.

Similarly, if your company acquired another company in June of 2003, and its product line merged into yours in that month, effectively doubling your sales, that will also skew your baseline higher for the second half of the year – my Qty per Day measure would NOT be immune to that problem.  So in that case you may prefer to slice the baseline pivot to ONLY include the original company’s product line.

Point is…  it takes business-specific expertise to define your cleanest baseline.

Importing Those Factors Into the PowerPivot Model

OK, now that I have those numbers, I want to get them into PowerPivot so I can apply them all over the place.  A little Copy/Paste Values gets me this two column table in Excel:

Seasonal Adjustment Factor Table in Excel

From there, I can copy/paste into the PowerPivot window as a brand new table:

  Seasonal Adjustment Factor Table in PowerPivot

Relating that Table into the Rest of the Model

Next step is to create a relationship.  In this particular model, I need to connect this new table to the existing Date table, since that’s the only place where I have a Week Number column in the model:

Relating Seasonal Adjustment to Date Table

Now, I never intend to put fields from this new “Seasonal Adjustment” table on any of my pivots…  which means that table will never get filtered by anything (slicers, row fields, etc.)

That’s a problem, because ultimately I will need filtering by week in order to make my measures work.  So I need to get those values added into the Date table, because Date table fields WILL be on my pivots.  A little =RELATED is in order:

Seasonal Adjustment Factor Added to Date Table Using Related

Creating a “Seasonal Adjustment Divisor” Measure

Basically, we want a measure that, in the context of a specific week of the year, returns the corresponding Adjustment Factor from the Date table.  Once we have that measure, we can divide other measures by it to create “seasonally adjusted” versions.

Here’s the formula for the “divisor” measure:

Seasonal Adjustment Divisor PowerPivot Measure Formula
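(The formula screenshot is missing here, but given the AVERAGE() discussion just below, it was presumably a one-liner along these lines – a sketch, with the column name being my guess based on the captions above:

[Seasonal Divisor] = AVERAGE(Dates[SeasonalAdjustmentFactor])

In the context of a single week number, AVERAGE simply returns that week’s factor.)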

Were you expecting something more complex?  Yeah, I was too :)  And to be fair, in the real-world example I am working on, it is a bit more complicated, but that has more to do with the data than with the basic approach.

To verify that it works, let’s compare it side by side with the original seasonal table I built in Excel:

Verifying that the Seasonal Adjustment Divisor Measure Works

Bingo.  Matches perfectly.  Of course, given that we’re just looking at single week numbers here, I didn’t have to use AVERAGE() – I could have used MIN(), MAX(), or SUM() and gotten the same results.

Seasonal Divisor Measure Works at Month Level Too

But AVERAGE() really does make a big difference at higher levels, like when I add Month to the pivot:

So the month of April, for instance, typically moves about 81% as much product as the average month.  Neat.

And December, unsurprisingly, clocks in at 170% as much volume as the average month (hey, AdventureWorks sample db DID get that one right).

OK, that’s the last prep step.  Now we get to the payoff – the creation of seasonally adjusted versions of our core measures.

 

Seasonally Adjusted Sales Measures

All that work above may seem like an investment, but trust me, it has taken me a lot longer to write up this blog post than it did to actually just build the model.  About 10 minutes, tops.

No matter what, though, the next step is as quick and as painless as it gets.  Just start dividing existing measures by the divisor measure!

For instance, at the beginning of this post I was using the “Qty per Day” measure.  Let’s make the seasonally adjusted version:

Seasonally Adjusted Sales per Day Measure in PowerPivot
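(Another lost screenshot, but this one is easy to reconstruct, since the adjusted measure is just a division – the measure name below is my guess, the pattern is the point:

[Adjusted Qty per Day] = [Qty per Day] / [Seasonal Divisor]

That’s it – one divide.)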

(Oh, and if you are wondering how I made the text in the measure dialog bigger, well, it’s as simple as holding down the CTRL key and rolling the mouse wheel.  Thanks to Alberto Ferrari for this great tip.)

Time to test it out.  Let’s go back to the original chart and add this measure side by side with the original measure:

PowerPivot PivotChart - Seasonally Adjusted Sales Qty Per Day Measure 
Original Measure Side by Side with Seasonally Adjusted Version

Hey, when adjusted for seasonal trends, our sales activity was actually even higher…  but then again…  so were all of the other weeks on the chart as well.

In fact, after adjusting for seasonality, the “bump” from the promo campaign doesn’t seem as large as it did before.  Here’s the chart again, this time with just the seasonally adjusted measure:

PowerPivot PivotChart - Seasonally Adjusted Sales Qty Per Day Measure 2 
Seasonally Adjusted Measure by Itself – Where’s the Lift?

So yeah, maybe a small bump.  But not much.  Is it worth it?  Well, my next step would likely be to create a seasonally-adjusted measure of profit, since that will factor in the cost of the free water bottles.  Again, this all comes down to the particulars of your business, which is why having the tools directly in the hands of the domain experts is so critical.

Use it Everywhere!

Just wanted to emphasize this one more time:  now that I have the [Seasonal Divisor] measure, I can quickly create seasonal-adjusted versions of as many measures as I need.  One quick division and I’m done.

Also – if you ever want to adjust your definition of seasonality, all you have to do is change the formula for that one divisor measure, or the table that underlies it, and ALL of your seasonal-adjusted measures respond accordingly.  You don’t get that kind of “one touch update” in traditional Excel.

Lastly, because you are creating measures rather than calculations that are fixed to the layout of a given sheet, all of your work is easily re-useable in many different contexts – analyze by individual products or by entire categories.  By city or by country, no problem.  At a daily or monthly level.  You get the idea.

Other Techniques to Consider

Two more possibilities just to get the imagination going:

  1. Define the Seasonal Adjustment table in SQL – why not move that whole process, and the underlying logic, into SQL?  That way, you’ll get updated seasonality numbers over time, and you can probably get more sophisticated logic into the system.  Sure, you’ll need help from your DBA, unless you ARE the DBA.  But cooperation with your DBA team is a prime attribute of an optimal PowerPivot system.
  2. Differentiate by other fields – hey, maybe Accessories have a different seasonality pattern than Bikes.  And the Southwest region of the country might not show as much dropoff in January as the Northeast does.  You can absolutely factor that into your Divisor measure.  You could even try to do it 100% dynamically, in the measure, without pre-calcing an adjustment table first (see the sketch below).
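On that last point, here’s a rough, untested sketch of what a fully dynamic divisor might look like – the column names (Dates[WeekNumOfYear], Dates[Year]) and the 2001-2003 baseline cutoff are my assumptions:

[Seasonal Divisor Dynamic] = [Qty per Day]
   (ALL(Dates), VALUES(Dates[WeekNumOfYear]), Dates[Year]<2004)
   / [Qty per Day](ALL(Dates), Dates[Year]<2004)

The numerator is the baseline-years Qty per Day for the currently-visible week number(s); the denominator is the baseline-years overall.  And because it recalculates in whatever filter context it lives in, it would pick up slicers on Category or Region automatically.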

OK, that’s enough for now.  Back to SharePoint refresh shortly.


Santa’s Elven Productivity Simulator Leaked Online!

December 23, 2010

 
PowerPivot Model for Elven Productivity in Santa's Workshop

Now With Ornaments! :)

Get Your Password Now!

Folks, Wikileaks is at it again.  This time, they’ve come into possession of one of Santa’s most closely-guarded management tools:  his Elven Workshop Productivity Simulator.

And naturally, Wikileaks has asked for my assistance in getting this out to the public.

So I did what every responsible person would do.  I decided to host it.

To log in, go to https://clients.pivotstream.com/sites/santa/pages/simulator.aspx

Username:  santa@pivotstream.com

P-word:  S@ntaModel1

How it was done

I won’t go into too much detail, but…

  1. The web app uses PowerPivot for SharePoint, Excel Services, etc.
  2. The app was built 100% in Excel 2010 with PowerPivot for Excel
  3. No VBA, no code, nothing like that
  4. PowerPivot measures in hidden pivot tables
  5. Then, a dose of advanced Excel formulas, and LOTS of conditional formatting

Special Thanks To

My wife Jocelyn – for doing the “lite brite” style tree art for me in Excel.

Pivotstream CEO Jeff Elderton – for letting me bounce ideas off of him…  and for, ahem, helping me “spruce” up the visuals a bit (get it??  Spruce???  Certainly worth every penny, this entertainment).

All of my colleagues at Pivotstream – for humoring me and tolerating this sort of thing.  I dare say they encourage it.

All of you – for stopping by, whether this is your first visit, one of many, or even…  both your first AND last visit :)

Disclaimers

1) I may disable this account at any moment if things get too crazy.  Check in with me on Twitter if you want to see status updates.

2) If you are not Christian, please don’t take any offense.  Trust me, I am no more Christian than you are – I’m pretty much not religious in any accepted sense, but I do love the trappings of the holidays I was raised with.

3) I am of course not associated with Wikileaks – big shocker I know.  I personally don’t have much trouble with Wikileaks though – the fewer secrets big organizations have from the little folks, the better, in my opinion.  So I can either take it or leave it when it comes to Wikileaks, but the reaction, the move to freeze assets and shut down any mention of them in the press – well THAT is pretty scary.

Anyway, enough disclaimer crap.  Enjoy, everyone.  I hope you find this as entertaining as I did…  which is pretty unlikely now that I think about it :)


PowerPivot time intelligent functions: why use ALL() and how to work around it

June 3, 2010

Posted by: Kasper de Jonge

Last week I got a most excellent question from Sasha at my question page. Using my time-intelligent function screencast, Sasha created a workbook with YTD, using one table with both facts and dates in it. The problem he had was that, when he created his time-intelligent function, he wasn’t able to use data from columns other than the date columns. In this blog post I’ll try to describe how the time-intelligent functions work, what the pitfalls are, and how to solve them.

To be able to use a time-intelligent function in DAX, you use the CALCULATE function to group measures by a filter. With a time-intelligent function you want to filter your values over a period of time (like YTD or previous month).  Most of the time you want to use these functions inside a pivottable with dates on the x or y axis, so the values inside the pivottable show values for the current period context. This logically results in the following DAX formula:

Dates YTD = CALCULATE(sum(Tablix1[nroforders]),DATESYTD(Tablix1[Date]))

This reads as: you want the sum of Tablix1[nroforders] from the first Tablix1[Date] value of the year through the Tablix1[Date] belonging to the Tablix1[nroforders] in pivottable context. The YTD of Tablix1[nroforders] for March 2009, for example, would mean taking the sum of Tablix1[nroforders] from all rows from the start of the year through March 2009.

But when we add the formula as a measure and use it in a pivottable, we see something strange: the nroforders column and the YTD formula produce the same values.  This is not what we expected. The reason is that the time-intelligent function requires an additional parameter. In a blog post at PowerPivotpro.com, the PowerPivot product team gave the following answer to my question about why the result is not as expected:

When using Time intelligence functions like LASTDATE or DATESMTD, it is necessary to add another argument which is ALL (TimeTable). Without this only the selected dates are considered, and you can’t find the last month unless you are in the last month.

This is indeed what happens: the function uses only the current date context in the sum.

OK, so we need to use ALL() to get the results we want. This results in the following syntax:

DatesYTD w All =CALCULATE(sum(Tablix1[nroforders]),DATESYTD(Tablix1[Date]),all(Tablix1))

This is indeed the result we expected, but having to use the ALL() function has a huge downside.  The ALL() function, according to BOL:

Returns all the rows in a table, or all the values in a column, ignoring any filters that might have been applied. This function is useful for clearing filters and creating calculations on all the rows in a table.

This means that when you use a filter or slicer, it is ignored by the time-intelligent function that uses ALL() – the sliced YTD results come out the same as the unsliced ones.

There are two methods we can use to work around this problem:

  1. When you know which slicers and filters you want to work with, you can use an ALL() alternative: ALLEXCEPT(). With ALLEXCEPT you can pass in “a list of columns for which context filters must be preserved”.
    In our case we would like to be able to slice on country. This would look like:
    DatesYTD w AllExcept =CALCULATE(sum(Tablix1[nroforders]),DATESYTD(Tablix1[Date]),allexcept(Tablix1,Tablix1[country]))
    I would say use this option if you already know what you want to slice on and don’t have much time to solve it properly.
  2. The proper and most flexible approach is this second option. To be able to slice/filter on all the columns you can think of, you should create a separate time table. This isn’t very user friendly, and your end users will have a hard time grasping it. A few options I can think of to create this time table:
    1. Import the fact table with DISTINCT on the date columns (I hope your table isn’t too big…)
    2. Use Excel to copy the date rows, remove duplicates, and create a linked table (new dates won’t be added)
    3. Supply your users with a default time table in SharePoint/SSRS that they can import using the data feed option; use your DWH datetime table as the source.

You should create a relationship on the datetime column between the fact table and the imported time table (make sure your datetime columns have identical granularity – year, month, day – otherwise the join won’t find results). With this relationship in place, you are now able to use ALL() over the datetime table. When you use ALL() over the separate time table, it no longer ignores filters/slicers on your fact table.
The function would look like:
DatesYTD w All TT =CALCULATE(sum(Tablix1[nroforders]),DATESYTD(DateTable[Date]),all(DateTable))
You are now able to filter or slice on any column from the fact table you want. In the RTM version of PowerPivot the ALL() in the DAX function is no longer required; the separate time table unfortunately still is.


My conclusion is that time-intelligent functions are still a very powerful feature, but I am really disappointed in their user-friendliness. While these workarounds are easy for IT/BI personnel to grasp and implement, end users will have a harder time. Maybe MS can implement an “add date time table” button in PowerPivot that automatically adds a date time table to our PowerPivot models, to make it a little easier for end users to use time-intelligent functions.


Review new PowerPivot workbooks on SharePoint using approval workflows

June 1, 2010

Posted by Kasper de Jonge

Now for something not really about the PowerPivot engine, but something I think will be very nice to set up on your SharePoint environment.

At the excellent LinkedIn discussion on PowerPivot, someone suggested that we might make review of a PowerPivot file possible. I answered that we can, because the PowerPivot gallery is a document library and we can manage approval workflows on this gallery. But I also wanted to check out how to actually do this.

In this blog post I’ll show you how to configure a PowerPivot gallery so that all new PowerPivot workbooks published to it are reviewed by IT. I used a blog post I found which explains how to configure approval workflows step by step.

We start at our PowerPivot Gallery.

To enable workflows, we go to Workflow settings in the library tools.

Select the current library and click Add New Workflow; we now get to our workflow settings.

We want to use the Approval – SharePoint 2010 workflow and set an appropriate name: “PowerPivot approval”. We want a new task to be created in the reviewer’s task pane for every document that is added to the library. Next, we define the history list for the workflow. History lists are special logs which monitor the execution of the workflow; at each step, messages are written to the log. I want the workflow to be executed only when I upload a new file to the gallery.

We can now set up who the approvers responsible for reviewing the PowerPivot workbooks are; the settings are pretty straightforward.

Assign approvers and enter the request text they will see in the workflow task. The other settings aren’t necessary to complete the workflow.

We can now save our workflow, and it will be in effect immediately.

When I now upload a new document to my PowerPivot gallery, we can see that the PowerPivot approval is in progress.

When I look at my task list within the same site collection, I can see that my new PowerPivot workbook is awaiting my review.

We can open this workflow and comment on it; we can also open the document to look inside it.

We just approve the workbook now. When we check back at the gallery, we see that our workbook is approved.

So using workflows we can add an additional step to our managed self-service BI, and it is very easy to set up.


Building a cash flow statement in PowerPivot using dynamic measures in DAX

June 1, 2010

Posted by Kasper de Jonge

UPDATE FROM ROB:  You can now find all kinds of Accounting and Financial Techniques filed under a category created specifically for them.  To view those posts click here:

http://www.powerpivotpro.com/category/accountingfinancial-techniques/

Also, there is now also ANOTHER in-depth example of cash flow statements here:

http://www.powerpivotpro.com/2011/12/cash-flow-statement-in-powerpivot/ 

(Now, back to Kasper’s excellent post)

I got an excellent question at my “Ask a question page” on my blog. Greg asked me if I knew of a sample cash flow statement in PowerPivot. I did not know of one, and to be honest I didn’t even know what a cash flow statement was :).

Searching for a sample, I found a picture of a cash flow statement, which made things clearer. We want to see income and expense in two different tables, and then, in a new table, the cash flow statement, where we can see the starting cash at the start of a month, the income and expense during the month, and the ending cash at the end of the month.

I got really excited to solve this using PowerPivot, so I decided to build a sample myself. First I had to create a fact table that contains the income and expense values; I used the picture mentioned above to create sample data.

As you can see, we have income and expense for days in three months. To determine whether a specific activity is income or expense, I created a secondary table that groups my activities.

We can create a relationship between the two tables so we can put them in a pivottable together and calculate a measure of income minus expense.

Next I created a separate, related time table to use in time-intelligence functions.

We create relationships between the date in the fact table and the new date table.

Now we can create the first table, putting income and expense in one table per month, using only the relationships between the tables and no DAX.

Now for the interesting part: how do we create the cash flow table?

What we really need is a way to create 4 different calculations per row for each month on columns. As we can see in our sample, we need the following measures:

  1. Starting cash = the total YTD of (sum of receipts – sum of disbursements) through the previous month
  2. Receipts = sum of receipts in the current month
  3. Disbursements = sum of disbursements in the current month
  4. Ending cash = the total YTD of (sum of receipts – sum of disbursements) through the end of the current month

These measures translate to DAX as follows:

  1. TOTALYTD(Activities[Sum of Value]('Group'[group] = "Receipts") - Activities[Sum of Value]('Group'[group] = "Disbursements"),DATEADD('Date'[Date],-1,MONTH))
    subtract the Disbursements group values from the Receipts group values, for the total YTD through the previous month
  2. Activities[Sum of Value]('Group'[group] = "Receipts")
    values from the Receipts group in the current month (current column context)
  3. Activities[Sum of Value]('Group'[group] = "Disbursements")
    values from the Disbursements group in the current month (current column context)
  4. TOTALYTD(Activities[Sum of Value]('Group'[group] = "Receipts") - Activities[Sum of Value]('Group'[group] = "Disbursements"),'Date'[Date])
    subtract the Disbursements group values from the Receipts group values, for the total YTD through the end of the current month

But how can we do this in PowerPivot? We need to determine what to calculate per row. To do this we can use dynamic measures in DAX, as in this excellent post from MSFT’s Howie Dickerman.

First we need to be able to put values on rows that we can base our measures on. To do this I created a new table in Excel with the row labels and loaded it into PowerPivot.
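(The screenshot of that table is gone, but based on the measure below, it’s a single-column table – Cashflow, with a CashFlow column – containing four rows:

1 Starting Cash
2 Receipts
3 Disbursements
4 Ending Cash

The leading numbers keep the rows sorted in statement order.)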

We can now put these row labels against months in a new pivottable.

Now we can create a measure which uses the current row context to determine what to calculate. When we use the VALUES function, we can see what the current row context for Cashflow[CashFlow] is, and use this in an IF statement in the measure to calculate a different measure for each row.

The formula will look like:

= IF(COUNTROWS(VALUES( Cashflow[CashFlow])) =1,
IF( VALUES(Cashflow[CashFlow]) = "1 Starting Cash",  TOTALYTD(Activities[Sum of Value]('Group'[group] = "Receipts") - Activities[Sum of Value]('Group'[group] = "Disbursements"),DATEADD('Date'[Date],-1,MONTH)) ,
IF( VALUES(Cashflow[CashFlow]) = "2 Receipts",  Activities[Sum of Value]('Group'[group] = "Receipts"),
IF( VALUES(Cashflow[CashFlow]) = "3 Disbursements",  Activities[Sum of Value]('Group'[group] = "Disbursements"),
IF( VALUES(Cashflow[CashFlow]) = "4 Ending Cash", TOTALYTD(Activities[Sum of Value]('Group'[group] = "Receipts") - Activities[Sum of Value]('Group'[group] = "Disbursements"),'Date'[Date]) ,
BLANK())))), Activities[Sum of Value])

This will create one measure that, depending on the row context, shows a different calculation.

The result is the actual cash flow pivottable: the values are calculated per cash flow type, per month, each row showing a different calculation. I hope this was what Greg meant :)

This example shows again the great power of DAX – a lot of things are possible. I still get really excited about PowerPivot.

The sample file can be downloaded from my SkyDrive.

Retailer Overlap Analysis Using PowerPivot

March 16, 2010

 
Clint Pondering Where to Buy More Cigars

 

“The Rite Aids on one side of town, the Walgreens on the other, and me right in the middle.”

-The Shopper with No Name (and a Fistful of Dollars)

 

One of the coolest things about my new job at PivotStream is all of the new data sources I get to play with.  I thought I’d take a brief break from the Rank member posts and share a little bit from one of those sources.

Retailer Overlap Report

I just finished cooking up this report – click for full-size version.

All Retailer Intersections with PowerPivot

What does that show?

It shows the top 15 US retail chains by number of stores, and their competition against one another by ZIP code population.

For example:  take the first row, Albertsons.  Of all the people who live near an Albertsons, 65.9% of them also live close to a CVS, but only 4.7% live near a Rite-Aid.

Cool, huh?

Marketing versus competitors

Let’s say you are in charge of advertising and promotions for a large grocery or drugstore chain.  One of your primary jobs is to lure consumers into your stores as opposed to your competitors’.  Every day you receive a collection of all the specials, advertisements, etc. that your competitors are running in various regions of the country.

Naturally, you want to respond to their efforts.  But you can’t focus on them all equally, or you’d fall hopelessly behind.  So, which ones should you pay the most attention to?  This is the kind of thing that would help you.

Sliceable too!

OK, let’s say you work for Safeway, and you only run advertising for them in the South region.  How useful is the nationwide report above?  Hard to say.

For instance, nationwide, Walgreens competes with you for 47.3% of your customers, and Rite-Aid is second at 36.1%.  Does that hold up in the South?

One click and you have a brand new report:

South Region Only Retailer Intersections with PowerPivot

And look, Walgreens falls to 14.1%!  Rite Aid climbs to 52.5%…  but CVS now checks in at a whopping 73.5%!

So no, the national report would mislead you.  Same thing would be true if you ran all newspaper advertising nationally for Safeway, but you were ignoring an ad that CVS was running in the South.

Also Sliceable by Income Range!

Lastly, let’s say you work for ShopRite, and you control advertising in the South region.  Walgreens is running a monthlong special on Product X, and you happen to have the ability to run a better special on Product X…  but should you?

Well, you consult the report above, and it shows you that Walgreens only competes with you for 34.9% of your customers.  So you decide not to run a special on Product X, since that costs you a lot of money for not enough benefit.

But suppose you knew that Product X was primarily targeted at households making between $50K and $75K.  Slice by that income and you get:

South Region Only Specific Income Range Retailer Intersections with PowerPivot

Hey look, Walgreens competes with you for 61.6% of your customers in that income range. 

So that ad campaign might be targeted specifically at stealing business from ShopRite, for all you know :)

How did I do this?

It’s complicated, and once again pushed the bounds of what I know about DAX measures.  It starts out a lot like a Market Basket Analysis, which takes some work to explain, but then it takes another twist.

I’ll explain in subsequent posts :)

Flexibility!

But the crucial thing to note here for now is that this amount of flexibility is impossible in Excel by itself.

I mean, if I wanted to build a report that showed overlap between any two retail stores, with particular variables like region and income level held constant, I can do it.  Takes awhile but I can build one.

But as soon as someone says something like “I want to see this report filtered down to a different region,” well, you’re doing surgery on your original report.  Sometimes that takes almost as long as it did the first time around.

And when you’re done, well, now you have TWO reports to maintain.  Fun fun!

That’s the coolest thing about this report for me:  it remains flexible.  Its consumers don’t have to ask me for new versions of the report.  They can just click to filter.

And if they want new variables added, like racial demographics, that’s just a matter of adding new calculated columns to the PowerPivot window, not a new report.  I might even do that for next time.


Guest Post: Dany Hoter on Cube Formulas

March 1, 2010

 

“OK, I want the brooding frontman in the foreground, and the keyboard player and drum machine programmer in the back.”

-Photographer for the EuroSynth band, MDX

I’ve always loved this photo.  Dany Hoter, master of the analysis arts, takes his rightful place in the foreground.  Pierre-Henri and I, lurking in the back with mirthful smirks.

It always looked to me like the back cover of a cd booklet – this is kinda what I expected the Chemical Brothers to look like.  Alas, there is no cd, and no band.  This was taken in 2003 by Allan Folting (the Pivot Master) while the four of us were awaiting a hydrofoil car ferry (!) in Denmark.  (Post-production effects a la U2’s “Unforgettable Fire” provided by Dave Gainer, former Group Program Manager for Excel).

Today folks, we are honored to have Dany, the world’s leading authority on cube formulas, finish the lesson that I began.  A fitting conclusion indeed.

(For more information on Dany, one of my favorite human beings, be sure to check out his profile).

Take it away, Dany…

In the last chapter of the Fantasy Football case, Rob introduced the Excel cube functions. These 7 functions were introduced in Excel 2007 and are not directly related to PowerPivot. The cube functions’ purpose was to enable data from OLAP cubes to be brought into Excel cells using functions, thus enabling any shape of report, formulas that combine data from different cubes in one report, etc.

Because PowerPivot creates a data source which is compatible with OLAP cubes, it can be consumed by these functions, opening up many interesting scenarios.

In Rob’s post he used cubeset to generate the children of a member and to show the top N members in the set using a measure.

In the comments that followed Rob’s post people were asking about allowing filters on other dimensions while calculating the top N members in the set.

Download the sample now!

It is probably easiest if you go ahead and open up the workbook I created, located here, and then follow along below.

How it works

The only thing we need to keep in mind to achieve our goal is to make sure that the members in the set are ordered in descending order of the measure, combined with the filters. If you want to report on the top 10 products in June and show how they are selling in July, you need to make sure to get the top 10 based on June sales. Maybe you also want to use a specific geography and report on the top 10 products in Canada in June.

The way to sort a set by a combination of values from different dimensions is by creating a tuple, which in MDX terminology is a collection of single members, each from a different attribute (or hierarchy). Sales of ice cream in Seattle in June is a tuple with three members in it. A tuple can be created by reusing one of the functions we already met in the original post – cubemember.

If you supply a range of cells to cubemember instead of just one, it will create a tuple from all the members represented by the cells. It is your responsibility to make sure that each cell in the range contains a valid member and that they are all from different fields/columns.

The attached workbook shows such an example based on AdventureWorks data.

You see three slicers: on Country, Year, and Month. Notice that I use the month number and not the month name, because I haven’t found a way to sort the months by their number value while showing their names.

In the upper box you can decide how to create the top 10 product report. You can choose between the selected month’s sales and the previous month’s. You can also choose between using just the measure value, or a tuple created from the values of the filtered members in the slicers plus the measure. In order to use the filtered member, I need to extract the first member from the slicer set using cuberankedmember. A set is not a valid argument to cubemember.

So the formula to extract the first member from the countries filter is =CUBERANKEDMEMBER("Sandbox",Slicer_EnglishCountryRegionName,1)

And the formula used to define the tuple is =CUBEMEMBER("Sandbox",(O3,O4,O6,IF(L3="Selected Month",P5,O5)),"Tuple")

P5 extracts the filtered month while O5 contains the following formula =CUBEMEMBER("Sandbox","[Time].[MonthNumberOfYear].["&P5&"].lag(1)")

The MDX function lag(1) returns the previous member of a given member.

The solution has one main flaw: it cannot guarantee the right order, and thus the true top 10 products, if the user selects more than one member in any of the filters. The reason is that multiple members cannot participate in the tuple creation, so only the first member in the filter set can be used. In the example, a message is shown if the user selects multiple members.

We can be even more strict and blank the report completely if the value of cell Q3 is >1. This cell contains the formula =CUBESETCOUNT(Slicer_CalendarYear)*CUBESETCOUNT(Slicer_EnglishCountryRegionName)*CUBESETCOUNT(Slicer_MonthNumberOfYear)

Remember that each slicer returns a set with the filtered members and so can be consumed by cubesetcount and cuberankedmember. The same is true for any page field in a pivot.

The set in a page field is referenced by its cell address, while the set in a slicer is referenced by the slicer’s name.

I use conditional formatting to show only rows that are not empty and to show a border for the non-blank rows.