Quick Data Prep Trick: Use a Flat Pivot!

February 11, 2010

Today I had a huge source table that looked like this:

Store Month Payment Type Some Number
A Jan Cash 10
A Jan Credit 12
A Jan Debit 13
A Feb Cash 14
A Feb Credit 17
A Feb Debit 9
B Jan Cash 21
B Jan Credit 12
B Jan Debit 6

That went on for about 100K rows.  Thing is, I didn’t care about the Payment Type, and never would.  (My data set was a little different than this, so it’s hard to explain why I’d never care).

What I wanted was a much smaller source table that collapsed the PaymentType column and aggregated Some Number:

Store Month Sales
A Jan 35
A Feb 40
B Jan 39

Which would end up being about 1% of the rows.

My first thought was to create another table in PowerPivot and then use formulas to aggregate data from the big table into the new, small table.

But how to populate the first two columns of that new table?  Just getting all of the distinct pairs of Store/Month was going to be tedious work – I had > 200 stores!  I was contemplating VBA macros.

Then it hit me:  Flat Pivots!  (Kasper and I had been talking about them today for another reason – I think he’s going to share that soon).

What’s a Flat Pivot?

Flat PivotIt’s a new feature in Excel 2010.  Once you insert a PivotTable, you can set it to appear flat.

Thing is, I forget how you do that, because PowerPivot makes it SUPER easy at Pivot creation time, as shown here at right.

Drag Store and Month onto Rows, and heck, throw Sales into the Values area!  That yielded a PivotTable like this one below:

       Flat Pivot Results 1

Gotta get rid of those pesky subtotals now.  On the Design tab of the ribbon is the button I need:

No Subtotals
Which yields this as my Pivot:

Flat Pivot No SubtotalsPowerPivot Paste New Table ButtonCopy that to the clipboard and switch back over to the PowerPivot window.

On the main ribbon tab, click the big fat Paste button:

That yields a new table in PowerPivot:

Results Table in PowerPivot Window

It has all the unique combos of Store/Month, AND it already has the aggregated numerical column!  No formulas needed.

Took about 2 minutes.  Keep this in mind, it will save you a TON of time someday.

Hey, your PowerPivot Window is Green!!!

Why yes.  Yes it is.  Jealous?  Cough cough, GREEN with envy?

Don’t go looking for the setting.  It’s not there yet.  This is an improvement in recent builds of the addin.  You might also have noticed that the PowerPivot ribbon visuals are a little crisper.

Coming soon to a desktop near you :)


Jewelry Mashup for Alpha Geek Challenge

February 11, 2010

PowerPivotBookHello everyone. Rob invited me to do a guest post to talk about my Wives vs. Girlfriends entry in the Alpha Geek Challenge. That is a politically incorrect and provocative title for a real data analysis mash-up that I did with PowerPivot.

I spent the last 33 days locked in a room finishing five manuscripts for books about Excel 2010. One of those books will be PowerPivot For the Data Analyst from QUE. In the course of writing that book, I had to generate a lot of sample data sets to use as examples.

Some interesting things along the way:
1) I had some daily sales data for 2 ladies boutique stores. One is in a mall. The other is in an airport located 10 miles away. 3 years of mall history, only 2+ years of airport history because that store opened after the mall location.

2) Beginning theory: when there are rain delays, people have more time in the airport, so they buy more.

3) Beginning theory: when it gets hot, people go to the mall.

WeatherUnderground4) I knew that Rob had found weather data from somewhere, but as I started searching around, I could not find any good free weather databases. I did find a page a WeatherUnderground.com where you could retrieve weather statistics for one day for one city. Luckily, both of these stores were in the same city, so I only had to gather data for one city.

Unfortunately, getting three years of data meant visiting this web site 1095 times! Excel VBA lets you use the brute force approach, so while I was typing a chapter on one laptop, I set up a VBA macro on the other laptop to run an Excel Web Query over and over and over, changing the date in the URL each time. As each web page would come back to Excel, I would use VLOOKUP to pluck out the rainfall and the high temperature for the day.

5) Getting the data into PowerPivot was easy. Two linked tables. One relationship by date.

6) In the PowerPivot grid, I added some calculated columns with DAX:

  • To group the temperature data into buckets, I used a column called “Highs in the” and a DAX calculation of =INT(Weather[High Temp]/10)*10.
  • To group the daily dates into weekdays, I used =FORMAT(Sales[Date],”ddd”). This returns Mon, Tue, Wed, etc. You have to use Colin’s trick from the FAQ to actually sort those into custom date sequence. (Thanks Colin!)
  • Back in Excel, I had used the range version of VLOOKUP to assign classifications based on rain. The rainfall in inches was classified into “0-None”, “1-Trace”, on up to “4-Hurricane”.
  • I did not realize it when I started, but as I was doing DAX Measures, I found the need to have some concatenated fields so that I could do a count distinct. =CONCATENATE(Sales[Location],Sales[Date]) and =concatenate(Sales[Location],Sales[WeekdayName]).

7) I used the “four chart” layout in PowerPivot. I ended up using only three of those charts in the contest entry.
8) The important Measure that was enabled by DAX was “Sales by store by day that falls into the current filter context”.  There were some days in 2007 where only one store was open, and many days where 2 stores were
open. DAX Measures let me calculate total sales / (# of stores open on this day). That DAX Measure is: =sum(Sales[Net Sales])/Countrows(DISTINCT(Sales[LocationDays]))

9) During discovery, I used slicers, but for the contest entry, I needed different charts to have different filters, so I changed the slicers back to report filters so that one chart could be for the airport and another chart could be for the mall.

Surprising results:
As rainfall increases, sales plummet the airport. I guess that when people are freaking about because of flight delays, they are not busy shopping.

Chart1

Higher temperatures mean lower sales at the mall. When I was a kid, we would flock to the mall because they had air conditioning. Now, the people buying these high-end handbags probably already have air conditioning and no longer need to head to the mall.

Chart2

As I was looking for other trends, I ran one by weekday at the airport. Friday was the peak sales day during the week, probably indicating that business travelers are picking up a piece of jewelry on their way back home to their wife. But; I was surprised to see that the one day which beat Friday was Sunday. Sunday? If the business travelers are buying gifts for their wives on Friday, WHO the heck are they buying gifts for on Sunday? One theory is noted in the chart. In reality, that Sunday spike is probably from business women who are accessorizing up before heading out for their business trip. It would probably be interesting to break that Sunday data by hour to see if the sales are early in the day when people are heading home from weekend trips to the beach or on Sunday afternoon/evening when the business commuters are heading back out.

Once I had reached some interesting conclusions about the data, I spent some time formatting the workbook to not look like Excel. I got rid of a lot of the extra stuff that is standard on Excel charts, but then added some graphics (the rain is a pattern fill on the chart area in the left graph and the Delay Delay Delay graphic is a picture fill applied to the plot area. To use pictures in individual columns of the other charts, you have to click on a column twice. Once to select the series, and then a second time to select that one point. Then, use Format, Fill, from Picture.

I suppose the surprising PowerPivot lesson here was that the slicers were really helpful to me when analyzing the data, looking for interesting trends. Once I had decided on the message that I wanted to convey, though, the slicers had to go.

Thanks to Rob for allowing this guest post!


Calling for Guest Posts!

February 11, 2010

 stephen

“This must be a fashionable fight, it’s drawn the finest people.”

-Stephen

Last week, in in one of the comment threads, several of you were all over me with suggestions and questions, in a very good way.

And I think that’s to be expected, even though I know my way around this stuff pretty well.  I mean, “I can take the Pepsi challenge with that Amsterdam stuff any day.”  (That’s for you, Kasper.  And Pulp Fiction fans everywhere.)

But no one can really know every trick, every optimal method.  Excel and PowerPivot are both just too deep for one person to master everything.  And this is why community interaction is so much fun!  Everyone has something to teach, to suggest, or even to ask – and all of those activities enrich us all.

With all that said, it’s time for me to stop hogging the microphone.  If you have something cool you want to share – a data set you used, a technique you discovered, something you saw me do but you think you can improve it, let me know.  I’d be happy to post it here.  If you have your own blog, you can of course post it there, too – I’m just offering my site as a way to get more exposure for your ideas.

So, drop me an email if you have something in mind.  This is a standing offer.

And don’t worry, I’m not going to stop posting.  You can’t keep me away from this stuff :)


A few new FAQ features

February 10, 2010

Quick update so that everyone is aware of a few things the Great PowerPivot FAQ can do for you.

Have you ever noticed the View dropdown?  It contains 3 items of note:

View Dropdown

Latest View

This view shows you the most recently updated FAQ items at the top.  Useful if you just want to see what’s new on a particular day.

Portuguese Translation View

Yes, this is real :)   Thiago Zavaschi has been translating FAQ content from English to Portuguese, to satisfy the hungry PowerPivot audience in Brazil :)

If you see a need for another language translation and would like to volunteer, please let me know.  It’s very easy for me to set up another view.

Unanswered Questions View

I know, hard to believe, but there actually ARE a few questions posted that have no answers yet.

If you see a question here that you know the answer to, please drop me or any of the moderators a note.  We will credit you with the answer :)


Dirty Swedish Video

February 9, 2010

What Did You Say About Comptrollers

No, not that kind of video.  Sorry to get your hopes up.

No, you see, in my visit to Sweden, they recorded a promotional video of me saying things about PowerPivot, as a teaser for the recording of the entire presentation.

And they made me say terrible, unspeakable things.

I never wanted to have those kinds of thoughts about accountants.  But you can’t unring a bell.

See for yourself.  But you won’t be able to unthink this thought once it’s been, um, emitted.

Swedish PowerPivot Teaser


Surprising Example of PowerPivot Compression

February 8, 2010

Tony and Mike

“Best not analyze too deeply on this one, huh?”

A long time ago I promised a guy named Sam that I would dig up some examples of PowerPivot compression.  I then, of course, forgot all about it until today.  Sorry Sam.

But today I was playing with a data set on my desktop machine that was really getting me down.  The data was provided as a (regular, non-PowerPivot) Excel file with 500K rows in it.  And my machine labored to do anything at all with that data – opening it took forever.  Deleting a single row took forever.  And so on.

But this was the same machine I had used to do a PowerPivot demo with a 100M row data set, which ran with no problem!  0.5% of the data was bogging me down in Excel!

So I decided to do a “before and after” comparison.

Hypothesis

Based on my knowledge of PowerPivot compression, I expected that PowerPivot would do a little bit better than Excel on disk, and a lot better in RAM.

Didn’t quite turn out that way :)

Results

       PowerPivot Compression Relative to Excel

 

Data Stored in Excel Sheet

Data Stored in PowerPivot

File Size on Disk

126 MB

10.2 MB

RAM consumed by Excel.exe

1200 MB

140 MB

Wow.  8x better in RAM, 12x better on disk!  (I was SO tempted to use a pie chart.  Just kidding).

Good thing I ran the experiment twice!

First time I did this, the PowerPivot RAM number was 600 MB.  Still 2x better than Excel, but 8x is much better :)

Not quite sure what was going on with that 600 MB number.  I’d had Excel open for awhile and probably hadn’t closed Excel.exe completely after closing the native Excel file.  Probably that was it.

OK, don’t analyze!  Rejoice!

Bottom line:  12x better on disk and 8x better in RAM, no matter what, is tremendous.  Not to mention the performance of working with the data and how much better that is.

And PowerPivot’s data compression ratio only gets better with larger data sets.  500k rows is big, yes, but remember, this same computer handled 100M, which I obviously can’t even test out in Excel for comparison.

Now that the data is loaded into PowerPivot, I don’t even have to use my desktop machine to work with it.  I’m switching over to my laptop.  The laptop is even running 32-bit whereas the desktop is 64.  But hey, the laptop fits on my new treadmill desk :)

And my waistline could use some compression.


Captain Ramius Returns!

February 7, 2010

008668


“Once more, we play our dangerous game, a game of chess against our old adversary – the American Blogger.”

-Capt. Marko Ramius
(aka the Vilnius Schoolmaster)

We Have Sonar Contact

Ramius Returns

Yep, there you are again, I was hoping you’d come back :)

I’ve become convinced it was just a misunderstanding.  Weird that it made him angry enough to go around posting about my incredible hacking skills on other sites of course.

But a misunderstanding nonetheless.

So anyway Tom, drop me an email.  I won’t embarrass you.  I much prefer being friendly, and I’ve never known anyone from Vilnius :)   I’ll even send you an autographed PowerPivotPro coffee mug as a peace offering.

Pretty sure this is the last time I’ll post on this topic.  It’s been a fun Saturday diversion, though, and any opportunity to work in a good Red October quote is of course quite welcome :)


Espionage? Spam? Ravings of a lunatic?

February 6, 2010

Interesting thing happening today.  Someone commented on several of my posts today, saying this:

“powerpovotpro.com(PPP) hijacks your google.reader(GR) by always showing as the primary feed selection whenever you first enter GR or refresh GR. After adding PPP feed GR ignores your last viewed feed and your startup feed, even after removal of the PPP feed GR offers bugs you by displaying PPP feed and offering to subscribe to it. Behavior observed in firefox. Cured by removal of the cookie.”

He then went on to say he was going to report me to whoever would listen, and to call me a “bunch of f—ing idiots.”

He needs to get his facts straight.  I am merely a SINGLE “f—ing idiot.”  No other “f—ing idiots” are involved in the production of this site, at least not yet.

Posting to other sites saying the same thing about me!

OK, that’s funny enough.  But Dick Moffat just emailed me and told me the same guy posted the same thing as a comment on his blog.

Not cool.  And in neither case did he leave a legitimate email address.

I use WordPress.com’s standard RSS feature!

Folks, I don’t even host my own blog.  It’s technically powerpivotpro.wordpress.com, and runs entirely on wordpress.com servers. 

My RSS link is just the standard wordpress.com RSS module.  Nothing more.  So perhaps that module is doing something funky.  I doubt it, but if something strange is going on, it is not my fault :)

Lithuania for the win!

The only info I have on this person is this:

The Vilnius SpamMaster

So they spent an hour on my site before deciding to teach me a lesson, for something I did not do.

(Vilnius!  The hometown of Marko Ramius of Red October!  The Vilnius Schoolmaster!)

Anyway, “Tom,” as you called yourself, if you are out there reading this, please drop me an email and maybe we can clear this up.


PowerPivot Data in Scatter Charts, etc.

February 5, 2010

All Chart TypesThis was going around on the MS-internal discussion list this past weekend:

At right is a list of all chart types in Excel.  But not all of them are supported as PivotCharts.  Try to use XY (Scatter), Bubble, or Stock as a PivotChart, and you’ll get the alert below:

Illegal PivotChart Types

 

So, do you give up?

Convert to Cube FormulasNope.  Remember the post where I converted a PivotTable to Cube Formulas using that button on the ribbon?  This is a wonderful opportunity to use that feature in an unexpected way.

Once it’s converted to formulas, you can create any chart type you want against it, no restrictions.  It’s just regular old cells now.

Neat huh?

 

But what about grow/shrink?

The next question is usually “OK great I can scatter chart my data points.  But tomorrow, I may have a different number of data points.  Will my spreadsheet pick that up?”

The answer is “no, not even close.”  :)   When you convert to formulas, the original list of rows/columns is fixed, by virtue of the CUBEMEMBER function that makes up the axes.

Intrepid PowerPivot adventurers, however, may want to try out the following technique, by which the data WILL be picked up in your chart.

Oh no, not CUBESET again!

Oh yes.  CUBESET.  Again.  You’re gonna use that CUBESET function, and you’re gonna like it!

But you might have a love/hate reaction to some of the other Excel acrobatics I am about to share :)   For something that Excel actually supports natively, and seems to have gone out of its way to support, this should be easier to do than it is.  But why complain?  Fact is, it CAN be done, and it feels like magic.

It will be easiest if you follow along on your own desktop, so PLEASE GRAB THE WORKBOOK FROM HERE – that contains the results of everything I show below.

OK, you have the workbook now, right?  Let us continue.

To simulate the effect of getting more/less data, modify the Set in cell A1 to grab any column out of [DimCustomer] that you’d like.  Since each column has a different distinct number of members, you’ll get behavior that is essentially the same as more/less data after a refresh.

The “magic” here lies in two things.

First is defining named ranges that refer to an OFFSET formula rather than a fixed range:

clip_image002

Here is that formula in case you want to copy paste it:

     =OFFSET(Sheet1!$C$4,0,0, COUNT(Sheet1!$C$4:$C$5000),1)

The formula I used for YRange is simply another offset off of that first named range (XRange), using this formula:

     =OFFSET(XRange,0,1)

Second is this:  YOU MUST NOT RELY ON THE REFEDIT CONTROL TO POINT YOUR CHART AT YOUR DATA

The reason is, that refedit control will eat your dynamic named range and convert it into a fixed range, destroying all of that incredibly obscure work you did above J

This is the chart refedit control, for reference.  What you put in here will NOT adjust, ever.

clip_image004

Instead, do this:

  1. Create the chart.  Point it at whatever data you currently have, as if it was going to be a static chart (yes, use the refedit control for now)
  2. Now, in the created chart, select the data range within the chart.  If you do this right, your data points will get selection handles and the SERIES formula will appear in the formula bar, like this:

clip_image006

3. Then you edit that formula, replacing any range of cells with your dynamic named ranges, XRange and YRange

   a. Note that I’ve already done that in the picture above

   b. Also note that you must prepend the name of the workbook, then a “!”, to your names in order
        to make them work

   c. You must also perform the happy Excel dance in order to appease the ancient lords of recalc

Also of note:

  1. I only copied the formulas down to row 5000, and set my named ranges to cap out at row 5000 as well.
  2. Some fields will overshoot that 5000th row.  I could have gone farther of course, I just got bored.  You can pick a bigger number.
  3. I wrapped all formulas in IFERROR so that I don’t get error popups.
  4. If you don’t mind popups, I believe this whole thing works without IFERROR
  5. If one of your cube formulas returns an error for any reason other than running off the end of the data, you will miss data points
  6. That’s because the dynamic named range is based off of COUNT, rather than off of finding the last valid data point
  7. Even if I had not used IFERROR, the errors trigger the same problem
  8. It might be possible to fix this problem using INDIRECT and some even more arcane techniques

Simple, right? :)


Tommy Farmer? Donny Chong? A Photo Essay

February 4, 2010

Given that it’s Doppelganger Week on Facebook, I thought it was time to show you what I’ve been talking about.  Donald Farmer and Tommy Chong – not just separated at birth, but living intertwined lives ever since.

Check it out.

25 blog_author_donald_lg9 Donald_Farmer          4
7 Capture 3 1

Photos, Left to Right, Top to Bottom:  Donald at University, Greenwich 1976; Donald’s “Medieval Archaeology” Period, date unknown; Tommy, artist’s rendering on side panel of 1968 VW Minibus, 1981; Donald’s famous Speaking Vest makes its debut on Redmond campus; Tommy, mugshot, 1998; Donald, TechEd 2006 speaker photo; Donald on day 25 of his late 2009, 9-country, 27-day speaking tour; Tommy’s guest speaker role at SP Conf 2009 going awry; Donald on Larry King’s Extreme Makeover Edition; Tommy’s inaugural “Power Hour” podcast, Days Inn, Boulder Colorado 2006.

(Yes, Donald selected me as one of his top 10 BI bloggers of the year, and THIS is how I repay him.)