Cloud Data Approaching Critical Mass: Connection Cloud, SalesForce, PowerPivot, & Webinar on YouTube

December 13, 2012

 

Cloud Data Like SalesForce Available to PowerPivot as if it Were in a Local Database:
My Long Wish for a “Data Highway” Gets Closer Every Day
(Click for the Webinar Featuring Yours Truly on YouTube)

Flashback 2001:  The “Data Highway” Concept

Back at Microsoft in 2001 when I was working on what eventually became Excel 2003, I pitched a vision that I called “Data Highway.”  (OK, not an original name considering the Information Superhighway thing coined by Internet inventor Al Gore, but invention is smart and theft is genius, or something like that.)

The idea behind Data Highway was simple:  all relevant data made available to the most popular tools (cough cough Excel), in a convenient and refreshable format.  No manual gruntwork required to “fetch” data in other words – saving your brain for actual thinking.

There were three elements to the pitch:

  1. A common internet protocol for exchanging data. 
  2. “Teaching” Excel, Access, and other tools to consume any data source exposed via that protocol.
  3. A marketplace for data where providers like Dun and Bradstreet could sell data to be piped straight into Excel.

Well, the protocol flopped, and our VP killed the marketplace idea before it got off the ground.  Having good ideas isn’t enough – you can’t be too early, and you also need to execute better than we did.

Fast Forward to Today

Here we are at the end of 2012, and we have all three elements available in different (but robust and real) forms.


“Drill Across” in PowerPivot – Live Demo

May 24, 2012

Hyperlinks in a Pivot

“I’m telling you there are monkey-fighting hyperlinks in this Monday-to-Friday pivot!”

(Seriously this is how they cleaned up his line for TV, with “monkey-fighting” and “Monday to Friday”)

***UPDATE:  I am no longer working at Pivotstream and do not endorse their services.  All links are removed from this article but feel free to look them up if you are interested.


Retailer Competitive Overlap Application – New and Improved Live Demo

PowerPivot Retailer Competitive Overlap Application With Drill Across

Revamped/Simplified “Retailer Competitive Overlap” Application
(Note that the Row Labels Area of the Pivot Contains Hyperlinks!)

Clicking an Item to Get More Detail

The retailer overlap application is one that I’ve covered before, in my post announcing our live PowerPivot demo site, but I’ve recently spent some time improving it based on customer feedback and requests.

Specifically, our retail customers have asked the following:  “It’s great that I can see that Retailer X competes with me for our senior citizen customers much more aggressively than we thought, but can I get a list of the actual stores that overlap, with addresses?”

 


But WHICH Stores?  I Want to See the Addresses!

Hyperlinks in a Pivot!?

Let’s zoom in on the row area of the pivot pictured above.


Datamarket: Quick Followup

April 19, 2012

 

“There are people out there whose jobs force them to be the place where two sources of data meet, and they are the ones who integrate and cross-reference that data to form conclusions…

…I think a lot of the world is like that.

-Bill Gates circa 2002

People are always asking me if I know Tyler Durden

I mean, I’m often asked if I ever met Bill Gates during my time at Microsoft.  I did, once, in 2002, when he wanted to review the XML features we were introducing in Excel 2003.

A few things from that meeting lodged in my head, and the quote above is one of them.  The first sentence is a paraphrase on my part, as I don’t remember it precisely.  But the last sentence, in italics, I remember word for word, because I found it so validating of some of my own personal views and experience.

And I just realized, today, that quote should have been attached to one of the previous posts on DataMarket.

Real Reason for the Post:  People with V2 Don’t See the Same Thing?

Last week I posted a workbook that lets you download weather data from basically anywhere in the world, accompanied by instructions on how to customize it for your needs.

The workbook I provided was produced in PowerPivot V1 (SQL 2008 R2, versions 10.xx).

I received reports from a few people that when they got to the step of editing the connection in the workbook, they saw a different dialog than I did.


Is This What You See When You Edit Connection In The Workbook I Provided?

What I See

From the previous post, here’s what I see, with the two things you need to change highlighted.

How To Make Required Changes If You Are Seeing the “Alternate” Dialog


Click Advanced


Fill In Required Info in These Three Places

That Should Do It

Let me know, again, if you have problems.  I am channeling feedback to Microsoft on this stuff so they can address any snags we hit.

Why is Microsoft paying such close attention to us in particular?

Because WE are those people where the data sources come together.


Apparently April 12 is “DataMarket Weather Day”

April 12, 2012

 
So cool…  Chris Webb ALSO posted today about downloading weather data from DataMarket.

And any post that starts with the words “I don’t always agree with everything Rob Collie says” gets an immediate boost in credibility – very wise words indeed :)


Chris takes a different approach and goes through the full online UI rather than sharing out a pre-baked workbook like I did.  My approach was intended to make things simpler for you.  Let me know if I was successful :)


Download 10,000 Days of Free Weather Data for Almost Any Location Worldwide

April 12, 2012


“And I feel, so much depends on the weather…
So is it raining in your bedroom retail?”

Example:  800 Days of Weather in New York City


820 Days of Weather Data from New York City, Pulled From DataMarket
(Temps in F, Precipitation in Inches)

Come on, admit it.  It’s very likely that you would have a use for data like this, whether from a single location or a range of locations, as long as the location(s) were relevant to your work and the data was easy (and cheap) to get.

Good news:  I’m gonna show you how to get this same data for the location(s) you care about, for free, and make it easy for you.  Read on for the weather workbook download link and instructions.

First:  A Practical Application to Whet the Appetite

As I said in the last post, I think there are a lot of important things to be learned if we only cross-referenced our data with other data that’s “out there.”

I happen to have access to two years of retail sales data for NYC, the same location that the weather data is from.  To disguise the sales data, I’m going to filter it down to sales of a single product, and not reveal what that product is.  Just know that it’s real, and I’m going to explore the following question:

If the weather for a particular week this year was significantly better or worse than the same week last year, does that impact sales?

Let’s take a look at what I found:

Impact of Weather on Sales of a Specific Product

RESULTS:  Weather This Year Versus Last Year (Yellow = “Better”, Blue = “Worse”),
Compared to Sales This Year Versus Last (Green = Higher, Red = Lower)

I don’t have a fancy measure yet that directly ties weather and sales together into a single correlative metric.  I’m not sure that’s even all that feasible, so we’re gonna have to eyeball it for now.

And here is what I see:  I see a band of weeks where sales were WORSE this year than last, and the weather those weeks was much BETTER this year than last.

And the strongest impact seems to be “number of snow days” – even more than temperature, a reduction in snow this year seems to correlate strongly with worse sales of this product.
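In the meantime, even a crude pair of measures produces the two columns you need for the eyeball test.  Here’s a minimal sketch of the snow-days side, assuming the weather table is named Weather with a Snowfall column, related to a contiguous Dates table – all of those names are my assumptions, not necessarily what’s in the actual workbook:

[Snow Days] =
   CALCULATE(COUNTROWS(Weather), Weather[Snowfall] > 0)

[Snow Days LY] =
   CALCULATE([Snow Days], DATEADD(Dates[Date], -1, YEAR))

[Snow Days vs LY] =
   [Snow Days] - [Snow Days LY]

Put [Snow Days vs LY] on a pivot with weeks on rows, next to an equivalent sales-versus-last-year measure, and you can scan for the blue/yellow-versus-red/green pattern by eye.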

Does that make sense?  I mean, when the weather is good, I would expect a typical retail location to do MORE business, especially in a pedestrian-oriented place like NYC.  And we are seeing the reverse.

Aha, but this is a product that I would expect people to need MORE of when the weather is bad, so we may in fact be onto something.  This is a long-held theory of ours (and of the retailer’s), but we’ve never been able to test it until now.

All right, let’s move on to how you can get your hands on data for your location(s).

Download the Workbook, Point it at Your Location(s)

Step 1:  Download my workbook from here.

Step 2:  Open it and find a location that interests you from this table on the first worksheet:


Nearly Four Thousand Cities Are Available on the First Sheet
(Pick One and Write Down the Location ID)

Step 3:  Open the PowerPivot window and open the connection for editing.

Step 4:  Replace the Location ID and fill in your Account Key:

ALSO IMPORTANT:  Make sure the “Save My Account Key” Checkbox is Checked!

Don’t Have A DataMarket Account Key?

No problem, it’s easy to get and you only have to do it once, ever.

Step 1:  Go to https://datamarket.azure.com/

Step 2:  Click the Sign In button in the upper right to create your account.

Step 3:  Follow the Wizard, finish signing up.  (Yes it asks for payment info but you won’t be charged anything to grab the data I grabbed.)

Step 4:  Go to “My Account” and copy the account key.

Next:  Subscribe to the Free Version of the Weather Feed

Go to the data feed page for “WT360” and sign up for the free version:


I’ve Gone a Bit Crazy With This Service and Already Exhausted my Free Transactions,
But That Was NOT Easy to Do

Back in the Workbook…

Now that you’ve entered your account key, set the LocationID, subscribed to the feed, and saved the connection, you can just hit the Refresh button.

Send Me Your Mashups!

Or at least screenshots of them.  I’m Rob.  At a place called PowerPivotPro.  DotCom.

And let me know if you are having problems, too.

I’ll have some more details on Tuesday of things you can modify and extend, like how I added a “Snow Days” calculated column.
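For the impatient, the likely shape of that one is a calculated column in the weather table – call it Snow Day Flag – that flags snowy days, plus a measure that sums the flag.  (The Snowfall column name here is my guess at the feed’s schema, not gospel.)

=IF(Weather[Snowfall] > 0, 1, 0)

[Snow Days] =
   SUM(Weather[Snow Day Flag])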


DataMarket Revisited: The Truth is Out There

April 10, 2012


How many discoveries are right under our noses,
if only we cross-referenced the right data sets?

Convergence of Multiple “Thought Streams”

Yeah, I love quoting movies.  And TV shows.  And song lyrics.  But it’s not the quoting that I enjoy – it’s the connection.  Taking something technical, for instance, and spotting an intrinsic similarity in something completely unrelated like a movie – I get a huge kick out of that.

That tendency to make connections kinda flows through my whole life – sometimes, it’s even productive and not just entertaining :)

Anyway, I think I am approaching one of those aha/convergence moments.  It’s actually a convergence moment “squared,” because it’s a convergence moment about…  convergence.  Here are the streams that are coming together in my head:

1) “Expert” thinking is too often Narrow thinking

I’ve read a number of compelling articles and anecdotes about this in my life, most recently this one in the New York Times.  Particularly in science and medicine, you have to develop so many credentials just to get in the door that it tends to breed a rigid and less creative environment.

And the tragedy is this:  a conundrum that stumps a molecular cancer scientist might be solvable, at a glance, by the epidemiologist or the mathematician in the building next door.  Similarly, the molecular scientist might breeze over a crucial clue that would literally leap off the page at a graph theorist like my former professor Jeremy Spinrad.

2) Community cross-referencing of data/problems is a longstanding need

Flowing straight out of problem #1 above is this, need #2.  And it’s been a recognized need for a long time, by many people.

Swivel and ManyEyes Both Were Attempts at this Problem

I remember being captivated, back in 2006-2007, with a website called Swivel.com.  It’s gone now – and I highly recommend reading this “postmortem” interview with its two founders – but the idea was solid:  provide a place for various data sets to “meet,” and to harness the power of community to spot trends and relationships that would never be found otherwise.  (Apparently IBM did something similar with a project called ManyEyes, but it’s gone now, too).

There is, of course, an even more mundane use than “community research mashups” – our normal business data would benefit a lot from being “mashed up” with demographics and weather data (just to point out the two most obvious).

I’ve been wanting something like this forever.  As far back as 2001, when we were working on Office 2003, I was trying to launch a “data market” type of service for Office users.  (An idea that never really got off the drawing board – our VP killed it.  And, at the time, I think that was the right call).

3) Mistake:  Swivel was a BI tool and not just a data marketplace

When I discovered that Swivel was gone, before I read the postmortem, I forced myself to think of reasons why they might have failed.  And my first thought was this:  Swivel forced you to use THEIR analysis tools.  They weren’t just a place where data met.  They were also a BI tool.

And as we know, BI tools take a lot of work.  They are not something that you just casually add to your business model.

In the interview, the founders acknowledge this, but their choice of words is almost completely wrong in my opinion.

Two of their statements jump out at me: the interface is “not that important,” and people “prefer to use what they already have.”  That gets me leaning forward in my chair.

YES!  People prefer to use the analysis/mashup toolset they already use.  They didn’t want to learn Swivel’s new tools, or compensate for the features it lacked.  I agree 100%.

But to then utter the words “the interface is not that important” seems completely wrong to me.  The interface, the toolset, is CRITICAL!  What they should have said in this interview, I think, is “we should not have tried to introduce a new interface, because interface is critical and the users already made up their mind.”

4) PowerPivot is SCARY good at mashups

I’m still surprised at how simple and magical it feels to cross-reference one data set against another in PowerPivot.  I never anticipated this when I was working on PowerPivot v1 back at Microsoft.  The features that “power” mashups – relationships and formulas – are pretty…  mundane.  But in practice there’s just something about it.  It’s simple enough that you just DO it.  You WANT to do it.

Remember this?

OK, it’s pretty funny.  But it IS real data.  And it DOES tell us something surprising – I did NOT know, going in, that I would find anything when I mashed up UFO sightings with drug use.  And it was super, super, super easy to do.

When you can test theories easily, you actually test them.  If it was even, say, 50% more work to mash this up than it actually was, I probably never would have done it.  And I think that’s the important point…

PowerPivot’s mashup capability passes the critical human threshold test of “quick enough that I invest the time,” whereas other tools, even if just a little bit harder, do not.  Humans prioritize it off the list if it’s even just slightly too time consuming.

Which, in my experience, is basically the same difference as HAVING a capability versus having NO CAPABILITY whatsoever.  I honestly think PowerPivot might be the only data mashup tool worth talking about.  Yeah, in the entire world.  Not kidding.

5) “Export to Excel” is not to be ignored

Another thing favoring PowerPivot as the world’s only practically-useful mashup tool:  it’s Excel.

I recently posted about how every data application in the world has an Export to Excel button, and why that’s telling.

Let’s go back to that interview with the Swivel founders, and examine one more portion that I think reflects a mistake.

Can I get a “WTF” from the congregation???  They mention R and SAS, but make NO mention of Excel!  Even just taking the Excel Pro, pivot-using subset of the Excel audience (the people who are reading this blog), Excel CRUSHES those two tools, combined, in audience.  Crushes them.

Yeah, the mundane little spreadsheet gets no respect.  But PowerPivot closes that last critical gap, in a way that continues to surprise even me.  Better at business than anything else.  Heck, better at science too.  Ignore it at your peril.

6) But Getting the Data Needs to be Just as Simple!

So here we go.  Even in the UFO example, I had to be handed the data.  Literally.  Our CEO already HAD the datasets, both the UFO sightings and the drug use data.  He gave them to me and said “see if you can do something with this.”

There is no way I EVER would have scoured the web for these data sets, but once they were conveniently available to me, I fired up my convenient mashup tool and found something interesting.

7) DataMarket will “soon” close that last gap

In a post last year I said that Azure DataMarket was falling well short of its potential, and I meant it.  That was, and is, a function of its vast potential much more so than the “falling short” part.  Just a few usability problems that need to be plugged before it really lights things up, essentially.

On one of my recent trips to Redmond, I had the opportunity to meet with some of the folks behind the scenes.

Without giving away any secrets, let me say this:  these folks are very impressive.  I love, love, LOVE the directions in which they are thinking.  I’m not sure how long it’s going to take for us to see the results of their current thinking.

But when we do, yet another “last mile” problem will be solved, and the network effect of combining “simple access to vast arrays of useful data sets” with “simple mashup tool” will be transformative.  (Note that I am not prone to hyperbole except when I am saying negative things, so statements like this are rare from me.)

In the meantime…

While we wait for the DataMarket team’s brainstorms to reach fruition, I am doing a few things.

1) I’ve added a new category to the blog for Real-World Data Mashups.  Just click here.

2) I’m going to share some workbooks that make consumption of DataMarket simple.  Starting Thursday I will be providing some workbooks that are pre-configured to grab interesting data sets from DataMarket.  Stay tuned.

3) I’m likely to run some contests and/or solicit guest posts on DataMarket mashups.

4) I’m toying with the idea of Pivotstream offering some free access to certain DataMarket data sets in our Hosted PowerPivot offering.

See you Thursday :)


Movie Grosses and Inflation: Methodology

April 5, 2012

 
In the voting from the last post, the majority of respondents found the third ranking to be the most realistic.

I agree – this is the one that I personally found most consistent with reality.  “The Exorcist” surprises me – I did not expect to see it in the top ten – but it was actually higher in the other two rankings.

Mashup of Movie Grosses with Average Historical Ticket Prices

Data Set #1:  Raw Movie Grosses

Raw Movie Grosses
(Source:  The-Numbers.com)

Data Set #2:  Average Historical Ticket Prices

Average Historical Ticket Prices
(Source:  BoxOfficeMojo)

Relate both of these to the Years table:


Related the Two Mashup Sets to My Common “Years” Table
(Yes, I Have Many Data Sets Now)

The Measures

Yes, it’s overkill in this case to use measures.  I could just use a calc column.  In fact I probably should.  But I have a thing for measures; they just feel more “portable” over the long haul.  (And with a single movie on each pivot row, the MAX in the gross measure simply returns that one movie’s gross.)

[US Movie Gross] =
   MAX(MovieGrossesUS[Total Box Office])

[Avg Movie Ticket Price] =
   AVERAGE(MovieTicketPrices[Avg Price])

[Tickets Sold Approx] =
   [US Movie Gross]/[Avg Movie Ticket Price]

Flattened Pivot

One of my favorite techniques, the flattened pivot!  (It’s one of the options on the PivotTable dropdown in the PowerPivot window.)


Turn Off Subtotals

To get a truly flat pivot, you then have to turn off subtotals.

Download the Workbook!

Go ahead and grab the whole thing here:

http://www.powerpivotpro.com/wp-content/uploads/2012/04/ShareableMovies.xlsx 

(This only contains movies and ticket prices though – none of my other goodies like Inflation etc.)

What were the other two rankings?

They were both based on differing measures of inflation.  If I have time tomorrow, I may do a bonus post.


Mashup: Movie Grosses vs. Inflation

April 3, 2012

 
Avatar vs. Episode IV - Which is REALLY the #1 of All Time?

“I Just Don’t Believe that Avatar Truly Out-Earned Star Wars Episode IV”

               -Me

(Image Credit:  Channel 24)

You KNOW It’s Been Bugging You!

You’ve probably heard by now that “Avatar” is the #1-grossing movie of all time.  And before that, “Titanic” was #1.

Neither of which, of course, is Star Wars.  Which any child of the 1970’s KNOWS is the TRUE #1.  To suggest otherwise is blasphemy.  Let’s fix this crime.
 

The “Official” Box Office Ticket Sales Figures Do NOT Account for Inflation!

Here’s the official all-time ranking (for the United States only):

US Ticket Sales Top Ten – NOT Inflation Adjusted
(Source:  the-numbers.com)

You need look no further than Episode I outgrossing Episode IV to know something is amiss :)

Notice how five of the top ten films were all released in the past eight years?  Have movies gotten that much better?  Nope.

But ticket prices have sure gone up, haven’t they?  And hey, since I now have inflation figures from Shadowstats.com all pre-loaded into a model on my desktop (see this post about Moore’s Law and Inflation), it’s time for another quick mashup.

Question to Readers:  Which of these seems most correct?

PowerPivot is amazing.  One copy-paste of movie grosses, a quick relationship to the Years table, a few simple measures, and I have JUSTICE!

Here are three alternate rankings of all-time US movies, courtesy of my PowerPivot mashup.  Which one seems most realistic to you?

Three Inflation-Corrected Movie Rankings

They all share the obviously-correct #1 movie – comforting!  But they differ quite a bit on movies 2-10.

In each of the three alternates, I used a different data set to “level set” the price of tickets to the year the movie was released.  But other than data set, it was the same methodology.  It took me a lot longer to find the data sets than it did to produce these three alternate rankings.

So, which one seems most correct to you?  Let me know, and Thursday I will reveal the data sets behind all three.


Moore’s Law and Inflation

March 13, 2012

 
SQL Rockstar, aka Tom LaRock (Blog | Twitter) sent me a fascinating data set the other day:  a table of different computing devices over the years, their “horsepower” in calculations per second, and how much they cost:


Source:  The Rise of the Machines

The Cost of a Million Calcs per Second, Over History

Tom remarked that some modern devices, like his iPad2, have more computing power per dollar than even some of the recent supercomputers.

To validate this, I added a calc column called “Cost per MCalc”:

  =Computers[Cost]/(Computers[Comp per Second]/1000000)
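As a sanity check using numbers that appear later in this post: a $3,000 PC delivering 33 million calcs per second works out to $3,000 / 33 ≈ $91 per MCalc.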

And indeed, his iPad2 is cheaper per million calcs than, say, 2002’s Earth Simulator:


By a lot, too.  Like 40x cheaper per MCalc.

But then he had an astute follow-on question:  how would that change if we took inflation into account?  That’s when he tagged me in.

Enter Shadowstats!

Tom’s idea finally gave me an excuse to subscribe to Shadowstats.com.  About six months ago I even emailed them and asked whether they had considered setting themselves up on Azure DataMarket.  (Their answer:  not yet.  My answer:  I’ll be back to convince you later.)

Shadowstats provides historical data on things like inflation and employment, and provides it in a convenient format (which is in itself quite valuable – have you ever tried to make sense of the data from .gov sites?  It’s a labyrinthine mess.)


CPI Data From Shadowstats.com, Pasted
Into PowerPivot as a New Table

Wow, 11% inflation in the US in 1974, my birth year?  That’s intense.  And it piles up quickly when you have a few years in a row of high inflation.

Cumulative Impact of Inflation

To measure cumulative impact, I added a new column:

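The logic of the column is a running product of (1 + annual rate) over every year up to the current row’s year.  PowerPivot has no running-product function, so one way to express it is to sum logarithms and then exponentiate.  A sketch, using my assumed names for the table and columns (CPI, Year, Annual Inflation):

=EXP(
   SUMX(
      FILTER(CPI, CPI[Year] <= EARLIER(CPI[Year])),
      LN(1 + CPI[Annual Inflation])
   )
)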

What that shows us is this:  Prices were 65% higher at the end of 1977 than they were at the beginning of 1970.  A 65% increase in the cost of living in just eight years.

Let’s chart it:


Inflation:  In 2011 It Took $6 to Buy What $1 Bought in 1970
(Official US Govt Numbers)

Factoring Inflation Into Price per MCalc:  2011 Adjustment Factor


Tom wanted to convert everything into 2011 dollars, which makes sense.  In order to do that, I created two measures.  Cumulative Inflation Official is just the Average of the same column in the CPI table of my source data, and the Adj Factor is:

=CALCULATE([Cumulative Inflation Official], Years[Year]=2011)
    / [Cumulative Inflation Official]

In other words “take the Cumulative Inflation value for 2011 and divide it by the Cumulative Inflation value for the current year.”
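Spelled out, that first measure is presumably nothing fancier than this (assuming the cumulative calc column above is named Cumulative Inflation – the names are my assumptions about the model):

[Cumulative Inflation Official] =
   AVERAGE(CPI[Cumulative Inflation])

With one CPI row per year, the AVERAGE simply returns that year’s value whenever the filter context is a single year – which is exactly how the Adj Factor measure uses it.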

Now I can use that factor to give me a 2011-adjusted cost per MCalc measure, which is just:

=[$ per M Calc]*[Adj Factor for 2011 Dollars Official]

Which shows us that the iPad2 is an even better deal (in terms of MCalcs), compared to the Earth Simulator, than we had originally thought – more like 50x cheaper as opposed to our original 40x.

What’s with the Playstation3?

It rounds to $0.00 per MCalc even when adjusted to 2011 dollars?  I know game systems are sold at a loss, but really?  Well, the Playstation IS the only gaming/graphics system in the data, and those are of course dedicated renderers of polygons, NOT general-purpose calculators.  Its “calcs per second” figure is inflated by all of those polygon operations, so its true cost per general-purpose MCalc is a good deal higher than the table suggests.  Gaming systems just offer an insane number of (polygon) calcs per second, which is why there is so much interest these days in using GPUs for business calc purposes – if you can “transform” a biz problem into a polygon problem, you’re off to the races.

Tying it All Together

Here you go, the “final” results:


Cost of a Million Calcs per Second, Adjusted to 2011 Dollars

Why the heck does it go back UP????

Isn’t that interesting?  We see a steady and sharp decline in price per MCalc from the 1970’s all the way into 2006, but then prices start to RISE again?

The data IS skewed a bit by the fact that we are only looking at 26 computers.  So if we happened to have a supercomputer in the data in 2009 but no PC or server, that can throw us off.

Let’s take a look by type of computer, then:

All Three Computer Types Show a Price per MCalc INCREASE in the late 2000’s

OK, the shapes are a bit different, and the 2006 “plunge” in price for PC/Gaming/Handheld IS indeed due to 2006 being the year where the PS3 shows up…  but all three charts DO show some form of recent price increase per MCalc.

So what’s going on?  Our Quest for Power Meets Physics.

I’m no expert but I’ve read enough over time to have a decent idea what’s causing that rise.

For the most part over computing history, our quest hasn’t really been for “cheap.”  It has been a quest for power.  My Dell PC in 1992 was about $3,000 and offered 33 MCalcs.  Five years later I bought a $2,500 machine that offered about 300 MCalcs.  So in five years the price fell a little but the power grew a lot.

In theory, at that point I could have bought the 1992 level of power (33 MCalcs) for $300 or so.  But that’s not what I did.  I bought a more powerful machine rather than a cheaper machine.  My roommates would have made fun of me.  Conor lorded his fancy 3d graphics card over me every time we played Quake.  I needed 300 MCalcs!

Moore’s Law is based on our ability to continually cram ever-more transistors onto a single chip.  And as we’ve gotten closer and closer to the size of atoms, we’ve hit a bit of a limit in that regard.  Moore’s Law is “stalling.”

When Moore’s Law stalls, do we stop chasing power?  Nope, we just go in a different direction.  We start going multi-core.  Multi-CPU.  And while that DOES deliver more MCalcs, at the moment, it’s a more expensive way of doing it than the old way of “keep shrinking the transistors.”

Next Up…

One of the REALLY cool things about PowerPivot is its “mashup” capability.  I’ve shown it over and over.  But now that I have a good source of inflation statistics, I can dollar-correct ANYTHING.


I Can Now Mash This CPI Model Into ANY Other Data Source
That is Associated with a Year or Date!

I’ll have another wrinkle to share about inflation later this week, and the hint is the difference between “official” and “real” inflation – you may even see some measures in the field list above.


Friday Bonus: UFO Sightings & Hallucinogen Use

October 7, 2011

 

I didn’t have this ready in time for the contest Microsoft was running, and I’m not sure they would have wanted this to represent PowerPivot in an “official” sense, so it likely wouldn’t have won anyway.

But here it is, folks – a data mashup of UFO sightings versus hallucinogenic drug usage.  All sourced from real data, for the United States only, between 1930 and 2000.

To place everything on the same scale, all measures are “indexed” against their maximum.  UFO sightings peaked in 1999 and declined slightly in 2000, for instance, so 1999 is where the green line hits 100%.  And I indexed LSD and Ecstasy against total hallucinogen usage (of all types), so neither of those lines ever hits 100%.
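If you want to replicate the indexing trick in your own workbook, it’s one measure pattern repeated per series.  A sketch, with hypothetical names for the base measure and the years table:

[UFO Sightings Indexed] =
   [UFO Sightings] / MAXX(ALL(Years), [UFO Sightings])

MAXX over ALL(Years) finds the series’ peak-year value regardless of which year the pivot is currently showing, so the peak year lands exactly at 100%.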

Before you go concluding that drug usage leads to UFO sightings that are merely hallucinations, remember that correlation is not causality.  It is just as likely that UFO pilots are drawn to drug usage.  They are particularly fond of spying on raves, apparently.


Jewelry Mashup for Alpha Geek Challenge

February 11, 2010

Hello everyone. Rob invited me to do a guest post to talk about my Wives vs. Girlfriends entry in the Alpha Geek Challenge. That is a politically incorrect and provocative title for a real data analysis mash-up that I did with PowerPivot.

I spent the last 33 days locked in a room finishing five manuscripts for books about Excel 2010. One of those books will be PowerPivot For the Data Analyst from QUE. In the course of writing that book, I had to generate a lot of sample data sets to use as examples.

Some interesting things along the way:
1) I had some daily sales data for two ladies’ boutique stores. One is in a mall. The other is in an airport located 10 miles away. Three years of mall history, only 2+ years of airport history, because that store opened after the mall location.

2) Beginning theory: when there are rain delays, people have more time in the airport, so they buy more.

3) Beginning theory: when it gets hot, people go to the mall.

4) I knew that Rob had found weather data from somewhere, but as I started searching around, I could not find any good free weather databases. I did find a page at WeatherUnderground.com where you could retrieve weather statistics for one day for one city. Luckily, both of these stores were in the same city, so I only had to gather data for one city.

Unfortunately, getting three years of data meant visiting this web site 1095 times! Excel VBA lets you use the brute force approach, so while I was typing a chapter on one laptop, I set up a VBA macro on the other laptop to run an Excel Web Query over and over and over, changing the date in the URL each time. As each web page would come back to Excel, I would use VLOOKUP to pluck out the rainfall and the high temperature for the day.

5) Getting the data into PowerPivot was easy. Two linked tables. One relationship by date.

6) In the PowerPivot grid, I added some calculated columns with DAX:

  • To group the temperature data into buckets, I used a column called “Highs in the” and a DAX calculation of =INT(Weather[High Temp]/10)*10.
  • To group the daily dates into weekdays, I used =FORMAT(Sales[Date],"ddd"). This returns Mon, Tue, Wed, etc. You have to use Colin’s trick from the FAQ to actually sort those into custom date sequence. (Thanks Colin!)
  • Back in Excel, I had used the range version of VLOOKUP to assign classifications based on rain. The rainfall in inches was classified into “0-None”, “1-Trace”, on up to “4-Hurricane”. (A DAX version of this classification is sketched just after this list.)
  • I did not realize it when I started, but as I was doing DAX Measures, I found the need to have some concatenated fields so that I could do a count distinct: =CONCATENATE(Sales[Location],Sales[Date]) and =CONCATENATE(Sales[Location],Sales[WeekdayName]).
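As promised in the rainfall bullet, here is roughly what that range-VLOOKUP classification could look like as a DAX calculated column instead. Only the “0-None”, “1-Trace”, and “4-Hurricane” labels appear above; the middle labels and the inch thresholds are my own inventions for illustration:

=IF(Weather[Rain] = 0, "0-None",
   IF(Weather[Rain] < 0.1, "1-Trace",
      IF(Weather[Rain] < 0.5, "2-Light",
         IF(Weather[Rain] < 2, "3-Heavy", "4-Hurricane"))))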

7) I used the “four chart” layout in PowerPivot. I ended up using only three of those charts in the contest entry.

8) The important Measure that was enabled by DAX was “Sales by store by day that falls into the current filter context”.  There were some days in 2007 where only one store was open, and many days where two stores were open. DAX Measures let me calculate total sales / (# of stores open on this day). That DAX Measure is:

=SUM(Sales[Net Sales])/COUNTROWS(DISTINCT(Sales[LocationDays]))

9) During discovery, I used slicers, but for the contest entry, I needed different charts to have different filters, so I changed the slicers back to report filters so that one chart could be for the airport and another chart could be for the mall.

Surprising results:
As rainfall increases, sales plummet at the airport. I guess that when people are freaking out because of flight delays, they are not busy shopping.


Higher temperatures mean lower sales at the mall. When I was a kid, we would flock to the mall because they had air conditioning. Now, the people buying these high-end handbags probably already have air conditioning and no longer need to head to the mall.


As I was looking for other trends, I ran one by weekday at the airport. Friday was the peak sales day during the week, probably indicating that business travelers are picking up a piece of jewelry on their way back home to their wives. But I was surprised to see that the one day which beat Friday was Sunday. Sunday? If the business travelers are buying gifts for their wives on Friday, WHO the heck are they buying gifts for on Sunday? One theory is noted in the chart. In reality, that Sunday spike is probably from businesswomen who are accessorizing before heading out for their own business trips. It would probably be interesting to break that Sunday data out by hour, to see whether the sales come early in the day, when people are heading home from weekend trips to the beach, or on Sunday afternoon/evening, when the business commuters are heading back out.

Once I had reached some interesting conclusions about the data, I spent some time formatting the workbook to not look like Excel. I got rid of a lot of the extra stuff that is standard on Excel charts, but then added some graphics (the rain is a pattern fill on the chart area in the left graph, and the “Delay Delay Delay” graphic is a picture fill applied to the plot area). To use pictures in individual columns of the other charts, you have to click on a column twice: once to select the series, and then a second time to select that one point. Then use Format, Fill, From Picture.

I suppose the surprising PowerPivot lesson here was that the slicers were really helpful to me when analyzing the data, looking for interesting trends. Once I had decided on the message that I wanted to convey, though, the slicers had to go.

Thanks to Rob for allowing this guest post!