Hello everyone. Rob invited me to do a guest post to talk about my Wives vs. Girlfriends entry in the Alpha Geek Challenge. That is a politically incorrect and provocative title for a real data analysis mash-up that I did with PowerPivot.
I spent the last 33 days locked in a room finishing five manuscripts for books about Excel 2010. One of those books will be PowerPivot For the Data Analyst from QUE. In the course of writing that book, I had to generate a lot of sample data sets to use as examples.
Some interesting things along the way:
1) I had some daily sales data for 2 ladies boutique stores. One is in a mall. The other is in an airport located 10 miles away. 3 years of mall history, only 2+ years of airport history because that store opened after the mall location.
2) Beginning theory: when there are rain delays, people have more time in the airport, so they buy more.
3) Beginning theory: when it gets hot, people go to the mall.
4) I knew that Rob had found weather data from somewhere, but as I started searching around, I could not find any good free weather databases. I did find a page a WeatherUnderground.com where you could retrieve weather statistics for one day for one city. Luckily, both of these stores were in the same city, so I only had to gather data for one city.
Unfortunately, getting three years of data meant visiting this web site 1095 times! Excel VBA lets you use the brute force approach, so while I was typing a chapter on one laptop, I set up a VBA macro on the other laptop to run an Excel Web Query over and over and over, changing the date in the URL each time. As each web page would come back to Excel, I would use VLOOKUP to pluck out the rainfall and the high temperature for the day.
5) Getting the data into PowerPivot was easy. Two linked tables. One relationship by date.
6) In the PowerPivot grid, I added some calculated columns with DAX:
- To group the temperature data into buckets, I used a column called “Highs in the” and a DAX calculation of =INT(Weather[High Temp]/10)*10.
- To group the daily dates into weekdays, I used =FORMAT(Sales[Date],”ddd”). This returns Mon, Tue, Wed, etc. You have to use Colin’s trick from the FAQ to actually sort those into custom date sequence. (Thanks Colin!)
- Back in Excel, I had used the range version of VLOOKUP to assign classifications based on rain. The rainfall in inches was classified into “0-None”, “1-Trace”, on up to “4-Hurricane”.
- I did not realize it when I started, but as I was doing DAX Measures, I found the need to have some concatenated fields so that I could do a count distinct. =CONCATENATE(Sales[Location],Sales[Date]) and =concatenate(Sales[Location],Sales[WeekdayName]).
7) I used the “four chart” layout in PowerPivot. I ended up using only three of those charts in the contest entry.
8) The important Measure that was enabled by DAX was “Sales by store by day that falls into the current filter context”. There were some days in 2007 where only one store was open, and many days where 2 stores were
open. DAX Measures let me calculate total sales / (# of stores open on this day). That DAX Measure is: =sum(Sales[Net Sales])/Countrows(DISTINCT(Sales[LocationDays]))
9) During discovery, I used slicers, but for the contest entry, I needed different charts to have different filters, so I changed the slicers back to report filters so that one chart could be for the airport and another chart could be for the mall.
As rainfall increases, sales plummet the airport. I guess that when people are freaking about because of flight delays, they are not busy shopping.
Higher temperatures mean lower sales at the mall. When I was a kid, we would flock to the mall because they had air conditioning. Now, the people buying these high-end handbags probably already have air conditioning and no longer need to head to the mall.
As I was looking for other trends, I ran one by weekday at the airport. Friday was the peak sales day during the week, probably indicating that business travelers are picking up a piece of jewelry on their way back home to their wife. But; I was surprised to see that the one day which beat Friday was Sunday. Sunday? If the business travelers are buying gifts for their wives on Friday, WHO the heck are they buying gifts for on Sunday? One theory is noted in the chart. In reality, that Sunday spike is probably from business women who are accessorizing up before heading out for their business trip. It would probably be interesting to break that Sunday data by hour to see if the sales are early in the day when people are heading home from weekend trips to the beach or on Sunday afternoon/evening when the business commuters are heading back out.
Once I had reached some interesting conclusions about the data, I spent some time formatting the workbook to not look like Excel. I got rid of a lot of the extra stuff that is standard on Excel charts, but then added some graphics (the rain is a pattern fill on the chart area in the left graph and the Delay Delay Delay graphic is a picture fill applied to the plot area. To use pictures in individual columns of the other charts, you have to click on a column twice. Once to select the series, and then a second time to select that one point. Then, use Format, Fill, from Picture.
I suppose the surprising PowerPivot lesson here was that the slicers were really helpful to me when analyzing the data, looking for interesting trends. Once I had decided on the message that I wanted to convey, though, the slicers had to go.
Thanks to Rob for allowing this guest post!