Retailer Overlap Analysis Using PowerPivot

March 16, 2010

 
Clint Pondering Where to Buy More Cigars

 

“The Rite Aids on one side of town, the Walgreens on the other, and me right in the middle.”

-The Shopper with No Name (and a Fistful of Dollars)

 

One of the coolest things about my new job at PivotStream is all of the cool new data sources I get to play with.  I thought I’d take a brief break from the Rank member posts and share a little bit from one of those sources.

Retailer Overlap Report

I just finished cooking up this report – click for full-size version.

All Retailer Intersections with PowerPivot

What does that show?

It shows the top 15 US retail chains by number of stores, and their competition against one another by ZIP code population.

For example:  take the first row, Albertsons.  Of all the people who live near an Albertsons, 65.9% of them also live close to a CVS, but only 4.7% live near a Rite-Aid.

Cool, huh?

Marketing versus competitors

Let’s say you are in charge of advertising and promotions for a large grocery or drugstore chain.  One of your primary jobs is to lure consumers into your stores as opposed to your competitors’.  Every day you receive a collection of all the specials, advertisements, etc. that your competitors are running in various regions of the country.

Naturally, you want to respond to their efforts.  But you can’t focus on them all equally, or you’d fall hopelessly behind.  So, which ones should you pay the most attention to?  This is the kind of thing that would help you.

Sliceable too!

OK, let’s say you work for Safeway, and you only run advertising for them in the South region.  How useful is the nationwide report above?  Hard to say.

For instance, nationwide, Walgreens competes with you for 47.3% of your customers, and Rite-Aid is second at 36.1%.  Does that hold up in the South?

One click and you have a brand new report:

South Region Only Retailer Intersections with PowerPivot

And look, Walgreens falls to 14.1%!  Rite Aid climbs to 52.5%…  but CVS now checks in at a whopping 73.5%!

So no, the national report would mislead you.  Same thing would be true if you ran all newspaper advertising nationally for Safeway, but you were ignoring an ad that CVS was running in the South.

Also Sliceable by Income Range!

Lastly, let’s say you work for ShopRite, and you control advertising in the South region.  Walgreens is running a monthlong special on Product X, and you happen to have the ability to run a better special on Product X…  but should you?

Well, you consult the report above, and it shows you that Walgreens only competes with you for 34.9% of your customers.  So you decide not to run a special on Product X, since that costs you a lot of money for not enough benefit.

But suppose you knew that Product X was primarily targeted at households making between $50K and $75K.  Slice by that income and you get:

South Region Only Specific Income Range Retailer Intersections with PowerPivot

Hey look, Walgreens competes with you for 61.6% of your customers in that income range. 

So that ad campaign might be targeted specifically at stealing business from ShopRite, for all you know :)

How did I do this?

It’s complicated, and once again pushed the bounds of what I know about DAX measures.  It starts out a lot like a Market Basket Analysis, which takes some work to explain, but then it takes another twist.

I’ll explain in subsequent posts :)

Flexibility!

But the crucial thing to note here for now is that this amount of flexibility is impossible in Excel by itself.

I mean, if I wanted to build a report that showed overlap between any two retail stores, with particular variables like region and income level held constant, I can do it.  Takes awhile but I can build one.

But as soon as someone says something like “I want to see this report filtered down to a different region,” well, you’re doing surgery on your original report.  Sometimes that takes almost as long as it did the first time around.

And when you’re done, well, now you have TWO reports to maintain.  Fun fun!

That’s the coolest thing about this report for me:  it remains flexible.  Its consumers don’t have to ask me for new versions of the report.  They can just click to filter.

And if they want new variables added, like racial demographics, that’s just a matter of adding new calculated columns to the PowerPivot window, not a new report.  I might even do that for next time.


Guest Post: Dany Hoter on Cube Formulas

March 1, 2010

 
album shot

“OK, I want the brooding frontman in the foreground, and the keyboard player and drum machine programmer in the back.”

-Photographer for the EuroSynth band, MDX

I’ve always loved this photo.  Dany Hoter, master of the analysis arts, takes his rightful place in the foreground.  Pierre-Henri and I, lurking in the back with mirthful smirks.

It always looked to me like the back cover of a cd booklet – this is kinda what I expected the Chemical Brothers to look like.  Alas, there is no cd, and no band.  This was taken in 2003 by Allan Folting (the Pivot Master) while the four of us were awaiting a hydrofoil car ferry (!) in Denmark.  (Post-production effects a la U2’s “Unforgettable Fire” provided by Dave Gainer, former Group Program Manager for Excel).

Today folks, we are honored to have Dany, the world’s leading authority on cube formulas, finish the lesson that I began.  A fitting conclusion indeed.

(For more information on Dany, one of my favorite human beings, be sure to check out his profile).

Take it away, Dany…

In the last chapter of the Fantasy Football case, Rob introduced the Excel cube functions. These 7 functions were introduced in Excel 2007 and are not directly related to PowerPivot. The cube functions purpose was to enable data from OLAP cubes to be brought into Excel cells using functions and thus enable any shape of report combining other Excel formulas, data from different cubes in one report etc.

Because PowerPivot create a data source which is compatible with OLAP cubes it can be consumed by these functions and open up many interesting scenarios.

In Rob’s post he used cubeset to generate the children of a member and to show the top N members in the set using a measure.

In the comments that followed Rob’s post people were asking about allowing filters on other dimensions while calculating the top N members in the set.

Download the sample now!

It is probably easiest if you go ahead an open up the workbook I created, located here, and then follow along below.

How it works

The only thing we need to keep in mind in to achieve our goal is to make sure that the order of the members in the set is by descending order of the measure combined by the filters. If you want to report on the top 10 products in June and show how they are selling in July , you need to make sure to get the top 10 based on sales and June. May be you want also to use a specific geography and report on the top 10 products in Canada in June.

The way to sort a set by a combination of values from different dimensions is by creating a tuple which is in MDX terminology a collection of single members , each from a different attribute(Or hierarchy). Sales of Ice cream in Seattle in June is a tuple with three members in it. A tuple can be created by reusing one the functions we already met in the original post – cubemember.

If you supply a range of cells to cubemember instead of just one it will create a tuple from all the members represented by the cells. It is your responsibility to make sure that each cell in the range contains a valid member and that they are all from different fields/columns.

In the attached example you can see such an example based on AdventureWorks data.

clip_image002

You see three slicers, on Country year and month. Notice that I use the month number and not the month name because I haven’t found a way to sort the months by their number value while showing their names.

In the upper box you can decide how to create the top 10 product report. You can choose between the selected month sales and the previous month. You can also choose between using just the measure value or a tuple which is created from the values of the filtered members in the slicers + the measure. In order to use the filtered member I need to extract the first member from the slicer set using cuberankedmember. A set is not a valid argument to cubemember.

So the formula to extract the first member from the countries filter is =CUBERANKEDMEMBER("Sandbox",Slicer_EnglishCountryRegionName,1)

And the formula using to define the tuple is =CUBEMEMBER("Sandbox",(O3,O4,O6,IF(L3="Selected Month",P5,O5)),"Tuple")

P5 extracts the filtered month while O5 contains the following formula =CUBEMEMBER("Sandbox","[Time].[MonthNumberOfYear].["&P5&"].lag(1)")

The MDX function lag(1) returns the previous member of a given member.

The solution has one main flaw: it cannot guarantee the right order and thus the true top 10 products if the user selects more than one member in any one of the filters. The reason is that multiple members cannot participate in the tuple creation and so only the first member in the filter set can be used. In the example a message is shown if the user selects multiple members .

We can be even more strict and blank the report completely if the value of cell Q3 is >1. This cell contains the formula =CUBESETCOUNT(Slicer_CalendarYear)*CUBESETCOUNT(Slicer_EnglishCountryRegionName)*CUBESETCOUNT(Slicer_MonthNumberOfYear)

Remember that each slicer returns a set with the filtered members and so can be consumed by cubesetcount and cuberankedmember. The same is true for any page field in a pivot.

The set in a page field is referenced by its cell address while the set in a slicer by using the slicer’s name.

I use conditional formatting to show only rows that are not empty and to show a border for the non-blank rows.


Generating a Tag Cloud via PowerPivot

February 21, 2010

blog1 “Navajos say it is good plan to have two arrows for one bow”

– Chief Thundercloud

Thanks to Rob Collie for letting me do another guest post. There will be a bit more Excel than PowerPivot in this post, but that’s what you get when you give MrExcel access to the blog!

Donald Farmer issued an interesting 3rd challenge for the Alpha Geek Challenge contest. He was looking for interesting visualizations created from PowerPivot pivot tables – something beyond the regular charts. 

Please take a look at the contest and vote for your favorites.

Update: Here is a link to an animated GIF showing the tag cloud in action. Watch the slicers along the right as well as the tag cloud and the SmartArt. Also, when you click on the next image, you should see a higher-resolution screen capture.

blog16 My entry uses 40,000 rows of Billboard Hot 100 Music Chart data and throws just about every visualization that I could think of:

  • A Decade slicer. This slicer was dragged outside of the bounding box so that I could make the buttons larger than usual and use two columns.
  • Four paired line charts that compare the decline of the Vocals genre and the rise of Country and Rap. These charts are static and use pivot tables that are not tied to the slicer.
  • A blue data bar that shows the average numbers of weeks on the chart by genre. These data bars are static and are not tied to the slicers.
  • A red data bar that shows the # of Unique Artists by genre for the selected decade(s). This required a DAX Measure with =COUNTROWS(Distinct(Table1[Artist])). This data bar uses one of the new feature in Excel 2010 where the data bar runs right-to-left. This is a cool feature for creating comparative histograms.
  • A set of 10 Win/Loss Sparklines that show year-by-year history for the top 10 artists. (Top 10 is based on song-chart-weeks. If Elvis had 10 songs on the chart for 10 weeks each, that would count for 100 chart-weeks.) More about this sparkline later.
  • The range F3:J18 contains some SmartArt that is dynamically refreshed to show (a) the top 3 genres for the selected decades, and (b) the top 3 artists in those genres for those decades. Now, since SmartArt is always static and never dynamically updated, this one was quite a trick.
  • The range E19:M23 contains a Tag Cloud of the top 10 artists for the selected decade. More about this one later.

A Bit About the Data

The data set was ugly. There were 20 columns of identifying data that provided year, genre, artist, title, beats per minute, followed by 67 columns listing the rank of Week 1 through Week 67. The original data set went back to 1890. Since I eventually hoped to publish a non-PowerPivot example on my public SkyDrive, I trimmed the data down dramatically. I added some calculated columns to count the # of weeks on the chart, the peak, a calculation to look for songs that hit #1 and other songs that peaked at #2 (see the “First Loser” chart on the SkyDrive version). I added a calculation in Excel to calculate the Decade. I deleted the song titles since this would screw up the Vertipaq compression and I did not plan on reporting at that level of granularity. I deleted the 67 columns of weekly data. The final file was 29K rows x 17 columns.

Initial Pivot Tables

My first pivot table had genre in the row area. Chart Weeks was in the value area, sorted descending. From this, I learned that Rock, Vocal, Country, Rap, and R&B were the top genres since 1950. I initially created a single chart showing all five lines, but then I used a trick from my Charts & Graphs book where four smaller charts can communicate more information that one large chart. These four charts became the first visualizations in the top left corner of the dashboard.

blog3

Creating the Sparklines

Sparklines are new in Excel 2010. Professor Edward Tufte imagined Sparklines as tiny, intense, word-sized charts in his book, Beautiful Evidence. As I was working on my book for Excel 2010 charting, I started to realize that (a) the best sparklines are larger than a single cell, (b) benefit from outside labels, and (c) benefit from manual gridlines.

In the image below, you can see that Elvis Presley had a song on the Hot 100 every year from 1958 through 1977, then hit the chart again in 1981, 1982, and 2004. In the second row, Elton John charted every year from 1970 through 2001, except for 1987. Fascinating stuff, right?

To make a sparkline larger than normal, you can increase the height and width of a cell. If that is not possible, then you can merge cells. (As MrExcel, I despise merged cells and I am a bit horrified that I am actually suggesting that you merge cells, but merged cells let you fit a lot of sparkline information into an area). In the screenshot below, cells B25:M25 are merged into a single cell and represent a single sparkline. Cells B26:M26 are merged into a single cell. I had to keep merging each row one at a time.

blog11

The data for the sparklines are off to the right of the screen. There are years from 1958-2009 stretching across the columns. A count of the # of songs by artist occurs in each cell. These Win/Loss sparklines actually need a positive number for the green marker and a negative number for the red number. I ended up using (# of Songs*2)-1. This would force cells with zero to become -1. Cells with 1 would still report 1, and songs with 2 would end up reporting 3, but I didn’t care, as all I wanted was positive or negative values.

The labels for 1970′s, 1980′s, through 2000′s in white are text boxes that I added manually. The dotted white lines that separate out the decades are Excel drawing tools added with the Insert tab. (Tip: Hold down the shift key while drawing the line to make sure it is perfectly straight. Once you get the first line drawn in, use the Drawing Tools Format tab to change the fill color to white and use a dashed line. Once you get the first line drawn in, Ctrl+Drag to make identical copies of that line.

Creating the SmartArt

blog4 SmartArt is new in Excel 2007. Use the SmartArt icon on the Insert tab in Excel. That is a Vertical Chevron List. I created the SmartArt with “Country” as the Level 1 text in each shape and “Bruce Springsteen” as the Level 2 text three times inside each shape. This is all just dummy text to allow Excel to size and shape the SmartArt. I used the SmartArt tools to apply a style and colors.

I created a pivot table that has Genre and Artist as the Row labels and Chart Weeks in the Values area. Both of those fields are set to use a Top 10 Filter to show the top 3 items based on Chart Weeks. This pivot table uses the Decade as a slicer.

Formulas point to that pivot table to build the text that should show up inside the SmartArt. The formula for the first Genre chevron is simple: =AT40. The formula for the three bullet points next to the first chevron is more complicated: =AU40&CHAR(10)&AU41&CHAR(10)&AU42. Those CHAR(10) are line feed characters and translate to bullet points when applied to the SmartArt.

Once you get the SmartArt looking just right, follow these steps:
a) Click inside the SmartArt to select the SmartArt and display the SmartArt Tools Design tab of the ribbon.
b) On the right side of the ribbon, choose Convert to Shapes.
What did this accomplish? Well, SmartArt which is new in Excel 2007 can not do one simple trick of Shapes which have been around since Excel 97. The text in SmartArt is static. In contrast, the text in a Shape can come from a formula in another cell.

You are going to do these steps 6 times:
a) Click on the text box inside the first Chevron. You want the bounding box to be solid, not dotted. If you get a dotted text box, click on the dots to make it solid.
b) In the formula bar, type an equals sign and point to the cell which will contain the first genre.
c) Press Enter.
You now have a dynamic text box on the shape.

Repeat for the other 2 genres and for the three text boxes that contain the top three artists.

Creating the Tag Cloud

Of the whole entry, I like the tag cloud the best.

I will have to try to build an animated GIF showing the SmartArt and the Tag Cloud as you choose various combinations of decades in the slicers.

Here is how to create a dynamic tag cloud using PowerPivot. AGCPost01The underlying pivot table has the Top 10 artists based on Chart Weeks, using More Sort Options to keep the data sorted high to low. The pivot table respects the decade slicer. The pivot table is nothing special, as shown here.

I noticed that many artists were in the database as “Beatles, The”. I wanted to get rid of the word “The”. So, below the pivot table, I used =SUBSTITUTE(AZ40,”, The”,”") to remove the comma and the word “The”.

AGCPost02 This screenshot shows how AZ63:BD63 is merged and uses a formula to point to the original pivot table.

In this new section of the worksheet, I bumped the top artist up to 36 point bold.

It is important to make the Fill Color for those 10 cells to be No Fill. This is the only way to make the area around the words be transparent when you create linked pictures.

Change the color of each value to another color.

Make the #2 artist a bit smaller, and progressively smaller as you continue down the list.

To build the Tag Cloud, you are going to use something that old-time Excellers know as the Camera Tool. The command sequence no longer uses the words “Camera Tool” anymore. Here’s what you do:

Select AZ63 and press Ctrl+C to copy that cell.

blog14 Go to the tag cloud area. Use the Paste dropdown on the Home tab and choose Linked Picture from Other Paste Options. This creates a picture of the top ranked artist. The background of the picture is transparent, so the orange color shows through.  (Technical problem: I changed the decades chosen in the slicer between the R. Kelly item above and the Madonna item at left).

Continue copying each individual cell and pasting as a picture. Arrange the pasted pictures in a somewhat random fashion, trying to keep the words from overlapping. The net result is that as you choose decades from the Slicer, the 10 artists in the Tag Cloud automatically update.

Here is the tag cloud for the 50’s:

blog12

Here is the tag cloud for the 80’s+90’s:

blog13

So – there you have it – a lot of basic Excel blocking and tackling, along with a bit of cool DAX measures to calculate the number of distinct artists in a genre. Again, please take a moment to visit the contest and vote for your favorites.

Will this Work On the Server Version?

Rob and I had a discussion this morning about the running this on PowerPivot Server. Since both the tag cloud and the SmartArt replacement are using shapes, none of those will render in Excel Services. Thus, this is a client-only solution for now.

Technorati Tags: ,,,

-Bill Jelen


Jewelry Mashup for Alpha Geek Challenge

February 11, 2010

PowerPivotBookHello everyone. Rob invited me to do a guest post to talk about my Wives vs. Girlfriends entry in the Alpha Geek Challenge. That is a politically incorrect and provocative title for a real data analysis mash-up that I did with PowerPivot.

I spent the last 33 days locked in a room finishing five manuscripts for books about Excel 2010. One of those books will be PowerPivot For the Data Analyst from QUE. In the course of writing that book, I had to generate a lot of sample data sets to use as examples.

Some interesting things along the way:
1) I had some daily sales data for 2 ladies boutique stores. One is in a mall. The other is in an airport located 10 miles away. 3 years of mall history, only 2+ years of airport history because that store opened after the mall location.

2) Beginning theory: when there are rain delays, people have more time in the airport, so they buy more.

3) Beginning theory: when it gets hot, people go to the mall.

WeatherUnderground4) I knew that Rob had found weather data from somewhere, but as I started searching around, I could not find any good free weather databases. I did find a page a WeatherUnderground.com where you could retrieve weather statistics for one day for one city. Luckily, both of these stores were in the same city, so I only had to gather data for one city.

Unfortunately, getting three years of data meant visiting this web site 1095 times! Excel VBA lets you use the brute force approach, so while I was typing a chapter on one laptop, I set up a VBA macro on the other laptop to run an Excel Web Query over and over and over, changing the date in the URL each time. As each web page would come back to Excel, I would use VLOOKUP to pluck out the rainfall and the high temperature for the day.

5) Getting the data into PowerPivot was easy. Two linked tables. One relationship by date.

6) In the PowerPivot grid, I added some calculated columns with DAX:

  • To group the temperature data into buckets, I used a column called “Highs in the” and a DAX calculation of =INT(Weather[High Temp]/10)*10.
  • To group the daily dates into weekdays, I used =FORMAT(Sales[Date],”ddd”). This returns Mon, Tue, Wed, etc. You have to use Colin’s trick from the FAQ to actually sort those into custom date sequence. (Thanks Colin!)
  • Back in Excel, I had used the range version of VLOOKUP to assign classifications based on rain. The rainfall in inches was classified into “0-None”, “1-Trace”, on up to “4-Hurricane”.
  • I did not realize it when I started, but as I was doing DAX Measures, I found the need to have some concatenated fields so that I could do a count distinct. =CONCATENATE(Sales[Location],Sales[Date]) and =concatenate(Sales[Location],Sales[WeekdayName]).

7) I used the “four chart” layout in PowerPivot. I ended up using only three of those charts in the contest entry.
8) The important Measure that was enabled by DAX was “Sales by store by day that falls into the current filter context”.  There were some days in 2007 where only one store was open, and many days where 2 stores were
open. DAX Measures let me calculate total sales / (# of stores open on this day). That DAX Measure is: =sum(Sales[Net Sales])/Countrows(DISTINCT(Sales[LocationDays]))

9) During discovery, I used slicers, but for the contest entry, I needed different charts to have different filters, so I changed the slicers back to report filters so that one chart could be for the airport and another chart could be for the mall.

Surprising results:
As rainfall increases, sales plummet the airport. I guess that when people are freaking about because of flight delays, they are not busy shopping.

Chart1

Higher temperatures mean lower sales at the mall. When I was a kid, we would flock to the mall because they had air conditioning. Now, the people buying these high-end handbags probably already have air conditioning and no longer need to head to the mall.

Chart2

As I was looking for other trends, I ran one by weekday at the airport. Friday was the peak sales day during the week, probably indicating that business travelers are picking up a piece of jewelry on their way back home to their wife. But; I was surprised to see that the one day which beat Friday was Sunday. Sunday? If the business travelers are buying gifts for their wives on Friday, WHO the heck are they buying gifts for on Sunday? One theory is noted in the chart. In reality, that Sunday spike is probably from business women who are accessorizing up before heading out for their business trip. It would probably be interesting to break that Sunday data by hour to see if the sales are early in the day when people are heading home from weekend trips to the beach or on Sunday afternoon/evening when the business commuters are heading back out.

Once I had reached some interesting conclusions about the data, I spent some time formatting the workbook to not look like Excel. I got rid of a lot of the extra stuff that is standard on Excel charts, but then added some graphics (the rain is a pattern fill on the chart area in the left graph and the Delay Delay Delay graphic is a picture fill applied to the plot area. To use pictures in individual columns of the other charts, you have to click on a column twice. Once to select the series, and then a second time to select that one point. Then, use Format, Fill, from Picture.

I suppose the surprising PowerPivot lesson here was that the slicers were really helpful to me when analyzing the data, looking for interesting trends. Once I had decided on the message that I wanted to convey, though, the slicers had to go.

Thanks to Rob for allowing this guest post!


PowerPivot Data in Scatter Charts, etc.

February 5, 2010

All Chart TypesThis was going around on the MS-internal discussion list this past weekend:

At right is a list of all chart types in Excel.  But not all of them are supported as PivotCharts.  Try to use XY (Scatter), Bubble, or Stock as a PivotChart, and you’ll get the alert below:

Illegal PivotChart Types

 

So, do you give up?

Convert to Cube FormulasNope.  Remember the post where I converted a PivotTable to Cube Formulas using that button on the ribbon?  This is a wonderful opportunity to use that feature in an unexpected way.

Once it’s converted to formulas, you can create any chart type you want against it, no restrictions.  It’s just regular old cells now.

Neat huh?

 

But what about grow/shrink?

The next question is usually “OK great I can scatter chart my data points.  But tomorrow, I may have a different number of data points.  Will my spreadsheet pick that up?”

The answer is “no, not even close.”  :)   When you convert to formulas, the original list of rows/columns is fixed, by virtue of the CUBEMEMBER function that makes up the axes.

Intrepid PowerPivot adventurers, however, may want to try out the following technique, by which the data WILL be picked up in your chart.

Oh no, not CUBESET again!

Oh yes.  CUBESET.  Again.  You’re gonna use that CUBESET function, and you’re gonna like it!

But you might have a love/hate reaction to some of the other Excel acrobatics I am about to share :)   For something that Excel actually supports natively, and seems to have gone out of its way to support, this should be easier to do than it is.  But why complain?  Fact is, it CAN be done, and it feels like magic.

It will be easiest if you follow along on your own desktop, so PLEASE GRAB THE WORKBOOK FROM HERE – that contains the results of everything I show below.

OK, you have the workbook now, right?  Let us continue.

To simulate the effect of getting more/less data, modify the Set in cell A1 to grab any column out of [DimCustomer] that you’d like.  Since each column has a different distinct number of members, you’ll get behavior that is essentially the same as more/less data after a refresh.

The “magic” here lies in two things.

First is defining named ranges that refer to an OFFSET formula rather than a fixed range:

clip_image002

Here is that formula in case you want to copy paste it:

     =OFFSET(Sheet1!$C$4,0,0, COUNT(Sheet1!$C$4:$C$5000),1)

The formula I used for YRange is simply another offset off of that first named range (XRange), using this formula:

     =OFFSET(XRange,0,1)

Second is this:  YOU MUST NOT RELY ON THE REFEDIT CONTROL TO POINT YOUR CHART AT YOUR DATA

The reason is, that refedit control will eat your dynamic named range and convert it into a fixed range, destroying all of that incredibly obscure work you did above J

This is the chart refedit control, for reference.  What you put in here will NOT adjust, ever.

clip_image004

Instead, do this:

  1. Create the chart.  Point it at whatever data you currently have, as if it was going to be a static chart (yes, use the refedit control for now)
  2. Now, in the created chart, select the data range within the chart.  If you do this right, your data points will get selection handles and the SERIES formula will appear in the formula bar, like this:

clip_image006

3. Then you edit that formula, replacing any range of cells with your dynamic named ranges, XRange and YRange

   a. Note that I’ve already done that in the picture above

   b. Also note that you must prepend the name of the workbook, then a “!”, to your names in order
        to make them work

   c. You must also perform the happy Excel dance in order to appease the ancient lords of recalc

Also of note:

  1. I only copied the formulas down to row 5000, and set my named ranges to cap out at row 5000 as well.
  2. Some fields will overshoot that 5000th row.  I could have gone farther of course, I just got bored.  You can pick a bigger number.
  3. I wrapped all formulas in IFERROR so that I don’t get error popups.
  4. If you don’t mind popups, I believe this whole thing works without IFERROR
  5. If one of your cube formulas returns an error for any reason other than running off the end of the data, you will miss data points
  6. That’s because the dynamic named range is based off of COUNT, rather than off of finding the last valid data point
  7. Even if I had not used IFERROR, the errors trigger the same problem
  8. It might be possible to fix this problem using INDIRECT and some even more arcane techniques

Simple, right? :)


Visualizing SQL Indices with PowerPivot

December 4, 2009

Hey folks, got another special guest post for you.  Today we are honored by the presence of the SillyBoy himself, Scott Senkeresty.  Scott is a very close friend of mine that I met in my first week at Microsoft.  These days Scott is still at Microsoft, working on antivirus features.

…and like all of my techie friends, he hears a lot about PowerPivot from me, and has been playing with it in his work.  Here is one of his recent experiments:

This is a query you can run against any sql database:

query

It uses a dmv in sql that shows intersesting stats about your index.  I used it against our production database, and added a calculated column for “Total Reads” = Seeks+Scans+Lookups.

Then I create a Chart n’ Table… just cuz.  I created Slicers for TotalReads, Updates, and Index Name (for kicks).  I added Index Name as a Row Column and Reads as values.

Now it’s time to par’tay. 

I am interested in finding under-utilized indexes (lots of updates, few reads).  So, I just use the read slicer and select the first 5 or so.  The update slicer automagically updates… and I scroll down the remaining “big numbers”.  Select 5 or so of those… and <poof!> my data!

Notice that my Index Name slicer has my nasty’s highlighted.  Not required since my table & chart show that data, but it’s cool to see that auto-filtering works so well, I almost don’t need the charts.

And hey look, an index with 26 million updates, and not a single read… Thanks Power Pivot!

clip_image002

And here is the query again as text in case you want to copy/paste it into PowerPivot’s query tool:

SELECT object_name(s.object_id) as [Table Name], i.Name as [Index Name],  user_updates as Updates, user_seeks as Seeks, user_scans as Scans, user_lookups as Lookups FROM sys.dm_db_index_usage_stats AS s INNER JOIN sys.indexes AS i ON s.object_id = i.object_id AND i.index_id = s.index_id WHERE objectproperty(s.object_id,’IsMsShipped’) = 0 AND s.database_id = db_id()

Enjoy!


Fixed Version of the “Qty per Day” Measure

November 24, 2009

“Unless I am wrong, and I am never wrong, they are headed dead into the Fire Swamp.”

-Prince Humperdink

A funny thing happened today.  Kasper de Jonge asked a couple of questions (in comments) about the Introduction to DAX Measures post.  I answered the “easy” question, and then went off to find the answer to the difficult questions.

Heh heh, I had them backwards.  The “easy” question led to an in-depth discussion with Howie and Marius from the DAX team, during which I discovered an error in the “Qty per Day” measure.

I’ve uploaded fixed versions of the videos, so anyone watching for the first time (or re-watching now) will get the proper information.

And the workbook file available here is also fixed.

If you watched the vids already and don’t want to watch them again, here’s a quick summary of what I changed:

  1. Remember the golden rule, “Filter then Calculate?” 
  2. Filter only impacts the “home” table (aka Fact table).  So far so good.
  3. But the filtered home table is never then used to then filter other tables during the Calculate phase.  Not even if you use RELATEDTABLE().  So for example:
    1. In my former example, I used COUNTROWS(RELATEDTABLE(DimDate)) as the denominator of the measure
    2. That expression, however, was returning the grand total number of rows in DimDate, no matter how the pivot was sliced.  (Because of point 2 above)
  4. The fix was to use the [OrderDateKey] column from the FactInternetSales table (the home table) and do a COUNTROWS of the DISTINCT set of values in that column.
  5. Note that if there had been a slicer (or row/column field) from the DimDate table in the pivot, then the DimDate table would have been filtered, but:
    1. Not as a result of the Fact table being filtered
    2. It is strictly filtered only by fields in the pivot that came from the DimDate table
    3. And it is therefore much less filtered than the Fact table, which is filtered by all of the fields in the pivot
  6. Along the way I ran into the fact that the DISTINCT function requires an explicit reference to TableName[ColumnName], and does not accept the shorter [ColumnName] only syntax.

…yeah, the revised videos might still be worth a look :)


Temperature Mashup Demo – Download the Workbook!

November 18, 2009

As promised, here is the workbook that contains everything I have demonstrated so far in the Temperature Mashup example.

       Click here to download workbook

(NOTE: I discovered a bug in the original file – the Qty per Day measure was returning incorrect results.  The version above is now is fixed.  Also no longer needs to be renamed from .DOCX).

And the updated demo walkthrus for this workbook are here:

     Joining the temperature data to my sales data

     Creating a Sales Qty Per Day Measure


Introducing PowerPivot DAX Measures

November 15, 2009

Up until this point, I’ve just used DAX (PowerPivot’s extension to the Excel formula language) to create calculated columns in the PowerPivot window.

There’s a lot of utility in DAX calc columns, but that’s not where DAX really shines.  DAX measures let you do things in Pivot reports that simply weren’t possible in Excel before.

In the last video, I mentioned that merely showing raw sales amount didn’t really tell me whether my products sold better or worse in varying temperatures, since there are varying numbers of days of each temperature – perhaps the relatively low sales on Hot days is just because there weren’t many Hot days?

Enter DAX measures, a PowerPivot feature on par with relationships, slicers, and mashups.  But you might overlook them on a casual tour through the PowerPivot for Excel addin, since they live behind a single ribbon button.

DO NOT OVERLOOK DAX MEASURES!  There is literally a world of power lurking there.  I want to make absolutely sure that you grasp them – both their power as well as the “how to,” so I will spend the next couple of posts on them.

First, a word about videos vs. text and screenshots

Awhile back I asked for feedback on which format was preferred – text and screenshots, or videos.  Videos got more votes, by about two-to-one.  But my text-and-screenshot posts are getting more views than my video posts.

Clearly, those of us who are too busy to watch videos are also too busy to respond to surveys :)

I respect that – personally I think I prefer to consume text and screenshots over videos.  A video seems like a commitment, whereas scanning text is on my terms, even if on net I spend the same amount of time on either.  So here is my new philosophy, always subject to revision:

  1. I’m going to use videos whenever I am introducing something new, that benefits from the explanatory power of video
  2. Quick tips and tricks, as well as “power” techniques – I think I’m going to lean toward text and screenshots for these, unless it’s an in-depth technique
  3. Even when I use video, I’m going to try to summarize the content of the videos – not as an attempt to replicate their content, but more as a table of contents so you know why I think it’s worth your time to watch the video

So, I’ll share the videos first, and a summary after.

The Videos

 

 

(These are the fixed versions from 11/24/09)

Videos Summary

  1. Quick demo of conditional formatting and number formatting, and how surprisingly impactful they are
  2. The DAX Measure Dialog
  3. Qty per Day can be calc’d using =SUM([Quantity]) / COUNTROWS(RELATEDTABLE(DimDate))
    1. Simple formula, but how does it work?  Please consult…
  4. …The Five Golden (and Simple) Rules of DAX Measures
    1. Home Table = The table where the numeric columns are
    2. Columns always wrapped in aggregation functions like SUM()
    3. Calcs always are working against the source tables, not the pivot report
    4. Two phases:  Filter, then Calculate
    5. Think of things as if they happen cell-by-cell
  5. Once you grasp those, you can suddenly do AMAZING things

The temperature data from last night’s video

November 13, 2009

Just a reminder that the temperature data I used in last night’s video is available here.  Have fun :)

(I’ll post the full PowerPivot workbook, too, once CTP3 is released).