Lt. Weinberg: Doing what? Kaffee will have this finished in about four days.
Capt. Whitaker: Doing various administrative things. Backup. Whatever.
Lt. Weinberg: In other words I have no responsibilities whatsoever.
Capt. Whitaker: Right.
Collectively we’ve spent a lot of digital ink on the topic of the relative roles of IT and BI professionals on one hand and the Excel pros on the other – here and here for instance.
In those comment threads, some IT and DB pros were wondering whether PowerPivot marginalizes their roles, and the consensus has been “absolutely not, you might even be more important now.”
If you have any doubts about that, check out this fiasco.
In the last football post, I created a spiffy new measure for Rush Yards. Since then, I’ve created similar measures for Passing and Receiving Yards. And now I want to create measures for Touchdowns – Rush, Pass, and Rec. Trouble is…
Problem #1: No Touchdown Column
Yep, whereas my Plays table contains a [Yards] column, it does not contain any columns remotely related to touchdowns.
Awesome. So Where’s the touchdown column??? Well, in another table (FactPlays), I find a [Result] column, and one of the values of that column is “Touchdown” – Neat.
I filter to [Result] to “Touchdown” – and the [PointScored] column in the same table shows me nothing but 6’s, which is good news since 6 is the number of points in a touchdown
Problem #2: Not all 6-pt plays are touchdowns?
But then to make sure, I clear the filter on [Result] and then filter [PointScored] = 6, and guess what? Sometimes when [PointScored] = 6, [Result] = <Blank>…

…which is NOT good because the only way to score 6 pts is to score a touchdown. So that [Result] column is basically worthless.
But hey, no big deal, I can work around this by just using [PointScored] = 6 as my “Is Touchdown” test, either by creating a new calc column or just doing this directly in a measure.
All I need to do is relate this FactPlays table to my Plays table…
Problem #3: These two tables speak completely different languages
What columns can I use to link these two tables? Given that the relevant entity here is a Play, I start looking for things related to PlayID… and I find a trainwreck.
The Plays table contains [GameCode], [DriveId], [Period], [PlayUniqueId],[SeasonNumber], and [PlayGUID] – the last of which is a calc column I created awhile back by concatenating [PlayUniqueID] and [SeasonNumber] – I had determined that combination to be globally unique (the db lacked this to begin with – ugh).

The FactPlays table contains [GameDate], [GameCode], [Time], and [Quarter].

Hey look – NOTHING MATCHES UP. FactPlays contains nothing resembling a Play ID. It doesn’t even have a GameCode. Nope, games in FactPlays are uniquely identified by [GameDate] combined with [GameNumber], and I assume I can identify Plays by somehow combining those two columns with [Time], which is the time on the game clock. Oh wait, that’s not quite right. I have to then combine that with [Quarter], since the clock starts over at 15:00 in each quarter.
This is gonna take me a little while.
Griping with a Purpose
I was actually working on a much sexier blog post when I ran into this problem. I stopped to share it because this is real-world data. EXPENSIVE data at that! STATS charges six figures (or more) for subscriptions to this stuff.
IT/BI/DB Pros: Please don’t do this to us
OK, that’s obvious. And I am pretty sure that no IT/BI/DB pros reading this would ever design a db like this in a million years. But things like this manage to happen anyway – after 2 months of following a number of SQL MVP’s on Twitter, I can say that one common theme is “oh my god I just got a new project and this is the worst db I have ever seen – how am I supposed to clean this up???”
I think we know how it happens. Without being told, I can say with certainty that it happened like this: When STATS got started as a company, their value proposition was in the collection of data, data that was far more detailed than anywhere else. How the data got stored and shared was an afterthought. And boom, they had customers immediately, clamoring for that data. “Here, take our money! Send us the data however you’ve got it, we need it!”
Collect data, get paid huge sums of money, repeat. No normal human being would interrupt that process to clean up the storage format. I know I wouldn’t. Years later, they surely realize that this format is terrible. But customers’ apps are now tightly bound with this format, so any investment in a new format would have minimal impact, at first anyway.
Price of Entry, or Welcome Forcing Function?
Hey, it’s one thing when you’re selling data to, say, ESPN.com, and the cost of understanding the data is just a small piece of the application that is being built. But with PowerPivot, bad schema is at best an expensive time sink, and at worst, a source of user error.
So… IT/BI/DB pros, I’d like to hear from you – how far do we have to go in order to be ready for broad PowerPivot adoption? And somewhat alternatively, is PowerPivot perhaps a reason why data quality will be taken more seriously in your organization? I (mostly) come from the Excel side of things and would love to know what your experience tells you.
Next Football Post: Solving the Touchdowns Problem >>