If you work for a sports organization, drop us a note at info@pivotstream.com – I have something interesting to suggest.
The Great Broken Links
March 15, 2010Just discovered this morning that all of my links to the Great Football Project were broken. Ugh, that’s the best intro to PowerPivot since it’s where I started blogging back in the Fall.
Do you work for a sports team?
March 5, 2010I see interesting things in the website logs. I don’t want to disclose who you are, but I very much would like to help you out.
I think we might be able to arrange something low-cost for you, to evaluate PowerPivot’s potential impact to what you do.
Please drop me a note at robert.c.collie@gmail.com and I will give you more information.
Give me good data or give me… um, good data
January 5, 2010
Lt. Weinberg: Doing what? Kaffee will have this finished in about four days.
Capt. Whitaker: Doing various administrative things. Backup. Whatever.
Lt. Weinberg: In other words I have no responsibilities whatsoever.
Capt. Whitaker: Right.
Collectively we’ve spent a lot of digital ink on the topic of the relative roles of IT and BI professionals on one hand and the Excel pros on the other – here and here for instance.
In those comment threads, some IT and DB pros were wondering whether PowerPivot marginalizes their roles, and the consensus has been “absolutely not, you might even be more important now.”
If you have any doubts about that, check out this fiasco.
In the last football post, I created a spiffy new measure for Rush Yards. Since then, I’ve created similar measures for Passing and Receiving Yards. And now I want to create measures for Touchdowns – Rush, Pass, and Rec. Trouble is…
Problem #1: No Touchdown Column
Yep, whereas my Plays table contains a [Yards] column, it does not contain any columns remotely related to touchdowns.
Awesome. So Where’s the touchdown column??? Well, in another table (FactPlays), I find a [Result] column, and one of the values of that column is “Touchdown” – Neat.
I filter to [Result] to “Touchdown” – and the [PointScored] column in the same table shows me nothing but 6’s, which is good news since 6 is the number of points in a touchdown
Problem #2: Not all 6-pt plays are touchdowns?
But then to make sure, I clear the filter on [Result] and then filter [PointScored] = 6, and guess what? Sometimes when [PointScored] = 6, [Result] = <Blank>…
…which is NOT good because the only way to score 6 pts is to score a touchdown. So that [Result] column is basically worthless.
But hey, no big deal, I can work around this by just using [PointScored] = 6 as my “Is Touchdown” test, either by creating a new calc column or just doing this directly in a measure.
All I need to do is relate this FactPlays table to my Plays table…
Problem #3: These two tables speak completely different languages
What columns can I use to link these two tables? Given that the relevant entity here is a Play, I start looking for things related to PlayID… and I find a trainwreck.
The Plays table contains [GameCode], [DriveId], [Period], [PlayUniqueId],[SeasonNumber], and [PlayGUID] – the last of which is a calc column I created awhile back by concatenating [PlayUniqueID] and [SeasonNumber] – I had determined that combination to be globally unique (the db lacked this to begin with – ugh).
The FactPlays table contains [GameDate], [GameCode], [Time], and [Quarter].
Hey look – NOTHING MATCHES UP. FactPlays contains nothing resembling a Play ID. It doesn’t even have a GameCode. Nope, games in FactPlays are uniquely identified by [GameDate] combined with [GameNumber], and I assume I can identify Plays by somehow combining those two columns with [Time], which is the time on the game clock. Oh wait, that’s not quite right. I have to then combine that with [Quarter], since the clock starts over at 15:00 in each quarter.
This is gonna take me a little while.
Griping with a Purpose
I was actually working on a much sexier blog post when I ran into this problem. I stopped to share it because this is real-world data. EXPENSIVE data at that! STATS charges six figures (or more) for subscriptions to this stuff.
IT/BI/DB Pros: Please don’t do this to us
OK, that’s obvious. And I am pretty sure that no IT/BI/DB pros reading this would ever design a db like this in a million years. But things like this manage to happen anyway – after 2 months of following a number of SQL MVP’s on Twitter, I can say that one common theme is “oh my god I just got a new project and this is the worst db I have ever seen – how am I supposed to clean this up???”
I think we know how it happens. Without being told, I can say with certainty that it happened like this: When STATS got started as a company, their value proposition was in the collection of data, data that was far more detailed than anywhere else. How the data got stored and shared was an afterthought. And boom, they had customers immediately, clamoring for that data. “Here, take our money! Send us the data however you’ve got it, we need it!”
Collect data, get paid huge sums of money, repeat. No normal human being would interrupt that process to clean up the storage format. I know I wouldn’t. Years later, they surely realize that this format is terrible. But customers’ apps are now tightly bound with this format, so any investment in a new format would have minimal impact, at first anyway.
Price of Entry, or Welcome Forcing Function?
Hey, it’s one thing when you’re selling data to, say, ESPN.com, and the cost of understanding the data is just a small piece of the application that is being built. But with PowerPivot, bad schema is at best an expensive time sink, and at worst, a source of user error.
So… IT/BI/DB pros, I’d like to hear from you – how far do we have to go in order to be ready for broad PowerPivot adoption? And somewhat alternatively, is PowerPivot perhaps a reason why data quality will be taken more seriously in your organization? I (mostly) come from the Excel side of things and would love to know what your experience tells you.
Next Football Post: Solving the Touchdowns Problem >>
Back home tonight, PowerPivot content resumes
December 1, 2009Sitting in Atlanta airport awaiting my flight back to Ohio. I’ve been away too long and am chomping at the bit to get some good PowerPivot and BI posts up this week. The Great Football Project in particular is calling to me, saying “Rob, come home. It is time.”
And I have some observational posts that have been rattling around in my head (and OneNote) throughout the trip…
See you all soon. Look for updates either tonight or early tomorrow.
-Rob
DAX: The =RELATED() Function
November 10, 2009A quick update here on the football project (not really warranting a video). The source db is awfully complex, and I’m in the process of wrapping my Excel-centric head around it.
For instance, in the Plays table, I’m trying to understand how many row constitute a single NFL play. It’s not one row per play – no, that would be too simple.
Here, for instance, is a field that claims to be a unique play ID, but when I sort by it, it clearly isn’t:
With the table sorted, though, I notice that there are multiple SeasonNumber values corresponding to PlayUniqueID = 1:
…so I add a calc column concatenating the two, with renewed hopes for uniqueness:
And… curses, no such luck:
At this point I realize that there really just are multiple rows per real event. There are other columns in there that I suspect are crucial: PlayerRole, PrimaryRole, etc. – all with integer values, sadly.
The good news, though, is that those integers are ID’s, and they reference into other tables that I are related to the Plays table. Once those relationships are set up, I can exploit a new function added by PowerPivot.
To gain some more understanding about how this table works, I decide to lean on a familiar Excel trick and just add some more columns, VLOOKUP-style, but using the new DAX function named =RELATED()
Be gone, VLOOKUP!
Check that out – it’s a single-parameter function! I just say, “hey, go get me this column from this related table,” and off it goes to fetch the values that match the row from the current table, using the relationship.
The equivalent VLOOKUP would be something like:
=VLOOKUP([PlayerID, CleanPlayers, 2, FALSE)
…and if the PlayerID column wasn’t the first column in the CleanPlayers table, well, it flat-out wouldn’t work.
Those three =RELATED() formulas yield:
Ok, this gives me some hope of figuring things out. (Steve Martin’s presence confuses me, though…)
I may or may not keep these columns in this table when I am done, since they are NOT required – I can build reports using the original fields in those other three tables, combined with fields from the Plays table, as demonstrated in previous posts. This is just a temporary convenience for exploring this table.
Note that Excel users work like this all the time. I kinda doubt DBA’s frequently add columns like this, expecting to just remove them shortly thereafter. But here I am, working with an environment that conveys real database advantages, but I’m still doing my informal, explore-and-experiment Excel shtick. I love it.
Next Football Post: The Greatest Function in all of PowerPivot >>
Sports fans use BI!
November 9, 2009I love this story, had to share. Apparently, Kansas City Chiefs fans noticed that Larry Johnson was just 75 yards short of passing Priest Holmes for the Chiefs’ all-time record for most career rushing yards.
Larry Johnson is quite simply, not a nice guy. His record off off-field violence and remarks has endeared him to exactly no one. And Priest Holmes was a class act that everyone loved, forced to retire early by a spinal condition.
So Chiefs fans started an online petition, requesting that the Chiefs intentionally bench Johnson, arguably their best player, so that he could not break Holmes’ record and forever be remembered as a Chiefs “great.”
Today, the Chiefs went one better and simply cut Johnson from the team. As in, go away and don’t come back. Just an incredible story really, both about the fans and the team’s management.
http://sports.espn.go.com/nfl/news/story?id=4637300
At least one meaty PowerPivot post coming later today. I have a special guest lined up
PowerPivot – Better football project links
November 7, 2009From looking at the site traffic, a lot of people are joining us in progress. The blog format of “most recent post first” isn’t so great for playing catchup, and I want to make sure that all new arrivals have the chance to follow the Great Football Project.
So I have done two things:
1) Added a permanent link at the top of the page, pointing to the introductory football post.
2) At the bottom of each football post, there will now be a link to the next football post. No more vertical zigzag scrolling to catch up.
Last night’s video, attempt #2
November 6, 2009OK, uploaded the full 287 MB file to YouTube. Much better than last night, although I’d still like something about 50% better. Anyone have tips for me? Drop me a comment or email
And still very interested in feedback on video vs. text/screenshot posts.
Next Football Post: The RELATED() Function >>
PowerPivot DAX & Relationships Payoff, Part Two
November 6, 2009(Continuing the series on the Great Football Project…)
OK, remember in the previous step, we saw how a relationship between Plays and CleanPlayers suddenly let me use a single PivotTable to slice Plays data by ANY column in CleanPlayers (and there was much rejoicing). But wait, there’s more!
And this time, I’m gonna try my hand at a video post. Here goes…
Would love feedback on this method vs. text-and-images. Strong preference for one or the other? Hybrid?
Is the video resolution sufficient for everyone? Windows Movie Maker was giving me any size video I wanted, as long as it was 30 MB
Might need another tool
Update: OMG that resolution is terribad. Gonna try again in the morning with a different toolset, it was better on my desktop even at 30 MB.
So I’m taking the vid down for now.
Posted by powerpivotpro