More on visualization

January 10, 2010

I know some of you have been following this topic closely, so I thought I’d point out another article.  This time, visualizations seen in the media are redesigned by members of the community to be clearer and more informative:

http://visualizeit.wordpress.com/2010/01/08/redesigned-visualizations/

It’s a little bit ironic, of course, that I had to stare at that post for awhile before figuring out what I had to do in order to see the redesigned graphic :)

But I would take these as examples of the chart police doing good, thoughtful work.  And the examples they are dissecting are just laugh-out-loud funny.


The FAQ is now accepting applications!

January 8, 2010

YoungGuns

“I’ll make ya famous.”

-William H. Bonney

When I first announced the FAQ, I mistakenly believed that I had a limited number of user accounts to give out.  Turns out, I have unlimited accounts, and the only limitation is on amount of content.  Right now that limit is 200 MB.

I’m pretty sure 200 MB is gonna have us covered :)

As you can see here, we’ve already got a number of excellent moderators for the FAQ (where “moderators” is a fancy word for “people who have write access.”)  I’m happy to add some more.

Moderator responsibilities are pretty light – you just have to be willing to contribute FAQ items from time to time.

If you’d like to be considered for this role, drop me an email or tweet (contact info at the bio link at the top of the page).  I don’t promise to take everyone who contacts me, but I will consider everyone.

Some guidelines:

  1. You do NOT have to be a Microsoft employee.  This is a community effort.
  2. You DO have to know what you are talking about.  You’ve spent time with PowerPivot already.  Maybe you’ve blogged about it, maybe you haven’t.  But you’ve done something you can tell me about.
  3. You DO have to post at least occasionally.  Over time I will trim the moderators list and clean out inactives.
  4. You DO have to be friendly, courteous, and responsible at all times.

Tobey-Maguire-Spider-Man“With great power comes great responsibility”

-Peter Parker

Basically I reserve the right to prune the list of moderators at any time and for any reason.  I won’t be a jerk about it of course (I, too, will follow rule #4), but if I need to tweak things I will.  Let’s consider this an experiment :)

I should also add that if you cease being a moderator for any reason, the items you contributed will stay on the site for the benefit of the community (but will remain attributed to you).

I don’t mean to sound harsh – what limited experience I have with this kind of thing tells me that clear expectations up front actually leads to a friendlier vibe all around.  No surprises = good.

With that out of the way, remember, we are here to help each other, and to enjoy the process.  Those are the goals.

Looking forward to hearing from you,
-Rob


Touchdown Problem Solved, and COUNTROWS

January 8, 2010

In the last post I was struggling with horribly-designed source tables.  To be honest, I never found a way to relate those two tables to each other, even through other intermediate tables that initially seemed promising.

But I DID find two columns in my Plays table itself that saved my bacon – [OffensiveTeamScoreAtPlayStart] and [OffensiveTeamScoreAtPlayEnd].

PowerPivot PtsScoredOnPlay Calc ColumnA simple calc column subtracting the two and now I have [PtsScoredOnPlay]:

  =[OffensiveTeamScoreAtPlayEnd]-[OffensiveTeamScoreAtPlayStart]

And that appears to give me what I want – see the picture at right.

(I was a little suspicious at first that the Offensive team could score 2 pts, since only defenses can score safeties, but then I realized these were 2-pt conversions by the offense – see how subtle “business rules” can be?)

OK, so with that calc column in hand, and my knowledge that all Touchdowns score 6 pts, I can go back to my PivotTable and add a measure:

 

   =CALCULATE(
         COUNTROWS(Plays),
        
Plays[PlayTypeName]="RUN-Run",
         Plays[PrimaryRoleName]="Runner",
         Plays[PtsScoredOnPlay]=6
    )

Dissecting the Formula

I’ve broken the formula into separate lines above.  Let’s go line by line.

  1. The first line is just the CALCULATE function.  I have previously described CALCULATE as a supercharged version of SUMIF.  That bears repeating.  Think “improved SUMIF.”
  2. The second line is the expression to be aggregated.  You are not limited to SUM.  In fact you can place many custom expressions here.  In this case I am using the COUNTROWS function.  I will explain that below.
  3. The remaining 3 lines are just conditional filters, like the “if” part of SUMIF, or more accurately, SUMIFS, since there are more than one.  Notice that the last one filters down to just plays that scored 6 pts, which are touchdowns.  The other two filters insure that this was a Rushing play, and the current player was indeed the one carrying the ball.

COUNTROWS is another good (new) friend

Longtime PivotTable users are familiar with a common trick.  When you want to count the number of rows from your source data that meet various criteria (in a PivotTable), you find a column in your data that is guaranteed to be unique, and then you add that field to your Values area, and change the “Summarize By” to count.

And if you didn’t have a column that contained unique values, well, you created one.

With COUNTROWS, you no longer have to do that.  COUNTROWS(<TableName>) works every time.

Tying it all together

OK, then, the formula above says:  “In my Plays table, count the number of rows that are rushing plays, where the current player is the one carrying the ball, that resulted in 6 pts.”

Does it work?  You bet.  PivotTable on left, ESPN on right.  (Priest Holmes, 2004).

PowerPivot RushTD Measure Priest Holmes PowerPivot RushTD ESPN Priest Holmes

OK, back to Sexy!

In the next post, I can now return to what I was originally trying to do, which was something semi-mind blowing…  if you’re an Excel geek anyway :)

Next Football Post:  Introducing the Anti-Pivot >>


PowerPivot for Science?

January 7, 2010

Is anyone using PowerPivot in a scientific environment, or considering it?

If so, drop me a note.  I know some folks who would like to talk to you.


Cursed spam filter!

January 6, 2010

Folks, when I first started this site, I wasn’t getting many comments.  I would check the spam filters regularly, and was surprised that, yep, everything in the spam filters was indeed spam.

Way to go spam filter!

But today I was browsing Bryce Johnson’s new blog, and discovered that he’d posted a comment here that never appeared.

So I checked the spam filters again.  And there were 6 real comments in there.  (Now approved).  I’m sure there have been more in the past, but they have expired out of the filter and are gone.

If your comments have been lost like this in the past, please let me know.  And I promise to check the spam filters regularly going forward.  Note that once I’ve approved your first comment, all others are auto-approved after that.


Give me good data or give me… um, good data

January 5, 2010

Kevin-Pollak-A-Few-Good-Men_5 Lt. Weinberg: Doing what? Kaffee will have this finished in about four days.
Capt. Whitaker: Doing various administrative things. Backup. Whatever.
Lt. Weinberg: In other words I have no responsibilities whatsoever.
Capt. Whitaker: Right.

 

Collectively we’ve spent a lot of digital ink on the topic of the relative roles of IT and BI professionals on one hand and the Excel pros on the other – here and here for instance.

In those comment threads, some IT and DB pros were wondering whether PowerPivot marginalizes their roles, and the consensus has been “absolutely not, you might even be more important now.”

If you have any doubts about that, check out this fiasco.

In the last football post, I created a spiffy new measure for Rush Yards.  Since then, I’ve created similar measures for Passing and Receiving Yards.  And now I want to create measures for Touchdowns – Rush, Pass, and Rec.  Trouble is…

PowerPivot - Results ColumnProblem #1:  No Touchdown Column

Yep, whereas my Plays table contains a [Yards] column, it does not contain any columns remotely related to touchdowns.

Awesome.  So Where’s the touchdown column???  Well, in another table (FactPlays), I find a [Result] column, and one of the values of that column is “Touchdown” – Neat.

I filter to [Result] to “Touchdown” – and the [PointScored] column in the same table shows me nothing but 6’s, which is good news since 6 is the number of points in a touchdown :)

Problem #2:  Not all 6-pt plays are touchdowns?

But then to make sure, I clear the filter on [Result] and then filter [PointScored] = 6, and guess what?  Sometimes when [PointScored] = 6, [Result] = <Blank>…

            PowerPivot - 6 Pts But no TD.

…which is NOT good because the only way to score 6 pts is to score a touchdown.  So that [Result] column is basically worthless.

But hey, no big deal, I can work around this by just using [PointScored] = 6 as my “Is Touchdown” test, either by creating a new calc column or just doing this directly in a measure.

All I need to do is relate this FactPlays table to my Plays table…

Problem #3:  These two tables speak completely different languages

What columns can I use to link these two tables?  Given that the relevant entity here is a Play, I start looking for things related to PlayID…  and I find a trainwreck.

The Plays table contains [GameCode], [DriveId], [Period], [PlayUniqueId],[SeasonNumber], and [PlayGUID] – the last of which is a calc column I created awhile back by concatenating [PlayUniqueID] and [SeasonNumber] – I had determined that combination to be globally unique (the db lacked this to begin with – ugh).

PowerPivot - Plays PlayID Columns

The FactPlays table contains [GameDate], [GameCode], [Time], and [Quarter].

                  PowerPivot - FactPlays PlayID Columns

Hey look – NOTHING MATCHES UP.  FactPlays contains nothing resembling a Play ID.  It doesn’t even have a GameCode.  Nope, games in FactPlays are uniquely identified by [GameDate] combined with [GameNumber], and I assume I can identify Plays by somehow combining those two columns with [Time], which is the time on the game clock.  Oh wait, that’s not quite right.  I have to then combine that with [Quarter], since the clock starts over at 15:00 in each quarter.

This is gonna take me a little while.

Griping with a Purpose

I was actually working on a much sexier blog post when I ran into this problem.  I stopped to share it because this is real-world data.  EXPENSIVE data at that!  STATS charges six figures (or more) for subscriptions to this stuff.

IT/BI/DB Pros:  Please don’t do this to us :)

OK, that’s obvious.  And I am pretty sure that no IT/BI/DB pros reading this would ever design a db like this in a million years.  But things like this manage to happen anyway – after 2 months of following a number of SQL MVP’s on Twitter, I can say that one common theme is “oh my god I just got a new project and this is the worst db I have ever seen – how am I supposed to clean this up???”

I think we know how it happens.  Without being told, I can say with certainty that it happened like this:  When STATS got started as a company, their value proposition was in the collection of data, data that was far more detailed than anywhere else.  How the data got stored and shared was an afterthought.  And boom, they had customers immediately, clamoring for that data.  “Here, take our money!  Send us the data however you’ve got it, we need it!”

Collect data, get paid huge sums of money, repeat.  No normal human being would interrupt that process to clean up the storage format.  I know I wouldn’t.  Years later, they surely realize that this format is terrible.  But customers’ apps are now tightly bound with this format, so any investment in a new format would have minimal impact, at first anyway.

Price of Entry, or Welcome Forcing Function?

Hey, it’s one thing when you’re selling data to, say, ESPN.com, and the cost of understanding the data is just a small piece of the application that is being built.  But with PowerPivot, bad schema is at best an expensive time sink, and at worst, a source of user error.

So…  IT/BI/DB pros, I’d like to hear from you – how far do we have to go in order to be ready for broad PowerPivot adoption?  And somewhat alternatively, is PowerPivot perhaps a reason why data quality will be taken more seriously in your organization?  I (mostly) come from the Excel side of things and would love to know what your experience tells you.

Next Football Post:  Solving the Touchdowns Problem >>


More discussion on gauges

January 4, 2010

Continuing the theme from the “chart police” post, more discussion on gauges/speedometers/dials over on the Dashboard Zone site:

Dashboard Zone 

Consensus from discussion around the chart police post seemed to be that these sorts of thing really ARE pretty gimmicky, and that there are better ways to get attention AND convey data.  I think I mostly agree, but I never say never :)

Been busy with the FAQ site over the weekend and today.  Tomorrow, I have to choose which PowerPivot topic to dive back into.  So much to choose from, my OneNote overfloweth! :)


The Great PowerPivot FAQ

January 1, 2010

PowerPivot FAQ “But I still haven’t found what I’m looking for.”

     -Bono

“In answer to your query, they’re written down for me.”

     -Blade Runner Holden

 

We’re fortunate to already have a number of great sources of PowerPivot information – the official site, some great blogs, and several forums.

We now find ourselves with a familiar problem, though:  given the breadth of the PowerPivot product, finding the answer to a specific question is often difficult unless you’ve been following all of those sources since their inception.  Digging through archives isn’t a lot of fun, even when assisted by a search engine.

So, here it is, The Great PowerPivot FAQ.

Contributing to the FAQ

Most of the q’s in the FAQ as of today came from a list I’d been maintaining in Excel, and I’ll of course be adding to it over time, but I hope to not be the only one responsible for all of this :)

So if you want to contribute, here are the three ways to do so:

  1. You can email me a question and answer (or post it in comments).  If I agree that it qualifies for the FAQ, I will post it to the FAQ and credit you as the contributor.
  2. You can send me an answer to a currently unanswered question, and again I’ll credit you.  Notice there’s an “Answered?” column in the FAQ, and there are a few “No’s” in there.
  3. I’m also hoping to have a few co-moderators who have edit rights.  I have a few people in mind and will be contacting them directly.  Depending on response, I might open it up to volunteers.

I don’t have enough logins to go around, and certainly don’t want to open anonymous editing, so for now that’s gonna have to be the system.

Why SharePoint?

I chose to use SharePoint because it gives me a convenient publishing mechanism – I get a hyperlinked table of contents for free, without manually having to keep that up to date in HTML.  It also gives readers the ability to sort and filter, and I can annotate with additional columns as needed.  I can use that site to publish all kinds of other stuff, too – other lists, wikis, whatever.  (If you have suggestions let me know.)

Also, the Data Grid view let me directly copy/paste my existing list of questions and answers from Excel directly into SharePoint.

And all hosted for $9 a month (as long as I’m ok with anonymous access, which I am).  Not bad.