A mystifying and awesome solution for many 2 many

imageRip van Winkle wakes up and looks around

I have a confession to make:  when it comes to technology, I’m a much better writer than reader.  I don’t pay nearly as much attention to what everyone else is saying in their books and on their blogs.  I’m too hunkered down in my own little world, figuring things out my way, to notice all the good stuff that may be going by.

Sometimes you just have to accept who you are though.  The way I work…  works for me.  I grind away on things, like erosion, until a simple picture emerges.  (And then I screenshot that sucker with WinSnap and throw it on the blog next to a movie quote, heh heh.)

Well something got into me recently.  I started checking in on SQLRockstar’s blog a bit more often, because MAN – that guy puts together a simultaneously technical and human series of posts, with ridiculously high production values.  I don’t know SQL and I don’t intend to learn, but I read anyway.  I like to think what’s happening on his site is similar to what happens here:  tech is being yoked and bent to serve humanity, not vice versa.

Oh and you definitely should check out his Disclosures page.  I mean, what if he were part owner of the gentlemen’s club whose explosion he covered?  I’d want to know that, and it’s very responsible of him to come clean up front so we can all read with confidence.  Don’t you feel better?  I do.  Here, let me try it.  Disclosure:  Tom is planning to review my book.  I feel so clean!

But I did something else the other day too.  I actually Googled something about PowerPivot.  (I know!  A big step for me!)  And I found something AMAZING.

The Dreaded Many to Many Problem

When I was at the SharePoint conference, one night I went to Ask the Experts.  If you’re a speaker at the conference, they put a tacky t-shirt on you that says EXPERT on it, and then they send waves of people to ask you questions about Reporting Services.  Which I know nothing about.  I tried to divert them to a real Reporting Services expert, but he was busy with someone else.  (Who probably was asking PowerPivot questions).

At some point though I started to get PowerPivot questions.  And one guy asked me this:

“We have a lot of many2many relationships in our business.  If we adopt PowerPivot, can Excel pros handle many2many?”  What’s many to many?  Here’s an example:

Simple Example of a Many to Many (M2M) Relationship Problem

Each customer likes more than one color, and there is more than one product of each color.
If we sell each customer all of the products from all of the colors that they like,
how much money do we make?

If you try to relate these tables to each other in PowerPivot, using Color as the linkage, you get an error:

The relationship cannot be created because each column contains duplicate values.  Select at least one column that contains only unique values.

You have a bad case of the many to manies.  Your prognosis is not good.

You CAN solve this using DAX measures and a few extra steps, but the DAX ends up being kinda scary.  At least to me.  The Italians aren’t afraid of it – they nail it here and here.  And I’ve followed their techniques before – the Retailer Competitive Overlap application uses it.  But it wasn’t fun.  At all.

So I answered honestly, and told him “no – it’s too complex to grasp for most, even I dread it, and it makes your formulas too complex.  M2M is something I encourage people to avoid.  It is not the simple part of PowerPivot.”

I don’t even attempt to cover M2M it in the book for instance.  It’s spicy scale level 6 in my opinion.

But I got home and started thinking:  what if the state of the art has advanced these past few years while I slumbered?  And off I went to Google.

Greetings, Gerhard Brueckl!

Here’s the article given to me by Google:

image

Click the image to read the article

Gerhard is a thinker.  He did some serious sleuthing, some reading of Jeffery Wang’s blog, then some thinking, and then some testing.  And then probably some more thinking.

He offers up a solution to M2M that I still don’t understand how/why it works.  But it’s so much simpler to write that it falls under the heading of “a pattern I can happily re-use many times over.”

Since for now it’s just a pattern to me rather than something I understand at the atomic level, I’ll just try to make it simple here.  Please read Gerhard’s article above if you want to understand more deeply.

And if I botch something here, please let me know Smile

Add JustCustomers and JustColors tables

First you need two new tables.  Single column tables that contain only the unique values from the column you’d like to relate on (Color) and from the Column you want to use on rows of your pivot (Customer):

image     image

The JustColors and JustCustomers Tables

Relationships

Next, set up relationships so that your model looks like this (orange arrows indicate the direction that filters flow, as always, but in this case it’s harder than usual to understand why the filter flows actually help):

image

A Simple Four-Table Pattern for M2M in PowerPivot

The Measure

Here’s the magic.

[Potential Sales] =
CALCULATE(SUM(Products[Price]),
‘JustCustomers’,
‘CustomersColorsBridge’,
‘JustColors’
)

What?  I just do a simple CALCULATE over a SUM, and then list the names of the three other tables???  (“Other” meaning the tables other than the one where I am performing the SUM).

Well, it works:

A Successful and Simple Many2Many Measure in PowerPivot.  Hallelujah.

A Successful and Simple Many2Many Measure in PowerPivot.  Hallelujah.

Listen, I’ve been blogging for three years on this stuff.  I’ve even written a book on it.  I think it’s a good book.  Others think it’s a good book.  But this never, EVER, would have occurred to me.

None of us should kid ourselves.  This is a deep, DEEP product, this PowerPivot thing, and it still has many magical things lurking to discover.

Well done Gerhard!  And well done, Jeffrey, Marco, and Alberto!  I love how the state of the art evolves through the interplay of community.

Download the workbook

The workbook I made for this post can be downloaded here.

22 Responses to A mystifying and awesome solution for many 2 many

  1. Marco Russo says:

    I have a deal.
    The formula can be simplified to this:
    [Potential Sales] =
    CALCULATE(SUM(Products[Price]),
    ‘CustomersColorsBridge’
    )

    Technically the only table(s) you have to include in the CALCULATE are the “bridge” table(s), which are the table(s) having many-to-one relationships with lookup tables and not the contrary. Ok, those tables that are not lookup.
    The deal is: how do you explain such a concept to Excel users?
    Or you think it is simpler to specify all of the tables in the chain, regardless its role?

    Thanks!

    Marco

    • powerpivotpro says:

      WHAT?? Now we’re getting even stranger.

      But simpler is better. I don’t think you should try explaining the mechanisms of this to Excel pros. Don’t even try explaining to me for instance – I’ll just zone out :)

      The important thing is this: it’s a simple pattern. We can follow patterns. I don’t need to understand every last nuance and mechanism really, I just need to know how to solve a problem.

      Think of it this way: Microsoft should take this pattern and just make a function out of it. Like, say,

      CALCULATEM2M(meas expr, bridge table. filter1, filter2, etc.)

      If we had that function, would any of us Excel pros worry about its innards? Nope :) And we’re *almost* as comfortable ignoring the innards of a simple pattern as we are ignoring the internals of a function.

      The thing about the prior M2M solutions was that they were complex patterns too. I couldn’t just use it, I HAD to understand it. And it was nasty.

      This? Oh, I can live with this.

      • Marco Russo says:

        Actually you can use the CALCULATE just like you would use CALCULATEM2M. If yuo have a cascading M2M relationship, then you should nest the two bdirge tables in this way:

        CALCULATE(
        measure,
        CALCULATETABLE( bridge1, bridge2 ),
        filter1, filter2, …
        )

        where bridge1 is the bridge table nearest to the fact table (on which the measure is defined, probably)
        So you say that a couple of example with a schema pattern that identifies bridge1 and bridge2 would be enough?

        Marco

      • As Marco said, CALCULATE already does that. When people say PowerPivot/Tabular does not Support many-to-many relationships, they mean it misses a feature that can be found in the traditional SSAS (aka Multi-Dimensional) model: the possibility of filtering a sub-model based on the information found in another sub-model. MD allows you to link a dimension to a fact table through another fact table.

        A probably easier way would allow users to specifiy a default filter expression for tables (or groups of measures). The filter Expression would be added to the CALCULATE(…) expression that is implicitly added around measures. This might also simplify the writing of these “calculated relationships” you love so much.:-)

      • Hi guys,
        when I first came up with this solution I was asking myself why M2M does not just work out-of-the-box?
        1-to-N relationships are automatically resolved down to the “lowest” used table (usually the fact-table) and filter it correctly. This does not work the other way round – unless you explicitly specify the “upper” table in your CALCULATE. But why can’t this also be done by the engine? Why arent all “used tables” (and the relationships/link-tables) added to the filter-context automatically?
        Take a little example based on Adventure Works:
        Cnt_SubCategory:=COUNTROWS(‘DimProductSubcategory’)

        lets take a query that has ‘DimProduct’[Style] on rows, [Cnt_SubCategory] on columns and a filter on ‘Date’[CalendarYear]

        our “used tables” would be ‘DimProduct’ (used on rows), ‘Date’ (used as filter) and ‘DimProductSubcategory’ (used on columns within our measure)
        our “relationships/link-tables” would be FactInternetsales as it is in the relationship-chain between ‘DimProduct’ and ‘Date’

        if all those tables are added to the filter-context/CALCULATE automatically by the engine, M2M would just work out of the box!
        So i was wondering why this is not done by the engine??
        the only reason i could think of is that it may blow up memory as a M2M design may cause a lot of duplicate rows??

        Unfortunatelly I could not investigate into this issue in more detail yet

        greets,
        gerhard

        • Marco Russo says:

          The problem is ambiguity.
          Just consider that in AdventureWorks you have FactInternetSales and FactResellerSales. Imagine you put in a PivotTable the Year of OrderDate in the rows and the following measure in Values area:
          Cnt_SubCategory:=COUNTROWS(‘DimProductSubcategory’)

          The possible meanings are:
          - the number of subcategories in the entire table
          - the number of subcategories in each year that have been bought in FactResellerSales
          - the number of subcategories in each year that have been bought in FactInternetSales
          - the number of subcategories in each year that have been bought in both FactResellerSales and FactInternetSales

          Do you see the issue now?

          • I did not see this issue in the first point but you are right of course. On the other hand I still think that the engine could be smarter.
            E.g. extend the context automatically if there is only one possible link
            Or flag some tables that are allowed to be added to the context automatically (e.g. bridge tables)

            but maybe I am just simplifying this too much :)

  2. Eric Hutton says:

    I had done something like this once – not by any intentional design, this was more the “throw things at it and see what sticks” approach (which I have to say is my approach 7 times out 10 still). I was totally baffled as to why it worked…

    Just when I was starting to feel like I was getting a grasp of the underlying mechanics of PowerPivot, something like this comes along. I’ve always thought that a table view type feature would go a long way to assist in working in PowerPivot. I do like how you approximate this in your book by showing how you can use filters in a PivotTable to replicate the filters you apply in a measure.

  3. Eric Hutton says:

    Ha! Just getting to your section covering “Advanced Use of FILTER()” where you have had to mock up the impact of different filters… this is exactly why I want some sort of table view feature!

  4. I read halfway through the article and am surprised at how frequently he makes a Typo on the word Bing… :) I will read the rest of the article after cup of coffee… Great stuff…
    Poor Maro & Alberto get lumped into a nationality again… Great authors, speakers, and BI Practitioners… Check them out here… http://www.sqlbi.com/about/

  5. and then I don’t proof my comment and I spell Marco’s name without a “c” … I do need coffee

  6. Bob Phillips says:

    I came back to it this week as I had a real need for such a solution.

    Rob said in an earlier comment that … We can follow patterns. I don’t need to understand every last nuance and mechanism really, I just need to know how to solve a problem… The problem there is that the example pattern is usually, perhaps necessarily, the simplest form, and the real world patterns are invariably more complex. My problem was certainly more complex, and very difficult to apply this solution to. I thought I would post here to help anyone else who has a similar situation.

    I had a many to many relationship to resolve, but in my case I had assets which were being used in separate tranches of an investment portfolio. Each of the assets could be used in many tranches, and each tranche could be built with many assets. The twist came in that each tranche would be comprised of a percentage of those assets, and I wanted to calculate the apportioned value. For instance, one tranche might of comprised of 50% Cash, 50% Stock, another might be 100% Cash, another 33.333% Bonds, 33.333% Stock, 33.333% Cash, and so on. I need to know how much is invested in cash, in bonds, and so on.

    Using the method that Rob showed, I could join the two tables and get some meaningful values out, but not apportioned. In my case, the measure was

    [InvestmentValue]
    =CALCULATE(
    SUM( Investments[Value] ),
    Assets
    )

    But that got me nowhere near to knowing how much was invested in each asset as it didn’t use the percentage (which is another column in my Assets table, the table akin too Rob’s CustomersColorsBridge table).

    I tried to amend the formula to use the percentage as well, and to round the results to the penny, but that got nowhere.

    The solution is a touch convoluted, but here goes.

    First, a new measure is added to sum the Asset Percentage over the investments

    [AssetPercent]
    =CALCULATE(
    SUM( Assets[Percent] ),
    Investments
    )

    this is similar to the calculation of the value over the assets.

    Then another measure, to calculate the rounded, apportioned value

    [AssetValue]
    =ROUND(
    SUMX(
    VALUES( Assets[InvestmentNum]
    )
    Investments[InvestmentValue] * Assets[AssetPercent]
    )

    Neither [InvestmmentValue] nor [AssetPercent] are shown in my results, just [AssetValue], they are used to establish the joins and for intermediate calculations.

    I appreciate this is horribly complex, but it is worth showing as it is more of a real world solution in my view. I am not pretending that I fully understand all of this, I didn’t even come up with the solution, that was a far smarter guy than me named Rory Archibald, who worked out this two-step fandango. But it seems to work.

    I just need to add another 3 many-to-many relationships now … maybe after a nice glass or two of Lagavulin.

  7. Dave says:

    Mind = Blown!

  8. Tim says:

    Hey Bob,

    I have been trying to solve a similar problem, and would love to understand more of your solution. Can you please post your diagram, and where these measures are placed?

    Really appreciate it.

  9. Thanks for posting this solution. Very helpful.

    In testing this myself, I think you can simplify further – I don’t believe you need the ‘JustCustomers’ table. In my testing I only needed the bridge table and the unique field to ‘join’ on.

  10. Sean McCarthy says:

    love it, thanks guys! :D

  11. Rob P says:

    Hi Rob
    I have your book and read your blog but cannot figure out a complex business problem with the root of the problem relating tables. This is probably way, way out of scope for a blog reply, but just wanted to put it out there to get some possible feedback.

    I am working from CRM Dynamics data to attribute revenue, cost, and profit relative channels and opportunities. I strung together a somewhat working example before that fell short of true table integration but accomplished most of my goal. Then I realized that a relationship that I thought was 1-to-1 was actually 1-to-many and now I don’t even know where to begin. The flow of the relationships makes this difficult. So for example, if we are looking at Opportunity XYZ that has revenue of 9,000 and it uses 2 Web Demos and 1 Conference, we would attribute 3000 revenue for each channel. I had a somewhat ok solution that worked until I realized that Opportunities can take more than one Appointment. The original solution basically had one long flattened table on Opportunity, but now that many Appointments can be assigned to one Opportunity, we need to do these calculations on a different sheet because if there is one Opportunity per row and many Appointments per Opportunity, this won’t work.

    These are my tables that have been exported from CRM:

    Opportunity table: Lists all opportunities and the campaign associated (either none or 1) with the opportunity.
    Opportunity ID*
    Campaign ID
    Revenue
    Date opportunity closed

    Campaign table: This is a lookup to the Opportunity table and also has the cost of a campaign and the channel type, which will be a lookup to a table referenced by another table. An Opportunity by CRM’s design can only take 1 Campaign, so that makes this easy.
    Campaign ID*
    Cost
    Channel Type ID
    Date Campaign complete

    So far, so good. But this is where it gets hard. Appointments are marketing channels like Campaigns but look to Opportunity as the lookup!

    Appointment table: Even though these are marketing channels like Campaigns, they do not have the same relationship with Opportunities, making my problem very hard. There are two types of Appointments-Web Demo and Customer Visit—and there can be many of each per Opportunity. This table lookup is the Opportunity fact table, which I know is not good. Bridge table needed? Would it help to break Web Demo and Customer Visit apart into separate tables?
    Appointment ID*
    Channel type ID
    Cost
    Opportunity ID
    Date Appointment complete

    Channel Type: This is where I am trying to combine the appointments and campaigns so eventually, I can create one table. It is a single column showing all the channels, 4 of which are campaigns (Mini-Conf, User Conf, Webcast, Tradeshow) and 2 of which are appointments (Web Demo, Customer Visit). I am not sure if I need bridge tables, one for CAmpaign types, one for Appointment types?
    Channel type ID*

    Date: Date table is created. I would love to use to link the different tables, but its not working.
    Date
    YearMonth

    If you have any ideas, I’d love to hear! Thanks in advance :)

  12. Bob says:

    It’s really cool that there is a way to use CALCULATE to resolve the many-to-many situation but how come no one mentioned the fact that the Grand Total is not the sum of the Customers Potential Sales?
    I vaguely remember reading in Rob’s first book that the individual parts of the pivot table have their own context (right?) but how would I get the Grand Total to be correct OR explain the results to my CIO?

  13. Just another trick: Reload data from the model with POWER QUERY. Then you can construct tables as you want ;-)

  14. Llewellyn says:

    I have an issue, and perhaps, just perhaps I can find a solution to it. I have multiple identical excel files that lie in various locations, to ensure that more than one person can access and work on these files. The files contain information on contracts and invoices. I have another file which I call “Dashboard” in which I would like to draw all the data on contracts and invoices from the multiple files into PowerPivot and report on them as if they were one dataset. Is this possible without loading each file individually, which doesn’t solve the reporting problem because they are still listed as separate datasets?

    • powerpivotpro says:

      I would look at Power Query – free addin from MS that allows you to import/shape/merge data before it lands in Power Pivot. It does exactly that.

      In fact there’s a chapter in my new book (“Power Pivot Alchemy”) on precisely this topic :)

      • Llewellyn says:

        Thanks for that, I do have excel 2013 – but this was unfortunately done in excel 2010. Anyway this is possible in 2010?

Leave a Comment or Question