“CONTAINSX” Revisited: What WAS the Match?

Building on a Popular Technique

Power Pivot Substring Match/Contains Grouping Column

Last week’s post on “CONTAINSX” proved to be quite popular.  In the comments, Sasha provided an alternate formula that used FILTER instead of SUMX.  Honestly I have a been of a “fetish” for SUMX – after all it IS the 5-Point Palm Exploding Function Technique – so at first I was like “nice work Sasha but I’m sticking with my SUMX.”

But then “en” asked if we could write a formula that reported what the matching keyword actually WAS – not just whether there WAS a match.

And then, Sasha’s formula came in super-handy.  A couple of quick mods and we were in business.  Read on for the formula, but first, a quick aside.

Another thing that is easier in Power Pivot than “Traditional” Excel

I really enjoyed the comments (and the emails) we received about the CONTAINSX post.  Here are two of my faves:

 

Things that are easier in Power Pivot than normal Excel - yes we have some examples of that

Excel Warrior Priestesses Endorse Power Pivot Over Traditional Excel for Problems Like These
(Although I Think She Was Talking About Today’s Followup Technique, not the Original)

Power Pivot is Freaking AMAZEBALLS.  You Heard it Here.

“Freaking AMAZEBALLS” is Actually a Technical Term,
Known Only to Those on the Inside of the Data Revolution
(Identity Redacted to Protect Our Operators in the Trenches)

Personally I suspected that this was easier in Power Pivot than regular Excel.  And normally, calc columns are NOT the strength of Power Pivot (measures are the game changers).  But now I’ve grown so used to calc columns in Power Pivot that I often struggle to write the “old Excel” equivalent.  I’ve crossed over, and can no longer judge such matters for myself.

So it was very cool to hear from XSzil on that front. 

And, of course, AMAZEBALLS for the win.

Get on with it!

OK, new formula time:

Power Pivot Substring Match/Contains Grouping Column

The formula for the highlighted column is:

[WhatWasTheMatch] =

=FIRSTNONBLANK(FILTER(
                      VALUES(MatchList[Keyword]),
                      SEARCH(
                             Matchlist[Keyword],
                             Companies[Company],
                             1,
                             0
                            )
                     )
              ,1
              )

Explanation – FIRSTNONBLANK Part (Yellow)

This is sort of a nonstandard use of FIRSTNONBLANK.  But then again, I find myself often “mis-using” FIRSTNONBLANK in useful ways like this.

You see the yellow-highlighted “,1” up there?  That is the second input to FIRSTNONBLANK, and in “standard” uses, that is typically a measure, and you’re trying to find, for instance, the first time a customer EVER bought something, which happens when the sales measure first returns a non-blank value.

But here, by using 1 as the last input, we “short circuit” the “is it blank?” test (the “NONBLANK” part), and grab the first value we find, period.

In other words, FIRSTNONBLANK with a 1 as the last input is a makeshift “FIRST TEXT VALUE WE FIND” function.

Explanation – the FILTER(VALUES(SEARCH(…))) part

OK, everything else (the green parts) has only one purpose – to return a single column of text values from the Matchlist[Keyword] column.  But only values from that column that actually were found in the current row of Companies[Company].

In other words, it’s only going to return keywords that were found.

But it all starts with VALUES(Matchlist[Keyword]).  And since there is no relationship or other dependency between the Companies table and the Matchlist table, this will ALWAYS start out with EVERY distinct value from the Matchlist[Keyword] column:

Our list of Keywords and the DAX VALUES() function - oh yes.

In this particular calc column formula, VALUES(Matchlist[Keyword])
Always starts with the unfiltered list of all values in that column.

Then the FILTER() kicks in and only keeps rows/values where the <filter expr> evaluates to TRUE.

Sasha cleverly used the SEARCH() function as the <filter expr> test to his FILTER().  I happily stole that.  Since SEARCH returns a non-zero number whenever it finds something, and FILTER treats nonzero numbers as TRUE, well, FILTER only keeps values that it finds.

OK Now What?

Last time I kinda left it to your imagination – what you would DO with such a column.

Well, how about slapping our new column on Rows of a pivot, with a couple of measures?

Using our Substring/Contains Match Column on Rows of a Power Pivot

Our New Column Works Pretty Well on Rows Don’t You Think?

Pretty cool.  Note that I typed over the “Row Labels” cell in the pivot and called it Industry, and typed over the blank cell and called it “<unknown>.”  Just to make things a bit more “Fisher-Price” for the consumer of the report, ya know.

Other things you could do:

  1. Use the new column as a slicer instead of a Rows field.
  2. Use the new column as a <filter> input to a CALCULATE.  A new measure called [Average Value of Copper Companies], for instance, with formula CALCULATE([Average Value], Companies[WhatWasTheMatch]=”Copper”)
  3. Write another calc column that groups “Mine” and “Mining” into a single value, like, um, “Mining.”  And then maybe use THAT new column for any of the purposes above.

Download the Updated Workbook

The updated workbook is available for download here.

EnjoySmile

13 Responses to “CONTAINSX” Revisited: What WAS the Match?

  1. Marco Russo says:

    OMG the FIRSTNONBLANK is like a TOP 1 in SQL and solve a lot of scenarios – *very* clever!

  2. theet says:

    Hi, I am new to Power Pivot and this example U have used …”Blows my Mind”, I am thinking how this can expand my reports at work!! What if there is a company with both Mine and Copper? And/or can one have a match as per column headings.ie with example 5 columns (obliviously not if the match has 2 ‘true’s’. I will be signing up with Chandoo…..

  3. Ted Murphy says:

    Hi Rob,

    Magic!

    I was looking at PowerView in Excel 2013 … and realised that the Map Visualation drill-down is very powerful and easy to use.

    The difficulty I had was in linking customer addresses to specific locations that could be utilised in PowerView.

    In Europe (UK & Ireland in particular) we operate an Address Line 1, Address Line 2 … Address Line n structure with no specific fields for the Street, City, State or Zip. (City is more likely to be a town or village in the Irish context!!) While there is a Post Code system in operation in the UK we do we have one in Ireland yet – so Post Codes are not an option here … although the Government is in the process of setting this up.

    In order to use the Map Drill Down capability, there is therefore a requirement to extract the village / town / city from a concatenation of the address lines. This is a very time-consuming activity.

    The revisited CONTAINSX post was just what I needed … almost! (It still does 95% plus of the work!)

    As I now understand it, in the event of multiple matches, FIRSTNONBLANK will return the lowest alphabetical match. The reality is that the town or city will be located towards the end of the concatenated Address string. What would make CONTAINSX more useable in the above context would be if it could be modified to return the last Keywork identified in the Search String rather than the lowest or highest alphabetically. For example, in the situation where Galway & Tuam are both Keywords, I want an address that contains ” …. Galway Road, Tuam …” to return Tuam rather than Galway because of its location in the string. If however the address was “… Tuam Road, Galway …” then I want Galway to be returned as the result.

    This has saved me hours of time already … and now that I have compiled a table of nearly 700 Keywords (Towns/Villages), it will make the task even shorter the next time I need to deploy.

    I would really appreciate any direction you can give in relation to going the final 5%!

    Many thanks,

    Ted Murphy.

    • powerpivotpro says:

      Well I have a “hacky” fix for you Ted. Change the formula to:

      =FIRSTNONBLANK(TOPN(1,VALUES(MatchList[Keyword]),
      SEARCH(Matchlist[Keyword],
      Companies[Company],
      1,
      0
      )
      ),1)

      And then, in your keywords list, introduce a keyword that is intentionally at the “front” of the list, alphabetically.

      For instance I added ” A Not Found” – with leading spaces intended.

      When SEARCH finds no match, and returns 0, the formula ends up grabbing ” A Not Found”

      When there IS a match, I think it finds the one that is latest in the string.

      Still verifying.

      • powerpivotpro says:

        yeah this seems to work :)

        you could then add another calc column that says IF = the dummy value, return BLANK(), otherwise return the calc column’s value (the column with the TOPN in it).

        that’s if you don’t want the dummy value showing up in slicers etc.

        i have not yet found a way to avoid using the dummy value, at least not without making the formula super long by having the SEARCH “sub-formula” appear twice in the formula.

  4. en says:

    Highly applicable stuff, thanks Rob!

    However believe i am facing some technical issues – suspect my Excel 2010 is bugged.
    Formula refuse to work in “New Measures”, returning the “naked column” errors on Companies[Company]. It is however totally operational as Calculated Columns.

    This same problem i faced in the prior post, very sad :<
    Will perhaps do a re-install of office or even consider getting 2013 stand-alone.

  5. Oxenskiold says:

    =
    FIRSTNONBLANK (
    FILTER (
    VALUES ( MatchList[Keyword] ),
    SEARCH ( Matchlist[Keyword], Companies[Company], 1, 0 )
    ),
    1
    )

    • Oxenskiold says:

      I really wish that spaces weren’t trimmed from comments when you post one.

      Then you could use the Dax-formatter from http://www.daxformatter.com. Dax formulas really are much more readable having been formatted by this tool. The formatter checks the syntax as well and it’s easy to copy the formatted DAX formula and use and understand it afterwards.

      You should really look into this Rob. I think.

      • powerpivotpro says:

        I did a quick google search just now and couldn’t find a way to force wordpress to perserve the whitespace in comments. If someone else knows a way please let me know :)

  6. Gordon says:

    Hi,
    Have tried to use this example to do a “fuzzy lookup” with a list of customers in Basename[Customer] which will appear in the Supplier_Report[Customer], and when found to put the value from Basename[Customer] in the calculated field

    In the adaption below, all worked well up to the final “BaseName[Customer] – if I put text in that works

    =IF(SUMX(BaseName,FIND(UPPER(BaseName[Customer]),UPPER(Supplier_Report[Customer]),,0))>0,BaseName[Customer],”Unknown”)

    Any suggestions welcome

Leave a Reply