### Building on a Popular Technique

Last week’s post on “CONTAINSX” proved to be quite popular.  In the comments, Sasha provided an alternate formula that used FILTER instead of SUMX.  Honestly I have a been of a “fetish” for SUMX – after all it IS the 5-Point Palm Exploding Function Technique – so at first I was like “nice work Sasha but I’m sticking with my SUMX.”

But then “en” asked if we could write a formula that reported what the matching keyword actually WAS – not just whether there WAS a match.

And then, Sasha’s formula came in super-handy.  A couple of quick mods and we were in business.  Read on for the formula, but first, a quick aside.

### Another thing that is easier in Power Pivot than “Traditional” Excel

I really enjoyed the comments (and the emails) we received about the CONTAINSX post.  Here are two of my faves:

Excel Warrior Priestesses Endorse Power Pivot Over Traditional Excel for Problems Like These
(Although I Think She Was Talking About Today’s Followup Technique, not the Original)

“Freaking AMAZEBALLS” is Actually a Technical Term,
Known Only to Those on the Inside of the Data Revolution
(Identity Redacted to Protect Our Operators in the Trenches)

Personally I suspected that this was easier in Power Pivot than regular Excel.  And normally, calc columns are NOT the strength of Power Pivot (measures are the game changers).  But now I’ve grown so used to calc columns in Power Pivot that I often struggle to write the “old Excel” equivalent.  I’ve crossed over, and can no longer judge such matters for myself.

So it was very cool to hear from XSzil on that front.

And, of course, AMAZEBALLS for the win.

### Get on with it!

OK, new formula time:

The formula for the highlighted column is:

[WhatWasTheMatch] =

=FIRSTNONBLANK(FILTER(
VALUES(MatchList[Keyword]),
SEARCH(
Matchlist[Keyword],
Companies[Company],
1,
0
)
)
,1
)

### Explanation – FIRSTNONBLANK Part (Yellow)

This is sort of a nonstandard use of FIRSTNONBLANK.  But then again, I find myself often “mis-using” FIRSTNONBLANK in useful ways like this.

You see the yellow-highlighted “,1” up there?  That is the second input to FIRSTNONBLANK, and in “standard” uses, that is typically a measure, and you’re trying to find, for instance, the first time a customer EVER bought something, which happens when the sales measure first returns a non-blank value.

But here, by using 1 as the last input, we “short circuit” the “is it blank?” test (the “NONBLANK” part), and grab the first value we find, period.

In other words, FIRSTNONBLANK with a 1 as the last input is a makeshift “FIRST TEXT VALUE WE FIND” function.

### Explanation – the FILTER(VALUES(SEARCH(…))) part

OK, everything else (the green parts) has only one purpose – to return a single column of text values from the Matchlist[Keyword] column.  But only values from that column that actually were found in the current row of Companies[Company].

In other words, it’s only going to return keywords that were found.

But it all starts with VALUES(Matchlist[Keyword]).  And since there is no relationship or other dependency between the Companies table and the Matchlist table, this will ALWAYS start out with EVERY distinct value from the Matchlist[Keyword] column:

In this particular calc column formula, VALUES(Matchlist[Keyword])
Always starts with the unfiltered list of all values in that column.

Then the FILTER() kicks in and only keeps rows/values where the <filter expr> evaluates to TRUE.

Sasha cleverly used the SEARCH() function as the <filter expr> test to his FILTER().  I happily stole that.  Since SEARCH returns a non-zero number whenever it finds something, and FILTER treats nonzero numbers as TRUE, well, FILTER only keeps values that it finds.

### OK Now What?

Last time I kinda left it to your imagination – what you would DO with such a column.

Well, how about slapping our new column on Rows of a pivot, with a couple of measures?

Our New Column Works Pretty Well on Rows Don’t You Think?

Pretty cool.  Note that I typed over the “Row Labels” cell in the pivot and called it Industry, and typed over the blank cell and called it “<unknown>.”  Just to make things a bit more “Fisher-Price” for the consumer of the report, ya know.

Other things you could do:

1. Use the new column as a slicer instead of a Rows field.
2. Use the new column as a <filter> input to a CALCULATE.  A new measure called [Average Value of Copper Companies], for instance, with formula CALCULATE([Average Value], Companies[WhatWasTheMatch]=”Copper”)
3. Write another calc column that groups “Mine” and “Mining” into a single value, like, um, “Mining.”  And then maybe use THAT new column for any of the purposes above.

Enjoy

#### Rob Collie

One of the founding engineers behind Power Pivot during his 14-year career at Microsoft, and creator of the world’s first cloud Power Pivot service, Rob is one of the foremost authorities on self-service business intelligence and next-generation spreadsheet technology.

## This Post Has 19 Comments

1. OMG the FIRSTNONBLANK is like a TOP 1 in SQL and solve a lot of scenarios – *very* clever!

1. powerpivotpro says:

Marco – I won’t lie. Whenever you praise something I have done, I get a seriously happy feeling. Like I have pleased the Yoda of DAX or something. So thank you very much for sharing that

2. theet says:

Hi, I am new to Power Pivot and this example U have used …”Blows my Mind”, I am thinking how this can expand my reports at work!! What if there is a company with both Mine and Copper? And/or can one have a match as per column headings.ie with example 5 columns (obliviously not if the match has 2 ‘true’s’. I will be signing up with Chandoo…..

3. Ted Murphy says:

Hi Rob,

Magic!

I was looking at PowerView in Excel 2013 … and realised that the Map Visualation drill-down is very powerful and easy to use.

The difficulty I had was in linking customer addresses to specific locations that could be utilised in PowerView.

In Europe (UK & Ireland in particular) we operate an Address Line 1, Address Line 2 … Address Line n structure with no specific fields for the Street, City, State or Zip. (City is more likely to be a town or village in the Irish context!!) While there is a Post Code system in operation in the UK we do we have one in Ireland yet – so Post Codes are not an option here … although the Government is in the process of setting this up.

In order to use the Map Drill Down capability, there is therefore a requirement to extract the village / town / city from a concatenation of the address lines. This is a very time-consuming activity.

The revisited CONTAINSX post was just what I needed … almost! (It still does 95% plus of the work!)

As I now understand it, in the event of multiple matches, FIRSTNONBLANK will return the lowest alphabetical match. The reality is that the town or city will be located towards the end of the concatenated Address string. What would make CONTAINSX more useable in the above context would be if it could be modified to return the last Keywork identified in the Search String rather than the lowest or highest alphabetically. For example, in the situation where Galway & Tuam are both Keywords, I want an address that contains ” …. Galway Road, Tuam …” to return Tuam rather than Galway because of its location in the string. If however the address was “… Tuam Road, Galway …” then I want Galway to be returned as the result.

This has saved me hours of time already … and now that I have compiled a table of nearly 700 Keywords (Towns/Villages), it will make the task even shorter the next time I need to deploy.

I would really appreciate any direction you can give in relation to going the final 5%!

Many thanks,

Ted Murphy.

1. powerpivotpro says:

Well I have a “hacky” fix for you Ted. Change the formula to:

=FIRSTNONBLANK(TOPN(1,VALUES(MatchList[Keyword]),
SEARCH(Matchlist[Keyword],
Companies[Company],
1,
0
)
),1)

And then, in your keywords list, introduce a keyword that is intentionally at the “front” of the list, alphabetically.

When SEARCH finds no match, and returns 0, the formula ends up grabbing ” A Not Found”

When there IS a match, I think it finds the one that is latest in the string.

Still verifying.

1. powerpivotpro says:

yeah this seems to work

you could then add another calc column that says IF = the dummy value, return BLANK(), otherwise return the calc column’s value (the column with the TOPN in it).

that’s if you don’t want the dummy value showing up in slicers etc.

i have not yet found a way to avoid using the dummy value, at least not without making the formula super long by having the SEARCH “sub-formula” appear twice in the formula.

4. en says:

Highly applicable stuff, thanks Rob!

However believe i am facing some technical issues – suspect my Excel 2010 is bugged.
Formula refuse to work in “New Measures”, returning the “naked column” errors on Companies[Company]. It is however totally operational as Calculated Columns.

This same problem i faced in the prior post, very sad :<
Will perhaps do a re-install of office or even consider getting 2013 stand-alone.

1. en says:

-pls disregard said technical issue, forgive my ignorance to realise this is very much a Calculated Column technique!

5. Oxenskiold says:

=
FIRSTNONBLANK (
FILTER (
VALUES ( MatchList[Keyword] ),
SEARCH ( Matchlist[Keyword], Companies[Company], 1, 0 )
),
1
)

1. Oxenskiold says:

I really wish that spaces weren’t trimmed from comments when you post one.

Then you could use the Dax-formatter from http://www.daxformatter.com. Dax formulas really are much more readable having been formatted by this tool. The formatter checks the syntax as well and it’s easy to copy the formatted DAX formula and use and understand it afterwards.

You should really look into this Rob. I think.

1. powerpivotpro says:

I did a quick google search just now and couldn’t find a way to force wordpress to perserve the whitespace in comments. If someone else knows a way please let me know

6. Gordon says:

Hi,
Have tried to use this example to do a “fuzzy lookup” with a list of customers in Basename[Customer] which will appear in the Supplier_Report[Customer], and when found to put the value from Basename[Customer] in the calculated field

In the adaption below, all worked well up to the final “BaseName[Customer] – if I put text in that works

=IF(SUMX(BaseName,FIND(UPPER(BaseName[Customer]),UPPER(Supplier_Report[Customer]),,0))>0,BaseName[Customer],”Unknown”)

Any suggestions welcome

7. Brian says:

I have tried this formula and it appears to be working great! However, I am trying to place a second filter on the “Search” database and am in need of help.
I want to have 5 different columns searching 5 different “Search types” at a time. I could do this by recording the tables separately, but I see no reason a double filter cannot be applied.

8. Rob says:

What a great help. That said, I have a need to search for two word strings. E.g., “made safe” or “”was safe”. Any way to adapt the formula to do this?

9. dpotta says:

Thanks, this formula solves a regular problem for me of searching for and returning a valid UK postcode in a long text string, just link it up to a Postcode database and BOOM!

10. Yoav says:

Thank you for this! It is super helpful!!

I have one question, a sort of complication of this formula. I have a dataset where column 1 is the state and column 2 is the city, if I use this formula as is I have problems where sometimes it returns the wrong value (where 1 city has two possible matches from two different states).

Is there a way to have this formula work by searching for a match only within the matching category so for example, it will only search for a matching city within the matching state?

Please let me know if this makes sense and if you can think of any possible solutions.

Thank you!!

11. en says:

Apparently PowerPivot iterates rows in an alphabetical order.
In cases of multiple-match it returns the match that is alphabetically at the top.

To work around the alphabetical order set in the [Keyword] list, I added another column to table Keyword (Matchlist[RANK]).
I then define the rank numbers (starting from “1”, then “2” and so on..) so when table Keyword is sorted alphabetically on Matchlist[RANK], the top [Keyword] that matches will be returned instead.

=LOOKUPVALUE(Matchlist[Keyword],Matchlist[RANK],
CALCULATE(FIRSTNONBLANK(Matchlist[RANK],1),
FILTER(VALUES((Matchlist[Keyword])),
SEARCH(Matchlist[Keyword],Companies[Company],1,0)0
)))

Basically CALCULATE returns the first/top Matchlist[RANK] value, and passes it back to LOOKUPVALUE to return the corresponding Matchlist[Keyword].
This passing back and forth to circumvent the alphabetical order of the Keyword matching feels very awkward to me. Not sure if there is a more optimal approach. Gladly appreciate any advice to correct my understanding.

1. en says:

Hmm.. not sure why the “Less than” and “More than” sign placed after SEARCH() is missing.
ie SEARCH () NOT EQUAL to 0

The formula again, hopefully the “” sign is there after “SEARCH()”.
=LOOKUPVALUE(Matchlist[Keyword],Matchlist[RANK],
CALCULATE(FIRSTNONBLANK(Matchlist[RANK],1),
FILTER(VALUES((Matchlist[Keyword])),
SEARCH(Matchlist[Keyword],Companies[Company],1,0)0
)))

Can any moderator reinstate the “not equal sign” in my original comment and delete this?