DAX – “CONTAINSX” Revisited: What WAS the Match?

January 28, 2014 at 4:45 pm

OMG the FIRSTNONBLANK is like a TOP 1 in SQL and solves a lot of scenarios – *very* clever!

January 29, 2014 at 2:50 pm

Marco – I won’t lie. Whenever you praise something I have done, I get a seriously happy feeling. Like I have pleased the Yoda of DAX or something. So thank you very much for sharing that 🙂

January 29, 2014 at 5:23 pm

You’re very welcome! 🙂

January 29, 2014 at 2:42 pm

Hi, I am new to Power Pivot and this example you have used … “Blows my Mind”. I am thinking how this can expand my reports at work!! What if there is a company with both Mine and Copper? And/or can one have a match per column heading, i.e. with the example, 5 columns (obviously not if the match has 2 ‘true’s)? I will be signing up with Chandoo…..

January 29, 2014 at 7:08 pm

Hi Rob,

Magic!

I was looking at PowerView in Excel 2013 … and realised that the Map Visualisation drill-down is very powerful and easy to use.

The difficulty I had was in linking customer addresses to specific locations that could be utilised in PowerView.

In Europe (UK & Ireland in particular) we operate an Address Line 1, Address Line 2 … Address Line n structure with no specific fields for the Street, City, State or Zip. (City is more likely to be a town or village in the Irish context!!) While there is a Post Code system in operation in the UK, we do not have one in Ireland yet – so Post Codes are not an option here … although the Government is in the process of setting this up.

In order to use the Map Drill Down capability, there is therefore a requirement to extract the village / town / city from a concatenation of the address lines. This is a very time-consuming activity.

The revisited CONTAINSX post was just what I needed … almost! (It still does 95% plus of the work!)

As I now understand it, in the event of multiple matches, FIRSTNONBLANK will return the lowest alphabetical match. The reality is that the town or city will be located towards the end of the concatenated Address string. What would make CONTAINSX more usable in the above context would be if it could be modified to return the last Keyword identified in the Search String rather than the lowest or highest alphabetically. For example, in the situation where Galway & Tuam are both Keywords, I want an address that contains ” …. Galway Road, Tuam …” to return Tuam rather than Galway because of its location in the string. If however the address was “… Tuam Road, Galway …” then I want Galway to be returned as the result.

This has saved me hours of time already … and now that I have compiled a table of nearly 700 Keywords (Towns/Villages), it will make the task even shorter the next time I need to deploy.

I would really appreciate any direction you can give in relation to going the final 5%!

Many thanks,

Ted Murphy.

January 29, 2014 at 7:25 pm

Well I have a “hacky” fix for you Ted. Change the formula to:

=FIRSTNONBLANK(
    TOPN(
        1,
        VALUES(MatchList[Keyword]),
        SEARCH(MatchList[Keyword], Companies[Company], 1, 0)
    ),
    1
)

And then, in your keywords list, introduce a keyword that is intentionally at the “front” of the list, alphabetically.

For instance I added ” A Not Found” – with leading spaces intended.

When SEARCH finds no match, and returns 0, the formula ends up grabbing ” A Not Found”

When there IS a match, I think it finds the one that is latest in the string.

Still verifying.

January 29, 2014 at 7:43 pm

yeah this seems to work 🙂

you could then add another calc column that says IF = the dummy value, return BLANK(), otherwise return the calc column’s value (the column with the TOPN in it).

that’s if you don’t want the dummy value showing up in slicers etc.

i have not yet found a way to avoid using the dummy value, at least not without making the formula super long by having the SEARCH “sub-formula” appear twice in the formula.
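
For anyone who wants the cleanup column described above spelled out, here is a minimal sketch. It assumes the TOPN calculated column is named Companies[LastMatch] (a hypothetical name, not from the post) and that the dummy keyword is typed exactly as it appears in your keyword list. The single leading space shown here is only a placeholder for however many leading spaces you used.

```dax
// Second calculated column on the Companies table.
// Companies[LastMatch] is an assumed name for the column holding the TOPN formula.
=IF(
    Companies[LastMatch] = " A Not Found",  // dummy keyword, must match your list exactly
    BLANK(),                                // hide the dummy from slicers and reports
    Companies[LastMatch]                    // otherwise pass the real match through
)
```

Use this second column in slicers and pivots and leave the original TOPN column as a hidden helper.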

January 31, 2014 at 4:17 am

Highly applicable stuff, thanks Rob!

However, I believe I am facing some technical issues – I suspect my Excel 2010 is bugged.
The formula refuses to work in “New Measures”, returning the “naked column” error on Companies[Company]. It is however totally operational as a Calculated Column.

I faced this same problem in the prior post, very sad :<
Will perhaps do a re-install of Office or even consider getting 2013 stand-alone.

January 31, 2014 at 4:35 am

-pls disregard said technical issue, and forgive my ignorance in not realising this is very much a Calculated Column technique!

January 31, 2014 at 7:06 am

=
FIRSTNONBLANK (
    FILTER (
        VALUES ( MatchList[Keyword] ),
        SEARCH ( MatchList[Keyword], Companies[Company], 1, 0 )
    ),
    1
)

January 31, 2014 at 7:18 am

I really wish that spaces weren’t trimmed from comments when you post one.

Then you could use the Dax-formatter from https://www.daxformatter.com. Dax formulas really are much more readable having been formatted by this tool. The formatter checks the syntax as well and it’s easy to copy the formatted DAX formula and use and understand it afterwards.

You should really look into this Rob. I think.

January 31, 2014 at 11:36 am

I did a quick Google search just now and couldn’t find a way to force WordPress to preserve the whitespace in comments. If someone else knows a way please let me know 🙂

March 10, 2014 at 4:22 pm

Hi,
Have tried to use this example to do a “fuzzy lookup” with a list of customers in BaseName[Customer] which will appear in Supplier_Report[Customer], and when one is found, to put the value from BaseName[Customer] in the calculated field.

In the adaptation below, all worked well up to the final “BaseName[Customer]” – if I put text in, that works.

=IF(SUMX(BaseName,FIND(UPPER(BaseName[Customer]),UPPER(Supplier_Report[Customer]),,0))>0,BaseName[Customer],"Unknown")

Any suggestions welcome
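
One possible reading of the problem: by the time the formula reaches the final BaseName[Customer], there is no BaseName row in play, so a bare column reference has nothing to point at. A sketch that borrows the FIRSTNONBLANK / FILTER pattern from the post to pick the matching customer follows. It assumes the formula lives in a calculated column on Supplier_Report and that the two tables are not related.

```dax
// Calculated column on Supplier_Report (assumed).
=IF(
    // Same existence test as the original formula, with start position 1 made explicit
    SUMX(
        BaseName,
        FIND(UPPER(BaseName[Customer]), UPPER(Supplier_Report[Customer]), 1, 0)
    ) > 0,
    // Return the first (alphabetically lowest) BaseName customer found in the text
    FIRSTNONBLANK(
        FILTER(
            VALUES(BaseName[Customer]),
            FIND(UPPER(BaseName[Customer]), UPPER(Supplier_Report[Customer]), 1, 0) > 0
        ),
        1
    ),
    "Unknown"
)
```

SEARCH is case-insensitive, so swapping it in for FIND would let you drop the UPPER calls.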

July 22, 2014 at 4:54 pm

I have tried this formula and it appears to be working great! However, I am trying to place a second filter on the “Search” database and am in need of help.
I want to have 5 different columns searching 5 different “Search types” at a time. I could do this by recording the tables separately, but I see no reason a double filter cannot be applied.
Any advice would be greatly appreciated.

May 19, 2015 at 9:12 pm

What a great help. That said, I have a need to search for two-word strings, e.g. “made safe” or “was safe”. Any way to adapt the formula to do this?

April 4, 2016 at 8:06 am

Thanks, this formula solves a regular problem for me of searching for and returning a valid UK postcode in a long text string, just link it up to a Postcode database and BOOM!

April 21, 2016 at 12:50 pm

Thank you for this! It is super helpful!!

I have one question, a sort of complication of this formula. I have a dataset where column 1 is the state and column 2 is the city, if I use this formula as is I have problems where sometimes it returns the wrong value (where 1 city has two possible matches from two different states).

Is there a way to have this formula work by searching for a match only within the matching category so for example, it will only search for a matching city within the matching state?

Please let me know if this makes sense and if you can think of any possible solutions.

Thank you!!
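
A sketch of one way this might be approached, under several assumptions that are not in the question: the keyword table is called MatchList and has State and City columns, the data table is Companies with a State column plus the text to search in Companies[Company], and there is no relationship between the two tables. The idea is to filter the keyword table on both conditions at once.

```dax
// Calculated column on Companies. All table and column names are assumed.
=CALCULATE(
    FIRSTNONBLANK(MatchList[City], 1),
    FILTER(
        MatchList,
        // only consider keyword rows from the same state as the current row
        MatchList[State] = Companies[State]
            // and only those whose city name appears in the text
            && SEARCH(MatchList[City], Companies[Company], 1, 0) > 0
    )
)
```

If several cities from the same state match, this still returns the alphabetically first one, just as in the original formula.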

May 6, 2016 at 9:53 pm

Apparently PowerPivot iterates rows in an alphabetical order.
In cases of multiple-match it returns the match that is alphabetically at the top.

To work around the alphabetical order set in the [Keyword] list, I added another column to table Keyword (Matchlist[RANK]).
I then define the rank numbers (starting from “1”, then “2” and so on..) so when table Keyword is sorted alphabetically on Matchlist[RANK], the top [Keyword] that matches will be returned instead.

=LOOKUPVALUE(Matchlist[Keyword],Matchlist[RANK],
    CALCULATE(FIRSTNONBLANK(Matchlist[RANK],1),
        FILTER(VALUES(Matchlist[Keyword]),
            SEARCH(Matchlist[Keyword],Companies[Company],1,0)<>0
)))

Basically CALCULATE returns the first/top Matchlist[RANK] value, and passes it back to LOOKUPVALUE to return the corresponding Matchlist[Keyword].
This passing back and forth to circumvent the alphabetical order of the Keyword matching feels very awkward to me. Not sure if there is a more optimal approach. Gladly appreciate any advice to correct my understanding.

May 6, 2016 at 10:05 pm

Hmm.. not sure why the “less than” and “more than” signs placed after SEARCH() went missing.
i.e. SEARCH() NOT EQUAL to 0

The formula again, hopefully the “<>” sign is there after “SEARCH()”.
=LOOKUPVALUE(Matchlist[Keyword],Matchlist[RANK],
    CALCULATE(FIRSTNONBLANK(Matchlist[RANK],1),
        FILTER(VALUES(Matchlist[Keyword]),
            SEARCH(Matchlist[Keyword],Companies[Company],1,0)<>0
)))

Can any moderator reinstate the “not equal sign” in my original comment and delete this?

September 26, 2016 at 5:18 pm

This is awesome.. but is there a way to do it in query mode of Power BI?

I’m using the formula to label each of 500k rows with one of the 400 values in the match key column of another table. It works, but it doesn’t half kill the machine, and it is causing memory issues that reduce the performance of the data model.

November 14, 2016 at 12:36 pm

This is very helpful indeed! Now of course my users are requesting a modification to exclude certain terms. For example they want to search for “latex” in descriptions to find how many items we stock that might trigger a latex allergy, but to ignore items that say “latex-free” or “no latex”. I can see how the MatchList table could have one “yes” column and multiple “no” columns for each term, or a parent-child relationship in the same table using PATH, but I’m getting tangled in figuring out how to write such a formula.

November 15, 2016 at 4:18 am

How about adding 1 more calculated column that looks-up to a list filled with keywords to be excluded.
ie another MatchList that operates independent of existing one, but this searches another “Exclusion List”

Here’s my workbook (Excel 2013 and above) you might want to refer to for a working example of the Exclusion List in action.
https://www.dropbox.com/s/ou7g3zq0k0e4wua/DAX%20RANKED%20SEARCHES.xlsx?dl=0

November 15, 2016 at 10:11 am

Thanks for that idea. However, I need the exclusion keywords to apply only to the “permitted” term. Current example: I’m looking up usage on “needles” and “syringes”. I want to exclude “needle free” on “needle” items but do not want to exclude it for “syringes”. I may be wrong but my understanding of your worksheet suggests that the selection of excluded items will apply to all permitted terms.

November 15, 2016 at 9:12 pm

You are right, my exclusion is universally applied. I am also thinking of an additional column on the MatchList table to indicate specific exclusions. Will drop by here again if I get anything to show. Hopefully someone can point out an easier way.
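
A rough sketch of the "additional column" idea, offered as one possibility rather than a tested solution. It assumes a hypothetical MatchList[Exclude] column holding a single exclusion phrase per keyword (left empty when there is none), and uses Companies[Company] as a stand-in for the description column being searched.

```dax
// Calculated column on the table being searched. MatchList[Exclude] is an assumed column.
=CALCULATE(
    FIRSTNONBLANK(MatchList[Keyword], 1),
    FILTER(
        MatchList,
        // the keyword itself must appear in the text
        SEARCH(MatchList[Keyword], Companies[Company], 1, 0) > 0
            && (
                // no exclusion phrase defined for this keyword
                MatchList[Exclude] = ""
                    // or the exclusion phrase is absent from the text
                    || SEARCH(MatchList[Exclude], Companies[Company], 1, 0) = 0
            )
    )
)
```

Because the exclusion sits on the same row as its keyword, "needle free" could block "needle" without affecting "syringe". Supporting several exclusion phrases per keyword would mean repeating the inner condition for each extra column.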

January 5, 2018 at 8:35 am

I hate to dig up old posts, but I just wanted to thank you, Rob, for providing us newbies with such a great and working solution.

Yet in my case it seems not really to work out after all. Having a lookup table (MatchList) with ~9000 rows and a target table (Companies) with ~10’000 rows, where each cell contains at least 300 words, seems to kill the machine (well actually it just never completes the processes – it takes ages!), even though hardware is not used that excessively (just one core on 80-90%, just 2GB of memory and just 2KB/s of I/O on the disk.)

Does anyone have an idea how to solve an issue like that?

