Excel - Search an exact match within a string - filter

I'm struggling to find a formula that will solve my problem.
Here's the current situation:
In Sheet 1, column A, I have a set of strings such as:
/search.action?gender=men&brand=10177&tag=10203&tag=10336
/search.action?gender=women&brand=11579&tag=10001&tag=10138
/search.action?gender=men&brand=12815&tag=10203&tag=10299
/search.action?gender=women&brand=1396&tag=10203&tag=10513
/search.action?gender=women&brand=11&tag=10001&tag=10073
/search.action?gender=women&brand=1396&tag=10203&tag=10336
/search.action?gender=women&brand=13
In Sheet 2, column A, I have a set of strings such as:
brand=10177
brand=12815
brand=13
brand=1396
brand=11579
Finally, in Sheet 1, column B will be my "filter", holding the formula I'm struggling to find. The goal of the formula is to detect, for each string in Sheet 1, whether one of the strings in Sheet 2 is present in it (as an exact match!). At the moment it only finds approximate matches. As you can see, row 5 shouldn't return anything, but with my current formula it does.
Here's the formula:
{=IFERROR(INDEX('Sheet 2'!$A$1:$A$5;MATCH(1;COUNTIF(A1;"*"&'Sheet 2'!$A$1:$A$5&"*");0));"")}
Any ideas?
Please note that I don't want to use VBA or macros, only a formula.
Thanks a lot for your help!

The following should solve your problem, I think:
=VLOOKUP(MID(A2,FIND("&",A2)+1,FIND("&",A2,FIND("&",A2)+1)-FIND("&",A2)-1),Sheet2!A:A,1,FALSE)
Basically, I use the FIND function to identify the start and length of the text between the first two "&" signs, and then pass that text to VLOOKUP.
Note that this formula only looks at the text between the first two "&" signs.
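To make that limitation concrete, here is a rough Python sketch of the same extraction step. It is only an illustration of the logic, not the Excel formula itself, and the function name is made up for this example. When a string has fewer than two "&" signs, Excel's FIND returns #VALUE!, which the sketch models by returning None.

```python
def between_first_two_ampersands(url):
    """Return the text between the first two '&' signs, or None if there are fewer than two."""
    first = url.find("&")
    if first == -1:
        return None  # no '&' at all: FIND would fail in Excel
    second = url.find("&", first + 1)
    if second == -1:
        return None  # only one '&': the second FIND would fail in Excel
    return url[first + 1:second]


print(between_first_two_ampersands(
    "/search.action?gender=men&brand=10177&tag=10203&tag=10336"))
print(between_first_two_ampersands("/search.action?gender=women&brand=13"))
```

So a row like the last sample row, where brand is the final parameter, would not be matched by this approach.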

For completeness, here is another solution based on this answer
=INDEX(Sheet2!$A$1:$A$5,MAX(IF(ISERROR(FIND(Sheet2!$A$1:$A$5,A1)),-1,1)*(ROW(Sheet2!$A$1:$A$5)-ROW(Sheet2!$A$1)+1)))
This is a bit more general: it doesn't matter how many search tags there are.
However, as it stands it would match brand=13 in the second sheet with brand=1396 in the first sheet. To avoid that, you could append an ampersand to both the search strings and the string being searched:
=INDEX(Sheet2!$A$1:$A$5,MAX(IF(ISERROR(FIND(Sheet2!$A$1:$A$5&"&",A1&"&")),-1,1)*(ROW(Sheet2!$A$1:$A$5)-ROW(Sheet2!$A$1)+1)))
This formula returns a #VALUE! error if there is no match. To avoid this, wrap it in IFERROR:
=IFERROR(INDEX(Sheet2!$A$1:$A$5,MAX(IF(ISERROR(FIND(Sheet2!$A$1:$A$5&"&",A1&"&")),-1,1)*(ROW(Sheet2!$A$1:$A$5)-ROW(Sheet2!$A$1)+1))),"")
All these are array formulae.
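
For readers who want to check the exact-match logic outside Excel, here is a minimal Python sketch of the same idea. It is an assumption-laden illustration, not a translation of the formula: the function name is hypothetical, and it returns the first matching entry from the Sheet 2 list, whereas the array formula above picks the match with the highest row number.

```python
def find_exact_brand(url, brands):
    """Return the first entry of `brands` that is a whole query parameter of `url`, else ''."""
    # Keep only the part after '?', then split it into whole "key=value" tokens.
    _, _, query = url.partition("?")
    tokens = query.split("&")
    for brand in brands:
        # Comparing against whole tokens is what prevents brand=13 matching brand=1396.
        if brand in tokens:
            return brand
    return ""


brands = ["brand=10177", "brand=12815", "brand=13", "brand=1396", "brand=11579"]
for url in [
    "/search.action?gender=women&brand=1396&tag=10203&tag=10513",
    "/search.action?gender=women&brand=11&tag=10001&tag=10073",
    "/search.action?gender=women&brand=13",
]:
    print(repr(find_exact_brand(url, brands)))
```

Splitting on "&" plays the same role as appending "&" to both strings in the formula: a parameter only matches when it is delimited on both sides.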

Related

Google Sheets - Split Formula

I have a sheet where we paste values copied from a pdf into a column, such as:
2715411.0 28.10.2021 600.00
In Google Sheets there are columns with formulas that split these values, one of which is:
=ArrayFormula(INDEX(SPLIT(REGEXREPLACE(C2:C274, "\s", "♥"),"♥"),ROW(C2)-ROW(C2),1))
This formula returns "2715411" instead of "2715411.0". I've tested it with the value "2715411.1" and that works, so I assume the number is being "rounded".
Another thing to take into consideration is that sometimes the value we paste is something like "32434346 28.10.2021 600.00", so always forcing decimal places can't be the answer.
Can anyone help?
Thank you in advance.
=ArrayFormula(SUBSTITUTE(SPLIT(SUBSTITUTE(C2:C274,".","♦")," "),"♦","."))
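The formula above appears to work by hiding the dots before splitting, so that the pieces no longer look like numbers and stay as text, and then restoring the dots afterwards. A small Python sketch of those three steps follows; the function name and placeholder constant are invented for the illustration, and Python itself would not need the masking, it just mirrors the order of operations.

```python
PLACEHOLDER = "♦"  # any character that never occurs in the pasted data


def split_keep_text(line):
    """Mask dots, split on spaces, then restore the dots in each piece."""
    masked = line.replace(".", PLACEHOLDER)   # inner SUBSTITUTE
    parts = masked.split(" ")                 # SPLIT on the space
    return [p.replace(PLACEHOLDER, ".") for p in parts]  # outer SUBSTITUTE


print(split_keep_text("2715411.0 28.10.2021 600.00"))
print(split_keep_text("32434346 28.10.2021 600.00"))
```

Because every piece comes back as text, a value without decimals such as "32434346" is left exactly as pasted.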

Finding a number in text always gives an error

I have a table column [Items] with the text data type, whose values always end in a number:
Item 1
Item 3
Using a find formula in a calculated column works if I use text:
=FIND(" ",[Items])
But doesn't work if I use a number:
=FIND("1",[Items])
I have tried using FORMAT(1,"string"), and tried looking for the number with and without quotes, etc. I also tried looking for " 1" with the space, but nothing works if I include the number 1 in my formula.
Why not!? Excel doesn't behave this way, which makes it even more frustrating.
I think you need to pass an argument for what it should return when it doesn't find anything:
=FIND("1",[items],1,-1)

Extract substring using importxml and substring-after

Using Google Sheets' IMPORTXML, I was able to extract the following data from a URL (in cell A2) using:
=IMPORTXML(A2,"//a/@href[substring-after(., 'AGX:')]")
Data:
/vector/AGX:5WH
/vector/AGX:Z74
/vector/AGX:C52
/vector/AGX:A27
/vector/AGX:C6L
But I want to extract the code after "/vector/AGX:". The code is not fixed at 3 characters, and the number of rows is not fixed either.
I used =INDEX(SPLIT(AP2,"/,'vector',':'"),1,2), but it applies to only one line of data. I had to copy the INDEX+SPLIT formula down the whole column and insert an additional column to store the codes:
5WH
Z74
C52
A27
C6L
But, I want to be able to extract the code(s) after AGX: using ImportXML in one go. Is there a way?
Solution
The issue is in how you are using the INDEX formula. Its row parameter selects the row (in your case, each element) and its column parameter selects the column (in your case, either AGX or the code after it).
If, instead of a single cell, we apply this formula to a range and leave the row parameter empty, the formula returns all the values, which is what you were aiming for. Here is the implementation (where F1:F5 is the range of values you want the formula applied to):
=INDEX(SPLIT(F1:F5,"/,'vector',':'"),,2)
If you are interested in a solution using only IMPORTXML and XPath, according to the documentation you could use substring-after as follows:
=IMPORTXML(A1,"//a/@href[substring-after(.,'AGX:')]")
The drawback is that this returns the full string rather than only what comes after AGX:, because the expression in square brackets only filters which href values are selected. You would therefore still need a Google Sheets formula to split the result. This is the furthest I have got using XPath alone. In XML it would be easier to apply a forEach and select just what comes after the :, but I believe in Sheets it is more complicated, if not impossible, using XPath alone.
I hope this has helped you. Let me know if you need anything else or if you did not understand something. :)
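
As a rough illustration of what "take what comes after AGX:" means for each returned href, here is a small Python sketch. It is not an IMPORTXML or XPath solution; the function name is hypothetical and the sample values are the ones from the question.

```python
def code_after(href, marker="AGX:"):
    """Return the text following the first occurrence of `marker`, or '' if it is absent."""
    _, found, after = href.partition(marker)
    return after if found else ""


hrefs = ["/vector/AGX:5WH", "/vector/AGX:Z74", "/vector/AGX:C52",
         "/vector/AGX:A27", "/vector/AGX:C6L"]
print([code_after(h) for h in hrefs])
```

Unlike a fixed-width cut, this does not depend on the code being three characters long.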

Google Sheets IMPORTXML query

I'm using Google Sheets as a web scraper.
I have been using this IMPORTXML formula:
=importxml(A1, "//div[@class='review-content']//text()")
and these are the results:
Row1: {"publishedDate":"2019-01-05T22:19:28Z","updatedDate":"null","reportedDate":"null}
Row2: {"publishedDate":"2018-12-10T22:19:28Z","updatedDate":"null","reportedDate":"null}
Row3: {"publishedDate":"2018-12-09T22:19:28Z","updatedDate":"null","reportedDate":"null}
but I am having trouble figuring out how to get only the "publishedDate" value.
Example:
Row1: 2019-01-05T22:19:28Z
Row2: 2018-12-10T22:19:28Z
Row3: 2018-12-09T22:19:28Z
Any ideas as to what I may be missing?
How about these 3 samples? I derived them from the sample values in your question. There are several possible answers for your situation, so please treat these as just three of them.
They assume that the URL is in cell "A1".
Sample 1:
=ARRAYFORMULA(MID(IMPORTXML(A1, "//div[@class='review-content']//text()"),19,20))
Use this when the length of each value is constant.
The value is retrieved by MID().
Sample 2:
=ARRAYFORMULA(INDEX(SPLIT(IMPORTXML(A1, "//div[@class='review-content']//text()"),"""",TRUE,TRUE),,4))
Use this when the position of each value is constant.
The value is retrieved by SPLIT() and INDEX().
Sample 3:
=ARRAYFORMULA(REGEXEXTRACT(IMPORTXML(A1, "//div[@class='review-content']//text()"),"publishedDate"":""(\w.+?)"""))
Use this when the pattern around each value is constant.
The value is retrieved by REGEXEXTRACT().
References:
MID
SPLIT
INDEX
REGEXEXTRACT
If these are not the results you want, I apologize. In that case, in order to replicate your situation correctly, can you provide the URL you are using, as @Rubén says?
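
To compare the three approaches on the sample rows without a spreadsheet, here is a hedged Python sketch. The function names are invented for this example and the code only mimics the logic of MID, SPLIT+INDEX and REGEXEXTRACT on one row of text. Note that the sample rows are not valid JSON (the final quote is missing), which is why the sketch treats them as plain strings.

```python
import re

ROW = '{"publishedDate":"2019-01-05T22:19:28Z","updatedDate":"null","reportedDate":"null}'


def by_position(row):
    # Like MID(row, 19, 20): 20 characters starting at the 19th (index 18 in Python).
    return row[18:38]


def by_split(row):
    # Like SPLIT on the double quote with empty pieces removed, then the 4th column.
    pieces = [p for p in row.split('"') if p]
    return pieces[3]


def by_regex(row):
    # Same pattern as the REGEXEXTRACT sample: capture up to the next double quote.
    match = re.search(r'publishedDate":"(\w.+?)"', row)
    return match.group(1) if match else None


print(by_position(ROW), by_split(ROW), by_regex(ROW))
```

The regex variant is the most tolerant of the three, since it does not depend on publishedDate being the first field or on the timestamp having a fixed length.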

Picking random cells based on previous random cell selection in Excel

This formula returns a random traveldestination1 value when it finds a match for C1 in moderange1. Otherwise it returns #N/A:
=IF(MATCH(C1,moderange1,0),INDEX(traveldesination1,RANDBETWEEN(1,COUNTA(traveldesination1))),"nope")
How can I improve the formula so that it searches another (non-adjacent) range, such as moderange2 or moderange3, when no match for C1 is found in moderange1 and it returns #N/A? The current formula never actually gets to the point of displaying "nope", so any code I add there doesn't get used.
If it doesn't find a match in moderange1, I want it to search moderange2, and if it finds a match there, it should pick a random value from traveldestination2, and so on.
I've managed to figure it out! Nested IFNA functions did the trick:
=IFNA(IFNA(IFNA(code as above, next range's code), next range's code), "not found")
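
The nested IFNA chain amounts to "try each range in order and fall through to the next one when there is no match". A minimal Python sketch of that fallback logic is below. The function name, the list-of-pairs layout and the sample data are all assumptions made for the illustration; they are not part of the workbook.

```python
import random


def pick_destination(mode, tables, rng=random):
    """Try each (modes, destinations) pair in order; pick randomly from the first that matches."""
    for modes, destinations in tables:
        if mode in modes:                    # MATCH(C1, moderangeN, 0) succeeded
            return rng.choice(destinations)  # INDEX(..., RANDBETWEEN(1, COUNTA(...)))
    return "not found"                       # the outermost IFNA fallback


# Hypothetical stand-ins for moderange1/traveldestination1 and moderange2/traveldestination2.
tables = [
    (["car", "bus"], ["Paris", "Rome"]),
    (["plane"], ["Tokyo"]),
]
print(pick_destination("plane", tables))
print(pick_destination("boat", tables))
```

Adding another range is then just another pair in the list, which corresponds to one more level of IFNA nesting in the formula.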
