Check if Range exist as substring in a cell (googlesheets) - google-sheets-formula

Most formulas I have found check whether a specific cell value exists in a range.
I am trying to do the opposite: check whether any value from a specific range exists as a substring in a specific cell.
Example, my range A1:A10 is:
Juan Lopez
John Smith
Philip Sue
Philip Stark
Ronaldo Doe
And I want to check if any one of these values in the range exists in my cell C1:
C1 = "Senior Designer: Philip Stark (France)"
The answer should be "Philip Stark".
What formula could I use for this?
Until now I have used:
=SUMPRODUCT(--ISNUMBER(SEARCH($A$1:$A$10,C1)))>0
This returns TRUE/FALSE depending on whether any value in the range exists in the target cell C1. But how can I get the matching value from the range?

This will return TRUE if any non-blank string from A1:A10 is in C1:
=REGEXMATCH(C1, JOIN("|", FILTER(A1:A10, A1:A10 <> "")))
This will return a range of TRUE/FALSE for every string in A1:A10 if it is in C1 or not:
=INDEX(REGEXMATCH(C1, A1:A10))
And this one will return only strings in A1:A10 which are in C1:
=FILTER(A1:A10, REGEXMATCH(C1, A1:A10))
Mind the special regex chars which should be escaped if they are in those A1:A10 strings (there are none in your example, so I did not add this escaping).
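To illustrate the escaping caveat outside of Sheets, here is a minimal Python sketch of the same filtering idea. The function name names_in_cell and the sample data are assumptions made for illustration; re.escape plays the role of the manual escaping mentioned above.

```python
import re

def names_in_cell(names, cell):
    """Return the non-blank names that occur literally inside cell."""
    found = []
    for name in names:
        # re.escape makes characters such as ( ) . + match literally
        if name and re.search(re.escape(name), cell):
            found.append(name)
    return found

names = ["Juan Lopez", "John Smith", "Philip Sue", "Philip Stark", "Ronaldo Doe", ""]
cell = "Senior Designer: Philip Stark (France)"
print(names_in_cell(names, cell))
```

Without the escaping step, a name containing a character like "(" would be treated as regex syntax rather than as plain text.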

Related

Modify query function so it can work as an arrayformula in Google Sheets

How do I modify this formula so I can use it as an array formula instead of dragging it down?
SUBSTITUTE(JOIN(", ", UNIQUE(QUERY(A:D,"SELECT B WHERE C = '"&G2&"'"))), ", , ", "")
Explanation of the formula:
The formula extracts and concatenates the unique values from column B of the range A:D where the value in column C matches a specific criterion. It is made up of several parts:
QUERY extracts all values from column B of the range A:D where the value in column C matches the criterion in G2.
UNIQUE removes any duplicate values from the previous step.
JOIN concatenates the unique values into a single string separated by commas.
SUBSTITUTE replaces occurrences of ", , " with an empty string.
Can you try:
=BYROW(G2:G,LAMBDA(gx,IF(gx="",,TEXTJOIN(", ",1,IFNA(UNIQUE(FILTER(B:B,C:C=gx)))))))
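As a rough analogue of what the BYROW formula computes, the following Python sketch processes each criterion in turn and joins the unique matching values. The function name join_unique_by_key and the sample rows are hypothetical, and the comparison here is case-sensitive, which may differ from how Sheets compares text.

```python
def join_unique_by_key(rows, keys):
    """rows: (b_value, c_value) pairs; keys: criteria, one per output row."""
    out = []
    for key in keys:
        if key == "":
            out.append("")  # blank criterion gives a blank result
            continue
        unique = []
        for b_value, c_value in rows:
            # keep the first occurrence of each non-empty match, in order
            if c_value == key and b_value != "" and b_value not in unique:
                unique.append(b_value)
        out.append(", ".join(unique))
    return out

rows = [("apple", "x"), ("pear", "y"), ("apple", "x"), ("plum", "x"), ("", "x")]
print(join_unique_by_key(rows, ["x", "y", "", "z"]))
```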

How do I select a random word within a cell?

I want to randomly select a word from a cell that is populated by a form field using the "paragraph" answer option.
In one formula:
=index(split(A1," "),randbetween(1,1+len(A1)-len(substitute(A1," ",""))))
This does not remove punctuation.
Anchor the row in each A1 reference (i.e. A$1) and the formula can be copied down to get different choices from the same cell.
Let's say you want to get a random word from a string in cell A1:
Get word count:
A2 =if(A1="","",counta(split(A1," ")))
Get random value between 1 and word count:
A3 =randbetween(1,A2)
Get the random word using the random value:
A4 =index(split(A1, " "),A3)
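The same split-then-pick idea can be sketched in Python. The function name random_word is an assumption for illustration; note that str.split() with no argument splits on any whitespace, which is slightly more lenient than splitting on a single space, and punctuation is left attached to words here as well.

```python
import random

def random_word(text):
    """Return one randomly chosen whitespace-separated word, or "" if none."""
    words = text.split()
    if not words:
        return ""  # mirrors the blank-cell guard in A2
    return random.choice(words)

print(random_word("the quick brown fox"))
```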

mutate() and str_replace() function

I'm trying to remove any strings that contain the word "Seattle" in the column named "County" in the table named Population.
Population %>%
mutate( str_replace(County, "Seattle", ""))
It gives me an error message.
I suspect you are getting an error because in your mutate you aren't defining what column you're mutating...
Also, I think you will have better success with an if_else statement detecting the string pattern of Seattle using grepl and then replacing the contents. Below is the code I've used for something similar.
Population %>%
mutate(County = if_else(grepl("Seattle", County),"",County))
The grepl will detect the string pattern in the County field and provide a TRUE/FALSE return. From there, you just define what to do if it is found to be true, i.e. replace it with nothing (""), or keep the value as is (County).
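The two approaches do different things, which a small Python sketch can make concrete: replacing the substring keeps the rest of the value, while the conditional blanks the whole value. The county values below are made up for illustration.

```python
# Hypothetical sample values, not taken from the original table
counties = ["King (Seattle)", "Pierce", "Seattle Metro"]

# Analogue of str_replace: remove only the first "Seattle" in each value
removed_substring = [c.replace("Seattle", "", 1) for c in counties]

# Analogue of if_else(grepl(...)): blank the whole value when it matches
blanked = ["" if "Seattle" in c else c for c in counties]

print(removed_substring)
print(blanked)
```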

String of words - DP

I have a string of words and I must determine the longest substring such that the last 2 letters of each word are the first 2 letters of the word after it.
For example, for the words:
star, artifact, book, ctenophore, list, reply
Edit: So the longest substring would be star, artifact, ctenophore, reply.
I'm looking for an idea to solve this problem in O(n). No code needed; I appreciate any suggestions on how to solve it.
The closest thing to O(n) I have is this :
You should mark every word with an Id. Let's take your example :
star => the first possible substring. Since you're looking for the longest substring, a substring that starts with ar cannot be the longest, because you can add star in front of it.
Let's set the ID of star to 1; its comparison string is ar.
artifact => its first two characters match the first possible substring. Let's set the ID of artifact to 1 as well, and change the comparison string to ct.
book => its first two characters don't match any comparison string (there's only ct there), so we set the ID of book to 2 and add a new comparison string: ok.
...
list => its first two characters don't match any comparison string (re from ID == 1 and ok from ID == 2), so we create another ID = 3 and another comparison string.
In the end, you just need to go through the IDs and see which one has the most elements. You can probably count it as you go as well.
The main idea of this algorithm is to memorize every substring we're looking for. If we find a match, we just update the right substring with the two new last characters, and if we don't, we add it to the "memory list"
Repeating this procedure makes it O(n*m), with m the number of different IDs.
First, read in all words into a structure. (You don't really need to, but it's easier to work that way. You could also read them in as you go.)
Idea is to have a lookup table (such as a Dictionary in .NET), which will contain key value pairs such that each two last letters of a word will have an entry in this lookup table, and their corresponding value will always be the longest 'substring' found so far.
Time complexity is O(n) - you only go through the list once.
Logic:
maxWord <- ""
while there are words left to read
    word <- read next word
    initial <- get first two letters of word
    end <- get last two letters of word
    if lookup contains key initial //that is the longest string so far... add to it
        newWord <- lookup [initial] value + ", " + word
        if lookup doesn't contain key end //nothing ends with these two letters so far
            lookup add (end, newWord) pair
        else if lookup [end] value length < newWord length //if this will be the longest string ending in these two letters, we replace the previous one
            lookup [end] <- newWord
        if maxWord length < newWord length //put this code here so you don't have to run through the lookup table again and find it when you finish
            maxWord <- newWord
    else //lookup doesn't contain initial, we use only the word, and similar to above, check if it's the longest that ends with these two letters
        if lookup doesn't contain key end
            lookup add (end, word) pair
        else if lookup [end] value length < word length
            lookup [end] <- word
        if maxWord length < word length
            maxWord <- word
The maxWord variable will contain the longest string.
Here is the actual working code in C#, if you want it: http://pastebin.com/7wzdW9Es
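For readers who prefer something runnable without the link, here is a minimal Python sketch of the lookup-table idea. It is not the linked C# code; the function name longest_chain is hypothetical, and it ranks chains by number of words rather than by string length, which is an assumption about what "longest" means. Copying lists on every step keeps the sketch short but is not strictly linear; storing lengths and back-pointers would avoid the copies.

```python
def longest_chain(words):
    """Longest sequence where each word starts with the previous word's last two letters."""
    best = {}     # last two letters -> longest chain found so far ending with them
    longest = []
    for word in words:
        if len(word) < 2:
            continue  # too short to have a two-letter prefix and suffix
        start, end = word[:2], word[-2:]
        # extend the best chain ending in `start`, or begin a new chain
        chain = best.get(start, []) + [word]
        if len(chain) > len(best.get(end, [])):
            best[end] = chain
        if len(chain) > len(longest):
            longest = chain
    return longest

words = ["star", "artifact", "book", "ctenophore", "list", "reply"]
print(longest_chain(words))
```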

pig how to filter distinct couples (pairs)

I am new to Pig. I have a Pig script which generates tab-separated pairs of two elements, one pair per line, for example:
John Paul
Tom Nik
Mark Bill
Tom Nik
Paul John
I need to filter out duplicate combinations. If I use DISTINCT, I filter out the duplicate "Tom Nik" entry. The result is:
John Paul
Tom Nik
Mark Bill
Paul John
The problem with this approach is that I am left with both "John Paul" and "Paul John", which for my purposes should be treated as the same (same combination).
Is there a way to remove permuted combinations?
I'm not sure how string comparisons are implemented in Pig, but it may be worthwhile to try something like:
-- A is your input
B = FOREACH A GENERATE FLATTEN(($0 < $1 ? ($0, $1) : ($1, $0))) ;
C = DISTINCT B ;
By sorting the names so that the 'smaller' one always appears first, both John Paul and Paul John should end up in the same order, so that DISTINCT eliminates one of them.
However, this approach all depends on how the string comparison is implemented. For example if it compares length then the John Paul case will not be filtered correctly.
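The normalize-then-deduplicate idea can be checked outside Pig with a short Python sketch. The function name distinct_pairs is hypothetical; Python compares strings lexicographically by code point, so the ordering caveat above does not arise in this sketch.

```python
def distinct_pairs(pairs):
    """Drop pairs that repeat an earlier pair in either order."""
    seen = set()
    result = []
    for a, b in pairs:
        # put the lexicographically smaller name first so both orders collide
        key = (a, b) if a < b else (b, a)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result

pairs = [("John", "Paul"), ("Tom", "Nik"), ("Mark", "Bill"),
         ("Tom", "Nik"), ("Paul", "John")]
print(distinct_pairs(pairs))
```

As with the Pig version, the names inside each surviving pair come out in sorted order rather than in their original order.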
