Fuzzy search in Supabase

Is there a way to do fuzzy matching on a Supabase table?
I am looking to match words like "cherly" and "cher!y".
I tried fts, plfts, phfts and wfts, but none of them does partial matching.
Is there a way to do fuzzy matching in Supabase?

Check out these extensions:
fuzzystrmatch (https://www.postgresql.org/docs/9.1/fuzzystrmatch.html)
pg_trgm (https://www.postgresql.org/docs/current/pgtrgm.html)
In your case you might want the levenshtein function (from fuzzystrmatch), which would detect that the two words differ by only 1 character:
SELECT levenshtein('cherly', 'cher!y');
result
| levenshtein |
| ----------- |
| 1 |
You can mix and match these functions, wrap them up in a Postgres function, then call that function as an RPC (https://supabase.com/docs/reference/javascript/rpc)
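To make the result above concrete, here is a minimal Python sketch of the edit distance that a Levenshtein function computes. It is an illustration of the idea only, not the fuzzystrmatch implementation, and the function name is chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("cherly", "cher!y"))  # 1
```

The two words differ only in one substituted character ("l" versus "!"), which is why the distance is 1.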

Related

Add support for XPATH replace in JCR (Jackrabbit Oak)

I'm trying to determine if there's a way to create a custom Predicate to handle searches for text that contains accented characters.
The problem I am trying to solve is that I have the string "Montréal" stored in the JCR, and want it to show up if my query contains a search for "Montreal" or even "Montre".
I am trying to use the XPATH function fn:replace to do something like this:
replace('Montréal', '[éè]+', 'e')
Here's an example xpath query (run using the query tool in the CRX/DE):
/jcr:root/content/dam/mysite/en//*
[
(@jcr:primaryType = 'dam:AssetContent' and jcr:like(fn:replace(fn:lower-case(data/master/@city), '[éè]+', 'e'),'%montre%'))
]
However, when I attempt to use it, I get the error:
expected: jcr:like | jcr:contains | jcr:score | xs:dateTime | fn:lower-case | fn:upper-case | fn:name | rep:similar | rep:spellcheck | rep:suggest
Is there some way to enable the replace function?
I faced a similar issue.
I will explain what I did to overcome it.
The requirement: there is a search bar, and users were entering accented characters in it.
The problem: the same as yours. jcr:like and fn:replace didn't work.
What I did was send the search parameter as-is to the back end (Java) through a servlet, since I was building the queries in a service there.
Then I just Base64-encoded it and used the encoded value in the query, as AEM keeps non-English characters as Base64-encoded values.
Then I decoded the results in the front end (but you can do that in Java as well).

egrep between 2 ranges in same column csv

I'm not sure how to match two sets of values in the same column. Say I have a CSV file with all Titanic passengers, and I want to extract the people between 20 and 29 years old and those between 40 and 49, and also the people who spoke English AND another language, say French. Since both values are in the same column, this is quite challenging.
egrep does not seem to have an AND operator, only an OR, so I'm struggling to work out how to do it.
What I was trying was something like this (on a comma-separated CSV).
The 3rd column is age and the 8th is language.
(I know there may be easier solutions with sed/awk etc., but I need it in egrep for training purposes.)
egrep "^.*,.*,[2-0][0-9],.*,.*,[eng.*]" titanic-passengers.csv
thanks in advance.
You should use [^,]* to match a single column. .* will match across multiple columns.
To match 20-29 use 2[0-9]; to match 40-49 use 4[0-9]. You can then combine them with [24][0-9].
You don't need to put [] around the language, that's for matching a single character that's any of the characters in the brackets.
grep -E '^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,eng' titanic-passengers.csv
maybe this one?
grep -E '^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,[^,]*( english|english )[^,]*' titanic-passengers.csv
@Barmar explained the other patterns well, so I'll explain the "language" part.
To be sure to match at least one language besides English, you need to require a space before or after the word english. The OR operator is expressed by (pattern1|pattern2)
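A quick way to check both patterns is to run them over a few rows. The sketch below uses Python's re module, which treats these particular constructs the same way grep -E does. The rows are made up for illustration (8 columns, age in the 3rd and language in the 8th) and are not taken from the real file.

```python
import re

# Pattern from the first answer: age 20-29 or 40-49, language starts with "eng".
by_age = re.compile(r"^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,eng")

# Pattern from the second answer: same age filter, but "english" must have
# a space before or after it, i.e. another language is listed as well.
english_plus = re.compile(
    r"^[^,]*,[^,]*,[24][0-9],[^,]*,[^,]*,[^,]*,[^,]*,[^,]*( english|english )[^,]*"
)

# Hypothetical sample rows, invented for this example.
rows = [
    "1,Smith,24,M,3,7.25,S,english french",
    "2,Jones,35,F,1,71.28,C,english french",
    "3,Brown,41,M,2,13.00,S,english",
    "4,Dupont,29,F,2,26.00,C,french english",
]

age_hits = [r.split(",")[0] for r in rows if by_age.match(r)]
multi_hits = [r.split(",")[0] for r in rows if english_plus.match(r)]

print(age_hits)    # ['1', '3']
print(multi_hits)  # ['1', '4']
```

Row 2 is rejected by the age column in both cases. Row 3 speaks only English, so only the first pattern accepts it, and row 4 lists English second, so only the second pattern accepts it.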

Regular expression to find pattern with insertion and deletion cross different lines?

I have text file with following lines as example:
AZZKLMNANAKK
AZZLNAKK
AZLPMNNAK
I would like to write a regular expression for a pattern such as AZLN that lets me search for a specific pattern shared between different lines, while allowing a certain number of insertions and deletions.
Can someone help me with that?
If your requirement is just a simple search, then use grep as follows.
grep "AZLN" Input_file
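Note that a plain search for AZLN finds none of the three sample lines, because each of them has extra characters inside the pattern. One possible sketch, assuming only insertions need to be tolerated, is to allow a bounded gap between consecutive characters of the pattern. The helper name gapped_pattern and the max_gap parameter are invented for this example, and deletions are not handled here.

```python
import re


def gapped_pattern(core: str, max_gap: int) -> re.Pattern:
    """Build a regex that matches the characters of core in order,
    allowing up to max_gap arbitrary characters between neighbours."""
    gap = ".{0,%d}" % max_gap
    return re.compile(gap.join(re.escape(ch) for ch in core))


lines = ["AZZKLMNANAKK", "AZZLNAKK", "AZLPMNNAK"]

exact = gapped_pattern("AZLN", 0)   # equivalent to searching for AZLN
fuzzy = gapped_pattern("AZLN", 2)   # A.{0,2}Z.{0,2}L.{0,2}N

exact_hits = [line for line in lines if exact.search(line)]
fuzzy_hits = [line for line in lines if fuzzy.search(line)]

print(exact_hits)  # []
print(fuzzy_hits)  # ['AZZKLMNANAKK', 'AZZLNAKK', 'AZLPMNNAK']
```

The generated expression can also be passed to grep -E directly. Tolerating deletions as well would need a true edit-distance matcher rather than a fixed regular expression.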

Hadoop/Pig regular expression matching

This is kind of an odd situation, but I'm looking for a way to filter using something like MATCHES but on a list of unknown patterns (of unknown length).
That is, if the given input is two files, one with numbers A:
xxxx
yyyy
zzzz
zzyy
...etc...
And the other with patterns B:
xx.*
yyy.*
...etc...
How can I filter the first input, by all of the patterns in the second?
If I knew all the patterns beforehand, I could
A = FILTER A BY (num MATCHES 'somepattern.*' OR num MATCHES 'someotherpattern'....);
The problem is that I don't know them beforehand, and since they're patterns and not simple strings, I cannot just use joins/groups (at least as far as I can tell).
Maybe a strange nested FOREACH...thing?
Any ideas at all?
If you use |, which acts as an OR, you can build a single pattern out of the individual patterns.
(xx.*|yyy.*|zzzz.*)
This will do a check to see if it matches any of the patterns.
Edit:
To create the combined regex pattern:
* Create a string starting with (
* Read in each line (assuming each line is a pattern) and append it to a string followed by a |
* When done reading lines, remove the last character (which will be an unneeded |)
* Append a )
This will create a regex pattern to check all the patterns in the input file. (Note: It's assumed the file contains valid patterns)
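The steps above can be sketched outside Pig, for example in a small Python script that prepares the combined pattern before the Pig job runs. The function name and the in-memory lists are assumptions made for illustration. Joining with | replaces the "append, then remove the last character" steps, and fullmatch is used because MATCHES has to match the whole value.

```python
import re


def combine_patterns(pattern_lines):
    """Join one-pattern-per-line input into a single (p1|p2|...) regex string."""
    patterns = [line.strip() for line in pattern_lines if line.strip()]
    return "(" + "|".join(patterns) + ")"


# Stand-ins for the two input files from the question.
patterns_b = ["xx.*", "yyy.*"]
numbers_a = ["xxxx", "yyyy", "zzzz", "zzyy"]

combined = combine_patterns(patterns_b)
matcher = re.compile(combined)

# Keep only the values that match at least one pattern in full.
kept = [num for num in numbers_a if matcher.fullmatch(num)]

print(combined)  # (xx.*|yyy.*)
print(kept)      # ['xxxx', 'yyyy']
```

The resulting string could then be supplied to the Pig script as the single pattern used in the FILTER ... MATCHES expression.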

How can I write a regex to repeatedly capture group within a larger match?

I'm getting a regex headache, so hopefully someone can help me here. I'm doing some file syntax conversion and I've got this situation in the files:
OpenMarker
keyword some expression
keyword some expression
keyword some expression
keyword some expression
keyword some expression
CloseMarker
I want to match all instances of "keyword" inside the markers. The marker areas are repeated and the keyword can appear in other places, but I don't want to match outside of the markers. What I don't seem to be able to work out is how to get a regex to pull out all the matches. I can get one to do the first or the last, but not to get all of them. I believe it should be possible and it's something to do with repeated capture groups -- can someone show me the light?
I'm using grepWin, which seems to support all the bells and whistles.
You could use:
(?<=OpenMarker((?!CloseMarker).)*)keyword(?=.*CloseMarker)
this will match the keyword inside OpenMarker and CloseMarker (using the option "dot matches newline").
sed -n -e '/OpenMarker/,/CloseMarker/p' /path/to/file | grep keyword should work. Not sure if grep alone could do this.
There are only a few regex engines that support separate captures of a repeated group (.NET for example). So your best bet is to do this in two steps:
First match the section you're interested in: OpenMarker(.*?)CloseMarker (using the option "dot matches newline").
Then apply another regex to the match repeatedly: keyword (.*) (this time without the option "dot matches newline").
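The two-step approach can be sketched in Python as follows. The sample text is made up for illustration; re.DOTALL plays the role of the "dot matches newline" option in the first step and is left off in the second.

```python
import re

# Made-up input: keywords appear both inside and outside the marker blocks.
text = """keyword outside one
OpenMarker
keyword first expression
keyword second expression
CloseMarker
keyword outside two
OpenMarker
keyword third expression
CloseMarker
"""

# Step 1: grab each marked section (dot matches newline, non-greedy).
section_re = re.compile(r"OpenMarker(.*?)CloseMarker", re.DOTALL)
# Step 2: within a section, capture the rest of each keyword line.
keyword_re = re.compile(r"keyword (.*)")

found = []
for section in section_re.finditer(text):
    found.extend(keyword_re.findall(section.group(1)))

print(found)
# ['first expression', 'second expression', 'third expression']
```

The keywords outside the markers are never seen by the second regex, because it only runs on the text captured in step 1.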

Resources