Underscore as fulltext search word boundaries in MariaDB 10 - full-text-search

Let's assume that a table node has a varchar column description with fulltext index in MariaDb 10.
The query
select description from node where match(description) against('night');
will match description values like
Night-and-day
What-a-wonderful-night
What a wonderful night
but will not match
What_a_wonderful_night
Now my question: it seems that space ( ) and hyphen (-) are considered to be word boundaries, but underscore (_) not. Is there a way by configuration or inside the query to make underscore be a word boundary as well?

Change the macro true_word_char() in ha_innobase.cc:
-#define true_word_char(c, ch) ((c) & (_MY_U | _MY_L | _MY_NMR) || (ch) == '_')
+#define true_word_char(c, ch) ((c) & (_MY_U | _MY_L | _MY_NMR))
and rebuild MariaDB.

Related

Buffer gets get reduced when escaping dot with back slash

I have the below query
SELECT
categorymap.id,
categorytype.name,
categorytype.value
FROM
categorymap,
categorytype
WHERE
( categorymap.logfilename = '**hello\.log**' )
AND ( categorymap.categorytypeid = categorytype.id )
Index is available for column logfilename of categorymap table.
I noticed the buffer gets was more when not adding "\" before "." in where clause. Both cases, before and after adding "\", index range scan was used on logfilename column as per explain plan.
Could someone please explain what role does '.' play in here in increasing buffer gets?
TIA
If you are talking about this:
= '**hello\.log**'
(maybe it is just = 'hello\.log'; double asterisks for bold?), then: you didn't escape anything. This query will search the logfilename column for exactly such a string: hello followed by a backslash \ followed by a dot . followed by log.
You'd escape a dot in e.g. regular expression, but there's none here, so ...

How below REGEXP_REPLACE works?

I have query in my project and that is having REGEXP_REPLACE
i tried to find how it works by searching but i found it like
w+ Matches a word character (that is, an alphanumeric or underscore
(_) character).
but not able to find '"\w+\":' why these "" are used and what is mean by '{|}|"',''
UPDATE (SELECT data,data_value FROM TEMP) t
SET t.DATA_VALUE=REGEXP_REPLACE(REGEXP_REPLACE(t.data, '"\w+\":',''),'{|}|"','');
can you please tell me how it works?
This appear to be a regular expression for stripping keys and enclosing brackets from a JSON string - unfortunately, if this is the case then it does not work in all situations.
The regular expression
'"\w+\":'
will match:
A " double quotation mark;
\w+ one-or-more word (a-z or A-Z or 0-9 or _) characters;
\" another double quotation mark - note: the \ character is not necessary; then
A : colon.
So:
REGEXP_REPLACE(
'{"key":"value","key2":"value with \"quote"}',
'"\w+":', -- Pattern matched
'' -- Replacement string
)
Will output:
{"value","value with \"quote"}
The second pattern {|}|" will match either a {, or a } or a " character (and could have been equivalently written as [{}"]) so:
REGEXP_REPLACE(
'{"value","value with \"quote"}',
'{|}|"', -- Pattern matched
'' -- Replacement string
)
Will output:
value,value with \quote
Which is fine, until (like my example) you have an escaped double quote (or curly braces) in the value string; in which case those will also get stripped leaving the escape character.
(Note: you would not typically find this but it is possible to include escaped quotes in the key. So {"keywith\":quote":"value"} would get replaced to {quote":"value"} and then quote:value which is not the intended output.)
If parsing JSON is what you are trying to do (pre-Oracle 12) then you can use:
REGEXP_REPLACE(
'{"key":"value","key2":"value with \"quote","keywith\":quote":"value with \"{}"}',
'^{|"(\\"|[^"])+":(")?((\\"|[^"])+?)\2((,)|})',
'\3\6'
)
Which outputs:
value,value with \"quote,value with \"{}
Or in Oracle 12 you can do:
SELECT *
FROM JSON_TABLE(
'{"key":"value","key2":"value with \"quote","keywith\":quote":"value with \"{}"}',
'$.*' NULL ON ERROR
COLUMNS (
value VARCHAR2(4000) PATH '$'
)
)
Which outputs:
VALUE
-----------------
value
value with "quote
value with "{}
example:::REGEXP_REPLACE( string, pattern [, replacement_string [, start_position [, nth_appearance [, match_parameter ] ] ] ] )
| is or(CAN MEAN MORE THAN ONE ALTERNATIVE ) , is for at least as in {n,} at least n times
https://www.techonthenet.com/oracle/functions/regexp_replace.php
"where I got my info"
'"\w+\":' why these "" are used and what is mean by '{|}|"',''
Matches a word character(\w)One or more times(+) this has to be messed up it's missing the right quantity of close parentheses by putting \" w+ \"
they allow the " to be shown. This expression takes one expression changes it then uses that as the basis for the next change. Good luck figuring the rest out. Regular expressions aren't too bad, pretty intuitive once you get the basics down.

Format string in Oracle

I'm building a string in oracle, where I get a number from a column and make it a 12 digit number with the LPad function, so the length of it is 12 now.
Example: LPad(nProjectNr,12,'0') and I get 000123856812 (for example).
Now I want to split this string in parts of 3 digit with a "\" as prefix, so that the result will look like this \000\123\856\812.
How can I archive this in a select statement, what function can accomplish this?
Assuming strings of 12 digits, regexp_replace could be a way:
select regexp_replace('000123856812', '(.{3})', '\\\1') from dual
The regexp matches sequences of 3 characters and adds a \ as a prefix
It is much easier to do this using TO_CHAR(number) with the proper format model. Suppose we use \ as the thousands separator.... (alas we can't start a format model with a thousands separator - not allowed in TO_CHAR - so we still need to concatenate a \ to the left):
See also edit below
select 123856812 as n,
'\' || to_char(123856812, 'FM000G000G000G000', 'nls_numeric_characters=.\') as str
from dual
;
N STR
--------- ----------------
123856812 \000\123\856\812
Without the FM format model modifier, TO_CHAR will add a leading space (placeholder for the sign, plus or minus). FM means "shortest possible string representation consistent with the model provided" - that is, in this case, no leading space.
Edit - it just crossed my mind that we can exploit TO_CHAR() even further and not need to concatenate the first \. The thousands separator, G, may not be the first character of the string, but the currency symbol, placeholder L, can!
select 123856812 as n,
to_char(123856812, 'FML000G000G000G000',
'nls_numeric_characters=.\, nls_currency=\') as str
from dual
;
SUBSTR returns a substring of a string passed as the first argument. You can specify where the substring starts and how many characters it should be.
Try
SELECT '\'||SUBSTR('000123856812', 1,3)||'\'||SUBSTR('000123856812', 4,3)||'\'||SUBSTR('000123856812', 7,3)||'\'||SUBSTR('000123856812', 10,3) FROM dual;

oracle regexp_replace delete last occurrence of special character

I have a pl sql string as follows :
String := 'ctx_ddl.add_stopword(''"SHARK_IDX19_SPL"'',''can'');
create index "SCOTT"."SHARK_IDX2"
on "SCOTT"."SHARK2"
("DOC")
indextype is ctxsys.context
parameters(''
datastore "SHARK_IDX2_DST"
filter "SHARK_IDX2_FIL"
section group "SHARK_IDX2_SGP"
lexer "SHARK_IDX2_LEX"
wordlist "SHARK_IDX2_WDL"
stoplist "SHARK_IDX2_SPL"
storage "SHARK_IDX2_STO"
sync (every "SYSDATE+(1/1)" memory 67108864)
'')
/
';
I have to get search the final occurrence of '/' and add ';' to it. Also I need to escape the quotes preset in parameters ('') to have extra quotes. I need output like
String := 'ctx_ddl.add_stopword(''"SHARK_IDX19_SPL"'',''can'');
create index "SCOTT"."SHARK_IDX2"
on "SCOTT"."SHARK2"
("DOC")
indextype is ctxsys.context
parameters(''''
datastore "SHARK_IDX2_DST"
filter "SHARK_IDX2_FIL"
section group "SHARK_IDX2_SGP"
lexer "SHARK_IDX2_LEX"
wordlist "SHARK_IDX2_WDL"
stoplist "SHARK_IDX2_SPL"
storage "SHARK_IDX2_STO"
sync (every "SYSDATE+(1/1)" memory 67108864)
'''')
/;
';
Any help.
There's an age-old saying: "Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems"
Unless you are confronted by a problem that truly requires regular expressions, I'd recommend working with basic string manipulation functions.
Semi-colon:
Use INSTR to find last occurence of '/', call this P1.
Result = Substr from position 1 through P1||';'||substr from P1+1 through to end-of-string
Parameters substitution:
Use INSTR to find where parameter list starts (i.e. find "parameters(" in your string) and ends (presumably the last closing parenthesis ")" in your string). Call these P2 and P3.
Result = substr from 1 through P2 || REPLACE(substr from P2+1 through P3-1,'''','''''''') || substr from P3 to end-of-string

How do I find/replace rows in oracle having text containing % and other special characters

Gurus, I have huge data with 540K rows and one of the field is Customer Address. Our system doesn't accept special characters '% | ^ \ /' in the address field.
Here is my query that works. How do I make it work for all the special characters in single update query?
select regexp_replace(address,'%',' ') from temp where address like '%\%%' escape '\';
update temp set address=regexp_replace(address,'%',' ') where address like '%\%%' escape '\';
update temp
set address = regexp_replace(address,'[%|^\/]',' ')
where regexp_like(address, '[%|^\/]')

Resources