I needed to use Oracle 11g's Contains() function to search a field for the exact text typed by the user. I was asked not to use the 'like' operator.
According to the Oracle documentation, for everything to work you need to:
Double } characters
Put the whole input between {}
This works in most cases, but not all. Below is a test case:
create table theme
(name varchar2(300 char) not null);
insert into theme (name)
values ('a');
insert into theme (name)
values ('b');
insert into theme (name)
values ('a or b');
insert into theme (name)
values ('Pdz344_1_b');
create index name_index on theme(name) indextype is ctxsys.context;
If the 'or' operator were interpreted, I would get all four rows, which fortunately is not the case. Now if I run the following, I would expect to find only 'a or b'.
select * from theme
where contains(name, '{a or b}')>0;
However, I also get 'Pdz344_1_b'. But it contains no 'a', 'o' or 'r', and I find it very surprising that this text is matched. Is there something I don't get about contains()'s syntax?
CONTAINS is not like the LIKE operator at all, since it uses the Oracle Text search engine (something like a Google search), not plain string matching.
{} is an escape marker. It means everything you put inside should be treated as text to escape.
Therefore your query looks for text matching the phrase a or b, not for a or for b separately.
So your query matches Pdz344_1_b because it contains the token b.
The row containing only a is not matched because a is in the default stoplist.
Why isn't the row with just b matched? Because your match sequence actually looks like a\ or\ b.
So we have three tokens: a, _or and _b (underscores represent spaces). a is in the stoplist, and there is no string _b in the b row, because it holds only a single character. But this combination does occur in the Pdz344_1_b row, because non-alphabetic characters are treated as whitespace. If you remove the {} or query for {b or a}, you'll get matches against b as well.
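The two escaping rules quoted in the question (double every } and put the whole input between {}) can be sketched in application code. This is a minimal Python sketch; the function name escape_contains_phrase is hypothetical. As the answer explains, the escaped text is still tokenized by Oracle Text, so this only neutralizes operators and does not turn CONTAINS into an exact string match.

```python
def escape_contains_phrase(user_input: str) -> str:
    """Wrap user input in {} so Oracle Text treats operators as plain text.

    Follows the two rules quoted in the question: double every closing
    brace, then put the whole input between braces.
    """
    return "{" + user_input.replace("}", "}}") + "}"


print(escape_contains_phrase("a or b"))  # {a or b}
print(escape_contains_phrase("x}y"))     # {x}}y}
```

The result would normally be passed to CONTAINS as a bind variable rather than concatenated into the SQL text.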
I have a string of comma-separated values that I want to trim down for display purposes.
The string is a comma separated list of values of varying lengths and number of list entries.
Each entry in the list is formatted as a five character pattern in the format "##-NX" followed by some text.
e.g., "01-NX sometext, 02-NX morertext, 09-NX othertext, 12-NX etc..."
Is there a regular expression function I can use to remove the text after the 5-character prefix portion of each entry in the list, returning "01-NX, 02-NX, 09-NX, 12-NX,..."?
I am a novice with regular expressions and I haven't been able to figure out how to code the pattern.
I think what you need is
regexp_replace(regexp_replace(mystring, '(\d{2}-NX)(.*?)(,)', '\1\3'), '(\d{2}.*NX).*', '\1')
The inner REGEXP_REPLACE looks for a pattern like nn-NX (two numeric characters followed by "-NX") and any number of characters up to the next comma, then replaces it with the first and third term, dropping the "any number of characters" part.
The outer REGEXP_REPLACE looks for a pattern like two numeric characters followed by any number of characters up to the last NX, and keeps that part of the string.
Here is the Oracle code I used for testing:
with a as (
select '01-NX sometext, 02-NX morertext, 09-NX othertext, 12-NX etc.' as myString
from dual
)
select mystring
, regexp_replace(regexp_replace(mystring, '(\d{2}-NX)(.*?)(,)', '\1\3'), '(\d{2}.*NX).*', '\1') as output
from a
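To see what each of the two passes contributes, the same patterns can be tried outside the database. The sketch below uses Python's re module as a stand-in for Oracle's REGEXP_REPLACE; the two engines are not identical, so treat it as an illustration of the patterns on this sample string only.

```python
import re

s = "01-NX sometext, 02-NX morertext, 09-NX othertext, 12-NX etc."

# Inner pass: keep the prefix and the comma, drop the text between them.
# The last entry has no trailing comma, so it is left untouched.
inner = re.sub(r"(\d{2}-NX)(.*?)(,)", r"\1\3", s)

# Outer pass: keep everything up to the last NX, drop the trailing text.
outer = re.sub(r"(\d{2}.*NX).*", r"\1", inner)

print(inner)  # 01-NX, 02-NX, 09-NX, 12-NX etc.
print(outer)  # 01-NX, 02-NX, 09-NX, 12-NX
```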
This alternative calls REGEXP_REPLACE() once.
Match two digits, a dash and 'NX', followed by zero or more characters (non-greedy), up to a comma or the end of the string. Replace the match with the first group and the third group, which is either the comma or the end of the string.
EDIT: Took dougp's advice and eliminated the RTRIM by adding the 3rd capture group. Thanks for that!
WITH tbl(str) AS (
SELECT '01-NX sometext, 02-NX morertext, 09-NX othertext, 12-NX etc.' FROM dual
)
SELECT
REGEXP_REPLACE(str, '(\d{2}-NX)(.*?)(,|$)', '\1\3') str
from tbl;
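The single-pass pattern can be checked the same way. This is a small sketch using Python's re module in place of Oracle's REGEXP_REPLACE (an assumption: the engines differ in places, although this pattern uses only constructs both support); the function name strip_entry_text is hypothetical.

```python
import re


def strip_entry_text(s: str) -> str:
    """Keep each nn-NX prefix and its delimiter, drop the text in between."""
    # Group 3 is either the comma or the empty match at end of string,
    # so the last entry is handled without a second pass.
    return re.sub(r"(\d{2}-NX)(.*?)(,|$)", r"\1\3", s)


sample = "01-NX sometext, 02-NX morertext, 09-NX othertext, 12-NX etc."
print(strip_entry_text(sample))  # 01-NX, 02-NX, 09-NX, 12-NX
```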
I would like to search for the string 'A&G BROS, INC.' using Oracle's CONTAINS operator:
FROM contact
WHERE CONTAINS
(name, 'A&G BROS, INC.') > 0
But I do not get accurate results: I get over 300,000 records, basically anything containing INC.
I tried escaping the & char using
FROM contact
WHERE CONTAINS
(name, 'A&' || 'G BROS, INC.') > 0
I still get the same massive result set.
Any idea how to run this query with these special characters? I want to narrow the results down so I can at least get results that start with "A&G". Note that "LIKE" and "INSTR" cannot be used.
Another way to deal with the special characters is to use the function CHR(n), where n is the ASCII value of the special character. For &, it is 38, so instead of
'A&G BROS, INC.' you can use 'A'||CHR(38)||'G BROS, INC.'
Using these special characters directly in literals can be tricky, because they can behave differently in different environments.
You can find the ASCII value of a character using the ASCII function, like this:
select ascii('&') from dual;
ASCII('&')
38
The & is the AND operator, and the , is the ACCUM operator. The behaviour of those operators explains what you are seeing.
You need to escape those characters:
To query on words or symbols that have special meaning in query expressions such as and & or| accum, you must escape them. There are two ways to escape characters in a query expression...
So you could do:
FROM contact
WHERE CONTAINS
(name, 'A\&G BROS\, INC.') > 0
or
FROM contact
WHERE CONTAINS
(name, 'A{&}G BROS{,} INC.') > 0
or
FROM contact
WHERE CONTAINS
(name, '{A&G BROS, INC.}') > 0
If you can't stop your client prompting for substitution variables - which is really a separate issue to the contains escapes - then you could combine this with your original approach:
FROM contact
WHERE CONTAINS
(name, '{A&' || 'G BROS, INC.}') > 0
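If the search text comes from user input, the backslash form of escaping shown above can be applied programmatically. The sketch below is only an illustration: the function name backslash_escape is hypothetical, and the character set is a deliberately partial assumption (the operators named in this answer plus the escape characters themselves), so check the Oracle Text documentation for the full list of reserved characters.

```python
# Partial, illustrative set of characters to escape (an assumption):
# the operators named above plus the escape characters themselves.
ASSUMED_RESERVED = set("&,|{}\\")


def backslash_escape(term: str, reserved=ASSUMED_RESERVED) -> str:
    """Prefix each reserved character with a backslash."""
    return "".join("\\" + ch if ch in reserved else ch for ch in term)


print(backslash_escape("A&G BROS, INC."))  # A\&G BROS\, INC.
```

Passing the escaped text as a bind variable also sidesteps the client's substitution-variable prompt for &.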
I can do a MariaDB fulltext query which searches for the word beginning like this:
select * from mytable
where match(mycol) against ('+test*' in boolean mode)>0.0;
This finds words like "test", "tester", "testing".
If my search string contains special characters, I can put the search string in quotes:
select * from mytable
where match(mycol) against ('+"test-server"' in boolean mode)>0.0;
This will find all rows which contain the string test-server.
But it seems I cannot combine both:
select * from mytable
where match(mycol) against ('+"test-serv"*' in boolean mode)>0.0;
This results in an error:
Error: (conn:7) syntax error, unexpected $end, expecting FTS_TERM or FTS_NUMB or '*'
SQLState: 42000
ErrorCode: 1064
Placing the '*' in the quoted string will return no results (as expected):
select * from mytable
where match(mycol) against ('+"test-serv*"' in boolean mode)>0.0;
Does anybody know whether this is a limitation of MariaDB? Or a bug?
My MariaDB version is 10.0.31
WHERE MATCH(mycol) AGAINST('+test +serv*' IN BOOLEAN MODE)
AND mycol LIKE '%test_serv%'
The MATCH will find the desired rows plus some that are not desired. Then the LIKE will filter out the duds. Since the LIKE is being applied to only some rows, its slowness is masked.
(Granted, this does not work in all cases. And it requires some manual manipulation.)
d'Artagnan - Use
WHERE MATCH(mycol) AGAINST("+Arta*" IN BOOLEAN MODE)
AND mycol LIKE '%d\'Artagnan%'
Note that I used the suitable escaping for getting the apostrophe into the LIKE string.
So, the algorithm for your code goes something like:
Break the string into "words" the same way FULLTEXT would.
Toss any strings that are too short.
If no words are left, then you cannot use FULLTEXT and are stuck with a slow LIKE.
Stick * after the last word (or each word?).
Build the AGAINST with those word(s).
Add on AND LIKE '%...%' with the original phrase, suitably escaped.
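The steps above can be sketched as a small query builder. This is a hedged Python sketch, not a definitive implementation: the function name build_prefix_search is hypothetical, splitting on non-alphanumeric characters only approximates how FULLTEXT tokenizes, min_len=3 is an assumed minimum token length (it depends on the server configuration), and %s is the placeholder style used by common MySQL/MariaDB Python drivers. Table and column names are interpolated directly and must be trusted identifiers.

```python
import re


def build_prefix_search(phrase, table="mytable", column="mycol", min_len=3):
    """Return (sql, params) for a FULLTEXT prefix search narrowed by LIKE."""
    # Break the phrase into words and toss the ones that are too short.
    words = [w for w in re.split(r"[^0-9A-Za-z]+", phrase) if len(w) >= min_len]

    # LIKE pattern from the original phrase, escaping \ % and _.
    like_pattern = "%" + re.sub(r"([\\%_])", r"\\\1", phrase) + "%"

    # No usable words left: fall back to the slow LIKE alone.
    if not words:
        return f"SELECT * FROM {table} WHERE {column} LIKE %s", (like_pattern,)

    # Require every word and stick * after the last one.
    against = " ".join("+" + w for w in words) + "*"
    sql = (f"SELECT * FROM {table} "
           f"WHERE MATCH({column}) AGAINST (%s IN BOOLEAN MODE) "
           f"AND {column} LIKE %s")
    return sql, (against, like_pattern)


print(build_prefix_search("test-serv")[1])   # ('+test +serv*', '%test-serv%')
print(build_prefix_search("d'Artagnan")[1])  # ('+Artagnan*', "%d'Artagnan%")
```

Because the phrase is passed as a bound parameter, the apostrophe in d'Artagnan needs no manual escaping here.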
I have a query like this:
INSERT INTO TAB_AUTOCRCMTREQUESTS
(RequestOrigin, RequestKey, CommentText) VALUES ('Tracker', 'OPM03865_0', '[Orange.Security.OrangePrincipal]
em[u02650791]okok
it's friday!')
As expected, it throws a "missing comma" error because of the single quote in it's friday!.
I want to remove this single quote while inserting using Replace function.
How can this be done?
The error is caused by the single quote. To correct it, you should not remove the single quote; instead, add one more, i.e. change it's friday to it''s friday when inserting.
If you really need to remove it, try the code below:
insert into Blagh values(REPLACE('it''s friday', '''', ''),12);
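The difference between doubling the quote and stripping it can be tried out quickly. The sketch below uses Python's built-in sqlite3 module as a stand-in (an assumption: SQLite rather than Oracle, chosen only because it runs anywhere); doubled quotes in a literal and the three-argument REPLACE used here behave the same way in this case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A doubled quote inside a literal stands for one quote character.
kept = conn.execute("SELECT 'it''s friday!'").fetchone()[0]

# REPLACE with an empty third argument strips the quote instead.
stripped = conn.execute(
    "SELECT REPLACE('it''s friday!', '''', '')"
).fetchone()[0]

# A bind variable avoids the escaping question entirely.
bound = conn.execute("SELECT ?", ("it's friday!",)).fetchone()[0]

print(kept)      # it's friday!
print(stripped)  # its friday!
print(bound)     # it's friday!
conn.close()
```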
I would suggest using Oracle q quote.
Example:
INSERT INTO TAB_AUTOCRCMTREQUESTS (RequestOrigin, RequestKey, CommentText)
VALUES ('Tracker', 'OPM03865_0',
q'{[Orange.Security.OrangePrincipal] em[u02650791]okok it's friday!}')
You can read about q quote here.
In short, you follow this format: q'{your string here}' where "{" represents the starting delimiter, and "}" represents the ending delimiter. Oracle automatically recognizes "paired" delimiters, such as [], {}, (), and <>. If you want to use some other character as your start delimiter and it doesn't have a "natural" partner for termination, you must use the same character for start and end delimiters.
Your text already contains [ and ], so I suggest using {} delimiters to avoid confusion (strictly, [] would still work as long as ] is never immediately followed by a single quote).
Of course, you can also double the quote (it''s) and combine that with REPLACE. You can omit the last parameter of REPLACE because it is optional; without it, every ' character is removed.
INSERT INTO TAB_AUTOCRCMTREQUESTS (CommentText) VALUES (REPLACE('...it''s friday!', ''''))
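When the literal is generated by a program, the q-quote format described above can be produced by picking a delimiter pair that is safe for the given text. This is a hedged Python sketch: the function name oracle_q_quote is hypothetical, and it assumes, as I understand Oracle's rule, that the literal ends only where the closing delimiter is immediately followed by a single quote. Bind variables remain the more robust option where they are available.

```python
# Paired delimiters that Oracle recognizes for q-quoted literals.
PAIRS = [("{", "}"), ("[", "]"), ("(", ")"), ("<", ">")]


def oracle_q_quote(text: str) -> str:
    """Return text as a q'...' literal using the first safe delimiter pair."""
    for open_ch, close_ch in PAIRS:
        # Assumption: only "closing delimiter + quote" ends the literal early.
        if close_ch + "'" not in text:
            return "q'" + open_ch + text + close_ch + "'"
    raise ValueError("no paired delimiter is safe for this text")


print(oracle_q_quote("[Orange] em[u02650791]okok it's friday!"))
# q'{[Orange] em[u02650791]okok it's friday!}'
```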
Single quotes are escaped by doubling them up
INSERT INTO Blagh VALUES(REPLACE('it''s friday', '''', ''),12);
You can try this, (sorry but I don't know why q'[ ] works)
INSERT INTO TAB_AUTOCRCMTREQUESTS
(RequestOrigin, RequestKey, CommentText) VALUES ('Tracker', 'OPM03865_0', q'[[Orange.Security.OrangePrincipal] em[u02650791]okok it's friday!]')
I just got the q'[] from this link Oracle pl-sql escape character (for a " ' ") - this question could be a possible duplicate
I would like to know what the XPath equivalent of SQL's IN query is. Basically, in SQL I can do this:
select * from tbl1 where Id in (1,2,3,4)
So I want something similar in XPath/XSL:
i.e.
//*[@id= IN('51417','1121','111')]
Please advise.
In XPath 2.0, the = operator compares sequences and is true if any pair of items matches, so it works like IN.
I.e. you can use
//*[@id = ('51417','1121','111')]
A solution is to write out the options as separate conditions:
//*[(@id = '51417') or (@id = '1121') or (@id = '111')]
Another, slightly less verbose solution that looks a bit like a hack, though, would be to use the contains function:
//*[contains('-51417-1121-111-', concat('-', @id, '-'))]
Literally, this means you're checking whether the value of the id attribute (preceded and followed by a delimiter character) is a substring of -51417-1121-111-. Note that I am using a hyphen (-) as a delimiter of the allowable values; you can replace that with any character that will not appear in the id attribute.
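The reason the delimiters matter is easiest to see with an id that is a substring of an allowed value. The sketch below reproduces the substring test in plain Python on a made-up document (the sample XML and the helper name id_in_list are hypothetical); it uses the standard library's ElementTree only to walk the elements, since its XPath subset does not include contains().

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<root><item id="51417"/><item id="112"/>'
    '<item id="111"/><item id="9"/></root>'
)

ALLOWED = "-51417-1121-111-"


def id_in_list(elem) -> bool:
    """Mirror contains('-51417-1121-111-', concat('-', @id, '-'))."""
    value = elem.get("id")
    return value is not None and ("-" + value + "-") in ALLOWED


with_delims = [e.get("id") for e in doc.iter() if id_in_list(e)]

# Without the surrounding delimiters, "112" wrongly matches inside "1121".
without_delims = [
    e.get("id") for e in doc.iter()
    if e.get("id") and e.get("id") in "51417-1121-111"
]

print(with_delims)     # ['51417', '111']
print(without_delims)  # ['51417', '112', '111']
```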