Oracle REGEX to match two consecutive 'a' - oracle

I am trying to replace two consecutive a characters in an Oracle database using REGEXP_REPLACE.
The SQL I have tried so far is below:
select regexp_replace('aab','(a)(a)|(a)(a)','\1 \2 \3') from dual;
The expected result is a ab, but the actual result is a a b.
Basically, I want (a)(a) to match two consecutive a characters. What regular expression must I use?
Please note that I am using this particular SQL as a workaround for cases with three or more consecutive a characters:
select regexp_replace('aaa','(a)(a)|(a)(a)','\1 \2 \3') from dual;
gives me the result a a a which is expected.

You do not need (slow) regular expressions; you can use the simple (faster) string function REPLACE:
SELECT REPLACE(REPLACE(value, 'aa', 'a a'), 'aa', 'a a') AS output
FROM table_name;
Which, for the sample data:
CREATE TABLE table_name (value) AS
SELECT 'aab' FROM DUAL UNION ALL
SELECT 'aaa' FROM DUAL UNION ALL
SELECT 'abb' FROM DUAL UNION ALL
SELECT 'aabaa' FROM DUAL UNION ALL
SELECT 'aaaaa' FROM DUAL;
Outputs:
OUTPUT
---------
a ab
a a a
abb
a aba a
a a a a a
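The nested call matters because REPLACE scans left to right and does not revisit characters it has already consumed, so a single pass can leave pairs behind in longer runs. A minimal sketch of both passes on the literal 'aaaaa' (the column aliases are only illustrative):

```sql
-- One pass replaces the pairs at positions 1-2 and 3-4, which leaves
-- new 'aa' pairs where the replaced segments meet.
-- A second pass splits those remaining pairs; after it, every 'a' is
-- followed by a space or a non-'a' character, so no pair survives.
SELECT REPLACE('aaaaa', 'aa', 'a a')                       AS one_pass,
       REPLACE(REPLACE('aaaaa', 'aa', 'a a'), 'aa', 'a a') AS two_passes
FROM   DUAL;
-- one_pass   = 'a aa aa'
-- two_passes = 'a a a a a'
```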

Related

oracle sorts differently on simple order by and listagg's order by

Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production
select letter from (
select 'A' as letter from dual union all
select 'Á' as letter from dual union all
select 'B' as letter from dual union all
select 'C' as letter from dual ) t
order by letter;
result (is okay):
A
Á
B
C
But with this
select listagg(letter,', ') within group (order by letter) from (
select 'A' as letter from dual union all
select 'Á' as letter from dual union all
select 'B' as letter from dual union all
select 'C' as letter from dual ) t;
the order of the result letters is different:
A, B, C, Á
Is it a simple Oracle bug in the latter case?
(I am deliberately not saying anything about NLS. I think these queries should work the same way independently of NLS.)
Update:
Clarification:
I'm running these queries one after another in SQL Developer, in the same connection.
NLS_SORT HUNGARIAN
NLS_COMP BINARY
(If @MT0's answer is the solution, then IMHO it is a bug in Oracle to use different default NLS settings for a simple ORDER BY clause and for a LISTAGG call.)
You need to provide the NLS_SORT setting:
select listagg(letter,', ')
within group (order by NLSSORT(letter, 'NLS_SORT=BINARY_AI')) AS letters
from (
select 'A' as letter from dual union all
select 'Á' as letter from dual union all
select 'B' as letter from dual union all
select 'C' as letter from dual
) t;
Outputs:
LETTERS
A, Á, B, C
Note: LISTAGG does not appear to use the session NLS_SORT setting in its ORDER BY clause; but you can pass it in directly as shown above.
If you want to use the session parameter (rather than a specific value):
select listagg(letter,', ')
within group (
order by NLSSORT(
letter,
( SELECT 'NLS_SORT='||value
FROM NLS_SESSION_PARAMETERS
WHERE parameter = 'NLS_SORT' )
)
) AS letters
from (
select 'A' as letter from dual union all
select 'Á' as letter from dual union all
select 'B' as letter from dual union all
select 'C' as letter from dual
) t;
Ordering of letters depends critically on NLS_SORT, so the assumption at the end of your message (that these queries should behave the same regardless of NLS) is simply wrong.
The main question is - are you getting those contradictory results on exactly the same system, with all the same NLS settings? If so, then that's a bug in the implementation of the ORDER BY clause in LISTAGG. It would be good to have a test case - show us your NLS settings (the result of select * from v$nls_parameters) followed immediately by the two queries and their outputs. For good measure, show also select * from v$version (telling us your database version).
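A test case along those lines could be collected with a short script such as the sketch below, run in the same session immediately before the two queries. It reads NLS_SESSION_PARAMETERS rather than v$nls_parameters, on the assumption that the session-level values are the ones of interest; the list of parameters is only a suggestion.

```sql
-- Session-level NLS settings that influence sorting and comparison
SELECT parameter, value
FROM   nls_session_parameters
WHERE  parameter IN ('NLS_SORT', 'NLS_COMP', 'NLS_LANGUAGE');

-- Database version banner
SELECT banner
FROM   v$version;
```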

First three values of a string /000/ must match the first three values of a long number 0007689 oracle

My table has:
column1
TX/W/000/W/0001292SC_00_11-11-091-26W2_2.pdf
TX/W/000/TO/0001222/TX_Code_0001222.pdf
TX/W/000/TO/0001982/TX_Code_0001982.pdf
TX/W/000/TO/0002216/TX_Code_0002216.pdf
TX/W/000/TO/0006002/TX_Code_0006002.pdf
TX/W/006/CA/TX_WCA_006928.PDF
TX/W/702/TO/7021/TO_Data Transmittal_00_11-09-029-21W2_0_2.pdf
TX/W/000/CA/TX_WCA_0007902.PDF
TX/W/011/CA/TX_WDA_0008902.PDF
My current query is:
select REGEXP_SUBSTR (Column1,'\d{4,7}') as Result_set from table1
which gets
Result set
0001292
0001222
0001982
0002216
0006002
006928
7021
0007902
0008902
I have edited my question because it was not clear before; I am sorry about that.
I would like the three digits between the slashes (for example /006/) to match the first three digits of the long number (006928) using REGEXP_SUBSTR(), but I'm not sure how to do that.
For example, 006 should be the first three digits of the long number 006928; if it is not, the row should be ignored.
If your requirement is to isolate the three-digit string between the second and the third slash, and then to see if this three-digit pattern can be found after the fourth slash, you could do something like this. You don't need the WITH clause (I included it for testing, but you have your actual table and actual column name); the query begins at SELECT REGEXP_SUBSTR...
with
table1 (column1) as (
select 'TX/W/000/W/0001292SC_00_11-11-091-26W2_2.pdf' from dual union all
select 'TX/W/000/TO/0001222/TX_Code_0001222.pdf' from dual union all
select 'TX/W/000/TO/0001982/TX_Code_0001982.pdf' from dual union all
select 'TX/W/000/TO/0002216/TX_Code_0002216.pdf' from dual union all
select 'TX/W/000/TO/0006002/TX_Code_0006002.pdf' from dual union all
select 'TX/W/006/CA/TX_WCA_006928.PDF' from dual union all
select 'TX/W/702/TO/7021/TO_Data Transmittal_00_11-09-029-21W2_0_2.pdf'
from dual union all
select 'TX/W/000/CA/TX_WCA_0007902.PDF' from dual
)
select regexp_substr(column1, '\d{4,7}') as result
from table1
where substr(column1, instr(column1, '/', 1, 4) + 1)
like
'%' || substr(column1, instr(column1, '/', 1, 2) + 1,
instr(column1, '/', 1, 3) - instr(column1, '/', 1, 2) - 1)
|| '%'
;
RESULT
--------
0001292
0001222
0001982
0002216
0006002
006928
7021
0007902
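To see what the two sides of the LIKE comparison contain, the SUBSTR/INSTR expressions can be evaluated on a single sample value. This is only an illustrative sketch; the aliases token3 and tail are made up for the example.

```sql
-- token3: the text between the 2nd and 3rd slash
-- tail:   everything after the 4th slash
select substr(s, instr(s, '/', 1, 2) + 1,
              instr(s, '/', 1, 3) - instr(s, '/', 1, 2) - 1) as token3,
       substr(s, instr(s, '/', 1, 4) + 1)                     as tail
from (
  select 'TX/W/006/CA/TX_WCA_006928.PDF' as s from dual
);
-- token3 = '006'
-- tail   = 'TX_WCA_006928.PDF'
```

The WHERE clause then keeps the row because tail contains token3 somewhere, which is what the '%' wildcards on both sides express.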
Solution for the modified (edited) question.
Assumptions: the input string has a three-digit string between the second and the third forward slash. If the substring starting at the fourth forward slash contains a sub-substring of four or more consecutive digits, take the first such occurrence, and return the row only if the first three digits of this substring equal the substring found between the second and the third forward slash.
NOTE: The OP keeps looking for a string of between four and seven digits. But with nothing following \d{4,7}, the pattern also matches inside any run of four or more digits (returning at most its first seven digits). If the requirement is that the string of digits in the "fifth token" be no longer than 7 digits, that can be accommodated, but it didn't seem to be part of the problem.
With that said: (notice that the last row, which I added for further testing, is NOT selected in the output - only matching at the BEGINNING of the string of 4-7 digits is valid, matching later in the string is not enough).
with
table1 (column1) as (
select 'TX/W/000/W/0001292SC_00_11-11-091-26W2_2.pdf' from dual union all
select 'TX/W/000/TO/0001222/TX_Code_0001222.pdf' from dual union all
select 'TX/W/000/TO/0001982/TX_Code_0001982.pdf' from dual union all
select 'TX/W/000/TO/0002216/TX_Code_0002216.pdf' from dual union all
select 'TX/W/000/TO/0006002/TX_Code_0006002.pdf' from dual union all
select 'TX/W/006/CA/TX_WCA_006928.PDF' from dual union all
select 'TX/W/702/TO/7021/TO_Data Transmittal_00_11-09-029-21W2_0_2.pdf'
from dual union all
select 'TX/W/000/CA/TX_WCA_0007902.PDF' from dual union all
select 'TX/W/007/ZZ/TX_WCA_0007902.PDF' from dual
)
select regexp_substr(column1, '\d{4,7}') as result
from table1
where regexp_substr(column1, '^([^/]*/){2}(\d{3})', 1, 1, null, 2)
=
regexp_substr(column1, '^([^/]*/){4}\D*(\d{3})\d+', 1, 1, null, 2)
;
RESULT
--------------------------------------------------------------
0001292
0001222
0001982
0002216
0006002
006928
7021
0007902
In SELECT we pick the first four to seven digits from the "long string of digits" (not sure if seven is guaranteed to be the upper bound, or if there may be longer strings of digits but at most seven digits must be selected, etc.)
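If seven really has to be an upper bound, one possible approach is to require a non-digit (or the start or end of the string) on both sides of the digits and to return only the middle subexpression. The sketch below is an assumption about how that requirement might be written, not part of the original query; the sample literals are invented.

```sql
-- unbounded:    plain \d{4,7} simply returns the first seven digits of a longer run
-- bounded_long: a nine-digit run is rejected when delimiters are required
-- bounded_ok:   a seven-digit run delimited by non-digits is returned
select regexp_substr('123456789', '\d{4,7}')                                   as unbounded,
       regexp_substr('ab123456789cd', '(^|\D)(\d{4,7})(\D|$)', 1, 1, null, 2) as bounded_long,
       regexp_substr('ab0001292SC',   '(^|\D)(\d{4,7})(\D|$)', 1, 1, null, 2) as bounded_ok
from dual;
-- unbounded    = '1234567'
-- bounded_long = NULL
-- bounded_ok   = '0001292'
```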
The first regular expression in the WHERE clause is anchored at the beginning of the string (the ^ character at the beginning of the regular expression); then it looks for two occurrences of (zero or more non-slash characters followed by one slash), and then three digits. Parentheses in a regular expression create subexpressions, which can be referenced in the REGEXP functions. The sixth argument, 2, means select the second subexpression - in this case, the three digits after the second slash.
The last regular expression is similar: look for the fourth slash, followed by zero or more non-digits (\D) followed by three digits AS A SUBEXPRESSION (enclosed in parentheses) and immediately followed by at least one more digit (to enforce the requirement that the string of digits be at least FOUR digits long). Select the second subexpression, meaning the three digits at the beginning of the "long string of digits". Note that if no such "long string of digits" even exists, then the WHERE clause will automatically fail (that row will not be selected).
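As a small illustration of the subexpression argument, both expressions can be run against the extra test row that the WHERE clause rejects. The aliases are illustrative only.

```sql
-- prefix:         second subexpression of the first pattern (digits after the 2nd slash)
-- leading_digits: second subexpression of the second pattern (first three digits
--                 of the long run that follows the 4th slash)
select regexp_substr(s, '^([^/]*/){2}(\d{3})',       1, 1, null, 2) as prefix,
       regexp_substr(s, '^([^/]*/){4}\D*(\d{3})\d+', 1, 1, null, 2) as leading_digits
from (
  select 'TX/W/007/ZZ/TX_WCA_0007902.PDF' as s from dual
);
-- prefix         = '007'
-- leading_digits = '000'
```

Because the two values differ, the equality in the WHERE clause is false and the row is filtered out.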
You could do:
select regexp_substr (column1,'\d{4,7}') as result_set
from table1
where regexp_substr (regexp_substr (column1,'\d{4,7}'), '006') is not null;
or with regexp_like():
select regexp_substr (column1,'\d{4,7}') as result_set
from table1
where regexp_like (regexp_substr (column1,'\d{4,7}'), '006');
or just plain non-regex like:
select regexp_substr (column1,'\d{4,7}') as result_set
from table1
where regexp_substr (column1,'\d{4,7}') like '%006%';
With your sample data, any of those return:
RESULT_SET
--------------------------------------------------------------
0006002
006928

regex function using only a single quote

I want to use a regex to select rows where the column contains a lone single quote only.
Example:
Column1: who's responsible of this
Column2: who''s responsible of this
With this query
Select regexp_substr(column1,'''') from ex_tab
This always treats column2 as having one single quote, but it actually has two.
I want to select only lone single quotes, not doubled ones.
I cannot use the INSTR function because I might have a value such as who's responsible of the's.
Um, you can certainly use INSTR to do this:
with str as (select 'who''s in charge?' col1 from dual union all
select 'who''''s in charge?' col1 from dual union all
select 'who''s in charge? I''m in charge!' col1 from dual union all
select 'who is in charge?' col1 from dual union all
select 'who''''s in charge? I''''m in charge!' from dual)
select col1,
case when instr(col1, '''''', 1) != 0 then 'no' else 'yes' end is_ok
from str;
COL1                              IS_OK
--------------------------------- -----
who's in charge?                  yes
who''s in charge?                 no
who's in charge? I'm in charge!   yes
who is in charge?                 yes
who''s in charge? I''m in charge! no
It'll most likely be faster than the regexp way.
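The same INSTR test can also be used as a filter. The sketch below assumes that "one single quote only" means at least one quote somewhere and no two quotes in a row; adjust the first condition if values without any quote should be kept.

```sql
with str as (
  select 'who''s in charge?' col1 from dual union all
  select 'who''''s in charge?' from dual union all
  select 'who is in charge?' from dual union all
  select 'who''s in charge? I''m in charge!' from dual
)
select col1
from   str
where  instr(col1, '''') > 0       -- contains at least one single quote
and    instr(col1, '''''') = 0;    -- but never two in a row
-- returns: who's in charge?
--          who's in charge? I'm in charge!
```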
You can negate the characters in a set using [^...].
Here is a manual, although I just noticed that it does not mention that a ^ as the first character inside square brackets means "not in this set".
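A quick way to see the negated set at work is to pull out the runs of non-quote characters from a sample string. This is just an illustrative sketch with a made-up literal.

```sql
-- [^'']+ matches one or more characters that are NOT a single quote
-- (the quote is doubled only because it sits inside a SQL string literal).
select regexp_substr('who''s responsible', '[^'']+')       as before_quote,
       regexp_substr('who''s responsible', '[^'']+', 1, 2) as after_quote
from dual;
-- before_quote = 'who'
-- after_quote  = 's responsible'
```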
So regexp like:
^('?([^']+'[^']+'?)?)*$
should match every string that does not have two quotes one after another in it. It should also match a single quote on its own, and strings that have more than one quote when the quotes are separated by other characters.
So, as a complete example, what you want would be:
Select
column1,
REGEXP_COUNT(column1,'^(''?([^'']+''[^'']+''?)?)*$') as is_ok
from
(
select 'who''''s responsible of this' as column1 FROM dual
union
select 'who''s responsible of this' as column1 FROM dual
union
select 'who''s responsible ''of'' this' as column1 FROM dual
union
select '''who''s responsible ''of'' this' as column1 FROM dual
union
select '''' as column1 FROM dual
)
Here the is_ok column indicates whether or not the value contains two consecutive quotes.
EDIT:
I just realized that there is a much simpler solution: just check whether there are two or more occurrences of a quote one after another. So the regexp would be just:
'{2,}
and here is a working example; note that the column name changes to is_not_ok:
Select
column1,
REGEXP_COUNT(column1,'''{2,}') as is_not_ok
from
(
select 'who''''s responsible of this' as column1 FROM dual
union
select 'who''s responsible of this' as column1 FROM dual
union
select 'who''s responsible ''of'' this' as column1 FROM dual
union
select '''who''s responsible ''of'' this' as column1 FROM dual
union
select '''' as column1 FROM dual
union
select 'this''' as column1 FROM dual
union
select '''''' as column1 FROM dual
)
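The same pattern can be used as a filter with REGEXP_LIKE instead of being counted. A minimal sketch on a few of the sample values, keeping only those without doubled quotes:

```sql
select column1
from (
  select 'who''s responsible of this' as column1 from dual union all
  select 'who''''s responsible of this' from dual union all
  select 'this''' from dual union all
  select '''''' from dual
)
where not regexp_like(column1, '''{2,}');  -- reject two or more quotes in a row
-- returns: who's responsible of this
--          this'
```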

How to order by case insensitive ASC or DESC, with DISTINCT and UNION

How do I order case-insensitively, ASC or DESC, in PL/SQL on Oracle 11g? This is a basic PL/SQL question, but I can't find a good answer on Google. Please tell me how to sort the select result case-insensitively.
This is what I tried:
SELECT DISTINCT
asssss,
saas_acc
FROM DUAL
UNION SELECT '--ALL--','ALL' FROM DUAL
ORDER BY upper(asssss) ASC ;
That gave me ORA-01785: ORDER BY item must be the number of a SELECT-list expression
The simplest option would be to sort by the upper- (or lower-) case column data
ORDER BY UPPER( column_name )
DISTINCT filters the result set down to unique rows, based on whatever expressions are given in the SELECT clause.
We cannot then order the result by a different expression or column name. Please see the example here.
SELECT DISTINCT (col1),(col2)
FROM
( SELECT 'Hello' col1,'World' col2 FROM DUAL
UNION ALL
SELECT 'HELLO','WORLD' FROM DUAL
);
COL1 COL2
----- -----
HELLO WORLD
Hello World
You can see that DISTINCT is case-sensitive here (two rows are displayed).
So, let me apply UPPER() to both columns.
SELECT DISTINCT UPPER (col1),UPPER(col2)
FROM
( SELECT 'Hello' col1,'World' col2 FROM DUAL
UNION ALL
SELECT 'HELLO','WORLD' FROM DUAL
);
UPPER UPPER
----- -----
HELLO WORLD
Just one row is displayed, ignoring the case.
Coming back to the actual problem: to order a DISTINCT result set, the ORDER BY expression has to be one of the expressions/columns of the DISTINCT clause.
So, when you issue DISTINCT COL1, COL2, the ORDER BY may be by COL1 or COL2; it cannot be by COL3 or even UPPER(COL1), because UPPER() makes a different expression that conflicts with the expressions under DISTINCT.
Finally, the answer to your question would be:
if you want your ORDER BY to be case-insensitive, the DISTINCT has to be case-insensitive in the same way, as given below.
SELECT DISTINCT
UPPER(asssss),
saas_acc
FROM DUAL
ORDER BY upper(asssss) ASC ;
Or, if UNION has to be used, it is better to do this (or the same as the one above):
SELECT * FROM
(
SELECT DISTINCT asssss as asssss,
saas_acc
FROM DUAL
UNION
SELECT '--ALL--','ALL' FROM DUAL
)
ORDER BY upper(asssss) ASC ;
From my own experience, I have always felt that whatever expression/column is specified in the ORDER BY is implicitly carried into the final SELECT as well; ordering is actually based on the column number (position) in the result. In this situation, DISTINCT COL1, COL2 is already there. When you give ORDER BY UPPER(COL1), Oracle would have to append it to the SELECT expressions, which is not possible at all. So the semantic check itself rejects this query with an error.
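A self-contained version of the wrapped UNION can be tried with inline data. The column names name and code and the sample rows are made up for the illustration; they stand in for asssss and saas_acc.

```sql
-- The DISTINCT/UNION happens inside the inline view, so the outer
-- ORDER BY is free to use an expression such as UPPER(name).
select *
from (
  select distinct name, code
  from (
    select 'beta' as name, 'B' as code from dual union all
    select 'Alpha', 'A' from dual union all
    select 'beta',  'B' from dual
  )
  union
  select '--ALL--', 'ALL' from dual
)
order by upper(name) asc;
-- 'Alpha' sorts before 'beta' regardless of case
```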
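To sort case-insensitively, set NLS_SORT to a case-insensitive sort such as BINARY_CI. Setting NLS_COMP to LINGUISTIC (ANSI in older releases) additionally makes comparisons in WHERE clauses follow NLS_SORT.
NLS_SORT=BINARY_CI
NLS_COMP=LINGUISTIC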
Details: http://www.orafaq.com/node/999
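ORDER BY itself is governed by the session's NLS_SORT parameter, so a session-level sketch, assuming that changing the setting for the whole session is acceptable, could look like this (the sample names are invented):

```sql
-- BINARY_CI is a case-insensitive variant of the binary sort
alter session set nls_sort = binary_ci;

select name
from (
  select 'beta' as name from dual union all
  select 'Alpha' from dual union all
  select 'Gamma' from dual
)
order by name;
-- Alpha, beta, Gamma
```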
You can use upper or lower functions.
order by upper(columnName)
Update1
Try removing the ORDER BY clause from your query; that will give you the real error, which is ORA-00904: "SAAS_ACC": invalid identifier. You can then search for that error or ask another question on SO.
Also have a look at how to use order by in union.

how to replace multiple strings together in Oracle

I have a string coming from a table, like "can no pay{1},as your payment{2}due on {3}". I want to replace {1}, {2} and {3} each with some value.
Is it possible to replace all three in one REPLACE call, or is there a way to write a query that returns the replaced value directly? I want to do this in an Oracle stored procedure. The original string comes from one of my tables (I am just doing a SELECT on that table),
and I then want to replace the {1}, {2}, {3} placeholders in that string with values that I have from another table.
Although it is not one call, you can nest the replace() calls:
SET mycol = replace( replace(mycol, '{1}', 'myoneval'), '{2}', mytwoval)
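Applied to the string from the question, three nested calls could look like the sketch below. The replacement values are invented for the illustration; in a stored procedure they would typically be variables or columns.

```sql
-- Each REPLACE works on the result of the one nested inside it.
select replace(
         replace(
           replace(msg, '{1}', ' today'),
           '{2}', ' is '),
         '{3}', '01-JAN-2024') as filled
from (
  select 'can no pay{1},as your payment{2}due on {3}' as msg from dual
);
-- filled = 'can no pay today,as your payment is due on 01-JAN-2024'
```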
If there are many variables to replace, you have them in another table, and their number varies, you can use a recursive CTE to replace them.
An example is below. In table fg_rulez you put the symbols with their replacements. In table fg_data you have your input strings.
set define off;
drop table fg_rulez;
create table fg_rulez as
select 1 id,'<' symbol, 'less than' text from dual
union all select 2, '>', 'great than' from dual
union all select 3, '$', 'dollars' from dual
union all select 4, '&', 'and' from dual;
drop table fg_data;
create table fg_Data AS(
SELECT 'amount $ must be < 1 & > 2' str FROM dual
union all
SELECT 'John is > Peter & has many $' str FROM dual
union all
SELECT 'Eliana is < mary & do not has many $' str FROM dual
);
WITH q(str, id) as (
SELECT str, 0 id
FROM fg_Data
UNION ALL
SELECT replace(q.str,symbol,text), fg_rulez.id
FROM q
JOIN fg_rulez
ON q.id = fg_rulez.id - 1
)
SELECT str from q where id = (select max(id) from fg_rulez);
So, a single replace.
Result:
amount dollars must be less than 1 and great than 2
John is great than Peter and has many dollars
Eliana is less than mary and do not has many dollars
The terminology symbol instead of variable comes from this duplicated question.
Oracle 11gR2
Let's write the same sample as a CTE only:
with fg_rulez as (
select 1 id,'<' symbol, 'less than' text from dual
union all select 2, '>', 'greater than' from dual
union all select 3, '$', 'dollars' from dual
union all select 4, '+', 'and' from dual
), fg_Data AS (
SELECT 'amount $ must be < 1 + > 2' str FROM dual
union all
SELECT 'John is > Peter + has many $' str FROM dual
union all
SELECT 'Eliana is < mary + do not has many $' str FROM dual
), q(str, id) as (
SELECT str, 0 id
FROM fg_Data
UNION ALL
SELECT replace(q.str,symbol,text), fg_rulez.id
FROM q
JOIN fg_rulez
ON q.id = fg_rulez.id - 1
)
SELECT str from q where id = (select max(id) from fg_rulez);
If the number of values to replace is too big, or you need to be able to maintain it easily, you could also split the string, use a dictionary table and finally aggregate the results.
In the example below I'm assuming that the words in your string are separated by blank spaces and that the word count per string will not exceed 100 (the cardinality of the pivot table).
with Dict as
(select '{1}' String, 'myfirstval' Repl from dual
union all
select '{2}' String, 'mysecondval' Repl from dual
union all
select '{3}' String, 'mythirdval' Repl from dual
union all
select '{Nth}' String, 'myNthval' Repl from dual
)
,MyStrings as
(select 'This is the first example {1} ' Str, 1 strnum from dual
union all
select 'In the Second example all values are shown {1} {2} {3} {Nth} ', 2 from dual
union all
select '{3} Is the value for the third', 3 from dual
union all
select '{Nth} Is the value for the Nth', 4 from dual
)
-- pivot is used to split the strings from MyStrings. We use a cartesian join for this
,pivot as (
Select Rownum Pnum
From dual
Connect By Rownum <= 100
)
-- StrtoRow is basically a cartesian join between MyStrings and Pivot.
-- There are as many rows as individual string elements in the MyStrings table
-- (max = number of rows in the MyStrings table * 100).
,StrtoRow as
(
SELECT rownum rn
,ms.strnum
,REGEXP_SUBSTR (Str,'[^ ]+',1,pv.pnum) TXT
FROM MyStrings ms
,pivot pv
where REGEXP_SUBSTR (Str,'[^ ]+',1,pv.pnum) is not null
)
-- This is the main Select.
-- With the listagg function we group the string together in lines using the key strnum (group by)
-- The NVL gets the translations:
-- if there is a Repl (Replacement from the dict table) then provide it,
-- Otherwise TXT (string without translation)
Select Listagg(NVL(Repl,TXT),' ') within group (order by rn)
from
(
-- outer join between strings and the translations (not all strings have translations)
Select sr.TXT, d.Repl, sr.strnum, sr.rn
from StrtoRow sr
,dict d
where sr.TXT = d.String(+)
order by strnum, rn
) group by strnum
If you are doing this inside a SELECT and your replacement values are columns, you can simply piece the result together using string concatenation.
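A minimal sketch of that idea, assuming the fixed parts of the message are known in advance and the values arrive as columns (the inline view r and its columns v1, v2, v3 are hypothetical stand-ins for the other table):

```sql
-- The literal fragments are concatenated around the column values with ||
select 'can no pay ' || r.v1 || ', as your payment ' || r.v2 || ' due on ' || r.v3 as message
from (
  select 'today' as v1, 'is' as v2, '01-JAN-2024' as v3 from dual
) r;
-- message = 'can no pay today, as your payment is due on 01-JAN-2024'
```

This only works when the template is fixed in the query; if the template itself is stored in a table, the REPLACE-based approaches above still apply.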
