Oracle regexp_substr to get first portion of a string

I am stuck here. I am using Oracle and I want to get the first part of a string, before the first appearance of '|'. This is my query, but it returns the last part, i.e. 25.0. I want it to return the first part, i.e. 53. How do I achieve that?
select regexp_substr('53|100382951130|25.0', '[^|]+$', 1,1) as part1 from dual

Assuming you always have at least one occurrence of '|', you can use the following, with no regexp:
with test(string) as ( select '53|100382951130|25.0' from dual)
select substr(string, 1, instr(string, '|')-1)
from test
You could even use regexp to achieve the same thing, or even handle the case in which you have no '|':
with test(string) as (
select '53|100382951130|25.0' from dual union all
select '1234567' from dual)
select string,
substr(string, 1, instr(string, '|')-1),
regexp_substr(string, '[^|]*')
from test
You can even handle the case with no occurrence of '|' without regexp:
with test(string) as (
select '53|100382951130|25.0' from dual union all
select '1234567' from dual)
select string,
substr(string, 1, instr(string, '|')-1),
regexp_substr(string, '[^|]*'),
substr(string, 1,
case
when instr(string, '|') = 0
then length(string)
else
instr(string, '|')-1
end
)
from test
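For reference, the query in the question returns 25.0 because the `$` anchor ties the match to the end of the string. A minimal sketch, reusing the question's literal, that anchors at the start instead (the `^` anchor and the column aliases are illustrative additions, not part of the answer above):

```sql
-- '[^|]+$' : one or more non-pipe characters ending at the end of the string   -> last field
-- '^[^|]+' : one or more non-pipe characters starting at the start of the string -> first field
select regexp_substr('53|100382951130|25.0', '[^|]+$') as last_part,
       regexp_substr('53|100382951130|25.0', '^[^|]+') as first_part
from dual;
-- last_part = 25.0, first_part = 53
```

If the string contains no '|' at all, both patterns return the whole string.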

Related

How to Shorten a Regular Expression

I have the regular expression below. When I use it in Oracle SQL, I get an ORA-12723 error. How can I write it in a shorter form?
WITH test_data ( str ) AS (
SELECT 'This is extension 1234, here is mobile phone: 090-1234-5678 maybe 8+24-98765432. Then +1-(234)-090-345 also 86 21-4566-4556' AS str FROM DUAL
)
SELECT TRIM(
TRAILING ',' FROM
REGEXP_REPLACE(
str,
'.*?(\+?\d{1,11}[-,\+]\d{1,11}[-,\+]\d{1,11}[-,\+]\d{1,11}[-,\+]\d{1,11}[-,\+]\d{3,11}|\+?\d{1,11}[-,\+]\d{1,11}[-,\+]\d{1,11}[-,\+]\d{1,11}[-,\+]\d{3,11}|\+?\d{1,11}[-,\+]\d{1,11}[-,\+]\d{1,11}[-,\+]\d{3,11}|\+?\d{1,11}[-,\+]\d{1,11}[-,\+]\d{3,11}|\+?\d{1,11}[-,\+]\d{3,11}|\d{3,11}|$)',
'\1,'
)
) AS replaced_str
FROM test_data
The result I want is:
1234,090-1234-5678,8+24-98765432,+1-(234)-090-345,86 21-4566-4556
Consider this approach. This uses CONNECT BY to traverse the string and parse it into elements that are separated by a space or the end of the line. Then for each element, remove non-digit characters ('\D'). Lastly use LISTAGG() to put the elements back into one comma delimited string.
WITH test_data(str) AS (
SELECT 'Txa233g141b Ta233141 Ta233142 Ta233147zz Ta233xx148zz' AS str FROM DUAL
)
SELECT listagg(regexp_replace(regexp_substr(str, '(.*?)( |$)', 1, level, null, 1), '\D'), ',')
within group (order by level) replaced_str
FROM test_data
connect by level <= regexp_count(str, ' ') + 1;
REPLACED_STR
--------------------------------------------------------------------------------
233141,233141,233142,233147,233148
1 row selected.
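To make the intermediate steps visible, here is a sketch of the same split without the final LISTAGG(), using a shortened version of the test string. The column aliases are made up for illustration, and the query assumes a single input row, as in the answer above:

```sql
WITH test_data(str) AS (
  SELECT 'Txa233g141b Ta233141 Ta233142' AS str FROM DUAL
)
SELECT level AS element_no,
       -- the level-th element, delimited by a space or the end of the string
       regexp_substr(str, '(.*?)( |$)', 1, level, null, 1) AS element,
       -- the same element with every non-digit character removed
       regexp_replace(regexp_substr(str, '(.*?)( |$)', 1, level, null, 1), '\D') AS digits_only
FROM test_data
CONNECT BY level <= regexp_count(str, ' ') + 1;
-- 1  Txa233g141b  233141
-- 2  Ta233141     233141
-- 3  Ta233142     233142
```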

First three values of a string /000/ must match the first three values of a long number 0007689 oracle

My table has:
column1
TX/W/000/W/0001292SC_00_11-11-091-26W2_2.pdf
TX/W/000/TO/0001222/TX_Code_0001222.pdf
TX/W/000/TO/0001982/TX_Code_0001982.pdf
TX/W/000/TO/0002216/TX_Code_0002216.pdf
TX/W/000/TO/0006002/TX_Code_0006002.pdf
TX/W/006/CA/TX_WCA_006928.PDF
TX/W/702/TO/7021/TO_Data Transmittal_00_11-09-029-21W2_0_2.pdf
TX/W/000/CA/TX_WCA_0007902.PDF
TX/W/011/CA/TX_WDA_0008902.PDF
My current query is:
select REGEXP_SUBSTR (Column1,'\d{4,7}') as Result_set from table1
which gets
Result set
0001292
0001222
0001982
0002216
0006002
006928
7021
0007902
0008902
I have edited my question, it was not clear before, I am so sorry about that.
I would like the three digits between the slashes (for example /006/) to match the first three digits of the long number (006928) using Regexp_Substr(), but I'm not sure how to do that.
For example, 006 should be the leading digits of the long number 006928; if it is not, the row should be ignored.
If your requirement is to isolate the three-digit string between the second and the third slash, and then to see if this three-digit pattern can be found after the fourth slash, you could do something like this. You don't need the WITH clause (I included it for testing, but you have your actual table and actual column name); the query begins at SELECT REGEXP_SUBSTR...
with
table1 (column1) as (
select 'TX/W/000/W/0001292SC_00_11-11-091-26W2_2.pdf' from dual union all
select 'TX/W/000/TO/0001222/TX_Code_0001222.pdf' from dual union all
select 'TX/W/000/TO/0001982/TX_Code_0001982.pdf' from dual union all
select 'TX/W/000/TO/0002216/TX_Code_0002216.pdf' from dual union all
select 'TX/W/000/TO/0006002/TX_Code_0006002.pdf' from dual union all
select 'TX/W/006/CA/TX_WCA_006928.PDF' from dual union all
select 'TX/W/702/TO/7021/TO_Data Transmittal_00_11-09-029-21W2_0_2.pdf'
from dual union all
select 'TX/W/000/CA/TX_WCA_0007902.PDF' from dual
)
select regexp_substr(column1, '\d{4,7}') as result
from table1
where substr(column1, instr(column1, '/', 1, 4) + 1)
like
'%' || substr(column1, instr(column1, '/', 1, 2) + 1,
instr(column1, '/', 1, 3) - instr(column1, '/', 1, 2) - 1)
|| '%'
;
RESULT
--------
0001292
0001222
0001982
0002216
0006002
006928
7021
0007902
Solution for the modified (edited) question.
Assumptions: the input string has a three-digit string between the second and the third forward slash. If the substring starting at the fourth forward slash contains a sub-substring of four or more consecutive digits, take the first such occurrence, and return the row only if the first three digits of this substring equals the substring found between the second and the third forward slash.
NOTE: The OP keeps looking for a string of digits between four and seven digits long. But with nothing following \d{4,7}, this will also match inside strings of digits of any length greater than or equal to 4. If the requirement is that the string of digits in the "fifth token" be no longer than 7 digits, that can be accommodated, but it didn't seem to be part of the problem.
With that said: (notice that the last row, which I added for further testing, is NOT selected in the output - only matching at the BEGINNING of the string of 4-7 digits is valid, matching later in the string is not enough).
with
table1 (column1) as (
select 'TX/W/000/W/0001292SC_00_11-11-091-26W2_2.pdf' from dual union all
select 'TX/W/000/TO/0001222/TX_Code_0001222.pdf' from dual union all
select 'TX/W/000/TO/0001982/TX_Code_0001982.pdf' from dual union all
select 'TX/W/000/TO/0002216/TX_Code_0002216.pdf' from dual union all
select 'TX/W/000/TO/0006002/TX_Code_0006002.pdf' from dual union all
select 'TX/W/006/CA/TX_WCA_006928.PDF' from dual union all
select 'TX/W/702/TO/7021/TO_Data Transmittal_00_11-09-029-21W2_0_2.pdf'
from dual union all
select 'TX/W/000/CA/TX_WCA_0007902.PDF' from dual union all
select 'TX/W/007/ZZ/TX_WCA_0007902.PDF' from dual
)
select regexp_substr(column1, '\d{4,7}') as result
from table1
where regexp_substr(column1, '^([^/]*/){2}(\d{3})', 1, 1, null, 2)
=
regexp_substr(column1, '^([^/]*/){4}\D*(\d{3})\d+', 1, 1, null, 2)
;
RESULT
--------------------------------------------------------------
0001292
0001222
0001982
0002216
0006002
006928
7021
0007902
In SELECT we pick the first four to seven digits from the "long string of digits" (not sure if seven is guaranteed to be the upper bound, or if there may be longer strings of digits but at most seven digits must be selected, etc.)
The first regular expression in the WHERE clause is anchored at the beginning of the string (the ^ character at the beginning of the regular expression); then it looks for two occurrences of (zero or more non-slash characters followed by one slash), and then three digits. Parentheses in a regular expression create subexpressions, which can be referenced in the REGEXP functions. The sixth argument, 2, means select the second subexpression - in this case, the three digits after the second slash.
The last regular expression is similar: look for the fourth slash, followed by zero or more non-digits (\D) followed by three digits AS A SUBEXPRESSION (enclosed in parentheses) and immediately followed by at least one more digit (to enforce the requirement that the string of digits be at least FOUR digits long). Select the second subexpression, meaning the three digits at the beginning of the "long string of digits". Note that if no such "long string of digits" even exists, then the WHERE clause will automatically fail (that row will not be selected).
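To see the two WHERE-clause expressions in isolation, here is a small sketch that applies each of them to a single literal taken from the sample data (the column aliases are made up for illustration):

```sql
select
  -- three digits right after the second slash
  regexp_substr('TX/W/006/CA/TX_WCA_006928.PDF',
                '^([^/]*/){2}(\d{3})', 1, 1, null, 2) as folder_code,
  -- first three digits of the first run of four or more digits after the fourth slash
  regexp_substr('TX/W/006/CA/TX_WCA_006928.PDF',
                '^([^/]*/){4}\D*(\d{3})\d+', 1, 1, null, 2) as number_prefix
from dual;
-- folder_code = 006, number_prefix = 006
```

The row is kept because the two values are equal. For 'TX/W/007/ZZ/TX_WCA_0007902.PDF' the values would be 007 and 000, so that row is filtered out.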
You could do:
select regexp_substr (column1,'\d{4,7}') as result_set
from table1
where regexp_substr (regexp_substr (column1,'\d{4,7}'), '006') is not null;
or with regexp_like():
select regexp_substr (column1,'\d{4,7}') as result_set
from table1
where regexp_like (regexp_substr (column1,'\d{4,7}'), '006');
or just plain non-regex like:
select regexp_substr (column1,'\d{4,7}') as result_set
from table1
where regexp_substr (column1,'\d{4,7}') like '%006%';
With your sample data, any of those return:
RESULT_SET
--------------------------------------------------------------
0006002
006928

How to get character or string after nth occurrence of pipeline '|' symbol in ORACLE using REGULAR_EXPRESSION?

What is the regular expression query to get character or string after nth occurrence of pipeline | symbol in ORACLE? For example I have two strings as follows,
Jack|Sparrow|17-09-16|DY7009|Address at some where|details
|Jack|Sparrow|17-09-16||Address at some where|details
I want 'DY7009', which comes after the 3rd pipe symbol counting from the 1st position. What regular expression does this? In the second string the 1st character is a | symbol; there I want the 4th string, and if it has no value the query should return NULL or a blank value.
select regexp_substr('Jack|Sparrow|17-09-16|DY7009|Address at some where|details'
,' ?? --REX Exp-- ?? ') as col
from dual;
Result - DY7009
select regexp_substr('|Jack|Sparrow|17-09-16||Address at some where|details'
,' ?? --REX Exp-- ?? ') as col
from dual;
Result - '' or (i.e. NULL)
So what should be the regexp? Please help. Thank you in Advance
NEW UPDATE Edit ---
Thank you all, I appreciate your answers! I think I didn't ask the question right. I just want a regular expression that gets the string after the nth occurrence of the pipe symbol. I don't want to replace anything, so regexp_substr alone should do the job.
----> If 'Jack|Sparrow|SQY778|17JULY17||00J1' is a string
I want to find the string after the 2nd pipe symbol; here the answer is SQY778. If I want the string after the 3rd pipe symbol, the answer is 17JULY17. If I want the value after the 4th pipe symbol, it should give BLANK or NULL, because there is nothing between the 4th and 5th pipe symbols. If I want the string after the 5th symbol, I will change only one digit in the regular expression, i.e. 5, and get 00J1 as the result.
Here ya go. Replace the 4th argument to regexp_substr() with the number of the field you want.
with tbl(str) as (
select 'Jack|Sparrow|17-09-16|DY7009|Address at some where|details ' from dual
)
select regexp_substr(str, '(.*?)(\||$)', 1, 4, NULL, 1) field_4
from tbl;
FIELD_4
--------
DY7009
SQL>
To list all the fields:
with tbl(str) as (
select 'Jack|Sparrow|17-09-16|DY7009|Address at some where|details ' from dual
)
select regexp_substr(str, '(.*?)(\||$)', 1, level, NULL, 1) split
from tbl
connect by level <= regexp_count(str, '\|')+1;
SPLIT
-------------------------
Jack
Sparrow
17-09-16
DY7009
Address at some where
details
6 rows selected.
SQL>
So if you want to select specific fields, you could use:
with tbl(str) as (
select 'Jack|Sparrow|17-09-16|DY7009|Address at some where|details ' from dual
)
select
regexp_substr(str, '(.*?)(\||$)', 1, 1, NULL, 1) first,
regexp_substr(str, '(.*?)(\||$)', 1, 2, NULL, 1) second,
regexp_substr(str, '(.*?)(\||$)', 1, 3, NULL, 1) third,
regexp_substr(str, '(.*?)(\||$)', 1, 4, NULL, 1) fourth
from tbl;
Note this regex handles NULL elements and will still return the correct value. Some of the other answers use the form '[^|]+' for parsing the string but this fails when there is a NULL element and should be avoided. See here for proof: https://stackoverflow.com/a/31464699/2543416
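A short sketch of that difference, using the string from the question's update, which has an empty fifth field (the column aliases are made up for illustration):

```sql
with tbl(str) as (
  select 'Jack|Sparrow|SQY778|17JULY17||00J1' from dual
)
select
  -- '[^|]+' skips the empty element, so the 5th match is really the 6th field
  regexp_substr(str, '[^|]+', 1, 5)                as naive_field_5,
  -- '(.*?)(\||$)' counts the empty element, so the 5th match is the empty field
  regexp_substr(str, '(.*?)(\||$)', 1, 5, NULL, 1) as field_5
from tbl;
-- naive_field_5 = 00J1, field_5 = NULL
```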
Don't have enough reputation to comment on Chris Johnson's answer so adding my own. Chris has the correct approach of using back-references but forgot to escape the Pipe character.
The regex will look like this.
WITH dat
AS (SELECT 'Jack|Sparrow|17-09-16|DY7009|Address at some where|details' AS str,
3 AS pos
FROM DUAL
UNION
SELECT ' |Jack|Sparrow|17-09-16||Address at some where|details' AS str,
4 AS pos
FROM DUAL)
SELECT str,
pos,
REGEXP_REPLACE (str, '^([^\|]*\|){' || pos || '}([^\|]*)\|.*$', '\2')
AS regex_result
FROM dat;
I'm creating the regex dynamically by adding the position of the Pipe character dynamically.
The result looks like this.
|Jack|Sparrow|17-09-16||Address at some where|details (4):
Jack|Sparrow|17-09-16|DY7009|Address at some where|details (3): DY7009
You can use regexp_replace to get the nth field through a back-reference. In your example, the fourth field could be retrieved like this:
select regexp_replace(
'Jack|Sparrow|17-09-16|DY7009|Address at some where|details',
'^([^\|]*\|){3}([^\|]*)\|.*$',
'\2'
) as col
from dual;
Edit: Thanks Arijit Kanrar for pointing out the missing escape characters.
To OP: regexp_replace doesn't replace anything in the database, only in the returned string.
You can use this query to get the value at a specific position (the nth occurrence):
SELECT nth_string
FROM
(SELECT TRIM (REGEXP_SUBSTR (long_string, '[^|]+', 1, ROWNUM) ) nth_string ,
level AS lvl
FROM
(SELECT REPLACE('Jack|Sparrow|17-09-16|DY7009|Address at some where|details','||','| |') long_string
FROM DUAL
)
CONNECT BY LEVEL <= REGEXP_COUNT ( long_string, '[^|]+')
)
WHERE lvl = 4;
Note that I am using the standard Oracle query for splitting a delimited string into records. To handle a blank between delimiters, as in your second case, I replace it with a space ' '. The space is converted to NULL by the TRIM() function.
You can get any nth record by replacing the number in lvl = at the end of the query.
Let me know your feedback. Thanks.
EDIT:
It does not seem to work with regexp_substr() alone, as there is no way to convert the blank between '||' to an Oracle NULL. So an intermediate TRIM() was required, and I am adding a REPLACE to make it easier. There may be patterns that match this scenario directly, but I could not find them.
Here are all scenarios for the 4th occurrence.
WITH t
AS (SELECT '|Jack|Sparrow|SQY778|17JULY17||00J1' long_string
FROM dual
UNION ALL
SELECT 'Jack|Sparrow|SQY778|17JULY17||00J1' long_string
FROM dual
UNION ALL
SELECT '||Jack|Sparrow|SQY778|17JULY17|00J1' long_string
FROM dual)
SELECT long_string,
Trim (Regexp_substr (mod_string, '\|([^|]+)', 1, 4, NULL, 1)) nth_string
FROM (SELECT long_string,
Replace(long_string, '||', '| |') mod_string
FROM t) ;
LONG_STRING NTH_STRING
------------------------ -----------
|Jack|Sparrow|SQY778|17JULY17||00J1 17JULY17
Jack|Sparrow|SQY778|17JULY17||00J1 NULL
||Jack|Sparrow|SQY778|17JULY17|00J1 SQY778
EDIT2: Finally, a pattern that gives the solution. Thanks to Gary_W.
To get the nth occurrence from the string, use:
WITH t
AS (SELECT '|Jack|Sparrow|SQY778|17JULY17||00J1' long_string
FROM dual
UNION ALL
SELECT 'Jack|Sparrow|SQY778|17JULY17||00J1' long_string
FROM dual
UNION ALL
SELECT '||Jack|Sparrow|SQY778|17JULY17|00J1' long_string
FROM dual)
SELECT long_string,
Trim (regexp_substr (long_string, '(.*?)(\||$)', 1, :n + 1, NULL, 1)) nth_string
FROM t;

split string in oracle query

I am trying to fetch phone numbers from my Oracle database table. The phone numbers may be separated with comma or "/". Now I need to split those entries which have a "/" or comma and fetch the first part.
Follow this approach:
with t as (
select 'Test 1' name from dual
union
select 'Test 2, extra 3' from dual
union
select 'Test 3/ extra 3' from dual
union
select ',extra 4' from dual
)
select
name,
regexp_instr(name, '[/,]') pos,
case
when regexp_instr(name, '[/,]') = 0 then name
else substr(name, 1, regexp_instr(name, '[/,]')-1)
end first_part
from
t
order by first_part
;
Look up the substr and instr functions, or solve the puzzle using regexp.
I added a table test with one column phone_num. And added rows similar to your description.
select *
from test;
PHONE_NUM
------------------------------
0123456789
0123456789/1234
0123456789,1234
3 rows selected.
select
case
when instr(phone_num, '/') > 0 then substr(phone_num, 1, instr(phone_num, '/')-1)
when instr(phone_num, ',') > 0 then substr(phone_num, 1, instr(phone_num, ',')-1)
else phone_num
end phone_num
from test;
PHONE_NUM
------------------------------
0123456789
0123456789
0123456789
3 rows selected.
This generally works, although it will give the wrong result for rows that contain both a comma and a slash when the comma comes first.
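One possible way around that limitation is a single character class that treats both delimiters alike. This is a sketch on made-up sample rows rather than the original table; the third row contains a comma followed by a slash:

```sql
with test(phone_num) as (
  select '0123456789'           from dual union all
  select '0123456789/1234'      from dual union all
  select '0123456789,1234/5678' from dual
)
select phone_num,
       -- everything from the start of the string up to the first '/' or ','
       regexp_substr(phone_num, '^[^/,]*') as first_part
from test;
-- first_part = 0123456789 for all three rows
```

A value that begins with a delimiter yields NULL here, because the empty match is returned as NULL.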

how to replace multiple strings together in Oracle

I have a string coming from a table, like "can no pay{1},as your payment{2}due on {3}". I want to replace {1}, {2} and {3} each with some value.
Is it possible to replace all three in one replace function, or is there any way I can write a query directly and get the replaced value? I want to replace these strings in an Oracle stored procedure. The original string comes from one of my tables (I am just doing a select on that table),
and then I want to replace the {1}, {2}, {3} placeholders in that string with values that I have from another table.
Although it is not one call, you can nest the replace() calls:
SET mycol = replace( replace(mycol, '{1}', 'myoneval'), '{2}', mytwoval)
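As a self-contained sketch of the nesting, here it is applied to the string from the question with all three placeholders. The replacement values 'myoneval', 'mytwoval' and 'mythreeval' are placeholders chosen for illustration:

```sql
select replace(
         replace(
           replace('can no pay{1},as your payment{2}due on {3}',
                   '{1}', 'myoneval'),
           '{2}', 'mytwoval'),
         '{3}', 'mythreeval') as replaced
from dual;
-- replaced = can no paymyoneval,as your paymentmytwovaldue on mythreeval
```

Each call works on the result of the call inside it, so the order only matters if one replacement value could itself contain a later placeholder.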
If there are many variables to replace, you keep them in another table, and their number is not fixed, you can use a recursive CTE to replace them.
An example below. In table fg_rulez you put the strings with their replacement. In table fg_data you have your input strings.
set define off;
drop table fg_rulez;
create table fg_rulez as
select 1 id,'<' symbol, 'less than' text from dual
union all select 2, '>', 'great than' from dual
union all select 3, '$', 'dollars' from dual
union all select 4, '&', 'and' from dual;
drop table fg_data;
create table fg_Data AS(
SELECT 'amount $ must be < 1 & > 2' str FROM dual
union all
SELECT 'John is > Peter & has many $' str FROM dual
union all
SELECT 'Eliana is < mary & do not has many $' str FROM dual
);
WITH q(str, id) as (
SELECT str, 0 id
FROM fg_Data
UNION ALL
SELECT replace(q.str,symbol,text), fg_rulez.id
FROM q
JOIN fg_rulez
ON q.id = fg_rulez.id - 1
)
SELECT str from q where id = (select max(id) from fg_rulez);
So, a single replace.
Result:
amount dollars must be less than 1 and great than 2
John is great than Peter and has many dollars
Eliana is less than mary and do not has many dollars
The terminology symbol instead of variable comes from this duplicated question.
Oracle 11gR2
Let's write the same sample as a CTE only:
with fg_rulez as (
select 1 id,'<' symbol, 'less than' text from dual
union all select 2, '>', 'greater than' from dual
union all select 3, '$', 'dollars' from dual
union all select 4, '+', 'and' from dual
), fg_Data AS (
SELECT 'amount $ must be < 1 + > 2' str FROM dual
union all
SELECT 'John is > Peter + has many $' str FROM dual
union all
SELECT 'Eliana is < mary + do not has many $' str FROM dual
), q(str, id) as (
SELECT str, 0 id
FROM fg_Data
UNION ALL
SELECT replace(q.str,symbol,text), fg_rulez.id
FROM q
JOIN fg_rulez
ON q.id = fg_rulez.id - 1
)
SELECT str from q where id = (select max(id) from fg_rulez);
If the number of values to replace is too large, or you need to be able to maintain it easily, you could also split the string, use a dictionary table and finally aggregate the results.
In the example below I'm assuming that the words in your string are separated by spaces and that the word count in a string will not exceed 100 (the cardinality of the pivot table).
with Dict as
(select '{1}' String, 'myfirstval' Repl from dual
union all
select '{2}' String, 'mysecondval' Repl from dual
union all
select '{3}' String, 'mythirdval' Repl from dual
union all
select '{Nth}' String, 'myNthval' Repl from dual
)
,MyStrings as
(select 'This is the first example {1} ' Str, 1 strnum from dual
union all
select 'In the Second example all values are shown {1} {2} {3} {Nth} ', 2 from dual
union all
select '{3} Is the value for the third', 3 from dual
union all
select '{Nth} Is the value for the Nth', 4 from dual
)
-- pivot is used to split the strings from MyStrings. We use a cartesian join for this
,pivot as (
Select Rownum Pnum
From dual
Connect By Rownum <= 100
)
-- StrtoRow is basically a cartesian join between MyStrings and Pivot.
-- There are as many rows as individual string elements in the MyStrings table
-- (Max = number of rows in the MyStrings table * 100).
,StrtoRow as
(
SELECT rownum rn
,ms.strnum
,REGEXP_SUBSTR (Str,'[^ ]+',1,pv.pnum) TXT
FROM MyStrings ms
,pivot pv
where REGEXP_SUBSTR (Str,'[^ ]+',1,pv.pnum) is not null
)
-- This is the main Select.
-- With the listagg function we group the string together in lines using the key strnum (group by)
-- The NVL gets the translations:
-- if there is a Repl (Replacement from the dict table) then provide it,
-- Otherwise TXT (string without translation)
Select Listagg(NVL(Repl,TXT),' ') within group (order by rn)
from
(
-- outer join between strings and the translations (not all strings have translations)
Select sr.TXT, d.Repl, sr.strnum, sr.rn
from StrtoRow sr
,dict d
where sr.TXT = d.String(+)
order by strnum, rn
) group by strnum
If you are doing this inside a select and your replacement values are columns, you can just piece the result together using string concatenation.
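A minimal sketch of the concatenation approach, assuming the fixed parts of the message are known in advance. The CTE vals and its columns val1, val2 and val3 are hypothetical stand-ins for the table that holds the replacement values:

```sql
with vals(val1, val2, val3) as (
  select 'myoneval', 'mytwoval', 'mythreeval' from dual
)
-- the literal pieces are the text around {1}, {2} and {3} in the original string
select 'can no pay' || val1 || ',as your payment' || val2 || 'due on ' || val3 as message
from vals;
-- message = can no paymyoneval,as your paymentmytwovaldue on mythreeval
```

This only fits when the template text lives in the query itself. If the template is read from a table, as in the question, the replace-based approaches above are the better match.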

Resources