Regex is not handling chr(10) character - oracle

SELECT
REGEXP_SUBSTR (
str,
'(.*?)(~-delim~-|$)',
1,
LEVEL,
NULL,
1
) output
FROM (
SELECT 'Line1~-delim~-Line2' AS str FROM DUAL
)
CONNECT BY LEVEL <= REGEXP_COUNT (str, '~-delim~-') + 1
Output is
Line1
Line2
SELECT
REGEXP_SUBSTR (
str,
'(.*?)(~-delim~-|$)',
1,
LEVEL,
NULL,
1
) output
FROM (
SELECT 'Line1'||chr(10)||'~-delim~-Line2' AS str FROM DUAL
)
CONNECT BY LEVEL <= REGEXP_COUNT (str, '~-delim~-') + 1
Output is
Line2
Why the the newline is causing the string to suffer? I would expect the output to be:
Line1
Line2

By default, the Oracle regexp engine does not match the wildcard . to the newline character (chr(10)).
You can change that behavior by using the fifth argument to regexp_substr. Currently you assigned null to it. Change that to 'n' (including the single-quotes) and try again. You will get what you need.
Then check the documentation for REGEXP_SUBSTR, specifically about the fifth parameter - you will see what other options are available. You may not need them now, but perhaps you will remember them when you need them in the future.

Related

Need to extract text with REGEXP_SUBSTR - cannot find the right combination

Here is an example of my text- I am trying to get TEXTPART3 AS THE ANSWER:
TEXTPART1 : TEXTPART2: TEXTPART3 - TEXTPART4
I used TRIM(LEADING ':' FROM REGEXP_SUBSTR('textstatementhere', ':.+?-')) - but it is not accounting for the two ":" and the "-" in the text statement I get ' TEXTPART2: TEXTPART3 -'
Can anyone help?
Thanks in advance!
The problem has a more efficient solution using only standard string functions:
with
sample_input (str) as (
select 'TEXTPART1 : TEXTPART2: TEXTPART3 - TEXTPART4' from dual
)
select substr(str, pos, instr(str, '-', pos) - pos - 1) as text_part_3
from (select str, instr(str, ':', 1, 2) + 2 as pos from sample_input)
;
TEXT_PART_3
-----------
TEXTPART3
Leverage the REGEXP_SUBSTR Function to Do All of Your Work
Use a subexpression, (\w), and reference it:
WITH exmple AS (
SELECT
'TEXTPART1 : TEXTPART2: TEXTPART3 - TEXTPART4' txt
FROM
dual
)
SELECT
txt,
regexp_substr(txt, ': (\w*) -', 1, 1, NULL,
1)
FROM
exmple;
I see that you used ., in lieu of \w. Because you chose the meta-character,. (which represents all characters except new line (though that can be included if "n" is set as a pattern matching modifier)), the second colon is thrown in to the matching set.
What does TEXTPART3 include?
Perhaps the meta-character, \w, (which stands for alphanumeric or underscore (_) character), is not what you need.
You could replace it with a non-matching character list to avoid the problems with .:
[^:].
This approach would look like this:
REGEXP_SUBSTR(txt, ': ([^:]*) -',1,1,NULL,1)
Lastly, with the quantiifier associated with this subexpression, I used * which means zero or more matches. I assumed that there would be instances where there could be zero matches for this TEXTPART3. If this is not the case, we can use +.

PL/SQL - Split string into an associative array

In plsql is there a way to split a string into an associative array?
Sample string: 'test1:First string, test2: Second string, test3: Third string'
INTO
TYPE as_array IS TABLE OF VARCHAR2(50) INDEX BY VARCHAR2(50);
a_array as_array;
dbms_output.put_line(a_array('test1')); // Output 'First string'
dbms_output.put_line(a_array('test2')); // Output 'Second string'
dbms_output.put_line(a_array('test3')); // Output 'Third string'
The format of the string does not matter for my purposes. It could be 'test1-First string; test2-Second string; test3-Third string'. I could do this with a very large function manually splitting by commas first and then splitting each of those but I'm wondering if there is something built in to the language.
Like I said, I am not looking to do it through a large function (especially using substr and making it look messy). I am looking for something that does my task simpler.
There is no built in function for such a requirement.
But you can easily build a query like below to parse these strings:
SELECT y.*
FROM (
select trim(regexp_substr(str,'[^,]+', 1, level)) as str1
from (
SELECT 'test1:First string, test2: Second string, test3: Third string' as Str
FROM dual
)
connect by regexp_substr(str, '[^,]+', 1, level) is not null
) x
CROSS APPLY(
select trim(regexp_substr(str1,'[^:]+', 1, 1)) as key,
trim(regexp_substr(str1,'[^:]+', 1, 2)) as value
from dual
) y
KEY VALUE
------ --------------
test1 First string
test2 Second string
test3 Third string
Then you may use this query in your function and pass it's result to the array.
I leave this exercise for you, I believe you can manage it (tip: use Oracle's bulk collect feature)
This method handles NULL list elements if you need to still show that element 2 is NULL for example. Note the second element is NULL:
-- Original data with multiple delimiters and a NULL element for testing.
with orig_data(str) as (
select 'test1:First string,, test3: Third string' from dual
),
--Split on first delimiter (comma)
Parsed_data(rec) as (
select regexp_substr(str, '(.*?)(,|$)', 1, LEVEL, NULL, 1)
from orig_data
where str is not null
CONNECT BY LEVEL <= REGEXP_COUNT(str, ',') + 1
)
-- For testing-shows records based on 1st level delimiter
--select rec from parsed_data;
-- Split the record into columns
select trim(regexp_replace(rec, '^(.*):.*', '\1')) key,
trim(regexp_replace(rec, '^.*:(.*)', '\1')) value
from Parsed_data;
Watch out for the regex form of [^,]+ for parsing delimited strings, it fails on NULL elements. More Information

Regexp_substr find string not matching a group of characters

I have a string like mystr = 'value1~|~value2~|~ ... valuen". I need it as one column separated on rows like this:
value1
value2
...
valuen
I'm trying this
select regexp_substr(mystr, '[^(~\|~)]', 1 , lvl) from dual, (select level as lvl from dual connect by level <= 5);
The problem is that ~|~ is not treated as a group, if I add ~ to anywhere in the string it gets separated; also () are treated as separators.
Any help is highly appreciated! Thanks! ~|~
Quick and dirty solution:
with t as (
select rtrim(regexp_substr('value1~|~value2~|~value3~|~value4', '(.+?)($|~\|~)', 1,level,''),'~|~')value from dual connect by level<10
) select * from t where value is not null;
[] signifies a single character match and [^] signifies a single character that does not match any of the contained characters.
So [^(~\|~)] will match any one character that is not ( or ~ or \ or | or ~ (again) or ).
What you want is a match that is terminated by your separator:
SELECT REGEXP_SUBSTR(
mystr,
'(.*?)(~\|~)',
1,
LEVEL,
NULL,
1
)
FROM DUAL
CONNECT BY LEVEL < REGEXP_COUNT( mystr, '(.*?)(~\|~)' );
(or if you cannot have zero-width matches, you can use the regular expression '(.+?)(~\|~)' and <= in the CONNECT BY clause.)
This will parse the delimited list and the format of the regex will handle NULL list elements should they occur as shown in the example.
SQL> with tbl(str) as (
select 'value1~|~value2~|~~|~value4' from dual
)
select regexp_substr(str, '(.*?)(~\|~|$)', 1, level, NULL, 1) parsed
from tbl
connect by level <= regexp_count(str, '~\|~')+1;
PARSED
--------------------------------
value1
value2
value4
SQL>

How to apply regular expression on the below given string

i have a string 'MCDONALD_YYYYMMDD.TXT' i need to use regular expressions and append the '**' after the letter 'D' in the string given . (i.e In the string at postion 9 i need to append '*' based on a column value 'star_len'
if the star_len = 2 the o/p = ''MCDONALD??_YYYYMMDD.TXT'
if the star_len = 1 the o/p = ''MCDONALD?_YYYYMMDD.TXT'
with
inputs ( filename, position, symbol, len ) as (
select 'MCDONALD_20170812.TXT', 9, '*', 2 from dual
)
-- End of simulated inputs (for testing purposes only, not part of the solution).
-- SQL query begins BELOW THIS LINE.
select substr(filename, 1, position - 1) || rpad(symbol, len, symbol)
|| substr(filename, position) as new_str
from inputs
;
NEW_STR
-----------------------
MCDONALD**_20170812.TXT
select regexp_replace('MCDONALD_YYYYMMDD.TXT','MCDONALD','MCDONALD' ||
decode(star_len,1,'*',2,'**'))
from dual
This is how you could do it. I don't think you need it as a regular expression though if it is always going to be "MCDONALD".
EDIT: If you need to be providing the position in the string as well, I think a regular old substring should work.
select substr('MCDONALD_YYYYMMDD.TXT',1,position-1) ||
decode(star_len,1,'*',2,'**') || substr('MCDONALD_YYYYMMDD.TXT',position)
from dual
Where position and star_len are both columns in some table you provide(instead of dual).
EDIT2: Just to be more clear, here is another example using a with clause so that it runs without adding a table in.
with testing as
(select 'MCDONALD_YYYYMMDD.TXT' filename,
9 positionnum,
2 star_len
from dual)
select substr(filename,1,positionnum-1) ||
decode(star_len,1,'*',2,'**') ||
substr(filename,positionnum)
from testing
For the fun of it, here's a regex_replace solution. I went with a star since that what your variable was called even though your example used a question mark. The regex captures the filename string in 2 parts, the first being from the start up to 1 character before the position value, the second the rest of the string. The replace puts the captured parts back together with the stars in between.
with tbl(filename, position, star_len ) as (
select 'MCDONALD_20170812.TXT', 9, 2 from dual
)
select regexp_replace(filename,
'^(.{'||(position-1)||'})(.*)$', '\1'||rpad('*', star_len, '*')||'\2') as fixed
from tbl;

How to add new line separator into data for oracle

I'm working on displaying data from oracle.
is there a way to make the following data inside the table:
example :
'1.somedata, 2.somedata, 3.somedata, 4.somedata, 5.somedata'
to display like:
example:
'1. somedata
2. somedata
3. somedata
4. somedata
5. somedata'
on the interface?
do i add new line separator directly into the data?
or do i separator them into new line when i query it?
or is there any other simple way?
Thanks.
There are so many ways to do this, here is one if you are selecting from a column:
SELECT REPLACE ('1.somedata, 2.somedata, 3.somedata, 4.somedata, 5.somedata', ',', CHR (13) || CHR (10)) AS split
FROM DUAL;
1.somedata
2.somedata
3.somedata
4.somedata
5.somedata
I personally would use the listagg function and use '' as the delimiter.
SELECT LISTAGG(last_name, ' ')
WITHIN GROUP (ORDER BY hire_date, last_name) "Emp_list",
MIN(hire_date) "Earliest"
FROM employees
WHERE department_id = 30;
Remember that Apex is generating a web page, which means the end result is HTML. Apex, however, will also sometimes escape special HTML characters for you, like < and &. Since you're viewing a table, I assume the source of your data is a query and your "somedata" field is a single column. Try this:
SELECT REPLACE( somedata_column, ',', '<br />' )
FROM mytable
You don't say what version of Apex. In Apex 4.x, the column would need to be set to a Standard Report Column, which would stop Apex from the <br> elements. I forget what the column type is in Apex 5.x.
Check below sample query which converts coma separated list data into rows
SELECT substr( '1.AL,2.AL,3.AL,4.AL,5.AL,6.AL,',
( case when rownum = 1 then 1
else instr( '1.AL,2.AL,3.AL,4.AL,5.AL,6.AL,', ',', 1, rownum - 1 ) + 1
end ),
instr( substr( '1.AL,2.AL,3.AL,4.AL,5.AL,6.AL,',
( case when rownum = 1 then 1
else instr( '1.AL,2.AL,3.AL,4.AL,5.AL,6.AL,', ',', 1, rownum - 1 ) + 1
end )
), ',' ) - 1
) as data
FROM dual
CONNECT BY LEVEL <= length( '1.AL,2.AL,3.AL,4.AL,5.AL,6.AL,' ) - length ( replace('1.AL,2.AL,3.AL,4.AL,5.AL,6.AL,', ',') )
Hope this will help you!

Resources