How to get the second last word in a string? - oracle

I have a string like the one below, and I need to get 10472314 from it. It is the second-to-last word in this case. How can I get it in a PL/SQL block? Is there a string function for this?
processed "SCOTT"."PRINCE05" 10472314 rows

The set of digits before the word 'rows' at the end of the string:
regexp_replace(text, '^(.+ )([0-9]+)( rows)$', '\2')
The third word:
regexp_substr(text, '\S+', 1, 3)
The second last word (the nth word where n = the number of words -1):
regexp_substr(text, '\S+', 1, regexp_count(text,'\S+') -1)
If you are processing a significant number of rows then the regex functions can be slow. The trade-off you have to make is between the expressiveness of regular expressions and the performance of plain substr and instr. Personally I prefer regexes unless there is a clear performance issue.
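To make the word-counting idea concrete outside the database, here is a minimal Python sketch of the same `\S+` logic. It is only an illustration, not Oracle code: the helper names `nth_word` and `second_last_word` are hypothetical, and Python's `re` module stands in for REGEXP_SUBSTR and REGEXP_COUNT.

```python
import re

def nth_word(text, n):
    """Return the n-th run of non-whitespace characters (1-based), or None."""
    words = re.findall(r"\S+", text)
    return words[n - 1] if 1 <= n <= len(words) else None

def second_last_word(text):
    """Return the word at position (word count - 1), or None if there is none."""
    words = re.findall(r"\S+", text)
    return words[-2] if len(words) >= 2 else None

text = 'processed "SCOTT"."PRINCE05" 10472314 rows'
print(nth_word(text, 3))
print(second_last_word(text))
```

For the sample string both calls pick out the same token, because the string has exactly four words.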

This gets the last two words of the string and then takes the first of those two, which is always the second-last word. Note that SUBSTRING_INDEX is a MySQL function; Oracle does not provide it, so this only helps if you are on MySQL:
SUBSTRING_INDEX(SUBSTRING_INDEX(text, ' ', -2), ' ', 1)

If we pass a negative position to INSTR, it searches the string backward from the end. That lets us locate the last two spaces and take the word between them.
It looks dirty, but it works for me.
SELECT SUBSTR( name, INSTR( name, ' ', -1 , 2 ) + 1 , INSTR( name, ' ', -1 , 1 ) - INSTR( name, ' ', -1 , 2 ) - 1 )
FROM prince_test;
INSTR( name, ' ', -1 , 2 ) + 1 ==> the position just after the second-to-last space, where the word starts.
INSTR( name, ' ', -1 , 1 ) - INSTR( name, ' ', -1 , 2 ) - 1 ==> the number of characters between the last two spaces, that is, the length of the word.
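The backward search can be sketched in Python with `str.rfind`, which plays roughly the role of INSTR with a negative position. This is an illustration of the idea rather than Oracle code; the function name is hypothetical, and it assumes words are separated by single spaces.

```python
def second_last_word_by_position(name):
    """Find the last two spaces from the right and return the text between them."""
    last_space = name.rfind(" ")
    if last_space == -1:
        return None  # only one word, so there is no second-last word
    # Search again, but only in the part before the last space.
    prev_space = name.rfind(" ", 0, last_space)
    # If prev_space is -1 the slice starts at index 0, i.e. the first word.
    return name[prev_space + 1:last_space]

print(second_last_word_by_position('processed "SCOTT"."PRINCE05" 10472314 rows'))
```

Slicing between the two positions excludes both spaces, so the result carries no leading or trailing blank.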

Related

Need to extract text with REGEXP_SUBSTR - cannot find the right combination

Here is an example of my text. I am trying to get TEXTPART3 as the answer:
TEXTPART1 : TEXTPART2: TEXTPART3 - TEXTPART4
I used TRIM(LEADING ':' FROM REGEXP_SUBSTR('textstatementhere', ':.+?-')), but it does not account for the two ":" and the "-" in the text, so I get ' TEXTPART2: TEXTPART3 -'.
Can anyone help?
Thanks in advance!
The problem has a more efficient solution using only standard string functions:
with
sample_input (str) as (
select 'TEXTPART1 : TEXTPART2: TEXTPART3 - TEXTPART4' from dual
)
select substr(str, pos, instr(str, '-', pos) - pos - 1) as text_part_3
from (select str, instr(str, ':', 1, 2) + 2 as pos from sample_input)
;
TEXT_PART_3
-----------
TEXTPART3
Leverage the REGEXP_SUBSTR Function to Do All of Your Work
Use a subexpression, (\w*), and reference it:
WITH exmple AS (
SELECT
'TEXTPART1 : TEXTPART2: TEXTPART3 - TEXTPART4' txt
FROM
dual
)
SELECT
txt,
regexp_substr(txt, ': (\w*) -', 1, 1, NULL,
1)
FROM
exmple;
I see that you used . instead of \w. The metacharacter . matches any character except a newline (and newlines too if "n" is set as a pattern-matching modifier), so the match begins at the first colon and the second colon is swept into it.
What does TEXTPART3 include?
Perhaps the metacharacter \w (which stands for an alphanumeric or underscore (_) character) is not what you need.
You could replace it with a non-matching character list to avoid the problems with .:
[^:]
This approach would look like this:
REGEXP_SUBSTR(txt, ': ([^:]*) -',1,1,NULL,1)
Lastly, for the quantifier on this subexpression I used *, which means zero or more matches. I assumed there could be instances where TEXTPART3 is empty. If that is not the case, we can use + instead.
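The difference between the two patterns can be reproduced in a small Python sketch. Python's `re` module is not Oracle's regex engine, so treat this as an illustration of the matching behaviour under the assumption that both engines agree on these simple patterns; the variable names are hypothetical.

```python
import re

txt = "TEXTPART1 : TEXTPART2: TEXTPART3 - TEXTPART4"

# The original pattern: the match starts at the FIRST colon, and the lazy .+?
# then has to run across the second colon to reach the hyphen.
lazy_match = re.search(r":.+?-", txt).group(0)

# Excluding colons inside the capture group forces the match to start
# at the second colon; group(1) is the captured subexpression.
captured = re.search(r": ([^:]*) -", txt).group(1)

print(repr(lazy_match))
print(captured)
```

The first result still contains TEXTPART2 and the second colon, which is what the question observed; the second is the capture group alone.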

Oracle / PLSQL: TRANSLATE Function - Why does this return NULL?

I am new to Oracle / PLSQL and I was testing some code lines using TRANSLATE function.
Scenario 1:
DBMS_OUTPUT.put_line(TRANSLATE('27383', '0123456789', ' '));
DBMS_OUTPUT.put_line(LENGTH(TRANSLATE('27383', '0123456789', ' ')));
I get output as:
NULL
NULL
Scenario 2:
DBMS_OUTPUT.put_line(TRANSLATE('2021 01 01', '0123456789', ' '));
DBMS_OUTPUT.put_line(LENGTH(TRANSLATE('2021 01 01', '0123456789', ' ')));
I get output as:
5 space characters
5
I thought, as the to_string is one character ' ' (space), the first occurrence is getting replaced and other values are getting removed from the original string.
Like, in '2021 01 01', '0' is replaced with ' ' and the rest of the numerical characters are removed making it a string with 5 space characters (2 spaces it originally had plus 3 spaces from the replacement).
By this logic, in the first scenario, '2' should be replaced with ' ' removing other numbers. Remaining should be a string with one space character, but it's not what is happening.
Can someone explain to me what happens here?
Think of it as a one-to-one pairing of before/after values, and that we need a one-to-one pair for everything.
Thus considering: '0123456789', ' '
The first string has 10 "positions", so the second string needs 10 positions, namely:
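space,null,null,null,null,null,null,null,null,null
So
0 goes to space
1 goes to null
2 goes to null
and so forth.
In scenario 1 the input contains no 0, so every character is removed. The result is an empty string, which Oracle treats as NULL, and the LENGTH of NULL is also NULL. In scenario 2 the three 0s become spaces and the two original spaces are left untouched, which gives the 5 spaces you see.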
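The pairing rule can be emulated in a few lines of Python to check both scenarios. This is a rough model of the behaviour described here, not Oracle's implementation: `oracle_like_translate` is a hypothetical helper, and `None` stands in for NULL.

```python
def oracle_like_translate(expr, from_chars, to_chars):
    """Pair from_chars with to_chars by position; unpaired characters are removed."""
    if not expr or not from_chars or not to_chars:
        return None  # an empty string argument behaves like NULL
    mapping = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) in mapping:
            continue  # the first occurrence in from_chars wins
        # Characters with no partner in to_chars map to None, i.e. are deleted.
        mapping[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    result = expr.translate(mapping)
    return result if result else None  # an empty result behaves like NULL

print(repr(oracle_like_translate("27383", "0123456789", " ")))
print(repr(oracle_like_translate("2021 01 01", "0123456789", " ")))
```

Only the digit 0 has a partner (the space), so the first call has nothing left to return, while the second keeps its two original spaces and gains one for each 0.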

How to apply regular expression on the below given string

I have a string 'MCDONALD_YYYYMMDD.TXT'. I need to use regular expressions to append '**' after the letter 'D' in the string, i.e. at position 9 I need to insert '*' characters, with the count based on a column value 'star_len':
if star_len = 2, the output is 'MCDONALD??_YYYYMMDD.TXT'
if star_len = 1, the output is 'MCDONALD?_YYYYMMDD.TXT'
with
inputs ( filename, position, symbol, len ) as (
select 'MCDONALD_20170812.TXT', 9, '*', 2 from dual
)
-- End of simulated inputs (for testing purposes only, not part of the solution).
-- SQL query begins BELOW THIS LINE.
select substr(filename, 1, position - 1) || rpad(symbol, len, symbol)
|| substr(filename, position) as new_str
from inputs
;
NEW_STR
-----------------------
MCDONALD**_20170812.TXT
select regexp_replace('MCDONALD_YYYYMMDD.TXT','MCDONALD','MCDONALD' ||
decode(star_len,1,'*',2,'**'))
from dual
This is how you could do it. I don't think you need it as a regular expression though if it is always going to be "MCDONALD".
EDIT: If you need to be providing the position in the string as well, I think a regular old substring should work.
select substr('MCDONALD_YYYYMMDD.TXT',1,position-1) ||
decode(star_len,1,'*',2,'**') || substr('MCDONALD_YYYYMMDD.TXT',position)
from dual
Where position and star_len are both columns in some table you provide (instead of dual).
EDIT2: Just to be more clear, here is another example using a with clause so that it runs without adding a table in.
with testing as
(select 'MCDONALD_YYYYMMDD.TXT' filename,
9 positionnum,
2 star_len
from dual)
select substr(filename,1,positionnum-1) ||
decode(star_len,1,'*',2,'**') ||
substr(filename,positionnum)
from testing
For the fun of it, here's a regexp_replace solution. I went with a star since that's what your variable was called, even though your example used a question mark. The regex captures the filename in two parts: the first runs from the start up to one character before the position value, and the second is the rest of the string. The replacement puts the captured parts back together with the stars in between.
with tbl(filename, position, star_len ) as (
select 'MCDONALD_20170812.TXT', 9, 2 from dual
)
select regexp_replace(filename,
'^(.{'||(position-1)||'})(.*)$', '\1'||rpad('*', star_len, '*')||'\2') as fixed
from tbl;
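Both approaches in this thread (plain substring concatenation and the two-group regex) can be compared side by side in a short Python sketch. The function names and the `symbol` parameter are hypothetical, and Python slicing and `re.sub` stand in for SUBSTR and REGEXP_REPLACE.

```python
import re

def insert_symbols(filename, position, count, symbol="*"):
    """Split at the 1-based position and put `count` symbols in between."""
    head, tail = filename[:position - 1], filename[position - 1:]
    return head + symbol * count + tail

def insert_symbols_regex(filename, position, count, symbol="*"):
    """Same result via two capture groups: the first position-1 chars, then the rest."""
    pattern = r"^(.{%d})(.*)$" % (position - 1)
    return re.sub(pattern,
                  lambda m: m.group(1) + symbol * count + m.group(2),
                  filename)

print(insert_symbols("MCDONALD_20170812.TXT", 9, 2))
print(insert_symbols_regex("MCDONALD_20170812.TXT", 9, 2))
```

The two functions agree on the sample input; the slicing version avoids building a pattern at run time, which matches the point made above that a regex is not really needed here.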

removing character after 3 different specific characters

I'm having trouble getting the result shown below in Oracle. I have a lot of different data in field A and need to apply a few steps (listed below) to produce the result in field B.
Example:
LM2963NAMBLK-P/NOPB/SA should become LM2963NAM
remove '/' and all characters after it
remove the characters '-P'
remove the characters 'BLK'
The below should work:
select A,
regexp_replace(
regexp_replace(A, '-P|/.*', ''),
'^(.*)BLK$', '\1'
) B
--remainder of query
The inner regexp_replace will replace -P, or / followed by 0 or more occurrences of any character, with the empty string, leaving you with your desired output plus a possible BLK at the end. The | represents "or". I'm not logged in to Oracle at the moment, but the - and / shouldn't need to be escaped: / is not a special character in Oracle regex, and - is only special inside [ ]. The outer regexp_replace will replace BLK at the end with the empty string if it is there, or just return your string if it is not.
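The two-step replacement can be tried out in Python as a quick check of the patterns. This is a sketch using Python's `re.sub` in place of regexp_replace; the function name `clean_part_number` is hypothetical.

```python
import re

def clean_part_number(a):
    """Strip '-P' and everything from the first '/', then a trailing 'BLK'."""
    # Step 1: remove '-P', or a '/' together with everything after it.
    without_suffix = re.sub(r"-P|/.*", "", a)
    # Step 2: drop 'BLK' only when it sits at the very end of what is left.
    return re.sub(r"^(.*)BLK$", r"\1", without_suffix)

print(clean_part_number("LM2963NAMBLK-P/NOPB/SA"))
```

A value that contains none of the three markers passes through unchanged, since neither pattern matches.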
Well, this is one way...
select substr( substr( substr( 'LM2963NAMBLK-P/NOPB',1, slash_pos-1 ), 1, dash_p_pos-1 ), 1, blk_pos-1 )
from
(
select 'LM2963NAMBLK-P/NOPB'
, instr( 'LM2963NAMBLK-P/NOPB', '/' ) slash_pos
, instr( 'LM2963NAMBLK-P/NOPB', '-P' ) dash_p_pos
, instr( 'LM2963NAMBLK-P/NOPB', 'BLK' ) blk_pos
from dual
);
SUBSTR(SU
---------
LM2963NAM

REGEXP_SUBSTR for portion of string

I would like to get:
82961_01B04WZXQQSUGJ4YMRRT2A7TRHK_MR_2_1of1
from the following expression
LASTNAME_FIRSTNAME_82961_01B04WZXQQSUGJ4YMRRT2A7TRHK_MR_2_1of1
Does someone know how I can get this using regexp_substr ?
EDIT
Basically I have a field which has 7 sets each separated by _ . The string I gave is just one example. I wanted to retrieve everything after the second _ . There is no fixed character length so I can not use a substr function. Hence I was using regexp_substr. I was able to get away by using a simplified version
Select FILE_NAME, ( (REGEXP_SUBSTR(FILE_NAME,'[^_]+_',1,3)) ||
(REGEXP_SUBSTR(FILE_NAME,'[^_]+_',1,4)) ||
(REGEXP_SUBSTR(FILE_NAME,'[^_]+_',1,5)) ||
(REGEXP_SUBSTR(FILE_NAME,'[^_]+_',1,6)) ||
(REGEXP_SUBSTR(FILE_NAME,'[^_]+',1,7)) ) as RegExp
from tbl
Here is some more data from the FILE_NAME field
LAST_FIRST_82961_01B04WZXQQSUGJ4YMRRT2A7TRHK_MR_2_1of1
SMITH_JOHN_82961_0130BPQX9QZN9G4P5RDTPA9HR4R_MR_1_1of1
LASTNAME_FIRSTNAME_99999_01V0MU4XUQK0Y24Y9RYTFA7W1CM_MR_3_1of1
To get everything after the second underscore, you do not need regular expressions, but can use something like the following:
select substr(FILE_NAME, instr(FILE_NAME, '_', 1, 2) +1 ) from tbl
The instr returns the position of the second occurrence of '_', searching from the first character; the substr simply gets everything starting from the position given by instr + 1.
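The same "find the second underscore, then take the rest" logic looks like this in a Python sketch, run against the sample FILE_NAME values from the question. The helper name is hypothetical, and `str.find` stands in for instr; when there are fewer than two underscores the sketch returns the whole string, which is also what substr would do with a start position of 0 + 1.

```python
def after_second_underscore(file_name):
    """Return everything after the second '_' in file_name."""
    first = file_name.find("_")
    second = file_name.find("_", first + 1) if first != -1 else -1
    if second == -1:
        return file_name  # fewer than two underscores: nothing to cut off
    return file_name[second + 1:]

samples = [
    "LAST_FIRST_82961_01B04WZXQQSUGJ4YMRRT2A7TRHK_MR_2_1of1",
    "SMITH_JOHN_82961_0130BPQX9QZN9G4P5RDTPA9HR4R_MR_1_1of1",
    "LASTNAME_FIRSTNAME_99999_01V0MU4XUQK0Y24Y9RYTFA7W1CM_MR_3_1of1",
]
for s in samples:
    print(after_second_underscore(s))
```

Because the cut point is computed per row, the differing prefix lengths in the sample data do not matter.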
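From your requirement, you can just use the simple SUBSTR function. It's faster, and it addresses the simple need to remove the string LASTNAME_FIRSTNAME_. Note that the fixed start position of 20 assumes the prefix is always exactly 19 characters long, as it is in 'LASTNAME_FIRSTNAME_'.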
select substr('LASTNAME_FIRSTNAME_82961_01B04WZXQQSUGJ4YMRRT2A7TRHK_MR_2_1of1', 20) data_string
from dual;
Output:
data_string
-----------------
82961_01B04WZXQQSUGJ4YMRRT2A7TRHK_MR_2_1of1
Unless you have some other underlying logic you need to address?
Kindly clarify so I can edit the answer accordingly.

Resources