Retrieving first X words from a string in Oracle Select - oracle

I need to select the first X words in a string, where x can be any number from 0-100. Is there an easy way to do this? I found the following example to select the first 2 words from a string:
select regexp_replace('Hello world this is a test', '(\w+ \w+).*$','\1') as first_two
from dual
How would I select the first X words from a string where X can be a number from 0-100?

Selecting the first four words:
select
regexp_replace(
'Hello world this is a test etc',
'(((\w+)\s){4}).*', -- Change 4 to wanted number of words here!
'\1'
)
from dual;
Edit
The above solution only works if the words are separated by exactly one whitespace character. If the words are separated by one or more whitespace characters, the \s must be extended to \s+:
select
regexp_replace(
'Hello world this is a test etc',
'(((\w+)\s+){4}).*', -- Change 4 to wanted number of words here!
'\1'
)
from dual;
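Since the question asks for a variable X, here is a minimal sketch of the same pattern with the word count concatenated in instead of hard-coded. The `params` CTE, its column `n`, and the added `^`/`$` anchors and `rtrim` are illustrative assumptions, not part of the answer above.

```sql
-- Sketch: build the {n} quantifier by concatenation so n can vary per row.
-- rtrim() drops the whitespace captured after the last selected word.
with params (n) as (
  select 2 from dual union all
  select 4 from dual
)
select n,
       rtrim(
         regexp_replace(
           'Hello world this is a test etc',
           '^((\w+\s+){' || n || '}).*$',
           '\1'
         )
       ) as first_n_words
from params;
-- n = 2 gives 'Hello world'
-- n = 4 gives 'Hello world this is'
```

If the string holds fewer than n words followed by whitespace, the pattern does not match and REGEXP_REPLACE returns the input unchanged.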

This method takes the result of extracting the number of words you want, then reduces multiple spaces to one:
select trim(regexp_replace(regexp_substr('Hello world this is a test etc', '(([^ ]*)( |$)*){3}'), ' +', ' '))
from dual;
EDIT: This is getting ugly, but I wrapped a TRIM() around it to get rid of the trailing space (the one after the last word selected).

This would do it, though it may be a bit inelegant. Replace "2" with the number of words to find:
select substr('this is a number of words',1,instr('this is a number of words',' ',1,2))
from dual
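Note that this assumes every word, including the last one wanted, is followed by a space. If the string contains fewer spaces than the number requested, INSTR returns 0 and SUBSTR returns NULL.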
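One possible way around the trailing-space assumption, sketched under the assumption that words are separated by single spaces: append a space before searching, and subtract 1 so the result excludes the delimiter. The `params` CTE and the column names are hypothetical.

```sql
-- Sketch: str || ' ' guarantees a space after the last word,
-- and the "- 1" excludes that space from the result.
with params (n) as (
  select 2 from dual union all
  select 6 from dual
)
select n,
       substr(str, 1, instr(str || ' ', ' ', 1, n) - 1) as first_n_words
from params
cross join (select 'this is a number of words' as str from dual);
-- n = 2 gives 'this is'
-- n = 6 gives 'this is a number of words'
```

Asking for more words than the string contains still yields NULL, because INSTR then returns 0.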

Related

how to trim leading zero in oracle sql from concatenation text (text:number-number-number)

How can I trim leading zeros in Oracle SQL from concatenated text of the form text:number-number-number?
Example (word:number-number-number): the word can contain text or a double zero, but there is always a character after the word, before the numbers. The numbers are separated by '-' and each has at most 3 digits. I want to keep the zeros in the first part. After that, leading zeros in each number should be removed, but a number that is only a single 0 should be kept:
MachineAbc00:1-0-03 = MachineAbc00:1-0-3
MachineAbc00:1-001-02 = MachineAbc00:1-1-2
I tried many combinations without success, such as:
REGEXP_REPLACE ('MachineO00:1-0-03*', '0+(?!$)', '-')
REGEXP_REPLACE ('MTROPQFMO00:1-0-03*', '(-0){1,}', '-')
If all the input strings are in the exact format you said they are, then something like this should work:
with
sample_strings (str) as (
select 'MachineAbc00:1-0-03' from dual union all
select 'MachineAbc00:1-001-02' from dual union all
select 'MachineZzzyx:200-020-002' from dual union all
select 'machineCX032:0-000-0' from dual
)
select str as old_str,
regexp_replace(str, '([:-])0*(\d+)', '\1\2') as new_str
from sample_strings
;
OLD_STR                  NEW_STR
------------------------ ------------------------
MachineAbc00:1-0-03      MachineAbc00:1-0-3
MachineAbc00:1-001-02    MachineAbc00:1-1-2
MachineZzzyx:200-020-002 MachineZzzyx:200-20-2
machineCX032:0-000-0     machineCX032:0-0-0
The regular expression function finds any occurrence of (colon or dash) followed by (zero or more 0 characters/digits) followed by at least one more digit. The "zero or more 0 digits" is maximal with the property that there must be at least one more digit AFTER that match (even if that extra digit happens to be a zero; see my last test string, which I added precisely in order to test that this works correctly). The function replaces each such occurrence with the first and third fragments, removing the middle one (the zeros you must remove from your string). The references \1 and \2 refer to the first and the second parenthesized sub-expressions: the punctuation mark (colon or dash) and, respectively, the final digits (excluding the leading zeros that must be removed).
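To make the two sub-expressions concrete, the sketch below inspects a single match with REGEXP_SUBSTR. It assumes Oracle 11g or later, where REGEXP_SUBSTR accepts a sixth argument selecting a sub-expression; the column aliases are illustrative.

```sql
-- Sketch: look at the 2nd match of the pattern in one sample string.
-- Arguments: source, pattern, start position, occurrence, match parameter, sub-expression.
select regexp_substr(str, '([:-])0*(\d+)', 1, 2)          as whole_match, -- '-001'
       regexp_substr(str, '([:-])0*(\d+)', 1, 2, null, 1) as group_1,     -- '-'
       regexp_substr(str, '([:-])0*(\d+)', 1, 2, null, 2) as group_2      -- '1'
from (select 'MachineAbc00:1-001-02' as str from dual);
```

Replacing the whole match '-001' with group 1 followed by group 2 gives '-1', which is how the leading zeros drop out.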

regex to get rid of special characters and getting rid of the space that it left

Is there a way to get rid of the spaces left behind after removing the special characters with a regex?
For example, if I do
Select REGEXP_REPLACE
('Test¥ÇÂ\est1_^_\L¢\L\this is a test', '[^0-9A-Za-z\-\#\<\>\(\)\"\,\/\]', ' ') test
from dual;
This results in: Test \est1 \L \L\this is a test
I want the result without the spaces left where the special characters were replaced, while keeping the spaces between regular words, like this:
Test\est1\L\L\this is a test
Thanks
You just need to add the space in the pattern as follows:
SQL> SELECT REGEXP_REPLACE(
  'Test¥ÇÂ\est1_^_\L¢\L\this is a test',
  '[^0-9A-Za-z-#\<>()\"\,/ ]',
  ''
  ) TEST
  FROM DUAL;
TEST
-----------------------------
TestA\est1\L\L\this is a test
SQL>

Oracle regex_replace not working as expected

I have following SQL query (Oracle 18c):
SELECT
--FIRST
translate(
' sOmE tEsT
eNdOfLiNe',
chr(10)||chr(11)||chr(13), 'replText'
) "Result1",
--SECOND
regexp_replace(
' sOmE tEsT
eNdOfLiNe',
'[\x0A|\x0B|`\x0D]', 'replText'
) "Result2",
--THIRD
regexp_replace(
' sOmE tEsT
eNdOfLiNe',
'[\r\n\t]', 'replText', 1, 0
) "Result3"
FROM dual
What I would like to do is replace all tabs, carriage returns and newline characters with a new string, but it seems that regexp_replace is not working (it returns the initial text). I am really sorry about the formatting, but I need to handle text in the exact format shown above, with mixed \r, \n and \t characters.
Here is fiddle: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=63834f9bcab93136635366f18c375b13
I am learning Oracle right now and don't understand why the second and third solutions return the initial text. The first solution seems to work, but I would like to achieve the same effect with the SECOND and THIRD solutions. What did I miss?
I'm pretty sure Oracle does not allow escape sequences in a character class, so I believe you have to build the class from the characters themselves, as in the query at the end of this answer. In response to your comment on another answer here, and since you are learning: regex implementations are most definitely not all alike, and Oracle's in particular has its own quirks.
EDIT to explain the regex: The pattern is built by concatenation into a character class containing 3 characters: '[' followed by CHR(9), CHR(10) and CHR(13), followed by ']'. You can't just write the escape sequences in the pattern, because the regex engine would then take those characters as part of the character class itself.
SELECT REGEXP_REPLACE(
' sOmE tEsT
eNdOfLiNe', '['||CHR(9)||CHR(10)||CHR(13)||']', 'X') Result3
FROM dual;
RESULT3
------------------------------
sOmE tEsTXXXXXXXX eNdOfLiNe
1 row selected.
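Because the whitespace in the sample string is invisible, here is a small sketch of the same technique on a string assembled explicitly with CHR(), so each replaced character can be counted. The input string and the '*' replacement are made up for illustration.

```sql
-- Sketch: 'a' TAB 'b' CR LF 'c', with the character class built from
-- CHR(9) (tab), CHR(10) (line feed) and CHR(13) (carriage return).
select regexp_replace(
         'a' || chr(9) || 'b' || chr(13) || chr(10) || 'c',
         '[' || chr(9) || chr(10) || chr(13) || ']',
         '*'
       ) as result
from dual;
-- result: a*b**c
```

Each of the three control characters is replaced individually, so the CR LF pair produces two asterisks.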
You can try the following, which uses a format similar to the translate call:
select regexp_replace(
' sOmE tEsT
eNdOfLiNe',
chr(10)||'|'||chr(11)||'|'||chr(13), 'replText') "Result3"
FROM dual

Find if the fifth position is a letter and not a number using ORACLE

How can I find if the fifth position is a letter and thus not a number using Oracle ?
My last try was using the following statement:
REGEXP_LIKE (table_column, '([abcdefghijklmnopqrstuvxyz])');
Perhaps you'd rather check whether 5th position contains a number (which means that it is not something else), i.e. do the opposite of what you're doing now.
Why? Because a "letter" isn't only ASCII; have a look at the 4th row in my example - it contains Croatian characters and these aren't between [a-z] (nor [A-Z]).
SQL> with test (col) as
  (select 'abc_3def' from dual union all
   select 'A435D887' from dual union all
   select '!#$%&/()' from dual union all
   select 'ASDĐŠŽĆČ' from dual
  )
  select col,
    case when regexp_like(substr(col, 5, 1), '\d+') then 'number'
         else 'not a number'
    end result
  from test;
COL           RESULT
------------- ------------
abc_3def      number
A435D887      not a number
!#$%&/()      not a number
ASDĐŠŽĆČ      not a number
SQL>
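Anchor the pattern to the start of the string, or else you may get unexpected results. The query below correctly returns no rows, because the fifth character of 'abcd4fg' is a digit. Remove the caret (the start-of-string anchor) and it returns 'TRUE', because the pattern can then match starting at the second character ('bcd4' followed by 'f'). Note that it uses the case-insensitive flag 'i'.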
select 'TRUE'
from dual
where regexp_like('abcd4fg', '^.{4}[A-Z]', 'i');
Yet another way to do it:
regexp_like(table_column, '^....[[:alpha:]]')
Using the character class [[:alpha:]] will pick up all letters (upper case, lower case, accented and so on) but will ignore numbers, punctuation and whitespace characters.
If what you care about is that the character is not a number, then use
not regexp_like(table_column, '^....[[:digit:]]')
or
not regexp_like(table_column, '^....\d')
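As a rough sketch, the anchored patterns above can be combined in a CASE expression to classify the fifth character. The `test` CTE, its sample rows and the result labels are hypothetical.

```sql
-- Sketch: classify the fifth character using anchored POSIX classes.
with test (col) as (
  select 'abc_3def' from dual union all
  select 'A435D887' from dual union all
  select 'abcd-fgh' from dual
)
select col,
       case
         when regexp_like(col, '^.{4}[[:alpha:]]') then 'letter'
         when regexp_like(col, '^.{4}[[:digit:]]') then 'digit'
         else 'other'
       end as fifth_char
from test;
-- abc_3def -> digit
-- A435D887 -> letter
-- abcd-fgh -> other
```

Strings shorter than five characters fall into the 'other' branch, since neither anchored pattern can match.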
Try:
REGEXP_LIKE (table_column, '^....[a-z]')
Or:
SUBSTR (table_column, 5, 1 ) BETWEEN 'a' AND 'z'

How Can I Extract String in Oracle

I would like to extract following string in Oracle. How can I do that?
Original String: 011113584378(+) CARD, STAFF
Expected String: STAFF CARD
I presume you have the luxury of writing a PL/SQL function? Then just use "SUBSTR", and/or "INSTR", and || concatenation operator to parse your input.
Here is an example:
https://www.techonthenet.com/oracle/questions/parse.php
...The field may contain the following value:
F:\Siebfile\YD\S_SR_ATT_1-60SS_1-AM3L.SAF
In this case, I need to return the value of '1-60SS', as this is the value that resides between the 3rd and 4th underscores.
SOLUTION:
create or replace function parse_value (pValue varchar2)
return varchar2
is
v_pos3 number;
v_pos4 number;
begin
/* Position just after the 3rd occurrence of '_' */
v_pos3 := INSTR (pValue, '_', 1, 3) + 1;
/* Position of the 4th occurrence of '_' */
v_pos4 := INSTR (pValue, '_', 1, 4);
return SUBSTR (pValue, v_pos3, v_pos4 - v_pos3);
end parse_value;
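Applied to the string in the question, the same SUBSTR/INSTR idea might look like the sketch below, written in plain SQL rather than as a function. It assumes the input always contains ') ' before the first part and ', ' between the two parts, which is only known to hold for the single sample given; the `tbl` CTE and column aliases are illustrative.

```sql
-- Sketch: cut out the text between ') ' and ', ', and the text after ', ',
-- then concatenate them in swapped order.
with tbl (str) as (
  select '011113584378(+) CARD, STAFF' from dual
)
select substr(str, instr(str, ', ') + 2)                   -- 'STAFF'
       || ' ' ||
       substr(str,
              instr(str, ') ') + 2,
              instr(str, ', ') - instr(str, ') ') - 2)     -- 'CARD'
       as formatted
from tbl;
-- formatted: STAFF CARD
```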
Ok, I'll bite. This example uses REGEXP_REPLACE to describe the string, saving the parts you need in order to rearrange them before returning them. It would be better if you showed some real-world examples of the data you are dealing with as I can only guarantee this example will work with the one line you provided.
The regular expression matches any characters starting at the beginning of the string and ending with a close paren-space. The next set of any characters up to but not including the comma-space is "remembered" by enclosing them in parens. This is called a captured group. The next captured group is the set of characters after that comma-space separator until the end of the line (the dollar sign). The captured groups are referred to by their order from left to right. The 3rd argument is the string to return, which is the 2nd and 1st captured groups, in that order, separated by a space.
SQL> with tbl(str) as (
select '+011113584378(+) CARD, STAFF' from dual
)
select regexp_replace(str, '^.*\) (.*), (.*)$', '\2 \1') formatted
from tbl;
FORMATTED
----------
STAFF CARD
SQL>

Resources