CHR(0) in REGEXP_LIKE - oracle

I am using the following queries to check how CHR(0) behaves in REGEXP_LIKE.
CREATE TABLE t1(a char(10));
INSERT INTO t1 VALUES('0123456789');
SELECT CASE WHEN REGEXP_LIKE(a,CHR(0)) THEN 1 ELSE 0 END col, DUMP(a)
FROM t1;
The output I get is:
col dump(a)
----------- -----------------------------------
1 Typ=96 Len=10: 48,49,50,51,52,53,54,55,56,57
I am totally confused. If there is no CHR(0) in the column, as DUMP(a) shows, how is REGEXP_LIKE finding CHR(0) there and returning 1? Shouldn't it return 0 here?

CHR(0) is the character used to terminate a string in the C programming language (among others).
When you pass CHR(0) to the function it will, in turn, pass it to a lower-level function that parses the strings you have passed in and builds a regular expression pattern from them. That parser sees CHR(0), treats it as the string terminator, and ignores the rest of the pattern.
The behaviour is easier to see with REGEXP_REPLACE:
SELECT REGEXP_REPLACE( 'abc' || CHR(0) || 'e', CHR(0), 'd' )
FROM DUAL;
What happens when you run this:
CHR(0) is compiled into the regular expression and becomes a string terminator.
The pattern is now just the string terminator, so it is a zero-length string.
The regular expression is matched against the input string. It reads the first character, a, finds that a zero-length string matches before the a, and replaces that empty match with a d, giving the output da.
It then repeats for the next character, transforming b to db.
This continues until the end of the string, where the zero-length pattern matches once more and a final d is appended.
You will get the output:
dadbdcd_ded
(where _ is the CHR(0) character.)
Note: the CHR(0) in the input is not replaced.
If the client program you are using is also truncating the string at CHR(0) you may not see the entire output (this is an issue with how your client is representing the string and not with Oracle's output) but it can also be shown using DUMP():
SELECT DUMP( REGEXP_REPLACE( 'abc' || CHR(0) || 'e', CHR(0), 'd' ) )
FROM DUAL;
Outputs:
Typ=1 Len=11: 100,97,100,98,100,99,100,0,100,101,100
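The zero-length-match behaviour described above can be reproduced outside Oracle. The following is only an analogy in Python, not Oracle's implementation: it assumes that a pattern truncated at CHR(0) behaves like an empty pattern, and it uses Python's re module (3.7 or later) purely to illustrate that step.

```python
import re

# Input mirrors 'abc' || CHR(0) || 'e' from the query above.
text = "abc\x00e"

# An empty pattern stands in for the pattern that is cut short at CHR(0)
# (an assumption made for illustration). It matches the zero-length gap
# before every character and once more at the end of the string.
result = re.sub("", "d", text)

# Byte values of the result, comparable to the DUMP() listing.
codes = list(result.encode("ascii"))
print(codes)
```

The byte list has eleven entries, and the 0 byte from the input is still present at the same relative position, as in the DUMP() output.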
[TL;DR] So what is happening with
REGEXP_LIKE( '1234567890', CHR(0) )
It builds a zero-length regular expression pattern and looks for a zero-length match before the first character (1). It finds one, so it reports a match.

Aleksej kind of beat me to it, but CHR(0) is the value for the string terminator (kind of like the NULL keyword, but not exactly). Think of it as an internal end-of-string indicator that REGEXP_LIKE apparently can see. Note that if you try the query with the keyword NULL, it returns zero, because nothing can be compared to NULL and the comparison therefore fails (as you were expecting). Interesting. Perhaps someone more experienced with the internal workings can explain further; I would be interested to hear more.

Not an answer, just some experiments, but too long for a comment.
REGEXP_COUNT seems to be confused by chr(0), counting every character as chr(0); besides, it seems to find one occurrence more than the size of the string.
SQL> select dump('a'), regexp_count('a', chr(0)) from dual;
DUMP('A') REGEXP_COUNT('A',CHR(0))
---------------- ------------------------
Typ=96 Len=1: 97 2
SQL> select dump(chr(0)), regexp_count(chr(0), chr(0)) from dual;
DUMP(CHR(0)) REGEXP_COUNT(CHR(0),CHR(0))
-------------- ---------------------------
Typ=1 Len=1: 0 2
SQL> select dump('0123456789' || chr(0)), regexp_count('0123456789' || chr(0), chr(0)) from dual;
DUMP('0123456789'||CHR(0)) REGEXP_COUNT('0123456789'||CHR(0),CHR(0))
--------------------------------------------- -----------------------------------------
Typ=1 Len=11: 48,49,50,51,52,53,54,55,56,57,0 12
LIKE seems to behave correctly, while its REGEXP counterpart does not:
SQL> select 1 from dual where 'a' like '%' || chr(0) || '%';
no rows selected
SQL> select 1 from dual where regexp_like ('a', chr(0));
1
----------
1
Same thing for INSTR and REGEXP_INSTR
SQL> select 1 from dual where instr('a', chr(0)) != 0;
no rows selected
SQL> select 1 from dual where regexp_instr('a', chr(0)) != 0;
1
----------
1
Tested on 11g XE Release 11.2.0.2.0 - 64bit
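The counts above (always one more than the string length) are what an empty pattern would produce. As a rough sketch in Python, assuming the CHR(0) pattern degenerates to an empty pattern (an assumption, not a statement about Oracle internals), the hypothetical helper below counts empty matches and contrasts them with a plain substring test, which plays the role of LIKE and INSTR here.

```python
import re


def count_empty_matches(text: str) -> int:
    """Count matches of an empty pattern: one per gap, so len(text) + 1."""
    return len(re.findall("", text))


def contains_nul(text: str) -> bool:
    """Plain substring test for the NUL character, with no regex involved."""
    return "\x00" in text


for sample in ["a", "\x00", "0123456789\x00"]:
    print(repr(sample), count_empty_matches(sample), contains_nul(sample))
```

The substring test only reports a NUL when one is really present, while the empty-pattern count depends on nothing but the length of the input.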

Related

Find if the fifth position is a letter and not a number using ORACLE

How can I find whether the fifth position is a letter, and thus not a number, using Oracle?
My last try was using the following statement:
REGEXP_LIKE (table_column, '([abcdefghijklmnopqrstuvxyz])');
Perhaps you'd rather check whether 5th position contains a number (which means that it is not something else), i.e. do the opposite of what you're doing now.
Why? Because a "letter" isn't only ASCII; have a look at the 4th row in my example - it contains Croatian characters and these aren't between [a-z] (nor [A-Z]).
SQL> with test (col) as
2 (select 'abc_3def' from dual union all
3 select 'A435D887' from dual union all
4 select '!#$%&/()' from dual union all
5 select 'ASDĐŠŽĆČ' from dual
6 )
7 select col,
8 case when regexp_like(substr(col, 5, 1), '\d+') then 'number'
9 else 'not a number'
10 end result
11 from test;
COL RESULT
------------- ------------
abc_3def number
A435D887 not a number
!#$%&/() not a number
ASDĐŠŽĆČ not a number
SQL>
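Anchor the pattern to the start of the string, or you may get unexpected results. With the anchor, the query below correctly returns no rows, because the fifth character of 'abcd4fg' is 4. Remove the caret (the start-of-string anchor) and it returns 'TRUE', because the pattern can then match starting at the second character. Note that it uses the case-insensitive flag 'i'.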
select 'TRUE'
from dual
where regexp_like('abcd4fg', '^.{4}[A-Z]', 'i');
Yet another way to do it:
regexp_like(table_column, '^....[[:alpha:]]')
The character class [[:alpha:]] picks up all letters (upper case, lower case, accented, and so on) but ignores numbers, punctuation, and white space characters.
If what you care about is that the character is not a number, then use
not regexp_like(table_column, '^....[[:digit:]]')
or
not regexp_like(table_column, '^....\d')
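The difference between the anchored, unanchored, and character-class variants discussed above can be checked quickly outside the database. This is a sketch in Python rather than Oracle SQL; the function names are made up for illustration, and str.isalpha() is used as a rough stand-in for [[:alpha:]], not an exact equivalent.

```python
import re


def fifth_is_ascii_letter(value: str) -> bool:
    """Anchored check, like '^.{4}[A-Z]' with the 'i' flag."""
    return re.search(r"^.{4}[A-Z]", value, re.IGNORECASE) is not None


def unanchored_match(value: str) -> bool:
    """Same pattern without the caret; it may match later in the string."""
    return re.search(r".{4}[A-Z]", value, re.IGNORECASE) is not None


def fifth_is_any_letter(value: str) -> bool:
    """Unicode-aware check, closer in spirit to [[:alpha:]]."""
    return len(value) >= 5 and value[4].isalpha()


def fifth_is_not_digit(value: str) -> bool:
    """Inverse check, like NOT REGEXP_LIKE(col, '^....\\d')."""
    return re.search(r"^.{4}\d", value) is None


for sample in ["abcd4fg", "abc_3def", "A435D887", "!#$%&/()", "ASDĐŠŽĆČ"]:
    print(sample, fifth_is_ascii_letter(sample), unanchored_match(sample),
          fifth_is_any_letter(sample), fifth_is_not_digit(sample))
```

For 'abcd4fg' the anchored check is false while the unanchored one is true, which is the pitfall the anchoring advice warns about.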
Try:
REGEXP_LIKE (table_column, '^....[a-z]')
Or:
SUBSTR (table_column, 5, 1 ) BETWEEN 'a' AND 'z'

Convert String to Date field for SQL Oracle

I need to convert a string to a date field. The field stores 30 characters. Dates, when present, are formatted as 'yyyymmdd' (20170202). In all cases, dates have 22 spaces after. I need to format this field as a date field like this: dd-mm-yyyy.
I've tried several formulas:
TO_CHAR(PERSACTION.NEW_VALUE_02, 'dd-mm-yyyy') ,TO_CHAR(PERSACTION.NEW_VALUE_02, 'yyyymmdd'), trim(TO_CHAR(PERSACTION.NEW_VALUE_02, 'yyyymmdd')) with error message: invalid number format model. Your expertise is welcome and appreciated.
to_char(to_date( rtrim(new_value_02), 'yyyymmdd'), 'dd-mm-yyyy')
Should do the trick. rtrim removes spaces on right side of string. Then I convert it to date using the date format specified, and then convert it to a string again in the desired format.
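The same three steps (strip the trailing spaces, parse with the source format, print with the target format) can be sketched in Python to show the logic. The function name reformat_date is hypothetical, and the format codes are Python's, not Oracle's.

```python
from datetime import datetime


def reformat_date(raw: str) -> str:
    """Turn a padded 'yyyymmdd' string into 'dd-mm-yyyy'.

    Raises ValueError when the text is not a real calendar date.
    """
    parsed = datetime.strptime(raw.rstrip(), "%Y%m%d")  # like TO_DATE(RTRIM(...))
    return parsed.strftime("%d-%m-%Y")                  # like TO_CHAR(..., 'dd-mm-yyyy')


# A 30-character field: 8 digits followed by 22 spaces, as in the question.
print(reformat_date("20170202" + " " * 22))
```

Going through a real date type means an impossible value such as 20170231 is rejected instead of being silently rearranged.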
Did you try converting to a date and then back to a character string?
TO_CHAR(TO_DATE(PERSACTION.NEW_VALUE_02,'yyyymmdd'),'dd-mm-yyyy')
Please, please, please do not store DATEs as CHARACTER datatypes. This will only lead to issues that can be avoided by using the DATE datatype.
If you want to change the string 20170202 to another string and not actually a date (which would have no intrinsic formatted text representation), you could optionally use a regular expression to transform it, instead of converting to a date and back:
select regexp_replace('20170202 ', '^(\d{4})(\d{2})(\d{2}) +$', '\3-\2-\1')
from dual;
REGEXP_REPLACE(
---------------
02-02-2017
Or you could use substr instead of regexp_replace, which may perform better even if you have to call it three times. The CTE here is just to avoid repeating the value:
with t(str) as (
select '20170202 ' from dual
)
select substr(str, 7, 2) ||'-'|| substr(str, 5, 2) ||'-'|| substr(str, 1, 4)
from t;
SUBSTR(STR
----------
02-02-2017
If you do convert to a date and back, you will uncover any values that cannot be converted, as they will cause an exception to be thrown. That would imply you have bad data, which would have been avoided by using the right data type in the first place, of course. The string-based approaches, by contrast, will convert any old rubbish, with varying results depending on how far the strings stray from the pattern you expect, including strings like '20170231' which represent an invalid date. Null values and strings of just spaces will be converted to odd things with the substr version, but you could filter those out.
You can see the kind of variation you would get with some sample data that doesn't match your expectations:
with t(str) as (
select '20170202 ' from dual
union all select '20170231 ' from dual
union all select '2017020c ' from dual
union all select '2017020 ' from dual
union all select '201702021 ' from dual
union all select ' ' from dual
union all select null from dual
)
select str,
regexp_replace(str, '^(\d{4})(\d{2})(\d{2}) +$', '\3-\2-\1') as reg,
substr(str, 7, 2) ||'-'|| substr(str, 5, 2) ||'-'|| substr(str, 1, 4) as sub
from t;
STR REG SUB
------------- ------------- -------------
20170202 02-02-2017 02-02-2017
20170231 31-02-2017 31-02-2017
2017020c 2017020c 0c-02-2017
2017020 2017020 0 -02-2017
201702021 201702021 02-02-2017
- -
--
With the anchors and whitespace expectation, the regular expression doesn't modify anything that doesn't consist entirely of 8 numeric characters. But it can still form invalid 'dates'.

trim value till specified string in oracle pl/sql

I want to trim the value of a given string up to a specified substring in Oracle PL/SQL.
Something like the string below:
OyeBuddy$$flex-Flex_Image_Rotator-1443680885520.
In the above string I want to trim everything up to and including $$, so that I get "flex-Flex_Image_Rotator-1443680885520".
You can use different ways; here are two methods, with and without regexp:
with test(string) as ( select 'OyeBuddy$$flex-Flex_Image_Rotator-1443680885520.' from dual)
select regexp_replace(string, '(.*)(\$\$)(.*)', '\3')
from test
union all
select substr(string, instr(string, '$$') + length('$$'))
from test
You want to do a SUBSTR where the starting position is going to be the position of '$$' + 2 . +2 is because the string '$$' is of length 2, and we don't want to include that string in the result.
Something like -
SELECT SUBSTR (
'ABCDEF$$some_big_text',
INSTR ('ABCDEF$$some_big_text', '$$') + 2)
FROM DUAL;
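The position arithmetic is easy to get wrong by one, so here is the same idea as a small Python sketch (the helper name text_after is made up for illustration). One difference is deliberate: when the delimiter is missing, this sketch returns the input unchanged, whereas in Oracle INSTR returns 0 and SUBSTR would then start at position 2.

```python
def text_after(value: str, delimiter: str) -> str:
    """Return the part of value that follows the first delimiter."""
    pos = value.find(delimiter)  # 0-based; -1 when the delimiter is absent
    if pos == -1:
        return value  # choice made for this sketch: leave the input as it is
    # Skip the delimiter itself, like INSTR(...) + LENGTH('$$') in the SQL.
    return value[pos + len(delimiter):]


print(text_after("OyeBuddy$$flex-Flex_Image_Rotator-1443680885520.", "$$"))
print(text_after("ABCDEF$$some_big_text", "$$"))
```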

what will translate function do if I want to change some chars to nothing?

I have a sql statement:
select translate('abcdefg', 'abc', '') from dual;
Why is the result nothing?
I think it should be 'defg'.
From the documentation:
You cannot use an empty string for to_string to remove all characters in from_string from the return value. Oracle Database interprets the empty string as null, and if this function has a null argument, then it returns null. To remove all characters in from_string, concatenate another character to the beginning of from_string and specify this character as the to_string. For example, TRANSLATE(expr, 'x0123456789', 'x') removes all digits from expr.
So you can do something like:
select translate('abcdefg', '#abc', '#') from dual;
TRANSLATE('ABCDEFG','#ABC','#')
-------------------------------
defg
... using any character that isn't going to be in your from_string.
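To make the documented rule concrete, here is a small Python model of the behaviour quoted above. It is a sketch written from that description, not Oracle's code, and the name translate_like_oracle is hypothetical: an empty argument is treated as null, characters in from_string with a counterpart in to_string are mapped, and the remaining from_string characters are removed.

```python
from typing import Dict, Optional


def translate_like_oracle(expr: str, from_string: str,
                          to_string: str) -> Optional[str]:
    """Model of TRANSLATE as described in the quoted documentation."""
    # Oracle treats '' as NULL, and a NULL argument gives a NULL result.
    if not expr or not from_string or not to_string:
        return None
    mapping: Dict[str, Optional[str]] = {}
    for i, ch in enumerate(from_string):
        if ch not in mapping:  # the first occurrence in from_string wins
            mapping[ch] = to_string[i] if i < len(to_string) else None
    out = []
    for ch in expr:
        if ch not in mapping:
            out.append(ch)           # untouched character
        elif mapping[ch] is not None:
            out.append(mapping[ch])  # mapped character
        # characters mapped to None are dropped
    return "".join(out)


print(translate_like_oracle("abcdefg", "abc", ""))    # None, like NULL
print(translate_like_oracle("abcdefg", "#abc", "#"))  # defg
```

The placeholder character maps to itself, which is why it must be one that does not need removing from the input.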
To add to Alex's answer, you can use any character allowed in SQL as the placeholder that you prepend to from_string and pass as to_string. So you could even use a space instead of an empty string. An empty string in Oracle is treated as a NULL value.
So, you could also do -
SQL> SELECT TRANSLATE('abcdefg', ' abc', ' ') FROM dual;
TRAN
----
defg
SQL>
Which is the same as -
SQL> SELECT TRANSLATE('abcdefg', chr(32)||'abc', chr(32)) FROM dual;
TRAN
----
defg
SQL>
This works because the ASCII value of a space is 32.
This was just a demo; for readability, it is better to use a character other than a space.

Oracle 11.2 to_number multiple commas

In Oracle 11.2, is there some number format, nf, that will work with to_number to parse arbitrary length varchar2s containing digits and commas?
I can achieve this without a number format, by using regexp_replace, but I'd prefer to achieve the same thing using just a number format.
e.g., the following 2 statements work:
select to_number(regexp_replace('12,345', ',', '')) from dual;
select to_number(regexp_replace('1,234,567', ',', '')) from dual;
but I'd prefer:
select to_number('12,345', nf) from dual;
select to_number('1,234,567', nf) from dual;
where nf is one number format string that works for both statements.
If I try nf = '99,999', the first statement works, but the second fails.
Thanks.
Oracle won't complain if the number format is too long, so you can use a model that has enough digits to cope with the biggest number you can receive:
SQL> select to_number('12,345',
2 '999G999G999G999G999G999G999G999G999G999G999G999G999') from dual;
TO_NUMBER('12,345','999G999G999G999G999G999G999G999G999G999G999G999G999')
-------------------------------------------------------------------------
12345
SQL> select to_number('1,234,567',
2 '999G999G999G999G999G999G999G999G999G999G999G999G999') from dual;
TO_NUMBER('1,234,567','999G999G999G999G999G999G999G999G999G999G999G999G999')
----------------------------------------------------------------------------
1234567
SQL> select to_number('999,999,999,999,999,999,999,999,999,999,999,999,999',
2 '999G999G999G999G999G999G999G999G999G999G999G999G999') from dual;
TO_NUMBER('999,999,999,999,999,999,999,999,999,999,999,999,999','999G999G999G999
--------------------------------------------------------------------------------
1.0000E+39
I've used the G group separator instead of a fixed comma to support globalisation, but the effect is the same.
The only caveat is that the source number has to have the right grouping so it matches the formatting exactly for the digits it does have:
SQL> select to_number('1,2345',
2 '999G999G999G999G999G999G999G999G999G999G999G999G999') from dual;
select to_number('1,2345',
*
ERROR at line 1:
ORA-01722: invalid number
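The two behaviours discussed here, stripping commas versus insisting on correct three-digit grouping, can be sketched in Python for comparison. This is an illustration of the logic only, not of how TO_NUMBER works internally, and parse_grouped is a hypothetical helper name.

```python
import re

# One to three leading digits, then any number of ',ddd' groups.
GROUPED = re.compile(r"\d{1,3}(,\d{3})*")


def parse_grouped(text: str) -> int:
    """Parse a comma-grouped integer, rejecting misplaced commas."""
    if GROUPED.fullmatch(text) is None:
        raise ValueError(f"invalid number: {text!r}")
    # Equivalent in spirit to regexp_replace(text, ',', '') before conversion.
    return int(text.replace(",", ""))


print(parse_grouped("12,345"))
print(parse_grouped("1,234,567"))
```

A value such as '1,2345' is rejected here, much as the long format model rejects it with ORA-01722, whereas simply removing the commas would accept it.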
Although I support Alex Poole's answer, here's another crude but effective way of solving the problem that should perform better than doing a regex.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_of_numbers (
example_num VARCHAR2(50)
)
/
INSERT INTO table_of_numbers (example_num)
VALUES ('12,345')
/
INSERT INTO table_of_numbers (example_num)
VALUES ('1,234,567')
/
Query 1:
SELECT TO_NUMBER(example_num, RPAD('9', LENGTH(example_num) - 1, '9')) fudge
FROM table_of_numbers
Results:
| FUDGE |
-----------
| 12345 |
| 1234567 |
If you need to match the commas, then you could do something slightly more sophisticated with INSTR and LPAD to make sure you generate the right mask.
For this:
select to_number('1,234,567', nf) from dual;
Using nf = '9,999,999' will work.
