I have a table with an email field. This field can only contain the following characters:
'abcdefghijklmnopqrstuvwxyz0123456789. # _- +'
How can I check the email field to find out whether it contains any characters other than the ones I listed ('abcdefghijklmnopqrstuvwxyz0123456789. # _- +')?
This sounds like a perfect job for a regular expression: just check whether the email contains any characters that are not in your list. You can use REGEXP_LIKE for this:
regexp_like(e_mail, '[^-a-z0-9.#_ +]')
(I've replaced a...z and 0...9 with the ranges a-z and 0-9, which are shorter and more readable. Note that the hyphen '-' has to be the first character after the initial caret '^' so that it is treated as a literal hyphen and not as part of a character range.)
Simple test case:
with v_data(e_mail) as (
select 'xyz#abc.com' from dual union all
select 'xyz(#def.com' from dual union all
select 'ab123-def#gmail.com' from dual
)
select
e_mail,
(case
when regexp_like(e_mail, '[^-a-z0-9.#_ +]') then 'NO'
else 'YES'
end) as is_valid_email
from v_data
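The same "any character outside the allowed set" test can be sketched in Python to check the logic outside the database. This is an illustration using Python's `re` module, not Oracle's engine. For a simple bracket expression like this one the two should agree, but that is an assumption rather than a guarantee. `is_valid_email` is a hypothetical helper name.

```python
import re

# Matches if the value contains at least one character outside the allowed set.
# The hyphen comes right after the caret, so it is a literal hyphen, not a range.
DISALLOWED = re.compile(r"[^-a-z0-9.#_ +]")

def is_valid_email(value: str) -> str:
    """Return 'YES' if every character is in the allowed set, else 'NO'."""
    # re.search is unanchored, like Oracle's REGEXP_LIKE: a match anywhere counts.
    return "NO" if DISALLOWED.search(value) else "YES"

for e_mail in ["xyz#abc.com", "xyz(#def.com", "ab123-def#gmail.com"]:
    print(e_mail, is_valid_email(e_mail))
```

Uppercase letters are rejected too, because they are not in the list.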
However, a valid email address can contain many additional characters, uppercase letters for example.
I am trying to come up with a Snowflake equivalent of the Oracle statement below. It checks whether the parts of the string separated by '.' have the number of characters given in the REGEXP_LIKE expression. I have come up with a rudimentary version of the check in Snowflake, but I am sure there's a better and cleaner way to do it. I am looking for a one-liner regular expression check in Snowflake similar to the Oracle one. Appreciate your help!
-- Oracle
SELECT -- would return True
CASE
WHEN REGEXP_LIKE('AB.XYX.12.34.5670.89', '^\w{2}\.\w{3}\.\w{2}') THEN 'True'
ELSE NULL
END AS abc
FROM DUAL
-- Snowflake
SELECT -- would return True
REGEXP_LIKE(SPLIT_PART('AB.XYX.12.34.5670.89', '.', 1), '[A-Z0-9]{2}') AND
REGEXP_LIKE(SPLIT_PART('AB.XYX.12.34.5670.89', '.', 2), '[A-Z0-9]{3}') AND
REGEXP_LIKE(SPLIT_PART('AB.XYX.12.34.5670.89', '.', 3), '[A-Z0-9]{2}') AS abc
You need to add .* at the end, because Snowflake's REGEXP_LIKE implicitly anchors the pattern with ^ and $:
The function implicitly anchors a pattern at both ends (i.e. '' automatically becomes '^$', and 'ABC' automatically becomes '^ABC$'). To match any string starting with ABC, the pattern would be 'ABC.*'.
select
column1 as str,
REGEXP_LIKE(str, '\\w{2}\\.\\w{3}\\.\\w{2}.*') as oracle_way
FROM VALUES
('AB.XYX.12.34.5670.89')
;
gives:
STR                     ORACLE_WAY
AB.XYX.12.34.5670.89    TRUE
Or in the context of your question:
SELECT IFF(REGEXP_LIKE('AB.XYX.12.34.5670.89', '\\w{2}\\.\\w{3}\\.\\w{2}.*'), 'True', null) AS abc;
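The anchoring difference can be reproduced in Python as a rough analogy. `re.search` with an explicit `^` behaves like Oracle's unanchored REGEXP_LIKE. `re.fullmatch` behaves like Snowflake's implicitly anchored one. This is an illustration only: Python raw strings need a single backslash, whereas Snowflake string literals need `\\`.

```python
import re

s = "AB.XYX.12.34.5670.89"
prefix = r"\w{2}\.\w{3}\.\w{2}"

# Oracle-style: the pattern may match anywhere, so '^' pins it to the start.
oracle_style = re.search("^" + prefix, s) is not None
# Snowflake-style: the whole string must match, as with re.fullmatch.
snowflake_no_tail = re.fullmatch(prefix, s) is not None
# Appending .* lets the rest of the string match, restoring the "starts with" meaning.
snowflake_with_tail = re.fullmatch(prefix + ".*", s) is not None

print(oracle_style, snowflake_no_tail, snowflake_with_tail)  # True False True
```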
Your use of \w seems to suggest you don't need the delimited parts to be strictly [A-Z0-9], since \w also matches underscore (and lowercase letters). If all bets were off and the only requirement was to have . at the 3rd, 7th and 10th positions, you could have used LIKE this way:
select 'AB.XGH.12.34.5670.89' like '__.___.__.%' ;
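To show why the LIKE pattern works, here is a small Python sketch that translates it into an equivalent anchored regular expression. `_` means exactly one character, `%` means any run of characters, and everything else is literal. `like_to_regex` and `sql_like` are hypothetical helpers, and they ignore the ESCAPE clause.

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern (no ESCAPE clause) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")            # exactly one character
        elif ch == "%":
            parts.append(".*")           # zero or more characters
        else:
            parts.append(re.escape(ch))  # literal, so '.' only matches a period
    return "".join(parts)

def sql_like(value: str, pattern: str) -> bool:
    # LIKE must match the whole value; DOTALL lets '_' and '%' match newlines too.
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None

print(sql_like("AB.XGH.12.34.5670.89", "__.___.__.%"))
```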
I have a table in an Oracle database where the field values have special characters attached at the first and last positions. I want to eliminate those special characters when querying the table. I have used the INSTR function, but I had to apply it to each and every special character separately using a CASE expression.
Is there a way to eliminate any special characters attached only at the first and last positions in one go?
The query I am using is below:
CASE WHEN
INSTR(emp_address,'"')=1 THEN REPLACE(emp_address,'"', '')
...
You can use regular expressions to replace the leading and trailing character of a string if they match the regular expression pattern. For example, if your definition of a "special character" is anything that is not an alpha-numeric character then you can use the regular expression:
^ the start-of-the-string then
[^[:alnum:]] any single character that does not match the POSIX alpha-numeric character group
| or
[^[:alnum:]] any single character that does not match the POSIX alpha-numeric character group then
$ the end-of-the-string.
Like this:
SELECT emp_address,
REGEXP_REPLACE(
emp_address,
'^[^[:alnum:]]|[^[:alnum:]]$'
) AS simplified_emp_address
FROM table_name
Which, for the sample data:
CREATE TABLE table_name (emp_address) AS
SELECT 'test' FROM DUAL UNION ALL
SELECT '"test2"' FROM DUAL UNION ALL
SELECT 'Not "this" one' FROM DUAL;
Outputs:
EMP_ADDRESS       SIMPLIFIED_EMP_ADDRESS
test              test
"test2"           test2
Not "this" one    Not "this" one
If you have a more complicated definition of a special character then change the regular expression appropriately.
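The same leading-or-trailing replacement can be sketched in Python. Python's `re` has no POSIX `[[:alnum:]]` class, so this sketch assumes "special" means anything outside ASCII letters and digits. That is narrower than Oracle's `[:alnum:]`, which also accepts accented letters. `\Z` is used instead of `$` because Python's `$` also matches before a trailing newline. `simplify` is a hypothetical helper name.

```python
import re

# Assumption: "special" = not an ASCII letter or digit.
# One character is removed at the start and one at the end, if present.
EDGE_SPECIAL = re.compile(r"^[^A-Za-z0-9]|[^A-Za-z0-9]\Z")

def simplify(address: str) -> str:
    """Strip a single non-alphanumeric character from each end of the string."""
    return EDGE_SPECIAL.sub("", address)

for value in ['test', '"test2"', 'Not "this" one', '((a))']:
    print(value, "->", simplify(value))
```

Only one character is removed from each end, so '((a))' becomes '(a)'. To strip runs of special characters, add `+` after each bracket expression.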
From the string ES-123456-PSA Spain-101, I need to extract only ES-123456-101. The delimiter positions are fixed.
I tried REGEXP_SUBSTR('ES-123456-PSA Spain-101','[^-]+',2,3), which gives PSA Spain.
Is there a way to skip that specific part and return the rest?
If you want ES-123456-101 then use this:
SELECT REGEXP_REPLACE('ES-123456-PSA Spain-101', '[^-]+-', '', 1, 3 )
FROM dual;
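The occurrence argument (the 3 in REGEXP_REPLACE(..., 1, 3)) is what makes this work: only the third match of '[^-]+-' ('PSA Spain-') is replaced. Python's `re.sub` has no occurrence parameter, so this sketch emulates it with `re.finditer`. `regexp_replace_nth` is a hypothetical helper, and it treats the replacement as a literal string.

```python
import re

def regexp_replace_nth(source: str, pattern: str, replacement: str, occurrence: int) -> str:
    """Replace only the n-th (1-based) match of pattern with a literal replacement."""
    for index, match in enumerate(re.finditer(pattern, source), start=1):
        if index == occurrence:
            return source[:match.start()] + replacement + source[match.end():]
    return source  # fewer matches than requested: leave the string unchanged

# Matches of '[^-]+-' are 'ES-', '123456-', 'PSA Spain-'; the third is dropped.
print(regexp_replace_nth("ES-123456-PSA Spain-101", r"[^-]+-", "", 3))
```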
If you actually want ES-12345-101, could you explain the logic for 12345 rather than 123456? Is it a typo, or should the last character be omitted?
You can also use SUBSTR and INSTR:
with t as
(
select 'ES-123456-PSA Spain-101' as text from dual
)
select substr(text,1,instr(text,'-',1,2)) -- ES-123456-
||substr(text,instr(text,'-',1,3)+1) -- 101
from t
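The SUBSTR/INSTR arithmetic can be checked with a Python sketch. The hypothetical `instr` helper returns the 1-based position of the n-th occurrence, or 0 if there is none, for a positive start position only. The 1-based positions are then converted to Python's 0-based slicing. This mimics, but is not, Oracle's INSTR.

```python
def instr(text: str, sub: str, start: int = 1, nth: int = 1) -> int:
    """1-based position of the nth occurrence of sub at or after start, else 0."""
    pos = start - 1
    for _ in range(nth):
        pos = text.find(sub, pos)
        if pos == -1:
            return 0
        pos += 1  # converts the hit to 1-based and moves the next search past it
    return pos

text = "ES-123456-PSA Spain-101"
# substr(text, 1, p) keeps the first p characters -> text[:p]
head = text[: instr(text, "-", 1, 2)]  # 'ES-123456-'
# substr(text, p + 1) starts right after 1-based position p -> text[p:]
tail = text[instr(text, "-", 1, 3):]   # '101'
print(head + tail)
```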
A super simple example of my script looks as follows:
-- Report Name: "Report_1"
col letters new_value p_letters
SELECT letters
FROM param_table
WHERE report_name = 'Report_1';
CREATE TABLE temp_table_1
(letter varchar2(1));
INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE '&&p_letters' = '' OR letter IN (&&p_letters);
For some reason, our system has a table called param_table: users enter parameters through the UI, the parameters entered are written to param_table, and then my script pulls the user's parameters from param_table.
As far as I understand, the first SELECT statement selects the letters column from param_table and makes its values accessible in '&&p_letters'. In my INSERT INTO statement, when my WHERE clause looks like this...
WHERE letter IN (&&p_letters);
...and the user enters letters enclosed in single quotes and separated by commas, e.g. ('A', 'B', 'C'), the script runs fine. I want to make the parameter optional, so I adjusted the WHERE clause like this:
WHERE '&&p_letters' = '' OR letter IN (&&p_letters);
In my output file, I get the following error:
WHERE (('' = '') OR letter IN ()) *
ERROR at line ...:
ORA-00936: missing expression
The compiler has evaluated the substitution variable correctly as '', but I'm getting an error.
Any idea what I could be doing wrong here?
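The ORA-00936 is raised because IN () is not valid: there has to be something inside the parentheses. That is what Oracle is complaining about, not the '' = '' part, although that comparison doesn't do what you want either, because Oracle treats an empty string as null and a comparison with null is never true. You can check both conditions: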
SQL> select * from dual where '' = '';
no rows selected
SQL> select * from dual where dummy in ();
select * from dual where dummy in ()
*
ERROR at line 1:
ORA-00936: missing expression
If you set verify on, you can see how the substitution is handled. For your original query you'd see:
old:INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE letter IN (&&p_letters)
new:INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE letter IN ('A','B','C')
3 rows inserted.
You can see that the post-substitution statement looks, and is, valid.
With your modified query you'd see:
old:INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE '&&p_letters' = '' OR letter IN (&&p_letters)
new:INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE ''A','B','C'' = '' OR letter IN ('A','B','C')
which generates an ORA-00920 because of the messed-up single quotes in the first expression. With no value from letters you'd instead see:
old:INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE '&&p_letters' = '' OR letter IN (&&p_letters)
new:INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE '' = '' OR letter IN ()
which is the error you saw, ORA-00936.
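Both failures happen because a substitution variable is replaced as plain text before the statement is parsed. It is not bound as a value. The Python sketch below mimics that textual replacement to reproduce the two "new:" WHERE clauses above. `substitute` is a hypothetical helper, not how SQL*Plus is implemented.

```python
def substitute(statement: str, name: str, value: str) -> str:
    """Mimic &&name substitution as a blind text replacement (illustration only)."""
    return statement.replace("&&" + name, value)

where = "WHERE '&&p_letters' = '' OR letter IN (&&p_letters)"

with_letters = substitute(where, "p_letters", "'A','B','C'")
without_letters = substitute(where, "p_letters", "")
print(with_letters)     # quotes collide in the first expression
print(without_letters)  # empty IN () list
```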
I'd be tempted to do this with a collection type, either your own, or if you're comfortable with it then a built-in one:
INSERT INTO temp_table_1(letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE SYS.DBMS_DEBUG_VC2COLL(&&p_letters) IS EMPTY
OR letter MEMBER OF SYS.DBMS_DEBUG_VC2COLL(&&p_letters);
That works with your three comma-separated values, or null, since an empty collection is allowed. Read more about IS EMPTY and MEMBER OF.
It would be better, of course, to not store comma-separated lists in a single column value anyway, and to change your data model so this kind of manipulation and reliance on client behaviour isn't necessary.
Assuming you're stuck with the data model, you could at least avoid relying on the client by tokenizing the string (I'm using one common approach below) and looking for matches. However, you also need to account for the report name not being in the table at all, or the report existing with no letters value. Both cases are handled by the MAX(letters) ... IS NULL check, which makes it a bit ugly.
It's all in one statement though, with no need for a separate query to get the parameters and no need for substitution variables. (And there may be better ways to do it!)
INSERT INTO temp_table_1 (letter)
SELECT DISTINCT letter
FROM table_alphabet
WHERE (
SELECT MAX(letters)
FROM param_table
WHERE report_name = 'Report_2'
) IS NULL
OR letter IN (
SELECT TRIM(q'[']' FROM REGEXP_SUBSTR(letters, '[^,]', 1, LEVEL))
FROM param_table
WHERE report_name = 'Report_2'
CONNECT BY REGEXP_SUBSTR(letters, '[^,]', 1, level) IS NOT NULL
);
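The idea behind the single statement is "no parameter means no filter, otherwise keep only the listed letters". Here it is as a small Python sketch. `filter_letters` is a hypothetical helper. The caller is assumed to pass None when the report row is missing, which mirrors the MAX(letters) IS NULL branch. Tokenizing here splits on commas and strips spaces and quotes, which is not identical to the REGEXP_SUBSTR approach in the SQL.

```python
from typing import Iterable, List, Optional

def filter_letters(alphabet: Iterable[str], letters_param: Optional[str]) -> List[str]:
    """Distinct letters matching the parameter, or all of them if it is empty."""
    distinct = sorted(set(alphabet))
    if not letters_param:  # None or '' (Oracle treats '' as null anyway)
        return distinct
    # "'A', 'B'" -> {'A', 'B'}: split on commas, then strip spaces and quotes
    wanted = {token.strip().strip("'") for token in letters_param.split(",")}
    return [letter for letter in distinct if letter in wanted]

print(filter_letters("ABCDE", "'A','B','C'"))
print(filter_letters("ABCDE", None))
```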
I'm using the REGEXP_LIKE function with the following pattern:
^[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$
However, I have a column that contains the following values to analyze:
REGEXP_LIKE (column_name,'^[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$')
FRANÞOISVERBEKE#TISCALINET.BE
GENEVIÞVE.DELSOIR#MINFIN.FED.BE
CREVECOEURÆ-OLI#HOTMAIL.COM
HERVÉ.GHILBERT#SKYNET.BE
As you can see, all of them contain special characters, yet all of them are considered valid when I use the function with this pattern.
Do you know why, given that I'm not specifying those special characters in the pattern? How can I exclude all special characters with this function and this pattern?
I am not entirely sure you can accomplish this within your regular expression. However, you could add an additional filter as below:
SELECT * FROM table_name
WHERE REGEXP_LIKE(column_name,'^[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$')
AND REPLACE(TRANSLATE(LOWER(column_name), 'abcdefghijklmnopqrstuvwxyz0123456789#+.-_%','zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz'),'z') IS NULL -- nothing left once the allowed characters are removed
The TRANSLATE() function replaces all of the "regular" letters (plus the characters ordinarily allowed in email addresses; I think I've got them all) with 'z's, and the REPLACE() function then removes those 'z's. If the resulting string IS NOT NULL, it contains "special" characters.
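The TRANSLATE/REPLACE trick amounts to "delete every allowed character and see what is left". In Python, a sketch can do this in one step with `str.maketrans`, whose third argument lists characters to delete. `special_chars` is a hypothetical helper. It returns an empty string where Oracle would return NULL.

```python
ALLOWED = "abcdefghijklmnopqrstuvwxyz0123456789#+.-_%"

# Third argument of str.maketrans = characters to delete, which collapses
# TRANSLATE(..., 'zzz...') followed by REPLACE(..., 'z') into a single step.
DELETE_ALLOWED = str.maketrans("", "", ALLOWED)

def special_chars(value: str) -> str:
    """Characters left after removing the allowed set ('' means the value is clean)."""
    return value.lower().translate(DELETE_ALLOWED)

for email in ["FRAN\u00deOISVERBEKE#TISCALINET.BE", "JOHN.DOE#EXAMPLE.COM"]:
    print(email, repr(special_chars(email)))
```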
I could not confirm that this actually works since the character set in my database is ASCII and doesn't return "special" characters for the regex. But I confirmed that the REPLACE(TRANSLATE()) clause does work:
WITH t1 AS (
SELECT 'FRANÞOISVERBEKE#TISCALINET.BE' AS mycolumn FROM dual
)
SELECT mycolumn
, REPLACE(TRANSLATE(LOWER(mycolumn),'abcdefghijklmnopqrstuvwxyz0123456789#+.-_%','zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz'),'z') AS mynewcolumn
FROM t1
WHERE REPLACE(TRANSLATE(LOWER(mycolumn),'abcdefghijklmnopqrstuvwxyz0123456789#+.-_%','zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz'),'z') IS NOT NULL
Result:
MYCOLUMN                         MYNEWCOLUMN
FRANÞOISVERBEKE#TISCALINET.BE    þ