Escape Pipe in SQL Loader - oracle

I have a pipe delimited file which has to be loaded via SQL*Loader in Oracle.
My control file looks like this:
LOAD DATA
REPLACE
INTO TABLE TABLE1
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(
ID "TRIM(:ID)",
TEXT "NVL(TRIM(:TEXT),' ')"
)
The TEXT column in the data file can contain "|", i.e., the delimiter itself.
How can I accept pipe in the TEXT column?

You can't escape the delimiter; but if you want everything up to the first pipe to be the ID and everything after the first pipe to be TEXT, you could treat the record in the data file as a single field and split it using SQL functions, e.g.:
LOAD DATA
INFILE ...
REPLACE
INTO TABLE TABLE1
TRAILING NULLCOLS
(
ID CHAR(4000) "regexp_replace(:ID, '^(.*?)(\\|(.*))?$', '\\1')",
TEXT EXPRESSION "regexp_replace(:ID, '^(.*?)(\\|(.*))?$', '\\3')"
)
There is no FIELDS clause.
The ID is initially up to 4000 characters from the line (just a large value to hopefully capture any data you have). A regex replace is then applied to that; the pattern defines a first group as any characters (non-greedy), optionally followed by a second group comprising a pipe and a third inner group of zero or more characters after that pipe. The original value is replaced by group 1.
The TEXT is defined as an EXPRESSION, meaning it isn't obtained directly from the file; instead the same regex pattern is applied to the original ID value, but now that is replaced by the third group, which is everything after the first pipe (if there is one).
An equivalent in plain SQL as a demo would be:
with data (id) as (
select '123|test 1' from dual
union all
select '234|test 2|with pipe' from dual
union all
select '345|test 3|with|multiple|pipes|' from dual
union all
select null from dual
union all
select '678' from dual
union all
select '789|' from dual
)
select id as original,
regexp_replace(ID, '^(.*?)(\|(.*))?$', '\1') as id,
regexp_replace(ID, '^(.*?)(\|(.*))?$', '\3') as text
from data;
which gives (the blank row is the NULL input):
ORIGINAL                        ID   TEXT
------------------------------- ---- ------------------------------
123|test 1                      123  test 1
234|test 2|with pipe            234  test 2|with pipe
345|test 3|with|multiple|pipes| 345  test 3|with|multiple|pipes|

678                             678
789|                            789
If every record is guaranteed to contain at least one pipe, the regex can be simpler (a record with no pipe would not match it, so REGEXP_REPLACE would return the whole line unchanged for both ID and TEXT):
(
ID CHAR(4000) "regexp_replace(:ID, '^(.*?)\\|(.*)$', '\\1')",
TEXT EXPRESSION "regexp_replace(:ID, '^(.*?)\\|(.*)$', '\\2')"
)
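To check what the two patterns do outside the database, here is a small Python sketch that applies the same regular expressions with Python's re module. It assumes Python's regex semantics agree with Oracle's for these simple patterns. It mimics Oracle by turning empty strings into None (Oracle treats '' as NULL) and by returning the input unchanged when the simpler pattern does not match, as REGEXP_REPLACE does. The function names are made up for illustration.

```python
import re

# Group 1: up to the first pipe (non-greedy); optional group 2: the pipe
# plus group 3, everything after it.
FULL = re.compile(r'^(.*?)(\|(.*))?$')
# Simpler pattern: only safe when every record contains at least one pipe.
SIMPLE = re.compile(r'^(.*?)\|(.*)$')


def to_null(value):
    """Oracle treats '' as NULL; mimic that with None."""
    return value if value else None


def split_full(record):
    """Return (id, text) the way the two REGEXP_REPLACE calls would."""
    if record is None:
        return (None, None)
    m = FULL.match(record)  # always matches, because group 2 is optional
    return (to_null(m.group(1)), to_null(m.group(3)))


def split_simple(record):
    """No match means REGEXP_REPLACE returns the input unchanged."""
    m = SIMPLE.match(record)
    if m is None:
        return (record, record)
    return (to_null(m.group(1)), to_null(m.group(2)))
```

With the simpler pattern, a record such as '678' ends up in both columns, which is why the fuller pattern is the safer default.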

Related

Correct EMAIL Oracle

I have a table with an email field; this field may only contain the following characters:
'abcdefghijklmnopqrstuvwxyz0123456789. # _- +'
How can I check the email field to find out whether it contains any characters other than the ones I mentioned ('abcdefghijklmnopqrstuvwxyz0123456789. # _- +')?
This sounds like a perfect job for a regular expression - just check whether the E-Mail contains any characters that are not in your list. You can use regexp_like for this:
regexp_like(e_mail, '[^-a-z0-9.#_ +]')
(I've replaced a...z and 0..9 with the respective ranges - shorter and more readable. Note that the hyphen '-' has to be the first character after the initial caret '^' to indicate that it is a literal hyphen and not part of a character range).
Simple test case:
with v_data(e_mail) as (
select 'xyz#abc.com' from dual union all
select 'xyz(#def.com' from dual union all
select 'ab123-def#gmail.com' from dual
)
select
e_mail,
(case
when regexp_like(e_mail, '[^-a-z0-9.#_ +]') then 'NO'
else 'YES'
end) as is_valid_email
from v_data
However, a valid e-mail address can contain many additional characters, uppercase letters for example.
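For trying the character class outside the database, here is a Python sketch of the same check; is_valid_email is a hypothetical helper. Python's re is case-sensitive here, so an uppercase letter counts as invalid. In Oracle, REGEXP_LIKE's default case sensitivity follows NLS_SORT unless you pass a match parameter, so results there may depend on your session settings.

```python
import re

# Any character NOT in the allowed set; '-' right after '^' is a literal hyphen.
INVALID_CHAR = re.compile(r'[^-a-z0-9.#_ +]')


def is_valid_email(e_mail):
    """Mirror of the CASE expression: 'NO' if any disallowed character occurs."""
    return 'NO' if INVALID_CHAR.search(e_mail) else 'YES'


for e_mail in ['xyz#abc.com', 'xyz(#def.com', 'ab123-def#gmail.com']:
    print(e_mail, is_valid_email(e_mail))
```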

How to replace string using Regexp_Replace in oracle

I want to replace this:
"STORES/KOL#10/8/36#1718.00#4165570.00#119539388#PT3624496#9902001#04266#6721#PT3624496-11608091-1-55-STORES/KOL"
with this:
"STORES/KOL#10#8#36#1718.00#4165570.00#119539388#PT3624496#9902001#04266#6721#PT3624496-11608091-1-55-STORES/KOL"
Basically this is a conditional replace: I want to replace / with #, but only in some places.
For example, the STORES/KOL string should stay STORES/KOL,
but the 10/8/36 string should become 10#8#36.
This will replace the 2nd and 3rd / character with a #:
Oracle Setup:
CREATE TABLE test_data ( value ) AS
SELECT '"STORES/KOL#10/8/36#1718.00#4165570.00#119539388#PT3624496#9902001#04266#6721#PT3624496-11608091-1-55-STORES/KOL"'
FROM DUAL;
Query:
SELECT REGEXP_REPLACE(
value,
'^(.*?/.*?)/(.*?)/(.*)$',
'\1#\2#\3'
) AS replacement
FROM test_data
Output:
| REPLACEMENT |
| :---------------------------------------------------------------------------------------------------------------- |
| "STORES/KOL#10#8#36#1718.00#4165570.00#119539388#PT3624496#9902001#04266#6721#PT3624496-11608091-1-55-STORES/KOL" |
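This pattern is purely positional: it always rewrites the second and third slashes, wherever they occur. A quick way to see that is to run the same regex through Python's re.sub. This sketch assumes Python's non-greedy matching behaves like Oracle's here, and replace_2nd_and_3rd_slash is a hypothetical name.

```python
import re


def replace_2nd_and_3rd_slash(value):
    # Group 1: everything before the 2nd '/'; group 2: the text between the
    # 2nd and 3rd '/'; group 3: the rest of the string.
    return re.sub(r'^(.*?/.*?)/(.*?)/(.*)$', r'\1#\2#\3', value, count=1)


s = ('"STORES/KOL#10/8/36#1718.00#4165570.00#119539388#PT3624496'
     '#9902001#04266#6721#PT3624496-11608091-1-55-STORES/KOL"')
print(replace_2nd_and_3rd_slash(s))
print(replace_2nd_and_3rd_slash('A/B/C/D'))  # prints A/B#C#D
```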
Here is one option using REGEXP_REPLACE. We can try targeting the following regex pattern:
#(\d+)/(\d+)/(\d+)#
Then replace using the three capture groups, replacing the path separators with pound signs.
WITH yourTable AS (
SELECT 'STORES/KOL#10/8/36#1718.00#4165570.00#119539388#PT3624496#9902001#04266#6721#PT3624496-11608091-1-55-STORES/KOL' AS input FROM dual
)
SELECT
input,
REGEXP_REPLACE(input, '#(\d+)/(\d+)/(\d+)#', '#\1#\2#\3#') AS output
FROM yourTable;
Whether or not this regex replacement is specific enough/accurate for the rest of your data depends on that data, which you never showed us.
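To see how specific this pattern is, here is a Python sketch of the same replacement. It assumes Python's \d and re.sub behave like Oracle's for this pattern; replace_date_slashes is a hypothetical name. Only a digits/digits/digits run framed by '#' on both sides is touched. Because matches do not overlap in re.sub, two such runs sharing a single '#' would not both be rewritten.

```python
import re


def replace_date_slashes(value):
    # Only '#<digits>/<digits>/<digits>#' is rewritten; other slashes stay.
    return re.sub(r'#(\d+)/(\d+)/(\d+)#', r'#\1#\2#\3#', value)


s = ('STORES/KOL#10/8/36#1718.00#4165570.00#119539388#PT3624496'
     '#9902001#04266#6721#PT3624496-11608091-1-55-STORES/KOL')
print(replace_date_slashes(s))
```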
with s as (select '"STORES/KOL#10/8/36#1718.00#4165570.00#119539388#PT3624496#9902001#04266#6721#PT3624496-11608091-1-55-STORES/KOL"' str from dual)
select
replace(replace(str, '/', '#'), 'STORES#KOL', 'STORES/KOL') result_str_1,
regexp_replace(str, '(\d)/', '\1#') result_str_2
from s;
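The query above shows two alternatives side by side. result_str_1 turns every '/' into '#' and then restores the one literal that must keep its slash. result_str_2 only replaces a '/' that directly follows a digit. The Python sketch below applies the same two ideas to the sample string, assuming that string is representative of the real data.

```python
import re

s = ('"STORES/KOL#10/8/36#1718.00#4165570.00#119539388#PT3624496'
     '#9902001#04266#6721#PT3624496-11608091-1-55-STORES/KOL"')

# result_str_1: replace every '/', then put the slash back into STORES/KOL.
result_str_1 = s.replace('/', '#').replace('STORES#KOL', 'STORES/KOL')
# result_str_2: only a '/' preceded by a digit becomes '#'.
result_str_2 = re.sub(r'(\d)/', r'\1#', s)

print(result_str_1)
print(result_str_2)
```

The first approach depends on knowing every literal that must keep its slash. The second depends on the date parts being the only slashes that follow a digit.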

splitting a comma separated field and use in 'IN' clause oracle sql [duplicate]

I have (and don't own, so I can't change) a table with a layout similar to this.
ID | CATEGORIES
---------------
1 | c1
2 | c2,c3
3 | c3,c2
4 | c3
5 | c4,c8,c5,c100
I need to return the rows that contain a specific category id. I started by writing the queries with LIKE statements, because the values can be anywhere in the string
SELECT id FROM table WHERE categories LIKE '%c2%';
Would return rows 2 and 3
SELECT id FROM table WHERE categories LIKE '%c3%' and categories LIKE '%c2%';
Would again get me rows 2 and 3, but not row 4
SELECT id FROM table WHERE categories LIKE '%c3%' or categories LIKE '%c2%';
Would get me rows 2, 3, and 4
I don't like all the LIKE statements. I've found FIND_IN_SET() in the Oracle documentation but it doesn't seem to work in 10g. I get the following error:
ORA-00904: "FIND_IN_SET": invalid identifier
00904. 00000 - "%s: invalid identifier"
when running this query: SELECT id FROM table WHERE FIND_IN_SET('c2', categories); (example from the docs) or this query: SELECT id FROM table WHERE FIND_IN_SET('c2', categories) <> 0; (example from Google)
I would expect it to return rows 2 and 3.
Is there a better way to write these queries instead of using a ton of LIKE statements?
You can, using LIKE. You don't want to match for partial values, so you'll have to include the commas in your search. That also means that you'll have to provide an extra comma to search for values at the beginning or end of your text:
select
*
from
YourTable
where
',' || CommaSeparatedValueColumn || ',' LIKE '%,SearchValue,%'
But this query will be slow: a LIKE pattern with a leading wildcard can't use an ordinary index, so every row has to be scanned.
And there's always a risk. If there are spaces around the values, or values can contain commas themselves in which case they are surrounded by quotes (like in csv files), this query won't work and you'll have to add even more logic, slowing down your query even more.
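A small Python sketch of the comma-wrapping idea, using the sample rows from the question. ids_like and ROWS are illustrative names, and the plain substring test stands in for LIKE (unlike LIKE, it does not treat % or _ in the search value as wildcards).

```python
# Sample data from the question: id -> categories.
ROWS = {1: 'c1', 2: 'c2,c3', 3: 'c3,c2', 4: 'c3', 5: 'c4,c8,c5,c100'}


def ids_like(search_value, rows=ROWS):
    # ',' || categories || ',' LIKE '%,value,%'
    # i.e. ',value,' occurs somewhere in ',categories,'
    needle = ',' + search_value + ','
    return sorted(i for i, cats in rows.items() if needle in ',' + cats + ',')


print(ids_like('c2'))
print(ids_like('c1'))  # row 5's 'c100' is not a false match
```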
A better solution would be to add a child table for these categories, or better still, a separate table for the categories plus a table that cross-links them to YourTable.
You can write a PIPELINED table function which returns a one-column table. Each row is a value from the comma-separated string. Use something like this to pop a string from the list and put it as a row into the table:
PIPE ROW(ltrim(rtrim(substr(l_list, 1, l_idx - 1),' '),' '));
Usage:
SELECT * FROM MyTable
WHERE 'c2' IN (SELECT column_value FROM TABLE(Util_Pkg.split_string(categories)));
See the Oracle documentation on pipelined table functions for more.
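To make the pop-and-pipe loop concrete, here is a hypothetical Python analogue of such a split function. A generator plays the role of PIPE ROW, str.find stands in for INSTR, and slicing stands in for SUBSTR. The name split_string only mirrors the Util_Pkg.split_string used above; it is not a real package or library function.

```python
def split_string(l_list, sep=','):
    """Hypothetical analogue of a pipelined split function: yields one
    trimmed item per delimited value, the way PIPE ROW emits rows."""
    while True:
        l_idx = l_list.find(sep)  # like INSTR, but 0-based and -1 when absent
        if l_idx == -1:
            yield l_list.strip(' ')  # last (or only) item
            return
        # PIPE ROW(ltrim(rtrim(substr(l_list, 1, l_idx - 1))))
        yield l_list[:l_idx].strip(' ')
        l_list = l_list[l_idx + len(sep):]  # pop the item and its delimiter


ROWS = {1: 'c1', 2: 'c2,c3', 3: 'c3,c2', 4: 'c3', 5: 'c4,c8,c5,c100'}

# WHERE 'c2' IN (SELECT column_value FROM TABLE(split_string(categories)))
matching_ids = [i for i, cats in ROWS.items() if 'c2' in split_string(cats)]
print(matching_ids)
```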
Yes and No...
"Yes":
Normalize the data (strongly recommended), i.e. split the categories column so that each category is stored separately; then you can just query it in the normal fashion.
"No":
As long as you keep this "pseudo-structure" there will be several issues (performance and others) and you will have to do something similar to:
SELECT * FROM MyTable WHERE categories LIKE 'c2,%' OR categories = 'c2' OR categories LIKE '%,c2,%' OR categories LIKE '%,c2'
If you absolutely must, you could define a function named FIND_IN_SET like the following:
CREATE OR REPLACE FUNCTION FIND_IN_SET
  ( vSET IN VARCHAR2, vToFind IN VARCHAR2 )
  RETURN NUMBER
IS
  rRESULT NUMBER;
BEGIN
  rRESULT := -1;
  -- 1 if vToFind is the first, the only, a middle, or the last element of vSET; else 0
  SELECT COUNT(*) INTO rRESULT FROM DUAL
   WHERE vSET LIKE (vToFind || ',%')
      OR vSET = vToFind
      OR vSET LIKE ('%,' || vToFind || ',%')
      OR vSET LIKE ('%,' || vToFind);
  RETURN rRESULT;
END;
/
You can then use that function like:
SELECT * FROM MyTable WHERE FIND_IN_SET (categories, 'c2' ) > 0;
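For testing the four conditions without a database, here is a Python analogue of that function. find_in_set is an illustrative name. It uses the same argument order as the PL/SQL version (set first, value second) and, unlike LIKE, does not treat % or _ as wildcards.

```python
def find_in_set(v_set, v_to_find):
    """1 if v_to_find is the first, only, a middle, or the last element
    of the comma-separated v_set, else 0 (like COUNT(*) FROM DUAL)."""
    hit = (v_set.startswith(v_to_find + ',')        # LIKE vToFind || ',%'
           or v_set == v_to_find                    # = vToFind
           or (',' + v_to_find + ',') in v_set      # LIKE '%,' || vToFind || ',%'
           or v_set.endswith(',' + v_to_find))      # LIKE '%,' || vToFind
    return 1 if hit else 0


ROWS = {1: 'c1', 2: 'c2,c3', 3: 'c3,c2', 4: 'c3', 5: 'c4,c8,c5,c100'}
print([i for i, cats in ROWS.items() if find_in_set(cats, 'c2') > 0])
```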
For the sake of future searchers, don't forget the regular expression way:
with tbl as (
select 1 ID, 'c1' CATEGORIES from dual
union
select 2 ID, 'c2,c3' CATEGORIES from dual
union
select 3 ID, 'c3,c2' CATEGORIES from dual
union
select 4 ID, 'c3' CATEGORIES from dual
union
select 5 ID, 'c4,c8,c5,c100' CATEGORIES from dual
)
select *
from tbl
where regexp_like(CATEGORIES, '(^|\W)c3(\W|$)');
ID CATEGORIES
---------- -------------
2 c2,c3
3 c3,c2
4 c3
This matches when the value is delimited by a non-word character (\W) or by the start or end of the string, so even if the comma was followed by a space it would still work. If you want to be stricter and match only where a comma separates values, replace the '\W' with a comma. At any rate, read the regular expression as:
match a group of either the beginning of the string or a non-word character, followed by the target search value, followed by a group of either a non-word character or the end of the string.
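The same filter as a Python sketch over the sample rows. ids_with is an illustrative name, and the sketch assumes Python's \W behaves like Oracle's for these plain ASCII values. re.escape guards against search values that contain regex metacharacters.

```python
import re

ROWS = {1: 'c1', 2: 'c2,c3', 3: 'c3,c2', 4: 'c3', 5: 'c4,c8,c5,c100'}


def ids_with(category, rows=ROWS):
    # (^|\W) value (\W|$): the value must be bounded by the start/end of the
    # string or by a non-word character such as ',' or ' '.
    pattern = r'(^|\W)' + re.escape(category) + r'(\W|$)'
    return sorted(i for i, cats in rows.items() if re.search(pattern, cats))


print(ids_with('c3'))
```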
As long as the comma-delimited list is 512 characters or less, you can also use a regular expression in this instance (the pattern argument of Oracle's regular expression functions, e.g., REGEXP_LIKE(), is limited to 512 bytes, and here the list itself becomes the pattern):
SELECT id, categories
FROM mytable
WHERE REGEXP_LIKE('c2', '^(' || REPLACE(categories, ',', '|') || ')$', 'i');
In the above I'm replacing the commas with the regular expression alternation operator |. If your list of delimited values is already |-delimited, so much the better.
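A Python sketch of the same trick. list_contains is a hypothetical helper, and re.IGNORECASE plays the role of the 'i' match parameter. Python has no 512-byte limit, but the caveat carries over: each category becomes part of the pattern, so a value containing a metacharacter such as '.' would need escaping (e.g. with re.escape on each item).

```python
import re


def list_contains(categories, value):
    # Turn 'c2,c3' into '^(c2|c3)$' and test the search value against it.
    pattern = '^(' + categories.replace(',', '|') + ')$'
    return re.search(pattern, value, re.IGNORECASE) is not None


ROWS = {1: 'c1', 2: 'c2,c3', 3: 'c3,c2', 4: 'c3', 5: 'c4,c8,c5,c100'}
print([i for i, cats in ROWS.items() if list_contains(cats, 'c2')])
```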

Multiple lines in a column in oracle to a single row

My Oracle table is as follows (the Address column contains multiple lines):
ID Address
--------------------
1456897 No 61
11th Street
Tatabad Coimbatore - 641012
How can I get the desired result below, with the Address column on a single line?
ID Address
-------------------------
1456897 No 61 , 11th Street, Tatabad Coimbatore - 641012
I don't know whether your database stores its newlines as \x0a, \x0d or \x0d\x0a, so I propose the following solution, which handles all three kinds of newline. It will, however, replace multiple consecutive newlines with a single ", ". This might be what you want, or it might not.
select
id,
regexp_replace(
address,
'('||chr(10)||'|'||chr(13)||')+',
', ') as address,
....
from
....
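The same replacement as a Python sketch, handy for checking the pattern on sample text. single_line is a hypothetical name. Any run of LF/CR characters, so LF, CR and CRLF alike, becomes one ", ".

```python
import re


def single_line(address):
    # Same idea as '(' || CHR(10) || '|' || CHR(13) || ')+' in Oracle.
    return re.sub(r'(\n|\r)+', ', ', address)


print(single_line('No 61\n11th Street\r\nTatabad Coimbatore - 641012'))
```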
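Remove the newline characters in the column with something like the following (Oracle string literals don't interpret '\n' as a newline, so use CHR()):
SELECT REPLACE(Address_column, CHR(10), ' ') -- CHR(10) is \n; it might also be CHR(13)||CHR(10) (\r\n) or even CHR(13) (\r)
FROM table_name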

SQL*Loader: Dealing with delimiter characters in data

I am loading some data to Oracle via SQLLDR. The source file is "pipe delimited".
FIELDS TERMINATED BY '|'
But some records contain the pipe character as data, not as a separator. This breaks the load, because SQL*Loader treats those in-data pipe characters as field terminators.
Can you point me in a direction to solve this issue?
Data file is about 9 GB, so it is hard to edit manually.
For example,
Loaded row:
ABC|1234567|STR 9 R 25|98734959,32|28.12.2011
Rejected Row:
DE4|2346543|WE| 454|956584,84|28.11.2011
Error:
Rejected - Error on table HSX, column DATE_N.
ORA-01847: day of month must be between 1 and last day of month
DATE_N column is the last one.
You could use no separator at all and do something like:
field FILLER CHAR(4000),
col1 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)$', '\\1')",
col2 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)$', '\\2')",
col3 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)$', '\\3')",
col4 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)$', '\\4')",
col5 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)$', '\\5')"
This regexp takes five capture groups (inside parentheses), one per field in your sample rows, separated by a vertical bar (I had to escape it because otherwise it means OR in a regexp). All groups except the third cannot contain a vertical bar ([^|]*), the third group may contain anything (.*), and the regexp must span from the beginning to the end of the line (^ and $).
This way we are sure that the third group will eat all superfluous separators. This only works because you have only one field that may contain separators. If you want to double-check, you can for example require that the fourth group starts with a digit (include \d at the beginning of the fourth parenthesized block).
I have doubled all backslashes because we are inside a double-quoted expression, but I am not really sure that I ought to.
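The sample rows have five fields, so the sketch below applies a five-group version of the pattern in Python to both sample lines. It assumes the real file has exactly five fields per line, as in the question's rows. split_row is a hypothetical helper. Python needs no doubled backslashes because a raw string is used.

```python
import re

# Five fields; only the third may contain '|', so only it uses (.*).
FIVE_FIELDS = re.compile(r'^([^|]*)\|([^|]*)\|(.*)\|([^|]*)\|([^|]*)$')


def split_row(line):
    """Return the five fields as a tuple, or None if the line doesn't fit."""
    m = FIVE_FIELDS.match(line)
    return m.groups() if m else None


print(split_row('ABC|1234567|STR 9 R 25|98734959,32|28.12.2011'))
print(split_row('DE4|2346543|WE| 454|956584,84|28.11.2011'))
```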
It looks to me as if SQL*Loader can't really handle your file, because the third field can contain the delimiter, is not enclosed in quotes and has a variable length. Instead, if the data you provide is an accurate example, then I can provide a sample workaround. First, create a table with one VARCHAR2 column as long as the longest line in your file. Then just load the entire file into this table. From there you can extract each column with a query such as:
with CTE as
(select 'ABC|1234567|STR 9 R 25|98734959,32|28.12.2011' as CTETXT
from dual
union all
select 'DE4|2346543|WE| 454|956584,84|28.11.2011' from dual)
select substr(CTETXT, 1, instr(CTETXT, '|') - 1) as COL1
,substr(CTETXT
,instr(CTETXT, '|', 1, 1) + 1
,instr(CTETXT, '|', 1, 2) - instr(CTETXT, '|', 1, 1) - 1)
as COL2
-- COL3 runs from the 2nd pipe to the second-to-last pipe, so it keeps any embedded pipes
,substr(CTETXT
,instr(CTETXT, '|', 1, 2) + 1
,instr(CTETXT, '|', -1, 2) - instr(CTETXT, '|', 1, 2) - 1)
as COL3
,substr(CTETXT
,instr(CTETXT, '|', -1, 2) + 1
,instr(CTETXT, '|', -1, 1) - instr(CTETXT, '|', -1, 2) - 1)
as COL4
,substr(CTETXT, instr(CTETXT, '|', -1, 1) + 1) as COL5
from CTE
It's not perfect (though it may be adaptable to SQL*Loader) but would need a bit of work if you have more columns or if your third field is not what I think it is. But, it's a start.
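The same INSTR/SUBSTR idea as a Python sketch, extended to the five fields in the sample data. It assumes every line has five fields and only the third may contain '|'. split_by_position is a hypothetical name: find locates pipes from the left like INSTR(..., 1, n), and rfind from the right like INSTR(..., -1, n).

```python
def split_by_position(line):
    """Split a five-field line where only field 3 may contain '|'."""
    p1 = line.find('|')                       # INSTR(txt, '|', 1, 1)
    p2 = line.find('|', p1 + 1)               # INSTR(txt, '|', 1, 2)
    last = line.rfind('|')                    # INSTR(txt, '|', -1, 1)
    second_last = line.rfind('|', 0, last)    # INSTR(txt, '|', -1, 2)
    return (line[:p1], line[p1 + 1:p2], line[p2 + 1:second_last],
            line[second_last + 1:last], line[last + 1:])


print(split_by_position('DE4|2346543|WE| 454|956584,84|28.11.2011'))
```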
OK, I recommend you parse the file and replace the delimiter.
On the Unix/Linux command line you could do:
awk -F'|' '{printf("%s\t%s\t", $1, $2); for (k = 3; k < NF-2; k++) printf("%s|", $k); printf("%s\t%s\t%s\n", $(NF-2), $(NF-1), $NF)}' current_file > new_file
This command will not change your current file.
It creates a new, tab-delimited file with five fields (tab rather than comma, because the fourth field itself contains commas, e.g. 98734959,32).
It splits each input line on "|" and writes the first field, the second field, everything from the third field to the third-to-last field (re-joined with "|"), the second-to-last field and the last field.
You can then load new_file with sqlldr using FIELDS TERMINATED BY X'09' (tab).
UPDATE:
The command can be put in a script (named parse.awk) like:
#!/usr/bin/awk -f
# parse.awk
BEGIN {FS="|"}
{
printf("%s\t%s\t", $1, $2);
# re-join the middle fields, restoring the "|" that belongs to the data
for(k=3;k<NF-2;k++)
printf("%s|", $k);
printf("%s\t%s\t%s\n", $(NF-2),$(NF-1),$NF);
}
and you can run it this way:
awk -f parse.awk current_file > new_file
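The same re-delimiting step as a Python sketch, for readers without awk. redelimit is a hypothetical name. The tab output separator is an assumption, chosen because the fourth field already contains commas (e.g. 98734959,32). Like the awk version, it assumes every line has at least five "|"-separated pieces.

```python
def redelimit(line, out_sep='\t'):
    """Keep the first two and last two fields; re-join everything in
    between with '|' so the data pipes survive in the third field."""
    f = line.rstrip('\n').split('|')
    middle = '|'.join(f[2:-2])  # $3 .. $(NF-2) in awk terms
    return out_sep.join([f[0], f[1], middle, f[-2], f[-1]])


print(redelimit('DE4|2346543|WE| 454|956584,84|28.11.2011\n'))
```

To process a whole file, you would apply redelimit to each line and write the results to the new file.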
