I have included notrim for rowdata column in external table as suggesterd by Alex (This is a continuation of this question,),
But now End of Line character is also appending at the rowdata column, I mean , End of line (CR-LF) is also joins at the end of rowdata.
I don't want to use substr() or translate() , since file size is around 1GB,
My external table creation process :
'CREATE TABLE ' || rec.ext_table_name || ' (ROW_DATA VARCHAR2(4000)) ORGANIZATION EXTERNAL ' ||
'(TYPE ORACLE_LOADER DEFAULT DIRECTORY ' || rec.dir_name || ' ACCESS ' || 'PARAMETERS (RECORDS ' ||
'DELIMITED by NEWLINE NOBADFILE NODISCARDFILE ' ||
'FIELDS REJECT ROWS WITH ALL NULL FIELDS (ROW_DATA POSITION(1:4000) char)) LOCATION (' || l_quote ||
'temp.txt' || l_quote || ')) REJECT LIMIT UNLIMITED'
Is there any other paramenter I can add , to remove the End-of-line character. Thanks.
EDIT 1:
My file :
Some first line with spaces at end
Some second line with spaces at end
My Ext table :
Some first line with spaces at end <EOL>
Some second line with spaces at end <EOL>
to be more clear , I will explain in java (when I assign column values to string , it is something like below),
without notrim :
rowdata[1]="Some first line with spaces at end";
rowdata[2]="Some second line with spaces at end";
with notrim:
rowdata[1]="Some first line with spaces at end \n";
rowdata[2]="Some second line with spaces at end \n";
what I want it to be :
rowdata[1]="Some first line with spaces at end ";
rowdata[2]="Some second line with spaces at end ";
the delimiter is also a part of rowdata, since no trim is specified.
EDIT2:
Line-Endings : CRLF
Platform :
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit
Production PL/SQL Release 12.1.0.1.0 - Production
"CORE 12.1.0.1.0 Production" TNS for Solaris: Version 12.1.0.1.0 -
Production NLSRTL Version 12.1.0.1.0 - Production
SELECT DUMP(ROW_DATA,1016) FROM EXT_TABLE WHERE ROWNUM = 1;
Typ=1 Len=616 CharacterSet=AL32UTF8:
41,30,30,30,30,30,30,30,30,30,30,31,30,30,30,30,37,36,36,36,44,30,30,30,30,31,32,35,30,38,31,36,32,35,30,38,31,36,31,33,34,37,30,39,44,42,20,41,30,36,31,30,30,30,30,30,30,30,30,30,30,30,30,32,30,30,4d,59,52,20,32,5a,20,30,31,36,30,30,30,31,32,31,32,33,34,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,52,49,42,46,50,58,30,30,30,31,30,30,30,30,30,30,30,30,31,30,36,32,38,30,31,30,32,30,30,47,20,20,20,20,53,20,20,30,30,30,30,30,30,30,30,30,30,30,20,20,20,20,20,20,20,4e,39,32,37,32,20,20,20,20,20,20,30,30,30,30,30,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,39,39,38,54,45,53,54,52,52,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,54,45,53,54,4f,50,44,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,54,45,53,54,54,52,41,4e,53,49,44,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,54,45,53,54,52,52,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,54,45,53,54,4f,50,44,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,54,45,53,54,54,52,41,4e,53,49,44,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,d
Len should be 615
Your file line endings are CRLF (suggesting the file is created in Windows?), but your database is running on Solaris. As the documentation says:
If DELIMITED BY NEWLINE is specified, then the actual value used is platform-specific. On UNIX platforms, NEWLINE is assumed to be "\n". On Windows operating systems, NEWLINE is assumed to be "\r\n".
As your database platform is Unix it's only using the LF (\n) as the record delimiter. You can either change the delimiter in your file, or change the terminated by clause to look for the Windows line-ending:
,,,
records delimited by "\r\n" nobadfile ...
If you might get files with either type of line ending and can't control that, you could add a preprocessor step to strip any that do exist. If you create an executable script file, either in the same directory as the file or (as Oracle recommends) in a different Oracle-accessible directory, say called remove_cr which contains:
/usr/bin/sed -e "s/\\r$//" $1
you can add a call to that in your external table definition, and keep the newline temrinator:
...
records delimited by newline nobadfile nodiscardfile
preprocessor 'remove_cr'
...
Make sure you read the the security warnings in the documentation though.
Demo with a temp.txt file with CRLF line endings:
create table t42_ext (
row_data varchar2(4000)
)
organization external
(
type oracle_loader default directory d42 access parameters
(
records delimited by newline nobadfile nodiscardfile
preprocessor 'remove_cr'
fields reject rows with all null fields
(
row_data position(1:4000) char notrim
)
)
location ('temp.txt')
)
reject limit unlimited;
select '<'|| row_data ||'>' from t42_ext;
'<'||ROW_DATA||'>'
--------------------------------------------------------------------------------
<Line1sometext >
<Line2sometext >
<Line3sometext >
I am loading some data to Oracle via SQLLDR. The source file is "pipe delimited".
FIELDS TERMINATED BY '|'
But some records contain pipe character in data, and not as separator. So it breaks correct loading of records as it understands indata pipe characters as field terminator.
Can you point me a direction to solve this issue?
Data file is about 9 GB, so it is hard to edit manually.
For example,
Loaded row:
ABC|1234567|STR 9 R 25|98734959,32|28.12.2011
Rejected Row:
DE4|2346543|WE| 454|956584,84|28.11.2011
Error:
Rejected - Error on table HSX, column DATE_N.
ORA-01847: day of month must be between 1 and last day of month
DATE_N column is the last one.
You could not use any separator, and do something like:
field FILLER,
col1 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)\\|([^|]*)$', '\\1')",
col2 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)\\|([^|]*)$', '\\2')",
col3 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)\\|([^|]*)$', '\\3')",
col4 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)\\|([^|]*)$', '\\4')",
col5 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)\\|([^|]*)$', '\\5')",
col6 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)\\|([^|]*)$', '\\6')"
This regexp takes six capture groups (inside parentheses) separated by a vertical bar (I had to escape it because otherwise it means OR in regexp). All groups except the third cannot contain a vertical bar ([^|]*), the third group may contain anything (.*), and the regexp must span from beginning to end of the line (^ and $).
This way we are sure that the third group will eat all superfluous separators. This only works because you've only one field that may contain separators. If you want to proofcheck you can for example specify that the fourth group starts with a digit (include \d at the beginning of the fourth parenthesized block).
I have doubled all backslashes because we are inside a double-quoted expression, but I am not really sure that I ought to.
It looks to me that it's not really possible for SQL*Loader to handle your file because of the third field which: can contain the delimiter, is not surrounded by quotes and is of a variable length. Instead, if the data you provide is an accurate example then I can provide a sample workaround. First, create a table with one column of VARCHAR2 with length the same as the maximum length of any one line in your file. Then just load the entire file into this table. From there you can extract each column with a query such as:
with CTE as
(select 'ABC|1234567|STR 9 R 25|98734959,32|28.12.2011' as CTETXT
from dual
union all
select 'DE4|2346543|WE| 454|956584,84|28.11.2011' from dual)
select substr(CTETXT, 1, instr(CTETXT, '|') - 1) as COL1
,substr(CTETXT
,instr(CTETXT, '|', 1, 1) + 1
,instr(CTETXT, '|', 1, 2) - instr(CTETXT, '|', 1, 1) - 1)
as COL2
,substr(CTETXT
,instr(CTETXT, '|', 1, 2) + 1
,instr(CTETXT, '|', -1, 1) - instr(CTETXT, '|', 1, 2) - 1)
as COL3
,substr(CTETXT, instr(CTETXT, '|', -1, 1) + 1) as COL4
from CTE
It's not perfect (though it may be adaptable to SQL*Loader) but would need a bit of work if you have more columns or if your third field is not what I think it is. But, it's a start.
OK, I recomend you to parse the file and replace the delimiter.
In command line in Unix/linux you should do:
cat current_file | awk -F'|' '{printf( "%s,%s,", $1, $2); for(k=3;k<NF-2;k++) printf("%s|", $k); printf("%s,%s,%s", $(NF-2),$(NF-1),$NF);print "";}' > new_file
This command will not change your current file.
Will create a new file, comma delimited, with five fields.
It splits the input file on "|" and take first, second, anything to antelast, antelast, and last chunk.
You can try to sqlldr the new_file with "," delimiter.
UPDATE:
The command can be put in a script like (and named parse.awk)
#!/usr/bin/awk
# parse.awk
BEGIN {FS="|"}
{
printf("%s,%s,", $1, $2);
for(k=3;k<NF-2;k++)
printf("%s|", $k);
printf("%s,%s,%s\n", $(NF-2),$(NF-1),$NF);
}
and you can run in this way:
cat current_file | awk -f parse.awk > new_file