Oracle SQL-Loader handling efficiently internal Double Quotes in values - oracle
I have some Oracle SQL*Loader challenges and am looking for an efficient and simple solution.
My source files are pipe (|) delimited, and values are enclosed by double quotes (").
The problem seems to be that some of the values contain internal double quotes.
e.g.: ..."|"a":"b"|"...
This causes my records to be rejected with the error:
no terminator found after TERMINATED and ENCLOSED field
There are various solutions on the web, but none seems to fit:
[1]
I have tried to replace all internal double quotes by doubling them,
but it seems that when applying this function to too many fields in the control file
(I have ~2000+ fields and am using FILLER to load only a subset)
the loader complains again:
SQL*Loader-350: Syntax error at line 7.
Expecting "," or ")", found ",".
field1 char(36) "replace(:field1,'"','""')",
(I do not know why, but when applying this solution to a narrow subset of columns it does seem to work.)
The thing is that potentially all fields may include internal double quotes.
[2]
I'm able to load all data when omitting the global optionally enclosed by '"', but then all enclosing quotes become part of the data in the target table.
[3]
I can omit the global optionally enclosed by '"' statement and place it only on selected fields,
while applying the "replace(:field1,'"','""')" statement to the remainder, but this is difficult to implement,
as I cannot know ahead of time which fields are likely to include internal double quotes.
Here are my questions:
Is there no simple way to convince the loader to handle internal double quotes with care (when values are enclosed by them)?
If I'm forced to fix the data ad hoc, is there a one-liner Linux command to convert only internal double quotes to another string/char,
say, single quotes?
If I'm forced to load the data with the quotes into the target table, is there a simple way to remove the enclosing double quotes from all fields,
all at once (the table has ~1000 columns)? Is that solution practical performance-wise for very large tables?
If you never had pipes in the enclosed fields you could do it from the control file. If you can have both pipes and double-quotes within a field then I think you have no choice but to preprocess the files, unfortunately.
Your solution [1], to replace double-quotes with an SQL operator, is happening too late to be useful; the delimiters and enclosures have already been interpreted by SQL*Loader before it does the SQL step. Your solution [2], to ignore the enclosure, would work in combination with [1] - until one of the fields did contain a pipe character. And solution [3] has the same problems as using [1] and/or [2] globally.
The documentation for specifying delimiters mentions that:
Sometimes the punctuation mark that is a delimiter must also be included in the data. To make that possible, two adjacent delimiter characters are interpreted as a single occurrence of the character, and this character is included in the data.
In other words, if you repeated the double-quotes inside the fields then they would be escaped and would appear in the table data. As you can't control the data generation, you could preprocess the files you get to replace all the double-quotes with escaped double quotes. Except you don't want to replace all of them - the ones that are actually real enclosures should not be escaped.
You could use a regular expression to target the relevant characters while skipping others. Not my strong area, but I think you can do this with lookahead and lookbehind assertions.
If you had a file called orig.txt containing:
"1"|A|"B"|"C|D"
"2"|A|"B"|"C"D"
3|A|""B""|"C|D"
4|A|"B"|"C"D|E"F"G|H""
you could do:
perl -pe 's/(?<!^)(?<!\|)"(?!\|)(?!$)/""/g' orig.txt > new.txt
That looks for a double-quote which is not preceded by the line-start anchor or a pipe character; and is not followed by a pipe character or line end anchor; and replaces only those with escaped (doubled) double-quotes. Which would make new.txt contain:
"1"|A|"B"|"C|D"
"2"|A|"B"|"C""D"
3|A|"""B"""|"C|D"
4|A|"B"|"C""D|E""F""G|H"""
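For anyone without Perl to hand, the same substitution can be sketched in Python. This is only an illustration of the pattern described above, not part of the original workflow; the function name escape_inner_quotes is made up for the example, and it assumes each line is passed in with its trailing newline already removed.

```python
import re

# Same lookaround pattern as the Perl one-liner: a double quote that is not
# at the start of the line, not right after a pipe, not right before a pipe,
# and not at the end of the line.
INNER_QUOTE = re.compile(r'(?<!^)(?<!\|)"(?!\|)(?!$)')


def escape_inner_quotes(line: str) -> str:
    """Double every quote that looks like data rather than an enclosure."""
    return INNER_QUOTE.sub('""', line)


if __name__ == "__main__":
    sample = [
        '"1"|A|"B"|"C|D"',
        '"2"|A|"B"|"C"D"',
        '3|A|""B""|"C|D"',
        '4|A|"B"|"C"D|E"F"G|H""',
    ]
    for row in sample:
        print(escape_inner_quotes(row))
```

The same caveat applies as for the Perl version: a quote that sits directly next to a pipe inside a field is indistinguishable from an enclosure and is left alone.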
The double-quotes at the start and end of fields are not modified, but those in the middle are now escaped. If you then loaded that with a control file with double-quote enclosures:
load data
truncate
into table t42
fields terminated by '|' optionally enclosed by '"'
(
col1,
col2,
col3,
col4
)
Then you would end up with:
select * from t42 order by col1;
COL1       COL2       COL3       COL4
---------- ---------- ---------- --------------------
1          A          B          C|D
2          A          B          C"D
3          A          "B"        C|D
4          A          B          C"D|E"F"G|H"
which hopefully matches your original data. There may be edge cases that don't work (like a double-quote followed by a pipe within a field) but there's a limit to what you can do to attempt to interpret someone else's data... There may also be (much) better regular expression patterns, of course.
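One way to sanity-check an escaped file before loading it is to parse it with something that follows the same doubled-quote convention. The sketch below uses Python's csv module as a stand-in; it is an assumption that this mirrors SQL*Loader's enclosure handling closely enough for a quick check, and parse_escaped is a name invented here.

```python
import csv


def parse_escaped(lines):
    """Split pipe-delimited rows, reading "" inside an enclosure as one quote."""
    reader = csv.reader(lines, delimiter="|", quotechar='"', doublequote=True)
    return [row for row in reader]


if __name__ == "__main__":
    # The escaped rows from new.txt, written out literally.
    escaped = [
        '"1"|A|"B"|"C|D"',
        '"2"|A|"B"|"C""D"',
        '3|A|"""B"""|"C|D"',
        '4|A|"B"|"C""D|E""F""G|H"""',
    ]
    for fields in parse_escaped(escaped):
        print(fields)
```

If the parsed fields differ from what you expect to see in the table, the file probably contains one of the edge cases the regular expression cannot resolve.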
You could also consider using an external table instead of SQL*Loader, if the data file is (or can be) in an Oracle directory and you have the right permissions. You still have to modify the file, but you could do it automatically with the preprocessor directive, rather than needing to do that explicitly before calling SQL*Loader.
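As a rough sketch of what such a preprocessor might look like: as far as I understand it, the external table preprocessor is run with the data file path as its argument and the access driver reads whatever the program writes to standard output. The script below assumes that contract, assumes UTF-8 input, and assumes a Python script is acceptable as the executable (in practice a small shell wrapper may be needed); escape_lines is a hypothetical name.

```python
import re
import sys

# Quote that is neither next to a pipe nor at the start or end of the line.
INNER_QUOTE = re.compile(r'(?<!^)(?<!\|)"(?!\|)(?!$)')


def escape_lines(src):
    """Yield each input line with its data quotes doubled."""
    for line in src:
        # Strip the line ending first so the end-of-line check sees the
        # closing enclosure as the last character.
        yield INNER_QUOTE.sub('""', line.rstrip("\r\n")) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: argv[1] is the data file path supplied by the caller.
    with open(sys.argv[1], encoding="utf-8", newline="") as data_file:
        sys.stdout.writelines(escape_lines(data_file))
```

Streaming line by line keeps memory use flat for large files, which matters if the preprocessor runs every time the external table is queried.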
Related
SQL Loader: Double quotes inside double quotes
Hi, I have a problem loading data with double quotes inside double quotes. The records are always rejected and saved in the bad file. My SQL script is like this:
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
ACCESS PARAMETERS (
records delimited by newline
STRING SIZES ARE IN CHARACTERS
logfile DIR_LOGS:'logs%p.log'
badfile DIR_LOGS:'bads%p.bad'
discardfile DIR_LOGS:'discarded%p.dsc'
fields terminated by ','
optionally enclosed by '"'
missing field values are NULL
)
My data is like this:
Name, Address, Company Name,
"Juan "Julio" Dela Cruz", "Block 5, lot6 Frobes Subd", "REGUS Corp "A"".
"Ferdinand Magellan", "Block 5, lot6, Frobes Subd", "REGUS Corp"
I want to retain the double quotes in the name and company: Juan "Julio" Dela Cruz and REGUS Corp "A". What can you recommend?
There's nothing you can do, as far as SQL*Loader is concerned. If values are optionally enclosed into double quotes, then you can NOT have double quotes within those values. If you aren't allowed to modify data, you have to find someone who can so that optionally enclosing character is changed to something else (i.e. not the double quote). Otherwise you won't be able to successfully load data.
How to retain double quotes in a column while loading a file using SQL Loader
I am trying to load a txt file with | (pipe) delimiter into an Oracle table via the SQL*Loader utility. All the fields are enclosed in double quotes, but some text fields in the file contain additional double quotes that need to be retained. All the table columns are defined as VARCHAR. Here are the control parameters I am using:
OPTIONS (DIRECT=TRUE,SKIP=1)
LOAD DATA
CHARACTERSET UTF8
INFILE aaa.txt
APPEND
INTO TABLE info_table
FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
This is my sample file:
"1"|"High "Gold Tip" Tea, 600"
"2"|""10000 Beers, Wines & Spirits""
The table should be loaded with the details below:
Record 1:
Column 1 - 1
Column 2 - High "Gold Tip" Tea, 600
Record 2:
Column 1 - 2
Column 2 - 10000 Beers, Wines & Spirits
Unfortunately, there's not much to be said. The file format is bad. You can't enclose values in characters that are used within those values themselves. As the data contains double quotes, you'll have to optionally enclose values in something else, not double quotes. However, as you already separate values with pipe characters, why do you need double quotes to optionally enclose those field values? Omit them from the file and you won't have any problem (of this kind, of course; who knows what might come next, but that's another story).
How to load data into Oracle using SQL Loader with skipping and merging columns?
I am trying to load data into an Oracle database using SQL*Loader. My data looks like the following:
1|2|3|4|5|6|7|8|9|10
I do not want to load the first and last columns into the table; I want to load 2|3|4|5|6|7|8|9 into one field. The table I am trying to load into has only one field, named 'field1'. If anyone has experience with this, could you give some advice? I tried BOUNDFILLER, FILLER and so on, but could not make it work. Help me. :)
Load the entire row from the file into a BOUNDFILLER, then extract the part you need into the column. You have to tell sqlldr that the field is terminated by the carriage return/linefeed (assuming a Windows OS) so it will read the entire line from the file as one field.
Here the whole line from the file is read into "dummy" as BOUNDFILLER. "dummy" does not match a column name, and it's defined as BOUNDFILLER anyway, so the whole row is "remembered". The next line in the control file starts with a column that DOES match a column name, so sqlldr attempts to execute the expression. It extracts a substring from the saved "dummy" and puts it into the "col_a" column.
The regular expression in a nutshell returns the part of the string after but not including the first pipe, and before but not including the last pipe. Note the double backslashes. In my environment anyway, when using a backslash to take away the special meaning of the pipe (not needed when between the square brackets; normally a pipe symbol is a logical OR), it gets stripped when passing from sqlldr to the regex engine, so two backslashes are required in the control file for one to get through in the end. If you have trouble here, try one backslash and see what happens. Guess how long THAT took me to figure out!
load data
infile 'x_test.dat'
TRUNCATE
into table x_test
FIELDS TERMINATED BY x'0D0A'
(
dummy BOUNDFILLER,
col_a expression "regexp_substr(:dummy, '[^|]*\\|(.+)\\|.*', 1, 1, NULL, 1)"
)
EDIT: Use this to test the regular expression. For example, if there is an additional pipe at the end:
select regexp_substr('1|2|3|4|5|6|7|8|9|10|', '[^|]*\|(.+)\|.*\|', 1, 1, NULL, 1) from dual;
2nd edit: For those uncomfortable with regular expressions, this method uses nested SUBSTR and INSTR functions:
SQL> with tbl(str) as (
select '1|2|3|4|5|6|7|8|9|10|' from dual
)
select substr(str, instr(str, '|')+1, (instr(str, '|', -1, 2)-1 - instr(str , '|')) ) after
from tbl;
AFTER
---------------
2|3|4|5|6|7|8|9
Deciding which is easier to maintain is up to you. Think of the developer after you and comment at any rate! :-)
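To experiment with the extraction logic outside the database, the two approaches can be mirrored in Python. This is a sketch for the case without a trailing pipe (the pattern used in the control file); the function names are invented, and returning an empty string where Oracle would return NULL is a simplification.

```python
import re


def middle_by_regex(line: str) -> str:
    """Text after the first pipe and before the last pipe (greedy match)."""
    match = re.match(r'[^|]*\|(.+)\|.*', line)
    return match.group(1) if match else ""


def middle_by_index(line: str) -> str:
    """Same result from the positions of the first and last pipe."""
    first = line.find("|")
    last = line.rfind("|")
    # Need two distinct pipes, otherwise there is no middle part.
    return line[first + 1:last] if first != -1 and last > first else ""


if __name__ == "__main__":
    row = "1|2|3|4|5|6|7|8|9|10"
    print(middle_by_regex(row))
    print(middle_by_index(row))
```

No backslash doubling is needed here because the pattern goes straight to the regex engine rather than through sqlldr first.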
sqlldr WHEN clause
I am trying to code a sqlldr.ctl file WHEN clause to limit the records imported to those matching a portion of the current schema's name. The code I have (which does NOT work) is:
LOAD DATA
TRUNCATE INTO TABLE TMP_PRIM_ACCTS
when REGION_NUM = substr(user,-3,3)
Fields terminated by "|" Optionally enclosed by '"'
Trailing NULLCOLS
(
PORTFOLIO_ACCT,
PRIMARY_ACCT_ID NULLIF (PRIMARY_ASSET_ID="NULL"),
REGION_NUM NULLIF (PARTITION_NUM="NULL")
)
sqlldr returns:
SQL*Loader-350: Syntax error at line 3.
Expecting quoted string or hex identifier, found "substr".
when PARTITION_NUM = substr(user,-3,3)
I cannot put single quotes around "user", because that turns it into the literal string "user". Can anyone explain how I can reference the "active" user in this WHEN clause? Thank you!
Can you try something like this? (I can't test with SQLLDR right now, but this is the syntax I used for changing values):
when REGION_NUM = "substr(:user,-3,3)"
It doesn't look like you can. The documentation only shows fixed values. Trying to use an expression in that when clause (or in nullif; I thought I'd try to see if you could cause a rejection based on a null PK value), you just see the literal value in the log:
Table TMP_PRIM_ACCTS, loaded when REGION_NUM = 0X73756273747228757365722c2d332c3329(character 'substr(user,-3,3)')
which is sort of what you referred to when you said you couldn't quote user, but you'd have to quote the whole thing anyway. Using :user doesn't work either; the colon is seen as just another character, and it doesn't try to find a column called user instead.
The simplest approach may be to pre-process the data file and remove any rows which don't match the pattern (e.g. via a regex). That would actually be slightly easier if you used an external table instead of SQL*Loader. Alternatively, generate your control file and embed the correct literal value based on the user you'll connect as.
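Generating the control file could be as simple as filling a placeholder before calling sqlldr. The sketch below is one hypothetical way to do it in Python: the {{WHEN_CLAUSE}} marker, the function names and the example user name are all invented for illustration, and the slice only mimics substr(user,-3,3) for names of at least three characters.

```python
def region_literal(username: str) -> str:
    """Last three characters of the user name, like substr(user,-3,3)."""
    return username[-3:]


def render_when_clause(username: str, field: str = "REGION_NUM") -> str:
    """Build a WHEN clause that compares the field to a fixed quoted string."""
    region = region_literal(username)
    if "'" in region:
        # A quote here would break the literal in the control file.
        raise ValueError("user name suffix must not contain a single quote")
    return f"WHEN {field} = '{region}'"


def render_control_file(template: str, username: str) -> str:
    """Replace the hypothetical {{WHEN_CLAUSE}} marker in a template."""
    return template.replace("{{WHEN_CLAUSE}}", render_when_clause(username))


if __name__ == "__main__":
    template = (
        "LOAD DATA\n"
        "TRUNCATE INTO TABLE TMP_PRIM_ACCTS\n"
        "{{WHEN_CLAUSE}}\n"
    )
    print(render_control_file(template, "APP_USER_042"))
```

The rest of the control file stays in the template unchanged, so only the literal differs between schemas.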
How to call ora_hash function inside control file in sql loader?
I'm trying to call a function (ORA_HASH) inside sqlldr but I'm not able to achieve this.
Data file abc.txt:
AKY,90035,"G","DP",20150121,"",0,,,,,,"","E8BD4346-A174-468B-ABC2-1586B81A8267",1,17934,5099627512855,"TEST of CLOROM","",14.00,"",14.00,17934,5099627512855,"TEST of CLOROM",14.00,"ONE TO BE T ONE",344,0,"98027f93-4f1a-44b2-b609-7ffbb041a375",,,AKY8035,"Taken Test","L-20 Shiv Lok"
AKY,8035,"D","DP",20150121,"",0,,,,,,"","E8BD4346-A174-468B-ABC2-1586B81A8267",2,17162,5099627885843,"CEN TESt","",15.00,"",250.00,17162,5099627885843,"CEN TESt",15.00,"ONE TDAILY",3659,0,"09615cc8-77c9-4781-b51f-d44ec85bbe54",,,LLY8035,"Taken Test","L-20 Shiv Lok"
Control file cnt_file.ctl:
load data
into table Table_XYZ
fields terminated by "," optionally enclosed by '"'
F1,F2,F3,F4,F5,F6,F7,F8,F9,F10,F11,F12,F13,F14,F15,F16,F17,F18,F19,F20,F21,F22,F23,F24,F25,F26,F27,F28,F29,F30,F31 ORA_HASH(CONCAT(F2,F5,F6,F9,F10,F12,F13,F14,F15,F16,F17,F19,F21,F22)),F32 ORA_HASH(CONCAT(f23,H24,F7,F8,F3)),F33,F34,F35
sqlldr "xxxxx/yyyyy" control=cnt_file.ctl data=abc.txt
Whenever I execute sqlldr from a Linux box I get the error below:
SQL*Loader-350: Syntax error at line 4.
Expecting "," or ")", found "ORA_HASH".
F29,F30,F31,KEY_CLMNS_HASH ORA_HASH(CONCAT( F2,F5
^
Any idea?
You might consider using a virtual column on the table to which you are loading the data. For columns which are deterministically based on other column values in the same row, that usually ends up being a more simple solution than anything involving SQL*Loader.
You're doing a few things wrong. The immediate error is because the Oracle function call has to be enclosed in double quotes:
...,F31 "ORA_HASH(CONCAT(F2,F5,F6,...))",...
The second issue is that the concat function only takes two arguments, so you would either have to nest (lots of) concat calls, or more readably use the concatenation operator instead:
...,F31 "ORA_HASH(F2||F5||F6||...)",...
And finally you need to prefix the field names inside your function call with a colon:
...,F31 "ORA_HASH(:F2||:F5||:F6||...)",...
This is explained in the documentation:
The following requirements and restrictions apply when you are using SQL strings: ... The SQL string must be enclosed in double quotation marks.
And
To refer to fields in the record, precede the field name with a colon (:). Field values from the current record are substituted. A field name preceded by a colon (:) in a SQL string is also referred to as a bind variable. Note that bind variables enclosed in single quotation marks are treated as text literals, not as bind variables.
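With a long field list it is easy to miss a colon or a quote, so the SQL string could be assembled by a small helper instead of typed by hand. The following is a minimal Python sketch of the three rules above (double quotes around the string, the || operator instead of concat, a colon before each field); hash_sql_string is an invented name, not part of any Oracle tooling.

```python
def hash_sql_string(field_names):
    """Build a quoted SQL string that hashes the concatenated bind variables."""
    # Each field becomes a bind variable (:NAME); they are joined with ||.
    binds = "||".join(f":{name}" for name in field_names)
    # The whole expression is wrapped in double quotes for the control file.
    return f'"ORA_HASH({binds})"'


if __name__ == "__main__":
    print("F31 " + hash_sql_string(["F2", "F5", "F6"]) + ",")
```

The printed fragment can then be pasted into the field list in place of the hand-written expression.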