SQL Loader: Double quotes inside double quotes - oracle

Hi, I have a problem loading data that contains double quotes inside double-quoted fields. The records are always rejected and written to the bad file.
My SQL script looks like this:
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
ACCESS PARAMETERS (
records delimited by newline
STRING SIZES ARE IN CHARACTERS
logfile DIR_LOGS:'logs%p.log'
badfile DIR_LOGS:'bads%p.bad'
discardfile DIR_LOGS:'discarded%p.dsc'
fields terminated by ',' optionally enclosed by '"'
missing field values are NULL
)
My data is like this:
Name, Address, Company Name,
"Juan "Julio" Dela Cruz", "Block 5, lot6 Frobes Subd", "REGUS Corp "A"".
"Ferdinand Magellan", "Block 5, lot6, Frobes Subd", "REGUS Corp"
I want to retain the double quotes in the name and company, i.e. Juan "Julio" Dela Cruz and REGUS Corp "A". What would you recommend?

As the data stands, there's nothing you can do as far as SQL*Loader is concerned. If values are optionally enclosed in double quotes, then you can NOT have unescaped double quotes within those values.
If you aren't allowed to modify the data, you have to find someone who can change the optionally enclosing character to something else (i.e. not the double quote). Otherwise you won't be able to load the data successfully.

Related

How to avoid " " in select statement in column name for column that has numberAlphabet pattern in Oracle?

I have a question about double-quoted column names in Oracle. I tried to create a column whose name follows a number_alphabets pattern, but that did not work. When I wrapped the name in double quotes, I was able to create the table. However, when I select from it, the column name comes back within double quotes.
Here is the script:
CREATE TABLE test
(
  "100_title" VARCHAR2(200) NULL
);
SELECT * FROM test;
When I do select, in result set, column name will be "100_title" but I do not want "" in it. Is there a way to fix this?
From the Database Object Names and Qualifiers documentation:
Nonquoted identifiers cannot be Oracle Database reserved words. Quoted identifiers can be reserved words, although this is not recommended.
and
Nonquoted identifiers must begin with an alphabetic character from your database character set. Quoted identifiers can begin with any character.
Nonquoted identifiers can only contain alphanumeric characters from your
database character set and the underscore (_). Database links can contain
periods (.) and "at" signs (@).
Quoted identifiers can contain any characters and punctuations marks as well
as spaces. However, neither quoted nor nonquoted identifiers can contain
double quotation marks or the null character (\0).
So your question:
When I do select, in result set, column name will be "100_title" but I do not want "" in it. Is there a way to fix this?
The column identifier 100_title starts with a non-alphabetic character so by point 6 of that documentation you must use double quotes with the identifier.
How the column name displays depends on the user interface you are using. On db<>fiddle, the column name is displayed without quotes and this will be the same with many other interfaces.
If the user interface you are using only outputs the identifier with surrounding quotes then you could change the identifier from "100_title" to title_100 as this starts with an alphabetic character and contains only alpha-numeric and underscore characters and, thus, does not need to be quoted.
The short version is "no; pick a name that starts with a letter".
If you use a name that starts with a number, you'll have to use " every time you mention the column name, and you'll have to get the case right. Your column is called "100_title", not "100_Title" or "100_TITLE".
Call it title_100; then you can refer to it in any case, even TiTLe_100 if you like, and generally your life will be easier.
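To check names before creating a table, the rule quoted from the documentation can be sketched in Python. This is only a rough approximation: it assumes ASCII names and does not check reserved words or length limits, and needs_quotes is a made-up helper name, not anything Oracle provides.

```python
import re

# Simplified rule from the documentation quoted above: a nonquoted identifier
# starts with a letter and contains only letters, digits, and underscores.
# Assumption: ASCII only; reserved words and length limits are NOT checked.
_NONQUOTED = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def needs_quotes(identifier: str) -> bool:
    """Return True if the name cannot be written as a nonquoted identifier."""
    return _NONQUOTED.fullmatch(identifier) is None


if __name__ == "__main__":
    for name in ("100_title", "title_100"):
        print(name, "needs quotes:", needs_quotes(name))
```

With this check, 100_title is flagged because it starts with a digit, while title_100 passes.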

How to retain double quotes in a column while loading a file using SQL Loader

I am trying to load a txt file with a | (pipe) delimiter into an Oracle table via the SQL*Loader utility. All the fields are enclosed in double quotes, but some text fields in the file contain additional double quotes, besides the enclosing ones, that need to be retained. All the table columns are defined as VARCHAR. Here are the control parameters I am using:
OPTIONS (DIRECT=TRUE,SKIP=1)
LOAD DATA
CHARACTERSET UTF8
INFILE aaa.txt
APPEND INTO TABLE info_table
FIELDS TERMINATED BY "|"
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
This is my sample file
"1"|"High "Gold Tip" Tea, 600"
"2"|""10000 Beers, Wines & Spirits""
Table should be loaded with the below details
Record 1:
Column 1 - 1
Column 2 - High "Gold Tip" Tea, 600
Record 2:
Column 1 - 2
Column 2 - 10000 Beers, Wines & Spirits
Unfortunately, there's not much to be said.
The file format is bad. You can't enclose values in characters that are also used within those values. As the data contains double quotes, you'll have to optionally enclose values in something else, not double quotes.
However, as you already separate values with pipe characters, why do you need double quotes to optionally enclose the field values? Omit them from the file and you won't have any problem (of this kind, of course; who knows what might come next, but that's another story).
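If the file can be rewritten before loading, dropping the enclosing quotes could look roughly like the Python sketch below. It assumes a pipe never occurs inside a field and removes only the outermost pair of quotes per field; strip_enclosures is a hypothetical helper name.

```python
def strip_enclosures(line: str, sep: str = "|", quote: str = '"') -> str:
    """Remove one pair of enclosing quotes from each field of a delimited line.

    Assumption: the separator never occurs inside a field.
    """
    fields = []
    for field in line.rstrip("\n").split(sep):
        if len(field) >= 2 and field.startswith(quote) and field.endswith(quote):
            field = field[1:-1]  # drop only the outermost pair
        fields.append(field)
    return sep.join(fields)


if __name__ == "__main__":
    print(strip_enclosures('"1"|"High "Gold Tip" Tea, 600"'))
```

Note that for the second sample record this keeps one pair of quotes around 10000 Beers, Wines & Spirits, because only one layer is removed. Whether a second layer should also go is a decision about the data, not something the sketch can know.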

Oracle SQL-Loader handling efficiently internal Double Quotes in values

I have some Oracle SQL*Loader challenges and am looking for an efficient and simple solution.
My source files are pipe (|) delimited, and values are enclosed by double quotes (").
The problem seems to be that some of the values contain internal double quotes,
e.g.: ..."|"a":"b"|"...
This causes my records to be rejected with the error:
no terminator found after TERMINATED and ENCLOSED field
There are various solutions on the web, but none seems to fit:
[1]
I have tried to escape all internal double quotes by doubling them,
but it seems that when this function is applied to too many fields in the control file
(I have ~2000+ fields and use FILLER to load only a subset)
the loader complains again:
SQL*Loader-350: Syntax error at line 7.
Expecting "," or ")", found ",".
field1 char(36) "replace(:field1,'"','""')",
(I do not know why, but when I apply this solution to a narrow subset of columns it does seem to work.)
The thing is that potentially all fields may include internal double quotes.
[2]
I'm able to load all the data when omitting the global optionally enclosed by '"' clause, but then all the enclosing quotes become part of the data in the target table.
[3]
I can omit the global optionally enclosed by '"' clause and place it only on selected fields,
while applying the "replace(:field1,'"','""')" expression to the remainder, but this is difficult to implement,
as I cannot know ahead of time which fields may include internal double quotes.
Here are my questions:
Is there no simple way to convince the loader to handle internal double quotes with care (when values are enclosed by them)?
If I'm forced to fix the data ad hoc, is there a one-liner Linux command to convert only internal double quotes to another string/char,
say, single quotes?
If I'm forced to load the data with the quotes into the target table, is there a simple way to remove the enclosing double quotes from all fields,
all at once (the table has ~1000 columns)? Is that solution practical performance-wise for very large tables?
If you never had pipes in the enclosed fields you could do it from the control file. If you can have both pipes and double-quotes within a field then I think you have no choice but to preprocess the files, unfortunately.
Your solution [1], to replace double-quotes with an SQL operator, is happening too late to be useful; the delimiters and enclosures have already been interpreted by SQL*Loader before it does the SQL step. Your solution [2], to ignore the enclosure, would work in combination with [1] - until one of the fields did contain a pipe character. And solution [3] has the same problems as using [1] and/or [2] globally.
The documentation for specifying delimiters mentions that:
Sometimes the punctuation mark that is a delimiter must also be included in the data. To make that possible, two adjacent delimiter characters are interpreted as a single occurrence of the character, and this character is included in the data.
In other words, if you repeated the double-quotes inside the fields then they would be escaped and would appear in the table data. As you can't control the data generation, you could preprocess the files you get to replace all the double-quotes with escaped double quotes. Except you don't want to replace all of them - the ones that are actually real enclosures should not be escaped.
You could use a regular expression to target the relevant characters while skipping others. Not my strong area, but I think you can do this with lookahead and lookbehind assertions.
If you had a file called orig.txt containing:
"1"|A|"B"|"C|D"
"2"|A|"B"|"C"D"
3|A|""B""|"C|D"
4|A|"B"|"C"D|E"F"G|H""
you could do:
perl -pe 's/(?<!^)(?<!\|)"(?!\|)(?!$)/""/g' orig.txt > new.txt
That looks for a double-quote which is not preceded by the line-start anchor or a pipe character; and is not followed by a pipe character or line end anchor; and replaces only those with escaped (doubled) double-quotes. Which would make new.txt contain:
"1"|A|"B"|"C|D"
"2"|A|"B"|"C""D"
3|A|"""B"""|"C|D"
4|A|"B"|"C""D|E""F""G|H"""
The double-quotes at the start and end of fields are not modified, but those in the middle are now escaped. If you then loaded that with a control file with double-quote enclosures:
load data
truncate
into table t42
fields terminated by '|' optionally enclosed by '"'
(
col1,
col2,
col3,
col4
)
Then you would end up with:
select * from t42 order by col1;
COL1       COL2       COL3       COL4
---------- ---------- ---------- --------------------
1          A          B          C|D
2          A          B          C"D
3          A          "B"        C|D
4          A          B          C"D|E"F"G|H"
which hopefully matches your original data. There may be edge cases that don't work (like a double-quote followed by a pipe within a field) but there's a limit to what you can do to attempt to interpret someone else's data... There may also be (much) better regular expression patterns, of course.
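If Perl isn't available, the same substitution can be sketched in Python with an identical pattern. The function names and the line-by-line file handling are my own assumptions for illustration, and the same edge cases apply (for example a double-quote next to a pipe inside a field).

```python
import re

# Same pattern as the Perl one-liner: a double quote that is not at the start
# of the line, not right after a pipe, not right before a pipe, and not at the
# end of the line.
INNER_QUOTE = re.compile(r'(?<!^)(?<!\|)"(?!\|)(?!$)')


def escape_inner_quotes(line: str) -> str:
    """Double every interior quote in one line (passed without its newline)."""
    return INNER_QUOTE.sub('""', line)


def convert(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: rewrite src_path into dst_path line by line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(escape_inner_quotes(line.rstrip("\n")) + "\n")


if __name__ == "__main__":
    print(escape_inner_quotes('"2"|A|"B"|"C"D"'))
```

Calling convert("orig.txt", "new.txt") would play the role of the one-liner; the quotes at field boundaries are left alone and only the interior ones are doubled.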
You could also consider using an external table instead of SQL*Loader, if the data file is (or can be) in an Oracle directory and you have the right permissions. You still have to modify the file, but you could do it automatically with the preprocessor directive, rather than needing to do that explicitly before calling SQL*Loader.

Multiple enclosed-by symbols in SQL*Loader

I am loading a CSV file into an Oracle table. One field in some records is enclosed as "abc#xyz.com" and the same field in other records is enclosed as "'abc#xyz.com'". I need to load only abc#xyz.com.
I used OPTIONALLY ENCLOSED BY '"' but it does not help in the second case. Is there a way to specify two symbols in the OPTIONALLY ENCLOSED BY clause? Or what are the other ways of achieving this?
You can apply an SQL operator to trim the leading and trailing single quotes. For example, with a data file containing:
"abc#xyz.com"
"'abc#xyz.com'"
'abc#xyz.com'
And a control file for a dummy table:
LOAD DATA
TRUNCATE INTO TABLE t42
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
EMAIL CHAR(30) "TRIM(BOTH '''' FROM :EMAIL)"
)
This loads the stripped values:
select * from t42;
EMAIL
------------------------------
abc#xyz.com
abc#xyz.com
abc#xyz.com
As you can see, this will load values which are enclosed in single quotes, double quotes, or both - as long as the singles are within the doubles and not the other way around.
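To see why the order of the quotes matters, here is a rough Python model of the two steps: the enclosure is removed first, then the SQL TRIM runs on what is left. It is only an illustration of the behaviour described above, not how SQL*Loader is implemented, and clean_email is a made-up name.

```python
def clean_email(raw: str) -> str:
    """Strip optional enclosing double quotes, then surrounding single quotes."""
    value = raw
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1]   # roughly what OPTIONALLY ENCLOSED BY '"' removes
    return value.strip("'")   # roughly what TRIM(BOTH '''' FROM :EMAIL) removes


if __name__ == "__main__":
    for raw in ('"abc#xyz.com"', "\"'abc#xyz.com'\"", "'abc#xyz.com'"):
        print(clean_email(raw))
```

If the single quotes are outside the double quotes, the value no longer starts with the enclosure character, so the double quotes stay in the result.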

Delimiter file issues

I have a flat file without a fixed structure, like:
name,phone_num,Address
bob,8888,2nd main,5th floor,avenue road
Here the last column, Address, has the value 2nd main,5th floor,avenue road, but since the same delimiter (,) is also used to separate columns, I have no idea how to handle it.
The structure of the flat file may change from file to file.
How can I handle this kind of flat file when importing with Informatica, SQL*Loader, or UTL_FILE?
I only have read access to the flat file; I can't edit the data in it.
Using SQL*Loader:
load data
append
into table schema.table
fields terminated by '~'
trailing nullcols
(
line BOUNDFILLER,
name "regexp_substr(:line, '^[^,]+')",
phone_num "regexp_substr(:line, '[^,]+', 1, 2)",
Address "regexp_replace(:line, '^.*?,.*?,')"
)
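The idea behind that control file (take the first two comma-separated values and treat everything after the second comma as the address) can be sketched in Python as below. It assumes the first two fields never contain a comma; split_record is a hypothetical name, and missing fields come back as empty strings rather than NULLs.

```python
def split_record(line: str) -> tuple:
    """Split into name, phone_num and address; only the first two commas count."""
    parts = line.rstrip("\n").split(",", 2)
    parts += [""] * (3 - len(parts))  # pad short records with empty strings
    return tuple(parts)


if __name__ == "__main__":
    print(split_record("bob,8888,2nd main,5th floor,avenue road"))
```

One difference to keep in mind: the [^,]+ pattern in the control file skips empty fields, while split keeps them as empty strings, so the two are not equivalent for records like bob,,address.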
You need to change your source file to enclose the fields in an enclosure character, e.g.:
name,phone_num,Address
bob,8888,^2nd main,5th floor,avenue road^
Then in SQL*Loader you'd put:
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '^'
Just pick an enclosure character that doesn't normally appear in your data.
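If whoever produces the file can regenerate it, writing it with a ^ enclosure might look like this Python sketch using the standard csv module. The function name and the sample rows are assumptions for illustration; only fields that contain the delimiter (or the enclosure character itself) get enclosed.

```python
import csv
import io


def write_caret_enclosed(rows) -> str:
    """Render rows as comma-delimited text, enclosing fields in ^ when needed."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=",",
        quotechar="^",               # enclosure character, as in the example
        quoting=csv.QUOTE_MINIMAL,   # enclose only fields that need it
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = [
        ["name", "phone_num", "Address"],
        ["bob", "8888", "2nd main,5th floor,avenue road"],
    ]
    print(write_caret_enclosed(rows), end="")
```

Double quotes inside a value pass through untouched here, because they are no longer the enclosure character.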
If you could get the source data enclosed within double quotes (or any quotes, for that matter), you could use the 'Optional Quotes' option in Informatica when reading from the flat file.
