Sqlldr- No terminator found after terminated and enclosed field - oracle

I use Oracle 11g.
My data file looks like below:
1|"\a\ab\"|"do not "clean" needles"|"#"
2|"\b\bg\"|"wall "69" side to end"|"#"
My control file is:
load data
infile 'short.txt'
CONTINUEIF LAST <> '"'
into table "PORTAL"."US_FULL"
fields terminated by "|" OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
u_hlevel,
u_fullname NULLIF u_fullname=BLANKS,
u_name char(2000) NULLIF c_name=BLANKS ,
u_no NULLIF u_no=BLANKS
)
While loading data through sqlldr, a .bad file is created and .log file contains error message stating "No terminator found after terminated and enclosed field"
Double quotes starting and ending are not in my data, however I would need double quotes withing the data like in above example surrounding clean and 69. Ex: My data file after loading should look like:
1, \a\ab\, do not "clean" needles, #
2, \b\bg\ , wall "69" side to end , #
How to accomplish this?

Asking your provider to correct the data file may not be an option, but I ultimately found a solution that requires you to update your control file slightly to specify your "enclosed by" character for each field instead of for all fields.
For my case, I had an issue where if [first_name] field came in with double-quotes wrapping a nickname it would not load. (EG: Jonathon "Jon"). In the data file the name was shown as "Jonathon "Jon"" . So the "enclosed by" was throwing an error because there were double quotes around the value and double quotes around part of the value ("Jon"). So instead of specifying that the value should be enclosed by double quotes, I omitted that and just manually removed the quotes from the string.
Load Data
APPEND
INTO TABLE MyDataTable
fields terminated by "," ---- Noticed i omitted the "enclosed by"
TRAILING NULLCOLS
(
column1 enclosed by '"', --- Specified "enclosed by" here for all cols
column2 enclosed by '"',
FIRST_NAME "replace(substr(:FIRST_NAME,2, length(:FIRST_NAME)-2), chr(34) || chr(34), chr(34))", -- Omitted "enclosed by". substr removes doublequotes, replace fixes double quotes showing up twice. chr(34) is charcode for doublequote
column4 enclosed by '"',
column5 enclosed by '"'
)

I'm afraid since the fields are surrounded by double-quotes the double-quotes you want to preserve need to be escaped by adding another double-quote in front like this:
1|"\a\ab\"|"do not ""clean"" needles"|"#"
Alternately if you can get the data without the fields being surrounded by double-quotes, this would work too:
1|\a\ab\|do not "clean" needles|#
If you can't get the data provider to format the data as needed (i.e. search for double-quotes and replace with 2 double-quotes before extracting to the file), you will have to pre-process the file to set up double quotes one of these ways so the data will load as you expect.

Related

Skip first character from CSV file in Oracle sql loader control file

How do I skip the first character?
Here is the CSV file that I want to load
H
B"01","Mosco"
B"02","Delhi"
T
Here is the control file
LOAD DATA
INFILE 'capital.csv'
APPEND
INTO TABLE CAPITALS
WHEN (01)='B'
FIELDS TERMINATED BY ","
OPTIONALLY ENCLOSED BY '"'
(
ID,
CAPITAL
)```
WHEN i RUN THIS THE 'B' COMES INTO PICTURE.
The table should look like
[![Table view][1]][1]
How do I skip the 'B'?
[1]: https://i.stack.imgur.com/2U3Vo.png
Disregard the first character. Can you have the source put a comma after the record type indicator?
If so, do this to ignore it:
(
RECORD_IND FILLER,
ID,
CAPITAL
)
If not, this should take care of it in your situation:
ID "SUBSTR(:ID, 2)",

How below REGEXP_REPLACE works?

I have query in my project and that is having REGEXP_REPLACE
i tried to find how it works by searching but i found it like
w+ Matches a word character (that is, an alphanumeric or underscore
(_) character).
but not able to find '"\w+\":' why these "" are used and what is mean by '{|}|"',''
UPDATE (SELECT data,data_value FROM TEMP) t
SET t.DATA_VALUE=REGEXP_REPLACE(REGEXP_REPLACE(t.data, '"\w+\":',''),'{|}|"','');
can you please tell me how it works?
This appear to be a regular expression for stripping keys and enclosing brackets from a JSON string - unfortunately, if this is the case then it does not work in all situations.
The regular expression
'"\w+\":'
will match:
A " double quotation mark;
\w+ one-or-more word (a-z or A-Z or 0-9 or _) characters;
\" another double quotation mark - note: the \ character is not necessary; then
A : colon.
So:
REGEXP_REPLACE(
'{"key":"value","key2":"value with \"quote"}',
'"\w+":', -- Pattern matched
'' -- Replacement string
)
Will output:
{"value","value with \"quote"}
The second pattern {|}|" will match either a {, or a } or a " character (and could have been equivalently written as [{}"]) so:
REGEXP_REPLACE(
'{"value","value with \"quote"}',
'{|}|"', -- Pattern matched
'' -- Replacement string
)
Will output:
value,value with \quote
Which is fine, until (like my example) you have an escaped double quote (or curly braces) in the value string; in which case those will also get stripped leaving the escape character.
(Note: you would not typically find this but it is possible to include escaped quotes in the key. So {"keywith\":quote":"value"} would get replaced to {quote":"value"} and then quote:value which is not the intended output.)
If parsing JSON is what you are trying to do (pre-Oracle 12) then you can use:
REGEXP_REPLACE(
'{"key":"value","key2":"value with \"quote","keywith\":quote":"value with \"{}"}',
'^{|"(\\"|[^"])+":(")?((\\"|[^"])+?)\2((,)|})',
'\3\6'
)
Which outputs:
value,value with \"quote,value with \"{}
Or in Oracle 12 you can do:
SELECT *
FROM JSON_TABLE(
'{"key":"value","key2":"value with \"quote","keywith\":quote":"value with \"{}"}',
'$.*' NULL ON ERROR
COLUMNS (
value VARCHAR2(4000) PATH '$'
)
)
Which outputs:
VALUE
-----------------
value
value with "quote
value with "{}
example:::REGEXP_REPLACE( string, pattern [, replacement_string [, start_position [, nth_appearance [, match_parameter ] ] ] ] )
| is or(CAN MEAN MORE THAN ONE ALTERNATIVE ) , is for at least as in {n,} at least n times
https://www.techonthenet.com/oracle/functions/regexp_replace.php
"where I got my info"
'"\w+\":' why these "" are used and what is mean by '{|}|"',''
Matches a word character(\w)One or more times(+) this has to be messed up it's missing the right quantity of close parentheses by putting \" w+ \"
they allow the " to be shown. This expression takes one expression changes it then uses that as the basis for the next change. Good luck figuring the rest out. Regular expressions aren't too bad, pretty intuitive once you get the basics down.

Oracle SQL: Using Replace function while Inserting

I have a query like this:
INSERT INTO TAB_AUTOCRCMTREQUESTS
(RequestOrigin, RequestKey, CommentText) VALUES ('Tracker', 'OPM03865_0', '[Orange.Security.OrangePrincipal]
em[u02650791]okok
it's friday!')
As expected it is throwing an error of missing comma, due to this it's friday! which has a single quote.
I want to remove this single quote while inserting using Replace function.
How can this be done?
Reason for error is because of the single Quote. In order to correct it, you shall not remove the single quote instead you need to add one more i.e. you need to make it's friday to it''s friday while inserting.
If you need to replace it for sure, then try the below code :
insert into Blagh values(REPLACE('it''s friday', '''', ''),12);
I would suggest using Oracle q quote.
Example:
INSERT INTO TAB_AUTOCRCMTREQUESTS (RequestOrigin, RequestKey, CommentText)
VALUES ('Tracker', 'OPM03865_0',
q'{[Orange.Security.OrangePrincipal] em[u02650791]okok it's friday!}')
You can read about q quote here.
To shorten this article you will follow this format: q'{your string here}' where "{" represents the starting delimiter, and "}" represents the ending delimiter. Oracle automatically recognizes "paired" delimiters, such as [], {}, (), and <>. If you want to use some other character as your start delimiter and it doesn't have a "natural" partner for termination, you must use the same character for start and end delimiters.
Obviously you can't user [] delimiters because you have this in your queries. I sugest using {} delimiters.
Of course you can use double qoute in it it''s with replace. You can omit last parameter in replace because it isn't mandatory and without it it automatically will remove ' character.
INSERT INTO TAB_AUTOCRCMTREQUESTS (CommentText) VALUES (REPLACE('...it''s friday!', ''''))
Single quotes are escaped by doubling them up
INSERT INTO Blagh VALUES(REPLACE('it''s friday', '''', ''),12);
You can try this, (sorry but I don't know why q'[ ] works)
INSERT INTO TAB_AUTOCRCMTREQUESTS
(RequestOrigin, RequestKey, CommentText) VALUES ('Tracker', 'OPM03865_0', q'[[Orange.Security.OrangePrincipal] em[u02650791]okok it's friday!]')
I just got the q'[] from this link Oracle pl-sql escape character (for a " ' ") - this question could be a possible duplicate

Example sqlldr to parse Apache common log

My searches for complex sqlldr parsing of key-value pairs was thin. So posting an example that worked for my needs that you may be able to adapt.
The issue: millions of lines of Tomcat access log e.g.
time='[01/Jan/2001:00:00:03 +0000]' srcip='192.168.0.1' localip='10.0.0.1' referer='-' url='/limsM/SamplesGet-SampleMaster?samplefilters=%5B%22parent_sample%20%3D%208504571%22%2C%22status%20%3D%20'D'%22%5D&depthfilters=%5B%22scale_id%20%3D%2011311%22%5D' servername='yo.yo.dyne.org' rspms='218' rspbytes='2198'
are to be parsed into this Oracle table for convenience of analysis of selected parameters.
create table transfer.loganal (
time date
, timestr varchar2(30)
, srcip varchar2(75)
, localip varchar2(15)
, referer clob
, uri clob
, servername varchar2(50)
, rspms number
, rspbytes number
, logsource varchar2(50)
);
What does a sqlldr control script look like that will accomplish this?
This is my first working solution. Refinements, suggestions, improvements always welcome.
Given Tomcat access log in a directory, e.g.
yoyotomcat/
combined.20010101
combined.20010102
...
This file saved as combined.ctl as a sibling of yoyotomcat
-- Load an Apache common log format
-- essentially key-value pairs
-- example line of source data
-- time='[01/Jan/2001:00:00:03 +0000]' srcip='192.168.0.1' localip='10.0.0.1' referer='-' url='/limsM/SamplesGet-SampleMaster?samplefilters=%5B%22parent_sample%20%3D%208504571%22%2C%22status%20%3D%20'D'%22%5D&depthfilters=%5B%22scale_id%20%3D%2011311%22%5D' servername='yo.yo.dyne.org' rspms='218' rspbytes='2198'
--
LOAD DATA
INFILE 'yoyodyne/combined.2001*' "STR '\n'"
TRUNCATE INTO TABLE transfer.loganal
TRAILING NULLCOLS
(
time enclosed by "time='[" and "+0000]' " "to_date(:time, 'dd/Mon/yyyy:hh24:mi:ss')"
, srcip enclosed by "srcip='" and "' "
, localip enclosed by "localip='" and "' "
, referer char(10000) enclosed by "referer='" and "' "
, uri char(10000) enclosed by "url='" and "' "
, servername enclosed by "servername='" and "' "
, rspms enclosed by "rspms='" and "' " "decode(:rspms, '-', null, to_number(:rspms))"
, rspbytes enclosed by "rspbytes='" and "'" "decode(:rspbytes, '-', null, to_number(:rspbytes))"
, logsource "'munchausen'"
)
Load the hypothetical example content by running this from a command prompt
sqlldr userid=buckaroo#banzai direct=true control=combined.ctl
Your mileage may vary. I'm on Oracle 12. There may be features used here that are relatively new. Not sure.
Illumination
This variant of the "enclosed by" functionality works well for key-value pairs. Its not regular expression, but is performant.
The ability to treat the column name as a bind variable and apply available SQL functions to it enables much additional flexibility.
Have some log that has really long GETs, thus the specification of unreasonably long string values. 255 as a default wasn't enough.
Rspms and rspbytes sometimes had '-'. Used SQL to work around frequent "not a number" errors.
The control file as written presumes all fields are present. Not a good assumption over time. Looking for config to allow null column when a enclosure is not matched.
Cheers.

Multiple rows in single field not getting loaded | SQL Loader | Oracle

I need to load from CSV file into an Oracle Table.
The problem i m facing is that, the DESCRIPTION field is having Multiple Lines in itself.
Solution i am using for it as ENCLOSURE STRING " (Double Quotes)
Using KSH to call for sqlldr.
I am getting following two problems:
The row having Description with multiple lines, is not getting loaded as it terminates there itself and values of further fields/columns are not visible for loader. ERROR: second enclosure string not present (Obviously " is not found.)
The second line(and lines beyond that) of DESCRIPTION field is being treated as NEW Row in itself and is thus getting populated. It is GARBAGE DATA.
CONTROL File:
OPTIONS(SKIP=1)
LOAD DATA
BADFILE '/home/fimsctl/datafiles/inbound/core_po/logs/core_po_data.bad'
DISCARDFILE '/home/fimsctl/datafiles/inbound/core_po/logs/core_po_data.dsc'
APPEND INTO TABLE FIMS_OWNER.FINANCE_PO_INBOUND_T
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
PO_NUM,
CREATED_DATE "to_Date(:CREATED_DATE,'mm/dd/yyyy hh24:mi:ss')",
PO_TYPE,
PO_STATUS,
NOTREQ1 FILLER,
NOTREQ2 FILLER,
PO_VALUE,
LINE_ITEM_NUMBER,
QUANTITY,
LINE_ITEM_DESCRIPTION,
RATE_VALUE,
CURRENCY_CODE,
UOM_ID,
PO_REQUESTER_WWID,
QUANTITY_ORDERED,
QUANTITY_RECEIVED,
QUANTITY_BILLED terminated by whitespace
)
CSV File Data:
COL1,8/4/2014 5:52,COL3,COL4,COL5,,,COL8,COL9,"Description Data",COL11,COL12,COL13,COL14,COL15,COL16,COL17
COL1,8/4/2014 8:07,COL3,COL4,COL5,,,COL8,COL9,,"GE MAKE 1X250 WATT HPSV NON INTEGRAL IP-65 **[NEWLINE HERE]**
DIE-CAST ALUMINIUM FIXTURE COMPLETE SET **[NEWLINE HERE]**
WITH SEPRATE CONTROL GEAR BOX WITH CHOKE, **[NEWLINE HERE]**
IGNITOR, CAPACITOR & LAMP-T",COL11,COL12,COL13,COL14,COL15,COL16,COL17
COL1,8/4/2014 8:13,COL3,COL4,COL5,,,COL8,COL9,"Description Data",COL11,COL12,COL13,COL14,COL15,COL16,COL17

Resources