Example sqlldr to parse Apache common log - oracle

My searches for examples of complex sqlldr parsing of key-value pairs turned up little, so I am posting one that worked for my needs and that you may be able to adapt.
The issue: millions of lines of Tomcat access log, e.g.
time='[01/Jan/2001:00:00:03 +0000]' srcip='192.168.0.1' localip='10.0.0.1' referer='-' url='/limsM/SamplesGet-SampleMaster?samplefilters=%5B%22parent_sample%20%3D%208504571%22%2C%22status%20%3D%20'D'%22%5D&depthfilters=%5B%22scale_id%20%3D%2011311%22%5D' servername='yo.yo.dyne.org' rspms='218' rspbytes='2198'
are to be parsed into this Oracle table for convenience of analysis of selected parameters.
create table transfer.loganal (
time date
, timestr varchar2(30)
, srcip varchar2(75)
, localip varchar2(15)
, referer clob
, uri clob
, servername varchar2(50)
, rspms number
, rspbytes number
, logsource varchar2(50)
);
What does a sqlldr control script look like that will accomplish this?

This is my first working solution. Refinements, suggestions, improvements always welcome.
Given Tomcat access logs in a directory, e.g.
yoyotomcat/
combined.20010101
combined.20010102
...
Save this file as combined.ctl, as a sibling of the yoyotomcat directory:
-- Load an Apache common log format
-- essentially key-value pairs
-- example line of source data
-- time='[01/Jan/2001:00:00:03 +0000]' srcip='192.168.0.1' localip='10.0.0.1' referer='-' url='/limsM/SamplesGet-SampleMaster?samplefilters=%5B%22parent_sample%20%3D%208504571%22%2C%22status%20%3D%20'D'%22%5D&depthfilters=%5B%22scale_id%20%3D%2011311%22%5D' servername='yo.yo.dyne.org' rspms='218' rspbytes='2198'
--
LOAD DATA
INFILE 'yoyotomcat/combined.2001*' "STR '\n'"
TRUNCATE INTO TABLE transfer.loganal
TRAILING NULLCOLS
(
time enclosed by "time='[" and "+0000]' " "to_date(:time, 'dd/Mon/yyyy:hh24:mi:ss')"
, srcip enclosed by "srcip='" and "' "
, localip enclosed by "localip='" and "' "
, referer char(10000) enclosed by "referer='" and "' "
, uri char(10000) enclosed by "url='" and "' "
, servername enclosed by "servername='" and "' "
, rspms enclosed by "rspms='" and "' " "decode(:rspms, '-', null, to_number(:rspms))"
, rspbytes enclosed by "rspbytes='" and "'" "decode(:rspbytes, '-', null, to_number(:rspbytes))"
, logsource "'munchausen'"
)
Load the hypothetical example content by running this from a command prompt
sqlldr userid=buckaroo#banzai direct=true control=combined.ctl
Your mileage may vary. I'm on Oracle 12. There may be features used here that are relatively new. Not sure.
Illumination
This variant of the "enclosed by" functionality works well for key-value pairs. It's not a regular expression, but it performs well.
The ability to treat the column name as a bind variable and apply the available SQL functions to it adds a lot of flexibility.
Some of my logs have really long GET requests, hence the unreasonably long char(10000) on referer and uri. The default length of 255 wasn't enough.
rspms and rspbytes sometimes contained '-'. I used decode in SQL to work around the frequent "not a number" errors.
The control file as written presumes all fields are present, which is not a good assumption over time. I am still looking for a setting that leaves a column null when an enclosure is not matched.
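One way around the missing-field limitation is to pre-process the log before loading. The following is a minimal Python sketch, not part of the sqlldr solution above: the function name `parse_line` and the `FIELDS` tuple are hypothetical, and it assumes every value is wrapped in single quotes and followed by either the next key or the end of the line.

```python
import re
from datetime import datetime

# key='value', where the closing quote must be followed by the next key or end of line.
# The lazy match plus lookahead lets a value contain embedded quotes (e.g. 'D' in the URL).
PAIR = re.compile(r"(\w+)='(.*?)'(?=\s+\w+='|\s*$)")

# Keys expected in each log line (hypothetical list, taken from the example line).
FIELDS = ("time", "srcip", "localip", "referer", "url",
          "servername", "rspms", "rspbytes")

def parse_line(line):
    """Return a dict with every expected key; missing keys become None."""
    found = dict(PAIR.findall(line))
    row = {name: found.get(name) for name in FIELDS}
    if row["time"] is not None:
        row["time"] = datetime.strptime(row["time"], "[%d/%b/%Y:%H:%M:%S %z]")
    # Mirror the decode(..., '-', null, ...) logic from the control file.
    for name in ("rspms", "rspbytes"):
        value = row[name]
        row[name] = int(value) if value not in (None, "-") else None
    return row
```

The resulting dicts could then be written to a delimited file that a much simpler control file loads, at the cost of an extra pass over the data.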
Cheers.

Related

Skip first character from CSV file in Oracle sql loader control file

How do I skip the first character?
Here is the CSV file that I want to load
H
B"01","Mosco"
B"02","Delhi"
T
Here is the control file
LOAD DATA
INFILE 'capital.csv'
APPEND
INTO TABLE CAPITALS
WHEN (01)='B'
FIELDS TERMINATED BY ","
OPTIONALLY ENCLOSED BY '"'
(
ID,
CAPITAL
)
When I run this, the 'B' is loaded as part of the ID.
The table should look like
[![Table view][1]][1]
How do I skip the 'B'?
[1]: https://i.stack.imgur.com/2U3Vo.png
Disregard the first character. Can you have the source put a comma after the record type indicator?
If so, do this to ignore it:
(
RECORD_IND FILLER,
ID,
CAPITAL
)
If not, this should take care of it in your situation:
ID "SUBSTR(:ID, 2)",
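If neither option fits, the record type character could also be stripped before the file reaches sqlldr. This is a minimal Python sketch under the assumption that detail lines always start with a single 'B' and that header and trailer lines never do; the function name `detail_rows` is hypothetical.

```python
import csv

def detail_rows(lines, record_type="B"):
    """Yield parsed CSV rows for lines starting with record_type, minus that character."""
    stripped = (line[1:] for line in lines if line.startswith(record_type))
    yield from csv.reader(stripped)

# Example with the sample data from the question.
sample = ['H\n', 'B"01","Mosco"\n', 'B"02","Delhi"\n', 'T\n']
for row in detail_rows(sample):
    print(row)
```

Because the leading 'B' is gone, each field starts with its quote character and is parsed as a normally quoted field.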

Oracle PL/SQL Query With Dynamic Parameters in Where Clause

I'm trying to write a dynamic query that can take a varying number of parameters of different types. The only issue I'm having is handling string values, which need single quotes around them. I am using the value of a field called key_ref_ to determine what my where clause will look like. Some examples are:
LINE_NO=1^ORDER_NO=P6002277^RECEIPT_NO=1^RELEASE_NO=1^
PART_NO=221091^PART_REV=R02^
At the moment I am replacing the '^' with ' and ' like this:
REPLACE( key_ref_, '^' ,' and ' );
Then I'm trying to create the dynamic query like this:
EXECUTE IMMEDIATE
'select '||column_name_||' into column_ from '||base_table_||' where '||
key_ref_ || 'rownum = 1';
This won't work in cases where the value is not a number.
Also, I only added "rownum = 1" to handle the extra 'and' at the end instead of removing the last occurrence.
If the input does not contain the tilde symbol (~), you can try the code below.
If it does contain a tilde, use some other placeholder character that cannot appear in the input.
Consider the input provided in the example:
LINE_NO=1^ORDER_NO=P6002277^RECEIPT_NO=1^RELEASE_NO=1^PART_NO=221091^PART_REV=R02^
Use the code below:
replace(replace(replace('LINE_NO=1^ORDER_NO=P6002277^RECEIPT_NO=1^RELEASE_NO=1^PART_NO=221091^PART_REV=R02^','^','~ and '),'=','=~'),'~',q'[']')
and the result would be
LINE_NO='1' and ORDER_NO='P6002277' and RECEIPT_NO='1' and RELEASE_NO='1' and PART_NO='221091' and PART_REV='R02' and
Oracle implicitly converts the quoted values when they are compared with numeric columns, so quoting every value should not cause a problem.
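The same transformation can be sketched outside the database to make the logic easier to follow. This Python version is an illustration only: the function name `key_ref_to_predicate` is hypothetical, and it additionally doubles embedded single quotes and drops the trailing 'and', which the nested replace does not do.

```python
def key_ref_to_predicate(key_ref):
    """Turn 'A=1^B=x^' into "A='1' and B='x'" with no trailing 'and'."""
    conditions = []
    for pair in key_ref.split("^"):
        if not pair:
            continue  # skip the empty piece after the final '^'
        name, value = pair.split("=", 1)
        escaped = value.replace("'", "''")  # double embedded single quotes for SQL
        conditions.append(f"{name}='{escaped}'")
    return " and ".join(conditions)

print(key_ref_to_predicate("PART_NO=221091^PART_REV=R02^"))
```

Column names are still taken from the input as-is, so this is only appropriate when key_ref_ comes from a trusted source.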

Sqlldr- No terminator found after terminated and enclosed field

I use Oracle 11g.
My data file looks like below:
1|"\a\ab\"|"do not "clean" needles"|"#"
2|"\b\bg\"|"wall "69" side to end"|"#"
My control file is:
load data
infile 'short.txt'
CONTINUEIF LAST <> '"'
into table "PORTAL"."US_FULL"
fields terminated by "|" OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
u_hlevel,
u_fullname NULLIF u_fullname=BLANKS,
u_name char(2000) NULLIF u_name=BLANKS ,
u_no NULLIF u_no=BLANKS
)
While loading data through sqlldr, a .bad file is created and .log file contains error message stating "No terminator found after terminated and enclosed field"
My data does not start or end with double quotes, but I do need to keep the double quotes within the data, such as those surrounding clean and 69 in the example above. My data after loading should look like:
1, \a\ab\, do not "clean" needles, #
2, \b\bg\ , wall "69" side to end , #
How to accomplish this?
Asking your provider to correct the data file may not be an option, but I ultimately found a solution that requires you to update your control file slightly to specify your "enclosed by" character for each field instead of for all fields.
In my case, the [first_name] field would not load when it contained a nickname wrapped in double quotes (e.g. Jonathon "Jon"). In the data file the name appeared as "Jonathon "Jon"". The "enclosed by" clause threw an error because there were double quotes around the whole value and also around part of it ("Jon"). So instead of specifying that the value is enclosed by double quotes, I omitted that for this field and removed the quotes from the string manually.
Load Data
APPEND
INTO TABLE MyDataTable
fields terminated by "," -- Note: "enclosed by" is omitted at this level
TRAILING NULLCOLS
(
column1 enclosed by '"', -- "enclosed by" is specified per column instead
column2 enclosed by '"',
FIRST_NAME "replace(substr(:FIRST_NAME,2, length(:FIRST_NAME)-2), chr(34) || chr(34), chr(34))", -- No "enclosed by" here. substr removes the outer double quotes, replace collapses doubled double quotes. chr(34) is the character code for a double quote
column4 enclosed by '"',
column5 enclosed by '"'
)
I'm afraid that, since the fields are surrounded by double quotes, the double quotes you want to preserve need to be escaped by adding another double quote in front, like this:
1|"\a\ab\"|"do not ""clean"" needles"|"#"
Alternately if you can get the data without the fields being surrounded by double-quotes, this would work too:
1|\a\ab\|do not "clean" needles|#
If you can't get the data provider to format the data as needed (i.e. search for double-quotes and replace with 2 double-quotes before extracting to the file), you will have to pre-process the file to set up double quotes one of these ways so the data will load as you expect.
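The pre-processing step could look something like the following. This is a minimal Python sketch, assuming the delimiter never occurs inside a field and that the file has not already been escaped (running it twice would double the quotes again); the function name `escape_inner_quotes` is hypothetical.

```python
def escape_inner_quotes(line, delimiter="|", quote='"'):
    """Double any quote characters found inside fields that are wrapped in quotes."""
    fixed = []
    for field in line.rstrip("\r\n").split(delimiter):
        if len(field) >= 2 and field.startswith(quote) and field.endswith(quote):
            inner = field[1:-1].replace(quote, quote * 2)
            field = quote + inner + quote
        fixed.append(field)
    return delimiter.join(fixed)

# Example with the first line of the data file from the question.
print(escape_inner_quotes(r'1|"\a\ab\"|"do not "clean" needles"|"#"'))
```

Each line of the data file would be passed through this function and written to a new file, which is then loaded with the original control file.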

Multiple rows in single field not getting loaded | SQL Loader | Oracle

I need to load a CSV file into an Oracle table.
The problem I am facing is that the DESCRIPTION field itself contains multiple lines.
To handle this, I am using " (double quote) as the enclosure string.
I am calling sqlldr from KSH.
I am getting the following two problems:
1. The row whose description spans multiple lines is not loaded: the record ends at the first newline, so the remaining fields are not visible to the loader. ERROR: second enclosure string not present (obviously, the closing " is not found on that line).
2. The second line (and every later line) of the DESCRIPTION field is treated as a new row in its own right and is loaded as garbage data.
CONTROL File:
OPTIONS(SKIP=1)
LOAD DATA
BADFILE '/home/fimsctl/datafiles/inbound/core_po/logs/core_po_data.bad'
DISCARDFILE '/home/fimsctl/datafiles/inbound/core_po/logs/core_po_data.dsc'
APPEND INTO TABLE FIMS_OWNER.FINANCE_PO_INBOUND_T
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
PO_NUM,
CREATED_DATE "to_Date(:CREATED_DATE,'mm/dd/yyyy hh24:mi:ss')",
PO_TYPE,
PO_STATUS,
NOTREQ1 FILLER,
NOTREQ2 FILLER,
PO_VALUE,
LINE_ITEM_NUMBER,
QUANTITY,
LINE_ITEM_DESCRIPTION,
RATE_VALUE,
CURRENCY_CODE,
UOM_ID,
PO_REQUESTER_WWID,
QUANTITY_ORDERED,
QUANTITY_RECEIVED,
QUANTITY_BILLED terminated by whitespace
)
CSV File Data:
COL1,8/4/2014 5:52,COL3,COL4,COL5,,,COL8,COL9,"Description Data",COL11,COL12,COL13,COL14,COL15,COL16,COL17
COL1,8/4/2014 8:07,COL3,COL4,COL5,,,COL8,COL9,,"GE MAKE 1X250 WATT HPSV NON INTEGRAL IP-65 **[NEWLINE HERE]**
DIE-CAST ALUMINIUM FIXTURE COMPLETE SET **[NEWLINE HERE]**
WITH SEPRATE CONTROL GEAR BOX WITH CHOKE, **[NEWLINE HERE]**
IGNITOR, CAPACITOR & LAMP-T",COL11,COL12,COL13,COL14,COL15,COL16,COL17
COL1,8/4/2014 8:13,COL3,COL4,COL5,,,COL8,COL9,"Description Data",COL11,COL12,COL13,COL14,COL15,COL16,COL17
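One possible workaround, sketched here in Python rather than in the control file, is to flatten the embedded newlines before sqlldr sees the file. This is an assumption-laden sketch: the function name `flatten_multiline_fields` is hypothetical, and it assumes the file is well-formed CSV in which multi-line values are always enclosed in double quotes.

```python
import csv
import io

def flatten_multiline_fields(src, dst):
    """Copy CSV rows from src to dst, replacing newlines inside fields with spaces."""
    reader = csv.reader(src)  # handles quoted fields that span several lines
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow([" ".join(field.splitlines()) for field in row])

# Example: the quoted second field spans two physical lines.
source = io.StringIO('A,"line one\nline two, with comma",Z\nB,plain,Y\n')
target = io.StringIO()
flatten_multiline_fields(source, target)
print(target.getvalue(), end="")
```

When reading from a real file, open it with newline='' as the csv module documentation recommends, so that embedded line breaks are passed to the reader unchanged.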

Read using CSVREAD with non-printing characters as field and record separators

I have a file that I would like to read in H2. It uses ASCII code 31 as the field separator and ASCII code 30 as the record separator. I've tried this but it's not working...
SELECT * FROM CSVREAD('test.csv', null, 'rowSeparator=' || CHAR(30) || 'fieldSeparator=' || CHAR(31));
How do I need to format this to read from my file?
EDIT I
This parses the fields out correctly but the rows aren't being parsed out...not sure why:
SELECT * FROM CSVREAD('C:\Users\zmacomber\ReceiptPrinter\data\bak\address.dat', null, STRINGDECODE('charset=UTF-8 rowSeparator=' || CHAR(30) || ' fieldSeparator=' || CHAR(31)));
Looking at the source code of the CSV tool, unfortunately you cannot currently change the row separator used for reading (parsing). The row separator is only used for writing, not for reading. For reading, the rows need to be separated by \n, \r, or a combination of both.
I understand this is unexpected, but that's the way it currently is.
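Given that limitation, one option is to convert the file to newline-separated rows before handing it to CSVREAD. The following is a minimal Python sketch of that conversion, not an H2 feature; the function name `to_standard_csv` is hypothetical, and it assumes the whole file fits in memory.

```python
import csv
import io

RECORD_SEP = "\x1e"  # ASCII 30, record separator
FIELD_SEP = "\x1f"   # ASCII 31, field (unit) separator

def to_standard_csv(text, out):
    """Write RS/US-delimited text to `out` as ordinary comma-separated lines."""
    writer = csv.writer(out, lineterminator="\n")
    for record in text.split(RECORD_SEP):
        record = record.strip("\r\n")  # tolerate a newline after each separator
        if record:  # skip the empty piece after the final separator
            writer.writerow(record.split(FIELD_SEP))

# Example with two records of two fields each.
buffer = io.StringIO()
to_standard_csv("1\x1fMain St\x1e2\x1fElm St\x1e", buffer)
print(buffer.getvalue(), end="")
```

The converted file uses the default comma and newline separators, so it should be readable without passing any separator options.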

Resources