Multiple Line - Multiple CLOB per line SQL Loader - bash

I'm finding myself in a predicament.
I am parsing a logfile with multiple entries of SOAP calls. Each SOAP call can contain payloads of 4000+ characters, which prevents me from using VARCHAR2, so I must use a CLOB.
I have to load those payloads into an Oracle DB (12c).
I successfully split the log into single fields and got the payloads and the headers of the calls into two separate files.
How do I create a CTL file that loads from an infile (that contains data for other fields) and reads CLOB files in pairs?
Ideally:
LOAD DATA
INFILE 'load.ldr'
BADFILE 'load.bad'
APPEND
INTO TABLE ops_payload_tracker
FIELDS TERMINATED BY '§'
TRAILING NULLCOLS
( id,
direction,
http_info,
payload CLOB(),
header CLOB(),
host_name)
but then I don't know, and can't find anywhere on the internet, how to do it for more than one record, and how to associate each record with its two CLOBs.
Worth mentioning that these are JBoss logs, in a bash environment.

Check the size of the VARCHAR2 type in 12c; I thought it was increased to 32K:
https://oracle-base.com/articles/12c/extended-data-types-12cR1
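If extended data types are enabled on that 12c instance (an assumption; the article above explains how to enable them), the payload column could simply be declared large enough to avoid CLOB handling altogether. A minimal sketch with a placeholder table name:
-- requires MAX_STRING_SIZE=EXTENDED at the instance/PDB level (see the linked article)
CREATE TABLE payload_demo (
  id      NUMBER,
  payload VARCHAR2(32767 CHAR)
);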
see that sample SQL Loader, CLOB, delimited fields
"I already create two separate files for payload and headers. How
should I specify that the two files are there for the same ID?"
see example here:
https://oracle-base.com/articles/10g/load-lob-data-using-sql-loader
roughly:
sample data file:
1,one,01-JAN-2006,1_clob_header.txt,1_clob_details.txt
2,two,02-JAN-2006,2_clob_header.txt,2_clob_details.txt
control file:
LOAD DATA
INFILE 'lob_test_data.txt'
INTO TABLE lob_tab
FIELDS TERMINATED BY ','
(number_content CHAR(10),
varchar2_content CHAR(100),
date_content DATE "DD-MON-YYYY" ":date_content",
clob_filename FILLER CHAR(100),
clob_content LOBFILE(clob_filename) TERMINATED BY EOF,
blob_filename FILLER CHAR(100),
blob_content LOBFILE(blob_filename) TERMINATED BY EOF)
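Applied to the table from the question, a rough control file could look like the one below. It assumes each call's payload and header are written out to their own files, and that load.ldr carries two extra §-separated fields holding those filenames per call (the FILLER field names and sizes are placeholders):
LOAD DATA
INFILE 'load.ldr'
BADFILE 'load.bad'
APPEND
INTO TABLE ops_payload_tracker
FIELDS TERMINATED BY '§'
TRAILING NULLCOLS
( id,
direction,
http_info,
host_name,
payload_filename FILLER CHAR(255),
header_filename FILLER CHAR(255),
payload LOBFILE(payload_filename) TERMINATED BY EOF,
header LOBFILE(header_filename) TERMINATED BY EOF
)
so that each line of load.ldr would look something like
1§request§POST /service HTTP/1.1§myhost§1_payload.xml§1_header.txt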

Related

Suggestion for loading data of 2M records in to DB

Users upload a data file through the application (JSF) which has 2 million records, and I have to load it into the DB. Loading it through an asynchronous Java call occupies too much memory (out-of-memory exceptions) and also times out most of the time.
So what I did is: store the uploaded file as a CLOB in table1, and use a UNIX shell script that runs every 15 minutes to check whether table1 has unprocessed records; if so, it reads that CLOB and loads it into table2 using SQLLDR in the same shell script. It is working fine, but there is a 15-minute delay in processing records.
So I think the same SQLLDR process could be run through a PL/SQL package or procedure, and that package could be called through a Java JDBC call, right? Any examples?
If it's a one-time export/import you can use SQL Developer. It enables you to export the displayed rows in loader format. B/CLOBs are exported as separate files.
Following Oracle's blog:
LOAD DATA
INFILE 'loader.txt'
INTO TABLE my_table
FIELDS TERMINATED BY ','
( id CHAR(10),
author CHAR(30),
created DATE "YYYY-MM-DD" ":created",
fname FILLER CHAR(80),
text LOBFILE(fname) TERMINATED BY EOF
)
"fname" is an arbitrary label, we could have used "fred" and it would
have worked exactly the same. It just needs to be the same on the two
lines where it is used.
loader.txt:
1,John Smith,2015-04-29,file1.txt
2,Pete Jones,2013-01-31,file2.txt
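Once the control file and the per-row LOB files are in place, the load itself is just a sqlldr call from the shell; a rough invocation (credentials, connect string and file names are made up):
sqlldr userid=scott/tiger@orcl control=my_table.ctl log=my_table.log bad=my_table.bad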
If you want to know how to dump a CLOB column into a file, please refer to Dumping CLOB fields into files?.

LINES TERMINATED BY only supports newline '\n' right now

I have files where the columns are delimited by char(30) and the lines are delimited by char(31). I'm using these delimiters mainly because the columns may contain newlines (\n), so the default line delimiter for Hive is not useful for us.
I have tried to change the line delimiter in Hive but get the error below:
LINES TERMINATED BY only supports newline '\n' right now.
Any suggestion?
Would writing a custom SerDe work?
Is there any plan to enhance this functionality in Hive in new releases?
Thanks.
Not sure if this helps, or is the best answer, but when faced with this issue, what we ended up doing is setting the 'textinputformat.record.delimiter' Map/Reduce Java property to the value being used. In our case it was the string "{EOL}", but it could be any unique string for all practical purposes.
We set this in our beeline shell, which allowed us to pull back the fields correctly. It should be noted that once we did this, we converted the data to Avro as fast as possible so we didn't need to explain to every user, and the user's baby brother, to set the {EOL} line delimiter.
set textinputformat.record.delimiter={EOL};
Here is the full example.
#example CSV data (fields broken by '^' and end of lines broken by the string '{EOL}')
ID^TEXT
11111^Some THings WIth
New Lines in THem{EOL}11112^Some Other THings..,?{EOL}
111113^Some crazy thin
gs
just crazy{EOL}11114^And Some Normal THings.
#here is the CSV table we laid on top of the data
CREATE EXTERNAL TABLE CRAZY_DATA_CSV
(
ID STRING,
TEXT STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\136'
STORED AS TEXTFILE
LOCATION '/archive/CRAZY_DATA_CSV'
TBLPROPERTIES('skip.header.line.count'='1');
#here is the Avro table which we'll migrate into below.
CREATE EXTERNAL TABLE CRAZY_DATA_AVRO
(
ID STRING,
TEXT STRING
)
STORED AS AVRO
LOCATION '/archive/CRAZY_DATA_AVRO'
TBLPROPERTIES ('avro.schema.url'='hdfs://nameservice/archive/avro_schemas/CRAZY_DATA.avsc');
#And finally, the magic is here. We set the custom delimiter and import into our Avro table.
set textinputformat.record.delimiter={EOL};
INSERT INTO TABLE CRAZY_DATA_AVRO SELECT * from CRAZY_DATA_CSV;
I worked it out by using the sqoop option --hive-delims-replacement ' ' during the extract, so the characters \n, \001 and \r are removed from the columns.
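For reference, a rough sqoop invocation along those lines (connection details, credentials and table names are made up):
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott \
  --password tiger \
  --table SOURCE_TABLE \
  --hive-import \
  --hive-table TARGET_TABLE \
  --hive-delims-replacement ' '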

LOAD DATA IN ORACLE

Hi, I am trying to use LOAD DATA in Oracle. If I use
LINES TERMINATED BY '<>'
it is throwing
SQL*Loader-350: Syntax error at line 1.
Expecting "(", found "LINES".
Why is this happening? Is there no LINES TERMINATED BY clause in Oracle?
I think LINES TERMINATED BY is not defined in Oracle; check Stream Record Format in the Oracle documentation:
A file is in stream record format when the records are not specified
by size; instead SQL*Loader forms records by scanning for the record
terminator. Stream record format is the most flexible format, but
there can be a negative effect on performance. The specification of a
datafile to be interpreted as being in stream record format looks
similar to the following: INFILE datafile_name ["str
terminator_string"]
Example:
load data
infile 'example.dat' "str '|\n'"
into table example
fields terminated by ',' optionally enclosed by '"'
(col1 char(5),
col2 char(7))
example.dat:
hello,world,|
james,bond,|
See http://docs.oracle.com/cd/B19306_01/server.102/b14215/ldr_concepts.htm for more.
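Applied to the '<>' terminator from the question, the same idea would look roughly like this (table and column definitions are placeholders):
load data
infile 'example.dat' "str '<>'"
into table example
fields terminated by ',' optionally enclosed by '"'
(col1 char(5),
 col2 char(7))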

bulk load UDT columns in Oracle

I have a table with the following structure:
create table my_table (
id integer,
point Point -- UDT made of two integers (x, y)
)
and i have a CSV file with the following data:
#id, point
1|(3, 5)
2|(7, 2)
3|(6, 2)
Now I want to bulk load this CSV into my table, but I can't find any information about how to handle the UDT in the Oracle sqlldr utility. Is it possible to use the bulk load utility with UDT columns?
I don't know if sqlldr can do this, but personally I would use an external table.
Attach the file as an external table (the file must be on the database server), and then insert the contents of the external table into the destination table transforming the UDT into two values as you go. The following select from dual should help you with the translation:
select
regexp_substr('(5, 678)', '[[:digit:]]+', 1, 1) x_point,
regexp_substr('(5, 678)', '[[:digit:]]+', 1, 2) y_point
from dual;
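A rough sketch of that approach, assuming a directory object (here called data_dir) that points at the folder holding the file, and that the UDT's default Point(x, y) constructor is available:
-- external table over the raw file; 'data_dir' and 'points.csv' are assumptions,
-- and the '#id, point' header line is assumed to have been removed
CREATE TABLE my_table_ext (
  id        NUMBER,
  point_raw VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY '|'
  )
  LOCATION ('points.csv')
);

-- transform on the way into the destination table
INSERT INTO my_table (id, point)
SELECT id,
       Point(to_number(regexp_substr(point_raw, '[[:digit:]]+', 1, 1)),
             to_number(regexp_substr(point_raw, '[[:digit:]]+', 1, 2)))
FROM my_table_ext;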
UPDATE
In sqlldr, you can transform fields using standard SQL expressions:
LOAD DATA
INFILE 'data.dat'
BADFILE 'bad_orders.txt'
APPEND
INTO TABLE test_tab
FIELDS TERMINATED BY "|"
( info,
x_cord "regexp_substr(:x_cord, '[[:digit:]]+', 1, 1)"
)
The control file above will extract the first number from fields like (3, 4), but I cannot find a way to extract the second one - i.e. I am not sure whether it is possible to have the same field in the input file inserted into two columns.
If external tables are not an option for you, I would suggest either (1) transforming the file before loading, using sed, awk, Perl etc., or (2) SQLLDRing the file into a temporary table and then having a second process transform the data and insert it into your final table. Another option is to look at how the file is generated - could you generate it so that the field you need to transform is repeated in two fields in the file, e.g.:
data|(1, 2)|(1, 2)
Maybe someone else will chip in with a way to get sqlldr to do what you want.
After more research I solved the problem: Oracle SQL*Loader does have this feature, and it is used by specifying a column object. The following was the solution:
LOAD DATA
INFILE *
INTO TABLE my_table
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
id,
point column object
(
x,
y
)
)
BEGINDATA
1,3,5
2,7,2
3,6,2
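The pipe-and-parenthesis file from the question would first need flattening into that comma-separated form; a rough shell one-liner under that assumption (file names are made up):
# drop the '#id, point' header line and turn '1|(3, 5)' into '1,3,5'
sed -e '/^#/d' -e 's/[() ]//g' -e 's/|/,/g' points.csv > points_flat.csv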

Loading Unicode Characters with Oracle SQL Loader (sqlldr) results in question marks

I'm trying to load localized strings from a Unicode (UTF-8 encoded) CSV into an Oracle database using SQL*Loader. I've tried all sorts of combinations, but nothing seems to give me the result I'm looking for, which is to have special Greek characters like Δ load correctly instead of being turned into mojibake or upside-down question marks (¿).
My table definition looks like this:
CREATE TABLE "GLOBALIZATIONRESOURCE"
(
"RESOURCETYPE" VARCHAR2(255 CHAR) NOT NULL ENABLE,
"CULTURE" VARCHAR2(20 CHAR) NOT NULL ENABLE,
"KEY" VARCHAR2(128 CHAR) NOT NULL ENABLE,
"VALUE" VARCHAR2(2048 CHAR),
"DESCRIPTION" VARCHAR2(512 CHAR),
CONSTRAINT "PK_GLOBALIZATIONRESOURCE" PRIMARY KEY ("RESOURCETYPE","CULTURE","KEY") USING INDEX TABLESPACE REPSPACE_IX ENABLE
)
TABLESPACE REPSPACE;
I have tried the following configurations in my control file (and actually every permutation I could think of)
load data
TRUNCATE
INTO TABLE "GLOBALIZATIONRESOURCE"
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
"RESOURCETYPE" CHAR(255),
"CULTURE" CHAR(20),
"KEY" CHAR(128),
"VALUE" CHAR(2048),
"DESCRIPTION" CHAR(512)
)
load data
CHARACTERSET UTF8
TRUNCATE
INTO TABLE "GLOBALIZATIONRESOURCE"
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
"RESOURCETYPE" CHAR(255),
"CULTURE" CHAR(20),
"KEY" CHAR(128),
"VALUE" CHAR(2048),
"DESCRIPTION" CHAR(512)
)
load data
CHARACTERSET UTF16
TRUNCATE
INTO TABLE "GLOBALIZATIONRESOURCE"
FIELDS TERMINATED BY X'002c' OPTIONALLY ENCLOSED BY X'0022'
TRAILING NULLCOLS
(
"RESOURCETYPE" CHAR(255),
"CULTURE" CHAR(20),
"KEY" CHAR(128),
"VALUE" CHAR(2048),
"DESCRIPTION" CHAR(512)
)
With the first two options, the Unicode characters don't get encoded correctly and just show up as upside-down question marks.
If I choose the last option, UTF16, I get the following error even though all the data in my fields is much shorter than the specified length:
Field in data file exceeds maximum length
It seems as though every possible combination of CTL file configurations (even setting the byte order to little and big endian) fails. Can someone please give an example of a configuration (table structure and CTL file) that correctly loads Unicode data from a CSV? Any help would be greatly appreciated.
Note: I've already been to http://docs.oracle.com/cd/B19306_01/server.102/b14215/ldr_concepts.htm and http://docs.oracle.com/cd/B10501_01/server.920/a96652/ch10.htm.
I had the same issue and resolved it with the steps below:
Open the data file in Notepad++, go to the "Encoding" dropdown, select UTF-8 encoding and save the file.
Use CHARACTERSET UTF8 in the CTL file and then load the data.
You have two problems:
1. Character set.
Answer: You can solve this by finding the character set of your text file (most of the time Notepad++ can do this). After finding it, you have to look up the corresponding sqlldr character set name; you can find that info at https://docs.oracle.com/cd/B10501_01/server.920/a96529/appa.htm#975313
After all of this, the character set problem should be solved.
2. Even though your actual data is shorter, sqlldr reports "Field in data file exceeds maximum length".
Answer: You can solve this by declaring the problematic column as CHAR(4000) (or whatever the actual maximum length is). In my case the problematic column is column "E"; the example is below. I solved my problem this way, hope it helps.
LOAD DATA
CHARACTERSET UTF8
-- This line is comment
-- Turkish charset (for ÜĞİŞ etc.)
-- CHARACTERSET WE8ISO8859P9
-- Character list is here.
-- https://docs.oracle.com/cd/B10501_01/server.920/a96529/appa.htm#975313
INFILE 'data.txt' "STR '~|~\n'"
TRUNCATE
INTO TABLE SILTAB
FIELDS TERMINATED BY '#'
TRAILING NULLCOLS
(
a,
b,
c,
d,
e CHAR(4000)
)
You must ensure that the following character sets are the same:
db character set
dump file character set
the client from which you are doing the import (NLS_LANG)
If the client-side character set is different, Oracle will attempt to perform character conversion to the native db character set, and this might not always provide the desired result.
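For example, when loading a UTF-8 file into an AL32UTF8 database from a bash shell, the client character set can be pinned via NLS_LANG before calling sqlldr (the character sets, credentials and file names here are assumptions about one particular environment):
export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
sqlldr userid=scott/tiger control=globalization.ctl log=globalization.log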
Don't use MS Office to save the spreadsheet as a Unicode .csv.
Instead, use OpenOffice to save it as a Unicode (UTF-8) .csv file.
Then, in the loader control file, add "CHARACTERSET UTF8".
Running Oracle SQL*Loader then gives me correct results.
There is a range of character set encodings that you can use in the control file when loading data with SQL*Loader.
For Greek characters, I believe a Western European character set should do the trick:
LOAD DATA
CHARACTERSET WE8ISO8859P1
or, in the case of MS Word input files with smart characters, try this in the control file:
LOAD DATA
CHARACTERSET WE8MSWIN1252
