SQL*Loader: load multiple files into one table on Windows

I am using SQL*Loader to load multiple CSV files into one table.
This is the content of my control file:
load data
append
into table SAMP_TABLE
fields terminated by ','
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols
( COLUMN1 CHAR(4000),
COLUMN2 CHAR(4000),
COLUMN3 CHAR(4000)
)
And this is my batch file:
@echo off
for %%F in ("C:\Users\test\*.csv") do (
sqlldr username/pw@dbip CONTROL='C:\Users\test\test2.ctl' LOG='C:\Users\test\TEST.log' "DATA=%%F"
)
pause
All my CSV files, the control file, and the batch file are in the same directory.
I have two CSV files with the same columns but different content. The problem
is that only the first CSV file is imported, not the second, and I don't know why. I would appreciate it if someone could tell me what I am doing wrong.

You just need to specify multiple INFILE clauses in the control file:
load data
infile 'data1.csv'
infile 'data2.csv'
...
infile 'datan.csv'
append
into table TABLE1
fields terminated by ','
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols
( COLUMN1 CHAR(4000),
COLUMN2 CHAR(4000),
COLUMN3 CHAR(4000)
)
If the files are always in the same folder, you can also use a wildcard, provided your SQL*Loader version supports wildcards in INFILE:
infile '/path/*.csv'
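If your SQL*Loader version does not expand wildcards, the INFILE list can be generated rather than typed by hand. Below is a rough Python sketch of that idea. The function name build_control_file and its parameters are my own placeholders, and the column list is simply copied from the control file above; file names containing a single quote are not handled.

```python
def build_control_file(csv_files, table="TABLE1"):
    """Return control-file text with one INFILE clause per CSV file."""
    lines = ["load data"]
    # One INFILE clause per input file, in the order given.
    lines += [f"infile '{name}'" for name in csv_files]
    lines += [
        "append",
        f"into table {table}",
        "fields terminated by ','",
        "OPTIONALLY ENCLOSED BY '\"' AND '\"'",
        "trailing nullcols",
        "( COLUMN1 CHAR(4000),",
        "COLUMN2 CHAR(4000),",
        "COLUMN3 CHAR(4000)",
        ")",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    print(build_control_file(["data1.csv", "data2.csv"]))
```

Writing the returned text to a .ctl file and passing it to sqlldr with CONTROL= then loads all listed files in one run.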

This is the solution to my problem:
@echo off
IF NOT EXIST C:\Users\test\%date:~-10,2%"-"%date:~-7,2%"-"%date:~-4,4% md C:\Users\test\%date:~-10,2%"-"%date:~-7,2%"-"%date:~-4,4%
for %%F in ("C:\Users\test\*.csv") do (
sqlldr dbuser/dbpw@dbip CONTROL='C:\Users\test.ctl' LOG='C:\Users\test\%date:~-10,2%"-"%date:~-7,2%"-"%date:~-4,4%\%date:~-10,2%"-"%date:~-7,2%"-"%date:~-4,4%.log' "DATA=%%F" skip=1
move %%F C:\Users\test\%date:~-10,2%"-"%date:~-7,2%"-"%date:~-4,4%
)
pause
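For comparison, here is a Python sketch of the same per-file loop. It only builds the sqlldr command lines, using the parameters already seen above (userid, control, log, data, skip). The function names and the one-log-per-file naming are assumptions of mine, and moving the processed files into a dated folder is left out.

```python
import subprocess
from pathlib import Path


def sqlldr_commands(csv_files, control_file, log_dir, userid):
    """Build one sqlldr argument list per CSV file (nothing is executed here)."""
    commands = []
    for csv_file in csv_files:
        # A separate log per data file, so a later run does not overwrite an earlier log.
        log_file = Path(log_dir) / (Path(csv_file).stem + ".log")
        commands.append([
            "sqlldr",
            f"userid={userid}",
            f"control={control_file}",
            f"log={log_file}",
            f"data={csv_file}",
            "skip=1",  # skip the header row, as in the batch file above
        ])
    return commands


def run_all(commands):
    """Run each command in turn and return the list of exit codes."""
    return [subprocess.run(cmd).returncode for cmd in commands]
```

Checking the exit codes returned by run_all makes it easier to see which file failed than scanning a single shared log.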

Related

Oracle External Table RECORDS DELIMITED BY '",\n"' not working, how can I delimit by a character and newline at the same time?

I am trying to read large CSV files that contain many newline characters inside the field values.
This is what the data looks like in the CSV file:
"LastValueInRow",
"FirstValueInNextRow",
I would like to use " + , + NEWLINE + " as the record delimiter, so that the other newline characters are not read as the start of a new record.
The following code reads most CSV records correctly by using NEWLINE (\n) + " as the delimiter:
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "IMPORT_TEST"
ACCESS PARAMETERS
( RECORDS DELIMITED BY '\n"'
BADFILE SNOW_IMPORT_TEST:'TEST_1.bad'
LOGFILE SNOW_IMPORT_TEST:'TEST_1.log'
SKIP 1
FIELDS TERMINATED BY '","'
MISSING FIELD VALUES ARE NULL
)
LOCATION
( "IMPORT_TEST":'TEST_1.csv'
)
)
Adding any characters before the \n returns no rows. Below is what I want, which doesn't work:
( RECORDS DELIMITED BY '",\n"'
Is it possible to use " + , + \n + " as the record delimiter?
Thanks.
After a lot of research, I found that the best solution is to replace the line-break characters in the CSV file with a different character using Windows PowerShell, and then update the record delimiter in the external table.
I created the following PowerShell script to replace all line-break characters in the CSV file (where $loc is the directory and $file_name is the file name without its extension):
(Get-content -raw -path $loc\$file_name".csv") -replace '[\r\n]', '|' | Out-File -FilePath $loc\$file_name"_PP.csv" -Force -Encoding ascii -nonewline
Then I updated the external table's access parameters to read records based on the new delimiter '",||"'. Each CR and each LF is replaced by its own '|', so a Windows CRLF line ending becomes '||'.
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "IMPORT_TEST"
ACCESS PARAMETERS
( RECORDS DELIMITED BY '",||"'
BADFILE SNOW_IMPORT_TEST:'TEST_1_PP.bad'
LOGFILE SNOW_IMPORT_TEST:'TEST_1_PP.log'
SKIP 1
FIELDS TERMINATED BY '","'
MISSING FIELD VALUES ARE NULL
)
LOCATION
( "IMPORT_TEST":'TEST_1_PP.csv'
)
)
Now the external table is reading all the records correctly.
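The same preprocessing step can be sketched in Python, for machines without PowerShell. This is only an illustration: the function names are mine, and it works on raw bytes, so unlike the script above it does not convert the output to ASCII (it assumes a single-byte or UTF-8 encoded file).

```python
def flatten_newlines(data: bytes) -> bytes:
    """Replace every CR byte and every LF byte with '|'."""
    return data.replace(b"\r", b"|").replace(b"\n", b"|")


def preprocess_csv(src_path, dst_path):
    """Write a copy of src_path with all line breaks turned into '|'."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(flatten_newlines(data))


# Made-up sample: two records ending in ",\r\n", the first with an LF inside a field.
sample = b'"a","x\ny",\r\n"b","z",\r\n'
flat = flatten_newlines(sample)
# A CRLF record ending becomes '||'; the lone LF inside the field becomes a single '|'.
records = flat.split(b'",||"')
```

Splitting on '",||"' separates the two records, while the line break that was inside the field survives as a single '|' and no longer looks like a record boundary.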

SQL Loader - Multiple Files and Grabbing file names

I have a folder with over 400K txt files,
with names like:
deID.RESUL_12433287659.txt_234323456.txt
deID.RESUL_34534563649.txt_345353567.txt
deID.RESUL_44235345636.txt_537967875.txt
deID.RESUL_35234663456.txt_423452545.txt
I want to store all the files and their content in the following way:
file_name                                  file_content
deID.RESUL_12433287659.txt_234323456.txt   Content 1
deID.RESUL_34534563649.txt_345353567.txt   Content 2
deID.RESUL_44235345636.txt_537967875.txt   Content 3
deID.RESUL_35234663456.txt_423452545.txt   Content 4
I tried creating a control file like this:
LOAD
DATA
INFILE 'deID.RESUL_12433287659.txt_234323456.txt'
INFILE 'deID.RESUL_34534563649.txt_345353567.txt'
INFILE 'deID.RESUL_44235345636.txt_537967875.txt'
INFILE 'deID.RESUL_35234663456.txt_423452545.txt'
APPEND INTO TABLE TBL_DATA
EVALUATE CHECK_CONSTRAINTS
REENABLE DISABLED_CONSTRAINTS
EXCEPTIONS EXCEPTION_TABLE
FIELDS TERMINATED BY ""
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
FILE_NAME
)
Is there a way to pick up the file names dynamically, or to specify a wildcard in the INFILE clause, so that I don't have to list 400K files one by one in my control file?
1) Create table to hold data/files
create table TBL_DATA(file_name varchar2(4000), file_content clob);
2) Create load_all.ctl
LOAD DATA
INFILE file_list.txt
INSERT INTO TABLE TBL_DATA
APPEND
FIELDS TERMINATED BY ","
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
file_name char(4000)
, file_content LOBFILE(file_name) TERMINATED BY EOF
)
3) Write the list of file names to file_list.txt
ls -1 *.txt > file_list.txt
4) Run: sqlldr user/pass@db control=load_all.ctl
5) load_all.ctl, file_list.txt and the source files should all be in the same folder.
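Two things to watch in step 3: on a rerun, the existing file_list.txt matches *.txt and ends up listed in itself, and with 400K names the expanded ls command line may exceed the system's argument length limit. A possible workaround is to build the list in Python. This is a sketch only; the function name and its defaults are my own.

```python
from pathlib import Path


def write_file_list(folder, list_name="file_list.txt", pattern="*.txt"):
    """Write the matching file names, one per line, to list_name inside folder."""
    folder = Path(folder)
    names = sorted(
        p.name
        for p in folder.glob(pattern)
        if p.is_file() and p.name != list_name  # never list the list file itself
    )
    (folder / list_name).write_text("".join(name + "\n" for name in names))
    return names


if __name__ == "__main__":
    # Run from the folder that holds the source files and load_all.ctl.
    print(len(write_file_list(".")), "file names written")
```

The resulting file_list.txt has the same one-name-per-line layout that the control file in step 2 reads.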

sqlldr stream record format on commandline

I am loading a delimited file using sqlldr. I keep the file format and table details in the control file and pass the other parameters on the command line.
sqlldr control=sp.ctl data=data.20170502.txt SKIP=1 userid=xyz@db/pwd log=sp.log bad=sp.bad
sp.ctl:
LOAD DATA
TRUNCATE
INTO TABLE "T_DATA"
TRUNCATE
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(
C_1 CHAR(2000),
C_2 CHAR(2000),
C_3 CHAR(2000)
)
I now need to use a stream record format for this data file, for example:
infile 'example3.dat' "str '|\n'"
However, I am not using the INFILE clause, so I tried:
sqlldr control=sp.ctl data=data.20170502.txt "str '!\n'" SKIP=1 userid=xyz@db/pwd log=sp.log bad=sp.bad
It gives an error:
LRM-00112: multiple values not allowed for parameter 'data'
How do I pass the record delimiter on the command line?

Bash - replace string inside all files in directory

I have 31 .ctl files in a directory; they look like this:
load data CHARACTERSET AL32UTF8
infile '../dane/kontakty_Biura_wyborcze.csv' "str '\n'"
append
into table ODI_PUW_OSOBY2
fields terminated by ';'
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols
( LP CHAR(4000),
WOJEWODZTWO CHAR(4000),
POWIAT CHAR(4000),
GMINA CHAR(4000),
NAZWA_INSTYTUCJI CHAR(4000),
KOD CHAR(4000),
MIEJSCOWOSC CHAR(4000),
ADRES CHAR(4000),
NAZWISKO_I_IMIE CHAR(4000),
FUNKCJA CHAR(4000),
TEL_SLUZB_STACJON_1 CHAR(4000),
TEL_SLUZB_STACJON_2 CHAR(4000),
TEL_SLUZB_STACJON_3 CHAR(4000),
TEL_SLUZB_KOM_1 CHAR(4000),
TEL_SLUZB_KOM_2 CHAR(4000),
FAX_SLUZB_1 CHAR(4000),
FAX_SLUZB_2 CHAR(4000),
EMAIL_SLUZB_1 CHAR(4000),
EMAIL_SLUZB_2 CHAR(4000),
WWW CHAR(4000),
TYP CONSTANT "Biura wyborcze.",
ODI_SESJA_ID CONSTANT "20130717144702"
ODI_STATUS CONSTANT "0",
IMIE EXPRESSION "pg_odi_utils.zwroc_imiona(pg_odi_utils.usun_przyrostki(:NAZWISKO_I_IMIE),0)",
NAZWISKO EXPRESSION "pg_odi_utils.zwroc_nazwisko(pg_odi_utils.usun_przyrostki(:NAZWISKO_I_IMIE),0)"
)
There are 31 files like this. I need to replace the value in this line:
ODI_SESJA_ID CONSTANT '20130717144702'
with a new timestamp, the same for all files. The value currently in the files is not known in advance (in this case it happens to be '20130717144702').
So, for each file found in the directory, I need to:
find the line starting with ODI_SESJA_ID
replace the value after 'ODI_SESJA_ID CONSTANT ' with the new one
leave the remaining lines in the file untouched
What is the best way to do this using bash? Should I use sed or a similar tool? How?
Something like:
sed 's/\(^[ \t]\+ODI_SESJA_ID\ CONSTANT\).*/\1 \"newtimestamp\"/' tmp
should work.
Group the string that will be retained, adding the placeholder (\1) in the replacement string. Replace newtimestamp with whatever value you prefer, of course.
I would do this using sed like so:
sed -i "/^[ \t]*ODI_SESJA_ID CONSTANT/s/'[^']\+'/'REPLACEMENT'/" *.ctl
The -i flag to sed means it modifies the files in place, so I usually try it on a single file first with the -e flag instead of the -i flag and confirm that sed's output is what I was looking for.
Explanation:
The double-quotes protect my regex from the shell.
/^[ \t]*ODI_SESJA_ID CONSTANT/ matches only the lines that start with optional whitespace followed by 'ODI_SESJA_ID CONSTANT'.
s/'[^']\+'/'REPLACEMENT'/ substitutes 'REPLACEMENT' (quoted) for the first quoted portion of the text on matching lines.
The document at http://www.catonmat.net/blog/wp-content/uploads/2008/09/sed1line.txt (top Google hit for 'sed one liners') is pretty helpful for quickly dispatching these sorts of tasks.
I found a simpler solution that seems to work:
sed -i 's/.*ODI_SESJA_ID.*/ ODI_SESJA_ID CONSTANT "'$(date +%s)'",/' *.ctl
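It replaces every line that contains ODI_SESJA_ID with the new value. It is not very elegant, because it rewrites the entire line instead of only the value that needs to change. Note that date +%s yields seconds since the epoch; date +%Y%m%d%H%M%S would produce the same format as the existing value.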
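One detail worth noting: the sample file shows the value in double quotes, while the second sed command above matches single quotes. As an illustration, here is a Python sketch of the substitution that accepts either quote style and leaves the rest of the line (including the trailing comma) alone. The function names are mine, and the timestamp format is assumed from the existing value 20130717144702.

```python
import re
from datetime import datetime

# Optional leading blanks, the keyword pair, then a value in single or double quotes.
_PATTERN = re.compile(
    r"""^([ \t]*ODI_SESJA_ID[ \t]+CONSTANT[ \t]+)(["'])[^"']*\2""",
    re.MULTILINE,
)


def replace_session_id(text, new_value):
    """Replace only the quoted value on ODI_SESJA_ID lines; keep everything else."""
    return _PATTERN.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{new_value}{m.group(2)}", text
    )


def new_timestamp():
    """Timestamp shaped like the existing value (assumed format YYYYMMDDHHMMSS)."""
    return datetime.now().strftime("%Y%m%d%H%M%S")
```

Calling new_timestamp() once and passing the same value to replace_session_id for every .ctl file keeps the timestamp identical across all 31 files.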

LOAD DATA query error

What is the problem with this line?
$load ="LOAD DATA INFILE $inputFile INTO TABLE $tableName FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' IGNORE 1 LINES";
echo $load;
mysql_query($load);
The echo output is:
LOAD DATA INFILE appendpb.csv INTO TABLE appendpb_csv FIELDS TERMINATED BY ',' LINES TERMINATED BY ' ' IGNORE 1 LINES
The error is:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'appendpb.csv INTO TABLE appendpb_csv FIELDS TERMINATED BY ',' LINES TERMINATED B' at line 1
According to the MySQL LOAD DATA reference, the input file name must be enclosed in single quotes:
$load ="LOAD DATA INFILE '$inputFile' INTO TABLE $tableName FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' IGNORE 1 LINES";
The resulting query then looks like this:
LOAD DATA INFILE 'appendpb.csv' INTO TABLE appendpb_csv FIELDS TERMINATED BY ',' LINES TERMINATED BY ' ' IGNORE 1 LINES
This assumes that the path to the file is correct.
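To make the quoting rule concrete, here is a small Python sketch that builds the same statement with the file name as a quoted string literal. The function name is hypothetical, the escaping assumes MySQL's default backslash escapes are enabled, and the table name is only checked against a simple pattern because it cannot be quoted as a string.

```python
import re


def build_load_data(input_file, table_name):
    """Return a LOAD DATA statement with the file name as a quoted string literal."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    # Escape backslashes first, then single quotes, so the literal stays well-formed.
    quoted = input_file.replace("\\", "\\\\").replace("'", "\\'")
    return (
        f"LOAD DATA INFILE '{quoted}' INTO TABLE {table_name} "
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\r\\n' IGNORE 1 LINES"
    )


if __name__ == "__main__":
    print(build_load_data("appendpb.csv", "appendpb_csv"))
```

Here the line terminator is sent as the four characters \r\n for MySQL to interpret, whereas the PHP double-quoted string sends a real carriage return and line feed, which is presumably why the echo output above shows what looks like a blank between the quotes.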
