I have a folder with over 400K txt files, with names like:
deID.RESUL_12433287659.txt_234323456.txt
deID.RESUL_34534563649.txt_345353567.txt
deID.RESUL_44235345636.txt_537967875.txt
deID.RESUL_35234663456.txt_423452545.txt
I want to store all the files and their content in the following way:
file_name file_content
deID.RESUL_12433287659.txt_234323456.txt Content 1
deID.RESUL_34534563649.txt_345353567.txt Content 2
deID.RESUL_44235345636.txt_537967875.txt Content 3
deID.RESUL_35234663456.txt_423452545.txt Content 4
I tried creating a control file like this:
LOAD
DATA
INFILE 'deID.RESUL_12433287659.txt_234323456.txt'
INFILE 'deID.RESUL_34534563649.txt_345353567.txt'
INFILE 'deID.RESUL_44235345636.txt_537967875.txt'
INFILE 'deID.RESUL_35234663456.txt_423452545.txt'
APPEND INTO TABLE TBL_DATA
EVALUATE CHECK_CONSTRAINTS
REENABLE DISABLED_CONSTRAINTS
EXCEPTIONS EXCEPTION_TABLE
FIELDS TERMINATED BY ""
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
FILE_NAME
)
Is there a way to pick up the file names dynamically, or to specify a wildcard in INFILE, so I don't have to list 400K files one by one in my control file?
1) Create a table to hold the file names and their contents
create table TBL_DATA(file_name varchar2(4000), file_content clob);
2) Create load_all.ctl
LOAD DATA
INFILE file_list.txt
INSERT INTO TABLE TBL_DATA
APPEND
FIELDS TERMINATED BY ","
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
file_name char(4000)
, file_content LOBFILE(file_name) TERMINATED BY EOF
)
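3) Write the list of source files to file_list.txt. With 400K files, an ls *.txt glob can exceed the shell's argument-length limit and would also list file_list.txt itself, so filter the directory listing instead:
ls -1 | grep '^deID' > file_list.txt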
4) Run sqlldr user/pass@db control=load_all.ctl
5) load_all.ctl, file_list.txt and the source files should all be in the same folder.
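For a directory this large, the file list can also be produced without any shell globbing. The following is a minimal Python sketch, not part of the original answer: the function name write_file_list and the default prefix "deID." are assumptions chosen to match the file names in the question.

```python
import os

def write_file_list(source_dir, list_name="file_list.txt", prefix="deID.", suffix=".txt"):
    """Write one matching file name per line into list_name inside source_dir.

    Returns the number of names written. The prefix filter (an assumption based
    on the question's file names) also keeps file_list.txt itself out of the list.
    """
    list_path = os.path.join(source_dir, list_name)
    count = 0
    with open(list_path, "w", encoding="utf-8", newline="\n") as out:
        # os.scandir yields entries one at a time, so no huge argument list is built
        with os.scandir(source_dir) as entries:
            for entry in entries:
                name = entry.name
                if entry.is_file() and name.startswith(prefix) and name.endswith(suffix):
                    out.write(name + "\n")
                    count += 1
    return count
```

Calling write_file_list(".") from the folder that holds the data files writes file_list.txt there, which is where the control file above expects to find it.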
I am trying to read large CSV files that contain many newline characters inside field values.
This is what the data looks like in the CSV file:
"LastValueInRow",
"FirstValueInNextRow",
I would like to use " + , + NEWLINE + " as the record delimiter, so that the other return characters are not read as the start of new records.
The following code reads most CSV records correctly by using NEWLINE (\n) + " as the delimiter:
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "IMPORT_TEST"
ACCESS PARAMETERS
( RECORDS DELIMITED BY '\n"'
BADFILE SNOW_IMPORT_TEST:'TEST_1.bad'
LOGFILE SNOW_IMPORT_TEST:'TEST_1.log'
SKIP 1
FIELDS TERMINATED BY '","'
MISSING FIELD VALUES ARE NULL
)
LOCATION
( "IMPORT_TEST":'TEST_1.csv'
)
)
Adding any characters before the \n returns no rows at all. Below is what I want, which doesn't work:
( RECORDS DELIMITED BY '",\n"'
Is it possible to use " + , + \n + " as the record delimiter?
Thanks.
After a lot of research I found that the best solution is to replace the line-break characters in the CSV file with a different character using Windows PowerShell, and then update the record delimiter in the external table.
I created the following PowerShell script to replace every carriage return and line feed in the CSV file (where $loc is the directory and $file_name is the file name without its extension):
(Get-content -raw -path $loc\$file_name".csv") -replace '[\r\n]', '|' | Out-File -FilePath $loc\$file_name"_PP.csv" -Force -Encoding ascii -nonewline
Then I updated the external table's access parameters to split records on the new delimiter '",||"'. Each CR and each LF becomes its own |, so a CRLF line ending turns into ||.
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "IMPORT_TEST"
ACCESS PARAMETERS
( RECORDS DELIMITED BY '",||"'
BADFILE SNOW_IMPORT_TEST:'TEST_1_PP.bad'
LOGFILE SNOW_IMPORT_TEST:'TEST_1_PP.log'
SKIP 1
FIELDS TERMINATED BY '","'
MISSING FIELD VALUES ARE NULL
)
LOCATION
( "IMPORT_TEST":'TEST_1_PP.csv'
)
)
Now the external table is reading all the records correctly.
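The same preprocessing step can be sketched in Python for environments without PowerShell. This is only an illustration of the replacement described above: the function names flatten_line_breaks and preprocess_csv are made up here, and the utf-8 default encoding is an assumption (the PowerShell command writes ASCII).

```python
import re

def flatten_line_breaks(text, replacement="|"):
    """Replace every CR and every LF with the replacement character."""
    return re.sub(r"[\r\n]", replacement, text)

def preprocess_csv(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path with all line-break characters replaced."""
    # newline="" stops Python from translating CRLF, so CR and LF are both seen
    with open(src_path, "r", encoding=encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=encoding, newline="") as dst:
        dst.write(flatten_line_breaks(text))
```

A line break inside a field also becomes ||, but it is not preceded by ", so it does not match the '",||"' record delimiter.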
I am trying to export my HQL output to CSV in beeline using the command below:
beeline -u "jdbc:hive2://****/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"?tez.queue.name=devices-jobs --outputformat=csv2 -e "use schema_name; select * from table_name where open_time_new>= '2020-07-13' and open_time_new < '2020-07-22'" > filename.csv
The problem is that some column values in the table contain commas, which pushes part of a column's data into the following columns.
For example:
| abcd | as per data,outage fault,xxxx.
| xyz |as per the source,ghfg,hjhjg.
The data above gets saved as 4 columns instead of 2.
Need help!
Try writing the result to a local directory instead:
insert overwrite local directory '/tmp/local_csv_report'
row format delimited fields terminated by "," escaped by '\\'
select *
from table_name
where open_time_new >= '2020-07-13'
and open_time_new < '2020-07-22'
This creates several CSV files under your local /tmp/local_csv_report directory, with commas inside values escaped by a backslash. A simple cat afterwards merges the results into a single file.
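If the merged file has to be read by tools that expect quoted CSV rather than backslash escapes, the merge step can re-quote the fields on the way. The sketch below is an assumption-laden illustration, not part of the answer: merge_hive_output is a made-up name, the part files are assumed to be UTF-8, and merged_path must lie outside part_dir so it is not read back as input.

```python
import csv
import os

def merge_hive_output(part_dir, merged_path):
    """Merge backslash-escaped part files into one CSV with quoted fields.

    Returns the number of rows written.
    """
    rows_written = 0
    with open(merged_path, "w", encoding="utf-8", newline="") as out:
        writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL)
        for name in sorted(os.listdir(part_dir)):
            path = os.path.join(part_dir, name)
            # skip sub-directories and hidden files (for example checksum files, if any)
            if name.startswith(".") or not os.path.isfile(path):
                continue
            with open(path, "r", encoding="utf-8", newline="") as part:
                # QUOTE_NONE plus escapechar reads "\," as a literal comma
                reader = csv.reader(part, delimiter=",", escapechar="\\",
                                    quoting=csv.QUOTE_NONE)
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For the sample rows in the question, the second column comes out as one quoted field, so the file keeps 2 columns.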
I am using SQL*Loader to load multiple CSV files into one table.
This is the content of my control file:
load data
append
into table SAMP_TABLE
fields terminated by ','
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols
( COLUMN1 CHAR(4000),
COLUMN2 CHAR(4000),
COLUMN3 CHAR(4000)
)
And this is my batch file:
@echo off
for %%F in ("C:\Users\test\*.csv") do (
sqlldr username/pw@dbip CONTROL='C:\Users\test\test2.ctl' LOG='C:\Users\test\TEST.log' "DATA=%%F"
)
pause
All my CSV files, the control file and the batch file are in the same directory.
I have two CSV files with the same columns but different content. The problem
is that only the first CSV file is imported, not the second one, and I don't know why. I would appreciate it if someone could tell me what I am doing wrong.
You just need to specify multiple INFILE clauses:
load data
infile 'data1.csv'
infile 'data2.csv'
...
infile 'datan.csv'
append
into table TABLE1
fields terminated by ','
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols
( COLUMN1 CHAR(4000),
COLUMN2 CHAR(4000),
COLUMN3 CHAR(4000)
)
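If the files are always in the same folder, a quoted wildcard can be used instead. Newer SQL*Loader releases (Oracle Database 12c) accept this; older releases need each file listed:
infile '/path/*.csv'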
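Where the wildcard form is not available, the list of INFILE clauses can be generated rather than typed by hand. The following Python sketch builds the control-file text from whatever CSV files are in a folder. The function name build_control_file is hypothetical, the table and column names are copied from the question, and file paths are assumed not to contain single quotes.

```python
import glob
import os

def build_control_file(csv_dir, table="SAMP_TABLE",
                       columns=("COLUMN1", "COLUMN2", "COLUMN3")):
    """Return control-file text with one INFILE clause per CSV found in csv_dir."""
    csv_paths = sorted(glob.glob(os.path.join(csv_dir, "*.csv")))
    if not csv_paths:
        raise ValueError("no CSV files found in " + csv_dir)
    lines = ["load data"]
    # one INFILE clause per data file, in sorted order
    lines += ["infile '{}'".format(path) for path in csv_paths]
    lines += [
        "append",
        "into table " + table,
        "fields terminated by ','",
        "OPTIONALLY ENCLOSED BY '\"' AND '\"'",
        "trailing nullcols",
        "( " + ",\n  ".join(name + " CHAR(4000)" for name in columns),
        ")",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a .ctl file and passing it to sqlldr with CONTROL= loads all the files in one run.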
Hello, this is the solution to my problem:
@echo off
IF NOT EXIST C:\Users\test\%date:~-10,2%"-"%date:~-7,2%"-"%date:~-4,4% md C:\Users\test\%date:~-10,2%"-"%date:~-7,2%"-"%date:~-4,4%
for %%F in ("C:\Users\test\*.csv") do (
sqlldr dbuser/dbpw@dbip CONTROL='C:\Users\test.ctl' LOG='C:\Users\test\%date:~-10,2%"-"%date:~-7,2%"-"%date:~-4,4%\%date:~-10,2%"-"%date:~-7,2%"-"%date:~-4,4%.log' "DATA=%%F" skip=1
move %%F C:\Users\test\%date:~-10,2%"-"%date:~-7,2%"-"%date:~-4,4%
)
pause
I am loading a delimited file using sqlldr. I keep the file format and table details in the control file and pass the other parameters on the command line:
sqlldr control=sp.ctl data=data.20170502.txt SKIP=1 userid=xyz#db/pwd log=sp.log bad=sp.bad
sp.ctl
LOAD DATA
TRUNCATE
INTO TABLE "T_DATA"
TRUNCATE
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(
C_1 CHAR(2000),
C_2 CHAR(2000),
C_3 CHAR(2000)
)
I now need to use a stream record format on this data file, which is normally specified on the INFILE clause:
infile 'example3.dat' "str '|\n'"
However, I am not using the INFILE syntax, because the data file is passed on the command line.
So I tried:
sqlldr control=sp.ctl data=data.20170502.txt "str '!\n'" SKIP=1
userid=xyz#db/pwd log=sp.log bad=sp.bad
It gives an error:
LRM-00112: multiple values not allowed for parameter 'data'
How do I pass the record delimiter on the command line?
What is the problem with this line?
$load ="LOAD DATA INFILE $inputFile INTO TABLE $tableName FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' IGNORE 1 LINES";
echo $load;
mysql_query($load);
The echo result is:
LOAD DATA INFILE appendpb.csv INTO TABLE appendpb_csv FIELDS TERMINATED BY ',' LINES TERMINATED BY ' ' IGNORE 1 LINES
The error is:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'appendpb.csv INTO TABLE appendpb_csv FIELDS TERMINATED BY ',' LINES TERMINATED B' at line 1
According to the MySQL LOAD DATA reference, the input file name must be enclosed in single quotes:
$load ="LOAD DATA INFILE '$inputFile' INTO TABLE $tableName FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' IGNORE 1 LINES";
The resulting statement then looks like this:
LOAD DATA INFILE 'appendpb.csv' INTO TABLE appendpb_csv FIELDS TERMINATED BY ',' LINES TERMINATED BY ' ' IGNORE 1 LINES
This assumes the path to the file is correct.
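The quoting rule is the same whichever language builds the statement. As a rough illustration, here is a Python sketch of a helper that always wraps the file name in single quotes. The name build_load_statement is made up, and the table name is assumed to come from trusted code, since it is inserted into the statement unchecked.

```python
def build_load_statement(input_file, table_name):
    """Return a LOAD DATA statement with the file name as a quoted SQL string."""
    # Escape backslashes and single quotes so the name stays one string literal
    quoted = input_file.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "LOAD DATA INFILE '" + quoted + "' INTO TABLE " + table_name
        + " FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\r\\n' IGNORE 1 LINES"
    )
```

Here the \r\n reaches MySQL as the two-character escape sequences, which MySQL interprets inside the string literal, so the statement stays on one line when printed.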