What is the problem with this line?
$load ="LOAD DATA INFILE $inputFile INTO TABLE $tableName FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' IGNORE 1 LINES";
echo $load;
mysql_query($load);
The echo result is:
LOAD DATA INFILE appendpb.csv INTO TABLE appendpb_csv FIELDS TERMINATED BY ',' LINES TERMINATED BY ' ' IGNORE 1 LINES
The error is:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'appendpb.csv INTO TABLE appendpb_csv FIELDS TERMINATED BY ',' LINES TERMINATED B' at line 1
According to the MySQL LOAD DATA reference, it should have single quotes around the input file:
$load ="LOAD DATA INFILE '$inputFile' INTO TABLE $tableName FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' IGNORE 1 LINES";
Eventually it looks like this:
LOAD DATA INFILE 'appendpb.csv' INTO TABLE appendpb_csv FIELDS TERMINATED BY ',' LINES TERMINATED BY ' ' IGNORE 1 LINES
Assuming the path of the file is correct.
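As a quick way to test the corrected statement outside PHP, you can run it from the mysql client (a sketch with hypothetical credentials, database name, and absolute path; the MySQL server itself needs read access to the file):
mysql -u user -p mydb -e "LOAD DATA INFILE '/var/data/appendpb.csv' INTO TABLE appendpb_csv FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' IGNORE 1 LINES"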
I am trying to read large CSV files with lots of newline characters in them.
This is how the data looks in the CSV file:
"LastValueInRow",
"FirstValueInNextRow",
I would like to use " + , + NEWLINE + " as the record delimiter, to prevent it from treating every other return character as the start of a new record.
The following external table definition reads most CSV records correctly by using NEWLINE (\n) + " as the delimiter:
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "IMPORT_TEST"
ACCESS PARAMETERS
( RECORDS DELIMITED BY '\n"'
BADFILE SNOW_IMPORT_TEST:'TEST_1.bad'
LOGFILE SNOW_IMPORT_TEST:'TEST_1.log'
SKIP 1
FIELDS TERMINATED BY '","'
MISSING FIELD VALUES ARE NULL
)
LOCATION
( "IMPORT_TEST":'TEST_1.csv'
)
)
Adding any characters before the \n doesn't return any rows. Below is what I want, which doesn't work:
( RECORDS DELIMITED BY '",\n"'
Is it possible to use " + , + \n + " as the record delimiter?
Thanks.
After a lot of research, I have found that the best solution is to replace the return characters in the CSV file with a different character using Windows PowerShell, then update the record delimiter in the external table.
I have created the following PowerShell script to replace all the return characters in the CSV file (where $loc is the directory and $file_name is the file name):
(Get-Content -Raw -Path "$loc\$file_name.csv") -replace '[\r\n]', '|' | Out-File -FilePath "$loc\${file_name}_PP.csv" -Force -Encoding ascii -NoNewline
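With $loc pointing at the directory behind the IMPORT_TEST directory object and $file_name set to TEST_1, this writes TEST_1_PP.csv next to the original. The -replace turns each CR and each LF into |, which is why a Windows CRLF line ending becomes the || in the new delimiter.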
Then I updated the external table parameters to read records based on the new delimiter '",||"':
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "IMPORT_TEST"
ACCESS PARAMETERS
( RECORDS DELIMITED BY '",||"'
BADFILE SNOW_IMPORT_TEST:'TEST_1_PP.bad'
LOGFILE SNOW_IMPORT_TEST:'TEST_1_PP.log'
SKIP 1
FIELDS TERMINATED BY '","'
MISSING FIELD VALUES ARE NULL
)
LOCATION
( "IMPORT_TEST":'TEST_1_PP.csv'
)
)
Now the external table is reading all the records correctly.
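A quick sanity check is to compare the row count against the source file; for example (the external table name here is hypothetical, since the fragment above omits the CREATE TABLE line):
select count(*) from test_1_ext;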
I have a folder with over 400K txt files, with names like:
deID.RESUL_12433287659.txt_234323456.txt
deID.RESUL_34534563649.txt_345353567.txt
deID.RESUL_44235345636.txt_537967875.txt
deID.RESUL_35234663456.txt_423452545.txt
I want to store all the files and their content in the following way:
file_name                                    file_content
deID.RESUL_12433287659.txt_234323456.txt     Content 1
deID.RESUL_34534563649.txt_345353567.txt     Content 2
deID.RESUL_44235345636.txt_537967875.txt     Content 3
deID.RESUL_35234663456.txt_423452545.txt     Content 4
I tried creating a control file like this:
LOAD DATA
INFILE 'deID.RESUL_12433287659.txt_234323456.txt'
INFILE 'deID.RESUL_34534563649.txt_345353567.txt'
INFILE 'deID.RESUL_44235345636.txt_537967875.txt'
INFILE 'deID.RESUL_35234663456.txt_423452545.txt'
APPEND INTO TABLE TBL_DATA
EVALUATE CHECK_CONSTRAINTS
REENABLE DISABLED_CONSTRAINTS
EXCEPTIONS EXCEPTION_TABLE
FIELDS TERMINATED BY ""
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
FILE_NAME
)
Is there a way I can grab the file names dynamically and specify a wildcard in the INFILE, so I don't have to list 400K files one by one in my control file?
1) Create a table to hold the file names and contents:
create table TBL_DATA(file_name varchar2(4000), file_content clob);
2) Create load_all.ctl:
LOAD DATA
INFILE file_list.txt
APPEND INTO TABLE TBL_DATA
FIELDS TERMINATED BY ","
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
file_name char(4000)
, file_content LOBFILE(file_name) TERMINATED BY EOF
)
3) Redirect the list of files into file_list.txt:
ls -1 *.txt > file_list.txt
4) Run sqlldr user/pass@db control=load_all.ctl
5) load_all.ctl, file_list.txt, and the source files should be in the same folder.
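With the file names from the question, file_list.txt then contains one name per line:
deID.RESUL_12433287659.txt_234323456.txt
deID.RESUL_34534563649.txt_345353567.txt
deID.RESUL_44235345636.txt_537967875.txt
deID.RESUL_35234663456.txt_423452545.txt
Each line supplies file_name, and the LOBFILE clause then reads that file's full contents into file_content.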
I am loading a delimited file using sqlldr. I keep the file format and table details in the ctl file and pass the other parameters on the command line:
sqlldr control=sp.ctl data=data.20170502.txt SKIP=1 userid=xyz/pwd@db log=sp.log bad=sp.bad
sp.ctl
LOAD DATA
TRUNCATE
INTO TABLE "T_DATA"
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(
C_1 CHAR(2000),
C_2 CHAR(2000),
C_3 CHAR(2000)
)
I now need to use a stream record format on this data file.
infile 'example3.dat' "str '|\n'"
However, I am not using the infile syntax.
So I tried using:
sqlldr control=sp.ctl data=data.20170502.txt "str '!\n'" SKIP=1 userid=xyz/pwd@db log=sp.log bad=sp.bad
It gives an error:
LRM-00112: multiple values not allowed for parameter 'data'
How do I pass the record delimiter on the command line?
Currently in my code, the line below is used to fix newline breaks in a CSV:
gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' MY_FILE.csv > MY_FILE.csv.tmp
I want to add a pre-check: only if a newline break is present inside the file should the script run the above command to fix it. How do I add such a pre-check?
My CSV file looks as below and has 1 million records in it:
20160711,"M","N1","F","S","A","good data with.....some special character and space (new line)
space ..
....","M","072","00126"
20160711,"M","N1","F","S","A","R","M","072","00126"
20160711,"M","N1","F","S","A","R","M","072","00126"
A newline can appear anywhere in the file.
@sabya Perhaps count the double quotes on a line? If odd, then there is a return somewhere:
gawk '{ if (and(1, gsub(/"/, "\""))) { HasReturn = 1; exit } } END { exit HasReturn }' MY_FILE.csv
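A minimal way to wire the check into the shell (a sketch; exit status 1 from the check means a line with an odd number of quotes, i.e. an embedded newline, was found, so only then does the fix run):
if ! gawk '{ if (and(1, gsub(/"/, "\""))) { HasReturn = 1; exit } } END { exit HasReturn }' MY_FILE.csv; then
    gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' MY_FILE.csv > MY_FILE.csv.tmp
fi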
I would respectfully suggest you load the data as given and not alter it, in order to maintain data integrity: construct the control file so it preserves the newlines between the double quotes.
Construct the control file like this, using the "str" clause on the infile option line to set the end-of-record character. It tells sqlldr that hex 0D (carriage return, or ^M) is the record separator, so it will ignore the linefeeds inside the double quotes:
LOAD DATA
infile "test.dat" "str x'0D'"
TRUNCATE
INTO TABLE test
fields terminated by ","
optionally enclosed by '"'
(
cola char,
colb char,
colc char
)
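A matching sqlldr invocation would look something like this (hypothetical credentials and file names, following the same command-line pattern used above):
sqlldr userid=user/pass@db control=test.ctl log=test.log bad=test.bad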
More info in this post: https://stackoverflow.com/a/37216660/2543416
Can anyone tell me why I am getting an error while creating a partitioned table from the bash shell?
[cloudera#localhost ~]$ hive -e "create table peoplecountry (
name1 string,
name2 string,
salary int,
country string
)
partitioned by (country string)
row format delimited
column terminated by '\n'";
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.7.0.jar!/hive-log4j.properties
Hive history file=/tmp/cloudera/hive_job_log_0fdf7083-8ab4-499f-8048-a85f162d1357_376056456.txt
FAILED: ParseException line 8:0 missing EOF at 'column' near 'delimited'
If you meant a newline at the end of each row of your data, then you need to use:
lines terminated by '\n'
instead of column terminated by.
In case you meant each column in the row to be separated by a delimiter, then specify:
fields terminated by '\n'
Refer to: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
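Putting it together, the statement from the question could be corrected like this (a sketch: the comma field delimiter is an assumption, and country is dropped from the column list because Hive also rejects a partition column that repeats a regular column):
hive -e "create table peoplecountry (
name1 string,
name2 string,
salary int
)
partitioned by (country string)
row format delimited
fields terminated by ','
lines terminated by '\n'";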