I use sqlldr to import CSV files, and I have a problem with the date column. Dates inside the CSV file are in DD/MM/YYYY format, and if there is no date the field contains a single dot.
CSV file:
DATE_COLUMN;OTHER_COLUMN
01/01/2013;other column content 1
.;other column content 2
My .ctl file for sqlldr
LOAD DATA
INFILE '/path/to/my/file.csv'
REPLACE INTO TABLE table_to_fill
FIELDS TERMINATED BY ';'
(
COLUMNDATE "decode(:COLUMNDATE ,NULL,'.', to_date(:COLUMNDATE ,'DD/MM/YYYY'))",
OTHER_COLUMN
)
The import works when I use:
decode(:COLUMNDATE ,NULL,'.')
or
to_date(:COLUMNDATE ,'DD/MM/YYYY')
but not when I try to combine both...
Here is the error log:
Record 1: Rejected - Error on table table_to_fill, column COLUMNDATE.
ORA-01858: a non-numeric character was found where a numeric was expected
How can I combine these, please?
I thought the last parameter of the DECODE function was the default value for the column; am I wrong?
SQL*Loader's "regular" syntax should be enough here. Note that the field is never NULL when it contains a dot, so the NULL branch of your DECODE never matches and TO_DATE('.', 'DD/MM/YYYY') is what raises ORA-01858. Try this:
LOAD DATA
INFILE '/path/to/my/file.csv'
REPLACE INTO TABLE table_to_fill
FIELDS TERMINATED BY ';'
(
COLUMNDATE DATE(10) "DD/MM/YYYY" NULLIF COLUMNDATE = '.',
OTHER_COLUMN
)
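If you would rather keep a SQL expression, turning the DECODE around so the dot maps to NULL before the date conversion should also work; a sketch of just that column spec (same DD/MM/YYYY assumption, untested):
COLUMNDATE "to_date(decode(:COLUMNDATE, '.', NULL, :COLUMNDATE), 'DD/MM/YYYY')",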
I'm loading data into my table through SQL*Loader.
The data load is successful, but I'm getting a garbage (repetitive) value in a particular column for all rows.
After inserting:
the column TERM_AGREEMENT gets the value '806158336' for every record.
My CSV file contains at most 3-digit data for that column, but I'm forced to keep my column definition at NUMBER(10).
LOAD DATA
infile '/ipoapplication/utl_file/LBR_HE_Mar16.csv'
REPLACE
INTO TABLE LOAN_BALANCE_MASTER_INT
fields terminated by ',' optionally enclosed by '"'
(
ACCOUNT_NO,
CUSTOMER_NAME,
LIMIT,
REGION,
TERM_AGREEMENT INTEGER
)
create table LOAN_BALANCE_MASTER_INT
(
ACCOUNT_NO NUMBER(30),
CUSTOMER_NAME VARCHAR2(70),
LIMIT NUMBER(30),
PRODUCT_DESC VARCHAR2(30),
SUBPRODUCT_CODE NUMBER,
ARREARS_INT NUMBER(20,2),
IRREGULARITY NUMBER(20,2),
PRINCIPLE_IRREGULARITY NUMBER(20,2),
TERM_AGREEMENT NUMBER(10)
)
INTEGER is a binary data type. If you're importing a CSV file, the numbers are stored as plain text, so you should use INTEGER EXTERNAL. The EXTERNAL clause specifies character data that represents a number.
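Applied to your control file, only the last field spec needs to change; a sketch of that one line:
TERM_AGREEMENT INTEGER EXTERNAL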
Edit:
The issue seems to be the termination character of the file. You should be able to solve it by editing the INFILE line this way:
INFILE '/ipoapplication/utl_file/LBR_HE_Mar16.csv' "STR X'5E204D'"
Where '5E204D' is the hexadecimal for '^ M'. To get the hexadecimal value you can use the following query:
SELECT utl_raw.cast_to_raw ('^ M') AS hexadecimal FROM dual;
Hope this helps.
I actually solved this issue on my own.
Firstly, thanks to #Gary_W and #Alessandro for their input. I really appreciate your help, guys; I learned some new things in the process.
Here's the new fragment that worked, and I got the correct data for the last column:
LOAD DATA
infile '/ipoapplication/utl_file/LBR_HE_Mar16.csv'
REPLACE
INTO TABLE LOAN_BALANCE_MASTER_INT
fields terminated by ',' optionally enclosed by '"'
(
ACCOUNT_NO,
CUSTOMER_NAME,
LIMIT,
REGION,
TERM_AGREEMENT INTEGER TERMINATED BY WHITESPACE
)
'TERMINATED BY WHITESPACE': I went through some SQL*Loader threads and used 'TERMINATED BY WHITESPACE' on the last column of the ctl file. It worked, and this time I didn't even have to use 'INTEGER EXTERNAL' or an EXPRESSION '..' for the conversion.
Just one thing: can you guys now let me know what could possibly have been causing the issue? What was in that column of my CSV file, and how did adding this solve it?
Thanks.
My table data contains newline characters; it is loaded via a SQL*Loader ctl file, and one column, 'IPADDRESS', is being loaded with a trailing newline:
My ctl file:
load data
INFILE 'abc.txt'
INTO TABLE TABLENAME
APPEND
FIELDS TERMINATED BY '\|'
(MAKE,
CUST_ID "UPPER(:CUST_ID)",
IPADDRESS "REGEXP_REPLACE(:IPADDRESS, '\\.\\D+', '', 1, 0)"
)
Data as stored in the table, e.g.:
Make CUST_ID IPADDRESS
------------------------------
C MPG-VG-ALG01 "9.7.69.37
"
C MPG-VG-ALG03 "9.7.69.39
"
Sample input file data:
C|mpg-vg-alg01.gdl.mex.ibm.com|9.7.69.37
C|mpg-vg-alg03.gdl.mex.ibm.com|9.7.69.39
C|mpg-vg-alg04.gdl.mex.ibm.com|9.7.69.23
The answer to my question is: column_name "REPLACE(:column_name, CHR(13), '')"
Yes, one option is using the REPLACE() function, but more needs to be added:
add CHAR(data_length) for string data of any type, even if the column is of type VARCHAR2
add CHR(10) (line feed) along with CHR(13) (carriage return)
don't forget to nest a TRIM() function within the REPLACE() to guard against extra whitespace issues
the third argument of REPLACE() is redundant here (omitting it simply removes the matched characters)
such as:
column_name CHAR(4000) "REPLACE(TRIM(:column_name),CHR(13)||CHR(10))"'
moreover
column_name CHAR(4000) "TRANSLATE(TRIM(:column_name),CHR(13)||CHR(10),' ')"'
might be used as an alternative.
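Applied to the control file from the question, the affected column would then read something like this (a sketch; CHAR(4000) is just a generous maximum, and the original REGEXP_REPLACE can be nested back in if the domain-stripping is still needed):
IPADDRESS CHAR(4000) "REPLACE(TRIM(:IPADDRESS), CHR(13)||CHR(10))"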
I am trying to load data from a CSV file in which the values are enclosed in double quotes ('"') and separated by tabs ('\t').
When I try to load it into Hive, no error is thrown and the data loads, but all the data seems to end up in a single column and most of the values show as NULL.
Below is my create table statement.
CREATE TABLE example
(
organization STRING,
`order` BIGINT,  -- order is a reserved word in Hive, hence the backticks
created_on TIMESTAMP,
issue_date TIMESTAMP,
qty INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
ESCAPED BY '"'
STORED AS TEXTFILE;
Input file sample:
"Organization" "Order" "Created on" "issue_date" "qty"
"GB" "111223" "2015/02/06 00:00:00" "2015/05/15 00:00:00" "5"
"UK" "1110" "2015/05/06 00:00:00" "2015/06/1 00:00:00" "51"
and the load statement to push data into the Hive table:
LOAD DATA INPATH '/user/example.csv' OVERWRITE INTO TABLE example
What could be the issue, and how can I ignore the header of the file?
If I remove ESCAPED BY '"' from the create statement, the data loads into the respective columns, but all the values are enclosed in double quotes.
How can I remove the double quotes from the values and ignore the header of the file?
You can now use OpenCSVSerde, which allows you to define the separator character and easily escape surrounding double quotes:
CREATE EXTERNAL TABLE example (
organization STRING,
`order` BIGINT,
created_on TIMESTAMP,
issue_date TIMESTAMP,
qty INT
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\t",
"quoteChar" = "\""
)
LOCATION '/your/folder/location/';
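Note that OpenCSVSerde treats every column as STRING regardless of the declared type, so the BIGINT and TIMESTAMP columns above will come back as strings unless you cast them in your queries. To also skip the header row, a table property can be appended after LOCATION (skip.header.line.count requires Hive 0.13 or later):
TBLPROPERTIES ("skip.header.line.count" = "1");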
You don't want to use ESCAPED BY; that's for escape characters, not quote characters. I don't think Hive actually has support for quote characters. You might want to take a look at this CSV SerDe, which accepts a quotechar property.
Also, if you have Hue, you can use the Metastore Manager web app to load the CSV in; it will deal with the header row, column datatypes and so on.
Use a CSV SerDe to create the table. I've created a table in Hive as follows, and it works like a charm.
CREATE EXTERNAL TABLE IF NOT EXISTS myTable (
id STRING,
url STRING,
name STRING
)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties ("separatorChar" = "\t")
LOCATION '<folder location>';
"Hive now includes an OpenCSVSerde which will properly parse those quoted fields without adding additional jars or error prone and slow regex."
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
Source: Ben Doerr, "How to handle fields enclosed within quotes (CSV) in importing data from S3 into DynamoDB using EMR/Hive"
You can use a CSV SerDe (csv-serde-1.1.2.jar) to load the file without double quotes.
download link:
http://ogrodnek.github.io/csv-serde/
and the create table statement as:
CREATE TABLE <table_name> (col_name_1 type1, col_name_2 type2, ...) row format serde 'com.bizo.hive.serde.csv.CSVSerde';
You can remove the header with the following property in the create table statement:
tblproperties ("skip.header.line.count"="1");
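Putting the SerDe and the header property together, a sketch of a complete create statement (employee_csv and its columns are just placeholder names):
CREATE TABLE employee_csv (id STRING, name STRING, addr STRING)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count" = "1");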
Need help here.
This is related to Hive.
I have a text file with a single long line, e.g.:
JASON 29\SASHA 24\CHRISTINE 15\ROBERT 20\
Now I need to create a table in Hive whose rows are delimited by "\" (backslash): if I insert the data from the line above, "JASON 29\SASHA 24....", I want 4 rows to be inserted into my table.
In other words, I want my custom character to be the row delimiter, not the default "\n".
I wrote the DDL:
CREATE TABLE newline_tab
(
name STRING,
age INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\\'
STORED AS TEXTFILE;
but I am unable to create the table; I'm getting the following error:
FAILED: SemanticException 9:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''\''
Any help would be appreciated :)
CREATE TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
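Since Hive only supports '\n' as the line terminator, one workaround that stays in HiveQL is to load the whole file into a single-column staging table and split it at query time. A sketch, assuming the space-separated name/age layout from the example (raw_line is a hypothetical staging table, and the backslash is double-escaped because split() takes a regular expression):
CREATE TABLE raw_line (line STRING);
-- after loading the one-line file into raw_line:
SELECT split(pair, ' ')[0] AS name,
       CAST(split(pair, ' ')[1] AS INT) AS age
FROM raw_line
LATERAL VIEW explode(split(line, '\\\\')) t AS pair
WHERE pair <> '';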
Here is my case.
Input lines:
"vijay" <\t> "a-b-c","a-c-d","a-d-c"
"kumar" <\t> "a-b-c","b-c-d""
I created the table like this:
hive >create table user_infos(name string, path ARRAY<String> --i need array only)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS
TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE ;
Output received:
hive> select * from user_infos;
"vijay" ["\"a-b-c\"","\"a-c-d\"","\"a-d-c\""]
"kumar" ["\"a-b-c\"","\"b-c-d\""]
The problem here is: I don't want the double quotes, i.e. the \" around each element.
Required output:
vijay ["a-b-c","a-c-d","a-d-c"]
kumar ["a-b-c","b-c-d"]
Is there any way to achieve this without using a custom SerDe? Anything like ENCLOSED BY in MySQL?
I was also stuck on the same issue, as my fields are enclosed in double quotes and separated by semicolons (;). My table name is employee1.
So I searched through some links and found the perfect solution for this.
#ramisetty.vijay: yes, we have to use a SerDe for this. Please download the SerDe jar from this link: https://github.com/downloads/IllyaYalovyy/csv-serde/csv-serde-0.9.1.jar
Then follow the steps below at the hive prompt:
add jar path/to/csv-serde.jar;
create table employee1(id string, name string, addr string)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties(
"separatorChar" = "\;",
"quoteChar" = "\"")
stored as textfile
;
and then load the data from your given path using the query below:
load data local inpath 'path/xyz.csv' into table employee1;
and then run :
select * from employee1;
Thanks.
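As an alternative that keeps the original ARRAY<string> column from the question and avoids a SerDe entirely, the stray quotes could be stripped at query time. A sketch (collect_list requires Hive 0.13 or later; regexp_replace simply deletes every double-quote character):
SELECT regexp_replace(name, '"', '') AS name,
       collect_list(regexp_replace(item, '"', '')) AS path
FROM user_infos
LATERAL VIEW explode(path) t AS item
GROUP BY regexp_replace(name, '"', '');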