I'm trying to load data from one data file into different tables. I have read a lot about field declarations and delimiters (POSITION(n:n), TERMINATED BY), but I'm not sure how to do what I need. Let me explain with an example.
I have two tables (person, phone):
person_table(person_id_pk, person_name) and phone_table(person_id_fk, phone)
I have a datafile with:
$ datafile.txt
1,jack pierson,+13526985442
2,Katherine McLaren,+15264586548
My question is: when I write my control file (configfile.ctl), how do I specify that field number 3 (the phone field) should be inserted or appended into phone_table, while the other two fields (person_id, person_name) go into person_table?
The fields are not fixed length, so my only reference is the position of each field in the data file.
I was thinking of trying something like this:
$configfile.ctl
LOAD DATA
INFILE datafile.txt
APPEND
INTO TABLE person_table
(
person_id_pk POSITION (*) INTEGER EXTERNAL TERMINATED BY "," ,
person_name POSITION(*+1) CHAR(30) TERMINATED BY ","
)
INTO TABLE phone_table
(
person_id_fk POSITION (*) INTEGER EXTERNAL TERMINATED BY ",",
phone -- <- this is my question: how do I tell SQL*Loader that this must be field number 3 of the data file?
)
I hope you get my point. It is a huge issue for me because I'm dealing with CSV files that contain 60, 80, even 100 fields (columns from an Excel file), and each field or group of fields may belong to a different table.
I would really appreciate any guidance or help you can give me. My example and control file declarations are probably wrong, since I haven't implemented anything yet, so I'm open to any suggestions.
Your control file should look like this. The second INTO TABLE clause uses POSITION(1) to move the logical "pointer" back to the start of the current record so it can be read again. The name is then skipped by declaring it as a FILLER field.
LOAD DATA
INFILE datafile.txt
APPEND
INTO TABLE person_table
FIELDS TERMINATED BY "," TRAILING NULLCOLS
(
person_id_pk INTEGER EXTERNAL,
person_name CHAR(30)
)
INTO TABLE phone_table
FIELDS TERMINATED BY "," TRAILING NULLCOLS
(
person_id_fk POSITION(1) INTEGER EXTERNAL,
x_name FILLER,
phone CHAR(12)
)
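The question mentions files with 60 to 100 fields spread over several tables, which makes writing these clauses by hand tedious. The Python sketch below generates a control file that follows the same pattern as this answer. Every INTO TABLE clause starts with POSITION(1), and unused fields up to the last one needed are declared as FILLER. The function build_control_file, the field_map layout and the skip_N filler names are hypothetical. Every loaded field gets the default datatype, so add INTEGER EXTERNAL, CHAR(30) and so on where you need them.

```python
def build_control_file(infile, field_map, delimiter=","):
    """Return SQL*Loader control file text for a multi-table load.

    field_map maps each table name to {field_position: column_name},
    where field_position is the 1-based position of the field in the record.
    """
    lines = ["LOAD DATA", f"INFILE {infile}", "APPEND"]
    for table, columns in field_map.items():
        lines += [f"INTO TABLE {table}",
                  f'FIELDS TERMINATED BY "{delimiter}" TRAILING NULLCOLS',
                  "("]
        specs = []
        # Fields after the last one this table needs are simply not declared.
        for pos in range(1, max(columns) + 1):
            if pos in columns:
                spec = columns[pos]
            else:
                # Unused fields must still be scanned, so skip them as FILLER.
                spec = f"skip_{pos} FILLER"
            if pos == 1:
                # Rewind to the start of the record for every INTO TABLE clause.
                spec += " POSITION(1)"
            specs.append(spec)
        lines.append(",\n".join(specs))
        lines.append(")")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_control_file("datafile.txt", {
        "person_table": {1: "person_id_pk", 2: "person_name"},
        "phone_table": {1: "person_id_fk", 3: "phone"},
    }))
```

For the person/phone example, this prints the same structure as the control file above, minus the datatypes.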
I'm loading data into my table through SQL*Loader.
The load succeeds, but I'm getting a garbage (repeated) value in one particular column for all rows.
After the load:
column TERM_AGREEMENT gets the value '806158336' for every record.
My CSV file contains at most 3-digit values for that column, but I'm forced to define the column as NUMBER(10).
LOAD DATA
infile '/ipoapplication/utl_file/LBR_HE_Mar16.csv'
REPLACE
INTO TABLE LOAN_BALANCE_MASTER_INT
fields terminated by ',' optionally enclosed by '"'
(
ACCOUNT_NO,
CUSTOMER_NAME,
LIMIT,
REGION,
TERM_AGREEMENT INTEGER
)
create table LOAN_BALANCE_MASTER_INT
(
ACCOUNT_NO NUMBER(30),
CUSTOMER_NAME VARCHAR2(70),
LIMIT NUMBER(30),
PRODUCT_DESC VARCHAR2(30),
SUBPRODUCT_CODE NUMBER,
ARREARS_INT NUMBER(20,2),
IRREGULARITY NUMBER(20,2),
PRINCIPLE_IRREGULARITY NUMBER(20,2),
TERM_AGREEMENT NUMBER(10)
)
INTEGER is a binary datatype. If you're importing a CSV file, the numbers are presumably stored as plain text, so you should use INTEGER EXTERNAL. The EXTERNAL keyword means the field holds character data that represents a number.
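The Python sketch below illustrates the difference using the standard struct module. It reads bytes two ways: as text digits, which is what INTEGER EXTERNAL expects, and as a 4-byte binary integer, which is roughly what INTEGER does. The byte values are assumptions chosen for illustration. As it happens, the bytes '0', carriage return, 0x00, 0x00 read as a big-endian 32-bit integer equal 806158336, the value reported in the question. Whether that is what actually happened depends on the platform's byte order and the file contents, which we can't see.

```python
import struct


def as_integer_external(raw: bytes) -> int:
    """INTEGER EXTERNAL: the bytes are digit characters, parsed as text."""
    return int(raw.decode("ascii").strip())


def as_binary_integer(raw: bytes, byteorder: str = ">") -> int:
    """INTEGER: the first 4 bytes taken as a signed binary integer.

    byteorder is a struct prefix: ">" big-endian, "<" little-endian.
    """
    (value,) = struct.unpack(byteorder + "i", raw[:4])
    return value


if __name__ == "__main__":
    # Hypothetical field content: the digit '0' followed by a carriage return.
    print(as_integer_external(b"0\r"))             # 0
    # The same two bytes plus two zero bytes, read as a binary integer.
    print(as_binary_integer(b"0\r\x00\x00"))       # 806158336
    print(as_binary_integer(b"0\r\x00\x00", "<"))  # 3376
```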
Edit:
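The issue seems to be the record terminator used in the file. You should be able to solve it by editing the INFILE line like this:
INFILE '/ipoapplication/utl_file/LBR_HE_Mar16.csv' "STR X'0D0A'"
Here X'0D0A' is a carriage return followed by a line feed, which is the Windows line ending. The ^M that editors such as vi display at the end of each line is that carriage return. It is a single control character with hexadecimal value 0D, not the three characters '^', ' ' and 'M' (which would be 5E204D). To check the hexadecimal value, you can use the following query:
SELECT utl_raw.cast_to_raw(CHR(13)) AS hexadecimal FROM dual;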
Hope this helps.
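Suppose the file was saved with Windows (CRLF) line endings and records are split on the line feed alone. Then the carriage return (hex 0D) that precedes each line feed stays attached to the last field. The Python sketch below shows the effect on hypothetical data modelled on the question's columns. split_records is a simplified, hypothetical stand-in for record splitting, not SQL*Loader itself.

```python
def split_records(data: bytes, terminator: bytes = b"\n", delimiter: bytes = b","):
    """Split raw file content into records, then into fields (no quote handling)."""
    return [record.split(delimiter) for record in data.split(terminator) if record]


# Hypothetical file content saved with Windows (CRLF) line endings.
DATA = b"1001,ACME,5000,NORTH,12\r\n1002,BETA,7000,SOUTH,7\r\n"

if __name__ == "__main__":
    lf_only = split_records(DATA)        # records end at LF only
    crlf = split_records(DATA, b"\r\n")  # records end at CR LF, like "STR X'0D0A'"
    print(lf_only[0][-1])                # b'12\r' (stray carriage return)
    print(crlf[0][-1])                   # b'12'
    # Stopping at the first whitespace byte, roughly what TERMINATED BY WHITESPACE does:
    print(lf_only[0][-1].split()[0])     # b'12'
```

This would also explain why TERMINATED BY WHITESPACE on the last column helped: the field then ends before the carriage return.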
I actually solved this issue on my own.
First, thanks to @Gary_W and @Alessandro for their input. I really appreciate your help, guys; I learned some new things in the process.
Here's the new fragment, which worked. I got the correct data for the last column:
LOAD DATA
infile '/ipoapplication/utl_file/LBR_HE_Mar16.csv'
REPLACE
INTO TABLE LOAN_BALANCE_MASTER_INT
fields terminated by ',' optionally enclosed by '"'
(
ACCOUNT_NO,
CUSTOMER_NAME,
LIMIT,
REGION,
TERM_AGREEMENT INTEGER TERMINATED BY WHITESPACE
)
'TERMINATED BY WHITESPACE': I went through some SQL*Loader threads and saw 'terminated by whitespace' used on the last column of someone's control file. It worked, and this time I didn't even have to use INTEGER, EXTERNAL or an EXPRESSION '..' for conversion.
One more thing: can you tell me what could have been causing the issue? What was in that column of my CSV file, and why did adding this clause solve it?
Thanks.
I want a single control file to handle this case. The scenario is that I have to insert records into different tables. For example, when (1:3) is HEA, the record must be appended into table header. When (1:3) is DTL, it must replace the data in table Detail. Is that possible?
I have a situation where data from one file goes to three tables depending on the first field of each record. Each WHEN clause tests the first field and routes the record accordingly. Notice that once a WHEN condition is met, the first field is skipped by declaring it a FILLER. To answer your question, I believe you can put APPEND or REPLACE after each INTO TABLE clause. Give it a try and let us know.
OPTIONS (DIRECT=TRUE)
UNRECOVERABLE
LOAD DATA
APPEND
INTO TABLE TABLE_A
WHEN (01) = 'CLM'
FIELDS TERMINATED BY '|' TRAILING NULLCOLS
( rec_skip filler POSITION(1)
,CLM_CLAIM_ID CHAR NULLIF(CLM_CLAIM_ID=BLANKS)
...
)
INTO TABLE TABLE_B
WHEN (01) = 'SLN'
FIELDS TERMINATED BY '|' TRAILING NULLCOLS
( rec_skip filler POSITION(1)
,SL_CLAIM_ID CHAR NULLIF(SL_CLAIM_ID=BLANKS)
...
)
INTO TABLE TABLE_C
WHEN (01) = 'COB'
FIELDS TERMINATED BY '|' TRAILING NULLCOLS
( rec_skip filler POSITION(1)
,COB_CLAIM
...
)
More info: http://docs.oracle.com/cd/B28359_01/server.111/b28319/ldr_control_file.htm#i1005657
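The routing done by the WHEN clauses can be modelled in a few lines of Python. This is a simplified simulation for illustration, not SQL*Loader itself. The table names and record-type codes come from the control file above. route_records and the sample records are hypothetical, because the real field lists are elided.

```python
def route_records(lines, routes, delimiter="|"):
    """Send each record to the table mapped to its first field.

    routes maps a record-type code (e.g. 'CLM') to a table name. The code
    itself is dropped, like the rec_skip FILLER field; records whose code
    matches no route are collected separately, like discarded records.
    """
    loaded = {table: [] for table in routes.values()}
    discarded = []
    for line in lines:
        code, *rest = line.split(delimiter)
        table = routes.get(code)
        if table is None:
            discarded.append(line)
        else:
            loaded[table].append(rest)
    return loaded, discarded


ROUTES = {"CLM": "TABLE_A", "SLN": "TABLE_B", "COB": "TABLE_C"}
# Hypothetical sample records.
SAMPLE = ["CLM|C001|100", "SLN|C001|1", "COB|C001|X", "ZZZ|C002|0"]

if __name__ == "__main__":
    loaded, discarded = route_records(SAMPLE, ROUTES)
    print(loaded)
    print(discarded)  # ['ZZZ|C002|0']
```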
I was trying to load records from a file into an Oracle table based on conditions. Since the OR operator and WHEN ... IN do not work in SQL*Loader, I tried multiple INTO TABLE clauses for the same table. However, only the records matching the first condition were loaded; the records matching the second condition were not. My control file looks like this:
Options (BINDSIZE = 7340032)
Load Data
APPEND
INTO TABLE TEMP_GLOBAL_ONE_FEE_REBATE WHEN ACT_TYPE = 'SR'
FIELDS TERMINATED BY '|' TRAILING NULLCOLS
(
RPT_YEAR,
RPT_MONTH,
........
........
)
INTO TABLE TEMP_GLOBAL_ONE_FEE_REBATE WHEN ACT_TYPE = 'SL'
FIELDS TERMINATED BY '|' TRAILING NULLCOLS
(
RPT_YEAR,
RPT_MONTH,
........
........
)
As mentioned, only the records with act_type = 'SR' were loaded; the records with act_type = 'SL' were not.
Any idea how to proceed? Thank you.
Your problem is that, for each record, the first INTO TABLE clause reads the fields from beginning to end. The second INTO TABLE clause then picks up where the first one finished, which in your case is the end of the record.
To achieve what you are trying to do, you will have to use two separate SQL*Loader runs. See this AskTom post for reference:
https://asktom.oracle.com/pls/apex/f?p=100:11:::YES:RP:P11_QUESTION_ID:3181887000346205200
A more elegant solution would be to read the data from the file with a PL/SQL procedure and the UTL_FILE package. That is only worth the trouble if the import happens regularly rather than as a one-time job.
You should use POSITION(1) in the first column of each field list:
To force record scanning to start in a specific location, you use the POSITION parameter.
Control file
Options (BINDSIZE = 7340032)
Load Data
APPEND
INTO TABLE TEMP_GLOBAL_ONE_FEE_REBATE WHEN ACT_TYPE = 'SR'
FIELDS TERMINATED BY '|' TRAILING NULLCOLS
(
RPT_YEAR POSITION(1),
RPT_MONTH,
........
........
)
INTO TABLE TEMP_GLOBAL_ONE_FEE_REBATE WHEN ACT_TYPE = 'SL'
FIELDS TERMINATED BY '|' TRAILING NULLCOLS
(
RPT_YEAR POSITION(1),
RPT_MONTH,
........
........
)
Sample data
2015|01|SL
2015|02|SL
2015|03|SL
2015|03|SR
2015|04|SR
2015|04|XX
This will load 2 rows with 'SR', 3 rows with 'SL', and discard one row.
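The effect of POSITION(1) can be shown with a simplified Python model of the per-record field pointer. This is an illustrative sketch, not how SQL*Loader is implemented. Because the rest of the field list is elided, it assumes each INTO TABLE clause reads three fields: RPT_YEAR, RPT_MONTH and ACT_TYPE. Run on the sample data, the model reproduces the counts above. Without POSITION(1), it reproduces the symptom from the question.

```python
def simulate_load(lines, when_values, n_fields=3, position_1=True, delimiter="|"):
    """Count rows per WHEN value under a simplified per-record pointer model.

    Each INTO TABLE clause reads n_fields fields, with ACT_TYPE assumed to be
    the third. With position_1=True every clause starts at the first field,
    as POSITION(1) does; otherwise it continues after the fields read by the
    previous clause. Missing fields become None, as with TRAILING NULLCOLS.
    """
    counts = {value: 0 for value in when_values}
    discarded = 0
    for line in lines:
        fields = line.split(delimiter)
        pointer = 0
        matched = False
        for value in when_values:  # one INTO TABLE clause per WHEN value
            start = 0 if position_1 else pointer
            row = fields[start:start + n_fields]
            row += [None] * (n_fields - len(row))
            pointer = start + n_fields  # the pointer moves even if WHEN fails
            if row[2] == value:  # WHEN ACT_TYPE = value
                counts[value] += 1
                matched = True
        if not matched:
            discarded += 1
    return counts, discarded


SAMPLE = ["2015|01|SL", "2015|02|SL", "2015|03|SL",
          "2015|03|SR", "2015|04|SR", "2015|04|XX"]

if __name__ == "__main__":
    print(simulate_load(SAMPLE, ["SR", "SL"]))                    # ({'SR': 2, 'SL': 3}, 1)
    print(simulate_load(SAMPLE, ["SR", "SL"], position_1=False))  # ({'SR': 2, 'SL': 0}, 4)
```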
References
SQL*Loader with multiple WHENs is rejecting all rows (the AskTom question mentioned in the accepted answer)
Distinguishing Different Input Record Formats in SQL*Loader Control File
Loading Data into Multiple Tables in SQL*Loader Control File Reference
I have my data in this format:
"123";"mybook1";"2002";"publisher1";
"456";"mybook2;the best seller";"2004";"publisher2";
"789";"mybook3";"2002";"publisher1";
The fields are enclosed in double quotes and delimited by ';'.
The book name may also contain ';'.
Can you tell me how to load this data from the file into a Hive table?
The statement below, which I am using now, obviously does not work:
create table books (isbn string,title string,year string,publisher string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;'
If possible, I want the ID (isbn) and year fields to be stored as INT.
Please help
Thanks,
Harish
What you are missing is RegexSerDe. It is very helpful for extracting only part of the text from the input. Your DDL would look like this:
create table books ( isbn string, title string, year string, publisher string )
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(?:\")(\\d*)(?:\"\;\")([^\"]*)(?:\"\;\")(\\d*)(?:\"\;\")([^\"]*)\"(?:\;)" ,
"output.format.string" = "%1$s %2$s %3$s %4$s"
)
STORED AS TEXTFILE;
The regex may look complex at first sight because of the escaping and the non-capturing groups. It actually contains just two kinds of capturing group, (\d*) and ([^"]*), alternating twice. The non-capturing groups (?:...) only serve to strip the surrounding quotes and separators. The group ([^"]*) also takes care of a ';' inside the book name field.
But nothing comes without a cost. Despite all its features, RegexSerDe supports only string columns. All you can do is use Hive's built-in cast to convert the values when selecting from the table, for example (the exact syntax may vary a bit):
SELECT cast( year as int ) from books;
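You can check what the regex captures before creating the table by trying the same pattern with Python's re module. The pattern below is the input.regex string with the Hive string escapes removed. It assumes Hive unescapes \" to ", \\d to \d and \; to ; (in a Java regex, \; matches a plain semicolon anyway). re.fullmatch is used on the assumption that RegexSerDe requires the whole line to match. The int conversion plays the role of the cast above. parse_book is a hypothetical helper.

```python
import re

# The input.regex from the DDL with the Hive string escapes removed.
BOOK_RE = re.compile(r'(?:")(\d*)(?:";")([^"]*)(?:";")(\d*)(?:";")([^"]*)"(?:;)')


def _to_int(text):
    """Mimic cast(... as int) for digit strings; an empty string becomes None."""
    return int(text) if text else None


def parse_book(line):
    """Return (isbn, title, year, publisher) for a matching line, else None."""
    m = BOOK_RE.fullmatch(line)  # the whole line must match
    if m is None:
        return None
    isbn, title, year, publisher = m.groups()
    return _to_int(isbn), title, _to_int(year), publisher


SAMPLE = [
    '"123";"mybook1";"2002";"publisher1";',
    '"456";"mybook2;the best seller";"2004";"publisher2";',
    '"789";"mybook3";"2002";"publisher1";',
]

if __name__ == "__main__":
    for line in SAMPLE:
        print(parse_book(line))
```

The second sample line shows that the ';' inside the title stays in the title field.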
Hope this helps.
I use SQL*Loader to load data from a CSV file into an Oracle database.
The table that receives the data has a VARCHAR2(500) column called COMMENTS.
For various reasons, I want to ignore this field in the CSV file.
So I wrote this control file:
Options (BindSize=10000000,Readsize=10000000,Rows=5000,Errors=100)
Load Data
Infile 'XXX.txt'
Append into table T_XXX
Fields Terminated By ';'
TRAILING NULLCOLS
(
...
COMMENTS FILLER,
...
)
This seems to work correctly, as the COMMENTS column in the database is always set to NULL.
However, if a record in my CSV file has a COMMENTS value longer than the 500-character limit, SQL*Loader reports an error:
Record 2: Rejected - Error on table T_XXX, column COMMENTS.
Field in data file exceeds maximum length
Is there a way to completely exclude the COMMENTS field from processing?
I can't reproduce your problem. I'm using Oracle 10.2.0.3.0 with SQL*Loader 10.2.0.1.
Here is my test case:
SQL> CREATE TABLE test_sqlldr (
ID NUMBER,
comments VARCHAR2(20),
id2 NUMBER
);
Table created
Control file:
LOAD DATA
INFILE test.data
INTO TABLE test_sqlldr
APPEND
FIELDS TERMINATED BY ';'
TRAILING NULLCOLS
( id,
comments filler,
id2
)
data file:
1;aaa;2
3;abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz;4
5;bbb;6
I'm using the command sqlldr userid=xxx/yyy@zzz control=test.ctl and I get all the rows without errors:
SQL> select * from test_sqlldr;
ID COMMENTS ID2
---------- -------------------- ----------
1 2
3 4
5 6
You could also try another approach. I get the same result with the following control file:
LOAD DATA
INFILE test.data
INTO TABLE test_sqlldr
APPEND
FIELDS TERMINATED BY ';'
TRAILING NULLCOLS
( id,
comments "substr(:comments,1,0)",
id2
)
Update following Romaintaz's comment: I looked into it again and got the same error as you once the field exceeded 255 characters. This is because SQL*Loader's default datatype for a field is CHAR(255), and that limit applies to FILLER fields too. If a field holds more data, you have to specify its length. The following control file solved the problem for a field with 300 characters:
LOAD DATA
INFILE test.data
INTO TABLE test_sqlldr
APPEND
FIELDS TERMINATED BY ';'
TRAILING NULLCOLS
( id,
comments filler char(4000),
id2
)
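To decide which lengths to declare, it can help to measure the longest value in each field before loading. The minimal Python sketch below assumes ';'-separated records without quoting, as in the test data above. max_field_lengths and fields_over_default are hypothetical helpers. Lengths are counted in characters, while SQL*Loader may count bytes depending on the length semantics in effect, so leave headroom for multibyte data.

```python
DEFAULT_CHAR_LENGTH = 255  # default CHAR field length, per the answer above


def max_field_lengths(lines, delimiter=";"):
    """Longest value seen at each 0-based field position (no quote handling)."""
    longest = {}
    for line in lines:
        for pos, value in enumerate(line.rstrip("\r\n").split(delimiter)):
            longest[pos] = max(longest.get(pos, 0), len(value))
    return longest


def fields_over_default(lines, delimiter=";"):
    """0-based field positions that would need an explicit CHAR(n) length."""
    return sorted(pos for pos, n in max_field_lengths(lines, delimiter).items()
                  if n > DEFAULT_CHAR_LENGTH)


# Hypothetical data modelled on the test file, with a 300-character comment.
SAMPLE = ["1;aaa;2", "3;" + "x" * 300 + ";4", "5;bbb;6"]

if __name__ == "__main__":
    print(max_field_lengths(SAMPLE))    # {0: 1, 1: 300, 2: 1}
    print(fields_over_default(SAMPLE))  # [1]
```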
Hope this helps,
--
Vincent
Just to suggest a tiny improvement, you might try something like:
LOAD DATA
INFILE test.data
INTO TABLE test_sqlldr
APPEND
FIELDS TERMINATED BY ';' TRAILING NULLCOLS
(
id,
comments char(4000) "substr(:comments, 1, 200)",
id2
)
Now you'll grab the first 200 characters (or whatever number you put in its place) of every comment. The exception is input records whose comments value is longer than 4000 characters: those will be rejected by the loader with the 'exceeds maximum length' error noted earlier. Assuming that's rare or never happens, all the records will load, with some comments truncated to 200 characters.
If you go over char(4000) you'll get a SQL Loader error - there's a limit to how far you can push the beast.
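Both substr tricks in this thread rely on Oracle SUBSTR semantics. A length below 1 returns NULL, which is why "substr(:comments,1,0)" loads nothing. A positive length keeps at most that many characters. The Python sketch below models this together with the CHAR(4000) rejection described above. oracle_substr and load_comment are hypothetical helpers, only positions of 1 or more are handled, and None stands for NULL.

```python
def oracle_substr(s, position, length):
    """Rough model of Oracle SUBSTR for position >= 1.

    Oracle returns NULL when length < 1 and treats an empty string as NULL;
    None stands for NULL here. Zero or negative positions are not modelled.
    """
    if s is None or length < 1:
        return None
    return s[position - 1:position - 1 + length] or None


def load_comment(value, declared_length=4000, keep=200):
    """Model of: comments char(4000) "substr(:comments, 1, 200)"."""
    if len(value) > declared_length:
        # The loader rejects the record instead of truncating it.
        raise ValueError("Field in data file exceeds maximum length")
    return oracle_substr(value, 1, keep)


if __name__ == "__main__":
    print(len(load_comment("x" * 300)))  # 200
    print(oracle_substr("abc", 1, 0))    # None, like substr(:comments,1,0)
```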