Import CSV which every cell terminated by newline - oracle

I have CSV file. The data looks like this :
PRICE_a
123
PRICE_b
500
PRICE_c
1000
PRICE_d
506
My XYZ Table is :
CREATE TABLE XYZ (
DESCRIPTION_1 VARCHAR2(25),
VALUE NUMBER
)
Do csv as above can be imported to the oracle?
How do I create a control.ctl file?

Here's how to do it without having to do any pre-processing. Use the CONCATENATE 2 clause to tell SQL-Loader to join every 2 lines together. This builds logical records but you have no separator between the 2 fields. No problem, but first understand how the data file is read and processed. SQL-Loader will read the data file a record at a time, and try to map each field in order from left to right to the fields as listed in the control file. See the control file below. Since the concatenated record it read matches with TEMP from the control file, and TEMP does not match a column in the table, it will not try to insert it. Instead, since it is defined as a BOUNDFILLER, that means don't try to do anything with it but save it for future use. There are no more data file fields to try to match, but the control file next lists a field name that matches a column name, DESCRIPTION_1, so it will apply the expression and insert it.
The expression says to apply the regexp_substr function to the saved string :TEMP (which we know is the entire record from the file) and return the substring of that record consisting of zero or more non-numeric characters from the start of the string where followed by zero or more numeric characters until the end of the string, and insert that into the DESCRIPTION_1 column.
The same is then done for the VALUE column, only returning the numeric part at the end of the string, skipping the non-numeric at the beginning of the string.
load data
infile 'xyz.dat'
CONCATENATE 2
into table XYZ
truncate
TRAILING NULLCOLS
(
TEMP BOUNDFILLER CHAR(30),
DESCRIPTION_1 EXPRESSION "REGEXP_SUBSTR(:TEMP, '^([^0-9]*)[0-9]*$', 1, 1, NULL, 1)",
VALUE EXPRESSION "REGEXP_SUBSTR(:TEMP, '^[^0-9]*([0-9]*)$', 1, 1, NULL, 1)"
)
Bada-boom, bada-bing:
SQL> select *
from XYZ
/
DESCRIPTION_1 VALUE
------------------------- ----------
PRICE_a 123
PRICE_b 500
PRICE_c 1000
PRICE_d 506
SQL>
Note that this is pretty dependent on the data following your example, and you should do some analysis of the data to make sure the regular expressions will work before putting this into production. Some tweaking will be required if the descriptions could contain numbers. If you can get the data to be properly formatted with a separator in a true CSV format, that would be much better.

Related

loading data in table using SQL Loader

I'm loading data into my table through SQL Loader
data loading is successful but i''m getting garbage(repetitive) value in a particular column for all rows
After inserting :
column TERM_AGREEMENT is getting value '806158336' for every record
My csv file contains atmost 3 digit data for that column,but i'm forced to set my column definition to Number(10).
LOAD DATA
infile '/ipoapplication/utl_file/LBR_HE_Mar16.csv'
REPLACE
INTO TABLE LOAN_BALANCE_MASTER_INT
fields terminated by ',' optionally enclosed by '"'
(
ACCOUNT_NO,
CUSTOMER_NAME,
LIMIT,
REGION,
**TERM_AGREEMENT INTEGER**
)
create table LOAN_BALANCE_MASTER_INT
(
ACCOUNT_NO NUMBER(30),
CUSTOMER_NAME VARCHAR2(70),
LIMIT NUMBER(30),
PRODUCT_DESC VARCHAR2(30),
SUBPRODUCT_CODE NUMBER,
ARREARS_INT NUMBER(20,2),
IRREGULARITY NUMBER(20,2),
PRINCIPLE_IRREGULARITY NUMBER(20,2),
**TERM_AGREEMENT NUMBER(10)**
)
INTEGER is for binary data type. If you're importing a csv file, I suppose the numbers are stored as plain text, so you should use INTEGER EXTERNAL. The EXTERNAL clause specifies character data that represents a number.
Edit:
The issue seems to be the termination character of the file. You should be able to solve this issue by editing the INFILE line this way:
INFILE'/ipoapplication/utl_file/LBR_HE_Mar16.csv' "STR X'5E204D'"
Where '5E204D' is the hexadecimal for '^ M'. To get the hexadecimal value you can use the following query:
SELECT utl_raw.cast_to_raw ('^ M') AS hexadecimal FROM dual;
Hope this helps.
I actually solved this issue on my own.
Firstly, thanks to #Gary_W AND #Alessandro for their inputs.Really appreciate your help guys,learned some new things in the process.
Here's the new fragment which worked and i got the correct data for the last column
LOAD DATA
infile '/ipoapplication/utl_file/LBR_HE_Mar16.csv'
REPLACE
INTO TABLE LOAN_BALANCE_MASTER_INT
fields terminated by ',' optionally enclosed by '"'
(
ACCOUNT_NO,
CUSTOMER_NAME,
LIMIT,
REGION,
**TERM_AGREEMENT INTEGER Terminated by Whitspace**
)
'Terminated by whitespace' - I went through some threads of SQL Loader and i used 'terminated by whitespace' in the last column of his ctl file. it worked ,this time i didn't even had to use 'INTEGER' or 'EXTERNAL' or EXPRESSION '..' for conversion.
Just one thing, now can you guys let me now what could possibly be creating issue ?what was there in my csv file in that column and how by adding this thing solved the issue ?
Thanks.

Replace data of one column with substring of another column in sql loader

I am loading data from a csv file into a table using sqlldr. There is one column which is not present in every row of the csv file. The data needed to populate this column is present in one of the other columns of the row. I need to split (split(.) )that column's data and populate into that column.
Like:-
column1:- abc.xyz.n
So the unknown column(column2) should be
column2:- xyz
Also, there is another column which is present in the row but it's not what I want to input into the table. It is also needed to be populated from column1. But there are around 50 if-else cases in that. Is decode preferable to do this?
column1:- abc.xyz.n
Then,
column2:- hi if(column1 has 'abc')
if(column1 has 'abd' then 'hello')
like this there are around 50 if-else cases.
Thanks for help.
For the first part of your question, define the column1 data in the control file as BOUNDFILLER with a name that does not match a table column name which tells sqlldr to remember it but don't use it. If you need to load it into a column, use the column name plus the remembered name. For column2, use the remembered BOUNDFILLER name in an expression where it returns the part you need (in this case the 2nd field, allowing for NULLs):
x boundfiller,
column1 EXPRESSION ":x",
column2 EXPRESSION "REGEXP_SUBSTR(:x, '(.*?)(\\.|$)', 1, 2, NULL, 1)"
Note the double backslash is needed else it gets removed as it gets passed to the regex engine from sqlldr and the regex pattern is altered incorrectly. A quirk I guess.
Anyway after this column1 ends up with "abc.xyz.n" and column2 gets "xyz".
For the second part of your question, you could use an expression as already shown but call a custom function you create where you pass the extracted value and it would return the searched value from a lookup table. You certainly don't want to hardcode your 50 lookup values. You could do the same thing basically in a table level trigger too. Note I show a select statement for an example only but this should be encapsulated in a function for reusability and maintainability:
Just to show you can do it:
col2 EXPRESSION "(select 'hello' from dual where REGEXP_SUBSTR(:x, '(.*?)(\\.|$)', 1, 2, NULL, 1) = 'xyz')"
The right way:
col2 EXPRESSION "(myschema.mylookupfunc(REGEXP_SUBSTR(:x, '(.*?)(\\.|$)', 1, 2, NULL, 1)))"
mylookupfunc returns the result of looking up 'xyz' in the lookup table, i.e. 'hello' as per your example.

What can I do to ensure fields longer than column width go to the BAD File?

When creating Oracle external tables, how should I phrase the reject rows clause to ensure that any field which exceeds its column width rejects and goes in the BADFILE?
This is my current design and I don't want records greater than 20 characters. I do want them to go BADFILE instead. Yet, they still appear when I select * from foobar
DROP TABLE FOOBAR CASCADE CONSTRAINTS;
CREATE TABLE FOOBAR
(
FOO_MAX20 VARCHAR2(20 CHAR)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY FOOBAR
ACCESS PARAMETERS
( RECORDS DELIMITED BY NEWLINE
BADFILE 'foobar_bad_rec.txt'
DISCARDFILE 'foobar_discard_rec.txt'
LOGFILE 'foobar_logfile.txt'
FIELDS
MISSING FIELD VALUES ARE NULL
REJECT ROWS WITH ALL NULL FIELDS
(
FOO_MAX20 POSITION(1:20)
)
)
LOCATION (foobar:'foobar.txt') )
REJECT LIMIT UNLIMITED
PARALLEL ( DEGREE DEFAULT INSTANCES DEFAULT )
NOMONITORING;
Here is my external file foobar.txt
1234567
1234567890123456
126464843750476074218751012345678901234567890
7135009765625
048669433593
7
527
You can't do this with the reject rows clause, as it only accepts one form.
You have a variable-length (delimited) record, but a fixed-length field. Everything after the last position you specify, which is 20 in this case, is seen as filler that you want to ignore. That isn't an error condition; you might have rubbish at the end that isn't relevant to your table. There is nothing that says chars 21-45 in your third record shouldn't be there - just that you aren't interested in them.
It would be nice if you could discard them with the load when clause, but you don't seem to be able to compare , say, (21:21) to null or an empty string - the former isn't recognised and the latter causes an internal error, which isn't good.
You can make the longer records be sent to the bad file by forcing an SQL error when it tries to put a longer parsed value from the file into the field, by changing:
FOO_MAX20 POSITION(1:20)
to
FOO_MAX20 POSITION(1:21)
Values that are up to 20 characters are still loaded:
select * from foobar;
FOO_MAX20
--------------------
1234567
1234567890123456
7135009765625
048669433593
7
527
6 rows selected
but for anything longer than 20 characters it'll try to put 21 chars in to the database's 20-char field, which gets this in the log:
error processing column FOO_MAX20 in row 3 for datafile /path/to/dir/foobar.txt
ORA-12899: value too large for column FOO_MAX20 (actual: 21, maximum: 20)
And the bad file gets that record:
126464843750476074218751012345678901234567890
Have a CHECK CONSTRAINT on the column to not allow any value exceeding the `LENGTH'.

SQL loader position

New to SQL loader and am a bit confused about the POSITION.
Let's use the following sample data as reference:
Munising 49862 MI
Shingleton49884 MI
Seney 49883 MI
And here is the load statement:
LOAD DATA
INFILE 'zipcodes.dat'
REPLACE INTO TABLE zipcodes (
city_name POSITION(1) CHAR(10),
zip_code POSITION(*) CHAR(5),
state_abbr POSITION(*+1) CHAR(2)
)
In the load statement, the city_name POSITION is 1. How does SQLLDR know where it ends? Is CHAR(10) the trick here? Counting the two spaces behind 'Munising', it has 10 characters.
Also why is zip_code assigned with CHAR even though it contains nothing but numbers?
Thank You
Yes, when end position is not specified, it is derived from the datatype. This documentation explains the POSITION clause.
city_name POSITION(1) CHAR(10)
Here the starting position of data field is 1. Ending position is not specified, but is derived from the datatype, that is 10.
zip_code POSITION(*) CHAR(5)
Here * specifies that, data field immediately follows the previous field and should be 5 bytes long.
state_abbr POSITION(*+1) CHAR(2)
Here +1 specifies the offset from the previous field. Sqlloader skips 1 byte and reads next 2 bytes, as derived from char(2) datatype.
As to why zipcode is CHAR, zip code is considered simply a fixed length string. You are not going to do any arithmetic operations on it. So, CHAR is appropriate for it.
Also, have a look at SQL Loader datatypes. In control file you are telling SQL*Loader how to interpret the data. It can be different from that of table structure. In this example you could also specify INTEGER EXTERNAL for zip code.
we need three text file & 1 batch file for Load Data:
Suppose your file location 'D:\loaddata'
input file 'D:\loaddata\abc.CSV'
1. D:\loaddata\abc.bad -- empty
2. D:\loaddata\abc.log -- empty
3. D:\loaddata\abc.ctl "Write Code Below"
OPTIONS ( SKIP=1, DIRECT=TRUE, ERRORS=10000000, ROWS=5000000)
load data
infile 'D:\loaddata\abc.CSV'
TRUNCATE
into table Your_table
(
a_column POSITION (1:7) char,
b_column POSITION (8:10) char,
c_column POSITION (11:12) char,
d_column POSITION (13:13) char,
f_column POSITION (14:20) char
)
D:\loaddata\abc.bat --- For execution
sqlldr db_user/db_passward#your_tns control=D:\loaddata\abc.ctl log=D:\loaddata\abc.log
After double click "D:\loaddata\abc.bat" file you data will be load desire oracle table. if anything wrong check you "D:\loaddata\abc.bad" and "D:\loaddata\abc.log" file

How to really skip the processing of a column?

In order to load data (from a CSV file) into an Oracle database, I use SQL*Loader.
In the table that receives these data, there is a varchar2(500) column, called COMMENTS.
For some reasons, I want to ignore this information from the CSV file.
Thus, I wrote this control file:
Options (BindSize=10000000,Readsize=10000000,Rows=5000,Errors=100)
Load Data
Infile 'XXX.txt'
Append into table T_XXX
Fields Terminated By ';'
TRAILING NULLCOLS
(
...
COMMENTS FILLER,
...
)
This code seems to work correctly, as the COMMENTS field in database is always set to null.
However, if in my CSV file I have a record where the corresponding COMMENTS field exceeds the 500 characters limit, I get an error from SQL*Loader:
Record 2: Rejected - Error on table T_XXX, column COMMENTS.
Field in data file exceeds maximum length
Is there a way to really exclude the processing of my COMMENTS fields?
I can't reproduce your problem. I'm using Oracle 10.2.0.3.0 with SQL*Loader 10.2.0.1.
Here is my test case:
SQL> CREATE TABLE test_sqlldr (
2 ID NUMBER,
3 comments VARCHAR2(20),
4 id2 NUMBER
5 );
Table created
Control file:
LOAD DATA
INFILE test.data
INTO TABLE test_sqlldr
APPEND
FIELDS TERMINATED BY ';'
TRAILING NULLCOLS
( id,
comments filler,
id2
)
data file:
1;aaa;2
3;abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz;4
5;bbb;6
I'm using the command sqlldr userid=xxx/yyy#zzz control=test.ctl and I'm getting all the rows without errors:
SQL> select * from test_sqlldr;
ID COMMENTS ID2
---------- -------------------- ----------
1 2
3 4
5 6
You may try another approach, I'm getting the same desired result with the following control file:
LOAD DATA
INFILE test.data
INTO TABLE test_sqlldr
APPEND
FIELDS TERMINATED BY ';'
TRAILING NULLCOLS
( id,
comments "substr(:comments,1,0)",
id2
)
Update following Romaintaz's comment: I looked into it again and managed to get the same error as you when the size of the column exceeded 255 characters. This is because the default datatype of SQL*Loader is char(255). If you have a column with more data you will have to specify the length. The following control file solved the problem for a column with 300 characters:
LOAD DATA
INFILE test.data
INTO TABLE test_sqlldr
APPEND
FIELDS TERMINATED BY ';'
TRAILING NULLCOLS
( id,
comments filler char(4000),
id2
)
Hope this Helps,
--
Vincent
Just to suggest a tiny improvement, you might try something like:
LOAD DATA
IN FILE test.data INTO TABLE test_sqlldr
APPEND
FIELDS TERMINATED BY ';'TRAILING NULLCOLS
(
id,
comments char(4000) "substr(:comments, 1, 200)",
id2)
Now you'll grab the first 200 characters (or any number you specify in it's place) of all comments - unless some of your input records have values for the comments field that exceed 4000 characters, in which they'll be rejected by loader with the 'exceeds max length' error noted earlier. But assuming that's rare or not the case, all the records will load with some of the comments truncated to 200 chars.
If you go over char(4000) you'll get a SQL Loader error - there's a limit to how far you can push the beast.

Resources