I have a pipe-delimited CSV file like the one below:
ID|NAME|DES
1|A|B
2|C|D
3|E|F
I need to insert the data into a temp table and already have SQL*Loader in place, but my table has only one column. Below is the control file configuration for loading from the CSV:
OPTIONS (SKIP=1)
LOAD DATA
CHARACTERSET UTF8
TRUNCATE
INTO TABLE EMPLOYEE
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(
NAME
)
How do I select only the 2nd column from the CSV and insert it into the single column of the table EMPLOYEE?
Please let me know if you have any questions.
If you're using a filler field you don't need a matching column in the database table - that's the point, really. As long as you know the field you're interested in is always the second one, you don't need to modify the control file when there are extra fields in the file; you just never specify them.
So this works, with just a filler ID field added and the three-field data file you showed:
OPTIONS (SKIP=1)
LOAD DATA
CHARACTERSET UTF8
TRUNCATE
INTO TABLE EMPLOYEE
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(
ID FILLER,
NAME
)
Demoed with:
SQL> create table employee (name varchar2(30));
$ sqlldr ...
Commit point reached - logical record count 3
SQL> select * from employee;
NAME
------------------------------
A
C
E
Adding more fields to the data file makes no difference, as long as they are after the field you are actually interested in. The same thing works for external tables, which can be more convenient for temporary/staging tables, as long as the CSV file is available on the database server.
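For illustration, the external-table version might look roughly like this. It's a minimal sketch, assuming a directory object DATA_DIR pointing at the folder holding the file, saved as employee.csv; external tables need no FILLER keyword, since fields in the access-parameters list with no matching table column are simply ignored.
CREATE TABLE employee_ext (
  name VARCHAR2(30)
)
ORGANIZATION EXTERNAL
(
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE SKIP 1
    FIELDS TERMINATED BY '|'
    MISSING FIELD VALUES ARE NULL
    (id, name, des)   -- describes the file; only NAME maps to a table column
  )
  LOCATION ('employee.csv')
)
REJECT LIMIT UNLIMITED;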
Columns in the data file that need to be excluded from the load can be defined as FILLER.
In the given example, list all incoming fields and mark the ones to be ignored as FILLER, e.g.
(
ID FILLER,
NAME,
DES FILLER
)
Another issue is skipping the header line of the CSV; use the OPTIONS clause for that, e.g.
OPTIONS(SKIP=1)
LOAD DATA ...
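Putting the two pieces together, the complete control file for the sample data would be (table and layout exactly as in the question):
OPTIONS (SKIP=1)
LOAD DATA
CHARACTERSET UTF8
TRUNCATE
INTO TABLE EMPLOYEE
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(
ID FILLER,
NAME,
DES FILLER
)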
I have an issue where one of the columns loaded into a Hive table contains junk characters ("~) appended to the actual value (ABC), so the value visible in this column is ABC"~.
This column can hold either ABC (or any such string) or NULL. The table is huge and UPDATE is not an option here.
My plan is to create a temp table in which this column contains either the clean string (ABC) or NULL, removing the junk characters ("~) completely while copying the data from the original table to the temp table.
Any help on how I can remove this junk? I tried using the regexp functions, but with no success. Any suggestions?
I was not using regexp properly; my fault.
The data initially loaded into the table had the extra characters attached to one column's values. For example: if the column's actual value was Adf452, then the data contained in the cell was Adf452"~.
So I loaded the data to a temp table like this:
insert overwrite table tempTable select colA, colB, regexp_replace(colC,"\"~",""), partitionedCol from origTable;
This simply loaded the data in tempTable without those junk characters.
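As a quick sanity check of the pattern (the value is the example from above; a FROM-less SELECT works on Hive 0.13+, and both " and ~ are literal characters, not regex metacharacters):
select regexp_replace('Adf452"~', '"~', '');
-- returns: Adf452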
I have a text file with around 100 columns terminated by "|", and I need to get a few of those columns into my external table. The solutions I have are either to specify all the columns in the ACCESS PARAMETERS section in the same order as the file and define only the required columns in the CREATE TABLE definition, or to define all the columns in the same order in the CREATE TABLE itself.
Can I avoid defining all the columns? Is it possible to pick columns based on the names in the first row, provided I have the column names as the first row?
Or is it at least possible to get all the columns, like a (select *), without mentioning each one?
Below is the code I use:
drop table lz_purchase_data;
CREATE TABLE lz_purchase_data
(REC_ID CHAR(50))
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "FILEZONE"
ACCESS PARAMETERS
( RECORDS DELIMITED BY NEWLINE Skip 1
FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"'
LRTRIM MISSING FIELD VALUES ARE NULL
) LOCATION( 'PURCHASE_DATA.txt' ))
REJECT LIMIT UNLIMITED
PARALLEL 2 ;
select * from LZ_PURCHASE_DATA;
Is it possible to create an external partitioned table without a location? I want to add all the locations later, together with the partitions.
I tried:
CREATE EXTERNAL TABLE IF NOT EXISTS a.b
(line STRING)
COMMENT 'abc'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE
PARTITIONED BY day;
but I got ParseException: missing EOF at 'PARTITIONED' near 'TEXTFILE'
I don't think so; see the documentation on ALTER TABLE ... SET LOCATION.
But anyway, I think your query has some errors, and the correct script would be:
CREATE EXTERNAL TABLE IF NOT EXISTS a.b
(line STRING)
COMMENT 'abc'
PARTITIONED BY (day String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE
;
I think the issue is that you have not specified a data type for your partition column "day"; also, in Hive DDL the PARTITIONED BY clause must come before ROW FORMAT and STORED AS. And you can create a Hive external table without a location, then use ALTER TABLE later to set the locations, as sketched below.
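For example (a hedged sketch; the HDFS path is an assumption, adjust it to your layout):
ALTER TABLE a.b ADD IF NOT EXISTS PARTITION (day='2016-01-01')
LOCATION 'hdfs://namenode:8020/data/b/day=2016-01-01';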
I am running the following commands to create my table ABC and insert data from all the files in my designated file path. Now I want to add a column with the filename, but I can't find any way to do that without looping through the files or something similar. Any suggestions on the best way to do this?
CREATE TABLE ABC
(NAME string
,DATE string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
hive -e "LOAD DATA LOCAL INPATH '${DATA_FILE_PATH}' INTO TABLE ABC;"
Hive does have virtual columns, which include INPUT__FILE__NAME. The Hive documentation on virtual columns shows how to use it in a statement.
To fill another table with the filename as a column, assuming the location of your data is hdfs://hdfs.location:port/data/folder/filename1:
DROP TABLE IF EXISTS ABC2;
CREATE TABLE ABC2 (
filename STRING COMMENT 'this is the file the row was in',
name STRING,
date STRING);
INSERT INTO TABLE ABC2 SELECT split(INPUT__FILE__NAME,'folder/')[1], name, date FROM ABC;
You can alter the split to change how much of the full path you actually want to store.
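For instance, with the example path from above (again runnable as a FROM-less SELECT on Hive 0.13+):
select split('hdfs://hdfs.location:port/data/folder/filename1', 'folder/')[1];
-- returns: filename1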
I have several input files being read into an external table in Oracle. I want to run queries across the content from all the files; however, for some queries I would like to filter the data based on the input file it came from. Is there a way to access the name of the source file in a select statement against an external table, or somehow create a column in the external table that includes the source location?
Here is an example:
CREATE TABLE MY_TABLE (
first_name CHAR(100 BYTE),
last_name CHAR(100 BYTE)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY TMP
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
SKIP 1
badfile 'my_table.bad'
discardfile 'my_table.dsc'
LOGFILE 'my_table.log'
FIELDS terminated BY 0x'09' optionally enclosed BY '"' LRTRIM missing field VALUES are NULL
(
first_name char(100),
last_name
)
)
LOCATION ( TMP:'file1.txt','file2.txt')
)
REJECT LIMIT 100;
select distinct last_name
from MY_TABLE
where location like 'file2.txt' -- This is the part I don't know how to code
Any suggestions?
There is always the option of adding the file name to the input file itself as an additional column. Ideally, I would like to avoid this workaround.
The ALL_EXTERNAL_LOCATIONS data dictionary view contains information about external table locations; there are also DBA_* and USER_* versions.
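For instance, a quick look-up for the table above:
SELECT table_name, location, directory_name
FROM all_external_locations
WHERE table_name = 'MY_TABLE';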
Edit: (It would help if I read the question thoroughly.)
You don't just want to read the location for the external table, you want to know which row came from which file. Basically, you need to:
Create a shell script that prepends the file location to the file contents and writes them to standard output.
Add the PREPROCESSOR directive to your external table definition to execute the script.
Alter the external table definition to include a column to show the filename appended in the first step.
Here is an AskTom article explaining it in detail.
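As a rough, hedged sketch of those steps (the script name add_filename.sh and the EXEC_DIR directory object are assumptions; the article covers the details):
-- add_filename.sh, placed under the EXEC_DIR directory object, might contain:
--   #!/bin/sh
--   /usr/bin/awk -v f="$1" '{ print f "\t" $0 }' "$1"
-- The access driver passes each data file's name as $1, and whatever the
-- script writes to standard output becomes the external table's input,
-- so every row arrives prefixed with its source file name.
-- (The original SKIP 1 header handling could move into the script, e.g. NR > 1 in awk.)
CREATE TABLE my_table (
source_file VARCHAR2(255),
first_name CHAR(100 BYTE),
last_name CHAR(100 BYTE)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY TMP
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
PREPROCESSOR exec_dir:'add_filename.sh'
FIELDS TERMINATED BY 0x'09' OPTIONALLY ENCLOSED BY '"' LRTRIM
MISSING FIELD VALUES ARE NULL
(source_file char(255), first_name char(100), last_name)
)
LOCATION ( TMP:'file1.txt','file2.txt')
)
REJECT LIMIT 100;

select distinct last_name
from my_table
where source_file like '%file2.txt';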