Oracle SQL: save file name from LOCATION as a column in an external table

I have several input files being read into an external table in Oracle. I want to run some queries across the content of all the files, but for some queries I would like to filter the data based on the input file it came from. Is there a way to access the name of the source file in a SELECT statement against an external table, or to somehow create a column in the external table that includes the location source?
Here is an example:
CREATE TABLE MY_TABLE (
  first_name CHAR(100 BYTE),
  last_name  CHAR(100 BYTE)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY TMP
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE
    SKIP 1
    BADFILE 'my_table.bad'
    DISCARDFILE 'my_table.dsc'
    LOGFILE 'my_table.log'
    FIELDS TERMINATED BY 0x'09' OPTIONALLY ENCLOSED BY '"' LRTRIM
    MISSING FIELD VALUES ARE NULL
    (
      first_name CHAR(100),
      last_name  CHAR(100)
    )
  )
  LOCATION ( TMP:'file1.txt', 'file2.txt' )
)
REJECT LIMIT 100;
select distinct last_name
from MY_TABLE
where location like 'file2.txt' -- This is the part I don't know how to code
Any suggestions?
There is always the option to add the file name to the input file itself as an additional column. Ideally, I would like to avoid this workaround.

The ALL_EXTERNAL_LOCATIONS data dictionary view contains information about external table locations; there are also DBA_* and USER_* versions.
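For example, to see which files a given external table currently points at:
SELECT table_name, directory_name, location
FROM   all_external_locations
WHERE  table_name = 'MY_TABLE';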
Edit: (It would help if I read the question thoroughly.)
You don't just want to read the location for the external table; you want to know which row came from which file. Basically, you need to:
Create a shell script that adds the file location to each line of the file contents and writes the result to standard output.
Add the PREPROCESSOR directive to your external table definition to execute the script.
Alter the external table definition to include a column for the filename added in the first step.
Here is an AskTom article explaining it in detail.
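A minimal sketch of those steps, assuming a directory object EXEC_DIR for the script and a hypothetical script name add_fname.sh (the one-liner prepends the data file path to every line):
-- contents of add_fname.sh (hypothetical; must be executable and write to stdout):
--   #!/bin/sh
--   /usr/bin/awk -v f="$1" '{ print f "\t" $0 }' "$1"

CREATE TABLE MY_TABLE_EXT (
  source_file VARCHAR2(255),
  first_name  CHAR(100),
  last_name   CHAR(100)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY TMP
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR EXEC_DIR:'add_fname.sh'
    FIELDS TERMINATED BY 0x'09' LRTRIM
    MISSING FIELD VALUES ARE NULL
    ( source_file CHAR(255), first_name CHAR(100), last_name CHAR(100) )
  )
  LOCATION ( TMP:'file1.txt', 'file2.txt' )
)
REJECT LIMIT 100;

-- the filter from the question then becomes (the script receives the full
-- path as $1, so match on the file name suffix):
SELECT DISTINCT last_name
FROM   MY_TABLE_EXT
WHERE  source_file LIKE '%file2.txt';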

Related

Dynamically Load Multiple CSV files to Oracle External Table

I am trying to load my Oracle external table dynamically from multiple .csv files.
I am able to load one .csv file, but as soon as I alter the table with a new .csv file name, the table gets rewritten.
I have multiple .csv files in a folder that changes every day, with the date in the file name, e.g. FileName1_20200607.csv, FileName2_20200607.csv.
I don't think there is a way to write 'FileName*20200607.csv' to pick up all the files for that date?
My code:
......
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "DATA_DIR_PATH"
ACCESS PARAMETERS
( RECORDS DELIMITED BY NEWLINE
  BADFILE CRRENG_ORA_APPS_OUT_DIR:'Filebad'
  DISCARDFILE DATA_OUT_PATH:'Filedesc.dsc'
  LOGFILE DATA_OUT_PATH:'Filelog.log'
  SKIP 0
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' AND '"'
  MISSING FIELD VALUES ARE NULL
  REJECT ROWS WITH ALL NULL FIELDS
)
LOCATION
( 'FileName1_20200607.csv',
'FileName2_20200607.csv'
)
);
But I want to populate these file names dynamically; the table should pick up all the file names from DATA_DIR. There are about 50 other file names.
I can add a Unix script if need be.
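There is no built-in wildcard for LOCATION, but since a Unix script is an option, one common pattern is to have the script list the day's files and pass them to a dynamic ALTER TABLE. A minimal sketch, with a helper procedure and table name that are purely illustrative:
CREATE OR REPLACE PROCEDURE set_ext_table_location (p_file_list IN VARCHAR2) AS
BEGIN
  -- p_file_list arrives pre-quoted, e.g.:
  --   'FileName1_20200607.csv','FileName2_20200607.csv'
  EXECUTE IMMEDIATE
    'ALTER TABLE my_external_table LOCATION (' || p_file_list || ')';
END;
/
-- a Unix wrapper could build the list (e.g. from ls *_20200607.csv) and call,
-- via sqlplus:
--   EXEC set_ext_table_location(q'['FileName1_20200607.csv','FileName2_20200607.csv']')
Note this builds SQL by concatenation, so the caller must be trusted; it is a sketch, not hardened code.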

Insert part of data from csv into oracle table

I have a CSV (pipe-delimited) file as below
ID|NAME|DES
1|A|B
2|C|D
3|E|F
I need to insert the data into a temp table where I already have SQL*Loader in place, but my table has only one column. Below is the control file configuration for loading from the CSV.
OPTIONS (SKIP=1)
LOAD DATA
CHARACTERSET UTF8
TRUNCATE
INTO TABLE EMPLOYEE
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(
NAME
)
How do I select the data from only the second column of the CSV and insert it into the single column of the EMPLOYEE table?
Please let me know if you have any questions.
If you're using a filler field, you don't need a matching column in the database table; that's the point, really. As long as you know the field you're interested in is always the second one, you don't need to modify the control file when there are extra fields in the file: you just never specify them.
So this works, with just a filler ID field added and the three-field data file you showed:
OPTIONS (SKIP=1)
LOAD DATA
CHARACTERSET UTF8
TRUNCATE
INTO TABLE EMPLOYEE
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(
ID FILLER,
NAME
)
Demoed with:
SQL> create table employee (name varchar2(30));
$ sqlldr ...
Commit point reached - logical record count 3
SQL> select * from employee;
NAME
------------------------------
A
C
E
Adding more fields to the data file makes no difference, as long as they are after the field you are actually interested in. The same thing works for external tables, which can be more convenient for temporary/staging tables, as long as the CSV file is available on the database server.
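The same idea in external-table form is sketched below; the directory object and file name are assumptions. With ORACLE_LOADER you list every incoming field in the access parameters, and a field that has no matching column in the table definition is simply discarded, so nothing like FILLER is needed:
CREATE TABLE employee_ext (name VARCHAR2(30))
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir      -- assumed directory object
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE SKIP 1
    FIELDS TERMINATED BY '|'
    MISSING FIELD VALUES ARE NULL
    -- id and des have no matching table column, so they are ignored
    ( id CHAR(10), name CHAR(30), des CHAR(10) )
  )
  LOCATION ('employee.csv')       -- assumed file name
)
REJECT LIMIT UNLIMITED;

SELECT name FROM employee_ext;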
Columns in the data file that need to be excluded from the load can be defined as FILLER.
In the given example, list all incoming fields and add FILLER to those that should be ignored, e.g.
(
ID FILLER,
NAME,
DES FILLER
)
Another issue here is ignoring the header line of the CSV; for that, just use the OPTIONS clause, e.g.
OPTIONS(SKIP=1)
LOAD DATA ...

Oracle External Table Columns based on header row or star(*)

I have a text file with around 100 columns terminated by "|", and I need to get only a few of those columns into my external table. The solutions I have are either to specify all the columns, in the same order as the file, under the ACCESS PARAMETERS section and define only the required columns in the CREATE TABLE definition, or to define all the columns, in the same order, in the CREATE TABLE itself.
Can I avoid defining all the columns in the query? Is it possible to derive the columns from the first row, provided I have the column names in the first row?
Or is it at least possible to get all the columns, like a SELECT *, without mentioning each column?
Below is the code I use:
drop table lz_purchase_data;
CREATE TABLE lz_purchase_data
(REC_ID CHAR(50))
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "FILEZONE"
ACCESS PARAMETERS
( RECORDS DELIMITED BY NEWLINE Skip 1
FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"'
LRTRIM MISSING FIELD VALUES ARE NULL
) LOCATION( 'PURCHASE_DATA.txt' ))
REJECT LIMIT UNLIMITED
PARALLEL 2 ;
select * from LZ_PURCHASE_DATA;

In Hive, how do I load only part of the raw data to a table?

I've got a typical CREATE TABLE statement as follows:
CREATE EXTERNAL TABLE temp_url (
MSISDN STRING,
TIMESTAMP STRING,
URL STRING,
TIER1 STRING
)
row format delimited fields terminated by '\t' lines terminated by '\n'
LOCATION 's3://mybucket/input/project_blah/20140811/';
Where /20140811/ is a directory with gigabytes worth of data inside.
Loading the data is not a problem. Querying anything on it, however, chokes Hive and simply gives me a number of MapReduce errors.
So instead, I'd like to ask if there's a way to load only part of the data in /20140811/. I know I can select a few files from inside the folder, dump them into another folder, and use that, but it seems tedious, especially when I've got 20 or so of these /20140811/ directories.
Is there something like this:
CREATE EXTERNAL TABLE temp_url (
MSISDN STRING,
TIMESTAMP STRING,
URL STRING,
TIER1 STRING
)
row format delimited fields terminated by '\t' lines terminated by '\n'
LOCATION 's3://mybucket/input/project_blah/Half_of_20140811/';
I'm also open to non-Hive answers. Perhaps there's a way in s3cmd to quickly take a certain amount of the data inside /20140811/ and dump it into /20140811_halved/ or something.
Thanks.
I would suggest the following as a workaround:
Create a temp table with the same structure (using LIKE).
insert into NEW_TABLE select * from OLD_TABLE limit 1000;
Add as many filter conditions as you need to restrict the data that gets loaded.
Hope this helps you.
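A minimal HiveQL sketch of that workaround (the sample-table name and row count are arbitrary):
-- clone the schema of the existing table into a smaller working table
CREATE TABLE temp_url_sample LIKE temp_url;

-- load just a slice of the data; add WHERE filters as needed
INSERT INTO TABLE temp_url_sample
SELECT * FROM temp_url
LIMIT 1000;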
Since you say that you have "20 or so of these /20140811/ directories", why don't you try creating an external table with partitions on those directories and running your queries against a single partition?
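Sketched in HiveQL, assuming each daily directory becomes one partition (the partition column name dt is an assumption):
CREATE EXTERNAL TABLE temp_url_part (
MSISDN STRING,
TIMESTAMP STRING,
URL STRING,
TIER1 STRING
)
PARTITIONED BY (dt STRING)
row format delimited fields terminated by '\t' lines terminated by '\n';

-- register each daily directory as its own partition
ALTER TABLE temp_url_part ADD PARTITION (dt='20140811')
LOCATION 's3://mybucket/input/project_blah/20140811/';

-- the partition predicate prunes the scan to a single directory
SELECT COUNT(*) FROM temp_url_part WHERE dt = '20140811';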

Have dynamic columns in external tables

My requirement is that I have to use a single external table in a stored procedure for different text files, which have different columns.
Can I use dynamic columns in external tables in Oracle 11g? Like this:
create table ext_table as select * from TBL_test
organization external (
type oracle_loader
default directory DATALOAD
access parameters(
records delimited by newline
fields terminated by '#'
missing field values are null
)
location ('APD.txt')
)
reject limit unlimited;
The set of columns that are defined for an external table, just like the set of columns that are defined for a regular table, must be known at the time the external table is defined. You can't choose at runtime to determine that the table has 30 columns today and 35 columns tomorrow.
You could also potentially define the external table to have the maximum number of columns that any of the flat files will have, name the columns generically (i.e. col1 through col50), and then move the complexity of figuring out that column N of the external table is really a particular field into the ETL code. It's not obvious, though, why that would be more useful than creating the external table definition properly.
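A hedged sketch of that generic-column approach, reusing the directory and delimiter from the question (capped at five columns here just for brevity):
CREATE TABLE ext_generic
( col1 VARCHAR2(4000),
  col2 VARCHAR2(4000),
  col3 VARCHAR2(4000),
  col4 VARCHAR2(4000),
  col5 VARCHAR2(4000)   -- extend up to the widest file expected
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY DATALOAD
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY '#'
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('APD.txt')
)
REJECT LIMIT UNLIMITED;

-- the ETL code then decides, per file, what each generic column means, e.g.:
-- INSERT INTO some_target (code, descr) SELECT col1, col3 FROM ext_generic;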
Why is there a requirement that you use a single external table definition to load many differently formatted files? That does not seem reasonable.
Can you drop and re-create the external table definition at runtime? Or does that violate the requirement for a single external table definition?
