I have, for instance, 4 CSV files in a directory, each with 3 columns. The file name contains the date and time, and I need to add this information (date and time) as 2 new columns in the external tables. Is there some way to do that directly in Oracle, without having to prepare the CSV files by adding the new columns to them (parsing the name and extracting the date and time)?
Thanks
How can I insert data from multiple files having different columns into a table in an Oracle database using SQL*Loader with a single control file?
Basically,
We have 3 CSV files
file 1 having columns a,b,c
file 2 having columns d,e,f
file 3 having columns g,h,i
We need to insert the above attributes into a table named "TableTest"
having columns a,b,c,d,e,f,g,h,i
using a single control file.
Thanks in advance
You really can't. You can either splice the .csv files together (a lot of nasty work) or create 3 staging tables to load into and then use PL/SQL or SQL to join them together into your target table.
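For instance, a minimal sketch of that staging approach (all table and column names here are illustrative, and it assumes the three files line up row by row, with the row position captured at load time, e.g. via SQL*Loader's RECNUM):

INSERT INTO TableTest (a, b, c, d, e, f, g, h, i)
SELECT s1.a, s1.b, s1.c,
       s2.d, s2.e, s2.f,
       s3.g, s3.h, s3.i
FROM   stage1 s1
JOIN   stage2 s2 ON s2.rn = s1.rn  -- rn = row position recorded by RECNUM
JOIN   stage3 s3 ON s3.rn = s1.rn;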
Based on the CSV file's column header, it should create a table dynamically and also insert the records of that CSV file into the newly created table.
Ex:
1) If I upload a file TEST.csv with 3 columns, it should create a table dynamically with three columns.
2) Again, if I upload a new file called TEST2.csv with 5 columns, it should create a table dynamically with five columns.
Every time it should create a table based on the uploaded CSV file's header.
How can I achieve this in Oracle APEX?
Thanks in advance.
Without creating new tables, you can treat the CSVs as tables using a TABLE function you can SELECT from. If you download the packages from the Alexandria Project you will find a function that will do just that inside CSV_UTIL_PKG (clob_to_csv is the function, but you will find other goodies in there).
You would just upload the CSV, store it in a CLOB column, and then you can build reports on it using the CSV_UTIL_PKG code.
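For example, a minimal sketch of such a report query (assuming CSV_UTIL_PKG from the Alexandria library is installed; the my_uploads table is illustrative, and c001, c002, ... are the generic columns of the package's result type):

SELECT t.c001, t.c002, t.c003
FROM   my_uploads u,
       TABLE(csv_util_pkg.clob_to_csv(u.csv_clob)) t
WHERE  u.id = :upload_id;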
If you must create a new table for the upload you could still use this parser. Upload the file and then select just the first row (e.g. SELECT * FROM TABLE(csv_util_pkg.clob_to_csv(your_clob)) WHERE ROWNUM = 1). You could insert this row into an APEX collection using APEX_COLLECTION.CREATE_COLLECTION_FROM_QUERY to make it easy to iterate over each column.
You would need to determine the datatype for each column but could just use VARCHAR2 for everything.
But if you are just using generic columns, you could just as easily store one additional column holding a name for this collection of records and keep all of the uploads in the same table. Just build another table to store the column names.
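A hedged sketch of that generic layout (all names illustrative): one table holds the data rows, another maps each upload's column positions to its real column names.

CREATE TABLE upload_rows (
  upload_name  VARCHAR2(100),
  line_no      NUMBER,
  c001         VARCHAR2(4000),
  c002         VARCHAR2(4000),
  c003         VARCHAR2(4000)
  -- ... as many generic columns as you expect to need
);

CREATE TABLE upload_columns (
  upload_name  VARCHAR2(100),
  position     NUMBER,
  column_name  VARCHAR2(100)
);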
Simply store the file as a BLOB if the structure is "dynamic".
You can use the XMLType data type for this use case too, but it won't be very different from a BLOB column.
There is also the SecureFiles feature, available since 11g. It is a new LOB implementation that performs better than regular (BasicFiles) LOBs and is a good fit for unstructured or semi-structured data.
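For example, a minimal sketch of a table using SecureFiles storage (table and column names are illustrative):

CREATE TABLE csv_uploads (
  id         NUMBER PRIMARY KEY,
  file_name  VARCHAR2(400),
  uploaded   DATE DEFAULT SYSDATE,
  content    BLOB
)
LOB (content) STORE AS SECUREFILE;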
I have 3 columns: user, datetime, and data
My data is space delimited and each row is delimited by a new line
Right now I'm using the RegexSerDe to read in my input; however, I want to partition by user. If I do that, user can no longer be a column, correct? If so, how do I load my data into my tables?
In Hive, each partition corresponds to a folder in HDFS. You can reload the data from your unpartitioned Hive table into a new partitioned Hive table, for example by creating the partitioned table and populating it with an INSERT ... SELECT. See https://cwiki.apache.org/Hive/languagemanual-ddl.html#LanguageManualDDL-CreateTable for more details.
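A minimal sketch of that reload, assuming an unpartitioned source table raw_events(`user`, dt, data); all names are illustrative, and dynamic partitioning must be enabled:

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

CREATE TABLE events_by_user (dt STRING, data STRING)
PARTITIONED BY (`user` STRING);

-- the partition column must come last in the SELECT list
INSERT OVERWRITE TABLE events_by_user PARTITION (`user`)
SELECT dt, data, `user` FROM raw_events;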
Alternatively, you can organize the data in HDFS in sub-directories under the table's directory; each directory name has to be in the format PART_NAME=PART_VALUE.
If your data is split into files where each file contains only one "user", just create directories corresponding to the usernames (e.g. USERNAME=XYZ) and put all the files that match that username in its directory.
Next, you can create an external table with partitions (see the example below).
The only problem is that you'll have to define the column "user" that's in your data anyway (but you can just ignore it) and query on the partition column (USERNAME) instead, which will provide the needed partition pruning.
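A hedged sketch of such a table (the paths, names, and space-delimited format are illustrative):

CREATE EXTERNAL TABLE events (
  `user` STRING,  -- present in each file; can simply be ignored
  dt     STRING,
  data   STRING
)
PARTITIONED BY (username STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LOCATION '/data/events';

ALTER TABLE events ADD PARTITION (username = 'XYZ')
LOCATION '/data/events/USERNAME=XYZ';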
I have the following folder structure in hdfs
/input/data/yyyy/mm/dd/
and inside it data files, for example:
/input/data/2013/05/01/
file_2013_05_01_01.json // file format yyyy_mm_dd_hh
file_2013_05_01_02.json // file format yyyy_mm_dd_hh
....
I've defined hive external table for this folder:
CREATE EXTERNAL TABLE input_data (
vr INT, ....
)
PARTITIONED BY (tsp STRING)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
STORED AS TEXTFILE;
adding for each folder a partition as following:
alter table input_data ADD PARTITION (tsp="2013-05-01") LOCATION '/input/data/2013/05/01/';
The following query will take as input all files for date 2013-05-01:
select ... from input_data where tsp="2013-05-01"
How can I take only the files of a specific hour, without changing the HDFS structure to put each hour in a separate folder?
You could make use of a virtual column called INPUT__FILE__NAME. It is one of the two virtual columns provided by Hive 0.8.0 onward and holds the input file's name for a mapper task. Note that it contains the file's full path, so match the file name with LIKE. You could do something like this:
select ... from input_data
where tsp="2013-05-01"
and INPUT__FILE__NAME LIKE '%file_2013_05_01_01.json';
HTH
You could make use of the following construct:
SELECT *
FROM my_input_data
WHERE INPUT__FILE__NAME LIKE '%hh.json';
Here hh is your desired hour, and INPUT__FILE__NAME is the virtual column available to Hive queries while processing a given file.
I have a dump of several PostgreSQL tables in a self-contained CSV file which I want to import into an Oracle database with a matching schema. I found several posts on how to distribute data from one CSV "table" to multiple Oracle tables, but my problem is several DIFFERENT CSV "tables" in the same file.
Is it possible to specify table separators or somehow mark new tables in an SQLLDR control file, or do I have to split up the file manually before feeding it to SQLLDR?
That depends on your data. How do you determine which table a row is destined for? If you can determine the target table based on data in the row, then it is fairly easy to do with a WHEN clause:
LOAD DATA
INFILE bunchotables.dat
INTO TABLE foo WHEN somecol = 'pick me, pick me' (
...column defs...
)
INTO TABLE bar WHEN somecol = 'leave me alone' (
... column defs
)
If you've got some sort of header row that determines the target table, then you are going to have to split the file beforehand with another utility.