Have dynamic columns in external tables - Oracle

My requirement is that I have to use a single external table in a stored procedure for different text files which have different columns.
Can I use dynamic columns in external tables in Oracle 11g? Something like this:
create table ext_table as select * from TBL_test
organization external (
  type oracle_loader
  default directory DATALOAD
  access parameters (
    records delimited by newline
    fields terminated by '#'
    missing field values are null
  )
  location ('APD.txt')
)
reject limit unlimited;

The set of columns defined for an external table, just like the set of columns defined for a regular table, must be known at the time the external table is created. You can't decide at runtime that the table has 30 columns today and 35 columns tomorrow.
You could potentially define the external table with the maximum number of columns that any of the flat files will have, name the columns generically (e.g. col1 through col50), and then move the complexity of figuring out that column N of the external table is really a particular field into the ETL code. It's not obvious, though, why that would be more useful than creating the external table definition properly.
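A sketch of that generic-column approach, reusing the question's directory and file name (the widths and the column count are illustrative):

```sql
-- Define more columns than any file uses; MISSING FIELD VALUES ARE NULL
-- lets files with fewer fields load with NULLs in the trailing columns.
create table ext_generic (
  col1  varchar2(4000),
  col2  varchar2(4000),
  -- ... continue up to the widest file, e.g. col50
  col50 varchar2(4000)
)
organization external (
  type oracle_loader
  default directory DATALOAD
  access parameters (
    records delimited by newline
    fields terminated by '#'
    missing field values are null
  )
  location ('APD.txt')
)
reject limit unlimited;
```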
Why is there a requirement that you use a single external table definition to load many differently formatted files? That does not seem reasonable.
Can you drop and re-create the external table definition at runtime? Or does that violate the requirement for a single external table definition?
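If dropping and re-creating at runtime is acceptable, a minimal PL/SQL sketch (the column list here is a placeholder; in practice you would build it per file format):

```sql
begin
  execute immediate 'drop table ext_table';
  execute immediate q'[
    create table ext_table (
      col1 varchar2(100),
      col2 varchar2(100)
    )
    organization external (
      type oracle_loader
      default directory DATALOAD
      access parameters (
        records delimited by newline
        fields terminated by '#'
        missing field values are null
      )
      location ('APD.txt')
    )
    reject limit unlimited]';
end;
/
```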

Related

Column count in an export and import from source to destination

Can anyone help me with the question below? Does the number of columns have to match between the source and target tables when exporting and importing data from source to destination using Data Pump in Oracle 11.1?
E.g.: we are exporting sourcedb.tab (10 columns) and importing into targetdb.tab (11 columns).
Will this work, or will it give an error?
This should work, but I haven't tried it.
From the Oracle 11.2 documentation (I can't find the 11.1 version, but it is most likely the same):
When Data Pump detects that the source table and target table do not
match (the two tables do not have the same number of columns or the
target table has a column name that is not present in the source
table), it compares column names between the two tables. If the tables
have at least one column in common, then the data for the common
columns is imported into the table (assuming the datatypes are
compatible). The following restrictions apply:
This behavior is not supported for network imports.
The following types of columns cannot be dropped: object columns,
object attributes, nested table columns, and ref columns based on a
primary key.
Also note that you need to set the parameter TABLE_EXISTS_ACTION=APPEND (or TRUNCATE, which removes all existing data first). Otherwise, Data Pump will use the default value of SKIP, leaving the table as is.
11.2 Documentation of Data Pump Import
It won't work, as far as I can tell. Target table has to match the source table.
So, what can you do?
create a database link between the two databases and insert rows manually, e.g.
insert into target@db_link (col1, col2, ..., col10, col11)
select col1, col2, ..., col10, null
from source
drop the 11th column from the target table, perform the import, and then alter the table to re-create the 11th column
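A sketch of the second option (the column name and datatype here are assumptions):

```sql
-- 1. Drop the extra column so the source and target tables match.
alter table tab drop column col11;

-- 2. Run the import with TABLE_EXISTS_ACTION=APPEND (impdp, outside SQL).

-- 3. Re-create the column afterwards.
alter table tab add (col11 varchar2(50));
```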

Hive not creating separate directories for skewed table

My Hive version is 1.2.1. I am trying to create a skewed table, but it clearly doesn't seem to be working. Here is my table creation script:
CREATE EXTERNAL TABLE IF NOT EXISTS mydb.mytable
(
country string,
payload string
)
PARTITIONED BY (year int,month int,day int,hour int)
SKEWED BY (country) ON ('USA','Brazil') STORED AS DIRECTORIES
STORED AS TEXTFILE;
INSERT OVERWRITE TABLE mydb.mytable PARTITION(year = 2019, month = 10, day=05, hour=18)
SELECT country,payload FROM mydb.mysource;
The select query returns names of countries and some associated string data (payload). Based on the way I specified skewing on the column 'country', I was expecting the insert statement to create separate directories for USA and Brazil (the select query returns enough rows with country as USA and Brazil), but this clearly didn't happen. Instead, Hive created a directory called HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME, and all the values went into a single file in that directory.
A skewed table is only supposed to send rows with default values (those not specified in the table creation statement) to the common directory (which is what HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME seems to be) and should create dedicated directories for the rows with skew values. But instead everything goes to the default directory, and the other directories aren't even created. Do I have to toggle any Hive options to make this work?
It looks like an old bug that doesn't appear to be fixed yet: https://issues.apache.org/jira/browse/HIVE-13697. Internally, when Hive stores the skew values specified during table creation, it converts them to lower case before writing them to the metastore. That's why the workaround for now is to convert the case in the select statement so the rows go to the right bucket. I tested this and it works.
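Applying that workaround to the insert above (a sketch; whether lower-casing the stored data is acceptable depends on your use case):

```sql
-- HIVE-13697: skew values are lower-cased in the metastore, so lower-case
-- the skew column on insert to make rows land in the skew directories.
INSERT OVERWRITE TABLE mydb.mytable PARTITION(year = 2019, month = 10, day = 05, hour = 18)
SELECT lower(country), payload FROM mydb.mysource;
```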

Oracle External Table Columns based on header row or star(*)

I have a text file with around 100 columns terminated by "|", and I need to get a few of the columns from this file into my external table. The solutions I have are either to specify all columns under the ACCESS PARAMETERS section in the same order as the file and define only the required columns in the CREATE TABLE definition, or to define all columns in the same order in the CREATE TABLE itself.
Can I avoid defining all the columns in the query? Is it possible to derive the columns from the header row, provided I have the column names as the first row?
Or is it at least possible to get all columns, like a SELECT *, without mentioning each column?
Below is the code I use:
drop table lz_purchase_data;
CREATE TABLE lz_purchase_data
( REC_ID CHAR(50) )
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY "FILEZONE"
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE SKIP 1
    FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"'
    LRTRIM MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('PURCHASE_DATA.txt')
)
REJECT LIMIT UNLIMITED
PARALLEL 2;
select * from LZ_PURCHASE_DATA;

WHY does this simple Hive table declaration work? As if by magic

The following HQL works to create a Hive table in HDInsight that I can successfully query. But I have several questions about WHY it works:
My data rows are, in fact, terminated by carriage return/line feed, so why does COLLECTION ITEMS TERMINATED BY '\002' work? And what is \002 anyway? And no location for the blob is specified, so, again, why does this work?
All attempts at creating the same table and specifying "CREATE EXTERNAL TABLE...LOCATION '/user/hive/warehouse/salesorderdetail'" have failed. The table is created but no data is returned. Leave off "external" and don't specify any location, and suddenly it works. Wtf?
CREATE TABLE IF NOT EXISTS default.salesorderdetail(
SalesOrderID int,
ProductID int,
OrderQty int,
LineTotal decimal
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
STORED AS TEXTFILE
Any insights are greatly appreciated.
UPDATE: Thanks for the help so far. Here's the exact syntax I'm using to attempt the external table creation. (I've only changed the storage account name.) I don't see what I'm doing wrong.
drop table default.salesorderdetailx;
CREATE EXTERNAL TABLE default.salesorderdetailx(SalesOrderID int,
ProductID int,
OrderQty int,
LineTotal decimal)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
STORED AS TEXTFILE
LOCATION 'wasb://mycn-1@my.blob.core.windows.net/mycn-1/hive/warehouse/salesorderdetailx'
When you create your cluster in HDInsight, you have to specify the underlying blob storage. It assumes that you are referencing that blob storage. You don't need to specify a location because your query is creating an internal table (see answer #2 below), which is created at a default location. External tables need to specify a location in Azure blob storage (outside of the cluster) so that the data in the table is not deleted when the cluster is dropped. See the Hive DDL documentation for more information.
By default, tables are created as internal, and you have to specify "external" to make them external tables.
Use EXTERNAL tables when:
Data is used outside Hive
You need data to be updateable in real time
Data is needed when you drop the cluster or the table
Hive should not own data and control settings, directories, etc.
Use INTERNAL tables when:
You want Hive to manage the data and storage
Short term usage (like a temp table)
Creating table based on existing table (AS SELECT)
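The contrast can be sketched in HiveQL (the table names and the storage path below are placeholders):

```sql
-- Internal (managed) table: Hive owns the data; DROP TABLE deletes it.
CREATE TABLE default.demo_internal (id int)
STORED AS TEXTFILE;

-- External table: data lives at an explicit location outside Hive's
-- control and survives DROP TABLE (and, in HDInsight, cluster deletion).
CREATE EXTERNAL TABLE default.demo_external (id int)
STORED AS TEXTFILE
LOCATION 'wasb://mycontainer@myaccount.blob.core.windows.net/demo';
```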
Does the container "user/hive/warehouse/salesorderdetail" exist in your blob storage? That might explain why it is failing for your external table query.
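As for the '\002' in the ROW FORMAT clause: it is the octal escape for the ASCII control character 0x02 (Ctrl-B), which is Hive's default collection-items delimiter, just as '\003' (Ctrl-C) is the default for map keys. Since the table has no array or map columns, restating those defaults is harmless:

```shell
# '\002' is the single byte 0x02 (Ctrl-B); od shows its hex value.
printf '\002' | od -An -tx1
```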

Oracle SQL save file name from LOCATION as a column in external table

I have several input files being read into an external table in Oracle. I want to run some queries across the content from all the files; however, there are some queries where I would like to filter the data based on the input file it came from. Is there a way to access the name of the source file in a select statement against an external table, or to somehow create a column in the external table that includes the location source?
Here is an example:
CREATE TABLE MY_TABLE (
  first_name CHAR(100 BYTE),
  last_name CHAR(100 BYTE)
)
ORGANIZATION EXTERNAL
(
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY TMP
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE
    SKIP 1
    BADFILE 'my_table.bad'
    DISCARDFILE 'my_table.dsc'
    LOGFILE 'my_table.log'
    FIELDS TERMINATED BY 0x'09' OPTIONALLY ENCLOSED BY '"' LRTRIM
    MISSING FIELD VALUES ARE NULL
    (
      first_name CHAR(100),
      last_name
    )
  )
  LOCATION (TMP:'file1.txt', 'file2.txt')
)
REJECT LIMIT 100;
select distinct last_name
from MY_TABLE
where location like 'file2.txt' -- This is the part I don't know how to code
Any suggestions?
There is always the option of adding the file name to the input file itself as an additional column. Ideally, I would like to avoid this workaround.
The ALL_EXTERNAL_LOCATIONS data dictionary view contains information about external table locations; there are also DBA_* and USER_* versions.
Edit: (It would help if I read the question thoroughly.)
You don't just want to read the location for the external table; you want to know which row came from which file. Basically, you need to:
Create a shell script that prepends the file location to the file contents and writes the result to standard output.
Add the PREPROCESSOR directive to your external table definition to execute the script.
Alter the external table definition to include a column for the filename prepended in the first step.
Here is an AskTom article explaining it in detail.
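A sketch of those steps (the script name, directory, and delimiter are assumptions, not the article's exact code; PREPROCESSOR requires Oracle 11g Release 2 or later). Oracle invokes the script with the location file's path as $1 and reads the script's standard output:

```shell
#!/bin/sh
# add_filename.sh -- hypothetical PREPROCESSOR script: prefix every record
# with the name of the data file Oracle passes in as $1.
add_filename() {
  awk -v f="$1" '{ print f "\t" $0 }' "$1"
}

# quick demo on a throwaway file
printf 'John\tSmith\n' > /tmp/file1.txt
add_filename /tmp/file1.txt
```

The table definition would then add PREPROCESSOR TMP:'add_filename.sh' to its ACCESS PARAMETERS, plus a leading source_file column and a matching field in the field list, after which a predicate like `where source_file like '%file2.txt'` works normally.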
