Spectrum: Same External Table Shows in Multiple Schemas (svv_external_tables)

It's a really simple test, actually. I create a couple of external schemas, create an external table in one of them, and then querying svv_external_tables shows the table in ALL of the schemas! What am I missing?
create external schema mytestschema from data catalog
database 'mytestdb'
iam_role 'arn:aws:iam::123456789:role/spectrumrole'
;
create external table mytestdb.mytestschema.newtable (
col1 varchar(200),
col2 varchar(200),
col3 varchar(200)
)
partitioned by (cycle_date varchar(20) )
stored as parquet
location 's3://s3loc';
select * from svv_external_tables;

An external schema doesn't hold the table definitions; it just holds the connection parameters to a database in the data catalog. Put the other way: whatever is in the data catalog database is shown in every external schema that points to it.
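A minimal sketch of how this could be checked, assuming a second schema named mytestschema2 and a separate catalog database named mytestdb_isolated (both names are hypothetical and not from the question). It lists which external schemas map to which catalog database, and then shows one way a schema might be pointed at its own catalog database so that its tables are not shared.

```sql
-- Assumption: a second, hypothetical schema pointing at the same catalog database.
create external schema mytestschema2 from data catalog
database 'mytestdb'
iam_role 'arn:aws:iam::123456789:role/spectrumrole';

-- Which external schemas map to which catalog database?
select schemaname, databasename
from svv_external_schemas
order by databasename, schemaname;

-- newtable is expected to appear once per schema that maps to 'mytestdb'.
select schemaname, tablename, location
from svv_external_tables
where tablename = 'newtable'
order by schemaname;

-- To keep tables apart, point a schema at a different catalog database
-- ('mytestdb_isolated' and 'myisolatedschema' are hypothetical names).
create external schema myisolatedschema from data catalog
database 'mytestdb_isolated'
iam_role 'arn:aws:iam::123456789:role/spectrumrole'
create external database if not exists;
```

Tables created through myisolatedschema would then live in the other catalog database and should not show up under the schemas that point to mytestdb.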

Related

Cannot create an external hadoop table in db2

I have a problem creating an external Hadoop table in Db2.
I use this create statement:
CREATE EXTERNAL HADOOP TABLE DATA_LAKE.TABLE_TEST (
Id int,
blabla varchar(10)
)
but when I run this I get an error saying:
The name of the object to be created is identical to the existing name
"TABLE_TEST" of type "Table".. SQLCODE=-601,SQLSTATE=42710.
On my file share I don't have any table with this name, and in Db2 under the DATA_LAKE schema I don't have a TABLE_TEST table either. I also tried to find where this table might be, maybe in a catalog, but I didn't find anything.
SELECT * FROM SYSCAT.TABLES
WHERE TABNAME LIKE '%TABLE_TEST%'
Any help would be appreciated.

Create partitioned table from non partitioned table

Suppose I have internal orc non partitioned table in Hive:
CREATE TABLE IF NOT EXISTS non_partitioned_table(
id STRING,
company STRING,
city STRING,
country STRING
)
STORED AS ORC;
Is it possible to somehow create a Parquet partitioned table this way, via a CREATE TABLE ... LIKE statement?
create partitioned_table PARTITION ON (date STRING) like non_partitioned_table;
alter table partitioned_table SET FILEFORMAT PARQUET;
This create statement doesn't work.
So basically I need to add a column and make the table partitioned by this column. I know that I can create the table with a plain CREATE TABLE statement, but I need to do it with CREATE TABLE LIKE and then alter it somehow.
Your table doesn't have a date column to begin with, so you're going to have to make a new one.
You might be able to ALTER TABLE non_partitioned_table ADD PARTITION, but I haven't tried that myself. If you want to try it, I would suggest the partition location be outside of the existing HDFS directory.
Anyway, the CREATE TABLE LIKE DDL does not support PARTITIONED BY:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
You need to copy the schema that DESCRIBE shows for the first table, then alter it to add PARTITIONED BY and, optionally, STORED AS. (SET FILEFORMAT PARQUET only changes the table metadata; it doesn't convert the existing data files in place.)
Then, if you want the data in the new table, you need to INSERT OVERWRITE TABLE.
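The approach described above could be sketched roughly as follows. This is only an illustration under some assumptions: the partition column is named load_date here (a hypothetical name, chosen because DATE is a reserved word in recent Hive versions), and the partition value '2020-01-01' is a made-up constant, since the source table has no date column to derive it from.

```sql
-- Recreate the schema by hand, adding the partition column and the new file format.
CREATE TABLE IF NOT EXISTS partitioned_table (
  id STRING,
  company STRING,
  city STRING,
  country STRING
)
PARTITIONED BY (load_date STRING)   -- hypothetical partition column name
STORED AS PARQUET;

-- Copy the rows into a single static partition.
-- '2020-01-01' is a placeholder value, not something from the original data.
INSERT OVERWRITE TABLE partitioned_table PARTITION (load_date = '2020-01-01')
SELECT id, company, city, country
FROM non_partitioned_table;
```

Because the rows are rewritten by the INSERT, they end up as Parquet files in the new table regardless of the ORC format of the source.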

update/dropping external table in Hive

I'm working with Hive tables and I have a problem with updating and dropping an external table.
I created 2 external tables, T1 and T2, with the same attributes:
create external table T1(
nom string,
prenom string,
age int);
With the query:
insert overwrite table T2
select
nom,
prenom,
age from T1;
I can update T2 with the data in T1, but after doing:
drop table T2;
and then recreating it with create external table T2..., the table automatically contains everything that was in T2 before dropping, while I would like to have an empty table.
Is this normal? Could anybody explain why, and/or recommend a method?
Thanks.
Dropping an external table removes only its metadata; it would not have removed the data present in HDFS. The files will still be available in the folder
/user/hive/warehouse/dbname.db/tablename
Try removing the data from HDFS before creating the table a second time, or specify some other location in the create query itself.
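As a rough sketch of the second suggestion, the table can be recreated on a fresh directory so that it starts out empty. The path below is hypothetical. The commented-out alternative relies on switching the table to managed before dropping it so that the drop also deletes the files; this is a commonly used trick, but whether it is permitted depends on the Hive version and configuration, so treat it as an assumption to verify.

```sql
-- Option 1: recreate the table on a new, empty directory.
DROP TABLE IF EXISTS T2;

CREATE EXTERNAL TABLE T2 (
  nom string,
  prenom string,
  age int
)
LOCATION '/tmp/hypothetical/t2_empty';   -- hypothetical path

-- Option 2 (alternative, run instead of Option 1's DROP):
-- make the table managed first, so that DROP TABLE also removes its files.
-- ALTER TABLE T2 SET TBLPROPERTIES ('EXTERNAL'='FALSE');
-- DROP TABLE T2;
```

With Option 1 the old files stay where they were; they are simply no longer under the new table's location.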

creating partition in external table in hive

I have successfully created and added dynamic partitions in an internal table in Hive, using the following steps:
1- created a source table
2- loaded data from local into the source table
3- created another table with partitions - partition_table
4- inserted the data into this table from the source table, which created all the partitions dynamically
My question is, how do I do this with an external table? I have read many articles on this, but I am confused: do I have to specify the path to the already existing partitions when creating partitions for an external table?
Example:
Step 1:
create external table table1 ( name string, age int, height int)
location 'path/to/dataFile/in/HDFS';
Step 2:
alter table table1 add partition(age)
location 'path/to/already/existing/partition'
I am not sure how to proceed with partitioning in external tables. Can somebody please help with a step-by-step description?
Thanks in advance!
Yes, you have to tell Hive explicitly what your partition field is.
Suppose you have the following HDFS directory on which you want to create an external table:
/path/to/dataFile/
Let's say this directory already has data stored (partitioned) department-wise as follows:
/path/to/dataFile/dept1
/path/to/dataFile/dept2
/path/to/dataFile/dept3
Each of these directories has a bunch of files, where each file contains the actual comma-separated data for the fields, say name, age, height.
e.g.
/path/to/dataFile/dept1/file1.txt
/path/to/dataFile/dept1/file2.txt
Now let's create an external table on this:
Step 1. Create external table:
CREATE EXTERNAL TABLE testdb.table1(name string, age int, height int)
PARTITIONED BY (dept string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/path/to/dataFile/';
Step 2. Add partitions:
ALTER TABLE testdb.table1 ADD PARTITION (dept='dept1') LOCATION '/path/to/dataFile/dept1';
ALTER TABLE testdb.table1 ADD PARTITION (dept='dept2') LOCATION '/path/to/dataFile/dept2';
ALTER TABLE testdb.table1 ADD PARTITION (dept='dept3') LOCATION '/path/to/dataFile/dept3';
Done. Run a select query once to verify that the data loaded successfully.
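The verification step could look something like the following sketch. The queries only use the table and partition values from the steps above; the row_count alias is just an illustrative name.

```sql
-- List the partitions Hive now knows about.
SHOW PARTITIONS testdb.table1;

-- Count rows per department to confirm every partition is readable.
SELECT dept, COUNT(*) AS row_count
FROM testdb.table1
GROUP BY dept;

-- Spot-check a few rows from one partition.
SELECT name, age, height
FROM testdb.table1
WHERE dept = 'dept1'
LIMIT 10;
```

If a department is missing from the first two results, its ADD PARTITION statement probably did not run or points at the wrong directory.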
1. Set the properties below:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
2. Create the external partitioned table:
create external table table1 (name string, height int)
partitioned by (age int)
location 'path/to/dataFile/in/HDFS';
3. Insert data into the partitioned table from the source table.
Basically, the process is the same. It's just that you create an external partitioned table and provide the HDFS path for the table, under which Hive will create and store the partitions.
Hope this helps.
The proper way to do it:
Create the table and declare it as partitioned. The partition column must not also appear in the regular column list.
create external table table1 (name string, height int)
partitioned by (age int)
stored as ****(your format)
location 'path/to/dataFile/in/HDFS';
Now you have to refresh the partitions in the Hive metastore.
msck repair table table1;
This registers all partition directories found under the table location in the Hive metastore, provided they follow the age=<value> naming convention.
You can run msck repair table at any point during your process to have the metastore updated.
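To make the role of the directory layout concrete, here is a small sketch. The partition values (25, 30, 40) and the directory name some_other_dir are hypothetical; the point is only that msck repair table discovers directories named in the key=value style, while other directories have to be registered explicitly.

```sql
-- Assumed directory layout under the table location (hypothetical values):
--   path/to/dataFile/in/HDFS/age=25/...
--   path/to/dataFile/in/HDFS/age=30/...

-- Pick up the key=value directories that are not yet in the metastore.
MSCK REPAIR TABLE table1;

-- Check which partitions are now registered.
SHOW PARTITIONS table1;

-- A directory that does not follow the naming convention can be added by hand.
ALTER TABLE table1 ADD IF NOT EXISTS PARTITION (age=40)
LOCATION 'path/to/dataFile/in/HDFS/some_other_dir';   -- hypothetical directory
```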
Follow the steps below:
Create a temporary/source table:
create table source_table(name string,age int,height int) row format delimited fields terminated by ',';
Use the delimiter from your file instead of ','.
Load data into the source table:
load data local inpath 'path/to/dataFile/in/HDFS' into table source_table;
Create the external table with a partition:
create external table external_dynamic(name string,height int)
partitioned by (age int)
location 'path/to/dataFile/in/HDFS';
Set dynamic partition mode to nonstrict:
set hive.exec.dynamic.partition.mode=nonstrict;
Load data into the partitioned external table from the source table, selecting the partition column last:
insert into table external_dynamic partition(age)
select name, height, age from source_table;
That's it.
You can check the partition information using
show partitions external_dynamic;
You can even check whether it is an external table or not using
describe formatted external_dynamic;
An external table is a type of table in Hive where the data is not moved to the Hive warehouse. That means that even if you delete the table, the data still persists and you will always get the latest data, which is not the case with a managed table.

Hive partition folder changes after import to another table

I have a Hive table TEST with this configuration:
create external table if not exists TEST (
ID bigint,
ACTIVITY_ID string,
BATCH_NBR
)
PARTITIONED BY (year INT, month INT, day INT)
CLUSTERED BY (BATCH_NBR) into 20 buckets
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/lake/hive/test';
And I have files in this location, which load into the Hive table without problems:
/user/lake/hive/test/2013/01/01/part-r-00001
Now if I create another table STORE and insert some data from this TEST table, the folder structure changes. I was expecting that, after loading the same data, the location for the STORE table would contain something like this:
/user/core/store/2014/07/03/batch123231.1313
But the location looks like this instead:
/user/core/store/year=2013/month=01/day=01/
I'm using the query insert overwrite table STORE select * from TEST; to load the STORE table from TEST.
How can I load that table and preserve the same folder structure in the destination?
An internal table in Hive follows its own default folder structure under the /apps/hive/warehouse folder and will not preserve the folder structure when the data is loaded from an external Hive table. I was using an internal table for STORE, so it was not working as expected.
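One way the fix might look, as a sketch only: make STORE an external table and register each partition with an explicit location, so that Hive writes into that directory instead of a year=/month=/day= path. The column types are assumptions (the original DDL omits the type of BATCH_NBR, so STRING is used here purely for illustration), and the bucketing clause is left out for brevity. The data file names inside the directory are still chosen by Hive.

```sql
-- Assumption: BATCH_NBR is a STRING; the original definition does not say.
CREATE EXTERNAL TABLE IF NOT EXISTS STORE (
  ID bigint,
  ACTIVITY_ID string,
  BATCH_NBR string
)
PARTITIONED BY (year INT, month INT, day INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/core/store';

-- Register the partition with a custom directory before loading it.
ALTER TABLE STORE ADD IF NOT EXISTS PARTITION (year=2013, month=1, day=1)
LOCATION '/user/core/store/2013/01/01';

-- Load one partition at a time; the rows go to the partition's own location.
INSERT OVERWRITE TABLE STORE PARTITION (year=2013, month=1, day=1)
SELECT ID, ACTIVITY_ID, BATCH_NBR
FROM TEST
WHERE year = 2013 AND month = 1 AND day = 1;
```

A dynamic-partition insert (select * from TEST) would create any partitions that are not yet registered under the default year=/month=/day= layout, which is why the partitions are added explicitly first in this sketch.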
