Hive: Create New Table from Existing Partitioned Table - hadoop

I'm using Amazon's Elastic MapReduce, and I have a Hive table built from a series of log files stored in Amazon S3 and split into folders by day, like so:
data/day=2011-09-01/log_file.tsv
data/day=2011-09-02/log_file.tsv
I am trying to create an additional table that filters out some unwanted activity in these log files, but I can't figure out how to do this and keep getting errors such as:
FAILED: Error in semantic analysis: need to specify partition columns because the destination table is partitioned.
If my initial table create statement looks something like this:
CREATE EXTERNAL TABLE IF NOT EXISTS table1 (
... fields ...
)
PARTITIONED BY ( DAY STRING )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://bucketname/data/';
That initial table works fine and I've been able to query it with no problems.
How, then, should I create a new table that shares the structure of the first one but filters out some of the data? This doesn't seem to work:
CREATE EXTERNAL TABLE IF NOT EXISTS table2 LIKE table1;
FROM table1
INSERT OVERWRITE TABLE table2
SELECT * WHERE
col1 = '%somecriteria%' AND
more criteria...
;
As I've stated above, this returns:
FAILED: Error in semantic analysis: need to specify partition columns because the destination table is partitioned.
Thanks!

This always works for me:
CREATE EXTERNAL TABLE IF NOT EXISTS table2 LIKE table1;
INSERT OVERWRITE TABLE table2 PARTITION (day) SELECT col1, col2, ..., day FROM table1;
ALTER TABLE table2 RECOVER PARTITIONS;
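Notice that I've added 'day' as the last column in the SELECT statement: with a dynamic-partition insert, Hive takes the partition values from the trailing columns of the SELECT. Also notice the ALTER TABLE line, which is necessary for Hive to become aware of the partitions that were newly created in table2. (RECOVER PARTITIONS is the Amazon EMR syntax; the Apache Hive equivalent is MSCK REPAIR TABLE table2.)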
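To make the partition-recovery step more concrete, here is a minimal Python sketch of the idea behind it: partitions are encoded in directory names of the form name=value, so they can be read back from the file paths. This is only a conceptual illustration, not how Hive or EMR implements it; the function names parse_partition_spec and discover_partitions are hypothetical, and the paths are the two from the question.

```python
def parse_partition_spec(key, table_root):
    """Return the partition columns encoded in a path under table_root."""
    if not key.startswith(table_root):
        raise ValueError("key is not under the table root")
    relative = key[len(table_root):].strip("/")
    spec = {}
    # Every directory segment shaped like name=value is one partition column.
    # The last segment is the data file itself, so it is skipped.
    for segment in relative.split("/")[:-1]:
        name, sep, value = segment.partition("=")
        if sep:
            spec[name] = value
    return spec


def discover_partitions(keys, table_root):
    """Collect the distinct partition specs under table_root, in first-seen order."""
    seen = []
    for key in keys:
        spec = parse_partition_spec(key, table_root)
        if spec and spec not in seen:
            seen.append(spec)
    return seen


keys = [
    "data/day=2011-09-01/log_file.tsv",
    "data/day=2011-09-02/log_file.tsv",
]
partitions = discover_partitions(keys, "data/")
print(partitions)
```

A file that sits directly under the table root, with no day=... directory, yields an empty spec and is therefore not reported as a partition.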

I have never used the LIKE option, so thanks for showing me that. Will it actually create all of the partitions that the first table has as well? If not, that could be the issue. You could try using dynamic partitions:
create external table if not exists table2 like table1;
insert overwrite table table2 partition(day) select col1, col2, day from table1;
This might not be the best solution, as I think you have to list your columns in the select clause, with the partition column last (as well as naming the partition column in the partition clause).
You must also turn on dynamic partitioning first:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
I hope this helps.
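If the statements are generated from a script, the ordering rule is easy to get wrong, so a small helper can enforce it. The following is a rough sketch under stated assumptions: dynamic_partition_insert is a hypothetical helper, table and column names are pasted into the text without any escaping, and the WHERE clause is the placeholder filter from the question rather than a working condition.

```python
def dynamic_partition_insert(source, target, columns, partition_columns, where=None):
    """Render the SET statements and the INSERT for a dynamic-partition copy."""
    if not partition_columns:
        raise ValueError("at least one partition column is required")
    # Partition columns go last in the SELECT list, in PARTITION-clause order.
    select_list = ", ".join(list(columns) + list(partition_columns))
    statements = [
        "SET hive.exec.dynamic.partition=true;",
        "SET hive.exec.dynamic.partition.mode=nonstrict;",
    ]
    insert = "INSERT OVERWRITE TABLE {} PARTITION ({}) SELECT {} FROM {}".format(
        target, ", ".join(partition_columns), select_list, source)
    if where:
        insert += " WHERE {}".format(where)
    statements.append(insert + ";")
    return statements


statements = dynamic_partition_insert(
    "table1", "table2", ["col1", "col2"], ["day"],
    where="col1 = '%somecriteria%'",  # placeholder filter copied from the question
)
for statement in statements:
    print(statement)
```

The helper only builds text; running the statements still requires a Hive session, and the names passed in should come from a trusted source because nothing is quoted.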

Related

How to copy complete table structure including partitions into another table in oracle?

I need to do a partition exchange. I have table_A, which is partitioned by File_ID and sub-partitioned by file_type. I need to create a temp table within my PL/SQL block that has the same structure as table_A (including partitions). I need something like the following:
create table temp_tab as
select * from table_A partition(SYS_P8924) where file_id=1000
partition by file_type
I know this is a bad code example, but can I achieve this somehow? I'd appreciate any help.
No worries guys! I got the solution:
create table temp_tab
partition by list (file_type)
(
partition values ('P1'),
partition values ('P2')
)
as
select * from table_A partition(SYS_P8924) where file_id=1000
Since there were two sub-partitions, the new table is list-partitioned into two sets.

Add partition to hive table with no data

I am trying to create a Hive table that has the same columns as another (partitioned) table. I use the following query to do this:
CREATE TABLE destTable STORED AS PARQUET AS select * from srcTable where 1=2;
Apparently I cannot use 'PARTITIONED BY(col_name)' because destTable must not be partitioned. But I want to specify that destTable should be partitioned by a column (the same one as srcTable) before I add data to it.
Is there a way to do that?
As you mentioned, destTable cannot be a partitioned table when it is created with CREATE TABLE ... AS SELECT, so there is no way to do this directly. Also, destTable cannot be an external table.
In this situation you will need to create a temporary "staging_table" (an un-partitioned, Hive-managed table) to hold the data.
Step 1: Transfer everything from srcTable to the staging_table
Step 2: Create a partitioned destTable and do:
INSERT OVERWRITE TABLE destTable PARTITION(xxxx)
SELECT * FROM staging_table;
Hope this helps.

Hive - How to query a table to get its own name?

I want to write a query such that it returns the table name (of the table I am querying) and some other values. Something like:
select table_name, col1, col2 from table_name;
I need to do this in Hive. Any idea how I can get the table name of the table I am querying?
Basically, I am creating a lookup table that stores the table name and some other information on a daily basis in Hive. Since Hive does not (at least the version we are using) support full-fledged INSERTs, I am trying to use the workaround where we can INSERT into a table with a SELECT query that queries another table. Part of this involves actually storing the table name as well. How can this be achieved?
For the purposes of my use case, this will suffice:
select 'table_name', col1, col2 from table_name;
It returns the table name with the other columns that I will require.
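Since the lookup table is filled daily from several source tables, the literal has to be written once per table. A minimal sketch of generating those queries is below; lookup_select and the source_table alias are hypothetical names, and the name check is a conservative assumption (letters, digits, underscores, and an optional database prefix) rather than Hive's full identifier rules.

```python
import re

# Conservative pattern: table or db.table, made of letters, digits and underscores.
_NAME = re.compile(r"^[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)?$")


def lookup_select(table_name, columns):
    """Render a SELECT whose first column is the table's own name as a string literal."""
    if not _NAME.match(table_name):
        # Refuse anything that could break out of the quoted literal.
        raise ValueError("unexpected table name: {!r}".format(table_name))
    return "SELECT '{0}' AS source_table, {1} FROM {0};".format(
        table_name, ", ".join(columns))


queries = [lookup_select(name, ["col1", "col2"]) for name in ["table_a", "table_b"]]
for query in queries:
    print(query)
```

Each generated SELECT can then be used as the body of an INSERT into the lookup table, as described in the question.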

Hive load specific columns

I am interested in loading specific columns into a table created in Hive.
Is it possible to load specific columns directly, or should I load all the data and create a second table to SELECT the specific columns?
Thanks
Yes, you have to load all the data, like this:
LOAD DATA [LOCAL] INPATH '/Your/Path' [OVERWRITE] INTO TABLE yourTable;
LOCAL means that your file is on your local file system and not in HDFS; OVERWRITE means that the current data in the table will be deleted.
Then you create a second table with only the fields you need and execute this query:
INSERT OVERWRITE TABLE yourNewTable
SELECT col1, col2 -- list only the columns you need
FROM yourOldTable;
The suggested approach is to create an external table in Hive that maps onto the data you have, and then create a new table with only the specific columns using the CREATE TABLE ... AS SELECT command:
create table new_table_name as select column_list from source_table_name;
For example, the statement looks like this:
create table employee as select id as id,emp_name as name from emp;
Try this:
Insert into table_name
(
-- columns you want to insert values into, in lowercase
)
select columns_you_need from source_table;

convert normal column as partition column in hive

I have a table with 3 columns, and I need to turn one of the columns into a partition column.
Is this possible? If not, how can I add a partition to an existing table? I used the syntax below:
create table t1 (eno int, ename string ) row format delimited fields terminated by '\t';
load data local inpath '/....path/' into table t1;
alter table t1 add partition (p1='india');
but I am getting errors.
Does anyone know how to add a partition to an existing table?
Thanks in advance.
I don't think this is directly possible. Hive would have to completely rearrange and split the files in HDFS because adding the partition would impose a new directory structure.
What I suggest you do is simply create a new table with the desired schema and partition, and insert everything from the first into the second.
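To see why the data has to be rewritten, it can help to model what a partitioned layout looks like. The sketch below is a simplified simulation, not Hive's implementation: partition_rows is a hypothetical function, the rows are made-up sample data, and the third column name country is an assumption standing in for whichever column becomes the partition.

```python
from collections import defaultdict


def partition_rows(rows, columns, partition_column):
    """Group rows by partition value, dropping that column from the stored row."""
    index = columns.index(partition_column)
    layout = defaultdict(list)
    for row in rows:
        directory = "{}={}".format(partition_column, row[index])
        # The partition value lives in the directory name, not in the data file.
        layout[directory].append(row[:index] + row[index + 1:])
    return dict(layout)


rows = [
    (1, "asha", "india"),
    (2, "bob", "usa"),
    (3, "chen", "india"),
]
layout = partition_rows(rows, ["eno", "ename", "country"], "country")
for directory, stored in layout.items():
    print(directory, stored)
```

Every row ends up in a different place and with one fewer stored column, which is why copying into a new partitioned table is the practical route.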
You can't add a partition to a table that was created without a PARTITIONED BY clause.
But you can get the same result with the following steps.
Create a new partitioned table and insert the data from the old table into the new one.
/*Original table structure*/
CREATE TABLE original_table(
c1 string,
c2 string,
c3 string)
STORED AS ORC;
/*Partitioned table structure*/
CREATE TABLE partitioned_table(
c1 string,
c2 string)
partitioned by (c3 string)
STORED AS ORC;
/*load data from original_table to partitioned_table*/
insert into
table partitioned_table partition(c3)
select c1, c2, c3
from original_table;
/*rename original_table to old_table. You can just drop it instead if you no longer need it*/
ALTER TABLE original_table RENAME TO old_table;
/*rename partitioned_table to original_table*/
ALTER TABLE partitioned_table RENAME TO original_table;
I think there is no way to convert an existing column of a table into a partition column.
If you want to add a partition to a table that is already partitioned, use the ALTER command as you have already done. If you are dealing with an external table, specify the LOCATION as well. I am not sure whether a partition can be added with the ALTER command for managed tables.
