Create new hive table from existing external partitioned table - hadoop

I have an external partitioned table with almost 500 partitions. I am trying to create another external table with the same properties as the old table, then copy all the partitions from the old table into the newly created one. Below is my create table query. My old table is stored as TEXTFILE and I want to save the new one as ORC.
add jar json_jarfile;
CREATE EXTERNAL TABLE new_table_orc (col1,col2,col3...col27)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (....)
STORED AS orc
LOCATION 'path';
After creating this table, I am using the below query to insert the partitions from the old table into the new one. I only want to copy a few columns from the original table to the new table.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE new_table_orc PARTITION (year,month,day) SELECT col2,col3,col6,year,month,day FROM old_table;
ALTER TABLE new_table_orc RECOVER PARTITIONS;
I am getting the below error:
FAILED: SemanticException [Error 10044]: Line 2:23 Cannot insert into target table because column number/types are different 'day': Table insclause-0 has 27 columns, but query has 6 columns.
Any suggestions?

Your query has to match the number and type of columns in your new table. You have created your new table with 27 regular columns and 3 partition columns, but your query only selects six columns.
If you really only care about those six columns, then modify the new table to have only those columns. If you do want all columns, then modify your select statement to select all of those columns.
You also will not need the "recover partitions" statement. When you insert into a table with dynamic partitions, it will create those partitions both in the filesystem and in the metastore.
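For illustration, a minimal sketch of the narrowed-down table and the matching insert, assuming the three data columns you want are col2, col3, and col6 and that they are strings (adjust the types to match old_table). Note that ORC supplies its own SerDe, so the JSON SerDe from the TEXTFILE table is not needed on the new table:
CREATE EXTERNAL TABLE new_table_orc (
  col2 STRING,   -- assumed type; match old_table
  col3 STRING,   -- assumed type
  col6 STRING    -- assumed type
)
PARTITIONED BY (year STRING, month STRING, day STRING)
STORED AS ORC
LOCATION 'path';

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE new_table_orc PARTITION (year, month, day)
SELECT col2, col3, col6, year, month, day FROM old_table;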

Related

Create partitioned table from non partitioned table

Suppose I have an internal, non-partitioned ORC table in Hive:
CREATE TABLE IF NOT EXISTS non_partitioned_table(
id STRING,
company STRING,
city STRING,
country STRING
)
STORED AS ORC;
Is it possible to somehow create a partitioned Parquet table from it via a CREATE TABLE LIKE statement, like this?
create partitioned_table PARTITION ON (date STRING) like non_partitioned_table;
alter table partitioned_table SET FILEFORMAT PARQUET;
This create statement doesn't work.
So basically I need to add a column and make the table partitioned by that column. I know that I can create the table through a plain CREATE TABLE statement, but I need to do it with CREATE TABLE LIKE and then alter it somehow.
Your table doesn't have a date column to begin with, so you're going to have to make a new one.
You might be able to ALTER TABLE non_partitioned_table ADD PARTITION, but I haven't tried that myself. If you want to try it, I would suggest that the partition location be outside of the existing HDFS directory.
Anyways, the CREATE-TABLE-LIKE DDL does not support PARTITIONED BY:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
You need to copy the DESCRIBE TABLE schema from the first table, then alter it to add the PARTITIONED BY clause, and optionally specify STORED AS. (SET FILEFORMAT PARQUET only changes the table metadata; it does not rewrite the existing data in place.)
Then, if you want the data in the new table, you need to INSERT OVERWRITE TABLE it from the old one.
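Putting the pieces together, a sketch of that workflow; the partition value below is a hypothetical constant, and in practice you would derive date from your data or load parameters:
CREATE TABLE IF NOT EXISTS partitioned_table (
  id STRING,
  company STRING,
  city STRING,
  country STRING
)
PARTITIONED BY (`date` STRING)   -- backticks: date is a reserved word in newer Hive
STORED AS PARQUET;

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE partitioned_table PARTITION (`date`)
SELECT id, company, city, country, '2017-01-01' AS `date`   -- hypothetical value
FROM non_partitioned_table;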

How can I partition a hive table by (only) a portion of a timestamp column?

Assume I have a Hive table that includes a TIMESTAMP column that is frequently (almost always) included in the WHERE clauses of a query. It makes sense to partition this table by the TIMESTAMP field; however, to keep to a reasonable cardinality, it makes sense to partition by day (not by the maximum resolution of the TIMESTAMP).
What's the best way to achieve this? Should I create an additional column (DATE) and partition on that? Or is there a way to achieve the partition without creating a duplicate column?
It's not a new column but a pseudo-column. You should re-create your table, adding the partitioning specification like this:
create table table_name (
id int,
name string,
timestamp string
)
partitioned by (date string);
Then you load the data, creating the partitions dynamically, like this:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
FROM table_name_old tno
INSERT OVERWRITE TABLE table_name PARTITION(date)
SELECT tno.id, tno.name, tno.timestamp, substring(tno.timestamp, 0, 10);
Now if you select everything from your table you will see a new column for the partition. But bear in mind that a Hive partition is just a subdirectory and not a real column, so it only adds a few kilobytes to the total table size.
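As a usage note, filtering on the partition column lets Hive prune partitions and read only the matching subdirectory (the date value here is hypothetical; on newer Hive versions the reserved word date may need backticks):
SELECT id, name
FROM table_name
WHERE date = '2014-01-01';   -- only scans files under .../date=2014-01-01/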
Since a partition is also one of the columns in Hive, every partition has a value (assigned using static or dynamic partitioning) and every partition is mapped to a directory in HDFS, so it has to be an additional column.
You may choose one of the below options.
Let's say the partitioned target table's DDL is as follows, with temp being an existing non-partitioned source table holding id and a timestamp column:
CREATE TABLE xyz (id string) PARTITIONED BY (day int);
If the data is organised day-wise, then add a static partition:
ALTER TABLE xyz
ADD PARTITION (day=00)
location '/2017/02/02';
or
INSERT OVERWRITE TABLE xyz
PARTITION (day=1)
SELECT id FROM temp
WHERE dayOfTheYear(timestamp)=1;
Or generate the day number using dynamic partitioning:
INSERT INTO TABLE xyz
PARTITION (day)
SELECT id,
dayOfTheYear(timestamp)
FROM temp;
Hive doesn't have a built-in dayOfTheYear function; you have to create it yourself.
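That said, a day-of-year expression can also be assembled from Hive built-ins, so a custom UDF is not strictly required; a sketch, assuming Hive 1.2+ for trunc and a timestamp column that to_date can parse:
SELECT id,
       datediff(to_date(timestamp), trunc(to_date(timestamp), 'YYYY')) + 1 AS day_of_year
FROM temp;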

hive first column to consider in partition table

When creating a partitioned table in Hive, is it mandatory to always choose the last column as the partition column?
If I choose the 1st column as the partition I can't filter the data; is there any way to choose the first column for the partition?
In Hive, if you want to partition a table, you have to define the partition column first, at table creation time. While populating data into the table you need to specify it as follows:
INSERT INTO partitioned_table PARTITION(status) SELECT id, name, status FROM temp_tbl;
This way you can partition based on the last column only. If you want to partition on the basis of the first column, you have to write a MapReduce job for that; that is the only option available.
I guess the problem you are facing is that you already have the source table in your local system or HDFS and you want to upload it to a partitioned table, with the first column of the source as the partition column in Hive. As the source file has no headers, we cannot do anything by directly uploading the file into the Hive destination folder. The only alternative I know is to create a non-partitioned table in Hive whose structure exactly matches the source file, upload the source data to that non-partitioned table first, and then copy the data from the non-partitioned table into the partitioned table.
Suppose the partitioned destination table is like this:
create table source(eid int, ename string, esal int) partitioned by (dept string);
Your non-partitioned table where you upload the data, matching the file layout (partition column first), is like this:
create table nopart(dept string, esal int, ename string, eid int);
Then you use the dynamic partition insert:
insert overwrite table source partition(dept) select eid, ename, esal, dept from nopart;
The order of the columns in the select is the only point here: the partition column must come last.
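Note that a fully dynamic insert like this typically also needs the dynamic partition settings enabled first:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;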

FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException while inserting data into Hive partitioned table

I have employee data with 3 departments: A, B, C.
I am trying to create a table partitioned on department.
I created the table using the below command:
create external table Parti_Trail (EmployeeID Int, FirstName String, Designation String, Salary Int)
PARTITIONED BY (Department String)
row format delimited fields terminated by ","
location '/user/sree/HiveTrail';
But this did not load my table with the data in location '/user/sree/HiveTrail'.
So I tried to load my table:
LOAD DATA INPATH '/user/aibladmin/HiveTrail' OVERWRITE INTO TABLE Parti_SCDTrail PARTITION(department);
But it shows:
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: department not found in table's partition spec: {department=null}
Why is this so? Am I doing anything wrong?
What happens if we SET hive.exec.dynamic.partition.mode = nonstrict;?
While creating a partitioned table, do we need to keep the data separated in different folders, or does it automatically get separated into different partitions?
For external tables with partitions in Hive, you need to run an ALTER statement to update the metastore for new partitions, because external tables are not managed by Hive.
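For example, a sketch assuming the data for department A sits in its own subdirectory (paths and partition value are illustrative; adjust to your layout):
ALTER TABLE Parti_Trail ADD PARTITION (Department='A')
LOCATION '/user/sree/HiveTrail/department=A';

Or, if the directories already follow the department=X naming convention, let Hive discover all partitions at once:
MSCK REPAIR TABLE Parti_Trail;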
Hope it helps...!!!

convert normal column as partition column in hive

I have a table with 3 columns. Now I need to turn one of the columns into a partition column.
Is there any possibility? If not, how can we add a partition to an existing table? I used the below syntax:
create table t1 (eno int, ename string ) row format delimited fields terminated by '\t';
load data local inpath '/....path/' into table t1;
alter table t1 add partition (p1='india');
I am getting errors.
Does anyone know how to add a partition to an existing table?
Thanks in advance.
I don't think this is directly possible. Hive would have to completely rearrange and split the files in HDFS because adding the partition would impose a new directory structure.
What I suggest you do is simply create a new table with the desired schema and partition, and insert everything from the first into the second.
You can't add a partition column to an already-created table.
But you can do something like the following steps:
create a new table and insert the data from the old table into the new one.
-- Original table structure
CREATE TABLE original_table(
c1 string,
c2 string,
c3 string)
STORED AS ORC;

-- Partitioned table structure
CREATE TABLE partitioned_table(
c1 string,
c2 string)
PARTITIONED BY (c3 string)
STORED AS ORC;

-- Load data from original_table into partitioned_table
-- (a fully dynamic partition insert needs these settings)
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE partitioned_table PARTITION(c3)
SELECT c1, c2, c3
FROM original_table;

-- Rename original_table to old_table (or just drop it if you no longer need it)
ALTER TABLE original_table RENAME TO old_table;

-- Rename partitioned_table to original_table
ALTER TABLE partitioned_table RENAME TO original_table;
I think there is no way to convert an existing column of a table into a partition column.
If you want to add a partition to a table, use the ALTER command as you have already done. If you are dealing with an external table, then specify the LOCATION field as well. I am not sure whether a partition can be added using the ALTER command for managed tables.
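For the external-table case, a sketch (the table name, partition value, and path are hypothetical, and the table must already be declared with a matching PARTITIONED BY clause):
ALTER TABLE t1_ext ADD PARTITION (p1='india')
LOCATION '/data/t1/p1=india';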
