Error - Drop column of a partitioned table in Hive - hadoop

How do I drop a column of a partitioned table in Hive?
I have an external table with 4 columns, for example:
column_A
column_B
column_C
dt_test - partition
I have to drop column_C, so I'm trying the following command:
ALTER TABLE TABLE REPLACE COLUMNS(column_A string, column_B string, dt_test timestamp);
The following error occurs:
Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.ddl.DDLTask. Partition column name dt_test conflicts with table columns.
Thanks!

You can drop column_C with the following command:
ALTER TABLE TABLE REPLACE COLUMNS(column_A string, column_B string);
that is, without including the partition column dt_test timestamp in the column list.
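As a concrete sketch (with my_table standing in for the real table name), the full statement followed by a quick check that column_C is gone while dt_test is still the partition column:
ALTER TABLE my_table REPLACE COLUMNS (column_A string, column_B string);
-- dt_test should still appear under "# Partition Information"
DESCRIBE my_table;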

Related

Replace hive table with partition

There is a Hive table with 2 string columns and one partition column, "cmd_out".
I'm trying to rename both columns ('col1', 'col2') by using REPLACE COLUMNS:
Alter table 'table_test' replace columns(
'col22' String,
'coll33' String
)
But I receive the following exception:
Partition column name 'cmd_out' conflicts with table columns.
When I include the partition column in the query:
Alter table 'table_test' replace columns(
'cmd_out' String,
'col22' String,
'coll33' String
)
I receive:
Duplicate column name cmd_out in the table definition
If you want to rename a column, you need to use alter table ... change.
Here is the syntax:
alter table mytab change col1 new_col1 string;
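Applied to the table above, both renames could look like this (a sketch, assuming the table is really named table_test and the existing columns are strings):
alter table table_test change col1 col22 string;
alter table table_test change col2 coll33 string;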

How do I avoid the "NULL" in the first field of a Hive table?

First of all, I created the table "emp" in Hive using the commands below:
create table emp (id INT, name STRING, address STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
Then I loaded the data into the "emp" table with the command below:
LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/emp.txt' overwrite into table emp;
When I select the data from the "emp" table, the first field is shown as NULL.
You have a header row in your file, and the first value, "id", cannot be converted to INT, so it is replaced by NULL.
Add tblproperties ("skip.header.line.count"="1") to your table definition.
For an existing table:
alter table emp set tblproperties ("skip.header.line.count"="1");
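If you are creating the table from scratch, the property can also go straight into the DDL; a sketch based on the emp definition from the question:
create table emp (id INT, name STRING, address STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
tblproperties ("skip.header.line.count"="1");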

Hive - Insert data into partitioned table: partition not found

I'm having an issue while trying to insert new data into a Hive external partitioned table.
The table is partitioned by day; the error I get is:
FAILED: SemanticException [Error 10006]: Line 1:51 Partition not found ''18102016''
My query is as follows:
ALTER TABLE my_source_table RECOVER PARTITIONS;
INSERT OVERWRITE TABLE my_dest_table PARTITION (d = '18102016')
SELECT
'III' AS primary_alias_type,
iii_id AS primary_alias_id,
FROM
my_source_table
WHERE
d = '18102016'
my_dest_table has been created as:
CREATE EXTERNAL TABLE my_dest_table (
primary_alias_type string,
primary_alias_id string
) PARTITIONED BY (d string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://my_bucket/my_external_tables/'
Any idea on what I'm doing wrong? Thanks!
I believe that, in addition to ALTER TABLE my_source_table RECOVER PARTITIONS;, you should do the same for your destination table:
ALTER TABLE my_dest_table RECOVER PARTITIONS;
Try this.
Note: of course, you should also remove the extra comma that Alex L mentioned, which would otherwise cause another parsing error.
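Putting both fixes together, a sketch of the corrected statements (the column list is taken from the question):
ALTER TABLE my_dest_table RECOVER PARTITIONS;
INSERT OVERWRITE TABLE my_dest_table PARTITION (d = '18102016')
SELECT
'III' AS primary_alias_type,
iii_id AS primary_alias_id
FROM
my_source_table
WHERE
d = '18102016';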

Inserting into Hive table - Non Partitioned table to Partitioned table - Cannot insert into target table because column number/types

When I try to insert into a partitioned table, I get the error below:
SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different ''US'': Table insclause-0 has 2 columns, but query has 3 columns.
My input data:
1,aaa,US
2,bbb,US
3,ccc,IN
4,ddd,US
5,eee,IN
6,fff,IN
7,ggg,US
I created the Hive table tx:
create table tx (no int,name string,country string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
I created the partitioned table t1, partitioned by country:
create table t1 (no int,name string) PARTITIONED BY (country string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
I tried the two inserts below, but both failed:
INSERT OVERWRITE TABLE t1 PARTITION (country='US')
SELECT * from tx where country = 'US';
INSERT OVERWRITE TABLE t1 PARTITION (country='US')
SELECT no,name,country from tx where country = 'US';
Error : SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different ''US'': Table insclause-0 has 2 columns, but query has 3 columns.
A big thanks to Samson Scharfrichter; the following inserts work:
INSERT OVERWRITE TABLE t1 PARTITION (country='US')
SELECT no,name from tx where country = 'US';
INSERT INTO TABLE t1 PARTITION (country='IN')
SELECT no,name from tx where country = 'IN';
I checked the partitions:
hive> SHOW PARTITIONS t1;
OK
country=IN
country=US
Time taken: 0.291 seconds, Fetched: 2 row(s)
hive>
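As a side note, if there are many country values, dynamic partitioning can load them all in one statement instead of one INSERT per country; a sketch (the two SET commands enable the feature, and the partition column must come last in the SELECT list):
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE t1 PARTITION (country)
SELECT no, name, country FROM tx;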

Alter hive table add or drop column

I have an ORC table in Hive and I want to drop a column from this table:
ALTER TABLE table_name drop col_name;
but I am getting the following exception:
Error occurred executing hive query: OK FAILED: ParseException line 1:35 mismatched input 'user_id1' expecting PARTITION near 'drop' in drop partition statement
Can anyone help me or give me an idea how to do this? Note: I am using Hive 0.14.
You cannot drop a column directly from a table using the command ALTER TABLE table_name drop col_name;
The only way to drop a column is to use the REPLACE COLUMNS command. Let's say I have a table emp with columns id, name and dept, and I want to drop the id column. In the REPLACE COLUMNS clause, list all the columns that you want to keep as part of the table. The command below will drop the id column from the emp table.
ALTER TABLE emp REPLACE COLUMNS( name string, dept string);
There is also a "dumb" way of achieving the end goal: create a new table without the unwanted column(s). Using Hive's regex column matching makes this rather easy.
Here is what I would do:
-- make a copy of the old table
ALTER TABLE table RENAME TO table_to_dump;
-- make the new table without the columns to be deleted
CREATE TABLE table AS
SELECT `(col_to_remove_1|col_to_remove_2)?+.+`
FROM table_to_dump;
-- dump the table
DROP TABLE table_to_dump;
If the table in question is not too big, this should work just fine.
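One caveat: on Hive 0.13 and later, identifiers in backticks are treated literally by default, so the regex column selection above may only work after relaxing that setting; a sketch:
SET hive.support.quoted.identifiers=none;
CREATE TABLE table AS
SELECT `(col_to_remove_1|col_to_remove_2)?+.+`
FROM table_to_dump;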
Suppose you have an external table, viz. organization.employee, as below (not including TBLPROPERTIES):
hive> show create table organization.employee;
OK
CREATE EXTERNAL TABLE `organization.employee`(
`employee_id` bigint,
`employee_name` string,
`updated_by` string,
`updated_date` timestamp)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://getnamenode/apps/hive/warehouse/organization.db/employee'
You want to remove the updated_by and updated_date columns from the table. Follow these steps:
Create a temp table replica of organization.employee:
hive> create table organization.employee_temp as select * from organization.employee;
Drop the main table organization.employee:
hive> drop table organization.employee;
Remove the underlying data from HDFS (you need to exit the Hive shell):
[nameet#ip-80-108-1-111 myfile]$ hadoop fs -rm hdfs://getnamenode/apps/hive/warehouse/organization.db/employee/*
Create the table again, with the unwanted columns removed:
hive> CREATE EXTERNAL TABLE `organization.employee`(
`employee_id` bigint,
`employee_name` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://getnamenode/apps/hive/warehouse/organization.db/employee'
Insert the original records back into the original table:
hive> insert into organization.employee
select employee_id, employee_name from organization.employee_temp;
Finally, drop the temp table that was created:
hive> drop table organization.employee_temp;
ALTER TABLE emp REPLACE COLUMNS( name string, dept string);
The above statement can only change the schema of a table, not the data.
A solution to this problem is to copy the data into a new table:
INSERT INTO <new table> SELECT <selected columns> FROM <old table>;
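For instance, with the emp table from the earlier answer, dropping the id column this way might look like the sketch below (emp_new is a hypothetical new table):
CREATE TABLE emp_new (name string, dept string);
INSERT INTO TABLE emp_new
SELECT name, dept FROM emp;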
ALTER TABLE is not yet supported for non-native tables; i.e. what you get with CREATE TABLE when a STORED BY clause is specified.
Check this: https://cwiki.apache.org/confluence/display/Hive/StorageHandlers
After a lot of mistakes, in addition to the explanations above, I would add these simpler examples.
Case 1: Add new column named new_column
ALTER TABLE schema.table_name
ADD COLUMNS (new_column INT COMMENT 'new number column');
Case 2: Rename a column new_column to no_of_days
ALTER TABLE schema.table_name
CHANGE new_column no_of_days INT;
Note that when renaming, both columns should be of the same datatype, as with INT above.
For an external table it's simple and easy: just drop the table, edit the CREATE TABLE statement, and then create the table again with the new schema.
Example table: aparup_test.tbl_schema_change; the column id will be dropped.
Steps:
------------- show create table to fetch schema ------------------
spark.sql("""
show create table aparup_test.tbl_schema_change
""").show(100,False)
o/p:
CREATE EXTERNAL TABLE aparup_test.tbl_schema_change(name STRING, time_details TIMESTAMP, id BIGINT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1'
)
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'gs://aparup_test/tbl_schema_change'
TBLPROPERTIES (
'parquet.compress' = 'snappy'
)
""")
------------- drop table --------------------------------
spark.sql("""
drop table aparup_test.tbl_schema_change
""").show(100,False)
------------- edit create table schema by dropping column "id", then run it ------------------
spark.sql("""
CREATE EXTERNAL TABLE aparup_test.tbl_schema_change(name STRING, time_details TIMESTAMP)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1'
)
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'gs://aparup_test/tbl_schema_change'
TBLPROPERTIES (
'parquet.compress' = 'snappy'
)
""")
------------- sync up table schema with parquet files ------------------
spark.sql("""
msck repair table aparup_test.tbl_schema_change
""").show(100,False)
==================== DONE =====================================
Even the query below is working for me:
Alter table tbl_name drop col_name
