In Hive, how can I add a column only if that column does not exist? - hadoop

I would like to add a new column to a table, but only if that column does not already exist.
This works if the column does not exist:
ALTER TABLE MyTable ADD COLUMNS (mycolumn string);
But when I execute it a second time, I get an error.
Column 'mycolumn' exists
When I try to use the "IF NOT EXISTS" syntax that is supported for CREATE TABLE and ADD PARTITION, I get a syntax error:
ALTER TABLE MyTable ADD IF NOT EXISTS COLUMNS (mycolumn string);
FAILED: ParseException line 3:42 required (...)+ loop did not match anything at input 'COLUMNS' in add partition statement
What I need is something that can execute itempotently so I can run my query whether this column exists or not.

You can partially work it around, by setting the hive.cli.errors.ignore flag. In this case hive CLI will force the execution of further queries even when queries on the way fail.
In this example:
SET hive.cli.errors.ignore=true;
ALTER TABLE MyTable ADD COLUMNS (mycolumn string);
ALTER TABLE MyTable ADD COLUMNS (mycolumn string);
ALTER TABLE MyTable ADD COLUMNS (mycolumn2 string);
hive will execute all queries, even though there'll be an error in the second query.

Well there is no direct way to do that. I mean through a single query.
There are two other ways:
1.) Using JDBC:
1.1) Do describe on the table name.
1.2) You will get a list of columns in result set.
1.3) Check if your columns exists or not by iterating through the result set.
2.) Using hive Metastore client:
2.1) Create a object of HiveMetastoreClient
2.2) HiveMetastoreClient.getFields(<>db_name, <table_name>).get(index).getName() will give you the column name.
2.3) Check if your column exists of not by comparing the list.
Hope it helps...!!!

Related

Can I add a subcolumn to a hive struct column using alter table?

I've a very complex table structure in hive, let's say that it's like the following table:
create table dirceu ( a struct<b:string,c:string>);
Now I do need to add another subcolumn to the a column, and it should have the structure b,c and d, I'm trying to do it with the following alter table:
alter table dirceu change column a a struct<b:string,c:string, d:string>;
But this throw the following error:
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions :
a (state=08S01,code=1)
Is there a way to do this using alter table? I know that I can do it using create table and copy the data, but I would like to know if there is another way to do it.
UPDATE
I'm using hive: 2.1.0.2.6.1.0-129
HortonWorks: HDP-2.6.1.0
This is the command you need to give,
ALTER TABLE dirceu CHANGE COLUMN a a STRUCT<b:STRING, c:STRING, d:STRING>;
You can't add a new field to Collection data type. Instead, you need to completely change the schema.
Hope you like the answer. Yippee!!
I guess it works by this command,
alter table dirceu add column a struct<d:string>;
Please let me know if it doesn't work by comment.

Concat_ws not working in insert statement in hive

Using hive, I'm trying to concatenate columns from one table and insert them in another table using the query
insert into table temp_error
select * from (Select 'temp_test','abcd','abcd','abcd',
from_unixtime(unix_timestamp()),concat_ws('|',sno,name,age)
from temp_test_string)c;
I get the required output till I use Select *. But as soon as I try to insert it into the table, it does not give concatenated output but gives the value of sno only instead of whole concatenated output.
Thanks guys.
I found why it was behaving that way. It's because while creating table I gave "separate fields by '|'". So what I was trying to insert as a string into the table, hive was interpreting it as different columns.

How to drop hive column?

I have two columns Id and Name in Hive table, and I want to delete the Name column. I have used following command:
ALTER TABLE TableName REPLACE COLUMNS(id string);
The result was that the Name column values were assigned to the Id column.
How can I drop a specific column of the table and is there any other command in Hive to achieve my goal?
In addition to the existing answers to the question : Alter hive table add or drop column
As per Hive documentation,
REPLACE COLUMNS removes all existing columns and adds the new set of columns.
REPLACE COLUMNS can also be used to drop columns. For example, ALTER TABLE test_change REPLACE COLUMNS (a int, b int); will remove column c from test_change's schema.
The query you are using is right. But this will modify only schema i.e, the metastore. This will not modify anything on data side.
So, before you are dropping the column you should make sure that you hav correct data file.
In your case the data file should not contain name values.
If you don't want to modify the file then create another table with only specific column that you need.
Create table tablename as select id from already_existing_table
let me know if this helps.

how to add a column after another one in monetDB

I am trying to add a new column in a monetDB database and I want it positioned after a specific one. In mysql this is possible using the AFTER keyword.
ALTER TABLE myTable ADD myNewColumn VARCHAR(255) AFTER myOtherColumn
I am trying this in the mclient:
sql>ALTER TABLE dbname.table_name ADD COLUMN new_name AFTER existing_name SET DEFAULT NULL;
What I get is a syntax error:
syntax error, unexpected AFTER in: "ALTER TABLE dbname.table_name ADD COLUMN new_name AFTER"
It is true that the ALTER documentation does not specify that AFTER exists, but I am hoping that anybody knows an alternative.
The safe way is to create an new table with the columns properly ordered and move the data; you probably know this already.
However if you really cannot do that, create a view:
CREATE VIEW AS SELECT [order the columns however you want here] FROM your_table;

HIVE: How create a table with all columns in another table EXCEPT one of them?

When I need to change a column into a partition (convert normal column as partition column in hive), I want to create a new table to copy all columns except one. I currently have >50 columns in the original table. Is there any clean way of doing that?
Something like:
CREATE student_copy LIKE student EXCEPT age and hair_color;
Thanks!
You can use a regex:
CTAS using REGEX column spec. :
set hive.support.quoted.identifiers=none;
CREATE TABLE student_copy AS SELECT `(age|hair_color)?+.+` FROM student;
set hive.support.quoted.identifiers=column;
BUT (as mentioned by Kishore Kumar Suthar :
this will not create a partitioned table, as that is not supported with CTAS (Create Table As Select).
Only way I see for you to get your partitioned table is by getting the complete create statement of the table (as mentioned by Abraham):
SHOW CREATE TABLE student;
Altering it to create a partition on the column you want. And after that you can use the select with regex when inserting into the new table.
If your partition column is already part of this select, then you need to make sure it is the last column you insert. If it is not you can exclude that column in the regex and including it as last. Also if you expect several partitions to be created based on your insert statement you need to enable 'dynamic partitioning':
set hive.support.quoted.identifiers=none;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE student_copy PARTITION(partcol1) SELECT `(age|hair_color|partcol1)?+.+`, partcol1 FROM student;
set hive.support.quoted.identifiers=column;
the 'hive.support.quoted.identifiers=none' is required to use the backticks '`' in the regex part of the query. I set this parameter to it's original value after my statement: 'hive.support.quoted.identifiers=column'
CREATE TABLE student_copy LIKE student;
It just copies the source table definition.
CREATE TABLE student_copy AS select name, age, class from student;
Target cannot be partitioned table.
Target cannot be external table.
It copies the structure as well as the data
I use below command to get the create statement of existing table.
SHOW CREATE TABLE student;
Copy the result and modify that based on your requirement for new table and run the modified command to get the new table.

Resources