alter hive multiple column - hadoop

How do we alter the datatype of multiple columns in Hive ?
CREATE TABLE test_change (a int, b int, c int);
ALTER TABLE test_change CHANGE a a string b b doube c c decimal(11,2);

As far as I know, you can't. In the Hive documentation you can find the following:
ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type
[COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];
This command will allow users to change a column's name, data type, comment, or position, or an arbitrary combination of them. The PARTITION clause is available in Hive 0.14.0 and later; see Upgrading Pre-Hive 0.13.0 Decimal Columns for usage. A patch for Hive 0.13 is also available (see HIVE-7971).
The documentation is speaking about "a column".
The alternative would be to write multiple queries, one for each datatype you have to change.
Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

Related

How to create table in Hive with specific column values from another table

I am new to Hive and have some problems. I try to find a answer here and other sites but with no luck... I also tried many different querys that come to my mind, also without success.
I have my source table and i want to create new table like this.
Were:
id would be number of distinct counties as auto increment numbers and primary key
counties as distinct names of counties (from source table)
You could follow this approach.
A CTAS(Create Table As Select)
with your example this CTAS could work
CREATE TABLE t_county
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE AS
WITH t AS(
SELECT DISTINCT county, ROW_NUMBER() OVER() AS id
FROM counties)
SELECT id, county
FROM t;
You cannot have primary key or foreign keys on Hive as you have primary key on RBDMSs like Oracle or MySql because Hive is schema on read instead of schema on write like Oracle so you cannot implement constraints of any kind on Hive.
I can not give you the exact answer because of it suppose to you must try to do it by yourself and then if you have a problem or a doubt come here and tell us. But, what i can tell you is that you can use the insertstatement to create a new table using data from another table, I.E:
create table CARS (name string);
insert table CARS select x, y from TABLE_2;
You can also use the overwrite statement if you desire to delete all the existing data that you have inside that table (CARS).
So, the operation will be
CREATE TABLE ==> INSERT OPERATION (OVERWRITE?) + QUERY OPERATION
Hive is not an RDBMS database, so there is no concept of primary key or foreign key.
But you can add auto increment column in Hive. Please try as:
Create table new_table as
select reflect("java.util.UUID", "randomUUID") id, countries from my_source_table;

Change column name in a table in Clickhouse

Is there any way to ALTER a table and change the column name in clickhouse?
I only found to change tha table name but not for an individual column in a straight forward way.
Thanks.
The feature has been introduced here into v20.4.
ALTER TABLE table1 RENAME COLUMN old_name TO new_name
You can also rename multiple columns at on:
ALTER TABLE table1
RENAME COLUMN old_name1 TO new_name1,
RENAME COLUMN old_name2 TO new_name2
Old answer:
ClickHouse doesn't have that feature yet.
Implementation is not trivial, because ALTERs that changing columns
are processed outside of usual replication queue, and adding rename
without reworking of ALTERs will introduce race conditions in
replicated tables.
https://github.com/yandex/ClickHouse/issues/146#issuecomment-255631384
As #Slash said, the solution for now is to create new table and
INSERT INTO `new_table` SELECT * FROM `old_table`
Do not forget that column aliasing won't work there (AS).
INSERT INTO `new_table` SELECT a, b AS c, c AS b FROM `old_table`
That will still insert a into first column, b into second column and c into third column. AS has no effect there.
You can try use CREATE TABLE new_table with another field name
and run INSERT INTO new_table SELECT old_field AS new_field FROM old_table
If you created the table using Engine=log, it won't allow you to alter or rename the column.
connection_string = f'clickhouse://{username}:{password}#{host}:{port}/{database}'
engine = create_engine(connection_string)
conn = engine.connect()
table = "table1"
schema = 'Parameter String, Key UInt8'
engine.execute("CREATE TABLE IF NOT EXISTS {}({}) ENGINE = Log".format(table,schema))
If you created table using the mergeTree engine, it’s allowed to rename the column:
engine.execute("CREATE TABLE IF NOT EXISTS {}({}) ENGINE =MergeTree ORDER BY Key".format(table,schema))

How to drop hive column?

I have two columns Id and Name in Hive table, and I want to delete the Name column. I have used following command:
ALTER TABLE TableName REPLACE COLUMNS(id string);
The result was that the Name column values were assigned to the Id column.
How can I drop a specific column of the table and is there any other command in Hive to achieve my goal?
In addition to the existing answers to the question : Alter hive table add or drop column
As per Hive documentation,
REPLACE COLUMNS removes all existing columns and adds the new set of columns.
REPLACE COLUMNS can also be used to drop columns. For example, ALTER TABLE test_change REPLACE COLUMNS (a int, b int); will remove column c from test_change's schema.
The query you are using is right. But this will modify only schema i.e, the metastore. This will not modify anything on data side.
So, before you are dropping the column you should make sure that you hav correct data file.
In your case the data file should not contain name values.
If you don't want to modify the file then create another table with only specific column that you need.
Create table tablename as select id from already_existing_table
let me know if this helps.

Hive drops all the partitions if the partition column name is not correct

I am facing a strange issue with hive,
I have a table, partitioned on the basis of dept_key (its a integer eg.3212)
table is created as follows
create external table dept_details (dept_key,dept_name,dept_location) PARTITIONED BY (dept_key_partition INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '~' LOCATION '/dept_details/dept/';
Now I have some partitions already added e.g: 1204,1203,1204
When I tried dropping the partition I by mistake typed only dept_key and not "dept_key_partition" this in turn dropped all my partition
drop query alter table dept_details drop partition (dept_key=12), its a very strange issue which I am facing. Please let me know what can be the probable issue.
Thank you.
From Hive LanguageManual DDL...
As of Hive 0.14 (HIVE-8411), users are able to provide a partial
partition spec for certain above alter column statements, similar to
dynamic partitioning (...) you can change many existing partitions at once using a single ALTER statement (...) Similar to dynamic partitioning, hive.exec.dynamic.partition must be set to true
Looks like the partial specification feature has been ported to other commands like DROP, TRUNCATE even if it's not explicit in the documentation.
In short: it's not a bug, it's a feature.

Alter table add column AFTER in derby database

Is there a way in a derby database to add a column after another just like mysql does?
I don't believe so. Column ordering isn't really a standard SQL feature. Well-written db applications shouldn't care about column ordering. You can specify the order of the columns in your output by naming the columns in your SQL statement as in:
create table t (a int, b int, c int );
select b, c, a from t;

Resources