Update new added columns in hive - hadoop

I have been trying to make updates to an orc table in hive which is bucketed and also set transactional=true property. The normal update works great but as soon as I alter the table and add a new column e.g. column_added_5, and try to update column_added_5 the statement executes but the column does not get updated.
Any help/directions is appreciated.

I think that one way is:
CREATE TABLE new_table_name AS SELECT column1,column2,column3, ... "default_value" as column_added_5 FROM your_table_name;
DROP TABLE your_table_name;
ALTER TABLE new_table_name RENAME TO your_table_name;

Did you try this:
ALTER TABLE table_name ADD COLUMNS ( column_added_5 STRING COMMENT 'Column 5');

Related

Apache Hive: How to Add Column at Specific Location in Table

I want to add a new column to a specific location in hive table. when i add new column it goes to the last position.
You need to recreate table. If the table is external and data already contains new column, then issue drop and create table statements. General solution is to:
1. create new_table...;
2. insert overwrite new_table select from old_table;
3. drop old_table;
4. alter new_table rename to old_table;
Also if datafiles already contain new column in some position you can
1. Alter table add column
Change column position using this example:
2. ALTER TABLE test_change CHANGE old_name new_name STRING AFTER other_col CASCADE;
See docs here: Change Column Name/Type/Position/Comment
How frequently are people running SELECT *?? Typically, people list out each column in the select statement. Just add the column to the end, and adjust like SELECT last_col, first_col, second_col ...
Alternatively, create a VIEW that runs a select statement with the column ordering you want.
Rename the table to something else, and name the view to the table, and no one would know any different

Change column name in a table in Clickhouse

Is there any way to ALTER a table and change the column name in clickhouse?
I only found to change tha table name but not for an individual column in a straight forward way.
Thanks.
The feature has been introduced here into v20.4.
ALTER TABLE table1 RENAME COLUMN old_name TO new_name
You can also rename multiple columns at on:
ALTER TABLE table1
RENAME COLUMN old_name1 TO new_name1,
RENAME COLUMN old_name2 TO new_name2
Old answer:
ClickHouse doesn't have that feature yet.
Implementation is not trivial, because ALTERs that changing columns
are processed outside of usual replication queue, and adding rename
without reworking of ALTERs will introduce race conditions in
replicated tables.
https://github.com/yandex/ClickHouse/issues/146#issuecomment-255631384
As #Slash said, the solution for now is to create new table and
INSERT INTO `new_table` SELECT * FROM `old_table`
Do not forget that column aliasing won't work there (AS).
INSERT INTO `new_table` SELECT a, b AS c, c AS b FROM `old_table`
That will still insert a into first column, b into second column and c into third column. AS has no effect there.
You can try use CREATE TABLE new_table with another field name
and run INSERT INTO new_table SELECT old_field AS new_field FROM old_table
If you created the table using Engine=log, it won't allow you to alter or rename the column.
connection_string = f'clickhouse://{username}:{password}#{host}:{port}/{database}'
engine = create_engine(connection_string)
conn = engine.connect()
table = "table1"
schema = 'Parameter String, Key UInt8'
engine.execute("CREATE TABLE IF NOT EXISTS {}({}) ENGINE = Log".format(table,schema))
If you created table using the mergeTree engine, it’s allowed to rename the column:
engine.execute("CREATE TABLE IF NOT EXISTS {}({}) ENGINE =MergeTree ORDER BY Key".format(table,schema))

HIVE: How create a table with all columns in another table EXCEPT one of them?

When I need to change a column into a partition (convert normal column as partition column in hive), I want to create a new table to copy all columns except one. I currently have >50 columns in the original table. Is there any clean way of doing that?
Something like:
CREATE student_copy LIKE student EXCEPT age and hair_color;
Thanks!
You can use a regex:
CTAS using REGEX column spec. :
set hive.support.quoted.identifiers=none;
CREATE TABLE student_copy AS SELECT `(age|hair_color)?+.+` FROM student;
set hive.support.quoted.identifiers=column;
BUT (as mentioned by Kishore Kumar Suthar :
this will not create a partitioned table, as that is not supported with CTAS (Create Table As Select).
Only way I see for you to get your partitioned table is by getting the complete create statement of the table (as mentioned by Abraham):
SHOW CREATE TABLE student;
Altering it to create a partition on the column you want. And after that you can use the select with regex when inserting into the new table.
If your partition column is already part of this select, then you need to make sure it is the last column you insert. If it is not you can exclude that column in the regex and including it as last. Also if you expect several partitions to be created based on your insert statement you need to enable 'dynamic partitioning':
set hive.support.quoted.identifiers=none;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE student_copy PARTITION(partcol1) SELECT `(age|hair_color|partcol1)?+.+`, partcol1 FROM student;
set hive.support.quoted.identifiers=column;
the 'hive.support.quoted.identifiers=none' is required to use the backticks '`' in the regex part of the query. I set this parameter to it's original value after my statement: 'hive.support.quoted.identifiers=column'
CREATE TABLE student_copy LIKE student;
It just copies the source table definition.
CREATE TABLE student_copy AS select name, age, class from student;
Target cannot be partitioned table.
Target cannot be external table.
It copies the structure as well as the data
I use below command to get the create statement of existing table.
SHOW CREATE TABLE student;
Copy the result and modify that based on your requirement for new table and run the modified command to get the new table.

How to modify data type in Oracle with existing rows in table

How can I change DATA TYPE of a column from number to varchar2 without deleting the table data?
You can't.
You can, however, create a new column with the new data type, migrate the data, drop the old column, and rename the new column. Something like
ALTER TABLE table_name
ADD( new_column_name varchar2(10) );
UPDATE table_name
SET new_column_name = to_char(old_column_name, <<some format>>);
ALTER TABLE table_name
DROP COLUMN old_column_name;
ALTER TABLE table_name
RENAME COLUMN new_column_name TO old_coulumn_name;
If you have code that depends on the position of the column in the table (which you really shouldn't have), you could rename the table and create a view on the table with the original name of the table that exposes the columns in the order your code expects until you can fix that buggy code.
You have to first deal with the existing rows before you modify the column DATA TYPE.
You could do the following steps:
Add the new column with a new name.
Update the new column from old column.
Drop the old column.
Rename the new column with the old column name.
For example,
alter table t add (col_new varchar2(50));
update t set col_new = to_char(col_old);
alter table t drop column col_old cascade constraints;
alter table t rename column col_new to col_old;
Make sure you re-create any required indexes which you had.
You could also try the CTAS approach, i.e. create table as select. But, the above is safe and preferrable.
The most efficient way is probably to do a CREATE TABLE ... AS SELECT
(CTAS)
alter table table_name modify (column_name VARCHAR2(255));
Since we can't change data type of a column with values, the approach that I was followed as below,
Say the column name you want to change type is 'A' and this can be achieved with SQL developer.
First sort table data by other column (ex: datetime).
Next copy the values of column 'A' and paste to excel file.
Delete values of the column 'A' an commit.
Change the data type and commit.
Again sort table data by previously used column (ex: datetime).
Then paste copied data from excel and commit.

Hive load specific columns

I am interested in loading specific columns into a table created in Hive.
Is it possible to load the specific columns directly or I should load all the data and create a second table to SELECT the specific columns?
Thanks
Yes you have to load all the data like this :
LOAD DATA [LOCAL] INPATH /Your/Path [OVERWRITE] INTO TABLE yourTable;
LOCAL means that your file is on your local system and not in HDFS, OVERWRITE means that the current data in the table will be deleted.
So you create a second table with only the fields you need and you execute this query :
INSERT OVERWRITE TABLE yourNewTable
yourSelectStatement
FROM yourOldTable;
It is suggested to create an External Table in Hive and map the data you have and then create a new table with specific columns and use the create table as command
create table table_name as select statement from table_name;
For example the statement looks like this
create table employee as select id as id,emp_name as name from emp;
Try this:
Insert into table_name
(
#columns you want to insert value into in lowercase
)
select columns_you_need from source_table;

Resources