I am porting my Pig script to Hive. I need to add a status column to a table that is imported from an Oracle database.
My Pig script looks like this:
user_data = LOAD 'USER_DATA' USING PigStorage(',') AS (USER_ID:int, MANAGER_ID:int, USER_NAME:chararray);
user_data_status = FOREACH user_data GENERATE
USER_ID,
MANAGER_ID,
USER_NAME,
'active' AS STATUS;
Here I am adding the STATUS column with the value 'active' to the user_data table.
How can I add a column with a default value to an existing table while importing it via HiveQL?
As far as I know, you will have to reload the data, as you did in Pig.
For example, if you already have the table user_data with columns (USER_ID int, MANAGER_ID int, USER_NAME string) and you want (USER_ID int, MANAGER_ID int, USER_NAME string, STATUS string) with STATUS set to 'active', you can re-load the table user_data_status using something like this:
INSERT OVERWRITE TABLE user_data_status SELECT u.*, 'active' AS STATUS FROM user_data u;
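Alternatively, if user_data_status does not exist yet, a single CTAS can create and populate it in one step (a sketch reusing the question's names):
CREATE TABLE user_data_status AS
SELECT u.*, 'active' AS STATUS
FROM user_data u;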
Although there are options to add columns to an existing table (ALTER TABLE ... ADD COLUMNS), that only updates the metadata in the metastore, and the values default to NULL.
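For illustration, this is that metadata-only route (a sketch; existing rows simply read NULL in the new column):
ALTER TABLE user_data ADD COLUMNS (STATUS string);
SELECT STATUS FROM user_data LIMIT 1;  -- NULL for pre-existing rows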
If I were you, I would re-load the complete data rather than try to UPDATE the whole table after altering the column structure. Hope this helps!
I am new to Hive and have some problems. I tried to find an answer here and on other sites, but with no luck... I also tried many different queries that came to mind, also without success.
I have my source table, and I want to create a new table like this.
Where:
id would be an auto-incremented number over the distinct counties, and the primary key
county would be the distinct county names (from the source table)
You could follow this approach.
A CTAS (Create Table As Select). With your example, this CTAS could work, numbering the counties after de-duplicating them:
CREATE TABLE t_county
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE AS
WITH t AS (
  SELECT DISTINCT county
  FROM counties
)
SELECT ROW_NUMBER() OVER () AS id, county
FROM t;
You cannot have primary or foreign keys in Hive the way you do in RDBMSs like Oracle or MySQL, because Hive is schema-on-read rather than schema-on-write, so you cannot enforce constraints of any kind in Hive.
I cannot give you the exact answer, because you are supposed to try it yourself first and then, if you have a problem or a doubt, come here and tell us. But what I can tell you is that you can populate a new table using data from another table with the INSERT statement, i.e.:
CREATE TABLE CARS (name STRING);
INSERT INTO TABLE CARS SELECT x FROM TABLE_2;
You can also use INSERT OVERWRITE if you want to replace all the existing data in that table (CARS).
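For example, with the same hypothetical tables:
INSERT OVERWRITE TABLE CARS SELECT x FROM TABLE_2;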
So, the operation will be:
CREATE TABLE ==> INSERT OPERATION (OVERWRITE?) + QUERY OPERATION
Hive is not an RDBMS, so there is no concept of primary or foreign keys.
But you can add an auto-generated, unique id column in Hive. Please try:
CREATE TABLE new_table AS
SELECT reflect("java.util.UUID", "randomUUID") AS id, county
FROM my_source_table;
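Note that the question asks for distinct counties; de-duplicating before generating the UUID keeps one id per county (a sketch):
CREATE TABLE new_table AS
SELECT reflect("java.util.UUID", "randomUUID") AS id, county
FROM (SELECT DISTINCT county FROM my_source_table) t;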
I am working on a Hive table where I need to change a column name as below. It works as expected and renames the column, but the underlying values of the column become NULL.
ALTER TABLE db.tbl CHANGE hdfs_loaddate hdfs_load_date String;
Here the changed column name is hdfs_load_date, and the values become NULL after the rename.
Does anyone have an idea how to fix this? Thanks in advance!
@Ajay_SK Referencing this article: Hive Alter table change Column Name
There is a comment:
Note that the column change will not change any underlying data if it is a parquet table. That is, if you have data in the table already, renaming a column will not make the data in that column accessible under the new name:
select a from test_change;    -- returns 1
alter table test_change change a a1 int;
select a1 from test_change;   -- returns NULL
That comment is specific to Parquet, but the scenario you describe is similar: you have successfully changed the name, but Hive still thinks the original data lives under the original name.
A better approach to solve your issue would be to create a new table with the schema you want, including the column name change, then perform an INSERT INTO the new table with a SELECT * FROM the old table.
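A minimal sketch of that approach, assuming the old table still exposes the original column name (revert the rename first if needed); the table name tbl_fixed and the Parquet format are assumptions, so match your real schema and storage:
CREATE TABLE db.tbl_fixed (hdfs_load_date STRING)
STORED AS PARQUET;  -- assumed; match the original table's format

INSERT INTO TABLE db.tbl_fixed
SELECT hdfs_loaddate FROM db.tbl;  -- INSERT...SELECT maps by position, so the new name applies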
I have a massive table stored as Parquet, and I need to add columns based on conditions.
Is there a way to do that without having to recreate the table in Hive or Impala?
Something like this?
ALTER TABLE xyz
ADD COLUMN flag AS (CASE WHEN ... END)
Thank you
I don't believe that Hive or Impala support computed columns. This type of calculation is often done using a view:
CREATE VIEW v_xyz AS
SELECT xyz.*,
(CASE WHEN ... END) as flag
FROM xyz;
You can then update the view at any time to adjust the logic or add new columns.
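For example, to change the logic later without touching the table, Hive's ALTER VIEW ... AS re-defines the view (the condition and the some_col column here are purely illustrative, since the CASE in the answer is elided):
ALTER VIEW v_xyz AS
SELECT xyz.*,
       (CASE WHEN some_col > 0 THEN 1 ELSE 0 END) AS flag
FROM xyz;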
I have two columns, Id and Name, in a Hive table, and I want to delete the Name column. I used the following command:
ALTER TABLE TableName REPLACE COLUMNS(id string);
The result was that the Name column values were assigned to the Id column.
How can I drop a specific column from the table? Is there any other command in Hive to achieve this?
In addition to the existing answers to the question: Alter hive table add or drop column
As per Hive documentation,
REPLACE COLUMNS removes all existing columns and adds the new set of columns.
REPLACE COLUMNS can also be used to drop columns. For example, ALTER TABLE test_change REPLACE COLUMNS (a int, b int); will remove column c from test_change's schema.
The query you are using is right, but it will modify only the schema, i.e. the metastore; it will not modify anything on the data side.
So, before dropping the column, you should make sure that you have the correct data file.
In your case, the data file should not contain name values.
If you don't want to modify the file, then create another table with only the specific columns you need:
CREATE TABLE tablename AS SELECT id FROM already_existing_table;
Let me know if this helps.
I have a tab-separated textfile in HDFS, and want to export this into a MySQL table.
Since the rows in the textfile do not have numerical ids, how do I export into a table with an ID automatically set during the SQL INSERT (autoincrement)?
If I try to export (id being the last defined attribute in the table), I get
java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at entity.__loadFromFields(entity.java:996)
If I take the autogenerated class and modify it to exclude the id-attribute, I get
java.io.IOException: java.sql.SQLException: No value specified for parameter 27
where parameter 27 is 'id'.
Version is Sqoop 1.3.0-cdh3u3
In Sqoop 1.4.1, writing "null" in the text-file field position corresponding to the auto-increment field worked for me. After exporting to MySQL, you will see an incremented, automatically assigned ID.
As somebody on the Sqoop mailing list suggested:
1. Create a temporary table without the ID.
2. Sqoop-export into this table.
3. Copy the rows of this table into the final table (the one with the auto-increment ID), as sketched below.
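A sketch of step 3 on the MySQL side, with hypothetical names (users_tmp has no id column; users has an AUTO_INCREMENT id):
INSERT INTO users (user_id, manager_id, user_name)
SELECT user_id, manager_id, user_name
FROM users_tmp;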
My source table is in Hive. What works for me is to add a column called id (int) and populate it with NULL. After Sqoop, MySQL receives INSERT (id, X, Y) VALUES (NULL, 'x_value', 'y_value'), and MySQL then knows to populate id via auto-increment.
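A minimal sketch of that Hive-side preparation (table and column names are hypothetical):
CREATE TABLE export_ready AS
SELECT CAST(NULL AS INT) AS id, x, y
FROM my_source_table;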