How to create table in Hive with specific column values from another table - hadoop

I am new to Hive and have some problems. I try to find a answer here and other sites but with no luck... I also tried many different querys that come to my mind, also without success.
I have my source table and i want to create new table like this.
Were:
id would be number of distinct counties as auto increment numbers and primary key
counties as distinct names of counties (from source table)

You could follow this approach.
A CTAS(Create Table As Select)
with your example this CTAS could work
CREATE TABLE t_county
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE AS
WITH t AS(
SELECT DISTINCT county, ROW_NUMBER() OVER() AS id
FROM counties)
SELECT id, county
FROM t;
You cannot have primary key or foreign keys on Hive as you have primary key on RBDMSs like Oracle or MySql because Hive is schema on read instead of schema on write like Oracle so you cannot implement constraints of any kind on Hive.

I can not give you the exact answer because of it suppose to you must try to do it by yourself and then if you have a problem or a doubt come here and tell us. But, what i can tell you is that you can use the insertstatement to create a new table using data from another table, I.E:
create table CARS (name string);
insert table CARS select x, y from TABLE_2;
You can also use the overwrite statement if you desire to delete all the existing data that you have inside that table (CARS).
So, the operation will be
CREATE TABLE ==> INSERT OPERATION (OVERWRITE?) + QUERY OPERATION

Hive is not an RDBMS database, so there is no concept of primary key or foreign key.
But you can add auto increment column in Hive. Please try as:
Create table new_table as
select reflect("java.util.UUID", "randomUUID") id, countries from my_source_table;

Related

How to create Fact table in Hive and replace original values in table with key (id) values

What I want to create I explanied below. Is that possible to do in Hive?
I could do that in Python using Pandas and replace over columns, but I was wondering can that be done with query in Hive?
I have uploaded Source table in Hive and created Dimensional tables like below also (in Cloudera HUE), so is it possible to somehow create that Fact table by using Dimensional tables id values and replace values in Source table?
I have my Source table:
I create Dimensional tables from Source table:
And I want to create Fact table like this:
Join by values with source table and select IDs:
insert overwrite table fact
select pr.id as property, t.id as type, pl.id as place, s.price
from source_table s
left join property_dim pr on s.property=pr.property
left join type_dim t on s.type=t.type
left join place_dim pl on s.place=pl.place

View the contents and constraints of a table

I am working on this assignment question and it is asking me:
To create a table called (TEMP_CUST) from an existing table Customers
View the content and constraints of TEMP_CUST table
What I have done so far is I have created my table, didn't add any constraints to the table TEMP_CUST and viewed the table using the DESC command.
Here is the code for table creation
CREATE TABLE TEMP_CUST
AS
(SELECT
CUSTOMER#, LASTNAME,
FIRSTNAME, ADDRESS, CITY,
STATE, ZIP, REFERRED,
REGION, EMAIL
FROM
CUSTOMERS);
DESC TEMP_CUST;
Now that I have done that I want to view the constraints of the table. I have used this command but am not sure if it is correct.
SELECT *
FROM USER_CONSTRAINTS
WHERE TABLE_NAME = 'TEMP_CUST';
i have used this command, but not sure if it is correct.
You haven't said why you don't think it's correct so we have to guess the reason for your doubt. Perhaps it's because the set of constraints you get is smaller than the set of constraints for the original CUSTOMERS table?
That is correct. When we use CREATE TABLE ... AS SELECT the statement creates a new table with the projection, column names and datatypes of the original tables (assuming a vanilla SELECT clause) and the data (determined by the WHERE clause, if any). However, the only constraints which are created are NOT NULL constraints on the primary key column(s) and any other mandatory columns. The new table does not have primary key, foreign key or check constraints. We have to create these explicitly.
Hence, this query ...
SELECT * FROM USER_CONSTRAINTS
WHERE TABLE_NAME = 'TEMP_CUST';
... might return fewer constraints than you were expecting.

How to drop hive column?

I have two columns Id and Name in Hive table, and I want to delete the Name column. I have used following command:
ALTER TABLE TableName REPLACE COLUMNS(id string);
The result was that the Name column values were assigned to the Id column.
How can I drop a specific column of the table and is there any other command in Hive to achieve my goal?
In addition to the existing answers to the question : Alter hive table add or drop column
As per Hive documentation,
REPLACE COLUMNS removes all existing columns and adds the new set of columns.
REPLACE COLUMNS can also be used to drop columns. For example, ALTER TABLE test_change REPLACE COLUMNS (a int, b int); will remove column c from test_change's schema.
The query you are using is right. But this will modify only schema i.e, the metastore. This will not modify anything on data side.
So, before you are dropping the column you should make sure that you hav correct data file.
In your case the data file should not contain name values.
If you don't want to modify the file then create another table with only specific column that you need.
Create table tablename as select id from already_existing_table
let me know if this helps.

HIVE: How create a table with all columns in another table EXCEPT one of them?

When I need to change a column into a partition (convert normal column as partition column in hive), I want to create a new table to copy all columns except one. I currently have >50 columns in the original table. Is there any clean way of doing that?
Something like:
CREATE student_copy LIKE student EXCEPT age and hair_color;
Thanks!
You can use a regex:
CTAS using REGEX column spec. :
set hive.support.quoted.identifiers=none;
CREATE TABLE student_copy AS SELECT `(age|hair_color)?+.+` FROM student;
set hive.support.quoted.identifiers=column;
BUT (as mentioned by Kishore Kumar Suthar :
this will not create a partitioned table, as that is not supported with CTAS (Create Table As Select).
Only way I see for you to get your partitioned table is by getting the complete create statement of the table (as mentioned by Abraham):
SHOW CREATE TABLE student;
Altering it to create a partition on the column you want. And after that you can use the select with regex when inserting into the new table.
If your partition column is already part of this select, then you need to make sure it is the last column you insert. If it is not you can exclude that column in the regex and including it as last. Also if you expect several partitions to be created based on your insert statement you need to enable 'dynamic partitioning':
set hive.support.quoted.identifiers=none;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE student_copy PARTITION(partcol1) SELECT `(age|hair_color|partcol1)?+.+`, partcol1 FROM student;
set hive.support.quoted.identifiers=column;
the 'hive.support.quoted.identifiers=none' is required to use the backticks '`' in the regex part of the query. I set this parameter to it's original value after my statement: 'hive.support.quoted.identifiers=column'
CREATE TABLE student_copy LIKE student;
It just copies the source table definition.
CREATE TABLE student_copy AS select name, age, class from student;
Target cannot be partitioned table.
Target cannot be external table.
It copies the structure as well as the data
I use below command to get the create statement of existing table.
SHOW CREATE TABLE student;
Copy the result and modify that based on your requirement for new table and run the modified command to get the new table.

How to let CREATE TABLE...AS SELECT in HIVE do not populate data?

When I run CTAS in HIVE, the data is also populated simultaneously. But I just want to create the table, but not populate the data. How and what I should do? Thanks.
You can do that by using the LIKE keyword.
create table new_table_name LIKE old_table_name
This will create the table structure without the data.
Use create EXTERNAL table instead of create table. Observe External keyword.
Use where condition in select statement and give a value of where which fetches no records from hive.
Example table name demo1
id name country
1 abc India
2 xyz Germany
3 pqr France
In CREATE TABLEā€¦AS SELECT in HIVE
Create table demo2...As SELECT id, name, country from demo1 where id=0;
So, in above where condition of id is given as 0 and from above data the select statement will fetch no record, similarly choose a value in where condition which returns no records. Hence no data will be inserted in newly created table.
#Sunil's answer helped me as well, I am just posting an addition that was necessary in my case.
The source table was in Avro format but the new one I wanted in ORC, hence,
CREATE TABLE dataaggregate_orc_empty LIKE dataaggregate_avro_compressed ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' TBLPROPERTIES ('orc.compress'='ZLIB');
The above step can be split in two steps, if required :
CREATE TABLE dataaggregate_orc_empty LIKE dataaggregate_avro_compressed;
alter table dataaggregate_orc_empty set fileformat ORC;
I would be glad if someone provides inputs for the data format changes that occur in this process and related problems, if any.

Resources