Hive table column comment always shows "from deserializer"

I ran the following query to create a Hive table:
CREATE TABLE db1.test_create_tbl( column1 smallint COMMENT 'desc of column')
COMMENT 'desc of table'
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;
Then I ran this query to display the table schema:
DESCRIBE db1.test_create_tbl;
But in the output, the column comment is always "from deserializer" instead of 'desc of column'.
Any help is appreciated, thanks.

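I came across the same issue. Changing the Hive configuration fixed it for me: set the following parameter in the Hive shell.
set hive.serdes.using.metastore.for.schema=org.apache.hadoop.hive.serde2.OpenCSVSerde;
This property lists the SerDes for which Hive reads the column schema (including comments) from the metastore instead of asking the SerDe. A set command only lasts for the current session; to make it permanent, put it in hive-site.xml. Setting it this way replaces the default list, so you may prefer to append OpenCSVSerde to the existing value.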

Related

Hive Alter External Table and Update Schema

I am looking for a command to add columns and update the schema of my Hive external table, which is backed by an Avro schema.
Here is what I have tried so far.
I created the Hive external table with an Avro-backed schema using this command:
CREATE EXTERNAL TABLE `person_hourly`(
`personid` string COMMENT '',
`name` string COMMENT ''
)
PARTITIONED BY (
`partitiontime` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION
'hdfs://nameservice1/web/PersonData/'
TBLPROPERTIES (
'avro.schema.url'='hdfs:///schemas/PersonV1.avsc'
);
I would like to add columns and update the schema for this table:
alter table person_hourly ADD COLUMNS (lastname string ) SET TBLPROPERTIES ('avro.schema.url' = 'hdfs:///schemas/PersonV2.avsc')
But this fails with the error:
FAILED: ParseException line 1:64 missing EOF at 'SET' near ')'
So I tried adding the column separately, which worked, but updating the schema then fails with:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. at least one column must be specified for the table
The Hive Data Definition Language (DDL) documentation gives this syntax for setting table properties with ALTER TABLE:
ALTER TABLE table_name SET TBLPROPERTIES table_properties;
 
table_properties:
  : (property_name = property_value, property_name = property_value, ... )
Regarding your comment:
"I tried adding column separately, which worked"
I think that is what you should do: add the column first, then set the properties in a separate statement.
If you modify the schema file in HDFS, Hive will detect the change. When you point to an .avsc file through avro.schema.url, Hive reads the schema at runtime and does not save any schema information itself.
Regards,
Hector
The following worked for me.
You can change the schema definition in the .avsc file (keeping it valid Avro JSON) and then simply run an ALTER TABLE command that sets the path of the updated schema file:
ALTER TABLE table_name SET TBLPROPERTIES ('avro.schema.url'='path of updated schema avsc format file');
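As an illustration of the "change the schema definition in the avsc file" step, here is a minimal Python sketch that derives a V2 schema from an assumed V1 schema. PersonV1.avsc is not shown in the question, so the record name Person and the field types are assumptions. The new lastname field is an optional ["null", "string"] union with default null: under Avro schema resolution, a field that exists only in the reader schema needs a default, which keeps files written with the old schema readable.

```python
import json

# Assumed V1 schema for person_hourly. The record name "Person" and the
# field types are guesses, since PersonV1.avsc is not shown.
PERSON_V1 = {
    "type": "record",
    "name": "Person",
    "fields": [
        {"name": "personid", "type": "string"},
        {"name": "name", "type": "string"},
    ],
}


def add_optional_string_field(schema, field_name):
    """Return a copy of an Avro record schema with one extra optional field.

    The field is a ["null", "string"] union with default null, so data files
    written with the old schema can still be read with the new one.
    """
    evolved = json.loads(json.dumps(schema))  # deep copy via JSON round trip
    evolved["fields"].append(
        {"name": field_name, "type": ["null", "string"], "default": None}
    )
    return evolved


person_v2 = add_optional_string_field(PERSON_V1, "lastname")
# Save this output as PersonV2.avsc before uploading it to HDFS.
print(json.dumps(person_v2, indent=2))
```

After uploading the file (for example to hdfs:///schemas/PersonV2.avsc), point the table at it with ALTER TABLE person_hourly SET TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/PersonV2.avsc'); the partition column partitiontime does not belong in the Avro schema.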

Update new added columns in hive

I have been trying to update rows in an ORC table in Hive that is bucketed and has the transactional=true property set. Normal updates work fine, but as soon as I alter the table to add a new column (e.g. column_added_5) and try to update column_added_5, the statement executes but the column is not updated.
Any help or direction is appreciated.
I think one way is to rebuild the table with the new column:
CREATE TABLE new_table_name AS SELECT column1,column2,column3, ... "default_value" as column_added_5 FROM your_table_name;
DROP TABLE your_table_name;
ALTER TABLE new_table_name RENAME TO your_table_name;
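The three statements above (CTAS with a default value, drop, rename) can be tried out in this small Python sketch. It uses an in-memory SQLite database as a stand-in for Hive, so it only demonstrates the SQL pattern; the two-column source table and its rows are made up. In Hive, keep in mind that the CTAS target will not automatically inherit the original table's bucketing or transactional properties.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table_name (column1 INTEGER, column2 TEXT)")
conn.executemany(
    "INSERT INTO your_table_name VALUES (?, ?)", [(1, "a"), (2, "b")]
)

# Step 1: copy the existing columns and add the new one with a default value.
# SQLite needs single quotes for string literals.
conn.execute(
    "CREATE TABLE new_table_name AS "
    "SELECT column1, column2, 'default_value' AS column_added_5 "
    "FROM your_table_name"
)
# Step 2: drop the original table.
conn.execute("DROP TABLE your_table_name")
# Step 3: give the copy the original name.
conn.execute("ALTER TABLE new_table_name RENAME TO your_table_name")

columns = [row[1] for row in conn.execute("PRAGMA table_info(your_table_name)")]
rows = conn.execute("SELECT * FROM your_table_name ORDER BY column1").fetchall()
print(columns)
print(rows)
```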
Did you try this:
ALTER TABLE table_name ADD COLUMNS ( column_added_5 STRING COMMENT 'Column 5');

How to make CREATE TABLE...AS SELECT in Hive not populate data?

When I run CTAS in Hive, the table is populated with data at the same time. I just want to create the table without populating it. What should I do? Thanks.
You can do that by using the LIKE keyword.
create table new_table_name LIKE old_table_name;
This will create the table structure without the data.
To create an external table instead, use CREATE EXTERNAL TABLE rather than CREATE TABLE (note the EXTERNAL keyword).
Use a WHERE condition in the SELECT statement with a value that matches no records in Hive.
Example table demo1:
id name country
1 abc India
2 xyz Germany
3 pqr France
The CREATE TABLE…AS SELECT in Hive:
CREATE TABLE demo2 AS SELECT id, name, country FROM demo1 WHERE id = 0;
Here the WHERE condition is id = 0, and since no row in the data above has id 0, the SELECT returns no records. In the same way, choose any condition that returns no records; no data will then be inserted into the newly created table.
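The WHERE-trick above can be checked with this Python sketch, which uses SQLite as a stand-in for Hive (an assumption; the CTAS statement itself should read the same in HiveQL). It uses WHERE 1 = 0 instead of id = 0, because a condition that is always false does not depend on which values happen to be in the data.

```python
import sqlite3

# SQLite stands in for Hive here; the demo1 rows are the example data above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo1 (id INTEGER, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO demo1 VALUES (?, ?, ?)",
    [(1, "abc", "India"), (2, "xyz", "Germany"), (3, "pqr", "France")],
)

# 1 = 0 is false for every row, so only the column list is copied.
conn.execute(
    "CREATE TABLE demo2 AS SELECT id, name, country FROM demo1 WHERE 1 = 0"
)

columns = [row[1] for row in conn.execute("PRAGMA table_info(demo2)")]
row_count = conn.execute("SELECT COUNT(*) FROM demo2").fetchone()[0]
print(columns, row_count)
```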
Sunil's answer helped me as well; I am just posting an addition that was necessary in my case.
The source table was in Avro format, but I wanted the new one in ORC, hence:
CREATE TABLE dataaggregate_orc_empty LIKE dataaggregate_avro_compressed ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' TBLPROPERTIES ('orc.compress'='ZLIB');
The above step can be split into two steps if required:
CREATE TABLE dataaggregate_orc_empty LIKE dataaggregate_avro_compressed;
alter table dataaggregate_orc_empty set fileformat ORC;
I would be glad if someone could provide input on the data format changes that occur in this process and any related problems.

How to get hive table comment through JDBC

Assume a table is created in Hive like this:
create table test1
(
field_name int
) comment 'TestTableComment';
Now I'd like to get the table comment (TestTableComment) through JDBC in Java. How can I do that?
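Have you tried DatabaseMetaData.getTables()? The JDBC specification returns a table's comment in the REMARKS column of that result set, so check whether your Hive driver fills it in.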

Hive load specific columns

I am interested in loading specific columns into a table created in Hive.
Is it possible to load specific columns directly, or should I load all the data and create a second table to SELECT the specific columns?
Thanks
Yes, you have to load all the data first, like this:
LOAD DATA [LOCAL] INPATH '/Your/Path' [OVERWRITE] INTO TABLE yourTable;
LOCAL means that the file is on your local file system rather than in HDFS; OVERWRITE means that the existing data in the table is replaced.
Then create a second table with only the fields you need and run this query:
INSERT OVERWRITE TABLE yourNewTable
yourSelectStatement
FROM yourOldTable;
I suggest creating an external table in Hive that maps the data you have, then creating a new table with only the specific columns using CREATE TABLE AS SELECT:
create table new_table_name as select column_list from source_table_name;
For example:
create table employee as select id as id,emp_name as name from emp;
Try this:
Insert into table_name
(
-- columns you want to insert values into, in lowercase
)
select columns_you_need from source_table;
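The load-everything-then-select-columns approach, combined with an INSERT column list, is sketched below in Python with SQLite standing in for Hive (an assumption; column lists in Hive INSERT statements are, as far as I know, supported from Hive 1.2.0). The emp table plays the role of the fully loaded table, and its dept and salary columns and rows are hypothetical.

```python
import sqlite3

# SQLite stands in for Hive; emp plays the role of the fully loaded table.
conn = sqlite3.connect(":memory:")
# dept and salary are hypothetical extra columns present in the raw file.
conn.execute(
    "CREATE TABLE emp (id INTEGER, emp_name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [(1, "alice", "eng", 100.0), (2, "bob", "ops", 90.0)],
)

# Target table holding only the columns that are needed.
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT)")
# Insert with an explicit column list, selecting just those source columns.
conn.execute("INSERT INTO employee (id, name) SELECT id, emp_name FROM emp")

columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
rows = conn.execute("SELECT * FROM employee ORDER BY id").fetchall()
print(columns, rows)
```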
