Does the AvroSerDe in Hive allow updating or deleting records? - hadoop

I have a table in Hive which was created using ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'.
When I try to update a record, I receive the following error message:
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations. (state=42000,code=10294)
It seems that AvroSerDe does not support ACID transactions, but I can't find any information about that.

Hive transactions do not support the Avro file format as of the latest Hive release; transaction support is currently integrated only with the ORC file format.
According to the Hive documentation: "Only ORC file format is supported in this first release. The feature has been built such that transactions can be used by any storage format that can determine how updates or deletes apply to base records (basically, that has an explicit or implicit row id), but so far the integration work has only been done for ORC."
You can find more information about Hive transactions here
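If you need to run UPDATE or DELETE against this data, one common workaround is to copy it into a transactional ORC table, since ACID DML currently works only there. A minimal sketch, assuming hypothetical table and column names (my_avro_table, id, name) because the original schema is not shown:

-- Create a bucketed, ORC-backed, transactional table and copy the Avro data into it.
create table my_table_orc (id int, name string)
clustered by (id) into 4 buckets
stored as orc
TBLPROPERTIES ('transactional'='true');

insert into table my_table_orc select id, name from my_avro_table;

Here my_avro_table stands in for the AvroSerDe table from the question.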

Related

How to delete an existing record that has already been loaded into Hive

I load data daily into an external Hive table from the local file system, and the table now holds about a year of data. Today the client informed me that yesterday's data was incorrect. How do I delete yesterday's data from a table that already holds such a huge amount of data?
You can only delete data from a Hive table by using Hive transaction management, but there are certain limitations:
1) The file format must be ORC.
2) The table must be bucketed.
3) Transactions cannot be enabled on an external table, because it is outside the metastore's control.
By default the transaction management feature is off. You can turn it on by updating the hive-site.xml file.
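A minimal sketch of how the whole flow looks; the properties can also be set per session as below instead of in hive-site.xml, and the table name, column names, and date are placeholders:

set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;

-- The target must be a managed (non-external), bucketed, ORC, transactional table.
create table sales_acid (id int, amount double, load_date string)
clustered by (id) into 4 buckets
stored as orc
TBLPROPERTIES ('transactional'='true');

-- Remove the incorrect day's rows.
delete from sales_acid where load_date = '2016-05-09';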

tblproperties("skip.header.line.count"="1") added while creating table in hive is making some issue in Imapla

I have created a table in Hive and need to load the data from a CSV file, so while creating the table I specified the table property tblproperties("skip.header.line.count"="1").
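The DDL presumably looks something like this (a sketch; the table name and columns are guesses, since they are not shown in the question):

create table csv_table (col1 string, col2 string)
row format delimited fields terminated by ','
stored as textfile
tblproperties("skip.header.line.count"="1");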
And I have loaded data into my table. This is my input file content.
After loading the data I am able to see the correct output in the Hive console, whereas when I fetch data from the same table in Impala there is a problem: it does not skip the header as I specified at table creation time.
The Impala result is like below.
Now my question is:
Why is Impala not able to honour the table property and skip the header?
Please give me some information about it.

Attempt to do update or delete using transaction manager that does not support these operations

While trying to update data in a Hive table in the Cloudera Quickstart VM, I'm getting this error.
Error while compiling statement: FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
I made some changes in the hive-site.xml file and also restarted Hive and Cloudera. These are the changes I made in hive-site.xml:
hive.support.concurrency – true
hive.enforce.bucketing – true
hive.exec.dynamic.partition.mode – nonstrict
hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on – true
hive.compactor.worker.threads – 1
I've tried the configuration you provided in a Hortonworks sandbox and I was able to do ACID operations on a table, and I suppose it also works in a Cloudera environment. There are a few things to mention, though:
make sure Hive has the properties you gave it (you can verify them in the Hive CLI using the SET command)
the table you work with must be bucketed, stored as ORC, and have 'transactional'='true' in its table properties (Hive supports ACID operations only for ORC-format, transactional tables). An example of a proper table is like this:
hive> create table testTableNew (id int, name string) clustered by (id) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
You can follow this example.
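To double-check that the session really picked up the transaction manager, and then to test DML against that table, something along these lines should work (the values are arbitrary):

hive> set hive.txn.manager;
hive> insert into table testTableNew values (1, 'first');
hive> update testTableNew set name = 'changed' where id = 1;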

Hive - HBase integration: transactional update with timestamp

I am new to Hadoop and big data, and I am currently trying to figure out the possibilities of moving my data store to HBase, but I have come across a problem which some of you might be able to help me with.
I have an HBase table "hbase_testTable" with column family "ColFam1". I have set the maximum versions of "ColFam1" to 10, as I have to maintain a history of up to 10 updates to this column family, and that works fine. When I add new rows through the HBase shell with an explicit timestamp value it also works fine; basically I want to use the timestamp as my version control. So I specify the timestamp as
put 'hbase_testTable', '1001', 'ColFam1:q1', '1000$', 3
where '3' is my version. And everything works fine.
Now I am trying to integrate with a Hive external table, and I have all mappings set to match those of the HBase table, like below:
create external table testtable (id string, q1 string, q2 string, q3 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,colfam1:q1, colfam1:q2, colfam1:q3")
TBLPROPERTIES ("hbase.table.name" = "testtable", "transactional" = "true");
And it works fine with normal insertion; it updates the HBase table, and vice versa.
Even though the external table is marked "transactional", I am not able to update the data from Hive. It gives me an error:
FAILED: SemanticException [Error 10294]: Attempt to do update or delete
using transaction manager that does not support these operations
That said, any updates made to the HBase table are reflected immediately in the Hive table.
I can update the HBase table through the Hive external table by inserting into the Hive external table for the "rowid" with new data for the column.
Is it possible for me to control the timestamp being written to the referenced HBase table (like 4, 5, 6, 7, etc.)? Please help.
The timestamp is an important element of HBase versioning. You are trying to create your own timestamps, which works fine at the HBase level.
One point: you should be very careful to keep them unique and non-negative. You can look at Custom Versioning in the HBase: The Definitive Guide book.
Now you have Hive on top of HBase. As per the documentation,
there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp.
That's for the reading part. For putting data, you can look here.
It still says that you have to give a valid timestamp and not any other value.
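Concretely, the only write path from the Hive side is a plain insert, which lands at the current HBase server time; there is no way to pass 3, 4, 5... as the cell version from Hive. A rough sketch against the external table from the question (the row key and values are placeholders):

-- Re-read the row and overwrite q1; HBase stores this as a new version at the current
-- timestamp, and ColFam1's max-versions setting keeps the older cells around.
insert into table testtable
select '1001' as id, '1200$' as q1, q2, q3 from testtable where id = '1001';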
Future versions are expected to expose the timestamp attribute.
I hope this gives you a better idea of how to deal with custom timestamps in Hive-HBase integration.

create a Parquet backed Hive table by using a schema file

The Cloudera documentation shows a simple way to "create an Avro backed Hive table by using an Avro schema file", and this works great. I would like to do the same thing for a Parquet-backed Hive table, but the relevant documentation in that case lists out every column type rather than reading them from a schema. Is it possible to read the Parquet columns from a schema file, in the same way as with Avro data?
Currently, the answer appears to be no. There is an open issue with Hive.
https://issues.apache.org/jira/browse/PARQUET-76
The issue has been active recently, so hopefully in the near future Hive will offer the same functionality for Parquet as it does for Avro.
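For reference, the Avro shortcut the question refers to looks roughly like this (a sketch; the table name and schema path are placeholders), and it is exactly this schema-file-driven DDL that is still missing for Parquet:

-- The column definitions are read from the Avro schema file instead of being listed in the DDL.
CREATE TABLE my_avro_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/my_table.avsc');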
