Hive Unexpected DataOperationType: UNSET - hadoop

I am trying to persist to a Hive table from the storm-hive client, and I am getting the following errors in the HiveMetastore server logs.
2020-02-26 23:20:27,748 ERROR org.apache.thrift.server.TThreadPoolServer: [pool-8-thread-178]: Error occurred during processing of message.
java.lang.IllegalStateException: Unexpected DataOperationType: UNSET agentInfo=Unknown txnid:1641
at org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:906) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:781) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]

Try using explode instead of unnest. Check this:
https://stackoverflow.com/a/51846380/9185215

I downgraded the storm-hive client from 2.1.0 to 1.2.3. I also excluded the Hive dependency jars that storm-hive 1.2.3 pulls in and added the Hive client at version 2.1.1 to match my Cloudera environment.
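For reference, here is a rough sketch of what that dependency change could look like in a Maven pom.xml. This is an assumption about the build setup (the question does not show it); the exact artifacts to exclude and re-add, and whether you need the CDH-specific version string from the Cloudera repository, depend on your project. Wildcard exclusions require Maven 3.2.1 or later.

<!-- Sketch only: use storm-hive 1.2.3 but exclude its transitive Hive jars -->
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-hive</artifactId>
    <version>1.2.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hive</groupId>
            <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hive.hcatalog</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<!-- Re-add the Hive client at the version matching the cluster (2.1.1 here) -->
<dependency>
    <groupId>org.apache.hive.hcatalog</groupId>
    <artifactId>hive-hcatalog-streaming</artifactId>
    <version>2.1.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>2.1.1</version>
</dependency>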

Related

Create Hive table with Phoenix handler throws NoClassDefFoundError: org.apache.hadoop.hbase.security.SecurityInfo

I want to create a Hive table on top of a Phoenix table in EMR.
I am facing a NoClassDefFoundError: org.apache.hadoop.hbase.security.SecurityInfo.
What I have done so far:
I followed the instructions from https://phoenix.apache.org/hive_storage_handler.html and added phoenix-hive-5.0.0-HBase-2.0.jar to hive-env.sh as well as to hive-site.xml.
Restarted the Hive service: systemctl restart hive-server2.service
Restarted the metastore: systemctl restart hive-hcatalog-server.service
Executed the CREATE TABLE command from Hue:
create external table ext_table (
i1 int,
s1 string,
f1 float,
d1 decimal
)
STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
TBLPROPERTIES (
"phoenix.table.name" = "ext_table",
"phoenix.zookeeper.quorum" = "localhost",
"phoenix.zookeeper.znode.parent" = "/hbase",
"phoenix.zookeeper.client.port" = "2181",
"phoenix.rowkeys" = "i1",
"phoenix.column.mapping" = "i1:i1, s1:s1, f1:f1, d1:d1"
);
Got an exception: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.security.SecurityInfo)
I am using emr-6.1.0
HBase 2.2.5
Phoenix 5.0.0
Hive 3.1.2
Does anybody have an idea what the issue could be?
Update
I followed the advice from @leftjoin and used ADD JAR from Hue to add the phoenix-hive jar to the classpath. I then ran into a jar compatibility issue caused by the Phoenix Hive connector I was using:
phoenix-hive-5.0.0-HBase-2.0.jar.
The newer versions of the Phoenix connectors are no longer archived into a single bundle that can be downloaded from the Phoenix website; the connectors are now located in a GitHub repo.
I built the new phoenix-hive connector (versions: Phoenix 5.1.0, Hive 3.1.2, HBase 2.2) and used it to create the Hive table.
As a result I got another exception, which I am not able to fix:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/phoenix/compat/hbase/CompatSteppingSplitPolicy
I think it is still somehow connected to dependency issues, but I have no clue what exactly.
As a workaround, put the jar into HDFS and execute the ADD JAR command before the CREATE TABLE statement and any queries:
ADD JAR hdfs://path/to/your/jar/phoenix-hive-5.0.0-HBase-2.0.jar;

How to create external table to DynamoDB on Hive 3.1

I'd like to know if it's possible to have an external table pointing to a DynamoDB table on AWS using Hive.
I'm not using AWS EMR; what I'm using is a Hadoop stack configured through Apache Ambari.
Hive version: Hive 3.1.0.3.1.4.0-315
What I did was:
Downloaded the EMR DynamoDB-Hive connector JARs directly from the Maven repository: https://mvnrepository.com/artifact/com.amazon.emr
I loaded all the JARs via hive.aux.jars.path:
emr-dynamodb-hadoop-4.12.0.jar
emr-dynamodb-hive-4.12.0.jar
emr-dynamodb-tools-4.12.0.jar
hive1.2-shims-4.12.0.jar
hive1-shims-4.12.0.jar
hive2-shims-4.12.0.jar
hive2-shims-4.15.0.jar
shims-common-4.12.0.jar
shims-loader-4.12.0.jar
But when I try to create the table with:
CREATE EXTERNAL TABLE dynamo_LabDynamoHive
(id double, nome string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
"dynamodb.table.name" = "LabDynamoHive",
"dynamodb.column.mapping" = "id:id,nome:nome"
);
I get the following error:
INFO : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Shim class for Hive version 3.1.1000 does not exist
INFO : Completed executing command(queryId=hive_20200422142624_6ebabdc8-8942-4025-84a8-411505d20895); Time taken: 0.203 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Shim class for Hive version 3.1.1000 does not exist (state=08S01,code=1)
I know I'm not loading a shims JAR for Hive 3, but I'd like to know if any of you have tried and succeeded in using an external table with DynamoDB on Hive 3 outside of EMR.
Any help or directions would be greatly appreciated!
The problem is apparently that the source code of this EMR connector is somewhat outdated and lacks the Hive 3.x support recently introduced by AWS for EMR 6.0.
However, you can find a working 3.1 implementation here, forked from the official EMR connector: https://github.com/ramsesrm/emr-dynamodb-connector
Installation steps are as follows:
1. Compile the mentioned code (mvn clean package).
2. Install the 3 JARs in your hive.aux.jars.path, along with the aws-java-sdk-core and aws-java-sdk-dynamodb JARs from AWS (shim JARs are not required), 5 in total.
That's it. Don't forget to specify the region as a TBLPROPERTIES entry if you're not using the default US one.
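For illustration, here is a minimal sketch of the earlier CREATE TABLE with the region added. The dynamodb.region property name and the example region are assumptions; check the README of the connector fork you built for the exact property it expects.

-- Sketch only: region property name assumed, adjust to your connector and region
CREATE EXTERNAL TABLE dynamo_LabDynamoHive (id double, nome string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name" = "LabDynamoHive",
  "dynamodb.column.mapping" = "id:id,nome:nome",
  "dynamodb.region" = "sa-east-1"
);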

HIVE_STATS_JDBC_TIMEOUT for Hive queries in Spark

I've just set up a new Hadoop 3.0 cluster with Hive 2.3.2 and Spark 2.3. When I try to run queries on Hive tables, I get the following error.
I know there were some bugs in Hive, and it seems they were fixed in 2.1.1, but I'm not sure what the situation is with version 2.3.2. Do you have any idea whether this could be handled somehow?
Thanks
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
scala> import spark.sql
import spark.sql
scala> sql("show databases")
java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT
at org.apache.spark.sql.hive.HiveUtils$.formatTimeVarsForHiveClient(HiveUtils.scala:205)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:286)
at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.<init>(HiveSessionStateBuilder.scala:69)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
... 49 elided
I am running Spark 2.3 with Hive 2.3.2 and encountered a similar issue.
The fix you mentioned is for Hive 2.1, as can be seen from the following Spark JIRA:
https://issues.apache.org/jira/browse/SPARK-13446
You can see from the latest comment that people are getting exactly the same error as yours.
Also, as this SO question's answer explains, the current Hive version supported by Spark is 2.1.
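One thing that may be worth trying, though it is not part of the answer above and is untested here: leave Spark's built-in Hive client alone and only tell Spark which metastore client version to use via its standard settings when starting spark-shell. The version value shown is an assumption based on the point above that Spark supports Hive up to 2.1; this will not help if conflicting Hive 2.3 jars are already on Spark's classpath.

spark-shell \
  --conf spark.sql.hive.metastore.version=2.1.1 \
  --conf spark.sql.hive.metastore.jars=maven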

net.sf.jasperreports.engine.JRException: No deserializer defined

I am trying to connect HBase with jasperreports-server-cp-6.0.1. I have Hadoop 2.5.2 and hbase-1.0.1 installed on my system.
I have installed the HBasePlugin-0.5.1.nbm plugin in iReport 5.6.0.
I have followed all the steps given in: http://community.jaspersoft.com/wiki/hadoop-hbase
When I write the following query:
{ "tableName" : "blogposts", "deserializerClass" : "com.jaspersoft.hbase.deserialize.impl.ShellDeserializer" }
In iReport, I am getting the following error:
Message:
net.sf.jasperreports.engine.JRException: No deserializer defined
Level:
SEVERE
Stack Trace:
No deserializer defined
com.jaspersoft.hadoop.hbase.query.HBaseQueryWrapper.<init>(HBaseQueryWrapper.java:152)
com.jaspersoft.hadoop.hbase.HBaseFieldsProvider.getFields(HBaseFieldsProvider.java:50)
com.jaspersoft.ireport.hbase.designer.HBaseFieldsProvider.getFields(HBaseFieldsProvider.java:57)
com.jaspersoft.ireport.hbase.connection.HBaseConnection.readFields(HBaseConnection.java:185)
com.jaspersoft.ireport.designer.wizards.ConnectionSelectionWizardPanel.validate(ConnectionSelectionWizardPanel.java:146)
org.openide.WizardDescriptor$7.run(WizardDescriptor.java:1357)
org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572)
org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997)
Could you please help me with this error (I also tried with iReport 4.0.2, but I received the same error)?
Both iReport and the HBase connector are outdated.
Try using the Apache Phoenix JDBC driver, which is compatible with the latest release (6.2) of the Jaspersoft products:
http://community.jaspersoft.com/wiki/how-use-apache-phoenix-jdbc-driver-run-reports-hbase
Thanks!

Hadoop Hive integration INSERT query

I'm a Hadoop newbie, and I'm trying this tutorial:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
1. Starting Hive is done successfully with these parameters:
hive --auxpath /cygdrive/c/Hadoop/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar,/cygdrive/c/javaHBase/hbase-0.94.6/hbase-0.94.6.jar,/cygdrive/c/Hadoop/hive-0.9.0/lib/zookeeper-3.4.3.jar,/cygdrive/c/Hadoop/hive-0.9.0/lib/guava-r09.jar -hiveconf hbase.master=localhost:60010
2. Starting HBase is successful.
3. "CREATE TABLE hbase_table_1" completes successfully.
4. I verified with the commands list and show tables; everything is OK.
Here is my problem:
"INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=98;"
I get this error message when looking in "http://localhost:50060/tasklog?attemptid...":
java.lang.ClassNotFoundException: org/apache/hadoop/hive/hbase/HBaseSerDe
Continuing ...
java.lang.ClassNotFoundException:
org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat
Continuing ...
java.lang.ClassNotFoundException:
org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat
Continuing ...
java.lang.NullPointerException
Continuing ...
java.lang.NullPointerException
at
org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:280)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:62)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:78)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
...
I tried copying the Hive jars into the HBase install and vice versa...
Note: I added the necessary JARs with the Hive command ADD JAR C:...\hive-hbase-handler-0.9.0.jar, etc.
HBase version: 0.94.6
Hive version: 0.9.0
Any additional export or configuration needed?
I need help, please!
Thanks a lot!
Try adding these jars to the distributed cache by running the following commands in the Hive CLI, or include these lines in your $HOME/.hiverc file. That should resolve the ClassNotFoundException.
ADD JAR ...../hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar;
ADD JAR ...../hbase-0.94.6/hbase-0.94.6.jar;
ADD JAR ...../hbase-0.94.6/lib/zookeeper-3.4.3.jar;
ADD JAR ...../hbase-0.94.6/lib/guava-11.0.2.jar;
ADD JAR ...../hbase-0.94.6/lib/protobuf-java-2.4.0a.jar;
This is required when running Hive queries in MapReduce mode (not local mode).
Do the following and your problem should be solved (change the versions according to your setup).
Copy these files into the Hadoop lib directory:
$HIVE_HOME/lib/hive-hbase-handler-0.8.1.jar, $HIVE_HOME/lib/hbase-0.90.0.jar, $HIVE_HOME/lib/zookeeper-3.3.1.jar, $HIVE_HOME/lib/guava-r06.jar,
and the hive-serde jar.
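A minimal sketch of that copy step, assuming $HADOOP_HOME points at the Hadoop installation and that the jar versions match your environment (the hive-serde jar name is an assumption):

# Sketch: copy the handler and its dependencies into Hadoop's lib directory
cp $HIVE_HOME/lib/hive-hbase-handler-0.8.1.jar \
   $HIVE_HOME/lib/hbase-0.90.0.jar \
   $HIVE_HOME/lib/zookeeper-3.3.1.jar \
   $HIVE_HOME/lib/guava-r06.jar \
   $HIVE_HOME/lib/hive-serde-0.8.1.jar \
   $HADOOP_HOME/lib/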

Resources