Integrating Hbase with Hive: Register Hbase table - hadoop

I am using Hortonworks Sandbox 2.0 which contains the following version of Hbase and Hive
Component Version
------------------------
Apache Hadoop 2.2.0
Apache Hive 0.12.0
Apache HBase 0.96.0
Apache ZooKeeper 3.4.5
...and
I am trying to register my hbase table into hive using the following query
CREATE TABLE IF NOT EXISTS Document_Table_Hive (key STRING, author STRING, category STRING) STORED BY ‘org.apache.hadoop.hive.hbase.HBaseStorageHandler’ WITH SERDEPROPERTIES (‘hbase.columns.mapping’ = ‘:key,metadata:author,categories:category’) TBLPROPERTIES (‘hbase.table.name’ = ‘Document’);
This does not work, I get the following Exception:
2014-03-26 09:14:57,341 ERROR exec.DDLTask (DDLTask.java:execute(435)) – java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.setConf(HBaseStorageHandler.java:249)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
2014-03-26 09:14:57,368 ERROR ql.Driver (SessionState.java:printError(419)) – FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/hadoop/hbase/HBaseConfiguration
I have already created the Hbase Table “Document” and the describe command gives the following description
‘Document’,
{NAME => ‘categories’,..},
{NAME => ‘comments’,..},
{NAME => ‘metadata’,..}
I have tried the following things
add hive.aux.jars.path in hive-site.xml
hive.aux.jars.path
file:///etc/hbase/conf/hbase-site.xml,file:///usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-76-hadoop2.jar,file:///usr/lib/hive/lib/hive-hbase-handler-0.12.0.2.0.6.0-76.jar,file:///usr/lib/hbase/lib/hbase-client-0.96.0.2.0.6.0-76-hadoop2.jar,file:///usr/lib/zookeeper/zookeeper-3.4.5.2.0.6.0-76.jar
add jars using hive add jar command
add jar /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-76-hadoop2.jar;
add jar /usr/lib/hive/lib/hive-hbase-handler-0.12.0.2.0.6.0-76.jar;
add jar /usr/lib/hbase/lib/hbase-client-0.96.0.2.0.6.0-76-hadoop2.jar;
add jar /usr/lib/zookeeper/zookeeper-3.4.5.2.0.6.0-76.jar;
add file /etc/hbase/conf/hbase-site.xml
specify the hadoop_classpath
export HADOOP_CLASSPATH=/etc/hbase/conf:/usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-76-hadoop2:/usr/lib/zookeeper/zookeeper-3.4.5.2.0.6.0-76.jar
And it is still not working!
How can I add the jars in the hive classpath so that it finds the hbaseConfiguration class,
or is it an entirely different issue?

No need to copy the entire jars. Just hbase-*.jar , zookeeper*.jar, hive-hbase-handler*.jar would be enough. By default all hadoop related jars would be added to hadoop classpath, Since hive internally uses hadoop command to execute.
Or
Instead of copying hbase jars to hive library by specifying HIVE_AUX_JARS_PATH environment variable to /usr/lib/hbase/lib/ in /etc/hive/conf/hive-env.sh will also do.
The second approach is more suggested than first

Related

Create Hive table with Phoenix handler throws NoClassDefFoundError: org.apache.hadoop.hbase.security.SecurityInfo

I want to create hive table on top of phoenix table in emr.
I am facing a NoClassDefFoundError: org.apache.hadoop.hbase.security.SecurityInfo
What I have done so far:
I followed the instructions from https://phoenix.apache.org/hive_storage_handler.html and added phoenix-hive-5.0.0-HBase-2.0.jar to hive-env.sh as well as in hive-site.xml .
Restarted the hive service systemctl restart hive-server2.service
Restarted the metastore systemctl restart hive-hcatalog-server.service
Executed create table command from hue:
create external table ext_table (
i1 int,
s1 string,
f1 float,
d1 decimal
)
STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
TBLPROPERTIES (
"phoenix.table.name" = "ext_table",
"phoenix.zookeeper.quorum" = "localhost",
"phoenix.zookeeper.znode.parent" = "/hbase",
"phoenix.zookeeper.client.port" = "2181",
"phoenix.rowkeys" = "i1",
"phoenix.column.mapping" = "i1:i1, s1:s1, f1:f1, d1:d1"
);
Got an exception: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.security.SecurityInfo)
I am using emr-6.1.0
HBase 2.2.5
Phoenix 5.0.0
Hive 3.1.2
Anybody has an idea what can be the issue?
Update
I followed the advice from #leftjoin and used ADD JAR from hue to add phoenix-hive jar to classpath. Then I faced jar compatibility issue caused by phoenix hive connector that I use:
phoenix-hive-5.0.0-HBase-2.0.jar.
The newer versions of phoenix connectors are not archived into single bundle that could be downloaded from phoenix website . Instead
the connectors are located now in github repo.
I built the new phoenix-hive connector (versions: Phoenix->5.1.0, Hive->3.1.2, Hbase->2.2) and used it to create the Hive table.
As a result I got another exception, which I am not able to fix:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/phoenix/compat/hbase/CompatSteppingSplitPolicy
I think it is still somehow connected to dependency issues. But no clue what is exactly.
As a workaround put jar into hdfs and execute ADD JAR command before create table and query:
ADD JAR hdfs://path/to/your/jar/phoenix-hive-5.0.0-HBase-2.0.jar;

How to create external table to DynamoDB on Hive 3.1

I'd like to know if it's possible to have an external table pointing to a DynamoDB table on AWS using Hive.
I'm not using AWS EMR, what I'm using is a Hadoop Stack configured through Apache Ambari.
Hive version: Hive 3.1.0.3.1.4.0-315
What I did was:
Downloaded the EMR Dynamo-Hive connector JARS directly from the maven repository: https://mvnrepository.com/artifact/com.amazon.emr
I loaded all the JARS in hive.aux.jars.path:
emr-dynamodb-hadoop-4.12.0.jar
emr-dynamodb-hive-4.12.0.jar
emr-dynamodb-tools-4.12.0.jar
hive1.2-shims-4.12.0.jar
hive1-shims-4.12.0.jar
hive2-shims-4.12.0.jar
hive2-shims-4.15.0.jar
shims-common-4.12.0.jar
shims-loader-4.12.0.jar
But when I try to create the table with:
CREATE EXTERNAL TABLE dynamo_LabDynamoHive
(id double, nome string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
"dynamodb.table.name" = "LabDynamoHive",
"dynamodb.column.mapping" = "id:id,nome:nome"
);
I get the following error:
INFO : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Shim class for Hive version 3.1.1000 does not exist
INFO : Completed executing command(queryId=hive_20200422142624_6ebabdc8-8942-4025-84a8-411505d20895); Time taken: 0.203 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Shim class for Hive version 3.1.1000 does not exist (state=08S01,code=1)
I know I'm not loading a Shims JAR for Hive 3, but I'd like to know if any of you have tried and succeded in using an external table with DynamoDB using Hive 3 outside of EMR.
Any help or directions would be greatly appreciated!
problem is apparently the source code of this EMR connector is somewhat outdated and lacks Hive 3.x support recently introduced by AWS for EMR 6.0.
However, you can find a working 3.1 implementation here, forked from the official EMR connector: https://github.com/ramsesrm/emr-dynamodb-connector
Installation steps as follow:
1- compile the mentioned code (mvn clean package)
2- install the 3 JARs in your hive.aux.jars.path, along with aws-java-sdk-core and aws-java-sdk-dynamodb JARs from AWS (shim JARs are not required), 5 in total.
Tha's it. Don't forget to specify the region as a TBLPROPERTIES if you're not using the default US one.

Cannot create Hive external table using jdbcStorageHandler

I am running a small cluster in Amazone EMR in order to play with Apache Hive 2.3.5. It is my understanding that Apache Hive can import data from a remote database and have the cluster to run queries. I was following an example that is provided in Apache Hive web documentation (https://cwiki.apache.org/confluence/display/Hive/JdbcStorageHandler) and created the following code:
CREATE EXTERNAL TABLE hive_table
(
col1 int,
col2 string,
col3 date
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
'hive.sql.database.type'='POSTGRES',
'hive.sql.jdbc.driver'='org.postgresql.Driver',
'hive.sql.jdbc.url'='jdbc:postgresql://<url>/<dbname>',
'hive.sql.dbcp.username'='<username>',
'hive.sql.dbcp.password'='<password>',
'hive.sql.table'='<dbtable>',
'hive.sql.dbcp.maxActive'='1'
);
But I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: Property hive.sql.query is required.)
According to the documentation, I need to specify either “hive.sql.table” or “hive.sql.query” to tell how to get data from jdbc database. But if I replace hive.sql.table with hive.sql.query I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: No enum constant org.apache.hive.storage.jdbc.conf.DatabaseType.POSTGRES)
I tried looking in the web for a solution and it doesn't look like anyone experience the same issues that I am having. Do I need to modify a config file or am I missing something critical in my code?
I think you are using a version of the jar which doesn't support POSTGRES.
Download the latest jar from this link:
http://repo1.maven.org/maven2/org/apache/hive/hive-jdbc-handler/3.1.2/hive-jdbc-handler-3.1.2.jar
Put this downloaded jar into a hdfs location.
Run hive normally.
Run command: add jar ${HDFS_PATH_TO_DOWNLOADED_JAR}
Run your create table command

XmlSerde error in hive

While trying to execute the create table statement in hive getting the below error.
CREATE EXTERNAL TABLE BOOKDATA(
> TITLE VARCHAR(40),
> PRICE INT
> )ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
> WITH SERDEPROPERTIES (
> "column.xpath.TITLE"="/CATALOG/BOOK/TITLE/",
> "column.xpath.PRICE"="/CATALOG/BOOK/PRICE/")
> STORED AS
> INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
> LOCATION '/sourcedata'
> TBLPROPERTIES (
> "xmlinput.start"="<CATALOG",
> "xmlinput.end"= "</CATALOG>"
> );
FAILED: SemanticException Cannot find class 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
Please help on how to resolve this issue.I am using hive CLI.
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution
1.
Requires restart of HiveServer2
hive.aux.jars.path
- Default Value: (empty)
- Added In: Hive 0.2.0 or earlier
The location of the plugin jars that contain implementations
of user defined functions (UDFs) and SerDes.
2.
hive.reloadable.aux.jars.path
- Default Value: (empty)
- Added In: Hive 0.14.0 with HIVE-7553
The locations of the plugin jars, which can be comma-separated folders or jars. They can be renewed (added, removed,
or updated) by executing the Beeline reload command without having to
restart HiveServer2. These jars can be used just like the auxiliary
classes in hive.aux.jars.path for creating UDFs or SerDes.
I too faced a similar issue in past. What worked out for me is just changing the hive engine to tez.
set hive.execution.engine=tez;
When you set Tez engine, It will pick up all the jars it needs to run a query.
Let me know if that works out for you.

Hadoop Hive integration INSERT query

i'am hadoop newbie,and i'am trying this tutorial:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
1.Starting hive is done successfully with the parameter:
hive --auxpath /cygdrive/c/Hadoop/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar,/cygdrive/c/javaHBase/hbase-0.94.6/hbase-0.94.6.jar,/cygdrive/c/Hadoop/hive-0.9.0/lib/zookeeper-3.4.3.jar,/cygdrive/c/Hadoop/hive-0.9.0/lib/guava-r09.jar -hiveconf hbase.master=localhost:60010
2.starting hbase is successuful.
3."CREATE TABLE hbase_table_1" is done successfully
4.I verified with commands list and show tables, all is ok
here is my problem
"INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=98;"
i get this error message searching in "htp://localhost:50060/tasklog?attemptid..."
java.lang.ClassNotFoundException: org/apache/hadoop/hive/hbase/HBaseSerDe
Continuing ...
java.lang.ClassNotFoundException:
org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat
Continuing ...
java.lang.ClassNotFoundException:
org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat
Continuing ...
java.lang.NullPointerException
Continuing ...
java.lang.NullPointerException
at
org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:280)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:62)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:78)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
...
i tried to copy hive jars to hbase install and vice vesa...
note: i added necessary JARS with hive command: ADD JAR C:...\hive-hbase-handler-0.9.0.jar etc.
hbase version: 0.94.6
hive version: 0.9.0
any additional export? or configuration?
I need help please!
Thanks a lot!
Try adding these jars in ditributed cache by running following commands in hive CLI or include these lines in $HOME/.hiverc file. That should resolve the ClassNotFoundException.
ADD JAR ...../hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar;
ADD JAR ...../hbase-0.94.1/hbase-0.94.6.jar;
ADD JAR ...../hbase-0.94.1/lib/zookeeper-3.4.3.jar;
ADD JAR ...../hbase-0.94.1/lib/guava-11.0.2.jar;
ADD JAR ...../hbase-0.94.1/lib/protobuf-java-2.4.0a.jar;
This is required when running hive queries in mapred mode (not local).
Brother do following things. your problem will be solved. ( change according to your version )
copy these files in hadoop lib directory
$HIVE_HOME/lib/hive-hbase-handler-0.8.1.jar,$HIVE_HOME/lib/hbase-0.90.0.jar,$HIVE_HOME/lib/zookeeper-3.3.1.jar,$HIVE_HOME/lib/guava-r06.jar,
and hive-serde jar

Resources