How to connect Hive in iReport? - hadoop

I am using Hadoop-0.20.0 and Hive-0.8.0. Now i have data into Hive table and i want generate reports from that. For that I am using iReport-4.5.0. For that I also download HivePlugin-0.5.nbm in iReport.
Now I am going to connect Hive connection in iReport.
Create New Data source --> New --> Hive Connection
Jdbc Drive: org.apache.hadoop.hive.jdbc.HiveDriver
Jdbc URl: jdbc:hive//localhost:10000/default
Server Address: localhost
Database: default
user name: root
password: somepassword
Then click on Test connection button.
I am getting error like:
Exception
Message:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* format)
Level:
SEVERE
Stack Trace:
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException:
java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* format)
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:226)
org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:72)
org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:110)
com.jaspersoft.ireport.designer.connection.JDBCConnection.getConnection(JDBCConnection.java:140)
com.jaspersoft.ireport.hadoop.hive.connection.HiveConnection.getConnection(HiveConnection.java:48)
com.jaspersoft.ireport.designer.connection.JDBCConnection.test(JDBCConnection.java:447)
com.jaspersoft.ireport.designer.connection.gui.ConnectionDialog.jButtonTestActionPerformed(ConnectionDialog.java:335)
com.jaspersoft.ireport.designer.connection.gui.ConnectionDialog.access$300(ConnectionDialog.java:43)
Can any one help me in this? Where i am wrong or missing something?

"I also download HivePlugin-0.5.nbm in iReport."
This isn't clear. iReport 4.5 has the Hadoop Hive connector pre-installed. Why did you download the connector separately? Did you install this plugin?
Create New Data source --> New --> Hive Connection
Jdbc Drive: org.apache.hadoop.hive.jdbc.HiveDriver
...
This isn't possible with the current Hadoop Hive connector. When you create a new "Hadoop Hive Connection" you are given only one parameter to fill out: the url.
I'm guessing that you created a JDBC connection when you meant to create a Hadoop Hive connection. This is a logical thing to do. Hive is accessed via JDBC. But the Hive JDBC driver is still pretty new. It has a number of shortcomings. That's why the Hive connector was added to iReport. It is based on the Hive JDBC driver, but it includes a wrapper around it to avoid some problems.
Or maybe you installed an old Hive connector over the top of the one that's already included with iReport 4.5. At some point in the past the Hive connector let you fill in extra information like the JDBC Driver.
Start with a fresh iReport installation, and make sure you use the Hadoop Hive Connection. That should clear it up.

The error "java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* format)" happens because the VersionInfo class in hadoop-common.jar attempts to locate the version info using the current thread's class loader.
https://github.com/apache/hadoop/blob/release-2.6.0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/VersionInfo.java#L41-L58
The code in question looks like this...
package org.apache.hadoop.util;
...
public class VersionInfo {
...
protected VersionInfo(String component) {
info = new Properties();
String versionInfoFile = component + "-version-info.properties";
InputStream is = null;
try {
is = Thread.currentThread().getContextClassLoader()
.getResourceAsStream(versionInfoFile);
if (is == null) {
throw new IOException("Resource not found");
}
info.load(is);
} catch (IOException ex) {
LogFactory.getLog(getClass()).warn("Could not read '" +
versionInfoFile + "', " + ex.toString(), ex);
} finally {
IOUtils.closeStream(is);
}
}
If your tool attempts to connect to the datasource in a separate thread, it will generate this error.
The easiest way to work around the issue is to put the hadoop-common.jar library in $JAVA_HOME/lib/ext or use the command line setting -Djava.endorsed.dirs to point to the hadoop-common.jar library. Then the thread's class loader will always be able to find this information.

Related

Create Hive table with Phoenix handler throws NoClassDefFoundError: org.apache.hadoop.hbase.security.SecurityInfo

I want to create hive table on top of phoenix table in emr.
I am facing a NoClassDefFoundError: org.apache.hadoop.hbase.security.SecurityInfo
What I have done so far:
I followed the instructions from https://phoenix.apache.org/hive_storage_handler.html and added phoenix-hive-5.0.0-HBase-2.0.jar to hive-env.sh as well as in hive-site.xml .
Restarted the hive service systemctl restart hive-server2.service
Restarted the metastore systemctl restart hive-hcatalog-server.service
Executed create table command from hue:
create external table ext_table (
i1 int,
s1 string,
f1 float,
d1 decimal
)
STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
TBLPROPERTIES (
"phoenix.table.name" = "ext_table",
"phoenix.zookeeper.quorum" = "localhost",
"phoenix.zookeeper.znode.parent" = "/hbase",
"phoenix.zookeeper.client.port" = "2181",
"phoenix.rowkeys" = "i1",
"phoenix.column.mapping" = "i1:i1, s1:s1, f1:f1, d1:d1"
);
Got an exception: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.security.SecurityInfo)
I am using emr-6.1.0
HBase 2.2.5
Phoenix 5.0.0
Hive 3.1.2
Anybody has an idea what can be the issue?
Update
I followed the advice from #leftjoin and used ADD JAR from hue to add phoenix-hive jar to classpath. Then I faced jar compatibility issue caused by phoenix hive connector that I use:
phoenix-hive-5.0.0-HBase-2.0.jar.
The newer versions of phoenix connectors are not archived into single bundle that could be downloaded from phoenix website . Instead
the connectors are located now in github repo.
I built the new phoenix-hive connector (versions: Phoenix->5.1.0, Hive->3.1.2, Hbase->2.2) and used it to create the Hive table.
As a result I got another exception, which I am not able to fix:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/phoenix/compat/hbase/CompatSteppingSplitPolicy
I think it is still somehow connected to dependency issues. But no clue what is exactly.
As a workaround put jar into hdfs and execute ADD JAR command before create table and query:
ADD JAR hdfs://path/to/your/jar/phoenix-hive-5.0.0-HBase-2.0.jar;

How to create external table to DynamoDB on Hive 3.1

I'd like to know if it's possible to have an external table pointing to a DynamoDB table on AWS using Hive.
I'm not using AWS EMR, what I'm using is a Hadoop Stack configured through Apache Ambari.
Hive version: Hive 3.1.0.3.1.4.0-315
What I did was:
Downloaded the EMR Dynamo-Hive connector JARS directly from the maven repository: https://mvnrepository.com/artifact/com.amazon.emr
I loaded all the JARS in hive.aux.jars.path:
emr-dynamodb-hadoop-4.12.0.jar
emr-dynamodb-hive-4.12.0.jar
emr-dynamodb-tools-4.12.0.jar
hive1.2-shims-4.12.0.jar
hive1-shims-4.12.0.jar
hive2-shims-4.12.0.jar
hive2-shims-4.15.0.jar
shims-common-4.12.0.jar
shims-loader-4.12.0.jar
But when I try to create the table with:
CREATE EXTERNAL TABLE dynamo_LabDynamoHive
(id double, nome string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
"dynamodb.table.name" = "LabDynamoHive",
"dynamodb.column.mapping" = "id:id,nome:nome"
);
I get the following error:
INFO : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Shim class for Hive version 3.1.1000 does not exist
INFO : Completed executing command(queryId=hive_20200422142624_6ebabdc8-8942-4025-84a8-411505d20895); Time taken: 0.203 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Shim class for Hive version 3.1.1000 does not exist (state=08S01,code=1)
I know I'm not loading a Shims JAR for Hive 3, but I'd like to know if any of you have tried and succeded in using an external table with DynamoDB using Hive 3 outside of EMR.
Any help or directions would be greatly appreciated!
problem is apparently the source code of this EMR connector is somewhat outdated and lacks Hive 3.x support recently introduced by AWS for EMR 6.0.
However, you can find a working 3.1 implementation here, forked from the official EMR connector: https://github.com/ramsesrm/emr-dynamodb-connector
Installation steps as follow:
1- compile the mentioned code (mvn clean package)
2- install the 3 JARs in your hive.aux.jars.path, along with aws-java-sdk-core and aws-java-sdk-dynamodb JARs from AWS (shim JARs are not required), 5 in total.
Tha's it. Don't forget to specify the region as a TBLPROPERTIES if you're not using the default US one.

Cannot create Hive external table using jdbcStorageHandler

I am running a small cluster in Amazone EMR in order to play with Apache Hive 2.3.5. It is my understanding that Apache Hive can import data from a remote database and have the cluster to run queries. I was following an example that is provided in Apache Hive web documentation (https://cwiki.apache.org/confluence/display/Hive/JdbcStorageHandler) and created the following code:
CREATE EXTERNAL TABLE hive_table
(
col1 int,
col2 string,
col3 date
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
'hive.sql.database.type'='POSTGRES',
'hive.sql.jdbc.driver'='org.postgresql.Driver',
'hive.sql.jdbc.url'='jdbc:postgresql://<url>/<dbname>',
'hive.sql.dbcp.username'='<username>',
'hive.sql.dbcp.password'='<password>',
'hive.sql.table'='<dbtable>',
'hive.sql.dbcp.maxActive'='1'
);
But I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: Property hive.sql.query is required.)
According to the documentation, I need to specify either “hive.sql.table” or “hive.sql.query” to tell how to get data from jdbc database. But if I replace hive.sql.table with hive.sql.query I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: No enum constant org.apache.hive.storage.jdbc.conf.DatabaseType.POSTGRES)
I tried looking in the web for a solution and it doesn't look like anyone experience the same issues that I am having. Do I need to modify a config file or am I missing something critical in my code?
I think you are using a version of the jar which doesn't support POSTGRES.
Download the latest jar from this link:
http://repo1.maven.org/maven2/org/apache/hive/hive-jdbc-handler/3.1.2/hive-jdbc-handler-3.1.2.jar
Put this downloaded jar into a hdfs location.
Run hive normally.
Run command: add jar ${HDFS_PATH_TO_DOWNLOADED_JAR}
Run your create table command

Not able to connect to hive on AWS EMR using java

I have setup AWS EMR cluster with hive. I want to connect to hive thrift server from my local machine using java. I tried following code-
Class.forName("com.amazon.hive.jdbc3.HS2Driver");
con = DriverManager.getConnection("jdbc:hive2://ec2XXXX.compute-1.amazonaws.com:10000/default","hadoop", "");
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/HiveJDBCDriver.html.As mentioned in the developer guide, added jars related with hive jdbc driver to class path.
But I am getting exception when trying to get connection.
I was able to connect to hive server on simple hadoop cluster using above code (with different jdbc driver).
Can someone please suggest if I am missing something?
Is it possible to connect to hive server on AWS EMR from local machine using hive jdbc?
(Merged Answer from the comments)
Hive is running on port 10000 but only locally, you have to create a ssh tunnel to the emr.
The following is from the documentation for hive 0.13.1
Create Tunnel
ssh -o ServerAliveInterval=10 -i path-to-key-file -N -L 10000:localhost:10000 hadoop#master-public-dns-name
Connect to JDBC
jdbc:hive2://localhost:10000/default
You can use the code using the library JSch
public static void portForwardForHive() {
try {
if(session != null && session.isConnected()) {
return;
}
JSch jsch = new JSch();
jsch.addIdentity(PATH_TO_SSH_KEY_PEM);
String host = REMOTE_HOST;
session = jsch.getSession(USER, host, 22);
// username and password will be given via UserInfo interface.
UserInfo ui = new MyUserInfo();
session.setUserInfo(ui);
session.connect();
int assingedPort = session.setPortForwardingL(LPORT, RHOST, RPORT);
System.out.println("Port forwarding done for the post : " + assingedPort);
} catch (Exception e) {
System.out.println(e);
}
}
Not sure if you've resolved this yet, but its a bug in EMR that's just bitten me.
For direct jdbc connectivity like you are doing, you must include the jdbc drivers in your shaded uber-jar. For jdbc access from within dataframes, you cannot access the jar in your uber-jar (another unrelated bug), but you must specify it on the command line (S3 is a convenient place to keep them):
--files s3://mybucketJAR/postgresql-9.4-1201.jdbc4.jar
However, even after this you will run into another problem if you are specifically trying to access hive. Amazon has built their own jdbc drivers with a different class hierarchy to the normal hive driver (com.amazon.hive.jdbc41.HS2Driver), however the EMR cluster includes the standard Hive jdbc driver in its standard path (org.apache.hive.jdbc.HiveDriver).
This is automatically registered as being capable of handling the jdbc:hive and jdbc:hive2 urls, so when you try to connect to a hive URL it finds this one first and uses it - even if you specifically register the amazon one. Unfortunately, this one is not compatible with amazon's EMR build of Hive.
There are two possible solutions:
1: Find the offending driver and unregister it:
Scala example:
val jdbcDrv = Collections.list(DriverManager.getDrivers)
for(i <- 0 until jdbcDrv.size) {
val drv = jdbcDrv.get(i)
val drvName = drv.getClass.getName
if(drvName == "org.apache.hive.jdbc.HiveDriver") {
log.info(s"Deregistering JDBC Driver: ${drvName}")
DriverManager.deregisterDriver(drv)
}
}
Or
2: As I found out later, you can specify the driver as part of the connect properties when you attempt to connect:
Scala example:
val hiveCredentials = new java.util.Properties
hiveCredentials.setProperty("user", hiveDBUser)
hiveCredentials.setProperty("password", hiveDBPassword)
hiveCredentials.setProperty("driver", "com.amazon.hive.jdbc41.HS2Driver")
val conn = DriverManager.getConnection(hiveDBURL, hiveCredentials)
This is a more "correct" version as it should override any preregistered handlers even if they have completely different class hierarchies.

hive failed execution error return code 2 from org.apache.hadoop.hive.ql.exec.mapredtask

I have one query. It is executing fine on Hive CLI and returning the result. But when I am executing it with the help of Hive JDBC, I am getting an error below:
java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:192)
What is the problem? Also I am starting the Hive Thrift Server through Shell Script. (I have written a shell script which has command to start Hive Thrift Server) Later I decided to start Hive thrift Server manually by typing command as:
hadoop#ubuntu:~/hive-0.7.1$ bin/hive --service hiveserver
Starting Hive Thrift Server
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:10000.
at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:99)
at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:80)
at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:73)
at org.apache.hadoop.hive.service.HiveServer.main(HiveServer.java:384)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
hadoop#ubuntu:~/hive-0.7.1$
Please help me out from this.
Thanks
For this error :
java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuer
Go to this link :
http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_Hive.html
and add
**hadoop-0.20-core.jar
hive/lib/hive-exec-0.7.1.jar
hive/lib/hive-jdbc-0.7.1.jar
hive/lib/hive-metastore-0.7.1.jar
hive/lib/hive-service-0.7.1.jar
hive/lib/libfb303.jar
lib/commons-logging-1.0.4.jar
slf4j-api-1.6.1.jar
slf4j-log4j12-1.6.1.jar**
to the class path of your project , add this jars from the lib of hadoop and hive, and try the code. and also add the path of hadoop, hive, and hbase(if your are using) lib folder path to the project class path, like you have added the jars.
and for the second error you got
type
**netstat -nl | grep 10000**
if it shows something means hive server is already running. the second error comes only when the port you are specifying is already acquired by some other process, by default server port is 10000 so very with the above netstat command which i said.
Note : suppose you have connected using code exit from ... bin/hive of if you are connected through bin/hive > then code will not connect because i think (not sure) only one client can connect to the hive server.
do above steps hopefully will solve your problem.
NOTE : exit from cli when you are going to execute the code, and dont start cli while code is being executing.
Might be some issue with permission, just try some query like "SELECT * FROM " which won't start MR jobs.
Try to paste these property before the codes.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
set hive.auto.convert.join = false;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.dynamic.partitions.pernode=10000;

Resources