java.io.EOFException when trying to run examples on HBase standalone - hadoop

I'm trying to run this example: https://github.com/larsgeorge/hbase-book/blob/master/ch03/src/main/java/client/PutExample.java, from this book: http://ofps.oreilly.com/titles/9781449396107/, on a standalone HBase installation. Starting HBase works fine and the shell is accessible, but when I try to run the example I get the following error:
Exception in thread "main" java.io.IOException: Call to /127.0.0.1:55958 failed on local exception: java.io.EOFException
at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:872)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:841)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:141)
at $Proxy4.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:174)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:295)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:272)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:324)
at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:228)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1228)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1190)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1177)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:914)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:810)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:784)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1014)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:814)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:778)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:188)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
at client.CRUDExample.main(CRUDExample.java:26)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:548)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:486)
Thanks in advance.

The problem was that I compiled the examples with a newer version of HBase. To fix this error, for these examples, edit pom.xml and make sure that the HBase dependency is the same version as you're running and build again. (Also don't forget to remove chXX/target/cached_classpath.txt, otherwise it still adds the other library to your classpath)

I met this problem with exactly same error logs, but difference reason.
It is fixed after I added bellow items to hbase client config:
<property>
<name>hbase.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hbase.rpc.engine</name>
<value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>

Related

Hive:Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

I'm new in Hive.
Today I installed the Hive, followed the book and just used CLI create a table. Then I had a problem.
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Here is the version:
hadoop 2.9.0
HBase 1.4.5
Hive 2.3.3
TIPS:
no MySQL and no hive-site.xml.Just a purity Hive.
In my case, the problem was that I had tried to start the hive-shell before the metastore_db service.
To solve this problem I had to delete (or move) the metastore_db and try the following command:
$ rm -rf metastore_db
$ schematool -dbType derby -initSchema
$ hive
I've just solved it.
Thanks to this blog! https://blog.csdn.net/senvil/article/details/48894237 !
In the hive-site.xml, add this:
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description/>

Hive metastore Configuration with derby

In RedHat test server I installed hadoop 2.7 and I ran Hive ,Pig & Spark with out issues .But when tried to access metastore of Hive from Spark I got errors So I thought of putting hive-site.xml(After extracting 'apache-hive-1.2.1-bin.tar.gz' file I just add $HIVE_HOME to bashrc as per tutorial and everything was working other than this integration with Spark) In apache site I found that I need to put hive-site.xml as metastore configuration
I created the file as below
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
</configuration>
I put IP as localhost since it is single node machine .After that I am not able to connect to even Hive .It is throwing error
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
....
Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby://localhost:1527/metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: No suitable driver found for jdbc:derby://localhost:1527/metastore_db;create=true
There are lot many error log pointing to the same thing . If I remove hive-site.xml from the conf folder hive is working without issues .Can anyone point me to the right path for default metastore configuration
Thanks
Anoop R
Derby is used as an embedded database. try using
jdbc:derby:metastore_db;create=true
as jdbc-url. see also
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-EmbeddedMetastore
To use the metastore fully functional (and by that to be able to access it from different services), try setting up using mysql as described in the document above.
As you are setting up an embedded metastore database, use the property below as JDBC URL:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:metastore_db;create=true </value>
<description>JDBC connect string for a JDBC metastore </description>
</property>
I was also facing similar kind of exception while installing hive. The thing which worked for me was to initialize the derby db. I used the following command to solve the problem : command -> Go to $HIVE_HOME/bin and run the command schematool -initSchema -dbType derby .
You can follow the link http://www.edureka.co/blog/apache-hive-installation-on-ubuntu
It will work if you put derbyclient.jar in lib folder of hive

Set LD_LIBRARY_PATH or java.library.path for YARN / Hadoop2 Jobs

i have a Hadoop FileSystem which is using native libraries with JNI.
Apparently i have to include the shared object independently of the currently executed job. But i can't find a way to tell Hadoop/Yarn where it should look for the shared object.
I had partial success with the following solutions, while starting the wordcount example with yarn.
Setting export JAVA_LIBRARY_PATH=/path when starting the resource- and the nodemanager.
This helps with with the resource and the nodemanager, but the actual Job/Application fails. Printing the LD_LIBRARY_PATH and the java.library.path while executing the wordcount example yield the following result. What
/logs/userlogs/application_x/container_x_001/stdout
...
java.library.path : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x_001:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
LD_LIBRARY_PATH : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x
Setting yarn.app.mapreduce.am.env="LD_LIBRARY_PATH=/path"
This did help with some of the Jobs. The actual map/reduce job did work (at least i have the correct results), but the call did fail with the Error no jni-xtreemfs in java.library.path.
Somehow the first application/job did work and shows
/logs/userlogs/application_x/container_x_001/stdout
...
java.library.path : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x_001:/path:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
LD_LIBRARY_PATH : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x_001:/path
But the second and the rest did fail with:
/logs/userlogs/application_x/container_x_002/stdout
...
java.library.path : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x_002:/opt/hadoop-2.7.1/lib/native:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
LD_LIBRARY_PATH : /tmp/hadoop-u/nm-local-dir/usercache/u/appcache/application_x/container_x_002/opt/hadoop-2.7.1/lib/native
The stacktrace for the later shows, that the error occured while executing YarnChild:
2015-08-03 15:24:03,851 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.UnsatisfiedLinkError: no jni-xtreemfs in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
at java.lang.Runtime.loadLibrary0(Runtime.java:849)
at java.lang.System.loadLibrary(System.java:1088)
at org.xtreemfs.common.libxtreemfs.jni.NativeHelper.loadLibrary(NativeHelper.java:54)
at org.xtreemfs.common.libxtreemfs.jni.NativeClient.<clinit>(NativeClient.java:41)
at org.xtreemfs.common.libxtreemfs.ClientFactory.createClient(ClientFactory.java:72)
at org.xtreemfs.common.libxtreemfs.ClientFactory.createClient(ClientFactory.java:51)
at org.xtreemfs.common.clients.hadoop.XtreemFSFileSystem.initialize(XtreemFSFileSystem.java:191)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Supply the libjni-xtreemfs.so via the commandline argument -files
This does work. I assume the .so is copied to the tmp directory. But this is no feasible solution, because it would require the users to supply the path to the .so on every call.
Does anybody now how i can globally set the LD_LIBRARY_PATH or the java.library.path or can suggest which configuration options i did probably miss? I'd be very thankful!
Short Answer: in your mapred-site.xml put the following
<property>
<name>mapred.child.java.opts</name>
<value>-Djava.library.path=$PATH_TO_NATIVE_LIBS</value>
</property>
Explanation:
The Job/Applications aren't executed by yarn rather than by a mapred (map/reduce) container, whoose configuration is controlled by the mapred-site.xml file. Specifying custom java parameters there causes that actual workers to spin with the correct path
Use mapreduce.map.env in your job or site configuration.
Usage is as follows:
<property>
<name>mapreduce.map.env</name>
<value>LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/my/libs</value>
</property>
Note:
Hadoop docs encourage the use of mapreduce.map.env for this over mapred.child.java.opts. "Usage of -Djava.library.path can cause programs to no longer function if hadoop native libraries are used."

NoClassDefFoundError for simple Yarn application

I was trying to run the simple yarn application from simple-yarn-app. But I am getting the following exception in my application error logs.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/conf/YarnConfiguration
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
at java.lang.Class.getMethod0(Class.java:2774)
at java.lang.Class.getMethod(Class.java:1663)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.conf.YarnConfiguration
But if I run "yarn classpath" command on all my datanodes, I see the following output:
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*
which has the path to the yarn-client, yarn-api, yarn-common and the hadoop-common jars required by the application. Can anyone point me to the direction where I might have forgotten to set the right classpath.
I found that Hadoop does not resolve $HADOOP_HOME and $YARN_HOME environment variables while iterating over the YarnConfiguration attributes. Running the following in your Yarn Client will print the unresolved configuration, like,
$HADOOP_HOME/, $HADOOP_HOME/lib/
YarnConfiguration conf = new YarnConfiguration()
for (String c : conf.getStrings(
YarnConfiguration.YARN_APPLICATION_CLASSPATH,
YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)) {
System.out.println(c);
}
So, if you provide the full path for the yarn.application.classpath property, the NoClassDefFoundError issue gets resolved.
<property>
<description>CLASSPATH for YARN applications. A comma-separated list of CLASSPATH entries</description>
<name>yarn.application.classpath</name>
<value>
/etc/hadoop/conf,
/usr/lib/hadoop/*,
/usr/lib/hadoop/lib/*,
/usr/lib/hadoop-hdfs/*,
/usr/lib/hadoop-hdfs/lib/*,
/usr/lib/hadoop-mapreduce/*,
/usr/lib/hadoop-mapreduce/lib/*,
/usr/lib/hadoop-yarn/*,
/usr/lib/hadoop-yarn/lib/*
</value>
</property>
The issue will happen on YARN clusters where the ResourceManager and/or NodeManager daemons are started with an application classpath that is incomplete. Even something as simple as the included spark-shell will fail:
user#linux$ spark-shell --master yarn-client
Sadly, you only find out when launching your application; or running it long enough to run into the class(es) that are missing. To remedy this issue, I took the output of the following CLASSPATH command,
user#linux$ yarn classpath
and cleaned it up (because it contains duplicates and non-canonical entries), appended it to the below YARN configuration directive, which is found in
/etc/hadoop/conf/yarn-site.xml, and finally restarted the YARN cluster daemons:
user#linux$ sudo vi /etc/hadoop/conf/yarn-site.xml
[ ... ]
<property>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,
$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,
$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,
$HADOOP_MAPRED_HOME/lib/*,
$YARN_HOME/*,
$YARN_HOME/lib/*,
/etc/hadoop/conf,
/usr/lib/hadoop/*,
/usr/lib/hadoop/lib,
/usr/lib/hadoop/lib/*,
/usr/lib/hadoop-hdfs,
/usr/lib/hadoop-hdfs/*,
/usr/lib/hadoop-hdfs/lib/*,
/usr/lib/hadoop-yarn/*,
/usr/lib/hadoop-yarn/lib/*,
/usr/lib/hadoop-mapreduce/*,
/usr/lib/hadoop-mapreduce/lib/*
</value>
</property>
The entries above that don't contain references to environment variables, are the ones that I added. Remember to copy this modified file to all nodes on your YARN cluster prior to restarting the ResourceManager and NameNode daemons.
In general, you'll want to package all non-provided dependencies (classes and modules) into your application archive. =:)

Can't get HBase to connect to Hadoop

EDIT: I was able to get this to work. I created a tutorial to show how:
http://www.dreamsyssoft.com/blog/blog.php?/archives/5-How-to-use-HBase-Hadoop-Clustered.html
I can get HBase working just fine when I set the hbase-site.xml property:
<name>hbase.rootdir</name>
<value>file:///app/hbase/hbase/</value>
This works just fine, it stores the data in the directory as expected, however I want it to connect to my running hadoop instance now instead of using a local file. I set it to
<value>hdfs://localhost:9000/</value>
Instead of the local file and it will not work. Is there some extra configuration I need to do on the hadoop side to support this? Hadoop is running and using port 9000.
Here's my core-site.xml from hadoop:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Also here's the error when trying to do a "list" in hbase shell:
hbase(main):001:0> list
TABLE
12/06/28 15:26:29 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
12/06/28 15:26:29 WARN zookeeper.ZKUtil: hconnection Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:226)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:580)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:569)
at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:186)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:98)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.jruby.javasupport.JavaConstructor.newInstanceDirect(JavaConstructor.java:275)
at org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:91)
at org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:178)
I believe Hbase cannot find the zookeeper quorum, you have to set the hbase.zookeeper.quorum property in hbase-site.xml. Also check if classpath is set properly or not, check this doc out
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
I used the configuration suggested in http://hbase.apache.org/book.html#zookeeper
stop Ipatables and start hbase with root mode.
there may be compatibility issue please check http://rajkrrsingh.blogspot.in/2013/11/hbase-compatibility-with-hadoop-2x.html

Resources