Thrift Server is crashing due to "RetriesExhaustedException" - hadoop

When running thrift (/usr/hdp/2.3.0.0-2557/hbase/bin/hbase-daemon.sh start thrift) it stops working every once in a while.
In the logs I can see the exception:
2015-11-12 11:56:11,926 WARN [thrift-worker-3] thrift.ThriftServerRunner$HBaseHandler: Can't get the location
org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:309)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:811)
at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.scannerOpenWithScan(ThriftServerRunner.java:1451)
at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.invoke(HbaseHandlerMetricsProxy.java:67)
at com.sun.proxy.$Proxy11.scannerOpenWithScan(Unknown Source)
at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$scannerOpenWithScan.getResult(Hbase.java:4609)
at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$scannerOpenWithScan.getResult(Hbase.java:4593)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: hconnection-0x20ae27f0 closed
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1146)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
... 22 more
I end up restarting thrift to get it working again.
I've tried implementing the following workaround found here but its still crashing:
<property>
<name>hbase.thrift.connection.max-idletime</name>
<value>1800000</value>
</property>
We are using HDP 2.3.0 stack from HortonWorks / HBase 1.1.1.2.3.0.0-2557

This is an ongoing issue with HBase. There are two seperate tickets regarding this problem: HBASE-14533 and HBASE-14196. Both tickets provide a patch to resolve the problem.
Regarding to Pankaj Kumar's comment from 01/19 the issue can be resolved if you merge both patches. Maybe this helps in your case, too.

Related

M/R job submission failing with error: Could not find Yarn tags property > (mapreduce.job.tags)

I am getting the following exception when running a map/reduce job. We submit map/reduce jobs through oozie.
Failing Oozie Launcher, Main class
[org.apache.oozie.action.hadoop.JavaMain], main() threw exception,
Could not find Yarn tags property (mapreduce.job.tags)
java.lang.RuntimeException: Could not find Yarn tags property
(mapreduce.job.tags) at
org.apache.oozie.action.hadoop.LauncherMainHadoopUtils.getChildYarnJobs(LauncherMainHadoopUtils.java:53)
at
org.apache.oozie.action.hadoop.LauncherMainHadoopUtils.killChildYarnJobs(LauncherMainHadoopUtils.java:88)
at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:46) at
org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:46)
at org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:38) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606) at
org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:228)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at
org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at
org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:378)
at
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:296)
at
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745
I did a google search, and found the following SO post: Hadoop MapReduce job starts but can not find Map class? However the resolution mentioned in this post is not working for me, I cannot see any file permission related errors in the log files.
We are using Cloudera distribution.
You need to upgrade Oozie sharelibs. Follow instructions in Cloudera's documentation. Namely:
sudo oozie-setup sharelib create -fs FS_URI -locallib /usr/lib/oozie/oozie-sharelib-yarn
Don't forget to restart Oozie afterwards. This helped us to solve this particular problem after CDH 5.5 upgrade.

Cannot start Hive in terminal

I have installed and Configured Apache Hive-1.2.1 long ago. It worked fine. Recently I have installed Apache Spark-2.7.0 and started using its shells. Now when I want to work with Hive again, it didn't start. It's showing the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log(Lorg/slf4j/Marker;Ljava/lang/String;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V
at org.apache.commons.logging.impl.SLF4JLocationAwareLog.debug(SLF4JLocationAwareLog.java:133)
at org.apache.hadoop.hive.common.LogUtils.logConfigLocation(LogUtils.java:147)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jDefault(LogUtils.java:128)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:77)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:58)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:637)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
I tried reinstalling Hive, but the same error follows. Is this error due to installing Spark? How can I run Hive normally again?
It seems that you are having a conflict with your logging library. This question could help you: java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log

Cassandra on mac: stopping cassandra shows error

I downloaded cassandra from the official site and ran it with:
./bin/cassandra -f
Cassandra seems to be working fine and I am able to connect to it via cqlsh
When I stop it using CTRL-C, it throws an error.
INFO 17:39:06 Stop listening to thrift clients
INFO 17:39:06 Stop listening for CQL clients
INFO 17:39:06 Announcing shutdown
INFO 17:39:06 Node localhost/127.0.0.1 state jump to normal
INFO 17:39:08 Waiting for messaging service to quiesce
INFO 17:39:08 MessagingService has terminated the accept() thread
ERROR 17:39:08 Exception in thread Thread[StorageServiceShutdownHook,5,main]
java.io.IOError: java.io.IOException: Unknown error: 316
at org.apache.cassandra.net.MessagingService.shutdown(MessagingService.java:750) ~[apache-cassandra-2.1.9.jar:2.1.9]
at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:682) ~[apache-cassandra-2.1.9.jar:2.1.9]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.9.jar:2.1.9]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
Caused by: java.io.IOException: Unknown error: 316
at sun.nio.ch.NativeThread.signal(Native Method) ~[na:1.8.0_60]
at sun.nio.ch.ServerSocketChannelImpl.implCloseSelectableChannel(ServerSocketChannelImpl.java:292) ~[na:1.8.0_60]
at java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:234) ~[na:1.8.0_60]
at java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:115) ~[na:1.8.0_60]
at sun.nio.ch.ServerSocketAdaptor.close(ServerSocketAdaptor.java:137) ~[na:1.8.0_60]
at org.apache.cassandra.net.MessagingService$SocketThread.close(MessagingService.java:1017) ~[apache-cassandra-2.1.9.jar:2.1.9]
at org.apache.cassandra.net.MessagingService.shutdown(MessagingService.java:746) ~[apache-cassandra-2.1.9.jar:2.1.9]
... 3 common frames omitted
I was wondering whether I missed any setup procedures. The getting started guide says it should running out of the box. I am using OSX Yosemite 10.10.5
Thanks in advance.
It appears this is a JDK bug and it looks like this could be fixed by updating your JDK, what version are you currently on?
It looks like CASSANDRA-8220 (C* 2.2.1+, and 3.0.0-alpha1) was introduced to work around the problem, but I think upgrading your JDK should fix this as well.

Fiware Cosmos Hive Authorization Issue

I'm using a shared instance of Fiware Cosmos (meaning I don't have root privileges). I have until today successfully acessed and managed tables in hive both remotely using jdbc, and Hive CLI.
But now I'm getting this error when starting Hive CLI:
log4j:ERROR Could not instantiate class [org.apache.hadoop.hive.shims.HiveEventCounter].
java.lang.RuntimeException: Could not load shims in class org.apache.hadoop.log.metrics.EventCounter
at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:123)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:115)
at org.apache.hadoop.hive.shims.ShimLoader.getEventCounter(ShimLoader.java:98)
at org.apache.hadoop.hive.shims.HiveEventCounter.<init>(HiveEventCounter.java:34)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at java.lang.Class.newInstance0(Class.java:357)
at java.lang.Class.newInstance(Class.java:310)
at org.apache.log4j.helpers.OptionConverter.instantiateByClassName(OptionConverter.java:330)
at org.apache.log4j.helpers.OptionConverter.instantiateByKey(OptionConverter.java:121)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:664)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:647)
at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:544)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:440)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:476)
at org.apache.log4j.PropertyConfigurator.configure(PropertyConfigurator.java:354)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jDefault(LogUtils.java:127)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:77)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:58)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:641)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.log.metrics.EventCounter
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:171)
at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:120)
... 27 more
log4j:ERROR Could not instantiate appender named "EventCounter".
Logging initialized using configuration in jar:file:/usr/local/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
I can however perform select and create in the Hive CLI.
If I then try to access Hive remotely, I get this:
Connecting to jdbc:hive://x.x.x.x:10000/default?user=user&password=XXXXXXXXXX
Could not establish connection: java.net.ConnectException: Connection refused
I didn't do any changes in code or commands before the errors appeared, and after googling around I haven't found any working solutions.
If anyone can guide me to where the problem is, or how to find it, or even better how to solve it, I'd be grateful.
Thanks in advance!
HiveServer2 (the Hive JDBC service) is a very unstable piece of shoftware. In our Prod cluster we have a CRON job to restart each instance every day, and even then, sometimes it blows OutOfMemory errors then just hangs saying Connection refused like you show. Open a ticket to your Hadoop admin so that he/she retarts the damn service.
On the other hand, the org.apache.hadoop.log.metrics.EventCounter message smells like someone tried to change a shared config somewhere (or tried to upgrade some JARs) and now Hive believes that it runs on a very, very old version of Hadoop
=> e.g. comments in Hive-4133 or that MapR support post
The cause of these issues were Hive upgrades in Cosmos. A more thorough explanation and solution is found here:
My Hive client stopped working with Cosmos instance

Hadoop/Yarn distributed shell example

I'm trying to run the distributed shell example (using a SVN checkout of Hadoop, which is why the version is set to 3.0.0-SNAPSHOT):
yarn jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar \
-jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar \
org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command whoami
However it does not work:
12/09/03 13:44:37 FATAL distributedshell.Client: Error running CLient
java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128)
at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getClusterMetrics(ClientRMProtocolPBClientImpl.java:123)
at org.hadoop.yarn.client.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:163)
at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:316)
at org.apache.hadoop.yarn.applications.distributedshell.Client.main(Client.java:164)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown protocol: org.apache.hadoop.yarn.api.ClientRMProtocolPB
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.getProtocolImpl(ProtobufRpcEngine.java:398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:456)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1732)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1728)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1726)
at org.apache.hadoop.ipc.Client.call(Client.java:1164)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy7.getClusterMetrics(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getClusterMetrics(ClientRMProtocolPBClientImpl.java:121)
... 8 more
The essential problem seems to be in the second trace:
Unknown protocol: org.apache.hadoop.yarn.api.ClientRMProtocolPB
Does anyone know how protocol registration for Hadoops ProtoBufRPC works? Any idea on how to debug?
Edit: With Hadoop version 2.0.1-alpha, it works slightly better.
12/09/03 18:43:14 INFO distributedshell.Client: Application did not finish. YarnState=FAILED, DSFinalStatus=FAILED. Breaking monitoring loop
12/09/03 18:43:14 ERROR distributedshell.Client: Application failed to complete successfully
So maybe my build did not work right. Any ideas of what is causing the problem above (I'd really like to use HEAD, as I'm planning to do some low level experiments, beyond MapReduce)? Or is HEAD partially broken, does distributed shell on HEAD work for you?
My own (not yet working ...) client still fails with the same error:
Caused by: java.io.IOException: Unknown protocol: org.apache.hadoop.yarn.api.ClientRMProtocolPB
It turned out that the main problem with my own code was that I naively instantiated the Configuration class, instead of instantiating YarnConfiguration. This way, the yarn config files were not read, and it tried to contact the servers on their default ports - which don't agree with my settings.
The same bug seems to be present in the distributedshell example.

Resources