ERROR NioEndpoint Socket accept failed java.io.IOException: Too many open files - sonarqube

After updating SonarQube to v5.3 (configuration adopted from the previous version we used, v5.1), we're getting the following error, which stops SonarQube from running:
2016.02.16 00:26:11 ERROR web[o.s.s.c.t.CeWorkerCallableImpl] Executed task | project=<my-project-id> | id=AVLnP-hq9AOM7J73mzYa | time=13ms
2016.02.16 00:26:14 ERROR web[o.a.t.u.n.NioEndpoint] Socket accept failed
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) ~[na:1.8.0_51]
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422) ~[na:1.8.0_51]
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) ~[na:1.8.0_51]
at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:688) ~[tomcat-embed-core-8.0.18.jar:8.0.18]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
This error appears every 1-2 days.
Thanks in advance for your help.

The operating system treats each network connection as a file: every connection consumes a file descriptor. So I suggest you check the open-files limit of the system for the user running SonarQube (ulimit -n) and raise it if it is too low.
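If you want to see how close the SonarQube JVM itself is to that limit, the JDK can report the numbers on Unix-like systems. The following is only an illustrative sketch (FdUsage is a made-up class name, not part of SonarQube) and relies on the HotSpot-specific com.sun.management.UnixOperatingSystemMXBean:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdUsage {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            // Open vs. maximum file descriptors for this JVM process
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            System.out.printf("open file descriptors: %d of %d%n",
                    unix.getOpenFileDescriptorCount(),
                    unix.getMaxFileDescriptorCount());
        } else {
            System.out.println("File descriptor metrics are not exposed on this platform.");
        }
    }
}

If the open count keeps climbing towards the maximum between restarts, raising the limit only postpones the error and the component leaking descriptors is worth investigating.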

Related

Sonarqube Background Task (too many open files)

We have a large project (~35,000 Java files) that currently has about 10,000 issues. About 3 weeks ago, our nightly scan started failing on SonarQube version 6.5. I upgraded to 6.6 and found the same problem. First, it was failing due to heap space; I upgraded this machine to give it more memory, and that seems to have allowed the analysis to finish. Now the scan fails every night during the background task that posts the analysis, due to "Too many open files". We have upped the limits for open files for the sonar user, but that doesn't seem to have any effect. We have other, smaller projects and they all finish easily. It's just this very large project that constantly fails. Has anyone seen this before?
This morning I installed SonarQube 6.7 to see if that fixes it. I'm currently running an analysis, but it takes about 3 hours to finish (and fail).
We increased the number of open files allowed for the sonar user.
-sh-4.1$ whoami
sonar
-sh-4.1$ ulimit -Hs
unlimited
-sh-4.1$ ulimit -Hn
1048576
Here is the error we are seeing
org.sonar.server.computation.task.projectanalysis.component.VisitException: Visit of Component {key=applications:sonar:src/main/java/com/MyFileName.java,type=FILE} failed
at org.sonar.server.computation.task.projectanalysis.component.VisitException.rethrowOrWrap(VisitException.java:44)
at org.sonar.server.computation.task.projectanalysis.component.VisitorsCrawler.visit(VisitorsCrawler.java:74)
at org.sonar.server.computation.task.projectanalysis.component.VisitorsCrawler.visitChildren(VisitorsCrawler.java:110)
at org.sonar.server.computation.task.projectanalysis.component.VisitorsCrawler.visitImpl(VisitorsCrawler.java:97)
at org.sonar.server.computation.task.projectanalysis.component.VisitorsCrawler.visit(VisitorsCrawler.java:72)
at org.sonar.server.computation.task.projectanalysis.component.VisitorsCrawler.visitChildren(VisitorsCrawler.java:110)
at org.sonar.server.computation.task.projectanalysis.component.VisitorsCrawler.visitImpl(VisitorsCrawler.java:97)
at org.sonar.server.computation.task.projectanalysis.component.VisitorsCrawler.visit(VisitorsCrawler.java:72)
at org.sonar.server.computation.task.projectanalysis.step.ExecuteVisitorsStep.execute(ExecuteVisitorsStep.java:51)
at org.sonar.server.computation.task.step.ComputationStepExecutor.executeSteps(ComputationStepExecutor.java:64)
at org.sonar.server.computation.task.step.ComputationStepExecutor.execute(ComputationStepExecutor.java:52)
at org.sonar.server.computation.task.projectanalysis.taskprocessor.ReportTaskProcessor.process(ReportTaskProcessor.java:75)
at org.sonar.ce.taskprocessor.CeWorkerImpl.executeTask(CeWorkerImpl.java:92)
at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:59)
at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:35)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Fail to process issues of component 'applications:sonar:src/main/java/com/MyFileName.java'
at org.sonar.server.computation.task.projectanalysis.issue.IntegrateIssuesVisitor.processIssues(IntegrateIssuesVisitor.java:83)
at org.sonar.server.computation.task.projectanalysis.issue.IntegrateIssuesVisitor.visitAny(IntegrateIssuesVisitor.java:63)
at org.sonar.server.computation.task.projectanalysis.component.TypeAwareVisitorWrapper.visitAny(TypeAwareVisitorWrapper.java:82)
at org.sonar.server.computation.task.projectanalysis.component.VisitorsCrawler.visitNode(VisitorsCrawler.java:117)
at org.sonar.server.computation.task.projectanalysis.component.VisitorsCrawler.visitImpl(VisitorsCrawler.java:100)
at org.sonar.server.computation.task.projectanalysis.component.VisitorsCrawler.visit(VisitorsCrawler.java:72)
... 21 more
Caused by: java.lang.IllegalStateException: Fail to traverse file: /opt/sonarqube-6.5/temp/ce/6622607282221286408/1421069522563365386/source-11991.txt
at org.sonar.server.computation.task.projectanalysis.batch.BatchReportReaderImpl.readFileSource(BatchReportReaderImpl.java:153)
at org.sonar.server.computation.task.projectanalysis.source.SourceLinesRepositoryImpl.readLines(SourceLinesRepositoryImpl.java:45)
at org.sonar.server.computation.task.projectanalysis.issue.TrackerRawInputFactory$RawLazyInput.loadLineHashSequence(TrackerRawInputFactory.java:80)
at org.sonar.core.issue.tracking.LazyInput.getLineHashSequence(LazyInput.java:34)
at org.sonar.server.computation.task.projectanalysis.issue.TrackerRawInputFactory$RawLazyInput.loadIssues(TrackerRawInputFactory.java:105)
at org.sonar.core.issue.tracking.LazyInput.getIssues(LazyInput.java:50)
at org.sonar.core.issue.tracking.Tracking.<init>(Tracking.java:46)
at org.sonar.core.issue.tracking.Tracker.track(Tracker.java:37)
at org.sonar.server.computation.task.projectanalysis.issue.TrackerExecution.track(TrackerExecution.java:41)
at org.sonar.server.computation.task.projectanalysis.issue.IntegrateIssuesVisitor.processIssues(IntegrateIssuesVisitor.java:76)
... 26 more
Caused by: java.io.FileNotFoundException: /opt/sonarqube-6.5/temp/ce/6622607282221286408/1421069522563365386/source-11991.txt (Too many open files)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:301)
at org.sonar.server.computation.task.projectanalysis.batch.BatchReportReaderImpl.readFileSource(BatchReportReaderImpl.java:151)
... 35 more
Upgrading to SonarQube 6.7 fixed this "too many open files" error.
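As a side note for anyone hitting this: it is also worth confirming that a limit raised in /etc/security/limits.conf is actually inherited by the running Compute Engine JVM, since PAM limits apply to login sessions and do not always reach services started by an init script or systemd. Here is a small, hypothetical sketch (ShowLimits is not a SonarQube tool) that prints the effective limits of a Linux process; pass the CE process pid, or nothing to inspect the JVM running the sketch itself:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ShowLimits {
    public static void main(String[] args) throws IOException {
        // "self" reads /proc/self/limits for this JVM; pass a pid to inspect another process
        String pid = args.length > 0 ? args[0] : "self";
        try (Stream<String> lines = Files.lines(Paths.get("/proc/" + pid + "/limits"))) {
            lines.filter(line -> line.startsWith("Limit") || line.contains("Max open files"))
                 .forEach(System.out::println);
        }
    }
}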

HORTONWORKS - Hbase/Phoenix - WALEditCodec - missing

I am receiving the following error while trying to run Phoenix on top of HBase:
EXCEPTION #1:
2017-11-07 12:40:12,620 WARN [RS_LOG_REPLAY_OPS-XXX:16020-0]
regionserver.SplitLogWorker: log splitting of
WALs/XXX.XXX.XXX.XXX,16020,1507179047656-
splitting/XXX.XXX.XXX.XXX%2C16020%2C1507179047656.default.1507179049782 failed, returning error
java.io.IOException: Cannot get log reader
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:355)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:267)
at org.apache.hadoop.hbase.wal.WALSplitter.getReader(WALSplitter.java:839)
at org.apache.hadoop.hbase.wal.WALSplitter.getReader(WALSplitter.java:763)
at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:297)
at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:235)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:104)
at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:72)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException: Unable to find org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:36)
at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec.create(WALCellCodec.java:103)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.getCodec(ProtobufLogReader.java:297)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initAfterCompression(ProtobufLogReader.java:307)
at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:82)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:164)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:303)
... 11 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:32)
... 17 more
APPLIED PATCHES #1:
I have applied the following settings through the Ambari web UI for the advanced HBase configs, as specified by the Hortonworks document:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_command-line-upgrade/content/configure-phoenix-25.html
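If I recall the Phoenix setup correctly, the key property behind that page is the Phoenix WAL codec, which names exactly the class the ClassNotFoundException above complains about; in hbase-site.xml it looks roughly like this (the Phoenix server jar also has to be on the region server classpath, otherwise the class cannot be resolved):

<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>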
EXCEPTION #2
FATAL [RS_LOG_REPLAY_OPS-XXX:16020-1] conf.Configuration: error parsing conf core-site.xml
java.io.FileNotFoundException: /etc/hadoop/2.6.1.0-129/0/core-site.xml (Too many open files)
APPLIED PATCHES #2
I checked the core-site.xml file on each server that hosts an HBase region server and made sure it ends with a closing </configuration> tag, as well as the core-site.xml file at the path named in the error, /etc/hadoop/2.6.1.0-129/0/core-site.xml.
I haven't been able to find any other information about this issue.
I went into HDFS and deleted all the WAL splitting logs using the following command:
hdfs dfs -rm -r /apps/hbase/data/WALs/*splitting*
This resolved exception #1. Keep in mind that, from what I've read, this will incur data loss.
For exception #2 I went back and checked the open-file limits on each server (ulimit -n) and updated them where applicable, following the Hortonworks doc:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_security/content/kerb-config-limits.html
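For context, raising that limit usually means adding nofile entries for the service users in /etc/security/limits.conf (or a file under /etc/security/limits.d/) and restarting the services. The lines below are purely illustrative; the user names and values are placeholders, and the linked doc has the actual recommendations:

hbase  -  nofile  32768
hdfs   -  nofile  32768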

What causes recovery log corruption, and what can be done about it?

I have checked the ServerErr.log and found the exception given below, which is basically due to corruption of log1 and log2 in the tranlog and partnerlog locations. I want to know how log1 and log2 get corrupted, and what the complete recovery procedure is.
[3/22/16 8:07:44:489 EDT] 0000000a WsServerImpl E WSVR0009E: Error occurred during startup com.ibm.ws.exception.RuntimeError: com.ibm.ws.exception.RuntimeError: com.ibm.ws.recoverylog.spi.LogCorruptedException
at com.ibm.ws.runtime.WsServerImpl.bootServerContainer(WsServerImpl.java:199)
at com.ibm.ws.runtime.WsServerImpl.start(WsServerImpl.java:140)
at com.ibm.ws.runtime.WsServerImpl.main(WsServerImpl.java:461)
at com.ibm.ws.runtime.WsServer.main(WsServer.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:79)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:618)
at com.ibm.wsspi.bootstrap.WSLauncher.launchMain(WSLauncher.java:183)
at com.ibm.wsspi.bootstrap.WSLauncher.main(WSLauncher.java:90)
at com.ibm.wsspi.bootstrap.WSLauncher.run(WSLauncher.java:72)
at org.eclipse.core.internal.runtime.PlatformActivator$1.run(PlatformActivator.java:78)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:92)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:68)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:400)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:177)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:79)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:618)
at org.eclipse.core.launcher.Main.invokeFramework(Main.java:336)
at org.eclipse.core.launcher.Main.basicRun(Main.java:280)
at org.eclipse.core.launcher.Main.run(Main.java:977)
at com.ibm.wsspi.bootstrap.WSPreLauncher.launchEclipse(WSPreLauncher.java:336)
at com.ibm.wsspi.bootstrap.WSPreLauncher.main(WSPreLauncher.java:91)
Caused by: com.ibm.ws.exception.RuntimeError: com.ibm.ws.recoverylog.spi.LogCorruptedException
at com.ibm.ws.runtime.component.TxServiceImpl.asynchRecoveryProcessingComplete(TxServiceImpl.java:2180)
at com.ibm.ws.Transaction.JTA.RecoveryManager.recoveryFailed(RecoveryManager.java:1740)
at com.ibm.ws.Transaction.JTA.RecoveryManager.run(RecoveryManager.java:2333)
at java.lang.Thread.run(Thread.java:810)
Caused by: com.ibm.ws.recoverylog.spi.LogCorruptedException
at com.ibm.ws.recoverylog.spi.LogHandle.openLog(LogHandle.java:555)
at com.ibm.ws.recoverylog.spi.MultiScopeRecoveryLog.openLog(MultiScopeRecoveryLog.java:573)
at com.ibm.ws.recoverylog.spi.RecoveryLogImpl.openLog(RecoveryLogImpl.java:71)
at com.ibm.ws.Transaction.JTA.RecoveryManager.run(RecoveryManager.java:2263)
The log1 and log2 files under the <profile_root>/tranlog directory record information required for transaction recovery. The partnerlog files hold information on resources involved in a transaction. The problem occurring here has been documented by IBM at this Technote: Application Server fails to start with CWRLS0009E error. According to the same document, IBM recommends using the 'Transaction=all' trace specification to identify the cause of the corruption.
If you are not running WebSphere Process Server (WPS) or IBM Business Process Manager (BPM), you can empty these transaction log files and start your server, for example with cp /dev/null log1, repeated for each of the tranlog and partnerlog files. For WPS and BPM, you cannot delete these files because they contain operational information about the product itself. See this link for more details.

Cassandra on mac: stopping cassandra shows error

I downloaded cassandra from the official site and ran it with:
./bin/cassandra -f
Cassandra seems to be working fine and I am able to connect to it via cqlsh.
When I stop it using CTRL-C, it throws an error.
INFO 17:39:06 Stop listening to thrift clients
INFO 17:39:06 Stop listening for CQL clients
INFO 17:39:06 Announcing shutdown
INFO 17:39:06 Node localhost/127.0.0.1 state jump to normal
INFO 17:39:08 Waiting for messaging service to quiesce
INFO 17:39:08 MessagingService has terminated the accept() thread
ERROR 17:39:08 Exception in thread Thread[StorageServiceShutdownHook,5,main]
java.io.IOError: java.io.IOException: Unknown error: 316
at org.apache.cassandra.net.MessagingService.shutdown(MessagingService.java:750) ~[apache-cassandra-2.1.9.jar:2.1.9]
at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:682) ~[apache-cassandra-2.1.9.jar:2.1.9]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.9.jar:2.1.9]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
Caused by: java.io.IOException: Unknown error: 316
at sun.nio.ch.NativeThread.signal(Native Method) ~[na:1.8.0_60]
at sun.nio.ch.ServerSocketChannelImpl.implCloseSelectableChannel(ServerSocketChannelImpl.java:292) ~[na:1.8.0_60]
at java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:234) ~[na:1.8.0_60]
at java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:115) ~[na:1.8.0_60]
at sun.nio.ch.ServerSocketAdaptor.close(ServerSocketAdaptor.java:137) ~[na:1.8.0_60]
at org.apache.cassandra.net.MessagingService$SocketThread.close(MessagingService.java:1017) ~[apache-cassandra-2.1.9.jar:2.1.9]
at org.apache.cassandra.net.MessagingService.shutdown(MessagingService.java:746) ~[apache-cassandra-2.1.9.jar:2.1.9]
... 3 common frames omitted
I was wondering whether I missed any setup procedure. The getting-started guide says it should run out of the box. I am using OS X Yosemite 10.10.5.
Thanks in advance.
It appears this is a JDK bug, and it looks like it could be fixed by updating your JDK. What version are you currently on?
It looks like a workaround was introduced in CASSANDRA-8220 (C* 2.2.1+ and 3.0.0-alpha1), but I think upgrading your JDK should fix this as well.

Hadoop/Yarn distributed shell example

I'm trying to run the distributed shell example (using an SVN checkout of Hadoop, which is why the version is set to 3.0.0-SNAPSHOT):
yarn jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar \
-jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0-SNAPSHOT.jar \
org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command whoami
However it does not work:
12/09/03 13:44:37 FATAL distributedshell.Client: Error running CLient
java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128)
at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getClusterMetrics(ClientRMProtocolPBClientImpl.java:123)
at org.hadoop.yarn.client.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:163)
at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:316)
at org.apache.hadoop.yarn.applications.distributedshell.Client.main(Client.java:164)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown protocol: org.apache.hadoop.yarn.api.ClientRMProtocolPB
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.getProtocolImpl(ProtobufRpcEngine.java:398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:456)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1732)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1728)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1726)
at org.apache.hadoop.ipc.Client.call(Client.java:1164)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy7.getClusterMetrics(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getClusterMetrics(ClientRMProtocolPBClientImpl.java:121)
... 8 more
The essential problem seems to be in the second trace:
Unknown protocol: org.apache.hadoop.yarn.api.ClientRMProtocolPB
Does anyone know how protocol registration for Hadoop's ProtoBufRPC works? Any ideas on how to debug this?
Edit: With Hadoop version 2.0.1-alpha, it works slightly better.
12/09/03 18:43:14 INFO distributedshell.Client: Application did not finish. YarnState=FAILED, DSFinalStatus=FAILED. Breaking monitoring loop
12/09/03 18:43:14 ERROR distributedshell.Client: Application failed to complete successfully
So maybe my build did not work right. Any ideas what is causing the problem above? I'd really like to use HEAD, as I'm planning to do some low-level experiments beyond MapReduce. Or is HEAD partially broken? Does the distributed shell on HEAD work for you?
My own (not yet working ...) client still fails with the same error:
Caused by: java.io.IOException: Unknown protocol: org.apache.hadoop.yarn.api.ClientRMProtocolPB
It turned out that the main problem with my own code was that I naively instantiated the Configuration class instead of YarnConfiguration. As a result, the YARN config files were not read, and the client tried to contact the servers on their default ports, which do not match my settings.
The same bug seems to be present in the distributedshell example.
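To make the difference concrete, here is a minimal sketch (class and property names as in Hadoop 2.x, not the actual distributed shell code): Configuration only loads core-default.xml and core-site.xml, while YarnConfiguration additionally pulls in yarn-default.xml and yarn-site.xml, so the ResourceManager address comes from your settings instead of the defaults.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ConfCheck {
    public static void main(String[] args) {
        Configuration plain = new Configuration();     // yarn-site.xml is NOT loaded
        Configuration yarn = new YarnConfiguration();  // loads yarn-default.xml and yarn-site.xml

        // yarn.resourcemanager.address: unset/default with plain Configuration,
        // your configured value with YarnConfiguration
        System.out.println("plain: " + plain.get(YarnConfiguration.RM_ADDRESS));
        System.out.println("yarn:  " + yarn.get(YarnConfiguration.RM_ADDRESS));
    }
}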
