Nutch Elasticsearch Integration

I'm following this tutorial for setting up Nutch along with Elasticsearch. Whenever I try to index the data into Elasticsearch, it returns an error. Here are the logs.
Command:
bin/nutch index elasticsearch -all
Logs when I set elastic.port to 9200 in conf/nutch-site.xml:
2016-05-05 13:22:49,903 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
2016-05-05 13:22:49,904 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2016-05-05 13:22:49,904 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2016-05-05 13:22:49,904 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2016-05-05 13:22:49,905 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.metadata.MetadataIndexer
2016-05-05 13:22:49,906 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.more.MoreIndexingFilter
2016-05-05 13:22:49,961 INFO elastic.ElasticIndexWriter - Processing remaining requests [docs = 0, length = 0, total docs = 0]
2016-05-05 13:22:49,961 INFO elastic.ElasticIndexWriter - Processing to finalize last execute
2016-05-05 13:22:54,898 INFO client.transport - [Peggy Carter] failed to get node info for [#transport#-1][ubuntu][inet[localhost/127.0.0.1:9200]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9200]][cluster:monitor/nodes/info] request_id [1] timed out after [5000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:366)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-05-05 13:22:55,682 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
2016-05-05 13:22:55,683 INFO indexer.IndexingJob - Active IndexWriters :
ElasticIndexWriter
elastic.cluster : elastic prefix cluster
elastic.host : hostname
elastic.port : port (default 9300)
elastic.index : elastic index command
elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
2016-05-05 13:22:55,711 INFO elasticsearch.plugins - [Adrian Toomes] loaded [], sites []
2016-05-05 13:23:00,763 INFO client.transport - [Adrian Toomes] failed to get node info for [#transport#-1][ubuntu][inet[localhost/127.0.0.1:9200]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9200]][cluster:monitor/nodes/info] request_id [0] timed out after [5000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:366)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-05-05 13:23:00,766 INFO indexer.IndexingJob - IndexingJob: done.
Logs when the default port 9300 is used:
2016-05-05 13:58:44,584 INFO elasticsearch.plugins - [Mentallo] loaded [], sites []
2016-05-05 13:58:44,673 WARN transport.netty - [Mentallo] Message not fully read (response) for [0] handler future(org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler$1#3c80f1dd), error [true], resetting
2016-05-05 13:58:44,674 INFO client.transport - [Mentallo] failed to get node info for [#transport#-1][ubuntu][inet[localhost/127.0.0.1:9300]], disconnecting...
org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream
at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:173)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:125)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.StreamCorruptedException: Unsupported version: 1
at org.elasticsearch.common.io.ThrowableObjectInputStream.readStreamHeader(ThrowableObjectInputStream.java:46)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
at org.elasticsearch.common.io.ThrowableObjectInputStream.<init>(ThrowableObjectInputStream.java:38)
at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:170)
... 23 more
2016-05-05 13:58:44,676 INFO indexer.IndexingJob - IndexingJob: done.
I've configured everything as described. I have also looked at various related threads, but to no avail. The Java version used by Elasticsearch and by the Nutch JVM is the same. Is there a bug here?
I'm using Nutch 2.3.1 and have tried with both ES 1.4.4 and 2.3.2. I can see data in MongoDB, but I cannot index data into ES. Why?
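For reference, here is a minimal sketch of the relevant properties in conf/nutch-site.xml, assuming a local single-node Elasticsearch. The property names are the ones listed by the ElasticIndexWriter above; the values are placeholders, the cluster name must match cluster.name in elasticsearch.yml, and the port must be the transport port 9300 rather than the HTTP port 9200:
<!-- Hypothetical example values; adjust to your own setup -->
<property>
  <name>elastic.host</name>
  <value>localhost</value>
</property>
<property>
  <name>elastic.port</name>
  <value>9300</value>
</property>
<property>
  <name>elastic.cluster</name>
  <value>elasticsearch</value>
</property>
<property>
  <name>elastic.index</name>
  <value>nutch</value>
</property>
Note also that the StreamCorruptedException: Unsupported version seen on port 9300 usually indicates a transport client/server version mismatch; the indexer-elastic plugin bundled with Nutch 2.3.1 is built against the 1.x transport client, so it would not be expected to talk to an ES 2.3.2 node.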

Related

Executing a Hive query causes the YARN resource manager to throw a "file does not exist" exception

I'm configuring Hive 3.1.0 to work with Hadoop 3.0.0.
This error is thrown almost immediately when I submit a simple query in Beeline that triggers a MapReduce job:
0: jdbc:hive2://> select count(*) from airlinedata;
18/10/11 10:24:45 [HiveServer2-Background-Pool: Thread-124]: WARN ql.Driver: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = UUT81HC_20181011102444_2df01ff5-ca05-403c-b0e1-15f8f7715dc7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
2018-10-11 10:24:45,510 INFO [HiveServer2-Background-Pool: Thread-124] client.RMProxy (RMProxy.java:newProxyInstance(133)) - Connecting to ResourceManager at /10.184.153.232:8032
2018-10-11 10:24:45,555 INFO [HiveServer2-Background-Pool: Thread-124] client.RMProxy (RMProxy.java:newProxyInstance(133)) - Connecting to ResourceManager at /10.184.153.232:8032
18/10/11 10:24:45 [HiveServer2-Background-Pool: Thread-124]: WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
WARN : Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:73)
at org.apache.hadoop.mapreduce.TypeConverter.toYarn(TypeConverter.java:78)
at org.apache.hadoop.mapred.ClientServiceDelegate.<init>(ClientServiceDelegate.java:120)
at org.apache.hadoop.mapred.ClientCache.getClient(ClientCache.java:68)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:343)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:254)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:423)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:149)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224)
at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:329)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:70)
... 40 more
Caused by: java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
org/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder.setAppId(Lorg/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto;)Lorg/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder; #36: invokevirtual
Reason:
Type 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' (current frame, stack[1]) is not assignable to 'com/google/protobuf/GeneratedMessage'
Current Frame:
bci: #36
flags: { }
locals: { 'org/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder', 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' }
stack: { 'com/google/protobuf/SingleFieldBuilder', 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' }
Bytecode:
0x0000000: 2ab4 0011 c700 1b2b c700 0bbb 002f 59b7
0x0000010: 0030 bf2a 2bb5 000a 2ab6 0031 a700 0c2a
0x0000020: b400 112b b600 3257 2a59 b400 1304 80b5
0x0000030: 0013 2ab0
Stackmap Table:
same_frame(#19)
same_frame(#31)
same_frame(#40)
at org.apache.hadoop.mapreduce.v2.proto.MRProtos$JobIdProto.newBuilder(MRProtos.java:1017)
at org.apache.hadoop.mapreduce.v2.api.records.impl.pb.JobIdPBImpl.<init>(JobIdPBImpl.java:37)
... 45 more
YARN ResourceManager stack trace:
2018-10-11 10:24:49,896 INFO rmapp.RMAppImpl: application_1539226955170_0002 State change from ACCEPTED to FINAL_SAVING on event = ATTEMPT_FAILED
2018-10-11 10:24:49,896 INFO recovery.RMStateStore: Updating info for app: application_1539226955170_0002
2018-10-11 10:24:49,897 INFO capacity.CapacityScheduler: Application Attempt appattempt_1539226955170_0002_000002 is done. finalState=FAILED
2018-10-11 10:24:49,897 INFO rmapp.RMAppImpl: Application application_1539226955170_0002 failed 2 times due to AM Container for appattempt_1539226955170_0002_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2018-10-11 10:24:49.876]File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
java.io.FileNotFoundException: File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1495)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1488)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1503)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:366)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:364)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:241)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:234)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:222)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
For more detailed output, check the application tracking page: http://HC-UT40048C.apac.com:8088/cluster/app/application_1539226955170_0002 Then click on links to logs of each attempt.
. Failing the application.
2018-10-11 10:24:49,897 INFO scheduler.AppSchedulingInfo: Application application_1539226955170_0002 requests cleared
2018-10-11 10:24:49,897 INFO rmapp.RMAppImpl: application_1539226955170_0002 State change from FINAL_SAVING to FAILED on event = APP_UPDATE_SAVED
2018-10-11 10:24:49,898 INFO capacity.LeafQueue: Application removed - appId: application_1539226955170_0002 user: UUT81HC queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2018-10-11 10:24:49,898 WARN resourcemanager.RMAuditLogger: USER=UUT81HC OPERATION=Application Finished - Failed
TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1539226955170_0002 failed 2 times due to AM Container for appattempt_1539226955170_0002_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2018-10-11 10:24:49.876]File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
java.io.FileNotFoundException: File does not exist: hdfs://10.184.153.232:19000/tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1495)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1488)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1503)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:366)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:364)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:241)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:234)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:222)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
For more detailed output, check the application tracking page: http://HC-UT40048C.apac.com:8088/cluster/app/application_1539226955170_0002 Then click on links to logs of each attempt.
. Failing the application. APPID=application_1539226955170_0002
2018-10-11 10:24:49,898 INFO capacity.ParentQueue: Application removed - appId: application_1539226955170_0002 user: UUT81HC leaf-queue of parent: root #applications: 0
2018-10-11 10:24:49,899 INFO resourcemanager.RMAppManager$ApplicationSummary: appId=application_1539226955170_0002,name=select count(*) from airlinedata (Stage-1),user=UUT81HC,queue=default,state=FAILED,trackingUrl=http://HC-UT40048C.apac.com:8088/cluster/app/application_1539226955170_0002,appMasterHost=N/A,submitTime=1539228287412,startTime=1539228287413,finishTime=1539228289896,finalStatus=FAILED,memorySeconds=1482,vcoreSeconds=0,preemptedMemorySeconds=0,preemptedVcoreSeconds=0,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=,applicationType=MAPREDUCE,resourceSeconds=1482 MB-seconds\, 0 vcore-seconds,preemptedResourceSeconds=0 MB-seconds\, 0 vcore-seconds
After examining how Hive executes the MapReduce job on YARN, I found that it first creates map.xml and reduce.xml in /tmp with permission drwx------ (only the owner can use it):
2018-10-11 10:24:45,133 INFO hdfs.StateChange: BLOCK* allocate blk_1073742318_1495, replicas=10.184.153.232:9866 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/map.xml
2018-10-11 10:24:45,225 INFO hdfs.StateChange: DIR* completeFile: /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/map.xml is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:45,248 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/map.xml
2018-10-11 10:24:45,294 INFO hdfs.StateChange: BLOCK* allocate blk_1073742319_1496, replicas=10.184.153.232:9866 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
2018-10-11 10:24:45,411 INFO hdfs.StateChange: DIR* completeFile: /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:45,437 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hive/UUT81HC/0d321851-1d90-4f19-ac50-12d120da601d/hive_2018-10-11_10-24-44_868_5772391105026287697-3/-mr-10005/b8800c0f-f09c-41ca-ab69-a79b72fc9597/reduce.xml
2018-10-11 10:24:45,772 INFO hdfs.StateChange: BLOCK* allocate blk_1073742320_1497, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.jar
2018-10-11 10:24:46,438 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.jar is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:46,463 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.jar
2018-10-11 10:24:46,618 INFO namenode.FSDirectory: Increasing replication from 2 to 10 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.split
2018-10-11 10:24:46,639 INFO hdfs.StateChange: BLOCK* allocate blk_1073742321_1498, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.split
2018-10-11 10:24:46,706 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.split is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:46,791 INFO hdfs.StateChange: BLOCK* allocate blk_1073742322_1499, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.splitmetainfo
2018-10-11 10:24:46,870 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.splitmetainfo is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:24:46,971 INFO hdfs.StateChange: BLOCK* allocate blk_1073742323_1500, replicas=10.184.153.232:9866 for /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.xml
2018-10-11 10:24:47,370 INFO hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/UUT81HC/.staging/job_1539226955170_0002/job.xml is closed by DFSClient_NONMAPREDUCE_164506931_1
2018-10-11 10:32:15,741 INFO blockmanagement.BlockManager: StorageInfo TreeSet fill ratio DS-d4c2a5a0-435d-4b44-b408-3cd04587cd09 : 1.0
But somehow YARN can't read those files when executing the job and throws a file-does-not-exist error. I did set permission 777 on /tmp, but these files are created by Hive itself during execution, so I can't do anything with them directly.
I suspect this problem is related to users or permissions when using Hive on Hadoop. What should I do about it?
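The VerifyError above (YarnProtos$ApplicationIdProto is not assignable to com.google.protobuf.GeneratedMessage) usually points to mixed protobuf versions on the classpath rather than to a permission problem; the missing reduce.xml reported by the ResourceManager is a downstream symptom of the failed job submission. One way to check, assuming standard tarball layouts under $HIVE_HOME and $HADOOP_HOME, is to list the protobuf jars each side pulls in:
# Hypothetical diagnostic; adjust the paths to your installation
find "$HIVE_HOME/lib" "$HADOOP_HOME/share/hadoop" -name 'protobuf-java-*.jar' 2>/dev/null
# Hadoop 3.0.x is built against protobuf-java 2.5.0; if a different protobuf
# version shows up on the Hive side, that jar is the likely culprit.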

Spark 1.6.2 in cluster mode in EMR

I submitted a Spark job in EMR. It runs well in client mode, but while running in cluster mode it shows the following error.
In the code I'm using the SQS receiver from GitHub - imapi/spark-sqs-receiver, and I'm trying to access Amazon S3 using s3a.
16/10/21 07:10:00 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 71, ip-172-31-54-127.ec2.internal, partition 0,NODE_LOCAL, 2115 bytes)
16/10/21 07:10:00 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on ip-172-31-54-127.ec2.internal:43789 (size: 2.4 KB, free: 511.5 MB)
16/10/21 07:10:01 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 72, ip-172-31-54-127.ec2.internal, partition 1,NODE_LOCAL, 2115 bytes)
16/10/21 07:10:01 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 71, ip-172-31-54-127.ec2.internal): java.lang.ExceptionInInitializerError
at sqs.sqstest.Streaming$1.call(Streaming.java:146)
at sqs.sqstest.Streaming$1.call(Streaming.java:97)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$fn$1$1.apply(JavaDStreamLike.scala:170)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$fn$1$1.apply(JavaDStreamLike.scala:170)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:189)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:401)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at sqs.sqstest.Streaming.<clinit>(Streaming.java:41)
... 15 more
16/10/21 07:10:01 INFO YarnClusterSchedulerBackend: Disabling executor 1.
16/10/21 07:10:01 INFO DAGScheduler: Executor lost: 1 (epoch 1)
16/10/21 07:10:01 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
16/10/21 07:10:01 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, ip-172-31-54-127.ec2.internal, 43789)
16/10/21 07:10:01 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
16/10/21 07:10:02 INFO YarnAllocator: Completed container container_1477032067457_0003_02_000002 on host: ip-172-31-54-127.ec2.internal (state: COMPLETE, exit status: 50)
16/10/21 07:10:02 WARN YarnAllocator: Container marked as failed: container_1477032067457_0003_02_000002 on host: ip-172-31-54-127.ec2.internal. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_1477032067457_0003_02_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 50
16/10/21 07:10:02 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1477032067457_0003_02_000002 on host: ip-172-31-54-127.ec2.internal. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_1477032067457_0003_02_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
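The driver-side cause buried in the trace is "A master URL must be set in your configuration", thrown from sqs.sqstest.Streaming.<clinit>, i.e. the SparkContext is being created in a static initializer with a configuration that only happens to work in client mode. Below is a minimal sketch of the usual shape of the fix; the class name comes from the trace, while the app name and batch interval are assumed, since the actual Streaming.java is not shown in the question:
// Hypothetical sketch: build the context in main(), and do not hard-code setMaster(),
// so that spark-submit --master yarn --deploy-mode cluster can supply the master URL.
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public final class Streaming {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("sqs-streaming"); // no setMaster() here
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));
        // ... register the SQS receiver and the S3A writes here ...
        jssc.start();
        jssc.awaitTermination();
    }
}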

HDFS error + Incomplete HDFS URI, no host: hdfs://l27.0.0.1:9000

I run Flume on Hadoop with the command
flume-ng agent -c conf -f conf/flume-twitter.conf -n TwitterAgent
and now I am not able to save the Twitter data. It shows an error.
I posted the error below. Do you have any idea?
2016-01-29 15:48:56,979 (Twitter4J Async Dispatcher[0]) [DEBUG - com.cloudera.flume.source.TwitterSource$1.onStatus(TwitterSource.java:108)] ArtistLR: RT ArtsUK1 An unlikely duo: Homeland's Nina Hoss plots with art joker Christian Jankowski #Arts https://t.co/hJyClDCgk0
2016-01-29 15:48:57,136 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:234)] Creating hdfs://l27.0.0.1:9000/tweets/movies/2016/01/29/15//FlumeData.1454062721600.tmp
2016-01-29 15:48:57,139 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:455)] HDFS IO error
java.io.IOException: Incomplete HDFS URI, no host: hdfs://l27.0.0.1:9000/tweets/movies/2016/01/29/15/FlumeData.1454062721600.tmp
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2016-01-29 15:48:57,334 (agent-shutdown-hook) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.stop(LifecycleSupervisor.java:79)] Stopping lifecycle supervisor 11
Look closely at the URL: l27.0.0.1 has a lowercase L instead of the digit 1. A typo in the configuration?
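If that is indeed the typo, the fix belongs in the HDFS sink path in conf/flume-twitter.conf, something like the line below (the agent name TwitterAgent comes from the command above; the sink name HDFS is assumed, so check your own configuration):
# Assumed sink name; the escape sequences mirror the /tweets/movies/2016/01/29/15/ layout seen in the log
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://127.0.0.1:9000/tweets/movies/%Y/%m/%d/%H/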

Graylog2 - Startup fails. Address already in use

I am trying to install Graylog2. I have installed OpenJDK 7, and I have also installed Elasticsearch and MongoDB using apt on Ubuntu 14.04.
I am new to both Graylog and Elasticsearch; I just want to do a trial installation and try these out. I also searched similar questions and tried their suggestions, but none of them worked in my case.
I have followed the installation instructions on graylog.org, but when I try to start the Graylog2 server I get the following error.
2015-02-12 03:19:36,216 INFO : org.graylog2.periodical.Periodicals - Starting [org.graylog2.periodical.IndexerClusterCheckerThread] periodical in [0s], polling every [30s].
2015-02-12 03:19:36,222 INFO : org.graylog2.periodical.Periodicals - Starting [org.graylog2.periodical.GarbageCollectionWarningThread] periodical, running forever.
2015-02-12 03:19:36,225 INFO : org.graylog2.periodical.IndexerClusterCheckerThread - Indexer not fully initialized yet. Skipping periodic cluster check.
2015-02-12 03:19:36,229 INFO : org.graylog2.periodical.Periodicals - Starting [org.graylog2.periodical.ThroughputCounterManagerThread] periodical in [0s], polling every [1s].
2015-02-12 03:19:36,280 INFO : org.graylog2.periodical.Periodicals - Starting [org.graylog2.periodical.DeadLetterThread] periodical, running forever.
2015-02-12 03:19:36,295 INFO : org.graylog2.periodical.Periodicals - Starting [org.graylog2.periodical.ClusterHealthCheckThread] periodical in [0s], polling every [20s].
2015-02-12 03:19:36,299 INFO : org.graylog2.periodical.Periodicals - Starting [org.graylog2.periodical.InputCacheWorkerThread] periodical, running forever.
2015-02-12 03:19:36,334 DEBUG: org.graylog2.periodical.ClusterHealthCheckThread - No input running in cluster!
2015-02-12 03:19:36,368 DEBUG: org.graylog2.caches.DiskJournalCache - Committing output-cache (entries 0)
2015-02-12 03:19:36,383 DEBUG: org.graylog2.caches.DiskJournalCache - Committing input-cache (entries 0)
2015-02-12 03:19:36,885 ERROR: com.google.common.util.concurrent.ServiceManager - Service IndexerSetupService [FAILED] has failed in the STARTING state.
org.elasticsearch.transport.BindTransportException: Failed to bind to [9300]
at org.elasticsearch.transport.netty.NettyTransport.doStart(NettyTransport.java:396)
at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:85)
at org.elasticsearch.transport.TransportService.doStart(TransportService.java:90)
at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:85)
at org.elasticsearch.node.internal.InternalNode.start(InternalNode.java:242)
at org.graylog2.initializers.IndexerSetupService.startUp(IndexerSetupService.java:101)
at com.google.common.util.concurrent.AbstractIdleService$2$1.run(AbstractIdleService.java:54)
at com.google.common.util.concurrent.Callables$3.run(Callables.java:95)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.common.netty.channel.ChannelException: Failed to bind to: /127.0.0.1:9300
at org.elasticsearch.common.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at org.elasticsearch.transport.netty.NettyTransport$3.onPortNumber(NettyTransport.java:387)
at org.elasticsearch.common.transport.PortsRange.iterate(PortsRange.java:58)
at org.elasticsearch.transport.netty.NettyTransport.doStart(NettyTransport.java:383)
... 8 more
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.elasticsearch.common.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296)
at org.elasticsearch.common.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
Elasticsearch is showing the following status:
{
"cluster_name" : "graylog2",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}
The following are the changes I made to elasticsearch.yml
cluster.name: graylog2
network.bind_host: 127.0.0.1
network.host: 127.0.0.1
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["127.0.0.1", MYSYS IP]
and graylog2.conf
is_master = true
password_secret = changed
root_password_sha2 = changed
elasticsearch_max_docs_per_index = 20000000
elasticsearch_shards = 1
elasticsearch_replicas = 0
elasticsearch_cluster_name = graylog2
elasticsearch_discovery_zen_ping_multicast_enabled = false
elasticsearch_discovery_zen_ping_unicast_hosts = IP_ARR:9300
mongodb_useauth = false
I tried killing the process on port 9300 and starting Graylog again, but I got the following error:
2015-02-12 04:01:24,976 INFO : org.elasticsearch.transport - [graylog2-server] bound_address {inet[/127.0.0.1:9300]}, publish_address {inet[/127.0.0.1:9300]}
2015-02-12 04:01:25,227 INFO : org.elasticsearch.discovery - [graylog2-server] graylog2/LGkZJDz1SoeENKj6Rr0e8w
2015-02-12 04:01:25,252 DEBUG: org.elasticsearch.cluster.service - [graylog2-server] processing [update local node]: execute
2015-02-12 04:01:25,253 DEBUG: org.elasticsearch.cluster.service - [graylog2-server] cluster state updated, version [0], source [update local node]
2015-02-12 04:01:25,259 DEBUG: org.elasticsearch.cluster.service - [graylog2-server] set local cluster state to version 0
2015-02-12 04:01:25,259 DEBUG: org.elasticsearch.cluster.service - [graylog2-server] processing [update local node]: done applying updated cluster_state (version: 0)
2015-02-12 04:01:25,325 WARN : org.elasticsearch.transport.netty - [graylog2-server] exception caught on transport layer [[id: 0x82f30fa7]], closing connection
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:127)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:574)
.......
2015-02-12 04:01:28,536 DEBUG: org.elasticsearch.action.admin.cluster.health - [graylog2-server] no known master node, scheduling a retry
2015-02-12 04:01:28,564 DEBUG: org.elasticsearch.transport.netty - [graylog2-server] disconnected from [[graylog2-server][LGkZJDz1SoeENKj6Rr0e8w][ubuntu-greylog-9945][inet[/127.0.0.1:9300]]{client=true, data=false, master=false}]
2015-02-12 04:01:28,573 DEBUG: org.elasticsearch.discovery.zen - [graylog2-server] filtered ping responses: (filter_client[true], filter_data[false]) {none}
2015-02-12 04:01:28,590 WARN : org.elasticsearch.transport.netty - [graylog2-server] exception caught on transport layer [[id: 0xe27feaff]], closing connection
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:127)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:70)
Can you please point out what I am doing wrong here and what I am missing?
If ES and Graylog2 are running on the same server, try deleting or commenting out this line in elasticsearch.yml:
#transport.tcp.port: 9300
and adding (or uncommenting) this line in graylog2.conf:
elasticsearch_transport_tcp_port = 9350
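It can also help to confirm which process is actually holding port 9300 before restarting, since the BindTransportException shows the embedded Graylog2 Elasticsearch node failing to bind the same port the standalone Elasticsearch server is using. A quick check, assuming lsof or netstat is installed:
sudo lsof -i :9300
# or
sudo netstat -tlnp | grep 9300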

Spring-XD Curator Connection Timeout

In Spring-XD the Curator Connection times out:
WARN ConnectionStateManager-0 curator.ConnectionState - Connection attempt unsuccessful after 63021 (greater than max timeout of 60000). Resetting connection and trying again with a new connection.
Curator tries to re-establish the connection but fails. Please check the logs below. Has anyone faced a similar issue? Please let me know if you know of any ways to resolve it, or of any workarounds.
Also, the default Curator connection timeout is 60000. Is there a way to increase it? Does Spring XD expose a property that can be set?
2014-12-10 01:24:41,003 WARN ConnectionStateManager-0 server.ContainerRegistrar - >>> disconnected container: 1c8a234d-4b8d-4d65-b374-xxxxe8619
2014-12-10 01:24:41,004 INFO DeploymentsPathChildrenCache-0 server.ContainerRegistrar - Path cache event: null, type: CONNECTION_SUSPENDED
2014-12-10 01:24:41,005 INFO ConnectionStateManager-0 server.ContainerRegistrar - Undeploying module [ModuleDescriptor#350920b1 moduleName = 'rabbit', moduleLabel = 'rabbit', group = 'xxx-ingestion-2', sourceChannelName = [null], sinkChannelName = [null], sinkChannelName = [null], index = 0, type = source, parameters = map['vhost' -> 'xxx_virtual_host', 'requeue' -> 'false', 'outputType' -> 'text/plain', 'queues' -> 'xx.xxx.queue', 'addresses' -> 'xxxmq.xx.xxxx.com'], children = list[[empty]]]
2014-12-10 01:24:46,022 ERROR pool-22-thread-1 connection.CachingConnectionFactory - Channel shutdown: clean connection shutdown; protocol method: method<connection.close>(reply-code=200, reply-text=OK, class-id=0, method-id=0)
2014-12-10 01:24:56,007 ERROR CuratorFramework-0 curator.ConnectionState - Connection timed out for connection string (514.xx.93.xxx:2181,504.58.xxx.xx:2181) and timeout (15000) / elapsed (15004)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:793)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:779)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:58)
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:265)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
2014-12-10 01:24:56,161 ERROR main-EventThread curator.ConnectionState - Connection timed out for connection string (514.xx.93.xxx:2181,504.58.xxx.xx:2181) and timeout (15000) / elapsed (15159)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:474)
at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:302)
at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:291)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:287)
at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:279)
at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:41)
at org.springframework.xd.dirt.server.ContainerRegistrar$StreamModuleWatcher.process(ContainerRegistrar.java:744)
at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:67)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2014-12-10 01:25:03,014 ERROR CuratorFramework-0 imps.CuratorFrameworkImpl - Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:793)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:779)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:58)
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:265)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Is this reproducible? Are you running in clustered or single node mode?
The Curator connection timeout (in milliseconds) can be set via system property curator-default-connection-timeout.
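Since curator-default-connection-timeout is a plain JVM system property, one way to raise it is to pass it through the container's Java options when starting Spring XD, for example raising the 15000 ms value visible in the log to 30000 ms. A hedged example, assuming the xd-container start script honors JAVA_OPTS:
# Assumption: the generated xd-container start script passes JAVA_OPTS through to the JVM
export JAVA_OPTS="-Dcurator-default-connection-timeout=30000"
xd/bin/xd-container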
