Errors reading data from a 1G file in LocalCluster mode with Apache Storm

Hi, I'm using Storm in local cluster mode for development.
I ran a simple topology that contains a spout and two bolts; the example counts words from a log file.
code example url :
The code works perfectly with small log files (7.3M), but when I try to run it on a big log file (100M-1000M) I get exceptions.
I set a long delay before the cluster shuts down.
Am I missing some configuration options here?
11326 [Thread-6] INFO backtype.storm.daemon.supervisor - Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "HelloStorm-1-1403522378", :executors ([3 3] [ 4 4] [2 2] [1 1])} for this supervisor 868aff95-7b63-44d1-ad55-2dd07d9c7ba2 on port 1024 with id df052251-45ec-4bc3-a486-c2bf11a8a0fa
11336 [Thread-6] INFO backtype.storm.daemon.worker - Launching worker for HelloStorm-1-1403522378 on 868aff95-7b63-44d1-ad55-2dd07d9c7ba2:1024 with id df052251-45ec-4bc3-a486-c2bf11a8a0fa and conf {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, "topology.builtin.metrics.bucket.size.secs" 60, "" true, "" 5, "zmq.linger.millis" 0, "topology.skip.missing.kryo.registrations" true, "storm.messaging.netty.client_worker_threads" 1, "ui.childopts" "-Xmx768m", "storm.zookeeper. session.timeout" 20000, "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 50, "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", "java.library.path" "/usr/local/li b:/opt/local/lib:/usr/lib", "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/var/tmp//77d5cd63-9539-44a4-892a-9e91553987df", "storm.messaging.netty.buffer_size" 5242880, "supervisor.w orker.start.timeout.secs" 120, "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64, "topology.worker.sha red.thread.pool.size" 4, "" "localhost", "storm.messaging.netty.min_wait_ms" 100, "storm.zookeeper.port" 2000, "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size" 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000, "supervisor.enable" true, "storm.messaging.netty.server_worker_t hreads" 1, "storm.zookeeper.servers" ["localhost"], "transactional.zookeeper.root" "/transactional", "topology.acker.executors" nil, "topology.transfer.buffer.size" 1024, "topology.worker.childopts " nil, "drpc.queue.size" 128, "worker.childopts" "-Xmx768m", "supervisor.heartbeat.frequency.secs" 5, "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor. 
frequency.secs" 3, "drpc.childopts" "-Xmx768m", "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, "topology.tasks" nil, "storm.messaging.netty.max_retries" 30, "topology.spout.w ait.strategy" "backtype.storm.spout.SleepSpoutWaitStrategy", "nimbus.thrift.max_buffer_size" 1048576, "topology.max.spout.pending" nil, "storm.zookeeper.retry.interval" 1000, "topology.sleep.spout." 1, "nimbus.topology.validator" "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports" (1024 1025 1026), "topology.debug" false, "nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, "topology.workers" 1, "supervisor.childopts" "-Xmx256m", "nimbus.thrift.port" 6627, "topol ogy.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" "backtype.storm.serialization.types.ListDelegateSerializer", "topology.disruptor.wait.strategy" "com.lm ax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" "backtype.storm.serialization.DefaultKryoFactory", "drpc.invoc ations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times" 5, "storm.thrift.transport" "", "topology.state.synchroniz ation.timeout.secs" 60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport" "backtype.storm.messaging.netty.Context", "" "A1", "storm.messaging.netty.max_wait_ms" 1000, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" 8080, "nimbus.childopts" "-Xmx1024m", "storm.cluster.mode" "local", "topolog y.optimize" true, "topology.max.task.parallelism" nil}
11337 [Thread-6] INFO - Starting
11344 [Thread-6-EventThread] INFO backtype.storm.zookeeper - Zookeeper state update: :connected:none
11358 [Thread-6] INFO - Starting
11611 [Thread-6] INFO backtype.storm.daemon.executor - Loading executor line-reader-spout:[2 2]
11618 [Thread-6] INFO backtype.storm.daemon.executor - Loaded executor tasks line-reader-spout:[2 2]
11632 [Thread-16-line-reader-spout] INFO backtype.storm.daemon.executor - Opening spout line-reader-spout:(2)
Start Time: 18512885554479686
11634 [Thread-16-line-reader-spout] INFO backtype.storm.daemon.executor - Opened spout line-reader-spout:(2)
11636 [Thread-16-line-reader-spout] INFO backtype.storm.daemon.executor - Activating spout line-reader-spout:(2)
11638 [Thread-6] INFO backtype.storm.daemon.executor - Finished loading executor line-reader-spout:[2 2]
11677 [Thread-6] INFO backtype.storm.daemon.executor - Loading executor word-counter:[3 3]
11721 [Thread-6] INFO backtype.storm.daemon.executor - Loaded executor tasks word-counter:[3 3]
11725 [Thread-6] INFO backtype.storm.daemon.executor - Finished loading executor word-counter:[3 3]
11733 [Thread-6] INFO backtype.storm.daemon.executor - Loading executor word-spitter:[4 4]
11735 [Thread-6] INFO backtype.storm.daemon.executor - Loaded executor tasks word-spitter:[4 4]
11737 [Thread-6] INFO backtype.storm.daemon.executor - Finished loading executor word-spitter:[4 4]
11746 [Thread-6] INFO backtype.storm.daemon.executor - Loading executor __system:[-1 -1]
11747 [Thread-6] INFO backtype.storm.daemon.executor - Loaded executor tasks __system:[-1 -1]
11748 [Thread-6] INFO backtype.storm.daemon.executor - Finished loading executor __system:[-1 -1]
11761 [Thread-6] INFO backtype.storm.daemon.executor - Loading executor __acker:[1 1]
11765 [Thread-6] INFO backtype.storm.daemon.executor - Loaded executor tasks __acker:[1 1]
11767 [Thread-6] INFO backtype.storm.daemon.executor - Timeouts disabled for executor __acker:[1 1]
11768 [Thread-6] INFO backtype.storm.daemon.executor - Finished loading executor __acker:[1 1]
11768 [Thread-6] INFO backtype.storm.daemon.worker - Launching receive-thread for 868aff95-7b63-44d1-ad55-2dd07d9c7ba2:1024
11786 [Thread-6] INFO backtype.storm.daemon.worker - Worker has topology config {"" "HelloStorm-1-1403522378", "dev.zookeeper.path" "/tmp/dev-storm-zookeeper", " cs" nil, "topology.builtin.metrics.bucket.size.secs" 60, "" true, "" 5, "zmq.linger.millis" 0, "topology.skip.missing.k ryo.registrations" true, "storm.messaging.netty.client_worker_threads" 1, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign" true, " terval.millis" 50, "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", "topology.executor.send.buffer.size" 1024, "storm.l ocal.dir" "/var/tmp//77d5cd63-9539-44a4-892a-9e91553987df", "storm.messaging.netty.buffer_size" 5242880, "supervisor.worker.start.timeout.secs" 120, "topology.enable.message.timeouts" true, "inputF ile" "test_log.log", "nimbus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64, "topology.worker.shared.thread.pool.size" 4, "" "localhost", "storm.messaging.netty.min_wait_ms" 100, "storm.zookeeper.port" 2000, "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size" 1024, "transactional.zookeeper.servers" nil, "stor m.zookeeper.root" "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000, "supervisor.enable" true, "storm.messaging.netty.server_worker_threads" 1, "storm.zookeeper.servers" ["localhost"], "transactional.zookeeper.root" "/transactional", "topology.acker.executors" nil, "topology.kryo.decorators" (), "" "HelloStorm", "topology.transfer.buffer.size" 1024, "topology.worker .childopts" nil, "drpc.queue.size" 128, "worker.childopts" "-Xmx768m", "supervisor.heartbeat.frequency.secs" 5, "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772, "superviso r.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m", "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, "topology.tasks" nil, "storm.messaging.netty.max_retries" 30, 
"topolo gy.spout.wait.strategy" "backtype.storm.spout.SleepSpoutWaitStrategy", "nimbus.thrift.max_buffer_size" 1048576, "topology.max.spout.pending" 1, "storm.zookeeper.retry.interval" 1000, "topology.slee" 1, "nimbus.topology.validator" "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports" (1024 1025 1026), "topology.debug" false, "nimbus.task.launc h.secs" 120, "nimbus.supervisor.timeout.secs" 60, "topology.kryo.register" nil, "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, "topology.workers" 1, "supervisor.childopts" "-Xmx25 6m", "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" "backtype.storm.serialization.types.ListDelegateSerializer", "top ology.disruptor.wait.strategy" "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" "backtype.storm.serializ ation.DefaultKryoFactory", "drpc.invocations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times" 5, "storm.thrift.transport" " ortPlugin", "topology.state.synchronization.timeout.secs" 60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport" "backtype.storm.messaging.nett y.Context", "" "A1", "storm.messaging.netty.max_wait_ms" 1000, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" 8080, "nimbus.childopts" "-Xmx1024m", "storm.cluster.mode" "local", "topology.optimize" true, "topology.max.task.parallelism" nil}
11786 [Thread-6] INFO backtype.storm.daemon.worker - Worker df052251-45ec-4bc3-a486-c2bf11a8a0fa for storm HelloStorm-1-1403522378 on 868aff95-7b63-44d1-ad55-2dd07d9c7ba2:1024 has finished loading
11801 [Thread-18-word-counter] INFO backtype.storm.daemon.executor - Preparing bolt word-counter:(3)
11821 [Thread-18-word-counter] INFO backtype.storm.daemon.executor - Prepared bolt word-counter:(3)
11823 [Thread-20-word-spitter] INFO backtype.storm.daemon.executor - Preparing bolt word-spitter:(4)
11825 [Thread-20-word-spitter] INFO backtype.storm.daemon.executor - Prepared bolt word-spitter:(4)
11838 [Thread-24-__acker] INFO backtype.storm.daemon.executor - Preparing bolt __acker:(1)
11840 [Thread-22-__system] INFO backtype.storm.daemon.executor - Preparing bolt __system:(-1)
11854 [Thread-24-__acker] INFO backtype.storm.daemon.executor - Prepared bolt __acker:(1)
12173 [Thread-22-__system] INFO backtype.storm.daemon.executor - Prepared bolt __system:(-1)
112055 [main-EventThread] INFO - State change: SUSPENDED
112058 [main-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
112058 [Thread-6-EventThread] INFO - State change: SUSPENDED
112058 [Thread-6-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
121441 [main-EventThread] INFO - State change: SUSPENDED
121442 [main-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
121442 [main-EventThread] INFO - State change: SUSPENDED
121442 [main-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
121443 [main-EventThread] INFO - State change: SUSPENDED
121443 [main-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
121443 [ConnectionStateManager-0] WARN - There are no ConnectionStateListeners registered.
121444 [ConnectionStateManager-0] WARN - There are no ConnectionStateListeners registered.
134654 [main-EventThread] INFO - State change: SUSPENDED
134655 [ConnectionStateManager-0] WARN - There are no ConnectionStateListeners registered.
134655 [ConnectionStateManager-0] WARN - There are no ConnectionStateListeners registered.
134656 [main-EventThread] WARN - Session expired event received
134656 [main-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
134656 [main-EventThread] WARN - Session expired event received
134657 [main-EventThread] INFO - State change: LOST
134657 [ConnectionStateManager-0] WARN - There are no ConnectionStateListeners registered.
134657 [main-EventThread] INFO - State change: LOST
139931 [ConnectionStateManager-0] WARN - There are no ConnectionStateListeners registered.
149745 [ConnectionStateManager-0] WARN - There are no ConnectionStateListeners registered.
149745 [ConnectionStateManager-0] WARN - There are no ConnectionStateListeners registered.
149746 [main-EventThread] WARN - Session expired event received
149746 [main-EventThread] INFO - State change: LOST
149747 [main-EventThread] WARN backtype.storm.cluster - Received event :expired::none: with disconnected Zookeeper.
149747 [main-EventThread] WARN - Session expired event received
149747 [main-EventThread] INFO - State change: LOST
149747 [main-EventThread] WARN backtype.storm.cluster - Received event :expired::none: with disconnected Zookeeper.
158929 [main-EventThread] WARN backtype.storm.cluster - Received event :expired::none: with disconnected Zookeeper.
158931 [main-EventThread] WARN backtype.storm.cluster - Received event :expired::none: with disconnected Zookeeper.
158931 [Thread-6-EventThread] WARN - Session expired event received
158931 [Thread-6-EventThread] INFO - State change: LOST
158931 [Thread-6-EventThread] WARN backtype.storm.cluster - Received event :expired::none: with disconnected Zookeeper.
158932 [ConnectionStateManager-0] WARN - There are no ConnectionStateListeners registered.
158933 [ConnectionStateManager-0] WARN - There are no ConnectionStateListeners registered.
176934 [ConnectionStateManager-0] WARN - There are no ConnectionStateListeners registered.
357333 [CuratorFramework-5] ERROR - Connection timed out
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at ~[curator-client-1.0.1.jar:na]
at [curator-client-1.0.1.jar:na]
at [curator-framework-1.0.1.jar:na]
at [curator-framework-1.0.1.jar:na]
at [curator-framework-1.0.1.jar:na]
at [curator-framework-1.0.1.jar:na]
at$200( [curator-framework-1.0.1.jar:na]
at$ [curator-framework-1.0.1.jar:na]
at java.util.concurrent.FutureTask$Sync.innerRun( [na:1.6.0_65]
at [na:1.6.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask( [na:1.6.0_65]
at java.util.concurrent.ThreadPoolExecutor$ [na:1.6.0_65]
at [na:1.6.0_65]
I got a new exception when running a 70M file:
622366 [CuratorFramework-9] ERROR - Background exception was not retry-able or retry gave up
java.lang.OutOfMemoryError: GC overhead limit exceeded

The problem seems to be exactly as described: you've loaded more data into memory than your JVM can support. I assume this is happening to the spout. For very large files you'll need to break up the processing by either splitting the files in advance or streaming the files in instead of trying to load the whole file into memory.
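As a rough illustration of the streaming approach, here is a minimal sketch in Python rather than the Storm spout from the question. The names `stream_lines` and `count_words` are hypothetical. Reading one line at a time keeps memory use tied to the longest line and the number of distinct words, not to the size of the file:

```python
from collections import Counter
from typing import Iterator


def stream_lines(path: str) -> Iterator[str]:
    """Yield lines one at a time instead of loading the whole file."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:  # the file object reads lazily, buffer by buffer
            yield line.rstrip("\n")


def count_words(path: str) -> Counter:
    """Count whitespace-separated words while holding only one line at a time."""
    counts: Counter = Counter()
    for line in stream_lines(path):
        counts.update(line.split())
    return counts
```

In a Storm spout the analogous idea would presumably be to keep a reader open and emit one line per `nextTuple()` call, rather than reading the whole file into a collection in `open()`.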


Pig: count of each product in distinct locations

I am trying to do the following Steps 1 to 4 in Pig:
Step 1: Create a user table and load its data from /tmp/users.txt:
|Column 1 |USER ID |int|
|Column 2 |EMAIL |chararray|
|Column 3 |LANGUAGE |chararray|
|Column 4 |LOCATION |chararray|
Step 2: Create a transaction table and load its data from /tmp/transaction.txt:
|Column 1 |ID |int|
|Column 2 |PRODUCT |int|
|Column 3 |USER ID |int|
|Column 4 |PURCHASE AMOUNT |double|
|Column 5 |DESCRIPTION |chararray|
Step 3: Find the count of each product in each distinct location.
Step 4: Display the results.
To achieve this, I did the following:
users = LOAD '/tmp/users.txt' USING PigStorage(',') AS (USERID:int, EMAIL:chararray, LANGUAGE:chararray, LOCATION: chararray);
trans = LOAD '/tmp/transaction.txt' USING PigStorage(',') AS (ID:int, PRODUCT:int, USERID:int, PURCHASEAMOUNT: double, DESCRIPTION: chararray);
users_trans = JOIN users BY USERID RIGHT, trans BY USERID;
C = FOREACH B GENERATE group as comb, COUNT(users_trans) AS Total;
But I am getting errors. It would be helpful if you could assist, as I am new to Pig.
Sample transaction data:
1 1 1 300 a jumper
2 1 2 300 a jumper
3 1 5 300 a jumper
4 2 3 100 a rubber chicken
5 1 3 300 a jumper
6 5 4 500 a soapbox
7 3 3 200 a adhesive
8 4 1 300 a lotion
9 4 4 500 a sweater
10 5 4 600 a jeans
Error Log:
2019-12-27 06:17:22,180 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/tmp/temp2029752934/tmp-883821114/part-r-00000:0+130
2019-12-27 06:17:22,242 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - (EQUATOR) 0 kvi 26214396(104857584)
2019-12-27 06:17:22,242 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - 100
2019-12-27 06:17:22,242 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - soft limit at 83886080
2019-12-27 06:17:22,242 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - bufstart = 0; bufvoid = 104857600
2019-12-27 06:17:22,242 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - kvstart = 26214396; length = 6553600
2019-12-27 06:17:22,244 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2019-12-27 06:17:22,248 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128
2019-12-27 06:17:22,248 [LocalJobRunner Map Task Executor #0] WARN - SchemaTupleBackend has already been initialized
2019-12-27 06:17:22,250 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: C[7,4],B[6,4] C: C[7,4],B[6,4] R: C[7,4]
2019-12-27 06:17:22,254 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner -
2019-12-27 06:17:22,254 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2019-12-27 06:17:22,254 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Spilling map output
2019-12-27 06:17:22,254 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - bufstart = 0; bufend = 100; bufvoid = 104857600
2019-12-27 06:17:22,254 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - kvstart = 26214396(104857584); kvend = 26214360(104857440); length = 37/6553600
2019-12-27 06:17:22,262 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine - Aliases being processed per job phase (AliasName[line,offset]): M: C[7,4],B[6,4] C: C[7,4],B[6,4] R: C[7,4]
2019-12-27 06:17:22,264 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Finished spill 0
2019-12-27 06:17:22,265 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task:attempt_local1424814286_0002_m_000000_0 is done. And is in the process of committing
2019-12-27 06:17:22,266 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner -map
2019-12-27 06:17:22,266 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local1424814286_0002_m_000000_0' done.
2019-12-27 06:17:22,266 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner -Finishing task: attempt_local1424814286_0002_m_000000_0
2019-12-27 06:17:22,266 [Thread-18] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2019-12-27 06:17:22,266 [Thread-18] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for reduce tasks
2019-12-27 06:17:22,267 [pool-9-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local1424814286_0002_r_000000_0
2019-12-27 06:17:22,272 [pool-9-thread-1] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2019-12-27 06:17:22,272 [pool-9-thread-1] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-12-27 06:17:22,274 [pool-9-thread-1] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
2019-12-27 06:17:22,274 [pool-9-thread-1] INFO org.apache.hadoop.mapred.ReduceTask - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle#2582aa54
2019-12-27 06:17:22,275 [pool-9-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - MergerManager: memoryLimit=652528832, maxSingleShuffleLimit=163132208, mergeThreshold=430669056, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2019-12-27 06:17:22,275 [EventFetcher for fetching Map Completion Events] INFO org.apache.hadoop.mapreduce.task.reduce.EventFetcher - attempt_local1424814286_0002_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2019-12-27 06:17:22,276 [localfetcher#2] INFO org.apache.hadoop.mapreduce.task.reduce.LocalFetcher - localfetcher#2 about to shuffle output of map attempt_local1424814286_0002_m_000000_0 decomp: 14 len: 18 to MEMORY
2019-12-27 06:17:22,277 [localfetcher#2] INFO org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput - Read 14 bytes from map-output for attempt_local1424814286_0002_m_000000_0
2019-12-27 06:17:22,277 [localfetcher#2] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - closeInMemoryFile -> map-output of size: 14, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->14
2019-12-27 06:17:22,277 [EventFetcher for fetching Map Completion Events] INFO org.apache.hadoop.mapreduce.task.reduce.EventFetcher - EventFetcher is interrupted.. Returning
2019-12-27 06:17:22,278 [Readahead Thread #3] WARN - Failed readahead on ifile
EBADF: Bad file descriptor
at$POSIX.posix_fadvise(Native Method)
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
2019-12-27 06:17:22,278 [pool-9-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied.
2019-12-27 06:17:22,280 [pool-9-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2019-12-27 06:17:22,280 [pool-9-thread-1] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
2019-12-27 06:17:22,280 [pool-9-thread-1] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 7 bytes
2019-12-27 06:17:22,281 [pool-9-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merged 1 segments, 14 bytes to disk to satisfy reduce memory limit
2019-12-27 06:17:22,281 [pool-9-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merging 1 files, 18 bytes from disk
2019-12-27 06:17:22,281 [pool-9-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merging 0 segments, 0 bytes from memory into reduce
2019-12-27 06:17:22,281 [pool-9-thread-1] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
2019-12-27 06:17:22,281 [pool-9-thread-1] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 7 bytes
2019-12-27 06:17:22,282 [pool-9-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied.
2019-12-27 06:17:22,283 [pool-9-thread-1] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2019-12-27 06:17:22,283 [pool-9-thread-1] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-12-27 06:17:22,284 [pool-9-thread-1] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128
2019-12-27 06:17:22,285 [pool-9-thread-1] WARN - SchemaTupleBackend has already been initialized
2019-12-27 06:17:22,286 [pool-9-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: C[7,4],B[6,4] C: C[7,4],B[6,4] R: C[7,4]
2019-12-27 06:17:22,287 [pool-9-thread-1] INFO org.apache.hadoop.mapred.Task - Task:attempt_local1424814286_0002_r_000000_0 is done. And is in the process of committing
2019-12-27 06:17:22,289 [pool-9-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied.
2019-12-27 06:17:22,289 [pool-9-thread-1] INFO org.apache.hadoop.mapred.Task - Task attempt_local1424814286_0002_r_000000_0 is allowed to commit now
2019-12-27 06:17:22,292 [pool-9-thread-1] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local1424814286_0002_r_000000_0' to file:/tmp/temp2029752934/tmp726323435/_temporary/0/task_local1424814286_0002_r_000000
2019-12-27 06:17:22,292 [pool-9-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2019-12-27 06:17:22,292 [pool-9-thread-1] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local1424814286_0002_r_000000_0' done.
2019-12-27 06:17:22,292 [pool-9-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local1424814286_0002_r_000000_0
2019-12-27 06:17:22,292 [Thread-18] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce task executor complete.
2019-12-27 06:17:22,460 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local1424814286_0002
2019-12-27 06:17:22,460 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases B,C
2019-12-27 06:17:22,460 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: C[7,4],B[6,4] C: C[7,4],B[6,4] R: C[7,4]
2019-12-27 06:17:22,463 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metricswith processName=JobTracker, sessionId= - already initialized
2019-12-27 06:17:22,464 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metricswith processName=JobTracker, sessionId= - already initialized
2019-12-27 06:17:22,465 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metricswith processName=JobTracker, sessionId= - already initialized
2019-12-27 06:17:22,471 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2019-12-27 06:17:22,474 [main] INFO - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.9.2 0.16.0 root 2019-12-27 06:17:20 2019-12-27 06:17:22 HASH_JOIN,GROUP_BY
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_local1289071959_0001 2 1 n/a n/a n/a n/a n/a n/a n/a n/a trans,users,users_trans HASH_JOIN
job_local1424814286_0002 1 1 n/a n/a n/a n/a n/a n/a n/a n/a B,C GROUP_BY,COMBINER file:/tmp/temp2029752934/tmp726323435,
Successfully read 5 records from: "/tmp/users.txt"
Successfully read 10 records from: "/tmp/transaction.txt"
Successfully stored 1 records in: "file:/tmp/temp2029752934/tmp726323435"
Total records written : 1
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local1289071959_0001 -> job_local1424814286_0002,
2019-12-27 06:17:22,475 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metricswith processName=JobTracker, sessionId= - already initialized
2019-12-27 06:17:22,476 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metricswith processName=JobTracker, sessionId= - already initialized
2019-12-27 06:17:22,477 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metricswith processName=JobTracker, sessionId= - already initialized
2019-12-27 06:17:22,485 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metricswith processName=JobTracker, sessionId= - already initialized
2019-12-27 06:17:22,486 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metricswith processName=JobTracker, sessionId= - already initialized
2019-12-27 06:17:22,487 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metricswith processName=JobTracker, sessionId= - already initialized
2019-12-27 06:17:22,492 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 15 time(s).
2019-12-27 06:17:22,493 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 55 time(s).
2019-12-27 06:17:22,493 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2019-12-27 06:17:22,496 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - is deprecated. Instead, use fs.defaultFS
2019-12-27 06:17:22,496 [main] WARN - SchemaTupleBackend has already been initialized
2019-12-27 06:17:22,503 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input files to process : 1
2019-12-27 06:17:22,503 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-12-27 06:17:22,541 [main] INFO org.apache.pig.Main - Pig script completed in 2 seconds and 965 milliseconds (2965 ms)
First of all: it seems that you are just starting out with Pig. It may be valuable to know that Cloudera recently decided to deprecate Pig. It will of course not cease to exist, but think twice if you are planning to pick up a new skill or implement new use cases. I would recommend looking into Hive/Spark/Impala as more future-proof alternatives.
Your job succeeds, but presumably not with the output you want. The log contains several hints about what may be wrong with data types and field names (the FIELD_DISCARDED_TYPE_CONVERSION_FAILED and ACCESSING_NON_EXISTENT_FIELD warnings), but these do not point at a specific problem in the code.
My recommendation would be to find out exactly where the problem occurs. Simply cut off the end of your code and print an intermediate result to see if you are still on track.
In the (likely) event that the problem is already in your load statement, you can still narrow it down further: first load the data, and then apply the schema.
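The two warning counts in the log can be checked with a little arithmetic. The sketch below is a simulation in Python, not Pig's actual loader. It assumes that neither input file contains the ',' delimiter, so each whole line lands in the first field. The helpers `load_row` and `to_int` are hypothetical:

```python
from typing import List, Optional, Tuple


def load_row(line: str, delimiter: str, n_fields: int) -> Tuple[List[Optional[str]], int]:
    """Split a line and pad to the schema width; return (fields, missing_count)."""
    parts = line.split(delimiter)
    missing = max(0, n_fields - len(parts))
    return parts[:n_fields] + [None] * missing, missing


def to_int(value: Optional[str]) -> Optional[int]:
    """Cast leniently: return None instead of raising on bad input."""
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


# A row from the question, which contains no commas.
fields, missing = load_row("1 1 1 300 a jumper", ",", 5)
print(fields)             # the whole line is the first field, the rest are None
print(to_int(fields[0]))  # None: the int cast of the whole line fails
print(missing)            # 4 of the 5 schema fields do not exist

# Assumed accounting: 5 user rows (4 columns) and 10 transaction rows (5 columns).
failed_casts = 5 + 10
missing_fields = 5 * (4 - 1) + 10 * (5 - 1)
print(failed_casts, missing_fields)  # 15 55
```

Under that assumption the totals match the 15 FIELD_DISCARDED_TYPE_CONVERSION_FAILED and 55 ACCESSING_NON_EXISTENT_FIELD warnings, which suggests the delimiter in the LOAD statements is the first thing to check.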
Given the data you have, the first problem is that it contains no commas, so you must load each line as a whole and split it afterwards. I used two or more spaces as the separator in the transactions file because your last column appears to be a single string containing spaces. For accuracy, I suggest using a better delimiter than spaces or tabs.
Then the GROUP BY needs to reference the relations that the data comes from.
Everything else is fine, I think, though I'm not sure about the COUNT(X).
A = LOAD '/tmp/users.txt' USING PigStorage() as (line:chararray);
USERS = FOREACH A GENERATE FLATTEN(STRSPLIT(line, '\\s+')) AS (userid:int,email:chararray,language:chararray,location:chararray);
B = LOAD '/tmp/transactions.txt' USING PigStorage() as (line:chararray);
TRANS = FOREACH B GENERATE FLATTEN(STRSPLIT(line, '\\s\\s+')) AS (id:int,product:int,userid:int,purchase:double,desc:chararray);
X = JOIN USERS BY userid RIGHT, TRANS BY userid;
X_grouped = GROUP X BY (TRANS::desc, USERS::location);
RES = FOREACH X_grouped GENERATE group as comb, COUNT(X) AS Total;
\d RES;
((a jeans,HN),1)
((a jumper,FR),1)
((a jumper,GB),1)
((a jumper,IS),1)
((a jumper,US),1)
((a lotion,US),1)
((a soapbox,HN),1)
((a sweater,HN),1)
((a adhesive,FR),1)
((a rubber chicken,FR),1)
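The choice of `'\\s\\s+'` over `'\\s+'` for the transactions file can be checked outside Pig. The following is a minimal Python sketch, not the Pig implementation; the sample line is hypothetical (it is not taken from the asker's file) and assumes columns separated by two spaces with a free-text last column.

```python
import re

# Hypothetical sample line: columns are separated by two spaces,
# and the last column is free text that contains single spaces.
line = "1  10  3  19.99  a rubber chicken"

# Splitting on one-or-more whitespace breaks the description apart.
split_any = re.split(r"\s+", line)

# Splitting on two-or-more whitespace keeps the description in one field.
split_wide = re.split(r"\s\s+", line)

print(len(split_any), split_any)
print(len(split_wide), split_wide)
```

With the single-whitespace pattern the line yields seven fields, so a five-column schema no longer lines up; with the two-or-more pattern it yields exactly five.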

Problem in Flink UI on Mesos cluster with two slave nodes

I have four physical nodes with Docker installed on each of them. I configured Mesos, Flink, ZooKeeper, Hadoop and Marathon in Docker on each one. Previously I had three nodes (one slave and two masters); I ran Flink on Marathon and its UI worked without any problems. After that, I changed the cluster to two masters and two slaves. I added this JSON file to Marathon and it ran, but the Flink UI was not shown on both slave nodes. The error is shown below.
"id": "flink",
"cmd": "/home/flink-1.7.2/bin/ -Djobmanager.heap.mb=1024 -Djobmanager.rpc.port=6123 -Drest.port=8081 -Dmesos.resourcemanager.tasks.mem=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=2 -Dparallelism.default=2 -Dmesos.resourcemanager.tasks.cpus=1",
"cpus": 1.0,
"mem": 1024,
"instances": 2
Service temporarily unavailable due to an ongoing leader election. Please refresh
I cleared the ZooKeeper contents with these commands:
/home/zookeeper-3.4.14/bin/ /var/lib/zookeeper/data/ -n 10
rm -rf /var/lib/zookeeper/data/version-2
rm /var/lib/zookeeper/data/
I also ran this command to delete the Flink contents in ZooKeeper:
delete /flink/default/leader/....
But one of the Flink UIs still has the problem.
I have configured Flink high availability like this:
high-availability: zookeeper
high-availability.storageDir: hdfs:///flink/ha/
fs.hdfs.hadoopconf: /opt/hadoop/etc/hadoop
fs.hdfs.hdfssite: /opt/hadoop/etc/hadoop/hdfs-site.xml
recovery.zookeeper.path.mesos-workers: /mesos-workers /opt/java
Because I use a Mesos cluster, I did not change anything else in flink-conf.yaml.
This is the part of the slave log that contains the error:
- Remote connection to [null] failed with
Connection refused: localhost/
2019-07-03 07:22:42,922 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink#localhost:37797] has failed, address is now gated for [50] ms.
Reason: [Association failed with [akka.tcp://flink#localhost:37797]]
Caused by: [Connection refused: localhost/]
2019-07-03 07:22:43,003 WARN akka.remote.transport.netty.NettyTransport
- Remote connection to [null] failed with
Connection refused: localhost/
2019-07-03 07:22:43,004 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink#localhost:37797]
has failed, address is now gated for [50] ms.
Reason: [Association failed with [akka.tcp://flink#localhost:37797]]
Caused by: [Connection refused: localhost/]
2019-07-03 07:22:43,072 WARN akka.remote.transport.netty.NettyTransport
- Remote connection to [null] failed with
Connection refused: localhost/
2019-07-03 07:22:43,073 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink#localhost:37797]
has failed, address is now gated for [50] ms.
Reason: [Association failed with [akka.tcp://flink#localhost:37797]]
Caused by: [Connection refused: localhost/]
2019-07-03 07:23:45,891 WARN
- Error while retrieving the leader gateway. Retrying to connect to
This is the ZooKeeper log for the node that shows the error in the Flink UI:
2019-07-03 09:43:33,425 [myid:] - INFO [main:QuorumPeerConfig#136] - Reading configuration from: /home/zookeeper-3.4.14/bin/../conf/zoo.cfg
2019-07-03 09:43:33,434 [myid:] - INFO [main:QuorumPeer$QuorumServer#185] - Resolved hostname: to address: /
2019-07-03 09:43:33,435 [myid:] - INFO [main:QuorumPeer$QuorumServer#185] - Resolved hostname: to address: /
2019-07-03 09:43:33,435 [myid:] - INFO [main:QuorumPeer$QuorumServer#185] - Resolved hostname: to address: /
2019-07-03 09:43:33,435 [myid:] - INFO [main:QuorumPeer$QuorumServer#185] - Resolved hostname: to address: /
2019-07-03 09:43:33,435 [myid:] - WARN [main:QuorumPeerConfig#354] - Non-optimial configuration, consider an odd number of servers.
2019-07-03 09:43:33,436 [myid:] - INFO [main:QuorumPeerConfig#398] - Defaulting to majority quorums
2019-07-03 09:43:33,438 [myid:3] - INFO [main:DatadirCleanupManager#78] - autopurge.snapRetainCount set to 3
2019-07-03 09:43:33,438 [myid:3] - INFO [main:DatadirCleanupManager#79] - autopurge.purgeInterval set to 0
2019-07-03 09:43:33,438 [myid:3] - INFO [main:DatadirCleanupManager#101] - Purge task is not scheduled.
2019-07-03 09:43:33,445 [myid:3] - INFO [main:QuorumPeerMain#130] - Starting quorum peer
2019-07-03 09:43:33,450 [myid:3] - INFO [main:ServerCnxnFactory#117] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2019-07-03 09:43:33,452 [myid:3] - INFO [main:NIOServerCnxnFactory#89] - binding to port
2019-07-03 09:43:33,458 [myid:3] - INFO [main:QuorumPeer#1159] - tickTime set to 2000
2019-07-03 09:43:33,458 [myid:3] - INFO [main:QuorumPeer#1205] - initLimit set to 10
2019-07-03 09:43:33,458 [myid:3] - INFO [main:QuorumPeer#1179] - minSessionTimeout set to -1
2019-07-03 09:43:33,459 [myid:3] - INFO [main:QuorumPeer#1190] - maxSessionTimeout set to -1
2019-07-03 09:43:33,464 [myid:3] - INFO [main:QuorumPeer#1470] - QuorumPeer communication is not secured!
2019-07-03 09:43:33,464 [myid:3] - INFO [main:QuorumPeer#1499] - quorum.cnxn.threads.size set to 20
2019-07-03 09:43:33,465 [myid:3] - INFO [main:QuorumPeer#669] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2019-07-03 09:43:33,519 [myid:3] - INFO [main:QuorumPeer#684] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2019-07-03 09:43:33,566 [myid:3] - INFO [ListenerThread:QuorumCnxManager$Listener#736] - My election bind port: /
2019-07-03 09:43:33,574 [myid:3] - INFO [QuorumPeer[myid=3]/] - LOOKING
2019-07-03 09:43:33,575 [myid:3] - INFO [QuorumPeer[myid=3]/] - New election. My id = 3, proposed zxid=0x0
2019-07-03 09:43:33,581 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,581 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LEADING (n.state), 1 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,581 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,582 [myid:3] - INFO [WorkerSender[myid=3]:QuorumCnxManager#347] - Have smaller server identifier, so dropping the connection: (4, 3)
2019-07-03 09:43:33,583 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LOOKING (n.state), 3 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,583 [myid:3] - INFO [WorkerSender[myid=3]:QuorumCnxManager#347] - Have smaller server identifier, so dropping the connection: (4, 3)
2019-07-03 09:43:33,583 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LEADING (n.state), 1 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,584 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LOOKING (n.state), 2 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,585 [myid:3] - INFO [/$Listener#743] - Received connection request /
2019-07-03 09:43:33,585 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), FOLLOWING (n.state), 2 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,585 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), FOLLOWING (n.state), 2 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,587 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LOOKING (n.state), 4 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,587 [myid:3] - WARN [RecvWorker:4:QuorumCnxManager$RecvWorker#1025] - Connection broken for id 4, my id = 3, error =
at org.apache.zookeeper.server.quorum.QuorumCnxManager$
2019-07-03 09:43:33,589 [myid:3] - WARN [RecvWorker:4:QuorumCnxManager$RecvWorker#1028] - Interrupting SendWorker
2019-07-03 09:43:33,588 [myid:3] - INFO [/$Listener#743] - Received connection request /
2019-07-03 09:43:33,589 [myid:3] - WARN [SendWorker:4:QuorumCnxManager$SendWorker#941] - Interrupted while waiting for message on queue
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(
at java.util.concurrent.ArrayBlockingQueue.poll(
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(
at org.apache.zookeeper.server.quorum.QuorumCnxManager$
2019-07-03 09:43:33,589 [myid:3] - WARN [SendWorker:4:QuorumCnxManager$SendWorker#951] - Send worker leaving thread
2019-07-03 09:43:33,590 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), FOLLOWING (n.state), 4 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,590 [myid:3] - INFO [QuorumPeer[myid=3]/] - FOLLOWING
2019-07-03 09:43:33,591 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), FOLLOWING (n.state), 4 (n.sid), 0x3 (n.peerEpoch) FOLLOWING (my state)
2019-07-03 09:43:33,593 [myid:3] - INFO [QuorumPeer[myid=3]/] - TCP NoDelay set to: true
2019-07-03 09:43:33,597 [myid:3] - INFO [QuorumPeer[myid=3]/] - Server environment:zookeeper.version=3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
2019-07-03 09:43:33,597 [myid:3] - INFO [QuorumPeer[myid=3]/] - Server
2019-07-03 09:43:33,597 [myid:3] - INFO [QuorumPeer[myid=3]/] - Server environment:java.version=1.8.0_191
2019-07-03 09:43:33,597 [myid:3] - INFO [QuorumPeer[myid=3]/] - Server environment:java.vendor=Oracle Corporation
2019-07-03 09:43:33,597 [myid:3] - INFO [QuorumPeer[myid=3]/] - Server environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/] - Server environment:java.class.path=/home/zookeeper-3.4.14/bin/../zookeeper-server/target/classes:/home/zookeeper-3.4.14/bin/../build/classes:/home/zookeeper-3.4.14/bin/../zookeeper-server/target/lib/*.jar:/home/zookeeper-3.4.14/bin/../build/lib/*.jar:/home/zookeeper-3.4.14/bin/../lib/slf4j-log4j12-1.7.25.jar:/home/zookeeper-3.4.14/bin/../lib/slf4j-api-1.7.25.jar:/home/zookeeper-3.4.14/bin/../lib/netty-3.10.6.Final.jar:/home/zookeeper-3.4.14/bin/../lib/log4j-1.2.17.jar:/home/zookeeper-3.4.14/bin/../lib/jline-0.9.94.jar:/home/zookeeper-3.4.14/bin/../lib/audience-annotations-0.5.0.jar:/home/zookeeper-3.4.14/bin/../zookeeper-3.4.14.jar:/home/zookeeper-3.4.14/bin/../zookeeper-server/src/main/resources/lib/*.jar:/home/zookeeper-3.4.14/bin/../conf:
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/] - Server
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/] - Server environment:java.compiler=<NA>
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/] - Server
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/] - Server environment:os.arch=amd64
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/] - Server environment:os.version=4.18.0-21-generic
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/] - Server
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/] - Server environment:user.home=/root
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/] - Server environment:user.dir=/
2019-07-03 09:43:33,599 [myid:3] - INFO [QuorumPeer[myid=3]/] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /var/lib/zookeeper/data/version-2 snapdir /var/lib/zookeeper/data/version-2
2019-07-03 09:43:33,600 [myid:3] - INFO [QuorumPeer[myid=3]/] - FOLLOWING - LEADER ELECTION TOOK - 25
2019-07-03 09:43:33,601 [myid:3] - INFO [QuorumPeer[myid=3]/$QuorumServer#185] - Resolved hostname: to address: /
2019-07-03 09:43:33,637 [myid:3] - INFO [QuorumPeer[myid=3]/] - Getting a snapshot from leader 0x300000000
2019-07-03 09:43:33,644 [myid:3] - INFO [QuorumPeer[myid=3]/] - Snapshotting: 0x300000000 to /var/lib/zookeeper/data/version-2/snapshot.300000000
2019-07-03 09:44:24,320 [myid:3] - INFO [NIOServerCxn.Factory:] - Accepted socket connection from /
2019-07-03 09:44:24,324 [myid:3] - INFO [NIOServerCxn.Factory:] - Client attempting to establish new session at /
2019-07-03 09:44:24,327 [myid:3] - WARN [QuorumPeer[myid=3]/] - Got zxid 0x300000001 expected 0x1
2019-07-03 09:44:24,327 [myid:3] - INFO [SyncThread:3:FileTxnLog#216] - Creating new log file: log.300000001
2019-07-03 09:44:24,384 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860000 with negotiated timeout 10000 for client /
2019-07-03 09:44:24,892 [myid:3] - INFO [NIOServerCxn.Factory:] - Accepted socket connection from /
2019-07-03 09:44:24,892 [myid:3] - INFO [NIOServerCxn.Factory:] - Client attempting to establish new session at /
2019-07-03 09:44:24,908 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860001 with negotiated timeout 10000 for client /
2019-07-03 09:44:26,410 [myid:3] - INFO [NIOServerCxn.Factory:] - Accepted socket connection from /
2019-07-03 09:44:26,411 [myid:3] - WARN [NIOServerCxn.Factory:] - Connection request from old client /; will be dropped if server is in r-o mode
2019-07-03 09:44:26,411 [myid:3] - INFO [NIOServerCxn.Factory:] - Client attempting to establish new session at /
2019-07-03 09:44:26,422 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860002 with negotiated timeout 10000 for client /
2019-07-03 09:45:41,553 [myid:3] - INFO [NIOServerCxn.Factory:] - Closed socket connection for client / which had sessionid 0x300393be5860001
2019-07-03 09:45:41,567 [myid:3] - INFO [NIOServerCxn.Factory:] - Closed socket connection for client / which had sessionid 0x300393be5860000
2019-07-03 09:45:41,597 [myid:3] - WARN [NIOServerCxn.Factory:] - Unable to read additional data from client sessionid 0x300393be5860002, likely client has closed socket
2019-07-03 09:45:41,597 [myid:3] - INFO [NIOServerCxn.Factory:] - Closed socket connection for client / which had sessionid 0x300393be5860002
2019-07-03 09:46:20,896 [myid:3] - INFO [NIOServerCxn.Factory:] - Accepted socket connection from /
2019-07-03 09:46:20,901 [myid:3] - INFO [NIOServerCxn.Factory:] - Client attempting to establish new session at /
2019-07-03 09:46:20,916 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860003 with negotiated timeout 40000 for client /
2019-07-03 09:46:43,827 [myid:3] - INFO [NIOServerCxn.Factory:] - Accepted socket connection from /
2019-07-03 09:46:43,830 [myid:3] - INFO [NIOServerCxn.Factory:] - Client attempting to establish new session at /
2019-07-03 09:46:43,856 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860004 with negotiated timeout 10000 for client /
2019-07-03 09:46:44,336 [myid:3] - INFO [NIOServerCxn.Factory:] - Accepted socket connection from /
2019-07-03 09:46:44,336 [myid:3] - INFO [NIOServerCxn.Factory:] - Client attempting to establish new session at /
2019-07-03 09:46:44,348 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860005 with negotiated timeout 10000 for client /
Could you please tell me how to use both Mesos slaves to run Flink?
Any help would be really appreciated.
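One detail in the ZooKeeper log above is the warning about an even number of servers: the log resolves four hostnames and shows server ids 1 to 4. With majority quorums, a fourth server does not let the ensemble survive more failures than three servers would. The arithmetic can be sketched as follows; the function names are illustrative only and are not part of ZooKeeper.

```python
def quorum_size(servers: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return servers - quorum_size(servers)


for n in (3, 4, 5):
    print(n, "servers: quorum", quorum_size(n), "tolerates", tolerated_failures(n))
```

For three and four servers the tolerated failure count is the same (one), which is what the "consider an odd number of servers" warning refers to. This is only a side observation and does not by itself explain the leader election message in the Flink UI.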

Apache storm Supervisor routinely shutting down worker

I built a topology in Apache Storm (0.9.6) with kafka-storm and ZooKeeper (3.4.6)
(ZooKeeper and a supervisor on each of 3 nodes, running 3 topologies).
I added 2 Storm and ZooKeeper nodes and changed the topology.workers setting from 3 to 5.
But after adding the 2 nodes, the Storm supervisor routinely shuts down workers. I checked with the iostat command, and read and write throughput is under 1 MB.
The supervisor log shows the following.
2016-10-19T15:07:38.904+0900 b.s.d.supervisor [INFO] Shutting down and clearing state for id ee13ada9-641e-463a-9be5-f3ed66fdb8f3. Current supervisor time: 1476857258. State: :timed-out, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1476857226, :storm-id "top3-17-1476839721", :executors #{[36 36] [6 6] [11 11] [16 16] [21 21] [26 26] [31 31] [-1 -1] [1 1]}, :port 6701}
2016-10-19T15:07:38.905+0900 b.s.d.supervisor [INFO] Shutting down b278933f-f9c7-4189-b615-1d70c7988f17:ee13ada9-641e-463a-9be5-f3ed66fdb8f3
2016-10-19T15:07:38.907+0900 b.s.util [INFO] Error when trying to kill 9306. Process is probably already dead.
2016-10-19T15:07:44.948+0900 b.s.d.supervisor [INFO] Shutting down and clearing state for id d6df820a-7c29-4bff-a606-9e8e36fafab2. Current supervisor time: 1476857264. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1476857264, :storm-id "top3-17-1476839721", :executors #{[-1 -1]}, :port 6701}
2016-10-19T15:07:44.949+0900 b.s.d.supervisor [INFO] Shutting down b278933f-f9c7-4189-b615-1d70c7988f17:d6df820a-7c29-4bff-a606-9e8e36fafab2
2016-10-19T15:07:45.954+0900 b.s.util [INFO] Error when trying to kill 11171. Process is probably already dead.
2016-10-19T15:07:45.954+0900 b.s.d.supervisor [INFO] Shut down b278933f-f9c7-4189-b615-1d70c7988f17:d6df820a-7c29-4bff-a606-9e8e36fafab2
The zookeeper.out log shows the following (the xxx IP address is another Storm/ZooKeeper node's address):
2016-09-20 02:31:06,031 [myid:5] - INFO [NIOServerCxn.Factory:] - Closed socket connection for client / which had sessionid 0x5574372bbf00004
2016-09-20 02:31:08,116 [myid:5] - WARN [NIOServerCxn.Factory:] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x5574372bbf0000a, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(
I don't know why the workers go down routinely. How can I fix it? Is something wrong with my setup?
My ZooKeeper and Storm configuration is shown below.
zoo.cfg (same on all nodes)
- "storm01"
- "storm02"
- "storm03"
- "storm04"
- "storm05"
storm.zookeeper.port: 2181
election.port:3888 "storm01"
- "storm01"
- "storm02"
- "storm03"
- "storm04"
- "storm05"
- 6700
- 6701
- 6702
- 6703
- 6704
storm.local.dir: /log/storm-data
worker.childopts: "-Xmx5120m"
topology.workers: 5
storm.log.dir: /log/storm-log
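The first supervisor log line above reports `State: :timed-out` together with two timestamps, so the age of the worker's last heartbeat can be computed directly. The sketch below does that for a shortened copy of the logged line. It assumes the supervisor compares this age with `supervisor.worker.timeout.secs`, which I believe defaults to 30 seconds in Storm 0.9.x; please verify that against your own defaults.yaml.

```python
import re

# Shortened copy of the first supervisor log line from the question.
log_line = (
    "Current supervisor time: 1476857258. State: :timed-out, "
    "Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat"
    "{:time-secs 1476857226, :storm-id \"top3-17-1476839721\", :port 6701}"
)


def heartbeat_age_secs(line: str) -> int:
    """Return supervisor time minus the worker's last heartbeat time, in seconds."""
    supervisor_time = int(re.search(r"Current supervisor time: (\d+)", line).group(1))
    heartbeat_time = int(re.search(r":time-secs (\d+)", line).group(1))
    return supervisor_time - heartbeat_time


age = heartbeat_age_secs(log_line)
print("last heartbeat was", age, "seconds old")
```

For this line the heartbeat is 32 seconds old. If the timeout is indeed 30 seconds, the worker was killed because it stopped writing heartbeats, so the worker's own log (long GC pauses, out-of-memory errors) is the next place to look.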

Table not found exception (E0729) while executing a Hive query from an Oozie workflow

select * from ufo_session_details limit 5
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="hive-wf">
<start to="hive-node"/>
<action name="hive-node">
<hive xmlns="uri:oozie:hive-action:0.2">
<ok to="end"/>
<error to="fail"/>
<kill name="fail">
<message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
<end name="end"/>
oozie.libpath = ${nameNode}/tmp/nt283s
Error Log
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001] Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher ends
stderr logs
Logging initialized using configuration in file:/opt/app/workload/hadoop/mapred/local/taskTracker/wfe/jobcache/job_201510130626_0451/attempt_201510130626_0451_m_000000_0/work/
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'ufo_session_details'
Intercepting System.exit(10001)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]
syslog logs
2015-11-03 00:26:20,599 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2015-11-03 00:26:20,902 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/distcache/8045442539840332845_326451332_1282624021/ <- /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/jobcache/job_201510130626_0451/attempt_201510130626_0451_m_000000_0/work/Script_SusRes.q
2015-11-03 00:26:20,911 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/distcache/3435440518513182209_187825668_1219418250/ <- /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/jobcache/job_201510130626_0451/attempt_201510130626_0451_m_000000_0/work/Script_SusRes.sql
2015-11-03 00:26:20,913 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/distcache/-5883507949569818012_2054276612_1203833745/ <- /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/jobcache/job_201510130626_0451/attempt_201510130626_0451_m_000000_0/work/lib
2015-11-03 00:26:20,916 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/distcache/6682880817470643170_1186359172_1225814386/ <- /opt/app/workload/hadoop/mapred/local/taskTracker/wfe/jobcache/job_201510130626_0451/attempt_201510130626_0451_m_000000_0/work/workflow_SusRes.xml
2015-11-03 00:26:21,441 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2015-11-03 00:26:21,448 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin#698cdde3
2015-11-03 00:26:21,602 INFO org.apache.hadoop.mapred.MapTask: Processing split: hdfs://
2015-11-03 00:26:21,630 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
2015-11-03 00:26:21,635 INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3]
2015-11-03 00:26:21,652 WARN Snappy native library is available
2015-11-03 00:26:21,652 INFO Snappy native library loaded
2015-11-03 00:26:21,663 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2015-11-03 00:26:22,654 INFO SessionState:
Logging initialized using configuration in file:/opt/app/workload/hadoop/mapred/local/taskTracker/wfe/jobcache/job_201510130626_0451/attempt_201510130626_0451_m_000000_0/work/
2015-11-03 00:26:22,910 INFO org.apache.hadoop.hive.ql.Driver: <PERFLOG>
2015-11-03 00:26:22,911 INFO org.apache.hadoop.hive.ql.Driver: <PERFLOG method=TimeToSubmit>
2015-11-03 00:26:22,912 INFO org.apache.hadoop.hive.ql.Driver: <PERFLOG method=compile>
2015-11-03 00:26:22,998 INFO hive.ql.parse.ParseDriver: Parsing command: select * from ufo_session_details limit 5
2015-11-03 00:26:23,618 INFO hive.ql.parse.ParseDriver: Parse Completed
2015-11-03 00:26:23,799 INFO org.apache.hadoop.hive.ql.parse.SemanticAnalyzer: Starting Semantic Analysis
2015-11-03 00:26:23,802 INFO org.apache.hadoop.hive.ql.parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
2015-11-03 00:26:23,802 INFO org.apache.hadoop.hive.ql.parse.SemanticAnalyzer: Get metadata for source tables
2015-11-03 00:26:23,990 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2015-11-03 00:26:24,031 INFO org.apache.hadoop.hive.metastore.ObjectStore: ObjectStore, initialize called
2015-11-03 00:26:24,328 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
2015-11-03 00:26:28,112 INFO org.apache.hadoop.hive.metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2015-11-03 00:26:28,169 INFO org.apache.hadoop.hive.metastore.ObjectStore: Initialized ObjectStore
2015-11-03 00:26:30,767 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: 0: get_table : db=default tbl=ufo_session_details
2015-11-03 00:26:30,768 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: ugi=wfe ip=unknown-ip-addr cmd=get_table : db=default tbl=ufo_session_details
2015-11-03 00:26:30,781 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2015-11-03 00:26:30,782 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2015-11-03 00:26:33,319 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: NoSuchObjectException(message:default.ufo_session_details table not found)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(
at com.sun.proxy.$Proxy11.get_table(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(
at com.sun.proxy.$Proxy12.getTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(
at org.apache.hadoop.hive.ql.Driver.compile(
at org.apache.hadoop.hive.ql.Driver.compile(
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(
at org.apache.hadoop.hive.cli.CliDriver.processCmd(
at org.apache.hadoop.hive.cli.CliDriver.processLine(
at org.apache.hadoop.hive.cli.CliDriver.processLine(
at org.apache.hadoop.hive.cli.CliDriver.processReader(
at org.apache.hadoop.hive.cli.CliDriver.processFile(
at org.apache.hadoop.hive.cli.CliDriver.main(
at org.apache.oozie.action.hadoop.HiveMain.runHive(
at org.apache.oozie.action.hadoop.HiveMain.main(
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at org.apache.hadoop.mapred.MapTask.runOldMapper(
at org.apache.hadoop.mapred.Child$
at Method)
at org.apache.hadoop.mapred.Child.main(
1677 [main] INFO org.apache.hadoop.hive.ql.Driver -
1679 [main] INFO org.apache.hadoop.hive.ql.Driver -
1680 [main] INFO org.apache.hadoop.hive.ql.Driver -
1771 [main] INFO hive.ql.parse.ParseDriver - Parsing command: select * from ufo_session_master limit 5
2512 [main] INFO hive.ql.parse.ParseDriver - Parse Completed
2683 [main] INFO org.apache.hadoop.hive.ql.parse.SemanticAnalyzer - Starting Semantic Analysis
2686 [main] INFO org.apache.hadoop.hive.ql.parse.SemanticAnalyzer - Completed phase 1 of Semantic Analysis
2686 [main] INFO org.apache.hadoop.hive.ql.parse.SemanticAnalyzer - Get metadata for source tables
2831 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://
2952 [main] WARN hive.metastore - Failed to connect to the MetaStore Server...
2952 [main] INFO hive.metastore - Waiting 1 seconds before next connection attempt.
3952 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://
3959 [main] WARN hive.metastore - Failed to connect to the MetaStore Server...
3960 [main] INFO hive.metastore - Waiting 1 seconds before next connection attempt.
4960 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://
4967 [main] WARN hive.metastore - Failed to connect to the MetaStore Server...
4967 [main] INFO hive.metastore - Waiting 1 seconds before next connection attempt.
5978 [main] ERROR org.apache.hadoop.hive.ql.parse.SemanticAnalyzer - org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table ufo_session_master

Graylog2 - Startup fail. Address already in use

I am trying to install graylog2. I have installed OpenJDK 7. I have also installed Elasticsearch and MongoDB using apt on Ubuntu 14.04.
I am new to both Graylog and Elasticsearch. I just want to set up a trial installation and try them out. I searched for similar questions and tried their suggestions, but none of them worked in my case.
I followed the installation instructions, but when I try to start the graylog2 server I get the following error.
2015-02-12 03:19:36,216 INFO : org.graylog2.periodical.Periodicals - Starting [org.graylog2.periodical.IndexerClusterCheckerThread] periodical in [0s], polling every [30s].
2015-02-12 03:19:36,222 INFO : org.graylog2.periodical.Periodicals - Starting [org.graylog2.periodical.GarbageCollectionWarningThread] periodical, running forever.
2015-02-12 03:19:36,225 INFO : org.graylog2.periodical.IndexerClusterCheckerThread - Indexer not fully initialized yet. Skipping periodic cluster check.
2015-02-12 03:19:36,229 INFO : org.graylog2.periodical.Periodicals - Starting [org.graylog2.periodical.ThroughputCounterManagerThread] periodical in [0s], polling every [1s].
2015-02-12 03:19:36,280 INFO : org.graylog2.periodical.Periodicals - Starting [org.graylog2.periodical.DeadLetterThread] periodical, running forever.
2015-02-12 03:19:36,295 INFO : org.graylog2.periodical.Periodicals - Starting [org.graylog2.periodical.ClusterHealthCheckThread] periodical in [0s], polling every [20s].
2015-02-12 03:19:36,299 INFO : org.graylog2.periodical.Periodicals - Starting [org.graylog2.periodical.InputCacheWorkerThread] periodical, running forever.
2015-02-12 03:19:36,334 DEBUG: org.graylog2.periodical.ClusterHealthCheckThread - No input running in cluster!
2015-02-12 03:19:36,368 DEBUG: org.graylog2.caches.DiskJournalCache - Committing output-cache (entries 0)
2015-02-12 03:19:36,383 DEBUG: org.graylog2.caches.DiskJournalCache - Committing input-cache (entries 0)
2015-02-12 03:19:36,885 ERROR: - Service IndexerSetupService [FAILED] has failed in the STARTING state.
org.elasticsearch.transport.BindTransportException: Failed to bind to [9300]
at org.elasticsearch.transport.netty.NettyTransport.doStart(
at org.elasticsearch.common.component.AbstractLifecycleComponent.start(
at org.elasticsearch.transport.TransportService.doStart(
at org.elasticsearch.common.component.AbstractLifecycleComponent.start(
at org.elasticsearch.node.internal.InternalNode.start(
at org.graylog2.initializers.IndexerSetupService.startUp(
Caused by: Failed to bind to: /
at org.elasticsearch.common.netty.bootstrap.ServerBootstrap.bind(
at org.elasticsearch.transport.netty.NettyTransport$3.onPortNumber(
at org.elasticsearch.common.transport.PortsRange.iterate(
at org.elasticsearch.transport.netty.NettyTransport.doStart(
... 8 more
Caused by: Address already in use
at Method)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
... 1 more
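The innermost cause in the trace is "Address already in use", which means another process was already bound to the port. As a rough way to confirm that before starting the server, the sketch below tries to bind the port itself and reports whether the bind fails. The function name `port_in_use` and the default host `127.0.0.1` are illustrative assumptions, not part of Graylog2 or Elasticsearch.

```python
import errno
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding host:port fails with EADDRINUSE.

    Hypothetical helper for illustration; host defaults to loopback
    as an assumption about where the services listen.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            # Any other error (e.g. permission denied) is not a port clash.
            raise
        return False


if __name__ == "__main__":
    # 9300 is the port named in the BindTransportException above.
    print("port 9300 in use:", port_in_use(9300))
```

If this reports the port as taken while Elasticsearch is running on the same machine, the standalone Elasticsearch node is the likely holder of the port.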
Elasticsearch is showing the following status:
"cluster_name" : "graylog2",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
The following are the changes I made to elasticsearch.yml graylog2
network.bind_host: false ["", MYSYS IP]
and graylog2.conf
is_master = true
password_secret = changed
root_password_sha2 = changed
elasticsearch_max_docs_per_index = 20000000
elasticsearch_shards = 1
elasticsearch_replicas = 0
elasticsearch_cluster_name = graylog2
elasticsearch_discovery_zen_ping_multicast_enabled = false
elasticsearch_discovery_zen_ping_unicast_hosts = IP_ARR:9300
mongodb_useauth = false
I tried killing the process on port 9300 and starting graylog2 again, but then I got the following error:
2015-02-12 04:01:24,976 INFO : org.elasticsearch.transport - [graylog2-server] bound_address {inet[/]}, publish_address {inet[/]}
2015-02-12 04:01:25,227 INFO : org.elasticsearch.discovery - [graylog2-server] graylog2/LGkZJDz1SoeENKj6Rr0e8w
2015-02-12 04:01:25,252 DEBUG: org.elasticsearch.cluster.service - [graylog2-server] processing [update local node]: execute
2015-02-12 04:01:25,253 DEBUG: org.elasticsearch.cluster.service - [graylog2-server] cluster state updated, version [0], source [update local node]
2015-02-12 04:01:25,259 DEBUG: org.elasticsearch.cluster.service - [graylog2-server] set local cluster state to version 0
2015-02-12 04:01:25,259 DEBUG: org.elasticsearch.cluster.service - [graylog2-server] processing [update local node]: done applying updated cluster_state (version: 0)
2015-02-12 04:01:25,325 WARN : org.elasticsearch.transport.netty - [graylog2-server] exception caught on transport layer [[id: 0x82f30fa7]], closing connection
2015-02-12 04:01:28,536 DEBUG: - [graylog2-server] no known master node, scheduling a retry
2015-02-12 04:01:28,564 DEBUG: org.elasticsearch.transport.netty - [graylog2-server] disconnected from [[graylog2-server][LGkZJDz1SoeENKj6Rr0e8w][ubuntu-greylog-9945][inet[/]]{client=true, data=false, master=false}]
2015-02-12 04:01:28,573 DEBUG: org.elasticsearch.discovery.zen - [graylog2-server] filtered ping responses: (filter_client[true], filter_data[false]) {none}
2015-02-12 04:01:28,590 WARN : org.elasticsearch.transport.netty - [graylog2-server] exception caught on transport layer [[id: 0xe27feaff]], closing connection
Can you please point out what I am doing wrong here and what I am missing?
If ES and graylog2 are running on the same server, try deleting or commenting out this line in elasticsearch.conf:
#transport.tcp.port: 9300
and adding or uncommenting this line in graylog2.conf:
elasticsearch_transport_tcp_port = 9350
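The idea behind this suggestion is that the graylog2 server starts its own Elasticsearch node, so on a shared host it needs a transport port different from the one the standalone Elasticsearch uses. A minimal sketch of checking that in a `key = value` config text follows. The helper names `read_settings` and `transport_ports_clash` are hypothetical, and the fallback port of 9300 is an assumption taken from the bind error in the question, not from the Graylog2 documentation.

```python
def read_settings(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def transport_ports_clash(graylog_conf_text, es_transport_port=9300,
                          assumed_default=9300):
    """Return True if graylog2 would use the same transport port as ES.

    assumed_default is an assumption: it mirrors the port in the
    'Failed to bind to [9300]' error when the setting is absent.
    """
    settings = read_settings(graylog_conf_text)
    port = int(settings.get("elasticsearch_transport_tcp_port",
                            assumed_default))
    return port == es_transport_port


if __name__ == "__main__":
    conf = "is_master = true\nelasticsearch_transport_tcp_port = 9350\n"
    print("ports clash:", transport_ports_clash(conf))
```

With the setting at 9350 and Elasticsearch left on 9300, the two nodes no longer compete for the same port.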
