I already changed my web app's port from 3030 to 8081 (the Elastic Beanstalk default), but I still get a 502 Bad Gateway error.
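For what it's worth, a 502 from the Elastic Beanstalk nginx proxy usually means nothing answered on the port it forwards to, so a quick sanity check from the instance looks like this (a sketch; it assumes SSH access and that the proxy forwards to 8081):

sudo netstat -tlnp | grep 8081
curl -sS -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8081/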
Here is my log file:
[2016-03-26T03:26:57.709Z] DEBUG [12162] : Reading config file: /etc/elasticbeanstalk/.aws-eb-stack.properties
[2016-03-26T03:26:57.709Z] DEBUG [12162] : Checking if the command processor should execute...
[2016-03-26T03:26:57.711Z] DEBUG [12162] : Checking whether the command is applicable to instance (i-525a308a)..
[2016-03-26T03:26:57.711Z] INFO [12162] : Command is applicable to this instance (i-525a308a)..
[2016-03-26T03:26:57.711Z] DEBUG [12162] : Checking if the received command stage is valid..
[2016-03-26T03:26:57.711Z] INFO [12162] : No stage_num in command. Valid stage..
[2016-03-26T03:26:57.711Z] INFO [12162] : Received command CMD-TailLogs: {"execution_data":"{\"aws_access_key_id\":\"ASIAJDXEY3WNMSVE3VPA\",\"signature\":\"+dKY04cX3l4Yd443BItdPVBn6Zc=\",\"security_token\":\"AQoDYXdzEG0a8AJjWGaSl9U1L1NRB5WqSKl198DCqFWu7qQ0veWFkdmJlQwhpIEGZrr41GOHTXgylzOx1aBAZPTp3lTU81YtRqDy4JDS9zysHCn2+6vNv9M1k\\\/ztyanXbzOdZB2ZmwKd9pnj5XwN1wcGe88YACQO6P3ZF7sIsuMBkFL\\\/xz+aIgSL\\\/v3hXdXkRUHlLTgZMj2ZEBoVvOeXYp2c1w6kzHONT1DGLwq1IPjlubbGKdAia2pUGixKv7RNMMJc1VuaoUSlW4+tmuFvbEjSpjMoA91pRGmvBJpp7gkl0fsnFY6uOC+87hlENXvqxszgQt9FhyzQY3dRUeIsCG1HIUm33nZciJzsDMyP\\\/M4ZqaG+cZ5YFCGvJXDtVNLTilpC5OYpGZZD4Q\\\/F0kXmCZ20\\\/ofqFfDPQD92TG64X+JOhREqIFquewtDk9psCJoYbZ4ODDWwTXcLVa1VK+Kp7bXtiaJ6dicTYCU9lBUiNt2UlvnUOjXn+fE\\\/LiCn49e3BQ==\",\"policy\":\"eyJleHBpcmF0aW9uIjoiMjAxNi0wMy0yNlQwMzo1Njo0OC4zNDVaIiwiY29uZGl0aW9ucyI6W1sic3RhcnRzLXdpdGgiLCIkeC1hbXotbWV0YS10aW1lX3N0YW1wIiwiIl0sWyJzdGFydHMtd2l0aCIsIiR4LWFtei1tZXRhLXB1Ymxpc2hfbWVjaGFuaXNtIiwiIl0sWyJzdGFydHMtd2l0aCIsIiRrZXkiLCJyZXNvdXJjZXNcL2Vudmlyb25tZW50c1wvbG9nc1wvIl0sWyJzdGFydHMtd2l0aCIsIiR4LWFtei1tZXRhLWJhdGNoX2lkIiwiIl0sWyJzdGFydHMtd2l0aCIsIiR4LWFtei1tZXRhLWZpbGVfbmFtZSIsIiJdLFsic3RhcnRzLXdpdGgiLCIkeC1hbXotc2VjdXJpdHktdG9rZW4iLCIiXSxbInN0YXJ0cy13aXRoIiwiJENvbnRlbnQtVHlwZSIsIiJdLFsiZXEiLCIkYnVja2V0IiwiZWxhc3RpY2JlYW5zdGFsay11cy13ZXN0LTItOTk2Njk5MTA4MTI4Il0sWyJlcSIsIiRhY2wiLCJwcml2YXRlIl1dfQ==\"}","instance_ids":["i-525a308a"],"data":"92b9ea79-f302-11e5-bc66-9761883272e3","command_name":"CMD-TailLogs","api_version":"1.0","resource_name":"AWSEBAutoScalingGroup","request_id":"92b9ea79-f302-11e5-bc66-9761883272e3","command_timeout":"600"}
[2016-03-26T03:26:57.711Z] INFO [12162] : Command processor should execute command.
[2016-03-26T03:26:57.711Z] DEBUG [12162] : Storing current stage..
[2016-03-26T03:26:57.711Z] DEBUG [12162] : Stage_num does not exist. Not saving null stage. Returning..
[2016-03-26T03:26:57.712Z] INFO [12162] : Executing CMD-TailLogs - stage
[2016-03-26T03:26:57.712Z] INFO [12162] : Executing command: CMD-TailLogs...
[2016-03-26T03:26:57.716Z] DEBUG [12162] : Reading config file: /etc/elasticbeanstalk/.aws-eb-stack.properties
[2016-03-26T03:26:57.717Z] DEBUG [12162] : Refreshing metadata..
[2016-03-26T03:27:05.130Z] DEBUG [12162] : Refreshed environment metadata.
[2016-03-26T03:27:05.130Z] DEBUG [12162] : Retrieving metadata for key: AWS::ElasticBeanstalk::Ext||_ContainerConfigFileContent||commands..
[2016-03-26T03:27:05.132Z] DEBUG [12162] : Retrieving metadata for key: AWS::ElasticBeanstalk::Ext||_API||_Commands..
[2016-03-26T03:27:05.137Z] INFO [12162] : Found enabled addons: ["logpublish"].
[2016-03-26T03:27:05.368Z] INFO [12162] : Updating Command definition of addon logpublish.
[2016-03-26T03:27:05.368Z] DEBUG [12162] : Loaded definition of Command CMD-TailLogs.
[2016-03-26T03:27:05.368Z] INFO [12162] : Executing command CMD-TailLogs activities...
[2016-03-26T03:27:05.368Z] DEBUG [12162] : Setting environment variables..
[2016-03-26T03:27:05.368Z] INFO [12162] : Running AddonsBefore for command CMD-TailLogs...
[2016-03-26T03:27:05.369Z] DEBUG [12162] : Running stages of Command CMD-TailLogs from stage 0 to stage 0...
[2016-03-26T03:27:05.369Z] INFO [12162] : Running stage 0 of command CMD-TailLogs...
[2016-03-26T03:27:05.369Z] DEBUG [12162] : Loaded 1 actions for stage 0.
[2016-03-26T03:27:05.369Z] INFO [12162] : Running 1 of 1 actions: TailLogs...
I think it has something to do with stage_num, but I have no idea how to resolve this.
The problem was that my code wasn't actually building locally. I fixed the code and uploaded it again.
I am receiving an UnknownHostException when running simple code with Spark on Mesos + Hadoop. On the first run the job throws an UnknownHostException and fails, but on the second run it finishes.
The same test code runs fine on Spark 1.6.0 + YARN + Hadoop.
Testing code
file = sc.textFile("hdfs://cluster1/user/root/readme.txt")
file.count()
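To double-check whether the driver even sees the HA nameservice definition, the Hadoop configuration can be read through the SparkContext (a debugging sketch only; it goes through PySpark's internal _jsc handle, and dfs.nameservices is the standard HDFS HA property from hdfs-site.xml):

sc._jsc.hadoopConfiguration().get("dfs.nameservices")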
The first run fails (it cannot resolve the Hadoop cluster name), but the second run succeeds. Here is what happens:
root@hadoopnn1 /opt/spark-1.6.0 > ./bin/pyspark --master mesos://zk://hadoopnn1.nogle.com:2181,hadoopnn2.nogle.com:2181,hadoopslave1.nogle.com:2181/mesos
Python 3.5.1 (default, Dec 8 2015, 10:40:49)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark-1.6.0/lib/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/01/26 11:52:36 INFO spark.SparkContext: Running Spark version 1.6.0
I0126 11:52:36.671980 10758 slave.cpp:3896] Framework 53fdde51-729b-4aa0-b0b1-2fc93b59de61-0002 seems to have exited. Ignoring shutdown timeout for executor '0'
16/01/26 11:52:36 INFO spark.SecurityManager: Changing view acls to: root
16/01/26 11:52:36 INFO spark.SecurityManager: Changing modify acls to: root
16/01/26 11:52:36 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/01/26 11:52:37 INFO util.Utils: Successfully started service 'sparkDriver' on port 37732.
16/01/26 11:52:37 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/01/26 11:52:37 INFO Remoting: Starting remoting
16/01/26 11:52:37 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.1.30.112:45752]
16/01/26 11:52:37 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 45752.
16/01/26 11:52:37 INFO spark.SparkEnv: Registering MapOutputTracker
16/01/26 11:52:37 INFO spark.SparkEnv: Registering BlockManagerMaster
16/01/26 11:52:37 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-46550571-e2c2-4c88-b0ce-78fc159fd5d8
16/01/26 11:52:37 INFO storage.MemoryStore: MemoryStore started with capacity 511.5 MB
16/01/26 11:52:37 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/01/26 11:52:37 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/01/26 11:52:37 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/01/26 11:52:37 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/01/26 11:52:37 INFO ui.SparkUI: Started SparkUI at http://10.1.30.112:4040
2016-01-26 11:52:37,964:11756(0x7fbce21fc700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2016-01-26 11:52:37,964:11756(0x7fbce21fc700):ZOO_INFO@log_env@716: Client environment:host.name=hadoopnn1
2016-01-26 11:52:37,964:11756(0x7fbce21fc700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2016-01-26 11:52:37,964:11756(0x7fbce21fc700):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.32-504.el6.x86_64
2016-01-26 11:52:37,964:11756(0x7fbce21fc700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Wed Oct 15 04:27:16 UTC 2014
2016-01-26 11:52:37,964:11756(0x7fbce21fc700):ZOO_INFO@log_env@733: Client environment:user.name=root
2016-01-26 11:52:37,964:11756(0x7fbce21fc700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2016-01-26 11:52:37,964:11756(0x7fbce21fc700):ZOO_INFO@log_env@753: Client environment:user.dir=/opt/spark-1.6.0
2016-01-26 11:52:37,964:11756(0x7fbce21fc700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=hadoopnn1.nogle.com:2181,hadoopnn2.nogle.com:2181,hadoopslave1.nogle.com:2181 sessionTimeout=10000 watcher=0x32b1d77c40 sessionId=0 sessionPasswd=<null> context=0x7fbd4c0017a0 flags=0
I0126 11:52:37.964711 11846 sched.cpp:166] Version: 0.26.0
2016-01-26 11:52:37,966:11756(0x7fbcde5f6700):ZOO_INFO@check_events@1703: initiated connection to server [10.1.30.113:2181]
2016-01-26 11:52:38,165:11756(0x7fbcde5f6700):ZOO_INFO@check_events@1750: session establishment complete on server [10.1.30.113:2181], sessionId=0x252529a8a57001e, negotiated timeout=10000
I0126 11:52:38.166128 11837 group.cpp:331] Group process (group(1)@10.1.30.112:57650) connected to ZooKeeper
I0126 11:52:38.166153 11837 group.cpp:805] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0126 11:52:38.166162 11837 group.cpp:403] Trying to create path '/mesos' in ZooKeeper
I0126 11:52:38.166968 11839 detector.cpp:156] Detected a new leader: (id='44')
I0126 11:52:38.167044 11842 group.cpp:674] Trying to get '/mesos/json.info_0000000044' in ZooKeeper
I0126 11:52:38.167554 11840 detector.cpp:482] A new leading master (UPID=master@10.1.30.112:5050) is detected
I0126 11:52:38.167595 11837 sched.cpp:264] New master detected at master@10.1.30.112:5050
I0126 11:52:38.167809 11837 sched.cpp:274] No credentials provided. Attempting to register without authentication
I0126 11:52:38.168429 11840 sched.cpp:643] Framework registered with 53fdde51-729b-4aa0-b0b1-2fc93b59de61-0003
16/01/26 11:52:38 INFO mesos.CoarseMesosSchedulerBackend: Registered as framework ID 53fdde51-729b-4aa0-b0b1-2fc93b59de61-0003
16/01/26 11:52:38 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35540.
16/01/26 11:52:38 INFO netty.NettyBlockTransferService: Server created on 35540
16/01/26 11:52:38 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/01/26 11:52:38 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.1.30.112:35540 with 511.5 MB RAM, BlockManagerId(driver, 10.1.30.112, 35540)
16/01/26 11:52:38 INFO storage.BlockManagerMaster: Registered BlockManager
I0126 11:52:38.215968 10751 slave.cpp:1294] Got assigned task 0 for framework 53fdde51-729b-4aa0-b0b1-2fc93b59de61-0003
I0126 11:52:38.216191 10751 slave.cpp:1410] Launching task 0 for framework 53fdde51-729b-4aa0-b0b1-2fc93b59de61-0003
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/
Using Python version 3.5.1 (default, Dec 8 2015 10:40:49)
SparkContext available as sc, HiveContext available as sqlContext.
>>> file = sc.textFile("hdfs://hadoopcluster1/user/root/readme.txt")
file.count()
16/01/26 11:52:40 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 169.9 KB, free 169.9 KB)
16/01/26 11:52:40 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 15.3 KB, free 185.2 KB)
16/01/26 11:52:40 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.1.30.112:35540 (size: 15.3 KB, free: 511.5 MB)
16/01/26 11:52:40 INFO spark.SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2
>>> 16/01/26 11:52:40 INFO mesos.CoarseMesosSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (hadoopnn1.nogle.com:39627) with ID 59f03bb4-1760-412f-a2ee-fb98d21ad6af-S3
16/01/26 11:52:40 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoopnn1.nogle.com:43934 with 511.5 MB RAM, BlockManagerId(59f03bb4-1760-412f-a2ee-fb98d21ad6af-S3, hadoopnn1.nogle.com, 43934)
16/01/26 11:52:43 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
16/01/26 11:52:43 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev ee66447eca13aaf50524c5266e409796640134a8]
16/01/26 11:52:43 INFO mapred.FileInputFormat: Total input paths to process : 1
..................
..................
16/01/26 11:52:44 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, hadoopnn1.nogle.com): java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoopcluster1
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:312)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:665)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:601)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:436)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:409)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:212)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: hadoopcluster1
... 38 more
>>> file = sc.textFile("hdfs://hadoopcluster1/user/root/readme.txt")
file.count()
16/01/26 11:52:46 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 91.2 KB, free 285.8 KB)
16/01/26 11:52:46 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 21.3 KB, free 307.0 KB)
16/01/26 11:52:46 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.1.30.112:35540 (size: 21.3 KB, free: 511.5 MB)
16/01/26 11:52:46 INFO spark.SparkContext: Created broadcast 2 from textFile at NativeMethodAccessorImpl.java:-2
>>>
16/01/26 11:52:46 INFO mapred.FileInputFormat: Total input paths to process : 1
16/01/26 11:52:46 INFO spark.SparkContext: Starting job: count at <stdin>:1
16/01/26 11:52:46 INFO scheduler.DAGScheduler: Got job 1 (count at <stdin>:1) with 2 output partitions
16/01/26 11:52:46 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (count at <stdin>:1)
16/01/26 11:52:46 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/01/26 11:52:46 INFO scheduler.DAGScheduler: Missing parents: List()
16/01/26 11:52:46 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (PythonRDD[5] at count at <stdin>:1), which has no missing parents
16/01/26 11:52:46 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 5.7 KB, free 312.7 KB)
16/01/26 11:52:46 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 3.7 KB, free 316.4 KB)
16/01/26 11:52:46 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.1.30.112:35540 (size: 3.7 KB, free: 511.5 MB)
16/01/26 11:52:46 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
16/01/26 11:52:46 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (PythonRDD[5] at count at <stdin>:1)
16/01/26 11:52:46 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
16/01/26 11:52:46 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 8, hadoopnn1.nogle.com, partition 0,NODE_LOCAL, 2144 bytes)
16/01/26 11:52:46 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 9, hadoopnn1.nogle.com, partition 1,NODE_LOCAL, 2144 bytes)
16/01/26 11:52:46 INFO spark.ContextCleaner: Cleaned accumulator 2
16/01/26 11:52:46 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on 10.1.30.112:35540 in memory (size: 3.7 KB, free: 511.5 MB)
16/01/26 11:52:46 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on hadoopnn1.nogle.com:43934 (size: 3.7 KB, free: 511.5 MB)
16/01/26 11:52:46 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on hadoopnn1.nogle.com:43934 in memory (size: 3.7 KB, free: 511.5 MB)
16/01/26 11:52:46 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on hadoopnn1.nogle.com:43934 (size: 21.3 KB, free: 511.5 MB)
16/01/26 11:52:47 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 8) in 958 ms on hadoopnn1.nogle.com (1/2)
16/01/26 11:52:47 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 9) in 958 ms on hadoopnn1.nogle.com (2/2)
16/01/26 11:52:47 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
16/01/26 11:52:47 INFO scheduler.DAGScheduler: ResultStage 1 (count at <stdin>:1) finished in 0.961 s
16/01/26 11:52:47 INFO scheduler.DAGScheduler: Job 1 finished: count at <stdin>:1, took 1.006764 s
132
I tried adding export HADOOP_CONF_DIR to spark-env.sh, but the result is the same:
export HADOOP_CONF_DIR="/etc/hadoop/conf"
Something is clearly going wrong, so I searched Stack Overflow and found a similar question (linked below). I added the workaround config to spark-defaults.conf, and it works:
spark.files file:///etc/hadoop/conf/hdfs-site.xml,file:///etc/hadoop/conf/core-site.xml
UnknownHostException with Mesos + Spark and custom Jar
So I am using this workaround to deal with it, but I still want to ask: is there a better solution for handling this?
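For reference, roughly the same workaround can be applied per session rather than in spark-defaults (a sketch; --files just populates the same spark.files setting, and the paths assume the standard /etc/hadoop/conf layout used above):

./bin/pyspark \
  --master mesos://zk://hadoopnn1.nogle.com:2181,hadoopnn2.nogle.com:2181,hadoopslave1.nogle.com:2181/mesos \
  --files /etc/hadoop/conf/hdfs-site.xml,/etc/hadoop/conf/core-site.xml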
This is a common issue, but I am still not able to resolve it.
My wordcount (wc) MapReduce job is stuck at map 0% and reduce 0%.
Below is the NodeManager log:
----------------------------------------------------- Log start -----------------------------------------
2015-11-23 10:15:18,789 INFO org.apache.spark.network.yarn.YarnShuffleService: Started YARN shuffle service for Spark on port 7337. Authentication is not enabled. Registered executor file is /yarn/nm/registeredExecutors.ldb
2015-11-23 10:15:18,804 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@6b2d2828
2015-11-23 10:15:18,804 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Using ResourceCalculatorProcessTree : null
2015-11-23 10:15:18,804 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Physical memory check enabled: true
2015-11-23 10:15:18,804 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Virtual memory check enabled: false
2015-11-23 10:15:18,815 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for null: physical-memory=1024 virtual-memory=2151 virtual-cores=2
2015-11-23 10:15:18,816 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit
2015-11-23 10:15:18,819 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2015-11-23 10:15:18,820 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2015-11-23 10:15:18,821 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2015-11-23 10:15:18,821 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
--------------------------------Log End ----------------------------------------
I am not sure whether the line below is the main cause:
ContainersMonitorImpl: Using ResourceCalculatorProcessTree : null
I updated the resource memory (MB) settings in yarn-site.xml and mapred-site.xml, but I am still getting the same error.
Any suggestions are most appreciated. Also, can anyone let me know how to do a pseudo-distributed cluster refresh?
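Two standard YARN CLI checks that show whether the NodeManager is registered and whether the job ever leaves the ACCEPTED state (generic commands, nothing cluster-specific):

yarn node -list          # are any NodeManagers registered and RUNNING?
yarn application -list   # is the job stuck in ACCEPTED, i.e. waiting for resources?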
I'm getting the error below while submitting a spark-submit job. Can anyone please suggest how to resolve this issue?
15/02/18 12:06:17 INFO network.ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@5173169
java.nio.channels.CancelledKeyException
at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
15/02/18 12:06:17 ERROR network.ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(bkcttplpd037.verizon.com,39010) not found
15/02/18 12:06:17 INFO network.ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@7a73a542
15/02/18 12:06:17 INFO network.ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@7a73a542
java.nio.channels.CancelledKeyException
at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:310)
at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
15/02/18 12:06:18 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/02/18 12:06:18 INFO network.ConnectionManager: Selector thread was interrupted!
15/02/18 12:06:18 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(abc02.com,49740)
15/02/18 12:06:18 ERROR network.ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(abc01.com,49740) not found
15/02/18 12:06:18 WARN network.ConnectionManager: All connections not cleaned up
15/02/18 12:06:18 INFO network.ConnectionManager: ConnectionManager stopped
15/02/18 12:06:18 INFO storage.MemoryStore: MemoryStore cleared
15/02/18 12:06:18 INFO storage.BlockManager: BlockManager stopped
15/02/18 12:06:18 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/02/18 12:06:18 INFO spark.SparkContext: Successfully stopped SparkContext
15/02/18 12:06:18 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/02/18 12:06:18 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
I was running the "Generating vectors from documents" sample from the book "Mahout in Action" under Cygwin on Windows.
Hadoop is running only on the local machine.
Below is the command I ran:
$ bin/mahout seq2sparse -i reuters-seqfiles/ -o reuters-vectors -ow
But it throws the java.io.IOException shown below. Does anyone know what causes this problem? Thanks in advance!
Running on hadoop, using HADOOP_HOME=my_hadoop_path
HADOOP_CONF_DIR=my_hadoop_conf_path
13/05/13 18:38:03 WARN driver.MahoutDriver: No seq2sparse.props found on classpath, will use command-line arguments only
13/05/13 18:38:03 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
13/05/13 18:38:03 INFO common.HadoopUtil: Deleting reuters-vectors
13/05/13 18:38:04 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
13/05/13 18:38:04 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
13/05/13 18:38:04 INFO input.FileInputFormat: Total input paths to process : 2
13/05/13 18:38:04 INFO mapred.JobClient: Running job: job_201305131836_0001
13/05/13 18:38:05 INFO mapred.JobClient: map 0% reduce 0%
13/05/13 18:38:15 INFO mapred.JobClient: Task Id : attempt_201305131836_0001_m_000003_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
13/05/13 18:38:15 WARN mapred.JobClient: Error reading task outputhttp://namenode_address:50060/tasklog?plaintext=true&taskid=attempt_201305131836_0001_m_000003_0&filter=stdout
13/05/13 18:38:15 WARN mapred.JobClient: Error reading task outputhttp://namenode_address:50060/tasklog?plaintext=true&taskid=attempt_201305131836_0001_m_000003_0&filter=stderr
13/05/13 18:38:21 INFO mapred.JobClient: Task Id : attempt_201305131836_0001_m_000003_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
Below is the TaskTracker log:
INFO org.apache.hadoop.mapred.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
INFO org.apache.hadoop.mapred.TaskTracker: ProcessTree implementation is missing on this system. TaskMemoryManager is disabled.
INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201305141049_0001_m_000002_0 task's state:UNASSIGNED
INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201305141049_0001_m_000002_0
INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201305141049_0001_m_000002_0
INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201305141049_0001_m_1036671648
INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201305141049_0001_m_1036671648 spawned.
INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201305141049_0001_m_1036671648 exited. Number of tasks it ran: 0
WARN org.apache.hadoop.mapred.TaskRunner: attempt_201305141049_0001_m_000002_0 Child Error
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
INFO org.apache.hadoop.mapred.TaskRunner: attempt_201305141049_0001_m_000002_0 done; removing files.
INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
Looking at the log you posted, it seems you haven't set HADOOP_HOME and HADOOP_CONF_DIR; the log still shows the placeholders my_hadoop_path and my_hadoop_conf_path.
You need to point them at the actual directories, e.g. HADOOP_HOME=/usr/lib/hadoop and HADOOP_CONF_DIR=/usr/lib/hadoop/conf.
If that is not the case, run bin/mahout with no arguments and check whether seq2sparse appears in the list. This line suggests it was not found on the classpath: driver.MahoutDriver: No seq2sparse.props found on classpath, will use command-line arguments only.
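In other words, something along these lines before rerunning the command from the question (a sketch; the /usr/lib/hadoop paths are just an example layout, adjust them to wherever Hadoop is actually installed):

export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=/usr/lib/hadoop/conf
bin/mahout seq2sparse -i reuters-seqfiles/ -o reuters-vectors -ow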