I am currently running, on a single machine, one Mesos master, one slave, one ZooKeeper instance, and Marathon. All of them run inside Docker containers.
They all seem to be communicating with each other correctly.
NOTE: The Mesos master usually runs on port 5050 and slaves on port 5051. Port 5050 was already in use by another app on my machine, so the master runs on port 5051 and the slave on port 5052.
Through Marathon, I then try to run the basic-0 example (shown below for reference), but the job keeps failing. I used to be able to run jobs, but the failures started after I changed the way I run the Docker containers in order to remove all local IPs.
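For reference, basic-0 is the minimal example app from the Marathon docs; submitting it amounts to roughly the following (the JSON values are the standard ones from that example, and Marathon is assumed to be listening on its default port 8080):

cat > basic-0.json <<'EOF'
{
  "id": "basic-0",
  "cmd": "while [ true ] ; do echo 'Hello Marathon' ; sleep 5 ; done",
  "cpus": 0.1,
  "mem": 10.0,
  "instances": 1
}
EOF

curl -X POST -H "Content-Type: application/json" \
     -d @basic-0.json http://localhost:8080/v2/apps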
I run the Mesos containers with the following parameters:
Master:
docker run --net=host -d \
-e MESOS_NATIVE_JAVA_LIBRARY="/usr/local/lib/libmesos.so" \
-e MESOS_NATIVE_LIBRARY="/usr/local/lib/libmesos.so" \
-e MESOS_QUORUM=1 -e MESOS_LOG_DIR='/var/tmp' \
-e MESOS_WORK_DIR='/tmp' \
-e MESOS_ZK=zk://$IP:2181/mesos \
-e MESOS_PORT=5051 \
-e MESOS_ADVERTISE_PORT=5051 \
-e MESOS_ADVERTISE_IP=$IP \
-e MESOS_HOSTNAME=$IP \
-p 5051:5051 \
eurobd/mesos-master
Slave:
docker run --net=host -d \
-e MESOS_MASTER=zk://$IP:2181/mesos \
-e MESOS_LOG_DIR=/var/tmp \
-e MESOS_EXECUTOR_REGISTRATION_TIMEOUT=2mins \
-e MESOS_ISOLATOR=cgroups/cpu,cgroups/mem \
-e MESOS_CONTAINERIZERS=docker,mesos \
-e MESOS_HOSTNAME=$IP \
-e MESOS_ADVERTISE_IP=$IP \
-e MESOS_IP=$IP \
-e MESOS_ADVERTISE_PORT=5052 \
-e MESOS_PORT=5052 \
-v /run/docker.sock:/run/docker.sock \
-v /sys:/sys \
-v /proc:/host/proc:ro \
-p 5052:5052 \
eurobd/mesos-slave
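As a sanity check (this is just a common way to verify such a setup, not something taken from the logs below), the master's state endpoint can be queried to confirm that the slave registered with the advertised IP and port; with the remapped master port from above, that is roughly:

# quick check that the slave shows up in the master's view of the cluster
curl -s http://$IP:5051/master/state.json | grep -o '"hostname":"[^"]*"'

The slave should appear in the "slaves" array with hostname $IP and port 5052.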
Here are the various logs I looked through, but they didn't help me:
Executor stderr:
I0517 10:00:48.011389 125 logging.cpp:188] INFO level logging started!
I0517 10:00:48.012583 125 exec.cpp:143] Version: 0.28.0
I0517 10:00:48.013772 130 exec.cpp:472] Slave exited ... shutting down
Executor stdout: just "Shutting down", nothing more.
Slave logs (excerpt):
I0517 10:00:47.705261 11 slave.cpp:4374] Current disk usage 0.63%. Max allowed age: 6.256098991214965days
I0517 10:00:47.888465 7 slave.cpp:1361] Got assigned task basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003 for framework c0df712a-510c-47f1-84a5-0644c6393726-0000
I0517 10:00:47.889526 7 gc.cpp:83] Unscheduling '/tmp/mesos/slaves/c0df712a-510c-47f1-84a5-0644c6393726-S0/frameworks/c0df712a-510c-47f1-84a5-0644c6393726-0000' from gc
I0517 10:00:47.889643 10 gc.cpp:83] Unscheduling '/tmp/mesos/meta/slaves/c0df712a-510c-47f1-84a5-0644c6393726-S0/frameworks/c0df712a-510c-47f1-84a5-0644c6393726-0000' from gc
I0517 10:00:47.889832 7 slave.cpp:1480] Launching task basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003 for framework c0df712a-510c-47f1-84a5-0644c6393726-0000
I0517 10:00:47.890251 7 paths.cpp:528] Trying to chown '/tmp/mesos/slaves/c0df712a-510c-47f1-84a5-0644c6393726-S0/frameworks/c0df712a-510c-47f1-84a5-0644c6393726-0000/executors/basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003/runs/01392987-0e53-4175-a019-d7b2ba815287' to user 'root'
I0517 10:00:47.894273 7 slave.cpp:5367] Launching executor basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003 of framework c0df712a-510c-47f1-84a5-0644c6393726-0000 with resources cpus(*):0.1; mem(*):32 in work directory '/tmp/mesos/slaves/c0df712a-510c-47f1-84a5-0644c6393726-S0/frameworks/c0df712a-510c-47f1-84a5-0644c6393726-0000/executors/basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003/runs/01392987-0e53-4175-a019-d7b2ba815287'
I0517 10:00:47.894803 11 docker.cpp:1009] No container info found, skipping launch
I0517 10:00:47.894989 7 slave.cpp:1698] Queuing task 'basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003' for executor 'basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003' of framework c0df712a-510c-47f1-84a5-0644c6393726-0000
I0517 10:00:47.895226 6 containerizer.cpp:666] Starting container '01392987-0e53-4175-a019-d7b2ba815287' for executor 'basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003' of framework 'c0df712a-510c-47f1-84a5-0644c6393726-0000'
I0517 10:00:47.898272 13 launcher.cpp:147] Forked child with pid '124' for container '01392987-0e53-4175-a019-d7b2ba815287'
I0517 10:00:47.898432 13 containerizer.cpp:1118] Checkpointing executor's forked pid 124 to '/tmp/mesos/meta/slaves/c0df712a-510c-47f1-84a5-0644c6393726-S0/frameworks/c0df712a-510c-47f1-84a5-0644c6393726-0000/executors/basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003/runs/01392987-0e53-4175-a019-d7b2ba815287/pids/forked.pid'
I0517 10:00:49.904361 6 slave.cpp:1891] Asked to kill task basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003 of framework c0df712a-510c-47f1-84a5-0644c6393726-0000
I0517 10:00:49.904460 6 slave.cpp:3002] Handling status update TASK_KILLED (UUID: 10164116-b7eb-480f-841e-830f9301e174) for task basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003 of framework c0df712a-510c-47f1-84a5-0644c6393726-0000 from #0.0.0.0:0
I0517 10:00:49.905267 7 status_update_manager.cpp:320] Received status update TASK_KILLED (UUID: 10164116-b7eb-480f-841e-830f9301e174) for task basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003 of framework c0df712a-510c-47f1-84a5-0644c6393726-0000
I0517 10:00:49.905645 7 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_KILLED (UUID: 10164116-b7eb-480f-841e-830f9301e174) for task basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003 of framework c0df712a-510c-47f1-84a5-0644c6393726-0000
I0517 10:00:49.984380 11 slave.cpp:3400] Forwarding the update TASK_KILLED (UUID: 10164116-b7eb-480f-841e-830f9301e174) for task basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003 of framework c0df712a-510c-47f1-84a5-0644c6393726-0000 to master#MASTER_IP:5051
I0517 10:00:50.013109 7 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 10164116-b7eb-480f-841e-830f9301e174) for task basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003 of framework c0df712a-510c-47f1-84a5-0644c6393726-0000
I0517 10:00:50.013226 7 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_KILLED (UUID: 10164116-b7eb-480f-841e-830f9301e174) for task basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003 of framework c0df712a-510c-47f1-84a5-0644c6393726-0000
I0517 10:00:53.040210 7 containerizer.cpp:1608] Executor for container '01392987-0e53-4175-a019-d7b2ba815287' has exited
I0517 10:00:53.040279 7 containerizer.cpp:1392] Destroying container '01392987-0e53-4175-a019-d7b2ba815287'
I0517 10:00:53.042726 11 provisioner.cpp:306] Ignoring destroy request for unknown container 01392987-0e53-4175-a019-d7b2ba815287
I0517 10:00:53.042866 8 slave.cpp:3886] Executor 'basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003' of framework c0df712a-510c-47f1-84a5-0644c6393726-0000 terminated with signal Killed
I0517 10:00:53.042927 8 slave.cpp:3990] Cleaning up executor 'basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003' of framework c0df712a-510c-47f1-84a5-0644c6393726-0000
I0517 10:00:53.043233 9 gc.cpp:55] Scheduling '/tmp/mesos/slaves/c0df712a-510c-47f1-84a5-0644c6393726-S0/frameworks/c0df712a-510c-47f1-84a5-0644c6393726-0000/executors/basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003/runs/01392987-0e53-4175-a019-d7b2ba815287' for gc 6.99999950039407days in the future
I0517 10:00:53.043305 9 gc.cpp:55] Scheduling '/tmp/mesos/slaves/c0df712a-510c-47f1-84a5-0644c6393726-S0/frameworks/c0df712a-510c-47f1-84a5-0644c6393726-0000/executors/basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003' for gc 6.99999949956148days in the future
I0517 10:00:53.043344 9 gc.cpp:55] Scheduling '/tmp/mesos/meta/slaves/c0df712a-510c-47f1-84a5-0644c6393726-S0/frameworks/c0df712a-510c-47f1-84a5-0644c6393726-0000/executors/basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003/runs/01392987-0e53-4175-a019-d7b2ba815287' for gc 6.99999949898074days in the future
I0517 10:00:53.043372 8 slave.cpp:4078] Cleaning up framework c0df712a-510c-47f1-84a5-0644c6393726-0000
I0517 10:00:53.043376 9 gc.cpp:55] Scheduling '/tmp/mesos/meta/slaves/c0df712a-510c-47f1-84a5-0644c6393726-S0/frameworks/c0df712a-510c-47f1-84a5-0644c6393726-0000/executors/basic-0.3a8b8882-1c16-11e6-89e4-0242ac110003' for gc 6.99999949824296days in the future
I0517 10:00:53.043486 12 status_update_manager.cpp:282] Closing status update streams for framework c0df712a-510c-47f1-84a5-0644c6393726-0000
I0517 10:00:53.043511 9 gc.cpp:55] Scheduling '/tmp/mesos/slaves/c0df712a-510c-47f1-84a5-0644c6393726-S0/frameworks/c0df712a-510c-47f1-84a5-0644c6393726-0000' for gc 6.99999949670222days in the future
I0517 10:00:53.043563 9 gc.cpp:55] Scheduling '/tmp/mesos/meta/slaves/c0df712a-510c-47f1-84a5-0644c6393726-S0/frameworks/c0df712a-510c-47f1-84a5-0644c6393726-0000' for gc 6.99999949613333days in the future
I can provide more information if this isn't enough. Thanks for your help!
Related
I have a MapReduce job that processes 1.4 TB of data.
While running it, I am getting the error below.
The number of splits is 6444.
Before starting the job I set the following:
conf.set("mapreduce.map.memory.mb", "8192");
conf.set("mapreduce.reduce.memory.mb", "8192");
conf.set("mapreduce.map.java.opts.max.heap", "8192");
conf.set("mapreduce.map.java.opts", "-Xmx8192m");
conf.set("mapreduce.reduce.java.opts", "-Xmx8192m");
conf.set("mapreduce.job.heap.memory-mb.ratio", "0.8");
conf.set("mapreduce.task.timeout", "21600000");
The error:
2018-05-18 00:50:36,595 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1524473936587_2969_m_004719_3: Container [pid=11510,containerID=container_1524473936587_2969_01_004894] is running beyond physical memory limits. Current usage: 8.1 GB of 8 GB physical memory used; 8.8 GB of 16.8 GB virtual memory used. Killing container.
Dump of the process-tree for container_1524473936587_2969_01_004894 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 11560 11510 11510 11510 (java) 14960 2833 9460879360 2133706 /usr/lib/jvm/java-7-oracle-cloudera/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx8192m -Djava.io.tmpdir=/sdk/7/yarn/nm/usercache/administrator/appcache/application_1524473936587_2969/container_1524473936587_2969_01_004894/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.106.79.75 41869 attempt_1524473936587_2969_m_004719_3 4894
|- 11510 11508 11510 11510 (bash) 0 0 11497472 679 /bin/bash -c /usr/lib/jvm/java-7-oracle-cloudera/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx8192m -Djava.io.tmpdir=/sdk/7/yarn/nm/usercache/administrator/appcache/application_1524473936587_2969/container_1524473936587_2969_01_004894/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.106.79.75 41869 attempt_1524473936587_2969_m_004719_3 4894 1>/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894/stdout 2>/var/log/hadoop-yarn/container/application_1524473936587_2969/container_1524473936587_2969_01_004894/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Any help would be really appreciated!
The setting mapreduce.map.memory.mb will set the physical memory size of the container running the mapper (mapreduce.reduce.memory.mb will do the same for the reducer container).
Be sure to adjust the heap value as well. In newer versions of YARN/MRv2 the setting mapreduce.job.heap.memory-mb.ratio can be used to auto-adjust it. The default is 0.8, so 80% of whatever the container size is will be allocated as the heap. Otherwise, adjust it manually using the mapreduce.map.java.opts.max.heap and mapreduce.reduce.java.opts.max.heap settings.
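For example, to keep the container size and heap consistent here, the mapper and reducer containers could be bumped to 10 GB with an 8 GB heap (illustrative numbers only; myjob.jar and MyDriver are placeholders, and the -D generic options only take effect if the driver goes through ToolRunner/GenericOptionsParser):

hadoop jar myjob.jar MyDriver \
    -D mapreduce.map.memory.mb=10240 \
    -D mapreduce.reduce.memory.mb=10240 \
    -D mapreduce.map.java.opts=-Xmx8192m \
    -D mapreduce.reduce.java.opts=-Xmx8192m \
    <input> <output>

The -Xmx value stays at roughly 80% of the container size, in line with the default mapreduce.job.heap.memory-mb.ratio of 0.8.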
BTW, I believe that 1 GB is the default and it is quite low. I recommend reading the link below. It provides a good understanding of YARN and MR memory settings, how they relate, and how to set some baseline values based on the cluster node size (disk, memory, and cores).
Reference: http://community.cloudera.com/t5/Cloudera-Manager-Installation/ERROR-is-running-beyond-physical-memory-limits/td-p/55173
Try setting the YARN memory allocation limits:
SET yarn.scheduler.maximum-allocation-mb=16G;
SET yarn.scheduler.minimum-allocation-mb=8G;
You can look up other YARN settings here:
https://www.ibm.com/support/knowledgecenter/STXKQY_BDA_SHR/bl1bda_tuneyarn.htm
Try with: set yarn.app.mapreduce.am.resource.mb=1000;
The explanation is here:
In Spark, spark.driver.memoryOverhead is taken into account when calculating the total memory required for the driver. By default it is 0.10 of the driver memory, with a minimum of 384 MB. In your case that gives 8 GB + 8 GB * 0.1 = 9011 MB ≈ 9G.
YARN allocates memory only in increments/multiples of yarn.scheduler.minimum-allocation-mb.
When yarn.scheduler.minimum-allocation-mb=4G, it can only allocate container sizes of 4G, 8G, 12G, etc. So if something like 9G is requested, it will round up to the next multiple and allocate a 12G container for the driver.
When yarn.scheduler.minimum-allocation-mb=1G, container sizes of 8G, 9G, 10G, and so on are possible, and the nearest rounded-up size of 9G will be used in this case.
https://community.cloudera.com/t5/Support-Questions/Yarn-Container-is-running-beyond-physical-memory-limits-but/m-p/199353#M161393
I am running on a 4-node cluster.
Please suggest what size I should set for the container memory. If I set it to more than 1 GB, then exactly what size?
What are the criteria for configuring container memory?
Error:
Diagnostic Messages for this Task:
Container [pid=46465,containerID=container_1503271937182_4757_01_000032] is
running beyond physical memory limits. Current usage: 1.0 GB of 1 GB
physical memory used; 2.7 GB of 2.1 GB virtual memory used. Killing
container.
Dump of the process-tree for container_1503271937182_4757_01_000032 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES)
FULL_CMD_LINE
|- 46465 46463 46465 46465 (bash) 0 0 108654592 308 /bin/bash -c /usr/java/jdk1.8.0_121/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx820m -Djava.io.tmpdir=/data3/yarn/nm/usercache/hdfs/appcache/application_1503271937182_4757/container_1503271937182_4757_01_000032/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.104.72.113 58079 attempt_1503271937182_4757_m_000015_3 32 1>/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032/stdout
2>/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032/stderr
|- 46483 46465 46465 46465 (java) 2929 1281 2828042240 262018 /usr/java/jdk1.8.0_121/bin/java -Djava.net.preferIPv4Stack=true
-Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx820m -Djava.io.tmpdir=/data3/yarn/nm/usercache/hdfs/appcache/application_1503271937182_4757/container_1503271937182_4757_01_000032/tmp
-Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/data1/yarn/container-logs/application_1503271937182_4757/container_1503271937182_4757_01_000032
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 10.104.72.113 58079 attempt_1503271937182_4757_m_000015_3 32
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 18 Reduce: 72 Cumulative CPU: 1219.78 sec HDFS Read: 3412303867 HDFS Write: 3935714 SUCCESS
Stage-Stage-9: Map: 18 Reduce: 72 Cumulative CPU: 332.43 sec HDFS Read: 3321536722 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 25 minutes 52 seconds 210 msec
hive failed
I'm very new to Apache Mesos and was getting acquainted by following the instructions on the Mesos Getting Started page using Mesos version 1.0.0.
I followed the "Downloading Mesos" and "Building Mesos (POSIX)" instructions on Ubuntu Linux and compiled Mesos into a userspace build directory: $BUILD.
Afterwards, I attempted the "Run Python framework" example in the "Examples" section as a non-root user, using 3 terminals and a user-writable log directory: $MESOS_BASE_VAR_DIR.
Running this example gave me the following error:
Task 0 is in state TASK_FAILED
The update data did not match!
Expected: 'data with a \x00 byte'
Actual: ''
Failed to call scheduler's statusUpdate
What am I doing wrong?
Below are the commands I used in 3 separate terminals, and the abridged output I observed in each.
Terminal Inputs
Terminal 1: (Mesos Master)
cd $BUILD
./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=$MESOS_BASE_VAR_DIR/var/lib/mesos
Terminal 2: (Mesos Agent)
cd $BUILD
./bin/mesos-agent.sh --master=127.0.0.1:5050 --work_dir=$MESOS_BASE_VAR_DIR/var/lib/mesos
Terminal 3: (Task Job / Run Python Framework)
cd $BUILD
./src/examples/python/test-framework 127.0.0.1:5050
Terminal Outputs
Terminal 1: (Mesos Master)
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0810 05:55:32.932193 21522 main.cpp:263] Build: 2016-08-09 19:23:08 by me
I0810 05:55:32.932521 21522 main.cpp:264] Version: 1.0.0
I0810 05:55:32.944061 21522 main.cpp:370] Using 'HierarchicalDRF' allocator
<snip>
I0810 05:56:00.718638 21539 master.cpp:1847] The newly elected leader is master#127.0.0.1:5050 with id 10500b16-aed1-421a-bcd6-44d82874e936
I0810 05:56:00.718727 21539 master.cpp:1860] Elected as the leading master!
<snip>
I0810 05:56:13.194939 21541 master.cpp:2424] Received SUBSCRIBE call for framework 'Test Framework (Python)' at scheduler
I0810 05:56:13.195627 21541 master.cpp:2500] Subscribing framework Test Framework (Python) with checkpointing enabled and capabilities [ ]
I0810 05:56:13.200559 21543 hierarchical.cpp:271] Added framework 10500b16-aed1-421a-bcd6-44d82874e936-0000
<snip>
W0810 05:56:26.117612 21536 master.cpp:6567] Possibly orphaned completed task 3 of framework 6883bc82-edc4-4fb5-8e19-bd1937ab4509-0000 that ran on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
W0810 05:56:26.117663 21536 master.cpp:6567] Possibly orphaned completed task 2 of framework 6883bc82-edc4-4fb5-8e19-bd1937ab4509-0000 that ran on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
W0810 05:56:26.117679 21536 master.cpp:6567] Possibly orphaned completed task 1 of framework 6883bc82-edc4-4fb5-8e19-bd1937ab4509-0000 that ran on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
W0810 05:56:26.117693 21536 master.cpp:6567] Possibly orphaned completed task 0 of framework 6883bc82-edc4-4fb5-8e19-bd1937ab4509-0000 that ran on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
W0810 05:56:26.117714 21536 master.cpp:6567] Possibly orphaned completed task 3 of framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0002 that ran on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
W0810 05:56:26.117729 21536 master.cpp:6567] Possibly orphaned completed task 2 of framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0002 that ran on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
W0810 05:56:26.117743 21536 master.cpp:6567] Possibly orphaned completed task 1 of framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0002 that ran on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
W0810 05:56:26.117756 21536 master.cpp:6567] Possibly orphaned completed task 0 of framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0002 that ran on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
W0810 05:56:26.117774 21536 master.cpp:6567] Possibly orphaned completed task 3 of framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0001 that ran on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
W0810 05:56:26.117789 21536 master.cpp:6567] Possibly orphaned completed task 2 of framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0001 that ran on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
W0810 05:56:26.117804 21536 master.cpp:6567] Possibly orphaned completed task 1 of framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0001 that ran on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
W0810 05:56:26.117816 21536 master.cpp:6567] Possibly orphaned completed task 0 of framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0001 that ran on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
I0810 05:56:26.118463 21536 master.cpp:4872] Re-registered agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox) with cpus(*):4; mem(*):4950; disk(*):9.43713e+08; ports(*):[31000-32000]
I0810 05:56:26.118600 21536 master.cpp:4940] Sending updated checkpointed resources to agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
I0810 05:56:26.118614 21537 hierarchical.cpp:478] Added agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 (linuxbox) with cpus(*):4; mem(*):4950; disk(*):9.43713e+08; ports(*):[31000-32000] (allocated: )
I0810 05:56:26.121769 21536 master.cpp:5709] Sending 1 offers to framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (Test Framework (Python)) at scheduler-2c565680-3f64-4ee3-8bbb-231addd11944#192.168.2.3:60580
I0810 05:56:26.122695 21538 master.cpp:5002] Received update of agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox) with total oversubscribed resources
I0810 05:56:26.123118 21538 hierarchical.cpp:542] Agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 (linuxbox) updated with oversubscribed resources (total: cpus(*):4; mem(*):4950; disk(*):9.43713e+08; ports(*):[31000-32000], allocated: cpus(*):4; mem(*):4950; disk(*):9.43713e+08; ports(*):[31000-32000])
I0810 05:56:26.131975 21538 master.cpp:3342] Processing ACCEPT call for offers: [ 10500b16-aed1-421a-bcd6-44d82874e936-O0 ] on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox) for framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (Test Framework (Python)) at scheduler-2c565680-3f64-4ee3-8bbb-231addd11944#192.168.2.3:60580
W0810 05:56:26.135907 21537 validation.cpp:647] Executor default for task 0 uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases.
W0810 05:56:26.135973 21537 validation.cpp:659] Executor default for task 0 uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases.
I0810 05:56:26.136854 21537 master.cpp:7439] Adding task 0 with resources cpus(*):1; mem(*):128 on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 (linuxbox)
I0810 05:56:26.136960 21537 master.cpp:3831] Launching task 0 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (Test Framework (Python)) at scheduler-2c565680-3f64-4ee3-8bbb-231addd11944#192.168.2.3:60580 with resources cpus(*):1; mem(*):128 on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
W0810 05:56:26.138489 21537 validation.cpp:647] Executor default for task 1 uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases.
W0810 05:56:26.138540 21537 validation.cpp:659] Executor default for task 1 uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases.
I0810 05:56:26.138764 21537 master.cpp:7439] Adding task 1 with resources cpus(*):1; mem(*):128 on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 (linuxbox)
I0810 05:56:26.138855 21537 master.cpp:3831] Launching task 1 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (Test Framework (Python)) at scheduler-2c565680-3f64-4ee3-8bbb-231addd11944#192.168.2.3:60580 with resources cpus(*):1; mem(*):128 on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
W0810 05:56:26.139731 21537 validation.cpp:647] Executor default for task 2 uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases.
W0810 05:56:26.139770 21537 validation.cpp:659] Executor default for task 2 uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases.
I0810 05:56:26.139966 21537 master.cpp:7439] Adding task 2 with resources cpus(*):1; mem(*):128 on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 (linuxbox)
I0810 05:56:26.140054 21537 master.cpp:3831] Launching task 2 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (Test Framework (Python)) at scheduler-2c565680-3f64-4ee3-8bbb-231addd11944#192.168.2.3:60580 with resources cpus(*):1; mem(*):128 on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
W0810 05:56:26.140861 21537 validation.cpp:647] Executor default for task 3 uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases.
W0810 05:56:26.140897 21537 validation.cpp:659] Executor default for task 3 uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases.
I0810 05:56:26.141103 21537 master.cpp:7439] Adding task 3 with resources cpus(*):1; mem(*):128 on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 (linuxbox)
I0810 05:56:26.141196 21537 master.cpp:3831] Launching task 3 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (Test Framework (Python)) at scheduler-2c565680-3f64-4ee3-8bbb-231addd11944#192.168.2.3:60580 with resources cpus(*):1; mem(*):128 on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
<snip>leveldb logs</snip>
I0810 05:56:27.633244 21543 master.cpp:5249] Executor 'default' of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox): exited with status 1
I0810 05:56:27.633337 21543 master.cpp:6928] Removing executor 'default' with resources of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
I0810 05:56:27.654547 21540 master.cpp:5147] Status update TASK_FAILED (UUID: 8d5cb564-0ee1-4ade-aa2f-51980f4eb385) for task 0 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 from agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
I0810 05:56:27.654714 21540 master.cpp:5195] Forwarding status update TASK_FAILED (UUID: 8d5cb564-0ee1-4ade-aa2f-51980f4eb385) for task 0 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000
I0810 05:56:27.655238 21540 master.cpp:6833] Updating the state of task 0 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (latest state: TASK_FAILED, status update state: TASK_FAILED)
I0810 05:56:27.670207 21540 master.cpp:5147] Status update TASK_FAILED (UUID: ca90acb2-0da2-4933-ab8b-5142072a3b68) for task 1 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 from agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
I0810 05:56:27.670295 21540 master.cpp:5195] Forwarding status update TASK_FAILED (UUID: ca90acb2-0da2-4933-ab8b-5142072a3b68) for task 1 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000
I0810 05:56:27.670701 21540 master.cpp:6833] Updating the state of task 1 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (latest state: TASK_FAILED, status update state: TASK_FAILED)
I0810 05:56:27.688217 21539 master.cpp:1284] Framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (Test Framework (Python)) at scheduler-2c565680-3f64-4ee3-8bbb-231addd11944#192.168.2.3:60580 disconnected
I0810 05:56:27.688283 21539 master.cpp:2725] Disconnecting framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (Test Framework (Python)) at scheduler-2c565680-3f64-4ee3-8bbb-231addd11944#192.168.2.3:60580
I0810 05:56:27.688330 21539 master.cpp:2749] Deactivating framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (Test Framework (Python)) at scheduler-2c565680-3f64-4ee3-8bbb-231addd11944#192.168.2.3:60580
E0810 05:56:27.688390 21544 process.cpp:2105] Failed to shutdown socket with fd 9: Transport endpoint is not connected
I0810 05:56:27.688524 21539 master.cpp:1297] Giving framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (Test Framework (Python)) at scheduler-2c565680-3f64-4ee3-8bbb-231addd11944#192.168.2.3:60580 0ns to failover
I0810 05:56:27.688583 21543 hierarchical.cpp:382] Deactivated framework 10500b16-aed1-421a-bcd6-44d82874e936-0000
I0810 05:56:27.689872 21540 master.cpp:5561] Framework failover timeout, removing framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (Test Framework (Python)) at scheduler-2c565680-3f64-4ee3-8bbb-231addd11944#192.168.2.3:60580
I0810 05:56:27.689924 21540 master.cpp:6296] Removing framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (Test Framework (Python)) at scheduler-2c565680-3f64-4ee3-8bbb-231addd11944#192.168.2.3:60580
I0810 05:56:27.690450 21540 master.cpp:6833] Updating the state of task 3 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I0810 05:56:27.690906 21540 master.cpp:6899] Removing task 3 with resources cpus(*):1; mem(*):128 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
I0810 05:56:27.691269 21540 master.cpp:6833] Updating the state of task 2 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I0810 05:56:27.691507 21540 master.cpp:6899] Removing task 2 with resources cpus(*):1; mem(*):128 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
I0810 05:56:27.691673 21540 master.cpp:6833] Updating the state of task 1 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (latest state: TASK_FAILED, status update state: TASK_KILLED)
I0810 05:56:27.691707 21540 master.cpp:6899] Removing task 1 with resources cpus(*):1; mem(*):128 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
I0810 05:56:27.691895 21540 master.cpp:6833] Updating the state of task 0 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 (latest state: TASK_FAILED, status update state: TASK_KILLED)
I0810 05:56:27.691926 21540 master.cpp:6899] Removing task 0 with resources cpus(*):1; mem(*):128 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 on agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox)
W0810 05:56:27.693068 21540 master.cpp:5140] Ignoring status update TASK_FAILED (UUID: e202f922-ff82-4852-9393-02bc762edf8b) for task 2 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 from agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox) because the framework is unknown
I0810 05:56:27.693373 21541 hierarchical.cpp:333] Removed framework 10500b16-aed1-421a-bcd6-44d82874e936-0000
W0810 05:56:27.708871 21541 master.cpp:5140] Ignoring status update TASK_FAILED (UUID: e28a2332-e6ef-458e-81a8-ea8bd8edce19) for task 3 of framework 10500b16-aed1-421a-bcd6-44d82874e936-0000 from agent f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0 at slave(1)#192.168.2.3:5051 (linuxbox) because the framework is unknown
Terminal 2: (Mesos Agent)
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0810 05:55:36.992377 21562 main.cpp:243] Build: 2016-08-09 19:23:08 by me
I0810 05:55:36.992777 21562 main.cpp:244] Version: 1.0.0
I0810 05:55:37.012877 21562 containerizer.cpp:196] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
W0810 05:55:37.018576 21562 backend.cpp:75] Failed to create 'bind' backend: BindBackend requires root privileges
I0810 05:55:37.026435 21562 main.cpp:434] Starting Mesos agent
I0810 05:55:37.028482 21582 slave.cpp:198] Agent started on 1)#192.168.2.3:5051
I0810 05:55:37.028508 21582 slave.cpp:199] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="${MESOS_SRC_DIR}/mesos-1.0.0/build/src" --logbufsecs="0" --logging_level="INFO" --master="127.0.0.1:5050" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos"
I0810 05:55:37.036922 21582 slave.cpp:519] Agent resources: cpus(*):4; mem(*):4950; disk(*):9.43713e+08; ports(*):[31000-32000]
I0810 05:55:37.037050 21582 slave.cpp:527] Agent attributes: [ ]
I0810 05:55:37.037096 21582 slave.cpp:532] Agent hostname: linuxbox
I0810 05:55:37.057514 21579 state.cpp:57] Recovering state from '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/meta'
W0810 05:55:37.157933 21579 state.cpp:544] Failed to find executor libprocess pid/http marker file
I0810 05:55:37.169766 21581 slave.cpp:4870] Recovering framework 6883bc82-edc4-4fb5-8e19-bd1937ab4509-0000
I0810 05:55:37.170205 21581 slave.cpp:5798] Recovering executor 'default' of framework 6883bc82-edc4-4fb5-8e19-bd1937ab4509-0000
I0810 05:55:37.175171 21584 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/6883bc82-edc4-4fb5-8e19-bd1937ab4509-0000/executors/default/runs/919bb8b4-70aa-48e0-ae41-d48b7752b36c' for gc 6.98397945710519days in the future
I0810 05:55:37.175750 21581 slave.cpp:4281] Cleaning up framework 6883bc82-edc4-4fb5-8e19-bd1937ab4509-0000
I0810 05:55:37.175777 21584 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/meta/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/6883bc82-edc4-4fb5-8e19-bd1937ab4509-0000/executors/default/runs/919bb8b4-70aa-48e0-ae41-d48b7752b36c' for gc 6.98397945307852days in the future
I0810 05:55:37.175894 21578 status_update_manager.cpp:282] Closing status update streams for framework 6883bc82-edc4-4fb5-8e19-bd1937ab4509-0000
I0810 05:55:37.175906 21584 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/6883bc82-edc4-4fb5-8e19-bd1937ab4509-0000/executors/default' for gc 6.98397945185185days in the future
I0810 05:55:37.175981 21584 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/meta/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/6883bc82-edc4-4fb5-8e19-bd1937ab4509-0000/executors/default' for gc 6.98397945105185days in the future
I0810 05:55:37.176434 21580 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/6883bc82-edc4-4fb5-8e19-bd1937ab4509-0000' for gc 6.99999795937481days in the future
I0810 05:55:37.177531 21582 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/meta/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/6883bc82-edc4-4fb5-8e19-bd1937ab4509-0000' for gc 6.99999794667852days in the future
I0810 05:55:37.177633 21581 slave.cpp:4870] Recovering framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0002
I0810 05:55:37.177726 21581 slave.cpp:5798] Recovering executor 'default' of framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0002
I0810 05:55:37.180641 21585 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0002/executors/default/runs/e8afac0d-a20e-499b-8e0e-6c64d64f0cdb' for gc 6.71944235485926days in the future
I0810 05:55:37.180917 21585 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/meta/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0002/executors/default/runs/e8afac0d-a20e-499b-8e0e-6c64d64f0cdb' for gc 6.71944235118222days in the future
I0810 05:55:37.181032 21585 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0002/executors/default' for gc 6.71944235001185days in the future
I0810 05:55:37.181112 21585 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/meta/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0002/executors/default' for gc 6.71944234915556days in the future
I0810 05:55:37.181126 21581 slave.cpp:4281] Cleaning up framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0002
I0810 05:55:37.181241 21584 status_update_manager.cpp:282] Closing status update streams for framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0002
I0810 05:55:37.181540 21578 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0002' for gc 6.99999789952593days in the future
I0810 05:55:37.181849 21581 slave.cpp:4870] Recovering framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0001
I0810 05:55:37.181891 21584 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/meta/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0002' for gc 6.99999789593778days in the future
I0810 05:55:37.181915 21581 slave.cpp:5798] Recovering executor 'default' of framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0001
I0810 05:55:37.184018 21584 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0001/executors/default/runs/a262d43a-52d4-4469-b264-1a143a66f147' for gc 6.71397935287111days in the future
I0810 05:55:37.184191 21582 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/meta/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0001/executors/default/runs/a262d43a-52d4-4469-b264-1a143a66f147' for gc 6.71397935029037days in the future
I0810 05:55:37.184459 21582 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0001/executors/default' for gc 6.71397934912days in the future
I0810 05:55:37.184531 21581 slave.cpp:4281] Cleaning up framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0001
I0810 05:55:37.184554 21582 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/meta/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0001/executors/default' for gc 6.71397934666667days in the future
I0810 05:55:37.184633 21582 status_update_manager.cpp:282] Closing status update streams for framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0001
I0810 05:55:37.185003 21582 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0001' for gc 6.9999978598963days in the future
I0810 05:55:37.185269 21581 slave.cpp:4870] Recovering framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0000
I0810 05:55:37.185309 21582 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/meta/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0001' for gc 6.99999785635556days in the future
I0810 05:55:37.185339 21581 slave.cpp:5798] Recovering executor 'default' of framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0000
I0810 05:55:37.187397 21580 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0000/executors/default/runs/af8c10fa-1125-46f3-baa4-2ccf63fc50ae' for gc 6.70797236873778days in the future
I0810 05:55:37.187611 21584 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/meta/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0000/executors/default/runs/af8c10fa-1125-46f3-baa4-2ccf63fc50ae' for gc 6.70797236678222days in the future
I0810 05:55:37.187702 21584 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0000/executors/default' for gc 6.70797236584296days in the future
I0810 05:55:37.187732 21581 slave.cpp:4281] Cleaning up framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0000
I0810 05:55:37.187780 21584 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/meta/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0000/executors/default' for gc 6.70797236505778days in the future
I0810 05:55:37.187798 21585 status_update_manager.cpp:282] Closing status update streams for framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0000
I0810 05:55:37.188364 21582 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0000' for gc 6.99999782332148days in the future
I0810 05:55:37.188454 21582 gc.cpp:55] Scheduling '${MESOS_BASE_VAR_DIR}/tmp/var/lib/mesos/meta/slaves/f8d8dd51-7098-436e-94c8-42dfb0db9fae-S0/frameworks/f8d8dd51-7098-436e-94c8-42dfb0db9fae-0000' for gc 6.99999781941037days in the future
I0810 05:55:37.192026 21582 status_update_manager.cpp:200] Recovering status update manager
I0810 05:55:37.192095 21582 status_update_manager.cpp:208] Recovering executor 'default' of framework 6883bc82-edc4-4fb5-8e19-bd1937ab4509-0000
I0810 05:55:37.192287 21582 status_update_manager.cpp:208] Recovering executor 'default' of framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0002
I0810 05:55:37.192469 21582 status_update_manager.cpp:208] Recovering executor 'default' of framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0001
I0810 05:55:37.192637 21582 status_update_manager.cpp:208] Recovering executor 'default' of framework f8d8dd51-7098-436e-94c8-42dfb0db9fae-0000
I0810 05:55:37.205670 21583 slave.cpp:4782] Finished recovery
Terminal 3: (Task Job / Run Python Framework)
I0810 05:55:55.672643 21606 sched.cpp:226] Version: 1.0.0
I0810 05:55:55.686266 21673 sched.cpp:330] New master detected at master#127.0.0.1:5050
I0810 05:55:55.686885 21673 sched.cpp:341] No credentials provided. Attempting to register without authentication
I0810 05:56:13.201035 21680 sched.cpp:743] Framework registered with 10500b16-aed1-421a-bcd6-44d82874e936-0000
Registered with framework ID 10500b16-aed1-421a-bcd6-44d82874e936-0000
Received offer 10500b16-aed1-421a-bcd6-44d82874e936-O0 with cpus: 4.0 and mem: 4950.0
Launching task 0 using offer 10500b16-aed1-421a-bcd6-44d82874e936-O0
Launching task 1 using offer 10500b16-aed1-421a-bcd6-44d82874e936-O0
Launching task 2 using offer 10500b16-aed1-421a-bcd6-44d82874e936-O0
Launching task 3 using offer 10500b16-aed1-421a-bcd6-44d82874e936-O0
Task 0 is in state TASK_FAILED
The update data did not match!
Expected: 'data with a \x00 byte'
Actual: ''
Failed to call scheduler's statusUpdate
This confused me a lot; I finally got it to work by using sudo privileges.
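A rough sketch of what that amounts to (an assumption on my part: given the "BindBackend requires root privileges" warning in the agent log above, the agent is the most likely piece that needs the elevated privileges):

cd $BUILD
sudo ./bin/mesos-agent.sh --master=127.0.0.1:5050 --work_dir=$MESOS_BASE_VAR_DIR/var/lib/mesos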
I have a cluster of 1 master node and 2 slaves, and I'm trying to compile my application with Mesos.
Basically, here is the command that I use:
mesos-execute --name=alc1 --command="ccmake -j myapp" --master=10.11.12.13:5050
Offers are made from the slave but this compilation task keeps failing.
[root@master-node ~]# mesos-execute --name=alc1 --command="ccmake -j myapp" --master=10.11.12.13:5050
I0511 22:26:11.623016 11560 sched.cpp:222] Version: 0.28.0
I0511 22:26:11.625602 11564 sched.cpp:326] New master detected at master#10.11.12.13:5050
I0511 22:26:11.625952 11564 sched.cpp:336] No credentials provided. Attempting to register without authentication
I0511 22:26:11.627279 11564 sched.cpp:703] Framework registered with 70582e35-5d6e-4915-a919-cae61c904fd9-0139
Framework registered with 70582e35-5d6e-4915-a919-cae61c904fd9-0139
task alc1 submitted to slave 70582e35-5d6e-4915-a919-cae61c904fd9-S2
Received status update TASK_RUNNING for task alc1
Received status update TASK_FAILED for task alc1
I0511 22:26:11.759610 11567 sched.cpp:1903] Asked to stop the driver
I0511 22:26:11.759639 11567 sched.cpp:1143] Stopping framework '70582e35-5d6e-4915-a919-cae61c904fd9-0139'
On the slave node's sandbox, here are the stderr logs:
I0511 22:26:13.781070 5037 exec.cpp:143] Version: 0.28.0
I0511 22:26:13.785001 5040 exec.cpp:217] Executor registered on slave 70582e35-5d6e-4915-a919-cae61c904fd9-S2
sh: ccmake: command not found
I0511 22:26:13.892653 5042 exec.cpp:390] Executor asked to shutdown
Just to mention, commands like the following work fine and give me the expected results:
[root@master-node ~]# mesos-execute --name=alc1 --command="find / -name a" --master=10.11.12.13:5050
I0511 22:26:03.733172 11550 sched.cpp:222] Version: 0.28.0
I0511 22:26:03.736112 11554 sched.cpp:326] New master detected at master#10.11.12.13:5050
I0511 22:26:03.736383 11554 sched.cpp:336] No credentials provided. Attempting to register without authentication
I0511 22:26:03.737730 11554 sched.cpp:703] Framework registered with 70582e35-5d6e-4915-a919-cae61c904fd9-0138
Framework registered with 70582e35-5d6e-4915-a919-cae61c904fd9-0138
task alc1 submitted to slave 70582e35-5d6e-4915-a919-cae61c904fd9-S2
Received status update TASK_RUNNING for task alc1
Received status update TASK_FINISHED for task alc1
I0511 22:26:04.184813 11553 sched.cpp:1903] Asked to stop the driver
I0511 22:26:04.184844 11553 sched.cpp:1143] Stopping framework '70582e35-5d6e-4915-a919-cae61c904fd9-0138'
I don't really get what is needed to even troubleshoot this issue.
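One hedged diagnostic sketch, reusing the same mesos-execute pattern (the task name here is just a placeholder): since the executor's sh reported "ccmake: command not found", it is worth checking what that shell on the slave actually sees, for example:

mesos-execute --name=alc1-check --command="command -v ccmake || echo 'ccmake not on PATH'" --master=10.11.12.13:5050

If the command is missing, ccmake (typically shipped with the cmake packages) would need to be installed on every slave that might receive the offer, or the task would need to fetch its own toolchain.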
Version: spring-xd-1.0.1
Distributed mode: yarn
Hadoop version: cdh5
I have modified config/servers.yml to point to the right applicationDir, zookeeper, hdfs, resourcemanager, redis, and mysqldb.
However, after the push, when I start the admin it is killed by YARN after some time.
I do not understand why the container would consume 31 GB of memory.
Please point me in the right direction to debug this problem. Also, how do I increase the log level?
The following error is observed in the logs:
Got ContainerStatus=[container_id { app_attempt_id { application_id { id: 432 cluster_timestamp: 1415816376410 } attemptId: 1 } id: 2 } state: C_COMPLETE diagnostics: "Container [pid=19374,containerID=container_1415816376410_0432_01_000002] is running beyond physical memory limits. Current usage: 1.2 GB of 1 GB physical memory used; 31.7 GB of 2.1 GB virtual memory used. Killing container.\nDump of the process-tree for container_1415816376410_0432_01_000002 :\n\t|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE\n\t|- 19381 19374 19374 19374 (java) 3903 121 33911242752 303743 /usr/java/jdk1.7.0_45-cloudera/bin/java -DxdHomeDir=./spring-xd-yarn-1.0.1.RELEASE.zip -Dxd.module.config.location=file:./modules-config.zip/ -Dspring.application.name=admin -Dspring.config.location=./servers.yml org.springframework.xd.dirt.server.AdminServerApplication \n\t|- 19374 24125 19374 19374 (bash) 0 0 110804992 331 /bin/bash -c /usr/java/jdk1.7.0_45-cloudera/bin/java -DxdHomeDir=./spring-xd-yarn-1.0.1.RELEASE.zip -Dxd.module.config.location=file:./modules-config.zip/ -Dspring.application.name=admin -Dspring.config.location=./servers.yml org.springframework.xd.dirt.server.AdminServerApplication 1>/var/log/hadoop-yarn/container/application_1415816376410_0432/container_1415816376410_0432_01_000002/Container.stdout 2>/var/log/hadoop-yarn/container/application_1415816376410_0432/container_1415816376410_0432_01_000002/Container.stderr \n\nContainer killed on request. Exit code is 143\nContainer exited with a non-zero exit code 143\n" exit_status: 143
Yes, with the current versions 1.1.0/1.1.1 you don't need to run the admin explicitly. The containers and the admin will be instantiated by YARN when you submit the application.