How to disable apache mesos memory/disk isolation? - mesos

I am checking out Apache Aurora (0.16.0) and Apache Mesos (1.1.0) with Docker containers. Here is an example Aurora job definition:
process_nginx = Process(
  name='nginx',
  cmdline=textwrap.dedent(r'''
    exec /path_to/nginx -g "daemon off; pid /run/nginx.pid; error_log stderr notice;"
  '''),
  min_duration=3,
  daemon=True,
)

task_nginx = Task(
  name='nginx',
  processes=[process_nginx,],
  resources=Resources(
    cpu=0.1,
    ram=20*MB,
    disk=50*MB,
  ),
  finalization_wait=14,
)

job_nginx = Job(
  cluster='x',
  role='root',
  name='nginx',
  instances=6,
  service=True,
  task=task_nginx,
  priority=1,
  #tier='preferred',
  constraints={
    'X_HOST_MACHINE_ID': 'limit:2',
    'HOST_TYPE.FRONTEND': 'true',
  },
  update_config=UpdateConfig(
    batch_size=1,
    watch_secs=29,
    rollback_on_failure=True,
  ),
  container=Docker(
    image='my_nginx_docker_image_name',
    parameters=[
      {'name': 'network', 'value': 'host'},
      {'name': 'log-driver', 'value': 'journald'},
      {'name': 'log-opt', 'value': 'tag=nginx'},
      {'name': 'oom-score-adj', 'value': '-500'},
      {'name': 'memory-swappiness', 'value': '1'},
    ],
  ),
)
But specifying disk and RAM limits is a hassle, so I want to disable both.
problem 1
I thought only the CPU resource would be isolated (i.e., limited) if all my Mesos agents were launched with the option --isolation=cgroups/cpu (rather than --isolation=cgroups/cpu,cgroups/mem).
But even in this case, every Docker container launched by the Mesos Docker containerizer gets the --memory option, which is a hard limit and triggers the OOM killer when a container needs more memory. (And the Mesos Docker containerizer does not seem to support --memory-reservation.)
problem 2
Even with --isolation=cgroups/cpu, removing the ram or disk parameter from the Aurora Resources instance causes the following error:
Error loading configuration: TypeCheck(FAILED): MesosJob[task] failed: Task[resources] failed: Resources[ram] is required.
My question
Is it possible to disable memory and disk isolation?
What is the difference between --isolation=cgroups/cpu and --isolation=cgroups/cpu,cgroups/mem?

As you've discovered, you can disable the memory and disk isolators in Mesos by not specifying them as part of the isolation agent flag. I'm unsure about the behavior of the Docker Containerizer in this scenario, but you might want to try using the Mesos Containerizer instead, as this is the preferred way to run Docker images in Mesos going forward.
As far as omitting the Resources from your Aurora config goes, unfortunately that won't be possible. Every Aurora job must specify its resource requirements so that the scheduler can match your task instances up with an offer from Mesos.
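For reference, an agent launched without the memory and disk isolators might look like the sketch below. This is a configuration fragment rather than something to run verbatim, and every value other than --isolation is a placeholder:

```shell
# Only the CPU cgroup isolator is enabled: memory and disk still appear in
# offers and must be requested by tasks, but usage is not enforced by cgroups.
mesos-agent --master=zk://zk-host:2181/mesos \
            --work_dir=/var/lib/mesos \
            --containerizers=mesos,docker \
            --isolation=cgroups/cpu
```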

Related

Unknown processors type "resourcedetection" for "resourcedetection"

Running the OpenTelemetry Collector with image ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector:0.58.0.
In config.yaml I have:
processors:
  batch:
  resourcedetection:
    detectors: [ env ]
    timeout: 2s
    override: false
The collector is deployed as a sidecar but it keeps failing with
collector server run finished with error: failed to get config: cannot unmarshal the configuration: unknown processors type "resourcedetection" for "resourcedetection" (valid values: [resource span probabilistic_sampler filter batch memory_limiter attributes])
Any idea as to what is causing this? I haven't found any relevant documentation/question
The Resource Detection Processor is part of the otelcol-contrib distribution upstream, hence you would need to use otel/opentelemetry-collector-contrib:0.58.0 (or the equivalent on your container registry of choice) for this processor to be available in your collector.
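A sketch of the swap, assuming the image comes straight from Docker Hub and your existing config is mounted over the contrib image's default config path; the host config path is a placeholder:

```shell
# Same collector version, but the contrib distribution, which bundles the
# resourcedetection processor.
docker run --rm \
  -v /path/to/config.yaml:/etc/otelcol-contrib/config.yaml \
  otel/opentelemetry-collector-contrib:0.58.0
```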

ceph runtime config is not the same with ceph.conf

I am using ceph-deploy to deploy a Ceph cluster. After deployment finished, I found that the runtime config does not match ceph.conf, and I did not modify the runtime config manually.
[root@sz02 ~]# ceph daemon osd.0 config show | grep rbd_cache
"rbd_cache": "true",
"rbd_cache_writethrough_until_flush": "true",
"rbd_cache_size": "33554432",
"rbd_cache_max_dirty": "25165824",
"rbd_cache_target_dirty": "16777216",
"rbd_cache_max_dirty_age": "1",
"rbd_cache_max_dirty_object": "0",
"rbd_cache_block_writes_upfront": "false",
[root@sz02 ~]# cat /etc/ceph/ceph.conf | grep "rbd cache size"
rbd cache size = 268435456
We can see that rbd_cache_size differs, so I want to know:
Does the Ceph runtime config read its values from ceph.conf or not? If not, what is the purpose of ceph.conf?
Thanks
While starting, an OSD reads /etc/ceph/ceph.conf and applies the parameters found there to its runtime config. For parameters it does not find, it uses the default values described in the docs. So the setting rbd cache size = 268435456 should take effect.
You can do the following:
Restart the osd daemon.
Check that the setting rbd cache size = 268435456 is under [client] config section in your ceph.conf.
If you don't want to restart the daemon:
ceph tell osd.0 injectargs '--rbd_cache_size=268435456'
but it is recommended to change it on all OSDs:
ceph tell osd.* injectargs '--rbd_cache_size=268435456'
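Either way, you can confirm the change with the same admin-socket query used in the question; a sketch, to be run on the OSD host:

```shell
# After a restart or injectargs, the runtime value should match ceph.conf.
ceph daemon osd.0 config show | grep '"rbd_cache_size"'
```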

Chronos insufficient resources warning

I'm trying to run Chronos on Mesos, but all my jobs are stuck in a queueing state.
systemctl status chronos -l shows:
Mar 20 20:21:08 core-mq3 chronos[17940]: [2017-03-20 20:21:08,985] WARN Insufficient resources remaining for task 'ct:1490040556081:0:JobName:', will append to queue. (Needed: [cpus: 0.5 mem: 256.0 disk: 256.0], Found: [cpus: 1.8 mem: 11034.0 disk: 60398.8,cpus: 2.0 mem: 6542.0 disk: 60399.0]) (org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework:155)
So, it is refusing the offers even though all the resources are more than required.
This was a red herring. There was a constraint that the agent did not fulfill, which is why it couldn't run the task.
Running curl <chronos>/scheduler/jobs/search?name=<job> gave me all the details of the job, which I used to verify that the constraint was not being fulfilled.
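To script that check, you can pull the constraints out of the response. The JSON below is a made-up stand-in for a real Chronos reply, with field names following the Chronos job schema:

```shell
# Simulate a /scheduler/jobs/search response and extract the constraints,
# which must match attributes advertised by at least one Mesos agent.
cat > /tmp/chronos-job.json <<'EOF'
[ { "name": "JobName", "constraints": [["rack", "EQUALS", "rack-1"]] } ]
EOF
python3 -c "import json; job = json.load(open('/tmp/chronos-job.json'))[0]; print(job['constraints'])"
# → [['rack', 'EQUALS', 'rack-1']]
```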

How to use Marathon health check command mode?

I am running Docker containers on Mesos/Marathon. I want to implement health checks, basically running a health check script. My question is: will the health check command run in the container itself, or on the slave? It is probably container-level since this is a per-application health check, so kind of obvious, but I would like to confirm it. I didn't find any relevant documentation that says where it runs.
Thanks
I did try echoing to /tmp/testfile via the command, and I see the file on the slave. Does this mean it runs on the slave? Just need confirmation. Any more information is useful.
The short answer is: it depends. Long answer below : ).
Command health checks are run by the Mesos Docker executor in your task container via docker exec. If you run your containers with the "unified containerizer", i.e., Docker images without the Docker daemon, things are similar, except there is no docker exec: the Mesos executor simply enters the mnt namespace of your container before executing the command health check (see this doc). HTTP and TCP health checks are run by the Marathon scheduler, hence not necessarily on the node where your container is running (unless you run Marathon on the same node as the Mesos agent, which you probably should not be doing). Check out this page.
Starting with Mesos 1.2.0 and Marathon 1.3, it is possible to run so-called Mesos-native health checks. In this case, both HTTP(S) and TCP health checks run on the agent where your container is running. To make sure the container network can be reached, these checks enter the net namespace of your container.
Mesos-level health checks (MESOS_HTTP, MESOS_HTTPS, MESOS_TCP, and COMMAND) are locally executed by Mesos on the agent running the corresponding task and thus test reachability from the Mesos executor. Mesos-level health checks offer the following advantages over Marathon-level health checks:
Mesos-level health checks are performed as close to the task as possible, so they are not affected by networking failures.
Mesos-level health checks are delegated to the agents running the tasks, so the number of tasks that can be checked can scale horizontally with the number of agents in the cluster.
Limitations and considerations
Mesos-level health checks consume extra resources on the agents; moreover, there is some overhead for fork-execing a process and entering the tasks’ namespaces every time a task is checked.
The health check processes share resources with the task that they check. Your application definition must account for the extra resources consumed by the health checks.
Mesos-level health checks require tasks to listen on the container’s loopback interface in addition to whatever interface they require. If you run a service in production, you will want to make sure that the users can reach it.
Marathon currently does NOT support the combination of Mesos and Marathon level health checks.
Example usage
HTTP:
{
  "path": "/api/health",
  "portIndex": 0,
  "protocol": "HTTP",
  "gracePeriodSeconds": 300,
  "intervalSeconds": 60,
  "timeoutSeconds": 20,
  "maxConsecutiveFailures": 3,
  "ignoreHttp1xx": false
}
or Mesos HTTP:
{
  "path": "/api/health",
  "portIndex": 0,
  "protocol": "MESOS_HTTP",
  "gracePeriodSeconds": 300,
  "intervalSeconds": 60,
  "timeoutSeconds": 20,
  "maxConsecutiveFailures": 3
}
or secure HTTP:
{
  "path": "/api/health",
  "portIndex": 0,
  "protocol": "HTTPS",
  "gracePeriodSeconds": 300,
  "intervalSeconds": 60,
  "timeoutSeconds": 20,
  "maxConsecutiveFailures": 3,
  "ignoreHttp1xx": false
}
Note: HTTPS health checks do not verify the SSL certificate.
or TCP:
{
  "portIndex": 0,
  "protocol": "TCP",
  "gracePeriodSeconds": 300,
  "intervalSeconds": 60,
  "timeoutSeconds": 20,
  "maxConsecutiveFailures": 0
}
or COMMAND:
{
  "protocol": "COMMAND",
  "command": { "value": "curl -f -X GET http://$HOST:$PORT0/health" },
  "gracePeriodSeconds": 300,
  "intervalSeconds": 60,
  "timeoutSeconds": 20,
  "maxConsecutiveFailures": 3
}
{
  "protocol": "COMMAND",
  "command": { "value": "/bin/bash -c \\\"</dev/tcp/$HOST/$PORT0\\\"" }
}
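The second COMMAND example relies on bash's /dev/tcp pseudo-device. As a standalone illustration (not Marathon-specific), here is a sketch probing a throwaway local listener; port 18080 is arbitrary:

```shell
# Start a disposable HTTP server, then probe its TCP port the same way the
# COMMAND check does: the redirection exits non-zero if nothing is listening.
python3 -m http.server 18080 >/dev/null 2>&1 &
SRV=$!
sleep 1
if /bin/bash -c "</dev/tcp/127.0.0.1/18080"; then
  echo "port open"
else
  echo "port closed"
fi
kill $SRV
```

In a real Marathon check, $HOST and $PORT0 are substituted by the scheduler with the task's host and first assigned port.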
Further Information: https://mesosphere.github.io/marathon/docs/health-checks.html

Recovering from Consul "No Cluster leader" state

I have:
one mesos-master in which I configured a consul server;
one mesos-slave in which I configure consul client, and;
one bootstrap server for consul.
When I hit start I am seeing the following error:
2016/04/21 19:31:31 [ERR] agent: failed to sync remote state: rpc error: No cluster leader
2016/04/21 19:31:44 [ERR] agent: coordinate update error: rpc error: No cluster leader
How do I recover from this state?
Did you look at the Consul docs?
It looks like you have performed an ungraceful stop and now need to clean your raft/peers.json file by removing all entries there to perform an outage recovery. See the above link for more details.
As of Consul 0.7, things work differently from Keyan P's answer. raft/peers.json (in the Consul data dir) has become a manual recovery mechanism: it does not exist unless you create it, and when Consul starts it loads the file and deletes it from the filesystem so it won't be read on future starts. There are instructions in raft/peers.info. Note that if you delete raft/peers.info, Consul won't read raft/peers.json, but it will still delete it, and it will recreate raft/peers.info. The log indicates separately when it reads and when it deletes the file.
Assuming you've already tried the bootstrap or bootstrap_expect settings, that file might help. The Outage Recovery guide in Keyan P's answer is a helpful link. You create raft/peers.json in the data dir and start Consul, and the log should indicate that it's reading/deleting the file and then it should say something like "cluster leadership acquired". The file contents are:
[ { "id": "<node-id>", "address": "<node-ip>:8300", "non_voter": false } ]
where <node-id> can be found in the node-id file in the data dir.
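Assembling that single-entry file from the node-id file can be scripted. A sketch, using a throwaway temporary directory in place of the real Consul data dir (e.g. /opt/consul/data), with a sample node ID hard-coded and the advertise IP a placeholder:

```shell
# Build a single-node raft/peers.json. A temp dir stands in for the real
# data dir; the ID and IP below are placeholders.
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR/raft"
echo "e3a30829-9849-bad7-32bc-11be85a49200" > "$DATA_DIR/node-id"  # normally written by Consul
NODE_ID=$(cat "$DATA_DIR/node-id")
NODE_IP=10.0.0.1   # this node's advertise address
cat > "$DATA_DIR/raft/peers.json" <<EOF
[ { "id": "$NODE_ID", "address": "$NODE_IP:8300", "non_voter": false } ]
EOF
cat "$DATA_DIR/raft/peers.json"
```

Against a real cluster you would point DATA_DIR at the actual data dir and stop Consul before writing the file.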
If your Raft protocol version is greater than 2:
[
  {
    "id": "e3a30829-9849-bad7-32bc-11be85a49200",
    "address": "10.88.0.59:8300",
    "non_voter": false
  },
  {
    "id": "326d7d5c-1c78-7d38-a306-e65988d5e9a3",
    "address": "10.88.0.45:8300",
    "non_voter": false
  },
  {
    "id": "a8d60750-4b33-99d7-1185-b3c6d7458d4f",
    "address": "10.233.103.119:8300",
    "non_voter": false
  }
]
In my case I had 2 worker nodes in the k8s cluster; after adding another node, the Consul servers could elect a leader and everything came up.
I will update what I did:
A little background: we scaled down the AWS Auto Scaling group, so we lost the leader. But we still had one server running, just without a leader.
What I did was:
Scale up to 3 servers (use an odd number; not 2 or 4).
Stop Consul on all 3 servers: sudo service consul stop (you can do status/stop/start).
Create the peers.json file and put it on the old server (/opt/consul/data/raft).
Start the 3 servers (peers.json should be placed on 1 server only).
Join the other 2 servers to the leader using consul join 10.201.8.XXX.
Check that the peers are connected to the leader using consul operator raft list-peers.
Sample peers.json file
[
  {
    "id": "306efa34-1c9c-acff-1226-538vvvvvv",
    "address": "10.201.n.vvv:8300",
    "non_voter": false
  },
  {
    "id": "dbeeffce-c93e-8678-de97-b7",
    "address": "10.201.X.XXX:8300",
    "non_voter": false
  },
  {
    "id": "62d77513-e016-946b-e9bf-0149",
    "address": "10.201.X.XXX:8300",
    "non_voter": false
  }
]
You can get these IDs from each server in /opt/consul/data/:
[root@ip-10-20 data]# ls
checkpoint-signature  node-id  raft  serf
[root@ip-10-1 data]# cat node-id
Some useful commands:
consul members
curl http://ip:8500/v1/status/peers
curl http://ip:8500/v1/status/leader
consul operator raft list-peers
cd /opt/consul/data/raft/
consul info
sudo service consul status
consul catalog services
You may also ensure that the bootstrap parameter is set in your Consul configuration file config.json on the first node:
# /etc/consul/config.json
{
  "bootstrap": true,
  ...
}
or start the consul agent with the -bootstrap option, as described in the "Failure of a Single Server Cluster" section of the official Consul documentation.
