Mesos / Marathon: port forwarding makes deployment fail

I have Marathon/Mesos successfully deploying apps, but as soon as I add a port mapping, deployment stops working.
My slave container is run as:
docker run --privileged -v /data/docker/notebook:/notebook:rw -v /etc/localtime:/etc/localtime:ro --net=host -e NFS_PATH=$NFS_PATH -e IP=$IP -e RESOURCES=$RESOURCES -e ATTRIBUTES=$ATTRIBUTES -e HOSTNAME=$HOSTNAME -e MASTER=$MASTER -e SLAVE_PORT=$SLAVE_PORT -d -p 5151:5151 --name $CONTAINER_NAME $IMAGE_NAME
Then, inside the slave container, I have to start the Docker daemon by hand because of a strange error [time="2015-10-17T12:27:40.963674511Z" level=fatal msg="Error starting daemon: error initializing graphdriver: operation not permitted"], so I run:
docker -d -D --insecure-registry=localhost:5000 -g /var/test
After that, my slave shows up in Mesos as a working resource, and I can post an app to Marathon:
{
"id": "rstudiorocker2",
"container": {
"type" : "DOCKER",
"volumes" : [],
"docker" : {
"image" : "localhost:5000/rocker/rstudio",
"privileged" : true,
"parameters" : [],
"forcePullImage" : true
}
}
}
This app is deployed on the slave almost instantly. The issue is that rocker listens on port 8787 and I want to reach it on a different port, so I tried adding a port mapping:
{
"id": "rstudiorocker",
"container": {
"type" : "DOCKER",
"volumes" : [],
"docker" : {
"image" : "192.168.0.38:5000/rocker/rstudio",
"privileged" : true,
"parameters" : [],
"forcePullImage" : true,
"network":"BRIDGE",
"portMappings": [
{ "containerPort": 8787,
"hostPort": 2036,
"protocol": "tcp" }
, { "containerPort": 8787,
"hostPort": 2036,
"protocol": "udp" }
]}
}
}
This is where the problem appears: the app stays in the "staging" state and is never deployed, even if I delete all other apps first.
What could be going wrong?

You've tried to map the same container port twice, which is not allowed by Marathon:
"portMappings": [
{ "containerPort": 8787,
"hostPort": 2036,
"protocol": "tcp" },
{ "containerPort": 8787,
"hostPort": 2036,
"protocol": "udp" }
]}
Marathon will reject this configuration with a message like
{"message":"Bean is not valid","errors":[{"attribute":"ports","error":"Elements must be unique"}]}
Try changing one of the containerPort values, for example:
"portMappings": [
{ "containerPort": 8787,
"hostPort": 0,
"protocol": "tcp" },
{ "containerPort": 8789,
"hostPort": 0,
"protocol": "udp" }
]}
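As a rough illustration of the check described in this answer, the following Python sketch scans a portMappings list for repeated containerPort values before the app definition is posted. The helper name duplicate_container_ports is hypothetical, and this is not Marathon's actual validation code; it only mirrors the uniqueness rule as the answer states it.

```python
from collections import Counter


def duplicate_container_ports(port_mappings):
    """Return the sorted containerPort values that occur more than once.

    Hypothetical pre-flight helper: it mirrors the uniqueness rule as
    described in the answer, not Marathon's own validation logic.
    """
    counts = Counter(m["containerPort"] for m in port_mappings)
    return sorted(port for port, n in counts.items() if n > 1)


# Mappings from the question: the same container port appears twice.
failing = [
    {"containerPort": 8787, "hostPort": 2036, "protocol": "tcp"},
    {"containerPort": 8787, "hostPort": 2036, "protocol": "udp"},
]

# Mappings from the suggested fix: each container port is distinct.
fixed = [
    {"containerPort": 8787, "hostPort": 0, "protocol": "tcp"},
    {"containerPort": 8789, "hostPort": 0, "protocol": "udp"},
]

print(duplicate_container_ports(failing))
print(duplicate_container_ports(fixed))
```

Running such a check locally before POSTing to Marathon would surface the duplicate without waiting for the API response.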

Related

Bash redirection in Docker container failing when run in ECS task on Amazon Linux 2 instances

I am trying to run an ECS task that contains 3 containers: postgres, redis, and an image from a private ECR repository. The container definition for the custom image has a command that waits until the postgres container can receive traffic, using a bash command:
"command": [
"/bin/bash",
"-c",
"while !</dev/tcp/postgres/5432; do echo \"Waiting for postgres database to start...\"; /bin/sleep 1; done; /bin/sh /app/start-server.sh;"
],
When I run this via docker-compose on my local machine through docker it works, but on the Amazon Linux 2 EC2 machine this is printed when the while loop runs:
/bin/bash: line 1: postgres: Name or service not known
/bin/bash: line 1: /dev/tcp/postgres/5432: Invalid argument
The postgres container runs without error and the last log from that container is
database system is ready to accept connections
I am not sure if this is a Docker network issue or an issue with Amazon Linux 2's bash not being compiled with --enable-net-redirections, which I found explained here
Task Definition:
{
"networkMode": "bridge",
"containerDefinitions": [
{
"environment": [
{
"name": "POSTGRES_DB",
"value": "metadeploy"
},
{
"name": "POSTGRES_USER",
"value": "<redacted>"
},
{
"name": "POSTGRES_PASSWORD",
"value": "<redacted>"
}
],
"essential": true,
"image": "postgres:12.9",
"mountPoints": [],
"name": "postgres",
"memory": 1024,
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "metadeploy-postgres",
"awslogs-region": "us-east-1",
"awslogs-create-group": "true",
"awslogs-stream-prefix": "mdp"
}
}
},
{
"essential": true,
"image": "redis:6.2",
"name": "redis",
"memory": 1024
},
{
"command": [
"/bin/bash",
"-c",
"while !</dev/tcp/postgres/5432; do echo \"Waiting for postgres database to start...\"; /bin/sleep 1; done; /bin/sh /app/start-server.sh;"
],
"environment": [
{
"name": "DJANGO_SETTINGS_MODULE",
"value": "config.settings.local"
},
{
"name": "DATABASE_URL",
"value": "<redacted-postgres-url>"
},
{
"name": "REDIS_URL",
"value": "redis://redis:6379"
},
{
"name": "REDIS_HOST",
"value": "redis"
}
],
"essential": true,
"image": "the private ecr image uri built from here https://github.com/SFDO-Tooling/MetaDeploy",
"links": [
"redis"
],
"mountPoints": [
{
"containerPath": "/app/node_modules",
"sourceVolume": "AppNode_Modules"
}
],
"name": "web",
"portMappings": [
{
"containerPort": 8080,
"hostPort": 8080
},
{
"containerPort": 8000,
"hostPort": 8000
},
{
"containerPort": 6006,
"hostPort": 6006
}
],
"memory": 1024,
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "metadeploy-web",
"awslogs-region": "us-east-1",
"awslogs-create-group": "true",
"awslogs-stream-prefix": "mdw"
}
}
}
],
"family": "MetaDeploy",
"volumes": [
{
"host": {
"sourcePath": "/app/node_modules"
},
"name": "AppNode_Modules"
}
]
}
The corresponding docker-compose.yml contains:
version: '3'
services:
  postgres:
    environment:
      POSTGRES_DB: metadeploy
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: sample_db_password
    volumes:
      - ./postgres:/var/lib/postgresql/data:delegated
    image: postgres:12.9
    restart: always
  redis:
    image: redis:6.2
  web:
    build:
      context: .
      dockerfile: Dockerfile
    command: |
      /bin/bash -c 'while !</dev/tcp/postgres/5432; do echo "Waiting for postgres database to start..."; /bin/sleep 1; done; \
      /bin/sh /app/start-server.sh;'
    ports:
      - '8080:8080'
      - '8000:8000'
      # Storybook server
      - '6006:6006'
    stdin_open: true
    tty: true
    depends_on:
      - postgres
      - redis
    links:
      - redis
    environment:
      DJANGO_SETTINGS_MODULE: config.settings.local
      DATABASE_URL: postgres://postgres:sample_db_password@postgres:5432/metadeploy
      REDIS_URL: redis://redis:6379
      REDIS_HOST: redis
    volumes:
      - .:/app:cached
      - /app/node_modules
Do I need to recompile bash to use --enable-net-redirections, and if so how can I do that?
Without bash's net redirection feature, your best bet is to use something like nc or netcat (if available) to determine if the port is open. If those aren't available, it may be worth modifying your app logic to better handle database failure cases.
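If nc is not installed either, the same port check can be sketched in Python using only the standard library, assuming a Python interpreter is available in the image (the DJANGO_SETTINGS_MODULE variable suggests it is). The function name wait_for_port and its timeout and interval parameters are illustrative choices, not part of the original setup.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection resolves the host name and performs the TCP handshake.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers refused connections, timeouts and name-resolution
            # failures (socket.gaierror is a subclass of OSError).
            time.sleep(interval)
    return False
```

Calling wait_for_port("postgres", 5432) before running /app/start-server.sh would take the place of the /dev/tcp loop. Note that it keeps returning False for as long as the name postgres cannot be resolved from the web container, so it does not by itself fix a name-resolution problem.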
Alternatively, a potentially better approach would be:
Adding a healthcheck to the postgres service.
Changing the web service's depends_on clause to the "long syntax", so that it depends on postgres being service_healthy instead of the default service_started.
This approach has two key benefits:
The postgres image likely has the tools to detect if the database is up and running.
The web service no longer needs to manually check if the database is ready or not.

Azure IoT Edge: start Docker with --net=host so that I can access my IP

In my Java code, I would like to find my machine's IP address. The code runs inside a Docker container, and I always get the IP address of the container instead of that of the machine.
I run the container like this:
docker run -p 8080:8080 --privileged --net=host -d 6b45f71550a3
This is my Java code
InetAddress addr = InetAddress.getLocalHost();
String hostname = InetAddress.getByName(addr.getHostName()).toString();
I need to modify the deployment.template.json so that the generated container uses the IP address of the machine:
"modules": {
"MyModule": {
"version": "1.0",
"type": "docker",
"status": "running",
"restartPolicy": "always",
"settings": {
"image": "dev.azurecr.io/dev:0.0.1-arm32v7",
"createOptions": {
"ExposedPorts":{"8080/tcp": {}},
"HostConfig": {
"PortBindings": {
"8080/tcp": [
{
"HostPort": "8080"
}
]
}
}
}
}
}
}
I was going to say that you can't do that but apparently you can by using
"createOptions": {
"NetworkingConfig": {
"EndpointsConfig": {
"host": {}
}
},
"HostConfig": {
"NetworkMode": "host"
}
}
I haven't tried it. I found it here: https://github.com/Azure/iot-edge-v1/issues/517. Maybe that will help.
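To show how these settings would combine with the createOptions from the question, here is a small Python sketch that merges the host-network keys into an existing createOptions object. The helper name with_host_network is hypothetical, and like the answer itself, the result has not been verified against an IoT Edge deployment.

```python
import copy
import json


def with_host_network(create_options):
    """Return a copy of createOptions with the host-network settings merged in."""
    merged = copy.deepcopy(create_options)
    networking = merged.setdefault("NetworkingConfig", {})
    networking.setdefault("EndpointsConfig", {})["host"] = {}
    merged.setdefault("HostConfig", {})["NetworkMode"] = "host"
    return merged


# createOptions as posted in the question's deployment.template.json.
original = {
    "ExposedPorts": {"8080/tcp": {}},
    "HostConfig": {"PortBindings": {"8080/tcp": [{"HostPort": "8080"}]}},
}

print(json.dumps(with_host_network(original), indent=2))
```

The sketch leaves the existing ExposedPorts and PortBindings entries untouched and only adds the two keys suggested above.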

Mesos/Marathon - reserved resources for role are not offered for Marathon app

I have assigned the slave's resources to a particular role ("app-role") by adding the --default_role="app-role" parameter to ExecStart in the slave service unit (/etc/systemd/system/dcos-mesos-slave.service). I then restarted the slave agent:
sudo systemctl daemon-reload
sudo systemctl stop dcos-mesos-slave.service
sudo rm -f /var/lib/mesos/slave/meta/slaves/latest
sudo systemctl start dcos-mesos-slave.service
and verified by: curl master.mesos/mesos/slaves.
After that, I expected a Marathon app with the acceptedResourceRoles attribute to receive only these particular resource offers, but that does not happen (the app stays in the waiting state).
Why doesn't Marathon receive these offers? How should this be set up to make it work?
{
"id": "/basic-4",
"cmd": "python3 -m http.server 8080",
"cpus": 0.5,
"mem": 32,
"disk": 0,
"instances": 1,
"acceptedResourceRoles": [
"app-role"
],
"container": {
"type": "DOCKER",
"volumes": [],
"docker": {
"image": "python:3",
"network": "BRIDGE",
"portMappings": [
{
"containerPort": 8080,
"hostPort": 0,
"servicePort": 10000,
"protocol": "tcp",
"name": "my-vip",
"labels": {
"VIP_0": "/my-service:5555"
}
}
],
"privileged": false,
"parameters": [],
"forcePullImage": false
}
},
"portDefinitions": [
{
"port": 10000,
"protocol": "tcp",
"name": "default",
"labels": {}
}
]
}
This works only if marathon is started with --mesos_role set.
In the context of the question this should be: --mesos_role 'app-role'.
If you set --mesos_role other, Marathon will register with Mesos for this role: it will receive offers for resources that are reserved for this role, in addition to unreserved resources.
If you set --default_accepted_resource_roles *, Marathon will apply this default to all AppDefinitions that do not explicitly define acceptedResourceRoles. Since your AppDefinition defines that option, the default will not be applied (both are equal anyways).
If you set "acceptedResourceRoles": ["*"] in an AppDefinition (or the AppDefinition inherits a default of "*"), Marathon will only consider unreserved resources for launching of this app.
More: https://mesosphere.github.io/marathon/docs/recipes.html
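The interaction between --mesos_role and acceptedResourceRoles can be sketched as a toy model in Python. The function names below are hypothetical, and the model only encodes the rules quoted above ("*" being the Mesos role name for unreserved resources); it is not Marathon's or Mesos's actual offer-matching code.

```python
UNRESERVED = "*"  # Mesos role name for unreserved resources


def offers_visible_to_marathon(offer_roles, mesos_role=None):
    """Offers Marathon receives: unreserved ones plus those reserved for its role."""
    visible = {UNRESERVED}
    if mesos_role:
        visible.add(mesos_role)
    return [role for role in offer_roles if role in visible]


def usable_offers(offer_roles, accepted_resource_roles, mesos_role=None):
    """Offers an app may launch on, given its acceptedResourceRoles."""
    visible = offers_visible_to_marathon(offer_roles, mesos_role)
    return [role for role in visible if role in accepted_resource_roles]


# The cluster offers unreserved resources and resources reserved for "app-role".
cluster = ["*", "app-role"]

# Marathon without --mesos_role never sees the reserved offer, so the app waits.
print(usable_offers(cluster, ["app-role"]))
# With --mesos_role app-role the reserved offer becomes usable.
print(usable_offers(cluster, ["app-role"], mesos_role="app-role"))
# An app accepting only "*" is restricted to unreserved resources.
print(usable_offers(cluster, ["*"], mesos_role="app-role"))
```

In this model the app from the question stays in the waiting state for exactly the reason given above: no offer that Marathon receives carries a role the app accepts.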

DC/OS marathon Virtual network not working

I installed DC/OS with 3 masters and 3 agents and face a problem with virtual networking. Here is my Marathon app spec:
{
"id": "/nginx",
"cmd": null,
"cpus": 1,
"mem": 128,
"disk": 0,
"instances": 1,
"container": {
"type": "DOCKER",
"volumes": [],
"docker": {
"image": "nginx",
"network": "BRIDGE",
"portMappings": [
{
"containerPort": 80,
"hostPort": 0,
"servicePort": 10002,
"protocol": "tcp",
"name": "main1",
"labels": {
"VIP_0": "9.0.0.0:34562"
}
}
],
"privileged": false,
"parameters": [],
"forcePullImage": false
}
},
"portDefinitions": [
{
"port": 10002,
"protocol": "tcp",
"labels": {}
}
]
}
I see the following in the DC/OS virtual network section:
VIRTUAL NETWORK NAME | SUBNET | AGENT PREFIX LENGTH
dcos | 9.0.0.0/8 | 24
The container stays in the waiting state for a long time. If I remove the port mapping section, it runs successfully.
Basically, I need to know how to work with this new virtual network and get service discovery and load balancing working without any extra components.
Took me some time to figure it out as well...
You need to:
Remove all port assignments from the task definition
Specify the name of the network to attach to (the default network created is named "dcos")
{
"id": "yourtask",
"container": {
"type": "DOCKER",
"docker": {
"image": "your/image",
"network": "USER"
}
},
"acceptedResourceRoles" : [
"slave_public"
],
"ipAddress": {
"networkName": "dcos"
},
"instances": 2,
"cpus": 0.2,
"mem": 128
}
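The two steps can also be applied to an existing app definition programmatically. The Python sketch below is one possible reading of them: the helper name to_virtual_network is hypothetical, and dropping portDefinitions along with portMappings is an assumption about what "all port assignments" covers.

```python
import copy


def to_virtual_network(app, network_name="dcos"):
    """Return a copy of a Marathon app definition attached to a virtual network."""
    converted = copy.deepcopy(app)
    docker = converted["container"]["docker"]
    # Step 1: remove the port assignments (assumed to include portDefinitions).
    docker.pop("portMappings", None)
    converted.pop("portDefinitions", None)
    # Step 2: switch to the USER network type and name the network to join.
    docker["network"] = "USER"
    converted["ipAddress"] = {"networkName": network_name}
    return converted


# Trimmed version of the nginx app from the question.
nginx = {
    "id": "/nginx",
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "nginx",
            "network": "BRIDGE",
            "portMappings": [{"containerPort": 80, "hostPort": 0}],
        },
    },
    "portDefinitions": [{"port": 10002, "protocol": "tcp"}],
}

print(to_virtual_network(nginx))
```

The function returns a new dictionary and leaves the input untouched, so the original BRIDGE definition stays available for comparison.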

How to mount HDFS in a Docker container

I have Dockerized an application and want it to be able to access files from our HDFS. The Docker image is to be deployed, via Marathon/Mesos, on the same cluster where HDFS is installed.
Below is the JSON to be POSTed to Marathon. It seems that my app is able to read and write files in HDFS. Can someone comment on the safety of this? Would files changed by my app be correctly changed in HDFS as well? I Googled around and didn't find any similar approaches...
{
"id": "/ipython-test",
"cmd": null,
"cpus": 1,
"mem": 1024,
"disk": 0,
"instances": 1,
"container": {
"type": "DOCKER",
"volumes": [
{
"containerPath": "/home",
"hostPath": "/hadoop/hdfs-mount",
"mode": "RW"
}
],
"docker": {
"image": "my/image",
"network": "BRIDGE",
"portMappings": [
{
"containerPort": 8888,
"hostPort": 0,
"servicePort": 10061,
"protocol": "tcp",
}
],
"privileged": false,
"parameters": [],
"forcePullImage": true
}
},
"portDefinitions": [
{
"port": 10061,
"protocol": "tcp",
"labels": {}
}
]
}
You might have a look at the Docker volume docs.
Basically, the volumes definition in the app.json triggers the start of the Docker container with the flag -v /hadoop/hdfs-mount:/home:rw, meaning that the host path /hadoop/hdfs-mount gets mapped into the container as /home in read-write mode.
You should be able to verify this if you SSH into the node which is running the app and do a docker inspect <containerId>.
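The mapping from a Marathon volumes entry to the docker flag can be sketched in a few lines of Python. The helper name volume_flags is hypothetical, and the mode is lowercased on the assumption that docker's -v flag takes rw or ro; this illustrates the translation and is not the code Mesos itself runs.

```python
def volume_flags(volumes):
    """Build docker `-v` arguments from Marathon-style volume entries."""
    flags = []
    for volume in volumes:
        mode = volume["mode"].lower()  # "RW" -> "rw", "RO" -> "ro"
        spec = f"{volume['hostPath']}:{volume['containerPath']}:{mode}"
        flags += ["-v", spec]
    return flags


# The volume from the question's app definition.
volumes = [
    {"containerPath": "/home", "hostPath": "/hadoop/hdfs-mount", "mode": "RW"}
]

print(" ".join(volume_flags(volumes)))
```

The printed flag can be compared with the mount information reported by docker inspect for the running container.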
See also
https://mesosphere.github.io/marathon/docs/native-docker.html

Resources