DC/OS Marathon virtual network not working

I installed DC/OS with 3 masters and 3 agents and face a problem with virtual networking. Here is my Marathon app spec:
{
"id": "/nginx",
"cmd": null,
"cpus": 1,
"mem": 128,
"disk": 0,
"instances": 1,
"container": {
"type": "DOCKER",
"volumes": [],
"docker": {
"image": "nginx",
"network": "BRIDGE",
"portMappings": [
{
"containerPort": 80,
"hostPort": 0,
"servicePort": 10002,
"protocol": "tcp",
"name": "main1",
"labels": {
"VIP_0": "9.0.0.0:34562"
}
}
],
"privileged": false,
"parameters": [],
"forcePullImage": false
}
},
"portDefinitions": [
{
"port": 10002,
"protocol": "tcp",
"labels": {}
}
]
}
I see the following in the DC/OS virtual network section:
VIRTUAL NETWORK NAME | SUBNET | AGENT PREFIX LENGTH
dcos 9.0.0.0/8 24
The container stays in the waiting state for a long time. If I remove the port mapping section, it runs successfully.
Basically, I need to know how to work with this new virtual network and get service discovery and load balancing working without any additional tooling.

It took me some time to figure this out as well.
You need to:
Remove all port assignments from the task definition.
Specify the name of the network to attach to (the default network is named "dcos").
For example:
{
"id": "yourtask",
"container": {
"type": "DOCKER",
"docker": {
"image": "your/image",
"network": "USER"
}
},
"acceptedResourceRoles" : [
"slave_public"
],
"ipAddress": {
"networkName": "dcos"
},
"instances": 2,
"cpus": 0.2,
"mem": 128
}
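The same app definition can also be generated from code. The following is only a sketch under stated assumptions: the Go type names and buildUserNetworkApp are hypothetical, only the JSON keys come from the definition above, and acceptedResourceRoles is left out for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The JSON keys mirror the app definition above.
// The Go type and function names are hypothetical.
type Docker struct {
	Image   string `json:"image"`
	Network string `json:"network"`
}

type Container struct {
	Type   string `json:"type"`
	Docker Docker `json:"docker"`
}

type IPAddress struct {
	NetworkName string `json:"networkName"`
}

type App struct {
	ID        string    `json:"id"`
	Container Container `json:"container"`
	IPAddress IPAddress `json:"ipAddress"`
	Instances int       `json:"instances"`
	CPUs      float64   `json:"cpus"`
	Mem       int       `json:"mem"`
}

// buildUserNetworkApp returns an app attached to the named virtual network,
// with no port mappings, as the answer recommends.
func buildUserNetworkApp(id, image, network string) App {
	return App{
		ID: id,
		Container: Container{
			Type:   "DOCKER",
			Docker: Docker{Image: image, Network: "USER"},
		},
		IPAddress: IPAddress{NetworkName: network},
		Instances: 2,
		CPUs:      0.2,
		Mem:       128,
	}
}

func main() {
	app := buildUserNetworkApp("yourtask", "your/image", "dcos")
	out, err := json.MarshalIndent(app, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The printed JSON is what you would then submit to Marathon in the same way as the hand-written definition.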

Related

Consul deregister_critical_service_after is not working

Hello everyone. I have a health check on my Consul service. My goal is that whenever the service is unhealthy, Consul removes it from the service catalog.
Below is my config:
{
"service": {
"name": "api",
"tags": [ "api-tag" ],
"port": 80
},
"check": {
"id": "api_up",
"name": "Fetch health check from local nginx",
"http": "http://localhost/HealthCheck",
"interval": "5s",
"timeout": "1s",
"deregister_critical_service_after": "15s"
},
"data_dir": "/consul/data",
"retry_join": [
"192.168.0.1",
"192.168.0.2",
]
}
Thanks for any help.
The reason the service is not being de-registered is that the check is being specified outside of the service {} block in your JSON. This makes the check a node-level check, not a service-level check.
Here's a pretty-printed version of the config you provided.
{
"service": {
"name": "api",
"tags": [
"api-tag"
],
"port": 80
},
"check": {
"id": "api_up",
"name": "Fetch health check from local nginx",
"http": "http://localhost/HealthCheck",
"interval": "5s",
"timeout": "1s",
"deregister_critical_service_after": "15s"
},
"data_dir": "/consul/data",
"retry_join": [
"192.168.0.1",
"192.168.0.2",
]
}
Below is the configuration you should use to correctly associate the check with the configured service, so that the service is de-registered once the check has been critical for longer than the deregister_critical_service_after interval.
{
"service": {
"name": "api",
"tags": [
"api-tag"
],
"port": 80,
"check": {
"id": "api_up",
"name": "Fetch health check from local nginx",
"http": "http://localhost/HealthCheck",
"interval": "5s",
"timeout": "1s",
"deregister_critical_service_after": "15s"
}
},
"data_dir": "/consul/data",
"retry_join": [
"192.168.0.1",
"192.168.0.2"
]
}
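Note this statement from the docs for DeregisterCriticalServiceAfter. Because of the 1-minute minimum it describes, a value of 15s should not be expected to deregister the service after only 15 seconds.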
If a check is in the critical state for more than this configured value, then its associated service (and all of its associated checks) will automatically be deregistered. The minimum timeout is 1 minute, and the process that reaps critical services runs every 30 seconds, so it may take slightly longer than the configured timeout to trigger the deregistration. This should generally be configured with a timeout that's much, much longer than any expected recoverable outage for the given service.
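To make the two points above concrete, here is a minimal sketch (not Consul's own code) that inspects a config and reports whether its check is nested under "service", and that applies the documented 1-minute minimum to a configured value. The function names checkScope and effectiveTimeout are hypothetical, and the sketch deliberately ignores other ways of linking a check to a service, such as a service_id field on the check.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// minDeregister is the 1-minute minimum quoted from the docs above.
const minDeregister = time.Minute

// checkScope reports whether the check in a config is nested under
// "service" (service-level) or sits at the top level (node-level).
func checkScope(raw []byte) (string, error) {
	var cfg struct {
		Service struct {
			Check json.RawMessage `json:"check"`
		} `json:"service"`
		Check json.RawMessage `json:"check"`
	}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return "", err
	}
	switch {
	case len(cfg.Service.Check) > 0:
		return "service-level", nil
	case len(cfg.Check) > 0:
		return "node-level", nil
	default:
		return "none", nil
	}
}

// effectiveTimeout parses a configured duration and raises it to the
// documented minimum if it is shorter.
func effectiveTimeout(configured string) (time.Duration, error) {
	d, err := time.ParseDuration(configured)
	if err != nil {
		return 0, err
	}
	if d < minDeregister {
		return minDeregister, nil
	}
	return d, nil
}

func main() {
	original := []byte(`{"service": {"name": "api"}, "check": {"id": "api_up"}}`)
	corrected := []byte(`{"service": {"name": "api", "check": {"id": "api_up"}}}`)

	for _, cfg := range [][]byte{original, corrected} {
		scope, err := checkScope(cfg)
		if err != nil {
			fmt.Println("invalid config:", err)
			continue
		}
		fmt.Println(scope)
	}

	d, err := effectiveTimeout("15s")
	if err != nil {
		fmt.Println("invalid duration:", err)
		return
	}
	fmt.Println(d)
}
```

Run against the two layouts from this thread, the first config is reported as node-level and the second as service-level, which is the distinction that decides whether the service can be reaped.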

How to register multiple service instances in consul on one machine

I have a consul running locally on a dev machine. I also have one golang service running on two different ports on the same machine. Is there a way to register them as one service but two instances in consul using golang API (for example, is it possible to specify the node name when registering)?
Here's a very basic example which registers two instances of a service named my-service. Each instance is configured to listen on a different port, 8080 and 8081 respectively.
The key thing to note is that the service instances are also registered with a unique service ID in order to disambiguate between instance A and instance B of my-service which are running on the same agent.
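package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Create a client for the local agent using the default configuration.
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatalf("creating Consul client: %v", err)
	}

	serviceName := "my-service"
	servicePorts := []int{8080, 8081}

	for idx, port := range servicePorts {
		// Same Name, unique ID: this is what makes each port a separate instance.
		reg := &api.AgentServiceRegistration{
			ID:   fmt.Sprintf("%s-%d", serviceName, idx),
			Name: serviceName,
			Port: port,
		}
		// Stop on a failed registration instead of silently ignoring it.
		if err := client.Agent().ServiceRegister(reg); err != nil {
			log.Fatalf("registering %s: %v", reg.ID, err)
		}
	}
}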
After running go mod init consul-register (or any module name) and go mod tidy to fetch the Consul API dependency, execute the code with go run main.go. You can then see that the service has been registered in the catalog.
$ consul catalog services
consul
my-service
Both service instances are correctly being returned for service discovery queries over DNS or HTTP.
$ dig @127.0.0.1 -p 8600 -t SRV my-service.service.consul +short
1 1 8080 b1000.local.node.dc1.consul.
1 1 8081 b1000.local.node.dc1.consul.
$ curl localhost:8500/v1/health/service/my-service
[
{
"Node": {
"ID": "11113853-a8e0-5787-7482-538078db855a",
"Node": "b1000.local",
"Address": "127.0.0.1",
"Datacenter": "dc1",
"TaggedAddresses": {
"lan": "127.0.0.1",
"lan_ipv4": "127.0.0.1",
"wan": "127.0.0.1",
"wan_ipv4": "127.0.0.1"
},
"Meta": {
"consul-network-segment": ""
},
"CreateIndex": 11,
"ModifyIndex": 13
},
"Service": {
"ID": "my-service-0",
"Service": "my-service",
"Tags": [],
"Address": "",
"Meta": null,
"Port": 8080,
"Weights": {
"Passing": 1,
"Warning": 1
},
"EnableTagOverride": false,
"Proxy": {
"Mode": "",
"MeshGateway": {},
"Expose": {},
"TransparentProxy": {}
},
"Connect": {},
"CreateIndex": 14,
"ModifyIndex": 14
},
"Checks": [
{
"Node": "b1000.local",
"CheckID": "serfHealth",
"Name": "Serf Health Status",
"Status": "passing",
"Notes": "",
"Output": "Agent alive and reachable",
"ServiceID": "",
"ServiceName": "",
"ServiceTags": [],
"Type": "",
"Definition": {},
"CreateIndex": 11,
"ModifyIndex": 11
}
]
},
{
"Node": {
"ID": "11113853-a8e0-5787-7482-538078db855a",
"Node": "b1000.local",
"Address": "127.0.0.1",
"Datacenter": "dc1",
"TaggedAddresses": {
"lan": "127.0.0.1",
"lan_ipv4": "127.0.0.1",
"wan": "127.0.0.1",
"wan_ipv4": "127.0.0.1"
},
"Meta": {
"consul-network-segment": ""
},
"CreateIndex": 11,
"ModifyIndex": 13
},
"Service": {
"ID": "my-service-1",
"Service": "my-service",
"Tags": [],
"Address": "",
"Meta": null,
"Port": 8081,
"Weights": {
"Passing": 1,
"Warning": 1
},
"EnableTagOverride": false,
"Proxy": {
"Mode": "",
"MeshGateway": {},
"Expose": {},
"TransparentProxy": {}
},
"Connect": {},
"CreateIndex": 15,
"ModifyIndex": 15
},
"Checks": [
{
"Node": "b1000.local",
"CheckID": "serfHealth",
"Name": "Serf Health Status",
"Status": "passing",
"Notes": "",
"Output": "Agent alive and reachable",
"ServiceID": "",
"ServiceName": "",
"ServiceTags": [],
"Type": "",
"Definition": {},
"CreateIndex": 11,
"ModifyIndex": 11
}
]
}
]

Registering Multiple Same-Host Services

I am using the Consul API to register a local web-service running on various ports on my local machine. My end-goal is to be able to run multiple backends and load balance against them on different ports.
I am running a local Consul server of one node for development in a Vagrant VM. I have registered the first instance of my service:
{
"Node": {
"ID": "49d3be4b-5ee5-5f0f-e145-dcb1782e5b4b",
"Node": "localhost",
"Address": "127.0.0.1",
"Datacenter": "dc1",
"TaggedAddresses": {
"lan": "127.0.0.1",
"wan": "127.0.0.1"
},
"Meta": {
"consul-network-segment": ""
},
"CreateIndex": 5,
"ModifyIndex": 6
},
"Services": {
"consul": {
"ID": "consul",
"Service": "consul",
"Tags": [],
"Address": "",
"Port": 8300,
"EnableTagOverride": false,
"CreateIndex": 5,
"ModifyIndex": 5
},
"rusty": {
"ID": "rusty",
"Service": "rusty",
"Tags": [
"rusty",
"rust"
],
"Address": "127.0.0.1",
"Port": 8001,
"EnableTagOverride": false,
"CreateIndex": 247,
"ModifyIndex": 491
}
}
}
You can see my service, rusty, registered on port 8001. The strange thing is that when I register the same service on a different port, Consul supersedes port 8001 with the new service port.
Is there not a way to run multiple backends for a service on different ports on the same host?
Check that you are registering the services with different IDs. For complete information, see the parameters of the /agent/service/register endpoint.
Here is an example with two rusty service instances with different IDs, rusty1 and rusty2:
{
"Node": {
"ID": "eff2fae3-6ee5-5de7-bf1a-c041992a1d6a",
"Node": "FB20160707",
"Address": "192.168.1.66",
"Datacenter": "dc1",
"TaggedAddresses": {
"lan": "192.168.1.66",
"wan": "192.168.1.66"
},
"Meta": {},
"CreateIndex": 5,
"ModifyIndex": 6
},
"Services": {
"consul": {
"ID": "consul",
"Service": "consul",
"Tags": [],
"Address": "",
"Port": 8300,
"EnableTagOverride": false,
"CreateIndex": 5,
"ModifyIndex": 5
},
"rusty1": {
"ID": "rusty1",
"Service": "rusty",
"Tags": [],
"Address": "10.10.10.10",
"Port": 8001,
"EnableTagOverride": false,
"CreateIndex": 16,
"ModifyIndex": 28
},
"rusty2": {
"ID": "rusty2",
"Service": "rusty",
"Tags": [],
"Address": "10.10.10.10",
"Port": 8002,
"EnableTagOverride": false,
"CreateIndex": 19,
"ModifyIndex": 29
}
}
}
As per my comment to @ruslan-sennov, suppose the services section looked like this (the ID for each instance of the rusty service is made unique by adding the port, but the name is kept as rusty):
"Services": {
"consul": {
"ID": "consul",
"Service": "consul",
"Tags": [],
"Address": "",
"Port": 8300,
"EnableTagOverride": false,
"CreateIndex": 5,
"ModifyIndex": 5
},
"rusty:8001": {
"ID": "rusty:8001",
"Service": "rusty",
"Tags": [
"rusty",
"rust"
],
"Address": "127.0.0.1",
"Port": 8001,
"EnableTagOverride": false,
"CreateIndex": 247,
"ModifyIndex": 491
},
"rusty:8002": {
"ID": "rusty:8002",
"Service": "rusty",
"Tags": [
"rusty",
"rust"
],
"Address": "127.0.0.1",
"Port": 8002,
"EnableTagOverride": false,
"CreateIndex": 247,
"ModifyIndex": 491
}
}
This then means you can query the rusty service with a SRV query and get detail on which ports are available:
dig @127.0.0.1 rusty.service.consul SRV
; <<>> DiG 9.11.3 <<>> rusty.service.consul SRV
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56091
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 52, AUTHORITY: 0, ADDITIONAL: 5
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;rusty.service.consul. IN SRV
;; ANSWER SECTION:
rusty.service.consul. 0 IN SRV 1 1 8001 FB20160707.node.dc1.consul.
rusty.service.consul. 0 IN SRV 1 1 8002 FB20160707.node.dc1.consul.
If you also change the names to be unique (rusty1 and rusty2 as suggested by Ruslan) you lose this querying ability.
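As a rough illustration of the naming scheme described above, the sketch below builds one registration payload per port, keeping the service name constant and deriving the ID from name and port. The struct covers only a subset of the fields the /agent/service/register endpoint accepts, and instanceID and registrationsFor are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// registration holds a subset of the fields accepted by the
// /agent/service/register endpoint.
type registration struct {
	ID      string `json:"ID"`
	Name    string `json:"Name"`
	Address string `json:"Address"`
	Port    int    `json:"Port"`
}

// instanceID is a hypothetical naming scheme: service name plus port.
func instanceID(name string, port int) string {
	return fmt.Sprintf("%s:%d", name, port)
}

// registrationsFor returns one payload per port, all sharing the same Name.
func registrationsFor(name, address string, ports []int) []registration {
	regs := make([]registration, 0, len(ports))
	for _, p := range ports {
		regs = append(regs, registration{
			ID:      instanceID(name, p),
			Name:    name,
			Address: address,
			Port:    p,
		})
	}
	return regs
}

func main() {
	for _, r := range registrationsFor("rusty", "127.0.0.1", []int{8001, 8002}) {
		body, err := json.Marshal(r)
		if err != nil {
			panic(err)
		}
		// Each line is a JSON body suitable for one registration request.
		fmt.Println(string(body))
	}
}
```

Because the IDs differ, registering the second payload does not replace the first, while the shared name keeps both instances behind a single rusty.service.consul lookup.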
I know this answer is late, but I hope it helps someone.
As per the Spring Cloud Consul docs, add this to bootstrap.yml:
spring:
  cloud:
    consul:
      discovery:
        instanceId: ${spring.application.name}:${vcap.application.instance_id:${spring.application.instance_id:${random.value}}}

How to mount HDFS in a Docker container

I Dockerized an application and want it to be able to access files from our HDFS. The Docker image is to be deployed, via Marathon on Mesos, on the same cluster where we have HDFS installed.
Below is the JSON to be POSTed to Marathon. It seems that my app is able to read and write files in HDFS. Can someone comment on the safety of this? Would files changed by my app be correctly changed in HDFS as well? I Googled around and didn't find any similar approaches.
{
"id": "/ipython-test",
"cmd": null,
"cpus": 1,
"mem": 1024,
"disk": 0,
"instances": 1,
"container": {
"type": "DOCKER",
"volumes": [
{
"containerPath": "/home",
"hostPath": "/hadoop/hdfs-mount",
"mode": "RW"
}
],
"docker": {
"image": "my/image",
"network": "BRIDGE",
"portMappings": [
{
"containerPort": 8888,
"hostPort": 0,
"servicePort": 10061,
"protocol": "tcp"
}
],
"privileged": false,
"parameters": [],
"forcePullImage": true
}
},
"portDefinitions": [
{
"port": 10061,
"protocol": "tcp",
"labels": {}
}
]
}
You might have a look at the Docker volume docs.
Basically, the volumes definition in the app.json causes the Docker container to be started with a bind-mount flag along the lines of -v /hadoop/hdfs-mount:/home:rw, meaning that the host path /hadoop/hdfs-mount is mapped into the container as /home in read-write mode. Anything the app writes under /home therefore lands in the host directory.
You should be able to verify this if you SSH into the node which is running the app and do a docker inspect <containerId>.
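The mapping from the Marathon volume entry to the Docker flag can be sketched as follows. This is an illustration only: the volume type and bindFlag are hypothetical, and lowercasing the mode is an assumption about how the value is passed through to Docker.

```go
package main

import (
	"fmt"
	"strings"
)

// volume mirrors one entry of "volumes" in the Marathon app definition.
type volume struct {
	ContainerPath string
	HostPath      string
	Mode          string // "RW" or "RO" in the Marathon JSON
}

// bindFlag renders the volume in Docker's host:container:mode form.
// Lowercasing the mode is an assumption, not taken from the answer.
func bindFlag(v volume) string {
	return fmt.Sprintf("-v %s:%s:%s", v.HostPath, v.ContainerPath, strings.ToLower(v.Mode))
}

func main() {
	v := volume{ContainerPath: "/home", HostPath: "/hadoop/hdfs-mount", Mode: "RW"}
	fmt.Println(bindFlag(v)) // prints "-v /hadoop/hdfs-mount:/home:rw"
}
```

Comparing this string with the mounts reported by docker inspect is a quick way to confirm that the container sees the host path you expect.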
See also
https://mesosphere.github.io/marathon/docs/native-docker.html

Configuring prometheus mesos-exporter running on mesosphere DCOS

I am trying to set up mesos exporter on my Mesosphere DC/OS cluster. The link I am referring to is https://github.com/prometheus/mesos_exporter. The JSON file I have used is:
{
"id": "/mesosexporter",
"instances": 6,
"cpus": 0.1,
"mem": 25,
"constraints": [["hostname", "UNIQUE"]],
"acceptedResourceRoles": ["slave_public","*"],
"container": {
"type": "DOCKER",
"docker": {
"image": "prom/mesos-exporter",
"network": "BRIDGE",
"portMappings": [
{
"containerPort": 9105,
"hostPort": 9105,
"protocol": "tcp"
}
]
}
},
"healthChecks": [{
"protocol": "TCP",
"gracePeriodSeconds": 600,
"intervalSeconds": 30,
"portIndex": 0,
"timeoutSeconds": 10,
"maxConsecutiveFailures": 2
}]
}
But the only metric exposed to Prometheus is 'mesos_exporter_slave_scrape_errors_total'. What other metrics does mesos exporter expose to Prometheus? The README in the mesos-exporter GitHub repository says that we need to provide command-line flags, but if I want to run mesos exporter as a Docker container, how should I specify the configuration?
EDIT - The metric 'mesos_exporter_slave_scrape_errors_total' has a non-zero value, indicating that errors occurred during the scrape.
EDIT - After adding the 'parameters' primitive, my JSON file looks like:
{
"id": "/mesosexporter",
"instances": 1,
"cpus": 0.1,
"mem": 25,
"constraints": [["hostname", "UNIQUE"]],
"acceptedResourceRoles": ["slave_public"],
"container": {
"type": "DOCKER",
"docker": {
"image": "prom/mesos-exporter",
"network": "BRIDGE",
"portMappings": [
{
"containerPort": 9105,
"hostPort": 9105,
"protocol": "tcp"
}
],
"privileged": true,
"parameters": [
{ "key": "-exporter.discovery", "value": "true" },
{ "key": "-exporter.discovery.master-url",
"value": "http://mymasterDNS.amazonaws.com:5050" }
]
}
},
"healthChecks": [{
"protocol": "TCP",
"gracePeriodSeconds": 600,
"intervalSeconds": 30,
"portIndex": 0,
"timeoutSeconds": 10,
"maxConsecutiveFailures": 2
}]
}
Mesos version: 0.22.1
Marathon version: 0.8.2-SNAPSHOT
The app remains in the 'deploying' state after using this JSON.
But the only metric exposed to Prometheus is 'mesos_exporter_slave_scrape_errors_total'. What other metrics does mesos exporter expose to Prometheus?
If the mesos-exporter is listening on port 9100, you can check http://<hostname>:9100/metrics to see which metrics are being exposed (substitute the port your exporter listens on; the app definition above maps 9105). I am going by the prom/node-exporter that I have set up on one of my systems.
but if I want to run mesos exporter as a docker container how should I specify the configuration?
I am assuming you're POSTing this JSON file to the Marathon REST API to start Docker containers. If that is indeed the case, you can specify additional options using the parameters JSON directive. More information can be found in the Marathon docs under the section Privileged Mode and Arbitrary Docker Options.
Hope that helps!
Using the args primitive solved the problem. The equivalent docker command is docker run -p 9105:9105 prom/mesos-exporter -exporter.discovery=true -exporter.discovery.master-url="mymasternodeDNS:5050" -log.level=debug
As the parameters 'exporter.discovery', 'exporter.discovery.master-url' and 'log.level' are for the image entry point and not for 'docker run', 'args' has to be used.
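The difference between the two settings is easiest to see by looking at where each one ends up on the command line. The sketch below is a simplified model, not Marathon's or Mesos's actual code: dockerCommand is a hypothetical helper, and the "--key=value" rendering of parameters is an assumption about how the containerizer builds the command.

```go
package main

import (
	"fmt"
	"strings"
)

// parameter mirrors one key/value entry of "parameters" in the app JSON.
type parameter struct{ Key, Value string }

// dockerCommand sketches where each setting lands: "parameters" become
// flags for `docker run` itself (before the image name), while "args" are
// appended after the image and reach the image's entry point.
// The "--key=value" rendering is an assumption about the containerizer.
func dockerCommand(image string, params []parameter, args []string) string {
	parts := []string{"docker", "run"}
	for _, p := range params {
		parts = append(parts, fmt.Sprintf("--%s=%s", p.Key, p.Value))
	}
	parts = append(parts, image)
	parts = append(parts, args...)
	return strings.Join(parts, " ")
}

func main() {
	// Exporter flag passed as a "parameter": it lands before the image name,
	// where docker run itself has to interpret it.
	fmt.Println(dockerCommand("prom/mesos-exporter",
		[]parameter{{"-exporter.discovery", "true"}}, nil))

	// The same flag passed through "args": it lands after the image name
	// and is handed to the exporter.
	fmt.Println(dockerCommand("prom/mesos-exporter", nil,
		[]string{"-exporter.discovery=true"}))
}
```

Under this model, the first command hands the exporter flag to docker run, which does not know it, while the second passes it through to the exporter, matching the docker command given above.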
The format for 'args' as added in the working JSON is:
"args": [
"-exporter.discovery=true",
"-exporter.discovery.master-url=http://mymasternodeDNS:5050",
"-log.level=debug"]
