I am very new to Consul and have been reading about Consul clustering recently. My understanding is that on each node (a physical machine or VM) we run a local Consul agent in client mode, and any microservices running on that node register themselves through this agent. But what happens if this one and only agent goes down? Won't the microservices on that node be unable to register anymore? Or should we run more than one Consul agent (in client mode) per node to handle such a situation?
You are correct. If the Consul agent is down, the services on that host will not be able to register with the agent, and Consul will consider all services which were previously registered against the agent to be unavailable.
A very simple solution is to run Consul under a process manager like systemd, and configure systemd to restart the agent if the process unexpectedly fails. You can find an example systemd unit for this at https://learn.hashicorp.com/tutorials/consul/deployment-guide#configure-systemd. If Consul is installed from the HashiCorp Linux package repo (https://learn.hashicorp.com/tutorials/consul/get-started-install), this systemd unit will be included as part of the installation package.
Should I run consul slaves alongside nomad slaves or inside them?
The latter might not make sense at all, but I'm asking just in case.
I brought my own nomad cluster up with consul slaves running alongside nomad slaves (inside worker nodes), my deployable artifacts are docker containers (java spring applications).
The issue with my current setup is that my applications can't reach the consul slaves to read configuration (none of 0.0.0.0, localhost, or the worker node IP worked).
Let's say my service exposes 8080. I configured the docker part (in the hcl file) to use bridge as the network mode, and Nomad maps 8080 to 43210.
Everything is fine until my service tries to reach the consul slave to read configuration. Ideally, giving the nomad worker node IP to Spring as the consul host should suffice, but for some reason it doesn't.
I'm using latest version of nomad.
I configured my nomad slaves like https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/nomad/client1.hcl
And the link below shows how I configured/ran my consul slave:
https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/server2.yml
Note: if I use static port mapping and host as the network mode for docker (in nomad), everything works, but then I can't deploy more than one instance of each application on a worker node (due to port conflicts).
Nomad jobs listen on a specific host/port pair.
You might want to ssh into the server and run docker ps to see what host/port pair the job is listening on.
a93c5cb46a3e image-name bash 2 hours ago Up 2 hours 10.0.47.2:21435->8000/tcp, 10.0.47.2:21435->8000/udp foo-bar
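Additionally, you will need to ensure that the Consul agent is listening on 0.0.0.0 or on the specific IP of the machine, not just on localhost. I believe the relevant setting for the HTTP API is the client address (-client / client_addr, which defaults to 127.0.0.1); the bind address (https://www.consul.io/docs/agent/options.html#_bind) only covers cluster-internal traffic.
All of those will need to match up in order for Consul to be reachable.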
More generally, I might recommend: if you're going to run consul with nomad, you might want to switch to host networking, so that you don't have to deal with the specifics of the networking within a container. Additionally, you could schedule consul as a system job so that it is automatically present on every host.
So I managed to solve the issue like this:
nomad.job.group.network.mode = host
nomad.job.group.network.port: port "http" {}
nomad.job.group.task.driver = docker
nomad.job.group.task.config.network_mode = host
nomad.job.group.task.config.ports = ["http"]
nomad.job.group.task.service.connect: connect { native = true }
nomad.job.group.task.env: SERVER_PORT= "${NOMAD_PORT_http}"
nomad.job.group.task.env: SPRING_CLOUD_CONSUL_HOST = "localhost"
nomad.job.group.task.env: SPRING_CLOUD_SERVICE_REGISTRY_AUTO_REGISTRATION_ENABLED = "false"
I run the consul agents (slaves) with docker-compose alongside the nomad agents (slaves), with host as the network mode and all required ports exposed.
Example of nomad job: https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/nomad/location-update-publisher.hcl
Example of consul agent config (docker-compose file): https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/server2.yml
Disclaimer: the LAB is part of a cluster visualization framework called LiteArch Trafik, which I created as an exercise to understand Nomad and Consul.
It took me a long time to shift my thinking from K8S to Nomad and Consul;
integrating them was one of the things I spent effort on over the last year.
When service resolution doesn't work, I have found that the cause is more or less always the DNS configuration on the servers.
The HashiCorp documentation has a section on this called DNS Forwarding.
I have created a LAB which explains how to set up Nomad and Consul.
But you can use the LAB separately.
I created the LAB after learning the hard way how to install the cluster and how to integrate Nomad and Consul.
To use the LAB you need Ubuntu Multipass installed.
You execute one script and get a fully functional local cluster with three servers and three nodes.
It also shows you how to install docker and integrate the services with Consul and the DNS services on Ubuntu.
After running the LAB you will get the links to Nomad, Fabio, and Consul.
Hopefully it will guide you through the learning process of Nomad and Consul.
LAB: LAB
Trafik:Trafik Visualizer
I am migrating a standard all-linux nomad/consul cluster. With our workloads the nomad/consul servers use almost no resources, so spinning up dedicated linux VMs just for them in our new environment seems a bit wasteful. The environment I am moving to has multiple windows VMs with spare capacity, which I could use for the nomad server and consul server processes to get the necessary redundancy.
So my question boils down to: If I have the consul server and nomad server processes exclusively on windows and the nomad agent and consul agent processes exclusively on linux-- will they all just get along? The nomad jobs are all dockerized except for a native system prometheus exporter.
Both Consul and Nomad are operating system agnostic. You can use a mix of OS's within your cluster without issue. The main requirement is that you have direct IP connectivity between the agents (i.e., no NAT), low latency (sub 10ms), and the required ports opened for Consul and/or Nomad agent communication.
See https://www.consul.io/docs/install/ports and https://www.nomadproject.io/docs/install/production/requirements#ports-used for more detail.
There is a Consul cluster in my local environment, and some developers' local machines as well. Each developer has a Tomcat server which runs some web artifacts in Docker container, so I want to register these artifacts as services on Tomcat deploy.
Assuming that we have already registered an empty node for each developer's local machine, how can I register/deregister a new service on an existing node? Do I need a consul agent running on any node?
I know it's possible to add service when registering node, but haven't found any info about how to add services to node dynamically. I'd prefer HTTP API if possible (it's much easier to run on local machines).
Do I need a consul agent running on any node?
Yes. You can also register external services for a remote machine with a plain curl request, but service discovery will benefit from having the agent running on the nodes as well.
I know it's possible to add service when registering node, but haven't found any info about how to add services to node dynamically.
Registering a service is fairly easy on consul and you can find more details at the following link:
https://www.consul.io/intro/getting-started/services.html
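Since you asked for the HTTP API: as a minimal sketch, the two calls against the local agent look roughly like this. It assumes an agent on the default HTTP address localhost:8500; the service name, ID, and port are made-up examples, and the functions only build the requests so you can inspect them before sending.

```python
import json
import urllib.request

# Assumption: a Consul agent is reachable on its default HTTP address.
AGENT = "http://localhost:8500"


def register_request(name, service_id, port, agent=AGENT):
    """Build (but do not send) the PUT request that registers a service."""
    body = json.dumps({"ID": service_id, "Name": name, "Port": port}).encode()
    return urllib.request.Request(
        f"{agent}/v1/agent/service/register",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def deregister_request(service_id, agent=AGENT):
    """Build (but do not send) the PUT request that removes a service by ID."""
    return urllib.request.Request(
        f"{agent}/v1/agent/service/deregister/{service_id}",
        method="PUT",
    )
```

To actually send one, pass it to urllib.request.urlopen, for example urllib.request.urlopen(register_request("tomcat-app", "tomcat-app-1", 8080)) from a deploy hook, and the matching deregister_request("tomcat-app-1") on undeploy.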
However, if you wish to give better isolation to your developers, I would recommend running the consul agent server/client in docker and let registrator take care of everything.
Registrator from gliderlabs is a service registry bridge for Docker. It automatically registers and deregisters services for any Docker container by inspecting containers as they come online.
You can find more details here: https://github.com/gliderlabs/registrator
In my microservices system I plan to use docker swarm and Consul.
In order to ensure the high availability of Consul I’m going to build a cluster of 3 server agents (along with a client agent per node), but this doesn’t save me from local consul agent failure.
Am I missing something?
If not, how can I configure swarm to be aware of more than 1 consul agents?
Consul is the only service discovery backend that doesn't support multiple endpoints when used with Swarm.
Both ZooKeeper and etcd accept a comma-separated list such as etcd://10.0.0.4,10.0.0.5 to provide multiple IPs for the "cluster" of discovery backends while using Swarm.
As for how to configure Swarm to support more than one Consul (server): I don't have a definitive answer, but I can point you in a direction that you can test (no guarantees).
One suggestion worth testing (which is not recommended for production) is to use a load balancer that passes requests from the Swarm manager to one of the three Consul servers.
So when starting the Swarm managers you can point them to consul://ip_of_loadbalancer:port
This will, however, make the LB a single point of failure (if it goes down, discovery goes with it).
I have not tested the above and can't answer if it will work or not - it is merely a suggestion.
I am playing a little with Docker and Consul and I have a couple of questions regarding agent-service mapping, especially in a Docker environment. Assume I have a service named "myGreatService", a simple Node.js hello-world web application packaged in a Docker image named "myGreatServiceImage". From the Consul docs I understood that when you register a service (through HTTP or a service definition file), the service is "wired" to an agent/Consul node (the wired node can be retrieved via /v1/catalog/service/). So if a Consul node is down (or the node health check decides it is down), then all services "wired" to that Consul node will automatically be marked as down. Am I right?
If I run the myGreatServiceImage image multiple times on a single host via Docker (resulting in multiple instances of the "myGreatService" service),
how many agents shall I run?
A single one per host managing all containers (all service instances) on that host? Or maybe a separate agent for each container (service instance)?
If a health check for a service fails, the service will be marked as down and won't show up if you do a DNS query for that service (note that the agent's DNS interface listens on port 8600 by default, not 8500):
dig @localhost -p 8600 apache.service.consul
If you call the API you will see that the service is still listed. This is because the service is not removed, it is just marked as down. If you make an API call to check the health of that service, it will be shown as down.
curl localhost:8500/v1/catalog/service/apache
curl localhost:8500/v1/health/service/apache
You can add the ?passing query parameter to that last call to receive only the healthy services (just like the DNS query):
curl 'localhost:8500/v1/health/service/apache?passing'
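To make the difference between the plain health call and the ?passing one concrete, here is a small sketch that applies the same filter on the client side. The sample data is hand-written in the general shape of a health response (a list of entries, each with a "Checks" list carrying a "Status"); it is not real output from a cluster.

```python
def passing_only(entries):
    """Keep only entries whose checks all report "passing"."""
    return [
        entry
        for entry in entries
        if all(check.get("Status") == "passing" for check in entry.get("Checks", []))
    ]


# Hypothetical, trimmed-down sample of two instances of the apache service.
sample = [
    {"Service": {"ID": "apache-1"}, "Checks": [{"Status": "passing"}]},
    {
        "Service": {"ID": "apache-2"},
        "Checks": [{"Status": "passing"}, {"Status": "critical"}],
    },
]

healthy_ids = [entry["Service"]["ID"] for entry in passing_only(sample)]
print(healthy_ids)  # ['apache-1']
```

The second instance is dropped because one of its checks is critical, which is why it would still appear in the unfiltered health call but not in the ?passing one.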
If the consul agent on the host fails then all services running on that host won't show up if you query consul for the services. (either via a dns query or via the api).
As for the number of agents you should be running: Run one consul agent per host. Let your services register themselves via the api of your local consul agent. (or preconfigure all your services in the config files, but I recommend you to make this a dynamic process of self registering)
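For the case of several containers of the same image on one host, a rough sketch of what each instance would send to the single local agent (as the body of a PUT to /v1/agent/service/register) is below. The point is that all instances share the service name while each gets its own ID, since IDs have to be unique per agent. The host ports and the name-plus-port ID scheme are just assumptions for illustration.

```python
def instance_definitions(name, ports):
    """Build one registration payload per container running on this host."""
    return [
        # Same Name for every instance, distinct ID per instance.
        {"ID": f"{name}-{port}", "Name": name, "Port": port}
        for port in ports
    ]


# Hypothetical host ports that Docker mapped for three containers.
definitions = instance_definitions("myGreatService", [32768, 32769, 32770])
for definition in definitions:
    print(definition)
```

A query for myGreatService then returns all three instances, and each one can be deregistered individually by its ID when its container stops.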