Consul servers without a LAN, every server connected through WAN - Consul

I'm just getting started learning Nomad and Consul.
I have several servers without a Local Area Network; they are connected through a WAN (which I think is what you mean by datacenters), so every server is a datacenter.
I found in the docs https://www.consul.io/docs/architecture that each datacenter should have 3 to 5 Consul servers, so is my case applicable with Consul and Nomad?
Should I make all of them Consul servers, or make 3 of them Consul servers and the rest Consul clients?

You could use:
3 Consul servers, or
2 Consul clients and 1 Consul server.
I have tested both of these setups and they work correctly.
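The "3 to 5 servers per datacenter" recommendation comes from Raft quorum arithmetic: a datacenter with n servers needs a majority of floor(n/2) + 1 of them to elect a leader and accept writes, so it can lose n minus that majority and keep working. In the layout from the question each server is its own datacenter, and each Consul datacenter keeps its own quorum (WAN federation does not share one). A small Python sketch of that arithmetic, with illustrative function names:

```python
def quorum_size(servers: int) -> int:
    """Majority of Raft voters needed to elect a leader and commit writes."""
    if servers < 1:
        raise ValueError("a datacenter needs at least one server")
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """How many servers can fail while the datacenter still has a quorum."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} server(s): quorum {quorum_size(n)}, "
              f"tolerates {failure_tolerance(n)} failure(s)")
```

By this arithmetic a single server per datacenter has no fault tolerance inside that datacenter, and an even number of servers tolerates no more failures than the next-lower odd number.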

Related

Nomad and consul setup

Should I run consul slaves alongside nomad slaves or inside them?
The latter might not make sense at all, but I'm asking just in case.
I brought my own nomad cluster up with consul slaves running alongside nomad slaves (inside worker nodes), my deployable artifacts are docker containers (java spring applications).
The issue with my current setup is that my applications can't access the Consul slaves (to read configuration): none of 0.0.0.0, localhost, or the worker node IP worked.
Let's say my service exposes 8080. I configured the docker part (in the HCL file) to use bridge as the network mode. Nomad maps 8080 to 43210.
Everything is fine until my service tries to reach the consul slave to read configuration. Ideally giving nomad worker node IP as consul host to Spring should suffice. But for some reason it's not.
I'm using latest version of nomad.
I configured my nomad slaves like https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/nomad/client1.hcl
And the link below shows how I configured/ran my consul slave:
https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/server2.yml
Note: if I use static port mapping and host as the network mode for docker (in Nomad), everything works, but then I can't deploy more than one instance of each application on each worker node (due to port conflicts).
Nomad jobs listen on a specific host/port pair.
You might want to ssh into the server and run docker ps to see what host/port pair the job is listening on.
a93c5cb46a3e image-name bash 2 hours ago Up 2 hours 10.0.47.2:21435->8000/tcp, 10.0.47.2:21435->8000/udp foo-bar
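Additionally, you will need to ensure that the Consul agent running as a Nomad job is listening on 0.0.0.0 or on the specific IP of the machine. I believe that is this config value: https://www.consul.io/docs/agent/options.html#_bind (note that -bind sets the address used for cluster communication, while the HTTP API and DNS listen on the address set by -client, which defaults to 127.0.0.1).
All of these need to match up in order for Consul to be reachable.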
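To read the host/port pair out of docker ps output like the sample line above, a minimal Python sketch can parse the PORTS column. It assumes the host_ip:host_port->container_port/proto format shown in that sample; the names parse_port_mappings and PortMapping are illustrative. Running docker ps --format "{{.Ports}}" prints only that column, which makes the input simpler.

```python
import re
from typing import List, NamedTuple

# Matches entries such as "10.0.47.2:21435->8000/tcp" in the PORTS column.
_MAPPING = re.compile(
    r"(?P<host_ip>[\d.]+):(?P<host_port>\d+)->(?P<container_port>\d+)/(?P<proto>tcp|udp)"
)


class PortMapping(NamedTuple):
    host_ip: str
    host_port: int
    container_port: int
    proto: str


def parse_port_mappings(ps_line: str) -> List[PortMapping]:
    """Extract every host->container port mapping from one `docker ps` line."""
    return [
        PortMapping(m["host_ip"], int(m["host_port"]), int(m["container_port"]), m["proto"])
        for m in _MAPPING.finditer(ps_line)
    ]


line = ("a93c5cb46a3e image-name bash 2 hours ago Up 2 hours "
        "10.0.47.2:21435->8000/tcp, 10.0.47.2:21435->8000/udp foo-bar")
for mapping in parse_port_mappings(line):
    print(f"reach the app at {mapping.host_ip}:{mapping.host_port} ({mapping.proto})")
```

The regex only covers IPv4 host addresses, which is what the sample output contains.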
More generally, I might recommend: if you're going to run consul with nomad, you might want to switch to host networking, so that you don't have to deal with the specifics of the networking within a container. Additionally, you could schedule consul as a system job so that it is automatically present on every host.
So I managed to solve the issue like this:
nomad.job.group.network.mode = host
nomad.job.group.network.port: port "http" {}
nomad.job.group.task.driver = docker
nomad.job.group.task.config.network_mode = host
nomad.job.group.task.config.ports = ["http"]
nomad.job.group.task.service.connect: connect { native = true }
nomad.job.group.task.env: SERVER_PORT= "${NOMAD_PORT_http}"
nomad.job.group.task.env: SPRING_CLOUD_CONSUL_HOST = "localhost"
nomad.job.group.task.env: SPRING_CLOUD_SERVICE_REGISTRY_AUTO_REGISTRATION_ENABLED = "false"
I also run the Consul agent (slave) with docker-compose alongside the Nomad agent (slave), using host as the network mode and exposing all required ports.
Example of nomad job: https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/nomad/location-update-publisher.hcl
Example of consul agent config (docker-compose file): https://github.com/bmd007/statefull-geofencing-faas/blob/master/infrastructure/server2.yml
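For reference, the dotted settings in the solution above correspond roughly to the job structure below, built as a Python dict in the shape of Nomad's JSON job format (the form accepted by the /v1/jobs HTTP endpoint). This is a sketch under assumptions: the job, group, task, service and image names, the count, and the "dc1" datacenter are hypothetical placeholders, and the field names should be checked against the Nomad JSON job specification for your version.

```python
import json


def build_job(job_id: str, image: str, service_name: str) -> dict:
    """Hypothetical job mirroring the host-networking setup described above."""
    task = {
        "Name": "app",
        "Driver": "docker",
        "Config": {
            "image": image,
            "network_mode": "host",  # container shares the host network stack
            "ports": ["http"],       # refers to the group-level port label
        },
        "Env": {
            # Nomad interpolates the dynamically allocated port at runtime.
            "SERVER_PORT": "${NOMAD_PORT_http}",
            # With host networking, the local Consul agent is on localhost.
            "SPRING_CLOUD_CONSUL_HOST": "localhost",
            # Presumably disabled because the service block below already
            # registers the app in Consul.
            "SPRING_CLOUD_SERVICE_REGISTRY_AUTO_REGISTRATION_ENABLED": "false",
        },
        "Services": [
            {"Name": service_name, "PortLabel": "http", "Connect": {"Native": True}},
        ],
    }
    group = {
        "Name": "app-group",
        "Count": 2,  # hypothetical; dynamic ports let instances share a node
        "Networks": [{"Mode": "host", "DynamicPorts": [{"Label": "http"}]}],
        "Tasks": [task],
    }
    return {"Job": {"ID": job_id, "Name": job_id, "Type": "service",
                    "Datacenters": ["dc1"], "TaskGroups": [group]}}


print(json.dumps(build_job("example-job", "example/image:latest", "example-service"), indent=2))
```

The design point is that the port is dynamic (no static value), so two allocations on the same node get different ports and no longer conflict, while host networking keeps the local Consul agent reachable on localhost.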
Disclaimer: The LAB is part of Cluster Visualization Framework called: LiteArch Trafik which I have created as an interesting exercise to understand Nomad and Consul.
It took me a long time to shift my mind from K8S to Nomad and Consul.
Integrating them was one of the efforts I spent the last year on.
When service resolution doesn't work, I found out it's more or less the DNS configuration on servers.
There is a section on it in the HashiCorp documentation called DNS Forwarding:
Hashicorp DNS Forwarding
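Consul's DNS interface listens on port 8600 by default and answers names of the form [tag.]service.service[.datacenter].consul; DNS forwarding only routes queries for the consul domain to that port. To check resolution directly against an agent, a minimal sketch using only the Python standard library builds an SRV query by hand and sends it over UDP. The agent address, port, timeout, and the service name "web" are assumptions; dig @127.0.0.1 -p 8600 web.service.consul SRV does the same from a shell.

```python
import socket
import struct
from typing import Optional


def consul_service_name(service: str, tag: Optional[str] = None,
                        datacenter: Optional[str] = None) -> str:
    """Build a Consul DNS name such as 'web.service.consul'."""
    labels = ([tag] if tag else []) + [service, "service"]
    labels += ([datacenter] if datacenter else []) + ["consul"]
    return ".".join(labels)


def build_srv_query(name: str, query_id: int = 0x1234) -> bytes:
    """Encode a minimal DNS query: one question, QTYPE SRV (33), QCLASS IN (1)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.split("."))
    return header + qname + b"\x00" + struct.pack("!HH", 33, 1)


def ask_consul(name: str, host: str = "127.0.0.1", port: int = 8600,
               timeout: float = 2.0) -> int:
    """Send the query to a Consul agent and return the number of answer records."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_srv_query(name), (host, port))
        response, _ = sock.recvfrom(4096)
    # ANCOUNT is the fourth 16-bit field of the response header.
    return struct.unpack("!HHHH", response[:8])[3]
```

With an agent running locally, ask_consul(consul_service_name("web")) returns 0 when nothing is registered under that name; if the direct query works but lookups through the system resolver do not, the problem is in the forwarding setup rather than in Consul.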
I have created a LAB which explains how to set up Nomad and Consul.
But you can also use the LAB separately.
I created the LAB after learning the hard way how to install the cluster and how to integrate Nomad and Consul.
To run the LAB you need Ubuntu Multipass installed.
You execute one script and you get a fully functional cluster locally, with three servers and three nodes.
It also shows you how to install Docker and integrate the services with Consul and the DNS services on Ubuntu.
After running the LAB you will get the links to Nomad, Fabio, and Consul.
Hopefully it will guide you through the learning process of Nomad and Consul.
LAB: LAB
Trafik:Trafik Visualizer

Can I run an integrated nomad/consul cluster with windows servers and linux clients?

I am migrating a standard all-Linux Nomad/Consul cluster in which the Nomad/Consul servers use almost no resources with our workloads. Spinning up dedicated Linux VMs just for them in our new environment seems a bit wasteful, because the environment I am moving to has multiple Windows VMs with spare capacity that I could use for the Nomad server and Consul server processes to give me the necessary redundancy.
So my question boils down to: if I run the Consul server and Nomad server processes exclusively on Windows, and the Nomad agent and Consul agent processes exclusively on Linux, will they all just get along? The Nomad jobs are all dockerized except for a native system Prometheus exporter.
Both Consul and Nomad are operating system agnostic. You can use a mix of OS's within your cluster without issue. The main requirement is that you have direct IP connectivity between the agents (i.e., no NAT), low latency (sub 10ms), and the required ports opened for Consul and/or Nomad agent communication.
See https://www.consul.io/docs/install/ports and https://www.nomadproject.io/docs/install/production/requirements#ports-used for more detail.
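Before mixing Windows servers and Linux clients, it can help to confirm that the required TCP ports are actually reachable between the machines. A minimal Python sketch using only the standard library; the port table reflects the documented defaults (Consul: 8300 server RPC, 8301 Serf LAN, 8302 Serf WAN, 8500 HTTP, 8600 DNS; Nomad: 4646 HTTP, 4647 RPC, 4648 Serf), and the host name in the usage comment is hypothetical. Which ports must be open depends on each agent's role (see the links above). The check covers TCP only: Serf and DNS also use UDP, which a connect-style probe cannot verify.

```python
import socket
from typing import Dict, Iterable

# Default TCP ports from the Consul and Nomad port documentation.
DEFAULT_PORTS: Dict[str, int] = {
    "consul-server-rpc": 8300,
    "consul-serf-lan": 8301,
    "consul-serf-wan": 8302,
    "consul-http": 8500,
    "consul-dns": 8600,
    "nomad-http": 4646,
    "nomad-rpc": 4647,
    "nomad-serf": 4648,
}


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_host(host: str,
               ports: Iterable[int] = tuple(DEFAULT_PORTS.values())) -> Dict[int, bool]:
    """Probe each TCP port on one host; UDP reachability is NOT covered."""
    return {port: tcp_reachable(host, port) for port in ports}


# Example (hypothetical host name), run from a Linux client:
#   print(check_host("win-consul-1.example.internal"))
```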

Why is Consul client necessary?

For Eureka, services can register themselves with the Eureka server directly. Why should we send requests to a Consul client instead of a Consul server? Are there any problems with letting services communicate with the Consul server directly?
Appreciate your help, thanks!
No, there is no problem in communicating directly to the servers.
Consul clients are used in big datacenters with many (5+) Consul agents. The Consul developers recommend three to five server agents per datacenter. If you need more agents (for hundreds of microservices, for example), then you should use client agents connected to the server agents instead of launching more server agents, which would decrease performance.
But in a smaller datacenter there is no problem using server agents directly.
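Whichever agent a service talks to, client or server, registration goes through that agent's HTTP API (PUT /v1/agent/service/register, port 8500 by default), and the registration and its health checks then live on that specific agent. A minimal sketch that only builds such a request with the Python standard library; the agent address, service name, port, and health-check URL are hypothetical.

```python
import json
import urllib.request


def registration_request(agent: str, name: str, port: int,
                         health_url: str) -> urllib.request.Request:
    """Build (but do not send) a service registration for a Consul agent."""
    payload = {
        "Name": name,
        "Port": port,
        # The agent itself runs this HTTP check at the given interval.
        "Check": {"HTTP": health_url, "Interval": "10s"},
    }
    return urllib.request.Request(
        url=f"{agent}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = registration_request("http://127.0.0.1:8500", "service-a", 8080,
                           "http://127.0.0.1:8080/actuator/health")
print(req.get_method(), req.full_url)
# To register against a running agent: urllib.request.urlopen(req)
```

Because the agent that receives the registration is the one that owns it, pointing services at a local agent (client or server) keeps the health checks close to the service.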

Achieve Fault Tolerance with Consul Cluster

I have created a Consul server cluster on localhost using different ports.
I used the commands below for that.
server 1:
consul agent -server -bootstrap-expect=3 -data-dir=consul-data -ui -bind=127.0.0.1 -dns-port=8601 -http-port=8501 -serf-lan-port=8303 -serf-wan-port=8304 -server-port=8305 -node=node1
server 2:
consul agent -server -bootstrap-expect=3 -data-dir=consul-data2 -ui -bind=127.0.0.1 -dns-port=8602 -http-port=8502 -serf-lan-port=8306 -serf-wan-port=8307 -server-port=8308 -node=node2 -join=127.0.0.1:8303
server 3:
consul agent -server -bootstrap-expect=3 -data-dir=consul-data1 -ui -bind=127.0.0.1 -node=node3 -join=127.0.0.1:8303
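When several servers share 127.0.0.1, every listener (DNS, HTTP, Serf LAN, Serf WAN, server RPC) needs its own port, as in the commands above. A small Python sketch that generates such argument lists, using only the flags already shown in those commands; the base ports and the per-node offset scheme are assumptions (chosen so that node1 and node2 match the first two commands), and every later node joins node1's Serf LAN port.

```python
from typing import List


def local_server_args(n: int, bind: str = "127.0.0.1") -> List[List[str]]:
    """Argument lists for n Consul servers sharing one IP, each on distinct ports."""
    commands = []
    first_serf_lan = 8303
    for i in range(n):
        serf_lan = first_serf_lan + 3 * i  # LAN, WAN and server ports per node
        args = [
            "consul", "agent", "-server", f"-bootstrap-expect={n}",
            f"-data-dir=consul-data{i + 1}", "-ui", f"-bind={bind}",
            f"-dns-port={8601 + i}", f"-http-port={8501 + i}",
            f"-serf-lan-port={serf_lan}", f"-serf-wan-port={serf_lan + 1}",
            f"-server-port={serf_lan + 2}", f"-node=node{i + 1}",
        ]
        if i > 0:
            # Every later node joins the first node's Serf LAN listener.
            args.append(f"-join={bind}:{first_serf_lan}")
        commands.append(args)
    return commands


for cmd in local_server_args(3):
    print(" ".join(cmd))
```

Each printed line can be run in its own terminal. Note that every node's HTTP port differs, which is exactly why each Spring service ends up tied to one particular server.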
Then I created 2 microservices using spring boot, called service_A and service_B.
Service_B calls service_A to get some data.
Both services get registered with one of the above servers.
In application.properties:
spring.cloud.consul.port=8501 #For service_A
spring.cloud.consul.port=8502 #For service_B
This works fine as Service_B discovers Service_A without any problem.
Now, when I kill the Consul server that service_A registered with, the system fails to return results because service_B cannot find service_A.
How should I make this system fault tolerant? That is, even if a Consul server fails, the services that registered with it should automatically get registered with another server that is available in the cluster.
Further, I need to know how Consul achieves high availability and fault tolerance in service registration and discovery. Hope you get the question.
Apparently, you can deploy a Consul cluster on your local machine, but you cannot expect any resilience mechanism or fault tolerance on that same local machine. This is because your Spring services (service_A & service_B) have been configured in bootstrap.yml to find the Consul server that runs on the given port (default 8500).
spring:
  cloud:
    consul:
      config:
        watch:
          enabled: true
      port: 8500
      discovery:
        instanceId: ${spring.application.name}:${random.value}
So each service will discover the Consul server that runs on port 8500 (you can change it as you wish). If you run your Consul cluster on the same local machine, you cannot assign the same port number (8500) to each cluster node that needs to be identified; the ports have to differ for the nodes to run under the same IP address. To achieve fault tolerance, you will need to deploy each Consul node under a different IP address with the same port number 8500.
8301 is the Serf LAN port used to handle gossip in the LAN. This port can also be the same on each node to maintain the cluster interconnection.
The easiest way to achieve this is to use a private subnet in an AWS VPC.
Then you can assign separate configurations to each node in the subnet, with the same port number for each server node, so that it can be identified by your service_A & service_B with the @EnableDiscoveryClient annotation.
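The idea of several nodes on different IP addresses but the same port can be illustrated from the client side: keep a list of Consul agent addresses and use the first one that answers. A minimal Python sketch, assuming hypothetical addresses in a private subnet and a caller-supplied probe function (for example a TCP connect, or an HTTP request to /v1/status/leader); it illustrates the idea only and is not how Spring Cloud Consul selects its agent.

```python
from typing import Callable, Optional, Sequence

CONSUL_PORT = 8500  # same HTTP port on every node, as suggested above


def first_available(hosts: Sequence[str], probe: Callable[[str, int], bool],
                    port: int = CONSUL_PORT) -> Optional[str]:
    """Return 'host:port' for the first agent the probe reports as healthy."""
    for host in hosts:
        if probe(host, port):
            return f"{host}:{port}"
    return None  # no agent reachable; the caller decides what to do


# Hypothetical private-subnet addresses; the first node is treated as down.
nodes = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
down = {"10.0.1.10"}
print(first_available(nodes, lambda host, port: host not in down))
```

Keeping the port identical on every node is what makes this simple: only the IP changes between candidates.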

HA for the local Consul agent with Docker-Swarm

In my microservices system I plan to use docker swarm and Consul.
In order to ensure the high availability of Consul I’m going to build a cluster of 3 server agents (along with a client agent per node), but this doesn’t save me from local consul agent failure.
Am I missing something?
If not, how can I configure swarm to be aware of more than 1 consul agents?
Consul is the only service discovery backend that doesn't support multiple endpoints when used with Swarm.
Both zookeeper and etcd support the etcd://10.0.0.4,10.0.0.5 format of providing multiple Ip's for the "cluster" of discovery back-ends while using Swarm.
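The multi-endpoint format mentioned above packs several addresses into one discovery URL. A minimal Python sketch of how such a string splits into individual host/port pairs; the function name and the default ports used in the example (2379 is etcd's usual client port) are assumptions for illustration, and this is not Swarm's own parser. IPv6 addresses are not handled.

```python
from typing import List, Tuple


def parse_discovery_url(url: str, default_port: int) -> Tuple[str, List[Tuple[str, int]]]:
    """Split 'scheme://h1[:p1],h2[:p2][/path]' into (scheme, [(host, port), ...])."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"missing scheme in {url!r}")
    hosts_part = rest.split("/", 1)[0]  # drop an optional path such as /swarm
    endpoints = []
    for item in hosts_part.split(","):
        host, _, port = item.strip().partition(":")
        endpoints.append((host, int(port) if port else default_port))
    return scheme, endpoints


print(parse_discovery_url("etcd://10.0.0.4,10.0.0.5", default_port=2379))
```

A consul:// URL with a single address parses the same way, which is the limitation described above: there is only one endpoint to fall back on.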
To answer your question of how you can configure Swarm to support more than one Consul (server): I don't have a definitive answer, but I can point you in a direction and suggest something you can test (no guarantees):
One suggestion worth testing (which is not recommended for production) is to use a Load Balancer that can pass your requests from the Swarm manager to one of the three consul servers.
So when starting the swarm managers you can point to consul://ip_of_loadbalancer:port
This will, however, make the LB a bottleneck: if it goes down, the Swarm managers lose access to all three Consul servers.
I have not tested the above and can't answer if it will work or not - it is merely a suggestion.
