I currently have 2 services running on a single node (EC2 instance) with a consul client. I would like to health check both of these by hitting a single endpoint, namely: http://localhost:8500/v1/agent/health/service/id/AliasService based on the information Consul provides from https://www.consul.io/api/agent/service.html.
The issue is that I can't seem to find any documentation regarding this AliasService, just that I can use it to run health checks. I've tried putting it into my service definitions, but to no avail; Consul just seems to ignore it altogether.
It seems that what you need is to manually define both services and then attach a regular health check (a TCP check in the example below) to one of them and an alias check to the other. What is being aliased here is not the service but the health check.
For example:
$ consul services register -name ssh -port 22
$ consul services register -name ssh-alias -port 22 -address 172.17.0.1
$ cat >ssh-check.json
{
  "ID": "ssh",
  "Name": "SSH TCP on port 22",
  "ServiceID": "ssh",
  "TCP": "localhost:22",
  "Interval": "10s",
  "Timeout": "1s"
}
$ curl --request PUT --data @ssh-check.json http://127.0.0.1:8500/v1/agent/check/register
$ cat >ssh-alias-check.json
{
  "ID": "ssh-alias",
  "Name": "SSH TCP on port 22 - alias",
  "ServiceID": "ssh-alias",
  "AliasService": "ssh"
}
$ curl --request PUT --data @ssh-alias-check.json http://127.0.0.1:8500/v1/agent/check/register
Here I have defined two separate services and two health checks, but only the first health check does actual work; the second aliases the health status from one service to the other.
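With both checks registered, you can verify the setup through the agent health endpoint mentioned in the question (a sketch; the service IDs match the registrations above):
$ curl http://localhost:8500/v1/agent/health/service/id/ssh
$ curl http://localhost:8500/v1/agent/health/service/id/ssh-alias
The second call should mirror whatever status the TCP check on ssh reports, since the alias is its only check.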
Related
I want to fetch service health information from Consul. How can I query a service with curl when there is a space in both its name and its tag?
One more question: curl --get http://127.0.0.1:8500/v1/health/checks/$service returns the service-level checks. I want to check whether a node-level check is failing for a service or not. How do I do that?
curl --get http://127.0.0.1:8500/v1/health/checks/$service --data-urlencode 'filter=Status == "critical"'
Here, if the service name and tag are both "ldisk gd", then this command throws:
curl: (6) Could not resolve host: gd; Name or service not known
Passing the name within quotes doesn't work either; that returns a Bad Request error.
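This is standard URL handling rather than anything Consul-specific, but one way to deal with the space (a sketch; "ldisk gd" is the example name and <node-name> is a placeholder) is to percent-encode it in the path and quote the URL so the shell does not split it. Node-level checks can be listed per node via /v1/health/node/<node>, which also accepts a filter expression on recent Consul versions:
$ curl --get "http://127.0.0.1:8500/v1/health/checks/ldisk%20gd" --data-urlencode 'filter=Status == "critical"'
$ curl --get "http://127.0.0.1:8500/v1/health/node/<node-name>" --data-urlencode 'filter=Status == "critical"'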
Let's say I registered a service in Consul, so that I can query it with something like:
curl http://localhost:8500/v1/catalog/service/BookStore.US
and it returns
[
  {
    "ID": "xxxxx-xxx-...",
    "ServiceName": "BookStore.US",
    ...
  }
]
If I use Consul directly in my code, it is fine. But the problem is that when I want to use the SRV record directly, it does not work.
Normally, Consul creates a DNS record with the name service_name.service.consul; in the above case, that would be "BookStore.US.service.consul",
so you can use the "dig" command to query it:
dig @127.0.0.1 -p 8600 BookStore.US.service.consul SRV
But when I tried to "dig" it, it failed with an empty ANSWER section.
My questions:
How does Consul construct the service/SRV name (does it pick some fields from the registered service record and concatenate them)?
Is there any way to search the SRV records with wildcards, so that I can at least find the SRV name using the keyword "BookStore"?
The SRV lookup is not working because Consul interprets the . in the service name as a domain separator in the hostname.
Per https://www.consul.io/docs/discovery/dns#standard-lookup, service lookups in Consul can use the following format.
[tag.]<service>.service[.datacenter].<domain>
The tag and datacenter components are optional. The other components must be specified. Given the name BookStore.US.service.consul, Consul interprets the components to be:
Tag: BookStore
Service: US
Sub-domain: service
TLD: consul
Since you do not have a service registered by the name US, the DNS server correctly responds with zero records.
In order to resolve this, you can do one of two things.
Register the service with a different name, such as bookstore-us.
{
  "Name": "bookstore-us",
  "Port": 1234
}
Specify the US location as a tag in the service registration.
{
  "Name": "bookstore",
  "Tags": ["us"],
  "Port": 1234
}
Note that in either case, the service name should be a valid DNS label. That is, it may contain only the ASCII letters a through z (in a case-insensitive manner), the digits 0 through 9, and the hyphen-minus character ('-').
The SRV query should then successfully return a result for the service lookup.
# Period in hostname changed to a hyphen
$ dig -t SRV bookstore-us.service.consul +short
# If `US` is a tag:
# Standard lookup
$ dig -t SRV us.bookstore.service.consul +short
# RFC 2782-style lookup
$ dig -t SRV _bookstore._us.service.consul +short
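For completeness, either registration payload above can also be submitted to the local agent over the HTTP API instead of the CLI (a sketch; bookstore.json is a hypothetical file holding one of the JSON bodies shown above):
$ curl --request PUT --data @bookstore.json http://127.0.0.1:8500/v1/agent/service/register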
I'm using Consul to monitor service health status. I use the consul watch command to fire a handler when some service fails. Currently I'm using this command:
consul watch -type=checks -state=passing /home/consul/health.sh
This works; however, I'd like to know inside health.sh the name of the failed service, so I can send a proper alert message containing the failed service name. How can I get the failed service name there?
Your script can get all the required information by reading from stdin. The information is sent as JSON. You can easily examine those events by simply adding cat - | jq . to your handler.
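For example, a minimal health.sh along those lines could be (a sketch; it assumes jq is installed, and the log path is just an example):
#!/usr/bin/env bash
# Dump the JSON payload that consul watch pipes in on stdin,
# so you can see which fields are available.
cat - | jq . >> /tmp/consul-watch-events.log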
The check information output by consul watch -type=checks contains a ServiceName field that contains the name of the service the check is associated with.
[
  {
    "Node": "foobar",
    "CheckID": "service:redis",
    "Name": "Service 'redis' check",
    "Status": "passing",
    "Notes": "",
    "Output": "",
    "ServiceID": "redis",
    "ServiceName": "redis"
  }
]
(See https://www.consul.io/docs/dynamic-app-config/watches#checks for official docs.)
Checks associated with services should have values in both the ServiceID and ServiceName fields. These fields will be empty for node level checks.
The following command watches changes in health checks, and outputs the name of a service when its check transitions to a state other than "passing" (i.e., warning or critical).
$ consul watch -type=checks -state=any "jq --raw-output '.[] | select(.ServiceName!=\"\" and .Status!=\"passing\") | .ServiceName'"
Is it possible to use existing LetsEncrypt certificates (.pem format) in Traefik?
I have Traefik/Docker set up to generate acme.json - can I import my existing certificates for a set of domains?
Eventually I found the correct solution: not to use Traefik's ACME integration but instead to simply mount a network volume (EFS) containing certificates issued by certbot in manual mode.
Why was this my chosen method? Because I'm mounting that certificate-holding NFS volume on two servers (blue and green). These are the live and staging web servers. At any time one will be "live" and the other can be either running a release candidate or otherwise in a "hot standby" role.
For this reason, it is better to separate concerns and have a third server run as a dedicated "certificate manager". This t2.nano server will basically never be touched and has the sole responsibility of running certbot once a week, writing the certificates into an NFS mount that is shared (in read-only mode) by the two web servers.
In this way, Traefik runs on both blue and green servers, takes care of its primary concern of proxying web traffic, and simply points to the certbot-issued certificate files. For those who found this page and could benefit from the same solution, here is the relevant extract from my traefik.toml file:
defaultEntryPoints = ["https", "http"]

[docker]
watch = true
exposedbydefault = false
swarmMode = true

[entryPoints]
  [entryPoints.http]
  address = ":80"
    [entryPoints.http.redirect]
    entryPoint = "https"
  [entryPoints.https]
  address = ":443"
    [entryPoints.https.tls]
      [[entryPoints.https.tls.certificates]]
      certFile = "/cert.pem"
      keyFile = "/privkey.pem"
Here is the relevant section from my Docker swarm stack file:
version: '3.2'

volumes:
  composer:

networks:
  traefik:
    external: true

services:
  proxy:
    image: traefik:latest
    command: --docker --web --docker.swarmmode --logLevel=DEBUG
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./traefik.toml:/traefik.toml
      - "./certs/live/example.com/fullchain.pem:/cert.pem"
      - "./certs/live/example.com/privkey.pem:/privkey.pem"
    networks:
      - traefik
And finally here is the command that cron runs once a week on the dedicated cert server, configured to use ACME v2 for wildcard certs and Route 53 integration for challenge automation:
sudo docker run -it --rm --name certbot \
-v `pwd`/certs:/etc/letsencrypt \
-v `pwd`/lib:/var/lib/letsencrypt \
-v `pwd`/log:/var/log/letsencrypt \
--env-file ./env \
certbot/dns-route53 \
certonly --dns-route53 \
--server https://acme-v02.api.letsencrypt.org/directory \
-d example.com \
-d example.net \
-d '*.example.com' \
-d '*.example.net' \
--non-interactive \
-m me@example.org \
--agree-tos
The folder certs is the NFS volume shared between the three servers.
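The shared certs folder can be attached on the web servers as a plain NFS4 mount of the EFS file system, read-only on blue and green (a sketch; the file system ID, region, and mount point are placeholders):
$ sudo mount -t nfs4 -o nfsvers=4.1,ro fs-12345678.efs.us-east-1.amazonaws.com:/ ./certs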
Presumably everyone wonders why you would actually want or need this, and will advise against it because Traefik handles the automatic re-challenge very well; nevertheless, here is what acme.json looks like:
{
  "Account": {
    "Email": "acme@example.com",
    "Registration": {
      "body": {
        "status": "valid",
        "contact": [
          "mailto:acme@example.com"
        ]
      },
      "uri": "https://acme-v02.api.letsencrypt.org/acme/acct/12345678"
    },
    "PrivateKey": "ABCD...EFG="
  },
  "Certificates": [
    {
      "Domain": {
        "Main": "example.com",
        "SANs": null
      },
      "Certificate": "ABC...DEF=",
      "Key": "ABC...DEF"
    },
    {
      "Domain": {
        "Main": "anotherexample.com",
        "SANs": null
      },
      "Certificate": "ABC...DEF==",
      "Key": "ABC...DEF=="
    }
  ],
  "HTTPChallenges": {}
}
Now it's up to you to write an import script or template parser that loops through your certificates/keys and puts the content into those fields, or just generate your own JSON file in that format.
Notice that it is a slightly different format from the classic PEM style: the content is base64-encoded, with no line breaks.
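If you go the script route, each field's content can be produced by base64-encoding the whole PEM file and stripping the line breaks, the same conversion shown in the answer below (a sketch; the paths are examples and -w 0 is the GNU coreutils flag for unwrapped output):
$ base64 -w 0 /etc/letsencrypt/live/example.com/fullchain.pem   # -> "Certificate"
$ base64 -w 0 /etc/letsencrypt/live/example.com/privkey.pem     # -> "Key"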
I just want to add that I have meanwhile found another approach, for a case where I needed to use HAProxy:
run one Traefik instance only for ACME on port 81 behind HAProxy
acl acme path_beg -i /.well-known/acme-challenge/
use_backend acme if acme
have a small webserver providing the acme.json (e.g. shell2http)
have a cronjob (I use GitLab CI) which downloads the acme.json and extracts the certs (https://github.com/containous/traefik/blob/821ad31cf6e7721ba231164ab468ee554983560f/contrib/scripts/dumpcerts.sh)
append the certs to a traefik.toml template
build a Docker image and push it to a private registry
run this private Traefik instance as the main Traefik on 80/443 in read-only mode on any backend servers behind HAProxy
write a file-watcher script which restarts all read-only Traefiks when acme.json changes and triggers the dumpcerts script (see the sketch below)
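The file-watcher from the last step can be as simple as an inotify loop (a sketch; it assumes inotify-tools is installed, and the paths, container name, and dumpcerts.sh location are placeholders):
#!/usr/bin/env bash
# Re-extract the certs and restart the read-only traefik whenever acme.json changes.
while inotifywait -e modify,close_write /data/acme.json; do
    ./dumpcerts.sh /data/acme.json /data/certs/
    docker restart traefik-ro
done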
Completing sgohl's reply:
IMPORTANT: make sure that the private key is 4096 bits long. A 2048-bit key will NOT work, and Traefik will try to request a new certificate from Let's Encrypt.
1. Identify your certificates (issued by certbot in this example). Example:
/etc/letsencrypt/live
├── example.com
│ ├── cert.pem -> ../../archive/example.com/cert6.pem
│ ├── chain.pem -> ../../archive/example.com/chain6.pem
│ ├── fullchain.pem -> ../../archive/example.com/fullchain6.pem
│ ├── privkey.pem -> ../../archive/example.com/privkey6.pem
│ └── README
2. Check that the private key is 4096 bits long:
openssl rsa -text -in privkey.pem | grep bit
Expected result similar to:
writing RSA key
RSA Private-Key: (4096 bit, 2 primes)
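If the key turns out to be 2048 bits instead, certbot can re-issue the certificate with a larger key via its --rsa-key-size option (a sketch; the domain and authenticator settings are placeholders, reuse whatever you normally pass to certbot):
$ certbot certonly --rsa-key-size 4096 -d example.com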
3. Generate strings containing the desired certificates.
3.1. Certificate
_IN=/etc/letsencrypt/live/example.com/fullchain.pem
_OUT=~/traefik_certificate
cat $_IN | base64 | tr '\n' ' ' | sed --expression='s/\ //g' > $_OUT
3.2. Private key
_IN=/etc/letsencrypt/live/example.com/privkey.pem
_OUT=~/traefik_key
cat $_IN | base64 | tr '\n' ' ' | sed --expression='s/\ //g' > $_OUT
4. Prepare a code snippet containing the above strings:
...
"Certificates": [
  {
    "domain": {
      "main": "example.com"
    },
    "certificate": "CONTENT_OF_traefik_certificate",
    "key": "CONTENT_OF_traefik_key",
    "Store": "default"
  },
...
5. Place the above code snippet in the right place (inside the 'Certificates' JSON array) in Traefik's acme.json file:
vim /path/to/acme.json
No restart of the Traefik container is needed for it to pick up the new certificate.
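If you prefer not to edit the file by hand, the same snippet can be appended to the Certificates array with jq (a sketch; the paths and domain are placeholders, and the field names mirror the snippet above):
$ jq --arg cert "$(cat ~/traefik_certificate)" --arg key "$(cat ~/traefik_key)" \
    '.Certificates += [{"domain": {"main": "example.com"}, "certificate": $cert, "key": $key, "Store": "default"}]' \
    /path/to/acme.json > /tmp/acme.json && mv /tmp/acme.json /path/to/acme.json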
I'm setting up an Elasticsearch cluster on GCE. Eventually it will be used from within the application, which is on the same network, but for now, while developing, I want to have access from my dev environment. Also, even in the long term, I will have to access Kibana from an external network, so I need to know how to allow that. For now I'm not taking care of any security considerations.
The cluster (currently one node) is on a GCE instance running CentOS 7.
It has an external IP enabled (the ephemeral option).
I can access 9200 from within the instance:
es-instance-1 ~]$ curl -XGET 'localhost:9200/?pretty'
But not via the external IP, which shows 9200 as closed when I test it:
es-instance-1 ~]$ nmap -v -A my-external-ip | grep 9200
9200/tcp closed wap-wsp
localhost, as expected, is good:
es-instance-1 ~]$ nmap -v -A localhost | grep 9200
Discovered open port 9200/tcp on 127.0.0.1
I saw a similar question here and, following it, I went to create a firewall rule. First I added a tag 'elastic-cluster' to the instance and then created a rule:
$ gcloud compute instances describe es-instance-1
tags:
  items:
  - elastic-cluster
$ gcloud compute firewall-rules create allow-elasticsearch --allow TCP:9200 --target-tags elastic-cluster
Here it's listed as created:
gcloud compute firewall-rules list
NAME                 NETWORK  SRC_RANGES  RULES     SRC_TAGS  TARGET_TAGS
allow-elasticsearch  default  0.0.0.0/0   tcp:9200            elastic-cluster
So now there is a rule which is supposed to allow 9200, but it is still not reachable:
es-instance-1 ~]$ nmap -v -A my-external-ip | grep 9200
9200/tcp closed wap-wsp
What am I missing?
Thanks