Traefik and custom certificate renewal - lets-encrypt

I have a specific certificate generated by letsencrypt.
In my traefik config, I have:
kind: ConfigMap
apiVersion: v1
metadata:
name: traefik-config
data:
traefik.toml: |
# traefik.toml
defaultEntryPoints = ["http","https"]
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[[entryPoints.https.tls.certificates]]
certFile = "/etc/xxx/my-cert.crt"
keyFile = "/etc/xxx/my-cert.key"
[acme] # Automatically add Let's Encrypt Certificate.
storage= "/etc/certificate/acme.json"
email = "john.doe#company.com"
entryPoint = "https"
onHostRule = true
caServer = "https://acme-v02.api.letsencrypt.org/directory"
[acme.dnsChallenge]
provider = "route53"
delayBeforeCheck = 0
[[acme.domains]]
main = "*.company.com"
#[[acme.domains]]
# main = "*.espace-client.company.com"
Thing is my certicate :
/etc/xxx/my-cert.crt
will end in 10 days.
I also have the a certificate for the wild card: *.company.com
Will traefik renew it automatically or should I do something ?

According to the documentation, the certificate should never end in 10 days. Something must be wrong.
"If there are less than 30 days remaining before the certificate
expires, Traefik will attempt to renew it automatically."
You should check the logs of your traefik container:
docker logs traefik-container

Related

Traefik Redirect Domain to Subdomain

I want to permanently redirect all requests to example.com and www.example.com to blog.example.com in a TLS environment.
My current config:
traefik.toml:
[entryPoints]
[entryPoints.web]
address = ":80"
[entryPoints.web.http.redirections.entryPoint]
to = "websecure"
scheme = "https"
[entryPoints.websecure]
address = ":443"
[providers.docker]
exposedbydefault = false
watch = true
network = "web"
[providers.file]
filename = "traefik_dynamic.toml"
[certificatesResolvers.lets-encrypt.acme]
email = "mymail#example.com"
storage = "/letsencrypt/acme.json"
[certificatesResolvers.lets-encrypt.acme.dnsChallenge]
provider = "myprovider"
traefik_dynamic.toml:
[http.middlewares]
[http.middlewares.goToBlog.redirectregex]
regex = "^https://(.*)example.com/(.*)"
replacement = "https://blog.example.com/$${2}"
permanent = true
[http.routers]
[http.routers.gotoblog]
rule = "Host(`example.com`) || Host(`www.example.com`)"
entrypoints = ["websecure"]
middlewares = ["goToBlog"]
service = "noop#internal"
[http.routers.gotoblog.tls]
certResolver = "lets-encrypt"
When I try to access example.com it gives my an SSL Protocol Error. All my other endpoints including blog.example.com are working. What am I doing wrong?
Okey, obviously it had nothing to do with my redirect configuration. Seemed like a hickup in traefik / docker, similar to ACME certificates timeout with traefik. Just waited one day and everything worked as expected. Just two minor updates to correct the redirect configuration. Maybe there's a more elegant solution.
traefik_dynamic.toml:
[http.middlewares]
[http.middlewares.goToBlog.redirectregex]
regex = "^https://(.*)example.com/(.*)"
replacement = "https://blog.example.com/${2}" # no double $$
permanent = true
[http.routers]
[http.routers.gotoblog]
rule = "Host(`example.com`, `www.example.com`)" # just an array of domains is fine, too
entrypoints = ["websecure"]
middlewares = ["goToBlog"]
service = "noop#internal"
[http.routers.gotoblog.tls]
certResolver = "lets-encrypt"

Certificate Auth. : the specified credentials were rejected by the server

It's been multiple days since I started trying enabling all my Windows hosts to be reachable with Ansible via the certificate authentication method. I use a script to configure WinRM and to create a self-signed certificate. On multiple hosts, it works fine and after the script is finished I can connect to them via certificate authentication but on some other (like 15-20% of them) it's impossible.
I get this error message:
fatal: [SERVERNAME]: UNREACHABLE! => {
"changed": false,
"msg": "certificate: the specified credentials were rejected by the server",
"unreachable": true
}
What is strange is that I don't see the login event in the Windows event viewer. Here is my WinRM configuration:
Service
RootSDDL = O:NSG:BAD:P(A;;GA;;;BA)(A;;GR;;;IU)S:P(AU;FA;GA;;;WD)(AU;SA;GXGW;;;WD)
MaxConcurrentOperations = 4294967295
MaxConcurrentOperationsPerUser = 1500
EnumerationTimeoutms = 240000
MaxConnections = 300
MaxPacketRetrievalTimeSeconds = 120
AllowUnencrypted = false
Auth
Basic = true
Kerberos = true
Negotiate = true
Certificate = true
CredSSP = true
CbtHardeningLevel = Relaxed
DefaultPorts
HTTP = 5985
HTTPS = 5986
IPv4Filter = *
IPv6Filter = *
EnableCompatibilityHttpListener = false
EnableCompatibilityHttpsListener = false
CertificateThumbprint
AllowRemoteAccess = true
Winrs
AllowRemoteShellAccess = true
IdleTimeout = 7200000
MaxConcurrentUsers = 10
MaxShellRunTime = 2147483647
MaxProcessesPerShell = 25
MaxMemoryPerShellMB = 1024
MaxShellsPerUser = 30
Both the listener and the certificate mapping are configured on the windows machine:
Listener
Address = *
Transport = HTTPS
Port = 5986
Hostname
Enabled = true
URLPrefix = wsman
CertificateThumbprint = 927...C26E
ListeningOn = 127.0.0.1, 172.20.x.x
CertMapping
URI = *
Subject = ansibleuser#localhost
Issuer = 579f3eb1c3756339a246843f70e1a89b14fdc244
UserName = ansibleuser
Enabled = true
Password
What I've tried up until now:
Check the presence of the LocalAccountTokenFilterPolicy registry key
Configure the access to WinRM through winrm configSDDL default
Check GPOs
Change the password (check and uncheck password never expires,
etc...)
Create another local admin user
Enable basic and unencrypted authentication
Change the connection type to private (could not since the servers
are domain joined)
Run the script provided by ansible to configure WinRM
I don't understand what is going on and it's driving me nuts. Did someone encounter this problem before ?
I'm open to all suggestions, thanks in advance.
FINALLY found a solution to this problem:
Based on this thread Client Certificate-based authentication stopped working for PS Remoting, I found out that a registry key named "ClientAuthTrustMode" should be set to the value "2" and, with that, the error message magically disappears.
Here is a Microsoft article detailing the implication of the key : Overview of TLS - SSL (Schannel SSP)
Here is a simple powershell command to flip the switch :
Set-ItemProperty -Path registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL -Name ClientAuthTrustMode -Type DWord -Value 2
Hopefully that will help someone out there.
Thanks for Your solution! It solves my problem also.
We had lot of servers using ansible and WINRM with certificate based authentication, but only one of our servers had the same issue as yours was...
The only one interesting difference, what I've found on your shared settings is this:
ListeningOn = 127.0.0.1, 172.20.x.x
This is also same as mine... localhost is in the first place...
ListeningOn = 127.0.0.1, 19x.1xx.19.98, ::1
Other servers in our setup is add the network interface on the first place like this:
ListeningOn = 19x.16.61.38, 127.0.0.1, ::1
I really don't know, it matters or not, but this is the only difference, what I've found.
Thanks a lot.

Gitlab-runner Interactive Web Terminals not connected

I have installed successfully a gitlab-runner on a VM, and it is used by some of my projects. I would like to use the Interactive Web Terminal to have a chance to debug when some pipeline fails.
I'm trying to configure my config.toml file, following this docu of GitLab but I'm not understanding which ip address I should use in the setting listen_address. Should it be the ip of the running machine? The docker container instance? Or what?
Here is my current configuration:
concurrent = 2
check_interval = 0
log_level = "panic"
[session_server]
listen_address = "0.0.0.0:8093" # listen on all available interfaces on port 8093
session_timeout = 1800
[[runners]]
name = "A test private repo"
url = "https://gitlab.com/"
token = "myToken"
executor = "docker"
[runners.custom_build_dir]
[runners.docker]
tls_verify = false
image = "alpine:latest"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.custom]
run_exec = ""
Screen of error I get
I noticed that when I hit the 0.0.0.0:8093 address on the machine where the gitlab-runner is running I get this response:
Your configuration should use:
[session_server]
session_timeout = 1800
listen_address = "0.0.0.0:8093"
advertise_address = "<your runner IP/hostname>:8093"
Should it be the ip of the running machine?
Yes

Traefik not getting SSL certificates for new domains

I've got Traefik/Docker Swarm/Let's Encrypt/Consul set up, and it's been working fine. It managed to successfully get certificates for the domains admin.domain.tld, registry.domain.tld and staging.domain.tld, but now that I've tried adding containers that are serving domain.tld and matomo.domain.tld those aren't getting any certificates (browser warns of self signed certificate because it's the default Traefik certificate).
My Traefik configuration (that's being uploaded to Consul):
debug = false
logLevel = "DEBUG"
insecureSkipVerify = true
defaultEntryPoints = ["https", "http"]
[entryPoints]
[entryPoints.ping]
address = ":8082"
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[traefikLog]
filePath = '/var/log/traefik/traefik.log'
format = 'json'
[accessLog]
filePath = '/var/log/traefik/access.log'
format = 'json'
[accessLog.fields]
defaultMode = 'keep'
[accessLog.fields.headers]
defaultMode = 'keep'
[accessLog.fields.headers.names]
"Authorization" = "drop"
[retry]
[api]
entryPoint = "traefik"
dashboard = true
debug = false
[ping]
entryPoint = "ping"
[metrics]
[metrics.influxdb]
address = "http://influxdb:8086"
protocol = "http"
pushinterval = "10s"
database = "metrics"
[docker]
endpoint = "unix:///var/run/docker.sock"
domain = "domain.tld"
watch = true
exposedByDefault = false
network = "net_web"
swarmMode = true
[acme]
email = "my#mail.tld"
storage = "traefik/acme/account"
entryPoint = "https"
onHostRule = true
[acme.httpChallenge]
entryPoint = "http"
Possibly related, in traefik.log I repeatedly (as in almost once per second) get the following (but only for the registry subdomain). Sounds like an issue to persist the data to consul, but there are no errors indicating such an issue.
{"level":"debug","msg":"Looking for an existing ACME challenge for registry.domain.tld...","time":"2019-07-07T11:37:23Z"}
{"level":"debug","msg":"Looking for provided certificate to validate registry.domain.tld...","time":"2019-07-07T11:37:23Z"}
{"level":"debug","msg":"No provided certificate found for domains registry.domain.tld, get ACME certificate.","time":"2019-07-07T11:37:23Z"}
{"level":"debug","msg":"ACME got domain cert registry.domain.tld","time":"2019-07-07T11:37:23Z"}
Update: I managed to find this line in the log:
{"level":"error","msg":"Error getting ACME certificates [matomo.domain.tld] : cannot obtain certificates: acme: Error -\u003e One or more domains had a problem:\n[matomo.domain.tld] acme: error: 400 :: urn:ietf:paramsacme:error:connection :: Fetching http://matomo.domain.tld/.well-known/acme-challenge/WJZOZ9UC1aJl9ishmL2ACKFbKoGOe_xQoSbD34v8mSk: Timeout after connect (your server may be slow or overloaded), url: \n","time":"2019-07-09T16:27:43Z"}
So it seems the issue is the challenge failing because of a timeout. Why the timeout though?
Update 2: More log entries:
{"level":"debug","msg":"Looking for an existing ACME challenge for staging.domain.tld...","time":"2019-07-10T19:38:34Z"}
{"level":"debug","msg":"Looking for provided certificate to validate staging.domain.tld...","time":"2019-07-10T19:38:34Z"}
{"level":"debug","msg":"No provided certificate found for domains staging.domain.tld, get ACME certificate.","time":"2019-07-10T19:38:34Z"}
{"level":"debug","msg":"No certificate found or generated for staging.domain.tld","time":"2019-07-10T19:38:34Z"}
{"level":"debug","msg":"http: TLS handshake error from 10.255.0.2:51981: remote error: tls: unknown certificate","time":"2019-07-10T19:38:34Z"}
But then, after a couple minutes to an hour, it works (for two domains so far).
not sure if its a feature or a bug, but removing the following http to https redirect solved it for me:
[entryPoints.http.redirect]
entryPoint = "https"

Traefik ACME HTTP SNI 01 .well-known not available (v1.5.0)

I am using a complex Traefik - Dropcart setup with automatic SSL certification via Let's Encrypt. Because of the TLS-SNI termintation I switched to the rc5 Docker version of Let's Encrypt which support HTTP-SNI, DNS isn't an option for me.
Unfortunately it gives an 400 timeout error (see logs).
Config
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
compress = true
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[...]
[acme]
email = "email#address.com"
caServer = "https://acme-staging.api.letsencrypt.org/directory"
storage = "/etc/traefik/acme/acme.json"
entryPoint = "https"
onHostRule = true
acmeLogging = true
#dnsProvider = "manual"
[acme.httpChallenge]
entryPoint = "http"
Logs
domain.example.com:acme: Error 400 - urn:acme:error:connection -
Fetching http://domain.example.com/.well-known/acme-challenge/5uyEKpgr[...]c4CfMOZjc: Timeout
Error Detail:
Validation for domain.example.com:80
Resolved to:
*IPv4*
*IPv6*
Used: *IPv6*
]"
Does anyone know how I can get HTTP validation fixed?
Thanks!
EDIT:
Same config seemed to work on a consul backend. So maybe something to do with Docker or acme.json?

Resources