I have Traefik, Docker Swarm, Let's Encrypt and Consul set up, and it has been working fine. Traefik successfully obtained certificates for admin.domain.tld, registry.domain.tld and staging.domain.tld, but now that I've added containers serving domain.tld and matomo.domain.tld, those aren't getting any certificates (the browser warns of a self-signed certificate because it's the default Traefik certificate).
My Traefik configuration (that's being uploaded to Consul):
debug = false
logLevel = "DEBUG"
insecureSkipVerify = true
defaultEntryPoints = ["https", "http"]
[entryPoints]
[entryPoints.ping]
address = ":8082"
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[traefikLog]
filePath = '/var/log/traefik/traefik.log'
format = 'json'
[accessLog]
filePath = '/var/log/traefik/access.log'
format = 'json'
[accessLog.fields]
defaultMode = 'keep'
[accessLog.fields.headers]
defaultMode = 'keep'
[accessLog.fields.headers.names]
"Authorization" = "drop"
[retry]
[api]
entryPoint = "traefik"
dashboard = true
debug = false
[ping]
entryPoint = "ping"
[metrics]
[metrics.influxdb]
address = "http://influxdb:8086"
protocol = "http"
pushinterval = "10s"
database = "metrics"
[docker]
endpoint = "unix:///var/run/docker.sock"
domain = "domain.tld"
watch = true
exposedByDefault = false
network = "net_web"
swarmMode = true
[acme]
email = "my@mail.tld"
storage = "traefik/acme/account"
entryPoint = "https"
onHostRule = true
[acme.httpChallenge]
entryPoint = "http"
Possibly related: in traefik.log I get the following repeatedly (almost once per second), but only for the registry subdomain. It sounds like a problem persisting the data to Consul, but there are no errors indicating that.
{"level":"debug","msg":"Looking for an existing ACME challenge for registry.domain.tld...","time":"2019-07-07T11:37:23Z"}
{"level":"debug","msg":"Looking for provided certificate to validate registry.domain.tld...","time":"2019-07-07T11:37:23Z"}
{"level":"debug","msg":"No provided certificate found for domains registry.domain.tld, get ACME certificate.","time":"2019-07-07T11:37:23Z"}
{"level":"debug","msg":"ACME got domain cert registry.domain.tld","time":"2019-07-07T11:37:23Z"}
Update: I managed to find this line in the log:
{"level":"error","msg":"Error getting ACME certificates [matomo.domain.tld] : cannot obtain certificates: acme: Error -\u003e One or more domains had a problem:\n[matomo.domain.tld] acme: error: 400 :: urn:ietf:paramsacme:error:connection :: Fetching http://matomo.domain.tld/.well-known/acme-challenge/WJZOZ9UC1aJl9ishmL2ACKFbKoGOe_xQoSbD34v8mSk: Timeout after connect (your server may be slow or overloaded), url: \n","time":"2019-07-09T16:27:43Z"}
So it seems the issue is the challenge failing because of a timeout. Why the timeout though?
Update 2: More log entries:
{"level":"debug","msg":"Looking for an existing ACME challenge for staging.domain.tld...","time":"2019-07-10T19:38:34Z"}
{"level":"debug","msg":"Looking for provided certificate to validate staging.domain.tld...","time":"2019-07-10T19:38:34Z"}
{"level":"debug","msg":"No provided certificate found for domains staging.domain.tld, get ACME certificate.","time":"2019-07-10T19:38:34Z"}
{"level":"debug","msg":"No certificate found or generated for staging.domain.tld","time":"2019-07-10T19:38:34Z"}
{"level":"debug","msg":"http: TLS handshake error from 10.255.0.2:51981: remote error: tls: unknown certificate","time":"2019-07-10T19:38:34Z"}
But then, after a couple minutes to an hour, it works (for two domains so far).
Not sure if it's a feature or a bug, but removing the following HTTP-to-HTTPS redirect solved it for me:
[entryPoints.http.redirect]
entryPoint = "https"
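To see what the Let's Encrypt validator gets back on port 80, it can help to request the challenge path over plain HTTP without following redirects. The following is only a minimal sketch using the Python standard library; the function names and the token value are made up for illustration, and it does not reproduce how Let's Encrypt itself performs validation.

```python
import http.client


def challenge_path(token: str) -> str:
    """Build the well-known path that the HTTP challenge is served under."""
    return f"/.well-known/acme-challenge/{token}"


def probe_http_challenge(host: str, token: str, port: int = 80, timeout: float = 10.0):
    """Send one plain-HTTP GET and return (status, Location header).

    http.client never follows redirects, so a 301/302 to https shows up
    here directly instead of being hidden by the client.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", challenge_path(token))
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For example, `probe_http_challenge("matomo.domain.tld", "some-token")` returns the status code and any redirect target, or raises a timeout error if port 80 does not answer within `timeout` seconds, which is the symptom reported in the log above.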
I am trying to connect to a Windows host from a Linux (Zorin) control host using Ansible.
I installed WinRM on the Windows machine and set all the required authentication methods to true.
WinRM configuration on the Windows host:
PS C:\WINDOWS\system32> winrm get winrm/config
Config
    MaxEnvelopeSizekb = 500
    MaxTimeoutms = 60000
    MaxBatchItems = 32000
    MaxProviderRequests = 4294967295
    Client
        NetworkDelayms = 5000
        URLPrefix = wsman
        AllowUnencrypted = true
        Auth
            Basic = true
            Digest = true
            Kerberos = true
            Negotiate = true
            Certificate = true
            CredSSP = false
        DefaultPorts
            HTTP = 5985
            HTTPS = 5986
        TrustedHosts
    Service
        RootSDDL = O:NSG:BAD:P(A;;GA;;;BA)(A;;GXGR;;;S-1-5-21-2039588290-1060779563-2652726705-1011)(A;;GR;;;IU)S:P(AU;FA;GA;;;WD)(AU;SA;GXGW;;;WD)
        MaxConcurrentOperations = 4294967295
        MaxConcurrentOperationsPerUser = 1500
        EnumerationTimeoutms = 240000
        MaxConnections = 300
        MaxPacketRetrievalTimeSeconds = 120
        AllowUnencrypted = false
        Auth
            Basic = true
            Kerberos = true
            Negotiate = true
            Certificate = false
            CredSSP = false
            CbtHardeningLevel = Relaxed
        DefaultPorts
            HTTP = 5985
            HTTPS = 5986
        IPv4Filter = *
        IPv6Filter = *
        EnableCompatibilityHttpListener = false
        EnableCompatibilityHttpsListener = false
        CertificateThumbprint
        AllowRemoteAccess = true
    Winrs
        AllowRemoteShellAccess = true
        IdleTimeout = 7200000
        MaxConcurrentUsers = 2147483647
        MaxShellRunTime = 2147483647
        MaxProcessesPerShell = 2147483647
        MaxMemoryPerShellMB = 2147483647
        MaxShellsPerUser = 2147483647
Even with Basic = true, I get a "the specified credentials were rejected" error. I tried setting AllowUnencrypted = true, but that produces the following error message:
WSManFault
    Message
        ProviderFault
            WSManFault
                Message = WinRM firewall exception will not work since one of the network connection types on this machine is set to Public. Change the network connection type to either Domain or Private and try again.
I changed the network connection type to Private and tried setting AllowUnencrypted = true again, but got the same error as above (WinRM firewall exception will not work since one of the network connection types on this machine is set to Public. Change the network connection type to either Domain or Private and try again.)
I also added a firewall exception rule for port 5985 on the Windows host.
I also gave the user Read and Execute permissions via winrm configsddl default. It still does not work.
I am supplying the right credentials. The Ansible hosts file is as follows:
[win]
<IP>
[win:vars]
ansible_user=<username>
ansible_password=<password>
ansible_connection=winrm
ansible_winrm_scheme=http
ansible_winrm_transport=basic
ansible_winrm_port=5985
ansible_winrm_server_cert_validation=ignore
I am running the following Ansible command:
ansible win -i hosts -m win_ping
I have tried everything I could find on the internet, but I am not able to establish the connection through WinRM.
I would be thankful to anyone who can provide a solution. I have been staring at this error for four days.
I changed ansible_winrm_transport from basic to ntlm. That resolved my issue.
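A likely explanation (an assumption based on the Ansible WinRM documentation, not something confirmed in this thread): over plain HTTP, the basic transport sends an unencrypted payload, which a service with AllowUnencrypted = false rejects, while ntlm, kerberos and credssp can encrypt the message themselves. The small Python sketch below only encodes that rule of thumb; the function name is made up for illustration and is not part of Ansible or pywinrm.

```python
# Transports that can encrypt the WinRM message payload over plain HTTP,
# as described in the Ansible Windows Remote Management documentation.
ENCRYPTING_TRANSPORTS = {"ntlm", "kerberos", "credssp"}


def needs_allow_unencrypted(scheme: str, transport: str) -> bool:
    """Return True if this scheme/transport pair only works when the
    WinRM service has AllowUnencrypted = true.

    Hypothetical helper for illustration only.
    """
    if scheme.lower() == "https":
        # TLS already protects the payload, whatever the transport.
        return False
    return transport.lower() not in ENCRYPTING_TRANSPORTS
```

With the inventory above, `needs_allow_unencrypted("http", "basic")` is True, while `needs_allow_unencrypted("http", "ntlm")` is False, which matches the observed behaviour against a service that has AllowUnencrypted = false.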
I want to permanently redirect all requests to example.com and www.example.com to blog.example.com in a TLS environment.
My current config:
traefik.toml:
[entryPoints]
[entryPoints.web]
address = ":80"
[entryPoints.web.http.redirections.entryPoint]
to = "websecure"
scheme = "https"
[entryPoints.websecure]
address = ":443"
[providers.docker]
exposedbydefault = false
watch = true
network = "web"
[providers.file]
filename = "traefik_dynamic.toml"
[certificatesResolvers.lets-encrypt.acme]
email = "mymail@example.com"
storage = "/letsencrypt/acme.json"
[certificatesResolvers.lets-encrypt.acme.dnsChallenge]
provider = "myprovider"
traefik_dynamic.toml:
[http.middlewares]
[http.middlewares.goToBlog.redirectregex]
regex = "^https://(.*)example.com/(.*)"
replacement = "https://blog.example.com/$${2}"
permanent = true
[http.routers]
[http.routers.gotoblog]
rule = "Host(`example.com`) || Host(`www.example.com`)"
entrypoints = ["websecure"]
middlewares = ["goToBlog"]
service = "noop@internal"
[http.routers.gotoblog.tls]
certResolver = "lets-encrypt"
When I try to access example.com, it gives me an SSL protocol error. All my other endpoints, including blog.example.com, are working. What am I doing wrong?
Okay, apparently it had nothing to do with my redirect configuration. It seems to have been a hiccup in Traefik / Docker, similar to "ACME certificates timeout with traefik". I just waited one day and everything worked as expected. There are just two minor updates to correct the redirect configuration. Maybe there's a more elegant solution.
traefik_dynamic.toml:
[http.middlewares]
[http.middlewares.goToBlog.redirectregex]
regex = "^https://(.*)example.com/(.*)"
replacement = "https://blog.example.com/${2}" # no double $$
permanent = true
[http.routers]
[http.routers.gotoblog]
rule = "Host(`example.com`, `www.example.com`)" # just an array of domains is fine, too
entrypoints = ["websecure"]
middlewares = ["goToBlog"]
service = "noop@internal"
[http.routers.gotoblog.tls]
certResolver = "lets-encrypt"
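To sanity-check the redirect regex outside Traefik, here is a small sketch in Python. Traefik uses Go's regexp syntax with `${2}` in the replacement, whereas Python's `re` writes the same group reference as `\g<2>`; for this particular pattern the two should behave the same, but treat this as an approximation rather than a faithful emulation of the middleware. The function name is made up for illustration.

```python
import re
from typing import Optional

# Same pattern as in traefik_dynamic.toml. Note that the dots are unescaped,
# so "." matches any character, exactly as in the original config.
PATTERN = re.compile(r"^https://(.*)example.com/(.*)")

# Python spelling of the Traefik replacement "https://blog.example.com/${2}".
REPLACEMENT = r"https://blog.example.com/\g<2>"


def redirect_target(url: str) -> Optional[str]:
    """Return the redirect target for url, or None if the regex does not match."""
    match = PATTERN.match(url)
    if match is None:
        return None
    return match.expand(REPLACEMENT)
```

For instance, `redirect_target("https://www.example.com/a/b?x=1")` gives `https://blog.example.com/a/b?x=1`, and a plain `http://` URL gives None because the pattern only matches `https://`.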
For several days now I have been trying to make all my Windows hosts reachable from Ansible via the certificate authentication method. I use a script to configure WinRM and to create a self-signed certificate. On most hosts it works fine, and once the script has finished I can connect to them via certificate authentication, but on some others (around 15-20% of them) it's impossible.
I get this error message:
fatal: [SERVERNAME]: UNREACHABLE! => {
"changed": false,
"msg": "certificate: the specified credentials were rejected by the server",
"unreachable": true
}
What is strange is that I don't see the login event in the Windows event viewer. Here is my WinRM configuration:
Service
    RootSDDL = O:NSG:BAD:P(A;;GA;;;BA)(A;;GR;;;IU)S:P(AU;FA;GA;;;WD)(AU;SA;GXGW;;;WD)
    MaxConcurrentOperations = 4294967295
    MaxConcurrentOperationsPerUser = 1500
    EnumerationTimeoutms = 240000
    MaxConnections = 300
    MaxPacketRetrievalTimeSeconds = 120
    AllowUnencrypted = false
    Auth
        Basic = true
        Kerberos = true
        Negotiate = true
        Certificate = true
        CredSSP = true
        CbtHardeningLevel = Relaxed
    DefaultPorts
        HTTP = 5985
        HTTPS = 5986
    IPv4Filter = *
    IPv6Filter = *
    EnableCompatibilityHttpListener = false
    EnableCompatibilityHttpsListener = false
    CertificateThumbprint
    AllowRemoteAccess = true
Winrs
    AllowRemoteShellAccess = true
    IdleTimeout = 7200000
    MaxConcurrentUsers = 10
    MaxShellRunTime = 2147483647
    MaxProcessesPerShell = 25
    MaxMemoryPerShellMB = 1024
    MaxShellsPerUser = 30
Both the listener and the certificate mapping are configured on the Windows machine:
Listener
    Address = *
    Transport = HTTPS
    Port = 5986
    Hostname
    Enabled = true
    URLPrefix = wsman
    CertificateThumbprint = 927...C26E
    ListeningOn = 127.0.0.1, 172.20.x.x
CertMapping
    URI = *
    Subject = ansibleuser@localhost
    Issuer = 579f3eb1c3756339a246843f70e1a89b14fdc244
    UserName = ansibleuser
    Enabled = true
    Password
What I've tried up until now:
Check the presence of the LocalAccountTokenFilterPolicy registry key
Configure the access to WinRM through winrm configSDDL default
Check GPOs
Change the password (check and uncheck password never expires, etc...)
Create another local admin user
Enable basic and unencrypted authentication
Change the connection type to private (could not since the servers are domain joined)
Run the script provided by ansible to configure WinRM
I don't understand what is going on and it's driving me nuts. Has anyone encountered this problem before?
I'm open to all suggestions, thanks in advance.
FINALLY found a solution to this problem:
Based on the thread "Client Certificate-based authentication stopped working for PS Remoting", I found out that a registry value named "ClientAuthTrustMode" should be set to "2"; with that, the error message disappears.
Here is a Microsoft article detailing the implications of the key: Overview of TLS - SSL (Schannel SSP)
Here is a simple PowerShell command to flip the switch:
Set-ItemProperty -Path registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL -Name ClientAuthTrustMode -Type DWord -Value 2
Hopefully that will help someone out there.
Thanks for your solution! It solved my problem as well.
We have a lot of servers using Ansible and WinRM with certificate-based authentication, but only one of them had the same issue as yours.
The only interesting difference I found in the settings you shared is this:
ListeningOn = 127.0.0.1, 172.20.x.x
This is the same as on my affected server: localhost comes first.
ListeningOn = 127.0.0.1, 19x.1xx.19.98, ::1
The other servers in our setup list the network interface first, like this:
ListeningOn = 19x.16.61.38, 127.0.0.1, ::1
I really don't know whether it matters, but this is the only difference I have found.
Thanks a lot.
I have a specific certificate generated by letsencrypt.
In my traefik config, I have:
kind: ConfigMap
apiVersion: v1
metadata:
  name: traefik-config
data:
  traefik.toml: |
    # traefik.toml
    defaultEntryPoints = ["http","https"]
    [entryPoints]
    [entryPoints.http]
    address = ":80"
    [entryPoints.http.redirect]
    entryPoint = "https"
    [entryPoints.https]
    address = ":443"
    [entryPoints.https.tls]
    [[entryPoints.https.tls.certificates]]
    certFile = "/etc/xxx/my-cert.crt"
    keyFile = "/etc/xxx/my-cert.key"
    [acme] # Automatically add Let's Encrypt Certificate.
    storage= "/etc/certificate/acme.json"
    email = "john.doe@company.com"
    entryPoint = "https"
    onHostRule = true
    caServer = "https://acme-v02.api.letsencrypt.org/directory"
    [acme.dnsChallenge]
    provider = "route53"
    delayBeforeCheck = 0
    [[acme.domains]]
    main = "*.company.com"
    #[[acme.domains]]
    # main = "*.espace-client.company.com"
The thing is, my certificate:
/etc/xxx/my-cert.crt
will expire in 10 days.
I also have a certificate for the wildcard: *.company.com
Will Traefik renew it automatically, or should I do something?
According to the documentation, a certificate should never get as close as 10 days to expiry. Something must be wrong.
"If there are less than 30 days remaining before the certificate expires, Traefik will attempt to renew it automatically."
You should check the logs of your Traefik container:
docker logs traefik-container
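Alongside the logs, it can be useful to check how many days the served certificate actually has left and compare that with the 30-day window quoted above. This is a minimal sketch using only the Python standard library; the function names are made up for illustration, and the 30-day constant is taken from the documentation quote, not read from Traefik itself.

```python
import socket
import ssl

RENEWAL_WINDOW_DAYS = 30  # from the documentation sentence quoted above


def days_remaining(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and a certificate's notAfter
    string, e.g. "Jan 11 00:00:00 2024 GMT"."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def in_renewal_window(not_after: str, now: float) -> bool:
    """True if fewer than RENEWAL_WINDOW_DAYS remain before expiry."""
    return days_remaining(not_after, now) < RENEWAL_WINDOW_DAYS


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the notAfter field of the certificate served by host:port.

    Uses the default verifying context, so this raises an SSL error if the
    server presents an untrusted (for example self-signed) certificate.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A typical check would be `in_renewal_window(fetch_not_after("www.company.com"), time.time())` after `import time`; the host name here is only a placeholder for whichever domain the entry point serves.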
I am using a complex Traefik - Dropcart setup with automatic SSL certification via Let's Encrypt. Because of the TLS-SNI termination, I switched to the rc5 Docker version of Let's Encrypt, which supports HTTP-SNI. DNS isn't an option for me.
Unfortunately, it gives a 400 timeout error (see logs).
Config
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
compress = true
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[...]
[acme]
email = "email@address.com"
caServer = "https://acme-staging.api.letsencrypt.org/directory"
storage = "/etc/traefik/acme/acme.json"
entryPoint = "https"
onHostRule = true
acmeLogging = true
#dnsProvider = "manual"
[acme.httpChallenge]
entryPoint = "http"
Logs
domain.example.com:acme: Error 400 - urn:acme:error:connection -
Fetching http://domain.example.com/.well-known/acme-challenge/5uyEKpgr[...]c4CfMOZjc: Timeout
Error Detail:
Validation for domain.example.com:80
Resolved to:
*IPv4*
*IPv6*
Used: *IPv6*
]"
Does anyone know how I can get HTTP validation fixed?
Thanks!
EDIT:
The same config seemed to work with a Consul backend. So maybe it has something to do with Docker or acme.json?
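Since the log says the validator resolved both an IPv4 and an IPv6 address and used IPv6, one thing worth checking is whether port 80 is reachable over each address family. The following Python sketch is only a diagnostic aid under that assumption; the function names are made up for illustration, and a successful connection from your own machine does not guarantee that Let's Encrypt can reach the host.

```python
import socket


def addresses_by_family(host: str, port: int = 80) -> dict:
    """Resolve host and group the resulting addresses into IPv4 and IPv6."""
    result = {"IPv4": [], "IPv6": []}
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, _type, _proto, _canonname, sockaddr in infos:
        if family == socket.AF_INET:
            label = "IPv4"
        elif family == socket.AF_INET6:
            label = "IPv6"
        else:
            continue
        address = sockaddr[0]
        if address not in result[label]:
            result[label].append(address)
    return result


def can_connect(address: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to address:port succeeds in time."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `addresses_by_family("domain.example.com")` and then `can_connect(addr)` for each returned address shows whether the IPv6 address published in DNS actually answers on port 80. If only the IPv4 address does, that would be consistent with the timeout in the log.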