Traefik poor upload performance

Recently I moved to Traefik as my reverse proxy of choice, but I noticed that upload speed to my Synology NAS decreased dramatically when using Traefik with TLS enabled. I did a little investigation and installed a librespeed container to run some speed tests.
The results surprised me. Plain HTTP (directly to the container over VPN) gave 150/300, while through Traefik (over the public IP) the best it could do was 100/20. The VM has 16 CPUs (AMD EPYC 7281 with hardware AES support) and 32 GB of RAM on a 10 Gb network.
Is this the performance I should expect from Traefik? Upload speed decreased more than tenfold. Or is it a configuration issue?
services:
  traefik:
    image: traefik:v2.9.6
    container_name: traefik
    restart: unless-stopped
    networks:
      - outbound
      - internal
    command:
      - "--serversTransport.insecureSkipVerify=true"
      - "--providers.docker.exposedbydefault=false"
      - "--providers.docker=true"
      - "--providers.docker.watch"
      - "--providers.docker.network=outbound"
      - "--providers.docker.swarmMode=false"
      - "--entrypoints.http.address=:80"
      - "--entrypoints.https.address=:443"
      - "--entryPoints.traefik.address=:8888"
      - "--entrypoints.http.http.redirections.entryPoint.to=https"
      - "--entrypoints.http.http.redirections.entryPoint.scheme=https"
      - "--providers.file.directory=/rules"
      - "--providers.file.watch=true"
      - "--api.insecure=true"
      - "--accessLog=true"
      - "--accessLog.filePath=/traefik.log"
      - "--accessLog.bufferingSize=100"
      - "--accessLog.filters.statusCodes=400-499"
      - "--metrics"
      - "--metrics.prometheus.buckets=0.1,0.3,1.2,5.0"
      #- "--log.level=DEBUG"
      - "--certificatesResolvers.myresolver.acme.caServer=https://acme-v02.api.letsencrypt.org/directory"
      - "--certificatesresolvers.myresolver.acme.storage=acme.json"
      - "--certificatesResolvers.myresolver.acme.httpChallenge.entryPoint=http"
      - "--certificatesResolvers.myresolver.acme.tlsChallenge=true"
      - "--certificatesResolvers.myresolver.acme.email=asd@asd.me"
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - ./traefik/acme.json:/acme.json
      - ./traefik/traefik.log:/traefik.log
      - ./traefik/rules:/rules
      - /var/run/docker.sock:/var/run/docker.sock:ro
    ports:
      - "80:80"
      - "443:443"
      - "8888:8888"
  librespeed:
    image: adolfintel/speedtest
    container_name: librespeed
    environment:
      - MODE=standalone
    networks:
      - outbound
    ports:
      - 8080:80
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.librespeed.rule=Host(`s.mydomain.com`)"
      - "traefik.http.services.librespeed.loadbalancer.server.port=80"
      - "traefik.http.routers.librespeed.entrypoints=https,http"
      - "traefik.http.routers.librespeed.tls=true"
      - "traefik.http.routers.librespeed.tls.certresolver=myresolver"
You should expect maybe up to a 2x speed decrease from TLS, not 10x.

There could be a few reasons why you are experiencing a decrease in upload speed when using Traefik as your reverse proxy with TLS enabled.
One potential reason is that the overhead of TLS encryption and decryption is a bottleneck: check whether Traefik's CPU usage is high during an upload.
Another potential reason is that your Traefik configuration is not optimized for performance: a misconfigured setting can burn CPU or leave available resources idle.
You could try some of the following steps to narrow the problem down (a plain-HTTP comparison is sketched below):
Don't look for worker settings: Traefik has no worker-thread or worker-process options; as a Go program it already uses all available CPUs.
If long uploads are being cut off rather than merely slow, raise the entrypoint timeouts, e.g. --entryPoints.https.transport.respondingTimeouts.readTimeout=6h; this affects reliability, not throughput.
To check whether the problem is related to the encryption process, serve the same router over a plain-HTTP entrypoint through Traefik and see whether throughput recovers.
Finally, you could try disabling the access log, which can reduce per-request CPU overhead.
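To make that comparison concrete, here is a minimal sketch based on the compose file above. It keeps the existing TLS router and adds a second, plain-HTTP router to the same librespeed service; the librespeed-plain router name and the idea of removing the entrypoint-wide http-to-https redirect while benchmarking are mine, not part of the original setup:

  librespeed:
    image: adolfintel/speedtest
    environment:
      - MODE=standalone
    networks:
      - outbound
    labels:
      - "traefik.enable=true"
      # existing TLS router, unchanged
      - "traefik.http.routers.librespeed.rule=Host(`s.mydomain.com`)"
      - "traefik.http.routers.librespeed.entrypoints=https"
      - "traefik.http.routers.librespeed.tls=true"
      - "traefik.http.routers.librespeed.tls.certresolver=myresolver"
      # second router on the plain-HTTP entrypoint, for benchmarking only;
      # drop the two ...redirections... flags from the traefik command while
      # testing, otherwise this router is never reached
      - "traefik.http.routers.librespeed-plain.rule=Host(`s.mydomain.com`)"
      - "traefik.http.routers.librespeed-plain.entrypoints=http"
      - "traefik.http.routers.librespeed-plain.service=librespeed"
      # both routers share the same backend service
      - "traefik.http.services.librespeed.loadbalancer.server.port=80"

If plain HTTP through Traefik gets close to the direct 150/300 result, the loss is in the TLS (and HTTP/2) path; if it is equally slow, look at the network path instead (Docker NAT, MTU over the public route).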

Related

Docker containers become unresponsive/hang on error

I'm running Docker Desktop on Windows and am having a problem with containers becoming unresponsive after startup errors. This doesn't happen every time, but most of the time. Consequently, I have to be very careful to start my containers one at a time, and if I see one error, I have to "Restart Docker Desktop" and begin the whole startup sequence again.
I'm using docker-compose. As a specific example, this morning I started elasticsearch, zookeeper, then kafka. Kafka threw an exception regarding the zookeeper state and shut down, and now the kafka container is unresponsive in Docker. I can't stop it (it's already stopped?), yet it shows as running; I can't CLI into it and I can't restart it. The only way forward is to restart Docker using the debug menu. (If I have the restart: always flag on, the containers will restart automatically, but given they're throwing errors, they just spin in circles, starting then dying, without my being able to stop/kill/remove the offending container.)
Once I've restarted docker, I'll be able to view the log of the container and see the error that was thrown...
This happens with pretty much all of my containers, however it does appear that if I start the container whilst viewing the log window within Docker Desktop, it is perhaps 'more likely' that I'll be able to start the container again if it has an error.
I've tried several different containers and this seems to be a pretty common issue for us, it doesn't appear to relate to any specific settings that I'm passing into the containers, however an extract from our docker-compose file is below:
volumes:
  zData:
  kData:
  eData:

zookeeper:
  container_name: zookeeper
  image: bitnami/zookeeper:latest
  environment:
    ALLOW_ANONYMOUS_LOGIN: "yes" # Dev only
    ZOOKEEPER_ROOT_LOGGER: WARN, CONSOLE
    ZOOKEEPER_CONSOLE_THRESHOLD: WARN
  ports:
    - "2181:2181"
  volumes:
    - zData:/bitnami/zookeeper:rw
  logging:
    driver: "fluentd"
    options:
      fluentd-address: localhost:24224
      tag: zookeeper
      fluentd-async-connect: "true"

kafka:
  container_name: kafka
  image: bitnami/kafka:latest
  depends_on:
    - zookeeper
  environment:
    ALLOW_PLAINTEXT_LISTENER: "yes" # Debug only
    KAFKA_ADVERTISED_PORT: 9092
    KAFKA_ADVERTISED_HOST_NAME: kafka
    KAFKA_CREATE_TOPICS: xx1_event:1:1,xx2_event:1:1,xx3_event:1:1,xx4_event:1:1
    KAFKA_JMX_OPTS: -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=${DOCKER_HOSTNAME} -Dcom.sun.management.jmxremote.rmi.port=9096 -Djava.net.preferIPv4Stack=true
    JMX_PORT: 9096
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
  hostname: kakfa
  ports:
    - 9092:9092
    - 9096:9096
  volumes:
    - kData:/bitnami/kafka:rw
  logging:
    driver: "fluentd"
    options:
      fluentd-address: localhost:24224
      tag: zookeeper
      fluentd-async-connect: "true"

elasticsearch:
  image: bitnami/elasticsearch:latest
  container_name: elasticsearch
  cpu_shares: 2048
  environment:
    ELASTICSEARCH_HEAP_SIZE: "2048m"
    xpack.monitoring.enabled: "false"
  ports:
    - 9200:9200
    - 9300:9300
  volumes:
    - C:/config/elasticsearch.yml:/opt/bitnami/elasticsearch/config/my_elasticsearch.yml:rw
    - eData:/bitnami/elasticsearch/data:rw
I've wondered whether this could be a resourcing issue, but I'm running this on a reasonably specced laptop (i7, SSD, 16 GB RAM) using WSL2 (it also happens under Hyper-V), and RAM limits don't look like they're being approached. When there are no errors on startup, the system runs fine and uses far more resources.
Any ideas on what I could try? I'm surprised there aren't many more people struggling with this.
There is currently an open issue (https://github.com/moby/moby/issues/40063) where containers hang/freeze/become unresponsive when their logging driver is set to fluentd in asynchronous mode AND the fluentd container is not operational.
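Until that is fixed upstream, a pragmatic workaround is to not let a container's logging depend on fluentd being up: start the fluentd container first, or temporarily fall back to the default json-file driver. A minimal sketch against the compose extract above, trimmed to the logging part:

zookeeper:
  container_name: zookeeper
  image: bitnami/zookeeper:latest
  logging:
    # json-file cannot block the container when the log collector is down;
    # keep the same rotation limits used elsewhere in this file
    driver: "json-file"
    options:
      max-size: "100m"
      max-file: "10"

On Docker 20.10+ the option is fluentd-async (the fluentd-async-connect used above is deprecated), but per the issue above, asynchronous fluentd logging with an unreachable fluentd is exactly the combination that triggers the hang.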

docker nginx-proxy requests let's encrypt certificates hit rate limit

I use nginx-proxy from jwilder and observe that the same Let's Encrypt certificates are repeatedly recreated. I am in the process of debugging the servers, and my guess is that if I start only a subset of the servers, the certificates for the ones not started are lost; when these are started later, the certificates are recreated with new requests to Let's Encrypt, and eventually I hit the rate limit. Another explanation could be that I removed and restarted the container which keeps the certificates.
ACME server returned an error: urn:ietf:params:acme:error:rateLimited
:: There were too many requests of a given type :: Error creating new
order :: too many certificates already issued for exact set of
domains: caldav.gerastree.at: see
https://letsencrypt.org/docs/rate-limits/.
The limit is 5 per week.
What can be done to "reuse" certificates and not have new ones requested? When are certificates removed?
The docker-compose.yml file is from traskit, which is a multi-architecture version of jwilder:
version: '2'
services:
  frontproxy:
    image: traskit/nginx-proxy
    container_name: frontproxy
    labels:
      - "com.github.jrcs.letsencrypt_nginx_proxy_companion.docker_gen"
    restart: always
    environment:
      DEFAULT_HOST: default.vhost
      HSTS: "off"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      # - /home/frank/Data/htpasswd:/etc/nginx/htpasswd
      - /var/run/docker.sock:/tmp/docker.sock:ro
      - "certs-volume:/etc/nginx/certs:ro"
      - "/etc/nginx/vhost.d"
      - "/usr/share/nginx/html"
  nginx-letsencrypt-companion:
    restart: always
    image: jrcs/letsencrypt-nginx-proxy-companion
    volumes:
      - "certs-volume:/etc/nginx/certs"
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
    volumes_from:
      - "frontproxy"
volumes:
  certs-volume:
For anyone finding this in the future: LE say that there's no way to clear the status of your domain set once you've hit the rate limit until the 7-day "sliding window" has elapsed, regardless of how you spell or arrange the domains in the certbot command.
However, if, like me, you have a spare domain kicking around that you haven't yet added to the cert, add it with another -d flag and re-run the command; the limit applies to the exact set of domains, so changing the set avoids it. This worked for me.
I have the same issue: certs are requested inside the Docker container each time it starts, and there seems to be no way around it. You can use the staging server, but those certs are not trusted by browsers.
So if it's an option for you, you could run certbot on the host and pass the certs into the container.
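If you only need to debug why certificates keep being re-requested, you can avoid burning the production rate limit by pointing the companion at the Let's Encrypt staging CA (much higher limits, but untrusted certificates). A sketch using the jrcs companion's ACME_CA_URI variable:

nginx-letsencrypt-companion:
  restart: always
  image: jrcs/letsencrypt-nginx-proxy-companion
  environment:
    # staging CA: high rate limits, untrusted certs; for debugging only
    ACME_CA_URI: https://acme-staging-v02.api.letsencrypt.org/directory
  volumes:
    - "certs-volume:/etc/nginx/certs"
    - "/var/run/docker.sock:/var/run/docker.sock:ro"
  volumes_from:
    - "frontproxy"

Keeping the certificates in the named certs-volume (as above) and never removing that volume is what lets them survive container re-creation.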

Microservice can not reach Elasticsearch Image

I have a microservice built with JHipster v5 and an Elasticsearch 2.4.1 image, running in a Vagrant CentOS 7 VM. Both images are running, but save and search operations cannot reach the Elasticsearch container.
docker-compose:
service-app:
  image: "..."
  depends_on:
    - service-mysql
    - service-elasticsearch
    - kafka
    - zookeeper
    - jhipster-registry
  environment:
    - SPRING_PROFILES_ACTIVE=dev,swagger
    - SPRING_CLOUD_CONFIG_URI=http://admin:admin@jhipster-registry:8761/config
    - SPRING_DATASOURCE_URL=jdbc:mysql://service-mysql:3306/service?useUnicode=true&characterEncoding=utf8&useSSL=false
    - SPRING_DATA_CASSANDRA_CONTACTPOINTS=cassandra
    - JHIPSTER_SLEEP=30
    - JHIPSTER_LOGGING_LOGSTASH_HOST=jhipster-logstash
    - JHIPSTER_LOGGING_LOGSTASH_PORT=5000
    - SPRING_DATA_ELASTICSEARCH_CLUSTER-NAME=SERVICE
    - SPRING_DATA_ELASTICSEARCH_CLUSTER_NODES=service-elasticsearch:9300
    - SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS=kafka
    - SPRING_CLOUD_STREAM_KAFKA_BINDER_ZK_NODES=zookeeper
    - EUREKA_CLIENT_SERVICEURL_DEFAULTZONE=http://admin:admin@jhipster-registry:8761/eureka
  ports:
    - 60088:8088
  logging:
    driver: "json-file"
    options:
      max-size: "100m"
      max-file: "10"

service-elasticsearch:
  image: ...
  volumes:
    - service-elasticsearch:/usr/share/elasticsearch/data/
  environment:
    - network.host=0.0.0.0
    - cluster.name=service
    - discovery.type=single-node
    - CLUSTER_NAME=SERVICE
  logging:
    driver: "json-file"
    options:
      max-size: "100m"
      max-file: "10"
application-dev.yml:
data:
  elasticsearch:
    properties:
      path:
        home: target/elasticsearch
application-prod.yml:
data:
  jest:
    uri: http://localhost:9200
domain:
The issue is that one of the ES nodes in your cluster is running low on disk space, hence you are getting this exception.
Make sure you clean up disk space on the ES nodes that throw the exception. I have faced this issue a few times, and it does not depend on the Elasticsearch index size: the threshold is relative to the disk, so even a very small index on a large disk (say 2 TB) will trigger it once free space drops below 10% (roughly 200 GB there, which is huge), and you still need to free up disk space.
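If this is a dev setup and you cannot free space immediately, you can relax Elasticsearch's disk watermarks instead. A sketch against the compose file above, assuming (as the existing network.host and cluster.name entries suggest) that your image maps environment entries to ES settings; note that on ES 2.x only the low/high watermarks exist:

service-elasticsearch:
  image: ...
  environment:
    - network.host=0.0.0.0
    - cluster.name=service
    - discovery.type=single-node
    # dev-only: keep allocating shards until the disk is 95%/97% full
    # (defaults are 85%/90%); never do this in production
    - cluster.routing.allocation.disk.watermark.low=95%
    - cluster.routing.allocation.disk.watermark.high=97%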

Configuring Docker with Traefik, Nginx and Laravel

I am trying to figure out how to set up a simple stack for development and, later, deployment. I want to use Docker to run Traefik in a container as the public-facing reverse proxy, which then interfaces as needed with an Nginx container used only to serve static frontend files (HTML, CSS, JS) and a backend PHP container running Laravel (I'm intentionally decoupling the frontend and the API for this project).
I am trying my best to learn from all the video and written tutorials out there, but things become complicated very quickly (at least for my uninitiated brain) and it's a bit overwhelming. I have a one-week deadline to complete this project, and I'm strongly considering dropping Docker altogether for now, out of fear that I'll spend the whole week messing around with configuration instead of actually coding!
To get started, I have a simple docker-compose with the following configuration that I've verified at least runs correctly:
version: '3'
services:
  reverse-proxy:
    image: traefik
    command: --api --docker # Enables the Web UI and tells Traefik to listen to Docker.
    ports:
      - "80:80"     # HTTP port
      - "8080:8080" # Web UI
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock # So that Traefik can listen to Docker events.
Now, I need to figure out how to connect Nginx and PHP/Laravel effectively.
First of all, don't put yourself under pressure to learn new stuff; if you do, learning it won't feel comfortable anymore. Use the technology you already know to get the job done. If you finish and find you have a day or two left before the deadline, overdeliver by adding the new technology then. This way you won't blow your deadline, and you won't be stressed while figuring out new technology or configuration.
The configuration you see below is neither complete nor functionally tested. I just copied most of it out of three of my main projects to give you a starting point. Traefik as-is can be complicated to set up properly.
version: '3'
# Instantiate your own configuration with a Dockerfile!
# This way you can build somewhere and just deploy your container
# anywhere without the need to copy files around.
services:
  # traefik as reverse-proxy
  traefik:
    build:
      context: .
      dockerfile: ./Dockerfile-for-traefik # including traefik.toml
    command: --docker
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      # this file you'll have to create manually: `touch acme.json && chmod 600 acme.json`
      - /home/docker/volumes/traefik/acme.json:/opt/traefik/acme.json
    networks:
      - overlay
    ports:
      - 80:80
      - 443:443
  nginx:
    build:
      context: .
      dockerfile: ./Dockerfile-for-nginx
    networks:
      - overlay
    depends_on:
      - laravel
    volumes:
      # you can copy your assets to production with
      # `tar -c -C ./myassets . | docker cp - myfolder_nginx_1:/var/www/assets`
      # there are many other ways to achieve this!
      - assets:/var/www/assets
  # define your application + whatever it needs to run
  # important:
  # - "build:" will search for a Dockerfile in the directory you're specifying
  laravel:
    build: ./path/to/laravel/app
    environment:
      MYSQL_ROOT_PASSWORD: password
      ENVIRONMENT: development
      MYSQL_DATABASE: your_database
      MYSQL_USER: your_database_user
    networks:
      - overlay
    links:
      - mysql
    volumes:
      # this path is for development
      - ./path/to/laravel/app:/app
  # you need a database, right?
  mysql:
    image: mysql:5
    environment:
      MYSQL_ROOT_PASSWORD: password
      MYSQL_DATABASE: your_database
      MYSQL_USER: your_database_user
    networks:
      - overlay
    volumes:
      - mysql-data:/var/lib/mysql
volumes:
  mysql-data:
  assets:
networks:
  overlay:
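The compose sketch above doesn't yet tell Traefik how to discover and route to the nginx container; with the Docker provider that happens through labels. A minimal Traefik v1-style addition (the hostname is a placeholder, not from the original post):

  nginx:
    build:
      context: .
      dockerfile: ./Dockerfile-for-nginx
    networks:
      - overlay
    labels:
      # Traefik v1 labels: route requests for this host to the container's port 80
      - traefik.enable=true
      - traefik.frontend.rule=Host:app.example.com
      - traefik.port=80

Only the services Traefik should expose need labels; the Laravel container can stay label-free if nginx proxies PHP requests to it over the shared overlay network.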

Traefik - Can't connect via https

I am trying to run Traefik on a Raspberry Pi Docker Swarm (specifically following this guide from the OpenFaaS project: https://github.com/openfaas/faas/blob/master/guide/traefik_integration.md) but have run into some trouble when actually trying to connect via https.
Specifically, there are two issues:
1) When I connect to http://192.168.1.20/ui I am given the username/password prompt, but the credentials (with the unhashed password) generated by htpasswd and used in the docker-compose.yml below are not accepted.
2) Visiting the https version (https://192.168.1.20/ui) does not connect at all. The same happens if I try to connect using the domain I have set in --acme.domains.
When I explore /etc/ I can see that no /etc/traefik/ directory exists, though one should presumably be created, so perhaps this is the root of my problem?
The relevant part of my docker-compose.yml looks like
traefik:
  image: traefik:v1.3
  command: -c --docker=true
    --docker.swarmmode=true
    --docker.domain=traefik
    --docker.watch=true
    --web=true
    --debug=true
    --defaultEntryPoints=https,http
    --acme=true
    --acme.domains='<my domain>'
    --acme.email=myemail@gmail.com
    --acme.ondemand=true
    --acme.onhostrule=true
    --acme.storage=/etc/traefik/acme/acme.json
    --entryPoints=Name:https Address::443 TLS
    --entryPoints=Name:http Address::80 Redirect.EntryPoint:https
  ports:
    - 80:80
    - 8080:8080
    - 443:443
  volumes:
    - "/var/run/docker.sock:/var/run/docker.sock"
    - "acme:/etc/traefik/acme"
  networks:
    - functions
  deploy:
    labels:
      - traefik.port=8080
      - traefik.frontend.rule=PathPrefix:/ui,/system,/function
      - traefik.frontend.auth.basic=user:password # <-- relevant credentials from htpasswd here
    restart_policy:
      condition: on-failure
      delay: 5s
      max_attempts: 20
      window: 380s
    placement:
      constraints: [node.role == manager]
volumes:
  acme:
Any help very much appreciated.
Due to https://community.letsencrypt.org/t/2018-01-09-issue-with-tls-sni-01-and-shared-hosting-infrastructure/49996,
the TLS-SNI-01 challenge (Traefik's default) for Let's Encrypt no longer works.
You must use the DNS challenge instead: https://docs.traefik.io/configuration/acme/#dnsprovider.
Or wait for https://github.com/containous/traefik/pull/2701 to be merged.
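As a rough sketch of what that change looks like in the compose file above (the provider name and credential variables are assumptions for Cloudflare; check the dnsProvider docs for the exact flag spelling in your 1.x version):

traefik:
  image: traefik:v1.3
  # switch ACME from the retired TLS-SNI-01 default to the DNS challenge
  command: -c --docker=true
    --docker.swarmmode=true
    --defaultEntryPoints=https,http
    --acme=true
    --acme.domains='<my domain>'
    --acme.email=myemail@gmail.com
    --acme.storage=/etc/traefik/acme/acme.json
    --acme.dnsProvider=cloudflare
    --entryPoints=Name:https Address::443 TLS
    --entryPoints=Name:http Address::80 Redirect.EntryPoint:https
  environment:
    # lego credentials for the chosen provider (Cloudflare here)
    CLOUDFLARE_EMAIL: you@example.com
    CLOUDFLARE_API_KEY: <your api key>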
