I'm having some issues trying to connect a producer container to a Kafka container.
I will have 3 different projects, each running in a Docker container on the same machine:
kafka server
producer
consumer
At the moment, my Kafka server is running well, and I have just made a producer from which I'm trying to send a message (just a basic test).
I'm getting this error:
kafka:9092/bootstrap: Connect to ipv4#172.28.0.3:9092 failed: Connection refused
I checked multiple posts and answers but I'm a bit lost. I read that the containers needed to share the same network, so I set that up, but I can't figure out what else I'm missing.
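A quick reachability check can help narrow this down before comparing the compose files (a hedged sketch: the container names are taken from the docker network inspect output further down and may differ on your machine, and it assumes curl is available in the Sail image):
# did the broker actually start, or is it failing because it cannot reach zookeeper?
docker logs server_kafka_1 --tail 50
# from the producer container, check name resolution and whether port 9092 is open
docker exec -it microservice-1_laravel.test_1 curl -v telnet://kafka:9092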
Here is my Kafka server docker-compose file:
version: '3'
services:
laravel.test:
build:
context: ./vendor/laravel/sail/runtimes/8.1
dockerfile: Dockerfile
args:
WWWGROUP: '${WWWGROUP}'
image: sail-8.1/app
extra_hosts:
- 'host.docker.internal:host-gateway'
ports:
- '${APP_PORT:-80}:80'
environment:
WWWUSER: '${WWWUSER}'
LARAVEL_SAIL: 1
XDEBUG_MODE: '${SAIL_XDEBUG_MODE:-off}'
XDEBUG_CONFIG: '${SAIL_XDEBUG_CONFIG:-client_host=host.docker.internal}'
volumes:
- '.:/var/www/html'
networks:
- sail
depends_on:
- pgsql
pgsql:
image: 'postgres:13'
ports:
- '${FORWARD_DB_PORT:-5432}:5432'
environment:
PGPASSWORD: '${DB_PASSWORD:-secret}'
POSTGRES_DB: '${DB_DATABASE}'
POSTGRES_USER: '${DB_USERNAME}'
POSTGRES_PASSWORD: '${DB_PASSWORD:-secret}'
volumes:
- 'sail-pgsql:/var/lib/postgresql/data'
networks:
- sail
healthcheck:
test:
[
"CMD",
"pg_isready",
"-q",
"-d",
"${DB_DATABASE}",
"-U",
"${DB_USERNAME}"
]
retries: 3
timeout: 5s
zookeeper:
image: confluentinc/cp-zookeeper:latest
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
ports:
- 22181:2181
kafka:
image: confluentinc/cp-kafka:latest
depends_on:
- zookeeper
ports:
- 29092:29092
environment:
KAFKA_BROKER_ID: 1
KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
KAFKA_ADVERTISED_LISTENERS: LISTENER_INTERNAL://kafka:9092,LISTENER_EXTERNAL://localhost:29092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_INTERNAL:PLAINTEXT,LISTENER_EXTERNAL:PLAINTEXT
KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_INTERNAL
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
networks:
- kafka
networks:
sail:
driver: bridge
kafka:
driver: bridge
volumes:
sail-pgsql:
driver: local
Here is my producer's docker-compose file:
version: '3'
services:
laravel.test:
build:
context: ./vendor/laravel/sail/runtimes/8.1
dockerfile: Dockerfile
args:
WWWGROUP: '${WWWGROUP}'
image: sail-8.1/app
extra_hosts:
- 'host.docker.internal:host-gateway'
ports:
- '${APP_PORT:-80}:80'
environment:
WWWUSER: '${WWWUSER}'
LARAVEL_SAIL: 1
XDEBUG_MODE: '${SAIL_XDEBUG_MODE:-off}'
XDEBUG_CONFIG: '${SAIL_XDEBUG_CONFIG:-client_host=host.docker.internal}'
volumes:
- '.:/var/www/html'
networks:
- sail
- proxy
depends_on:
- pgsql
pgsql:
image: 'postgres:13'
ports:
- '${FORWARD_DB_PORT:-5432}:5432'
environment:
PGPASSWORD: '${DB_PASSWORD:-secret}'
POSTGRES_DB: '${DB_DATABASE}'
POSTGRES_USER: '${DB_USERNAME}'
POSTGRES_PASSWORD: '${DB_PASSWORD:-secret}'
volumes:
- 'sail-pgsql:/var/lib/postgresql/data'
networks:
- sail
healthcheck:
test: ["CMD", "pg_isready", "-q", "-d", "${DB_DATABASE}", "-U", "${DB_USERNAME}"]
retries: 3
timeout: 5s
networks:
sail:
driver: bridge
proxy:
external:
name: server_kafka
volumes:
sail-pgsql:
driver: local
When I inspect the network, I see both of my containers:
docker network inspect server_kafka
[
{
"Name": "server_kafka",
"Id": "6bc8ed3f604da554eeead58dca06ba8dd926673ae9683095de18a56b45bbd70f",
"Created": "2022-02-21T11:55:31.022965034+01:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.28.0.0/16",
"Gateway": "172.28.0.1"
}
]
},
"Internal": false,
"Attachable": true,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"0b4c79952626cc79e757681137523ad918b802945d82df501f7f1ad1141ca841": {
"Name": "microservice-1_laravel.test_1",
"EndpointID": "99398ddf104b8f8ce478a611e6885fba8960fd956c59938cca4fbecc8d7d21c5",
"MacAddress": "02:42:ac:1c:00:02",
"IPv4Address": "172.28.0.2/16",
"IPv6Address": ""
},
"c1bf90faa56f61432ad99e11408ea50bfeaf5fcdeca0bfd9791bf28cb9fea835": {
"Name": "server_kafka_1",
"EndpointID": "4af2450208622da0550d935332c5451c8c53a07ff10a120fb519f30cfbd157cb",
"MacAddress": "02:42:ac:1c:00:03",
"IPv4Address": "172.28.0.3/16",
"IPv6Address": ""
}
},
"Options": {},
"Labels": {
"com.docker.compose.network": "kafka",
"com.docker.compose.project": "server",
"com.docker.compose.version": "1.29.2"
}
}
]
The doc above is nice.
In my case, I was missing the network on the zookeeper service in my docker-compose file:
networks:
- kafka
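For reference, a minimal sketch of that change applied to the Kafka server compose file from the question (only the networks block is new):
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - 22181:2181
    networks:
      - kafka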
I've been running my ECK (Elastic Cloud on Kubernetes) cluster for a couple of weeks with no issues. However, 3 days ago filebeat stopped being able to connect to my ES service. All pods are up and running (Elastic, Beats and Kibana).
Also, shelling into the Filebeat pods and connecting to the Elasticsearch service works just fine:
curl -k -u "user:$PASSWORD" https://quickstart-es-http.quickstart.svc:9200
{
"name" : "aegis-es-default-4",
"cluster_name" : "quickstart",
"cluster_uuid" : "",
"version" : {
"number" : "7.14.0",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "",
"build_date" : "",
"build_snapshot" : false,
"lucene_version" : "8.9.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
Yet the Filebeat pod logs are producing the error below:
ERROR
[publisher_pipeline_output] pipeline/output.go:154
Failed to connect to backoff(elasticsearch(https://quickstart-es-http.quickstart.svc:9200)):
Connection marked as failed because the onConnect callback failed: could not connect to a compatible version of Elasticsearch:
503 Service Unavailable:
{
"error": {
"root_cause": [
{ "type": "master_not_discovered_exception", "reason": null }
],
"type": "master_not_discovered_exception",
"reason": null
},
"status": 503
}
I haven't made any changes, so could it be a case of authentication or SSL certificates needing updating?
My filebeats config looks like this:
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
name: quickstart
namespace: quickstart
spec:
type: filebeat
version: 7.14.0
elasticsearchRef:
name: quickstart
config:
filebeat:
modules:
- module: gcp
audit:
enabled: true
var.project_id: project_id
var.topic: topic_name
var.subcription: sub_name
var.credentials_file: /usr/certs/credentials_file
var.keep_original_message: false
vpcflow:
enabled: true
var.project_id: project_id
var.topic: topic_name
var.subscription_name: sub_name
var.credentials_file: /usr/certs/credentials_file
firewall:
enabled: true
var.project_id: project_id
var.topic: topic_name
var.subscription_name: sub_name
var.credentials_file: /usr/certs/credentials_file
daemonSet:
podTemplate:
spec:
serviceAccountName: filebeat
automountServiceAccountToken: true
dnsPolicy: ClusterFirstWithHostNet
hostNetwork: true
securityContext:
runAsUser: 0
containers:
- name: filebeat
volumeMounts:
- name: varlogcontainers
mountPath: /var/log/containers
- name: varlogpods
mountPath: /var/log/pods
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
- name: credentials
mountPath: /usr/certs
readOnly: true
volumes:
- name: varlogcontainers
hostPath:
path: /var/log/containers
- name: varlogpods
hostPath:
path: /var/log/pods
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: credentials
secret:
defaultMode: 420
items:
secretName: elastic-service-account
And it was working just fine; I haven't made any changes to this config that would make it lose access.
I did a little more digging and found that there weren't enough resources available to assign a master node.
When I ran GET /_cat/master it returned the same 503 "no master" error. I added a new node pool and the cluster started running normally.
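For anyone hitting the same symptom, a couple of hedged checks (reusing the service name, namespace and credentials from the question):
# does the cluster currently have an elected master?
curl -k -u "user:$PASSWORD" https://quickstart-es-http.quickstart.svc:9200/_cat/master
# are any Elasticsearch pods unschedulable or evicted for lack of resources?
kubectl get pods -n quickstart -o wide
kubectl describe nodes | grep -A 5 "Allocated resources"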
I'm trying to deploy an ELK stack on AKS that will take messages from RabbitMQ and ultimately end up in Kibana. To do this I'm using the Elastic operator, installed via
kubectl apply -f https://download.elastic.co/downloads/eck/1.3.0/all-in-one.yaml
Everything is working except the connection between Logstash and Elasticsearch. I can log in to Kibana, I can get the default Elasticsearch message in the browser, and all the logs look fine, so I think the issue lies in the Logstash configuration. My configuration is at the end of the question; you can see I'm using secrets to fetch the various passwords and to mount the public certificates so that HTTPS works.
Most confusingly, I can bash into the running Logstash pod and, with the exact same certificate, run
curl --cacert /etc/logstash/certificates/tls.crt -u elastic:<redacted-password> https://rabt-db-es-http:9200
This gives me the response:
{
"name" : "rabt-db-es-default-0",
"cluster_name" : "rabt-db",
"cluster_uuid" : "9YoWLsnMTwq5Yor1ak2JGw",
"version" : {
"number" : "7.10.0",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "51e9d6f22758d0374a0f3f5c6e8f3a7997850f96",
"build_date" : "2020-11-09T21:30:33.964949Z",
"build_snapshot" : false,
"lucene_version" : "8.7.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
To me, this verifies that the pod can communicate with the database and has the correct user, password and certificate in place to do it securely. Why then does this fail when going through the Logstash conf file?
The error from the Logstash end is
[WARN ] 2021-01-14 15:24:38.360 [Ruby-0-Thread-6: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-10.6.2-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:241] elasticsearch - Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"https://rabt-db-es-http:9200/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::BadResponseCodeError, :error=>"Got response code '401' contacting Elasticsearch at URL 'https://rabt-db-es-http:9200/'"}
From Elasticsearch I can see the failed requests as
{"type": "server", "timestamp": "2021-01-14T15:36:13,725Z", "level": "WARN", "component": "o.e.x.s.t.n.SecurityNetty4HttpServerTransport", "cluster.name": "rabt-db", "node.name": "rabt-db-es-default-0", "message": "http client did not trust this server's certificate, closing connection Netty4HttpChannel{localAddress=/10.244.0.30:9200, remoteAddress=/10.244.0.15:37766}", "cluster.uuid": "9YoWLsnMTwq5Yor1ak2JGw", "node.id": "9w3fXZBZQGeBMeFYGqYUXw" }
---
apiVersion: v1
kind: ConfigMap
metadata:
name: logstash-config
labels:
app.kubernetes.io/name: rabt
app.kubernetes.io/component: logstash
data:
logstash.yml: |
http.host: "0.0.0.0"
path.config: /usr/share/logstash/pipeline
---
apiVersion: v1
kind: ConfigMap
metadata:
name: logstash-pipeline
labels:
app.kubernetes.io/name: rabt
app.kubernetes.io/component: logstash
data:
logstash.conf: |
input {
rabbitmq {
host => "rabt-mq"
port => 5672
durable => true
queue => "rabt-rainfall-queue"
exchange => "rabt-rainfall-exchange"
exchange_type => "direct"
heartbeat => 30
durable => true
user => "${RMQ_USERNAME}"
password => "${RMQ_PASSWORD}"
}
file {
path => "/usr/share/logstash/config/intensity.csv"
start_position => "beginning"
codec => plain {
charset => "ISO-8859-1"
}
type => "intensity"
}
}
filter {
csv {
separator => ","
columns => ["Duration", "Intensity"]
}
}
output {
if [type] == "rainfall" {
elasticsearch {
hosts => [ "${ES_HOSTS}" ]
ssl => true
cacert => "/etc/logstash/certificates/tls.crt"
index => "rabt-rainfall-%{+YYYY.MM.dd}"
}
}
else if[type] == "intensity"{
elasticsearch {
hosts => [ "${ES_HOSTS}" ]
ssl => true
cacert => "/etc/logstash/certificates/tls.crt"
index => "intensity-%{+YYYY.MM.dd}"
}
}
}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: rainfall-intensity-threshold
labels:
app.kubernetes.io/name: rabt
app.kubernetes.io/component: logstash
data:
intensity.csv: |
Duration,Intensity
0.1,7.18941593
0.2,6.34611898
0.3,5.89945352
0.4,5.60173824
0.5,5.38119846
0.6,5.20746530
0.7,5.06495933
0.8,4.94467113
0.9,4.84094288
1,4.75000000
2,4.19283923
3,3.89773029
4,3.70103175
5,3.55532256
6,3.44053820
7,3.34638544
8,3.26691182
9,3.19837924
10,3.13829388
20,2.77018141
30,2.57520486
40,2.44524743
50,2.34897832
60,2.27314105
70,2.21093494
80,2.15842723
90,2.11314821
100,2.07345020
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: rabt-logstash
labels:
app.kubernetes.io/name: rabt
app.kubernetes.io/component: logstash
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: rabt
app.kubernetes.io/component: logstash
template:
metadata:
labels:
app.kubernetes.io/name: rabt
app.kubernetes.io/component: logstash
spec:
containers:
- name: logstash
image: docker.elastic.co/logstash/logstash:7.9.2
ports:
- name: "tcp-beats"
containerPort: 5044
env:
- name: ES_HOSTS
value: "https://rabt-db-es-http:9200"
- name: ES_USER
value: "elastic"
- name: ES_PASSWORD
valueFrom:
secretKeyRef:
name: rabt-db-es-elastic-user
key: elastic
- name: RMQ_USERNAME
valueFrom:
secretKeyRef:
name: rabt-mq-default-user
key: username
- name: RMQ_PASSWORD
valueFrom:
secretKeyRef:
name: rabt-mq-default-user
key: password
volumeMounts:
- name: config-volume
mountPath: /usr/share/logstash/config
- name: pipeline-volume
mountPath: /usr/share/logstash/pipeline
- name: ca-certs
mountPath: /etc/logstash/certificates
readOnly: true
volumes:
- name: config-volume
projected:
sources:
- configMap:
name: logstash-config
- configMap:
name: rainfall-intensity-threshold
- name: pipeline-volume
configMap:
name: logstash-pipeline
- name: ca-certs
secret:
secretName: rabt-db-es-http-certs-public
---
apiVersion: v1
kind: Service
metadata:
name: rabt-logstash
labels:
app.kubernetes.io/name: rabt
app.kubernetes.io/component: logstash
spec:
ports:
- name: "tcp-beats"
port: 5044
targetPort: 5044
selector:
app.kubernetes.io/name: rabt
app.kubernetes.io/component: logstash
You're missing the user/password in the Logstash output configuration:
elasticsearch {
hosts => [ "${ES_HOSTS}" ]
ssl => true
cacert => "/etc/logstash/certificates/tls.crt"
index => "rabt-rainfall-%{+YYYY.MM.dd}"
user => "${ES_USER}"
password => "${ES_PASSWORD}"
}
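Note that the Deployment in the question already exposes ES_USER and ES_PASSWORD as environment variables, so Logstash can expand them the same way it expands RMQ_USERNAME and RMQ_PASSWORD. The second output block (the intensity index) needs the same two lines, for example:
elasticsearch {
  hosts => [ "${ES_HOSTS}" ]
  ssl => true
  cacert => "/etc/logstash/certificates/tls.crt"
  index => "intensity-%{+YYYY.MM.dd}"
  user => "${ES_USER}"
  password => "${ES_PASSWORD}"
}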
I want to deploy my microservice infrastructure to AKS on Azure. I created a node on which 3 microservices run. My API gateway should be reachable via a public IP and should forward data to my other two microservices.
PS /home/jan-marius> kubectl get pods
NAME READY STATUS RESTARTS AGE
apigateway-77875f89cb-qcmnf 1/1 Running 0 18h
contacts-5ccc69f74-x287p 1/1 Running 0 18h
templates-579fc4984b-srv7h 1/1 Running 0 18h
So far so good. After that, I created a public IP following the Microsoft docs and changed my YAML file as follows.
az network public-ip create \
--resource-group myResourceGroup \
--name myAKSPublicIP \
--sku Standard \
--allocation-method static
apiVersion: apps/v1
kind: Deployment
metadata:
name: apigateway
spec:
replicas: 1
selector:
matchLabels:
app: apigateway
template:
metadata:
labels:
app: apigateway
spec:
nodeSelector:
"beta.kubernetes.io/os": linux
containers:
- name: apigateway
image: xxx.azurecr.io/apigateway:11
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 250m
memory: 512Mi
ports:
- containerPort: 8800
name: apigateway
---
apiVersion: v1
kind: Service
metadata:
annotations:
service.beta.kubernetes.io/azure-dns-label-name: tegos-sendmessage
name: apigateway
spec:
loadBalancerIP: 20.50.10.36
type: LoadBalancer
ports:
- port: 8800
selector:
app: apigateway
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: contacts
spec:
replicas: 1
selector:
matchLabels:
app: contacts
template:
metadata:
labels:
app: contacts
spec:
nodeSelector:
"beta.kubernetes.io/os": linux
containers:
- name: contacts
image: xxx.azurecr.io/contacts:12
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 250m
memory: 512Mi
ports:
- containerPort: 8100
name: contacts
---
apiVersion: v1
kind: Service
metadata:
name: contacts
spec:
ports:
- port: 8100
selector:
app: contacts
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: templates
spec:
replicas: 1
selector:
matchLabels:
app: templates
template:
metadata:
labels:
app: templates
spec:
nodeSelector:
"beta.kubernetes.io/os": linux
containers:
- name: templates
image: xxx.azurecr.io/templates:13
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 250m
memory: 512Mi
ports:
- containerPort: 8200
name: templates
---
apiVersion: v1
kind: Service
metadata:
name: templates
spec:
ports:
- port: 8200
selector:
app: templates
However, when I check the external IP address with kubectl get service, the status stays pending:
PS /home/jan-marius> kubectl get service apigateway
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
apigateway LoadBalancer 10.0.181.113 <pending> 8800:30817/TCP 19h
PS /home/jan-marius> kubectl describe service apigateway
Name: apigateway
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"service.beta.kubernetes.io/azure-dns-label-name":"tegos-sendmessage"},"nam...
service.beta.kubernetes.io/azure-dns-label-name: tegos-sendmessage
Selector: app=apigateway
Type: LoadBalancer
IP: 10.0.181.113
IP: 20.50.10.36
Port: <unset> 8800/TCP
TargetPort: 8800/TCP
NodePort: <unset> 30817/TCP
Endpoints: 10.244.0.14:8800
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 5m (x216 over 17h) service-controller Ensuring load balancer
I read online that this error can occur if the locations of the cluster and the external IP don't match, or if the load balancer SKUs don't match. I am sure that the locations match, but I can't be sure about the load balancer SKUs. The external IP SKU is set to Standard. However, I have never explicitly defined the load balancer SKU and I don't know where to find it. Can someone tell me what I'm doing wrong and how I can expose my web service?
PS /home/jan-marius> az aks show -g SendMessageResource -n SendMessageCluster
{
"aadProfile": null,
"addonProfiles": {
"httpapplicationrouting": {
"config": {
"HTTPApplicationRoutingZoneName": "e6e284534ad74c0d9c01.westeurope.aksapp.io"
},
"enabled": true,
"identity": null
},
"omsagent": {
"config": {
"loganalyticsworkspaceresourceid": "/subscriptions/a553134ba7eb-cb83-484d-a05d-44bb70125b8a/resourcegroups/defaultresourcegroup-weu/providers/microsoft.operationalinsights/workspaces/defaultworkspace-a55ba7eb-cb83-484d-a05d-44bb334170125b8a-weu"
},
"enabled": true,
"identity": null
}
},
"agentPoolProfiles": [
{
"availabilityZones": null,
"count": 1,
"enableAutoScaling": null,
"enableNodePublicIp": false,
"maxCount": null,
"maxPods": 110,
"minCount": null,
"mode": "System",
"name": "nodepool1",
"nodeLabels": {},
"nodeTaints": null,
"orchestratorVersion": "1.15.11",
"osDiskSizeGb": 100,
"osType": "Linux",
"provisioningState": "Succeeded",
"scaleSetEvictionPolicy": null,
"scaleSetPriority": null,
"spotMaxPrice": null,
"tags": null,
"type": "VirtualMachineScaleSets",
"vmSize": "Standard_DS2_v2"
}
],
"apiServerAccessProfile": null,
"autoScalerProfile": null,
"diskEncryptionSetId": null,
"dnsPrefix": "SendMessag-SendMessageResou-a55ba7",
"enablePodSecurityPolicy": null,
"enableRbac": true,
"fqdn": "sendmessag-sendmessageresou-a55ba7-14596671.hcp.westeurope.azmk8s.io",
"id": "/subscriptions/a55b3141a7eb-cb83-484d-a05d-44bb70125b8a/resourcegroups/SendMessageResource/providers/Microsoft.ContainerService/managedClusters/SendMessageCluster",
"identity": null,
"identityProfile": null,
"kubernetesVersion": "1.15.11",
"linuxProfile": {
"adminUsername": "azureuser",
"ssh": {
"publicKeys": [
{
"keyData": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7bzXktZht3zLbHrz3Xpv3VNhtrj/XmBKOIHB0D0ZpBIrsfXcg9veBov8n3cU/F/oKIfqcL2xaoktVwZFz9AjEi7qPXdxrsVLjV2+w0kPyC3ZC5JbtLSO4CFgn0MtclC6mE3OPYczYPoFdZI3/w/AmoZ6TsT7MupkCjKtrYIIaDZ/22zuTMYMvJro7cfjKI5OSR7soybXcoFKw+3tzwO9Mv9lUQr7x0eRCUAUJN6OziEI9p36fLEnNgRG4GiJJZP5aqqsVRUDuu8PF9pO0YLMBr3b2HHgzpDwSebZ6TU//okuc30cqG/2v2LkjBDRGrs5YxiSv3+ejr/9A4XGWup4Z"
}
]
}
},
"location": "westeurope",
"maxAgentPools": 10,
"name": "SendMessageCluster",
"networkProfile": {
"dnsServiceIp": "10.0.0.10",
"dockerBridgeCidr": "172.17.0.1/16",
"loadBalancerProfile": {
"allocatedOutboundPorts": null,
"effectiveOutboundIps": [
{
"id": "/subscriptions/a55b3142a7eb-cb83-484d-a05d-44bb70125b8a/resourceGroups/MC_SendMessageResource_SendMessageCluster_westeurope/providers/Microsoft.Network/publicIPAddresses/988314172c28-d4da-431e-b7f8-5acb08e468b4",
"resourceGroup": "MC_SendMessageResource_SendMessageCluster_westeurope"
}
],
"idleTimeoutInMinutes": null,
"managedOutboundIps": {
"count": 1
},
"outboundIpPrefixes": null,
"outboundIps": null
},
"loadBalancerSku": "Standard",
"networkMode": null,
"networkPlugin": "kubenet",
"networkPolicy": null,
"outboundType": "loadBalancer",
"podCidr": "10.244.0.0/16",
"serviceCidr": "10.0.0.0/16"
},
"nodeResourceGroup": "MC_SendMessageResource_SendMessageCluster_westeurope",
"privateFqdn": null,
"provisioningState": "Succeeded",
"resourceGroup": "SendMessageResource",
"servicePrincipalProfile": {
"clientId": "9009bcd8-4933-4641-b00b-237e157d86589b"
},
"sku": {
"name": "Basic",
"tier": "Free"
},
"type": "Microsoft.ContainerService/ManagedClusters",
"windowsProfile": null
}
If your public IP is in another resource group, you need to specify that resource group for the IP via an annotation:
apiVersion: v1
kind: Service
metadata:
annotations:
service.beta.kubernetes.io/azure-dns-label-name: tegos-sendmessage
service.beta.kubernetes.io/azure-load-balancer-resource-group: myResourceGroup
name: apigateway
spec:
loadBalancerIP: 20.50.10.36
type: LoadBalancer
ports:
- port: 8800
selector:
app: apigateway
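If you are not sure which resource group the public IP lives in, or which SKU it was created with, a hedged check using the names from the question:
az network public-ip show \
  --resource-group myResourceGroup \
  --name myAKSPublicIP \
  --query "{resourceGroup:resourceGroup, sku:sku.name, location:location, ip:ipAddress}" \
  --output table
The cluster's own load balancer SKU is visible in the az aks show output above (loadBalancerSku: Standard), so a Standard-SKU public IP matches; the remaining mismatch is the resource group.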
I've decided to follow this guide and I've encountered many problems.
First of all, specifying traefik explicitly in command was required, or else I got an error that entrypoint.sh can't find the command storedata. And yes, the > YAML syntax is a valid way to pass a multi-line command in docker-compose.yml.
So here's a docker-compose.yml:
visualizer:
image: dockersamples/visualizer:latest
volumes:
- "/var/run/docker.sock:/var/run/docker.sock"
networks:
- traefik
- default
ports:
- "8001:8080"
deploy:
labels:
- "traefik.port=8080"
- "traefik.tags=monitoring"
- "traefik.docker.network=infra_traefik"
- "traefik.backend=visualizer"
- "traefik.frontend.rule=Host:visualizer.swarm.xxx.io"
- "traefik.frontend.auth.basic=admin:$$apr1$$dxw2H03E$$VWrfVhKQWyaRiZ4XsfWCK/"
restart_policy:
condition: on-failure
replicas: 1
placement:
constraints:
- node.labels.name == master
consul:
image: consul
command: agent -server -bootstrap-expect=1
volumes:
- consul-data:/consul/data
environment:
- CONSUL_LOCAL_CONFIG={"datacenter":"ams3","server":true}
- CONSUL_BIND_INTERFACE=eth0
- CONSUL_CLIENT_INTERFACE=eth0
deploy:
labels:
- "traefik.enable=false"
replicas: 1
placement:
constraints:
- node.role == manager
restart_policy:
condition: on-failure
networks:
- traefik
proxy_init:
image: traefik:1.6.3-alpine
command: >
traefik
storeconfig
--api
--entrypoints='Name:http Address::80 Redirect.EntryPoint:https'
--entrypoints='Name:https Address::443 TLS'
--defaultentrypoints=http,https
--acme
--acme.storage="traefik/acme/account"
--acme.entryPoint=https
--acme.httpChallenge.entryPoint=http
--acme.onHostRule=true
--acme.acmelogging=true
--acme.onDemand=false
--acme.email="xxx#gmail.com"
--docker
--docker.swarmMode
--docker.domain=swarm.xxx.io
--docker.watch
--consul
--consul.endpoint=consul:8500
--consul.prefix=traefik
--accesslogsfile=/dev/stdout
--debug
networks:
- traefik
deploy:
placement:
constraints:
- node.role == manager
restart_policy:
condition: on-failure
depends_on:
- consul
proxy:
image: traefik:1.6.3-alpine
depends_on:
- traefik_init
- consul
command: >
traefik
--consul
--consul.watch
--consul.endpoint=consul:8500
--consul.prefix=traefik
volumes:
- /var/run/docker.sock:/var/run/docker.sock
networks:
- traefik
ports:
- 80:80
- 443:443
- 8080:8080
deploy:
labels:
- "traefik.docker.network=infra_traefik"
- "traefik.port=8080"
- "traefik.tags=monitoring"
- "traefik.backend.loadbalancer.stickiness=true"
- "traefik.frontend.passHostHeader=true"
- "traefik.frontend.rule=Host:proxy.swarm.xxx.io"
- "traefik.frontend.auth.basic=admin:$$apr1$$hfqD9TtY$$oGSy9nS."
mode: global
restart_policy:
condition: on-failure
placement:
constraints:
- node.role == manager
update_config:
parallelism: 1
delay: 10s
volumes:
- "/var/run/docker.sock:/var/run/docker.sock"
networks:
traefik:
driver: overlay
volumes:
portainer-data:
driver: local
consul-data:
driver: local
traefik-data:
driver: local
And here's what the logs of the proxy_init container show:
infra_proxy_init.1.4rtllualg8od#swarm-manager-0 | 2018/06/12 15:41:35 Storing configuration:
{
"LifeCycle": {
"RequestAcceptGraceTimeout": 0,
"GraceTimeOut": 10000000000
},
"GraceTimeOut": 0,
"Debug": true,
"CheckNewVersion": true,
"SendAnonymousUsage": false,
"AccessLogsFile": "/dev/stdout",
"AccessLog": null,
"TraefikLogsFile": "",
"TraefikLog": null,
"Tracing": null,
"LogLevel": "DEBUG",
"EntryPoints": {
"http": {
"Address": ":80",
"TLS": null,
"Redirect": {
"entryPoint": "https"
},
"Auth": null,
"WhitelistSourceRange": null,
"WhiteList": null,
"Compress": false,
"ProxyProtocol": null,
"ForwardedHeaders": {
"Insecure": true,
"TrustedIPs": null
}
},
"https": {
"Address": ":443",
"TLS": {
"MinVersion": "",
"CipherSuites": null,
"Certificates": [],
"ClientCAFiles": null,
"ClientCA": {
"Files": null,
"Optional": false
}
},
"Redirect": null,
"Auth": null,
"WhitelistSourceRange": null,
"WhiteList": null,
"Compress": false,
"ProxyProtocol": null,
"ForwardedHeaders": {
"Insecure": true,
"TrustedIPs": null
}
}
},
"Cluster": null,
"Constraints": [],
"ACME": {
"Email": "xxx#gmail.com",
"Domains": null,
"Storage": "traefik/acme/account",
"StorageFile": "",
"OnDemand": false,
"OnHostRule": true,
"CAServer": "",
"EntryPoint": "https",
"DNSChallenge": null,
"HTTPChallenge": {
"EntryPoint": "http"
},
"DNSProvider": "",
"DelayDontCheckDNS": 0,
"ACMELogging": true,
"TLSConfig": null
},
"DefaultEntryPoints": [
"http",
"https"
],
"ProvidersThrottleDuration": 2000000000,
"MaxIdleConnsPerHost": 200,
"IdleTimeout": 0,
"InsecureSkipVerify": false,
"RootCAs": null,
"Retry": null,
"HealthCheck": {
"Interval": 30000000000
},
"RespondingTimeouts": null,
"ForwardingTimeouts": null,
"AllowMinWeightZero": false,
"Web": null,
"Docker": {
"Watch": true,
"Filename": "",
"Constraints": null,
"Trace": false,
"TemplateVersion": 0,
"DebugLogGeneratedTemplate": false,
"Endpoint": "unix:///var/run/docker.sock",
"Domain": "swarm.xxx.io",
"TLS": null,
"ExposedByDefault": true,
"UseBindPortIP": false,
"SwarmMode": true
},
"File": null,
"Marathon": null,
"Consul": {
"Watch": true,
"Filename": "",
"Constraints": [],
"Trace": false,
"TemplateVersion": 0,
"DebugLogGeneratedTemplate": false,
"Endpoint": "consul:8500",
"Prefix": "traefik",
"TLS": null,
"Username": "",
"Password": ""
},
"ConsulCatalog": null,
"Etcd": null,
"Zookeeper": null,
"Boltdb": null,
"Kubernetes": null,
"Mesos": null,
"Eureka": null,
"ECS": null,
"Rancher": null,
"DynamoDB": null,
"ServiceFabric": null,
"Rest": null,
"API": {
"EntryPoint": "traefik",
"Dashboard": true,
"Debug": false,
"CurrentConfigurations": null,
"Statistics": null
},
"Metrics": null,
"Ping": null
}
Second, I've specified everything as in that manual, all the Consul prefixes and so on, and Traefik says it can't find frontends and backends, nor pretty much anything else under traefik/:
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=debug msg="Cannot list keys under \"traefik/backends/\": Key not found in store"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=debug msg="Cannot list keys under \"traefik/frontends/\": Key not found in store"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=debug msg="Transaction 8714f767-ed5a-477f-ae46-6ebd0b4e15c2 begins"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=debug msg="Cannot list keys under \"traefik/tls/\": Key not found in store"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=debug msg="Configuration received from provider consul: {}"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=info msg="Skipping same configuration for provider consul"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=debug msg="Cannot get key traefik/alias Key not found in store, setting default traefik"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=debug msg="Cannot list keys under \"traefik/backends/\": Key not found in store"
infra_proxy.0.vzv9q4rns6r7#swarm-manager-0 | time="2018-06-12T15:54:59Z" level=debug msg="Datastore reload"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=error msg="Datastore sync error: Object lock value: expected 8714f767-ed5a-477f-ae46-6ebd0b4e15c2, got 068a8a6d-66a9-4d01-b44e-020a601c05da, retrying in 677.561632ms"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=debug msg="Datastore reload"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=debug msg="Datastore reload"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=debug msg="Cannot list keys under \"traefik/frontends/\": Key not found in store"
infra_proxy.0.vzv9q4rns6r7#swarm-manager-0 | time="2018-06-12T15:54:59Z" level=debug msg="Cannot get key traefik/alias Key not found in store, setting default traefik"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=debug msg="Cannot list keys under \"traefik/tls/\": Key not found in store"
infra_proxy.0.vzv9q4rns6r7#swarm-manager-0 | time="2018-06-12T15:54:59Z" level=debug msg="Cannot list keys under \"traefik/backends/\": Key not found in store"
infra_proxy.0.vzv9q4rns6r7#swarm-manager-0 | time="2018-06-12T15:54:59Z" level=debug msg="Cannot list keys under \"traefik/frontends/\": Key not found in store"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=debug msg="Configuration received from provider consul: {}"
infra_proxy.0.vzv9q4rns6r7#swarm-manager-0 | time="2018-06-12T15:55:00Z" level=debug msg="Cannot list keys under \"traefik/tls/\": Key not found in store"
infra_proxy.0.vzv9q4rns6r7#swarm-manager-0 | time="2018-06-12T15:55:00Z" level=debug msg="Configuration received from provider consul: {}"
infra_proxy.0.vzv9q4rns6r7#swarm-manager-0 | time="2018-06-12T15:55:00Z" level=info msg="Skipping same configuration for provider consul"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=info msg="Skipping same configuration for provider consul"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=debug msg="Cannot get key traefik/alias Key not found in store, setting default traefik"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=debug msg="Cannot list keys under \"traefik/backends/\": Key not found in store"
infra_proxy.0.vzv9q4rns6r7#swarm-manager-0 | time="2018-06-12T15:55:00Z" level=debug msg="Building ACME client..."
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:54:59Z" level=debug msg="Cannot list keys under \"traefik/frontends/\": Key not found in store"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:55:00Z" level=debug msg="Cannot list keys under \"traefik/tls/\": Key not found in store"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:55:00Z" level=debug msg="Configuration received from provider consul: {}"
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:55:00Z" level=info msg="Skipping same configuration for provider consul"
infra_proxy.0.vzv9q4rns6r7#swarm-manager-0 | time="2018-06-12T15:55:00Z" level=debug msg="Using HTTP Challenge provider."
infra_proxy.0.xu6e4zez242f#swarm-master | time="2018-06-12T15:55:00Z" level=debug msg="Reset ACME account object."
And here are the logs from the consul container:
holms#debian ~> docker --tls service logs -f infra_consul
infra_consul.1.uo2dqcm0s6tv#swarm-master | ==> Found address '10.0.2.253' for interface 'eth0', setting bind option...
infra_consul.1.uo2dqcm0s6tv#swarm-master | ==> Found address '10.0.2.253' for interface 'eth0', setting client option...
infra_consul.1.uo2dqcm0s6tv#swarm-master | BootstrapExpect is set to 1; this is the same as Bootstrap mode.
infra_consul.1.uo2dqcm0s6tv#swarm-master | bootstrap = true: do not enable unless necessary
infra_consul.1.uo2dqcm0s6tv#swarm-master | ==> Starting Consul agent...
infra_consul.1.uo2dqcm0s6tv#swarm-master | ==> Consul agent running!
infra_consul.1.uo2dqcm0s6tv#swarm-master | Version: 'v1.1.0'
infra_consul.1.uo2dqcm0s6tv#swarm-master | Node ID: '05472876-6f66-bb37-5f4d-62b08624a655'
infra_consul.1.uo2dqcm0s6tv#swarm-master | Node name: 'ef0f060252d0'
infra_consul.1.uo2dqcm0s6tv#swarm-master | Datacenter: 'ams3' (Segment: '<all>')
infra_consul.1.uo2dqcm0s6tv#swarm-master | Server: true (Bootstrap: true)
infra_consul.1.uo2dqcm0s6tv#swarm-master | Client Addr: [10.0.2.253] (HTTP: 8500, HTTPS: -1, DNS: 8600)
infra_consul.1.uo2dqcm0s6tv#swarm-master | Cluster Addr: 10.0.2.253 (LAN: 8301, WAN: 8302)
infra_consul.1.uo2dqcm0s6tv#swarm-master | Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
infra_consul.1.uo2dqcm0s6tv#swarm-master |
infra_consul.1.uo2dqcm0s6tv#swarm-master | ==> Log data will now stream in as it occurs:
infra_consul.1.uo2dqcm0s6tv#swarm-master |
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:19 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:05472876-6f66-bb37-5f4d-62b08624a655 Address:10.0.2.253:8300}]
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:19 [INFO] serf: EventMemberJoin: ef0f060252d0.ams3 10.0.2.253
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:19 [INFO] serf: EventMemberJoin: ef0f060252d0 10.0.2.253
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:19 [INFO] agent: Started DNS server 10.0.2.253:8600 (udp)
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:19 [INFO] raft: Node at 10.0.2.253:8300 [Follower] entering Follower state (Leader: "")
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:19 [INFO] consul: Adding LAN server ef0f060252d0 (Addr: tcp/10.0.2.253:8300) (DC: ams3)
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:19 [INFO] consul: Handled member-join event for server "ef0f060252d0.ams3" in area "wan"
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:19 [INFO] agent: Started DNS server 10.0.2.253:8600 (tcp)
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:19 [INFO] agent: Started HTTP server on 10.0.2.253:8500 (tcp)
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:19 [INFO] agent: started state syncer
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:25 [WARN] raft: Heartbeat timeout from "" reached, starting election
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:25 [INFO] raft: Node at 10.0.2.253:8300 [Candidate] entering Candidate state in term 2
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:25 [INFO] raft: Election won. Tally: 1
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:25 [INFO] raft: Node at 10.0.2.253:8300 [Leader] entering Leader state
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:25 [INFO] consul: cluster leadership acquired
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:25 [INFO] consul: New leader elected: ef0f060252d0
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:25 [INFO] consul: member 'ef0f060252d0' joined, marking health alive
infra_consul.1.uo2dqcm0s6tv#swarm-master | 2018/06/12 14:38:26 [INFO] agent: Synced node info
The problem is related to command: it must be an array instead of a multi-line string (>).
Note also that with the alpine image (and only with the alpine image), you need to add traefik before storeconfig:
proxy_init:
image: traefik:1.6.3-alpine
command:
- "traefik"
- "storeconfig"
- ...
--
Invalid:
command: >
traefik
--consul
--consul.watch
--consul.endpoint=consul:8500
--consul.prefix=traefik
Valid:
command:
- "traefik"
- "--consul"
- "--consul.watch"
- "--consul.endpoint=consul:8500"
- "--consul.prefix=traefik"
--
visualizer:
image: dockersamples/visualizer:latest
volumes:
- "/var/run/docker.sock:/var/run/docker.sock"
networks:
- traefik
- default
ports:
- "8001:8080"
deploy:
labels:
- "traefik.port=8080"
- "traefik.tags=monitoring"
- "traefik.docker.network=infra_traefik"
- "traefik.backend=visualizer"
- "traefik.frontend.rule=Host:visualizer.swarm.xxx.io"
- "traefik.frontend.auth.basic=admin:$$apr1$$dxw2H03E$$VWrfVhKQWyaRiZ4XsfWCK/"
restart_policy:
condition: on-failure
replicas: 1
placement:
constraints:
- node.labels.name == master
consul:
image: consul
command: agent -server -bootstrap-expect=1
volumes:
- consul-data:/consul/data
environment:
- CONSUL_LOCAL_CONFIG={"datacenter":"ams3","server":true}
- CONSUL_BIND_INTERFACE=eth0
- CONSUL_CLIENT_INTERFACE=eth0
deploy:
labels:
- "traefik.enable=false"
replicas: 1
placement:
constraints:
- node.role == manager
restart_policy:
condition: on-failure
networks:
- traefik
proxy_init:
image: traefik:1.6.3-alpine
command:
- "traefik"
- "storeconfig"
- "--api"
- "--entrypoints=Name:http Address::80 Redirect.EntryPoint:https"
- "--entrypoints=Name:https Address::443 TLS"
- "--defaultentrypoints=http,https"
- "--acme"
- "--acme.storage=traefik/acme/account"
- "--acme.entryPoint=https"
- "--acme.httpChallenge.entryPoint=http"
- "--acme.onHostRule=true"
- "--acme.acmelogging=true"
- "--acme.onDemand=false"
- "--acme.email=xxx#gmail.com"
- "--docker"
- "--docker.swarmMode"
- "--docker.domain=swarm.xxx.io"
- "--docker.watch"
- "--consul"
- "--consul.endpoint=consul:8500"
- "--consul.prefix=traefik"
- "--accesslogsfile=/dev/stdout"
- "--debug"
networks:
- traefik
deploy:
placement:
constraints:
- node.role == manager
restart_policy:
condition: on-failure
depends_on:
- consul
proxy:
image: traefik:1.6.3-alpine
depends_on:
- traefik_init
- consul
command:
- "traefik"
- "--consul"
- "--consul.watch"
- "--consul.endpoint=consul:8500"
- "--consul.prefix=traefik"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
networks:
- traefik
ports:
- 80:80
- 443:443
- 8080:8080
deploy:
labels:
- "traefik.docker.network=infra_traefik"
- "traefik.port=8080"
- "traefik.tags=monitoring"
- "traefik.backend.loadbalancer.stickiness=true"
- "traefik.frontend.passHostHeader=true"
- "traefik.frontend.rule=Host:proxy.swarm.xxx.io"
- "traefik.frontend.auth.basic=admin:$$apr1$$hfqD9TtY$$oGSy9nS."
mode: global
restart_policy:
condition: on-failure
placement:
constraints:
- node.role == manager
update_config:
parallelism: 1
delay: 10s
volumes:
- "/var/run/docker.sock:/var/run/docker.sock"
networks:
traefik:
driver: overlay
volumes:
portainer-data:
driver: local
consul-data:
driver: local
traefik-data:
driver: local
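Once storeconfig has run, the keys should actually exist under the traefik/ prefix in Consul. A hedged way to confirm (the container name is a placeholder; with CONSUL_CLIENT_INTERFACE=eth0 as in this compose file, the CLI may need CONSUL_HTTP_ADDR pointed at the eth0 address rather than localhost):
docker exec -it <consul-container> consul kv get -recurse traefik/
# if the HTTP API is not on localhost, point the CLI at the bound address first, e.g.:
# export CONSUL_HTTP_ADDR=http://<eth0-address>:8500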