Unable to install ELK on AKS: 1 pod has unbound immediate PersistentVolumeClaims - elasticsearch

I am confused and need direction. I am trying to install ELK on AKS, following the documentation at the link below.
https://ahmedhosameldein.wordpress.com/2021/03/25/install-elk-stack-on-azure-kubernetes-cluster-aks-using-helm/
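For context, the steps from that post boil down to roughly the following. This is a reconstruction from the post and the output below, so the exact manifest and chart values are assumptions on my part:
# StorageClass for the Elasticsearch data volumes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: elk-azurefile-sc
provisioner: kubernetes.io/azure-file
reclaimPolicy: Retain
volumeBindingMode: Immediate
allowVolumeExpansion: true
# Install the chart, pointing its PVC template at that StorageClass
helm repo add elastic https://helm.elastic.co
helm install elasticsearch elastic/elasticsearch \
  --set volumeClaimTemplate.storageClassName=elk-azurefile-sc \
  --set "volumeClaimTemplate.accessModes[0]=ReadWriteMany" \
  --set volumeClaimTemplate.resources.requests.storage=5Gi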
k get pv,pvc,sc
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-03cbd512-ba0c-403e-83b2-3b690a67912c 5Gi RWX Retain Bound default/elasticsearch-master-elasticsearch-master-0 elk-azurefile-sc 70m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/elasticsearch-master-elasticsearch-master-0 Bound pvc-03cbd512-ba0c-403e-83b2-3b690a67912c 5Gi RWX elk-azurefile-sc 70m
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
storageclass.storage.k8s.io/azurefile file.csi.azure.com Delete Immediate true 79m
storageclass.storage.k8s.io/azurefile-csi file.csi.azure.com Delete Immediate true 79m
storageclass.storage.k8s.io/azurefile-csi-premium file.csi.azure.com Delete Immediate true 79m
storageclass.storage.k8s.io/azurefile-premium file.csi.azure.com Delete Immediate true 79m
storageclass.storage.k8s.io/default (default) disk.csi.azure.com Delete WaitForFirstConsumer true 79m
storageclass.storage.k8s.io/elk-azurefile-sc kubernetes.io/azure-file Retain Immediate true 74m
storageclass.storage.k8s.io/managed disk.csi.azure.com Delete WaitForFirstConsumer true 79m
storageclass.storage.k8s.io/managed-csi disk.csi.azure.com Delete WaitForFirstConsumer true 79m
storageclass.storage.k8s.io/managed-csi-premium disk.csi.azure.com Delete WaitForFirstConsumer true 79m
storageclass.storage.k8s.io/managed-premium disk.csi.azure.com Delete WaitForFirstConsumer true 79m
azureuser@satishvm:~$ k get po
NAME READY STATUS RESTARTS AGE
elasticsearch-master-0 0/1 CrashLoopBackOff 16 (3m53s ago) 70m
elasticsearch-rpbon-test 0/1 Error 0 69m
When I describe the pod I get the below error:
k describe po elasticsearch-master-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 23m (x31 over 77m) kubelet Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )
Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )
Warning BackOff 3m32s (x298 over 77m) kubelet Back-off restarting failed container
Getting all resources gives the status below:
k get all
NAME READY STATUS RESTARTS AGE
pod/elasticsearch-master-0 0/1 CrashLoopBackOff 18 (2m56s ago) 81m
pod/elasticsearch-rpbon-test 0/1 Error 0 80m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/elasticsearch-master ClusterIP 10.0.67.70 <none> 9200/TCP,9300/TCP 81m
service/elasticsearch-master-headless ClusterIP None <none> 9200/TCP,9300/TCP 81m
service/kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 90m
NAME READY AGE
statefulset.apps/elasticsearch-master 0/1 81m
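The events only show the probe failures and the back-off; the actual crash reason should be in the container logs of the failing pods (standard kubectl, pod names from the output above):
kubectl logs elasticsearch-master-0 --previous
kubectl logs elasticsearch-rpbon-test
(A guess on my part: Elasticsearch is known to have trouble with CIFS/azure-file-backed data directories because of file locking and mmap, but the log output will say for sure.)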

Related

how do I pipe in file content as args to kubectl?

I wish to run k6 in a container with a simple JavaScript load test from the local file system.
It seems the commands below have some syntax error:
$ cat simple.js
import http from 'k6/http';
import { sleep } from 'k6';
export const options = {
  vus: 10,
  duration: '30s',
};
export default function () {
  http.get('http://100.96.1.79:8080');
  sleep(1);
}
$ kubectl run k6 --image=grafana/k6 -- run - <simple.js
# or
$ kubectl run k6 --image=grafana/k6 run - <simple.js
In the k6 pod log, I got:
time="2023-02-16T12:12:05Z" level=error msg="could not initialize '-': could not load JS test 'file:///-': no exported functions in s
I guess this means the simple.js is not really passed to k6 this way?
thank you!
I think you can't pipe (host) files into Kubernetes containers this way.
One way that should work is to:
1. Create a ConfigMap to represent your file
2. Apply a Pod config that mounts the ConfigMap file
NAMESPACE="..." # Or default
kubectl create configmap simple \
--from-file=${PWD}/simple.js \
--namespace=${NAMESPACE}
kubectl get configmap/simple \
--output=yaml \
--namespace=${NAMESPACE}
Yields:
apiVersion: v1
kind: ConfigMap
metadata:
  name: simple
data:
  simple.js: |
    import http from 'k6/http';
    import { sleep } from 'k6';
    export default function () {
      http.get('http://test.k6.io');
      sleep(1);
    }
NOTE: You could just create e.g. configmap.yaml with the above YAML content and apply it.
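For example, assuming you saved it as configmap.yaml:
kubectl apply \
--filename=${PWD}/configmap.yaml \
--namespace=${NAMESPACE}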
Then with pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: simple
spec:
  containers:
    - name: simple
      image: docker.io/grafana/k6
      args:
        - run
        - /m/simple.js
      volumeMounts:
        - name: simple
          mountPath: /m
  volumes:
    - name: simple
      configMap:
        name: simple
Apply it:
kubectl apply \
--filename=${PWD}/pod.yaml \
--namespace=${NAMESPACE}
Then, finally:
kubectl logs pod/simple \
--namespace=${NAMESPACE}
Yields:
[k6 ASCII art banner]
execution: local
script: /m/simple.js
output: -
scenarios: (100.00%) 1 scenario, 1 max VUs, 10m30s max duration (incl. graceful stop):
* default: 1 iterations for each of 1 VUs (maxDuration: 10m0s, gracefulStop: 30s)
running (00m01.0s), 1/1 VUs, 0 complete and 0 interrupted iterations
default [ 0% ] 1 VUs 00m01.0s/10m0s 0/1 iters, 1 per VU
running (00m01.4s), 0/1 VUs, 1 complete and 0 interrupted iterations
default ✓ [ 100% ] 1 VUs 00m01.4s/10m0s 1/1 iters, 1 per VU
data_received..................: 17 kB 12 kB/s
data_sent......................: 542 B 378 B/s
http_req_blocked...............: avg=128.38ms min=81.34ms med=128.38ms max=175.42ms p(90)=166.01ms p(95)=170.72ms
http_req_connecting............: avg=83.12ms min=79.98ms med=83.12ms max=86.27ms p(90)=85.64ms p(95)=85.95ms
http_req_duration..............: avg=88.61ms min=81.28ms med=88.61ms max=95.94ms p(90)=94.47ms p(95)=95.2ms
{ expected_response:true }...: avg=88.61ms min=81.28ms med=88.61ms max=95.94ms p(90)=94.47ms p(95)=95.2ms
http_req_failed................: 0.00% ✓ 0 ✗ 2
http_req_receiving.............: avg=102.59µs min=67.99µs med=102.59µs max=137.19µs p(90)=130.27µs p(95)=133.73µs
http_req_sending...............: avg=67.76µs min=40.46µs med=67.76µs max=95.05µs p(90)=89.6µs p(95)=92.32µs
http_req_tls_handshaking.......: avg=44.54ms min=0s med=44.54ms max=89.08ms p(90)=80.17ms p(95)=84.62ms
http_req_waiting...............: avg=88.44ms min=81.05ms med=88.44ms max=95.83ms p(90)=94.35ms p(95)=95.09ms
http_reqs......................: 2 1.394078/s
iteration_duration.............: avg=1.43s min=1.43s med=1.43s max=1.43s p(90)=1.43s p(95)=1.43s
iterations.....................: 1 0.697039/s
vus............................: 1 min=1 max=1
vus_max........................: 1 min=1 max=1
Tidy:
kubectl delete \
--filename=${PWD}/pod.yaml \
--namespace=${NAMESPACE}
kubectl delete configmap/simple \
--namespace=${NAMESPACE}
kubectl delete namespace/${NAMESPACE}

Kubernetes readiness probe fails

I wrote a readiness probe for my pod using a bash script. The readiness probe failed with Reason: Unhealthy, but when I manually get into the pod and run this command /bin/bash -c health=$(curl -s -o /dev/null --write-out "%{http_code}" http://localhost:8080/api/v2/ping); if [[ $health -ne 401 ]]; then exit 1; fi the script exits with code 0.
What could be the reason? I am attaching the code and the error below.
Edit: Found out that the health variable is set to 000, which means the curl request timed out.
readinessProbe:
  exec:
    command:
      - /bin/bash
      - '-c'
      - |-
        health=$(curl -s -o /dev/null --write-out "%{http_code}" http://localhost:8080/api/v2/ping);
        if [[ $health -ne 401 ]]; then exit 1; fi
"kubectl describe pod {pod_name}" result:
Name: rustici-engine-54cbc97c88-5tg8s
Namespace: default
Priority: 0
Node: minikube/192.168.49.2
Start Time: Tue, 12 Jul 2022 18:39:08 +0200
Labels: app.kubernetes.io/name=rustici-engine
pod-template-hash=54cbc97c88
Annotations: <none>
Status: Running
IP: 172.17.0.5
IPs:
IP: 172.17.0.5
Controlled By: ReplicaSet/rustici-engine-54cbc97c88
Containers:
rustici-engine:
Container ID: docker://f7efffe6fc167e52f913ec117a4d78e62b326d8f5b24bfabc1916b5f20ed887c
Image: batupaksoy/rustici-engine:singletenant
Image ID: docker-pullable://batupaksoy/rustici-engine@sha256:d3cf985c400c0351f5b5b10c4d294d48fedfd2bb2ddc7c06a20c1a85d5d1ae11
Port: 8080/TCP
Host Port: 0/TCP
State: Running
Started: Tue, 12 Jul 2022 18:39:12 +0200
Ready: False
Restart Count: 0
Limits:
memory: 350Mi
Requests:
memory: 350Mi
Liveness: exec [/bin/bash -c health=$(curl -s -o /dev/null --write-out "%{http_code}" http://localhost:8080/api/v2/ping);
if [[ $health -ne 401 ]]; then exit 1; else exit 0; echo $health; fi] delay=10s timeout=5s period=10s #success=1 #failure=20
Readiness: exec [/bin/bash -c health=$(curl -s -o /dev/null --write-out "%{http_code}" http://localhost:8080/api/v2/ping);
if [[ $health -ne 401 ]]; then exit 1; else exit 0; echo $health; fi] delay=10s timeout=5s period=10s #success=1 #failure=10
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-whb8d (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-whb8d:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 24s default-scheduler Successfully assigned default/rustici-engine-54cbc97c88-5tg8s to minikube
Normal Pulling 23s kubelet Pulling image "batupaksoy/rustici-engine:singletenant"
Normal Pulled 21s kubelet Successfully pulled image "batupaksoy/rustici-engine:singletenant" in 1.775919851s
Normal Created 21s kubelet Created container rustici-engine
Normal Started 20s kubelet Started container rustici-engine
Warning Unhealthy 4s kubelet Readiness probe failed:
Warning Unhealthy 4s kubelet Liveness probe failed:
The probe could be failing because the app is facing performance issues or a slow startup. To troubleshoot, check that the probe doesn't start until the app is up and running in your pod. You may also need to increase the timeout of the readiness probe, as well as the timeout of the liveness probe, as in the following example:
readinessProbe:
  initialDelaySeconds: 10
  periodSeconds: 2
  timeoutSeconds: 10
You can find more details about how to configure the readiness probe and liveness probe in this link.
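For example, a sketch combining the question's exec probe with more generous timing (the endpoint and the expected 401 come from the question; the numbers are placeholders to tune against the app's real startup time). Giving curl its own --max-time below the probe's timeoutSeconds also turns the 000 timeout into a clean failure:
readinessProbe:
  exec:
    command:
      - /bin/bash
      - '-c'
      - |-
        # --max-time keeps curl from hanging past the probe's own timeout
        health=$(curl -s --max-time 8 -o /dev/null --write-out "%{http_code}" http://localhost:8080/api/v2/ping)
        # anything other than the expected 401 counts as not ready
        if [[ $health -ne 401 ]]; then exit 1; fi
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 10
  failureThreshold: 10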

Elasticsearch not creating the index for a new pipeline via Logstash

I have set up an ELK stack, but Elasticsearch is not creating the index and I am unable to upload the data. The Elasticsearch and Logstash services are both running.
Below are the details. However, I do not see anything in the logs.
Elasticsearch config:
[root@aruba-elk2 rm_logs]# cat /etc/elasticsearch/elasticsearch.yml
# Elasticsearch config
#########################
cluster.name: log-cohort-test
node.name: aruba-elk2
node.master: true
path:
  data: /elk/lib/elasticsearch
  logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
bootstrap.system_call_filter: False
[root@aruba-elk2 rm_logs]#
Logstash config:
[root@aruba-elk2 rm_logs]# cat /etc/logstash/logstash.yml
path.data: /var/lib/logstash
path.logs: /var/log/logstash
[root@aruba-elk2 rm_logs]# cat /etc/logstash/conf.d/logstash-syslog.conf
input {
  file {
    path => [ "/elk/rm_logs/*.txt" ]
    type => "rmlog"
  }
}
filter {
  if [type] == "rmlog" {
    grok {
      match => { "message" => "%{HOSTNAME:hostname},%{DATE:date},%{HOUR:hour1}:%{MINUTE:minute1},%{NUMBER}-%{WORD},%{USER:user},%{USER:user2} %{NUMBER:pid} %{NUMBER:float} %{NUMBER:float} %{NUMBER:number1} %{NUMBER:number2} %{DATA} %{HOUR:hour2}:%{MINUTE:minute2} %{HOUR:hour3}:%{MINUTE:minute3} %{GREEDYDATA:command},%{PATH:path}" }
      add_field => [ "received_at", "%{@timestamp}" ]
    }
  }
}
output {
  if [type] == "rmlog" {
    elasticsearch {
      hosts => ["aruba-elk2:9200"]
      manage_template => false
      index => "rmlog-%{+YYYY.MM.dd}"
      #document_type => "messages"
    }
  }
}
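One thing worth double-checking with this setup (an assumption on my part, not something shown in the logs): the Logstash file input tails files from the end by default, so content that already existed in /elk/rm_logs/*.txt before Logstash started is never read, and no index gets created. A test-only variant of the input that reads from the beginning:
input {
  file {
    path => [ "/elk/rm_logs/*.txt" ]
    type => "rmlog"
    # read pre-existing file content instead of only new lines
    start_position => "beginning"
    # disable position tracking so re-runs re-read the files (testing only)
    sincedb_path => "/dev/null"
  }
}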
Input data Source:
[root@aruba-elk2 rm_logs]# cd /elk/rm_logs/
[root@aruba-elk2 rm_logs]# ls -ltrh | head
total 2.6M
-rw-r--r-- 1 root root 558 Jan 11 11:27 dbxchw092.txt
-rw-r--r-- 1 root root 405 Jan 11 11:27 dbxtx220.txt
-rw-r--r-- 1 root root 241 Jan 11 11:27 dbxcvm139.txt
-rw-r--r-- 1 root root 455 Jan 11 11:27 dbxcnl038.txt
-rw-r--r-- 1 root root 230 Jan 11 11:27 dbxchw052.txt
-rw-r--r-- 1 root root 143 Jan 11 11:27 dbxtx222.txt
-rw-r--r-- 1 root root 577 Jan 11 11:27 dbxtx224.txt
-rw-r--r-- 1 root root 274 Jan 11 11:27 dbxcvm082.txt
-rw-r--r-- 1 root root 281 Jan 11 11:27 dbxcsb003.txt
Sample of above data file:
testhost-in2,19/01/11,06:34,04-mins,arnav,arnav 2427 0.1 0.0 58980 580 ? S 06:30 0:00 rm -rf /test/ehf/users/arnav-090119-184844,/dv/ehf/users/arnav-090119-
testhost-in2,19/01/11,06:40,09-mins,arnav,arnav 2427 0.1 0.0 58980 580 ? S 06:30 0:00 rm -rf /dv/ehf/users/arnav-090119-184844,/dv/ehf/users/arnav-090119-\
testhost-in2,19/01/11,06:45,14-mins,arnav,arnav 2427 0.1 0.0 58980 580 ? S 06:30 0:01 rm -rf /
LOGS:
Logstash logs:
[root@aruba-elk2 logstash]# cat logstash-plain.log
[2019-01-12T23:48:31,653][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.5.4"}
[2019-01-12T23:48:34,959][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>48, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2019-01-12T23:48:35,374][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://aruba-elk2:9200/]}}
[2019-01-12T23:48:35,588][WARN ][logstash.outputs.elasticsearch] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"http://aruba-elk2:9200/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :error=>"Elasticsearch Unreachable: [http://aruba-elk2:9200/][Manticore::SocketException] Connection refused"}
[2019-01-12T23:48:35,608][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//aruba-elk2:9200"]}
[2019-01-12T23:48:36,063][INFO ][logstash.inputs.file ] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/var/lib/logstash/plugins/inputs/file/.sincedb_076330d5fd2c2b811bc1960a3d0547be", :path=>["/elk/rm_logs/*.txt"]}
[2019-01-12T23:48:36,095][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x424bb675 run>"}
[2019-01-12T23:48:36,155][INFO ][filewatch.observingtail ] START, creating Discoverer, Watch with file and sincedb collections
[2019-01-12T23:48:36,156][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-01-12T23:48:36,542][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2019-01-12T23:48:40,796][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://aruba-elk2:9200/"}
[2019-01-12T23:48:40,855][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2019-01-12T23:48:40,859][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
Elasticsearch LOGS:
[root@aruba-elk2 elasticsearch]# cat gc.log.0.current | tail
2019-01-13T00:13:29.280+0530: 1237.781: Total time for which application threads were stopped: 0.0002681 seconds, Stopping threads took: 0.0000316 seconds
2019-01-13T00:13:31.281+0530: 1239.782: Total time for which application threads were stopped: 0.0003670 seconds, Stopping threads took: 0.0000586 seconds
2019-01-13T00:13:32.281+0530: 1240.782: Total time for which application threads were stopped: 0.0003134 seconds, Stopping threads took: 0.0000708 seconds
2019-01-13T00:13:37.282+0530: 1245.783: Total time for which application threads were stopped: 0.0004663 seconds, Stopping threads took: 0.0001315 seconds
2019-01-13T00:13:51.284+0530: 1259.785: Total time for which application threads were stopped: 0.0004230 seconds, Stopping threads took: 0.0000691 seconds
2019-01-13T00:13:57.286+0530: 1265.787: Total time for which application threads were stopped: 0.0008421 seconds, Stopping threads took: 0.0002697 seconds
2019-01-13T00:13:58.287+0530: 1266.787: Total time for which application threads were stopped: 0.0004467 seconds, Stopping threads took: 0.0000706 seconds
2019-01-13T00:14:11.288+0530: 1279.789: Total time for which application threads were stopped: 0.0004702 seconds, Stopping threads took: 0.0001105 seconds
2019-01-13T00:14:18.289+0530: 1286.790: Total time for which application threads were stopped: 0.0004123 seconds, Stopping threads took: 0.0000750 seconds
Any help will be appreciated..

Send log data from fluentd in a Kubernetes cluster to Elasticsearch on a remote standalone server outside the cluster, but on the host network?

As a follow-up to this question, I want to know how I can reach my external service (Elasticsearch) from inside a Kubernetes pod (fluentd) if the external service is not reachable via the internet but only from the host network where my Kubernetes cluster is also hosted.
Here is the external service kubernetes object I applied:
kind: Service
apiVersion: v1
metadata:
  name: ext-elastic
  namespace: kube-system
spec:
  type: ExternalName
  externalName: 192.168.57.105
  ports:
    - port: 9200
So now I have this service:
ubuntu#controller:~$ kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ext-elastic ExternalName <none> 192.168.57.105 9200/TCP 2s
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 1d
The elasticsearch is there:
ubuntu#controller:~$ curl 192.168.57.105:9200
{
  "name" : "6_6nPVn",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "ZmxYHz5KRV26QV85jUhkiA",
  "version" : {
    "number" : "6.2.3",
    "build_hash" : "c59ff00",
    "build_date" : "2018-03-13T10:06:29.741383Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
But from my fluentd pod I can neither resolve the service name via nslookup nor ping the plain IP. Neither of these commands works:
ubuntu#controller:~$ kubectl exec fluentd-f5dks -n kube-system ping 192.168.57.105
ubuntu#controller:~$ kubectl exec fluentd-f5dks -n kube-system nslookup ext-elastic
Here is a description of my network topology:
The VM hosting my Elasticsearch has 192.168.57.105, and the VM hosting my Kubernetes controller has 192.168.57.102. As shown above, that connection works well.
The controller node also has the IP 192.168.56.102. This is the network it shares with the other worker nodes (also VMs) of my Kubernetes cluster.
My fluentd pod sees itself as 172.17.0.2. It can easily reach 192.168.56.102 but not 192.168.57.102, although that is its host and one and the same node.
Edit
The routing table of the fluentd-pod looks like this:
ubuntu#controller:~$ kubectl exec -ti fluentd-5lqns -n kube-system -- route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default 10.244.0.1 0.0.0.0 UG 0 0 0 eth0
10.244.0.0 * 255.255.255.0 U 0 0 0 eth0
The /etc/resolv.conf of the fluentd-pod looks like this:
ubuntu#controller:~$ kubectl exec -ti fluentd-5lqns -n kube-system -- cat /etc/resolv.conf
nameserver 10.96.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
The routing table of the VM that is hosting the kubernetes controller and can reach the desired elasticsearch service looks like this:
ubuntu#controller:~$ route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default 10.0.2.2 0.0.0.0 UG 0 0 0 enp0s3
10.0.2.0 * 255.255.255.0 U 0 0 0 enp0s3
10.244.0.0 * 255.255.255.0 U 0 0 0 kube-bridge
10.244.1.0 192.168.56.103 255.255.255.0 UG 0 0 0 enp0s8
10.244.2.0 192.168.56.104 255.255.255.0 UG 0 0 0 enp0s8
172.17.0.0 * 255.255.0.0 U 0 0 0 docker0
192.168.56.0 * 255.255.255.0 U 0 0 0 enp0s8
192.168.57.0 * 255.255.255.0 U 0 0 0 enp0s9
Basically, your pod needs either a route to the endpoint IP or a default route to a router that can forward this traffic to the destination.
The destination endpoint likewise needs a route (or a default route) back to the source of the traffic to be able to send a reply.
Check out this article for details about routing in AWS cloud as an example.
In a general sense, a route table tells network packets which way they need to go to get to their destination. Route tables are managed by routers, which act as “intersections” within the network — they connect multiple routes together and contain helpful information for getting traffic to its final destination. Each AWS VPC has a VPC router. The primary function of this VPC router is to take all of the route tables defined within that VPC, and then direct the traffic flow within that VPC, as well as to subnets outside of the VPC, based on the rules defined within those tables.
Route tables consist of a list of destination subnets, as well as where the “next hop” is to get to the final destination.
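As a side note (not from the original answer): ExternalName services are meant for DNS names, since they are implemented as CNAME records, so pointing one at a raw IP like 192.168.57.105 is not guaranteed to work. A common alternative sketch for an IP-only external backend is a selector-less Service plus a manual Endpoints object (names and ports copied from the question):
kind: Service
apiVersion: v1
metadata:
  name: ext-elastic
  namespace: kube-system
spec:
  ports:
    - port: 9200
---
kind: Endpoints
apiVersion: v1
metadata:
  name: ext-elastic
  namespace: kube-system
subsets:
  - addresses:
      - ip: 192.168.57.105
    ports:
      - port: 9200
With this in place, ext-elastic.kube-system resolves to a cluster IP and kube-proxy forwards traffic to 192.168.57.105:9200, so only routing from the node network to 192.168.57.0/24 has to work.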

Kibana is not connecting to ElasticSearch

I am trying to use the Kubernetes 1.7.12 fluentd-elasticsearch addon:
https://github.com/kubernetes/kubernetes/tree/v1.7.12/cluster/addons/fluentd-elasticsearch
ElasticSearch starts up and can respond with:
{
  "name" : "0322714ad5b7",
  "cluster_name" : "kubernetes-logging",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "2.4.1",
    "build_hash" : "c67dc32e24162035d18d6fe1e952c4cbcbe79d16",
    "build_timestamp" : "2016-09-27T18:57:55Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.2"
  },
  "tagline" : "You Know, for Search"
}
But Kibana is still unable to connect to it. The connection error starts out with:
{"type":"log","@timestamp":"2018-01-23T07:42:06Z","tags":["warning","elasticsearch"],"pid":6,"message":"Unable to revive connection: http://elasticsearch-logging:9200/"}
{"type":"log","@timestamp":"2018-01-23T07:42:06Z","tags":["warning","elasticsearch"],"pid":6,"message":"No living connections"}
And after ElasticSearch is up, the error changes to:
{"type":"log","@timestamp":"2018-01-23T07:42:08Z","tags":["status","plugin:elasticsearch@1.0.0","error"],"pid":6,"state":"red","message":"Status changed from red to red - Service Unavailable","prevState":"red","prevMsg":"Unable to connect to Elasticsearch at http://elasticsearch-logging:9200."}
So it seems as though Kibana is finally able to get a response from ElasticSearch, but a connection still cannot be established.
This is what the Kibana dashboard looks like: [screenshot omitted]
I tried to get the logs to output more information, but do not have enough knowledge about Kibana and ElasticSearch to know what else I can try next.
I am able to reproduce the error locally using this docker-compose.yml:
version: '2'
services:
  elasticsearch-logging:
    image: gcr.io/google_containers/elasticsearch:v2.4.1-2
    ports:
      - "9200:9200"
      - "9300:9300"
  kibana-logging:
    image: gcr.io/google_containers/kibana:v4.6.1-1
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch-logging
    environment:
      - ELASTICSEARCH_URL=http://elasticsearch-logging:9200
It doesn't look like there should be much involved based on what I can tell from this question:
Kibana on Docker cannot connect to Elasticsearch
and this blog: https://gunith.github.io/docker-kibana-elasticsearch/
But I can't figure out what I'm missing.
Any ideas what else I might be able to try?
Thank you for your time. :)
Update 1:
curling http://elasticsearch-logging on the Kubernetes cluster resulted in the same output:
{
  "name" : "elasticsearch-logging-v1-68km4",
  "cluster_name" : "kubernetes-logging",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "2.4.1",
    "build_hash" : "c67dc32e24162035d18d6fe1e952c4cbcbe79d16",
    "build_timestamp" : "2016-09-27T18:57:55Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.2"
  },
  "tagline" : "You Know, for Search"
}
curling http://elasticsearch-logging/_cat/indices?pretty on the Kubernetes cluster timed out because of a proxy rule. Using the docker-compose.yml and curling locally (e.g. curl localhost:9200/_cat/indices?pretty) results in:
{
  "error" : {
    "root_cause" : [ {
      "type" : "master_not_discovered_exception",
      "reason" : null
    } ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
The docker-compose logs show:
[2018-01-23 17:04:39,110][DEBUG][action.admin.cluster.state] [ac1f2a13a637] no known master node, scheduling a retry
[2018-01-23 17:05:09,112][DEBUG][action.admin.cluster.state] [ac1f2a13a637] timed out while retrying [cluster:monitor/state] after failure (timeout [30s])
[2018-01-23 17:05:09,116][WARN ][rest.suppressed ] path: /_cat/indices, params: {pretty=}
MasterNotDiscoveredException[null]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:234)
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:236)
at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:804)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Update 2:
Running kubectl --namespace kube-system logs -c kubedns po/kube-dns-667321983-dt5lz --tail 50 --follow
yields:
I0124 16:43:33.591112 5 dns.go:264] New service: kibana-logging
I0124 16:43:33.591225 5 dns.go:264] New service: nginx
I0124 16:43:33.591251 5 dns.go:264] New service: registry
I0124 16:43:33.591274 5 dns.go:264] New service: sudoe
I0124 16:43:33.591295 5 dns.go:264] New service: default-http-backend
I0124 16:43:33.591317 5 dns.go:264] New service: kube-dns
I0124 16:43:33.591344 5 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0124 16:43:33.591369 5 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0124 16:43:33.591390 5 dns.go:264] New service: kubernetes
I0124 16:43:33.591409 5 dns.go:462] Added SRV record &{Host:kubernetes.default.svc.cluster.local. Port:443 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0124 16:43:33.591429 5 dns.go:264] New service: elasticsearch-logging
Update 3:
I'm still trying to get everything to work, but with the help of others, I am fairly confident it is an RBAC issue: it looks like the Elasticsearch nodes were not able to connect with the master (which I never knew was even needed) due to permissions.
Here are some steps that helped, in case it helps others starting out:
with RBAC on:
# kubectl --kubeconfig kubeconfig.yaml --namespace kube-system logs po/elasticsearch-logging-v1-wkwcs
F0119 00:18:44.285773 9 elasticsearch_logging_discovery.go:60] kube-system namespace doesn't exist: User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "kube-system". (get namespaces kube-system)
goroutine 1 [running]:
k8s.io/kubernetes/vendor/github.com/golang/glog.stacks(0x1f7f600, 0xc400000000, 0xee, 0x1b2)
vendor/github.com/golang/glog/glog.go:766 +0xa5
k8s.io/kubernetes/vendor/github.com/golang/glog.(*loggingT).output(0x1f5f5c0, 0xc400000003, 0xc42006c300, 0x1ef20c8, 0x22, 0x3c, 0x0)
vendor/github.com/golang/glog/glog.go:717 +0x337
k8s.io/kubernetes/vendor/github.com/golang/glog.(*loggingT).printf(0x1f5f5c0, 0xc400000003, 0x16949d6, 0x1e, 0xc420579ee8, 0x2, 0x2)
vendor/github.com/golang/glog/glog.go:655 +0x14c
k8s.io/kubernetes/vendor/github.com/golang/glog.Fatalf(0x16949d6, 0x1e, 0xc420579ee8, 0x2, 0x2)
vendor/github.com/golang/glog/glog.go:1145 +0x67
main.main()
cluster/addons/fluentd-elasticsearch/es-image/elasticsearch_logging_discovery.go:60 +0xb53
[2018-01-19 00:18:45,273][INFO ][node ] [elasticsearch-logging-v1-wkwcs] version[2.4.1], pid[5], build[c67dc32/2016-09-27T18:57:55Z]
[2018-01-19 00:18:45,275][INFO ][node ] [elasticsearch-logging-v1-wkwcs] initializing ...
# kubectl --kubeconfig kubeconfig.yaml --namespace kube-system exec kibana-logging-2104905774-69wgv curl elasticsearch-logging.kube-system:9200/_cat/indices?pretty
{
  "error" : {
    "root_cause" : [ {
      "type" : "master_not_discovered_exception",
      "reason" : null
    } ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
With RBAC off:
# kubectl --kubeconfig kubeconfig.yaml --namespace kube-system log elasticsearch-logging-v1-7shgk
[2018-01-26 01:19:52,294][INFO ][node ] [elasticsearch-logging-v1-7shgk] version[2.4.1], pid[5], build[c67dc32/2016-09-27T18:57:55Z]
[2018-01-26 01:19:52,294][INFO ][node ] [elasticsearch-logging-v1-7shgk] initializing ...
[2018-01-26 01:19:53,077][INFO ][plugins ] [elasticsearch-logging-v1-7shgk] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
# kubectl --kubeconfig kubeconfig.yaml --namespace kube-system exec elasticsearch-logging-v1-7shgk curl http://elasticsearch-logging:9200/_cat/indices?pretty
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 40 100 40 0 0 2 0 0:00:20 0:00:15 0:00:05 10
green open .kibana 1 1 1 0 6.2kb 3.1kb
Thanks everyone for your help :)
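For reference, a minimal RBAC sketch that would grant the access the error above complains about (the default service account in kube-system needs to read namespaces; on Kubernetes 1.7 the API group is still rbac.authorization.k8s.io/v1beta1, and the real add-on manifests may scope this differently):
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: elasticsearch-logging-discovery
rules:
  # the discovery helper reads namespaces/services/endpoints to find ES nodes
  - apiGroups: [""]
    resources: ["namespaces", "services", "endpoints"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: elasticsearch-logging-discovery
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elasticsearch-logging-discovery
subjects:
  - kind: ServiceAccount
    name: default
    namespace: kube-system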
A few troubleshooting tips:
1) ensure ElasticSearch is running fine.
Enter the container running elasticsearch and run:
curl localhost:9200
You should get a JSON, with some data about elasticsearch.
2) ensure ElasticSearch is reachable from the kibana container
Enter the kibana container and run:
curl <elasticsearch_service_name>:9200
You should get the same output as above.
3) Ensure your ES indices are fine.
Run the following command from the elasticsearch container:
curl localhost:9200/_cat/indices?pretty
You should get a table with all indices in your ES cluster and their status (which should be green or yellow in case you only have one ES replica).
If one of the above points fails, check the logs of your ES container for any error messages and try to solve them.
This exception indicates one of two misconfigurations:
1. The Kubernetes DNS add-on is not working properly. Check your DNS add-on logs.
2. Pod-to-pod communication is not working properly. This is related to your underlying SDN/CNI add-on (flannel, Calico, etc.).
You can check by pinging one pod from another pod. If that does not work, check your networking configuration, especially the kube-proxy component.
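A couple of concrete checks along those lines (standard kubectl; the pod names are placeholders):
# DNS: can a pod resolve a service name?
kubectl exec -ti <some-pod> -- nslookup elasticsearch-logging.kube-system
# pod-to-pod: can one pod reach another pod's IP?
kubectl get pods -o wide                      # note the pod IPs
kubectl exec -ti <some-pod> -- ping <other-pod-ip>
# DNS add-on logs
kubectl --namespace kube-system logs -l k8s-app=kube-dns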
