metrics-server shows &lt;unknown&gt; on one node

It just happened on an AlmaLinux 9.0 server which I added to the cluster today.
k8s version is 1.19.16
NAME             CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
k8s-dev-node1    110m         11%         2601Mi          104%
k8s-dev-node8    160m         5%          3600Mi          56%
k8s-dev-node9    105m         3%          2201Mi          34%
k8s-dev-node16   <unknown>    <unknown>   <unknown>       <unknown>
hostNetwork has been set on the metrics-server deployment.
kubectl top pod works fine, and getting the node stats directly from the kubelet works too:
kubectl get --raw /api/v1/nodes/k8s-dev-node16/proxy/stats/summary
{
"node": {
"nodeName": "k8s-dev-node16",
"systemContainers": [
{
"name": "kubelet",
"startTime": "2022-11-10T05:52:14Z",
"cpu": {
"time": "2022-11-10T09:42:50Z",
"usageNanoCores": 17536218,
"usageCoreNanoSeconds": 221415603000
},
"memory": {
"time": "2022-11-10T09:42:50Z",
"availableBytes": 8011243520,
"usageBytes": 58892288,
"workingSetBytes": 46686208,
"rssBytes": 0,
"pageFaults": 2555049,
"majorPageFaults": 1
}
},
...
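A way to narrow this down further would be to query the Metrics API for that node directly and to check the metrics-server logs (assuming the deployment is named metrics-server in kube-system), for example:
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes/k8s-dev-node16"
kubectl -n kube-system logs deploy/metrics-server | grep k8s-dev-node16
If the first call errors or returns no metrics for that node, the problem is on the metrics-server side rather than the kubelet.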

Related

consul deregister_critical_service_after is not working

Hello everyone. I have a health check on my Consul service; my goal is that whenever the service is unhealthy, Consul should remove it from the service catalog.
Below is my config:
{
"service": {
"name": "api",
"tags": [ "api-tag" ],
"port": 80
},
"check": {
"id": "api_up",
"name": "Fetch health check from local nginx",
"http": "http://localhost/HealthCheck",
"interval": "5s",
"timeout": "1s",
"deregister_critical_service_after": "15s"
},
"data_dir": "/consul/data",
"retry_join": [
"192.168.0.1",
"192.168.0.2",
]
}
Thanks for all the help.
The reason the service is not being de-registered is that the check is being specified outside of the service {} block in your JSON. This makes the check a node-level check, not a service-level check.
Here's a pretty-printed version of the config you provided.
{
"service": {
"name": "api",
"tags": [
"api-tag"
],
"port": 80
},
"check": {
"id": "api_up",
"name": "Fetch health check from local nginx",
"http": "http://localhost/HealthCheck",
"interval": "5s",
"timeout": "1s",
"deregister_critical_service_after": "15s"
},
"data_dir": "/consul/data",
"retry_join": [
"192.168.0.1",
"192.168.0.2",
]
}
Below is the configuration you should be using in order to correctly associate the check with the configured service, and de-register the service after the check has been marked as critical for more than 15 seconds.
{
"service": {
"name": "api",
"tags": [
"api-tag"
],
"port": 80,
"check": {
"id": "api_up",
"name": "Fetch health check from local nginx",
"http": "http://localhost/HealthCheck",
"interval": "5s",
"timeout": "1s",
"deregister_critical_service_after": "15s"
}
},
"data_dir": "/consul/data",
"retry_join": [
"192.168.0.1",
"192.168.0.2"
]
}
Note this statement from the docs for DeregisterCriticalServiceAfter.
If a check is in the critical state for more than this configured value, then its associated service (and all of its associated checks) will automatically be deregistered. The minimum timeout is 1 minute, and the process that reaps critical services runs every 30 seconds, so it may take slightly longer than the configured timeout to trigger the deregistration. This should generally be configured with a timeout that's much, much longer than any expected recoverable outage for the given service.
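As a quick sanity check (assuming the agent's HTTP API is on the default localhost:8500), reload the agent and list its checks; with the corrected layout the api_up check should now report "ServiceID": "api" rather than an empty ServiceID:
consul reload
curl http://localhost:8500/v1/agent/checks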

Local [Polkadot + Statemine] testnet with polkadot-launch

I want to deploy a local Polkadot Testnet via polkadot-launch.
I built the executables from:
polkadot: v0.9.9-1
cumulus: statemine_v3
This is the config.json:
{
"relaychain": {
"bin": "./bin/polkadot",
"chain": "rococo-local",
"nodes": [
{
"name": "alice",
"wsPort": 9944,
"port": 30444
},
{
"name": "bob",
"wsPort": 9955,
"port": 30555
},
{
"name": "charlie",
"wsPort": 9966,
"port": 30666
},
{
"name": "dave",
"wsPort": 9977,
"port": 30777
}
],
"genesis": {
"runtime": {
"runtime_genesis_config": {
"configuration": {
"config": {
"validation_upgrade_frequency": 1,
"validation_upgrade_delay": 1
}
}
}
}
}
},
"parachains": [
{
"bin": "./bin/polkadot-collator",
"id": "200",
"balance": "1000000000000000000000",
"nodes": [
{
"wsPort": 9988,
"port": 31200,
"name": "alice",
"flags": ["--force-authoring", "--", "--execution=wasm"]
}
]
},
{
"bin": "./bin/polkadot-collator",
"id": "300",
"balance": "1000000000000000000000",
"nodes": [
{
"wsPort": 9999,
"port": 31300,
"name": "alice",
"flags": ["--force-authoring", "--", "--execution=wasm"]
}
]
}
],
"simpleParachains": [
{
"bin": "./bin/adder-collator",
"id": "400",
"port": "31400",
"name": "alice",
"balance": "1000000000000000000000"
}
],
"hrmpChannels": [
{
"sender": 200,
"recipient": 300,
"maxCapacity": 8,
"maxMessageSize": 512
}
],
"types": {},
"finalization": false
}
When I call polkadot-launch, alice, bob, charlie and dave all have OK logs:
$ tail -f alice.log
2021-09-25 19:34:30 🙌 Starting consensus session on top of parent 0x7df9c10b7ff6ded2b2712273633582c445345541a3a5d20fab85e67c041bab5c
2021-09-25 19:34:30 🎁 Prepared block for proposing at 8 [hash: 0x9b44a965ee76e35f4721888c53f31ebe8920224bcea359e82ff8dedb1734502b; parent_hash: 0x7df9…ab5c; extrinsics (2): [0xc67d…93c7, 0xecd4…0d35]]
2021-09-25 19:34:30 🔖 Pre-sealed block for proposal at 8. Hash now 0xd167ba86acc9f96c1282793387bc37f09b9cd4132113e8c59027958673bd22ae, previously 0x9b44a965ee76e35f4721888c53f31ebe8920224bcea359e82ff8dedb1734502b.
2021-09-25 19:34:30 ✨ Imported #8 (0xd167…22ae)
2021-09-25 19:34:32 💤 Idle (3 peers), best: #8 (0xd167…22ae), finalized #5 (0x4ab2…f178), ⬇ 1.4kiB/s ⬆ 2.2kiB/s
2021-09-25 19:34:36 🙌 Starting consensus session on top of parent 0xd167ba86acc9f96c1282793387bc37f09b9cd4132113e8c59027958673bd22ae
2021-09-25 19:34:36 🎁 Prepared block for proposing at 9 [hash: 0x5978b3aded771de9bdbcbe1cbb8d65f36dd0f85db791cf4faa7a43c2ad9a720e; parent_hash: 0xd167…22ae; extrinsics (2): [0xf381…4795, 0xfe0b…8a55]]
2021-09-25 19:34:36 🔖 Pre-sealed block for proposal at 9. Hash now 0x0f3261953f7ee2bf7973d5b3b988eceaf001ab8f8c0ee770d2c47e360e597caa, previously 0x5978b3aded771de9bdbcbe1cbb8d65f36dd0f85db791cf4faa7a43c2ad9a720e.
2021-09-25 19:34:36 ✨ Imported #9 (0x0f32…7caa)
2021-09-25 19:34:37 💤 Idle (3 peers), best: #9 (0x0f32…7caa), finalized #6 (0x6046…7629), ⬇ 1.5kiB/s ⬆ 2.7kiB/s
2021-09-25 19:34:42 ✨ Imported #10 (0x7a55…a45a)
2021-09-25 19:34:42 💤 Idle (3 peers), best: #10 (0x7a55…a45a), finalized #7 (0x7df9…ab5c), ⬇ 1.6kiB/s ⬆ 1.6kiB/s
2021-09-25 19:34:47 💤 Idle (3 peers), best: #10 (0x7a55…a45a), finalized #8 (0xd167…22ae), ⬇ 3.0kiB/s ⬆ 3.4kiB/s
2021-09-25 19:34:48 👶 New epoch 1 launching at block 0x7602…ec7f (block slot 272101548 >= start slot 272101548).
2021-09-25 19:34:48 👶 Next epoch starts at slot 272101558
2021-09-25 19:34:48 ✨ Imported #11 (0x7602…ec7f)
2021-09-25 19:34:50 🥩 Round #9 concluded, committed: SignedCommitment { commitment: Commitment { payload: 0xf0a3fb9ad9246d2071beed4daebaf1145e4ccc2939d2a739a832cbbf51fc28ed, block_number: 9, validator_set_id: 0 }, signatures: [Some(Signature(d586df10c3502ca9f0babf7b032e409a79c2506f384ea48216a6d802435d7cd45caf1eca39d7972d01e0a019de10061c40b8f13d5e07f4c7a1865e36d0a72c3100)), None, Some(Signature(ac6edcd9d5df9ed08ddb3870ab058b2709c2e2c31aef9e758a76cc0032971f8d6063048a5fa051f5b6d0be98ef12e9d964b9c9ab371aa1a422eaef4da394773200)), Some(Signature(32635d143cb5f1f4ab1475a0cbc3565f96ae68bf55f7090c53180075fc6f11a86e547e45d6c98b1b8af70cb07e14d3f188c1186320ea9b862fcc83553d059f2101))] }.
But 9988.log seems weird:
$ tail -f 9988.log
error: The argument '--force-authoring' was provided more than once, but cannot be used multiple times
USAGE:
polkadot-collator --alice --collator --offchain-worker <ENABLED> --force-authoring --in-peers <COUNT> --max-parallel-downloads <COUNT> --node-key-type <TYPE> --out-peers <COUNT> --parachain-id <parachain-id> --pool-kbytes <COUNT> --pool-limit <COUNT> --port <PORT> --rpc-methods <METHOD SET> --state-cache-size <Bytes> --sync <SYNC_MODE> --tmp --tracing-receiver <RECEIVER> --wasm-execution <METHOD> --ws-port <PORT>
For more information try --help
So I guess I'm only running a Relay Chain?
Collators for Cumulus/Statemine chains 200 and 300 are dead?
What's wrong with my setup?
polkadot-launch now passes that argument by default (because it was almost always needed), so just delete "--force-authoring" from your flags and it should all work.
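In other words, each parachain node entry in config.json keeps everything else and just drops that flag, roughly:
"flags": ["--", "--execution=wasm"]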

not getting kibana gui outside kubernetes

I have created a kops cluster and Elasticsearch logging as below.
kops create cluster --zones ap-southeast-1a,ap-southeast-1b,ap-southeast-1c --topology private --networking calico --master-size t2.micro --master-count 3 --node-size t2.micro --node-count 2 --cloud-labels "Project=Kubernetes,Team=Devops" ${NAME} --ssh-public-key /root/.ssh/id_rsa.pub --yes
https://github.com/kubernetes/kops/blob/master/addons/logging-elasticsearch/v1.7.0.yaml
Then some important cluster information:
root@ubuntu:~# kubectl get services -n kube-system
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
elasticsearch-logging   ClusterIP   100.67.69.222    <none>        9200/TCP        2m
kibana-logging          ClusterIP   100.67.182.172   <none>        5601/TCP        2m
kube-dns                ClusterIP   100.64.0.10      <none>        53/UDP,53/TCP   6m
root@ubuntu:~# kubectl cluster-info
Kubernetes master is running at https://${NAME}
Elasticsearch is running at https://${NAME}/api/v1/namespaces/kube-system/services/elasticsearch-logging/proxy
Kibana is running at https://${NAME}/api/v1/namespaces/kube-system/services/kibana-logging/proxy
KubeDNS is running at https://${NAME}/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Now when I access "https://${NAME}/api/v1/namespaces/kube-system/services/kibana-logging",
I get the following in the browser:
{
"kind": "Service",
"apiVersion": "v1",
"metadata": {
"name": "kibana-logging",
"namespace": "kube-system",
"selfLink": "/api/v1/namespaces/kube-system/services/kibana-logging",
"uid": "8fc914f7-3f65-11e9-a970-0aaac13c99b2",
"resourceVersion": "923",
"creationTimestamp": "2019-03-05T16:41:44Z",
"labels": {
"k8s-addon": "logging-elasticsearch.addons.k8s.io",
"k8s-app": "kibana-logging",
"kubernetes.io/cluster-service": "true",
"kubernetes.io/name": "Kibana"
}
},
"spec": {
"ports": [
{
"protocol": "TCP",
"port": 5601,
"targetPort": "ui"
}
],
"selector": {
"k8s-app": "kibana-logging"
},
"clusterIP": "100.67.182.172",
"type": "ClusterIP",
"sessionAffinity": "None"
},
"status": {
"loadBalancer": {
}
}
}
When I access "https://${NAME}/api/v1/namespaces/kube-system/services/elasticsearch-logging",
I get the following in the browser:
{
"kind": "Service",
"apiVersion": "v1",
"metadata": {
"name": "elasticsearch-logging",
"namespace": "kube-system",
"selfLink": "/api/v1/namespaces/kube-system/services/elasticsearch-logging",
"uid": "8f7cc654-3f65-11e9-a970-0aaac13c99b2",
"resourceVersion": "902",
"creationTimestamp": "2019-03-05T16:41:44Z",
"labels": {
"k8s-addon": "logging-elasticsearch.addons.k8s.io",
"k8s-app": "elasticsearch-logging",
"kubernetes.io/cluster-service": "true",
"kubernetes.io/name": "Elasticsearch"
}
},
"spec": {
"ports": [
{
"protocol": "TCP",
"port": 9200,
"targetPort": "db"
}
],
"selector": {
"k8s-app": "elasticsearch-logging"
},
"clusterIP": "100.67.69.222",
"type": "ClusterIP",
"sessionAffinity": "None"
},
"status": {
"loadBalancer": {
}
}
}
When I access "https://${NAME}/api/v1/namespaces/kube-system/services/elasticsearch-logging/proxy/",
I get the following in the browser:
{
"name" : "elasticsearch-logging-0",
"cluster_name" : "kubernetes-logging",
"cluster_uuid" : "_na_",
"version" : {
"number" : "5.6.4",
"build_hash" : "8bbedf5",
"build_date" : "2017-10-31T18:55:38.105Z",
"build_snapshot" : false,
"lucene_version" : "6.6.1"
},
"tagline" : "You Know, for Search"
When I access "https://${NAME}/api/v1/namespaces/kube-system/services/kibana-logging/proxy",
I get the error below:
Error: 'dial tcp 100.111.147.69:5601: connect: connection refused'
Trying to reach: 'http://100.111.147.69:5601/'
Why am I not getting the Kibana GUI here?
After one hour, I got the Kibana logs below:
{"type":"log","#timestamp":"2019-03-22T12:46:58Z","tags":["info","optimize"],"pid":1,"message":"Optimizing and caching bundles for graph, ml, kibana, stateSessionStorageRedirect, timelion and status_page. This may take a few minutes"}
{"type":"log","#timestamp":"2019-03-22T13:18:19Z","tags":["info","optimize"],"pid":1,"message":"Optimization of bundles for graph, ml, kibana, stateSessionStorageRedirect, timelion and status_page complete in 1880.89 seconds"}
{"type":"log","#timestamp":"2019-03-22T13:18:20Z","tags":["status","plugin:kibana#5.6.4","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-03-22T13:18:21Z","tags":["status","plugin:elasticsearch#5.6.4","info"],"pid":1,"state":"yellow","message":"Status changed from uninitialized to yellow - Waiting for Elasticsearch","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-03-22T13:18:21Z","tags":["status","plugin:xpack_main#5.6.4","info"],"pid":1,"state":"yellow","message":"Status changed from uninitialized to yellow - Waiting for Elasticsearch","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-03-22T13:18:22Z","tags":["status","plugin:graph#5.6.4","info"],"pid":1,"state":"yellow","message":"Status changed from uninitialized to yellow - Waiting for Elasticsearch","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-03-22T13:18:30Z","tags":["reporting","warning"],"pid":1,"message":"Generating a random key for xpack.reporting.encryptionKey. To prevent pending reports from failing on restart, please set xpack.reporting.encryptionKey in kibana.yml"}
{"type":"log","#timestamp":"2019-03-22T13:18:30Z","tags":["status","plugin:reporting#5.6.4","info"],"pid":1,"state":"yellow","message":"Status changed from uninitialized to yellow - Waiting for Elasticsearch","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-03-22T13:18:36Z","tags":["status","plugin:xpack_main#5.6.4","info"],"pid":1,"state":"yellow","message":"Status changed from yellow to yellow - No existing Kibana index found","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"}
{"type":"log","#timestamp":"2019-03-22T13:18:36Z","tags":["status","plugin:graph#5.6.4","info"],"pid":1,"state":"yellow","message":"Status changed from yellow to yellow - No existing Kibana index found","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"}
{"type":"log","#timestamp":"2019-03-22T13:18:36Z","tags":["status","plugin:reporting#5.6.4","info"],"pid":1,"state":"yellow","message":"Status changed from yellow to yellow - No existing Kibana index found","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"}
{"type":"log","#timestamp":"2019-03-22T13:18:36Z","tags":["status","plugin:elasticsearch#5.6.4","info"],"pid":1,"state":"yellow","message":"Status changed from yellow to yellow - No existing Kibana index found","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"}
{"type":"log","#timestamp":"2019-03-22T13:18:55Z","tags":["status","plugin:elasticsearch#5.6.4","info"],"pid":1,"state":"green","message":"Status changed from yellow to green - Kibana index ready","prevState":"yellow","prevMsg":"No existing Kibana index found"}
{"type":"log","#timestamp":"2019-03-22T13:18:57Z","tags":["license","info","xpack"],"pid":1,"message":"Imported license information from Elasticsearch for [data] cluster: mode: trial | status: active | expiry date: 2019-04-21T11:47:30+00:00"}
{"type":"log","#timestamp":"2019-03-22T13:18:57Z","tags":["status","plugin:xpack_main#5.6.4","info"],"pid":1,"state":"green","message":"Status changed from yellow to green - Ready","prevState":"yellow","prevMsg":"No existing Kibana index found"}
{"type":"log","#timestamp":"2019-03-22T13:18:57Z","tags":["status","plugin:graph#5.6.4","info"],"pid":1,"state":"green","message":"Status changed from yellow to green - Ready","prevState":"yellow","prevMsg":"No existing Kibana index found"}
{"type":"log","#timestamp":"2019-03-22T13:18:57Z","tags":["status","plugin:reporting#5.6.4","info"],"pid":1,"state":"green","message":"Status changed from yellow to green - Ready","prevState":"yellow","prevMsg":"No existing Kibana index found"}
{"type":"log","#timestamp":"2019-03-22T13:20:37Z","tags":["status","plugin:searchprofiler#5.6.4","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-03-22T13:20:37Z","tags":["status","plugin:ml#5.6.4","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-03-22T13:20:38Z","tags":["status","plugin:ml#5.6.4","info"],"pid":1,"state":"yellow","message":"Status changed from green to yellow - Waiting for Elasticsearch","prevState":"green","prevMsg":"Ready"}
{"type":"log","#timestamp":"2019-03-22T13:20:38Z","tags":["status","plugin:ml#5.6.4","info"],"pid":1,"state":"green","message":"Status changed from yellow to green - Ready","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"}
{"type":"log","#timestamp":"2019-03-22T13:20:38Z","tags":["status","plugin:tilemap#5.6.4","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-03-22T13:20:38Z","tags":["status","plugin:watcher#5.6.4","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-03-22T13:20:38Z","tags":["status","plugin:grokdebugger#5.6.4","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-03-22T13:20:38Z","tags":["status","plugin:upgrade#5.6.4","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-03-22T13:20:38Z","tags":["status","plugin:console#5.6.4","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-03-22T13:20:39Z","tags":["status","plugin:metrics#5.6.4","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-03-22T13:20:41Z","tags":["status","plugin:timelion#5.6.4","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","#timestamp":"2019-03-22T13:20:41Z","tags":["listening","info"],"pid":1,"message":"Server running at http://0:5601"}
{"type":"log","#timestamp":"2019-03-22T13:20:41Z","tags":["status","ui settings","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"response","#timestamp":"2019-03-22T13:26:01Z","tags":[],"pid":1,"method":"get","statusCode":200,"req":{"url":"/","method":"get","headers":{"host":"MYHOSTNAME","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36","accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3","accept-encoding":"gzip, deflate, br","accept-language":"en-US,en;q=0.9","cache-control":"max-age=0","upgrade-insecure-requests":"1","x-forwarded-for":"172.20.1.246","x-forwarded-uri":"/api/v1/namespaces/kube-system/services/kibana-logging/proxy/"},"remoteAddress":"100.124.142.0","userAgent":"100.124.142.0"},"res":{"statusCode":200,"responseTime":622,"contentLength":9},"message":"GET / 200 622ms - 9.0B"}
Also, curl from the master node:
admin@ip-172-20-51-6:~$ curl 100.66.205.174:5601
<script>var hashRoute = '/api/v1/proxy/namespaces/kube-system/services/kibana-logging/app/kibana';
var defaultRoute = '/api/v1/proxy/namespaces/kube-system/services/kibana-logging/app/kibana';
var hash = window.location.hash;
if (hash.length) {
window.location = hashRoute + hash;
} else {
window.location = defaultRoute;
browser log
This is happening because Kibana needs a long time on first start to optimize and cache its bundles.
If you check the log of the kibana-logging container, you will see this output:
$ kubectl -n kube-system logs -f kibana-logging-8675b4ffd-75rzd
{"type":"log","#timestamp":"2019-03-21T09:11:30Z","tags":["info","optimize"],"pid":1,"message":"Optimizing and caching bundles for graph, ml, kibana, stateSessionStorageRedirect, timelion and status_page. This may take a few minutes"}
You need to wait around an hour; then the service will be available and you will be able to access the Kibana UI.
For more information, see this GitHub discussion.
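While waiting (or instead of the apiserver proxy URL), a port-forward straight to the service should also reach the UI once the pod is listening, assuming the kube-system names shown above:
kubectl -n kube-system port-forward svc/kibana-logging 5601:5601
Then open http://localhost:5601 in a browser.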

Can't connect to ElasticSearch server using Java API and shield

I am trying to connect to my Elasticsearch server using the Java API and Shield. I can execute index, get, delete and search operations on the existing cluster using the Sense plugin, for example, and via curl on port 9200. I've seen other threads about this, but none of them worked and none of them were trying to connect to an Elasticsearch web server with Shield.
I used the same API to connect to Elasticsearch on my localhost and it worked fine; however, when I try to connect to my web server I always get the same error:
Error
1342 [main] DEBUG org.elasticsearch.shield.transport.netty - [Benjamin Jacob Grimm] connected to node [{#transport#-1}{HOST_IP}{HOST/HOST_IP:9300}]
1431 [elasticsearch[Benjamin Jacob Grimm][generic][T#1]] DEBUG org.elasticsearch.shield.transport.netty - [Benjamin Jacob Grimm] disconnecting from [{#transport#-1}{HOST_IP}{HOST/HOST_IP:9300}], channel closed event
1463 [main] INFO org.elasticsearch.client.transport - [Benjamin Jacob Grimm] failed to get node info for {#transport#-1}{HOST_IP}{HOST/HOST_IP:9300}, disconnecting...
NodeDisconnectedException[[][HOST/HOST_IP:9300][cluster:monitor/nodes/liveness] disconnected]
...9200/_nodes
"cluster_name": "elasticsearch",
"nodes": {
"UYdZbCQKQZavtFYOoUpawg": {
"name": "Desmond Pitt",
"transport_address": "HOST_IP:9300",
"host": "HOST_IP",
"ip": "HOST_IP",
"version": "2.3.3",
"build": "218bdf1",
"http_address": "HOST_IP:9200",
"settings": {
"pidfile": "/var/run/elasticsearch/elasticsearch.pid",
"cluster": {
"name": "elasticsearch"
},
"path": {
"conf": "/etc/elasticsearch",
"data": "/var/lib/elasticsearch",
"logs": "/var/log/elasticsearch",
"home": "/usr/share/elasticsearch"
},
"shield": {
"http": {
"ssl": "true"
},
"https": {
"ssl": "true"
},
"transport": {
"ssl": "true"
}
},
"name": "Desmond Pitt",
"client": {
"type": "node"
},
"http": {
"cors": {
"allow-origin": "*",
"allow-headers": "Authorization, Origin, X-Requested-With, Content-Type, Accept",
"allow-credentials": "true",
"allow-methods": "OPTIONS, HEAD, GET, POST, PUT, DELETE",
"enabled": "true"
}
},
"index": {
"queries": {
"cache": {
"type": "opt_out_cache"
}
}
},
"foreground": "false",
"config": {
"ignore_system_properties": "true"
},
"network": {
"host": "HOST_IP",
"bind_host": "0.0.0.0",
"publish_host": "HOST_IP"
}
}
Java code:
TransportClient client = TransportClient.builder()
.addPlugin(ShieldPlugin.class)
.settings(Settings.builder()
.put("cluster.name", ClusterName)
.put("shield.user", "USER:PASSWORD")
.build())
.build()
.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(HOST), 9300));
I've tried, as stated in Can't connect to ElasticSearch server using Java API, to sync the Java version used by my Java API and my server, and currently I'm using:
Java API:
C:\Program Files\Java\jdk1.8.0_92
Server:
"version": "1.8.0_91",
"vm_name": "OpenJDK 64-Bit Server VM",
I don't know if mixing ...0_91 and 0_92 is a problem, but it doesn't seem to make any difference because the Java API works well against my localhost server.
If you need more information feel free to ask.
Thanks in advance!
UPDATE:
Changes I did in elasticsearch.yml
shield.ssl.keystore.path: /usr/share/elasticsearch/bin/shield/elastic.jks
shield.ssl.keystore.password: password
shield.ssl.keystore.key_password: password
shield.transport.ssl: true
shield.http.ssl: true
shield.https.ssl: true
network.host: HOST_IP
network.publish_host: HOST_IP
shield.ssl.hostname_verification.resolve_name: false
Result of https://HOST:9200/_cluster/health?pretty=true
{
"cluster_name": "elasticsearch",
"status": "yellow",
"timed_out": false,
"number_of_nodes": 1,
"number_of_data_nodes": 1,
"active_primary_shards": 5,
"active_shards": 5,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 5,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 50
}
UPDATE2:
I've tried to activate SSL according to the official documentation and I got the following errors:
2082 [elasticsearch[Steel Serpent][transport_client_worker][T#1]{New I/O worker #1}] DEBUG org.elasticsearch.shield.transport.netty - [Steel Serpent] SSL/TLS handshake failed, closing channel: null
java.nio.channels.ClosedChannelException
at org.jboss.netty.handler.ssl.SslHandler.channelDisconnected(SslHandler.java:575)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireChannelDisconnected(Channels.java:396)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:360)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Temporary solution
After that attempt I did as Vladislav Kysliy suggested and disabled SSL, and it worked, but I'm looking for a real solution and not a temporary one.
As I can see, you enabled SSL encryption, but your Java code didn't activate SSL. According to the official documentation, you should use something like this:
TransportClient client = TransportClient.builder()
.addPlugin(ShieldPlugin.class)
.settings(Settings.builder()
.put("cluster.name", "myClusterName")
.put("shield.user", "transport_client_user:changeme")
.put("shield.ssl.keystore.path", "/path/to/client.jks") (1)
.put("shield.ssl.keystore.password", "password")
.put("shield.transport.ssl", "true")
...
.build())
Moreover, I would test my code without any encryption first and then add new features (e.g. SSL) to the config and code step by step.
UPD: To be honest, remotely fixing SSL issues will be tricky. These errors often appear when the client sends an invalid SSL certificate. Probably you need to disable client auth.
Because you use SSL + Shield, the main idea is to check your functionality step by step: disable SSL and check with the Java API client, then enable SSL and check again.
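For those step-by-step checks, plain curl and openssl can help isolate where it breaks (HOST, USER and PASSWORD are placeholders): an authenticated call on 9200 confirms the Shield credentials and HTTPS, and a handshake test on 9300 shows whether the transport port is really speaking TLS:
curl -k -u USER:PASSWORD https://HOST:9200/_nodes
openssl s_client -connect HOST:9300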

Mesos Marathon apps with persistent volume apps stuck at suspended

I'm having trouble running an app in Marathon using persistent local volumes. Having followed the instructions, started Marathon with a role and principal, and created a simple app with a persistent volume, I find that the app just hangs at Suspended. It seems that the slave has responded with a valid offer, but the app never actually starts. The slave doesn't log anything regarding the task, even when I compile with the debug option and turn logging right up with GLOG_v=2.
Also, it seems that Marathon is constantly rolling the task ID as it fails to start, but I can't see why anywhere.
Oddly, when I run without a persistent volume but with a disk reservation, the app starts running.
The debug logging on Marathon doesn't appear to show anything useful, although I could be missing something. Could anyone give me any pointers as to what the problem may be or where to look for additional debug output? Many thanks in advance 😄.
Here's some info about my environment and debug info:
Slave: Ubuntu 14.04 running 0.28 prebuilt and tested in 0.29 built from source
Master: Mesos 0.28 running inside a Docker Ubuntu 14.04 image on CoreOS
Marathon: 1.1.1 running inside a Docker Ubuntu 14.04 image on CoreOS
App with persistent storage
App info from v2/apps/test/tasks on Marathon
{
"app": {
"id": "/test",
"cmd": "while true; do sleep 10; done",
"args": null,
"user": null,
"env": {},
"instances": 1,
"cpus": 1,
"mem": 128,
"disk": 0,
"executor": "",
"constraints": [
[
"role",
"CLUSTER",
"persistent"
]
],
"uris": [],
"fetch": [],
"storeUrls": [],
"ports": [
10002
],
"portDefinitions": [
{
"port": 10002,
"protocol": "tcp",
"labels": {}
}
],
"requirePorts": false,
"backoffSeconds": 1,
"backoffFactor": 1.15,
"maxLaunchDelaySeconds": 3600,
"container": {
"type": "MESOS",
"volumes": [
{
"containerPath": "test",
"mode": "RW",
"persistent": {
"size": 100
}
}
]
},
"healthChecks": [],
"readinessChecks": [],
"dependencies": [],
"upgradeStrategy": {
"minimumHealthCapacity": 0.5,
"maximumOverCapacity": 0
},
"labels": {},
"acceptedResourceRoles": null,
"ipAddress": null,
"version": "2016-05-19T11:31:54.861Z",
"residency": {
"relaunchEscalationTimeoutSeconds": 3600,
"taskLostBehavior": "WAIT_FOREVER"
},
"versionInfo": {
"lastScalingAt": "2016-05-19T11:31:54.861Z",
"lastConfigChangeAt": "2016-05-18T16:46:59.684Z"
},
"tasksStaged": 0,
"tasksRunning": 0,
"tasksHealthy": 0,
"tasksUnhealthy": 0,
"deployments": [
{
"id": "4f3779e5-a805-4b95-9065-f3cf9c90c8fe"
}
],
"tasks": [
{
"id": "test.4b7d4303-1dc2-11e6-a179-a2bd870b1e9c",
"slaveId": "9f7c6ed5-4bf5-475d-9311-05d21628604e-S17",
"host": "ip-10-0-90-61.eu-west-1.compute.internal",
"localVolumes": [
{
"containerPath": "test",
"persistenceId": "test#test#4b7d4302-1dc2-11e6-a179-a2bd870b1e9c"
}
],
"appId": "/test"
}
]
}
}
App info in Marathon: (it seems the deployment is spinning)
App without persistent storage
App info from v2/apps/test2/tasks on Marathon
{
"app": {
"id": "/test2",
"cmd": "while true; do sleep 10; done",
"args": null,
"user": null,
"env": {},
"instances": 1,
"cpus": 1,
"mem": 128,
"disk": 100,
"executor": "",
"constraints": [
[
"role",
"CLUSTER",
"persistent"
]
],
"uris": [],
"fetch": [],
"storeUrls": [],
"ports": [
10002
],
"portDefinitions": [
{
"port": 10002,
"protocol": "tcp",
"labels": {}
}
],
"requirePorts": false,
"backoffSeconds": 1,
"backoffFactor": 1.15,
"maxLaunchDelaySeconds": 3600,
"container": null,
"healthChecks": [],
"readinessChecks": [],
"dependencies": [],
"upgradeStrategy": {
"minimumHealthCapacity": 0.5,
"maximumOverCapacity": 0
},
"labels": {},
"acceptedResourceRoles": null,
"ipAddress": null,
"version": "2016-05-19T13:44:01.831Z",
"residency": null,
"versionInfo": {
"lastScalingAt": "2016-05-19T13:44:01.831Z",
"lastConfigChangeAt": "2016-05-19T13:09:20.106Z"
},
"tasksStaged": 0,
"tasksRunning": 1,
"tasksHealthy": 0,
"tasksUnhealthy": 0,
"deployments": [],
"tasks": [
{
"id": "test2.bee624f1-1dc7-11e6-b98e-568f3f9dead8",
"slaveId": "9f7c6ed5-4bf5-475d-9311-05d21628604e-S18",
"host": "ip-10-0-90-61.eu-west-1.compute.internal",
"startedAt": "2016-05-19T13:44:02.190Z",
"stagedAt": "2016-05-19T13:44:02.023Z",
"ports": [
31926
],
"version": "2016-05-19T13:44:01.831Z",
"ipAddresses": [
{
"ipAddress": "10.0.90.61",
"protocol": "IPv4"
}
],
"appId": "/test2"
}
],
"lastTaskFailure": {
"appId": "/test2",
"host": "ip-10-0-90-61.eu-west-1.compute.internal",
"message": "Slave ip-10-0-90-61.eu-west-1.compute.internal removed: health check timed out",
"state": "TASK_LOST",
"taskId": "test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c",
"timestamp": "2016-05-19T13:15:24.155Z",
"version": "2016-05-19T13:09:20.106Z",
"slaveId": "9f7c6ed5-4bf5-475d-9311-05d21628604e-S17"
}
}
}
Slave log when running the app without the persistent volume:
I0519 13:09:22.471876 12459 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.471906 12459 status_update_manager.cpp:497] Creating StatusUpdate stream for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.472262 12459 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.477686 12459 status_update_manager.cpp:374] Forwarding update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000 to the agent
I0519 13:09:22.477830 12453 process.cpp:2605] Resuming slave(1)#10.0.90.61:5051 at 2016-05-19 13:09:22.477814016+00:00
I0519 13:09:22.477967 12453 slave.cpp:3638] Forwarding the update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000 to master#10.0.82.230:5050
I0519 13:09:22.478185 12453 slave.cpp:3532] Status update manager successfully handled status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.478229 12453 slave.cpp:3548] Sending acknowledgement for status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000 to executor(1)#10.0.90.61:34262
I0519 13:09:22.488315 12460 pid.cpp:95] Attempting to parse 'master#10.0.82.230:5050' into a PID
I0519 13:09:22.488370 12460 process.cpp:646] Parsed message name 'mesos.internal.StatusUpdateAcknowledgementMessage' for slave(1)#10.0.90.61:5051 from master#10.0.82.230:5050
I0519 13:09:22.488452 12452 process.cpp:2605] Resuming slave(1)#10.0.90.61:5051 at 2016-05-19 13:09:22.488441856+00:00
I0519 13:09:22.488600 12458 process.cpp:2605] Resuming (14)#10.0.90.61:5051 at 2016-05-19 13:09:22.488590080+00:00
I0519 13:09:22.488632 12458 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.488726 12458 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.492985 12452 process.cpp:2605] Resuming slave(1)#10.0.90.61:5051 at 2016-05-19 13:09:22.492974080+00:00
I0519 13:09:22.493021 12452 slave.cpp:2629] Status update manager successfully handled status update acknowledgement (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
This may be due to low disk space or RAM.
The minimum idle configuration is specified in the link below.
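Beyond resources, it may be worth confirming that the agent actually holds the reservation and volume for the role Marathon registers with; the agent's state endpoint (default port 5051 on Mesos 0.28) shows this, for example:
curl http://ip-10-0-90-61.eu-west-1.compute.internal:5051/state.json
Look for the resources reserved for that role (and any persistent volumes) in the output.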
