Ansible - Gather Confluent services status from all hosts into a file

This is my set-up:
Bootstrap servers: confl-server1, confl-server2, confl-server3
Connect servers: confl-server4, confl-server5
REST Proxy: confl-server4
Schema Registry: confl-server4, confl-server5
Control Center: confl-server6
ZooKeepers: confl-server7, confl-server8, confl-server9
When I execute the systemctl status confluent-* command on confl-server4, I get the output below.
systemctl status confluent-*
● confluent-kafka-connect.service - Apache Kafka Connect - distributed
Loaded: loaded (/usr/lib/systemd/system/confluent-kafka-connect.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/confluent-kafka-connect.service.d
└─override.conf
Active: active (running) since Thu 2022-02-24 17:33:06 EST; 1 day 18h ago
Docs: http://docs.confluent.io/
Main PID: 29825 (java)
CGroup: /system.slice/confluent-kafka-connect.service
└─29825 java -Xms256M -Xmx2G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX...
● confluent-schema-registry.service - RESTful Avro schema registry for Apache Kafka
Loaded: loaded (/usr/lib/systemd/system/confluent-schema-registry.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/confluent-schema-registry.service.d
└─override.conf
Active: active (running) since Thu 2022-01-06 15:49:55 EST; 1 months 20 days ago
Docs: http://docs.confluent.io/
Main PID: 23391 (java)
CGroup: /system.slice/confluent-schema-registry.service
└─23391 java -Xmx1000M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.aw...
● confluent-kafka-rest.service - A REST proxy for Apache Kafka
Loaded: loaded (/usr/lib/systemd/system/confluent-kafka-rest.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/confluent-kafka-rest.service.d
└─override.conf
Active: active (running) since Sun 2022-01-02 00:06:07 EST; 1 months 25 days ago
Docs: http://docs.confluent.io/
Main PID: 890 (java)
CGroup: /system.slice/confluent-kafka-rest.service
└─890 java -Xmx256M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.h...
I want to write an Ansible playbook that collects, for every host, the service name and its status, and produces a single output that I can redirect to a file on broker1.
This is what I have tried (based on a post on SO):
---
- name: Check Confluent services status
  # hosts: localhost
  hosts: all
  gather_facts: false
  become: true
  vars:
    ansible_ssh_extra_args: "-o StrictHostKeyChecking=no"
    ansible_host_key_checking: false
  tasks:
    - name: Check if confluent is active
      command: systemctl status confluent-*
      register: confluent_check
      ignore_errors: yes
      no_log: True
      failed_when: false

    - name: Debug message - Check if confluent is active
      debug:
        msg: "{{ ansible_play_hosts | map('extract', hostvars, 'confluent_check') | map(attribute='stdout') | list }}"
but for every server it prints the full long-format status of every Confluent component, for example:
ok: [confl-server4] => {
"msg": [
"● confluent-zookeeper.service - Apache Kafka - ZooKeeper\n Loaded: loaded (/usr/lib/systemd/system/confluent-zookeeper.service; enabled; vendor preset: disabled)\n Drop-In: /etc/systemd/system/confluent-zookeeper.service.d\n └─override.conf\n Active: active (running) since Mon 2022-01-10 11:54:38 EST; 1 months 16 days ago\n Docs: http://docs.confluent.io/\n Main PID: 26052 (java)\n CGroup: /system.slice/confluent-zookeeper.service\n └─26052 java -Xmx1g -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xloggc:/var/log/kafka/zookeeper-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/var/log/kafka -Dlog4j.configuration=file:/etc/kafka/log4j.properties -cp /usr/bin/../ce-broker-plugins/build/libs/*:/usr/bin/../ce-broker-plugins/build/dependant-libs/*:/usr/bin/../ce-auth-providers/build/libs/*:/usr/bin/../ce-auth-providers/build/dependant-libs/*:/usr/bin/../ce-rest-server/build/libs/*:/usr/bin/../ce-rest-server/build/dependant-libs/*:/usr/bin/../ce-audit/build/libs/*:/usr/bin/../ce-audit/build/dependant-libs/*:/usr/bin/../share/java/kafka/*:/usr/bin/../share/java/confluent-metadata-service/*:/usr/bin/../share/java/rest-utils/*:/usr/bin/../share/java/confluent-common/*:/usr/bin/../share/java/confluent-security/schema-validator/*:/usr/bin/../support-metrics-client/build/dependant-libs-2.12.10/*:/usr/bin/../support-metrics-client/build/libs/*:/usr/share/java/support-metrics-client/*:/usr/bin/../support-metrics-fullcollector/build/dependant-libs-2.12.10/*:/usr/bin/../support-metrics-fullcollector/build/libs/*:/usr/share/java/support-metrics-fullcollector/* -Dlog4j.configuration=file:/etc/kafka/zookeeper_log4j.properties org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/kafka/zookeeper.properties\n\nFeb 26 07:54:39 confl-server8 zookeeper-server-start[26052]: [2022-02-26 07:54:39,613] INFO Purge task completed. (org.apache.zookeeper.server.DatadirCleanupManager)\nFeb 26 08:54:39 confl-server8 zookeeper-server-start[26052]: [2022-02-26 08:54:39,612] INFO Purge task started. (org.apache.zookeeper.server.DatadirCleanupManager)\nFeb 26 08:54:39 confl-server8 zookeeper-server-start[26052]: [2022-02-26 08:54:39,612] INFO zookeeper.snapshot.trust.empty : false (org.apache.zookeeper.server.persistence.FileTxnSnapLog)\nFeb 26 08:54:39 confl-server8 zookeeper-server-start[26052]: [2022-02-26 08:54:39,612] INFO Purge task completed. (org.apache.zookeeper.server.DatadirCleanupManager)\nFeb 26 09:54:39 confl-server8 zookeeper-server-start[26052]: [2022-02-26 09:54:39,612] INFO Purge task started. (org.apache.zookeeper.server.DatadirCleanupManager)\nFeb 26 09:54:39 confl-server8 zookeeper-server-start[26052]: [2022-02-26 09:54:39,612] INFO zookeeper.snapshot.trust.empty : false (org.apache.zookeeper.server.persistence.FileTxnSnapLog)\nFeb 26 09:54:39 confl-server8 zookeeper-server-start[26052]: [2022-02-26 09:54:39,613] INFO Purge task completed. (org.apache.zookeeper.server.DatadirCleanupManager)\nFeb 26 10:54:39 confl-server8 zookeeper-server-start[26052]: [2022-02-26 10:54:39,612] INFO Purge task started. 
(org.apache.zookeeper.server.DatadirCleanupManager)\nFeb 26 10:54:39 confl-server8 zookeeper-server-start[26052]: [2022-02-26 10:54:39,612] INFO zookeeper.snapshot.trust.empty : false (org.apache.zookeeper.server.persistence.FileTxnSnapLog)\nFeb 26 10:54:39 confl-server8 zookeeper-server-start[26052]: [2022-02-26 10:54:39,612] INFO Purge task completed. (org.apache.zookeeper.server.DatadirCleanupManager)",
"● confluent-zookeeper.service - Apache Kafka - ZooKeeper\n Loaded: loaded (/usr/lib/systemd/system/confluent-zookeeper.service; enabled; vendor preset: disabled)\n Drop-In: /etc/systemd/system/confluent-zookeeper.service.d\n └─override.conf\n Active: active (running) since Mon 2022-01-10 11:52:47 EST; 1 months 16 days ago\n Docs: http://docs.confluent.io/\n Main PID: 23394 (java)\n CGroup: /system.slice/confluent-zookeeper.service\n └─23394 java -Xmx1g -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xloggc:/var/log/kafka/zookeeper-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/var/log/kafka -Dlog4j.configuration=file:/etc/kafka/log4j.properties -cp /usr/bin/../ce-broker-plugins/build/libs/*:/usr/bin/../ce-broker-plugins/build/dependant-libs/*:/usr/bin/../ce-auth-providers/build/libs/*:/usr/bin/../ce-auth-providers/build/dependant-libs/*:/usr/bin/../ce-rest-server/build/libs/*:/usr/bin/../ce-rest-server/build/dependant-libs/*:/usr/bin/../ce-audit/build/libs/*:/usr/bin/../ce-audit/build/dependant-libs/*:/usr/bin/../share/java/kafka/*:/usr/bin/../share/java/confluent-metadata-service/*:/usr/bin/../share/java/rest-utils/*:/usr/bin/../share/java/confluent-common/*:/usr/bin/../share/java/confluent-security/schema-validator/*:/usr/bin/../support-metrics-client/build/dependant-libs-2.12.10/*:/usr/bin/../support-metrics-client/build/libs/*:/usr/share/java/support-metrics-client/*:/usr/bin/../support-metrics-fullcollector/build/dependant-libs-2.12.10/*:/usr/bin/../support-metrics-fullcollector/build/libs/*:/usr/share/java/support-metrics-fullcollector/* -Dlog4j.configuration=file:/etc/kafka/zookeeper_log4j.properties org.apache.zookeeper.server.quorum.QuorumPeerMain /etc/kafka/zookeeper.properties\n\nFeb 26 07:52:48 confl-server7 zookeeper-server-start[23394]: [2022-02-26 07:52:48,217] INFO Purge task completed. (org.apache.zookeeper.server.DatadirCleanupManager)\nFeb 26 08:52:48 confl-server7 zookeeper-server-start[23394]: [2022-02-26 08:52:48,216] INFO Purge task started. (org.apache.zookeeper.server.DatadirCleanupManager)\nFeb 26 08:52:48 confl-server7 zookeeper-server-start[23394]: [2022-02-26 08:52:48,216] INFO zookeeper.snapshot.trust.empty : false (org.apache.zookeeper.server.persistence.FileTxnSnapLog)\nFeb 26 08:52:48 confl-server7 zookeeper-server-start[23394]: [2022-02-26 08:52:48,217] INFO Purge task completed. (org.apache.zookeeper.server.DatadirCleanupManager)\nFeb 26 09:52:48 confl-server7 zookeeper-server-start[23394]: [2022-02-26 09:52:48,216] INFO Purge task started. (org.apache.zookeeper.server.DatadirCleanupManager)\nFeb 26 09:52:48 confl-server7 zookeeper-server-start[23394]: [2022-02-26 09:52:48,216] INFO zookeeper.snapshot.trust.empty : false (org.apache.zookeeper.server.persistence.FileTxnSnapLog)\nFeb 26 09:52:48 confl-server7 zookeeper-server-start[23394]: [2022-02-26 09:52:48,216] INFO Purge task completed. (org.apache.zookeeper.server.DatadirCleanupManager)\nFeb 26 10:52:48 confl-server7 zookeeper-server-start[23394]: [2022-02-26 10:52:48,216] INFO Purge task started. 
(org.apache.zookeeper.server.DatadirCleanupManager)\nFeb 26 10:52:48 confl-server7 zookeeper-server-start[23394]: [2022-02-26 10:52:48,216] INFO zookeeper.snapshot.trust.empty : false (org.apache.zookeeper.server.persistence.FileTxnSnapLog)\nFeb 26 10:52:48 confl-server7 zookeeper-server-start[23394]: [2022-02-26 10:52:48,217] INFO Purge task completed. (org.apache.zookeeper.server.DatadirCleanupManager)",
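As a side note, systemctl itself can print a one-line summary per unit, which keeps the registered output much shorter than the full status text; a rough sketch of such a task, reusing the confluent-* pattern from the question:

    - name: Get a one-line summary per Confluent unit
      # list-units prints "UNIT LOAD ACTIVE SUB DESCRIPTION" for each matching unit
      command: systemctl list-units --type=service --all --no-legend 'confluent-*'
      register: confluent_summary
      changed_when: false
      failed_when: false

    - name: Show the compact summary
      debug:
        msg: "{{ confluent_summary.stdout_lines }}"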
I also tried
- name: checking service status
  command: systemctl status "{{ item }}"
  loop: "{{ ansible_facts.services.keys() | select('match', '^.*confluent.*$') | list }}"
  register: result
  ignore_errors: yes

- name: checking service status showing report
  debug:
    var: result
But that gives even longer output for each host
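As an aside, the ansible_facts.services dictionary that the loop above iterates over already carries the state of each unit, so the status can be shown without calling systemctl again. A rough sketch, assuming the facts are populated by the service_facts module first:

    - name: Populate ansible_facts.services
      service_facts:

    - name: Show only the name and state of each Confluent unit
      debug:
        msg: "{{ item.key }}: {{ item.value.state }}"
      loop: "{{ ansible_facts.services | dict2items | selectattr('key', 'match', '^confluent-') | list }}"
      loop_control:
        label: "{{ item.key }}"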
I would like to get just the server name, the service name, and the status (running or failed), for example:
Server: confl-server4
Service: confluent-kafka-connect.service
Active: active (running)
or Active: failed, if the service has failed,
for the services on all servers, collected into a single file on the broker1 host.
How can I achieve that?
Thank you
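One possible way to get there, building on the service_facts idea above; a sketch only, not a verified solution. It assumes the Confluent units are systemd services, that service_facts works on the targets, and that broker1 is the inventory name of the host that should hold the report; the report path is just an example.

---
- name: Collect Confluent service states on every host
  hosts: all
  become: true
  gather_facts: false
  tasks:
    - name: Populate ansible_facts.services
      service_facts:

    - name: Build one summary block per Confluent unit
      # svc.state comes from service_facts and is typically "running" or "stopped"
      set_fact:
        confluent_report: |-
          {% for name, svc in ansible_facts.services | dictsort if name.startswith('confluent-') %}
          Server: {{ inventory_hostname }}
          Service: {{ name }}
          Active: {{ svc.state }}
          {% endfor %}

- name: Write the combined report on broker1
  hosts: broker1
  become: true
  gather_facts: false
  tasks:
    - name: Assemble all per-host summaries into a single file
      copy:
        dest: /tmp/confluent_services_report.txt   # example path
        content: |
          {% for host in groups['all'] %}
          {{ hostvars[host].confluent_report | default('Server: ' ~ host ~ ' (no data collected)') }}
          {% endfor %}

The second play could also be folded into the first one by putting delegate_to: broker1 and run_once: true on the copy task.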

Related

Metricbeat - Not creating any logfile

I am trying to set up Metricbeat on my CentOS 7 host. I have explicitly specified the logfile location for Metricbeat and set the logging level to debug, but I don't see a log file being created. I can see the logs in journalctl. Please let me know why the logfile is not being created. The same settings work with Filebeat, and its log file gets created.
Metricbeat version:
root@example.domain.com:/usr/share/metricbeat# metricbeat version
metricbeat version 7.2.0 (amd64), libbeat 7.2.0 [9ba65d864ca37cd32c25b980dbb4020975288fc0 built 2019-06-20 15:07:31 +0000 UTC]
Metricbeat config file:
/etc/metricbeat/metricbeat.yml
metricbeat:
  config:
    modules:
      path: /etc/metricbeat/modules.d/*.yml
      reload.enabled: true
      reload.period: 10s
output.logstash:
  hosts: ['logstash.domain.com:5158']
  worker: 1
  compression_level: 3
  loadbalance: true
  ssl:
    certificate: /usr/share/metricbeat/metricbeat.crt
    key: /usr/share/metricbeat/metricbeat.key
    verification_mode: none
logging:
  level: debug
  to_files: true
  files:
    path: /var/myapp/log/metricbeat
    name: metricbeat.log
    rotateeverybytes: 10485760
    keepfiles: 7
Ideally it should create a file (metricbeat.log) in the /var/myapp/log/metricbeat location, but I don't see any files getting created.
Journalctl output:
* metricbeat.service - Metricbeat is a lightweight shipper for metrics.
Loaded: loaded (/usr/lib/systemd/system/metricbeat.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2022-01-24 08:51:13 PST; 39min ago
Docs: https://www.elastic.co/products/beats/metricbeat
Main PID: 13520 (metricbeat)
CGroup: /system.slice/metricbeat.service
`-13520 /usr/share/metricbeat/bin/metricbeat -e -c /etc/metricbeat/metricbeat.yml -path.home /usr/share/metricbeat -path.config /etc/metricbeat -path.data /var/lib/metricbeat -path.logs /var/log/metricbeat
Jan 24 09:30:14 example.domain.com metricbeat[13520]: "/var/lib/metricbeat",
Jan 24 09:30:14 example.domain.com metricbeat[13520]: "-path.logs",
Jan 24 09:30:14 example.domain.com metricbeat[13520]: "/var/log/metricbeat"
Jan 24 09:30:14 example.domain.com metricbeat[13520]: ]
Jan 24 09:30:14 example.domain.com metricbeat[13520]: },
Jan 24 09:30:14 example.domain.com metricbeat[13520]: "user": {
Jan 24 09:30:14 example.domain.com metricbeat[13520]: "name": "root"
Jan 24 09:30:14 example.domain.com metricbeat[13520]: },
Jan 24 09:30:14 example.domain.com metricbeat[13520]: "event": {
Jan 24 09:30:14 example.domain.com metricbeat[13520]: "module": "system",
I don't see anything in the /var/log/metricbeat directory either.
UPDATE: I tried with versions 6.3 and 7.16, and it works fine. It looks like some issue with 7.2.

Missing queues from RabbitMQ Metricbeat

It looks like only a fraction of the queues on my RabbitMQ cluster are making it into Elasticsearch via Metricbeat.
When I query RabbitMQ's /api/overview, I see 887 queues reported:
object_totals: {
consumers: 517,
queues: 887,
exchanges: 197,
connections: 305,
channels: 622
},
When I query RabbitMQ's /api/queues (which is what Metricbeat hits), I count 887 queues there as well.
When I get a unique count of the field rabbitmq.queue.name in Elasticsearch, I am seeing only 309 queues.
I don't see anything in the debug output that stands out to me. It's just the usual INFO level startup messages, followed by the publish information:
root@rabbitmq:/etc/metricbeat# metricbeat -e
2019-06-24T21:13:33.692Z INFO instance/beat.go:571 Home path: [/usr/share/metricbeat] Config path: [/etc/metricbeat] Data path: [/var/lib/metricbeat] Logs path: [/var/log/metricbeat]
2019-06-24T21:13:33.692Z INFO instance/beat.go:579 Beat ID: xxx
2019-06-24T21:13:33.692Z INFO [index-management.ilm] ilm/ilm.go:129 Policy name: metricbeat-7.1.1
2019-06-24T21:13:33.692Z INFO [seccomp] seccomp/seccomp.go:116 Syscall filter successfully installed
2019-06-24T21:13:33.692Z INFO [beat] instance/beat.go:827 Beat info {"system_info": {"beat": {"path": {"config": "/etc/metricbeat", "data": "/var/lib/metricbeat", "home": "/usr/share/metricbeat", "logs": "/var/log/metricbeat"}, "type": "metricbeat", "uuid": "xxx"}}}
2019-06-24T21:13:33.692Z INFO [beat] instance/beat.go:836 Build info {"system_info": {"build": {"commit": "3358d9a5a09e3c6709a2d3aaafde628ea34e8419", "libbeat": "7.1.1", "time": "2019-05-23T13:23:10.000Z", "version": "7.1.1"}}}
2019-06-24T21:13:33.692Z INFO [beat] instance/beat.go:839 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":4,"version":"go1.11.5"}}}
[...]
2019-06-24T21:13:33.694Z INFO [beat] instance/beat.go:872 Process info {"system_info": {"process": {"capabilities": {"inheritable":null,"permitted":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read"],"effective":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read"],"bounding":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read"],"ambient":null}, "cwd": "/etc/metricbeat", "exe": "/usr/share/metricbeat/bin/metricbeat", "name": "metricbeat", "pid": 30898, "ppid": 30405, "seccomp": {"mode":"filter","no_new_privs":true}, "start_time": "2019-06-24T21:13:33.100Z"}}}
2019-06-24T21:13:33.694Z INFO instance/beat.go:280 Setup Beat: metricbeat; Version: 7.1.1
2019-06-24T21:13:33.694Z INFO [publisher] pipeline/module.go:97 Beat name: metricbeat
2019-06-24T21:13:33.694Z INFO instance/beat.go:391 metricbeat start running.
2019-06-24T21:13:33.694Z INFO cfgfile/reload.go:150 Config reloader started
2019-06-24T21:13:33.694Z INFO [monitoring] log/log.go:117 Starting metrics logging every 30s
[...]
2019-06-24T21:13:43.696Z INFO filesystem/filesystem.go:57 Ignoring filesystem types: sysfs, rootfs, ramfs, bdev, proc, cpuset, cgroup, cgroup2, tmpfs, devtmpfs, configfs, debugfs, tracefs, securityfs, sockfs, dax, bpf, pipefs, hugetlbfs, devpts, ecryptfs, fuse, fusectl, pstore, mqueue, autofs
2019-06-24T21:13:43.696Z INFO fsstat/fsstat.go:59 Ignoring filesystem types: sysfs, rootfs, ramfs, bdev, proc, cpuset, cgroup, cgroup2, tmpfs, devtmpfs, configfs, debugfs, tracefs, securityfs, sockfs, dax, bpf, pipefs, hugetlbfs, devpts, ecryptfs, fuse, fusectl, pstore, mqueue, autofs
2019-06-24T21:13:44.696Z INFO pipeline/output.go:95 Connecting to backoff(async(tcp://xxx))
2019-06-24T21:13:44.711Z INFO pipeline/output.go:105 Connection to backoff(async(tcp://xxx)) established
2019-06-24T21:14:03.696Z INFO [monitoring] log/log.go:144 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":130,"time":{"ms":131}},"total":{"ticks":1960,"time":{"ms":1965},"value":1960},"user":{"ticks":1830,"time":{"ms":1834}}},"handles":{"limit":{"hard":1048576,"soft":1024},"open":12},"info":{"ephemeral_id":"xxx","uptime":{"ms":30030}},"memstats":{"gc_next":30689808,"memory_alloc":21580680,"memory_total":428076400,"rss":79917056}},"libbeat":{"config":{"module":{"running":0},"reloads":2},"output":{"events":{"acked":7825,"batches":11,"total":7825},"read":{"bytes":66},"type":"logstash","write":{"bytes":870352}},"pipeline":{"clients":4,"events":{"active":313,"published":8138,"retry":523,"total":8138},"queue":{"acked":7825}}},"metricbeat":{"rabbitmq":{"connection":{"events":2987,"failures":10,"success":2977},"exchange":{"events":1970,"success":1970},"node":{"events":10,"success":10},"queue":{"events":3130,"failures":10,"success":3120}},"system":{"cpu":{"events":2,"success":2},"filesystem":{"events":7,"success":7},"fsstat":{"events":1,"success":1},"load":{"events":2,"success":2},"memory":{"events":2,"success":2},"network":{"events":4,"success":4},"process":{"events":18,"success":18},"process_summary":{"events":2,"success":2},"socket_summary":{"events":2,"success":2},"uptime":{"events":1,"success":1}}},"system":{"cpu":{"cores":4},"load":{"1":0.48,"15":0.28,"5":0.15,"norm":{"1":0.12,"15":0.07,"5":0.0375}}}}}}
I think that if there were a problem getting the queues, I would see an error in the logs above, as per https://github.com/elastic/beats/blob/master/metricbeat/module/rabbitmq/queue/data.go#L94-L104
Here's the metricbeat.yml:
metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
  reload.period: 10s
setup.template.settings:
  index.number_of_shards: 1
  index.codec: best_compression
name: metricbeat
fields:
  environment: development
processors:
  - add_cloud_metadata: ~
output.logstash:
  hosts: ["xxx"]
Here's the modules.d/rabbitmq.yml:
- module: rabbitmq
  metricsets: ["node", "queue", "connection", "exchange"]
  enabled: true
  period: 2s
  hosts: ["xxx"]
  username: xxx
  password: xxx
I solved it by upgrading Elastic Stack from 7.1.1 to 7.2.0.

Elasticsearch: Data server not discovering master

I have a one-data-node, one-master-node Elasticsearch cluster (6.4.2 on CentOS 7).
On my master01:
==> /opt/elasticsearch/logs/master01-elastic.my-local-domain-master01-elastic/esa-local-stg-cluster.log <==
[2019-02-08T11:06:21,267][INFO ][o.e.n.Node ] [master01-elastic] initialized
[2019-02-08T11:06:21,267][INFO ][o.e.n.Node ] [master01-elastic] starting ...
[2019-02-08T11:06:21,460][INFO ][o.e.t.TransportService ] [master01-elastic] publish_address {10.18.0.13:9300}, bound_addresses {10.18.0.13:9300}
[2019-02-08T11:06:21,478][INFO ][o.e.b.BootstrapChecks ] [master01-elastic] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-02-08T11:06:24,543][INFO ][o.e.c.s.MasterService ] [master01-elastic] zen-disco-elected-as-master ([0] nodes joined)[, ], reason: new_master {master01-elastic}{10kX4tQMTzS0O8AQYvieZw}{GH9oflu7QZuJB_U7sPJDlg}{10.18.0.13}{10.18.0.13:9300}{xpack.installed=true}
[2019-02-08T11:06:24,550][INFO ][o.e.c.s.ClusterApplierService] [master01-elastic] new_master {master01-elastic}{10kX4tQMTzS0O8AQYvieZw}{GH9oflu7QZuJB_U7sPJDlg}{10.18.0.13}{10.18.0.13:9300}{xpack.installed=true}, reason: apply cluster state (from master [master {master01-elastic}{10kX4tQMTzS0O8AQYvieZw}{GH9oflu7QZuJB_U7sPJDlg}{10.18.0.13}{10.18.0.13:9300}{xpack.installed=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)[, ]]])
[2019-02-08T11:06:24,575][INFO ][o.e.h.n.Netty4HttpServerTransport] [master01-elastic] publish_address {10.18.0.13:9200}, bound_addresses {10.18.0.13:9200}
[2019-02-08T11:06:24,575][INFO ][o.e.n.Node ] [master01-elastic] started
[2019-02-08T11:06:24,614][INFO ][o.e.l.LicenseService ] [master01-elastic] license [c2004733-fa30-4249-bb07-d5f2238816ad] mode [basic] - valid
[2019-02-08T11:06:24,615][INFO ][o.e.g.GatewayService ] [master01-elastic] recovered [0] indices into cluster_state
[root@master01-elastic ~]# systemctl status elasticsearch
● master01-elastic_elasticsearch.service - Elasticsearch-master01-elastic
Loaded: loaded (/usr/lib/systemd/system/master01-elastic_elasticsearch.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2019-02-08 11:06:12 EST; 2 days ago
Docs: http://www.elastic.co
Main PID: 18695 (java)
CGroup: /system.slice/master01-elastic_elasticsearch.service
├─18695 /bin/java -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Djava.awt.headless=true -Dfile.encoding...
└─18805 /usr/share/elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/controller
Feb 08 11:06:12 master01-elastic systemd[1]: Started Elasticsearch-master01-elastic.
[root@master01-elastic ~]# ss -tula | grep -i 9300
[root@master01-elastic ~]#
cluster logs on my master01:
[2019-02-11T02:36:21,406][INFO ][o.e.n.Node ] [master01-elastic] initialized
[2019-02-11T02:36:21,406][INFO ][o.e.n.Node ] [master01-elastic] starting ...
[2019-02-11T02:36:21,619][INFO ][o.e.t.TransportService ] [master01-elastic] publish_address {10.18.0.13:9300}, bound_addresses {10.18.0.13:9300}
[2019-02-11T02:36:21,654][INFO ][o.e.b.BootstrapChecks ] [master01-elastic] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-02-11T02:36:24,813][INFO ][o.e.c.s.MasterService ] [master01-elastic] zen-disco-elected-as-master ([0] nodes joined)[, ], reason: new_master {master01-elastic}{10kX4tQMTzS0O8AQYvieZw}{Vgq60hVVRn-3aO_uBuc2uQ}{10.18.0.13}{10.18.0.13:9300}{xpack.installed=true}
[2019-02-11T02:36:24,818][INFO ][o.e.c.s.ClusterApplierService] [master01-elastic] new_master {master01-elastic}{10kX4tQMTzS0O8AQYvieZw}{Vgq60hVVRn-3aO_uBuc2uQ}{10.18.0.13}{10.18.0.13:9300}{xpack.installed=true}, reason: apply cluster state (from master [master {master01-elastic}{10kX4tQMTzS0O8AQYvieZw}{Vgq60hVVRn-3aO_uBuc2uQ}{10.18.0.13}{10.18.0.13:9300}{xpack.installed=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)[, ]]])
[2019-02-11T02:36:24,856][INFO ][o.e.h.n.Netty4HttpServerTransport] [master01-elastic] publish_address {10.18.0.13:9200}, bound_addresses {10.18.0.13:9200}
[2019-02-11T02:36:24,856][INFO ][o.e.n.Node ] [master01-elastic] started
[2019-02-11T02:36:24,873][INFO ][o.e.l.LicenseService ] [master01-elastic] license [c2004733-fa30-4249-bb07-d5f2238816ad] mode [basic] - valid
[2019-02-11T02:36:24,875][INFO ][o.e.g.GatewayService ] [master01-elastic] recovered [0] indices into cluster_state
This makes the master undiscoverable, so on my data01:
[2019-02-11T02:24:09,882][WARN ][o.e.d.z.ZenDiscovery ] [data01-elastic] not enough master nodes discovered during pinging (found [[]], but needed [1]), pinging again
Also on my data01
[root@data01-elastic ~]# cat /etc/elasticsearch/data01-elastic/elasticsearch.yml | grep -i zen
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.unicast.hosts: 10.18.0.13:9300
[root@data01-elastic ~]# ping 10.18.0.13
PING 10.18.0.13 (10.18.0.13) 56(84) bytes of data.
64 bytes from 10.18.0.13: icmp_seq=1 ttl=64 time=0.171 ms
64 bytes from 10.18.0.13: icmp_seq=2 ttl=64 time=0.147 ms
How can I further troubleshoot this?
The cluster was deployed using Ansible scripts, with this configuration for the master:
- hosts: masters
  tasks:
    - name: Elasticsearch Master Configuration
      import_role:
        name: elastic.elasticsearch
      vars:
        es_instance_name: "{{ ansible_hostname }}"
        es_data_dirs:
          - "{{ data_dir }}"
        es_log_dir: "/opt/elasticsearch/logs"
        es_config:
          node.name: "{{ ansible_hostname }}"
          cluster.name: "{{ cluster_name }}"
          discovery.zen.ping.unicast.hosts: "{% for host in groups['masters'] -%}{{ hostvars[host]['ansible_ens33']['ipv4']['address'] }}:9300{% if not loop.last %},{% endif %}{%- endfor %}"
          http.port: 9200
          transport.tcp.port: 9300
          node.data: false
          node.master: true
          bootstrap.memory_lock: true
          network.host: '{{ ansible_facts["ens33"]["ipv4"]["address"] }}'
          discovery.zen.minimum_master_nodes: 1
        es_xpack_features: []
        es_scripts: false
        es_templates: false
        es_version_lock: true
        es_heap_size: 2g
        es_api_port: 9200
and this for the data node:
- hosts: data
  tasks:
    - name: Elasticsearch Data Configuration
      import_role:
        name: elastic.elasticsearch
      vars:
        es_instance_name: "{{ ansible_hostname }}"
        es_data_dirs:
          - "{{ data_dir }}"
        es_log_dir: "/opt/elasticsearch/logs"
        es_config:
          node.name: "{{ ansible_hostname }}"
          cluster.name: "{{ cluster_name }}"
          discovery.zen.ping.unicast.hosts: "{% for host in groups['masters'] -%}{{ hostvars[host]['ansible_ens33']['ipv4']['address'] }}:9300{% if not loop.last %},{% endif %}{%- endfor %}"
          http.port: 9200
          transport.tcp.port: 9300
          node.data: true
          node.master: false
          bootstrap.memory_lock: true
          network.host: '{{ ansible_facts["ens33"]["ipv4"]["address"] }}'
          discovery.zen.minimum_master_nodes: 1
        es_xpack_features: []
        es_scripts: false
        es_templates: false
        es_version_lock: true
        es_heap_size: 6g
        es_api_port: 9200
The two VMs I was trying to establish communication between were CentOS 7, which has firewalld enabled by default.
Disabling and stopping the firewalld service solved the issue.
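For what it's worth, instead of disabling firewalld entirely, only the Elasticsearch HTTP and transport ports could be opened. A sketch using the firewalld module (this assumes the ansible.posix collection, or an Ansible version that still ships the module, and the default zone):

- hosts: masters
  become: true
  tasks:
    - name: Open the Elasticsearch HTTP and transport ports
      firewalld:
        port: "{{ item }}"
        permanent: true
        immediate: true
        state: enabled
      loop:
        - 9200/tcp
        - 9300/tcp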

How to enable IP spoofing by Jmeter

I am trying to achieve IP spoofing as described in the BlazeMeter tutorial.
It works with a single IP, but when I configure more than one IP through a CSV file I cannot achieve the desired result: only a single IP shows up in the access log, even though JMeter shows that the two requests are generated from different IPs.
Screenshot 1 and Screenshot 2 (images not included). JMeter log output:
2016/08/22 12:11:40 INFO - jmeter.engine.StandardJMeterEngine: Running the test!
2016/08/22 12:11:40 INFO - jmeter.samplers.SampleEvent: List of sample_variables: []
2016/08/22 12:11:40 INFO - jmeter.gui.util.JMeterMenuBar: setRunning(true,*local*)
2016/08/22 12:11:40 INFO - jmeter.engine.StandardJMeterEngine: Starting ThreadGroup: 1 : Thread Group
2016/08/22 12:11:40 INFO - jmeter.engine.StandardJMeterEngine: Starting 1 threads for group Thread Group.
2016/08/22 12:11:40 INFO - jmeter.engine.StandardJMeterEngine: Thread will continue on error
2016/08/22 12:11:40 INFO - jmeter.threads.ThreadGroup: Starting thread group number 1 threads 1 ramp-up 1 perThread 1000.0 delayedStart=false
2016/08/22 12:11:40 INFO - jmeter.threads.ThreadGroup: Started thread group number 1
2016/08/22 12:11:40 INFO - jmeter.engine.StandardJMeterEngine: All thread groups have been started
2016/08/22 12:11:40 INFO - jmeter.threads.JMeterThread: Thread started: Thread Group 1-1
2016/08/22 12:11:40 INFO - jmeter.services.FileServer: Stored: D:\apache-jmeter-3.0\apache-jmeter-3.0\bin\IP.csv
2016/08/22 12:11:40 INFO - jmeter.protocol.http.sampler.HTTPHCAbstractImpl: Local host = IAPC76
2016/08/22 12:11:40 INFO - jmeter.protocol.http.sampler.HTTPHC4Impl: HTTP request retry count = 0
2016/08/22 12:11:42 INFO - jmeter.util.BeanShellTestElement: Ip should behttp://localhost:8080/
2016/08/22 12:11:43 INFO - jmeter.util.BeanShellTestElement: Ip should behttp://localhost:8080/
2016/08/22 12:11:43 INFO - jmeter.threads.JMeterThread: Stop Thread seen: org.apache.jorphan.util.JMeterStopThreadException: End of file detected
2016/08/22 12:11:43 INFO - jmeter.threads.JMeterThread: Thread finished: Thread Group 1-1
2016/08/22 12:11:43 INFO - jmeter.engine.StandardJMeterEngine: Notifying test listeners of end of test
2016/08/22 12:11:43 INFO - jmeter.services.FileServer: Close: D:\apache-jmeter-3.0\apache-jmeter-3.0\bin\IP.csv
2016/08/22 12:11:43 INFO - jmeter.gui.util.JMeterMenuBar: setRunning(false,*local*)
2016/08/22 12:12:16 INFO - jmeter.engine.StandardJMeterEngine: Running the test!
2016/08/22 12:12:16 INFO - jmeter.samplers.SampleEvent: List of sample_variables: []
2016/08/22 12:12:16 INFO - jmeter.gui.util.JMeterMenuBar: setRunning(true,*local*)
2016/08/22 12:12:16 INFO - jmeter.engine.StandardJMeterEngine: Starting ThreadGroup: 1 : Thread Group
2016/08/22 12:12:16 INFO - jmeter.engine.StandardJMeterEngine: Starting 1 threads for group Thread Group.
2016/08/22 12:12:16 INFO - jmeter.engine.StandardJMeterEngine: Thread will continue on error
2016/08/22 12:12:16 INFO - jmeter.threads.ThreadGroup: Starting thread group number 1 threads 1 ramp-up 1 perThread 1000.0 delayedStart=false
2016/08/22 12:12:16 INFO - jmeter.threads.ThreadGroup: Started thread group number 1
2016/08/22 12:12:16 INFO - jmeter.engine.StandardJMeterEngine: All thread groups have been started
2016/08/22 12:12:16 INFO - jmeter.threads.JMeterThread: Thread started: Thread Group 1-1
2016/08/22 12:12:16 INFO - jmeter.services.FileServer: Stored: D:\apache-jmeter-3.0\apache-jmeter-3.0\bin\IP.csv
2016/08/22 12:12:34 INFO - jmeter.util.BeanShellTestElement: Ip should behttp://192.168.1.140:8080/
2016/08/22 12:12:34 INFO - jmeter.util.BeanShellTestElement: Ip should behttp://192.168.1.140:8080/
2016/08/22 12:12:34 INFO - jmeter.threads.JMeterThread: Stop Thread seen: org.apache.jorphan.util.JMeterStopThreadException: End of file detected
2016/08/22 12:12:34 INFO - jmeter.threads.JMeterThread: Thread finished: Thread Group 1-1
2016/08/22 12:12:34 INFO - jmeter.engine.StandardJMeterEngine: Notifying test listeners of end of test
2016/08/22 12:12:34 INFO - jmeter.services.FileServer: Close: D:\apache-jmeter-3.0\apache-jmeter-3.0\bin\IP.csv
2016/08/22 12:12:34 INFO - jmeter.gui.util.JMeterMenuBar: setRunning(false,*local*)

Changing Hostname on Mesos-Slave not working

I followed the tutorial at https://open.mesosphere.com/getting-started/install/ to set up Mesos and Marathon.
I am using Vagrant to create two nodes, a master and a slave.
At the end of the tutorial I have Marathon and Mesos functioning.
First problem: only the slave on the master machine is visible to Mesos. The "independent" slave on the second Vagrant node is not visible to Mesos, even though I have put the same settings in /etc/mesos/zk on both nodes. From what I understand, this is the file that gives the master node addresses.
Second problem: when I change the hostname of the slave on the master machine to its IP address, the slave does not run. When I remove the file /etc/mesos-slave/hostname and restart, the slave starts running again.
I get the following logs in /var/log/mesos:
Log file created at: 2016/09/13 19:38:57
Running on machine: vagrant-ubuntu-trusty-64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0913 19:38:57.316082 2870 logging.cpp:194] INFO level logging started!
I0913 19:38:57.319680 2870 containerizer.cpp:196] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
I0913 19:38:57.321099 2870 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0913 19:38:57.322904 2870 main.cpp:434] Starting Mesos agent
I0913 19:38:57.323637 2887 slave.cpp:198] Agent started on 1)#10.0.2.15:5051
I0913 19:38:57.323648 2887 slave.cpp:199] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname="192.168.33.20" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://192.168.33.20:2181/mesos" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos"
I0913 19:38:57.323942 2887 slave.cpp:519] Agent resources: cpus(*):1; mem(*):244; disk(*):35164; ports(*):[31000-32000]
I0913 19:38:57.323969 2887 slave.cpp:527] Agent attributes: [ ]
I0913 19:38:57.323974 2887 slave.cpp:532] Agent hostname: 192.168.33.20
I0913 19:38:57.326578 2886 state.cpp:57] Recovering state from '/var/lib/mesos/meta'
After this, when I run sudo service mesos-slave status, it reports stop/waiting.
I am not sure how to go about dealing with these two problems. Any help is appreciated.
Update
On the "Independent Slave Machine" I am getting the following logs:
file: mesos-slave.vagrant-ubuntu-trusty-64.invalid-user.log.ERROR.20160914-141226.1197
Log file created at: 2016/09/14 14:12:26
Running on machine: vagrant-ubuntu-trusty-64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0914 14:12:26.699146 1234 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0914 14:12:26.700430 1234 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0914 14:12:27.634099 1234 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0914 14:12:28.784499 1234 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0914 14:12:34.914746 1234 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0914 14:12:36.906472 1234 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0914 14:12:37.242663 1234 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0914 14:12:40.442214 1234 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0914 14:12:42.033504 1234 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0914 14:12:47.239245 1234 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0914 14:12:50.712105 1234 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0914 14:13:03.200935 1234 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
file: mesos-slave.vagrant-ubuntu-trusty-64.invalid-user.log.INFO.20160914-141502.4788
Log file created at: 2016/09/14 14:15:02
Running on machine: vagrant-ubuntu-trusty-64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0914 14:15:02.491973 4788 logging.cpp:194] INFO level logging started!
I0914 14:15:02.495968 4788 containerizer.cpp:196] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
I0914 14:15:02.497270 4788 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0914 14:15:02.498855 4788 main.cpp:434] Starting Mesos agent
I0914 14:15:02.499091 4788 slave.cpp:198] Agent started on 1)#10.0.2.15:5051
I0914 14:15:02.499195 4788 slave.cpp:199] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname="192.168.33.31" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://192.168.33.20:2181/mesos" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos"
I0914 14:15:02.499560 4788 slave.cpp:519] Agent resources: cpus(*):1; mem(*):244; disk(*):35164; ports(*):[31000-32000]
I0914 14:15:02.499620 4788 slave.cpp:527] Agent attributes: [ ]
I0914 14:15:02.499650 4788 slave.cpp:532] Agent hostname: 192.168.33.31
I0914 14:15:02.502511 4803 state.cpp:57] Recovering state from '/var/lib/mesos/meta'
I0914 14:15:02.502554 4803 state.cpp:697] No checkpointed resources found at '/var/lib/mesos/meta/resources/resources.info'
I0914 14:15:02.502630 4803 state.cpp:100] Failed to find the latest agent from '/var/lib/mesos/meta'
I0914 14:15:02.510077 4807 status_update_manager.cpp:200] Recovering status update manager
I0914 14:15:02.510150 4807 containerizer.cpp:522] Recovering containerizer
I0914 14:15:02.510758 4807 provisioner.cpp:253] Provisioner recovery complete
I0914 14:15:02.510815 4807 slave.cpp:4782] Finished recovery
I0914 14:15:02.511342 4804 group.cpp:349] Group process (group(1)#10.0.2.15:5051) connected to ZooKeeper
I0914 14:15:02.511368 4804 group.cpp:837] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0914 14:15:02.511376 4804 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
I0914 14:15:02.513720 4804 detector.cpp:152] Detected a new leader: (id='4')
I0914 14:15:02.513813 4804 group.cpp:706] Trying to get '/mesos/json.info_0000000004' in ZooKeeper
I0914 14:15:02.514854 4804 zookeeper.cpp:259] A new leading master (UPID=master#10.0.2.15:5050) is detected
I0914 14:15:02.514928 4804 slave.cpp:895] New master detected at master#10.0.2.15:5050
I0914 14:15:02.514940 4804 slave.cpp:916] No credentials provided. Attempting to register without authentication
I0914 14:15:02.514961 4804 slave.cpp:927] Detecting new master
I0914 14:15:02.514976 4804 status_update_manager.cpp:174] Pausing sending status updates
E0914 14:15:03.228878 4811 process.cpp:2105] Failed to shutdown socket with fd 11: Transport endpoint is not connected
I0914 14:15:03.229086 4806 slave.cpp:3732] master#10.0.2.15:5050 exited
W0914 14:15:03.229099 4806 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
E0914 14:15:03.342586 4811 process.cpp:2105] Failed to shutdown socket with fd 11: Transport endpoint is not connected
I0914 14:15:03.342675 4806 slave.cpp:3732] master#10.0.2.15:5050 exited
W0914 14:15:03.342685 4806 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
E0914 14:15:06.773352 4811 process.cpp:2105] Failed to shutdown socket with fd 11: Transport endpoint is not connected
I0914 14:15:06.773438 4806 slave.cpp:3732] master#10.0.2.15:5050 exited
W0914 14:15:06.773448 4806 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
E0914 14:15:09.190912 4811 process.cpp:2105] Failed to shutdown socket with fd 11: Transport endpoint is not connected
I0914 14:15:09.191007 4806 slave.cpp:3732] master#10.0.2.15:5050 exited
W0914 14:15:09.191017 4806 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
E0914 14:15:16.597836 4811 process.cpp:2105] Failed to shutdown socket with fd 11: Transport endpoint is not connected
I0914 14:15:16.597929 4806 slave.cpp:3732] master#10.0.2.15:5050 exited
W0914 14:15:16.597940 4806 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
I0914 14:15:33.944555 4809 slave.cpp:3732] master#10.0.2.15:5050 exited
W0914 14:15:33.944607 4809 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
E0914 14:15:33.944682 4811 process.cpp:2105] Failed to shutdown socket with fd 11: Transport endpoint is not connected
I0914 14:16:02.515676 4804 slave.cpp:4591] Current disk usage 4.72%. Max allowed age: 5.969647788608773days
E0914 14:16:11.307096 4811 process.cpp:2105] Failed to shutdown socket with fd 11: Transport endpoint is not connected
I0914 14:16:11.307189 4806 slave.cpp:3732] master#10.0.2.15:5050 exited
W0914 14:16:11.307199 4806 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
Update 2
The configurations on both machines seem to be the same (I say "seem" because I have verified that they are identical, yet I still cannot connect the remote slave, so something must be going wrong).
The logs for the slave1 machine are as follows:
mesos-slave.slave1.invalid-user.log.WARNING
Log file created at: 2016/09/17 20:28:34
Running on machine: slave1
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0917 20:28:34.018565 17112 slave.cpp:202]
**************************************************
Agent bound to loopback interface! Cannot communicate with remote master(s). You might want to set '--ip' flag to a routable IP address.
**************************************************
E0917 20:28:34.797722 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
W0917 20:28:34.797917 17129 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
E0917 20:28:35.612090 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
W0917 20:28:35.612185 17133 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
E0917 20:28:37.841622 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
W0917 20:28:37.841723 17128 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
E0917 20:28:38.358543 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
W0917 20:28:38.358711 17128 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
E0917 20:28:51.705592 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
W0917 20:28:51.705704 17128 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
mesos-slave.slave1.invalid-user.log.INFO
Log file created at: 2016/09/17 20:28:34
Running on machine: slave1
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0917 20:28:34.011777 17112 logging.cpp:194] INFO level logging started!
I0917 20:28:34.014294 17112 containerizer.cpp:196] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
I0917 20:28:34.016263 17112 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0917 20:28:34.017916 17112 main.cpp:434] Starting Mesos agent
I0917 20:28:34.018307 17112 slave.cpp:198] Agent started on 1)#127.0.0.1:5051
I0917 20:28:34.018381 17112 slave.cpp:199] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname="192.168.33.31" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://192.168.33.20:2181/mesos" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos"
W0917 20:28:34.018565 17112 slave.cpp:202]
**************************************************
Agent bound to loopback interface! Cannot communicate with remote master(s). You might want to set '--ip' flag to a routable IP address.
**************************************************
I0917 20:28:34.018896 17112 slave.cpp:519] Agent resources: cpus(*):1; mem(*):244; disk(*):35164; ports(*):[31000-32000]
I0917 20:28:34.018959 17112 slave.cpp:527] Agent attributes: [ ]
I0917 20:28:34.018987 17112 slave.cpp:532] Agent hostname: 192.168.33.31
I0917 20:28:34.022061 17127 state.cpp:57] Recovering state from '/var/lib/mesos/meta'
I0917 20:28:34.022337 17127 state.cpp:697] No checkpointed resources found at '/var/lib/mesos/meta/resources/resources.info'
I0917 20:28:34.022431 17127 state.cpp:100] Failed to find the latest agent from '/var/lib/mesos/meta'
I0917 20:28:34.028128 17133 group.cpp:349] Group process (group(1)#127.0.0.1:5051) connected to ZooKeeper
I0917 20:28:34.028177 17133 group.cpp:837] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0917 20:28:34.028187 17133 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
I0917 20:28:34.028659 17130 status_update_manager.cpp:200] Recovering status update manager
I0917 20:28:34.028875 17129 containerizer.cpp:522] Recovering containerizer
I0917 20:28:34.029595 17129 provisioner.cpp:253] Provisioner recovery complete
I0917 20:28:34.029912 17112 slave.cpp:4782] Finished recovery
I0917 20:28:34.030637 17133 detector.cpp:152] Detected a new leader: (id='6')
I0917 20:28:34.030733 17133 group.cpp:706] Trying to get '/mesos/json.info_0000000006' in ZooKeeper
I0917 20:28:34.032158 17133 zookeeper.cpp:259] A new leading master (UPID=master#127.0.0.1:5050) is detected
I0917 20:28:34.032232 17133 slave.cpp:895] New master detected at master#127.0.0.1:5050
I0917 20:28:34.032245 17133 slave.cpp:916] No credentials provided. Attempting to register without authentication
I0917 20:28:34.032263 17133 slave.cpp:927] Detecting new master
I0917 20:28:34.032281 17133 status_update_manager.cpp:174] Pausing sending status updates
E0917 20:28:34.797722 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
I0917 20:28:34.797904 17129 slave.cpp:3732] master#127.0.0.1:5050 exited
W0917 20:28:34.797917 17129 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
E0917 20:28:35.612090 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
I0917 20:28:35.612174 17133 slave.cpp:3732] master#127.0.0.1:5050 exited
W0917 20:28:35.612185 17133 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
E0917 20:28:37.841622 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
I0917 20:28:37.841713 17128 slave.cpp:3732] master#127.0.0.1:5050 exited
W0917 20:28:37.841723 17128 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
E0917 20:28:38.358543 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
I0917 20:28:38.358700 17128 slave.cpp:3732] master#127.0.0.1:5050 exited
W0917 20:28:38.358711 17128 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
E0917 20:28:51.705592 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
I0917 20:28:51.705665 17128 slave.cpp:3732] master#127.0.0.1:5050 exited
W0917 20:28:51.705704 17128 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
mesos-slave.slave1.invalid-user.log.ERROR
Log file created at: 2016/09/17 20:28:34
Running on machine: slave1
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0917 20:28:34.797722 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0917 20:28:35.612090 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0917 20:28:37.841622 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0917 20:28:38.358543 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0917 20:28:51.705592 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
mesos-slave.INFO
Log file created at: 2016/09/17 20:28:34
Running on machine: slave1
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0917 20:28:34.011777 17112 logging.cpp:194] INFO level logging started!
I0917 20:28:34.014294 17112 containerizer.cpp:196] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
I0917 20:28:34.016263 17112 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0917 20:28:34.017916 17112 main.cpp:434] Starting Mesos agent
I0917 20:28:34.018307 17112 slave.cpp:198] Agent started on 1)#127.0.0.1:5051
I0917 20:28:34.018381 17112 slave.cpp:199] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname="192.168.33.31" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://192.168.33.20:2181/mesos" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos"
W0917 20:28:34.018565 17112 slave.cpp:202]
**************************************************
Agent bound to loopback interface! Cannot communicate with remote master(s). You might want to set '--ip' flag to a routable IP address.
**************************************************
I0917 20:28:34.018896 17112 slave.cpp:519] Agent resources: cpus(*):1; mem(*):244; disk(*):35164; ports(*):[31000-32000]
I0917 20:28:34.018959 17112 slave.cpp:527] Agent attributes: [ ]
I0917 20:28:34.018987 17112 slave.cpp:532] Agent hostname: 192.168.33.31
I0917 20:28:34.022061 17127 state.cpp:57] Recovering state from '/var/lib/mesos/meta'
mesos-slave.ERROR
Log file created at: 2016/09/17 20:28:34
Running on machine: slave1
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0917 20:28:34.797722 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0917 20:28:35.612090 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0917 20:28:37.841622 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0917 20:28:38.358543 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
E0917 20:28:51.705592 17135 process.cpp:2105] Failed to shutdown socket with fd 12: Transport endpoint is not connected
The logs for the master machine are as follows:
mesos-slave.master.invalid-user.log.WARNING
Log file created at: 2016/09/17 20:28:30
Running on machine: master
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0917 20:28:30.118418 21439 slave.cpp:202]
**************************************************
Agent bound to loopback interface! Cannot communicate with remote master(s). You might want to set '--ip' flag to a routable IP address.
**************************************************
mesos-slave.master.invalid-user.log.INFO
Log file created at: 2016/09/17 20:28:30
Running on machine: master
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0917 20:28:30.107797 21423 logging.cpp:194] INFO level logging started!
I0917 20:28:30.112454 21423 containerizer.cpp:196] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
I0917 20:28:30.113862 21423 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0917 20:28:30.114965 21423 main.cpp:434] Starting Mesos agent
I0917 20:28:30.118180 21439 slave.cpp:198] Agent started on 1)#127.0.0.1:5051
I0917 20:28:30.118201 21439 slave.cpp:199] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname="192.168.33.20" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://192.168.33.20:2181/mesos" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos"
W0917 20:28:30.118418 21439 slave.cpp:202]
**************************************************
Agent bound to loopback interface! Cannot communicate with remote master(s). You might want to set '--ip' flag to a routable IP address.
**************************************************
I0917 20:28:30.118688 21439 slave.cpp:519] Agent resources: cpus(*):1; mem(*):244; disk(*):35164; ports(*):[31000-32000]
I0917 20:28:30.118716 21439 slave.cpp:527] Agent attributes: [ ]
I0917 20:28:30.118719 21439 slave.cpp:532] Agent hostname: 192.168.33.20
I0917 20:28:30.121039 21440 state.cpp:57] Recovering state from '/var/lib/mesos/meta'
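The warning in these logs ("Agent bound to loopback interface! ... You might want to set '--ip' flag") points at the likely fix: in addition to /etc/mesos-slave/hostname, the Mesosphere packages also read /etc/mesos-slave/ip and turn it into the agent's --ip flag, so the agent binds to the routable address instead of 127.0.0.1. A sketch of how that could be scripted with Ansible (the group name and the address are illustrative; each node would get its own routable IP):

- hosts: slaves
  become: true
  tasks:
    - name: Bind the Mesos agent to the routable address
      copy:
        dest: /etc/mesos-slave/ip
        content: "192.168.33.31"   # illustrative; use this node's own routable IP

    - name: Advertise the same address as the agent hostname
      copy:
        dest: /etc/mesos-slave/hostname
        content: "192.168.33.31"

    - name: Restart the agent so the new flags take effect
      service:
        name: mesos-slave
        state: restarted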
