Pacemaker + Corosync + DRBD + Apache: failover OK, but it fails if the RJ45 cable is unplugged from the master

I am testing my cluster; this is the status:
iba@iba-master1:/$ sudo pcs status
[sudo] password for iba:
Cluster name: cluster_iba
Cluster Summary:
* Stack: corosync
* Current DC: iba-master1 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Wed Mar 24 16:06:11 2021
* Last change: Wed Mar 24 15:36:14 2021 by root via cibadmin on iba-master1
* 2 nodes configured
* 5 resource instances configured
Node List:
* Online: [ iba-master1 iba-master2 ]
Full List of Resources:
* virtual_ip (ocf::heartbeat:IPaddr2): Started iba-master1
* Clone Set: DrbdData-clone [DrbdData] (promotable):
* Masters: [ iba-master1 ]
* Slaves: [ iba-master2 ]
* DrbdFS (ocf::heartbeat:Filesystem): Started iba-master1
* WebServer (ocf::heartbeat:apache): Started iba-master1
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
If I do a failover, everything works OK: the slave takes control.
But if I unplug the RJ45 connector from the master... it doesn't work properly.
At some point, the status on the new slave says:
Failed Resource Actions:
* virtual_ip_start_0 on iba-master1 'error' (1): call=61, status='complete', exitreason='[findif] failed', last-rc-change='2021-03-24 16:30:49 +01:00', queued=0ms, exec=54ms
and after this, the (new) slave can't take control anymore if the (new) master fails over.
virtual_ip (ocf::heartbeat:IPaddr2) and WebServer (ocf::heartbeat:apache) were stopped.
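For reference, a failed start like the one above is recorded against the node and, by default, keeps the resource away from that node until the failure is cleaned up. Roughly how that is typically inspected and cleared with pcs (standard pcs commands, not taken from the original post):
sudo pcs resource failcount show virtual_ip   # how many times the start has failed, per node
sudo pcs resource cleanup virtual_ip          # clear the failure history so the node is eligible again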
This is my constraint list:
iba@iba-master1:/mnt$ sudo pcs constraint list --full
Location Constraints:
Ordering Constraints:
promote DrbdData-clone then start DrbdFS (kind:Mandatory) (id:order-DrbdData-clone-DrbdFS-mandatory)
start DrbdFS then start virtual_ip (kind:Mandatory) (id:order-DrbdFS-virtual_ip-mandatory)
start virtual_ip then start WebServer (kind:Mandatory) (id:order-virtual_ip-WebServer-mandatory)
Colocation Constraints:
DrbdFS with DrbdData-clone (score:INFINITY) (with-rsc-role:Master) (id:colocation-DrbdFS-DrbdData-clone-INFINITY)
virtual_ip with DrbdFS (score:INFINITY) (id:colocation-virtual_ip-DrbdFS-INFINITY)
WebServer with virtual_ip (score:INFINITY) (id:colocation-WebServer-virtual_ip-INFINITY)
Ticket Constraints:
iba@iba-master1:/mnt$
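For context, here is a sketch of the pcs commands that would produce ordering and colocation constraints like those listed above (reconstructed from the constraint list, not the exact commands from the original setup):
sudo pcs constraint order promote DrbdData-clone then start DrbdFS
sudo pcs constraint order start DrbdFS then start virtual_ip
sudo pcs constraint order start virtual_ip then start WebServer
sudo pcs constraint colocation add DrbdFS with master DrbdData-clone INFINITY
sudo pcs constraint colocation add virtual_ip with DrbdFS INFINITY
sudo pcs constraint colocation add WebServer with virtual_ip INFINITY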
Any idea what I'm doing wrong?
Thanks a lot.

Related

MiNiFi - NiFi Connection Failure: Unknown Host Exception: able to telnet to the host from the machine where MiNiFi is running

I am running MiNiFi on a Linux box (gateway server) which is behind my company's firewall. My NiFi is running on AWS EC2 (in standalone mode).
I am trying to send data from the gateway to NiFi running on AWS EC2.
From the gateway, I am able to telnet to the EC2 node using the public DNS name and the remote port that I have configured in the nifi.properties file:
nifi.properties
# Site to Site properties
nifi.remote.input.host=ec2-xxx.us-east-2.compute.amazonaws.com
nifi.remote.input.secure=false
nifi.remote.input.socket.port=1026
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
nifi.remote.contents.cache.expiration=30 secs
Telnet connection from Gateway to NiFi
iot1@iothdp02:~/minifi/minifi-0.5.0/conf$ telnet ec2-xxx.us-east-2.compute.amazonaws.com 1026
Trying xx.xx.xx.xxx...
Connected to ec2-xxx.us-east-2.compute.amazonaws.com.
Escape character is '^]'.
The public DNS name resolves to the correct public IP of the EC2 node.
From the EC2 node, when I do an nslookup on the public DNS name, it returns the private IP.
From the AWS documentation: "The public IP address is mapped to the primary private IP address through network address translation (NAT)."
Hence, I am not adding the public DNS name and the public IP to the /etc/hosts file on the EC2 node.
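A quick way to double-check what the gateway itself resolves for that hostname (standard tools; these commands are not from the original post):
nslookup ec2-xxx.us-east-2.compute.amazonaws.com      # what the configured DNS server returns
getent hosts ec2-xxx.us-east-2.compute.amazonaws.com  # what the resolver libraries (including /etc/hosts) return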
On the MiNiFi side, I am getting the error below:
minifi-app.log
iot1@iothdp02:~/minifi/minifi-0.5.0/logs$ cat minifi-app.log
2018-11-14 16:00:47,910 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-11-14 16:00:47,911 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 0 milliseconds
2018-11-14 16:01:02,334 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog#67207d8a checkpointed with 0 Records and 0 Swap Files in 20 milliseconds (Stop-the-world time = 6 milliseconds, Clear Edit Logs time = 4 millis), max Transaction ID -1
2018-11-14 16:02:47,911 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-11-14 16:02:47,912 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 0 milliseconds
2018-11-14 16:03:02,354 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog#67207d8a checkpointed with 0 Records and 0 Swap Files in 18 milliseconds (Stop-the-world time = 3 milliseconds, Clear Edit Logs time = 5 millis), max Transaction ID -1
2018-11-14 16:03:10,636 WARN [Timer-Driven Process Thread-8] o.a.n.r.util.SiteToSiteRestApiClient Failed to get controller from http://ec2-xxx.us-east-2.compute.amazonaws.com:9090/nifi-api due to java.net.UnknownHostException: ec2-xxx.us-east-2.compute.amazonaws.com: unknown error
2018-11-14 16:03:10,636 WARN [Timer-Driven Process Thread-8] o.apache.nifi.controller.FlowController Unable to communicate with remote instance RemoteProcessGroup[http://ec2-xxx.us-east-2.compute.amazonaws.com:9090/nifi] due to org.apache.nifi.controller.exception.CommunicationsException: org.apache.nifi.controller.exception.CommunicationsException: Unable to communicate with Remote NiFi at URI http://ec2-xxx.us-east-2.compute.amazonaws.com:9090/nifi due to: ec2-xxx.us-east-2.compute.amazonaws.com: unknown error
2018-11-14 16:04:47,912 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-11-14 16:04:47,912 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 0 milliseconds
2018-11-14 16:05:02,380 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog#67207d8a checkpointed with 0 Records and 0 Swap Files in 25 milliseconds (Stop-the-world time = 8 milliseconds, Clear Edit Logs time = 6 millis), max Transaction ID -1
2018-11-14 16:06:47,912 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-11-14 16:06:47,912 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 0 milliseconds
2018-11-14 16:07:02,399 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog#67207d8a checkpointed with
MiNiFi config.yml
MiNiFi Config Version: 3
Flow Controller:
name: Gateway-IDS_v0.1
comment: "1. ConsumeMQTT - MiNiFi will consume mqtt messages in gateway\n2. Remote\
\ Process Group will send messages to NiFi "
Core Properties:
flow controller graceful shutdown period: 10 sec
flow service write delay interval: 500 ms
administrative yield duration: 30 sec
bored yield duration: 10 millis
max concurrent threads: 1
variable registry properties: ''
FlowFile Repository:
partitions: 256
checkpoint interval: 2 mins
always sync: false
Swap:
threshold: 20000
in period: 5 sec
in threads: 1
out period: 5 sec
out threads: 4
Content Repository:
content claim max appendable size: 10 MB
content claim max flow files: 100
always sync: false
Provenance Repository:
provenance rollover time: 1 min
implementation: org.apache.nifi.provenance.MiNiFiPersistentProvenanceRepository
Component Status Repository:
buffer size: 1440
snapshot frequency: 1 min
Security Properties:
keystore: ''
keystore type: ''
keystore password: ''
key password: ''
truststore: ''
truststore type: ''
truststore password: ''
ssl protocol: ''
Sensitive Props:
key:
algorithm: PBEWITHMD5AND256BITAES-CBC-OPENSSL
provider: BC
Processors:
- id: 6396f40f-118f-33f4-0000-000000000000
name: ConsumeMQTT
class: org.apache.nifi.processors.mqtt.ConsumeMQTT
max concurrent tasks: 1
scheduling strategy: TIMER_DRIVEN
scheduling period: 0 sec
penalization period: 30 sec
yield period: 1 sec
run duration nanos: 0
auto-terminated relationships list: []
Properties:
Broker URI: tcp://localhost:1883
Client ID: nifi
Connection Timeout (seconds): '30'
Keep Alive Interval (seconds): '60'
Last Will Message:
Last Will QoS Level:
Last Will Retain:
Last Will Topic:
MQTT Specification Version: '0'
Max Queue Size: '10'
Password:
Quality of Service(QoS): '0'
SSL Context Service:
Session state: 'true'
Topic Filter: MQTT
Username:
Controller Services: []
Process Groups: []
Input Ports: []
Output Ports: []
Funnels: []
Connections:
- id: f0007aa3-cf32-3593-0000-000000000000
name: ConsumeMQTT/Message/85ebf198-0166-1000-5592-476a7ba47d2e
source id: 6396f40f-118f-33f4-0000-000000000000
source relationship names:
- Message
destination id: 85ebf198-0166-1000-5592-476a7ba47d2e
max work queue size: 10000
max work queue data size: 1 GB
flowfile expiration: 0 sec
queue prioritizer class: ''
Remote Process Groups:
- id: c00d3132-375b-323f-0000-000000000000
name: ''
url: http://ec2-xxx.us-east-2.compute.amazonaws.com:9090
comment: ''
timeout: 30 sec
yield period: 10 sec
transport protocol: RAW
proxy host: ''
proxy port: ''
proxy user: ''
proxy password: ''
local network interface: ''
Input Ports:
- id: 85ebf198-0166-1000-5592-476a7ba47d2e
name: From MiNiFi
comment: ''
max concurrent tasks: 1
use compression: false
Properties:
Port: 1026
Host Name: ec2-xxx.us-east-2.compute.amazonaws.com
Output Ports: []
NiFi Properties Overrides: {}
Any pointers on how to troubleshoot this issue?
In MiNiFi config.yml, I changed the URL under Remote Process Groups from http://ec2-xxx.us-east-2.compute.amazonaws.com:9090 to http://ec2-xxx.us-east-2.compute.amazonaws.com:9090/nifi.
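For anyone troubleshooting the same thing: the Remote Process Group fetches its peer details over the REST API shown in the warning above, so a quick reachability check from the gateway could look something like this (a sketch using standard curl; /nifi-api/site-to-site is the usual site-to-site details endpoint, and is not something from the original post):
curl -v http://ec2-xxx.us-east-2.compute.amazonaws.com:9090/nifi-api/site-to-site
# A JSON response here means DNS resolution and the HTTP site-to-site endpoint both work from the gateway.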

Kafka stream app failing to fetch offsets for partition

I created a Kafka cluster with 3 brokers, with the following details:
Created 3 topics, each one with replication factor=3 and partitions=2.
Created 2 producers each one writing to one of the topics.
Created a Streams application to process messages from 2 topics and write to the 3rd topic.
It was all running fine till now but I suddenly started getting the following warning when starting the Streams application:
[WARN ] 2018-06-08 21:16:49.188 [Stream3-4f7403ad-aba6-4d34-885d-60114fc9fcff-StreamThread-1] org.apache.kafka.clients.consumer.internals.Fetcher [Consumer clientId=Stream3-4f7403ad-aba6-4d34-885d-60114fc9fcff-StreamThread-1-restore-consumer, groupId=] Attempt to fetch offsets for partition Stream3-KSTREAM-OUTEROTHER-0000000005-store-changelog-0 failed due to: Disk error when trying to access log file on the disk.
Due to this warning, the Streams application is not processing anything from the 2 topics.
I tried the following things:
Stopped all brokers, deleted the kafka-logs directory of each broker, and restarted the brokers. It didn't solve the issue.
Stopped ZooKeeper and all brokers, deleted the ZooKeeper logs as well as the kafka-logs of each broker, restarted ZooKeeper and the brokers, and created the topics again. This too didn't solve the issue.
I am not able to find anything related to this error in the official docs or on the web. Does anyone have an idea of why I am suddenly getting this error?
EDIT:
Out of the 3 brokers, 2 brokers (broker-0 and broker-2) continuously emit these logs:
Broker-0 logs:
[2018-06-09 02:03:08,750] INFO [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial11_topic-1 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
[2018-06-09 02:03:08,750] INFO [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial12_topic-0 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
Broker-2 logs:
[2018-06-09 02:04:46,889] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial11_topic-1 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
[2018-06-09 02:04:46,889] INFO [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Retrying leaderEpoch request for partition initial12_topic-0 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
Broker-1 shows following logs:
[2018-06-09 01:21:26,689] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-06-09 01:31:26,689] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-06-09 01:39:44,667] ERROR [KafkaApi-1] Number of alive brokers '0' does not meet the required replication factor '1' for the offsets topic (configured via 'offsets.topic.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
[2018-06-09 01:41:26,689] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
I again stopped ZooKeeper and the brokers, deleted their logs, and restarted. As soon as I create the topics again, I start getting the above logs.
Topic details:
[zk: localhost:2181(CONNECTED) 3] get /brokers/topics/initial11_topic
{"version":1,"partitions":{"1":[1,0,2],"0":[0,2,1]}}
cZxid = 0x53
ctime = Sat Jun 09 01:25:42 EDT 2018
mZxid = 0x53
mtime = Sat Jun 09 01:25:42 EDT 2018
pZxid = 0x54
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 52
numChildren = 1
[zk: localhost:2181(CONNECTED) 4] get /brokers/topics/initial12_topic
{"version":1,"partitions":{"1":[2,1,0],"0":[1,0,2]}}
cZxid = 0x61
ctime = Sat Jun 09 01:25:47 EDT 2018
mZxid = 0x61
mtime = Sat Jun 09 01:25:47 EDT 2018
pZxid = 0x62
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 52
numChildren = 1
[zk: localhost:2181(CONNECTED) 5] get /brokers/topics/final11_topic
{"version":1,"partitions":{"1":[0,1,2],"0":[2,0,1]}}
cZxid = 0x48
ctime = Sat Jun 09 01:25:32 EDT 2018
mZxid = 0x48
mtime = Sat Jun 09 01:25:32 EDT 2018
pZxid = 0x4a
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 52
numChildren = 1
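For completeness, the same partition assignments and the current leader of each partition could also be checked with the standard topics tool (a sketch; the --zookeeper address matches the ZooKeeper CLI session above):
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic initial11_topic
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic initial12_topic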
Any clue?
I found out the issue. It was due to the following incorrect config in server.properties of broker-1:
advertised.listeners=PLAINTEXT://10.23.152.109:9094
The port in advertised.listeners had mistakenly been changed to the same port as broker-2's advertised.listeners.
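A quick way to spot this kind of clash is to compare the advertised.listeners line across all brokers' server.properties files (a sketch; the file names are illustrative, only the 9094 value comes from the post):
grep ^advertised.listeners config/server-0.properties config/server-1.properties config/server-2.properties
# Each broker must advertise a host:port combination that is unique and actually its own;
# here broker-1 was advertising 9094, which belonged to broker-2.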

Apache Storm supervisor routinely shutting down workers

I built a topology in Apache Storm (0.9.6) with kafka-storm and ZooKeeper (3.4.6)
(3 ZooKeeper nodes and 3 Supervisor nodes, running 3 topologies).
I added 2 more Storm & ZooKeeper nodes and changed the topology.workers configuration from 3 to 5.
But after adding the 2 nodes, the Storm supervisor routinely shuts down workers. Checked with the iostat command, read and write throughput is under 1 MB.
The supervisor log shows entries like the ones below.
2016-10-19T15:07:38.904+0900 b.s.d.supervisor [INFO] Shutting down and clearing state for id ee13ada9-641e-463a-9be5-f3ed66fdb8f3. Current supervisor time: 1476857258. State: :timed-out, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1476857226, :storm-id "top3-17-1476839721", :executors #{[36 36] [6 6] [11 11] [16 16] [21 21] [26 26] [31 31] [-1 -1] [1 1]}, :port 6701}
2016-10-19T15:07:38.905+0900 b.s.d.supervisor [INFO] Shutting down b278933f-f9c7-4189-b615-1d70c7988f17:ee13ada9-641e-463a-9be5-f3ed66fdb8f3
2016-10-19T15:07:38.907+0900 b.s.util [INFO] Error when trying to kill 9306. Process is probably already dead.
2016-10-19T15:07:44.948+0900 b.s.d.supervisor [INFO] Shutting down and clearing state for id d6df820a-7c29-4bff-a606-9e8e36fafab2. Current supervisor time: 1476857264. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1476857264, :storm-id "top3-17-1476839721", :executors #{[-1 -1]}, :port 6701}
2016-10-19T15:07:44.949+0900 b.s.d.supervisor [INFO] Shutting down b278933f-f9c7-4189-b615-1d70c7988f17:d6df820a-7c29-4bff-a606-9e8e36fafab2
2016-10-19T15:07:45.954+0900 b.s.util [INFO] Error when trying to kill 11171. Process is probably already dead.
2016-10-19T15:07:45.954+0900 b.s.d.supervisor [INFO] Shut down b278933f-f9c7-4189-b615-1d70c7988f17:d6df820a-7c29-4bff-a606-9e8e36fafab2
And the zookeeper.out log shows entries like the ones below (the xxx IP address is another Storm/ZooKeeper node's address):
2016-09-20 02:31:06,031 [myid:5] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /xxx.xxx.xxx.xxx:39426 which had sessionid 0x5574372bbf00004
2016-09-20 02:31:08,116 [myid:5] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x5574372bbf0000a, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
I don't know why the workers go down routinely. How can I fix it? Is something wrong with my setup?
My ZooKeeper and Storm configuration is below.
zoo.cfg (same on all nodes)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/log/zkdata/1
clientPort=2181
server.1=storm01:2888:3888
server.2=storm02:2888:3888
server.3=storm03:2888:3888
server.4=storm04:2888:3888
server.5=storm05:2888:3888
autopurge.purgeInterval=1
storm.yaml
storm.zookeeper.servers:
- "storm01"
- "storm02"
- "storm03"
- "storm04"
- "storm05"
storm.zookeeper.port: 2181
zookeeper.multiple.setup:
follower.port:2888
election.port:3888
nimbus.host: "storm01"
storm.supervisor.hosts:
- "storm01"
- "storm02"
- "storm03"
- "storm04"
- "storm05"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
- 6704
storm.local.dir: /log/storm-data
worker.childopts: "-Xmx5120m -Djava.net.preferIPv4Stack=true"
topology.workers: 5
storm.log.dir: /log/storm-log
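Given the :timed-out and :disallowed worker states above and the EndOfStreamException in zookeeper.out, a quick sanity check of the ZooKeeper ensemble from one of the Storm nodes might look like this (standard ZooKeeper four-letter commands; not from the original post):
echo ruok | nc storm01 2181   # should answer "imok"
echo stat | nc storm01 2181   # mode (leader/follower), connection count, request latency
echo srvr | nc storm02 2181   # same summary for another ensemble member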

Elasticsearch EC2 - BindTransportException: Failed to bind to [9300]

I am trying to connect to Elasticsearch on EC2 using the node client from my local machine, but I am getting the error below.
org.elasticsearch.transport.BindTransportException: Failed to bind to [9300]
I'm not sure what other properties I should use. Can anyone please help me with this issue? Elasticsearch client version used: 2.1.0.
node client:
Node node = nodeBuilder()
.settings(ImmutableSettings.settingsBuilder().put("http.enabled",true).put("client.transport.sniff",false)
.put("transport.tcp.port",9300))
.local(false)
.node();
client = node.client();
elasticsearch.yml
cluster:
  name: XX
network:
  host: XX
discovery:
  type: ec2
  ec2:
    groups: XX
    availability_zones: XX
cloud:
  aws:
    access_key: XX
    secret_key: XX
    region: XX
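One thing worth checking for a "Failed to bind to [9300]" error is whether something on the local machine already holds that port, since a node client tries to bind its own transport port locally (a sketch with standard tools; these commands are not from the original post):
netstat -an | grep 9300   # is anything already listening on 9300?
ss -ltnp | grep 9300      # on Linux, also shows which process owns the socket (run as root for -p details)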

RabbitMQ on Windows Server 2012 R2 crashes on start

I'm trying to run RabbitMQ on a Windows Server 2012 R2 machine, but it keeps crashing. Here's what I see when running "rabbitmqctl status":
C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.5.1\sbin>rabbitmqctl status
Status of node 'rabbit@MYMACHINENAME' ...
Error: unable to connect to node 'rabbit@MYMACHINENAME': nodedown
DIAGNOSTICS
===========
attempted to contact: ['rabbit@MYMACHINENAME']
rabbit@MYMACHINENAME:
* unable to connect to epmd (port 4369) on MYMACHINENAME: address (cannot connect to host/port)
current node details:
- node name: 'rabbitmq-cli-6880@MYMACHINENAME'
- home dir: C:\Users\myname
- cookie hash: L9D52tPzwbCgggPY6qPS3g==
C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.5.1\sbin>
Here's what I see in the logs (after a long series of apparently successful "PROGRESS REPORT" entries):
=CRASH REPORT==== 22-Apr-2015::11:57:24 ===
crasher:
initial call: rabbit_epmd_monitor:init/1
pid: <0.165.0>
registered_name: []
exception exit: {{badmatch,noport},
[{rabbit_epmd_monitor,init,1,[]},
{gen_server,init_it,6,
[{file,"gen_server.erl"},{line,306}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,237}]}]}
in function gen_server:init_it/6 (gen_server.erl, line 330)
ancestors: [rabbit_epmd_monitor_sup,rabbit_sup,<0.140.0>]
messages: []
links: [<0.164.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 376
stack_size: 27
reductions: 463
neighbours:
=SUPERVISOR REPORT==== 22-Apr-2015::11:57:24 ===
Supervisor: {local,rabbit_epmd_monitor_sup}
Context: start_error
Reason: {{badmatch,noport},
[{rabbit_epmd_monitor,init,1,[]},
{gen_server,init_it,6,[{file,"gen_server.erl"},{line,306}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,237}]}]}
Offender: [{pid,undefined},
{name,rabbit_epmd_monitor},
{mfargs,{rabbit_epmd_monitor,start_link,[]}},
{restart_type,transient},
{shutdown,4294967295},
{child_type,worker}]
=CRASH REPORT==== 22-Apr-2015::11:57:24 ===
crasher:
initial call: application_master:init/4
pid: <0.139.0>
registered_name: []
exception exit: {bad_return,
{{rabbit,start,[normal,[]]},
{'EXIT',
{error,
{{shutdown,
{failed_to_start_child,rabbit_epmd_monitor,
{{badmatch,noport},
[{rabbit_epmd_monitor,init,1,[]},
{gen_server,init_it,6,
[{file,"gen_server.erl"},{line,306}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,237}]}]}}},
{child,undefined,rabbit_epmd_monitor_sup,
{rabbit_restartable_sup,start_link,
[rabbit_epmd_monitor_sup,
{rabbit_epmd_monitor,start_link,[]},
false]},
transient,infinity,supervisor,
[rabbit_restartable_sup]}}}}}}
in function application_master:init/4 (application_master.erl, line 133)
ancestors: [<0.138.0>]
messages: [{'EXIT',<0.140.0>,normal}]
links: [<0.138.0>,<0.7.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 1598
stack_size: 27
reductions: 212
neighbours:
I've enabled port 4369 in Windows Firewall, and this port appears to be "listening":
netstat -an
Proto Local Address Foreign Address State PID
TCP 0.0.0.0:4369 0.0.0.0:0 LISTENING 6496
Process 6496:
C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.5.1\sbin>tasklist /svc /FI "PID eq 6496"
Image Name PID Services
========================= ======== ============================================
epmd.exe 6496 N/A
Any ideas on other avenues/config/troubleshooting I can pursue?
This ultimately seemed to be due to my being connected to a VPN. When not on the VPN, I am able to view the management console, view status at the command line, etc.
The error {badmatch,noport} suggests that perhaps RabbitMQ couldn't access EPMD's port.
If you still want to use the VPN with RabbitMQ, make sure to open the ports RabbitMQ requires.
See "What ports does RabbitMQ use?" and read about inet_dist_listen_{min,max} here: http://www.rabbitmq.com/configure.html
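As a quick check that epmd is reachable and knows about the broker node, something like this can be run on the server (standard Erlang and Windows tools; the 25672 default is an assumption based on RabbitMQ 3.x conventions, not taken from the original post):
:: Lists the node names registered with epmd (requires Erlang's bin directory on PATH)
epmd -names
:: Confirms epmd (4369) and the Erlang distribution port (25672 by default on RabbitMQ 3.x) are listening
netstat -an | findstr "4369 25672"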
