Consul agent does not respond

I have a problem with the registration of a service by a Consul agent. The agent is listed as alive in the cluster members information, but it does not register a service or respond to queries via the HTTP interface.
There is an error in the log, but I cannot interpret it:
2015/06/16 16:09:42 [INFO] agent: Joining cluster...
2015/06/16 16:09:42 [INFO] agent: (LAN) joining: [10.10.100.226]
2015/06/16 16:09:42 [INFO] agent: (LAN) joined: 1 Err: <nil>
2015/06/16 16:09:42 [INFO] agent: Join completed. Synced with 1 initial agents
Here is the configuration of the consul agent that runs on this server:
{"data_dir":"/opt/consul","datacenter":"dc","log_level":"INFO","node_name":"app01","retry_join":["10.10.100.226"]}
And the configuration of the server. The cluster has 3 server agents.
{"client_addr":"0.0.0.0","data_dir":"/opt/consul","datacenter":"ovh-rbx","log_level":"INFO","node_name":"consul-server","server":true,"ui_dir":"/opt/consul/ui"}

I received an answer on the Consul mailing list, so I will post it here just in case someone else stumbles into the same problem:
"The log message that you pasted with the error, which says "Err: ", is actually fine. We always dump any error on that step, even if there was none. The message following that one, "Join completed", confirms that the join was successful, so this shouldn't be anything to worry about.
I noticed that you don't have any bootstrap options set in the server configuration. Bootstrapping a Consul cluster is a required step. Did you pass any bootstrap options on the command line during start? You can read about bootstrapping here: https://consul.io/docs/guides/bootstrapping.html, but basically if you don't already, you should just add "bootstrap_expect": 3 to your configuration on the server nodes."
Setting the bootstrap option to the number of servers, deleting the data directory, and restarting the cluster solved the problem.
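For reference, a minimal sketch of the corrected server configuration, with "bootstrap_expect" added and everything else carried over from the server config above:
{
  "client_addr": "0.0.0.0",
  "data_dir": "/opt/consul",
  "datacenter": "ovh-rbx",
  "log_level": "INFO",
  "node_name": "consul-server",
  "server": true,
  "bootstrap_expect": 3,
  "ui_dir": "/opt/consul/ui"
}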

Related

ejabberdctl start succeeds, but status and stop fail to connect to the node

I was following this guide to set up ejabberd on a cluster: http://chadillac.github.io/2012/11/17/easy-ejabberd-clustering-guide-mnesia-mysql/
I am using two AWS instances with the IPs:
Master -> 111.222.333.444
Slave -> 222.333.444.555
But since I do not have DNS configured, I am using IP addresses like 111.222.333.444 instead of 'master.domain.com'.
I haven't been successful at setting up the cluster yet, but before that I am having a problem on my master node.
I start the server with
/tmp/ej1809/sbin/ejabberdctl start
Then I get no output, but I see in the logs that the server started.
Then I check the status using
/tmp/ej1809/sbin/ejabberdctl status
But I get this error:
Failed RPC connection to the node 'ejabberd@111.222.333.444': nodedown
And even when I try to stop the node using /tmp/ej1809/sbin/ejabberdctl stop, I get
Failed RPC connection to the node 'ejabberd@111.222.333.444': nodedown
But I cannot understand the reason behind it.
Can anyone help me solve it please?
Stop and kill processes like epmd, erl, beam.
Then start ejabberd with "ejabberdctl live"; that will keep the Erlang shell open so you can watch the log messages in real time, including the Erlang node name:
...
13:21:22.662 [info] ejabberd 19.02.52 is started in the node ejabberd@localhost in 7.07s
13:21:22.667 [info] Start accepting TCP connections at 0.0.0.0:5444 for ejabberd_http
13:21:22.667 [info] Application ejabberd started on node ejabberd@localhost
You can check if "epmd" knows about that node:
$ epmd -names
epmd: up and running on port 4369 with data:
name ejabberd at port 33519
Then let's see if ejabberdctl can connect with that node:
$ ejabberdctl help | grep "node name:"
--node nodename ejabberd node name: ejabberd@localhost
And finally:
$ ejabberdctl status
The node ejabberd@localhost is started with status: started
ejabberd 19.02.52 is running in that node
I assume you didn't edit anything yet in ejabberdctl.cfg, specifically ERLANG_NODE. But if you did, I recommend reinstalling ejabberd to ensure you have the default configuration, and then retrying those steps. Once ejabberd works perfectly, you can start modifying the configuration files (ejabberd.yml and ejabberdctl.cfg) to suit your real requirements (clustering, etc.).
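If you do reach the point of changing ERLANG_NODE to use raw IP addresses, a minimal sketch of the relevant ejabberdctl.cfg line, assuming the master IP from the question (a host part containing dots makes ejabberdctl use a long Erlang node name):
# ejabberdctl.cfg (sketch; the IP is the example one from the question)
ERLANG_NODE=ejabberd@111.222.333.444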
At some point, if you have problems setting up clustering, you may find some ideas for debugging the problem in
https://ejabberd.im/interconnect-erl-nodes/index.html

Cannot produce events to Confluent Kafka deployed on AWS EC2 from local machine

I'm trying to connect from an external client (my laptop) to a broker in a Kafka cluster that I have running on EC2 machines. When I try to connect from my local machine, I get the following error:
$ ./kafka-console-producer --broker-list AWS.PRIV.ATE.IP:9092 --topic test
>hi
>[2018-09-20 13:28:53,952] ERROR Error when sending message to topic test with key: null, value: 2 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test-0: 1519 ms has passed since batch creation plus linger time
The topic exists, because if I run this from my local machine:
$ ./kafka-topics --list --zookeeper AWS.PRIV.ATE.IP:2181
__confluent.support.metrics
__consumer_offsets
_schemas
connect-configs
connect-offsets
connect-status
test
The cluster configuration is from Confluent's AWS quickstart template: https://github.com/aws-quickstart/quickstart-confluent-kafka/blob/master/templates/confluent-kafka.template and I'm running the open source version.
The three broker EC2 instances are visible to my local machine, which I verified by stopping the Kafka broker, starting a simple HTTP server on port 9092, and successfully curling that server using the internal IP address of the EC2 instance.
If I ssh into one of the broker instances, I can successfully produce and consume messages across the cluster. The only update I've made to the out-of-the-box configuration provided by the template is changing listeners=PLAINTEXT://ec2-AWS-PUB-LIC-IP.compute-1.amazonaws.com:9092 in server.properties on each machine and then restarting the Kafka server.
I can provide more configuration or debugging info if necessary. I believe the issue is something regarding IP address discoverability/visibility, but I'm not entirely sure what.
You need to set advertised.listeners too.
See https://rmoff.net/2018/08/02/kafka-listeners-explained/ for details.
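In short, Kafka binds to the addresses in listeners but hands advertised.listeners back to clients in its metadata, so an external producer needs the advertised address to be reachable from outside. A minimal sketch of the relevant server.properties entries on each broker, keeping the placeholder hostname from the question:
# server.properties (sketch)
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://ec2-AWS-PUB-LIC-IP.compute-1.amazonaws.com:9092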

Setting up a three tier environment in puppet

These are my files:
Nodes.pp file
site.pp file
I need to set up the infrastructure in the diagram, and I would like to use Puppet automation to do so. I would need to:
Create 4 VMs: one for the DB, 1 web server, 1 load balancer, 1 master
Set them up with Puppet Agent
Find the appropriate modules/cookbooks from the community site (Puppet Forge / Chef Supermarket)
Configure the nodes using recipes/classes fetched from the community sites
Provide configuration parameters in order to have all these nodes connect to each other
The end goal is to have a working WordPress setup.
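For the node-classification part, a minimal site.pp sketch, assuming the puppetlabs-mysql and puppetlabs-apache modules from the Forge; the node names and password are hypothetical placeholders:
# site.pp (sketch; classes assume puppetlabs-mysql and puppetlabs-apache)
node 'db.example.com' {
  class { 'mysql::server':
    root_password => 'changeme',  # placeholder credential
  }
}
node 'web.example.com' {
  include apache            # Apache httpd for the WordPress frontend
  include apache::mod::php  # PHP support for WordPress
}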
I got stuck with the master/agent configuration process. I have a Puppet master and 3 agents up and running, but whenever I run puppet agent --test on an agent, it throws an error. I look forward to the community's help.
The error I am getting is...
[root@agent1 vagrant]# puppet agent --noop --test
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
First, take a look at the Puppet master logs.
Second, the error message is too short: something is missing after
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could
The text after the "Could" can be helpful ;)

Impala The Cloudera Manager Agent got an unexpected response from this role's web server

I have done a Hadoop cluster installation with Cloudera Manager. After this installation, the Impala status has become bad.
I have the following error for the master node:
Web Server Status
and this one for nodes with the Impala daemon:
Impala Daemon Ready Check, Web Server Status
Looking into the logs, I found some errors:
The health test result for IMPALAD_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent got an unexpected response from this role's web server.
Looking into cloudera-scm-agent.log, there are these errors:
1261 Monitor-HostMonitor throttling_logger ERROR (29 skipped) Failed to collect NTP metrics
I tried to install NTP (sudo apt-get install ntp), but after this installation HDFS, Hive, YARN and other services go bad; removing it, only Impala goes bad.
MainThread agent ERROR Failed to connect to previous supervisor.
Another error is this:
Monitor-GenericMonitor throttling_logger ERROR Error fetching metrics at 'http://nodo-1:50075/jmx'
I checked all the hostnames and they seem correct...
So, what is this problem? How can I solve it?
I also had a problem with NTP. The problem still existed after installing NTP, but the error was fixed once I ran sudo service ntp restart.
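A sketch of that fix plus a quick sanity check that the daemon is actually syncing (commands assume the classic ntp package installed above):
# Restart the NTP daemon so it re-reads its config and resyncs
sudo service ntp restart
# List peers; a '*' marks the currently selected time source
ntpq -p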

The Node Agent is stopped

I'm trying to start my Node on a command prompt like this:
C:\IBM\WebSphere\AppServer\profiles\AppSrv01\bin>startnode
ADMU0128I: Starting tool with the AppSrv01 profile
ADMU3100I: Reading configuration for server: nodeagent
ADMU3200I: Server launched. Waiting for initialization status.
ADMU3011E: Server launched but failed initialization. Server logs, startServer.log, and other log files under C:\IBM\WebSphere\AppServer\profiles\AppSrv01\logs\nodeagent should contain failure information
What should I do? I have searched for how to start the node agent, but the suggestions are all the same, and when I execute those commands they fail with the same error (noted at the top).
Here are the contents of startServer.log:
[8/15/13 13:42:21:240 CST] 00000040 NodeSync E ADMS0005E: The system is unable to generate synchronization request: javax.management.JMRuntimeException: ADMN0022E: Access is denied for the getFolderSyncUpdates operation on CellSync MBean because of insufficient or empty credentials.
Find your soap.client.props file in your profile and add your deployment's ID and password, then see if that works.
Also, try stopping your node and restarting it.
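A minimal sketch of the relevant soap.client.props entries (the user ID and password are placeholders; the file lives under the profile's properties directory, e.g. C:\IBM\WebSphere\AppServer\profiles\AppSrv01\properties):
# soap.client.props (sketch; credentials are placeholders)
com.ibm.SOAP.securityEnabled=true
com.ibm.SOAP.loginUserid=wasadmin
com.ibm.SOAP.loginPassword=yourAdminPassword
If you don't want the password stored in clear text, WebSphere's PropFilePasswordEncoder utility can encode it in place afterwards.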
