How should I handle Postgres-XL GTM failover? - postgres-xl

I have a cluster laid out like this: node1 (gtm), node2 (gtm-slave), node3 (gtm-proxy1, coord1, datanode1). I set up those nodes using pgxc_ctl.
I am testing GTM failover. Here is what I did:
Copy pgxc_ctl.conf from node1 to node2.
Shut down node1.
ssh into node2, run pgxc_ctl, then type failover gtm.
ssh into node3 and run gtm_ctl reconnect -Z gtm_proxy -D proxydir -o "-s node2 -t 20001".
Try psql, which gives the error "FATAL: Could not obtain a transaction ID from GTM. The GTM might have failed or lost connectivity".
So I typed "stop all" and "init all", which appears to have fixed the issue.
But I don't think I should need to restart the cluster. Does anyone have a suggestion?

OK, I found out why. After adding the first datanode, you need to run stop all and init all. If you don't, postgres connects to the GTM server directly instead of to the gtm-proxy.
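One way to confirm where a coordinator or datanode is pointed is to read the GTM settings out of its postgresql.conf. The sketch below is only an illustration: it assumes the node's config uses the gtm_host and gtm_port settings, that the last uncommented occurrence wins, and the function name gtm_target and the sample values are made up for the example.

```python
import re

# Matches uncommented lines such as:  gtm_host = 'node3'   or   gtm_port = 20001
_SETTING = re.compile(r"^\s*(gtm_host|gtm_port)\s*=?\s*'?([^'#\s]+)'?")


def gtm_target(conf_text):
    """Return (host, port) for the GTM endpoint named in a postgresql.conf.

    Later lines override earlier ones; missing settings come back as None.
    """
    found = {}
    for line in conf_text.splitlines():
        match = _SETTING.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    port = found.get("gtm_port")
    return found.get("gtm_host"), int(port) if port is not None else None


if __name__ == "__main__":
    # Hypothetical config contents, not taken from a real cluster.
    sample = (
        "gtm_host = 'node1'\n"
        "gtm_port = 20001\n"
        "#gtm_host = 'ignored'\n"
        "gtm_host = 'localhost'  # should be the gtm-proxy\n"
    )
    print(gtm_target(sample))
```

If the host printed for a datanode is the GTM itself rather than the gtm-proxy, that would match the behaviour described above.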

Related

ejabberdctl start succeeds, but status and stop fail to connect to node

I was following this guide to set up an ejabberd cluster: http://chadillac.github.io/2012/11/17/easy-ejabberd-clustering-guide-mnesia-mysql/
I am using two AWS instances with these IPs:
Master -> 111.222.333.444
Slave -> 222.333.444.555
Since I do not have DNS configured, I am using IP addresses like 111.222.333.444 instead of 'master.domain.com'.
I haven't been successful at setting up the cluster yet, but before that I am having a problem on my master node.
I start the server with
/tmp/ej1809/sbin/ejabberdctl start
I get no output, but I can see in the logs that the server started.
then I check the status using
/tmp/ej1809/sbin/ejabberdctl status
But I get this error:
Failed RPC connection to the node 'ejabberd@111.222.333.444': nodedown
Even when I try to stop the node using /tmp/ej1809/sbin/ejabberdctl stop, I get the same error:
Failed RPC connection to the node 'ejabberd@111.222.333.444': nodedown
I cannot understand the reason behind it.
Can anyone help me solve it, please?
Stop and kill processes like epmd, erl, beam.
Then start ejabberd with "ejabberdctl live", that will keep the erlang shell open for you to see the log messages in realtime, including the erlang node name:
...
13:21:22.662 [info] ejabberd 19.02.52 is started in the node ejabberd@localhost in 7.07s
13:21:22.667 [info] Start accepting TCP connections at 0.0.0.0:5444 for ejabberd_http
13:21:22.667 [info] Application ejabberd started on node ejabberd@localhost
You can check if "epmd" knows about that node:
$ epmd -names
epmd: up and running on port 4369 with data:
name ejabberd at port 33519
Then let's see if ejabberdctl can connect with that node:
$ ejabberdctl help | grep "node name:"
--node nodename ejabberd node name: ejabberd@localhost
And finally:
$ ejabberdctl status
The node ejabberd@localhost is started with status: started
ejabberd 19.02.52 is running in that node
I assume you haven't yet edited anything in ejabberdctl.cfg, specifically ERLANG_NODE. If you have, I recommend reinstalling ejabberd to make sure you have the default configuration, and then retrying those steps. Once ejabberd works perfectly, you can start modifying the configuration files (ejabberd.yml and ejabberdctl.cfg) to suit your real requirements (clustering, etc).
Later, if you have problems setting up clustering, you may find some ideas for debugging the problem in
https://ejabberd.im/interconnect-erl-nodes/index.html
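If you want to script the "epmd -names" check from the steps above, a minimal sketch could look like the following. It assumes the output format shown in this answer ("name ejabberd at port 33519") and that epmd is on the PATH when registered_nodes is called; both function names are made up for illustration.

```python
import re
import subprocess

# One registered node per line, e.g. "name ejabberd at port 33519"
_NAME_LINE = re.compile(r"^name (\S+) at port (\d+)$")


def parse_epmd_names(output):
    """Map each node name reported by `epmd -names` to its port."""
    nodes = {}
    for line in output.splitlines():
        match = _NAME_LINE.match(line.strip())
        if match:
            nodes[match.group(1)] = int(match.group(2))
    return nodes


def registered_nodes():
    """Run `epmd -names` and parse it (raises FileNotFoundError if epmd is missing)."""
    result = subprocess.run(
        ["epmd", "-names"], capture_output=True, text=True, check=False
    )
    return parse_epmd_names(result.stdout)


if __name__ == "__main__":
    # Sample text copied from the answer above, so this runs without epmd.
    sample = (
        "epmd: up and running on port 4369 with data:\n"
        "name ejabberd at port 33519\n"
    )
    print(parse_epmd_names(sample))  # {'ejabberd': 33519}
```

Note that epmd only lists the part of the node name before the @, so it tells you the node is registered but not which host part ejabberdctl is trying to reach.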

heroku and rabbitmq - unable to run multiple worker dynos

I am using the CloudAMQP add-on for Heroku. As RabbitMQ needs a unique node name for each of its processes, I run into a warning when I scale my worker dynos from 1 to 2 or more:
/app/.heroku/python/lib/python3.6/site-packages/kombu/pidbox.py:71: UserWarning: A node named coworker@fstrk.io is already using this process mailbox!
Maybe you forgot to shutdown the other node or did not do so properly?
Or if you meant to start multiple nodes on the same host please make sure
you give each node a unique node name!
My Procfile line looks like this:
coworker: celery -l info -A getmybot worker -Q slack -c ${COWORKER_PROCESSES:-4} --hostname coworker@fstrk.io --without-gossip --without-mingle --without-heartbeat
How do I go about it?
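Try changing --hostname coworker@fstrk.io to --hostname coworker@%h. Celery expands %h to the hostname of the machine the worker runs on, so workers on different hosts get different node names. (The doubled form %%h is only needed where the percent sign itself has to be escaped, for example in a supervisor config file.)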
More details in official docs:
http://docs.celeryproject.org/en/latest/reference/celery.bin.worker.html
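To see why a hostname placeholder avoids the duplicate-name warning, here is a small stand-alone sketch of the expansion. It is not Celery's own code: expand_nodename is a made-up helper that mimics the documented %h (full hostname), %n (name only) and %d (domain only) placeholders, and the dyno hostnames are invented.

```python
import re
import socket


def expand_nodename(template, hostname=None):
    """Expand %h, %n and %d in a worker node-name template.

    hostname defaults to this machine's hostname.
    """
    host = hostname or socket.gethostname()
    short, _, domain = host.partition(".")
    values = {"h": host, "n": short, "d": domain}
    return re.sub(r"%([hnd])", lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    # Two hypothetical dynos: a fixed name collides, a templated name does not.
    for dyno_host in ("dyno-a.example.internal", "dyno-b.example.internal"):
        fixed = expand_nodename("coworker@fstrk.io", dyno_host)
        templated = expand_nodename("coworker@%h", dyno_host)
        print(fixed, templated)
```

With the fixed name both workers register as coworker@fstrk.io, which is exactly the situation the kombu warning complains about.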

Marathon event subscriptions

I have been trying to enable event subscriptions, which I found in the Marathon REST API. I attempted to restart Marathon with "--event_subscriber http_callback" and created the "event_subscriber" and "http_endpoints". When I restart, it shows "--http_endpoints http://localhost:1234/", and I am running "nc -l -p 1234" to listen on that port. I am not getting anything when I create new apps.
It seems that I am having trouble enabling it, as I keep getting this error:
"http event callback system is not running on this Marathon instance. Please re-start this instance with \"--event_subscriber http_callback\"."
Maybe I am missing something? Any help is much appreciated. Thanks.
Issue resolved! I fixed it by running the following command:
marathon --jar --master zk://your_ip:5050,your_ip:5050,your_ip:5050/mesos --event_subscriber http_callback
To get it to take effect, restart Marathon on ALL masters:
sudo service marathon restart
Once it is back up, check the page and you should be good to go.
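As an alternative to nc for watching the callbacks, a small HTTP listener makes it easier to see what arrives. This is a rough sketch, assuming Marathon delivers each event as an HTTP POST with a JSON body that carries an eventType field; the names start_listener, post_json and RECEIVED are invented here, and post_json is only a way to simulate an event locally.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

RECEIVED = []  # every event body seen so far, newest last


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            event = json.loads(body)
        except ValueError:
            event = None
        if not isinstance(event, dict):
            # Keep non-JSON payloads too, so nothing is silently dropped.
            event = {"raw": body.decode("utf-8", "replace")}
        RECEIVED.append(event)
        print(event.get("eventType", "<no eventType>"))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence the default per-request log line


def start_listener(port=1234):
    """Serve on 127.0.0.1:port in a background thread and return the server."""
    server = HTTPServer(("127.0.0.1", port), EventHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_json(url, payload):
    """Send a test event to the listener, bypassing any configured proxy."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with opener.open(request, timeout=5) as response:
        return response.status
```

Call start_listener() in an interactive session (the background thread stops when the interpreter exits), then create an app in Marathon. If nothing is printed, post_json("http://127.0.0.1:1234/", {"eventType": "test"}) tells you whether the listener itself is working.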

Elasticsearch is not running

I tried running Elasticsearch on Ubuntu 12.04. The install and first run seemed OK, but when I check sudo /etc/init.d/elasticsearch status I see the message "elasticsearch is not running". Opening localhost:9200 in the browser also fails.
Please help.
It will not start automatically after you install it, and for a good reason: you don't want it to accidentally join a cluster configured to use multicast discovery. See my post here for information on the basics of configuring Elasticsearch.
In addition to that post, also make sure you set the following two options in /etc/elasticsearch/elasticsearch.yml:
cluster.name: some-other-name
discovery.zen.ping.multicast.enabled: false
After you have done that, start it by running:
sudo service elasticsearch start
You should almost always disable multicast. In a local testing environment you only have one node, so you don't need it, and in a production environment it's bad practice because nodes that accidentally join the cluster can break things (trust me, I've had this happen and it's a headache).
Did you try sudo /etc/init.d/elasticsearch start?
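Once the service has been started, you can check from a script whether anything answers on port 9200 instead of using the browser. A minimal sketch, assuming Elasticsearch replies on its root URL with a JSON document that includes a cluster_name field; es_info is a made-up helper name.

```python
import http.client
import json
import urllib.request


def es_info(url="http://localhost:9200/", timeout=3.0):
    """Return the parsed JSON from the node's root URL, or None if unreachable."""
    # Ignore proxy settings: this is meant for a node on the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return None


if __name__ == "__main__":
    info = es_info()
    if isinstance(info, dict):
        print("cluster_name:", info.get("cluster_name"))
    else:
        print("nothing usable is answering on localhost:9200")
```

If cluster_name comes back as something other than the value you set in elasticsearch.yml, the node is probably not reading the config file you edited.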

Unable to get Mesos to run from tutorial: Setting up a Single Node Mesosphere Cluster

I have been following Mesosphere's official tutorial to try to set up a single-node cluster:
http://mesosphere.com/docs/getting-started/developer/single-node-install/
I followed all the commands without any issues, and I also added ports 5050 and 8080 to my security group. When I try to access the console for Mesos/Marathon, I get an "Internet Explorer cannot display the webpage" message.
They also recommend checking it the following way:
MASTER=$(mesos-resolve `cat /etc/mesos/zk`)
mesos-execute --master=$MASTER --name="cluster-test" --command="sleep 5"
But that comes up with an error:
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0106 17:03:08.126703 20993 process.cpp:1561] Failed to initialize, gethostbyname2: Unknown host
*** Check failure stack trace: ***
I am not really sure how to troubleshoot this either, and there are not many tutorials I could find on how to install mesos on ubuntu.
I checked the contents of the zk file, seems to be the default value.
$ cat /etc/mesos/zk
zk://localhost:2181/mesos
I would really appreciate any clues on how to go about this one.
Edit: The process is definitely running too - just an fyi:
root 31545 8.5 5.9 187464 35604 ? Ssl 17:28 0:00 /usr/local/sbin/mesos-slave --master=zk://localhost:2181/mesos --log_dir=/var/log/mesos
root 31563 28.5 2.1 116304 12856 ? Rs 17:28 0:00 /usr/local/sbin/mesos-master --zk=zk://localhost:2181/mesos --port=5050 --log_dir=/var/log/mesos --quorum=1 --wo
Mesos uses gethostbyname2 to resolve hostnames to IPs. The first thing I would recommend is to try "ping localhost" and "ping hostname", and to verify that there are no strange settings in /etc/hosts. If you're running a multi-node cluster, I'd recommend that the hostname map to the public IP address (not 127.0.x.1).
If that doesn't help, you can try setting the --ip and --hostname flags when starting mesos-master and mesos-slave, to bypass the gethostbyname2 resolution. These can also be set by writing to the file-based parameters, e.g. /etc/mesos/mesos-master/ip
For additional troubleshooting, try running wget http://localhost:5050 (or curl -L) from the Mesos master, to verify that it is locally visible. Also try wget http://<public_ip>:5050 to verify that the web server is up and serving on the public IP. Depending on how your (EC2?) node is set up, you may need to expose/forward the port, or connect to a VPN.
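The same checks can be scripted. The sketch below is only a rough equivalent of the ping and wget steps: it tests whether a name resolves and whether a TCP port accepts connections. The helper names resolves and port_open are made up, and ports 5050 and 8080 are the ones mentioned in the question.

```python
import socket


def resolves(name):
    """True if the name can be resolved to an IPv4 address."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


def port_open(host, port, timeout=2.0):
    """True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Mirrors "ping localhost" and "ping hostname" from the answer.
    for name in ("localhost", socket.gethostname()):
        print(name, "resolves:", resolves(name))
    # 5050 is the Mesos master UI, 8080 is Marathon.
    for port in (5050, 8080):
        print("localhost:%d open:" % port, port_open("localhost", port))
```

If the machine's own hostname does not resolve, that points at /etc/hosts; if the names resolve but the ports are closed, the services are not actually listening.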
Thanks Adam. I ran the wget and curl commands, and nothing was actually listening on port 8080 or 5050, even though I had opened those ports in EC2. A simple reboot did the trick, however: once I ssh'ed into the EC2 instance after the reboot, both Mesos and Marathon were running, and both ports now show up when I run netstat -ntln.
