Setup distributed arangodb cluster - cluster-computing

I would like to set up an arangodb cluster with 3 virtual machines.
On the first machine I executed $ arangodb and got the following output:
ubuntu@arangodb-1:/etc/arangodb3$ arangodb
2018/04/19 09:15:46 Starting arangodb version 0.10.4, build 553aab6
2018/04/19 09:15:46 Serving as master with ID '5f388575' on :8528...
2018/04/19 09:15:46 Waiting for 3 servers to show up.
2018/04/19 09:15:46 Use the following commands to start other servers:
arangodb --starter.data-dir=./db2 --starter.join 127.0.0.1
arangodb --starter.data-dir=./db3 --starter.join 127.0.0.1
2018/04/19 09:15:46 Listening on 0.0.0.0:8528 (:8528)
On the second machine I got the following problem:
ubuntu@arangodb-2:~$ arangodb --starter.data-dir=./db2 --starter.join 10.100.0.105
2018/04/19 09:23:12 Starting arangodb version 0.10.4, build 553aab6
2018/04/19 09:23:12 Contacting master http://10.100.0.105:8528...
2018/04/19 09:23:27 Cannot start because of error from master: Post http://10.100.0.105:8528/hello: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2018/04/19 09:23:28 Contacting master http://10.100.0.105:8528...
Can someone help me get started with an arangodb cluster?

Evidently your machines cannot reach each other. Running ifconfig on all three machines should show which network they share.
Make sure that they can ping each other.
Make sure that curl <other-machine>:8528/version returns something like {"version":"0.10.4","build":"553aab6"}.
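The curl check above can be wrapped in a small script and run on each VM. This is only a sketch: the function names are made up for illustration, port 8528 is taken from the starter log in the question, and the peer address is whatever your machines actually use.

```shell
#!/bin/sh
# Sketch: check that a peer's starter port answers before trying to join.
# 8528 is the starter port shown in the log output of the question.
STARTER_PORT=8528

# Build the URL of the starter's version endpoint for one host.
starter_version_url() {
    printf 'http://%s:%s/version\n' "$1" "$STARTER_PORT"
}

# Return 0 if the starter on the given host answers within 5 seconds.
starter_reachable() {
    curl --silent --fail --max-time 5 "$(starter_version_url "$1")" >/dev/null
}

# Report reachability for every host passed as an argument.
check_peers() {
    for host in "$@"; do
        if starter_reachable "$host"; then
            echo "$host: starter reachable"
        else
            echo "$host: NOT reachable on port $STARTER_PORT"
        fi
    done
}

# Example (run on the second VM, pointing at the first):
#   check_peers 10.100.0.105
```

If a host shows up as not reachable, look at the network or firewall between the VMs before looking at arangodb itself.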

Please try the --starter.address=addr option as described on https://github.com/arangodb-helper/arangodb/blob/master/docs/Manual/Programs/Starter/Options.md
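As a rough illustration of that option, the helper below only prints the commands you would run on each machine. The helper name and the address 10.100.0.106 for the second VM are placeholders, not something from the question; check the linked options page for the exact behaviour of --starter.address on your starter version.

```shell
#!/bin/sh
# Print the starter command for one machine.
#   $1 = the address under which this machine is reachable by the others
#   $2 = address of the first machine (leave empty on the first machine)
starter_command() {
    if [ -n "${2:-}" ]; then
        printf 'arangodb --starter.address=%s --starter.join %s\n' "$1" "$2"
    else
        printf 'arangodb --starter.address=%s\n' "$1"
    fi
}

starter_command 10.100.0.105                # first machine
starter_command 10.100.0.106 10.100.0.105   # second machine (placeholder address)
```

The idea is that the first machine then advertises a routable address instead of the 127.0.0.1 shown in its startup output.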

Related

ejabberdctl start succeeds, but status and stop fail to connect to node

I was following this guide to set up ejabberd on a cluster: http://chadillac.github.io/2012/11/17/easy-ejabberd-clustering-guide-mnesia-mysql/
I am using two AWS instances with the IPs
Master -> 111.222.333.444
Slave -> 222.333.444.555
Since I do not have DNS configured, I am using IP addresses such as 111.222.333.444 instead of 'master.domain.com'.
I haven't been successful at setting up the cluster yet, but before that I am having a problem on my master node.
I start the server with
/tmp/ej1809/sbin/ejabberdctl start
I get no output, but I see in the logs that the server started.
Then I check the status using
/tmp/ej1809/sbin/ejabberdctl status
But I get the error
Failed RPC connection to the node 'ejabberd@111.222.333.444': nodedown
And when I try to stop the node using /tmp/ej1809/sbin/ejabberdctl stop, I get the same error:
Failed RPC connection to the node 'ejabberd@111.222.333.444': nodedown
I cannot understand the reason behind it.
Can anyone help me solve it, please?
Stop and kill processes like epmd, erl, beam.
Then start ejabberd with "ejabberdctl live", that will keep the erlang shell open for you to see the log messages in realtime, including the erlang node name:
...
13:21:22.662 [info] ejabberd 19.02.52 is started in the node ejabberd@localhost in 7.07s
13:21:22.667 [info] Start accepting TCP connections at 0.0.0.0:5444 for ejabberd_http
13:21:22.667 [info] Application ejabberd started on node ejabberd@localhost
You can check if "epmd" knows about that node:
$ epmd -names
epmd: up and running on port 4369 with data:
name ejabberd at port 33519
Then let's see if ejabberdctl can connect with that node:
$ ejabberdctl help | grep "node name:"
--node nodename ejabberd node name: ejabberd@localhost
And finally:
$ ejabberdctl status
The node ejabberd@localhost is started with status: started
ejabberd 19.02.52 is running in that node
I assume you haven't yet edited anything in ejabberdctl.cfg, specifically ERLANG_NODE. If you did, I recommend reinstalling ejabberd to make sure you have the default configuration, and then retrying those steps. Once ejabberd works perfectly, you can start modifying the configuration files (ejabberd.yml and ejabberdctl.cfg) to suit your real requirements (clustering, etc).
Later, if you have problems setting up clustering, you may find some ideas for debugging the problem in
https://ejabberd.im/interconnect-erl-nodes/index.html
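If you want to script the epmd check, something like the following could work. It is just a sketch: the function names are invented here, and it relies only on the "name ... at port ..." lines that epmd -names prints, as in the output shown earlier.

```shell
#!/bin/sh
# Extract registered node names from `epmd -names` output read on stdin.
# Lines look like: "name ejabberd at port 33519".
epmd_node_names() {
    awk '$1 == "name" { print $2 }'
}

# Return 0 if the given short node name (the part before "@") is registered.
node_registered() {
    epmd -names 2>/dev/null | epmd_node_names | grep -qxF "$1"
}

# Example:
#   node_registered ejabberd && ejabberdctl status
```

If the name registered in epmd and the node name that ejabberdctl tries to reach differ, that mismatch is a likely cause of the nodedown error.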

cloudera host with bad health during install

I have tried again and again with all required steps completed, but when the cluster installation reaches the step that installs the selected parcels, it always shows every host with bad health. The setup never completes.
I am installing CM 5.5 on CentOS 6.7 using VirtualBox.
The Error
Host is in bad health cm.feuni.edu
Host is in bad health dn1.feuni.edu
Host is in bad health dn2.feuni.edu
Host is in bad health nn1.feuni.edu
Host is in bad health nn2.feuni.edu
Host is in bad health rm.feuni.edu
The errors above are shown in step 6, where the setup says
The selected parcels are being downloaded and installed on all the hosts in the cluster
In the previous step 5, all hosts completed their heartbeat checks.
Memory distribution:
cm 8GB
all others 1GB
I could not find a proper answer anywhere else. What could be the reason for the bad health?
I don't know if it will help you...
For me, after struggling with it for a few days,
I found the log files (at )
They contained a message saying there was a mismatch of the guid,
so I uninstalled everything from both machines (using the script they provide, /usr/share/cmf/uninstall-cloudera-manager.sh, then yum remove 'cloudera-manager-*', and deleting every Cloudera-related directory I found)
and then removed the guid file:
rm /var/lib/cloudera-scm-agent/cm_guid
Afterwards I re-installed everything, and that fixed that issue for me...
I read online that there can be issues with the hostname and things like that, but I guess that if you got to this part of the installation, you have already fixed all the domain/FQDN/hostname/hosts issues.
It saddens me there is no real manual/FAQ for this product.. :(
Good luck!
I faced the same problem. This is my solution:
First I edited config.ini
$ nano /etc/cloudera-scm-agent/config.ini
so that the hostname was the same as what the command $ hostname returned.
Then I restarted the Cloudera agent and server:
$ service cloudera-scm-agent restart
$ service cloudera-scm-server restart
Then in Cloudera Manager I deleted the cluster and added it again. The wizard continued to run normally.
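To see whether the agent config and the machine agree before restarting anything, a check along these lines might help. It is a sketch under assumptions: the answer does not say which key was edited, so listening_hostname is a guess, and the function names are made up for illustration.

```shell
#!/bin/sh
# Read a key's value from an ini-style file (first uncommented match,
# surrounding whitespace trimmed).
#   $1 = file, $2 = key
ini_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" \
        | sed 's/[[:space:]]*$//' \
        | head -n 1
}

# Compare the agent's configured hostname with what `hostname` reports.
# ASSUMPTION: the relevant key is listening_hostname.
check_agent_hostname() {
    config="${1:-/etc/cloudera-scm-agent/config.ini}"
    configured="$(ini_value "$config" listening_hostname)"
    actual="$(hostname)"
    if [ "$configured" = "$actual" ]; then
        echo "OK: $configured"
    else
        echo "MISMATCH: config has '$configured', hostname returns '$actual'"
        return 1
    fi
}

# Example (run on every host):
#   check_agent_hostname
```

An empty configured value is reported as a mismatch too, so treat that output as a hint to look at the file, not as proof of a problem.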

Impala The Cloudera Manager Agent got an unexpected response from this role's web server

I have installed a Hadoop cluster with Cloudera Manager. After this installation the Impala status has become bad.
I have the following error for the master node:
Web Server Status
and this one for the nodes with the Impala daemon:
Impala Daemon Ready Check, Web Server Status
Looking into the logs, I found some errors:
The health test result for IMPALAD_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent got an unexpected response from this role's web server.
Looking into cloudera-scm-agent.log, there are these errors:
1261 Monitor-HostMonitor throttling_logger ERROR (29 skipped) Failed to collect NTP metrics
I tried to install NTP (sudo apt-get install ntp), but after this installation HDFS, HIVE, YARN and other services go bad; after removing it, only Impala goes bad.
MainThread agent ERROR Failed to connect to previous supervisor.
Another error is this:
Monitor-GenericMonitor throttling_logger ERROR Error fetching metrics at 'http://nodo-1:50075/jmx'
I checked all the hostnames and they seem correct.
So what is this problem, and how can I solve it?
I also had a problem with NTP. The problem still existed after installing NTP, but when I ran sudo service ntp restart the error was fixed.
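After the restart you can check whether ntpd has actually picked a time source. A minimal sketch, assuming ntpq is installed with the ntp package; the function name is made up, and it relies on ntpq marking the selected peer with a leading "*".

```shell
#!/bin/sh
# Return 0 if `ntpq -pn` output on stdin contains a selected sync peer,
# which ntpq marks with a leading "*".
has_sync_peer() {
    grep -q '^\*'
}

# Example:
#   sudo service ntp restart
#   ntpq -pn | has_sync_peer && echo "NTP synchronized" || echo "not synchronized yet"
```

It can take a few minutes after a restart before a peer is selected, so a negative result right away does not necessarily mean something is broken.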

Unable to get Mesos to run from tutorial: Setting up a Single Node Mesosphere Cluster

I have been trying to set up a single node mesosphere cluster by following their official tutorial:
http://mesosphere.com/docs/getting-started/developer/single-node-install/
I followed all the commands without any issues, and I also added the ports 5050 and 8080 to my security group. When I try to access the console for mesos/marathon, I get an "Internet Explorer cannot display the webpage" message.
They also recommend checking it the following way:
MASTER=$(mesos-resolve `cat /etc/mesos/zk`)
mesos-execute --master=$MASTER --name="cluster-test" --command="sleep 5"
But that comes up with an error:
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0106 17:03:08.126703 20993 process.cpp:1561] Failed to initialize, gethostbyname2: Unknown host
*** Check failure stack trace: ***
I am not really sure how to troubleshoot this either, and I could not find many tutorials on how to install mesos on ubuntu.
I checked the contents of the zk file, and it seems to be the default value.
$ cat /etc/mesos/zk
zk://localhost:2181/mesos
I would really appreciate any clues on how to go about this one.
Edit: The process is definitely running too - just an fyi:
root 31545 8.5 5.9 187464 35604 ? Ssl 17:28 0:00 /usr/local/sbin/mesos-slave --master=zk://localhost:2181/mesos --log_dir=/var/log/mesos
root 31563 28.5 2.1 116304 12856 ? Rs 17:28 0:00 /usr/local/sbin/mesos-master --zk=zk://localhost:2181/mesos --port=5050 --log_dir=/var/log/mesos --quorum=1 --wo
Mesos uses gethostbyname2 to resolve hostnames to IPs. The first thing I would recommend is to try "ping localhost" and "ping hostname", and verify that there are no strange settings in /etc/hosts. If you're doing a multi-node cluster, I'd recommend that the hostname map to the public IP address (not 127.0.x.1).
If that doesn't help, you can try setting the --ip and --hostname flags when starting mesos-master and mesos-slave, to bypass the gethostbyname2 resolution. These can also be set by writing to the file-based parameters, e.g. /etc/mesos/mesos-master/ip
For additional troubleshooting, try running wget http://localhost:5050 (or curl -L) from the mesos master, to verify that it is locally visible. Also try wget http://<public_ip>:5050 to verify that the web server is up and serving to the public IP. Depending on how your (EC2?) node is setup, you may need to expose/forward the port, or connect to a VPN.
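The /etc/hosts part of that advice can be checked with a few lines of shell. This is a sketch with invented function names, and it only reads the hosts file; it does not query DNS, so it complements ping rather than replacing it.

```shell
#!/bin/sh
# Print the first address that a hosts-style file maps the given name to.
#   $1 = hostname, $2 = hosts file (defaults to /etc/hosts)
hosts_lookup() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                   # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break          # ignore trailing comments
                if ($i == name) { print $1; exit }
            }
        }' "${2:-/etc/hosts}"
}

# Report whether this machine's own hostname is missing from the hosts
# file or mapped to a loopback address.
check_own_hostname() {
    name="$(hostname)"
    addr="$(hosts_lookup "$name" "${1:-/etc/hosts}")"
    case "$addr" in
        "")     echo "$name: no entry in hosts file" ;;
        127.*)  echo "$name: maps to loopback address $addr" ;;
        *)      echo "$name: maps to $addr" ;;
    esac
}

# Example:
#   check_own_hostname
```

A missing entry is not necessarily wrong if the name resolves through DNS, but on a fresh cloud instance it is a common reason for the "Unknown host" failure.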
Thanks Adam. I ran the wget and curl commands, and nothing was actually listening on port 8080 or 5050. I did open those ports in EC2. A simple reboot did the trick, however: once I ssh'ed into the EC2 instance after the reboot, both mesos and marathon were running, and both ports now show up when I run netstat -ntln.

CDH 5.1 host IP address change

I have a CDH 5.1 cluster with 3 nodes. We installed it using the Cloudera Manager automated installation.
It was running perfectly until we moved the box to a different network and the IP addresses changed. I tried the following steps:
1. Stopped the cloudera-scm-server service.
2. Stopped the cloudera-scm-agent service.
3. Edited /etc/cloudera-scm-agent/config.ini.
4. Changed the server host to the new IP.
5. Restarted the cloudera-scm-agent and cloudera-scm-server services.
That did not work.
Then I followed
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-latest/Cloudera-Manager-Administration-Guide/cmag_change_hostnames.html
That did not help either, even after changing the IPs directly in PostgreSQL.
I found the following blog:
http://www.geovanie.me/changing-ip-of-node-in-cdh-cluster/
I am getting the following error in the scm-agent log file:
ProtocolError: <ProtocolError for 127.0.0.1/RPC2: 401 Unauthorized>
It is still not working.
Can anyone please help me change all IP addresses in a CDH 5.1 cluster safely?
Thanks,
Amit
This happens because the previous cloudera-scm-agent service wasn't stopped correctly. Please try:
$> ps -ef | grep supervisord
$> kill -9 <processID>
Then start the agent again:
$> service cloudera-scm-agent start
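If you want to avoid copying the process ID by hand, the ps step can be scripted roughly like this. The function name is invented for illustration, and trying a plain kill before kill -9 is my own suggestion, not part of the steps above.

```shell
#!/bin/sh
# Print the PIDs of supervisord processes from `ps -ef` output on stdin,
# skipping the grep process itself. In `ps -ef` the PID is the 2nd column.
supervisord_pids() {
    awk '/supervisord/ && !/grep/ { print $2 }'
}

# Example (as root):
#   for pid in $(ps -ef | supervisord_pids); do kill "$pid"; done
#   service cloudera-scm-agent start
```

If the process is still there after a plain kill, fall back to kill -9 as described above.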
