Cluster configuration on MonetDB: cannot discover other nodes

I have installed and configured a 3-node MonetDB cluster on 3 virtual machines on my MacBook (using Oracle VirtualBox). I use MonetDB 5 server 11.37.7.
I have followed the Cluster Management documentation of MonetDB, but the monetdb discover command only returns the dbfarm of the local instance. Each node is still unaware of the other nodes.
I can connect to any node from any other node using monetdb -h [host] -P [passphrase], and I can also discover the remote dbfarms of a specific host using monetdb -h host -P passphrase discover.
The answer to the question monetdb cluster management can't setup helped me set the listenaddr property to 0.0.0.0, but the discover command still only returns the local dbfarm.
EDIT
Thanks to Jennie's suggestion below, I noticed that the MonetDB log file contains the error error while sending broadcast message: Network is unreachable.
I used the netcat utility to broadcast a UDP message from one node to the other two and it worked. I can ping and ssh between the nodes, and all 3 nodes are part of the same network configured in VirtualBox, but the error is still there.

All your VMs must be in the same LAN environment. monetdb discover basically goes over all IP addresses under the same subnet.
Can you somehow verify that that's the case?

I got it working, thanks to Jennie's post. For anyone using VirtualBox:
Use a Bridged Adapter instead of NAT for the first network adapter of each configured node
Configure the following property of your dbfarm: listenaddr=0.0.0.0
For testing purposes, it may be worth reducing the discoveryttl property to less than its default of 10 minutes
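These dbfarm properties are set with monetdbd. A minimal sketch, assuming the dbfarm lives at /var/monetdb5/dbfarm and that a discoveryttl of 60 seconds is acceptable for testing (adjust both to your setup):
monetdbd set listenaddr=0.0.0.0 /var/monetdb5/dbfarm
monetdbd set discoveryttl=60 /var/monetdb5/dbfarm
monetdbd stop /var/monetdb5/dbfarm && monetdbd start /var/monetdb5/dbfarm   # restart so the new settings take effect
monetdbd get all /var/monetdb5/dbfarm   # verify listenaddr and discoveryttl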

Related

How can I connect to my Elasticsearch cluster from another machine?

I want to connect to my Elasticsearch cluster from another machine. I went through some documentation which mentioned that I had to change network.bind_host: 0, but I didn't find network.bind_host in my elasticsearch.yml; I only have network.host in my elasticsearch.yml file. I tried setting network.host: 0, but I still cannot connect from another machine. I also tried removing the ## before network.host: 0, which throws an error when starting the Elasticsearch cluster.
When connecting from another machine, I have to use http://clustermachingip:9200, right?
Can anyone please help with this problem?
Thanks.
When you want to connect to an Elasticsearch instance on another machine, yes, the address is http://clustermachingip:9200. Can you try setting network.bind_host: clustermachingip?
If this doesn't work, then you might want to check the connectivity to the machine you are trying to reach, using something like a ping command.
ping clustermachingip
EDIT:
You can just start Elasticsearch on one machine and try one of the following curl commands from the other machine.
curl 'clustermachingip:9200/_cat/nodes?v'
curl 'clustermachingip:9200/_cat/health?v'
EDIT2: Clearing up the confusion between network.host and network.bind_host
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/modules-network.html#advanced-network-settings
The network.host setting explained in Commonly used network settings is a shortcut which sets the bind host and the publish host at the same time. In advanced use cases, such as when running behind a proxy server, you may need to set these settings to different values:
network.bind_host
This specifies which network interface(s) a node should bind to in order to listen for incoming requests. A node can bind to multiple interfaces, e.g. two network cards, or a site-local address and a local address. Defaults to network.host.
network.publish_host
The publish host is the single interface that the node advertises to other nodes in the cluster, so that those nodes can connect to it. Currently an elasticsearch node may be bound to multiple addresses, but only publishes one. If not specified, this defaults to the "best" address from network.host, sorted by IPv4/IPv6 stack preference, then by reachability.
Set network.host in your elasticsearch.yml to 0.0.0.0, i.e. Elasticsearch will listen on all available network interfaces.
network.host: 0.0.0.0
Check your connectivity to the host machine on the port (if you haven't changed it, it will be 9200).
If you still cannot connect to the host machine, I suggest checking your iptables rules and allowing connections to port 9200.
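As a minimal sketch of those checks, assuming the default HTTP port 9200 and an iptables-based firewall (the syntax differs with firewalld or ufw):
# from the remote machine: confirm the port is reachable
curl 'clustermachingip:9200/_cat/health?v'
# on the Elasticsearch host: confirm which address the node is bound to
netstat -tlnp | grep 9200
# if iptables is blocking it, allow incoming connections on 9200
iptables -A INPUT -p tcp --dport 9200 -j ACCEPT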

Why can't standalone slaves connect to master on separate Mac OS boxes?

I have two Macs (both OS X El Capitan) at home, both connected to the same wifi. I want to install a Spark cluster (with two workers) on these two computers.
Mac1 (192.168.1.2) is my master, with Spark 1.5.2; it is up and working well, and I can see the Spark UI at http://localhost:8080/ (I also see spark://Mac1:7077).
I have also run one slave on this machine (Mac1), and I see it under Workers in the Spark UI.
Then I copied Spark to the second machine (Mac2), and I am trying to run another slave on Mac2 (192.168.2.9) with this command:
./sbin/start-slave.sh spark://Mac1:7077
But it does not work. Looking at the log, it shows:
Failed to connect to master Mac1:7077
Actor not found for: ActorSelection[Anchor(akka.tcp://sparkMaster@Mac1:7077/),Path(/User/Master)]
Networking-wise, at Mac1, I can SSH to Mac2, and vice versa, but I cannot telnet to Mac1:7077.
I would appreciate any help in solving this problem.
tl;dr Use the -h option of ./sbin/start-master.sh, i.e. ./sbin/start-master.sh -h Mac1
Optionally, you could do ./sbin/start-slave.sh spark://192.168.1.2:7077 instead.
The reason is that binding to ports in Spark is very sensitive to which names and IPs are used. So, in your case, 192.168.1.2 != Mac1: they are different "names" to Spark. That is why ssh works, since it uses the OS name resolver, while the connection fails at the Spark level, where the names must match exactly.
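As a sketch of the two-machine setup, assuming Mac1 resolves to 192.168.1.2 on both hosts (e.g. via /etc/hosts); note that in Spark 1.x the spark-env.sh variable for the master address is SPARK_MASTER_IP:
# On Mac1: bind the master explicitly to the name the workers will use
./sbin/start-master.sh -h Mac1
# or set it once in conf/spark-env.sh (Spark 1.x):
#   SPARK_MASTER_IP=192.168.1.2
# On Mac2: point the worker at exactly the same name/IP the master bound to
./sbin/start-slave.sh spark://Mac1:7077
# or, skipping hostnames entirely:
./sbin/start-slave.sh spark://192.168.1.2:7077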
Likely a networking/firewall issue on the Mac.
Also, the error message you copy/pasted references port 7070. Is this the issue?
Using IP addresses in conf/slaves works somehow, but then I have to use IPs everywhere to address the cluster instead of hostnames.
SPARK + Standalone Cluster: Cannot start worker from another machine

Hadoop Cluster distributed in different sub-networks (Docker + Flannel)

I want to run Hadoop 2.3.0 on a multi-node bare-metal cluster using Docker. I have a master container and a slave container (in this first setup). When the master and slave containers are on the same host (and therefore in the same Flannel subnet), Hadoop works perfectly. However, if the master and slave are on different bare-metal nodes (hence in different Flannel subnets), it simply does not work (I get a connection refused error). Both containers can ping and ssh to one another, so there is no connectivity problem. For some reason, it seems that Hadoop needs all the nodes in the cluster to be in the same subnet. Is there a way to circumvent this?
Thanks
I think having the nodes in separate Flannel subnets introduces some NAT-related rules which cause such issues.
See the link below, which seems to have addressed a similar issue:
Re: Networking Problem in creating HDFS cluster.
Hadoop uses a bunch of other ports for communication between the nodes; the above assumes these ports are unblocked.
ssh and ping are not enough. If you have iptables or any other firewall, you need to either disable it or open up the ports. You can set up the cluster as long as the hosts can communicate with each other and the ports are open. Run telnet <namenode> <port> to ensure the hosts are communicating on the desired ports.
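As a rough sketch of those checks, assuming the default Hadoop 2.x ports (your configuration may override them) and placeholder hostnames namenode and resourcemanager:
telnet namenode 8020             # NameNode RPC (fs.defaultFS)
telnet namenode 50070            # NameNode web UI
telnet resourcemanager 8031      # NodeManager -> ResourceManager resource tracker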

How to restart single node hadoop cluster on ec2

I have installed a single-node Hadoop cluster using Hortonworks/Ambari on an Amazon EC2 host.
Since I don't want this cluster running 24/7, I stop the instance when I am done. When I reboot the instance later, I get a new IP address, and Ambari is no longer able to start the Hadoop-related services.
Is there a way, other than completely redeploying, to reconfigure the cluster so the services will start?
It looks like the IP address lives in various XML files under /etc, in the ambari Postgres database, and possibly in other places I haven't found yet.
I tried updating the XML files and the Postgres database with the new IP address and the internal and external DNS names wherever I could find them, but to no avail. I have not been able to restart the services.
The reason I am doing this is to possibly save the deployment time, the data on HDFS, and other project-specific setup each time I restart the host.
Any suggestions?
Thanks!
An Elastic IP can be used. Also, since you mentioned it is a single-node cluster, you can use localhost or the private IP.
If you use an Elastic IP, your UIs will always be on the same public IP. However, if you use the private IP or localhost and do not associate your instance with an Elastic IP, you will have to look up the public IP every time you start the instance and then connect to the web UI using that IP.
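For illustration, a sketch of associating an Elastic IP with the AWS CLI; the instance ID and allocation ID below are placeholders for your own values:
aws ec2 allocate-address --domain vpc
# note the AllocationId in the output, then attach it to the instance
aws ec2 associate-address --instance-id i-0123456789abcdef0 --allocation-id eipalloc-0123456789abcdef0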
Thanks for the help; both Harman and TJ are correct. I haven't used an Elastic IP because I might have more than one of these running at a time, and for now at least, I don't mind looking up the public IP address.
Harman's suggestion of using "localhost" as the FQDN when setting up Ambari in the first place is a really good idea in retrospect. Unless I go through the whole setup again, that's water under the bridge for me, but I recommend it to others who might read this post.
In my case, I figured this out on my own before coming back to this page. The specific step I took was insanely simple after all, thanks to Occam's razor.
I added the following line to /etc/hosts:
<new internal IP> <old internal dns name>
and then ran ambari-server restart from the command line. After that, I was able to restart all services after logging into Ambari.
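For illustration only, a concrete version with made-up values (your internal IP and the DNS name recorded at deployment time will differ; check the hostnames stored by Ambari or in your original /etc/hosts):
# /etc/hosts on the Ambari host
172.31.5.20   ip-172-31-40-10.ec2.internal
# then restart the Ambari server and start the services from the web UI
ambari-server restart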

Hadoop namenode web UI not opening in CDH4

I recently installed Cloudera's CDH distribution to create a 2-node cluster. From the Cloudera Manager UI, all services are running fine.
All the command-line tools (hive etc.) are also working fine, and I am able to read and write data to HDFS.
However, the NameNode (and DataNode) web UI alone is not opening. Checking netstat -a | grep LISTEN, the processes are listening on the assigned ports, and there are no firewall rules blocking the connections (I already disabled iptables).
I initially thought that it could be a DNS issue, but even the IP address is not working. Cloudera Manager, installed on the same machine on another port, opens fine.
Any pointers on how to debug this problem?
I faced the same issue.
First it was because the NameNode was in safe mode, and then because of two IP addresses (I have two NICs configured on the CDH cluster: one for internal connectivity between the servers (10.0.0.1) and the other to connect to the servers from the Internet (192.168.0.1)).
When I try to open the NameNode GUI from any of the servers connected to the cluster on the 10.0.0.1 network, the GUI opens and works fine, but from any other machine connected to the servers via the 192.168.0.1 network it fails.
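Two quick checks matching those two causes, as a sketch; it assumes the default NameNode web UI port 50070 for CDH4 and uses the Hadoop 2 property name dfs.namenode.http-address (set it via Cloudera Manager rather than editing hdfs-site.xml by hand):
hdfs dfsadmin -safemode get       # should report: Safe mode is OFF
netstat -tlnp | grep 50070        # shows which address the NameNode web UI is bound to
# Binding the web UI to the wildcard address makes it reachable from both networks:
# dfs.namenode.http-address = 0.0.0.0:50070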
