Hadoop Cluster distributed in different sub-networks (Docker + Flannel) - hadoop

I want to have Hadoop 2.3.0 in a multi bare-metal cluster using Docker. I have a master container and a slave container (in this first setup). When Master and Slave containers are in the same host (and therefore, same Flannel subnet), Hadoop works perfectly. However, if the Master and Slave are in different bare metal nodes (hence, different flannel subnets), it simply does not work (I get a connection refused error). Both containers can ping and ssh one another, so there is no connectivity problem. For some reason, it seems that hadoop needs all the nodes in the cluster to be in the same subnet. Is there a way to circumvent this?
Thanks

I think having the nodes in separate flannel subnets introduces some NAT-related rules which cause such issues.
See the below link which seems to have addressed a similar issue
Re: Networking Problem in creating HDFS cluster.
Hadoop uses a bunch of other ports for communication between the nodes, the above assumes these ports are unblocked.

ssh and ping are not enough. If you have iptables or any other firewalls, either you need to disable or open up the ports. You can set up the cluster, as long as hosts can communicate with each other and ports are open. Run telnet <namenode> <port> to ensure hosts are communicating on desired ports.

Related

How to setup Kubernetes cluster on several windows hosts?

I have several Windows servers available and would like to setup a Kubernetes cluster on them.
Is there some tool or a step by step instruction how to do so?
What I tried so far is to install DockerDesktop and enable its Kubernetes feature.
That gives me a single node Cluster. However, adding additional nodes to that Docker-Kubernetes Cluster (from different Windows hosts) does not seem to be possible:
Docker desktop kubernetes add node
Should I first create a Docker Swarm and could then run Kubernetes on that Swarm? Or are there other strategies?
I guess that I need to open some ports in the Windows Firewall Settings of the hosts? And map those ports to some Docker containers in which Kubernetes is will be installed? What ports?
Is there some program that I could install on each Windows host and that would help me with setting up a network with multiple hosts and connecting the Kubernetes nodes running inside Docker containers? Like a "kubeadm for Windows"?
Would be great if you could give me some hint on the right direction.
Edit:
Related info about installing kubeadm inside Docker container:
https://github.com/kubernetes/kubernetes/issues/35712
https://github.com/kubernetes/kubeadm/issues/17
Related question about Minikube:
Adding nodes to a Windows Minikube Kubernetes Installation - How?
Info on kind (kubernetes in docker) multi-node cluster:
https://dotnetninja.net/2021/03/running-a-multi-node-kubernetes-cluster-on-windows-with-kind/
(Creates multi-node kubernetes cluster on single windows host)
Also see:
https://github.com/kubernetes-sigs/kind/issues/2652
https://hub.docker.com/r/kindest/node
You can always refer to the official kubernetes documentation which is the right source for the information.
This is the correct way to manage this question.
Based on Adding Windows nodes, you need to have two prerequisites:
Obtain a Windows Server 2019 license (or higher) in order to configure the Windows node that hosts Windows containers. If you are
using VXLAN/Overlay networking you must have also have KB4489899
installed.
A Linux-based Kubernetes kubeadm cluster in which you have access to the control plane (see Creating a single control-plane cluster with kubeadm).
Second point is especially important since all control plane components are supposed to be run on linux systems (I guess you can run a Linux VM on one of the servers to host a control plane components on it, but networking will be much more complicated).
And once you have a proper running control plane, there's a kubeadm for windows to proper join Windows nodes to the kubernetes cluster. As well as a documentation on how to upgrade windows nodes.
For firewall and which ports should be open check ports and protocols.
For worker node (which will be windows nodes):
Protocol Direction Port Range Purpose Used By
TCP Inbound 10250 Kubelet API Self, Control plane
TCP Inbound 30000-32767 NodePort Services All
Another option can be running windows nodes in cloud managed kuberneres, for example GKE with windows node pool (yes, I understand that it's not your use-case, but for further reference).

Cluster configuration on MonetDB: cannot discover other nodes

I have installed and configured a 3 monetdb nodes cluster on 3 virtual machines on my MacBook (Using Oracle Virtual Box). I use MonetDB 5 server 11.37.7
I have followed the Cluster Management documentation of MonetDB, but the monetdb discover command only returns the dbfarm of the local instance. Each node still isn't aware of other nodes.
I can connect to any nodes from any other node using monetdb -h [host] -P [passphrase], I can also discover the remote farms of a specific host by using monetdb -h host -P passphrase discover
The answer to this question monetdb cluster management can't setup helped me in setting the listenaddr property to 0.0.0.0, but still, the discover command only returns the local monetdb farm.
EDIT
Thanks to Jennie suggestion below, I noticed that the monetdb log file contains error while sending broadcast message: Network is unreachable.
I used netcat utility to brodcast UDP message from one node to the other 2 and it worked, I can ping, ssh and the 3 nodes are part of the same network configured with virtualbox, but the error is still there.
All your VMs must be in the same LAN environment. monetdb discover basically goes over all IP addresses under the same subnet.
Can you some how verify that's the case?
I got it working, thanks to #Jennie's post. For anyone using VirtualBox:
Use the first network adapter of each configured node with Bridge access instead of NAT
Configure the following property of your dbfarm: listenaddr=0.0.0.0
For testing purpose, it may be worth reducing the property discoveryttl to less than the default 10mns

access hdfs outside docker swarm

I have created a docker swarm configuration which consists of a namenode, datanode, resource manager, and yarn workers. These all work well together and I can run hdfs dfs commands from any container in the swarm. I have also exposed the port 9000 using the ports section of the yaml. In my core-site.xml I use the hostname of the namenode from the swarm configuration.
I am unable to get a client outside of the swarm to access the cluster using the hdfs dfs commands. I have a different core-site.xml which has the address of the host machine for the swarm. When I run commands I get a java.io.EOFException.
Is there any way to get an external client to connect to the hadoop cluster running in docker swarm?
Turns out this is resolved by following the multihomed network instructions. It seems that it appears to the namenode as though it has two network interfaces to work with. One for the overlay network across the swarm and another from exposing the port.

Why can't standalone slaves connect to master on separate Mac OS boxes?

I have two Macs (both OS X EI Caption) at home, both are connected to same wifi. I want to install an spark cluster (with two workers) on this two computers.
Mac1 (192.168.1.2) is my master, with Spark 1.5.2, it is up and working well, and I can see the Spark UI at http://localhost:8080/ (also I see spark://Mac1:7077)
I also have run one slave on this machine (Mac1), and I see it under workers in the Spark UI.
Then, I have copied the Spark on the second machine (Mac2), and I am trying to run another Slave on Mac2 (192.168.2.9) by this command:
./sbin/start-slave.sh spark://Mac1:7077
But, it does not work: Looking at log it shows:
Failed to connect to master Mac1:7077
Actor not found for: ActorSelection[Anchor(akka.tcp://sparkMaster#Mac1:7077/),Path(/User/Master)]
Networking-wise, at Mac1, I can SSH to Mac2, and vice versa, but I cannot telnet to Mac1:7077.
I will appreciate it if you help me to solve this problem.
tl;dr Use -h option for ./sbin/start-master.sh, i.e. ./sbin/start-master.sh -h Mac1
Optionally, you could do ./sbin/start-slave.sh spark://192.168.1.2:7077 instead.
The reason is that binding to ports in Spark is very sensitive to what names and IPs are used. So, in your case, 192.168.1.2 != Mac1. They're different "names" in Spark, and that's why you can use ssh successfully as it uses name resolver on OS while it does not work at Spark level where the above condition holds, i.e. the "names" are not equal.
Likely a networking/firewall issue on the mac.
Also, your error message you copy/pasted reference port 7070. is this the issue?
using IP addresses in conf/slaves works somehow, but I have to use IP everywhere to address the cluster instead of host name.
SPARK + Standalone Cluster: Cannot start worker from another machine

Hadoop namenode web UI not opening in CDH4

I recently installed the CDH distribution of Cloudera to create a 2 node cluster. From the Cloudera Manager UI, all services are running fine.
All the command line tools (hive etc ) are also working fine and I am able to read and write data to hdfs.
However the namenode (and datanode) web UI alone is not opening. Checking on netstat -a | grep LISTEN, the processes are listening on the assigned ports and there are no firewall rules which are blocking the connections ( I already disabled iptables)
I initially though that it could be a DNS issue but even the IP address is not working. The Cloudera Manager installed on the same machine on another port is opening fine.
Any pointers on how to debug this problem?
I had faced the same issue.
First it was because NAMENODE in safemode
then after because of two IP address (I have two NIC configured on CDH Cluster one internal connectivity of the servers (10.0.0.1) and other is to connect servers form Internet (192.168.0.1))
When i try to open NAMENODE GUI form any of the server connected to cluster on network 10.0.0.1 then GUI is opening and works fine but from other any other machine connected to servers by 192.168.0.1 network it fails.

Resources