Currently I'm trying to change the data node's SSH port in Hadoop. I have a master and one data node, and each host uses a different SSH port.
What I did:
Generated an SSH key, and I can connect to the data node without a password.
Added both hosts to /etc/hosts as master and worker1.
Changed the port for the master node in the hadoop-env.sh file.
Changed the file on the data node as well.
The problem is that Hadoop uses the same SSH port for both the master and the data node.
How do I make Hadoop use a different SSH port for the master and the data node?
Any help would be appreciated :)
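One approach worth trying: Hadoop's start scripts reach the workers with plain ssh, so the simplest place to express per-host ports is the SSH client config on the master, rather than hadoop-env.sh (where HADOOP_SSH_OPTS would apply one port to every host). A minimal sketch, assuming the host aliases from your /etc/hosts and example port numbers:

```
# ~/.ssh/config on the master node
# (host names match /etc/hosts; the port numbers here are examples)
Host master
    Port 22

Host worker1
    Port 2222
```

With this in place, a bare `ssh worker1` uses port 2222 automatically, and so do the Hadoop scripts that wrap it.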
I am using WebHDFSSensor, and for that we need to provide a NameNode. However, the active and standby NameNodes can swap, so I can't just provide the current NameNode host to webhdfs_conn_id; I need a connection that covers both hosts. I tried to provide the host as an array, but it didn't work.
So my question is: let's say I need a connection named webhdfs_default, and I need it for the two hosts w.x.y.z and a.b.c.d. How do I create that?
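One common workaround (depending on your Airflow version) is to put both hosts in a single connection, e.g. a comma-separated host field like `w.x.y.z,a.b.c.d`, and let the hook try each NameNode in turn until one answers. As a minimal sketch of that failover idea (this is not Airflow's actual code; `probe` stands in for whatever liveness check the hook performs, such as a WebHDFS HTTP request):

```python
def find_active_namenode(hosts, probe):
    """Return the first host for which probe(host) succeeds."""
    for host in hosts:
        try:
            if probe(host):
                return host
        except Exception:
            continue  # unreachable or standby; try the next candidate
    raise RuntimeError("no active NameNode among: " + ", ".join(hosts))

# The connection's host field can carry both candidates,
# which hook-style code then splits into a list:
hosts = "w.x.y.z,a.b.c.d".split(",")
```

The key point is that whichever NameNode is currently active gets picked at runtime, so neither host has to be hard-coded as "the" NameNode.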
I have an HDFS cluster listening on 192.168.50.1:9000, which means it only accepts connections via that IP. I would like it to listen on 0.0.0.0:9000. When I enter 127.0.0.1 localhost master in /etc/hosts, it starts on 127.0.0.1:9000, which prevents all other nodes from connecting.
This question is similar to this one, How to make Hadoop servers listening on all IPs, but is about HDFS rather than YARN.
Is there an equivalent setting for core-site.xml like yarn.resourcemanager.bind-host or any other way to configure this? If not, then what's the reasoning behind this? Is it a security feature?
For the NameNode, you need to set these properties to 0.0.0.0 in your hdfs-site.xml:
dfs.namenode.rpc-bind-host
dfs.namenode.servicerpc-bind-host
dfs.namenode.lifeline.rpc-bind-host
dfs.namenode.http-bind-host
dfs.namenode.https-bind-host
The DataNodes use 0.0.0.0 by default.
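Put together, the hdfs-site.xml entries would look like the following (property names as listed above; 0.0.0.0 means "listen on all interfaces"):

```xml
<!-- hdfs-site.xml: make the NameNode listen on all interfaces -->
<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.namenode.lifeline.rpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.namenode.https-bind-host</name>
  <value>0.0.0.0</value>
</property>
```

Note that the bind-host properties only control the listen address; clients still connect using the address in fs.defaultFS.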
If you ever need to find a config variable for HDFS, refer to hdfs-default.xml.
Also very useful, if you look at any of the official Hadoop docs, at the bottom left corner of the page are all the default values for the various XML files.
So you can go to Apache Hadoop 2.8.0 or your specific version and find the settings you're looking for.
The question is quite old already; however, you usually do not need to configure the bind address, because 0.0.0.0 is the default value. More likely you have an entry in the /etc/hosts file such as 127.0.0.1 hostname, which Hadoop resolves to 127.0.0.1.
Consequently, you need to remove that entry, and Hadoop will bind to all interfaces (0.0.0.0) without any additional entries in the config files.
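Concretely, an /etc/hosts like the first block below causes the problem, assuming the machine's hostname is master (the IPs here are taken from the question):

```
# Problematic: "master" resolves to the loopback address,
# so the NameNode binds to 127.0.0.1:9000
127.0.0.1   localhost master

# Fixed: map the hostname to the real interface instead
127.0.0.1   localhost
192.168.50.1   master
```

After the change, restart the daemons and check with netstat (or ss) that port 9000 is bound to the expected address.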
I have two Macs (both OS X El Capitan) at home, both connected to the same Wi-Fi. I want to install a Spark cluster (with two workers) on these two computers.
Mac1 (192.168.1.2) is my master, running Spark 1.5.2. It is up and working well, and I can see the Spark UI at http://localhost:8080/ (it also shows spark://Mac1:7077).
I also have run one slave on this machine (Mac1), and I see it under workers in the Spark UI.
Then I copied Spark to the second machine (Mac2), and I am trying to run another slave on Mac2 (192.168.2.9) with this command:
./sbin/start-slave.sh spark://Mac1:7077
But it does not work. The log shows:
Failed to connect to master Mac1:7077
Actor not found for: ActorSelection[Anchor(akka.tcp://sparkMaster#Mac1:7077/),Path(/User/Master)]
Networking-wise, I can SSH from Mac1 to Mac2 and vice versa, but I cannot telnet to Mac1:7077.
I would appreciate any help solving this problem.
tl;dr Use the -h option with ./sbin/start-master.sh, i.e. ./sbin/start-master.sh -h Mac1
Optionally, you could do ./sbin/start-slave.sh spark://192.168.1.2:7077 instead.
The reason is that port binding in Spark is very sensitive to which names and IPs are used. In your case, 192.168.1.2 != Mac1: they are different "names" as far as Spark is concerned. That is why SSH succeeds, since it uses the OS name resolver, while the connection fails at the Spark level, where the two names are not considered equal.
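Equivalently, you can pin the name the master binds to in conf/spark-env.sh, so that start-master.sh picks it up without the -h flag. Note the variable name depends on the version: Spark 1.x (including 1.5.2) uses SPARK_MASTER_IP, while later releases renamed it to SPARK_MASTER_HOST:

```
# conf/spark-env.sh on Mac1
SPARK_MASTER_IP=Mac1        # Spark 1.x
# SPARK_MASTER_HOST=Mac1    # Spark 2.x and later
```

Whichever name or IP you bind here is the one the slaves must use verbatim in their spark:// master URL.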
Likely a networking/firewall issue on the mac.
Also, the error message you copy/pasted references port 7070, not 7077. Could that be the issue?
Using IP addresses in conf/slaves works somehow, but then I have to use IPs everywhere to address the cluster instead of hostnames.
I want to run Hadoop 2.3.0 on a multi-node bare-metal cluster using Docker. I have a master container and a slave container (in this first setup). When the master and slave containers are on the same host (and therefore in the same Flannel subnet), Hadoop works perfectly. However, if the master and slave are on different bare-metal nodes (hence in different Flannel subnets), it simply does not work (I get a connection-refused error). Both containers can ping and SSH to one another, so there is no connectivity problem. For some reason, it seems that Hadoop needs all the nodes in the cluster to be in the same subnet. Is there a way to circumvent this?
Thanks
I think having the nodes in separate flannel subnets introduces some NAT-related rules which cause such issues.
See the link below, which seems to address a similar issue:
Re: Networking Problem in creating HDFS cluster.
Hadoop uses a number of other ports for communication between the nodes; the above assumes those ports are unblocked.
ssh and ping are not enough. If you have iptables or any other firewall, you need to either disable it or open the required ports. You can set up the cluster as long as the hosts can communicate with each other and the ports are open. Run telnet <namenode> <port> to verify that the hosts can communicate on the desired ports.
In my Hadoop environment, I need to configure my slave nodes so that when they communicate in the middle of a map/reduce job they use the internal IP instead of the external IP that it picks up from the hostname.
Is there any way to set up my Hadoop config files to specify that the nodes should communicate using the internal IPs instead of the external IPs? I've already used the internal IPs in my core-site.xml, master, and slave files.
I've done some research and I've seen people mention the "slave.host.name" parameter, but which config file would I place this parameter in? Are there any other solutions to this problem?
Thanks!
The IP routing tables would have to be changed so that traffic between the Hadoop nodes goes through a particular gateway. I don't think Hadoop has any setting to change which gateway to use.
You can configure slave.host.name in mapred-site.xml on each slave node.
Also remember to use that hostname (instead of the IP) consistently in all other configuration (core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves) and also in the /etc/hosts file.
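A sketch of what that entry would look like; the IP is an example, and slave.host.name is the old Hadoop 1.x-era property name:

```xml
<!-- mapred-site.xml on each slave node -->
<property>
  <name>slave.host.name</name>
  <!-- the node's internal IP, or a hostname that resolves
       to the internal interface on every node -->
  <value>10.0.0.5</value>
</property>
```

The value must differ per node, so this entry cannot simply be copied unchanged across the cluster.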