Changing FQDN of nodes in hadoop cluster - hadoop

I would like to change the DNS names of the nodes in my Hadoop cluster.
For example, the FQDN of a node in my cluster is hadoop1.dev.com and I would like to change it to hadoop1.abc.xyz.
Could someone suggest a process to change it without affecting my cluster data?

Update your /etc/hosts file as shown below, then restart the network service for the change to take effect:
x.x.x.x hadoop1.dev.com hadoop1.abc.xyz
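For example, a minimal sketch on one node (the IP address and the restart command are assumptions; use your node's real address and your distribution's equivalent command):
# /etc/hosts on every node in the cluster
10.0.0.11   hadoop1.dev.com   hadoop1.abc.xyz
# apply the change (RHEL/CentOS style; adjust for your distribution)
sudo service network restart
Keeping the old FQDN as the first entry and adding the new one as an alias lets existing references keep resolving while you switch over.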

Related

Where can I find the include file in Hadoop-1.2.0 single node cluster on ubuntu?

I am trying to set the include file on my Hadoop-1.2.0 single-node cluster so that no nodes other than those listed in the include file can talk to the namenode. I am not sure how to go about this.
I have set up a single-node cluster and am new to Hadoop. I have an openssh-server installed, have set up the namenode directory, and you can view my hdfs-site.xml here.
If you have done the setup manually, then you also have to set up the SSH connections to the other hosts manually. So, if you have not set up those SSH connections, you are good.
For a UI-based installation, if you have not specified the other hosts you want to add to the cluster, then there are no other task nodes/data nodes connecting.
If you are trying to restrict users, then you have to look at ACLs and allow connections only from your subnet. For users, a unique user ID and password is good enough for the restriction.
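For reference, the "include file" the question refers to is typically wired up through the dfs.hosts property, which points the namenode at a file listing the hosts allowed to register as datanodes. A minimal sketch, assuming an include file at /home/hadoop/conf/includes (the path is an assumption):
<!-- hdfs-site.xml: only hosts listed in the include file may connect as datanodes -->
<property>
  <name>dfs.hosts</name>
  <value>/home/hadoop/conf/includes</value>
</property>
The include file itself is just one hostname or IP address per line; after editing it, run hadoop dfsadmin -refreshNodes so the namenode re-reads it.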

Dynamic IPs for Hadoop cluster

I need to set up a multi-node Hadoop cluster. So far, I have done installations using static IP addresses for each of the cluster nodes. However, in my latest cluster, I need to work with DHCP-assigned nodes. So I am wondering how I should get the cluster working and have it survive restarts, etc.
Is it mandatory to have static IP addresses for the cluster nodes, or can we get it working with dynamic IPs as well?
Any expert guidance, please.
For standalone and pseudo-distributed modes, you can get going with a dynamic IP address, since everything runs on a single node.
For fully distributed mode, the nodes are identified by the masters and slaves files located in 'HADOOP_HOME/conf'. These entries are hostnames that are described in '/etc/hosts'. So, when the IP of any node changes, Hadoop can no longer identify the machines, nodes, or hosts behind those names (and even if it could, the machines at the new addresses have no Hadoop configured). Thus, you cannot achieve a fully distributed Hadoop setup with changing IPs.
Get DHCP configured on a router if you can; otherwise, install DHCP on all of the nodes. And get going!
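To make that concrete, a minimal sketch of the files involved (all hostnames and addresses are assumptions):
# HADOOP_HOME/conf/masters
master
# HADOOP_HOME/conf/slaves
slave1
slave2
# /etc/hosts on every node
192.168.1.10   master
192.168.1.11   slave1
192.168.1.12   slave2
If DHCP hands out different addresses after a restart, the /etc/hosts entries no longer point at the right machines and the daemons cannot find each other, which is why static addresses (or DHCP reservations that pin each node to a fixed address) are the usual recommendation.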

How can I force Hadoop to use my hostnames instead of IP-XX-XX-XX-XX

So, I'm configuring a 10-node cluster with Hadoop 2.5.2, and so far it's working, but the one issue I have is that when communicating with the nodes, Hadoop guesses their hostnames based on their IPs instead of using the ones I've configured.
Let me be more specific: this happens when starting a job, but when I start up YARN (for instance), the slave node names are used correctly. The scheme Hadoop uses to auto-generate the node names is IP-XX-XX-XX-XX, so a node with IP 179.30.0.1 would be named IP-179-30-0-1.
This forces me to edit the /etc/hosts file on each node so that its 127.0.0.1 IP is named the way Hadoop expects.
Can I make Hadoop use the names I have given those hosts, or am I forced to do this extra configuration step every time?
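For illustration, a sketch of the kind of mapping the question is after, so that reverse lookup of each node's address returns the configured name instead of the generated IP-XX-XX-XX-XX form (the hostnames and addresses are assumptions):
# /etc/hosts on every node: map the real IPs to the configured hostnames,
# rather than mapping the generated IP-179-30-0-1 style names to 127.0.0.1
179.30.0.1   node01
179.30.0.2   node02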

Adding a secondary node on another computer?

The elasticsearch instance on the server is running with all defaults, no changes.
How can I scale horizontally to another server on another network?
Where do you specify this?
I only see one elasticsearch.yml file in the config directory; do I have to make a new config file for each cluster/node I would like to enable? The config file appears to be for one instance only. How do I tell it to act as a master, with the secondary server outside the network as a secondary instance?
On the other node, you install ES as usual and then, depending on the network characteristics and your preferences, you may or may not need to change things in the elasticsearch.yml of both ES instances.
By default, ES uses multicast on the network to discover nodes in the same cluster. A cluster is defined by the "cluster.name" property you can find in the elasticsearch.yml file; nodes with the same "cluster.name" will join the same cluster. If using multicast, you need to make sure, first, that multicast is available in your network configuration, and then that you don't have firewalls or anything else blocking communication between the nodes (such as port 54328).
You can also use unicast for node discovery, where the address of each node is specified in elasticsearch.yml. For more details, check the elasticsearch.yml file, as it has a good description of these settings. For example, disable multicast:
discovery.zen.ping.multicast.enabled: false
and configure unicasting:
discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]
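Putting that together, a minimal elasticsearch.yml sketch for two nodes joining the same cluster over unicast (the cluster name, node names, and addresses are assumptions):
# elasticsearch.yml on the first node
cluster.name: my-cluster
node.name: node-1
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.1:9300", "203.0.113.5:9300"]
# elasticsearch.yml on the second node (the one on the other network)
cluster.name: my-cluster
node.name: node-2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.1:9300", "203.0.113.5:9300"]
Both nodes are master-eligible by default, so there is no separate "master" config file; node-to-node traffic goes over the transport port (9300 by default), which has to be reachable between the two networks.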

How to run HDFS cluster without DNS

I'm building a local HDFS dev environment (actually hadoop + mesos + zk + kafka) to ease development of Spark jobs and facilitate local integrated testing.
All other components are working fine but I'm having issues with HDFS. When the Data Node tries to connect to the name node, I get a DisallowedDataNodeException:
org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException: Datanode denied communication with namenode
Most questions related to this issue boil down to name resolution of the data node at the name node, either statically through the /etc/hosts files or using DNS. Static resolution is not an option with Docker, as I don't know the data nodes when the name node container is created. I would like to avoid creating and maintaining an additional DNS service. Ideally, I would like to wire everything up using the --link feature of Docker.
Is there a way to configure HDFS in such a way that it only uses IP addresses to work?
I found this property and set it to false, but it didn't do the trick:
dfs.namenode.datanode.registration.ip-hostname-check (default: true)
Is there a way to have a multi-node local HDFS cluster working only using IP addresses and without using DNS?
I would look at reconfiguring your Docker image to use a different hosts file [1]. In particular:
1. In the Dockerfile(s), do the switch-a-roo [1]
2. Bring up the master node
3. Bring up the data nodes, linked
4. Before starting the datanode, copy /etc/hosts over to the new location, /tmp/hosts
5. Append the master node's name and the master node's IP to the new hosts file
Hope this works for you!
[1] https://github.com/dotcloud/docker/issues/2267#issuecomment-40364340
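A rough entrypoint sketch for a datanode container covering steps 4 and 5 above (the link alias master, the exposed port 8020, and the resulting MASTER_PORT_8020_TCP_ADDR variable injected by --link are assumptions; pointing the resolver at /tmp/hosts is the part described in [1]):
#!/bin/sh
# copy the container's generated hosts file to the writable location
cp /etc/hosts /tmp/hosts
# append the master node's name and IP, taken from the --link environment
echo "${MASTER_PORT_8020_TCP_ADDR} master" >> /tmp/hosts
# then start the datanode as usual
exec "$@"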
