I am trying to access a firewalled Hadoop cluster running YARN via a SOCKS proxy. The cluster itself does not use proxied connections -- only my client, running on a local machine (e.g. a laptop), is connected via ssh -D 9999 user@gateway-host to a machine that can see the Hadoop cluster.
In the Hadoop configuration core-site.xml (on my laptop) I have the following lines:
<property>
<name>hadoop.socks.server</name>
<value>localhost:9999</value>
</property>
<property>
<name>hadoop.rpc.socket.factory.class.default</name>
<value>org.apache.hadoop.net.SocksSocketFactory</value>
</property>
Accessing HDFS this way works great. However, when I try to submit a YARN job, it fails and I can see in the logs that the nodes are not able to talk to each other:
java.io.IOException: Failed on local exception: java.net.SocketException: Connection refused; Host Details : local host is: "host1"; destination host is: "host2":8030;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
where host1 and host2 are both part of the Hadoop cluster.
I guess what is happening is that the Hadoop nodes are trying to communicate via a SOCKS proxy as well, which obviously fails since no proxy server exists on each host. Is there a way to fix this apart from setting up a dedicated proxy server?
You are right: the Hadoop nodes must not use the SOCKS proxy for their communication. You can achieve that by marking the SocketFactory setting as final on the cluster side.
In core-site.xml on the cluster, add the final tag to the default SocketFactory property:
<property>
<name>hadoop.rpc.socket.factory.class.default</name>
<value>org.apache.hadoop.net.StandardSocketFactory</value>
<final>true</final>
</property>
Obviously, you must restart the cluster services for the change to take effect.
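A quick way to sanity-check which socket factory each side ends up using after the restart (assuming a Hadoop 2.x installation with the hdfs command on the PATH) is to query the resolved configuration on the laptop and on a cluster node:
# on the laptop this should print org.apache.hadoop.net.SocksSocketFactory,
# on a cluster node it should print org.apache.hadoop.net.StandardSocketFactory
hdfs getconf -confKey hadoop.rpc.socket.factory.class.default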
Related
I'm trying to set up two types of Hadoop clusters: one standalone via SSH to localhost and the other on AWS EC2.
Both fail with similar issues: a connection refused error.
Here are some pictures of the issues. This is the result of ssh localhost:
The next is the failed run.
This is the relevant portion of ~/.ssh/config.
I can run hadoop, hdfs, yarn, and all the other commands. But when I actually type this and run it, it fails:
Of note, I'm following this tutorial for the AWS EC2 cluster (the failing command is almost at the end): https://awstip.com/setting-up-multi-node-apache-hadoop-cluster-on-aws-ec2-from-scratch-2e9caa6881bd
It is failing on this command: scp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml ubuntu@ec2-54-209-221-47.compute-1.amazonaws.com:/home/ubuntu/hadoop/conf
That's not my EC2 link; it's from the example, but that's where it's failing with the same error as in the 2nd and 4th pictures.
I want to access Cloudera from a remote machine via Talend for Big Data. In order to do that, I changed the IP of the host in Cloudera by editing the files /etc/hosts and /etc/sysconfig/network.
I can access Cloudera from Talend. However, the problem is that my DataNode and NameNode do not seem to be connected. When I check the log details of my DataNode I get the following errors:
Block pool BP-1183376682-127.0.0.1-1433878467265 (Datanode Uuid null) service to quickstart.cloudera/127.0.0.1:8022 beginning handshake with NN
Initialization failed for Block pool BP-1183376682-127.0.0.1-1433878467265 (Datanode Uuid null) service to quickstart.cloudera/127.0.0.1:8022 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=5802ab81-2c28-4beb-b155-cac31567d23a, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-83500942-6c65-4ecd-a0c2-a448be86bcb0;nsid=975591446;c=0)
The DataNode still uses the wrong IP address (127.0.0.1) even though I replaced the previous IP address with the new one in core-site.xml, hdfs-site.xml, and mapred-site.xml.
I followed the steps given in this tutorial to do so:
https://www.youtube.com/watch?v=fSGpYHjGIRY
How can I fix this error?
On Debian 8, /etc/hosts will contain an entry for 127.0.1.1 with the hostname you gave during Linux installation. Cloudera will use this IP address for some of its services.
A regular HDFS cluster contains multiple servers with different hostnames/IP addresses and lists those IPs as allowed. As your log says, the traffic is originating from 127.0.0.1, which is not the IP address of your hostname.
For a Cloudera single-server setup, the only way I found was to do the initial setup so that /etc/hosts doesn't have the 127.0.1.1 entry in it.
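In practice that means making sure, before the initial setup, that /etc/hosts maps the hostname to the machine's real address and has no 127.0.1.1 line. A minimal sketch, assuming quickstart.cloudera as the hostname and 192.168.1.50 as a placeholder for the machine's actual static IP:
127.0.0.1 localhost
# no "127.0.1.1 quickstart.cloudera" line here
192.168.1.50 quickstart.cloudera quickstart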
I'm using the Cloudera distribution of Hadoop and recently had to change the IP addresses of a few nodes in the cluster. After the change, on one of the nodes (old IP: 10.88.76.223, new IP: 10.88.69.31) the following error comes up when I try to start the DataNode service.
Initialization failed for block pool Block pool BP-77624948-10.88.65.174-13492342342 (storage id DS-820323624-10.88.76.223-50010-142302323234) service to hadoop-name-node-01/10.88.65.174:6666
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(10.88.69.31, storageID=DS-820323624-10.88.76.223-50010-142302323234, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=cluster25;nsid=1486084428;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:656)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3593)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:899)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:91)
Has anyone had success changing the IP address of a Hadoop DataNode and joining it back to the cluster without data loss?
CHANGE HOST IP IN CLOUDERA MANAGER
Change the host IP on all nodes:
sudo nano /etc/hosts
Edit the Cloudera config.ini on all nodes if the master node IP changes:
sudo nano /etc/cloudera-scm-agent/config.ini
Change the IP in the PostgreSQL database.
To find the password, open the Cloudera Manager database properties file:
cat /etc/cloudera-scm-server/db.properties
Find the password line, for example:
com.cloudera.cmf.db.password=gUHHwvJdoE
Open PostgreSQL
psql -h localhost -p 7432 -U scm
Query the hosts table:
select name,host_id,ip_address from hosts;
Update the IP address for the affected host:
update hosts set ip_address = 'xxx.xxx.xxx.xxx' where host_id=x;
Exit the tool
\q
Restart the agent service on all nodes:
service cloudera-scm-agent restart
Restart the server service on the master node:
service cloudera-scm-server restart
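After the restarts, it is worth confirming that the agents are heartbeating to the server again. A minimal check, assuming the default Cloudera Manager agent log location:
sudo service cloudera-scm-agent status
# the agent log should show successful heartbeats to the (new) server address
sudo tail -n 50 /var/log/cloudera-scm-agent/cloudera-scm-agent.log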
It turns out it's better to:
Decommission the server from the cluster to ensure that all blocks are replicated to other nodes in the cluster.
Remove the server from the cluster
Connect to the server, change the IP address, then restart the Cloudera agent
Notice that Cloudera Manager now shows two entries for this server. Delete the entry with the old IP and the longest heartbeat time
Add the server to the required cluster and add the required roles back to the server (e.g. HDFS DataNode, HBase RegionServer, YARN)
HDFS will read all data disks and recognize the block pool and cluster IDs, then register the datanode.
All data will be available and the process will be transparent to any client.
NOTE: If you run into name resolution errors from HDFS clients, the application has most likely cached the old IP and will need to be restarted. In particular, Java clients that previously referenced this server (e.g. HBase clients) must be restarted, because the JVM caches IPs indefinitely; they will keep throwing connectivity errors against the old IP until they are restarted.
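If restarting every affected client right away is not practical, the JVM's address caching can also be shortened. A sketch, assuming a JDK layout where the security properties file lives under $JAVA_HOME/jre/lib/security and treating the 60-second TTL as an example value:
# in $JAVA_HOME/jre/lib/security/java.security: cache successful lookups
# for 60 seconds instead of indefinitely
networkaddress.cache.ttl=60
# or, per process, as a JVM startup flag:
# java -Dsun.net.inetaddr.ttl=60 <your client's usual arguments>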
Ultra-noob here. I have a server machine running CDH3u1 in pseudo-distributed mode, and a client machine with a Java application using the CDH3u1 API.
How do I configure the client to talk to the server? I've been googling for hours and couldn't find where the "client configuration" file is. The "hdfs-default", "core-default" and "mapred-default" files and their "-site" counterparts all look like server (NameNode and DataNode) config to me.
Is it just a multipurpose client/server config from which I should cherry-pick the attributes that are appropriate to the client? Which are they? I'm probably missing something big here...
Thanks, Ido
Make sure that the client machine can reach the Hadoop server machine's IP. If you use VirtualBox for the Hadoop server (the CDH3 VM), add a "host-only" network interface (see details here: host-only networking with VirtualBox). I'm assuming that the static IP of your Hadoop server is 192.168.56.101 and that you're able to ping it from your client.
Configure a hostname for your Hadoop server machine on both the server and the client machine. If you want to name your Hadoop server "local-elephant", add the following line to /etc/hosts on both machines: 192.168.56.101 local-elephant.
On the server machine, go to /etc/hadoop/conf and change the values of the following properties from "localhost" to "local-elephant": in core-site.xml the value of fs.default.name, and in mapred-site.xml the value of mapred.job.tracker.
On the client machine, create core-site.xml and mapred-site.xml in the classpath of your Java application. In those files put only the fs.default.name and mapred.job.tracker properties, as sketched below.
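For reference, here is a minimal sketch of what those two client-side files might contain. The ports are assumptions based on the usual CDH3 defaults (8020 for the NameNode, 8021 for the JobTracker); use whatever your server-side core-site.xml and mapred-site.xml actually specify.
<!-- core-site.xml on the client -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://local-elephant:8020</value>
</property>
</configuration>
<!-- mapred-site.xml on the client -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>local-elephant:8021</value>
</property>
</configuration>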
I have a network with some weird (as I understand) DNS server which causes Hadoop or HBase to malfunction.
It resolves my hostname to some address my machine doesn't know about (i.e. there is no such interface).
Hadoop does work if I have the following entries in /etc/hosts:
127.0.0.1 localhost
127.0.1.1 myhostname
If the entry "127.0.1.1 myhostname" is not present, uploading a file to HDFS fails and complains that it can replicate the file only to 0 datanodes instead of 1.
But in this case HBase does not work: creating a table from the HBase shell causes NotAllMetaRegionsOnlineException (actually caused by HMaster trying to bind to the wrong address returned by the DNS server for myhostname).
In the other network, I am using the following /etc/hosts:
127.0.0.1 localhost
192.168.1.1 myhostname
And both Hadoop and HBase work.
The problem is that in the second network the address is dynamic, so I can't list it in /etc/hosts to override the result returned by the weird DNS.
Hadoop is run in pseudo-distributed mode. HBase also runs on single node.
Changing behavior of DNS server is not an option.
Changing "localhost" to 127.0.0.1 in hbase/conf/regionservers doesn't change anything.
Can somebody suggest a way to override its behavior while retaining the internet connection (I actually work on the client's machine through TeamViewer)? Or some way to configure HBase (or the ZooKeeper it manages) not to use the hostname to determine the address to bind to?
Luckily, I've found a workaround to this DNS server problem.
The DNS server returned an invalid address when queried with the local hostname.
HBase by default does a reverse DNS lookup on the local hostname to determine where to bind.
Because the address returned by the DNS server was invalid, HMaster wasn't able to bind.
Workaround:
In hbase/conf/hbase-site.xml, explicitly specify the interfaces that will be used for the master and the regionserver:
<configuration>
<property>
<name>hbase.master.dns.interface</name>
<value>lo</value>
</property>
<property>
<name>hbase.regionserver.dns.interface</name>
<value>lo</value>
</property>
</configuration>
In this case, I specified the loopback interface (lo) to be used for both the master and the regionserver.
Here is a simple tool I wrote to check for DNS issues:
https://github.com/sujee/hadoop-dns-checker