Spark I/O error constructing remote block - hadoop

I want to create a home-made Spark cluster with two computers on the same network. The setup is the following:
A) 192.168.1.9: Spark master with Hadoop HDFS installed
Hadoop has this core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://0.0.0.0:9000</value>
</property>
</configuration>
B) 192.168.1.6: Spark only (slave)
From B I want to access a file in A's HDFS using this Spark code:
...
# Load files
file_1 = "input_1.pkl"
file_2 = "input_2.pkl"
hdfs_base_path = "hdfs://192.168.1.9:9000/folderx/"
sc.addFile(hdfs_base_path + file_1)
sc.addFile(hdfs_base_path + file_2)
# Get files back
with open(SparkFiles.get(file_1), 'rb') as fw:
    # use fw
However, when I test the program on B by executing:
./spark-submit --master local program.py
The output is the following:
17/07/25 19:02:51 INFO SparkContext: Added file hdfs://192.168.1.9:9000/bigdata/input_1_new_grid.pkl at hdfs://192.168.1.9:9000/bigdata/input_1_new_grid.pkl with timestamp 1501002171301
17/07/25 19:02:51 INFO Utils: Fetching hdfs://192.168.1.9:9000/bigdata/input_1_new_grid.pkl to /tmp/spark-838c3774-36ec-4db1-ab01-a8a8c627b100/userFiles-b4973f80-be6e-4f2e-8ba1-cd64ddca369a/fetchFileTemp1979399086141127743.tmp
17/07/25 19:02:51 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
And later:
17/07/25 19:02:51 WARN DFSClient: Failed to connect to /127.0.0.1:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
The program tries to access 127.0.0.1:50010, which is wrong. Should I also install Hadoop on B? If that is not necessary, what is the correct configuration? Thank you!
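For anyone hitting the same trace: the namenode RPC itself is reachable (the addFile line resolved the path); it is the block-location reply from the namenode that points at 127.0.0.1, so the client on B tries to read the block from itself. A quick way to confirm this from B, as a sketch assuming the default datanode transfer port 50010:
# run on B (192.168.1.6)
nc -vz 192.168.1.9 9000    # namenode RPC: should succeed
nc -vz 192.168.1.9 50010   # datanode: if this succeeds, the datanode is reachable
                           # and the problem is only the loopback address the
                           # namenode hands out in block locations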

By the way, just in case anyone comes here looking for a solution: I fixed my issue by pointing quickstart.cloudera to the real IP address instead of 127.0.0.1.
The default /etc/hosts is
127.0.0.1 quickstart.cloudera quickstart localhost localhost.domain
What you want is
127.0.0.1 localhost localhost.domain
xxxIP_Address_oF_YOUR_VM quickstart.cloudera quickstart
You may want to modify /usr/bin/cloudera-quickstart-ip as well, because every time you restart your VM, the hosts file may get reset again.
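After editing /etc/hosts, a quick sanity check before retrying the job (a minimal sketch; quickstart.cloudera is the hostname from this VM, substitute your own):
getent hosts quickstart.cloudera   # should print the VM's real IP, not 127.0.0.1
ping -c 1 quickstart.cloudera      # same check, via ICMP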

Related

Hadoop java.net.SocketException: Network is unreachable

I'm executing this command on the namenode of a 4-node Hadoop cluster:
hadoop fs -ls /
But it shows an error:
ls: Failed on local exception: java.net.SocketException:
Network is unreachable; Host Details: local host is "namenode/172.16.1.2";
destination host is: "namenode":9000;
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:9000</value>
</property>
</configuration>
cat /etc/hosts:
172.16.1.2 namenode
172.16.1.3 datanode1
172.16.1.4 datanode2
172.16.1.5 datanode3
First, try to ping namenode and see what happens. If the ping reaches the host, check the firewall (iptables) on both your current machine and the namenode, because it is probably blocking the related traffic.
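Concretely, that check could look like this (a sketch using the hostnames and port from this cluster):
ping -c 3 namenode        # does the host resolve and answer at all?
nc -vz namenode 9000      # is the namenode RPC port reachable?
sudo iptables -L -n       # any REJECT/DROP rules covering port 9000?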
For me, setting this JVM option worked:
-Djava.net.preferIPv4Stack=true
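One common place to set it is hadoop-env.sh, so every Hadoop daemon and client picks it up (a sketch; adjust the path to your installation):
# in $HADOOP_CONF_DIR/hadoop-env.sh
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"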

how to connect localhost as client and server in hadoop same user in ubuntu

If the client and server are on the same Ubuntu machine, I am not able to connect. It gives this error:
Call From ashish/127.0.0.1 to localhost:54310 failed on connection
exception: java.net.ConnectException: Connection refused; For more
details see: http://wiki.apache.org/hadoop/ConnectionRefused
By server do you mean the namenode? By client do you mean a Hadoop client, datanode, or nodemanager? Are you sure the namenode is running and exposed on localhost:54310? Could you try:
> nc -vz localhost 54310
What does your /etc/hosts look like? What do your core-site.xml and hdfs-site.xml look like? What do you get (as your hadoop user) for:
> jps -ml
What do you get for:
> sudo iptables -L
Also take a look at:
Hadoop cluster setup - java.net.ConnectException: Connection refused
Hadoop Datanodes cannot find NameNode

hadoop Protocol message tag had invalid wire type

I set up a Hadoop 2.6 cluster using two nodes with 8 cores each on Ubuntu 12.04. sbin/start-dfs.sh and sbin/start-yarn.sh both succeed, and I can see the following after running jps on the master node:
22437 DataNode
22988 ResourceManager
24668 Jps
22748 SecondaryNameNode
23244 NodeManager
The jps output on the slave node is:
19693 DataNode
19966 NodeManager
I then run the PI example.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 30 100
which gives me this error log:
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "Master-R5-Node/xxx.ww.y.zz"; destination host is: "Master-R5-Node":54310;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
The problem seems to be with the HDFS file system, since the command bin/hdfs dfs -mkdir /user fails with a similar exception:
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "Master-R5-Node/xxx.ww.y.zz"; destination host is: "Master-R5-Node":54310;
where xxx.ww.y.zz is the IP address of Master-R5-Node.
I have checked and followed all the recommendations for ConnectionRefused on the Apache wiki and on this site.
Despite a week-long effort, I cannot get it fixed.
Thanks.
There are many possible causes of the problem I faced, but I finally fixed it using some of the following steps.
Make sure that you have the needed permissions on /hadoop and the HDFS temporary files (you have to figure out where those are in your particular case).
Remove the port number from fs.defaultFS in $HADOOP_CONF_DIR/core-site.xml. It should look like this:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://my.master.ip.address/</value>
<description>NameNode URI</description>
</property>
</configuration>
Add the following two properties to $HADOOP_CONF_DIR/hdfs-site.xml:
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
Voila! You should now be up and running!
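After changing core-site.xml and hdfs-site.xml you need to restart HDFS; a quick smoke test afterwards might look like this (a sketch, run from the Hadoop install directory on the master):
sbin/stop-dfs.sh
sbin/start-dfs.sh
bin/hdfs dfs -mkdir -p /user   # the command that failed before
bin/hdfs dfs -ls /             # should now list the new directory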

Fully Distributed HBase Error

I'm trying to set up HBase 0.96 to run on top of my Hadoop 2.2.0 cluster. I run start-hbase.sh, and the master along with the region servers start up. I can log into each region server and see the processes running. However, when I check how many region servers are up, either through the web UI or a shell command, I get a response of 0. Based on the logs, it looks like the region servers are starting up but are unable to notify the master that they are running. I confirmed that the master is listening on port 60000, and that ports 60000 and 60020 are both open. I've included my hbase-site.xml file along with the logs from a region server.
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master</value>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/master</value>
</property>
Log File:
2013-11-08 20:08:58,357 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=10.119.102.58,60000,1383941300240 with port=60020, startcode=1383941300420
2013-11-08 20:09:18,636 WARN [regionserver60020] regionserver.HRegionServer: error telling master we are up
com.google.protobuf.ServiceException: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connec$
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:5402)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1924)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:790)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending local=/100.65.$
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:573)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:858)
at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1532)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1421)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
... 5 more
2013-11-08 20:09:18,676 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
I don't think hbase.zookeeper.quorum is set correctly, which may cause the connection timeout. If you just want to test 0.96, start it in standalone mode first, and make sure the ZooKeeper cluster is running before you change to distributed mode.
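A quick way to confirm ZooKeeper is actually up before switching to distributed mode (a sketch; "master" and the default client port 2181 are taken from this setup):
echo ruok | nc master 2181   # a healthy ZooKeeper answers "imok"
echo stat | nc master 2181   # shows mode (standalone/leader/follower) and clients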
The HRegionServer complains that it can't connect to the HMaster in order to report that it is up.
It's probable that the HMaster process is not running, so you may want to start it; if you already started it, check the master log file.
Check that your master server is listening on port 60000 with the following command:
netstat -l
tcp6 0 0 Vostro-350:60000 [::]:* LISTEN
If the server is listening on IPv6, then disable it.
To disable it, append the following to /etc/sysctl.conf:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
After a reboot, you should validate that IPv6 is really off with:
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
(0 = IPv6 on; 1 = IPv6 off)
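If you would rather not reboot, the same settings can usually be applied immediately (assuming your kernel allows toggling IPv6 at runtime):
sudo sysctl -p                                 # reload /etc/sysctl.conf
cat /proc/sys/net/ipv6/conf/all/disable_ipv6   # should now print 1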
Ref link: Connecting and Persisting to HBase

hbase cannot connect to hadoop

I am running an HDFS instance in pseudo-distributed mode and tried to connect another HBase instance to it on the same server. The Hadoop logs are fine, but I constantly get this connection failure in HBase's log:
==================================================================================
2012-05-01 10:49:07,212 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
2012-05-01 10:49:07,213 WARN org.apache.zookeeper.ClientCnxn: Session 0x13708dc552d0001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2012-05-01 10:49:08,882 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
2012-05-01 10:49:08,882 WARN org.apache.zookeeper.ClientCnxn: Session 0x13708dc552d0001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
==================================================================================
Configuration of core-site.xml (Hadoop):
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Configuration of hbase-site.xml (HBase):
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
I also tried to replace localhost with the actual IP of the server, but got the same error.
First, make sure your HBase master node is running; you can use jps to check.
If it is not running, start it with the start-hbase.sh command or hbase master start.
Then check its status with other commands, like netstat -an | grep 9000.
Second, if the previous method does not work, check your firewall configuration, such as iptables and SELinux.
Use sudo iptables -L to check your iptables configuration. You can disable iptables with the sudo service iptables stop command on Red Hat-based Linux systems.
Use getenforce to check whether SELinux is in enforcing mode.
Third, check the system configuration, for example ssh.
You need to replace the core Hadoop jar at $HBASE_HOME/lib/hadoop-{{version}}-core.jar with the one at $HADOOP_HOME/hadoop-{{version}}-core.jar.
I ran into the same problem when I moved from HBase 0.90-xxx, which was working fine, to HBase 0.92: I copied hbase-env.sh and hbase-site.xml from the old HBase to the new one but forgot to copy the Hadoop core jar.
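A sketch of that swap, assuming the usual layout where HBase keeps its bundled jars under $HBASE_HOME/lib (set the old jar aside rather than deleting it):
mv $HBASE_HOME/lib/hadoop-*core*.jar /tmp/           # back up the bundled jar
cp $HADOOP_HOME/hadoop-*core*.jar $HBASE_HOME/lib/   # copy the jar from the running Hadoop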
I'm always suspicious when I see localhost in a config. When you use localhost, it also becomes very difficult (to impossible) to access any services of the pseudo-distributed system from a host other than the one you are running on.
You did say you tried the IP address, but you might want to make sure it is the IP address that the node really thinks it is at.
Check the ZooKeeper logs from when it starts up and see what IP address it "thinks" it is at. There should be a line like:
2012-01-31 09:32:46,083 - INFO [main:Environment#97] - Server environment:host.name=ip-10-8-127-58.ec2.internal
Then use the value of host.name in ALL Hadoop, HBase, Hive, ZooKeeper, etc. config files wherever a hostname is needed (assuming they are all on the same machine, since you said you are in pseudo-distributed mode).
Also, you did not show your hbase.zookeeper.quorum setting in hbase-site.xml. That is where HBase learns the ZooKeeper address:
<property>
<name>hbase.zookeeper.quorum</name>
<value>ip-10-8-127-58.ec2.internal</value>
</property>
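Once the quorum host is set, you can check that HBase will be able to reach ZooKeeper at that name (a sketch; 2181 is the default client port):
echo stat | nc ip-10-8-127-58.ec2.internal 2181   # a live ZooKeeper prints its mode and clients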
I think HBase cannot find the ZooKeeper quorum; you have to set the hbase.zookeeper.quorum property in hbase-site.xml. Also check whether the classpath is set properly; see this doc: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
