Hadoop: ERROR BlockSender.sendChunks() exception

I have a Hadoop cluster (one master that acts as both namenode and datanode, and two slaves). I saw these error messages in the log files:
hadoop-hduser-datanode-master.log file:
2017-05-15 13:02:55,303 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: BlockSender.sendChunks() exception:
java.io.IOException: Broken pipe
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428)
at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493)
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:608)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:570)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:739)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:527)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:239)
at java.lang.Thread.run(Thread.java:748)
This happened only on the master node, after a period of inactivity. Fifteen minutes earlier, I had run a wordcount example successfully.
The OS in each node is Ubuntu 16.04. The cluster was created using VirtualBox.
Could you help me, please?

I followed this link:
https://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
to configure some memory-related parameters, and my problem was resolved!
Note: in some posts I read that this error could be caused by lack of disk space (not in my case), and in others the reason was the OS version (they recommended downgrading Ubuntu 16.04 to 14.04).
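For reference, these are the kinds of yarn-site.xml and mapred-site.xml properties that guide walks through. The values below are only an illustration for small VirtualBox VMs with limited RAM, not the settings from my cluster; size them to your own nodes:
<property><name>yarn.nodemanager.resource.memory-mb</name><value>1536</value></property>
<property><name>yarn.scheduler.minimum-allocation-mb</name><value>512</value></property>
<property><name>yarn.scheduler.maximum-allocation-mb</name><value>1536</value></property>
<property><name>mapreduce.map.memory.mb</name><value>512</value></property>
<property><name>mapreduce.reduce.memory.mb</name><value>1024</value></property>
(The first three go in yarn-site.xml, the last two in mapred-site.xml; restart YARN after changing them.)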

Related

HBase masters are going down after enabling Ranger plugin

I have a 3-node distributed HBase cluster (version 2.0.2) that was working properly. After I installed Apache Ranger from the same Ambari and enabled the HBase plugin, I restarted HBase. Immediately, both HBase masters went down with the logs below.
ERROR [Thread-16] master.HMaster: ***** ABORTING master ,16000,1585061451214: The coprocessor org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor threw java.lang.UnsatisfiedLinkError: /run/hbase/.cache/JNA/temp/jna2781046120401699711.tmp: /run/hbase/.cache/JNA/temp/jna2781046120401699711.tmp: failed to map segment from shared object *****
ERROR [Thread-16] master.HMaster: Failed to become active master
java.lang.NullPointerException
at org.apache.hadoop.hbase.master.HMaster.startProcedureExecutor(HMaster.java:1405)
at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:1310)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:930)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2234)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:570)
at java.lang.Thread.run(Thread.java:745)
If I disable the plugin, everything works fine.
We were experiencing a similar issue with a nearly identical stack trace to yours. Like you, everything worked only if the plugin was disabled.
For us, it turned out that the /run mount had a noexec flag, which prevented temporary files from being executed within it. The solution was to remount /run on the HBase master nodes using:
sudo mount -o remount,exec /run
After this we restarted the HBase services and everything was working again.
Note that this modified mount will not survive a reboot of the machine. For it to persist, you may need to add it to something like /etc/fstab.
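As a rough sketch only (on many systems /run is a tmpfs managed by systemd rather than fstab, so check how your distribution mounts it first), a persistent entry could look like:
tmpfs   /run   tmpfs   rw,nosuid,nodev,exec   0   0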

Apache Phoenix Installation not done properly

We are trying to install Phoenix 4.4.0 on HBase 1.0.0-cdh5.4.4 (a CDH 5.4.4 four-node cluster) via this installation document: Phoenix installation
Based on that, we copied phoenix-server-4.4.0-HBase-1.0.jar into the HBase lib folder on the master and on each region server, i.e. into /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hbase/lib on the master and the three region servers.
After that we restarted the HBase service via Cloudera Manager.
Everything seemed to be OK, but when we try to access the Phoenix shell via the ./sqlline.py localhost command, we get a ZooKeeper error like this:
15/09/09 14:20:51 WARN client.ZooKeeperRegistry: Can't retrieve clusterId from Zookeeper
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
So we are not sure the installation was done properly. Is any further configuration necessary?
We are not even sure whether we are using the sqlline command properly.
Any help will be appreciated.
After reinstalling the 4-node cluster on AWS, Phoenix is now working properly.
It's a pity that we don't know exactly what was really happening, but we think that after several changes to our config we broke something that made it impossible for Phoenix to work.
One thing to take into consideration is that the sqlline command has to be executed against a host that is in the ZooKeeper quorum, and this is something we were doing wrong: we were trying to run it from the namenode, which wasn't in the ZooKeeper quorum. Once we ran sqlline.py from a datanode, everything worked fine.
By the way, the installation guide we finally followed is Phoenix Installation
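For anyone hitting the same thing, this is roughly how we ended up invoking sqlline.py; the hostnames below are placeholders for whatever machines are actually in your ZooKeeper quorum:
./sqlline.py zk1.example.com,zk2.example.com,zk3.example.com:2181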

Hadoop jobtracker UI not accessible

I've configured Hadoop 1.0.4 in pseudo-distributed mode. Everything's good: I can put local files in HDFS and run the wordcount task. But I just can't access the jobtracker web UI at localhost:50030, and localhost:50070 doesn't work either.
HTTP ERROR 404
Problem accessing /jobtracker.jsp. Reason:
/jobtracker.jsp Powered by Jetty://
I looked at the log files, but there's no error...
I used to have a problem with the datanode, and the jobtracker complained about replication, but that is solved; now all daemons are good (namenode, datanode, jobtracker, tasktracker, secondarynamenode) and there is no error in any of the log files.
Any suggestions?
OK, finally I solved it myself: I had to re-install the system and then re-install Hadoop. I think the problem was that I had previously installed CDH4 on my system, which is Hadoop 2.0.0, and even though I uninstalled all of its packages (Debian system) and changed the tmp folder of HDFS, maybe there was still something left. The only way was to start over.
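As a side note for anyone debugging this before resorting to a re-install: one quick sanity check is to confirm whether the jobtracker and namenode web UIs (ports 50030 and 50070 by default in Hadoop 1.x) are actually listening, for example with:
sudo netstat -tlnp | grep -E '50030|50070'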

FAILED: Error in metadata: MetaException(message:Got exception: java.net.ConnectException Call to localhost/127.0.0.1:54310 failed

I am using Ubuntu 12.04, hadoop-0.23.5, hive-0.9.0.
I specified my metastore_db separately, at another location ($HIVE_HOME/my_db/metastore_db), in hive-site.xml.
Hadoop runs fine; jps shows ResourceManager, NameNode, DataNode, NodeManager, SecondaryNameNode.
Hive started perfectly: metastore_db and derby.log were created, and all Hive commands ran successfully; I could create databases, tables, etc. But a few days later, when I run show databases or show tables, I get the error below:
FAILED: Error in metadata: MetaException(message:Got exception: java.net.ConnectException Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
I had this problem too and the accepted answer did not help me, so I will add my solution here for others:
My problem was that I had a single machine with a pseudo-distributed setup with Hive installed. It was working fine with localhost as the host name. However, when we decided to add multiple machines to the cluster, we also decided to give the machines proper names ("machine01", "machine02", etc.).
I changed all the Hadoop conf/*-site.xml files and the hive-site.xml file too, but still had the error. After exhaustive research I realized that Hive was picking up the URIs not from the *-site.xml files, but from the metastore tables in MySQL. All the Hive table metadata is saved in two tables, SDS and DBS. After changing the DB_LOCATION_URI column in DBS and the LOCATION column in SDS to point to the latest namenode URI, I was back in business.
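Roughly, the updates looked like the following, run against the MySQL metastore database (back it up first; "machine01" stands in for your actual namenode host, and the old URI should match whatever your tables currently contain):
UPDATE DBS SET DB_LOCATION_URI = REPLACE(DB_LOCATION_URI, 'hdfs://localhost:54310', 'hdfs://machine01:54310');
UPDATE SDS SET LOCATION = REPLACE(LOCATION, 'hdfs://localhost:54310', 'hdfs://machine01:54310');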
Hope this helps others.
Reasons for this:
If you changed your Hadoop/Hive version, you may be specifying the previous Hadoop version (which has fs.default.name=hdfs://localhost:54310 in core-site.xml) in your hive-0.9.0/conf/hive-env.sh file (see the snippet after this list).
$HADOOP_HOME may point to some other location.
The specified version of Hadoop is not working.
Your namenode may be in safe mode; run bin/hdfs dfsadmin -safemode leave or bin/hadoop dfsadmin -safemode leave.
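For example, the core-site.xml of the Hadoop installation that hive-env.sh points to should contain an entry consistent with the address Hive is trying to reach (fs.default.name is the Hadoop 0.23/1.x property name; newer releases use fs.defaultFS). The value below simply mirrors the address from the error message above:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>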
In case of a fresh installation, the above problem can be the effect of a namenode issue. Try formatting the namenode using the command:
hadoop namenode -format
1. Take your namenode out of safe mode. Try the command below:
hadoop dfsadmin -safemode leave
2. Restart your Hadoop daemons:
sudo service hadoop-master stop
sudo service hadoop-master start

Datanode failing in Hadoop on single Machine

I set up and configured a pseudo-distributed (single-node) Hadoop environment on Ubuntu 12.04 LTS using the following tutorial:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#formatting-the-hdfs-filesystem-via-the-namenode
After typing hadoop/bin$ start-all.sh, everything seemed to go fine. Then I checked jps:
NameNode, JobTracker, TaskTracker, and SecondaryNameNode had started, but the DataNode had not started...
If anyone knows how to resolve this issue, please let me know.
Yeah, I resolved it...
java.io.IOException: Incompatible namespaceIDs
If you see the error java.io.IOException: Incompatible namespaceIDs in the logs of a DataNode (logs/hadoop-hduser-datanode-.log), chances are you are affected by issue HDFS-107 (formerly known as HADOOP-1212).
The full error looked like this on my machines:
... ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID = 308967713; datanode namespaceID = 113030094
at org.apache.hadoop.dfs.DataStorage.doTransition(DataStorage.java:281)
at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:121)
at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:230)
at org.apache.hadoop.dfs.DataNode.(DataNode.java:199)
at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:1202)
at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1146)
at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:1167)
at org.apache.hadoop.dfs.DataNode.main(DataNode.java:1326)
At the moment, there seem to be two workarounds, as described below.
Workaround 1: Start from scratch
I can testify that the following steps solve this error, but the side effects won’t make you happy (me neither). The crude workaround I have found is to:
Stop the cluster
Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /app/hadoop/tmp/dfs/data
Reformat the NameNode (NOTE: all HDFS data is lost during this process!)
Restart the cluster
If deleting all the HDFS data and starting from scratch does not sound like a good idea (it might be OK during the initial setup/testing), you might give the second approach a try.
Workaround 2: Updating namespaceID of problematic DataNodes
Big thanks to Jared Stehler for the following suggestion. I have not tested it myself yet, but feel free to try it out and send me your feedback. This workaround is “minimally invasive” as you only have to edit one file on the problematic DataNodes:
Stop the DataNode
Edit the value of namespaceID in the DataNode's current/VERSION file to match the value on the current NameNode (see the sketch after the file paths below)
Restart the DataNode
If you followed the instructions in my tutorials, the full paths of the relevant files are:
NameNode: /app/hadoop/tmp/dfs/name/current/VERSION
DataNode: /app/hadoop/tmp/dfs/data/current/VERSION (background: dfs.data.dir is by default set to ${hadoop.tmp.dir}/dfs/data, and we set hadoop.tmp.dir in this tutorial to /app/hadoop/tmp).
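Using the IDs from the log above as an illustration, the DataNode's current/VERSION file looks roughly like this (storageID, cTime and layoutVersion vary per installation and should be left untouched; only namespaceID is edited, here from 113030094 to the NameNode's 308967713):
namespaceID=113030094
storageID=DS-...
cTime=0
storageType=DATA_NODE
layoutVersion=...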
The solution for this problem is clearly given on the following site:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids
