I'm working on a cluster with HBase.
One node crashed a couple of days ago. I restarted the cluster; since then, the root region has been in transition despite all my efforts.
70236052 -ROOT-,,0.70236052 state=CLOSING, ts=Wed Apr 10 15:06:04 CEST
2013 (417729s ago), server=NODE09...
I tried to:
restart HBase
remove the service and re-install it
remove the service and install the master onto another node
install 2 different HBase versions
format the HDFS namenode
delete the HBase files from the HDFS filesystem
HBase still reports this region as in transition.
I tried to access the .META. table:
org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for
after 7 tries
I attempted to use the command /bin/hbase hbck:
org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for
after 10 tries.
I'm out of ideas for solving this issue.
Does someone have any suggestions?
Regards
It might be that the clock on the node hosting the problematic region server is not synchronized with the rest of the cluster. Check that NTP is configured correctly.
In any event, check the log of the problematic region server.
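A minimal way to check both points, assuming ntpq is available on the nodes and the default HBase log location (adjust host names and paths to your installation):

    # compare clock offsets across the nodes (the offset column should be close to zero)
    ntpq -p
    # or compare wall-clock time on the node named in the transition message against this one
    ssh NODE09 date; date
    # then look for the CLOSING region in that node's region server log
    grep 70236052 /var/log/hbase/hbase-*-regionserver-*.log | tail -n 50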
I am using an Apache Hadoop 2.7.1 cluster which consists of 4 data nodes and two name nodes, because it is highly available.
It is deployed on CentOS 7
and began working on 01-08-2017.
We know that logs are generated for each service;
let's take the current logs for example:
hadoop-root-datanode-dn1.log
hadoop-root-datanode-dn2.log
where root is the user I am logging in with.
My problem is:
in the dn1 log I can find info from 01-08-2017 until today,
but the dn2 log does not have the complete history; it is emptied every day, so it only has info related to today.
Is there any property to control this behavior, or is it a CentOS problem?
Any help please?
By default, the .log files are rotated daily by log4j. This is configurable with /etc/hadoop/conf/log4j.properties.
https://blog.cloudera.com/blog/2009/09/apache-hadoop-log-files-where-to-find-them-in-cdh-and-what-info-they-contain/
Not to suggest you're running a Cloudera cluster, but if you were, those files would not be deleted; they're rolled and renamed.
Oh, and I would suggest not running your daemons as root. Most Hadoop installation guides explicitly have you create an hdfs or hadoop user.
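A quick way to see which appender and retention settings are in play, and what the rolled files look like (property names follow the stock Apache Hadoop log4j.properties; the log directory varies by distribution, so treat the paths below as placeholders):

    # which appender and rotation settings are configured for the daemon logs?
    grep -E 'hadoop.root.logger|RollingFileAppender|MaxFileSize|MaxBackupIndex' /etc/hadoop/conf/log4j.properties
    # daily rolling renames old logs rather than deleting them, e.g. hadoop-root-datanode-dn2.log.2017-08-01
    ls -l /var/log/hadoop/ | grep 'datanode-dn2'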
I deleted multiple old files (HiveLogs/MR-Job intermediate files) from HDFS location /temp/hive-user/hive_2015*.
After that, I noticed my four-node cluster responding very slowly and showing the following issues.
I restarted my cluster; it worked fine for 3-4 hours, but then it started showing the same issues again:
The HDFS health page loads very slowly.
File browsing is very slow.
The NameNode logs are filling up with "Blocks does not belongs to any File" messages.
All operations on my cluster are slow.
I found it could be because I deleted the HDFS files: according to HDFS JIRAs HDFS-7815 and HDFS-7480, when a huge number of files is deleted the NameNode cannot remove the corresponding blocks promptly, since it is busy with the many deletion tasks. This is a known bug in older versions of Hadoop (older than 2.6.0).
Can anyone please suggest a quick fix without upgrading my Hadoop cluster or installing a patch?
How can I identify those orphan blocks and delete them from HDFS?
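For reference, a few read-only checks that show the NameNode's view of block state while it works through the pending deletions (standard hdfs CLI commands; run them as the HDFS superuser):

    # overall datanode/block summary: under-replicated, missing, and (on newer releases) pending-deletion counts
    hdfs dfsadmin -report
    # filesystem health; corrupt and over-replicated blocks appear in the summary at the end
    hdfs fsck / | tail -n 30
    # dump the NameNode's block data structures to a file under its log directory for offline inspection
    hdfs dfsadmin -metasave pending-blocks.txt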
I have installed HDP with Ambari on three nodes in VMs. I restarted one of the three nodes (datanode2), and after that I lost the heartbeat from that node in Ambari. I restarted ambari-agent on all three nodes, but that did not help either. Kindly suggest a solution.
Well, the provided information is not sufficient; anyway, I will describe the normal approach I take to debug this.
First, check if all the ambari-agents are running, using the command ambari-agent status.
Check the logs of both ambari-agent and ambari-server. Normally the logs are available at /var/log/ambari-agent and /var/log/ambari-server. The logs should tell you the exact reason for the heartbeat loss.
The most common reasons for agent failure are connection issues between the machines, a version mismatch, or a corrupt database entry.
I think log files should help you.
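A concrete version of those steps on the lost node (default Ambari log locations; adjust if your install puts them elsewhere):

    # is the agent process alive, and does it restart cleanly?
    ambari-agent status
    ambari-agent restart
    # the agent log usually names the heartbeat failure (SSL errors, hostname mismatch, time skew, ...)
    tail -n 100 /var/log/ambari-agent/ambari-agent.log
    # on the Ambari server host, the server log shows what it thinks of that agent
    tail -n 100 /var/log/ambari-server/ambari-server.log
    # heartbeat loss after a reboot is often a hostname/DNS change, so confirm the FQDN still matches Ambari's registration
    hostname -f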
How do I troubleshoot and recover a Lost Node in my long running EMR cluster?
The node stopped reporting a few days ago. The host seems to be fine and HDFS too. I noticed the issue only from the Hadoop Applications UI.
EMR nodes are ephemeral and you cannot recover them once they are marked as LOST. You can avoid this in the first place by enabling the 'Termination Protection' feature during cluster launch.
Regarding the reason for the LOST node, you can check the YARN ResourceManager logs and/or the instance-controller logs of your cluster to find out more about the root cause.
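A sketch of where to look, assuming SSH access to the master node (log paths vary slightly by EMR release, so treat them as starting points):

    # list all nodes, including ones YARN has marked LOST or UNHEALTHY
    yarn node -list -all
    # the ResourceManager log on the master node records when and why a node stopped heartbeating
    less /var/log/hadoop-yarn/yarn-yarn-resourcemanager-*.log
    # EMR's instance-controller log on the affected node (also shipped to the cluster's S3 log bucket)
    less /emr/instance-controller/log/instance-controller.log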
I am running HBase on HDP on an Amazon machine.
When I reboot my system and start all HBase services, they come up,
but after some time my region server goes down.
The latest error I am getting in its log file is:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /apps/hbase/data/usertable/dd5a251551619e0109349a0dce855e1b/recovered.edits/0000000000000001172.temp could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1657)
Now I am not able to start it.
Any suggestion why this is happening?
Thanks in advance.
Make sure your datanodes are up and running. Also, set "dfs.data.dir" to a permanent location if you haven't done so yet; it defaults to a directory under "/tmp", which gets emptied at each restart. Also make sure that your datanodes can talk to the namenode, that there are no network-related issues, and that the datanode machines have enough free space left.
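A few quick checks for each of those points (command names are for Hadoop 2.x; on a 1.x-era HDP use "hadoop dfsadmin" instead, and substitute your actual data directory):

    # how many datanodes does the namenode see as live, and how much space do they report?
    hdfs dfsadmin -report
    # where is block data actually stored? (dfs.datanode.data.dir; dfs.data.dir on older releases)
    hdfs getconf -confKey dfs.datanode.data.dir
    # make sure that directory is not under /tmp and that its disk is not full
    df -h /hadoop/hdfs/data    # replace with the directory reported above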