HBase - hbase:meta holds info about a non-existent RegionServer ID - "Master startup cannot progress, in holding-pattern until region onlined." - hadoop

I cannot start the HBase Master because I am getting this error:
[Thread-18] master.HMaster: hbase:meta,,1.1588230740
is NOT online; state={1588230740 state=OPEN, ts=1569328636085, server=regionserver17,16020,1566375930434};
ServerCrashProcedures=true.
Master startup cannot progress, in holding-pattern until region onlined.
The HBase Master shows as active and green, but it has not actually started properly: it keeps writing those WARNings to the logs, and I cannot even run list in the HBase shell, because then I get: ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
hbase:meta references the ID 1566375930434, which exists neither in the WALs nor in the zookeeper-client /hbase-unsecure/rs list.
I tried with these commands:
$ sudo -u hdfs hdfs dfs -rm -r /apps/hbase/data/WALs/
$ zookeeper-client rmr /hbase-unsecure/rs
I also tried this:
rm -f /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/zookeeper_0/version-2/*
and restarted HBase, but I still keep hitting the same issue.
Can anyone give me additional advice on what to try?
Thanks

We resolved this issue. The solution is to:
1. Stop HBase.
2. Log in to zookeeper-client as root.
3. Execute the command rmr /hbase-unsecure/meta-region-server
4. Start HBase.

You may have configured the Zookeeper data directory with an OS path. This error can happen when you start and stop HBase many times. I hit this case, so I configured the Zookeeper data directory with an HDFS path. This is my hbase-site.xml:
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>hdfs://master:9000/user/hdoop/zookeeper</value>
</property>
Good luck.

Related

Need help adding multiple DataNodes in pseudo-distributed mode (one machine), using Hadoop-0.18.0

I am a student, interested in Hadoop and started to explore it recently.
I tried adding an additional DataNode in the pseudo-distributed mode but failed.
I am following the Yahoo developer tutorial and so the version of Hadoop I am using is hadoop-0.18.0
I tried to start up using 2 methods I found online:
Method 1 (link)
I have a problem with this line
bin/hadoop-daemon.sh --script bin/hdfs $1 datanode $DN_CONF_OPTS
--script bin/hdfs doesn't seem to be valid in the version I am using. I changed it to --config $HADOOP_HOME/conf2, with all the configuration files in that directory, but when the script is run it gives the error:
Usage: Java DataNode [-rollback]
Any idea what does the error mean? The log files are created but DataNode did not start.
Method 2 (link)
Basically, I duplicated the conf folder to a conf2 folder, making the necessary changes documented on the website to hadoop-site.xml and hadoop-env.sh. Then I ran the command
./hadoop-daemon.sh --config ..../conf2 start datanode
it gives the error:
datanode running as process 4190. stop it first.
So I guess this is the 1st DataNode that was started, and the command failed to start another DataNode.
Is there anything I can do to start additional DataNode in the Yahoo VM Hadoop environment? Any help/advice would be greatly appreciated.
Hadoop start/stop scripts use /tmp as the default directory for storing the PIDs of already started daemons. In your situation, when you start the second datanode, the startup script finds the /tmp/hadoop-someuser-datanode.pid file left by the first datanode and assumes that a datanode daemon is already running.
The plain solution is to set the HADOOP_PID_DIR environment variable to something other than /tmp. Also, do not forget to update all the network port numbers in conf2.
The smart solution is to start a second VM with a Hadoop environment and join the two in a single cluster. That is how Hadoop is intended to be used.
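The collision can be illustrated by replaying the pid-file check that hadoop-daemon.sh performs before launching a daemon. A minimal sketch (the function name is mine; the pid-file naming follows the stock scripts):

```shell
# Reproduce the pid-file check hadoop-daemon.sh does before starting a
# daemon. HADOOP_PID_DIR defaults to /tmp, which is why a second datanode
# started by the same user collides with the first one's pid file.
check_datanode_pid() {
  pid_dir="${HADOOP_PID_DIR:-/tmp}"
  user_name="${USER:-$(id -un)}"
  pid_file="$pid_dir/hadoop-$user_name-datanode.pid"
  if [ -f "$pid_file" ] && kill -0 "$(cat "$pid_file")" 2>/dev/null; then
    echo "datanode running as process $(cat "$pid_file"). Stop it first."
    return 1
  fi
  echo "starting datanode (pid file: $pid_file)"
}

# A fresh HADOOP_PID_DIR means a fresh pid file, so a second instance starts:
HADOOP_PID_DIR=$(mktemp -d) check_datanode_pid
```

Exporting HADOOP_PID_DIR to a per-instance directory before launching the second datanode is exactly the fix described above.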

Hadoop - namenode is not starting up

I am trying to run Hadoop as the root user. I executed the namenode format command hadoop namenode -format while the Hadoop file system was running.
After this, when I try to start the namenode server, it shows an error like the one below:
13/05/23 04:11:37 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:330)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:411)
I searched for a solution but could not find a clear one.
Can anyone suggest something?
Thanks.
DFS needs to be formatted. Just issue the following command after stopping everything, and then restart:
hadoop namenode -format
Cool, I have found the solution.
Stop all running server
1) stop-all.sh
Edit the file /usr/local/hadoop/conf/hdfs-site.xml and add the configuration below if it is missing:
<property>
<name>dfs.data.dir</name>
<value>/app/hadoop/tmp/dfs/name/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>/app/hadoop/tmp/dfs/name</value>
<final>true</final>
</property>
Start both HDFS and MapReduce Daemons
2) start-dfs.sh
3) start-mapred.sh
Now run the rest of the steps to run the MapReduce sample given in this link.
Note: you should run the command bin/start-all.sh if the direct command does not work.
Format HDFS while the namenode is stopped (just like the top answer).
I will add some more details.
The format command checks for or creates path/dfs/name, and initializes or reinitializes it.
Then running start-dfs.sh starts the namenode, the datanodes, and the secondary namenode.
When the namenode finds that path/dfs/name does not exist or is not initialized, it raises a fatal error and exits.
That is why the namenode does not start up.
For more details, check HADOOP_COMMON/logs/XXX.namenode.log
Make sure the directory you've specified for your namenode is completely empty. Something like a "lost+found" folder in said directory will trigger this error.
The value in your hdfs-site.xml is wrong. You entered the wrong folder, which is why the namenode is not starting.
First mkdir [folder], then set hdfs-site.xml, then format.
Make sure that the name (dfs.name.dir) and data (dfs.data.dir) directories are correctly listed in hdfs-site.xml.
Formatting namenode worked for me
bin/hadoop namenode -format

FAILED: Error in metadata: MetaException(message:Got exception: java.net.ConnectException Call to localhost/127.0.0.1:54310 failed

I am using Ubuntu 12.04, hadoop-0.23.5, hive-0.9.0.
I specified my metastore_db separately to some other place $HIVE_HOME/my_db/metastore_db in hive-site.xml
Hadoop runs fine; jps shows ResourceManager, NameNode, DataNode, NodeManager, and SecondaryNameNode.
Hive started perfectly, metastore_db and derby.log were created, and all Hive commands ran successfully; I could create databases, tables, etc. But a few days later, when I run show databases or show tables, I get the error below:
FAILED: Error in metadata: MetaException(message:Got exception: java.net.ConnectException Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
I had this problem too and the accepted answer did not help me so will add my solution here for others:
My problem was that I had a single machine with a pseudo-distributed setup with Hive installed. It was working fine with localhost as the host name. However, when we decided to add multiple machines to the cluster, we also decided to give the machines proper names: "machine01, machine02, etc."
I changed all the Hadoop conf/*-site.xml files and the hive-site.xml file too, but still had the error. After exhaustive research I realized that Hive was picking up the URIs not from the *-site files, but from the metastore tables in MySQL. All the Hive table metadata is saved in two tables, SDS and DBS. After changing the DB_LOCATION_URI column in DBS and the LOCATION column in SDS to point to the latest namenode URI, I was back in business.
Hope this helps others.
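The metastore change described above boils down to two UPDATE statements against the DBS and SDS tables. The sketch below replays them on a throwaway SQLite copy just to show the shape of the change; the host names are examples, and on the real metastore you would run the same UPDATEs through the mysql client.

```shell
# Rewrite stale namenode URIs in a throwaway (SQLite) copy of the Hive
# metastore; the UPDATE statements are the actual fix, the CREATE/INSERT
# lines only fake up enough schema to demonstrate it.
db=$(mktemp)
sqlite3 "$db" <<'SQL'
CREATE TABLE DBS (DB_LOCATION_URI TEXT);
CREATE TABLE SDS (LOCATION TEXT);
INSERT INTO DBS VALUES ('hdfs://localhost:54310/user/hive/warehouse');
INSERT INTO SDS VALUES ('hdfs://localhost:54310/user/hive/warehouse/t1');
-- the fix: point every stored URI at the new namenode host
UPDATE DBS SET DB_LOCATION_URI =
  replace(DB_LOCATION_URI, 'hdfs://localhost:54310', 'hdfs://machine01:54310');
UPDATE SDS SET LOCATION =
  replace(LOCATION, 'hdfs://localhost:54310', 'hdfs://machine01:54310');
SQL
sqlite3 "$db" 'SELECT DB_LOCATION_URI FROM DBS;'
```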
Possible reasons for this:
If you changed your Hadoop/Hive version, you may be pointing at the previous Hadoop version (which has fs.default.name=hdfs://localhost:54310 in core-site.xml) in your hive-0.9.0/conf/hive-env.sh file.
$HADOOP_HOME may point to some other location.
The specified version of Hadoop is not working.
Your namenode may be in safe mode; run bin/hdfs dfsadmin -safemode leave or bin/hadoop dfsadmin -safemode leave
In the case of a fresh installation, the above problem can be the effect of a namenode issue.
Try formatting the namenode using the command:
hadoop namenode -format
1. Take your namenode out of safe mode. Try the command below:
hadoop dfsadmin -safemode leave
2. Restart your Hadoop daemons:
sudo service hadoop-master stop
sudo service hadoop-master start

Datanode process not running in Hadoop

I set up and configured a multi-node Hadoop cluster using this tutorial.
When I type in the start-all.sh command, it shows all the processes initializing properly as follows:
starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-namenode-jawwadtest1.out
jawwadtest1: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-jawwadtest1.out
jawwadtest2: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-jawwadtest2.out
jawwadtest1: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-secondarynamenode-jawwadtest1.out
starting jobtracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-jobtracker-jawwadtest1.out
jawwadtest1: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-tasktracker-jawwadtest1.out
jawwadtest2: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-tasktracker-jawwadtest2.out
However, when I type the jps command, I get the following output:
31057 NameNode
4001 RunJar
6182 RunJar
31328 SecondaryNameNode
31411 JobTracker
32119 Jps
31560 TaskTracker
As you can see, there's no datanode process running. I tried configuring a single-node cluster but got the same problem. Does anyone have any idea what could be going wrong here? Are there any configuration files not mentioned in the tutorial that I may have overlooked? I am new to Hadoop and am kind of lost, so any help would be greatly appreciated.
EDIT:
hadoop-root-datanode-jawwadtest1.log:
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/$
************************************************************/
2012-08-09 23:07:30,717 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loa$
2012-08-09 23:07:30,734 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapt$
2012-08-09 23:07:30,735 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:30,736 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:31,018 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapt$
2012-08-09 23:07:31,024 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:32,366 INFO org.apache.hadoop.ipc.Client: Retrying connect to $
2012-08-09 23:07:37,949 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: $
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(Data$
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransition$
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNo$
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java$
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNod$
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode($
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataN$
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.$
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1$
2012-08-09 23:07:37,951 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: S$
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at jawwadtest1/198.101.220.90
************************************************************/
You need to do something like this:
bin/stop-all.sh (or stop-dfs.sh and stop-yarn.sh in the 2.x series)
rm -Rf /app/tmp/hadoop-your-username/*
bin/hadoop namenode -format (or hdfs in the 2.x series)
The solution was taken from:
http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-troubleshooting/. Basically, it consists of restarting from scratch, so make sure you won't lose data by formatting the HDFS.
I ran into the same issue. I had created an hdfs folder '/home/username/hdfs' with sub-directories name, data, and tmp, which were referenced in the config XML files in hadoop/conf.
When I started Hadoop and ran jps, I couldn't find the datanode, so I tried to start it manually using bin/hadoop datanode. From the error message I then realized that it had a permissions issue accessing dfs.data.dir=/home/username/hdfs/data/, which was referenced in one of the Hadoop config files. All I had to do was stop Hadoop, delete the contents of the /home/username/hdfs/tmp/* directory, run chmod -R 755 /home/username/hdfs/, and then start Hadoop. I could find the datanode!
I faced similar issue while running the datanode. The following steps were useful.
In [hadoop_directory]/sbin directory use ./stop-all.sh to stop all the running services.
Remove the tmp dir using rm -r [hadoop_directory]/tmp (The path configured in [hadoop_directory]/etc/hadoop/core-site.xml)
sudo mkdir [hadoop_directory]/tmp (Make a new tmp directory)
Go to */hadoop_store/hdfs directory where you have created namenode and datanode as sub-directories. (The paths configured in [hadoop_directory]/etc/hadoop/hdfs-site.xml). Use
rm -r namenode
rm -r datanode
In */hadoop_store/hdfs directory use
sudo mkdir namenode
sudo mkdir datanode
In case of permission issue, use
chmod -R 755 namenode
chmod -R 755 datanode
In [hadoop_directory]/bin use
hadoop namenode -format (To format your namenode)
In [hadoop_directory]/sbin directory use ./start-all.sh or ./start-dfs.sh to start the services.
Use jps to check the services running.
Delete the datanode directory under your Hadoop folder, then rerun start-all.sh
I was having the same problem running a single-node pseudo-distributed instance. Couldn't figure out how to solve it, but a quick workaround is to manually start a DataNode with
hadoop-x.x.x/bin/hadoop datanode
Follow these steps and your datanode will start again.
Stop dfs.
Open hdfs-site.xml.
Remove the data.dir and name.dir properties from hdfs-site.xml and format the namenode again.
Then remove the hadoopdata directory, add the data.dir and name.dir properties back to hdfs-site.xml, and format the namenode again.
Then start dfs again.
You need to follow 3 steps.
(1) Go to the logs and check the most recent log (in hadoop-2.6.0/logs/hadoop-user-datanode-ubuntu.log)
If the error is:
java.io.IOException: Incompatible clusterIDs in /home/kutty/work/hadoop2data/dfs/data: namenode clusterID = CID-c41df580-e197-4db6-a02a-a62b71463089; datanode clusterID = CID-a5f4ba24-3a56-4125-9137-fa77c5bb07b1
i.e. the namenode cluster ID and the datanode cluster ID are not identical:
(2) Copy the namenode clusterID, which is CID-c41df580-e197-4db6-a02a-a62b71463089 in the error above.
(3) Replace the datanode cluster ID with the namenode cluster ID in hadoopdata/dfs/data/current/VERSION:
clusterID=CID-c41df580-e197-4db6-a02a-a62b71463089
Restart Hadoop. The DataNode will run.
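Steps (2) and (3) can be scripted with grep and sed instead of hand-editing. The sketch below fakes the mismatch in a scratch directory first; on a real node, point the two paths at your actual dfs.name.dir and dfs.data.dir instead:

```shell
# Simulate the mismatch with two throwaway VERSION files, then apply the
# fix: stamp the namenode's clusterID into the datanode's VERSION file.
work=$(mktemp -d)
mkdir -p "$work/dfs/name/current" "$work/dfs/data/current"
echo 'clusterID=CID-c41df580-e197-4db6-a02a-a62b71463089' > "$work/dfs/name/current/VERSION"
echo 'clusterID=CID-a5f4ba24-3a56-4125-9137-fa77c5bb07b1' > "$work/dfs/data/current/VERSION"

# The fix itself (replace $work/dfs/... with your real name and data dirs):
cid=$(grep '^clusterID=' "$work/dfs/name/current/VERSION" | cut -d= -f2)
sed -i "s/^clusterID=.*/clusterID=$cid/" "$work/dfs/data/current/VERSION"
grep '^clusterID=' "$work/dfs/data/current/VERSION"
```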
Stop all the services: ./stop-all.sh
Clear the HDFS tmp directory on the master and on every slave. Don't forget the slaves.
Format the namenode (hadoop namenode -format).
Now start the services on the namenode:
./bin/start-all.sh
This is what made the datanode service start for me.
Stop the dfs and yarn first.
Remove the datanode and namenode directories as specified in the core-site.xml file.
Re-create the directories.
Then re-start the dfs and the yarn as follows.
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
Hope this works fine.
Delete the files under $hadoop_User/dfsdata and $hadoop_User/tmpdata
then run:
hdfs namenode -format
finally run:
start-all.sh
Then your problem gets solved.
Check whether the tmp directory property is pointing to a valid directory in core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/data/tmp</value>
</property>
If the directory is misconfigured, the datanode process will not start properly.
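A quick way to test that condition from a shell before restarting anything (the helper name is mine; pass your own hadoop.tmp.dir value instead of the scratch directory used here):

```shell
# Check that a hadoop.tmp.dir candidate exists and is writable, which is
# what the datanode needs before it will start.
check_tmp_dir() {
  if [ -d "$1" ] && [ -w "$1" ]; then
    echo "ok: $1 exists and is writable"
  else
    echo "bad: $1 is missing or not writable"
    return 1
  fi
}

check_tmp_dir "$(mktemp -d)"   # stand-in for /home/hduser/data/tmp
```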
Run the commands below in order:
stop-all.sh (stop all the Hadoop processes)
rm -r /usr/local/hadoop/tmp/ (your Hadoop tmp directory, as configured in hadoop/conf/core-site.xml)
sudo mkdir /usr/local/hadoop/tmp (make the same directory again)
hadoop namenode -format (format your namenode)
start-all.sh (start all the Hadoop processes)
jps (it will show the running processes)
Step 1: stop-all.sh
Step 2: go to this path:
cd /usr/local/hadoop/bin
Step 3: run this command:
hadoop datanode
Now the DataNode works.
Check whether the hadoop.tmp.dir property in the core-site.xml is correctly set.
If you set it, navigate to that directory and remove or empty it.
If you didn't set it, navigate to its default folder, /tmp/hadoop-${user.name}, and likewise remove or empty it.
In the case of macOS (pseudo-distributed mode):
Open terminal
Stop dfs. 'sbin/stop-all.sh'.
cd /tmp
rm -rf hadoop*
Navigate to hadoop directory. Format the hdfs. bin/hdfs namenode -format
sbin/start-dfs.sh
Error in datanode.log file
$ more /usr/local/hadoop/logs/hadoop-hduser-datanode-ubuntu.log
Shows:
java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop_tmp/hdfs/datanode: namenode clusterID = CID-e4c3fed0-c2ce-4d8b-8bf3-c6388689eb82; datanode clusterID = CID-2fcfefc7-c931-4cda-8f89-1a67346a9b7c
Solution: Stop your cluster and issue the below command & then start your cluster again.
sudo rm -rf /usr/local/hadoop_tmp/hdfs/datanode/*
I found the details of the issue in the log file, like below:
"Invalid directory in dfs.data.dir: Incorrect permission for /home/hdfs/dnman1, expected: rwxr-xr-x, while actual: rwxrwxr-x"
From there I identified that the permission on my datanode folder was too permissive. I corrected it to 755 and it started working.
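The same fix as commands, shown on a scratch directory (on a real node you would run the chmod against the dfs.data.dir path from the log message):

```shell
# The datanode wants rwxr-xr-x (755) on dfs.data.dir; a group-writable
# mode such as 775 triggers the "Incorrect permission" error above.
data_dir=$(mktemp -d)      # stand-in for the real dfs.data.dir
chmod -R 775 "$data_dir"   # reproduce the too-permissive mode
chmod -R 755 "$data_dir"   # the fix
stat -c '%a' "$data_dir"   # prints 755
```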
Instead of deleting everything under the "hadoop tmp dir", you can set another one. For example, if your core-site.xml has this property:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/data/tmp</value>
</property>
You can change this to:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/data/tmp2</value>
</property>
and then scp core-site.xml to each node, and then "hadoop namenode -format", and then restart hadoop.
This is for newer versions of Hadoop (I am running 2.4.0).
In this case stop the cluster sbin/stop-all.sh
Then go to /etc/hadoop for config files.
In the file: hdfs-site.xml
Look out for directory paths corresponding to
dfs.namenode.name.dir
dfs.datanode.data.dir
Delete both the directories recursively (rm -r).
Now format the namenode via bin/hadoop namenode -format
And finally sbin/start-all.sh
Hope this helps.
You need to check:
/app/hadoop/tmp/dfs/data/current/VERSION and /app/hadoop/tmp/dfs/name/current/VERSION ---
compare the NamespaceID of the namenode and the datanode in those two files.
Your datanode will run if and only if the datanode's NamespaceID is the same as the namenode's NamespaceID.
If they differ, copy the namenode's NamespaceID over your datanode's NamespaceID using vi or gedit, save, and re-run the daemons; it will work perfectly.
If formatting the tmp directory does not work, then try this:
First stop all the entities (namenode, datanode, etc.); you will have some script or command to do that.
Format the tmp directory.
Go to /var/cache/hadoop-hdfs/hdfs/dfs/ and delete all the contents of the directory manually.
Now format your namenode again.
Start all the entities, then use the jps command to confirm that the datanode has started.
Now run whichever application you have.
Hope this helps.
I configured hadoop.tmp.dir in conf/core-site.xml
I configured dfs.data.dir in conf/hdfs-site.xml
I configured dfs.name.dir in conf/hdfs-site.xml
Deleted everything under "/tmp/hadoop-/" directory
Changed file permissions from 777 to 755 for the directory listed under dfs.data.dir
And the data node started working.
Even after removing and remaking the directories, the datanode wasn't starting.
So I started it manually using bin/hadoop datanode
It did not reach any conclusion. I opened another terminal under the same username, ran jps, and it showed me the running datanode process.
It works, but I just have to keep the unfinished terminal open on the side.
Follow these steps and your datanode will start again.
1) Stop dfs.
2) Open hdfs-site.xml.
3) Remove the data.dir and name.dir properties from hdfs-site.xml and format the namenode again.
4) Then start dfs again.
I got the same error. I tried to start and stop dfs several times and cleared all the directories mentioned in the previous answers, but nothing helped.
The issue was resolved only after rebooting the OS and configuring Hadoop from scratch. (Configuring Hadoop from scratch without rebooting didn't work.)
Once, when I was not able to find the datanode using jps, I deleted the
current folder in the Hadoop installation directory (/opt/hadoop-2.7.0/hadoop_data/dfs/data) and restarted Hadoop using start-all.sh and jps.
This time I could find the datanode, and the current folder was created again.
Try this
stop-all.sh
vi hdfs-site.xml
change the value given for property dfs.data.dir
format namenode
start-all.sh
I applied a mixed configuration, and it worked for me.
First >>
Stop all Hadoop services using
${HADOOP_HOME}/sbin/stop-all.sh
Second >>
Check mapred-site.xml, which is located at ${HADOOP_HOME}/etc/hadoop/mapred-site.xml, and change localhost to master.
Third >>
Remove the temporary folder created by Hadoop:
rm -rf //path//to//your//hadoop//temp//folder
Fourth >>
Add recursive permissions on the temp folder:
sudo chmod -R 777 //path//to//your//hadoop//temp//folder
Fifth >>
Now start all the services again, and first check that every service, including the datanode, is running.
mv /usr/local/hadoop_store/hdfs/datanode /usr/local/hadoop_store/hdfs/datanode.backup
mkdir /usr/local/hadoop_store/hdfs/datanode
hadoop datanode OR start-all.sh
jps

HBase connection exception

I am trying to run HBase in pseudo-distributed mode, but it doesn't work after I set hbase-site.xml.
Each time I try to run a command inside hbase shell I get this error:
ERROR:
org.apache.hadoop.hbase.ZooKeeperConnectionException:
org.apache.hadoop.hbase.ZooKeeperConnectionException:
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = connectionLoss for
/hbase
I set up ssh and made sure all the ports are correct.
Moreover, I cannot stop HBase through ./bin/stop-hbase.sh. I only get the following output:
stopping hbase........................................................
Pseudo-distributed means that you are running all of the processes on one machine. You need to check that all of the required processes are running:
Hadoop:
NameNode
DataNode
JobTracker
TaskTracker
Zookeeper:
HQuorumPeer
HBase:
HMaster
RegionServer
You also need to ensure that your hbase-site.xml contains the correct entries for zookeeper defining the host name and the port. The HBase FAQ and Wiki are really quite good. What are you missing from there?
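For reference, the zookeeper-related entries usually look something like this in pseudo-distributed mode (the values shown are the common local defaults, not taken from the question):

```xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
```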
It's because the HBase documentation has you setup your HDFS settings to point to port 8020, but the Hadoop instructions configure HDFS for port 9000.
Change hbase-site.xml settings that HBase recommends to point to port 9000 instead:
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
<description>The directory shared by RegionServers.
</description>
</property>
I had a similar issue and got the same error message as above. In my case, HMaster was not running. Using
sudo start-hbase.sh
resolved the issue.
I just fixed the problem by deleting the hbase.rootdir and hbase.zookeeper.property.dataDir folders. For example:
more conf/hbase-site.xml
gives me:
<property>
  <name>hbase.rootdir</name>
  <value>file:///somepath/hbase/testuser/hbase</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/somepath/hbase/testuser/zookeeper</value>
</property>
then remove the old data:
rm -fr /somepath/hbase/testuser/hbase
mkdir -p /somepath/hbase/testuser/hbase
rm -fr /somepath/hbase/testuser/zookeeper
mkdir -p /somepath/hbase/testuser/zookeeper
then to start it:
bin/start-hbase.sh
And finally I could connect to the local instance:
./bin/hbase shell
