Restarting NameNode in Hadoop Cluster without format

For some reason I had to shut down the master node of my cluster, and when we start the cluster again the namenode won't run unless we format it again. Is there any solution to start the namenode without formatting? Tried everything...
start-all.sh, and starting the namenode/datanodes individually, but the namenode won't start until I format it again. How can I start the namenode without formatting?
Thanks in advance

Please post the log information.
In fact, you needn't format when you restart Hadoop, because the HDFS metadata is stored on disk; if you format the namenode, that metadata will be lost.
You can check whether a namenode process still exists after you stop the cluster, using the command ps -e | grep java. If one does, kill it and start the namenode again.
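A minimal sketch of that check, assuming a standard install with the Hadoop scripts on the PATH (the kill target below is a placeholder PID):

# after stop-all.sh, look for a leftover NameNode JVM
ps -e | grep java
# if one is still running, kill it by PID, then restart just the namenode
kill <pid>
hadoop-daemon.sh start namenode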

Related

Corrupted block in hdfs cluster

The screenshot added below shows the output of hdfs fsck /. It shows that the "/" directory is corrupted. This is the master node of my Hadoop cluster. What should I do?
If you are using Hadoop 2, you can run a Standby NameNode to achieve High Availability. Without that, your cluster's master is a single point of failure.
You cannot retrieve the namenode's data from anywhere else, since it is different from the usual data you store. If your namenode goes down, your blocks and files will still be there, but you won't be able to access them, since there would be no related metadata in the namenode.
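For inspecting the corruption itself, a hedged sketch using standard hdfs fsck options (run as the HDFS superuser; -delete permanently removes files whose blocks are gone, so keep it as a last resort; the suspect path is a placeholder):

# list only the files that have corrupt or missing blocks
hdfs fsck / -list-corruptfileblocks
# show block-level detail for a suspect path
hdfs fsck /suspect/path -files -blocks -locations
# last resort: delete files whose blocks cannot be recovered
hdfs fsck / -delete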

Does hard restart of agent delete data?

I did a hard restart of my agent and now I do not see any data in hadoop fs -ls /user/hue or hadoop fs -ls /user/hive. Where did it go? I also do not see my other users, only hue and hive. What do I do?
I don't think data in HDFS should go anywhere because of that.
If I query my tables in Hive, I keep getting
The operation has no results
Help please!
Doing a hard restart of the Cloudera Manager agent will not cause data loss, but it will cause all of the Hadoop daemons to be restarted. A normal restart of the agent does not do this, so a hard restart is useful if you need to force a stop of all the running processes.
If you are seeing no data in HDFS following a restart, check the status of the HDFS service in Cloudera Manager. It will tell you how much capacity is used in HDFS, the number of files, and other metrics. If you're seeing no data, it could be that your DataNodes have not been started. Check whether this is the case and whether your NameNode is still in safe mode.
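Both checks can also be made from a shell on the cluster; a small sketch using standard HDFS commands:

# is the namenode still in safe mode?
hdfs dfsadmin -safemode get
# how many datanodes are live, and how much capacity do they report?
hdfs dfsadmin -report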

How do I safely remove a Hadoop datanode for maintenance?

I want to take a single machine out of a Hadoop cluster temporarily.
Most documentation says to take it out by adding it to the yarn and dfs .exclude files. I don't want to add it to the dfs.exclude and yarn.exclude files and decommission it with hdfs dfsadmin -refreshNodes, though, because I want to take it out, make some changes to the machine, and bring it back online as soon as possible. I don't want hundreds of gigabytes of data to be copied around just to avoid under-replicated blocks!
Instead, I'd like to be able to power off the machine quickly while making sure:
The cluster as a whole is still operational.
No data is lost by the journalnode or nodemanager processes.
No YARN jobs fail or go AWOL when the process dies.
My best guess at how to do this is by issuing:
./hadoop-daemon.sh --hosts hostname stop datanode
./hadoop-daemon.sh --hosts hostname stop journalnode
./yarn-daemon.sh --hosts hostname stop nodemanager
And then starting each of these processes individually again when the machine comes back online.
Is that safe? And is there a more efficient way to do this?
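For reference, a sketch of that same sequence when run on the node itself (hadoop-daemon.sh and yarn-daemon.sh act on the local machine; the ordering here, YARN side before HDFS side, is an assumption, not a documented procedure):

# on the machine being taken down
yarn-daemon.sh stop nodemanager    # its containers are lost unless NM recovery is configured
hadoop-daemon.sh stop datanode     # reads fall back to replicas on other nodes
hadoop-daemon.sh stop journalnode  # only safe while a quorum of journalnodes stays up
# ...do the maintenance, then restart in reverse order
hadoop-daemon.sh start journalnode
hadoop-daemon.sh start datanode
yarn-daemon.sh start nodemanager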

After restart, HBase ZooKeeper logs Quorum.Learner: Got zxid 0x100000001 expected 0x1

I am performing some tests using HBase and Hadoop. I set up a cluster with one master, two ZooKeeper nodes and four region servers. Up until yesterday everything was working perfectly well; starting from today it simply doesn't start anymore.
When executing start-hbase.sh, all the processes come up:
HMaster using ports 8020 and 60010
HQuorumPeer using ports 2181 and 3888
HRegionServer
However, when I take a look at the server logs, it seems the servers got stuck for some reason...
The HRegionServer stops printing after a WARNING about a native library that I was supposed to be using
The HQuorumPeer on node 1 prints a WARNING about getting a zxid 0x10000000001 when it expected 0x1
The HQuorumPeer on the other node prints nothing at all
Does anyone have any idea about this?
Thanks.
Well, I am far, far away from being considered an hbase/hadoop expert. In fact, this is just the first time I am playing around with it. Probably the problem I faced was related to an improper shutdown or a corrupted file somewhere in the hbase/hadoop pair.
So here is my tip if you find yourself in the same situation (a script sketch follows the list):
cleanup all hbase logs, in my case at $HBASE_INSTALL/logs/*
cleanup all zookeeper data, in my case at /var/zookeeper/*
cleanup all hadoop data, in my case at /var/hadoop/*
cleanup all hdfs logs, in my case at /var/hdfs/log/*
cleanup all hdfs namenode data, in my case at /var/hdfs/namenode/*
cleanup all hdfs datanode data, in my case at /var/hdfs/datanode/*
format your hdfs cluster typing the command hdfs namenode -format
IMPORTANT: Don't do that if you have data; you will probably lose all of it. I could do it only because I am using the cluster for test purposes.
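A minimal shell sketch of the cleanup above, assuming the same paths as in my setup and that the HBase/Hadoop stop/start scripts are on the PATH; again, this destroys all stored data:

stop-hbase.sh; stop-dfs.sh
# wipe logs and every piece of stored state (DESTROYS all data)
rm -rf $HBASE_INSTALL/logs/* /var/zookeeper/* /var/hadoop/* \
       /var/hdfs/log/* /var/hdfs/namenode/* /var/hdfs/datanode/*
hdfs namenode -format
start-dfs.sh; start-hbase.sh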
I will keep reading about hbase/hadoop in order to understand it better; anyway, I can guarantee that it is far from being a "plug and play" tool when compared to Cassandra.
Hope this can help.
Regards

Region server going down frequently after system start

I am running HBase on HDP on an Amazon machine.
When I reboot my system and start all the HBase services, they start fine.
But after some time my region server goes down.
The latest error I am getting in its log file is:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /apps/hbase/data/usertable/dd5a251551619e0109349a0dce855e1b/recovered.edits/0000000000000001172.temp could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1657)
Now I am not able to start it.
Any suggestion why this is happening?
Thanks in advance.
Make sure your datanodes are up and running. Also, set "dfs.data.dir" to some permanent location if you haven't done so yet; it defaults to a directory under "/tmp", which gets emptied at each restart. Also, make sure that your datanodes are able to talk to the namenode, that there is no network-related issue, and that the datanode machines have enough free space left.
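Both points can be checked from the shell; a sketch assuming a Hadoop 2 CLI (on Hadoop 1 use hadoop dfsadmin -report instead, and the property is named dfs.data.dir rather than dfs.datanode.data.dir):

# how many datanodes does the namenode see, and how full are they?
hdfs dfsadmin -report
# where is the datanode actually writing its blocks?
hdfs getconf -confKey dfs.datanode.data.dir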
