Unable to start the NameNode in Hadoop

I am running Hadoop on my local system, but when I run the ./start-all.sh command it starts everything except the NameNode. The NameNode fails with connection refused, and the log file prints the exception below:
java.io.IOException: There appears to be a gap in the edit log. We expected txid 1, but got txid 291.
Can you please help me?

Start the NameNode with the recover flag enabled. Use the following command:
./bin/hadoop namenode -recover
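A minimal sequence, assuming a Hadoop 2.x layout where the control scripts live under sbin/ (adjust paths and script names to your installation); recover mode runs interactively and asks how to handle the problems it finds:
./sbin/stop-dfs.sh               # make sure no NameNode is still running
./bin/hadoop namenode -recover   # answer the prompts; it attempts to read past the gap
./sbin/start-dfs.sh              # bring HDFS back up once recovery finishes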

The metadata in Hadoop NN consists of:
fsimage: contains the complete state of the file system at a point in time
edit logs: contains each file system change (file creation/deletion/modification) that was made after the most recent fsimage.
If you list all the files inside your NN workspace directory, you'll see files like:
fsimage_0000000000000000000 (fsimage)
fsimage_0000000000000000000.md5
edits_0000000000000003414-0000000000000003451 (edit logs; there are many of them, with different names)
seen_txid (a separated file contains last seen transaction id)
When the NN starts, Hadoop loads the fsimage and applies all edit logs, performing a lot of consistency checks along the way; it aborts if a check fails. Let's make that happen: if I rm edits_0000000000000000001-0000000000000000002 from the edit logs in my NN workspace and then try sbin/start-dfs.sh, I'll get an error message in the log like:
java.io.IOException: There appears to be a gap in the edit log. We expected txid 1, but got txid 3.
So your error message indicates that your edit logs are inconsistent (maybe corrupted, or maybe some of them are missing). If you just want to play with Hadoop locally and don't care about its data, you can simply run hadoop namenode -format to re-format it and start from the beginning; otherwise you need to recover your edit logs from the SNN or from wherever you backed them up.
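A rough sketch of the two options; the path below assumes the default hadoop.tmp.dir of /tmp/hadoop-$USER, so substitute whatever dfs.namenode.name.dir (or dfs.name.dir) points at on your system:
ls /tmp/hadoop-$USER/dfs/name/current          # fsimage_*, edits_*, seen_txid -- look for a gap in the edits ranges
cat /tmp/hadoop-$USER/dfs/name/current/seen_txid
# Option 1: the data is disposable -- re-format and start from scratch
./bin/hadoop namenode -format
# Option 2: the data matters -- copy the missing edits_* files back from the SNN or a backup, then start the NN again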

Related

How to restore version file for datanode after formatting namenode

I executed the hadoop namenode -format command. The NameNode runs just fine, but the DataNode cannot start. The VERSION file for the DataNode shows that it is a NAME_NODE. Before formatting I had deleted everything from /hadoop/hdfs/namenode/* and /hadoop/hdfs/data/*.
Now, every time I delete everything and re-format the NameNode, the DataNode doesn't start because of the incorrectly generated VERSION file. Googling didn't yield much.
It sounds like your version number doesn't match the NameNode's. The VERSION file is used to identify which NameNode the DataNode belongs to; once they no longer match, the DataNode refuses to join the NameNode because it believes it belongs to a different one.
Here's a good explanation of what you need to do.
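For reference, the DataNode's VERSION file lives under the data directory's current/ subfolder (with the paths from the question, /hadoop/hdfs/data/current/VERSION). The field values below are purely illustrative; what matters is that storageType says DATA_NODE and that clusterID matches the NameNode's VERSION:
# Illustrative VERSION contents on a Hadoop 2.x DataNode (values made up):
#   storageID=DS-...
#   clusterID=CID-8bf63244-0510-4db6-a949-8f74b50f2be9
#   cTime=0
#   storageType=DATA_NODE
#   layoutVersion=-56
grep -E 'clusterID|storageType' /hadoop/hdfs/namenode/current/VERSION /hadoop/hdfs/data/current/VERSION
# On a throwaway cluster the simplest fix is to wipe the data dir and let the DataNode re-register:
rm -rf /hadoop/hdfs/data/*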

NiFi ListHDFS cannot find directory, FileNotFoundException

I have a pipeline in NiFi of the form ListHDFS -> MoveHDFS. Attempting to run the pipeline, we see the following error log:
13:29:21 HSTDEBUG01631000-d439-1c41-9715-e0601d3b971c
ListHDFS[id=01631000-d439-1c41-9715-e0601d3b971c] Returning CLUSTER State: StandardStateMap[version=43, values={emitted.timestamp=1525468790000, listing.timestamp=1525468790000}]
13:29:21 HSTDEBUG01631000-d439-1c41-9715-e0601d3b971c
ListHDFS[id=01631000-d439-1c41-9715-e0601d3b971c] Found new-style state stored, latesting timestamp emitted = 1525468790000, latest listed = 1525468790000
13:29:21 HSTDEBUG01631000-d439-1c41-9715-e0601d3b971c
ListHDFS[id=01631000-d439-1c41-9715-e0601d3b971c] Fetching listing for /hdfs/path/to/dir
13:29:21 HSTERROR01631000-d439-1c41-9715-e0601d3b971c
ListHDFS[id=01631000-d439-1c41-9715-e0601d3b971c] Failed to perform listing of HDFS due to File /hdfs/path/to/dir does not exist: java.io.FileNotFoundException: File /hdfs/path/to/dir does not exist
Changing the ListHDFS path to /tmp seems to run OK, thus making me think that the problem is with my permissions on the directory I'm trying to list. However, changing the NiFi user to a user that can access that directory (e.g. via hadoop fs -ls /hdfs/path/to/dir) by setting the bootstrap.properties value run.as=myuser and restarting (see https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#bootstrap_properties) still produces the same problem for the directory. The literal directory string being used that is not working is:
"/etl/ucera_internal/datagov_example/raw-ingest-tracking/version-1/ingest"
Does anyone know what is happening here? Thanks.
Note: the Hadoop cluster I am accessing does not have Kerberos enabled (it is a secured MapR Hadoop cluster).
Update: It appears that the MapR Hadoop implementation is different enough that it requires special steps for NiFi to work properly with it (see https://community.mapr.com/thread/10484 and http://hariology.com/integrating-mapr-fs-and-apache-nifi/). I may not get a chance to work on this problem for some time (as certain requirements have changed), so I am leaving the links here for others who may hit this problem in the meantime.
Could you make sure you have entered the correct path? The directory needs to exist in HDFS.
It looks like the ListHDFS processor is not able to find the directory you have configured in the Directory property, and the logs are not showing any permission-denied issues.
If the logs do show permission denied, then you can change the user NiFi runs as in bootstrap.conf; NiFi needs a restart for that change to apply. Alternatively, change the permissions on the directory so that NiFi can access it.
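As a quick sanity check, run something like the following as the same user NiFi runs as, against the literal path from the question (this assumes the hadoop client on the NiFi host is configured for the same cluster):
hadoop fs -ls /etl/ucera_internal/datagov_example/raw-ingest-tracking/version-1/ingest
# If that fails, walk up the tree to find where it stops existing or stops being readable:
hadoop fs -ls /etl/ucera_internal/datagov_example/raw-ingest-tracking/version-1
hadoop fs -ls /etl/ucera_internal/datagov_example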

Spark/Hadoop can't read root files

I'm trying to read a file inside a folder that only I (and root) can read/write, through Spark. First I start the shell with:
spark-shell --master yarn-client
then I:
val base = sc.textFile("file:///mount/bases/FOLDER_LOCKED/folder/folder/file.txt")
base.take(1)
And got the following error:
2018-02-19 13:40:20,835 WARN scheduler.TaskSetManager:
Lost task 0.0 in stage 0.0 (TID 0, mydomain, executor 1):
java.io.FileNotFoundException: File file: /mount/bases/FOLDER_LOCKED/folder/folder/file.txt does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
...
I suspect that, as yarn/hadoop was launched with the user hadoop, it can't go further into this folder to get the file. How can I solve this?
OBS: This folder can't be opened to other users because it contains private data.
EDIT1: /mount/bases is network storage, mounted over a CIFS connection.
EDIT2: hdfs and yarn were launched with the user hadoop.
As hadoop is the user that launched hdfs and yarn, it is the user that will try to open the file in a job, so it must be authorized to access this folder. Fortunately, hadoop checks which user is executing the job before allowing access to a folder/file, so you are not taking risks there.
Well, if it had been an access-related issue with the file, you would have got 'access denied' as the error. In this particular scenario, I think the file that you are trying to read is not present at all, or has a slightly different name (typos). Just check the file name.
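A quick way to check both possibilities from the shell, using the path from the question; the setfacl step is only an option if the CIFS mount honours POSIX ACLs, which is an assumption here. Also bear in mind that with a file:// URL and YARN executors, the path has to be readable at the same location on every executor node, not just on the machine running the driver:
# Does the file exist, and can the user that launched HDFS/YARN actually read it?
sudo -u hadoop ls -l /mount/bases/FOLDER_LOCKED/folder/folder/file.txt
# If it exists but is unreadable, grant that one user narrow access instead of opening the folder to everyone:
setfacl -m u:hadoop:x /mount/bases/FOLDER_LOCKED /mount/bases/FOLDER_LOCKED/folder /mount/bases/FOLDER_LOCKED/folder/folder
setfacl -m u:hadoop:r /mount/bases/FOLDER_LOCKED/folder/folder/file.txt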

Hadoop on Mesos fails with "Could not find or load main class org.apache.hadoop.mapred.MesosExecutor"

I have a Mesos cluster setup -- I have verified that the master can see the slaves -- but when I attempt to run a Hadoop job, all tasks wind up with a status of LOST. The same error is present in all the slave stderr logs:
Error: Could not find or load main class org.apache.hadoop.mapred.MesosExecutor
and that is the only line in the stderr logs.
Following the instructions on http://mesosphere.io/learn/run-hadoop-on-mesos/, I have put a modified Hadoop distribution on HDFS which each slave can access.
In the lib directory of the Hadoop distribution, I have added hadoop-mesos-0.0.4.jar and mesos-0.14.2.jar.
I have verified that each slave does in fact download this Hadoop distribution, and that hadoop-mesos-0.0.4.jar contains the class org.apache.hadoop.mapred.MesosExecutor, so I cannot figure out why the class cannot be found.
I am using Hadoop from CDH4.4.0 and mesos-0.15.0-rc4.
Does any one have any suggestions as to what might be the problem? I know I would always start with a CLASSPATH problem, but, in this case, the mesos-slave is downloading, unpacking, and attempting to run a Hadoop TaskTracker so I would imagine any CLASSPATH would be setup by the mesos-slave.
In the stdout of the slave logs, the environment is printed. There is a MESOS_HADOOP_HOME which is empty. Should this be set to something? If it is supposed to be set to the downloaded Hadoop distribution, I cannot set it in advance because the Hadoop distribution is downloaded to a new location every time.
In the event that is related (some permissions issue maybe), when attempting to browse slave logs via the master UI, I get the error Error browsing path: ....
The user running mesos-slave can browse to the correct directory when I do so manually.
I found the problem. bin/hadoop of the downloaded Hadoop distribution attempts to find its own location by running which $0. However, that finds a pre-existing Hadoop installation if one is on the PATH (e.g. /usr/lib/hadoop) and loads the jars under that installation's lib directory instead of the downloaded one's lib directory.
I had to modify bin/hadoop of the downloaded distribution to find its own location with dirname $0 instead of which $0.
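The exact lines differ between Hadoop distributions, so the following is only a sketch of the kind of change, with hypothetical variable names:
# Before: picks up whatever hadoop is first on the PATH, e.g. /usr/lib/hadoop/bin/hadoop
# this=`which "$0"`
# After: resolve the script's own directory inside the tarball Mesos just unpacked
bin=$(cd "$(dirname "$0")" && pwd)
# ...then build the classpath from "$bin"/../lib rather than from the system-wide install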

Datanode failing in Hadoop on single Machine

I set up and configured a single-node (pseudo-distributed) Hadoop environment on Ubuntu 12.04 LTS using the following tutorial:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#formatting-the-hdfs-filesystem-via-the-namenode
After typing hadoop/bin $ start-all.sh
everything goes fine. Then I checked jps:
NameNode, JobTracker, TaskTracker and SecondaryNameNode have started, but the DataNode has not.
If anyone knows how to resolve this issue, please let me know.
Yes, I resolved it...
java.io.IOException: Incompatible namespaceIDs
If you see the error java.io.IOException: Incompatible namespaceIDs in the logs of a DataNode (logs/hadoop-hduser-datanode-.log), chances are you are affected by issue HDFS-107 (formerly known as HADOOP-1212).
The full error looked like this on my machines:
... ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID = 308967713; datanode namespaceID = 113030094
at org.apache.hadoop.dfs.DataStorage.doTransition(DataStorage.java:281)
at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:121)
at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:230)
at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:199)
at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:1202)
at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1146)
at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:1167)
at org.apache.hadoop.dfs.DataNode.main(DataNode.java:1326)
At the moment, there seem to be two workarounds, as described below.
Workaround 1: Start from scratch
I can testify that the following steps solve this error, but the side effects won’t make you happy (me neither). The crude workaround I have found is to (a command-line sketch follows the steps):
Stop the cluster
Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /app/hadoop/tmp/dfs/data
Reformat the NameNode (NOTE: all HDFS data is lost during this process!)
Restart the cluster
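A sketch of workaround 1, using the paths from this tutorial (hadoop.tmp.dir set to /app/hadoop/tmp); adjust to your own configuration:
bin/stop-all.sh
rm -rf /app/hadoop/tmp/dfs/data        # wipe the problematic DataNode's data directory
bin/hadoop namenode -format            # NOTE: destroys all HDFS data
bin/start-all.sh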
If deleting all the HDFS data and starting from scratch does not sound like a good idea (it might be OK during initial setup/testing), you might give the second approach a try.
Workaround 2: Updating namespaceID of problematic DataNodes
Big thanks to Jared Stehler for the following suggestion. I have not tested it myself yet, but feel free to try it out and send me your feedback. This workaround is “minimally invasive” as you only have to edit one file on the problematic DataNodes:
Stop the DataNode
Edit the value of namespaceID in the DataNode's current/VERSION file to match the value on the current NameNode (see the sketch after the file paths below)
Restart the DataNode
If you followed the instructions in my tutorials, the full path of the relevant files are:
NameNode: /app/hadoop/tmp/dfs/name/current/VERSION
DataNode: /app/hadoop/tmp/dfs/data/current/VERSION (background: dfs.data.dir is by default set to ${hadoop.tmp.dir}/dfs/data, and we set hadoop.tmp.dir in this tutorial to /app/hadoop/tmp).
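A sketch of workaround 2 with the same tutorial paths; the namespaceID value comes from the NameNode side of the error message above, so substitute whatever your own NameNode's VERSION reports:
# 1. stop the DataNode, then read the NameNode's namespaceID
grep namespaceID /app/hadoop/tmp/dfs/name/current/VERSION     # e.g. namespaceID=308967713
# 2. put that value into the DataNode's VERSION
sed -i 's/^namespaceID=.*/namespaceID=308967713/' /app/hadoop/tmp/dfs/data/current/VERSION
# 3. start the DataNode again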
The solution to the problem is clearly given on the following site:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids
