Whenever I restart my Ubuntu system (VBox) and start Hadoop, my namenode is not working - hadoop

Whenever I restart my Ubuntu system (VBox) and start Hadoop, my namenode is not working.
To resolve this, I always have to delete the namenode and datanode folders and format Hadoop every time I restart my system.
For the last two days I have been trying to resolve the issue, but it is not working. I tried giving 777 permissions to the namenode and datanode folders again, and I also tried changing their paths.
My error is:
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /blade/Downloads/Hadoop/data/datanode is in an inconsistent state: storage directory does not exist or is not accessible
Please help me to resolve the issue.

You cannot just shut down the VM. You need to cleanly stop the datanode and namenode processes, in that order; otherwise there is a potential for a corrupted HDFS, forcing you to reformat, assuming you don't have a backup system.
I'd also suggest putting Hadoop data for a VM on its own VM drive and mount point, not in a shared host folder under Downloads.
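A minimal sketch of a clean shutdown before powering off the VM, assuming HADOOP_HOME points at your installation (script locations vary slightly by Hadoop version):
# stop all HDFS daemons cleanly before shutting the VM down
$HADOOP_HOME/sbin/stop-dfs.sh
# or stop the daemons individually, datanode first, then namenode
$HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode
$HADOOP_HOME/sbin/hadoop-daemon.sh stop namenode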

Related

0 datanodes when copying file from local to hadoop

My OS is Windows 10.
Ubuntu 20.04.3 LTS (GNU/Linux 4.4.0-19041-Microsoft x86_64) installed on Windows 10.
When I copy a local file to Hadoop, I receive an error that 0 datanodes are available.
I am able to copy a file from Hadoop to a local folder, and I can see the file in the local directory using the command $ ls -l.
I am also able to create directories or files in Hadoop, but if I restart the Ubuntu terminal, those directories and files no longer exist; HDFS shows empty.
The steps I followed:
1. start-all.sh
2. jps
(datanodes missing)
3. copy the local file to hadoop
ERROR as 0 datanodes available
4. copy files from hadoop to local directory successful
If you stop/restart the WSL2 terminal without running stop-dfs or stop-all, you run the risk of corrupting the namenode, and it will need to be reformatted using hadoop namenode -format, not by rm-ing the namenode directory.
After formatting, you can restart the datanodes and they should become healthy again.
The same logic applies in a production environment, which is why you should always have a standby namenode for failover.
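A rough sketch of that recovery sequence, assuming a recent Hadoop release with its scripts on the PATH (note that formatting destroys the existing HDFS contents):
stop-dfs.sh                # make sure no HDFS daemons are still running
hdfs namenode -format      # reformat the namenode metadata (wipes existing HDFS data)
start-dfs.sh               # bring the namenode and datanodes back up
hdfs dfsadmin -report      # confirm that live datanodes are reported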

DataNode not started after changing the datanode directories parameter. DiskErrorException

I have added the new disk to the hortonworks sandbox on the OracleVM, following this example:
https://muffinresearch.co.uk/adding-more-disk-space-to-a-linux-virtual-machine/
I set the owner of the mounted disk directory to hdfs:hadoop recursively and gave it 777 permissions.
I added the mounted disk folder to the datanode directories (after a comma) using Ambari. I also tried changing the XML directly.
After the restart, the DataNode always crashes with DiskErrorException: Too many failed volumes.
Any ideas what I am doing wrong?
I found a workaround for this problem. I had mounted the disk at /mnt/sdb and used that folder itself as the datanode directories entry. But if I create /mnt/sdb/data and use that as the entry instead, the exception disappears and everything works like a charm.
No idea why.
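For reference, a hedged sketch of the layout that ended up working, assuming the disk is mounted at /mnt/sdb; the /hadoop/hdfs/data entry below is only a placeholder for whatever existing directory Ambari already lists:
mkdir -p /mnt/sdb/data                  # use a subdirectory, not the mount point itself
chown -R hdfs:hadoop /mnt/sdb/data
# then set the datanode directories (dfs.datanode.data.dir) in Ambari or hdfs-site.xml
# as a comma-separated list, e.g.:
#   /hadoop/hdfs/data,/mnt/sdb/data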

Need help adding multiple DataNodes in pseudo-distributed mode (one machine), using Hadoop-0.18.0

I am a student, interested in Hadoop and started to explore it recently.
I tried adding an additional DataNode in the pseudo-distributed mode but failed.
I am following the Yahoo developer tutorial, so the version of Hadoop I am using is hadoop-0.18.0.
I tried to start up using 2 methods I found online:
Method 1 (link)
I have a problem with this line
bin/hadoop-daemon.sh --script bin/hdfs $1 datanode $DN_CONF_OPTS
--script bin/hdfs doesn't seem to be valid in the version I am using. I changed it to --config $HADOOP_HOME/conf2 with all the configuration files in that directory, but when the script is run it gives the error:
Usage: Java DataNode [-rollback]
Any idea what the error means? The log files are created, but the DataNode did not start.
Method 2 (link)
Basically I duplicated the conf folder into a conf2 folder, making the necessary changes documented on the website to hadoop-site.xml and hadoop-env.sh. Then I ran the command:
./hadoop-daemon.sh --config ..../conf2 start datanode
It gives the error:
datanode running as process 4190. stop it first.
So I guess this is the 1st DataNode that was started, and the command failed to start another DataNode.
Is there anything I can do to start additional DataNode in the Yahoo VM Hadoop environment? Any help/advice would be greatly appreciated.
Hadoop's start/stop scripts use /tmp as the default directory for storing the PIDs of already started daemons. In your situation, when you start the second datanode, the startup script finds the /tmp/hadoop-someuser-datanode.pid file from the first datanode and assumes that the datanode daemon is already started.
The plain solution is to set the HADOOP_PID_DIR environment variable to something else (but not /tmp). Also, do not forget to update all network port numbers in conf2.
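A minimal sketch of that plain solution, assuming HADOOP_HOME is set and conf2 already points at its own dfs.data.dir and its own datanode port numbers (the PID directory below is just an example):
export HADOOP_PID_DIR=$HOME/hadoop-pids-dn2    # any writable directory other than /tmp
$HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_HOME/conf2 start datanode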
The smart solution is to start a second VM with a Hadoop environment and join them into a single cluster. That is the way Hadoop is intended to be used.

hadoop: different datanodes configuration in shared directory

I am trying to run Hadoop on clustered server machines.
The problem is that the server machines use shared directories, but the file directories are not physically on one disk. So I guess that if I configure a different datanode directory on each machine (slave), I can run Hadoop without a disk/storage bottleneck.
How do I configure the datanode differently on each slave, or
how do I configure the master node to find Hadoop installations that live in different directories on the slave nodes when starting the namenode and datanodes using "start-dfs.sh"?
Or, is there some fancy way for this environment?
Thanks!

How to Switch between namenodes in hadoop?

Pseudo-distributed mode cluster:
Suppose I first created a namenode on machine "A" with the name "Root1".
This will create an HDFS on that machine.
Now I copy some files to HDFS using copyFromLocal and run some MapReduce jobs.
Now I need to change some /conf files.
I change the config files, and to make them effective I format the namenode with the name "Root2".
If I browse the HDFS, it will be empty (meaning it will not contain the files that were copied earlier for "Root1").
If I want to see the old files (for "Root1"), is there any way to switch to that HDFS or namenode (Root2 to Root1)?
To be clear: did you launch another namenode on your machine?
Type sudo jps in the console, or open http://localhost:50070 in a browser, and check whether you have more than one datanode. If there is just one node, you have lost your data from HDFS. If you have two namenodes, you can check the filesystem in a web browser at http://localhost:50070.
Here are instructions for how to launch more than one datanode on one machine.
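For example, a quick check from the shell (port 50070 is the classic namenode web UI; the port may differ on newer releases):
sudo jps                                 # look for NameNode / DataNode entries
curl -s http://localhost:50070/ | head   # the namenode web UI responds only if it is running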
