DataNode does not start after changing the datanode directories parameter: DiskErrorException - hadoop

I have added a new disk to the Hortonworks sandbox on Oracle VM, following this example:
https://muffinresearch.co.uk/adding-more-disk-space-to-a-linux-virtual-machine/
I set the owner of the mounted disk directory to hdfs:hadoop recursively and gave it 777 permissions.
I added the mounted disk folder to the datanode directories, after a comma, using Ambari. I also tried changing the XML directly.
After the restart, the DataNode always crashes with DiskErrorException: Too many failed volumes.
Any ideas what I am doing wrong?

I found a workaround for this problem. I had mounted the disk at /mnt/sdb and was using that folder directly as the datanode directories entry, which failed.
But if I create /mnt/sdb/data and use that as the entry instead, the exception disappears and everything works like a charm.
No idea why.
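For reference, a minimal sketch of the working setup described above, assuming the new disk is already mounted at /mnt/sdb, the DataNode runs as hdfs:hadoop, and the existing data directory is /hadoop/hdfs/data (adjust the comma-separated value to whatever Ambari currently shows):

mkdir -p /mnt/sdb/data            # use a subdirectory, not the mount point itself
chown -R hdfs:hadoop /mnt/sdb/data
chmod -R 777 /mnt/sdb/data        # permissions as used in the question

<!-- hdfs-site.xml (or the "DataNode directories" field in Ambari), comma separated -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/hadoop/hdfs/data,/mnt/sdb/data</value>
</property>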

Related

No such file or directory: hdfs

I deployed Kubernetes on a single node using minikube and then installed Hadoop and HDFS with Helm. It's working well.
The problem is that when I try to copy a file from local to HDFS with $ hadoop fs -copyFromLocal /data/titles.csv /data, I get: No such file or directory
This is the path on local: [screenshot of the host's filesystem]
You've shown a screenshot of your host's filesystem GUI details panel.
Unless you mount the /data folder inside the k8s pod, there will be no /data folder you can put from.
In other words, you should get a similar error with just ls /data, and this isn't an HDFS problem since "local" means different things in different contexts.
You have at least 3 different "local" filesystems: your host, the namenode pod, the datanode pod, and possibly also the minikube driver (if using a VM).
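A minimal sketch of that idea, assuming the file should end up in HDFS under /data and that the namenode pod is called hdfs-namenode-0 (hypothetical name; check the real one with kubectl get pods):

# copy the file from the host into the namenode pod's local filesystem first
kubectl cp /data/titles.csv hdfs-namenode-0:/tmp/titles.csv
# then run the HDFS upload inside the pod, where the file now exists "locally"
kubectl exec hdfs-namenode-0 -- hadoop fs -copyFromLocal /tmp/titles.csv /data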

Whenever I restart my Ubuntu system (VBox) and start my Hadoop, my namenode is not working

Whenever I restart my Ubuntu system (VBox) and start my Hadoop, my namenode is not working.
To resolve this, I always have to recreate the namenode and datanode folders and format Hadoop every time I restart my system.
For two days I have been trying to resolve the issue, but it's not working. I tried giving 777 permissions to the namenode and datanode folders again, and I also tried changing their paths.
My error is:
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /blade/Downloads/Hadoop/data/datanode is in an inconsistent state: storage directory does not exist or is not accessible
Please help me to resolve the issue.
You cannot just shut down the VM. You need to cleanly stop the datanode and namenode processes, in that order; otherwise there is a potential for a corrupted HDFS, which would force you to reformat, assuming you don't have a backup.
I'd also suggest putting the Hadoop data for a VM on its own VM drive and mount, not in a shared host folder under Downloads.
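A sketch of a clean shutdown sequence before powering the VM off (assuming a standard Apache Hadoop install; the --daemon form is the Hadoop 3.x syntax, the sbin scripts also exist on 2.x):

# stop the HDFS daemons before shutting the VM down
hdfs --daemon stop datanode      # Hadoop 3.x (Hadoop 2.x: hadoop-daemon.sh stop datanode)
hdfs --daemon stop namenode      # Hadoop 3.x (Hadoop 2.x: hadoop-daemon.sh stop namenode)
# or stop all HDFS daemons at once
stop-dfs.sh
# only then power off the guest
sudo shutdown -h now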

hadoop.tmp.dir does not work in the right location

In my core-site.xml, I changed the hadoop.tmp.dir location to another, larger HDD (/data/hadoop_tmp); this HDD is not the Linux /tmp location. I then formatted my namenode and started DFS and YARN, and I believe it worked.
But data still ends up in the default location, and when I use Hive, the hive jar is loaded into the default location (/tmp); my /tmp is too small, so the Hive job fails.
I don't know why my config does not work.
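For reference, the override described above would look roughly like this in core-site.xml (the /data/hadoop_tmp path is the one from the question; the directory has to exist and be writable by the Hadoop user):

<!-- core-site.xml: move Hadoop's base temporary directory off the small /tmp partition -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop_tmp</value>
</property>

Note that Hive also has its own temporary/scratch directory settings, so this property alone may not move everything out of /tmp.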

Data node capacity increase is not detected by the HDFS

I have a virtual Hadoop cluster composed of 4 virtual machines (1 master and 3 slaves), and I recently added 100GB of capacity to the datanodes in the cluster. The problem is that HDFS does not detect this increase. I restarted the datanode and namenode and also formatted HDFS, but it had no effect. How can I fix this problem?
I'm writing the steps to add a datanode volume / increase the HDFS volume size:
Create a folder in the root, e.g. /hdfsdata
Create a folder under home, e.g. /home/hdfsdata
Provide permission to the 'hdfs' user on this folder: chown hdfs:hadoop -R /home/hdfsdata
Provide file/folder permissions on this folder: chmod 777 -R /home/hdfsdata
Mount this new folder: mount --bind /home/hdfsdata/ /hdfsdata/
Finally, add this newly created directory to hdfs-site.xml under the property "dfs.datanode.data.dir" as a comma-separated value.
After the above steps, restart the HDFS service, and your capacity is increased.
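Put together, those steps look roughly like this (paths as in the answer above; the existing data directory is assumed to be /hadoop/hdfs/data, so adjust the comma-separated value to your current setting):

mkdir /hdfsdata
mkdir /home/hdfsdata
chown hdfs:hadoop -R /home/hdfsdata
chmod 777 -R /home/hdfsdata
mount --bind /home/hdfsdata/ /hdfsdata/

<!-- hdfs-site.xml: keep the old directory and append the new one, comma separated -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/hadoop/hdfs/data,/hdfsdata</value>
</property>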

hdfs data directory "is in an inconsistent state: is incompatible with others."

Sorry, but this is getting on my nerves...
Exactly when I start loading a table through Hive, I start getting this error. And dear old Google is not able to help either.
My situation:
Single-node setup. The namenode is working properly.
The datanode startup is failing with this message:
ERROR datanode.DataNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /xxxxxx/hadoop/hdfs-data-dir is in an inconsistent state: is incompatible with others.
I have already tried to re-format my namenode, but it doesn't help.
I also tried to find ways to "format" my datanode, but with no success so far.
Help, please...
This site pointed me at a solution after a drive got reformatted:
I ran into a problem with Hadoop where it wouldn't start up after I reformatted a drive. To fix this, make sure the VERSION number is the same across all Hadoop directories:
md5sum /hadoop/sd*/dfs/data/current/VERSION
If they aren't the same version across all partitions, then you will get the error.
I simply copied the VERSION information from one of the other drives, changed permissions, and restarted HDFS.
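A sketch of that check and fix, assuming /hadoop/sdb is the drive that was reformatted and /hadoop/sda holds a good copy (hypothetical drive names matching the glob above):

# compare the VERSION files across all data partitions
md5sum /hadoop/sd*/dfs/data/current/VERSION
# copy the VERSION file from a healthy partition over the mismatched one
cp /hadoop/sda/dfs/data/current/VERSION /hadoop/sdb/dfs/data/current/VERSION
chown hdfs:hadoop /hadoop/sdb/dfs/data/current/VERSION
# then restart HDFS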
Found a fix.
I needed to:
create a fresh hdfs directory,
remove the write permissions from the group (chmod g-w xxxx) and
remove all temporary files from /tmp pertaining to hadoop/hdfs.
I am convinced that there is a better/cleaner way to fix this, so I am keeping the question open.
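A rough sketch of those three steps, using the data directory path from the error message above and assuming the daemons run as the hdfs user:

# 1. create a fresh HDFS data directory
mkdir -p /xxxxxx/hadoop/hdfs-data-dir
chown hdfs:hadoop /xxxxxx/hadoop/hdfs-data-dir
# 2. remove the group write permission
chmod g-w /xxxxxx/hadoop/hdfs-data-dir
# 3. clear Hadoop/HDFS temporary files left in /tmp (exact names depend on the user running the daemons)
rm -rf /tmp/hadoop-*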
