DataNode capacity increase is not detected by HDFS - hadoop

I have a virtual Hadoop cluster composed of 4 virtual machines (1 master and 3 slaves), and I recently added 100 GB of capacity to the datanodes in the cluster. The problem is that HDFS does not detect this increase. I restarted the datanodes and the namenode, and even formatted HDFS, but it had no effect. How can I fix this problem?

Here are the steps to add a datanode directory / increase the HDFS volume size:
Create a folder in the root, e.g. /hdfsdata
Create a folder under home, e.g. /home/hdfsdata
Give the 'hdfs' user ownership of this folder: chown hdfs:hadoop -R /home/hdfsdata
Set file/folder permissions on it: chmod 777 -R /home/hdfsdata
Bind-mount the new folder: mount --bind /home/hdfsdata/ /hdfsdata/
Finally, add this newly created directory to hdfs-site.xml under the property "dfs.datanode.data.dir" as a comma-separated value (see the sketch below).
After the above steps, restart the HDFS service and your capacity will be increased.
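For reference, here is a minimal sketch of what the dfs.datanode.data.dir property might look like after the change, assuming the existing data directory is /hadoop/hdfs/data (replace it with whatever your current value is):

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/hadoop/hdfs/data,/hdfsdata</value>
</property>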

Related

No such file or directory: hdfs

I deployed Kubernetes on a single node using minikube and then installed Hadoop and HDFS with Helm. It's working well.
The problem is that when I try to copy a file from local to HDFS with $ hadoop fs -copyFromLocal /data/titles.csv /data, I get: No such file or directory
This is the path on my local machine: (screenshot of the host's filesystem details panel)
You've shown a screenshot of your host's filesystem GUI details panel.
Unless you mount the /data folder inside the k8s pod, there will be no /data folder you can put from.
In other words, you should get a similar error from just ls /data inside the pod, and this isn't an HDFS problem, since "local" means different things in different contexts.
You have at least three different "local" filesystems - your host, the namenode pod, and the datanode pod - and possibly also the minikube driver (if using a VM).
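As a sketch of one workaround, assuming the namenode pod is called something like hdfs-namenode-0 (substitute the real name from kubectl get pods), you can copy the file into the pod first and run the HDFS command from inside it:

# copy the file from the host into the pod's local filesystem (pod name is hypothetical)
kubectl cp /data/titles.csv hdfs-namenode-0:/tmp/titles.csv
# inside the pod, /tmp/titles.csv is now a "local" file that the HDFS client can read
kubectl exec hdfs-namenode-0 -- hdfs dfs -copyFromLocal /tmp/titles.csv /data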

How do you create an HDFS data directory?

Every time my Hadoop server reboots, I have to format the namenode to start Hadoop. This removes all of the files in my Hadoop installation.
I need to move my HDFS storage location from /tmp to a permanent location so that I don't have to format the namenode whenever the server reboots.
I am very new to Hadoop.
How do I create an HDFS data directory in another location?
How do I reference this data directory in the config file so that I don't have to format the namenode?
These two properties in hdfs-site.xml determine where HDFS stores its data on the local filesystem.
The defaults are under /tmp:
dfs.namenode.name.dir
dfs.datanode.data.dir
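For example, a minimal hdfs-site.xml sketch pointing both properties at a permanent location (the /hadoop/hdfs/... paths are only illustrative; pick any directories that survive reboots and are owned by the hdfs user):

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///hadoop/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///hadoop/hdfs/data</value>
</property>

Pointing dfs.namenode.name.dir at a new, empty directory requires formatting the namenode one last time; after that, the metadata and blocks persist across reboots.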
You typically have to format a namenode only when the HDFS processes failed to terminate correctly (for example, after a power failure or forced shutdown). Running a standby NameNode is encouraged to prevent these scenarios.

DataNode not starting after changing the datanode directories parameter: DiskErrorException

I have added a new disk to the Hortonworks sandbox on OracleVM, following this example:
https://muffinresearch.co.uk/adding-more-disk-space-to-a-linux-virtual-machine/
I set the owner of the mounted disk directory to hdfs:hadoop recursively and gave it 777 permissions.
I added the mounted disk folder to the datanode directories after a comma using Ambari. I also tried changing the XML directly.
After the restart, the DataNode always crashes with a DiskErrorException: "Too many failed volumes".
Any ideas what I am doing wrong?
I found a workaround for this problem. I had mounted the disk to the /mnt/sdb folder and was using that folder as the datanode directories entry. But if I create /mnt/sdb/data and use it as the entry instead, the exception disappears and everything works like a charm.
No idea why.
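For anyone hitting the same thing, here is a rough sketch of that workaround as shell commands (paths and ownership as described above; adjust to your environment):

# create a dedicated subdirectory on the new mount instead of using the mount point itself
mkdir -p /mnt/sdb/data
chown -R hdfs:hadoop /mnt/sdb/data
# then add /mnt/sdb/data (not /mnt/sdb) to dfs.datanode.data.dir and restart the DataNode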

Hadoop removing a mount point folder from Cloudera

I've searched and read about removing mount point filesystems from Cloudera Hadoop, but I cannot find anything on how to remove them.
I have two SSD drives in each of 6 machines, and when I initially installed Cloudera Hadoop it added all of the filesystems; I only need two mount points to run a few teragen and terasort jobs.
I need to remove everything except for:
/dev/nvme0n1 and /dev/nvme1n1
In Cloudera Manager you can modify the list of drives used for HDFS data at:
Clusters > HDFS > Configuration > DataNode Default Group (or whatever you may have renamed this to) > DataNode Data Directory
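For illustration, after the change the DataNode Data Directory list would contain only directories that live on those two drives; the mount points below are hypothetical placeholders for wherever /dev/nvme0n1 and /dev/nvme1n1 are mounted:

/data/nvme0/dfs/dn
/data/nvme1/dfs/dn

Remove every other entry from the list and restart the DataNodes; blocks stored on the removed directories should be re-replicated from the remaining replicas (assuming a replication factor greater than 1).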

Hadoop DFSClient installation

I run a Hadoop cluster, and I'm interested in installing one more machine with the DFSClient only.
This machine (let's call it machine X) will not be part of the cluster.
Machine X will run the DFSClient, and I should be able to see HDFS from it.
In order to install the DFSClient, I copied the Hadoop home directory from one of the cluster's nodes to machine X (including the .jar files and configuration).
Then I run:
hadoop fs -ls /
I get the local root directory listed (not the HDFS root).
What am I doing wrong?
Copy core-site.xml and hdfs-site.xml and place them in a folder under your local Linux account's home dir. Then ensure that fs.defaultFS (in core-site.xml) points to the remote namenode. Then try hadoop --config <your_config_folder> fs -ls /, where your_config_folder is where you placed the configuration files.
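For example, a minimal core-site.xml sketch for machine X, where namenode.example.com:8020 is a stand-in for your actual namenode host and RPC port:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>

With that in place, hadoop --config <your_config_folder> fs -ls / should list the HDFS root instead of the local one.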
Technically it should work if the following steps are done:
The configuration files (*.xml) have been copied from the Hadoop cluster.
HADOOP_HOME is set to the copied Hadoop path.
Machine X has access to the cluster network.
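A rough sketch of that setup on machine X (the installation path is hypothetical):

# point HADOOP_HOME at the copied installation and HADOOP_CONF_DIR at the copied *.xml files
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
# with network access to the namenode, this should now list the HDFS root
hadoop fs -ls /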
