Hadoop DFSClient installation - hadoop

I run Hadoop cluster and I'm interested to install one more machine with DFSClient only.
This machine (lets call it machine X) will not be a part of the cluster.
Machine X will run DFSClient and I should be able to see HDFS from it.
In order to install DFSClient, I copied Hadoop home directory from one of cluster's node to machine X (including .jar files and configuration).
Then I run:
hadoop fs -ls /
I get the local ROOT directory (not HDFS ROOT).
What am I doing wrong?

Copy hdfs-site.xml and place in a folder under your local linux account home dir. Then ensure that your name node (default.fs.name) is pointing to the remote namenode. Then try hadoop --config <your_config_folder> fs -ls / where your_config_folder is where you placed your hdfs-site.xml.

Technically it should work if the following steps are done
If you have copied the configuration files(*.xml) from the hadoop cluster.
HADOOP_HOME set with the copied hadoop path.
Machine X should have access to the cluster network

Related

0 datanodes when copying file from local to hadoop

My OS is Windows 10.
Ubuntu 20.04.3 LTS (GNU/Linux 4.4.0-19041-Microsoft x86_64) installed on Windows 10.
When I copy the local file to hadoop, I am receiving an error as 0 datanodes available.
I am able to copy the file from hadoop to local folder. I can see the file in local directory using the command $ ls -l
Also I am able to create directory or files in hadoop. But if restart the ubuntu terminal again, there is no such directory or files exist. It shows empty.
The steps I followed:
1. start-all.sh
2. jps
(datanodes missing)
3. copy the local file to hadoop
ERROR as 0 datanodes available
4. copy files from hadoop to local directory successful
If you stop/restart the WSL2 terminal without running stop-dfs or stop-all, you run the risk of corrupting the namenode, and it needs to be reformatted using hadoop namenode -format, not rm the namenode directory.
After formatting, you can restart the datanodes and they should become healthy again.
Same logic applies in a production environment, which is why you should always have a standby namenode for failover

No Such file or directory : hdfs

I deployed Kubernetes on a single node using minikube and then installed hadoop and hdfs with helm. It's working well.
The problem is when i try to copy a file from local to hdfs $ hadoop fs -copyFromLocal /data/titles.csv /data i get this: No such file or directory
this is the path on local :
You've shown a screenshot of your host's filesystem GUI details panel.
Unless you mount /data folder inside the k8s pod, there will be no /data folder you can put from.
In other words, you should get a similar error with just ls /data, and this isn't an HDFS problem since "local" means different things in different contexts.
You have at least 3 different "local" filesystems - your host, the namenode pod, the datanode pod, and possibly also the minikube driver (if using a VM)

Hadoop removing a mount point folder from Cloudera

I've searched and I've been reading on Cloudera Hadoop on removing mount point file systems but I cannot find a thing on removing them.
I have two SSD drives in 6 machines and when I initially installed Cloudera Hadoop it added all file systems and I only need two mount points to run a few teragen and terasorts.
I need to remove everything except for:
/dev/nvme0n1 and /dev/nvme1n1
In Cloudera Manager you can modify the list of drives used for HDFS data at:
Clusters > HDFS > Configuration > DataNode Default Group (or whatever you may have renamed this to) > DataNode Data Directory

hadoop: different datanodes configuration in shared directory

I try to run hadoop in clustered server machines.
But, problem is server machines uses shared directories, but file directories are not physically in one disk. So, I guess if I configure different datanode direcotry in each machine (slave), I can run hadoop without disk/storage bottleneck.
How do I configure datanode differently in each slave or
How do I configure setup for master node to find hadoop that are installed in different directory in slave node when starting namenode and datanodes using "start-dfs.sh" ?
Or, is there some fancy way for this environment?
Thanks!

How to Switch between namenodes in hadoop?

Pseudomode Cluster:
Suppose first time I created a namenode on Machine "A" with name "Root1".
This will create a HDFS on tha machine.
Now i copy some file to HDFS using copyFromLocal and do some mapreduce.
Now i need to change some /conf files.
I'll change config file and to make them effective I formatted namenode with name "Root2".
If i browse the HDFS , it will be empty (means it will not contain those which copied earlier for "Root1").
If I want to see old file (for "Root1"), is there any way to switch to that HDFS or namenode (Root2 to Root1 ) ??
To be clear. Did you launch the another namenode on your machine ?
Type sudo jps in console or http://localhost:50070 in browser and check if you have more than one datanode. If there is just one node you lost your data from HDFS. If you have two namenodes you can check the filesystem in Internet browser on http://localhost:50070.
Here is instruction how to launch more than one datanode on one machine.

Resources