I deployed Kubernetes on a single node using minikube and then installed Hadoop and HDFS with Helm. It's working well.
The problem is that when I try to copy a file from local to HDFS with $ hadoop fs -copyFromLocal /data/titles.csv /data, I get: No such file or directory
This is the path on local:
You've shown a screenshot of your host's filesystem GUI details panel.
Unless you mount the /data folder inside the k8s pod, there will be no /data folder you can put from.
In other words, you should get a similar error from just ls /data inside the pod; this isn't an HDFS problem, since "local" means different things in different contexts.
You have at least three different "local" filesystems: your host, the namenode pod, and the datanode pod, and possibly also the minikube VM (if you're using a VM driver).
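As a rough sketch of the workaround (the pod name hdfs-client-0 and the default namespace are assumptions; use whichever pod you actually run the hadoop command in), you can copy the file into that pod's filesystem first and put it into HDFS from there:
# copy the file from the host into the pod's local filesystem
kubectl cp /data/titles.csv default/hdfs-client-0:/tmp/titles.csv
# run the upload from inside the pod, where /tmp/titles.csv now exists
kubectl exec -n default hdfs-client-0 -- hadoop fs -copyFromLocal /tmp/titles.csv /data
Alternatively, mounting a volume at /data inside the pod (for example via minikube mount plus a hostPath volume) would make the original command work unchanged.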
Related
What is the correct way to change the container log location in a Dataproc cluster during cluster creation?
The default path is /var/log/hadoop-yarn/userlogs and I want to change it to a local SSD mount such as /mnt/1/hadoop/yarn/userlogs. I tried adding
--properties=yarn:yarn.nodemanager.log-dirs
to the gcloud dataproc clusters create command, but got this error:
bash: --properties=yarn:yarn.nodemanager.log-dirs=/mnt/1/hadoop/yarn: No such file or directory
This is most likely because the local SSD gets mounted after the cluster is created. Can someone please help?
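That bash error suggests the shell tried to execute --properties=... as its own command (typically a lost line-continuation backslash or stray newline) rather than gcloud rejecting the flag; for reference, the intended syntax would look roughly like this (cluster name and region are placeholders):
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --properties='yarn:yarn.nodemanager.log-dirs=/mnt/1/hadoop/yarn/userlogs'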
My OS is Windows 10, with Ubuntu 20.04.3 LTS (GNU/Linux 4.4.0-19041-Microsoft x86_64) installed on it.
When I copy a local file to Hadoop, I receive an error saying 0 datanodes are available.
I am able to copy a file from Hadoop to a local folder, and I can see it in the local directory using $ ls -l.
I am also able to create directories and files in Hadoop, but if I restart the Ubuntu terminal, those directories and files no longer exist; everything shows up empty.
The steps I followed:
1. start-all.sh
2. jps (datanodes missing from the output)
3. copy the local file to Hadoop: fails with "0 datanodes available"
4. copy files from Hadoop to the local directory: successful
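As a side note, the datanode log usually records why the process did not come up; a quick check (assuming the default $HADOOP_HOME/logs layout) is:
jps                                                    # confirms the DataNode process is missing
tail -n 50 $HADOOP_HOME/logs/hadoop-*-datanode-*.log   # shows the datanode's startup error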
If you stop or restart the WSL2 terminal without first running stop-dfs or stop-all, you run the risk of corrupting the namenode, and it then needs to be reformatted using hadoop namenode -format (reformat it rather than rm-ing the namenode directory).
After formatting, you can restart the datanodes and they should become healthy again.
The same logic applies in a production environment, which is why you should always have a standby namenode for failover.
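A minimal sketch of that recovery sequence, assuming the Hadoop sbin scripts are on your PATH (note that formatting wipes whatever was in HDFS, so directories have to be recreated):
stop-dfs.sh                  # stop whatever is still running (stop-all.sh also works)
hadoop namenode -format      # reformat the corrupted namenode metadata
start-dfs.sh                 # start the namenode and datanodes again
jps                          # DataNode should now show up in the process list
hadoop fs -mkdir -p /data    # recreate any directories you need; formatting emptied HDFS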
1. I am trying to mount HDFS to a local directory using the NFS Gateway. I was successful in mounting the root "/". What I want to achieve is mounting only a part of HDFS (say /user) on the local file system. Is there any way this can be done?
2. I also want to know how to mount HDFS on a separate machine that is not part of the cluster. What I ideally want is a machine acting as an NFS drive onto which I have mounted a part of HDFS, with all clients uploading files to that remote directory.
I am using the Hortonworks VM 2.2. Any suggestion would be very helpful.
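Not a definitive answer, but a sketch for point 2, under the assumption that the gateway host is reachable as nfs-gateway-host: any machine outside the cluster can mount the gateway's export with a plain NFSv3 mount. For point 1, depending on the Hadoop version, the gateway's hdfs-site.xml has an nfs.export.point property that limits the export to a subdirectory such as /user instead of the root.
# on the separate client machine (not part of the cluster)
sudo mkdir -p /mnt/hdfs
sudo mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync nfs-gateway-host:/ /mnt/hdfs
ls /mnt/hdfs/user            # clients can now upload by copying files into this mount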
I've searched and read up on removing mount point file systems from Cloudera Hadoop, but I cannot find anything on how to remove them.
I have two SSD drives in each of 6 machines, and when I initially installed Cloudera Hadoop it added all of the file systems, but I only need two mount points to run a few teragen and terasort jobs.
I need to remove everything except for:
/dev/nvme0n1 and /dev/nvme1n1
In Cloudera Manager you can modify the list of drives used for HDFS data at:
Clusters > HDFS > Configuration > DataNode Default Group (or whatever you may have renamed this to) > DataNode Data Directory
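That setting maps to dfs.datanode.data.dir underneath, which is just a comma-separated list of directories; after trimming it to the two NVMe drives it would look something like this (the mount paths are illustrative, use wherever /dev/nvme0n1 and /dev/nvme1n1 are actually mounted):
dfs.datanode.data.dir=/mnt/nvme0/dfs/dn,/mnt/nvme1/dfs/dn
After saving, Cloudera Manager will flag the DataNodes with a stale configuration, and they need a restart for the change to take effect.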
I run a Hadoop cluster and I'm interested in installing one more machine with the DFSClient only.
This machine (let's call it machine X) will not be a part of the cluster.
Machine X will run the DFSClient, and I should be able to see HDFS from it.
In order to install the DFSClient, I copied the Hadoop home directory from one of the cluster's nodes to machine X (including the .jar files and configuration).
Then I run:
hadoop fs -ls /
I get the local root directory (not the HDFS root).
What am I doing wrong?
Copy core-site.xml and hdfs-site.xml and place them in a folder under your local Linux account's home directory. Then ensure that fs.defaultFS (the namenode address, in core-site.xml) is pointing to the remote namenode. Then try hadoop --config <your_config_folder> fs -ls /, where <your_config_folder> is where you placed those config files.
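A minimal sketch of that flow, assuming the cluster's configs live under /etc/hadoop/conf on the node you copy from and the namenode is reachable as nn.example.com (both placeholders):
mkdir -p ~/hadoop-conf
scp user@cluster-node:/etc/hadoop/conf/{core-site.xml,hdfs-site.xml} ~/hadoop-conf/
grep -A1 'fs.defaultFS' ~/hadoop-conf/core-site.xml   # should show something like hdfs://nn.example.com:8020
hadoop --config ~/hadoop-conf fs -ls /                # now lists the HDFS root, not the local one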
Technically it should work if the following steps are done:
You have copied the configuration files (*.xml) from the Hadoop cluster.
HADOOP_HOME is set to the copied Hadoop path.
Machine X has network access to the cluster.
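A quick way to check all three points at once (nn.example.com and port 8020 are placeholders; take the real values from fs.defaultFS in your copied core-site.xml):
echo $HADOOP_HOME                    # should print the copied Hadoop directory
hdfs getconf -confKey fs.defaultFS   # should print hdfs://nn.example.com:8020, not file:///
nc -zv nn.example.com 8020           # verifies machine X can reach the namenode RPC port
hadoop fs -ls /                      # should now list the HDFS root instead of the local one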