How to change the container log location in a Dataproc cluster? - hadoop

What is the correct way to change the container log location in a Dataproc cluster during cluster creation?
The default path is /var/log/hadoop-yarn/userlogs and I want to change it to a local SSD mount such as /mnt/1/hadoop/yarn/userlogs. I tried adding
--properties=yarn:yarn.nodemanager.log-dirs
to the gcloud dataproc clusters create command, but got this error:
bash: --properties=yarn:yarn.nodemanager.log-dirs=/mnt/1/hadoop/yarn: No such file or directory
This is most likely because the local SSD gets mounted after the cluster is created. Can someone please help?
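For what it's worth, that particular bash error usually means the --properties flag ended up on its own line without a trailing backslash, so the shell tried to execute it as a command rather than passing it to gcloud. A rough sketch of the flag syntax, with the cluster name, region, and SSD count as placeholder assumptions (not taken from the question):

# Sketch only: my-cluster, the region, and the SSD count are placeholders
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --num-worker-local-ssds=1 \
    --properties='yarn:yarn.nodemanager.log-dirs=/mnt/1/hadoop/yarn/userlogs'

The flag is only a string passed to the Dataproc API, so the path does not need to exist on the machine where you run gcloud.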

Related

No Such file or directory : hdfs

I deployed Kubernetes on a single node using minikube and then installed hadoop and hdfs with helm. It's working well.
The problem is when I try to copy a file from local to HDFS with $ hadoop fs -copyFromLocal /data/titles.csv /data I get: No such file or directory
This is the path on local:
You've shown a screenshot of your host's filesystem GUI details panel.
Unless you mount /data folder inside the k8s pod, there will be no /data folder you can put from.
In other words, you should get a similar error with just ls /data, and this isn't an HDFS problem since "local" means different things in different contexts.
You have at least 3 different "local" filesystems - your host, the namenode pod, the datanode pod, and possibly also the minikube driver (if using a VM)
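If mounting the folder isn't an option, a rough workaround sketch is to copy the file into a pod that has the Hadoop client first, then put it into HDFS from there (the pod name below is an assumption, not from your deployment):

# Copy the file from the host into a pod that has the hadoop client (pod name is assumed)
kubectl cp /data/titles.csv my-hdfs-client-pod:/tmp/titles.csv
# Run the HDFS upload from inside that pod
kubectl exec my-hdfs-client-pod -- hadoop fs -mkdir -p /data
kubectl exec my-hdfs-client-pod -- hadoop fs -put /tmp/titles.csv /data/
# Alternatively, expose the host folder to the minikube VM:
# minikube mount /data:/data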

Whenever I restart my Ubuntu system (VBox) and start Hadoop, my namenode is not working

Whenever I restart my Ubuntu system (VBox) and start Hadoop, my namenode does not come up.
To resolve this, I always have to delete the namenode and datanode folders and reformat Hadoop every time I restart my system.
I have been trying to resolve this for two days, but nothing has worked. I tried giving 777 permissions to the namenode and datanode folders again, and I also tried changing their paths.
My error is
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /blade/Downloads/Hadoop/data/datanode is in an inconsistent state: storage directory does not exist or is not accessible
Please help me to resolve the issue.
You cannot just shut down the VM. You need to cleanly stop the datanode and namenode processes, in that order; otherwise there is a potential for a corrupted HDFS, forcing you to reformat, assuming you don't have a backup.
I'd also suggest putting the Hadoop data for a VM on its own VM drive and mount, not in a shared host folder under Downloads.
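As a sketch of what a clean shutdown can look like before powering off the VM (the commands below assume a Hadoop 3.x single-node install with the Hadoop scripts on the PATH):

# Stop the HDFS daemons cleanly before shutting down the VirtualBox VM
hdfs --daemon stop datanode
hdfs --daemon stop namenode
# or, for the whole single-node HDFS stack in one go:
# stop-dfs.sh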

Why Namenode is not coming UP in HDFS on kubernetes

I tried running HDFS on k8s using this project. I got the journal node and ZooKeeper up and running, but the namenode goes into an Error state.
Looking at the logs, I see that the hostname could not be resolved. I then checked the /etc/hosts entries written by the boot-up script, and the IP and hostname for the namenode are not set.
I'm not using any customized image, just the Helm charts as they are.
What do I need to do to get the namenode up and running?
Helm Chart Link: https://github.com/apache-spark-on-k8s/kubernetes-HDFS/tree/master/charts
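As a rough diagnostic sketch, you can check what the boot-up script actually wrote and whether the name resolves from inside the pod (the pod and service names below are assumptions, not taken from the chart):

# Inspect the hosts file the boot-up script populated (pod name is assumed)
kubectl exec -it hdfs-namenode-0 -- cat /etc/hosts
# Check DNS resolution of the namenode's headless-service name from inside the pod
# (nslookup may not be present in the image; getent hosts <name> is an alternative)
kubectl exec -it hdfs-namenode-0 -- nslookup hdfs-namenode-0.hdfs-namenode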

Hadoop fuse on multinode

I need to use Hadoop FUSE to mount HDFS on a multi-node cluster. How can I achieve that?
I have successfully deployed FUSE on a single-node cluster, but I doubt it would work on multi-node. Can anyone please shed some light on this?
It doesn't matter whether your cluster is single-node or multi-node. If you want to mount HDFS on a remote machine, make sure that machine has access to the cluster network. Set up a Hadoop client (with the same Hadoop version as the cluster) on the node where you plan to mount HDFS using FUSE.
The only difference when mounting is the namenode URL:
(dfs://NAMENODEHOST:NN-IPC-PORT/)
On a single-node setup the namenode URL would be localhost (0.0.0.0 / 127.0.0.1), but on a multi-node cluster you have to give the namenode's hostname/IP address instead of localhost. It's possible to mount HDFS on any Linux machine that can access the Hadoop cluster.
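For illustration, the mount itself would look roughly like this (the hadoop-fuse-dfs binary name comes from CDH packaging, and the 8020 IPC port is an assumption that varies by distribution):

# Mount HDFS from any client machine that can reach the cluster (sketch only)
mkdir -p /mnt/hdfs
hadoop-fuse-dfs dfs://namenode-hostname:8020 /mnt/hdfs
ls /mnt/hdfs    # should list the HDFS root if the mount succeeded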

How to connect mac to hadoop/hdfs cluster

I have CDH running in a cluster and I have SSH access to the machines. I need to connect my Mac to the cluster, so that if I run hadoop fs -ls, it shows me the contents of the cluster.
I have configured HADOOP_CONF to point to the configuration of the cluster. I am running CDH4 in my cluster. Am I missing something here? Is it possible to connect?
Is there some SSH key setup that I need to do?
There are a few things you will need to ensure to do this:
You need to set your HADOOP_CONF_DIR environment variable to point to a directory that carries config XMLs that point to your cluster.
Your Mac should be able to directly access the hosts that form your cluster (all of them). This can be done via VPN, for example - if the cluster is secured from external networks.
Your Mac should carry the same version of Hadoop that the cluster runs.
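A minimal sketch of what that looks like on the Mac, assuming you have copied the cluster's core-site.xml and hdfs-site.xml into a local directory (the path below is a placeholder):

# Point the local Hadoop client at the cluster's configuration files
export HADOOP_CONF_DIR=$HOME/cdh-cluster-conf   # holds core-site.xml and hdfs-site.xml copied from the cluster
hadoop fs -ls /                                 # should now list the cluster's HDFS root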
