Hadoop standalone mode not starting on the local machine due to permission issues - hadoop

I am not able to figure out what the problem is. I have checked all the links available for this problem and tried their suggestions, but I still have the same problem.
I really need help here, because the available sandbox needs a higher configuration (more RAM) than I have.
hstart
WARNING: Attempting to start all Apache Hadoop daemons as adityaverma
in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
localhost: adityaverma@localhost: Permission denied
(publickey,password,keyboard-interactive).
Starting datanodes
localhost: adityaverma@localhost: Permission denied
(publickey,password,keyboard-interactive).
Starting secondary namenodes [Adityas-MacBook-Pro.local]
Adityas-MacBook-Pro.local: adityaverma@adityas-macbook-pro.local:
Permission denied (publickey,password,keyboard-interactive).
2018-05-30 11:07:03,084 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
Starting resourcemanager
Starting nodemanagers
localhost: adityaverma@localhost: Permission denied (publickey,password,keyboard-interactive).

This error typically means you have not set up passwordless SSH. You should see the same error with ssh localhost; it should log you in without prompting for a password.
Check the Hadoop documentation again on SSH key generation and add the generated key to your authorized_keys file.
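Roughly, the steps from the Hadoop single-node setup guide look like this (the key type and file names are the usual defaults and may differ on your machine; on macOS you also need Remote Login enabled):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa   # generate a key with an empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost                              # should now log in without a password prompt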
I might suggest setting up a virtual machine anyway (for example, using Vagrant) if the sandbox requires too many resources. The Hortonworks and Cloudera installation docs are fairly detailed if you want to install a cluster from scratch.
This way Hadoop isn't cluttering your Mac's hard drive, and a Linux server will more closely match the Hadoop installations running in production environments.
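If you go the Vagrant route, a minimal sketch could look like this (the box name ubuntu/focal64 is only an example; any recent Linux box works):
vagrant init ubuntu/focal64   # example box name
vagrant up
vagrant ssh                   # then install Java and Hadoop inside the guest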

Related

hadoop on macOS initiating secondary namenode fails due to ssh connection refused

I've successfully gone through starting a single node in pseudo-distributed mode as described in https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation, under Windows' WSL2 environment.
After that, I tried to repeat it on my MacBook Pro, but somehow start-dfs.sh fails. The terminal throws this error:
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [kakaoui-MacBookPro.local]
kakaoui-MacBookPro.local: ssh: connect to host kakaoui-macbookpro.local port 22: Connection refused
2021-06-26 23:01:23,377 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Okay. There are answers saying I should enable SSH connections in the system settings, but Remote Login is already enabled, and ssh localhost also works fine.
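For the record, Remote Login (sshd) can be verified from the command line with the stock macOS tooling (example commands, not from the original post):
sudo systemsetup -getremotelogin   # should report that Remote Login is on
ssh localhost true                 # should succeed without a password prompt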
And then things get worse; sometimes the secondary namenode fails like this:
Starting secondary namenodes [kakaoui-MacBookPro.local]
kakaoui-MacBookPro.local: ssh: connect to host kakaoui-macbookpro.local port 22: Operation timed out
Then, when I leave the Mac for a while and run start-dfs.sh again, once in a while it succeeds. But as soon as I run stop-dfs.sh and start-dfs.sh again to check, it fails.
Even when start-dfs.sh succeeds, a lot of problems follow, such as the datanode, resourcemanager, or nodemanager failing to start. I haven't managed to get the Hadoop environment running even once.
It feels like everything is mixed up and nothing is stable at all. I've already tried reinstalling this and that several times. Unfortunately, most of the startup failures aren't even recorded in the /logs folder.
Currently I'm using:
macOS: Catalina 10.15.6
java: 1.8.0_291
hadoop: 3.3.1
I've spent two whole days just trying. Please help!
Okay, I found a solution that I don't understand: I turned off the Wi-Fi connection during startup and all the processes came up. I can't understand how the Wi-Fi connection interferes with ssh localhost, though.
Provide passwordless (key-based) SSH access to all the worker nodes in your hosts file, localhost as well as kakaoui-macbookpro.local. Read the instructions in Creating a SSH Public Key on OSX.
Finally, test password-less access with ssh localhost and ssh [yourworkernode] (probably ssh kakaoui-macbookpro.local).
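As a concrete check, something like the following should work for every hostname Hadoop uses; the /etc/hosts line is only a guess based on the Wi-Fi observation above (the .local name may resolve differently over mDNS while Wi-Fi is up), not part of the original answer:
ssh localhost true
ssh kakaoui-macbookpro.local true
# optional workaround (assumption): pin the .local name to loopback so it no longer depends on the network state
echo "127.0.0.1 kakaoui-macbookpro.local" | sudo tee -a /etc/hosts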

Secondary namenode connection timed out

I'm trying to set up Hadoop on my Mac (Mojave 10.14.6). The Hadoop version I'm using is 3.0.3.
I followed this tutorial to set up the config: https://dbmstutorials.com/hive/hdfs-setup-on-mac.html
While running hdfs namenode -format I get the following error for the secondary namenode:
Starting secondary namenodes [xp]
xp: ssh: connect to host xp port 22: Operation timed out
2019-12-09 09:26:03,796 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I allowed Remote Login, created SSH keys without a passphrase, and deactivated the firewall to see if it would help, but the problem remains. Any help would be greatly appreciated :)
Yes, I tried ssh xp and it didn't work. After investigating a bit more I managed to make it work...
I changed the IP for that host in /etc/hosts from 127.0.1.1 to another one that responded to ping. I don't know why 127.0.1.1 didn't work, but at least the problem seems to be fixed for now.
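For illustration, the change amounts to editing the line for that host in /etc/hosts roughly like this (the replacement address 192.168.1.10 is just an example; use whichever of your machine's addresses answers ping):
# before: the hostname Hadoop resolves pointed at 127.0.1.1
127.0.1.1      xp
# after: point it at an address that actually responds
192.168.1.10   xp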

localhost: ERROR: Cannot set priority of datanode process 2984

I set up and configured a multi-node Hadoop cluster. The following appears when I start it.
My Ubuntu is 16.04 and Hadoop is 3.0.2.
Starting namenodes on [master]
Starting datanodes
localhost: ERROR: Cannot set priority of datanode process 2984
Starting secondary namenodes [master]
master: ERROR: Cannot set priority of secondarynamenode process 3175
2018-07-17 02:19:39,470 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Starting nodemanagers
Can anyone tell me which part is wrong?
I had the same error and fixed it by ensuring that the datanode and namenode locations have the right permissions and are owned by the user starting the Hadoop daemons.
Check that:
The directory path properties in hdfs-site.xml under $HADOOP_CONF_DIR point to valid locations:
dfs.namenode.name.dir
dfs.datanode.data.dir
dfs.namenode.checkpoint.dir
The Hadoop user must have write permission for these paths.
If write permission is missing for any of these paths, the processes may fail to start and you can get the error you see; a quick check is sketched below.
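As a rough sketch of that check, with the storage paths being examples only (use whatever your hdfs-site.xml actually points at):
# see where the three properties point
grep -E -A1 'dfs\.(namenode\.name|datanode\.data|namenode\.checkpoint)\.dir' $HADOOP_CONF_DIR/hdfs-site.xml
# make sure the directories exist and are owned/writable by the user starting the daemons
sudo mkdir -p /usr/local/hadoop/hdfs/namenode /usr/local/hadoop/hdfs/datanode
sudo chown -R $(whoami) /usr/local/hadoop/hdfs
chmod -R 750 /usr/local/hadoop/hdfs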
I had the same error and tried the above method, but it didn't work.
I set XXX_USER in all the xxx-env.sh files and got the same result.
Finally I set HADOOP_SHELL_EXECNAME="root" in ${HADOOP_HOME}/bin/hdfs, and the error disappeared.
The default value of HADOOP_SHELL_EXECNAME is "hdfs".
I had the same error after renaming my Ubuntu home directory, and had to edit core-site.xml, changing the value of the hadoop.tmp.dir property to the new path.
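To double-check for that kind of mismatch, something like this helps (the expected value shown in the comment is just an example path):
grep -A1 'hadoop.tmp.dir' $HADOOP_CONF_DIR/core-site.xml
# expect something like: <value>/home/newusername/hadooptmp</value>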
Just append the native library path to your HADOOP_OPTS, like this:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
I had the same issue; you just need to check the hadoop/logs directory and look for the .log file for the datanode. Run more nameofthefile.log and check for errors. Mine was a problem in the configuration; I fixed it and it worked.
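For example, assuming the default log location under $HADOOP_HOME (yours may be overridden by HADOOP_LOG_DIR):
cd $HADOOP_HOME/logs
ls -lt *.log | head               # most recently written logs first
more hadoop-*-datanode-*.log      # look for the first ERROR or exception stack trace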

Namenode cannot start

I am trying to upgrade HDFS from 1.2.1 to version 2.6. However, whenever I run the start-dfs.sh -upgrade command, I get the error below:
hduser@Cluster1-NN:/usr/local/hadoop2/hadoop-2.6.0/etc_bkp/hadoop$ $HADOOP_NEW_HOME/sbin/start-dfs.sh -upgrade
15/05/17 12:45:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [nn]
Error: Please specify one of --hosts or --hostnames options and not both.
nn: starting datanode, logging to /var/hadoop/logs/hadoop-hduser-datanode-Cluster1-NN.out
dn1: starting datanode, logging to /var/hadoop/logs/hadoop-hduser-datanode-Cluster1-DN1.out
dn2: starting datanode, logging to /var/hadoop/logs/hadoop-hduser-datanode-Cluster1-DN2.out
Starting secondary namenodes [0.0.0.0]
Error: Please specify one of --hosts or --hostnames options and not both.
Please let me know if any of you experts have come across such an error.
I got the same problem on Arch Linux with a newly installed Hadoop 2.7.1. I'm not sure whether my case is the same as yours, but my experience should help. I just commented out the line HADOOP_SLAVES=/etc/hadoop/slaves in /etc/profile.d/hadoop.sh and logged in again. Both accessing HDFS and running streaming jobs now work for me.
The cause is that the Arch-specific script /etc/profile.d/hadoop.sh declares the $HADOOP_SLAVES environment variable, while start-dfs.sh calls hadoop-daemons.sh with --hostnames arguments. This combination confuses libexec/hadoop-config.sh.
You may want to run echo $HADOOP_SLAVES as the hadoop user. If the output is non-empty, check your .bashrc and/or other shell startup scripts. Hope that helps :)
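Concretely, the check and a temporary workaround might look like this (the file locations are the Arch-specific ones mentioned above; yours may differ):
echo $HADOOP_SLAVES                                        # should print nothing
grep -n HADOOP_SLAVES ~/.bashrc /etc/profile.d/hadoop.sh   # find where it is exported
unset HADOOP_SLAVES   # temporary; comment out the export and re-login for a permanent fix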
Maybe it is missing some Hadoop library. Can you show the detailed namenode logs?

Cannot access hdfs file system running in mapr sandbox VM

I have just installed the MapR sandbox virtual machine running in VirtualBox. The VM is set up using "NAT" network mode and the ports are forwarded to my Mac. Since the ports are forwarded, I am guessing that I should be able to access the HDFS on "localhost".
Now I am trying to list the contents of the HDFS on the VM:
$ hadoop fs -fs maprfs://localhost -ls /
15/03/25 15:16:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-03-25 15:16:11,6646 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1586 Thread: 4548153344 MoveToNextCldb: No CLDB entries, cannot run, sleeping 5 seconds!
2015-03-25 15:16:16,6683 ERROR Client fs/client/fileclient/cc/client.cc:813 Thread: 4548153344 Failed to initialize client for cluster localhost:7222, error Connection refused(61)
ls: Could not create FileClient
I also tried 127.0.0.1, sudo, and appending the port :5660, all without success.
Any ideas?
Changing from NAT network mode to host-only fixed the problem. Then, of course, I have to use the IP of the VM to access maprfs.
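The listing command from the question then becomes something like the following, where the address is whatever the VM reports for its host-only interface (192.168.56.101 is only a typical example, not from the answer):
hadoop fs -fs maprfs://192.168.56.101 -ls /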
If you are just running plain Spark on a local/single node then you don't need HDFS; you can simply point your input and output files at the local file system, like below:
file:///pathtoinput
file:///pathtooutput
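For instance, a purely local Spark run could be launched like this; my_job.py and the paths are placeholders, not taken from the answer above:
spark-submit --master "local[*]" my_job.py file:///home/me/input.txt file:///home/me/output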
