How to install Hadoop on M1 Mac - hadoop

I followed several tutorials, and every time I start Hadoop I get this output:
feiyechen#FEIYEdeMac-mini ~ % start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as feiyechen in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
Starting datanodes
localhost: datanode is running as process 55832. Stop it first and ensure /tmp/hadoop-feiyechen-datanode.pid file is empty before retry.
Starting secondary namenodes [FEIYEdeMac-mini.local]
FEIYEdeMac-mini.local: secondarynamenode is running as process 55966. Stop it first and ensure /tmp/hadoop-feiyechen-secondarynamenode.pid file is empty before retry.
2022-01-28 20:35:24,311 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Starting nodemanagers
feiyechen#FEIYEdeMac-mini ~ % jps
55832 DataNode
57838 Jps
55966 SecondaryNameNode
57247 NameNode
The tutorial shows the output I should get after running jps.
I only have four entries: DataNode, Jps, SecondaryNameNode, and NameNode. Does that mean the installation failed?

It means you have a running HDFS installation, but not YARN.
You should be able to run start-yarn.sh separately if you want the ResourceManager and NodeManager.
If they still do not start, the log files created for both YARN processes should include information about why they are failing.
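A quick way to see which daemons are missing is to compare the jps output against the list you expect. The sketch below assumes a single-node setup where all five HDFS and YARN daemons should be present. The function name missing_daemons and the expected list are illustrative and not part of Hadoop.

```shell
# missing_daemons: read `jps` output on stdin and print each expected
# daemon that is absent, one per line.
# Assumption: a single-node HDFS + YARN setup with these five daemons.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <MainClass>"; compare the second field exactly
        # so that "NameNode" does not match "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Usage (needs a JDK on PATH for jps):
#   jps | missing_daemons
```

With the jps output from the question, this would list ResourceManager and NodeManager, which matches the answer: HDFS is up and YARN is not.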

Related

Failed to retrieve data from /webhdfs/v1/?op=LISTSTATUS: Server Error

vijay#ubuntu:~$ start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as vijay in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
localhost: namenode is running as process 22733. Stop it first and ensure /tmp/hadoop-vijay-namenode.pid file is empty before retry.
Starting datanodes
localhost: datanode is running as process 22866. Stop it first and ensure /tmp/hadoop-vijay-datanode.pid file is empty before retry.
Starting secondary namenodes [ubuntu]
ubuntu: secondarynamenode is running as process 23072. Stop it first and ensure /tmp/hadoop-vijay-secondarynamenode.pid file is empty before retry.
Starting resourcemanager
Starting nodemanagers
vijay#ubuntu:~$ jps
23072 SecondaryNameNode
22866 DataNode
22733 NameNode
24447 Jps
I am getting this error in the Hadoop web console.
The installed Java version is "19.0.1" 2022-10-18 and the Hadoop version is 3.3.4.

Spark-shell --master yarn stuck

I installed Hadoop and Spark via Homebrew
$ brew list --versions | grep spark
apache-spark 2.2.0
$ brew list --versions | grep hadoop
hadoop 2.8.1 2.8.2 hdfs
where Hadoop 2.8.2 is what I am using.
I followed this post to configure Hadoop, and this post to configure spark.yarn.archive as:
spark.yarn.archive hdfs://localhost:9000/user/panc25/spark-jars.zip
The following are my Hadoop/Spark-related environment settings in my .bash_profile:
# ---------------------
# Hadoop
# ---------------------
export HADOOP_HOME=/usr/local/Cellar/hadoop/2.8.2
export YARN_CONF_DIR=$HADOOP_HOME/libexec/etc/hadoop/
alias hadoop-start="$HADOOP_HOME/sbin/start-dfs.sh;$HADOOP_HOME/sbin/start-yarn.sh"
alias hadoop-stop="$HADOOP_HOME/sbin/stop-yarn.sh;$HADOOP_HOME/sbin/stop-dfs.sh"
# ---------------------
# Apache Spark
# ---------------------
export SPARK_HOME=/usr/local/Cellar/apache-spark/2.2.0/libexec
export PATH=$SPARK_HOME/../bin:$SPARK_HOME/sbin:$PATH
I can successfully start Hadoop (HDFS + YARN):
$ hadoop-start
17/11/12 17:08:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/Cellar/hadoop/2.8.2/libexec/logs/hadoop-panc25-namenode-mbp13mid2017.local.out
localhost: starting datanode, logging to /usr/local/Cellar/hadoop/2.8.2/libexec/logs/hadoop-panc25-datanode-mbp13mid2017.local.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/Cellar/hadoop/2.8.2/libexec/logs/hadoop-panc25-secondarynamenode-mbp13mid2017.local.out
17/11/12 17:08:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/Cellar/hadoop/2.8.2/libexec/logs/yarn-panc25-resourcemanager-mbp13mid2017.local.out
localhost: starting nodemanager, logging to /usr/local/Cellar/hadoop/2.8.2/libexec/logs/yarn-panc25-nodemanager-mbp13mid2017.local.out
$ jps
92723 NameNode
93188 Jps
93051 ResourceManager
93149 NodeManager
92814 DataNode
92926 SecondaryNameNode
However, when I start spark-shell --master yarn it seems to freeze and I don't know what is going on:
What is wrong?
By the way, I can open the Spark UI at http://localhost:4040/, but all pages are blank.
I experienced a similar issue, and it was caused by the fact that I forgot to append /conf to the HADOOP_CONF_DIR environment variable (/etc/hadoop/conf).
In my case I was running the Cloudera distribution of Spark 2.1 and had specified HADOOP_CONF_DIR=/etc/hadoop/conf/:/etc/hive/conf/. For some reason it got stuck, so I changed it to HADOOP_CONF_DIR=/etc/hadoop/conf/ and it worked. I am still looking for the root cause!
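Both answers come down to HADOOP_CONF_DIR (or YARN_CONF_DIR) not pointing at one directory that holds the client configuration files. A minimal sanity check, assuming that core-site.xml and yarn-site.xml are the files that matter for spark-shell --master yarn, might look like this. The function name check_conf_dir is made up for illustration.

```shell
# check_conf_dir: validate a HADOOP_CONF_DIR-style value passed as $1.
# Returns 0 if it names exactly one existing directory that contains
# core-site.xml and yarn-site.xml; otherwise prints the reason and returns 1.
# Assumption: these two files are the ones the YARN client needs.
check_conf_dir() {
    dir=$1
    case $dir in
        '')  echo "empty value"; return 1 ;;
        *:*) echo "contains ':' (more than one path): $dir"; return 1 ;;
    esac
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir"
        return 1
    fi
    for f in core-site.xml yarn-site.xml; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing $f in $dir"
            return 1
        fi
    done
    echo "ok: $dir"
}

# Usage:
#   check_conf_dir "$HADOOP_CONF_DIR"
#   check_conf_dir "$YARN_CONF_DIR"
```

This only catches the two mistakes described above (a wrong directory and a colon-separated list). It does not explain why a colon-separated value makes spark-shell hang.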

Hadoop's NameNode and DataNode Service did not run in single_mode

I installed Hadoop 2.7.2 on Ubuntu 16.04 in single-node mode, but neither the NameNode nor the DataNode service is running after I start Hadoop.
hduser#saber-Studio-1435:/usr/local/hadoop$ start-all.sh
This script is Deprecated.
Instead use start-dfs.sh and start-yarn.sh
16/06/20 15:34:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-saber-Studio-1435.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-saber-Studio-1435.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 7214. Stop it first.
16/06/20 15:35:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
resourcemanager running as process 7374. Stop it first.
localhost: nodemanager running as process 7502. Stop it first.
Status:
hduser#saber-Studio-1435:/usr/local/hadoop$ jps
8747 Jps
7502 NodeManager
7374 ResourceManager
7214 SecondaryNameNode
First stop Hadoop. From $HADOOP_HOME, run ./sbin/stop-all.sh
Then format the NameNode. Note that this erases the existing HDFS metadata:
./bin/hdfs namenode -format
(./bin/hadoop namenode -format is the deprecated form of the same command. The DataNode has no -format option, so there is nothing to format on that side.)
Then start again using ./sbin/start-all.sh
Then run jps on the command line. If it still doesn't work, remove the directory created for HDFS and recreate it using mkdir -p
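Since formatting discards HDFS metadata, it may be worth checking first why the daemons exited. The startup output in the question names a .out file for each daemon; the matching .log file in the same directory typically holds the detailed messages. The helper below is only a sketch: the name log_errors is hypothetical, and the .log path in the usage comment is an assumption derived from the .out path in the question.

```shell
# log_errors: print the last 20 ERROR or FATAL lines from a Hadoop log file ($1).
# Hadoop log lines look like "<date> <time> <LEVEL> <class>: <message>",
# so the level appears as a word surrounded by spaces.
log_errors() {
    grep -E ' (ERROR|FATAL) ' "$1" | tail -n 20
}

# Usage (assumed path: the .out name from the startup output, with .log instead):
#   log_errors /usr/local/hadoop/logs/hadoop-hduser-namenode-saber-Studio-1435.log
```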

Hadoop 2.6.2, start-dfs.sh doesn't start JobTracker and TaskTracker

I installed a single-node Hadoop, and now I'm starting the cluster with the start-dfs.sh command.
But JobTracker and TaskTracker do not appear in the jps output, so it seems that they are not starting.
Do you see why? I'm installing version 2.6.2...
After executing the start-dfs.sh command, this appears:
[hadoopadmin#hadoop ~]$ start-dfs.sh
16/03/23 12:17:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop-2.6.2/logs/hadoop-hadoopadmin-namenode-hadoop.out
localhost: starting datanode, logging to /usr/local/hadoop-2.6.2/logs/hadoop-hadoopadmin-datanode-hadoop.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.6.2/logs/hadoop-hadoopadmin-secondarynamenode-hadoop.out
16/03/23 12:17:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoopadmin#hadoop ~]$ jps
2881 DataNode
2758 NameNode
3142 Jps
3039 SecondaryNameNode
[hadoopadmin#hadoop ~]$
There is no JobTracker or TaskTracker anymore. Instead we have the NodeManager and ResourceManager. Here you started only the DFS services, not the YARN services. To start the YARN services, run start-yarn.sh; only then will the YARN-related services start.
If you want to start all services at once, run start-all.sh (not a good practice).

Hadoop 2.2 - datanode doesn't start up

I had Hadoop 2.4 this morning (see my previous 2 questions). Now I removed it and installed 2.2 as I had issues with 2.4, and also as I think 2.2 is the latest stable release. Now I followed the tutorial here:
http://codesfusion.blogspot.com/2013/10/setup-hadoop-2x-220-on-ubuntu.html?m=1
I am pretty sure I did everything right but I am facing similar issues again.
When I run jps it is obvious that the data node is not starting up.
What am I doing wrong again?
hduser#test02:~$ start-dfs.sh
14/06/06 18:12:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-test02.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-test02.out
localhost: Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
localhost: It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-test02.out
0.0.0.0: Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
0.0.0.0: It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/06/06 18:13:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hduser#test02:~$ jps
2201 Jps
hduser#test02:~$ jps
2213 Jps
hduser#test02:~$ start-yarn
start-yarn: command not found
hduser#test02:~$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-test02.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-test02.out
hduser#test02:~$ jps
2498 NodeManager
2264 ResourceManager
2766 Jps
hduser#test02:~$ jps
2784 Jps
2498 NodeManager
2264 ResourceManager
hduser#test02:~$ jps
2498 NodeManager
2264 ResourceManager
2796 Jps
hduser#test02:~$
My problem was that I took these instructions from the tutorial too literally.
Paste following between <configuration>
fs.default.name
hdfs://localhost:9000
I suspected this was wrong while doing it, but I did it anyway.
It seemed incorrect because core-site.xml is an XML file.
It actually needs to look like this:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
Changing it to this fixed my problem.
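To catch this mistake before starting the daemons, one option is to check that core-site.xml contains the setting as a proper <name> element and not as bare text. This is a rough grep-based sketch and not an XML parser. The function name has_default_fs is made up, and it also accepts fs.defaultFS, the newer name for the same setting.

```shell
# has_default_fs: succeed if the file ($1) contains a <name> element for
# fs.default.name or fs.defaultFS. Bare text without the tags does not match.
has_default_fs() {
    grep -Eq '<name>[[:space:]]*fs\.(default\.name|defaultFS)[[:space:]]*</name>' "$1"
}

# Usage (path as used in the question's install):
#   has_default_fs /usr/local/hadoop/etc/hadoop/core-site.xml \
#       && echo "default filesystem is set" \
#       || echo "no <name> element for fs.default.name or fs.defaultFS"
```

When the property is missing or malformed, start-dfs.sh reports the "namenode address ... is not configured" message seen in the question and prints an empty NameNode list.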
I had similar issues with the DataNode not starting up. What I did was reformat the NameNode and then restart the cluster. After that, running jps confirmed that the DataNode had started.
This can be caused by placing the HDFS directory in your home directory (on a Linux box), because starting up and shutting down the OS affects these folders. I am not exactly sure how, but to prevent this problem in the future, move the HDFS directory out of your home directory.
Please let me know if this works.

Resources