Re-format filesystem error starting Hadoop services on Mac - macos

iMac 2020 Intel, macOS Monterey 12.6, Java 1.8, Hadoop 3.3.4 as of 9-Feb-23
I am getting this error when starting Hadoop with this command:
$HADOOP_HOME/sbin/start-all.sh
Irrespective of whether I respond Y or N, the prompt keeps repeating in the terminal and never stops.
localhost: Re-format filesystem in Storage Directory root=
/tmp/hadoop-arshadssss/dfs/name; location= null ? (Y or N) Invalid
input:
I followed the steps from https://techblost.com/how-to-install-hadoop-on-mac-with-homebrew/ to install and configure. It feels like everything is done and this is the final step... any help/support to resolve this would be appreciated.
I tried killing the namenode process from Activity Monitor and re-starting, to no avail.

start-all.sh is a deprecated script.
You should run start-dfs.sh and start-yarn.sh separately.
Or start each daemon individually, e.g. hdfs --daemon start namenode, and likewise for the datanode, YARN resourcemanager, nodemanager, etc.
Then debug which daemon is actually causing the problem.
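For reference, a rough sketch of what that looks like on Hadoop 3.x (assuming $HADOOP_HOME is set; adjust paths to your install):

# start HDFS and YARN with their dedicated scripts instead of start-all.sh
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

# or start each daemon on its own to see which one misbehaves
$HADOOP_HOME/bin/hdfs --daemon start namenode
$HADOOP_HOME/bin/hdfs --daemon start datanode
$HADOOP_HOME/bin/yarn --daemon start resourcemanager
$HADOOP_HOME/bin/yarn --daemon start nodemanager

# check which daemons are actually running
jps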

Related

Cannot start running on browser the namenode for Hadoop

It is my first time installing Hadoop, on Linux (Fedora distro) running in a VM (using Parallels on my Mac). I followed every step in this video, including the textual version of it. Then when I opened localhost (or the equivalent value from hostname) on port 50070, I got the following message.
...can't establish a connection to the server at localhost:50070
By the way, when I run the jps command I don't see the datanode and namenode, unlike at the end of the textual tutorial, which has the following:
While mine has only the following processes running:
6021 NodeManager
3947 SecondaryNameNode
5788 ResourceManager
8941 Jps
When I run the hadoop namenode command I get the following [redacted] errors:
Cannot access storage directory /usr/local/hadoop_store/hdfs/namenode
16/10/11 21:52:45 WARN namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /usr/local/hadoop_store/hdfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
By the way, I tried to access the above-mentioned directories and they exist.
Any hint for this newbie? ;-)
You would need to give read and write permissions on the directory /usr/local/hadoop_store/hdfs/namenode to the user with which you are running the services.
Once done, run the format command: hadoop namenode -format
Then try to start your services.
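A rough sketch of that sequence (hduser and the hadoop group are just placeholders for whatever user/group actually runs your services):

# give the service user ownership of the namenode storage directory (placeholder user/group)
sudo chown -R hduser:hadoop /usr/local/hadoop_store/hdfs/namenode
sudo chmod -R 750 /usr/local/hadoop_store/hdfs/namenode

# format the namenode, then bring the services up
hadoop namenode -format
start-dfs.sh
start-yarn.sh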
Delete the files under /app/hadoop/tmp/*,
then try formatting the namenode again and run start-dfs.sh and start-yarn.sh.
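Something along these lines, assuming /app/hadoop/tmp is your hadoop.tmp.dir (note that clearing it wipes any existing HDFS data):

rm -rf /app/hadoop/tmp/*
hadoop namenode -format
start-dfs.sh
start-yarn.sh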

Hadoop Installation: Format Namenode

I'm struggling to install Hadoop 2.2.0 on Mac OS X 10.9.3. I essentially followed this tutorial:
http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide
When I run $HADOOP_PREFIX/bin/hdfs namenode -format to format the namenode, I get the message:
SHUTDOWN_MSG: Shutting down NameNode at Macintosh.local/192.168.0.103. I believe this is preventing me from successfully running the test:
$HADOOP_PREFIX/bin/hadoop jar $HADOOP_PREFIX/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar \
  org.apache.hadoop.yarn.applications.distributedshell.Client \
  --jar $HADOOP_PREFIX/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar \
  --shell_command date --num_containers 2 --master_memory 1024
Does anyone know how to correctly format namenode?
(Regarding the test command above, someone mentioned to me that it could have something to do with the hdfs file system not functioning properly, if this is relevant.)

Need help adding multiple DataNodes in pseudo-distributed mode (one machine), using Hadoop-0.18.0

I am a student, interested in Hadoop and started to explore it recently.
I tried adding an additional DataNode in the pseudo-distributed mode but failed.
I am following the Yahoo developer tutorial and so the version of Hadoop I am using is hadoop-0.18.0
I tried to start up using 2 methods I found online:
Method 1 (link)
I have a problem with this line
bin/hadoop-daemon.sh --script bin/hdfs $1 datanode $DN_CONF_OPTS
--script bin/hdfs doesn't seem to be valid in the version I am using. I changed it to --config $HADOOP_HOME/conf2, with all the configuration files in that directory, but when the script is run it gives the error:
Usage: Java DataNode [-rollback]
Any idea what the error means? The log files are created but the DataNode did not start.
Method 2 (link)
Basically I duplicated the conf folder to a conf2 folder, making the necessary changes documented on the website to hadoop-site.xml and hadoop-env.sh. Then I ran the command:
./hadoop-daemon.sh --config ..../conf2 start datanode
it gives the error:
datanode running as process 4190. stop it first.
So I guess this is the 1st DataNode that was started, and the command failed to start another DataNode.
Is there anything I can do to start additional DataNode in the Yahoo VM Hadoop environment? Any help/advice would be greatly appreciated.
Hadoop's start/stop scripts use /tmp as the default directory for storing the PIDs of already started daemons. In your situation, when you start the second datanode, the startup script finds the /tmp/hadoop-someuser-datanode.pid file from the first datanode and assumes a datanode daemon is already running.
The plain solution is to set the HADOOP_PID_DIR environment variable to something other than /tmp. Also do not forget to update all the network port numbers in conf2.
The smarter solution is to start a second VM with a Hadoop environment and join the two into a single cluster. That is the way Hadoop is intended to be used.
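A minimal sketch of the plain fix, using classic 0.18/1.x property names and illustrative paths/ports (pick your own):

# conf2/hadoop-env.sh: keep the second DataNode's PID file out of /tmp
export HADOOP_PID_DIR=/home/hadoop/pids-dn2

# conf2/hadoop-site.xml: give the second DataNode its own data dir and ports, e.g.
#   dfs.data.dir               /home/hadoop/dfs/data2
#   dfs.datanode.address       0.0.0.0:50011
#   dfs.datanode.http.address  0.0.0.0:50076
#   dfs.datanode.ipc.address   0.0.0.0:50021

# then start it against the alternate configuration directory
bin/hadoop-daemon.sh --config $HADOOP_HOME/conf2 start datanode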

Hadoop jobtracker UI not accessible

I've configured Hadoop 1.0.4 in pseudo-distributed mode. Everything's good: I can put local files in HDFS and run the wordcount task. But I just can't access the jobtracker web UI through localhost:50030, and localhost:50070 doesn't work either.
HTTP ERROR 404
Problem accessing /jobtracker.jsp. Reason:
/jobtracker.jsp Powered by Jetty://
I looked at the log files, but there's no error...
I used to have some problems with the datanode, and the jobtracker complained about replication, but that is solved and now all daemons are good (namenode, datanode, jobtracker, tasktracker, secondarynamenode) with no error in any of the log files.
Any suggestions?
OK, I finally solved it myself: I had to re-install the system and then re-install Hadoop. I think the problem was that I had previously installed CDH4 (which is Hadoop 2.0.0) on my system; even though I uninstalled all of its packages (Debian system) and changed the HDFS tmp folder, maybe something was still left behind. The only way was to start over.

Hadoop CDH3 ERROR. Could not start Hadoop datanode daemon

I'm deploying Hadoop CDH3 in pseudo-distributed mode on a VPS.
So I installed CDH3, then I executed
sudo apt-get install hadoop-0.20-conf-pseudo
but if I try to start all the daemons with
for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
it throws
ERROR. Could not start Hadoop datanode daemon
The same installation and startup commands work on my notebook.
I don't understand the cause; in fact the log file is empty. The available RAM is about 900 MB, with 98 GB of available disk space.
What could be the cause, and how can I discover it? I'm excluding the configuration files as the source of the error.
Consider using Cloudera Manager; it could save you some time (especially if you use multiple nodes). There is a nice video on YouTube that shows the deployment process.
