start-all.sh command not found - hadoop

I have just installed the Cloudera VM setup for Hadoop. But when I open the command prompt and try to start all the Hadoop daemons with 'start-all.sh', I get the error "bash: start-all.sh: command not found".
I have tried 'start-dfs.sh' too, but it gives the same error. When I run the 'jps' command, I can see that none of the daemons have been started.

The start-all.sh and start-dfs.sh scripts live in the bin or sbin folder of the Hadoop installation. To locate them, go to the Hadoop installation folder and run:
find . -name 'start-all.sh' # finds files named start-all.sh
Then you can start all the daemons by giving the script's path: bash /path/to/start-all.sh
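For instance, a minimal sketch; the install path /usr/lib/hadoop below is an assumption and may differ on your VM:
# Assumed Hadoop install location; adjust for your machine
cd /usr/lib/hadoop
find . -name 'start-all.sh'     # prints something like ./sbin/start-all.sh
bash ./sbin/start-all.sh        # start the daemons via the script's path
jps                             # verify which daemons are now running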

If you're using the QuickStart VM, then the right way to start the cluster (as @cricket_007 hinted) is by restarting it in the Cloudera Manager UI. The start-all.sh scripts will not work, since they only cover the core Hadoop daemons (NameNode, DataNode, ResourceManager, NodeManager ...) and not the other services in the ecosystem (such as Hive, Impala, Spark, Oozie, Hue ...).
You can refer to the YouTube video and the official documentation, Starting, Stopping, Refreshing, and Restarting a Cluster.
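If you later need a scripted alternative to clicking through the UI, Cloudera Manager also exposes a REST API; the command below is only an illustration, and the host, port, credentials, API version, and cluster name are all assumptions to adjust for your setup.
# Hypothetical example: restart the whole cluster through the Cloudera Manager API.
# Replace the host, credentials, API version (v19) and cluster name with your own values.
curl -u admin:admin -X POST \
  "http://quickstart.cloudera:7180/api/v19/clusters/Cluster%201/commands/restart"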

Related

Secondary Name Node not starting on hadoop

I have installed Hadoop and all its components via the command line on GCP, but my Secondary NameNode isn't running when I execute jps. All the other daemons run fine. What could be the problem?

How to start Datanode? (Cannot find start-dfs.sh script)

We are setting up automated deployments on a headless system, so using the GUI is not an option here.
Where is the start-dfs.sh script for HDFS in the Hortonworks Data Platform? CDH / Cloudera packages those files under the hadoop/sbin directory. However, when we search for those scripts under HDP, they are not found:
$ pwd
/usr/hdp/current
Which scripts exist in HDP?
[stack@s1-639016 current]$ find -L . -name \*.sh
./hadoop-hdfs-client/sbin/refresh-namenodes.sh
./hadoop-hdfs-client/sbin/distribute-exclude.sh
./hadoop-hdfs-datanode/sbin/refresh-namenodes.sh
./hadoop-hdfs-datanode/sbin/distribute-exclude.sh
./hadoop-hdfs-nfs3/sbin/refresh-namenodes.sh
./hadoop-hdfs-nfs3/sbin/distribute-exclude.sh
./hadoop-hdfs-secondarynamenode/sbin/refresh-namenodes.sh
./hadoop-hdfs-secondarynamenode/sbin/distribute-exclude.sh
./hadoop-hdfs-namenode/sbin/refresh-namenodes.sh
./hadoop-hdfs-namenode/sbin/distribute-exclude.sh
./hadoop-hdfs-journalnode/sbin/refresh-namenodes.sh
./hadoop-hdfs-journalnode/sbin/distribute-exclude.sh
./hadoop-hdfs-portmap/sbin/refresh-namenodes.sh
./hadoop-hdfs-portmap/sbin/distribute-exclude.sh
./hadoop-client/sbin/hadoop-daemon.sh
./hadoop-client/sbin/slaves.sh
./hadoop-client/sbin/hadoop-daemons.sh
./hadoop-client/etc/hadoop/hadoop-env.sh
./hadoop-client/etc/hadoop/kms-env.sh
./hadoop-client/etc/hadoop/mapred-env.sh
./hadoop-client/conf/hadoop-env.sh
./hadoop-client/conf/kms-env.sh
./hadoop-client/conf/mapred-env.sh
./hadoop-client/libexec/kms-config.sh
./hadoop-client/libexec/init-hdfs.sh
./hadoop-client/libexec/hadoop-layout.sh
./hadoop-client/libexec/hadoop-config.sh
./hadoop-client/libexec/hdfs-config.sh
./zookeeper-client/conf/zookeeper-env.sh
./zookeeper-client/bin/zkCli.sh
./zookeeper-client/bin/zkCleanup.sh
./zookeeper-client/bin/zkServer-initialize.sh
./zookeeper-client/bin/zkEnv.sh
./zookeeper-client/bin/zkServer.sh
Notice: there are ZERO start/stop .sh scripts.
In particular, I am interested in the start-dfs.sh script that starts the namenode(s), journalnode, and datanodes.
How to start DataNode
su - hdfs -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode";
Github - Hortonworks Start Scripts
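For reference, the same hadoop-daemon.sh pattern can be applied to the other HDFS daemons; the paths below simply mirror the datanode command above and are assumptions that may vary between HDP versions:
# Assumed paths, mirroring the datanode command above; adjust for your HDP layout.
su - hdfs -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode"
su - hdfs -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start secondarynamenode"
su - hdfs -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start journalnode"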
Update
Decided to hunt for it myself.
Spun up a single node with Ambari, installed HDP 2.2 (a), HDP 2.3 (b)
sudo find / -name \*.sh | grep start
Found
(a) /usr/hdp/2.2.8.0-3150/hadoop/src/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/start-dfs.sh
Weird that it doesn't exist in /usr/hdp/current, which should be symlinked.
(b) /hadoop/yarn/local/filecache/10/mapreduce.tar.gz/hadoop/sbin/start-dfs.sh
The recommended way to administer your Hadoop cluster is via the administration panel. Since you are working with the Hortonworks distribution, it makes more sense for you to use Ambari instead.

JobTracker and TaskTracker don't show up when running the start-all.sh command in Ubuntu for Hadoop

The JobTracker and TaskTracker don't show up when I run the start-all.sh command in Ubuntu for Hadoop.
I do get the rest of the processes when I run the jps command.
Not sure why the JobTracker and TaskTracker are not shown; I have been following a couple of links and couldn't get my problem sorted.
Steps done :
- Formatted the NameNode multiple times
- Deleted and recreated the tmp folder multiple times with the appropriate permissions
What could be the issue?
Any suggestions would really help, as I am struggling to set up Hadoop on my laptop. I am new to it.
Try starting the jobtracker and tasktracker separately.
From your Hadoop HOME directory, run
. bin/../libexec/hadoop-config.sh
Then from the Hadoop BIN directory, run
hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
hadoop-daemon.sh --config $HADOOP_CONF_DIR start tasktracker
You must be using a Hadoop 2.x version, where the JobTracker is replaced by the YARN ResourceManager. Using jps (a JDK is needed) you can check whether the ResourceManager is running. If it is running, its default URL is (host-name):8088, where you can also check your nodes, jobs, and configuration. If it is not running, start the YARN daemons with sbin/start-yarn.sh.
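As a quick check, assuming a standard Hadoop 2.x layout under $HADOOP_HOME:
# Start the YARN daemons and confirm the ResourceManager is up.
$HADOOP_HOME/sbin/start-yarn.sh
jps                                            # should now list ResourceManager and NodeManager
curl -s http://localhost:8088/cluster | head   # ResourceManager web UI on the default port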

Could not find and execute start-all.sh and Stop-all.sh on Cloudera VM for Hadoop

How do I start / stop services from the command line in CDH4? I am new to Hadoop and installed the VM from Cloudera, but could not find start-all.sh and stop-all.sh. How do I stop or start the TaskTracker or DataNode when I want to? It is a single-node cluster which I am using on CentOS, and I haven't made any modifications.
Moreover, I see there are changes in the directory structure across all the flavours, and I could not locate these .sh files on the VM for my installation.
[cloudera@localhost ~]$ stop-all.sh
bash: stop-all.sh: command not found
Highly appreciate your support.
Use sudo su hdfs to switch to the hdfs user and start the services; to stop them, just type exit and it will stop all the services.
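If the CDH packages came from Cloudera's repositories, the individual daemons are usually managed through init scripts rather than start-all.sh; the service names below are typical CDH4 package names and are an assumption for your particular install, so check /etc/init.d/ for the exact names.
# Assumed CDH4 service names; list /etc/init.d/ to see what is actually installed.
sudo service hadoop-hdfs-datanode stop
sudo service hadoop-hdfs-datanode start
sudo service hadoop-0.20-mapreduce-tasktracker restart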

Need help adding multiple DataNodes in pseudo-distributed mode (one machine), using Hadoop-0.18.0

I am a student, interested in Hadoop and started to explore it recently.
I tried adding an additional DataNode in the pseudo-distributed mode but failed.
I am following the Yahoo developer tutorial and so the version of Hadoop I am using is hadoop-0.18.0
I tried to start up using 2 methods I found online:
Method 1 (link)
I have a problem with this line
bin/hadoop-daemon.sh --script bin/hdfs $1 datanode $DN_CONF_OPTS
--script bin/hdfs doesn't seem to be valid in the version I am using. I changed it to --config $HADOOP_HOME/conf2 with all the configuration files in that directory, but when the script is run it gives the error:
Usage: Java DataNode [-rollback]
Any idea what the error means? The log files are created, but the DataNode did not start.
Method 2 (link)
Basically, I duplicated the conf folder into a conf2 folder, making the necessary changes documented on the website to hadoop-site.xml and hadoop-env.sh. Then I ran the command
./hadoop-daemon.sh --config ..../conf2 start datanode
it gives the error:
datanode running as process 4190. stop it first.
So I guess this is the 1st DataNode that was started, and the command failed to start another DataNode.
Is there anything I can do to start additional DataNode in the Yahoo VM Hadoop environment? Any help/advice would be greatly appreciated.
Hadoop start/stop scripts use /tmp as the default directory for storing the PIDs of already started daemons. In your situation, when you start the second datanode, the startup script finds the /tmp/hadoop-someuser-datanode.pid file left by the first datanode and assumes that the datanode daemon is already running.
The plain solution is to set the HADOOP_PID_DIR environment variable to something other than /tmp, as in the sketch below. Also, do not forget to update all the network port numbers in conf2.
The smart solution is to start a second VM with a Hadoop environment and join the two in a single cluster; that is how Hadoop is intended to be used.
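A minimal sketch of the first approach, assuming the Hadoop 0.18-style scripts and a conf2 directory whose ports have already been changed (the PID and log paths below are arbitrary examples):
# Give the second datanode its own PID (and log) directory so its .pid file
# does not collide with the first datanode's file in /tmp.
export HADOOP_PID_DIR=/var/hadoop/pids-dn2    # any directory other than /tmp
export HADOOP_LOG_DIR=/var/hadoop/logs-dn2    # optional: keep the logs separate too
bin/hadoop-daemon.sh --config conf2 start datanode
jps                                           # a second DataNode process should appear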
