I get the following error when running a script from crontab:
"/home/sqoop/proddata/test.sh: 2: /home/sqoop/proddata/test.sh: hdfs: not found"
I run the following command in the terminal, and the same command appears in the script; both work when run manually, but via cron the script fails:
# hdfs dfs -mkdir /user/hadoop/table/testtable
# vim test.sh
hdfs dfs -mkdir /user/hadoop/table/testtable
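cron runs with a minimal PATH, so hdfs is typically not found there even though the same command works in an interactive shell. A likely fix is to set the path inside the script or to call the binary by its absolute path; below is a minimal sketch, assuming Hadoop lives under /usr/local/hadoop (check the real location with "which hdfs" in your terminal):
#!/bin/sh
# test.sh - sketch only; adjust HADOOP_HOME to your actual install location
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
hdfs dfs -mkdir /user/hadoop/table/testtable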
[root@nn1 hadoop-2.9.0]# ./sbin/start-dfs.sh
Starting namenodes on [nn1]
nn1: namenode running as process 2707. Stop it first.
nn1: datanode running as process 2859. Stop it first.
dn1: bash: line 0: cd: /home/user1/hadoop-2.9.0: No such file or directory
dn1: bash: /home/user1/hadoop-2.9.0/sbin/hadoop-daemon.sh: No such file or directory
dn2: bash: line 0: cd: /home/user1/hadoop-2.9.0: No such file or directory
dn2: bash: /home/user1/hadoop-2.9.0/sbin/hadoop-daemon.sh: No such file or directory
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 3052. Stop it first.
[root@nn1 hadoop-2.9.0]#
For reference:
Master:
Hostname = nn1
Username = user1
Slave 1:
Hostname = dn1
Username = slave1
Slave 2:
Hostname = dn2
Username = slave2
You're running the commands as the root user, not as user1, whose home directory is where the script is looking for the files.
And you really should not run any component of Hadoop as root anyway.
The output also says your processes are already running, so it's not clear why you're trying to start them again.
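As an illustration only: assuming the same non-root account (user1 here) owns a Hadoop install at the same path on every node, you could switch to it before touching the daemons. This is a sketch, not a verified layout for your cluster:
su - user1
cd /home/user1/hadoop-2.9.0
./sbin/stop-dfs.sh     # stop the daemons that are already running first
./sbin/start-dfs.sh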
I am on Amazon Linux and was trying to set up Hadoop.
[ec2-user@ec2-user ~]$ hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
/home/ec2-user/hadoop-2.6.0/bin/hdfs: line 276: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64/jre/bin/java: No such file or directory
[ec2-user@ec2-user ~]$
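That error means the JAVA_HOME configured for Hadoop (in etc/hadoop/hadoop-env.sh) points at a JVM that is not actually installed on the instance. A rough sketch of the usual fix, assuming an OpenJDK 8 package is available through yum (package names and paths may differ on your Amazon Linux version):
sudo yum install -y java-1.8.0-openjdk-devel    # or whichever JDK your repo provides
readlink -f $(which java)                       # shows where the JVM really lives
# then point hadoop-env.sh at that JVM; the path below is an example, not necessarily yours
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk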
I am installing Hortonworks Data Platform 2.2 manually from RPMs on CentOS 6.5 64-bit. While formatting the NameNode, an "insufficient parameters" error is thrown.
These are the instructions according to the manual:
Format and Start HDFS
1. Execute these commands at the NameNode host machine:
su - hdfs
/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh namenode -format
/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode
But when I run the format command:
[root@virtual ~]# su - hdfs
[hdfs@virtual ~]$ /usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh namenode -format
Usage: hadoop-daemon.sh [--config <conf-dir>] [--hosts hostlistfile] [--script script] (start|stop) <hadoop-command> <args...>
It doesn't get past the parameter check:
# Runs a Hadoop command as a daemon.
#
# Environment Variables
#
# HADOOP_CONF_DIR Alternate conf dir. Default is ${HADOOP_PREFIX}/conf.
# HADOOP_LOG_DIR Where log files are stored. PWD by default.
# HADOOP_MASTER host:path where hadoop code should be rsync'd from
# HADOOP_PID_DIR The pid files are stored. /tmp by default.
# HADOOP_IDENT_STRING A string representing this instance of hadoop. $USER by default
# HADOOP_NICENESS The scheduling priority for daemons. Defaults to 0.
##
export HADOOP_HOME=/usr/hdp/2.2.0.0-2041/hadoop
usage="Usage: hadoop-daemon.sh [--config <conf-dir>] [--hosts hostlistfile] [--script script] (start|stop) <hadoop-command> <args...>"
# if no args specified, show usage
if [ $# -le 1 ]; then
echo $usage
exit 1
fi
I am not sure if this is a bug in the script or in the manual ...
Any hint would help.
Thanks
For formatting the NameNode, you can use the following command run as the 'hdfs' admin user:
/usr/bin/hdfs namenode -format
For starting up the NameNode daemon, use the hadoop-daemon.sh script:
/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start namenode
"-config $HADOOP_CONF_DIR" is an optional parameter here in case you want to reference a specific Hadoop configuration directory.
I am using Hadoop 1.2.1 on the master and the slave, but they are installed in different directories. So when I invoke bin/start-dfs.sh on the master, I get the following errors.
partho@partho-Satellite-L650: starting datanode, logging to /home/partho/hadoop/apache/hadoop-1.2.1/libexec/../logs/hadoop-partho-datanode-partho-Satellite-L650.out
hduser@node2-VirtualBox: bash: line 0: cd: /home/partho/hadoop/apache/hadoop-1.2.1/libexec/..: No such file or directory
hduser@node2-VirtualBox: bash: /home/partho/hadoop/apache/hadoop-1.2.1/bin/hadoop-daemon.sh: No such file or directory
partho@partho-Satellite-L650: starting secondarynamenode, logging to /home/partho/hadoop/apache/hadoop-1.2.1/libexec/../logs/hadoop-partho-secondarynamenode-partho-Satellite-L650.out
The daemons start up fine on the master, as you can see below:
partho@partho-Satellite-L650:~/hadoop/apache/hadoop-1.2.1$ jps
4850 Jps
4596 DataNode
4441 NameNode
4764 SecondaryNameNode
It is obvious that Hadoop is trying to find hadoop-daemon.sh and libexec on the slave using the $HADOOP_HOME defined on the master.
How can I configure the individual datanodes/slaves so that when I start the cluster from the master, each slave's own Hadoop home directory is checked for hadoop-daemon.sh?
Hadoop usually sets the HADOOP_HOME environment variable on each node in a file named hadoop-env.sh.
You can update hadoop-env.sh on each node with the path for that particular node. It should maybe be in /home/partho/hadoop/apache/hadoop-1.2.1/. You'll probably want to stop the cluster first so it picks up the changes.
If you have locate installed, run:
locate hadoop-env.sh
or find / -name "hadoop-env.sh"
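For example, in the hadoop-env.sh that those commands locate on a slave, the export might look like the following sketch (the install path is only an illustration of a slave-side location):
# in the slave's conf/hadoop-env.sh - illustrative path only
export HADOOP_HOME=/home/hduser/hadoop-1.2.1
export HADOOP_HOME_WARN_SUPPRESS=1   # Hadoop 1.x prints a deprecation warning when HADOOP_HOME is set; this silences it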
The best solution is that you can keep the Hadoop directory anywhere you like, but it should be at the same path on both machines, for example:
on the master:
/opt/hadoop
on the slave:
/opt/hadoop
It doesn't matter which version you are using, but the directory path should be the same.
Once you have set up the cluster, to start all daemons from the master:
bin/hadoop namenode -format (if required)
bin/stop-dfs.sh
bin/start-dfs.sh
bin/start-mapred.sh
In order to start all nodes from the master:
- you need to install ssh on each node
- once you have installed ssh and generated an ssh key on each server, try connecting to each node from the master
- make sure the slaves file on the master node has the IPs of all nodes
So the commands would be (a consolidated sketch follows the list):
- install ssh (on each node): apt-get install openssh-server
- once ssh is installed, generate a key: ssh-keygen -t rsa -P ""
- create passwordless login from the namenode to each node:
ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@datanodeIP
(user is the hadoop user on each machine)
- put the IPs of all nodes in the slaves file (in the conf dir) on the namenode
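A consolidated sketch of those steps as run from the master (the hadoop user name and the datanode IPs below are placeholders):
sudo apt-get install openssh-server                        # on every node
ssh-keygen -t rsa -P ""                                    # on the master, as the hadoop user
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@192.168.0.11   # repeat for every datanode IP
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@192.168.0.12
printf '192.168.0.11\n192.168.0.12\n' >> /opt/hadoop/conf/slaves   # slaves file in the conf dir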
Short Answer
On Master-Side
hadoop-daemons.sh
In $HADOOP_HOME/sbin/hadoop-daemons.sh (not $HADOOP_HOME/sbin/hadoop-daemon.sh; note the s in the filename), there is a line calling $HADOOP_HOME/sbin/slaves.sh. In my version (Hadoop v2.7.7), it reads:
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$#"
Change that line to the following to make it respect slave-side environment variables:
exec "$bin/slaves.sh" "source" ".bash_aliases" \; "hadoop-daemon.sh" "$#"
yarn-daemons.sh
Similarly, in $HADOOP_HOME/sbin/yarn-daemons.sh, change the line:
exec "$bin/slaves.sh" --config $YARN_CONF_DIR cd "$HADOOP_YARN_HOME" \; "$bin/yarn-daemon.sh" --config $YARN_CONF_DIR "$#"
to
exec "$bin/slaves.sh" "source" ".bash_aliases" \; "yarn-daemon.sh" "$#"
On Slave-Side
Put all Hadoop-related environment variables into $HOME/.bash_aliases.
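For reference, a hedged example of what that file might contain on a slave (the JDK and install paths are assumptions; adjust them per node):
# $HOME/.bash_aliases on the slave - example values only
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=$HOME/hadoop-2.7.7
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin   # so a plain hadoop-daemon.sh resolves on the slave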
Start / Stop
To start HDFS, just run start-dfs.sh on the master side. The data node on each slave will be started as if hadoop-daemon.sh start datanode were executed from an interactive shell on that slave.
To stop HDFS, just run stop-dfs.sh.
Note
The above changes are already enough. But for perfectionists, you may also want to fix the callers of sbin/hadoop-daemons.sh so that the commands look correct when you dump them. In that case, find all occurrences of hadoop-daemons.sh in the Hadoop scripts and replace --script "$bin"/hdfs with --script hdfs (and, in general, every --script "$bin"/something with just --script something). In my case all the occurrences use hdfs, and since the slave side rewrites the command path when it is hdfs-related, the command works just fine with or without this fix.
Here is an example fix in sbin/start-secure-dns.sh.
Change:
"$HADOOP_PREFIX"/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs start datanode $dataStartOpt
to
"$HADOOP_PREFIX"/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode $dataStartOpt
In my version (Hadoop v2.7.7), the following files need to be fixed:
sbin/start-secure-dns.sh (1 occurrence)
sbin/stop-secure-dns.sh (1 occurrence)
sbin/start-dfs.sh (5 occurrences)
sbin/stop-dfs.sh (5 occurrences)
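If you prefer not to edit those files by hand, a single sed pass could apply the change (a sketch; back up sbin/ and review the diff before trusting it):
cd $HADOOP_HOME/sbin
sed -i 's|--script "$bin"/hdfs|--script hdfs|g' start-secure-dns.sh stop-secure-dns.sh start-dfs.sh stop-dfs.sh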
Explanation
In sbin/slaves.sh, the line which connects the master to the slaves via ssh reads:
ssh $HADOOP_SSH_OPTS $slave $"${@// /\\ }" \
2>&1 | sed "s/^/$slave: /" &
I added 3 lines before it to dump the variables:
printf 'XXX HADOOP_SSH_OPTS: %s\n' "$HADOOP_SSH_OPTS"
printf 'XXX slave: %s\n' "$slave"
printf 'XXX command: %s\n' $"${@// /\\ }"
In sbin/hadoop-daemons.sh, the line calling sbin/slaves.sh reads (I split it into 2 lines to prevent scrolling):
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; \
"$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$#"
The sbin/start-dfs.sh script calls sbin/hadoop-daemons.sh. Here is the result when sbin/start-dfs.sh is executed:
Starting namenodes on [master]
XXX HADOOP_SSH_OPTS:
XXX slave: master
XXX command: cd
XXX command: /home/hduser/hadoop-2.7.7
XXX command: ;
XXX command: /home/hduser/hadoop-2.7.7/sbin/hadoop-daemon.sh
XXX command: --config
XXX command: /home/hduser/hadoop-2.7.7/etc/hadoop
XXX command: --script
XXX command: /home/hduser/hadoop-2.7.7/sbin/hdfs
XXX command: start
XXX command: namenode
master: starting namenode, logging to /home/hduser/hadoop-2.7.7/logs/hadoop-hduser-namenode-akmacbook.out
XXX HADOOP_SSH_OPTS:
XXX slave: slave1
XXX command: cd
XXX command: /home/hduser/hadoop-2.7.7
XXX command: ;
XXX command: /home/hduser/hadoop-2.7.7/sbin/hadoop-daemon.sh
XXX command: --config
XXX command: /home/hduser/hadoop-2.7.7/etc/hadoop
XXX command: --script
XXX command: /home/hduser/hadoop-2.7.7/sbin/hdfs
XXX command: start
XXX command: datanode
slave1: bash: line 0: cd: /home/hduser/hadoop-2.7.7: Permission denied
slave1: bash: /home/hduser/hadoop-2.7.7/sbin/hadoop-daemon.sh: Permission denied
Starting secondary namenodes [master]
XXX HADOOP_SSH_OPTS:
XXX slave: master
XXX command: cd
XXX command: /home/hduser/hadoop-2.7.7
XXX command: ;
XXX command: /home/hduser/hadoop-2.7.7/sbin/hadoop-daemon.sh
XXX command: --config
XXX command: /home/hduser/hadoop-2.7.7/etc/hadoop
XXX command: --script
XXX command: /home/hduser/hadoop-2.7.7/sbin/hdfs
XXX command: start
XXX command: secondarynamenode
master: starting secondarynamenode, logging to /home/hduser/hadoop-2.7.7/logs/hadoop-hduser-secondarynamenode-akmacbook.out
As you can see from the above result, the script does not respect the slave-side .bashrc and etc/hadoop/hadoop-env.sh.
Solution
From the result above, we know that the variable $HADOOP_CONF_DIR is resolved on the master side. The problem would be solved if it were resolved on the slave side. However, since the shell created by ssh (with a command attached) is a non-interactive shell, the .bashrc script is not loaded on the slave side. Therefore, the following command prints nothing:
ssh slave1 'echo $HADOOP_HOME'
We can force it to load .bashrc:
ssh slave1 'source .bashrc; echo $HADOOP_HOME'
However, the following block in .bashrc (default in Ubuntu 18.04) guards non-interactive shells:
# If not running interactively, don't do anything
case $- in
*i*) ;;
*) return;;
esac
At this point, you may remove the above block from .bashrc to try to achieve the goal, but I don't think it's a good idea. I did not try it, but I think that the guard is there for a reason.
On my platform (Ubuntu 18.04), when I login interactively (via console or ssh), .profile loads .bashrc, and .bashrc loads .bash_aliases. Therefore, I have a habit of keeping all .profile, .bashrc, .bash_logout unchanged, and put any customizations into .bash_aliases.
If on your platform .bash_aliases does not load, append the following code to .bashrc:
if [ -f ~/.bash_aliases ]; then
. ~/.bash_aliases
fi
Back to the problem. We could therefore load .bash_aliases instead of .bashrc. So, the following code does the job, and the $HADOOP_HOME from the slave-side is printed:
ssh slave1 'source .bash_aliases; echo $HADOOP_HOME'
By applying this technique to the sbin/hadoop-daemons.sh script, the result is the Short Answer mentioned above.