I can't run Hive from Mac terminal - hadoop

I downloaded Hive and Hadoop onto my system. When I enter the jps command, all the nodes seem to be running:
81699 SecondaryNameNode
65058 ResourceManager
82039 NodeManager
36086
81463 NameNode
91288 Jps
37193 Launcher
95256 Launcher
81563 DataNode
However, when I try to run Hive using the ./hive command, I get the following error:
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
ERROR: Invalid HADOOP_COMMON_HOME
Unable to determine Hadoop version information.
'hadoop version' returned:
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
ERROR: Invalid HADOOP_COMMON_HOME
This is what my ~/.bashrc file looks like:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_311.jdk
export HADOOP_HOME=/opt/homebrew/Cellar/hadoop/3.3.1
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/libexec/etc/hadoop
export HIVE_HOME=/Users/arjunpanyam/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin
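For what it's worth, the HADOOP_CONF_DIR line above already points under libexec, which suggests the real Hadoop root of this Homebrew install is $HADOOP_HOME/libexec. A minimal sketch of a ~/.bashrc that matches that assumption (the paths are guesses based on the versions shown; verify them on your own machine) would be:
# Hypothetical layout for a Homebrew Hadoop 3.3.1 install; adjust paths as needed.
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)   # resolves to .../jdk1.8.0_311.jdk/Contents/Home on macOS
export HADOOP_HOME=/opt/homebrew/Cellar/hadoop/3.3.1/libexec
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HIVE_HOME=/Users/arjunpanyam/apache-hive-3.1.2-bin
export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin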

Related

How to remove ERROR start-dfs.sh in Hadoop-3.2.0

I am getting the following errors when running start-dfs.sh to start the Hadoop services:
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [ahsan-Lenovo-G570]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
In the Hadoop home directory, open the etc/hadoop/hadoop-env.sh file and add the lines below to remove the error:
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
You can use your own user name by replacing root in the lines above, as in the sketch below.
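For example, with a non-root account named hadoopuser (a placeholder name), the same block in etc/hadoop/hadoop-env.sh would read:
# Replace root with whichever account actually runs the daemons (hadoopuser is hypothetical).
export HDFS_NAMENODE_USER=hadoopuser
export HDFS_DATANODE_USER=hadoopuser
export HDFS_SECONDARYNAMENODE_USER=hadoopuser
export YARN_RESOURCEMANAGER_USER=hadoopuser
export YARN_NODEMANAGER_USER=hadoopuser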

Not taking right user name while starting Hadoop

I'm attempting to start Hadoop with
./sbin/start-dfs.sh
but I get the following error:
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
I ran this from my terminal before executing the start script:
export HADOOP_USER_NAME="myname"
export HDFS_NAMENODE_USER="myname"
export HDFS_DATANODE_USER="myname"
export HDFS_SECONDARYNAMENODE_USER="myname"
export YARN_RESOURCEMANAGER_USER="myname"
export YARN_NODEMANAGER_USER="myname"
I have also created the data folder and assigned it to the same user group. Is there anything else I'm missing?
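One thing worth checking: the error shows the script still running as root, and variables exported in your own shell do not survive a sudo invocation. A hedged alternative, mirroring the answer to the previous related question, is to append the same variables to hadoop-env.sh, which start-dfs.sh sources itself:
# Sketch using the values from the question; hadoop-env.sh is read by the start
# scripts directly, so these settings survive even when the shell environment does not.
echo 'export HDFS_NAMENODE_USER="myname"' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
echo 'export HDFS_DATANODE_USER="myname"' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
echo 'export HDFS_SECONDARYNAMENODE_USER="myname"' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
./sbin/start-dfs.sh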

Should hadoop installation path be the same across nodes

Hadoop 2.7 is installed at /opt/pro/hadoop/hadoop-2.7.3 on the master, and the whole installation is then copied to the slave, but into a different directory, /opt/pro/hadoop-2.7.3. I then updated the environment variables (e.g., HADOOP_HOME, and hdfs-site.xml for the namenode and datanode) on the slave machine.
Now I can run hadoop version successfully on the slave. However, on the master, start-dfs.sh fails with this message:
17/02/18 10:24:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master]
master: starting namenode, logging to /opt/pro/hadoop/hadoop-2.7.3/logs/hadoop-shijiex-namenode-shijie-ThinkPad-T410.out
master: starting datanode, logging to /opt/pro/hadoop/hadoop-2.7.3/logs/hadoop-shijiex-datanode-shijie-ThinkPad-T410.out
slave: bash: line 0: cd: /opt/pro/hadoop/hadoop-2.7.3: No such file or directory
slave: bash: /opt/pro/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh: No such file or directory
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/pro/hadoop/hadoop-2.7.3/logs/hadoop-shijiex-secondarynamenode-shijie-ThinkPad-T410.out
17/02/18 10:26:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Hadoop uses the master's HADOOP_HOME (/opt/pro/hadoop/hadoop-2.7.3) on the slave, while the HADOOP_HOME on the slave is /opt/pro/hadoop-2.7.3.
So, should HADOOP_HOME be the same across nodes at installation time?
.bashrc
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$PATH:/usr/lib/jvm/java-7-openjdk-amd64/bin
export HADOOP_HOME=/opt/pro/hadoop-2.7.3
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
On the slave server, $HADOOP_HOME/etc/hadoop has a masters file:
xx@wodaxia:/opt/pro/hadoop-2.7.3/etc/hadoop$ cat masters
master
No, not necessarily. But if the paths differ among the nodes, then you cannot use scripts such as start-dfs.sh and stop-dfs.sh (and their YARN equivalents). These scripts refer to the $HADOOP_PREFIX variable of the node where the script is executed.
Here is a snippet from hadoop-daemons.sh, which start-dfs.sh uses to start all the datanodes:
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
The script is written this way on the assumption that all the nodes of the cluster use the same $HADOOP_PREFIX (or the deprecated $HADOOP_HOME) path.
To overcome this, either:
1) try to keep the path the same across all the nodes, or
2) log in to each node in the cluster and start the DFS processes applicable to that node using:
$HADOOP_HOME/sbin/hadoop-daemon.sh start <namenode | datanode | secondarynamenode | journalnode>
The same procedure applies for YARN:
$HADOOP_HOME/sbin/yarn-daemon.sh start <resourcemanager | nodemanager>
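As a usage sketch of option 2 with the paths from this question (which daemon belongs on which machine is an assumption; start whatever actually runs on each node):
# On the master (HADOOP_HOME = /opt/pro/hadoop/hadoop-2.7.3):
/opt/pro/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh start namenode
/opt/pro/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh start secondarynamenode
# On the slave (HADOOP_HOME = /opt/pro/hadoop-2.7.3):
/opt/pro/hadoop-2.7.3/sbin/hadoop-daemon.sh start datanode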
No, it does not have to be the same. $HADOOP_HOME is set individually on each Hadoop node, but it can be defined in different ways. You can define it globally by setting it in the .bashrc file, or it can be set locally in the hadoop-env.sh script in your Hadoop folder, for example. Verify that the values are the same on every node of the cluster. If it is set globally, you can check it with echo $HADOOP_HOME. If it is a script option, you can verify this variable by importing it into the current context and checking it again:
. /opt/pro/hadoop/hadoop-2.7.3/etc/hadoop/hadoop-env.sh
echo $HADOOP_HOME
Besides, make sure that you don't have the hadoop.home.dir property in your configuration, as it overrides the environment variable $HADOOP_HOME.
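A quick way to check for that property (the path is the slave's from this question; run the equivalent on each node):
# Search every configuration file for an explicit hadoop.home.dir override.
grep -r "hadoop.home.dir" /opt/pro/hadoop-2.7.3/etc/hadoop/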

Cannot start running on browser the namenode for Hadoop

This is my first time installing Hadoop on Linux (Fedora distro) running in a VM (using Parallels on my Mac). I followed every step of this video, including the textual version of it. Then, when I browse to localhost (or the equivalent value from hostname) on port 50070, I get the following message.
...can't establish a connection to the server at localhost:50070
When I run the jps command, by the way, I don't have the datanode and namenode processes, unlike at the end of the textual version of the tutorial. Mine has only the following processes running:
6021 NodeManager
3947 SecondaryNameNode
5788 ResourceManager
8941 Jps
When I run the hadoop namenode command, I get the following [redacted] error, among others:
Cannot access storage directory /usr/local/hadoop_store/hdfs/namenode
16/10/11 21:52:45 WARN namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /usr/local/hadoop_store/hdfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
I tried, by the way, to access the above-mentioned directories, and they exist.
Any hint for this newbie? ;-)
You would need to give read and write permission on the directory /usr/local/hadoop_store/hdfs/namenode to the user with which you are running the services.
Once that is done, you should format the namenode using hadoop namenode -format.
Then try to start your services.
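A minimal sketch of that sequence, assuming the daemons run as your current user (adjust the owner if you use a dedicated account):
sudo chown -R $USER /usr/local/hadoop_store/hdfs   # give your user ownership of the storage directories
sudo chmod -R u+rw /usr/local/hadoop_store/hdfs    # make sure they are readable and writable
hadoop namenode -format                            # recreate the namenode storage
start-dfs.sh                                       # then start the services again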
Delete the files under /app/hadoop/tmp/*, then try formatting the namenode again, and then run start-dfs.sh and start-yarn.sh.
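The same recovery as a single run (the tmp path is taken from the answer above; formatting wipes HDFS metadata, so only do this on a cluster whose data you can lose):
rm -rf /app/hadoop/tmp/*        # clear the old temporary/storage files
hadoop namenode -format         # reformat the namenode
start-dfs.sh && start-yarn.sh   # restart HDFS and YARN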

Hadoop cannot start NodeManager

I have installed a Hadoop cluster running Hadoop version 0.23.9. I applied the HDFS-1943 patch, and now I can start the namenode and all the datanodes (start-dfs.sh is working for me).
However, when I try to start the YARN daemons (by running start-yarn.sh), it shows the following error, the same as happened previously:
[root@dbnode1 sbin]# ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hchen/hadoop-0.23.9/logs/yarn-root-resourcemanager-dbnode1.out
datanode: starting nodemanager, logging to /home/hchen/hadoop-0.23.9/logs/yarn-root-nodemanager-dbnode2.out
datanode: Unrecognized option: -jvm
datanode: Error: Could not create the Java Virtual Machine.
datanode: Error: A fatal exception has occurred. Program will exit.
I have already installed the patch, and start-dfs.sh is working for me. Why doesn't start-yarn.sh work?
Run HDFS as a non-root user with the appropriate permissions. Here is a JIRA with more details.
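A rough sketch of what that can look like, with a hypothetical account name hadoop and the install path taken from the logs above:
useradd -m hadoop                                  # hypothetical non-root account for the daemons
chown -R hadoop:hadoop /home/hchen/hadoop-0.23.9   # hand the installation over to that user
su - hadoop -c "/home/hchen/hadoop-0.23.9/sbin/start-dfs.sh"
su - hadoop -c "/home/hchen/hadoop-0.23.9/sbin/start-yarn.sh"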
