Hadoop: start-dfs.sh does not work when called directly

I have a very strange problem when starting Hadoop.
When I call start-dfs.sh using the absolute path /usr/local/hadoop/etc/hadoop/sbin/start-dfs.sh, it starts without any problem.
But after I added Hadoop to my environment variables:
export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
I would like to call it directly as start-dfs.sh. But when I start it like this, it throws this error:
20/10/26 16:36:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
localhost: Error: JAVA_HOME is not set and could not be found.
localhost: Error: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
20/10/26 16:36:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I wonder what the problem is. My JAVA_HOME and core-site.xml are properly configured. Why doesn't it work when I start it directly from bash?

It seems that you need to set the JAVA_HOME environment variable to wherever your Java package is located in your (I suppose) Linux distribution. To do that, you first have to locate the path of the Java installation.
You can use the following command in your terminal:
find /usr -name java 2> /dev/null
which will output one or more paths (depending on how many Java versions are on your system).
Choose one of the versions (or just take the single one you have) and copy its path.
Next, to set JAVA_HOME, take the path you got from the output above and trim the trailing /bin/java off of it, so that it ends at the JDK directory.
For my system I chose the third Java version, so I added these two lines at the bottom of my .bashrc (notice that JAVA_HOME ends before the /bin directory, while the PATH entry ends after the /bin directory):
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-i386
export PATH=$PATH:/usr/lib/jvm/java-11-openjdk-i386/bin/
So the bottom of the .bashrc file contains exactly those two lines.
To test it out, run the script without its full path (this also works for the start-all.sh and stop-all.sh scripts).
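For example, a minimal check (assuming the two exports above sit at the bottom of your .bashrc):
# reload the shell configuration so the new exports take effect
source ~/.bashrc
# confirm JAVA_HOME points at the JDK directory
echo $JAVA_HOME
$JAVA_HOME/bin/java -version
# the script should now be found via PATH
start-dfs.sh
# list the running Hadoop daemons (NameNode, DataNode, SecondaryNameNode)
jps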

In the end, the problem was that I have another Hadoop installation in /opt/module. When I call hdfs, for example, it resolves to the /opt/module one rather than the one in /usr/local.
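If you suspect a second installation is shadowing the one you want, a quick check (a sketch; adjust the directories to your machine) is to ask the shell which binaries it resolves first:
# show every hdfs and start-dfs.sh found on PATH, in resolution order
type -a hdfs
type -a start-dfs.sh
# confirm which installation HADOOP_HOME points at
echo $HADOOP_HOME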

Related

localhost: ERROR: Cannot set priority of datanode process 2984

I set up and configured a multi-node Hadoop cluster. The errors below appear when I start it.
My Ubuntu is 16.04 and my Hadoop is 3.0.2.
Starting namenodes on [master]
Starting datanodes
localhost: ERROR: Cannot set priority of datanode process 2984
Starting secondary namenodes [master]
master: ERROR: Cannot set priority of secondarynamenode process 3175
2018-07-17 02:19:39,470 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Starting nodemanagers
Can anyone tell me which step is wrong?
I had the same error and fixed it by ensuring that the datanode and namenode locations have the right permissions and are owned by the user starting hadoop daemons.
Check that the directory path properties in hdfs-site.xml under $HADOOP_CONF_DIR point to valid locations:
dfs.namenode.name.dir
dfs.datanode.data.dir
dfs.namenode.checkpoint.dir
The Hadoop user must have write permission for these paths.
If write permission is missing for any of these paths, the processes may fail to start and the error you see can occur (see the sketch below).
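As a minimal sketch of that check and fix (the /data/hadoop paths and the hduser account are assumptions; read the real locations from your own hdfs-site.xml first):
# read the configured storage locations
hdfs getconf -confKey dfs.namenode.name.dir
hdfs getconf -confKey dfs.datanode.data.dir
# example fix, assuming the directories live under /data/hadoop and the daemons run as hduser
sudo chown -R hduser:hduser /data/hadoop/namenode /data/hadoop/datanode
sudo chmod -R 750 /data/hadoop/namenode /data/hadoop/datanode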
I had the same error and tried the above method, but it didn't work.
I set XXX_USER in all the xxx-env.sh files and got the same result.
Finally I set HADOOP_SHELL_EXECNAME="root" in ${HADOOP_HOME}/bin/hdfs, and the error disappeared.
The default value of HADOOP_SHELL_EXECNAME is "hdfs".
I had the same error when I renamed my Ubuntu home directory, and had to edit core-site.xml, changing the value of the property hadoop.tmp.dir to the new path.
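To check which path is actually in effect after such a change, something like this should work (the directory shown is only a placeholder):
# print the value Hadoop resolves for hadoop.tmp.dir
hdfs getconf -confKey hadoop.tmp.dir
# make sure that directory exists and is writable by the Hadoop user (placeholder path)
ls -ld /home/newname/hadooptmp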
Just add the native library path to your HADOOP_OPTS, like this:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
I had the same issue. Check the hadoop/logs directory and look for the datanode's .log file, type more nameofthefile.log, and check for errors. Mine was a configuration problem; once I fixed it, everything worked.
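For example, something along these lines should surface the actual failure (the log file name follows the usual hadoop-<user>-datanode-<hostname>.log pattern):
# show the tail of the datanode log
ls $HADOOP_HOME/logs/*datanode*.log
tail -n 50 $HADOOP_HOME/logs/*datanode*.log
# or search it for the first reported error
grep -iE "error|exception" $HADOOP_HOME/logs/*datanode*.log | head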

Snappy compressed file on HDFS appears without extension and is not readable

I configured a MapReduce job to save its output as a Sequence file compressed with Snappy. The MR job executes successfully, but in HDFS the output file appears without any extension, simply as part-r-00000.
I expected the file to have a .snappy extension, i.e. to be named part-r-00000.snappy, and I now think this may be the reason the file is not readable when I try to read it with the pattern hadoop fs -libjars /path/to/jar/myjar.jar -text /path/in/HDFS/to/my/file.
So I get –libjars: Unknown command when executing the command:
hadoop fs –libjars /root/hd/metrics.jar -text /user/maria_dev/hd/output/part-r-00000
And when I'm using this command hadoop fs -text /user/maria_dev/hd/output/part-r-00000, I'm getting the error:
18/02/15 22:01:57 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
-text: Fatal internal error
java.lang.RuntimeException: java.io.IOException: WritableName can't load class: com.hd.metrics.IpMetricsWritable
Caused by: java.lang.ClassNotFoundException: Class com.hd.ipmetrics.IpMetricsWritable not found
Could it be that the absence of the .snappy extension causes the problem? What other command should I try to read the compressed file?
The jar is on my local file system under /root/hd/. Where should I place it to avoid the ClassNotFoundException, or how should I modify the command?
Instead of hadoop fs –libjars (which actually has a wrong hyphen and should be -libjars; copy it exactly and you won't see Unknown command), you should be using the HADOOP_CLASSPATH environment variable:
export HADOOP_CLASSPATH=/root/hd/metrics.jar:${HADOOP_CLASSPATH}
hadoop fs -text /user/maria_dev/hd/output/part-r-*
The error clearly says ClassNotFoundException: Class com.hd.ipmetrics.IpMetricsWritable not found.
It means that a required library is missing in classpath.
To clarify your doubts:
MapReduce outputs files named part-* by default, and an extension has no meaning there. Remember that a file extension is just metadata, usually needed by the Windows operating system to determine a suitable program for the file. It has no meaning in Linux/Unix, and the system's behavior will not change even if you rename the file to .snappy (you may actually try this; see the sketch below).
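For instance, renaming the output in place changes nothing about how it is read (a sketch using the path from the question):
# rename the file so it carries a .snappy extension
hadoop fs -mv /user/maria_dev/hd/output/part-r-00000 /user/maria_dev/hd/output/part-r-00000.snappy
# reading it back behaves the same either way (it still needs the Writable class on the classpath)
hadoop fs -text /user/maria_dev/hd/output/part-r-00000.snappy | head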
The command looks absolutely fine for inspecting the Snappy file, but it seems that some required jar files are missing, which is what causes the ClassNotFoundException.
EDIT 1:
By default, Hadoop picks up jar files from the path emitted by the command below:
$ hadoop classpath
By default it lists all the Hadoop core jars.
You can add your jar by executing the command below at the prompt:
export HADOOP_CLASSPATH=/path/to/my/custom.jar
After executing this, check the classpath again with the hadoop classpath command; you should see your jar listed along with the Hadoop core jars.
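Putting it together for this case, a sketch using the jar and output path from the question would be:
# put the jar containing the custom Writable on Hadoop's classpath
export HADOOP_CLASSPATH=/root/hd/metrics.jar:${HADOOP_CLASSPATH}
# confirm the jar now appears in the resolved classpath
hadoop classpath | tr ':' '\n' | grep metrics.jar
# read the Snappy-compressed sequence file (note the plain hyphen in -text)
hadoop fs -text /user/maria_dev/hd/output/part-r-00000 | head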

How can I run a Bigtable HBase shell from any directory?

I started by following these instructions to install HBase and configure it to hit my Bigtable instance. That all works fine; however, I next wanted to configure this installation further so that I can run hbase shell from anywhere.
So I added the following to my .zshrc:
export HBASE_HOME=/path/to/my/hbase
export PATH=$HBASE_HOME:...
When I run hbase shell now I get the following:
2017-04-28 09:58:45,069 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
NativeException: java.io.IOException: java.lang.ClassNotFoundException: com.google.cloud.bigtable.hbase1_2.BigtableConnection
initialize at /Users/mmscibor/.hbase/lib/ruby/hbase/hbase.rb:42
(root) at /Users/mmscibor/.hbase/bin/hirb.rb:131
I figured something was up with where it was looking for its jars, and noticed that the .tar I downloaded had a lib directory, so I additionally tried:
hbase shell -cp $HBASE_HOME/lib/
But no luck. However, if I navigate to $HBASE_HOME and run hbase shell everything works fine again.
What am I missing here?
You are probably running into the issue described here:
https://github.com/GoogleCloudPlatform/cloud-bigtable-examples/issues/226
You need to set GOOGLE_APPLICATION_CREDENTIALS in your environment, or run gcloud auth application-default login.
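For example (the key file path is only a placeholder for your own service account key):
# either point the client at an explicit service account key ...
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
# ... or create application default credentials interactively
gcloud auth application-default login
# then the shell should start from any directory
hbase shell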

Should hadoop installation path be the same across nodes

Hadoop 2.7 is installed at /opt/pro/hadoop/hadoop-2.7.3 on the master; the whole installation is then copied to the slave, but into a different directory, /opt/pro/hadoop-2.7.3. I then updated the environment variables (e.g., HADOOP_HOME, hdfs-site.xml for the namenode and datanode) on the slave machine.
Now I can run hadoop version on the slave successfully. However, on the master, start-dfs.sh fails with this message:
17/02/18 10:24:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master]
master: starting namenode, logging to /opt/pro/hadoop/hadoop-2.7.3/logs/hadoop-shijiex-namenode-shijie-ThinkPad-T410.out
master: starting datanode, logging to /opt/pro/hadoop/hadoop-2.7.3/logs/hadoop-shijiex-datanode-shijie-ThinkPad-T410.out
slave: bash: line 0: cd: /opt/pro/hadoop/hadoop-2.7.3: No such file or directory
slave: bash: /opt/pro/hadoop/hadoop-2.7.3/sbin/hadoop-daemon.sh: No such file or directory
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/pro/hadoop/hadoop-2.7.3/logs/hadoop-shijiex-secondarynamenode-shijie-ThinkPad-T410.out
17/02/18 10:26:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Hadoop uses the master's HADOOP_HOME (/opt/pro/hadoop/hadoop-2.7.3) on the slave, while the slave's HADOOP_HOME is /opt/pro/hadoop-2.7.3.
So should HADOOP_HOME be the same across nodes at installation time?
.bashrc
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$PATH:/usr/lib/jvm/java-7-openjdk-amd64/bin
export HADOOP_HOME=/opt/pro/hadoop-2.7.3
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
On the slave server, $HADOOP_HOME/etc/hadoop has a masters file:
xx#wodaxia:/opt/pro/hadoop-2.7.3/etc/hadoop$ cat masters
master
No, not necessarily. But if the paths differ among the nodes, then you cannot use scripts like start-dfs.sh and stop-dfs.sh (and the same goes for YARN). These scripts refer to the $HADOOP_PREFIX variable of the node where the script is executed.
Here is a snippet from hadoop-daemons.sh, which start-dfs.sh uses to start all the datanodes:
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$#"
The script is written this way on the assumption that all the nodes of the cluster follow the same $HADOOP_PREFIX or $HADOOP_HOME (deprecated) path.
To overcome this, either:
1) try to have the same path across all the nodes, or
2) log in to each node in the cluster and start the dfs process applicable to that node (see the ssh sketch below) using
$HADOOP_HOME/sbin/hadoop-daemon.sh start <namenode | datanode | secondarynamenode | journalnode>
The same procedure applies to YARN as well:
$HADOOP_HOME/sbin/yarn-daemon.sh start <resourcemanager | nodemanager>
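If you prefer not to log in to each node by hand, a small wrapper like the following can do it (a sketch assuming passwordless ssh and that each node sets its own HADOOP_HOME in its login shell; slave1 and slave2 are placeholder hostnames):
# start the datanode on each slave using that node's own HADOOP_HOME
for host in slave1 slave2; do
  # bash -l makes the remote login shell resolve the variable on that node
  ssh "$host" 'bash -lc "\$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode"'
done
# start the NameNode and SecondaryNameNode locally on the master
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
$HADOOP_HOME/sbin/hadoop-daemon.sh start secondarynamenode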
No, it does not have to be. $HADOOP_HOME is set individually on each Hadoop node, and it can be defined in different ways: globally, by setting it in the .bashrc file, or locally, for example in the hadoop-env.sh script in your Hadoop folder. Verify that the values are the same on every node of the cluster. If it is set globally, you can check it with echo $HADOOP_HOME. If it is a script option, you can verify the variable by importing it into the current context and checking it again:
. /opt/pro/hadoop/hadoop-2.7.3/etc/hadoop/hadoop-env.sh
echo $HADOOP_HOME
Also, make sure you don't have a hadoop.home.dir property in your configuration, as it overrides the $HADOOP_HOME environment variable.

Hadoop and JZMQ - no jzmq in java.library.path

I'm trying to get the JZMQ code working on ONE of the nodes of a Hadoop cluster. I have the necessary native jzmq library files installed under the /usr/local/lib directory on that node.
Here's the list:
libjzmq.a libjzmq.la libjzmq.so libjzmq.so.0 libjzmq.so.0.0.0 libzmq.a libzmq.la libzmq.so libzmq.so.3 libzmq.so.3.0.0 pkgconfig
In my shell script, if I run the Java command below, it works absolutely fine:
java -Djava.library.path=/usr/local/lib -classpath class/:lib/:lib/jzmq-2.1.3.jar bigdat.twitter.queue.TweetOMQSub
But when I run the command below:
hadoop jar $jarpath bigdat.twitter.queue.TweetOMQSub
it throws:
Exception in thread "main" java.lang.UnsatisfiedLinkError: no jzmq in java.library.path
I explicitly set the necessary files/jars in the Hadoop classpath, opts, etc., using export commands:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/lib/"
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/txtUser/analytics/lib/*
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/usr/lib/hadoop/lib/native/:/usr/local/lib/
The source code JAR and jzmq-2.1.3.jar files are present under /home/txtUser/analytics/lib/ folder on Hadoop node.
Also, /usr/local/lib is added to the system's ld.conf.
Can anyone suggest what I may be doing wrong here?
Add the following line to /etc/profile or .bashrc:
export JAVA_LIBRARY_PATH=/usr/local/lib
Reload /etc/profile or .bashrc.
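That is, something like the following (reusing the command from the question):
# re-read the updated profile in the current shell
source /etc/profile    # or: source ~/.bashrc
# confirm the native jzmq libraries can be located
echo $JAVA_LIBRARY_PATH
ls /usr/local/lib/libjzmq*
# re-run the job
hadoop jar $jarpath bigdat.twitter.queue.TweetOMQSub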
