Unable to find Namenode class when setting up Hadoop on Windows 8 - windows

Trying to set up Hadoop 2.4.1 on my machine using Cygwin and I'm stuck when I try to run
$ hdfs namenode -format
which gives me
Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode
I think it's due to an undefined environment variable, since I can run
$ hadoop version
without a problem. I've defined the following:
JAVA_HOME
HADOOP_HOME
HADOOP_INSTALL
as well as adding the Hadoop \bin and \sbin (and Cygwin's \bin) to the Path. Am I missing an environment variable that I need to define?

Ok, looks like the file hadoop\bin\hdfs also has to be changed like the hadoop\bin\hadoop file described here.
The end of the file must be changed from:
exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$#"
to
exec "$JAVA" -classpath "$(cygpath -pw "$CLASSPATH")" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$#"
I assume I'll have to make similar changes to the hadoop\bin\mapred and hadoop\bin\yarn when I get to using those files.

Related

spark-submit command not found in airflow

I am trying to run my spark job in airflow, when I executed this command spark-submit --class dataload.dataload_daily /home/ubuntu/airflow/dags/scripts/data_to_s3-assembly-0.1.jar in terminal, it works fine without any issue.
However, I am doing the same here in airflow, but keep getting the error
/tmp/airflowtmpKQMdzp/spark-submit-scalaWVer4Z: line 1: spark-submit:
command not found
t1 = BashOperator(task_id = 'spark-submit-scala',
bash_command = 'spark-submit --class dataload.dataload_daily \
/home/ubuntu/airflow/dags/scripts/data_to_s3-assembly-0.1.jar',
dag=dag,
retries=0,
start_date=datetime(2018, 4, 14))
I have my spark path mentioned in bash_profile,
export SPARK_HOME=/opt/spark-2.2.0-bin-hadoop2.7
export PATH="$SPARK_HOME/bin/:$PATH"
sourced this file as well. Not sure how to debug this, can anyone help me on this?
You could start with bash_command = 'echo $PATH' to see if your path is being updated correctly.
This is because you are metioning editing the bash_profile, but as far as I know Airflow is being run as another user. Since the other user has no changes in the bash_profile, the path to Spark might be missing.
As mentioned here (How do I set an environment variable for airflow to use?) you could try setting the path in .bashrc.

File not found exception while starting Flume agent

I have installed Flume for the first time. I am using hadoop-1.2.1 and flume 1.6.0
I tried setting up a flume agent by following this guide.
I executed this command : $ bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template
It says log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: ./logs/flume.log (No such file or directory)
Isn't the flume.log file generated automatically? If not, how can I rectify this error ?
Try this:
mkdir ./logs
sudo chown `whoami` ./logs
bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template
The first line creates the logs directory in the current directory if it does not already exist. The second one sets the owner of that directory to the current user (you) so that flume-ng running as your user can write to it.
Finally, please note that this is not the recommended way to run Flume, just a quick hack to try it.
You are getting this error probably because you are running command directly on console, you've to first go to the bin in flume and try running your command there over console.
As #Botond says, you need to set the right permissions.
However, if you run Flume within a program, like supervisor or with a custom script, you might want to change the default path, as it's relative to the launcher.
This path is defined in your /path/to/apache-flume-1.6.0-bin/conf/log4j.properties. There you can change the line
flume.log.dir=./logs
to use an absolute path that you would like to use - you still need the right permissions, though.

Trouble running pig in both local or mapreduce mode

I already have Hadoop 1.2 running on my Ubuntu VM which is running on Windows 7 machine. I recently installed Pig 0.12.0 on my same Ubuntu VM. I have downloaded the pig-0.12.0.tar.gz from the apache website. I have all the variables such as JAVA_HOME, HADOOP_HOME, PIG_HOME variables set correctly. When I try to start pig in local mode this is what I see:
chandeln#ubuntu:~$ pig -x local
pig: invalid option -- 'x'
usage: pig
chandeln#ubuntu:~$ echo $JAVA_HOME
/usr/lib/jvm/java7
chandeln#ubuntu:~$ echo $HADOOP_HOME
/usr/local/hadoop
chandeln#ubuntu:~$ echo $PIG_HOME
/usr/local/pig
chandeln#ubuntu:~$ which pig
/usr/games/pig
chandeln#ubuntu:~$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/jvm/java7/bin:/usr/local/hadoop/bin:/usr/local/pig/bin
Since I am not a Unix expert I am not sure if this is the problem but the command which pig actually returns /usr/games/pig instead of /usr/local/pig. Is this the root cause of the problem?
Please guide.
I was able to fix the problem by changing the following lines in my .bashrc. This gave precedence to the /usr/local/pig directory instead of /usr/games/pig
BEFORE: export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$PIG_HOME/bin
AFTER: export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PIG_HOME/bin:$PATH

Error: Could not find or load main class org.apache.flume.node.Application - Install flume on hadoop version 1.2.1

I have built a hadoop cluster which 1 master-slave node and the other is slave. And now, I wanna build a flume to get all log of the cluster on master machine. However, when I try to install flume from tarball and I always get:
Error: Could not find or load main class org.apache.flume.node.Application
So, please help me to find the answer, or the best way to install flume on my cluster.
many thanks!
It is basically because of FLUME_HOME..
Try this command
$ unset FLUME_HOME
I know its been almost a year for this question, but I saw it!
When you set your agnet using sudo bin/flume-ng.... make sure to specify the file where the agent configuration is.
--conf-file flume_Agent.conf -> -f conf/flume_Agent.conf
This did the trick!
look like you run flume-ng in /bin folder
flume after build in
/flume-ng-dist/target/apache-flume-1.5.0.1-bin/apache-flume-1.5.0.1-bin
run flume-ng in this
I suppose you are trying to run flume from cygwin on windows? If that is the case, I had a similar issue. The problem might be with the flume-ng script.
Find the following line in bin/flume-ng:
$EXEC java $JAVA_OPTS $FLUME_JAVA_OPTS "${arr_java_props[#]}" -cp "$FLUME_CLASSPATH" \
-Djava.library.path=$FLUME_JAVA_LIBRARY_PATH "$FLUME_APPLICATION_CLASS" $*
and replace it with this
$EXEC java $JAVA_OPTS $FLUME_JAVA_OPTS "${arr_java_props[#]}" -cp `cygpath -wp "$FLUME_CLASSPATH"` \
-Djava.library.path=`cygpath -wp $FLUME_JAVA_LIBRARY_PATH` "$FLUME_APPLICATION_CLASS" $*
Notice that the paths have been replaced with the windows directories. Java would not be able to find the library paths from the cygdrive paths and we would have to convert it to the correct windows paths wherever applicable
Maybe you are using the source files, you first should compile the source code and generate the binary code, then inside the binary files directory, you can execute: bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent1. All these information you can follow: https://cwiki.apache.org/confluence/display/FLUME/Getting+Started
I got same issue before, it's simply due to FLUME_CLASSPATH not set
the best way to debug is see the java command being fired and make sure that flume lib is included in the CLASSPATH (-cp),
As in following command its looking for /lib/*, thats where the flume-ng-*.jar are, but its incorrect because there's nothing in /lib, in this line -cp '/staging001/Flume/server/conf://lib/*:/lib/*'. It has to be ${FLUME_HOME}/lib.
usr/lib/jvm/java-1.8.0-ibm-1.8.0.3.20-1jpp.1.el7_2.x86_64/jre/bin/java -Xms100m -Xmx500m $'-Dcom.sun.management.jmxremote\r' \
-Dflume.monitoring.type=http \
-Dflume.monitoring.port=34545 \
-cp '/staging001/Flume/server/conf://lib/*:/lib/*' \
-Djava.library.path= org.apache.flume.node.Application \
-f /staging001/Flume/server/conf/flume.conf -n client
So, if you look at the flume-ng script,
There's FLUME_CLASSPATH setup, which if absent it is setup based on FLUME_HOME.
# prepend $FLUME_HOME/lib jars to the specified classpath (if any)
if [ -n "${FLUME_CLASSPATH}" ] ; then
FLUME_CLASSPATH="${FLUME_HOME}/lib/*:$FLUME_CLASSPATH"
else
FLUME_CLASSPATH="${FLUME_HOME}/lib/*"
fi
So make sure either of those environments is set. With FLUME_HOME set, (I'm using systemd)
Environment=FLUME_HOME=/staging001/Flume/server/
Here's the working java exec.
/usr/lib/jvm/java-1.8.0-ibm-1.8.0.3.20-1jpp.1.el7_2.x86_64/jre/bin/java -Xms100m -Xmx500m \
$'-Dcom.sun.management.jmxremote\r' \
-Dflume.monitoring.type=http \
-Dflume.monitoring.port=34545 \
-cp '/staging001/Flume/server/conf:/staging001/Flume/server/lib/*:/lib/*' \
-Djava.library.path= org.apache.flume.node.Application \
-f /staging001/Flume/server/conf/flume.conf -n client

CLASSPATH issue in Hadoop on Cygwin while running "hadoop version" command

I have installed Hadoop version 2.1 beta from Apache on Windows using Cygwin terminal. Running the command hadoop version gets me this error :
Error: Could not find or load main class org.apache.hadoop.util.VersionInfo
you can also add the following to your ~/.bashrc
export HADOOP_CLASSPATH=$(cygpath -pw $(hadoop classpath)):$HADOOP_CLASSPATH
this solved it for me
I met the same issue when trying to install Hadoop 2.2.0 on windows 2008 Server Sp1 64bit.
I have installed cygwin64 and configured openssh.
The answer by user2870991 works for me.
Modify the \hadoop\bin\hadoop script as below, comment the original exec line and insert the new one.
#exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$#"
#add the -claspath "$(cygpath -pw "$CLASSPATH")" TO FIX the script running in cygwin
exec "$JAVA" -classpath "$(cygpath -pw "$CLASSPATH")" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$#"
Add the below statement in hadoop-config.sh # line no 285
CLASSPATH=`cygpath -wp "$CLASSPATH"`
//Comments goes here
if [ "$HADOOP_CLASSPATH" != "" ]; then
# Prefix it if its to be preceded
if [ "$HADOOP_USER_CLASSPATH_FIRST" != "" ]; then
CLASSPATH=${HADOOP_CLASSPATH}:${CLASSPATH}
else
CLASSPATH=${CLASSPATH}:${HADOOP_CLASSPATH}
fi
fi
Output :
admin#admin-PC /cygdrive/e/hadoop/hadoop-2.2.0/bin
$ ./hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /E:/hadoop/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar

Resources