Hadoop streaming and log4j - hadoop

I have a Hadoop streaming job which fails for some reason. To find out why, I looked at the corresponding stderr of the failed task, but it contains only a message about log4j not being initialized:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.ipc.Server).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
The referenced website says this means the default configuration files log4j.properties and log4j.xml could not be found and the application performs no explicit configuration.
On my system the log4j.properties file is located in the usual ${HADOOP_HOME}/etc/hadoop/ directory. Why can't Hadoop find it? Is this because a streaming job is not supposed to log via log4j anyway? Is it possible to see the stdout/stderr of a streaming job written in, e.g., Perl?
Thanks!

Related

Flume conflicts hadoop with SLF4J: Class path contains multiple SLF4J bindings

I get these messages between Flume, Hive and Hadoop every time I start Flume. What is the best way to avoid this? I was thinking of removing one jar from the Flume lib directory, but I am not sure whether that would affect the others (Hive, Hadoop) or not.
Info: Sourcing environment configuration script /usr/local/flume/conf/flume-env.sh
Info: Including Hadoop libraries found via (/usr/local/hadoop/bin/hadoop) for HDFS access
+ exec /usr/java/jdk1.7.0_79/bin/java -Xms100m -Xmx200m -Dcom.sun.management.jmxremote -cp '/usr/local/flume/conf:/usr/local/flume/lib/*:/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar' -Djava.library.path=:/usr/local/hadoop/lib/native org.apache.flume.node.Application --conf-file /usr/local/flume/conf/spooling3.properties --name agent1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/flume/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
The log messages you have mentioned can be treated as ordinary warning messages; they are not errors.
If you take a look at https://issues.apache.org/jira/browse/FLUME-2913, you can see some further explanation.
The classpath for Apache Flume is constructed as follows: the bin/flume-ng bash script collects the classpaths from HBase and HDFS and combines them with Flume's own classpath.
If a different slf4j jar appears anywhere on that combined classpath, you will see the warning.
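If you do decide to keep only one binding, a minimal sketch is to set aside the copy bundled with Flume and leave Hadoop's newer one on the classpath (paths taken from the log output above; verify first that nothing else in Flume's lib relies on that exact jar):
# Set aside Flume's older SLF4J binding so only Hadoop's copy is picked up
mv /usr/local/flume/lib/slf4j-log4j12-1.6.1.jar /usr/local/flume/lib/slf4j-log4j12-1.6.1.jar.bak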

Trying to fetch twitter data through flume

I have been trying to fetch Twitter data through Flume. The Twitter app that I made is named pntgoswami18 and its description is BackToCollege. I have done all the key and token replacements required.
But running the fetch like this:
bin/flume-ng agent -n TwitterAgent --conf ./conf/ -f conf/flume-twitter.conf -Dflume.root.logger=Debug.console
returns a screen with these warnings:
log4j:WARN No appenders could be found for logger (org.apache.flume.node.PollingPropertiesFileConfigurationProvider).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
And the terminal keeps waiting for something. I have kept it running for a while but nothing happened. What am I doing wrong?
My flume-env.sh file contents are:
$JAVA_OPTS="-Xms500m -Xmx1000m -Dcom.sun.management.jmxremote"
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export FLUME_CLASSPATH='/usr/local/flume/lib'
One of the issues in the command is the wrong specification of the logger: the log level and the appender must be separated by a comma, not a dot.
Replace -Dflume.root.logger=Debug.console with -Dflume.root.logger=DEBUG,console.
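For reference, this is simply the original command with the corrected logger flag:
bin/flume-ng agent -n TwitterAgent --conf ./conf/ -f conf/flume-twitter.conf -Dflume.root.logger=DEBUG,console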

Hadoop log files cannot be found

I have configured hadoop-2.7.2 on Windows. I couldn't find any logs for HDFS and YARN in the %HADOOP_HOME%\logs directory.
In Hadoop 2.5.2 there were two log files, hadoop.log and yarn.log, but in the new Hadoop version these log files do not seem to be generated.
How can I enable these logs again to debug the services?
Thanks,
Kumar
For hadoop.log and yarn.log you need to enable them as follows.
Open %HADOOP_HOME%\etc\hadoop\log4j.properties
and check the following properties:
hadoop.root.logger=INFO,console
hadoop.log.file=hadoop.log
hadoop.log.maxfilesize=200MB
hadoop.log.maxbackupindex=5
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
Then set HADOOP_ROOT_LOGGER=INFO,RFA,console in hadoop-env.cmd and YARN_ROOT_LOGGER=INFO,RFA,console in yarn-env.cmd, as shown below.
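A concrete sketch of those two lines, assuming the standard %HADOOP_HOME%\etc\hadoop layout:
rem in %HADOOP_HOME%\etc\hadoop\hadoop-env.cmd
set HADOOP_ROOT_LOGGER=INFO,RFA,console
rem in %HADOOP_HOME%\etc\hadoop\yarn-env.cmd
set YARN_ROOT_LOGGER=INFO,RFA,console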

hadoop log4j not working

My jobs are running successfully with Hadoop 2.6.0, but the logger is not working at all.
I always see:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
yarn-site.xml has the directory with the log4j.properties file listed. I also tried passing it manually via the -Dlog4j.configuration option.
The file is here: http://www.pastebin.ca/2966941
To enable AppSummaryLogging for the RM,
set yarn.server.resourcemanager.appsummary.logger to <LEVEL>,RMSUMMARY
in hadoop-env.sh.
Try the above step, as mentioned in the comments of log4j.properties. Not sure if it works.
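If you prefer to set it directly in log4j.properties rather than via hadoop-env.sh, a hedged sketch would be the following (INFO is just an illustrative level; RMSUMMARY must be an appender defined in the same file):
# Route ResourceManager application summaries to the RMSUMMARY appender at INFO level
yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY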

hadoop creates dir that cannot be found

I use the following Hadoop command to create a directory:
hdfs dfs -mkdir /tmp/testing/morehere1
I get the following message:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
Not understanding the error, I run the command again, which returns this message:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
mkdir: `/tmp/testing/morehere2': File exists
Then, when I try to go to the directory just created, it's not there:
cd /tmp/testing/morehere2
-bash: cd: /tmp/testing/morehere2: No such file or directory
Any ideas what I am doing wrong?
hdfs dfs -mkdir /tmp/testing/morehere1
This command created a directory in HDFS. Don't worry about the log4j warning; the command created the directory successfully. That is why you got the error mkdir: `/tmp/testing/morehere2': File exists the second time you tried the command.
The following command will not work, since the directory is not created in your local filesystem but in HDFS:
cd /tmp/testing/morehere2
Use the command below to check the created directory in HDFS:
hdfs dfs -ls /tmp/testing
You should be able to see the new directory there.
About the log4j warning: you can ignore it, as it will not cause your Hadoop commands to fail. But if you want to correct it, you can add a file appender to log4j.properties, as sketched below.
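A minimal sketch of such an appender in log4j 1.2 properties syntax (the file path, size limits, and level here are illustrative, not taken from the question):
# log4j.properties: send root-logger output to a rolling file instead of nowhere
log4j.rootLogger=INFO,RFA
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=/tmp/hadoop-client.log
log4j.appender.RFA.MaxFileSize=10MB
log4j.appender.RFA.MaxBackupIndex=5
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n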
Remember that there's a difference between HDFS and your local file system. That first line that you posted creates a directory in HDFS, not on your local system. So you can't cd to it or ls it or anything directly; if you want to access it, you have to go through Hadoop. It's also very rare to be logging to HDFS, as file appends have never been well supported. I suspect that you actually want to be creating that directory locally, and that might be part of your problem.
If your MR code was running fine previously and is now showing this log4j error, restart all the Hadoop daemons. It may solve your problem, as it solved mine :)
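One way to restart them, assuming the standard sbin scripts of a Hadoop 2.x install (adjust paths to your setup):
# Stop YARN and HDFS, then bring them back up
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh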

Resources