Trying to fetch twitter data through flume - hadoop

I have been trying to fetch Twitter data through Flume. The Twitter app I made is named pntgoswami18 and its description is BackToCollege. I have done all the key and token replacements required.
But running the fetch like this:
bin/flume-ng agent -n TwitterAgent --conf ./conf/ -f conf/flume-twitter.conf -Dflume.root.logger=Debug.console
returns a screen with these warnings
log4j:WARN No appenders could be found for logger (org.apache.flume.node.PollingPropertiesFileConfigurationProvider).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
And the terminal keeps waiting for something. I have kept it running for a while but nothing has happened. What am I doing wrong?
My flume-env.sh file contents are as follows:
$JAVA_OPTS="-Xms500m -Xmx1000m -Dcom.sun.management.jmxremote"
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export FLUME_CLASSPATH='/usr/local/flume/lib'
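(For reference, a typical flume-twitter.conf for this kind of setup looks roughly like the sketch below; the agent name TwitterAgent matches the command above, the keys are placeholders, and the exact source class and HDFS path depend on the tutorial being followed.)
# Sketch of a typical Twitter -> HDFS agent config (placeholders, not the actual file)
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <consumer key>
TwitterAgent.sources.Twitter.consumerSecret = <consumer secret>
TwitterAgent.sources.Twitter.accessToken = <access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <access token secret>
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream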

One issue in the command is the wrong specification of the logger.
Replace -Dflume.root.logger=Debug.console with -Dflume.root.logger=DEBUG,console (the log level and the appender name must be separated by a comma, not a period).
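With that change, the full command would look like this (all other arguments exactly as in the question); also make sure the directory passed to --conf contains Flume's log4j.properties, since the "No appenders could be found" warning usually means log4j could not locate its configuration file:
bin/flume-ng agent -n TwitterAgent --conf ./conf/ -f conf/flume-twitter.conf -Dflume.root.logger=DEBUG,console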

Related

How are logs printed directly onto the console in yarn-cluster mode using Spark

I am new to Spark and I want to print logs on the console using Apache Spark in yarn-cluster mode.
You need to check the value in the log4j.properties file. In my case this file is in the /etc/spark/conf.dist directory:
log4j.rootCategory=INFO,console
INFO prints all the logs on the console. You can change the value to ERROR or WARN to limit the information you see on the console, as Spark's logs can be overwhelming.
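For example, to see only warnings and errors you could set the root category to WARN. A minimal sketch of the relevant lines, assuming the stock console appender from Spark's log4j.properties template:
# Show only warnings and errors on the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n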

hadoop log4j not working

My jobs are running successfully with Hadoop 2.6.0, but the logger is not working at all.
I always see:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
yarn-site.xml has the directory with the log4j.properties file listed. I also tried passing it manually via the -Dlog4j.configuration option.
The file is here: http://www.pastebin.ca/2966941
To enable AppSummaryLogging for the RM,
set yarn.server.resourcemanager.appsummary.logger to <LEVEL>,RMSUMMARY
in hadoop-env.sh
Try the above step as mentioned in log4j.properties. Not sure if it works.
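If you want to try it, a sketch of what that could look like, assuming the INFO level and the RMSUMMARY appender already defined in the stock log4j.properties (the exact env file varies between distributions):
# In hadoop-env.sh (or yarn-env.sh on some setups)
export YARN_RESOURCEMANAGER_OPTS="-Dyarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY"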

Hadoop streaming and log4j

I have a Hadoop streaming job which fails for some reason. To find out why, I looked at the corresponding stderr of the failed task, but there is only a message about log4j not being initialized:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.ipc.Server).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
The referenced website says that this means the default configuration files log4j.properties and log4j.xml cannot be found and the application performs no explicit configuration.
On my system the log4j.properties file is located in the usual ${HADOOP_HOME}/etc/hadoop/ directory. Why can't Hadoop find it? Is this because a streaming job is not supposed to log via log4j anyway? Is it possible to see the stdout/stderr of a streaming job written in, e.g., Perl?
Thanks!

Can't start master and slave, strange thing named "bogon" in the log

I downloaded a new pre-built Spark for Hadoop 2.2. Following this document, I want to launch the master on my single machine. After untarring the file, I went into sbin and ran start-master, but I face this strange problem; here is the log:
Spark Command: /Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/bin/java -cp :/opt/spark-0.9.0-incubating-bin-hadoop2/conf:/opt/spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip bogon --port 7077 --webui-port 8080
========================================
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.jboss.netty.channel.ChannelException: Failed to bind to: bogon/125.211.213.133:7077
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:391)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:388)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
What's that bogon? And where does the IP 125.211.213.133 (not my IP) come from? What's the problem here?
"bogon" comes from the command line provided. You probably forgot to replace the --ip parameter with the local IP of your host.
When using sbin/start-master.sh, if no IP is provided, the reported hostname of the machine is used:
start-master.sh
if [ "$SPARK_MASTER_IP" = "" ]; then
SPARK_MASTER_IP=`hostname`
fi
If the reported hostname is not right, you can provide Spark with the IP by setting the environment variable:
SPARK_MASTER_IP=172.17.0.1 start-master.sh
Check your hostname by running the hostname command if you are in a Linux environment. I think 125.211.213.133 is the IP that bogon resolves to, and you may have mistakenly set your hostname to "bogon".
For a quick fix, you can run the command hostname localhost and try again.
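A quick way to check and work around it (resetting the hostname with sudo is a temporary, session-only fix):
hostname                          # shows the name Spark binds to; here it prints "bogon"
sudo hostname localhost           # temporary fix: reset the hostname for this session
# or start the master with an explicit address instead of changing the hostname:
SPARK_MASTER_IP=127.0.0.1 ./sbin/start-master.sh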

hadoop creates dir that cannot be found

I use the following Hadoop command to create a directory:
hdfs dfs -mkdir /tmp/testing/morehere1
I get the following message:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
Not understanding the error, I run the command again, which returns this message:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
mkdir: `/tmp/testing/morehere2': File exists
Then, when I try to go to the directory just created, it's not there.
cd /tmp/testing/morehere2
-bash: cd: /tmp/testing/morehere2: No such file or directory
Any ideas what I am doing wrong?
hdfs dfs -mkdir /tmp/testing/morehere1
This command created a directory in HDFS. Don't worry about the log4j warning; the command created the directory successfully. That is why you got the error mkdir: `/tmp/testing/morehere2': File exists the second time you ran the command.
The following command will not work, since the directory was created not in your local filesystem but in HDFS:
cd /tmp/testing/morehere2
Use the command below to check the created directory in HDFS:
hdfs dfs -ls /tmp/testing
You should be able to see the new directory there.
About the log4j warning: you can ignore it, as it will not cause your Hadoop commands to fail. But if you want to correct it, you can add a file appender to log4j.properties.
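A minimal sketch of such a log4j.properties, assuming you want the client logs both on the console and in a local file (the file path is illustrative):
# Root logger: console plus a file appender
log4j.rootLogger=INFO, console, file
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/tmp/hadoop-client.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n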
Remember that there's a difference between HDFS and your local file system. That first line you posted creates a directory in HDFS, not on your local system. So you can't cd to it, ls it, or do anything with it directly; if you want to access it, you have to go through Hadoop. It's also very rare to be logging to HDFS, as file appends have never been well-supported. I suspect that you actually want to be creating that directory locally, and that might be part of your problem.
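To make the distinction concrete:
hdfs dfs -ls /tmp/testing     # lists the directory inside HDFS -- this is where the new directory lives
ls /tmp/testing               # lists the local /tmp/testing, which is a different place entirely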
If your MR code was running fine previously and now it's showing this log4j error, then restart all the Hadoop daemons. It may solve your problem, as it solved mine :)
