So I configured Flume to write my apache2 access logs to HDFS, and as far as I can tell from the Flume logs, all of the configuration is correct, but I don't know why it is still not writing to HDFS.
Here is my Flume config file:
#agent and component of agent
search.sources = so
search.sinks = si
search.channels = sc
# Configure a channel that buffers events in memory:
search.channels.sc.type = memory
search.channels.sc.capacity = 20000
search.channels.sc.transactionCapacity = 100
# Configure the source:
search.sources.so.channels = sc
search.sources.so.type = exec
search.sources.so.command = tail -F /var/log/apache2/access.log
# Describe the sink:
search.sinks.si.channel = sc
search.sinks.si.type = hdfs
search.sinks.si.hdfs.path = hdfs://localhost:9000/flumelogs/
search.sinks.si.hdfs.writeFormat = Text
search.sinks.si.hdfs.fileType = DataStream
search.sinks.si.hdfs.rollSize=0
search.sinks.si.hdfs.rollCount = 10000
search.sinks.si.hdfs.batchSize=1000
search.sinks.si.rollInterval=1
And here are my Flume logs:
14/12/18 17:47:56 INFO node.AbstractConfigurationProvider: Creating channels
14/12/18 17:47:56 INFO channel.DefaultChannelFactory: Creating instance of channel sc type memory
14/12/18 17:47:56 INFO node.AbstractConfigurationProvider: Created channel sc
14/12/18 17:47:56 INFO source.DefaultSourceFactory: Creating instance of source so, type exec
14/12/18 17:47:56 INFO sink.DefaultSinkFactory: Creating instance of sink: si, type: hdfs
14/12/18 17:47:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/18 17:47:56 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
14/12/18 17:47:56 INFO node.AbstractConfigurationProvider: Channel sc connected to [so, si]
14/12/18 17:47:56 INFO node.Application: Starting new configuration:{ sourceRunners:{so=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:so,state:IDLE} }} sinkRunners:{si=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor#3de76481 counterGroup:{ name:null counters:{} } }} channels:{sc=org.apache.flume.channel.MemoryChannel{name: sc}} }
14/12/18 17:47:56 INFO node.Application: Starting Channel sc
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: sc: Successfully registered new MBean.
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: sc started
14/12/18 17:47:56 INFO node.Application: Starting Sink si
14/12/18 17:47:56 INFO node.Application: Starting Source so
14/12/18 17:47:56 INFO source.ExecSource: Exec source starting with command:tail -F /var/log/apache2/access.log
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: si: Successfully registered new MBean.
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: si started
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: so: Successfully registered new MBean.
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: so started
And this is the command I used to start Flume:
flume-ng agent -n search -c conf -f ../conf/flume-conf-search
And I have already created the path in HDFS:
hadoop fs -mkdir hdfs://localhost:9000/flumelogs
But I don't know why it is not writing to HDFS. I can see the apache2 access logs, but Flume is not sending them to the /flumelogs directory in HDFS. Please help!
I don't think it's a permission issue; you would see exceptions when Flume flushes to HDFS. There are two possible reasons for this problem:
1) There is not enough data in the buffer, so Flume doesn't think it has to flush yet. Your sink batch size is 1000 and your channel's capacity is 20000. To verify this, Ctrl-C your Flume process; that will force it to flush to HDFS.
2) The more probable reason is that your exec source is not running properly. This can be due to a path problem with the tail command. Add the full path to tail in your command, like /bin/tail -F /var/log/apache2/access.log or /usr/bin/tail -F /var/log/apache2/access.log (depending on your system). Run
which tail
to find the correct path.
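For reference, here is a rough sketch of how those two suggestions would look in this config. The /usr/bin/tail path is an assumption about this system (confirm with which tail), and the smaller batch size for point 1 is optional, only there to make the sink flush sooner:
# point 2: absolute path to tail in the exec source
search.sources.so.command = /usr/bin/tail -F /var/log/apache2/access.log
# point 1 (optional): a smaller sink batch flushes to HDFS with less data
search.sinks.si.hdfs.batchSize = 100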
Could you please check the permissions on this folder: hdfs://localhost:9000/flumelogs/?
My guess is that Flume doesn't have permission to write to that folder.
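If it does turn out to be permissions, a quick way to check and, for testing only, open the directory up would be something like the following (assuming you run these as a user with HDFS superuser rights; chown to the agent's user is the cleaner fix):
hadoop fs -ls /                   # check the owner and permissions on /flumelogs
hadoop fs -chmod 777 /flumelogs   # or chown it to the user running the Flume agent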
Related
I am trying to stream and retrieve Twitter data using Flume, but I am unable to do so because of some sort of error.
When I try executing it using the command:
flume-ng agent -n TwitterAgent -c conf -f /home/hadoop/Flume/conf/twitter.conf
I get the following:
Info: Including Hadoop libraries found via (/home/hadoop/hadoop-2.10.1/bin/hadoop) for HDFS access
Info: Including HBASE libraries found via (/home/hadoop/hbase-2.2.5/bin/hbase) for HBASE access
Info: Including Hive libraries found via (/home/hadoop/apache-hive-2.3.7-bin) for Hive access
+ exec /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx20m -cp 'conf:/home/hadoop/Flume/lib/*:/home/hadoop/hadoop-2.10.1/etc/hadoop:/home/hadoop/hadoop-2.10.1/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/common/*:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn/*:/home/hadoop/hadoop-2.10.1/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/mapreduce/*:/home/hadoop/hadoop-2.10.1/contrib/capacity-scheduler/*.jar:/home/hadoop/hbase-2.2.5/conf:/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/home/hadoop/hbase-2.2.5:/home/hadoop/hbase-2.2.5/lib/shaded-clients/hbase-shaded-client-byo-hadoop-2.2.5.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/commons-logging-1.2.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/log4j-1.2.17.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/slf4j-api-1.7.25.jar:/home/hadoop/hadoop-2.10.1/etc/hadoop:/home/hadoop/hadoop-2.10.1/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/common/*:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn/*:/home/hadoop/hadoop-2.10.1/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/mapreduce/*:/home/hadoop/hadoop-2.10.1/contrib/capacity-scheduler/*.jar:/home/hadoop/hbase-2.2.5/conf:/home/hadoop/apache-hive-2.3.7-bin/lib/*' -Djava.library.path=:/home/hadoop/hadoop-2.10.1/lib/native:/home/hadoop/hadoop-2.10.1/lib/native org.apache.flume.node.Application -n TwitterAgent -f /home/hadoop/Flume/conf/twitter.conf
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/Flume/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.10.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/apache-hive-2.3.7-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/11/20 02:23:44 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
20/11/20 02:23:44 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/home/hadoop/Flume/conf/twitter.conf
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:MemChannel
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:MemChannel
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:MemChannel
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Added sinks: HDFS Agent: TwitterAgent
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 WARN conf.FlumeConfiguration: Agent configuration for 'TwitterAgent' has no configfilters.
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [TwitterAgent]
20/11/20 02:23:44 INFO node.AbstractConfigurationProvider: Creating channels
20/11/20 02:23:44 INFO channel.DefaultChannelFactory: Creating instance of channel MemChannel type memory
20/11/20 02:23:44 INFO node.AbstractConfigurationProvider: Created channel MemChannel
20/11/20 02:23:44 INFO source.DefaultSourceFactory: Creating instance of source Twitter, type org.apache.flume.source.twitter.TwitterSource
20/11/20 02:23:44 ERROR node.AbstractConfigurationProvider: Source Twitter has been removed due to an error during configuration
java.lang.InstantiationException: Incompatible source and channel settings defined. source's batch size is greater than the channels transaction capacity. Source: Twitter, batch size = 1000, channel MemChannel, transaction capacity = 100
at org.apache.flume.node.AbstractConfigurationProvider.checkSourceChannelCompatibility(AbstractConfigurationProvider.java:386)
at org.apache.flume.node.AbstractConfigurationProvider.getSourceChannels(AbstractConfigurationProvider.java:367)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:329)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:105)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:145)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/11/20 02:23:44 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
20/11/20 02:23:44 INFO node.AbstractConfigurationProvider: Channel MemChannel connected to [HDFS]
20/11/20 02:23:44 INFO node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor#78e3d64e counterGroup:{ name:null counters:{} } }} channels:{MemChannel=org.apache.flume.channel.MemoryChannel{name: MemChannel}} }
20/11/20 02:23:44 INFO node.Application: Starting Channel MemChannel
20/11/20 02:23:44 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: MemChannel: Successfully registered new MBean.
20/11/20 02:23:44 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: MemChannel started
20/11/20 02:23:44 INFO node.Application: Starting Sink HDFS
20/11/20 02:23:44 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
20/11/20 02:23:44 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
The terminal just stays stuck here and nothing happens. I tried waiting for several minutes but it stays the same.
My config file twitter.conf is located at /home/hadoop/Flume/conf and is as follows:
#Naming the components on the current agent.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
#Describing/Configuring the source
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey =##
TwitterAgent.sources.Twitter.consumerSecret =##
TwitterAgent.sources.Twitter.accessToken =##
TwitterAgent.sources.Twitter.accessTokenSecret =##
TwitterAgent.sources.Twitter.keywords =covid,covid-19,coronavirus
#Describing/Configuring the sink
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/twitter_data/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
#Describing/Configuring the channel
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 100
TwitterAgent.channels.MemChannel.transactionCapacity = 100
#Binding the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
My flume-env.sh file is as follows:
#Licensed to the Apache Software Foundation (ASF) under one
#or more contributor license agreements. See the NOTICE file
#distributed with this work for additional information
#regarding copyright ownership. The ASF licenses this file
#to you under the Apache License, Version 2.0 (the
#"License"); you may not use this file except in compliance
#with the License. You may obtain a copy of the License at
#http://www.apache.org/licenses/LICENSE-2.0
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
#If this file is placed at FLUME_CONF_DIR/flume-env.sh, it will be sourced
#during Flume startup.
#Enviroment variables can be set here.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export CLASSPATH=$CLASSPATH:/home/hadoop/Flume/lib/*
FLUME_CLASSPATH="/home/hadoop/Flume/lib/flume-sources-1.0-SNAPSHOT.jar"
#Give Flume more memory and pre-allocate, enable remote monitoring via JMX
#export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"
#Let Flume write raw event data and configuration information to its log files for debugging
#purposes. Enabling these flags is not recommended in production,
#as it may result in logging sensitive user information or encryption secrets.
#export JAVA_OPTS="$JAVA_OPTS -Dorg.apache.flume.log.rawdata=true -Dorg.apache.flume.log.printconfig=true "
#Note that the Flume conf directory is always included in the classpath.
#FLUME_CLASSPATH=""
The error says
java.lang.InstantiationException: Incompatible source and channel settings defined. source's batch size is greater than the channels transaction capacity. Source: Twitter, batch size = 1000, channel MemChannel, transaction capacity = 100
So you can either decrease the source's batch size or increase the channel's capacity and transaction capacity to match the source's batch size.
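As a sketch, either of these changes to twitter.conf should clear the error. maxBatchSize is the batch-size property the bundled org.apache.flume.source.twitter.TwitterSource exposes (defaulting to 1000), but treat the exact property name as an assumption and check the documentation for your Flume version:
# Option A: shrink the source batch to fit the existing channel
TwitterAgent.sources.Twitter.maxBatchSize = 100
# Option B: grow the channel to fit the default source batch of 1000
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000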
Update: Apparently, after some research, I found that I had used a bad version of flume-sources-1.0-SNAPSHOT.jar, a jar file found in the lib folder of Flume. I fixed it by generating my own jar following the method at: https://community.cloudera.com/t5/Support-Questions/issue-flume-twitter/m-p/22938#M6597
I am trying to fetch Twitter data into HDFS but am running into an issue.
Here is my flume.conf file
TwitterAgent.sources= Twitter
TwitterAgent.channels= MemChannel
TwitterAgent.sinks=HDFS
TwitterAgent.sources.TwitterSource.type=org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels=MemChannel
TwitterAgent.sources.Twitter.consumerKey=xxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret= xxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken=xxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret=xxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords= hadoop,election,sports, cricket,Big data
TwitterAgent.sinks.HDFS.channel=MemChannel
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://localhost:9000/user/flume/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeformat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=1000
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval=600
TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=10000
TwitterAgent.channels.MemChannel.transactionCapacity=100
In the flume-env.sh file, I have the path:
#FLUME_CLASSPATH="/usr/lib/flume-sources-1.0-SNAPSHOT.jar"
Now I am using the command below to get the data:
[cloudera@quickstart etc]$ flume-ng agent -n TwitterAgent -c conf -f /etc/flume-ng/conf/flume.conf
It shows some logs, but I am getting the error below, and it gets stuck after the HDFS sink has started.
16/09/25 05:18:36 WARN conf.FlumeConfiguration: Could not configure source Twitter due to: Component has no type. Cannot configure. Twitter
org.apache.flume.conf.ConfigurationException: Component has no type. Cannot configure. Twitter
at org.apache.flume.conf.ComponentConfiguration.configure(ComponentConfiguration.java:76)
at org.apache.flume.conf.source.SourceConfiguration.configure(SourceConfiguration.java:56)
at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSources(FlumeConfiguration.java:567)
at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:346)
at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.access$000(FlumeConfiguration.java:213)
at org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:127)
at org.apache.flume.conf.FlumeConfiguration.<init>(FlumeConfiguration.java:109)
at org.apache.flume.node.PropertiesFileConfigurationProvider.getFlumeConfiguration(PropertiesFileConfigurationProvider.java:189)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:89)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/09/25 05:18:36 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [TwitterAgent]
16/09/25 05:18:36 INFO node.AbstractConfigurationProvider: Creating channels
16/09/25 05:18:36 INFO channel.DefaultChannelFactory: Creating instance of channel MemChannel type memory
16/09/25 05:18:36 INFO node.AbstractConfigurationProvider: Created channel MemChannel
16/09/25 05:18:36 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
16/09/25 05:18:36 INFO node.AbstractConfigurationProvider: Channel MemChannel connected to [HDFS]
16/09/25 05:18:36 INFO node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor#3963542c counterGroup:{ name:null counters:{} } }} channels:{MemChannel=org.apache.flume.channel.MemoryChannel{name: MemChannel}} }
16/09/25 05:18:36 INFO node.Application: Starting Channel MemChannel
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: MemChannel: Successfully registered new MBean.
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: MemChannel started
16/09/25 05:18:36 INFO node.Application: Starting Sink HDFS
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
In the configuration file, please replace
TwitterAgent.sources.TwitterSource.type=org.apache.flume.source.twitter.TwitterSource
with
TwitterAgent.sources.Twitter.type=org.apache.flume.source.twitter.TwitterSource
The property key must use the component name you declared in TwitterAgent.sources = Twitter (i.e. Twitter, not TwitterSource); otherwise Flume cannot find the source's type, which is why it reports "Component has no type".
I am installing CDH 4.6.0 with the help of this site. I am running start-all.sh to start the services:
/etc/init.d/hadoop-hdfs-namenode start
/etc/init.d/hadoop-hdfs-datanode start
/etc/init.d/hadoop-hdfs-secondarynamenode start
/etc/init.d/hadoop-0.20-mapreduce-jobtracker start
/etc/init.d/hadoop-0.20-mapreduce-tasktracker start
bin/bash [to start bash prompt after starting services]
After executing these instructions as part of a Dockerfile, like
CMD ["start-all.sh"]
It starts all the services
When I run jps, I can see only:
jps
Namenode
Datanode
Secondary Namenode
Tasktracker
But the JobTracker has not started. The log is as follows:
2015-01-23 07:26:46,706 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
2015-01-23 07:26:46,735 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 8021
2015-01-23 07:26:46,735 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
2015-01-23 07:26:47,725 INFO org.apache.hadoop.mapred.JobTracker: Creating the system directory
2015-01-23 07:26:47,750 WARN org.apache.hadoop.mapred.JobTracker: Failed to operate on mapred.system.dir (hdfs://localhost:8020/var/lib/hadoop-hdfs/cache/mapred/mapred/system) because of permissions.
2015-01-23 07:26:47,750 WARN org.apache.hadoop.mapred.JobTracker: This directory should be owned by the user 'mapred (auth:SIMPLE)'
2015-01-23 07:26:47,751 WARN org.apache.hadoop.mapred.JobTracker: Bailing out ...
org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
But when I start it again from the bash prompt, it works. Why so? Any suggestions?
From the log, I can see the JobTracker comes up at port 8021, so why is it trying to operate against hdfs://localhost:8020? Is that a problem? If so, how do I tackle it?
It seems the mapred user doesn't have the privilege to write files/directories inside the HDFS root directory.
Switch to the hdfs user and assign the necessary privileges to the mapred user before starting the MapReduce service:
sudo -su hdfs ;
hadoop fs -chmod 777 /
/etc/init.d/hadoop-0.20-mapreduce-jobtracker stop; /etc/init.d/hadoop-0.20-mapreduce-jobtracker start
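A less permissive alternative, sketched from the mapred.system.dir path shown in the log above rather than opening up the whole root, would be to pre-create that cache directory and hand it to mapred:
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache/mapred
sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred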
I am trying to use Apache Flume to save tweets to HDFS. I am currently using the Cloudera image with Hadoop and Flume. I was following the tutorial from Cloudera's blog, but I am not able to connect to the Twitter API.
I am getting the following error:
2014-03-14 09:43:14,021 INFO org.apache.flume.node.Application: Waiting for channel: MemChannel to start. Sleeping for 500 ms
2014-03-14 09:43:14,069 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: MemChannel: Successfully registered new MBean.
2014-03-14 09:43:14,069 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: MemChannel started
2014-03-14 09:43:14,522 INFO org.apache.flume.node.Application: Starting Sink HDFS
2014-03-14 09:43:14,522 INFO org.apache.flume.node.Application: Starting Source Twitter
2014-03-14 09:43:14,525 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
2014-03-14 09:43:14,525 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
2014-03-14 09:43:14,595 INFO twitter4j.TwitterStreamImpl: Establishing connection.
2014-03-14 09:43:14,680 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2014-03-14 09:43:14,823 INFO org.mortbay.log: jetty-6.1.26
2014-03-14 09:43:14,946 INFO org.mortbay.log: Started SocketConnector#0.0.0.0:41414
2014-03-14 09:43:16,249 INFO twitter4j.TwitterStreamImpl: 401:Authentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect. Ensure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync.
HTTP ERROR: 401
Problem accessing '/1.1/statuses/filter.json'. Reason:
Unauthorized
2014-03-14 09:43:16,249 INFO twitter4j.TwitterStreamImpl: Waiting for 10000 milliseconds
2014-03-14 09:43:26,251 INFO twitter4j.TwitterStreamImpl: Establishing
I have copied my Twitter API credentials into flume.conf (I have tried both on disk and via the web UI). I have also tried regenerating them and copying the new ones, but it didn't help.
My pom.xml contains:
<dependency>
<groupId>org.twitter4j</groupId>
<artifactId>twitter4j-stream</artifactId>
<version>3.0.5</version>
</dependency>
That means the problem described here shouldn't apply.
And I have also set the system time with the command:
sudo ntpdate pool.ntp.org
Does anybody have an idea of what could possibly be wrong?
Thank you very much in advance for any suggestions and help.
Try upgrading to Twitter4J 3.0.6. I resolved a similar issue by upgrading to 3.0.6.
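If the dependency comes from your own pom.xml, a minimal sketch of the bump (same coordinates as in the question, only the version changed) would be:
<dependency>
    <groupId>org.twitter4j</groupId>
    <artifactId>twitter4j-stream</artifactId>
    <version>3.0.6</version>
</dependency>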
Update:
It was because of an invalid consumer key/secret or access token/secret; also make sure the system clock is in sync.
I'm new to using Flume and Hadoop, so I'm trying to set up the simplest (but somewhat helpful/realistic) example I can. I'm using the Hortonworks Sandbox in a VM client. After following tutorial 12 (which involves setting up and using Flume), everything seems to be working correctly.
So I set up my own flume.conf that should:
Read from an apache access log
Use a memory channel
Write to the HDFS
Simple enough, right? Here's my conf file:
agent.sources=exec-source
agent.sinks=hdfs-sink
agent.channels=ch1
agent.sources.exec-source.type=exec
agent.sources.exec-source.command=tail -F /var/log/httpd/access_log
agent.sinks.hdfs-sink.type=hdfs
agent.sinks.hdfs-sink.hdfs.path=/flume/events
agent.sinks.hdfs-sink.hdfs.filePrefix=apacheaccess
agent.sinks.hdfs-sink.hdfs.rollInterval=10
agent.sinks.hdfs-sink.hdfs.rollSize=0
agent.channels.ch1.type=memory
agent.channels.ch1.capacity=1000
agent.sources.exec-source.channels=ch1
agent.sinks.hdfs-sink.channel=ch1
I've seen several people have problems writing to HDFS, and in most cases it was that there weren't enough logs to fill the HDFS block. However, rollInterval=10 should generate a new file every 10 seconds, as long as at least one line is written to it. I can run "tail -F /var/log/httpd/access_log" in another window and see lines being written to the log fairly consistently, so I don't think it's that.
And here's the command/output from trying to start this agent:
[root@sandbox ~]# flume-ng agent -f /etc/flume/conf/flume.conf -n apache-agent
Warning: No configuration directory set! Use --conf <dir> to override.
Info: Including Hadoop libraries found via (/usr/bin/hadoop) for HDFS access
Info: Excluding /usr/lib/hadoop/libexec/../lib/slf4j-api-1.4.3.jar from classpath
Info: Excluding /usr/lib/hadoop/libexec/../lib/slf4j-log4j12-1.4.3.jar from classpath
Info: Including HBASE libraries found via (/usr/bin/hbase) for HBASE access
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-api-1.6.1.jar from classpath
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-log4j12-1.6.1.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.4.3.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12-1.4.3.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-api-1.6.1.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar from classpath
Info: Excluding /usr/lib/hadoop/libexec/../lib/slf4j-api-1.4.3.jar from classpath
Info: Excluding /usr/lib/hadoop/libexec/../lib/slf4j-log4j12-1.4.3.jar from classpath
+ exec /usr/jdk/jdk1.6.0_31//bin/java -Xmx20m -cp '/usr/lib/flume/lib/*:/usr/lib/hadoop/libexec/../conf:/usr/jdk/jdk1.6.0_31/lib/tools.jar:/usr/lib/hadoop/libexec/..:/usr/lib/hadoop/libexec/../hadoop-core-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/ambari-log4j-1.2.3.7.jar:/usr/lib/hadoop/libexec/../lib/asm-3.2.jar:/usr/lib/hadoop/libexec/../lib/aspectjrt-1.6.11.jar:/usr/lib/hadoop/libexec/../lib/aspectjtools-1.6.11.jar:/usr/lib/hadoop/libexec/../lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/libexec/../lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/libexec/../lib/commons-cli-1.2.jar:/usr/lib/hadoop/libexec/../lib/commons-codec-1.4.jar:/usr/lib/hadoop/libexec/../lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-configuration-1.6.jar:/usr/lib/hadoop/libexec/../lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop/libexec/../lib/commons-digester-1.8.jar:/usr/lib/hadoop/libexec/../lib/commons-el-1.0.jar:/usr/lib/hadoop/libexec/../lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/libexec/../lib/commons-io-2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-lang-2.4.jar:/usr/lib/hadoop/libexec/../lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/libexec/../lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/libexec/../lib/commons-math-2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-net-3.1.jar:/usr/lib/hadoop/libexec/../lib/core-3.1.1.jar:/usr/lib/hadoop/libexec/../lib/guava-11.0.2.jar:/usr/lib/hadoop/libexec/../lib/hadoop-capacity-scheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-fairscheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-lzo-0.5.0.jar:/usr/lib/hadoop/libexec/../lib/hadoop-thriftfs-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-tools.jar:/usr/lib/hadoop/libexec/../lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop/libexec/../lib/hue-plugins-2.2.0-SNAPSHOT.jar:/usr/lib/hadoop/libexec/../lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/libexec/../lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop/libexec/../lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop/libexec/../lib/jdeb-0.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-core-1.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-json-1.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-server-1.8.jar:/usr/lib/hadoop/libexec/../lib/jets3t-0.6.1.jar:/usr/lib/hadoop/libexec/../lib/jetty-6.1.26.jar:/usr/lib/hadoop/libexec/../lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/libexec/../lib/jsch-0.1.42.jar:/usr/lib/hadoop/libexec/../lib/junit-4.5.jar:/usr/lib/hadoop/libexec/../lib/kfs-0.2.2.jar:/usr/lib/hadoop/libexec/../lib/log4j-1.2.15.jar:/usr/lib/hadoop/libexec/../lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/libexec/../lib/netty-3.6.2.Final.jar:/usr/lib/hadoop/libexec/../lib/oro-2.0.8.jar:/usr/lib/hadoop/libexec/../lib/postgresql-9.1-901-1.jdbc4.jar:/usr/lib/hadoop/libexec/../lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop/libexec/../lib/xmlenc-0.52.jar:/usr/lib/hadoop/libexec/../lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop/libexec/../lib/jsp-2.1/jsp-api-2.1.jar:/usr/lib/hbase/bin/../conf:/usr/jdk/jdk1.6.0_31/lib/tools.jar:/usr/lib/hbase/bin/..:/usr/lib/hbase/bin/../hbase-0.94.6.1.3.0.0-107-security.jar:/usr/lib/hbase/bin/../hbase-0.94.6.1.3.0.0-107-security-tests.jar:/usr/lib/hbase/bin/../lib/activation-1.1.jar:/usr/lib/hbase/bin/../lib/asm-3.1.jar:/usr/lib/hbase/bin/../lib/avro-1.5.3.jar:/usr/lib/hbase/bin/../lib/avro-ipc-1.5.3.jar:/usr/lib/hbase/bin/../lib/commons-beanutils-1.7.0.jar:/usr/lib/hbase/bin/../lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hbase/
bin/../lib/commons-cli-1.2.jar:/usr/lib/hbase/bin/../lib/commons-codec-1.4.jar:/usr/lib/hbase/bin/../lib/commons-collections-3.2.1.jar:/usr/lib/hbase/bin/../lib/commons-configuration-1.6.jar:/usr/lib/hbase/bin/../lib/commons-digester-1.8.jar:/usr/lib/hbase/bin/../lib/commons-el-1.0.jar:/usr/lib/hbase/bin/../lib/commons-httpclient-3.1.jar:/usr/lib/hbase/bin/../lib/commons-io-2.1.jar:/usr/lib/hbase/bin/../lib/commons-lang-2.5.jar:/usr/lib/hbase/bin/../lib/commons-logging-1.1.1.jar:/usr/lib/hbase/bin/../lib/commons-math-2.1.jar:/usr/lib/hbase/bin/../lib/commons-net-1.4.1.jar:/usr/lib/hbase/bin/../lib/core-3.1.1.jar:/usr/lib/hbase/bin/../lib/guava-11.0.2.jar:/usr/lib/hbase/bin/../lib/hadoop-core.jar:/usr/lib/hbase/bin/../lib/high-scale-lib-1.1.1.jar:/usr/lib/hbase/bin/../lib/httpclient-4.1.2.jar:/usr/lib/hbase/bin/../lib/httpcore-4.1.3.jar:/usr/lib/hbase/bin/../lib/jackson-core-asl-1.8.8.jar:/usr/lib/hbase/bin/../lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hbase/bin/../lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hbase/bin/../lib/jackson-xc-1.8.8.jar:/usr/lib/hbase/bin/../lib/jamon-runtime-2.3.1.jar:/usr/lib/hbase/bin/../lib/jasper-compiler-5.5.23.jar:/usr/lib/hbase/bin/../lib/jasper-runtime-5.5.23.jar:/usr/lib/hbase/bin/../lib/jaxb-api-2.1.jar:/usr/lib/hbase/bin/../lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hbase/bin/../lib/jersey-core-1.8.jar:/usr/lib/hbase/bin/../lib/jersey-json-1.8.jar:/usr/lib/hbase/bin/../lib/jersey-server-1.8.jar:/usr/lib/hbase/bin/../lib/jettison-1.1.jar:/usr/lib/hbase/bin/../lib/jetty-6.1.26.jar:/usr/lib/hbase/bin/../lib/jetty-util-6.1.26.jar:/usr/lib/hbase/bin/../lib/jruby-complete-1.6.5.jar:/usr/lib/hbase/bin/../lib/jsp-2.1-6.1.14.jar:/usr/lib/hbase/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/lib/hbase/bin/../lib/jsr305-1.3.9.jar:/usr/lib/hbase/bin/../lib/junit-4.10-HBASE-1.jar:/usr/lib/hbase/bin/../lib/libthrift-0.8.0.jar:/usr/lib/hbase/bin/../lib/log4j-1.2.16.jar:/usr/lib/hbase/bin/../lib/metrics-core-2.1.2.jar:/usr/lib/hbase/bin/../lib/netty-3.2.4.Final.jar:/usr/lib/hbase/bin/../lib/protobuf-java-2.4.0a.jar:/usr/lib/hbase/bin/../lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hbase/bin/../lib/snappy-java-1.0.3.2.jar:/usr/lib/hbase/bin/../lib/stax-api-1.0.1.jar:/usr/lib/hbase/bin/../lib/velocity-1.7.jar:/usr/lib/hbase/bin/../lib/xmlenc-0.52.jar:/usr/lib/hbase/bin/../lib/zookeeper.jar:/etc/hadoop/conf:/usr/lib/hadoop/bin:/usr/lib/hadoop/build.xml:/usr/lib/hadoop/CHANGES.txt:/usr/lib/hadoop/conf:/usr/lib/hadoop/contrib:/usr/lib/hadoop/hadoop-ant-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-ant.jar:/usr/lib/hadoop/hadoop-client-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-client.jar:/usr/lib/hadoop/hadoop-core-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-core.jar:/usr/lib/hadoop/hadoop-examples-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-examples.jar:/usr/lib/hadoop/hadoop-minicluster-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-minicluster.jar:/usr/lib/hadoop/hadoop-test-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-test.jar:/usr/lib/hadoop/hadoop-tools-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-tools.jar:/usr/lib/hadoop/HDP-CHANGES.txt:/usr/lib/hadoop/ivy:/usr/lib/hadoop/ivy.xml:/usr/lib/hadoop/lib:/usr/lib/hadoop/libexec:/usr/lib/hadoop/LICENSE.txt:/usr/lib/hadoop/logs:/usr/lib/hadoop/LONGWING-CHANGES.txt:/usr/lib/hadoop/NOTICE.txt:/usr/lib/hadoop/pids:/usr/lib/hadoop/README.txt:/usr/lib/hadoop/sbin:/usr/lib/hadoop/webapps:/usr/lib/hadoop/lib/ambari-log4j-1.2.3.7.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/aspectjrt-1.6.11.jar:/usr/lib/hadoop/lib/aspectjtools-1.6.11.jar:/usr/lib/ha
doop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-lang-2.4.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/core-3.1.1.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/hadoop-capacity-scheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/lib/hadoop-fairscheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/lib/hadoop-lzo-0.5.0.jar:/usr/lib/hadoop/lib/hadoop-thriftfs-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/lib/hadoop-tools.jar:/usr/lib/hadoop/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop/lib/hsqldb-1.8.0.10.LICENSE.txt:/usr/lib/hadoop/lib/hue-plugins-2.2.0-SNAPSHOT.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop/lib/jdeb-0.8.jar:/usr/lib/hadoop/lib/jdiff:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/jsp-2.1:/usr/lib/hadoop/lib/junit-4.5.jar:/usr/lib/hadoop/lib/kfs-0.2.2.jar:/usr/lib/hadoop/lib/kfs-0.2.LICENSE.txt:/usr/lib/hadoop/lib/log4j-1.2.15.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/netty-3.6.2.Final.jar:/usr/lib/hadoop/lib/oro-2.0.8.jar:/usr/lib/hadoop/lib/hue-plugins-2.2.0-SNAPSHOT.jar:/usr/lib/hadoop/lib/hue-plugins-2.2.0-SNAPSHOT.jar:/usr/lib/hadoop/lib/*plugin*jar:/usr/lib/hadoop/lib/postgresql-9.1-901-1.jdbc4.jar:/usr/lib/hadoop/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/zookeeper/bin:/usr/lib/zookeeper/conf:/usr/lib/zookeeper/lib:/usr/lib/zookeeper/zookeeper-3.4.5.1.3.0.0-107.jar:/usr/lib/zookeeper/zookeeper.jar:/usr/lib/zookeeper/lib/ant-1.8.0.jar:/usr/lib/zookeeper/lib/ant-launcher-1.8.0.jar:/usr/lib/zookeeper/lib/backport-util-concurrent-3.1.jar:/usr/lib/zookeeper/lib/classworlds-1.1-alpha-2.jar:/usr/lib/zookeeper/lib/commons-codec-1.6.jar:/usr/lib/zookeeper/lib/commons-io-2.2.jar:/usr/lib/zookeeper/lib/commons-logging-1.1.1.jar:/usr/lib/zookeeper/lib/httpclient-4.2.3.jar:/usr/lib/zookeeper/lib/httpcore-4.2.3.jar:/usr/lib/zookeeper/lib/jline-0.9.94.jar:/usr/lib/zookeeper/lib/jsoup-1.7.1.jar:/usr/lib/zookeeper/lib/log4j-1.2.15.jar:/usr/lib/zookeeper/lib/maven-ant-tasks-2.1.3.jar:/usr/lib/zookeeper/lib/maven-artifact-2.2.1.jar:/usr/lib/zookeeper/lib/maven-artifact-manager-2.2.1.jar:/usr/lib/zookeeper/lib/maven-error-diagnostics-2.2.1.jar:/usr/lib/zookeeper/lib/maven-model-2.2.1.jar:/usr/lib/zookeeper/lib/maven-plugin-registry-2.2.1.jar:/usr/lib/zookeeper/lib/maven-profile-2.2.1.jar:/usr/lib/zookeeper/lib/maven-project-2.2.1.jar:/usr/lib/zookeeper/lib/maven-repository-metadata-2.2.1.jar:/usr/lib/zookeeper/lib/maven-settings-2.2.1.jar:/usr/lib/zookeeper/lib/nekohtml-1.9.6.2.jar:/usr/lib/zookeeper/l
ib/netty-3.2.2.Final.jar:/usr/lib/zookeeper/lib/plexus-container-default-1.0-alpha-9-stable-1.jar:/usr/lib/zookeeper/lib/plexus-interpolation-1.11.jar:/usr/lib/zookeeper/lib/plexus-utils-3.0.8.jar:/usr/lib/zookeeper/lib/wagon-file-1.0-beta-6.jar:/usr/lib/zookeeper/lib/wagon-http-2.4.jar:/usr/lib/zookeeper/lib/wagon-http-lightweight-1.0-beta-6.jar:/usr/lib/zookeeper/lib/wagon-http-shared-1.0-beta-6.jar:/usr/lib/zookeeper/lib/wagon-http-shared4-2.4.jar:/usr/lib/zookeeper/lib/wagon-provider-api-2.4.jar:/usr/lib/zookeeper/lib/xercesMinimal-1.9.6.2.jar:/usr/lib/hadoop/libexec/../conf:/usr/jdk/jdk1.6.0_31/lib/tools.jar:/usr/lib/hadoop/libexec/..:/usr/lib/hadoop/libexec/../hadoop-core-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/ambari-log4j-1.2.3.7.jar:/usr/lib/hadoop/libexec/../lib/asm-3.2.jar:/usr/lib/hadoop/libexec/../lib/aspectjrt-1.6.11.jar:/usr/lib/hadoop/libexec/../lib/aspectjtools-1.6.11.jar:/usr/lib/hadoop/libexec/../lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/libexec/../lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/libexec/../lib/commons-cli-1.2.jar:/usr/lib/hadoop/libexec/../lib/commons-codec-1.4.jar:/usr/lib/hadoop/libexec/../lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-configuration-1.6.jar:/usr/lib/hadoop/libexec/../lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop/libexec/../lib/commons-digester-1.8.jar:/usr/lib/hadoop/libexec/../lib/commons-el-1.0.jar:/usr/lib/hadoop/libexec/../lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/libexec/../lib/commons-io-2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-lang-2.4.jar:/usr/lib/hadoop/libexec/../lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/libexec/../lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/libexec/../lib/commons-math-2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-net-3.1.jar:/usr/lib/hadoop/libexec/../lib/core-3.1.1.jar:/usr/lib/hadoop/libexec/../lib/guava-11.0.2.jar:/usr/lib/hadoop/libexec/../lib/hadoop-capacity-scheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-fairscheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-lzo-0.5.0.jar:/usr/lib/hadoop/libexec/../lib/hadoop-thriftfs-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-tools.jar:/usr/lib/hadoop/libexec/../lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop/libexec/../lib/hue-plugins-2.2.0-SNAPSHOT.jar:/usr/lib/hadoop/libexec/../lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/libexec/../lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop/libexec/../lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop/libexec/../lib/jdeb-0.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-core-1.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-json-1.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-server-1.8.jar:/usr/lib/hadoop/libexec/../lib/jets3t-0.6.1.jar:/usr/lib/hadoop/libexec/../lib/jetty-6.1.26.jar:/usr/lib/hadoop/libexec/../lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/libexec/../lib/jsch-0.1.42.jar:/usr/lib/hadoop/libexec/../lib/junit-4.5.jar:/usr/lib/hadoop/libexec/../lib/kfs-0.2.2.jar:/usr/lib/hadoop/libexec/../lib/log4j-1.2.15.jar:/usr/lib/hadoop/libexec/../lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/libexec/../lib/netty-3.6.2.Final.jar:/usr/lib/hadoop/libexec/../lib/oro-2.0.8.jar:/usr/lib/hadoop/libexec/../lib/postgresql-9.1-901-1.jdbc4.jar:/usr/lib/hadoop/libexec/../lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop/libexec/../lib/xmlenc-0.52.jar:/usr/lib/hadoop/libexec/../lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop/libexec/../lib/jsp-2.1/jsp-api-2.1.jar:/conf' 
-Djava.library.path=:/usr/lib/hadoop/libexec/../lib/native/Linux-amd64-64:/usr/lib/hadoop/libexec/../lib/native/Linux-amd64-64:/usr/lib/hbase/bin/../lib/native/Linux-amd64-64 org.apache.flume.node.Application -f /etc/flume/conf/flume.conf -n apache-agent
13/09/03 12:35:11 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
13/09/03 12:35:11 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/etc/flume/conf/flume.conf
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Added sinks: hdfs-sink Agent: agent
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent]
13/09/03 12:35:11 WARN node.AbstractConfigurationProvider: No configuration found for this host:apache-agent
13/09/03 12:35:11 INFO node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{} channels:{} }
Now at this point I realize I'm missing several things.
1) I expect to see something along the lines of "INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: hdfs-sink started" as my last line, which I don't
2) If I use the command “hadoop fs -lsr /flume” I should see new logs in my HDFS, but I don't. The last logs are from 8/28/2013, when I did the tutorial.
I also don't expect to see that WARN line in there, but I'm not sure why it's there, so maybe that's my problem and someone can tell me why.
So my questions are:
1) Can anyone tell me what might be going wrong here?
2) When I get this problem sorted out, is there anything else I should be looking for to see what Flume is working correctly, reading what it should and writing to where it should and when?
The answer is, of course, to name the agent when you start Flume the same as the agent name in the config file. So my command line should have ended with "-n agent" and NOT "-n apache-agent", since my flume.conf file specifies "agent.X".
After that everything appears to work.
In the config file you specified
agent.sources=exec-source
agent.sinks=hdfs-sink
agent.channels=ch1
so the agent name is 'agent'. Flume expects that, when running the Flume agent, you use the same name as specified in the config file, so the command should be:
/usr/lib/flume/bin/flume-ng agent -n agent
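For completeness, a sketch of the full invocation using the config file path from the question (and assuming /etc/flume/conf is also the conf directory, which silences the "--conf" warning shown above):
flume-ng agent --conf /etc/flume/conf -f /etc/flume/conf/flume.conf -n agent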
Did you set the agent name in step #3?
Check out the original blog post, the Hadoop UI Hue, and its Hadoop tutorials.