I am using flume to download twitter data but when i run the Flume command i get this error . kindly help please.
2016-08-13 17:21:12,945 (conf-file-poller-0) [ERROR - org.apache.flume.node.Poll ingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertie FileConfigurationProvider.java:149)] Unhandled error java.lang.AssertionError: java.lang.reflect.InvocationTargetException at twitter4j.HttpClientFactory.getInstance(HttpClientFactory.java:81) at twitter4j.TwitterStreamImpl.<init>(TwitterStreamImpl.java:51) at twitter4j.TwitterStreamFactory.<clinit>(TwitterStreamFactory.java:40) at org.apache.flume.source.twitter.TwitterSource.configure(TwitterSource .java:115) at org.apache.flume.conf.Configurables.configure(Configurables.java:41) at org.apache.flume.node.AbstractConfigurationProvider.loadSources(Abstr actConfigurationProvider.java:326) at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration( AbstractConfigurationProvider.java:97) at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$File WatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:51 1) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask. access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask. run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor. java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor .java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstruct orAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingC onstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at twitter4j.HttpClientFactory.getInstance(HttpClientFactory.java:73) ... 14 more Caused by: java.lang.NoSuchMethodError: twitter4j.conf.Configuration.getHttpClie ntConfiguration()Ltwitter4j/HttpClientConfiguration; at twitter4j.StreamingReadTimeoutConfiguration.isGZIPEnabled(TwitterStre amImpl.java:804) at twitter4j.HttpClientBase.<init>(HttpClientBase.java:25) at twitter4j.HttpClientImpl.<init>(HttpClientImpl.java:55) ... 19 more
Configuration file is as follows :-
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = FMCaWqdvqJPzzmY76uPAzJphX
TwitterAgent.sources.Twitter.consumerSecret = me1DRFssnyirpj3j1cOC4rByvxTR1F2LfF2n3udaOXb6K9yWiZ
TwitterAgent.sources.Twitter.accessToken = 753216791690653696- CN3n8Krf8DBI5KorjnfjlmoVCFf7EnE
TwitterAgent.sources.Twitter.accessTokenSecret = KIEEPE6XDthMZ6peCoVYNDXKOy1ElTbF7hjXFggW7NHQK
TwitterAgent.sources.Twitter.keywords = india , elections , congress
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path =hdfs://localhost:9000/user/data/tweets_raw/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
Flume command is :-
bin/flume-ng agent --conf ./conf/ -f conf/flume.conf - Dflume.root.logger=DEBUG,console -n TwitterAgent
Related
I am trying to ingest data using flume from kafka source to hdfs. Below is my flume conf file.
flume1.sources = kafka-source-1
flume1.channels = hdfs-channel-1
flume1.sinks = hdfs-sink-1
flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source-1.bootstrap.servers = localhost:9092
flume1.sources.kafka-source-1.zookeeperConnect = localhost:2181
flume1.sources.kafka-source-1.topic = MYNEWSFEEDS
flume1.sources.kafka-source-1.batchSize = 100
flume1.sources.kafka-source-1.channels = hdfs-channel-1
flume1.channels.hdfs-channel-1.type = memory
flume1.sinks.hdfs-sink-1.channel = hdfs-channel-1
flume1.sinks.hdfs-sink-1.type = hdfs
flume1.sinks.hdfs-sink-1.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-1.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-1.hdfs.filePrefix = test-events
flume1.sinks.hdfs-sink-1.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-1.hdfs.path = hdfs://quickstart.cloudera:8020/tmp
flume1.sinks.hdfs-sink-1.hdfs.rollCount=100
flume1.sinks.hdfs-sink-1.hdfs.rollSize=0
flume1.channels.hdfs-channel-1.capacity = 10000
flume1.channels.hdfs-channel-1.transactionCapacity = 1000
I am using below command to run flume agent:
sudo flume-ng agent --name flume1 --conf-file '/etc/flume-ng/conf/flafka.conf' Dflume.root.logger=TRACE,console
But I am getting below error:
18/03/12 16:49:18 ERROR node.AbstractConfigurationProvider: Source
kafka-source-1 has been removed due to an error during configuration
org.apache.flume.conf.ConfigurationException: Bootstrap Servers must
be specified at
org.apache.flume.source.kafka.KafkaSource.doConfigure(KafkaSource.java:330)
at
org.apache.flume.source.BasicSourceSemantics.configure(BasicSourceSemantics.java:65)
at
org.apache.flume.source.AbstractPollableSource.configure(AbstractPollableSource.java:63)
at
org.apache.flume.conf.Configurables.configure(Configurables.java:41)
at
org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:326)
at
org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:97)
at
org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Although, I have specified the Bootstrap Servers in conf file but still it give same error. Have tried many permutations and combinations but no success.
According to the official JavaDoc, you should replace
flume1.sources.kafka-source-1.bootstrap.servers = localhost:9092
with
flume1.sources.kafka-source-1.kafka.bootstrap.servers = localhost:9092
I'm using Flume 1.6.0-cdh5.9.1 to stream Tweets using Twitter source.
The configuration file is below:
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken = xxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxx
TwitterAgent.sources.Twitter.keywords = hadoop, cloudera
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/cloudera/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 1000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
For the Cloudera .jar dependency, I've built flume-sources-1.0-SNAPSHOT.jar with Maven using below dependencies:
<dependencies>
<!-- For the Twitter API -->
<dependency>
<groupId>org.twitter4j</groupId>
<artifactId>twitter4j-stream</artifactId>
<version>4.0.6</version>
</dependency>
<!-- Hadoop Dependencies -->
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-core</artifactId>
<version>1.6.0-cdh5.9.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-sdk</artifactId>
<version>1.6.0-cdh5.9.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.6.0-cdh5.9.1</version>
<scope>provided</scope>
</dependency>
</dependencies>
Now, when I run the Flume Agent, it starts successfully, connects to Twitter, but halts after the last line (Receiving status stream):
2017-02-08 21:55:12,556 (Twitter Stream consumer-1[initializing]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Establishing connection.
2017-02-08 21:55:46,474 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Connection established.
2017-02-08 21:55:46,474 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] Receiving status stream.
After the last line nothing happens. It doesn't terminate, doesn't stream anything. I had a look at the HDFS location and nothing is created there.
Can someone help me here?
The problem lies in the configuration TwitterAgent.sources.Twitter.keywords
Twitter Source will work fine and will continuously pull Tweets as long as it finds data in the Firehose. I tried with some other popular recent keyword and it worked just perfectly fine.
Im using flume to move files to hdfs ... while moving file its showing this error.. please help me to solve this issue.
15/05/20 15:49:26 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
15/05/20 15:49:26 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/crayondata.com/shanmugapriya/apache-flume-1.5.2-bin/staging/HypeVisitorTest.java to /home/crayondata.com/shanmugapriya/apache-flume-1.5.2-bin/staging/HypeVisitorTest.java.COMPLETED
15/05/20 15:49:26 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
15/05/20 15:49:26 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
15/05/20 15:49:26 INFO hdfs.BucketWriter: Creating hdfs://localhost:9000/sha/HypeVisitorTest.java.1432117166377.tmp
15/05/20 15:49:26 ERROR hdfs.HDFSEventSink: process failed
java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:216)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2564)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2574)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:270)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:262)
at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:718)
at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:183)
at org.apache.flume.sink.hdfs.BucketWriter.access$1700(BucketWriter.java:59)
at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:715)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/05/20 15:49:26 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:471)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:216)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2564)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2574)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:270)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:262)
at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:718)
at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:183)
at org.apache.flume.sink.hdfs.BucketWriter.access$1700(BucketWriter.java:59)
at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:715)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
15/05/20 15:49:26 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
Here is my flumeconf.conf file
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/shanmugapriya/apache-flume-1.5.2-bin/staging
a1.sources.r1.fileHeader = true
a1.sources.r1.maxBackoff = 10000
a1.sources.r1.basenameHeader = true
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/sha
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.idleTimeout = 100
a1.sinks.k1.hdfs.filePrefix = %{basename}
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 1000
a1.channels.c1.byteCapacity = 0
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
please help to solve this.. TIA...
#Shan Please confirm you have the relevant Hadoop HDFS jars in your classpath for Apache Flume
Also from your sink to HDFS I see that you have port 9000, however the default port is normally 8020, is this correct?
Flume agent 1 does not connect to Flume agent 2. What could be the reason ?
I am using Flume to stream log file to HDFS using 2 Agents. The first agent is located at the source machine where the log file exists, while the second agent is located in the machine (IP Address is 10.10.201.40) where Hadoop is installed.
The configuration file of the first agent (flume-src-agent.conf) is as follows:
source_agent.sources = weblogic_server
source_agent.sources.weblogic_server.type = exec
source_agent.sources.weblogic_server.command = tail -f AdminServer.log
source_agent.sources.weblogic_server.batchSize = 1
source_agent.sources.weblogic_server.channels = memoryChannel
source_agent.sources.weblogic_server.interceptors = itime ihost itype
source_agent.sources.weblogic_server.interceptors.itime.type = timestamp
source_agent.sources.weblogic_server.interceptors.ihost.type = host
source_agent.sources.weblogic_server.interceptors.ihost.useIP = false
source_agent.sources.weblogic_server.interceptors.ihost.hostHeader = host
source_agent.sources.weblogic_server.interceptors.itype.type = static
source_agent.sources.weblogic_server.interceptors.itype.key = log_type
source_agent.sources.weblogic_server.interceptors.itype.value = apache_access_combined
source_agent.channels = memoryChannel
source_agent.channels.memoryChannel.type = memory
source_agent.channels.memoryChannel.capacity = 100
source_agent.sinks = avro_sink
source_agent.sinks.avro_sink.type = avro
source_agent.sinks.avro_sink.channel = memoryChannel
source_agent.sinks.avro_sink.hostname = 10.10.201.40
source_agent.sinks.avro_sink.port = 4545
The configuration file of the second agent (flume-trg-agent.conf) is as follows:
collector.sources = AvroIn
collector.sources.AvroIn.type = avro
collector.sources.AvroIn.bind = 0.0.0.0
collector.sources.AvroIn.port = 4545
collector.sources.AvroIn.channels = mc1 mc2
collector.channels = mc1 mc2
collector.channels.mc1.type = memory
collector.channels.mc1.capacity = 100
collector.channels.mc2.type = memory
collector.channels.mc2.capacity = 100
collector.sinks = HadoopOut
collector.sinks.HadoopOut.type = hdfs
collector.sinks.HadoopOut.channel = mc2
collector.sinks.HadoopOut.hdfs.path = hdfs://localhost:54310/user/root
collector.sinks.HadoopOut.hdfs.callTimeout = 150000
collector.sinks.HadoopOut.hdfs.fileType = DataStream
collector.sinks.HadoopOut.hdfs.writeFormat = Text
collector.sinks.HadoopOut.hdfs.rollSize = 0
collector.sinks.HadoopOut.hdfs.rollCount = 10000
collector.sinks.HadoopOut.hdfs.rollInterval = 600
When the 1st agent is run, I get the following error:
2015-04-08 15:14:10,251 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException:Failed to send events
at org.apache.flume.sink.AbstractRpcSink.process(AbstractRpcSink.java:382)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.flume.FlumeException: NettyAvroRpcClient {host:10.10.201.40, port:4545}: RPC connection error
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:161)
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:115)
at org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:590)
at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:88)
at org.apache.flume.sink.AvroSink.initializeRpcClient(AvroSink.java:127)
at org.apache.flume.sink.AbstractRpcSink.createConnection(AbstractRpcSink.java:209)
at org.apache.flume.sink.AbstractRpcSink.verifyConnection(AbstractRpcSink.java:269)
at org.apache.flume.sink.AbstractRpcSink.process(AbstractRpcSink.java:339)
... 3 more
Caused by: java.io.IOException: Error connecting to /10.10.201.40:4545
at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:261)
at.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:203)
at.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:152)
at.org.apache.avro.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:147)
When the 2nd Agent is run, I get the following error:
2015-04-08 15:53:31,649 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG-org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:143)] Pollingsink runner starting
2015-04-08 15:53:31,844 (lifecycleSupervisor-1-3) [ERROR - org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:253)] Unable to start EventDrivenSourceRunner: { source:Avro source AvroIn: {bindAddress: 0.0.0.0, port: 4545 } } - Exception follows.
org.jboss.netty.channel.ChannelException: Failed to bind to: /0.0.0.0:4545
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:298)
at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:106)
at org.apache.flume.source.AvroSource.start(AvroSource.java:225)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.bind(NioServerSocketPipelineSink.java:138)
at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleServerSocket(NioServerSocketPipelineSink.java:90)
at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:64)
at org.jboss.netty.channel.Channels.bind(Channels.java:569)
at org.jboss.netty.channel.AbstractChannel.bind(AbstractChannel.java:187)
at org.jboss.netty.bootstrap.ServerBootstrap$Binder.channelOpen(ServerBootstrap.java:343)
at org.jboss.netty.channel.Channels.fireChannelOpen(Channels.java:170)
at org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory.newChannel(NioServerSocketChannelFactory.java:158)
at org.jboss.netty.channel.socket.nio.NioServerSocketChannel.<init>(NioServerSocketChannel.java:80)
at org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory.newChannel(NioServerSocketChannelFactory.java:86)
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:277)
... 13 more
The answer to your question is in the second log:
Address already in use
The reason for this is that there's another process using port 4545. Just reconfigure both agents to another port, let say 41414, then it should work.
For binding issues type netstat -plten and check for the pid for the process and kill the process. Doing that will solve the binding issues when you run the agent again
I am trying to integrate FLUME with HDFS and my FLUME config file is
hdfs-agent.sources= netcat-collect
hdfs-agent.sinks = hdfs-write
hdfs-agent.channels= memoryChannel
hdfs-agent.sources.netcat-collect.type = netcat
hdfs-agent.sources.netcat-collect.bind = localhost
hdfs-agent.sources.netcat-collect.port = 11111
hdfs-agent.sinks.hdfs-write.type = FILE_ROLL
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://127.0.0.1:50020/user/oracle/flume
hdfs-agent.sinks.hdfs-write.rollInterval = 30
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat=Text
hdfs-agent.sinks.hdfs-write.hdfs.fileType=DataStream
hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity=10000
hdfs-agent.sources.netcat-collect.channels=memoryChannel
hdfs-agent.sinks.hdfs-write.channel=memoryChannel.
And my core site file is
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost</value>
</property>
</configuration>
When i try to run the flume agent , it is starting and it is able to read from the nc command but while writing to the hdfs i am getting the below exception. I have tried to start in safe mode using hadoop dfsadmin -safemode leave still i have the same below exception.
2014-02-14 10:31:53,785 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:219)] Creating hdfs://127.0.0.1:50020/user/oracle/flume/FlumeData.1392354113707.tmp
2014-02-14 10:31:54,011 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:418)] HDFS IO error
java.io.IOException: Call to /127.0.0.1:50020 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1089)
at org.apache.hadoop.ipc.Client.call(Client.java:1057)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:369)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1489)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1523)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1505)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:226)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:220)
at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:536)
at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:160)
at org.apache.flume.sink.hdfs.BucketWriter.access$1000(BucketWriter.java:56)
at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:533)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:781)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:689)
Please let me know if have configured something wrong in any of the properties files so that it will work.
Also please let me know if i am using the correct port for this
my target is to integrate flume and hadoop.
i have a single node server setup for hadoop
You must provide a port number with fs.default.name
Example :
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9001</value>
</property>
</configuration>
After that edit the Flume config file as below
hdfs-agent.sources= netcat-collect
hdfs-agent.sinks = hdfs-write
hdfs-agent.channels= memoryChannel
hdfs-agent.sources.netcat-collect.type = netcat
hdfs-agent.sources.netcat-collect.bind = localhost
hdfs-agent.sources.netcat-collect.port = 11111
hdfs-agent.sinks.hdfs-write.type = hdfs
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://127.0.0.1:9001/user/oracle/flume
hdfs-agent.sinks.hdfs-write.rollInterval = 30
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat=Text
hdfs-agent.sinks.hdfs-write.hdfs.fileType=DataStream
hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity=10000
hdfs-agent.sources.netcat-collect.channels=memoryChannel
hdfs-agent.sinks.hdfs-write.channel=memoryChannel
Changes :
hdfs-agent.sinks.hdfs-write.type = hdfs(sink type as hdfs)
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://127.0.0.1:9001/user/oracle/flume(port number)
hdfs-agent.sinks.hdfs-write.channel=memoryChannel(Removed the dot symbol after memoryChannel)