I'm testing a case using Storm 1.2.2 and Kafka 2.x as my spout, so I created a LocalCluster just for test purposes.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka_spout", new KafkaSpout<>(KafkaSpoutConfig.builder("MYKAFKAIP:9092", "storm-test-dpi").build()), 1);
builder.setBolt("bolt", new LoggerBolt()).shuffleGrouping("kafka_spout");
LocalCluster localCluster = new LocalCluster();
localCluster.submitTopology("kafkaBoltTest", new Config(), builder.createTopology());
Utils.sleep(10000);
After initializing the app I got the following:
9293 [Thread-20-kafka_spout-executor[3 3]] INFO o.a.k.c.u.AppInfoParser - Kafka version : 0.10.1.0
9293 [Thread-20-kafka_spout-executor[3 3]] INFO o.a.k.c.u.AppInfoParser - Kafka commitId : 3402a74efb23d1d4
and after that, a lot of errors:
9664 [Thread-20-kafka_spout-executor[3 3]] INFO o.a.s.k.s.KafkaSpout - Initialization complete
9703 [Thread-20-kafka_spout-executor[3 3]] WARN o.a.k.c.c.i.Fetcher - Unknown error fetching data for topic-partition storm-test-dpi-0
9714 [Thread-20-kafka_spout-executor[3 3]] WARN o.a.k.c.c.i.Fetcher - Unknown error fetching data for topic-partition storm-test-dpi-0
9742 [Thread-20-kafka_spout-executor[3 3]] WARN o.a.k.c.c.i.Fetcher - Unknown error fetching data for topic-partition storm-test-dpi-0
9756 [Thread-20-kafka_spout-executor[3 3]] WARN o.a.k.c.c.i.Fetcher - Unknown error fetching data for topic-partition storm-test-dpi-0
9767 [Thread-20-kafka_spout-executor[3 3]] WARN o.a.k.c.c.i.Fetcher - Unknown error fetching data for topic-partition storm-test-dpi-0
9781 [Thread-20-kafka_spout-executor[3 3]] WARN o.a.k.c.c.i.Fetcher - Unknown error fetching data for topic-partition storm-test-dpi-0
9806 [Thread-20-kafka_spout-executor[3 3]] WARN o.a.k.c.c.i.Fetcher - Unknown error fetching data for topic-partition storm-test-dpi-0
I think this problem is caused by the Kafka version: as you can see, the log shows client version "0.10.1.0", but my Kafka broker version is 2.x.
This is my pom.xml:
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>${version.storm}</version>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka-client</artifactId>
<version>${version.storm}</version>
</dependency>
Where ${version.storm} is 1.2.2
You are also supposed to declare the version of kafka-clients you are using. The storm-kafka-client POM declares kafka-clients with provided scope, which means kafka-clients won't be included when you build; this is done so you can easily upgrade.
The only reason it even runs for you is that you are using LocalCluster in some test code, where provided dependencies are present on the classpath.
Add this to your POM, and it should work:
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>your-kafka-version-here</version>
</dependency>
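If you want to double-check which client actually ends up on the classpath at runtime, one quick sanity check (just a sketch, using the same AppInfoParser utility that prints the "Kafka version" line in your log) is:

import org.apache.kafka.common.utils.AppInfoParser;

public class KafkaClientVersionCheck {
    public static void main(String[] args) {
        // Prints the version of the kafka-clients jar that is actually on the classpath.
        // If this still reports 0.10.1.0 after adding the dependency, the old client is
        // being pulled in from somewhere else (mvn dependency:tree will show where).
        System.out.println("kafka-clients version: " + AppInfoParser.getVersion());
        System.out.println("kafka-clients commitId: " + AppInfoParser.getCommitId());
    }
}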
I am trying to ingest data with Flume from a Kafka source to HDFS. Below is my Flume conf file.
flume1.sources = kafka-source-1
flume1.channels = hdfs-channel-1
flume1.sinks = hdfs-sink-1
flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source-1.bootstrap.servers = localhost:9092
flume1.sources.kafka-source-1.zookeeperConnect = localhost:2181
flume1.sources.kafka-source-1.topic = MYNEWSFEEDS
flume1.sources.kafka-source-1.batchSize = 100
flume1.sources.kafka-source-1.channels = hdfs-channel-1
flume1.channels.hdfs-channel-1.type = memory
flume1.sinks.hdfs-sink-1.channel = hdfs-channel-1
flume1.sinks.hdfs-sink-1.type = hdfs
flume1.sinks.hdfs-sink-1.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-1.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-1.hdfs.filePrefix = test-events
flume1.sinks.hdfs-sink-1.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-1.hdfs.path = hdfs://quickstart.cloudera:8020/tmp
flume1.sinks.hdfs-sink-1.hdfs.rollCount=100
flume1.sinks.hdfs-sink-1.hdfs.rollSize=0
flume1.channels.hdfs-channel-1.capacity = 10000
flume1.channels.hdfs-channel-1.transactionCapacity = 1000
I am using the command below to run the Flume agent:
sudo flume-ng agent --name flume1 --conf-file '/etc/flume-ng/conf/flafka.conf' -Dflume.root.logger=TRACE,console
But I am getting the error below:
18/03/12 16:49:18 ERROR node.AbstractConfigurationProvider: Source kafka-source-1 has been removed due to an error during configuration
org.apache.flume.conf.ConfigurationException: Bootstrap Servers must be specified
    at org.apache.flume.source.kafka.KafkaSource.doConfigure(KafkaSource.java:330)
    at org.apache.flume.source.BasicSourceSemantics.configure(BasicSourceSemantics.java:65)
    at org.apache.flume.source.AbstractPollableSource.configure(AbstractPollableSource.java:63)
    at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:326)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:97)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Although I have specified the bootstrap servers in the conf file, it still gives the same error. I have tried many permutations and combinations, but with no success.
According to the official JavaDoc, you should replace
flume1.sources.kafka-source-1.bootstrap.servers = localhost:9092
with
flume1.sources.kafka-source-1.kafka.bootstrap.servers = localhost:9092
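For reference, the whole source block would then look something like the sketch below. The kafka.topics and kafka.consumer.group.id names follow the same kafka.-prefixed convention used by newer Flume Kafka source versions; treat them as assumptions and check them against the documentation for your Flume release (the group id value is made up). zookeeperConnect should no longer be needed once the bootstrap servers are set.

# Sketch of the Kafka source section with the kafka.-prefixed property names
flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source-1.kafka.bootstrap.servers = localhost:9092
flume1.sources.kafka-source-1.kafka.topics = MYNEWSFEEDS
# hypothetical consumer group name; pick your own
flume1.sources.kafka-source-1.kafka.consumer.group.id = flume-hdfs-ingest
flume1.sources.kafka-source-1.batchSize = 100
flume1.sources.kafka-source-1.channels = hdfs-channel-1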
I am using Flume to download Twitter data, but when I run the Flume command I get this error. Kindly help, please.
2016-08-13 17:21:12,945 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:149)] Unhandled error
java.lang.AssertionError: java.lang.reflect.InvocationTargetException
    at twitter4j.HttpClientFactory.getInstance(HttpClientFactory.java:81)
    at twitter4j.TwitterStreamImpl.<init>(TwitterStreamImpl.java:51)
    at twitter4j.TwitterStreamFactory.<clinit>(TwitterStreamFactory.java:40)
    at org.apache.flume.source.twitter.TwitterSource.configure(TwitterSource.java:115)
    at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:326)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:97)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
    at twitter4j.HttpClientFactory.getInstance(HttpClientFactory.java:73)
    ... 14 more
Caused by: java.lang.NoSuchMethodError: twitter4j.conf.Configuration.getHttpClientConfiguration()Ltwitter4j/HttpClientConfiguration;
    at twitter4j.StreamingReadTimeoutConfiguration.isGZIPEnabled(TwitterStreamImpl.java:804)
    at twitter4j.HttpClientBase.<init>(HttpClientBase.java:25)
    at twitter4j.HttpClientImpl.<init>(HttpClientImpl.java:55)
    ... 19 more
The configuration file is as follows:
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = FMCaWqdvqJPzzmY76uPAzJphX
TwitterAgent.sources.Twitter.consumerSecret = me1DRFssnyirpj3j1cOC4rByvxTR1F2LfF2n3udaOXb6K9yWiZ
TwitterAgent.sources.Twitter.accessToken = 753216791690653696-CN3n8Krf8DBI5KorjnfjlmoVCFf7EnE
TwitterAgent.sources.Twitter.accessTokenSecret = KIEEPE6XDthMZ6peCoVYNDXKOy1ElTbF7hjXFggW7NHQK
TwitterAgent.sources.Twitter.keywords = india , elections , congress
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path =hdfs://localhost:9000/user/data/tweets_raw/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
The Flume command is:
bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent
So I've just started working with Storm and am trying to understand it. I am trying to connect to a Kafka topic, read the data, and write it to an HDFS bolt.
At first I created it without the shuffleGrouping("stormspout"), and my Storm UI showed that the spout was consuming data from the topic, but nothing was being written by the bolt (except for the empty files it was creating on HDFS). I then added shuffleGrouping("stormspout");, and now the bolt appears to be throwing an error. If anyone can help with this, I will really appreciate it.
Thanks,
Colman
Error
2015-04-13 00:02:58 s.k.PartitionManager [INFO] Read partition information from: /storm/partition_0 --> null
2015-04-13 00:02:58 s.k.PartitionManager [INFO] No partition information found, using configuration to determine offset
2015-04-13 00:02:58 s.k.PartitionManager [INFO] Last commit offset from zookeeper: 0
2015-04-13 00:02:58 s.k.PartitionManager [INFO] Commit offset 0 is more than 9223372036854775807 behind, resetting to startOffsetTime=-2
2015-04-13 00:02:58 s.k.PartitionManager [INFO] Starting Kafka 192.168.134.137:0 from offset 0
2015-04-13 00:02:58 s.k.ZkCoordinator [INFO] Task [1/1] Finished refreshing
2015-04-13 00:02:58 b.s.d.task [INFO] Emitting: stormspout default [colmanblah]
2015-04-13 00:02:58 b.s.d.executor [INFO] TRANSFERING tuple TASK: 2 TUPLE: source: stormspout:3, stream: default, id: {462820364856350458=5573117062061876630}, [colmanblah]
2015-04-13 00:02:58 b.s.d.task [INFO] Emitting: stormspout __ack_init [462820364856350458 5573117062061876630 3]
2015-04-13 00:02:58 b.s.d.executor [INFO] TRANSFERING tuple TASK: 1 TUPLE: source: stormspout:3, stream: __ack_init, id: {}, [462820364856350458 5573117062061876630 3]
2015-04-13 00:02:58 b.s.d.executor [INFO] Processing received message FOR 1 TUPLE: source: stormspout:3, stream: __ack_init, id: {}, [462820364856350458 5573117062061876630 3]
2015-04-13 00:02:58 b.s.d.executor [INFO] BOLT ack TASK: 1 TIME: TUPLE: source: stormspout:3, stream: __ack_init, id: {}, [462820364856350458 5573117062061876630 3]
2015-04-13 00:02:58 b.s.d.executor [INFO] Execute done TUPLE source: stormspout:3, stream: __ack_init, id: {}, [462820364856350458 5573117062061876630 3] TASK: 1 DELTA:
2015-04-13 00:02:59 b.s.d.executor [INFO] Prepared bolt stormbolt:(2)
2015-04-13 00:02:59 b.s.d.executor [INFO] Processing received message FOR 2 TUPLE: source: stormspout:3, stream: default, id: {462820364856350458=5573117062061876630}, [colmanblah]
2015-04-13 00:02:59 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.NullPointerException
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.daemon.executor$fn__5697$fn__5710$fn__5761.invoke(executor.clj:794) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.util$async_loop$fn__452.invoke(util.clj:465) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.NullPointerException: null
at org.apache.storm.hdfs.bolt.HdfsBolt.execute(HdfsBolt.java:92) ~[storm-hdfs-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.daemon.executor$fn__5697$tuple_action_fn__5699.invoke(executor.clj:659) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.daemon.executor$mk_task_receiver$fn__5620.invoke(executor.clj:415) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.disruptor$clojure_handler$reify__1741.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:120) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
... 6 common frames omitted
2015-04-08 04:26:39 b.s.d.executor [ERROR]
java.lang.RuntimeException: java.lang.NullPointerException
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.daemon.executor$fn__5697$fn__5710$fn__5761.invoke(executor.clj:794) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.util$async_loop$fn__452.invoke(util.clj:465) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.NullPointerException: null
at org.apache.storm.hdfs.bolt.HdfsBolt.execute(HdfsBolt.java:92) ~[storm-hdfs-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.daemon.executor$fn__5697$tuple_action_fn__5699.invoke(executor.clj:659) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.daemon.executor$mk_task_receiver$fn__5620.invoke(executor.clj:415) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.disruptor$clojure_handler$reify__1741.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:120) ~[storm-core-0.9.3.2.2.0.0-2041.jar:0.9.3.2.2.0.0-2041]
Code:
TopologyBuilder builder = new TopologyBuilder();
Config config = new Config();
//config.put(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS, 7000);
config.setNumWorkers(1);
config.setDebug(true);
//LocalCluster cluster = new LocalCluster();
//zookeeper
BrokerHosts brokerHosts = new ZkHosts("192.168.134.137:2181", "/brokers");
//spout
SpoutConfig spoutConfig = new SpoutConfig(brokerHosts, "myTopic", "/kafkastorm", "KafkaSpout");
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
spoutConfig.forceFromStart = true;
builder.setSpout("stormspout", new KafkaSpout(spoutConfig),4);
//bolt
SyncPolicy syncPolicy = new CountSyncPolicy(10); //Synchronize data buffer with the filesystem every 10 tuples
FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB); // Rotate data files when they reach five MB
FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath("/stormstuff"); // Use default, Storm-generated file names
builder.setBolt("stormbolt", new HdfsBolt()
.withFsUrl("hdfs://192.168.134.137:8020")//54310
.withSyncPolicy(syncPolicy)
.withRotationPolicy(rotationPolicy)
.withFileNameFormat(fileNameFormat),2
).shuffleGrouping("stormspout");
//cluster.submitTopology("ColmansStormTopology", config, builder.createTopology());
try {
StormSubmitter.submitTopologyWithProgressBar("ColmansStormTopology", config, builder.createTopology());
} catch (AlreadyAliveException e) {
e.printStackTrace();
} catch (InvalidTopologyException e) {
e.printStackTrace();
}
POM.XML dependencies
<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>0.9.3</version>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka</artifactId>
<version>0.9.3</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-hdfs</artifactId>
<version>0.9.3</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.8.1.1</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
First of all, try to emit the values from the execute method. If you are emitting from a different worker thread, then let all the worker threads feed their data into a LinkedBlockingQueue, and allow only a single thread (the one running execute) to emit the values drained from that LinkedBlockingQueue, as in the sketch below.
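A rough illustration of that pattern, assuming a plain BaseRichBolt (the class and field names here are made up for illustration): background workers only offer results to the queue, and the executor thread drains and emits them inside execute().

import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Hypothetical bolt illustrating the pattern: worker threads never call
// collector.emit() directly; they only offer results to the queue, and the
// executor thread emits them the next time execute() runs.
public class QueueDrainingBolt extends BaseRichBolt {
    private OutputCollector collector;
    private final LinkedBlockingQueue<Values> pending = new LinkedBlockingQueue<>();

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Hand the tuple's payload to some async worker (not shown); when a worker
        // finishes, it calls pending.offer(new Values(...)) instead of emitting.
        // Drain whatever the workers have produced so far and emit it here,
        // on the executor thread.
        Values ready;
        while ((ready = pending.poll()) != null) {
            collector.emit(ready);
        }
        // Ack the incoming tuple once it has been handed off (simplified; real code
        // may want to delay the ack until the async work has finished).
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }
}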
Secondly, try setting Config.setMaxSpoutPending to some value and run the code again; if the problem persists, try reducing that value.
Reference - Config.TOPOLOGY_MAX_SPOUT_PENDING: This sets the maximum number of spout tuples that can be pending on a single spout task at once (pending means the tuple has not been acked or failed yet). It is highly recommended you set this config to prevent queue explosion.
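For the second suggestion, the setting is a single call on the topology Config; a minimal sketch (the value 1000 is just an arbitrary starting point to tune down from):

Config config = new Config();
config.setNumWorkers(1);
// Limit the number of un-acked tuples in flight per spout task so a slow
// HdfsBolt cannot flood the downstream queues; reduce the value if the
// problem persists.
config.setMaxSpoutPending(1000);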
I eventually figured this out by going through the storm source code.
I wasn't setting
RecordFormat format = new DelimitedRecordFormat().withFieldDelimiter("|");
and including it like
builder.setBolt("stormbolt", new HdfsBolt()
.withFsUrl("hdfs://192.168.134.137:8020")//54310
.withSyncPolicy(syncPolicy)
.withRecordFormat(format)
.withRotationPolicy(rotationPolicy)
.withFileNameFormat(fileNameFormat),1
).shuffleGrouping("stormspout");
In the HdfsBolt.java class, it tries to use the RecordFormat and basically falls over if it's not set. That is where the NPE was coming from.
Hope this helps someone else out; make sure you have set all the pieces that are required by this class. A more useful error message such as "RecordFormat not set" would be nice...
I am trying to integrate Flume with HDFS, and my Flume config file is:
hdfs-agent.sources= netcat-collect
hdfs-agent.sinks = hdfs-write
hdfs-agent.channels= memoryChannel
hdfs-agent.sources.netcat-collect.type = netcat
hdfs-agent.sources.netcat-collect.bind = localhost
hdfs-agent.sources.netcat-collect.port = 11111
hdfs-agent.sinks.hdfs-write.type = FILE_ROLL
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://127.0.0.1:50020/user/oracle/flume
hdfs-agent.sinks.hdfs-write.rollInterval = 30
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat=Text
hdfs-agent.sinks.hdfs-write.hdfs.fileType=DataStream
hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity=10000
hdfs-agent.sources.netcat-collect.channels=memoryChannel
hdfs-agent.sinks.hdfs-write.channel=memoryChannel.
And my core-site.xml file is:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost</value>
</property>
</configuration>
When I try to run the Flume agent, it starts and is able to read from the nc command, but while writing to HDFS I get the exception below. I have tried leaving safe mode using hadoop dfsadmin -safemode leave, but I still get the same exception.
2014-02-14 10:31:53,785 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:219)] Creating hdfs://127.0.0.1:50020/user/oracle/flume/FlumeData.1392354113707.tmp
2014-02-14 10:31:54,011 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:418)] HDFS IO error
java.io.IOException: Call to /127.0.0.1:50020 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1089)
at org.apache.hadoop.ipc.Client.call(Client.java:1057)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:369)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1489)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1523)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1505)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:226)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:220)
at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:536)
at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:160)
at org.apache.flume.sink.hdfs.BucketWriter.access$1000(BucketWriter.java:56)
at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:533)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:781)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:689)
Please let me know if I have configured something wrong in any of the properties files so that I can get this working.
Also, please let me know whether I am using the correct port for this.
My goal is to integrate Flume and Hadoop.
I have a single-node server setup for Hadoop.
You must provide a port number with fs.default.name.
Example:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9001</value>
</property>
</configuration>
After that, edit the Flume config file as below:
hdfs-agent.sources= netcat-collect
hdfs-agent.sinks = hdfs-write
hdfs-agent.channels= memoryChannel
hdfs-agent.sources.netcat-collect.type = netcat
hdfs-agent.sources.netcat-collect.bind = localhost
hdfs-agent.sources.netcat-collect.port = 11111
hdfs-agent.sinks.hdfs-write.type = hdfs
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://127.0.0.1:9001/user/oracle/flume
hdfs-agent.sinks.hdfs-write.rollInterval = 30
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat=Text
hdfs-agent.sinks.hdfs-write.hdfs.fileType=DataStream
hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity=10000
hdfs-agent.sources.netcat-collect.channels=memoryChannel
hdfs-agent.sinks.hdfs-write.channel=memoryChannel
Changes:
hdfs-agent.sinks.hdfs-write.type = hdfs (sink type set to hdfs)
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://127.0.0.1:9001/user/oracle/flume (port number added)
hdfs-agent.sinks.hdfs-write.channel=memoryChannel (removed the trailing dot after memoryChannel)
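Once both files are updated, it can help to confirm that the NameNode really is listening on the new RPC port before restarting the agent. A minimal sketch using the Hadoop FileSystem API (the port 9001 and the /user/oracle/flume path are just the values from the examples above; adjust to your setup):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsConnectivityCheck {
    public static void main(String[] args) throws Exception {
        // Connects to the NameNode RPC port that fs.default.name points at.
        // If this fails with an EOFException or a connection error, the Flume
        // HDFS sink will fail in the same way.
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9001"), new Configuration());
        System.out.println("Connected to: " + fs.getUri());
        System.out.println("Target dir exists: " + fs.exists(new Path("/user/oracle/flume")));
        fs.close();
    }
}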