NoSuchMethod error in flume with hdfs as sink

NoSuchMethod error in flume with hdfs as sink - hadoop

I am trying to configure flume with HDFS as sink.
this is my flume.conf file:
agent1.channels.ch1.type = memory
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414
agent1.sinks.log-sink1.type = logger
agent1.sinks.hdfs-sink.channel=ch1
agent1.sinks.hdfs-sink.type=hdfs
agent1.sinks.hdfs-sink.hdfs.path=hdfs://localhost:9000/flume/flumehdfs/
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink.hdfs.writeFormat = Text
agent1.sinks.hdfs-sink.hdfs.batchSize = 1000
agent1.sinks.hdfs-sink.hdfs.rollSize = 0
agent1.sinks.hdfs-sink.hdfs.rollCount = 10000
agent1.sinks.hdfs-sink.hdfs.rollInterval = 600
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1 hdfs-sink
My hadoop version is:
Hadoop 0.20.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
Flume version is :
apache-flume-1.4.0
I have put these two jar files in flume/lib directory
hadoop-0.20.2-core
hadoop-common-0.22.0
I put the hadoop-common jar there since I was getting the following error when starting flume agent:
Unhandled error
java.lang.NoSuchMethodError: org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled()Z
at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:491)
at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:240)
at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:418)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
Now agent is starting. This is the startup log :
logger=DEBUG,console
Info: Including Hadoop libraries found via (/home/user/Downloads/hadoop-0.20.2/bin/hadoop) for HDFS access
Exception in thread "main" java.lang.NoClassDefFoundError: classpath
Caused by: java.lang.ClassNotFoundException: classpath
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: classpath. Program will exit.
+ exec /usr/lib/jvm/default-java/bin/java -Xmx20m -Dflume.root.logger=DEBUG,console -cp '/home/user/Downloads/apache-flume-1.4.0-bin/conf:/home/user/Downloads/apache-flume-1.4.0-bin/lib/*' -Djava.library.path=:/home/user/Downloads/hadoop-0.20.2/bin/../lib/native/Linux-amd64-64 org.apache.flume.node.Application -n agent1 -f ./conf/flume.conf
2013-09-04 07:55:22,634 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting
2013-09-04 07:55:22,639 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:78)] Configuration provider started
2013-09-04 07:55:22,640 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:126)] Checking file:./conf/flume.conf for changes
2013-09-04 07:55:22,642 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:./conf/flume.conf
2013-09-04 07:55:22,648 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,648 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1020)] Created context for hdfs-sink: hdfs.fileType
2013-09-04 07:55:22,649 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:loggerSink
2013-09-04 07:55:22,650 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1020)] Created context for loggerSink: type
2013-09-04 07:55:22,650 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,650 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,650 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,650 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,651 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,651 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:log-sink1
2013-09-04 07:55:22,651 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1020)] Created context for log-sink1: type
2013-09-04 07:55:22,651 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: loggerSink Agent: agent
2013-09-04 07:55:22,654 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: log-sink1 hdfs-sink Agent: agent1
2013-09-04 07:55:22,654 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,654 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,654 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:loggerSink
2013-09-04 07:55:22,654 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,655 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:log-sink1
2013-09-04 07:55:22,655 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:313)] Starting validation of configuration for agent: agent, initial-configuration: AgentConfiguration[agent]
SOURCES: {seqGenSrc={ parameters:{channels=memoryChannel, type=seq} }}
CHANNELS: {memoryChannel={ parameters:{capacity=100, type=memory} }}
SINKS: {loggerSink={ parameters:{type=logger, channel=memoryChannel} }}
2013-09-04 07:55:22,661 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateChannels(FlumeConfiguration.java:468)] Created channel memoryChannel
2013-09-04 07:55:22,671 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSinks(FlumeConfiguration.java:674)] Creating sink: loggerSink using LOGGER
2013-09-04 07:55:22,673 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:371)] Post validation configuration for agent
AgentConfiguration created without Configuration stubs for which only basic syntactical validation was performed[agent]
SOURCES: {seqGenSrc={ parameters:{channels=memoryChannel, type=seq} }}
CHANNELS: {memoryChannel={ parameters:{capacity=100, type=memory} }}
AgentConfiguration created with Configuration stubs for which full validation was performed[agent]
SINKS: {loggerSink=ComponentConfiguration[loggerSink]
CONFIG:
CHANNEL:memoryChannel
}
2013-09-04 07:55:22,673 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:135)] Channels:memoryChannel
2013-09-04 07:55:22,673 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:136)] Sinks loggerSink
2013-09-04 07:55:22,674 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:137)] Sources seqGenSrc
2013-09-04 07:55:22,674 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:313)] Starting validation of configuration for agent: agent1, initial-configuration: AgentConfiguration[agent1]
SOURCES: {avro-source1={ parameters:{port=41414, channels=ch1, type=avro, bind=0.0.0.0} }}
CHANNELS: {ch1={ parameters:{type=memory} }}
SINKS: {hdfs-sink={ parameters:{hdfs.fileType=DataStream, hdfs.path=hdfs://localhost:9000/flume/flumehdfs/, hdfs.batchSize=1000, hdfs.rollInterval=600, hdfs.rollSize=0, hdfs.writeFormat=Text, type=hdfs, hdfs.rollCount=10000, channel=ch1} }, log-sink1={ parameters:{type=logger, channel=ch1} }}
2013-09-04 07:55:22,675 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateChannels(FlumeConfiguration.java:468)] Created channel ch1
2013-09-04 07:55:22,677 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSinks(FlumeConfiguration.java:674)] Creating sink: hdfs-sink using HDFS
2013-09-04 07:55:22,678 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSinks(FlumeConfiguration.java:674)] Creating sink: log-sink1 using LOGGER
2013-09-04 07:55:22,679 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:371)] Post validation configuration for agent1
AgentConfiguration created without Configuration stubs for which only basic syntactical validation was performed[agent1]
SOURCES: {avro-source1={ parameters:{port=41414, channels=ch1, type=avro, bind=0.0.0.0} }}
CHANNELS: {ch1={ parameters:{type=memory} }}
SINKS: {hdfs-sink={ parameters:{hdfs.fileType=DataStream, hdfs.path=hdfs://localhost:9000/flume/flumehdfs/, hdfs.batchSize=1000, hdfs.rollInterval=600, hdfs.rollSize=0, hdfs.writeFormat=Text, type=hdfs, hdfs.rollCount=10000, channel=ch1} }}
AgentConfiguration created with Configuration stubs for which full validation was performed[agent1]
SINKS: {log-sink1=ComponentConfiguration[log-sink1]
CONFIG:
CHANNEL:ch1
}
2013-09-04 07:55:22,679 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:135)] Channels:ch1
2013-09-04 07:55:22,679 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:136)] Sinks hdfs-sink log-sink1
2013-09-04 07:55:22,679 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:137)] Sources avro-source1
2013-09-04 07:55:22,680 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:140)] Post-validation flume configuration contains configuration for agents: [agent, agent1]
2013-09-04 07:55:22,680 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:150)] Creating channels
2013-09-04 07:55:22,691 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:40)] Creating instance of channel ch1 type memory
2013-09-04 07:55:22,699 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:205)] Created channel ch1
2013-09-04 07:55:22,700 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:39)] Creating instance of source avro-source1, type avro
2013-09-04 07:55:22,733 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:40)] Creating instance of sink: log-sink1, type: logger
2013-09-04 07:55:22,736 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:40)] Creating instance of sink: hdfs-sink, type: hdfs
2013-09-04 07:55:22,985 (conf-file-poller-0) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:493)] Hadoop Security enabled: false
2013-09-04 07:55:22,989 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:119)] Channel ch1 connected to [avro-source1, log-sink1, hdfs-sink]
2013-09-04 07:55:22,996 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{avro-source1=EventDrivenSourceRunner: { source:Avro source avro-source1: { bindAddress: 0.0.0.0, port: 41414 } }} sinkRunners:{hdfs-sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor#709446e4 counterGroup:{ name:null counters:{} } }, log-sink1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor#16ba5c7a counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel{name: ch1}} }
2013-09-04 07:55:23,011 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:145)] Starting Channel ch1
2013-09-04 07:55:23,064 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:110)] Monitoried counter group for type: CHANNEL, name: ch1, registered successfully.
2013-09-04 07:55:23,064 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:94)] Component type: CHANNEL, name: ch1 started
2013-09-04 07:55:23,065 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink hdfs-sink
2013-09-04 07:55:23,066 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink log-sink1
2013-09-04 07:55:23,068 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Starting Source avro-source1
2013-09-04 07:55:23,069 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:192)] Starting Avro source avro-source1: { bindAddress: 0.0.0.0, port: 41414 }...
2013-09-04 07:55:23,069 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:110)] Monitoried counter group for type: SINK, name: hdfs-sink, registered successfully.
2013-09-04 07:55:23,069 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:94)] Component type: SINK, name: hdfs-sink started
2013-09-04 07:55:23,078 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:143)] Polling sink runner starting
2013-09-04 07:55:23,079 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:143)] Polling sink runner starting
2013-09-04 07:55:23,458 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:110)] Monitoried counter group for type: SOURCE, name: avro-source1, registered successfully.
2013-09-04 07:55:23,462 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:94)] Component type: SOURCE, name: avro-source1 started
2013-09-04 07:55:23,464 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:217)] Avro source avro-source1 started.
But when ever some event is coming, the following error is coming in the flume logs and nothing is getting written to hdfs aswell.
ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:422)] process failed
java.lang.NoSuchMethodError: org.apache.hadoop.util.Shell.getGROUPS_COMMAND()[Ljava/lang/String;
at org.apache.hadoop.security.UnixUserGroupInformation.getUnixGroups(UnixUserGroupInformation.java:345)
at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:264)
at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:300)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:192)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1792)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:76)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1826)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1808)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:265)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:190)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:226)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:220)
at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:536)
at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:160)
at org.apache.flume.sink.hdfs.BucketWriter.access$1000(BucketWriter.java:56)
at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:533)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
I am missing some configuration or jar file?

I was recently troubleshooting a similar (although not exactly the same) problem, and I recall two solutions that may help you.
First, you're running a very old version of Hadoop it looks like. It's currently on (stable) version 1.2.1, 0.20.2 was released Feb 2010. I recall coming across a thread where a user was having a similar problem with 0.2x.x and the suggestion was for him to upgrade.
My issue was eventually solved by installing the correct java version. I believe JDK 1.6 or higher is needed (at least for the newer versions of Hadoop). For CentOS, "yum install java-1.6.0-openjdk-devel" fixed me perfect.
I'm sorry I can't offer more right now. I'll try to find the relevant threads I was reading earlier and reply again, but in the mean time maybe this will give you a place to start. If nothing else please reply back with your java -version and maybe that will help with further troubleshooting.

These are the jars I added to solve the problem:
hadoop-core-1.0.4.jar
commons-configuration-1.6.jar
commons-httpclient-3.0.1.jar
jets3t-0.6.1.jar
commons-codec-1.4.jar
Note: I didn't have to add any extra jar when using hadoop 1.2.1

To those who are currently using HDP 2.2 with Flume 1.4 and you recieve an error like this: I just fixed it by replacing hadoop-common.jar in /flume/lib by a more recent version I found in /usr/hdp/ folder (your path may vary)

Related

Unable to retrieve Twitter streaming data using Flume

I am trying to stream and retrieve Twitter data using Flume but unable to do so because of some sort of error.
When I try executing it using the command:
flume-ng agent -n TwitterAgent -c conf -f /home/hadoop/Flume/conf/twitter.conf
I get the following:
Info: Including Hadoop libraries found via (/home/hadoop/hadoop-2.10.1/bin/hadoop) for HDFS access
Info: Including HBASE libraries found via (/home/hadoop/hbase-2.2.5/bin/hbase) for HBASE access
Info: Including Hive libraries found via (/home/hadoop/apache-hive-2.3.7-bin) for Hive access
+ exec /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx20m -cp 'conf:/home/hadoop/Flume/lib/*:/home/hadoop/hadoop-2.10.1/etc/hadoop:/home/hadoop/hadoop-2.10.1/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/common/*:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn/*:/home/hadoop/hadoop-2.10.1/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/mapreduce/*:/home/hadoop/hadoop-2.10.1/contrib/capacity-scheduler/*.jar:/home/hadoop/hbase-2.2.5/conf:/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/home/hadoop/hbase-2.2.5:/home/hadoop/hbase-2.2.5/lib/shaded-clients/hbase-shaded-client-byo-hadoop-2.2.5.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/commons-logging-1.2.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/log4j-1.2.17.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/slf4j-api-1.7.25.jar:/home/hadoop/hadoop-2.10.1/etc/hadoop:/home/hadoop/hadoop-2.10.1/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/common/*:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn/*:/home/hadoop/hadoop-2.10.1/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/mapreduce/*:/home/hadoop/hadoop-2.10.1/contrib/capacity-scheduler/*.jar:/home/hadoop/hbase-2.2.5/conf:/home/hadoop/apache-hive-2.3.7-bin/lib/*' -Djava.library.path=:/home/hadoop/hadoop-2.10.1/lib/native:/home/hadoop/hadoop-2.10.1/lib/native org.apache.flume.node.Application -n TwitterAgent -f /home/hadoop/Flume/conf/twitter.conf
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/Flume/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.10.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/apache-hive-2.3.7-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/11/20 02:23:44 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
20/11/20 02:23:44 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/home/hadoop/Flume/conf/twitter.conf
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:MemChannel
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:MemChannel
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:MemChannel
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Added sinks: HDFS Agent: TwitterAgent
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 WARN conf.FlumeConfiguration: Agent configuration for 'TwitterAgent' has no configfilters.
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [TwitterAgent]
20/11/20 02:23:44 INFO node.AbstractConfigurationProvider: Creating channels
20/11/20 02:23:44 INFO channel.DefaultChannelFactory: Creating instance of channel MemChannel type memory
20/11/20 02:23:44 INFO node.AbstractConfigurationProvider: Created channel MemChannel
20/11/20 02:23:44 INFO source.DefaultSourceFactory: Creating instance of source Twitter, type org.apache.flume.source.twitter.TwitterSource
**20/11/20 02:23:44 ERROR node.AbstractConfigurationProvider: Source Twitter has been removed due to an error during configuration**
j**ava.lang.InstantiationException: Incompatible source and channel settings defined. source's batch size is greater than the channels transaction capacity. Source: Twitter, batch size = 1000, channel MemChannel, transaction capacity = 100**
at org.apache.flume.node.AbstractConfigurationProvider.checkSourceChannelCompatibility(AbstractConfigurationProvider.java:386)
at org.apache.flume.node.AbstractConfigurationProvider.getSourceChannels(AbstractConfigurationProvider.java:367)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:329)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:105)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:145)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/11/20 02:23:44 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
20/11/20 02:23:44 INFO node.AbstractConfigurationProvider: Channel MemChannel connected to [HDFS]
20/11/20 02:23:44 INFO node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor#78e3d64e counterGroup:{ name:null counters:{} } }} channels:{MemChannel=org.apache.flume.channel.MemoryChannel{name: MemChannel}} }
20/11/20 02:23:44 INFO node.Application: Starting Channel MemChannel
20/11/20 02:23:44 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: MemChannel: Successfully registered new MBean.
20/11/20 02:23:44 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: MemChannel started
20/11/20 02:23:44 INFO node.Application: Starting Sink HDFS
20/11/20 02:23:44 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
20/11/20 02:23:44 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
The terminal just stays stuck here and nothing happens. I tried waiting for several minutes but it stays the same.
My config file twitter.conf is located at /home/hadoop/Flume/conf and is as follows:
#Naming the components on the current agent.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
#Describing/Configuring the source
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey =##
TwitterAgent.sources.Twitter.consumerSecret =##
TwitterAgent.sources.Twitter.accessToken =##
TwitterAgent.sources.Twitter.accessTokenSecret =##
TwitterAgent.sources.Twitter.keywords =covid,covid-19,coronavirus
#Describing/Configuring the sink
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/twitter_data/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
#Describing/Configuring the channel
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 100
TwitterAgent.channels.MemChannel.transactionCapacity = 100
#Binding the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
My flume-env.sh file is as follows:
#Licensed to the Apache Software Foundation (ASF) under one
#or more contributor license agreements. See the NOTICE file
#distributed with this work for additional information
#regarding copyright ownership. The ASF licenses this file
#to you under the Apache License, Version 2.0 (the
#"License"); you may not use this file except in compliance
#with the License. You may obtain a copy of the License at
#http://www.apache.org/licenses/LICENSE-2.0
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
#If this file is placed at FLUME_CONF_DIR/flume-env.sh, it will be sourced
#during Flume startup.
#Enviroment variables can be set here.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export CLASSPATH=$CLASSPATH:/home/hadoop/Flume/lib/*
FLUME_CLASSPATH="/home/hadoop/Flume/lib/flume-sources-1.0-SNAPSHOT.jar"
#Give Flume more memory and pre-allocate, enable remote monitoring via JMX
#export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"
#Let Flume write raw event data and configuration information to its log files for debugging
#purposes. Enabling these flags is not recommended in production,
#as it may result in logging sensitive user information or encryption secrets.
#export JAVA_OPTS="$JAVA_OPTS -Dorg.apache.flume.log.rawdata=true -Dorg.apache.flume.log.printconfig=true "
#Note that the Flume conf directory is always included in the classpath.
#FLUME_CLASSPATH=""

The error says
j**ava.lang.InstantiationException: Incompatible source and channel settings defined. source's batch size is greater than the channels transaction capacity. Source: Twitter, batch size = 1000, channel MemChannel, transaction capacity = 100**
So you can try either decrease source batch size or increase channel capacity to match source batch size.

Update: Apparently after some research I found that I used a bad version of : flume-sources-1.0-SNAPSHOT.jar which is a jar file found in the lib folder of Flume. Fixed it by generating my own jar by following the method at: https://community.cloudera.com/t5/Support-Questions/issue-flume-twitter/m-p/22938#M6597

Unable to remove elastic search service during Azure DevOps Server upgrade

Trying to upgrade TFS 2018 to DevOps Server 2019. Stuck at readiness checks screen. It wants me to remove elasticsearch service.
The service is stopped.
And the path mentioned in the service properties, specifically the Search folder, does not exist C:\Program Files\Microsoft Team Foundation Server 2018\Search\ES\elasticsearchv5\bin\elasticsearch-service-x64.exe //RS//elasticsearch-service-x64 does not exist.
No way to run service -remove from the bin folder.
Relevant log:
[Info #07:49:10.510] +-+-+-+-+-| Running Service Not Installed: Verifying the following Windows service is not installed: elasticsearch-service-x64 |+-+-+-+-+-
[Info #07:49:10.511]
[Info #07:49:10.511] +-+-+-+-+-| Verifying the following Windows service is not installed: elasticsearch-service-x64 |+-+-+-+-+-
[Info #07:49:10.511] Starting Node: VSEARCHSERVICENOTINSTALLED
[Info #07:49:10.511] NodePath : VINPUTS/Conditional/Progress/Conditional/VSEARCHSERVICENOTINSTALLED
[Info #07:49:10.511] Verifying that the following service is NOT installed: elasticsearch-service-x64. Machine: ..
[Info #07:49:10.512] Node returned: Error
[Error #07:49:10.512] The following Windows service is installed on your computer: elasticsearch-service-x64. Remove elasticsearch-service-x64 to continue. Read the troubleshooting guide (https://go.microsoft.com/fwlink/?linkid=828578) for more details.
[Info #07:49:10.512] Completed Service Not Installed: Error
[Info #07:49:10.512] -----------------------------------------------------
[Info #07:49:10.512]
[Info #07:49:10.512] +-+-+-+-+-| Running VerifySearchIndexLocation: Verifying that the search index location path is valid. |+-+-+-+-+-
[Info #07:49:10.512]
[Info #07:49:10.512] +-+-+-+-+-| Verifying that the search index location path is valid. |+-+-+-+-+-
[Info #07:49:10.512] Starting Node: VSEARCHINDEXLOCATIONVERIFIER
[Info #07:49:10.512] NodePath : VINPUTS/Conditional/Progress/Conditional/VSEARCHINDEXLOCATIONVERIFIER
[Info #07:49:10.513] Node returned: Success
[Info #07:49:10.513] Completed VerifySearchIndexLocation: Success
[Info #07:49:10.513] -----------------------------------------------------
[Info #07:49:10.513]
[Info #07:49:10.513] +-+-+-+-+-| Running Verify ElasticSearch port is available: Verifying that a port is available in range 9200-9299 |+-+-+-+-+-
[Info #07:49:10.513]
[Info #07:49:10.513] +-+-+-+-+-| Verifying that a port is available in range 9200-9299 |+-+-+-+-+-
[Info #07:49:10.513] Starting Node: VSEARCHESPORTAVAILABLE
[Info #07:49:10.513] NodePath : VINPUTS/Conditional/Progress/Conditional/VSEARCHESPORTAVAILABLE
[Info #07:49:10.514] Port: 9200 is available for configuring elasticsearch
[Info #07:49:10.514] Node returned: Success
[Info #07:49:10.514] Completed Verify ElasticSearch port is available: Success
[Info #07:49:10.514] -----------------------------------------------------
[Info #07:49:10.514]
[Info #07:49:10.514] +-+-+-+-+-| Running VerifySearchServiceAccount: Verifying that the search service account name and password is valid. |+-+-+-+-+-
[Info #07:49:10.514]
[Info #07:49:10.515] +-+-+-+-+-| Verifying that the search service account name and password is valid. |+-+-+-+-+-
[Info #07:49:10.515] Starting Node: VSEARCHACCOUNTVALID
[Info #07:49:10.515] NodePath : VINPUTS/Conditional/Progress/Conditional/VSEARCHACCOUNTVALID
[Info #07:49:10.515] Node returned: Success
[Info #07:49:10.515] Completed VerifySearchServiceAccount: Success
[Info #07:49:10.515] -----------------------------------------------------

In that case, just run the command in cmd as an administrator
sc delete elasticsearch-service-x64
According to microsoft documentation, the sc delete command removes the service from the registry without the need for an executable in the specified path.
After this command close all windows referring to the windows service and request a new check in Azure DevOps Server 2019

Importing data from file to ElasticSearch with logstash

I have script that logs temperature + humidity from diffrent sensors and stores the data from each sensor to his directory and every day a new log is made in this format YYYY-MM-DD.log.
${data_root}/A/0/*.log
${data_root}/A/1/*.log
ETC..
the logs are in this format:
2018-03-02 03:48:14 25.00 27.10
(YYYY-MM-DD TIME Temperature Humidity)
I had trouble with understanding how to correctly config my logstash instance, I figured that my input should look something like this:
input {
file{ path => "/var/wlogs/a1/*.log" type=>"a1"}
file{ path => "/var/wlogs/a2/*.log" type=>"a2"}
etc..
}
and the filter should look something like this:
filter{
if [type] == "a1" {
grok {
match => { "message" => "(?<timestamp>%{YEAR}-%{MONTHNUM:month}-%{MONTHDAY:day} %{TIME}) %{NUMBER:temperature:float} %{NUMBER:humidity:float}" }
}
}
if [type] == "a2" {....}
Im trying to export the the data in the output section to ElasticSearch with no success.
output{
elasticsearch { hosts =>["ec2-xxxxxx.eu-west-2.compute.amazonaws.com:9200"] user=>"elastic" password=>"pass" index=>"{type}"}
stdout{ codec => rubydebug}
}
here is the console output when I try to run it:
ubuntu#ip-xxx-xxx:/usr/share/logstash$ sudo bin/logstash -f ~/logstash.conf
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[INFO ] 2018-03-02 13:43:34.633 [main] scaffold - Initializing module {:module_name=>"fb_apache", :directory=>"/usr/share/logstash/modules/fb_apache/configuration"}
[INFO ] 2018-03-02 13:43:34.647 [main] scaffold - Initializing module {:module_name=>"netflow", :directory=>"/usr/share/logstash/modules/netflow/configuration"}
[WARN ] 2018-03-02 13:43:35.063 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2018-03-02 13:43:35.209 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"6.2.2"}
[INFO ] 2018-03-02 13:43:35.430 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
[INFO ] 2018-03-02 13:43:36.145 [Ruby-0-Thread-1: /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/task.rb:22] pipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[INFO ] 2018-03-02 13:43:36.318 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://elastic:xxxxxx#ec2-no.eu-west-2.compute.amazonaws.com:9200/]}}
[INFO ] 2018-03-02 13:43:36.327 [[main]-pipeline-manager] elasticsearch - Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://elastic:xxxxxx#ec2-no.eu-west-2.compute.amazonaws.com:9200/, :path=>"/"}
[WARN ] 2018-03-02 13:43:36.447 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://elastic:xxxxxx#ec2-3no3.eu-west-2.compute.amazonaws.com:9200/"}
[INFO ] 2018-03-02 13:43:36.610 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>nil}
[WARN ] 2018-03-02 13:43:36.611 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[INFO ] 2018-03-02 13:43:36.616 [[main]-pipeline-manager] elasticsearch - Using mapping template from {:path=>nil}
[INFO ] 2018-03-02 13:43:36.619 [[main]-pipeline-manager] elasticsearch - Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"#timestamp"=>{"type"=>"date"}, "#version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[INFO ] 2018-03-02 13:43:36.626 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//ec2-no.eu-west-2.compute.amazonaws.com:9200"]}
[INFO ] 2018-03-02 13:43:37.054 [Ruby-0-Thread-1: /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/task.rb:22] pipeline - Pipeline started succesfully {:pipeline_id=>"main", :thread=>"#<Thread:0x25b5f422#/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:246 run>"}
[INFO ] 2018-03-02 13:43:37.081 [Ruby-0-Thread-1: /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/task.rb:22] agent - Pipelines running {:count=>1, :pipelines=>["main"]}
please help me figure out what I'm doing wrong and how to fix it :)
thanks in advance
P.S: Im using the latest versions of ElasticSearch, Kibana and Logstash

Don't see any error in the logs. Makes me think that the log files might have already been read in a previous attempt. Since the file offsets are maintained in the sincedb file in home directory, can you stop logstash, delete the file and try again?
For more details about the sincedb file, refer to https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html

Issue while getting Twitter data in HDFS using Flume

I am trying to fetch the twitter data in HDFS but getting issue.
Here is my flume.conf file
TwitterAgent.sources= Twitter
TwitterAgent.channels= MemChannel
TwitterAgent.sinks=HDFS
TwitterAgent.sources.TwitterSource.type=org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels=MemChannel
TwitterAgent.sources.Twitter.consumerKey=xxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret= xxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken=xxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret=xxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords= hadoop,election,sports, cricket,Big data
TwitterAgent.sinks.HDFS.channel=MemChannel
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://localhost:9000/user/flume/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeformat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=1000
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval=600
TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=10000
TwitterAgent.channels.MemChannel.transactionCapacity=100
In Env.sh file, I have the path
#FLUME_CLASSPATH="/usr/lib/flume-sources-1.0-SNAPSHOT.jar"
Now I am using the below command to get the data-
[cloudera#quickstart etc]$ flume-ng agent -n TwitterAgent -c conf -f /etc/flume-ng/conf/flume.conf
It showing some logs but I am getting the below error and it is getting stuck after HDFS sink started.
16/09/25 05:18:36 WARN conf.FlumeConfiguration: Could not configure source Twitter due to: Component has no type. Cannot configure. Twitter
org.apache.flume.conf.ConfigurationException: Component has no type. Cannot configure. Twitter
at org.apache.flume.conf.ComponentConfiguration.configure(ComponentConfiguration.java:76)
at org.apache.flume.conf.source.SourceConfiguration.configure(SourceConfiguration.java:56)
at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSources(FlumeConfiguration.java:567)
at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:346)
at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.access$000(FlumeConfiguration.java:213)
at org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:127)
at org.apache.flume.conf.FlumeConfiguration.<init>(FlumeConfiguration.java:109)
at org.apache.flume.node.PropertiesFileConfigurationProvider.getFlumeConfiguration(PropertiesFileConfigurationProvider.java:189)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:89)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/09/25 05:18:36 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [TwitterAgent]
16/09/25 05:18:36 INFO node.AbstractConfigurationProvider: Creating channels
16/09/25 05:18:36 INFO channel.DefaultChannelFactory: Creating instance of channel MemChannel type memory
16/09/25 05:18:36 INFO node.AbstractConfigurationProvider: Created channel MemChannel
16/09/25 05:18:36 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
16/09/25 05:18:36 INFO node.AbstractConfigurationProvider: Channel MemChannel connected to [HDFS]
16/09/25 05:18:36 INFO node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor#3963542c counterGroup:{ name:null counters:{} } }} channels:{MemChannel=org.apache.flume.channel.MemoryChannel{name: MemChannel}} }
16/09/25 05:18:36 INFO node.Application: Starting Channel MemChannel
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: MemChannel: Successfully registered new MBean.
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: MemChannel started
16/09/25 05:18:36 INFO node.Application: Starting Sink HDFS
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started

In configuration file please replace
TwitterAgent.sources.TwitterSource.type=org.apache.flume.source.twitter.TwitterSource
by
TwitterAgent.sources.Twitter.type=org.apache.flume.source.twitter.TwitterSource

Spring Xd no containers available for stream

I'm running Spring XD in distributed mode. But when I bring up a container the admin node sees it and only deploys some and not all streams. Here is the admin log
:11:41+0000 1.3.0.RELEASE INFO DeploymentSupervisor-0 zk.ContainerListener - Container arrived: Container{name='26c5cbfa-7f20-455c-a36c-580c98a8126--More--(76%)
...skipping 1 line
2016-10-03T19:11:41+0000 1.3.0.RELEASE INFO DeploymentSupervisor-0 zk.ContainerListener - Scheduling deployments to new container(s) in 15000 ms
2016-10-03T19:11:57+0000 1.3.0.RELEASE INFO DeploymentSupervisor-0 zk.ModuleRedeployer - Deployment state for stream 'cameras-http-receiver': DeploymentStatus{state=incomplete}
2016-10-03T19:11:57+0000 1.3.0.RELEASE WARN DeploymentSupervisor-0 zk.ModuleRedeployer - No containers available for redeployment of http for stream cameras-http-receiver
2016-10-03T19:11:57+0000 1.3.0.RELEASE INFO DeploymentSupervisor-0 zk.ModuleRedeployer - Deployment state for stream 'cameras-http-receiver': DeploymentStatus{state=incomplete}
2016-10-03T19:11:59+0000 1.3.0.RELEASE INFO DeploymentSupervisor-0 zk.ModuleRedeployer - Deployment state for stream 'cameras-http-receiver-2': DeploymentStatus{state=deployed}
2016-10-03T19:12:00+0000 1.3.0.RELEASE INFO DeploymentSupervisor-0 zk.ModuleRedeployer - Deployment state for stream 'cameras-http-receiver-3': DeploymentStatus{state=deployed}
2016-10-03T19:12:01+0000 1.3.0.RELEASE INFO DeploymentSupervisor-0 zk.ModuleRedeployer - Deployment state for stream 'cameras-processor': DeploymentStatus{state=deployed}
2016-10-03T19:12:02+0000 1.3.0.RELEASE INFO DeploymentSupervisor-0 zk.ModuleRedeployer - Deployment state for stream 'cameras-to-mongo': DeploymentStatus{state=incomplete}
...skipping 1 line
2016-10-03T19:12:02+0000 1.3.0.RELEASE INFO DeploymentSupervisor-0 zk.ModuleRedeployer - Deployment state for stream 'cameras-to-mongo': DeploymentStatus{state=incomplete}
2016-10-03T19:12:03+0000 1.3.0.RELEASE INFO DeploymentSupervisor-0 zk.ModuleRedeployer - Deployment state for stream 'mongo-sink': DeploymentStatus{state=incomplete}
2016-10-03T19:12:03+0000 1.3.0.RELEASE WARN DeploymentSupervisor-0 zk.ModuleRedeployer - No containers available for redeployment of mongodb for stream mongo-sink
2016-10-03T19:12:03+0000 1.3.0.RELEASE INFO DeploymentSupervisor-0 zk.ModuleRedeployer - Deployment state for stream 'mongo-sink': DeploymentStatus{state=incomplete}
Why is it saying there's no containers available right after it recognizes the container?

Do you have any container matching criteria for the modules? It looks like one of the module instances fails to get deployed while the ModuleRedeployer is looking for some other container to deploy that module.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

NoSuchMethod error in flume with hdfs as sink - hadoop

These are the jars I added to solve the problem: hadoop-core-1.0.4.jar commons-configuration-1.6.jar commons-httpclient-3.0.1.jar jets3t-0.6.1.jar commons-codec-1.4.jar Note: I didn't have to add any extra jar when using hadoop 1.2.1

To those who are currently using HDP 2.2 with Flume 1.4 and you recieve an error like this: I just fixed it by replacing hadoop-common.jar in /flume/lib by a more recent version I found in /usr/hdp/ folder (your path may vary)

Related

Unable to retrieve Twitter streaming data using Flume

Unable to remove elastic search service during Azure DevOps Server upgrade

Importing data from file to ElasticSearch with logstash

Issue while getting Twitter data in HDFS using Flume

Spring Xd no containers available for stream

Categories

Resources