Kafka Streams failed to delete the state directory - DirectoryNotEmptyException

I noticed the exception stream-thread [x-CleanupThread] Failed to delete the state directory in our Kafka Streams application. The application uses a windowed state store, defined as:
Stores.windowStoreBuilder(
        Stores.persistentWindowStore(
                storeName,
                retentionPeriod,
                retentionWindowSize,
                false),
        Serdes.String(),
        Serdes.String()).withCachingEnabled();
This is not a test issue with the topology test driver; it happens in the actually deployed streams application. Every ten minutes the cleanup thread tries to delete the directory and fails with the stack trace below. The directory appears empty when checked with ls -al. I have also tried modifying its permissions with chmod 777, with no luck.
Stack Trace:
java.nio.file.DirectoryNotEmptyException: /data/consumer/0_17
at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:242)
at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
at java.nio.file.Files.delete(Files.java:1126)
at org.apache.kafka.common.utils.Utils$2.postVisitDirectory(Utils.java:763)
at org.apache.kafka.common.utils.Utils$2.postVisitDirectory(Utils.java:746)
at java.nio.file.Files.walkFileTree(Files.java:2688)
at java.nio.file.Files.walkFileTree(Files.java:2742)
at org.apache.kafka.common.utils.Utils.delete(Utils.java:746)
at org.apache.kafka.streams.processor.internals.StateDirectory.cleanRemovedTasks(StateDirectory.java:290)
at org.apache.kafka.streams.processor.internals.StateDirectory.cleanRemovedTasks(StateDirectory.java:253)
at org.apache.kafka.streams.KafkaStreams$2.run(KafkaStreams.java:794)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
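For what it's worth, the ten-minute cadence matches the default of state.cleanup.delay.ms (600000 ms). A minimal sketch of raising that delay while investigating (application id and broker address below are hypothetical); this only makes the CleanupThread retry less often, it does not fix whatever is keeping the directory non-empty:
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsProps {
    public static Properties build() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");          // hypothetical id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // hypothetical broker
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/data/consumer");       // state dir seen in the trace
        // Default is 600000 ms (10 minutes), matching the failure cadence above.
        props.put(StreamsConfig.STATE_CLEANUP_DELAY_MS_CONFIG, 30 * 60 * 1000L);
        return props;
    }
}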

Related

java.io.IOException: net.schmizz.sshj.sftp.SFTPException: Failure - Successfully fetched the content but failed to rename the remote file

I have a NiFi installation running on Linux, which was working fine, and all of a sudden FetchSFTP started throwing an error.
My flow is ListSFTP - FetchSFTP - PutSFTP, and below is the error shown on the FetchSFTP processor.
FetchSFTP[id=908da67c-0181-1000-1830-fdbb76da7be8] Successfully fetched the content for FlowFile[filename=cfgcampaign_2022-06-25.csv] from etl12.kw.zain.com:22/data1/dw/ftpuser/Varicent_Files/ICM_CC/cfgcampaign_2022-06-25.csv but failed to rename the remote file due to net.schmizz.sshj.sftp.SFTPException: Failure: java.io.IOException: net.schmizz.sshj.sftp.SFTPException: Failure - Caused by: net.schmizz.sshj.sftp.SFTPException: Failure
And from the log:
2022-06-26 10:58:50,699 WARN [Timer-Driven Process Thread-4] o.a.nifi.processors.standard.FetchSFTP [FetchSFTP[id=908da67c-0181-1000-1830-fdbb76da7be8], StandardFlowFileRecord[uuid=d139d68c-f094-45f8-982d-ab4a1abaf264,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1656230330691-548, container=default, section=548], offset=0, length=46140],offset=0,name=cfgcampaign_2022-06-25.csv,size=46140], etl12.kw.zain.com, 22, /data1/dw/ftpuser/Varicent_Files/ICM_CC/cfgcampaign_2022-06-25.csv, java.io.IOException: net.schmizz.sshj.sftp.SFTPException: Failure] Successfully fetched the content for {} from {}:{}{} but failed to rename the remote file due to {}
java.io.IOException: net.schmizz.sshj.sftp.SFTPException: Failure
at org.apache.nifi.processors.standard.util.SFTPTransfer.rename(SFTPTransfer.java:785)
at org.apache.nifi.processors.standard.FetchFileTransfer.performCompletionStrategy(FetchFileTransfer.java:359)
at org.apache.nifi.processors.standard.FetchFileTransfer.lambda$onTrigger$1(FetchFileTransfer.java:313)
at org.apache.nifi.controller.repository.StandardProcessSession.commitAsync(StandardProcessSession.java:537)
at org.apache.nifi.processors.standard.FetchFileTransfer.onTrigger(FetchFileTransfer.java:312)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1283)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)
at org.apache.nifi.controller.scheduling.AbstractTimeBasedSchedulingAgent.lambda$doScheduleOnce$0(AbstractTimeBasedSchedulingAgent.java:63)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: net.schmizz.sshj.sftp.SFTPException: Failure
at net.schmizz.sshj.sftp.Response.error(Response.java:140)
at net.schmizz.sshj.sftp.Response.ensureStatusIs(Response.java:133)
at net.schmizz.sshj.sftp.Response.ensureStatusPacketIsOK(Response.java:125)
at net.schmizz.sshj.sftp.SFTPEngine.rename(SFTPEngine.java:250)
at net.schmizz.sshj.sftp.SFTPClient.rename(SFTPClient.java:124)
at net.schmizz.sshj.sftp.SFTPClient.rename(SFTPClient.java:119)
at org.apache.nifi.processors.standard.util.SFTPTransfer.rename(SFTPTransfer.java:777)
... 16 common frames omitted
Can anyone help me to fix this?
Regards,
Ben
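One way to isolate this is to exercise the same sshj rename call outside NiFi; a minimal sketch, with hypothetical credentials and a hypothetical target path. An SFTP "Failure" status is the server's generic error code, and on a rename it commonly means the target's parent directory does not exist, the target already exists, or the user lacks permission on it:
import net.schmizz.sshj.SSHClient;
import net.schmizz.sshj.sftp.SFTPClient;

public class SftpRenameCheck {
    public static void main(String[] args) throws Exception {
        SSHClient ssh = new SSHClient();
        ssh.loadKnownHosts();
        ssh.connect("etl12.kw.zain.com", 22);
        try {
            ssh.authPassword("ftpuser", "secret"); // hypothetical credentials
            SFTPClient sftp = ssh.newSFTPClient();
            try {
                // The same call FetchSFTP's completion strategy makes (see the trace above).
                sftp.rename(
                        "/data1/dw/ftpuser/Varicent_Files/ICM_CC/cfgcampaign_2022-06-25.csv",
                        "/data1/dw/ftpuser/Varicent_Files/ICM_CC/done/cfgcampaign_2022-06-25.csv"); // hypothetical target
            } finally {
                sftp.close();
            }
        } finally {
            ssh.disconnect();
        }
    }
}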

Apache nifi: Timed out while waiting for OnScheduled of 'QueryDatabaseTable' processor to finish

I'm trying to create my first flow using the QueryDatabaseTable to incrementally extract rows from an Oracle database table.
I'm getting the errors below. I enabled full debug but nothing else useful is logged.
Thoughts on what to try next?
2017-07-10 14:43:52,280 WARN [StandardProcessScheduler Thread-4] o.a.n.controller.StandardProcessorNode Timed out while waiting for OnScheduled of 'QueryDatabaseTable' processor to finish. An attempt is made to cancel the task via Thread.interrupt(). However it does not guarantee that the task will be canceled since the code inside current OnScheduled operation may have been written to ignore interrupts which may result in a runaway thread. This could lead to more issues, eventually requiring NiFi to be restarted. This is usually a bug in the target Processor 'QueryDatabaseTable[id=1e535f00-015d-1000-236d-7adebe14958a]' that needs to be documented, reported and eventually fixed.
2017-07-10 14:43:52,280 ERROR [StandardProcessScheduler Thread-4] o.a.n.p.standard.QueryDatabaseTable QueryDatabaseTable[id=1e535f00-015d-1000-236d-7adebe14958a] QueryDatabaseTable[id=1e535f00-015d-1000-236d-7adebe14958a] failed to invoke #OnScheduled method due to java.lang.RuntimeException: Timed out while executing one of processor's OnScheduled task.; processor will not be scheduled to run for 30 seconds: java.lang.RuntimeException: Timed out while executing one of processor's OnScheduled task.
java.lang.RuntimeException: Timed out while executing one of processor's OnScheduled task.
at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1480)
at org.apache.nifi.controller.StandardProcessorNode.access$000(StandardProcessorNode.java:102)
at org.apache.nifi.controller.StandardProcessorNode$1.run(StandardProcessorNode.java:1303)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: null
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1465)
... 9 common frames omitted
2017-07-10 14:43:52,280 ERROR [StandardProcessScheduler Thread-4] o.a.n.controller.StandardProcessorNode Failed to invoke #OnScheduled method due to java.lang.RuntimeException: Timed out while executing one of processor's OnScheduled task.
java.lang.RuntimeException: Timed out while executing one of processor's OnScheduled task.
at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1480)
at org.apache.nifi.controller.StandardProcessorNode.access$000(StandardProcessorNode.java:102)
at org.apache.nifi.controller.StandardProcessorNode$1.run(StandardProcessorNode.java:1303)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: null
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1465)
... 9 common frames omitted
The #OnScheduled method of QueryDatabaseTable is trying to connect to your database and appears to be having problems, causing it to hit the 60-second processor scheduling timeout.
Can you verify your DBCPConnectionPool service is correctly configured and that the servers running NiFi can otherwise connect to the database with the same credentials?
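Connectivity can also be verified from the NiFi host with a bare JDBC program using the same URL and credentials as the DBCPConnectionPool service; a sketch with a hypothetical host and service name (the Oracle JDBC jar must be on the classpath):
import java.sql.Connection;
import java.sql.DriverManager;

public class OracleConnectivityCheck {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"; // hypothetical host/service
        try (Connection conn = DriverManager.getConnection(url, "nifi_user", "secret")) { // hypothetical credentials
            System.out.println("Connected to: " + conn.getMetaData().getDatabaseProductVersion());
        }
    }
}
If this hangs until a timeout rather than failing fast, that points at a network or firewall problem rather than a NiFi configuration problem.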
I only have one NiFi server running. If I change the connection string, it throws an Oracle error, so I assume that part is working. Any tips on how I can debug this?
UPDATE: I checked, and there is no connectivity from the NiFi host to the DB. The error is misleading.
In my case, it was a firewall issue; I had to ask the security manager for permission. The connection can also be checked via telnet.
telnet databaseserver port_number
Expected output:
Trying database_server...
Connected to database_server.
Escape character is '^]'.

Impala JDBC connection: Error setting/closing session: Open Session Error

If I have the following type of impala connection, can I still use SquirreL SQL:
self.impala_con = connect(host='sql.edamame.com', port=25003, use_ssl=True,
                          auth_mechanism="PLAIN", user='edamame1',
                          password='edamamePass')
Here is how I set the alias in SquirreL (screenshot omitted).
I got the following errors during test connection:
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.sql.SQLException: [Simba][ImpalaJDBCDriver](500151) Error setting/closing session: Open Session Error.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:206)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.awaitConnection(OpenConnectionCommand.java:132)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.access$100(OpenConnectionCommand.java:45)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand$2.run(OpenConnectionCommand.java:115)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Update:
I also tried the URL below:
jdbc:impala://sql.edamame.com:25003/default;AuthMech=3;SSL=1
and get the following new errors:
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.sql.SQLException: [Simba][ImpalaJDBCDriver](500310) Invalid operation: null;
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:206)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.awaitConnection(OpenConnectionCommand.java:132)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.access$100(OpenConnectionCommand.java:45)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand$2.run(OpenConnectionCommand.java:115)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
What else might I be missing? Thanks!
Short answer: RTFM.
Long answer: the Cloudera JDBC driver ships with an 80+ page PDF "Install and Config guide". Don't be shy, open it. You can go directly to Appendix A (driver configuration) and look at these entries:
AuthMech - The authentication mechanism to use. Set the value to one of the following numbers: 0 for No Authentication, 1 for Kerberos, 2 for User Name, 3 for User Name and Password.
SSL - When this property is set to 1, the driver communicates with the Impala server through an SSL-enabled socket. When this property is set to 0, the driver does not connect to SSL-enabled sockets.
So your URL should look like
jdbc:impala://blahblah:25003/default;AuthMech=3;SSL=1
One more thing: to troubleshoot SSL handshake issues, you may enable the debug traces with this Java system property...
-Djavax.net.debug=ssl
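The URL can also be sanity-checked outside SquirreL with a small JDBC program; a sketch, assuming the Simba/Cloudera driver jar is on the classpath (the driver class name varies by driver package version, so check the guide):
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class ImpalaConnectivityCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("com.cloudera.impala.jdbc41.Driver"); // class name depends on driver version
        String url = "jdbc:impala://sql.edamame.com:25003/default;AuthMech=3;SSL=1";
        Properties props = new Properties();
        props.setProperty("UID", "edamame1");    // user name, required by AuthMech=3
        props.setProperty("PWD", "edamamePass"); // password
        try (Connection conn = DriverManager.getConnection(url, props)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}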

Unexpected transaction.log file created when starting elasticsearch embedded in Tomcat

I'm using elasticsearch-transport-wares to run Elasticsearch embedded in Tomcat.
Whatever the value of path.logs in elasticsearch.yml, an empty transaction.log file is always created in the directory from which I start Tomcat.
When I start a non-embedded Elasticsearch server, I don't see any transaction.log file created anywhere.
My questions are:
What is the purpose of this empty transaction.log file?
How can I configure Elasticsearch to choose where this file is stored (just as path.logs decides where the Elasticsearch logs are stored)?
Additionally, when the server is started from a path with no write permission, I get this stack trace (logged at INFO):
2016-01-28 10:26:15,174 | INFO | Initializing elasticsearch Node 'node' [o.a.c.c.C.[.[.[/es-embedded]<localhost-startStop-1>]
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: transaction.log (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
at org.apache.log4j.RollingFileAppender.setFile(RollingFileAppender.java:207)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
at org.apache.log4j.Logger.getLogger(Logger.java:104)
at org.elasticsearch.common.logging.log4j.Log4jESLoggerFactory.newInstance(Log4jESLoggerFactory.java:39)
at org.elasticsearch.common.logging.ESLoggerFactory.newInstance(ESLoggerFactory.java:74)
at org.elasticsearch.common.logging.ESLoggerFactory.getLogger(ESLoggerFactory.java:66)
at org.elasticsearch.common.logging.Loggers.getLogger(Loggers.java:122)
at org.elasticsearch.common.MacAddressProvider.<clinit>(MacAddressProvider.java:32)
at org.elasticsearch.common.TimeBasedUUIDGenerator.<clinit>(TimeBasedUUIDGenerator.java:41)
at org.elasticsearch.common.Strings.<clinit>(Strings.java:48)
at org.elasticsearch.common.settings.ImmutableSettings.<init>(ImmutableSettings.java:73)
at org.elasticsearch.common.settings.ImmutableSettings$Builder.build(ImmutableSettings.java:1142)
at org.elasticsearch.node.NodeBuilder.settings(NodeBuilder.java:89)
at org.elasticsearch.wares.NodeServlet.init(NodeServlet.java:111)
at javax.servlet.GenericServlet.init(GenericServlet.java:158)
at org.apache.catalina.core.StandardWrapper.initServlet(StandardWrapper.java:1284)
at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1197)
at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:1087)
at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:5267)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5557)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:652)
at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1095)
at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1930)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
You can change the path.logs setting in elasticsearch.yml, then add your elasticsearch.yml configuration file inside the /WEB-INF folder of your WAR and it will be picked up.
You should be good to go.
As for the transaction.log file, are you sure it's not a file that you have configured in your Log4J configuration?
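For reference, the stack trace above shows Log4J's PropertyConfigurator activating a RollingFileAppender, so a hypothetical appender entry like the following (note the relative File path) would create transaction.log in whatever directory the process was started from, regardless of path.logs:
# Hypothetical log4j.properties fragment; a relative File path resolves
# against the JVM's working directory, not Elasticsearch's path.logs.
log4j.appender.txlog=org.apache.log4j.RollingFileAppender
log4j.appender.txlog.File=transaction.log
log4j.appender.txlog.layout=org.apache.log4j.PatternLayout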

error in streaming twitter data to Hadoop using flume

I am using Hadoop 1.2.1 on Ubuntu 14.04.
I am trying to stream data from Twitter to HDFS using Flume 1.6.0. I have downloaded flume-sources-1.0-SNAPSHOT.jar, included it in the flume/lib folder, and set its path as FLUME_CLASSPATH in conf/flume-env.sh. This is my flume agent conf file:
#setting properties of agent
Twitter-agent.sources=source1
Twitter-agent.channels=channel1
Twitter-agent.sinks=sink1
#configuring sources
Twitter-agent.sources.source1.type=com.cloudera.flume.source.TwitterSource
Twitter-agent.sources.source1.channels=channel1
Twitter-agent.sources.source1.consumerKey=<consumer key>
Twitter-agent.sources.source1.consumerSecret=<consumer secret>
Twitter-agent.sources.source1.accessToken=<access token>
Twitter-agent.sources.source1.accessTokenSecret=<access token secret>
Twitter-agent.sources.source1.keywords= morning, night, hadoop, bigdata
#configuring channels
Twitter-agent.channels.channel1.type=memory
Twitter-agent.channels.channel1.capacity=10000
Twitter-agent.channels.channel1.transactionCapacity=100
#configuring sinks
Twitter-agent.sinks.sink1.channel=channel1
Twitter-agent.sinks.sink1.type=hdfs
Twitter-agent.sinks.sink1.hdfs.path=flume/twitter/logs
Twitter-agent.sinks.sink1.hdfs.rollSize=0
Twitter-agent.sinks.sink1.hdfs.rollCount=1000
Twitter-agent.sinks.sink1.hdfs.batchSize=100
Twitter-agent.sinks.sink1.hdfs.fileType=DataStream
Twitter-agent.sinks.sink1.hdfs.writeFormat=Text
When I run this agent, I get an error like this:
15/06/22 14:14:49 INFO source.DefaultSourceFactory: Creating instance of source source1, type com.cloudera.flume.source.TwitterSource
15/06/22 14:14:49 ERROR node.PollingPropertiesFileConfigurationProvider: Unhandled error
java.lang.NoSuchMethodError: twitter4j.conf.Configuration.isStallWarningsEnabled()Z
at twitter4j.TwitterStreamImpl.<init>(TwitterStreamImpl.java:60)
at twitter4j.TwitterStreamFactory.<clinit>(TwitterStreamFactory.java:40)
at com.cloudera.flume.source.TwitterSource.<init>(TwitterSource.java:64)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:44)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:322)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:97)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
My flume/lib folder already has twitter4j-core-3.0.3.jar
How do I rectify this error?
I found a solution to this issue. Because flume-sources-1.0-SNAPSHOT.jar and twitter4j-stream-3.0.3.jar contain the same FilterQuery class, there is a jar conflict. All twitter4j 3.x.x jars use this class, so it is better to download the version 2.2.6 twitter4j jars (twitter4j-core, twitter4j-stream, twitter4j-media-support) and replace the 3.x.x jars with them in the flume/lib directory.
Run the agent again and the Twitter data will be streamed to HDFS.
Alternatively, change
Twitter-agent.sources.source1.type=com.cloudera.flume.source.TwitterSource
to the source class that ships with Flume 1.6.0 (keeping your agent and source names as they are):
Twitter-agent.sources.source1.type=org.apache.flume.source.twitter.TwitterSource
