Apache Flume connection to the Twitter API: 401 Authentication credentials - hadoop

I am trying to use Apache Flume for saving tweets to my HDFS. I am currently using the Cloudera image with Hadoop and Flume. I was following the tutorial from Cloudera's blog, but I am not able to connect to the Twitter API.
I am getting the following error:
2014-03-14 09:43:14,021 INFO org.apache.flume.node.Application: Waiting for channel: MemChannel to start. Sleeping for 500 ms
2014-03-14 09:43:14,069 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: MemChannel: Successfully registered new MBean.
2014-03-14 09:43:14,069 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: MemChannel started
2014-03-14 09:43:14,522 INFO org.apache.flume.node.Application: Starting Sink HDFS
2014-03-14 09:43:14,522 INFO org.apache.flume.node.Application: Starting Source Twitter
2014-03-14 09:43:14,525 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
2014-03-14 09:43:14,525 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
2014-03-14 09:43:14,595 INFO twitter4j.TwitterStreamImpl: Establishing connection.
2014-03-14 09:43:14,680 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2014-03-14 09:43:14,823 INFO org.mortbay.log: jetty-6.1.26
2014-03-14 09:43:14,946 INFO org.mortbay.log: Started SocketConnector@0.0.0.0:41414
2014-03-14 09:43:16,249 INFO twitter4j.TwitterStreamImpl: 401:Authentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect. Ensure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync.
HTTP ERROR: 401
Problem accessing '/1.1/statuses/filter.json'. Reason:
Unauthorized
2014-03-14 09:43:16,249 INFO twitter4j.TwitterStreamImpl: Waiting for 10000 milliseconds
2014-03-14 09:43:26,251 INFO twitter4j.TwitterStreamImpl: Establishing connection.
I have copied my Twitter API credentials into flume.conf (I have tried both on disk and via the web UI). I have also tried regenerating them and copying in the new ones, but that didn't help.
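For reference, the Twitter source credentials in flume.conf from the Cloudera example are plain agent properties; a sketch (the agent/source names follow the tutorial, and the angle-bracket values are placeholders for your own keys):
TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
TwitterAgent.sources.Twitter.accessToken = <accessToken>
TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>
A stray space or quote character around any of these values can be enough to produce a 401.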
My pom.xml contains:
<dependency>
  <groupId>org.twitter4j</groupId>
  <artifactId>twitter4j-stream</artifactId>
  <version>3.0.5</version>
</dependency>
That means that the problem described here shouldn't apply.
I have also set the system time with the command:
sudo ntpdate pool.ntp.org
Does anybody have an idea of what could possibly be wrong?
Thank you very much in advance for any suggestions and help.

Try upgrading to Twitter4J 3.0.6. I resolved a similar issue by upgrading to 3.0.6.
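In the pom.xml from the question, that is just a version bump:
<dependency>
  <groupId>org.twitter4j</groupId>
  <artifactId>twitter4j-stream</artifactId>
  <version>3.0.6</version>
</dependency>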

Update:
It was because of an invalid consumer key/secret or access token/secret. Also make sure the system clock is in sync.

Related

Kafka s3 source connector: Failed to instantiate Partitioner class

I got the error Failed to instantiate Partitioner class when I try to create an S3 source connector. What was done:
Installed confluent-hub and confluentinc/kafka-connect-s3-source, and exported CLASSPATH (1.0.1 is the latest version):
$ confluent-hub install --no-prompt confluentinc/kafka-connect-s3-source:1.0.1
$ export CLASSPATH=/connector/share/confluent-hub-components/confluentinc-kafka-connect-s3-source/lib/*
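Note: on recent Kafka Connect versions, setting the plugin.path worker property is preferred over exporting CLASSPATH. A minimal standalone worker.properties sketch (the paths and converters here are assumptions):
bootstrap.servers=kafka:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.file.filename=/tmp/connect.offsets
plugin.path=/connector/share/confluent-hub-components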
The connector settings are the defaults from the documentation (connector.properties):
name=s3-source
tasks.max=1
connector.class=io.confluent.connect.s3.source.S3SourceConnector
s3.bucket.name=confluent-kafka-connect-s3-testing
format.class=io.confluent.connect.s3.format.avro.AvroFormat
confluent.license=
confluent.topic.bootstrap.servers=kafka:9092
confluent.topic.replication.factor=1
transforms=AddPrefix
transforms.AddPrefix.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.AddPrefix.regex=.*
transforms.AddPrefix.replacement=copy_of_$0
Detailed error:
$ connect-standalone.sh worker.properties connector.properties
[2019-10-16 12:36:02,410] INFO Kafka version: 2.3.0 (org.apache.kafka.common.utils.AppInfoParser:117)
[2019-10-16 12:36:02,411] INFO Kafka commitId: fc1aaa116b661c8a (org.apache.kafka.common.utils.AppInfoParser:118)
[2019-10-16 12:36:02,412] INFO Kafka startTimeMs: 1571229362410 (org.apache.kafka.common.utils.AppInfoParser:119)
[2019-10-16 12:36:02,675] INFO License for single cluster, single node (io.confluent.license.LicenseManager:417)
[2019-10-16 12:36:02,683] INFO Closing License Store (io.confluent.license.LicenseStore:197)
[2019-10-16 12:36:02,683] INFO Stopping KafkaBasedLog for topic _confluent-command (org.apache.kafka.connect.util.KafkaBasedLog:164)
[2019-10-16 12:36:02,686] INFO [Producer clientId=s3-source-license-manager] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer:1153)
[2019-10-16 12:36:02,701] INFO Stopped KafkaBasedLog for topic _confluent-command (org.apache.kafka.connect.util.KafkaBasedLog:190)
[2019-10-16 12:36:02,702] INFO Closed License Store (io.confluent.license.LicenseStore:199)
[2019-10-16 12:36:02,704] ERROR WorkerConnector{id=s3-source} Error while starting connector (org.apache.kafka.connect.runtime.WorkerConnector:119)
org.apache.kafka.connect.errors.ConnectException: Failed to instantiate Partitioner class
at io.confluent.connect.s3.source.S3SourceConnectorConfig.getPartitioner(S3SourceConnectorConfig.java:612)
at io.confluent.connect.s3.source.S3SourceConnector.doStart(S3SourceConnector.java:94)
at io.confluent.connect.s3.source.S3SourceConnector.start(S3SourceConnector.java:86)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:111)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:136)
at org.apache.kafka.connect.runtime.WorkerConnector.transitionTo(WorkerConnector.java:196)
at org.apache.kafka.connect.runtime.Worker.startConnector(Worker.java:252)
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.startConnector(StandaloneHerder.java:293)
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:209)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:115)
I am unfamiliar with Java, but I am now trying to look into the source code inside the jars. Any help would be appreciated, and thanks in advance.
It sounds like your installation is a bit screwy. To run Kafka Connect under Docker you should use a dedicated image such as confluentinc/cp-kafka-connect.
To see an example of Kafka Connect deployed with Docker, have a look at http://rmoff.dev/bbuzz19_demo-code and the accompanying talk and slides.
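A minimal sketch of running that image (the tag, names, and topics are assumptions, and a real deployment needs several more CONNECT_* variables, e.g. for converters):
docker run -d --name kafka-connect -p 8083:8083 \
  -e CONNECT_BOOTSTRAP_SERVERS=kafka:9092 \
  -e CONNECT_REST_ADVERTISED_HOST_NAME=kafka-connect \
  -e CONNECT_GROUP_ID=connect-cluster \
  -e CONNECT_CONFIG_STORAGE_TOPIC=_connect-configs \
  -e CONNECT_OFFSET_STORAGE_TOPIC=_connect-offsets \
  -e CONNECT_STATUS_STORAGE_TOPIC=_connect-status \
  confluentinc/cp-kafka-connect:5.3.1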

Apache Nifi windows unable to load NAR library bundles

I'm only attempting to launch the NiFi UI as a local instance to start playing with it. I've unzipped the package and made sure to set the JAVA_HOME variable to my Java 1.8. When I try to run bin/run-nifi, the error message in my nifi-app log is:
2018-05-03 15:03:50,585 INFO [main] org.apache.nifi.NiFi Launching NiFi...
2018-05-03 15:03:52,330 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties path to be 'Z:\DoE\LOCAL-~1\NIFI-1~1.0\.\conf\nifi.properties'
2018-05-03 15:03:52,363 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Loaded 146 properties from Z:\DoE\LOCAL-~1\NIFI-1~1.0\.\conf\nifi.properties
2018-05-03 15:03:52,423 INFO [main] org.apache.nifi.NiFi Loaded 146 properties
2018-05-03 15:03:52,779 INFO [main] org.apache.nifi.BootstrapListener Started Bootstrap Listener, Listening for incoming requests on port 64802
2018-05-03 15:03:53,071 INFO [main] org.apache.nifi.BootstrapListener Successfully initiated communication with Bootstrap
2018-05-03 15:03:53,181 WARN [main] org.apache.nifi.nar.NarUnpacker Unable to load NAR library bundles due to java.io.IOException: Z:\DoE\LOCAL-~1\NIFI-1~1.0\.\work\nar\framework directory does not have read/write privilege Will proceed without loading any further Nar bundles
2018-05-03 15:03:53,242 ERROR [main] org.apache.nifi.NiFi Failure to launch NiFi due to java.io.IOException: Z:\DoE\LOCAL-~1\NIFI-1~1.0\.\work\nar\framework could not be created
java.io.IOException: Z:\DoE\LOCAL-~1\NIFI-1~1.0\.\work\nar\framework could not be created
at org.apache.nifi.util.FileUtils.ensureDirectoryExistAndCanReadAndWrite(FileUtils.java:48)
at org.apache.nifi.nar.NarClassLoaders.load(NarClassLoaders.java:155)
at org.apache.nifi.nar.NarClassLoaders.init(NarClassLoaders.java:131)
at org.apache.nifi.NiFi.<init>(NiFi.java:133)
at org.apache.nifi.NiFi.<init>(NiFi.java:71)
at org.apache.nifi.NiFi.main(NiFi.java:292)
2018-05-03 15:03:53,383 INFO [Thread-1] org.apache.nifi.NiFi Initiating shutdown of Jetty web server...
2018-05-03 15:03:53,387 INFO [Thread-1] org.apache.nifi.NiFi Jetty web server shutdown completed (nicely or otherwise).
I've followed the installation instructions and haven't been able to troubleshoot. How do I load these NAR files when running NiFi?
Thanks
I believe the underlying error in your output is java.io.IOException: Z:\DoE\LOCAL-~1\NIFI-1~1.0\.\work\nar\framework could not be created.
NiFi requires file permissions to create and write to several directories; there is a list in the NiFi Admin Guide: How to install and start NiFi. NiFi needs these to unpack the NAR files, write logs, and maintain the various data repositories that comprise your data flow.
You have a few options:
Modify the permissions of the directory to allow NiFi read/write access (see the icacls example after this list). This can be done for each individual child directory.
Copy the entire NiFi distribution to a read/write location and run it from there.
Edit the conf/nifi.properties file to change the locations of these directories to read/write locations. See NiFi Admin Guide: System Properties for help on the properties.
Symlinks are also a great solution on systems that support them.
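For the first option on Windows, granting the logged-in user full control over the work directory from the log might look like this (icacls flags: (OI)(CI)F = object/container inherit with full access, /T = recurse):
icacls "Z:\DoE\LOCAL-~1\NIFI-1~1.0\work" /grant %USERNAME%:(OI)(CI)F /T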
Two things you can try:
Run NiFi with administrator privileges (not a good practice) by going to ~\<NIFI_INSTALLATION_DIR>\bin, right-clicking run-nifi.bat, and clicking Run as Administrator.
Move the NiFi directory to a location where the logged-in user has full access, e.g. C:\Users\<YOUR_USER>\Documents\. Then try to execute bin\run-nifi.bat.
Similar to the resolution that James proposed, I had to do the three-step process below.
My scenario: I'm using Docker containers and had the same problem. Even changing the user of my container to root didn't work. So I did the following:
1 - Download MiNiFi: https://nifi.apache.org/minifi/download.html
2 - Untar and execute the MiNiFi agent on my own laptop (I'm using a Mac) so that the necessary folders and files are created.
3 - Tar it up again and add it to the Dockerfile of my container build (see the sketch below).
Done! Everything worked fine after that.
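A Dockerfile sketch of step 3 (the base image, version, and paths are assumptions; ADD auto-extracts a local tarball):
FROM openjdk:8-jre
# Tarball of the MiNiFi tree that was already initialized on the laptop
ADD minifi-0.5.0-bin.tar.gz /opt/
WORKDIR /opt/minifi-0.5.0
CMD ["bin/minifi.sh", "run"]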

Spark streaming job on YARN cluster mode stuck in accepted, then fails with a Timeout Exception

I am running a Spark streaming application that simply reads messages from a Kafka topic, enriches them, and then writes the enriched messages to another Kafka topic.
I have already tried it in standalone mode (both client and cluster deploy mode) and in YARN client mode, successfully.
When I submit the application in YARN cluster mode, it gives me the following messages:
18/01/10 12:13:34 INFO Client: Submitting application application_1515582681419_0001 to ResourceManager
18/01/10 12:13:34 INFO YarnClientImpl: Submitted application application_1515582681419_0001
18/01/10 12:13:35 INFO Client: Application report for application_1515582681419_0001 (state: ACCEPTED)
18/01/10 12:13:35 INFO Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1515582814080
final status: UNDEFINED
tracking URL: http://ambari1.internal:8088/proxy/application_1515582681419_0001/
user: root
18/01/10 12:13:36 INFO Client: Application report for application_1515582681419_0001 (state: ACCEPTED)
18/01/10 12:13:37 INFO Client: Application report for application_1515582681419_0001 (state: ACCEPTED)
It stays stuck in ACCEPTED status for around 4-5 minutes, then exits with the following error message:
18/01/10 12:17:00 INFO InputInfoTracker: remove old batch metadata: 1515583000000 ms
18/01/10 12:17:02 ERROR ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:423)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:282)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:768)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:766)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
18/01/10 12:17:02 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds])
18/01/10 12:17:02 INFO StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook
18/01/10 12:17:02 INFO ReceiverTracker: ReceiverTracker stopped
18/01/10 12:17:02 INFO JobGenerator: Stopping JobGenerator immediately
Funny fact: if I visit the page of the application, I can see that the Spark context has been started and it processes some messages.
Could anyone help me with this?
PS: These are the resources of my YARN cluster: [screenshot of the YARN cluster resources not included]
The problem might be with the YARN "App Timeline Server". Try restarting it.
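On a plain Hadoop install that can be done with something like the following (on an Ambari-managed cluster such as this one, restart the service from the Ambari UI instead):
yarn-daemon.sh stop timelineserver
yarn-daemon.sh start timelineserver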
Are you creating your Spark session with the master set to local? Please do check this.
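This is a common way the symptom arises (an illustrative Scala sketch; the app name is made up): a master hard-coded in the application overrides --master yarn from spark-submit, so in cluster mode the ApplicationMaster waits for a driver that never registers and times out.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kafka-enricher")
  .master("local[*]") // remove this line when submitting with --master yarn
  .getOrCreate()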

AWS EMR kerberizing the cluster hadoop.security.AccessControlException

I am trying to kerberize an AWS EMR cluster. I have enabled Hadoop security, created the Kerberos principals, and deployed them on all the nodes.
However, when I start the namenode using the command 'sudo start hadoop-hdfs-namenode', the following exception is thrown:
2016-06-08 06:14:06,515 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor (main): Number of failed storage changes from 0 to 0
2016-06-08 06:14:06,515 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager (org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@ac4860): Updating block keys
2016-06-08 06:14:06,544 INFO org.apache.hadoop.ipc.Server (IPC Server Responder): IPC Server Responder: starting
2016-06-08 06:14:06,544 INFO org.apache.hadoop.ipc.Server (IPC Server listener on 8020): IPC Server listener on 8020: starting
2016-06-08 06:14:06,560 INFO org.apache.hadoop.hdfs.server.namenode.NameNode (main): NameNode RPC up at: ip-172-31-21-213.ap-southeast-1.compute.internal/172.31.21.213:8020
2016-06-08 06:14:06,560 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem (main): Starting services required for active state
2016-06-08 06:14:06,564 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor(443740501)): Starting CacheReplicationMonitor with interval 30000 milliseconds
2016-06-08 06:14:06,763 INFO org.apache.hadoop.ipc.Server (Socket Reader #1 for port 8020): Socket Reader #1 for port 8020: readAndProcess from client 172.31.21.213 threw exception [org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]]
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Server$Connection.initializeAuthContext(Server.java:1564)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1520)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:771)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:637)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:608)
Kindly help me in this regard. Thanks in advance.
The client doesn't think security is enabled; it's only trying to use "SIMPLE" auth, i.e. the caller is whoever they say they are. The server will only take Kerberos tickets, or Hadoop delegation tokens previously acquired by a caller with a valid Kerberos ticket.
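One thing to verify is that the client side's core-site.xml enables Kerberos as well; a minimal sketch of the relevant properties:
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>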

Launch Neo4j 2.3.2 from the commandline

I have installed Neo4j 2.3.2 Community Edition on Mac OS 10.10. I can launch the application and connect to it from localhost:7474/browser/. So far, so good.
I would like to launch Neo4j 2.3.2 from a Terminal window, so that I don't have the overhead of a windowed application running at the same time. When I run the following command...
$ ~/neo4j/bin/neo4j console
... I get this output in the Terminal window:
WARNING: Max 256 open files allowed, minimum of 40 000 recommended. See the Neo4j manual.
Starting Neo4j Server console-mode...
Unable to find any JVMs matching version "1.7".
Using additional JVM arguments: -server -XX:+DisableExplicitGC -Dorg.neo4j.server.properties=conf/neo4j-server.properties -Djava.util.logging.config.file=conf/logging.properties -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:-OmitStackTraceInFastThrow -XX:hashCode=5 -Dneo4j.ext.udc.source=tarball
2016-02-25 14:03:18.755+0000 INFO [API] Setting startup timeout to: 120000ms based on 120000
2016-02-25 14:03:58.356+0000 INFO [API] Successfully started database
2016-02-25 14:04:04.220+0000 INFO [API] Starting HTTP on port :7474 with 2 threads available
2016-02-25 14:04:13.512+0000 INFO [API] Enabling HTTPS on port :7473
09:04:20.201 [main] INFO org.eclipse.jetty.util.log - Logging initialized #98517ms
2016-02-25 14:04:23.034+0000 INFO [API] Mounting static content at [/webadmin] from [webadmin-html]
2016-02-25 14:04:25.785+0000 INFO [API] Mounting static content at [/browser] from [browser]
09:04:25.993 [main] INFO org.eclipse.jetty.server.Server - jetty-9.2.4.v20141103
09:04:26.722 [main] INFO o.e.j.server.handler.ContextHandler - Started o.e.j.s.h.MovedContextHandler@1611ba2{/,null,AVAILABLE}
09:04:27.794 [main] INFO o.e.j.w.StandardDescriptorProcessor - NO JSP Support for /webadmin, did not find org.apache.jasper.servlet.JspServlet
09:04:27.981 [main] INFO o.e.j.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@132ea25{/webadmin,jar:file:/Users/james/neo4j/system/lib/neo4j-server-2.2.5-static-web.jar!/webadmin-html,AVAILABLE}
09:04:38.841 [main] INFO o.e.j.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@60bfaa02{/db/manage,null,AVAILABLE}
09:04:39.326 [main] INFO o.e.j.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@28e2e149{/db/data,null,AVAILABLE}
09:04:39.353 [main] INFO o.e.j.w.StandardDescriptorProcessor - NO JSP Support for /browser, did not find org.apache.jasper.servlet.JspServlet
09:04:39.355 [main] INFO o.e.j.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@78e6aa71{/browser,jar:file:/Users/james/neo4j/system/lib/neo4j-browser-2.2.5.jar!/browser,AVAILABLE}
09:04:39.536 [main] INFO o.e.j.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@4994d9ab{/,null,AVAILABLE}
09:04:39.745 [main] INFO o.e.jetty.server.ServerConnector - Started ServerConnector@2d19cf20{HTTP/1.1}{localhost:7474}
09:04:40.576 [main] INFO o.e.jetty.server.ServerConnector - Started ServerConnector@43c742c{SSL-HTTP/1.1}{localhost:7473}
09:04:40.577 [main] INFO org.eclipse.jetty.server.Server - Started #119058ms
2016-02-25 14:04:40.577+0000 INFO [API] Server started on: http://localhost:7474/
2016-02-25 14:04:40.590+0000 INFO [API] Remote interface ready and available at [http://localhost:7474/]
I have Java version 8, update 74 installed (build 1.8.0_74-b02), so I assume that I can ignore the warning Unable to find any JVMs matching version "1.7".
However, when I visit http://localhost:7474/ in Chrome Version 45.0.2454.85 (64-bit), I see three errors in the Developer Console: two files that fail to load and a subsequent script error.
localhost/:28 GET http://localhost:7474/browser/styles/68eddd94.main.css
localhost/:466 GET http://localhost:7474/browser/scripts/ded362b3.scripts.js
Uncaught Error: [$injector:modulerr] Failed to instantiate module neo4jApp due to:
Error: [$injector:nomod] Module 'neo4jApp' is not available! You either misspelled the module name or forgot to load it. If registering a module ensure that you specify the dependencies as the second argument.
As a result, the Neo4j interface does not appear in the browser window.
Is it possible to run Neo4j 2.3.2 from the Terminal, and if so, what do I need to do to get http://localhost:7474/ to load correctly?
Shift-reload, or test in an incognito window.
It looks like a JS file mismatch due to aggressive browser caching.
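To confirm whether the server is actually serving those assets (independent of any browser cache), a quick header check against the URL from the console error:
curl -I http://localhost:7474/browser/scripts/ded362b3.scripts.js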
