I am trying to use Apache Flume to save tweets to HDFS. I am currently using the Cloudera image with Hadoop and Flume, and I was following the tutorial from Cloudera's blog, but I am not able to connect to the Twitter API.
I am getting the following error:
2014-03-14 09:43:14,021 INFO org.apache.flume.node.Application: Waiting for channel: MemChannel to start. Sleeping for 500 ms
2014-03-14 09:43:14,069 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: MemChannel: Successfully registered new MBean.
2014-03-14 09:43:14,069 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: MemChannel started
2014-03-14 09:43:14,522 INFO org.apache.flume.node.Application: Starting Sink HDFS
2014-03-14 09:43:14,522 INFO org.apache.flume.node.Application: Starting Source Twitter
2014-03-14 09:43:14,525 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
2014-03-14 09:43:14,525 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
2014-03-14 09:43:14,595 INFO twitter4j.TwitterStreamImpl: Establishing connection.
2014-03-14 09:43:14,680 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2014-03-14 09:43:14,823 INFO org.mortbay.log: jetty-6.1.26
2014-03-14 09:43:14,946 INFO org.mortbay.log: Started SocketConnector#0.0.0.0:41414
2014-03-14 09:43:16,249 INFO twitter4j.TwitterStreamImpl: 401:Authentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect. Ensure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync.
HTTP ERROR: 401
Problem accessing '/1.1/statuses/filter.json'. Reason:
Unauthorized
2014-03-14 09:43:16,249 INFO twitter4j.TwitterStreamImpl: Waiting for 10000 milliseconds
2014-03-14 09:43:26,251 INFO twitter4j.TwitterStreamImpl: Establishing
I have copied my Twitter API credentials into flume.conf (I have tried both on disk and via the web UI). I have also tried regenerating them and copying the new ones over, but that didn't help.
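For reference, the relevant source section of my flume.conf looks roughly like this (keys redacted; the property names follow Cloudera's cdh-twitter-example, so treat the exact values as illustrative):
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
# Custom Twitter source from the Cloudera example; it reads the stream via twitter4j
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <redacted>
TwitterAgent.sources.Twitter.consumerSecret = <redacted>
TwitterAgent.sources.Twitter.accessToken = <redacted>
TwitterAgent.sources.Twitter.accessTokenSecret = <redacted>
TwitterAgent.sources.Twitter.keywords = hadoop, bigdata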
My pom.xml contains:
<dependency>
    <groupId>org.twitter4j</groupId>
    <artifactId>twitter4j-stream</artifactId>
    <version>3.0.5</version>
</dependency>
So the problem described here shouldn't apply.
I have also synced the system clock with:
sudo ntpdate pool.ntp.org
Does anybody have an idea of what could be wrong?
Thank you very much in advance for any suggestions and help.
Try upgrading to Twitter4J 3.0.6. I resolved a similar issue by upgrading to 3.0.6.
Update:
It's caused by an invalid consumer key/secret or access token/secret; also make sure the system clock is in sync.
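In the pom.xml that builds the Flume Twitter source, that is just a version bump of the dependency already quoted in the question:
<dependency>
    <groupId>org.twitter4j</groupId>
    <artifactId>twitter4j-stream</artifactId>
    <version>3.0.6</version>
</dependency>
Rebuild the custom source jar and redeploy it to the Flume plugins directory so the agent actually picks up the new version.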
I got the error Failed to instantiate Partitioner class when I try to create an S3 source connector. What was done:
Installed confluent-hub and confluentinc/kafka-connect-s3-source, and exported CLASSPATH (1.0.1 is the latest version).
$ confluent-hub install --no-prompt confluentinc/kafka-connect-s3-source:1.0.1
$ export CLASSPATH=/connector/share/confluent-hub-components/confluentinc-kafka-connect-s3-source/lib/*
Connector settings are the defaults from the documentation (connector.properties):
name=s3-source
tasks.max=1
connector.class=io.confluent.connect.s3.source.S3SourceConnector
s3.bucket.name=confluent-kafka-connect-s3-testing
format.class=io.confluent.connect.s3.format.avro.AvroFormat
confluent.license=
confluent.topic.bootstrap.servers=kafka:9092
confluent.topic.replication.factor=1
transforms=AddPrefix
transforms.AddPrefix.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.AddPrefix.regex=.*
transforms.AddPrefix.replacement=copy_of_$0
Detailed error
$ connect-standalone.sh worker.properties connector.properties
[2019-10-16 12:36:02,410] INFO Kafka version: 2.3.0 (org.apache.kafka.common.utils.AppInfoParser:117)
[2019-10-16 12:36:02,411] INFO Kafka commitId: fc1aaa116b661c8a (org.apache.kafka.common.utils.AppInfoParser:118)
[2019-10-16 12:36:02,412] INFO Kafka startTimeMs: 1571229362410 (org.apache.kafka.common.utils.AppInfoParser:119)
[2019-10-16 12:36:02,675] INFO License for single cluster, single node (io.confluent.license.LicenseManager:417)
[2019-10-16 12:36:02,683] INFO Closing License Store (io.confluent.license.LicenseStore:197)
[2019-10-16 12:36:02,683] INFO Stopping KafkaBasedLog for topic _confluent-command (org.apache.kafka.connect.util.KafkaBasedLog:164)
[2019-10-16 12:36:02,686] INFO [Producer clientId=s3-source-license-manager] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer:1153)
[2019-10-16 12:36:02,701] INFO Stopped KafkaBasedLog for topic _confluent-command (org.apache.kafka.connect.util.KafkaBasedLog:190)
[2019-10-16 12:36:02,702] INFO Closed License Store (io.confluent.license.LicenseStore:199)
[2019-10-16 12:36:02,704] ERROR WorkerConnector{id=s3-source} Error while starting connector (org.apache.kafka.connect.runtime.WorkerConnector:119)
org.apache.kafka.connect.errors.ConnectException: Failed to instantiate Partitioner class
at io.confluent.connect.s3.source.S3SourceConnectorConfig.getPartitioner(S3SourceConnectorConfig.java:612)
at io.confluent.connect.s3.source.S3SourceConnector.doStart(S3SourceConnector.java:94)
at io.confluent.connect.s3.source.S3SourceConnector.start(S3SourceConnector.java:86)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:111)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:136)
at org.apache.kafka.connect.runtime.WorkerConnector.transitionTo(WorkerConnector.java:196)
at org.apache.kafka.connect.runtime.Worker.startConnector(Worker.java:252)
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.startConnector(StandaloneHerder.java:293)
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:209)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:115)
I am unfamiliar with Java, but I am now trying to look into the source code inside the jars. Any help would be appreciated; thanks in advance.
It sounds like your installation is a bit screwy. To run Kafka Connect under Docker you should use a dedicated image such as confluentinc/cp-kafka-connect.
To see an example of Kafka Connect deployed with Docker, have a look at http://rmoff.dev/bbuzz19_demo-code and the accompanying talk and slides.
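As a rough sketch (the image tag, topic names, and converters are illustrative, not prescriptive), a single Connect worker from that image can be started like this, with the connector found via plugin.path rather than an exported CLASSPATH:
docker run -d --name kafka-connect \
  -e CONNECT_BOOTSTRAP_SERVERS=kafka:9092 \
  -e CONNECT_REST_ADVERTISED_HOST_NAME=kafka-connect \
  -e CONNECT_GROUP_ID=connect-cluster \
  -e CONNECT_CONFIG_STORAGE_TOPIC=_connect-configs \
  -e CONNECT_OFFSET_STORAGE_TOPIC=_connect-offsets \
  -e CONNECT_STATUS_STORAGE_TOPIC=_connect-status \
  -e CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR=1 \
  -e CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR=1 \
  -e CONNECT_STATUS_STORAGE_REPLICATION_FACTOR=1 \
  -e CONNECT_KEY_CONVERTER=org.apache.kafka.connect.json.JsonConverter \
  -e CONNECT_VALUE_CONVERTER=org.apache.kafka.connect.json.JsonConverter \
  -e CONNECT_PLUGIN_PATH=/usr/share/java,/usr/share/confluent-hub-components \
  confluentinc/cp-kafka-connect:5.3.1
Then run confluent-hub install inside the container (or bake it into your own image) so the S3 source connector lands under one of the plugin.path directories, and the worker will load it from there.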
I'm only attempting to launch the NiFi UI as a local instance to start playing with it. I've unzipped the package and made sure to set the JAVA_HOME variable to my Java 1.8. When I try to run bin/run-nifi, the error message in my nifi-app log is:
2018-05-03 15:03:50,585 INFO [main] org.apache.nifi.NiFi Launching NiFi...
2018-05-03 15:03:52,330 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties path to be 'Z:\DoE\LOCAL-~1\NIFI-1~1.0\.\conf\nifi.properties'
2018-05-03 15:03:52,363 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Loaded 146 properties from Z:\DoE\LOCAL-~1\NIFI-1~1.0\.\conf\nifi.properties
2018-05-03 15:03:52,423 INFO [main] org.apache.nifi.NiFi Loaded 146 properties
2018-05-03 15:03:52,779 INFO [main] org.apache.nifi.BootstrapListener Started Bootstrap Listener, Listening for incoming requests on port 64802
2018-05-03 15:03:53,071 INFO [main] org.apache.nifi.BootstrapListener Successfully initiated communication with Bootstrap
2018-05-03 15:03:53,181 WARN [main] org.apache.nifi.nar.NarUnpacker Unable to load NAR library bundles due to java.io.IOException: Z:\DoE\LOCAL-~1\NIFI-1~1.0\.\work\nar\framework directory does not have read/write privilege Will proceed without loading any further Nar bundles
2018-05-03 15:03:53,242 ERROR [main] org.apache.nifi.NiFi Failure to launch NiFi due to java.io.IOException: Z:\DoE\LOCAL-~1\NIFI-1~1.0\.\work\nar\framework could not be created
java.io.IOException: Z:\DoE\LOCAL-~1\NIFI-1~1.0\.\work\nar\framework could not be created
at org.apache.nifi.util.FileUtils.ensureDirectoryExistAndCanReadAndWrite(FileUtils.java:48)
at org.apache.nifi.nar.NarClassLoaders.load(NarClassLoaders.java:155)
at org.apache.nifi.nar.NarClassLoaders.init(NarClassLoaders.java:131)
at org.apache.nifi.NiFi.<init>(NiFi.java:133)
at org.apache.nifi.NiFi.<init>(NiFi.java:71)
at org.apache.nifi.NiFi.main(NiFi.java:292)
2018-05-03 15:03:53,383 INFO [Thread-1] org.apache.nifi.NiFi Initiating shutdown of Jetty web server...
2018-05-03 15:03:53,387 INFO [Thread-1] org.apache.nifi.NiFi Jetty web server shutdown completed (nicely or otherwise).
I've followed the installation instructions and haven't been able to troubleshoot this. How do I get these NAR files to load when running NiFi?
Thanks
I believe the underlying error in your output is java.io.IOException: Z:\DoE\LOCAL-~1\NIFI-1~1.0\.\work\nar\framework could not be created.
NiFi requires permission to create and write to several directories; there is a list in the NiFi Admin Guide: How to install and start NiFi. NiFi needs these to unpack the NAR files, write logs, and maintain the various data repositories that make up your data flow.
You have a few options:
Modify the permissions of the directory to allow NiFi read/write access. This can be done for each individual child directory.
Copy the entire NiFi distribution to a read/write location and run it from there.
Edit the conf/nifi.properties file to change the locations of these directories to read/write locations (see the sketch after this list). See NiFi Admin Guide: System Properties for help on the properties.
Symlinks are a great solution for systems that support symlinks.
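For example (a sketch, assuming the standard property names from a default install; adjust the paths to a location your user can write to), the third option amounts to editing conf/nifi.properties along these lines:
# conf/nifi.properties - point the writable working directories and repositories
# at a location you own (the paths below are illustrative)
nifi.nar.working.directory=/opt/nifi-writable/work/nar/
nifi.documentation.working.directory=/opt/nifi-writable/work/docs/components
nifi.database.directory=/opt/nifi-writable/database_repository
nifi.flowfile.repository.directory=/opt/nifi-writable/flowfile_repository
nifi.content.repository.directory.default=/opt/nifi-writable/content_repository
nifi.provenance.repository.directory.default=/opt/nifi-writable/provenance_repository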
Two things you can try:
Run NiFi with administrator privileges (not a good practice) by going to ~\<NIFI_INSTALLATION_DIR>\bin, right-clicking run-nifi.bat, and clicking Run as Administrator.
Move the NiFi directory to a location the logged-in user has full access to, e.g. C:\Users\<YOUR_USER>\Documents\. Then try executing bin\run-nifi.bat.
Similar to the resolution that James proposed, I had to follow the three-step process below.
My scenario: I'm using Docker containers and had the same problem. Even changing the user of my container to root didn't work. So, I did the following:
1 - Download MiNiFi: https://nifi.apache.org/minifi/download.html
2 - Untar and run the MiNiFi agent on my own laptop (I'm using a Mac) so that the necessary folders and files get created.
3 - Tar it up again and add it to the Dockerfile that builds my container (see the sketch below).
Done! Everything worked fine after that.
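For completeness, the Dockerfile fragment I ended up with looked roughly like this (the base image, archive name, and paths are specific to my setup, so treat them as placeholders):
FROM openjdk:8-jre
# minifi-prepared.tar.gz is the archive re-created in step 3, with the work/ and conf/
# directories already populated from the local run; ADD extracts it into the image
ADD minifi-prepared.tar.gz /opt/minifi/
WORKDIR /opt/minifi
CMD ["./bin/minifi.sh", "run"]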
I am running a Spark Streaming application that simply reads messages from a Kafka topic, enriches them, and then writes the enriched messages to another Kafka topic.
I already tried it in Standalone mode (both client and cluster deploy mode) and in YARN client mode, successfully.
When I submit the application in YARN cluster mode, it gives me the following messages:
18/01/10 12:13:34 INFO Client: Submitting application application_1515582681419_0001 to ResourceManager
18/01/10 12:13:34 INFO YarnClientImpl: Submitted application application_1515582681419_0001
18/01/10 12:13:35 INFO Client: Application report for application_1515582681419_0001 (state: ACCEPTED)
18/01/10 12:13:35 INFO Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1515582814080
final status: UNDEFINED
tracking URL: http://ambari1.internal:8088/proxy/application_1515582681419_0001/
user: root
18/01/10 12:13:36 INFO Client: Application report for application_1515582681419_0001 (state: ACCEPTED)
18/01/10 12:13:37 INFO Client: Application report for application_1515582681419_0001 (state: ACCEPTED)
It stays stuck in ACCEPTED status for around 4-5 minutes, then exits with the following error message:
18/01/10 12:17:00 INFO InputInfoTracker: remove old batch metadata: 1515583000000 ms
18/01/10 12:17:02 ERROR ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:423)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:282)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:768)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:766)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
18/01/10 12:17:02 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds])
18/01/10 12:17:02 INFO StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook
18/01/10 12:17:02 INFO ReceiverTracker: ReceiverTracker stopped
18/01/10 12:17:02 INFO JobGenerator: Stopping JobGenerator immediately
Funny fact: if I visit the application's page, I can see that the Spark context has been started and it processes some messages.
Could anyone help me on this?
PS: these are the resources of my YARN cluster (screenshot not reproduced here).
The problem might be with the YARN "App Timeline Server". Try restarting it.
Are you creating your Spark session with the master set to local? Please check this.
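If it does, remove the hardcoded master (e.g. a setMaster("local[*]") on the SparkConf) and let spark-submit supply it; a sketch of the submit command, with placeholder class and jar names:
# --master/--deploy-mode should come from spark-submit, not from setMaster(...) in the code
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.EnrichmentStreamingJob \
  enrichment-streaming.jar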
I am trying to kerberize an AWS EMR cluster. I have enabled Hadoop security, created the Kerberos principals, and deployed them on all the nodes.
However, when I start the NameNode using the command 'sudo start hadoop-hdfs-namenode', the following exception is thrown:
2016-06-08 06:14:06,515 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor (main): Number of failed storage changes from 0 to 0
2016-06-08 06:14:06,515 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager (org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor#ac4860): Updating block keys
2016-06-08 06:14:06,544 INFO org.apache.hadoop.ipc.Server (IPC Server Responder): IPC Server Responder: starting
2016-06-08 06:14:06,544 INFO org.apache.hadoop.ipc.Server (IPC Server listener on 8020): IPC Server listener on 8020: starting
2016-06-08 06:14:06,560 INFO org.apache.hadoop.hdfs.server.namenode.NameNode (main): NameNode RPC up at: ip-172-31-21-213.ap-southeast-1.compute.internal/172.31.21.213:8020
2016-06-08 06:14:06,560 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem (main): Starting services required for active state
2016-06-08 06:14:06,564 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor(443740501)): Starting CacheReplicationMonitor with interval 30000 milliseconds
2016-06-08 06:14:06,763 INFO org.apache.hadoop.ipc.Server (Socket Reader #1 for port 8020): Socket Reader #1 for port 8020: readAndProcess from client 172.31.21.213 threw exception [org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]]
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Server$Connection.initializeAuthContext(Server.java:1564)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1520)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:771)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:637)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:608)
Kindly help me in this regard. Thanks in advance.
The client doesn't think security is enabled; it's only trying to use "SIMPLE" auth (the caller is whoever they say they are). The server will only accept Kerberos tickets or Hadoop delegation tokens previously acquired by a caller with a valid Kerberos ticket.
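If it helps as a checklist, the switch that tells Hadoop clients to use Kerberos instead of SIMPLE lives in core-site.xml on every node; a minimal sketch (standard Hadoop property names, but verify them against your EMR release):
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
If that is already in place everywhere, also check that the process connecting to port 8020 (in your log it is 172.31.21.213, i.e. the NameNode host itself) has been restarted since the change and has a valid ticket or keytab.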
I have installed Neo4j 2.3.2 Community Edition on Mac OS 10.10. I can launch the application and connect to it from localhost:7474/browser/. So far, so good.
I would like to launch Neo4j 2.3.2 from a Terminal window, so that I don't have the overhead of a windowed application running at the same time. When I run the following command...
$ ~/neo4j/bin/neo4j console
... I get this output in the Terminal window:
WARNING: Max 256 open files allowed, minimum of 40 000 recommended. See the Neo4j manual.
Starting Neo4j Server console-mode...
Unable to find any JVMs matching version "1.7".
Using additional JVM arguments: -server -XX:+DisableExplicitGC -Dorg.neo4j.server.properties=conf/neo4j-server.properties -Djava.util.logging.config.file=conf/logging.properties -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:-OmitStackTraceInFastThrow -XX:hashCode=5 -Dneo4j.ext.udc.source=tarball
2016-02-25 14:03:18.755+0000 INFO [API] Setting startup timeout to: 120000ms based on 120000
2016-02-25 14:03:58.356+0000 INFO [API] Successfully started database
2016-02-25 14:04:04.220+0000 INFO [API] Starting HTTP on port :7474 with 2 threads available
2016-02-25 14:04:13.512+0000 INFO [API] Enabling HTTPS on port :7473
09:04:20.201 [main] INFO org.eclipse.jetty.util.log - Logging initialized #98517ms
2016-02-25 14:04:23.034+0000 INFO [API] Mounting static content at [/webadmin] from [webadmin-html]
2016-02-25 14:04:25.785+0000 INFO [API] Mounting static content at [/browser] from [browser]
09:04:25.993 [main] INFO org.eclipse.jetty.server.Server - jetty-9.2.4.v20141103
09:04:26.722 [main] INFO o.e.j.server.handler.ContextHandler - Started o.e.j.s.h.MovedContextHandler#1611ba2{/,null,AVAILABLE}
09:04:27.794 [main] INFO o.e.j.w.StandardDescriptorProcessor - NO JSP Support for /webadmin, did not find org.apache.jasper.servlet.JspServlet
09:04:27.981 [main] INFO o.e.j.server.handler.ContextHandler - Started o.e.j.w.WebAppContext#132ea25{/webadmin,jar:file:/Users/james/neo4j/system/lib/neo4j-server-2.2.5-static-web.jar!/webadmin-html,AVAILABLE}
09:04:38.841 [main] INFO o.e.j.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler#60bfaa02{/db/manage,null,AVAILABLE}
09:04:39.326 [main] INFO o.e.j.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler#28e2e149{/db/data,null,AVAILABLE}
09:04:39.353 [main] INFO o.e.j.w.StandardDescriptorProcessor - NO JSP Support for /browser, did not find org.apache.jasper.servlet.JspServlet
09:04:39.355 [main] INFO o.e.j.server.handler.ContextHandler - Started o.e.j.w.WebAppContext#78e6aa71{/browser,jar:file:/Users/james/neo4j/system/lib/neo4j-browser-2.2.5.jar!/browser,AVAILABLE}
09:04:39.536 [main] INFO o.e.j.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler#4994d9ab{/,null,AVAILABLE}
09:04:39.745 [main] INFO o.e.jetty.server.ServerConnector - Started ServerConnector#2d19cf20{HTTP/1.1}{localhost:7474}
09:04:40.576 [main] INFO o.e.jetty.server.ServerConnector - Started ServerConnector#43c742c{SSL-HTTP/1.1}{localhost:7473}
09:04:40.577 [main] INFO org.eclipse.jetty.server.Server - Started #119058ms
2016-02-25 14:04:40.577+0000 INFO [API] Server started on: http://localhost:7474/
2016-02-25 14:04:40.590+0000 INFO [API] Remote interface ready and available at [http://localhost:7474/]
I have Java version 8, update 74 installed (build 1.8.0_74-b02), so I assume that I can ignore the warning Unable to find any JVMs matching version "1.7".
However, when I visit http://localhost:7474/ in Chrome Version 45.0.2454.85 (64-bit), I see three errors in the Developer Console: two files that fail to load and a subsequent script error.
localhost/:28 GET http://localhost:7474/browser/styles/68eddd94.main.css
localhost/:466 GET http://localhost:7474/browser/scripts/ded362b3.scripts.js
Uncaught Error: [$injector:modulerr] Failed to instantiate module neo4jApp due to:
Error: [$injector:nomod] Module 'neo4jApp' is not available! You either misspelled the module name or forgot to load it. If registering a module ensure that you specify the dependencies as the second argument.
As a result, the Neo4j interface does not appear in the browser window.
Is it possible to run Neo4j 2.3.2 from the Terminal, and if so, what do I need to do to get http://localhost:7474/ to load correctly?
Shift-reload, or test in an incognito window.
Looks like a JS file mismatch due to aggressive browser caching.