Getting error in Twitter streaming through Flume - Hadoop

While running the Flume command I am getting the following error. I tried changing the environment variables in .bashrc along with the classpath in flume-env.sh, but still no use:
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
16/12/08 01:57:11 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
16/12/08 01:57:11 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:../conf/twitter.conf
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.hdfs.path
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.hdfs.path = hdfs://localhost:8020/datamain/tweets
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.hdfs.writeFormat
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.hdfs.writeFormat = Text
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.hdfs.rollCount
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.hdfs.rollCount = 10000
16/12/08 01:57:11 INFO conf.FlumeConfiguration: Added sinks: HDFS Agent: TwitterAgent
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.hdfs.rollSize
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.hdfs.rollSize = 0
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.channels
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.channels = MemChannel
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.hdfs.batchSize
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.hdfs.batchSize = 1000
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.hdfs.fileType
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.hdfs.fileType = DataStream
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.type
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.type = hdfs
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.hdfs.rollInterval
16/12/08 01:57:11 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.hdfs.rollInterval = 600
16/12/08 01:57:11 WARN conf.FlumeConfiguration: no context for sinkHDFS
16/12/08 01:57:12 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [TwitterAgent]
16/12/08 01:57:12 INFO node.AbstractConfigurationProvider: Creating channels
16/12/08 01:57:12 INFO channel.DefaultChannelFactory: Creating instance of channel MemChannel type memory
16/12/08 01:57:12 INFO node.AbstractConfigurationProvider: Created channel MemChannel
16/12/08 01:57:12 INFO source.DefaultSourceFactory: Creating instance of source Twitter, type org.apache.flume.source.twitter.TwitterSource
16/12/08 01:57:12 ERROR node.PollingPropertiesFileConfigurationProvider: Failed to load configuration data. Exception follows.
org.apache.flume.FlumeException: Unable to load source type: org.apache.flume.source.twitter.TwitterSource, class: org.apache.flume.source.twitter.TwitterSource
at org.apache.flume.source.DefaultSourceFactory.getClass(DefaultSourceFactory.java:67)
at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:40)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:327)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.flume.source.twitter.TwitterSource
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:195)
at org.apache.flume.source.DefaultSourceFactory.getClass(DefaultSourceFactory.java:65)
... 11 more
It also seems like there is a problem with my sink configuration in twitter.conf, but I am unable to figure it out. Below is the twitter.conf file:
TwitterAgent.sources = Twitter
TwitterAgent.channels =MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = ###
TwitterAgent.sources.Twitter.consumerSecret = ###
TwitterAgent.sources.Twitter.accessToken = ###
TwitterAgent.sources.Twitter.accessTokenSecret = ####
TwitterAgent.sources.Twitter.keywords =donald trump,republican,democratic
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 10000
TwitterAgent.sink.HDFS.channels = MemChannel
TwitterAgent.sink.HDFS.type = hdfs
TwitterAgent.sink.HDFS.hdfs.path =hdfs://localhost:8020/datamain/tweets
TwitterAgent.sink.HDFS.hdfs.fileType = DataStream
TwitterAgent.sink.HDFS.hdfs.writeFormat = Text
TwitterAgent.sink.HDFS.hdfs.batchSize = 1000
TwitterAgent.sink.HDFS.hdfs.rollSize = 0
TwitterAgent.sink.HDFS.hdfs.rollCount = 10000
TwitterAgent.sink.HDFS.hdfs.rollInterval = 600
Here are my flume-env.sh file details:
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# If this file is placed at FLUME_CONF_DIR/flume-env.sh, it will be sourced
# during Flume startup.
# Enviroment variables can be set here.
JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
JAVA_OPTS="-Xms500m -Xmx1000m -Dcom.sun.management.jmxremote"
# Note that the Flume conf directory is always included in the classpath.
FLUME_CLASSPATH=/home/user/hadoop_store/flume-sources-1.0-SNAPSHOT.jar
Here are my .bashrc details:
##Flume home directory
export FLUME_HOME=/home/user/hadoop_store/apache-flume-1.4.0-bin
export FLUME_CONF_DIR=$FLUME_HOME/conf
export FLUME_CLASSPATH=$FLUME_CONF_DIR
export PATH=$FLUME_HOME/bin:$PATH
##Flume home directory
I tried changing the snapshot file path, but it still didn't work.

Looking at your log messages, you are missing two things:
You are using the org.apache.flume.source.twitter.TwitterSource class in your configuration, so flume-sources-1.0-SNAPSHOT.jar must be present in your Flume lib directory.
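If you are not sure whether that class is actually on Flume's classpath, a quick check along these lines (using the jar path and $FLUME_HOME from your flume-env.sh and .bashrc; adjust to your install) can confirm it before restarting the agent:
# Copy the jar into Flume's lib directory so the default classpath picks it up
cp /home/user/hadoop_store/flume-sources-1.0-SNAPSHOT.jar $FLUME_HOME/lib/
# List the jar contents and look for a TwitterSource class
jar tf $FLUME_HOME/lib/flume-sources-1.0-SNAPSHOT.jar | grep -i twittersource
Whatever fully qualified class name the grep prints is the value your TwitterAgent.sources.Twitter.type line must match.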
Replace the sink keyword with sinks, as below:
TwitterAgent.sinks.HDFS.channels = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path =hdfs://localhost:8020/datamain/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
Hope this helps!
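With the jar in place and the sinks keyword fixed, re-running the agent with something like the following (conf paths assumed from your .bashrc; the log shows the file currently being loaded from ../conf/twitter.conf) should get past the source-loading error:
$FLUME_HOME/bin/flume-ng agent --conf $FLUME_CONF_DIR --conf-file $FLUME_CONF_DIR/twitter.conf --name TwitterAgent -Dflume.root.logger=INFO,console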

Related

Hadoop's ResourceManager fails to start when started through Ansible

I am having trouble starting YARN (via ./start-yarn.sh) in pseudo-distributed mode when doing it through Ansible.
This is the part of my playbook that starts YARN (where hadoop_home is /home/hdoop/hadoop):
- name: hdoop > Start YARN
  become: true
  become_user: hdoop
  environment:
    HADOOP_HOME: "{{ hadoop_home }}"
    HADOOP_HDFS_HOME: "{{ hadoop_home }}"
    HADOOP_CONF_DIR: "{{ hadoop_home }}/etc/hadoop"
    HADOOP_YARN_HOME: "{{ hadoop_home }}"
  shell:
    cmd: "{{ hadoop_home }}/sbin/start-yarn.sh"
    executable: /bin/bash
Here is my yarn-site.xml file:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>0.0.0.0:8088</value>
  </property>
  <property>
    <name>yarn.acl.enable</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
This results in the NodeManager starting but not the ResourceManager. I get the following error:
STARTUP_MSG: build = Unknown -r 7a3bc90b05f257c8ace2f76d74264906f0f7a932; compiled by 'hexiaoqiao' on 2021-01-03T09:26Z
STARTUP_MSG: java = 1.8.0_282
************************************************************/
2021-02-20 18:27:03,040 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT]
2021-02-20 18:27:03,632 INFO org.apache.hadoop.conf.Configuration: found resource core-site.xml at file:/home/hdoop/hadoop/etc/hadoop/core-site.xml
2021-02-20 18:27:03,736 INFO org.apache.hadoop.conf.Configuration: resource-types.xml not found
2021-02-20 18:27:03,736 INFO org.apache.hadoop.yarn.util.resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-02-20 18:27:03,799 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/home/hdoop/hadoop/etc/hadoop/yarn-site.xml
2021-02-20 18:27:03,808 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher
2021-02-20 18:27:03,897 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: NMTokenKeyRollingInterval: 86400000ms and NMTokenKeyActivationDelay: 900000ms
2021-02-20 18:27:03,905 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: ContainerTokenKeyRollingInterval: 86400000ms and ContainerTokenKeyActivationDelay: 900000ms
2021-02-20 18:27:03,917 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: AMRMTokenKeyRollingInterval: 86400000ms and AMRMTokenKeyActivationDelay: 900000 ms
2021-02-20 18:27:03,965 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
2021-02-20 18:27:03,970 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.NodesListManager
2021-02-20 18:27:03,970 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Using Scheduler: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
2021-02-20 18:27:04,002 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType for class org.apache.hadoop.yarn.event.EventDispatcher
2021-02-20 18:27:04,003 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher
2021-02-20 18:27:04,005 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher
2021-02-20 18:27:04,005 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher
2021-02-20 18:27:04,081 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2021-02-20 18:27:04,231 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2021-02-20 18:27:04,231 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system started
2021-02-20 18:27:04,245 INFO org.apache.hadoop.yarn.security.YarnAuthorizationProvider: org.apache.hadoop.yarn.security.ConfiguredYarnAuthorizer is instantiated.
2021-02-20 18:27:04,248 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.RMAppManager
2021-02-20 18:27:04,254 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType for class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher
2021-02-20 18:27:04,255 INFO org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo: Registered RMNMInfo MBean
2021-02-20 18:27:04,256 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor: Application lifelime monitor interval set to 3000 ms.
2021-02-20 18:27:04,261 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.MultiNodeSortingManager: Initializing NodeSortingService=MultiNodeSortingManager
2021-02-20 18:27:04,262 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2021-02-20 18:27:04,285 INFO org.apache.hadoop.conf.Configuration: found resource capacity-scheduler.xml at file:/home/hdoop/hadoop/etc/hadoop/capacity-scheduler.xml
2021-02-20 18:27:04,293 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Minimum allocation = <memory:1024, vCores:1>
2021-02-20 18:27:04,293 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Maximum allocation = <memory:8192, vCores:4>
2021-02-20 18:27:04,382 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration: max alloc mb per queue for root is undefined
2021-02-20 18:27:04,383 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration: max alloc vcore per queue for root is undefined
2021-02-20 18:27:04,407 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: root, capacity=1.0, absoluteCapacity=1.0, maxCapacity=1.0, absoluteMaxCapacity=1.0, state=RUNNING, acls=SUBMIT_APP:*ADMINISTER_QUEUE:*, labels=*,
, reservationsContinueLooking=true, orderingPolicy=utilization, priority=0
2021-02-20 18:27:04,407 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Initialized parent-queue root name=root, fullname=root
2021-02-20 18:27:04,440 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration: max alloc mb per queue for root.default is undefined
2021-02-20 18:27:04,440 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration: max alloc vcore per queue for root.default is undefined
2021-02-20 18:27:04,444 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Initializing default
capacity = 1.0 [= (float) configuredCapacity / 100 ]
absoluteCapacity = 1.0 [= parentAbsoluteCapacity * capacity ]
maxCapacity = 1.0 [= configuredMaxCapacity ]
absoluteMaxCapacity = 1.0 [= 1.0 maximumCapacity undefined, (parentAbsoluteMaxCapacity * maximumCapacity) / 100 otherwise ]
effectiveMinResource=<memory:0, vCores:0>
, effectiveMaxResource=<memory:0, vCores:0>
userLimit = 100 [= configuredUserLimit ]
userLimitFactor = 1.0 [= configuredUserLimitFactor ]
maxApplications = 10000 [= configuredMaximumSystemApplicationsPerQueue or (int)(configuredMaximumSystemApplications * absoluteCapacity)]
maxApplicationsPerUser = 10000 [= (int)(maxApplications * (userLimit / 100.0f) * userLimitFactor) ]
usedCapacity = 0.0 [= usedResourcesMemory / (clusterResourceMemory * absoluteCapacity)]
absoluteUsedCapacity = 0.0 [= usedResourcesMemory / clusterResourceMemory]
maxAMResourcePerQueuePercent = 0.1 [= configuredMaximumAMResourcePercent ]
minimumAllocationFactor = 0.875 [= (float)(maximumAllocationMemory - minimumAllocationMemory) / maximumAllocationMemory ]
maximumAllocation = <memory:8192, vCores:4> [= configuredMaxAllocation ]
numContainers = 0 [= currentNumContainers ]
state = RUNNING [= configuredState ]
acls = SUBMIT_APP:*ADMINISTER_QUEUE:* [= configuredAcls ]
nodeLocalityDelay = 40
rackLocalityAdditionalDelay = -1
labels=*,
reservationsContinueLooking = true
preemptionDisabled = true
defaultAppPriorityPerQueue = 0
priority = 0
maxLifetime = -1 seconds
defaultLifetime = -1 seconds
2021-02-20 18:27:04,445 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager: Initialized queue: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0, effectiveMinResource=<memory:0, vCores:0> , effectiveMaxResource=<memory:0, vCores:0>
2021-02-20 18:27:04,445 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager: Initialized queue: root: numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
2021-02-20 18:27:04,452 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager: Initialized root queue root: numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
2021-02-20 18:27:04,452 INFO org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule: Initialized queue mappings, override: false
2021-02-20 18:27:04,452 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.WorkflowPriorityMappingsManager: Initialized workflow priority mappings, override: false
2021-02-20 18:27:04,453 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.MultiNodeSortingManager: MultiNode scheduling is 'false', and configured policies are
2021-02-20 18:27:04,454 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized CapacityScheduler with calculator=class org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, minimumAllocation=<<memory:1024, vCores:1>>, maximumAllocation=<<memory:8192, vCores:4>>, asynchronousScheduling=false, asyncScheduleInterval=5ms,multiNodePlacementEnabled=false
2021-02-20 18:27:04,476 INFO org.apache.hadoop.conf.Configuration: dynamic-resources.xml not found
2021-02-20 18:27:04,480 INFO org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain: Initializing AMS Processing chain. Root Processor=[org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor].
2021-02-20 18:27:04,480 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: disabled placement handler will be used, all scheduling requests will be rejected.
2021-02-20 18:27:04,481 INFO org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain: Adding [org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor] tp top of AMS Processing chain.
2021-02-20 18:27:04,502 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: TimelineServicePublisher is not configured
2021-02-20 18:27:04,584 INFO org.eclipse.jetty.util.log: Logging initialized #2275ms to org.eclipse.jetty.util.log.Slf4jLog
2021-02-20 18:27:04,943 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2021-02-20 18:27:04,980 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.resourcemanager is not defined
2021-02-20 18:27:05,006 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2021-02-20 18:27:05,018 INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context cluster
2021-02-20 18:27:05,018 INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context static
2021-02-20 18:27:05,018 INFO org.apache.hadoop.http.HttpServer2: Added filter RMAuthenticationFilter (class=org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter) to context logs
2021-02-20 18:27:05,018 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context cluster
2021-02-20 18:27:05,018 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2021-02-20 18:27:05,018 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2021-02-20 18:27:05,021 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /cluster/*
2021-02-20 18:27:05,021 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2021-02-20 18:27:05,021 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /app/*
2021-02-20 18:27:06,588 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2021-02-20 18:27:06,625 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 8088
2021-02-20 18:27:06,628 INFO org.eclipse.jetty.server.Server: jetty-9.4.20.v20190813; built: 2019-08-13T21:28:18.144Z; git: 84700530e645e812b336747464d6fbbf370c9a20; jvm 1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
2021-02-20 18:27:06,716 INFO org.eclipse.jetty.server.session: DefaultSessionIdManager workerName=node0
2021-02-20 18:27:06,716 INFO org.eclipse.jetty.server.session: No SessionScavenger set, using defaults
2021-02-20 18:27:06,721 INFO org.eclipse.jetty.server.session: node0 Scavenging every 660000ms
2021-02-20 18:27:06,763 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2021-02-20 18:27:06,798 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2021-02-20 18:27:06,819 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2021-02-20 18:27:06,836 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2021-02-20 18:27:06,906 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler#6cb6decd{logs,/logs,file:///home/hdoop/hadoop/logs/,AVAILABLE}
2021-02-20 18:27:06,915 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler#30bce90b{static,/static,jar:file:/home/hdoop/hadoop/share/hadoop/yarn/hadoop-yarn-common-3.2.2.jar!/webapps/static,AVAILABLE}
2021-02-20 18:27:06,935 WARN org.eclipse.jetty.webapp.WebInfConfiguration: Can't generate resourceBase as part of webapp tmp dir name: java.lang.NullPointerException
2021-02-20 18:27:07,382 INFO org.eclipse.jetty.util.TypeUtil: JVM Runtime does not support Modules
2021-02-20 18:27:12,831 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 1: SIGHUP
2021-02-20 18:27:15,009 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.w.WebAppContext#4016ccc1{cluster,/,file:///tmp/jetty-0_0_0_0-8088-_-any-3811097377054804183.dir/webapp/,AVAILABLE}{jar:file:/home/hdoop/hadoop/share/hadoop/yarn/hadoop-yarn-common-3.2.2.jar!/webapps/cluster}
2021-02-20 18:27:15,048 INFO org.eclipse.jetty.server.AbstractConnector: Started ServerConnector#1bd39d3c{HTTP/1.1,[http/1.1]}{0.0.0.0:8088}
2021-02-20 18:27:15,048 INFO org.eclipse.jetty.server.Server: Started #12739ms
2021-02-20 18:27:15,048 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app cluster started at 8088
2021-02-20 18:27:15,277 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue, queueCapacity: 100, scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
2021-02-20 18:27:15,327 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8033
2021-02-20 18:27:15,771 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB to the server
2021-02-20 18:27:15,773 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to active state
2021-02-20 18:27:15,772 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2021-02-20 18:27:15,773 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8033: starting
2021-02-20 18:27:15,818 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating AMRMToken
2021-02-20 18:27:15,821 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: Rolling master-key for container-tokens
2021-02-20 18:27:15,822 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Rolling master-key for nm-tokens
2021-02-20 18:27:15,822 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2021-02-20 18:27:15,822 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing master key with keyID 1
2021-02-20 18:27:15,822 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing RMDTMasterKey.
2021-02-20 18:27:15,833 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2021-02-20 18:27:15,833 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2021-02-20 18:27:15,834 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing master key with keyID 2
2021-02-20 18:27:15,834 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing RMDTMasterKey.
2021-02-20 18:27:15,835 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.nodelabels.event.NodeLabelsStoreEventType for class org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager$ForwardingEventHandler
2021-02-20 18:27:16,227 INFO org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore: Created store directory :file:/tmp/hadoop-yarn-hdoop/node-attribute
2021-02-20 18:27:16,297 INFO org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore: Finished write mirror at:file:/tmp/hadoop-yarn-hdoop/node-attribute/nodeattribute.mirror
2021-02-20 18:27:16,297 INFO org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore: Finished create editlog file at:file:/tmp/hadoop-yarn-hdoop/node-attribute/nodeattribute.editlog
2021-02-20 18:27:16,319 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.nodelabels.NodeAttributesStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.nodelabels.NodeAttributesManagerImpl$ForwardingEventHandler
2021-02-20 18:27:16,322 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.MultiNodeSortingManager: Starting NodeSortingService=MultiNodeSortingManager
2021-02-20 18:27:16,363 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue, queueCapacity: 5000, scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
2021-02-20 18:27:16,364 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8031
2021-02-20 18:27:16,384 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceTrackerPB to the server
2021-02-20 18:27:16,394 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2021-02-20 18:27:16,434 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8031: starting
2021-02-20 18:27:16,455 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
2021-02-20 18:27:16,496 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue, queueCapacity: 5000, scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
2021-02-20 18:27:16,501 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8030
2021-02-20 18:27:16,528 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB to the server
2021-02-20 18:27:16,529 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2021-02-20 18:27:16,529 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8030: starting
2021-02-20 18:27:16,721 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue, queueCapacity: 5000, scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
2021-02-20 18:27:16,722 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8032
2021-02-20 18:27:16,725 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ApplicationClientProtocolPB to the server
2021-02-20 18:27:16,742 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8032: starting
2021-02-20 18:27:16,763 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to active state
2021-02-20 18:27:16,763 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2021-02-20 18:27:16,766 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2021-02-20 18:27:16,769 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.w.WebAppContext#4016ccc1{cluster,/,null,UNAVAILABLE}{jar:file:/home/hdoop/hadoop/share/hadoop/yarn/hadoop-yarn-common-3.2.2.jar!/webapps/cluster}
2021-02-20 18:27:16,785 INFO org.eclipse.jetty.server.AbstractConnector: Stopped ServerConnector#1bd39d3c{HTTP/1.1,[http/1.1]}{0.0.0.0:8088}
2021-02-20 18:27:16,786 INFO org.eclipse.jetty.server.session: node0 Stopped scavenging
2021-02-20 18:27:16,791 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler#30bce90b{static,/static,jar:file:/home/hdoop/hadoop/share/hadoop/yarn/hadoop-yarn-common-3.2.2.jar!/webapps/static,UNAVAILABLE}
2021-02-20 18:27:16,791 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler#6cb6decd{logs,/logs,file:///home/hdoop/hadoop/logs/,UNAVAILABLE}
2021-02-20 18:27:16,795 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
2021-02-20 18:27:16,811 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8032
2021-02-20 18:27:16,821 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2021-02-20 18:27:16,822 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033
2021-02-20 18:27:16,822 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8033
2021-02-20 18:27:16,825 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state
2021-02-20 18:27:16,825 WARN org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher: org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2021-02-20 18:27:16,826 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2021-02-20 18:27:16,827 INFO org.apache.hadoop.ipc.Server: Stopping server on 8030
2021-02-20 18:27:16,839 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8030
2021-02-20 18:27:16,841 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2021-02-20 18:27:16,847 INFO org.apache.hadoop.ipc.Server: Stopping server on 8031
2021-02-20 18:27:16,862 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8031
2021-02-20 18:27:16,863 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2021-02-20 18:27:16,863 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: NMLivelinessMonitor thread interrupted
2021-02-20 18:27:16,868 ERROR org.apache.hadoop.yarn.event.EventDispatcher: Returning, interrupted : java.lang.InterruptedException
2021-02-20 18:27:16,868 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.ActivitiesManager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.ActivitiesManager thread interrupted
2021-02-20 18:27:16,869 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher is draining to stop, ignoring any new events.
2021-02-20 18:27:16,870 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher is draining to stop, ignoring any new events.
2021-02-20 18:27:16,870 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor thread interrupted
2021-02-20 18:27:16,873 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted
2021-02-20 18:27:16,873 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted
2021-02-20 18:27:16,874 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer thread interrupted
2021-02-20 18:27:16,875 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2021-02-20 18:27:16,878 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager metrics system...
2021-02-20 18:27:16,879 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system stopped.
2021-02-20 18:27:16,879 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete.
2021-02-20 18:27:16,879 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher is draining to stop, ignoring any new events.
2021-02-20 18:27:16,881 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to standby state
2021-02-20 18:27:16,882 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at hadoop-1/127.0.1.1
************************************************************/
At this point, a call to jps tells me that ResourceManager was not started:
hdoop@hadoop-1:~/hadoop/logs$ jps
18372 DataNode
18537 SecondaryNameNode
18282 NameNode
19549 Jps
18927 NodeManager
The curious part is that when I start it myself through a terminal, it runs just fine.
Does anyone know what's going on? Or why ResourceManager refuses to start under Ansible?
The yarn executable in $HADOOP_HOME/bin/yarn should be used instead of start-yarn.sh. The RECEIVED SIGNAL 1: SIGHUP line in your ResourceManager log suggests the daemons forked by start-yarn.sh are hung up on when the Ansible shell session exits, whereas yarn --daemon start detaches them properly. E.g.:
- name: Start Yarn Cluster
  command: '{{ item }}'
  with_items:
    - "nohup {{ HADOOP_HOME }}/bin/yarn --daemon start resourcemanager"
    - "nohup {{ HADOOP_HOME }}/bin/yarn --daemon start nodemanager"
  become: yes
  become_method: su
  become_user: ubuntu
  environment: "{{ proxy_env }}"
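As a sanity check, a follow-up task along these lines (only a sketch; user adapted to your setup, hdoop as in the question) can confirm the ResourceManager actually survived the play:
- name: Verify ResourceManager is running
  become: true
  become_user: hdoop
  shell: jps | grep ResourceManager
  changed_when: false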

Flume does not write to HDFS from Kafka topic

I am trying to read from a Kafka topic and store it to HDFS via a Flume sink; the input data is JSON. Following is my config file:
# components name
a1.sources = source1
a1.channels = channel1
a1.sinks = sink1
a1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.source1.zookeeperConnect = server1:port,server2:port,server3:port
a1.sources.source1.kafka.bootstrap.servers = server1:port,server2:port,server3:port
a1.sources.source1.kafka.topics = TOPIC
a1.sources.source1.kafka.consumer.group.id = flume
a1.sources.source1.channels = channel1
a1.sources.source1.interceptors = i1
a1.sources.source1.interceptors.i1.type = timestamp
a1.sources.source1.kafka.consumer.timeout.ms = 100
a1.channels.channel1.type = memory
a1.channels.channel1.capacity = 1
a1.channels.channel1.transactionCapacity = 1
a1.sinks.sink1.type = hdfs
a1.sinks.sink1.hdfs.path = /path/kafka/
a1.sinks.sink1.hdfs.rollInterval = 1
a1.sinks.sink1.hdfs.rollSize = 1
a1.sinks.sink1.hdfs.rollCount = 1
a1.sinks.sink1.hdfs.fileType = DataStream
a1.sinks.sink1.channel = channel1
a1.sinks.sink1.hdfs.idleTimeout = 10
and here is the flume-ng command:
flume-ng agent --conf-file /path/kafka_hdfs_sink.conf --conf /path/flume/conf/ --name a1;
Following are the logs after loading all dependencies:
18/01/30 12:15:24 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
18/01/30 12:15:24 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/home/fk010191/kafka/kafka_hdfs_sink.conf
18/01/30 12:15:24 INFO conf.FlumeConfiguration: Processing:sink1
18/01/30 12:15:24 INFO conf.FlumeConfiguration: Processing:sink1
18/01/30 12:15:24 INFO conf.FlumeConfiguration: Processing:sink1
18/01/30 12:15:24 INFO conf.FlumeConfiguration: Added sinks: sink1 Agent: a1
18/01/30 12:15:24 INFO conf.FlumeConfiguration: Processing:sink1
18/01/30 12:15:24 INFO conf.FlumeConfiguration: Processing:sink1
18/01/30 12:15:24 INFO conf.FlumeConfiguration: Processing:sink1
18/01/30 12:15:24 INFO conf.FlumeConfiguration: Processing:sink1
18/01/30 12:15:24 INFO conf.FlumeConfiguration: Processing:sink1
18/01/30 12:15:24 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]
18/01/30 12:15:24 INFO node.AbstractConfigurationProvider: Creating channels
18/01/30 12:15:24 INFO channel.DefaultChannelFactory: Creating instance of channel channel1 type memory
18/01/30 12:15:24 INFO node.AbstractConfigurationProvider: Created channel channel1
18/01/30 12:15:24 INFO source.DefaultSourceFactory: Creating instance of source source1, type org.apache.flume.source.kafka.KafkaSource
18/01/30 12:15:24 ERROR node.AbstractConfigurationProvider: Source source1 has been removed due to an error during configuration
org.apache.flume.conf.ConfigurationException: Kafka topic must be specified.
at org.apache.flume.source.kafka.KafkaSource.doConfigure(KafkaSource.java:183)
at org.apache.flume.source.BasicSourceSemantics.configure(BasicSourceSemantics.java:65)
at org.apache.flume.source.AbstractPollableSource.configure(AbstractPollableSource.java:63)
at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:331)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18/01/30 12:15:24 INFO sink.DefaultSinkFactory: Creating instance of sink: sink1, type: hdfs
18/01/30 12:15:24 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
18/01/30 12:15:24 INFO node.AbstractConfigurationProvider: Channel channel1 connected to [sink1]
18/01/30 12:15:24 INFO node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{sink1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor#4c601ec4 counterGroup:{ name:null counters:{} } }} channels:{channel1=org.apache.flume.channel.MemoryChannel{name: channel1}} }
18/01/30 12:15:24 INFO node.Application: Starting Channel channel1
18/01/30 12:15:24 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: channel1: Successfully registered new MBean.
18/01/30 12:15:24 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: channel1 started
18/01/30 12:15:24 INFO node.Application: Starting Sink sink1
18/01/30 12:15:24 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: sink1: Successfully registered new MBean.
18/01/30 12:15:24 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: sink1 started
When I stop it after a while, I see the output below, but it does not write anything to HDFS:
^C18/01/30 12:21:16 INFO lifecycle.LifecycleSupervisor: Stopping lifecycle supervisor 23
18/01/30 12:21:16 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider stopping
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: sink1 stopped
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: sink1. sink.start.time == 1517336124877
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: sink1. sink.stop.time == 1517336476542
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: sink1. sink.batch.complete == 0
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: sink1. sink.batch.empty == 46
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: sink1. sink.batch.underflow == 0
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: sink1. sink.connection.closed.count == 0
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: sink1. sink.connection.creation.count == 0
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: sink1. sink.connection.failed.count == 0
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: sink1. sink.event.drain.attempt == 0
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SINK, name: sink1. sink.event.drain.sucess == 0
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: channel1 stopped
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: channel1. channel.start.time == 1517336124874
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: channel1. channel.stop.time == 1517336476543
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: channel1. channel.capacity == 1
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: channel1. channel.current.size == 0
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: channel1. channel.event.put.attempt == 0
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: channel1. channel.event.put.success == 0
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: channel1. channel.event.take.attempt == 46
18/01/30 12:21:16 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: channel1. channel.event.take.success == 0
When I run the following command - bin/kafka-console-consumer.sh --bootstrap-server server1:port,server2:port,server3:port --topic TOPIC --from-beginning - I can see some data from the topic, but Flume is not writing anything to HDFS. Any help is greatly appreciated.
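One thing that stands out in the log above is the "Kafka topic must be specified" error even though kafka.topics is set. Older Flume releases (before 1.7) read the Kafka source properties without the kafka. prefix (topic, zookeeperConnect, groupId), while the kafka.topics / kafka.bootstrap.servers names in this config are the 1.7+ style. If the installed Flume is older, a source block along these lines (illustrative only; check the property names documented for your exact Flume version) may be what it expects:
a1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.source1.zookeeperConnect = server1:port,server2:port,server3:port
a1.sources.source1.topic = TOPIC
a1.sources.source1.groupId = flume
a1.sources.source1.channels = channel1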

Unable to load class oracle.jdbc.OracleDriver in Apache Flume

I'm trying to make flume-ng-sql-source work with Apache Flume so I can stream an Oracle DB to Kafka.
I found a basic tutorial here: https://www.toadworld.com/platforms/oracle/w/wiki/11524.streaming-oracle-database-table-data-to-apache-kafka
I'm using the following versions: Flume 1.8.0, flume-ng-sql-source 1.4.4, and ojdbc7.jar.
Now when I try to launch the agent I get the error:
org.hibernate.boot.registry.classloading.spi.ClassLoadingException: Unable to load class [oracle.jdbc.OracleDriver]
Here is my flume.conf
agent.channels = ch1
agent.sinks = kafkaSink
agent.sources = sql-source
agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 1000000
agent.sources.sql-source.channels = ch1
agent.sources.sql-source.type = org.keedio.flume.source.SQLSource
# URL to connect to database
agent.sources.sql-source.hibernate.connection.url = jdbc:oracle:thin:@myoracleserver:1521:mydb
# Database connection properties
agent.sources.sql-source.hibernate.connection.user = user
agent.sources.sql-source.hibernate.connection.password = pass
agent.sources.sql-source.hibernate.dialect = org.hibernate.dialect.Oracle12cDialect
agent.sources.sql-source.hibernate.connection.driver_class = oracle.jdbc.OracleDriver
agent.sources.sql-source.table = SCHEMA.TABLE
agent.sources.sql-source.columns.to.select = *
agent.sources.sql-source.status.file.name = sql-source.status
# Increment column properties
agent.sources.sql-source.incremental.column.name = id
# Increment value is from you want to start taking data from tables (0 will import entire table)
agent.sources.sql-source.incremental.value = 0
# Query delay, each configured milisecond the query will be sent
agent.sources.sql-source.run.query.delay=10000
# Status file is used to save last readed row
agent.sources.sql-source.status.file.path = /var/lib/flume
agent.sources.sql-source.status.file.name = sql-source.status
agent.sinks.kafkaSink.type=org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaSink.brokerList=localhost:9092
agent.sinks.kafkaSink.topic=oradb
agent.sinks.kafkaSink.channel=ch1
agent.sinks.kafkaSink.batchSize=10
Here is my flume-env.sh
FLUME_CLASSPATH="/home/mache84/flume/apache-flume-1.8.0-bin/lib"
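As an aside, FLUME_CLASSPATH here points at the lib directory itself, and a bare directory on a Java classpath only picks up .class files, not the jars inside it. Since the launch command below already puts plugins.d/sql-source/lib/* and plugins.d/sql-source/libext/* on the classpath, one common layout (a sketch with an assumed source location for the jar; adjust paths to your install) is to drop the Oracle driver next to the plugin:
mkdir -p /home/mache84/flume/apache-flume-1.8.0-bin/plugins.d/sql-source/libext
cp ojdbc7.jar /home/mache84/flume/apache-flume-1.8.0-bin/plugins.d/sql-source/libext/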
Here is the output when I launch Flume:
[mache84@localhost flume]$ apache-flume-1.8.0-bin/bin/flume-ng agent --conf /home/mache84/flume/apache-flume-1.8.0-bin/conf -f /home/mache84/flume/apache-flume-1.8.0-bin/conf/flume.conf -n agent -Dflume.root.logger=INFO,console
Info: Sourcing environment configuration script /home/mache84/flume/apache-flume-1.8.0-bin/conf/flume-env.sh
Info: Including Hive libraries found via () for Hive access
+ exec /usr/lib/jvm/jre-1.8.0-openjdk/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/home/mache84/flume/apache-flume-1.8.0-bin/conf:/home/mache84/flume/apache-flume-1.8.0-bin/lib/*:/home/mache84/flume/apache-flume-1.8.0-bin/lib:/home/mache84/flume/apache-flume-1.8.0-bin/plugins.d/sql-source/lib/*:/home/mache84/flume/apache-flume-1.8.0-bin/plugins.d/sql-source/libext/*:/lib/*' -Djava.library.path= org.apache.flume.node.Application -f /home/mache84/flume/apache-flume-1.8.0-bin/conf/flume.conf -n agent
2017-11-10 10:54:16,220 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:62)] Configuration provider starting
2017-11-10 10:54:16,223 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:134)] Reloading configuration file:/home/mache84/flume/apache-flume-1.8.0-bin/conf/flume.conf
2017-11-10 10:54:16,228 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:kafkaSink
2017-11-10 10:54:16,229 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:kafkaSink
2017-11-10 10:54:16,229 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: kafkaSink Agent: agent
2017-11-10 10:54:16,229 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:kafkaSink
2017-11-10 10:54:16,229 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:kafkaSink
2017-11-10 10:54:16,229 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:kafkaSink
2017-11-10 10:54:16,236 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:140)] Post-validation flume configuration contains configuration for agents: [agent]
2017-11-10 10:54:16,236 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:147)] Creating channels
2017-11-10 10:54:16,242 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:42)] Creating instance of channel ch1 type memory
2017-11-10 10:54:16,247 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:201)] Created channel ch1
2017-11-10 10:54:16,248 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:41)] Creating instance of source sql-source, type org.keedio.flume.source.SQLSource
2017-11-10 10:54:16,250 (conf-file-poller-0) [INFO - org.keedio.flume.source.SQLSource.configure(SQLSource.java:63)] Reading and processing configuration values for source sql-source
2017-11-10 10:54:16,365 (conf-file-poller-0) [INFO - org.hibernate.annotations.common.reflection.java.JavaReflectionManager.<clinit>(JavaReflectionManager.java:66)] HCANN000001: Hibernate Commons Annotations {4.0.5.Final}
2017-11-10 10:54:16,370 (conf-file-poller-0) [INFO - org.hibernate.Version.logVersion(Version.java:54)] HHH000412: Hibernate Core {4.3.10.Final}
2017-11-10 10:54:16,372 (conf-file-poller-0) [INFO - org.hibernate.cfg.Environment.<clinit>(Environment.java:239)] HHH000206: hibernate.properties not found
2017-11-10 10:54:16,373 (conf-file-poller-0) [INFO - org.hibernate.cfg.Environment.buildBytecodeProvider(Environment.java:346)] HHH000021: Bytecode provider name : javassist
2017-11-10 10:54:16,393 (conf-file-poller-0) [INFO - org.keedio.flume.source.HibernateHelper.establishSession(HibernateHelper.java:63)] Opening hibernate session
2017-11-10 10:54:16,472 (conf-file-poller-0) [WARN - org.hibernate.engine.jdbc.connections.internal.DriverManagerConnectionProviderImpl.configure(DriverManagerConnectionProviderImpl.java:93)] HHH000402: Using Hibernate built-in connection pool (not for production use!)
2017-11-10 10:54:16,473 (conf-file-poller-0) [ERROR - org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:361)] Source sql-source has been removed due to an error during configuration
org.hibernate.boot.registry.classloading.spi.ClassLoadingException: Unable to load class [oracle.jdbc.OracleDriver]
at org.hibernate.boot.registry.classloading.internal.ClassLoaderServiceImpl.classForName(ClassLoaderServiceImpl.java:245)
at org.hibernate.engine.jdbc.connections.internal.DriverManagerConnectionProviderImpl.loadDriverIfPossible(DriverManagerConnectionProviderImpl.java:200)
at org.hibernate.engine.jdbc.connections.internal.DriverManagerConnectionProviderImpl.buildCreator(DriverManagerConnectionProviderImpl.java:156)
at org.hibernate.engine.jdbc.connections.internal.DriverManagerConnectionProviderImpl.configure(DriverManagerConnectionProviderImpl.java:95)
at org.hibernate.boot.registry.internal.StandardServiceRegistryImpl.configureService(StandardServiceRegistryImpl.java:111)
at org.hibernate.service.internal.AbstractServiceRegistryImpl.initializeService(AbstractServiceRegistryImpl.java:234)
at org.hibernate.service.internal.AbstractServiceRegistryImpl.getService(AbstractServiceRegistryImpl.java:206)
at org.hibernate.engine.jdbc.internal.JdbcServicesImpl.buildJdbcConnectionAccess(JdbcServicesImpl.java:260)
at org.hibernate.engine.jdbc.internal.JdbcServicesImpl.configure(JdbcServicesImpl.java:94)
at org.hibernate.boot.registry.internal.StandardServiceRegistryImpl.configureService(StandardServiceRegistryImpl.java:111)
at org.hibernate.service.internal.AbstractServiceRegistryImpl.initializeService(AbstractServiceRegistryImpl.java:234)
at org.hibernate.service.internal.AbstractServiceRegistryImpl.getService(AbstractServiceRegistryImpl.java:206)
at org.hibernate.cfg.Configuration.buildTypeRegistrations(Configuration.java:1887)
at org.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:1845)
at org.keedio.flume.source.HibernateHelper.establishSession(HibernateHelper.java:67)
at org.keedio.flume.source.SQLSource.configure(SQLSource.java:73)
at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:326)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:101)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:141)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: Could not load requested class : oracle.jdbc.OracleDriver
at org.hibernate.boot.registry.classloading.internal.ClassLoaderServiceImpl$AggregatedClassLoader.findClass(ClassLoaderServiceImpl.java:230)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.hibernate.boot.registry.classloading.internal.ClassLoaderServiceImpl.classForName(ClassLoaderServiceImpl.java:242)
... 26 more
2017-11-10 10:54:16,475 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: kafkaSink, type: org.apache.flume.sink.kafka.KafkaSink
2017-11-10 10:54:16,479 (conf-file-poller-0) [WARN - org.apache.flume.sink.kafka.KafkaSink.translateOldProps(KafkaSink.java:363)] topic is deprecated. Please use the parameter kafka.topic
2017-11-10 10:54:16,479 (conf-file-poller-0) [WARN - org.apache.flume.sink.kafka.KafkaSink.translateOldProps(KafkaSink.java:374)] brokerList is deprecated. Please use the parameter kafka.bootstrap.servers
2017-11-10 10:54:16,479 (conf-file-poller-0) [WARN - org.apache.flume.sink.kafka.KafkaSink.translateOldProps(KafkaSink.java:384)] batchSize is deprecated. Please use the parameter flumeBatchSize
2017-11-10 10:54:16,480 (conf-file-poller-0) [INFO - org.apache.flume.sink.kafka.KafkaSink.configure(KafkaSink.java:314)] Using the static topic oradb. This may be overridden by event headers
2017-11-10 10:54:16,484 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:116)] Channel ch1 connected to [kafkaSink]
2017-11-10 10:54:16,488 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:137)] Starting new configuration:{ sourceRunners:{} sinkRunners:{kafkaSink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor#6542f090 counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel{name: ch1}} }
2017-11-10 10:54:16,488 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:144)] Starting Channel ch1
2017-11-10 10:54:16,489 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:159)] Waiting for channel: ch1 to start. Sleeping for 500 ms
2017-11-10 10:54:16,524 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: CHANNEL, name: ch1: Successfully registered new MBean.
2017-11-10 10:54:16,525 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: CHANNEL, name: ch1 started
2017-11-10 10:54:16,990 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:171)] Starting Sink kafkaSink
2017-11-10 10:54:17,033 (lifecycleSupervisor-1-1) [INFO - org.apache.kafka.common.config.AbstractConfig.logAll(AbstractConfig.java:165)] ProducerConfig values:
compression.type = none
metric.reporters = []
metadata.max.age.ms = 300000
metadata.fetch.timeout.ms = 60000
reconnect.backoff.ms = 50
sasl.kerberos.ticket.renew.window.factor = 0.8
bootstrap.servers = [localhost:9092]
retry.backoff.ms = 100
sasl.kerberos.kinit.cmd = /usr/bin/kinit
buffer.memory = 33554432
timeout.ms = 30000
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
ssl.keystore.type = JKS
ssl.trustmanager.algorithm = PKIX
block.on.buffer.full = false
ssl.key.password = null
max.block.ms = 60000
sasl.kerberos.min.time.before.relogin = 60000
connections.max.idle.ms = 540000
ssl.truststore.password = null
max.in.flight.requests.per.connection = 5
metrics.num.samples = 2
client.id =
ssl.endpoint.identification.algorithm = null
ssl.protocol = TLS
request.timeout.ms = 30000
ssl.provider = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
acks = 1
batch.size = 16384
ssl.keystore.location = null
receive.buffer.bytes = 32768
ssl.cipher.suites = null
ssl.truststore.type = JKS
security.protocol = PLAINTEXT
retries = 0
max.request.size = 1048576
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
ssl.truststore.location = null
ssl.keystore.password = null
ssl.keymanager.algorithm = SunX509
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
send.buffer.bytes = 131072
linger.ms = 0
2017-11-10 10:54:17,065 (lifecycleSupervisor-1-1) [INFO - org.apache.kafka.common.utils.AppInfoParser$AppInfo.<init>(AppInfoParser.java:82)] Kafka version : 0.9.0.1
2017-11-10 10:54:17,065 (lifecycleSupervisor-1-1) [INFO - org.apache.kafka.common.utils.AppInfoParser$AppInfo.<init>(AppInfoParser.java:83)] Kafka commitId : 23c69d62a0cabf06
2017-11-10 10:54:17,066 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SINK, name: kafkaSink: Successfully registered new MBean.
2017-11-10 10:54:17,066 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SINK, name: kafkaSink started
I have copied the ojdbc7.jar library into
apache-flume-1.8.0-bin/lib and
apache-flume-1.8.0-bin/plugins.d/sql-source/libext/
This could be a very basic problem since I'm new to all this.
Thanks in advance
The problem I had was just that my ojdbc7.jar was a bad file.
I found a good one here https://mvnrepository.com/artifact/oracle/ojdbc6/11.2.0.3
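In case it helps anyone hitting the same ClassNotFoundException, a quick way to check whether the copied jar is actually usable is to try loading oracle.jdbc.OracleDriver straight from that file. A minimal sketch, assuming a jar path and checker class name that are not part of the original setup:

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class OjdbcCheck {
    public static void main(String[] args) throws Exception {
        // Assumed location; point this at the jar you actually copied into Flume's lib dir
        File jar = new File("/path/to/apache-flume-1.8.0-bin/lib/ojdbc7.jar");
        try (URLClassLoader loader = new URLClassLoader(new URL[] { jar.toURI().toURL() }, null)) {
            // Throws ClassNotFoundException if the jar is corrupt or does not contain the driver
            Class.forName("oracle.jdbc.OracleDriver", false, loader);
            System.out.println("oracle.jdbc.OracleDriver found in " + jar);
        }
    }
}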

Spark Streaming is not streaming,but waits showing consumer-config values

I am trying to stream data using the spark-streaming-kafka-0-10_2.11 artifact (version 2.1.1) together with spark-streaming_2.11 (version 2.1.1) and kafka_2.11 (version 0.10.2.1).
When I start the program, Spark neither streams any data nor throws any error.
Below are my code, dependencies, and the log output.
Dependencies:
"org.apache.kafka" % "kafka_2.11" % "0.10.2.1",
"org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.1.1",
"org.apache.spark" % "spark-streaming_2.11" % "2.1.1"
Code:
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

import scala.util.Random

val sparkConf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("kafka-dstream")
  .set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(sparkConf)
val ssc = new StreamingContext(sc, Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "ec2-xx-yy-zz-abc.us-west-2.compute.amazonaws.com:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> ("java-test-consumer-" + Random.nextInt() + "-" + System.currentTimeMillis()),
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val topics = Array("test")
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)

// Print the key of every record received in each batch
stream.foreachRDD(rdd => rdd.foreachPartition { partition =>
  while (partition.hasNext) {
    println(">>>>>>>>>>>>>>>>>>>>>>>>>>>" + partition.next().key())
  }
})

ssc.start()
ssc.awaitTermination()
Response (Logs):
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
[info] Loading project definition from D:\Scala_Workspace\LM\project
[info] Set current project to LM (in build file:/D:/Scala_Workspace/LM/)
[info] Running consumer.One
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/06/09 12:43:44 INFO SparkContext: Running Spark version 2.1.1
17/06/09 12:43:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/09 12:43:44 INFO SecurityManager: Changing view acls to: ee208961
17/06/09 12:43:44 INFO SecurityManager: Changing modify acls to: ee208961
17/06/09 12:43:44 INFO SecurityManager: Changing view acls groups to:
17/06/09 12:43:44 INFO SecurityManager: Changing modify acls groups to:
17/06/09 12:43:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ee208961); groups wi
permissions: Set(ee208961); groups with modify permissions: Set()
17/06/09 12:43:45 INFO Utils: Successfully started service 'sparkDriver' on port 51695.
17/06/09 12:43:45 INFO SparkEnv: Registering MapOutputTracker
17/06/09 12:43:45 INFO SparkEnv: Registering BlockManagerMaster
17/06/09 12:43:45 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/06/09 12:43:45 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/06/09 12:43:45 INFO DiskBlockManager: Created local directory at C:\Users\EE208961\AppData\Local\Temp\blockmgr-7f3c43f0-d1f5-43fb-b185-a5bf78de4496
17/06/09 12:43:45 INFO MemoryStore: MemoryStore started with capacity 93.3 MB
17/06/09 12:43:45 INFO SparkEnv: Registering OutputCommitCoordinator
17/06/09 12:43:45 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/06/09 12:43:46 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.1.22.70:4040
17/06/09 12:43:46 INFO Executor: Starting executor ID driver on host localhost
17/06/09 12:43:46 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51706.
17/06/09 12:43:46 INFO NettyBlockTransferService: Server created on 10.1.22.70:51706
17/06/09 12:43:46 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/06/09 12:43:46 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.1.22.70, 51706, None)
17/06/09 12:43:46 INFO BlockManagerMasterEndpoint: Registering block manager 10.1.22.70:51706 with 93.3 MB RAM, BlockManagerId(driver, 10.1.22.70, 51706,
17/06/09 12:43:46 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.1.22.70, 51706, None)
17/06/09 12:43:46 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.1.22.70, 51706, None)
17/06/09 12:43:46 WARN KafkaUtils: overriding enable.auto.commit to false for executor
17/06/09 12:43:46 WARN KafkaUtils: overriding auto.offset.reset to none for executor
17/06/09 12:43:46 WARN KafkaUtils: overriding executor group.id to spark-executor-sadad
17/06/09 12:43:46 WARN KafkaUtils: overriding receive.buffer.bytes to 65536 see KAFKA-3135
17/06/09 12:43:47 INFO DirectKafkaInputDStream: Slide time = 5000 ms
17/06/09 12:43:47 INFO DirectKafkaInputDStream: Storage level = Serialized 1x Replicated
17/06/09 12:43:47 INFO DirectKafkaInputDStream: Checkpoint interval = null
17/06/09 12:43:47 INFO DirectKafkaInputDStream: Remember interval = 5000 ms
17/06/09 12:43:47 INFO DirectKafkaInputDStream: Initialized and validated org.apache.spark.streaming.kafka010.DirectKafkaInputDStream#5cda65c2
17/06/09 12:43:47 INFO ForEachDStream: Slide time = 5000 ms
17/06/09 12:43:47 INFO ForEachDStream: Storage level = Serialized 1x Replicated
17/06/09 12:43:47 INFO ForEachDStream: Checkpoint interval = null
17/06/09 12:43:47 INFO ForEachDStream: Remember interval = 5000 ms
17/06/09 12:43:47 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream#1b9c49ce
17/06/09 12:43:47 INFO ConsumerConfig: ConsumerConfig values:
metric.reporters = []
metadata.max.age.ms = 300000
partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor]
reconnect.backoff.ms = 50
sasl.kerberos.ticket.renew.window.factor = 0.8
max.partition.fetch.bytes = 1048576
bootstrap.servers = [ec2-54-69-23-196.us-west-2.compute.amazonaws.com:9092]
ssl.keystore.type = JKS
enable.auto.commit = false
sasl.mechanism = GSSAPI
interceptor.classes = null
exclude.internal.topics = true
ssl.truststore.password = null
client.id =
ssl.endpoint.identification.algorithm = null
max.poll.records = 2147483647
check.crcs = true
request.timeout.ms = 40000
heartbeat.interval.ms = 3000
auto.commit.interval.ms = 5000
receive.buffer.bytes = 65536
ssl.truststore.type = JKS
ssl.truststore.location = null
ssl.keystore.password = null
fetch.min.bytes = 1
send.buffer.bytes = 131072
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
group.id = sadad
retry.backoff.ms = 100
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
ssl.trustmanager.algorithm = PKIX
ssl.key.password = null
fetch.max.wait.ms = 500
sasl.kerberos.min.time.before.relogin = 60000
connections.max.idle.ms = 540000
session.timeout.ms = 30000
metrics.num.samples = 2
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
ssl.protocol = TLS
ssl.provider = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.keystore.location = null
ssl.cipher.suites = null
security.protocol = PLAINTEXT
ssl.keymanager.algorithm = SunX509
metrics.sample.window.ms = 30000
auto.offset.reset = latest
17/06/09 12:43:47 INFO AppInfoParser: Kafka version : 0.10.2.1
17/06/09 12:43:47 INFO AppInfoParser: Kafka commitId : a7a17cdec9eaa6c5
The code just waits here; it neither throws an error nor streams anything.
What I also observed is that if I don't use the "org.apache.kafka" % "kafka_2.11" % "0.10.2.1" dependency, I get
"Exception in thread "main" org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'topic_metadata': Error reading array of size 291941, only 32 bytes available",
which is why I added the kafka_2.11 (0.10.2.1) dependency.
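For anyone debugging the same setup: that topic_metadata SchemaException is often a symptom of mismatched or mixed Kafka client versions, so it can help to confirm which kafka-clients jar actually ends up on the classpath. A small sketch (the class name is just a placeholder, not part of the original project):

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.utils.AppInfoParser;

public class KafkaClasspathCheck {
    public static void main(String[] args) {
        // Version string reported by whichever kafka-clients jar is on the classpath
        System.out.println("kafka-clients version: " + AppInfoParser.getVersion());
        // Jar the consumer classes were actually loaded from
        System.out.println("loaded from: "
                + KafkaConsumer.class.getProtectionDomain().getCodeSource().getLocation());
    }
}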

hadoop yarn class not found in same jar but different package during run job

I'm following an example from the book "Hadoop: The Definitive Guide, 2nd edition" and have run into a problem.
I am using Ubuntu 12.04 and Hadoop 2.2.0, and I built job.jar with Eclipse.
The class map_reduce.programming.v1.MaxTemperatureReducer is in the jar, but in a different package.
When I run the job, I get a ClassNotFoundException.
Below is mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
If I change the value to local instead of yarn, the job works; with yarn it does not.
HADOOP_CLASSPATH includes the directory containing job.jar.
What is the root cause?
package map_reduce.programming.v3;

import map_reduce.programming.v1.MaxTemperatureReducer;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * hadoop map_reduce.programming.v3.MaxTemperatureDriver -conf conf/hadoop-local.xml /book/input/ncdc/micro max-temp
 * hadoop jar job.jar map_reduce.programming.v3.MaxTemperatureDriver -conf conf/hadoop-cluster.xml /book/input/ncdc/all max-temp
 */
public class MaxTemperatureDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }

        // JobConf(conf, class) sets the job jar from the jar containing this class, if it was loaded from one
        JobConf conf = new JobConf(getConf(), getClass());
        conf.setJobName("Max temperature");

        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(MaxTemperatureMapper.class);
        conf.setCombinerClass(MaxTemperatureReducer.class);
        conf.setReducerClass(MaxTemperatureReducer.class);

        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args);
        System.exit(exitCode);
    }
}
Below are the logs:
jar job.jar map_reduce.programming.v3.MaxTemperatureDriver -conf conf/hadoop-cluster.xml /book/input/ncdc/all max-temp0991
14/06/05 18:10:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/06/05 18:10:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/06/05 18:10:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/06/05 18:10:20 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
14/06/05 18:10:20 INFO mapred.FileInputFormat: Total input paths to process : 2
14/06/05 18:10:21 INFO mapreduce.JobSubmitter: number of splits:2
14/06/05 18:10:21 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/06/05 18:10:21 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/06/05 18:10:21 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/06/05 18:10:21 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/06/05 18:10:21 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/06/05 18:10:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/06/05 18:10:21 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/06/05 18:10:21 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/06/05 18:10:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1401958773644_0002
14/06/05 18:10:22 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/06/05 18:10:22 INFO impl.YarnClientImpl: Submitted application
application_1401958773644_0002 to ResourceManager at /0.0.0.0:8032
14/06/05 18:10:22 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1401958773644_0002/
14/06/05 18:10:22 INFO mapreduce.Job: Running job: job_1401958773644_0002
14/06/05 18:10:27 INFO mapreduce.Job: Job job_1401958773644_0002 running in uber mode : false
14/06/05 18:10:27 INFO mapreduce.Job: map 0% reduce 0%
14/06/05 18:10:30 INFO mapreduce.Job: Task Id : attempt_1401958773644_0002_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class map_reduce.programming.v1.MaxTemperatureReducer not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1752)
at org.apache.hadoop.mapred.JobConf.getCombinerClass(JobConf.java:1139)
at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1517)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1010)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:390)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
I had the exact same problem, with the Job configuration not finding the Mapper and Reducer classes and throwing a ClassNotFoundException.
I'm using mapreduce2, so I had to add
job.setJarByClass(Avg.class);
while in your case, I think you should call
JobConf conf = new JobConf(getConf(), YourMainClass.class);
Best,
Edoardo
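For reference, a minimal sketch of the setJarByClass idea using the new (mapreduce2) API; the class and job names are placeholders, not code from the question or the answer. It also prints what the job jar resolves to, which is useful here because a null result means the driver class was loaded from a directory (for example an Eclipse output folder) rather than from job.jar, matching the "No job jar file set" warning in the logs above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobJarCheck {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "jar check");
        // Same call the answer suggests: ship the jar that contains this class with the job
        job.setJarByClass(JobJarCheck.class);
        // Prints null if the class was not loaded from a jar
        System.out.println("job jar resolved to: " + job.getJar());
    }
}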

Resources