Unable to retrieve Twitter streaming data using Flume - hadoop

I am trying to stream Twitter data into HDFS using Flume, but the agent fails with an error during startup.
When I run it with the command:
flume-ng agent -n TwitterAgent -c conf -f /home/hadoop/Flume/conf/twitter.conf
I get the following:
Info: Including Hadoop libraries found via (/home/hadoop/hadoop-2.10.1/bin/hadoop) for HDFS access
Info: Including HBASE libraries found via (/home/hadoop/hbase-2.2.5/bin/hbase) for HBASE access
Info: Including Hive libraries found via (/home/hadoop/apache-hive-2.3.7-bin) for Hive access
+ exec /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx20m -cp 'conf:/home/hadoop/Flume/lib/*:/home/hadoop/hadoop-2.10.1/etc/hadoop:/home/hadoop/hadoop-2.10.1/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/common/*:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn/*:/home/hadoop/hadoop-2.10.1/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/mapreduce/*:/home/hadoop/hadoop-2.10.1/contrib/capacity-scheduler/*.jar:/home/hadoop/hbase-2.2.5/conf:/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar:/home/hadoop/hbase-2.2.5:/home/hadoop/hbase-2.2.5/lib/shaded-clients/hbase-shaded-client-byo-hadoop-2.2.5.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/commons-logging-1.2.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/log4j-1.2.17.jar:/home/hadoop/hbase-2.2.5/lib/client-facing-thirdparty/slf4j-api-1.7.25.jar:/home/hadoop/hadoop-2.10.1/etc/hadoop:/home/hadoop/hadoop-2.10.1/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/common/*:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/yarn/*:/home/hadoop/hadoop-2.10.1/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.10.1/share/hadoop/mapreduce/*:/home/hadoop/hadoop-2.10.1/contrib/capacity-scheduler/*.jar:/home/hadoop/hbase-2.2.5/conf:/home/hadoop/apache-hive-2.3.7-bin/lib/*' -Djava.library.path=:/home/hadoop/hadoop-2.10.1/lib/native:/home/hadoop/hadoop-2.10.1/lib/native org.apache.flume.node.Application -n TwitterAgent -f /home/hadoop/Flume/conf/twitter.conf
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/Flume/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.10.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/apache-hive-2.3.7-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/11/20 02:23:44 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
20/11/20 02:23:44 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/home/hadoop/Flume/conf/twitter.conf
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:MemChannel
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:MemChannel
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:MemChannel
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Added sinks: HDFS Agent: TwitterAgent
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:Twitter
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Processing:HDFS
20/11/20 02:23:44 WARN conf.FlumeConfiguration: Agent configuration for 'TwitterAgent' has no configfilters.
20/11/20 02:23:44 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [TwitterAgent]
20/11/20 02:23:44 INFO node.AbstractConfigurationProvider: Creating channels
20/11/20 02:23:44 INFO channel.DefaultChannelFactory: Creating instance of channel MemChannel type memory
20/11/20 02:23:44 INFO node.AbstractConfigurationProvider: Created channel MemChannel
20/11/20 02:23:44 INFO source.DefaultSourceFactory: Creating instance of source Twitter, type org.apache.flume.source.twitter.TwitterSource
20/11/20 02:23:44 ERROR node.AbstractConfigurationProvider: Source Twitter has been removed due to an error during configuration
java.lang.InstantiationException: Incompatible source and channel settings defined. source's batch size is greater than the channels transaction capacity. Source: Twitter, batch size = 1000, channel MemChannel, transaction capacity = 100
at org.apache.flume.node.AbstractConfigurationProvider.checkSourceChannelCompatibility(AbstractConfigurationProvider.java:386)
at org.apache.flume.node.AbstractConfigurationProvider.getSourceChannels(AbstractConfigurationProvider.java:367)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:329)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:105)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:145)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/11/20 02:23:44 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
20/11/20 02:23:44 INFO node.AbstractConfigurationProvider: Channel MemChannel connected to [HDFS]
20/11/20 02:23:44 INFO node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@78e3d64e counterGroup:{ name:null counters:{} } }} channels:{MemChannel=org.apache.flume.channel.MemoryChannel{name: MemChannel}} }
20/11/20 02:23:44 INFO node.Application: Starting Channel MemChannel
20/11/20 02:23:44 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: MemChannel: Successfully registered new MBean.
20/11/20 02:23:44 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: MemChannel started
20/11/20 02:23:44 INFO node.Application: Starting Sink HDFS
20/11/20 02:23:44 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
20/11/20 02:23:44 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
The terminal just stays stuck here and nothing happens. I tried waiting for several minutes but it stays the same.
My config file twitter.conf is located at /home/hadoop/Flume/conf and is as follows:
#Naming the components on the current agent.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
#Describing/Configuring the source
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey =##
TwitterAgent.sources.Twitter.consumerSecret =##
TwitterAgent.sources.Twitter.accessToken =##
TwitterAgent.sources.Twitter.accessTokenSecret =##
TwitterAgent.sources.Twitter.keywords =covid,covid-19,coronavirus
#Describing/Configuring the sink
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/twitter_data/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
#Describing/Configuring the channel
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 100
TwitterAgent.channels.MemChannel.transactionCapacity = 100
#Binding the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
My flume-env.sh file is as follows:
#Licensed to the Apache Software Foundation (ASF) under one
#or more contributor license agreements. See the NOTICE file
#distributed with this work for additional information
#regarding copyright ownership. The ASF licenses this file
#to you under the Apache License, Version 2.0 (the
#"License"); you may not use this file except in compliance
#with the License. You may obtain a copy of the License at
#http://www.apache.org/licenses/LICENSE-2.0
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
#If this file is placed at FLUME_CONF_DIR/flume-env.sh, it will be sourced
#during Flume startup.
#Enviroment variables can be set here.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export CLASSPATH=$CLASSPATH:/home/hadoop/Flume/lib/*
FLUME_CLASSPATH="/home/hadoop/Flume/lib/flume-sources-1.0-SNAPSHOT.jar"
#Give Flume more memory and pre-allocate, enable remote monitoring via JMX
#export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"
#Let Flume write raw event data and configuration information to its log files for debugging
#purposes. Enabling these flags is not recommended in production,
#as it may result in logging sensitive user information or encryption secrets.
#export JAVA_OPTS="$JAVA_OPTS -Dorg.apache.flume.log.rawdata=true -Dorg.apache.flume.log.printconfig=true "
#Note that the Flume conf directory is always included in the classpath.
#FLUME_CLASSPATH=""

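The error says:
java.lang.InstantiationException: Incompatible source and channel settings defined. source's batch size is greater than the channels transaction capacity. Source: Twitter, batch size = 1000, channel MemChannel, transaction capacity = 100
So you can either decrease the source's batch size or increase the channel's transaction capacity (transactionCapacity) so that it is at least as large as the source's batch size. For a memory channel, capacity must not be smaller than transactionCapacity, so raise capacity as well if needed.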
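The constraint from the error message can also be checked before starting the agent. Below is a minimal sketch in Python, not part of Flume: the helpers `parse_properties` and `check_batch_sizes` are hypothetical names. It assumes that the source's batch size is set through a `maxBatchSize` property, that it defaults to 1000 (the value reported in the log), and that an unset `transactionCapacity` defaults to 100.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_batch_sizes(props, agent, default_batch=1000, default_tx=100):
    """Return (source, channel, batch, tx) tuples where batch > tx.

    The property name 'maxBatchSize' and both defaults are assumptions.
    """
    problems = []
    for source in props.get(f"{agent}.sources", "").split():
        batch = int(props.get(f"{agent}.sources.{source}.maxBatchSize", default_batch))
        channels = props.get(f"{agent}.sources.{source}.channels", "").split()
        for channel in channels:
            tx = int(props.get(f"{agent}.channels.{channel}.transactionCapacity", default_tx))
            if batch > tx:
                problems.append((source, channel, batch, tx))
    return problems


# Reduced copy of the relevant lines from twitter.conf in the question.
CONF = """
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.channels.MemChannel.capacity = 100
TwitterAgent.channels.MemChannel.transactionCapacity = 100
"""

if __name__ == "__main__":
    for source, channel, batch, tx in check_batch_sizes(parse_properties(CONF), "TwitterAgent"):
        print(f"{source}: batch size {batch} > {channel} transactionCapacity {tx}")
```

With the configuration from the question, the check reports the same mismatch as the Flume log. Setting the source's batch size to 100, or raising the channel's capacity and transactionCapacity to 1000, makes the list of problems empty.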

Update: After some research, I found that I was using a bad version of flume-sources-1.0-SNAPSHOT.jar, a jar file in Flume's lib folder. I fixed it by building my own jar following the method at: https://community.cloudera.com/t5/Support-Questions/issue-flume-twitter/m-p/22938#M6597

Related

pyspark.sql.utils.AnalysisException: u'Path does not exist

I am running a Spark job on Amazon EMR using the standard HDFS, not S3, to store my files. I have a Hive table in hdfs://user/hive/warehouse/, but it cannot be found when my Spark job is run. I set the Spark property spark.sql.warehouse.dir to point to my HDFS directory, and while the YARN logs do say:
17/03/28 19:54:05 INFO SharedState: Warehouse path is 'hdfs://user/hive/warehouse/'.
later on in the logs it says (full log at end of page):
LogType:stdout
Log Upload Time:Tue Mar 28 19:54:15 +0000 2017
LogLength:854
Log Contents:
Traceback (most recent call last):
File "test.py", line 25, in <module>
parquet_example(spark)
File "test.py", line 9, in parquet_example
tests = spark.read.parquet("test.parquet")
File "/mnt/yarn/usercache/hadoop/appcache/application_1490717578939_0012/container_1490717578939_0012_01_000001/pyspark.zip/pyspark/sql/readwriter.py", line 274, in parquet
File "/mnt/yarn/usercache/hadoop/appcache/application_1490717578939_0012/container_1490717578939_0012_01_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/mnt/yarn/usercache/hadoop/appcache/application_1490717578939_0012/container_1490717578939_0012_01_000001/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u'Path does not exist: hdfs://ip-xxx-xx-xx-xxx.ec2.internal:8020/user/hadoop/test.parquet;'
End of LogType:stdout
What am I doing wrong for there to be a mismatch in the path?
Here is my HDFS directory for hive/warehouse:
hdfs dfs -ls /user/hive/warehouse
Found 1 items
drwxrwxrwt - hadoop hadoop 0 2017-03-28 18:50 /user/hive/warehouse/test
Here is what /user/hadoop/ gives me:
hdfs dfs -ls /user/hadoop/
Found 2 items
drwxr-xr-x - hadoop hadoop 0 2017-03-28 16:53 /user/hadoop/.hiveJars
drwxr-xr-x - hadoop hadoop 0 2017-03-28 19:54 /user/hadoop/.sparkStaging
And here is my spark job in python:
from __future__ import print_function
from pyspark.sql import SparkSession
from pyspark.sql import Row
def parquet_example(spark):
    tests = spark.read.parquet("test.parquet")
    tests.createOrReplaceTempView("tests")
    tests_result = spark.sql("SELECT * FROM test")
    tests_result.show()
if __name__ == "__main__":
    warehouseLocation = "hdfs://user/hive/warehouse/"
    spark = SparkSession.builder.appName("example").config("spark.sql.warehouse.dir", warehouseLocation).enableHiveSupport().getOrCreate()
    parquet_example(spark)
    spark.stop()
full yarn log:
Container: container_1490717578939_0012_01_000001 on ip-xxx-xx-xx-xxx.ec2.internal_8041
=========================================================================================
LogType:stderr
Log Upload Time:Tue Mar 28 19:54:15 +0000 2017
LogLength:14054
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/hadoop/filecache/131/__spark_libs__713193244228500015.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/03/28 19:54:01 INFO SignalUtils: Registered signal handler for TERM
17/03/28 19:54:01 INFO SignalUtils: Registered signal handler for HUP
17/03/28 19:54:01 INFO SignalUtils: Registered signal handler for INT
17/03/28 19:54:02 INFO ApplicationMaster: Preparing Local resources
17/03/28 19:54:03 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1490717578939_0012_000001
17/03/28 19:54:03 INFO SecurityManager: Changing view acls to: yarn,hadoop
17/03/28 19:54:03 INFO SecurityManager: Changing modify acls to: yarn,hadoop
17/03/28 19:54:03 INFO SecurityManager: Changing view acls groups to:
17/03/28 19:54:03 INFO SecurityManager: Changing modify acls groups to:
17/03/28 19:54:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hadoop); groups with view permissions: Set(); users with modify permissions: Set(yarn, hadoop); groups with modify permissions: Set()
17/03/28 19:54:03 INFO ApplicationMaster: Starting the user application in a separate Thread
17/03/28 19:54:03 INFO ApplicationMaster: Waiting for spark context initialization...
17/03/28 19:54:03 INFO SparkContext: Running Spark version 2.1.0
17/03/28 19:54:03 INFO SecurityManager: Changing view acls to: yarn,hadoop
17/03/28 19:54:03 INFO SecurityManager: Changing modify acls to: yarn,hadoop
17/03/28 19:54:03 INFO SecurityManager: Changing view acls groups to:
17/03/28 19:54:03 INFO SecurityManager: Changing modify acls groups to:
17/03/28 19:54:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hadoop); groups with view permissions: Set(); users with modify permissions: Set(yarn, hadoop); groups with modify permissions: Set()
17/03/28 19:54:03 INFO Utils: Successfully started service 'sparkDriver' on port 33579.
17/03/28 19:54:04 INFO SparkEnv: Registering MapOutputTracker
17/03/28 19:54:04 INFO SparkEnv: Registering BlockManagerMaster
17/03/28 19:54:04 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/03/28 19:54:04 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/03/28 19:54:04 INFO DiskBlockManager: Created local directory at /mnt/yarn/usercache/hadoop/appcache/application_1490717578939_0012/blockmgr-f3713d64-91da-4cb5-9b55-d4a18c607a74
17/03/28 19:54:04 INFO DiskBlockManager: Created local directory at /mnt1/yarn/usercache/hadoop/appcache/application_1490717578939_0012/blockmgr-634c7d4b-026c-4df7-abf4-7846bd7fc958
17/03/28 19:54:04 INFO DiskBlockManager: Created local directory at /mnt2/yarn/usercache/hadoop/appcache/application_1490717578939_0012/blockmgr-19f0a265-755a-42f0-9282-1e3d98a57ab1
17/03/28 19:54:04 INFO MemoryStore: MemoryStore started with capacity 414.4 MB
17/03/28 19:54:04 INFO SparkEnv: Registering OutputCommitCoordinator
17/03/28 19:54:04 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
17/03/28 19:54:04 INFO Utils: Successfully started service 'SparkUI' on port 37056.
17/03/28 19:54:04 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://xxx.xx.xx.xxx:37056
17/03/28 19:54:04 INFO YarnClusterScheduler: Created YarnClusterScheduler
17/03/28 19:54:04 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1490717578939_0012 and attemptId Some(appattempt_1490717578939_0012_000001)
17/03/28 19:54:04 INFO Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
17/03/28 19:54:04 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34414.
17/03/28 19:54:04 INFO NettyBlockTransferService: Server created on xxx.xx.xx.xxx:34414
17/03/28 19:54:04 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/03/28 19:54:04 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, xxx.xx.xx.xxx, 34414, None)
17/03/28 19:54:04 INFO BlockManagerMasterEndpoint: Registering block manager xxx.xx.xx.xxx:34414 with 414.4 MB RAM, BlockManagerId(driver, xxx.xx.xx.xxx, 34414, None)
17/03/28 19:54:04 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, xxx.xx.xx.xxx, 34414, None)
17/03/28 19:54:04 INFO BlockManager: external shuffle service port = 7337
17/03/28 19:54:04 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, xxx.xx.xx.xxx, 34414, None)
17/03/28 19:54:05 INFO EventLoggingListener: Logging events to hdfs:///var/log/spark/apps/application_1490717578939_0012_1
17/03/28 19:54:05 INFO Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
17/03/28 19:54:05 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
17/03/28 19:54:05 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
17/03/28 19:54:05 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook done
17/03/28 19:54:05 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@xxx.xx.xx.xxx:33579)
17/03/28 19:54:05 INFO ApplicationMaster:
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*<CPS>{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOOP_HDFS_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>/usr/lib/hadoop-lzo/lib/*<CPS>/usr/share/aws/emr/emrfs/conf<CPS>/usr/share/aws/emr/emrfs/lib/*<CPS>/usr/share/aws/emr/emrfs/auxlib/*<CPS>/usr/share/aws/emr/lib/*<CPS>/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar<CPS>/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar<CPS>/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar<CPS>/usr/lib/spark/yarn/lib/datanucleus-api-jdo.jar<CPS>/usr/lib/spark/yarn/lib/datanucleus-core.jar<CPS>/usr/lib/spark/yarn/lib/datanucleus-rdbms.jar<CPS>/usr/share/aws/emr/cloudwatch-sink/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>/usr/lib/hadoop-lzo/lib/*<CPS>/usr/share/aws/emr/emrfs/conf<CPS>/usr/share/aws/emr/emrfs/lib/*<CPS>/usr/share/aws/emr/emrfs/auxlib/*<CPS>/usr/share/aws/emr/lib/*<CPS>/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar<CPS>/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar<CPS>/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar<CPS>/usr/share/aws/emr/cloudwatch-sink/lib/*
SPARK_YARN_STAGING_DIR -> hdfs://ip-xxx-xx-xx-xxx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1490717578939_0012
SPARK_USER -> hadoop
SPARK_YARN_MODE -> true
PYTHONPATH -> {{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.4-src.zip
command:
LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:$LD_LIBRARY_PATH" \
{{JAVA_HOME}}/bin/java \
-server \
-Xmx5120m \
'-verbose:gc' \
'-XX:+PrintGCDetails' \
'-XX:+PrintGCDateStamps' \
'-XX:+UseConcMarkSweepGC' \
'-XX:CMSInitiatingOccupancyFraction=70' \
'-XX:MaxHeapFreeRatio=70' \
'-XX:+CMSClassUnloadingEnabled' \
'-XX:OnOutOfMemoryError=kill -9 %p' \
-Djava.io.tmpdir={{PWD}}/tmp \
'-Dspark.history.ui.port=18080' \
-Dspark.yarn.app.container.log.dir=<LOG_DIR> \
org.apache.spark.executor.CoarseGrainedExecutorBackend \
--driver-url \
spark://CoarseGrainedScheduler@xxx.xx.xx.xxx:33579 \
--executor-id \
<executorId> \
--hostname \
<hostname> \
--cores \
2 \
--app-id \
application_1490717578939_0012 \
--user-class-path \
file:$PWD/__app__.jar \
1><LOG_DIR>/stdout \
2><LOG_DIR>/stderr
resources:
py4j-0.10.4-src.zip -> resource { scheme: "hdfs" host: "ip-xxx-xx-xx-xxx.ec2.internal" port: 8020 file: "/user/hadoop/.sparkStaging/application_1490717578939_0012/py4j-0.10.4-src.zip" } size: 74096 timestamp: 1490730839170 type: FILE visibility: PRIVATE
__spark_conf__ -> resource { scheme: "hdfs" host: "ip-xxx-xx-xx-xxx.ec2.internal" port: 8020 file: "/user/hadoop/.sparkStaging/application_1490717578939_0012/__spark_conf__.zip" } size: 75741 timestamp: 1490730839402 type: ARCHIVE visibility: PRIVATE
pyspark.zip -> resource { scheme: "hdfs" host: "ip-xxx-xx-xx-xxx.ec2.internal" port: 8020 file: "/user/hadoop/.sparkStaging/application_1490717578939_0012/pyspark.zip" } size: 452353 timestamp: 1490730838849 type: FILE visibility: PRIVATE
__spark_libs__ -> resource { scheme: "hdfs" host: "ip-xxx-xx-xx-xxx.ec2.internal" port: 8020 file: "/user/hadoop/.sparkStaging/application_1490717578939_0012/__spark_libs__713193244228500015.zip" } size: 196686961 timestamp: 1490730836856 type: ARCHIVE visibility: PRIVATE
hive-site.xml -> resource { scheme: "hdfs" host: "ip-xxx-xx-xx-xxx.ec2.internal" port: 8020 file: "/user/hadoop/.sparkStaging/application_1490717578939_0012/hive-site.xml" } size: 2375 timestamp: 1490730837023 type: FILE visibility: PRIVATE
===============================================================================
17/03/28 19:54:05 INFO RMProxy: Connecting to ResourceManager at ip-xxx-xx-xx-xxx.ec2.internal/xxx-xx-xx-xxx:8030
17/03/28 19:54:05 INFO YarnRMClient: Registering the ApplicationMaster
17/03/28 19:54:05 INFO SharedState: Warehouse path is 'hdfs://user/hive/warehouse/'.
17/03/28 19:54:05 INFO Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
17/03/28 19:54:05 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
17/03/28 19:54:05 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
17/03/28 19:54:06 INFO metastore: Trying to connect to metastore with URI thrift://ip-xxx-xx-xx-xxx.ec2.internal:9083
17/03/28 19:54:06 INFO metastore: Connected to metastore.
17/03/28 19:54:06 INFO SessionState: Created local directory: /mnt/yarn/usercache/hadoop/appcache/application_1490717578939_0012/container_1490717578939_0012_01_000001/tmp/yarn
17/03/28 19:54:06 INFO SessionState: Created local directory: /mnt/yarn/usercache/hadoop/appcache/application_1490717578939_0012/container_1490717578939_0012_01_000001/tmp/5f653144-e990-45b0-ba73-cdb4d10e9f7a_resources
17/03/28 19:54:06 INFO SessionState: Created HDFS directory: /tmp/hive/hadoop/5f653144-e990-45b0-ba73-cdb4d10e9f7a
17/03/28 19:54:06 INFO SessionState: Created local directory: /mnt/yarn/usercache/hadoop/appcache/application_1490717578939_0012/container_1490717578939_0012_01_000001/tmp/yarn/5f653144-e990-45b0-ba73-cdb4d10e9f7a
17/03/28 19:54:06 INFO SessionState: Created HDFS directory: /tmp/hive/hadoop/5f653144-e990-45b0-ba73-cdb4d10e9f7a/_tmp_space.db
17/03/28 19:54:06 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is hdfs://user/hive/warehouse/
17/03/28 19:54:06 ERROR ApplicationMaster: User application exited with status 1
17/03/28 19:54:06 INFO ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
17/03/28 19:54:06 INFO SparkContext: Invoking stop() from shutdown hook
17/03/28 19:54:06 INFO SparkUI: Stopped Spark web UI at http://xxx.xx.xx.xxx:37056
17/03/28 19:54:06 INFO YarnClusterSchedulerBackend: Shutting down all executors
17/03/28 19:54:06 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
17/03/28 19:54:06 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
17/03/28 19:54:06 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/03/28 19:54:06 INFO MemoryStore: MemoryStore cleared
17/03/28 19:54:06 INFO BlockManager: BlockManager stopped
17/03/28 19:54:06 INFO BlockManagerMaster: BlockManagerMaster stopped
17/03/28 19:54:06 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/03/28 19:54:06 INFO SparkContext: Successfully stopped SparkContext
17/03/28 19:54:06 INFO ShutdownHookManager: Shutdown hook called
17/03/28 19:54:06 INFO ShutdownHookManager: Deleting directory /mnt1/yarn/usercache/hadoop/appcache/application_1490717578939_0012/spark-3a6db594-2b44-47fe-8e48-4220b93e789a
17/03/28 19:54:06 INFO ShutdownHookManager: Deleting directory /mnt2/yarn/usercache/hadoop/appcache/application_1490717578939_0012/spark-a54516f0-48be-4fdb-899b-bbee998468b1
17/03/28 19:54:06 INFO ShutdownHookManager: Deleting directory /mnt/yarn/usercache/hadoop/appcache/application_1490717578939_0012/spark-552e3cae-c119-47a5-9c63-34d4df59d072
17/03/28 19:54:06 INFO ShutdownHookManager: Deleting directory /mnt/yarn/usercache/hadoop/appcache/application_1490717578939_0012/spark-552e3cae-c119-47a5-9c63-34d4df59d072/pyspark-a0240093-16c6-43e4-8f2c-dcef309afe97
End of LogType:stderr
LogType:stdout
Log Upload Time:Tue Mar 28 19:54:15 +0000 2017
LogLength:854
Log Contents:
Traceback (most recent call last):
File "test.py", line 25, in <module>
parquet_example(spark)
File "test.py", line 9, in parquet_example
tests = spark.read.parquet("test.parquet")
File "/mnt/yarn/usercache/hadoop/appcache/application_1490717578939_0012/container_1490717578939_0012_01_000001/pyspark.zip/pyspark/sql/readwriter.py", line 274, in parquet
File "/mnt/yarn/usercache/hadoop/appcache/application_1490717578939_0012/container_1490717578939_0012_01_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/mnt/yarn/usercache/hadoop/appcache/application_1490717578939_0012/container_1490717578939_0012_01_000001/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u'Path does not exist: hdfs://ip-xxx-xx-xx-xxx.ec2.internal:8020/user/hadoop/test.parquet;'
End of LogType:stdout
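The function parquet_example in the question creates a DataFrame from the Parquet file test.parquet and queries it through a temporary view. Because "test.parquet" is a relative path, it is resolved against the user's HDFS home directory, which is why the error refers to /user/hadoop/test.parquet rather than to the warehouse directory.
From the comments:
Since the Hive table named test already exists, query the table directly with the created SparkSession: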
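from pyspark.sql import SparkSession
# Note: in "hdfs://user/hive/warehouse/" a URI parser reads "user" as the host;
# "hdfs:///user/hive/warehouse/" may be what is intended.
warehouseLocation = "hdfs://user/hive/warehouse/"
spark = SparkSession \
    .builder \
    .appName("example") \
    .config("spark.sql.warehouse.dir", warehouseLocation) \
    .enableHiveSupport() \
    .getOrCreate()
# Query the existing Hive table directly instead of reading test.parquet
spark.sql("SELECT * FROM test").show()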
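The path mismatch in the question can be illustrated without a cluster. This is a rough sketch only: Hadoop's real path resolution is implemented in Java, and `describe_uri` and `resolve_path` are hypothetical helpers that merely mimic it. It assumes the default HDFS home directory layout `/user/<username>` and uses the namenode address from the error message as the default filesystem.

```python
import posixpath
from urllib.parse import urlparse


def describe_uri(uri):
    """Split a URI into (scheme, authority, path) as a generic URI parser sees it."""
    parts = urlparse(uri)
    return parts.scheme, parts.netloc, parts.path


def resolve_path(path, user, default_fs):
    """Hypothetical helper: qualify a scheme-less path the way the error message suggests.

    Relative paths are joined onto the assumed home directory /user/<user>;
    absolute paths are kept and only prefixed with the default filesystem.
    """
    if urlparse(path).scheme:
        return path  # already fully qualified
    home = posixpath.join("/user", user)
    return default_fs.rstrip("/") + posixpath.join(home, path)


if __name__ == "__main__":
    # Two slashes: "user" is parsed as the host, not as a directory.
    print(describe_uri("hdfs://user/hive/warehouse/"))
    # Three slashes: empty host, the full directory path is preserved.
    print(describe_uri("hdfs:///user/hive/warehouse/"))
    # The relative path from the question lands in the user's home directory.
    print(resolve_path("test.parquet", "hadoop", "hdfs://ip-xxx-xx-xx-xxx.ec2.internal:8020"))
```

The last call produces the same location that appears in the AnalysisException, which suggests the read never looked under the warehouse directory at all.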

Issue while getting Twitter data in HDFS using Flume

I am trying to fetch Twitter data into HDFS but am running into an issue.
Here is my flume.conf file:
TwitterAgent.sources= Twitter
TwitterAgent.channels= MemChannel
TwitterAgent.sinks=HDFS
TwitterAgent.sources.TwitterSource.type=org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels=MemChannel
TwitterAgent.sources.Twitter.consumerKey=xxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret= xxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken=xxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret=xxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords= hadoop,election,sports, cricket,Big data
TwitterAgent.sinks.HDFS.channel=MemChannel
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://localhost:9000/user/flume/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeformat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=1000
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval=600
TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=10000
TwitterAgent.channels.MemChannel.transactionCapacity=100
In the Env.sh file, I have the path:
#FLUME_CLASSPATH="/usr/lib/flume-sources-1.0-SNAPSHOT.jar"
Now I am using the command below to get the data:
[cloudera@quickstart etc]$ flume-ng agent -n TwitterAgent -c conf -f /etc/flume-ng/conf/flume.conf
It prints some logs, but I get the error below, and it gets stuck after the HDFS sink has started.
16/09/25 05:18:36 WARN conf.FlumeConfiguration: Could not configure source Twitter due to: Component has no type. Cannot configure. Twitter
org.apache.flume.conf.ConfigurationException: Component has no type. Cannot configure. Twitter
at org.apache.flume.conf.ComponentConfiguration.configure(ComponentConfiguration.java:76)
at org.apache.flume.conf.source.SourceConfiguration.configure(SourceConfiguration.java:56)
at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSources(FlumeConfiguration.java:567)
at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:346)
at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.access$000(FlumeConfiguration.java:213)
at org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:127)
at org.apache.flume.conf.FlumeConfiguration.<init>(FlumeConfiguration.java:109)
at org.apache.flume.node.PropertiesFileConfigurationProvider.getFlumeConfiguration(PropertiesFileConfigurationProvider.java:189)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:89)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/09/25 05:18:36 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [TwitterAgent]
16/09/25 05:18:36 INFO node.AbstractConfigurationProvider: Creating channels
16/09/25 05:18:36 INFO channel.DefaultChannelFactory: Creating instance of channel MemChannel type memory
16/09/25 05:18:36 INFO node.AbstractConfigurationProvider: Created channel MemChannel
16/09/25 05:18:36 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
16/09/25 05:18:36 INFO node.AbstractConfigurationProvider: Channel MemChannel connected to [HDFS]
16/09/25 05:18:36 INFO node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3963542c counterGroup:{ name:null counters:{} } }} channels:{MemChannel=org.apache.flume.channel.MemoryChannel{name: MemChannel}} }
16/09/25 05:18:36 INFO node.Application: Starting Channel MemChannel
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: MemChannel: Successfully registered new MBean.
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: MemChannel started
16/09/25 05:18:36 INFO node.Application: Starting Sink HDFS
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
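The source is declared as Twitter (TwitterAgent.sources= Twitter), but its type is set on a component named TwitterSource, so Flume finds no type for Twitter and does not start it (note the empty sourceRunners:{} in the log). In the configuration file, replace
TwitterAgent.sources.TwitterSource.type=org.apache.flume.source.twitter.TwitterSource
with
TwitterAgent.sources.Twitter.type=org.apache.flume.source.twitter.TwitterSource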
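This kind of name mismatch is easy to miss by eye, so here is a rough sketch of a check you could run over a Flume properties file before starting the agent. It is not part of Flume; the helper names parse_properties and components_missing_type are made up for illustration, and the parser only handles plain key=value lines (no line continuations or escapes).

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def components_missing_type(props, agent):
    """Return (kind, name) pairs that are declared for the agent but have no .type key."""
    missing = []
    for kind in ("sources", "channels", "sinks"):
        # Component lists such as "TwitterAgent.sources" are whitespace-separated names.
        declared = props.get(f"{agent}.{kind}", "").split()
        for name in declared:
            if f"{agent}.{kind}.{name}.type" not in props:
                missing.append((kind, name))
    return missing


# Trimmed copy of the configuration from the question.
broken = """
TwitterAgent.sources= Twitter
TwitterAgent.channels= MemChannel
TwitterAgent.sinks=HDFS
TwitterAgent.sources.TwitterSource.type=org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.channels.MemChannel.type=memory
"""
# Same configuration with the source name corrected.
fixed = broken.replace("sources.TwitterSource.type", "sources.Twitter.type")

print(components_missing_type(parse_properties(broken), "TwitterAgent"))
print(components_missing_type(parse_properties(fixed), "TwitterAgent"))
```

With the original file the check reports the Twitter source as having no type, which matches the "Component has no type" warning; with the corrected key it reports nothing.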

How to run spark-shell with YARN in client mode?

I've installed spark-1.6.1-bin-hadoop2.6.tgz on a 15-node Hadoop cluster. All nodes run Java 1.8.0_72 and the latest version of Hadoop. The Hadoop cluster itself is functional, e.g. YARN can run various MapReduce jobs successfully.
I can run Spark Shell locally on a node without any problems with the following command: $SPARK_HOME/bin/spark-shell.
I can also run some Spark examples successfully, such as SparkPi using YARN and cluster mode.
But when I try to run Spark Shell on YARN with deploy mode client, I encounter problems:
hadoopu@hadoop2:~$ $SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client
16/03/21 15:15:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_72)
Type in expressions to have them evaluated.
Type :help for more information.
...
16/03/21 15:15:24 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/03/21 15:15:24 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/21 15:15:24 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/03/21 15:15:24 INFO SparkUI: Started SparkUI at http://10.108.57.32:4040
16/03/21 15:15:24 INFO RMProxy: Connecting to ResourceManager at hadoop2/10.108.57.32:8032
16/03/21 15:15:24 INFO Client: Requesting a new application from cluster with 13 NodeManagers
16/03/21 15:15:25 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (131072 MB per container)
16/03/21 15:15:25 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/03/21 15:15:25 INFO Client: Setting up container launch context for our AM
16/03/21 15:15:25 INFO Client: Setting up the launch environment for our AM container
16/03/21 15:15:25 INFO Client: Preparing resources for our AM container
16/03/21 15:15:25 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/03/21 15:15:25 INFO Client: Uploading resource file:/opt/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar -> hdfs://hadoop1:9000/user/hadoopu/.sparkStaging/application_1458568053208_0006/spark-assembly-1.6.1-hadoop2.6.0.jar
16/03/21 15:15:28 INFO Client: Uploading resource file:/tmp/spark-c9077c60-b379-439e-aeb4-85948df70df5/__spark_conf__7479505398141092205.zip -> hdfs://hadoop1:9000/user/hadoopu/.sparkStaging/application_1458568053208_0006/__spark_conf__7479505398141092205.zip
16/03/21 15:15:28 INFO SecurityManager: Changing view acls to: hadoopu
16/03/21 15:15:28 INFO SecurityManager: Changing modify acls to: hadoopu
16/03/21 15:15:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopu); users with modify permissions: Set(hadoopu)
16/03/21 15:15:28 INFO Client: Submitting application 6 to ResourceManager
16/03/21 15:15:28 INFO YarnClientImpl: Submitted application application_1458568053208_0006
16/03/21 15:15:29 INFO Client: Application report for application_1458568053208_0006 (state: ACCEPTED)
16/03/21 15:15:29 INFO Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1458569728506
final status: UNDEFINED
tracking URL: http://hadoop2:8088/proxy/application_1458568053208_0006/
user: hadoopu
16/03/21 15:15:30 INFO Client: Application report for application_1458568053208_0006 (state: ACCEPTED)
16/03/21 15:15:31 INFO Client: Application report for application_1458568053208_0006 (state: ACCEPTED)
16/03/21 15:15:32 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/03/21 15:15:32 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hadoop2, PROXY_URI_BASES -> http://hadoop2:8088/proxy/application_1458568053208_0006), /proxy/application_1458568053208_0006
16/03/21 15:15:32 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/03/21 15:15:32 INFO Client: Application report for application_1458568053208_0006 (state: RUNNING)
16/03/21 15:15:32 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.108.57.41
ApplicationMaster RPC port: 0
queue: default
start time: 1458569728506
final status: UNDEFINED
tracking URL: http://hadoop2:8088/proxy/application_1458568053208_0006/
user: hadoopu
16/03/21 15:15:32 INFO YarnClientSchedulerBackend: Application application_1458568053208_0006 has started running.
16/03/21 15:15:32 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 50170.
16/03/21 15:15:32 INFO NettyBlockTransferService: Server created on 50170
16/03/21 15:15:32 INFO BlockManagerMaster: Trying to register BlockManager
16/03/21 15:15:32 INFO BlockManagerMasterEndpoint: Registering block manager 10.108.57.32:50170 with 511.1 MB RAM, BlockManagerId(driver, 10.108.57.32, 50170)
16/03/21 15:15:32 INFO BlockManagerMaster: Registered BlockManager
16/03/21 15:15:37 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/03/21 15:15:37 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hadoop2, PROXY_URI_BASES -> http://hadoop2:8088/proxy/application_1458568053208_0006), /proxy/application_1458568053208_0006
16/03/21 15:15:37 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/03/21 15:15:39 ERROR YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
16/03/21 15:15:39 INFO SparkUI: Stopped Spark web UI at http://10.108.57.32:4040
16/03/21 15:15:39 INFO YarnClientSchedulerBackend: Shutting down all executors
16/03/21 15:15:39 INFO YarnClientSchedulerBackend: Asking each executor to shut down
16/03/21 15:15:39 INFO YarnClientSchedulerBackend: Stopped
16/03/21 15:15:39 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/03/21 15:15:39 INFO MemoryStore: MemoryStore cleared
16/03/21 15:15:39 INFO BlockManager: BlockManager stopped
16/03/21 15:15:39 INFO BlockManagerMaster: BlockManagerMaster stopped
16/03/21 15:15:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/03/21 15:15:39 INFO SparkContext: Successfully stopped SparkContext
16/03/21 15:15:39 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/03/21 15:15:39 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/03/21 15:15:39 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/03/21 15:15:54 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
16/03/21 15:15:54 ERROR SparkContext: Error initializing SparkContext.
java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
at $line3.$read$$iwC$$iwC.<init>(<console>:15)
at $line3.$read$$iwC.<init>(<console>:24)
at $line3.$read.<init>(<console>:26)
at $line3.$read$.<init>(<console>:30)
at $line3.$read$.<clinit>(<console>)
at $line3.$eval$.<init>(<console>:7)
at $line3.$eval$.<clinit>(<console>)
at $line3.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:125)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/03/21 15:15:54 INFO SparkContext: SparkContext already stopped.
java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
at $iwC$$iwC.<init>(<console>:15)
at $iwC.<init>(<console>:24)
at <init>(<console>:26)
at .<init>(<console>:30)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:125)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
java.lang.NullPointerException
at org.apache.spark.sql.SQLContext$.createListenerAndUI(SQLContext.scala:1367)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
at $iwC$$iwC.<init>(<console>:15)
at $iwC.<init>(<console>:24)
at <init>(<console>:26)
at .<init>(<console>:30)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:132)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
<console>:16: error: not found: value sqlContext
import sqlContext.implicits._
^
<console>:16: error: not found: value sqlContext
import sqlContext.sql
^
scala>
scala> sc
<console>:20: error: not found: value sc
sc
^
scala>
I also went to the YARN Web UI, found the Spark Shell in the list of FINISHED applications, and clicked on the application to see the logs. I found two nodes with stderr logs:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/ssd1/tmp/nm-local-dir/usercache/hadoopu/filecache/13/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-3.0.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/03/21 15:07:20 INFO ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
16/03/21 15:07:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/21 15:07:21 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1458568053208_0005_000002
16/03/21 15:07:22 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/03/21 15:07:22 INFO SecurityManager: Changing view acls to: hadoopu
16/03/21 15:07:22 INFO SecurityManager: Changing modify acls to: hadoopu
16/03/21 15:07:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopu); users with modify permissions: Set(hadoopu)
16/03/21 15:07:22 INFO ApplicationMaster: Waiting for Spark driver to be reachable.
16/03/21 15:07:22 INFO ApplicationMaster: Driver now available: 10.108.57.32:39824
16/03/21 15:07:22 INFO ApplicationMaster$AMEndpoint: Add WebUI Filter. AddWebUIFilter(org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,Map(PROXY_HOSTS -> hadoop2, PROXY_URI_BASES -> http://hadoop2:8088/proxy/application_1458568053208_0005),/proxy/application_1458568053208_0005)
16/03/21 15:07:22 INFO RMProxy: Connecting to ResourceManager at hadoop2/10.108.57.32:8030
16/03/21 15:07:22 INFO YarnRMClient: Registering the ApplicationMaster
16/03/21 15:07:22 INFO YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead
16/03/21 15:07:22 INFO YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
16/03/21 15:07:22 INFO YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
16/03/21 15:07:22 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
16/03/21 15:07:23 INFO AMRMClientImpl: Received new token for : hadoop14:32420
16/03/21 15:07:23 INFO AMRMClientImpl: Received new token for : hadoop3:35904
16/03/21 15:07:23 INFO YarnAllocator: Launching container container_1458568053208_0005_02_000002 for on host hadoop14
16/03/21 15:07:23 INFO YarnAllocator: Launching ExecutorRunnable. driverUrl: spark://CoarseGrainedScheduler@10.108.57.32:39824, executorHostname: hadoop14
16/03/21 15:07:23 INFO YarnAllocator: Launching container container_1458568053208_0005_02_000003 for on host hadoop3
16/03/21 15:07:23 INFO ExecutorRunnable: Starting Executor Container
16/03/21 15:07:23 INFO YarnAllocator: Launching ExecutorRunnable. driverUrl: spark://CoarseGrainedScheduler@10.108.57.32:39824, executorHostname: hadoop3
16/03/21 15:07:23 INFO ExecutorRunnable: Starting Executor Container
16/03/21 15:07:23 INFO YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them.
16/03/21 15:07:23 INFO ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
16/03/21 15:07:23 INFO ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
16/03/21 15:07:23 INFO ExecutorRunnable: Setting up ContainerLaunchContext
16/03/21 15:07:23 INFO ExecutorRunnable: Setting up ContainerLaunchContext
16/03/21 15:07:23 INFO ExecutorRunnable: Preparing Local resources
16/03/21 15:07:23 INFO ExecutorRunnable: Preparing Local resources
16/03/21 15:07:23 INFO ExecutorRunnable: Prepared Local resources Map(__spark__.jar -> resource { scheme: "hdfs" host: "hadoop1" port: 9000 file: "/user/hadoopu/.sparkStaging/application_1458568053208_0005/spark-assembly-1.6.1-hadoop2.6.0.jar" } size: 187698038 timestamp: 1458569230874 type: FILE visibility: PRIVATE)
16/03/21 15:07:23 INFO ExecutorRunnable: Prepared Local resources Map(__spark__.jar -> resource { scheme: "hdfs" host: "hadoop1" port: 9000 file: "/user/hadoopu/.sparkStaging/application_1458568053208_0005/spark-assembly-1.6.1-hadoop2.6.0.jar" } size: 187698038 timestamp: 1458569230874 type: FILE visibility: PRIVATE)
16/03/21 15:07:23 INFO ExecutorRunnable:
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>$HADOOP_PREFIX/share/hadoop/tools/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
SPARK_LOG_URL_STDERR -> http://hadoop3:8042/node/containerlogs/container_1458568053208_0005_02_000003/hadoopu/stderr?start=-4096
SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1458568053208_0005
SPARK_YARN_CACHE_FILES_FILE_SIZES -> 187698038
SPARK_USER -> hadoopu
SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE
SPARK_YARN_MODE -> true
SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1458569230874
SPARK_LOG_URL_STDOUT -> http://hadoop3:8042/node/containerlogs/container_1458568053208_0005_02_000003/hadoopu/stdout?start=-4096
SPARK_YARN_CACHE_FILES -> hdfs://hadoop1:9000/user/hadoopu/.sparkStaging/application_1458568053208_0005/spark-assembly-1.6.1-hadoop2.6.0.jar#__spark__.jar
command:
{{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms1024m -Xmx1024m -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.driver.port=39824' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.108.57.32:39824 --executor-id 2 --hostname hadoop3 --cores 1 --app-id application_1458568053208_0005 --user-class-path file:$PWD/__app__.jar 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
===============================================================================
16/03/21 15:07:23 INFO ExecutorRunnable:
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>$HADOOP_PREFIX/share/hadoop/tools/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
SPARK_LOG_URL_STDERR -> http://hadoop14:8042/node/containerlogs/container_1458568053208_0005_02_000002/hadoopu/stderr?start=-4096
SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1458568053208_0005
SPARK_YARN_CACHE_FILES_FILE_SIZES -> 187698038
SPARK_USER -> hadoopu
SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE
SPARK_YARN_MODE -> true
SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1458569230874
SPARK_LOG_URL_STDOUT -> http://hadoop14:8042/node/containerlogs/container_1458568053208_0005_02_000002/hadoopu/stdout?start=-4096
SPARK_YARN_CACHE_FILES -> hdfs://hadoop1:9000/user/hadoopu/.sparkStaging/application_1458568053208_0005/spark-assembly-1.6.1-hadoop2.6.0.jar#__spark__.jar
command:
{{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms1024m -Xmx1024m -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.driver.port=39824' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.108.57.32:39824 --executor-id 1 --hostname hadoop14 --cores 1 --app-id application_1458568053208_0005 --user-class-path file:$PWD/__app__.jar 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
===============================================================================
...
16/03/21 15:07:25 ERROR ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
16/03/21 15:07:25 INFO ApplicationMaster: Final app status: UNDEFINED, exitCode: 0, (reason: Shutdown hook called before final status was reported.)
16/03/21 15:07:25 INFO ApplicationMaster: Unregistering ApplicationMaster with UNDEFINED (diag message: Shutdown hook called before final status was reported.)
16/03/21 15:07:25 INFO AMRMClientImpl: Waiting for application to be successfully unregistered.
16/03/21 15:07:25 INFO ApplicationMaster: Deleting staging directory .sparkStaging/application_1458568053208_0005
16/03/21 15:07:25 INFO ShutdownHookManager: Shutdown hook called
Any ideas why I can't run Spark Shell on YARN with client mode?
I had the same issue. It turned out to be a firewall between my login node and the cluster: the cluster was trying to connect back to the login node on a random port that was blocked. Either remove the firewall rules, or move your shell to one of the nodes of the cluster where there aren't any firewall rules that block access.
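To check whether this is what is happening, you can probe the driver's address from one of the cluster nodes while the shell is starting. The ApplicationMaster log in the question prints it ("Driver now available: 10.108.57.32:39824"). Below is a minimal sketch of such a probe; can_connect is a made-up helper, not part of Spark or Hadoop, and the demonstration at the bottom only talks to localhost so it can be run anywhere.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Local demonstration: listen on an ephemeral port and probe it,
# then close the listener and probe the same port again.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
open_port = listener.getsockname()[1]

print("while listening:", can_connect("127.0.0.1", open_port))
listener.close()
print("after close:", can_connect("127.0.0.1", open_port))

# On a cluster node you would instead call, for example,
# can_connect("10.108.57.32", 39824) with the host and port from your own log.
```

If the probe fails from the cluster nodes but succeeds on the login node itself, a firewall in between is a likely cause. Because the driver port is random by default, it may also help to pin it with the spark.driver.port setting so that a single firewall rule can cover it; treat that as a suggestion to verify against your Spark version rather than a confirmed fix.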

Flume not writing logs to Hdfs

I configured Flume to write my apache2 access logs to HDFS. As far as I can tell from the Flume logs, the configuration is correct, but I don't know why it is still not writing to HDFS.
Here is my Flume config file:
#agent and component of agent
search.sources = so
search.sinks = si
search.channels = sc
# Configure a channel that buffers events in memory:
search.channels.sc.type = memory
search.channels.sc.capacity = 20000
search.channels.sc.transactionCapacity = 100
# Configure the source:
search.sources.so.channels = sc
search.sources.so.type = exec
search.sources.so.command = tail -F /var/log/apache2/access.log
# Describe the sink:
search.sinks.si.channel = sc
search.sinks.si.type = hdfs
search.sinks.si.hdfs.path = hdfs://localhost:9000/flumelogs/
search.sinks.si.hdfs.writeFormat = Text
search.sinks.si.hdfs.fileType = DataStream
search.sinks.si.hdfs.rollSize=0
search.sinks.si.hdfs.rollCount = 10000
search.sinks.si.hdfs.batchSize=1000
search.sinks.si.rollInterval=1
And here are my Flume logs:
14/12/18 17:47:56 INFO node.AbstractConfigurationProvider: Creating channels
14/12/18 17:47:56 INFO channel.DefaultChannelFactory: Creating instance of channel sc type memory
14/12/18 17:47:56 INFO node.AbstractConfigurationProvider: Created channel sc
14/12/18 17:47:56 INFO source.DefaultSourceFactory: Creating instance of source so, type exec
14/12/18 17:47:56 INFO sink.DefaultSinkFactory: Creating instance of sink: si, type: hdfs
14/12/18 17:47:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/18 17:47:56 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
14/12/18 17:47:56 INFO node.AbstractConfigurationProvider: Channel sc connected to [so, si]
14/12/18 17:47:56 INFO node.Application: Starting new configuration:{ sourceRunners:{so=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:so,state:IDLE} }} sinkRunners:{si=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3de76481 counterGroup:{ name:null counters:{} } }} channels:{sc=org.apache.flume.channel.MemoryChannel{name: sc}} }
14/12/18 17:47:56 INFO node.Application: Starting Channel sc
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: sc: Successfully registered new MBean.
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: sc started
14/12/18 17:47:56 INFO node.Application: Starting Sink si
14/12/18 17:47:56 INFO node.Application: Starting Source so
14/12/18 17:47:56 INFO source.ExecSource: Exec source starting with command:tail -F /var/log/apache2/access.log
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: si: Successfully registered new MBean.
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: si started
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: so: Successfully registered new MBean.
14/12/18 17:47:56 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: so started
This is the command I used to start Flume:
flume-ng agent -n search -c conf -f ../conf/flume-conf-search
And I created this path in HDFS:
hadoop fs -mkdir hdfs://localhost:9000/flumelogs
I can see new entries arriving in the Apache2 access log, but Flume is not sending them to the /flumelogs directory in HDFS, and I don't know why. Please help!
I don't think it's a permission issue; you would see exceptions when Flume flushes to HDFS. There are two possible reasons for this problem:
1) There is not enough data in the buffer, so Flume doesn't think it has to flush yet. Your sink batch size is 1000 and your channel's capacity is 20000. To verify this, press Ctrl-C to stop your Flume process; that will force it to flush to HDFS.
2) The more probable reason is that your exec source is not running properly. This can be due to a path problem with the tail command. Use the full path to tail in your command, such as /bin/tail -F /var/log/apache2/access.log or /usr/bin/tail -F /var/log/apache2/access.log (depending on your system). Run
which tail
to find the correct path.
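The "which tail" step can also be scripted. The sketch below is an illustration only: the function name absolute_exec_command and its search_path parameter are hypothetical, and it assumes the script runs as the same user, with the same PATH, as the Flume agent. It resolves the program in an exec-source command to an absolute path so the result can be pasted into the config.

```python
import shlex
import shutil
from typing import Optional


def absolute_exec_command(command: str, search_path: Optional[str] = None) -> str:
    """Rewrite a command so that its program is given by absolute path.

    search_path is a PATH-style string; None means the current PATH.
    """
    parts = shlex.split(command)
    if not parts:
        raise ValueError("empty command")
    # shutil.which mirrors the shell's `which`: it returns None if not found.
    resolved = shutil.which(parts[0], path=search_path)
    if resolved is None:
        raise FileNotFoundError(f"{parts[0]!r} not found on the search path")
    return shlex.join([resolved] + parts[1:])


# Hypothetical usage with the command from the question:
#   absolute_exec_command("tail -F /var/log/apache2/access.log")
```

A FileNotFoundError here would suggest that the agent's environment cannot find tail at all, which fits the second explanation above.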
Could you please check the permissions on this folder: hdfs://localhost:9000/flumelogs/
My guess is that Flume doesn't have permission to write to that folder.

Flume NG not writing to HDFS

I'm new to Flume and Hadoop, so I'm trying to set up the simplest (but somewhat helpful/realistic) example I can. I'm using the HortonWorks Sandbox in a VM client. After following tutorial 12 (which involves setting up and using Flume), everything seems to be working correctly.
So I set up my own flume.conf that should:
1) Read from an Apache access log
2) Use a memory channel
3) Write to HDFS
Simple enough, right? Here's my conf file:
agent.sources=exec-source
agent.sinks=hdfs-sink
agent.channels=ch1
agent.sources.exec-source.type=exec
agent.sources.exec-source.command=tail -F /var/log/httpd/access_log
agent.sinks.hdfs-sink.type=hdfs
agent.sinks.hdfs-sink.hdfs.path=/flume/events
agent.sinks.hdfs-sink.hdfs.filePrefix=apacheaccess
agent.sinks.hdfs-sink.hdfs.rollInterval=10
agent.sinks.hdfs-sink.hdfs.rollSize=0
agent.channels.ch1.type=memory
agent.channels.ch1.capacity=1000
agent.sources.exec-source.channels=ch1
agent.sinks.hdfs-sink.channel=ch1
I've seen several people have problems writing to HDFS, and in most cases it was that there weren't enough logs to fill the HDFS block. However, rollInterval=10 should generate a new file every 10 seconds, as long as at least 1 line is written to it. I can run "tail -F /var/log/httpd/access_log" in another window and see lines being written to the log fairly consistently, so I don't think it's that.
Here's the command and output from trying to start this agent:
[root@sandbox ~]# flume-ng agent -f /etc/flume/conf/flume.conf -n apache-agent
Warning: No configuration directory set! Use --conf <dir> to override.
Info: Including Hadoop libraries found via (/usr/bin/hadoop) for HDFS access
Info: Excluding /usr/lib/hadoop/libexec/../lib/slf4j-api-1.4.3.jar from classpath
Info: Excluding /usr/lib/hadoop/libexec/../lib/slf4j-log4j12-1.4.3.jar from classpath
Info: Including HBASE libraries found via (/usr/bin/hbase) for HBASE access
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-api-1.6.1.jar from classpath
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-log4j12-1.6.1.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.4.3.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12-1.4.3.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-api-1.6.1.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar from classpath
Info: Excluding /usr/lib/hadoop/libexec/../lib/slf4j-api-1.4.3.jar from classpath
Info: Excluding /usr/lib/hadoop/libexec/../lib/slf4j-log4j12-1.4.3.jar from classpath
+ exec /usr/jdk/jdk1.6.0_31//bin/java -Xmx20m -cp '/usr/lib/flume/lib/*:/usr/lib/hadoop/libexec/../conf:/usr/jdk/jdk1.6.0_31/lib/tools.jar:/usr/lib/hadoop/libexec/..:/usr/lib/hadoop/libexec/../hadoop-core-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/ambari-log4j-1.2.3.7.jar:/usr/lib/hadoop/libexec/../lib/asm-3.2.jar:/usr/lib/hadoop/libexec/../lib/aspectjrt-1.6.11.jar:/usr/lib/hadoop/libexec/../lib/aspectjtools-1.6.11.jar:/usr/lib/hadoop/libexec/../lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/libexec/../lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/libexec/../lib/commons-cli-1.2.jar:/usr/lib/hadoop/libexec/../lib/commons-codec-1.4.jar:/usr/lib/hadoop/libexec/../lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-configuration-1.6.jar:/usr/lib/hadoop/libexec/../lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop/libexec/../lib/commons-digester-1.8.jar:/usr/lib/hadoop/libexec/../lib/commons-el-1.0.jar:/usr/lib/hadoop/libexec/../lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/libexec/../lib/commons-io-2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-lang-2.4.jar:/usr/lib/hadoop/libexec/../lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/libexec/../lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/libexec/../lib/commons-math-2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-net-3.1.jar:/usr/lib/hadoop/libexec/../lib/core-3.1.1.jar:/usr/lib/hadoop/libexec/../lib/guava-11.0.2.jar:/usr/lib/hadoop/libexec/../lib/hadoop-capacity-scheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-fairscheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-lzo-0.5.0.jar:/usr/lib/hadoop/libexec/../lib/hadoop-thriftfs-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-tools.jar:/usr/lib/hadoop/libexec/../lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop/libexec/../lib/hue-plugins-2.2.0-SNAPSHOT.jar:/usr/lib/hadoop/libexec/../lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/libexec/..
/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop/libexec/../lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop/libexec/../lib/jdeb-0.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-core-1.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-json-1.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-server-1.8.jar:/usr/lib/hadoop/libexec/../lib/jets3t-0.6.1.jar:/usr/lib/hadoop/libexec/../lib/jetty-6.1.26.jar:/usr/lib/hadoop/libexec/../lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/libexec/../lib/jsch-0.1.42.jar:/usr/lib/hadoop/libexec/../lib/junit-4.5.jar:/usr/lib/hadoop/libexec/../lib/kfs-0.2.2.jar:/usr/lib/hadoop/libexec/../lib/log4j-1.2.15.jar:/usr/lib/hadoop/libexec/../lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/libexec/../lib/netty-3.6.2.Final.jar:/usr/lib/hadoop/libexec/../lib/oro-2.0.8.jar:/usr/lib/hadoop/libexec/../lib/postgresql-9.1-901-1.jdbc4.jar:/usr/lib/hadoop/libexec/../lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop/libexec/../lib/xmlenc-0.52.jar:/usr/lib/hadoop/libexec/../lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop/libexec/../lib/jsp-2.1/jsp-api-2.1.jar:/usr/lib/hbase/bin/../conf:/usr/jdk/jdk1.6.0_31/lib/tools.jar:/usr/lib/hbase/bin/..:/usr/lib/hbase/bin/../hbase-0.94.6.1.3.0.0-107-security.jar:/usr/lib/hbase/bin/../hbase-0.94.6.1.3.0.0-107-security-tests.jar:/usr/lib/hbase/bin/../lib/activation-1.1.jar:/usr/lib/hbase/bin/../lib/asm-3.1.jar:/usr/lib/hbase/bin/../lib/avro-1.5.3.jar:/usr/lib/hbase/bin/../lib/avro-ipc-1.5.3.jar:/usr/lib/hbase/bin/../lib/commons-beanutils-1.7.0.jar:/usr/lib/hbase/bin/../lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hbase/bin/../lib/commons-cli-1.2.jar:/usr/lib/hbase/bin/../lib/commons-codec-1.4.jar:/usr/lib/hbase/bin/../lib/commons-collections-3.2.1.jar:/usr/lib/hbase/bin/../lib/commons-configuration-1.6.jar:/usr/lib/hbase/bin/../lib/commons-digester-1.8.jar:/usr/lib/hbase/bin/../lib/commons-el-1.0.jar:/usr/lib/hbase/bin/../lib/commons-httpclient-3.1.jar:/usr/lib/hbase/bin/../lib/commons-io-2.1.jar:/usr/lib/hbase/bin/../lib/commons-lang-2.5.jar:/usr/lib/hbas
e/bin/../lib/commons-logging-1.1.1.jar:/usr/lib/hbase/bin/../lib/commons-math-2.1.jar:/usr/lib/hbase/bin/../lib/commons-net-1.4.1.jar:/usr/lib/hbase/bin/../lib/core-3.1.1.jar:/usr/lib/hbase/bin/../lib/guava-11.0.2.jar:/usr/lib/hbase/bin/../lib/hadoop-core.jar:/usr/lib/hbase/bin/../lib/high-scale-lib-1.1.1.jar:/usr/lib/hbase/bin/../lib/httpclient-4.1.2.jar:/usr/lib/hbase/bin/../lib/httpcore-4.1.3.jar:/usr/lib/hbase/bin/../lib/jackson-core-asl-1.8.8.jar:/usr/lib/hbase/bin/../lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hbase/bin/../lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hbase/bin/../lib/jackson-xc-1.8.8.jar:/usr/lib/hbase/bin/../lib/jamon-runtime-2.3.1.jar:/usr/lib/hbase/bin/../lib/jasper-compiler-5.5.23.jar:/usr/lib/hbase/bin/../lib/jasper-runtime-5.5.23.jar:/usr/lib/hbase/bin/../lib/jaxb-api-2.1.jar:/usr/lib/hbase/bin/../lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hbase/bin/../lib/jersey-core-1.8.jar:/usr/lib/hbase/bin/../lib/jersey-json-1.8.jar:/usr/lib/hbase/bin/../lib/jersey-server-1.8.jar:/usr/lib/hbase/bin/../lib/jettison-1.1.jar:/usr/lib/hbase/bin/../lib/jetty-6.1.26.jar:/usr/lib/hbase/bin/../lib/jetty-util-6.1.26.jar:/usr/lib/hbase/bin/../lib/jruby-complete-1.6.5.jar:/usr/lib/hbase/bin/../lib/jsp-2.1-6.1.14.jar:/usr/lib/hbase/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/lib/hbase/bin/../lib/jsr305-1.3.9.jar:/usr/lib/hbase/bin/../lib/junit-4.10-HBASE-1.jar:/usr/lib/hbase/bin/../lib/libthrift-0.8.0.jar:/usr/lib/hbase/bin/../lib/log4j-1.2.16.jar:/usr/lib/hbase/bin/../lib/metrics-core-2.1.2.jar:/usr/lib/hbase/bin/../lib/netty-3.2.4.Final.jar:/usr/lib/hbase/bin/../lib/protobuf-java-2.4.0a.jar:/usr/lib/hbase/bin/../lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hbase/bin/../lib/snappy-java-1.0.3.2.jar:/usr/lib/hbase/bin/../lib/stax-api-1.0.1.jar:/usr/lib/hbase/bin/../lib/velocity-1.7.jar:/usr/lib/hbase/bin/../lib/xmlenc-0.52.jar:/usr/lib/hbase/bin/../lib/zookeeper.jar:/etc/hadoop/conf:/usr/lib/hadoop/bin:/usr/lib/hadoop/build.xml:/usr/lib/hadoop/CHANGES.txt:/usr/lib/hadoop/conf:/usr/lib/hado
op/contrib:/usr/lib/hadoop/hadoop-ant-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-ant.jar:/usr/lib/hadoop/hadoop-client-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-client.jar:/usr/lib/hadoop/hadoop-core-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-core.jar:/usr/lib/hadoop/hadoop-examples-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-examples.jar:/usr/lib/hadoop/hadoop-minicluster-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-minicluster.jar:/usr/lib/hadoop/hadoop-test-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-test.jar:/usr/lib/hadoop/hadoop-tools-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-tools.jar:/usr/lib/hadoop/HDP-CHANGES.txt:/usr/lib/hadoop/ivy:/usr/lib/hadoop/ivy.xml:/usr/lib/hadoop/lib:/usr/lib/hadoop/libexec:/usr/lib/hadoop/LICENSE.txt:/usr/lib/hadoop/logs:/usr/lib/hadoop/LONGWING-CHANGES.txt:/usr/lib/hadoop/NOTICE.txt:/usr/lib/hadoop/pids:/usr/lib/hadoop/README.txt:/usr/lib/hadoop/sbin:/usr/lib/hadoop/webapps:/usr/lib/hadoop/lib/ambari-log4j-1.2.3.7.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/aspectjrt-1.6.11.jar:/usr/lib/hadoop/lib/aspectjtools-1.6.11.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-lang-2.4.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/core-3.1.1.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/hadoop-capacity-scheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/lib/hadoop-fairscheduler-1.
2.0.1.3.0.0-107.jar:/usr/lib/hadoop/lib/hadoop-lzo-0.5.0.jar:/usr/lib/hadoop/lib/hadoop-thriftfs-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/lib/hadoop-tools.jar:/usr/lib/hadoop/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop/lib/hsqldb-1.8.0.10.LICENSE.txt:/usr/lib/hadoop/lib/hue-plugins-2.2.0-SNAPSHOT.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop/lib/jdeb-0.8.jar:/usr/lib/hadoop/lib/jdiff:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/jsp-2.1:/usr/lib/hadoop/lib/junit-4.5.jar:/usr/lib/hadoop/lib/kfs-0.2.2.jar:/usr/lib/hadoop/lib/kfs-0.2.LICENSE.txt:/usr/lib/hadoop/lib/log4j-1.2.15.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/netty-3.6.2.Final.jar:/usr/lib/hadoop/lib/oro-2.0.8.jar:/usr/lib/hadoop/lib/hue-plugins-2.2.0-SNAPSHOT.jar:/usr/lib/hadoop/lib/hue-plugins-2.2.0-SNAPSHOT.jar:/usr/lib/hadoop/lib/*plugin*jar:/usr/lib/hadoop/lib/postgresql-9.1-901-1.jdbc4.jar:/usr/lib/hadoop/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/zookeeper/bin:/usr/lib/zookeeper/conf:/usr/lib/zookeeper/lib:/usr/lib/zookeeper/zookeeper-3.4.5.1.3.0.0-107.jar:/usr/lib/zookeeper/zookeeper.jar:/usr/lib/zookeeper/lib/ant-1.8.0.jar:/usr/lib/zookeeper/lib/ant-launcher-1.8.0.jar:/usr/lib/zookeeper/lib/backport-util-concurrent-3.1.jar:/usr/lib/zookeeper/lib/classworlds-1.1-alpha-2.jar:/usr/lib/zookeeper/lib/commons-codec-1.6.jar:/usr/lib/zookeeper/lib/commons-io-2.2.jar:/usr/lib/zookeeper/lib/commons-logging-1.1.1.jar:/usr/lib/zookeeper/lib/httpclient-4.2.3.jar:/usr/lib/zookeeper/lib/httpcore-4.2.3.jar:/usr/lib/zookeeper/lib/jline-
0.9.94.jar:/usr/lib/zookeeper/lib/jsoup-1.7.1.jar:/usr/lib/zookeeper/lib/log4j-1.2.15.jar:/usr/lib/zookeeper/lib/maven-ant-tasks-2.1.3.jar:/usr/lib/zookeeper/lib/maven-artifact-2.2.1.jar:/usr/lib/zookeeper/lib/maven-artifact-manager-2.2.1.jar:/usr/lib/zookeeper/lib/maven-error-diagnostics-2.2.1.jar:/usr/lib/zookeeper/lib/maven-model-2.2.1.jar:/usr/lib/zookeeper/lib/maven-plugin-registry-2.2.1.jar:/usr/lib/zookeeper/lib/maven-profile-2.2.1.jar:/usr/lib/zookeeper/lib/maven-project-2.2.1.jar:/usr/lib/zookeeper/lib/maven-repository-metadata-2.2.1.jar:/usr/lib/zookeeper/lib/maven-settings-2.2.1.jar:/usr/lib/zookeeper/lib/nekohtml-1.9.6.2.jar:/usr/lib/zookeeper/lib/netty-3.2.2.Final.jar:/usr/lib/zookeeper/lib/plexus-container-default-1.0-alpha-9-stable-1.jar:/usr/lib/zookeeper/lib/plexus-interpolation-1.11.jar:/usr/lib/zookeeper/lib/plexus-utils-3.0.8.jar:/usr/lib/zookeeper/lib/wagon-file-1.0-beta-6.jar:/usr/lib/zookeeper/lib/wagon-http-2.4.jar:/usr/lib/zookeeper/lib/wagon-http-lightweight-1.0-beta-6.jar:/usr/lib/zookeeper/lib/wagon-http-shared-1.0-beta-6.jar:/usr/lib/zookeeper/lib/wagon-http-shared4-2.4.jar:/usr/lib/zookeeper/lib/wagon-provider-api-2.4.jar:/usr/lib/zookeeper/lib/xercesMinimal-1.9.6.2.jar:/usr/lib/hadoop/libexec/../conf:/usr/jdk/jdk1.6.0_31/lib/tools.jar:/usr/lib/hadoop/libexec/..:/usr/lib/hadoop/libexec/../hadoop-core-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/ambari-log4j-1.2.3.7.jar:/usr/lib/hadoop/libexec/../lib/asm-3.2.jar:/usr/lib/hadoop/libexec/../lib/aspectjrt-1.6.11.jar:/usr/lib/hadoop/libexec/../lib/aspectjtools-1.6.11.jar:/usr/lib/hadoop/libexec/../lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/libexec/../lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/libexec/../lib/commons-cli-1.2.jar:/usr/lib/hadoop/libexec/../lib/commons-codec-1.4.jar:/usr/lib/hadoop/libexec/../lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-configuration-1.6.jar:/usr/lib/hadoop/libexec/../lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop
/libexec/../lib/commons-digester-1.8.jar:/usr/lib/hadoop/libexec/../lib/commons-el-1.0.jar:/usr/lib/hadoop/libexec/../lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/libexec/../lib/commons-io-2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-lang-2.4.jar:/usr/lib/hadoop/libexec/../lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/libexec/../lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/libexec/../lib/commons-math-2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-net-3.1.jar:/usr/lib/hadoop/libexec/../lib/core-3.1.1.jar:/usr/lib/hadoop/libexec/../lib/guava-11.0.2.jar:/usr/lib/hadoop/libexec/../lib/hadoop-capacity-scheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-fairscheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-lzo-0.5.0.jar:/usr/lib/hadoop/libexec/../lib/hadoop-thriftfs-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-tools.jar:/usr/lib/hadoop/libexec/../lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop/libexec/../lib/hue-plugins-2.2.0-SNAPSHOT.jar:/usr/lib/hadoop/libexec/../lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/libexec/../lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop/libexec/../lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop/libexec/../lib/jdeb-0.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-core-1.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-json-1.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-server-1.8.jar:/usr/lib/hadoop/libexec/../lib/jets3t-0.6.1.jar:/usr/lib/hadoop/libexec/../lib/jetty-6.1.26.jar:/usr/lib/hadoop/libexec/../lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/libexec/../lib/jsch-0.1.42.jar:/usr/lib/hadoop/libexec/../lib/junit-4.5.jar:/usr/lib/hadoop/libexec/../lib/kfs-0.2.2.jar:/usr/lib/hadoop/libexec/../lib/log4j-1.2.15.jar:/usr/lib/hadoop/libexec/../lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/libexec/../lib/netty-3.6.2.Final.jar:/usr/lib/hadoop/libexec/../lib/oro-2.0.8.jar:/usr/lib/hadoop/libexec/../lib/postgresql-9.1-901-1.jdbc4.jar:/usr/lib/hadoop/li
bexec/../lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop/libexec/../lib/xmlenc-0.52.jar:/usr/lib/hadoop/libexec/../lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop/libexec/../lib/jsp-2.1/jsp-api-2.1.jar:/conf' -Djava.library.path=:/usr/lib/hadoop/libexec/../lib/native/Linux-amd64-64:/usr/lib/hadoop/libexec/../lib/native/Linux-amd64-64:/usr/lib/hbase/bin/../lib/native/Linux-amd64-64 org.apache.flume.node.Application -f /etc/flume/conf/flume.conf -n apache-agent
13/09/03 12:35:11 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
13/09/03 12:35:11 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/etc/flume/conf/flume.conf
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Added sinks: hdfs-sink Agent: agent
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent]
13/09/03 12:35:11 WARN node.AbstractConfigurationProvider: No configuration found for this host:apache-agent
13/09/03 12:35:11 INFO node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{} channels:{} }
Now at this point I realize I'm missing several things.
1) I expect to see something along the lines of "INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: hdfs-sink started" as my last line, but I don't.
2) If I use the command “hadoop fs -lsr /flume” I should see new logs in my HDFS, but I don't. The last logs are from 8/28/2013, when I did the tutorial.
I also don't expect to see that WARN line in there, but I'm not sure why it's there, so maybe that's my problem and someone can tell me why.
So my questions are:
1) Can anyone tell me what might be going wrong here?
2) When I get this problem sorted out, is there anything else I should be looking for to see that Flume is working correctly, reading what it should and writing to where it should, when it should?
The answer is, of course, to name your agent when you start flume the same as your agent name in the config file. So my command line should have ended "-n agent" and NOT "-n apache-agent" since my flume.conf file specifies "agent.X"
After that everything appears to work.
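This mismatch can be caught before starting the agent. The following is a minimal sketch under simple assumptions: it treats the first dot-separated component of every key as an agent name, and the function names agent_names and check_agent_name are made up for illustration (they are not part of Flume).

```python
from typing import Set


def agent_names(config_text: str) -> Set[str]:
    """Collect the agent names used as the first key component in a Flume config."""
    names = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (properties files allow '#' and '!').
        if not line or line[0] in "#!":
            continue
        key = line.split("=", 1)[0].strip()
        if "." in key:
            names.add(key.split(".", 1)[0])
    return names


def check_agent_name(config_text: str, requested: str) -> bool:
    """Return True if the name passed to -n is defined in the config."""
    return requested in agent_names(config_text)


# Hypothetical usage with the file and name from the question:
#   with open("/etc/flume/conf/flume.conf") as fh:
#       print(check_agent_name(fh.read(), "apache-agent"))
```

For the config in the question, the only name found would be agent, so a check for apache-agent fails, which matches the "No configuration found for this host:apache-agent" warning in the log.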
In the config file you specified
agent.sources=exec-source
agent.sinks=hdfs-sink
agent.channels=ch1
So the agent name is 'agent'. Flume expects the name you pass when starting the agent to match the name used in the config file, so the command should be
/usr/lib/flume/bin/flume-ng agent -n agent
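Once the names match, the startup output should include "Component type: ..., name: ... started" lines like the one the question expected to see. As a rough sketch, assuming that log format (it is taken from the log lines quoted in these threads and may differ between Flume versions), the hypothetical helper below lists which components reported that they started.

```python
import re
from typing import Set, Tuple

# Matches lines such as:
#   ... INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: hdfs-sink started
STARTED = re.compile(r"Component type: (\w+), name: (\S+) started")


def started_components(log_text: str) -> Set[Tuple[str, str]]:
    """Return (type, name) pairs for every component that logged a start."""
    return {(m.group(1), m.group(2)) for m in STARTED.finditer(log_text)}


def missing_components(log_text: str, expected: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return the expected components that never logged a start."""
    return expected - started_components(log_text)
```

An empty result from started_components, as with the log in the question, suggests the agent loaded no sources, sinks, or channels at all.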
Did you set the agent in step #3?
Check out the original blog post, the Hadoop UI Hue, and its Hadoop tutorials.

Resources