Spark Streaming from Kafka returns results in local mode but does not work on YARN - hadoop

I am using the Cloudera quickstart VM with CDH 5.12, Spark v1.6, Kafka v0.10 (installed via yum), Python 2.6.6 and Scala 2.10.
Below is the simple Spark application I am running. It takes events from Kafka and prints the word counts after a map-reduce step.
from __future__ import print_function
import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: kafka_wordcount.py <zk> <topic>", file=sys.stderr)
        exit(-1)

    sc = SparkContext(appName="PythonStreamingKafkaWordCount")
    ssc = StreamingContext(sc, 1)

    zkQuorum, topic = sys.argv[1:]
    kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
    lines = kvs.map(lambda x: x[1])
    counts = lines.flatMap(lambda line: line.split(" ")) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()
When I submit the above code with the following command (local mode), it runs fine:
spark-submit --master local[2] --jars /usr/lib/spark/lib/spark-examples.jar testfile.py <ZKhostname>:2181 <kafka-topic>
But when I submit the same code with the following command (YARN), it doesn't work:
spark-submit --master yarn --deploy-mode client --jars /usr/lib/spark/lib/spark-examples.jar testfile.py <ZKhostname>:2181 <kafka-topic>
Here is the log generated when run on YARN (trimmed for brevity; the log may differ slightly from the Spark settings mentioned above):
INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.134.143
ApplicationMaster RPC port: 0
queue: root.cloudera
start time: 1515766709025
final status: UNDEFINED
tracking URL: http://quickstart.cloudera:8088/proxy/application_1515761416282_0010/
user: cloudera
40 INFO YarnClientSchedulerBackend: Application application_1515761416282_0010 has started running.
40 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 53694.
40 INFO NettyBlockTransferService: Server created on 53694
53 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
54 INFO BlockManagerMasterEndpoint: Registering block manager quickstart.cloudera:56220 with 534.5 MB RAM, BlockManagerId(1, quickstart.cloudera, 56220)
07 INFO ReceiverTracker: Starting 1 receivers
07 INFO ReceiverTracker: ReceiverTracker started
07 INFO PythonTransformedDStream: metadataCleanupDelay = -1
07 INFO KafkaInputDStream: metadataCleanupDelay = -1
07 INFO KafkaInputDStream: Slide time = 10000 ms
07 INFO KafkaInputDStream: Storage level = StorageLevel(false, false, false, false, 1)
07 INFO KafkaInputDStream: Checkpoint interval = null
07 INFO KafkaInputDStream: Remember duration = 10000 ms
07 INFO KafkaInputDStream: Initialized and validated org.apache.spark.streaming.kafka.KafkaInputDStream@7137ea0e
07 INFO PythonTransformedDStream: Slide time = 10000 ms
07 INFO PythonTransformedDStream: Storage level = StorageLevel(false, false, false, false, 1)
07 INFO PythonTransformedDStream: Checkpoint interval = null
07 INFO PythonTransformedDStream: Remember duration = 10000 ms
07 INFO PythonTransformedDStream: Initialized and validated org.apache.spark.streaming.api.python.PythonTransformedDStream@de77734
10 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 5.8 KB, free 534.5 MB)
10 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 3.5 KB, free 534.5 MB)
20 INFO JobScheduler: Added jobs for time 1515766760000 ms
30 INFO JobScheduler: Added jobs for time 1515766770000 ms
40 INFO JobScheduler: Added jobs for time 1515766780000 ms
After this, the job just keeps repeating the "Added jobs for time ..." lines (at the interval set by the streaming context) and never prints Kafka's stream, whereas the job run with master local and the exact same code does.
Interestingly, it prints a line every time a Kafka event occurs (the screenshot was taken with increased Spark memory settings).
Note that:
Data is in Kafka and I can see it in the consumer console.
I have also tried increasing the executor memory (3g) and the network timeout (800s), but with no success.

Can you see the application's stdout logs through the YARN Resource Manager UI?
Follow your YARN Resource Manager link (http://localhost:8088).
Find your application in the running applications list and follow the application's link (http://localhost:8088/application_1396885203337_0003/).
Open the "stdout : Total file length is xxxx bytes" link to see the log file in the browser.
Hope this helps.

In local mode the application runs on a single machine and you get to see all the output of the print statements in the code. When it runs on a cluster, everything runs in distributed mode on different machines/cores and you will not see that printed output.
Try to get the logs generated by Spark using the command yarn logs -applicationId <application ID>.
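For example, with the application ID from the log above (this assumes log aggregation is enabled on the cluster):
yarn logs -applicationId application_1515761416282_0010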

It's possible that your <ZKhostname> is an alias that is not defined on the YARN nodes, or is not resolved on the YARN nodes for some other reason.
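As a quick check (just a sketch; run it on each YARN NodeManager host, substituting the actual ZooKeeper hostname you pass to spark-submit):
getent hosts <ZKhostname>
If it returns nothing on a worker node, the receiver started there cannot resolve that host, even though the same name works on the machine where you ran local[2].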

Related

The oozie job does not run with the message [AM container is launched, waiting for AM container to Register with RM]

I ran a shell job from the Oozie examples.
However, the YARN application is not executed.
Detailed information from the YARN UI & logs:
https://docs.google.com/document/d/1N8LBXZGttY3rhRTwv8cUEfK3WkWtvWJ-YV1q_fh_kks/edit
The YARN application status is:
Application Priority: 0 (Higher Integer value indicates higher priority)
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
Queue: default
FinalStatus Reported by AM: Application has not completed yet.
Finished: N/A
Elapsed: 20mins, 30sec
Tracking URL: ApplicationMaster
Log Aggregation Status: DISABLED
Application Timeout (Remaining Time): Unlimited
Diagnostics: AM container is launched, waiting for AM container to Register with RM
The application attempt status is:
Application Attempt State: FAILED
Elapsed: 13mins, 19sec
AM Container: container_1607273090037_0001_02_000001
Node: N/A
Tracking URL: History
Diagnostics Info: ApplicationMaster for attempt appattempt_1607273090037_0001_000002 timed out
                                           Node Local Request   Rack Local Request   Off Switch Request
Num Node Local Containers (satisfied by)   0
Num Rack Local Containers (satisfied by)   0                    0
Num Off Switch Containers (satisfied by)   0                    0                    1
NodeManager log:
2020-12-07 01:45:16,237 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Starting container [container_1607273090037_0001_01_000001]
2020-12-07 01:45:16,267 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1607273090037_0001_01_000001 transitioned from SCHEDULED to RUNNING
2020-12-07 01:45:16,267 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1607273090037_0001_01_000001
2020-12-07 01:45:16,272 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /tmp/hadoop-oozie/nm-local-dir/usercache/oozie/appcache/application_1607273090037_0001/container_1607273090037_0001_01_000001/default_container_executor.sh]
2020-12-07 01:45:17,301 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: container_1607273090037_0001_01_000001's ip = 127.0.0.1, and hostname = localhost.localdomain
2020-12-07 01:45:17,345 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Skipping monitoring container container_1607273090037_0001_01_000001 since CPU usage is not yet available.
2020-12-07 01:45:48,274 INFO logs: Aliases are enabled
2020-12-07 01:54:50,242 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 496756, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
2020-12-07 01:58:10,071 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1607273090037_0001_000001 (auth:SIMPLE)
2020-12-07 01:58:10,078 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1607273090037_0001_01_000001
What is the problem?

hadoop3 can't find .nm-local-dir.usercache.hadoop.appcache. when doing pi test

I'm trying to set up a Hadoop 3 cluster on a local computer network, at a small scale to start: one master node and two worker nodes.
I think I managed to get something that should work, following this tutorial: configure hadoop 3.1.0 in multinodes cluster.
I downloaded Hadoop version 3.1.1.
The dfsadmin report:
hadoop#######:~/hadoop3/hadoop-3.1.1$ hdfs dfsadmin -report
Configured Capacity: 1845878235136 (1.68 TB)
Present Capacity: 355431677952 (331.02 GB)
DFS Remaining: 355427651584 (331.02 GB)
DFS Used: 4026368 (3.84 MB)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 6
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (2):
Name: ######:9866 (######)
Hostname: ######
Decommission Status : Normal
Configured Capacity: 147511238656 (137.38 GB)
DFS Used: 2150400 (2.05 MB)
Non DFS Used: 46601465856 (43.40 GB)
DFS Remaining: 93390856192 (86.98 GB)
DFS Used%: 0.00%
DFS Remaining%: 63.31%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Sep 06 18:44:21 CEST 2018
Last Block Report: Thu Sep 06 18:08:09 CEST 2018
Num of Blocks: 17
Name: ######:9866 (######)
Hostname: ######
Decommission Status : Normal
Configured Capacity: 1698366996480 (1.54 TB)
DFS Used: 1875968 (1.79 MB)
Non DFS Used: 1350032670720 (1.23 TB)
DFS Remaining: 262036795392 (244.04 GB)
DFS Used%: 0.00%
DFS Remaining%: 15.43%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Sep 06 18:44:22 CEST 2018
Last Block Report: Thu Sep 06 18:08:10 CEST 2018
Num of Blocks: 12
So before continuing and tuning resource management, I tried to run a simple test, and it failed.
Here is the pi example test:
hadoop######:~/hadoop3/hadoop-3.1.1$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar pi 2 10
Number of Maps = 2
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Starting Job
2018-09-06 18:51:29,277 INFO client.RMProxy: Connecting to ResourceManager at nameMasterhost/IP:8032
2018-09-06 18:51:29,589 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1536250099280_0005
2018-09-06 18:51:29,771 INFO input.FileInputFormat: Total input files to process : 2
2018-09-06 18:51:30,338 INFO mapreduce.JobSubmitter: number of splits:2
2018-09-06 18:51:30,397 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-09-06 18:51:30,967 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1536250099280_0005
2018-09-06 18:51:30,970 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-09-06 18:51:31,175 INFO conf.Configuration: resource-types.xml not found
2018-09-06 18:51:31,175 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-09-06 18:51:31,248 INFO impl.YarnClientImpl: Submitted application application_1536250099280_0005
2018-09-06 18:51:31,295 INFO mapreduce.Job: The url to track the job: http://nameMAster:8088/proxy/application_1536250099280_0005/
2018-09-06 18:51:31,296 INFO mapreduce.Job: Running job: job_1536250099280_0005
2018-09-06 18:51:44,388 INFO mapreduce.Job: Job job_1536250099280_0005 running in uber mode : false
2018-09-06 18:51:44,390 INFO mapreduce.Job: map 0% reduce 0%
2018-09-06 18:51:44,409 INFO mapreduce.Job: Job job_1536250099280_0005 failed with state FAILED due to: Application application_1536250099280_0005 failed 2 times due to AM Container for appattempt_1536250099280_0005_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2018-09-06 18:51:38.416]Exception from container-launch.
Container id: container_1536250099280_0005_02_000001
Exit code: 1
Exception message: /bin/mv: target '/nm-local-dir/nmPrivate/application_1536250099280_0005/container_1536250099280_0005_02_000001/container_1536250099280_0005_02_000001.pid' is not a directory
[2018-09-06 18:51:38.421]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class .nm-local-dir.usercache.hadoop.appcache.application_1536250099280_0005.container_1536250099280_0005_02_000001.tmp
[2018-09-06 18:51:38.422]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class .nm-local-dir.usercache.hadoop.appcache.application_1536250099280_0005.container_1536250099280_0005_02_000001.tmp
For more detailed output, check the application tracking page: http://nameMaster:8088/cluster/app/application_1536250099280_0005 Then click on links to logs of each attempt.
. Failing the application.
2018-09-06 18:51:44,438 INFO mapreduce.Job: Counters: 0
Job job_1536250099280_0005 failed!
I'll add any information asked for, but I don't understand the problem and I don't want to flood the question with every configuration file if they are not relevant.
In the HDFS file system there is no "/nm-local-dir/".
I don't understand where that path comes from.
Any help is warmly welcome.
HDFS is storage, YARN is compute. If you want to use your cluster for anything other than pure storage, you'll need YARN, which means you'll need NodeManagers (NM).
NodeManagers are the servers that actually execute tasks, so you need nm-local-dir defined in order to run jobs like pi. nm-local-dir needs to be defined in yarn-site.xml and is a local directory (not HDFS!) on every host that runs a NodeManager.
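A minimal sketch of the corresponding yarn-site.xml entry (the directory shown is only an example; use any local path that exists and is writable by the YARN user on every NodeManager host):
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/hadoop/yarn/local</value>
</property>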

Yarn + Sqoop after Ctrl+C = stuck ACCEPTED

I am new to Hadoop and I am conducting a POC on a single node.
This is the second time this has happened and I do not know how to solve it. It happens after a MapReduce job has been stopped with Ctrl+C.
I am importing data from SQL Server into HBase with Sqoop, and after an error followed by Ctrl+C, no other job works; everything gets stuck in ACCEPTED.
I have already deleted all the job applications and restarted the entire server. The ResourceManager is configured accordingly (10 GB memory, 2 cores), but I do not know what is happening.
[root@hadoop01 /]# hadoop version
Hadoop 2.6.0.2.2.6.0-2800
Subversion git@github.com:hortonworks/hadoop.git -r acb70ecfae2c3c5ab46e24b0caebceaec16fdcd0
Compiled by jenkins on 2015-05-18T20:21Z
Compiled with protoc 2.5.0
From source with checksum a25c30f622eb057f47e2155f78dba5e
This command was run using /usr/hdp/2.2.6.0-2800/hadoop/hadoop-common-2.6.0.2.2.6.0-2800.jar
[root@hadoop01 fausto.branco]# cat /etc/centos-release
CentOS release 6.6 (Final)
sudo -u hdfs sqoop import --connect "jdbc:sqlserver://SQLServerIP:1433;database=db_POC;username=hdpteste;password=xxxxxx" \
--hbase-create-table \
--hbase-table vw_hdp_Arquivo \
--hbase-row-key "id_Arquivo, Valor" \
--column-family cf_name \
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
--table "dbo.vw_hdp_Arquivo" -m 1
15/07/14 19:43:51 INFO zookeeper.ZooKeeper: Session: 0x14e8e1385000001 closed
15/07/14 19:43:51 INFO zookeeper.ClientCnxn: EventThread shut down
15/07/14 19:43:51 INFO impl.TimelineClientImpl: Timeline service address: hxxp://hadoop01.POC.local:8188/ws/v1/timeline/
15/07/14 19:43:51 INFO client.RMProxy: Connecting to ResourceManager at hadoop01.POC.local/XXX.XXX.XXX.XXX:8050
15/07/14 19:43:53 INFO db.DBInputFormat: Using read commited transaction isolation
15/07/14 19:43:53 INFO mapreduce.JobSubmitter: number of splits:1
15/07/14 19:43:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1436900603357_0002
15/07/14 19:43:54 INFO impl.YarnClientImpl: Submitted application application_1436900603357_0002
15/07/14 19:43:54 INFO mapreduce.Job: The url to track the job: hxxp://hadoop01.POC.local:8088/proxy/application_1436900603357_0002/
15/07/14 19:43:54 INFO mapreduce.Job: Running job: job_1436900603357_0002
[root@hadoop01 fausto.branco]# yarn application -status application_1436900603357_0002
15/07/14 19:45:05 INFO impl.TimelineClientImpl: Timeline service address: hxxp://hadoop01.POC.local:8188/ws/v1/timeline/
15/07/14 19:45:06 INFO client.RMProxy: Connecting to ResourceManager at hadoop01.POC.local/XXX.XXX.XXX.XXX:8050
Application Report :
Application-Id : application_1436900603357_0002
Application-Name : dbo.vw_hdp_Arquivo.jar
Application-Type : MAPREDUCE
User : hdfs
Queue : default
Start-Time : 1436903034500
Finish-Time : 0
Progress : 0%
State : ACCEPTED
Final-State : UNDEFINED
Tracking-URL : N/A
RPC Port : -1
AM Host : N/A
Aggregate Resource Allocation : 0 MB-seconds, 0 vcore-seconds
Diagnostics :
[root@hadoop01 fausto.branco]# hadoop job -status job_1436900603357_0002
DEPRECATED: Use of this script to execute mapred command is deprecated.
Instead use the mapred command for it.
15/07/14 19:46:19 INFO impl.TimelineClientImpl: Timeline service address: hxxp://hadoop01.POC.local:8188/ws/v1/timeline/
15/07/14 19:46:19 INFO client.RMProxy: Connecting to ResourceManager at hadoop01.POC.local/XXX.XXX.XXX.XXX:8050
Job: job_1436900603357_0002
Job File: /user/hdfs/.staging/job_1436900603357_0002/job.xml
Job Tracking URL : hxxp://hadoop01.POC.local:8088/proxy/application_1436900603357_0002/
Uber job : false
Number of maps: 0
Number of reduces: 0
map() completion: 0.0
reduce() completion: 0.0
Job state: PREP
retired: false
reason for failure:
Counters: 0
Found it!
Analysing this item:
Aggregate Resource Allocation : 0 MB-seconds, 0 vcore-seconds
I saw that the node state was UNHEALTHY and found this message in the log:
Health-Report: 1/1 local-dirs are bad: /hadoop/yarn/local;
1/1 local-dirs are bad: /hadoop/yarn/log;
So I added more disk space and it worked.
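For context, the NodeManager marks a local-dir or log-dir as bad once its disk utilization crosses a threshold (90% by default), so besides adding disk you could, in principle, raise that threshold in yarn-site.xml (a sketch only, using the stock YARN property rather than anything from this post):
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>95.0</value>
</property>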

cloudera Oozie sqoop2 job hangs running forever Heart beat Heart beat

I am trying to run two Sqoop jobs in parallel using Oozie, but those two jobs get stuck after 95% and the other two stay in the ACCEPTED state. I have also increased the YARN maximum resource memory and added the following to mapred-site.xml, but nothing helped. Please help.
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>50</value>
</property>
YARN Cluster Metrics:
Apps Submitted 4
Apps Pending 2
Apps Running 2
Apps Completed 0
Containers Running 4
Memory Used 10GB
Memory Total 32GB
Memory Reserved 0B
VCores Used 4
VCores Total 24
VCores Reserved 0
Active Nodes 4
Decommissioned Nodes 0
Lost Nodes 0
Unhealthy Nodes 0
Rebooted Nodes 0
----------
Sysout Log
========================================================================
3175 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
3198 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.5-cdh5.2.0
3212 [main] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
3213 [main] INFO org.apache.sqoop.tool.BaseSqoopTool - Using Hive-specific delimiters for output. You can override
3213 [main] INFO org.apache.sqoop.tool.BaseSqoopTool - delimiters with --fields-terminated-by, etc.
3224 [main] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
3280 [main] INFO org.apache.sqoop.manager.oracle.OraOopManagerFactory - Data Connector for Oracle and Hadoop is disabled.
3293 [main] INFO org.apache.sqoop.manager.SqlManager - Using default fetchSize of 1000
3297 [main] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
3951 [main] INFO org.apache.sqoop.manager.OracleManager - Time zone has been set to GMT
4023 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM PT_PRELIM_FINDING_V t WHERE 1=0
4068 [main] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop-mapreduce
5925 [main] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-nobody/compile/0dab11f6545d8ef69d6dd0f6b9041a50/PT_PRELIM_FINDING_CYTOGEN_V.jar
5937 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of PT_PRELIM_FINDING_V
5962 [main] INFO org.apache.sqoop.manager.OracleManager - Time zone has been set to GMT
5981 [main] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
6769 [main] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Thanks @abeaamase.
I asked our DBA to increase the Oracle database max processes to 750 and the max session pool to around 1.5 times the process count, i.e. 1125.
This solved the issue. It has nothing to do with YARN memory. Unfortunately, in Sqoop 2 this exception is not handled.
Please feel free to add more answers if you feel this explanation is not appropriate.

Hadoop: slaves in service but doing nothing at all

I set up a Hadoop cluster and started a MapReduce job on the cluster.
The master node is running actively, but all the slaves are doing nothing at all.
Running jps on a slave node produces:
20390 DataNode
20492 NodeManager
21256 Jps
Here is the screencast; the next-to-last row corresponds to the master node.
So why are the slaves using no blocks?
Also, running top on the master node shows the Java process (hadoop jar jar-file.jar args) taking almost 100% of the CPU, whereas no such process exists on any slave machine.
That is why I think the slaves are at rest, doing nothing at all.
Here is an example of a slave DataNode log:
2014-07-24 23:28:01,302 INFO org.apache.hadoop.util.GSet: Computing capacity for map BlockMap
2014-07-24 23:28:01,302 INFO org.apache.hadoop.util.GSet: VM type = 64-bit
2014-07-24 23:28:01,304 INFO org.apache.hadoop.util.GSet: 0.5% max memory 889 MB = 4.4 MB
2014-07-24 23:28:01,304 INFO org.apache.hadoop.util.GSet: capacity = 2^19 = 524288 entries
2014-07-24 23:28:01,304 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block Verification Scanner initialized with interval 504 hours for block pool BP-1752077220-193.167.138.8-1406217332464
2014-07-24 23:28:01,310 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Added bpid=BP-1752077220-193.167.138.8-1406217332464 to blockPoolScannerMap, new size=1
2014-07-24 23:31:01,116 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1752077220-193.167.138.8-1406217332464 Total blocks: 0, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
And nothing more.
However, for the DataNode on the master, the log file contains lines like the following:
2014-07-24 22:27:23,443 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1752077220-193.167.138.8-1406217332464:blk_1073742749_1925 src: /193.167.138.8:44210 dest: /193.167.138.8:50010
which I think means the node is receiving tasks and processing the data.
The following is from the YARN NodeManager log file on one of the slave nodes:
2014-07-24 23:28:13,811 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:8042
2014-07-24 23:28:13,812 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 8042
2014-07-24 23:28:14,122 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2014-07-24 23:28:14,130 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at ugluk/193.167.138.8:8031
2014-07-24 23:28:14,176 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using finished containers :[]
2014-07-24 23:28:14,366 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id 1336429163
2014-07-24 23:28:14,369 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :1986181585
2014-07-24 23:28:14,370 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as shagrat.hiit.fi:48662 with total resource of <memory:8192, vCores:8>
2014-07-24 23:28:14,370 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests
I am using Hadoop 2.4.0
It seems that you formatted the namenode more than once.
The block pool ID error is mainly caused by formatting the namenode multiple times.
Every time you format a namenode, the block pool ID, cluster ID and namespace ID change.
So first check these attributes on the namenode, the other datanodes and the secondary namenode.
You can check them using the VERSION file in the current directory of each node. To find it, first see where you configured each node's storage by checking the paths in hdfs-site.xml.
Go to that path, look for the current directory, and make the necessary changes.
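For illustration (a sketch; the real directories are whatever dfs.namenode.name.dir and dfs.datanode.data.dir point to in your hdfs-site.xml, so substitute your own paths):
cat <dfs.namenode.name.dir>/current/VERSION    # on the master
cat <dfs.datanode.data.dir>/current/VERSION    # on each slave
The clusterID (and, for datanodes, the blockpoolID) in these files should match across all nodes; if they don't, that confirms the multiple-format problem described above.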
Please let me know if this helps.
