Error running spark-shell on YARN in client mode - shell

I have Spark 1.6.1 and I have set
export HADOOP_CONF_DIR=/folder/location
Now if I run spark shell:
$ ./spark-shell --master yarn --deploy-mode client
I get this type of error (relevant part):
16/09/18 15:49:18 INFO impl.TimelineClientImpl: Timeline service address: http://URL:PORT/ws/v1/timeline/
16/09/18 15:49:18 INFO client.RMProxy: Connecting to ResourceManager at URL/IP:PORT
16/09/18 15:49:18 INFO yarn.Client: Requesting a new application from cluster with 9 NodeManagers
16/09/18 15:49:19 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (14336 MB per container)
16/09/18 15:49:19 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/09/18 15:49:19 INFO yarn.Client: Setting up container launch context for our AM
16/09/18 15:49:19 INFO yarn.Client: Setting up the launch environment for our AM container
16/09/18 15:49:19 INFO yarn.Client: Preparing resources for our AM container
16/09/18 15:49:19 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/09/18 15:49:19 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: Permission denied: user=Menmosyne, access=WRITE, inode="/user/Mnemosyne/.sparkStaging/application_1464874056768_0040":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
However, when I run simply
$ ./spark-shell
(without specifying the master) I see far more configuration output than usual (i.e. it appears to load the configurations from the Hadoop folder). So if I don't specify that the master is YARN, do my Spark jobs still get submitted to the YARN cluster or not?

The default master in Spark is local, which means the application will run locally on your machine and not on the cluster.
YARN applications in general (Hive, MapReduce, Spark, etc.) need to create temporary folders to store partial data and/or the current process configuration. Normally this temporary data is written inside the HDFS user home (in your case /user/Mnemosyne).
Your problem is that your home folder was created by the user hdfs and your user Mnemosyne doesn't have privileges to write to it.
The Spark job therefore cannot create the temporary staging structure in HDFS required to launch the application.
My suggestion is that you change the owner of the home folder (each user should be the owner of its own home directory) and validate that the owner has full access to it.
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#chown
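For example, a minimal sketch run as the HDFS superuser (the username and path come from the question; adjust them to your cluster):
sudo -u hdfs hdfs dfs -mkdir -p /user/Mnemosyne              # create the home directory if it is missing
sudo -u hdfs hdfs dfs -chown -R Mnemosyne /user/Mnemosyne    # hand ownership to the submitting user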

The permissions on the home directory for Mnemosyne are incorrect. It is owned by the hdfs user and not Mnemosyne.
Run: hdfs dfs -chown -R Mnemosyne /user/Mnemosyne/
See the HDFS chown docs here: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html#chown

I just fixed this issue with Spark 1.6.2 on a Hadoop 2.6.0 cluster:
1. Copy spark-assembly-1.6.2-hadoop2.6.0.jar from the local filesystem to HDFS:
hdfs://Master:9000/spark/spark-assembly-1.6.2-hadoop2.6.0.jar
2. In spark-defaults.conf, add the parameter:
spark.yarn.jars hdfs://Master:9000/spark/spark-assembly-1.6.2-hadoop2.6.0.jar
(Note: in Spark 1.6 the documented property is spark.yarn.jar, singular; spark.yarn.jars was introduced in Spark 2.0.)
Then run spark-shell --master yarn-client
and everything works.
One more thing: if you want to run Spark in YARN mode, do not also start a standalone Spark cluster locally.
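For step 1, a minimal sketch (the local Spark lib path and the HDFS URI are assumptions; substitute your own):
hdfs dfs -mkdir -p /spark
hdfs dfs -put $SPARK_HOME/lib/spark-assembly-1.6.2-hadoop2.6.0.jar hdfs://Master:9000/spark/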

Related

YARN complains java.net.NoRouteToHostException: No route to host (Host unreachable)

Attempting to run H2O on an HDP 3.1 cluster and running into an error that appears to be about YARN resource capacity...
[ml1user@HW04 h2o-3.26.0.1-hdp3.1]$ hadoop jar h2odriver.jar -nodes 3 -mapperXmx 10g
Determining driver host interface for mapper->driver callback...
[Possible callback IP address: 192.168.122.1]
[Possible callback IP address: 172.18.4.49]
[Possible callback IP address: 127.0.0.1]
Using mapper->driver callback IP address and port: 172.18.4.49:46015
(You can override these with -driverif and -driverport/-driverportrange and/or specify external IP using -extdriverif.)
Memory Settings:
mapreduce.map.java.opts: -Xms10g -Xmx10g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dlog4j.defaultInitOverride=true
Extra memory percent: 10
mapreduce.map.memory.mb: 11264
Hive driver not present, not generating token.
19/07/25 14:48:05 INFO client.RMProxy: Connecting to ResourceManager at hw01.ucera.local/172.18.4.46:8050
19/07/25 14:48:06 INFO client.AHSProxy: Connecting to Application History server at hw02.ucera.local/172.18.4.47:10200
19/07/25 14:48:07 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/ml1user/.staging/job_1564020515809_0006
19/07/25 14:48:08 INFO mapreduce.JobSubmitter: number of splits:3
19/07/25 14:48:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564020515809_0006
19/07/25 14:48:08 INFO mapreduce.JobSubmitter: Executing with tokens: []
19/07/25 14:48:08 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.1.0.0-78/0/resource-types.xml
19/07/25 14:48:08 INFO impl.YarnClientImpl: Submitted application application_1564020515809_0006
19/07/25 14:48:08 INFO mapreduce.Job: The url to track the job: http://HW01.ucera.local:8088/proxy/application_1564020515809_0006/
Job name 'H2O_47159' submitted
JobTracker job ID is 'job_1564020515809_0006'
For YARN users, logs command is 'yarn logs -applicationId application_1564020515809_0006'
Waiting for H2O cluster to come up...
ERROR: Timed out waiting for H2O cluster to come up (120 seconds)
ERROR: (Try specifying the -timeout option to increase the waiting time limit)
Attempting to clean up hadoop job...
19/07/25 14:50:19 INFO impl.YarnClientImpl: Killed application application_1564020515809_0006
Killed.
19/07/25 14:50:23 INFO client.RMProxy: Connecting to ResourceManager at hw01.ucera.local/172.18.4.46:8050
19/07/25 14:50:23 INFO client.AHSProxy: Connecting to Application History server at hw02.ucera.local/172.18.4.47:10200
----- YARN cluster metrics -----
Number of YARN worker nodes: 3
----- Nodes -----
Node: http://HW03.ucera.local:8042 Rack: /default-rack, RUNNING, 0 containers used, 0.0 / 15.0 GB used, 0 / 3 vcores used
Node: http://HW04.ucera.local:8042 Rack: /default-rack, RUNNING, 0 containers used, 0.0 / 15.0 GB used, 0 / 3 vcores used
Node: http://HW02.ucera.local:8042 Rack: /default-rack, RUNNING, 0 containers used, 0.0 / 15.0 GB used, 0 / 3 vcores used
----- Queues -----
Queue name: default
Queue state: RUNNING
Current capacity: 0.00
Capacity: 1.00
Maximum capacity: 1.00
Application count: 0
Queue 'default' approximate utilization: 0.0 / 45.0 GB used, 0 / 9 vcores used
----------------------------------------------------------------------
ERROR: Unable to start any H2O nodes; please contact your YARN administrator.
A common cause for this is the requested container size (11.0 GB)
exceeds the following YARN settings:
yarn.nodemanager.resource.memory-mb
yarn.scheduler.maximum-allocation-mb
----------------------------------------------------------------------
For YARN users, logs command is 'yarn logs -applicationId application_1564020515809_0006'
Looking in the YARN configs in the Ambari UI, these properties are nowhere to be found. But checking the YARN ResourceManager UI and some of the logs for the killed application, I see what appear to be unreachable-host errors...
Container: container_e05_1564020515809_0006_02_000002 on HW03.ucera.local_45454_1564102219781
LogAggregationType: AGGREGATED
=============================================================================================
LogType:stderr
LogLastModifiedTime:Thu Jul 25 14:50:19 -1000 2019
LogLength:2203
LogContents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/yarn/local/filecache/11/mapreduce.tar.gz/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/yarn/local/usercache/ml1user/appcache/application_1564020515809_0006/filecache/10/job.jar/job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.YarnChild).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
java.net.NoRouteToHostException: No route to host (Host unreachable)
at java.net.PlainSocketImpl.socketConnect(Native Method)
....
at java.net.Socket.<init>(Socket.java:211)
at water.hadoop.EmbeddedH2OConfig$BackgroundWriterThread.run(EmbeddedH2OConfig.java:38)
End of LogType:stderr
***********************************************************************
Taking note of "java.net.NoRouteToHostException: No route to host (Host unreachable)". However, all the nodes can reach and ping each other, so I'm not sure what is going on here. Any suggestions for debugging or fixing this?
Think I found the problem. TL;DR: firewalld (the nodes run CentOS 7) was still running, when it should be disabled on HDP clusters.
From another community post:
For Ambari to communicate during setup with the hosts it deploys to and manages, certain ports must be open and available. The easiest way to do this is to temporarily disable iptables, as follows:
systemctl disable firewalld
service firewalld stop
So apparently iptables and firewalld need to be disabled across the cluster (supporting docs can be found here; I had only disabled them on the Ambari installation node). After stopping these services across the cluster (I recommend using clush), I was able to run the YARN job without incident.
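A sketch of doing that with clush, assuming passwordless SSH and a node group named "cluster" already defined in your clush groups file:
clush -g cluster 'systemctl stop firewalld && systemctl disable firewalld'   # stop and disable on every node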
Normally, this problem is either due to bad DNS configuration, firewalls, or network unreachability. To quote this official doc:
The hostname of the remote machine is wrong in the configuration files
The client's host table /etc/hosts has an invalid IPAddress for the target host.
The DNS server's host table has an invalid IPAddress for the target host.
The client's routing tables (In Linux, iptables) are wrong.
The DHCP server is publishing bad routing information.
Client and server are on different subnets, and are not set up to talk to each other. This may be an accident, or it is to deliberately lock down the Hadoop cluster.
The machines are trying to communicate using IPv6. Hadoop does not currently support IPv6
The host's IP address has changed but a long-lived JVM is caching the old value. This is a known problem with JVMs (search for "java negative DNS caching" for the details and solutions). The quick solution: restart the JVMs
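Some quick command-line checks for the first few causes above (the hostname and port are taken from the node listing earlier and are only examples):
getent hosts hw03.ucera.local    # does the resolver return the address you expect?
ping -c 1 hw03.ucera.local       # basic reachability
nc -zv hw03.ucera.local 8042     # is the NodeManager port reachable, or is a firewall in the way?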
For me, the problem was that the driver was inside a Docker container, which made it impossible for the workers to send data back to it. In other words, the workers and the driver were not on the same subnet. The solution, as given in this answer, was to set the following configurations:
spark.driver.host=<container's host IP accessible by the workers>
spark.driver.bindAddress=0.0.0.0
spark.driver.port=<forwarded port 1>
spark.driver.blockManager.port=<forwarded port 2>
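For example, a sketch of passing these on the command line (every value below is a placeholder; the forwarded ports must match the ports the container actually publishes):
spark-submit \
  --conf spark.driver.host=192.168.1.10 \
  --conf spark.driver.bindAddress=0.0.0.0 \
  --conf spark.driver.port=40000 \
  --conf spark.driver.blockManager.port=40001 \
  --master yarn --deploy-mode client your-app.jar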

Why does my yarn application not have logs even with logging enabled?

I have enabled logging in yarn-site.xml, and I restarted YARN by doing:
sudo service hadoop-yarn-resourcemanager restart
sudo service hadoop-yarn-nodemanager restart
I ran my application, and I then see its application ID in yarn application -list. So I run yarn logs -applicationId <application ID>, and I get the following:
hdfs://<ip address>/var/log/hadoop-yarn/path/to/application/ does not have any log files
Do I need to change some other configuration? Or am I accessing the logs the wrong way?
Thank you.
yarn application -list
will list only the applications that are either in SUBMITTED, ACCEPTED or RUNNING state.
Log aggregation collects each container's logs and moves them into the directory configured in yarn.nodemanager.remote-app-log-dir only after the completion of the application. Refer to the description of the yarn.log-aggregation-enable property here.
So the application listed by the command hasn't completed yet and its logs have not been collected, hence this response when you try to access the logs of a running application:
hdfs://<ip address>/var/log/hadoop-yarn/path/to/application/ does not have any log files
You can try the same command yarn logs -applicationId <application ID> to view the logs once the application has completed.
To list all the FINISHED applications, use
yarn application -list -appStates FINISHED
Or to list all the applications
yarn application -list -appStates ALL
Enable Log Aggregation
Log aggregation is enabled in the yarn-site.xml file. The yarn.log-aggregation-enable property turns log aggregation on:
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
In HDP 2.3.2 and higher you can get log aggregation to occur hourly on running jobs using this configuration in yarn-site.xml:
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>
See this for further details: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_yarn_resource_mgt/content/ref-375ff479-e530-46d8-9f96-8b52dadb5183.1.html
It was probably saved with another appOwner. You can try specifying the application owner in your command:
yarn logs -appOwner .. -applicationId ..
ROOT CAUSE: When log aggregation has been enabled, each user's application logs will, by default, be placed in the directory hdfs:///app-logs/<username>/logs/<APPLICATION_ID>. By default only the user that submitted the job and members of the hadoop group have access to read the log files. In the example directory listing below you can see that the permissions are 770: no access for anyone other than the owner and members of the hadoop group.
[root@mycluster ~]$ hdfs dfs -ls /app-logs
Found 3 items
drwxrwx--- - hive hadoop 0 2017-03-10 15:33 /app-logs/hive
drwxrwx--- - user1 hadoop 0 2017-03-10 15:37 /app-logs/user1
drwxrwx--- - spark hadoop 0 2017-03-10 15:39 /app-logs/spark
SOLUTION: The message above can be deceiving and does not necessarily indicate that log aggregation has not been enabled. To obtain YARN logs for an application, the 'yarn logs' command must be executed as the user that submitted the application. In the example below the application was submitted by user1. If we execute the same command as above as the user 'user1', we should get the following output if log aggregation has been enabled:
yarn logs -applicationId application_1473860344791_0001
16/09/19 23:10:33 INFO impl.TimelineClientImpl: Timeline service address: http://mycluster.somedomain.com:8188/ws/v1/timeline/
16/09/19 23:10:33 INFO client.RMProxy: Connecting to ResourceManager at mycluster.somedomain.com/192.168.1.89:8050
16/09/19 23:10:34 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
16/09/19 23:10:34 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
Container: container_e03_1473860344791_0001_01_000001 on mycluster.somedomain.com_45454
LogType:stderr
Log Upload Time:Wed Sep 14 09:44:15 -0400 2016
LogLength:0
Log Contents:
End of LogType:stderr
REFERENCE: The following document describes how to use log aggregation to collect logs for long-running YARN applications.
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_yarn-resource-management/content/ch_log_a...

Running Spark on Yarn Client

I have recently set up a multinode Hadoop HA (NameNode & ResourceManager) cluster (3 nodes). The installation is complete and all daemons run as expected.
Daemons in NN1:
2945 JournalNode
3137 DFSZKFailoverController
6385 Jps
3338 NodeManager
22730 QuorumPeerMain
2747 DataNode
3228 ResourceManager
2636 NameNode
Daemons in NN2:
19620 Jps
3894 QuorumPeerMain
16966 ResourceManager
16808 NodeManager
16475 DataNode
16572 JournalNode
17101 NameNode
16702 DFSZKFailoverController
Daemons in DN1:
12228 QuorumPeerMain
29060 NodeManager
28858 DataNode
29644 Jps
28956 JournalNode
I am interested in running Spark jobs on my YARN setup.
I have installed Scala and Spark on my NN1 and I can successfully start Spark by issuing the following command:
$ spark-shell
Now, I have no knowledge about Spark. I would like to know how I can run Spark on YARN. I have read that we can run it as either yarn-client or yarn-cluster.
Should I install Spark & Scala on all nodes in the cluster (NN2 & DN1) to run Spark on YARN in client or cluster mode? If not, how can I submit Spark jobs from the NN1 (primary namenode) host?
I have copied the Spark assembly JAR over to HDFS as suggested in a blog I read:
-rw-r--r-- 3 hduser supergroup 187548272 2016-04-04 15:56 /user/spark/share/lib/spark-assembly.jar
I have also created a SPARK_JAR variable in my bashrc file. I tried to submit the Spark job as yarn-client, but I end up with the error below. I have no idea whether I am doing it all correctly or whether other settings need to be done first.
[hduser@ptfhadoop01v spark-1.6.0]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-memory 4g --executor-memory 2g --executor-cores 2 --queue thequeue lib/spark-examples*.jar 10
16/04/04 17:27:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/04 17:27:51 WARN SparkConf:
SPARK_WORKER_INSTANCES was detected (set to '2').
This is deprecated in Spark 1.0+.
Please instead use:
- ./spark-submit with --num-executors to specify the number of executors
- Or set SPARK_EXECUTOR_INSTANCES
- spark.executor.instances to configure the number of instances in the spark config.
16/04/04 17:27:54 WARN Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
16/04/04 17:27:54 WARN Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
16/04/04 17:27:57 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/04/04 17:27:58 WARN MetricsSystem: Stopping a MetricsSystem that is not running
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[hduser@ptfhadoop01v spark-1.6.0]$
Please help me resolve this and explain how to run Spark on YARN in client or cluster mode.
Now, I have no knowledge about Spark. I would like to know how I can run Spark on YARN. I have read that we can run it as either yarn-client or yarn-cluster.
It's highly recommended that you read the official documentation of Spark on YARN at http://spark.apache.org/docs/latest/running-on-yarn.html.
You can use spark-shell with --master yarn to connect to YARN. You need to have the proper configuration files, e.g. yarn-site.xml, on the machine you run spark-shell from.
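A minimal sketch (the configuration directory is an assumption; point HADOOP_CONF_DIR at wherever your cluster's yarn-site.xml and core-site.xml actually live):
export HADOOP_CONF_DIR=/etc/hadoop/conf    # directory containing yarn-site.xml, core-site.xml, etc.
./bin/spark-shell --master yarn --deploy-mode client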
Should I install Spark & Scala on all nodes in the cluster (NN2 & DN1) to run Spark on YARN in client or cluster mode?
No. You don't have to install anything on the YARN nodes, since Spark will distribute the necessary files for you.
If not, how can I submit Spark jobs from the NN1 (primary namenode) host?
Start with spark-shell --master yarn and see if you can execute the following code:
(0 to 5).toDF.show
If you see a table-like output, you're done. Else, provide the error(s).
I have also created a SPARK_JAR variable in my bashrc file. I tried to submit the Spark job as yarn-client, but I end up with the error below. I have no idea whether I am doing it all correctly or whether other settings need to be done first.
Remove the SPARK_JAR variable. Don't use it, as it's not needed and might cause trouble. Read the official documentation at http://spark.apache.org/docs/latest/running-on-yarn.html to understand the basics of Spark on YARN and beyond.
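For example, a sketch of removing it (back up your .bashrc before editing it):
unset SPARK_JAR                    # remove it from the current shell
sed -i '/SPARK_JAR/d' ~/.bashrc    # and drop the export from .bashrc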
Adding this property to hdfs-site.xml solved the issue:
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
In client mode you'd run something like the following for a simple word count example:
spark-submit --class org.sparkexample.WordCount --master yarn-client wordcount-sample-plain-1.0-SNAPSHOT.jar input.txt output.txt
I think you got the spark-submit command wrong there. There is no --master yarn set up.
I would highly recommend using an automated provisioning tool to set up your cluster quickly instead of a manual approach.
Refer to the Cloudera or Hortonworks tools. You can use them to get set up in no time and submit jobs easily without doing all these configurations manually.
Reference: https://hortonworks.com/products/hdp/

Getting java.net.SocketTimeoutException when trying to run Hadoop MapReduce on a fresh install of Hortonworks

I have a fresh install of Hortonworks version 2.3_1 for Oracle VirtualBox, and I get a java.net.SocketTimeoutException whenever I try to run a MapReduce job. I changed nothing other than the memory and the cores available to the VM.
Full text of the run:
WARNING: Use "yarn jar" to launch YARN applications.
15/09/01 01:15:17 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
15/09/01 01:15:20 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
15/09/01 01:16:19 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/09/01 01:18:09 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-601678901-10.0.2.15-1439987491556:blk_1073742292_1499
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.2.15:52924 remote=/10.0.2.15:50010]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2280)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:749)
15/09/01 01:18:11 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/root/.staging/job_1441069639378_0001
Exception in thread "main" java.io.IOException: All datanodes DatanodeInfoWithStorage[10.0.2.15:50010,DS-56099a5f-3cb3-426e-8e1a-ff3b53df9bf2,DISK] are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1117)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)
Full name of the .ova file I am using: Sandbox_HDP_2.3_1_virtualbox.ova
My host is a Windows 7 Home Premium machine with eight threads of execution (four hyperthreaded cores, I think).
The problem was exactly what it seemed: a timeout error. I fixed it by going to the Hadoop config folder, raising all the timeouts as well as the number of retries (although from the log the retries didn't come into play), and stopping unnecessary services on both the host and guest operating systems.
Thanks, sunrise76; one of those issues pointed me to the config folder.
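As a sketch of "raising all the timeouts": two standard HDFS socket timeouts that can be raised in hdfs-site.xml (the 300000 ms values are assumptions; tune them for your VM):
<property>
  <name>dfs.client.socket-timeout</name>
  <value>300000</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>300000</value>
</property>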

Job tracker is not starting up

I am installing CDH 4.6.0 with the help of this site. I am running start-all.sh to start the services:
/etc/init.d/hadoop-hdfs-namenode start
/etc/init.d/hadoop-hdfs-datanode start
/etc/init.d/hadoop-hdfs-secondarynamenode start
/etc/init.d/hadoop-0.20-mapreduce-jobtracker start
/etc/init.d/hadoop-0.20-mapreduce-tasktracker start
bin/bash [to start bash prompt after starting services]
After executing these instructions as part of a Dockerfile, like
CMD ["start-all.sh"]
it starts all the services.
When I run jps, I can see only:
jps
Namenode
Datanode
Secondary Namenode
Tasktracker
But the JobTracker has not started. The log is as follows:
2015-01-23 07:26:46,706 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
2015-01-23 07:26:46,735 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 8021
2015-01-23 07:26:46,735 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
2015-01-23 07:26:47,725 INFO org.apache.hadoop.mapred.JobTracker: Creating the system directory
2015-01-23 07:26:47,750 WARN org.apache.hadoop.mapred.JobTracker: Failed to operate on mapred.system.dir (hdfs://localhost:8020/var/lib/hadoop-hdfs/cache/mapred/mapred/system) because of permissions.
2015-01-23 07:26:47,750 WARN org.apache.hadoop.mapred.JobTracker: This directory should be owned by the user 'mapred (auth:SIMPLE)'
2015-01-23 07:26:47,751 WARN org.apache.hadoop.mapred.JobTracker: Bailing out ...
org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
But when I start it again from the bash prompt, it works. Why so? Any suggestions?
I can see from the log that the JobTracker comes up at port 8021, so why is it trying to operate against hdfs://localhost:8020? Is that a problem? If so, how do I tackle it?
It seems the mapred user doesn't have the privilege to write files/directories inside the HDFS root directory.
Switch to the hdfs user and grant the necessary privileges to the mapred user before starting the MapReduce service:
sudo -su hdfs
hadoop fs -chmod 777 /
/etc/init.d/hadoop-0.20-mapreduce-jobtracker stop; /etc/init.d/hadoop-0.20-mapreduce-jobtracker start
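Note that chmod 777 on the HDFS root is a blunt instrument. A narrower alternative sketch (the path comes from the mapred.system.dir shown in the log; the -p flag assumes a Hadoop 2 shell) is to hand just that tree to mapred:
sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/system
sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred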

Resources