I'm using the MapR distribution. While trying to run a Hive query, I'm getting the error: java.io.IOException: failed to run job : Application rejected by queue placement policy
I have set the queue with the command below:
set mapred.job.queue.name=<>;
But it still doesn't work. Could someone help me understand what's going on?
Thanks in advance.
I had this same error.
I used hadoop queue -showacls | grep SUBMIT to find out which queues I had access to, and then used the command "set mapreduce.job.queuename" to pick one of them.
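For example, assuming the ACL listing shows a queue named root.myqueue (a hypothetical name), the two steps look like this. First, from a shell:
hadoop queue -showacls | grep SUBMIT
Then, inside the Hive session, point your jobs at one of the listed queues:
set mapreduce.job.queuename=root.myqueue;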
Please also check your YARN Resource Pool configuration to make sure you have adequate resources allocated to the queue.
I am running a MapReduce job on Hadoop 2.7.3 in a single-node cluster. How do I calculate the time taken by the map and reduce tasks of this job?
SOLVED
In case it helps anyone who views this question or faces a similar problem:
Thanks to @Shubham's answer and a little research I did:
The JobTracker has been removed in Hadoop 2; its role has been split between the ResourceManager and the ApplicationMaster.
To access the ResourceManager, open "http://localhost:8088" in your browser.
To access the Job History Server (to view statistics about completed applications and jobs), open "http://localhost:19888" in your browser.
You may encounter an error when trying to access the Job History Server: it may show that there is no history for the application. In that case, follow these steps:
1. Change the bashrc file
Steps:
i. In your terminal, type "nano ~/.bashrc"
ii. In this file, where the other Hadoop variables are defined, add the line
export HADOOP_CONFIG_DIR=/usr/local/hadoop/etc/hadoop
iii. Exit out of nano and save the file.
iv. Run the command "source ~/.bashrc"
2. Start the Job History Server
Steps:
i. Run the following command in your terminal:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONFIG_DIR start historyserver
ii. Then run the command
jps
You should see "JobHistoryServer" in the list.
iii. Now run the following command to confirm that the server is listening on port 19888:
netstat -ntlp | grep 19888
Hit the ResourceManager's web UI (http://rm_http_address_host:port/). Typically the web port is 8088, so you can open http://resourcemanager_host:8088/ for this.
There you will find links for all the applications in various states such as STARTED, RUNNING, FAILED, SUCCEEDED, etc.
Clicking on each application's link gives you all the statistics about that YARN job: the number of containers (mappers/reducers in the case of MapReduce), memory/vcores used, running time, and many more stats.
Many more stats are exposed by the ResourceManager REST APIs. You can find them here: https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
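For example, assuming the ResourceManager is reachable at localhost:8088, you can list all applications (including their elapsed time) with a plain HTTP request, and fetch a single application by its id (the id below is just a placeholder):
curl http://localhost:8088/ws/v1/cluster/apps
curl http://localhost:8088/ws/v1/cluster/apps/application_1510000000000_0001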
You can go to the JobTracker (which runs on port 50030 by default) and check the job details there. It shows counters for map time and reduce time. Moreover, if you are interested in individual tasks, you can follow the "Analyse This Job" link, which shows the best and worst performing tasks.
I need to check whether an MQ queue already exists in a cluster. I have tried the dspmq command and DIS Q(TEST.QUEUE) CLUSTER. Which command is used to check whether an IBM MQ queue already exists in a cluster?
dspmq is used to display the Queue Manager status.
If you want to find out whether a cluster already has a queue in it, execute the following MQSC command: DISPLAY QCLUSTER(<Queue Name>) WHERE (CLUSTER EQ <cluster name>)
However, the response will only be valid if the Queue Manager knows about the Queue:
If you execute the command on a full repository then you can trust the response as the Full repositories always know everything about the cluster.
If you execute the command on a partial repository, the Queue Manager will only be able to tell you about the Queue if an application has already attempted to make use of the Queue. Otherwise it won't know whether it exists or not.
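As a minimal sketch, assuming a queue manager named QM1 and a cluster named MYCLUSTER (both placeholder names; TEST.QUEUE is the queue from the question), the MQSC command can be piped into runmqsc from a shell:
echo "DISPLAY QCLUSTER(TEST.QUEUE) WHERE(CLUSTER EQ MYCLUSTER)" | runmqsc QM1
If the queue manager knows about the cluster queue, its attributes are listed; otherwise the command reports that the object was not found.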
I have a topology running on a Storm cluster with 3 supervisor nodes (32 GB RAM each). For the first several days the topology ran well and everything was OK, but then the following error kept occurring and the topology went down after several days of running:
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /brokers/topics/TOPICNAME/partitions at storm.kafka.ZkCoordinator.refresh
The topology uses a spout to consume messages from a remote Kafka service, which sits on a remote server that also hosts the ZooKeeper service.
I guess the reason for this exception is that the ZooKeeper server is unstable, or that the network connection is unstable.
I have no permission to do anything with the remote Kafka/ZooKeeper server, so I need a solution on my side to keep the topology running stably. Is there any way to make the topology run stably, or any way to skip the exception when it occurs?
Or is there any way to resubmit the topology automatically?
Thank you very much!
The first thing you should do is google what causes the connection loss error.
Then go to Storm's log files and see which line of code is causing the error.
The right way to do things is to find out what is causing the error.
However, if you want a quicker temporary solution, you can use Storm's REST API to kill the topology, and then use a normal Java program or a script in any language to re-launch the topology from the command line.
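As a rough sketch of that workaround (not the asker's exact setup; the UI address, topology id, jar, and class names are placeholders): the Storm UI exposes a kill endpoint, and the topology can then be resubmitted with the storm CLI:
curl -X POST http://localhost:8080/api/v1/topology/MY-TOPOLOGY-ID/kill/30
storm jar mytopology.jar com.example.MyTopology my-topology-name
The /api/v1/topology/:id/kill/:wait-time endpoint deactivates the topology and kills it after the given wait time in seconds; storm jar then submits it again.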
I've set up an EC2 cluster with Spark. Everything works; the master and slaves are up and running.
I'm trying to submit a sample job (SparkPi). When I ssh into the cluster and submit it from there, everything works fine. However, when the driver is created on a remote host (my laptop), it doesn't work. I've tried both modes for --deploy-mode:
--deploy-mode=client:
From my laptop:
./bin/spark-submit --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 --class SparkPi ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
This results in the following indefinitely repeating warnings/errors:
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/02/22 18:30:45 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
15/02/22 18:30:45 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
...and failed drivers appear in the Spark Web UI under "Completed Drivers" with "State=ERROR".
I've tried passing limits for cores and memory to the submit script, but it didn't help...
--deploy-mode=cluster:
From my laptop:
./bin/spark-submit --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 --deploy-mode cluster --class SparkPi ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
The result is:
.... Driver successfully submitted as driver-20150223023734-0007 ... waiting before polling master for driver state ... polling master for driver state
State of driver-20150223023734-0007 is ERROR
Exception from cluster was: java.io.FileNotFoundException: File file:/home/oleg/spark/spark12/ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar does not exist.
java.io.FileNotFoundException: File file:/home/oleg/spark/spark12/ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329)
    at org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:150)
    at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:75)
So, I'd appreciate any pointers on what is going wrong and some guidance on how to deploy jobs from a remote client. Thanks.
UPDATE:
So for the second issue (cluster mode): the file must be visible to every cluster node, so it has to be in a location they can all access. This solves the IOException but leads to the same issue as in client mode.
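One way to make the jar globally visible (a sketch, not necessarily what was done here) is to put it on HDFS, or any storage every worker can reach, and pass that URL to spark-submit; the HDFS path below is a placeholder and assumes fs.defaultFS points at your namenode:
hadoop fs -put ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar /jars/
./bin/spark-submit --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 --deploy-mode cluster --class SparkPi hdfs:///jars/ec2test_2.10-0.0.1.jar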
The documentation at:
http://spark.apache.org/docs/latest/security.html#configuring-ports-for-network-security
lists all the different communication channels used in a Spark cluster. As you can see, there are a bunch where the connection is made from the Executor(s) to the Driver. When you run with --deploy-mode=client, the driver runs on your laptop, so the executors will try to make a connection to your laptop. If the AWS security group that your executors run under blocks outbound traffic to your laptop (which the default security group created by the Spark EC2 scripts doesn't), or if you are behind a router/firewall (more likely), they fail to connect and you get the errors you are seeing.
So to resolve it, you have to forward all the necessary ports to your laptop or reconfigure your firewall to allow connections on those ports. Since a bunch of the ports are chosen at random, this means opening up a wide range of ports, if not all of them. So using --deploy-mode=cluster, or running the client from the cluster, is probably less painful.
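If you do decide to go the port-forwarding route, one partial mitigation (a sketch assuming a Spark 1.x standalone cluster; the port numbers below are arbitrary placeholders you would then need to open) is to pin some of the otherwise-random ports with properties such as spark.driver.port and spark.blockManager.port, so you only have to expose a small, known set:
./bin/spark-submit --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 --conf spark.driver.port=7001 --conf spark.blockManager.port=7005 --class SparkPi ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar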
I advise against submitting Spark jobs remotely using the port-opening strategy, because it can create security problems and is, in my experience, more trouble than it's worth, especially due to having to troubleshoot the communication layer.
Alternatives:
1) Livy - now an Apache project! http://livy.io or http://livy.incubator.apache.org/
2) Spark Job server - https://github.com/spark-jobserver/spark-jobserver
I am using Spark 1.1.1. I followed the instructions given at https://spark.apache.org/docs/1.1.1/ec2-scripts.html and have a cluster of 1 master node and 1 worker running on EC2.
I have made a jar of the application and rsynced it to the slaves. When I run the application using spark-submit with deploy-mode client, the application works. However, when I do so with deploy-mode cluster, it gives me an error saying it cannot find the jar on the worker. The permissions on the jar are 755 on both the master and the worker.
I am not sure whether the application is actually using the workers when I run it with deploy-mode=client. I don't think it is, since the worker UI does not show any completed jobs. It does, however, show failed jobs during deploy-mode=cluster.
Am I doing something wrong? Thank you for your help.
You can check whether executors are assigned to the application on the /executors page on port 4040 (e.g. http://localhost:4040/executors/). If you only see <driver>, then you are not using the worker. If you see one line for <driver> and another line (with ID 0, unless it has restarted), then the worker is also providing an executor to your application. There you can also see how many tasks it has completed for your application, among other stats.