Can't connect to resource manager from IDEA - hadoop

I encounter a connection issue when submitting a job in yarn-client mode from IntelliJ IDEA.
I did set the environment variables and double-checked them by printing them out:
System.setProperty("YARN_CONF_DIR", "D:\\HadoopDev\\UserClick\\src\\main\\resources\\hadoop-vm");
System.setProperty("HADOOP_CONF_DIR", "D:\\HadoopDev\\UserClick\\src\\main\\resources\\hadoop-vm");
But I still got an error message telling me:
INFO - Connecting to ResourceManager at /0.0.0.0:8032
INFO - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
INFO - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
INFO - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
All Hadoop-related config files are in that folder. I have also tried uploading the jar and submitting it in yarn-client mode on the cluster itself, and that worked.
Any help? Thanks.

Setting .config("spark.hadoop.yarn.resourcemanager.address", "hadoop:8032") does the trick by overriding the ResourceManager address.
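For context, here is a minimal sketch of where that override would sit, assuming the job is built through a SparkSession in yarn-client mode; only the spark.hadoop.yarn.resourcemanager.address line comes from the answer above, the rest (including the app name) is illustrative:

import org.apache.spark.sql.SparkSession

// Sketch only: the resourcemanager.address entry is the fix from the answer;
// everything else is a generic yarn-client setup.
val spark = SparkSession.builder()
  .appName("UserClick")          // hypothetical app name
  .master("yarn")
  .config("spark.submit.deployMode", "client")
  .config("spark.hadoop.yarn.resourcemanager.address", "hadoop:8032")
  .getOrCreate()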

Related

What if the ResourceManager down?

In the newest version of Hadoop MapReduce (called 'YARN'), the JobTracker from the previous version has been replaced by the ResourceManager (called 'RM') and the ApplicationMaster.
The official documentation on the YARN architecture does not say how many RMs there are in a MapReduce cluster, and the architecture diagram shows only one RM in a cluster.
So, what happens if the only RM goes down? If there are several RMs, how do they work together?
I hope someone can explain it to me.
Thanks.
There is one ResourceManager per rack, but you can have several racks in your cluster.
If you try to submit a job while the ResourceManager is down, Hadoop will keep trying to connect to it, because it needs the RM to execute the job.
Here is an example of the logs when the RM is down and you try to submit a job:
14/06/06 09:39:54 INFO ipc.Client: Retrying connect to server: hadoop01.sii.fr/10.6.6.211:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/06/06 09:39:55 INFO ipc.Client: Retrying connect to server: hadoop01.sii.fr/10.6.6.211:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/06/06 09:39:56 INFO ipc.Client: Retrying connect to server: hadoop01.sii.fr/10.6.6.211:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
When the RM comes back up, the job is submitted correctly.

Hadoop-1.2.1 in Solaris 11.1 VM: Call to name-node failed on connection exception

Hi, I am following the guide linked below for a VirtualBox Solaris Zones Hadoop installation.
Oracle Solaris Zones Hadoop Setup
I was able to follow it successfully up to step 10. Once I tried to check the report, I got this error:
hadoop@name-node:~$ hadoop dfsadmin -report
14/05/17 16:45:12 INFO ipc.Client: Retrying connect to server: name-node/192.168.1.1:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/05/17 16:45:13 INFO ipc.Client: Retrying connect to server: name-node/192.168.1.1:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
....
14/05/17 16:45:21 INFO ipc.Client: Retrying connect to server: name-node/192.168.1.1:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
report: Call to name-node/192.168.1.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
hadoop@name-node:~$
Can someone kindly suggest a resolution?
Also, netstat shows this:
name-node.8021 . 0 0 128000 0 LISTEN
*.50030 . 0 0 128000 0 LISTEN
How do I configure dfsadmin to use port 8021 instead?
A step-by-step guide to configuring a Hadoop cluster on Oracle Solaris 11.1 using zones: http://hashprompt.blogspot.com/2014/05/multi-node-hadoop-cluster-on-oracle.html
This is probably an old question and you may already have solved it, but just in case anyone is wondering:
In core-site.xml, make the following change:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.1.1:8021/</value>
</property>
This configures the NameNode server port.
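For completeness, a sketch of how that property sits in a full core-site.xml, mirroring the snippet from this answer and assuming an otherwise default configuration; the address and port are the ones from this thread:

<?xml version="1.0"?>
<!-- core-site.xml: only fs.defaultFS is needed for this fix;
     everything else is left at its defaults. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.1:8021/</value>
  </property>
</configuration>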

Hadoop - Pseudo-Distributed Operation

I am trying to copy a file quangle.txt from my local system to Hadoop using the command below:
testuser@ubuntu:~/Downloads/hadoop/bin$ ./hadoop fs -copyFromLocal Desktop/quangle.txt hdfs://localhost/testuser/quangle.txt
13/11/28 06:35:50 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:51 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:52 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:53 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:54 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:57 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
13/11/28 06:35:59 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
copyFromLocal: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
I tried to ping 127.0.0.1 and got a response. Please advise.
Just add the correct port to the file path after localhost:
hdfs://localhost:9000/testuser/quangle.txt
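In other words, the original command would look like this, assuming the NameNode really is listening on port 9000 (a common default for fs.defaultFS in Hadoop 2.x setups):

./hadoop fs -copyFromLocal Desktop/quangle.txt hdfs://localhost:9000/testuser/quangle.txt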
Looks like your Name Node isn't running - try running the jps command and see if NameNode is listed in the running services (or you might have to run ps axww | grep NameNode if the NameNode was started by/under a different user).
Does sudo netstat -atnp | grep 8020 yield any results?
If the Name Node is refusing to start, then copy your Name Node logs into your original question (or post a new question - after first searching for the error to see if someone else has had this problem).
Try running jps to see the currently running Java processes.
Are all Hadoop processes running, especially the NameNode?
If yes, you should get output like this (with different process IDs):
10015 JobTracker
9670 TaskTracker
9485 DataNode
10380 Jps
9574 SecondaryNameNode
9843 NameNode
I think you can use hadoop fs -put ~/Desktop/quangle.txt /testuser; once it is copied, you can look it up via hadoop fs -ls /testuser in the /testuser directory.
Create the target directory first with the command hadoop fs -mkdir testuser and then try again; it worked for me that way.
Maybe there is something wrong with your settings for pseudo-distributed mode.
It should be configured in this order:
1. Fill in the configuration files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml (see the sketch after this list).
2. Configure SSH.
3. Format the HDFS filesystem.
4. Start and stop the daemons.
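As a reference for the first step, a minimal pseudo-distributed configuration might look like the one below. This is roughly the stock single-node setup from the Hadoop documentation, not the poster's actual files, and it assumes the NameNode should listen on localhost:9000:

<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>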

Connection Error in Apache Pig

I am running Apache Pig 0.11.1 with Hadoop 2.0.5.
Most simple jobs that I run in Pig work perfectly fine.
However, whenever I try to use GROUP BY on a large dataset, or the LIMIT operator, I get these connection errors:
2013-07-29 13:24:08,591 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2013-07-29 11:57:29,421 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 11:57:30,421 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 11:57:31,422 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
...
2013-07-29 13:24:18,597 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 13:24:18,598 [main] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:gpadmin (auth:SIMPLE) cause:java.io.IOException
The strange thing is that after these errors keep appearing for about 2 minutes, they stop, and the correct output shows up at the bottom.
So Hadoop is running fine and computing the proper output. The problem is just these connection errors that keep popping up.
The LIMIT operator always gets this error. It happens in both MapReduce mode and local mode. The GROUP BY operator works fine on small datasets.
One thing I have noticed is that whenever this error appears, the job has created and run multiple JAR files. However, after a few minutes of these messages popping up, the correct output finally appears.
Any suggestions on how to get rid of these messages?
Yes, the problem was that the job history server was not running.
All we had to do to fix this problem was enter this command into the command prompt:
mr-jobhistory-daemon.sh start historyserver
This command starts up the job history server. Now if we enter 'jps', we can see that the JobHistoryServer is running, and my Pig jobs no longer waste time trying to connect to the server.
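A quick sketch of that check, with made-up process IDs; the only thing that matters is that a JobHistoryServer line shows up in the jps output:

$ mr-jobhistory-daemon.sh start historyserver
$ jps
12345 NameNode
12456 DataNode
12567 ResourceManager
12678 NodeManager
12789 JobHistoryServer
12890 Jps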
I think this problem is related to a Hadoop mapred-site configuration issue. The History Server runs on localhost by default, so you need to add your configured host:
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>host:port</value>
</property>
Then fire this command:
mr-jobhistory-daemon.sh start historyserver
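For example, on the cluster from the logs above the history server's RPC port is the default 10020, so the value might look like the following; the hostname is a placeholder for whichever machine actually runs the history server:

<property>
  <name>mapreduce.jobhistory.address</name>
  <!-- hypothetical host; 10020 is the default history server RPC port -->
  <value>historyserver.example.com:10020</value>
</property>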
I am using Hadoop 2.6.0, so I had to do
$ mr-jobhistory-daemon.sh --config /usr/local/hadoop/etc start historyserver
where /usr/local/hadoop/etc is my HADOOP_CONF_DIR.
I am using Hadoop 2.2.0. This problem was due to the History Server not running. I had to start it with the following command:
[root@localhost ~]$ /usr/lib/hadoop-2.2.0/sbin/mr-jobhistory-daemon.sh start historyserver

Unable to add a datanode to Hadoop

I got all my settings right and I am able to run Hadoop (1.1.2) on a single node. However, after making the changes to the relevant files (/etc/hosts, *-site.xml), I am not able to add a DataNode to the cluster and I keep getting the following error on the slave.
Does anybody know how to rectify this?
2013-05-13 15:36:10,135 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-05-13 15:36:11,137 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-05-13 15:36:12,140 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Check the value of fs.default.name in your core-site.xml conf file (on each node in your cluster). This needs to be the network name of the name node, and I suspect you currently have it set to hdfs://localhost:54310.
Failing that, check for any mention of localhost in your Hadoop configuration files on all nodes in your cluster:
grep localhost $HADOOP_HOME/conf/*.xml
Try replacing localhost with the namenode's IP address or network name.
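For instance, the property on every node might end up looking like this, where master-node is a placeholder for the NameNode's real hostname or IP (54310 is the port from the logs above):

<property>
  <name>fs.default.name</name>
  <!-- "master-node" is a placeholder for the NameNode's actual hostname or IP -->
  <value>hdfs://master-node:54310</value>
</property>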

Resources