I am running Apache Pig .11.1 with Hadoop 2.0.5.
Most simple jobs that I run in Pig work perfectly fine.
However, whenever I try to use GROUP BY on a large dataset, or the LIMIT operator, I get these connection errors:
2013-07-29 13:24:08,591 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
013-07-29 11:57:29,421 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 11:57:30,421 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 11:57:31,422 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
...
2013-07-29 13:24:18,597 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 13:24:18,598 [main] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:gpadmin (auth:SIMPLE) cause:java.io.IOException
The strange thing is that after these errors keeping appearing for about 2 minutes, they'll stop, and the correct output shows up at the bottom.
So Hadoop is running fine and computing the proper output. The problem is just these connection errors that keep popping up.
The LIMIT operator always gets this error. It happens on both MapReduce mode and local mode. The GROUP BY operator will work fine on small datasets.
One thing that I have noticed is that whenever this error appears, the job had created and ran multiple JAR files during the job. However, after a few minutes of these message popping up, the correct output finally appears.
Any suggestions on how to get rid of these messages?
Yes the problem was that the job history server was not running.
All we had to do to fix this problem was enter this command into the command prompt:
mr-jobhistory-daemon.sh start historyserver
This command starts up the job history server. Now if we enter 'jps', we can see that the JobHistoryServer is running and my Pig jobs no longer waste time trying to connect to the server.
I think, this problem is related to hadoop mapred-site configuration issue. History Server runs default in localhost, so you need to add your configured host.
<property>
<name>mapreduce.jobhistory.address</name>
<value>host:port</value>
</property>
then fire this command -
mr-jobhistory-daemon.sh start historyserver
I am using Hadoop 2.6.0, so I had to do
$ mr-jobhistory-daemon.sh --config /usr/local/hadoop/etc start historyserver
where, /usr/local/hadoop/etc is my HADOOP_CONF_DIR.
I am using Hadoop 2.2.0. This problem was due to The History server was not running. I had to start the history server. I used following command to start history server:
[root#localhost ~]$ /usr/lib/hadoop-2.2.0/sbin/mr-jobhistory-daemon.sh
start historyserver
Related
I'm starting an application which loads creates a new YarnConfiguration() object.
When I'm running it I'm setting HADOOP_CONF_DIR to /etc/hadoop/conf where the configuration files are.
I'm then starting the application;
yarn -jar jarname.jar --config.file config/local.properties and getting the following error;
INFO: Connecting to ResourceManager at /0.0.0.0:8032
Jul 25, 2016 12:33:49 PM org.apache.hadoop.ipc.Client handleConnectionFailure
INFO: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
So it doesn't seem to be picking up the details of the yarn resource manager which are running on another client.
the yarn-site.xml has the correct values in it.
Ignoring the shame of how long this took me to spot, incase anyone else has the same problem -
It came down to the -jar which was incorrect. The command needed to be yarn jar without the hypen.
I encounter a connection issue when submiting job in yarn-client mode from IDEA Intellij.
I did set the env variables and double checked it by printing them out:
System.setProperty("YARN_CONF_DIR", "D:\\HadoopDev\\UserClick\\src\\main\\resources\\hadoop-vm");
System.setProperty("HADOOP_CONF_DIR", "D:\\HadoopDev\\UserClick\\src\\main\\resources\\hadoop-vm");
But I still got error message telling me:
INFO - Connecting to ResourceManager at /0.0.0.0:8032
INFO - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 >time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, >sleepTime=1 SECONDS)
INFO - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 》>time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, >sleepTime=1 SECONDS)
INFO - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 >time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, >sleepTime=1 SECONDS)
All config files hadoop related are in the folder.And I have tried to upload the jar and submited it in yarn-client mode in the cluster which turned out it worked.
Any help?Thx~
set .config("spark.hadoop.yarn.resourcemanager.address", "hadoop:8032")
does the override trick.
I'm running a Pig job that fails to connect to the Hadoop job history server.
The task (usually any task with GROUP BY) runs for a while and then it starts with a message like:
2015-04-21 19:05:22,825 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-04-21 19:05:26,721 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-04-21 19:05:29,721 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
It then continues for a while retrying the connection. Sometimes it precedes further with the job. Othertimes it throws this exception:
2015-04-21 19:05:55,822 [main] WARN org.apache.pig.tools.pigstats.mapreduce.MRJobStats - Unable to get job counters
java.io.IOException: java.io.IOException: java.net.NoRouteToHostException: No Route to Host from cluster-01/10.10.10.11 to 0.0.0.0:10020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getCounters(HadoopShims.java:132)
at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addCounters(MRJobStats.java:284)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:235)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:165)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:360)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:280)
I found this question here but in my case the job history server is started. If I run netstat, I find:
tcp 0 0 0.0.0.0:10020 0.0.0.0:* LISTEN 12073/java off (0.00/0/0)
Where 12073 is ...
12073 pts/4 Sl 0:07 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Dproc_historyserver -Xmx1000m -Djava.library.path=/data/hadoop/hadoop/lib -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/data/hadoop/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop/logs -Dhadoop.log.file=mapred-hadoop-historyserver-cluster-01.log -Dhadoop.root.logger=INFO,RFA -Dmapred.jobsummary.logger=INFO,JSA -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer
I tried opening the port 10200 in case it was a firewall issue:
ACCEPT tcp -- anywhere anywhere tcp dpt:10020
... but no luck.
After a few minutes, some of the tasks just arbitrarily continue to the next part.
I'm using Hadoop 2.3 and Pig 0.14.
My question is:
1) What are the possible reasons why Pig cannot connect to the job history server (JHS) given that the JHS is running on the same port that Pig looks for it?
... or failing that ...
2) Is there any way to just tell Pig to stop trying to connect to the JHS and continue with the task?
It seems that most Hadoop installation/configuration guides neglect to mention configuring the Job History Server. It seems that Pig, in particular, relies on this server. It also seems like the default (local) settings for the JHS won't work in a multi-node cluster.
The solution was to add the hostname of the server into the configuration in mapred-site.xml to make sure it could be accesses from the other machines. (In my version of the file, the lines had to be added as "new" ... there were no previous settings.)
<property>
<name>mapreduce.jobhistory.address</name>
<value>cm:10020</value>
<description>Host and port for Job History Server (default 0.0.0.0:10020)</description>
</property>
Then restart the job history server:
mr-jobhistory-daemon.sh stop historyserver
mr-jobhistory-daemon.sh start historyserver
If you get a bind exception (port in use), it means the stop didn't work. Either
Use ps ax | grep -e JobHistory to get the process and kill it manually with kill -9 [pid]. Then call the start command above again. Or
Use a different port in the configuration
Pig should pick up the new settings automatically. Run a Pig script and hope for the best.
start history server in hadoop bin using the below command
bin$ ./mr-jobhistory-daemon.sh start historyserver
run pig using the below command
$pig
Config mapreduce.jobhistory.address in hadoop/etc/hadoop/mapred-site.xml,
then:
mapred --daemon start
The solution was the History server was not running:
[user#vm9 sbin]$ ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/user/hadoop-2.7.7/logs/mapred-user-historyserver-vm9.out
[user#vm9 sbin]$ jps
5683 NameNode
6309 NodeManager
5974 SecondaryNameNode
8075 RunJar
6204 ResourceManager
8509 JobHistoryServer
5821 DataNode
8542 Jps
[user#vm9 sbin]$
Now pig can run properly and it will connect to the job history server and the dump command is working fine.
In the newest version of Hadoop mapreduce(called 'Yarn'), JobTracker(exists in previous version) has been replaced by the ResourceManager(called 'RM') and ApplicationMaster.
In official document about Yarn architecture, there are no words say that how many RMs are there in a MapReduce cluster, and the given graph about Yarn architecture shows only 1 RM exists in a cluster.
So, what if the only RM down? If there are several RMs, how do they work together?
Hope someone can explain it to me.
Thanks.
There is 1 RessourceManager per rack but you can have several racks in your cluster.
If you try to submit a job while RessourceManager is down, Hadoop will try to connect to the RessourceManager because it needs it to execute the job.
Here is an example of the logs when the RM is down and try to submit a job :
14/06/06 09:39:54 INFO ipc.Client: Retrying connect to server: hadoop01.sii.fr/10.6.6.211:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/06/06 09:39:55 INFO ipc.Client: Retrying connect to server: hadoop01.sii.fr/10.6.6.211:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/06/06 09:39:56 INFO ipc.Client: Retrying connect to server: hadoop01.sii.fr/10.6.6.211:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
When the RM is back, the job is submitting correctly.
Hi I am following this below guide in link for VirtualBox Solaris Zones Hadoop installation.
Oracle Solaris Zones Hadoop Setup
I was able to successfully follow till step 10. Once I tried to check report I am getting this error::
adoop#name-node:~$ hadoop dfsadmin -report
14/05/17 16:45:12 INFO ipc.Client: Retrying connect to server: name-node/192.168.1.1:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/05/17 16:45:13 INFO ipc.Client: Retrying connect to server: name-node/192.168.1.1:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
....
14/05/17 16:45:21 INFO ipc.Client: Retrying connect to server: name-node/192.168.1.1:8020. Already tried 9 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
report: Call to name-node/192.168.1.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
hadoop#name-node:~$
can someone kindly suggest resolution.
Also netstat shows this
name-node.8021 . 0 0 128000 0 LISTEN
*.50030 . 0 0 128000 0 LISTEN
how to configure dfsadmin to port 8021 instead?
Step by step to configure Hadoop cluster on Oracle Solaris 11.1 using zones --- http://hashprompt.blogspot.com/2014/05/multi-node-hadoop-cluster-on-oracle.html
Probably this is too old question and you might have already solved it. But just in case if anyone is wondering.
in core-site.xml make the following changes
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.1.1:8021/</value>
</property>
This will configure name node server port.