Simplest Hive query just doesn't work - hadoop

When i try to got a query SELECT * FROM A
I get all of information but when try to getting a query that exists a column name (or a join) SELECT a FROM A It returns some Exceptions:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hive-merza/hive_2014-06-25_09-12-19_567_6884805431385202274/-mr-10003/3a481893-3b7e-4cb7-bca7-d5ad103e3aa8 could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
Continuing ...
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-merza/mapred/staging/merza/.staging/job_201406250859_0004/libjars/hbase-0.94.16.jar could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at $Proxy12.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at $Proxy12.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3720)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3580)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2783)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3023)
Job Submission failed with exception 'org.apache.hadoop.ipc.RemoteException(java.io.IOException: File /tmp/hadoop-merza/mapred/staging/merza/.staging/job_201406250859_0004/libjars/hbase-0.94.16.jar could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
HADOOP 1.2.1
HBASE 0.94.16
HIVE 0.11.0
NOTE:my mapred and dfs is running.

The reason select * from table works and select a from table doesn't is because when u do a select * you need all the data which is lying there on hdfs hence it doesn't run's any M/R job
but in case of select a it has to run m/r job to strip the extra columns.
So effectively the issue is you are not able M/R job on your cluster, this issue is usually caused by insufficient space.
please check the total capacity of your cluster and used, remaining ratio using
hdfs dfsadmin -report
also check dfs.datanode.du.reserved in the hdfs-site.xml, if this value is larger than your remained capacity
look for other possible causes explained here

Solved!
Must be add some jar files to env-hive.sh
export HIVE_AUX_JARS_PATH=/.../hbase-***.jar,/.../hbase-handler***.jar,/.../zookeeperp***.jar

Related

How to put a large Text File in Hadoop HDFS

I am trying to process the Large Text file using spark, after installing the Spark cluster in Ec2 with 8 Gb of memory. How do I put my 100 GB of file in the hdfs to be able to process it using spark-scala script?
I am using following command
./hadoop fs -put /user/myuser/example.txt /src
but its throwing an Exception as follows
WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/myuser/example.txt could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3720)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3580)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2783)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3023)
WARN hdfs.DFSClient: Error Recovery for null bad datanode[0] nodes == null
WARN hdfs.DFSClient: Could not get block locations. Source file "/user/myuser/example.txt" - Aborting...
put: java.io.IOException: File /user/myuser/example.txt could only be replicated to 0 nodes, instead of 1
ERROR hdfs.DFSClient: Failed to close file /user/myuser/example.txt
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/myuser/example.txt could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3720)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3580)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2783)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3023)

Hadoop warning and error while copying to HDFS on Amazon Aws EC2

I am trying to setup Hadoop cluster on Amazon EC2. I was able to format namenode. The output of 'jps' command is following in the namenode:
5826 NameNode
6312 JobTracker
6192 SecondaryNameNode
8266 Jps
and in the slave node:
5452 DataNode
5699 Jps
5609 TaskTracker
So, I assume hadoop is running. While I am trying to copy data from local directory to HDFS with the command
hadoop fs -mkdir /user/ubuntu/clusters
hadoop fs -copyFromLocal clusters /user/ubuntu/clusters
I am getting data replication warning and other errors. The log is following
14/11/18 18:41:12 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/ubuntu/clusters/clusters could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3720)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3580)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2783)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3023)
14/11/18 18:41:12 WARN hdfs.DFSClient: Error Recovery for null bad datanode[0] nodes == null
14/11/18 18:41:12 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/ubuntu/clusters/clusters" - Aborting...
copyFromLocal: java.io.IOException: File /user/ubuntu/clusters/clusters could only be replicated to 0 nodes, instead of 1
14/11/18 18:41:12 ERROR hdfs.DFSClient: Failed to close file /user/ubuntu/clusters/clusters
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/ubuntu/clusters/clusters could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1920)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:783)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
at org.apache.hadoop.ipc.Client.call(Client.java:1113)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3720)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3580)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2783)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3023)
Any help will be highly appreciated.
Thanks.
One of the best step by step guide to create multinode cluster in Amazon EC2 is Here
It explains each and every step. You are already done with first part seems, Go through the second part which will help you..
Hope it helps you..

Adding Parallel option is failing the job in Pig

I am new to Pig and am trying to run the following PigScript on our 5-node Hadoop cluster.
The following script gives me the set intersection of two columns in a relation
register '/home/workspace/Pig/setIntersecUdf.jar';
define Inter com.cs.pig.SetIntersection();
a = load '/home/pig/pig-0.12.0/input/location.txt' as (location:chararray);
b = load '/home/pig/pig-0.12.0/input/location.txt' as (location:chararray);
c = CROSS a,b parallel 10;
c = DISTINCT c;
d = Foreach c generate $0,$1,Inter($0,$1) as intersection;
e = Filter d by intersection !='[]' parallel 10;
e = Filter e by $0!=$1 parallel 10;
Store e into '/home/Documents/Pig_output';
When I run this script without giving the 'PARALLEL' option, it runs fine. But, when I add this option, I get the following error in the tasktracker stack of the reducers and the job fails.
Java.io.IOException: org.apache.hadoop.ipc.RemoteException:
java.io.IOException: File /tmp/temp-10926921/tmp-1823693600/_temporary/_attempt_201401171541_0001_r_000000_0/part-r-00000 could only be replicated to 0 nodes, instead of 1 at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639) at
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736) at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:396) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387) at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:469) at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432) at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:404) at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256) at
org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176) at
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650) at
org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418) at
org.apache.hadoop.mapred.Child$4.run(Child.java:255) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:396) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/temp-10926921/tmp-1823693600/_temporary/_attempt_201401171541_0001_r_000000_0/part-r-00000 could only be replicated to 0 nodes, instead of 1 at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639) at
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736) at
sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:396) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387) at
org.apache.hadoop.ipc.Client.call(Client.java:1107) at
org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) at
$Proxy2.addBlock(Unknown Source) at
sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85) at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62) at $Proxy2.addBlock(Unknown Source) at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3686) at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3546) at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2749) at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2989)
Also, the diagnostic info for the failed job comes as:
# of failed Reduce Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201401171541_0001_r_000002
Kindly, help!
The problem appears to be that you are running out of disk space on your reducer. The CROSS operation is very expensive, in that it writes out every possible pair of records. You have 321,372 records, so your data will blow up by a factor of 321,372 when you use CROSS.
Since your data is about 6 MB, this translates to needing to write about 1.9 TB. When you specify PARALLEL 10, this gets spread across 10 reducers, which presumably doesn't max out any single machine's disk space. But without PARALLEL 10, Pig is defaulting to a single reducer (probably because of the small size of your input) and trying to write all of the data on that single machine.
Perhaps you could aggressively FILTER your results immediately after the CROSS so that Pig will throw most of those records away before trying to write to disk. Or you'll need to re-think your data flow so that you don't need to use CROSS.

File jobtracker.info could only be replicated to 0 nodes, instead of 1

I am trying to setup Hadoop cluster on Fedora 17.
When I give /bin/star-all.sh command daemons are getting started on masters and slaves nodes.
But when I view log file for data node on master node I get following EROOR
ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop1 cause:java.io.IOException: File /home/hadoop1/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
2013-03-23 15:37:08,205 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9100, call addBlock(/home/hadoop1/mapred/system/jobtracker.info, DFSClient_-838454688, null) from 127.0.0.1:40173: error: java.io.IOException: File /home/hadoop1/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /home/hadoop1/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
I am also trying to run wordcound program. While copying data into HDFS using command
$bin/hadoop dfs -copyFromLocal /home/hadoop1/Documents/wordcount/ /home/hadoop1/hdfs/data
I get following error
WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop1/hdfs/data/wordcount/pg20417.txt could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3510)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3373)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2589)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2829)
13/03/23 15:41:05 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
13/03/23 15:41:05 WARN hdfs.DFSClient: Could not get block locations. Source file "/home/hadoop1/hdfs/data/wordcount/pg20417.txt" - Aborting...
copyFromLocal: java.io.IOException: File /home/hadoop1/hdfs/data/wordcount/pg20417.txt could only be replicated to 0 nodes, instead of 1
13/03/23 15:41:05 ERROR hdfs.DFSClient: Exception closing file /home/hadoop1/hdfs/data/wordcount/pg20417.txt : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop1/hdfs/data/wordcount/pg20417.txt could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
Help in this regard is appreciated..
I managed to solve this issue...
Step I) There was firewall active on master and slaves node machines..
I disabled it by following command "systemctl disable iptables.service"
Step II) I wrongly assigned "hdfs://localhost:9100" to "fs.default.name" in slave's core-sites.xml configuration file. I changed it to "hdfs://master:9100"
Now my Hadoop Cluster is Up..
Thank You Chris for your kind help...
My reason for this error was not enough space on disk (~200MB was not enough).
I delete jobtracker.info on HDFS, then it works.
Hadoop seems to remake it
One more reason for this error (which I encountered) is a messed up /etc/hosts file causing namenode to listen only on the loopback interface.
This caused datanodes not to connect to namenode and replication level was not reached as a side effect. See namenode rejecting connections bug for more detail.

LeaseExpiredException in Hive

all. I run a hive query runs to 97% and exception shows that org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on sth.
Can anyone kindly explain the reason why this error occurred?
And this is a single user Hive cluster environment.
Thank you in advance.
2013-01-02 22:16:17,833 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /tmp/hive-hadoop/hive_2013-01-01_21-21-32_067_6367259756570557828/_task_tmp.-ext-10002/_tmp.000004_1 : org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /tmp/hive-hadoop/hive_2013-01-01_21-21-32_067_6367259756570557828/_task_tmp.-ext-10002/_tmp.000004_1 File does not exist. Holder DFSClient_attempt_201301012114_0002_m_000004_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1631)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1622)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1677)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1665)
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /tmp/hive-hadoop/hive_2013-01-01_21-21-32_067_6367259756570557828/_task_tmp.-ext-10002/_tmp.000004_1 File does not exist. Holder DFSClient_attempt_201301012114_0002_m_000004_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1631)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1622)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1677)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1665)
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy2.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy2.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3897)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3812)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:1345)
at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:275)
at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:328)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1446)
at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:277)
at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:260)
Does your hive query create parallel MR jobs ?
I had the same problem, and found that: LeaseExpiredException: No lease error on HDFS :
When the job ends he deletes /data/work/ folder. If few jobs are running in parallel the deletion will also delete the files of the another job. actually I need to delete /data/work/.
In other words this exception is thrown when the job try to access to files which are not existed anymore
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;

Resources