Hadoop dfs put fails - hadoop

I am getting this error (hadoop dfs -put) when the process is triggered from a shell script (run by Jenkins) during the night; however, it runs fine if I retry it manually later.
put: Lease mismatch on /shared/file1.tsv._COPYING_ owned by DFSClient_NONMAPREDUCE_-2031300157_1 but is accessed by DFSClient_NONMAPREDUCE_-1375175076_1
13/08/07 07:11:39 ERROR hdfs.DFSClient: Failed to close file /shared/epg/programs/programs/20130812/programs__20130812_20130807.tsv._COPYING_
Any suggestion as to why this might happen?

Related

Hadoop errorcode -1000, No space available in any of the local directories

I'm using Windows 7 with Hadoop 2.10.1 installed as shown here: https://exitcondition.com/install-hadoop-windows/ and I get an error when running my job:
INFO mapreduce.Job:
Job job_1605374051781_0001 failed with state FAILED due to:
Application application_1605374051781_0001 failed 2 times
due to AM Container for appattempt_1605374051781_0001_000002 exited with
exitCode: -1000 Failing this attempt.Diagnostics:
[2020-11-14 18:17:54.217]No space available in any of the local directories.
The expected output is several lines of text and my disks are nowhere near full (at least 10GB free). The code is some generic mapreduce job that I cannot post here because it's the intellectual property of the university.
Any tips on how to solve the "No space available" error?
For clarification I'm using only my PC, I'm not connected to other machines.
PS: I've solved it, as said here: Hadoop map reduce example stuck on Running job, by user "banu reddy" (https://stackoverflow.com/users/4249076/banu-reddy): the free HDD space needs to be at least 10% of the disk.
Hadoop's jobs are executed within the framework's distributed filesystem, aka HDFS, which works independently from the local filesystem (even when operating on just one machine, as you clarified).
That basically means that the error you got refers to the disk space available to HDFS and not to your hard drives in general. To check whether HDFS has enough disk space to run the job, you can execute the following command in the terminal:
hdfs dfs -df -h
Which can have an output like the following (the sizes here are just an example, and I'm ignoring the warning I get on my Hadoop setup):
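Filesystem               Size    Used  Available  Use%
hdfs://localhost:9000  46.6 G   1.2 G     40.3 G    3%
(The filesystem URI and the sizes above are only an illustration; the command will report your own values.)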
If the command output on your system indicates that the available disk space is low or non-existent, you can individually delete directories from the HDFS,
by first checking which directories and files are stored:
hadoop fs -ls
And then deleting each directory from the HDFS:
hadoop fs -rm -r name_of_the_folder
Or file from the HDFS:
hadoop fs -rm name_of_the_file
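To decide which directories are actually worth deleting, you can also check how much space each one uses before removing anything, for example:
hdfs dfs -du -h /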
Alternatively, you can empty everything stored in the HDFS to be sure that you will not hit the disk space limit again any time soon. You can do that by stopping the YARN and HDFS daemons at first:
stop-all.sh
Then enabling only the HDFS daemon:
start-dfs.sh
Then formatting everything on the namenode (aka the HDFS in your system, not your local files of course):
hadoop namenode -format
And enabling YARN and HDFS daemons at last:
start-all.sh
Remember to re-run the hdfs dfs -df -h command after deleting things from the HDFS, to make sure you actually freed up space.

Connection refused error for Hadoop

When I start my system and open Hadoop, it gives the error "Connection refused".
When I format my namenode using hadoop namenode -format, I'm able to access my Hadoop directory using hadoop dfs -ls /.
But every time, I have to format my namenode.
You can't just turn off your computer and expect Hadoop to pick up where it left off when you turn the system back on.
You need to actually run stop-dfs.sh to prevent corruption in the Namenode and Datanode directories.
If you do get "connection refused", check both the namenode and datanode logs to see why they're not starting; otherwise it's a network problem.
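A typical shutdown/startup cycle looks something like this (assuming the Hadoop sbin scripts are on your PATH):
stop-dfs.sh    # run before shutting the machine down
start-dfs.sh   # run after the next boot, instead of reformatting
jps            # NameNode and DataNode should both be listed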

Cannot start running on browser the namenode for Hadoop

It is my first time installing Hadoop, on Linux (Fedora) running in a VM (using Parallels on my Mac). I followed every step in this video, including the textual version of it. Then, when I open localhost (or the equivalent value from hostname) on port 50070, I get the following message:
...can't establish a connection to the server at localhost:50070
By the way, when I run the jps command, I don't have the datanode and namenode, unlike at the end of the textual version of the tutorial.
Mine has only the following processes running:
6021 NodeManager
3947 SecondaryNameNode
5788 ResourceManager
8941 Jps
When I run the hadoop namenode command, I get some of the following [redacted] errors:
Cannot access storage directory /usr/local/hadoop_store/hdfs/namenode
16/10/11 21:52:45 WARN namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /usr/local/hadoop_store/hdfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
By the way, I tried to access the above-mentioned directories, and they exist.
Any hint for this newbie? ;-)
You need to give read and write permission on the directory /usr/local/hadoop_store/hdfs/namenode to the user you are running the services as.
Once done, run the format command: hadoop namenode -format
Then try to start your services.
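Putting the whole sequence together, assuming you run the Hadoop daemons as a user called hduser (substitute whichever user you actually use):
sudo chown -R hduser:hduser /usr/local/hadoop_store/hdfs/namenode
sudo chmod -R 750 /usr/local/hadoop_store/hdfs/namenode
hadoop namenode -format
start-dfs.sh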
Delete the files under /app/hadoop/tmp/*, then try formatting the namenode again and running start-dfs.sh and start-yarn.sh.
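Spelled out as commands (the /app/hadoop/tmp path is the one named in this answer; yours depends on hadoop.tmp.dir in core-site.xml):
rm -rf /app/hadoop/tmp/*
hadoop namenode -format
start-dfs.sh
start-yarn.sh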

java.io.EOFException: Premature EOF: no length prefix available in Spark on Hadoop

I'm getting this weird exception. I'm using Spark 1.6.0 on Hadoop 2.6.4 and submitting Spark job on YARN cluster.
16/07/23 20:05:21 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-532134798-128.110.152.143-1469321545728:blk_1073741865_1041
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2203)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:867)
16/07/23 20:49:09 ERROR server.TransportRequestHandler: Error sending result RpcResponse{requestId=4719626006875125240, body=NioManagedBuffer{buf=java.nio.HeapByteBuffer[pos=0 lim=81 cap=81]}} to ms0440.utah.cloudlab.us/128.110.152.175:58944; closing connection
java.nio.channels.ClosedChannelException
I was getting this error when running on Hadoop 2.6.0 and thought the exception might be a bug like this, but even after changing to Hadoop 2.6.4 I'm getting the same error. There is no memory problem; my cluster is fine in terms of HDFS and memory. I went through this and this, but no luck.
Note: 1. I'm using Apache Hadoop and Spark, not CDH/HDP. 2. I'm able to copy data into HDFS and even able to execute another job on this cluster.
Check file permissions of dfs directory:
find /path/to/dfs -group root
In general, the owner and permission group should be hdfs.
Since I had started the HDFS service as the root user, some dfs block files were generated with root permissions.
I solved the problem after changing them to the right permissions:
sudo chown -R hdfs:hdfs /path/to/dfs
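To confirm nothing is left owned by root afterwards (this assumes the HDFS daemons run as the hdfs user, as above), you can repeat the check:
find /path/to/dfs -group root     # should print nothing now
find /path/to/dfs ! -user hdfs    # lists anything still not owned by hdfs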

Hadoop Basic Examples WordCount

I am getting this error with a mostly out-of-the-box configuration of version 0.20.203.0.
Where should I look for a potential issue? Most of the configuration is out of the box. I was able to visit the local web pages for HDFS and the task manager.
I am guessing the error is related to a permissions issue on Cygwin and Windows. Also, from googling the problem, some say there might be some kind of out-of-memory issue. It is such a simple example that I don't see how that could be.
When I try to run the wordcount example:
$ hadoop jar hadoop-examples-0.20.203.0.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output6
I get this error:
2011-08-12 15:45:38,299 WARN org.apache.hadoop.mapred.TaskRunner:
attempt_201108121544_0001_m_000008_2 : Child Error
java.io.IOException: Task process exit with nonzero status of 127.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
2011-08-12 15:45:38,878 WARN org.apache.hadoop.mapred.TaskLog: Failed to retrieve stdout log for task: attempt_201108121544_0001_m_000008_1
java.io.FileNotFoundException: E:\projects\workspace_mar11\ParseLogCriticalErrors\lib\h\logs\userlogs\job_201108121544_0001\attempt_201108121544_0001_m_000008_1\log.index (The system cannot find the file specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:106)
at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:102)
at org.apache.hadoop.mapred.TaskLog.getAllLogsFileDetails(TaskLog.java:112)
...
The userlogs/job* directory is empty. Maybe there is some permission issue with those directories.
I am running on Windows with Cygwin, so I don't really know which permissions to set.
I couldn't figure out this problem with the current version of Hadoop. I reverted from the current version and went to a previous release, hadoop-0.20.2. I had to play around with the core-site.xml configuration file and temp directories, but I eventually got HDFS and MapReduce to work properly.
The issue seems to be Cygwin, Windows, and the drive setup that I was using. Hadoop launches a new JVM process when it tries to invoke a 'child' map/reduce task. The actual JVM execute statement is in some shell script.
In my case, Hadoop couldn't find the path to that shell script. I am assuming that the status code 127 error was the result of the Java Runtime exec not finding the shell script.
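As a quick illustration of why exit code 127 points at a missing executable, a shell returns exactly that code whenever a command can't be found (generic example, not specific to Hadoop):
$ this-command-does-not-exist
bash: this-command-does-not-exist: command not found
$ echo $?
127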
