Hadoop MapReduce: java.io.EOFException: Premature EOF: no length prefix available

When I try the Example: WordCount v1.0 from
http://hadoop.apache.org/docs/r2.7.4/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v1.0
I get the warnings and exceptions below.
I also found that when I put some .txt files into HDFS, I get the same EOFException. Does anyone know why?

I didn't have write permission to the destination directory. Once I got the permission, it worked fine. The error message was deceiving.
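If you hit the same exception, it may be worth checking the permissions on the HDFS destination before digging into the stack trace. A minimal sketch, assuming /user/hadoop/input is the target directory and hadoop is the user running the job (adjust both to your setup):
hdfs dfs -ls /user                            # check who owns the destination and its mode
hdfs dfs -mkdir -p /user/hadoop/input         # create the directory if it does not exist
hdfs dfs -chown hadoop /user/hadoop/input     # run as the HDFS superuser if ownership is wrong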

Related

Error: -copyFromLocal: java.net.UnknownHostException

I am new to Java, Hadoop, etc.
I am having a problem when trying to copy a file to HDFS.
It says: "-copyFromLocal: java.net.UnknownHostException: quickstart.cloudera (...)"
How can I solve this? It is an exercise. You can see the problem in the images below.
Image with the problem
Image 2 with the error
Thank you very much.
As the error says, you need to supply the HDFS folder path as the destination, so the command should look like:
hadoop fs -copyFromLocal words.txt /HDFS/Folder/Path
Almost all errors you hit while working with Hadoop surface as Java exceptions, since MapReduce is mostly written in Java, but that doesn't mean the problem is in your Java code.
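For completeness, a minimal sketch of the full sequence, assuming /user/cloudera/input is a directory you have write access to (the path is only an example):
hadoop fs -mkdir -p /user/cloudera/input       # create the destination directory in HDFS
hadoop fs -copyFromLocal words.txt /user/cloudera/input/
hadoop fs -ls /user/cloudera/input             # confirm the file arrived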

How do you transfer files onto the Hadoop FS (HDFS) on the Windows cmdline without Cygwin?

I have zero experience with Hadoop, but suddenly have to use it at work with Spark on Windows. My question has been asked a few times here, but I could never quite get the syntax for what I need, so here it is. I'm trying to transfer a simple file called:
gensortText.txt, which let's say is at c:\gensortText.txt
I know you can use hadoop fs -copyFromLocal. I've tried these things:
hadoop fs -copyFromLocal C:\gensortText.txt hdfs://0.0.0.0:19000
ERROR: Relative path in absolute URI.
hadoop fs -copyFromLocal C:\gensortOutText.txt \tmp\hadoop-Administrator\dfs
ERROR: copyFromLocal: `tmphadoop-Administratordfs': No such file or directory
and a number of other variations with hdfs: and the tmp directory, all of which returned similar errors.
I have Hadoop in c:\deploy as suggested in the Hadoop2Windows guide (which works and allowed me to run Hadoop; I can access the Web GUI and all that). Hadoop has created my new HDFS at c:\temp. Please, someone help me figure out how to transfer files into the system. It could even be done manually if that's possible, but that doesn't seem to work either, as nothing shows up in the Web GUI when I go to "Utilities->Browse the Filesystem".
Can someone please help? I can provide any information that's relevant, but I'm so new to this that I don't really know what would be helpful. I think it's just my syntax for the cmdline tool. Can someone give me a concrete example of how to use hadoop fs -copyFromLocal or another simple way to do this? Sorry for my ignorance on the subject, and thanks for any help.
To be able to run hadoop commands on Windows you need to have winutils installed and visible to the hadoop process.
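A minimal sketch of what that might look like from cmd, assuming winutils.exe sits in c:\deploy\bin (matching the install described above) and that /input is just an example destination path:
set HADOOP_HOME=c:\deploy
set PATH=%PATH%;%HADOOP_HOME%\bin
hadoop fs -mkdir -p /input
hadoop fs -copyFromLocal C:\gensortText.txt /input/
hadoop fs -ls /input
Note that the destination here is a plain HDFS path; the drive-letter colon in Windows paths is a frequent cause of the "Relative path in absolute URI" error when a path gets interpreted as a URI.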

pig local mode spill data issue

I am trying to solve this issue but can't figure it out. The Pig script on my development machine ran successfully against a 1.8 GB data file.
When I try to run it on the server, it says it cannot find a local device to spill data to (spill0.out).
I have modified the pig.temp.dir property in the pig.properties file to point to a location with enough space.
error:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill0.out
So how do I find out where Pig is spilling the data, and can the spill directory location be changed somehow?
I am using Pig in local mode.
Any ideas, suggestions, or workarounds would be of great help.
Thanks.
I found an answer.
We need to put the following into the $PIG_HOME/conf/pig.properties file (a sketch follows below):
mapreduce.jobtracker.staging.root.dir
mapred.local.dir
pig.temp.dir
and then test.
This has helped me solve the problem.
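A minimal sketch of what those entries might look like, assuming /data/pig-tmp is on a partition with enough free space (the path is only an example):
# $PIG_HOME/conf/pig.properties
mapreduce.jobtracker.staging.root.dir=/data/pig-tmp/staging
mapred.local.dir=/data/pig-tmp/local
pig.temp.dir=/data/pig-tmp/temp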
This is not a problem with Pig.
I'm not using Pig and I get exactly the same error.
The problem seems to be more related to Hadoop. I also use it in local mode, with Hadoop 2.6.0.
I had no luck with these answers; Pig (version 0.15.0) was still writing pigbag* files to the /tmp dir, so I just renamed my /tmp dir and created a symbolic link to the desired location like this:
sudo -s #change to root
cd /
mv tmp tmp_local
ln -s /desired/new/tmp/location tmp
chmod 1777 tmp
mv tmp_local/* tmp
Make sure there are no active applications writing to tmp folder at the time of running these commands.

unable to set up a pseudo-distributed hadoop cluster

I am using CentOS 7. I downloaded and untarred Hadoop 2.4.0 and followed the instructions as per the link Hadoop 2.4.0 setup.
I ran the following command:
./hdfs namenode -format
Got this error :
Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode
I see a number of posts with the same error with no accepted answers and I have tried them all without any luck.
This error can occur if the necessary jar files are not readable by the user running the "./hdfs" command, or are misplaced so that they can't be found by hadoop/libexec/hadoop-config.sh.
Check the permissions on the jar files under hadoop-install/share/hadoop/*:
ls -l share/hadoop/*/*.jar
and if necessary, chmod them as the owner of the respective files to ensure they're readable. Something like chmod 644 should be sufficient to at least check whether that fixes the initial problem. For a more permanent fix, you'll likely want to run the hadoop commands as the same user that owns all the files.
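A minimal sketch of that check and fix, assuming the tarball was untarred into ~/hadoop-2.4.0 (adjust the path to your install):
cd ~/hadoop-2.4.0
ls -l share/hadoop/*/*.jar                              # look for jars that are not readable by your user
find share/hadoop -name '*.jar' -exec chmod 644 {} +    # run as the owner of the files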
I followed the link Setup hadoop 2.4.0
and I was able to get past the error message.
It seems the documentation on the Hadoop site is incomplete.

hdfs data directory "is in an inconsistent state: is incompatible with others."

Sorry, but this is getting on my nerves...
Exactly when I start loading a table through Hive, I start getting this error, and dear old Google is not able to help either.
My situation:
single-node setup. The namenode is working properly.
Datanode startup is failing with this message:
ERROR datanode.DataNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /xxxxxx/hadoop/hdfs-data-dir is in an inconsistent state: is incompatible with others.
I have already tried to re-format my namenode, but it doesn't help.
I also tried to find ways to "format" my datanode, but no success so far.
Help, please...
This site pointed me at a solution after a drive got reformatted:
I ran into a problem with hadoop where it wouldn't start up after I reformatted a drive. To fix this, make sure the VERSION number is the same across all hadoop directories:
md5sum /hadoop/sd*/dfs/data/current/VERSION
If they aren't the same version across all partitions, then you will get the error.
I simply copied the VERSION information from one of the other drives, changed permissions, and restarted HDFS.
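A minimal sketch of that repair, assuming /hadoop/sd1 holds a good copy of the metadata, /hadoop/sd2 is the reformatted drive, and the datanode runs as the hdfs user (all of these are only examples):
cp /hadoop/sd1/dfs/data/current/VERSION /hadoop/sd2/dfs/data/current/VERSION
chown hdfs:hdfs /hadoop/sd2/dfs/data/current/VERSION    # match ownership of the other copies
stop-dfs.sh && start-dfs.sh                             # restart HDFS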
Found a fix.
I needed to:
create a fresh HDFS data directory,
remove the write permission from the group (chmod g-w xxxx), and
remove all temporary files from /tmp pertaining to hadoop/hdfs.
A sketch of those steps follows below.
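A minimal sketch of those three steps, assuming /data/hdfs-data-dir is the new data directory and the datanode runs as the hdfs user (both are only examples; your dfs.datanode.data.dir setting must point at the new directory):
mkdir -p /data/hdfs-data-dir
chown hdfs:hdfs /data/hdfs-data-dir
chmod g-w /data/hdfs-data-dir        # the datanode rejects data directories that are group-writable
rm -rf /tmp/hadoop-*                 # clear stale hadoop/hdfs temp files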
I am convinced that there could be a better/cleaner way to fix this,
so I'm still keeping the question open.
