I can run hadoop jar apache-nutch-1.7.job org.apache.nutch.crawl.Crawl args from the command line just fine, but when I run it from Oozie, I get an exception:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://server:8020/user/hdfs/.staging/job_1416525929767_0494/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1566)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1430)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1388)
An old JIRA reported this exception:
https://issues.apache.org/jira/browse/MAPREDUCE-5471
but it was supposedly fixed back in version 2.1.1-beta. I am on YARN 2.5.0.
Has anyone else seen this?
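For anyone comparing notes, two quick things that can be checked (a sketch only; /etc/hadoop/conf is an assumed config location and the staging path is the one from the stack trace):
grep -A1 yarn.app.mapreduce.am.staging-dir /etc/hadoop/conf/mapred-site.xml   # what the cluster thinks the MR staging area is
hdfs dfs -ls hdfs://server:8020/user/hdfs/.staging/                           # what the submitter actually wrote there (listing may require the hdfs user)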
Related
I was running a Spark job which makes use of other files passed in through the --archives flag of Spark:
spark-submit .... --archives hdfs:///user/{USER}/{some_folder}.zip .... {file_to_run}.py
Spark is running on YARN, and when I tried this with Spark 1.5.1 it was fine.
However, when I ran the same command with Spark 2.0.1, I got:
ERROR yarn.ApplicationMaster: User class threw exception: java.io.IOException: Cannot run program "/home/{USER}/{some_folder}/.....": error=2, No such file or directory
Since the resource is managed by YARN, it is hard to manually check whether the archive gets successfully decompressed and exists when the job runs.
I wonder if anyone has experienced a similar issue.
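One thing worth comparing, as a sketch: on YARN the archive is unpacked into each container's working directory, not under /home/{USER}, and it can be given an alias after a '#' (the alias name my_folder below is my own placeholder):
spark-submit .... \
  --archives hdfs:///user/{USER}/{some_folder}.zip#my_folder \
  .... {file_to_run}.py
# inside the application, the unpacked contents are then reachable via the relative
# path my_folder/... in each container's working directory, rather than via an
# absolute path like /home/{USER}/{some_folder}/...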
I want to remove Oozie and reinstall a fresh copy.
I installed Oozie by following these steps:
http://hadooptutorial.info/apache-oozie-installation-on-ubuntu-14-04/
Can anyone please help me remove Oozie completely from my laptop?
I am using the latest version of Ubuntu, with Hadoop 2.6.0.
Earlier I removed the /usr/lib/oozie folder, but it did not work out for me after installing a fresh copy of Oozie (I got many errors and exceptions).
I am describing a few of those errors below, from after I installed the fresh copy:
oozie admin -oozie http://localhost:11000/oozie -status
Connection exception has occurred [ java.net.ConnectException Connection refused ]. Trying after 1 sec. Retry count = 1
oozied.sh stop
PID file found but no matching process was found. Stop aborted.
oozie-setup.sh sharelib create -fs hdfs://localhost:9000
setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/io/filefilter/IOFileFilter
Thank you
Removing /usr/lib/oozie will not remove Oozie entirely.
Something more is required.
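As a rough sketch of what a fuller cleanup might look like, assuming the layout from the linked tutorial (adjust the paths to wherever Oozie was actually unpacked and configured):
oozied.sh stop                          # stop the server first, if it is still running
sudo rm -rf /usr/lib/oozie              # the Oozie distribution itself
sudo rm -rf /etc/oozie                  # its configuration directory, if one was created
rm -rf ~/oozie-*                        # any extracted tarball or build directory left in the home dir
hdfs dfs -rm -r /user/$USER/share/lib   # the sharelib that oozie-setup.sh installed into HDFS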
I have a jar file which contains the Mahout jars as well as other code I wrote.
It works fine in my local machine.
I would like to run it in a cluster that has Hadoop already installed.
When I do
$HADOOP_HOME/bin/hadoop jar myjar.jar args
I get the error
Exception in thread "main" java.io.IOException: Mkdirs failed to create /some/hdfs/path (exists=false, cwd=file:local/folder/where/myjar/is)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
...
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I checked that I can access and create the directory in HDFS.
I have also run Hadoop code (without Mahout) without a problem.
I am running this on a Linux machine.
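For reference, the stack trace suggests the output path is being resolved against the local filesystem rather than HDFS, so a few checks along these lines might help (a sketch only; /some/hdfs/path is the placeholder from the error):
hdfs getconf -confKey fs.defaultFS                      # should name the cluster, not file:///
jar tf myjar.jar | grep -E 'core-(default|site)\.xml'   # a config file bundled into the fat jar can silently override the cluster setting
hdfs dfs -mkdir -p /some/hdfs/path                      # confirm the HDFS path itself can be created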
Check whether the Mahout user and the Hadoop user are the same, and also check Mahout and Hadoop version compatibility.
Regards,
Jyoti Ranjan Panda
I built and ran Nutch 1.7 from the command line just fine:
hadoop jar apache-nutch-1.7.job org.apache.nutch.crawl.Crawl hdfs://myserver/nutch/urls -dir hdfs://myserver/nutch/crawl -depth 5 -topN 100
But when I run the same thing from Oozie, it keeps getting:
Wrong FS: hdfs://myserver/nutch/crawl/crawldb/current, expected: file:///
I checked the source; every time the code does
FileSystem fs = new JobClient(job).getFs();
the fs gets changed back to the local filesystem.
I overrode all instances of these statements; the job then dies in the fetch stage, simply saying
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:838)
It really appears that running from Oozie causes the wrong version of the JobClient class (from hadoop-core.jar) to be loaded.
Has anyone seen this before?
It seems the Oozie conf directory was missing the proper *-site.xml files. I added mapred-site.xml to the /etc/oozie/conf/hadoop-conf directory, and this problem went away.
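A quick way to confirm that directory now carries real cluster settings rather than the file:/// defaults (a sketch; which *-site.xml files belong there depends on the setup):
ls /etc/oozie/conf/hadoop-conf/                                  # should list the cluster's *-site.xml files
grep -A1 'fs.default' /etc/oozie/conf/hadoop-conf/*-site.xml     # the default filesystem; file:/// or no match here reproduces the Wrong FS error
# if that shows file:/// or nothing, copy the cluster's core-site.xml and mapred-site.xml
# into this directory and restart Oozie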
I'm new to Hadoop MapReduce. When I try to run my MapReduce code using the following command:
vishal#XXXX bin/hadoop jar /user/vishal/WordCount com.WordCount.java /user/vishal/file01 /user/vishal/output.
It displays the following output:
Exception in thread "main" java.io.IOException: Error opening job jar: /user/vishal/WordCount.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:130)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:131)
at java.util.jar.JarFile.<init>(JarFile.java:150)
at java.util.jar.JarFile.<init>(JarFile.java:87)
at org.apache.hadoop.util.RunJar.main(RunJar.java:128)
How can I fix this error?
Your command is asking Hadoop to run a JAR but is specifying a directory instead.
You have also added '.java' to the class name, which is not required. (This is assuming you have written the package name, com.WordCount, correctly).
First build the jar in /user/vishal/WordCount.jar (ensure this is a local directory, not HDFS), then run the command without the '.java' at the end of the class name. Also, you put a dot at the end of the command in your question; I hope that isn't there in the real command.
bin/hadoop jar /user/vishal/WordCount.jar com.WordCount /user/vishal/file01 /user/vishal/output
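For the build step, a minimal sketch of compiling and packaging the class locally (the source path, package layout, and use of 'hadoop classpath' are assumptions; adjust to the real project):
mkdir -p classes
javac -classpath "$(bin/hadoop classpath)" -d classes src/com/WordCount.java
jar cf /user/vishal/WordCount.jar -C classes .
Then the path passed to 'bin/hadoop jar' is a real archive rather than a directory, and com.WordCount resolves inside it.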
See the Hadoop tutorial's 'Usage' section for more.