I have installed fully distributed Hadoop 1.2.1. I was trying to integrate Nutch with the steps below:
Download apache-nutch-1.9-src.zip
Set a value for http.agent.name in nutch-site.xml
Copy hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml,
masters, slaves into $NUTCH_HOME/conf
Compile using ant runtime
Create urls/seed.txt and put it on Hadoop DFS
Edit $NUTCH_HOME/conf/regex-urlfilter.txt
Test the crawl using the command:
bin/hadoop -jar nutch-1.9.job org.apache.nutch.crawl.Crawl urls -dir urls -depth 1 -topN 5
and get this error:
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.nutch.crawl.Crawl
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I tried extracting nutch-1.9.job and could not find the class Crawl in org/apache/nutch/crawl.
Do I need to configure something?
Crawl.java was removed in version 1.8. You can use the crawl shell script for all crawling instead.
The deprecated class o.a.n.crawl.Crawler is still in the code base: https://issues.apache.org/jira/browse/NUTCH-1621
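For example, a hedged sketch of a deploy-mode invocation for Nutch 1.9 (the exact arguments vary between releases, so check the usage printed by bin/crawl; the Solr URL and directory names here are placeholders):
$ cd runtime/deploy
$ bin/crawl urls crawl http://localhost:8983/solr/ 1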
[version]
Apache Spark 2.2.0
Hadoop 2.7
I want to set up the Apache Spark history server.
The Spark event logs are located in Amazon S3.
I can save the log files to S3, but the history server cannot read them.
Apache Spark installed at /usr/local/spark
so, $SPARK_HOME is /usr/local/spark
$ cd /usr/local/spark/sbin
$ sh start-history-server.sh
I got the following error:
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:230)
....
My spark-defaults.conf is below:
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.history.provider org.apache.hadoop.fs.s3a.S3AFileSystem
spark.history.fs.logDirectory s3a://xxxxxxxxxxxxx
spark.eventLog.enabled true
spark.eventLog.dir s3a://xxxxxxxxxxxxxxx
I installed these 2 jar files in /usr/local/spark/jars/:
aws-java-sdk-1.7.4.jar
hadoop-aws-2.7.3.jar
but the error is the same.
What is wrong?
Please add the following to the spark-defaults.conf file and retry.
spark.driver.extraClassPath /usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*
spark.executor.extraClassPath /usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*
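After adding these lines, restart the history server so the new classpath takes effect (paths follow the /usr/local/spark install from the question):
$ cd /usr/local/spark/sbin
$ sh stop-history-server.sh
$ sh start-history-server.sh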
I'm trying to run an Oozie workflow that should execute an MRv1 Hadoop job.
I started with the Cloudera QuickStart VM 5.4.2-0 and configured it to use MRv1 (configuration appended at the bottom).
But the workflow fails:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.MapReduceMain], main() threw exception, org/apache/hadoop/yarn/exceptions/YarnException
java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException
at org.apache.oozie.action.hadoop.MapReduceMain.run(MapReduceMain.java:58)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:46)
at org.apache.oozie.action.hadoop.MapReduceMain.main(MapReduceMain.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:228)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.exceptions.YarnException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
The strange thing is that when I look at the classpath, it is loading the jar files from the right location, namely /usr/lib/hadoop-0.20-mapreduce.
----- configurations -----
Configured Oozie to use MapReduce instead of YARN:
Changed the Oozie config in CDH by switching the MapReduce Service from YARN to MapReduce.
Changed the Oozie config by setting the oozie-site.xml safety valve value:
<property>
<name>oozie.service.HadoopAccessorService.jobTracker.whitelist</name>
<value>quickstart.cloudera:8021</value>
</property>
Changed the Tomcat config using alternatives:
sudo alternatives --config oozie-tomcat-deployment   # chose tomcat-conf.http.mr1
Configured Hadoop to use MapReduce:
sudo alternatives --config hadoop-conf   # chose conf.cloudera.mapreduce
I had forgotten to update the actions to use MR1:
$ sudo -u oozie hadoop fs -rmr /user/oozie/share
$ sudo oozie-setup sharelib create -fs <FS_URI> -locallib /usr/lib/oozie/oozie-sharelib-mr1
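For reference, a hedged example of the sharelib call with a concrete filesystem URI (the NameNode address is an assumption based on the QuickStart VM defaults; substitute your own <FS_URI>):
$ sudo oozie-setup sharelib create -fs hdfs://quickstart.cloudera:8020 -locallib /usr/lib/oozie/oozie-sharelib-mr1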
I got pre-built Spark 1.4.1 and I'm running HDP 2.6. When I try to run spark-shell, it gives me the following error message.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:111)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:97)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:107)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
What is the issue?
ClassNotFoundException occurs when the class loader cannot find the required class on the classpath. So, basically, you should check your classpath and add the missing jar to it.
Check whether hadoop-common-0.21.0.jar is added to your classpath.
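If you are running a "Hadoop free" Spark build, one common way to put the Hadoop jars (including hadoop-common) on Spark's classpath is via conf/spark-env.sh; a minimal sketch, assuming the hadoop command is on your PATH:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)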
Is it possible that your Hadoop home is not set, as in here?
Cannot find hadoop installation: $HADOOP_HOME must be set or hadoop must be in the path
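If it is not set, a minimal sketch of what that could look like in your shell profile or conf/spark-env.sh (the /usr/hdp path is an assumption based on the HDP install mentioned in the question; adjust to your layout):
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export PATH=$PATH:$HADOOP_HOME/bin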
I'm trying to run Hive from the command prompt and it works absolutely fine. But when I try running HiveServer using the "hive --service hiveserver" command, I get the following exception.
Starting Hive Thrift Server
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.hive.service.HiveServer
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
So I then tried the command "hive --service hiveserver2", but I still have not found a solution.
Can anybody please suggest a solution for this problem?
Maybe another process (another HiveServer) is already listening on port 10000.
You can check with:
netstat -ntulp | grep ':10000'
If a process is found, kill it; otherwise start the server on another port.
By the way, which version are you using?
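A minimal sketch of that check-and-kill sequence (the kill is only needed if netstat actually reports a listener; take the PID from its output):
$ netstat -ntulp | grep ':10000'
$ kill -9 <PID>   # <PID> is the process id reported by netstat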
This error occurred for me when Hadoop could not find hive-service-*.jar on its classpath. Copy hive-service-*.jar to your Hadoop lib folder, or export the classpath in hadoop-env.sh. I have mentioned how to add the classpath below.
Add this line in hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hive/lib/*   # directory wildcard so the Hive jars (including hive-service-*.jar) are actually picked up
I have mentioned the path for Hive as /usr/local/hive since I have Hive installed at that location. Change it to point to your Hive installation.
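To confirm the jars are picked up, you can inspect Hadoop's effective classpath (a quick sanity check, not part of the original fix):
$ hadoop classpath | tr ':' '\n' | grep hive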
I get the following error while executing a MapReduce program.
I have placed all the jars in the hadoop/lib directory and have also mentioned the jars in -libjars.
This is the command I am executing:
$HADOOP_HOME/bin/hadoop --config $HADOOP_HOME/conf jar /home/shash/distinct.jar HwordCount -libjars $LIB_JARS WordCount HWordCount2
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hcatalog.mapreduce.HCatOutputFormat
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996)
at org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext.java:248)
at org.apache.hadoop.mapred.Task.initialize(Task.java:501)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:306)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.ClassNotFoundException: org.apache.hcatalog.mapreduce.HCatOutputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:994)
... 8 more
Make sure LIB_JARS is a comma-separated list (as opposed to colon-separated like CLASSPATH)
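For example, a hedged sketch based on the command in the question (the jar paths are hypothetical placeholders for the actual HCatalog/Hive jars you depend on):
export LIB_JARS=/path/to/hcatalog-core.jar,/path/to/hive-metastore.jar
$HADOOP_HOME/bin/hadoop --config $HADOOP_HOME/conf jar /home/shash/distinct.jar HwordCount -libjars $LIB_JARS WordCount HWordCount2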
Applies To: CDH 5.0.x, CDH 5.1.x, CDH 5.2.x, CDH 5.3.x, Sqoop
Cause: Sqoop cannot pick up the HCatalog libraries because Cloudera Manager does not set the HIVE_HOME environment variable. It needs to be set manually.
This problem is tracked in the JIRA below:
https://issues.apache.org/jira/browse/SQOOP-2145
The fix for this JIRA has been included in CDH since version 5.4.0.
Workaround (applicable to CDH versions lower than 5.4.0):
Execute the commands below in the shell before calling Sqoop, or add them to /etc/sqoop/conf/sqoop-env.sh (create the file if it does not already exist):
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive (for parcel installation)
export HIVE_HOME=/usr/lib/hive (for package installation)
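If you want to persist the setting rather than export it in every shell, one hedged option is to append the appropriate line to /etc/sqoop/conf/sqoop-env.sh (the parcel path is shown here; use the package path if that matches your install):
$ echo 'export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive' | sudo tee -a /etc/sqoop/conf/sqoop-env.sh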