Hadoop: ClassNotFoundException - org.apache.hcatalog.rcfile.RCFileMapReduceOutputFormat - hadoop

I'm facing ClassNotFoundException, when I run my job for the class org.apache.hcatalog.rcfile.RCFileMapReduceOutputFormat.
I tried to pass the additional jar files with -libjars, still I am facing the same issue. Any suggestions will be greatly helpful. Thanks in advance.
Below is the command I am using and exception I am facing!
hadoop jar MyJob.jar MyDriver -libjars hcatalog-core-0.5.0-cdh4.4.0.jar inputDir OutputDir
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hcatalog/rcfile/RCFileMapReduceOutputFormat
at com.cloudera.sa.omniture.mr.OmnitureToRCFileJob.run(OmnitureToRCFileJob.java:91)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.cloudera.sa.omniture.mr.OmnitureToRCFileJob.main(OmnitureToRCFileJob.java:131)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: org.apache.hcatalog.rcfile.RCFileMapReduceOutputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 8 more
I implemented ToolRunner as well, below is the code which confirms that!
public class OmnitureToRCFileJob extends Configured implements Tool {
public static void main(String[] args) throws Exception {
OmnitureToRCFileJob processor = new OmnitureToRCFileJob();
String[] otherArgs = new GenericOptionsParser(processor.getConf(), args).getRemainingArgs();
System.exit(ToolRunner.run(processor.getConf(), processor, otherArgs));
}
}

Did you try running by giving full path of "hcatalog-core-0.5.0-cdh4.4.0.jar" jar file in your below line.
hadoop jar MyJob.jar MyDriver -libjars hcatalog-core-0.5.0-cdh4.4.0.jar inputDir OutputDir
or
Below configuration should also work for you
$ export LIBJARS= <fullpath>/hcatalog-core-0.5.0-cdh4.4.0.jar
$hadoop jar MyJob.jar MyDriver -libjars ${LIBJARS} inputDir OutputDir

If you look at hadoop command documentation, you can see that -libjars is a generic option. For parsing generic option, you got to override the ToolRunner.run() method in your driver class as follows :
public class TestDriver extends Configured implements Tool {
#Override
public int run(String[] args) throws Exception {
Configuration conf = getConf();
# Job configuration details
# Job submission
return 0;
}
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new TestDriver(), args);
System.exit(exitCode);
}
I'nk you are getting this exception from your driver code itself. Setting hcatalog-cor*.jar using -libjars option may not be available in client JVM(JVM in which driver code runs). Better you need to set this jar in HADOOP_CLASSPATH environment variable before executing the same using hadoop jar as follows
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:<PATH-TO-HCAT-LIB>/hcatalog-core-0.5.0-cdh4.4.0.jar;
hadoop jar MyJob.jar MyDriver -libjars hcatalog-core-0.5.0-cdh4.4.0.jar inputDir OutputDir

Id had the same problem but found out the jar command doesn't accept the --libjars argument.
"Specify comma separated jar files to include in the classpath. Applies only to job." --> Hadoop Cli Generic Options
Instead you should use the env vars to add additional or replace jars.
export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_CLASSPATH="./lib/*"

Related

How to read external xml in spring boot 2.1.7 using external path as command line argument or config folder

Am working on a spring boot application having external configuration.Jar file run via command prompt using following command.
java -jar Service-1.0.jar --spring.config.additional-location=C:/Users/Administrator/Desktop/Springboot/
Need to pass external configuration path via command line argument,because they may varying.
Main class
#ImportResource(locations = {
"config/spring/service-config.xml",
"config/spring/datasource-config.xml"
})
public class ServiceMain {
public static void main(String[] args) {
ConfigurableApplicationContext applicationContext = new SpringApplicationBuilder(ServiceMain.class)
.build()
.run(args);
for (String name : applicationContext.getBeanDefinitionNames()) {
}
}
}
when i run this jar it showing the following error
EDIT 1
changed running command
java -jar Service-1.0.jar --spring.config.additional-location=C:/Users/Administrator/Desktop/Springboot/,--external.config=C:/Users/Administrator/Desktop/Springboot/
changed main class
#ImportResource(locations = {
"${external.config}/config/spring/service-config.xml",
"${external.config}/config/spring/datasource-config.xml"
})
public class ServiceMain {
public static void main(String[] args) {
ConfigurableApplicationContext applicationContext = new SpringApplicationBuilder(ServiceMain.class)
.build()
.run(args);
for (String name : applicationContext.getBeanDefinitionNames()) {
}
}
}
it showing exception
java.lang.IllegalArgumentException: Could not resolve placeholder 'external.config' in value "${external.config}/config/spring/service-config.xml"
at org.springframework.context.annotation.ConfigurationClassParser.parse(ConfigurationClassParser.java:181)
at org.springframework.context.annotation.ConfigurationClassPostProcessor.processConfigBeanDefinitions(ConfigurationClassPostProcessor.java:315)
at org.springframework.context.annotation.ConfigurationClassPostProcessor.postProcessBeanDefinitionRegistry(ConfigurationClassPostProcessor.java:232)
at org.springframework.context.support.PostProcessorRegistrationDelegate.invokeBeanDefinitionRegistryPostProcessors(PostProcessorRegistrationDelegate.java:275)
at org.springframework.context.support.PostProcessorRegistrationDelegate.invokeBeanFactoryPostProcessors(PostProcessorRegistrationDelegate.java:95)
at org.springframework.context.support.AbstractApplicationContext.invokeBeanFactoryPostProcessors(AbstractApplicationContext.java:705)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:531)
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:743)
at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:390)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:312)
at com.ge.hcit.xer.app.services.api.XERServiceMain.main(XERServiceMain.java:31)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:51)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:52)
Caused by: java.lang.IllegalArgumentException: Could not resolve placeholder 'external.config' in value "${external.config}/config/spring/service-config.xml"
at org.springframework.util.PropertyPlaceholderHelper.parseStringValue(PropertyPlaceholderHelper.java:178)
at org.springframework.util.PropertyPlaceholderHelper.replacePlaceholders(PropertyPlaceholderHelper.java:124)
at org.springframework.core.env.AbstractPropertyResolver.doResolvePlaceholders(AbstractPropertyResolver.java:237)
at org.springframework.core.env.AbstractPropertyResolver.resolveRequiredPlaceholders(AbstractPropertyResolver.java:211)
at org.springframework.core.env.AbstractEnvironment.resolveRequiredPlaceholders(AbstractEnvironment.java:575)
at org.springframework.context.annotation.ConfigurationClassParser.doProcessConfigurationClass(ConfigurationClassParser.java:311)
at org.springframework.context.annotation.ConfigurationClassParser.processConfigurationClass(ConfigurationClassParser.java:242)
at org.springframework.context.annotation.ConfigurationClassParser.parse(ConfigurationClassParser.java:199)
at org.springframework.context.annotation.ConfigurationClassParser.parse(ConfigurationClassParser.java:167)
... 18 common frames omitted
how the importresource taking the external configuration
EDIT 2
Am placing configuration in config folder.external application.properties are loading in current project,but its not loading in dependency project.
#Configuration
public class ConfigurationFactory
{
public static final String REQ_CONF = "config/Configuration.xml";
public static final String FILTER_XML_CONF = "config/DocFilter.xml";
}
Is the spring load external application.properties/yml files only?
Use the file: prefix for system resource file:
#ImportResource(locations = {
"file:${external.config.location}/config/spring/service-config.xml",
"file:${external.config.location}/config/spring/datasource-config.xml"
})
Run with --external.config.location=xxx:
java -jar Service-1.0.jar --external.config.location=C:/Users/Administrator/Desktop/Springboot
Nothing wrong with your approach.
Instead of command line, add that as variable in application.properties file
please separate place holder and remaining path,
like ("${external.config}"+"/spring/service-config.xml")

error in running hbase java program

this is my java hbase createtable program below:-
public class createtable
{
public static void main(String[] args) throws IOException
{
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "sandbox.hortonworks.com");
conf.set("hbase.zookeeper.property.clientPort", "2181");
conf.set("zookeeper.znode.parent", "/hbase-unsecure");
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("people"));
tableDescriptor.addFamily(new HColumnDescriptor("personal"));
tableDescriptor.addFamily(new HColumnDescriptor("contactinfo"));
admin.createTable(tableDescriptor);
Put put = new Put(Bytes.toBytes("doe-john-m-12345");
}
after creating jar (table-0.0.1-SNAPSHOT.jar) of the program when i am running the command
hadoop jar table-0.0.1-SNAPSHOT.jar table.createtable
i am getting
Exception in thread "main" java.lang.NoClassDefFoundError:org/apachehadoop/hbase/HBaseConfiguration at
table.createtable.main(createtable.java:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav a:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148) Causedby:
java.lang.ClassNotFoundException:org.apache.hadoop.hbase.HBase
Configuration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
how do i resolve this error ??
Your code is not able to see hbase.jar . Try adding this piece in your pom.xml
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase</artifactId>
<version>0.90.2</version>
</dependency>
And then run it .
Please read this blog: https://my-bigdata-blog.blogspot.com/2017/08/hbase-programming-on-map-reduce-with.html
Two points to make it work
1) You code should have TableMapReduceUtil.addDependencyJars(job);
2) On command line - before you execute the command do this:
export HADOOP_CLASSPATH=your-jar-path
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:hbase classpath
This adds hbase libraries for execution. The one you add in maven/netbeans are for compilation.
i solved this issue as there is issue in making fat jar and the in new hbase version jars are deprecated with many dependency..so i also added hbase-client-1.2.6-jar and hbase-common-1.2.6-jar to eclipse externally and then building fat jar ...this solved my issue .

Kite Dataset map-reduce

I am trying to do map-reduce with kite-dataset api.
I have followed below urls to refer.
https://community.cloudera.com/t5/Kite-SDK-includes-Morphlines/Map-Reduce-with-Kite/td-p/22165
https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-mapreduce/src/test/java/org/kitesdk/data/mapreduce/TestMapReduce.java
My code snippet as below
public class MapReduce {
private static final String sourceDatasetURI = "dataset:hive:test_avro";
private static final String destinationDatasetURI = "dataset:hive:test_parquet";
private static class LineCountMapper
extends Mapper<GenericData.Record, Void, Text, IntWritable> {
#Override
protected void map(GenericData.Record record, Void value,
Context context)
throws IOException, InterruptedException {
System.out.println("Record is "+record);
context.write(new Text(record.get("index").toString()), new IntWritable(1));
}
}
private Job createJob() throws Exception {
System.out.println("Inside Create Job");
Job job = new Job();
job.setJarByClass(getClass());
Dataset<GenericData.Record> inputDataset = Datasets.load(sourceDatasetURI, GenericData.Record.class);
Dataset<GenericData.Record> outputDataset = Datasets.load(destinationDatasetURI, GenericData.Record.class);
DatasetKeyInputFormat.configure(job).readFrom(inputDataset).withType(GenericData.Record.class);
job.setMapperClass(LineCountMapper.class);
DatasetKeyOutputFormat.configure(job).writeTo(outputDataset).withType(GenericData.Record.class);
job.waitForCompletion(true);
return job;
}
public static void main(String[] args) throws Exception {
MapReduce httAvroToParquet = new MapReduce();
httAvroToParquet.createJob();
}
}
I am using HDP 2.3.2 box ,creating assembly jar and submitting my application through spark-submit.
I am getting below error when I submit my application.
2015-12-18 04:09:07,156 WARN [main] org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2015-12-18 04:09:07,282 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2015-12-18 04:09:07,333 WARN [main] org.kitesdk.data.spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
2015-12-18 04:09:07,334 INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive://{}:9083/default/test_parquet. Check that JARs for hive datasets are on the classpath.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive://{}:9083/default/test_parquet. Check that JARs for hive datasets are on the classpath.
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:478)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:458)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1560)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:458)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:377)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1518)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
Caused by: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive://{}:9083/default/test_parquet. Check that JARs for hive datasets are on the classpath.
at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
at org.kitesdk.data.Datasets.load(Datasets.java:103)
at org.kitesdk.data.Datasets.load(Datasets.java:165)
at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.load(DatasetKeyOutputFormat.java:510)
at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.getOutputCommitter(DatasetKeyOutputFormat.java:473)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:476)
... 11 more
I am not getting what's wrong ? Is there any class-path problem ? If yes then where do I set it ?
You effectively have a classpath problem
Your project is missing org.kitesdk:kite-data-hive
You can
Add this jar to your fat jar before submitting it to Spark
Tells Spark to add it to your classpath when you submit

How to run HBase program

How can I run the code below from the command line?
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.*;
public class MyHBase {
public static void main(String[] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
try {
HTable table = new HTable(conf, "test-table");
Put put = new Put(Bytes.toBytes("test-key"));
put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
table.put(put);
} finally {
admin.close();
}
}
}
How to set my hbase classpath? I am getting a huge string in my classpath.
UPDATE
root# vi MyHBase.java
hbase-0.92.2 root# java -classpath `hbase classpath`:./ /var/root/MyHBase
-sh: hbase: command not found
Exception in thread "main" java.lang.NoClassDefFoundError: /var/root/MyHBase
Caused by: java.lang.ClassNotFoundException: .var.root.MyHBase
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
I am able to do
hbase-0.92.2 root# bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.92.2, r1379292, Fri Aug 31 13:13:53 UTC 2012
hbase(main):001:0> exit
Am i doing anything wrong?
You can also do something like this:-
# export HADOOP_CLASSPATH=`./hbase classpath`
and then bundle this java in jar to run it within hadoop cluster like this:-
#hadoop jar <jarfile> <mainclass>
Use this
java -classpath `hbase classpath`:./ MyHBase
this will get the entire classpath string that HBase uses and adds the current directory as well to the classpath.
Cheers
Rags

Hadoop ToolRunner fails with NoClassDefFoundError

I have a created a simple MapReduce Driver that implements the Tool interface. But when I try to run the job in Eclipse, I get a NoClassDefFoundError before the run() method is invoked.
I am running Hadoop 0.20.2 on Ubuntu 10.04 LTS. The source code and stack trace are provided below. Any assistance will be greatly appreciated.
Sourcecode
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.*;
public class MyTestDriver extends Configured implements Tool {
#Override
public int run(String[] args) throws Exception {
if (args.length != 2) {
System.err.printf("Usage: %s [generic options] <input> <output>\n",
getClass().getSimpleName());
ToolRunner.printGenericCommandUsage(System.err);
return -1;
}
// Code here to submit Hadoop Job ...
return 0;
}
/**
* #param args
*/
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new MyTestDriver(), args);
System.exit(exitCode);
}
}
Stacktrace
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/commons/cli/ParseException at
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:59) at
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at
MaxTemperatureDriver.main(MaxTemperatureDriver.java:44) Caused by:
java.lang.ClassNotFoundException:
org.apache.commons.cli.ParseException at
java.net.URLClassLoader$1.run(URLClassLoader.java:202) at
java.security.AccessController.doPrivileged(Native Method) at
java.net.URLClassLoader.findClass(URLClassLoader.java:190) at
java.lang.ClassLoader.loadClass(ClassLoader.java:307) at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at
java.lang.ClassLoader.loadClass(ClassLoader.java:248) ... 3 more
Are all of Hadoop's dependencies in your Eclipse build path? Make sure all the jars in the hadoop/lib directory are in your build path.
The error means that there was a particular class that was not found. Classes are stored within .jar files. So, you need to check if all the required jar files are available or not. Jar files are searched based on the CLASSPATH variable. If you are using Eclipse as your development environment, please check if the build dependencies are satisfied (you can do this by - right clicking on your project -> configure build path -> add external jar).
Also, if you are not sure which class is missing, you can check the classname (Hadoop is open source, so you can find the class name) and then search the class name in findjar

Resources