How to run HBase program - hadoop

How can I run the code below from the command line?
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.*;
public class MyHBase {
public static void main(String[] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
try {
HTable table = new HTable(conf, "test-table");
Put put = new Put(Bytes.toBytes("test-key"));
put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
table.put(put);
} finally {
admin.close();
}
}
}
How to set my hbase classpath? I am getting a huge string in my classpath.
UPDATE
root# vi MyHBase.java
hbase-0.92.2 root# java -classpath `hbase classpath`:./ /var/root/MyHBase
-sh: hbase: command not found
Exception in thread "main" java.lang.NoClassDefFoundError: /var/root/MyHBase
Caused by: java.lang.ClassNotFoundException: .var.root.MyHBase
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
I am able to do
hbase-0.92.2 root# bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.92.2, r1379292, Fri Aug 31 13:13:53 UTC 2012
hbase(main):001:0> exit
Am i doing anything wrong?

You can also do something like this:-
# export HADOOP_CLASSPATH=`./hbase classpath`
and then bundle this java in jar to run it within hadoop cluster like this:-
#hadoop jar <jarfile> <mainclass>

Use this
java -classpath `hbase classpath`:./ MyHBase
this will get the entire classpath string that HBase uses and adds the current directory as well to the classpath.
Cheers
Rags

Related

error in running hbase java program

this is my java hbase createtable program below:-
public class createtable
{
public static void main(String[] args) throws IOException
{
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "sandbox.hortonworks.com");
conf.set("hbase.zookeeper.property.clientPort", "2181");
conf.set("zookeeper.znode.parent", "/hbase-unsecure");
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("people"));
tableDescriptor.addFamily(new HColumnDescriptor("personal"));
tableDescriptor.addFamily(new HColumnDescriptor("contactinfo"));
admin.createTable(tableDescriptor);
Put put = new Put(Bytes.toBytes("doe-john-m-12345");
}
after creating jar (table-0.0.1-SNAPSHOT.jar) of the program when i am running the command
hadoop jar table-0.0.1-SNAPSHOT.jar table.createtable
i am getting
Exception in thread "main" java.lang.NoClassDefFoundError:org/apachehadoop/hbase/HBaseConfiguration at
table.createtable.main(createtable.java:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav a:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148) Causedby:
java.lang.ClassNotFoundException:org.apache.hadoop.hbase.HBase
Configuration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
how do i resolve this error ??
Your code is not able to see hbase.jar . Try adding this piece in your pom.xml
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase</artifactId>
<version>0.90.2</version>
</dependency>
And then run it .
Please read this blog: https://my-bigdata-blog.blogspot.com/2017/08/hbase-programming-on-map-reduce-with.html
Two points to make it work
1) You code should have TableMapReduceUtil.addDependencyJars(job);
2) On command line - before you execute the command do this:
export HADOOP_CLASSPATH=your-jar-path
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:hbase classpath
This adds hbase libraries for execution. The one you add in maven/netbeans are for compilation.
i solved this issue as there is issue in making fat jar and the in new hbase version jars are deprecated with many dependency..so i also added hbase-client-1.2.6-jar and hbase-common-1.2.6-jar to eclipse externally and then building fat jar ...this solved my issue .

Hadoop: ClassNotFoundException - org.apache.hcatalog.rcfile.RCFileMapReduceOutputFormat

I'm facing ClassNotFoundException, when I run my job for the class org.apache.hcatalog.rcfile.RCFileMapReduceOutputFormat.
I tried to pass the additional jar files with -libjars, still I am facing the same issue. Any suggestions will be greatly helpful. Thanks in advance.
Below is the command I am using and exception I am facing!
hadoop jar MyJob.jar MyDriver -libjars hcatalog-core-0.5.0-cdh4.4.0.jar inputDir OutputDir
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hcatalog/rcfile/RCFileMapReduceOutputFormat
at com.cloudera.sa.omniture.mr.OmnitureToRCFileJob.run(OmnitureToRCFileJob.java:91)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.cloudera.sa.omniture.mr.OmnitureToRCFileJob.main(OmnitureToRCFileJob.java:131)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: org.apache.hcatalog.rcfile.RCFileMapReduceOutputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 8 more
I implemented ToolRunner as well, below is the code which confirms that!
public class OmnitureToRCFileJob extends Configured implements Tool {
public static void main(String[] args) throws Exception {
OmnitureToRCFileJob processor = new OmnitureToRCFileJob();
String[] otherArgs = new GenericOptionsParser(processor.getConf(), args).getRemainingArgs();
System.exit(ToolRunner.run(processor.getConf(), processor, otherArgs));
}
}
Did you try running by giving full path of "hcatalog-core-0.5.0-cdh4.4.0.jar" jar file in your below line.
hadoop jar MyJob.jar MyDriver -libjars hcatalog-core-0.5.0-cdh4.4.0.jar inputDir OutputDir
or
Below configuration should also work for you
$ export LIBJARS= <fullpath>/hcatalog-core-0.5.0-cdh4.4.0.jar
$hadoop jar MyJob.jar MyDriver -libjars ${LIBJARS} inputDir OutputDir
If you look at hadoop command documentation, you can see that -libjars is a generic option. For parsing generic option, you got to override the ToolRunner.run() method in your driver class as follows :
public class TestDriver extends Configured implements Tool {
#Override
public int run(String[] args) throws Exception {
Configuration conf = getConf();
# Job configuration details
# Job submission
return 0;
}
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new TestDriver(), args);
System.exit(exitCode);
}
I'nk you are getting this exception from your driver code itself. Setting hcatalog-cor*.jar using -libjars option may not be available in client JVM(JVM in which driver code runs). Better you need to set this jar in HADOOP_CLASSPATH environment variable before executing the same using hadoop jar as follows
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:<PATH-TO-HCAT-LIB>/hcatalog-core-0.5.0-cdh4.4.0.jar;
hadoop jar MyJob.jar MyDriver -libjars hcatalog-core-0.5.0-cdh4.4.0.jar inputDir OutputDir
Id had the same problem but found out the jar command doesn't accept the --libjars argument.
"Specify comma separated jar files to include in the classpath. Applies only to job." --> Hadoop Cli Generic Options
Instead you should use the env vars to add additional or replace jars.
export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_CLASSPATH="./lib/*"

HBase Bulk Load MapReduce HFile exception (netty jar)

I am attempting to run a simple MapReduce process to write HFiles for later import into an HBase table.
When the job is submitted:
hbase com.pcoa.Driver /test /bulk pcoa
I am getting the following exception indicating that netty-3.6.6.Final.jar does not exist in HDFS (it does however exist here).
-rw-r--r--+ 1 mbeening flprod 1206119 Sep 18 18:25 /dedge1/hadoop/hbase-0.96.1.1-hadoop2/lib/netty-3.6.6.Final.jar
I am afraid that I do not understand how to address this configuration(?) error.
Can anyone provide any advice to me?
Here is the Exception:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost/dedge1/hadoop/hbase-0.96.1.1-hadoop2/lib/netty-3.6.6.Final.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at com.pcoa.Driver.main(Driver.java:63)
Here is my driver routine:
public class Driver {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "HBase Bulk Import");
job.setJarByClass(HBaseKVMapper.class);
job.setMapperClass(HBaseKVMapper.class);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(KeyValue.class);
job.setInputFormatClass(TextInputFormat.class);
HTable hTable = new HTable(conf, args[2]);
HFileOutputFormat.configureIncrementalLoad(job, hTable);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
I am not sure why/if i had to do this (didn't see anything like this in any of the startup docs anywhere)
but I ran one of these:
hdfs dfs -put /hadoop/hbase-0.96.1.1-hadoop2/lib/*.jar /hadoop/hbase-0.96.1.1-hadoop2/lib
And....my MR job seems to run now
If this is an incorrect course - please let me know
thanks!

Getting ClassNotFound Exception: While dumping Configuration in Hadoop

I'm trying a simple program from the "hadoop in Action" book to merge a series of files from the local file system into one file in the hdfs. The code snippet is the same as the one provided in the book.
import java.lang.*;
import java.util.*;
import java.io.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
public class PutMerge {
public static void main(String[] args) throws IOException{
Configuration conf = new Configuration();
FileSystem hdfs = FileSystem.get(conf);
FileSystem local = FileSystem.getLocal(conf);
Path inputDir = new Path(args[0]); // First argument has the input directory
Path hdfsFile = new Path(args[1]); // Concatenated hdfs file name
try {
FileStatus[] inputFiles = local.listStatus(inputDir); // list of Local Files
FSDataOutputStream out = hdfs.create(hdfsFile); // target file creation
for (int i = 0; i<inputFiles.size; i++ {
FSDataInputStream in = local.open(inputFiles[i].getPath());
int bytesRead = 0;
byte[] buff = new byte[256];
while (bytesRead = (in.read(buff))>0) {
out.write(buff,0,bytesRead);
}
in.close();
}
out.close();
}
catch(Exception e) {
e.printStackTrace();
}
}
}
The program successfully compiled and while trying to run I'm getting the following exception
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/commons/configuration/Configuration
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:37)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:34)
at org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:217)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:185)
at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:237)
at org.apache.hadoop.security.KerberosName.(KerberosName.java:79)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:210)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:185)
at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:237)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:482)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:468)
at org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:1519)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1420)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at PutMerge.main(PutMerge.java:16) Caused by: java.lang.ClassNotFoundException:
org.apache.commons.configuration.Configuration
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 17 more
Based on inputs from some of the posts, I added the commons package. My classpath definition is
/usr/java/jdk1.7.0_21:/data/commons-logging-1.1.2/commons-logging-1.1.2.jar:/data/hadoop-1.1.2/hadoop-core-1.1.2.jar:/data/commons-logging-1.1.2/commons-logging-adapters-1.1.2.jar:/data/commons-logging-1.1.2/commons-logging-api-1.1.2.jar:.
Any clue on why this is not working?
You didnt include apache configuration in your classpath.
Really though you shouldn't need to include much besides hadoop itself. Make sure you are running your jar with hadoop itself.
> hadoop -jar myJar.jar

Hadoop ToolRunner fails with NoClassDefFoundError

I have a created a simple MapReduce Driver that implements the Tool interface. But when I try to run the job in Eclipse, I get a NoClassDefFoundError before the run() method is invoked.
I am running Hadoop 0.20.2 on Ubuntu 10.04 LTS. The source code and stack trace are provided below. Any assistance will be greatly appreciated.
Sourcecode
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.*;
public class MyTestDriver extends Configured implements Tool {
#Override
public int run(String[] args) throws Exception {
if (args.length != 2) {
System.err.printf("Usage: %s [generic options] <input> <output>\n",
getClass().getSimpleName());
ToolRunner.printGenericCommandUsage(System.err);
return -1;
}
// Code here to submit Hadoop Job ...
return 0;
}
/**
* #param args
*/
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new MyTestDriver(), args);
System.exit(exitCode);
}
}
Stacktrace
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/commons/cli/ParseException at
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:59) at
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at
MaxTemperatureDriver.main(MaxTemperatureDriver.java:44) Caused by:
java.lang.ClassNotFoundException:
org.apache.commons.cli.ParseException at
java.net.URLClassLoader$1.run(URLClassLoader.java:202) at
java.security.AccessController.doPrivileged(Native Method) at
java.net.URLClassLoader.findClass(URLClassLoader.java:190) at
java.lang.ClassLoader.loadClass(ClassLoader.java:307) at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at
java.lang.ClassLoader.loadClass(ClassLoader.java:248) ... 3 more
Are all of Hadoop's dependencies in your Eclipse build path? Make sure all the jars in the hadoop/lib directory are in your build path.
The error means that there was a particular class that was not found. Classes are stored within .jar files. So, you need to check if all the required jar files are available or not. Jar files are searched based on the CLASSPATH variable. If you are using Eclipse as your development environment, please check if the build dependencies are satisfied (you can do this by - right clicking on your project -> configure build path -> add external jar).
Also, if you are not sure which class is missing, you can check the classname (Hadoop is open source, so you can find the class name) and then search the class name in findjar

Resources