Nutch and HBase configuration error - hadoop

I am trying to get nutch and hbase working based on this docker image: https://hub.docker.com/r/cogfor/nutch/
I am getting an exception when I try to inject a URL file:
InjectorJob: starting at 2017-12-19 20:49:45
InjectorJob: Injecting urlDir: urls
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:114)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:78)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
I know there is some misconfiguration between Nutch/HBase/Hadoop.
My gora.properties has:
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
My hbase-site.xml has:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///data</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>false</value>
</property>
</configuration>
And my nutch-site.xml has:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>http.agent.name</name>
<value>My Spider</value>
</property>
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.hbase.store.HBaseStore</value>
<description>Default class for storing data</description>
</property>
<property>
<name>plugin.includes</name>
<value>protocol-httpclient|urlfilter-regex|parse-(text|tika|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
<property>
<name>db.ignore.external.links</name>
<value>true</value>
</property>
<property>
<name>parser.character.encoding.default</name>
<value>utf-8</value>
</property>
<property>
<name>http.content.limit</name>
<value>6553600</value>
</property>
</configuration>
This same error is reported multiple times on S.O., but none of the solutions worked for me. The $HBASE_HOME and $HADOOP_CLASSPATH environment variables are set to:
root@a5fb7fefc53e:/nutch_source/runtime/local/bin# echo $HADOOP_CLASSPATH
/opt/hbase-0.98.21-hadoop2/lib/hbase-client-0.98.21-hadoop2.jar:
/opt/hbase-0.98.21-hadoop2/lib/hbase-common-0.98.12-hadoop2.jar:
/opt/hbase-0.98.21-hadoop2/lib/protobuf-java-2.5.0.jar:
/opt/hbase-0.98.21-hadoop2/lib/guava-12.0.1.jar:
/opt/hbase-0.98.21-hadoop2/lib/zookeeper-3.4.6.jar:
/opt/hbase-0.98.21-hadoop2/lib/hbase-protocol-0.98.12-hadoop2.jar
root@a5fb7fefc53e:/nutch_source/runtime/local/bin# echo $HBASE_HOME
/opt/hbase-0.98.21-hadoop2
I verified all those files exist.
Can someone please help me figure out what I am missing?

The issue is mentioned in the documentation (https://wiki.apache.org/nutch/Nutch2Tutorial):
"N.B. It's possible to encounter the following exception: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration; this is caused by the fact that sometimes the hbase TEST jar is deployed in the lib dir. To resolve this just copy the lib over from your installed HBase dir into the build lib dir. (This issue is currently in progress)."
All that needs to be done is this:
cp -R /root/hbase/lib/* /root/nutch/lib/
and Nutch will start working fine.
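Adapted to the paths shown in this question (assuming the local Nutch runtime at /nutch_source/runtime/local visible in the shell prompt above), the equivalent copy would presumably be:
# copy every HBase jar into the local Nutch runtime's lib directory
cp -R /opt/hbase-0.98.21-hadoop2/lib/* /nutch_source/runtime/local/lib/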

Related

Oozie workflow intermittently fails on Java action for Scalding

I am using CDH (Cloudera Hadoop) version 5.12.0 (which uses Hadoop 2.6.0 and Oozie 4.1.0) and Scalding 2.11.
I am using a shaded jar with my dependencies built in.
I can run all my jobs properly, without any error, using a hadoop jar command as follows:
hadoop jar /<path>/<to>/<my>/<project>.jar com.twitter.scalding.Tool
-libjars <comma>,<separated>,<list>,<of>,<dependencies>
-D mapreduce.framework.name=yarn
-D yarn.app.mapreduce.am.staging-dir=/<path>/<to>/<staging>/<dir>
-D mapreduce.map.output.compress=True
<my>.<scalding>.<job> --hdfs --input <my>/<input> --output <output>/<dir>
I can also run an Oozie workflow with Pig actions or other actions without any trouble. But when I try to run an Oozie workflow with Java actions that call Scalding like so:
<action name="myAction">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="<output>/<dir>"/>
</prepare>
<configuration>
<property>
<name>oozie.action.sharelib.for.java</name>
<value>scalding</value>
</property>
<property>
<name>oozie.launcher.mapreduce.task.classpath.user.precedence</name>
<value>true</value>
</property>
<property>
<name>oozie.launcher.mapreduce.task.classpath.first</name>
<value>true</value>
</property>
<property>
<name>oozie.launcher.mapreduce.job.user.classpath.first</name>
<value>true</value>
</property>
<property>
<name>mapreduce.job.user.classpath.first</name>
<value>true</value>
</property>
<property>
<name>mapreduce.task.classpath.user.precedence</name>
<value>true</value>
</property>
<property>
<name>mapreduce.task.classpath.first</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.log.level</name>
<value>DEBUG</value>
</property>
<property>
<name>mapreduce.reduce.log.level</name>
<value>DEBUG</value>
</property>
<property>
<name>mapreduce.job.log4j-properties-file</name>
<value>${appLocation}/conf</value>
</property>
<property>
<name>mapreduce.task.files.preserve.failedtasks</name>
<value>true</value>
</property>
<property>
<name>mapreduce.job.jvm.numtasks</name>
<value>-1</value>
</property>
<property>
<name>oozie.mapreduce.uber.jar</name>
<value>/<path>/<to>/<my>/<class>.jar</value>
</property>
</configuration>
<main-class>com.twitter.scalding.Tool</main-class>
<java-opts>-Xms2G -Xmx3G</java-opts>
<arg><my>.<scalding>.<job></arg>
<arg>--hdfs</arg>
<arg>--input</arg>
<arg><my>/<input></arg>
<arg>--output</arg>
<arg><output>/<dir></arg>
<file>/<path>/<to>/<my>/<project>.jar</file>
</java>
<ok to="end" />
<error to="sendEmailFailure" />
</action>
I have also set:
oozie.libpath=<path>/<to>/<lib>/<directory>
oozie.use.system.libpath=true
in my properties file, and all the necessary jars are within the libpath, but I get errors where the Java action can't find the dependencies it needs:
Error: java.io.IOException: Split class cascading.tap.hadoop.io.MultiInputSplit not found
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:363)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.io.MultiInputSplit not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2108)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:361)
... 7 more
This happens inconsistently, in that if I only put one action in the workflow, sometimes the workflow completes, sometimes it fails. If I put more than one action in the workflow, it definitely fails on some action eventually, not always the same action but always with the same error.
I believe the problem comes from the way the action resolves its dependencies: sometimes it loads them in the correct order and picks up the one I want first, and sometimes it doesn't and misses the dependency it needs. But I'm new to Oozie, so who knows?
I think I can increase the maximum number of attempts for each action taken by Oozie, but that doesn't feel like a real solution; it's akin to rolling the dice against the cluster.
I saw people post this issue a while back and resolve it by rolling back to an older version of CDH (4.1), but that's not an option for me.
Suggestions?

HBase doesn't start because of a Master exiting error caused by java.lang.NumberFormatException

Introduction
In HBase 1.1.4, I run start-hbase.sh and find that the regionserver starts successfully but the HBase master fails as follows:
Error
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2341)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:233)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2355)
Caused by: java.lang.NumberFormatException: For input string: "hdfs://dell06:60000"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1104)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:852)
at org.apache.hadoop.hbase.master.MasterRpcServices.<init>(MasterRpcServices.java:214)
at org.apache.hadoop.hbase.master.HMaster.createRpcServices(HMaster.java:533)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:531)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:365)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2336)
... 5 more
Explanation
The input string "hdfs://dell06:60000" is the configured value of hbase.master.port; the full hbase-site.xml is as follows.
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://dell06:12306/hbase</value>
<description></description>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/tmp/hbase-${user.name}</value>
<description></description>
</property>
<property>
<name>hbase.master.port</name>
<value>hdfs://dell06:60000</value>
<description></description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>dell01,dell02,dell03,dell04,dell05,dell06</value>
<description></description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>dfs.replication</name>
<value>5</value>
</property>
<property>
<name>hbase.regionserver.handler.count</name>
<value>30</value>
</property>
</configuration>
I read the source code of this part but didn't get it. Thanks in advance!
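For what it's worth, the stack trace shows Configuration.getInt failing on that value: hbase.master.port must be a bare port number, not a URL (the HDFS address belongs in hbase.rootdir, which is already set above). A hedged fix, assuming the master should listen on the customary port 60000, would be:
<property>
<name>hbase.master.port</name>
<value>60000</value>
</property>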

Not able to start hive

I installed Hadoop on Windows and also set up Hive. When I start Hive using hive.cmd, I get the following error:
16/12/28 18:14:05 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
It has not created the metastore_db folder in the hive\bin path.
I also tried using the schematool to initialize the schemas, but it gives me "'schematool' is not recognized as an internal or external command, operable program or batch file."
My environment variables are as follows :
HIVE_BIN_PATH : C:\hadoop-2.7.1.tar\apache-hive-2.1.1-bin\bin
HIVE_HOME : C:\hadoop-2.7.1.tar\apache-hive-2.1.1-bin
HIVE_LIB : C:\hadoop-2.7.1.tar\apache-hive-2.1.1-bin\lib
PATH : C:\hadoop-2.7.1.tar\hadoop-2.7.1\bin;C:\apache\db-derby-10.12.1.1-bin\bin;C:\hadoop-2.7.1.tar\apache-hive-2.1.1-bin\bin;
Here is my hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.ClientDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>hive.server2.enable.impersonation</name>
<value>true</value>
<description>Enable user impersonation for HiveServer2</description>
</property>
<property>
<name>hive.server2.authentication</name>
<value>NONE</value>
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>True</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>true</value>
</property>
</configuration>
I have added the derby.jar, derby-client.jar and derbytools.jar to the hive\lib folder. I have also added the slf4j-api-1.5.8.jar to the hive\lib folder. But it still does not work. Any pointers on this one?
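One hedged workaround for the missing schematool wrapper: the Unix schematool script simply delegates to the hive launcher with --service schemaTool, so, assuming your hive.cmd forwards --service the same way, something like this (with -dbType matching the Derby metastore configured above) might initialize the schema:
REM assumes hive.cmd supports --service like the Unix launcher does
hive --service schemaTool -dbType derby -initSchema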

hbase 0.95.1 fails on hadoop-2.0.5 alpha

I installed hadoop-2.0.5-alpha, hbase-0.95.1-hadoop2, and zookeeper-3.4.5. Hadoop and ZooKeeper are running fine. HDFS and MR2 work great. But HBase will not boot. Has anyone seen this error before? I'll post my config and logs below. Thanks in advance for your help.
hbase-site.xml :
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>master</value>
<description>Property from ZooKeeper's config zoo.cfg.
The directory where the snapshot is stored.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:8020/hbase</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
</configuration>
hbase-xxxx-master-master.log :
2013-07-02 14:33:14,791 FATAL [master:master:60000] master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: callId, status; Host Details : local host is: "master/192.168.255.130"; destination host is: "master":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:760)
at org.apache.hadoop.ipc.Client.call(Client.java:1168)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy10.setSafeMode(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at com.sun.proxy.$Proxy10.setSafeMode(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setSafeMode(ClientNamenodeProtocolTranslatorPB.java:514)
at org.apache.hadoop.hdfs.DFSClient.setSafeMode(DFSClient.java:1896)
at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:660)
at org.apache.hadoop.hbase.util.FSUtils.isInSafeMode(FSUtils.java:421)
at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:828)
at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:464)
at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:153)
at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:137)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:728)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:546)
at java.lang.Thread.run(Thread.java:662)
Make sure you have built HBase properly (keeping all the hadoop-2.0.5 dependencies in mind). Verify that the hadoop-core jar in the hbase/lib directory is the same as the Hadoop jar inside your Hadoop installation. Check the Hadoop version in your pom.xml and build HBase accordingly.
If you still face any issues, you can try the patch from HBASE-7904 and rebuild your HBase.
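For illustration only (the profile and property names are assumptions drawn from HBase poms of that era, not from this question), rebuilding against this Hadoop release would look roughly like:
# hadoop.profile=2.0 selects the Hadoop 2 build; hadoop-two.version pins the release
mvn clean install -DskipTests -Dhadoop.profile=2.0 -Dhadoop-two.version=2.0.5-alpha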
HTH
There may be a compatibility issue when installing HBase with Hadoop 2.x; please check.

Scheduling an HBase Hadoop MR job with input parameters

I am able to run the job using the hadoop jar command, but when I try to schedule the job using Oozie I am unable to do so.
Also, please let me know whether the error is due to the data in the HBase table or due to the XML file.
The workflow XML file is as follows:
<workflow-app xmlns="uri:oozie:workflow:0.1" name="java-main-wf">
<start to="java-node"/>
<action name="java-node">
<java>
<job-tracker>00.00.00.116:00000</job-tracker>
<name-node>hdfs://00.00.000.116:00000</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>aaaaaa0000002d:2888:3888,bbbbbb000000d:2888:3888,bbbbbb000000d:2888:3888</value>
</property>
<property>
<name>hbase.master</name>
<value>aaaaaa000000d:60000</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://aaaa000000d:54310/hbase</value>
</property>
</configuration>
<main-class>com.cf.mapreduce.nord.GetSuggestedItemsForViewsCarts</main-class>
</java>
<map-reduce>
<job-tracker>1000.0000.00.000</job-tracker>
<name-node>hdfs://10.00.000.000:00000</name-node>
<configuration>
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapreduce.map.class</name>
<value>mahout.cf.mapreduce.nord.GetSuggestedItemsForViewsCarts$GetSuggestedItemsForViewsCartsMapper</value>
</property>
<property>
<name>mapreduce.reduce.class</name>
<value>mahout.cf.mapreduce.nord.GetSuggestedItemsForViewsCarts$GetSuggestedItemsForViewsCartsReducer</value>
</property>
<property>
<name>hbase.mapreduce.inputtable</name>
<value>${MAPPER_INPUT_TABLE}</value>
</property>
<property>
<name>hbase.mapreduce.scan</name>
<value>${wf:actionData('get-scanner')['scan']}</value>
</property>
<property>
<name>mapreduce.inputformat.class</name>
<value>org.apache.hadoop.hbase.mapreduce.TableInputFormat</value>
</property>
<property>
<name>mapreduce.outputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.output.NullOutputFormat</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>10</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>aaa000,aaaa0000,aaaa00000</value>
</property>
<property>
<name>hbase.master</name>
<value>blrkec242032d:60000</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://aaaa0000:00010/hbase</value>
</property>
</configuration>
</map-reduce>
and the error log of the mapper is:
Submitting Oozie action Map-Reduce job
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.MapReduceMain], main() threw exception, No table was provided.
java.io.IOException: No table was provided.
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:130)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:891)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:818)
at org.apache.oozie.action.hadoop.MapReduceMain.submitJob(MapReduceMain.java:91)
at org.apache.oozie.action.hadoop.MapReduceMain.run(MapReduceMain.java:57)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
at org.apache.oozie.action.hadoop.MapReduceMain.main(MapReduceMain.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:454)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher ends
syslog logs
2012-12-11 10:21:18,472 WARN org.apache.hadoop.mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2012-12-11 10:21:18,586 ERROR org.apache.hadoop.hbase.mapreduce.TableInputFormat: java.lang.NullPointerException
at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:404)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:153)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:91)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:70)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:959)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:891)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:818)
at org.apache.oozie.action.hadoop.MapReduceMain.submitJob(MapReduceMain.java:91)
at org.apache.oozie.action.hadoop.MapReduceMain.run(MapReduceMain.java:57)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
at org.apache.oozie.action.hadoop.MapReduceMain.main(MapReduceMain.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:454)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
When you call TableMapReduceUtil.initTableMapperJob(..), the utility method configures a number of job properties, one of which is the HBase table to scan.
Looking through the code (via GrepCode), I can see the following properties being set by this method:
<property>
<name>hbase.mapreduce.inputtable</name>
<value>CUSTOMER_INFO</value>
</property>
<property>
<name>hbase.mapreduce.scan</name>
<value>...</value>
</property>
The input table should be the name of your table; the scan property is a serialized form of the scan information (a Base64-encoded version). Your best bet, in my opinion, is to run a job manually and inspect the job.xml via the job tracker to see what the set values are.
Note that you'll also need to set the properties for the reducer (see the source of the initTableReducerJob method); again, inspecting the job.xml of a job that has been submitted manually may be your best bet.
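To make that concrete, here is a minimal driver sketch showing how initTableMapperJob fills in hbase.mapreduce.inputtable and hbase.mapreduce.scan for you. ScanDriver and MyMapper are hypothetical names; only the table "CUSTOMER_INFO" comes from the snippet above:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ScanDriver {

  // Placeholder mapper: an identity pass-through of each scanned row.
  static class MyMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(row, value);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hbase-scan-job");
    job.setJarByClass(ScanDriver.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // common tuning for MR scans
    scan.setCacheBlocks(false);  // avoid polluting the region server block cache

    // This call serializes the Scan into hbase.mapreduce.scan and sets
    // hbase.mapreduce.inputtable -- the two properties quoted above.
    TableMapReduceUtil.initTableMapperJob(
        "CUSTOMER_INFO",              // input table (from the snippet above)
        scan,
        MyMapper.class,
        ImmutableBytesWritable.class, // mapper output key class
        Result.class,                 // mapper output value class
        job);

    job.setOutputFormatClass(NullOutputFormat.class); // matches the workflow above
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}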
