Can't Store pig relation into Hbase - hadoop

Hi I am trying to store pig relation into HBase.
store result INTO 'hbase://hourlyAggregation' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('countDetails:ansCount countDetails:divCount countDetails:unansCount countDetails:engCount');
This is running fine in local. when I tried to run pig in mapred mode my job is failing and my log is showing no error
ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /home/HadoopUser/pig_1384412383791.log
Pig Stack Trace
---------------
ERROR 2244: Job failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:119)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:500)
at org.apache.pig.Main.main(Main.java:107)
================================================================================
my profile is as follows
export JAVA_HOME=/home/hadoop/jdk1.6.0_39
export HADOOP_HOME=$MY_HOME/hadoop-0.20.2-cdh3u4
export CLASSPATH=$JAVA_HOME/lib/tools.jar:.
export HIVE_HOME=$MY_HOME/hive-0.7.1-cdh3u4
export PIG_HOME=$MY_HOME/pig-0.8.1-cdh3u4
export HBASE_HOME=$MY_HOME/hbase-0.90.6-cdh3u4
export PIG_CLASSPATH=”`${HBASE_HOME}/bin/hbase classpath`:$PIG_CLASSPATH”
please help me on this
I even tried to register jars of zookeeper and hbase in pig_home/lib
JT log
14-Nov-2013 14:48:29 (17sec)
java.lang.RuntimeException: could not instantiate 'org.apache.pig.backend.hadoop.hbase.HBaseStorage' with arguments '[countDetails:ansCount countDetails:divCount countDetails:unansCount countDetails:engCount]'
at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:502)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc(POStore.java:218)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:85)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.<init>(PigOutputCommitter.java:68)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:278)
at org.apache.hadoop.mapred.Task.initialize(Task.java:511)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:306)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)

registering hbase,zookeeper jar in PIG/lib and guava jar in hbase lib did worked thanks Lorand Bendig for ur support.

Related

apache pig not connecting to hdfs

I have Hadoop version 2.6.3 and pig-0.6.0
I have all the daemons up and running in Single node cluster.
After firing the pig command . The pig is only connecting to file:/// not hdfs
could you please tell me how to make it to connect hdfs
below is the INFO log that could i see
2016-01-10 20:58:30,431 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2016-01-10 20:58:30,650 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
when I hit the command in GRUNT
grunt> ls hdfs://localhost:54310/
2016-01-10 21:05:41,059 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Wrong FS: hdfs://localhost:54310/, expected: file:///
Details at logfile: /home/hguna/pig_1452488310172.log
I have no clue has to why it is expecting file:///
ERROR 2999: Unexpected internal error. Wrong FS: hdfs://localhost:54310/, expected: file:///
java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:54310/, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:305)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
at org.apache.pig.tools.grunt.GruntParser.processLS(GruntParser.java:576)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:304)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
at org.apache.pig.Main.main(Main.java:352)
Did I configure hadoop correctly ? or some where I am wrong please let me know if there is any file that I need to share . I have done enough researching could not fix it .Btw I am a newbie to Hadoop and pig
please help me .
Thanks
Chek your configuration in hadoop-site.xml, core-site.xml and mapred-site.xml
Use PIG_CLASSPATH to specify addition classpath entries. For eg, to add hadoop configuration files (hadoop-site.xml, core-site.xml) to classpath
export PIG_CLASSPATH=<path_to_hadoop_conf_dir>
you should override default classpath entries by setting PIG_USER_CLASSPATH_FIRST
export PIG_USER_CLASSPATH_FIRST=true
After that you can able to start the grunt shell

Hive Internal Error: java.lang.ClassNotFoundException(org.apache.atlas.hive.hook.HiveHook)

I am running a hive query throwh oozie using hue..
I am creating a table through hue-oozie work flow...
My job is failing but when I check in hive the table is created.
Log shows below error:
16157 [main] INFO org.apache.hadoop.hive.ql.hooks.ATSHook - Created ATS Hook
2015-09-24 11:05:35,801 INFO [main] hooks.ATSHook (ATSHook.java:<init>(84)) - Created ATS Hook
16159 [main] ERROR org.apache.hadoop.hive.ql.Driver - hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
2015-09-24 11:05:35,803 ERROR [main] ql.Driver (SessionState.java:printError(960)) - hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
16159 [main] ERROR org.apache.hadoop.hive.ql.Driver - FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.atlas.hive.hook.HiveHook)
java.lang.ClassNotFoundException: org.apache.atlas.hive.hook.HiveHook
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
Not able to identify the issue....
I am usig HDP 2.3.1
Basically this error is due to missing atlas jar in oozie share lib.
In HDP the Atlas jar is available in /usr/hdp/2.3.0.0-2557/atlas/
Put all the jars related to atlas in hadoop share lib ..
hadoop fs -put /usr/hdp/2.3.0.0-2557/atlas/hook/hive/* /user/oozie/share/lib/lib200344/hive
Add 'export HIVE_AUX_JARS_PATH=<atlas package>/hook/hive' in hive-env.sh .
Copy <atlas package>/conf/application.propertiesto hive conf directory.
Restart the oozie services. This will solve this problem. If anybody face the problem please comment here so that I can help.
[Comment by Immo Huneke: when using the Hortonworks sandbox VM, I found that just putting the jar files in the share/lib folder under HDFS was enough to resolve the problem. I didn't have to update hive-env.sh or copy the application.properties file. But check the exact path of your share/lib folder by executing the command hdfs dfs -ls /user/oozie/share/lib before copying.]
hive>add jar /usr/hdp//atlas/hook/hive/hive-bridge-${VERSION}.jar
it will be ok.
hope help for u.
It Seems You CLASS is not found exception.
Have you installed Oozie Sharedlib, if Yes, please update all the hive dependent Jar in the sharedLib Location, and check if the status
Also check if Hive Client is available in all the Nodes under the cluster and same should be running
​I tried each and every possible solution mentioned in this forum and in stackoverflow, but it did not resolve my issue.
Finally, I resolved it by copying all the jars in /hook/hive to lib (create a new lib folder at job.properties level) folder of my oozie workflow

Integrating Pig with Hbase

I have installed hadoop-2.5.0, pig 0.13.0 and HBase 0.98.6.1 in linux. When trying to run simple pig script, error occurs as
2014-10-14 16:01:54,891 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.hadoop.hbase.util.Bytes.equals([BLjava/nio/ByteBuffer;)Z
Details at logfile: /home/labuser/pig_1413279561970.log
Pasted the log below...
Pig Stack Trace
ERROR 2998: Unhandled internal error. org.apache.hadoop.hbase.util.Bytes.equals([BLjava/nio/ByteBuffer;)Z
java.lang.NoSuchMethodError: org.apache.hadoop.hbase.util.Bytes.equals([BLjava/nio/ByteBuffer;)Z
at org.apache.hadoop.hbase.TableName.(TableName.java:281)
at org.apache.hadoop.hbase.TableName.createTableNameIfNecessary(TableName.java:344)
at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:382)
at org.apache.hadoop.hbase.TableName.(TableName.java:82)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
It seems that HBase 0.98.6.1 version does not support for pig 0.13.0
So how to make it works? or which version of HBase does support for pig 0.13.0?
The root cause for this has been identified to be https://issues.apache.org/jira/browse/HBASE-6658 where it says the class "org.apache.hadoop.hbase.filter.WritableByteArrayComparable" was renamed.
You may need to re-compile using the HBase profile you're using.

SerDe problems with Hive 0.12 and Hadoop 2.2.0-cdh5.0.0-beta2

The title is a bit weird as I'm having difficulties narrowing down the problem. I used my solution on Hadoop 2.0.0-cdh4.4.0 and hive 0.10 without issues.
I can't create a table using this SerDe: https://github.com/rcongiu/Hive-JSON-Serde
first try:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector.<init>(Lorg/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils$PrimitiveTypeEntry;)V
second try:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Could not initialize class org.openx.data.jsonserde.objectinspector.JsonObjectInspectorFactory
I can create a table with this SerDe: https://github.com/cloudera/cdh-twitter-example
I create an external table with tweets from flume. I can't do "SELECT * FROM tweets;"
FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDejava.lang.RuntimeException: java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDe
I can do SELECT id, text FROM tweets;
I can do a SELECT COUNT(*) FROM tweets;
I can't self join this table:
Execution log at: /tmp/jochen.debie/jochen.debie_20140311121313_164611a9-b0d8-4e53-9bda-f9f7ac342aaf.log
2014-03-11 12:13:30 Starting to launch local task to process map join; maximum memory = 257294336
Execution failed with exit status: 2
Obtaining error information
Task failed!
Task ID:
Stage-5
mentioned execution log:
2014-03-11 12:13:30,331 ERROR mr.MapredLocalTask (MapredLocalTask.java:executeFromChildJVM(324)) - Hive Runtime Error: Map local work failed
org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDejava.lang.RuntimeException: java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDe
Does anyone know how to fix this or at least show me where the problem is?
EDIT: Can it be a problem that I built the serde on a Hadoop 2.0.0-cdh4.4.0 and hive 0.10?
From what I've seen, Hive-.11+ has a bug in join with custom SerDe.
https://github.com/Esri/gis-tools-for-hadoop/issues/9
You might try the workaround of copying the JAR file containing the SerDe class, to $HIVE_HOME/lib .
(I see in your question you got ClassNotFoundException both in join and in other cases; so far the times I have encountered such were all with join.)
[Edit] Another workaround is to use HADOOP_CLASSPATH:
env HADOOP_CLASSPATH=some.jar:other.jar hive ...
[Edit] The work around applies to Hive versions 0.11 and 0.12; then 0.13 and above contain the fix for HIVE-6670.

Reading hive table using Pig script

I am trying to read hive table using PIG script but when I run a pig code to read a table in hive its giving me following error:
2014-02-12 15:48:36,143 [main] WARN org.apache.hadoop.hive.conf.HiveConf
-hive-site.xml not found on CLASSPATH 2014-02-12 15:49:10,781 [main] ERROR
org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate
exception from backed error: Error: Found class
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
(Ignore newlines and whitespace added for readability)
Hadoop version
1.1.1
Hive version
0.9.0
Pig version
0.10.0
Pig code
a = LOAD '/user/hive/warehouse/test' USING
org.apache.pig.piggybank.storage.HiveColumnarLoader('name string');
Is it due to some version mismatch ?
Why can't you use hcatalog to access hive metadata in pig?
Check this for an example

Resources