error while executing insert overwrite query in hive - hadoop

I'm using Hadoop 1.2, HBase 0.94.8 and Hive 0.14. I'm trying to insert data into an HBase table using Hive.
I have already created the table:
CREATE TABLE hbase_table_emp(id int, name string, role string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf1:role")
TBLPROPERTIES ("hbase.table.name" = "emp");
and loaded data into another table that I will then insert overwrite into the HBase table:
hive> create table testemp(id int, name string, role string) row format delimited fields terminated by '\t';
hive> load data local inpath '/home/user/sample.txt' into table testemp;
Now, when I try to overwrite it into the HBase table with:
hive> insert overwrite table hbase_table_emp select * from testemp;
I get this error:
hive> insert overwrite table hbase_table_emp select * from testemp;
Query ID = hduser_20150126005151_ebc2a36f-97c4-41da-b145-32d5732d9681
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.lang.NoClassDefFoundError: org/cliffc/high_scale_lib/Counter
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureJobConf(HBaseStorageHandler.java:470)
at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobConf(PlanUtils.java:856)
at org.apache.hadoop.hive.ql.plan.MapWork.configureJobConf(MapWork.java:544)
at org.apache.hadoop.hive.ql.plan.MapredWork.configureJobConf(MapredWork.java:68)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:370)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1604)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1364)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1177)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:994)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:247)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.ClassNotFoundException: org.cliffc.high_scale_lib.Counter
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
... 24 more
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org/cliffc/high_scale_lib/Counter
Could someone help me please?

I found a solution to this problem!
First of all I switched to Hadoop 2.4 / HBase 0.98 / Hive 0.14.
Then, in hive-env.sh I set:
export HIVE_AUX_JARS_PATH=/usr/local/hbase/lib
and in the hive shell:
hive> add jar /usr/local/hive/lib/hive-hbase-handler-0.14.0.jar;
hive> add jar /usr/local/hbase/lib/hbase-common-0.98.0-hadoop2.jar;
hive> add jar /usr/local/hbase/lib/zookeeper-3.4.5.jar;
hive> add jar /usr/local/hbase/lib/guava-12.0.1.jar;
hive> add jar /usr/local/hbase/lib/high-scale-lib-1.1.1.jar;
and this worked out for me :)
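As a quick sanity check (a minimal sketch, assuming the table and column-family names from the question), you can confirm the rows actually landed in HBase once the insert succeeds:
hive> select * from hbase_table_emp limit 3;
and, from the HBase shell:
hbase(main):001:0> scan 'emp', {LIMIT => 3}
Both should show the ids as row keys with name and role stored under cf1.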

It sounds like at runtime Hive isn't able to resolve the HBase home directory (and associated libraries). Can you verify that the following environment variables are set (your actual values may vary of course, depending on where Hive and HBase are installed)?
HBASE_IDENT_STRING=hbase
HBASE_CONF_DIR=/etc/hbase/conf
HBASE_HOME=/usr/lib/hbase
HBASE_LOG_DIR=/var/log/hbase
HIVE_HOME=/usr/lib/hive
HBASE_PID_DIR=/var/run/hbase
I'm not certain whether all of the above environmental settings have to be defined for what you are doing to work, but my intuition is that at a minimum HBASE_HOME and HBASE_CONF_DIR must be set.
For reference, the above environment variables are ones that the HDP 2.1 distribution sets for you at Hive runtime. I was able to perform the Hive-to-HBase persistence you were attempting with a minimal dataset, so hopefully the solution lies with those environmental settings.
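As a rough sketch, those variables can be exported in the shell (or in hive-env.sh) before launching Hive; the paths below are the HDP-style defaults listed above and will differ on other installations:
export HBASE_HOME=/usr/lib/hbase
export HBASE_CONF_DIR=/etc/hbase/conf
export HIVE_HOME=/usr/lib/hive
$HIVE_HOME/bin/hive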

Related

Hive - Create Table as - error while creating parquet table from existing table

I created a Parquet table (orders_parquet) from an existing table (orders) with CTAS as below:
CREATE TABLE orders_parquet
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
AS SELECT * FROM orders;
The load runs, but produces the error below:
Query ID = jonnavithulasivakrishna_20171105234912_e608ac1f-a10b-435e-8307-92747fb5c37d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Can't load log handler "java.util.logging.FileHandler"
java.io.FileNotFoundException: /tmp/parquet-3.log (Permission denied)
java.io.FileNotFoundException: /tmp/parquet-3.log (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
at java.util.logging.FileHandler.open(FileHandler.java:210)
at java.util.logging.FileHandler.rotate(FileHandler.java:661)
at java.util.logging.FileHandler.openFiles(FileHandler.java:538)
at java.util.logging.FileHandler.<init>(FileHandler.java:263)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
But the table does get loaded with data. Could you please help me understand why this error occurs?
Hive needs write permission in the /tmp location to log information. In your case Hive doesn't have write permission on /tmp/parquet-3.log. Give write permission on that file, e.g. chmod 777 /tmp/parquet-3.log.
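A minimal sketch of that fix, using the exact file path from the error message (deleting the stale file so Hive can recreate it is an alternative):
chmod 777 /tmp/parquet-3.log
# or, alternatively, remove the stale log file and let it be recreated
rm -f /tmp/parquet-3.log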

ClassCastException on Drop table query in apache spark hive

I'm using the following Hive query:
this.queryExecutor.executeQuery("Drop table user")
and am getting the following exception:
java.lang.LinkageError: ClassCastException: attempting to cast jar:file:/usr/hdp/2.4.2.0-258/spark/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar!/javax/ws/rs/ext/RuntimeDelegate.class to jar:file:/usr/hdp/2.4.2.0-258/spark/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar!/javax/ws/rs/ext/RuntimeDelegate.class
at javax.ws.rs.ext.RuntimeDelegate.findDelegate(RuntimeDelegate.java:116)
at javax.ws.rs.ext.RuntimeDelegate.getInstance(RuntimeDelegate.java:91)
at javax.ws.rs.core.MediaType.<clinit>(MediaType.java:44)
at com.sun.jersey.core.header.MediaTypes.<clinit>(MediaTypes.java:64)
at com.sun.jersey.core.spi.factory.MessageBodyFactory.initReaders(MessageBodyFactory.java:182)
at com.sun.jersey.core.spi.factory.MessageBodyFactory.initReaders(MessageBodyFactory.java:175)
at com.sun.jersey.core.spi.factory.MessageBodyFactory.init(MessageBodyFactory.java:162)
at com.sun.jersey.api.client.Client.init(Client.java:342)
at com.sun.jersey.api.client.Client.access$000(Client.java:118)
at com.sun.jersey.api.client.Client$1.f(Client.java:191)
at com.sun.jersey.api.client.Client$1.f(Client.java:187)
at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193)
at com.sun.jersey.api.client.Client.<init>(Client.java:187)
at com.sun.jersey.api.client.Client.<init>(Client.java:170)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceInit(TimelineClientImpl.java:340)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.hive.ql.hooks.ATSHook.<init>(ATSHook.java:67)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:60)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1309)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1293)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1347)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:495)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:484)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:290)
at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:237)
at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:236)
at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:279)
at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:484)
at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:474)
at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:613)
at org.apache.spark.sql.hive.execution.DropTable.run(commands.scala:89)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at com.accenture.aa.dmah.spark.core.QueryExecutor.executeQuery(QueryExecutor.scala:35)
at com.accenture.aa.dmah.attribution.transformer.MulltipleUserJourneyTransformer.transform(MulltipleUserJourneyTransformer.scala:32)
at com.accenture.aa.dmah.attribution.userjourney.UserJourneyBuilder$$anonfun$buildUserJourney$1.apply$mcVI$sp(UserJourneyBuilder.scala:31)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at com.accenture.aa.dmah.attribution.userjourney.UserJourneyBuilder.buildUserJourney(UserJourneyBuilder.scala:29)
at com.accenture.aa.dmah.attribution.core.AttributionHub.executeAttribution(AttributionHub.scala:47)
at com.accenture.aa.dmah.attribution.jobs.AttributionJob.process(AttributionJob.scala:33)
at com.accenture.aa.dmah.core.DMAHJob.processJob(DMAHJob.scala:73)
at com.accenture.aa.dmah.core.DMAHJob.execute(DMAHJob.scala:27)
at com.accenture.aa.dmah.core.JobRunner.<init>(JobRunner.scala:17)
at com.accenture.aa.dmah.core.ApplicationInstance.initilize(ApplicationInstance.scala:48)
at com.accenture.aa.dmah.core.Bootstrap.boot(Bootstrap.scala:112)
at com.accenture.aa.dmah.core.BootstrapObj$.main(Bootstrap.scala:134)
at com.accenture.aa.dmah.core.BootstrapObj.main(Bootstrap.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:139)
at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:139)
at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:28)
at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:45)
at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:35)
at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:45)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:74)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
I saw there have been similar posts here and here but they haven't had any response till now.
I have also looked here but don't think that's a valid course of action in my case.
What's intriguing is that this happens specifically when we run a drop table (or drop table if exists) query.
Hoping to find a resolution for this.
To my knowledge, the above error occurs when a class with the same fully qualified name, i.e. 'javax.ws.rs.ext.RuntimeDelegate', is found in different JARs. Class objects are created and cast at run time, so it is quite possible that the code responsible for handling the DROP statement uses this class and breaks because it is found more than once on the classpath.
I have tried DROP and DROP IF EXISTS on CDH5 and they worked without issue; below are the details of my runs:
First run - Hadoop 2.6, Hive 1.1.0 and Spark 1.3.1 (Hive libraries included in Spark's lib directory)
Second run - Hadoop 2.6, Hive 1.1.0 and Spark 1.6.1
Mode of run - CLI
scala> sqlContext.sql("DROP TABLE SAMPLE");
16/08/04 11:31:39 INFO parse.ParseDriver: Parsing command: DROP TABLE SAMPLE
16/08/04 11:31:39 INFO parse.ParseDriver: Parse Completed
......
scala> sqlContext.sql("DROP TABLE IF EXISTS SAMPLE");
16/08/04 11:40:34 INFO parse.ParseDriver: Parsing command: DROP TABLE IF EXISTS SAMPLE
16/08/04 11:40:35 INFO parse.ParseDriver: Parse Completed
.....
If possible, please validate the DROP commands using a different version of the Spark libraries to narrow down the problem scope.
Meanwhile, I am analyzing the JARs to find where two copies of the 'RuntimeDelegate' class exist (a rough way to scan for this is sketched below) and will report back: removing one of the offending JARs should fix the issue, and adding it back should recreate it.
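As a rough sketch of that scan (the lib directory is the one from your stack trace; adjust as needed), the following lists every JAR that bundles the class:
for jar in /usr/hdp/2.4.2.0-258/spark/lib/*.jar; do
  # print the JAR if it contains the conflicting class
  unzip -l "$jar" 2>/dev/null | grep -q 'javax/ws/rs/ext/RuntimeDelegate.class' && echo "$jar"
done
If more than one JAR is printed, removing or shading all but one of them is a common way to resolve this kind of LinkageError.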

Sqoop Job Exception

I am getting the following exception when I'm trying to list the Sqoop jobs.
I'm not able to create Sqoop jobs because of this exception:
root#ubuntu:/usr/lib/sqoop/conf# sqoop job --list
16/04/11 01:51:44 ERROR tool.JobTool: I/O error performing job operation:
java.io.IOException: Exception creating SQL connection
at com.cloudera.sqoop.metastore.hsqldb.HsqldbJobStorage.init(HsqldbJobStorage.java:220)
at com.cloudera.sqoop.metastore.hsqldb.AutoHsqldbStorage.open(AutoHsqldbStorage.java:113)
at com.cloudera.sqoop.tool.JobTool.run(JobTool.java:279)
at com.cloudera.sqoop.Sqoop.run(Sqoop.java:146)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:182)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:221)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:230)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:239)
Caused by: java.sql.SQLException: General error: java.lang.ClassFormatError: Truncated class file
at org.hsqldb.jdbc.Util.sqlException(Unknown Source)
at org.hsqldb.jdbc.jdbcConnection.<init>(Unknown Source)
at org.hsqldb.jdbcDriver.getConnection(Unknown Source)
at org.hsqldb.jdbcDriver.connect(Unknown Source)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at com.cloudera.sqoop.metastore.hsqldb.HsqldbJobStorage.init(HsqldbJobStorage.java:180)
... 8 more
Sqoop Version: 1.3.0-cdh3u5
Please help
Commands used as below:
sqoop job --list
sqoop job --create sqoopjob21 -- import --connect jdbc:mysql://localhost/mysql1 --table emp --target-dir /importjob21 ;
This may possibly be because Sqoop cannot find the HSQLDB metastore it uses to store job information. Check whether you have a "metastore.db.script" file in the Sqoop installation directory; if not, create it.
Create a file named "metastore.db.script" and put in the following lines:
CREATE SCHEMA PUBLIC AUTHORIZATION DBA
CREATE MEMORY TABLE SQOOP_ROOT(VERSION INTEGER,PROPNAME VARCHAR(128) NOT NULL,PROPVAL VARCHAR(256),CONSTRAINT SQOOP_ROOT_UNQ UNIQUE(VERSION,PROPNAME))
CREATE MEMORY TABLE SQOOP_SESSIONS(JOB_NAME VARCHAR(64) NOT NULL,PROPNAME VARCHAR(128) NOT NULL,PROPVAL VARCHAR(1024),PROPCLASS VARCHAR(32) NOT NULL,CONSTRAINT SQOOP_SESSIONS_UNQ UNIQUE(JOB_NAME,PROPNAME,PROPCLASS))
CREATE USER SA PASSWORD ""
GRANT DBA TO SA
SET WRITE_DELAY 10
SET SCHEMA PUBLIC
INSERT INTO SQOOP_ROOT VALUES(NULL,'sqoop.hsqldb.job.storage.version','0')
INSERT INTO SQOOP_ROOT VALUES(0,'sqoop.hsqldb.job.info.table','SQOOP_SESSIONS')
Now create "metastore.db.properties" file and put these lines
#HSQL Database Engine 1.8.0.10
#Fri Aug 04 14:07:10 IST 2017
hsqldb.script_format=0
runtime.gc_interval=0
sql.enforce_strict_size=false
hsqldb.cache_size_scale=8
readonly=false
hsqldb.nio_data_file=true
hsqldb.cache_scale=14
version=1.8.0
hsqldb.default_table_type=memory
hsqldb.cache_file_scale=1
hsqldb.log_size=200
modified=no
hsqldb.cache_version=1.7.0
hsqldb.original_version=1.8.0
hsqldb.compatible_version=1.8.0
Now create a directory named ".sqoop" (if it does not already exist) and put these two files there. Then run your job.
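A rough sketch of those steps, assuming the two files were created in the current directory and that Sqoop runs as the current user:
mkdir -p ~/.sqoop
cp metastore.db.script metastore.db.properties ~/.sqoop/
sqoop job --list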

SerDe problems with Hive 0.12 and Hadoop 2.2.0-cdh5.0.0-beta2

The title is a bit weird as I'm having difficulties narrowing down the problem. I used my solution on Hadoop 2.0.0-cdh4.4.0 and hive 0.10 without issues.
I can't create a table using this SerDe: https://github.com/rcongiu/Hive-JSON-Serde
first try:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector.<init>(Lorg/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils$PrimitiveTypeEntry;)V
second try:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Could not initialize class org.openx.data.jsonserde.objectinspector.JsonObjectInspectorFactory
I can create a table with this SerDe: https://github.com/cloudera/cdh-twitter-example
I created an external table with tweets from Flume. I can't do "SELECT * FROM tweets;":
FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDejava.lang.RuntimeException: java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDe
I can do SELECT id, text FROM tweets;
I can do a SELECT COUNT(*) FROM tweets;
I can't self join this table:
Execution log at: /tmp/jochen.debie/jochen.debie_20140311121313_164611a9-b0d8-4e53-9bda-f9f7ac342aaf.log
2014-03-11 12:13:30 Starting to launch local task to process map join; maximum memory = 257294336
Execution failed with exit status: 2
Obtaining error information
Task failed!
Task ID:
Stage-5
mentioned execution log:
2014-03-11 12:13:30,331 ERROR mr.MapredLocalTask (MapredLocalTask.java:executeFromChildJVM(324)) - Hive Runtime Error: Map local work failed
org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDejava.lang.RuntimeException: java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDe
Does anyone know how to fix this or at least show me where the problem is?
EDIT: Could it be a problem that I built the SerDe against Hadoop 2.0.0-cdh4.4.0 and Hive 0.10?
From what I've seen, Hive 0.11+ has a bug in joins with a custom SerDe.
https://github.com/Esri/gis-tools-for-hadoop/issues/9
You might try the workaround of copying the JAR file containing the SerDe class to $HIVE_HOME/lib.
(I see in your question that you got ClassNotFoundException both in the join and in other cases; so far, the times I have encountered this were all with joins.)
[Edit] Another workaround is to use HADOOP_CLASSPATH:
env HADOOP_CLASSPATH=some.jar:other.jar hive ...
[Edit] The workaround applies to Hive versions 0.11 and 0.12; 0.13 and above contain the fix for HIVE-6670.
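A minimal sketch of both suggested workarounds; the SerDe JAR path below is a placeholder, not a path from the question:
# workaround 1: copy the SerDe JAR next to Hive's own libraries
cp /path/to/json-serde-with-dependencies.jar $HIVE_HOME/lib/
# workaround 2: put the JAR on the Hadoop classpath for a single Hive invocation
env HADOOP_CLASSPATH=/path/to/json-serde-with-dependencies.jar hive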

Writing snappy compressed data to a hive table

I've created a hive table and now I want to load snappy compressed data into the table. Therefore I did the following:
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress=true;
CREATE TABLE toydata_table (id STRING, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
Then I created a CSV file called toydata.csv with the following content:
A,Value1
B,Value2
C,Value3
I compressed this file with snzip ( https://github.com/kubo/snzip ) by doing
/usr/local/bin/snzip -t snappy-java toydata.csv
which produced toydata.csv.snappy. After that I returned to the Hive CLI and loaded the data with LOAD DATA LOCAL INPATH "toydata.csv.snappy" INTO TABLE toydata_table;. But when I try to query that table I get the following error message:
hive> select * from toydata_table;
OK
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:189)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:175)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:433)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:515)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1471)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I did the exact same thing with gzip, and gzip works fine. So why does this step fail?
Please install the Snappy compression codec on your cluster. If you want to confirm whether Snappy is installed, look for the libsnappy.so file in your native library directory.
You also need to start the Hive shell with the --auxpath parameter and provide the Snappy JAR, e.g.: hive --auxpath /home/user/snappy1.0.4.1.jar.
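As a quick check (assuming a Hadoop 2.x installation; this command is not mentioned above), hadoop checknative reports whether the native Snappy library is visible to Hadoop and therefore to Hive:
hadoop checknative -a
# expect a line such as:  snappy: true /usr/lib/hadoop/lib/native/libsnappy.so.1
# if it reports false, the native codec is missing and the UnsatisfiedLinkError above is expected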
