Writing snappy compressed data to a hive table - hadoop

I've created a hive table and now I want to load snappy compressed data into the table. Therefore I did the following:
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress=true;
CREATE TABLE toydata_table (id STRING, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";'
Then I created as CSV file called toydata.csv that has the following content:
A,Value1
B,Value2
C,Value3
I compressed this file with snzip ( https://github.com/kubo/snzip ) by doing
/usr/local/bin/snzip -t snappy-java toydata.csv
which produces toydata.csv.snappy. After having done this I returned to the hive cli and loaded the data by LOAD DATA LOCAL INPATH "toydata.csv.snappy" INTO TABLE toydata_table;. But now I want to try to query from that table and get the following error message:
hive> select * from toydata_table;
OK
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:189)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:175)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:433)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:515)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1471)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I did the exact same thing with gzip and using gzip works fine. So, why does this part fail?

Please install snappy compression codec on your cluster.If you want to confirm whether snappy is installed please find libsnappy.so file in your libraries.
Also you need to start hive shell with --auxpath parameter and provide snappy.jar.e.g: hive --auxpath /home/user/snappy1.0.4.1.jar.

Related

HIve - Create Table as - error while creating parquet table from existing table

I created a Parquet table(orders_parquet) from existing table(orders) with CTAS as below :
CREATE TABLE orders_parquet
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
AS SELECT * FROM orders;
It is loading with below
Query ID = jonnavithulasivakrishna_20171105234912_e608ac1f-a10b-435e-8307-92747fb5c37d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Can't load log handler "java.util.logging.FileHandler"
java.io.FileNotFoundException: /tmp/parquet-3.log (Permission denied)
java.io.FileNotFoundException: /tmp/parquet-3.log (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
at java.util.logging.FileHandler.open(FileHandler.java:210)
at java.util.logging.FileHandler.rotate(FileHandler.java:661)
at java.util.logging.FileHandler.openFiles(FileHandler.java:538)
at java.util.logging.FileHandler.<init>(FileHandler.java:263)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
But the table is loaded with data.Could you please help why the error is coming?
Hive need write permission in /tmp location to log information. In your case hive don't have write permission to /tmp/parquet-3.log. Give write permission to given file as chmod 777 /tmp/parquet-3.log

ClassCastException on Drop table query in apache spark hive

I'm using the following hive query :
this.queryExecutor.executeQuery("Drop table user")
and am getting the following exception :
java.lang.LinkageError: ClassCastException: attempting to castjar:file:/usr/hdp/2.4.2.0-258/spark/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar!/javax/ws/rs/ext/RuntimeDelegate.classtojar:file:/usr/hdp/2.4.2.0-258/spark/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar!/javax/ws/rs/ext/RuntimeDelegate.class
at javax.ws.rs.ext.RuntimeDelegate.findDelegate(RuntimeDelegate.java:116)
at javax.ws.rs.ext.RuntimeDelegate.getInstance(RuntimeDelegate.java:91)
at javax.ws.rs.core.MediaType.<clinit>(MediaType.java:44)
at com.sun.jersey.core.header.MediaTypes.<clinit>(MediaTypes.java:64)
at com.sun.jersey.core.spi.factory.MessageBodyFactory.initReaders(MessageBodyFactory.java:182)
at com.sun.jersey.core.spi.factory.MessageBodyFactory.initReaders(MessageBodyFactory.java:175)
at com.sun.jersey.core.spi.factory.MessageBodyFactory.init(MessageBodyFactory.java:162)
at com.sun.jersey.api.client.Client.init(Client.java:342)
at com.sun.jersey.api.client.Client.access$000(Client.java:118)
at com.sun.jersey.api.client.Client$1.f(Client.java:191)
at com.sun.jersey.api.client.Client$1.f(Client.java:187)
at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193)
at com.sun.jersey.api.client.Client.<init>(Client.java:187)
at com.sun.jersey.api.client.Client.<init>(Client.java:170)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceInit(TimelineClientImpl.java:340)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.hive.ql.hooks.ATSHook.<init>(ATSHook.java:67)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:60)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1309)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1293)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1347)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:495)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:484)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:290)
at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:237)
at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:236)
at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:279)
at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:484)
at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:474)
at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:613)
at org.apache.spark.sql.hive.execution.DropTable.run(commands.scala:89)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at com.accenture.aa.dmah.spark.core.QueryExecutor.executeQuery(QueryExecutor.scala:35)
at com.accenture.aa.dmah.attribution.transformer.MulltipleUserJourneyTransformer.transform(MulltipleUserJourneyTransformer.scala:32)
at com.accenture.aa.dmah.attribution.userjourney.UserJourneyBuilder$$anonfun$buildUserJourney$1.apply$mcVI$sp(UserJourneyBuilder.scala:31)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at com.accenture.aa.dmah.attribution.userjourney.UserJourneyBuilder.buildUserJourney(UserJourneyBuilder.scala:29)
at com.accenture.aa.dmah.attribution.core.AttributionHub.executeAttribution(AttributionHub.scala:47)
at com.accenture.aa.dmah.attribution.jobs.AttributionJob.process(AttributionJob.scala:33)
at com.accenture.aa.dmah.core.DMAHJob.processJob(DMAHJob.scala:73)
at com.accenture.aa.dmah.core.DMAHJob.execute(DMAHJob.scala:27)
at com.accenture.aa.dmah.core.JobRunner.<init>(JobRunner.scala:17)
at com.accenture.aa.dmah.core.ApplicationInstance.initilize(ApplicationInstance.scala:48)
at com.accenture.aa.dmah.core.Bootstrap.boot(Bootstrap.scala:112)
at com.accenture.aa.dmah.core.BootstrapObj$.main(Bootstrap.scala:134)
at com.accenture.aa.dmah.core.BootstrapObj.main(Bootstrap.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:139)
at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:139)
at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:28)
at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:45)
at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:35)
at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:45)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:74)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
I saw there have been similar posts here and here but they haven't had any response till now.
Also have looked here but don't think thats a valid course of action in my case.
Whats intriguing is that this is specific when we try to use drop table (or drop table if exists) query.
Hoping to find resolution for the same.
To my knowledge, The above error could be because of sample class with same package structure ie: 'javax.ws.rs.ext.RuntimeDelegate' found in different JARs issue. Class Objects are created and casted at run time. So there is every possibility the the code responsible for triggering DROP syntax , the above class would be used and broken due as it it is found more than once in the classpath.
I have tried DROP and DROP IF EXISTS in chd5 and was working without issue, below are the details of my run:
first run - Hadoop version - 2.6,Hive 1.1.0 and Spark - 1.3.1 (included hive libraries to spark lib)
second run -Hadoop version - 2.6,Hive 1.1.0 and Spark - 1.6.1
Mode of run - cli
scala> sqlContext.sql("DROP TABLE SAMPLE");
16/08/04 11:31:39 INFO parse.ParseDriver: Parsing command: DROP TABLE SAMPLE
16/08/04 11:31:39 INFO parse.ParseDriver: Parse Completed
......
scala>sqlContext.sql("DROP TABLE IF EXISTS SAMPLE");
16/08/04 11:40:34 INFO parse.ParseDriver: Parsing command: DROP TABLE IF EXISTS SAMPLE
16/08/04 11:40:35 INFO parse.ParseDriver: Parse Completed
.....
If Possible please validate DROP commands using a different version of spark lib to narrow down problem scope.
Meanwhile, I am analyzing the jars to find out the linkage where two occurrences of same class 'RuntimeDelegate' existence and will report back to check if the removal of any jar can fix the issue and addition of the jar should recreate the same issue.

Apache Kylin: Intermediate table not found

I’m a newbie for kylin. After installing, I run sample.sh, and then build the cube, but get the wrong message:
java.io.IOException: NoSuchObjectException(message:default.kylin_intermediate_kylin_sales_cube_desc_19700101000000_20160101000000_38b1539f_1f69_406d_89ed_96f3ca776841 table not found)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:97)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
at org.apache.kylin.job.hadoop.cube.FactDistinctColumnsJob.setupMapper(FactDistinctColumnsJob.java:101)
at org.apache.kylin.job.hadoop.cube.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:77)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.kylin.job.common.MapReduceExecutable.doWork(MapReduceExecutable.java:120)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:51)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:130)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: NoSuchObjectException(message:default.kylin_intermediate_kylin_sales_cube_desc_19700101000000_20160101000000_38b1539f_1f69_406d_89ed_96f3ca776841 table not found)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1569)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:106)
at com.sun.proxy.$Proxy49.get_table(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1008)
at org.apache.hive.hcatalog.common.HCatUtil.getTable(HCatUtil.java:191)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:105)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
... 13 more
`
Issue 975 suggests kylin.job.hive.database.for.intermediatetable = default. After that, but I also get this error message.
When I run desc formatted kylin_intermediate_kylin_sales_cube_desc_1970... command in hive shell, I can get its formatted info. It demonstrates that this table exists in Hive. Why could not Kylin load this table from Hive?
Kylin version = 1.2
Hive version = 0.13.1-cdh5.3.2
Hbase version = 0.98.6+cdh5.3.2
Hadoop version = 2.5.0-cdh5.3.2
I have solved this problem. The reason is that hive.metastore.uris property is not set in hive-site.xml. Kylin uses HCatalog to read Hive table. HCatalog will use hive.metastore.uris property to create HiveMetaStoreClient and get table meta. This page has detailed explanation.
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
To start metastore using command:
nohup hive --service metastore -p 9083 &

error while executing insert overwrite query in hive

I'm using hadoop 1.2 , hbase 0.94.8 and hive 0.14 . I'am trying to insert data into a hbase table using hive.
I have already created the table:
CREATE TABLE hbase_table_emp(id int, name string, role string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf1:role")
TBLPROPERTIES ("hbase.table.name" = "emp");
and load data into another table that I will overwrite it into the hbase table :
hive> create table testemp(id int, name string, role string) row format delimited fields terminated by '\t';
hive> load data local inpath '/home/user/sample.txt' into table testemp;
Now,I'm trying to overwrite it into hbase table:
When I do :
hive> insert overwrite table hbase_table_emp select * from testemp;
I get this error:
hive> insert overwrite table hbase_table_emp select * from testemp;
Query ID = hduser_20150126005151_ebc2a36f-97c4-41da-b145-32d5732d9681
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.lang.NoClassDefFoundError: org/cliffc/high_scale_lib/Counter
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureJobConf(HBaseStorageHandler.java:470)
at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobConf(PlanUtils.java:856)
at org.apache.hadoop.hive.ql.plan.MapWork.configureJobConf(MapWork.java:544)
at org.apache.hadoop.hive.ql.plan.MapredWork.configureJobConf(MapredWork.java:68)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:370)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1604)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1364)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1177)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:994)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:247)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.ClassNotFoundException: org.cliffc.high_scale_lib.Counter
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
... 24 more
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org/cliffc/high_scale_lib/Counter
Could someone help me please?
I found a solution to this porblem!
First of all I changed to hadoop2.4 /hbase0.98 /hive 0.14:
in hive-env.sh I did:
export HIVE_AUX_JARS_PATH=/usr/local/hbase/lib
and in the hive shell:
hive> add jar /usr/local/hive/lib/hive-hbase-handler-0.14.0.jar;
hive> add jar /usr/local/hbase/lib/hbase-common-0.98.0-hadoop2.jar;
hive> add jar /usr/local/hbase/lib/zookeeper-3.4.5.jar;
hive> add jar /usr/local/hbase/lib/guava-12.0.1.jar;
hive> add jar /usr/local/hbase/lib/high-scale-lib-1.1.1.jar;
and this worked out for me :)
It sounds like at runtime Hive isn't able to resolve the HBase home directory (and associated libraries). Can you verify that the following environment variables are set (your actual values may vary of course, depending on where Hive and HBase are installed)?
HBASE_IDENT_STRING=hbase
HBASE_CONF_DIR=/etc/hbase/conf
HBASE_HOME=/usr/lib/hbase
HBASE_LOG_DIR=/var/log/hbase
HIVE_HOME=/usr/lib/hive
HBASE_PID_DIR=/var/run/hbase
I'm not certain whether all of the above environmental settings have to be defined for what you are doing to work, but my intuition is that at a minimum HBASE_HOME and HBASE_CONF_DIR must be set.
For reference, the above environment variables are ones that the HDP 2.1 distribution sets for you at Hive runtime. I was able to perform the Hive-to-HBase persistence you were attempting with a minimal dataset, so hopefully the solution lies with those environmental settings.

Load CSV data to HBase using pig or hive

Hi I have created a pig script which loads data into hbase. My csv file is stored into hadoop location at /hbase_tables/zip.csv
Pig Script
register /home/hduser/pig-0.12.0/lib/pig-0.8.0-core.jar;
A = LOAD '/hbase_tables/zip.csv' USING PigStorage(',') as (id:chararray, zip:chararray, desc1:chararray, desc2:chararray, income:chararray);
STORE A INTO 'hbase://mydata' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('zip:zip,desc:desc1,desc:desc2,income:income');
when i execute it gives below error
Pig Stack Trace
ERROR 2017: Internal error creating job configuration.
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:667)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
at org.apache.pig.PigServer.execute(PigServer.java:1190)
at org.apache.pig.PigServer.access$100(PigServer.java:128)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:1517)
at org.apache.pig.PigServer.executeBatchEx(PigServer.java:362)
at org.apache.pig.PigServer.executeBatch(PigServer.java:329)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:510)
at org.apache.pig.Main.main(Main.java:107)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hbase://mydata_logs
at org.apache.hadoop.fs.Path.initialize(Path.java:148)
at org.apache.hadoop.fs.Path.<init>(Path.java:71)
at org.apache.hadoop.fs.Path.<init>(Path.java:45)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:470)
... 20 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hbase://mydata_logs
at java.net.URI.checkPath(URI.java:1804)
at java.net.URI.<init>(URI.java:752)
at org.apache.hadoop.fs.Path.initialize(Path.java:145)
... 23 more
Please let me know how i can import csv data file into hbase or if you have any alternate solution.
Seems like your problem is with "Relative path" in absolute URI: hbase://mydata_logs.
Are you sure the path is correct?
Probably table mydata_logs does not exist. Start: hbase shell and type list. Is your table mydata_logs on the list?
I had the same task once and have fully-working solution (actually, I'm not sure about commas in your third line of the code):
%default hbase_home `echo \$HBASE_HOME`;
%default tmp '/user/alexander/tmp/users_dump/k14'
set zookeeper.znode.parent '/hbase-unsecure';
set hbase.zookeeper.quorum 'dmp-hbase.local';
register $hbase_home/lib/zookeeper-3.4.5.jar;
register $hbase_home/hbase-0.94.20.jar;
UsersHdfs = LOAD '$tmp' using PigStorage('\t', '-schema');
store UsersHdfs into 'hbase://user_test' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'id:DEFAULT id:last_modified birth:year gender:female gender:male','-caster HBaseBinaryConverter'
);
That code works for me, maybe the matter is in you hbase configs.
You could provide your .csv file and we could talk about it in more details.

Resources