Pig cannot understand HBase table data - hadoop

I'm running HBase (0.94.13) on a single node for my academic project. After loading data into HBase tables, I'm trying to run Pig (0.11.1) scripts on the data using HBaseStorage. However, this throws an error:
IllegalArgumentException: Not a host:port pair: �\00\00\00
Here is the load command I'm using in Pig:
books = LOAD 'hbase://booksdb'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('details:title', '-loadKey true')
        AS (ID:chararray, title:chararray);
I thought this might be because the HBase version bundled with Pig is different from the one on my machine, but I can't seem to make it work without downgrading my HBase. Any help?

It seems you are trying to submit a Pig job remotely.
If so, you'd need to add a few settings to the pig.properties file (or set them directly in your script, as sketched after the list):
hbase.zookeeper.quorum=<node>
hadoop.job.ugi=username,groupname
fs.default.name=hdfs://<node>:port
mapred.job.tracker=<node>:port
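Equivalently, the same settings can go at the top of the Pig script itself. A rough sketch, assuming a single node named namenode.example.com and typical ports (all of these values are placeholders for your own cluster):

-- pass connection settings through to the underlying Hadoop/HBase configuration
set hbase.zookeeper.quorum 'namenode.example.com';
set fs.default.name 'hdfs://namenode.example.com:8020';
set mapred.job.tracker 'namenode.example.com:8021';

books = LOAD 'hbase://booksdb'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('details:title', '-loadKey true')
        AS (ID:chararray, title:chararray);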

Related

HBase components don't appear in Pentaho Kettle

I am trying to work with Pentaho in order to build some big data solutions, but the Hadoop HBase components aren't appearing in the dashboard. I don't understand why HBase doesn't appear, since HBase is up and running on my machine. I've been looking for a solution, but without success.
Please check the property 'hbase.client.scanner.timeout.period' (defined in hbase-default.xml) and set it to 10 minutes to get rid of the HBase exceptions.
Check that you have added the ZooKeeper host in the HBase Output step's host field in the Pentaho Data Integration tool.
Have you read this wiki on loading HBase data into Pentaho?
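If you do need to change that timeout, a minimal sketch of the override, assuming it goes into your hbase-site.xml (the defaults in hbase-default.xml are not normally edited directly), would be:

<property>
  <name>hbase.client.scanner.timeout.period</name>
  <!-- 10 minutes, expressed in milliseconds -->
  <value>600000</value>
</property>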

Are Hive, Pig or Impala used from the command line only?

I am new to Hadoop and have this confusion. Can you please help?
Q. How are Hive, Pig, or Impala used in practical projects? Are they used from the command line only, or from within Java, Scala, etc.?
One can use Hive and Pig from the command line, or run scripts written in their respective languages.
Of course it is possible to call (or build) these scripts in any way you like, so you could have a Java program build a Pig command on the fly and execute it.
The Hive (and Pig) languages are typically used to talk to a Hive database. Besides this, it is also possible to talk to the Hive database via a link (JDBC/ODBC). This could be done directly from anywhere, so you could let a Java program make a JDBC connection to talk to your Hive tables (see the sketch below).
Within the context of this answer, I believe everything I said about the Hive language also applies to Impala.
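A minimal sketch of that JDBC route, assuming a HiveServer2 instance on localhost:10000 and a placeholder table name (it needs the hive-jdbc driver and its dependencies on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver class
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Host, port, database, and credentials are placeholders for your setup
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");

        Statement stmt = conn.createStatement();
        // 'my_table' is a placeholder table name
        ResultSet rs = stmt.executeQuery("SELECT * FROM my_table LIMIT 10");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}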

Unable to retain HIVE tables

I have set up a single-node Hadoop cluster on Ubuntu. I have installed Hadoop 2.6 on my machine.
Problem:
Every time I create Hive tables and load data into them, I can see the data by querying, but once I shut down my Hadoop, the tables get wiped out. Is there any way I can retain them, or is there any setting I am missing?
I tried some of the solutions provided online, but nothing worked. Kindly help me out with this.
Thanks
B
The Hive table data is on Hadoop HDFS; Hive just adds metadata and provides users with SQL-like commands so they don't have to write basic MapReduce jobs. So if you shut down the Hadoop cluster, Hive can't find the data in the table.
But if you are saying the data is lost even after you restart the Hadoop cluster, that's another problem.
It seems you are using the default Derby database as the metastore. Configure the Hive metastore properly; I am pointing you to the link below, please follow it.
Hive is not showing tables
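Configuring an external metastore usually comes down to a few properties in hive-site.xml. A rough sketch, assuming a MySQL database; the host, database name, and credentials are placeholders:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>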

How To Refresh/Clear the DistributedCache When Using Hue + Beeswax To Run Hive Queries That Define Custom UDFs?

I've set up a Hadoop cluster (using the Cloudera distro through Cloudera Manager) and I'm running some Hive queries using the Hue interface, which uses Beeswax underneath.
All my queries run fine and I have even successfully deployed a custom UDF.
But, while deploying the UDF, I ran into a very frustrating versioning issue. In the initial version of my UDF class, I used a 3rd party class that was causing a StackOverflowError.
I fixed this error and then verified that the UDF can be deployed and used successfully from the hive command line.
Then, when I went back to using Hue and Beeswax again, I kept getting the same error. I could fix this only by changing my UDF Java class name (from Lower to Lower2).
Now, my question is, what is the proper way to deal with these kind of version issues?
From what I understand, when I add jars using the handy form fields to the left, they get added to the distributed cache. So, how do I refresh/clear the distributed cache? (I couldn't get LIST JARS; etc. to run from within Hive / Beeswax. It gives me a syntax error.)
Since the classes are loaded onto the Beeswax Server JVM (the same goes for the HiveServer1 and HiveServer2 JVMs), deploying a new version of a jar often requires restarting these services to avoid such class loading issues.
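For reference, registering a UDF from the Hive command line (the path the question says already works) typically looks like this; the jar path, function name, and class name below are placeholders modeled on the Lower/Lower2 example:

ADD JAR /path/to/my-udfs.jar;
CREATE TEMPORARY FUNCTION lower2 AS 'com.example.udf.Lower2';
SELECT lower2(some_column) FROM some_table LIMIT 10;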

Access HBase table from Hadoop MapReduce

I want to access an HBase table from Hadoop MapReduce, and I'm using Windows XP, Cygwin, Hadoop 0.20.2, and HBase 0.92.0.
I am able to run the MapReduce wordcount successfully on 3 PCs and have verified that Hadoop and HBase are working fine. I can also create tables from the shell.
I have tried many examples but they are not working; for example, when I try to compile one using
javac Example.java
it gives errors:
org.apache.hadoop.hbase.client does not exist
org.apache.hadoop.hbase does not exist
org.apache.hadoop.hbase.io does not exist
Please, can anyone help me with this?
- Please give me some example code to access HBase from Hadoop MapReduce.
- Also, guide me on how I should compile and execute it.
This website has example HBase/MapReduce code. I haven't tried it, but it looks OK at first glance. Also, what distribution of Hadoop/HBase are you using? Apache? Cloudera?
http://kdpeterson.net/blog/2009/09/minimal-hbase-mapreduce-example.html
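As an aside, "does not exist" errors from javac usually mean the HBase jars are not on the compile classpath; compiling with something like javac -cp "`hbase classpath`" Example.java (if your bin/hbase script supports the classpath command) is a common fix. Below is a minimal sketch of a job that reads an HBase table, assuming the HBase 0.92-era MapReduce API; the table name 'mytable' is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class HBaseReadExample {

    // Mapper that receives one HBase row per call; here it just emits a counter per row
    static class RowCountMapper extends TableMapper<Text, IntWritable> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result value, Context context)
                throws java.io.IOException, InterruptedException {
            context.write(new Text("rows"), new IntWritable(1));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hbase-read-example");
        job.setJarByClass(HBaseReadExample.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // fetch rows in batches from the region servers
        scan.setCacheBlocks(false);  // recommended off for MapReduce scans

        // 'mytable' is a placeholder table name
        TableMapReduceUtil.initTableMapperJob(
                "mytable", scan, RowCountMapper.class,
                Text.class, IntWritable.class, job);

        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}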

Resources