Unable to index a column in an HBase table using Phoenix - hadoop

I have stored 3 million records in an HBase table using CSV bulk load. I tried to fetch some data with SQL through Phoenix, but fetching the records takes a long time. So I created an index using Phoenix, but I am not able to populate the index table using the Phoenix MapReduce process. I used the following command to insert the index data:
sudo -u hdfs hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table BIGSS --index-table BIG_STATUS_INDEXS --output-path STATUS_HFILES
I am not sure where these HFiles are stored in HDFS. Please tell me where they are stored, or what I did wrong.
I have attached a screenshot showing the error I am facing. Please help with this.
I used the following link for reference:
https://phoenix.apache.org/secondary_indexing.html
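On where the HFiles end up: Hadoop resolves a relative path such as STATUS_HFILES against the HDFS home directory of the user running the command. By default that directory is /user/<username>. The command runs as sudo -u hdfs, so the output most likely lands in /user/hdfs/STATUS_HFILES, assuming the cluster has not changed the default home-directory prefix. The helper below is hypothetical and not part of Phoenix or Hadoop. It only spells out that resolution rule.

```python
import posixpath


def resolve_hdfs_path(path: str, user: str, home_prefix: str = "/user") -> str:
    """Resolve an HDFS path the way a default-configured Hadoop client would (sketch).

    Absolute paths are returned unchanged (normalized); relative paths are
    joined onto the user's home directory, <home_prefix>/<user>.
    home_prefix="/user" is the assumed default, not read from any cluster config.
    """
    if path.startswith("/"):
        return posixpath.normpath(path)
    home = posixpath.join(home_prefix, user)
    return posixpath.normpath(posixpath.join(home, path))


# The IndexTool command above runs as the 'hdfs' user with a relative --output-path.
print(resolve_hdfs_path("STATUS_HFILES", user="hdfs"))  # /user/hdfs/STATUS_HFILES
print(resolve_hdfs_path("/tmp/STATUS_HFILES", user="hdfs"))  # /tmp/STATUS_HFILES
```

Passing an absolute path to --output-path (for example /tmp/STATUS_HFILES) avoids the ambiguity. Running sudo -u hdfs hdfs dfs -ls STATUS_HFILES should list the same directory as the relative path resolves to.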

Related

Unable to fetch updated data in HBase through Pig after storing it with the HBaseStorage function?

I am new to Pig scripting. I am unable to fetch the data that I stored into HBase through a Pig script using HBaseStorage(). But when I try to fetch the data using Phoenix, I am able to see it.
Can anyone help me out?
Problem solved. In HBase, the timestamp of a column changes every time the column is written. When I searched using the timestamp at which the data was actually stored, I was able to fetch the records.
Previously I was using the initial row-creation timestamp for searching.
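The fix above comes down to how HBase versions cells. Every write to a column creates a new cell with its own timestamp, and a column family keeps only a limited number of versions. A lookup pinned to an old timestamp can therefore come back empty. The toy model below is illustrative only and is not the HBase or Pig API. The class name and the max_versions parameter are hypothetical, and max_versions loosely mirrors a column family's VERSIONS setting.

```python
class ToyHBaseTable:
    """Toy in-memory model of HBase cell versioning (not the real client API).

    Each (row, column) keeps a list of (timestamp, value) cells, newest first,
    trimmed to at most `max_versions` entries.
    """

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.cells = {}

    def put(self, row, column, value, ts):
        versions = self.cells.setdefault((row, column), [])
        versions.append((ts, value))
        versions.sort(key=lambda cell: cell[0], reverse=True)  # newest first
        del versions[self.max_versions:]  # drop versions beyond the limit

    def get_latest(self, row, column):
        versions = self.cells.get((row, column), [])
        return versions[0][1] if versions else None

    def get_at(self, row, column, ts):
        # Exact-timestamp lookup: only a cell written at exactly `ts` matches.
        for cell_ts, value in self.cells.get((row, column), []):
            if cell_ts == ts:
                return value
        return None


table = ToyHBaseTable(max_versions=1)
table.put("row1", "cf:status", "created", ts=1000)  # initial row creation
table.put("row1", "cf:status", "updated", ts=2000)  # a later write
print(table.get_at("row1", "cf:status", 1000))  # None: that version was discarded
print(table.get_at("row1", "cf:status", 2000))  # updated
print(table.get_latest("row1", "cf:status"))    # updated
```

Searching with the row-creation timestamp (1000) finds nothing, while the timestamp of the stored write (2000) or a latest-version read returns the data. This matches what the poster observed.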

How to integrate manually copied hbase files into an hbase instance?

I have copied the files associated with an HBase table into another cluster and stored them in the HBase folder. I can see the table when I run list, but when I run scan 'myTable' it can't find the table.
When I go through the HBase web UI, I see the table including its column family information, but when I click on the table I get:
org.apache.hadoop.hbase.TableNotFoundException: hbTable
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1024)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:889)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:846)
How do I get the HBase RegionServers to manage the table?
P.S. For the purposes of this exercise I'm not interested in using the export utility or the copyTable Utility.
Please refer to this link: Complete Bulk Load
This assumes that you copied the HFiles from one cluster to the other.
Did you create the table in the other cluster? I think the error is due to the missing information in the .META. table.
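The answer's reasoning can be pictured with a toy model that is purely illustrative and is not HBase code. Copying files only puts data on disk. Clients locate a table's regions through the catalog table, which is .META. in older releases and hbase:meta from 0.96 on. A table whose regions were never registered there cannot be located, even though its files are present. The class and method names below are hypothetical.

```python
class TableNotFoundError(Exception):
    """Stand-in for HBase's TableNotFoundException in this toy model."""


class ToyCluster:
    """Toy model: table files on the filesystem vs. regions in the catalog."""

    def __init__(self):
        self.table_dirs = {}  # table name -> HFile names present on disk
        self.catalog = {}     # table name -> region names (plays the role of .META.)

    def copy_files_in(self, table, hfiles):
        # A manual copy: files appear on disk, the catalog is untouched.
        self.table_dirs.setdefault(table, []).extend(hfiles)

    def create_table(self, table, regions=("region-0",)):
        # Creating the table through HBase registers its regions in the catalog.
        self.catalog[table] = list(regions)
        self.table_dirs.setdefault(table, [])

    def locate_regions(self, table):
        # Client reads (scan/get) look regions up in the catalog, not on disk.
        if table not in self.catalog:
            raise TableNotFoundError(table)
        return self.catalog[table]


cluster = ToyCluster()
cluster.copy_files_in("hbTable", ["hfile-1"])
try:
    cluster.locate_regions("hbTable")
except TableNotFoundError as err:
    print("not found:", err)  # not found: hbTable
cluster.create_table("hbTable")
print(cluster.locate_regions("hbTable"))  # ['region-0']
```

In the real system, the supported way to get copied HFiles served by registered regions is the completebulkload tool from the Complete Bulk Load link.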

Load data into Hive from flat files or an existing database

We are setting up Hadoop and Hive in our organization.
We will also have sample data created by a data generator tool, around 1 TB in size.
My question is: I have to load that data into Hive and Hadoop. What process do I need to follow for this?
We will also have HBase installed alongside Hadoop.
We need to recreate in Hive the same database design we currently have in SQL Server, because once the data is loaded into Hive we want to use Business Objects 4.1 as a front end to create reports.
The challenge is loading the sample data into Hive.
Please help, as we want to get all of this done ASAP.
First, ingest your data into HDFS.
Use Hive external tables pointing to the location where you ingested the data, i.e. your HDFS directory.
You are then all set to query the data from the tables you created in Hive.
Good luck.
For the first case, you need to put the data in HDFS:
Transfer your data file(s) to a client node (app node).
Put your files into the distributed file system (hdfs dfs -put ...).
Create an external table pointing to the HDFS directory into which you uploaded those files. Your data must have some structure, for instance fields delimited by semicolons.
Now you can operate on the data with SQL queries.
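The external-table step can be made concrete with standard HiveQL DDL. The helper below is hypothetical and only assembles the statement text. The table name, the columns, and the /data/sample directory are placeholders for wherever the files were uploaded with hdfs dfs -put.

```python
def external_table_ddl(table, columns, location, delimiter=";"):
    """Build a CREATE EXTERNAL TABLE statement over delimited text files (sketch).

    columns: list of (name, hive_type) pairs, e.g. [("id", "INT")].
    location: HDFS directory holding the uploaded files.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


print(external_table_ddl("sample_data", [("id", "INT"), ("name", "STRING")], "/data/sample"))
```

Dropping an external table leaves the files in the LOCATION directory untouched. If the Hive CLI in use trips over the semicolon inside the quoted delimiter (some older versions split statements on every semicolon), the octal escape '\073' is a commonly used substitute.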
For the second case, you can create another Hive table (using HBaseStorageHandler, https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration) and load it from the first table with an INSERT statement.
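The statements for this second case follow the pattern on the Hive HBaseIntegration page. The hbase.columns.mapping property lists one entry per Hive column, in order, with :key marking the HBase row key. The helper below is hypothetical and only builds the two HiveQL statements. The table names, the column family cf, and the columns are placeholders.

```python
def hbase_table_statements(hive_table, hbase_table, source_table, key, columns, family="cf"):
    """Return (create_stmt, insert_stmt) for an HBase-backed Hive table (sketch).

    key: (name, hive_type) used as the HBase row key.
    columns: list of (name, hive_type) stored in column family `family`.
    """
    all_cols = [key, *columns]
    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type in all_cols)
    # Positional mapping: first Hive column -> row key, the rest -> family:qualifier.
    mapping = ",".join([":key"] + [f"{family}:{name}" for name, _ in columns])
    create = (
        f"CREATE TABLE {hive_table} ({col_defs})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping}")\n'
        f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}");'
    )
    select_cols = ", ".join(name for name, _ in all_cols)
    insert = f"INSERT OVERWRITE TABLE {hive_table} SELECT {select_cols} FROM {source_table};"
    return create, insert


create, insert = hbase_table_statements(
    "sample_hbase", "sample", "sample_data", ("id", "INT"), [("name", "STRING")]
)
print(create)
print(insert)  # INSERT OVERWRITE TABLE sample_hbase SELECT id, name FROM sample_data;
```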
I hope this can help you.

How to create a Hive table for particular data

I am using Hive to load a data file and run Hadoop MapReduce on it, but I am stuck at the CREATE TABLE query. I have data like this: 59.7*, 58.9* where * is just a literal character. I want to make two columns to store 59.7 and 58.9. Can anyone help with that? Thanks
You can use RegexSerDe to do that. You can visit this page if you need an example.
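RegexSerDe maps each capturing group of its input.regex property to a table column, in order. A line that does not match the whole pattern produces NULLs. The built-in class is org.apache.hadoop.hive.serde2.RegexSerDe; older setups use a contrib variant. The sketch below checks a candidate pattern in Python, whose syntax for this pattern is the same as in Java regular expressions, and prints a matching CREATE TABLE statement. The table and column names are hypothetical. Both columns are declared STRING as the safe choice across RegexSerDe versions, so cast them to DOUBLE in queries.

```python
import re

# Lines look like "59.7*, 58.9*": each capturing group becomes one column.
PATTERN = r"([0-9.]+)\*,\s*([0-9.]+)\*"


def parse_line(line):
    """Mimic RegexSerDe: the whole line must match, otherwise every column is NULL (None)."""
    m = re.fullmatch(PATTERN, line)
    return m.groups() if m else (None, None)


def hive_ddl(table):
    # Backslashes must be doubled inside a HiveQL string literal.
    escaped = PATTERN.replace("\\", "\\\\")
    return (
        f"CREATE TABLE {table} (first_val STRING, second_val STRING)\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'\n"
        f'WITH SERDEPROPERTIES ("input.regex" = "{escaped}");'
    )


print(parse_line("59.7*, 58.9*"))  # ('59.7', '58.9')
print(hive_ddl("readings"))
```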

Basic things about Hadoop and Hive

I have started working with Hadoop recently. There is a table named Checkout that I access through Hive. Below are the HDFS paths where its data is stored, along with some other information. What information can I get from reading the three lines below?
Path                                                Size      Record Count    Date Loaded
/sys/edw/dw_checkout_trans/snapshot/2012/07/04/00   1.13 TB   9,294,245,800   2012-07-05 07:26
/sys/edw/dw_checkout_trans/snapshot/2012/07/03/00   1.13 TB   9,290,477,963   2012-07-04 09:37
/sys/edw/dw_checkout_trans/snapshot/2012/07/02/00   1.12 TB   9,286,199,847   2012-07-03 07:08
So my questions are:
1) First, we load the data into HDFS and then query it through Hive to get the results back, right?
2) Second, looking at the paths above, what confuses me is this: when I query using Hive, will I get data from all three paths above, or only from the most recent one at the top?
As I am new to this, I am having a lot of trouble. Can anyone explain where Hive gets its data from? Do we store all the data in HDFS and then use Hive or Pig to get it back from HDFS? It would be great if someone could give a high-level overview of Hadoop and Hive.
I think you need to understand the difference between Hive's native (managed) tables and Hive's external tables.
A Hive native table means that you load data into Hive, and Hive takes care of how the data is stored in HDFS. In this case we usually do not care what the directory structure is.
A Hive external table means that we put data in some directory (ignoring partitioning for the moment) and tell Hive: this is the table's data, please treat it as such. Hive then lets us query it and join it with other external or regular tables. It is our responsibility to add data, delete it, etc.
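Whether a query reads all three directories depends on how the Checkout table is defined, and the thread does not show that. Assume, hypothetically, that the table is partitioned by snapshot date and each dated directory is registered as one partition. A query with no filter on the partition column then reads every partition, while a filter on that column lets Hive prune the rest. The toy model below uses the three paths from the question. Its names are hypothetical and it is not Hive code.

```python
from datetime import date

# Hypothetical partition layout: snapshot date -> partition directory.
PARTITIONS = {
    date(2012, 7, 4): "/sys/edw/dw_checkout_trans/snapshot/2012/07/04/00",
    date(2012, 7, 3): "/sys/edw/dw_checkout_trans/snapshot/2012/07/03/00",
    date(2012, 7, 2): "/sys/edw/dw_checkout_trans/snapshot/2012/07/02/00",
}


def paths_scanned(partitions, snapshot_date=None):
    """Return the directories a query would read under the partitioning assumption.

    No filter on the partition column -> every partition is read.
    A filter on it -> only the matching partition (partition pruning).
    """
    if snapshot_date is None:
        return sorted(partitions.values())
    return [partitions[snapshot_date]] if snapshot_date in partitions else []


print(len(paths_scanned(PARTITIONS)))      # 3
latest = max(PARTITIONS)                   # 2012-07-04
print(paths_scanned(PARTITIONS, latest))   # only the 2012/07/04/00 directory
```

Each snapshot seems to hold nearly the same ~9.29 billion records. Under this assumption, an unfiltered query would therefore see most rows three times, and filtering on the latest snapshot date would be the natural way to read the current data.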
