Error when loading Hive table data to HBase - hadoop

I'm using CDH 4.4, Hive 0.10.0, and HBase 0.94.6. I create tables in Hive using the HBaseStorageHandler, and I have a table called pokes with one record, 98. Here's my CREATE TABLE statement:
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
I verified the table was created in Hive (hbase_table_1) and in HBase (xyz). However, when I run this command
INSERT OVERWRITE TABLE hbase_table_1
SELECT *
FROM pokes
WHERE foo=98;
I get an error:
Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
What am I missing? Please help.

Related

Unable to map the HBase row key in Hive table effectively

I have an HBase table where the rowkey looks like this:
08:516485815:2013 1
06:260070837:2014 1
00:338289200:2014 1
I create a Hive table linked to it using the query below:
create external table hb
(key string,value string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping"=":key,e:-1")
tblproperties("hbase.table.name"="hbaseTable");
When I query the table I get the following result:
select * from hb;
08:516485815 1
06:260070837 1
00:338289200 1
This is very strange to me. Why is the SerDe not able to map the whole content of the HBase key? The Hive table is missing everything after the second ':'.
Has anybody faced a similar issue?
I tried recreating your scenario on HBase 1.1.2 and Hive 1.2.1000, and it works as expected: I am able to get the whole rowkey from Hive.
hbase> create 'hbaseTable','e'
hbase> put 'hbaseTable','08:516485815:2013','e:-1','1'
hbase> scan 'hbaseTable'
ROW COLUMN+CELL
08:516485815:2013 column=e:-1, timestamp=1519675029451, value=1
1 row(s) in 0.0160 seconds
Since I have 08:516485815:2013 as the rowkey, I created the Hive table:
hive> create external table hb
(key string,value string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping"=":key,e:-1")
tblproperties("hbase.table.name"="hbaseTable");
hive> select * from hb;
+--------------------+-----------+--+
| hb.key | hb.value |
+--------------------+-----------+--+
| 08:516485815:2013 | 1 |
+--------------------+-----------+--+
Can you make sure your HBase table's rowkeys actually have data after the second ':'?

Insert into bucketed table produces empty table

I'm trying to insert into a bucketed table. When I run the query everything looks fine, and the job report shows some amount of bytes written. There are no errors in the Hive logs either.
But when I look into the table, I have nothing :(
CREATE TABLE test(
test_date string,
test_id string,
test_title string,)
CLUSTERED BY (
text_date)
INTO 100 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS ORC
LOCATION
'hdfs://myserver/data/hive/databases/test.db/test'
TBLPROPERTIES (
'skip.header.line.count'='1',
'transactional' = 'true')
INSERT INTO test.test
SELECT 'test_date', 'test_id', 'test_title' from test2.green
Result
Ended Job = job_148140234567_254152
Loading data to table test.test
Table test.test stats: [numFiles=100, numRows=1601822, totalSize=9277056, rawDataSize=0]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 6 Reduce: 100 Cumulative CPU: 423.34 sec
HDFS Read: 148450105
HDFS Write: 9282219
SUCCESS
hive> select * from test.test limit 2;
OK
Time taken: 0.124 seconds
hive>
Is this query really working? You have an extra comma in the line
test_title string,)
and the column text_date isn't in your column definition. Maybe you meant test_date?
CLUSTERED BY (text_date)
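For reference, a corrected version of the DDL might look like this (a sketch assuming test_date is the intended bucketing column; ROW FORMAT DELIMITED and skip.header.line.count are dropped because they only apply to text-backed tables, not ORC):
CREATE TABLE test(
  test_date string,
  test_id string,
  test_title string)          -- trailing comma removed
CLUSTERED BY (test_date)      -- was text_date, which is not a column
INTO 100 BUCKETS
STORED AS ORC
LOCATION 'hdfs://myserver/data/hive/databases/test.db/test'
TBLPROPERTIES ('transactional' = 'true');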

How to create an external Hive table with complex data types which points to an HBase table?

I have an HBase table with column families (Name, Contact) and columns Name(String), Age(String), workStreet(String), workCity(String), workState(String).
I want to create an external Hive table which points to this HBase table with the following columns:
Name(String), Age(String), Address(Struct).
CREATE EXTERNAL TABLE hiveTable(id INT,name STRING, age STRING,
address STRUCT<Street:STRING,City:STRING,State:STRING>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" ="Name:name,Name:age,Contact:workStreet, Contact:workCity, Contact:workState")
TBLPROPERTIES("hbase.table.name" = "hbaseTable");
It ran into the following error:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException:
MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 3 elements while hbase.columns.mapping
has 5 elements (counting the key if implicit))
I have tried using a MAP instead of a STRUCT. Below is the query:
CREATE EXTERNAL TABLE hiveTable(id INT,name STRING,age STRING,
address MAP<String,String>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "Name:name,Name:,Contact:")
TBLPROPERTIES("hbase.table.name" = "hbaseTable");

Using Spark/Scala, I use saveAsTextFile() to HDFS, but hiveql("select count(*) from...") returns 0

I created an external table as follows...
hive -e "create external table temp_db.temp_table (a char(10), b int) PARTITIONED BY (PART_DATE VARCHAR(10)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION '/work/temp_db/temp_table'"
And I use saveAsTextFile() with Scala in IntelliJ IDEA as follows...
itemsRdd.map(_.makeTsv).saveAsTextFile("hdfs://work/temp_db/temp_table/2016/07/19")
So the file (fields terminated by '\t') was in /work/temp_db/temp_table/2016/07/19:
hadoop fs -ls /work/temp_db/temp_table/2016/07/19/part-00000 <- data file..
But when I check with HiveQL, there is no data, as follows:
hive -e "select count(*) from temp_db.temp_table" -> 0.
hive -e "select * from temp_db.temp_table limit 5" -> 0 rows fetched.
What should I do? Thanks.
You are saving to the wrong location from Spark. A partition directory name must follow the part_col_name=part_value convention.
In Spark: save the file in the directory part_date=2016%2F07%2F19 under the temp_table directory ('/' is escaped as %2F, which is how Hive encodes it):
itemsRdd.map(_.makeTsv)
.saveAsTextFile("hdfs://work/temp_db/temp_table/part_date=2016%2F07%2F19")
Add the partition: you will need to add a partition, which updates the Hive table's metadata (the partition directory we created from Spark already follows the key=value format Hive expects):
alter table temp_table add partition (PART_DATE='2016/07/19');
[cloudera#quickstart ~]$ hadoop fs -ls /user/hive/warehouse/temp_table/part*|awk '{print $NF}'
/user/hive/warehouse/temp_table/part_date=2016%2F07%2F19/part-00000
/user/hive/warehouse/temp_table/part_date=2016-07-19/part-00000
Query the partitioned data:
hive> alter table temp_table add partition (PART_DATE='2016/07/19');
OK
Time taken: 0.16 seconds
hive> select * from temp_table where PART_DATE='2016/07/19';
OK
test1 123 2016/07/19
Time taken: 0.219 seconds, Fetched: 1 row(s)
hive> select * from temp_table;
OK
test1 123 2016/07/19
test1 123 2016-07-19
Time taken: 0.199 seconds, Fetched: 2 row(s)
For an everyday process you can run the Spark job like this: just add the partition right after saveAsTextFile(). Also note the s prefix on the interpolated strings; it is needed to pass the variable into the Hive SQL from Spark:
val format = new java.text.SimpleDateFormat("yyyy/MM/dd")
val date = format.format(new java.util.Date())
// '/' must be escaped as %2F in the directory name, matching Hive's partition-path encoding
val dirDate = date.replace("/", "%2F")
itemsRdd.saveAsTextFile(s"/user/hive/warehouse/temp_table/part_date=$dirDate")
val hive = new HiveContext(sc)
hive.sql(s"alter table temp_table add partition (PART_DATE='$date')")
NOTE: Add the partition after saving the file, or else Spark will throw a directory-already-exists exception, because Hive creates the directory (if it does not exist) when the partition is added.

insert into hbase table using hive (Hadoop)

I created this table in HBase using Hive successfully:
CREATE TABLE hbase_trades(key string, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "trades");
Now I want to insert values into this table! What is the HiveQL query?
Just insert into your Hive table (hbase_trades); since you have integrated the two, the data will be queryable from the HBase table (trades). Hope this helps.
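For example, a minimal sketch (source_trades is a hypothetical Hive table holding the rows to load):
INSERT INTO TABLE hbase_trades
SELECT trade_key, trade_value    -- must line up with (key string, value string)
FROM source_trades;
Each row written this way lands in the HBase table trades, with the rowkey taken from the first column and cf1:val from the second, per the hbase.columns.mapping above.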
