insert into hbase table using hive (Hadoop) - hadoop

I successfully created this table in HBase using Hive:
CREATE TABLE hbase_trades(key string, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "trades");
Now I want to insert values into this table. What is the HiveQL query?

Just insert into your Hive table (hbase_trades); since the two are integrated, the data will then be queryable from the HBase table (trades). Hope this helps.
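For example, a minimal sketch of such an insert (the source table staging_trades and its columns are hypothetical; any query that produces a key and a value will do):
INSERT OVERWRITE TABLE hbase_trades
SELECT trade_id, trade_details
FROM staging_trades;
Each row written this way lands in the trades HBase table, with trade_id as the row key and trade_details stored under cf1:val.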

Related

How to add multi-level partition in hive?

I have a managed table customer in Hive, partitioned by date1 and customerName. My directory structure is like below:
user/hive/warehouse/test.db/customer/date1=2021-09-16/customerName=xyz
When I run show partitions customer, it gives no output, so I tried to add the partitions with
MSCK REPAIR TABLE customer;
It gives the error Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask, so I also tried
ALTER TABLE customer ADD PARTITION (date1='2021-09-15') PARTITION (customerName='xyz');
That also gives an error: ValidationFailureSemanticException partition spec {customername=xyz} contain non partition column.
How can I add these partitions to the Hive metastore?
hive> show create table customer;
OK
CREATE TABLE `customer`(
`act` string)
PARTITIONED BY (
`date1` string,
`customername` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
'path'='hdfs://hdcluster/user/hive/warehouse/test.db/customer')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://hdcluster/user/hive/warehouse/test.db/customer'
TBLPROPERTIES (
'spark.sql.create.version'='2.4.0',
'spark.sql.partitionProvider'='catalog',
'spark.sql.sources.provider'='orc',
'spark.sql.sources.schema.numPartCols'='2',
'spark.sql.sources.schema.numParts'='1',
'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"act\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"date1\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"customername\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}',
'spark.sql.sources.schema.partCol.0'='date1',
'spark.sql.sources.schema.partCol.1'='customername',
'transient_lastDdlTime'='1631781225')
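One thing worth trying for the partition question above, sketched here: in Hive, all partition columns belong in a single PARTITION spec, in the order they are declared on the table, whereas the second ALTER statement splits them into two separate specs. Using the date value from the directory shown in the question:
ALTER TABLE customer ADD IF NOT EXISTS PARTITION (date1='2021-09-16', customerName='xyz');
After that, show partitions customer should list the new partition.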

msck repair table not working on unpartitioned table - hive config issue

I have an unpartitioned EXTERNAL table:
CREATE EXTERNAL TABLE `db.tableName`(
`sid` string,
`uid` int,
`t1` timestamp,
`t2` timestamp)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'serialization.format'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://<db_location>/tableName'
TBLPROPERTIES (
'serialization.null.format'='',
'transient_lastDdlTime'='1551121065')
When I copy the file tableName.csv to s3://db_location/tableName/tableName.csv and then run msck repair table db.tableName, I get the count back as zero.
There are 10 rows in the CSV and I expect to get the count back as 10.
Any help is appreciated.
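One point worth checking, sketched below: MSCK REPAIR TABLE only registers missing partition directories in the metastore, so on an unpartitioned table it has nothing to repair, and the data under the table's LOCATION should be queryable as soon as the file is in place:
-- msck is effectively a no-op here; query the table directly once the file is under its LOCATION
SELECT COUNT(*) FROM db.tableName;
-- if the count is still 0, confirm the file sits directly under the table path
-- (not in a subdirectory), e.g. s3://<db_location>/tableName/tableName.csv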

Difference in create table properties in hive while using ORC serde

Below is the structure of one of the existing hive table.
CREATE TABLE `tablename`(
col1 datatype,
col2 datatype,
col3 datatype)
partitioned by (col3 datatype)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
'field.delim'='T',
'serialization.format'='T')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'maprfs:/file/location'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1536752440')
Now I want to create a table with the same properties. How do I define the following in the CREATE TABLE syntax?
field delimiter and serialization format
TBLPROPERTIES to store numFiles, numRows, rawDataSize, totalSize (and what other information can we store in the TBLPROPERTIES option?)
Below is one of the CREATE TABLE statements I have used:
create table test_orc_load (a int, b int) partitioned by (c int) stored as ORC;
Table properties I got using the show create table option:
CREATE TABLE `test_orc_load`(
`a` int,
`b` int)
PARTITIONED BY (
`c` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'maprfs:/user/hive/warehouse/alb_supply_chain.db/test_orc_load'
TBLPROPERTIES (
'transient_lastDdlTime'='1537774167')
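A sketch of one way to declare those pieces explicitly (the table name test_orc_load2 and the TBLPROPERTIES value shown are illustrative; note that numFiles, numRows, rawDataSize and totalSize are statistics Hive maintains itself, typically when hive.stats.autogather is on or after ANALYZE TABLE ... COMPUTE STATISTICS, rather than values you set by hand, and the delimiter settings are generally ignored by a binary format like ORC):
CREATE TABLE test_orc_load2 (a int, b int)
PARTITIONED BY (c int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
  'field.delim'='T',
  'serialization.format'='T')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES ('orc.compress'='SNAPPY');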

Creating external hive table from dynamodb table in sparkSQL 2.2.1

I am having issues executing the following Hive query from SparkSQL 2.2.1. The query works fine in the Hive editor in Hue. I have loaded the jars (emr-dynamodb-hadoop-4.2.0.jar and emr-dynamodb-hive-4.2.0.jar) for DynamoDB.
The problem appears to be in the syntax. I have tried this, but no luck.
CREATE EXTERNAL TABLE schema.table
(name string, surname string, address string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
"dynamodb.table.name" = "dynamodb-table",
"dynamodb.column.mapping" = "name:name,surname:surname,address:address"
);
I was able to run this:
CREATE TABLE IF NOT EXISTS surge.threshold3 (name string, surname string, address string) USING hive
OPTIONS(
INPUTFORMAT 'org.apache.hadoop.dynamodb.read.DynamoDBInputFormat',
OUTPUTFORMAT 'org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat'
)
TBLPROPERTIES (
"dynamodb.table.name" = "dynamodb-table",
"dynamodb.column.mapping" = "name:name,surname:surname,address:address"
)
LOCATION ''
However, I can't find a way to specify the location of the DynamoDB table. It's quite different from locating data on HDFS or S3.
Help is much appreciated!
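One point that may help: a DynamoDB-backed table is not file-backed, so there is no LOCATION to fill in; the storage handler resolves the table purely through dynamodb.table.name. The usual Hive-side DDL is the first statement in the question, repeated here as a sketch; if the Spark SQL parser rejects the STORED BY clause, issuing the DDL once through Hive itself (Hue or beeline) and then reading the table from Spark is a common workaround.
CREATE EXTERNAL TABLE schema.table (name string, surname string, address string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name" = "dynamodb-table",
  "dynamodb.column.mapping" = "name:name,surname:surname,address:address"
);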

How to create external hive table with complex data types which points to hbase table?

I have an HBase table with column families (Name, Contact) and columns Name(String), Age(String), workStreet(String), workCity(String), workState(String).
I want to create an external Hive table that points to this HBase table with the following columns:
Name(String), Age(String), Address(Struct).
CREATE EXTERNAL TABLE hiveTable(id INT,name STRING, age STRING,
address STRUCT<Street:STRING,City:STRING,State:STRING>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" ="Name:name,Name:age,Contact:workStreet, Contact:workCity, Contact:workState")
TBLPROPERTIES("hbase.table.name" = "hbaseTable");
It ran into the following error,
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException:
MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 3 elements while hbase.columns.mapping
has 5 elements (counting the key if implicit))
I have also tried using a MAP instead of a STRUCT. Below is the query:
CREATE EXTERNAL TABLE hiveTable(id INT,name STRING,age STRING,
address MAP<String,String>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "Name:name,Name:,Contact:")
TBLPROPERTIES("hbase.table.name" = "hbaseTable");
