How to create external hive table with complex data types which points to hbase table? - hadoop

I have a hbase table with Column families (Name, Contact) and columns, Name(String), Age(String), workStreet(String), workCity(String), workState(String).
I want to create an external hive table which points to this hbase table with following columns.
Name(String), Age(String), Address(Struct).
CREATE EXTERNAL TABLE hiveTable(id INT,name STRING, age STRING,
address STRUCT<Street:STRING,City:STRING,State:STRING>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" ="Name:name,Name:age,Contact:workStreet, Contact:workCity, Contact:workState")
TBLPROPERTIES("hbase.table.name" = "hbaseTable");
It ran into the following error,
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException:
MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 3 elements while hbase.columns.mapping
has 5 elements (counting the key if implicit))

I have tried using Map instead of Struct. Below is the query,
CREATE EXTERNAL TABLE hiveTable(id INT,name STRING,age STRING,
address MAP<String,String>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "Name:name,Name:,Contact:")
TBLPROPERTIES("hbase.table.name" = "hbaseTable");

Related

How to add multi-level partition in hive?

I have customer managed table in the hive, partition based on date and customerName. My directory structure is like below:
user/hive/warehouse/test.db/customer/date1=2021-09-16/customerName=xyz
when I am doing show partitions customer it is not giving output. So I tried to add a partition with
MSCK REPAIR TABLE customer;
It give error Execution Error,return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
ALTER TABLE customer ADD PARTITION (date1='2021-09-15') PARTITION (customerName='xyz');
It also give error ValidationFailureSemanticException partition spec {customername=xyz} contain non partition column
How can I add these partitions in hive metastore.
hive> show create table customer;
OK
CREATE TABLE `customer`(
`act` string)
PARTITIONED BY (
`date1` string,
`customername` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
'path'='hdfs://hdcluster/user/hive/warehouse/test.db/customer')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://hdcluster/user/hive/warehouse/test.db/customer'
TBLPROPERTIES (
'spark.sql.create.version'='2.4.0',
'spark.sql.partitionProvider'='catalog',
'spark.sql.sources.provider'='orc',
'spark.sql.sources.schema.numPartCols'='2',
'spark.sql.sources.schema.numParts'='1',
'spark.sql.sources.schema.part.0'=
'{\"type\":\"struct\",\"fields\":
[{\"name\":\"act\",\"type\":\"string\",\"nullable\":true,\"metadata\":
{}}, {\"name\":\"date1\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},
{\"name\":\"customername\",\"type\":\"string\",\"nullable\":true,\"metadata\
":{}}]}','spark.sql.sources.schema.partCol.0'='date1',
'spark.sql.sources.schema.partCol.1'='customername',
'transient_lastDdlTime'='1631781225')

Inserting into Hive Table error

I am looking to encode columns of a table in hive.
I tried:
hive> create table encode_test(id int, name STRING, phone STRING, address STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') STORED AS TEXTFILE;
Say i have a CSV file, with following row
100,'navis','010-0000-0000','Seoul Seocho'
Now i tried to use.
LOAD DATA LOCAL INPATH
'/home/path/to/csv/test.csv'
INTO TABLE encode_test;
But when doing Select * from encode_test i am getting all columns NULL
Whereas the result should have been
100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw==
Also i want to give Fields TERMINATED BY ',' IN create table encode_test query.
but i am getting error: EOF error Near Fields
I also tried creating another table sample
create table sample(id int, name STRING, phone STRING, address STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
And then imported the csv file in the sample table. and it was successfully imported.
then i tried using.
insert into encode_test select * from sample;
But i am getting this new error
Permission denied: user=root, access=WRITE, inode="/user":h dfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.c heckFsPermission(DefaultAuthorizationProvider.java:279)
I'n new into hadoop
Please refer to this link from where i tried this problem
In Hive DDL, ROW FORMAT SERDE and FIELDS TERMINATED BY cannot co-exist together. Instead you can use, field.delim serde property.
create table encode_test(id int, name STRING, phone STRING, address STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'column.encode.columns'='phone,address',
'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly')
STORED AS TEXTFILE;
And for the PermissionDenied exception, run the hive queries as either hdfs or hive user since root user does not have WRITE access to HDFS.

Import multiple column families from hbase to hive

I am trying to move hbase table having two column family into hive table. I am able to move one column family but how can i move another one in same hive table.
Edit:
I moved one column family ushing below code.
CREATE TABLE hbase_hive(key string, firstname string, age string)
STORED BY ‘org.apache.hadoop.hive.hbase.HBaseStorageHandler’
WITH SERDEPROPERTIES (“hbase.columns.mapping” = “id:firstname,id:age")
TBLPROPERTIES(“hbase.table.name” = “hl”);
but i am having one more column family with name hb and having three columns. How to achive this.
Update:
I also tried adding column name of different column family below is my code.
CREATE TABLE hbase_hive(key string, firstname string, age string, testname string)
STORED BY ‘org.apache.hadoop.hive.hbase.HBaseStorageHandler’
WITH SERDEPROPERTIES (“hbase.columns.mapping” = “id:firstname,id:age,pd:name")
TBLPROPERTIES(“hbase.table.name” = “hl”);
but i am getting below result:
819215975 19391121 625678921720 NULL
819617215 19570622 625116365890 NULL
820333876 19640303 623221670810 NULL
824794938 19531211 625278010070 NULL
828093442 19420803 625284904860 NULL
828905771 19320209 625078004220 NULL
829832017 19630722 625178010070 NULL
Instead of values i am getting null.
Update:
I tried creating hbase table using below command in hbase shell
create ‘hl’,’id’
then i created one more column family using below command
alter ‘hl’,’pd’
In your HiveQL, you select two columns in column family "id" from hbase table "hl" into hive table. If you want to add more columns (even from other column families), you just need to add them to table schema and hbase.columns.mapping. For example:
CREATE TABLE hbase_hive(key string, firstname string, age string, a string)
STORED BY ‘org.apache.hadoop.hive.hbase.HBaseStorageHandler’
WITH SERDEPROPERTIES (“hbase.columns.mapping” = “id:firstname,id:age,hb:a")
TBLPROPERTIES(“hbase.table.name” = “hl”);
see https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-MultipleColumnsandFamilies
I see a couple of issues (more or less serious) with what you wrote:
First of all, I would create an EXTERNAL TABLE instead
You are creating a Hive table with only 3 columns but expecting 4 in the end
You are not explicitly mapping the :key
Your data for 'firstname' and 'age' looks like wild random numbers! :|
I could not test it but the following should be a better starting point:
CREATE EXTERNAL TABLE hbase_hive_hl(key string, firstname string, age string, name string)
STORED BY ‘org.apache.hadoop.hive.hbase.HBaseStorageHandler’
WITH SERDEPROPERTIES (“hbase.columns.mapping” = “:key,id:firstname,id:age,pd:name")
TBLPROPERTIES(“hbase.table.name” = “hl”);

Unable to load data in Hive partitioned table

I have created a table in Hive with the following query:
create table if not exists employee(CASE_NUMBER String,
CASE_STATUS String,
CASE_RECEIVED_DATE DATE,
DECISION_DATE DATE,
EMPLOYER_NAME STRING,
PREVAILING_WAGE_PER_YEAR BIGINT,
PAID_WAGE_PER_YEAR BIGINT,
order_n int) partitioned by (JOB_TITLE_SUBGROUP STRING) row format delimited fields terminated by ',';
I tried loading data into the create table using below query:
LOAD DATA INPATH '/salary_data.csv' overwrite into table employee partition (JOB_TITLE_SUBGROUP);
For the partitioned table, I have even set following configuration :
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
But I am getting below error while executing the load query:
Your query has the following error(s):
Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Invalid partition key & values; keys [job_title_subgroup, ], values [])
Please help.
If you want to load data into a Hive partition, you have to provide the value of the partition itself in the LOAD DATA query. So in this case, your query would be something like this.
LOAD DATA INPATH '/salary_data.csv' overwrite into table employee partition (JOB_TITLE_SUBGROUP="Value");
Where "Value" is the name of the partition in which you are loading your data. The reason is because Hive will use "Value" to create the directory in which your .csv is going to be stored, which will be something like this: .../employee/JOB_TITLE_SUBGROUP=Value. I hope this helps.
Check the documentation for details on the LOAD DATA syntax.
EDITED
Since the table has dynamic partition, one solution would be loading the .csv into an external table (e.g. employee_external) and then execute an INSERT command like this:
INSERT OVERWRITE INTO TABLE employee PARTITION(JOB_TITLE_SUBGROUP)
SELECT CASE_NUMBER, CASE_STATUS, (...), JOB_TITLE_SUBGROUP
FROM employee_external
I might be little late to reply but can try below steps:
Set below properties first :
Ø set hive.exec.dynamic.partition.mode=nonstrict;
Ø set hive.exec.dynamic.partition=true;
Create temp table first:
CREATE EXTERNAL TABLE IF NOT EXISTS employee_temp(
ID STRING,
Name STRING,
Salary STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
tblproperties ("skip.header.line.count"="1");
Load Data in temporary table:
hive> LOAD DATA INPATH 'filepath/employee.csv' OVERWRITE INTO TABLE employee;
Create Partitioned Table:
CREATE EXTERNAL TABLE IF NOT EXISTS employee_part(
ID STRING,
Name STRING)
PARTITIONED BY (Salary STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
tblproperties ("skip.header.line.count"="1");
Load Data into partitioned table from intermediate / temp table:
INSERT OVERWRITE TABLE employee_part PARTITION (SALARY) SELECT * FROM employee;

insert into hbase table using hive (Hadoop)

I created this table in hbase using hive successfully :
CREATE TABLE hbase_trades(key string, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "trades");
now I want to insert values in this table! what is the HiveQl query?
Just insert to your hive table(hbase_trades) ,since you have integrated both, the data will be queriable from hbase (trades) table. Hope this helps.

Resources