Hive parse and edit array to struct field

Hive parse and edit array to struct field - hadoop

I've a requirement in hive complex data structure which I'm new to. I've tried few things which didn't work out. I'd like to know if there is a solution or I'm looking at a dead end.
Requirement :
Table1 and Table2 are of same create syntax. I want to select all columns from table1 and insert it into table2, where few column values will be modified. For struct field also, I can make it work using named_struct.
But if table1 has array> type, then I'm not sure how to make it work.
eg.,
CREATE TABLE IF NOT EXISTS table1 (
ID INT,
XYZ array<STRUCT<X:DOUBLE, Y:DOUBLE, Z:DOUBLE>>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '$'
MAP KEYS TERMINATED BY '#' ;
CREATE TABLE IF NOT EXISTS table2 (
ID INT,
XYZ array<STRUCT<X:DOUBLE, Y:DOUBLE, Z:DOUBLE>>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '$'
MAP KEYS TERMINATED BY '#' ;
hive> select * from table1 ;
OK
1 [{"x":1,"y":2,"z":3},{"x":4,"y":5,"z":6},{"x":7,"y":8,"z":9}]
2 [{"x":4,"y":5,"z":6},{"x":7,"y":8,"z":9}]
How can I update a struct field in array while inserting. Let's say if structField y is 5, then I want it to be inserted as 0.

For complex type struct you can use Brickhouse UDF.Download the jar and add it in your script.
add jar hdfs://path_where_jars_are_downloaded/brickhouse-0.6.0.jar
Create a collect function.
create temporary function collect_arrayofstructs as 'brickhouse.udf.collect.CollectUDAF';
Query:Replace the y value with 0
select ID, collect_arrayofstructs(
named_struct(
"x", x,
"y", 0,
"z", z,
)) as XYZ
from table1;

Related

HIVE - Cannot partition a table: semantic exception failure

I'm not able to import data on partitioned table in Hive.
Here is how I create the table
CREATE TABLE IF NOT EXISTS title_ratings
(
tconst STRING,
averageRating DOUBLE,
numVotes INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
TBLPROPERTIES("skip.header.line.count"="1");
And then I load the data into it : LOAD DATA INPATH '/title.ratings.tsv.gz' INTO TABLE eval_hive_db.title_ratings;
It works fine till here. Now I want to create a dynamic partitioned table. First of all, I setup theses params:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
I now create my partitioned table:
CREATE TABLE IF NOT EXISTS title_ratings_part
(
tconst STRING,
numVotes INT
)
PARTITIONED BY (averageRating DOUBLE)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE;
insert into title_ratings_part partition(title_ratings) select tconst, averageRating, numVotes from title_ratings;
(I also tried with numVotes instead by the way)
And I receive this error: FAILED: ValidationFailureSemanticException eval_hive_db.title_ratings_part: Partition spec {title_ratings=null} contains non-partition columns
Someone can help me please?
Ideally, I want to partition my table by averageRating (less than 2, between 2 and 4, and greater than 4)

You can run this command to check if there are null values or not.
select count(averageRating) from title_ratings group by averageRating;
Now, if there are null values in this column then you will get the count, which you have to fill then apply partitioning again.

Partition column is stored as last column in a table so while inserting you need to maintain correct order in select statement.
Pls change order of columns in select.
insert into title_ratings_part partition(title_ratings)
Select
Tconst,
numVotes,
averageRating --orderwise this should always be last column
from title_ratings

Row is missing while exploding a clob data in oracle

I am trying to explode below data which contains a clob type, but while clob type is containing null, the complete row is missing out.
SEC is having a CLOB data type containg array data. While SEC is having null, below query is not giving any output .
select
CRE_DATE,
serv,
rg,
id,
hos_id,
state
from table1 a,
JSON_TABLE(a.SEC,'$[*]' COLUMNS (h_id varchar2(256) path '$' null on empty ));

Just use outer apply:
select
CRE_DATE,
serv,
rg,
id,
hos_id,
state
from table1 a
outer apply JSON_TABLE(a.SEC,'$[*]' COLUMNS (h_id varchar2(256) path '$' null on empty ));

How insert overwrite table in hive with diffrent where clauses?

I want to read a .tsv file from Hbase into hive. The file has a columnfamily, which has 3 columns inside: news, social and all. The aim is to store these columns in an table in hbase which has the columns news, social and all.
CREATE EXTERNAL TABLE IF NOT EXISTS topwords_logs (key String,
columnfamily String , wort String , col String, occurance int)ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t'STORED AS TEXTFILE LOCATION '/home
/hfu/Testdaten';
load data local inpath '/home/hfu/Testdaten/part-r-00000.tsv' into table topwords_logs;
CREATE TABLE newtopwords (columnall int, columnsocial int , columnnews int) PARTITIONED BY(wort STRING) STORED AS SEQUENCEFILE;
Here i created a external table, which contain the data from hbase. Further on I created a table with the 3 columns.
What i have tried so far is this:
insert overwrite table newtopwords partition(wort)
select occurance ,'1', '1' , wort from topwords_log;
This Code works fine, but i have for each column an extra where clause. How can I insert data like this?
insert overwrite table newtopwords partition(wort)
values(columnall,(select occurance from topwords_logs where col =' all')),(columnnews,( select occurance from topwords_logs where col =' news')) ,(columnsocial,( select occurance from topwords_logs where col =' social')),(wort,(select wort from topwords_log));
This code isnt working ->NoViableAltException.
On every example I just see Code, where they insert data without a Where clause. How can I insert Data with a Where clause?

Excluding the partition field from select queries in Hive

Suppose I have a table definition as follows in Hive(the actual table has around 65 columns):
CREATE EXTERNAL TABLE S.TEST (
COL1 STRING,
COL2 STRING
)
PARTITIONED BY (extract_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\007'
LOCATION 'xxx';
Once the table is created, when I run hive -e "describe s.test", I see extract_date as being one of the columns on the table. Doing a select * from s.test also returns extract_date column values. Is it possible to exclude this virtual(?) column when running select queries in Hive.

Change this property
set hive.support.quoted.identifiers=none;
and run the query as
SELECT `(extract_date)?+.+` FROM <table_name>;
I tested it working fine.

Hive inserting values to an array complex type column

I am unable to append data to tables that contain an array column using insert into statements; the data type is array < varchar(200) >
Using jodbc I am unable to insert values into an array column by values like :
INSERT INTO demo.table (codes) VALUES (['a','b']);
does not recognises the "[" or "{" signs.
Using the array function like ...
INSERT INTO demo.table (codes) VALUES (array('a','b'));
I get the following error using array function:
Unable to create temp file for insert values Expression of type TOK_FUNCTION not supported in insert/values
Tried the workaround...
INSERT into demo.table (codes) select array('a','b');
unsuccessfully:
Failed to recognize predicate '<EOF>'. Failed rule: 'regularBody' in statement
How can I load array data into columns using jdbc ?

My Table has two columns: a STRING, b ARRAY<STRING>.
When I use #Kishore Kumar Suthar's method, I got this:
FAILED: ParseException line 1:33 cannot recognize input near '(' 'a' ',' in statement
But I find another way, and it works for me:
INSERT INTO test.table
SELECT "test1", ARRAY("123", "456", "789")
FROM dummy LIMIT 1;
dummy is any table which has atleast one row.

make a dummy table which has atleast one row.
INSERT INTO demo.table (codes) VALUES (array('a','b')) from dummy limit 1;
hive> select codes demo.table;
OK
["a","b"]
Time taken: 0.088 seconds, Fetched: 1 row(s)

Suppose I have a table employee containing the fields ID and Name.
I create another table employee_address with fields ID and Address. Address is a complex data of type array(string).
Here is how I can insert values into it:
insert into table employee_address select 1, 'Mark', 'Evans', ARRAY('NewYork','11th
avenue') from employee limit 1;
Here the table employee just acts as a dummy table. No data is copied from it. Its schema may not match employee_address. It doesn't matter.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Hive parse and edit array to struct field - hadoop

Related

HIVE - Cannot partition a table: semantic exception failure

Row is missing while exploding a clob data in oracle

How insert overwrite table in hive with diffrent where clauses?

Excluding the partition field from select queries in Hive

Hive inserting values to an array complex type column

Categories

Resources