I have a strange situation.
a) I have to create a table B from Table A plus some custom columns. For example.
Table B should have few columns of table A and some additional columns(These are static values like NULL,static string and system timestmaps).
b) One column in Table A needs to be split into Two columns in Table B
example: Data in Column X is [A1234, B5678, 0000, 1111]
Table B should have two columns AlphaColumn[A1234, B5678], NumberishColumn[0000, 1111]
The difference is : First letter of the data can be alphabet. Thats the only distinguishing criteria
How can I do this in one query?
You can use a CASE expression and simple string functions:
INSERT INTO table_b(firstname, lastname, alphacolumn, numberishcolumn)
SELECT firstname,
lastname,
CASE
WHEN SUBSTR(employeeid, 1, 1) BETWEEN '0' AND '9'
THEN NULL
ELSE employeeid
END,
CASE
WHEN SUBSTR(employeeid, 1, 1) BETWEEN '0' AND '9'
THEN employeeid
ELSE NULL
END
FROM table_a;
Or, you could create table_b as a VIEW instead of another table.
Related
I created three tables A (id, name, date, realnumber, integer), B (id, name, date, realnumber, integer), and C which is identical to table A. It only has two more columns called integerB and sequence s. I want to create a trigger which would fire after insert on table B for each row input so that it saves the referenced row of Table A and adds integer from input row of table B in column integerB of table C. If the row already exists in Table C only integerB should be added. When it comes to sequence s, next value is added with first insert of row of table A.
Simple explanation: Table C is a copy of table A with two additional columns: integerB and sequence. The point of the trigger is to add new rows from table A without repetition, integerB from table B(integer in table B) and sequence should start with 1 and increment by 1. If the row in table A is repeated then only integerB should be updated.
I did not work with triggers that much, so I am not sure how to solve the problem when I have to insert data from multiple tables. Here is my trigger.
CREATE OR REPLACE TRIGGER trig1
AFTER INSERT ON B
FOR EACH ROW
INSERT INTO C (integerB) VALUES (NEW.integer);
INSERT INTO C (id, name, date, realnumber)
SELECT a.id, a.name, a.date, a.realnumber FROM A a;
END;
/
First off you really need to use better column and table names as a lot of these are reserved words... This makes everything far more complicated than it needs to be.
It isn't entirely clear what you want to do but it seems that if someone was to insert a record with ID = 1 into B then you want to get the values from A for ID = 1 and store them in C along with the integer value inserted into B
In which case you want to use an MERGE statement (UPSERT) in your trigger and something like
CREATE OR REPLACE TRIGGER T1
AFTER INSERT ON B
FOR EACH ROW
BEGIN
MERGE INTO C C
USING (SELECT * FROM A WHERE ID = :NEW.ID) A
ON (C.ID = :NEW.ID)
WHEN MATCHED THEN UPDATE
SET C.INTEGERB = :NEW.INTEGER,
C.SEQUENCE = C.SEQUENCE + 1
WHEN NOT MATCHED THEN
INSERT (ID, NAME, DATE, REALNUMBER, INTEGER, INTEGERB, SEQUENCE)
VALUES (A.ID, A.NAME. A.DATE, A.REALNUMBER, A.INTEGER, :NEW.INTEGER, 0);
END;
/
For sequence this has been set to 0 when a new record is inserted into C and then incremented each time integerB is updated. I am not sure if this is waht you want or not.
You should be able to tweak this to match the exact joins and logic you need.
Tip
Get your SQL statement working with literal values first and then translate it into your trigger. It will be much easier if you can get something working manually first before you attempt to make things more complicated
Could you please help me below query.
Suppose there is table employee and columns A , B and Date column.
I have to load data from table employee to another table emp with below transformation applied
Transformation in Employee table
Absolute value of column A - (column name in emp wil be ABS_A)
Absolute value of column B -(column name in emp wil be ABS_B)
Find the sum(ABS_A) for a given Date column
4.Find the sum(ABS_b) for a given Date column
Find sum(ABS_A)/sum(ABS_B) - column name will be Average.
So the Final table emp will have below columns
1.A
2.B
3.ABS_A
4.ABS_B
5.Average
How to handle such derived column in hive?
I tried below query but now working. could anyone guide me.
insert overwrite into emp
select
A,
B,
ABS(A) as ABS_A,
ABS(B) as ABS_B,
sum(ABS_A) OVER PARTION BY DATE AS sum_OF_A,
sum(ABS_B) OVER PARTTION BY DATE AS sum_of_b,
avg(sum_of_A,sum_of_b) over partition by date as average
from employee
Hive does not support using derived columns in the same subquery level. Use subqueries or functions in place of column aliases.
insert overwrite table emp
select A, B, ABS_A, ABS_B, sum_OF_A, sum_of_b, `date`, sum_OF_A/sum_of_b as average
from
(
select A, B, ABS(A) as ABS_A, ABS(B) as ABS_B, `date`,
sum(ABS(A)) OVER (PARTTION BY DATE) AS sum_OF_A,
sum(ABS(B)) OVER (PARTTION BY DATE) AS sum_of_b
from employee
)s;
I've a requirement in hive complex data structure which I'm new to. I've tried few things which didn't work out. I'd like to know if there is a solution or I'm looking at a dead end.
Requirement :
Table1 and Table2 are of same create syntax. I want to select all columns from table1 and insert it into table2, where few column values will be modified. For struct field also, I can make it work using named_struct.
But if table1 has array> type, then I'm not sure how to make it work.
eg.,
CREATE TABLE IF NOT EXISTS table1 (
ID INT,
XYZ array<STRUCT<X:DOUBLE, Y:DOUBLE, Z:DOUBLE>>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '$'
MAP KEYS TERMINATED BY '#' ;
CREATE TABLE IF NOT EXISTS table2 (
ID INT,
XYZ array<STRUCT<X:DOUBLE, Y:DOUBLE, Z:DOUBLE>>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '$'
MAP KEYS TERMINATED BY '#' ;
hive> select * from table1 ;
OK
1 [{"x":1,"y":2,"z":3},{"x":4,"y":5,"z":6},{"x":7,"y":8,"z":9}]
2 [{"x":4,"y":5,"z":6},{"x":7,"y":8,"z":9}]
How can I update a struct field in array while inserting. Let's say if structField y is 5, then I want it to be inserted as 0.
For complex type struct you can use Brickhouse UDF.Download the jar and add it in your script.
add jar hdfs://path_where_jars_are_downloaded/brickhouse-0.6.0.jar
Create a collect function.
create temporary function collect_arrayofstructs as 'brickhouse.udf.collect.CollectUDAF';
Query:Replace the y value with 0
select ID, collect_arrayofstructs(
named_struct(
"x", x,
"y", 0,
"z", z,
)) as XYZ
from table1;
I have two db2 tables, table one contains rows of constant data with id as one of the columns. second table contains id column which is foreign key to id column in first table and has 3 more varchar columns.
I am trying to insert rows into second table, where entry into id col is based on a where clause and the remaining columns get values from outside.
table1 has columns id, t1c1, t1c2, t1c3
table2 has columns id, t2c1, t2c2, t2c3
to insert into table2, I am trying this query:
insert into table2 values
(select id from table1 where t1c2 like 'xxx', 'abc1','abc2');
I know it is something basic I am missing here.
Please help correcting this query.
I am trying to get a query result in HiveQL with one column as distinct. However the results are not matching . There are almost 20 columns in the table.
create table uniq_us row format delimited fields terminated by ',' lines terminated by '\n' as select distinct(a),b,c,d,e,f,g,h,i,j from ctry_us_join;
The resulting number of Rows :513238
select count(distinct a) from ctry_us_join;
The resulting number of rows : 151616
How is this possible and is something wrong in my first or second query
U need to use subselect with group by statement.
select count(a) from (
select a, count(*) from ctry_us_join group by a) b
This is just one solution for this.
Distinct is a keyword, not a function. It applies to all columns you list in your select clause. It is quite reasonable that your table has only 151,616 distinct values in the column a, but multiple rows with the same value in the column a have different values in other columns. That might give you 513,238 distinct rows.