Function to Perform Complex Transformation in Hive - hadoop

I'm trying to do some transformations on my input flat file. The real problem I'm facing is that my input file consists of 111 fields. How can I apply the transformation across that many fields?
I have the option of using UDFs, but how could I pass those 111 fields to my UDF? Is there any way to pass all the fields in my table to a UDF?
This is my input file:
A|Adding||Testing|DV005| |7425478987|10 | |Jayendran | |Arumugam |V| |MALE|19711028|101 |N|01| |Candy| |1312 WEST 10TH STREET | |AUSTIN |TX| |78703 |840 | |5127768623| |8009238-12345678912|A|B|H|01500|03000|Chocalates |8009238||RAPID 7 LLC |20130501|00000000| |000| | | | | | | | | | | |N |BUS|20150901|20160831|0000000000|0000000001| |8009238-999940185-002348025-CAR|960230702-CAR-002348025-20150901|Y |CAR|20160531|20160730|0000000011|0000001321|8009238-999940185-002348025-TRAIN|960230702-TRAIN-002348025-20150901|N |TRAIN|20150901|20160831|0000000000|0000000000| | |N |VAN|20150901|20160831| |0000000000|0000000000| | | |N |TRUCK|20150101|20991231| | |N |JEEP| | |0000000000|0000000000| | |Y |PLANE|20150901|20160831| |20160319002530000001
Here's my sample output
Testing DV005 JayendranArumugam MALE
CAR2016053120160730
TRAIN0000000000000000
VAN0000000000000000
TRUCK0000000000000000
JEEP0000000000000000
PLANE2015090120160831
Please help me find a solution.
Thanks in advance
Jay

create external table mytable (rec string)
location '/... put the location here ...'
tblproperties ('serialization.last.column.takes.rest'='true')
;
select explode
(
array
(
concat_ws(' ',f[3],f[4],concat(f[9],f[11]),f[14])
,concat(f[ 67] ,case when f[ 66] = 'Y' then concat(f[ 68] ,f[ 69]) else '0000000000000000' end)
,concat(f[ 75] ,case when f[ 74] = 'Y' then concat(f[ 76] ,f[ 77]) else '0000000000000000' end)
,concat(f[ 83] ,case when f[ 82] = 'Y' then concat(f[ 84] ,f[ 85]) else '0000000000000000' end)
,concat(f[ 93] ,case when f[ 92] = 'Y' then concat(f[ 94] ,f[ 95]) else '0000000000000000' end)
,concat(f[ 99] ,case when f[ 98] = 'Y' then concat(f[100] ,f[101]) else '0000000000000000' end)
,concat(f[107] ,case when f[106] = 'Y' then concat(f[108] ,f[109]) else '0000000000000000' end)
)
)
from (select split(rec,'\\s*\\|\\s*') as f
from mytable
) t
;
+--------------------------------------+
| col |
+--------------------------------------+
| Testing DV005 JayendranArumugam MALE |
| CAR2016053120160730 |
| TRAIN0000000000000000 |
| VAN0000000000000000 |
| TRUCK0000000000000000 |
| JEEP0000000000000000 |
| PLANE2015090120160831 |
+--------------------------------------+
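The same split-and-index idea can be sketched outside Hive to check the index mapping before writing the full query. This is only an illustration in Python: the shortened sample record, the `GROUPS` index layout, and the `transform` helper are hypothetical and do not correspond to the real 111-field positions.

```python
import re

# Hypothetical layout for the shortened record below:
# (flag index, name index, start-date index, end-date index).
GROUPS = [(3, 4, 5, 6), (7, 8, 9, 10)]
NO_DATES = "0" * 16  # same 16-zero filler the Hive query uses


def transform(rec, header_idx=(1, 2), groups=GROUPS):
    # Split on '|' and drop the padding around it, like split(rec,'\\s*\\|\\s*').
    f = re.split(r"\s*\|\s*", rec)
    rows = [" ".join(f[i] for i in header_idx)]
    for flag, name, start, end in groups:
        # Keep the dates only when the group's flag is 'Y'.
        dates = f[start] + f[end] if f[flag] == "Y" else NO_DATES
        rows.append(f[name] + dates)
    return rows


sample = "A|Testing |DV005|Y |CAR|20160531|20160730|N |TRAIN|20150901|20160831"
for row in transform(sample):
    print(row)
```

Each returned element corresponds to one row produced by explode(array(...)) in the Hive query.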

Related

When I select, only one column is checked without duplicates

I have two tables like this:
First table:
+------------+---------------+--------+
| pk | user_one |user_two|
+------------+---------------+--------+
second table
+------------+---------------+--------+----------------+----------------+
| pk | sender |receiver|fk of firsttable|content |
+------------+---------------+--------+----------------+----------------+
The first and second tables have a one-to-many (1:N) relationship.
There are many records in second table:
| pk | sender|receiver|fk of firsttable|content |
|120 |car224 |car223 |1 |test message1 to 223
|121 |car224 |car223 |1 |test message2 to 223
|122 |car224 |car225 |21 |test message1 to 225
|123 |car224 |car225 |21 |test message2 to 225
|124 |car224 |car225 |21 |test message3 to 225
|125 |car224 |car225 |21 |test message4 to 225
For rows that share the same fk value, I want only the row with the largest pk.
I've changed the column names above to make them easier to understand.
Here is the actual sql I've tried so far:
select *
from (select rownum rn,
mr.mrno,
mr.user_one,
mr.user_two,
m.mno,
m.content
from tbl_messagerelation mr,
tbl_message m
where (mr.user_one = 'car224' or
mr.user_two='car224') and
m.rowid in (select max(rowid)
from tbl_message
group by m.mno) and
rownum <= 1*20)
where rn > (1-1) * 20
And this is the result:
+---------+-------+----------+----------+-------------------------+----------------------+
| rn | mrno | user_one | user_two | mno(pk of second table) | content |
+---------+-------+----------+----------+-------------------------+----------------------+
| 1 | 1 | car224 | car223 | 125 | test message4 to 225 |
| 2 | 21 | car224 | car225 | 125 | test message4 to 225 |
+---------+-------+----------+----------+-------------------------+----------------------+
My desired result is something like this:
+---------+---------+----------+--------------------+----------------------+
| fk | sender | receiver | pk of second table | content |
+---------+---------+----------+--------------------+----------------------+
| 1 | car224 | car223 | 121 | test message2 to 223 |
| 21 | car224 | car225 | 125 | test message4 to 225 |
+---------+---------+----------+--------------------+----------------------+
Your table description is confusing when compared to your query. However, from what I could understand, you are probably looking for row_number().
One important piece of advice: use the standard explicit JOIN syntax rather than the outdated comma-separated (a, b) syntax for joins. The join keys were not clear to me, so replace the placeholders appropriately in your final query.
select * from
(
select mr.*, m.*, row_number() over ( partition by m.fk order by m.pk desc ) as rn
from tbl_messagerelation mr join tbl_message m on mr.? = m.?
) where rn =1
Or perhaps you don't need that join at all
select * from
(
select m.*, row_number() over ( partition by m.fk order by m.pk desc ) as rn
from tbl_message m
) where rn =1
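To see the row_number() pattern in action on the sample rows from the question, here is a small sketch that runs it in an in-memory SQLite database from Python. SQLite stands in for Oracle purely for illustration, and the column names pk and fk are the simplified names from the question, not the real schema. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tbl_message "
    "(pk integer, sender text, receiver text, fk integer, content text)"
)
conn.executemany(
    "insert into tbl_message values (?, ?, ?, ?, ?)",
    [
        (120, "car224", "car223", 1, "test message1 to 223"),
        (121, "car224", "car223", 1, "test message2 to 223"),
        (122, "car224", "car225", 21, "test message1 to 225"),
        (123, "car224", "car225", 21, "test message2 to 225"),
        (124, "car224", "car225", 21, "test message3 to 225"),
        (125, "car224", "car225", 21, "test message4 to 225"),
    ],
)

# Number the rows inside each fk group, newest pk first, and keep row 1.
latest = conn.execute(
    """
    select fk, sender, receiver, pk, content
    from (
        select m.*,
               row_number() over (partition by m.fk order by m.pk desc) as rn
        from tbl_message m
    )
    where rn = 1
    order by fk
    """
).fetchall()

for row in latest:
    print(row)
```

Only the row with the largest pk survives for each fk, which is the shape of the desired result in the question.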

Not able to aggregate in case statement in hive query

I have data like below:
SELECT
mtrans.merch_num,
mtrans.card_num
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%' AND person_org_code='P' AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30;
+-----------+----------------------------+
| merch_num | card_num |
+-----------+----------------------------+
| 1 | 4658XXXXXXXXXXXXXXXXXXURMX |
| 2 | 4658XXXXXXXXXXXXXXXXXXIE6X |
| 2 | 4658XXXXXXXXXXXXXXXXXXDA8X |
| 2 | 4658XXXXXXXXXXXXXXXXXX7D1X |
| 2 | 4658XXXXXXXXXXXXXXXXXXTJ2X |
| 2 | 4658XXXXXXXXXXXXXXXXXXQQWX |
| 2 | 4659XXXXXXXXXXXXXXXXXXY4EX |
| 2 | 4658XXXXXXXXXXXXXXXXXXRDOX |
| 2 | 4658XXXXXXXXXXXXXXXXXX0O3X |
| 2 | 4658XXXXXXXXXXXXXXXXXXNVBX |
+-----------+----------------------------+
I want to aggregate trans_amt by merch_num only when the merchant has more than one unique card_num.
In a simple query I can do it:
SELECT
mtrans.merch_num,
FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
SUM(mtrans.trans_amt) AS total_age_less_30_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%' AND person_org_code='P' AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30
GROUP BY
mtrans.merch_num having count(distinct mtrans.card_num) > 1;
+-----------+---------------+---------------------+
| merch_num | process_month | total_age_less_30_1 |
+-----------+---------------+---------------------+
| 2 | Nov-2017 | 2147.5 |
+-----------+---------------+---------------------+
Here I am able to skip merchant 1, as it does not have more than one unique card.
But I have multiple conditions in the WHERE clause and want to write a single query.
Using a CASE statement I am able to do it like below:
SELECT mtrans.merch_num,
FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
NVL(SUM(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30)
THEN mtrans.trans_amt ELSE 0 END), NULL)
AS total_age_less_30_1,
NVL(SUM(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) >= 30
AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 40)
THEN mtrans.trans_amt ELSE 0 END), NULL)
AS total_age_30_40_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%'
AND person_org_code='P'
GROUP BY
mtrans.merch_num
+-----------+---------------+---------------------+-------------------+
| merch_num | process_month | total_age_less_30_1 | total_age_30_40_1 |
+-----------+---------------+---------------------+-------------------+
| 3 | Nov-2017 | 0 | 0 |
| 4 | Nov-2017 | 0 | 0 |
| 1 | Nov-2017 | 2.49 | 203.68 |
| 2 | Nov-2017 | 2147.5 | 4907 |
| 5 | Nov-2017 | 0 | 0 |
+-----------+---------------+---------------------+-------------------+
I want 2.49 to be NULL, because that merchant does not have more than one unique card.
I am not able to apply a HAVING-style condition so that sum(trans_amt) is shown only when the number of unique cards is more than 1.
When I add the condition to the CASE statement, I get the error below:
SELECT
mtrans.merch_num,
FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
NVL(SUM(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30 and count(distinct mtrans.card_num) > 1)
THEN mtrans.trans_amt ELSE 0 END), NULL)
AS total_age_less_30_1,
NVL(SUM(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) >= 30
AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 40 and count(distinct mtrans.card_num) > 1)
THEN mtrans.trans_amt ELSE 0 END), NULL)
AS total_age_30_40_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%'
AND person_org_code='P'
GROUP BY
mtrans.merch_num;
ERROR: AnalysisException: aggregate function must not contain aggregate parameters: sum(CASE WHEN (round(datediff(mtrans.transaction_date, cdemo.date_birth) / 365) < 30 AND count(DISTINCT mtrans.card_num) > 1) THEN mtrans.trans_amt ELSE 0 END)
Can someone help?
The error seems to occur because you have COUNT inside the SUM. Try the following and let me know how it goes:
SELECT
mtrans.merch_num,
FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
NVL(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30 and count(distinct mtrans.card_num) > 1)
THEN SUM(mtrans.trans_amt) ELSE 0 END, NULL)
AS total_age_less_30_1,
NVL(CASE
WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) >= 30
AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 40 and count(distinct mtrans.card_num) > 1)
THEN SUM(mtrans.trans_amt) ELSE 0 END, NULL)
AS total_age_30_40_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%'
AND person_org_code='P'
GROUP BY
mtrans.merch_num;
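Another way to keep COUNT out of SUM is to leave the row-level age test inside each aggregate and let an outer CASE compare the two aggregate results. In a GROUP BY query a row-level condition generally cannot sit outside an aggregate, so this variant avoids that as well. The sketch below runs against an in-memory SQLite database from Python; the trans table, its precomputed age column, and the sample rows are hypothetical stand-ins for the joined tables in the question. Whether your engine accepts several COUNT(DISTINCT ...) expressions in one query depends on the engine and version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, already-joined data: one row per transaction with the age precomputed.
conn.execute("create table trans (merch_num integer, card_num text, age integer, amt integer)")
conn.executemany(
    "insert into trans values (?, ?, ?, ?)",
    [
        (1, "A", 25, 5),
        (1, "A", 25, 7),
        (2, "B", 22, 10),
        (2, "C", 28, 20),
        (2, "D", 35, 40),
    ],
)

# The outer CASE sees only aggregate results; a CASE without ELSE yields NULL.
totals = conn.execute(
    """
    select merch_num,
           case when count(distinct case when age < 30 then card_num end) > 1
                then sum(case when age < 30 then amt else 0 end)
           end as total_age_less_30,
           case when count(distinct case when age >= 30 and age < 40 then card_num end) > 1
                then sum(case when age >= 30 and age < 40 then amt else 0 end)
           end as total_age_30_40
    from trans
    group by merch_num
    order by merch_num
    """
).fetchall()

for row in totals:
    print(row)
```

Merchant 1 has a single card under 30, so its total comes back as NULL, while merchant 2 has two such cards and gets a sum.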
I would suggest a cleaner approach, as follows.
(PS: I didn't have any Hive access, so I did this in PostgreSQL using regular SQL. It should be easy to adapt to Hive SQL.)
Here is my SQL Table and records inserted in the table.
CREATE TEMPORARY TABLE hivetest (
merchant_id INTEGER,
card_number TEXT,
customer_dob TIMESTAMP,
transaction_dt TIMESTAMP,
transaction_amt DECIMAL
);
INSERT INTO hivetest VALUES
(1, 'A', '1997-12-01', '2017-11-01', 10.0),
(2, 'A', '1997-12-01', '2017-11-01', 11.0),
(2, 'B', '1980-12-01', '2017-11-01', 12.0),
(3, 'A', '1997-12-01', '2017-11-01', 13.0),
(3, 'A', '1997-12-01', '2017-11-01', 14.0),
(4, 'A', '1997-12-01', '2017-11-01', 15.0),
(4, 'C', '1980-12-01', '2017-11-01', 16.0);
First, you need to join the tables and generate a dataset that gives you the transaction_age (transaction_dt - customer_dob). I have most of the data for date subtraction in this single table, but simple INNER JOIN(s) should suffice to achieve this. Anyways, here is the query for the same.
SELECT
merchant_id, card_number, DATE(customer_dob) customer_dob, DATE(transaction_dt) transaction_dt,
DATE_PART('year', DATE(transaction_dt)) - DATE_PART('year', DATE(customer_dob)) transaction_age,
transaction_amt
FROM hivetest ORDER BY 1;
This results in the data as follows.
+-------------+-------------+--------------+----------------+-----------------+----------------+
| merchant_id | card_number | customer_dob | transaction_dt | transaction_age |transaction_amt |
+-------------+-------------+--------------+----------------+-----------------+----------------+
| 1 | A | 1997-12-01 | 2017-11-01 | 20 | 10.0 |
| 2 | A | 1997-12-01 | 2017-11-01 | 20 | 11.0 |
| 2 | B | 1980-12-01 | 2017-11-01 | 37 | 12.0 |
| 3 | A | 1997-12-01 | 2017-11-01 | 20 | 13.0 |
| 3 | A | 1997-12-01 | 2017-11-01 | 20 | 14.0 |
| 4 | A | 1997-12-01 | 2017-11-01 | 20 | 15.0 |
| 4 | C | 1980-12-01 | 2017-11-01 | 37 | 16.0 |
+-------------+-------------+--------------+----------------+-----------------+----------------+
The above dataset will allow you to categorise the SUM of transaction amounts based on the transaction_age as you want. The trick is to have the above query in a sub-query and use the results of this subquery to categorize. Here is the query to do the same.
SELECT
merchant_id,
-- Transaction Age less than 30
SUM(CASE WHEN transaction_age <= 30 THEN 1 ELSE 0 END) count_30,
SUM(CASE WHEN transaction_age <= 30 THEN transaction_amt ELSE 0 END) sum_30,
-- Transaction Age between 30 and 40
SUM(CASE WHEN transaction_age > 30 AND transaction_age <= 40 THEN 1 ELSE 0 END) case_30_40,
SUM(CASE WHEN transaction_age > 30 AND transaction_age <= 40 THEN transaction_amt ELSE 0 END) sum_30_40
FROM
(
SELECT
merchant_id, transaction_amt,
DATE_PART('year', DATE(transaction_dt)) - DATE_PART('year', DATE(customer_dob)) transaction_age
FROM hivetest
) m
GROUP BY merchant_id ORDER BY 1;
This results in the categorised output as below which gives you the count of transactions and sum of transaction amounts for each category for each merchant:
+-------------+----------+--------+------------+-----------+
| merchant_id | count_30 | sum_30 | case_30_40 | sum_30_40 |
+-------------+----------+--------+------------+-----------+
| 1 | 1 | 10.0 | 0 | 0 |
| 2 | 1 | 11.0 | 1 | 12.0 |
| 3 | 2 | 27.0 | 0 | 0 |
| 4 | 1 | 15.0 | 1 | 16.0 |
+-------------+----------+--------+------------+-----------+
Now, this is our dataset, which is more or less the final result. However, as per your requirement, you are only interested in merchants that have more than one unique card (COUNT(DISTINCT card_number) > 1).
So, let's write another query that gives us this. The query below calculates the count and, based on the criterion, sets a flag to TRUE or FALSE indicating whether we are interested in that merchant.
SELECT
merchant_id,
CASE
WHEN COUNT(DISTINCT card_number) > 1 THEN
TRUE
ELSE
FALSE
END has_distinct_cards_gt_1
FROM hivetest GROUP BY merchant_id ORDER BY 1
This gives the output as below.
+-------------+-------------------------+
| merchant_id | has_distinct_cards_gt_1 |
+-------------+-------------------------+
| 1 | false |
| 2 | true |
| 3 | false |
| 4 | true |
+-------------+-------------------------+
Now, we are almost done. We just need to join these two tables and then based on the has_distinct_cards_gt_1, display the columns accordingly from the dataset generated previously.
Here is the final join query and resultset data generated.
SELECT
merchants_all.merchant_id,
-- Age < 30
CASE
WHEN merchants_cards.has_distinct_cards_gt_1 THEN
sum_30
ELSE
0
END total_sum_30,
-- Age in 30 and 40
CASE
WHEN merchants_cards.has_distinct_cards_gt_1 THEN
sum_30_40
ELSE
0
END total_sum_30_40
FROM
(
SELECT
merchant_id,
SUM(CASE WHEN transaction_age <= 30 THEN transaction_amt ELSE 0 END) sum_30,
SUM(CASE WHEN transaction_age > 30 AND transaction_age <= 40 THEN transaction_amt ELSE 0 END) sum_30_40
FROM
(
SELECT merchant_id, DATE_PART('year', DATE(transaction_dt)) - DATE_PART('year', DATE(customer_dob)) transaction_age, transaction_amt
FROM hivetest
) m
GROUP BY merchant_id
) merchants_all
JOIN
(
SELECT merchant_id, CASE WHEN COUNT(DISTINCT card_number) > 1 THEN TRUE ELSE FALSE END has_distinct_cards_gt_1
FROM hivetest GROUP BY merchant_id ORDER BY 1
) merchants_cards
ON
(merchants_all.merchant_id = merchants_cards.merchant_id);
And this generates your final data, which you need.
+-------------+--------------+-----------------+
| merchant_id | total_sum_30 | total_sum_30_40 |
+-------------+--------------+-----------------+
| 1 | 0 | 0 |
| 2 | 11.0 | 12.0 |
| 3 | 0 | 0 |
| 4 | 15.0 | 16.0 |
+-------------+--------------+-----------------+
Let me know if this helps.
COUNT inside SUM is the problem.
Here is a solution, though I haven't tested it.
It's not obvious which table person_org_code belongs to. If it is in merch_trans_daily, then add person_org_code = 'P' to the WHERE clause of the CTE. Let me know whether it works!
WITH mtrans_count AS
(SELECT merch_num,
COUNT(DISTINCT card_num) AS cnt
FROM a_sbp_db.merch_trans_daily
WHERE transaction_date LIKE '2017-09%'
GROUP BY merch_num
)
SELECT mtrans.merch_num
,FROM_UNIXTIME(UNIX_TIMESTAMP(), 'MMM-yyyy') AS process_month
,NVL(SUM(CASE
WHEN (
ROUND(DATEDIFF(mtrans.transaction_date, cdemo.date_birth) / 365) < 30
AND mtrans_count.cnt > 1
)
THEN mtrans.trans_amt
ELSE 0
END), NULL) AS total_age_less_30_1
,NVL(SUM(CASE
WHEN (
ROUND(DATEDIFF(mtrans.transaction_date, cdemo.date_birth) / 365) >= 30
AND ROUND(DATEDIFF(mtrans.transaction_date, cdemo.date_birth) / 365) < 40
AND mtrans_count.cnt > 1
)
THEN mtrans.trans_amt
ELSE 0
END), NULL) AS total_age_30_40_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
INNER JOIN mtrans_count ON mtrans_count.merch_num = mtrans.merch_num
WHERE mtrans.transaction_date LIKE '2017-09%'
AND person_org_code = 'P'
GROUP BY mtrans.merch_num;

Update in oracle with joining two table

I have these two tables below, I need to update Table1.Active_flag to Y, where Table2.Reprocess_Flag is N.
Table1
+--------+--------------+--------------+--------------+-------------+
| Source | Subject_area | Source_table | Target_table | Active_flag |
+--------+--------------+--------------+--------------+-------------+
| a | CUSTOMER | ADS_SALES | ADS_SALES | N |
| b | CUSTOMER | ADS_PROD | ADS_PROD | N |
| CDW | SALES | CD_SALES | CD_SALES | N |
| c | PRODUCT | PD_PRODUCT | PD_PRODUCT | N |
| d | PRODUCT | PD_PD1 | PD_PD1 | N |
| e | ad | IR_PLNK | IR_PLNK | N |
+--------+--------------+--------------+--------------+-------------+
Table2
+--------+--------------+--------------+--------------+----------------+
| Source | Subject_area | Source_table | Target_table | Reprocess_Flag |
+--------+--------------+--------------+--------------+----------------+
| a | CUSTOMER | ADS_SALES | ADS_SALES | N |
| b | CUSTOMER | ADS_PROD | ADS_PROD | N |
| CDW | SALES | CD_SALES | CD_SALES | N |
| c | PRODUCT | PD_PRODUCT | PD_PRODUCT | Y |
| d | PRODUCT | PD_PD1 | PD_PD1 | Y |
| e | ad | IR_PLNK | IR_PLNK | N |
+--------+--------------+--------------+--------------+----------------+
Use all three columns in a single select statement.
UPDATE hdfs_cntrl SET active_flag = 'Y'
where (source,subject_area ,source_table ) in ( select source,subject_area ,source_table from proc_cntrl where Reprocess_Flag = 'N');
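The multi-column IN can be tried out on the sample rows with a small sketch. It uses an in-memory SQLite database from Python as a stand-in for Oracle, and the table names table1 and table2 follow the question rather than the real tables. Row-value comparisons like this need SQLite 3.15 or later.

```python
import sqlite3

keys = [
    ("a", "CUSTOMER", "ADS_SALES", "ADS_SALES"),
    ("b", "CUSTOMER", "ADS_PROD", "ADS_PROD"),
    ("CDW", "SALES", "CD_SALES", "CD_SALES"),
    ("c", "PRODUCT", "PD_PRODUCT", "PD_PRODUCT"),
    ("d", "PRODUCT", "PD_PD1", "PD_PD1"),
    ("e", "ad", "IR_PLNK", "IR_PLNK"),
]
reprocess = ["N", "N", "N", "Y", "Y", "N"]  # Reprocess_Flag values from Table2

conn = sqlite3.connect(":memory:")
cols = "source text, subject_area text, source_table text, target_table text"
conn.execute(f"create table table1 ({cols}, active_flag text)")
conn.execute(f"create table table2 ({cols}, reprocess_flag text)")
conn.executemany("insert into table1 values (?, ?, ?, ?, 'N')", keys)
conn.executemany(
    "insert into table2 values (?, ?, ?, ?, ?)",
    [k + (flag,) for k, flag in zip(keys, reprocess)],
)

# The three key columns are compared together, as one tuple per row.
conn.execute(
    """
    update table1 set active_flag = 'Y'
    where (source, subject_area, source_table) in (
        select source, subject_area, source_table
        from table2
        where reprocess_flag = 'N'
    )
    """
)

flags = conn.execute("select source, active_flag from table1 order by rowid").fetchall()
print(flags)
```

Comparing the columns as a tuple keeps them tied to the same Table2 row, which separate single-column IN conditions would not guarantee.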
Updating one table based on data in another table is almost always best done with the MERGE statement.
Assuming source is a unique key in table2:
merge into table1 t1
using table2 t2
on (t1.source = t2.source)
when matched
then update set t1.active_flag = 'Y'
where t2.reprocess_flag = 'N'
;
If you are not familiar with the MERGE statement, read about it - it's just as easy to learn as UPDATE and INSERT and DELETE, it can do all three types of operations in a single statement, it is much more flexible and, in some cases, more efficient (faster).
merge into table1 t1
using table2 t2
on (t1.source = t2.source and t1.Subject_area = t2.Subject_area and t1.Source_table = t2.Source_table and t1.Target_table = t2.Target_table and t2.Reprocess_Flag = 'N')
when matched then update set
t1.Active_flag = 'Y';
UPDATE hdfs_cntrl SET active_flag = 'Y' where source in ( select source from proc_cntrl where Reprocess_Flag = 'N') and subject_area in (select subject_area from proc_cntrl where Reprocess_Flag = 'N') and source_table in (select target_table from proc_cntrl where Reprocess_Flag = 'N')

Oracle: How to pivot multiple columns?

I want to pivot multiple columns. How do I use Oracle's PIVOT for this?
SQL:
SELECT * FROM
(
SELECT *
FROM IRO_SIM A
WHERE A.COM_CODE = 'AAQ'
AND A.PCODE = 'AKIOP'
)
PIVOT
(
LISTAGG(SIMTYPE,',')
WITHIN GROUP (ORDER BY SIMTYPE)
FOR SIMTYPE IN ('H','V')
)
Sample Data:
COM_CODE | PCODE | L_VALUE | A_SIM | AMT_SIM | SIMTYPE
A | AKIOP | 1700 | TOTAL | 50 | H
A | AKIOP | 500 | EACH | 100 | V
A | BHUIO | 200 | TOTAL | 500 | H
A | BHUIO | 600 | TOTAL | 400 | V
The result I need:
COM_CODE | PCODE | H_VALUE | H_ASIM | H_AMTSIM | V_VALUE | V_ASIM | V_AMTSIM
A | AKIOP | 1700 | TOTAL | 50 | 500 | EACH | 100
A | BHUIO | 200 | TOTAL | 500 | 600 | TOTAL | 400
Thanks in advance :)
Just list the multiple columns. Every expression in your PIVOT clause will be matched with every value in the FOR clause. So, what you want is this:
SELECT * FROM d
PIVOT ( sum(l_value) as value, max(a_sim) as asim, sum(amt_sim) as amtsim
FOR simtype in ('H' AS "H", 'V' AS "V") )
With data...
with d as (
SELECT 'A' com_code, 'AKIOP' pcode, 1700 l_value, 'TOTAL' a_sim, 50 amt_sim, 'H' simtype FROM DUAL UNION ALL
SELECT 'A' com_code, 'AKIOP' pcode, 500 l_value, 'EACH' a_sim, 100 amt_sim, 'V' simtype FROM DUAL UNION ALL
SELECT 'A' com_code, 'BHUIO' pcode, 200 l_value, 'TOTAL' a_sim, 500 amt_sim, 'H' simtype FROM DUAL UNION ALL
SELECT 'A' com_code, 'BHUIO' pcode, 600 l_value, 'TOTAL' a_sim, 400 amt_sim, 'V' simtype FROM DUAL)
SELECT * FROM d
PIVOT ( sum(l_value) as value, max(a_sim) as asim, sum(amt_sim) as amtsim
FOR simtype in ('H' AS "H", 'V' AS "V") )
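For engines without a PIVOT clause, the same reshaping can be written as conditional aggregation, which is also roughly what PIVOT computes. This is a swapped-in technique, not the PIVOT syntax itself. The sketch below runs it in an in-memory SQLite database from Python, using the sample rows from the question; the table name iro_sim is taken from the question's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table iro_sim (com_code text, pcode text, l_value integer, "
    "a_sim text, amt_sim integer, simtype text)"
)
conn.executemany(
    "insert into iro_sim values (?, ?, ?, ?, ?, ?)",
    [
        ("A", "AKIOP", 1700, "TOTAL", 50, "H"),
        ("A", "AKIOP", 500, "EACH", 100, "V"),
        ("A", "BHUIO", 200, "TOTAL", 500, "H"),
        ("A", "BHUIO", 600, "TOTAL", 400, "V"),
    ],
)

# One aggregate per (simtype value, pivoted expression) pair.
pivoted = conn.execute(
    """
    select com_code, pcode,
           sum(case when simtype = 'H' then l_value end) as h_value,
           max(case when simtype = 'H' then a_sim end)   as h_asim,
           sum(case when simtype = 'H' then amt_sim end) as h_amtsim,
           sum(case when simtype = 'V' then l_value end) as v_value,
           max(case when simtype = 'V' then a_sim end)   as v_asim,
           sum(case when simtype = 'V' then amt_sim end) as v_amtsim
    from iro_sim
    group by com_code, pcode
    order by pcode
    """
).fetchall()

for row in pivoted:
    print(row)
```

Each of the three expressions is paired with each of the two simtype values, giving the six pivoted columns.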

Oracle 11g hierarchical query needs some inherited data

table looks kind of like:
create table taco (
taco_id int primary key not null,
taco_name varchar(255),
taco_prntid int,
meat_id int,
meat_inht char(1) -- inherit meat
)
data looks like:
insert into taco values (1, '1', null, 1, 'N');
insert into taco values (2, '1.1', 1, null, 'Y');
insert into taco values (3, '1.1.1', 2, null, 'N');
insert into taco values (4, '1.2', 1, 2, 'N');
insert into taco values (5, '1.2.1', 4, null, 'Y');
insert into taco values (6, '1.1.2', 2, null, 'Y');
or...
- 1 has a meat_id=1
- 1.1 has a meat_id=1 because it inherits from its parent via taco_prntid=1
- 1.1.1 has a meat_id of null because it does NOT inherit from its parent
- 1.2 has a meat_id=2 and it does not inherit from its parent
- 1.2.1 has a meat_id=2 because it does inherit from its parent via taco_prntid=4
- 1.1.2 has a meat_id=1 because it does inherit from its parent via taco_prntid=2
Now... how in the world do I query what the meat_id is for each taco_id? What is below did work until I realized that I wasn't using the inheritance flag and some of my data was messing up.
select x.taco_id,
x.taco_name,
to_number(substr(meat_id,instr(rtrim(meat_id), ' ', -1)+1)) as meat_id
from ( select taco_id,
taco_name,
level-1 "level",
sys_connect_by_path(meat_id, ' ') meat_id
from taco
start with taco_prntid is null
connect by prior taco_id = taco_prntid
) x
I can post some failed attempts to modify my query above but they're rather embarrassing failures. I haven't worked with hierarchical queries at all before beyond the basics so I'm hoping there is some keyword or concept I'm not aware I should be searching for.
I posted an answer myself down at the bottom to show what I ended up with ultimately. I'm leaving the other answer as accepted because they were able to make the data more clear for me and without it, I wouldn't have gotten anywhere.
Your inner query is correct. All you need is to pick only the rightmost number from the meat_id column of inner query, when flag is Y.
I have used REGEXP_SUBSTR function to get the rightmost number and CASE statement to check the flag.
Query 1:
select taco_id,
taco_name,
taco_prntid,
case meat_inht
when 'N' then meat_id
when 'Y' then to_number(regexp_substr(meat_id2,'\d+\s*$'))
end meat_id,
meat_inht
from ( select taco_id,
taco_name,
taco_prntid,
meat_id,
meat_inht,
level-1 "level",
sys_connect_by_path(meat_id, ' ') meat_id2
from taco
start with taco_prntid is null
connect by prior taco_id = taco_prntid
)
order by 1
Results:
| TACO_ID | TACO_NAME | TACO_PRNTID | MEAT_ID | MEAT_INHT |
|---------|-----------|-------------|---------|-----------|
| 1 | 1 | (null) | 1 | N |
| 2 | 1.1 | 1 | 1 | Y |
| 3 | 1.1.1 | 2 | (null) | N |
| 4 | 1.2 | 1 | 2 | N |
| 5 | 1.2.1 | 4 | 2 | Y |
| 6 | 1.1.2 | 2 | 1 | Y |
Query 2:
select taco_id,
taco_name,
taco_prntid,
meat_id,
meat_inht,
level-1 "level",
sys_connect_by_path(meat_id, ' ') meat_id2
from taco
start with taco_prntid is null
connect by prior taco_id = taco_prntid
Results:
| TACO_ID | TACO_NAME | TACO_PRNTID | MEAT_ID | MEAT_INHT | LEVEL | MEAT_ID2 |
|---------|-----------|-------------|---------|-----------|-------|----------|
| 1 | 1 | (null) | 1 | N | 0 | 1 |
| 2 | 1.1 | 1 | (null) | Y | 1 | 1 |
| 3 | 1.1.1 | 2 | (null) | N | 2 | 1 |
| 6 | 1.1.2 | 2 | (null) | Y | 2 | 1 |
| 4 | 1.2 | 1 | 2 | N | 1 | 1 2 |
| 5 | 1.2.1 | 4 | (null) | Y | 2 | 1 2 |
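To cross-check the expected meat_id values independently of the SQL, the inheritance rule from the question can be sketched in a few lines of Python. The `resolve_meat` helper and the dictionary layout are hypothetical. The sketch follows the rule as stated in the question (walk up to the parent while meat_inht is 'Y'), not the path-string parsing used in the query.

```python
def resolve_meat(tacos):
    """Return {taco_id: effective meat_id}, following parents while meat_inht is 'Y'."""
    by_id = {t["taco_id"]: t for t in tacos}

    def lookup(taco_id):
        row = by_id[taco_id]
        # An inheriting row defers to its parent; anything else uses its own value.
        if row["meat_inht"] == "Y" and row["taco_prntid"] is not None:
            return lookup(row["taco_prntid"])
        return row["meat_id"]

    return {taco_id: lookup(taco_id) for taco_id in by_id}


columns = ("taco_id", "taco_name", "taco_prntid", "meat_id", "meat_inht")
rows = [
    (1, "1", None, 1, "N"),
    (2, "1.1", 1, None, "Y"),
    (3, "1.1.1", 2, None, "N"),
    (4, "1.2", 1, 2, "N"),
    (5, "1.2.1", 4, None, "Y"),
    (6, "1.1.2", 2, None, "Y"),
]
tacos = [dict(zip(columns, r)) for r in rows]

effective = resolve_meat(tacos)
print(effective)
```

The values agree with the MEAT_ID column in the results of Query 1.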
This is what I've ended up with so far, after applying the logic in the accepted answer. I added a few more things so that I can join the result against my meat table. The outer CASE could be optimized a little, but I am so over this part of the query that it's going to have to stay for now.
select x.taco_id,
x.taco_name,
x.taco_prntname,
meat_id
,case when to_number(regexp_substr(meat_id,'\d+\s*$'))=0 then null else
to_number(regexp_substr(meat_id,'\d+\s*$')) end as meat_id
from ( select taco_id,
taco_name,
taco_prntname,
level-1 "level",
sys_connect_by_path(
case when meat_inht='N' then nvl(to_char(meat_id),'0') else '' end
,' ') meat_id
from taco join jobdtl on jobdtl.jobdtl_id=taco.jobdtl_id
start with taco_prntid is null
connect by prior taco_id = taco_prntid
) x
(do you ever wonder, when you read questions like this, what the real schema is? obviously I am not working on a taco project. or does it even matter as long as the general relationships and concept is preserved?)
