Hive date format matching - hadoop

How can i match particular date format in hive query, as i have to get those rows having date format other than max of rows.
Eg. My max of rows have date format as MM/dd/yyyy and i have to list all rows other than above format
+----------------------------+--------------------------+--------------------+-----------+
| AllocationActBankAccountID | GiftCardActBankAccountID | UpdateTimeStampUtc | Date |
+----------------------------+--------------------------+--------------------+-----------+
| 14 | 14 | 41:39.8 | 4/19/2016 |
| 14 | 14 | 37:16.4 | 4/20/2016 |
| 14 | 14 | 52:15.2 | 4/21/2016 |
| 14 | 14 | 52:15.2 | 2/11/2019 |
| 14 | 14 | 52:15.2 | 12-Feb-19 |*
| 14 | 14 | 41:39.8 | 2/13/2019 |
+----------------------------+--------------------------+--------------------+-----------+
I want to get * marked data (Date = 12-Feb-19)

select *
from mytable
Where date not rlike '^([1-9]|1[0-2])/([1-9]|[1-2][0-9]|3[0-1])/(19|20)\\d{2}$'
or
select *
from mytable
Where not
(
date rlike '^\\d{1,2}/\\d{1,2}/\\d{4}$'
and cast(split (date,'/')[0] as int) between 1 and 12
and cast(split (date,'/')[1] as int) between 1 and 31
and cast(split (date,'/')[2] as int) between 1900 and 2099
)
or
select date
from mytable
Where coalesce(from_unixtime(to_unix_timestamp(date,'M/d/y')),'0000-01-01')
not between date '1900-01-01' and date '2099-12-31'

Related

Performance drop returning cursor with union all

I'm facing unsolvable and impossible performace drop while using UNION ALL with two sub-queries in one cursor (at least I think that's the problem). PL/SQL Developer just freezes when opening cursor results in test window.
If I turn off no matter which sub-query - everything works fine.
If I take the whole query out of cursor to regular SQL Query windows - everything is okay without any need to turn off some parts.
Procedure structure is down below, looking forward any help:
procedure p_proc(p_param varchar2,
outcur out sys_refcursor) is
begin
open outcur for
select *
from (select -- visible cols
si.item_full_name
, si.final_price
, si.full_price
, si.receipt_num
, si.receipt_date
, si.vendor_code
, case when det.br_summary is null and mr.motiv_rate_value is not null then mr.motiv_rate_value
when det.br_summary is not null then det.br_summary
end personal_bonus_amount
, case when det.br_summary is null and mr.motiv_rate_value is not null then 1
when det.br_summary is not null then det.cross_sale_kt
end personal_bonus_koeff
-- service cols
, case when det.br_summary is null and mr.motiv_rate_value is not null then 'approximate'
when det.br_summary is not null then 'definite'
end personal_bonus_type
, coalesce(det.sale_stream, mr.sale_stream, 'Not defined') item_group_name
, si.operation_type
, si.src
-- pagination
, row_number() over (order by si.receipt_date desc) rn
from (-- curr day
select b.cost final_price
, case when b.discount = 0 then null else b.price
end full_price
, b.doc_number receipt_num
, b.receipt_date receipt_date
, i.item_code vendor_code
, i.full_name item_full_name
, b.subsite code_op
, b.operator_id
, to_char(b.businessday, 'yyyymm') sale_period
, b.oper_type operation_type
, 'bill' src
from scheme.bills b
join scheme.items i on i.item_code = b.item
where b.businessday = trunc(p_date_to)
and b.subsite = p_office_id
and b.operator_id = p_emp_id
union all
-- prev days
select l.txn_amount final_price
, case when l.disc = 0 then null else l.price
end full_price
, t.receipt_num receipt_num
, t.ts receipt_date
, i.item_code vendor_code
, i.full_name item_full_name
, s.office_code code_op
, e.emp_code operator_id
, to_char(l.dt,'yyyymm') sale_period
, l.txn_type operation_type
, 'txn' src
from scheme.txn t
join scheme.txn_lines l on t.rtl_txn_id = l.rtl_txn_id
join scheme.items i on l.item_id = i.item_id
join scheme.offices s on t.subsite_id = s.subsite_id
join scheme.employees e on t.employee_id = e.employee_id
where t.ts between trunc(p_date_from) and trunc(p_date_to)
and t.subsite_id = v_op_id
and t.employee_id = v_emp_id
) si
/* fact */
left join scheme.sales_details det on si.sale_period = det.period
and si.code_op = det.op_code
and ltrim(si.operator_id,'0') = ltrim(det.tab_num,'0')
and si.receipt_num = det.rcpt_num
and si.vendor_code = det.item_article
/* prognosis */
left join scheme.rates mr on si.sale_period = mr.motiv_rate_period
and si.code_op = mr.code_op
and si.vendor_code = mr.code_1c
where 1 = 1
and si.final_price between nvl(p_price_from, si.final_price) and nvl(p_price_to, si.final_price)
/* if no filters */
and (item_group_cnt = 0 or coalesce(det.sale_stream, mr.sale_stream, 'Not defined') in (select * from table(p_item_group)))
and si.receipt_num = nvl(p_receipt_num, si.receipt_num)
)
where rn between p_page_num * p_page_size + 1 and (p_page_num + 1) * p_page_size;
end;
UPD Explain plan for the whole query used in a cursor:
----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 32810 | 62 | 00:00:01 |
| * 1 | VIEW | | 10 | 32810 | 62 | 00:00:01 |
| * 2 | WINDOW SORT PUSHED RANK | | 2 | 2956 | 62 | 00:00:01 |
| 3 | NESTED LOOPS OUTER | | 2 | 2956 | 61 | 00:00:01 |
| 4 | NESTED LOOPS OUTER | | 2 | 2826 | 53 | 00:00:01 |
| 5 | VIEW | | 2 | 2728 | 46 | 00:00:01 |
| 6 | UNION-ALL | | | | | |
| 7 | NESTED LOOPS | | 1 | 138 | 32 | 00:00:01 |
| 8 | NESTED LOOPS | | 1 | 138 | 32 | 00:00:01 |
| 9 | PARTITION RANGE SINGLE | | 1 | 66 | 29 | 00:00:01 |
| * 10 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | F003_BILL | 1 | 66 | 29 | 00:00:01 |
| * 11 | INDEX RANGE SCAN | IX_SUBSITE_DOCNUM_BUSINDAY_SEQ | 1 | | 5 | 00:00:01 |
| * 12 | INDEX RANGE SCAN | IX_D001_CODE_1C_ITEM_ID | 1 | | 2 | 00:00:01 |
| 13 | TABLE ACCESS BY INDEX ROWID | D001_ITEM | 1 | 72 | 3 | 00:00:01 |
| 14 | NESTED LOOPS | | 1 | 183 | 14 | 00:00:01 |
| 15 | NESTED LOOPS | | 1 | 183 | 14 | 00:00:01 |
| 16 | NESTED LOOPS | | 1 | 104 | 12 | 00:00:01 |
| 17 | NESTED LOOPS | | 1 | 70 | 7 | 00:00:01 |
| 18 | NESTED LOOPS | | 1 | 30 | 4 | 00:00:01 |
| 19 | TABLE ACCESS BY INDEX ROWID | D005_EMPLOYEE | 1 | 18 | 3 | 00:00:01 |
| * 20 | INDEX UNIQUE SCAN | PK_D005 | 1 | | 2 | 00:00:01 |
| 21 | TABLE ACCESS BY INDEX ROWID | D018_SUBSITE | 1 | 12 | 1 | 00:00:01 |
| * 22 | INDEX UNIQUE SCAN | PK_D018 | 1 | | 0 | 00:00:01 |
| 23 | PARTITION RANGE ITERATOR | | 1 | 40 | 3 | 00:00:01 |
| 24 | PARTITION HASH SINGLE | | 1 | 40 | 3 | 00:00:01 |
| * 25 | TABLE ACCESS FULL | F007_RTL_TXN | 1 | 40 | 3 | 00:00:01 |
| * 26 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | F008_RTL_TXN_LI | 1 | 34 | 5 | 00:00:01 |
| * 27 | INDEX RANGE SCAN | IX_F008_RTL_TXN_ID | 7 | | 3 | 00:00:01 |
| * 28 | INDEX UNIQUE SCAN | PK_D001 | 1 | | 1 | 00:00:01 |
| 29 | TABLE ACCESS BY INDEX ROWID | D001_ITEM | 1 | 79 | 2 | 00:00:01 |
| * 30 | TABLE ACCESS BY INDEX ROWID BATCHED | T_OP_MOTIVATION_RATE_MYRTK | 1 | 49 | 7 | 00:00:01 |
| * 31 | INDEX RANGE SCAN | IDX02_CODE_OP_1C | 3 | | 3 | 00:00:01 |
| * 32 | TABLE ACCESS BY INDEX ROWID BATCHED | DET_SALES_PPT_DWH | 1 | 65 | 4 | 00:00:01 |
| * 33 | INDEX RANGE SCAN | IDX_03_RCPT_NUM | 3 | | 2 | 00:00:01 |
----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 1 - filter("RN">=1 AND "RN"<=10)
* 2 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("SI"."RECEIPT_DATE") DESC )<=10)
* 10 - filter("F003"."OPERATOR_ID"='000189513' AND "F003"."COST">=TO_NUMBER(TO_CHAR("F003"."COST")) AND "F003"."COST"<=TO_NUMBER(TO_CHAR("F003"."COST")))
* 11 - access("F003"."SUBSITE"='S165' AND "F003"."BUSINESSDAY"=TO_DATE(' 2021-11-23 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
* 11 - filter("F003"."BUSINESSDAY"=TO_DATE(' 2021-11-23 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "F003"."DOC_NUMBER" IS NOT NULL)
* 12 - access("I"."D001_CODE_1C"="F003"."ITEM")
* 12 - filter("I"."D001_CODE_1C" IS NOT NULL)
* 20 - access("E"."EMPLOYEE_ID"=3561503543)
* 22 - access("S"."SUBSITE_ID"=29260)
* 25 - filter("T"."EMPLOYEE_ID"=3561503543 AND "T"."SUBSITE_ID"=29260 AND "T"."F007_TS"<=TO_DATE(' 2021-11-23 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "T"."F007_RCPT_NUM_1C" IS NOT NULL)
* 26 - filter("L"."F008_AMOUNT">=TO_NUMBER(TO_CHAR("L"."F008_AMOUNT")) AND "L"."F008_AMOUNT"<=TO_NUMBER(TO_CHAR("L"."F008_AMOUNT")))
* 27 - access("T"."RTL_TXN_ID"="L"."RTL_TXN_ID")
* 28 - access("L"."ITEM_ID"="I"."ITEM_ID")
* 30 - filter("SI"."SALE_PERIOD"="MR"."MOTIV_RATE_PERIOD"(+))
* 31 - access("SI"."CODE_OP"="MR"."CODE_OP"(+) AND "SI"."VENDOR_CODE"="MR"."CODE_1C"(+))
* 32 - filter("SI"."CODE_OP"="DET"."OP_CODE"(+) AND "SI"."VENDOR_CODE"="DET"."ITEM_ARTICLE"(+) AND "DET"."ITEM_ARTICLE"(+) IS NOT NULL AND "DET"."PERIOD"(+)=TO_NUMBER("SI"."SALE_PERIOD") AND
LTRIM("SI"."OPERATOR_ID",'0')=LTRIM("DET"."TAB_NUM_RTK"(+),'0'))
* 33 - access("SI"."RECEIPT_NUM"="DET"."RCPT_NUM"(+))
* 33 - filter("DET"."RCPT_NUM"(+) IS NOT NULL)
Actual solution
Managed to get procedure execution plan from DBA. The problem was that optimizer chose another index for joining scheme.sales_details table when executing query inside the procedure. Added INDEX HINT with the same index which was used in regular query and everything works just fine.
Deprecated ideas down below
As far as I understood the problem is in Oracle optimizer which "thought" that doing UNION ALL first is better than pushing predicate into the sub-query. Separating this union into two single queries make him push pred without any hesitations.
Probably this can be fixed by playing with hints, that's wip for now.
Temporary workaround is to regroup the query, going from this structure
select *
from (select row_number() rn
, u.*
from (select *
from first_query
union all
select *
from second_query) u
-- some joins
join first_table ft
join second_table st
-- predicate block
where 1=1
and a = b
)
where rn between c and d;
to this
select *
from (select row_number() rn
, u.*
from (select *
from first_query) u
-- some joins
join first_table ft
join second_table st
-- predicate block
where 1=1
and a = b
union all
select row_number() rn
, u.*
from (select *
from second_query) u
-- some joins
join first_table ft
join second_table st
-- predicate block
where 1=1
and a = b
)
where rn between c and d;
That's not the perfect solution cause it doubles the JOIN section but at least it works.

How do I fetch data from oracle table for interval 5 hour

This question is for database oracle, I have table A which have data for one week for every hour like below
----------------------------
NAME | CODE | INSERTDATE (Timestamp- datatype)
----------------------------
YYY | Y | 11/8/2017 1:32:22.000000000 PM
zzz | Z | 11/8/2017 2:32:22.000000000 PM
aaa | A | 11/8/2017 3:32:22.000000000 PM
bbb | B | 11/8/2017 4:32:22.000000000 PM
ccc | C | 11/8/2017 5:32:22.000000000 PM
SSS | S | 11/8/2017 6:32:22.000000000 PM
... | . | ..............................
... | . | ..............................
... | . | ..............................
RRR | R | 11/8/2017 11:32:22.000000000 PM
table have data for duration of one week and I am looking to select every record after 5 hour. (so total records are 168 and I need output 168/5= 33 or 34 records) like below
For Ex
table A
----------------------------
NAME | CODE | INSERTDATE (Timestamp- datatype)
----------------------------
YYY | Y | 11/8/2017 1:32:22.000000000 PM
SSS | S | 11/8/2017 6:32:22.000000000 PM
RRR | R | 11/8/2017 11:32:22.000000000 PM
... | . | ..............................
... | . | ..............................
... | . | ..............................
Please suggest how can I get this.
Thanks
For the datatype timestamp the syntax of adding time is + INTERVAL '5' HOUR or just use the minus symbol to subtract. It may be used anywhere you need such a calculation as the following example shows:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE Table1
(NAME varchar2(3), CODE varchar2(1), INSERTDATE timestamp)
;
INSERT ALL
INTO Table1 ("NAME", "CODE", "INSERTDATE")
VALUES ('YYY', 'Y', '08-Nov-2017 01:32:22 PM')
INTO Table1 ("NAME", "CODE", "INSERTDATE")
VALUES ('SSS', 'S', '08-Nov-2017 06:32:22 PM')
INTO Table1 ("NAME", "CODE", "INSERTDATE")
VALUES ('RRR', 'R', '08-Nov-2017 11:32:22 PM')
SELECT * FROM dual
;
Query 1:
select
*
from table1 t1
left join table1 t2 on t1.INSERTDATE = (t2.INSERTDATE + interval '5' HOUR)
Results:
| NAME | CODE | t1.INSERTDATE | NAME | CODE | t2.INSERTDATE |
|------|------|-----------------------|--------|--------|-----------------------|
| SSS | S | 2017-11-08 18:32:22.0 | YYY | Y | 2017-11-08 13:32:22.0 |
| RRR | R | 2017-11-08 23:32:22.0 | SSS | S | 2017-11-08 18:32:22.0 |
| YYY | Y | 2017-11-08 13:32:22.0 | (null) | (null) | (null) |

Hive - same sequence of records in array

I have a table with data at hour level. I want to find the count of hours and the values for col1 and col2 for all hours in an array. Input Table
+-----+-----+-----+
| hour| col1| col2|
+-----+-----+-----+
| 00 | 0.0 | a |
| 04 | 0.1 | b |
| 08 | 0.2 | c |
| 12 | 0.0 | d |
+-----+-----+-----+
I am using the below query to get the column values in an array
Query:
select count(hr), map_values(str_to_map(concat_ws(',',collect_set(concat_ws(':',reflect('java.util.UUID','randomUUID'),cast(col1 as string)))))) as col1_arr, map_values(str_to_map(concat_ws(',',collect_set(concat_ws(':',reflect('java.util.UUID','randomUUID'),cast(col2 as string)))))) as col2_arr from table;
Output that i am getting, values in col2_arr are not in the same sequence with col1_arr. Please suggest how can i get the values in array/list for different columns in same sequence.
+----------+-----------------+----------+
| count(hr)| col1_arr | col2_arr |
+----------+-----------------+----------+
| 4 | 0.0,0.1,0.2,0.0 | b,a,c,d |
+----------+----------------+-----------+
Required output:
+----------+-----------------+----------+
| count(hr)| col1_arr | col2_arr |
+----------+-----------------+----------+
| 4 | 0.0,0.1,0.2,0.0 | a,b,c,d |
+----------+----------------+-----------+
Thanks
select count(*) as cnt
,concat_ws(',',sort_array(collect_list(hour))) as hour
,regexp_replace(concat_ws(',',sort_array(collect_list(concat_ws(':',hour,cast(col1 as string))))),'..:','') as col1
,regexp_replace(concat_ws(',',sort_array(collect_list(concat_ws(':',hour,col2)))),'..:','') as col2
from mytable
;
+-----+-------------+-------------+---------+
| cnt | hour | col1 | col2 |
+-----+-------------+-------------+---------+
| 4 | 00,04,08,12 | 0,0.1,0.2,0 | a,b,c,d |
+-----+-------------+-------------+---------+

Hive Query for ROlling total based on 2 fields

I have a table a show below
Date | Customer | Count | Daily_Count | ITD_Count
d1 | A | 3 | 3 |
d2 | B | 4 | 4 |
d3 | A | 7 | 16 |
d3 | B | 9 | 16 |
d4 | A | 8 | 9 |
d4 | B | 1 | 9 |
Descrption of Fields:
Date : date
customer : name of customer
Count : # of customers
daily_Count : # of customers on daily basis calculated as
SUM(count) OVER (partition BY date )as Daily_Count
Question :
How do I calculate the Running Total or Rolling Total in the ITD_Count ?
The output should look like
Date | Customer | Count | Daily_Count | ITD_Count
d1 | A | 3 | 3 | 3
d2 | B | 4 | 4 | 7
d3 | A | 7 | 16 | 23
d3 | B | 9 | 16 | 23
d4 | A | 8 | 9 | 31
d4 | B | 1 | 9 | 31
I have tried several variations of using the Window functionality.. But hit a road-block in all my attempts.
Attempt 1 ;
SUM(daily_COunt) OVER (partition BY date order by date rows between unbounded preceding and current row ) as ITD_account_linking
Attempt 2 :
SUM(daily_COunt) OVER (partition BY date, daily_count order by date rows between unbounded preceding and current row ) as ITD_account_linking
and several more attempts following this. :(
Any possible suggestions to guide me in the right direction are welcome.
Please let me know if you need more details.
Use Hive Windowing and Analytics functions.
SELECT Date, Customer, Count, Daily_Count,
SUM(Daily_Count) OVER (ORDER BY Date ROWS UNBOUNDED PRECEDING) AS ITD_Count
FROM table;

ORACLE: INSERT SELECT FROM 2 views and value from param

I'm trying to insert some fields into MYTABLE from views MYVIEW1 and MYVIEW2 and then add a value from a parameter (this is part of a stored procedure) for UPDATED_BY, SYSDATE for UPDATED_ON. How can I correctly do this with INSERT SELECT or some other way entirely?
MYVIEW1
+------+----+-----+-----------+---------+
| YR | MO | QTR | USER_CODE | MO_PERF |
+------+----+-----+-----------+---------+
| 2012 | 1 | 1 | 1099 | 89 |
| 2012 | 2 | 1 | 1099 | 86 |
| 2012 | 3 | 1 | 1099 | 95 |
+------+----+-----+-----------+---------+
MYVIEW2
+------+-----+-----------+----------+
| YR | QTR | USER_CODE | QTR_PERF |
+------+-----+-----------+----------+
| 2012 | 1 | 1099 | 90 |
+------+-----+-----------+----------+
MYTABLE
+------+-----+-----------+---------+---------+---------+---------+-------------+------------+
| YR | QTR | USER_CODE | MO1_PCT | MO2_PCT | MO3_PCT | INC | UPDATED_BY | UPDATED_ON |
+------+-----+-----------+---------+---------+---------+---------+-------------+------------+
| 2012 | 1 | 1099 | 89 | 86 | 95 | 7000 | SAMPLE NAME | 01/16/2013 |
+------+-----+-----------+---------+---------+---------+---------+-------------+------------+
INSERT INTO MYTABLE
(YR,QTR,USER_CODE,MO1_PCT,MO2_PCT,MO3_PCT,INC,UPDATED_BY,UPDATED_ON)
SELECT b.YR,b.QTR,b.USER_CODE,b.MO1_PCT,b.MO2_PCT,b.MO3_PCT,c.INC
FROM MYVIEW1 b,
MYVIEW2 c
How do I insert values for (first month of QTR's MO_PERF) as MO1_PCT and (second month of QTR's MO_PERF) as MO2_PCT and (last month of QTR's MO_PERF) as MO3_PCT, making sure that I've inserted the right month within the right quarter and year.And then check if the MO_PERF values of each month has reached at least 85, else set INC as NULL.
,CASE WHEN MO1_PCT>=85 AND MO2_PCT>=85 AND MO3_PCT>=85 THEN 7000
ELSE NULL
END INC
If you're using oracle 11g then you can use PIVOT like this:
select YR, QTR, USER_CODE, "1_MO_PCT" MO1_PCT, "2_MO_PCT" MO2_PCT, "3_MO_PCT" MO3_PCT ,
case when "1_MO_PCT" >= 85 and "2_MO_PCT" >= 85 and "2_MO_PCT" >= 85 then 7000 end INC,
user updated_by, sysdate updated_on
from (
select m1.yr, m1.mo, m1.qtr, m1.user_code, m1.mo_perf, m2.qtr_perf
from myview1 m1 join myview2 m2 on m1.yr=m2.yr
and m1.qtr = m2.qtr and m1.user_code = m2.user_code )t
pivot(
max(mo_perf) MO_PCT for mo in (1,2,3)
)
Here is a sqlfiddle demo

Resources