Recursive Oracle query to calculate two values that depend on each other - oracle

I have this table with materials stock data:
| Date (MM/DD) | received_qty | returned_qty | used_qty |
|--------------|--------------|--------------|----------|
| 01/01        | 5000         | 0            | 3500     |
| 01/02        | 0            | 0            | 1500     |
| 01/03        | 7500         | 0            | 1250     |
| 01/04        | 0            | 0            | 0        |
I need to add two more columns, "start quantity" and "daily stock", calculated as follows:
"start quantity" is zero on the first day; after that it is the previous day's "daily stock".
"daily stock" is "start quantity" + received_qty - returned_qty - used_qty.
As you can see, each value depends on the other...
So the data would look like this after adding these two columns:
| Date (MM/DD) | start_qty | received_qty | returned_qty | used_qty | daily_stock |
|--------------|-----------|--------------|--------------|----------|-------------|
| 01/01        | 0         | 5000         | 0            | 3500     | 1500        |
| 01/02        | 1500      | 0            | 0            | 1500     | 0           |
| 01/03        | 0         | 7500         | 0            | 1250     | 6250        |
| 01/04        | 6250      | 0            | 0            | 0        | 6250        |
I'm sure these columns can be generated using recursive queries with the START WITH and CONNECT BY clauses present in Oracle, but I'm struggling with the script...

Try this query. You don't need recursion here: both new columns are running sums of received_qty - returned_qty - used_qty; daily_stock includes the current row, and start_quantity is the same running sum stopped one row earlier.
select t.*,
       -- daily_stock: running total of received - returned - used, including this row
       coalesce(
         sum("received_qty" - "returned_qty" - "used_qty")
           over (order by "Date"),
         0) as daily_stock,
       -- start_quantity: the same running total, but stopping at the previous row
       coalesce(
         sum("received_qty" - "returned_qty" - "used_qty")
           over (order by "Date"
                 rows between unbounded preceding and 1 preceding),
         0) as start_quantity
from table1 t
Demo http://sqlfiddle.com/#!4/94417/13
| Date | received_qty | returned_qty | used_qty | DAILY_STOCK | START_QUANTITY |
|-----------------------|--------------|--------------|----------|-------------|----------------|
| 2001-01-01 00:00:00.0 | 5000 | 0 | 3500 | 1500 | 0 |
| 2001-01-02 00:00:00.0 | 0 | 0 | 1500 | 0 | 1500 |
| 2001-01-03 00:00:00.0 | 7500 | 0 | 1250 | 6250 | 0 |
| 2001-01-04 00:00:00.0 | 0 | 0 | 0 | 6250 | 6250 |
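If you specifically want the recursive formulation the question asks about, START WITH / CONNECT BY is awkward for carrying a running value, but a recursive WITH clause (Oracle 11gR2 and later) expresses the dependency directly. A minimal sketch, assuming the same table1 with one row per "Date" and the quoted column names used above:

with ordered as (
  select t.*, row_number() over (order by "Date") as rn
  from table1 t
),
stock (rn, "Date", "received_qty", "returned_qty", "used_qty", start_qty, daily_stock) as (
  -- anchor: the first day starts at zero
  select rn, "Date", "received_qty", "returned_qty", "used_qty",
         0,
         "received_qty" - "returned_qty" - "used_qty"
  from ordered
  where rn = 1
  union all
  -- recursive step: each later day starts with the previous day's daily stock
  select o.rn, o."Date", o."received_qty", o."returned_qty", o."used_qty",
         s.daily_stock,
         s.daily_stock + o."received_qty" - o."returned_qty" - o."used_qty"
  from stock s
  join ordered o on o.rn = s.rn + 1
)
select "Date", start_qty, "received_qty", "returned_qty", "used_qty", daily_stock
from stock
order by rn;

Both approaches return the same rows; the analytic version above is usually the simpler and faster one.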

Related

Distinct aggregation in pre-calculated measure (MDX)

There are two measures in one fact table / dimension. The measure 'YearTotal' should somehow be pre-calculated as a distinct value for any further summing (aggregating). 'YearTotal' can't be derived from the 'YearDetail' measure, so they are completely independent.
+--------------+---------------+-----------+------------+
| AccountingID | Date | TotalYear | YearDetail |
+--------------+---------------+-----------+------------+
| account1 | 31.12.2012 | 500 | 7 |
| account1 | 31.12.2012 | 500 | 3 |
| account1 | 31.12.2012 | 500 | 1 |
| account2 | 31.12.2012 | 900 | 53 |
| account2 | 31.12.2012 | 900 | 4 |
| account2 | 31.12.2012 | 900 | 9 |
| account3 | 31.12.2012 | 203 | 25 |
| account3 | 31.12.2012 | 203 | 11 |
| account3 | 31.12.2012 | 203 | 17 |
+--------------+---------------+-----------+------------+
So, the question: what should the (pre)calculated measure expression be to get a result like this?
select
(
  [Accounting Dim].[Account ID],
  [Dim Date].[Calendar Year].&[2012]
) ON COLUMNS
from [Cube]
WHERE [Measures].[YearTotal]
With a correct expression the answer would be (500+900+203) = 1603.
(And optionally: maybe there is a common distinct-value pattern that works for other simple types of aggregation.)
Maybe go with MAX over the [TotalYear] measure and enforce a specific level of calculation:
CREATE MEMBER [Measures].[YearTotal] AS
  MAX(
    (
      [Calendar Dim].[Year].[All].Children,
      [Accounting Dim].[Account ID].[All].Children
    ),
    [Measures].[TotalYear]
  )

Recording particular cell of a table on the basis of a certain condition using grep

I have a number of tables that look as follows:
time | node | left |LP iter|LP it/n|mem/heur|mdpt |vars |cons |rows |cuts |sepa|confs|strbr| dualbound | primalbound | gap | compl.
0.0s| 1 | 0 | 100 | - | 1046k | 0 | 100 | 102 | 100 | 0 | 0 | 0 | 0 | -- | 9.999990e+05*| Inf | unknown
* 0.3s| 1 | 0 | 100 | - | LP | 0 | 200 | 102 | 100 | 0 | 0 | 0 | 0 | -- | 5.587300e+04 | Inf | unknown
12.0s| 1 | 0 | 239 | - | 1781k | 0 | 239 | 102 | 100 | 0 | 0 | 0 | 0 | 5.577800e+04 | 5.587300e+04 | 0.17%| unknown
12.1s| 1 | 0 | 287 | - | 2595k | 0 | 239 | 102 | 935 | 835 | 1 | 0 | 0 | 5.577800e+04 | 5.587300e+04 | 0.17%| unknown
66.8s| 1 | 0 | 422 | - | 3061k | 0 | 336 | 102 | 935 | 835 | 1 | 0 | 0 | 5.577800e+04 | 5.587300e+04 | 0.17%| unknown
89.4s| 1 | 0 | 481 | - | 3218k | 0 | 361 | 102 | 935 | 835 | 1 | 0 | 0 | 5.580100e+04 | 5.587300e+04 | 0.13%| unknown
89.5s| 1 | 0 | 579 | - | 3513k | 0 | 361 | 102 |1335 |1235 | 2 | 0 | 0 | 5.580100e+04 | 5.587300e+04 | 0.13%| unknown
100s| 1 | 0 | 715 | - | 3837k | 0 | 403 | 102 |1335 |1235 | 2 | 0 | 0 | 5.583250e+04 | 5.587300e+04 | 0.07%| unknown
I'm interested in recording the first numeric value in the gap column (the second-to-last column of the table). The gap column can contain either Inf or x.xx% values. If all the values in the gap column are Inf, then I would simply record Inf; otherwise, I would like to record the first numeric value. For example, in the above table the value I would like to record is 0.17. I tried many different ways but couldn't achieve any success. It would be really great if someone could provide some guidance on how to achieve this. Thanks!
You may use this awk solution:
awk -F '[[:blank:]]*\\|[[:blank:]]*' '
# skip the header line; keep updating v until it holds a numeric value
NR > 1 && (!v || v == "Inf") {
  # gap is the second-to-last field; adding 0 strips the trailing "%" from numeric values
  v = ($(NF-1) == "Inf" ? $(NF-1) : $(NF-1)+0)
}
END {
  print v
}' file
0.17

Which model should I use? xtlogit or xtprobit

I have the following panel data set with very large N (500,000) and small T (15 years). My dependent variable is project_1 or project_2. I want to estimate the likelihood of a project as a function of treated, with year and village fixed effects. For a continuous dependent variable I was using reghdfe.
The dependent variable is simply a dummy that becomes 1 when a village gets the project and remains 1 for the subsequent years.
I am aware that I cannot use the "probit" command in Stata as I have a panel. Can you suggest which model I should use?
| village | population | year | project_1 | project_2 | treated |
|---------|------------|------|-----------|-----------|-----------|
| A | 100 | 2001 | 0 | 0 | 0 |
| A | 100 | 2002 | 1 | 0 | 0 |
| A | 100 | 2003 | 1 | 0 | 1 |
| A | 100 | 2004 | 1 | 0 | 1 |
| A | 100 | 2005 | 1 | 0 | 1 |
| B | 200 | 2001 | 0 | 0 | 0 |
| B | 200 | 2002 | 0 | 0 | 1 |
| B | 200 | 2003 | 0 | 1 | 1 |
| B | 200 | 2004 | 0 | 1 | 1 |
| B | 200 | 2005 | 0 | 1 | 1 |
| C | 150 | 2001 | 0 | 0 | 0 |
| C | 150 | 2002 | 0 | 0 | 0 |
| C | 150 | 2003 | 0 | 0 | 0 |
| C | 150 | 2004 | 1 | 0 | 0 |
| C | 150 | 2005 | 1 | 0 | 1 |
| D | 175 | 2001 | 0 | 0 | 0 |
| D | 175 | 2002 | 0 | 0 | 0 |
| D | 175 | 2003 | 0 | 0 | 0 |
| D | 175 | 2004 | 0 | 0 | 1 |
| D | 175 | 2005 | 0 | 0 | 1 |
Your question has two parts: which of logit and probit is more appropriate for you, and how to implement the appropriate model in Stata. As @NickCox mentioned, the former is most appropriate for Cross Validated, and has received robust discussion there: Difference between logit and probit models.

Query running long despite having "good" plan

I have a UNION query which joins multiple tables. All the stats are updated. The query used to run fine and completed within 30-40 seconds, but on 12c it's taking an insane amount of time (roughly 5 to 10 minutes).
I have captured the actual plan.
Plan hash value: 1038754298
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | O/1/M |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 50 |00:00:49.93 | 960K| 7 | | | |
| 1 | SORT ORDER BY | | 1 | 1791 | 50 |00:00:49.93 | 960K| 7 | 267K| 267K| 1/0/0|
| 2 | UNION-ALL | | 1 | | 2648 |00:00:49.93 | 960K| 7 | | | |
| 3 | NESTED LOOPS SEMI | | 1 | 1 | 2648 |00:00:42.36 | 498K| 0 | | | |
|* 4 | HASH JOIN | | 1 | 1589 | 9111 |00:00:42.22 | 469K| 0 | 7900K| 2299K| 1/0/0|
|* 5 | HASH JOIN | | 1 | 1140K| 50569 |00:00:19.50 | 342K| 0 | 8321K| 2379K| 1/0/0|
|* 6 | HASH JOIN | | 1 | 44335 | 59660 |00:00:16.01 | 252K| 0 | 1817K| 1817K| 1/0/0|
|* 7 | TABLE ACCESS FULL | PROMHEAD | 1 | 870 | 870 |00:00:00.01 | 602 | 0 | | | |
|* 8 | HASH JOIN | | 1 | 1168K| 172K|00:00:15.94 | 251K| 0 | 1148K| 1148K| 1/0/0|
|* 9 | TABLE ACCESS FULL | PRICE_BATCH_TRAN | 1 | 857 | 857 |00:00:00.01 | 315 | 0 | | | |
| 10 | TABLE ACCESS FULL | PROMSTORE | 1 | 36M| 36M|00:00:04.64 | 251K| 0 | | | |
|* 11 | TABLE ACCESS FULL | PROMSKU | 1 | 6665K| 6665K|00:00:01.49 | 90224 | 0 | | | |
| 12 | PARTITION RANGE ALL | | 1 | 37M| 37M|00:00:10.72 | 127K| 0 | | | |
| 13 | INDEX FAST FULL SCAN | PK_ITEM_ZONE_PRICE | 24 | 37M| 37M|00:00:04.75 | 127K| 0 | | | |
| 14 | PARTITION RANGE ITERATOR | | 9111 | 32376 | 2648 |00:00:00.13 | 28752 | 0 | | | |
| 15 | PARTITION HASH ITERATOR | | 9111 | 32376 | 2648 |00:00:00.11 | 28752 | 0 | | | |
|* 16 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | FDT_PRICE_FUTURE | 9111 | 32376 | 2648 |00:00:00.09 | 28752 | 0 | | | |
|* 17 | INDEX RANGE SCAN | FDT_PRICE_FUTURE_PK | 9111 | 1 | 2668 |00:00:00.07 | 26089 | 0 | | | |
|* 18 | HASH JOIN | | 1 | 1790 | 0 |00:00:07.31 | 450K| 7 | 1599K| 1599K| 1/0/0|
| 19 | TABLE ACCESS FULL | PRICE_SUSP_HEAD | 1 | 2056 | 2056 |00:00:00.01 | 965 | 0 | | | |
|* 20 | HASH JOIN | | 1 | 1912 | 0 |00:00:07.30 | 449K| 7 | 39M| 8299K| 1/0/0|
| 21 | PART JOIN FILTER CREATE | :BF0000 | 1 | 745K| 745K|00:00:00.12 | 1682 | 0 | | | |
| 22 | TABLE ACCESS FULL | PRICE_ZONE_GROUP_STORE | 1 | 745K| 745K|00:00:00.03 | 1682 | 0 | | | |
|* 23 | HASH JOIN | | 1 | 496K| 0 |00:00:07.03 | 448K| 7 | 73M| 6638K| 1/0/0|
| 24 | PARTITION RANGE ALL | | 1 | 1138K| 1138K|00:00:01.70 | 154K| 7 | | | |
| 25 | PARTITION HASH ALL | | 26 | 1138K| 1138K|00:00:01.53 | 154K| 7 | | | |
| 26 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| FDT_PRICE_FUTURE | 208 | 1138K| 1138K|00:00:01.35 | 154K| 7 | | | |
|* 27 | INDEX SKIP SCAN | FDT_PRICE_FUTURE_I1 | 208 | 1138K| 1138K|00:00:00.89 | 125K| 7 | | | |
|* 28 | HASH JOIN | | 1 | 25M| 0 |00:00:04.84 | 293K| 0 | 977K| 977K| 1/0/0|
|* 29 | HASH JOIN | | 1 | 252K| 1123 |00:00:03.33 | 90826 | 0 | 1344K| 1344K| 1/0/0|
|* 30 | TABLE ACCESS FULL | PROMHEAD | 1 | 870 | 870 |00:00:00.01 | 602 | 0 | | | |
|* 31 | TABLE ACCESS FULL | PROMSKU | 1 | 6665K| 6665K|00:00:01.47 | 90224 | 0 | | | |
| 32 | PARTITION RANGE JOIN-FILTER | | 1 | 6719K| 32778 |00:00:01.51 | 202K| 0 | | | |
|* 33 | TABLE ACCESS FULL | PRICE_SUSP_DETAIL | 24 | 6719K| 32778 |00:00:01.51 | 202K| 0 | | | |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("IZP"."ITEM"="PSK"."SKU" AND "IZP"."ZONE_GROUP_ID"="PBT"."ZONE_GROUP_ID" AND "IZP"."ZONE_ID"="PBT"."NEW_ZONE_ID")
5 - access("PS"."PROMOTION"="PSK"."PROMOTION")
6 - access("PH"."PROMOTION"="PS"."PROMOTION")
7 - filter("PH"."STATUS"='A')
8 - access("PS"."STORE"="PBT"."STORE")
filter("PS"."START_DATE">"PBT"."EFFECTIVE_DATE")
9 - filter("PBT"."FDC_STATUS"='A')
11 - filter(("PSK"."CHANGE_TYPE"='MA' OR "PSK"."CHANGE_TYPE"='PC'))
16 - filter("PRICE_MOD_TYPE"='ZC')
17 - access("STORE"="PS"."STORE" AND "SKU"="PSK"."SKU" AND "START_DATE"<"PS"."START_DATE")
18 - access("PSH"."PRICE_CHANGE"="PSD"."PRICE_CHANGE")
filter("FDT"."START_DATE">"PSH"."ACTIVE_DATE")
20 - access("PSD"."ZONE_ID"="PZGS"."ZONE_ID" AND "PSD"."ZONE_GROUP_ID"="PZGS"."ZONE_GROUP_ID" AND "FDT"."STORE"="PZGS"."STORE")
23 - access("PSK"."PROMOTION"="FDT"."PRICE_MOD_NO" AND "PSD"."SKU"="FDT"."SKU")
27 - access("FDT"."PRICE_MOD_TYPE"='PR')
filter("FDT"."PRICE_MOD_TYPE"='PR')
28 - access("PSK"."SKU"="PSD"."SKU")
29 - access("PH"."PROMOTION"="PSK"."PROMOTION")
30 - filter("PH"."STATUS"='A')
31 - filter(("PSK"."CHANGE_TYPE"='MA' OR "PSK"."CHANGE_TYPE"='PC'))
33 - filter("PSD"."STATUS"='A')
Note
-----
- this is an adaptive plan
As you can see, Oracle predicts the cardinality correctly in most cases. Can you suggest where I should start looking?
SELECT
PBT.STORE ,
PSK.SKU ,
0 MULTI_UNITS ,
0 MULTI_UNIT_RETAIL ,
PS.START_DATE PS_START_DATE ,
PBT.EFFECTIVE_DATE PC_ACTIVE_DATE ,
PBT.ZONE_CHANGE PRICE_CHANGE ,
PH.PROMOTION PRICE_MOD_NO ,
'PR' PRICE_TYPE ,
'U' REC_TYPE
FROM PROMHEAD PH ,
PROMSTORE PS ,
PROMSKU PSK ,
PRICE_BATCH_TRAN PBT,
ITEM_ZONE_PRICE IZP
WHERE PH.PROMOTION =PS.PROMOTION
AND PS.PROMOTION =PSK.PROMOTION
AND PBT.FDC_STATUS ='A'
AND PH.STATUS ='A'
AND PS.STORE =PBT.STORE
AND PSK.CHANGE_TYPE IN('MA','PC')
AND IZP.ITEM =PSK.SKU
AND IZP.ZONE_GROUP_ID=PBT.ZONE_GROUP_ID
AND IZP.ZONE_ID =PBT.NEW_ZONE_ID
AND PS.START_DATE > PBT.EFFECTIVE_DATE
AND EXISTS
(
SELECT 1
FROM FDT_PRICE_FUTURE
WHERE SKU =PSK.SKU
AND STORE =PS.STORE
AND START_DATE < PS.START_DATE
AND PRICE_MOD_TYPE = 'ZC'
)
UNION ALL
SELECT FDT.STORE ,
FDT.SKU ,
0 MULTI_UNITS ,
0 MULTI_UNIT_RETAIL ,
FDT.START_DATE PS_START_DATE ,
PSH.ACTIVE_DATE PC_ACTIVE_DATE ,
PSD.PRICE_CHANGE PRICE_CHANGE ,
PH.PROMOTION PRICE_MOD_NO ,
'PR' PRICE_TYPE ,
'U' REC_TYPE
FROM FDT_PRICE_FUTURE FDT ,
PROMHEAD PH ,
PROMSKU PSK ,
PRICE_SUSP_DETAIL PSD,
PRICE_SUSP_HEAD PSH ,
PRICE_ZONE_GROUP_STORE PZGS
WHERE PH.PROMOTION =PSK.PROMOTION
AND PSK.PROMOTION =FDT.PRICE_MOD_NO
AND FDT.PRICE_MOD_TYPE='PR'
AND PH.STATUS ='A'
AND PSH.PRICE_CHANGE =PSD.PRICE_CHANGE
AND PSD.STATUS ='A'
AND PSD.ZONE_ID =PZGS.ZONE_ID
AND PSD.ZONE_GROUP_ID =PZGS.ZONE_GROUP_ID
AND FDT.STORE =PZGS.STORE
AND PSK.SKU =PSD.SKU
AND PSD.SKU =FDT.SKU
AND PSK.CHANGE_TYPE IN('MA','PC')
AND FDT.START_DATE > PSH.ACTIVE_DATE
ORDER BY 1,2,7;
Also, if Diagnostics and Tuning Pack licenses are available, you can run the SQL Tuning Advisor on the SQL.
-- SQLTUNE.sql
-- Updated 04-Mar-2015 RDCornejo for spool error
-- takes sql_id, plan_hash_value, and start and end snap_ids as parameters
-- creates a sql tuning report on c:\temp
set echo off
set feedback off
set term off
set verify off
set head on
-- uncomment for debugging if needed
-- set echo on term on
set termout on
accept SQL_ID char prompt 'Enter the sql_id to Tune? '
accept hash_value char prompt 'Enter the hash_value to Tune? '
accept BEGIN_SNAP char prompt 'Enter the begin snap_id to Tune? '
accept END_SNAP char prompt 'Enter the end snap_id to Tune? '
-- using bind variables:
variable sql_id_var varchar2(25);
variable plan_hash_value varchar2(25);
variable begin_snap_var varchar2(25);
variable end_snap_var varchar2(25);
variable task_name varchar2(100);
exec :sql_id_var := '&SQL_ID';
exec :plan_hash_value := '&hash_value';
exec :begin_snap_var := '&BEGIN_SNAP';
exec :end_snap_var := '&END_SNAP';
exec :task_name := :sql_id_var || '_' || :plan_hash_value || '_AWR';
column sql_id format a25
column plan_hash_value format 999999999999
column begin_snap format a10
column end_snap format a10
select :sql_id_var as sql_id
, :plan_hash_value as plan_hash_value
, :begin_snap_var as begin_snap
, :end_snap_var as end_snap
from dual;
-- get my prefered text in file name
column MY_TEXT new_val FILE_NAME noprint
Select instance_name||'_'|| :SQL_ID_VAR || '_' || :plan_hash_value|| '_' || :BEGIN_SNAP_VAR || '_' || :END_SNAP_VAR || '_' || to_char(sysdate, 'yyyy_mm_dd_hh24_mi') MY_TEXT
from
(select upper(substr(global_name, 1, instr(global_name,'.' )-1)) instance_name from global_name);
-- turn spooling on to capture any error and the report.
set serveroutput on
spool c:\temp\AWR_Tuning_Task_&FILE_NAME..txt
set linesize 210
set pagesize 9999
set long 10000000
set time on
set timing on
SET SERVEROUTPUT ON
prompt Drop existing task if it exists. Ignore error if not
-- Drop existing tuning task; assuming I'm running new tasks, so if one already exists it is no longer needed; the error can be ignored if no task exists yet
execute DBMS_SQLTUNE.drop_tuning_task (task_name => :task_name);
prompt Create Tuning Task
-- Tuning task created for a specific statement from the AWR. [SEE BELOW FOR EXAMPLES USING OTHER METHODS OF IDENTIFYING THE SQL]
--If the TASK_NAME parameter is specified, its value is returned as the SQL tune task identifier.
--If omitted, a system-generated name like "TASK_1478" is returned.
--If the SCOPE parameter is set to scope_limited, the SQL profiling analysis is omitted.
--The TIME_LIMIT parameter simply restricts the time the optimizer can spend compiling the recommendations.
/**/
DECLARE
l_sql_tune_task_id VARCHAR2(100);
BEGIN
l_sql_tune_task_id := DBMS_SQLTUNE.create_tuning_task (
begin_snap => to_number(:BEGIN_SNAP_VAR) ,
end_snap => to_number(:END_SNAP_VAR) ,
sql_id => :sql_id_var,
plan_hash_value => to_number(:plan_hash_value),
scope => DBMS_SQLTUNE.scope_comprehensive,
time_limit => 1200, -- in seconds
task_name => :task_name,
description => 'Tuning task for statement '|| :sql_id_var || ' hash: ' || :plan_hash_value|| ' in AWR ' || :BEGIN_SNAP_VAR || ' - ' || :END_SNAP_VAR || ' .');
DBMS_OUTPUT.put_line('l_sql_tune_task_id: ' || l_sql_tune_task_id);
END;
/
prompt Execute Tuning Task
-- With the tuning task defined the next step is to execute it using the EXECUTE_TUNING_TASK procedure.
EXEC DBMS_SQLTUNE.execute_tuning_task(task_name => :task_name);
/**/
-- Once the tuning task has executed successfully the recommendations can be displayed using the REPORT_TUNING_TASK function.
set linesize 210
set pagesize 9999
set long 10000000
set time on
set timing on
SET SERVEROUTPUT ON
-- width of output should not need more than 200
column recommendations format a200
prompt Report Tuning Task
SELECT DBMS_SQLTUNE.report_tuning_task(:task_name) AS recommendations FROM dual;
spool off
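For reference, a hypothetical invocation from SQL*Plus might look like the following; the plan hash value is the one from the plan above, and the other three values are placeholders you would take from AWR (DBA_HIST_SQLSTAT / DBA_HIST_SNAPSHOT):

-- hypothetical run; the script prompts for the four values and spools the report under c:\temp
@SQLTUNE.sql
-- Enter the sql_id to Tune?        <sql_id of the slow statement>
-- Enter the hash_value to Tune?    1038754298
-- Enter the begin snap_id to Tune? <AWR snap_id before the run>
-- Enter the end snap_id to Tune?   <AWR snap_id after the run>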

View count rows as columns in query result

First things first: I am able to get the data one way. My purpose is to increase the readability of my query result; I am asking whether that is possible.
I have a table that is fed by devices. I want to get the number of data points sent in each hour, grouped by two identifier columns. Grouping by these two columns is needed to determine one device type.
The table structure is like this:
| identifier-1 | identifier-2 | day        | hour | data_name | data_value |
|--------------|--------------|------------|------|-----------|------------|
| type_1       | subType_4    | 2016-08-25 | 0    | Key-30    | 4342       |
| type_3       | subType_2    | 2016-08-25 | 0    | Key-50    | 96         |
| type_6       | subType_2    | 2016-08-25 | 1    | Key-44    | 324        |
| type_2       | subType_1    | 2016-08-25 | 1    | Key-26    | 225        |
I'm going to use one specific data_name which is sent by all devices, and counting this data_name will give me the data sent in each hour. It is possible to get the numbers in 24 rows by grouping by identifier-1, identifier-2, day and hour; however, they will repeat for each device type.
| identifier-1 | identifier-2 | day        | hour | count |
|--------------|--------------|------------|------|-------|
| type_6       | subType_2    | 2016-08-25 | 0    | 340   |
| type_6       | subType_2    | 2016-08-25 | 1    | 340   |
| ...          | ...          | ...        | ...  | ...   |
| type_1       | subType_4    | 2016-08-25 | 0    | 32    |
| type_1       | subType_4    | 2016-08-25 | 1    | 30    |
| ...          | ...          | ...        | ...  | ...   |
I want to view the result like this:
| identifier-1 | identifier-2 | day        | count_of_0 | count_of_1 |
|--------------|--------------|------------|------------|------------|
| type_6       | subType_2    | 2016-08-25 | 340        | 340        |
| type_1       | subType_4    | 2016-08-25 | 32         | 30         |
| ...          | ...          | ...        | ...        | ...        |
In SQL it is possible to use subqueries as columns in the result, but that is not possible in Hive. I guess these are called correlated subqueries.
Hive column as a subquery select
The answer to that question did not work for me.
Do you have any idea or suggestion?
You can do this using conditional aggregation:
select identifier1, identifier2, day,
       count(case when hour = 0 then 1 end) as count_of_0,   -- counts only the hour-0 rows
       count(case when hour = 1 then 1 end) as count_of_1    -- counts only the hour-1 rows
from t
where data_name = ??
group by identifier1, identifier2, day
order by identifier1, identifier2, day;
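If you need all 24 hour columns, the same conditional-aggregation pattern simply repeats once per hour; a minimal sketch, assuming the table is named t and the data_name filter value is supplied in place of the placeholder:

select identifier1, identifier2, day,
       count(case when hour = 0 then 1 end)  as count_of_0,
       count(case when hour = 1 then 1 end)  as count_of_1,
       count(case when hour = 2 then 1 end)  as count_of_2,
       -- ... one count(case ...) per remaining hour ...
       count(case when hour = 23 then 1 end) as count_of_23
from t
where data_name = ??
group by identifier1, identifier2, day
order by identifier1, identifier2, day;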
