Why does adding ORDER BY to the query change the aggregate value? - Vertica

Following the Vertica example from https://www.vertica.com/docs/11.0.x/HTML/Content/Authoring/AnalyzingData/SQLAnalytics/AnalyticFunctionsVersusAggregateFunctions.htm?tocpath=Analyzing%20Data%7CSQL%20Analytics%7C_____2
CREATE TABLE employees(emp_no INT, dept_no INT);
INSERT INTO employees VALUES(1, 10);
INSERT INTO employees VALUES(2, 30);
INSERT INTO employees VALUES(3, 30);
INSERT INTO employees VALUES(4, 10);
INSERT INTO employees VALUES(5, 30);
INSERT INTO employees VALUES(6, 20);
INSERT INTO employees VALUES(7, 20);
INSERT INTO employees VALUES(8, 20);
INSERT INTO employees VALUES(9, 20);
INSERT INTO employees VALUES(10, 20);
INSERT INTO employees VALUES(11, 20);
COMMIT;
If I run this query without ORDER BY, I get the same count value for all rows:
dbadmin#b006bc38a718(*)=>
select
emp_no
, dept_no
, count(*) over (partition by dept_no) as emp_count
from employees;
emp_no | dept_no | emp_count
--------+----------+-----------
6 | 20 | 6
7 | 20 | 6
8 | 20 | 6
9 | 20 | 6
10 | 20 | 6
11 | 20 | 6
1 | 10 | 2
4 | 10 | 2
2 | 30 | 3
3 | 30 | 3
5 | 30 | 3
(11 rows)
But if I add ORDER BY, I get an incrementing value:
dbadmin#b006bc38a718(*)=>
select
emp_no
, dept_no
, count(*) over (partition by dept_no order by emp_no) as emp_count
from employees;
emp_no | dept_no | emp_count
--------+----------+-----------
2 | 30 | 1
3 | 30 | 2
5 | 30 | 3
1 | 10 | 1
4 | 10 | 2
6 | 20 | 1
7 | 20 | 2
8 | 20 | 3
9 | 20 | 4
10 | 20 | 5
11 | 20 | 6
(11 rows)
Time: First fetch (11 rows): 85.075 ms. All rows formatted: 85.139 ms
What is the effect of ORDER BY? Why do I get an incrementing value?

If the window clause only contains PARTITION BY, it returns the total count of the partition - the same value for each row of the partition.
If the window clause contains both PARTITION BY and ORDER BY, it returns the running count within the partition. That is, going by the ORDER BY expression: how many rows have been counted so far within the partition.
That's exactly how window functions work. They give you a whole world of possibilities ...
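One subtlety, sketched against the question's table (this example is mine, not part of the original answer): the running count advances per group of ORDER BY peers, not strictly per row. If rows tie on the ORDER BY expression - for example, if you order by the partition key itself - every row is a peer of every other, and the "running" count collapses back to the partition total:
select
emp_no
, dept_no
-- all rows of a department tie on the ORDER BY expression, so each row's
-- frame includes all of its peers and emp_count is the partition total again
, count(*) over (partition by dept_no order by dept_no) as emp_count
from employees;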

That happens because Vertica applies a default frame clause, which is defined as:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
So to get the partition-wide count even with ORDER BY, add the following frame clause after the ORDER BY in the OVER() clause:
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
This behaviour is documented as:
If the OVER clause omits specifying a window frame, the function creates a default window that extends from the current row to the first row in the current partition.
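For instance, a sketch based on the question's own query (not shown in the original answer): keeping the ORDER BY but widening the frame restores the partition-wide count.
select
emp_no
, dept_no
-- the explicit ROWS frame spans the whole partition, so every row of a
-- department gets the same emp_count (2, 3, or 6), even with ORDER BY
, count(*) over (partition by dept_no
                 order by emp_no
                 rows between unbounded preceding and unbounded following) as emp_count
from employees;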

Related

SQL help to count number of locations for each item/branch

I'm a SQL rookie, and am having trouble wrapping my head around how to do the following. I have a table that contains item information by branch. Within a branch an item can be in multiple locations. The data I need to extract needs to include a column that provides the total number of locations (count) the item is associated with for a given branch.
Output would look something like this:
I'm guessing this needs a subquery, but to be honest I'm not sure how to get started, or the order in which this is done (subquery GROUP BY first, then join, etc.).
In purely logical terms:
SELECT
a.Branch,
a.Item,
a.Loc,
COUNT(a.Branch||a.Item) AS 'LocCount'
FROM BranchInventoryFile a
GROUP BY a.Branch,a.Item
You can tackle this by using Oracle's analytic COUNT function. Be sure to read up on window/partitioning functions, as this unlocks quite a bit of functionality in SQL.
SQL:
SELECT
a.BRANCH,
a.ITEM,
a.LOC,
COUNT(a.ITEM) OVER (PARTITION BY a.BRANCH, a.ITEM) AS LOC_COUNT
FROM
BRANCH a;
Result:
| BRANCH | ITEM | LOC | LOC_COUNT |
|--------|------|------|-----------|
| 100 | A | 1111 | 2 |
| 100 | A | 1112 | 2 |
| 200 | A | 2111 | 1 |
| 200 | B | 1212 | 2 |
| 200 | B | 1212 | 2 |
| 300 | A | 1222 | 1 |
total number of locations (count) the item is associated with for a given branch
The way you described it, you should remove location from the query:
SQL> with branchinventoryfile (branch, item, location) as
2 (select 100, 'A', 1111 from dual union all
3 select 100, 'A', 1112 from dual union all
4 select 200, 'A', 2111 from dual
5 )
6 select branch,
7 item,
8 count(distinct location) cnt
9 from BranchInventoryFile
10 group by branch, item;
BRANCH I CNT
---------- - ----------
100 A 2
200 A 1
SQL>
If you leave location in the select, you have to group by it (and get the wrong result):
6 select branch,
7 item,
8 location,
9 count(distinct location) cnt
10 from BranchInventoryFile
11 group by branch, item, location;
BRANCH I LOCATION CNT
---------- - ---------- ----------
100 A 1111 1
200 A 2111 1
100 A 1112 1
SQL>
Or include locations but aggregate them, e.g.:
6 select branch,
7 item,
8 listagg(location, ', ') within group (order by null) loc,
9 count(distinct location) cnt
10 from BranchInventoryFile
11 group by branch, item;
BRANCH I LOC CNT
---------- - -------------------- ----------
100 A 1111, 1112 2
200 A 2111 1
SQL>

how to loop through each row of every group (while doing "group by") in Oracle table

I have a table like this:
I want to group the table by the "customer_id" column and calculate a "Day-day[0]" column, where "Day" is the day field of each row in the group and "day[0]" is the day of the group's first row. At the same time, I have to calculate the total risk, as follows:
This is the table after grouping by:
This is the total risk formula:
In fact, I have to loop through each row of every group to calculate the total risk.
My sample table is like this:
CREATE TABLE risk_test
(id VARCHAR2(32) NOT NULL PRIMARY KEY,
customer_id VARCHAR2(40 BYTE),
risk NUMBER,
day VARCHAR2(50 BYTE));
insert into risk_test values(1,102,15,1);
insert into risk_test values(2,102,16,1);
insert into risk_test values(3,104,11,1);
insert into risk_test values(4,102,17,2);
insert into risk_test values(5,102,10,2);
insert into risk_test values(6,102,13,3);
insert into risk_test values(7,104,14,2);
insert into risk_test values(8,104,13,2);
insert into risk_test values(9,104,17,1);
insert into risk_test values(10,104,16,2);
The sample answer is like this:
Would you please guide me how I can do this scenario in Oracle database?
Any help is really appreciated.
Using the sample data that was provided, I believe this query should calculate the risks properly:
Query
SELECT o.*,
ROUND (
SUM (day_minus_day0 * risk) OVER (PARTITION BY customer_id)
/ SUM (day_minus_day0) OVER (PARTITION BY customer_id),
5) AS total_risk
FROM (SELECT rt.*, (rt.day - MIN (rt.day) OVER (PARTITION BY customer_id)) + 1 AS day_minus_day0
FROM risk_test rt) o
ORDER BY customer_id, TO_NUMBER (day), TO_NUMBER (id);
Result
ID CUSTOMER_ID RISK DAY DAY_MINUS_DAY0 TOTAL_RISK
_____ ______________ _______ ______ _________________ _____________
1 102 15 1 1 13.77778
2 102 16 1 1 13.77778
4 102 17 2 2 13.77778
5 102 10 2 2 13.77778
6 102 13 3 3 13.77778
3 104 11 1 1 14.25
9 104 17 1 1 14.25
7 104 14 2 2 14.25
8 104 13 2 2 14.25
10 104 16 2 2 14.25
Your total risk calculation just looks like a weighted average to me. That is, the average risk of the rows for each customer, weighted according to the day offset (day-day[0]), so that risks in later days count for more.
To compute that, you need a common table expression to first compute the day-weighted risk for each row. Then you can just compute the weighted average by dividing.
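In symbols - my restatement of the approach, with w_i the +1 day offset described in the query comments below:
\[
\text{total\_risk} \;=\; \frac{\sum_i w_i \cdot \text{risk}_i}{\sum_i w_i},
\qquad w_i = \text{day}_i - \min_j(\text{day}_j) + 1
\]
For customer 102 above, that is (1*15 + 1*16 + 2*17 + 2*10 + 3*13) / (1 + 1 + 2 + 2 + 3) = 124 / 9 ≈ 13.77778, matching the TOTAL_RISK in the first query's output.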
The query below illustrates the approach, with comments.
-- This first WITH clause is just sample data. In your database you would
-- get rid of this and replace all references to "input" with your actual
-- table name
with input ( customer_id, risk, day ) AS (
SELECT 1053, 100, 1 FROM DUAL UNION ALL
SELECT 1053, 100, 1 FROM DUAL UNION ALL
SELECT 1053, 100, 2 FROM DUAL UNION ALL
SELECT 1053, 100, 2 FROM DUAL UNION ALL
SELECT 1053, 100, 3 FROM DUAL UNION ALL
SELECT 1054, 200, 1 FROM DUAL UNION ALL
SELECT 1054, 200, 1 FROM DUAL UNION ALL
SELECT 1054, 200, 3 FROM DUAL UNION ALL
SELECT 1054, 200, 3 FROM DUAL UNION ALL
SELECT 1054, 200, 4 FROM DUAL
),
-- This CTE computes the day offset for each row and multiplies by the risk to
-- compute a day-weighted risk.
-- I added +1 to the day_offset, otherwise risks on the 1st day would not contribute
-- to the total risk, which I think is not what you intended(?)
weighted_input AS (
SELECT i.customer_id,
i.risk,
i.day,
i.day - min(i.day) over ( partition by i.customer_id ) + 1 day_offset,
( i.day - min(i.day) over ( partition by i.customer_id ) + 1 ) * i.risk day_weighted_risk
FROM input i )
-- This is the main SELECT clause that gets all the weighted risks and computes
-- the group total risk, which appears the same in every row in each group.
SELECT wi.*,
sum(wi.day_weighted_risk) over ( partition by wi.customer_id ) / sum(wi.day_offset) over ( partition by wi.customer_id ) total_risk
FROM weighted_input wi;
+-------------+------+-----+------------+-------------------+------------+
| CUSTOMER_ID | RISK | DAY | DAY_OFFSET | DAY_WEIGHTED_RISK | TOTAL_RISK |
+-------------+------+-----+------------+-------------------+------------+
| 1053 | 100 | 1 | 1 | 100 | 100 |
| 1053 | 100 | 1 | 1 | 100 | 100 |
| 1053 | 100 | 2 | 2 | 200 | 100 |
| 1053 | 100 | 2 | 2 | 200 | 100 |
| 1053 | 100 | 3 | 3 | 300 | 100 |
| 1054 | 200 | 1 | 1 | 200 | 200 |
| 1054 | 200 | 1 | 1 | 200 | 200 |
| 1054 | 200 | 3 | 3 | 600 | 200 |
| 1054 | 200 | 3 | 3 | 600 | 200 |
| 1054 | 200 | 4 | 4 | 800 | 200 |
+-------------+------+-----+------------+-------------------+------------+
For your database, having the actual table and not needing the input CTE, it would be:
WITH weighted_input AS (
-- This CTE computes the day offset for each row and multiplies by the risk to
-- compute a day-weighted risk.
-- I added +1 to the day_offset, otherwise risks on the 1st day would not contribute
-- to the total risk, which I think is not what you intended(?)
SELECT i.customer_id,
i.risk,
i.day,
i.day - min(i.day) over ( partition by i.customer_id ) + 1 day_offset,
( i.day - min(i.day) over ( partition by i.customer_id ) + 1 ) * i.risk day_weighted_risk
FROM my_table i )
-- This is the main SELECT clause that gets all the weighted risks and computes
-- the group total risk, which appears the same in every row in each group.
SELECT wi.*,
sum(wi.day_weighted_risk) over ( partition by wi.customer_id ) / sum(wi.day_offset) over ( partition by wi.customer_id ) total_risk
FROM weighted_input wi;

use LAG with expression in oracle

I have a column (status) in a table that contains numbers whose values are 1, 2, or 4.
I would like, in a SQL query, to add a calculated column (bitStatus) that stores the bitwise OR of the current row's status column and the previous row's bitStatus column,
like so:
| id | status| bitStatus|
|----|-------|----------|
| 1 | 1 | 1 |
| 2 | 2 | 3 |
| 3 | 4 | 7 |
| 4 | 1 | 7 |
So what I did is use the LAG function in Oracle, but I couldn't figure out how to do it, since I want to create only one calculated column, bitStatus.
My query is like:
select id, status,
BITOR(LAG(bitStatus) OVER (ORDER BY 1), status) AS bitStatus
But as you know, I can't use LAG(bitStatus) when calculating bitStatus.
So how could I produce the desired table?
Thanks in advance.
Would this help?
lines #1 - 6 represent sample data
the TEMP CTE is here to fetch the LAG status value (to improve readability)
the final select does the BITOR operation, using bitor(a, b) = a - bitand(a, b) + b
SQL> with test (id, status) as
2 (select 1, 1 from dual union all
3 select 2, 2 from dual union all
4 select 3, 1 from dual union all
5 select 4, 4 from dual
6 ),
7 temp as
8 (select id, status,
9 lag(status) over (order by id) lag_status
10 from test
11 )
12 select id,
13 status,
14 status - bitand(status, nvl(lag_status, status)) + nvl(lag_status, status) as bitstatus
15 from temp
16 order by id;
ID STATUS BITSTATUS
---------- ---------- ----------
1 1 1
2 2 3
3 1 3
4 4 5
SQL>
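Note that the question's expected output (1, 3, 7, 7) is a running OR over all rows so far, not just an OR with the previous row, so LAG alone cannot reproduce it. A minimal sketch of a running OR - assuming, as the question states, that status only ever contains the bits 1, 2 and 4 (this query is mine, not part of the original answer):
select id,
       status,
       -- a bit is set in the running OR as soon as it has appeared once,
       -- which a running MAX per bit detects
       case when max(bitand(status, 1)) over (order by id) > 0 then 1 else 0 end
     + case when max(bitand(status, 2)) over (order by id) > 0 then 2 else 0 end
     + case when max(bitand(status, 4)) over (order by id) > 0 then 4 else 0 end as bitstatus
from test
order by id;
With the answer's sample data (status 1, 2, 1, 4) this yields 1, 3, 3, 7; with the question's data (1, 2, 4, 1) it yields the desired 1, 3, 7, 7.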

ORACLE Query Get Last ID Using MIN Based On Quantity Consumed By ID

I have Incoming Stock transaction data using Oracle:
ID | DESCRIPTION | PART_NO | QUANTITY | DATEADDED
TR5 | FG | P0025 | 5 | 06-SEP-2017 08:20:33 <-- just now added
TR4 | Test | TEST1 | 8 | 05-SEP-2017 15:11:15
TR3 | FG | GSDFGSG | 10 | 31-AUG-2017 16:26:04
TR2 | FG | GSDFGSG | 2 | 31-AUG-2017 16:05:39
TR1 | FG | GSDFGSG | 2 | 30-AUG-2017 16:30:16
And now I'm grouping that data as:
TR_ID | PART_NO | TOTAL
TR1 | GSDFGSG | 14
TR4 | TEST1 | 8
TR5 | P0025 | 5 <-- just now added
Query Code:
SELECT MIN(TRANSACTION_EQUIPMENTID) as TR_ID,
PART_NO,
SUM(T.QUANTITY) AS TOTAL
FROM WA_II_TBL_TR_EQUIPMENT T
GROUP BY T.PART_NO
As you can see from that data and query, I show TR_ID using MIN to get the ID of the first transaction.
And now I have Outgoing transaction data:
Assume I try to get quantity 8
ID_FK | QUANTITY
TR1 | 8
And now I want to get the last ID, given that a quantity of 8 has been consumed:
ID | DESCRIPTION | PART_NO | QUANTITY
TR3| FG | GSDFGSG | 10 <-- CONSUMED 4+2+2, TOTAL 8
TR2| FG | GSDFGSG | 2 <-- CONSUMED 2+2, TOTAL 4
TR1| FG | GSDFGSG | 2 <-- CONSUMED 2
As you can see above, TR1 and TR2 have been consumed. Now I want the query
SELECT MIN(TRANSACTION_EQUIPMENTID) as TR_ID,
PART_NO,
SUM(T.QUANTITY) AS TOTAL
FROM WA_II_TBL_TR_EQUIPMENT T
GROUP BY T.PART_NO
to return TR3 as the last ID, because TR1 & TR2 have been consumed.
How to do that in a query?
Take the minimum id where the running sum is greater than or equal to 8. Use analytic sum():
select min(id) id
from (select t.*,
sum(quantity) over (partition by part_no order by id) sq
from t
where part_no = 'GSDFGSG'
)
where sq >= 8
Test data:
create table t(ID varchar2(3), DESCRIPTION varchar2(5),
PART_NO varchar2(8), QUANTITY number(5), DATEADDED date);
insert into t values ('TR4', 'Test', 'TEST1', 8, timestamp '2017-09-05 15:11:15');
insert into t values ('TR3', 'FG', 'GSDFGSG', 10, timestamp '2017-08-31 16:26:04');
insert into t values ('TR2', 'FG', 'GSDFGSG', 2, timestamp '2017-08-31 16:05:39');
insert into t values ('TR1', 'FG', 'GSDFGSG', 2, timestamp '2017-08-30 16:30:16');
insert into t values ('TR5', 'FG', 'GSDFGSG', 3, timestamp '2017-08-31 17:00:00');
Edit:
Add part_no and total columns and group by clause:
select min(id) id, part_no, min(sq) total
from (select t.*,
sum(quantity) over (partition by part_no order by id) sq
from t
where part_no = 'GSDFGSG'
)
where sq >= 8
group by part_no
ID PART_NO TOTAL
--- -------- ----------
TR3 GSDFGSG 14

Table scan on internal tables

Assume I have 2 tables - TABLE-1 & TABLE-2 - and each table has 1 million rows with 10 columns and an index on col1.
Now I build an internal table on these 2 tables (1 + 1 = 2 million rows):
select * from
(select col1, col2,....col10 from table-1
union all
select col1, col2,....col10 from table-2) x
Questions: how will the internal table be treated in Oracle, since it is an internal table?
1. Will the internal table be treated as a table with an index on col1?
2. Will this be captured in the explain plan?
Yes and yes.
Oracle will effectively treat this inline view as a table. It can use predicate pushing to apply a filter on the inline view to the base tables, and potentially use an index. The explain plan will show this.
Tables, indexes, sample data, and statistics
create table table1(col1 number, col2 number, col3 number, col4 number);
create table table2(col1 number, col2 number, col3 number, col4 number);
create index table1_idx on table1(col1);
create index table2_idx on table2(col1);
insert into table1 select level, level, level, level
from dual connect by level <= 100000;
insert into table2 select level, level, level, level
from dual connect by level <= 100000;
commit;
begin
dbms_stats.gather_table_stats(user, 'TABLE1');
dbms_stats.gather_table_stats(user, 'TABLE2');
end;
/
Explain plan showing predicate pushing and index access
explain plan for
select * from
(
select col1, col2, col3, col4 from table1
union all
select col1, col2, col3, col4 from table2
)
where col1 = 1;
select * from table(dbms_xplan.display);
Plan hash value: 400235428
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 40 | 2 (0)| 00:00:01 |
| 1 | VIEW | | 2 | 40 | 2 (0)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| TABLE1 | 1 | 20 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | TABLE1_IDX | 1 | | 1 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| TABLE2 | 1 | 20 | 2 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | TABLE2_IDX | 1 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("COL1"=1)
6 - access("COL1"=1)
Notice how the predicates happen before the VIEW, and both indexes are used. By default everything should work as well as can be expected.
Notes
This type of query structure is called an inline view. Although a physical table is not built, the phrase "internal tables" is a good way of thinking about how the query logically works. Ideally, an inline view would work exactly like a pre-built table with the same data. In reality there are some cases where things don't quite work that way. But in general you are definitely on the right path - build a large query by assembling small inline views, and assume that Oracle will optimize it correctly.
For your particular query as written, no index will be used. But I suppose you do some filtering, i.e. where x.col1 = ###. I'm not sure that Oracle will always be able to use the table-1/table-2 indexes for that filter, so I suggest you put the WHERE clauses inside the union query.
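A sketch of that manual rewrite, reusing the demo tables from the answer above (the explain plan there suggests the optimizer usually performs this pushdown by itself):
select * from
(
  -- repeating the filter in each branch lets every index range scan
  -- see it directly, without relying on predicate pushing
  select col1, col2, col3, col4 from table1 where col1 = 1
  union all
  select col1, col2, col3, col4 from table2 where col1 = 1
) x;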
