order by not returning running count - vertica

In this question ("why does adding ORDER BY to the query change the aggregate value?") I was told: "If the window clause contains both PARTITION BY and ORDER BY, it returns the running count within the partition; that is, how many rows have been counted so far within the partition, in ORDER BY order."
Referring to this example - https://www.vertica.com/docs/11.0.x/HTML/Content/Authoring/AnalyzingData/SQLAnalytics/ReportingAggregates.htm?tocpath=Analyzing%20Data%7CSQL%20Analytics%7CWindow%20Framing%7C_____3
Why does the cumulative count show 4 (the last value of the count) for all rows with sal=109?
=> SELECT deptno, sal, empno, COUNT(sal) OVER (
-> PARTITION BY deptno ORDER BY sal
-> ) AS COUNT
-> FROM emp;
 deptno | sal | empno | count
--------+-----+-------+-------
     10 | 101 |     1 |     1
     10 | 104 |     4 |     2
------------------------------
     20 | 100 |    11 |     1
     20 | 109 |     7 |     4 <-
     20 | 109 |     6 |     4 <-
     20 | 109 |     8 |     4 <-
     20 | 110 |    10 |     6 <-
     20 | 110 |     9 |     6 <-
------------------------------
     30 | 102 |     2 |     1
     30 | 103 |     3 |     2
     30 | 105 |     5 |     3

You order by sal, which is 109 for 3 rows within deptno 20. With the default window frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), rows that tie on the ORDER BY value are peers, and all peers enter the frame together. So after the single sal = 100 row (running count 1), the three sal = 109 rows are added at once and the running count jumps straight to 4 for all three.
You need distinct ordering values to get distinct running count results.
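The tie behavior can be reproduced in SQLite (3.25+), which uses the same default frame for an aggregate window with ORDER BY, so peer rows share the running count. A minimal sketch of deptno 20, not Vertica itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (deptno INT, sal INT, empno INT);
INSERT INTO emp VALUES
  (20, 100, 11), (20, 109, 7), (20, 109, 6),
  (20, 109, 8), (20, 110, 10), (20, 110, 9);
""")
rows = conn.execute("""
SELECT deptno, sal, empno,
       COUNT(sal) OVER (PARTITION BY deptno ORDER BY sal) AS cnt
FROM emp
ORDER BY sal
""").fetchall()
# The three sal=109 rows are ORDER BY peers: they enter the
# window frame together, so each shows a running count of 4.
for row in rows:
    print(row)
```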


Row count by group

I am attempting to write the following query to get a row count by group.
select
a.employee, a.cov_option,
count(a.cov_option) over (partition by a.cov_option order by a.employee) as row_num
from wilson.benefit a
inner join wilson.bncategory b
ON a.plan_type = b.plan_type and a.plan_option = b.plan_option
inner join wilson.bncovopt c
ON a.company = c.company and a.plan_code = c.plan_code and a.cov_option = c.coverage_opt
where
a.plan_type = 'HL' and
to_char(a.stop_date, 'yyyy-mm-dd') = '1700-01-01'
order by a.employee, a.cov_option
The result set returned is:
employee | cov_option |row_num
-------------|--------------|--------------
429 | 1 | 1
429 | 3 | 2
429 | 3 | 2
1420 | 1 | 2
1420 | 3 | 4
1420 | 3 | 4
1537 | 2 | 2
1537 | 2 | 2
The result set I am attempting to return is:
429 | 1 | 1
429 | 3 | 2
429 | 3 | 2
1420 | 1 | 1
1420 | 3 | 2
1420 | 3 | 2
1537 | 2 | 1
1537 | 2 | 1
What you seem to want is dense_rank() rather than count(). "count" simply determines how many rows are in each group; it is not "counting" in the way we learn as children (first, second, third). That kind of counting is called "ranking".
dense_rank() over (partition by a.employee order by a.cov_option) as row_num
should do what you need.
There is also rank() - the difference is that if two rows are tied for first, with dense_rank() the third row gets rank 2; with simple rank() it gets rank 3 (rank 2 is "used up" by the first two rows).
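A quick way to see this is to run the suggested query against the sample rows; a SQLite sketch (column values taken from the desired output in the question, joins and filters omitted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE benefit (employee INT, cov_option INT);
INSERT INTO benefit VALUES
  (429, 1), (429, 3), (429, 3),
  (1420, 1), (1420, 3), (1420, 3),
  (1537, 2), (1537, 2);
""")
rows = conn.execute("""
SELECT employee, cov_option,
       DENSE_RANK() OVER (PARTITION BY employee
                          ORDER BY cov_option) AS row_num
FROM benefit
ORDER BY employee, cov_option
""").fetchall()
# Numbering restarts at 1 for each employee, and tied
# cov_option values share the same rank.
for row in rows:
    print(row)
```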

Oracle pagination query and user defined function - slow execution

My query is like this:
select id, fn_calc(col)
from table_a
order by id
offset 1483800 rows fetch next 100 rows only;
Note that the paging offset is extremely high - table_a has about 1,500,000 rows.
The above query takes very long, but when fn_calc(col) is replaced by plain col, query speed is satisfactory, at least 5 times faster. Yet when the offset is 0 or 100, the two queries are almost equally fast. Why the difference?
Possible reasons I can think of:
Oracle executes fn_calc() 1,483,900 times, although that is logically not necessary (it would be enough to call it only 100 times).
The calling cost of a user-defined function in a query is very high.
I'm using Oracle 12c on Exadata.
Any suggestions would help.
UPDATE:
When the above query is changed as follows:
select id, fn_calc(col)
from
(
select id, col
from table_a
order by id
offset 1483800 rows fetch next 100 rows only
)
order by id
The query speed is comparable to the case when fn_calc() is not called at all.
Why is this?
UPDATE:
The execution plans are as follows. (Sorry, all I currently have is SQL Developer, so I had to capture the result as screenshots.)
The first query: (plan captured as a screenshot, not reproduced here)
The second query, which uses the subquery: (plan captured as a screenshot, not reproduced here)
To down-voters and close-voters: Please specify your concerns about this question before you vote. I'll update my question accordingly. This question is about the real and serious problem to me. Please do not deprive the chance to get help.
Firstly, what is the purpose of a pagination query when you are not using an ORDER BY clause? You are fetching the rows in an arbitrary order, i.e. Oracle will internally apply ORDER BY NULL.
Secondly, since you have the function in the SELECT list and not in a filter predicate, the explain plan should be the same. The only extra time spent should be due to the function itself.
For example,
SQL> CREATE OR REPLACE FUNCTION f_char(
2 i_empno NUMBER)
3 RETURN VARCHAR2
4 AS
5 v_empno VARCHAR2(10);
6 BEGIN
7 v_empno := TO_CHAR(i_empno);
8 return v_empno;
9 END;
10 /
Function created.
Let's compare the explain plan:
SQL> set autot on explain
SQL>
SQL> SELECT f_char(empno) FROM emp
2 OFFSET 5 ROWS
3 FETCH NEXT 5 rows only;
F_CHAR(EMPNO)
--------------------------------------------------------------------------------
7698
7782
7788
7839
7844
Execution Plan
----------------------------------------------------------
Plan hash value: 3611411408
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 28210 | 4 (0)| 00:00:01 |
|* 1 | VIEW | | 14 | 28210 | 4 (0)| 00:00:01 |
|* 2 | WINDOW NOSORT STOPKEY| | 14 | 56 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | EMP | 14 | 56 | 4 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=CASE WHEN (5>=0)
       THEN 5 ELSE 0 END +5 AND "from$_subquery$_002"."rowlimit_$$_rownumber">5)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY NULL )<=CASE WHEN (5>=0)
       THEN 5 ELSE 0 END +5)
SQL> SELECT empno FROM emp
2 OFFSET 5 ROWS
3 FETCH NEXT 5 rows only;
EMPNO
----------
7698
7782
7788
7839
7844
Execution Plan
----------------------------------------------------------
Plan hash value: 3611411408
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 364 | 4 (0)| 00:00:01 |
|* 1 | VIEW | | 14 | 364 | 4 (0)| 00:00:01 |
|* 2 | WINDOW NOSORT STOPKEY| | 14 | 56 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | EMP | 14 | 56 | 4 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=CASE WHEN (5>=0)
       THEN 5 ELSE 0 END +5 AND "from$_subquery$_002"."rowlimit_$$_rownumber">5)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY NULL )<=CASE WHEN (5>=0)
       THEN 5 ELSE 0 END +5)
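The effect of the subquery rewrite from the question can be simulated by counting calls to a user-defined function. This is a SQLite/Python sketch, not Oracle, and fn_calc here is a hypothetical stand-in that just doubles its input; the pattern is the same, though: paginate in the inner query, then apply the function only to the fetched rows.

```python
import sqlite3

calls = {"n": 0}

def fn_calc(x):
    # hypothetical stand-in for the expensive user function
    calls["n"] += 1
    return x * 2

conn = sqlite3.connect(":memory:")
conn.create_function("fn_calc", 1, fn_calc)
conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, col INT)")
conn.executemany("INSERT INTO table_a VALUES (?, ?)",
                 [(i, i) for i in range(1, 1001)])

# Paginate in the inner query, apply the function in the outer one,
# so fn_calc runs only for the rows that are actually fetched.
rows = conn.execute("""
SELECT id, fn_calc(col)
FROM (SELECT id, col FROM table_a ORDER BY id LIMIT 100 OFFSET 500)
ORDER BY id
""").fetchall()
print(len(rows), calls["n"])  # 100 rows, 100 calls
```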

Oracle select two (or more) adjacent rows having the same value for a given column

How do I do the following in Oracle:
I have a (simplified) table:
+-----+-----+-----+
| a | b | ... |
+-----+-----+-----+
| 1 | 7 | ... |
| 2 | 5 | ... |
| 1 | 7 | ... |
+-----+-----+-----+
Where a functions as a unique identifier for a person, and b is the field I am interested in matching across rows. How do I construct a query that basically says "give me the person-ID's where the person has multiple b values (i.e., duplicates)"?
So far I have tried:
SELECT a FROM mytable GROUP BY a HAVING COUNT(DISTINCT b) > 1;
This feels close except it just gives me the user IDs where the user has multiple unique b's, which I suspect is coming from the DISTINCT part, but I'm not sure how to change the query to achieve what I want.
Try
group by a, b having count(b) > 1
Your query would count b values 7, 5, 7 for one person as 2 (one distinct 7, one distinct 5), even with no duplicates. This one counts the total b's within each (a, b) grouping, so from the sample you get (1, 7) -> 2 and (2, 5) -> 1.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE mytable ( a, b ) AS
SELECT LEVEL, LEVEL FROM DUAL CONNECT BY LEVEL <= 2000
UNION ALL
SELECT LEVEL *2, LEVEL * 2 FROM DUAL CONNECT BY LEVEL <= 1000;
Query 1:
WITH data AS (
SELECT a
FROM mytable
GROUP BY a
HAVING COUNT(b) > COUNT( DISTINCT b )
ORDER BY a
),
numbered AS (
SELECT a,
ROWNUM AS rn
FROM data
)
SELECT a
FROM numbered
WHERE rn <= 20
Results:
| A |
|----|
| 2 |
| 4 |
| 6 |
| 8 |
| 10 |
| 12 |
| 14 |
| 16 |
| 18 |
| 20 |
| 22 |
| 24 |
| 26 |
| 28 |
| 30 |
| 32 |
| 34 |
| 36 |
| 38 |
| 40 |
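Both HAVING formulations can be sanity-checked against the three-row example from the question; a SQLite sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (a INT, b INT);
INSERT INTO mytable VALUES (1, 7), (2, 5), (1, 7);
""")
# Per (a, b) pair: flags a person who has the same b more than once.
dupes = conn.execute(
    "SELECT a FROM mytable GROUP BY a, b HAVING COUNT(b) > 1"
).fetchall()
# Per person: more b's than distinct b's means at least one repeat.
dupes2 = conn.execute(
    "SELECT a FROM mytable GROUP BY a HAVING COUNT(b) > COUNT(DISTINCT b)"
).fetchall()
print(dupes, dupes2)  # person 1 in both cases
```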

putting data into categories (buckets) based on a value

How can I assign a category based on a value?
For example, I have a table with values from 1-200. How do I assign a category to each record, like 1-5, 6-10, 11-15, etc.?
I can do it using the query below, but that seems like a bad solution.
Sorry, this is probably very basic, but I don't know what it's called, and googling "buckets" (as it's called in our company) didn't bring up any results.
Thank you
SELECT DISTINCT CountOfSA,
CASE
WHEN CountOfSA BETWEEN 1 AND 5 THEN
'1-5'
WHEN CountOfSA BETWEEN 6 AND 10 THEN
'6-10'
WHEN CountOfSA BETWEEN 11 AND 15 THEN
'11-15'
WHEN CountOfSA BETWEEN 16 AND 20 THEN
'16-20'
WHEN CountOfSA BETWEEN 21 AND 25 THEN
'21-25'
WHEN CountOfSA BETWEEN 26 AND 30 THEN
'26-30'
END
AS diff
FROM NR_CF_212
Have a look at the WIDTH_BUCKET function.
It divides the range into equal-sized intervals and assigns a bucket number to each one.
with x as (
select CountOfSA,
       -- upper bound 201, not 200: (201-1)/40 gives 40 buckets exactly 5 wide,
       -- and keeps the value 200 out of the overflow bucket
       width_bucket(CountOfSA, 1, 201, 40) bucket_
from NR_CF_212
)
select CountOfSA,
cast(1 + (bucket_ - 1)*5 as varchar2(4)) ||
'-' ||
cast( bucket_*5 as varchar2(4)) diff
from x
order by CountOfSA;
Demo here.
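WIDTH_BUCKET is not available everywhere, but its semantics are easy to state: with bounds lo..hi and n buckets, each bucket spans (hi - lo)/n, values below lo go to bucket 0 and values at or above hi to bucket n+1. A Python sketch of that rule (an illustration, not the vendor implementation; note that bounds of 1 and 200 would give a width of 4.975, so 1 and 201 are used to get the exact 5-wide buckets the labels assume):

```python
import math

def width_bucket(x, lo, hi, n):
    """Sketch of SQL WIDTH_BUCKET for an ascending lo..hi range."""
    if x < lo:
        return 0            # underflow bucket
    if x >= hi:
        return n + 1        # overflow bucket
    width = (hi - lo) / n   # each bucket spans this much
    return int(math.floor((x - lo) / width)) + 1

# 40 buckets of width exactly 5 covering 1..200:
for v in (1, 5, 6, 10, 11, 200):
    print(v, width_bucket(v, 1, 201, 40))
```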
I would put the range values and descriptions into a separate table, especially if you plan to use them for future queries, views, etc. Plus it's easier to change the ranges or descriptions as needed. For example:
create table sales_ranges
(
low_val number not null,
high_val number not null,
range_desc varchar2(100) not null
)
cache;
insert into sales_ranges values (0,1000,'$0-$1k');
insert into sales_ranges values (1001,10000,'$1k-$10k');
insert into sales_ranges values (10001,100000,'$10k-$100k');
insert into sales_ranges values (100001,1000000,'$100k-$1mm');
insert into sales_ranges values (1000001,10000000,'$1mm-$10mm');
insert into sales_ranges values (10000001,100000000,'$10mm-$100mm');
commit;
create table sales
(
id number,
total_sales number
);
insert into sales(id, total_sales)
-- some random values for testing
select level, trunc(dbms_random.value(1,10000000))
from dual
connect by level <= 100;
commit;
select id, total_sales, range_desc
from sales s
left outer join sales_ranges sr
on (s.total_sales between sr.low_val and sr.high_val)
order by s.id
;
Output (just first 3 rows):
ID TOTAL_SALES RANGE_DESC
1 5122380 $1mm-$10mm
2 347726 $100k-$1mm
3 6564700 $1mm-$10mm
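The lookup-table pattern ports directly to other engines; a SQLite sketch with the same ranges (the random test data replaced by fixed values so the output is predictable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_ranges (low_val NUM, high_val NUM, range_desc TEXT);
INSERT INTO sales_ranges VALUES
  (0, 1000, '$0-$1k'), (1001, 10000, '$1k-$10k'),
  (10001, 100000, '$10k-$100k'), (100001, 1000000, '$100k-$1mm'),
  (1000001, 10000000, '$1mm-$10mm');
CREATE TABLE sales (id INT, total_sales INT);
INSERT INTO sales VALUES (1, 5122380), (2, 347726), (3, 500);
""")
rows = conn.execute("""
SELECT id, total_sales, range_desc
FROM sales s
LEFT OUTER JOIN sales_ranges sr
  ON s.total_sales BETWEEN sr.low_val AND sr.high_val
ORDER BY s.id
""").fetchall()
# Each sale picks up the description of the range it falls into.
for row in rows:
    print(row)
```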
I guess you could concatenate some calculated values to dynamically create the bucket name:
select countofsa
, ((countofsa - 1)/5) * 5 + 1
, ((countofsa - 1)/5 + 1) * 5
, ((countofsa - 1)/5) * 5 + 1 || '-' || ((countofsa - 1)/5 + 1) * 5 AS diff
from nr_cf_212
Some output:
countofsa | ?column? | ?column? | diff
-----------+----------+----------+-------
1 | 1 | 5 | 1-5
2 | 1 | 5 | 1-5
3 | 1 | 5 | 1-5
4 | 1 | 5 | 1-5
5 | 1 | 5 | 1-5
6 | 6 | 10 | 6-10
7 | 6 | 10 | 6-10
8 | 6 | 10 | 6-10
9 | 6 | 10 | 6-10
10 | 6 | 10 | 6-10
11 | 11 | 15 | 11-15
(11 rows)
UPDATE from comments, Oracle example, dynamically computing range:
create table nr_cf_212(countofsa number);
insert into nr_cf_212 values(1);
insert into nr_cf_212 values(2);
insert into nr_cf_212 values(3);
insert into nr_cf_212 values(4);
insert into nr_cf_212 values(5);
insert into nr_cf_212 values(6);
insert into nr_cf_212 values(7);
insert into nr_cf_212 values(9);
insert into nr_cf_212 values(10);
insert into nr_cf_212 values(11);
select countofsa
, TRUNC((countofsa - 1)/5) * 5 + 1
, (TRUNC((countofsa - 1)/5) + 1) * 5
, TRUNC((countofsa - 1)/5) * 5 + 1 || '-' || (TRUNC((countofsa - 1)/5) + 1) * 5 AS diff
from nr_cf_212;
| COUNTOFSA | TRUNC((COUNTOFSA-1)/5)*5+1 | (TRUNC((COUNTOFSA-1)/5)+1)*5 | DIFF |
|-----------|----------------------------|------------------------------|-------|
| 1 | 1 | 5 | 1-5 |
| 2 | 1 | 5 | 1-5 |
| 3 | 1 | 5 | 1-5 |
| 4 | 1 | 5 | 1-5 |
| 5 | 1 | 5 | 1-5 |
| 6 | 6 | 10 | 6-10 |
| 7 | 6 | 10 | 6-10 |
| 9 | 6 | 10 | 6-10 |
| 10 | 6 | 10 | 6-10 |
| 11 | 11 | 15 | 11-15 |
I tried it with sqlfiddle (http://sqlfiddle.com/#!4/b922e/4).
I broke it into parts to show the "from" column, the "to" column and then the range. If you divide your number by 5 and look at the quotient and the remainder, you will see a pattern:
1/5 = 0 remainder 1
2/5 = 0 remainder 2
3/5 = 0 remainder 3
4/5 = 0 remainder 4
5/5 = 1 remainder 0
6/5 = 1 remainder 1
7/5 = 1 remainder 2
8/5 = 1 remainder 3
9/5 = 1 remainder 4
10/5 = 2 remainder 0
11/5 = 2 remainder 1
The range for the number runs "from" 5 times the quotient plus 1 "to" 5 times (the quotient plus 1). Almost: everything is offset by 1 first. So take your number, subtract 1, then do the division.
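The subtract-1-then-divide rule can be sketched as a small function (Python here just to show the arithmetic):

```python
def bucket_label(n, width=5):
    # subtract 1 first so exact multiples of width
    # land in the right bucket (5 -> "1-5", not "6-10")
    q = (n - 1) // width
    return f"{q * width + 1}-{(q + 1) * width}"

for n in (1, 5, 6, 10, 11):
    print(n, bucket_label(n))
```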

SQL bring back highest sum of rows

I'm looking to calculate the highest basket in my set of data but I can't get my head around how I should do it.
I have data like:
OrderID | CustomerID | BasketID | ProductID | Price
1 | 1 | 1 | 221 | 10
2 | 1 | 1 | 431 | 123
3 | 1 | 2 | 761 | 44
4 | 2 | 3 | 12 | 54
5 | 2 | 3 | 102 | 78
6 | 3 | 4 | 111 | 98
7 | 3 | 4 | 41 | 45
8 | 3 | 5 | 65 | 66
9 | 4 | 6 | 32 | 47
10 | 4 | 6 | 118 | 544
Sorry if it seems quite messy.
But I can easily get the SUM with an obvious
SELECT SUM([Price]), BasketID, CustomerID FROM table
GROUP BY BasketID, CustomerID
But how can I filter the list so that it keeps only the highest-priced BasketID for each CustomerID?
Thanks
You can use a CTE (Common Table Expression) with the ROW_NUMBER function:
;WITH HighestPricePerCustomerAndBasket AS
(
SELECT
    OrderID, CustomerID, BasketID, ProductID, Price,
    ROW_NUMBER() OVER(PARTITION BY BasketID, CustomerID ORDER BY Price DESC) AS 'RowNum'
FROM dbo.YourTable
)
SELECT
[Price], BasketID, CustomerID
FROM HighestPricePerCustomerAndBasket
WHERE RowNum = 1
This CTE "partitions" your data by BasketID,CustomerID, and for each partition, the ROW_NUMBER function hands out sequential numbers, starting at 1 and ordered by Price DESC - so the first row (highest price) gets RowNum = 1 (for each BasketID,CustomerID "partition") which is what I select from the CTE in the SELECT statement after it.
SELECT *
FROM (SELECT *,
DENSE_RANK() OVER (PARTITION BY CustomerID ORDER BY BasketTotal DESC) AS RNK
FROM (SELECT Sum(Price) AS BasketTotal,
BasketID,
CustomerID
FROM [Order] a
GROUP BY BasketID,
CustomerID
) a
) b
WHERE RNK = 1
I managed to conjure something up that worked.
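The DENSE_RANK approach checks out against the sample data; a SQLite sketch (the table is renamed to orders here, since ORDER is a reserved word):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (OrderID INT, CustomerID INT, BasketID INT,
                     ProductID INT, Price INT);
INSERT INTO orders VALUES
  (1, 1, 1, 221, 10),  (2, 1, 1, 431, 123), (3, 1, 2, 761, 44),
  (4, 2, 3, 12, 54),   (5, 2, 3, 102, 78),  (6, 3, 4, 111, 98),
  (7, 3, 4, 41, 45),   (8, 3, 5, 65, 66),   (9, 4, 6, 32, 47),
  (10, 4, 6, 118, 544);
""")
rows = conn.execute("""
SELECT CustomerID, BasketID, BasketTotal
FROM (SELECT BasketTotal, BasketID, CustomerID,
             DENSE_RANK() OVER (PARTITION BY CustomerID
                                ORDER BY BasketTotal DESC) AS rnk
      FROM (SELECT SUM(Price) AS BasketTotal, BasketID, CustomerID
            FROM orders
            GROUP BY BasketID, CustomerID))
WHERE rnk = 1
ORDER BY CustomerID
""").fetchall()
print(rows)  # each customer's highest-total basket
```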
