BigQuery translation from Oracle - min & max with partition & range - oracle

I am trying to translate a query from Oracle to BigQuery and am coming up with an error. No matter how I try to rearrange the BigQuery version, I'm not getting the results that I expect: the values are far too large on the BigQuery version. Can someone take a look and see what might be missing?
Oracle Version
MAX(LASTWK) OVER (
    PARTITION BY TO_CHAR(STRT_DT, 'D'), HR, TIME_INT
    ORDER BY TRUNC(STRT_DT)
    RANGE BETWEEN 35 PRECEDING AND 0 PRECEDING
) MAXCNT,
MIN(LASTWK) OVER (
    PARTITION BY TO_CHAR(STRT_DT, 'D'), HR, TIME_INT
    ORDER BY TRUNC(STRT_DT)
    RANGE BETWEEN 35 PRECEDING AND 0 PRECEDING
) MINCNT
BigQuery version that isn't working as expected:
MAX(LASTWK) OVER (
    PARTITION BY format_date('%d', STRT_DT), HR, cast(TIME_INT as int64)
    ORDER BY datetime_trunc(STRT_DT, day)
    RANGE BETWEEN 35 PRECEDING AND 0 PRECEDING
) MAXCNT,
MIN(LASTWK) OVER (
    PARTITION BY format_date('%d', STRT_DT), HR, cast(TIME_INT as int64)
    ORDER BY datetime_trunc(STRT_DT, day)
    RANGE BETWEEN 35 PRECEDING AND 0 PRECEDING
) MINCNT
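One likely mismatch, assuming STRT_DT is a DATETIME: Oracle's TO_CHAR(STRT_DT, 'D') returns the day of the week, while BigQuery's format_date('%d', ...) returns the day of the month, so the BigQuery partitions group different rows together. BigQuery's RANGE frame also needs a numeric ORDER BY key, which UNIX_DATE can supply. A closer translation might look like this sketch ('%u', the ISO day of week, is an assumption; Oracle's 'D' numbering depends on NLS settings, but any consistent day-of-week key preserves the partitioning):
MAX(LASTWK) OVER (
    PARTITION BY format_date('%u', STRT_DT), HR, cast(TIME_INT as int64)
    ORDER BY unix_date(date(STRT_DT))
    RANGE BETWEEN 35 PRECEDING AND 0 PRECEDING
) MAXCNT,
MIN(LASTWK) OVER (
    PARTITION BY format_date('%u', STRT_DT), HR, cast(TIME_INT as int64)
    ORDER BY unix_date(date(STRT_DT))
    RANGE BETWEEN 35 PRECEDING AND 0 PRECEDING
) MINCNT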

SQL Server: Find Max, 2nd Max and 3rd Max values - query processing time question

I'm learning SQL and I'm trying to find an explanation for the behavior of the code below.
Currently, with this code, I can obtain the Max, 2nd Max, and 3rd Max values from a column in a table (table name: #TEMP_EST_FYQ). The table has 3 attributes and more than 65K tuples.
The code produces the expected results, and the query executes in less than a second for the 2nd Max and 3rd Max values, BUT for the Max the query takes over 8 minutes to complete.
I'm curious about what could be the reason for this. Thanks in advance!
-- Query executes in: over 8 minutes
SELECT DISTINCT SE_Request_FYQ
FROM #TEMP_EST_FYQ A
WHERE (0) = (SELECT COUNT(DISTINCT(SE_Request_FYQ))
             FROM #TEMP_EST_FYQ B
             WHERE B.SE_Request_FYQ > A.SE_Request_FYQ)
-- Query executes in: under 1 second
SELECT DISTINCT SE_Request_FYQ
FROM #TEMP_EST_FYQ A
WHERE (1) = (SELECT COUNT(DISTINCT(SE_Request_FYQ))
             FROM #TEMP_EST_FYQ B
             WHERE B.SE_Request_FYQ > A.SE_Request_FYQ)
-- Query executes in: under 1 second
SELECT DISTINCT SE_Request_FYQ
FROM #TEMP_EST_FYQ A
WHERE (2) = (SELECT COUNT(DISTINCT(SE_Request_FYQ))
             FROM #TEMP_EST_FYQ B
             WHERE B.SE_Request_FYQ > A.SE_Request_FYQ)
Value results:
"2022-Q1" Max
"2021-Q4" 2nd Max
"2021-Q3" 3rd Max
I can't know the exact reason for the performance difference without checking the execution plans. For the max value, I suggest just doing a MAX query:
SELECT MAX(SE_Request_FYQ)
FROM #TEMP_EST_FYQ;
For this query, an index on SE_Request_FYQ should let SQL Server find the max value almost immediately. For the second and third highest, use DENSE_RANK:
WITH cte AS (
    SELECT *, DENSE_RANK() OVER (ORDER BY SE_Request_FYQ DESC) rnk
    FROM #TEMP_EST_FYQ
)
SELECT SE_Request_FYQ
FROM cte
WHERE rnk IN (2, 3);
This query should also benefit from the same index suggested above for the max query.
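For reference, a minimal sketch of that index (the index name is illustrative):
CREATE INDEX IX_TEMP_EST_FYQ ON #TEMP_EST_FYQ (SE_Request_FYQ);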

Multiply records across rows - pl/sql [duplicate]

In SQL there are aggregation operators, like AVG, SUM, COUNT. Why doesn't it have an operator for multiplication? "MUL" or something.
I was wondering, does it exist for Oracle, MSSQL, or MySQL? If not, is there a workaround that would give this behaviour?
By MUL do you mean progressive multiplication of values?
Even with 100 rows of some small size (say around 10), your MUL(column) is going to overflow any data type! With such a high probability of misuse or abuse, and very limited scope for use, it does not need to be a SQL standard. As others have shown, there are mathematical ways of working it out, just as there are many ways to do tricky calculations in SQL using standard (and commonly used) methods.
Sample data:
Column
1
2
4
8
COUNT : 4 items (1 for each non-null)
SUM   : 1 + 2 + 4 + 8 = 15
AVG   : 3.75 (SUM/COUNT)
MUL   : 1 x 2 x 4 x 8 ? (= 64)
For completeness, the Oracle, MSSQL, MySQL core implementations *
Oracle : EXP(SUM(LN(column))) or POWER(N,SUM(LOG(column, N)))
MSSQL : EXP(SUM(LOG(column))) or POWER(N,SUM(LOG(column)/LOG(N)))
MySQL : EXP(SUM(LOG(column))) or POW(N,SUM(LOG(N,column)))
Take care when using EXP/LOG in SQL Server; watch the return type: http://msdn.microsoft.com/en-us/library/ms187592.aspx
The POWER form allows for larger numbers (using bases larger than Euler's number). In cases where the result grows too large to convert back using POWER, you can return just the logarithmic value and calculate the actual number outside of the SQL query.
* LOG(0) and LOG(-ve) are undefined. The below shows only how to handle this in SQL Server; equivalents can be found for the other SQL flavours using the same concept.
create table MUL(data int)
insert MUL select 1 union all
select 2 union all
select 4 union all
select 8 union all
select -2 union all
select 0

select CASE WHEN MIN(abs(data)) = 0 then 0
            ELSE EXP(SUM(Log(abs(nullif(data,0)))))                          -- the base mathematics
                 * round(0.5 - count(nullif(sign(sign(data)+0.5),1)) % 2, 0) -- pairs up negatives
       END
from MUL
Ingredients:
Taking the abs() of data: if the min is 0, a 0 exists in the data, so multiplying by whatever else is futile and the result is 0.
When data is 0, NULLIF converts it to null. The abs() and log() then both return null, causing the row to be excluded from SUM().
If data is not 0, abs() allows us to multiply a negative number using the LOG method - we keep track of the negativity elsewhere.
Working out the final sign:
sign(data) returns 1 for > 0, 0 for 0, and -1 for < 0.
We add another 0.5 and take the sign() again, so we have now classified 0 and 1 both as 1, and only -1 as -1.
Again use NULLIF to remove the 1's from COUNT(), since we only need to count up the negatives.
% 2 against the COUNT() of negative numbers returns either
--> 1 if there is an odd number of negative numbers
--> 0 if there is an even number of negative numbers
More mathematical tricks: we take the 1 or 0 off 0.5, so that the above becomes
--> (0.5 - 1 = -0.5 => rounds to -1) if there is an odd number of negative numbers
--> (0.5 - 0 = 0.5 => rounds to 1) if there is an even number of negative numbers
We multiply this final 1/-1 against the SUM-PRODUCT value for the real result.
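As a quick check against the rows inserted above: with the 0 present, the CASE short-circuits to 0. Removing the 0 row leaves {1, 2, 4, 8, -2}, and the query returns -128:
delete from MUL where data = 0
-- re-running the SELECT above: EXP(SUM(LOG(abs(...)))) = 1 x 2 x 4 x 8 x 2 = 128,
-- and with one negative value round(0.5 - 1 % 2, 0) = -1, giving -128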
No, but you can use Mathematics :)
if yourColumn is always bigger than zero:
select EXP(SUM(LOG(yourColumn))) As ColumnProduct from yourTable
I see an Oracle answer is still missing, so here it is:
SQL> with yourTable as
2 ( select 1 yourColumn from dual union all
3 select 2 from dual union all
4 select 4 from dual union all
5 select 8 from dual
6 )
7 select EXP(SUM(LN(yourColumn))) As ColumnProduct from yourTable
8 /
COLUMNPRODUCT
-------------
64
1 row selected.
Regards,
Rob.
With PostgreSQL, you can create your own aggregate functions, see http://www.postgresql.org/docs/8.2/interactive/sql-createaggregate.html
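For example, a minimal sketch of a product aggregate (the name mul is illustrative; numeric_mul is PostgreSQL's built-in numeric multiplication function, used here as the state transition):
CREATE AGGREGATE mul(numeric) (
    SFUNC = numeric_mul,  -- multiply the running product by the next value
    STYPE = numeric,      -- the running product is kept as numeric
    INITCOND = '1'        -- start the product at 1
);
-- usage: SELECT mul(yourColumn) AS ColumnProduct FROM yourTable;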
To create an aggregate function on MySQL, you'll need to build an .so (linux) or .dll (windows) file. An example is shown here: http://www.codeproject.com/KB/database/mygroupconcat.aspx
I'm not sure about MSSQL and Oracle, but I bet they have options to create custom aggregates as well.
You'll break any datatype fairly quickly as numbers mount up.
Using LOG/EXP is tricky because of numbers <= 0, which fail when passed to LOG. I wrote a solution in this question that deals with this.
Using CTE in MS SQL:
CREATE TABLE Foo(Id int, Val int)
INSERT INTO Foo VALUES(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)

;WITH cte AS
(
    SELECT Id, Val AS Multiply, row_number() over (order by Id) as rn
    FROM Foo
    WHERE Id = 1
    UNION ALL
    SELECT ff.Id, cte.Multiply * ff.Val as Multiply, ff.rn
    FROM (SELECT f.Id, f.Val, row_number() over (order by f.Id) as rn
          FROM Foo f) ff
    INNER JOIN cte ON ff.rn - 1 = cte.rn
)
SELECT * FROM cte
Not sure about Oracle or sql-server, but in MySQL you can just use * like you normally would.
mysql> select count(id), count(id)*10 from tablename;
+-----------+--------------+
| count(id) | count(id)*10 |
+-----------+--------------+
|       961 |         9610 |
+-----------+--------------+
1 row in set (0.00 sec)

11g Oracle aggregate SQL query

Can you please help me with a query for this scenario? In the case below it should return a single row with A=13, because 13 and 14 both have the most occurrences in column A, and the maximum value of B (30) is greater for 13. We are interested in the maximum occurrences of A, and in case of a tie, B should be considered as the tie breaker.
A   B
13  30
13  12
14  10
14  25
15   5
In the case below, where there is a single occurrence of each A (all tied), it should return 14, which has the maximum value of 40 for B.
A   B
13  30
14  40
15   5
Use case: we get calls from corporate customers. We are interested in knowing during which hours of the day most calls come, and in case of a tie, which of the busiest hours has the longest call.
Further question
There is a further question on this. I want to use either of the two solutions - '11g or lower' from #GurV or 'dense_rank' from #mathguy - in the bigger query below. How can I do it?
SELECT dv.id , u.email , dv.email_subject AS headline , dv.start_date , dv.closing_date, b.name AS business_name, ls.call_cost, dv.currency,
SUM(lsc.duration) AS duration, COUNT(lsc.id) AS call_count, ROUND(AVG(lsc.duration), 2) AS avg_duration
-- max(extract(HOUR from started )) keep (dense_rank last order by count(duration), max(duration)) as most_popular_hour
FROM deal_voucher dv
JOIN lead_source ls ON dv.id = ls.deal_id
JOIN lead_source_call lsc ON ls.PHONE_SID = lsc.phone_number_id
JOIN business b ON dv.business_id = b.id
JOIN users u ON b.id = u.business_id
AND TRUNC(dv.closing_date) = to_date('13-01-2017', 'dd-mm-yyyy')
AND lsc.status = 'completed' and lsc.duration >= 30
GROUP BY dv.id , u.email , dv.email_subject , dv.start_date , dv.closing_date, b.name, ls.call_cost, dv.currency
--, extract(HOUR from started )
Try this if 12c+
select a
from t
group by a
order by count(*) desc, max(b) desc
fetch first 1 row only;
If 11g or lower:
select * from (
select a
from t
group by a
order by count(*) desc, max(b) desc
) where rownum = 1;
Note that if there is equal count and equal max value for two or more values of A, then any one of them will be fetched.
Here is a query that will work in older versions (no fetch clause) and does not require a subquery. It uses the first/last function. In case of ties by both "count by A" and "value of max(B)", it selects only the row with the largest value of A. You can change that to min(A), or even to sum(A) (although that probably doesn't make sense in your problem), or to LISTAGG(A, ',') WITHIN GROUP (ORDER BY A) to get a comma-delimited list of the A's that are tied for first place, although that requires 11.2 (I believe).
select max(a) keep (dense_rank last order by count(b), max(b)) as a
, max(max(b)) keep (dense_rank last order by count(b)) as b
from inputs
group by a
;
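As for the follow-up about embedding this in the bigger query: rather than uncommenting the keep (dense_rank ...) line, one approach is to aggregate the calls per hour in a subquery first, then apply the same first/last trick per deal and join the result back to the main query on dv.id. A sketch, assuming started on lead_source_call is the call start timestamp (the other names reuse the question's schema):
select deal_id,
       max(hr) keep (dense_rank last order by call_count, total_duration) as most_popular_hour
from (
    select ls.deal_id,
           extract(hour from lsc.started) as hr,
           count(*) as call_count,
           sum(lsc.duration) as total_duration
    from lead_source ls
    join lead_source_call lsc on ls.PHONE_SID = lsc.phone_number_id
    where lsc.status = 'completed' and lsc.duration >= 30
    group by ls.deal_id, extract(hour from lsc.started)
)
group by deal_id;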

Oracle query fine tuning

Hi, I have a database with a large number of records, roughly 400K, which is expected to grow even more.
I have a query that fetches data from this table to display records to the user. My query is below.
SELECT "PC0".PYID AS "pyID" ,
"PC0".NAME AS "Name" ,
"PC0".OPPORTUNITYSTAGE AS "OpportunityStage" ,
"PC0".PXCREATEOPNAME AS "pxCreateOpName" ,
"PC0".PZINSKEY AS "pzInsKey" ,
"PC0".OPPORTUNITYSHORTNAME AS "OpportunityShortName" ,
"PC0".IDTYPE AS "IDType" ,
"PC0".IDNO AS "IDNo" ,
"Campaign".PROGRAMNAME AS "ProgramName" ,
"Campaign".ENDDATE AS "EndDate" ,
"PC0".PRODUCTNAME AS "ProductName" ,
"PC0".PRODUCTTYPE AS "ProductType" ,
"PC0".OPPORTUNITYSTAGE AS "OpportunityStage" ,
"PC0".PXCREATEOPNAME AS "pxCreateOpName" ,
"PC0".OPPORTUNITYSOURCE AS "OpportunitySource" ,
"PC0".OPPORTUNITYOWNER AS "OpportunityOwner" ,
"PC0".IDTYPE
||"PC0".IDNO AS "pyTextValue(1)" ,
"PC0".REMINDERDATE AS "ReminderDate" ,
"PC0".STAGELASTCHANGED AS "StageLastChanged" ,
ROUND((CAST(SYSDATE AS DATE) - CAST("PC0".STAGELASTCHANGED AS DATE))) AS "pyIntegerValue(1)" ,
(
CASE
WHEN ROUND((CAST(SYSDATE AS DATE) - CAST("PC0".REMINDERDATE AS DATE))) > 0
THEN 1
WHEN ROUND((CAST(SYSDATE AS DATE) - CAST("PC0".STAGELASTCHANGED AS DATE))) > 7
THEN 2
ELSE 3
END) AS "pyIntegerValue(2)" ,
"PC0".PXCREATEDATETIME AS "pxCreateDateTime" ,
"PC0".CAMPAIGNID AS "CampaignID" ,
ROUND((CAST(SYSDATE AS DATE) - CAST("PC0".REMINDERDATE AS DATE))) AS "pyIntegerValue(3)"
FROM MYCO_OPPORTUNITY "PC0"
LEFT OUTER JOIN MYCO_CAMPAIGN "Campaign"
ON ( "PC0".CAMPAIGNID = "Campaign".PYID)
ORDER BY 21 ASC,
22 DESC
This takes nearly 13 seconds to fetch the first 50 records in SQL Developer. In real time I will be fetching almost 5K records at a time.
The 13-second timing is after I defined a function-based index for the CAST on the REMINDERDATE and STAGELASTCHANGED columns, plus a bitmap join index.
Can you please suggest how I should optimize the query? ORDER BY on a large result set might be an issue, but it is a must for me. :(
Make sure you have an index on "PC0".CAMPAIGNID and on "Campaign".PYID.
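A minimal sketch of those indexes (the index names are illustrative):
CREATE INDEX IX_OPPORTUNITY_CAMPAIGNID ON MYCO_OPPORTUNITY (CAMPAIGNID);
CREATE INDEX IX_CAMPAIGN_PYID ON MYCO_CAMPAIGN (PYID);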
Make sure your SGA is set high enough. Without knowing a lot about the server and database, it's hard to provide more specific guidance than that.
You're using ORDER BY on a computed column, which means Oracle has to compute this value for all 400K rows before it can sort and return results. To be certain that this is the problem, test without the ORDER BY.
There are a number of possible solutions, but this example does not seem to be your actual use case, so it's pretty much meaningless to suggest optimizations for it.
Without more knowledge about the data, I'd suggest splitting the query into three parts connected with UNION ALL and implementing indexes on REMINDERDATE and STAGELASTCHANGED.
select * from ( [part 1] where reminderdate > sysdate order by pxCreateDateTime )
union all
select * from ( [part 2] where reminderdate <= sysdate and stagelastchanged + 7 < sysdate order by pxCreateDateTime )
union all
select * from ( [part 3] where reminderdate <= sysdate and stagelastchanged + 7 >= sysdate order by pxCreateDateTime )
I'd then expect that parts 1 and 2 can be satisfied using the indexes, and part 3 by a full table scan, which might be helped by adding a FIRST_ROWS hint.
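For reference, the hint has this shape (a sketch; the row count of 50 is illustrative, applied here to a simplified form of part 3):
SELECT /*+ FIRST_ROWS(50) */ *
FROM MYCO_OPPORTUNITY
WHERE REMINDERDATE <= SYSDATE
ORDER BY PXCREATEDATETIME;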

Oracle Moving Average without Current Row

I would like to calculate a moving average in Oracle over the last 30 records, excluding the current row.
select crd_nb, flng_month, curr_0380,
avg(curr_0380) over (partition by crd_nb order by flng_month ROWS 30 PRECEDING) mavg
from us_ordered
The above SQL calculates the moving average over 30 records including the current row.
Is there any way to calculate the moving average excluding the current row?
select
crd_nb,
flng_month,
curr_0380,
avg(curr_0380) over (
partition by crd_nb
order by flng_month
ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING
) mavg
from us_ordered
#be_here_now's answer is clearly superior. I'm leaving mine in place nonetheless, as it's still functional, if needlessly complex.
An answer would be to get the sum and count individually, subtract out the current row, then divide the two results. It's a little ugly, but it should work:
SELECT crd_nb,
       flng_month,
       curr_0380,
       ( SUM (curr_0380)
           OVER (PARTITION BY crd_nb
                 ORDER BY flng_month ROWS 30 PRECEDING)
         - curr_0380)
       / ( COUNT (curr_0380)
             OVER (PARTITION BY crd_nb
                   ORDER BY flng_month ROWS 30 PRECEDING)
           - 1)
       mavg
FROM us_ordered
If curr_0380 can be null, you'd have to tweak this slightly so that the current row is removed only if it's not null.
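A sketch of that tweak, assuming the same table and columns: subtract the current value only when it is not null, and shrink the divisor only in that case (the NULLIF guards against dividing by zero when the frame holds nothing else to average):
SELECT crd_nb,
       flng_month,
       curr_0380,
       ( SUM (curr_0380)
           OVER (PARTITION BY crd_nb
                 ORDER BY flng_month ROWS 30 PRECEDING)
         - NVL (curr_0380, 0))
       / NULLIF ( COUNT (curr_0380)
                    OVER (PARTITION BY crd_nb
                          ORDER BY flng_month ROWS 30 PRECEDING)
                  - CASE WHEN curr_0380 IS NOT NULL THEN 1 ELSE 0 END, 0)
       mavg
FROM us_ordered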
