I am attempting to get the average of the lower third of the data, sorted by wage ascending. I have tried returning the TOPN rows, using a FILTER to exclude BLANK() wage values, and then selecting just the column I care about for the AVERAGE calculation. So I wrote something like the following, where [withSalaryJobCount] is a calculated measure that simply counts the rows with a non-BLANK [salaryAnnual] column:
entryWages:=
AVERAGE(
    SELECTCOLUMNS(
        CALCULATE(
            TOPN(
                [withSalaryJobCount] - [withSalaryJobCount] / 3,
                'table',
                'table'[salaryAnnual],
                ASC
            ),
            FILTER('table', [salaryAnnual] <> BLANK())
        ),
        "bottomThird",
        [salaryAnnual]
    )
)
This fails with the error:
The AVERAGE function only accepts a column reference as an argument
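As I later understood, AVERAGE only accepts a bare column reference, while the iterator AVERAGEX averages an expression over any table, including a calculated one. A minimal sketch of the difference, using the same table and column names:
-- works: a direct column reference
avgSalary := AVERAGE ( 'table'[salaryAnnual] )
-- works: an expression evaluated per row of a (possibly calculated) table
avgSalaryX :=
AVERAGEX (
    FILTER ( 'table', NOT ISBLANK ( 'table'[salaryAnnual] ) ),
    'table'[salaryAnnual]
)
-- fails: AVERAGE over a table expression, which is what the measure above does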
Original Question:
I have a set of SQL calculations that give me the percentile wages as well as what we term the entry-level and experienced wages. The wages are inserted into a table ordered by their value, with an IDENTITY column. A much-simplified query to insert the data and calculate the percentile, entry, and experienced wages is listed below:
CREATE TABLE #t1 (
id int identity,
salaryannual decimal(18,2)
)
INSERT INTO #t1
SELECT salaryannual
FROM table a
ORDER BY salaryannual
SELECT
    (SELECT AVG(CAST(salaryannual AS BIGINT)) FROM #t1 WHERE ID >= minID AND ID <= minID + (ct/3)) entryLevelSalary,
    (SELECT AVG(CAST(salaryannual AS BIGINT)) FROM #t1 WHERE ID >= maxID - (ct/3) AND ID <= maxID) experiencedSalary,
    (SELECT AVG(CAST(salaryannual AS BIGINT)) FROM #t1 WHERE ID = minID + (ct/2+1)/2 OR ID = minID + (ct/2+1)/2 + (ct/2+1)%2) q1,
    (SELECT AVG(CAST(salaryannual AS BIGINT)) FROM #t1 WHERE ID = minID + (ct+1)/2 OR ID = minID + (ct+1)/2 + (ct+1)%2) median,
    (SELECT AVG(CAST(salaryannual AS BIGINT)) FROM #t1 WHERE ID = minID + ct+1 - ((ct/2+1)/2 + (ct/2+1)%2) OR ID = minID + ct+1 - ((ct/2+1)/2)) q3,
    (SELECT AVG(CAST(salaryannual AS BIGINT)) FROM #t1 WHERE ID >= minID AND ID <= maxID) avgSal
FROM
(
    SELECT COUNT(*) ct, MIN(ID) minID, MAX(ID) maxID
    FROM #t1
) uniqueIDs
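For concreteness: with ct = 9 rows and minID = 1, entryLevelSalary averages IDs 1 through 1 + 9/3 = 4, and experiencedSalary averages IDs 9 - 3 = 6 through 9; because of the integer division, each slice actually contains ct/3 + 1 rows.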
The converted percentile calculation takes the form:
pct25Wages:= Calculate(PERCENTILE.INC('table'[salaryAnnual], .25), FILTER([withSalaryCount] > 6))
The FILTER is used because we have a minimum requirement that there be at least 7 entries with a salary before any figures are reported.
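As an aside, FILTER normally takes a table as its first argument, so the snippet above is shorthand for the intent rather than working syntax; a sketch of one way to express the same threshold, assuming [withSalaryCount] is an ordinary measure:
pct25Wages :=
IF (
    [withSalaryCount] > 6,
    PERCENTILE.INC ( 'table'[salaryAnnual], 0.25 )
)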
My question is: how do I convert the entry/experienced calculations into DAX measures?
(SELECT AVG(CAST(salaryannual AS BIGINT)) FROM #t1 WHERE ID>=minID AND ID<=minID+(ct/3)) entryLevelSalary,
(SELECT AVG(CAST(salaryannual AS BIGINT)) FROM #t1 WHERE ID>=maxID-(ct/3) AND ID<=maxID) experiencedSalary,
I have also tried a STDDEV and AVG wage calculation like the one below, but it does not give the expected results, and looking at it I can see it would not work as I expected anyway:
entryWages:= [avgWages] + 3 * [StdDevWage]
So, after much banging my head against a wall, this is how I accomplished it.
First I needed to rank the rows in the set by the [salaryAnnual] column. But since many entries can have the same salary, I tweaked the rank expression a bit to break ties using the unique ID assigned to each row:
RANKX(
    'TABLE',
    'TABLE'[salaryAnnual] + ( 'TABLE'[ID] / 1000000000 ),
    ,
    ASC
)
Then I used this rank to take the bottom third of the rows that have a salary (the count coming from the measure [withSalaryJobCount]):
TOPN(
    1 + ( [withSalaryJobCount] / 3 ),
    'TABLE',
    RANKX(
        'TABLE',
        'TABLE'[salaryAnnual] + ( 'TABLE'[ID] / 1000000000 ),
        ,
        ASC
    ),
    ASC
)
Finally I needed to keep only the rows where [salaryAnnual] is not blank, pull just the [salaryAnnual] column out of the calculated table, and average it:
AVERAGEX(
    SELECTCOLUMNS(
        CALCULATETABLE(
            TOPN(
                1 + ( [withSalaryJobCount] / 3 ),
                'TABLE',
                RANKX(
                    'TABLE',
                    'TABLE'[salaryAnnual] + ( 'TABLE'[ID] / 1000000000 ),
                    ,
                    ASC
                ),
                ASC
            ),
            FILTER('TABLE', 'TABLE'[salaryAnnual] <> BLANK())
        ),
        "bottomThird",
        'TABLE'[salaryAnnual]
    ),
    [bottomThird]
)
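By symmetry, the experienced-wage measure should just take the top third instead; a sketch (untested against the original model) that flips the rank direction so the highest salaries come first:
experiencedWages:=
AVERAGEX(
    SELECTCOLUMNS(
        CALCULATETABLE(
            TOPN(
                1 + ( [withSalaryJobCount] / 3 ),
                'TABLE',
                RANKX(
                    'TABLE',
                    'TABLE'[salaryAnnual] + ( 'TABLE'[ID] / 1000000000 ),
                    ,
                    DESC
                ),
                ASC
            ),
            FILTER('TABLE', 'TABLE'[salaryAnnual] <> BLANK())
        ),
        "topThird",
        'TABLE'[salaryAnnual]
    ),
    [topThird]
)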
Related
I have the following setup, which works fine and generates output as expected.
I'm trying to add the locations subquery into the CTE so my output will have a random location_id for each row.
The subquery is straightforward and should work, but I am getting syntax errors when I try to place it into the data CTE. I was hoping someone could help me out.
CREATE TABLE employees(
employee_id NUMBER(6),
emp_name VARCHAR2(30)
);
INSERT INTO employees(
employee_id,
emp_name
) VALUES
(1, 'John Doe');
INSERT INTO employees(
employee_id,
emp_name
) VALUES
(2, 'Jane Smith');
INSERT INTO employees(
employee_id,
emp_name
) VALUES
(3, 'Mike Jones');
CREATE TABLE locations AS
SELECT level AS location_id,
       'Door ' || level AS location_name
FROM dual
CONNECT BY level <= 5;
with rws as (
    select level rn
    from dual
    connect by level <= 5
),
data as (
    select e.*,
           round(dbms_random.value(1, 5)) n
    from employees e
)
select employee_id,
       emp_name,
       trunc(sysdate) + dbms_random.value(0, 5) as random_date
from rws
join data d on rn <= n
order by employee_id;
-- trying to make this work
with rws as ( select level rn from dual connect by level <= 5 ),
data as ( select e.*, loc.location_id = (
select location_id
from locations order by dbms_random.value()
fetch first 1 row only
),
round (dbms_random.value(1,5)
) n from employees e )
select employee_id,
emp_name,
trunc (sysdate) + dbms_random.value (0, 5) AS random_date
from rws
join data d on rn <= n
order by employee_id;
You need to alias the subquery column expression, rather than trying to assign it to a variable name. So instead of this:
with rws as ( select level rn from dual connect by level <= 5 ),
data as ( select e.*, loc.location_id = (
select location_id
from locations order by dbms_random.value()
fetch first 1 row only
),
round (dbms_random.value(1,5)
) n from employees e )
you would do this:
with rws as (
    select level rn
    from dual
    connect by level <= 5
),
data as (
    select e.*,
           (
               select location_id
               from locations
               order by dbms_random.value()
               fetch first 1 row only
           ) as location_id,
           round(dbms_random.value(1, 5)) as n
    from employees e
)
select employee_id,
       emp_name,
       location_id,
       trunc(sysdate) + dbms_random.value(0, 5) as random_date
from rws
join data d on rn <= n
order by employee_id;
-- main query as in the question, now selecting location_id too
db<>fiddle
But yes, you'll get the same location_id for each row, which probably isn't what you want.
There are probably better ways to avoid it (or to approach whatever you're actually trying to achieve) but one option is to force the subquery to be correlated by adding something like:
where location_id != -1 * e.employee_id
db<>fiddle
although that might be expensive. It's probably worth asking a new question about that specific aspect.
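For clarity, that predicate sits inside the scalar subquery in the data CTE, roughly like this (a sketch):
data as (
    select e.*,
           (
               select location_id
               from locations
               where location_id != -1 * e.employee_id
               order by dbms_random.value()
               fetch first 1 row only
           ) as location_id,
           round(dbms_random.value(1, 5)) as n
    from employees e
)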
I am getting the same location_id for every employee_id, which I don't want either.
The subquery is in the wrong place then; move it to the main query, and correlate against both ID and n:
with rws as (
select level rn
from dual
connect by level <= 5
),
data as (
select e.*,
round (dbms_random.value(1,5)) as n
from employees e
)
select d.employee_id,
d.emp_name,
(
select location_id
from locations
where location_id != -1 * d.employee_id * d.n
order by dbms_random.value()
fetch first 1 row only
) as location_id,
trunc (sysdate) + dbms_random.value (0, 5) AS random_date
from rws r
join data d on r.rn <= d.n
order by d.employee_id;
db<>fiddle
Or move the location part to a new CTE, I suppose, with its own row number; and join that on one of your other generated values.
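Something like this, as a sketch; note the locations are shuffled only once, so rows that end up with the same rn will share a location, and locations needs at least as many rows as the highest rn:
with rws as (
    select level rn
    from dual
    connect by level <= 5
),
locs as (
    select location_id,
           row_number() over (order by dbms_random.value()) as loc_rn
    from locations
),
data as (
    select e.*,
           round(dbms_random.value(1, 5)) as n
    from employees e
)
select d.employee_id,
       d.emp_name,
       l.location_id,
       trunc(sysdate) + dbms_random.value(0, 5) as random_date
from rws r
join data d on r.rn <= d.n
join locs l on l.loc_rn = r.rn
order by d.employee_id;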
PROCEDURE DELETE_X1
IS
v_emp_unum VARCHAR2 (25);
BEGIN
/*FOR rec
IN (SELECT l2.*,
row_number ()
OVER (PARTITION BY case_uid,nvl(min_eff_date, eff_begin_date)
ORDER BY case_uid,nvl(min_eff_date, eff_begin_date))
rw,
ROWID rid
FROM (SELECT l1.*,
MIN (
CASE
WHEN eff_end_date = next_begin - 1
THEN
eff_begin_date
END)
OVER (PARTITION BY case_uid)
min_eff_date,
MAX (
CASE
WHEN previous_end <>
TO_DATE ('12/31/9999',
'mm/dd/yyyy')
AND previous_end + 1 = eff_begin_date
THEN
eff_end_date
END)
OVER (PARTITION BY case_uid)
max_end_date
FROM (SELECT GT.*,
LEAD (
EFF_begin_DATE)
OVER (PARTITION BY CASE_UID
ORDER BY EFF_BEGIN_DATE)
next_begin,
LAG (
EFF_end_DATE)
OVER (PARTITION BY CASE_UID
ORDER BY EFF_BEGIN_DATE)
previous_end
FROM TABLE_OUTPUT GT
WHERE SRC = 'ERICSSON' AND STATUS_CODE = 'X1') l1)
l2)*/
FOR rec
IN (SELECT l2.*,
ROW_NUMBER ()
OVER (PARTITION BY case_uid, min_eff_date, max_end_date
ORDER BY case_uid, min_eff_date, max_end_date)
rw,
ROWID rid
FROM (SELECT l1.*,
(MAX (
start_at)
OVER (
PARTITION BY case_uid
ORDER BY EFF_BEGIN_DATE
ROWS UNBOUNDED PRECEDING))
min_eff_date,
(MIN (
break_at)
OVER (
PARTITION BY case_uid
ORDER BY EFF_BEGIN_DATE
ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING))
max_end_date
FROM (SELECT GT.*,
(CASE
WHEN LAG (
EFF_end_DATE)
OVER (
PARTITION BY CASE_UID
ORDER BY EFF_BEGIN_DATE) =
EFF_BEGIN_DATE
- 1
THEN
NULL
ELSE
EFF_BEGIN_DATE
END)
start_at,
(CASE
WHEN LEAD (
EFF_BEGIN_DATE)
OVER (
PARTITION BY case_uid
ORDER BY EFF_BEGIN_DATE) =
CASE
WHEN EFF_end_DATE <>
TO_DATE (
'12/31/9999',
'mm/dd/yyyy')
THEN
EFF_end_DATE
+ 1
ELSE
EFF_end_DATE
END
THEN
NULL
ELSE
EFF_end_DATE
END)
break_at
FROM TABLE_OUTPUT GT
WHERE SRC = 'ERICSSON' AND STATUS_CODE = 'X1') l1)
l2)
The first part of the code is commented out; the second part is its rewrite.
Output of the commented-out code:

OFF TIME   1/1/2017    1/7/2017    X1
OFF TIME   1/8/2017    2/1/2017    X1

Output of the new code:

OFF TIME   1/1/2017    2/1/2017    X1
NORMAL     2/2/2017    2/2/2017    AB
OFF TIME   2/20/2017
The LAG function is used to access data from a previous row, and the LEAD function is used to return data from rows further down the result set; the code also uses the MIN and MAX analytic functions.
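As far as I can tell, on a simplified query against the same table they behave like this:
SELECT case_uid,
       eff_begin_date,
       eff_end_date,
       LAG(eff_end_date)    OVER (PARTITION BY case_uid ORDER BY eff_begin_date) AS previous_end,
       LEAD(eff_begin_date) OVER (PARTITION BY case_uid ORDER BY eff_begin_date) AS next_begin
FROM table_output
WHERE src = 'ERICSSON'
  AND status_code = 'X1';
-- previous_end + 1 = eff_begin_date  means this row starts the day after the previous one ends
-- next_begin = eff_end_date + 1      means the next row starts the day after this one ends
-- i.e. the rows are contiguous and look like one merged interval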
But I am pretty confused by the overall flow of the code and can't understand it as a whole. Please explain the logic behind it.
I am working on a small project in Oracle. I need to get the top three best-selling products, with the total amounts taken for these for the year and for each of the four quarters: April-June, July-September, October-December and January-March. I have already done the first part; I just need help getting the four quarterly totals for each product. Hope someone can help, thank you.
This is the SQL I have used so far:
select * from (
select "FACTQUANTITY"."PRODUCTID" as "PRODUCTID",
"DIMPRODUCT"."PRODUCTNAME" as "PRODUCTNAME",
sum(FACTQUANTITY.QUANTITY) as "QUANTITY"
from "FACTQUANTITY" "FACTQUANTITY",
"DIMPRODUCT" "DIMPRODUCT"
where "DIMPRODUCT"."PRODUCTID"="FACTQUANTITY"."PRODUCTID"
group by FACTQUANTITY.PRODUCTID,
DIMPRODUCT.PRODUCTNAME
order by sum(FACTQUANTITY.QUANTITY) desc
)
WHERE ROWNUM <= 3;
You can start from here:
select
trunc(fact_table.date_column,'Q') as quarter,
"FACTQUANTITY"."PRODUCTID" ,
"DIMPRODUCT"."PRODUCTNAME",
sum(FACTQUANTITY.QUANTITY) as "QUANTITY"
from "FACTQUANTITY" JOIN "DIMPRODUCT"
ON "DIMPRODUCT"."PRODUCTID"="FACTQUANTITY"."PRODUCTID"
group by
FACTQUANTITY.PRODUCTID, DIMPRODUCT.PRODUCTNAME, trunc(fact_table.date_column,'Q')
;
Subsequently, you can:
with a as (<previous query>)
select *
from (
select
quarter,
productid,
productname,
quantity,
row_number() over (partition by quarter order by quantity desc) rnk
from a
)
where rnk <= 3;
SELECT *
FROM (
    SELECT sq.*,
           ROW_NUMBER() OVER ( PARTITION BY quarter
                               ORDER BY quantity DESC, PRODUCTNAME ASC ) AS rn
    FROM (
        SELECT TRUNC( f.datetime, 'Q' ) AS quarter,
               f.PRODUCTID,
               d.PRODUCTNAME,
               SUM( f.QUANTITY ) AS QUANTITY
        FROM FACTQUANTITY f
        INNER JOIN DIMPRODUCT d
                ON ( d.PRODUCTID = f.PRODUCTID )
        GROUP BY TRUNC( f.datetime, 'Q' ),
                 f.PRODUCTID,
                 d.PRODUCTNAME
    ) sq
)
WHERE rn <= 3;
I have a table with 500k transactions. I want to fetch the last balance for a particular date, so I have written a query like the one below.
SELECT curr_balance
FROM transaction_details
WHERE acct_num = '10'
AND is_deleted = 'N'
AND ( value_date, srl_num ) IN(
SELECT MAX( value_date ), MAX( srl_num )
FROM transaction_details
WHERE TO_DATE( value_date, 'dd/mm/yyyy' )
<= TO_DATE( ADD_MONTHS( '05-APR-2012', 1 ), 'dd/mm/yyyy' )
AND acct_num = '10'
AND is_deleted = 'N'
AND ver_status = 'Y' )
AND ver_status = 'Y'
This has to be executed for each of 12 incrementing months to find the last balance for each month. But the query has a high CPU cost, and running it 12 times takes a huge amount of time. How can I rework it to get the results faster? Could it be broken into two parts in PL/SQL to improve performance?
Try:
select *
from (
    SELECT value_date, srl_num, curr_balance,
           row_number() over (partition by trunc(value_date - interval '5' day, 'MM')
                              order by srl_num desc) as rnk
    FROM transaction_details
    WHERE acct_num = '10'
      AND is_deleted = 'N'
      AND ver_status = 'Y'
)
where rnk = 1;
You'll get a report with the balance at the last srl_num of each month in your table.
The benefit: your approach scans the table 24 times for a 12-month report, while this approach scans the table once.
The analytic function ranks each record within its month (the partition by clause), ordering the rows in the month by srl_num.
You don't have to query your table twice. Try using analytic functions
SELECT t.curr_balance
-- , any other column you want as long it is in the subselect.
FROM (
SELECT
trans.curr_balance
, trans.value_date
-- any other column you want
, trans.srl_num
, MAX(trans.srl_num) OVER(PARTITION BY trans.value_date) max_srl_num
, MAX(trans.value_date) OVER() max_date
FROM transaction_details trans
WHERE TO_DATE( value_date, 'dd/mm/yyyy' ) <= TO_DATE( ADD_MONTHS( '01-APR-2012', 1 ), 'dd/mm/yyyy' )
AND acct_num = '10'
AND is_deleted = 'N'
AND ver_status = 'Y'
) t
WHERE t.max_date = t.value_date
AND t.max_srl_num = t.srl_num
A couple of thoughts.
Why do you have TO_DATE( value_date, ... )? Isn't your data type DATE? This might be preventing an index on that column from being used, if you have one.
Note that (this is a wild guess) if your srl_num is not the highest for the latest date, you will get incorrect results and might not return any rows.
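On the first point, a sketch of an index-friendly version of the date filter, assuming value_date really is a DATE column (the date literal comes from the question):
SELECT curr_balance
FROM transaction_details
WHERE acct_num = '10'
  AND is_deleted = 'N'
  AND ver_status = 'Y'
  AND value_date <= ADD_MONTHS(DATE '2012-04-05', 1);
Leaving the column bare lets the optimizer use an index on value_date; wrapping it in TO_DATE forces a conversion on every row.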
I have identified a way to get fast paged results from the database using CTEs and the Row_Number function, as follows...
DECLARE @PageSize INT = 1
DECLARE @PageNumber INT = 2
DECLARE @Customer TABLE (
ID INT IDENTITY(1, 1),
Name VARCHAR(10),
age INT,
employed BIT)
INSERT INTO @Customer
(name,age,employed)
SELECT 'bob',21,1
UNION ALL
SELECT 'fred',33,1
UNION ALL
SELECT 'joe',29,1
UNION ALL
SELECT 'sam',16,1
UNION ALL
SELECT 'arthur',17,0;
WITH cteCustomers
AS ( SELECT
id,
Row_Number( ) OVER(ORDER BY Age DESC) AS Row
FROM @Customer
WHERE employed = 1
/*Imagine I've joined to loads more tables with a really complex where clause*/
)
SELECT
name,
age,
Total = ( SELECT
Count( id )
FROM cteCustomers )
FROM cteCustomers
INNER JOIN @Customer cust
/*This is where I choose the columns I want to read, it returns really fast!*/
ON cust.id = cteCustomers.id
WHERE row BETWEEN ( @PageSize * ( @PageNumber - 1 ) + 1 ) AND ( @PageSize * @PageNumber )
ORDER BY row ASC
Using this technique, the results come back really fast, even with complex joins and filters.
To perform paging I need to know the total rows returned by the full CTE. I have bodged this by adding a column that holds it:
Total = ( SELECT
Count( id )
FROM cteCustomers )
Is there a better way to return the total in a separate result set, without bodging it into a column? Because it's a CTE, I can't seem to get it into a second result set.
Without using a temp table first, I'd use a CROSS JOIN to reduce the risk of row-by-row evaluation of the COUNT.
To get the total row count, the count needs to happen separately from the WHERE clause:
WITH cteCustomers
AS ( SELECT
id,
Row_Number( ) OVER(ORDER BY Age DESC) AS Row
FROM @Customer
WHERE employed = 1
/*Imagine I've joined to loads more tables with a really complex where clause*/
)
SELECT
name,
age,
Total
FROM cteCustomers
INNER JOIN @Customer cust
/*This is where I choose the columns I want to read, it returns really fast!*/
ON cust.id = cteCustomers.id
CROSS JOIN
(SELECT Count( *) AS Total FROM cteCustomers ) foo
WHERE row BETWEEN ( @PageSize * ( @PageNumber - 1 ) + 1 ) AND ( @PageSize * @PageNumber )
ORDER BY row ASC
However, this isn't guaranteed to give accurate results, as demonstrated here:
can I get count() and rows from one sql query in sql server?
Edit: after a few comments.
How to avoid a CROSS JOIN
WITH cteCustomers
AS ( SELECT
id,
Row_Number( ) OVER(ORDER BY Age DESC) AS Row,
COUNT(*) OVER () AS Total --the magic for this edit
FROM @Customer
WHERE employed = 1
/*Imagine I've joined to loads more tables with a really complex where clause*/
)
SELECT
name,
age,
Total
FROM cteCustomers
INNER JOIN @Customer cust
/*This is where I choose the columns I want to read, it returns really fast!*/
ON cust.id = cteCustomers.id
WHERE row BETWEEN ( @PageSize * ( @PageNumber - 1 ) + 1 ) AND ( @PageSize * @PageNumber )
ORDER BY row ASC
Note: YMMV on performance depending on SQL Server 2005 or 2008, service pack level, etc.
Edit 2:
SQL Server Central shows another technique, using a reverse ROW_NUMBER. It looks useful.
@Digiguru
OMG, this really is the holy grail!
WITH cteCustomers
AS ( SELECT id,
Row_Number() OVER(ORDER BY Age DESC) AS Row,
Row_Number() OVER(ORDER BY id ASC)
+ Row_Number() OVER(ORDER BY id DESC) - 1 AS Total /*<- voodoo here*/
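/* For the i-th of N filtered rows (ordered by id), the ascending number is i and the
   descending number is N - i + 1, so their sum minus 1 is always N, the total row count. */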
FROM @Customer
WHERE employed = 1
/*Imagine I've joined to loads more tables with a really complex where clause*/
)
SELECT name, age, Total
/*This is where I choose the columns I want to read, it returns really fast!*/
FROM cteCustomers
INNER JOIN @Customer cust
ON cust.id = cteCustomers.id
WHERE row BETWEEN ( @PageSize * ( @PageNumber - 1 ) + 1 ) AND ( @PageSize * @PageNumber )
ORDER BY row ASC
So obvious now.