How to use SUM and MAX in a select statement with more than one table - Oracle

I have 2 tables:

Table a, mempur:
memberno = member number
purdt = purchase date
amount = purchase amount

Table b, meminfo:
memberno = member number
fname = first name
age = age
select a.memberno, b.fname, sum(a.amount), a.purdt, b.age
from mempur a,
     (select max(purdt) as maxdate, memberno
      from mempur
      group by memberno) maxresult,
     meminfo b
where a.memberno = b.memberno
  and a.purdt between '01-JAN-22' and '28-FEB-22'
  and a.memberno = maxresult.memberno
  and a.purdt = maxresult.maxdate
group by a.memberno, b.fname, a.purdt, b.age
order by a.memberno;
How do I get a result with the total purchase amount and the latest purchase date from table mempur?
With this query I am able to show a result, but the total amount is incorrect for the date range.
Any help is much appreciated.
My sample data:
MEMBERNO PURDT AMOUNT
--------------- --------------- ---------
BBMY0004580 12-AUG-21 823.65
BBMY0004580 12-AUG-21 1709.1
BBMY0004580 26-AUG-21 1015.1
BBMY0004580 28-AUG-21 1105.1
My result only shows a total amount of 1105.1.

You can aggregate in mempur and then join to meminfo:
SELECT i.*, p.total_amount, p.maxdate
FROM meminfo i
INNER JOIN (
    SELECT memberno, SUM(amount) AS total_amount, MAX(purdt) AS maxdate
    FROM mempur
    WHERE purdt BETWEEN DATE '2022-01-01' AND DATE '2022-02-28'
    GROUP BY memberno
) p ON p.memberno = i.memberno;
You may use a LEFT JOIN instead if there are members with no purchases that you want to keep in the results.
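For completeness, a minimal sketch of that LEFT JOIN variant; the COALESCE default is an assumption about how members without purchases should appear:
SELECT i.*,
       COALESCE(p.total_amount, 0) AS total_amount,  -- assumed default for members with no purchases
       p.maxdate                                     -- stays NULL when there are no purchases
FROM meminfo i
LEFT JOIN (
    SELECT memberno, SUM(amount) AS total_amount, MAX(purdt) AS maxdate
    FROM mempur
    WHERE purdt BETWEEN DATE '2022-01-01' AND DATE '2022-02-28'
    GROUP BY memberno
) p ON p.memberno = i.memberno;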

Your query gets the maximum purdt per member and adds up the amounts for that single date. It also checks whether that maximum purdt falls in January or February 2022: if it does, the row is shown; if not, it isn't. This is not the query you want.
Apart from that, the query looks rather ugly. You are using an ancient join syntax that is hard to read and prone to errors. We used that in the 1980s, but in 1992 explicit joins made it into the SQL standard; you should no longer use this old comma syntax. It is strange to see it still being used; it feels like visiting a museum. Then, you are using table aliases. The idea of these is to make a query more readable, but yours actually lessen readability, because the alias names a and b are arbitrary. Use mnemonic names instead, e.g. mp for mempur and mi for meminfo. Finally, you are comparing the date (I do hope purdt is a date!) with strings. Don't. Use date literals instead.
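To illustrate, here is your join skeleton rewritten with explicit joins, mnemonic aliases, and date literals (just a sketch of the syntax; it still has the logic problem described above):
select mp.memberno, mi.fname, sum(mp.amount), mp.purdt, mi.age
from mempur mp
join (select memberno, max(purdt) as maxdate
      from mempur
      group by memberno) maxresult
  on maxresult.memberno = mp.memberno and maxresult.maxdate = mp.purdt
join meminfo mi on mi.memberno = mp.memberno
where mp.purdt between date '2022-01-01' and date '2022-02-28'
group by mp.memberno, mi.fname, mp.purdt, mi.age
order by mp.memberno;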
As to your tables: are you really storing the age? You would have to update it daily to keep it up-to-date. Better to store the date of birth and calculate the age from it in your queries.
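For example (assuming a hypothetical birthdate column in meminfo):
select memberno, fname,
       trunc(months_between(sysdate, birthdate) / 12) as age  -- birthdate is an assumed column
from meminfo;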
Here is a query that gets you the maximum date and the total amount for the given date range:
select memberno, m.fname, p.sum_amount, p.max_purdt, m.age
from meminfo m
left outer join
(
select memberno, sum(amount) as sum_amount, max(purdt) as max_purdt
from mempur
where purdt >= date '2022-01-01' and purdt < date '2022-03-01'
group by memberno
) p using (memberno)
order by memberno;
And here is a query that gets you the maximum overall date along with the total amount for the given date range:
select memberno, m.fname, p.sum_amount, p.max_purdt, m.age
from meminfo m
left outer join
(
select
memberno,
sum(case when purdt >= date '2022-01-01' and purdt < date '2022-03-01'
then amount
end) as sum_amount,
max(purdt) as max_purdt
from mempur
group by memberno
) p using (memberno)
order by memberno;

Related

How to SELECT the MAX Time Difference Between Any 2 Consecutive Rows Per Value?

Just had a user answer this correctly for T-SQL, but wondering how best to achieve this now in SQL Developer/PL/SQL, seeing as there is no DATEDIFF function.
Table I want to query on has some 'CODE' values, which can naturally have multiple primary key records ('OccsID') in a table 'Occs'. There is also a datetime column called 'CreateDT' for each OccsID.
Just want to find the maximum possible time variance between any 2 consecutive rows in 'Occs', per 'CODE'.
If you subtract "this" date from the "next" date (fetched with the LEAD analytic function), you'll get the date difference. Then fetch the maximum difference per code. Something like this:
with diff as
  (select occsid,
          code,
          nvl(lead(createdt) over (partition by code order by createdt), createdt)
            - createdt as date_diff
   from test
  )
select code,
       max(date_diff)
from diff
group by code;
Assuming that this T-SQL version works for you (from the prior question)
SELECT x.code, MAX(x.diff_sec) FROM
(
SELECT
code,
DATEDIFF(
SECOND,
CreateDT,
LEAD(CreateDT) OVER(PARTITION BY CODE ORDER BY CreateDT) --next row's createdt
) as diff_sec
FROM Occs
)x
GROUP BY x.code
The simplest option is just to subtract the two dates to get a difference in days. You can then multiply to get the difference in hours, minutes, or seconds:
SELECT x.code, MAX(x.diff_day), MAX(x.diff_sec)
FROM
(
SELECT
code,
LEAD(CreateDT) OVER(PARTITION BY CODE ORDER BY CreateDT) -
CreateDT as diff_day, -- next row's date minus this row's, so the gap is positive
24*60*60* (LEAD(CreateDT) OVER(PARTITION BY CODE ORDER BY CreateDT) -
CreateDT) as diff_sec
FROM Occs
)x
GROUP BY x.code

Exclude part of the select from the date where clause

I have a select (water readings, previous water reading, other columns) and a where clause that filters on the water reading date. However, the previous water reading must not be restricted by that clause: I want to get the previous meter reading regardless of the where clause date range.
I looked at a union, but the problem is that I would have to use the same clause.
SELECT
WATERREADINGS.name,
WATERREADINGS.date,
LAG( WATERREADINGS.meter_reading,1,NULL) OVER(
PARTITION BY WATERREADINGS.meter_id,WATERREADINGS.register_id
ORDER BY WATERREADINGS.meter_id DESC,WATERREADINGS.register_id
DESC,WATERREADINGS.readingdate ASC,WATERREADINGS.created ASC
) AS prev_water_reading
FROM WATERREADINGS
WHERE waterreadings.waterreadingdate BETWEEN '24-JUN-19' AND
'24-AUG-19' and isactive = 'Y'
The prev_water_reading value must not be restricted by the date BETWEEN '24-JUN-19' AND '24-AUG-19' predicate but the rest of the sql should be.
You can do this by first finding the previous meter readings for all rows and then filtering those results on the date, e.g.:
WITH meter_readings AS (SELECT waterreadings.name,
                               waterreadings."date" dt,
                               lag(waterreadings.meter_reading, 1, NULL) OVER (PARTITION BY waterreadings.meter_id, waterreadings.register_id
                                                                               ORDER BY waterreadings.readingdate ASC, waterreadings.created ASC)
                                 AS prev_water_reading
                        FROM waterreadings
                        WHERE isactive = 'Y')
-- the meter_readings subquery above gets all rows and finds their previous meter reading.
-- the main query below then applies the date restriction to the rows from the meter_readings subquery.
SELECT name,
       dt,
       prev_water_reading
FROM meter_readings
WHERE dt BETWEEN to_date('24/06/2019', 'dd/mm/yyyy') AND to_date('24/08/2019', 'dd/mm/yyyy');
Perform the LAG in an inner query that is not filtered by dates and then filter by the dates in the outer query:
SELECT name,
"date",
prev_water_reading
FROM (
SELECT name,
"date",
LAG( meter_reading,1,NULL) OVER(
PARTITION BY meter_id, register_id
ORDER BY meter_id DESC, register_id DESC, readingdate ASC, created ASC
) AS prev_water_reading,
waterreadingdate -- passed through so the outer query can filter on it
FROM WATERREADINGS
WHERE isactive = 'Y'
)
WHERE waterreadingdate BETWEEN DATE '2019-06-24' AND DATE '2019-08-24'
You should also not use strings for dates: they require an implicit cast using the NLS_DATE_FORMAT session parameter, which can be changed by any user in their own session. Use date literals such as DATE '2019-06-24' or an explicit cast such as TO_DATE('24-JUN-19', 'DD-MON-RR') instead.
You also do not need to reference the table name for every column when there is only a single table, as this clutters up your code and makes it difficult to read. Finally, DATE is a keyword, so you either need to wrap it in double quotes to use it as a column name (which makes the column name case-sensitive) or, better, use a different name for your column.
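For instance, a one-off rename would avoid the quoting issue entirely (a sketch, assuming the column was created as a quoted lowercase "date" and that reading_date is a free name):
ALTER TABLE waterreadings RENAME COLUMN "date" TO reading_date;
-- afterwards the column can be referenced without quotes or case sensitivity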
I've added a subquery with the previous reading, without the filter, and then joined it to the main table with the filters:
SELECT
WATERREADINGS.name,
WATERREADINGS.date,
w_lag.prev_water_reading
FROM
WATERREADINGS,
(SELECT name, date, LAG( WATERREADINGS.meter_reading,1,NULL) OVER(
PARTITION BY WATERREADINGS.meter_id,WATERREADINGS.register_id
ORDER BY WATERREADINGS.meter_id DESC,WATERREADINGS.register_id
DESC,WATERREADINGS.readingdate ASC,WATERREADINGS.created ASC
) AS prev_water_reading
FROM WATERREADINGS) w_lag
WHERE waterreadings.waterreadingdate BETWEEN '24-JUN-19' AND '24-AUG-19' and isactive = 'Y'
and WATERREADINGS.name = w_lag.name
and WATERREADINGS.date = w_lag.date

Oracle: Update values in table with aggregated values from same table

I am looking for a possibly better approach to this.
I have created a temp table in Oracle 11.2 that I'm using to pre-calculate values that I will need in other selects, instead of always generating them again with each select.
create global temporary table temp_foo (
DT timestamp(6), --only the date part will be used in this example but for later things I will need the time
Something varchar2(100),
Customer varchar2(100),
MinDate timestamp(6),
MaxDate timestamp(6),
Filecount int,
Errorcount int,
AvgFilecount int,
constraint PK_foo primary key (DT, Customer)
) on commit preserve rows;
I then first insert some fixed values for everything except AvgFilecount. AvgFilecount should contain the average of the Filecount for the 3 previous records (going by the date in DT). It doesn't matter that the result will be converted to an int; I don't need the decimal places.
DT | Customer | Filecount | AvgFilecount
2019-04-30 | x | 10 | avg(2+3+9)
2019-04-29 | x | 2 | based on values before this
2019-04-28 | x | 3 | based on values before this
2019-04-27 | x | 9 | based on values before this
I thought about using a normal UPDATE statement, as this should be faster than looping through the values. I should mention that there are no gaps in the DT field, but obviously there is a first record for which I won't find any previous records. If I looped through, I could easily calculate AvgFilecount with (the record before the previous record / 2 + the previous record) / 3, which I cannot do with UPDATE as I cannot guarantee the order in which the rows are processed. So I'm fine with just taking the last 3 records (going by DT) and calculating from there.
What I thought would be an easy update is giving me headaches. I'm mostly doing SQL Server, where I would just join the 3 other records, but it seems to be a bit different in Oracle. I found https://stackoverflow.com/a/2446834/4040068 and wanted to use the second approach in the answer.
update
  (select curr.DT, curr.Customer, curr.Filecount, curr.AvgFilecount as OLD,
          (coalesce(Minus1.Filecount, 0) + coalesce(Minus2.Filecount, 0) + coalesce(Minus3.Filecount, 0)) / 3 as NEW
   from temp_foo curr
   left join temp_foo Minus1 on Minus1.Customer = curr.Customer and trunc(Minus1.DT) = trunc(curr.DT-1)
   left join temp_foo Minus2 on Minus2.Customer = curr.Customer and trunc(Minus2.DT) = trunc(curr.DT-2)
   left join temp_foo Minus3 on Minus3.Customer = curr.Customer and trunc(Minus3.DT) = trunc(curr.DT-3)
   order by 1, 2
  )
set OLD = NEW;
Which gives me an
ORA-01779: cannot modify a column which maps to a non key-preserved
table
01779. 00000 - "cannot modify a column which maps to a non key-preserved table"
*Cause: An attempt was made to insert or update columns of a join view which
map to a non-key-preserved table.
*Action: Modify the underlying base tables directly.
I thought this should work, as both join conditions are in the primary key and thus unique. I am currently implementing the first approach from the above-mentioned answer, but it is getting quite big and it feels like there should be a better solution.
Other things I thought about trying:
using a nested subselect (nested because Oracle doesn't know TOP(n) and I need to sort the subselect) to select the previous 3 records ordered by DT, then the outer select with rownum <= 3, and then I could just use AVG(), as sketched below. However, I was told subselects can be quite slow and joins are better in Oracle performance-wise. Dunno if that is really the case, haven't done any testing.
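For a single (Customer, DT) pair, that idea would look something like this (a sketch with bind variables as placeholders; wiring it into an UPDATE per row is where it gets unwieldy, since Oracle 11g does not allow correlating into an inline view at this depth):
select avg(Filecount) as AvgFilecount
from (select Filecount
      from (select Filecount
            from temp_foo
            where Customer = :customer   -- placeholder bind: one customer
              and DT < :dt               -- only records before the current date
            order by DT desc)
      where rownum <= 3);                -- keep the 3 most recent previous records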
Edit: My insert right now looks like this. I am already aggregating the Filecount for a day as there can be multiple records per DT per Customer per Something.
insert into temp_foo (DT, Something, Customer, Filecount)
select dates.DT, tbl1.Something, tbl1.Customer, coalesce(sum(tbl3.Filecount),0)
from table(Function_Returning_Daterange(NULL, NULL)) dates
cross join
(SELECT Something,
Code,
Value
FROM Table2 tbl2
WHERE (Something = 'Value')) tbl1
left outer join Table3 tbl3
on tbl3.Customer = tbl1.Customer
and trunc(tbl3.MinDate) = trunc(dates.DT)
group by dates.DT, tbl1.Something, tbl1.Customer;
You could use an analytic average with a window clause:
select dt, customer, filecount,
avg(filecount) over (partition by customer order by dt
rows between 3 preceding and 1 preceding) as avgfilecount
from tmp_foo
order by dt desc;
DT CUSTOMER FILECOUNT AVGFILECOUNT
---------- -------- ---------- ------------
2019-04-30 x 10 4.66666667
2019-04-29 x 2 6
2019-04-28 x 3 9
2019-04-27 x 9
and then do the update part with a merge statement:
merge into tmp_foo t
using (
select dt, customer,
avg(filecount) over (partition by customer order by dt
rows between 3 preceding and 1 preceding) as avgfilecount
from tmp_foo
) s
on (s.dt = t.dt and s.customer = t.customer)
when matched then update set t.avgfilecount = s.avgfilecount;
4 rows merged.
select dt, customer, filecount, avgfilecount
from tmp_foo
order by dt desc;
DT CUSTOMER FILECOUNT AVGFILECOUNT
---------- -------- ---------- ------------
2019-04-30 x 10 4.66666667
2019-04-29 x 2 6
2019-04-28 x 3 9
2019-04-27 x 9
You haven't shown your original insert statement; it might be possible to add the analytic calculation to that and avoid the separate update step, along the lines of the sketch below.
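A sketch only, wrapping the insert from your edit in an inline view (the column list is an assumption):
insert into temp_foo (DT, Something, Customer, Filecount, AvgFilecount)
select DT, Something, Customer, Filecount,
       avg(Filecount) over (partition by Customer order by DT
                            rows between 3 preceding and 1 preceding) as AvgFilecount
from (
  -- your original aggregating select, unchanged
  select dates.DT, tbl1.Something, tbl1.Customer, coalesce(sum(tbl3.Filecount), 0) as Filecount
  from table(Function_Returning_Daterange(NULL, NULL)) dates
  cross join (select Something, Code, Value from Table2 tbl2 where Something = 'Value') tbl1
  left outer join Table3 tbl3
    on tbl3.Customer = tbl1.Customer and trunc(tbl3.MinDate) = trunc(dates.DT)
  group by dates.DT, tbl1.Something, tbl1.Customer
);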
Also, if you want the first two date values to be calculated as if the 'missing' extra days before them had zero counts, you could use sum and division instead of avg:
select dt, customer, filecount,
sum(filecount) over (partition by customer order by dt
rows between 3 preceding and 1 preceding)/3 as avgfilecount
from tmp_foo
order by dt desc;
DT CUSTOMER FILECOUNT AVGFILECOUNT
---------- -------- ---------- ------------
2019-04-30 x 10 4.66666667
2019-04-29 x 2 4
2019-04-28 x 3 3
2019-04-27 x 9
It depends what you expect those last calculated values to be.

Using an alias for a cast in Hive

I have a table called loan with loan amount, annual income, year (MMM-YY format) and member id. I am trying to find the highest loan amount in a year, along with the annual income and member id details.
I tried to group the highest loan amount by year using the code
select max(cast(loan_amt as int)),issue_d from loan group by issue_d;
Then I also wanted to fetch the member id and annual income information, so I wrote the following code, but it is giving me an error message for using an alias for a column which is cast.
Code:
select a.loan_amt,a.member_id,a.annual_inc,a.issue_d
from
(select loan_amt,member_id,annual_inc,issue_d from loan) a
join
(select max(cast(loan_amt as int)) as ml,issue_d from loan group by issue_d) c
where ((a.issue_d=c.issue_d) and (a.loan_amt=a.ml));
What you want to do is rank the records based on the Amount, per Period, then keep only the top 1 record for each Period.
Use one of the analytic functions that are designed exactly for that purpose; Hive has pretty good support for the SQL standard on that topic.
Since you don't say what to do about ties (i.e. what if several loans have the same Amount?), I assume you want just one record chosen arbitrarily...
select X, Y, Z, Period, Amount as TopAmount
from
  (select X, Y, Z, Period, cast(StrAmt as double) as Amount,
          row_number() over (partition by Period order by cast(StrAmt as double) desc) as TmpRank
   from WTF
  ) TMPWTF
where TmpRank = 1
If you want all the records with the top Amount, then replace row_number with rank or dense_rank (the "dense" variant would make a difference for the top 2, but not for the top 1).

Need to make a query more efficient

I have a query which I need to make more efficient.
I am breaking it down into sections to see where the efficiency flaws are. I currently have a few nested select statements; are these a performance problem?
Here is an example of one of them:
SELECT AgreementID,
DueDate,
UpdatedAmountDue AS AmountDue,
COALESCE((SELECT SUM(UpdatedAmountDue)
FROM RepaymentBreakdown AS B
WHERE CONVERT(datetime, CONVERT(varchar, DueDate, 103), 103) <=
CONVERT(datetime, CONVERT(varchar, R.DueDate, 103), 103)
AND B.AgreementID = R.AgreementID),0) AS DueTD,
RN=ROW_NUMBER() OVER (Partition BY R.AgreementID ORDER BY DueDate)
FROM RepaymentBreakdown AS R
Is there a more clean and efficient way of getting the data of DueTD?
Basically, for each line of a repayment schedule result, I want to get:
AgreementID,
DueDate,
AmountDue,
AmountDueToDate (DueTD)
RowNumber.
The table I am querying is structured as follows:
AgreementID (int),
DueDate (datetime),
AmountDue (decimal(9,2)),
UpdatedAmountDue (decimal(9,2))*
*UpdatedAmountDue is always referenced as it is the moving figure, AmountDue is always fixed, as a reference value.
So, I think you could get a performance boost just by removing the converts, like this:
select
AgreementID,
DueDate,
UpdatedAmountDue as AmountDue,
(
select sum(B.UpdatedAmountDue)
from RepaymentBreakdown as B
where B.DueDate <= R.DueDate and B.AgreementID = R.AgreementID
) as UpdatedAmountDue
from RepaymentBreakdown AS R
The fastest way I know to calculate a running total in SQL Server 2008 would be to use a recursive CTE; see my answer here: Calculate a Running Total in SqlServer. In your case the query would be something like this:
create table #t (....., primary key (AgreementID, ord))
insert into #t (AgreementID, DueDate, UpdatedAmountDue, ord)
select AgreementID, DueDate, UpdatedAmountDue, row_number() over (partition by AgreementID order by DueDate asc)
;with
CTE_RunningTotal
as
(
select T.ord, T.AgreementID, T.DueDate, T.UpdatedAmountDue as AmountDue, T.UpdatedAmountDue
from #t as T
where T.ord = 1
union all
select T.ord, T.AgreementID, T.DueDate, T.UpdatedAmountDue as AmountDue, T.UpdatedAmountDue + C.UpdatedAmountDue as UpdatedAmountDue
from CTE_RunningTotal as C
inner join #t as T on T.ord = C.ord + 1 and T.AgreementID = C.AgreementID
)
select AgreementID, DueDate, AmountDue, UpdatedAmountDue
from CTE_RunningTotal as C
option (maxrecursion 0)
Your conversion of the datetime to a date has several issues.
First, it is not guaranteed to always produce correct results, depending on your server's language settings. If you need to do string manipulation on a datetime value, always use CONVERT(,,126).
But more importantly, it prevents index usage. Instead use CAST(DueDate AS DATE), as the optimizer recognizes that conversion to be index-safe.
Afterwards you might want to add an index on AgreementID, DueDate and either INCLUDE UpdatedAmountDue or, better, make it clustered.
Assuming UpdatedAmountDue cannot be NULL, you can get rid of the COALESCE too, as the sum always includes the current row.
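Putting those points together, a sketch under the assumptions above (the index name is made up):
-- hypothetical supporting index
CREATE INDEX IX_RepaymentBreakdown_Agreement_DueDate
    ON RepaymentBreakdown (AgreementID, DueDate)
    INCLUDE (UpdatedAmountDue);

SELECT AgreementID,
       DueDate,
       UpdatedAmountDue AS AmountDue,
       (SELECT SUM(B.UpdatedAmountDue)
          FROM RepaymentBreakdown AS B
         WHERE CAST(B.DueDate AS DATE) <= CAST(R.DueDate AS DATE)  -- index-safe conversion
           AND B.AgreementID = R.AgreementID) AS DueTD,            -- COALESCE dropped: the sum always includes the current row
       ROW_NUMBER() OVER (PARTITION BY AgreementID ORDER BY DueDate) AS RN
FROM RepaymentBreakdown AS R;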
