Using Oracle PL/SQL, I am trying to find the earliest invoice date for each supplier. That would be simple enough, but I also need the largest distribution line on that earliest invoice so I can determine which segment of the business the invoice belongs to. Segment is determined by SEGMENT_NUMBER in the example below. I know a subquery or multiple subqueries are needed here with a GROUP BY clause, but I am at a loss. The syntax below is not even close, but I wanted to provide something for feedback.
SELECT
SUPPLIER_ID,
INVOICE_NUMBER,
SEGMENT_NUMBER,
MIN(INVOICE_DATE) as EARLIEST_INV_DATE,
MAX(DISTRIBUTION_AMOUNT) as MAX_DIST_LINE
FROM INVOICE_DIST
Use an analytic function such as RANK(). Rank each supplier's rows by INVOICE_DATE ascending and DISTRIBUTION_AMOUNT descending, then keep only the row ranked 1.
SELECT SUPPLIER_ID,
INVOICE_NUMBER,
SEGMENT_NUMBER,
INVOICE_DATE, DISTRIBUTION_AMOUNT
FROM (SELECT SUPPLIER_ID,
INVOICE_NUMBER,
SEGMENT_NUMBER,
INVOICE_DATE, DISTRIBUTION_AMOUNT,
RANK() OVER(PARTITION BY SUPPLIER_ID ORDER BY INVOICE_DATE, DISTRIBUTION_AMOUNT DESC) POSITION FROM INVOICE_DIST) TBL WHERE POSITION=1;
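To see the ranking approach run end to end, here is a minimal sketch in Python using an in-memory SQLite database as a stand-in for Oracle (it assumes SQLite 3.25 or later for window functions). The sample rows and the `pos` alias are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice_dist (supplier_id INT, invoice_number TEXT, "
    "segment_number TEXT, invoice_date TEXT, distribution_amount REAL)"
)
conn.executemany(
    "INSERT INTO invoice_dist VALUES (?, ?, ?, ?, ?)",
    [
        (1, "INV-10", "SEG-A", "2020-01-05", 100.0),
        (1, "INV-10", "SEG-B", "2020-01-05", 250.0),  # largest line on the earliest invoice
        (1, "INV-11", "SEG-C", "2020-02-01", 900.0),  # later invoice, should be ignored
        (2, "INV-20", "SEG-D", "2019-12-31", 40.0),
    ],
)

# Rank rows per supplier: earliest date first, then largest amount first.
earliest = conn.execute(
    """
    SELECT supplier_id, invoice_number, segment_number
    FROM (SELECT supplier_id, invoice_number, segment_number,
                 RANK() OVER (PARTITION BY supplier_id
                              ORDER BY invoice_date, distribution_amount DESC) AS pos
          FROM invoice_dist) tbl
    WHERE pos = 1
    ORDER BY supplier_id
    """
).fetchall()
print(earliest)
```

Note that RANK() returns more than one row per supplier if two lines tie on both date and amount; ROW_NUMBER() would force a single row.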
Related
Just had a user answer this correctly for TSQL, but wondering how best to achieve this now in SQL Developer/PLSQL seeing as there is no DATEDIFF function.
Table I want to query on has some 'CODE' values, which can naturally have multiple primary key records ('OccsID') in a table 'Occs'. There is also a datetime column called 'CreateDT' for each OccsID.
Just want to find the maximum possible time variance between any 2 consecutive rows in 'Occs', per 'CODE'.
If you subtract the "next" date and "this" date (using the LEAD analytic function), you'll get the date difference. Then fetch the maximum difference per code. Something like this:
with diff as
(select occsid,
code,
nvl(lead(createdt) over (partition by code order by createdt), createdt) - createdt date_diff
from test
)
select code,
max(date_diff)
from diff
group by code;
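The LEAD idea can be tried out with a small sketch in Python against an in-memory SQLite database (assuming SQLite 3.25 or later; SQLite is only a stand-in for Oracle). The sample rows are hypothetical, and because SQLite has no native date subtraction, `strftime('%s', ...)` is used here to convert to seconds, which is not how Oracle does it.

```python
import sqlite3

# Hypothetical sample data for the Occs table described in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occs (occsid INTEGER PRIMARY KEY, code TEXT, createdt TEXT)")
conn.executemany(
    "INSERT INTO occs (code, createdt) VALUES (?, ?)",
    [
        ("A", "2024-01-01 00:00:00"),
        ("A", "2024-01-01 00:00:30"),  # 30 s after the previous row
        ("A", "2024-01-01 00:02:30"),  # 120 s after the previous row
        ("B", "2024-01-01 12:00:00"),  # only one row, so no gap exists
    ],
)

# Next row's timestamp minus this row's, in seconds; the last row per code
# has no successor, so its difference is NULL and MAX() skips it.
max_gaps = conn.execute(
    """
    SELECT code, MAX(diff_sec)
    FROM (SELECT code,
                 CAST(strftime('%s', LEAD(createdt) OVER (PARTITION BY code
                                                          ORDER BY createdt)) AS INTEGER)
                 - CAST(strftime('%s', createdt) AS INTEGER) AS diff_sec
          FROM occs)
    GROUP BY code
    ORDER BY code
    """
).fetchall()
print(max_gaps)
```

Unlike the NVL version above, this sketch leaves the gap as NULL for a code with a single row instead of reporting 0.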
Assuming that this T-SQL version works for you (from the prior question)
SELECT x.code, MAX(x.diff_sec) FROM
(
SELECT
code,
DATEDIFF(
SECOND,
CreateDT,
LEAD(CreateDT) OVER(PARTITION BY CODE ORDER BY CreateDT) --next row's createdt
) as diff_sec
FROM Occs
)x
GROUP BY x.code
The simplest option is just to subtract the two dates to get a difference in days. You can then multiply to get the difference in hours, minutes, or seconds. Subtract the current row's CreateDT from the next row's so the gaps come out positive.
SELECT x.code, MAX(x.diff_day), MAX(x.diff_sec)
FROM
(
SELECT
code,
LEAD(CreateDT) OVER(PARTITION BY CODE ORDER BY CreateDT)
- CreateDT as diff_day,
24*60*60* (LEAD(CreateDT) OVER(PARTITION BY CODE ORDER BY CreateDT)
- CreateDT) as diff_sec
FROM Occs
)x
GROUP BY x.code
I don't often use Oracle PL/SQL, but I need to understand what, if anything, is wrong with this function, which was created by someone at the company before me.
I've been told it is not returning the latest record. In other forum threads I found suggestions
to use max(dateColumn) instead of "row_number = 1", for example, but I'm not quite sure how or where to incorporate that.
-- Knowing that --
We use Oracle version 12,
CustomObjectTypeA is a custom Oracle OBJECT TYPE defined by a former employee,
V_OtherView is of type Table_Mnd, which was also defined by a former employee,
V_ABC_123 is a view created by a former employee as well.
CREATE OR REPLACE FUNCTION F_TABLE_APPROVED (NUMBER_F_UPD NUMBER, NUMBER_F_GET VARCHAR2)
RETURN Table_Mnd
IS
V_OtherView Table_Mnd;
BEGIN
SELECT CustomObjectTypeA (FromT.NUMBER_F,
FromT.OP_CODE,
FromT.CATG_CODE,
FromT.CATG_NAME,
FromT.CATG_SORT,
FromT.ORG_CODE,
FromT.ORG_NAME,
FromT.DATA_ENTRY_VALID,
FromT.NUMBER_RECEIVED,
FromT.YEAR_1,
FromT.YEAR_2)
BULK COLLECT INTO V_OtherView
FROM (SELECT NUMBER_F,
OP_CODE,
CATG_CODE,
CATG_NAME,
CATG_SORT,
ORG_CODE,
ORG_NAME,
DATA_ENTRY_VALID,
NUMBER_RECEIVED,
YEAR_1,
YEAR_2,
ROW_NUMBER() OVER (PARTITION BY ORG_CODE ORDER BY NUMBER_RECEIVED DESC, LOAD_DATE DESC) AS ROW_NUMBER
FROM V_ABC_123
WHERE NUMBER_F = NUMBER_F_UPD AND DATA_ENTRY_VALID <> 'OnGoing'
AND LOAD_DATE >= (SELECT sysdate-10 FROM dual)
AND LOAD_DATE <= (SELECT DISTINCT LOAD_DATE
FROM V_ABC_123
WHERE NUMBER_RECEIVED = NUMBER_F_GET)) FromT
WHERE FromT.ROW_NUMBER=1;
RETURN V_OtherView;
END F_TABLE_APPROVED;
The important bits of the query are:
SELECT ...
FROM (select ...,
ROW_NUMBER()
OVER (PARTITION BY ORG_CODE
ORDER BY NUMBER_RECEIVED DESC,
LOAD_DATE DESC) AS ROW_NUMBER
...) FromT
WHERE FromT.ROW_NUMBER = 1;
The "ROW_NUMBER" column is computed according to the following window clause:
PARTITION BY ORG_CODE
ORDER BY NUMBER_RECEIVED DESC, LOAD_DATE DESC
Which means that for each ORG_CODE, it will sort all the records by NUMBER_RECEIVED, then by LOAD_DATE, both in descending order. Note that if the columns are Oracle DATEs, they will only be accurate to the nearest second; so if there are multiple records with date/times in the exact same 1-second interval, this sort order will not be guaranteed unique. The logic of ROW_NUMBER will therefore pick one of them arbitrarily (i.e. whichever record happens to be emitted first) and assign it the value "1", and this will be deemed the "latest". Subsequent executions of the same SQL could (in theory) return a different record.
The suspicious part is NUMBER_RECEIVED which sounds like it's a number, not a date? Sorting by this means that the records with the highest NUMBER_RECEIVED will be preferred. Was this intentional?
I'm not sure why the PARTITION is there; it causes the query to return one "latest" record for each value of ORG_CODE that it finds. I can only assume this was intentional.
The problem is that the query can only determine the "latest record" as well as it can based on the data provided to it. In this case, it's possible the data is simply not granular enough to be able to decide which record is the actual "latest" record.
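To make the tie problem concrete, here is a rough sketch in Python with an in-memory SQLite database (assuming SQLite 3.25 or later, as a stand-in for Oracle). The sample rows and the unique `id` column are hypothetical; the real view may not have such a column, in which case some other unique key would be needed as the final tie-breaker.

```python
import sqlite3

# Hypothetical rows; "id" is an assumed unique key used only as a tie-breaker.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE v_abc_123 (id INTEGER PRIMARY KEY, org_code TEXT, "
    "number_received INT, load_date TEXT)"
)
conn.executemany(
    "INSERT INTO v_abc_123 VALUES (?, ?, ?, ?)",
    [
        (1, "ORG1", 5, "2024-03-01 10:00:00"),
        (2, "ORG1", 5, "2024-03-01 10:00:00"),  # exact tie with id 1
        (3, "ORG1", 4, "2024-03-02 09:00:00"),  # newest load_date, lower number_received
        (4, "ORG2", 7, "2024-03-01 08:00:00"),
    ],
)

# RANK() gives tied rows the same rank, so a count above 1 at rank 1
# flags every ORG_CODE where ROW_NUMBER() = 1 would be an arbitrary pick.
ambiguous = conn.execute(
    """
    SELECT org_code, COUNT(*)
    FROM (SELECT org_code,
                 RANK() OVER (PARTITION BY org_code
                              ORDER BY number_received DESC, load_date DESC) AS rnk
          FROM v_abc_123)
    WHERE rnk = 1
    GROUP BY org_code
    HAVING COUNT(*) > 1
    """
).fetchall()

# Appending a unique column to the ORDER BY makes the choice repeatable.
latest = conn.execute(
    """
    SELECT org_code, id
    FROM (SELECT org_code, id,
                 ROW_NUMBER() OVER (PARTITION BY org_code
                                    ORDER BY number_received DESC,
                                             load_date DESC, id DESC) AS rn
          FROM v_abc_123)
    WHERE rn = 1
    ORDER BY org_code
    """
).fetchall()
print(ambiguous, latest)
```

The sample also shows the NUMBER_RECEIVED concern: row 3 has the newest LOAD_DATE for ORG1 but is never chosen, because NUMBER_RECEIVED is sorted first.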
I have a table called loan with loan amount, annual income, year (MMM-YY format), and member id. I am trying to find the highest loan amount in each year along with the annual income and member id details.
I tried to group the highest loan amount by year using the code
select max(cast(loan_amt as int)),issue_d from loan group by issue_d;
Then I also wanted to fetch the member id and annual income information, so I wrote the following code,
but it gives me an error message about using an alias for a column that is cast.
Code:
select a.loan_amt,a.member_id,a.annual_inc,a.issue_d
from
(select loan_amt,member_id,annual_inc,issue_d from loan) a
join
(select max(cast(loan_amt as int)) as ml,issue_d from loan group by issue_d) c
where ((a.issue_d=c.issue_d) and (a.loan_amt=a.ml));
What you want to do is rank the records based on the Amount, per Period, then keep only the top 1 record for each Period.
Use one of the analytic functions that are designed exactly for that purpose -- Hive has a pretty good support of the SQL standard on that topic.
Since you don't say what to do about ties (i.e. what if several loans have the same Amount???) I assume you want just one record, chosen arbitrarily...
select X, Y, Z, Period, Amount as TopAmount
from
(select X, Y, Z, Period, cast(StrAmt as double) as Amount,
row_number() over (partition by Period order by cast(StrAmt as double) desc) as TmpRank
from WTF
) TMPWTF
where TmpRank =1
If you want all the records with top Amount then replace row_number with rank or dense_rank (the "dense" stuff would make a difference for the top 2, but not for the top 1)
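The difference between row_number and rank on ties can be checked with a small sketch in Python using an in-memory SQLite database (assuming SQLite 3.25 or later; this is a stand-in, not Hive). The sample rows and the helper name `top_loans` are made up for illustration; the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data; loan_amt is stored as text, as in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (member_id TEXT, annual_inc INT, issue_d TEXT, loan_amt TEXT)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?, ?)",
    [
        ("m1", 50000, "Dec-11", "10000"),
        ("m2", 62000, "Dec-11", "10000"),  # ties with m1 for the top amount
        ("m3", 48000, "Dec-11", "9000"),   # would sort above '10000' as text
        ("m4", 71000, "Jan-12", "5000"),
    ],
)

def top_loans(rank_fn):
    """Return the top-ranked loan rows per issue_d using the given window function."""
    if rank_fn not in ("row_number", "rank", "dense_rank"):
        raise ValueError(rank_fn)
    sql = f"""
        SELECT member_id, annual_inc, issue_d, amount
        FROM (SELECT member_id, annual_inc, issue_d,
                     CAST(loan_amt AS REAL) AS amount,
                     {rank_fn}() OVER (PARTITION BY issue_d
                                       ORDER BY CAST(loan_amt AS REAL) DESC) AS tmp_rank
              FROM loan) t
        WHERE tmp_rank = 1
        ORDER BY issue_d, member_id
    """
    return conn.execute(sql).fetchall()

one_per_period = top_loans("row_number")  # exactly one row per period
all_ties = top_loans("rank")              # every row sharing the top amount
print(one_per_period)
print(all_ties)
```

With row_number, which of m1 or m2 is returned for Dec-11 is not defined; with rank, both come back.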
I can find the empirical distribution this way:
select command_type, duration, round(percentage, 2)
from (select distinct command_type, duration,
percent_rank() over(partition by command_type order by duration) percentage
from command_durations)
order by 1, 2
The question is how to do the same using the Oracle MODEL clause. I have started with this:
select command_type,duration,dur_count from command_durations
model UNIQUE SINGLE REFERENCE
partition by (command_type)
dimension by ( duration)
measures(0 dur_count)
rules(
dur_count[duration]=count(1)[cv(duration)]
)
order by command_type,duration
But now I need to make the records distinct before I can proceed with finding the empirical distribution.
How do I make the records distinct in the MODEL clause?
If you want to take that query and apply DISTINCT to it, one method is to wrap it in a subquery in the FROM clause and select distinct rows from that. For instance:
Select Distinct command_type, duration, dur_count
From (
[Your Code]
)
Let me know if that works.
I'm pulling two pieces of information over a specific time period, and I would like to fetch the daily average of one tag and the daily count of another tag. I'm not sure how to do daily averages over a specific time period. Can anyone provide some advice? Below were my first ideas on how to handle this; however, changing every date by hand would be annoying. Any help is appreciated, thanks.
SELECT COUNT(distinct chargeno), to_char(chargetime, 'mmddyyyy') AS chargeend
FROM batch_index WHERE plant=1 AND chargetime>to_date('2012-06-18:00:00:00','yyyy-mm-dd:hh24:mi:ss')
AND chargetime<to_date('2012-07-19:00:00:00','yyyy-mm-dd:hh24:mi:ss')
group by chargetime;
The working version of the daily sum
SELECT to_char(bi.chargetime, 'mmddyyyy') as chargtime, SUM(cv.val)*0.0005
FROM Charge_Value cv, batch_index bi WHERE cv.ValueID =97
AND bi.chargetime<=to_date('2012-07-19','yyyy-mm-dd')
AND bi.chargeno = cv.chargeno AND bi.typ=1
group by to_char(bi.chargetime, 'mmddyyyy')
It seems like in the first one you want to group by the day, not the time (plus I don't think you need to specify all those 0's for the time portion):
SELECT COUNT(distinct chargeno), to_char(chargetime, 'mmddyyyy') AS chargeend
FROM batch_index WHERE plant=1 AND chargetime>to_date('2012-06-18','yyyy-mm-dd')
AND chargetime<to_date('2012-07-19','yyyy-mm-dd')
group by to_char(chargetime, 'mmddyyyy') ;
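The effect of grouping by the day instead of the full timestamp can be seen in a short sketch in Python with an in-memory SQLite database (a stand-in for Oracle; SQLite's `date()` plays the role of `to_char(..., 'mmddyyyy')` or `trunc()` here). The three sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: two charges on one day, one on the next.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batch_index (chargeno INT, plant INT, chargetime TEXT)")
conn.executemany(
    "INSERT INTO batch_index VALUES (?, ?, ?)",
    [
        (1, 1, "2012-06-18 08:00:00"),
        (2, 1, "2012-06-18 17:30:00"),
        (3, 1, "2012-06-19 09:00:00"),
    ],
)

# Grouping by the raw timestamp yields one group per distinct time of day.
by_timestamp = conn.execute(
    "SELECT COUNT(DISTINCT chargeno) FROM batch_index "
    "WHERE plant = 1 GROUP BY chargetime"
).fetchall()

# Grouping by the date portion yields one group per calendar day.
by_day = conn.execute(
    "SELECT date(chargetime) AS chargeday, COUNT(DISTINCT chargeno) "
    "FROM batch_index WHERE plant = 1 "
    "GROUP BY date(chargetime) ORDER BY chargeday"
).fetchall()
print(len(by_timestamp), by_day)
```

The first query returns three groups of one charge each, while the second returns one row per day with the daily count.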
I'm not 100% sure I'm following your question, but if you just want to do aggregates (sums, averages), then do just that. I threw in the rollup in case that is what you were looking for.
with fakeData as(
select trunc(level *.66667) nr
, trunc(2*level * .33478) lvl --these truncs just make the doubles ints
,trunc(sysdate+trunc(level*.263784123)) dte --note the trunc, this gets rid of the to_char to drop the time
from dual
connect by level < 600
) --the cte is just to create fake data
--below is just some aggregates that may help you
select sum(nr) daily_sum_of_nr
, avg(nr) daily_avg_of_nr
, count(distinct lvl) distinct_lvls_per_day
, count(lvl) count_of_nonNull_lvls_per_day
, dte days
from fakeData
group by rollup(dte)
--if you want the query to supply a total for the range, you may use rollup ( http://psoug.org/reference/rollup.html )