Bye KEEP DENSE_RANK? - oracle

Given data like this:
Id sdate sales
1 15.03.2015 150
2 16.03.2015 170
where id + sdate is a unique combination,
one can easily find the best date, or the best item to sell:
Select max(sdate) keep(dense_rank last order by sales) from data.
So far so good. But suppose we have data like following:
Id sdate sales
1 15.03.2015 150
2 16.03.2015 170
1 15.03.2015 117
2 16.03.2015 97
… some other dates with worse sales totals than 15.03.2015 and 16.03.2015
Now I want to know the best DATES to sell:
Select max(sdate) keep(dense_rank last order by sum(sales)) from data group by sdate.
Hey! It shows only 15.03.2015. But I want to see both: 15.03.2015 and 16.03.2015.
LISTAGG doesn’t help here either. Only
Select sdate from data group by sdate
Order by sum(sales) DESC FETCH FIRST ROW WITH TIES
returns both dates. So, bye KEEP DENSE_RANK? Meet FETCH FIRST?
What is your opinion, everyone?

They're doing different things. keep can only return one row for each group. As you want to see tied values, you can't use keep, but you could do this with an inline view:
select sdate
from (
select sdate, dense_rank() over (order by sum(sales) desc) as rnk
from data
group by sdate
)
where rnk = 1;
Which is essentially what fetch first rows with ties is doing in 12c in this example.
There are situations where keep is appropriate, and others where an inline view or fetch first rows is appropriate, and some where either would work.
Having a scenario where you can't use keep to get the result you want doesn't mean you should never use it. Your first simpler query could use either approach; if you wanted other information then keep would come into its own (like the examples in the documentation for first). There are a lot of tools available and you need to pick the best one for what you're trying to achieve.
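As a minimal sketch of a case where keep comes into its own, assuming the data table from the question with columns id, sdate and sales: one row per id, showing the best sales figure together with the date it was achieved on. With keep this needs no inline view.

```sql
-- Assumes the question's table: data(id, sdate, sales).
-- For each id, return its highest sales value and the date of that sale.
-- If an id has several dates tied on the highest sales, max(sdate)
-- picks the latest of them, so the result is still one row per id.
select id,
       max(sales) as best_sales,
       max(sdate) keep (dense_rank last order by sales) as best_sales_date
from data
group by id;
```

This is the "one row for each group" behaviour described above: useful when you want extra columns from the top-ranked row, but not when you want to see every tied row.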

Related

Best way to store parameter that changes over the course of time

Consider the following scenario:
We have a function (let's call it service_cost) that performs some sort of computations.
In those computations we also use a variable (say current_fee) which has a certain value at a given time (we get the value of that variable from an auxiliary table - fee_table).
Now current_fee could stay the same for 4 months, then it changes and obtains a new value, and so on and so forth. Of course I would like to know the current fee, but I should also be able to find out the fee that was 'active' days, months, or years before...
So, one way of organizing the fee_table is
create table fee_table (
id number,
valid_from date,
valid_to date,
fee number
)
And then at any given time - if I want to get the current fee I would:
select fee into current_fee from
fee_table where trunc(sysdate) between valid_from and valid_to;
What I don't like about the solution above, is that it is easy to create inconsistent entries into fee_table - like:
-overlapping time periods (valid_from-valid_to) e.g. (1/1/2012 - 1/2/2012) and (15/1/2012-5/2012)
-no entry for current period
-holes in between the periods e.g. ([1/1/2012-1/2/2012],[1/4/2012-1/5/2012])
etc.
Could anyone suggest a better way to handle such a scenario?
Or maybe, if we stick with the above scenario, some kind of constraints, checks, triggers etc. on the table to avoid the inconsistencies described?
Thanks.
Thank you for all the comments above. Based on the suggestions from #Alex Pool and #William Robertson,
I am leaning towards the following solution:
The table
create table fee_table (
tid number unique,
valid_from date unique,
fee number
)
The Data:
insert into fee_table(tid, valid_from,fee) values (1,to_date('1/1/2014','dd/mm/rrrr'), 30.5);
insert into fee_table(tid, valid_from,fee) values (2,to_date('3/2/2014','dd/mm/rrrr'), 20.5);
insert into fee_table(tid, valid_from,fee) values (3,to_date('4/4/2014','dd/mm/rrrr'), 10);
The select:
with from_to_table as (
SELECT tid, valid_from, LEAD(valid_from, 1, null) OVER (ORDER BY
valid_from)-1 AS valid_to,fee
FROM fee_table
)
select fee from from_to_table
where to_date(:mydate,'dd/mm/rrrr') between valid_from and nvl(valid_to,to_date(:mydate,'dd/mm/rrrr')+1)
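For comparison, the same lookup can probably be written without deriving valid_to at all. This is only a sketch, assuming fee_table has the valid_from and fee columns shown above and that :mydate is a bind variable holding a dd/mm/rrrr string:

```sql
-- Pick the row whose valid_from is the latest one on or before the
-- date of interest. Because valid_from is unique, at most one row matches.
-- Returns no row if :mydate is earlier than the first valid_from.
select fee
from fee_table
where valid_from = (
  select max(valid_from)
  from fee_table
  where valid_from <= to_date(:mydate, 'dd/mm/rrrr')
);
```

Either way, storing only valid_from means overlapping periods and holes between periods cannot be entered in the first place.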

Return max by two columns within dataset?

The problem I am facing is I am trying to query SAP HANA to bring back a list of unique codes that refer to one instance of a change being made to a database. For a bit of background to the below image, each change has a relevant Site ID and Product No. that I am using together as variables, in order to find out the TS Number for the most recent date.
However, when I use the SELECT MAX(DATAB) function, it forces me to use a GROUP BY clause. But, because I cannot omit the TS Number from the GROUP BY clause, it returns all three.
Is there a way to get the max date, for any given combination of Product No. and Site ID, and only return the TS Number for that date? In this example, it would be fine to use TOP 1 but this is just a scaled-down example from a query that will look at many combinations of Product No. and Site ID (with the desired outcome being a list of all of the TS Numbers that relate to the most recent change for that product/store combination, that I will use for a join to another query).
Any help would be appreciated. If full table design etc. is required so that people can attempt to replicate the problem I will happily provide this but am hoping there's a simple solution I have not thought of...
Many thanks
As in any other SQL-DB that supports window functions, you can use row_number() or rank() function to get the desired result. Which one to use depends on how you want to handle tie values.
If you just want exactly one TS-Number in case there are more than one TS-Number for the same MAXDATE, use the following SQL:
select dat, ts_nr, pr_nr, site
from
(select *, row_number() over ( partition by pr_nr, site order by dat desc ) rownum
from mytab
)
where rownum = 1;
Be aware that the result is non-deterministic when several rows share the same MAXDATE, because the window order does not say which of them comes first. However, you can (and in most cases should!) make it deterministic by adding ts_nr to the order by in the window order by clause. Then you get either the highest or lowest TS-Number for the same MAXDATE, depending on the sort order.
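A sketch of the deterministic variant, using the same assumed table and column names (mytab, dat, ts_nr, pr_nr, site) as the query above; the alias rn is my own choice:

```sql
-- Ties on dat are broken by ts_nr, so for each (pr_nr, site) pair
-- the row with the latest date and, within that date, the highest
-- TS-Number always gets rn = 1.
select dat, ts_nr, pr_nr, site
from
  (select *, row_number() over ( partition by pr_nr, site
                                 order by dat desc, ts_nr desc ) as rn
   from mytab
  )
where rn = 1;
```

Use ts_nr asc instead if you would rather keep the lowest TS-Number.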
If you want all TS-Numbers in case there are several TS-Numbers for the same MAXDATE, use rank() instead of row_number(), like this:
select dat, ts_nr, pr_nr, site
from
(select *, rank() over ( partition by pr_nr, site order by dat desc ) ranknum
from mytab
)
where ranknum = 1;

Am I on the right path?

I'm taking an introductory master's-level database class, and we are working on SQL. The professor's explanations can be ambiguous.
Here's my question. For certain exercises we are required to find the opposite of a query, for example: if a supplier ships parts that are red and blue, what colors don't they ship?
Here is how I figured out a solution:
SELECT distinct PARTS.COLOR
FROM PARTS, SHIPMENTS
WHERE PARTS.COLOR NOT IN(
SELECT distinct PARTS.COLOR
FROM SHIPMENTS, PARTS
WHERE PARTS.PARTNO IN(
SELECT distinct SHIPMENTS.PARTNO
FROM SHIPMENTS
WHERE SHIPMENTS.SUPPLIERNO='S1'))
AND SHIPMENTS.PARTNO = PARTS.PARTNO;
What I was wondering is whether this is the best approach to this question. It works, but I'm not sure it is how it should be done.
I should also mention he does not want us to use all available operations. He did not show us JOIN, EXISTS,
he showed us SELECT, IN, ALL/ANY, Aggregates so MAX, MIN, SUM, GROUP BY, and HAVING
Thanks
If you learn how to use "EXPLAIN PLAN" to view the query plan, you'll find that Oracle often uses the same execution plan for "WHERE .. IN()" and "WHERE EXISTS". The choice depends on several things, such as whether the columns are indexed; if you gather statistics, Oracle looks at the number of rows in each table and index and decides which way is best to execute the query. So unless you find that IN() and EXISTS() perform drastically differently, just use whichever makes most sense to you at the time, but always check the execution plan.
As far as your question, since you are prohibited from using joins or exists, I see nothing wrong with your solution.
The easy options I can come up with to simplify either use a join or an exists. You could do it with group and outer join, probably, but I see no point.
Without the restrictions, I could simplify it down to:
SELECT distinct P.COLOR
FROM PARTS P WHERE NOT EXISTS
(SELECT 1 FROM SHIPMENTS S WHERE S.PARTNO = P.PARTNO AND S.SUPPLIERNO = 'S1')
though I am not certain about your schema, and where color is. I assumed a part has a distinct color. If not, this is not adequate and you'd need to correlate the subquery on color, not partno.
Your question is: "if a supplier ships parts that are red and blue what colors don't they ship."
Interesting question. I think the easiest method uses analytic functions, which you probably haven't covered:
select sp.supplierno, color, count(*)
from (select s.*, p.color,
max(case when p.color = 'red' then 1 else 0 end) over (partition by s.supplierno) as HasRed,
max(case when p.color = 'blue' then 1 else 0 end) over (partition by s.supplierno) as HasBlue
from shipments s join
parts p
on s.partno = p.partno
) sp
where hasRed > 0 and hasBlue > 0
group by sp.supplierno, color;

Storing weekly and monthly aggregates in Oracle

I need to dynamically update weekly and monthly sales data per product and customer. These need to be updated and checked during the sale of a product, and for various reasons I'm not able to use stored procedures or materialized views for this (I'll read everything into the application, modify everything in memory and then update and commit the results).
What is the best table structure for holding the sales during a period?
Store the period type (M, W) with start and end dates, or just the type and start date?
Use date fields and a char, or code it into a string ('M201201' / 'W201248')
Normalize sales and periods into two tables, or keep both sales and the period in a single table?
I will be doing two kinds of queries - select the sales of the current weekly (xor monthly) period/customer/article but not update them, and select for update weekly and monthly periods for a customer/article.
If you store both the start and end dates of the applicable period in the row, then your retrieval queries will be much easier, at least the ones that are based on a single date (like today). This is a very typical mode of access since you are probably going to be looking at things from the perspective of a business transaction (like a specific sale) which happens on a given date.
It is very direct and simple to say where #date_of_interest >= start_date and #date_of_interest <= end_date. Any other combination requires you to do date arithmetic either in code before you go to your query or within your query itself.
Keeping a type code (M, W) as well as both start and end dates entails introducing some redundancy. However, you might choose to introduce this redundancy for the sake of easing data retrieval. This: where #date_of_interest >= start_date and #date_of_interest <= end_date and range_type='M' is also very direct and simple.
As with all denormalization, you need to ensure that you have controls that will manage this redundancy.
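One way such controls could look is a set of check constraints that tie the redundant columns together. This is only a sketch: the table name sales_period, the column sales_total and the column types are assumptions, not something given in the question.

```sql
-- Hypothetical period table: one row per customer/article/period.
-- The check constraints keep range_type, start_date and end_date
-- consistent with each other, so the redundancy cannot drift.
create table sales_period (
  customer_code varchar2(20)     not null,
  prod_code     varchar2(20)     not null,
  range_type    char(1)          not null,
  start_date    date             not null,
  end_date      date             not null,
  sales_total   number default 0 not null,
  constraint sales_period_pk
    primary key (customer_code, prod_code, range_type, start_date),
  constraint sales_period_type_ck
    check (range_type in ('M', 'W')),
  constraint sales_period_dates_ck check (
    (range_type = 'M' and start_date = trunc(start_date, 'mm')
                      and end_date = last_day(start_date))
    or
    (range_type = 'W' and start_date = trunc(start_date, 'iw')
                      and end_date = start_date + 6)
  )
);
```

Since end_date is stored as midnight of the last day, a lookup against it should compare a date without a time part, for example trunc(sysdate), otherwise sales made during the last day of a period would miss it.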
I would recommend using a normalized schema for this, where you store the weekly and monthly aggregates in two different tables with the same structure. I don't know what kind of queries you're going to run, but I suppose this would make any sort of search easier (when it's done in the right way!).
Probably something like this sample
product_prices (
prod_code,
price,
date_price_begin
);
sales (
prod_code,
customer_code,
sale_date
);
<aggregate-week>
select trunc(sale_date,'iw') as week,
prod_code,
customer_code,
sum(price) keep (dense_rank first order by date_price_begin) as price
from sales
natural join product_prices
where sale_date > date_from
group by trunc(sale_date,'iw'),
prod_code,
customer_code
/
<aggregate-month>
select trunc(sale_date,'mm') as month,
prod_code,
customer_code,
sum(price) keep (dense_rank first order by date_price_begin) as price
from sales
natural join product_prices
where sale_date > date_from
group by trunc(sale_date,'mm'),
prod_code,
customer_code
/

Does Oracle's ROWNUM build the whole table before it extracts the rows you want?

I need to make a navigation panel that shows only a subset of a possible large result set. This subset is 20 records before and 20 records after the resulted record set. As I navigate the results through the navigation panel, I'll be applying a sliding window design using ROWNUM to get the next subset. My question is does Oracle's ROWNUM build the whole table before it extracts the rows you want? Or is it intelligent enough to only generate the rows I need? I googled and I couldn't find an explanation on this.
The pre-analytic-function method for doing this would be:
select col1, col2 from (
select col1, col2, rownum rn from (
select col1, col2 from the_table order by sort_column
)
where rownum <= 20
)
where rn > 10
The Oracle optimizer will recognize in this case that it only needs to get the top 20 rows to satisfy the inner query. It will likely have to look at all the rows (unless, say, the sort column is indexed in a way that lets it avoid the sort altogether) but it will not need to do a full sort of all the rows.
Your solution will not work (as Bob correctly pointed out) but you can use row_number() to do what you want:
SELECT col1,
col2
FROM (
SELECT col1,
col2,
row_number() over (order by some_column) as rn
FROM your_table
) t
WHERE rn BETWEEN 10 AND 20
Note that this solution has the added benefit that you can order the final result by a different criterion if you want to.
Edit: forgot to answer your initial question:
With the above solution, yes Oracle will have to build the full result in order to find out the correct numbering.
With 11g and above you might improve your query using the query result cache.
Concerning the question's title.
See http://www.orafaq.com/wiki/ROWNUM and this in-depth explanation by Tom Kyte.
Concerning the question's goal.
This should be what you're looking for: Paging with Oracle
I don't think your design is quite going to work out as you've planned. Oracle assigns values to ROWNUM in the order that they are produced by the query - the first row produced is assigned ROWNUM=1, the second is assigned ROWNUM=2, etc. Notice that in order to have ROWNUM=21 assigned the query must first return the first twenty rows and thus if you write a query which says
SELECT *
FROM MY_TABLE
WHERE ROWNUM >= 21 AND
ROWNUM <= 40
no rows will be returned because in order for there to be rows with ROWNUM >= 21 the query must first return all the rows with ROWNUM <= 20.
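A sketch of a form that does return rows 21 to 40, assuming a hypothetical sort_column to order by: the upper bound is applied to ROWNUM directly, while the lower bound is applied to an alias of ROWNUM one level further out.

```sql
-- The innermost query defines the order; the middle query numbers the
-- rows and stops after 40; the outer query discards the first 20.
select *
from (
  select t.*, rownum as rn
  from (
    select * from my_table order by sort_column
  ) t
  where rownum <= 40
)
where rn >= 21;

-- From 12c onwards the row limiting clause expresses the same window:
select *
from my_table
order by sort_column
offset 20 rows fetch next 20 rows only;
```

The first query leaves the extra rn column in the result; list the columns explicitly in the outer select if you do not want it.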
I hope this helps.
It's an old question but you should try this - http://www.inf.unideb.hu/~gabora/pagination/results.html

Resources