Return max by two columns within dataset?

The problem I am facing is I am trying to query SAP HANA to bring back a list of unique codes that refer to one instance of a change being made to a database. For a bit of background to the below image, each change has a relevant Site ID and Product No. that I am using together as variables, in order to find out the TS Number for the most recent date.
However, when I use the SELECT MAX(DATAB) function, it forces me to use a GROUP BY clause. But, because I cannot omit the TS Number from the GROUP BY clause, it returns all three.
Is there a way to get the max date, for any given combination of Product No. and Site ID, and only return the TS Number for that date? In this example, it would be fine to use TOP 1 but this is just a scaled-down example from a query that will look at many combinations of Product No. and Site ID (with the desired outcome being a list of all of the TS Numbers that relate to the most recent change for that product/store combination, that I will use for a join to another query).
Any help would be appreciated. If full table design etc. is required so that people can attempt to replicate the problem I will happily provide this but am hoping there's a simple solution I have not thought of...
Many thanks

As in any other SQL-DB that supports window functions, you can use row_number() or rank() function to get the desired result. Which one to use depends on how you want to handle tie values.
If you just want exactly one TS-Number in case there are more than one TS-Number for the same MAXDATE, use the following SQL:
select dat, ts_nr, pr_nr, site
from
(select *, row_number() over ( partition by pr_nr, site order by dat desc ) rownum
from mytab
)
where rownum = 1;
Be aware that the result is non-deterministic when several rows share the same MAXDATE: the database may return any one of them. However, you can (and in most cases should!) make it deterministic by adding ts_nr to the order by clause of the window function. Then you get either the highest or lowest TS-Number for the same MAXDATE, depending on the sort order.
If you want all TS-Numbers in case there are several TS-Numbers for the same MAXDATE, use rank() instead of row_number(), like this:
select dat, ts_nr, pr_nr, site
from
(select *, rank() over ( partition by pr_nr, site order by dat desc ) ranknum
from mytab
)
where ranknum = 1;
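
To make the difference between the two queries concrete, here is a minimal sketch that runs both against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for SAP HANA here (it assumes a SQLite build with window functions, 3.25 or later), and the sample rows are invented for illustration. The row_number() variant also adds ts_nr as a tie-breaker, as suggested above.

```python
import sqlite3

# Hypothetical sample data: two rows tie on the latest date for (P1, S1).
conn = sqlite3.connect(":memory:")
conn.execute("create table mytab (dat text, ts_nr text, pr_nr text, site text)")
conn.executemany(
    "insert into mytab values (?, ?, ?, ?)",
    [
        ("2020-01-01", "TS1", "P1", "S1"),
        ("2020-03-01", "TS2", "P1", "S1"),
        ("2020-03-01", "TS3", "P1", "S1"),
        ("2020-02-01", "TS4", "P2", "S1"),
    ],
)

# row_number() with ts_nr as tie-breaker: exactly one row per (pr_nr, site).
one_per_group = conn.execute(
    """
    select dat, ts_nr, pr_nr, site
    from (select *, row_number() over (
              partition by pr_nr, site order by dat desc, ts_nr desc) as rn
          from mytab)
    where rn = 1
    order by pr_nr, site
    """
).fetchall()

# rank(): every row that shares the latest date is kept.
all_ties = conn.execute(
    """
    select dat, ts_nr, pr_nr, site
    from (select *, rank() over (
              partition by pr_nr, site order by dat desc) as rk
          from mytab)
    where rk = 1
    order by pr_nr, site, ts_nr
    """
).fetchall()

print(one_per_group)
print(all_ties)
```

With this data the first query keeps only TS3 for (P1, S1), while the second keeps both TS2 and TS3.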

Related

Is an ORDER BY with multiple values as effective as several individual ORDER BYs?

I am a junior in a sub-position to the company's DBA employee. (ORACLE, PL/SQL Developer IDE)
As part of my tasks, there is a need to sort a particular table by a certain value, then sort it again, and then sort it again, as in the following example:
SELECT *
FROM (SELECT *
FROM (SELECT * FROM CARS
ORDER BY BRAND)
ORDER BY COLOR)
ORDER BY YEAR
While in my opinion, multi-values ORDER BY can be used, like the following:
SELECT * FROM CARS ORDER BY BRAND, COLOR, YEAR
Note that there is great importance to the runtime of the program in the current task, so even if the code is a little more complicated, but effective and takes less time, it is preferable.
Just for the record, the sort value at each step is created by ROW_NUMBER () OVER (PARTITION BY...
which is relevant to each of the values (e.g. ROW_NUMBER () OVER (PARTITION BY COLOR ORDER BY COLOR) AS C1, and then ORDER BY C1).
So which of the ways is preferable? Should we expect to see similar results in both? Or is there another, better option?
The code
SELECT *
FROM (SELECT *
FROM (SELECT * FROM CARS
ORDER BY BRAND)
ORDER BY COLOR)
ORDER BY YEAR
is synonymous with
SELECT *
FROM CARS
ORDER BY YEAR
If you are seeing that the output looks to be sorted in a cascading fashion, i.e. by year, brand, color etc., then this is by pure good luck, not by anything guaranteed by the SQL engine. The final ORDER BY is the only one that matters (in terms of sorting). [Sometimes you might see an embedded ORDER BY for reasons of doing pagination, but the final result only depends on the final ORDER BY.]
So yes, changing the statement to
SELECT * FROM CARS
ORDER BY BRAND, COLOR, YEAR
is most probably what you want, because the former is not going to give the result you're after.
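
As a small illustration of the multi-column form, the sketch below runs it against an in-memory SQLite database via Python's sqlite3 module. SQLite stands in for Oracle here, and the table contents are made up. It deliberately does not assert anything about the nested form, since the order of rows coming out of an inner ORDER BY is not guaranteed.

```python
import sqlite3

# Hypothetical CARS rows, inserted in no particular order.
conn = sqlite3.connect(":memory:")
conn.execute("create table cars (brand text, color text, year integer)")
conn.executemany(
    "insert into cars values (?, ?, ?)",
    [
        ("Volvo", "red", 2010),
        ("Audi", "red", 2012),
        ("Audi", "blue", 2015),
        ("Audi", "blue", 2011),
    ],
)

# One ORDER BY with three keys: brand first, then color within each brand,
# then year within each (brand, color) pair.
multi_key = conn.execute(
    "select brand, color, year from cars order by brand, color, year"
).fetchall()

# Only the outermost ORDER BY is guaranteed, so ordering by year alone
# promises nothing about brand or color.
years_only = [row[0] for row in conn.execute("select year from cars order by year")]

print(multi_key)
print(years_only)
```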

Bye KEEP DENSE_RANK?

With data given
Id sdate sales
1 15.03.2015 150
2 16.03.2015 170
where id+sdate is a unique combination
one could easily find the best date, or best item, for sales.
Select max(sdate) keep(dense_rank last order by sales) from data
So far so good. But suppose we have data like following:
Id sdate sales
1 15.03.2015 150
2 16.03.2015 170
1 15.03.2015 117
2 16.03.2015 97
… some other dates with worse sales sums than 15.03.2015 and 16.03.2015
Now I want to know the best DATES for sales
Select max(sdate) keep(dense_rank last order by sum(sales)) from data group by sdate
Hey! It shows only 15.03.2015. But I want to see it both – 15.03.2015 and 16.03.2015.
LISTAGG doesn't help here either. Only
Select sdate from data group by sdate
Order by sum(sales) DESC FETCH FIRST ROW WITH TIES
Returns me both dates. So, bye KEEP DENSE_RANK? Meet FETCH FIRST?
What is your opinion, respected all?
They're doing different things. keep can only return one row for each group. As you want to see tied values, you can't use keep, but you could do this with an inline view:
select sdate
from (
select sdate, dense_rank() over (order by sum(sales) desc) as rnk
from data
group by sdate
)
where rnk = 1;
Which is essentially what fetch first rows with ties is doing in 12c in this example.
There are situations where keep is appropriate, and others where an inline view or fetch first rows is appropriate, and some where either would work.
Having a scenario where you can't use keep to get the result you want doesn't mean you should never use it. Your first simpler query could use either approach; if you wanted other information then keep would come into its own (like the examples in the documentation for first). There are a lot of tools available and you need to pick the best one for what you're trying to achieve.
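
The inline-view approach can be tried out with the sketch below, which uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for Oracle. The first four rows are the ones from the question; the fifth row (17.03.2015) is an invented "worse" date. As a precaution for the stand-in engine, the sum is computed in an inner query first and ranked in a second step, which should be equivalent to ranking by sum(sales) directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table data (id integer, sdate text, sales integer)")
conn.executemany(
    "insert into data values (?, ?, ?)",
    [
        (1, "15.03.2015", 150),
        (2, "16.03.2015", 170),
        (1, "15.03.2015", 117),
        (2, "16.03.2015", 97),
        (3, "17.03.2015", 100),  # hypothetical extra date with a lower total
    ],
)

# Sum per date, rank the sums, keep every date that shares rank 1.
best_dates = [
    row[0]
    for row in conn.execute(
        """
        select sdate
        from (select sdate, dense_rank() over (order by total desc) as rnk
              from (select sdate, sum(sales) as total
                    from data
                    group by sdate))
        where rnk = 1
        order by sdate
        """
    )
]

print(best_dates)
```

Both 15.03.2015 and 16.03.2015 total 267, so both come back, which is the tie behaviour that keep cannot provide.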

ORA-00979 not a Group By function error

I am trying to select 2 values, emp_name and emp_location, from a table Employee, grouping by emp_location. I am aware that the columns in the select clause need to be in the group by clause, but I would like to know whether there is any other way to get these values in a single query.
My intention is to select only one employee per location based on age.
sample query
select emp_name,emp_location
from Employee
where emp_age=25
group by emp_location
please help in this regard.
Thanks a lot to everyone who responded to this question. I will try to learn these window functions as they are very handy.
The reason why this works in MySQL and not in Oracle is that in Oracle, as well as in most other databases, each selected field (or expression) must either be specified in the group by clause or be an aggregation which combines all values in the group into a single one. For instance, this would work:
select max(emp_name),emp_location
from Employee
where emp_age=25
group by emp_location
However, it may not be the best solution. It will work if you want just the name, but you'll get into trouble when you want to have multiple fields for an employee. In that case max won't do the trick. In the query below, you might get a first name that doesn't match the last name.
select max(emp_firstname), max(emp_lastname), emp_location
from Employee
where emp_age=25
group by emp_location
One solution for this is using a window function (analytic function). With those, you can generate a value for each record, without immediately reducing the number of records. For instance, with a windowed max function, you could select the max age for people named John, and display that value next to every John in the result, even if they don't have that age.
Some functions, like rank, dense_rank and row_number can be used to generate a number for each employee, which you can then use to filter by. In the example below, I created such a counter per location (partition by), and ordered by, in this case name and id. You can specify other fields as well, for instance if you want one name per age per location, you specify both age and location in partition by. If you want the oldest employee of each location, you can remove where emp_age=25 and order by emp_age desc instead.
select
*
from
(select
emp_name, emp_location,
dense_rank() over (partition by emp_location order by emp_name, emp_id) as emp_rank
from Employee
where emp_age=25)
where
emp_rank = 1
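
The mismatch problem and the window-function fix can be seen side by side in the following sketch. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for Oracle; the table layout with emp_firstname and emp_lastname follows the example above, and the three employees are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employee (emp_id integer, emp_firstname text, "
    "emp_lastname text, emp_location text, emp_age integer)"
)
conn.executemany(
    "insert into employee values (?, ?, ?, ?, ?)",
    [
        (1, "Alice", "Zimmer", "Paris", 25),
        (2, "Bob", "Young", "Paris", 25),
        (3, "Carol", "Abel", "Oslo", 25),
    ],
)

# max() is evaluated per column, so the two values may come from different rows.
mixed = conn.execute(
    """
    select max(emp_firstname), max(emp_lastname), emp_location
    from employee
    where emp_age = 25
    group by emp_location
    order by emp_location
    """
).fetchall()

# row_number() numbers whole rows, so filtering on 1 keeps one real employee.
whole_rows = conn.execute(
    """
    select emp_firstname, emp_lastname, emp_location
    from (select emp_firstname, emp_lastname, emp_location,
                 row_number() over (partition by emp_location
                                    order by emp_firstname, emp_id) as emp_rank
          from employee
          where emp_age = 25)
    where emp_rank = 1
    order by emp_location
    """
).fetchall()

print(mixed)
print(whole_rows)
```

For Paris, the aggregate query reports "Bob Zimmer", a person who does not exist in the table, while the window-function query returns Alice Zimmer.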
ORA-00979 not a Group By function error
Only aggregate functions and columns specified in the GROUP BY clause are allowed in the SELECT clause.
In that regard, Oracle follows the SQL standard closely. But, as you noticed in your comment, some other RDBMS are less strict than Oracle regarding that point. For example, to quote MySQL's documentation (emphasis mine):
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. [...]
However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
So, in the recommended use case, adding the extra columns to the GROUP BY clause will lead to the same result.
select emp_name,emp_location
-- ^^^^^^^^
-- this is *not* part of the `GROUP BY` clause
from Employee
where emp_age=25
group by emp_location
Maybe you are looking for:
...
group by emp_location, emp_name
select emp_name,emp_location
from Employee
where emp_age=25
group by emp_name,emp_location
or
select max(emp_name) emp_name,emp_location
from Employee
where emp_age=25
group by emp_location

AM I on the right path

I'm taking a database intro master's class. We are working on SQL. The professor likes to be ambiguous with certain explanations.
Here's my question. For certain questions we are required to find the opposite of a query, something like: if a supplier ships parts that are red and blue, what colors don't they ship?
here is how I figured out a solution
SELECT distinct PARTS.COLOR
FROM PARTS, SHIPMENTS
WHERE PARTS.COLOR NOT IN(
SELECT distinct PARTS.COLOR
FROM SHIPMENTS, PARTS
WHERE PARTS.PARTNO IN(
SELECT distinct SHIPMENTS.PARTNO
FROM SHIPMENTS
WHERE SHIPMENTS.SUPPLIERNO='S1'))
AND SHIPMENTS.PARTNO = PARTS.PARTNO;
What I was wondering is whether this is the best approach to this question. It works, but I'm not sure it is how it should be done.
I should also mention he does not want us to use all available operations. He did not show us JOIN, EXISTS,
he showed us SELECT, IN, ALL/ANY, Aggregates so MAX, MIN, SUM, GROUP BY, and HAVING
Thanks
If you learn how to use "EXPLAIN PLAN" to view the query plan, you'll find that Oracle often uses the same execution plan for "WHERE .. IN()" and "WHERE EXISTS". The choice depends on several aspects, such as whether there are indexes on the columns; mainly, if you are gathering statistics, Oracle will look at the number of rows for each table / index and decide which is the best way to execute the query. So unless you find that IN() and EXISTS() perform drastically differently from each other, just use whichever one makes most sense to you at the time, but always check the execution plan.
As far as your question, since you are prohibited from using joins or exists, I see nothing wrong with your solution.
The easy options I can come up with to simplify either use a join or an exists. You could do it with group and outer join, probably, but I see no point.
Without the restrictions, I could simplify it down to:
SELECT distinct P.COLOR
FROM PARTS P WHERE NOT EXISTS
(SELECT 1 FROM SHIPMENTS S WHERE S.PARTNO = P.PARTNO AND S.SUPPLIERNO = 'S1')
though I am not certain about your schema, and where color is. I assumed a part has a distinct color. If not, this is not adequate and you'd need to correlate the subquery on color, not partno.
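
To show why the correlation column matters, here is a small sketch on an in-memory SQLite database (via Python's sqlite3 module, as a stand-in for Oracle). The PARTS and SHIPMENTS layouts are assumed from the question, and the rows are invented so that two different parts share the color red. The second query is one possible way to compare on color using only IN / NOT IN; it is an illustration, not necessarily what the course expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table parts (partno text, color text)")
conn.execute("create table shipments (supplierno text, partno text)")
conn.executemany(
    "insert into parts values (?, ?)",
    [("P1", "red"), ("P2", "blue"), ("P3", "green"), ("P4", "red"), ("P5", "yellow")],
)
conn.executemany(
    "insert into shipments values (?, ?)",
    [("S1", "P1"), ("S1", "P2"), ("S2", "P3"), ("S2", "P4"), ("S2", "P5")],
)

# Correlated on partno: colors of parts that S1 does not ship.
# 'red' appears because P4 is red, although S1 ships the red part P1.
by_part = [
    row[0]
    for row in conn.execute(
        """
        select distinct p.color
        from parts p
        where not exists (select 1 from shipments s
                          where s.partno = p.partno and s.supplierno = 'S1')
        order by p.color
        """
    )
]

# Compared on color: colors for which S1 ships no part at all.
by_color = [
    row[0]
    for row in conn.execute(
        """
        select distinct color
        from parts
        where color not in (select color from parts
                            where partno in (select partno from shipments
                                             where supplierno = 'S1'))
        order by color
        """
    )
]

print(by_part)
print(by_color)
```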
Your question is: "if a supplier ships parts that are red and blue what colors don't they ship."
Interesting question. I think the easiest method uses analytic functions, which you probably haven't covered:
select sp.supplierno, color, count(*)
-- the red/blue flags are computed once per supplier
from (select s.*, p.color,
max(case when p.color = 'red' then 1 else 0 end) over (partition by s.supplierno) as HasRed,
max(case when p.color = 'blue' then 1 else 0 end) over (partition by s.supplierno) as HasBlue
from shipments s join
parts p
on s.partno = p.partno
) sp
where hasRed > 0 and hasBlue > 0
group by sp.supplierno, color;

Oracle Select Query, Order By + Limit Results

I am new to Oracle and working with a fairly large database. I would like to perform a query that will select the desired columns, order by a certain column and also limit the results. According to everything I have read, the below query should be working but it is returning "ORA-00918: column ambiguously defined":
SELECT * FROM(SELECT * FROM EAI.EAI_EVENT_LOG e,
EAI.EAI_EVENT_LOG_MESSAGE e1 WHERE e.SOURCE_URL LIKE '%.XML'
ORDER BY e.REQUEST_DATE_TIME DESC) WHERE ROWNUM <= 20
Any suggestions would be greatly appreciated :D
The error message means your result set contains two columns with the same name. Each column in a query's projection needs to have a unique name. Presumably you have a column (or columns) with the same name in both EAI_EVENT_LOG and EAI_EVENT_LOG_MESSAGE.
You also want to join on that column. At the moment you are generating a cross join between the two tables. In other words, if you have a hundred records in EAI_EVENT_LOG and two hundred records in EAI_EVENT_LOG_MESSAGE, your result set will be twenty thousand records (without the rownum filter). This is probably not your intention.
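
The row-count arithmetic is easy to verify with a sketch like the one below, run on an in-memory SQLite database via Python's sqlite3 module as a stand-in for Oracle. The table and column names (event_log.id, event_log_message.event_id) are assumptions made for illustration; the real join columns of the EAI tables are not known from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table event_log (id integer)")
conn.execute("create table event_log_message (event_id integer, message text)")

# 100 log rows and 200 message rows, two messages per log entry.
conn.executemany("insert into event_log values (?)", [(i,) for i in range(100)])
conn.executemany(
    "insert into event_log_message values (?, ?)",
    [(i % 100, f"msg {i}") for i in range(200)],
)

# Comma-separated tables without a join condition: every row pairs with every row.
cross_rows = conn.execute(
    "select count(*) from event_log e, event_log_message e1"
).fetchone()[0]

# With a join condition, only matching rows are paired.
joined_rows = conn.execute(
    """
    select count(*)
    from event_log e
    inner join event_log_message e1 on e.id = e1.event_id
    """
).fetchone()[0]

print(cross_rows, joined_rows)
```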
"By switching to innerjoin, will that eliminate the error with the
current code?"
No, you'll still need to handle having two columns with the same name. Basically this comes from using SELECT * on multiple tables. SELECT * is bad practice. It's convenient, but it is always better to specify the exact columns you want in the query's projection. That way you can include (say) e.TRANSACTION_ID and exclude e1.TRANSACTION_ID, and avoid the ORA-00918 exception.
Maybe you have some columns in both EAI_EVENT_LOG and EAI_EVENT_LOG_MESSAGE tables having identical names? Instead of SELECT * list all columns you want to select.
Another problem I see is that you are selecting from two tables but you're not joining them in the WHERE clause, hence the result set will be the cross product of those two tables.
You need to stop using SQL '89 implicit join syntax.
Not because it doesn't work, but because it is evil.
Right now you have a cross join, which in 99.9% of cases is not what you want.
It is also good practice to give every sub-select its own alias (Oracle does not require one, but some databases, such as MySQL, do).
SELECT * FROM
(SELECT e.*, e1.* FROM EAI.EAI_EVENT_LOG e
INNER JOIN EAI.EAI_EVENT_LOG_MESSAGE e1 on (......)
WHERE e.SOURCE_URL LIKE '%.XML'
ORDER BY e.REQUEST_DATE_TIME DESC) s WHERE ROWNUM <= 20
Please specify a join criterion on the dotted line.
Normally you do a join on a keyfield e.g. ON (e.id = e1.event_id)
It's a bad idea to use select *; it's better to specify exactly which fields you want:
SELECT e.field1 as customer_id
,e.field2 as customer_name
.....

Resources