I am trying to simply run a query where I return every field from a table (Select*), however I want to condense the number of records returned by grouping on certain criteria. So far If I do a group by after an aggregate function I can only reduce the records to the extent I want by only grouping by the aggregated function and a single other field. If i introduce more than one field then the records expand as the group by is now going beyond just the one field.
SELECT SUM (Cars) as CarSales, SaleDates
FROM (SELECT Cars,
Date,
Color,
Make,
Model,
Jetpack,
CASE
WHEN NVL (TO_CHAR (Date), 'NULL') LIKE 'NULL'
THEN
TRUNC (TO_DATE (SYSDATE), 'dd')
ELSE
Date
END
SalesDates
FROM Car.DB)
WHERE TableDate= 20140501
GROUP BY SalesDates
ORDER BY SalesDates DESC
So in the above , I want to return only the records provided, but in addition I want to return the rest of the table data while not expanding the number of records. I want the group by to stick to grouping by SalesDates only.
Related
I have question about "Subquery in Order by clause". The below request returns the error. Is it means that Subquery in Order by clause must be scalar?
select *
from employees
order by (select * from employees where first_name ='Steven' and last_name='King');
Error:
ORA-00913: too many values
00913. 00000 - "too many values"
Yes, it means that if you use a subquery in ORDER BY it must be scalar.
With select * your subquery returns multiple columns and the DBMS would not know which of these to use for the sorting. And if you selected one column only, you would still have to make sure you only select one row of course. (The difference is that Oracle sees the too-many-columns problem immediately, but detect too many rows only when fetching the data.)
This would be allowed:
select * from employees
order by (select birthdate from employees where employee_id = 12345);
This is a scalar query, because it returns only one value (one column, one row). But of course this still makes as little sense as your original query, because the subquery result is independent from the main query, i.e. it returns the same value for every row in the table and thus no sorting takes effect.
A last remark: A subquery in ORDER BY makes very seldomly sense, because that would mean you order by something you don't display. The exception is when looking up a sortkey. E.g.:
select *
from products p
where type = 'shirt' and color = 'blue' and size in ('S', 'M', 'L', 'XL')
order by (select sortkey from sizes s where s.size = p.size);
It means that valid options for ORDER BY clause can be
expression,
position or
column alias
A subquery is neither of these.
i have a select(water readings, previous water reading, other columns) , a "where clause" that is based on date water reading date. however for previous water reading it must not consider the where clause. I want to get previous meter reading regardless where clause date range.
looked at union problem is that i have to use the same clause,
SELECT
WATERREADINGS.name,
WATERREADINGS.date,
LAG( WATERREADINGS.meter_reading,1,NULL) OVER(
PARTITION BY WATERREADINGS.meter_id,WATERREADINGS.register_id
ORDER BY WATERREADINGS.meter_id DESC,WATERREADINGS.register_id
DESC,WATERREADINGS.readingdate ASC,WATERREADINGS.created ASC
) AS prev_water_reading,
FROM WATERREADINGS
WHERE waterreadings.waterreadingdate BETWEEN '24-JUN-19' AND
'24-AUG-19' and isactive = 'Y'
The prev_water_reading value must not be restricted by the date BETWEEN '24-JUN-19' AND '24-AUG-19' predicate but the rest of the sql should be.
You can do this by first finding the previous meter readings for all rows and then filtering those results on the date, e.g.:
WITH meter_readings AS (SELECT waterreadings.name,
waterreadings.date dt,
lag(waterreadings.meter_reading, 1, NULL) OVER (PARTITION BY waterreadings.meter_id, waterreadings.register_id
ORDER BY waterreadings.readingdate ASC, waterreadings.created ASC)
AS prev_water_reading,
FROM waterreadings
WHERE isactive = 'Y')
-- the meter_readings subquery above gets all rows and finds their previous meter reading.
-- the main query below then applies the date restriction to the rows from the meter_readings subquery.
SELECT name,
date,
prev_water_reading,
FROM meter_readings
WHERE dt BETWEEN to_date('24/06/2019', 'dd/mm/yyyy') AND to_date('24/08/2019', 'dd/mm/yyyy');
Perform the LAG in an inner query that is not filtered by dates and then filter by the dates in the outer query:
SELECT name,
"date",
prev_water_reading
FROM (
SELECT name,
"date",
LAG( meter_reading,1,NULL) OVER(
PARTITION BY meter_id, register_id
ORDER BY meter_id DESC, register_id DESC, readingdate ASC, created ASC
) AS prev_water_reading,
waterreadingdate --
FROM WATERREADINGS
WHERE isactive = 'Y'
)
WHERE waterreadingdate BETWEEN DATE '2019-06-24' AND DATE '2019-08-24'
You should also not use strings for dates (that require an implicit cast using the NLS_DATE_FORMAT session parameter, which can be changed by any user in their own session) and use date literals DATE '2019-06-24' or an explicit cast TO_DATE( '24-JUN-19', 'DD-MON-RR' ).
You also do not need to reference the table name for every column when there is only a single table as this clutters up your code and makes it difficult to read and DATE is a keyword so you either need to wrap it in double quotes to use it as a column name (which makes the column name case sensitive) or should use a different name for your column.
I've added a subquery with previous result without filter and then joined it with the main table with filters:
SELECT
WATERREADINGS.name,
WATERREADINGS.date,
w_lag.prev_water_reading
FROM
WATERREADINGS,
(SELECT name, date, LAG( WATERREADINGS.meter_reading,1,NULL) OVER(
PARTITION BY WATERREADINGS.meter_id,WATERREADINGS.register_id
ORDER BY WATERREADINGS.meter_id DESC,WATERREADINGS.register_id
DESC,WATERREADINGS.readingdate ASC,WATERREADINGS.created ASC
) AS prev_water_reading
FROM WATERREADINGS) w_lag
WHERE waterreadings.waterreadingsdate BETWEEN '24-JUN-19' AND '24-AUG-19' and isactive = 'Y'
and WATERREADINGS.name = w_lag.name
and WATERREADINGS.date = w_lag.date
I don't often use ORACLE PL/SQL by the way but i need to understand what if anything in this function created by someone else
in the company before me is wrong as for it is not returning the latest record i've been told. I found out in some other forum issues that they
suggested to use the max(dateColumn) instead of "row_numer = 1" for example but not quite sure how to and where to incorporate that.
-- Knowing that --
We use Oracle version 12,
CustomObjectTypeA is an custom Oracle OBJECT TYPE defined by some old employee not longer in here,
V_OtherView is of Table_Mnd type beeing defined by some old employee not longer in here,
V_ABC_123 is a view created by some old employee not longer in here as well.
CREATE OR REPLACE FUNCTION F_TABLE_APPROVED (NUMBER_F_UPD number, NUMBER_F_GET VARcHAR2)
RETURN Table_Mnd
IS
V_OtherView Table_Mnd
BEGIN
SELECT CustomObjectTypeA (FromT.NUMBER_F,
FromT.OP_CODE,
FromT.CATG_CODE,
FromT.CATG_NAME,
FromT.CATG_SORT,
FromT.ORG_CODE,
FromT.ORG_NAME
FromT.DATA_ENTRY_VALID,
FromT.NUMBER_RECEIVED,
FromT.YEAR_1,
FromT.YEAR_2)
BULK COLLECT INTO V_OtherView
FROM (SELECT NUMBER_F,
OP_CODE,
CATG_CODE,
CATG_NAME,
CATG_SORT,
ORG_CODE,
ORG_NAME
DATA_ENTRY_VALID,
NUMBER_RECEIVED,
YEAR_1,
YEAR_2,
ROW_NUMBER() OVER (PARTITION BY BY ORG_CODE ORDER BY NUMBER_RECEIVED DESC, LOAD_DATE DESC) AS ROW_NUMBER
FROM V_ABC_123
WHERE NUMBER_F = NUMBER_F_UPD AND DATA_ENTRY_VALID <> 'OnGoing'
AND LOAD_DATE >= (SELECT sysdate-10 FROM dual)
AND LOAD_DATE <= (SELECT DISTINCT LOAD_DATE
FROM V_ABC_123
WHERE NUMBER_RECEIVED = NUMBER_F_GET)) FromT
WHERE FromT.ROW_NUMBER=1;
RETURN V_OtherView;
END F_TABLE_APPROVED;
The important bits of the query are:
SELECT ...
FROM (select ...,
ROW_NUMBER()
OVER (PARTITION BY ORG_CODE
ORDER BY NUMBER_RECEIVED DESC,
LOAD_DATE DESC) AS ROW_NUMBER
...) FromT
WHERE FromT.ROW_NUMBER = 1;
The "ROW_NUMBER" column is computed according to the following window clause:
PARTITION BY ORG_CODE
ORDER BY NUMBER_RECEIVED DESC, LOAD_DATE DESC
Which means that for each ORG_CODE, it will sort all the records by NUMBER_RECEVED,LOAD_DATE in descending order. Note that if the columns are Oracle DATEs, they will only be accurate to the nearest second; so if there are multiple records with date/times in the exact same 1-second interval, this sort order will not be guaranteed unique. The logic of ROW_NUMBER will therefore pick one of them arbitrarily (i.e. whichever record happens to be emitted first) and assign it the value "1", and this will be deemed the "latest". Subsequent executions of the same SQL could (in theory) return a different record.
The suspicious part is NUMBER_RECEIVED which sounds like it's a number, not a date? Sorting by this means that the records with the highest NUMBER_RECEIVED will be preferred. Was this intentional?
I'm not sure why the PARTITION is there, this would cause the query to return one "latest" record for each value of ORG_CODE that it finds. I can only assume this was intentional.
The problem is that the query can only determine the "latest record" as well as it can based on the data provided to it. In this case, it's possible the data is simply not granular enough to be able to decide which record is the actual "latest" record.
I have a table with 3 columns:
table1: ID, CODE, RESULT, RESULT2, RESULT3
I have this SAS code:
data table1
set table1;
BY ID, CODE;
IF FIRST.CODE and RESULT='A' THEN OUTPUT;
ELSE IF LAST.CODE and RESULT NE 'A' THEN OUTPUT;
RUN;
So we are grouping the data by ID and CODE, and then writing to the dataset if certain conditions are met. I want to write a hive query to replicate this. This is what I have:
proc sql;
create table temp as
select *, row_number() over (partition by ID, CODE) as rowNum
from table1;
create table temp2 as
select a.ID, a.CODE, a.RESULT, a.RESULT2, a.RESULT3
from temp a
inner join (select ID, CODE, max(rowNum) as maxRowNum
from temp
group by ID, CODE) b
on a.ID=b.ID and a.CODE=b.CODE
where (a.rowNum=1 and a.RESULT='A') or (a.rowNum=b.maxRowNum and a.RESULT NE 'A');
quit;
There are two issues I see with this.
1) The row that is first or last in each BY group is entirely dependant on the order of rows in table1 in SAS, we aren't ordering by anything. I don't think row order is preserved when translating to a hive query.
2) The SAS code is taking the first row in each BY GROUP or the last, not both. I think that my HIVE query is taking both, resulting in more rows than I want.
Any suggestions or insight on how to improve my query is appreciated. Is it even possible to replicate this SAS code in HIVE?
The SAS code has a by statement (BY ID CODE;), which tells SAS that the set dataset is sorted at those levels. So, not a random selection for first. and last..
That said, we can replicate this in HIVE by using the first_value and last_value window functions.
FIRST.CODE should replicate to
first_value(code) over (partition by Id order by code)fcode
Similarly, LAST.CODE would be
last_value(code) over (partition by Id order by code)lcode
Once you have the fcode and lcode columns, use case when statements for the result column criteria. Like,
case when (code=fcode and result='A') or (code=lcode and result<>'A')
then 1 else 0 end as op_flag
Then the fetch the table with where op_flag = 1
SAMPLE
select id, code, result from (
select *,
first_value(code) over (partition by id order by code)fcode,
last_value(code) over (partition by id order by code)lcode
from footab) f
where (code=fcode and result='A') or (code=lcode and result<>'A')
Regarding point 1) the BY group processing requires the input data to be sorted or indexed on BY variables, so though the code contains no ordering, the source data is processed in order. If the input data was not indexed/sorted, SAS will throw error.
Regarding this, possible differences are on rows with same values of BY variables, especially if the RESULT is different.
In SAS, I would pre-sort data by ID, CODE, RESULT, then use BY ID CODE in order to not be influenced by order of rows.
Regarding 2) FIRST and LAST can be both true in SAS. Since your condition for first and last on RESULT is different, I guess this is not a source of differences.
I guess you could add another field as
row_number() over (partition by ID, CODE desc) as rowNumDesc
to detect last row with rowNumDesc = 1 (so that you skip the join).
EDIT:
I think the two programs above both include random selection of rows for groups with same values of ID and CODE variables, especially with same values of RESULT. But you should get same number of rows from both. If not, just debug it.
However the random aspect in SAS code/storage is based on physical order of rows, while the ROW_NUMBERs randomness within a group will be influenced by the implementation of the function in the engine.
I have wrote two plsql functions for Stock and Sales Comparison. one as find_prodcuts_sold and other as find_usage_from_stock.
I can get different between stock and sales by decrease find_prodcuts_sold function from find_usage_from_stock function. for this i should pass From date and To date to these two functions. (From date and To date are taken from stock_date column in stock table). then my functions return values for given date range.
Now i want to create a line chart using my functions to get different between stock and slaes. chart should build automatically. with out user pass From date and To date.
Example stock_date column from stock table.
stock_date
30-JAN-12
26-JAN-12
24-JAN-12
23-JAN-12
18-JAN-12
15-JAN-12
13-JAN-12
12-JAN-12
11-JAN-12
08-JAN-12
06-JAN-12
I want to pass above dates as below to my functions automatically.
From To
26-JAN-12 30-JAN-12
24-JAN-12 26-JAN-12
23-JAN-12 24-JAN-12
18-JAN-12 23-JAN-12
15-JAN-12 18-JAN-12
13-JAN-12 15-JAN-12
12-JAN-12 13-JAN-12
11-JAN-12 12-JAN-12
08-JAN-12 11-JAN-12
06-JAN-12 08-JAN-12
how could i do this ?
You can use LAG or LEAD analytic functions:
select stock_date, lead(stock_date, 1, null) over (order by stock_date) next_date
from stock_table
then use the result of the query for your input, i.e.:
SELECT find_usage_from_stock(t.product_id, t.start_date, t.end_date) as usage_from_stock,
find_prodcuts_sold(t.product_id, t.start_date, t.end_date) as prodcuts_sold,
t.product_id, t.start_date
FROM (select stock_date as start_date,
lead(stock_date, 1, null) over (order by stock_date) as end_date,
product_id
from stock_table) t
Note: I used null as the empty value in the lead function, perhaps you'll need to put something else