Hello all :) I have this query in Oracle 10g:
SELECT
LAST_VALUE(SERIAL_ID) OVER (),
LAST_VALUE(COLOR) OVER ()
FROM (
SELECT SERIAL_ID, COLOR FROM TABLE_1
UNION ALL
SELECT SERIAL_ID, COLOR FROM TABLE_2
) WHERE SERIAL_ID = <PUT UNIQUE ID TO TABLE_1 and TABLE_2 HERE>
TABLE_1 and TABLE_2 have the exact same schema, but different data. There is a unique constraint on SERIAL_ID, but one SERIAL_ID can be found in both tables.
So far the LAST_VALUE(COLOR) OVER () has always returned the value from TABLE_2 over TABLE_1 when there is a match in TABLE_2. Which is what I want.
I cannot find information in the documentation that tells me that the order from UNION ALL will get preserved. In my opinion UNION ALL is from the set realm, and I don't know if Oracle reserves the right to present this set in undefined order.
I want to make sure that the order will stay the same.
Best regards
Unless it's documented it isn't safe to assume anything about ordering, other than Tom Kyte's mantra. Even it appears to always work now, that doesn't mean you won't find it working differently one day, in the current or a future version.
You could ensure this by adding a flag to each branch of the union:
SELECT
LAST_VALUE(SERIAL_ID) OVER (ORDER BY FLAG
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
LAST_VALUE(COLOR) OVER (ORDER BY FLAG
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM (
SELECT SERIAL_ID, COLOR, 1 AS FLAG FROM TABLE_1
UNION ALL
SELECT SERIAL_ID, COLOR, 2 AS FLAG FROM TABLE_2
) WHERE SERIAL_ID = <PUT UNIQUE ID TO TABLE_1 and TABLE_2 HERE>
This will work even if you (artifically) break it by adding an ORDER BY FLAG DESC to the inner query - SQL Fiddle showing that in action.
Well, UNION ALL always returned ordered data for me. Order is by all columns, starting from first one. To set records in order you need, you will have to generate some additional field in both internal queries and then order on that field in outer query.
Related
I don't often use ORACLE PL/SQL by the way but i need to understand what if anything in this function created by someone else
in the company before me is wrong as for it is not returning the latest record i've been told. I found out in some other forum issues that they
suggested to use the max(dateColumn) instead of "row_numer = 1" for example but not quite sure how to and where to incorporate that.
-- Knowing that --
We use Oracle version 12,
CustomObjectTypeA is an custom Oracle OBJECT TYPE defined by some old employee not longer in here,
V_OtherView is of Table_Mnd type beeing defined by some old employee not longer in here,
V_ABC_123 is a view created by some old employee not longer in here as well.
CREATE OR REPLACE FUNCTION F_TABLE_APPROVED (NUMBER_F_UPD number, NUMBER_F_GET VARcHAR2)
RETURN Table_Mnd
IS
V_OtherView Table_Mnd
BEGIN
SELECT CustomObjectTypeA (FromT.NUMBER_F,
FromT.OP_CODE,
FromT.CATG_CODE,
FromT.CATG_NAME,
FromT.CATG_SORT,
FromT.ORG_CODE,
FromT.ORG_NAME
FromT.DATA_ENTRY_VALID,
FromT.NUMBER_RECEIVED,
FromT.YEAR_1,
FromT.YEAR_2)
BULK COLLECT INTO V_OtherView
FROM (SELECT NUMBER_F,
OP_CODE,
CATG_CODE,
CATG_NAME,
CATG_SORT,
ORG_CODE,
ORG_NAME
DATA_ENTRY_VALID,
NUMBER_RECEIVED,
YEAR_1,
YEAR_2,
ROW_NUMBER() OVER (PARTITION BY BY ORG_CODE ORDER BY NUMBER_RECEIVED DESC, LOAD_DATE DESC) AS ROW_NUMBER
FROM V_ABC_123
WHERE NUMBER_F = NUMBER_F_UPD AND DATA_ENTRY_VALID <> 'OnGoing'
AND LOAD_DATE >= (SELECT sysdate-10 FROM dual)
AND LOAD_DATE <= (SELECT DISTINCT LOAD_DATE
FROM V_ABC_123
WHERE NUMBER_RECEIVED = NUMBER_F_GET)) FromT
WHERE FromT.ROW_NUMBER=1;
RETURN V_OtherView;
END F_TABLE_APPROVED;
The important bits of the query are:
SELECT ...
FROM (select ...,
ROW_NUMBER()
OVER (PARTITION BY ORG_CODE
ORDER BY NUMBER_RECEIVED DESC,
LOAD_DATE DESC) AS ROW_NUMBER
...) FromT
WHERE FromT.ROW_NUMBER = 1;
The "ROW_NUMBER" column is computed according to the following window clause:
PARTITION BY ORG_CODE
ORDER BY NUMBER_RECEIVED DESC, LOAD_DATE DESC
Which means that for each ORG_CODE, it will sort all the records by NUMBER_RECEVED,LOAD_DATE in descending order. Note that if the columns are Oracle DATEs, they will only be accurate to the nearest second; so if there are multiple records with date/times in the exact same 1-second interval, this sort order will not be guaranteed unique. The logic of ROW_NUMBER will therefore pick one of them arbitrarily (i.e. whichever record happens to be emitted first) and assign it the value "1", and this will be deemed the "latest". Subsequent executions of the same SQL could (in theory) return a different record.
The suspicious part is NUMBER_RECEIVED which sounds like it's a number, not a date? Sorting by this means that the records with the highest NUMBER_RECEIVED will be preferred. Was this intentional?
I'm not sure why the PARTITION is there, this would cause the query to return one "latest" record for each value of ORG_CODE that it finds. I can only assume this was intentional.
The problem is that the query can only determine the "latest record" as well as it can based on the data provided to it. In this case, it's possible the data is simply not granular enough to be able to decide which record is the actual "latest" record.
I have a table with 3 columns:
table1: ID, CODE, RESULT, RESULT2, RESULT3
I have this SAS code:
data table1
set table1;
BY ID, CODE;
IF FIRST.CODE and RESULT='A' THEN OUTPUT;
ELSE IF LAST.CODE and RESULT NE 'A' THEN OUTPUT;
RUN;
So we are grouping the data by ID and CODE, and then writing to the dataset if certain conditions are met. I want to write a hive query to replicate this. This is what I have:
proc sql;
create table temp as
select *, row_number() over (partition by ID, CODE) as rowNum
from table1;
create table temp2 as
select a.ID, a.CODE, a.RESULT, a.RESULT2, a.RESULT3
from temp a
inner join (select ID, CODE, max(rowNum) as maxRowNum
from temp
group by ID, CODE) b
on a.ID=b.ID and a.CODE=b.CODE
where (a.rowNum=1 and a.RESULT='A') or (a.rowNum=b.maxRowNum and a.RESULT NE 'A');
quit;
There are two issues I see with this.
1) The row that is first or last in each BY group is entirely dependant on the order of rows in table1 in SAS, we aren't ordering by anything. I don't think row order is preserved when translating to a hive query.
2) The SAS code is taking the first row in each BY GROUP or the last, not both. I think that my HIVE query is taking both, resulting in more rows than I want.
Any suggestions or insight on how to improve my query is appreciated. Is it even possible to replicate this SAS code in HIVE?
The SAS code has a by statement (BY ID CODE;), which tells SAS that the set dataset is sorted at those levels. So, not a random selection for first. and last..
That said, we can replicate this in HIVE by using the first_value and last_value window functions.
FIRST.CODE should replicate to
first_value(code) over (partition by Id order by code)fcode
Similarly, LAST.CODE would be
last_value(code) over (partition by Id order by code)lcode
Once you have the fcode and lcode columns, use case when statements for the result column criteria. Like,
case when (code=fcode and result='A') or (code=lcode and result<>'A')
then 1 else 0 end as op_flag
Then the fetch the table with where op_flag = 1
SAMPLE
select id, code, result from (
select *,
first_value(code) over (partition by id order by code)fcode,
last_value(code) over (partition by id order by code)lcode
from footab) f
where (code=fcode and result='A') or (code=lcode and result<>'A')
Regarding point 1) the BY group processing requires the input data to be sorted or indexed on BY variables, so though the code contains no ordering, the source data is processed in order. If the input data was not indexed/sorted, SAS will throw error.
Regarding this, possible differences are on rows with same values of BY variables, especially if the RESULT is different.
In SAS, I would pre-sort data by ID, CODE, RESULT, then use BY ID CODE in order to not be influenced by order of rows.
Regarding 2) FIRST and LAST can be both true in SAS. Since your condition for first and last on RESULT is different, I guess this is not a source of differences.
I guess you could add another field as
row_number() over (partition by ID, CODE desc) as rowNumDesc
to detect last row with rowNumDesc = 1 (so that you skip the join).
EDIT:
I think the two programs above both include random selection of rows for groups with same values of ID and CODE variables, especially with same values of RESULT. But you should get same number of rows from both. If not, just debug it.
However the random aspect in SAS code/storage is based on physical order of rows, while the ROW_NUMBERs randomness within a group will be influenced by the implementation of the function in the engine.
I have a select statement that includes a call to the row_number() function, which technically gives me a unique id per row that is returned.
SELECT f.*, row_number() as row_id OVER(ORDER BY f.name)
FROM widgets f
It'd be kinda cool if i could somehow use this row_id to sort the table. I'd now like to try to use the row_number to sort like so:
table.sort(mytable, function(a,b) return a.row_id< b.row_id end)
I'm just trying to save myself from having to loop through the results to add a unique id and then sort it but maybe it's not possible.
I don't know how you would express this in Lua, but in Postgres you can order by a column alias. The Postgres query would be:
SELECT f.*, row_number() OVER (ORDER BY f.name) as row_id
FROM widgets f
ORDER BY row_id;
The fact that you don't want to order by f.name suggests that you have duplicates. Do note that ordering in SQL is not guaranteed to be stable. That is, duplicate names could be in different orders. If you have a way of making the ordering stable (i.e. by uniquely identifying each row), you can use those columns in the order by.
I have a table that has 14 columns in it. These columns are color, type, ft, date, count, etc. What I need is to select all distinct records of id and type with the most recent date. So, for example...
color------type-----------date
red--------work-----------01/01/01
red---------play----------02/02/02
red---------play----------03/03/03
In this case, I want to return red, work, 01/01/01 and red, play 03/03/03. Hopefully this makes sense. I've tried different combinations of select unique and select distinct and group bys, and I haven't been able to come up with anything.
Here is the SQL statement I'm trying:
select distinct
chock_id,
roll_type,
max(chock_service_dt),
chock_id_dt,
chock_seq_num,
chock_service_cmnt,
total_rolled_lineal_ft,
total_rolled_tons,
chock_usage_cnt,
chock_insert_dt,
record_modify_dt,
next_chock_service_dt_act,
previous_alarm_value,
upload_complete_yn
from
tp07_chock_summary_row
group by
chock_id,
roll_type,
chock_service_dt,
chock_id_dt,
chock_seq_num,
chock_service_cmnt,
total_rolled_lineal_ft,
total_rolled_tons,
chock_usage_cnt,
chock_insert_dt,
record_modify_dt,
next_chock_service_dt_act,
previous_alarm_value,
upload_complete_yn;
Here's a screenshot. Like I said in a comment below, like in rows 2 and 4, I can't have multiple records with the same chock_id and roll_type.
Given your new requirements, which you did not explain initially, this should do it:
select
chock_id,
roll_type,
chock_service_dt,
chock_id_dt,
chock_seq_num,
chock_service_cmnt,
total_rolled_lineal_ft,
total_rolled_tons,
chock_usage_cnt,
chock_insert_dt,
record_modify_dt,
next_chock_service_dt_act,
previous_alarm_value,
upload_complete_yn
from (
select
chock_id,
roll_type,
chock_service_dt,
chock_id_dt,
chock_seq_num,
chock_service_cmnt,
total_rolled_lineal_ft,
total_rolled_tons,
chock_usage_cnt,
chock_insert_dt,
record_modify_dt,
next_chock_service_dt_act,
previous_alarm_value,
upload_complete_yn,
row_number() over (
partition by chock_id, roll_type
order by chock_service_dt desc
) rn
from
tp07_chock_summary_row
) where rn = 1
select color, type, max(date)
from ...
group by color, type
select
color,
type,
max(date)
from
yourtable
group by
color,
type
For a pair of cursors where the total number of rows in the resultset is required immediately after the first FETCH, ( after some trial-and-error ) I came up with the query below
SELECT
col_a,
col_b,
col_c,
COUNT(*) OVER( PARTITION BY 1 ) AS rows_in_result
FROM
myTable JOIN theirTable ON
myTable.col_a = theirTable.col_z
GROUP BY
col_a, col_b, col_c
ORDER BY
col_b
Now when the output of the query is X rows, rows_in_result reflects this accurately.
What does PARTITION BY 1 mean?
I think it probably tells the database to partition the results into pieces of 1-row each
It is an unusual use of PARTITION BY. What it does is put everything into the same partition so that if the query returns 123 rows altogether, then the value of rows_in_result on each row will be 123 (as its alias implies).
It is therefore equivalent to the more concise:
COUNT(*) OVER ()
Databases are quite free to add restrictions to the OVER() clause. Sometimes, either PARTITION BY [...] and/or ORDER BY [...] are mandatory clauses, depending on the aggregate function. PARTITION BY 1 may just be a dummy clause used for syntax integrity. The following two are usually equivalent:
[aggregate function] OVER ()
[aggregate function] OVER (PARTITION BY 1)
Note, though, that Sybase SQL Anywhere and CUBRID interpret this 1 as being a column index reference, similar to what is possible in the ORDER BY [...] clause. This might appear to be a bit surprising as it imposes an evaluation order to the query's projection. In your case, this would then mean that the following are equivalent
COUNT(*) OVER (PARTITION BY 1)
COUNT(*) OVER (PARTITION BY col_a)
This curious deviation from other databases' interpretation allows for referencing more complex grouping expressions.