Merge two recordsets in Oracle

I have the following two recordsets:
Recordset 1:
Id isVal isVal1
1 Y N
2 Y N
Recordset 2:
Id isVal isVal1
2 N Y
3 N Y
The required result is:
Id isVal isVal1
1 Y N
2 Y Y
3 N Y
Should I use a join? Could you please advise me how to solve this?

No, you want to place the records on top of each other, so you would need to use a UNION.
select id, max(isval) as isval, max(isval1) as isval1
from ( select id, isval, isval1
       from recordset1
       union all
       select id, isval, isval1
       from recordset2
     )
group by id
I use UNION ALL because you don't need to remove duplicates; if you did want to remove them, you would drop the ALL.
The max works because 'Y' is "greater" than 'N'.
I'm assuming that 'Y' takes precedence over 'N', rather than that values from the second recordset override those from the first.
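A minimal runnable sketch of the same UNION ALL / GROUP BY / MAX idea. It assumes Python's built-in sqlite3 module with an in-memory database as a stand-in for Oracle (the query only uses features both support); the table names recordset1 and recordset2 come from the answer, the rows from the question.

```python
import sqlite3

# In-memory SQLite used as a stand-in for Oracle (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recordset1 (id INTEGER, isval TEXT, isval1 TEXT);
    CREATE TABLE recordset2 (id INTEGER, isval TEXT, isval1 TEXT);
    INSERT INTO recordset1 VALUES (1, 'Y', 'N'), (2, 'Y', 'N');
    INSERT INTO recordset2 VALUES (2, 'N', 'Y'), (3, 'N', 'Y');
""")

# Stack both sets, then collapse per id; MAX picks 'Y' because 'Y' > 'N'.
MERGE_SQL = """
    SELECT id, MAX(isval) AS isval, MAX(isval1) AS isval1
    FROM (SELECT id, isval, isval1 FROM recordset1
          UNION ALL
          SELECT id, isval, isval1 FROM recordset2) AS both_sets
    GROUP BY id
    ORDER BY id
"""

merged = conn.execute(MERGE_SQL).fetchall()
for row in merged:
    print(row)
# (1, 'Y', 'N')
# (2, 'Y', 'Y')
# (3, 'N', 'Y')
```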

Related

Oracle: prioritizing results based on column’s value

I have a dataset in which there are duplicate IDs in the first column. I'm hoping to obtain a single row for each ID based on the second column's value. The data looks like this:
ID Info_Source Prior?
A 1 Y
A 3 N
A 2 Y
B 1 N
B 1 N
B 2 Y
C 2 N
C 3 Y
C 1 N
Specifically the criteria would call for prioritizing based on the second column's value (3 highest priority; then 1; and lastly 2): if the 'Info_Source' column has a value of 3, return that row; if there is no 3 in the second column for a given ID, look for a 1 and if found return that row; and finally if there is no 3 or 1 associated with the ID, search for 2 and return that row for the ID.
The desired results would be a single row for each ID, and the resulting data would be:
ID Info_Source Prior?
A 3 N
B 1 N
C 3 Y
ROW_NUMBER() OVER () usually solves this kind of problem nicely and efficiently, e.g.:
select ID, Info_Source, Prior
from (
  select ID, Info_Source, Prior
       , row_number() over(partition by id order by Info_source DESC) as rn
  from the_table  -- placeholder: the question does not name the table
)
where rn = 1
To prioritize the second column's values in the order 3, then 1, then 2, use a CASE expression to map the raw values into the order you need:
select ID, Info_Source, Prior
from (
  select ID, Info_Source, Prior
       , row_number() over(partition by id
                           order by case when Info_source = 3 then 3
                                         when Info_source = 1 then 2
                                         else 1 end DESC) as rn
  from the_table  -- placeholder: the question does not name the table
)
where rn = 1
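The CASE-based ranking can be tried end to end with a small sketch. It assumes Python's sqlite3 module backed by SQLite 3.25 or later (needed for window functions) as a stand-in for Oracle; the table name sources and the column name prior_flag are hypothetical, chosen because the question shows the header as "Prior?".

```python
import sqlite3

# SQLite (3.25+) stands in for Oracle; "sources" and "prior_flag" are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (id TEXT, info_source INTEGER, prior_flag TEXT)")
conn.executemany(
    "INSERT INTO sources VALUES (?, ?, ?)",
    [("A", 1, "Y"), ("A", 3, "N"), ("A", 2, "Y"),
     ("B", 1, "N"), ("B", 1, "N"), ("B", 2, "Y"),
     ("C", 2, "N"), ("C", 3, "Y"), ("C", 1, "N")],
)

PRIORITY_SQL = """
    SELECT id, info_source, prior_flag
    FROM (SELECT id, info_source, prior_flag,
                 ROW_NUMBER() OVER (
                     PARTITION BY id
                     -- map 3 -> 3, 1 -> 2, anything else (2) -> 1; highest wins
                     ORDER BY CASE WHEN info_source = 3 THEN 3
                                   WHEN info_source = 1 THEN 2
                                   ELSE 1 END DESC) AS rn
          FROM sources) AS ranked
    WHERE rn = 1
    ORDER BY id
"""

best = conn.execute(PRIORITY_SQL).fetchall()
print(best)  # [('A', 3, 'N'), ('B', 1, 'N'), ('C', 3, 'Y')]
```

ID B has two identical (1, 'N') rows; ROW_NUMBER() picks one of them arbitrarily, which is harmless here because they are equal.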

Oracle - Assign count value for a column based on another column in select query

Suppose I have the following in a select query:
ID Flag
5 Y
5 Y
5 N
6 Y
6 Y
6 Y
6 N
I need to add a new column, count, in the same select that counts the number of 'Y' records for each ID and assigns that count to every row of the ID. (E.g. ID=5 has 3 records; all of them should be assigned the count value 2.)
Output required in select query:
ID Flag count
5 Y 2
5 Y 2
5 N 2
6 Y 3
6 Y 3
6 Y 3
6 N 3
Use a window function:
select id,
flag,
count(case when flag = 'Y' then 1 end) over (partition by id) as "count"
from the_table
order by id;
The CASE expression returns NULL for rows whose flag is 'N', so COUNT() ignores them.
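A quick way to check the windowed conditional count, as a sketch assuming Python's sqlite3 module with SQLite 3.25 or later (for window functions) in place of Oracle; the table name the_table comes from the answer. The extra flag DESC in the ORDER BY is only there to make the printed order deterministic.

```python
import sqlite3

# SQLite (3.25+) stands in for Oracle in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER, flag TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?)",
    [(5, "Y"), (5, "Y"), (5, "N"), (6, "Y"), (6, "Y"), (6, "Y"), (6, "N")],
)

# COUNT skips the NULLs produced for 'N' rows; OVER (PARTITION BY id)
# repeats the per-id total on every row instead of collapsing the rows.
COUNT_SQL = """
    SELECT id, flag,
           COUNT(CASE WHEN flag = 'Y' THEN 1 END) OVER (PARTITION BY id) AS "count"
    FROM the_table
    ORDER BY id, flag DESC
"""

rows = conn.execute(COUNT_SQL).fetchall()
for row in rows:
    print(row)
```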

Can I Order Random results?

I want to randomly pull out 3 records from a table and then order them by the field "sponsor_ranking".
My code reads:
$sql = "SELECT * FROM $TableSponsors ORDER BY RAND(), sponsor_ranking asc LIMIT 3";
But it is not ordering the results by "sponsor_ranking"; it only randomizes them.
Any suggestions?
Thank you.
Of course, ordering by sponsor_ranking only has an effect for records that get the same RAND() value, which is very unlikely.
You can solve it like this. Order by random, limit to 3, order again by sponsor_ranking.
SELECT * FROM
(SELECT * FROM $TableSponsors
ORDER BY RAND()
LIMIT 3) x
ORDER BY
sponsor_ranking
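The "shuffle, limit, then sort" pattern can be sketched like this. It assumes Python's sqlite3 module in place of MySQL, so MySQL's RAND() is spelled RANDOM(); the table name sponsors and its rows are hypothetical stand-ins for $TableSponsors.

```python
import sqlite3

# SQLite stands in for MySQL; "sponsors" and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sponsors (name TEXT, sponsor_ranking INTEGER)")
conn.executemany(
    "INSERT INTO sponsors VALUES (?, ?)",
    [(f"sponsor{i}", i) for i in range(1, 11)],
)

# Inner query: shuffle and keep 3 rows. Outer query: sort only those 3.
PICK_SQL = """
    SELECT * FROM (SELECT * FROM sponsors ORDER BY RANDOM() LIMIT 3) AS x
    ORDER BY sponsor_ranking
"""

picked = conn.execute(PICK_SQL).fetchall()
print(picked)  # three random sponsors, ascending by sponsor_ranking
```

The LIMIT has to sit in the inner query: applying it in the outer query would return the three best-ranked rows rather than three random ones.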
You could make a subquery (derived table) in the FROM clause:
$sql = "SELECT * FROM (SELECT * FROM $TableSponsors ORDER BY RAND() LIMIT 3) Faketable ORDER BY sponsor_ranking";
Your original query will never work. When you ORDER BY multiple fields, the second and subsequent fields are only considered for rows whose "earlier" fields have the same values.
You'll have to use a subquery to do the RAND() ordering and the LIMIT, then sort by the other fields in the parent query:
SELECT *
FROM (
    SELECT *
    FROM $TableSponsors
    ORDER BY RAND()
    LIMIT 3
) as foo
ORDER BY sponsor_ranking
For example, if your table had this:
x y
1 5
1 6
2 7
3 8
4 9
... ORDER BY x DESC, y ASC
then you'd get
x y
4 9 // only one "4", so 9 is ignored, no point in sorting a single value
3 8 // only one "3", so 8 is ignored, no point in sorting a single value
2 7 // ditto
1 5 // hey, there's two "1" values, so now the second field **IS** sorted
1 6
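The tie-breaking behaviour described above can be reproduced with the example table. This sketch assumes Python's sqlite3 module (any SQL database sorts the same way here) and a hypothetical table name t; the rows are inserted out of order so the result visibly comes from the ORDER BY.

```python
import sqlite3

# Hypothetical table "t" holding the x/y example rows, inserted unsorted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 6), (3, 8), (1, 5), (4, 9), (2, 7)])

rows = conn.execute("SELECT x, y FROM t ORDER BY x DESC, y ASC").fetchall()
print(rows)  # [(4, 9), (3, 8), (2, 7), (1, 5), (1, 6)]

# The same rule in plain Python: y only matters when two rows share x.
same_in_python = sorted([(1, 6), (3, 8), (1, 5), (4, 9), (2, 7)],
                        key=lambda r: (-r[0], r[1]))
print(same_in_python == rows)  # True
```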

Similar queries have way different execution times

I had the following query:
SELECT nvl(sum(adjust1),0)
FROM (
SELECT
ManyOperationsOnFieldX adjust1,
a, b, c, d, e
FROM (
SELECT
a, b, c, d, e,
SubStr(balance, INSTR(balance, '[&&2~', 1, 1)) X
FROM
table
WHERE
a >= To_Date('&&1','YYYYMMDD')
AND a < To_Date('&&1','YYYYMMDD')+1
)
)
WHERE
b LIKE ...
AND e IS NULL
AND adjust1>0
AND (b NOT IN ('...','...','...'))
OR (b = '... AND c <> NULL)
I tried to change it to this:
SELECT nvl(sum(adjust1),0)
FROM (
SELECT
ManyOperationsOnFieldX adjust1
FROM (
SELECT
SubStr(balance, INSTR(balance, '[&&2~', 1, 1)) X
FROM
table
WHERE
a >= To_Date('&&1','YYYYMMDD')
AND a < To_Date('&&1','YYYYMMDD')+1
AND b LIKE '..'
AND e IS NULL
AND (b NOT IN ('..','..','..'))
OR (b='..' AND c <> NULL)
)
)
WHERE
adjust1>0
My intention was to have all the filtering in the innermost query and pass to the outer queries only the field X, which is the one I have to do many operations on. However, the first (original) query takes a couple of seconds to execute, while the second one won't even finish: I waited for almost 20 minutes and still didn't get an answer.
Is there an obvious reason for this to happen that I might be overlooking?
These are the plans for each of them:
Plan for the original query:
SELECT STATEMENT optimizer=all_rows (cost = 973 Card = 1 bytes = 288)
  SORT (aggregate)
    PARTITION RANGE (single) (cost=973 Card = 3 bytes = 864)
      TABLE ACCESS (full) OF "table" #3 TABLE Optimizer = analyzed(cost=973 Card = 3 bytes=564)
Plan for the modified query:
SELECT STATEMENT optimizer=all_rows (cost = 750.354 Card = 1 bytes = 288)
  SORT (aggregate)
    PARTITION RANGE (ALL) (cost=759.354 Card = 64.339 bytes = 18.529.632)
      TABLE ACCESS (full) OF "table" #3 TABLE Optimizer = analyzed(cost=750.354 Card = 64.339 bytes=18.529.632)
Your two queries are not identical.
The logical operator AND is evaluated before the operator OR:
WITH data AS
  (SELECT rownum id
   FROM dual
   CONNECT BY level <= 10)
SELECT *
FROM data
WHERE id = 2
AND id = 3
OR id = 5;

        ID
----------
         5
So your first query means: Give me the big SUM over this partition when the data is this way.
Your second query means: give me the big SUM over (this partition when the data is this way) or (when the data is this other way [no partition elimination hence big full scan])
Be careful when mixing the logical operators AND and OR. My advice would be to use parentheses to avoid any confusion.
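The same precedence rule can be checked outside Oracle. This sketch assumes Python's sqlite3 module and replaces Oracle's CONNECT BY level <= 10 row generator with a recursive CTE producing 1..10; the condition is run once without and once with explicit parentheses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Recursive CTE generating ids 1..10 (stand-in for CONNECT BY level <= 10).
GEN = """
    WITH RECURSIVE data(id) AS (
        SELECT 1 UNION ALL SELECT id + 1 FROM data WHERE id < 10
    )
    SELECT id FROM data WHERE {cond} ORDER BY id
"""

# AND binds tighter: this reads as (id = 2 AND id = 3) OR id = 5.
implicit = conn.execute(GEN.format(cond="id = 2 AND id = 3 OR id = 5")).fetchall()
# Parentheses force the OR to be evaluated first.
grouped = conn.execute(GEN.format(cond="id = 2 AND (id = 3 OR id = 5)")).fetchall()

print(implicit)  # [(5,)]
print(grouped)   # []
```

In the question's second query the date range plays the role of id = 2: without parentheses, rows matching the OR branch bypass the date filter, which is why partition pruning is lost.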
It is all about your OR... Try this:
SELECT nvl(sum(adjust1),0)
FROM (
SELECT
ManyOperationsOnFieldX adjust1
FROM (
SELECT
SubStr(balance, INSTR(balance, '[&&2~', 1, 1)) X
FROM
table
WHERE
a >= To_Date('&&1','YYYYMMDD')
AND a < To_Date('&&1','YYYYMMDD')+1
AND (
b LIKE '..'
AND e IS NULL
AND (b NOT IN ('..','..','..'))
OR (b='..' AND c <> NULL)
)
)
)
WHERE
adjust1>0
Because you have the OR inline with the rest of your AND conditions with no parentheses, the second version isn't limiting the data checked to just the rows that fall in the date filter. For more info, see the Oracle documentation on Condition Precedence.

Interpolation between two values in a single query

I want to calculate a value by interpolating between the values of the two nearest neighbours.
I have a subquery that returns the values of the neighbours and their relative distances, as two columns with two rows.
Let's say:
(select ... as value, ... as distance
from [get some neighbours by distance] limit 2) as sub
How can I calculate the value of the point by linear interpolation? Is it possible to do that in a single query?
Example: My point has the neighbour A with value 10 at distance 1, and the neighbour B with value 20 at distance 4. The function should return the value (10 * 4 + 20 * 1) / 5 = 12 for my point.
I tried the obvious approach
select sum(value * (sum(distance)-distance)) / sum(distance)
which fails because you cannot nest aggregate functions. Using another subquery that returns the sum is not possible either, because then I cannot pass along the individual values at the same time.
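One way around the nested-aggregate error (not one of the answers in this thread) is a window function: SUM(distance) OVER () attaches the total distance to every row, so the outer SUM no longer has to contain another aggregate. The sketch below assumes Python's sqlite3 module with SQLite 3.25 or later instead of PostgreSQL, and a hypothetical table neighbours standing in for the question's subquery. The weighting (total - distance) / total only sums to 1 for exactly two neighbours.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL; "neighbours" is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE neighbours (value REAL, distance REAL)")
conn.executemany("INSERT INTO neighbours VALUES (?, ?)", [(10, 1), (20, 4)])

# The inner query carries each row's value and distance plus the window total;
# the outer query then needs only plain (non-nested) aggregates.
INTERP_SQL = """
    SELECT SUM(value * (total - distance)) / MAX(total) AS interpolated
    FROM (SELECT value, distance, SUM(distance) OVER () AS total
          FROM neighbours) AS sub
"""

(result,) = conn.execute(INTERP_SQL).fetchone()
print(result)  # 12.0
```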
This is an ugly hack (based on an abused CTE ;). The crux of it is that
value1 * distance2 + value2 * distance1
Can, by dividing by distance1*distance2, be rewritten to
value1/distance1 + value2/distance2
So the products (or divisions) can stay inside their rows. After the summation, multiplying by (distance1*distance2) and dividing by (distance1+distance2) rescales the result to the desired output. Generalisation to more than two neighbours is left as an exercise to the reader. YMMV.
DROP TABLE tmp.points;
CREATE TABLE tmp.points
( pname VARCHAR NOT NULL PRIMARY KEY
, distance INTEGER NOT NULL
, value INTEGER
);
INSERT INTO tmp.points(pname, distance, value) VALUES
( 'A' , 1, 10 )
, ( 'B' , 4, 20 )
, ( 'C' , 10 , 1)
, ( 'D' , 11 , 2)
;
WITH RECURSIVE twin AS (
select 1::INTEGER AS zrank
, p0.pname AS zname
, p0.distance AS dist
, p0.value AS val
, p0.distance* p0.value AS prod
, p0.value::float / p0.distance AS frac
FROM tmp.points p0
WHERE NOT EXISTS ( SELECT * FROM tmp.points px
WHERE px.distance < p0.distance)
UNION
select 1+twin.zrank AS zrank
, p1.pname AS zname
, p1.distance AS dist
, p1.value AS val
, p1.distance* p1.value AS prod
, p1.value::float / p1.distance AS frac
FROM tmp.points p1, twin
WHERE p1.distance > twin.dist
AND NOT EXISTS ( SELECT * FROM tmp.points px
WHERE px.distance > twin.dist
AND px.distance < p1.distance
)
)
-- SELECT * from twin ;
SELECT min(zname) AS name1, max(zname) AS name2
, MIN(dist) * max(dist) *SUM(frac) / SUM(dist) AS score
FROM twin
WHERE zrank <=2
;
The result:
CREATE TABLE
INSERT 0 4
name1 | name2 | score
-------+-------+-------
A | B | 12
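The rewriting behind the score expression, MIN(dist) * max(dist) * SUM(frac) / SUM(dist), can be checked numerically. This plain-Python sketch (function names are hypothetical) compares the direct two-neighbour formula with the per-row-fractions form on the question's example.

```python
def direct(v1, d1, v2, d2):
    # value1 * distance2 + value2 * distance1, normalised by the total distance
    return (v1 * d2 + v2 * d1) / (d1 + d2)


def via_fractions(v1, d1, v2, d2):
    # per-row terms value/distance are summed, then rescaled by d1*d2/(d1+d2),
    # mirroring MIN(dist) * max(dist) * SUM(frac) / SUM(dist) in the CTE query
    return d1 * d2 * (v1 / d1 + v2 / d2) / (d1 + d2)


print(direct(10, 1, 20, 4))         # 12.0
print(via_fractions(10, 1, 20, 4))  # 12.0
```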
Update: this one is a bit cleaner. Ties are still not handled (you would need a window function or a LIMIT 1 clause in the outer query for that).
WITH RECURSIVE twin AS (
select 1::INTEGER AS zrank
, p0.pname AS name1
, p0.pname AS name2
, p0.distance AS dist
FROM tmp.points p0
WHERE NOT EXISTS ( SELECT * FROM tmp.points px
WHERE px.distance < p0.distance)
UNION
select 1+twin.zrank AS zrank
, twin.name1 AS name1
, p1.pname AS name2
, p1.distance AS dist
FROM tmp.points p1, twin
WHERE p1.distance > twin.dist
AND NOT EXISTS ( SELECT * FROM tmp.points px
WHERE px.distance > twin.dist
AND px.distance < p1.distance
)
)
SELECT twin.name1, twin.name2
, (p1.distance * p2.value + p2.distance * p1.value) / (p1.distance+p2.distance) AS score
FROM twin
JOIN tmp.points p1 ON (p1.pname = twin.name1)
JOIN tmp.points p2 ON (p2.pname = twin.name2)
WHERE twin.zrank =2
;
If you actually want the point in between, there is a built-in way of doing that (but not an aggregate function):
SELECT center(box(x.mypoint,y.mypoint))
FROM ([get some neighbours by distance] order by value limit 1) x
,([get some neighbours by distance] order by value offset 1 limit 1) y;
If you want the mean distance:
SELECT avg(x.distance)
FROM ([get some neighbours by distance] order by value limit 2) as x
See the geometric functions and aggregate functions in the PostgreSQL manual.
Edit:
For the added example, the query could look like this:
SELECT (x.value * 4 + y.value) / 5 AS result
FROM ([get some neighbours by distance] order by value limit 1) x
,([get some neighbours by distance] order by value offset 1 limit 1) y;
I added missing () to get the result you expect!
Or, my last stab at it:
SELECT y.x, y.x[1], (y.x[1] * 4 + y.x[2]) / 5 AS result
FROM (
SELECT ARRAY(
SELECT value FROM tbl WHERE [some condition] ORDER BY value LIMIT 2
) x
) y
It would be so much easier, if you provided the full query and the table definitions.
