I have a hobby project which is about creating a tree to store identification numbers. I had used digit, stored at node, that is node can be 0 1 2 3 4 5 6 7 8 9.
After I have create tree, I want compose list from tree. But, I could not find a algorithm to manage my goal.
What I want :
"recompose tree" will return list of numbers. For below tree it should be
[ 2, 21, 243, 245, 246, 78, 789 ]
Root
/ \
2* 7
/ \ \
1* 4 8*
/ \ \ \
3* 5* 6* 9*
my data type : data ID x = ID ( ( x, Mark ), [ ID x ] )
data Mark = Marked | Unmarked
EDIT:
for convenience : * shows it is marked
I have stored digit as char, actually not 1,
it is stored as'1'
Do you have advice how I can do that ? ( advice is prefferred to be algorithm )
What about
recompose :: Num a => ID a -> [a]
recompose = go 0
where
go acc (ID ((n, Marked), ids)) =
let acc' = 10 * acc + n
in acc' : concatMap (go acc') ids
go acc (ID ((n, Unmarked), ids)) =
let acc' = 10 * acc + n
in concatMap (go acc') ids
?
That is, we traverse the tree while accumulating a value for the path from the root to a node. At every node we update the accumulator by multiplying the value for the path by 10 and adding the value for the node to it. The traversal produces a list of all accumulator values for marked node: so, at marked node we add the accumulator value to the list, for unmarked nodes we just propagate the list that we have collected for the children of the node.
How do we compute the list for the children of a node? We recursively call the traversal function (go) to all children by mapping it over the list of children. That gives us a list of lists that we concatenate to obtain a single list. (concatMap f xs is just concat (map f xs)) or concat . map f.)
In attribute-grammar terminology: the accumulator serves as an inherited attribute, while the returned list is a synthesised attribute.
As a refinement, we can introduce an auxiliary function isMarked,
isMarked :: Mark -> Bool
isMarked Marked = True
isMarked Unmarked = False
so that we can concisely write
recompose :: Num a => ID a -> [a]
recompose = go 0
where
go acc (ID ((n, m), ids)) =
let acc' = 10 * acc + n
prefix = if isMarked m then (acc' :) else id
in prefix (concatMap (go acc') ids)
BTW: this can even be done in sql:
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path='tmp';
CREATE TABLE the_tree
( id CHAR(1) NOT NULL PRIMARY KEY
, parent_id CHAR(1) REFERENCES the_tree(id)
);
INSERT INTO the_tree(id,parent_id) VALUES
( '#', NULL)
,( '2', '#')
,( '7', '#')
,( '1', '2')
,( '4', '2')
,( '3', '4')
,( '5', '4')
,( '6', '4')
,( '8', '7')
,( '9', '8')
;
WITH RECURSIVE sub AS (
SELECT id, parent_id, ''::text AS path
FROM the_tree t0
WHERE id = '#'
UNION
SELECT t1.id, t1.parent_id, sub.path || t1.id::text
FROM the_tree t1
JOIN sub ON sub.id = t1.parent_id
)
SELECT sub.id,sub.path
FROM sub
ORDER BY path
;
RESULT: (postgresql)
NOTICE: drop cascades to table tmp.the_tree
DROP SCHEMA
CREATE SCHEMA
SET
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "the_tree_pkey" for table "the_tree"
CREATE TABLE
INSERT 0 10
id | path
----+------
# |
2 | 2
1 | 21
4 | 24
3 | 243
5 | 245
6 | 246
7 | 7
8 | 78
9 | 789
(10 rows)
Related
I have a data-set in which there are duplicate IDs in the first column. I'm hoping to obtain a single row of data for each ID based on the second column's value. The data looks like so:
ID Info_Source Prior?
A 1 Y
A 3 N
A 2 Y
B 1 N
B 1 N
B 2 Y
C 2 N
C 3 Y
C 1 N
Specifically the criteria would call for prioritizing based on the second column's value (3 highest priority; then 1; and lastly 2): if the 'Info_Source' column has a value of 3, return that row; if there is no 3 in the second column for a given ID, look for a 1 and if found return that row; and finally if there is no 3 or 1 associated with the ID, search for 2 and return that row for the ID.
The desired results would be a single row for each ID, and the resulting data would be:
ID Info_Source Prior?
A 3 N
B 1 N
C 3 Y
row_number() over() usually solves these needs nicely and efficiently e.g.
select ID, Info_Source, Prior
from (
select ID, Info_Source, Prior
, row_number() over(partition by id order by Info_source DESC) as rn
)
where rn = 1
For prioritizing the second column's value (3 ; then 1, then 2) use a case expression to alter the raw value into an order that you need.
select ID, Info_Source, Prior
from (
select ID, Info_Source, Prior
, row_number() over(partition by id
order by case when Info_source = 3 then 3
when Infor_source = 1 then 2
else 1 end DESC) as rn
)
where rn = 1
I'm stumped on what seemed to be a simple UPDATE statement.
I'm looking for an UPDATE that uses two values. The first (a) is used to group, the second (b) is used to find a local minimum of values within the respective group. As a little extra there is a threshold value on b: Any value 1 or smaller shall remain as it is.
drop table t1;
create table t1 (a number, b number);
insert into t1 values (1,0);
insert into t1 values (1,1);
insert into t1 values (2,1);
insert into t1 values (2,2);
insert into t1 values (3,1);
insert into t1 values (3,2);
insert into t1 values (3,3);
insert into t1 values (4,1);
insert into t1 values (4,3);
insert into t1 values (4,4);
insert into t1 values (4,5);
-- 1,0 -> 1,0
-- 1,1 -> 1,1
-- 2,1 -> 2,1
-- 2,2 -> 2,2
-- 3,1 -> 3,1
-- 3,2 -> 3,2
-- 3,3 -> 3,2 <-
-- 4,1 -> 4,1
-- 4,3 -> 4,3 <-
-- 4,4 -> 4,3 <-
-- 4,5 -> 4,3 <-
Obviously not sufficient is:
update t1 x
set b = (select min(b) from t1 where b > 1)
;
Whatever more complicated stuff I try, e.g.
UPDATE t1 x
set (a,b) = (select distinct a,b from (
select a, min(b) from t1 where b > 1 group by a)
)
;
I get
SQL-Fehler: ORA-01427: Unterabfrage für eine Zeile liefert mehr als eine Zeile
01427. 00000 - "single-row subquery returns more than one row"
which is not overly surprising as I need a row for each value of a.
Of course I could write a PL/SQL Procedure with a cursor loop but is it possible in a single elegant SQL statement? Maybe using partition by?
Your question is a bit confusing.
You say that you would like to set value b to a minimum value from partition a that column b is in row with, while the rows containing b = 1 should remain untouched.
From what I can see in your question as comments (I assume it's your expected output) you also want to get the minimum value that follows 1 within a partition - so you basically want the minimum value of b that is greater than 1.
Below is SQL query that does this
UPDATE t1 alias
SET b = (
SELECT min(b)
FROM t1
WHERE alias.a = t1.a
AND t1.b > 1 -- this would get the minimum value higher than 1
GROUP BY a
)
WHERE alias.b > 1 -- update will not affect rows with b <= 1
Output after update
a | b
---+---
1 | 0
1 | 1
2 | 1
2 | 2
3 | 1
3 | 2
3 | 2
4 | 1
4 | 3
4 | 3
4 | 3
I had the following query:
SELECT nvl(sum(adjust1),0)
FROM (
SELECT
ManyOperationsOnFieldX adjust1,
a, b, c, d, e
FROM (
SELECT
a, b, c, d, e,
SubStr(balance, INSTR(balance, '[&&2~', 1, 1)) X
FROM
table
WHERE
a >= To_Date('&&1','YYYYMMDD')
AND a < To_Date('&&1','YYYYMMDD')+1
)
)
WHERE
b LIKE ...
AND e IS NULL
AND adjust1>0
AND (b NOT IN ('...','...','...'))
OR (b = '... AND c <> NULL)
I tried to change it to this:
SELECT nvl(sum(adjust1),0)
FROM (
SELECT
ManyOperationsOnFieldX adjust1
FROM (
SELECT
SubStr(balance, INSTR(balance, '[&&2~', 1, 1)) X
FROM
table
WHERE
a >= To_Date('&&1','YYYYMMDD')
AND a < To_Date('&&1','YYYYMMDD')+1
AND b LIKE '..'
AND e IS NULL
AND (b NOT IN ('..','..','..'))
OR (b='..' AND c <> NULL)
)
)
WHERE
adjust1>0
Mi intention was to have all the filtering in the innermost query, and only give to the outer ones the field X which is the one I have to operate a lot. However, the firts (original) query takes a couple of seconds to execute, while the second one won't even finish. I waited for almost 20 minutes and still I wouldn't get the answer.
Is there an obvious reason for this to happen that I might be overlooking?
These are the plans for each of them:
SELECT STATEMENT optimizer=all_rows (cost = 973 Card = 1 bytes = 288)
SORT (aggregate)
PARTITION RANGE (single) (cost=973 Card = 3 bytes = 864)
TABLE ACCESS (full) OF "table" #3 TABLE Optimizer = analyzed(cost=973 Card = 3 bytes=564)
SELECT STATEMENT optimizer=all_rows (cost = 750.354 Card = 1 bytes = 288)
SORT (aggregate)
PARTITION RANGE (ALL) (cost=759.354 Cart = 64.339 bytes = 18.529.632)
TABLE ACCESS (full) OF "table" #3 TABLE Optimizer = analyzed(cost=750.354 Card = 64.339 bytes=18.529.632)
Your two queries are not identical.
the logical operator AND is evaluated before the operator OR:
SQL> WITH data AS
2 (SELECT rownum id
3 FROM dual
4 CONNECT BY level <= 10)
5 SELECT *
6 FROM data
7 WHERE id = 2
8 AND id = 3
9 OR id = 5;
ID
----------
5
So your first query means: Give me the big SUM over this partition when the data is this way.
Your second query means: give me the big SUM over (this partition when the data is this way) or (when the data is this other way [no partition elimination hence big full scan])
Be careful when mixing the logical operators AND and OR. My advice would be to use brackets so as to avoid any confusion.
It is all about your OR... Try this:
SELECT nvl(sum(adjust1),0)
FROM (
SELECT
ManyOperationsOnFieldX adjust1
FROM (
SELECT
SubStr(balance, INSTR(balance, '[&&2~', 1, 1)) X
FROM
table
WHERE
a >= To_Date('&&1','YYYYMMDD')
AND a < To_Date('&&1','YYYYMMDD')+1
AND (
b LIKE '..'
AND e IS NULL
AND (b NOT IN ('..','..','..'))
OR (b='..' AND c <> NULL)
)
)
)
WHERE
adjust1>0
Because you have the OR inline with the rest of your AND statements with no parenthesis, the 2nd version isn't limiting the data checked to just the rows that fall in the date filter. For more info, see the documentation of Condition Precedence
I want to calculate a value by interpolating the value between two nearest neighbours.
I have a subquery that returns the values of the neighbours and their relative distance, in the form of two columns with two elements.
Let's say:
(select ... as value, ... as distance
from [get some neighbours by distance] limit 2) as sub
How can I calculate the value of the point by linear interpolation? Is it possible to do that in a single query?
Example: My point has the neighbour A with value 10 at distance 1, and the neighbour B with value 20 at distance 4. The function should return a value 10 * 4 + 20 * 1 / 5 = 12 for my point.
I tried the obvious approach
select sum(value * (sum(distance)-distance)) / sum(distance)
which will fail because you cannot work with group clauses inside group clauses. Using another subquery returning the sum is not possible either, because then I cannot forward the individual values at the same time.
This is an ugly hack (based on a abused CTE ;). The crux of it is that
value1 * distance2 + value2 * distance1
Can, by dividing by distance1*distance2, be rewritten to
value1/distance1 + value2/distance2
So, the products (or divisions) can stay inside their rows. After the summation, multiplying by (distance1*distance2) rescales the result to the desired output. Generalisation to more than two neighbors is left as an exercise to the reader.YMMV
DROP TABLE tmp.points;
CREATE TABLE tmp.points
( pname VARCHAR NOT NULL PRIMARY KEY
, distance INTEGER NOT NULL
, value INTEGER
);
INSERT INTO tmp.points(pname, distance, value) VALUES
( 'A' , 1, 10 )
, ( 'B' , 4, 20 )
, ( 'C' , 10 , 1)
, ( 'D' , 11 , 2)
;
WITH RECURSIVE twin AS (
select 1::INTEGER AS zrank
, p0.pname AS zname
, p0.distance AS dist
, p0.value AS val
, p0.distance* p0.value AS prod
, p0.value::float / p0.distance AS frac
FROM tmp.points p0
WHERE NOT EXISTS ( SELECT * FROM tmp.points px
WHERE px.distance < p0.distance)
UNION
select 1+twin.zrank AS zrank
, p1.pname AS zname
, p1.distance AS dist
, p1.value AS val
, p1.distance* p1.value AS prod
, p1.value::float / p1.distance AS frac
FROM tmp.points p1, twin
WHERE p1.distance > twin.dist
AND NOT EXISTS ( SELECT * FROM tmp.points px
WHERE px.distance > twin.dist
AND px.distance < p1.distance
)
)
-- SELECT * from twin ;
SELECT min(zname) AS name1, max(zname) AS name2
, MIN(dist) * max(dist) *SUM(frac) / SUM(dist) AS score
FROM twin
WHERE zrank <=2
;
The result:
CREATE TABLE
INSERT 0 4
name1 | name2 | score
-------+-------+-------
A | B | 12
Update: this one is a bit cleaner ... ties are still not handled (need a window function or a LIMIT 1 clause in the outer query for that)
WITH RECURSIVE twin AS (
select 1::INTEGER AS zrank
, p0.pname AS name1
, p0.pname AS name2
, p0.distance AS dist
FROM tmp.points p0
WHERE NOT EXISTS ( SELECT * FROM tmp.points px
WHERE px.distance < p0.distance)
UNION
select 1+twin.zrank AS zrank
, twin.name1 AS name1
, p1.pname AS name2
, p1.distance AS dist
FROM tmp.points p1, twin
WHERE p1.distance > twin.dist
AND NOT EXISTS ( SELECT * FROM tmp.points px
WHERE px.distance > twin.dist
AND px.distance < p1.distance
)
)
SELECT twin.name1, twin.name2
, (p1.distance * p2.value + p2.distance * p1.value) / (p1.distance+p2.distance) AS score
FROM twin
JOIN tmp.points p1 ON (p1.pname = twin.name1)
JOIN tmp.points p2 ON (p2.pname = twin.name2)
WHERE twin.zrank =2
;
If you actually want the point in between, there is a built-in way of doing that (but not an aggregate function):
SELECT center(box(x.mypoint,y.mypoint))
FROM ([get some neighbours by distance] order by value limit 1) x
,([get some neighbours by distance] order by value offset 1 limit 1) y;
If you want the mean distance:
SELECT avg(x.distance)
FROM ([get some neighbours by distance] order by value limit 2) as x
See geometrical function and aggregate functions in the manual.
Edit:
For the added example, the query could look like this:
SELECT (x.value * 4 + y.value) / 5 AS result
FROM ([get some neighbours by distance] order by value limit 1) x
,([get some neighbours by distance] order by value offset 1 limit 1) y;
I added missing () to get the result you expect!
Or, my last stab at it:
SELECT y.x, y.x[1], (y.x[1] * 4 + y.x[2]) / 5 AS result
FROM (
SELECT ARRAY(
SELECT value FROM tbl WHERE [some condition] ORDER BY value LIMIT 2
) x
) y
It would be so much easier, if you provided the full query and the table definitions.
I have a table with these columns
create table demo (
ID integer not null,
INDEX integer not null,
DATA varchar2(10)
)
Example data:
ID INDEX DATA
1 1 A
1 3 B
2 1 C
2 1 D
2 5 E
I want:
ID INDEX DATA
1 1 A
1 2 B -- 3 -> 2
2 1 C or D -- 1 -> 1 or 2
2 2 D or C -- 1 -> 2 or 1
2 3 E -- 5 -> 3 (closing the gap)
Is there a way to renumber the INDEX per ID using a single SQL update? Note that the order of items should be preserved (i.e. the item "E" should still be after the items "C" and "D" but the order of "C" and "D" doesn't really matter.
The goal is to be able to create a primary key over ID and INDEX.
If that matters, I'm on Oracle 10g.
MERGE
INTO demo d
USING (
SELECT rowid AS rid, ROW_NUMBER() OVER (PARTITION BY id ORDER BY index) AS rn
FROM demo
) d2
ON (d.rowid = d2.rid)
WHEN MATCHED THEN
UPDATE
SET index = rn