Search - find closest node to n different start points (uniform cost) - algorithm

Suppose I had a graph where the cost of travelling between each node is uniform. I'm trying to find the closest node that 2 or more start nodes can travel to, where "closest" is measured as the cumulative cost of reaching the common node from all start points.
If I wanted to find the closest common node to node A and B, that node would be E.
A -> E (2 cost)
B -> E (1 cost)
If I wanted to find the closest common node to node A, B, C, that node would be F.
A -> F (3 cost)
B -> F (2 cost)
C -> F (1 cost)
And if I wanted to find the closest common node between nodes G and E, no such node exists.
So there should be two outputs: either the closest node, or an error message stating that no common node can be reached.
I would appreciate an algorithm that can achieve this. A link to an article, pseudocode, or any language is fine; below is some Python code that represents the graph above in a defaultdict(list) object.
from enum import Enum
from collections import defaultdict

class Type(Enum):
    A = 1
    B = 2
    C = 3
    D = 4
    E = 5
    F = 6
    G = 7

paths = defaultdict(list)
paths[Type.A].append(Type.D)
paths[Type.D].append(Type.G)
paths[Type.D].append(Type.E)
paths[Type.B].append(Type.E)
paths[Type.E].append(Type.F)
paths[Type.C].append(Type.F)
Thanks in advance.

Thanks to @VincentvanderWeele for the suggestion:
Example cost of all nodes from A, B:

     A  B  C  D  E  F  G
   ---------------------
 A   0  X  X  1  2  3  2
 B   X  0  X  X  1  2  X

As an optimisation, when working out the 2nd+ node you can skip any nodes that the previous nodes cannot travel to, e.g.

     A  B  C  D  E  F  G
   ---------------------
 A   0  X  X  1  2  3  2
 B   X  X  X  X  1  2  X
       ^

Possible closest nodes:
E = 2 + 1 = 3
F = 3 + 2 = 5
Result is E since it has the lowest cost.
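The approach above can be sketched in Python: run a uniform-cost search (plain BFS, since every step costs 1) from each start node, keep only the nodes reachable from all of them, and pick the one with the smallest summed cost. For self-containment this sketch uses a string-keyed dict mirroring the question's graph; the function names are illustrative.

```python
from collections import deque

# Mirrors the graph from the question, with plain strings for readability.
graph = {"A": ["D"], "B": ["E"], "C": ["F"],
         "D": ["G", "E"], "E": ["F"], "F": [], "G": []}

def bfs_costs(graph, start):
    """Uniform-cost distances from start to every node it can reach."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def closest_common_node(graph, starts):
    """Node reachable from every start with the lowest summed cost, or None."""
    tables = [bfs_costs(graph, s) for s in starts]
    common = set(tables[0]).intersection(*tables[1:])
    if not common:
        return None  # the error case: no common reachable node
    return min(common, key=lambda n: sum(t[n] for t in tables))

print(closest_common_node(graph, ["A", "B"]))       # E
print(closest_common_node(graph, ["A", "B", "C"]))  # F
print(closest_common_node(graph, ["G", "E"]))       # None
```

One BFS per start node keeps the work at O(starts × (nodes + edges)), matching the per-row tables in the update above.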


Least Frequently Used (LFU) cache tracing

I'm wondering if I've answered this question right:
The page references are in this sequence: ABCBADACEBEFBEFBA
With LFU page replacement, how many page faults would occur?
SLOT  A  B  C  B  A  D  A  C  E  B  E  F  B  E  F  B  A
--------------------------------------------------------
  1   A           x     x                             x
  2      B     x                 x        x        x
  3         C        D     C  E     x  F     E  F
From the tracing I've done, I've come to the conclusion that there are 9 page faults. I count the frequency of each page each time it is used, and reset it to 0 whenever the page is removed from its slot (swapped out). Is this the right way to do this?
SLOT  A  B  C  B  A  D  A  C  E  B  E  F  B  E  F  B  A
--------------------------------------------------------
  1   A           x     x                             x
  2      B     x           C     B     F  B     F  B
  3         C        D        E     x        x
The solution I've been given is the second table, which gives 11 page faults. However, I can't understand why the second C would replace B in slot 2, when the frequency of B is 2 and the frequency of D in slot 3 is only 1.
You should go back to the definition of LFU that you were given in class. It seems that you interpret it as
evict the entry with the least number of hits since it was populated,
in which case your answer (first table) is indeed correct.
However, it seems that the LFU policy used in the expected answer (second table) is
evict the entry X with the smallest ratio freq(X) = (number of hits) / (age of X).
In such a case, at the 2nd C, you have
freq(A) = 3/7 = 0.429
freq(B) = 2/6 = 0.333
freq(D) = 1/2 = 0.500
and the entry with the least frequency is, indeed, B.
I'd expect LFU to implement the 2nd strategy, because once you have entries with different ages in your cache, you have to account for their having had more or less time to accumulate statistics. Your approach would give correct frequencies only in the limit where entries are never evicted, which is not a practically interesting case.
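The first interpretation (counts discarded on eviction) can be verified with a short simulation; `lfu_faults` is an illustrative name, not a standard API:

```python
def lfu_faults(refs, capacity=3):
    """LFU where a page's count is discarded when it is evicted
    (the first interpretation). Returns the number of page faults."""
    freq = {}  # page -> hits since it was (re)loaded
    faults = 0
    for page in refs:
        if page in freq:
            freq[page] += 1            # hit: bump the count
            continue
        faults += 1                    # miss
        if len(freq) == capacity:      # cache full: evict least-frequent page
            del freq[min(freq, key=freq.get)]
        freq[page] = 1                 # loading counts as the first use
    return faults

print(lfu_faults("ABCBADACEBEFBEFBA"))  # 9, matching the first table
```

In this particular trace no eviction ties occur, so `min`'s arbitrary tie-breaking does not affect the result.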

Dijkstra's Algorithm with Negative Weights Query

In this scenario, the aim is to find a path with the smallest total weight. There are 5 sections, each section having different nodes. Nodes are only connected to nodes in adjacent sections, and a path must consist of exactly one node from each section.
For example, let:
section 1 has nodes [1, 2, 3].
section 2 has nodes [4, 5].
section 3 has nodes [6].
section 4 has nodes [7, 8, 9, 10, 11].
section 5 has nodes [12, 13, 14].
A valid path through the sections is [1, 4, 6, 7, 12] and also [1, 5, 6, 11, 14], etc.
All nodes have negative weights, but negative cycles are impossible (due to the one-node-per-section policy). Therefore, does adding a constant to each node resolve the issue of negative weights? If it does, are there any papers which show this? I know there are other algorithms that handle negative weights, but I'm interested in Dijkstra's algorithm. Thanks.
No, you can't do this. Let's have a look at a counterexample. Suppose we have a graph with nodes A, B, C and edges:
A - B -2 (negative)
A - C 6
B - C 7
we are looking for the shortest path from A to C. In the original graph we have
A - B - C => -2 + 7 = 5 (the shortest path, 5 < 6)
A - C => 6
The best choice is A - B - C. Now, let's get rid of negative edges by adding 2 to every edge. We now have:
A - B 0
A - C 8
B - C 9
A - B - C => 0 + 9 = 9
A - C => 8 (the shortest path, 8 < 9)
Please note that the shortest path is now A - C. Alas! By adding a constant value to each edge we have changed the problem itself, and it no longer matters which algorithm we use.
Edit: a counterexample with all edges negative (using arcs, i.e. directed edges, to prevent negative cycles):
A -> B -6
B -> C -1
A -> C -5
Before adding 6 we have
A -> B -> C = -6 - 1 = -7 (the shortest path)
A -> C = -5
After adding 6 we get
A -> B 0
B -> C 5
A -> C 1
A -> B -> C = 0 + 5 = 5
A -> C = 1 (the shortest path)
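The second counterexample is small enough to check mechanically. This sketch (names assumed) compares the two candidate paths before and after shifting every edge by 6:

```python
# Directed edges of the second counterexample.
edges = {("A", "B"): -6, ("B", "C"): -1, ("A", "C"): -5}
candidates = [["A", "B", "C"], ["A", "C"]]  # the only A-to-C paths here

def cost(path, shift=0):
    """Total weight of a path after adding `shift` to every edge."""
    return sum(edges[u, v] + shift for u, v in zip(path, path[1:]))

print(min(candidates, key=cost))                  # ['A', 'B', 'C'] (cost -7)
print(min(candidates, key=lambda p: cost(p, 6)))  # ['A', 'C'] (cost 1)
```

The two-edge path absorbs the shift twice while the direct edge absorbs it once, which is exactly why the transformation changes which path is shortest.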

Grouping connected pairs of values

I have a list containing unique pairs of values x and y; for example:
x y
-- --
1 A
2 A
3 A
4 B
5 A
5 C
6 D
7 D
8 C
8 E
9 B
9 F
10 C
10 G
I want to divide this list of pairs as follows:
Group 1
1 A
2 A
3 A
5 A
5 C
8 C
10 C
8 E
10 G
Group 2
4 B
9 B
9 F
Group 3
6 D
7 D
Group 1 contains
all pairs where y = 'A' (1-A, 2-A, 3-A, 5-A)
any additional pairs where x = any of the x's above (5-C)
any additional pairs where y = any of the y's above (8-C, 10-C)
any additional pairs where x = any of the x's above (8-E, 10-G)
The pairs in Group 2 can't be reached in such a manner from any pairs in Group 1, nor can the pairs in Group 3 be reached from either Group 1 or Group 2.
As suggested in Group 1, the chain of connections can be arbitrarily long.
I'm exploring solutions using Perl, but any sort of algorithm, including pseudocode, would be fine. For simplicity, assume that all of the data can fit in data structures in memory.
[UPDATE] Because I need to apply this approach to 5.3 billion pairs, scalability is important to me.
Pick a starting point. Find all points reachable from that, removing from the master list. Repeat for all added points, until no more can be reached. Move to the next group, starting with another remaining point. Continue until you have no more remaining points.
pool = [(1 A), (2 A), (3 A), (4 B), ... (10 G)]
group_list = []

while pool is not empty
    group = [ pool[0] ]            // start a group with the next available point,
    remove pool[0] from pool       // taking it out of the pool
    pos = -1
    while pos + 1 < size(group)    // while there are new points in the group
        pos += 1
        group_point = group[pos]   // grab the next unprocessed point
        for point in pool          // find all remaining points reachable from it
            if point and group_point have a coordinate in common
                remove point from pool
                add point to group
    // we've reached closure with that starting point
    add group to group_list

return group_list
You can think of the letters and numbers as nodes of a graph, and the pairs as edges. Divide this graph into connected components in linear time.
The connected component with 'A' forms group 1. The other connected components form the other groups.
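The connected-components view can be sketched with a union-find (disjoint-set) structure, whose near-constant-time operations also matter at the billions-of-pairs scale mentioned in the update. The ("x", …) and ("y", …) tags are just a device to keep the two value namespaces from colliding:

```python
from collections import defaultdict

pairs = [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "A"), (5, "C"),
         (6, "D"), (7, "D"), (8, "C"), (8, "E"), (9, "B"), (9, "F"),
         (10, "C"), (10, "G")]

parent = {}

def find(n):
    """Root of n's set, with path halving to keep trees shallow."""
    parent.setdefault(n, n)
    while parent[n] != n:
        parent[n] = parent[parent[n]]
        n = parent[n]
    return n

def union(a, b):
    parent[find(a)] = find(b)

# Each pair is an edge between its x-node and its y-node.
for x, y in pairs:
    union(("x", x), ("y", y))

# Bucket the pairs by the root of their component.
groups = defaultdict(list)
for x, y in pairs:
    groups[find(("x", x))].append((x, y))

for members in groups.values():
    print(members)
```

This prints three groups matching Group 1, Group 2, and Group 3 above, with pairs in input order within each group.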

Finding the largest power of a number that divides a factorial in Haskell

So I am writing a Haskell program to calculate the largest power of a number that divides a factorial.
largestPower :: Int -> Int -> Int
Here largestPower a b finds the largest power of b that divides a!.
Now I understand the math behind it: the way to find the answer is to repeatedly divide a (just a, not the factorial) by b, ignore the remainder, and finally add up all the quotients. So for
largestPower 10 2
we should get 8, because 10 div 2 = 5, 5 div 2 = 2, 2 div 2 = 1, and 5 + 2 + 1 = 8.
However, I am unable to figure out how to implement this as a function: do I use arrays or just a simple recursive function?
I am gravitating towards a normal recursive function, though I guess it could be done by storing the quotients in an array and adding them up.
Recursion without an accumulator
You can simply write a recursive algorithm and sum up the result of each call. Here we have two cases:
a is less than b, in which case the largest power is 0. So:

largestPower a b | a < b = 0

a is greater than or equal to b, in which case we divide a by b, calculate largestPower for that quotient, and add the quotient to the result. Like:

                 | otherwise = d + largestPower d b
    where d = div a b

Or, putting it together:

largestPower a b | a < b     = 0
                 | otherwise = d + largestPower d b
    where d = div a b
Recursion with an accumulator
You can also use recursion with an accumulator: a variable you pass through the recursion and update accordingly. At the end, you return that accumulator (or a function called on it).
Here the accumulator is of course the running sum of quotients, so:

largestPower = largestPower' 0

So we will define a function largestPower' (mind the prime) with an accumulator as its first argument, initialized to 0.
Now in the recursion, there are two cases:
a is less than b, in which case we simply return the accumulator:

largestPower' r a b | a < b = r

otherwise we add the quotient to the accumulator, and pass the quotient to largestPower' in a recursive call:

                    | otherwise = largestPower' (d + r) d b
    where d = div a b

Or the full version:

largestPower = largestPower' 0

largestPower' r a b | a < b     = r
                    | otherwise = largestPower' (d + r) d b
    where d = div a b
Naive correct algorithm
The algorithm above is not correct in general. A "naive" algorithm would be to count the powers of b in every factor of the factorial, decrementing from a down to 1, like:

largestPower 1 _ = 0
largestPower a b = sumPower a + largestPower (a - 1) b
    where sumPower n | n `mod` b == 0 = 1 + sumPower (div n b)
                     | otherwise      = 0
So this means that largestPower 4 2 can be written as:

largestPower 4 2 = sumPower 4 + sumPower 3 + sumPower 2

and:

sumPower 4 = 1 + sumPower 2
           = 1 + 1 + sumPower 1
           = 1 + 1 + 0
           = 2

sumPower 3 = 0

sumPower 2 = 1 + sumPower 1
           = 1 + 0
           = 1

So the total is 3.
The algorithm as stated can be implemented quite simply:
largestPower :: Int -> Int -> Int
largestPower 0 b = 0
largestPower a b = d + largestPower d b where d = a `div` b
However, the algorithm is not correct for composite b. For example, largestPower 10 6 with this algorithm yields 1, but in fact the correct answer is 4. The problem is that this algorithm ignores multiples of 2 and 3 that are not multiples of 6. How you fix the algorithm is a completely separate question, though.
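Both claims are quick to check numerically. This Python sketch (function names assumed, Python used here for the sanity check) compares the quotient-summing algorithm against a direct brute-force count:

```python
from math import factorial

def largest_power(a, b):
    """The quotient-summing algorithm: repeatedly divide a by b and sum."""
    total = 0
    while a > 0:
        a //= b
        total += a
    return total

def brute_force(a, b):
    """Largest k such that b**k divides a!, checked directly."""
    f, k = factorial(a), 0
    while f % b ** (k + 1) == 0:
        k += 1
    return k

print(largest_power(10, 2), brute_force(10, 2))  # 8 8 -> agree for prime b
print(largest_power(10, 6), brute_force(10, 6))  # 1 4 -> disagree for b = 6
```

The disagreement for b = 6 is exactly the composite-base problem described above: 10! contains factors 2^8 * 3^4, enough for 6^4, but the quotient sum only sees explicit multiples of 6.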

Determine distance between two random nodes in a tree

Given a general tree, I want the distance between two nodes v and w.
Wikipedia states the following:
Computation of lowest common ancestors may be useful, for instance, as part of a procedure for determining the distance between pairs of nodes in a tree: the distance from v to w can be computed as the distance from the root to v, plus the distance from the root to w, minus twice the distance from the root to their lowest common ancestor.
Let's say d(x) denotes the distance of node x from the root, which is node 1. d(x,y) denotes the distance between two vertices x and y. lca(x,y) denotes the lowest common ancestor of the vertex pair x and y.
Thus if we have 4 and 8, lca(4,8) = 2 therefore, according to the description above, d(4,8) = d(4) + d(8) - 2 * d(lca(4,8)) = 2 + 3 - 2 * 1 = 3. Great, that worked!
However, the formula above seems to fail for the vertex pair (8,3), where lca(8,3) = 2: d(8,3) = d(8) + d(3) - 2 * d(2) = 3 + 1 - 2 * 1 = 2. This is incorrect, however: the distance d(8,3) = 4, as can be seen on the graph. The algorithm seems to fail for anything that crosses over the defined root.
What am I missing?
You missed that lca(8,3) = 1, not 2. Hence d(1) == 0, which makes it:
d(8,3) = d(8) + d(3) - 2 * d(1) = 3 + 1 - 2 * 0 = 4
For the appropriate 2 node, namely the one on the right, d(lca(8,2)) == 0, not 1 as you have it in your derivation. The distance from the root (which is the lca in this case) to itself is zero. So
d(8,2) = d(8) + d(2) - 2 * d(lca(8,2)) = 3 + 1 - 2 * 0 = 4
The fact that you have two nodes labeled 2 is probably confusing things.
Edit: The post has been edited so that a node originally labeled 2 is now labeled 3. In this case, the derivation is now correct but the statement
the distance d(8,2) = 4 as can be seen on the graph
is incorrect, d(8,2) = 2.
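The corrected arithmetic can be checked with a short Python sketch. The parent map below is a hypothetical tree consistent with the depths used in the question (node 5 is an assumed intermediate node, since the original figure is not reproduced here): node 1 is the root, node 8 sits three levels down, and lca(4,8) = 2.

```python
# Hypothetical tree matching the example's depths; node 5 is assumed.
# Node 1 is the root; children: 1 -> {2, 3}, 2 -> {4, 5}, 5 -> {8}.
parent = {2: 1, 3: 1, 4: 2, 5: 2, 8: 5}

def depth(v):
    """Distance from v to the root (the root itself has depth 0)."""
    d = 0
    while v in parent:
        v = parent[v]
        d += 1
    return d

def lca(v, w):
    """Lowest common ancestor via the naive ancestor-set walk."""
    ancestors = {v}
    while v in parent:
        v = parent[v]
        ancestors.add(v)
    while w not in ancestors:
        w = parent[w]
    return w

def distance(v, w):
    return depth(v) + depth(w) - 2 * depth(lca(v, w))

print(distance(4, 8))  # 3, as in the question
print(distance(8, 3))  # 4, once d(lca(8,3)) = d(1) = 0 is used
```

Note that distance(8, 3) comes out as 4 precisely because the lca is the root, whose depth is 0.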
