Find the maximal k such that there exists a point covered by k intervals - algorithm

Suppose that n closed intervals [a[i], b[i]] on the real line are given (i = 1..n). Find the maximal k such that there exists a point covered by k intervals (the maximal number of “layers”). The number of operations should be of order n log n.
A hint is provided:
[Hint. Sort all the left and right endpoints of the intervals together. While sorting, assume that the left endpoint precedes the right endpoint located at the same point of the real line. Then move from left to right counting the number of layers. When we cross the left endpoint, increase the number of layers by 1; when we cross the right endpoint, decrease the number of layers by 1. Please note that two adjacent intervals are processed correctly; that is, the left endpoint precedes the right endpoint according to our convention.]
My question is: when moving through the sorted list, how do I know whether the point encountered is a left endpoint or a right endpoint? Do I need extra space to keep track of that?

The question itself contains the steps of the expected algorithm, almost as pseudo-code.
To answer your question: you don't need a separate record; tag each endpoint with its kind when building the list to be sorted. I turned the description into a Python 3 program as an exercise:
def prefix_sum(seq, acc=0):
    for i in seq:
        acc += i
        yield acc

def count_layers(intervals):
    # Tag left endpoints with -1 and right endpoints with +1: at equal
    # coordinates the left endpoint then sorts first, as the hint requires.
    endpoints = sorted([(s, -1) for s, e in intervals] + [(e, +1) for s, e in intervals])
    # The layer counts appear as negated prefix sums, hence -min.
    return -min(prefix_sum(delta for _, delta in endpoints))

print(count_layers([[2, 3], [1, 2]]))  # 2
Tested with:

def test(intervals):
    print()
    print('Test')
    for s, e in intervals:
        print(' ' * s + '-' * (e - s + 1))
    print('Answer:', count_layers(intervals))

TEST_CASES = [
    [[1, 5], [4, 9], [2, 4], [6, 12]],
    [[1, 3], [3, 5], [7, 9], [5, 7]],
    [[3, 4], [1, 2]],
    [[2, 3], [1, 2]],
]

for test_case in TEST_CASES:
    test(test_case)


How to find all possible unique paths in a grid?

I have a 3 x 3 grid with randomly placed obstacles in which there is a random starting point but no endpoint. The endpoint is created when there are no more cells to occupy. Movement can occur up, down, left or right.
How can I see all possible unique paths within the grid?
Example:
Once a cell is used when looking for a path, it cannot be used again (its 1 becomes a 0).
If there are no more neighbouring cells to move into, the path has ended, regardless of whether all the cells have been visited.
# bottom left is (0, 0) and top right is (2, 2)...
# start is (1, 1) and obstacle is (2, 1)
[1] [1] [1]
[1] [S] [0]
[1] [1] [1]
S = starting point
0 = obstacle
1 = open cell
With the above example there would be 6 unique paths.
path_1 = (1, 2), (2, 2) # up, right
path_2 = (1, 0), (2, 0) # down, right
path_3 = (0, 1), (0, 2), (1, 2), (2, 2) # left, up, right, right
path_4 = (0, 1), (0, 0), (1, 0), (2, 0) # left, down, right, right
path_5 = (1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0) # up, left, down, down, right, right
path_6 = (1, 0), (0, 0), (0, 1), (0, 2), (1, 2), (2, 2) # down, left, up, up, right, right
To get all the paths, you can use DFS or BFS, but each path needs its own visited set so that you:
do not go back to the same coordinate twice in a single path
still allow different paths to pass through the same coordinate
I wrote a DFS implementation for a grid here and the solution will rely on this example.
Solution
To do graph search one would need to define the states, which are the coordinates in this case, but for this problem we will keep track of two extra parameters, for convenience:
The path taken will be documented via crack code (right=0, down=1, left=2, up=3), which is a form of chain code
The visited set for each path will be documented, for the reasons noted above
The implementation in Python is as follows (in my case the top left matches coordinate (0, 0) and the lower right matches (n-1, n-1) for an n×n grid):
import collections

def find_paths(grid, start_coord):
    paths = set()                        # paths will be added here
    state_queue = collections.deque([])  # pending states which have not been explored yet
    # A state is a list consisting of:
    # 1. current coordinate
    # 2. crack-code path
    # 3. set of visited coordinates in that path
    state = [start_coord, [], {start_coord}]  # starting state
    while True:
        # Getting all possible neighboring states
        # Crack code (right=0, down=1, left=2, up=3)
        state_right = [(state[0][0], state[0][1]+1), state[1] + [0], state[2].copy()] if state[0][1]+1 < len(grid[state[0][0]]) else None
        state_down = [(state[0][0]+1, state[0][1]), state[1] + [1], state[2].copy()] if state[0][0]+1 < len(grid) else None
        state_left = [(state[0][0], state[0][1]-1), state[1] + [2], state[2].copy()] if state[0][1]-1 >= 0 else None
        state_up = [(state[0][0]-1, state[0][1]), state[1] + [3], state[2].copy()] if state[0][0]-1 >= 0 else None
        # Adding to the queue all the unvisited states, and marking them visited
        # in their own path to avoid returning to them
        blocked_counter = 0
        for next_state in [state_right, state_down, state_left, state_up]:
            if next_state is None:
                blocked_counter += 1
            elif next_state[0] in state[2] or grid[next_state[0][0]][next_state[0][1]] == 0:
                blocked_counter += 1
            else:
                next_state[2].add(next_state[0])
                state_queue.append(next_state)
        # After checking all directions, if we reached a 'dead end',
        # add this path to the path set
        if blocked_counter == 4:
            paths.add(tuple(state[1]))
        # Pop the next state from the queue, ending when it is exhausted
        try:
            state = state_queue.pop()
        except IndexError:
            break
    return paths
Explanation
1. At the beginning we create the initial state, as well as the initial path, which is an empty list, and the visited set, containing only the start coordinate.
2. For each state we are in, we do the following:
2.1. create the four neighboring states (advancing right, down, left or up)
2.2. check whether we can advance in each direction, by testing whether the path we are on has already visited that coordinate and whether the coordinate is valid
2.3. if we can advance in the said direction, add this next_state to the state_queue, as well as to the visited set of the path
2.4. if we cannot advance in any of the four directions, we have reached a 'dead end' and we add the path we are on to the paths set
2.5. pop the next state from state_queue (which is kind of a bad name, since it is used as a stack: we append and pop from the same side of the deque)
3. When the state_queue is empty, we have finished the search and can return all the found paths.
When running this with the given example, i.e.

start_coord = (1, 1)
grid = [[1, 1, 1],
        [1, 1, 0],
        [1, 1, 1]]
We get:
find_paths(grid, start_coord)
# {(2, 3, 0, 0), (3, 2, 1, 1, 0, 0), (3, 0), (2, 1, 0, 0), (1, 0), (1, 2, 3, 3, 0, 0)}
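The returned paths are in crack-code form; if you prefer coordinate paths, a small helper (illustrative, not part of the original answer) can decode them, given the start coordinate:

def decode(start, crack_path):
    # Crack code: right=0, down=1, left=2, up=3, in (row, col) coordinates
    deltas = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    r, c = start
    coords = []
    for move in crack_path:
        dr, dc = deltas[move]
        r, c = r + dr, c + dc
        coords.append((r, c))
    return coords

print(decode((1, 1), (2, 3, 0, 0)))  # [(1, 0), (0, 0), (0, 1), (0, 2)]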
This is actually not as difficult as it seems. Since it seems to me that you are learning, I avoid giving you code and will focus on the ideas.
The solution
Since you need to find all solutions, this is a classical backtracking problem. The idea of backtracking is to try out every possibility and store all the solutions. Particularities about your problem:
grid
movement
obstacles
unique solutions
How to check everything?
You need to loop over your grid, and for each point, starting from a depth of 0, repeat the following (a minimal sketch follows the list):
if depth < available points, map the possible moves (no obstacles, not visited) (1)
for each such point:
increase depth by 1
mark the current point as visited
check whether there is a solution and handle it if so
with the current status, jump to (1)
unmark the current point
decrease depth by 1
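Such a sketch could look like this in Python (all names are illustrative; it assumes a grid of 0/1 rows with 0 marking an obstacle, and records each finished path as a tuple of coordinates):

def all_paths(grid, start):
    rows, cols = len(grid), len(grid[0])
    solutions = set()  # unique path signatures

    def backtrack(r, c, visited, path):
        # (1) map the possible moves: in bounds, open, not yet visited
        moves = [(r + dr, c + dc)
                 for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        moves = [(nr, nc) for nr, nc in moves
                 if 0 <= nr < rows and 0 <= nc < cols
                 and grid[nr][nc] == 1 and (nr, nc) not in visited]
        if not moves:                  # dead end: this path is a solution
            solutions.add(tuple(path))
            return
        for nr, nc in moves:
            visited.add((nr, nc))      # mark
            path.append((nr, nc))
            backtrack(nr, nc, visited, path)
            path.pop()                 # unmark and try the next direction
            visited.remove((nr, nc))

    backtrack(start[0], start[1], {start}, [])
    return solutions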
Handling the solution
You need a unique, predictable signature for each solution, such as the points you have moved to, in order, and you store all solutions found so far. Before you enter a new solution, check whether it is already among the stored ones and only append it if it is not.
Pruning
You can prune your search based on earlier findings if the problem space is big enough to make this a worthwhile performance optimization, but you should only consider that once you already have a working solution.
I was looking to solve this with dynamic programming to avoid the exponential time complexity, but failed to do so. However, here is a dense version in JavaScript which uses DFS (which, for this problem, is the same as brute force, since we are interested in all possible paths) and recursion in a functional style with ES6 features. We also use a single-dimension array to represent the board, with the directions method accounting for which positions are reachable from a given position. The single-dimension array contains the rows of the board laid out sequentially from top to bottom.
[ 0 1 2 ]
[ 3 4 5 ] => [0, 1, 2, 3, 4, 5, 6, 7, 8]
[ 6 7 8 ]
The algorithm works for any m by n grid and so we must specify the width as an input parameter.
const directions = (pos, board, width) => {
  const [n, s, w, e] = [pos - width, pos + width, pos - 1, pos + 1]
  return [
    n >= 0 && board[n] ? n : null,
    s < board.length && board[s] ? s : null,
    pos % width !== 0 && board[w] ? w : null,
    e % width !== 0 && board[e] ? e : null
  ].filter(pos => pos !== null)
}

const solve = (pos, board, width, path) => {
  const next = directions(pos, board, width)
  return next.length === 0
    ? [[...path, pos]] // no way forward: the path is complete
    : next
        .map(nextPos => solve(nextPos, [...board.slice(0, pos), 0, ...board.slice(pos + 1)], width, [...path, pos]))
        .flat()
}

const oneDim2TwoDim = (oneDimCoord, width) =>
  [Math.floor(oneDimCoord / width), oneDimCoord % width]

const res = solve(4, [1, 1, 1, 1, 1, 0, 1, 1, 1], 3, [])
console.log(res.map(path => path.map(step => oneDim2TwoDim(step, 3))))
At each step, we check which directions are possible to go in. If no direction is available, we return the current path, otherwise we make a recursive call in each of the directions.
Use it like
solve(4, [1, 1, 1, 1, 1, 0, 1, 1, 1], 3, [])
/*
[
[ 4, 1, 0, 3, 6, 7, 8],
[ 4, 1, 2 ],
[ 4, 7, 6, 3, 0, 1, 2],
[ 4, 7, 8 ],
[ 4, 3, 0, 1, 2 ],
[ 4, 3, 6, 7, 8 ]
]
*/
Each path starts with the starting position and you can easily convert it back to (x, y) coordinate style with
const oneDim2TwoDim = (oneDimCoord, width) =>
[Math.floor(oneDimCoord / width), oneDimCoord % width]
const res = solve(4, [1, 1, 1, 1, 1, 0, 1, 1, 1], 3, [])
.map(path => path.map(step => oneDim2TwoDim(step, 3)))
/*
[
[ [ 1, 1 ], [ 0, 1 ], [ 0, 0 ], [ 1, 0 ], [ 2, 0 ], [ 2, 1 ], [ 2, 2 ] ],
[ [ 1, 1 ], [ 0, 1 ], [ 0, 2 ] ],
[ [ 1, 1 ], [ 2, 1 ], [ 2, 0 ], [ 1, 0 ], [ 0, 0 ], [ 0, 1 ], [ 0, 2 ] ],
[ [ 1, 1 ], [ 2, 1 ], [ 2, 2 ] ],
[ [ 1, 1 ], [ 1, 0 ], [ 0, 0 ], [ 0, 1 ], [ 0, 2 ] ],
[ [ 1, 1 ], [ 1, 0 ], [ 2, 0 ], [ 2, 1 ], [ 2, 2 ] ]
]
*/

Detect outlier in repeating sequence

I have a repeating sequence of, say, 0~9 (but it may start and stop at any of these numbers), e.g.:
3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2
And it has outliers at random location, including 1st and last one, e.g.:
9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6
I need to find & correct the outliers. In the above example, I need to correct the first "9" into "3", the "8" into "5", etc.
What I came up with is to construct an outlier-free sequence of the desired length; but since I don't know which number the sequence starts with, I'd have to construct 10 sequences, each starting from "0", "1", "2" ... "9". Then I can compare these 10 sequences with the given sequence and find the one that matches the given sequence the most. However, this is very inefficient when the repeating pattern gets large (say, if the repeating pattern is 0~99, I'd need to create 100 sequences to compare).
Assuming there won't be consecutive outliers, is there a way to find & correct these outliers efficiently?
edit: added some explanation and added the algorithm tag. Hopefully it is more appropriate now.
I'm going to propose a variation of @trincot's fine answer. Like that one, it doesn't care how many outliers there may be in a row; but unlike that one, it also doesn't care how many elements in a row aren't outliers.
The basic idea is just to let each sequence element "vote" on what the first sequence element "should be". Whichever gets the most votes wins. By construction, this maximizes the number of elements left unchanged: after the voting loop ends, votes[i] is the number of elements left unchanged if i is picked as the starting point.
def correct(numbers, mod=None):
    # this part copied from @trincot's program
    if mod is None:  # if argument is not provided:
        # Make a guess what the range is of the values
        mod = max(numbers) + 1
    votes = [0] * mod
    for i, x in enumerate(numbers):
        # which initial number would make x correct?
        votes[(x - i) % mod] += 1
    winning_count = max(votes)
    winning_numbers = [i for i, v in enumerate(votes)
                       if v == winning_count]
    if len(winning_numbers) > 1:
        raise ValueError("ambiguous!", winning_numbers)
    winning_number = winning_numbers[0]
    for i in range(len(numbers)):
        numbers[i] = (winning_number + i) % mod
    return numbers
Then, e.g.,
>>> correct([9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6])
[3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
but
>>> correct([1, 5, 3, 7, 5, 9])
...
ValueError: ('ambiguous!', [1, 4])
That is, it's impossible to guess whether you want [1, 2, 3, 4, 5, 6] or [4, 5, 6, 7, 8, 9]. They both have 3 numbers "right", and despite that there are never two adjacent outliers in either case.
I would do a first scan of the list to find the longest sublist in the input that maintains the right order. We will then assume that those values are all correct, and calculate backwards what the first value would have to be to produce those values in that sublist.
Here is how that would look in Python:
def correct(numbers, mod=None):
    if mod is None:  # if argument is not provided:
        # Make a guess what the range is of the values
        mod = max(numbers) + 1
    # Find the longest slice in the list that maintains order
    start = 0
    longeststart = 0
    longest = 1
    expected = -1
    for last in range(len(numbers)):
        if numbers[last] != expected:
            start = last
        elif last - start >= longest:
            longest = last - start + 1
            longeststart = start
        expected = (numbers[last] + 1) % mod
    # Get from that longest slice what the starting value should be
    val = (numbers[longeststart] - longeststart) % mod
    # Repopulate the list starting from that value
    for i in range(len(numbers)):
        numbers[i] = val
        val = (val + 1) % mod
# demo use
numbers = [9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6]
correct(numbers, 10) # for 0..9 provide 10 as argument, ...etc
print(numbers)
The advantage of this method is that it gives a good result even if there are errors in two consecutive values, provided of course that there are enough correct values in the list. And it still runs in linear time.
Here is another way using groupby and count from Python's itertools module:
from itertools import count, groupby

def correct(lst):
    # Group maximal runs that increase by exactly 1
    groupped = [list(v) for _, v in groupby(lst, lambda a, b=count(): a - next(b))]
    # Check if all groups are singletons
    if all(len(k) == 1 for k in groupped):
        raise ValueError('All groups are singletons!')
    for k, v in zip(groupped, groupped[1:]):
        if len(k) < 2:
            # k is a singleton, hence suspect: infer it from its successor group
            out = v[0] - 1
            if out >= 0:
                yield out
            else:
                yield from k
        else:
            yield from k
    # check the last group of the groupped list
    if len(v) < 2:
        yield k[-1] + 1
    else:
        yield from v

lst = "9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6"
lst = [int(k) for k in lst.split(',')]
out = list(correct(lst))
print(out)
Output:
[3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
Edit:
For the case of [1, 5, 3, 7, 5, 9] this solution will return something inaccurate, because I can't tell which values you want to modify. This is why the best option is to check and raise a ValueError when all groups are singletons.
Like this?

numbers = [9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6]
i = 0
for n in numbers[:-1]:
    i += 1
    if n > numbers[i] and numbers[i] > 0:
        numbers[i - 1] = numbers[i] - 1
    elif n > numbers[i] and numbers[i] == 0:
        numbers[i - 1] = 9
n = numbers[-1]
if n > numbers[0] and numbers[0] > 0:
    numbers[-1] = numbers[0] - 1
elif n > numbers[0] and numbers[0] == 0:
    numbers[-1] = 9
print(numbers)

Adding a number to each column of a matrix

I have a matrix
A = [[ 1.  2.  3.]
     [ 4.  5.  6.]]
and a vector
b = [  5.  10.  15.]
I want to add each column of A (A[:,i]) to b[i], i.e.
[[ 6.  12.  18.]
 [ 9.  15.  21.]]
an easy way to do it would be
A = tf.constant([[1., 2, 3], [4, 5, 6]])
b = tf.constant([[5, 10, 15.]])
e = tf.ones((2, 1))
A + tf.matmul(e, b)  # outer product "repmat"
But it seems terribly wasteful to construct an entire auxiliary matrix which we will eventually throw out. Is there a more idiomatic way of doing this without writing my own op?
As mentioned you can do A + b:
import tensorflow as tf
tf.InteractiveSession()
A = tf.constant([[1., 2, 3], [4, 5, 6]])
b = tf.constant([[5, 10, 15.]])
(A + b).eval()
returns:
array([[  6.,  12.,  18.],
       [  9.,  15.,  21.]], dtype=float32)
The reason this works is array broadcasting. The NumPy broadcasting page has great info, and TensorFlow broadcasting works the same way. Basically, for each dimension (moving from the trailing dimension towards the leading dimension), tensorflow/numpy checks whether the dimensions are compatible: either they have the same number of elements, or one of them has only 1 element.
In your case, A is of shape [2, 3] and b is of shape [1, 3]. The trailing dimensions match, and because b's leading dimension has only a single element, b is "broadcast" along the first dimension of A (two elements).
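The same rule is easy to verify directly in NumPy (an equivalent sketch of the example above):

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])    # shape (2, 3)
b = np.array([[5., 10., 15.]])  # shape (1, 3)

# Trailing dimensions match (3 == 3); b's leading dimension of 1 is
# stretched across A's two rows.
print(A + b)
# [[ 6. 12. 18.]
#  [ 9. 15. 21.]]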

pairwise distinct left ends in all segments

I am provided with M segments of the form [L, R] over N elements of an array. I need to change these segments in such a way that all segments have pairwise distinct left ends.
Example: suppose we have 5 elements in the array and 4 segments: [1,2], [1,3], [2,4] and [4,5]. After making all the left ends pairwise distinct we have [1,2], [3,3], [2,4] and [4,5]. Here all segments have different left ends.
Let's see if I got this. I suggest:
Sort all segments by their right end.
Then fix the left ends, starting with the smallest right end and working towards larger right ends. Fixing means replacing the current left end with the next available value.
In Python it looks like this:
def fit_intervals(datalist):
    d1 = sorted(datalist, key=lambda x: x[1])
    taken = set()

    def find_next_free(x):
        while x in taken:
            x = x + 1
        taken.add(x)
        return x

    for interval in d1:
        interval[0] = find_next_free(interval[0])

data = [[4, 5], [1, 9], [1, 2], [1, 3], [2, 4]]
fit_intervals(data)
print(data)
output: [[4, 5], [5, 9], [1, 2], [2, 3], [3, 4]]
This find_next_free function currently uses a simple linear scan; if necessary, it could certainly be improved, as sketched below.
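For example (my own sketch, not part of the answer above), each query can be made near-constant amortized time by remembering, for every taken value, the next candidate to try, compressing chains as in a union-find structure:

def fit_intervals_fast(datalist):
    d1 = sorted(datalist, key=lambda x: x[1])
    nxt = {}  # nxt[x] = next candidate to try instead of the taken value x

    def find_next_free(x):
        path = []
        while x in nxt:      # x is taken: jump to its recorded candidate
            path.append(x)
            x = nxt[x]
        for y in path:       # path compression: point the whole chain at x
            nxt[y] = x
        nxt[x] = x + 1       # x is now taken; next time start from x + 1
        return x

    for interval in d1:
        interval[0] = find_next_free(interval[0])

data = [[4, 5], [1, 9], [1, 2], [1, 3], [2, 4]]
fit_intervals_fast(data)
print(data)  # [[4, 5], [5, 9], [1, 2], [2, 3], [3, 4]]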

Partitioning a superset and getting the list of original sets for each partition

Introduction
While trying to do some categorization of nodes in a graph (which will be rendered differently), I find myself confronted with the following problem:
The Problem
Given a superset of elements S = {0, 1, ... M} and a number n of non-disjoint subsets T_i thereof, with 0 <= i < n, what is the best algorithm to find the partition P of the set S?
P splits S into disjoint parts P_j, with 0 <= j < M, such that all elements x within a given P_j have the same list of "parents" among the "original" sets T_i.
Example
S = [1, 2, 3, 4, 5, 6, 8, 9]
T_1 = [1, 4]
T_2 = [2, 3]
T_3 = [1, 3, 4]
So all P_js would be:
P_1 = [1, 4] # all elements x have the same list of "parents": T_1, T_3
P_2 = [2] # all elements x have the same list of "parents": T_2
P_3 = [3] # all elements x have the same list of "parents": T_2, T_3
P_4 = [5, 6, 8, 9] # all elements x have the same list of "parents": only S (they are not in any of the T_i)
Questions
What are good functions/classes in the Python packages to compute all P_j and the list of their "parents", ideally restricted to numpy and scipy? Perhaps there's already a function which does just that.
What is the best algorithm to find those partitions P_j and, for each one, the list of "parents"? Let's denote T_0 = S.
I think the brute-force approach would be to generate all 2-combinations of the T sets, split each pair into at most 3 disjoint sets, add those back to the pool of T sets, and repeat the process until all resulting Ts are disjoint, at which point we have arrived at our answer - the set of P sets. A possible complication is caching all the "parents" along the way.
I suspect a dynamic programming approach could be used to optimize the algorithm.
Note: I would have loved to write the math parts in latex (via MathJax), but unfortunately this is not activated :-(
The following should run in linear time (in the total number of elements in the Ts).
from collections import defaultdict

S = [1, 2, 3, 4, 5, 6, 8, 9]
T_1 = [1, 4]
T_2 = [2, 3]
T_3 = [1, 3, 4]
Ts = [S, T_1, T_2, T_3]

# Encode each element's set of parents as a bitmask over the Ts
parents = defaultdict(int)
for i, T in enumerate(Ts):
    for elem in T:
        parents[elem] += 2 ** i

# Elements with identical parent bitmasks end up in the same part
children = defaultdict(list)
for elem, p in parents.items():
    children[p].append(elem)

print(list(children.values()))
Result:
[[5, 6, 8, 9], [1, 4], [2], [3]]
The way I'd do this is to construct an M × n boolean array In where In(i, j) = (S_i ∈ T_j). You can construct that in O(Σ_j |T_j|), provided you can map an element of S onto its integer index in O(1), by scanning all of the sets T and marking the corresponding bit in In.
You can then read the "signature" of each element i directly from In by concatenating row i into a binary number of n bits. The signature is precisely the equivalence relation of the partition you are seeking.
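Since the question asked for numpy, here is one way that idea might look (a sketch; it assumes the elements of S are first mapped to row indices 0..M-1):

import numpy as np

S = [1, 2, 3, 4, 5, 6, 8, 9]
Ts = [[1, 4], [2, 3], [1, 3, 4]]

index = {x: i for i, x in enumerate(S)}       # element -> row index
In = np.zeros((len(S), len(Ts)), dtype=bool)  # In[i, j] == (S[i] in Ts[j])
for j, T in enumerate(Ts):
    for x in T:
        In[index[x], j] = True

# Group elements by identical rows (signatures)
partition = {}
for i, row in enumerate(In):
    partition.setdefault(row.tobytes(), []).append(S[i])
print(list(partition.values()))
# [[1, 4], [2], [3], [5, 6, 8, 9]]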
By the way, I'm in total agreement with you about Math markup. Perhaps it's time to mount a new campaign.
