Why does this code take up so much memory? - algorithm

I created a solution to this problem on leetcode:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
My solution causes an out of memory exception:
class Solution:
# #param {string} s
# #return {string[]}
def findRepeatedDnaSequences(self, s):
seen_so_far = set()
results = set()
for seq in self.window(10, s):
if seq in seen_so_far:
results.add(seq)
else:
seen_so_far.add(seq)
return list(results)
def window(self, window_size, array):
window_start = 0
window_end = window_size
while window_end < len(array):
yield array[window_start:window_end+1]
window_end += window_size
This solution works:
class Solution:
# #param {string} s
# #return {string[]}
def findRepeatedDnaSequences(self, s):
d = {}
res = []
for i in range(len(s)):
key = s[i:i+10]
if key not in d:
d[key] = 1
else:
d[key] += 1
for e in d:
if d[e] > 1:
res.append(e)
return res
They seem essentially the same. What am I missing?

Your window function is incorrect. It yields a sequence of substrings [0:11], [0:21], [0:31], ... (note that window_start remains zero). It can be fixed e.g. as
def window(self, window_size, array):
window_start = 0
while window_start < len(array) - window_size + 1:
yield array[window_start:window_start+window_size]
window_start += 1
Edit: substring ending indices were off by 1.

In this:
def window(self, window_size, array):
window_start = 0
window_end = window_size
while window_end < len(array):
yield array[window_start:window_end+1]
window_end += window_size
the value of window_start is never changed. So, given that window_size is 10, you first yield the slice 0:11 (but wanted 0:10), then the slice 0:21 (but wanted 1:11), then the slice 0:31 (but wanted 2:12), and so on. The total length of all the slices you return grows proportionally to the square of len(array). If array is long, that would explain it. But not enough info was given to be sure about that.

Related

Implementing a PriorityQueue Algorithm

Here is my implementation of a PriorityQueue algorithm. I have a feeling that my pop function is wrong. But I am not sure where exactly it is wrong. I have checked multiple times on where my logic went wrong but it seems to be perfectly correct(checked with CLRS pseudo code).
class PriorityQueue:
"""Array-based priority queue implementation."""
def __init__(self):
"""Initially empty priority queue."""
self.queue = []
self.min_index = None
def parent(self, i):
return int(i/2)
def left(self, i):
return 2*i+1
def right(self, i):
return 2*i+2
def min_heapify(self, heap_size, i):
#Min heapify as written in CLRS
smallest = i
l = self.left(i)
r = self.right(i)
#print([l,r,len(self.queue),heap_size])
try:
if l <= heap_size and self.queue[l] < self.queue[i]:
smallest = l
else:
smallest = i
except IndexError:
pass
try:
if r <= heap_size and self.queue[r] < self.queue[smallest]:
smallest = r
except IndexError:
pass
if smallest != i:
self.queue[i], self.queue[smallest] = self.queue[smallest], self.queue[i]
self.min_heapify(heap_size, smallest)
def heap_decrease_key(self, i, key):
#Implemented as specified in CLRS
if key > self.queue[i]:
raise ValueError("new key is larger than current key")
#self.queue[i] = key
while i > 0 and self.queue[self.parent(i)] > self.queue[i]:
self.queue[i], self.queue[self.parent(i)] = self.queue[self.parent(i)], self.queue[i]
i = self.parent(i)
def __len__(self):
# Number of elements in the queue.
return len(self.queue)
def append(self, key):
"""Inserts an element in the priority queue."""
if key is None:
raise ValueError('Cannot insert None in the queue')
self.queue.append(key)
heap_size = len(self.queue)
self.heap_decrease_key(heap_size - 1, key)
def min(self):
"""The smallest element in the queue."""
if len(self.queue) == 0:
return None
return self.queue[0]
def pop(self):
"""Removes the minimum element in the queue.
Returns:
The value of the removed element.
"""
if len(self.queue) == 0:
return None
self._find_min()
popped_key = self.queue[self.min_index]
self.queue[0] = self.queue[len(self.queue)-1]
del self.queue[-1]
self.min_index = None
self.min_heapify(len(self.queue), 0)
return popped_key
def _find_min(self):
# Computes the index of the minimum element in the queue.
#
# This method may crash if called when the queue is empty.
if self.min_index is not None:
return
min = self.queue[0]
self.min_index = 0
Any hint or input will be highly appreciated
The main issue is that the parent function is wrong. As it should do the opposite from the left and right methods, you should first subtract 1 from i before halving that value:
def parent(self, i):
return int((i-1)/2)
Other things to note:
You don't really have a good use for the member self.min_index. It is either 0 or None, and the difference is not really used in your code, as it follows directly from whether the heap is empty or not. This also means you don't need the method _find_min, (which in itself is strange: you assign to min, but never use that). Any way, drop that method, and the line where you call it. Also drop the line where you assign None to self.min_index, and the only other place where you read the value, just use 0.
You have two ways to protect against index errors in the min_heapify method: <= heapsize and a try block. The first protection should really have < instead of <=, but you should use only one way, not two. So either test the less-than, or trap the exception.
The else block with smallest = i is unnecessary, because at that time smallest == i.
min_heapify has a first parameter that always receives the full size of the heap. So it is an unnecessary parameter. It would also not make sense to ever call this method with another value for it. So drop that argument from the method definition and all calls. And then define heap_size = len(self.queue) as a local name in that function
In heap_decrease_key you commented out the assignment #self.queue[i] = key, which is fine as long as you never call this method to really decrease a key. But although you never do that from "inside" the class, the user of the class may well want to use it in that way (since that is what the method's name is suggesting). So better uncomment that assignment.
With the above changes, your instance would only have queue as its data property. This is fine, but you could consider to let PriorityQueue inherit from list, so that you don't need this property either, and can just work with the list that you inherit. By consequence, you should then replace self.queue with self throughout your code, and you can drop the __init__ and __len__ methods, since the list implementation of those is just what you need. A bit of care is needed in the case where you want to call a list original method, when you have overridden it, like append. In that case use super().append.
With all of the above changes applied:
class PriorityQueue(list):
"""Array-based priority queue implementation."""
def parent(self, i):
return int((i-1)/2)
def left(self, i):
return 2*i+1
def right(self, i):
return 2*i+2
def min_heapify(self, i):
#Min heapify as written in CLRS
heap_size = len(self)
smallest = i
l = self.left(i)
r = self.right(i)
if l < heap_size and self[l] < self[i]:
smallest = l
if r < heap_size and self[r] < self[smallest]:
smallest = r
if smallest != i:
self[i], self[smallest] = self[smallest], self[i]
self.min_heapify(smallest)
def heap_decrease_key(self, i, key):
#Implemented as specified in CLRS
if key > self[i]:
raise ValueError("new key is larger than current key")
self[i] = key
while i > 0 and self[self.parent(i)] > self[i]:
self[i], self[self.parent(i)] = self[self.parent(i)], self[i]
i = self.parent(i)
def append(self, key):
"""Inserts an element in the priority queue."""
if key is None:
raise ValueError('Cannot insert None in the queue')
super().append(key)
heap_size = len(self)
self.heap_decrease_key(heap_size - 1, key)
def min(self):
"""The smallest element in the queue."""
if len(self) == 0:
return None
return self[0]
def pop(self):
"""Removes the minimum element in the queue.
Returns:
The value of the removed element.
"""
if len(self) == 0:
return None
popped_key = self[0]
self[0] = self[-1]
del self[-1]
self.min_heapify(0)
return popped_key
Your parent function is already wrong.
The root element of your heap is stored in array index 0, the children are in 1 and 2. The parent of 1 is 0, that is correct, but the parent of 2 should also be 0, whereas your function returns 1.
Usually the underlying array of a heap does not use the index 0, instead the root element is at index 1. This way you can compute parent and children like this:
parent(i): i // 2
left_child(i): 2 * i
right_child(i): 2 * i + 1

Python Codility Frog River One time complexity

So this is another approach to probably well-known codility platform, task about frog crossing the river. And sorry if this question is asked in bad manner, this is my first post here.
The goal is to find the earliest time when the frog can jump to the other side of the river.
For example, given X = 5 and array A such that:
A[0] = 1
A[1] = 3
A[2] = 1
A[3] = 4
A[4] = 2
A[5] = 3
A[6] = 5
A[7] = 4
the function should return 6.
Example test: (5, [1, 3, 1, 4, 2, 3, 5, 4])
Full task content:
https://app.codility.com/programmers/lessons/4-counting_elements/frog_river_one/
So that was my first obvious approach:
def solution(X, A):
lista = list(range(1, X + 1))
if X < 1 or len(A) < 1:
return -1
found = -1
for element in lista:
if element in A:
if A.index(element) > found:
found = A.index(element)
else: return -1
return found
X = 5
A = [1,2,4,5,3]
solution(X,A)
This solution is 100% correct and gets 0% in performance tests.
So I thought less lines + list comprehension will get better score:
def solution(X, A):
if X < 1 or len(A) < 1:
return -1
try:
found = max([ A.index(element) for element in range(1, X + 1) ])
except ValueError:
return -1
return found
X = 5
A = [1,2,4,5,3]
solution(X,A)
This one also works and has 0% performance but it's faster anyway.
I also found solution by deanalvero (https://github.com/deanalvero/codility/blob/master/python/lesson02/FrogRiverOne.py):
def solution(X, A):
# write your code in Python 2.6
frog, leaves = 0, [False] * (X)
for minute, leaf in enumerate(A):
if leaf <= X:
leaves[leaf - 1] = True
while leaves[frog]:
frog += 1
if frog == X: return minute
return -1
This solution gets 100% in correctness and performance tests.
My question arises probably because I don't quite understand this time complexity thing. Please tell me how the last solution is better from my second solution? It has a while loop inside for loop! It should be slow but it's not.
Here is a solution in which you would get 100% in both correctness and performance.
def solution(X, A):
i = 0
dict_temp = {}
while i < len(A):
dict_temp[A[i]] = i
if len(dict_temp) == X:
return i
i += 1
return -1
The answer already been told, but I'll add an optional solution that i think might help you understand:
def save_frog(x, arr):
# creating the steps the frog should make
steps = set([i for i in range(1, x + 1)])
# creating the steps the frog already did
froggy_steps = set()
for index, leaf in enumerate(arr):
froggy_steps.add(leaf)
if froggy_steps == steps:
return index
return -1
I think I got the best performance using set()
take a look at the performance test runtime seconds and compare them with yours
def solution(X, A):
positions = set()
seconds = 0
for i in range(0, len(A)):
if A[i] not in positions and A[i] <= X:
positions.add(A[i])
seconds = i
if len(positions) == X:
return seconds
return -1
The amount of nested loops doesn't directly tell you anything about the time complexity. Let n be the length of the input array. The inside of the while-loop needs in average O(1) time, although its worst case time complexity is O(n). The fast solution uses a boolean array leaves where at every index it has the value true if there is a leaf and false otherwise. The inside of the while-loop during the entire algotihm is excetuded no more than n times. The outer for-loop is also executed only n times. This means the time complexity of the algorithm is O(n).
The key is that both of your initial solutions are quadratic. They involve O(n) inner scans for each of the parent elements (resulting in O(n**2)).
The fast solution initially appears to suffer the same fate as it's obvious it contains a loop within a loop. But the inner while loop does not get fully scanned for each 'leaf'. Take a look at where 'frog' gets initialized and you'll note that the while loop effectively picks up where it left off for each leaf.
Here is my 100% solution that considers the sum of numeric progression.
def solution(X, A):
covered = [False] * (X+1)
n = len(A)
Sx = ((1+X)*X)/2 # sum of the numeric progression
for i in range(n):
if(not covered[A[i]]):
Sx -= A[i]
covered[A[i]] = True
if (Sx==0):
return i
return -1
Optimized solution from #sphoenix, no need to compare two sets, it's not really good.
def solution(X, A):
found = set()
for pos, i in enumerate(A, 0):
if i <= X:
found.add(i)
if len(found) == X:
return pos
return -1
And one more optimized solution for binary array
def solution(X, A):
steps, leaves = X, [False] * X
for minute, leaf in enumerate(A, 0):
if not leaves[leaf - 1]:
leaves[leaf - 1] = True
steps -= 1
if 0 == steps:
return minute
return -1
The last one is better, less resources. set consumes more resources compared to binary list (memory and CPU).
def solution(X, A):
# if there are not enough items in the list
if X > len(A):
return -1
# else check all items
else:
d = {}
for i, leaf in enumerate(A):
d[leaf] = i
if len(d) == X:
return i
# if all else fails
return -1
I tried to use as much simple instruction as possible.
def solution(X, A):
if (X > len(A)): # check for no answer simple
return -1
elif(X == 1): # check for single element
return 0
else:
std_set = {i for i in range(1,X+1)} # list of standard order
this_set = set(A) # set of unique element in list
if(sum(std_set) > sum(this_set)): # check for no answer complex
return -1
else:
for i in range(0, len(A) - 1):
if std_set:
if(A[i] in std_set):
std_set.remove(A[i]) # remove each element in standard set
if not std_set: # if all removed, return last filled position
return(i)
I guess this code might not fulfill runtime but it the simplest I could think of
I am using OrderedDict from collections and sum of first n numbers to check the frog will be able to cross or not.
def solution(X, A):
from collections import OrderedDict as od
if sum(set(A))!=(X*(X+1))//2:
return -1
k=list(od.fromkeys(A).keys())[-1]
for x,y in enumerate(A):
if y==k:
return x
This code gives 100% for correctness and performance, runs in O(N)
def solution(x, a):
# write your code in Python 3.6
# initialize all positions to zero
# i.e. if x = 2; x + 1 = 3
# x_positions = [0,1,2]
x_positions = [0] * (x + 1)
min_time = -1
for k in range(len(a)):
# since we are looking for min time, ensure that you only
# count the positions that matter
if a[k] <= x and x_positions[a[k]] == 0:
x_positions[a[k]] += 1
min_time = k
# ensure that all positions are available for the frog to jump
if sum(x_positions) == x:
return min_time
return -1
100% performance using sets
def solution(X, A):
positions = set()
for i in range(len(A)):
if A[i] not in positions:
positions.add(A[i])
if len(positions) == X:
return i
return -1

Algorithm for finding all points within distance of another point

I had this problem for an entry test for a job. I did not pass the test. I am disguising the question in deference to the company.
Imagine you have N number of people in a park of A X B space. If a person has no other person within 50 feet, he enjoys his privacy. Otherwise, his personal space is violated. Given a set of (x, y), how many people will have their space violated?
For example, give this list in Python:
people = [(0,0), (1,1), (1000, 1000)]
We would find 2 people who are having their space violated: 1, 2.
We don't need to find all sets of people; just the total number of unique people.
You can't use a brute method to solve the problem. In other words, you can't use a simple array within an array.
I have been working on this problem off and on for a few weeks, and although I have gotten a solution faster than n^2, have not come up with a problem that scales.
I think the only correct way to solve this problem is by using Fortune's algorithm?
Here's what I have in Python (not using Fortune's algorithm):
import math
import random
random.seed(1) # Setting random number generator seed for repeatability
TEST = True
NUM_PEOPLE = 10000
PARK_SIZE = 128000 # Meters.
CONFLICT_RADIUS = 500 # Meters.
def _get_distance(x1, y1, x2, y2):
"""
require: x1, y1, x2, y2: all integers
return: a distance as a float
"""
distance = math.sqrt(math.pow((x1 - x2), 2) + math.pow((y1 - y2),2))
return distance
def check_real_distance(people1, people2, conflict_radius):
"""
determine if two people are too close
"""
if people2[1] - people1[1] > conflict_radius:
return False
d = _get_distance(people1[0], people1[1], people2[0], people2[1])
if d >= conflict_radius:
return False
return True
def check_for_conflicts(peoples, conflict_radius):
# sort people
def sort_func1(the_tuple):
return the_tuple[0]
_peoples = []
index = 0
for people in peoples:
_peoples.append((people[0], people[1], index))
index += 1
peoples = _peoples
peoples = sorted(peoples, key = sort_func1)
conflicts_dict = {}
i = 0
# use a type of sweep strategy
while i < len(peoples) - 1:
x_len = peoples[i + 1][0] - peoples[i][0]
conflict = False
conflicts_list =[peoples[i]]
j = i + 1
while x_len <= conflict_radius and j < len(peoples):
x_len = peoples[j][0] - peoples[i][0]
conflict = check_real_distance(peoples[i], peoples[j], conflict_radius)
if conflict:
people1 = peoples[i][2]
people2 = peoples[j][2]
conflicts_dict[people1] = True
conflicts_dict[people2] = True
j += 1
i += 1
return len(conflicts_dict.keys())
def gen_coord():
return int(random.random() * PARK_SIZE)
if __name__ == '__main__':
people_positions = [[gen_coord(), gen_coord()] for i in range(NUM_PEOPLE)]
conflicts = check_for_conflicts(people_positions, CONFLICT_RADIUS)
print("people in conflict: {}".format(conflicts))
As you can see from the comments, there's lots of approaches to this problem. In an interview situation you'd probably want to list as many as you can and say what the strengths and weaknesses of each one are.
For the problem as stated, where you have a fixed radius, the simplest approach is probably rounding and hashing. k-d trees and the like are powerful data structures, but they're also quite complex and if you don't need to repeatedly query them or add and remove objects they might be overkill for this. Hashing can achieve linear time, versus spatial trees which are n log n, although it might depend on the distribution of points.
To understand hashing and rounding, just think of it as partitioning your space up into a grid of squares with sides of length equal to the radius you want to check against. Each square is given it's own "zip code" which you can use as a hash key to store values in that square. You can compute the zip code of a point by dividing the x and y co-ordinates by the radius, and rounding down, like this:
def get_zip_code(x, y, radius):
return str(int(math.floor(x/radius))) + "_" + str(int(math.floor(y/radius)))
I'm using strings because it's simple, but you can use anything as long as you generate a unique zip code for each square.
Create a dictionary, where the keys are the zip codes, and the values are lists of all the people in that zip code. To check for conflicts, add the people one at a time, and before adding each one, test for conflicts with all the people in the same zip code, and the zip code's 8 neighbours. I've reused your method for keeping track of conflicts:
def check_for_conflicts(peoples, conflict_radius):
index = 0
d = {}
conflicts_dict = {}
for person in peoples:
# check for conflicts with people in this person's zip code
# and neighbouring zip codes:
for offset_x in range(-1, 2):
for offset_y in range(-1, 2):
offset_zip_code = get_zip_code(person[0] + (offset_x * conflict_radius), person[1] + (offset_y * conflict_radius), conflict_radius)
if offset_zip_code in d:
# get a list of people in this zip:
other_people = d[offset_zip_code]
# check for conflicts with each of them:
for other_person in other_people:
conflict = check_real_distance(person, other_person, conflict_radius)
if conflict:
people1 = index
people2 = other_person[2]
conflicts_dict[people1] = True
conflicts_dict[people2] = True
# add the new person to their zip code
zip_code = get_zip_code(person[0], person[1], conflict_radius)
if not zip_code in d:
d[zip_code] = []
d[zip_code].append([person[0], person[1], index])
index += 1
return len(conflicts_dict.keys())
The time complexity of this depends on a couple of things. If you increase the number of people, but don't increase the size of the space you are distributing them in, then it will be O(N2) because the number of conflicts is going to increase quadratically and you have to count them all. However if you increase the space along with the number of people, so that the density is the same, it will be closer to O(N).
If you're just counting unique people, you can keep a count if how many people in each zip code have at least 1 conflict. If its equal to everyone in the zip code, you can early out of the loop that checks for conflicts in a given zip after the first conflict with the new person, since no more uniques will be found. You could also loop through twice, adding all people on the first loop, and testing on the second, breaking out of the loop when you find the first conflict for each person.
You can see this topcoder link and section 'Closest pair'. You can modify the closest pair algorithm so that the distance h is always 50.
So , what you basically do is ,
Sort the people by X coordinate
Sweep from left to right.
Keep a balanced binary tree and keep all the points within 50 radii in the binary tree. The key of the binary tree would be the Y coordinates of the point
Select the points with Y-50 and Y+50 , this can be done with the binary tree in lg(n) time.
So the overall complexity becomes nlg(n)
Be sure to mark the points you find to skip those points in the future.
You can use set in C++ as the binary tree.But I couldn't find if python set supports range query or upper_bound and lower_bound.If someone knows , please point that out in the comments.
Here's my solution to this interesting problem:
from math import sqrt
import math
import random
class Person():
def __init__(self, x, y, conflict_radius=500):
self.position = [x, y]
self.valid = True
self.radius = conflict_radius**2
def validate_people(self, people):
P0 = self.position
for p in reversed(people):
P1 = p.position
dx = P1[0] - P0[0]
dy = P1[1] - P0[1]
dx2 = (dx * dx)
if dx2 > self.radius:
break
dy2 = (dy * dy)
d = dx2 + dy2
if d <= self.radius:
self.valid = False
p.valid = False
def __str__(self):
p = self.position
return "{0}:{1} - {2}".format(p[0], p[1], self.valid)
class Park():
def __init__(self, num_people=10000, park_size=128000):
random.seed(1)
self.num_people = num_people
self.park_size = park_size
def gen_coord(self):
return int(random.random() * self.park_size)
def generate(self):
return [[self.gen_coord(), self.gen_coord()] for i in range(self.num_people)]
def naive_solution(data):
sorted_data = sorted(data, key=lambda x: x[0])
len_sorted_data = len(sorted_data)
result = []
for index, pos in enumerate(sorted_data):
print "{0}/{1} - {2}".format(index, len_sorted_data, len(result))
p = Person(pos[0], pos[1])
p.validate_people(result)
result.append(p)
return result
if __name__ == '__main__':
people_positions = Park().generate()
with_conflicts = len(filter(lambda x: x.valid, naive_solution(people_positions)))
without_conflicts = len(filter(lambda x: not x.valid, naive_solution(people_positions)))
print("people with conflicts: {}".format(with_conflicts))
print("people without conflicts: {}".format(without_conflicts))
I'm sure the code can be still optimized further
I found a relatively solution to the problem. Sort the list of coordinates by the X value. Then look at each X value, one at a time. Sweep right, checking the position with the next position, until the end of the sweep area is reached (500 meters), or a conflict is found.
If no conflict is found, sweep left in the same manner. This method avoids unnecessary checks. For example, if there are 1,000,000 people in the park, then all of them will be in conflict. The algorithm will only check each person one time: once a conflict is found the search stops.
My time seems to be O(N).
Here is the code:
import math
import random
random.seed(1) # Setting random number generator seed for repeatability
NUM_PEOPLE = 10000
PARK_SIZE = 128000 # Meters.
CONFLICT_RADIUS = 500 # Meters.
check_real_distance = lambda conflict_radius, people1, people2: people2[1] - people1[1] <= conflict_radius \
and math.pow(people1[0] - people2[0], 2) + math.pow(people1[1] - people2[1], 2) <= math.pow(conflict_radius, 2)
def check_for_conflicts(peoples, conflict_radius):
peoples.sort(key = lambda x: x[0])
conflicts_dict = {}
i = 0
num_checks = 0
# use a type of sweep strategy
while i < len(peoples) :
conflict = False
j = i + 1
#sweep right
while j < len(peoples) and peoples[j][0] - peoples[i][0] <= conflict_radius \
and not conflict and not conflicts_dict.get(i):
num_checks += 1
conflict = check_real_distance(conflict_radius, peoples[i], peoples[j])
if conflict:
conflicts_dict[i] = True
conflicts_dict[j] = True
j += 1
j = i - 1
#sweep left
while j >= 0 and peoples[i][0] - peoples[j][0] <= conflict_radius \
and not conflict and not conflicts_dict.get(i):
num_checks += 1
conflict = check_real_distance(conflict_radius, peoples[j], peoples[i])
if conflict:
conflicts_dict[i] = True
conflicts_dict[j] = True
j -= 1
i += 1
print("num checks is {0}".format(num_checks))
print("num checks per size is is {0}".format(num_checks/ NUM_PEOPLE))
return len(conflicts_dict.keys())
def gen_coord():
return int(random.random() * PARK_SIZE)
if __name__ == '__main__':
people_positions = [[gen_coord(), gen_coord()] for i in range(NUM_PEOPLE)]
conflicts = check_for_conflicts(people_positions, CONFLICT_RADIUS)
print("people in conflict: {}".format(conflicts))
I cam up with an answer that seems to take O(N) time. The strategy is to sort the array by X values. For each X value, sweep left until a conflict is found, or the distance exceeds the conflict distance (500 M). If no conflict is found, sweep left in the same manner. With this technique, you limit the amount of searching.
Here is the code:
import math
import random
random.seed(1) # Setting random number generator seed for repeatability
NUM_PEOPLE = 10000
PARK_SIZE = 128000 # Meters.
CONFLICT_RADIUS = 500 # Meters.
check_real_distance = lambda conflict_radius, people1, people2: people2[1] - people1[1] <= conflict_radius \
and math.pow(people1[0] - people2[0], 2) + math.pow(people1[1] - people2[1], 2) <= math.pow(conflict_radius, 2)
def check_for_conflicts(peoples, conflict_radius):
peoples.sort(key = lambda x: x[0])
conflicts_dict = {}
i = 0
num_checks = 0
# use a type of sweep strategy
while i < len(peoples) :
conflict = False
j = i + 1
#sweep right
while j < len(peoples) and peoples[j][0] - peoples[i][0] <= conflict_radius \
and not conflict and not conflicts_dict.get(i):
num_checks += 1
conflict = check_real_distance(conflict_radius, peoples[i], peoples[j])
if conflict:
conflicts_dict[i] = True
conflicts_dict[j] = True
j += 1
j = i - 1
#sweep left
while j >= 0 and peoples[i][0] - peoples[j][0] <= conflict_radius \
and not conflict and not conflicts_dict.get(i):
num_checks += 1
conflict = check_real_distance(conflict_radius, peoples[j], peoples[i])
if conflict:
conflicts_dict[i] = True
conflicts_dict[j] = True
j -= 1
i += 1
print("num checks is {0}".format(num_checks))
print("num checks per size is is {0}".format(num_checks/ NUM_PEOPLE))
return len(conflicts_dict.keys())
def gen_coord():
return int(random.random() * PARK_SIZE)
if __name__ == '__main__':
people_positions = [[gen_coord(), gen_coord()] for i in range(NUM_PEOPLE)]
conflicts = check_for_conflicts(people_positions, CONFLICT_RADIUS)
print("people in conflict: {}".format(conflicts))

Generating all the combinations of two lists and output them one by one in python

I have two lists
[1, 3, 4] [7, 8]
I want to generate all the combinations of two list starting from the smallest combinations like 17,18,37,38,47,48,137,138,147,148......178,378....
Now for each combination I have to test it's presence in some other place and if I found that combination to be present then I will stop the combination generation. For example If I see that 17 is present then I will not generate the other combinations. Again If I found 48 to be present then I will not generate the later combinations.
This is a pretty ugly algorithm, but it worked for me. It's also not super expensive (expect, of course, for generating all the combinations with itertools.combinations(a, i)...):
import itertools
def all_combs(a):
to_return = []
temp = []
for i in a:
temp.append(i)
to_return.append(temp)
for i in range(2, len(a) + 1):
temp = []
for j in itertools.combinations(a, i):
s = ""
for k in j:
s = s + str(k)
temp.append(int(s)) #Get all values from the list permutation
to_return.append(temp)
print(to_return)
return to_return
def all_perm(a, b):
a_combs = all_combs(a)
b_combs = all_combs(b)
to_return = []
for i in a_combs:
for j in b_combs:
for k in i:
for l in j:
to_return.append(10**len(str(l)) * k + l)
to_return.sort()
for i in to_return:
yield i
EDIT: Fixed a bug where multi-digit values weren't read in correctly
EDIT: Made the function act as a generator
EDIT: Fixed a bug involving digits (by adding a sort...)
EDIT: Here's a vastly superior implementation which meets the generator style much more closely. It's still not perfect, but it should provide good speedup in the average case:
import itertools
def add_to_dict(dict, length, num):
if not length in dict:
dict[length] = []
dict[length].append(num)
def sum_to_val(val):
to_return = []
for i in range(1, val):
to_return.append([i, val-i])
return to_return
def all_combs(a):
to_return = {}
for i in a:
add_to_dict(to_return, len(str(i)), i)
for i in range(2, len(a) + 1):
for j in itertools.combinations(a, i):
s = ""
for k in j:
s = s + str(k)
add_to_dict(to_return, len(s), int(s)) #Get all values from the list permutation
return to_return
def all_perm(a, b):
a_combs = all_combs(a)
b_combs = all_combs(b)
for val in range(max(a_combs.keys())+max(b_combs.keys())+1):
to_return = []
sums = sum_to_val(val)
for i in sums:
if not(i[0] in a_combs and i[1] in b_combs):
continue
for j in a_combs[i[0]]:
for k in b_combs[i[1]]:
to_return.append(10**len(str(k)) * j + k)
to_return.sort()
for i in to_return:
yield i

Need help in understanding Dynamic Programming approach for "balanced 0-1 matrix"?

Problem: I am struggling to understand/visualize the Dynamic Programming approach for "A type of balanced 0-1 matrix in "Dynamic Programming - Wikipedia Article."
Wikipedia Link: https://en.wikipedia.org/wiki/Dynamic_programming#A_type_of_balanced_0.E2.80.931_matrix
I couldn't understand how the memoization works when dealing with a multidimensional array. For example, when trying to solve the Fibonacci series with DP, using an array to store previous state results is easy, as the index value of the array store the solution for that state.
Can someone explain DP approach for the "0-1 balanced matrix" in simpler manner?
Wikipedia offered both a crappy explanation and a not ideal algorithm. But let's work with it as a starting place.
First let's take the backtracking algorithm. Rather than put the cells of the matrix "in some order", let's go everything in the first row, then everything in the second row, then everything in the third row, and so on. Clearly that will work.
Now let's modify the backtracking algorithm slightly. Instead of going cell by cell, we'll go row by row. So we make a list of the n choose n/2 possible rows which are half 0 and half 1. Then have a recursive function that looks something like this:
def count_0_1_matrices(n, filled_rows=None):
if filled_rows is None:
filled_rows = []
if some_column_exceeds_threshold(n, filled_rows):
# Cannot have more than n/2 0s or 1s in any column
return 0
else:
answer = 0
for row in possible_rows(n):
answer = answer + count_0_1_matrices(n, filled_rows + [row])
return answer
This is a backtracking algorithm like what we had before. We are just doing whole rows at a time, not cells.
But notice, we're passing around more information than we need. There is no need to pass in the exact arrangement of rows. All that we need to know is how many 1s are needed in each remaining column. So we can make the algorithm look more like this:
def count_0_1_matrices(n, still_needed=None):
if still_needed is None:
still_needed = [int(n/2) for _ in range(n)]
# Did we overrun any column?
for i in still_needed:
if i < 0:
return 0
# Did we reach the end of our matrix?
if 0 == sum(still_needed):
return 1
# Calculate the answer by recursion.
answer = 0
for row in possible_rows(n):
next_still_needed = [still_needed[i] - row[i] for i in range(n)]
answer = answer + count_0_1_matrices(n, next_still_needed)
return answer
This version is almost the recursive function in the Wikipedia version. The main difference is that our base case is that after every row is finished, we need nothing, while Wikipedia would have us code up the base case to check the last row after every other is done.
To get from this to a top-down DP, you only need to memoize the function. Which in Python you can do by defining then adding an #memoize decorator. Like this:
from functools import wraps
def memoize(func):
cache = {}
#wraps(func)
def wrap(*args):
if args not in cache:
cache[args] = func(*args)
return cache[args]
return wrap
But remember that I criticized the Wikipedia algorithm? Let's start improving it! The first big improvement is this. Do you notice that the order of the elements of still_needed can't matter, just their values? So just sorting the elements will stop you from doing the calculation separately for each permutation. (There can be a lot of permutations!)
#memoize
def count_0_1_matrices(n, still_needed=None):
if still_needed is None:
still_needed = [int(n/2) for _ in range(n)]
# Did we overrun any column?
for i in still_needed:
if i < 0:
return 0
# Did we reach the end of our matrix?
if 0 == sum(still_needed):
return 1
# Calculate the answer by recursion.
answer = 0
for row in possible_rows(n):
next_still_needed = [still_needed[i] - row[i] for i in range(n)]
answer = answer + count_0_1_matrices(n, sorted(next_still_needed))
return answer
That little innocuous sorted doesn't look important, but it saves a lot of work! And now that we know that still_needed is always sorted, we can simplify our checks for whether we are done, and whether anything went negative. Plus we can add an easy check to filter out the case where we have too many 0s in a column.
#memoize
def count_0_1_matrices(n, still_needed=None):
if still_needed is None:
still_needed = [int(n/2) for _ in range(n)]
# Did we overrun any column?
if still_needed[-1] < 0:
return 0
total = sum(still_needed)
if 0 == total:
# We reached the end of our matrix.
return 1
elif total*2/n < still_needed[0]:
# We have total*2/n rows left, but won't get enough 1s for a
# column.
return 0
# Calculate the answer by recursion.
answer = 0
for row in possible_rows(n):
next_still_needed = [still_needed[i] - row[i] for i in range(n)]
answer = answer + count_0_1_matrices(n, sorted(next_still_needed))
return answer
And, assuming you implement possible_rows, this should both work and be significantly more efficient than what Wikipedia offered.
=====
Here is a complete working implementation. On my machine it calculated the 6'th term in under 4 seconds.
#! /usr/bin/env python
from sys import argv
from functools import wraps
def memoize(func):
cache = {}
#wraps(func)
def wrap(*args):
if args not in cache:
cache[args] = func(*args)
return cache[args]
return wrap
#memoize
def count_0_1_matrices(n, still_needed=None):
if 0 == n:
return 1
if still_needed is None:
still_needed = [int(n/2) for _ in range(n)]
# Did we overrun any column?
if still_needed[0] < 0:
return 0
total = sum(still_needed)
if 0 == total:
# We reached the end of our matrix.
return 1
elif total*2/n < still_needed[-1]:
# We have total*2/n rows left, but won't get enough 1s for a
# column.
return 0
# Calculate the answer by recursion.
answer = 0
for row in possible_rows(n):
next_still_needed = [still_needed[i] - row[i] for i in range(n)]
answer = answer + count_0_1_matrices(n, tuple(sorted(next_still_needed)))
return answer
#memoize
def possible_rows(n):
return [row for row in _possible_rows(n, n/2)]
def _possible_rows(n, k):
if 0 == n:
yield tuple()
else:
if k < n:
for row in _possible_rows(n-1, k):
yield tuple(row + (0,))
if 0 < k:
for row in _possible_rows(n-1, k-1):
yield tuple(row + (1,))
n = 2
if 1 < len(argv):
n = int(argv[1])
print(count_0_1_matrices(2*n)))
You're memoizing states that are likely to be repeated. The state that needs to be remembered in this case is the vector (k is implicit). Let's look at one of the examples you linked to. Each pair in the vector argument (of length n) is representing "the number of zeros and ones that have yet to be placed in that column."
Take the example on the left, where the vector is ((1, 1) (1, 1) (1, 1) (1, 1)), when k = 2 and the assignments leading to it were 1 0 1 0, k = 3 and 0 1 0 1, k = 4. But we could get to the same state, ((1, 1) (1, 1) (1, 1) (1, 1)), k = 2 from a different set of assignments, for example: 0 1 0 1, k = 3 and 1 0 1 0, k = 4. If we would memoize the result for the state, ((1, 1) (1, 1) (1, 1) (1, 1)), we could avoid recalculating the recursion for that branch again.
Please let me know if there's anything I could better clarify.
Further elaboration in response to your comment:
The Wikipedia example seems to be pretty much a brute-force with memoization. The algorithm seems to attempt to enumerate all the matrixes but uses memoization to exit early from repeated states. How do we enumerate all possibilities? To take their example, n = 4, we start with the vector [(2,2),(2,2),(2,2),(2,2)] where zeros and ones are yet to be placed. (Since the sum of each tuple in the vector is k, we could have a simpler vector where k and the count of either ones or zeros is maintained.)
At every stage, k, in the recursion, we enumerate all possible configurations for the next vector. If the state exists in our hash, we simply return the value for that key. Otherwise, we assign the vector as a new key in the hash (in which case this recursion branch will continue).
For example:
Vector [(2,2),(2,2),(2,2),(2,2)]
Possible assignments of 1's: [1 1 0 0], [1 0 1 0], [1 0 0 1] ... etc.
First branch: [(2,1),(2,1),(1,2),(1,2)]
is this vector a key in the hash?
if yes, return value lookup
else, assign this vector as a key in the hash where the value is the sum
of the function calls with the next possible vectors as their arguments
Building on the excellent answer by https://stackoverflow.com/users/585411/btilly, I've updated their algorithm to exclude "0" cases in the still_needed tuple. The code is about 50% faster largely because of more cache hits using the collapsable tuple.
import time
from typing import Tuple
from sys import argv
from functools import cache
#cache
def possible_rows(n, k=None) -> Tuple[int]:
if k is None:
k = n / 2
return [row for row in _possible_rows(n, k)]
def _possible_rows(n, k) -> Tuple[int]:
if 0 == n:
yield tuple()
else:
if k < n:
for row in _possible_rows(n-1, k):
yield tuple(row + (0,))
if 0 < k:
for row in _possible_rows(n-1, k-1):
yield tuple(row + (1,))
def count(n: int, k: int) -> int:
if n == 0:
return 1
still_needed = tuple([k] * n)
return count_0_1_matrices(k, still_needed)
#cache
def count_0_1_matrices(k:int, still_needed: Tuple[int]):
"""
Assume still_needed contains only positive ints, and is sorted ascending
"""
# Calculate the answer by recursion.
answer = 0
for row in possible_rows(len(still_needed), k):
# Decrement the still_needed value tuple by the row tuple and only keep positive results. Sorting is important for cache hits.
next_still_needed = tuple(sorted([sn - r for sn, r in zip(still_needed, row) if sn > r]))
# Only continue if we still need values and there are enough rows left
if not next_still_needed:
answer += 1
elif len(next_still_needed) >= k and sum(next_still_needed) >= next_still_needed[-1] * k:
# sum / k -> how many rows left. We need enough rows left to continue down this path.
answer += count_0_1_matrices(k, next_still_needed)
return answer
if __name__ == "__main__":
n = 7
if 1 < len(argv):
n = int(argv[1])
start = time.time()
result = count(2*n, n)
print(f"{result} in {time.time() - start} seconds")

Resources