I have written some code and am trying to use Numba to accelerate it. The main goal of the code is to group some values based on a condition; iter_ is used to converge on that condition. I prepared a small case below to reproduce the sample code:
import numpy as np
import numba as nb
rng = np.random.default_rng(85)
# --------------------------------------- small data volume ---------------------------------------
# values_ = {'R0': np.array([0.01090976, 0.01069902, 0.00724112, 0.0068463 , 0.01135723, 0.00990762,
# 0.01090976, 0.01069902, 0.00724112, 0.0068463 , 0.01135723]),
# 'R1': np.array([0.01836379, 0.01900166, 0.01864162, 0.0182823 , 0.01840322, 0.01653088,
# 0.01900166, 0.01864162, 0.0182823 , 0.01840322, 0.01653088]),
# 'R2': np.array([0.02430913, 0.02239156, 0.02225379, 0.02093393, 0.02408692, 0.02110411,
# 0.02239156, 0.02225379, 0.02093393, 0.02408692, 0.02110411])}
#
# params = {'R0': [3, 0.9490579204466154, 1825, 7.070272000000002e-05],
# 'R1': [0, 0.9729203826820172, 167 , 7.070272000000002e-05],
# 'R2': [1, 0.6031363088057902, 1316, 8.007296000000003e-05]}
#
# Sno, dec_, upd_ = 2, 100, 200
# -------------------------------------------------------------------------------------------------
# ----------------------------- UPDATED (medium and large data volumes) ---------------------------
# values_ = np.load("values_med.npy", allow_pickle=True)[()]
# params = np.load("params_med.npy", allow_pickle=True)[()]
values_ = np.load("values_large.npy", allow_pickle=True)[()]
params = np.load("params_large.npy", allow_pickle=True)[()]
Sno, dec_, upd_ = 2000, 1000, 200
# -------------------------------------------------------------------------------------------------
# values_ = [*values_.values()]
# params = [*params.values()]
# @nb.jit(forceobj=True)
# def test(values_, params, Sno, dec_, upd_):
final_dict = {}
for i, j in enumerate(values_.keys()):
Rand_vals = []
goal_sum = params[j][1] * params[j][3]
tel = goal_sum / dec_ * 10
if params[j][0] != 0:
for k in range(Sno):
final_sum = 0.0
iter_ = 0
t = 1
while not np.allclose(goal_sum, final_sum, atol=tel):
iter_ += 1
vals_group = rng.choice(values_[j], size=params[j][0], replace=False)
# final_sum = 0.0016 * np.sum(vals_group) # -----> For small data volume
final_sum = np.sum(vals_group ** 3) # -----> UPDATED For med or large data volume
if iter_ == upd_:
t += 1
tel = t * tel
values_[j] = np.delete(values_[j], np.where(np.in1d(values_[j], vals_group)))
Rand_vals.append(vals_group)
else:
Rand_vals = [np.array([])] * Sno
final_dict["R" + str(i)] = Rand_vals
# return final_dict
# test(values_, params, Sno, dec_, upd_)
At first, @nb.jit was applied to this code (forceobj=True is used for avoiding warnings and …), which has an adverse effect on the performance. nopython mode was checked too, with @nb.njit, which raises the following error because the dictionary type of the inputs is not supported (as mentioned in 1, 2):
cannot determine Numba type of <class 'dict'>
I don't know if (or how) this could be handled by Dict from numba.typed (by converting the created Python dictionaries to Numba Dicts), or whether converting the dictionaries to lists of arrays has any advantage. I think parallelization may be possible if some code lines, e.g. Rand_vals.append(vals_group) or the else section or …, are moved out of the function or modified so that the results stay the same as before, but I don't have any idea how to do so.
I would be grateful for help utilizing Numba on this code. Numba parallelization would be the most desired solution (probably the best applicable method in terms of performance) if it is possible.
Data:
medium data volume: values_med, params_med
large data volume: values_large, params_large
This code can be converted to Numba but it is not straightforward.
First of all, the dictionary and list types must be defined, since Numba njit functions cannot directly operate on reflected lists (aka pure-Python lists). This is a bit tedious to do in Numba and the resulting code is a bit verbose:
String = nb.types.unicode_type
ValueArray = nb.float64[::1]
ValueDict = nb.types.DictType(String, ValueArray)
ParamDictValue = nb.types.Tuple([nb.int_, nb.float64, nb.int_, nb.float64])
ParamDict = nb.types.DictType(String, ParamDictValue)
FinalDictValue = nb.types.ListType(ValueArray)
FinalDict = nb.types.DictType(String, FinalDictValue)
Then you need to convert the input dictionaries:
nbValues = nb.typed.typeddict.Dict.empty(String, ValueArray)
for key,value in values_.items():
nbValues[key] = value.copy()
nbParams = nb.typed.typeddict.Dict.empty(String, ParamDictValue)
for key,value in params.items():
nbParams[key] = (nb.int_(value[0]), nb.float64(value[1]), nb.int_(value[2]), nb.float64(value[3]))
Then, you need to write the core function. np.allclose and np.isin are not implemented in Numba, so they have to be reimplemented manually. But the main point is that Numba does not support the rng NumPy object (and I doubt it will any time soon). Note that Numba has a random-number implementation that tries to mimic the behaviour of NumPy, but the management of the seed is a bit different. Note also that results should be the same as the np.random.xxx NumPy functions if the seed is set to the same value (NumPy and Numba have different seed variables that are not synchronized).
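For reference, a minimal njit-compatible stand-in for the scalar np.allclose check could look like the sketch below (my own helper; the function that follows simply inlines the same test instead of calling it):

@nb.njit
def scalar_allclose(a, b, rtol=1e-05, atol=1e-08):
    # Same formula NumPy uses for scalars: |a - b| <= atol + rtol * |b|
    return np.abs(a - b) <= (atol + rtol * np.abs(b))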
@nb.njit(FinalDict(ValueDict, ParamDict, nb.int_, nb.int_, nb.int_))
def nbTest(values_, params, Sno, dec_, upd_):
final_dict = nb.typed.Dict.empty(String, FinalDictValue)
for i, j in enumerate(values_.keys()):
Rand_vals = nb.typed.List.empty_list(ValueArray)
goal_sum = params[j][1] * params[j][3]
tel = goal_sum / dec_ * 10
if params[j][0] != 0:
for k in range(Sno):
final_sum = 0.0
iter_ = 0
t = 1
vals_group = np.empty(0, dtype=nb.float64)
while np.abs(goal_sum - final_sum) > (1e-05 * np.abs(final_sum) + tel):
iter_ += 1
vals_group = np.random.choice(values_[j], size=params[j][0], replace=False)
# final_sum = 0.0016 * np.sum(vals_group) # (for small data volume)
final_sum = np.sum(vals_group ** 3) # (for med or large data volume)
if iter_ == upd_:
t += 1
tel = t * tel
# Perform an in-place deletion
vals, gr = values_[j], vals_group
cur = 0
for l in range(vals.size):
found = False
for m in range(gr.size):
found |= vals[l] == gr[m]
if not found:
# Keep the value (delete it otherwise)
vals[cur] = vals[l]
cur += 1
values_[j] = vals[:cur]
Rand_vals.append(vals_group)
else:
for k in range(Sno):
Rand_vals.append(np.empty(0, dtype=nb.float64))
final_dict["R" + str(i)] = Rand_vals
return final_dict
Note that the replacement implementation of np.isin is quite naive but it works pretty well in practice on your input example.
The function can be called using the following way:
nbFinalDict = nbTest(nbValues, nbParams, Sno, dec_, upd_)
Finally, the dictionary should be converted back to basic Python objects:
finalDict = dict()
for key,value in nbFinalDict.items():
finalDict[key] = list(value)
This implementation is fast for small inputs but not for large ones, since np.random.choice takes almost all the time (>96%). The thing is that this function is clearly not optimal when the number of requested items is small (which is your case). Indeed, it surprisingly runs in time linear in the size of the input array rather than linear in the number of requested items.
Further Optimizations
The algorithm can be completely rewritten to extract only 12 random items and discard them from the main current array in a much more efficient way. The idea is to swap n items (the small target sample) at the end of the array with other items at random locations, then check the sum, repeat this process until the condition is fulfilled, and finally extract the view of the last n items before resizing the view so as to discard them. All of this can be done in O(n) time rather than O(m) time, where m is the size of the main current array and n << m (e.g. 12 vs 20_000). It can also be computed without any expensive allocation. Here is the resulting code:
@nb.njit(nb.void(ValueArray, nb.int_, nb.int_))
def swap(arr, i, j):
arr[i], arr[j] = arr[j], arr[i]
@nb.njit(FinalDict(ValueDict, ParamDict, nb.int_, nb.int_, nb.int_))
def nbTest(values_, params, Sno, dec_, upd_):
final_dict = nb.typed.Dict.empty(String, FinalDictValue)
for i, j in enumerate(values_.keys()):
Rand_vals = nb.typed.List.empty_list(ValueArray)
goal_sum = params[j][1] * params[j][3]
tel = goal_sum / dec_ * 10
values = values_[j]
n = params[j][0]
if n != 0:
for k in range(Sno):
final_sum = 0.0
iter_ = 0
t = 1
m = values.size
assert n <= m
group = values[-n:]
while np.abs(goal_sum - final_sum) > (1e-05 * np.abs(final_sum) + tel):
iter_ += 1
# Swap the group view with other random items
for pos in range(m - n, m):
swap(values, pos, np.random.randint(0, m))
# For small data volume:
# final_sum = 0.0016 * np.sum(group)
# For med/large data volume
final_sum = 0.0
for v in group:
final_sum += v ** 3
if iter_ == upd_:
t += 1
tel *= t
assert iter_ > 0
values = values[:m-n]
Rand_vals.append(group)
else:
for k in range(Sno):
Rand_vals.append(np.empty(0, dtype=nb.float64))
final_dict["R" + str(i)] = Rand_vals
return final_dict
In addition to being faster, this implementation has the benefit of also being simpler. The results look quite similar to those of the previous implementation, although the randomness makes checking them tricky (especially since this function does not use the same method to choose the random sample). Note that this implementation does not remove the items of values that are in group, as opposed to the previous one (this is probably not wanted, though).
Benchmark
Here are the results of the last implementation on my machine (compilation and conversion timings excluded):
Provided small input (embedded in the question):
- Initial code: 42.71 ms
- Numba code: 0.11 ms
Medium input:
- Initial code: 3481 ms
- Numba code: 11 ms
Large input:
- Initial code: 6728 ms
- Numba code: 20 ms
Note that the conversion time is about the same as the computation time.
This last implementation is roughly 316 to 388 times faster than the initial code on these inputs.
Notes
Note that the compilation takes a few seconds due to the dict and list types.
Note that while it may be possible to parallelize the implementation, only the most encompassing loop can be parallelized. The thing is there are only a few items to compute and the time is already quite small (not the best case for multi-threading). Additionally, the creation of many temporary arrays (created by rng.choice) would certainly cause the parallel loop not to scale well anyway. Moreover, the list/dict cannot be written from multiple threads safely, so one would need to use NumPy arrays in the whole function to be able to do that (or add additional conversions that are already expensive). Furthermore, Numba parallelism tends to significantly increase the compilation time, which is already significant. Finally, the result would be less deterministic, since each Numba thread has its own random-number-generator seed, and the items computed by each thread cannot be predicted with prange (this depends on the parallel runtime chosen on the target platform). Note that in NumPy there is one global seed used by the usual random functions (the deprecated way), whereas RNG objects have their own seed (the new preferred way).
My use case requires solving a min cost max flow problem, but with a special restriction on how the minimum cost is computed.
The restriction is that the cost of each edge should be calculated from the square of the flow running through it, not a per-unit cost. This forces the algorithm to distribute the flow more equally.
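To state that formally (this is just my reading of the requirement), with flow f_e on edge e, capacity c_e, source s and sink t:

\[
\min_{f}\ \sum_{e \in E} f_e^{2}
\quad\text{s.t.}\quad
0 \le f_e \le c_e \ \ \forall e \in E,\qquad
\sum_{e \in \delta^{+}(v)} f_e = \sum_{e \in \delta^{-}(v)} f_e \ \ \forall v \notin \{s, t\},\qquad
\sum_{e \in \delta^{+}(s)} f_e = F_{\max},
\]

where F_max is the maximum achievable flow value.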
Thank you.
I did two implementations in Python: one using OR-Tools and line search, the other using cvxpy. The former should be easy to transliterate to C++ or another OR-Tools language, but it's an approximation.
import ortools.graph.pywrapgraph
import random
# Set up a random network.
n = 100
average_out_degree = 20
max_arc_capacity = 10
arcs = []
for j in range(n * average_out_degree):
while True:
tail = random.randrange(n)
head = random.randrange(n)
if tail != head:
break
capacity = random.randrange((n // abs(tail - head)) ** 2 + 1)
# We need a lot of excess precision in the capacity because we round.
arcs.append((tail, head, 10 ** 4 * capacity))
source = 0
sink = n - 1
# Initialize the line search with a max flow.
max_flow = ortools.graph.pywrapgraph.SimpleMaxFlow()
arc_indices = []
for tail, head, capacity in arcs:
arc_indices.append(max_flow.AddArcWithCapacity(tail, head, capacity))
max_flow.Solve(source, sink)
flows = []
for arc_index in arc_indices:
flows.append(max_flow.Flow(arc_index))
amount = max_flow.OptimalFlow()
# Improve the cost of this flow using line search.
for i in range(1000):
print("Cost is", sum(flow ** 2 for flow in flows))
# The gradient of the L2 cost function is linear. Therefore we can compute
# an optimal improving direction as a min-cost flow.
min_cost_flow = ortools.graph.pywrapgraph.SimpleMinCostFlow()
arc_indices = []
for (tail, head, capacity), flow in zip(arcs, flows):
# OR-Tools requires the unit cost to be an integer.
unit_cost = round(flow)
arc_indices.append(
min_cost_flow.AddArcWithCapacityAndUnitCost(tail, head, capacity, unit_cost)
)
min_cost_flow.SetNodeSupply(source, amount)
min_cost_flow.SetNodeSupply(sink, -amount)
min_cost_flow.Solve()
flows_prime = []
for arc_index in arc_indices:
flows_prime.append(min_cost_flow.Flow(arc_index))
# Now we take the best solution that is a convex combination (1 - alpha) *
# flows + alpha * flows_prime. First we express the cost as a quadratic
# polynomial in alpha.
a = 0
b = 0
c = 0
for flow, flow_prime in zip(flows, flows_prime):
# Expand ((1 - alpha) * flow + alpha * flow_prime) ** 2 and get the
# coefficients below.
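#   ((1 - alpha)*flow + alpha*flow_prime)**2
# = (flow + alpha*(flow_prime - flow))**2
# = (flow_prime - flow)**2 * alpha**2 + 2*(flow_prime - flow)*flow * alpha + flow**2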
a += (flow_prime - flow) ** 2
b += 2 * (flow_prime - flow) * flow
c += flow ** 2
if a == 0:
# flows and flows_prime are the same. No point in continuing.
break
# Since a > 0, this is where the polynomial takes its minimum.
alpha = -b / (2 * a)
# Clamp alpha.
alpha = max(0, min(1, alpha))
# Update flows.
for j, (flow, flow_prime) in enumerate(zip(flows, flows_prime)):
flows[j] = (1 - alpha) * flow + alpha * flow_prime
# Comparison using cvxpy.
import cvxpy
flows = cvxpy.Variable(len(arcs))
constraints = []
excesses = [0] * n
excesses[source] = amount
excesses[sink] = -amount
for (tail, head, capacity), flow in zip(arcs, flows):
excesses[tail] -= flow
excesses[head] += flow
constraints.append(flow >= 0)
constraints.append(flow <= capacity)
for excess in excesses:
constraints.append(excess == 0)
problem = cvxpy.Problem(cvxpy.Minimize(cvxpy.sum_squares(flows)), constraints)
problem.solve()
print(problem.value)
I had this problem for an entry test for a job. I did not pass the test. I am disguising the question in deference to the company.
Imagine you have N number of people in a park of A X B space. If a person has no other person within 50 feet, he enjoys his privacy. Otherwise, his personal space is violated. Given a set of (x, y), how many people will have their space violated?
For example, give this list in Python:
people = [(0,0), (1,1), (1000, 1000)]
We would find 2 people who are having their space violated: 1, 2.
We don't need to find all sets of people; just the total number of unique people.
You can't use a brute method to solve the problem. In other words, you can't use a simple array within an array.
I have been working on this problem off and on for a few weeks, and although I have gotten a solution faster than n^2, I have not come up with one that scales.
I think the only correct way to solve this problem is by using Fortune's algorithm?
Here's what I have in Python (not using Fortune's algorithm):
import math
import random
random.seed(1) # Setting random number generator seed for repeatability
TEST = True
NUM_PEOPLE = 10000
PARK_SIZE = 128000 # Meters.
CONFLICT_RADIUS = 500 # Meters.
def _get_distance(x1, y1, x2, y2):
"""
require: x1, y1, x2, y2: all integers
return: a distance as a float
"""
distance = math.sqrt(math.pow((x1 - x2), 2) + math.pow((y1 - y2),2))
return distance
def check_real_distance(people1, people2, conflict_radius):
"""
determine if two people are too close
"""
if people2[1] - people1[1] > conflict_radius:
return False
d = _get_distance(people1[0], people1[1], people2[0], people2[1])
if d >= conflict_radius:
return False
return True
def check_for_conflicts(peoples, conflict_radius):
# sort people
def sort_func1(the_tuple):
return the_tuple[0]
_peoples = []
index = 0
for people in peoples:
_peoples.append((people[0], people[1], index))
index += 1
peoples = _peoples
peoples = sorted(peoples, key = sort_func1)
conflicts_dict = {}
i = 0
# use a type of sweep strategy
while i < len(peoples) - 1:
x_len = peoples[i + 1][0] - peoples[i][0]
conflict = False
conflicts_list =[peoples[i]]
j = i + 1
while x_len <= conflict_radius and j < len(peoples):
x_len = peoples[j][0] - peoples[i][0]
conflict = check_real_distance(peoples[i], peoples[j], conflict_radius)
if conflict:
people1 = peoples[i][2]
people2 = peoples[j][2]
conflicts_dict[people1] = True
conflicts_dict[people2] = True
j += 1
i += 1
return len(conflicts_dict.keys())
def gen_coord():
return int(random.random() * PARK_SIZE)
if __name__ == '__main__':
people_positions = [[gen_coord(), gen_coord()] for i in range(NUM_PEOPLE)]
conflicts = check_for_conflicts(people_positions, CONFLICT_RADIUS)
print("people in conflict: {}".format(conflicts))
As you can see from the comments, there are lots of approaches to this problem. In an interview situation you'd probably want to list as many as you can and say what the strengths and weaknesses of each one are.
For the problem as stated, where you have a fixed radius, the simplest approach is probably rounding and hashing. k-d trees and the like are powerful data structures, but they're also quite complex and if you don't need to repeatedly query them or add and remove objects they might be overkill for this. Hashing can achieve linear time, versus spatial trees which are n log n, although it might depend on the distribution of points.
To understand hashing and rounding, just think of it as partitioning your space into a grid of squares with sides of length equal to the radius you want to check against. Each square is given its own "zip code", which you can use as a hash key to store the values in that square. You can compute the zip code of a point by dividing the x and y coordinates by the radius and rounding down, like this:
def get_zip_code(x, y, radius):
return str(int(math.floor(x/radius))) + "_" + str(int(math.floor(y/radius)))
I'm using strings because it's simple, but you can use anything as long as you generate a unique zip code for each square.
Create a dictionary, where the keys are the zip codes, and the values are lists of all the people in that zip code. To check for conflicts, add the people one at a time, and before adding each one, test for conflicts with all the people in the same zip code, and the zip code's 8 neighbours. I've reused your method for keeping track of conflicts:
def check_for_conflicts(peoples, conflict_radius):
index = 0
d = {}
conflicts_dict = {}
for person in peoples:
# check for conflicts with people in this person's zip code
# and neighbouring zip codes:
for offset_x in range(-1, 2):
for offset_y in range(-1, 2):
offset_zip_code = get_zip_code(person[0] + (offset_x * conflict_radius), person[1] + (offset_y * conflict_radius), conflict_radius)
if offset_zip_code in d:
# get a list of people in this zip:
other_people = d[offset_zip_code]
# check for conflicts with each of them:
for other_person in other_people:
conflict = check_real_distance(person, other_person, conflict_radius)
if conflict:
people1 = index
people2 = other_person[2]
conflicts_dict[people1] = True
conflicts_dict[people2] = True
# add the new person to their zip code
zip_code = get_zip_code(person[0], person[1], conflict_radius)
if not zip_code in d:
d[zip_code] = []
d[zip_code].append([person[0], person[1], index])
index += 1
return len(conflicts_dict.keys())
The time complexity of this depends on a couple of things. If you increase the number of people but don't increase the size of the space you are distributing them in, then it will be O(N^2), because the number of conflicts is going to increase quadratically and you have to count them all. However, if you increase the space along with the number of people so that the density stays the same, it will be closer to O(N).
If you're just counting unique people, you can keep a count of how many people in each zip code have at least one conflict. If it's equal to everyone in the zip code, you can break out of the loop that checks for conflicts in a given zip after the first conflict with the new person, since no more unique people will be found there. You could also loop through twice, adding all people on the first pass and testing on the second, breaking out of the inner loop as soon as you find the first conflict for each person, as in the sketch below.
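Here is a rough sketch of that two-pass variant. It reuses get_zip_code and check_real_distance from above; count_violations_two_pass is just a name I made up, and this is an illustration rather than a tuned implementation:

def count_violations_two_pass(peoples, conflict_radius):
    # First pass: bucket everyone by zip code.
    d = {}
    for person in peoples:
        d.setdefault(get_zip_code(person[0], person[1], conflict_radius), []).append(person)
    # Second pass: for each person, stop at the first neighbour that is too close.
    violated = 0
    for person in peoples:
        found = False
        for offset_x in range(-1, 2):
            for offset_y in range(-1, 2):
                key = get_zip_code(person[0] + offset_x * conflict_radius,
                                   person[1] + offset_y * conflict_radius,
                                   conflict_radius)
                for other in d.get(key, []):
                    if other is not person and check_real_distance(person, other, conflict_radius):
                        found = True
                        break
                if found:
                    break
            if found:
                break
        if found:
            violated += 1
    return violated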
You can see this TopCoder link, section 'Closest pair'. You can modify the closest-pair algorithm so that the distance h is always 50.
So what you basically do is:
Sort the people by X coordinate.
Sweep from left to right.
Keep a balanced binary tree containing all the points within 50 units (in x) of the sweep line. The key of the binary tree is the Y coordinate of the point.
Select the points between Y-50 and Y+50; this can be done with the binary tree in lg(n) time.
So the overall complexity becomes n lg(n).
Be sure to mark the points you find, so you can skip those points in the future.
You can use set in C++ as the binary tree, but I couldn't find whether a Python set supports range queries or upper_bound and lower_bound. If someone knows, please point that out in the comments. A rough Python sketch follows.
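For what it's worth, here is a rough Python sketch of that sweep, using the bisect module on a plain sorted list as a stand-in for the balanced tree (so inserts are O(n) rather than O(log n)); count_violated and the variable names are mine:

import bisect

def count_violated(points, radius=50):
    pts = sorted(points)              # sort by x (then y)
    active = []                       # (y, x, index into pts), kept sorted by y
    violated = set()
    left = 0                          # index of the oldest point still inside the x-window
    for i, (x, y) in enumerate(pts):
        # Drop points that are more than `radius` behind the sweep line in x.
        while left < i and x - pts[left][0] > radius:
            active.remove((pts[left][1], pts[left][0], left))
            left += 1
        # Range query: every active point with y in [y - radius, y + radius].
        lo = bisect.bisect_left(active, (y - radius,))
        hi = bisect.bisect_right(active, (y + radius, float('inf'), len(pts)))
        for yy, xx, j in active[lo:hi]:
            if (x - xx) ** 2 + (y - yy) ** 2 < radius ** 2:
                violated.add(i)
                violated.add(j)
        bisect.insort(active, (y, x, i))
    return len(violated)

For the example in the question, count_violated([(0, 0), (1, 1), (1000, 1000)]) returns 2.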
Here's my solution to this interesting problem:
from math import sqrt
import math
import random
class Person():
def __init__(self, x, y, conflict_radius=500):
self.position = [x, y]
self.valid = True
self.radius = conflict_radius**2
def validate_people(self, people):
P0 = self.position
for p in reversed(people):
P1 = p.position
dx = P1[0] - P0[0]
dy = P1[1] - P0[1]
dx2 = (dx * dx)
if dx2 > self.radius:
break
dy2 = (dy * dy)
d = dx2 + dy2
if d <= self.radius:
self.valid = False
p.valid = False
def __str__(self):
p = self.position
return "{0}:{1} - {2}".format(p[0], p[1], self.valid)
class Park():
def __init__(self, num_people=10000, park_size=128000):
random.seed(1)
self.num_people = num_people
self.park_size = park_size
def gen_coord(self):
return int(random.random() * self.park_size)
def generate(self):
return [[self.gen_coord(), self.gen_coord()] for i in range(self.num_people)]
def naive_solution(data):
sorted_data = sorted(data, key=lambda x: x[0])
len_sorted_data = len(sorted_data)
result = []
for index, pos in enumerate(sorted_data):
print "{0}/{1} - {2}".format(index, len_sorted_data, len(result))
p = Person(pos[0], pos[1])
p.validate_people(result)
result.append(p)
return result
if __name__ == '__main__':
people_positions = Park().generate()
results = naive_solution(people_positions)
with_conflicts = sum(1 for p in results if not p.valid)
without_conflicts = sum(1 for p in results if p.valid)
print("people with conflicts: {}".format(with_conflicts))
print("people without conflicts: {}".format(without_conflicts))
I'm sure the code can still be optimized further.
I found a relatively fast solution to the problem. Sort the list of coordinates by the X value. Then look at each X value, one at a time. Sweep right, checking the position against the next position, until the end of the sweep area is reached (500 meters) or a conflict is found.
If no conflict is found, sweep left in the same manner. This method avoids unnecessary checks. For example, if there are 1,000,000 people in the park, then all of them will be in conflict, and the algorithm will only check each person one time: once a conflict is found the search stops.
My time seems to be O(N).
Here is the code:
import math
import random
random.seed(1) # Setting random number generator seed for repeatability
NUM_PEOPLE = 10000
PARK_SIZE = 128000 # Meters.
CONFLICT_RADIUS = 500 # Meters.
check_real_distance = lambda conflict_radius, people1, people2: people2[1] - people1[1] <= conflict_radius \
and math.pow(people1[0] - people2[0], 2) + math.pow(people1[1] - people2[1], 2) <= math.pow(conflict_radius, 2)
def check_for_conflicts(peoples, conflict_radius):
peoples.sort(key = lambda x: x[0])
conflicts_dict = {}
i = 0
num_checks = 0
# use a type of sweep strategy
while i < len(peoples) :
conflict = False
j = i + 1
#sweep right
while j < len(peoples) and peoples[j][0] - peoples[i][0] <= conflict_radius \
and not conflict and not conflicts_dict.get(i):
num_checks += 1
conflict = check_real_distance(conflict_radius, peoples[i], peoples[j])
if conflict:
conflicts_dict[i] = True
conflicts_dict[j] = True
j += 1
j = i - 1
#sweep left
while j >= 0 and peoples[i][0] - peoples[j][0] <= conflict_radius \
and not conflict and not conflicts_dict.get(i):
num_checks += 1
conflict = check_real_distance(conflict_radius, peoples[j], peoples[i])
if conflict:
conflicts_dict[i] = True
conflicts_dict[j] = True
j -= 1
i += 1
print("num checks is {0}".format(num_checks))
print("num checks per size is is {0}".format(num_checks/ NUM_PEOPLE))
return len(conflicts_dict.keys())
def gen_coord():
return int(random.random() * PARK_SIZE)
if __name__ == '__main__':
people_positions = [[gen_coord(), gen_coord()] for i in range(NUM_PEOPLE)]
conflicts = check_for_conflicts(people_positions, CONFLICT_RADIUS)
print("people in conflict: {}".format(conflicts))
I have written a poker simulator that calculates the probability of winning in Texas hold'em by running a simulation of games (i.e. a Monte Carlo simulation). Currently it runs around 10,000 simulations in 10 seconds, which in general is good enough. However, there are apps on the iPhone that run around 100x faster. I'm wondering whether there's anything I can do to speed up the program significantly. I have already replaced strings with lists and found many ways to speed up the program by around 2-3 times. But what can I do to possibly speed it up 50x-100x? I checked with a profiler but couldn't find any significant bottleneck. I also compiled it with Cython (without making any changes), but that had no effect on speed either. Any suggestions are appreciated. The full listing is below:
__author__ = 'Nicolas Dickreuter'
import time
import numpy as np
from collections import Counter
class MonteCarlo(object):
def EvalBestHand(self, hands):
scores = [(i, self.score(hand)) for i, hand in enumerate(hands)]
winner = sorted(scores, key=lambda x: x[1], reverse=True)[0][0]
return hands[winner],scores[winner][1][-1]
def score(self, hand):
crdRanksOriginal = '23456789TJQKA'
originalSuits='CDHS'
rcounts = {crdRanksOriginal.find(r): ''.join(hand).count(r) for r, _ in hand}.items()
score, crdRanks = zip(*sorted((cnt, rank) for rank, cnt in rcounts)[::-1])
potentialThreeOfAKind = score[0] == 3
potentialTwoPair = score == (2, 2, 1, 1, 1)
potentialPair = score == (2, 1, 1, 1, 1, 1)
if score[0:2]==(3,2) or score[0:2]==(3,3): # fullhouse (three of a kind and pair, or two three of a kind)
crdRanks = (crdRanks[0],crdRanks[1])
score = (3,2)
elif score[0:4]==(2,2,2,1):
score=(2,2,1) # three pair are not worth more than two pair
sortedCrdRanks = sorted(crdRanks,reverse=True) # avoid for example 11,8,6,7
crdRanks=(sortedCrdRanks[0],sortedCrdRanks[1],sortedCrdRanks[2],sortedCrdRanks[3])
elif len(score) >= 5: # high card, flush, straight and straight flush
# straight
if 12 in crdRanks: # adjust if 5 high straight
crdRanks += (-1,)
sortedCrdRanks = sorted(crdRanks,reverse=True) # sort again as if pairs the first rank matches the pair
for i in range(len(sortedCrdRanks) - 4):
straight = sortedCrdRanks[i] - sortedCrdRanks[i + 4] == 4
if straight:
crdRanks=(sortedCrdRanks[i],sortedCrdRanks[i+1],sortedCrdRanks[i+2],sortedCrdRanks[i+3],sortedCrdRanks[i+4])
break
# flush
suits = [s for _, s in hand]
flush = max(suits.count(s) for s in suits) >= 5
if flush:
for flushSuit in originalSuits: # get the suit of the flush
if suits.count(flushSuit)>=5:
break
flushHand = [k for k in hand if flushSuit in k]
rcountsFlush = {crdRanksOriginal.find(r): ''.join(flushHand).count(r) for r, _ in flushHand}.items()
score, crdRanks = zip(*sorted((cnt, rank) for rank, cnt in rcountsFlush)[::-1])
crdRanks = tuple(sorted(crdRanks,reverse=True)) # ignore original sorting where pairs had influence
# check for straight in flush
if 12 in crdRanks: # adjust if 5 high straight
crdRanks += (-1,)
for i in range(len(crdRanks) - 4):
straight = crdRanks[i] - crdRanks[i + 4] == 4
# no pair, straight, flush, or straight flush
score = ([(5,), (2, 1, 2)], [(3, 1, 3), (5,)])[flush][straight]
if score == (1,) and potentialThreeOfAKind: score = (3, 1)
elif score == (1,) and potentialTwoPair: score = (2, 2, 1)
elif score == (1,) and potentialPair: score = (2, 1, 1)
if score[0]==5:
handType="StraightFlush"
#crdRanks=crdRanks[:5] # five card rule makes no difference {:5] would be incorrect
elif score[0]==4:
handType="FoufOfAKind"
crdRanks=crdRanks[:2]
elif score[0:2]==(3,2):
handType="FullHouse"
# crdRanks=crdRanks[:2] # already implemented above
elif score[0:3]==(3,1,3):
handType="Flush"
crdRanks=crdRanks[:5] # !! to be verified !!
elif score[0:3]==(3,1,2):
handType="Straight"
crdRanks=crdRanks[:5] # !! to be verified !!
elif score[0:2]==(3,1):
handType="ThreeOfAKind"
crdRanks=crdRanks[:3]
elif score[0:2]==(2,2):
handType="TwoPair"
crdRanks=crdRanks[:3]
elif score[0]==2:
handType="Pair"
crdRanks=crdRanks[:4]
elif score[0]==1:
handType="HighCard"
crdRanks=crdRanks[:5]
else: raise Exception('Card Type error!')
return score, crdRanks, handType
def createCardDeck(self):
values = "23456789TJQKA"
suites = "CDHS"
Deck=[]
[Deck.append(x+y) for x in values for y in suites]
return Deck
def distributeToPlayers(self, Deck, PlayerAmount, PlayerCardList, TableCardsList):
Players =[]
CardsOnTable = []
knownPlayers = 0
for PlayerCards in PlayerCardList:
FirstPlayer=[]
FirstPlayer.append(Deck.pop(Deck.index(PlayerCards[0])))
FirstPlayer.append(Deck.pop(Deck.index(PlayerCards[1])))
Players.append(FirstPlayer)
knownPlayers += 1
for c in TableCardsList:
CardsOnTable.append(Deck.pop(Deck.index(c))) # remove cards that are on the table from the deck
for n in range(0, PlayerAmount - knownPlayers):
plr=[]
plr.append(Deck.pop(np.random.random_integers(0,len(Deck)-1)))
plr.append(Deck.pop(np.random.random_integers(0,len(Deck)-1)))
Players.append(plr)
return Players, Deck
def distributeToTable(self, Deck, TableCardsList):
remaningRandoms = 5 - len(TableCardsList)
for n in range(0, remaningRandoms):
TableCardsList.append(Deck.pop(np.random.random_integers(0,len(Deck)-1)))
return TableCardsList
def RunMonteCarlo(self, originalPlayerCardList, originalTableCardsList, PlayerAmount, gui, maxRuns=6000,maxSecs=5):
winnerlist = []
EquityList = []
winnerCardTypeList=[]
wins = 0
runs=0
timeout_start=time.time()
for m in range(maxRuns):
runs+=1
try:
if gui.active==True:
gui.progress["value"] = int(round(m*100/maxRuns))
except:
pass
Deck = self.createCardDeck()
PlayerCardList = originalPlayerCardList[:]
TableCardsList = originalTableCardsList[:]
Players, Deck = self.distributeToPlayers(Deck, PlayerAmount, PlayerCardList, TableCardsList)
Deck5Cards = self.distributeToTable(Deck, TableCardsList)
PlayerFinalCardsWithTableCards = []
for o in range(0, PlayerAmount):
PlayerFinalCardsWithTableCards.append(Players[o]+Deck5Cards)
bestHand,winnerCardType=self.EvalBestHand(PlayerFinalCardsWithTableCards)
winner = (PlayerFinalCardsWithTableCards.index(bestHand))
#print (winnerCardType)
CollusionPlayers = 0
if winner < CollusionPlayers + 1:
wins += 1
winnerCardTypeList.append(winnerCardType)
# winnerlist.append(winner)
# self.equity=wins/m
# if self.equity>0.99: self.equity=0.99
# EquityList.append(self.equity)
if time.time()>timeout_start+maxSecs:
break
self.equity = wins / runs
self.winnerCardTypeList = Counter(winnerCardTypeList)
for key, value in self.winnerCardTypeList.items():
self.winnerCardTypeList[key] = value / runs
self.winTypesDict=self.winnerCardTypeList.items()
# show how the montecarlo converges
# xaxis=range(500,monteCarloRuns)
# plt.plot(xaxis,EquityList[499:monteCarloRuns])
# plt.show()
return self.equity,self.winTypesDict
if __name__ == '__main__':
Simulation = MonteCarlo()
mycards=[['AS', 'KS']]
cardsOnTable = []
players = 3
start_time = time.time()
Simulation.RunMonteCarlo(mycards, cardsOnTable, players, 1, maxRuns=200000, maxSecs=120)
print("--- %s seconds ---" % (time.time() - start_time))
equity = Simulation.equity # considering draws as wins
print (equity)
This algorithm is too long and complex for any sensible suggestion apart from generic ones. So here you go: vectorise everything you can and work with vectors of data instead of loops, lists, insertions and removals. Monte Carlo simulations allow for this quite nicely because all samples are independent.
For example, if you want 1 million trials, generate a NumPy array of 1 million random values in one line. If you want to test an outcome, place 1 million outcomes into a NumPy array and test all of them at once against a given condition, generating a NumPy array of booleans.
If you manage to vectorize this, you will get a tremendous performance increase.
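As a toy illustration of the idea (this is not the poker evaluator, just a stand-in "strength" model), one million outcomes can be drawn and tested in a couple of vectorized lines:

import numpy as np

rng = np.random.default_rng(0)
n_sims = 1_000_000

# Draw one million simulated "hand strengths" for us and one opponent at once.
our_strength = rng.random(n_sims)
opp_strength = rng.random(n_sims)

# Test every outcome in a single vectorized comparison instead of a Python loop.
wins = our_strength > opp_strength
print(wins.mean())   # estimated equity, ~0.5 for this toy model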
Inefficient array operations are probably causing most of the slowness. In particular, I see a lot of lines like this:
Deck.pop(some_index)
Every time you do that, the list has to shift all the elements after that index. You can eliminate this (as well as repeated random functions) by using:
from random import shuffle
# Setup
originalDeck = createCardDeck()
# Individual run
Deck = originalDeck[:]
shuffle(Deck) # shuffles in place
# Draw a card from the end so nothing needs to shift
card = Deck.pop()
# Concise and efficient!
There are some other things you could do, like change long if-else chains into constant-time lookups with a dict or a custom class; a rough sketch follows. Right now the code checks for a straight flush first, down to a high card last, for every hand, which seems rather inefficient. I doubt that it would make a big difference, though.
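A sketch of the dict idea (the score patterns are taken from the if/elif chain in the question; lookup_hand_type is a name I made up, and the rank-truncation logic would still need to be handled separately):

# Map a leading "score" pattern to the hand type name used above.
HAND_TYPES = {
    (5,):      "StraightFlush",
    (4,):      "FourOfAKind",
    (3, 2):    "FullHouse",
    (3, 1, 3): "Flush",
    (3, 1, 2): "Straight",
    (3, 1):    "ThreeOfAKind",
    (2, 2):    "TwoPair",
    (2,):      "Pair",
    (1,):      "HighCard",
}

def lookup_hand_type(score):
    # Try the longest prefix first so e.g. (3, 1, 3) matches before (3, 1).
    for length in (3, 2, 1):
        hand_type = HAND_TYPES.get(tuple(score[:length]))
        if hand_type is not None:
            return hand_type
    raise ValueError("Card Type error: {}".format(score))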
I'm using simulated annealing to help solve a problem such as the travelling salesman problem. I get a solution which I'm happy with, but I would now like to know the path the algorithm took within the solution space (that is, the steps it made) to go from a random, far-from-optimal itinerary to a pretty optimal one.
In other words, in the case of the traveling salesman problem:
Start with a random itinerary between cities
Step 1: path between city A-C and E-F were swapped
Step 2: path between city G-U and S-Q were swapped
...
End up with a pretty optimal itinerary between cities
Is there a way, using simulated annealing, to save each steps of the optimization process so that I can retrace, one by one, each change made to the system which eventually led to that particular solution? Or this goes against how a simulated annealing works?
Here is a great simulated annealing algorithm, by @perrygeo, very slightly modified from https://github.com/perrygeo/simanneal/blob/master/simanneal/anneal.py, using the travelling salesman example: https://github.com/perrygeo/simanneal/blob/master/examples/salesman.py
(All my changes are marked with a one-line comment starting with ###.)
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import
from __future__ import unicode_literals
import copy
import math
import sys
import time
import random
def round_figures(x, n):
"""Returns x rounded to n significant figures."""
return round(x, int(n - math.ceil(math.log10(abs(x)))))
def time_string(seconds):
"""Returns time in seconds as a string formatted HHHH:MM:SS."""
s = int(round(seconds)) # round to nearest second
h, s = divmod(s, 3600) # get hours and remainder
m, s = divmod(s, 60) # split remainder into minutes and seconds
return '%4i:%02i:%02i' % (h, m, s)
class Annealer(object):
"""Performs simulated annealing by calling functions to calculate
energy and make moves on a state. The temperature schedule for
annealing may be provided manually or estimated automatically.
"""
Tmax = 25000.0
Tmin = 2.5
steps = 50000
updates = 100
copy_strategy = 'deepcopy'
def __init__(self, initial_state):
self.initial_state = initial_state
self.state = self.copy_state(initial_state)
def set_schedule(self, schedule):
"""Takes the output from `auto` and sets the attributes
"""
self.Tmax = schedule['tmax']
self.Tmin = schedule['tmin']
self.steps = int(schedule['steps'])
def copy_state(self, state):
"""Returns an exact copy of the provided state
Implemented according to self.copy_strategy, one of
* deepcopy : use copy.deepcopy (slow but reliable)
* slice: use list slices (faster but only works if state is list-like)
* method: use the state's copy() method
"""
if self.copy_strategy == 'deepcopy':
return copy.deepcopy(state)
elif self.copy_strategy == 'slice':
return state[:]
elif self.copy_strategy == 'method':
return state.copy()
def update(self, step, T, E, acceptance, improvement):
"""Prints the current temperature, energy, acceptance rate,
improvement rate, elapsed time, and remaining time.
The acceptance rate indicates the percentage of moves since the last
update that were accepted by the Metropolis algorithm. It includes
moves that decreased the energy, moves that left the energy
unchanged, and moves that increased the energy yet were reached by
thermal excitation.
The improvement rate indicates the percentage of moves since the
last update that strictly decreased the energy. At high
temperatures it will include both moves that improved the overall
state and moves that simply undid previously accepted moves that
increased the energy by thermal excititation. At low temperatures
it will tend toward zero as the moves that can decrease the energy
are exhausted and moves that would increase the energy are no longer
thermally accessible."""
elapsed = time.time() - self.start
if step == 0:
print(' Temperature Energy Accept Improve Elapsed Remaining')
print('%12.2f %12.2f %s ' % \
(T, E, time_string(elapsed)))
else:
remain = (self.steps - step) * (elapsed / step)
print('%12.2f %12.2f %7.2f%% %7.2f%% %s %s' % \
(T, E, 100.0 * acceptance, 100.0 * improvement,
time_string(elapsed), time_string(remain)))
def anneal(self):
"""Minimizes the energy of a system by simulated annealing.
Parameters
state : an initial arrangement of the system
Returns
(state, energy): the best state and energy found.
"""
step = 0
self.start = time.time()
steps = [] ### initialise a list to save the steps taken by the algorithm to find a good solution
# Precompute factor for exponential cooling from Tmax to Tmin
if self.Tmin <= 0.0:
raise Exception('Exponential cooling requires a minimum temperature greater than zero.')
Tfactor = -math.log(self.Tmax / self.Tmin)
# Note initial state
T = self.Tmax
E = self.energy()
prevState = self.copy_state(self.state)
prevEnergy = E
bestState = self.copy_state(self.state)
bestEnergy = E
trials, accepts, improves = 0, 0, 0
if self.updates > 0:
updateWavelength = self.steps / self.updates
self.update(step, T, E, None, None)
# Attempt moves to new states
while step < self.steps:
step += 1
T = self.Tmax * math.exp(Tfactor * step / self.steps)
a,b = self.move()
E = self.energy()
dE = E - prevEnergy
trials += 1
if dE > 0.0 and math.exp(-dE / T) < random.random():
# Restore previous state
self.state = self.copy_state(prevState)
E = prevEnergy
else:
# Accept new state and compare to best state
accepts += 1
if dE < 0.0:
improves += 1
prevState = self.copy_state(self.state)
prevEnergy = E
steps.append([a,b]) ### append the "good move" to the list of steps
if E < bestEnergy:
bestState = self.copy_state(self.state)
bestEnergy = E
if self.updates > 1:
if step // updateWavelength > (step - 1) // updateWavelength:
self.update(
step, T, E, accepts / trials, improves / trials)
trials, accepts, improves = 0, 0, 0
# Return best state and energy
return bestState, bestEnergy, steps ### added steps to what should be returned
def distance(a, b):
"""Calculates distance between two latitude-longitude coordinates."""
R = 3963 # radius of Earth (miles)
lat1, lon1 = math.radians(a[0]), math.radians(a[1])
lat2, lon2 = math.radians(b[0]), math.radians(b[1])
return math.acos(math.sin(lat1) * math.sin(lat2) +
math.cos(lat1) * math.cos(lat2) * math.cos(lon1 - lon2)) * R
class TravellingSalesmanProblem(Annealer):
"""Test annealer with a travelling salesman problem.
"""
# pass extra data (the distance matrix) into the constructor
def __init__(self, state, distance_matrix):
self.distance_matrix = distance_matrix
super(TravellingSalesmanProblem, self).__init__(state) # important!
def move(self):
"""Swaps two cities in the route."""
a = random.randint(0, len(self.state) - 1)
b = random.randint(0, len(self.state) - 1)
self.state[a], self.state[b] = self.state[b], self.state[a]
return a,b ### return the change made
def energy(self):
"""Calculates the length of the route."""
e = 0
for i in range(len(self.state)):
e += self.distance_matrix[self.state[i-1]][self.state[i]]
return e
if __name__ == '__main__':
# latitude and longitude for the twenty largest U.S. cities
cities = {
'New York City': (40.72, 74.00),
'Los Angeles': (34.05, 118.25),
'Chicago': (41.88, 87.63),
'Houston': (29.77, 95.38),
'Phoenix': (33.45, 112.07),
'Philadelphia': (39.95, 75.17),
'San Antonio': (29.53, 98.47),
'Dallas': (32.78, 96.80),
'San Diego': (32.78, 117.15),
'San Jose': (37.30, 121.87),
'Detroit': (42.33, 83.05),
'San Francisco': (37.78, 122.42),
'Jacksonville': (30.32, 81.70),
'Indianapolis': (39.78, 86.15),
'Austin': (30.27, 97.77),
'Columbus': (39.98, 82.98),
'Fort Worth': (32.75, 97.33),
'Charlotte': (35.23, 80.85),
'Memphis': (35.12, 89.97),
'Baltimore': (39.28, 76.62)
}
# initial state, a randomly-ordered itinerary
init_state = list(cities.keys())
random.shuffle(init_state)
reconstructed_state = init_state
# create a distance matrix
distance_matrix = {}
for ka, va in cities.items():
distance_matrix[ka] = {}
for kb, vb in cities.items():
if kb == ka:
distance_matrix[ka][kb] = 0.0
else:
distance_matrix[ka][kb] = distance(va, vb)
tsp = TravellingSalesmanProblem(init_state, distance_matrix)
# since our state is just a list, slice is the fastest way to copy
tsp.copy_strategy = "slice"
state, e, steps = tsp.anneal()
while state[0] != 'New York City':
state = state[1:] + state[:1] # rotate NYC to start
print("Results:")
for city in state:
print("\t", city)
### reconstruct the annealing process
print("")
print("nbr. of steps:",len(steps))
print("Reconstructed results:")
for s in steps:
reconstructed_state[s[0]], reconstructed_state[s[1]] = reconstructed_state[s[1]], reconstructed_state[s[0]]
while reconstructed_state[0] != 'New York City':
reconstructed_state = reconstructed_state[1:] + reconstructed_state[:1] # rotate NYC to start
for city in reconstructed_state:
print("\t", city)
Saving every time a move is made builds a huge list of steps that are indeed retraceable. However, it obviously mimics how the algorithm explores and jumps around many different positions in the solution space, especially at high temperatures.
To get more directly converging steps, I could move the step-saving line:
steps.append([a,b]) ### append the "good move" to the list of steps
under
if E < bestEnergy:
That way, only the steps that actually improve upon the best solution found so far are saved. However, the final list of steps then no longer helps reconstruct the itinerary (steps are missing).
Is this problem hopeless and inherent to how simulated annealing works, or is there hope of being able to construct a converging step list from random to quasi-optimal?
Saving the steps along the way does not depend on the algorithm, i.e. simulated annealing, but on your implementation or the software you use. If it is your own implementation, it should be no problem at all to save these steps. If you use an event-listener approach, you can add an arbitrary number of listeners/clients to process your events. But you can also just write your data out to one client, e.g. a file. If you do not use your own implementation, it depends on the software. If you use proprietary software, you are dependent on its API. If you use open-source software and you do not find a way to retrieve such information via its API, you are usually allowed to modify the software for your own needs. A minimal sketch of the listener idea is below.
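A minimal sketch of the listener idea (the class and method names are made up, not taken from any particular library):

class StepLogger:
    """Collects every move it is told about; it could just as well write to a file."""
    def __init__(self):
        self.steps = []
    def on_move(self, move):
        self.steps.append(move)

class ObservableAnnealer:
    def __init__(self):
        self.listeners = [StepLogger()]
    def notify(self, move):
        # Call this from the annealing loop whenever a move is accepted.
        for listener in self.listeners:
            listener.on_move(move)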
Ultimately, the move itself is not important except in the context of the rest of the system state. Instead of trying to track the move or change in state that resulted in each low-energy solution (which, as you point out, loses all the information about previous moves that were accepted), you could just track the state itself every time you encounter a new lowest score:
if E < bestEnergy:
list_of_best_states.append(self.copy_state(self.state))
Given any randomized algorithm, if you use a pseudorandom generator, then you can save the generator's seed (random but short) and replay the execution by seeding the generator the same way.
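For example, with Python's random module (the same idea applies to any seeded generator, including the one driving the annealer):

import random

seed = random.randrange(2**32)      # pick and remember a short random seed
random.seed(seed)
first_run = [random.random() for _ in range(5)]

random.seed(seed)                   # re-seed with the same value...
replay = [random.random() for _ in range(5)]
assert replay == first_run          # ...and the whole sequence replays exactly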
I have started with the Ingber ASA code for a more conventional SA problem involving estimating about 72 error variables to give a best fit to measured data.
In my case, there is a limited number of trials that result in a reduced cost: of several million trials, typically fewer than 40 are accepted. If I wanted to keep track of them, I would add a step to the code path that accepts a new position, logging the difference between the then-current and the now-new position.
Replaying these differences would give you the path.
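A small sketch of that logging idea (the names are hypothetical; the real hook would live in the ASA acceptance step):

accepted_deltas = []                # one entry per accepted trial

def log_acceptance(current, new):
    # Record only the change between the then-current and the now-new position.
    accepted_deltas.append([n - c for c, n in zip(current, new)])

def replay(start):
    # Re-apply the logged differences to recover the sequence of accepted positions.
    state = list(start)
    path = [list(state)]
    for delta in accepted_deltas:
        state = [s + d for s, d in zip(state, delta)]
        path.append(list(state))
    return path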