Parallelisation of first-to-survive-wins loop

Parallelisation of first-to-survive-wins loop - random

I have a problem, which, when simplified:
has a loop which samples new points
evaluates them with a complex/slow function
accepts them if the value is above an ever-increasing threshold.
Here is example code for illustration:
from numpy.random import uniform
from time import sleep
def userfunction(x):
# do something complicated
# but computation always takes takes roughly the same time
sleep(1) # comment this out if too slow
xnew = uniform() # in reality, a non-trivial function of x
y = -0.5 * xnew**2
return xnew, y
x0, cur = userfunction([])
x = [x0] # a sequence of points
while cur < -2e-16:
# this should be parallelised
# search for a new point higher than a threshold
x1, next = userfunction(x)
if next <= cur:
# throw away (this branch is taken 99% of the time)
pass
else:
cur = next
print cur
x.append(x1) # note that userfunction depends on x
print x
I want to parallelise this (e.g. across a cluster), but the problem is that I need to terminate the other workers when a successful point has been found, or at least inform them of the new x (if they manage to get above the new threshold with an older x, the result is still acceptable). As long as no point has been successful, I need the workers repeat.
I am looking for tools/frameworks which can handle this type of problem, in any scientific programming language (C, C++, Python, Julia, etc., no Fortran please).
Can this be solved with MPI semi-elegantly? I don't understand how I can inform/interrupt/update workers with MPI.
Update: added code comments to say most tries are unsuccessful and do not influence the variable userfunction depends on.

if userfunction() does not take too long, then here is an option that qualifies for "MPI semi-elegantly"
in order to keep thing simple, let's assume rank 0 is only an orchestrator and does not compute anything.
on rank 0
cur = 0
x = []
while cur < -2e-16:
MPI_Recv(buf=cur+x1, src=MPI_ANY_SOURCE)
x.append(x1)
MPI_Ibcast(buf=cur+x, root=0, request=req)
MPI_Wait(request=req)
on rank != 0
x0, cur = userfunction([])
x = [x0] # a sequence of points
while cur < -2e-16:
MPI_Ibcast(buf=newcur+newx, root=0, request=req
# search for a new point higher than a threshold
x1, next = userfunction(x)
if next <= cur:
# throw away (this branch is taken 99% of the time)
MPI_Test(request=ret, flag=found)
if found:
MPI_Wait(request)
else:
cur = next
MPI_Send(buffer=cur+x1, dest=0)
MPI_Wait(request)
extra logic is needed to correctly handle
- rank 0 does computation too
- several ranks find the solution at the same time, subsequent messages must be consumed by rank 0
strictly speaking, a task is not "interrupted" when a solution is found on an other task. instead, each task check periodically if the solution was found by an other task. so there is a delay between the time a solution if found somewhere and all tasks stop looking for solutions, but if userfunction() does not take "too long", this looks very acceptable to me.

I solved it roughly with the following code.
This transmits only curmax at the moment, but one can send the other array with a second broadcast+tag.
import numpy
import time
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
import logging
logging.basicConfig(filename='mpitest%d.log' % rank,level=logging.DEBUG)
logFormatter = logging.Formatter("[%(name)s %(levelname)s]: %(message)s")
consoleHandler = logging.StreamHandler()
consoleHandler.setFormatter(logFormatter)
consoleHandler.setLevel(logging.INFO)
logging.getLogger().addHandler(consoleHandler)
log = logging.getLogger(__name__)
if rank == 0:
curmax = numpy.random.random()
seq = [curmax]
log.info('%d broadcasting starting value %f...' % (rank, curmax))
comm.Ibcast(numpy.array([curmax]))
was_updated = False
while True:
# check if news available
status = MPI.Status()
a_avail = comm.iprobe(source=MPI.ANY_SOURCE, tag=12, status=status)
if a_avail:
sugg = comm.recv(source=status.Get_source(), tag=12)
log.info('%d received new limit from %d: %s' % (rank, status.Get_source(), sugg))
if sugg < curmax:
curmax = sugg
seq.append(curmax)
log.info('%d updating to %s' % (rank, curmax))
was_updated = True
else:
# ignore
pass
# check if next message is already waiting:
if comm.iprobe(source=MPI.ANY_SOURCE, tag=12):
# consume it first before broadcasting outdated info
continue
if was_updated:
log.info('%d broadcasting new limit %f...' % (rank, curmax))
comm.Ibcast(numpy.array([curmax]))
was_updated = False
else:
# no message waiting for us and no broadcast done, so pause
time.sleep(0.1)
print
print data, rank
else:
log.info('%d waiting for root to send us starting value...' % (rank))
nextmax = numpy.empty(1, dtype=float)
comm.Ibcast(nextmax).Wait()
amax = float(nextmax)
numpy.random.seed(rank)
update_req = comm.Ibcast(nextmax)
while True:
a = numpy.random.uniform()
if a < amax:
log.info('%d found new: %s, sending to root' % (rank, a))
amax = a
comm.isend(a, dest=0, tag=12)
s = update_req.Get_status()
#log.info('%d bcast status: %s' % (rank, s))
if s:
update_req.Wait()
log.info('%d receiving new limit from root, %s' % (rank, nextmax))
amax = float(nextmax)
update_req = comm.Ibcast(nextmax)

Related

Problem by dictionaries to use numba njit parallelization to accelerate the code

I have written a code and try to use numba for accelerating the code. The main goal of the code is to group some values based on a condition. In this regard, iter_ is used for converging the code to satisfy the condition. I prepared a small case below to reproduce the sample code:
import numpy as np
import numba as nb
rng = np.random.default_rng(85)
# --------------------------------------- small data volume ---------------------------------------
# values_ = {'R0': np.array([0.01090976, 0.01069902, 0.00724112, 0.0068463 , 0.01135723, 0.00990762,
# 0.01090976, 0.01069902, 0.00724112, 0.0068463 , 0.01135723]),
# 'R1': np.array([0.01836379, 0.01900166, 0.01864162, 0.0182823 , 0.01840322, 0.01653088,
# 0.01900166, 0.01864162, 0.0182823 , 0.01840322, 0.01653088]),
# 'R2': np.array([0.02430913, 0.02239156, 0.02225379, 0.02093393, 0.02408692, 0.02110411,
# 0.02239156, 0.02225379, 0.02093393, 0.02408692, 0.02110411])}
#
# params = {'R0': [3, 0.9490579204466154, 1825, 7.070272000000002e-05],
# 'R1': [0, 0.9729203826820172, 167 , 7.070272000000002e-05],
# 'R2': [1, 0.6031363088057902, 1316, 8.007296000000003e-05]}
#
# Sno, dec_, upd_ = 2, 100, 200
# -------------------------------------------------------------------------------------------------
# ----------------------------- UPDATED (medium and large data volumes) ---------------------------
# values_ = np.load("values_med.npy", allow_pickle=True)[()]
# params = np.load("params_med.npy", allow_pickle=True)[()]
values_ = np.load("values_large.npy", allow_pickle=True)[()]
params = np.load("params_large.npy", allow_pickle=True)[()]
Sno, dec_, upd_ = 2000, 1000, 200
# -------------------------------------------------------------------------------------------------
# values_ = [*values_.values()]
# params = [*params.values()]
# #nb.jit(forceobj=True)
# def test(values_, params, Sno, dec_, upd_):
final_dict = {}
for i, j in enumerate(values_.keys()):
Rand_vals = []
goal_sum = params[j][1] * params[j][3]
tel = goal_sum / dec_ * 10
if params[j][0] != 0:
for k in range(Sno):
final_sum = 0.0
iter_ = 0
t = 1
while not np.allclose(goal_sum, final_sum, atol=tel):
iter_ += 1
vals_group = rng.choice(values_[j], size=params[j][0], replace=False)
# final_sum = 0.0016 * np.sum(vals_group) # -----> For small data volume
final_sum = np.sum(vals_group ** 3) # -----> UPDATED For med or large data volume
if iter_ == upd_:
t += 1
tel = t * tel
values_[j] = np.delete(values_[j], np.where(np.in1d(values_[j], vals_group)))
Rand_vals.append(vals_group)
else:
Rand_vals = [np.array([])] * Sno
final_dict["R" + str(i)] = Rand_vals
# return final_dict
# test(values_, params, Sno, dec_, upd_)
At first, for applying numba on this code #nb.jit was used (forceobj=True is used for avoiding warnings and …), which will have adverse effect on the performance. nopython is checked, too, with #nb.njit which get the following error due to not supporting (as mentioned in 1, 2) dictionary type of the inputs:
cannot determine Numba type of <class 'dict'>
I don't know if (how) it could be handled by Dict from numba.typed (by converting created python dictionaries to numba Dict) or if converting the dictionaries to lists of arrays have any advantage. I think, parallelization may be possible if some code lines e.g. Rand_vals.append(vals_group) or else section or … be taken or be modified out of the function to get the same results as before, but I don't have any idea how to do so.
I will be grateful for helping utilize numba on this code. numba parallelization will be the most desired (probably the best applicable method in terms of performance) solution if it could.
Data:
medium data volume: values_med, params_med
large data volume: values_large, params_large

This code can be converted to Numba but it is not straightforward.
First of all, the dictionary and list type must be defined since Numba njit functions cannot directly operate on reflected lists (aka. pure-python lists). This is a bit tedious to do in Numba and the resulting code is a bit verbose:
String = nb.types.unicode_type
ValueArray = nb.float64[::1]
ValueDict = nb.types.DictType(String, ValueArray)
ParamDictValue = nb.types.Tuple([nb.int_, nb.float64, nb.int_, nb.float64])
ParamDict = nb.types.DictType(String, ParamDictValue)
FinalDictValue = nb.types.ListType(ValueArray)
FinalDict = nb.types.DictType(String, FinalDictValue)
Then you need to convert the input dictionaries:
nbValues = nb.typed.typeddict.Dict.empty(String, ValueArray)
for key,value in values_.items():
nbValues[key] = value.copy()
nbParams = nb.typed.typeddict.Dict.empty(String, ParamDictValue)
for key,value in params.items():
nbParams[key] = (nb.int_(value[0]), nb.float64(value[1]), nb.int_(value[2]), nb.float64(value[3]))
Then, you need to write the core function. np.allclose and np.isin are not implemented in Numba so they should be reimplemented manually. But the main point is that Numba does not support the rng Numpy object. I think it will certainly not support it any time soon. Note that Numba has an random numbers implementation that try to mimic the behavior of Numpy but the management of the seed is a bit different. Note also that results should be the same with the np.random.xxx Numpy functions if the seed is set to the same value (Numpy and Numba have different seed variables that are not synchronized).
#nb.njit(FinalDict(ValueDict, ParamDict, nb.int_, nb.int_, nb.int_))
def nbTest(values_, params, Sno, dec_, upd_):
final_dict = nb.typed.Dict.empty(String, FinalDictValue)
for i, j in enumerate(values_.keys()):
Rand_vals = nb.typed.List.empty_list(ValueArray)
goal_sum = params[j][1] * params[j][3]
tel = goal_sum / dec_ * 10
if params[j][0] != 0:
for k in range(Sno):
final_sum = 0.0
iter_ = 0
t = 1
vals_group = np.empty(0, dtype=nb.float64)
while np.abs(goal_sum - final_sum) > (1e-05 * np.abs(final_sum) + tel):
iter_ += 1
vals_group = np.random.choice(values_[j], size=params[j][0], replace=False)
final_sum = 0.0016 * np.sum(vals_group)
# final_sum = 0.0016 * np.sum(vals_group) # (for small data volume)
final_sum = np.sum(vals_group ** 3) # (for med or large data volume)
if iter_ == upd_:
t += 1
tel = t * tel
# Perform an in-place deletion
vals, gr = values_[j], vals_group
cur = 0
for l in range(vals.size):
found = False
for m in range(gr.size):
found |= vals[l] == gr[m]
if not found:
# Keep the value (delete it otherwise)
vals[cur] = vals[l]
cur += 1
values_[j] = vals[:cur]
Rand_vals.append(vals_group)
else:
for k in range(Sno):
Rand_vals.append(np.empty(0, dtype=nb.float64))
final_dict["R" + str(i)] = Rand_vals
return final_dict
Note that the replacement implementation of np.isin is quite naive but it works pretty well in practice on your input example.
The function can be called using the following way:
nbFinalDict = nbTest(nbValues, nbParams, Sno, dec_, upd_)
Finally, the dictionary should be converted back to basic Python objects:
finalDict = dict()
for key,value in nbFinalDict.items():
finalDict[key] = list(value)
This implementation is fast for small inputs but not large ones since np.random.choice takes almost all the time (>96%). The thing is this function is clearly not optimal when the number of requested item is small (which is your case). Indeed, it surprisingly runs in linear time of the input array and not in linear time of the number of requested items.
Further Optimizations
The algorithm can be completely rewritten to extract only 12 random items and discard them from the main currant array in a much more efficient way. The idea is to swap n items (small target sample) at the end of the array with other items at random locations, then check the sum, repeat this process until a condition is fulfilled, and finally extract the view to the last n items before resizing the view so to discard the last items. All of this can be done in O(n) time rather than O(m) time where m is the size of the main current array with n << m (eg. 12 VS 20_000). It can also be compute without any expensive allocation. Here is the resulting code:
#nb.njit(nb.void(ValueArray, nb.int_, nb.int_))
def swap(arr, i, j):
arr[i], arr[j] = arr[j], arr[i]
#nb.njit(FinalDict(ValueDict, ParamDict, nb.int_, nb.int_, nb.int_))
def nbTest(values_, params, Sno, dec_, upd_):
final_dict = nb.typed.Dict.empty(String, FinalDictValue)
for i, j in enumerate(values_.keys()):
Rand_vals = nb.typed.List.empty_list(ValueArray)
goal_sum = params[j][1] * params[j][3]
tel = goal_sum / dec_ * 10
values = values_[j]
n = params[j][0]
if n != 0:
for k in range(Sno):
final_sum = 0.0
iter_ = 0
t = 1
m = values.size
assert n <= m
group = values[-n:]
while np.abs(goal_sum - final_sum) > (1e-05 * np.abs(final_sum) + tel):
iter_ += 1
# Swap the group view with other random items
for pos in range(m - n, m):
swap(values, pos, np.random.randint(0, m))
# For small data volume:
# final_sum = 0.0016 * np.sum(group)
# For med/large data volume
final_sum = 0.0
for v in group:
final_sum += v ** 3
if iter_ == upd_:
t += 1
tel *= t
assert iter_ > 0
values = values[:m-n]
Rand_vals.append(group)
else:
for k in range(Sno):
Rand_vals.append(np.empty(0, dtype=nb.float64))
final_dict["R" + str(i)] = Rand_vals
return final_dict
In addition to being faster, this implementation as the benefit of being also simpler. Results looks quite similar to the previous implementation despite the randomness make the check of the results tricky (especially since this function does not use the same method to choose the random sample). Note that this implementation does not remove items in values that are in group as opposed to the previous one (this is probably not wanted though).
Benchmark
Here are the results of the last implementation on my machine (compilation and conversion timings excluded):
Provided small input (embedded in the question):
- Initial code: 42.71 ms
- Numba code: 0.11 ms
Medium input:
- Initial code: 3481 ms
- Numba code: 11 ms
Large input:
- Initial code: 6728 ms
- Numba code: 20 ms
Note that the conversion time takes about the same time than the computation.
This last implementation is 316~388 times faster than the initial code on small inputs.
Notes
Note that the compilation time takes few seconds due to the dict and lists types.
Note that while it may be possible to parallelise the implementation, only the most encompassing loop can be parallelised. The thing is there is only few items to compute and the time is already quite small (not the best case for multi-threading). <-- Additionally, the creation of many temporary arrays (created by rng.choice) will certainly cause the parallel loop not to scale well anyway. --> Additionally, the list/dict cannot be written from multiple threads safely so one need to use Numpy arrays in the whole function to be able to do that (or add additional conversion that are already expensive). Moreover, Numba parallelism tends to increase significantly the compilation time which is already significant. Finally, the result will be less deterministic since each Numba thread has its own random number generator seed and the items computed by the threads cannot be predicted with prange (dependent of the parallel runtime chosen on the target platform). Note that in Numpy there is one global seed by default used by usual random functions (deprecated way) and RNG objects have their own seed (new preferred way).

Minimum Jumps required from given number to 1

I am trying to solve the problem given below: (I solved it using recursion but I am having a hard time trying to use a cache to prevent a lot of the same steps being recalculated.
"""
Given a positive integer N, find the smallest number of steps it will take to reach 1.
There are two kinds of permitted steps:
You may decrement N to N - 1.
If a * b = N, you may decrement N to the larger of a and b.
For example, given 100, you can reach 1 in five steps with the following route:
100 -> 10 -> 9 -> 3 -> 2 -> 1.
"""
def multiples(num):
ret = []
start = 2
while start < num:
if num % start == 0:
ret.append(num // start)
start +=1
if ret and start >= ret[-1]:
break
return ret if ret else None
def min_jumps_no_cache(num, jumps=0):
if num == 1:
return jumps
mults = multiples(num)
res = []
res.append(min_jumps(num - 1, jumps +1))
if mults:
for mult in mults:
res.append(min_jumps(mult , jumps +1))
return min(res)
Now, I am trying to add a cache in here because of the obvious high runtime of this solution. But I have run into a similar issue before where I am overwriting the cache and I was curious if there is a solution to this.
def min_jumps_cache(num, jumps=0, cache={}):
if num == 1:
return jumps
if num in cache:
return cache[num]
mults = multiples(num)
res = []
temp = min_jumps_cache(num - 1, jumps +1, cache)
res.append(temp)
if mults:
for mult in mults:
res.append(min_jumps_cache(mult , jumps +1, cache))
temp = min(res)
cache[num] = min(temp, cache[num]) if num in cache else temp
return temp
It seems to me the issue here is you can't really cache an answer until you have calculated both its "left and right" solutions to find the answer. Is there something else I am missing here?

Your solution is fine so far as it goes.
However it will be more efficient to do a breadth-first solution from the bottom up. (This can trivially be optimized more. How is an exercise for the reader.)
def path (n):
path_from = {}
queue = [(1, None)]
while True:
value, prev = queue.pop(0)
value not in path_from:
path_from[value] = prev
if value == n:
break # EXIT HERE
queue.append((value+1, value))
for i in range(2, min(value+1, n//value + 1)):
queue.append((i*value, value))
answer = []
while n is not None:
answer.append(n)
n = path_from[n]
return(answer)
print(path(100))

How to parallelize computation of pairwise distance matrix?

My problem is roughly as follows. Given a numerical matrix X, where each row is an item. I want to find each row's nearest neighbor in terms of L2 distance in all rows except itself. I tried reading the official documentation but was still a little confused about how to achieve this. Could someone give me some hint?
My code is as follows
function l2_dist(v1, v2)
return sqrt(sum((v1 - v2) .^ 2))
end
function main(Mat, dist_fun)
n = size(Mat, 1)
Dist = SharedArray{Float64}(n) #[Inf for i in 1:n]
Id = SharedArray{Int64}(n) #[-1 for i in 1:n]
#parallel for i = 1:n
Dist[i] = Inf
Id[i] = 0
end
Threads.#threads for i in 1:n
for j in 1:n
if i != j
println(i, j)
dist_temp = dist_fun(Mat[i, :], Mat[j, :])
if dist_temp < Dist[i]
println("Dist updated!")
Dist[i] = dist_temp
Id[i] = j
end
end
end
end
return Dict("Dist" => Dist, "Id" => Id)
end
n = 4000
p = 30
X = [rand() for i in 1:n, j in 1:p];
main(X[1:30, :], l2_dist)
#time N = main(X, l2_dist)
I'm trying to distributed all the i's (i.e. calculating each row minimum) over different cores. But the version above apparently isn't working correctly. It is even slower than the sequential version. Can someone point me to the right direction? Thanks.

Maybe you're doing something in addition to what you have written down, but, at this point from what I can see, you aren't actually doing any computations in parallel. Julia requires you to tell it how many processors (or threads) you would like it to have access to. You can do this through either
Starting Julia with multiple processors julia -p # (where # is the number of processors you want Julia to have access to)
Once you have started a Julia "session" you can call the addprocs function to add additional processors.
To have more than 1 thread, you need to run command export JULIA_NUM_THREADS = #. I don't know very much about threading, so I will be sticking with the #parallel macro. I suggest reading documentation for more details on threading -- Maybe #Chris Rackauckas could expand a little more on the difference.
A few comments below about my code and on your code:
I'm on version 0.6.1-pre.0. I don't think I'm doing anything 0.6 specific, but this is a heads up just in case.
I'm going to use the Distances.jl package when computing the distances between vectors. I think it is a good habit to farm out as many of my computations to well-written and well-maintained packages as possible.
Rather than compute the distance between rows, I'm going to compute the distance between columns. This is because Julia is a column-major language, so this will increase the number of cache hits and give a little extra speed. You can obviously get the row-wise results you want by just transposing the input.
Unless you expect to have that many memory allocations then that many allocations are a sign that something in your code is inefficient. It is often a type stability problem. I don't know if that was the case in your code before, but that doesn't seem to be an issue in the current version (it wasn't immediately clear to me why you were having so many allocations).
Code is below
# Make sure all processors have access to Distances package
#everywhere using Distances
# Create a random matrix
nrow = 30
ncol = 4000
# Seed creation of random matrix so it is always same matrix
srand(42)
X = rand(nrow, ncol)
function main(X::AbstractMatrix{Float64}, M::Distances.Metric)
# Get size of the matrix
nrow, ncol = size(X)
# Create `SharedArray` to store output
ind_vec = SharedArray{Int}(ncol)
dist_vec = SharedArray{Float64}(ncol)
# Compute the distance between columns
#sync #parallel for i in 1:ncol
# Initialize various temporary variables
min_dist_i = Inf
min_ind_i = -1
X_i = view(X, :, i)
# Check distance against all other columns
for j in 1:ncol
# Skip comparison with itself
if i==j
continue
end
# Tell us who is doing the work
# (can uncomment if you want to verify stuff)
# println("Column $i compared with Column $j by worker $(myid())")
# Evaluate the new distance...
# If it is less then replace it, otherwise proceed
dist_temp = evaluate(M, X_i, view(X, :, j))
if dist_temp < min_dist_i
min_dist_i = dist_temp
min_ind_i = j
end
end
# Which column is minimum distance from column i
dist_vec[i] = min_dist_i
ind_vec[i] = min_ind_i
end
return dist_vec, ind_vec
end
# Using Euclidean metric
metric = Euclidean()
inds, dist = main(X, metric)
#time main(X, metric);
#show dist[[1, 5, 25]], inds[[1, 5, 25]]
You can run the code with
1 processor julia testfile.jl
% julia testfile.jl
0.640365 seconds (16.00 M allocations: 732.495 MiB, 3.70% gc time)
(dist[[1, 5, 25]], inds[[1, 5, 25]]) = ([2541, 2459, 1602], [1.40892, 1.38206, 1.32184])
n processors (in this case 4) julia -p n testfile.jl
% julia -p 4 testfile.jl
0.201523 seconds (2.10 k allocations: 99.107 KiB)
(dist[[1, 5, 25]], inds[[1, 5, 25]]) = ([2541, 2459, 1602], [1.40892, 1.38206, 1.32184])

How to speed up monte carlo simulation in python

I have written a poker simulator that calculates the probability to win in texas holdem by running a simulation of games (i.e. monte carlo simulation). Currently it runs around 10000 simulations in 10 seconds which in general is good enough. However, there are apps on the iPhone that are running around 100x faster. I'm wondering whether there's anything I can do to speed up the program significantly. I have already replaced strings with lists and found many ways to speed up the program by around 2-3 times. But what can I do to possibly speed it up 50x-100x? I checked with a profiler but couldn't find any significant bottleneck. I also complied it to cython (without making any changes) but that had no effect on speed either. Any suggestions are appreciated. The full listing is below:
__author__ = 'Nicolas Dickreuter'
import time
import numpy as np
from collections import Counter
class MonteCarlo(object):
def EvalBestHand(self, hands):
scores = [(i, self.score(hand)) for i, hand in enumerate(hands)]
winner = sorted(scores, key=lambda x: x[1], reverse=True)[0][0]
return hands[winner],scores[winner][1][-1]
def score(self, hand):
crdRanksOriginal = '23456789TJQKA'
originalSuits='CDHS'
rcounts = {crdRanksOriginal.find(r): ''.join(hand).count(r) for r, _ in hand}.items()
score, crdRanks = zip(*sorted((cnt, rank) for rank, cnt in rcounts)[::-1])
potentialThreeOfAKind = score[0] == 3
potentialTwoPair = score == (2, 2, 1, 1, 1)
potentialPair = score == (2, 1, 1, 1, 1, 1)
if score[0:2]==(3,2) or score[0:2]==(3,3): # fullhouse (three of a kind and pair, or two three of a kind)
crdRanks = (crdRanks[0],crdRanks[1])
score = (3,2)
elif score[0:4]==(2,2,2,1):
score=(2,2,1) # three pair are not worth more than two pair
sortedCrdRanks = sorted(crdRanks,reverse=True) # avoid for example 11,8,6,7
crdRanks=(sortedCrdRanks[0],sortedCrdRanks[1],sortedCrdRanks[2],sortedCrdRanks[3])
elif len(score) >= 5: # high card, flush, straight and straight flush
# straight
if 12 in crdRanks: # adjust if 5 high straight
crdRanks += (-1,)
sortedCrdRanks = sorted(crdRanks,reverse=True) # sort again as if pairs the first rank matches the pair
for i in range(len(sortedCrdRanks) - 4):
straight = sortedCrdRanks[i] - sortedCrdRanks[i + 4] == 4
if straight:
crdRanks=(sortedCrdRanks[i],sortedCrdRanks[i+1],sortedCrdRanks[i+2],sortedCrdRanks[i+3],sortedCrdRanks[i+4])
break
# flush
suits = [s for _, s in hand]
flush = max(suits.count(s) for s in suits) >= 5
if flush:
for flushSuit in originalSuits: # get the suit of the flush
if suits.count(flushSuit)>=5:
break
flushHand = [k for k in hand if flushSuit in k]
rcountsFlush = {crdRanksOriginal.find(r): ''.join(flushHand).count(r) for r, _ in flushHand}.items()
score, crdRanks = zip(*sorted((cnt, rank) for rank, cnt in rcountsFlush)[::-1])
crdRanks = tuple(sorted(crdRanks,reverse=True)) # ignore original sorting where pairs had influence
# check for straight in flush
if 12 in crdRanks: # adjust if 5 high straight
crdRanks += (-1,)
for i in range(len(crdRanks) - 4):
straight = crdRanks[i] - crdRanks[i + 4] == 4
# no pair, straight, flush, or straight flush
score = ([(5,), (2, 1, 2)], [(3, 1, 3), (5,)])[flush][straight]
if score == (1,) and potentialThreeOfAKind: score = (3, 1)
elif score == (1,) and potentialTwoPair: score = (2, 2, 1)
elif score == (1,) and potentialPair: score = (2, 1, 1)
if score[0]==5:
handType="StraightFlush"
#crdRanks=crdRanks[:5] # five card rule makes no difference {:5] would be incorrect
elif score[0]==4:
handType="FoufOfAKind"
crdRanks=crdRanks[:2]
elif score[0:2]==(3,2):
handType="FullHouse"
# crdRanks=crdRanks[:2] # already implmeneted above
elif score[0:3]==(3,1,3):
handType="Flush"
crdRanks=crdRanks[:5] # !! to be verified !!
elif score[0:3]==(3,1,2):
handType="Straight"
crdRanks=crdRanks[:5] # !! to be verified !!
elif score[0:2]==(3,1):
handType="ThreeOfAKind"
crdRanks=crdRanks[:3]
elif score[0:2]==(2,2):
handType="TwoPair"
crdRanks=crdRanks[:3]
elif score[0]==2:
handType="Pair"
crdRanks=crdRanks[:4]
elif score[0]==1:
handType="HighCard"
crdRanks=crdRanks[:5]
else: raise Exception('Card Type error!')
return score, crdRanks, handType
def createCardDeck(self):
values = "23456789TJQKA"
suites = "CDHS"
Deck=[]
[Deck.append(x+y) for x in values for y in suites]
return Deck
def distributeToPlayers(self, Deck, PlayerAmount, PlayerCardList, TableCardsList):
Players =[]
CardsOnTable = []
knownPlayers = 0
for PlayerCards in PlayerCardList:
FirstPlayer=[]
FirstPlayer.append(Deck.pop(Deck.index(PlayerCards[0])))
FirstPlayer.append(Deck.pop(Deck.index(PlayerCards[1])))
Players.append(FirstPlayer)
knownPlayers += 1
for c in TableCardsList:
CardsOnTable.append(Deck.pop(Deck.index(c))) # remove cards that are on the table from the deck
for n in range(0, PlayerAmount - knownPlayers):
plr=[]
plr.append(Deck.pop(np.random.random_integers(0,len(Deck)-1)))
plr.append(Deck.pop(np.random.random_integers(0,len(Deck)-1)))
Players.append(plr)
return Players, Deck
def distributeToTable(self, Deck, TableCardsList):
remaningRandoms = 5 - len(TableCardsList)
for n in range(0, remaningRandoms):
TableCardsList.append(Deck.pop(np.random.random_integers(0,len(Deck)-1)))
return TableCardsList
def RunMonteCarlo(self, originalPlayerCardList, originalTableCardsList, PlayerAmount, gui, maxRuns=6000,maxSecs=5):
winnerlist = []
EquityList = []
winnerCardTypeList=[]
wins = 0
runs=0
timeout_start=time.time()
for m in range(maxRuns):
runs+=1
try:
if gui.active==True:
gui.progress["value"] = int(round(m*100/maxRuns))
except:
pass
Deck = self.createCardDeck()
PlayerCardList = originalPlayerCardList[:]
TableCardsList = originalTableCardsList[:]
Players, Deck = self.distributeToPlayers(Deck, PlayerAmount, PlayerCardList, TableCardsList)
Deck5Cards = self.distributeToTable(Deck, TableCardsList)
PlayerFinalCardsWithTableCards = []
for o in range(0, PlayerAmount):
PlayerFinalCardsWithTableCards.append(Players[o]+Deck5Cards)
bestHand,winnerCardType=self.EvalBestHand(PlayerFinalCardsWithTableCards)
winner = (PlayerFinalCardsWithTableCards.index(bestHand))
#print (winnerCardType)
CollusionPlayers = 0
if winner < CollusionPlayers + 1:
wins += 1
winnerCardTypeList.append(winnerCardType)
# winnerlist.append(winner)
# self.equity=wins/m
# if self.equity>0.99: self.equity=0.99
# EquityList.append(self.equity)
if time.time()>timeout_start+maxSecs:
break
self.equity = wins / runs
self.winnerCardTypeList = Counter(winnerCardTypeList)
for key, value in self.winnerCardTypeList.items():
self.winnerCardTypeList[key] = value / runs
self.winTypesDict=self.winnerCardTypeList.items()
# show how the montecarlo converges
# xaxis=range(500,monteCarloRuns)
# plt.plot(xaxis,EquityList[499:monteCarloRuns])
# plt.show()
return self.equity,self.winTypesDict
if __name__ == '__main__':
Simulation = MonteCarlo()
mycards=[['AS', 'KS']]
cardsOnTable = []
players = 3
start_time = time.time()
Simulation.RunMonteCarlo(mycards, cardsOnTable, players, 1, maxRuns=200000, maxSecs=120)
print("--- %s seconds ---" % (time.time() - start_time))
equity = Simulation.equity # considering draws as wins
print (equity)

This algorithm is too long and complex for any sensible suggestion apart from generic ones. So here you go: vectorise everything you can and work with vectors of data instead of loops, lists, insertions, removals. Montecarlo simulations allow for this quite nicely because all samples are independent.
For example, you want 1 million retries, generate a numpy array of random 1 million values in one line. You want to test outcome, place 1 million outcomes into a numpy array and test all of them at once against a given condition generating a numpy array of booleans.
If you manage to vectorize this you will get tremendous performance increases.

Inefficient array operations are probably causing most of the slowness. In particular, I see a lot of lines like this:
Deck.pop(some_index)
Every time you do that, the list has to shift all the elements after that index. You can eliminate this (as well as repeated random functions) by using:
from random import shuffle
# Setup
originalDeck = createCardDeck()
# Individual run
Deck = originalDeck[:]
shuffle(Deck) # shuffles in place
# Draw a card from the end so nothing needs to shift
card = Deck.pop()
# Concise and efficient!
There are some other things you could do, like change long if-else chains into constant time lookups with a dict or custom class. Right now it checks for a straight flush first down to a high card last for every hand, which seems rather inefficient. I doubt that it would make a big difference though.

Retracing a simulated-annealing's optimization steps

I'm using simulated annealing to help solve a problem such as the travelling salesman problem. I get a solution, which I'm happy with, but I would now like to know the path, within the solution space (that is, the steps taken by the algorithm to reach the solution), the algorithm took to go from a random, far from optimal itinerary, to pretty optimal one.
In other words, in the case of the traveling salesman problem:
Start with a random itinerary between cities
Step 1: path between city A-C and E-F were swapped
Step 2: path between city G-U and S-Q were swapped
...
End up with a pretty optimal itinerary between cities
Is there a way, using simulated annealing, to save each steps of the optimization process so that I can retrace, one by one, each change made to the system which eventually led to that particular solution? Or this goes against how a simulated annealing works?
Here is a great simulated annealing algorithm, by #perrygeo, very slightly modified from https://github.com/perrygeo/simanneal/blob/master/simanneal/anneal.py using the travelling salesman example: https://github.com/perrygeo/simanneal/blob/master/examples/salesman.py
(All my changes are followed by a one-line comment preceded by """)
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import
from __future__ import unicode_literals
import copy
import math
import sys
import time
import random
def round_figures(x, n):
"""Returns x rounded to n significant figures."""
return round(x, int(n - math.ceil(math.log10(abs(x)))))
def time_string(seconds):
"""Returns time in seconds as a string formatted HHHH:MM:SS."""
s = int(round(seconds)) # round to nearest second
h, s = divmod(s, 3600) # get hours and remainder
m, s = divmod(s, 60) # split remainder into minutes and seconds
return '%4i:%02i:%02i' % (h, m, s)
class Annealer(object):
"""Performs simulated annealing by calling functions to calculate
energy and make moves on a state. The temperature schedule for
annealing may be provided manually or estimated automatically.
"""
Tmax = 25000.0
Tmin = 2.5
steps = 50000
updates = 100
copy_strategy = 'deepcopy'
def __init__(self, initial_state):
self.initial_state = initial_state
self.state = self.copy_state(initial_state)
def set_schedule(self, schedule):
"""Takes the output from `auto` and sets the attributes
"""
self.Tmax = schedule['tmax']
self.Tmin = schedule['tmin']
self.steps = int(schedule['steps'])
def copy_state(self, state):
"""Returns an exact copy of the provided state
Implemented according to self.copy_strategy, one of
* deepcopy : use copy.deepcopy (slow but reliable)
* slice: use list slices (faster but only works if state is list-like)
* method: use the state's copy() method
"""
if self.copy_strategy == 'deepcopy':
return copy.deepcopy(state)
elif self.copy_strategy == 'slice':
return state[:]
elif self.copy_strategy == 'method':
return state.copy()
def update(self, step, T, E, acceptance, improvement):
"""Prints the current temperature, energy, acceptance rate,
improvement rate, elapsed time, and remaining time.
The acceptance rate indicates the percentage of moves since the last
update that were accepted by the Metropolis algorithm. It includes
moves that decreased the energy, moves that left the energy
unchanged, and moves that increased the energy yet were reached by
thermal excitation.
The improvement rate indicates the percentage of moves since the
last update that strictly decreased the energy. At high
temperatures it will include both moves that improved the overall
state and moves that simply undid previously accepted moves that
increased the energy by thermal excititation. At low temperatures
it will tend toward zero as the moves that can decrease the energy
are exhausted and moves that would increase the energy are no longer
thermally accessible."""
elapsed = time.time() - self.start
if step == 0:
print(' Temperature Energy Accept Improve Elapsed Remaining')
print('%12.2f %12.2f %s ' % \
(T, E, time_string(elapsed)))
else:
remain = (self.steps - step) * (elapsed / step)
print('%12.2f %12.2f %7.2f%% %7.2f%% %s %s' % \
(T, E, 100.0 * acceptance, 100.0 * improvement,
time_string(elapsed), time_string(remain)))
def anneal(self):
"""Minimizes the energy of a system by simulated annealing.
Parameters
state : an initial arrangement of the system
Returns
(state, energy): the best state and energy found.
"""
step = 0
self.start = time.time()
steps = [] ### initialise a list to save the steps taken by the algorithm to find a good solution
# Precompute factor for exponential cooling from Tmax to Tmin
if self.Tmin <= 0.0:
raise Exception('Exponential cooling requires a minimum "\
"temperature greater than zero.')
Tfactor = -math.log(self.Tmax / self.Tmin)
# Note initial state
T = self.Tmax
E = self.energy()
prevState = self.copy_state(self.state)
prevEnergy = E
bestState = self.copy_state(self.state)
bestEnergy = E
trials, accepts, improves = 0, 0, 0
if self.updates > 0:
updateWavelength = self.steps / self.updates
self.update(step, T, E, None, None)
# Attempt moves to new states
while step < self.steps:
step += 1
T = self.Tmax * math.exp(Tfactor * step / self.steps)
a,b = self.move()
E = self.energy()
dE = E - prevEnergy
trials += 1
if dE > 0.0 and math.exp(-dE / T) < random.random():
# Restore previous state
self.state = self.copy_state(prevState)
E = prevEnergy
else:
# Accept new state and compare to best state
accepts += 1
if dE < 0.0:
improves += 1
prevState = self.copy_state(self.state)
prevEnergy = E
steps.append([a,b]) ### append the "good move" to the list of steps
if E < bestEnergy:
bestState = self.copy_state(self.state)
bestEnergy = E
if self.updates > 1:
if step // updateWavelength > (step - 1) // updateWavelength:
self.update(
step, T, E, accepts / trials, improves / trials)
trials, accepts, improves = 0, 0, 0
# Return best state and energy
return bestState, bestEnergy, steps ### added steps to what should be returned
def distance(a, b):
"""Calculates distance between two latitude-longitude coordinates."""
R = 3963 # radius of Earth (miles)
lat1, lon1 = math.radians(a[0]), math.radians(a[1])
lat2, lon2 = math.radians(b[0]), math.radians(b[1])
return math.acos(math.sin(lat1) * math.sin(lat2) +
math.cos(lat1) * math.cos(lat2) * math.cos(lon1 - lon2)) * R
class TravellingSalesmanProblem(Annealer):
"""Test annealer with a travelling salesman problem.
"""
# pass extra data (the distance matrix) into the constructor
def __init__(self, state, distance_matrix):
self.distance_matrix = distance_matrix
super(TravellingSalesmanProblem, self).__init__(state) # important!
def move(self):
"""Swaps two cities in the route."""
a = random.randint(0, len(self.state) - 1)
b = random.randint(0, len(self.state) - 1)
self.state[a], self.state[b] = self.state[b], self.state[a]
return a,b ### return the change made
def energy(self):
"""Calculates the length of the route."""
e = 0
for i in range(len(self.state)):
e += self.distance_matrix[self.state[i-1]][self.state[i]]
return e
if __name__ == '__main__':
# latitude and longitude for the twenty largest U.S. cities
cities = {
'New York City': (40.72, 74.00),
'Los Angeles': (34.05, 118.25),
'Chicago': (41.88, 87.63),
'Houston': (29.77, 95.38),
'Phoenix': (33.45, 112.07),
'Philadelphia': (39.95, 75.17),
'San Antonio': (29.53, 98.47),
'Dallas': (32.78, 96.80),
'San Diego': (32.78, 117.15),
'San Jose': (37.30, 121.87),
'Detroit': (42.33, 83.05),
'San Francisco': (37.78, 122.42),
'Jacksonville': (30.32, 81.70),
'Indianapolis': (39.78, 86.15),
'Austin': (30.27, 97.77),
'Columbus': (39.98, 82.98),
'Fort Worth': (32.75, 97.33),
'Charlotte': (35.23, 80.85),
'Memphis': (35.12, 89.97),
'Baltimore': (39.28, 76.62)
}
# initial state, a randomly-ordered itinerary
init_state = list(cities.keys())
random.shuffle(init_state)
reconstructed_state = init_state
# create a distance matrix
distance_matrix = {}
for ka, va in cities.items():
distance_matrix[ka] = {}
for kb, vb in cities.items():
if kb == ka:
distance_matrix[ka][kb] = 0.0
else:
distance_matrix[ka][kb] = distance(va, vb)
tsp = TravellingSalesmanProblem(init_state, distance_matrix)
# since our state is just a list, slice is the fastest way to copy
tsp.copy_strategy = "slice"
state, e, steps = tsp.anneal()
while state[0] != 'New York City':
state = state[1:] + state[:1] # rotate NYC to start
print("Results:")
for city in state:
print("\t", city)
### recontructed the annealing process
print("")
print("nbr. of steps:",len(steps))
print("Reconstructed results:")
for s in steps:
reconstructed_state[s[0]], reconstructed_state[s[1]] = reconstructed_state[s[1]], reconstructed_state[s[0]]
while reconstructed_state[0] != 'New York City':
reconstructed_state = reconstructed_state[1:] + reconstructed_state[:1] # rotate NYC to start
for city in reconstructed_state:
print("\t", city)
Saving every time a move is made builds a huge list of steps that are indeed retraceable. However it obviously mimics how the algorithm explores and jumps around many different positions in the solution space, especially at high temperatures.
To get more straightly converging steps, I could move step-saving line:
steps.append([a,b]) ### append the "good move" to the list of steps
under
if E < bestEnergy:
Where only the steps actually improving upon the best found solution so far are saved. However, the final list of steps does not help reconstruct the itinerary anymore (steps are missing).
Is this problem hopeless and inherent to how simulated annealing works, or is there hope to being able to construct a converging step-list from random to quasi-optimal?

Saving the steps along the way does not depend on the algorithm, i.e. simulated annealing, but on your implementation or software you use. If it is your own implementation, it should be no problem at all to save these steps. If you use an event-listener approach, you can add arbitrary no. of listeners/clients to process your events. But you can also just write out your data to one client, e.g. file. If you do not use your own implementation, it depends on the software. If you use proprietary software, you are dependent on its API. If you use open source software and you do not find ways to retrieve such information in its API, you are usually allowed to modify the software for your own needs.

Ultimately the move itself is not important except in the context of rest of the system state. Instead of trying to track the move or change in state that resulted in each low energy solution (which, as you point out, loses all the information about previous moves that were accepted), you could just track the state itself every time you encountered a new lowest score:
if E < bestEnergy:
list_of_best_states.append(self.copy_state(self.state))

Given any randomized algorithm, if you use a pseudorandom generator, then you can save the generator's seed (random but short) and replay the execution by seeding the generator the same way.

I have started with the Ingber ASA code for a more conventional SA problem involving estimating about 72 error variables to give a best fit to measured data.
In my case, there are a limited number of trials that result in a reduced cost. Of several million trials, typically less than 40 are accepted. If I wanted to keep track of them, I would add a step in the code that accepts a new position to log the difference between the then-current and the now-new position.
Replaying these differences would give you the path.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Parallelisation of first-to-survive-wins loop - random

Related

Problem by dictionaries to use numba njit parallelization to accelerate the code

Minimum Jumps required from given number to 1

How to parallelize computation of pairwise distance matrix?

How to speed up monte carlo simulation in python

Retracing a simulated-annealing's optimization steps

Categories

Resources