How do I produce reproducible randomness with OpenAI Gym and SCOOP?
I want the exact same results every time I run the example. If possible, this should work with existing libraries that use a randomness provider (e.g. random and np.random), which can be a problem because they usually use the global random state and don't provide an interface for a local random state.
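A minimal sketch of the "local random state" idea, assuming only the standard random and numpy APIs (the function name do mirrors the script below but is otherwise illustrative):

```python
import random
import numpy as np

def do(it):
    # Instantiate local RNGs instead of touching the shared global state.
    py_rng = random.Random(it)           # stdlib RNG with its own state
    np_rng = np.random.RandomState(it)   # NumPy legacy RNG with its own state
    return (py_rng.random(), float(np_rng.uniform()))

# Same seed gives the same draws, no matter which process runs the task.
assert do(5) == do(5)
assert do(5) != do(6)
```

Libraries that only consume the global state can sometimes be handled by seeding immediately before each call, but local RNG objects are safer when tasks run concurrently.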
My example script looks like this:
import random
import numpy as np
from scoop import futures
import gym


def do(it):
    random.seed(it)
    np.random.seed(it)
    env.seed(it)
    env.action_space.seed(it)
    env.reset()
    observations = []
    for i in range(3):
        while True:
            action = env.action_space.sample()
            ob, reward, done, _ = env.step(action)
            observations.append(ob)
            if done:
                break
    return observations


env = gym.make("BipedalWalker-v3")

if __name__ == "__main__":
    maxit = 20
    results1 = futures.map(do, range(2, maxit))
    results2 = futures.map(do, range(2, maxit))
    for a, b in zip(results1, results2):
        if np.array_equiv(a, b):
            print("equal, yay")
        else:
            print("not equal :(")
Expected output: "equal, yay" on every line.
Actual output: "not equal :(" on multiple lines.
full output:
/home/chef/.venv/neuro/bin/python -m scoop /home/chef/dev/projekte/NeuroEvolution-CTRNN_new/random_test.py
[2020-05-18 18:05:03,578] launcher INFO SCOOP 0.7 1.1 on linux using Python 3.8.2 (default, Apr 27 2020, 15:53:34) [GCC 9.3.0], API: 1013
[2020-05-18 18:05:03,578] launcher INFO Deploying 4 worker(s) over 1 host(s).
[2020-05-18 18:05:03,578] launcher INFO Worker distribution:
[2020-05-18 18:05:03,578] launcher INFO 127.0.0.1: 3 + origin
/home/chef/.venv/neuro/lib/python3.8/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
/home/chef/.venv/neuro/lib/python3.8/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
/home/chef/.venv/neuro/lib/python3.8/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
/home/chef/.venv/neuro/lib/python3.8/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
equal, yay
not equal :(
not equal :(
not equal :(
not equal :(
not equal :(
equal, yay
not equal :(
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
not equal :(
equal, yay
equal, yay
equal, yay
not equal :(
[2020-05-18 18:05:08,554] launcher (127.0.0.1:37729) INFO Root process is done.
[2020-05-18 18:05:08,554] launcher (127.0.0.1:37729) INFO Finished cleaning spawned subprocesses.
Process finished with exit code 0
When I run this example without scoop, I get almost perfect results:
/home/chef/.venv/neuro/bin/python /home/chef/dev/projekte/NeuroEvolution-CTRNN_new/random_test.py
/home/chef/.venv/neuro/lib/python3.8/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
/home/chef/.venv/neuro/lib/python3.8/site-packages/scoop/fallbacks.py:38: RuntimeWarning: SCOOP was not started properly.
Be sure to start your program with the '-m scoop' parameter. You can find further information in the documentation.
Your map call has been replaced by the builtin serial Python map().
warnings.warn(
not equal :(
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
Process finished with exit code 0
I could "solve" it by moving the creation of the gym environment into the do function.
The full corrected code looks like this:
import random
import numpy as np
from scoop import futures
import gym


def do(it):
    env = gym.make("BipedalWalker-v3")
    random.seed(it)
    np.random.seed(it)
    env.seed(it)
    env.action_space.seed(it)
    env.reset()
    observations = []
    for i in range(3):
        while True:
            action = env.action_space.sample()
            ob, reward, done, _ = env.step(action)
            observations.append(ob)
            if done:
                break
    return observations


if __name__ == "__main__":
    maxit = 20
    results1 = futures.map(do, range(2, maxit))
    results2 = futures.map(do, range(2, maxit))
    for a, b in zip(results1, results2):
        if np.array_equiv(a, b):
            print("equal, yay")
        else:
            print("not equal :(")
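Whatever gym does internally, the pattern behind the fix generalizes: create and seed every stateful object inside the worker function, keyed by the task argument, so no worker-process-level state can leak between tasks. A minimal sketch of that pattern, with the stdlib multiprocessing.Pool standing in for SCOOP (a hypothetical setup, not the original script):

```python
import random
from multiprocessing import Pool

def do(it):
    # Fresh, locally seeded state per task: nothing survives between calls.
    rng = random.Random(it)
    return [rng.random() for _ in range(3)]

if __name__ == "__main__":
    with Pool(4) as pool:
        results1 = pool.map(do, range(2, 20))
        results2 = pool.map(do, range(2, 20))
    assert results1 == results2  # reproducible regardless of task scheduling
```

Because each call builds its own RNG from the task argument alone, the result depends only on the argument, not on which worker ran it or what ran before.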
Related
I have a problem which, when simplified:
- has a loop which samples new points
- evaluates them with a complex/slow function
- accepts them if the value is above an ever-increasing threshold.
Here is example code for illustration:
from numpy.random import uniform
from time import sleep


def userfunction(x):
    # do something complicated,
    # but computation always takes roughly the same time
    sleep(1)  # comment this out if too slow
    xnew = uniform()  # in reality, a non-trivial function of x
    y = -0.5 * xnew**2
    return xnew, y


x0, cur = userfunction([])
x = [x0]  # a sequence of points
while cur < -2e-16:
    # this should be parallelised
    # search for a new point higher than a threshold
    x1, next = userfunction(x)
    if next <= cur:
        # throw away (this branch is taken 99% of the time)
        pass
    else:
        cur = next
        print(cur)
        x.append(x1)  # note that userfunction depends on x
print(x)
I want to parallelise this (e.g. across a cluster), but the problem is that I need to terminate the other workers when a successful point has been found, or at least inform them of the new x (if they manage to get above the new threshold with an older x, the result is still acceptable). As long as no point has been successful, the workers need to keep trying.
I am looking for tools/frameworks which can handle this type of problem, in any scientific programming language (C, C++, Python, Julia, etc., no Fortran please).
Can this be solved with MPI semi-elegantly? I don't understand how I can inform/interrupt/update workers with MPI.
Update: added code comments to clarify that most tries are unsuccessful and do not influence the variable that userfunction depends on.
If userfunction() does not take too long, here is an option that qualifies as "MPI semi-elegantly".
To keep things simple, let's assume rank 0 is only an orchestrator and does not compute anything.
On rank 0:
    cur = 0
    x = []
    while cur < -2e-16:
        MPI_Recv(buf=cur+x1, src=MPI_ANY_SOURCE)
        x.append(x1)
        MPI_Ibcast(buf=cur+x, root=0, request=req)
        MPI_Wait(request=req)
On rank != 0:
    x0, cur = userfunction([])
    x = [x0]  # a sequence of points
    while cur < -2e-16:
        MPI_Ibcast(buf=newcur+newx, root=0, request=req)
        # search for a new point higher than a threshold
        x1, next = userfunction(x)
        if next <= cur:
            # throw away (this branch is taken 99% of the time)
            MPI_Test(request=req, flag=found)
            if found:
                MPI_Wait(request=req)
        else:
            cur = next
            MPI_Send(buffer=cur+x1, dest=0)
            MPI_Wait(request=req)
Extra logic is needed to correctly handle:
- rank 0 doing computation too
- several ranks finding the solution at the same time (subsequent messages must be consumed by rank 0)
Strictly speaking, a task is not "interrupted" when a solution is found on another task. Instead, each task periodically checks whether the solution was found by another task, so there is a delay between the time a solution is found somewhere and the time all tasks stop looking for solutions. But if userfunction() does not take "too long", this looks very acceptable to me.
I solved it roughly with the following code.
This transmits only curmax at the moment, but one can send the other array with a second broadcast+tag.
import numpy
import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

import logging
logging.basicConfig(filename='mpitest%d.log' % rank, level=logging.DEBUG)
logFormatter = logging.Formatter("[%(name)s %(levelname)s]: %(message)s")
consoleHandler = logging.StreamHandler()
consoleHandler.setFormatter(logFormatter)
consoleHandler.setLevel(logging.INFO)
logging.getLogger().addHandler(consoleHandler)
log = logging.getLogger(__name__)

if rank == 0:
    curmax = numpy.random.random()
    seq = [curmax]
    log.info('%d broadcasting starting value %f...' % (rank, curmax))
    comm.Ibcast(numpy.array([curmax]))
    was_updated = False
    while True:
        # check if news available
        status = MPI.Status()
        a_avail = comm.iprobe(source=MPI.ANY_SOURCE, tag=12, status=status)
        if a_avail:
            sugg = comm.recv(source=status.Get_source(), tag=12)
            log.info('%d received new limit from %d: %s' % (rank, status.Get_source(), sugg))
            if sugg < curmax:
                curmax = sugg
                seq.append(curmax)
                log.info('%d updating to %s' % (rank, curmax))
                was_updated = True
            else:
                # ignore
                pass
        # check if next message is already waiting:
        if comm.iprobe(source=MPI.ANY_SOURCE, tag=12):
            # consume it first before broadcasting outdated info
            continue
        if was_updated:
            log.info('%d broadcasting new limit %f...' % (rank, curmax))
            comm.Ibcast(numpy.array([curmax]))
            was_updated = False
        else:
            # no message waiting for us and no broadcast done, so pause
            time.sleep(0.1)
else:
    log.info('%d waiting for root to send us starting value...' % (rank))
    nextmax = numpy.empty(1, dtype=float)
    comm.Ibcast(nextmax).Wait()
    amax = float(nextmax)
    numpy.random.seed(rank)
    update_req = comm.Ibcast(nextmax)
    while True:
        a = numpy.random.uniform()
        if a < amax:
            log.info('%d found new: %s, sending to root' % (rank, a))
            amax = a
            comm.isend(a, dest=0, tag=12)
        s = update_req.Get_status()
        # log.info('%d bcast status: %s' % (rank, s))
        if s:
            update_req.Wait()
            log.info('%d receiving new limit from root, %s' % (rank, nextmax))
            amax = float(nextmax)
            update_req = comm.Ibcast(nextmax)
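The same "poll a stop flag instead of interrupting" pattern can be sketched in plain Python with multiprocessing, which may be easier to experiment with than MPI. The threshold 0.999 and the worker count are arbitrary illustration values, not from the original problem:

```python
import multiprocessing as mp
import random

def worker(seed, found, results):
    rng = random.Random(seed)
    while not found.is_set():       # periodic check, not interruption
        x = rng.uniform(0, 1)
        if x > 0.999:               # a "successful point"
            results.put((seed, x))
            found.set()             # signal the other workers to stop
            return

if __name__ == "__main__":
    found = mp.Event()
    results = mp.Queue()
    procs = [mp.Process(target=worker, args=(s, found, results))
             for s in range(4)]
    for p in procs:
        p.start()
    seed, x = results.get()         # first successful worker wins
    for p in procs:
        p.join()
    print("worker", seed, "found", x)
```

As with the MPI version, there is a small window between one worker setting the flag and the others noticing it, which is acceptable as long as one iteration of the search is cheap.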
I have the task of making a game against the computer. I should "think of a number" and the computer must guess it. If it guesses correctly, I say C and break out of the loop. If my number is lower, I say L and the computer should try a lower number; H is for higher, the opposite situation. So far I have managed to implement everything with one exception: with the code below, if I tell the computer L, for example, it will not exceed the limit of the last number; however, if I then say H, it starts generating numbers over the full range again.
Please bear in mind this is a task for a beginners' course (functions are not yet covered); we have to use loops. Don't worry about the functions getInteger and getLetter: they are helpers our professor created, similar to input(), which just prevent the user from entering anything other than an integer or a letter, respectively.
Here is the code:
from pcinput import getLetter, getInteger
from random import random, randint, seed

mynum = getInteger("My number is:")
comnum = randint(0, 1000)
print("Is your number:", comnum, "?")
while True:
    answer = getLetter("That is: ")
    if answer == "C":
        print("Congratulations!")
        break
    if answer == "L":
        comnum = randint(0, comnum)
        print("Is your number:", comnum, "?")
        possibility = range(comnum, )
        continue
    elif answer == "H":
        comnum = randint(comnum, 1000)
        print("Is your number:", comnum, "?")
        continue
*comnum is the number the computer is guessing
My question is basically how to fix this code so that the computer keeps a range between the lowest and highest guesses so far and never goes outside it, thus shortening the interval between guesses each time. (I hope you get my point.)
Thank you a lot!
My method would be (starting from your code):
from random import random, randint, seed

mynum = int(input("My number is: "))
maximum = 1000
minimum = 0
while True:
    comnum = randint(minimum, maximum)
    print("Is your number:", comnum, "?")
    answer = input("That is ")
    if answer == "C":
        print("Congratulations!")
        break
    if answer == "L":
        maximum = comnum - 1
        continue
    elif answer == "H":
        minimum = comnum + 1
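Since the answer only ever narrows [minimum, maximum], a deterministic variant can guess the midpoint instead of a random number, which guarantees at most 10 guesses for 0..1000. This bisection version is my own illustration, not part of the course task:

```python
def guess_number(secret, low=0, high=1000):
    """Simulate the computer's side: return (final_guess, number_of_guesses)."""
    guesses = 0
    while True:
        comnum = (low + high) // 2   # midpoint instead of randint
        guesses += 1
        if comnum == secret:         # the user would answer "C"
            return comnum, guesses
        elif comnum > secret:        # "L": the secret is lower
            high = comnum - 1
        else:                        # "H": the secret is higher
            low = comnum + 1

assert guess_number(742)[0] == 742
assert max(guess_number(s)[1] for s in range(1001)) <= 10
```

With randint the interval still shrinks, but the expected number of guesses is larger; bisection halves it every step, so 2^10 = 1024 >= 1001 bounds the worst case.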
Since ask.sagemath.org is down, I figured I'd ask this here...
I'm trying to parallelize the generation of a bunch of random primes using the random_prime function in sage. Here's some code:
#!/usr/bin/env sage -python
from __future__ import print_function
from sage.all import *
import time

N = 100
B = 200
length = (1 << B) - 1

print('Generating primes (sequentially)')
start = time.time()
ps = []
for _ in xrange(N):
    ps.append(random_prime(length))
end = time.time()
print('Took: %f seconds' % (end - start))
print(ps)

@parallel(ncpus=10)
def _random_prime(length):
    return random_prime(length)

print('Generating primes (in parallel)')
start = time.time()
ps = [length] * N
ps = list(_random_prime(ps))
end = time.time()
print('Took: %f seconds' % (end - start))
ps = [p for _, p in ps]
print(ps)
The first run-through computes the primes sequentially, and it works.
The second run-through computes them using Sage's @parallel decorator. It "works" in the sense that the computation is parallelized, but all the output primes are the same (i.e., it doesn't generate 100 random primes but rather 100 instances of the same random prime). I'd think this would be a common use case of @parallel, but I cannot find any details on what the issue is. Anyone have any ideas? Thanks.
Just a hint on this: is it possible that you'd need to set a different seed each time? Note that the doc doesn't have a # random marker in the test, so perhaps the seed for all the parallel instances is the same for some reason (which seems reasonable).
Edit to put Volker's more detailed description of how to do this:
import os
import sage.misc.randstate as randstate
randstate.set_random_seed(os.getpid())
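The effect is easy to reproduce with plain Python RNGs: worker processes that start from identical RNG state produce identical draws, while giving each a distinct seed (e.g. its pid, as in Volker's snippet) decorrelates them. A small illustration without Sage; the seed values are arbitrary stand-ins for worker pids:

```python
import random

# Two "workers" starting from the same RNG state draw the same numbers...
a = random.Random(42)
b = random.Random(42)
assert [a.getrandbits(32) for _ in range(3)] == [b.getrandbits(32) for _ in range(3)]

# ...while distinct per-worker seeds give independent-looking streams.
c = random.Random(1001)
d = random.Random(1002)
assert [c.getrandbits(32) for _ in range(3)] != [d.getrandbits(32) for _ in range(3)]
```

Forked workers inherit a copy of the parent's RNG state, which is exactly the "same state" situation on the left; per-pid seeding moves them to the situation on the right.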
This is a (pretty bad) solution to one of the Project Euler problems. The problem was to find the 10,001st prime number. The code below does it, but it takes 8 minutes to run. Can you explain why that is the case and how to optimize it?
primes = []
number = 2.0
until primes[10000] != nil
  if (2..(number - 1)).any? do |n|
    number % n == 0
  end == false
    primes << number
  end
  number = number + 1.0
end
puts primes[10000]
Some simple optimizations to prime finding:
Start by pushing 2 onto your primes list, and start by checking if 3 is a prime. (This eliminates needing to write special case code for the numbers 0 to 2)
You only have to check numbers that are odd for prime candidacy. (Or, if you start by adding 2/3/5 and checking 7, you only need to check numbers that are 1 or 5 after doing % 6. Or... You get the idea)
You only have to see if your current candidate x is divisible by factors up to sqrt(x), because any factor above sqrt(x) divides x into a number below sqrt(x), and you've already checked all of those.
You only have to check numbers in your prime list, instead of all numbers, for divisors of x - since all composite numbers are divisible by primes. For example, 81 is 9*9 - but 9*9 is 3*3*9, 9 being composite, so you'll discover it's a prime when you check it against 3. Therefore you never need to test if 9 is a factor, and so on for every composite factor.
There are very optimized, sped up prime finding functions (see the Sieve of Atkin for a start), but these are the common optimizations that are easy to come up with.
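Combined, the optimizations above can be sketched as follows (in Python for illustration, since the surrounding answers use Ruby):

```python
import math

def nth_prime(n):
    """Trial division using only known primes up to sqrt(candidate)."""
    primes = [2, 3, 5]
    num = 7
    while len(primes) < n:
        limit = math.isqrt(num)
        is_prime = True
        for p in primes:
            if p > limit:         # no prime factor can exceed sqrt(num)
                break
            if num % p == 0:      # composite: divisible by a smaller prime
                is_prime = False
                break
        if is_prime:
            primes.append(num)
        num += 2                  # only odd candidates
    return primes[n - 1]

assert nth_prime(6) == 13
```

Checking only primes up to sqrt(num), instead of every integer below num, is what turns the minutes-long run into a sub-second one.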
Do you really have to check whether the number is divisible by all previous numbers? Check only against the smaller primes you have already discovered. Also, why use floats where integers are perfectly fine?
EDIT:
Some possible changes (not best algorithm, can be improved):
primes = [2, 3, 5]
num = 7
until primes[10000]
  is_prime = true
  i = 0
  sqrtnum = Math.sqrt(num).ceil
  while (n = primes[i += 1]) <= sqrtnum
    if num % n == 0
      is_prime = false
      break
    end
  end
  if is_prime
    primes << num
  end
  num += 2
end
puts primes[10000]
On my computer (for 1000 primes):
Yours:
real 0m3.300s
user 0m3.284s
sys 0m0.000s
Mine:
real 0m0.045s
user 0m0.040s
sys 0m0.004s
I just began learning Ruby.
I'm trying to code a little script that plays the Monty Hall problem, but I have a problem with the last line of the code.
numgames = 10000 # Number of games to play
switch = true # Switch your guess?
wins = 0
numgames.times do doors = [0, 0, 0] # Three doors!
doors[rand(3)] = 1 # One of them has a car!
guess = doors.delete_at(rand(3)) # We pick one of them!
doors.delete_at(doors[0] == 0 ? 0 : 1) # Take out one of the remaining doors that is not a car!
wins += switch ? doors[0] : guess end
puts "You decided #{switch ? "" : "not "}to switch, and your win % is #{wins.times ()/numgames}"
In the last line, replace
wins.times ()
with
wins
times without a block returns an Enumerator, which doesn't play well with division.
Two problems:
First, wins and numgames are integers, and integer division returns an integer:
irb(main):001:0> 6632 / 10000
=> 0
So, change wins = 0 to wins = 0.0. This will force a floating point division, which will return a floating point answer.
Second, wins is a number, not an array. So get rid of wins.times() and wins.size(). Both are wrong.
With these two changes in place, I consistently get around 66% wins, which just goes to show that Marilyn vos Savant is way smarter than I am.
Your wins is an integer, so you don't need .times or .size; you do, however, want .to_f to force things into floating-point mode:
wins.to_f / numgames
And if you want a percentage, then you'll have to multiply by 100:
wins.to_f / numgames * 100
You should also properly indent your code for readability and break things up with line breaks to make it easier to read and easier for the parser to figure out:
numgames = 10000  # Number of games to play
switch = true     # Switch your guess?
wins = 0

numgames.times do
  doors = [0, 0, 0]                       # Three doors!
  doors[rand(3)] = 1                      # One of them has a car!
  guess = doors.delete_at(rand(3))        # We pick one of them!
  doors.delete_at(doors[0] == 0 ? 0 : 1)  # Take out one of the remaining doors that is not a car!
  wins += switch ? doors[0] : guess
end

puts "You decided #{switch ? "" : "not "}to switch, and your win % is #{100 * wins.to_f / numgames}"
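As a cross-check of the roughly 66% figure, here is the same simulation in Python with a fixed seed (the seed and the tolerance bounds are arbitrary choices for the illustration):

```python
import random

def monty_hall(numgames=10000, switch=True, seed=1):
    rng = random.Random(seed)
    wins = 0
    for _ in range(numgames):
        doors = [0, 0, 0]
        doors[rng.randrange(3)] = 1           # hide the car behind one door
        guess = doors.pop(rng.randrange(3))   # contestant picks a door
        doors.pop(0 if doors[0] == 0 else 1)  # host opens a goat door
        wins += doors[0] if switch else guess
    return 100 * wins / numgames

assert 60 < monty_hall(switch=True) < 73   # switching wins about 2/3
assert 27 < monty_hall(switch=False) < 40  # staying wins about 1/3
```

The asymmetry comes from the host's reveal: switching loses only when the first pick was the car, which happens 1/3 of the time.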