Multiprocess execution before map - multiprocessing

I'm running the following code to execute the run_problem(point) function twice (just a reproducible example; my real problem is obviously more complex than this).
import time, multiprocessing

def run_problem(point):
    print(point)

points = [[90,3,3,3,3],[150,10,10,10,10]]
print('before pool')

if __name__ == '__main__':
    tic = time.time()
    pool = multiprocessing.Pool(2)
    pool.map(run_problem, points)
    pool.close()
    toc = time.time()
    print('Done in {:.4f} seconds'.format(toc-tic))
I would expect the first print statement to be printed only once. However, it turns out it is printed three times. Output:
before pool
before pool
before pool
[90, 3, 3, 3, 3]
[150, 10, 10, 10, 10]
Done in 0.4283 seconds
Why?
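As a sketch of what is going on (not part of the question): with a start method that spawns fresh interpreters, such as the default on Windows, each Pool worker re-imports the main module, so every module-level statement, including print('before pool'), runs once in the parent and once per worker. Keeping the module-level work inside the __main__ guard avoids the repeated output:

import time, multiprocessing

def run_problem(point):
    print(point)

if __name__ == '__main__':
    points = [[90,3,3,3,3],[150,10,10,10,10]]
    print('before pool')   # now printed only once, by the parent process
    tic = time.time()
    pool = multiprocessing.Pool(2)
    pool.map(run_problem, points)
    pool.close()
    pool.join()
    toc = time.time()
    print('Done in {:.4f} seconds'.format(toc - tic))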

Related

Find local min based on the length of occurrences of successive means without falling into the wrong min

1. Problem description
I have the following list of values: [10, 10, 10, 10, 5, 5, 5, 5, 7, 7, 7, 2, 4, 3, 3, 3, 10]. It is shown in the following picture.
What I want to do is find the minimum based on the value of the element and its duration. From the previous list we can construct the following dictionary (key:val): [10:4, 5:4, 7:2, 2:1, 4:1, 3:3, 10:1], meaning we have 4 successive 10s followed by 4 successive 5s, 2 successive 7s and 3 successive 3s.
Based on what I said, the local min is 5. But I don't want that; the local min should be 3. We didn't select 2 because it occurred only once.
Do you have an idea of how to solve this problem? Is there an existing method that can be used to solve it?
Of course, we can sort the dictionary by values, [10:4, 5:4, 7:2, 3:3, 10:1], and select the lowest key that has a value different from 1. Is that a good solution?
2. Selection criteria
must be a local min (find_local_min(prices))
must have the highest number of successions
the min succession must be > 1
AND I AM STUCK! Because now I have 3 as the local minimum, but it is repeated only 3 times. I was testing whether my idea was correct, tried to find a counterexample, and shot myself in the foot.
3. Source code
The following code extracts the minima and builds the dictionary:
#!/usr/bin/env python
import csv
import sys
import os
from collections import defaultdict

def find_local_min(prices):
    i = 1
    minPrices = []
    while i < len(prices):
        if prices[i] < prices[i-1]:
            minPrices.append(prices[i])
            j = i + 1
            while j < len(prices) and prices[j] == prices[j-1]:
                minPrices.append(prices[j])
                j += 1
            i = j
        else:
            i += 1
    return minPrices

if __name__ == "__main__":
    l = [10, 10, 10, 10, 5, 5, 5, 5, 7, 7, 7, 2, 4, 3, 3, 3, 10]
    minPrices = find_local_min(l)
    minPriceDict = defaultdict(int)
    for future in minPrices:
        minPriceDict[future] += 1
    print minPriceDict
As output it gives the following: defaultdict(<type 'int'>, {2: 1, 3: 3, 5: 4}). Based on this output the algorithm will select 5 as the min because it is repeated 4 successive times. But that's wrong! It should be 3. I really want to know how to solve this problem.
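A minimal sketch of the run-length idea proposed in section 1 (illustrative only, not the asker's code): group consecutive equal values with itertools.groupby and take the smallest value whose run is longer than one element.

from itertools import groupby

values = [10, 10, 10, 10, 5, 5, 5, 5, 7, 7, 7, 2, 4, 3, 3, 3, 10]
# run-length encode: one (value, run_length) pair per block of equal neighbours
runs = [(val, len(list(grp))) for val, grp in groupby(values)]
# keep only values that occur more than once in a row, then take the smallest
candidates = [val for val, length in runs if length > 1]
print(min(candidates))   # prints 3 for this list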

Torch code producing CUDA Runtime Error

A friend of mine implemented a sparse version of torch.bmm that actually works, but when I try a test, I get a runtime error (that has nothing to do with this implementation) that I don't understand. I have seen a few topics about it but couldn't find a solution. Here is the code, and the error:
if __name__ == "__main__":
    tmp = torch.zeros(1).cuda()
    batch_csr = BatchCSR()
    sparse_bmm = SparseBMM()

    i = torch.LongTensor([[0,5,8], [1,5,8], [2,5,8]])
    v = torch.FloatTensor([4,3,8])
    s = torch.Size([3,500,500])

    indices, values, size = i, v, s

    a_ = torch.sparse.FloatTensor(indices, values, size).cuda().transpose(2, 1)
    batch_size, num_nodes, num_faces = a_.size()

    a = a_.to_dense()

    for _ in range(10):
        b = torch.randn(batch_size, num_faces, 16).cuda()
        torch.cuda.synchronize()
        time1 = time.time()
        result = torch.bmm(a, b)
        torch.cuda.synchronize()
        time2 = time.time()
        print("{} CuBlas dense bmm".format(time2 - time1))

        torch.cuda.synchronize()
        time1 = time.time()
        col_ind, col_ptr = batch_csr(a_.indices(), a_.size())
        my_result = sparse_bmm(a_.values(), col_ind, col_ptr, a_.size(), b)
        torch.cuda.synchronize()
        time2 = time.time()
        print("{} My sparse bmm".format(time2 - time1))

        print("{} Diff".format((result-my_result).abs().max()))
And the error:
Traceback (most recent call last):
  File "sparse_bmm.py", line 72, in <module>
    b = torch.randn(3, 500, 16).cuda()
  File "/home/bizeul/virtual_env/lib/python2.7/site-packages/torch/_utils.py", line 65, in _cuda
    return new_type(self.size()).copy_(self, async)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorCopy.c:18
When running with CUDA_LAUNCH_BLOCKING=1 set, I get the error:
/b/wheel/pytorch-src/torch/lib/THC/THCTensorIndex.cu:121: void indexAddSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 1, SrcDim = 1, IdxDim = -2]: block: [0,0,0], thread: [0,0,0] Assertion `dstIndex < dstAddDimSize` failed.
THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THCS/generic/THCSTensorMath.cu line=292 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "sparse_bmm.py", line 69, in <module>
    a = a_.to_dense()
RuntimeError: cuda runtime error (59) : device-side assert triggered at /b/wheel/pytorch-src/torch/lib/THCS/generic/THCSTensorMath.cu:292
The indices that you are passing to create the sparse tensor are incorrect.
Here is how they should be:
i = torch.LongTensor([[0, 1, 2], [5, 5, 5], [8, 8, 8]])
How to create a sparse tensor:
Let's take a simpler example. Let's say we want the following tensor:
0 0 0 2 0
0 0 0 0 0
0 0 0 0 20
[torch.cuda.FloatTensor of size 3x5 (GPU 0)]
As you can see, the number (2) needs to be in the (0, 3) location of the sparse tensor. And the number (20) needs to be in the (2, 4) location.
In order to create this, our index tensor should look like this:
[[0, 2],
 [3, 4]]
And now, the code to create the above sparse tensor:
i = torch.LongTensor([[0, 2], [3, 4]])
v = torch.FloatTensor([2, 20])
s = torch.Size([3, 5])
a_ = torch.sparse.FloatTensor(i, v, s).cuda()
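As a quick sanity check (assuming the same, older PyTorch API used above), converting the sparse tensor back to a dense one should reproduce the 3x5 matrix shown earlier:

print(a_.to_dense())
# expected: all zeros except 2 at position (0, 3) and 20 at position (2, 4)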
More comments regarding the assert error from CUDA:
Assertion 'dstIndex < dstAddDimSize' failed tells us that it is highly likely you have an index out of bounds. So whenever you notice that, look for places where you might have supplied the wrong indices to any of the tensors.

Task scheduling to minimize waiting time algorithm

I am completely stuck on a task scheduling problem.
Here is the requirement:
Implement a scheduling algorithm that adds jobs to the regular queue and pushes them through in such a way that the average wait time for all jobs in the queue is minimized. A new job isn't pushed through unless it minimizes the average waiting time.
Assume that your program starts working at 0 seconds. A request for the i-th job comes at requestTime[i], and it takes jobProcess[i] seconds to process it.
def jobScheduling(requestTime, jobProcess, timeFromStart):
    requestTimeAndDuration = {}
    for i in range(len(requestTime)):
        job = []
        job.append(requestTime[i])
        job.append(jobProcess[i])
        requestTimeAndDuration[i] = job

    taskProcessed = []
    previousEndTime = 0
    while (requestTimeAndDuration):
        endTimes = {}
        for k, v in requestTimeAndDuration.items():
            if(len(taskProcessed) == 0):
                previousEndTime = 0
            else:
                previousEndTime = taskProcessed[-1][1]
            #print previousEndTime
            if(v[0] <= previousEndTime):
                endTimes[k] = previousEndTime + v[1]
            else:
                endTimes[k] = v[0] + v[1]

        endTimesSorted = sorted(endTimes.items(), key=lambda endTimes: endTimes[1])
        nextJobId = endTimesSorted[0][0]
        nextJobEndTime = endTimesSorted[0][1]

        nextJob = []
        nextJob.append(nextJobId)

        previousEndTime = 0
        if(len(taskProcessed) > 0):
            previousEndTime = taskProcessed[-1][1]

        nextJobStarTime = nextJobEndTime - jobProcess[nextJobId]
        nextJob.append(nextJobEndTime)
        nextJob.append(nextJobStarTime)
        taskProcessed.append(nextJob)
        del requestTimeAndDuration[nextJobId]
    print taskProcessed
My algorithm tries to sort the tasks by their end time, which is computed as previousEndTime + currentJobProcess:
requestTime = [0, 5, 8, 11], jobProcess = [9, 4, 2, 1]
iteration 1:
task = [[0,9],[5,4],[8,2],[11,1]]
previousEndTime = 0  // since we just started, there were no previous tasks
0+9=9, 5+4=9, 8+2=10, 11+1=12
endTime = {0:9, 1:9, 2:10, 3:12}  // take task 0 and remove it from tasks
iteration 2:
task = [[5,4],[8,2],[11,1]]
previousEndTime = 9
9+4=13, 9+2=11, 11+1=12
endTime = {1:13, 2:11, 3:12}  // remove task 2
iteration 3:
task = [[5,4],[11,1]]
previousEndTime = 11
11+4=15, 11+1=12
endTime = {1:15, 3:12}  // remove task 3
iteration 4:
task = [[5,4]]
previousEndTime = 12
12+4=16
endTime = {1:16}  // remove task 1
The final result printed is [0, 2, 3, 1].
My problem is that my algorithm works for some cases, but not the complicated ones.
requestTime: [4, 6, 8, 8, 15, 16, 17, 21, 22, 25]
jobProcess: [30, 25, 14, 16, 26, 10, 11, 11, 14, 8]
The answer is [9, 5, 6, 7, 2, 8, 3, 1, 4]
But my algorithm produces [5, 9, 6, 7, 8, 3, 1, 4, 0]
So does anyone know how to do this problem? I'm afraid my algorithm may be fundamentally flawed.
I don't see a really neat solution like sorting by end time, but if there is such a solution, you should be able to get the same answer by sorting the tasks with a comparator: a function that works out which of two tasks should be scheduled first if those were the only two tasks to be considered.
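A minimal sketch of that comparator idea (the helpers finish_time and cmp_two are illustrative names, not from the question, and such a pairwise comparator is not guaranteed to define a consistent total order for every input):

import functools

def finish_time(order):
    # sum of completion times when (request, process) jobs run in the given order
    t = 0
    total = 0
    for request, process in order:
        t = max(t, request) + process
        total += t
    return total

def cmp_two(a, b):
    # put a before b if that order gives the smaller total completion time
    fa, fb = finish_time([a, b]), finish_time([b, a])
    return -1 if fa < fb else (1 if fa > fb else 0)

requestTime = [0, 5, 8, 11]
jobProcess = [9, 4, 2, 1]
jobs = list(enumerate(zip(requestTime, jobProcess)))
order = sorted(jobs, key=functools.cmp_to_key(lambda x, y: cmp_two(x[1], y[1])))
print([idx for idx, _ in order])  # job indices in the order the comparator prefers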

Parallel Computing - Shuffle

I am looking to shuffle an array in parallel. I have found that doing an algorithm similar to bitonic sort, but with a random (50/50) re-order, results in an equal distribution, but only if the array is a power of 2. I've considered the Fisher-Yates shuffle, but I can't see how I could parallelize it in order to avoid O(N) computations.
Any advice?
Thanks!
There's a good, clear recent paper on this here, and the references, especially Shun et al. 2015, are worth a read.
But basically you can do this using the same sort of approach that's used in sort -R: shuffle by giving each row a random key value and sorting on that key. And there are lots of ways to do a good parallel distributed sort.
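As a tiny serial illustration of that random-key idea (separate from the MPI version below; the helper name is made up): tag each element with a uniform random key, sort by the key, and drop the keys. Any parallel sort can stand in for the call to sort().

import random

def shuffle_by_random_keys(data):
    tagged = [(random.random(), item) for item in data]  # decorate with random keys
    tagged.sort()                                         # any (parallel) sort works here
    return [item for _, item in tagged]

print(shuffle_by_random_keys(list(range(10))))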
Here's a basic version in python + MPI using an odd-even sort; it goes through P communication steps if P is the number of processors. You can do better than that, but this is pretty simple to understand; it's discussed in this question.
from __future__ import print_function
import sys
import random
from mpi4py import MPI

comm = MPI.COMM_WORLD

def exchange(localdata, sendrank, recvrank):
    """
    Perform a merge-exchange with a neighbour;
    sendrank sends local data to recvrank,
    which merge-sorts it, and then sends lower
    data back to the lower-ranked process and
    keeps upper data
    """
    rank = comm.Get_rank()
    assert rank == sendrank or rank == recvrank
    assert sendrank < recvrank

    if rank == sendrank:
        comm.send(localdata, dest=recvrank)
        newdata = comm.recv(source=recvrank)
    else:
        bothdata = list(localdata)
        otherdata = comm.recv(source=sendrank)
        bothdata = bothdata + otherdata
        bothdata.sort()
        comm.send(bothdata[:len(otherdata)], dest=sendrank)
        newdata = bothdata[len(otherdata):]
    return newdata

def print_by_rank(data, rank, nprocs):
    """ crudely attempt to print data coherently """
    for proc in range(nprocs):
        if proc == rank:
            print(str(rank)+": "+str(data))
        comm.barrier()
    return

def odd_even_sort(data):
    rank = comm.Get_rank()
    nprocs = comm.Get_size()
    data.sort()
    for step in range(1, nprocs+1):
        if ((rank + step) % 2) == 0:
            if rank < nprocs - 1:
                data = exchange(data, rank, rank+1)
        elif rank > 0:
            data = exchange(data, rank-1, rank)
    return data

def main():
    # everyone get their data
    rank = comm.Get_rank()
    nprocs = comm.Get_size()
    n_per_proc = 5
    data = list(range(n_per_proc*rank, n_per_proc*(rank+1)))

    if rank == 0:
        print("Original:")
    print_by_rank(data, rank, nprocs)

    # tag your data with random values
    data = [(random.random(), item) for item in data]

    # now sort it by these random tags
    data = odd_even_sort(data)

    if rank == 0:
        print("Shuffled:")
    print_by_rank([x for _, x in data], rank, nprocs)

    return 0

if __name__ == "__main__":
    sys.exit(main())
Running gives:
$ mpirun -np 5 python mergesort_shuffle.py
Original:
0: [0, 1, 2, 3, 4]
1: [5, 6, 7, 8, 9]
2: [10, 11, 12, 13, 14]
3: [15, 16, 17, 18, 19]
4: [20, 21, 22, 23, 24]
Shuffled:
0: [19, 17, 4, 20, 9]
1: [23, 12, 3, 2, 8]
2: [14, 6, 13, 15, 1]
3: [11, 0, 22, 16, 18]
4: [5, 10, 21, 7, 24]

Given random integers and a transform function, after some number of transformations it will run into a cycle

This is a derived question; you can refer to the original question. My question is: given 10 random integers (from 0 to 9, repetition allowed) and a transform function f, where f is the following (in Python 3.3 code):
def f(a):
    l = []
    for i in range(10):
        l.append(a.count(i))
    return l
Suppose a is the ten random integers: execute f and assign the result back to a, then repeat this process; after a few iterations you will run into a cycle.
That is to say, in the sequence a, a1 = f(a), a2 = f(a1), ..., there is a cycle.
The test code is as follows (code from #user1125600):
import random

# tortoise and hare algorithm to detect a cycle
a = []
for i in range(10):
    a.append(random.randint(0, 9))
print('random:', a)

fast = a
slow = a
i = 0
while True:
    fast = f(f(fast))
    slow = f(slow)
    print('slow:', slow, 'fast:', fast)
    i += 1
    # in case of running into an infinite loop, we are limited to run no more than 10 times
    if(i > 10):
        print('more than 10 times, quit')
        break
    if fast == slow:
        print('you are running in a cycle:', fast, 'loop times:', i)
        break
How can one prove that a cycle exists? Another interesting thing: looking at the test results, you will find that fast and slow meet only at three points: [7, 1, 0, 1, 0, 0, 1, 0, 0, 0], [6, 3, 0, 0, 0, 0, 0, 1, 0, 0] and [6, 2, 1, 0, 0, 0, 1, 0, 0, 0].
There has to be a cycle because f is a function (it always produces the same output for a given input), and because the range of the function (the set of possible outputs) is finite. Since the range is finite, if you repeatedly map the range onto itself, you must eventually get some value you've already seen.
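A small sketch of that argument (not from the question): iterate f from an arbitrary start and remember every state seen; since there are only finitely many possible count vectors, a repeated state, and therefore a cycle, must eventually appear.

import random

def f(a):
    return [a.count(i) for i in range(10)]

a = [random.randint(0, 9) for _ in range(10)]
seen = []
while a not in seen:      # finitely many states, so this loop must terminate
    seen.append(a)
    a = f(a)
print('entered a previously seen state', a, 'after', len(seen), 'steps')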

Resources