Julia: Unexpected Result when Using #parallel for Loop with SharedArray - for-loop

I am trying to learn how parallel for loops work in Julia, but I was confused by the result of the example code below:
addprocs(4)
@everywhere begin
    N = 10
    V = SharedArray{Int64,1}(N)
    @sync @parallel for i = 1:N
        V[i] = i
        println(V[i])
    end
end
By using println, I tried to identify which worker executed which iteration. Surprisingly, the output shows that each worker kept going through the entire iteration range until the last worker (worker 3 in my case) finished the loop:
From worker 4: 1
From worker 4: 2
From worker 4: 3
From worker 5: 1
From worker 5: 2
From worker 5: 3
From worker 5: 4
From worker 5: 5
From worker 5: 6
From worker 5: 4
From worker 5: 5
From worker 5: 6
From worker 2: 7
From worker 2: 8
From worker 2: 4
From worker 2: 5
From worker 2: 6
From worker 2: 9
From worker 2: 10
From worker 4: 1
From worker 4: 2
From worker 2: 7
From worker 2: 8
From worker 3: 9
From worker 5: 4
From worker 5: 5
From worker 5: 6
From worker 3: 10
From worker 3: 7
From worker 3: 8
From worker 3: 7
From worker 3: 8
From worker 3: 9
From worker 3: 10
From worker 2: 4
From worker 2: 5
From worker 2: 6
From worker 3: 7
From worker 3: 8
From worker 4: 3
From worker 4: 9
From worker 4: 10
From worker 4: 1
From worker 4: 2
From worker 4: 3
From worker 4: 1
From worker 4: 2
From worker 4: 3
From worker 5: 9
From worker 5: 10
This is not what I would expect from a parallel for loop: I thought the work should be distributed among the workers and then combined, rather than each worker doing the whole job separately. Shouldn't each worker only go through 2-3 iterations?
Is something wrong with my code, or have I misunderstood the concept of a parallel for loop?
Thanks!
Edit: I just realized that it has something to do with @everywhere. After I removed @everywhere, things work as expected.

You told Julia (through @everywhere) to run the parallel loop on every worker and on the host process, hence you counted from 1 to 10 in a parallel fashion 5 times. (Check that in the output you posted, every number from 1 to 10 occurs precisely 5 times.)
Slightly more detailed: first note that the total number of processes is nprocs() == 5 (one "host" process and 4 workers; check workers()). @everywhere tells Julia to run the content of the begin ... end block on every process, hence 5 times in our example. The content of that block is "do a parallel loop, counting from 1 to 10". That is exactly what happened.
When you remove the @everywhere, you run just a single parallel loop, and you get exactly what you want:
julia> N = 10
       V = SharedArray{Int64,1}(N)
       @sync @parallel for i = 1:N
           V[i] = i
           println(V[i])
       end
From worker 5: 9
From worker 5: 10
From worker 3: 4
From worker 3: 5
From worker 3: 6
From worker 2: 1
From worker 2: 2
From worker 2: 3
Suggested reading: https://docs.julialang.org/en/stable/manual/parallel-computing

Related

Julia Distributed, failed to modify the global variable of the worker

I am trying to keep some computation results on each worker and fetch them all together after the computation is done. However, I could not actually modify the workers' variables.
Here is a simplified example:
using Distributed
addprocs(2)
@everywhere function modify_x()
    global x
    x += 1
    println(x) # x increases as expected
end
@everywhere x = 0
@sync @distributed for i in 1:10
    modify_x()
end
fetch(@spawnat 2 x) # gives 0
This sample tries to modify the x contained in each worker. I expect x to end up around 5, but the final fetch gives the initial value 0.
By running fetch(@spawnat 2 x) you unintentionally transferred the value of x from the current (master) process to worker 2.
See this example:
julia> x = 3
3
julia> fetch(@spawnat 2 x)
3
If you want to retrieve the value of x, you could try the following:
julia> @everywhere x = 0
julia> @sync @distributed for i in 1:10
           modify_x()
       end
From worker 3: 1
From worker 3: 2
From worker 3: 3
From worker 3: 4
From worker 3: 5
From worker 2: 1
From worker 2: 2
From worker 2: 3
From worker 2: 4
Task (done) #0x000000000d34a6d0 From worker 2: 5
julia> @everywhere function fetch_x()
           return x
       end
julia> fetch(@spawnat 2 fetch_x())
5
See https://docs.julialang.org/en/v1/manual/distributed-computing/#Global-variables

DolphinDB: create a large matrix with repeat copies of matrix A

What should I do if I want to repeat copies of a matrix into a 2-by-2 block arrangement?
For example:
A=1..4$2:2
#0 #1
-- --
1 3
2 4
I need to repeat it n times, the matrix should be
#0 #1 #2 #3
-- -- -- --
1 3 1 3
2 4 2 4
1 3 1 3
2 4 2 4
Any idea how to accomplish this without using loop?
Try the function repmat:
repmat(A,2,2)
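For comparison, the same block tiling can be sketched in Python with NumPy's np.tile (a hypothetical cross-check, not DolphinDB code; repmat above is the native answer):

```python
import numpy as np

# DolphinDB's A = 1..4$2:2 fills the matrix column by column, so the
# equivalent NumPy matrix is built with order='F' (column-major).
A = np.arange(1, 5).reshape(2, 2, order='F')
# [[1 3]
#  [2 4]]

B = np.tile(A, (2, 2))   # repeat A in a 2-by-2 block arrangement
print(B)
# [[1 3 1 3]
#  [2 4 2 4]
#  [1 3 1 3]
#  [2 4 2 4]]
```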

Confusion about a practical example of FIFO Page Replacement Algorithm?

I'm doing some theoretical examples with different page replacement algorithms, in order to get a better understanding for when I actually write the code. I'm kind of confused about this example.
Given below is a physical memory with 4 tiles (i.e. 4 page frames). The following pages are referenced one after the other:
R = 1, 2, 3, 2, 4, 5, 3, 6, 1, 4, 2, 3, 1, 4
Run the FIFO page replacement algorithm on R with 4 tiles.
I know that when a page needs to be swapped in, the operating system swaps out the page which has been in the memory for the longest period of time. In practice I'll have:
Time 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Page 1 2 3 2 4 5 3 6 1 4 2 3 1 4
Tile 1 1 1 1 1 1 5 5
Tile 2 2 2 2 2 2 2
Tile 3 3 3 3 3 3
Tile 4 4 4 4
I'm not sure what happens at time = 8. I know that pages 5 and 4 won't be replaced, but I'm not sure between 3 and 2. Since page 2 was referenced again at time = 4, does that mean page 3 will be replaced? Or, since 2 was already in memory at time = 4, is it page 2 that gets replaced at time = 8?
FIFO (First In, First Out) means here: if space is needed for a new entry, the oldest entry is replaced. This is in contrast to LRU (Least Recently Used), where the entry that has not been used for the longest time is replaced. Note that under FIFO, the renewed reference to page 2 at time 4 does not matter; only the time a page was loaded counts. Consider your memory with four tiles at time 5:
Tile Page Time of loading
1 1 1
2 2 2
3 3 3
4 4 5
At time 6, space for page 5 is needed, so one of the pages in memory has to be replaced. According to the FIFO principle, page 1 (the oldest, loaded at time 1) is replaced:
Tile Page Time of loading
1 5 6
2 2 2
3 3 3
4 4 5
The same thing happens at time 8: the oldest page in memory, page 2 (loaded at time 2), is replaced:
Tile Page Time of loading
1 5 6
2 6 8
3 3 3
4 4 5
So it is helpful to write down the time of loading while working through such an exercise.
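As a cross-check, the whole reference string can be simulated with a small FIFO sketch (Python, with a hypothetical helper fifo that records every eviction):

```python
def fifo(refs, capacity):
    """Simulate FIFO page replacement.

    Returns a list of (time, evicted_page, loaded_page) tuples,
    one per replacement; time is 1-based like in the example.
    """
    memory = []      # pages currently in memory, oldest first
    evictions = []
    for t, page in enumerate(refs, start=1):
        if page in memory:
            continue                 # hit: FIFO order is unchanged
        if len(memory) == capacity:
            victim = memory.pop(0)   # evict the oldest entry
            evictions.append((t, victim, page))
        memory.append(page)
    return evictions

refs = [1, 2, 3, 2, 4, 5, 3, 6, 1, 4, 2, 3, 1, 4]
for t, victim, page in fifo(refs, 4):
    print(f"time {t}: page {victim} replaced by page {page}")
# time 6: page 1 replaced by page 5
# time 8: page 2 replaced by page 6
# ...
```

The first two lines of output confirm the walkthrough above: page 1 goes at time 6 and page 2 at time 8.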

Hungarian Algorithm not giving right result for multiple assignments

The problem scenario :
The number of tasks (n) is greater than the number of workers (m), so I need to assign multiple tasks to a single worker.
Here is the cost matrix
I have 6 tasks and 3 workers available.
C(i,j) = 1 indicates that the worker can be assigned to the task.
C(i,j) = 1000 indicates that the worker cannot be assigned to the task.
The cost matrix
TASK/WORKER WORKER1 WORKER2 WORKER3
TASK 1 1 1000 1000
TASK 2 1000 1 1000
TASK 3 1000 1000 1000
TASK 4 1 1000 1000
TASK 5 1000 1 1000
TASK 6 1000 1000 1
Here, worker 1 can do tasks (TASK-1, TASK-4),
worker 2 can do tasks (TASK-2, TASK-5),
and worker 3 can do tasks (TASK-6).
To create a square matrix, I added dummy workers (DWORKER1, DWORKER2 and DWORKER3) as follows and assigned a very large value (1000000) to those cells:
TASK/WORKER WORKER1 WORKER2 WORKER3 DWORKER1 DWORKER2 DWORKER3
TASK 1      1       1000    1000    1000000  1000000  1000000
TASK 2      1000    1       1000    1000000  1000000  1000000
TASK 3      1000    1000    1000    1000000  1000000  1000000
TASK 4      1       1000    1000    1000000  1000000  1000000
TASK 5      1000    1       1000    1000000  1000000  1000000
TASK 6      1000    1000    1       1000000  1000000  1000000
I used scipy's scipy.optimize.linear_sum_assignment, as follows:
import numpy as np
from scipy.optimize import linear_sum_assignment
cost = np.array([[1, 1000, 1000, 1000000, 1000000, 1000000],
                 [1000, 1, 1000, 1000000, 1000000, 1000000],
                 [1000, 1000, 1000, 1000000, 1000000, 1000000],
                 [1, 1000, 1000, 1000000, 1000000, 1000000],
                 [1000, 1, 1000, 1000000, 1000000, 1000000],
                 [1000, 1000, 1, 1000000, 1000000, 1000000]])
row_ind, col_ind = linear_sum_assignment(cost)
The output for col_ind is array([5, 3, 4, 0, 1, 2]). Since col_ind[i] is the column assigned to row (task) i, this means (if I am not wrong):
- Task 1 is assigned to dummy worker 3
- Task 2 is assigned to dummy worker 1
- Task 3 is assigned to dummy worker 2
- Task 4 is assigned to worker 1
- Task 5 is assigned to worker 2
- Task 6 is assigned to worker 3
What I expect is that tasks 1, 2 and 3 are also assigned to the real workers, not the dummy workers.
Is that possible with this implementation? Or am I missing something here?
The Hungarian algorithm solves the assignment problem, where exactly one task is assigned to each worker. By adding dummy workers as you propose, exactly one task will indeed be assigned to each dummy worker as well.
If you only care about assigning tasks to real workers, and multiple tasks per worker are allowed, the problem is much easier: for each task, select the worker with the smallest cost. In your example, that means worker 1 will do tasks 1 and 4, worker 2 will do tasks 2 and 5, worker 3 will do task 6, and task 3 will be done by one of the three workers (depending on how you handle the equality case).
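The per-task minimum can be sketched in one line with NumPy (a hypothetical sketch using the 6-by-3 cost matrix from the question, without the dummy columns):

```python
import numpy as np

# Cost matrix: rows are tasks, columns are the three real workers.
# 1 = worker can do the task, 1000 = worker cannot.
cost = np.array([
    [1,    1000, 1000],   # task 1
    [1000, 1,    1000],   # task 2
    [1000, 1000, 1000],   # task 3: no preferred worker
    [1,    1000, 1000],   # task 4
    [1000, 1,    1000],   # task 5
    [1000, 1000, 1],      # task 6
])

# Greedy per-task assignment: each task goes to its cheapest worker.
# Ties (like task 3, where all costs are equal) resolve to the
# lowest worker index, since argmin returns the first minimum.
assignment = cost.argmin(axis=1)
print(assignment)  # [0 1 0 0 1 2] -> worker index for each task
```

Note that this greedy rule is optimal here only because each task's cost is independent of what else the worker is doing; it is not a replacement for the Hungarian algorithm in the one-task-per-worker setting.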

Selection sort count number of swaps

http://www.cs.pitt.edu/~kirk/cs1501/animations/Sort1.html — is this applet counting right? For selection sort on 5 4 3 2 1 I see 2 swaps, but the applet counts 4 exchanges...
I guess it's a matter of definition. The applet performs a swap at the end of every loop iteration, even when it swaps an element with itself. In this case, the swaps are:
Original:          5 4 3 2 1
Swap pos 1 and 5:  1 4 3 2 5
Swap pos 2 and 4:  1 2 3 4 5
Swap pos 3 and 3:  1 2 3 4 5
Swap pos 4 and 4:  1 2 3 4 5
(No swap is done for the last element, since it is always already in the correct place.)
A simple if statement could be used to eliminate the last two (self-)swaps.
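Both counting conventions can be reproduced with a short sketch (Python, with a hypothetical flag skip_self_swaps implementing the suggested if statement):

```python
def selection_sort_swaps(values, skip_self_swaps=False):
    """Selection sort; returns the sorted list and the swap count."""
    a = list(values)
    swaps = 0
    n = len(a)
    for i in range(n - 1):
        # Index of the smallest element in the unsorted tail a[i:].
        m = min(range(i, n), key=a.__getitem__)
        if skip_self_swaps and m == i:
            continue  # the suggested if: don't count a self-swap
        a[i], a[m] = a[m], a[i]
        swaps += 1
    return a, swaps

print(selection_sort_swaps([5, 4, 3, 2, 1]))        # ([1, 2, 3, 4, 5], 4)
print(selection_sort_swaps([5, 4, 3, 2, 1], True))  # ([1, 2, 3, 4, 5], 2)
```

The first call matches the applet's 4 exchanges; the second matches the asker's count of 2 real swaps.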