Fischer's Mutual Exclusion Algorithm

Two processes are trying to enter their critical sections, executing the same program:
while true do begin
    // Noncritical section.
    L: if not(id=0) then goto L;
    id := i;
    pause(delay);
    if not(id=i) then goto L;
    // Inside critical section.
    id := 0;
end
The constant i identifies the process (i.e., has the value 1 or 2), and id is a global variable, with value 0 initially. The statement pause(delay) delays the execution for delay time units. It is assumed that the assignment id := i; takes at most t time units.
It has been proved that for delay > t the algorithm is correct.
I have two questions:
1) Suppose both processes A and B pass the check at label L. Suppose that, from this point on, A is always chosen by the scheduler until it enters its critical section. Suppose that, while A is in its critical section, the scheduler dispatches process B; since B has already passed the check at label L, it can also enter its critical section. Where am I wrong?
2) Why is the algorithm not correct if delay == t?

Suppose that processes A and B reach the label L at times t_A and t_B respectively (t_A < t_B), but the difference between these times is smaller than or equal to t (the worst-case assignment time). If it were larger than t, process A's assignment would already be visible, so process B would stop at label L and wait until id=0.
As a result, process B will still see id=0 and assign its ID as well. But process A is not aware of this assignment yet. The only way for process A to learn about this assignment is to wait for some time and re-check the value of id.
This waiting time must be larger than t. Why?
Let's consider two edge cases here:
- case 1: t_A = t_B, in other words, processes A and B reach label L at the same time. They both see id=0 and hence assign their IDs to it.
Let's assume that process A's assignment finishes in almost zero time and process B's assignment finishes in the worst-case time t. This means that process A has to delay for more than t time units in order to see process B's update to the variable id. If delay is smaller than or equal to t, the update will not be visible and both will enter the critical section. This is actually already sufficient for claiming that delay has to be larger than t.
- case 2: t_B = t_A + t, in other words, process A reaches label L and assigns its ID, taking the worst-case time t; after that t time, process B reaches label L, checks id=0 (because process A's assignment has not become visible yet) and assigns its own ID, again taking the worst-case time t. Again, if process A's delay is smaller than or equal to t, it will not see process B's update.
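To make case 1 concrete, here is a small Python sketch of the timeline (my own illustration, not code from the question): A's write to id is taken to be visible immediately, while B's write only becomes visible after the worst-case assignment time t. With delay <= t, A re-checks id before B's write is visible, so both processes pass their second test.

def both_enter(t, delay):
    # Case 1: both processes pass the test at label L at time 0 and see id = 0.
    b_write_visible = t        # B's "id := 2" completes at time t in the worst case
    a_recheck = delay          # A re-reads id after pause(delay)
    # A still sees its own ID if B's write has not become visible yet
    # (at exactly time t the write may still be in flight, hence <=):
    a_passes = a_recheck <= b_write_visible
    # B wrote last, so when it re-checks at time t + delay it sees its own ID:
    b_passes = True
    return a_passes and b_passes

print(both_enter(t=5, delay=5))   # True  -> both enter: mutual exclusion is violated
print(both_enter(t=5, delay=6))   # False -> A fails its re-check and goes back to L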

Related

Find out efficient sequence

I am looking for an algorithm that can help me find the best way to assign tasks to my team.
So here is the problem.
I have n team members (for example n=2) and m tasks to complete (for example m=4), and for every task, every team member has their own time to complete it. Let's say
One condition: tasks can only be assigned contiguously, and the output should be the minimum effort.
In the above example the output would be 8: either assign task1 & task2 to member1 and task3 & task4 to member2,
or task1 to member1 and the rest to member2, or all the tasks to member2.
I know Stack Overflow helps developers resolve errors, but I don't understand how to build the logic for the above problem.
Thanks in advance for any suggestion of an algorithm to solve this problem.
output: 6
My understanding of this problem is that we have a list of members, a list of tasks, and each member has a cost for each task. Starting at the beginning of the tasks we assign some to member 1, then the next block to member 2, then the next block to member 3 and so on. The order is fixed.
We are trying to minimize the time taken by whichever member takes the longest.
Is that true?
If so, then I recommend doing a binary search on the length of time that the longest member takes. Any time you can't complete everything within that time, you have to increase it; any time you can, you decrease it. The twist is that you need to keep track of the time periods at which you would have made different choices.
The heart of the algorithm is a function like this pseudocode:
try_time_period(members, costs, max_cost_per_member):
    min_same_result = min_change_result = max_cost_per_member
    i = 0
    for each member:
        cost = 0
        while cost < max_cost_per_member:
            this_cost = costs[member][i]
            if cost + this_cost <= max_cost_per_member:
                cost += this_cost
                i++
                if len(costs[member]) <= i:
                    return (True, min_same_result, min_change_result)
            else:
                if cost < min_same_result:
                    min_same_result = cost
                if min_change_result < cost + this_cost:
                    min_change_result = cost + this_cost
                next member
    return (False, min_same_result, min_change_result)
And with that, we can build a binary search for the max_cost_per_member.
lower = 0
upper = time for first member to do everything
while lower < upper:
    mid = (lower + upper)/2
    (success, min_same_result, min_change_result) = try_time_period(members, costs, mid)
    if success:
        upper = min_same_result
    else:
        lower = min_change_result
And they will converge at the lowest time for completing everything from your assignments. Now you just work out the previous solution for that, and you're done.
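Here is a self-contained Python sketch of the same approach with simpler bookkeeping: a plain binary search on the allowed per-member time, re-running the greedy feasibility check each iteration instead of using the min_same_result/min_change_result trick above. The task times below are invented for illustration, since the question's own table isn't shown here; costs[member][task] is that member's time for that task.

def feasible(costs, limit):
    # Greedy check: walking the tasks in their fixed order, can everything be
    # finished if no member's total time may exceed `limit`?
    task, n_tasks = 0, len(costs[0])
    for member_costs in costs:
        load = 0
        while task < n_tasks and load + member_costs[task] <= limit:
            load += member_costs[task]
            task += 1
    return task == n_tasks

def min_makespan(costs):
    # Smallest possible "longest member time" over all contiguous assignments.
    lo, hi = 0, min(sum(row) for row in costs)   # one member doing everything is always allowed
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(costs, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Hypothetical 2-member, 4-task example (times made up for illustration):
costs = [[4, 4, 6, 7],   # member 1's time for each task
         [5, 3, 3, 2]]   # member 2's time for each task
print(min_makespan(costs))   # -> 8 (e.g. task1 & task2 to member 1, task3 & task4 to member 2)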

Having a hard time to understand for loops and nested for loops [closed]

So, I understand programming very well, but recently I came across for loops, and especially nested ones, which I simply can't understand. It just won't enter my head. Can anyone give me some tips on how to understand these loops better (or at all)?
Thanks in advance
A program describes a sequence of operations for the computer to perform. Among those operations may be some subsequences that the computer should repeat multiple times. Instead of literally repeating these subsequences the appropriate number of times in the program source code (which in many cases is impossible), you can use a construct that tells the computer that at the end of such a subsequence it should return or 'loop' back to the beginning of that subsequence. These types of constructs are conventionally called "loops".
In some cases, a repeating subsequence of operations itself contains a subsequence of operations that should be repeated multiple times as part of performing the one iteration of the containing sequence. That, too, can be represented via a loop construct.
Example: algorithm for cleaning the windows in my house
1. Get cleaning supplies from the closet
2. If there are no more dirty windows then stop. Else,
3. Go to the next dirty window.
   3.1. Spray cleaner
   3.2. Wipe window
   3.3. If it's not clean enough then go back to step 3.1
4. Go back to step 2.
That has two loops: an outer one comprising every step except the first, and an inner one comprising steps 3.1 through 3.3.
Often, there is some kind of initialization or starting state that must be reached before starting the loop. In the example, I must have my cleaning supplies at hand before I can actually clean any windows, and I want to start at the first window.
In most interesting cases, you don't know in advance how many times the program will need to run through a given loop. In the example, for instance, I might be able to predict the number of iterations of the outer loop as the number of windows in my house, but I cannot be certain how many iterations of the inner loop will be needed for any given window. Looping constructs handle this by providing flexible conditions for loop termination.
On the other hand, something has to change from iteration to iteration, else the repetition will never stop. In the simplest case, the thing that changes to trigger eventual break from the loop is (abstractly) the number of loop iterations that have been performed already. Often, though, we want a more flexible measure of whether any more iterations are needed, such as "is the window clean enough yet?"
A C/Java-style for loop formalizes those three elements: initialization (getting the supplies), termination condition (are there any more dirty windows?), and update (go to the next window). The initialization step is performed once, before the first iteration. The termination condition is tested before each iteration (and the loop terminates if it evaluates to false), and the update step is performed after each iteration, before testing the termination condition for the next iteration. When the loop terminates normally, the computer next executes the statement immediately after the loop body.
To continue the silly example:
for (
        int window_number = 0;
        window_number < TOTAL_NUMBER_OF_WINDOWS;
        window_number = window_number + 1) {
    Window currentWindow = windows[window_number];
    do {
        cleaner.spray(currentWindow);
        cloth.wipe(currentWindow);
    } while (currentWindow.isDirty());
}
In this case I represented the inner loop with a different loop construct (do { ... } while) because it fits more naturally with the facts that there is no initialization step required, I don't need to test for termination before the first iteration, and the update step is performed within the body of the loop. Since it wouldn't actually be harmful to test the termination condition before the first iteration, however, I can write the inner loop as a for loop, too. I just leave the parts I don't need blank (but I always need the two semicolon separators):
for (
        int window_number = 0;
        window_number < TOTAL_NUMBER_OF_WINDOWS;
        window_number = window_number + 1) {
    Window currentWindow = windows[window_number];
    for (
            /* no initialization */ ;
            currentWindow.isDirty();
            /* no (additional) update */) {
        cleaner.spray(currentWindow);
        cloth.wipe(currentWindow);
    }
}
And that's most of what you need to know about loops in general and for loops in particular.
A loop placed inside the body of another loop is called a nested loop. The outer loop controls how many complete runs of the inner loop happen: in the example below, the inner loop is started 10 times because of the condition a < 10.
In the example below, "Print B" will appear 190 times, i.e. 10 * 19: the outer loop A runs inner loop B 10 times, and each run of inner loop B prints "Print B" 19 times (b goes from 1 to 19).
// Loop: A
for (int a = 0; a < 10; a++) {
    // Loop: B
    for (int b = 1; b < 20; b++) {
        System.out.println("Print B");
    }
}
There are many different types of for loops, but all behave similarly.
The basic idea of a for loop is that the code inside the for loop block will iterate for as long as the iterator is within a certain range.
i.e.
for(int i = 0; i < 10; i++)
{
    int x = i;
}
In the code here (C++) the iterator is i, and the code block is int x = i. This means that the code block will be executed from i = 0, to i = 9, each time setting x to i and then increasing the value of i by 1.
Here you can see another description: C++ For Loops
And if you are working in Java: Java For Loops
Nested for loops work the same way, the only difference is that for each iteration of the outer loop you iterate completely the inner loop.
i.e.
for(int i = 0; i < 3; i++)
{
    for(int j = 0; j < 5; j++)
    {
        int x = j;
    }
}
Here you see that each time you execute the code inside the first for loop, you will execute the code inside the inner for loop to completion, or until j equals 5. Then you iterate the outer loop and run it again.
Hope this helps.

Verilog: Minimal (hardware) algorithm for multiplying a binary input to its delayed form

I have a binary input in (1 bit serial input) which I want to delay by M clock pulses and then multiply (AND) the 2 signals. In other words, I want to evaluate the sum:
sum(in[n]*in[n+M])
where n is expressed in terms of number of clock pulses.
The most straightforward way is to store in a memory buffer in_dly the latest M samples of in. In Verilog, this would be something like:
always @(posedge clock ...)
    ...
    in_dly[M-1:0] <= {in_dly[M-2:0], in};
    if (in_dly[M-1] & in)
        sum <= sum + 'd1;
    ...
While this works in theory, with large values of M (can be ~2000), the size of the buffer is not practical. However, I was thinking to take advantage of the fact that the input signal is 1 bit and it is expected to toggle only a few times (~1-10) during M samples.
This made me think of storing the toggle times from 2k*M to (2k+1)*M in an array a and from (2k+1)*M to (2k+2)*M in an array b (k is just an integer used to generalize the idea):
reg [10:0] a[0:9]; //2^11 > max(M)=2000 and "a" has max 10 elements
reg [10:0] b[0:9]; //same as "a"
Therefore, during M samples, in = 'b1 during intervals [a[1],a[2]], [a[3],a[4]], etc. Similarly, during the next M samples, the input is high during [b[1],b[2]], [b[3],b[4]], etc. Now, the sum is the "overlapping" of these intervals:
min(b[2],a[2])-max(b[1],a[1]), if b[2]>a[1] and b[1]<a[2]; 0 otherwise
Finally, the array b becomes the new array a and the next M samples are evaluated and stored into b. The process is repeated until the end of in.
Comparing this "optimized" method to the initial one, there is a significant gain in hardware: initially 2000 bits were stored, and now 220 bits are stored (for this example). However, the number is still large and not very practical.
I would greatly appreciate if somebody could suggest a more optimal (hardware-wise) way or a simpler way (algorithm-wise) of doing this operation. Thank you in advance!
Edit:
Thanks to Alexey's idea, I optimized the algorithm as follows:
Given a set of delays M[i] for i=1 to 10 with M[1]<M[2]<..<M[10], and an input binary array in, we need to compute the outputs:
y[i] = sum(in[n]*in[n+M[i]]) for n=1 to length(in).
We then define 2 empty arrays a[j] and b[j] with j = 1 to ~5. Whenever in has a 0->1 transition, the lowest-index empty element a[j] is "activated" and will increment at each clock cycle. The same goes for b[j] at 1->0 transitions. Basically, the pairs (a[j],b[j]) represent the portions of in equal to 1.
Whenever a[j] equals M[i], the sum y[i] will increment by 1 at each cycle while in = 1, until b[j] equals M[i]. Once a[j] equals M[10], a[j] is cleared. Same goes for b[j]. This is repeated until the end of in.
Based on the same numerical assumptions as the initial question, a total of 10 arrays (a and b) of 11 bits allow the computation of the 10 sums, corresponding to 10 different delays M[i]. This is almost 20 times better (in terms of resources used) than my initial approach. Any further optimization or idea is welcomed!
Try this:
make an array A,
every time in==1, get a free element of A and write M to it,
every clock, decrement all non-zero elements of A,
once any decremented element becomes zero, test in; if in==1, sum++.
Edit: the algorithm above was intended for input like
- 00000000000010000000100010000000, while LLDinu really needs
- 11111111111110000000011111000000, so here is a modified algorithm:
make an array (ring buffer) A,
every time in toggles, get a free element of A and write M to it,
every clock, decrement all non-zero elements of A,
every clock, test in; if in==1 and the number of non-zero elements of A is even, sum++.
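Here is a quick Python sanity model of the modified scheme (software only, not synthesizable hardware, and my own illustration rather than code from the answer). The key observation it exercises: in[n-M] equals in[n] exactly when an even number of toggles happened during the last M cycles, which is what the number of live countdown elements in A measures.

import random

def brute_force(bits, M):
    # sum of in[n-M] * in[n], treating samples before time 0 as 0
    return sum(bits[n - M] & bits[n] for n in range(M, len(bits)))

def toggle_timers(bits, M):
    timers, prev, total = [], 0, 0            # "array A": one M-cycle countdown per toggle
    for b in bits:
        if b != prev:                         # a toggle: grab a free element and write M to it
            timers.append(M)
            prev = b
        if b == 1 and len(timers) % 2 == 0:   # even number of toggles in the last M cycles
            total += 1
        timers = [c - 1 for c in timers if c > 1]   # decrement, retire elements hitting zero
    return total

random.seed(1)
bits, i = [0] * 200, 0
while i < len(bits):                          # bursty waveform: long runs of 1s and 0s
    run, val = random.randint(5, 25), random.randint(0, 1)
    bits[i:i + run] = [val] * run
    i += run
bits = bits[:200]
print(brute_force(bits, 40), toggle_timers(bits, 40))   # the two sums should match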

Optimized algorithm to schedule tasks with dependency?

There are tasks that read from a file, do some processing and write to a file. These tasks are to be scheduled based on their dependencies. Also, tasks can be run in parallel, so the algorithm needs to be optimized to run dependent tasks in serial and independent tasks in parallel as much as possible.
eg:
A -> B
A -> C
B -> D
E -> F
So one way to run this would be to run 1, 2 & 4 in parallel, followed by 3.
Another way could be to run 1 and then run 2, 3 & 4 in parallel.
Another could be to run 1 and 3 in serial, and 2 and 4 in parallel.
Any ideas?
Let each task (e.g. A, B, ...) be a node in a directed acyclic graph and define the arcs between the nodes based on your dependencies 1, 2, ....
You can then topologically order your graph (or use a search-based method like BFS). In your example, C <- A -> B -> D and E -> F, so A & E have depth 0 and need to be run first. Then you can run F, B and C in parallel, followed by D.
Also, take a look at PERT.
Update:
How do you know whether B has a higher priority than F?
This is the intuition behind the topological sort used to find the ordering.
It first finds the root (no incoming edges) nodes (since one must exist in a DAG). In your case, that's A & E. This settles the first round of jobs which needs to be completed. Next, the children of the root nodes (B,C and F) need to be finished. This is easily obtained by querying your graph. The process is then repeated till there are no nodes (jobs) to be found (finished).
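Here is a rough Python sketch of that repeated "remove the roots" process (Kahn's algorithm), using the edges from the question; each printed line is a batch of jobs that can run in parallel. This is my own illustration of the idea, not code from the answer.

from collections import defaultdict

edges = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('E', 'F')]

children = defaultdict(list)
indegree = defaultdict(int)
nodes = set()
for u, v in edges:
    children[u].append(v)
    indegree[v] += 1
    nodes.update([u, v])

ready = sorted(n for n in nodes if indegree[n] == 0)   # jobs with no incoming edges
while ready:
    print(' '.join(ready))        # everything in this batch can run in parallel
    next_ready = []
    for u in ready:
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                next_ready.append(v)
    ready = sorted(next_ready)
# prints: "A E", then "B C F", then "D"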
Given a mapping between items, and items they depend on, a topological sort orders items so that no item precedes an item it depends upon.
This Rosetta code task has a solution in Python which can tell you which items are available to be processed in parallel.
Given your input the code becomes:
try:
    from functools import reduce
except:
    pass

data = { # From: http://stackoverflow.com/questions/18314250/optimized-algorithm-to-schedule-tasks-with-dependency
         # This <- This (Reverse of how shown in question)
    'B': set(['A']),
    'C': set(['A']),
    'D': set(['B']),
    'F': set(['E']),
    }

def toposort2(data):
    for k, v in data.items():
        v.discard(k)   # Ignore self dependencies
    extra_items_in_deps = reduce(set.union, data.values()) - set(data.keys())
    data.update({item: set() for item in extra_items_in_deps})
    while True:
        ordered = set(item for item, dep in data.items() if not dep)
        if not ordered:
            break
        yield ' '.join(sorted(ordered))
        data = {item: (dep - ordered) for item, dep in data.items()
                if item not in ordered}
    assert not data, "A cyclic dependency exists amongst %r" % data

print('\n'.join(toposort2(data)))
Which then generates this output:
A E
B C F
D
Items on one line of the output could be processed in any sub-order or, indeed, in parallel; just so long as all items of a higher line are processed before items of following lines to preserve the dependencies.
Your tasks form an oriented (directed) graph with (hopefully) no cycles.
It contains sources and wells (sources being tasks that depend on nothing (no inbound edge), wells being tasks that unlock no other task (no outbound edge)).
A simple solution would be to give each task a priority based on its usefulness (let's call that U).
Typically, starting with the wells, they have a usefulness U = 1, because we want them to finish.
Put all the wells' predecessors in a list L of nodes currently being assessed.
Then, taking each node in L, its U value is the sum of the U values of the nodes that depend on it, plus 1. Put all parents of the current node into the list L.
Loop until all nodes have been treated.
Then, start the task that can be started and has the biggest U value, because it is the one that will unlock the largest number of tasks.
In your example,
U(C) = U(D) = U(F) = 1
U(B) = U(E) = 2
U(A) = 4
Meaning you'll start A first with E if possible, then B and C (if possible), then D and F
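A short Python sketch of computing U (my own illustration; it uses memoized recursion instead of the explicit list L described above, but produces the same values): U of a well is 1, and U of any other task is 1 plus the sum of U over the tasks that directly depend on it.

from functools import lru_cache

edges = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('E', 'F')]
dependents = {}
nodes = set()
for u, v in edges:
    dependents.setdefault(u, []).append(v)   # finishing u unlocks v
    nodes.update([u, v])

@lru_cache(maxsize=None)
def usefulness(task):
    # U(well) = 1; otherwise 1 + sum of U over the tasks that depend on `task`
    return 1 + sum(usefulness(t) for t in dependents.get(task, []))

for n in sorted(sorted(nodes), key=usefulness, reverse=True):
    print(n, usefulness(n))
# -> A: 4, then B and E: 2, then C, D and F: 1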
First generate a topological ordering of your tasks and check for cycles at that stage. Thereafter you can exploit parallelism by looking at maximal antichains; roughly speaking, these are task sets without dependencies between their elements.
For a theoretical perspective, this paper covers the topic.
Without considering the serial/parallel aspect of the problem, this code can at least determine the overall serial solution:
def order_tasks(num_tasks, task_pair_list):
    task_deps = {}
    # initialize the dependency map
    for i in range(0, num_tasks):
        task_deps[i] = {}
    # store the dependencies
    for pair in task_pair_list:
        task = pair.task
        dep = pair.dependency
        task_deps[task].update({dep: 1})
    # loop through the map to determine an order
    while len(task_deps) > 0:
        delete_task = None
        # find a task with no dependencies
        for task in task_deps:
            if len(task_deps[task]) == 0:
                delete_task = task
                print(task)
                task_deps.pop(task)
                break
        if delete_task == None:
            return -1
        # remove delete_task from each remaining task's hash of dependencies
        for task in task_deps:
            if delete_task in task_deps[task]:
                del task_deps[task][delete_task]
    return 0
If you update the loop that checks for fully satisfied dependencies so that it runs through the entire list and executes/removes every task that no longer has any dependencies in the same pass, that should also allow you to take advantage of completing tasks in parallel.

How to design a data structure that allows one to search, insert and delete an integer X in O(1) time

Here is an exercise (3-15) in the book "Algorithm Design Manual".
Design a data structure that allows one to search, insert, and delete an integer X in O(1) time (i.e. , constant time, independent of the total number of integers stored). Assume that 1 ≤ X ≤ n and that there are m + n units of space available, where m is the maximum number of integers that can be in the table at any one time. (Hint: use two arrays A[1..n] and B[1..m].) You are not allowed to initialize either A or B, as that would take O(m) or O(n) operations. This means the arrays are full of random garbage to begin with, so you must be very careful.
I am not really looking for the answer, because I don't even understand what this exercise asks.
From the first sentence:
Design a data structure that allows one to search, insert, and delete an integer X in O(1) time
I can easily design a data structure like that. For example:
Because 1 <= X <= n, I can just have a bit vector of n slots and let X be the index into the array: when inserting, e.g., 5, set a[5] = 1; when deleting 5, set a[5] = 0; when searching for 5, I can simply return a[5], right?
I know this exercise is harder than I imagine, but what's the key point of this question?
You are basically implementing a multiset with bounded size, both in number of elements (#elements <= m), and valid range for elements (1 <= elementValue <= n).
Search: myCollection.search(x) --> return True if x inside, else False
Insert: myCollection.insert(x) --> add exactly one x to collection
Delete: myCollection.delete(x) --> remove exactly one x from collection
Consider what happens if you try to store 5 twice, e.g.
myCollection.insert(5)
myCollection.insert(5)
That is why you cannot use a bit vector. But it says "units" of space, so the elaboration of your method would be to keep a tally of each element. For example you might have [_,_,_,_,1,_,...] then [_,_,_,_,2,_,...].
Why doesn't this work however? It seems to work just fine for example if you insert 5 then delete 5... but what happens if you do .search(5) on an uninitialized array? You are specifically told you cannot initialize it, so you have no way to tell if the value you'll find in that piece of memory e.g. 24753 actually means "there are 24753 instances of 5" or if it's garbage.
NOTE: You must allow yourself O(1) initialization space, or the problem cannot be solved. (Otherwise a .search() would not be able to distinguish the random garbage in your memory from actual data, because you could always come up with random garbage which looked like actual data.) For example you might consider having a boolean which means "I have begun using my memory" which you initialize to False, and set to True the moment you start writing to your m words of memory.
If you'd like a full solution, you can hover over the grey block to reveal the one I came up with. It's only a few lines of code, but the proofs are a bit longer:
SPOILER: FULL SOLUTION
Setup:
Use N words as a dispatch table: locationOfCounts is an array of size N, whose entries are locations in the range [0, M]. locationOfCounts[i] is the location where the count of i would be stored, but we can only trust this value if we can prove it is not garbage.
(sidenote: This is equivalent to an array of pointers, but an array of pointers exposes you being able to look up garbage, so you'd have to code that implementation with pointer-range checks.)
To find out how many i's there are in the collection, you look up the value counts[loc] from above. We use M words as the counts themselves: counts is an array of size M, with two values per element. The first value is the number it represents, and the second value is the count of that number (in the range [1,m]). For example a value of (5,2) would mean that there are 2 instances of the number 5 stored in the collection.
(M words is enough space for all the counts. Proof: We know there can never be more than M elements, therefore the worst-case is we have M counts of value=1. QED)
(We also choose to only keep track of counts >= 1, otherwise we would not have enough memory.)
Use a number called numberOfCountsStored that IS initialized to 0 but is updated whenever the number of item types changes. For example, this number would be 0 for {}, 1 for {5:[1 times]}, 1 for {5:[2 times]}, and 2 for {5:[2 times],6:[4 times]}.
                          1  2  3  4  5  6  7  8 ...
locationOfCounts[<N]:    [☠, ☠, ☠, ☠, 0, 1, ☠, ☠, ...]
counts[<M]:              [(5,⨯2), (6,⨯4), ☠, ☠, ☠, ☠, ☠, ☠, ☠, ☠, ..., ☠]
numberOfCountsStored:     2
Below we flush out the details of each operation and prove why it's correct:
Algorithm:
There are two main ideas: 1) we can never allow ourselves to read memory without first verifying that it is not garbage, or if we do, we must be able to prove that it was garbage; 2) we need to be able to prove in O(1) time, using only O(1) space, that a piece of counter memory has been initialized. The O(1) space we use for this is numberOfCountsStored. Each time we do an operation, we go back to this number to prove that everything is correct (e.g. see ★ below). The representation invariant is that we always store counts in counts from left to right, so the valid entries of counts are exactly those at indices less than numberOfCountsStored.
.search(e) -- Check locationsOfCounts[e]. We assume for now that the value is properly initialized and can be trusted. We proceed to check counts[loc], but first we check if counts[loc] has been initialized: it's initialized if 0<=loc<numberOfCountsStored (if not, the data is nonsensical so we return False). After checking that, we look up counts[loc] which gives us a number,count pair. If number!=e, we got here by following randomized garbage (nonsensical), so we return False (again as above)... but if indeed number==e, this proves that the count is correct (★proof: numberOfCountsStored is a witness that this particular counts[loc] is valid, and counts[loc].number is a witness that locationOfCounts[number] is valid, and thus our original lookup was not garbage.), so we would return True.
.insert(e) -- Perform the steps in .search(e). If it already exists, we only need to increment the count by 1. However if it doesn't exist, we must tack on a new entry to the right of the counts subarray. First we increment numberOfCountsStored to reflect the fact that this new count is valid: loc = numberOfCountsStored++. Then we tack on the new entry: counts[loc] = (e,⨯1). Finally we add a reference back to it in our dispatch table so we can look it up quickly locationOfCounts[e] = loc.
.delete(e) -- Perform the steps in .search(e). If it doesn't exist, throw an error. If the count is >= 2, all we need to do is decrement the count by 1. Otherwise the count is 1, and the trick here to ensure the whole numberOfCountsStored-counts[...] invariant (i.e. everything remains stored on the left part of counts) is to perform swaps. If deletion would get rid of the last element, we will have lost a counts pair, leaving a hole in our array: [countPair0, countPair1, _hole_, countPair2, countPair{numberOfItemsStored-1}, ☠, ☠, ☠..., ☠]. We swap this hole with the last countPair, decrement numberOfCountsStored to invalidate the hole, and update locationOfCounts[the_count_record_we_swapped.number] so it now points to the new location of the count record.
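For concreteness, here is a hedged Python sketch of this counts-based solution (my own reading of the description above, reusing its names locationOfCounts, counts and numberOfCountsStored; the two big arrays are deliberately filled with random garbage to mimic the "no initialization" rule):

import random

class UninitializedMultiset:
    def __init__(self, n, m):
        # "garbage": we never rely on these initial values
        self.locationOfCounts = [random.randint(-99, 99) for _ in range(n + 1)]
        self.counts = [(random.randint(-99, 99), random.randint(-99, 99)) for _ in range(m)]
        self.numberOfCountsStored = 0      # the only O(1) initialization we allow ourselves

    def _trusted_loc(self, e):
        loc = self.locationOfCounts[e]
        if 0 <= loc < self.numberOfCountsStored and self.counts[loc][0] == e:
            return loc                     # both witnesses check out, so loc is not garbage
        return None

    def search(self, e):
        return self._trusted_loc(e) is not None

    def insert(self, e):
        loc = self._trusted_loc(e)
        if loc is not None:                # already present: bump its count
            num, cnt = self.counts[loc]
            self.counts[loc] = (num, cnt + 1)
        else:                              # new number: append a fresh count record
            loc = self.numberOfCountsStored
            self.numberOfCountsStored += 1
            self.counts[loc] = (e, 1)
            self.locationOfCounts[e] = loc

    def delete(self, e):
        loc = self._trusted_loc(e)
        if loc is None:
            raise KeyError(e)
        num, cnt = self.counts[loc]
        if cnt > 1:
            self.counts[loc] = (num, cnt - 1)
        else:                              # count record goes away: swap the last record in
            last = self.numberOfCountsStored - 1
            self.counts[loc] = self.counts[last]
            self.locationOfCounts[self.counts[loc][0]] = loc
            self.numberOfCountsStored -= 1

ms = UninitializedMultiset(n=100, m=10)
ms.insert(5); ms.insert(5); ms.insert(6)
print(ms.search(5), ms.search(7))   # True False
ms.delete(5); ms.delete(5)
print(ms.search(5), ms.search(6))   # False True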
Here is an idea:
treat the array B[1..m] as a stack, and make a pointer p to point to the top of the stack (let p = 0 to indicate that no elements have been inserted into the data structure). Now, to insert an integer X, use the following procedure:
p++;
A[X] = p;
B[p] = X;
Searching should be pretty easy to see here (let X' be the integer you want to search for, then just check that 1 <= A[X'] <= p, and that B[A[X']] == X'). Deleting is trickier, but still constant time. The idea is to search for the element to confirm that it is there, then move something into its spot in B (a good choice is B[p]). Then update A to reflect the pointer value of the replacement element and pop off the top of the stack (e.g. set B[p] = -1 and decrement p).
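Here is a runnable Python sketch of this stack scheme, the classic "sparse set" (my own illustration; it is 0-indexed, whereas the snippet above is 1-indexed, and the arrays start as random garbage to mimic the "no initialization" rule):

import random

class SparseSet:
    def __init__(self, n, m):
        self.A = [random.randint(-99, 99) for _ in range(n + 1)]   # uninitialized "garbage"
        self.B = [random.randint(-99, 99) for _ in range(m)]       # uninitialized "garbage"
        self.p = 0                                                 # number of stored elements

    def search(self, x):
        a = self.A[x]
        return 0 <= a < self.p and self.B[a] == x   # garbage cannot satisfy both checks

    def insert(self, x):
        if not self.search(x):          # keep it a set (at most one copy of each value)
            self.A[x] = self.p
            self.B[self.p] = x
            self.p += 1

    def delete(self, x):
        if self.search(x):
            last = self.B[self.p - 1]   # move the top of the stack into x's slot
            self.B[self.A[x]] = last
            self.A[last] = self.A[x]
            self.p -= 1

s = SparseSet(n=100, m=10)
s.insert(5); s.insert(42)
print(s.search(5), s.search(7))   # True False
s.delete(5)
print(s.search(5), s.search(42))  # False True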
It's easier to understand the question once you know the answer: an integer is in the set if A[X]<total_integers_stored && B[A[X]]==X.
The question is really asking if you can figure out how to create a data structure that is usable with a minimum of initialization.
I first saw the idea in Cameron's answer in Jon Bentley's Programming Pearls.
The idea is pretty simple, but it's not straightforward to see why the initial random values that may be in the uninitialized arrays do not matter. This link explains the insertion and search operations pretty well. Deletion is left as an exercise, but is answered by one of the commenters:
remove-member(i):
    if not is-member(i): return
    j = dense[n-1];
    dense[sparse[i]] = j;
    sparse[j] = sparse[i];
    n = n - 1

Resources