I recently encountered the bakery algorithm in my studies and just need to clarify some things.
Would it be possible for the bakery algorithm to violate mutual exclusion if processes did not pick a ticket number larger than that of all existing tickets?
Is setting number[i] to zero after the critical section important for success in the absence of contention?
And is one of the reasons the bakery algorithm is not used in practice that finding the maximum value of an array is non-atomic? I thought this was not the case, as that isn't the correct reason for it.
Would it be possible for the bakery algorithm to violate mutual exclusion if processes did not pick a ticket number larger than that of all existing tickets?
It wouldn't violate mutual exclusion as long as no two different processes end up holding the same number. But it would violate fairness, since a process that arrived at the critical section later could be given precedence over another process that has been waiting longer. So it's not critical, but it's also not ideal.
Is setting number[i] to zero after the critical section important for success in the absence of contention?
I don't think it's important. The reset serves to indicate that a process no longer wishes to enter the critical section. Not resetting it may cause others to think the process still wants to enter, which is not good, but I don't see it linked to a performance issue.
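Since both points refer to specific steps of the algorithm, here is a minimal runnable sketch in Python (the thread setup, iteration counts, and variable names are mine; CPython's GIL is not what makes this correct, it just lets the busy-wait demo run quickly). It shows both ingredients discussed above: the max+1 ticket choice on entry and the reset of number[i] to zero on exit.

```python
import threading

N = 3                      # number of processes/threads
choosing = [False] * N     # True while process i is picking a ticket
number = [0] * N           # 0 means "not interested"
counter = 0                # shared state the critical section protects

def lock(i):
    choosing[i] = True
    number[i] = max(number) + 1          # the "larger than all existing tickets" step
    choosing[i] = False
    for j in range(N):
        while choosing[j]:               # wait until j has finished choosing
            pass
        # wait while j holds a smaller ticket (ties broken by process id)
        while number[j] != 0 and (number[j], j) < (number[i], i):
            pass

def unlock(i):
    number[i] = 0                        # the reset discussed above

def worker(i):
    global counter
    for _ in range(200):
        lock(i)
        counter += 1                     # critical section
        unlock(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)                           # expect N * 200 = 600
```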
And is one of the reasons the bakery algorithm is not used in practice that finding the maximum value of an array is non-atomic? I thought this was not the case, as that isn't the correct reason for it.
I thought it certainly was, until I read that "as that isn't the correct reason for it." If you could share some more knowledge on this third point, I'd be thankful!
I'm working on a blockchain-based identity system. Since each item will stay in the chain forever, consuming space, I'm thinking of adding a proof-of-work requirement for adding items to the chain.
At first I was thinking of Bitcoin, since it's a tried and tested way to prove that work was done, but doing it this way would prevent users from joining in, since Bitcoin is not widely adopted yet. Also, in a distributed system, it is not clear who should get the money.
So I'm looking for a proof-of-work algorithm whose difficulty can easily be adjusted based on how fast the blockchain grows, and whose results would be hard to reuse. Also, if the difficulty has increased since the work was started, it should be possible to complete the work at the adjusted difficulty without having to redo it.
Can someone suggest something that would work for my purpose and would also be resistant to GPU acceleration?
Simple... burn bitcoins. Anyone can do it - so there's no barrier to entry, and really what you need is "proof of destroyed value". Because the value is destroyed, you know the miner's incentives are to strengthen your chain.
Invent a bitcoin address that cannot be real, but checksums correctly. Then have your miners send to that burn address, with a public key in OP_RETURN. Doing so earns them the right to mine during some narrow window of time.
"Difficulty" is adjusted by increasing the amount of bitcoins burned. Multiple burns in the same window can have the reward shared, but only one block is elected correct (the one with a checksum closest to the checksum of all of the valid burns for the window).
I've tried to reason through whether the algorithm fails in these cases, but I can't seem to find an example where it would.
If it doesn't fail, then why isn't any of these approaches followed?
Yes.
Don't forget that in later rounds, leaders may be proposing different values than in earlier rounds. Therefore the first message may have the wrong value.
Furthermore messages may arrive reordered. (Consider a node that goes offline, then comes back online to find messages coming in random order.) The most recent message may not be the most recently sent message.
And finally, don't forget that leaders change. The faster an acceptor can be convinced that it is on the wrong leader, the better.
Rather than asking whether the algorithm fails in such a scenario, consider this: if each node sees different messages lost, delayed, or reordered, is it correct for a node to just accept the first message it happens to receive? Clearly the answer is no.
The algorithm is designed to work when "first" cannot be decided by looking at the timestamp on a message, since clocks on different machines may be out of sync. It is designed to work when the network paths, distances, and congestion may differ between nodes. Nodes may crash and restart, or hang and resume, making things even more "hostile".
So a five-node cluster could have two nodes trying to be leader while the other three each see a random ordering of which leader's message arrived "first". What's the right answer in that case? The algorithm has a "correct" answer based on its rules, which ensures a consistent outcome under all such "hostile" conditions.
In summary, the point of Paxos is that our intuitive mental model of "first" as programmers assumes a perfect set of clocks, machines, and networks. That doesn't exist in the real world. To see whether things break when you change the algorithm, you need to "attack" the message flow with all of the failures described above. You will likely find some way to "break" things under any change.
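As a tiny illustration of why "accept the first message you see" is the wrong rule, here is a sketch (my own drastic simplification of just the accept phase, not the full protocol) of the acceptor-side check Paxos relies on instead: ordering by ballot/proposal number rather than by arrival order, so that two acceptors that receive the same messages in different orders still converge on the same decision.

```python
class Acceptor:
    """Simplified single-slot Paxos acceptor: accept phase only, no prepare/promise."""

    def __init__(self):
        self.promised = -1        # highest ballot seen so far
        self.accepted = None      # (ballot, value) last accepted

    def on_accept(self, ballot, value):
        # Arrival order is irrelevant; only the ballot number decides.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True           # ack
        return False              # reject the stale leader's message

# Two acceptors see the same two competing leaders' messages in opposite order...
a, b = Acceptor(), Acceptor()
a.on_accept(1, "X"); a.on_accept(2, "Y")
b.on_accept(2, "Y"); b.on_accept(1, "X")   # the stale ballot 1 is rejected here
print(a.accepted, b.accepted)              # both end with (2, 'Y')
```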
Section 3.3.6 of "The Part-Time Parliament" suggests that membership in the parliament (and thus the quorum for decisions) can be changed safely "by letting the membership of Parliament used in passing decree n be specified by the law as of decree n-3".
Translated into more common MultiPaxos terms, that means that the set of acceptors becomes part of the replicated state machine's state, changed by proposals to add or remove acceptors.
The quorum for slot N would be taken from the set of acceptors defined in the state when slot N-3 was decided.
Lamport offers no justification for this decision, and while his next paragraph says that changes must be handled with care and describes the ultimate failure of the algorithm, it fails for reasons unrelated to this particular issue.
Is this an adequate safeguard to ensure consistency? If so, what literature supports it?
I maintain a Paxos system that is a core component to several large web services. The system runs Basic Paxos, and not Multi-Paxos. In that system changes to the set of acceptors can be proposed like any other transition. The set of acceptors for a paxos instance N is the one that was approved in N-1.
I am unsure whether any literature supports this, but it is trivial to see that it works: because Paxos guarantees consensus on transition N-1, hosts are guaranteed to agree on which of them can act as acceptors for transition N.
However, things get a little more complicated with Multi-Paxos and Raft, or any pipelined consensus algorithm. According to the Raft video lecture, this must be a two-phase approach, but I don't recall whether he explains why.
On further reading of the Paxos slides for the raft user study linked by Michael, I see that my suggestion is close, but in fact every decision needs to be made in a view that is agreed on by all participants. If we choose that view to be that in effect at slot N-1, that limits the whole machine to lock-step: each slot can only be decided once the previous slot has been decided.
However, N-1 can be generalized to N-α, where Lamport sets α=3. As long as all participants agree on α, they agree on the view for each slot, which means that the rest of the algorithm's correctness holds.
This adds a fairly trivial amount of storage overhead, then: leaders must track the view for the most recent slot executed at the replica and the preceding α-1 slots. This is sufficient information to either determine the view for slot N (slot_views[N-α]) or know that the view is undefined (slot N-α or some previous slot is not yet decided) and thus ignore the proposal.
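A rough sketch of that bookkeeping (the class, names, and the "reconfig" command format are mine, not from any of the papers): the replicated log carries membership-change commands, and the view used for slot N is whatever membership was in force after slot N-α, or undefined if that slot has not been executed yet.

```python
ALPHA = 3   # Lamport's choice of alpha in "The Part-Time Parliament"

class ViewTracker:
    def __init__(self, initial_acceptors):
        # slot_views[s] = acceptor set in force after applying slot s; -1 seeds the initial view
        self.slot_views = {-1: frozenset(initial_acceptors)}

    def apply(self, slot, decided_value):
        """Called in log order once `slot` is decided and executed at the replica."""
        view = self.slot_views[slot - 1]
        if isinstance(decided_value, tuple) and decided_value[0] == "reconfig":
            view = frozenset(decided_value[1])          # a membership-change command
        self.slot_views[slot] = view

    def view_for(self, slot):
        """Acceptor set to use for `slot`, or None if slot - ALPHA is not decided yet."""
        return self.slot_views.get(max(slot - ALPHA, -1))

tracker = ViewTracker({"A", "B", "C"})
tracker.apply(0, "put x=1")
tracker.apply(1, ("reconfig", {"A", "B", "C", "D"}))
tracker.apply(2, "put y=2")
print(tracker.view_for(4))   # decided under the view after slot 1: {'A', 'B', 'C', 'D'}
print(tracker.view_for(7))   # slot 4 not applied yet, so the view is undefined: None
```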
I'm working on an interactive job-scheduling application. Given a set of resources with corresponding capacity/availability profiles, a set of jobs to be executed on these resources, and a set of constraints that determine job sequence and earliest/latest start/end times, I want to enable the user to manually move jobs around. Essentially I want the user to be able to "grab" a node of the job network and drag it forwards/backwards in time without violating any of the constraints.
The image shows a simple example configuration. The triangular job at the end denotes the latest finish time for all jobs, the connecting lines between jobs impose an order on them, and the gray/green bars denote resource availability and load.
You can drag any of the jobs to compress the schedule. Note that jobs will change in length due to different capacity profiles.
I have implemented an ad hoc algorithm that mostly works, but there are still cases where it will fail and violate some constraints. Since job-shop scheduling is a well-researched field with lots of algorithms and heuristics for finding an optimal (or at least good) solution to the general NP-hard problem, I'm thinking solutions ought to exist for my easier subset. I have looked into constraint programming and even physics-based approaches (rigid bodies connected via static joints) but so far couldn't find anything suitable. Any pointers/hints/tips/search keywords for me?
I highly recommend you take a look at Mozart Oz, if your problem deals only with integers. Oz has excellent support for finite-domain constraint specification, inference, and optimization. In your case you would typically do the following:
1) Specify your constraints in a declarative manner. Here you specify all the variables and their domains (say V1: 1#100, meaning variable V1 can take values in the range 1-100). Some variables might have values assigned directly, say V1: 99. In addition you specify all the constraints on the variables.
2) Ask the system for a solution: either any solution which satisfies the constraints, or an optimal one. Then you display this solution in the UI.
3) Let's say the user changes the value of a variable, maybe the start time of a task. Now you go back to step 1 and post the problem to the Oz solver again. This time, solving the problem will most probably not take as much time as before, since all the variables are already instantiated. It may be the case that the user chose an inconsistent value; in that case, the solver returns null and you can take the UI back to the earlier solution.
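The same workflow can be prototyped in other languages too. Here is a tiny plain-Python sketch of steps 1-3 (a naive brute-force finite-domain solver, nothing like Oz's propagation engine, and the job names and numbers are made up): declare domains and constraints, solve, then re-solve after the user pins a value, and fall back to the previous solution if the solver reports inconsistency.

```python
from itertools import product

def solve(domains, constraints):
    """Return the first assignment satisfying all constraints, or None."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(c(assignment) for c in constraints):
            return assignment
    return None

# Step 1: variables (start times of three jobs) with finite domains and constraints.
domains = {"startA": range(0, 10), "startB": range(0, 10), "startC": range(0, 10)}
constraints = [
    lambda a: a["startB"] >= a["startA"] + 3,   # A (3 units long) precedes B
    lambda a: a["startC"] >= a["startB"] + 2,   # B (2 units long) precedes C
    lambda a: a["startC"] + 4 <= 12,            # C (4 units long) must end by t=12
]

# Step 2: get an initial feasible schedule and show it in the UI.
current = solve(domains, constraints)
print("initial:", current)

# Step 3: the user drags job B to start at t=5 -> pin that value and re-solve.
pinned = dict(domains, startB=[5])
proposal = solve(pinned, constraints)
current = proposal if proposal is not None else current   # revert on inconsistency
print("after drag:", current)
```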
If Oz suits your need and you like the language, then you may want to write a constraint solver as a server which listens on a socket. This way, you can keep the constraint solver separate from the rest of your code, including the UI.
Hope this helps.
I would vote in favor of constraint programming for several reasons:
1) CP will quickly tell you if there is no schedule that satisfies your constraints.
2) It would appear that you want to give your users a feasible solution to start with, but allow them to manipulate jobs in order to improve the solution. CP is good at this too.
3) An MILP approach is usually complex and hard to formulate, and you have to artificially create an objective function.
4) CP is not that difficult to learn, especially for experienced programmers - it really comes more from the computer science community than from the operations researchers (like me).
Good luck.
You could probably alter the Waltz constraint propagation algorithm to deal with changing constraints to quickly find out if a given solution is valid. I don't have a reference to hand, but this might point you in the right direction:
http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6TYF-41C30BN-5&_user=809099&_rdoc=1&_fmt=&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1102030809&_rerunOrigin=google&_acct=C000043939&_version=1&_urlVersion=0&_userid=809099&md5=696143716f0d363581a1805b34ae32d9
Have you considered using an Integer Linear Programming engine (like lp_solve)? It's quite a good fit for scheduling applications.
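For a quick sense of what such a formulation looks like, here is a small sketch using Python's PuLP modelling library rather than lp_solve's own API (the job durations, precedences, and variable names are invented for illustration): continuous start-time variables, precedence constraints, and a makespan objective.

```python
from pulp import LpProblem, LpMinimize, LpVariable, LpStatus, value

durations = {"A": 3, "B": 2, "C": 4}        # hypothetical job lengths
precedes = [("A", "B"), ("A", "C")]         # A must finish before B and C start

prob = LpProblem("schedule", LpMinimize)
start = {j: LpVariable(f"start_{j}", lowBound=0) for j in durations}
makespan = LpVariable("makespan", lowBound=0)

for a, b in precedes:                       # precedence: b starts after a ends
    prob += start[b] >= start[a] + durations[a]
for j, d in durations.items():              # makespan bounds every job's end time
    prob += makespan >= start[j] + d
prob += makespan                            # objective: minimise the makespan

prob.solve()
print(LpStatus[prob.status], {j: value(v) for j, v in start.items()})
```

Latest-finish constraints, resource capacities, and user-pinned start times would be added as further linear constraints in the same style.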
This question is about a whole class of similar problems, but I'll ask it as a concrete example.
I have a server with a file system whose contents fluctuate. I need to monitor the available space on this file system to ensure that it doesn't fill up. For the sake of argument, let's suppose that if it fills up, the server goes down.
It doesn't really matter what the contents are -- they might, for example, be a queue of "work".
During "normal" operation, the available space varies within "normal" limits, but there may be pathologies:
Some other (possibly external) component that adds work may run out of control
Some component that removes work seizes up, but remains undetected
The statistical characteristics of the process are basically unknown.
What I'm looking for is an algorithm that takes as input timed periodic measurements of the available space (alternative suggestions for input are welcome) and produces as output an alarm when things are "abnormal" and the file system is "likely to fill up". It is obviously important to avoid false negatives, but almost as important to avoid false positives, so as not to numb the brain of the sysadmin who gets the alarm.
I appreciate that there are alternative solutions like throwing more storage space at the underlying problem, but I have actually experienced instances where 1000 times wasn't enough.
Algorithms which consider stored historical measurements are fine, although on-the-fly algorithms which minimise the amount of historic data are preferred.
I have accepted Frank's answer, and am now going back to the drawing-board to study his references in depth.
There are, I think, three cases of interest, in no particular order:
The "Harrods' Sale has just started" scenario: a peak of activity that at one-second resolution is "off the dial", but doesn't represent a real danger of resource depletion;
The "Global Warming" scenario: needing to plan for (relatively) stable growth; and
The "Google is sending me an unsolicited copy of The Index" scenario: this will deplete all my resources in relatively short order unless I do something to stop it.
It's the last one that is (I think) the most interesting and challenging from a sysadmin's point of view.
If it is actually related to a queue of work, then queueing theory may be the best route to an answer.
For the general case, you could perhaps attempt a (multiple?) linear regression on the historical data to detect whether there is a statistically significant rising trend in resource usage that is likely to lead to problems if it continues. With the same technique you may also be able to predict how long it can continue before it leads to problems: just set a threshold for "problem" and use the slope of the trend to estimate how long it will take to reach it. You would have to play around with this and with the variables you collect, though, to see if there is any statistically significant relationship you can discover in the first place.
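As a concrete (and deliberately simple) sketch of that idea, assuming evenly spaced samples of used space and a made-up capacity threshold: fit a least-squares line to the recent history, check that the slope is meaningfully positive, and extrapolate when the fitted line crosses the threshold.

```python
import numpy as np

def time_to_threshold(samples, threshold, min_slope=1e-9):
    """Fit used-space samples (one per interval) with a line and extrapolate.

    Returns the number of future intervals until the fitted line reaches
    `threshold`, or None if there is no meaningful upward trend.
    """
    t = np.arange(len(samples))
    slope, intercept = np.polyfit(t, samples, 1)       # least-squares line
    if slope <= min_slope:
        return None                                    # flat or shrinking usage
    return (threshold - (intercept + slope * t[-1])) / slope

# Hypothetical measurements: used GB sampled every 10 minutes on a 500 GB volume.
history = [300, 302, 301, 305, 308, 312, 318, 325, 333, 342]
eta = time_to_threshold(history, threshold=500)
if eta is not None and eta < 6 * 24:                   # less than a day away?
    print(f"alarm: projected to fill in about {eta:.0f} intervals")
else:
    print("no alarming trend")
```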
Although it covers a completely different topic (global warming), I've found tamino's blog (tamino.wordpress.com) to be a very good resource on statistical analysis of data that is full of knowns and unknowns. For example, see this post.
edit: as per my comment, I think the problem is somewhat analogous to the GW problem. You have short-term bursts of activity which average out to zero, and long-term trends superimposed on them that you are interested in. Also, there is probably more than one long-term trend, and it changes from time to time. Tamino describes a technique which may be suitable for this, but unfortunately I cannot find the post I'm thinking of. It involves sliding regressions along the data (imagine multiple lines fitted to noisy data) and letting the data pick the inflection points. If you could do this, you could perhaps identify a significant change in the trend. Unfortunately it may only be identifiable after the fact, as you may need to accumulate a lot of data to reach significance. But it might still be in time to head off resource depletion. At least it may give you a robust way to determine what kind of safety margin and reserve resources you need in future.
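If it helps, here is one crude way to experiment with the "sliding regressions" idea from that paragraph (entirely my own approximation of what Tamino describes, with made-up window sizes and synthetic data): fit a line to each sliding window of recent samples and flag the point where the local slope jumps well above its earlier level.

```python
import numpy as np

def sliding_slopes(samples, window=20):
    """Least-squares slope of each `window`-sized slice of the series."""
    t = np.arange(window)
    return np.array([np.polyfit(t, samples[i:i + window], 1)[0]
                     for i in range(len(samples) - window + 1)])

# Synthetic usage series: noisy but flat, then a genuine upward trend starts at sample 200.
rng = np.random.default_rng(0)
flat = 100 + rng.normal(0, 2, 200)
rising = flat[-1] + 0.5 * np.arange(1, 101) + rng.normal(0, 2, 100)
series = np.concatenate([flat, rising])

slopes = sliding_slopes(series)
baseline = slopes[:100]                        # slopes from the known-quiet early period
threshold = baseline.mean() + 4 * baseline.std()
suspects = np.nonzero(slopes > threshold)[0]
print("first suspicious window starts at sample",
      int(suspects[0]) if len(suspects) else None)
```

The 4-sigma cutoff and the choice of "quiet" baseline period are the knobs that trade false positives against detection delay, which matches the false-positive/false-negative tension described in the question.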