Leader election process with Timestamps - algorithm

I'm trying to create a leader election protocol to my specifications, but I keep failing. Let me describe it in more detail.
Let's imagine we have 5 nodes, A, B, C, D and E, and let's also assume that all of them compete to solve a puzzle, something like PoW (the details are irrelevant for our scenario). Let's also assume that C and D solve the puzzle simultaneously and send their proofs to the other nodes for verification.
A[C2,D3]
B[C4,D2]
E[C3,D6]
Nodes A, B and E receive messages from C and D. For example, node A receives a message from C at timestamp 2 and from D at timestamp 3; it also verifies the proofs that C and D solved the puzzle successfully.
What I am looking for now is a way to pick the faster of the two winners, C and D, and recognize it as the leader of the protocol. To do this, all nodes exchange their messages and average the times, and the winner is the node with the smallest average time. If we pick node B at random, it will calculate the time for C as (2+4+3)/3 = 3 and for D as (3+2+6)/3 ≈ 3.67.
Hence every node will calculate 3 for C and ≈3.67 for D, and finally they all choose C, the node with the lowest average timestamp, as the leader.
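Here is a minimal sketch of the averaging step, using the timestamps from the example above. It assumes every node ends up holding the same set of reports, which is exactly the assumption that breaks when a node is malicious. The names `observations`, `average_times` and `pick_leader`, and breaking ties by node name, are illustrative assumptions and not part of the original protocol.

```python
# Arrival timestamps each verifier recorded (from the example above):
# A[C2,D3], B[C4,D2], E[C3,D6]
observations = {
    "A": {"C": 2, "D": 3},
    "B": {"C": 4, "D": 2},
    "E": {"C": 3, "D": 6},
}

def average_times(obs):
    """Average, per candidate, the timestamps reported by every verifier."""
    candidates = next(iter(obs.values())).keys()
    return {c: sum(report[c] for report in obs.values()) / len(obs) for c in candidates}

def pick_leader(obs):
    """Candidate with the smallest average timestamp (ties broken by name, an assumption)."""
    avgs = average_times(obs)
    return min(avgs, key=lambda c: (avgs[c], c))

avgs = average_times(observations)
print({c: round(t, 2) for c, t in avgs.items()})  # {'C': 3.0, 'D': 3.67}
print(pick_leader(observations))                   # C
```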
Am I correct so far? Does this process make sense?
My big problem now is this: what happens if node B is malicious and tries to trick the protocol by sending different values to A and E? This will confuse all the nodes, so they cannot reach consensus. How can we fix this and get past this step?
Can anyone give me some ideas?
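The small simulation below shows why a single malicious B is enough to break the averaging scheme. B tells A the truth but sends E a different (made-up) timestamp for C, so A and E elect different leaders. This behaviour is usually called equivocation, and it is what Byzantine agreement protocols are designed to tolerate. The fake value 9 and the helper names are hypothetical.

```python
# Honest observations from the example: A[C2,D3], B[C4,D2], E[C3,D6]
honest = {
    "A": {"C": 2, "D": 3},
    "B": {"C": 4, "D": 2},
    "E": {"C": 3, "D": 6},
}

def leader_from(reports):
    """Average each candidate's timestamps over all reports; smallest average wins."""
    candidates = sorted(next(iter(reports.values())))
    avg = {c: sum(r[c] for r in reports.values()) / len(reports) for c in candidates}
    return min(candidates, key=lambda c: avg[c])

def view_of(receiver, equivocator, fake):
    """Reports as seen by `receiver`: honest ones, except what `equivocator` sent it."""
    view = dict(honest)
    if receiver in fake:
        view[equivocator] = fake[receiver]
    return view

# Hypothetical attack: B is honest towards A but tells E that C arrived at 9.
fake_reports = {"E": {"C": 9, "D": 2}}
for node in ("A", "E"):
    print(node, "elects", leader_from(view_of(node, "B", fake_reports)))
# A elects C
# E elects D
```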

"Falsehoods programmers believe about time" applies here, particularly the parts about trusting the system clock.
The first good algorithm for leader election was Paxos. A number of others have emerged since. See here for a partial list.
Getting distributed programming right is hard. In the past decade, Jepsen has analyzed a lot of systems. In the overwhelming majority of cases, the guarantees that the authors thought they could validly give about how their systems would fail were wrong. You shouldn't expect to beat that average. Therefore I strongly recommend that you delegate leader election to proven software such as ZooKeeper rather than trying to roll your own implementation.

Related

Algorithm to form teams of K members for each of M team leaders

I have two distinct sets: N workers and M team leaders. I have to form teams of size K from the N workers, where each team is allocated to one of the M team leaders based on some metric, i.e., I am going to assign a score to each (worker, team leader) pair, and I'd like to maximize the teams' scores.
It's been at least 10 years since my last implementation of an optimization algorithm (and more than 15 years since my last Operational Research exam at Uni), so my apologies if my description is not as clear as it should be.
I am searching for the best algorithm to provide a solution (it can also be sub-optimal). I have found suggestions to use the Hungarian algorithm, but I understand that it only works when K=2 (so each team leader manages only one worker).
What about K-Means clustering? (if I can think about the teams as clusters, minimize the distance of each worker from the best-suited team leader, then choose the top K workers for each cluster)
Also, some algorithms I found (e.g., hospitals/residents, based on stable marriage) seem to rely on "preferences" listed by the workers to be allocated, while here I will have a measure of the affinity between worker and TL. I was also thinking about using the Jaccard index for it, but again, I feel like I am making a patchwork of something I don't really master, and sadly I don't have enough time to study it.
Any other suggestions?
Also, if you could keep the answer language-agnostic, please!
Thanks
Vincenzo
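A common way to reuse the Hungarian algorithm when each leader takes several workers is to duplicate each team leader K times. Each copy (a "slot") then takes exactly one worker, and the problem becomes an ordinary one-to-one assignment. The sketch below shows that reduction. To stay self-contained it finds the best slot filling by brute force, which is only feasible for a handful of workers. For real sizes you would pass the duplicated score matrix to a Hungarian-algorithm implementation instead; for example, SciPy's `linear_sum_assignment` accepts rectangular matrices and a `maximize` flag. The function name and the `scores[worker][leader]` layout are assumptions. Workers beyond K·M are simply left unassigned.

```python
from itertools import permutations

def assign_teams(scores, k):
    """Maximize total affinity with exactly k workers per leader.

    scores[w][l] is the (hypothetical) affinity of worker w for leader l.
    Each leader is duplicated k times ("slots"); every slot gets one worker.
    Brute force over slot fillings: only suitable for tiny inputs.
    """
    n_workers, n_leaders = len(scores), len(scores[0])
    slots = [l for l in range(n_leaders) for _ in range(k)]  # slot index -> leader
    if n_workers < len(slots):
        raise ValueError("need at least k * M workers")
    best_total, best_fill = float("-inf"), None
    for fill in permutations(range(n_workers), len(slots)):
        total = sum(scores[w][slots[i]] for i, w in enumerate(fill))
        if total > best_total:
            best_total, best_fill = total, fill
    teams = {l: [] for l in range(n_leaders)}
    for i, w in enumerate(best_fill):
        teams[slots[i]].append(w)
    return teams, best_total

scores = [[5, 1], [4, 2], [1, 6], [2, 3]]  # 4 workers, 2 leaders
print(assign_teams(scores, k=2))  # ({0: [0, 1], 1: [2, 3]}, 18)
```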

BFT and PBFT and BA consensus algorithm

I've been digging into some of the most used consensus algorithms in permissioned blockchains and I got stuck.
I understood that BFT (Byzantine Fault Tolerance) is a property of some algorithms and pBFT is an algorithm itself. Is that right?
This rule that 2/3 of the nodes in the network are enough to reach consensus, is it for all BFT algorithms or just pBFT?
Also what is the difference between Byzantine Agreement and BFT?
If you could provide a reliable source of information, I'd be thankful.
I understood that BFT (Byzantine Fault Tolerance) is a property of some algorithms and pBFT is an algorithm itself. Is that right?
Yes.
This rule that 2/3 of the nodes in the network are enough to reach consensus, is it for all BFT algorithms or just pBFT?
Algorithms for Byzantine agreement can tolerate at most f failures in 3f+1 nodes, and some cannot even tolerate that many. The reasoning is as follows. If the f Byzantine nodes stop participating, the remaining n-f nodes must still be able to reach consensus. But if message delays temporarily hide f good nodes instead, then up to f of the n-f nodes that are heard from may be Byzantine, and the good ones must still be in the majority: n-2f > f, i.e., n-f >= 2f+1, so n >= 3f+1.
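The arithmetic behind the 3f+1 bound can be checked directly. The sketch below computes the largest tolerable f for a given n and the number of replies a node waits for under the argument above (n - f, which for n = 3f+1 is the 2f+1 quorum familiar from PBFT). It then checks that honest replies always outnumber Byzantine ones. The function names are illustrative.

```python
def max_byzantine_faults(n):
    """Largest f with n >= 3f + 1."""
    return (n - 1) // 3

def quorum_size(n):
    """Replies to wait for, following the argument above: n - f (2f + 1 when n = 3f + 1)."""
    return n - max_byzantine_faults(n)

for n in (3, 4, 7, 10):
    f = max_byzantine_faults(n)
    q = quorum_size(n)
    # Among q replies at most f come from Byzantine nodes, so at least q - f
    # are honest, and the honest replies outnumber the Byzantine ones.
    assert q - f > f
    print(f"n={n}: tolerates f={f}, waits for {q} replies")
```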
Also what is the difference between Byzantine Agreement and BFT?
The former is a distributed computing problem, more often referred to as consensus. The latter is a property of a protocol.

Prepare a schedule so that all courses are taught in the least time

I encountered one interview question:
There are some professors, some courses, and some students.
Each professor can teach only a single course.
Each course has a fixed duration (e.g., 10 weeks).
For each professor, you are given a weekly availability schedule.
Each student has a list of courses he wants to learn.
There can be only 1:1 classes, i.e., 1 professor can teach only a single student.
A student can attend only one course at a time.
A professor has to finish teaching a course in one go.
Your aim is to prepare a schedule so that all courses are taught in the least time.
My approach: I said this would be solved with graph theory, e.g., by making a directed edge from teacher to course or from teacher to student, but I was not able to solve it completely.
Is my approach correct, or is it a DP problem?
Any pseudocode or algorithm suggestions?
The problem you were asked is a scheduling problem, which is a dynamic programming problem. In particular, your problem is what is usually called FJm | brkdwn, pj = 10 | Cmax, which can be translated as follows:
There are m machines (the professors) that can process a part of a job (here, a job is the full teaching of one student) independently and in any order. Several machines may process the same part of a job (the same course).
machines are not continuously available
the duration of a part of a job (a course) is 10 weeks
you want to minimize the completion time of all jobs (the makespan)
There exist solvers that are well optimized for scheduling problems, but I am not sure whether modeling your problem as a scheduling problem and processing it with a scheduling solver is what your interviewer intended.
This is similar to the m-coloring problem, except that here we are asked to return the minimum m. Unfortunately, it's an NP-hard problem.
For the given problem, treat each course as a vertex, and add an edge between two vertices if they share a student or have the same professor.
First, find an upper bound on m (the minimum number of colors required) using the Welsh–Powell algorithm. Then binary search for the smallest m for which all vertices can be colored so that no two adjacent vertices share a color.
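The recipe above can be sketched as follows. Welsh–Powell provides an upper bound. A binary search then uses an exact backtracking k-colorability test (exponential in the worst case, as expected for an NP-hard problem). Presumably each color corresponds to one course-length time slot. Like the answer's model, this sketch does not encode professor availability windows. The course names and edges are made up for illustration.

```python
def welsh_powell(adj):
    """Welsh–Powell: visit vertices by decreasing degree, one color class per pass."""
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    color, c = {}, 0
    for v in order:
        if v in color:
            continue
        color[v] = c
        for u in order:  # give color c to every uncolored vertex that can take it
            if u not in color and all(color.get(w) != c for w in adj[u]):
                color[u] = c
        c += 1
    return color

def colorable(adj, k):
    """Exact backtracking test: can the graph be properly colored with k colors?"""
    verts = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    color = {}

    def place(i):
        if i == len(verts):
            return True
        v = verts[i]
        for c in range(k):
            if all(color.get(u) != c for u in adj[v]):
                color[v] = c
                if place(i + 1):
                    return True
                del color[v]
        return False

    return place(0)

def min_colors(adj):
    """Binary search between 1 and the Welsh–Powell upper bound."""
    if not adj:
        return 0
    lo, hi = 1, len(set(welsh_powell(adj).values()))
    while lo < hi:
        mid = (lo + hi) // 2
        if colorable(adj, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Hypothetical courses: an edge means "shared student or same professor".
edges = [("math", "physics"), ("physics", "chem"), ("chem", "math"), ("bio", "math")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
print(min_colors(adj))  # 3
```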

network flow approach for maximizing number of jobs that can be scheduled

I'm curious to learn the network-flow approach to this problem. I hope someone here can take the time to help me construct a suitable graph for it. When solved for maximum flow, the constructed graph should yield job-machine assignments that maximize the number of jobs scheduled for the given number of machines and job schedule.
Given m machines and n jobs, with m ≤ n, use a network flow algorithm to find assignments that maximize the number of jobs scheduled on the given machines.
Each job Ji has a start time Si and a finish time Fi. All machines are identical and can take at most one job at a time. We have to find an assignment that schedules the maximum number of jobs.
Approach I've tried:
-> Jobs and machines form the nodes of the graph.
-> An edge from the source to every job node.
-> An edge from every machine node to the terminal node.
-> An intermediary node for each job node, with incoming edges from each overlapping job node.
I'm stuck here on how to proceed further.
I've worked out a solution with a greedy approach, but I'm curious to learn the network-flow approach.
P.S.: I asked this same question before and it was downvoted without any explanation, so I'm re-asking because the previous question is not getting any attention.
How about this approach? I presume that you are familiar with the Circulation with Demand Problem.
Consider each job Ji to be a node, with an edge to job Jj if Jj can be done after Ji and the two do not overlap in any way. Now also add a node for every machine, named Mi. In this model, each Ji node has a demand of -1 and each machine node has a demand of 0. Also add a node t with demand n, and connect every machine node to it with capacity n. Every other edge has capacity 1.
Now solve this as a circulation with demands, and I think you will get your answer.
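As a starting point, the sketch below builds the parts of this network that the answer specifies. These are the job-to-job compatibility edges (capacity 1), the machine-to-t edges (capacity n), and the node demands, which sum to zero as a circulation requires. The answer does not say how job nodes connect to machine nodes, so those edges are deliberately left out. Letting a job start exactly when another finishes is an assumption, and the function names are hypothetical.

```python
def compatibility_edges(jobs):
    """Edges J_i -> J_j where J_j can run after J_i on the same machine.

    jobs[i] = (S_i, F_i). Assumption: a job may start exactly when another finishes.
    """
    edges = []
    for i, (_, f_i) in enumerate(jobs):
        for j, (s_j, _) in enumerate(jobs):
            if i != j and f_i <= s_j:
                edges.append((i, j))
    return edges

def circulation_network(jobs, m):
    """Demands and capacities for the parts of the network the answer describes."""
    n = len(jobs)
    demand = {("J", i): -1 for i in range(n)}          # every job supplies one unit
    demand.update({("M", k): 0 for k in range(m)})     # machines are pass-through
    demand["t"] = n                                    # t absorbs all n units
    capacity = {(("J", i), ("J", j)): 1 for i, j in compatibility_edges(jobs)}
    capacity.update({(("M", k), "t"): n for k in range(m)})
    # Job -> machine edges are not specified in the answer, so none are added here.
    return demand, capacity

jobs = [(1, 3), (2, 5), (4, 6), (6, 8)]
print(compatibility_edges(jobs))  # [(0, 2), (0, 3), (1, 3), (2, 3)]
```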

Is there a non-deterministic leader election algorithm?

I was wondering whether a non-deterministic leader election algorithm exists (in a unidirectional ring) that ensures termination. I cannot think of one, nor can I find one that is non-deterministic. Some that I've found are:
Select the node with the highest process ID as the leader (deterministic; terminates).
Randomly decide whether a node wants to be the leader; if another node in the ring also wants to be the leader, restart the whole process. This one is not guaranteed to terminate, although it terminates with some probability.
Can someone give me some hints on how to create a distributed non-deterministic leader election algorithm? And maybe explain it in layman's terms.
Thanks for everything!
There exists no probabilistic (anonymous) leader election algorithm with both guaranteed termination and guaranteed correctness (only one leader). If I remember correctly, you will find a proof in N. Lynch's book Distributed Algorithms.
However, there exist algorithms whose probability of not terminating tends to zero as the runtime grows. In addition, the expected runtime is rather short (as far as I remember, on the order of ln(k) for k initiators).
The main idea for such an algorithm follows your second approach. However, do not simply restart the process when there are several leaders; instead, only allow the winners of the last round to become candidates in the next round.
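The "only last round's winners continue" idea can be illustrated with a centralized simulation. In a real ring, each candidate would learn how many candidates won a round by passing a message around the ring. That step is abstracted away here, and the fair coin is an assumption. While two or more candidates remain, every round has a positive chance of shrinking the set and never empties it. As a result, the run ends with exactly one leader, and the probability of still running after r rounds goes to zero as r grows. The function name and the requirement n ≥ 1 belong to this sketch only.

```python
import random

def elect(n, rng):
    """Simulate rounds among n >= 1 anonymous initiators; returns (leader, rounds)."""
    candidates = list(range(n))  # indices exist only for the simulation's bookkeeping
    rounds = 0
    while len(candidates) > 1:
        rounds += 1
        # Every remaining candidate flips a fair coin.
        winners = [c for c in candidates if rng.random() < 0.5]
        # Only this round's winners stay candidates; if nobody won,
        # the same candidates simply try again in the next round.
        if winners:
            candidates = winners
    return candidates[0], rounds

leader, rounds = elect(16, random.Random(42))
print(leader, rounds)
```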
EDIT 1-3:
If you are asking about non-anonymous leader election, there are several probabilistic algorithms that guarantee termination. For example, take a normal ring algorithm and delay messages with a certain probability: the smaller the ID, the greater the chance of delay. In this way, messages with a low chance of winning are erased earlier, resulting in fewer messages overall.
If you want a non-deterministic outcome with non-anonymous members, you can, e.g., use a two-phase algorithm:
Perform a classical leader election => the node A with the highest ID wins.
Let A roll a die to determine the actual leader.
The element of chance could also be distributed:
Every node knows the set of identities S (if not, use a flooding algorithm to distribute it).
Each node randomly selects an ID from S and sends it to every other node.
The ID named most often wins. If there is more than one such ID, select one of them deterministically, e.g., the median.
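The distributed-vote variant can be sketched as below. The sketch assumes every node ends up with the same multiset of votes (reliable broadcast, no faulty nodes), so each node computes the same winner locally. For an even number of tied IDs it uses the lower median, a tie-breaking choice made here for illustration. The function names are hypothetical.

```python
import random
from collections import Counter

def pick_random_votes(ids, rng):
    """Each node independently picks a random ID from S and broadcasts it."""
    return [rng.choice(ids) for _ in ids]

def winner(votes):
    """Most-named ID wins; ties are broken deterministically by the (lower) median."""
    counts = Counter(votes)
    top = max(counts.values())
    tied = sorted(i for i, c in counts.items() if c == top)
    return tied[(len(tied) - 1) // 2]

print(winner([2, 2, 5]))        # 2
print(winner([1, 3, 3, 1, 2]))  # 1 (tie between 1 and 3, lower median)
ids = [10, 20, 30, 40]
print(winner(pick_random_votes(ids, random.Random(7))))
```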
Termination and a non-deterministic outcome are guaranteed for both algorithms. However, the first has a lower average message complexity (n log n vs. n²; the worst-case complexity is the same) and is fairer (i.e., every ID is equally likely to win, which is not true for the second algorithm). I'm pretty sure that at least the last disadvantage can be eliminated by a more sophisticated algorithm, but the question here was about the general existence of such an algorithm.
From what I understand from the question, you're not actually looking for an election algorithm, just a distributed algorithm to fairly and randomly choose one client as "leader," where no subset of clients can work together to cheat.
This is actually pretty easy. Treat each client as a card in a deck of cards, then use the mental poker algorithm (which is a distributed algorithm to fairly and randomly shuffle a deck of cards) to shuffle it. Then just take the first card as the leader.
