Which Data structure is suitable queue or stack? - data-structures

I am given a task to develop an application that will manage the appointment of patients coming for treatment. There can be two types of patients called normal patients and emergency patients. The system will take information of every patient and decide the turn of patient on the basis of type of patient.
If the patient will be from normal category, it will take appointment on the basis of arrival while the emergency patient will take appointment early than normal patients. All normal and emergency patients will be added into same system but appointment will be given differently. Every emergency patient will get appointment immediately.
Which data structure from Stack and Queue will be most efficient choice for the development of required application.
Note: You are not allowed to use any other data structures like priority queue and double ended queue
According to me, queue is a better option for calling normal patients because this follows FIFO. But how to deal with emergency patients without using priority queue

Are you prohibited from using two queues ? If not use two one for emergency patients the other for regular patients.

Which data structure from Stack and Queue will be most efficient choice for the development of required application.
A stack won't be of much help since that's a LIFO ("last in, first out") container. You should use a queue.
For the simple case, you could use two separate regular FIFO ("first in, first out") queues. When it's time to pick a patient:
If the emergency queue is not empty, pick one from that queue.
otherwise pick from the other queue.
A more generic solution is to use a priority queue.
Upon initial inspection of a new patient, you assign a priority to the patient, which can be a number from 0-100 for example. The patient is then placed in the priority queue, which automatically puts the patient before all other patiens with a lower priority.
When it's time to pick a patient you will always pick the first in line since the patients are lined up in the queue based on the priority of the matter.
A second sorting parameter for the priority queue could be the time of arrival, so that two patients that have been assigned the same priority are ordered according to when they came to seek help.

Related

Approach to measuring end-to-end latency from a sales transaction to a stock level in a database

I have a system in which sales transactions are written to a Kafka topic in real time. One of the consumers of this data is an aggregator program which maintains a database of stock quantities for all locations, in real time; it will consume data from multiple other sources as well. For example, when a product is sold from a store, the aggregator will reduce the quantity of that product in that store by the quantity sold.
This aggregator's database will be presented via an API to allow applications to check stock availability (the inventory) in any store in real time.
(Note for context - yes, there is an ERP behind all this which does a lot more; the purpose of this inventory API is to consume data from multiple sources, including the ERP and the ERP's data feeds, and potentially other ERPs in future, to give a single global information source for this singular purpose).
What I want to do is to measure the end-to-end latency: how long it takes from a sales transaction being written to the topic, to being processed by the aggregator (not just read from the topic). This will give an indicator of how far behind real-time the inventory database is.
The sales transaction topic will probably be partitioned, so the transactions may not arrive in order.
So far I have thought of two methods.
Method 1 - measure latency via stock level changes
Here, the sales producer injects a special "measurement" sale each minute, for an invalid location like "SKU 0 in branch 0". The sale quantity would be based on the time of day, using a numerical sequence of some kind. A program would then poll the inventory API, or directly read the database, to check for changes in the level. When it changes, the magnitude of the change will indicate the time of the originating transaction. The difference between then and now gives us the latency.
Problem: If multiple transactions are queued and are then later all processed together, the change in inventory value will be the sum of the queued transactions, giving a false reading.
To solve this, the sequence of numbers would have to be chosen such that when they are added together, we can always determine which was the lowest number, giving us the oldest transaction and therefore the latency.
We could use powers of 2 for this, so the lowest bit set would indicate the earliest transaction time. Our sequence would have to reset every 30 or 60 minutes and we'd have to cope with wraparound and lots of edge cases.
Assuming we can solve the wraparound problem and that a maximum measurable latency of, say, 20 minutes is OK (after which we just say it's "too high"), then with this method, it does not matter whether transactions are processed out of sequence or split into partitions.
This method gives a "true" picture of the end-to-end latency, in that it's measuring the point at which the database has actually been updated.
Method 2 - measure latency via special timestamp record
Instead of injecting measurement sales records, we use a timestamp which the producer is adding to the raw data. This timestamp is just the time at which the producer transmitted this record.
The aggregator would maintain a measurement of the most recently seen timestamp. The difference between that and the current time would give the latency.
Problem: If transactions are not processed in order, the latency measurement will be unstable, because it relies on the timestamps arriving in sequence.
To solve this, the aggregator would not just output the last timestamp it saw, but instead would output the oldest timestamp it had seen in the past minute across all of its threads (assuming multiple threads potentially reading from multiple partitions). This would give a less "lumpy" view.
This method gives an approximate picture of the end-to-end latency, since it's measuring the point at which the aggregator receives the sales record, not the point at which the database has been updated.
The questions
Is one method more likely to get usable results than the other?
For method 1, is there a sequence of numbers which would be more efficient than powers of 2 in allowing us to work out the earliest value when multiple ones arrive at once, requiring fewer bits so that the time before sequence reset would be longer?
Would method 1 have the same problem of "lumpy" data as method 2, in the case of a large number of partitions or data arriving out of order?
Given that method 2 seems simpler, is the method of smoothing out the lumps in the measurement a plausible one?

Solana - leader validator and incrementing field

As I understand it, Solana will elect a leader each round and there will be multiple validators handling the transactions independently. The leader will then consolidate all the transactions.
From this understanding, I'm curious how Solana actually handles programs which increment a field. So lets say we have this counter field, which increases by 1 each time the program is called. What happens if 10 different users calls this program at the same time, how will this work if the 10 transactions are handled by the ten validators independently. For example at the start of the round, counter=50 and during the round, ten different validators handles the transactions separately so each validator will increase the counter=51. When the leader gets back all the txns, it will say counter=51, what happens in this scenario?
I feel like there is something missing in my assumptions.
So my understanding here seems to be incorrect. It is actually the leader who executes the transactions and the validators who are verifying the transactions.
Source
Page 2 - Section 3 - https://solana.com/solana-whitepaper.pdf
As shown in Figure 1, at any given time a system node is designated as
Leader to generate a Proof of History sequence, providing the network global
read consistency and a verifiable passage of time. The Leader sequences user
messages and orders them such that they can be efficiently processed by other
nodes in the system, maximizing throughput. It executes the transactions
on the current state that is stored in RAM and publishes the transactions
and a signature of the final state to the replications nodes called Verifiers.
Verifiers execute the same transactions on their copies of the state, and publish their computed signatures of the state as confirmations. The published
confirmations serve as votes for the consensus algorithm.
The "recent blockhash" is another important part of this. A transaction references a recent blockhash, which is part of the Proof of History sequence. If two transactions reference the same blockhash, they are counted as duplicates by the network, even if they come from two different users.
More information can be found at https://docs.solana.com/developing/programming-model/transactions#recent-blockhash
There is only one PoH generator(Block producer) at a time. other nodes are just validating.
I cannot comment to Jon C but the answer is wrong. you can use the same recent blockhash otherwise there is no way solana can handle 50000 tps when block time is around 0.4 sec.

In Nifi, what is the difference between FirstInFirstOutPrioritizer and OldestFlowFileFirstPrioritizer

User guide https://nifi.apache.org/docs/nifi-docs/html/user-guide.html has the below details on prioritizers, could you please help me understand how these are different and provide any real time example.
FirstInFirstOutPrioritizer: Given two FlowFiles, the one that reached the connection first will be processed first.
OldestFlowFileFirstPrioritizer: Given two FlowFiles, the one that is oldest in the dataflow will be processed first. 'This is the default scheme that is used if no prioritizers are selected.'
Imagine two processors A and B that are both connected to a funnel, and then the funnel connects to processor C.
Scenario 1 - The connection between the funnel and processor C has first-in-first-out prioritizer.
In this case, the flow files in the queue between the funnel and connection C will be processed strictly based on the order they reached the queue.
Scenario 2 - The connection between the funnel and processor C has oldest-flow-file-first prioritizer.
In this case, there could already be flow files in the queue between the funnel and connection C, but one of the processors transfers a flow to that queue that is older than all the flow files in that queue, it will jump to the front.
You could imagine that some flow files come from a different portion of the flow that takes way longer to process than other flow files, but they both end up funneled into the same queue, so these flow files from the longer processing part are considered older.
Apache NiFi handles data from many disparate sources and can route it through a number of different processors. Let's use the following example (ignore the processor types, just focus on the titles):
First, the relative rate of incoming data can be different depending on the source/ingestion point. In this case, the database poll is being done once per minute, while the HTTP poll is every 5 seconds, and the file tailing is every second. So even if a database record is 59 seconds "older" than another, if they are captured in the same execution of the processor, they will enter NiFi at the same time and the flowfile(s) (depending on splitting) will have the same origin time.
If some data coming into the system "is dirty", it gets routed to a processor which "cleans" it. This processor takes 3 seconds to execute.
If both the clean relationship and the success relationship from "Clean Data" went directly to "Process Data", you wouldn't be able to control the order that those flowfiles were processed. However, because there is a funnel that merges those queues, you can choose a prioritizer on the queued queue, and control that order. Do you want the first flowfile to enter that queue processed first, or do you want flowfiles that entered NiFi earlier to be processed first, even if they entered this specific queue after a newer flowfile?
This is a contrived example, but you can apply this to disaster recovery situations where some data was missed for a time window and is now being recovered, or a flow that processes time-sensitive data and the insights aren't valid after a certain period of time has passed. If using backpressure or getting data in large (slow) batches, you can see how in some cases, oldest first is less valuable and vice versa.

What exactly is the queue Abstract Data Type?

I am not clear on the idea of a Queue. It seems that this term is ambiguous or at least I am confused about it.
While it seems that the most common explanation of a queue (e.g. in wikipedia) is that it is an Abstract Data Type that adheres to the FIFO principle, in practice this term appears to have a broader meaning.
For example, we have
Priority Queues where each item is retrieve according to a priority,
we have a stack which also is a form of inverse queue (LIFO),
we have message queues, which seem to be just a list of items with no
ordering, there by classifying a simple list as a queue etc
So could someone please help me out here on why exactly a queue has so many different meanings?
A queue is inherently a data structure following the FIFO principle as its default nature.
Let us treat this queue as a queue in our natural day-to-day lives. Take an example of a queue on the railway station for purchasing tickets.
Normal queue: The person standing front-most in the queue gets the ticket, and any new person arriving stands at the end of the queue, waiting for his turn to get a ticket.
Priority queue: Suppose you are a VIP standing in the middle of that queue. The ticket vendor immediately notices you, and calls you to the front of the queue to get your tickets, even though its not your turn to purchase. Had you not been important, the queue would have kept playing its usual role, but as soon as any element is considered more important than the other, its picked up, irrespective of its position in the queue. But otherwise, the default nature of the queue remains the same.
Stack: Let's not confuse it with the queue at all. The purpose of the stack is inherently different from that of a queue. Take an example of dishes washed and kept in your kitchen, where the last dish washed is the first one to be picked for serving. So, stack and queue have a different role to play in different situations, and should not be confused with each other.
Message queue: As is the case with priority queue, the default nature of this queue is that the message that comes first is read first, while the upcoming messages line up in the queue waiting for their turn, unless a message is considered more important than the other and is called to the front of the queue before its usual turn.
So, the default nature of any type of queue remains the same, it continues to follow its FIFO principle unless its made to do otherwise, in special circumstances.
Hope it helps
In general, a queue models a waiting area where items enter and are eventually selected and removed. However, different queues can have different scheduling policies such as First-In-First-Out (FIFO), Last-In-First-Out (LIFO), Priority, or Random. For example, queueing theory addresses all of these as queues.
However, in computer science/programming, we typically use the word "queue" to refer specifically to FIFO queues, and use the other words (stack, priority queue, etc.) for the other scheduling policies. In practice, you should assume FIFO when you hear the word queue, but don't completely rule out the possibility that the speaker might be using the word more generally.
As an aside, similar issues arise with the word "heap" which, depending on context, might refer to a specific implementation of priority queues or to priority queues in general (or, in a mostly unrelated meaning, to an area of memory used for dynamic allocation).
Priority Queue: Its not a queue. Take a look: http://docs.oracle.com/javase/7/docs/api/java/util/PriorityQueue.html it does not implement the Queue interface.
Stack: Its not a queue. Take a look: http://docs.oracle.com/javase/7/docs/api/java/util/Stack.html it does not implement the Queue interface.
Message queue: I do not know what it is.
Queue: Queue has only one meaning, whoever comes first also get served first.
Queue ADT: It is an interface, meaning it has bunch of functions. Most common ones: add-adds to the end of the line, remove-removes from the beginning of the line. http://docs.oracle.com/javase/7/docs/api/java/util/Queue.html

Efficiently implementing Birman-Schiper-Stephenson(BSS) protocol's delay queue

I am using the Birman-Schiper-Stephenson protocol of distributed system with the current assumption that peer set of any node doesn't change. As the protocol dictates, the messages which have come out of causal order to a node have to be put in a 'delay queue'. My problem is with the organisation of the delay queue where we must implement some kind of order with the messages. After deciding the order we will have to make a 'Wake-Up' protocol which would efficiently search the queue after the current timestamp is modified to find out if one of the delayed messages can be 'woken-up' and accepted.
I was thinking of segregating the delayed messages into bins based on the points of difference of their vector-timestamps with the timestamp of this node. But the number of bins can be very large and maintaining them won't be efficient.
Please suggest some designs for such a queue(s).
Sorry about the delay -- didn't see your question until now. Anyhow, if you look at Isis2.codeplex.com you'll see that in Isis2, I have a causalsend implementation that employs the same vector timestamp scheme we described in the BSS paper. What I do is to keep my messages in a partial order, sorted by VT, and then when a delivery occurs I can look at the delayed queue and deliver off the front of the queue until I find something that isn't deliverable. Everything behind it will be undeliverable too.
But in fact there is a deeper insight here: you actually never want to allow the queue of delayed messages to get very long. If the queue gets longer than a few messages (say, 50 or 100) you run into the problem that the guy with the queue could be holding quite a few bytes of data and may start paging or otherwise running slowly. So it becomes a self-perpetuating cycle in which because he has a queue, he is very likely to be dropping messages and hence enqueuing more and more. Plus in any case from his point of view, the urgent thing is to recover that missed message that caused the others to be out of order.
What this adds up to is that you need a flow control scheme in which the amount of pending asynchronous stuff is kept small. But once you know the queue is small, searching every single element won't be very costly! So this deeper perspective says flow control is needed no matter what, and then because of flow control (if you have a flow control scheme that works) the queue is small, and because the queue is small, the search won't be costly!

Resources