I have implemented a Device2Device-capable LTE scheduler for the SimuLTE framework for OMNeT++. It needs to reassign resource blocks. In typical LTE networks, each resource block is assigned to at most one node. With the advent of D2D, uplink resource blocks can be reassigned to D2D users.
The simulator knows both resource blocks and bands. A band is a logical collection of transmission frequencies. If numBands == numRBs then each band corresponds to one resource block.
So to try things out, I set numRBs = numBands = 1 and have two nodes. My scheduler simply assigns the first band to the first node and then attempts to reassign the same band to the second node. The SchedulingResult tells me I should TERMINATE here, which suggests that reassignment is not supported at all.
However, both nodes transmit UDP packets, and if I run the simulation to the end, I find that both actually got to send out the same number of packets. Going through the logs, I find that every 5th scheduling round (the number 5 might be specific to my setup), the second node still gets the TERMINATE answer but is granted a resource block anyway. This seems to happen within the eNodeB main loop. How, why, what's going on? Does anyone know?
It turns out that one of the two notions (bands vs. resource blocks) is obsolete and likely to be removed in a later version. Users of the framework should set numBands == numRBs! The number of resource blocks is absolute, so this means that each band has exactly one resource block available.
Note that band reassignment is not currently supported. Attempting it will always end in the TERMINATE answer described in the question, because there is a check that makes sure the number of unassigned resource blocks is > 0, which is not true if the band (and therefore the resource block linked to it) has already been assigned.
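To illustrate (a minimal sketch only: SimuLTE itself is C++, and every name below is invented rather than taken from the framework's actual API), the check behaves roughly like this, assuming numBands == numRBs:

enum SchedulingResult { OK, TERMINATE }

class GrantSketch {
    int totalRBs = 1;   // numRBs == numBands == 1, as in the question's setup
    int assignedRBs = 0;

    // Hypothetical grant logic: assigning a band consumes its single
    // resource block, so the second request finds no unassigned RBs.
    SchedulingResult tryGrant() {
        if (totalRBs - assignedRBs <= 0) {
            return SchedulingResult.TERMINATE; // reassignment not supported
        }
        assignedRBs++;
        return SchedulingResult.OK;
    }
}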
I am trying to understand what is actually in Simple NightShade, the phase 0 of the sharding strategy in NEAR. I have read all the relevant Medium posts and watched the videos in them, and I have also searched the Zulip chat. I got some understanding of the upcoming chunk and block producer selection algorithm, which I think might be in phase 1, but I cannot understand what was actually implemented.
Here is some of the information I got from the Medium posts:
There are now 4 shards; only state is sharded, computation is not (all validators, aka block producers, have to track all shards). From the article [near-launches-simple-nightshade-].
Above all, it will be far cheaper to reach consensus — i.e., add a block to NEAR’s chain, since each block will only require a backing of 0.1% of all of the staked coins in NEAR’s ecosystem. From the article [how-simple-nightshade-works].
Under the section "What makes Simple Nightshade unique" in the article [primer]:
On a physical level, no participant downloads either the full state or the full logical block. Instead, they maintain the state that is connected to the shards for which they validate transactions.
Here are my questions about what is in phase 0:
1. What is meant by "all validators track all the shards"? Do they compute all the state transitions in all the shards for the same block, or do they get assigned to one shard per epoch and rotate through all shards?
2. If the former case is true for Q1, and the runtime is not aware of sharding, does this mean all validators have 4 runtimes running at the same time to accommodate cross-shard transactions? And then how can the validators not download all the states?
3. How are shards currently split? By Merkle trie hash address?
4. How does this phase require less backing per block? I saw from this Stack Overflow post that the gas limit per block is now 4 times that of pre-phase 0, so this means (in this phase) that validators may be doing 4x the work as before (until chunk producers come in).
I would really appreciate help understanding this! Thank you!
As I understand it, Solana will elect a leader each round and there will be multiple validators handling the transactions independently. The leader will then consolidate all the transactions.
From this understanding, I'm curious how Solana actually handles programs which increment a field. So let's say we have a counter field which increases by 1 each time the program is called. What happens if 10 different users call this program at the same time? How does this work if the 10 transactions are handled by ten validators independently? For example, at the start of the round counter=50, and during the round ten different validators handle the transactions separately, so each validator will set counter=51. When the leader gets back all the txns, will it say counter=51? What happens in this scenario?
I feel like there is something missing in my assumptions.
So my understanding here seems to be incorrect. It is actually the leader who executes the transactions and the validators who verify them.
Source
Page 2 - Section 3 - https://solana.com/solana-whitepaper.pdf
As shown in Figure 1, at any given time a system node is designated as Leader to generate a Proof of History sequence, providing the network global read consistency and a verifiable passage of time. The Leader sequences user messages and orders them such that they can be efficiently processed by other nodes in the system, maximizing throughput. It executes the transactions on the current state that is stored in RAM and publishes the transactions and a signature of the final state to the replication nodes called Verifiers. Verifiers execute the same transactions on their copies of the state, and publish their computed signatures of the state as confirmations. The published confirmations serve as votes for the consensus algorithm.
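Applied to the counter question above: the leader executes the ten increment transactions one after another, so the counter goes from 50 to 60, and each verifier replays the same ordered sequence and confirms the resulting state. A minimal conceptual sketch in Java (this is not Solana's actual code; all names are illustrative):

import java.util.Collections;
import java.util.List;

class CounterSketch {
    // Replay an ordered transaction sequence: ten increments move the
    // counter from 50 to 60, not 51, because execution is sequential.
    static int apply(int counter, List<String> orderedTxns) {
        for (String txn : orderedTxns) {
            if (txn.equals("increment")) counter++;
        }
        return counter;
    }

    public static void main(String[] args) {
        List<String> ordered = Collections.nCopies(10, "increment"); // the leader's ordering
        int leaderState = apply(50, ordered);   // leader executes: 60
        int verifierState = apply(50, ordered); // a verifier replays: 60
        // A verifier's confirmation is (conceptually) a signature over its
        // final state; matching states serve as votes for consensus.
        System.out.println(leaderState == verifierState); // prints: true
    }
}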
The "recent blockhash" is another important part of this. A transaction references a recent blockhash, which is part of the Proof of History sequence. If two transactions reference the same blockhash, they are counted as duplicates by the network, even if they come from two different users.
More information can be found at https://docs.solana.com/developing/programming-model/transactions#recent-blockhash
There is only one PoH generator (block producer) at a time; the other nodes are just validating.
I cannot comment on Jon C's answer, but the answer is wrong: you can use the same recent blockhash. Otherwise there is no way Solana could handle 50,000 TPS when the block time is around 0.4 seconds.
This always annoys me, so I usually just ignore it, but this time it has prompted me to ask the question...
I am animating agents queuing for a resource, using a path to represent the queue. I have a MoveTo block to move my agents to a node placed at the front of the queue. When the queue is empty and an agent arrives to be serviced, it looks great: the agent moves to the end of the queue path and smoothly progresses along the path to the front of the queue, where the node is located.
However, if there are multiple agents in the queue, then new agents move to the queue path, move all the way to the front of the queue (where the node is located), and then jump back to their correct position on the queue path.
If I put the node at the back end of the queue, then the animation looks great when agents arrive, as they join the queue behind others already there. But when the agent at the front of the queue seizes the resource it is waiting for, it jumps to the back of the queue and then proceeds along the queue to the resource node.
Any ideas on how to get this to animate correctly?
You can achieve this simply using a Conveyor block to represent the 'shuffling along' queue (with some specific configuration), but it's worth considering the wider picture (which also helps understand why trying to add a MoveTo block to a Service with its queue along a path cannot achieve what you want).
A process model can include model-relevant spatiality where movement times are important. (As well as the MoveTo block, blocks such as RackPick/Store and Service blocks with "Send seized resources" checked implicitly include movement.) However, typically you don't: a Service block with the queue being along a path is using the path just to provide some visual representation of the queue. In the underlying model agents arrive instantly into the queue from the upstream block and instantly enter the delay when a resource is free — that is the process abstraction of the model. Hence trying to 'fix the animation' with a previous MoveTo block or similar will not work because the Service block is not supposed to be representing such a conception of its queue (so agents will 'spring back' to the reality of the underlying behaviour as you observed). Additionally, a 'properly animated queue' would be obscuring the underlying basis of the model (making it seem as if that movement is being explicitly modelled when it isn't).
A Conveyor does conceptually capture agents which have to stay a certain distance apart and (for an accumulating conveyor) explicitly models agents moving along when there is free space. So, although it may seem counterintuitive, this is actually a 'correct' detailed conceptualisation of a moving human queue (which also of course matches an actual conveyor).
To make it work as you want, you need to make the size of the agents (just from the conveyor's perspective) such that you only have the required number of people in your queue (now a conveyor), with the following Service block having just a capacity-1 queue (which thus represents the 'front of the queue' person only) — Service blocks can't have a capacity-0 queue. You can use a Point Node as the location for this single-entry queue, placed just beyond the end of the conveyor path (so that it effectively represents the first position in the queue) — see below.
You then want the agent length on the conveyor to represent your 'queue slot length', which requires specifying the queue capacity (a variable in my example), so something like
path.length(METER) / (queueCapacity - 1)
where path is your conveyor path. (The conveyor represents all queue 'slots' except the first, hence why we subtract 1 above.)
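To make the arithmetic concrete, here is a worked example with hypothetical numbers (none of these values come from the question):

// An 8-metre queue path with a conceptual queue capacity of 5 people:
double pathLength = 8.0;  // path.length(METER) for the conveyor path
int queueCapacity = 5;    // total queue positions, including the front one
// The conveyor holds every slot except the front one (which is the
// Service block's capacity-1 queue), hence the divisor queueCapacity - 1:
double agentLength = pathLength / (queueCapacity - 1); // 2.0 metres per slot

So the conveyor holds at most 4 agents, and the 5th position (the person at the front of the queue) is the Service block's capacity-1 queue at the Point Node.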
You could also encapsulate all of this as a custom ServiceWithMovingQueue block or similar.
Note that the Queue before the Conveyor is needed in case the conveyor has no room for an arriving agent (i.e., the 'conceptual queue' is full). If you wanted to be particularly realistic you'd have to decide what happens in real-life and explicitly model that (e.g., overflow queue, agent leaves, etc.).
P.S. Another alternative is to use the Pedestrian library, where Service with Lines space markup is designed to model this: partial example flow below. However, that means switching your agents to be pedestrians (participating in the pedestrian modelling underlying physics model for movement) and back again which is performance-intensive (and can cause some bizarre movement in some cases due to the physics). Plus, because the pedestrian library has no explicit concept of resources for pedestrian services, you'd have to do something like have resource pool capacity changes influence which service points were open or not. (The service points within a Service with Lines markup have functions like setSuspended so you can dynamically set them as 'open' or 'closed', in this case linked to whether resources are on-shift to man them.)
P.P.S. Note that, from a modelling accuracy perspective, capturing the 'real' movement in a human queue is typically irrelevant because
If the queue's empty, the time to move from the end to the front is typically negligible compared to the service time (and, even if it's not negligible, a service which generally has a queue present means this extra movement is only relevant for a tiny proportion of arrivals — see below).
If the queue's not empty, people move up whilst others are being served so there is no delay in terms of the service (which can always accept the next person 'immediately' after finishing with someone because they are already at the front of the queue).
This cannot be fixed with the existing blocks of the process modeling library.
Nevertheless, if you use the Pedestrian Library this problem doesn't occur, so you could consider using it if the animation is that important, at the cost of your model's processing speed.
The only other way to actually do it is to create your own agent-based model to handle the behavior of agents in a queue, but this is not very straightforward.
Now, if you think about operation time, there is no difference in the process statistics whether an agent moves as it does now or moves to the end of the line, so in terms of results you shouldn't be worried about it.
I have an abstract question.
I need a service with fault tolerance. The service can only be running on one node at a time. This is the key.
With two connected nodes: A and B.
If A is running the service, B must be waiting.
If A is turned off, B should detect this and start the service.
If A is turned on again, A should wait and not run the service.
etc. (If B is turned off, A starts; if A is turned off, B starts.)
I have thought about a heartbeat protocol to sync the status of the nodes and detect timeouts; however, there are a lot of race conditions.
I could add a third node with a global lock, but I'm not sure how to do this.
Does anybody know a well-known algorithm for this? Or better: is there any open-source software that lets me control this kind of thing?
Thanks
If you can provide some sort of shared memory between the nodes, then there is a classical algorithm that solves this problem: Peterson's algorithm.
It is based on two additional variables, called flag and turn. turn is an integer variable whose value is the index of the node that is allowed to be active at the moment: turn=1 indicates that node 1 has the right to be active and the other node should wait. In other words, it is its turn to be active; that's where the name comes from.
flag is a boolean array where flag[i] indicates that the i-th node declares itself ready for service. In your setup, flag[i]=false means that the i-th node is down. The key part of the algorithm is that a node which is ready for service (i.e. flag[i]=true) has to wait until it obtains the turn.
The algorithm was originally developed to solve the problem of executing a critical section without conflict; in your case, the critical section is simply running the service. You just have to ensure that before the i-th node is turned off, it sets flag[i] to false. This is definitely the tricky part, because if a node crashes it obviously cannot set any value; I would go with some sort of heartbeat here.
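A minimal sketch of the two-node case in Java, assuming the flag and turn variables live in memory both nodes can read and write (note that in real Java you would use an AtomicIntegerArray, since volatile does not give per-element visibility for arrays):

class Peterson {
    // Shared state, assumed visible to both nodes (e.g. via shared memory):
    static volatile boolean[] flag = {false, false}; // flag[i]: node i is ready
    static volatile int turn = 0;                    // whose turn it is

    // Entry protocol for node i (the other node is 1 - i):
    static void acquire(int i) {
        int other = 1 - i;
        flag[i] = true;  // declare this node ready for service
        turn = other;    // politely yield priority to the other node
        while (flag[other] && turn == other) {
            Thread.onSpinWait(); // busy-wait while the other node has the turn
        }
        // ...run the service here (the critical section)...
    }

    // Exit protocol: run this before node i is turned off cleanly.
    static void release(int i) {
        flag[i] = false;
    }
}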
Regarding open-source software that solves similar problems, try searching for "cluster failover". Read about Paxos and the Google File System. There are plenty of solutions, but if you want to implement something yourself, I would try Peterson's algorithm.
I am learning Paxos algorithm (http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf) and there is one point I do not understand.
We know that events follow a temporal order, and it can happen that, say, events 1-5 and 10 are decided, but 6-9 and those from 11 onward are not yet. The paper above says we simply fill the gap at 6-9 with no-op values and record new events from 11 on.
So in this case, since event 10 is already recorded, we know some events must have happened between 5 and 10 but were not recorded by Paxos due to some failures. If we simply fill in no-op values, these events will be lost from our record.
Even worse, if, as the paper I linked above says, events are in fact commands from the client, then missing a few commands in the middle might make the entire set of operations illegal (if no command can be skipped, or if their order matters).
So why is it still legitimate for Paxos to fill gaps between events with no-op values, given that (as I worried above) the entire set of records might be invalid because of them? Also, is there a better way to recover from such gaps than using no-ops?
This is a multi-part answer.
Proposing no-op values is the way to discover commands that haven't reached the node yet. We don't simply fill each slot in the gap with a no-op command: we propose that each slot be filled with a no-op. If any of the peers has already accepted a command, it will return that command in the Prepare-ack message, and the proposer will use that command in the Accept round instead of the no-op.
For example, assume a node was behind a temporary network partition and was unable to play with the others for slots 6-9. It knows it missed out upon learning the command in slot 10. It then proposes no-ops to learn what was decided in those slots.
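Here is a sketch of that proposal logic in Java (all names are invented for illustration; prepare() and accept() stand in for Paxos phase 1 and phase 2 against a quorum of acceptors):

import java.util.List;

// Illustrative types, not from any real library:
record Command(String op) { static final Command NO_OP = new Command("no-op"); }
record PrepareAck(Command acceptedCommand, long acceptedBallot) {}

abstract class GapFiller {
    abstract List<PrepareAck> prepare(int slot, long ballot);   // phase 1
    abstract void accept(int slot, long ballot, Command value); // phase 2

    Command fillGapSlot(int slot, long ballot) {
        List<PrepareAck> acks = prepare(slot, ballot);
        Command value = Command.NO_OP; // default: propose a no-op
        long best = -1;
        for (PrepareAck ack : acks) {
            // Bound by Paxos: if any acceptor already accepted a command for
            // this slot, re-propose the highest-ballot one instead.
            if (ack.acceptedCommand() != null && ack.acceptedBallot() > best) {
                best = ack.acceptedBallot();
                value = ack.acceptedCommand();
            }
        }
        accept(slot, ballot, value);
        return value;
    }
}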
Practical implementations also have an out-of-band learning protocol to learn lots of transitions in bulk.
A command isn't a command until it is fully decided; until then it is just a proposed command. Paxos is about choosing between contending commands from multiple clients. Clients must be prepared to have their commands rejected because another client's was chosen instead.
Practical implementations are all about choosing the order of client commands. Their world view is that of a write-ahead log, and they are placing the commands in that log. They retry in the next slot if their command wasn't chosen. (There are many ways to reduce the contention; Lamport mentions forwarding requests to a leader, as is done in Multi-Paxos.)
Practical systems also have some means to know if the command is invalid before proposing it; such as knowing a set of reads and a set of writes. This is important for two reasons. First, it's an asynchronous, multi-client system and anything could have changed by the time the client's command has reached the server. Second, if two concurrent commands do not conflict then both should be able to succeed.
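As a tiny sketch of that second point (hypothetical: it assumes commands carry explicit read and write sets, which is just one way to realize what the answer describes):

import java.util.Collections;
import java.util.Set;

class ConflictCheck {
    // Two commands conflict if either one writes something the other reads
    // or writes; non-conflicting concurrent commands can both succeed.
    static boolean conflicts(Set<String> reads1, Set<String> writes1,
                             Set<String> reads2, Set<String> writes2) {
        return !Collections.disjoint(writes1, reads2)
            || !Collections.disjoint(writes1, writes2)
            || !Collections.disjoint(reads1, writes2);
    }
}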
The system model allows commands (messages) to be lost by the network anyway. If a message is lost, the client is expected to eventually retry the request, so it is fine to drop some of them. If the commands of a client have to be executed in client order, then either the client only sends commands synchronously, or the commands have to be ordered at a higher level in the library and kept in some client-session object before being executed.
AFAIK the Zab protocol guarantees client-order, if you don't want to implement that at a higher level.