Is there guaranteed order of subscribers in chronicle queue? - chronicle

I'm looking at Chronicle and I don't understand one thing.
An example - I have a queue with one writer - market data provider writes tick data as they appear.
Let's say queue has 10 readers - each reader is a different trading strategy that reads the new tick, and might send buy or sell order, let's name them Strategy1 .. Strategy10.
Let's say there is a rule that I can have only one trade open at any given time.
Now the problem - as far as I understand, there is no guarantee on the order how these subscribed readers process the tick event. Each strategy is subscribed to the queue, so each of them will get the new tick asynchronously.
So when I run it for the first time, it might be that Strategy1 receives the tick first and places the order, all the other strategies will then be not able to place their orders.
If I'll replay the same sequence of events, it might be that a different strategy processes the tick first, and places its order.
This will result in totally different results while using the same sequence of initial events.
Am I understanding something wrong or is this how it really works?
What are the possible solutions to this problem?
What I want to achieve is that the same sequence of source events (ticks) produces always the same sequence of trades.

If you want determinism in your system, then you need to remove sources of non-determinism. In your case, since you can only have a single trade open at one time, it sounds like it would be sensible to run all 10 strategies on a single thread (reader). This would also remove the need for any synchronisation on the reader side to ensure that there is only one open trade.
You can then use some fixed ordering to your strategies (e.g. round-robin) that will always produce the same output for a given set of inputs. Alternatively, if the decision logic is too expensive to run in serial, it could be executed in parallel, with each reader feeding into some form of gate (a Phaser-like structure), on which a decision about what strategy to use can be made deterministically. The drawback to this design is that the slowest trading strategy would hold back all the others.

Think you need to make a choice about how much you want concurrently and independently, and how much you want in order and serial. I suggest you allow the strategies to place orders independently, however the reader of those order to process them in the original order by checking a sequence number such as the queue index in the first queue.
This way the reader of the orders will process them in the same order regardless of the order they are processed and written which appears to be your goal.

Related

Expressing both concurrency and time in activity diagrams

I am not sure how to express my scenario using activity diagrams:
What I am trying to visualise is the fact that:
A message is received
Two independent and concurrent actions take place: logging of the message and processing the message
Logging always takes less time than processing
The first activity in the diagram is correct in the sense that the actions are independent but it does not relay the fact that logging is guaranteed to take less time than processing.
The second activity in the diagram is not correct because, even if logging completes before processing, it looks as though processing depended on the logging's finishing first and that does not represent the reality.
Here is a non-computer related example:
You are a novice in birdwatching, trying to make your first notes in your notebook about birds passing by
A flock of birds approaches, you try to recognise as many details as possible
You want to write down the details in your notebook, but wait, you begin to realise that your theoretical background does not work in practice, what should be a quick scribble actually amounts to nothing in the end because you did not recognise anything
In the meantime, the birds majestically flew away without waiting for you, the activity is gone
Or maybe you did actually write it down, it took you only a moment and the birds are still nearby, slowly flying away, ending the activity again after some time
Or maybe you were under such awe that you just kept watching at them, without taking any notes - they fly away, disappearing in the horizon, ending the activity
After a few hours, you have enough notes and you come home very happy - maybe you did not capture everything but this was enough to make you smile anyway
I can always add a comment to a diagram to express it all somehow but I wonder, is there a more structured way to express what I described in an activity diagram? If not an activity diagram then what kind of a diagram would be better suited in your opinion? Thank you.
Your first diagram assumes that the duration of logging is always shorter than processing:
If this assumption is correct, the upper flow reaches the flow-final node, and the remaining flows continue until the first reaches the activity-final node. Here, the processing continues and the activity ends when the processing ends. This is exactly what you want.
But if once, the execution would deviate from this assumption and logging would get delayed for any reason, then the end of the processing would reach the activity-final node, resulting in the immediate interruption of all other ongoing activities. So logging would not complete. Maybe it’s not a problem for you, but in most cases audit expects logs to be complete.
You may be interested in a safer way that would be to add a join node:
The advantage is that the activity does not depend on any assumptions. It will always work:
whenever the logging is faster, the token on that flow will wait at the join node, and as soon as process is finished the activity (safely) the join can happen and the outgoing token reaches the end. This is exactly what you currently expect.
if the logging is exceptionally slower, no problem: the processing will be over, but the activity will wait for the logging to be completed.
This robust notation makes logging like Schroedinger's cat in its box: we don't have to know what activity is longer or shorter. At the end of the activity, both actions are completed.
Time in activity diagrams?
Activity diagrams are not really meant to express timing and duration. It's about the flow of control and the synchronization.
However, if time is important to you, you could:
visually make one activity shorter than the other. This is super-ambiguous and absolute meaningless from a formal UML point of view. But it's intuitive when readers see the parallel flow (a kind of sublminal communication ;-) ) .
add a comment note to express your assumption in plain English. This has the advantage of being very clear an unambiguous.
using UML duration constraints. This is often used in timing diagram, sometimes in sequence diagrams, but in general not in activity diagrams (personally I have never seen it, but UML specs doesn't exclude it either).
Time is something very general in the UML specs, and defined independently of the diagram. For example:
8.4.4.2: A Duration is a value of relative time given in an implementation specific textual format. Often a Duration is a non- negative integer expression representing the number of “time ticks” which may elapse during this duration.
8.5.1: An Interval is a range between two values, primarily for use in Constraints that assert that some other Element has a value in the given range. Intervals can be defined for any type of value, but they are especially useful for time and duration values as part of corresponding TimeConstraints and DurationConstraints.
In your case you have a duration observation for the processing (e.g. d), and a duration constraint for the logging (e.g. 0..d).
8.5.4.2: An IntervalConstraint is shown as an annotation of its constrainedElement. The general notation for Constraints may be used for an IntervalConstraint, with the specification Interval denoted textually (...).
Unfortunately little more is said. The only graphical examples are for messages in sequence diagrams (Fig 8.5 and 17.5) and for timing diagrams (Fig 17.28 to 17.30). Nevertheless, the notation could be extrapolated for activity diagrams, but it would be so unusal that I'd rather recommend the comment note.

Priority Queues VS Queues

How a Priority Queue a Queue Data Structure. Since it doesn't follow FIFO, shouldn't it be named Priority Array or Priority Linked LIst majorly because Priority Queues don't follow a fashion like a FIFO queue
In a priority queue, an element with high priority is served before an element with low priority.
'If two elements have the same priority, they are served according to their order in the queue'
i think this will answer your question
If you look at most used implementations, priority queues are essentially heaps - they are arranged in a particular fashion based on priority defined by the programmer - in a simple example, ascending or descending order of integers.
Think of priority queue as a queue where rather than retrieving the elements based on when you add the element, you retrieve them based on how they compare with each other. This comparison can be simply ascending or descending order in your textbook examples. You can understand the ADT from an analogy from another StackOverflow answer:
You're running a hospital and patients are coming in. There's only one
doctor on staff. The first man walks in - and he's served immediately.
Next, a man with a cold comes in and requires assistance. You add him
to the queue and he waits in line for the doctor to become available.
Next, a man with an axe in his head comes through the door. He is
assigned a higher priority because he is a higher medical liability.
So the man with the cold is bumped down in line. Next, someone comes
in with breathing problems. So, once again, the man with the cold is
bumped down in priority. This is called trigaing in the real world -
but in this case it's a medical line.
Implementing this in code would use a priority queue and a worker
thread (the doctor) to perform work on the consumable / units of work
(the patients).
In real scenario, instead of patients, you might have processes waiting to be addressed by the CPU.
Read:
When would I use a priority queue?
In the queue, the natural
ordering given by how much time an element waits in a line can be considered the fairest. When you enter in a line waiting for something, first comes first served.
Sometimes, however, there is something special about some elements that
might suggest they should be served sooner than others that waited longer. For example, we don’t always read our emails in the order we received them, but often
you skip newsletters or “funny” jokes from friends to read work-related messages first.
Likewise, when you design an app or test an app, if there are some bugs, those bugs are prioritized and teams work on those bugs based on bugs severity. First, new bugs are discovered all the
time, and so new items will be added to the list. Say a nasty authentication bug is found—
you’d need to have it solved by yesterday! Moreover, priority for bugs can change over
time. For instance, your CEO might decide that you are going after the market share
that’s mostly using browser X, and you have a big feature launch next Friday, so you really need to solve that bug at the bottom within a couple of days.
Priority queues are especially useful when we need to consume elements in a certain order from a dynamically changing list (such as the list of tasks to run on a CPU), so that at any time we can get the next element (according to a certain criterion), remove it from the list, and (usually) stop worrying about fixing anything for
the other elements.
That’s the idea behind priority queues: they behave like regular, plain queues, except that the front of the queue is dynamically determined based on some kind of priority. The differences caused to the implementation by the introduction of priority are profound, enough to deserve a special kind of data structure.

Waiting Queue with Multiple Branches

I am currently working on a project and I am looking for a technique that will solve this scenario:
There are people waiting in a room to take one of many tests. There can be multiple tests assigned to each person. Each test may be given at one or more locations at a given time, but only one person can take the test at a given location at a time.
It is relatively simple just to randomly assign people to the tests and eventually they all get done, but what kind of system could I use to make it where people wait a relatively equal time? If I just randomly assign them, a person that only has to take one of the tests could be put behind people that have to take 5.
I have thought about assigning people with a lower number of tests to take first, but I have not yet tested that and it seems like it would still be unfair. And to add complexity, I am adding a feature that allows the priority to be changed.
To be clear, this is not a homework assignment. This project is still in the logical development phase, so I haven't really started programming to compare different techniques. The closest thing that I have thought of would be to create a system that acts somewhat like a thread pool, but I have not found anything that gives a detailed description of the techniques behind a thread pool and it seems that it would require a good bit of overhead and still run into problems if I just used a thread pool directly. I have also looked into the C# Queue class, but I haven't thought of a way to expand its capability.
Anyone have any ideas or suggestions?
C# (and most other languages) has a concurrent priority queue that you could use. Place the test takers on the queue, and remove one (and assign one test to it) whenever a room frees up; if the test taker has more tests left to take, then put it back on the queue.
One way to balance your execution times is to assign a random priority to your "test-takers," e.g.
testTaker.serPriority(random.Next(CONSTANT * testTaker.numberOfRemainingTests))
Then reset the test taker's priority whenever it completes a test. This will favor assigning tests to test takers with more tests to take, while the random element will approximate fairness. CONSTANT ought to be greater than the number of test takers to ensure sufficient randomness.

algorithm for even distribution of values to all processes

I have to write a distributed system with four processes running on four different nodes. The distributed system is supposed to work in the following way: a random number generator generates a random number at each process. The objective is to even out these values in all processes by message passing between processes. Such that process A is the server who gets the numbers from all proceses and then orders them to send a portion of their number to one or more other processes in order to even out all numbers the processes hold. For example A's count is 30, B's count is 65, C's count is 35 and D's count is 70. A computes 30+65+35+70 = 200 divided by 4 = 50. Now process A, the server, knows who has less than average and who has more than average. Now the question is how does A decide who sends what number to who? to even out the values of all processes. please note that A can't directly instruct a process to decrement or increment its count e.g. it can't send a message to B and tell it to decrement by 15 and then send another message to C and tell it to increment by 15. A must send a message to B that will tell B to decrement by 15 and then send a message to C and tell it to increment by 15 or in other words it tell B to send 15 of your count to C. Thanks in advance. Zaki.
For what I know there's not a specific recipe or only well define pattern to implement such distributed system (also if there's material that gives guidelines on argument, see the link at end of the question).
Here are involved various choice that will shape the final system, its scalability, how will be responsive, how will be solid, etc.
You tagged the question as language agnostic. I'm convinced that good concepts count more than technologies, but at the end the choice must be made and a system like this is too complex to be built with a language you're not more than familiar.
I would build it with C# because it's my primary development language, proceeding with techniques oriented to agile development.
First I'll try to sketch a macro-architectural design, highlighting actors involved and their responsibility (but without going in too much details).
Then I'll try to code a first simple prototype that involves two nodes.
When the prototype works, I'll try to found weak points and let it work with four nodes.
If there are issues, iterate last point until it satisfy requirements.
Going much into detail, you can build it even using raw sockets; but to keep it simple I suggest you found your system on HTTP protocol (e.g. using the .NET BCL HttpListener and HttpClient components as basis) for communication:
A predefined set of GET message can performs synchronization between peer servers.
POST message can be used to exchange data on random numbers.
About the generation of number, it opens a completely new world. I would rely on an external service like ANU Quantum Random Server (if you can count an active Internet connection). I know you stated you've an algorithm to implement, I supplied this as an alternative (I don't know if this part can be altered or not).
As least thing, I suggest you to read this article and also this about peer-to-peer if you'll use .NET framework.
The problem that you describe is known as distributed aggregation. There are a number of solutions appropriate for different assumptions on the network (what nodes are connected? can messages be lost?), the function to compute (average? sum?), and so on. A good overview, with references to algorithms that you can use, can be found at http://arxiv.org/abs/1110.0725.

Database for brute force solving board games

A few years back, researchers announced that they had completed a brute-force comprehensive solution to checkers.
I have been interested in another similar game that should have fewer states, but is still quite impractical to run a complete solver on in any reasonable time frame. I would still like to make an attempt, as even a partial solution could give valuable information.
Conceptually I would like to have a database of game states that has every known position, as well as its succeeding positions. One or more clients can grab unexplored states from the database, calculate possible moves, and insert the new states into the database. Once an endgame state is found, all states leading up to it can be updated with the minimax information to build a decision trees. If intelligent decisions are made to pick probable branches to explore, I can build information for the most important branches, and then gradually build up to completion over time.
Ignoring the merits of this idea, or the feasability of it, what is the best way to implement such a database? I made a quick prototype in sql server that stored a string representation of each state. It worked, but my solver client ran very very slow, as it puled out one state at a time and calculated all moves. I feel like I need to do larger chunks in memory, but the search space is definitely too large to store it all in memory at once.
Is there a database system better suited to this kind of job? I will be doing many many inserts, a lot of reads (to check if states (or equivalent states) already exist), and very few updates.
Also, how can I parallelize it so that many clients can work on solving different branches without duplicating too much work. I'm thinking something along the lines of a program that checks out an assignment, generates a few million states, and submits it back to be integrated into the main database. I'm just not sure if something like that will work well, or if there is prior work on methods to do that kind of thing as well.
In order to solve a game, what you really need to know per a state in your database is what is its game-theoretic value, i.e. if it's win for the player whose turn it is to move, or loss, or forced draw. You need two bits to encode this information per a state.
You then find as compact encoding as possible for that set of game states for which you want to build your end-game database; let's say your encoding takes 20 bits. It's then enough to have an array of 221 bits on your hard disk, i.e. 213 bytes. When you analyze an end-game position, you first check if the corresponding value is already set in the database; if not, calculate all its successors, calculate their game-theoretic values recursively, and then calculate using min/max the game-theoretic value of the original node and store in database. (Note: if you store win/loss/draw data in two bits, you have one bit pattern left to denote 'not known'; e.g. 00=not known, 11 = draw, 10 = player to move wins, 01 = player to move loses).
For example, consider tic-tac-toe. There are nine squares; every one can be empty, "X" or "O". This naive analysis gives you 39 = 214.26 = 15 bits per state, so you would have an array of 216 bits.
You undoubtedly want a task queue service of some sort, such as RabbitMQ - probably in conjunction with a database which can store the data once you've calculated it. Alternately, you could use a hosted service like Amazon's SQS. The client would consume an item from the queue, generate the successors, and enqueue those, as well as adding the outcome of the item it just consumed to the queue. If the state is an end-state, it can propagate scoring information up to parent elements by consulting the database.
Two caveats to bear in mind:
The number of items in the queue will likely grow exponentially as you explore the tree, with each work item causing several more to be enqueued. Be prepared for a very long queue.
Depending on your game, it may be possible for there to be multiple paths to the same game state. You'll need to check for and eliminate duplicates, and your database will need to be structured so that it's a graph (possibly with cycles!), not a tree.
The first thing that popped into my mind is the Linda-style of a shared 'whiteboard', where different processes can consume 'problems' off the whiteboard, add new problems to the whiteboard, and add 'solutions' to the whiteboard.
Perhaps the Cassandra project is the more modern version of Linda.
There have been many attempts to parallelize problems across distributed computer systems; Folding#Home provides a framework that executes binary blob 'cores' to solve protein folding problems. Distributed.net might have started the modern incarnation of distributed problem solving, and might have clients that you can start from.

Resources