Chronicle Queue: getting the most recent roll cycle file for iteration with a named tailer returns the earliest file

I am trying to implement a named tailer that picks up where a task previously left off. But in between continuing the task, I may want to check the contents of the most recent roll cycle. So I tested whether I can get the most recent roll cycle to iterate over (named tailers pick up where they left off, so I think it should iterate over the most recent file); the output is explained below.
The code is as follows:
public static void main(String[] args) {
    ChronicleQueue QUEUE = SingleChronicleQueueBuilder.single("./chronicle/roll")
            .rollCycle(RollCycles.MINUTELY)
            .build();
    ExcerptAppender APPENDER = QUEUE.acquireAppender();
    ExcerptTailer TAILER = QUEUE.createTailer("a");

    // Busy spinner that reads all roll cycles, starting from the first and
    // carrying on to the next roll cycle. A named tailer could be used to
    // pick up from where it left off.
    while (true) {
        System.out.println(TAILER.cycle());
        if (!TAILER.readDocument(w -> w.read("packet").marshallable(
                m -> System.out.println(m.read("moldUdpHeader").text())))) {
            // Break out of the spinner if there is nothing to be read.
            TAILER.close();
            break;
        }
    }

    // By calling approximateExcerptsInCycle(tailer.cycle()) inside the
    // for-loop's condition statement, the index winds back to the start of
    // the cycle, so the tailer would constantly read only the first item.
    ExcerptTailer tailer = QUEUE.createTailer("a");
    tailer.moveToCycle(tailer.cycle());
    System.out.println(tailer.cycle());
    // Gets the number of records for the earliest roll cycle available.
    long excerpts = tailer.approximateExcerptsInCycle(tailer.cycle());
    System.out.println(excerpts);
    for (int i = 0; i < excerpts; i++) {
        System.out.println(tailer.cycle());
        tailer.readDocument(w -> w.read("packet").marshallable(
                m -> System.out.println(m.read("moldUdpHeader").text())));
    }
    tailer.close();
}
Explanation of output:
I ran the while loop, so the tailer went to the end of the queue and there was nothing further to iterate: it prints the cycle but does not print any documents, then breaks.
I then try to move to the start of the cycle with .moveToCycle(tailer.cycle()); it prints and shows that I am still on the most recent cycle (the last cycle, 27927733). I print the number of excerpts in the cycle (6) and enter the for loop. Inside the for loop, it outputs the earliest documents from the first chronicle roll cycle file (27927664). So although the first line prints 27927733 before any output, the output itself comes from the first roll cycle.
Chronicle Queue POM:
<!-- https://mvnrepository.com/artifact/net.openhft/chronicle-queue -->
<dependency>
    <groupId>net.openhft</groupId>
    <artifactId>chronicle-queue</artifactId>
    <version>5.24ea7</version>
</dependency>

That odd behaviour should be investigated.
However, I suggest creating a separate tailer for each purpose, to keep what each tailer does simple. Doing so is likely to work around this issue. I also suggest that you don't need any static fields.

I have checked out this SO question and approached the problem from a different perspective. Now I create a new (unnamed) tailer, move to the end, get the approximate number of excerpts in the cycle (from the code above), move to the first index (approximately*) of the last available roll cycle, then iterate.
This may not always land on the first item in the roll cycle, as the count is an approximation. Please still note the weird behaviour mentioned in the question above.
tailer.toEnd();
long lastIndex = tailer.index();
long excerpts = tailer.approximateExcerptsInCycle(tailer.cycle());
tailer.moveToIndex(lastIndex - excerpts);
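
For reference, the workaround can be wrapped into a self-contained method. This is a minimal sketch, assuming the same "packet"/"moldUdpHeader" wire layout as the question; since the excerpt count is approximate, the start position may be off by a few records:

// A sketch: iterate only the most recent roll cycle with a fresh, unnamed tailer.
static void readLatestCycle(ChronicleQueue queue) {
    try (ExcerptTailer tailer = queue.createTailer()) {
        tailer.toEnd();
        long lastIndex = tailer.index();
        long excerpts = tailer.approximateExcerptsInCycle(tailer.cycle());
        tailer.moveToIndex(lastIndex - excerpts); // approximate start of the last cycle
        while (tailer.readDocument(w -> w.read("packet").marshallable(
                m -> System.out.println(m.read("moldUdpHeader").text())))) {
            // keep reading to the end of the queue
        }
    }
}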

Related

Recursion in sorted doubly linked list insertion

I'm new to data structures and the concept of recursion. I'm struggling to understand why and how recursion can be used here. I found this code in the forums and I couldn't really understand how it works. For the simple case of 2 1 3 4, if anyone can explain the iteration steps, it will be greatly appreciated.
Here is the link for HackerRank:
https://www.hackerrank.com/challenges/insert-a-node-into-a-sorted-doubly-linked-list
Node SortedInsert(Node head, int data) {
    Node n = new Node();
    n.data = data;
    if (head == null) {
        return n;
    } else if (data <= head.data) {
        // New node becomes the new head.
        n.next = head;
        head.prev = n;
        return n;
    } else {
        // Insert into the rest of the list, then relink.
        Node rest = SortedInsert(head.next, data);
        head.next = rest;
        rest.prev = head;
        return head;
    }
}
Recursion:
Recursion means a function calls itself. It is used as a simple way to save state information for algorithms that need to save multiple states, usually a large number of states, and retrieve them in reverse order. (There are alternative techniques that are more robust and less prone to memory issues, such as using an explicit Stack object to save program state.)
This example is poor but typical of an intro to recursion. Yes, you can iterate through a linked list using recursion, but there is absolutely no reason to; a loop would be more appropriate. This is purely for demonstrating how recursion works. So, to answer your question "why?": it is simply so you can learn the concept and use it later in other algorithms where it actually makes sense.
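For comparison, here is a sketch (not from the original thread) of the same insertion written with a plain loop, using the Node class from the question:

// Iterative version of the same sorted insertion (a sketch).
Node sortedInsertIterative(Node head, int data) {
    Node n = new Node();
    n.data = data;
    if (head == null) {
        return n;
    }
    if (data <= head.data) {            // new node becomes the new head
        n.next = head;
        head.prev = n;
        return n;
    }
    Node cur = head;
    while (cur.next != null && cur.next.data < data) {
        cur = cur.next;                 // walk to the insertion point
    }
    n.next = cur.next;
    if (cur.next != null) {
        cur.next.prev = n;
    }
    cur.next = n;
    n.prev = cur;
    return head;
}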
Recursion is useful when, instead of a linked list, you have a tree, where each node points to multiple other nodes. In that case, you need to save your state (which node you are on, and which subnode you called last) so that you can traverse one of the linked nodes, then return and go to the next node.
You also asked "how". When a function calls itself, all of its variables are saved (on the program stack) and new ones are created for the next iteration of itself. Then, when that call returns, it goes back to where it was called from and the previous set of variables are loaded. This is very different from a "jump" or a loop of some kind, where the same copies of the variables are used each time. By using recursion, there is a new copy of every local variable each time it is called. This is true even of the "data" variable in the example, which never changes (hence, one inefficiency).

Algorithm to time-sort N data streams

So I've got N asynchronous, timestamped data streams. Each stream has a fixed-ish rate. I want to process all of the data, but the catch is that I must process it in order, as close to the time the data arrived as possible (it is a real-time streaming application).
So far, my implementation has been to create a fixed window of K messages which I sort by timestamp using a priority queue. I then process the entirety of this queue in order before moving on to the next window. This is okay, but it's less than ideal because it creates lag proportional to the size of the buffer, and it will also sometimes drop messages if one arrives just after the end of the buffer has been processed. It looks something like this:
// Priority queue keeping track of the data in timestamp order.
ThreadSafePriorityQueue<Data> q;

// Fixed buffer size
int K = 10;

// The last successfully processed data timestamp
time_t lastTimestamp = -1;

// Called for each of the N data streams asynchronously
void receiveAsyncData(const Data& dat) {
    q.push(dat.timestamp, dat);
    if (q.size() > K) {
        processQueue();
    }
}

// Process all the data in the queue.
void processQueue() {
    while (!q.empty()) {
        const auto& data = q.top();
        // If the data is too old, drop it.
        if (data.timestamp < lastTimestamp) {
            LOG("Dropping message. Too old.");
            q.pop();
            continue;
        }
        // Otherwise, process it.
        processData(data);
        lastTimestamp = data.timestamp;
        q.pop();
    }
}
Information about the data: each stream's data is guaranteed to be sorted within that stream. The rates are between 5 and 30 Hz. The data consists of images and other bits of data.
An example of why this is harder than it appears: suppose I have two streams, A and B, both running at 1 Hz, and I get the data in the following order:
(stream, time)
(A, 2)
(B, 1.5)
(A, 3)
(B, 2.5)
(A, 4)
(B, 3.5)
(A, 5)
See how, if I processed the data in the order I received it, B would always get dropped? That's what I wanted to avoid. In my current algorithm, B gets dropped every 10th frame, and I process the data with a lag of 10 frames into the past.
I would suggest a producer/consumer structure. Have each stream put data into the queue, and a separate thread reading the queue. That is:
// your asynchronous update:
void receiveAsyncData(const Data& dat) {
    q.push(dat.timestamp, dat);
}

// separate thread that processes the queue
void processQueue() {
    while (!stopRequested) {
        Data data = q.pop();
        if (data.timestamp >= lastTimestamp) {
            processData(data);
            lastTimestamp = data.timestamp;
        }
    }
}
This prevents the "lag" that you see in your current implementation when you're processing a batch.
The processQueue function is running in a separate, persistent thread. stopRequested is a flag that the program sets when it wants to shut down--forcing the thread to exit. Some people would use a volatile flag for this. I prefer to use something like a manual reset event.
To make this work, you'll need a priority queue implementation that allows concurrent updates, or you'll need to wrap your queue with a synchronization lock. In particular, you want to make sure that q.pop() waits for the next item when the queue is empty, or that you never call q.pop() when the queue is empty. I don't know the specifics of your ThreadSafePriorityQueue, so I can't really say exactly how you'd write that.
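For illustration, here is a minimal sketch of such a blocking wrapper, written in Java since the internals of ThreadSafePriorityQueue weren't shown; push() wakes any consumer blocked in pop():

import java.util.Comparator;
import java.util.PriorityQueue;

// A sketch of a priority queue whose pop() blocks while the queue is empty.
class BlockingPriorityQueue<T> {
    private final PriorityQueue<T> heap;

    BlockingPriorityQueue(Comparator<T> cmp) {
        heap = new PriorityQueue<>(cmp);
    }

    synchronized void push(T item) {
        heap.add(item);
        notifyAll();                 // wake a consumer waiting in pop()
    }

    synchronized T pop() throws InterruptedException {
        while (heap.isEmpty()) {
            wait();                  // block until push() adds an item
        }
        return heap.poll();
    }
}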
The timestamp check is still necessary because it's possible for a later item to be processed before an earlier item. For example:
1. An event is received from data stream 1, but the thread is swapped out before the event can be added to the queue.
2. An event is received from data stream 2, and it is added to the queue.
3. The event from data stream 2 is removed from the queue by the processQueue function.
4. The thread from step 1 gets another time slice, and its event is added to the queue.
This isn't unusual, just infrequent. And the time difference will typically be on the order of microseconds.
If you regularly get updates out of order, then you can introduce an artificial delay. For example, in your updated question you show messages coming in out of order by 500 milliseconds. Let's assume that 500 milliseconds is the maximum tolerance you want to support. That is, if a message comes in more than 500 ms late, then it will get dropped.
What you do is add 500 ms to the timestamp when you add the thing to the priority queue. That is:
q.push(AddMs(dat.timestamp, 500), dat);
And in the loop that processes things, you don't dequeue something before its timestamp. Something like:
while (true) {
    // Only dequeue an item once its (delayed) timestamp has come due.
    if (q.peek().timestamp <= currentTime) {
        Data data = q.pop();
        if (data.timestamp >= lastTimestamp) {
            processData(data);
            lastTimestamp = data.timestamp;
        }
    }
}
This introduces a 500 ms delay in the processing of all items, but it prevents dropping "late" updates that fall within the 500 ms threshold. You have to balance your desire for "real time" updates with your desire to prevent dropping updates.
There will always be a lag, and that lag is determined by how long you're willing to wait for your slowest "fixed-ish rate" stream.
Suggestion:
1. Keep the buffer.
2. Keep an array of bool flags with the meaning: "if position ix is true, the buffer contains at least one sample originating from stream ix".
3. Sort/process as soon as you have all flags true.
Not fool-proof (each buffer will be sorted, but from one buffer to the next you may have timestamp inversions), but perhaps good enough?
Playing with the count of "satisfied" flags required to trigger the processing (at step 3) can be used to make the lag smaller, but at the risk of more inter-buffer timestamp inversions. In the extreme, accepting processing with only one satisfied flag means "push a frame as soon as you receive it, timestamp sorting be damned".
I mention this to support my feeling that the lag/timestamp-inversion balance is inherent to your problem: except for absolutely equal frame rates, there will be no perfect solution in which neither side is sacrificed.
Since a "solution" will be an act of balancing, any solution will require gathering and using extra information to help the decisions (e.g. that "array of flags"). If what I suggested sounds silly for your case (which may well be, as the details you chose to share aren't many), start thinking about which metrics are relevant to your targeted level of "quality of experience" and use additional data structures to help gather, process, and use those metrics.

Possibility of saving partial outputs from bulk iteration in Flink Dataset?

I am doing an iterative computation using the Flink DataSet API.
But the result of each iteration is a part of my complete solution.
(If more details are required: I am computing lattice nodes level-wise, starting from the top towards the bottom in each iteration; see Formal Concept Analysis.)
If I use the Flink DataSet API with bulk iteration without saving my result, the code looks like this:
val start = env.fromElements((0, BitSet.empty))
val end = start.iterateWithTermination(size) { inp =>
  val result = ObjData
    .mapPartition(new MyMapPartition)
    .withBroadcastSet(inp, "concepts")
    .groupBy(0)
    .reduceGroup(new MyReduceGroup)
  (result, result)
}
end.count()
But if I try to write partial results within the iteration (_.writeAsText()) or perform any action, I get the error:
org.apache.flink.api.common.InvalidProgramException: A data set that is part of an iteration was used as a sink or action. Did you forget to close the iteration?
The alternative without bulk iteration seems to be below:
var start = env.fromElements((0, BitSet.empty))
var count = 1L
var all = count
while (count > 0) {
  start = ObjData
    .mapPartition(new MyMapPartition)
    .withBroadcastSet(start, "concepts")
    .groupBy(0)
    .reduceGroup(new MyReduceGroup)
  count = start.count()
  all = all + count
}
println("total nodes: " + all)
But this approach is exceptionally slow even on the smallest input data: the iteration version takes under 30 seconds, while the loop version takes over 3 minutes.
I guess Flink is not able to create an optimal plan to execute the loop.
Is there any workaround I should try? Is some modification to Flink possible so that partial results can be saved to Hadoop, etc.?
Unfortunately, it is not currently possible to output intermediate results from a bulk iteration. You can only output the final result at the end of the iteration.
Also, as you correctly noticed, Flink cannot efficiently unroll a while-loop or for-loop, so that won't work either.
If your intermediate results are not that big, one thing you can try is appending your intermediate results to the partial solution and then outputting everything at the end of the iteration. A similar approach is implemented in the TransitiveClosureNaive example, where paths discovered in an iteration are accumulated into the next partial solution; a sketch of the pattern follows.
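A minimal sketch of that accumulate-inside-the-iteration pattern, using the Java DataSet API (the map/filter pair is a toy stand-in for the mapPartition/reduceGroup pipeline; in a real program you would derive the next level only from the newest elements, e.g. via the level index carried in the (Int, BitSet) tuple):

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.IterativeDataSet;

public class AccumulateInsideIteration {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Long> initial = env.fromElements(1L);
        IterativeDataSet<Long> loop = initial.iterate(10);

        // Toy stand-in for the real "compute next lattice level" pipeline.
        DataSet<Long> nextLevel = loop
                .map(new MapFunction<Long, Long>() {
                    public Long map(Long x) { return x + 1; }
                })
                .filter(new FilterFunction<Long>() {
                    public boolean filter(Long x) { return x <= 5; }
                });

        // Feed back the union of everything found so far, so the final
        // result contains all levels, not just the last one.
        DataSet<Long> accumulated = loop.union(nextLevel).distinct();

        // Second argument is the termination criterion: stop early if
        // nextLevel ever becomes empty (this toy stops at maxIterations).
        DataSet<Long> allLevels = loop.closeWith(accumulated, nextLevel);

        allLevels.print();
    }
}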

Travelling through a for loop at the same time as deleting and inserting

I am writing a predator/prey simulation where objects can be born or killed. When killed, they are deleted from the ArrayList; when born, they are added. Each object in the list can kill another object or replicate. I travel through the list simulating each object's movement and interaction with the surroundings, including the decision to replicate or to kill another object if it is close.
A normal for loop breaks, because if a deletion or a birth occurs, the index it is currently on is skewed.
What would be a better solution? A while loop with a counter and a condition that the size is > 0, or some other way?
You could wait until after you are finished iterating to add/remove items:
entities_to_add = new Array
entities_to_remove = new Array

function tick():
    for each entity in world:
        // general entity behavior goes here
        if entity.wants_to_reproduce:
            entities_to_add.append(entity.make_baby())
        if entity.wants_to_die:
            entities_to_remove.append(entity)

function cleanup():
    for each entity in entities_to_remove:
        world.remove(entity)
    for each entity in entities_to_add:
        world.add(entity)
    entities_to_remove.clear()
    entities_to_add.clear()

function main():
    while (True):
        tick()
        cleanup()
This has a disadvantage that an entity that dies will appear to remain alive until the end of the tick. This may be bad, for example if a predator kills a prey, and a second predator also kills that prey during the same tick. If that isn't desirable, you could make the predators check the entities_to_remove array before attacking to make sure their prey is still alive.
It is common in simulation to have two copies of the world - one for the state at time t, one for the state at time t+Δt.
If you don't have two copies, and try to birth entities as you process the state transition, then the first entity in the list will see a different world from the last.
For example, if more prey is birthed than felled in a tick, then predators who happen to be at the end of the list will have an advantage which is not part of the world being simulated, but an artefact of your implementation. If you've added 'red cats' and then 'blue cats', then the blue cats will do better, without there being any real difference.
If you do have two copies, then you have to resolve the issue of more than one predator felling the same prey, but your original problem won't exist.
// "Entity" stands in for a concrete type; java.lang.Object has no
// shouldReproduce()/shouldDie() methods. "world" is the shared entity list.
ArrayList<Entity> predators = new ArrayList<Entity>();
ArrayList<Entity> preys = new ArrayList<Entity>();

for (Entity predator : predators) {
    // Iterate over a snapshot so the underlying list can be modified safely.
    Entity[] entities = preys.toArray(new Entity[0]);
    for (Entity entity : entities) {
        if (entity.shouldReproduce()) {
            world.add(entity.baby());
        }
        if (entity.shouldDie()) {
            world.remove(entity);
        }
    }
}
I think every predator should make a new copy of the current world ArrayList.
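
For completeness, here is a minimal sketch in Java of the two-copies idea described above; the Entity interface and its methods are hypothetical stand-ins:

import java.util.ArrayList;
import java.util.List;

// Double-buffered world update: the state for time t is read-only while the
// state for t + dt is built, then the references swap.
interface Entity {
    boolean shouldDie();
    boolean shouldReproduce();
    Entity baby();
}

class World {
    private List<Entity> entities = new ArrayList<>();

    void step() {
        List<Entity> next = new ArrayList<>();
        for (Entity e : entities) {     // current state is never modified here
            if (!e.shouldDie()) {
                next.add(e);            // survivors carry over
            }
            if (e.shouldReproduce()) {
                next.add(e.baby());     // newborns exist only in the next state
            }
        }
        entities = next;                // swap buffers
    }
}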

A way to reverse queue using only two temporary queues and nothing more?

Is there a way to reverse the items' order in a queue using only two temporary queues (and no other variables, such as counters)? Only the standard queue operations are available: ENQUEUE(e), DEQUEUE(), EMPTY().
Solutions in any language or pseudocode are welcome.
You can:
Use the two queues to simulate a stack.
Push all of the elements of the original queue to this stack.
Now pop each element from the stack and add it to the original queue.
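Here is a sketch of that idea in Java (java.util.ArrayDeque stands in for a plain FIFO queue, and only offer/poll/isEmpty are used, mirroring ENQUEUE/DEQUEUE/EMPTY):

import java.util.ArrayDeque;
import java.util.Queue;

class QueueReverser {
    // Reverse a queue by simulating a stack with the two temporary queues.
    static <T> void reverse(Queue<T> queue) {
        Queue<T> a = new ArrayDeque<>();
        Queue<T> b = new ArrayDeque<>();

        // "Push" every element of the original queue onto the two-queue stack:
        // the newest element is always rotated to the front of 'a'.
        while (!queue.isEmpty()) {
            b.offer(queue.poll());         // new top of the stack
            while (!a.isEmpty()) {         // move the old stack behind it
                b.offer(a.poll());
            }
            Queue<T> t = a; a = b; b = t;  // swap: 'a' is the stack again
        }

        // "Pop" everything back: the front of 'a' is the most recently pushed
        // element, so the original queue is refilled in reverse order.
        while (!a.isEmpty()) {
            queue.offer(a.poll());
        }
    }
}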
I realize this thread is long dead, but I believe I found a pretty good solution that meets all the requirements.
You'll only need one temp queue. What you do is: as long as the original queue isn't empty, move the last node in the queue to the front by setting a pointer to the last node and dequeuing and re-enqueuing the nodes within the original queue.
Then you dequeue that node from the original queue and enqueue it into the temp queue.
Afterwards, you just copy the temp queue back to the original queue.
Here's my solution in C, ADT style:
(At least I don't have to worry about doing your homework for you)
QUEUE *reverseQueue(QUEUE *queue)
{
    QUEUE *temp;
    QUEUE_NODE *pLast;
    void *dataPtr;

    // Check for empty queue
    if (emptyQueue(queue))
        return NULL;
    // Check for single-node queue
    if (queueCount(queue) == 1)
        return queue;

    temp = createQueue();
    while (!emptyQueue(queue))
    {
        pLast = queue->rear;
        // Move last node to front of queue
        while (queue->front != pLast)
        {
            dequeue(queue, &dataPtr);
            enqueue(queue, dataPtr);
        }
        // Place last node in temp queue
        dequeue(queue, &dataPtr);
        enqueue(temp, dataPtr);
    }
    // Copy temp queue back to original queue
    while (!emptyQueue(temp))
    {
        dequeue(temp, &dataPtr);
        enqueue(queue, dataPtr);
    }
    destroyQueue(temp);
    return queue;
}
// Reverse of a queue using one temporary queue read back from its rear
// (i.e. treated as a deque). Array indices model the front/rear pointers.
void reverse(void)
{
    int q1[20], q2[20];
    int f1 = 0, r1 = 5;   // front/rear of the original queue (here: 5 items)
    int f2 = 0, r2 = 0;   // front/rear of the temporary queue

    // Transfer every element of the original queue into the temporary queue.
    while (f1 != r1) {
        q2[r2] = q1[f1];
        r2 = r2 + 1;
        f1 = f1 + 1;
    }
    // Read the temporary queue back from its rear (deque behaviour),
    // refilling the original queue in reverse order.
    r1 = 0;
    while (r2 != f2) {
        r2 = r2 - 1;
        q1[r1] = q2[r2];
        r1 = r1 + 1;
    }
}
So the way to do it is to dequeue everything except the last item from the original queue into a temp queue, putting the last item in the final queue. Then repeat, each time copying every item except the last from one temp queue to the other, and the last one to the final queue. Hmm... is there a better way?
while (!queue.EMPTY())
{
    while (!finalQueue.EMPTY())
    {
        tempQueue.ENQUEUE(finalQueue.DEQUEUE());
    }
    finalQueue.ENQUEUE(queue.DEQUEUE());
    while (!tempQueue.EMPTY())
    {
        finalQueue.ENQUEUE(tempQueue.DEQUEUE());
    }
}
I think that should work. There is a more efficient way if you can swap the temp and final queues after each dequeue from the original queue.
The answer is yes. I'm tempted to leave it there because this sounds like a homework question, but I think it's an interesting problem.
If you can only use those three operations, you have to use both temp queues.
Basically, you have to dequeue from the main queue and put the item into temp queue A, then dequeue all items from temp queue B (empty at first) into A. Then do the same thing, only reversing the roles of A and B. You always enqueue into the temp queue that is currently empty, and then move in all the items from the other temp queue. When the queue to be reversed is empty, you can dequeue from the non-empty temp queue and enqueue into the primary queue, and it should be reversed.
I would give pseudo-code, but again, I'm worried I'm doing your homework for you.
This is kind of similar to the Tower of Hanoi puzzle :)
Here is a solution in C#:
static Queue<T> ReverseQueue<T>(Queue<T> initialQueue)
{
    Queue<T> finalQueue = new Queue<T>();
    Queue<T> intermediateQueue = new Queue<T>();

    while (initialQueue.Count > 0)
    {
        // Move all items from the initial queue to the intermediate queue,
        // except the last, which is placed on the final queue.
        int c = initialQueue.Count - 1;
        for (int i = 0; i < c; i++)
        {
            intermediateQueue.Enqueue(initialQueue.Dequeue());
        }
        finalQueue.Enqueue(initialQueue.Dequeue());

        // Swap the 'initialQueue' and 'intermediateQueue' references.
        Queue<T> tempQueue = initialQueue;
        initialQueue = intermediateQueue;
        intermediateQueue = tempQueue;
    }
    return finalQueue;
}
To reverse the contents of a queue, we need two more queues.
From the first queue, dequeue all elements except the last one and enqueue them into the 2nd queue; then enqueue the last element into the 3rd queue, which is the resulting queue.
Now do the same from the 2nd queue back to the 1st, again enqueuing the last element into the 3rd queue.
Repeat this until both temporary queues are empty.
