Should I Create a New Stream After a Snapshot Was Made?

When creating a snapshot of a stream, do I continue the snapshotted stream as usual, or do I create a new stream where all events after the snapshot will be placed?
Option 1 (continue the same stream):
stream-123: event1 | event2
<snapshot-123-event2>
stream-123: event1 | event2 | event3 | event4

Option 2 (start a new stream after the snapshot):
stream-123-1: event1 | event2
<snapshot-123-event2>
stream-123-2: event3 | event4

Most commonly, creating a new stream is done in response to something of significance in the business domain. For example, at the end of the fiscal year, we might "close the books", bringing that life cycle to an end, while beginning the life cycle of a new collection of entities to track the next year.
Another example might be a transition from one process to another -- when we are shopping, we're adding and removing items from the cart. But once we place the order, we have new processes that launch to handle billing, fulfillment, and so on, which could be tracked by different services in different streams.
Snapshotting as a performance optimization does not usually introduce a new stream; instead, it only caches some interesting (and non-authoritative) intermediate results to improve the latencies in handling requests.
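To make the caching idea concrete, here is a minimal sketch of snapshot-based rehydration; the Event, Snapshot, EventStore and SnapshotStore types are illustrative stand-ins, not part of the answer above. State is rebuilt from the latest snapshot, if one exists, and only the events recorded after it are replayed, while the stream itself stays unbroken and authoritative.

import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;

// Illustrative stand-in types; a real event store has its own equivalents.
interface Event {}
record Snapshot(long version, Object state) {}
interface EventStore { List<Event> readStream(String streamId, long fromVersion); }
interface SnapshotStore { Optional<Snapshot> loadLatest(String streamId); }

class Rehydrator {
    // Rebuild current state: start from the cached snapshot if one exists, then
    // replay only the events recorded after it. The stream (e.g. stream-123)
    // keeps growing and remains the single source of truth.
    static Object rehydrate(String streamId, EventStore events, SnapshotStore snapshots,
                            BiFunction<Object, Event, Object> apply) {
        Optional<Snapshot> snapshot = snapshots.loadLatest(streamId);
        Object state = snapshot.map(Snapshot::state).orElse(null);
        long from = snapshot.map(s -> s.version() + 1).orElse(0L);
        for (Event e : events.readStream(streamId, from)) {
            state = apply.apply(state, e);
        }
        return state;
    }
}

Because the snapshot is only a cache, it can be deleted or rebuilt at any time; rehydration then simply falls back to replaying the stream from the beginning.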

Related

Removing agents with an event in AnyLogic

I would like to create a cyclic event that runs every 24 hours at a certain hour and removes all the agents (in my case Person agents that are part of the population called Customers) from the flowchart, so that the next morning starts with no one in the loop.
Thanks in advance
I tried putting this in the Action of the event: Object Customer = null;
This should do the trick; be sure your customers population is named with a lowercase initial c:
// Copy out every agent currently in the customers population, then remove each one.
List<Customer> tempCustomers = findAll(customers, t -> true);
for (Customer c : tempCustomers) {
    remove_customers(c); // generated function that removes the agent from the customers population
}
But if your agents are inside a process modeling block or blocks from another library, I'm not sure this will work in 100% of cases without causing erratic behavior. To be sure your model stays robust, I would instead remove them through a sink block.

Dataflow job has high data freshness and events are dropped due to lateness

I deployed an Apache Beam pipeline to GCP Dataflow in a DEV environment and everything worked well. Then I deployed it to production in the Europe environment (to be specific: job region europe-west1, worker location europe-west1-d), where we get high data velocity, and things started to get complicated.
I am using a session window to group events into sessions. The session key is the tenantId/visitorId and its gap is 30 minutes. I am also using a trigger to emit events every 30 seconds to release events sooner than the end of the session (writing them to BigQuery).
The problem appears to happen in the EventToSession/GroupPairsByKey step. In this step there are thousands of events under the droppedDueToLateness counter, and the dataFreshness keeps increasing (it has been increasing since I deployed the pipeline). All steps before this one operate fine, and all steps after it are affected by it but don't seem to have any other problems.
I looked into some metrics and see that the EventToSession/GroupPairsByKey step is processing between 100K and 200K keys per second (depending on the time of day), which seems quite a lot to me. The CPU utilization doesn't go over 70% and I am using Streaming Engine. The number of workers is 2 most of the time. Max worker memory capacity is 32GB while the max worker memory usage currently stands at 23GB. I am using the e2-standard-8 machine type.
I don't have any hot keys since each session contains at most a few dozen events.
My biggest suspicion is the huge number of keys being processed in the EventToSession/GroupPairsByKey step. But on the other hand, a session is usually related to a single customer, so Google should be able to handle this number of keys per second, no?
I would like to get suggestions on how to solve the dataFreshness and droppedDueToLateness issues.
Adding the piece of code that generates the sessions:
input = input
    .apply("SetEventTimestamp", WithTimestamps.of(event -> Instant.parse(getEventTimestamp(event)))
        .withAllowedTimestampSkew(new Duration(Long.MAX_VALUE)))
    .apply("SetKeyForRow", WithKeys.of(event -> getSessionKey(event)))
    .setCoder(KvCoder.of(StringUtf8Coder.of(), input.getCoder()))
    .apply("CreatingWindow", Window.<KV<String, TableRow>>into(Sessions.withGapDuration(Duration.standardMinutes(30)))
        .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(30))))
        .discardingFiredPanes()
        .withAllowedLateness(Duration.standardDays(30)))
    .apply("GroupPairsByKey", GroupByKey.create())
    .apply("CreateCollectionOfValuesOnly", Values.create())
    .apply("FlattenTheValues", Flatten.iterables());
After doing some research I found the following:
Regarding the constantly increasing data freshness: as long as late data is allowed to arrive in a session window, that specific window persists in memory. This means that allowing data to be 30 days late keeps every session in memory for at least 30 days, which obviously can overload the system. Moreover, I found we had some ever-lasting sessions created by bots visiting and taking actions on the websites we monitor. These bots can hold sessions open forever, which can also overload the system. The solution was to decrease the allowed lateness to 2 days and to use bounded sessions (search for "bounded sessions").
Regarding events dropped due to lateness: these are events that, at their time of arrival, already belong to an expired window, i.e. a window whose end the watermark has passed (see the documentation for droppedDueToLateness here). These events are dropped in the first GroupByKey after the session window function and can't be processed later. We didn't want to drop any late data, so the solution was to check each event's timestamp before it enters the session part and to stream into the session part only events that won't be dropped, i.e. events that meet this condition: event_timestamp >= event_arrival_time - (gap_duration + allowed_lateness). The rest are written to BigQuery without the session data. (Apparently Apache Beam drops an event if its timestamp is before event_arrival_time - (gap_duration + allowed_lateness), even if there is a live session the event belongs to...)
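For illustration, here is a minimal sketch of that pre-filter, assuming the same input PCollection, getEventTimestamp helper and Joda-Time types as in the question, plus Beam's built-in Filter transform; in the real pipeline the rejected events would be routed separately to BigQuery, for example with Partition instead of Filter.

// Keep only events the session GroupByKey could still accept:
// event_timestamp >= arrival_time - (gap_duration + allowed_lateness).
// Processing time at this step is used as an approximation of arrival time.
Duration gapDuration = Duration.standardMinutes(30);
Duration allowedLateness = Duration.standardDays(2);

PCollection<TableRow> sessionable = input.apply("DropTooLateForSessions",
    Filter.by((TableRow event) -> !Instant.parse(getEventTimestamp(event))
        .isBefore(Instant.now().minus(gapDuration.plus(allowedLateness)))));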
P.S. In the bounded-sessions article, where the author demonstrates how to implement a time-bounded session, I believe there is a bug that allows a session to grow beyond the provided max size: once a session has exceeded the max size, one can still send late data that intersects the session and precedes it, moving the session's start time earlier and thereby expanding the session. Furthermore, once a session has exceeded the max size, events that belong to it but don't extend it can no longer be added.
To fix that, I switched the order of the current window-span update and the if-statement, and edited the if-statement (the one checking the session max size) in the mergeWindows function, so that a session can't exceed the max size and only data that doesn't extend it beyond the max size can be added. This is my implementation:
public void mergeWindows(MergeContext c) throws Exception {
    // Sort the candidate windows by start time before attempting to merge them.
    List<IntervalWindow> sortedWindows = new ArrayList<>();
    for (IntervalWindow window : c.windows()) {
        sortedWindows.add(window);
    }
    Collections.sort(sortedWindows);
    List<MergeCandidate> merges = new ArrayList<>();
    MergeCandidate current = new MergeCandidate();
    for (IntervalWindow window : sortedWindows) {
        MergeCandidate next = new MergeCandidate(window);
        if (current.intersects(window)) {
            // Only merge if the combined span stays within maxSize plus the gap duration.
            if (current.union == null
                    || new Duration(current.union.start(), window.end()).getMillis()
                        <= maxSize.plus(gapDuration).getMillis()) {
                current.add(window);
                continue;
            }
        }
        merges.add(current);
        current = next;
    }
    merges.add(current);
    for (MergeCandidate merge : merges) {
        merge.apply(c);
    }
}

In Dynamics GP, how do I remove items stuck in allocated?

We use WilloWare's MO Generator to create a Manufacturing Order in Dynamics GP. It creates an MO in GP and processes it. In this case, 90 units of a raw material (an item on the BOM of the item the MO is for) are stuck in allocated. WilloWare suggested I run Component Transaction Entry and MO Receipt Entry, but I receive errors on both. Component Transaction Entry shows a yellow triangle beside the raw material I'm having problems with, and clicking Post gives the message "the transaction quantity is greater than the expected quantity needed". MO Receipt Entry also shows a yellow triangle beside the quantity to backflush for the raw material I'm having trouble with, and clicking Post gives the message "At least one component has a shortage that has been overridden". This is the extent of my GP knowledge. I'm a C# developer by trade and am only looking into this because I wrote the batch program that runs the WilloWare software. How do I get the items out of allocated?
Screenshots show the raw item with 90 units stuck in allocated, the MO, the Manufacturing Component Transaction Entry with its error message, and the Manufacturing Order Receipt Entry with its error message.
To get the items out of allocation, you simply need to run an Inventory Reconcile on that item:
Microsoft Dynamics GP -> Tools -> Utilities -> Inventory -> Reconcile
Then select that item and run the reconcile; you can also check "Include Item History".
Make sure NO users are logged into GP at that time.

Graph with the number of events per hour?

I've got a list of Unix time stamps that looks like this:
1410576653
1410554469
1410527323
1410462212
1410431156
1410403429
1410403373
1410403258
1410402648
1410381795
1410293563
1410293330
1410292982
1410292718
1410276140
1410260911
1410233396
1410232755
1410229962
1410228512
1410222924
1410222655
1410221546
1410219208
1410218477
These were collected every time a certain event happened on my application.
I've got about 1100 of these within a timespan of about 4 years.
I want to find out during which hours of the day the events happen most (or don't happen at all).
What would be the best way to create a graph with the number of events that happened per hour?
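For illustration, one straightforward approach is to bucket the timestamps by hour of day and print a simple text histogram that can then be charted with any tool; in this minimal Java sketch the input file name timestamps.txt and the use of the system time zone are assumptions.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Instant;
import java.time.ZoneId;

public class EventsPerHour {
    public static void main(String[] args) throws Exception {
        int[] countsByHour = new int[24];
        for (String line : Files.readAllLines(Paths.get("timestamps.txt"))) {
            if (line.isBlank()) continue;
            long epochSeconds = Long.parseLong(line.trim());
            // Convert the Unix timestamp to an hour of day (0-23) in the local zone.
            int hour = Instant.ofEpochSecond(epochSeconds)
                    .atZone(ZoneId.systemDefault())
                    .getHour();
            countsByHour[hour]++;
        }
        // One row per hour; the bar length is the number of events in that hour.
        for (int h = 0; h < 24; h++) {
            System.out.printf("%02d:00  %s (%d)%n", h, "#".repeat(countsByHour[h]), countsByHour[h]);
        }
    }
}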

Selective dequeue of unrelated messages in Oracle Advanced Queueing

This question refers to the dequeueing of messages in Oracle Streams Advanced Queueing.
I need to ensure that the messages which are related to each other are processed sequentially.
For example, assume the queue is seeded with four messages that have a business-related field called transaction reference (txn_ref), and two of the messages (1 and 3) belong to the same transaction (000001):
id | txn_ref
---+--------
 1 | 000001
 2 | 000002
 3 | 000001
 4 | 000003
Assume also that I am running 4 threads/processes that wish to dequeue from this queue. The following should occur:
thread 1 dequeues message #1
thread 2 dequeues message #2
thread 3 dequeues message #4 (because message #3 is related to #1 and #1 has not yet completed).
thread 4 blocks waiting for a message
thread 1 commits its work for message #1
thread 4 (or perhaps thread 1) dequeues message #3.
My initial thought was that I could achieve this with a dequeue condition where the ENQ_TIME (enqueue time) is not later than any other ENQ_TIME of all the messages that have the same TXN_REF. But my problem is how to reference the TXN_REF of a message that I have not yet selected, in order to select it. e.g.
// Java API
String condition = "ENQ_TIME = (select min(ENQ_TIME) from AQ_TABLE1 where ??";
dequeueOption.setCondition(condition);
Is it possible to achieve what I want here?
To answer your direct question, this can be achieved using the correlation field (called CORRID in the table), which is designed for this purpose.
So, on the enqueue, you'd use the AQMessageProperties.setCorrelation() method with the TXN_REF value as the parameter. Then, in your condition you would do something like this:
// Java API
String condition = "tab.ENQ_TIME = (select min(AQ_TABLE1.ENQ_TIME) from AQ_TABLE1 self where tab.CORRID=AQ_TABLE1.CORRID)";
dequeueOption.setCondition(condition);
Another strategy you can try, if possible, is using Message Groups. The Oracle documentation describes it briefly, but I found this Toad World article to be far more useful. Basically, you set up the queue table to treat all messages committed at the same time as one "group". When dequeueing, only one user at a time can dequeue from a "group" of messages.
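For the enqueue side mentioned above, here is a rough sketch using the Oracle JDBC AQ classes; the queue name AQ_QUEUE1 and the RAW byte[] payload are assumptions, and only setCorrelation() itself comes from the answer.

import oracle.jdbc.OracleConnection;
import oracle.jdbc.aq.AQEnqueueOptions;
import oracle.jdbc.aq.AQFactory;
import oracle.jdbc.aq.AQMessage;
import oracle.jdbc.aq.AQMessageProperties;

public class TxnRefEnqueuer {
    // Enqueue a message whose correlation id carries the business transaction
    // reference, so the dequeue condition above can match related messages by CORRID.
    public static void enqueue(OracleConnection conn, String txnRef, byte[] payload) throws Exception {
        AQMessageProperties props = AQFactory.createAQMessageProperties();
        props.setCorrelation(txnRef);                // TXN_REF ends up in the CORRID column

        AQMessage message = AQFactory.createAQMessage(props);
        message.setPayload(payload);                 // RAW payload, assumed for simplicity

        conn.enqueue("AQ_QUEUE1", new AQEnqueueOptions(), message);
    }
}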
