Could the Windows system time occasionally go backwards? - windows

Spring Boot project, logging how much time it took to save to the DB:
long start = System.currentTimeMillis();
getDao().batchInsert(batchList);
long end = System.currentTimeMillis();
log.info("Save {} data 2 DB successfully took time: {}", getDescName(), (end - start));
Very strangely, I found there are situations where the time cost is negative; see below:
2019-05-16 14:41:04.420 INFO 3324 --- [ave2db-thread-2] c.c.sz_vss.demo.AbstractSave2DBProcess : Save Stock data 2 DB batch size: 416
2019-05-16 14:41:03.152 INFO 3324 --- [ave2db-thread-2] c.c.sz_vss.demo.AbstractSave2DBProcess : Save Stock data 2 DB successfully took time: -1268
Why does this happen? Is it a bug in the Spring Boot logging system, or can the Windows system time occasionally go backwards?

Network time synchronization can apply a correction in either direction, so yes, the system clock can move backwards. Also, if you measure with local wall-clock time in a time zone that observes daylight saving time, you will see +/- 1 hour discontinuities each year.
That's why it is not recommended to use the system calendar for measuring elapsed time. There are monotonic timers for this (on Windows, QueryPerformanceCounter() combined with QueryPerformanceFrequency(); on POSIX systems such as Linux, clock_gettime(CLOCK_MONOTONIC)). Managed frameworks usually wrap these in an object named "Stopwatch".
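In Java, the monotonic timer is System.nanoTime(). A minimal sketch of the timing code from the question rewritten with it (reusing the getDao(), getDescName() and log names from the question):

import java.util.concurrent.TimeUnit;

// System.nanoTime() is monotonic: NTP corrections and clock changes
// do not affect it, so (end - start) can never be negative.
long start = System.nanoTime();
getDao().batchInsert(batchList);
long end = System.nanoTime();
log.info("Save {} data 2 DB successfully took time: {}", getDescName(),
        TimeUnit.NANOSECONDS.toMillis(end - start));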

Related

Time drift between processes on the same virtual guest (VMWare host and windows guest)

I'm struggling a bit with time drift between processes on the same virtual guest. In real life we have around 50 processes sending messages to each other, and the drift makes reading the logs hard.
To illustrate my problem I am running two processes I wrote on a Windows Server 2019. Process 1 (time client, TC) finds out what time it is and then sends a string representing the current time to process 2 (the server, TS) via a named pipe. The string sent looks like '23-May-2022 14:26:55.608'.
The printout from TS looks like:
23-May-2022 13:03:29.344 -
23-May-2022 14:39:57.396 -
23-May-2022 14:39:57.492
diff is 00000:00:00:00.096
server is ahead FALSE
where diff is days:hours:minutes:seconds.milliseconds
TS (the server) does the following:
save the time when the process is started
then, upon arrival of a time_string from TC:
get the time from the OS
print the start time, the time from the OS, the time from TC, and the diff (time_from_os - time_string from TC)
TC sends a new time_string every minute. TC is started fresh at every invocation and runs to completion, so each run is a new instance.
I noticed the following after running for about 2 hours:
Windows Server 2019 on VMware: diff grows to ca. 150 ms after 1 hour
Windows Server 2016 on VMware: diff grows to ca. 50 ms after 1 hour
Linux Rocky 8.6 on VirtualBox (host: Windows 10): diff stays at ca. 0 ms after 2 hours
The drift between the two processes on Windows is very annoying since it messes up the logs completely.
One process creates an event and sends a message to another process, which, according to the logs, handles it several seconds earlier.
The processes are usually up for months, but some communication processes are restarted at least daily due to lost connections.
So it is not that the guest is out of sync; that would be OK.
It is that the processes get a different value of 'now' depending on how long they have been running.
Is this a well-known problem? I'm having a hard time googling it.
The problem could be within VMware or within Windows.
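For reference, a minimal sketch of the TS side of this measurement (simplified: ISO-8601 timestamps and a loopback socket stand in for the '23-May-2022 ...' format and the named pipe; the class name and port are made up for illustration):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.time.Duration;
import java.time.Instant;

public class TimeServer {
    public static void main(String[] args) throws Exception {
        Instant startTime = Instant.now();  // saved when the process starts
        try (ServerSocket server = new ServerSocket(5000)) {
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()))) {
                    Instant fromTc = Instant.parse(in.readLine()); // TC's "now"
                    Instant fromOs = Instant.now();                // TS's "now"
                    Duration diff = Duration.between(fromTc, fromOs);
                    System.out.println(startTime + " - " + fromTc + " - " + fromOs
                            + " diff is " + diff.toMillis() + " ms");
                }
            }
        }
    }
}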

Dataflow job has high data freshness and events are dropped due to lateness

I deployed an Apache Beam pipeline to GCP Dataflow in a DEV environment and everything worked well. Then I deployed it to the production environment in Europe (to be specific: job region europe-west1, worker location europe-west1-d), where we get high data velocity, and things started to get complicated.
I am using a session window to group events into sessions. The session key is the tenantId/visitorId and its gap is 30 minutes. I am also using a trigger to emit events every 30 seconds to release events sooner than the end of session (writing them to BigQuery).
The problem appears to happen in the EventToSession/GroupPairsByKey step. In this step there are thousands of events under the droppedDueToLateness counter, and the dataFreshness keeps increasing (it has been increasing ever since I deployed the job). All steps before this one operate fine, and all steps after it are affected by it but don't seem to have any other problems.
I looked into some metrics and see that the EventToSession/GroupPairsByKey step is processing between 100K and 200K keys per second (depending on the time of day), which seems like quite a lot to me. The CPU utilization doesn't go over 70% and I am using Streaming Engine. The number of workers is 2 most of the time. Max worker memory capacity is 32GB while the max worker memory usage currently stands at 23GB. I am using the e2-standard-8 machine type.
I don't have any hot keys since each session contains at most a few dozen events.
My biggest suspicion is the huge number of keys being processed in the EventToSession/GroupPairsByKey step. But on the other hand, a session usually relates to a single customer, so Google should expect to handle this number of keys per second, no?
I would like to get suggestions on how to solve the dataFreshness and droppedDueToLateness issues.
Adding the piece of code that generates the sessions:
input = input
        .apply("SetEventTimestamp", WithTimestamps.of(event -> Instant.parse(getEventTimestamp(event)))
                .withAllowedTimestampSkew(new Duration(Long.MAX_VALUE)))
        .apply("SetKeyForRow", WithKeys.of(event -> getSessionKey(event)))
        .setCoder(KvCoder.of(StringUtf8Coder.of(), input.getCoder()))
        .apply("CreatingWindow", Window.<KV<String, TableRow>>into(Sessions.withGapDuration(Duration.standardMinutes(30)))
                .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(30))))
                .discardingFiredPanes()
                .withAllowedLateness(Duration.standardDays(30)))
        .apply("GroupPairsByKey", GroupByKey.create())
        .apply("CreateCollectionOfValuesOnly", Values.create())
        .apply("FlattenTheValues", Flatten.iterables());
After doing some research I found the following:
regarding the constantly increasing data freshness: as long as late data is allowed to arrive in a session window, that specific window will persist in memory. This means that allowing data 30 days late will keep every session in memory for at least 30 days, which obviously can overload the system. Moreover, I found we had some everlasting sessions created by bots visiting and taking actions in the websites we are monitoring. These bots can hold sessions open forever, which also can overload the system. The solution was decreasing the allowed lateness to 2 days and using bounded sessions (look for "bounded sessions").
regarding events dropped due to lateness: these are events that, at their time of arrival, belong to an expired window, i.e. a window whose end the watermark has already passed (see the documentation for droppedDueToLateness here). These events are dropped in the first GroupByKey after the session window function and can't be processed later. We didn't want to drop any late data, so the solution was to check each event's timestamp before it enters the sessions part, and to stream into the session part only events that won't be dropped, i.e. events meeting this condition: event_timestamp >= event_arrival_time - (gap_duration + allowed_lateness). The rest are written to BigQuery without the session data. (Apparently Apache Beam drops an event if the event's timestamp is before event_arrival_time - (gap_duration + allowed_lateness), even if there is a live session the event belongs to...) A sketch of this pre-filter follows below.
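A minimal sketch of that pre-filter (reusing the getEventTimestamp helper from the pipeline above; the tag names are made up, and Instant.now() inside the DoFn stands in for the arrival time):

final Duration gapDuration = Duration.standardMinutes(30);
final Duration allowedLateness = Duration.standardDays(2);
final TupleTag<TableRow> sessionable = new TupleTag<TableRow>() {};
final TupleTag<TableRow> tooLate = new TupleTag<TableRow>() {};

PCollectionTuple split = input.apply("SplitByLateness",
        ParDo.of(new DoFn<TableRow, TableRow>() {
            @ProcessElement
            public void process(ProcessContext c) {
                Instant eventTs = Instant.parse(getEventTimestamp(c.element()));
                // anything older than arrival - (gap + allowed lateness) would be
                // dropped by the GroupByKey after the session window, so route it
                // around the session part and straight to BigQuery
                Instant cutoff = Instant.now().minus(gapDuration).minus(allowedLateness);
                if (eventTs.isBefore(cutoff)) {
                    c.output(tooLate, c.element());
                } else {
                    c.output(c.element());
                }
            }
        }).withOutputTags(sessionable, TupleTagList.of(tooLate)));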
P.S. - in the bounded-sessions article, where the author demonstrates how to implement a time-bounded session, I believe there is a bug that allows a session to grow beyond the provided max size. Once a session has exceeded the max size, one can send late data that intersects the session and precedes it, moving the session's start time earlier and thereby expanding the session. Furthermore, once a session has exceeded the max size, events that belong to it but don't extend it can no longer be added to it.
To fix that, I switched the order of the current window span and the if-statement, and edited the if-statement (the one checking the session max size) in the mergeWindows function, so that a session can't exceed the max size and can only receive data that doesn't extend it beyond the max size. This is my implementation:
public void mergeWindows(MergeContext c) throws Exception {
    List<IntervalWindow> sortedWindows = new ArrayList<>();
    for (IntervalWindow window : c.windows()) {
        sortedWindows.add(window);
    }
    Collections.sort(sortedWindows);
    List<MergeCandidate> merges = new ArrayList<>();
    MergeCandidate current = new MergeCandidate();
    for (IntervalWindow window : sortedWindows) {
        MergeCandidate next = new MergeCandidate(window);
        if (current.intersects(window)) {
            // merge only if the union would stay within maxSize (+ gap);
            // otherwise fall through and start a new candidate
            if (current.union == null
                    || new Duration(current.union.start(), window.end()).getMillis() <= maxSize.plus(gapDuration).getMillis()) {
                current.add(window);
                continue;
            }
        }
        merges.add(current);
        current = next;
    }
    merges.add(current);
    for (MergeCandidate merge : merges) {
        merge.apply(c);
    }
}
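With this change, an intersecting window is merged only if the resulting union stays within maxSize plus the gap; otherwise the current candidate is flushed and the incoming window starts a new one, so no merge step can push a session past the bound.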

Spring Boot Reactive: elapsed time calculation

I am currently using Spring Boot Reactive (WebFlux) to develop a microservice. In it, I implement a simple elapsed-time calculation to determine how long the process took to run.
Basically, when the process starts, I initialize the current timestamp as the start timestamp, as follows:
...
metrics.setStartMillis(System.currentTimeMillis());
...
and then it prints whether the process succeeded or not, along with the elapsed time, in doOnSuccess() and onErrorResume() respectively:
...
metrics.setStartMillis(System.currentTimeMillis());
return webclientAdapter.getResponse(request)
.doOnSuccess(success -> {
metrics.info(true); // This will print a metric log indicating a success process along with process time with current timestamp
})
.onErrorResume(error -> {
metrics.info(false); // This will print a metric log indicating a failed process along with process time with current timestamp
});
...
When testing the service by mocking the backend call with a 100 ms delay using cURL, the elapsed time is printed correctly (~100 ms). However, during a load test using JMeter, the printed elapsed time becomes very short (~0-20 ms), although the service is configured to call the mock backend with a 100 ms delay.
Does this have to do with the reactive event-loop model, and if so, how can I ensure that the elapsed time of each call is calculated properly?
Pardon if there is any confusion, feel free to ask additional information.
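One thing worth checking, sketched below: metrics.setStartMillis(...) runs when the pipeline is assembled, and if the metrics object is shared across requests, a later request can overwrite the start timestamp before an earlier response arrives, which would shorten the reported times under load. A minimal sketch that captures the start per subscription instead, using Mono.defer() (webclientAdapter and request are from the question; the log calls are stand-ins for the metrics object):

import java.util.concurrent.TimeUnit;
import reactor.core.publisher.Mono;

// Mono.defer() runs the lambda at subscription time, so each request
// gets its own start timestamp; nanoTime() is also monotonic.
return Mono.defer(() -> {
    long start = System.nanoTime();
    return webclientAdapter.getResponse(request)
            .doOnSuccess(success -> log.info("success, elapsed {} ms",
                    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)))
            .doOnError(error -> log.info("failure, elapsed {} ms",
                    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)));
});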

NTP time synchronization

I want to write a synchronization check in my VBS application, but before I even start, I think the whole NTP synchronization doesn't work.
Please advise me: how can I check this?
My network:
1 station with Windows Server 2012 R2 (set as NTP server)
6 stations with Windows 10 Pro (set as NTP clients)
From a client station, when I manually press "Update Now" (Date and Time / Internet Time / Change Settings / Update Now), the time on the client is updated.
But when I set the polling interval to, for instance, every 5 minutes:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\W32Time\TimeProviders\NtpClient]
"SpecialPollInterval"=dword:0000012c
when I change the time on a client during those 5 minutes, the time is not synchronized.
Why? I can't see any reason why it doesn't work. What am I missing?
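As a starting point for checking, the w32tm tool that ships with Windows can report the client's sync state; for example, from an elevated command prompt on a client (the server name is a placeholder):

w32tm /query /status
w32tm /query /peers
w32tm /stripchart /computer:<your NTP server> /samples:5
w32tm /resync

/query /status shows the current time source, stratum and last successful sync time, and /stripchart measures the live offset against the server. Also note that SpecialPollInterval is only honored when the NtpServer registry value lists the server with the 0x1 (SpecialInterval) flag, e.g. "yourserver,0x1", and the Windows Time service must be restarted after registry changes.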

How to get execution time in milliseconds format in Weka

I am using WEKA from Java and trying to run the generalized sequential pattern (GSP) algorithm on my data. I want to get the execution time in milliseconds. How can I do it?
Firstly, this has nothing to do with Weka; remove that tag.
Put this at the beginning of your code:
long startTime = Calendar.getInstance().getTimeInMillis();
and at the end of your program,
long endTime = Calendar.getInstance().getTimeInMillis();
long timeTook = endTime - startTime;
System.out.println("time took in milliseconds="+timeTook);
System.out.println("time took in seconds="+timeTook/1000);
The System class also has the methods currentTimeMillis() and nanoTime().
But keep in mind: if you're taking input from users, the longer the user takes to enter the input, the longer the reported time will be.
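If user input is part of the run, one option is to time only the algorithm call itself, for example (a sketch; gsp.buildAssociations(data) is a placeholder for whatever Weka call you are actually measuring):

long startTime = System.nanoTime();   // monotonic, unaffected by clock changes
gsp.buildAssociations(data);          // the Weka call being measured
long timeTook = (System.nanoTime() - startTime) / 1_000_000;
System.out.println("time took in milliseconds=" + timeTook);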
