How to account for clock offsets in a distributed system? - time

Background
I have a system consisting of several distributed services, each of which is continuously generating events and reporting these to a central service.
I need to present a unified timeline of the events, where the ordering in the timeline corresponds to the moment each event occurred. The frequency of event occurrence and the network latency are such that I cannot simply use time of arrival at the central collector to order the events.
E.g. in the following scenario:
E1 needs to be rendered in the timeline above E2, despite arriving at the collector afterwards, which means the events need to come with timestamp metadata. This is where the problem arises.
Problem
Due to constraints on how the environment is set up, it is not possible to ensure that the local time services on each machine are reliably aware of current UTC time. I can assume that each machine can accurately gauge relative time, i.e. the clock speeds are close enough to make measurement of short timespans identical, but problems like NTP misconfiguration/partitioning make it impossible to guarantee that every machine agrees on the current UTC time.
This means that a naive approach of simply generating a local timestamp for each event as it occurs, then ordering events using that will not work: every machine has its own opinion of what universal time is.
So the question is: how can I recover an ordering for events generated in a distributed system where the clocks do not agree?
Approaches I've considered
Most solutions I find online go down the path of trying to synchronize all the clocks, which is not possible for me since:
I don't control the machines in question
The reason the clocks are out of sync in the first place is due to network flakiness, which I can't fix
My own idea was to query some kind of central time service every time an event is generated, then stamp that event with the retrieved time minus network flight time. This gets hairy, because I have to add another service to the system and ensure its availability (I'm back to square zero if the other services can't reach this one). I was hoping there is some clever way to do this that doesn't require me to centralize timekeeping in this way.

A simple solution, somewhat inspired by your own at the end, is to periodically ping what I'll call the time-source server. In the ping, include the service's chip clock reading; the time-source echoes that back and includes its own timestamp. The service can then deduce the round-trip time and guess that the time-source's clock was at that timestamp roughly round-trip-time/2 nanoseconds ago. You can then use this as an offset to the local chip clock to determine a globalish time.
You don't have to use a different service for this; the Collector server will do. The important part is that you don't have to call the time-source server at every request, which removes it from the critical path.
If you don't want a sawtooth function for the time, you can smooth the time difference.
Congratulations, you've rebuilt NTP!
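The ping/echo arithmetic described above can be sketched roughly as follows. This is only a minimal illustration, assuming the network delay is symmetric; the names estimate_offset and OffsetTracker and the smoothing weight alpha are made up for the example and do not come from any library.

```python
def estimate_offset(local_send, server_time, local_recv):
    """Return (offset, rtt) from a single ping.

    local_send and local_recv are local clock readings taken just before
    sending the ping and just after receiving the echo. server_time is the
    timestamp the time-source put into its reply.
    """
    rtt = local_recv - local_send
    # Assumption: the reply spent rtt/2 in flight, so the time-source's clock
    # reads server_time + rtt/2 at the moment local_recv was taken.
    offset = (server_time + rtt / 2) - local_recv
    return offset, rtt


class OffsetTracker:
    """Keeps a smoothed offset so the derived time does not jump (no sawtooth)."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha    # hypothetical weight given to each new measurement
        self.offset = None

    def update(self, measured_offset):
        if self.offset is None:
            self.offset = measured_offset
        else:
            # Exponential smoothing: move part of the way toward the new value.
            self.offset += self.alpha * (measured_offset - self.offset)
        return self.offset

    def global_time(self, local_now):
        """Translate a local clock reading into the time-source's timeline."""
        return local_now + self.offset
```

Each event would then be stamped with tracker.global_time(local_clock_reading) at the moment it occurs, while the pings run on their own schedule outside the critical path.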

Related

Why TSync(Time Synchronization) is needed in Adaptive AUTOSAR?

I'm a rookie in Adaptive AUTOSAR.
I can't imagine why Time Synchronization (TSync) is needed. System time of ECUs can be synchronized by PTP.
Could you explain why TSync is needed even though PTP synchronizes time across a distributed system? I would also welcome any documents or materials that help me understand TSync's usages or use cases.
The reason for the existence of time sync, along with the definition of time domains, is that you need to be able to define different time domains across different bus systems within the vehicle. One example of a not directly obvious time domain could be the metering of operation hours.
On top of that, the time domains can cross AUTOSAR platforms, i.e. a time domain may consist of both CP and AP nodes.
You can find explanations for time sync in (e.g.) the AUTOSAR documents TPS Manifest and TPS System Template.
There need to be different time bases in a vehicle.
Examples of Time Bases in vehicles are:
• Absolute, which is based on a GPS based time.
• Relative, which represents the accumulated overall operating time of a vehicle, i.e. this Time Base does not start with a value of zero whenever the vehicle starts operating.
• Relative, starting at zero when the ECU begins its operation.

Kafka message timestamps for request/response

I am building a performance monitoring tool which works in a cluster with Kafka topics.
For example, I am monitoring two topics: request and response. I need two timestamps, one from the request and another from the response. Then I can calculate the difference to see how much time was spent in the service that received the request and produced the response.
Please take into account that this runs on a cluster, so different components may run on different hosts with different physical clocks. These clocks could be out of sync, which would distort the results significantly.
Also, I cannot reliably use the clock of the monitoring tool itself, as its own processing time would influence the timing results.
So, I would like to design a proper way to reliably calculate the time difference. What is the most reliable way to measure the time difference between two events in Kafka?
Solution 1:
We had a similar problem before, and our solution was to set up NTP (Network Time Protocol).
In this setup, one of your nodes acts as the NTP server and runs daemons to keep time in sync across all your nodes. We kept it on UTC, and all other nodes ran NTP clients, which kept the same time across all the servers.
Solution 2:
Build a common clock API for all your components that provides the current time. This makes your system design independent of each node's local clock.

How can I run my GPS application in background?

I want to send my current location to a PHP web service every 5 minutes, even when my application is running in the background. I tried to implement this, and it works well while my application is in the running state, but when I put the application in the background it stops sending data. Can anybody tell me how I can run my application in the background?
By "running in background", do you mean running when under the lock screen? If this is the case, then you need to set PhoneApplicationService.Current.ApplicationIdleDetectionMode = IdleDetectionMode.Disabled;
The post Running a Windows Phone Application under the lock screen by Jaime Rodriguez covers the subject well.
However, if you're talking about running an application that continues to run while the user uses other applications on the device, then this is not possible. In the Mango build of the operating system you can create background agents, but these only run every 30 minutes and only for 15 seconds as described on MSDN.
There is a request on the official UserVoice forum for Windows Phone development to Provide an agent to track routes, but even if adopted, this would not be available for quite some time.
Tracking applications are the bulk of what I do for a living, and the prospect of using WP7 like this is the primary reason I acquired one.
From a power consumption perspective, transmitting data is the single most expensive thing you can do, followed closely by sampling the GPS and accelerometers.
To produce a trace that closely conforms to roads, you need a higher sampling rate. WP7 won't let you sample more than once per second. This is (just barely) fast enough to track a motor vehicle, and at this level of power consumption the battery will last for about an hour assuming you log the data on the phone and don't attempt to transmit it.
You will also find that if you transmit for every sample, your sampling interval will be at least 15 seconds. Running the web call on another thread won't help because it will take more than one second to complete and you will run out of sockets in less than a minute with a one second sample interval.
There are solutions to all of these problems. For example, in a motor vehicle you can connect to vehicle power and run hot. You can batch and burst your data on a background thread.
These, however, are only the basic problems faced by every tracker designer. More interesting are the questions of proximity in space and time, measurement of deviation from a route, how to specify routes and geofences in a time dependent manner, how to associate them into named sets for rule evaluation purposes and how to associate rules with named sets of routes and geofences.
And then there is periodic clustering, which introduces all the calendrical nightmares that are too much for your average developer of desktop software. To apply the speed limit for a school zone you need to know the time zone, daylight savings, two start and two stop times and the start and end dates for school holidays in that region.
If you are just doing this for fun or as some kind of hiking trace then a five minute interval will impose much milder power demands than one second sampling, but I still suggest batch and burst because it means you can track locations that don't have comms.
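To make the "batch and burst" idea a bit more concrete, here is a rough, language-neutral sketch written in Python rather than in Windows Phone code. BatchingUploader and the send_batch callable are hypothetical names for this illustration only, and the background thread mentioned above is left out to keep it short.

```python
from collections import deque


class BatchingUploader:
    """Queue location samples locally and send them in bursts."""

    def __init__(self, send_batch, batch_size=10):
        # send_batch is a hypothetical callable(list) -> bool that returns
        # True only when the whole batch was delivered.
        self.send_batch = send_batch
        self.batch_size = batch_size
        self.pending = deque()

    def record(self, sample):
        """Store a sample; attempt a burst once enough have accumulated."""
        self.pending.append(sample)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Try to send everything queued; keep it all if sending fails."""
        if not self.pending:
            return True
        batch = list(self.pending)
        if self.send_batch(batch):
            self.pending.clear()
            return True
        return False  # no comms right now; samples stay queued for later
```

Because failed sends leave the queue untouched, samples taken in an area without comms are delivered in the next successful burst.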

What are common pitfalls of timestamp based syncing?

I am implementing my first syncing code. In my case I will have 2 types of iOS clients per user that will sync records to a server using a lastSyncTimestamp, a 64 bit integer representing the Unix epoch in milliseconds of the last sync. Records can be created on the server or the clients at any time and the records are exchanged as JSON over HTTP.
I am not worried about conflicts as there are few updates and always from the same user. However, I am wondering if there are common things that I need to be aware of that can go wrong with a timestamp based approach such as syncing during daylight savings time, syncs conflicting with another, or other gotchas.
I know that git and some other version control systems eschew syncing with timestamps in favor of a content-based negotiation approach. I could imagine such an approach for my apps too, where, using the uuid or hash of the objects, both peers announce which objects they own and then exchange them until both peers have the same sets.
If anybody knows any advantages or disadvantages of content-based syncing versus timestamp-based syncing in general that would be helpful as well.
Edit - Here are some of the advantages/disadvantages that I have come up with for timestamp and content based syncing. Please challenge/correct.
Note - I am defining content-based syncing as simple negotiation of 2 sets of objects such as how 2 kids would exchange cards if you gave them each parts of a jumbled up pile of 2 identical sets of baseball cards and told them that as they look through them to announce and hand over any duplicates they found to the other until they both have identical sets.
Johnny - "I got this card."
Davey - "I got this bunch of cards. Give me that card."
Johnny - "Here is your card. Gimme that bunch of cards."
Davey - "Here are your bunch of cards."
....
Both - "We are done"
Advantages of timestamp-based syncing
• Easy to implement
• Single property used for syncing.
Disadvantages of timestamp-based syncing
• Time is a relative concept to the observer and different machine's clocks can be out of sync. There are a couple ways to solve this. Generate timestamp on a single machine, which doesn't scale well and represents a single point of failure. Or use logical clocks such as vector clocks. For the average developer building their own system, vector clocks might be too complex to implement.
• Timestamp based syncing works for client to master syncing but doesn't work as well for peer to peer syncing or where syncing can occur with 2 masters.
• Single point of failure, whatever generates the timestamp.
• Time is not really related to the content of what is being synced.
Advantages of content-based syncing
• No per peer timestamp needs to be maintained. 2 peers can start a sync session and start syncing based on the content.
• Well defined endpoint to sync - when both parties have identical sets.
• Allows a peer to peer architecture, where any peer can act as client or server, providing they can host an HTTP server.
• Sync works with the content of the sets, not with an abstract concept time.
• Since sync is built around content, sync can be used to do content verification if desired. E.g. a SHA-1 hash can be computed on the content and used as the uuid. It can be compared to what is sent during syncing.
• Even further, SHA-1 hashes can be based on previous hashes to maintain a consistent history of content.
Disadvantages of content-based syncing
• Extra properties on your objects may be needed to implement.
• More logic on both sides compared to timestamp based syncing.
• Slightly more chatty protocol (this could be tuned by syncing content in clusters).
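For illustration, the baseball-card style negotiation could look something like the sketch below. It assumes each peer holds its records in an in-memory dict keyed by the SHA-1 hash of the content; content_id and sync are hypothetical names, and a real exchange would go over HTTP instead of touching both dicts directly.

```python
import hashlib


def content_id(record: bytes) -> str:
    """Use the SHA-1 hash of the content as the record's identifier."""
    return hashlib.sha1(record).hexdigest()


def sync(peer_a: dict, peer_b: dict) -> None:
    """Exchange records until both peers hold identical sets.

    Each peer is a dict mapping content_id -> record bytes.
    """
    # Each side announces what it has; the set differences are what to hand over.
    missing_in_b = peer_a.keys() - peer_b.keys()
    missing_in_a = peer_b.keys() - peer_a.keys()

    for key in missing_in_b:
        record = peer_a[key]
        # Content verification: the id must match the hash of what was sent.
        if content_id(record) != key:
            raise ValueError("record does not match its id")
        peer_b[key] = record

    for key in missing_in_a:
        record = peer_b[key]
        if content_id(record) != key:
            raise ValueError("record does not match its id")
        peer_a[key] = record
```

The sync has a well-defined endpoint: once both key sets are equal, there is nothing left to exchange, and no timestamp was involved at any point.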
Part of the problem is that time is not an absolute concept. Whether something happens before or after something else is a matter of perspective, not of compliance with a wall clock.
Read up a bit on relativity of simultaneity to understand why people have stopped trying to use wall time for figuring these things out and have moved to constructs that represent actual causality using vector clocks (or at least Lamport clocks).
If you want to use a clock for synchronization, a logical clock will likely suit you best. You will avoid the clock synchronization issues entirely, since ordering no longer depends on any machine's wall clock.
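As a rough idea of how small a logical clock can be, here is a minimal Lamport clock sketch. The class and method names are chosen for this example only; a vector clock would need one counter per peer instead of a single integer.

```python
class LamportClock:
    """A single counter that orders events by causality, not wall time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Call for every local event, including just before sending a message."""
        self.time += 1
        return self.time

    def receive(self, remote_time):
        """Call when a message stamped with remote_time arrives."""
        # Jump past whatever the sender had seen, then count the receive event.
        self.time = max(self.time, remote_time) + 1
        return self.time
```

If a client stamps each record with tick() and the server calls receive() with that stamp when the record arrives, anything the server does afterwards is guaranteed to carry a larger stamp, regardless of what either machine's wall clock says.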
I don't know if it applies in your environment, but you might consider whose time is "right", the client or the server (or if it even matters)? If all clients and all servers are not sync'd to the same time source there could be the possibility, however slight, of a client getting an unexpected result when syncing to (or from) the server using the client's "now" time.
Our development organization actually ran into some issues with this several years ago. Developer machines were not all sync'd to the same time source as the server where the SCM resided (and might not have been sync'd to any time source, thus the developer machine time could drift). A developer machine could be several minutes off after a few months. I don't recall all of the issues, but it seems like the build process tried to get all files modified since a certain time (the last build). Files could have been checked in, since the last build, that had modification times (from the client) that occurred BEFORE the last build.
It could be that our SCM procedures were just not very good, or that our SCM system or build process were unduly susceptible to this problem. Even today, all of our development machines are supposed to sync time with the server that has our SCM system on it.
Again, this was several years ago and I can't recall the details, but I wanted to mention it on the chance that it is significant in your case.
You could have a look at unison. It's file-based but you might find some of the ideas interesting.

sync client time to server time, i.e. to make client application independent of the local computer time

Ok, so the situation is as follows.
I have a server with services for a game. A particular command from the server sends a timestamp for when the next game round should commence. To get this perfectly synced on all connected clients, I also have a web service that returns a timestamp of the server's current time.
What I know: the time between request sent and answer received.
What I don't know: where the latency lies, whether in client processing, server processing, or bandwidth issues.
What is the best practice to get a reasonable result here? I guess that GPS must have solved this in some fashion, but I've been unable to find a good pattern.
What I do now is add half the latency of the request to the server timestamp, but it's not quite good enough. This may be because the time between send and receive can be as high as 11 seconds.
Suggestions?
There are many common solutions to sync time between machines, including the correct PLL implementation done by NTPD with RTP. This is useful to you if you can change the machine's local time. If not, perhaps you should do more or less what you did, but drop sync points where the latency is unreasonable.
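Dropping the unreasonable sync points might look roughly like this. It is only a sketch: best_offset and the max_rtt threshold are assumptions made for the example, and the half-round-trip correction is the same one the question already uses.

```python
def best_offset(samples, max_rtt):
    """Pick a clock offset from several request/response measurements.

    samples is a list of (local_send, server_time, local_recv) tuples, all in
    the same unit (e.g. milliseconds). Measurements whose round trip exceeds
    max_rtt are dropped; of the rest, the one with the shortest round trip
    is trusted most. Returns None if nothing usable remains.
    """
    usable = []
    for local_send, server_time, local_recv in samples:
        rtt = local_recv - local_send
        if rtt <= max_rtt:
            # Same half-latency assumption as before, applied per sample.
            offset = server_time + rtt / 2 - local_recv
            usable.append((rtt, offset))
    if not usable:
        return None
    # Tuples sort by rtt first, so min() picks the shortest round trip.
    return min(usable)[1]
```

With this, an 11 second round trip is simply ignored rather than averaged in; the client keeps polling until it has at least one sample under the threshold.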
The best practice is usually not to synchronise the absolute times but to work with relative times instead.
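One way to read "relative times" for the game case: instead of sending the absolute start time of the next round, the server sends how long until the round starts, and each client counts that down on its own monotonic clock. The sketch below assumes this reading; the function names and the optional one_way_delay estimate are hypothetical.

```python
import time


def local_deadline(seconds_until_start, received_at, one_way_delay=0.0):
    """Convert a server-relative delay into a deadline on the local clock.

    received_at is a time.monotonic() reading taken when the message arrived.
    one_way_delay is an optional estimate of how long the message was in
    flight (for example half a measured round trip).
    """
    return received_at + seconds_until_start - one_way_delay


def seconds_remaining(deadline, now=None):
    """Time left until the round starts, never negative."""
    if now is None:
        now = time.monotonic()
    return max(0.0, deadline - now)
```

The local wall clock never enters the calculation, so a client whose computer time is wrong by hours still starts the round at the right moment, limited only by the error in the delay estimate.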
