Xen time drift in DomUs

My Xen DomUs keep drifting out of time. My Dom0 runs kernel 3.2.0 for amd64; the DomUs run 2.6.26. How do you keep the time from drifting?

I guess your DomUs are drifting relative to your Dom0's time. In that case, you can do one of the following:
Configure NTP (Network Time Protocol) on your DomUs. It is a fairly simple process: edit your /etc/ntp.conf file (assuming it is a Linux DomU) to include Dom0 as an NTP server, and then start the NTP daemon ("service ntpd start"). This way the DomUs sync their time with Dom0.
Configure Dom0 and all DomUs to sync their time with an external server. Change the NTP configuration file on each of them and then restart ntp on all of them. This way all of them stay in sync with the same external source, so time drift should not happen.
For an immediate time sync, you can use the "ntpdate" command on the DomU.
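For reference, the offset that ntpdate or ntpd estimates against Dom0 comes from the four timestamps of a single request/response exchange. A minimal Python sketch of that standard NTP arithmetic; the function name and the sample timestamps are illustrative, not taken from any tool:

```python
def ntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """Return (offset, round_trip_delay) in seconds.

    t1: request sent     (client clock, e.g. the DomU)
    t2: request received (server clock, e.g. Dom0)
    t3: reply sent       (server clock)
    t4: reply received   (client clock)
    """
    # How far the server clock is ahead of the client clock,
    # assuming the network delay is the same in both directions.
    offset = ((t2 - t1) + (t3 - t4)) / 2
    # Total round trip minus the time the server spent handling the request.
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


if __name__ == "__main__":
    # Hypothetical DomU whose clock is 0.5 s behind Dom0, with 10 ms each way.
    off, dly = ntp_offset_and_delay(100.000, 100.510, 100.511, 100.021)
    print(f"offset={off:+.3f}s delay={dly:.3f}s")
```

A positive offset means the client (DomU) should step or slew its clock forward.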

Related

Chrony sets system time but does not sync RTC

I have configured Chrony with the rtcsync directive, which SHOULD "Enable kernel synchronization of the hardware real-time clock (RTC)", but that is not the case.
Chrony sets the system time correctly via NTP, but the RTC is untouched, and I can't figure out why. My guess is that the kernel doesn't act on Chrony's request to sync the RTC, but that is just a guess.
Versions
Kernel: 4.19
Chrony: 3.5
UPDATE:
It appears that the external RTC is registered after the kernel tries to access it, and this prevents syncing the RTC with the NTP-synced system time.
From dmesg:
...
[ 6.317060] hctosys: unable to open rtc device (rtc)
...
[ 14.303503] rtc-ds1307 9-0068: registered as rtc0
...
As a temporary workaround, I added a cron job that updates the hardware clock (hwclock) every 10 minutes.
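To check whether a given boot hits the ordering problem described above, one can compare the two dmesg timestamps. A small Python sketch that looks for the two messages quoted in the question; the matched strings come from that excerpt, and other drivers or kernel versions may word them differently:

```python
import re

# Matches the "[    6.317060] message" prefix that dmesg prints on each line.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def rtc_registered_too_late(dmesg_text: str) -> bool:
    """Return True if hctosys gave up before any RTC device was registered."""
    hctosys_failed_at = None
    rtc_registered_at = None
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skips "..." and other lines without a timestamp
        timestamp, message = float(match.group(1)), match.group(2)
        if "hctosys: unable to open rtc device" in message and hctosys_failed_at is None:
            hctosys_failed_at = timestamp
        elif "registered as rtc" in message and rtc_registered_at is None:
            rtc_registered_at = timestamp
    return (
        hctosys_failed_at is not None
        and rtc_registered_at is not None
        and hctosys_failed_at < rtc_registered_at
    )


if __name__ == "__main__":
    # Excerpt from the question; on a live system you could pass the
    # output of `dmesg` instead.
    sample = """\
[    6.317060] hctosys: unable to open rtc device (rtc)
[   14.303503] rtc-ds1307 9-0068: registered as rtc0
"""
    print(rtc_registered_too_late(sample))
```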

To get rtcsync working, you have to set the CONFIG_RTC_SYSTOHC and CONFIG_RTC_SYSTOHC_DEVICE kernel options properly (here the ds1307 registers as rtc0), since rtcsync simply asks the kernel to copy the system time to the RTC. The kernel does so approximately every 11 minutes.
However, a better approach in that case is to use rtcfile (and rtcdevice): chrony will then handle the RTC itself. It can even compute the RTC drift, which could then be corrected if the RTC supports a trimming mechanism.
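Conceptually, tracking RTC drift means measuring the RTC's offset from the NTP-disciplined system clock at several points in time and fitting a line through those measurements; the slope is the drift rate. The Python sketch below shows that idea with an ordinary least-squares fit. It is a simplification for illustration, not chrony's actual algorithm, and the measurements in the usage example are made up:

```python
def rtc_drift_ppm(samples: list[tuple[float, float]]) -> float:
    """Estimate RTC drift from (system_time_s, rtc_offset_s) pairs.

    rtc_offset_s is RTC time minus system time at that moment. Returns the
    slope of the least-squares line in parts per million; a positive value
    means the RTC gains time relative to the system clock.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_t = sum(t for t, _ in samples) / n
    mean_o = sum(o for _, o in samples) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("measurements must be taken at different times")
    return cov / var * 1e6


if __name__ == "__main__":
    # Hypothetical RTC that gains 72 ms per hour, i.e. 20 ppm.
    print(rtc_drift_ppm([(0.0, 0.0), (3600.0, 0.072), (7200.0, 0.144)]))
```

An RTC with a trimming register could then be adjusted by roughly the negative of this value.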

System Time becomes incorrect on reboot of VMs

Ever since virtualizing several physical servers into GCP, I have had an issue where any time the server(s) are rebooted, the time changes to several hours ahead (I think it's 4 hours, but it may be 6). My local office is in the CST time zone, and that is what we want the servers to display. In GCP the virtual servers are in the us-central1-a zone. On the virtual server, running the tzutil /g command shows that the server is set to "Central Standard Time". It also shows the Central time zone if I click the clock on the taskbar and choose "Change date and time settings".
After the server has been rebooted (and reports the wrong time), I can correct the time by clicking "Update now" (taskbar clock > "Change date and time settings" > "Internet Time" tab > "Change settings" > "Update now"; this points to the time server time.nist.gov).
This issue only began occurring after migrating into GCP so I believe it to be a Compute Engine issue and not an OS issue.
Any thoughts on why this might be happening? This is occurring on all 4 Windows servers that were migrated into Google Cloud: three are Win2008R2 and one is Win2012R2.
I appreciate any help in getting this resolved, as I can't even reboot without connecting to the server afterwards and checking/fixing the time. I have set up a startup script that waits and then syncs the time after rebooting, but it has not worked 100% of the time, so it is more of a band-aid than a fix.

I do have set a startup script to delay and then sync time after rebooting, but it has not worked 100% of the time, so this is more of a band-aid
Getting this script working is probably the solution here. For what it's worth, you'd need to do the same thing on Azure and AWS as well, since they also set Windows time zones to UTC by default using the same mechanism.
See AWS docs on the Specialize Phase
See this Stack Overflow question for a similar question about Windows on Azure
Normally all servers run on UTC time, and their clients (applications, browsers, etc.) set their time zones according to where they are; it's up to them to translate UTC time to whichever locale they are in. (Put another way, you wouldn't want a server with a million client connections to have to keep track of each client's time zone in order to work properly.) In your case, the bottom line is that requiring a custom time zone on the server also requires a custom server configuration, and the behavior you're seeing is by design. That's why your best bet is to understand why the startup script isn't working as you expect.
For reference, these docs may be helpful:
Google Compute Engine: Providing a startup script for Windows instances
Google Compute Engine: Creating a Windows Image

If you look at the VM instance logs in the GCP Console, you'll see that the VM BIOS reports time in UTC:
2019/10/3 14:9:44 Begin firmware boot time
After a while, the BIOS hands over to the bootloader:
2019/10/3 14:9:45 End firmware boot time
Booting from Hard Disk 0...
The OS boots up. Behind the scenes, the OS time service recognizes the system time zone, then sets up and synchronizes time with the time source. From then on, running programs and services report events based on the local system time:
...
2019/10/03 09:10:05 GCEWindowsAgent: GCE Agent Started (version 4.6.0#1)
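Converting the two logged timestamps shows that they agree: the firmware line is in UTC and the agent line is in local time. A Python sketch, assuming US Central Time was on daylight saving time (CDT, UTC-05:00) on 2019-10-03, which is why the gap is five hours even though systeminfo labels the zone with its standard offset of UTC-06:00. A fixed offset is used so the sketch does not depend on tz database files:

```python
from datetime import datetime, timedelta, timezone

# Assumption: CDT (UTC-05:00) was in effect on the date of these log entries.
CDT = timezone(timedelta(hours=-5), "CDT")

# Timestamps copied from the log excerpts above.
firmware_boot_utc = datetime(2019, 10, 3, 14, 9, 44, tzinfo=timezone.utc)
agent_start_local = datetime(2019, 10, 3, 9, 10, 5, tzinfo=CDT)

if __name__ == "__main__":
    # The BIOS timestamp expressed in local time.
    print(firmware_boot_utc.astimezone(CDT))
    # Real time elapsed between the two entries (aware datetimes compare in UTC).
    print(agent_start_local - firmware_boot_utc)
```

The first line shows the firmware boot at 09:09:44 local time, and the second shows that the agent started 21 seconds later, so nothing is actually wrong with either clock reading.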
In the Windows Event Log you should see entries made by the Time-Service:
Log Name: System
Source: Time-Service
Level: Information
The time provider NtpClient is currently receiving valid time data from metadata.google.internal,0x1 (ntp.m|0x1|0.0.0.0:123->169.254.169.254:123).
The time service is now synchronizing the system time with the time source metadata.google.internal,0x1 (ntp.m|0x1|0.0.0.0:123->169.254.169.254:123).
From the command prompt you can verify that the time configuration and state are correct:
C:\Users\user>systeminfo | find /i "Time"
System Boot Time: 10/3/2019, 9:09:49 AM
Time Zone: (UTC-06:00) Central Time (US & Canada)
Hence you don't need to synchronize the time either manually or with a startup script. The time service does it for you: it synchronizes the system time right after boot and keeps it in sync afterwards.
All you need is to set the correct time zone and the Internet time server for Windows, and then make sure the time server is reachable over the network.

Network performance issues and slow tcp_write_xmit/tcp_ack calls with a lot of save_stack calls on OpenVZ kernel

I ran into trouble with bad network performance on CentOS. The issue was observed on the latest OpenVZ RHEL7 kernel (3.10-based) on a Dell server with 24 cores and a Broadcom 5720 NIC, regardless of whether it was the host system or an OpenVZ container. The server receives RTMP connections and re-proxies the RTMP streams to other consumers. Reads and writes were unstable, and streams froze periodically for a few seconds.
I started checking the system with strace and perf. strace affects the system heavily, so it seems only perf can help. I used the OpenVZ debug kernel with debugfs enabled. According to perf data, the system spends too much time in the swapper process. I built a flame graph for the system under load (100 Mbit in, 200 Mbit out) and noticed that the kernel spent too much time in tcp_write_xmit and tcp_ack. On top of these calls I see save_stack calls.
On the other hand, I tested the same scenario on an Amazon EC2 instance (latest Amazon Linux AMI 2017.09), and perf doesn't show such issues. The total number of samples was 300000; according to perf, the system spends 82% of the time in swapper, but net_rx_action (and consequently tcp_write_xmit and tcp_ack) within swapper takes only 1797 samples (0.59% of all samples). On top of the net_rx_action call in the flame graph, I don't see any calls related to stack traces.
The output on the OpenVZ system looks different: of 1833152 samples, 500892 (27%) were in the swapper process and 194289 (10.5%) were in net_rx_action.
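The percentages above are simply sample counts divided by the total. A common route from perf to a flame graph is piping perf script output through Brendan Gregg's stackcollapse-perf.pl, which emits one "frame;frame;... count" line per unique stack, typically with the process name as the first frame. A minimal Python sketch, assuming that collapsed format, that computes the share of samples whose stack passes through a given function; the sample stacks and the rtmp_server process name are invented for illustration:

```python
def share_of_samples(collapsed: str, func: str) -> tuple[int, int, float]:
    """Return (matching_samples, total_samples, percent) for stacks containing func."""
    matching = total = 0
    for line in collapsed.splitlines():
        line = line.strip()
        if not line:
            continue
        # The sample count is the last space-separated field.
        stack, _, count_str = line.rpartition(" ")
        count = int(count_str)
        total += count
        if func in stack.split(";"):
            matching += count
    percent = 100.0 * matching / total if total else 0.0
    return matching, total, percent


# Hypothetical collapsed stacks, not real data from either machine.
SAMPLE = """\
swapper;cpu_startup_entry;net_rx_action;tcp_ack;save_stack 40
swapper;cpu_startup_entry;intel_idle 50
rtmp_server;sys_write;tcp_write_xmit 10
"""

if __name__ == "__main__":
    for name in ("swapper", "net_rx_action", "save_stack"):
        print(name, share_of_samples(SAMPLE, name))
```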
The full SVG of calls on vzkernel7 is here, and the SVG of the EC2 instance calls is here. You can download them and open them in a browser to explore the flame graphs interactively.
So I'd like to ask for help; I have a few questions:
Why doesn't the flame graph from the EC2 instance contain as many save_stack calls as the one from my server?
Does perf force the system to call save_stack, or is it some kernel setting? Can it be disabled, and how?
Does the Xen guest on EC2 process all tcp_ack and other calls itself? Is it possible that the host system on the EC2 server does some of the work and the guest system doesn't see it?
Thanks for any help.

I've read the kernel sources and found the answer to my questions.
The save_stack calls are caused by the Kernel Address Sanitizer (KASAN), which was enabled in the OpenVZ debug kernel through the CONFIG_KASAN option. When this option is enabled, on each kmem_cache_free call the kernel calls __cache_free:
static inline void __cache_free(struct kmem_cache *cachep, void *objp,
                                unsigned long caller)
{
    /* Put the object into the quarantine, don't touch it for now. */
    if (kasan_slab_free(cachep, objp))
        return;

    ___cache_free(cachep, objp, caller);
}
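With CONFIG_KASAN disabled, kasan_slab_free always returns false (see include/linux/kasan.h). With it enabled, KASAN records a stack trace for each freed object (which is where the save_stack calls come from) and puts the object into quarantine. The OpenVZ debug kernel was built with CONFIG_KASAN=y; the Amazon AMI kernel wasn't.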
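To confirm which kernels have KASAN compiled in, it is enough to look for CONFIG_KASAN in the kernel configuration. A Python sketch, assuming the config is available either as /proc/config.gz (only present when the kernel was built with CONFIG_IKCONFIG_PROC) or as /boot/config-<release>, as is usual on CentOS/RHEL; the helper names are hypothetical:

```python
import gzip
import os
import platform
from typing import Optional


def kernel_config_value(config_text: str, option: str) -> Optional[str]:
    """Return the value of a Kconfig option ('y', 'm', ...) or None if it is unset."""
    prefix = option + "="  # the '=' keeps CONFIG_KASAN from matching CONFIG_KASAN_INLINE
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    # Covers both "# CONFIG_X is not set" and options missing entirely.
    return None


def read_running_kernel_config() -> str:
    """Read the running kernel's config from /proc/config.gz or /boot/config-<release>."""
    if os.path.exists("/proc/config.gz"):
        with gzip.open("/proc/config.gz", "rt") as f:
            return f.read()
    with open(f"/boot/config-{platform.release()}") as f:
        return f.read()


if __name__ == "__main__":
    print("CONFIG_KASAN =", kernel_config_value(read_running_kernel_config(), "CONFIG_KASAN"))
```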

Clock skew detected: your build may be incomplete

I've seen a bunch of these questions, most notably this one, which all say pretty much the same thing: this error is caused by the modification times of the source files being in the future, which usually happens on an NFS mount when the server and client clocks are not in sync.
I've tried touching all the files in my directory, as many have suggested. When that didn't work, I copied all the files out of the mounted drive onto a local drive, touched them again, and reran the build, and I still get the same error. Is there any other way to solve this problem?
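Since touching the files did not help, it can be useful to list which files, if any, still carry a modification time ahead of the local clock, which is the condition that triggers make's warning. A Python sketch; the function name and the tolerance parameter are illustrative:

```python
import os
import time
from pathlib import Path


def files_in_future(root: str, tolerance_s: float = 0.0) -> list[Path]:
    """List files under root whose mtime is ahead of this machine's clock."""
    now = time.time()
    future = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_mtime > now + tolerance_s:
                future.append(path)
    return sorted(future)


if __name__ == "__main__":
    for path in files_in_future("."):
        print(path)
```

If this prints nothing right before make warns, the future timestamps are probably being produced during the build itself (for example, object files written by the NFS server's clock).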

The NFS server's and NFS client's system times are out of sync; the NFS server is probably drifting ahead.
Running make on an NFS mount is sensitive at the millisecond level, so client and server system times must be tight as a drum. You can achieve this by having your NFS client(s) sync their time to the NFS server's time using NTP at the highest rate allowed (usually every 8 seconds). On a LAN this should get you sub-millisecond accuracy.
Install NTP on both the NFS client(s) and the NFS server.
In the clients' NTP config file (ntp.conf on Linux), comment out the entries starting with 'pool' or 'server' and add the line:
server [put address of the nfs server here] minpoll 3 maxpoll 3
... The '3' is a power-of-two exponent for the polling interval in seconds, hence 2^3 = 8 seconds. The NFS server's NTP config file can probably be left alone.
Restart the ntpd service on your client.
Test that your client is syncing by running this command on the client:
ntpq -p
... the important part is that the 'reach' column does not stay at zero for long, as that means the client cannot contact the server's NTP.
If they don't sync, you may have to reboot the client and server. This may be necessary with a Synology NAS as the NTP server.
Perform a full clean of your build (even nuke the directory and re-clone if convenient) and try again.
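The reach check in the steps above can be automated. reach is an 8-bit shift register printed in octal, so 377 means the last eight polls all got a reply and 0 means none did. A Python sketch that parses ntpq -p output, assuming the default layout (a tally character in the first column, then remote, refid, st, t, when, poll, reach, delay, offset, jitter); the sample output and addresses are invented:

```python
def reach_summary(ntpq_output: str) -> dict[str, tuple[int, int]]:
    """Map each peer in `ntpq -p` output to (reach_register, successful_polls)."""
    result = {}
    for line in ntpq_output.splitlines():
        if not line:
            continue
        # Column 0 holds the tally code ('*', '+', ' ', ...); the fields follow it.
        fields = line[1:].split()
        # Skip the header line and the "=====" separator.
        if len(fields) < 10 or fields[0] == "remote":
            continue
        remote = fields[0]
        reach = int(fields[6], 8)  # the reach column is octal
        result[remote] = (reach, bin(reach).count("1"))
    return result


SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.168.1.10    10.0.0.1         2 u    5    8  377    0.215    0.012   0.005
 192.168.1.11    .INIT.          16 u    -    8    0    0.000    0.000   0.000
"""

if __name__ == "__main__":
    for peer, (reach, ok) in reach_summary(SAMPLE).items():
        print(f"{peer}: reach={reach:o} ({ok}/8 recent polls answered)")
```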
Similar answers are all over the internet, but they suggest simply installing NTP on the machines. That wasn't enough to solve the issue for me: the clocks weren't synced tightly enough. A better way is to sync the clients' clocks to the server's clock on the local network at very frequent intervals. This is frowned upon with public internet servers but cheap on a LAN.
If this isn't possible, at least try to ensure NTP on the clients and server uses the same time servers in its pool/server entries.

If you are using Windows, check whether you are compiling on a FAT file system, and if so, try switching to another one.
FAT has a 2-second resolution, so it's possible for your build to add to an archive, compile the next file, and then detect that the archive is already up to date. Time resolutions for other file systems are listed in another answer.
If you must use FAT, consider the .LOW_RESOLUTION_TIME special target.
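A small simulation of that failure. make rebuilds a target only when a prerequisite is strictly newer (ignoring the case where the target does not exist), so two writes that land in the same 2-second FAT slot look "up to date". The Python sketch assumes FAT truncates to the even second below; some implementations round instead, which shifts the window but does not remove it. The timeline values are hypothetical:

```python
import math


def fat_mtime(t: float) -> int:
    """Store a timestamp at FAT's 2-second write-time resolution (truncating)."""
    return 2 * math.floor(t / 2)


def make_thinks_up_to_date(target_mtime: float, prereq_mtime: float) -> bool:
    """make's basic rule: the target is stale only if the prerequisite is newer."""
    return prereq_mtime <= target_mtime


# The archive is written at t=10.2 s, then the next object file at t=11.5 s,
# so the archive really is out of date.
archive_written, object_compiled = 10.2, 11.5

if __name__ == "__main__":
    print(make_thinks_up_to_date(archive_written, object_compiled))  # exact clocks
    print(make_thinks_up_to_date(fat_mtime(archive_written),
                                 fat_mtime(object_compiled)))        # on FAT
```

With exact timestamps make correctly sees the archive as stale (False); after FAT stores both times as 10 s, it wrongly treats the archive as up to date (True).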

I need to use NTP to serve a time offset from system time. Is broadcast the way to go?

I have a closed network with a few nodes that are mutually consistent in time. For this I use NTP with one node as the NTP server. One of the nodes is a dumb box over which I have little control. It runs an SNTP client to synchronize its time to the system's NTP server. I now need the box to be set to a time that is offset from the system time by an amount that I control. I am trying to find out whether this can be done using only the SNTP client already on the box. I will now present my approach, and would love to hear from anyone who knows whether this can be done.
As far as I could find out, a standard NTP server cannot be made to serve a time that is offset from the server's system time, so I will have to write my own implementation. The conceptually simplest NTP server must be a broadcast-only server. My thought is that I can set the SNTP box to listen for broadcasts and then just send NTP broadcast packets set to my custom time.
Are there any NTP server implementations that allow me to do this out of the box?
Can anyone tell me how hard it is to write an SNTP broadcast server, or any other NTP server?
Does anyone know of any tutorials for how to write an NTP server?
Are there any show-stoppers to the scheme I am describing above?
To try to answer the questions that will inevitably come up:
Yes, I am also thinking about a new interface on the box to set the time to a value I specify. But that is not what I am asking about, and no, it will not be much simpler.
I have investigated whether I could just use the time that the box needs as the system time. This is not an option: I need two different times, one for the system and one for the box.
All insight will be appreciated! Even opinions like "it should be doable."
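To give a sense of how small an SNTP broadcast sender can be: the packet is 48 bytes in mode 5 (broadcast), and its transmit timestamp carries the time you want the box to adopt. A minimal Python sketch, assuming the box accepts unauthenticated broadcasts (some clients require authentication in broadcast mode, so this has to be tested against the actual box). The stratum, poll, precision and "LOCL" reference ID are placeholder values, and the function names are hypothetical:

```python
import socket
import struct
import time
from typing import Optional

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 (NTP) and 1970-01-01 (Unix)


def to_ntp_timestamp(unix_time: float) -> tuple[int, int]:
    """Split a Unix time into NTP's 32-bit seconds and 32-bit fraction fields."""
    ntp_time = unix_time + NTP_EPOCH_OFFSET
    seconds = int(ntp_time)
    fraction = int((ntp_time - seconds) * 2**32)
    return seconds & 0xFFFFFFFF, fraction & 0xFFFFFFFF


def build_broadcast_packet(offset_s: float, now: Optional[float] = None) -> bytes:
    """Build a 48-byte NTPv4 broadcast (mode 5) packet announcing now + offset_s."""
    fake_time = (time.time() if now is None else now) + offset_s
    tx_sec, tx_frac = to_ntp_timestamp(fake_time)
    li_vn_mode = (0 << 6) | (4 << 3) | 5  # LI=0 (no warning), version 4, mode 5
    return struct.pack(
        "!BBbbII4sIIIIIIII",
        li_vn_mode,
        1,        # stratum (placeholder)
        6,        # poll exponent: 2**6 = 64 s between broadcasts (placeholder)
        -20,      # precision exponent: about 1 microsecond (placeholder)
        0,        # root delay (16.16 fixed point)
        0,        # root dispersion (16.16 fixed point)
        b"LOCL",  # reference ID (placeholder)
        tx_sec, tx_frac,  # reference timestamp
        0, 0,             # origin timestamp: unused in broadcast mode
        0, 0,             # receive timestamp: unused in broadcast mode
        tx_sec, tx_frac,  # transmit timestamp: the time the box should adopt
    )


def send_broadcast(packet: bytes, broadcast_addr: str = "255.255.255.255", port: int = 123) -> None:
    """Send one packet to the broadcast address on the NTP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast_addr, port))
```

For example, calling send_broadcast(build_broadcast_packet(-15 * 86400)) once per poll interval would announce a time 15 days in the past. The clock-discipline parts of a real server are deliberately absent, which is acceptable only because the box is assumed to trust whatever it hears.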

You could use Jans to serve a fake time. I have no experience with this product, but I know of it from the ntp mailing list. It will allow you to serve a fake time, but it does none of the clock discipline that the reference implementation does.
More info: http://www.vanheusden.com/time/jans/

Jans on its own is not suitable for providing a fake time with an offset, but it can provide real time plus a lot of test functionality, such as time drift and so on.
I used Jans as the source of real time, together with libfaketime on CentOS 6, as a fake NTP server with a + or - offset.
Just wget jans-0.3.tgz from the following page and run "make":
https://www.vanheusden.com/time/jans/
The libfaketime RPM for CentOS 6 is here:
http://rpm.pbone.net/info_idpl_54489387_distro_centos6_com_libfaketime-0.9.7-1.1.x86_64.rpm.html
or find it for your distro.
Stop the real NTP server if it's running on your Linux machine:
service ntpd stop
Run the fake NTP server (for example, 15 days in the past):
LD_PRELOAD=/usr/lib64/libfaketime.so.1 FAKETIME="-15d" ./jans -P 123 -t real
Keep in mind that clients expect the NTP server on port 123; if you run it on another port, you'll need iptables rules to redirect the traffic.
