How to determine where the bottleneck is in my slow (home) internet? - macOS

I recently got a new cable modem from my ISP (Rogers in Canada; the old modem was a "Webstar" something, the new one is an "SMC D3GN-RRR"). Since I got the new modem, it feels like my internet access is slower.
What I'm perceiving is that sometimes when I enter a URL and hit enter, there is a delay -- anywhere from half a second to two or three seconds -- before the web page loads. Once the page starts loading it loads fast, but there's that initial delay while it's looking something up.
I have a MacBook Pro, an Apple AirPort Extreme wireless router, and the new cable modem.
Is there some kind of tool, or cool UNIX command (traceroute, or something?), I can run to see how much time it takes to jump from device to device, so I can "prove" where the delay is?
Just FYI, here's a "traceroute www.google.com", in case it's useful. I don't know what this means. :)
traceroute www.google.com
traceroute: Warning: www.google.com has multiple addresses; using 173.194.75.105
traceroute to www.l.google.com (173.194.75.105), 64 hops max, 52 byte packets
1 10.0.1.1 (10.0.1.1) 4.455 ms 1.204 ms 1.263 ms
2 * * *
3 * 69.63.255.237 (69.63.255.237) 36.694 ms 30.209 ms
4 69.63.250.210 (69.63.250.210) 44.503 ms 41.303 ms 46.039 ms
5 gw01.mtmc.phub.net.cable.rogers.com (66.185.81.137) 40.504 ms 34.937 ms 44.493 ms
6 * * *
7 * 216.239.47.114 (216.239.47.114) 58.605 ms 37.710 ms
8 216.239.46.170 (216.239.46.170) 56.073 ms 57.250 ms 64.373 ms
9 72.14.239.93 (72.14.239.93) 70.879 ms
209.85.249.11 (209.85.249.11) 114.399 ms 59.781 ms
10 209.85.243.114 (209.85.243.114) 72.877 ms 80.151 ms
209.85.241.222 (209.85.241.222) 82.524 ms
11 216.239.48.159 (216.239.48.159) 82.227 ms
216.239.48.183 (216.239.48.183) 80.065 ms
216.239.48.157 (216.239.48.157) 79.660 ms
12 * * *
13 ve-in-f105.1e100.net (173.194.75.105) 76.967 ms 71.142 ms 80.519 ms

I have the same problem. My download and upload numbers are great, but the DNS server, in my case, seems to respond extremely slowly (not because of slow RTT or line speed; maybe the DNS server itself is overloaded, or the DNS infrastructure in this region is overloaded or under some form of attack). If I'm right, it isn't trivial for us as "customers" to demonstrate that this is the explanation, or to get it fixed, unfortunately.
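To check whether name resolution is really where that half-second-to-three-second pause goes, one rough approach is to time the system resolver directly. Below is a minimal sketch (Python standard library; the hostnames are arbitrary examples I picked, and on macOS a cached entry will come back in a millisecond or two, so the interesting number is the first, uncached lookup):

# dns_timing.py - rough check of how long the system resolver takes per hostname
import socket
import time

HOSTS = ["www.google.com", "www.wikipedia.org", "www.cbc.ca"]  # arbitrary examples

for host in HOSTS:
    start = time.monotonic()
    try:
        socket.getaddrinfo(host, 80, proto=socket.IPPROTO_TCP)
        status = "ok"
    except socket.gaierror as exc:
        status = f"failed: {exc}"
    elapsed_ms = (time.monotonic() - start) * 1000
    print(f"{host:>20}  {elapsed_ms:8.1f} ms  {status}")

If these lookups regularly take hundreds of milliseconds (or seconds) while the hop times in a traceroute like the one above stay low, the resolver rather than the line is the likely culprit; pointing the AirPort or the Mac at a different DNS server and re-running the same script is one way to compare.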

Related

Time drift between processes on the same virtual guest (VMware host and Windows guest)

I'm struggling a bit with time drift between processes on the same virtual guest.
In real life we have around 50 processes sending messages to each other, and the drift
makes reading the logs hard.
To illustrate my problem I am running two processes I wrote on Windows Server 2019.
Process 1 (time_client, TC) finds out what time it is,
and then sends a string representing the current time to
process 2 (the server, TS) via a named pipe.
The string sent looks like '23-May-2022 14:26:55.608'
The printout from TS looks like:
23-May-2022 13:03:29.344 -
23-May-2022 14:39:57.396 -
23-May-2022 14:39:57.492
diff is 00000:00:00:00.096
server is ahead FALSE
where diff is days:hours:minutes:seconds.milliseconds
TS (the server) does the following:
save the time when the process is started;
then, upon arrival of a time_string from TC:
get the time from the OS;
print the start time, the time from the OS, the time from TC, and the diff (time_from_os - time_string_from_TC).
TC sends a new time_string every minute. TC is started anew on every invocation and runs to completion, so each time it is a new instance running.
I notice the following after about 2 hours:
Windows Server 2019 on VMware:
diff grows to about 150 ms after 1 hour
Windows Server 2016 on VMware:
diff grows to about 50 ms after 1 hour
Rocky Linux 8.6 on VirtualBox (host: Windows 10):
diff stays at about 0 ms after 2 hours
The drift between the two processes on Windows is very annoying since it messes up the logs completely.
One process creates an event and sends a message to another process, which, according to the logs, handles it several seconds earlier.
The processes are usually up for months, except for some communication processes that are restarted at least daily due to lost connections.
So it is not that the guest clock is out of sync; that would be OK.
It is that the processes get a different value of 'now' depending on how long they have been running.
Is this a well-known problem? I'm having a hard time googling it.
The problem could be within VMware or within Windows.
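One way to reproduce the symptom independently of the named-pipe setup is to have a long-running process repeatedly compare its own reading of 'now' with that of a freshly started child process; if the gap grows with the parent's uptime, the long-lived process's clock reading is drifting relative to a fresh one. The sketch below is only a first probe under that assumption (Python standard library, runnable on the Windows or Linux guest; whether the gap shows up will also depend on which time API the real processes use):

# drift_probe.py - compare 'now' in a long-running process vs. a freshly started one.
# Leave it running for a few hours and watch whether the gap trends away from zero.
import subprocess
import sys
import time

CHILD_CODE = "import time; print(repr(time.time()))"

started = time.time()
while True:
    before = time.time()
    child_out = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    after = time.time()
    child_now = float(child_out)
    # The child samples its clock at some point between 'before' and 'after';
    # using the midpoint keeps the comparison error bounded by half the child's runtime.
    gap_ms = (child_now - (before + after) / 2) * 1000
    uptime_min = (after - started) / 60
    print(f"uptime {uptime_min:7.1f} min   fresh-minus-parent {gap_ms:8.1f} ms", flush=True)
    time.sleep(60)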

Postgres connect time delay on Windows

There is a long delay between "forked new backend" and "connection received", from about 200 to 13000 ms. Postgres 12.2, Windows Server 2016.
During this delay the client is waiting for the network packet to start the authentication. Example:
14:26:33.312 CEST 3184 DEBUG: forked new backend, pid=4904 socket=5340
14:26:33.771 CEST 172.30.100.238 [unknown] 4904 LOG: connection received: host=* port=56983
This was discussed earlier here:
Postgresql slow connect time on Windows
But I have not found a solution.
After rebooting the server the delay is much shorter, about 50 ms. Then it gradually increases in the course of a few hours. There are about 100 clients connected.
I use ip addresses only in "pg_hba.conf". "log_hostname" is off.
There is BitDefender running on the server but switching it off did not help. Further, Postgres files are excluded from BitDefender checks.
I used Process Monitor which revealed the following: Forking the postgres.exe process needs 3 to 4 ms. Then, after loading DLLs, postgres.exe is looking for custom and extended locale info of 648 locales. It finds none of these. This locale search takes 560 ms (there is a gap of 420 ms, though). Perhaps this step can be skipped by setting a connection parameter. After reading some TCP/IP parameters, there are no events for 388 ms. This time period overlaps the 420 ms mentioned above. Then postgres.exe creates a thread. The total connection time measured by the client was 823 ms.
Locale example, performed 648 times:
"02.9760160","RegOpenKey","HKLM\System\CurrentControlSet\Control\Nls\CustomLocale","REPARSE","Desired Access: Read"
"02.9760500","RegOpenKey","HKLM\System\CurrentControlSet\Control\Nls\CustomLocale","SUCCESS","Desired Access: Read"
"02.9760673","RegQueryValue","HKLM\System\CurrentControlSet\Control\Nls\CustomLocale\bg-BG","NAME NOT FOUND","Length: 532"
"02.9760827","RegCloseKey","HKLM\System\CurrentControlSet\Control\Nls\CustomLocale","SUCCESS",""
"02.9761052","RegOpenKey","HKLM\System\CurrentControlSet\Control\Nls\ExtendedLocale","REPARSE","Desired Access: Read"
"02.9761309","RegOpenKey","HKLM\System\CurrentControlSet\Control\Nls\ExtendedLocale","SUCCESS","Desired Access: Read"
"02.9761502","RegQueryValue","HKLM\System\CurrentControlSet\Control\Nls\ExtendedLocale\bg-BG","NAME NOT FOUND","Length: 532"
"02.9761688","RegCloseKey","HKLM\System\CurrentControlSet\Control\Nls\ExtendedLocale","SUCCESS",""
No events for 388 ms:
"03.0988152","RegCloseKey","HKLM\System\CurrentControlSet\Services\Tcpip6\Parameters\Winsock","SUCCESS",""
"03.4869332","Thread Create","","SUCCESS","Thread ID: 2036"

Traceroute OSX Terminal vs Network Utility

Why would traceroute #.#.#.# in Terminal show different results than Network Utility.app? Here are the first 3 hops. I am connected to PIA VPN, but regardless, I would think both methods should show the same results.
Terminal:
traceroute to 8.8.8.8 (8.8.8.8), 64 hops max, 52 byte packets
1 10.199.1.1 (10.199.1.1) 42.559 ms 39.696 ms 38.293 ms
2 * * *
3 184-75-211-129.amanah.com (184.75.211.129) 49.639 ms
162.219.176.225 (162.219.176.225) 56.780 ms
dpaall.webexpressmail.net (162.219.179.65) 69.798 ms
Network Utility.app:
traceroute to 8.8.8.8 (8.8.8.8), 64 hops max, 72 byte packets
1 10.199.1.1 (10.199.1.1) 41.221 ms 38.355 ms 47.237 ms
2 vl685-c8-10-c6-1.pnj1.choopa.net (209.222.15.225) 41.262 ms 38.674 ms 41.912 ms
3 vl126-br1.pnj1.choopa.net (108.61.92.105) 44.092 ms 36.200 ms 40.407 ms
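One detail visible in the two listings above is that the probes are not identical (52-byte vs. 72-byte packets), so the two tools may also differ in probe protocol or options, and intermediate hops often answer differently to UDP and ICMP probes (which could explain the * * * at hop 2 in one run but not the other). As a quick, assumption-laden way to test this from the same Terminal session, the sketch below runs the command-line traceroute twice, once with its default UDP probes and once with ICMP ECHO probes (-I), so the listings can be compared directly (ICMP mode may need sudo):

# traceroute_compare.py - run UDP-probe and ICMP-probe traceroutes back to back
# (macOS/BSD traceroute; -n skips reverse DNS so the hop lists line up).
import subprocess

TARGET = "8.8.8.8"

for label, extra in [("UDP probes (default)", []), ("ICMP probes (-I)", ["-I"])]:
    print(f"=== {label} ===")
    result = subprocess.run(
        ["traceroute", "-n", *extra, TARGET],
        capture_output=True, text=True,
    )
    print(result.stdout)
    if result.returncode != 0:
        print(result.stderr)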

Strategies in reducing network delay from 500 milliseconds to 60-100 milliseconds

I am building autocomplete functionality and realized that the time taken between the client and the server is too high (in the range of 450-700 ms).
My first stop was to check whether this was the result of server-side delay.
But as you can see, the nginx request times are almost always 0.001 seconds (request time is the last column), so the server is hardly a cause for concern.
So it became very evident that I am losing time between the server and the client. My benchmark is Google Instant's response time, which is almost always in the range of 30-40 milliseconds, orders of magnitude lower.
Although it's easy to say that Google has massive infrastructure to deliver at this speed, I wanted to push myself to learn whether this is possible for someone who is not at that level. If not 60 milliseconds, I want to shave off 100-150 milliseconds.
Here are some of the strategies I've managed to learn:
Tune TCP slow start and the initial congestion window (initcwnd)
Ensure SPDY if you are on HTTPS
Ensure results are HTTP-compressed
Etc.
What other things can I do here?
For example:
Does a persistent connection help?
Should I reduce the response size dramatically?
Edit:
Here are the ping and traceroute numbers. The site is served via Cloudflare from a Linode machine in Fremont.
mymachine-Mac:c name$ ping site.com
PING site.com (160.158.244.92): 56 data bytes
64 bytes from 160.158.244.92: icmp_seq=0 ttl=58 time=95.557 ms
64 bytes from 160.158.244.92: icmp_seq=1 ttl=58 time=103.569 ms
64 bytes from 160.158.244.92: icmp_seq=2 ttl=58 time=95.679 ms
^C
--- site.com ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 95.557/98.268/103.569/3.748 ms
mymachine-Mac:c name$ traceroute site.com
traceroute: Warning: site.com has multiple addresses; using 160.158.244.92
traceroute to site.com (160.158.244.92), 64 hops max, 52 byte packets
1 192.168.1.1 (192.168.1.1) 2.393 ms 1.159 ms 1.042 ms
2 172.16.70.1 (172.16.70.1) 22.796 ms 64.531 ms 26.093 ms
3 abts-kk-static-ilp-241.11.181.122.airtel.in (122.181.11.241) 28.483 ms 21.450 ms 25.255 ms
4 aes-static-005.99.22.125.airtel.in (125.22.99.5) 30.558 ms 30.448 ms 40.344 ms
5 182.79.245.62 (182.79.245.62) 75.568 ms 101.446 ms 68.659 ms
6 13335.sgw.equinix.com (202.79.197.132) 84.201 ms 65.092 ms 56.111 ms
7 160.158.244.92 (160.158.244.92) 66.352 ms 69.912 ms 81.458 ms
I may well be wrong, but personally I smell a rat. Your times aren't justified by your setup; I believe that your requests ought to run much faster.
If at all possible, generate a short query using curl and intercept it with tcpdump on both the client and the server.
It could be a bandwidth/concurrency problem on the hosting. Check out its diagnostic panel, or try estimating the traffic.
You can try saving a query response to a static file and then requesting that file (taking care not to trigger the local browser cache), to see whether the problem might be in processing the data (either server- or client-side).
Does this slowness affect every request, or only the autocomplete ones? If the latter, and no matter what nginx says, it might be some inefficiency/delay in retrieving or formatting the autocompletion data for output.
Also, you can try serving a static response bypassing nginx altogether, in case this is an issue with nginx (and for that matter: have you checked nginx's error log?).
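As a lighter-weight companion to the curl/tcpdump suggestion above, you can also split a single request into phases from the client side, so it becomes obvious whether the 450-700 ms goes to DNS, the TCP connect, the TLS handshake, or waiting for the first byte of the response. This is only a sketch (Python standard library; HOST and PATH are placeholders for the real autocomplete endpoint):

# phase_timing.py - split one HTTPS request into DNS / TCP / TLS / first-byte phases.
# HOST and PATH are placeholders; point them at the real autocomplete endpoint.
import socket
import ssl
import time

HOST = "site.com"
PATH = "/autocomplete?q=a"

t0 = time.monotonic()
sockaddr = socket.getaddrinfo(HOST, 443, proto=socket.IPPROTO_TCP)[0][4]
t_dns = time.monotonic()

raw = socket.create_connection(sockaddr[:2], timeout=10)
t_tcp = time.monotonic()

tls = ssl.create_default_context().wrap_socket(raw, server_hostname=HOST)
t_tls = time.monotonic()

tls.sendall(f"GET {PATH} HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n".encode())
tls.recv(1)  # block until the first byte of the response arrives
t_first = time.monotonic()

while tls.recv(65536):  # drain the rest of the response
    pass
t_done = time.monotonic()
tls.close()

print(f"DNS lookup    {1000 * (t_dns - t0):7.1f} ms")
print(f"TCP connect   {1000 * (t_tcp - t_dns):7.1f} ms")
print(f"TLS handshake {1000 * (t_tls - t_tcp):7.1f} ms")
print(f"First byte    {1000 * (t_first - t_tls):7.1f} ms")
print(f"Rest of body  {1000 * (t_done - t_first):7.1f} ms")

If most of the time sits in the TLS handshake, the SSL session suggestion that follows is the right lever; if it sits in "first byte", the backend (or the path between Cloudflare and the origin) is where to look.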
One approach I didn't see you mention is to use SSL sessions: you can add the following into your nginx conf to make sure that an SSL handshake (very expensive process) does not happen with every connection request:
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
See "HTTPS server optimizations" here:
http://nginx.org/en/docs/http/configuring_https_servers.html
I would recommend using New Relic if you aren't already. It is possible that your server-side code is the issue; if you suspect it, there are quite a few free code-profiling tools.
You may want to consider preloading the autocomplete options in the background while the page is rendered, and then saving a trie (or whatever structure you use) on the client in local storage. When the user starts typing in the autocomplete field, you would not need to send any requests to the server and could instead query local storage.
Web SQL Database and IndexedDB introduce databases to the clientside.
Instead of the common pattern of posting data to the server via
XMLHttpRequest or form submission, you can leverage these clientside
databases. Decreasing HTTP requests is a primary target of all
performance engineers, so using these as a datastore can save many
trips via XHR or form posts back to the server. localStorage and
sessionStorage could be used in some cases, like capturing form
submission progress, and have seen to be noticeably faster than the
client-side database APIs.
For example, if you have a data grid component or an inbox with
hundreds of messages, storing the data locally in a database will save
you HTTP roundtrips when the user wishes to search, filter, or sort. A
list of friends or a text input autocomplete could be filtered on each
keystroke, making for a much more responsive user experience.
http://www.html5rocks.com/en/tutorials/speed/quick/#toc-databases

Performance degradation using Azure CDN?

I have experimented quite a bit with the CDN from Azure, and I thought I was home safe after a successful setup using a web role.
Why the web role?
Well, I wanted the benefits of compression and caching headers, which I was unable to obtain using the normal blob approach. As an added bonus, the case-sensitivity constraint was eliminated as well.
Enough about the choice of CDN serving; while all content was previously served from the same domain, I now serve more or less all "static" content from cdn.cuemon.net. In theory, this should improve performance, since browsers can fetch content in parallel across "multiple" domains instead of only one.
Unfortunately this has led to a decrease in performance, which I believe has to do with the number of hops before content is served (using a tracert command):
C:\Windows\system32>tracert -d cdn.cuemon.net
Tracing route to az162766.vo.msecnd.net [94.245.68.160]
over a maximum of 30 hops:
1 1 ms 1 ms 1 ms 192.168.1.1
2 21 ms 21 ms 21 ms 87.59.99.217
3 30 ms 30 ms 31 ms 62.95.54.124
4 30 ms 29 ms 29 ms 194.68.128.181
5 30 ms 30 ms 30 ms 207.46.42.44
6 83 ms 61 ms 59 ms 207.46.42.7
7 65 ms 65 ms 64 ms 207.46.42.13
8 65 ms 67 ms 74 ms 213.199.152.186
9 65 ms 65 ms 64 ms 94.245.68.160
C:\Windows\system32>tracert cdn.cuemon.net
Tracing route to az162766.vo.msecnd.net [94.245.68.160]
over a maximum of 30 hops:
1 1 ms 1 ms 1 ms 192.168.1.1
2 21 ms 22 ms 20 ms ge-1-1-0-1104.hlgnqu1.dk.ip.tdc.net [87.59.99.217]
3 29 ms 30 ms 30 ms ae1.tg4-peer1.sto.se.ip.tdc.net [62.95.54.124]
4 30 ms 30 ms 29 ms netnod-ix-ge-b-sth-1500.microsoft.com [194.68.128.181]
5 45 ms 45 ms 46 ms ge-3-0-0-0.ams-64cb-1a.ntwk.msn.net [207.46.42.10]
6 87 ms 59 ms 59 ms xe-3-2-0-0.fra-96cbe-1a.ntwk.msn.net [207.46.42.50]
7 68 ms 65 ms 65 ms xe-0-1-0-0.zrh-96cbe-1b.ntwk.msn.net [207.46.42.13]
8 65 ms 70 ms 74 ms 10gigabitethernet5-1.zrh-xmx-edgcom-1b.ntwk.msn.net [213.199.152.186]
9 65 ms 65 ms 65 ms cds29.zrh9.msecn.net [94.245.68.160]
As you can see from the tracert output above, all external content is delayed for quite some time.
It is worth noting that the Azure service is set up in North Europe and I am located in Denmark, which is why this route seems a bit... over the top.
Another possible issue is that the web role runs on two extra-small instances; I have not yet found the time to try two small instances, but I know that Microsoft limits extra-small instances to 5 Mbps of WAN bandwidth, whereas small and above get 100 Mbps.
I am just unsure whether this applies to the CDN as well.
Anyway, any help and/or explanation is greatly appreciated.
And let me state that I am very satisfied with the Azure platform; I am just curious about the matters mentioned above.
Update
New tracert output, without the -d option.
Inspired by user728584, I did some research and found this article, http://blogs.msdn.com/b/scicoria/archive/2011/03/11/taking-advantage-of-windows-azure-cdn-and-dynamic-pages-in-asp-net-caching-content-from-hosted-services.aspx, which I will investigate further with regard to public cache-control and the CDN.
This does not explain the excessive hop count, but I hope a skilled network professional can help shed light on the matter.
Rest assured that I will keep you posted on my findings.
Not to state the obvious, but I assume you have set the Cache-Control HTTP header to a large max-age, so that your content was not being evicted from the CDN cache and served from Blob Storage while you ran your tracert tests?
There are quite a few edge servers near you so I would expect it to perform better: 'Windows Azure CDN Node Locations' http://msdn.microsoft.com/en-us/library/windowsazure/gg680302.aspx
Maarten Balliauw has a great article on usage and use cases for the CDN (this might help?): http://acloudyplace.com/2012/04/using-the-windows-azure-content-delivery-network/
Not sure if that helps at all, interesting...
Okay, after I implemented public cache-control headers, the CDN appears to do what is expected: delivering content from some number of nodes in the CDN cluster.
The caveat is that this is based on experience; it has not been measured as concrete validation.
However, this link supports my theory: http://msdn.microsoft.com/en-us/wazplatformtrainingcourse_windowsazurecdn_topic3,
The time-to-live (TTL) setting for a blob controls for how long a CDN edge server returns a copy of the cached resource before requesting a fresh copy from its source in blob storage. Once this period expires, a new request will force the CDN server to retrieve the resource again from the original blob, at which point it will cache it again.
Which was my suspected problem: the CDN-referenced resources kept pulling from the original blob.
Credit must also be given to this link (provided by user728584): http://blogs.msdn.com/b/scicoria/archive/2011/03/11/taking-advantage-of-windows-azure-cdn-and-dynamic-pages-in-asp-net-caching-content-from-hosted-services.aspx.
And the final link for now: http://blogs.msdn.com/b/windowsazure/archive/2011/03/18/best-practices-for-the-windows-azure-content-delivery-network.aspx
For ASP.NET pages, the default behavior is to set cache control to private. In this case, the Windows Azure CDN will not cache this content. To override this behavior, use the Response object to change the default cache control settings.
So my conclusion so far for this little puzzle is that you must pay close attention to your cache-control headers (which are often set to private for obvious reasons). If you skip the web-role approach, the TTL is 72 hours by default, which is why you may never experience what I experienced; it will just work out of the box.
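A quick way to sanity-check this once the headers are in place is to request a CDN asset a couple of times and look at what comes back: if Cache-Control is public with a sensible max-age and the repeat request is fast, the edge is doing its job. A rough sketch (the asset path is a placeholder, and which extra headers the Azure CDN exposes, such as Age or X-Cache, may vary):

# cdn_header_check.py - fetch a CDN asset twice and print caching-related headers.
# The URL path is a placeholder; point it at a real static asset on the CDN host.
import time
import urllib.request

URL = "http://cdn.cuemon.net/some/static/asset.js"

for attempt in (1, 2):
    start = time.monotonic()
    with urllib.request.urlopen(URL) as resp:
        body = resp.read()
        headers = {name: resp.headers.get(name)
                   for name in ("Cache-Control", "Expires", "Age", "X-Cache")}
    elapsed_ms = (time.monotonic() - start) * 1000
    print(f"request {attempt}: {len(body)} bytes in {elapsed_ms:.0f} ms")
    for name, value in headers.items():
        print(f"  {name}: {value}")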
Thanks to user728584 for pointing me in the right direction.
