How reliable is DNS TTL for server switch over? (DNS TTL overriding) - caching

How reliable is DNS as a mechanism to switch between servers? With low TTLs, testing seems great, but I was wondering how reliable this would be for a public, production system?
My concern with this strategy is that I'm not sure whether DNS proxies and caching resolvers can override record TTLs, and some providers may do this to save traffic. And what about integrations with other systems (e.g. a web service backing a mobile app)?
While I understand how RFC-compliant DNS is meant to work, I'm not actually sure how compliant networks are in general. (DNS round-robin works great for distribution too, but this question is specifically about switch-overs.)

It's "reliable enough" for most of your audience.
Yes, recursive DNS resolvers do allow administrative overrides of TTLs (e.g. cache-min-ttl), which will override yours if you set it too low. There are also software stacks that cache records forever in their default configuration (Java < 1.6).
You should always be prepared for some residual traffic to the old host even long after the switchover. In my experience, though, it's mostly poorly written crawlers. If you want to be 100% certain of not missing any traffic, proxy all traffic from the old host to the new one. Nginx or Apache can easily be made to do that.
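For example, a minimal Nginx sketch for the old host that forwards everything to the new one (the hostname new.example.com is a placeholder, not from the original question):

# On the old host: forward any residual traffic to the new host
server {
    listen 80;
    server_name _;
    location / {
        proxy_pass http://new.example.com;        # placeholder for the new host
        proxy_set_header Host $host;              # preserve the original Host header
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}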
You can query a recursive server for the remaining time it will cache a record. Run this repeatedly, and you'll see the TTL decrease over time:
dig @208.67.222.222 stackoverflow.com
Once the record expires, it should start at the TTL that you configured in your zone.
This way, you can at least test against the public resolvers to see whether they are obeying your TTL.
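If you only want to watch the countdown, something like this works (same OpenDNS resolver as above):

# The second field of each answer line is the remaining TTL in seconds
watch -n 1 'dig +noall +answer @208.67.222.222 stackoverflow.com'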

Can any caching DNS servers refresh their cache asynchronously?

We run a latency-sensitive system. We found one significant cause of latency: some processes were making blocking DNS lookups to remote nameservers. To mitigate this, we have installed a local caching DNS resolver, specifically dnsmasq.
But we still see occasional significant pauses where queries to the local DNS cache (dnsmasq) can take a long time. These are caused by TTL expiry; in these cases dnsmasq queries its upstream server before responding to the local process.
We would like to eliminate these pauses, too. I would like our local DNS cache to always respond immediately, even if the response is stale. The cache should query its upstream server asynchronously. For example, if the cache serves a stale response, it could refresh this asynchronously. Or a more sophisticated policy would be to refresh the cache asynchronously shortly before the TTL expires.
But I can't find any such setting for dnsmasq, or for any other caching DNS servers I've looked at. Are any DNS servers designed to run in this configuration?
Knot Resolver with the configuration modules = { 'predict' } will asynchronously refresh records that are put into an answer when their TTL is close to expiring.
Note that version 2.0.0 has a bug that defeats this refresh for records without DNSSEC signatures (it will be fixed in the next release).
The Unbound DNS server also does this with its prefetch option (yes/no).
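For reference, a minimal unbound.conf sketch with prefetching enabled (values are illustrative, not tuned recommendations):

# unbound.conf (sketch)
server:
    interface: 127.0.0.1
    # refresh popular cache entries before they expire
    # (triggered when a query arrives with <10% of the original TTL left)
    prefetch: yes
    # optionally also pre-fetch DNSKEYs used for DNSSEC validation
    prefetch-key: yes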

How to reduce the SSL time of a website

I have an HTTPS website and I want to reduce the SSL time of this website. The SSL certificate has been installed on AWS ELB.
If I access the site from the Netherlands, the SSL time is high, but if I access the same site from other countries the SSL time is low. Why is that?
I am basically trying to minimize the SSL time shown on this page:
http://tools.pingdom.com/fpt/#!/ed9oYJ/https://www.google.com/index.html
Many things influence the SSL time, including:
Infrastructure (this won't affect just SSL but ALL network traffic):
Standard network issues (how far away your server is from the client, how fast the network in between is, etc.), since the SSL/TLS handshake takes several round trips. You have little control over these except changing hosting provider and/or using a CDN. AWS is, in my experience, fast, and you are only asking to improve SSL rather than general access times, so maybe skip this one for now.
Server response time. Is the server underpowered in CPU, RAM or disk? Are you sharing this host? Again a general issue, so maybe skip past this, but SSL/TLS does take some processing power, though with modern servers it is barely noticeable nowadays.
Server OS. Newer is better. If you are running Red Hat Linux 4, for example, expect it to be considerably slower than the latest Red Hat Linux 7, which has an improved networking stack and newer versions of key software like OpenSSL.
SSL setup (run your site through https://www.ssllabs.com/ssltest and you should get a good picture of its state of health):
Ciphers used. There are older, slower ciphers and faster, newer ones. This can get complicated really quickly, but generally you should be looking for ECDHE ciphers for most clients (preferably ECDHE...GCM ones), and you should specify that the server's cipher order be used so you get to pick the cipher rather than the client (see the config sketch after this list).
Certificate used. You'll want an RSA 2048-bit cert. Anything more is overkill and slow. Some sites (and some scanning tools) choose RSA 4096 certificates, but these have a noticeable impact on speed with no real increase in security (at this time - that may change). There are newer ECDSA certs (usually shown as a 256-bit EC cert in the ssllabs report), but these faster ECDSA certs are not supplied by all CAs and are not universally supported by all clients, so visitors on older hardware and software may not be able to connect with them. Apache (and, very recently, Nginx from v1.11.0) supports dual certs to get the best of both worlds, but at the expense of having two certs and some complexity in setting them up.
Certificate chain. You'll want a short certificate chain (ideally three certs long: your server's cert, an intermediate and the CA's root certificate). Your server should return everything but the last cert (which is already in the browser's certificate store). If any of the chain is missing, some browsers will attempt to look up the missing ones, but this takes time.
A reliable cert provider. As well as shorter cert chains and better OCSP responders, their intermediate certs are usually already cached in users' browsers, as they are likely to have been used by other sites.
OCSP Stapling saves a network trip to check the cert is valid using OCSP or CRL. Turning it on won't make a difference for Chrome, as it mostly doesn't check for revocation (EV certificates do get checked). It can make a noticeable difference for IE, so it should be turned on if your server supports it, but do be aware of some implementation issues - particularly that nginx's first request after a restart always fails when OCSP Stapling is turned on.
TLSv1.2 should be used, and possibly TLSv1.0 for older clients, but no SSLv2 or SSLv3. TLSv1.1 is kind of pointless (pretty much everyone that supports it also supports the newer and better TLSv1.2). TLSv1.3 is currently being worked on and has some good performance improvements, but it has not been fully standardised yet and there are some known compatibility issues. Hopefully these will be resolved soon so it can be used. Note that PCI compliance (if you take credit cards on your site) demands TLSv1.2 or above on new sites, and on all sites by 30th June 2018.
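As a rough illustration, here is what several of the points above can look like in an Nginx sketch (the cipher list is only an example, not a recommendation; re-test with ssllabs after any change):

# nginx sketch: protocols, server cipher order and OCSP stapling
ssl_protocols TLSv1 TLSv1.2;            # no SSLv2/SSLv3
ssl_prefer_server_ciphers on;           # the server picks the cipher
ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:!aNULL:!MD5';
ssl_stapling on;                        # staple the OCSP response
ssl_stapling_verify on;
resolver 8.8.8.8 valid=300s;            # nginx needs a resolver to reach the OCSP responder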
Repeat visits - while the above will help with the initial connection, most sites require several resources to be downloaded, and with a bad setup they can have to go through the whole handshake each time (this should be obvious if you're seeing repeated SSL connection setups for each request when running tools like webpagetest.org):
HTTP keep-alives should be turned on so the connection is not dropped after each HTTP request (this should be the default for HTTP/1.1 implementations).
SSL session caching and tickets should be on, in my opinion. Some disagree for some obscure security reasons that should be fixed in TLSv1.3, but for performance reasons they should be on. Sites with highly sensitive information may choose the more complete security over performance, but in my opinion the security issues are quite complex to exploit, and the performance gain is noticeable (see the sketch after this list).
HTTP/2 should be considered, as it only opens one connection (and hence does only one SSL/TLS setup) and has other performance improvements.
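And the repeat-visit settings, again as an Nginx sketch (sizes and timeouts are illustrative; these lines go in the server block):

# nginx sketch: reuse connections and TLS sessions across requests
listen 443 ssl http2;                   # one connection (and one handshake) per client
keepalive_timeout 65;                   # HTTP keep-alive
ssl_session_cache shared:SSL:10m;       # server-side session cache
ssl_session_timeout 10m;
ssl_session_tickets on;                 # session tickets for clients that support them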
I would really need to know your site to see which of the above (if any) can be improved. If you're not willing to give that, then I suggest you run the ssllabs test and ask for help with anything it raises that you don't understand, as it can require a lot of detailed knowledge to interpret.
I run a personal blog explaining some of these concepts in more detail if that helps: https://www.tunetheweb.com/security/https/
You can try ECDSA certificates: https://scotthelme.co.uk/ecdsa-certificates/
But the cost of HTTPS is only visible on the first request: session tickets avoid that cost for all subsequent requests. Are they activated? (You can check with ssllabs.com.)
If you can, you should use SPDY or HTTP/2; it may improve the speed too.
ECDSA keys, SPDY and HTTP/2 reduce the number of round trips necessary, so they should reduce the difference between the two locations.
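Besides ssllabs.com, you can check session resumption locally with openssl (the hostname is a placeholder):

# -reconnect performs the handshake, then reconnects 5 times trying to reuse the session;
# look for "Reused" rather than "New" in the summary lines
openssl s_client -connect www.example.com:443 -reconnect < /dev/null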
You say that you're not using a CDN, but I believe you should be. Here's why:
Connecting via TLS/SSL involves handshaking a secure connection, and that requires extra communication between the client and server before any data can begin flowing. This link has a handy diagram of the SSL handshake, and this link explains the first few milliseconds of an HTTPS connection.
Jordan Sissel wrote about his experiences with SSL handshake latency:
I started investigating the latency differences for similar requests between HTTP and HTTPS.
...
It's all in the handshake.
...
The point is, no matter how fast your SSL accelerators (hardware loadbalancer, etc), if your SSL end points aren't near the user, then your first connect will be slow.
If you use a CDN, then the handshaking can be done between the client and the nearest edge location, dramatically improving the latency.
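To see how much of your total time is spent on TLS setup from a given location, curl's timing variables give a quick measurement (the URL is a placeholder):

# time_connect = TCP handshake finished, time_appconnect = TLS handshake finished
curl -o /dev/null -s -w 'dns=%{time_namelookup} tcp=%{time_connect} tls=%{time_appconnect} total=%{time_total}\n' https://www.example.com/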

High-performance options for remote service access

I have a service, foo, running on machine A. I need to access that service from machine B. One way is to launch a web server on A and do it via HTTP; code running under the web server on A accesses foo and returns the results. Another is to write a socket server on A; the socket server accesses service foo and returns the result.
HTTP connection initiation and handshaking are expensive; a raw socket server could be written, but I want to avoid that. What other options are available for high-performance remote calls?
HTTP is just the protocol over the socket. If you are using TCP/IP networks, you are going to be using a socket. The HTTP connection initiation and handshake are not the expensive bits; it's TCP connection initiation that's really expensive.
If you use HTTP/1.1, you can use persistent connections (Keep-Alive), which drastically reduces this cost, bringing it close to that of keeping a persistent socket open.
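As a quick illustration of what persistent connections buy you, curl reuses a single connection when given several URLs in one invocation (the host is a placeholder):

# The second request should log "Re-using existing connection" instead of a new TCP/TLS setup
curl -sv -o /dev/null -o /dev/null https://api.example.com/foo https://api.example.com/bar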
It all depends on whether you want/need the higher-level protocol. Using HTTP means you will be able to consume this service from many more clients while writing much less documentation (if you write your own protocol, you will have to document it). HTTP servers also support things like authentication, cookies and logging out of the box. If you don't need these sorts of capabilities, then HTTP might be a waste, but I've seen few projects that don't need at least some of them.
Adding to @Rob's answer: since the question doesn't point to a specific application or precise performance boundaries, it is worth looking at the options available in the broader context of inter-process communication.
The Wikipedia page cleanly lists the options available and would be a good place to start.
What technology are you going to use? Let me answer for Java world.
If your request rate is below 100/sec, you should not care about optimizations and should use the most versatile solution: HTTP.
A well-written asynchronous server like Netty's HTTP stack can easily handle 1000 requests per second on mid-range hardware.
If you need more, or have constrained resources, you can go to a binary format. The most popular one out there is Google Protobuf (multi-language) + Netty (Java).
What you should know about HTTP performance:
HTTP can use Keep-Alive, which removes the reconnection cost for every request.
HTTP adds traffic overhead to every request and response - around 50-100 bytes of headers.
The HTTP client and server consume additional CPU parsing HTTP headers - this becomes noticeable beyond the above-mentioned 100 req/sec.
Be careful when selecting technology. Even in the 21st century it is hard to find a well-written HTTP server and client.

tcp_tw_recycle behind application level load balancer?

Given that our Linux servers never open direct connections to our clients, is it safe to use tcp_tw_recycle on them?
Those servers are behind an application-level load balancer and all the connections I see on them are between internal 10.x.x.x addresses.
Thanks
We have such a load balancer provided by AWS (ELB), so I'll provide my advice based on that:
Why gamble? If your overhead/port-consumption is coming from quick client connections, Amazon recommends enabling persistent connections on your ELB instead. (I asked them about this question specifically and got that recommendation...our Amazon contact does not recommend enabling tcp_tw_recycle).
That said, if, say, it's another internal box they're struggling to establish rapid connections with (Apache/PHP chatting with MySQL on behalf of the client without persistent connections), you might be able to get away with it:
If ALL client connections will be via the ELB (please set your security group accordingly), then technically speaking you shouldn't encounter problems for the tcp_tw_recycle timestamp jumping cases I'm aware of:
ELB is a termination point on behalf of the client (their NAT firewall won't factor in, and ELB is not NAT based)
The ELB box(es) will not reset themselves, acquire the same IP address, and still be assigned as your ELB (will be someone else's if it happens at all)
The ELB box(es) will not be replaced by another ELB machine using the same IP and still be serving your traffic as your ELB (will be someone else's if it happens at all)
(2 and 3 are not a guarantee from Amazon, but it does appear to be their behavior, just as stop/start will get you a new private IP for EC2 boxes.) If that did happen, I'd imagine it is a thing of extremely low probability.
You could theoretically run into issues restarting your own boxes if they communicate with other service machines (like MySQL or memcached) and you restart (not stop/start) one of your boxes, or move their elastic IP to another box and are not using private IPs for internal chatter. But you have some control over this. However, if it's all on the AWS cloud (or your fast internal network), issues are extremely unlikely (unless your AWS zone is having a bad day, and you're restarting/replacing your systems for that reason).
A buddy and I had a long-standing argument about this, and he won by proving his point with a long running 4k browser (fast script) load test via Neustar...there were no connection issues from the client side via ELB, and eliminating the overhead helped quite a bit :-)
If you haven't already, consider tcp_tw_reuse (we were using this to keep the ephemeral port range active before the above-mentioned test showed the additional merit of eliminating the overhead with tcp_tw_recycle for us). Be sure to watch your counters on ifconfig if you do decide to disable that chunk of the protocol ;-P
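A minimal way to inspect these knobs and the TIME_WAIT backlog on a typical Linux box (a sketch; note that tcp_tw_recycle was removed entirely in newer kernels, 4.12+):

# current settings (0 = off, 1 = on)
sysctl net.ipv4.tcp_tw_reuse net.ipv4.tcp_tw_recycle
# how many sockets are currently stuck in TIME_WAIT (subtract 1 for the header line)
ss -tan state time-wait | wc -l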
The following is also a good summary resource on the topic of timestamps jumping: Dropping of connections with tcp_tw_recycle

Understanding HTTPS connection setup overhead

I'm building a web-based chat app which will need to make an AJAX request for every message sent or received. I'd like the data to be encrypted and am leaning towards running AJAX (with long-polling) over HTTPS.
However, since the frequency of requests here is a lot higher than with basic web browsing, I'd like to get a better understanding of the overhead (network usage, time, server CPU, client CPU) in setting up the encrypted connection for each HTTPS request.
Aside from any general info/advice, I'm curious about:
As a very rough approximation, how much extra time does an HTTPS request take compared to HTTP? Assume content length of 1 byte and an average PC.
Will every AJAX request after the first have anything significant cached, allowing it to establish the connection quicker? If so, how much quicker?
Thank you in advance :-)
Everything in HTTPS is slower. Personal information shouldn't be cached, you have encryption on both ends, and an SSL handshake is relatively slow.
Long-polling will help. Long keep-alives are good. Enabling SSL sessions on your server will avoid a lot of the overhead as well.
The real trick is going to be doing load-balancing or any sort of legitimate caching. Not sure how much that will come into play in your system, being a chat server, but it's something to consider.
You'll get more information from this article.
Most of the overhead is in the handshake (exchanging certificates, checking for their revocation, ...). Session resumption and the recent False Start extension help in that respect.
In my experience, the worst-case scenario happens when using client-certificate authentication and advertising too many CAs (the CertificateRequest message sent by the server can even become too big); this is quite rare, since in practice, when you use client-certificate authentication, you would only accept client certificates from a limited number of CAs.
If you configure your server properly (for resources for which it's appropriate), you can also enable browser caching for resources served over HTTPS, using Cache-Control: public.
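For example, an Nginx sketch for static assets served over HTTPS (the path and lifetime are illustrative):

# allow browsers (and shared caches) to cache these HTTPS responses
location /static/ {
    add_header Cache-Control "public, max-age=86400";
}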
