BIND 9 restart performance with DNSSEC

At the moment I am experimenting with the restart performance of BIND 9 (version 9.16.17). With 26,000 active zones in named.conf, a restart takes roughly 10 seconds. But once I start using DNSSEC, currently with 500 signed zones (and 25,500 unsigned), the restart takes up to 40 seconds! Extrapolating to more than 10,000 signed zones, a restart would take over 12 minutes.
For DNSSEC signing I use "dnssec-policy", mostly in its default configuration.
Is there a way to improve the performance?
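For reference, each signed zone is declared roughly like this (a hedged sketch; the zone name and file path are placeholders):

    zone "example0001.test" {
        type primary;               // "type master;" in older syntax
        file "primary/example0001.test.db";
        dnssec-policy default;      // the built-in default policy
        inline-signing yes;         // needed in 9.16 when the zone is not dynamically updated
    };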

Not directly; this is by design. Letting the primary (formerly called master) authoritative nameserver do the cryptographic number-crunching of DNSSEC signing while concurrently answering network queries by the millions is no way to go.
An Alternative Approach
It is common, however, to set up a hidden master nameserver to take on this long-running, CPU-intensive part of DNSSEC: preparing the zone database by signing each and every record set with the zone's keys.
A hidden master is simply another primary (master) nameserver chained in front of the original primary (master) authoritative nameserver (which still remains a primary, but in public view).
In short, the overall network design places a hidden master in front of the public primary, which in turn serves all the secondaries (slaves).
A New Primary to a Primary
The hidden master then incrementally populates the usual primary authoritative nameserver with pre-signed RRSIG records, so that the zones' data transfers via AXFR/IXFR as each record set becomes DNSSEC-signed and ready.
This ensures that the public-facing primary authoritative nameserver is NOT burdened with intensive CPU load during the DNSSEC re-signing effort.
The primary authoritative nameserver still happily answers all queries in its usual speedy manner.
Furthermore, a compromise of your public primary nameserver still does not expose your original primary zone database files, because the hidden master sits, nice and snug, behind its corporate DMZ firewall.
Setup, in Brief
With ISC BIND 9, the zone database files are relocated to, and now reside on, the hidden master (they no longer reside on the primary authoritative nameserver).
The primary authoritative nameserver now ALSO acts as a hidden secondary to its hidden master, while maintaining the public façade of still being a primary.
The SOA MNAME of each zone still points to the original primary authoritative nameserver, unchanged. This alone breaks many DNS administrators' die-hard notion that the zone database file must reside on its SOA nameserver; not so anymore here.
In addition to the standard settings for a zone of type primary, the hidden master gets additional BIND 9 options such as notify-to-soa yes (to keep notifying the public primary listed in the SOA); allow-transfer, notify explicit, and also-notify are required as well.
The primary authoritative nameserver remains a lightweight, high-performance authoritative server, but is tweaked so that allow-update and allow-notify contain the new hidden master's IP address.
If the hidden master is behind NAT, the primary authoritative nameserver must refer to the hidden master's public-facing NAT address, not its internal corporate IP address behind the NAT masquerade. Also open up a distinctive TCP and UDP port other than 53 on the firewall for the NOTIFY and AXFR/IXFR traffic. Of course, such transfers should require a TSIG key.
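A minimal sketch of both sides, assuming BIND 9.16 syntax; all IP addresses, the transfer port, and the TSIG key are placeholders:

    // named.conf on the hidden master (does the DNSSEC signing)
    key "xfer-key" {                        // generate with: tsig-keygen xfer-key
        algorithm hmac-sha256;
        secret "PLACEHOLDERBASE64SECRET=";
    };
    zone "example.com" {
        type primary;
        file "primary/example.com.db";
        dnssec-policy default;
        notify explicit;                    // notify only the hosts listed below
        notify-to-soa yes;                  // keep notifying the SOA MNAME (the public primary)
        also-notify { 192.0.2.53; };        // the public primary
        allow-transfer { key "xfer-key"; };
    };

    // named.conf on the public-facing primary (really a hidden secondary);
    // it also needs the same key "xfer-key" statement as above
    zone "example.com" {
        type secondary;                     // "type slave;" in older syntax
        file "secondary/example.com.db";
        masters { 198.51.100.10 port 5353 key "xfer-key"; };  // hidden master's public/NAT address
        allow-notify { 198.51.100.10; };
    };

With this in place, the signing CPU load stays on the hidden master, and the public primary only ever receives already-signed zone data over the transfer port.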
References
https://serverfault.com/questions/381920/bind-master-slaves-and-notify
http://www.ipamworldwide.com/ipam/bind-9-8-2.html
https://egbert.net/blog/articles/dns-bind9-hidden-primary-port53-not.html

Related

ActiveMQ Artemis - how does slave become live if there's only one master in the group?

I'm trying to understand a couple of items relating to failover in the ActiveMQ Artemis documentation. Specifically, there is a section (I'm sure I'm reading it wrong) that makes it seem impossible for a slave to take over for a master:
Specifically, the backup will become active when it loses connection to its live server. This can be problematic because this can also happen because of a temporary network problem. In order to address this issue, the backup will try to determine whether it still can connect to the other servers in the cluster. If it can connect to more than half the servers, it will become active, if more than half the servers also disappeared with the live, the backup will wait and try reconnecting with the live. This avoids a split brain situation
If there is only one other server, the master, the slave will not be able to connect to it. Since that is 100% of the other servers, it will stay passive. How can this work?
I did see that a pluggable quorum vote replication could be configured, but before I delve into that, I'd like to know what I'm missing here.
When using replication with only a single primary/backup pair there is no mitigation against split brain. When the backup loses its connection with the primary it will activate since it knows there are no other primary brokers in the cluster. Otherwise it would never activate, as you note.
The documentation should be clarified to remove this ambiguity.
Lastly, the documentation you referenced does not appear to be the most recent; the latest documentation reads slightly differently from what you quoted (although it still contains this ambiguity).
Generally speaking, a single primary/backup pair with replication is only recommended with the new pluggable quorum voting since the risk of split brain is so high otherwise.
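For reference, the classic (pre-pluggable-quorum) primary/backup replication pair under discussion is configured in each broker.xml roughly as sketched below, inside the core element; the element names follow the older master/slave terminology used in the documentation you quoted:

    <!-- broker.xml on the live (master) broker -->
    <ha-policy>
      <replication>
        <master>
          <check-for-live-server>true</check-for-live-server>
        </master>
      </replication>
    </ha-policy>

    <!-- broker.xml on the backup (slave) broker -->
    <ha-policy>
      <replication>
        <slave>
          <allow-failback>true</allow-failback>
        </slave>
      </replication>
    </ha-policy>

With only this pair and no other live brokers in the cluster, the backup activates as soon as it loses the replication connection, with the split-brain risk described above.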

Azure Availability sets, Fault Domains, and Update Domains

I'm an Azure newbie and need some clarification:
When adding machines to an availability set, in order to prevent the VMs from rebooting, what is the best strategy? Put them in:
- different update and fault domains
- the same update domain
- the same fault domain?
My logic is that it's enough to put them in different update AND fault domains.
I used this as a reference: https://blogs.msdn.microsoft.com/plankytronixx/2015/05/01/azure-exam-prep-fault-domains-and-update-domains/
Am I correct?
These update/fault domains are confusing
My logic is that it's enough to put them in different update AND fault domains.
You are right; we should put the VMs in different update and fault domains.
We put them in different update domains so that when the Azure hosts need updating, Microsoft engineers update one update domain at a time, moving on to the next only when the previous one has completed. This way, our VMs do not all reboot at the same time.
We put them in different fault domains so that when unexpected downtime happens, only the VMs in the affected fault domain reboot while the other VMs keep running; this way, the application running on those VMs stays healthy.
In short, adding VMs to an availability set with different update domains and fault domains gets you a higher SLA, but it does not mean a single VM will never reboot.
Hope that helps.
There are three scenarios that can lead to a virtual machine in Azure being impacted:
- unplanned hardware maintenance
- unexpected downtime
- planned maintenance events
Each virtual machine in your availability set is assigned an update domain and a fault domain by the underlying Azure platform. For a given availability set, five non-user-configurable update domains are assigned by default (Resource Manager deployments can then be increased to provide up to 20 update domains) to indicate groups of virtual machines and underlying physical hardware that can be rebooted at the same time. When more than five virtual machines are configured within a single availability set, the sixth virtual machine is placed into the same update domain as the first virtual machine, the seventh in the same update domain as the second virtual machine, and so on. The order of update domains being rebooted may not proceed sequentially during planned maintenance, but only one update domain is rebooted at a time. A rebooted update domain is given 30 minutes to recover before maintenance is initiated on a different update domain.
Fault domains define the group of virtual machines that share a common power source and network switch. By default, the virtual machines configured within your availability set are separated across up to three fault domains for Resource Manager deployments (two fault domains for Classic). While placing your virtual machines into an availability set does not protect your application from operating system or application-specific failures, it does limit the impact of potential physical hardware failures, network outages, or power interruptions.
For more details, refer to the Azure documentation on availability sets.
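For example (a hedged Azure CLI sketch; the resource group, names, and image are placeholders), you spread VMs across fault and update domains simply by creating them in the same availability set:

    # Create the availability set with explicit fault/update domain counts
    az vm availability-set create \
        --resource-group myResourceGroup \
        --name myAvailabilitySet \
        --platform-fault-domain-count 2 \
        --platform-update-domain-count 5

    # Each VM created with --availability-set is automatically assigned
    # an update domain and a fault domain by the platform
    az vm create \
        --resource-group myResourceGroup \
        --name myVM1 \
        --image Ubuntu2204 \
        --availability-set myAvailabilitySet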

Private etcd cluster using tokens

I'm trying to set up an embedded etcd cluster on a bunch of machines that I'm keeping track of (so I always have the list of machine IPs), and there will always be at least one machine with a known static IP that's always in the cluster.
I'd like to set it up so any machine that has a secret token can join this cluster, assuming I bootstrap it with the token and one or more current cluster member IP addresses.
The only options that seem relevant are initial-cluster and initial-cluster-token, but I can't work out where to set up the bootstrap servers.
Or does it make sense to just trust the (or a private) discovery service for this? Does the discovery service automatically scale to members beyond the initial size, and does it work well for long-lived clusters that have a lot of servers leaving and joining over months or years?
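For reference, this is roughly how the static bootstrap flags mentioned above are used when starting etcd (names, IPs, and the token value are placeholders); the embedded server's configuration exposes equivalent settings:

    # First start of a member that is part of the initial static cluster
    etcd --name node2 \
         --listen-peer-urls http://10.0.0.2:2380 \
         --initial-advertise-peer-urls http://10.0.0.2:2380 \
         --listen-client-urls http://10.0.0.2:2379 \
         --advertise-client-urls http://10.0.0.2:2379 \
         --initial-cluster "node1=http://10.0.0.1:2380,node2=http://10.0.0.2:2380" \
         --initial-cluster-state new \
         --initial-cluster-token my-cluster-token
    # A member joining an already-running cluster (after "etcdctl member add")
    # uses the same flags but with --initial-cluster-state existing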

How to slow down WWW on nameserver level?

For scientific purposes I would like to know how to slow down a WWW server at the DNS level.
Is it possible via the TTL setting?
Thank you,
Ralph
It should not be possible to slow down the speed of a website (http) solely by modifying the DNS response.
However, you could easily slow down the initial page load time via DNS by configuring the DNS server to take an abnormally long time before returning its results. The problem is that this will really only affect the initial load of the website, because after that, web browsers, computers, and ISPs will cache the results.
The TTL you spoke of only affects how long the DNS result should be cached, which generally has minimal effect on the speed of the website. That said, it would theoretically be possible to set the DNS TTL to a value close to 0, requiring the client to re-look up the IP via DNS on nearly every page load. This would make nearly every new page from the website load very slowly.
However, the problem with this approach is that in the real world, vendors and ISPs often don't follow the rules exactly. There are numerous ISPs and even some consumer devices that don't honor low TTL values in DNS replies and will cache the DNS result for a decent period of time regardless of what the DNS server asked for.
So, from my experience lowering TTLs to very low values while transferring services to new IPs, and seeing ridiculously long caching times regardless, I would say that while an attack like this may work, it would depend hugely on which DNS resolver each victim is using, and in most cases would add close to no delay after the initial page load.
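To illustrate the TTL part, a zone with a deliberately tiny TTL would look something like this (a sketch; example.com and the address are placeholders):

    ; zone file for example.com with a 5-second TTL
    $TTL 5
    @    IN SOA ns1.example.com. hostmaster.example.com. (
              2024010101   ; serial
              3600         ; refresh
              600          ; retry
              604800       ; expire
              5 )          ; negative-caching TTL
         IN NS   ns1.example.com.
    www  IN A    203.0.113.10   ; clients that honor the TTL must re-resolve every 5 seconds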

Common Issues in Developing Cluster Aware non-web-based Enterprise Applications

I have to move a Windows-based multi-threaded application (which uses global variables as well as an RDBMS for storage) to an NLB (i.e., network load balancer) cluster. The common architectural issues that immediately come to mind are:
Global variables (which are both read and written) will have to be moved to shared storage. What are the best practices here? Is there anything available in the Windows Clustering API to manage such things?
My application uses sockets, and persistent connections are the norm in the field I work in. I believe persistent connections cannot be load balanced. Again, what are the architectural recommendations in this regard?
I'll answer the persistent connection part of the question first since it's easier. All good network load-balancing solutions (including Microsoft's NLB service built into Windows Server, but also including load balancing devices like F5 BigIP) have the ability to "stick" individual connections from clients to particular cluster nodes for the duration of the connection. In Microsoft's NLB this is called "Single Affinity", while other load balancers call it "Sticky Sessions". Sometimes there are caveats (for example, Microsoft's NLB will break connections if a new member is added to the cluster, although a single connection is never moved from one host to another).
Re: global variables, they are the bane of load-balanced systems. Most designers of load-balanced apps will do a lot of re-architecture to minimize dependence on shared state, since it impedes the scalability and availability of a load-balanced application. Most of these approaches come down to a two-step strategy: first, move shared state to a highly-available location, and second, change the app to minimize the number of times that shared state must be accessed.
Most clustered apps I've seen will store shared state (even shared, volatile state like global variables) in an RDBMS. This is mostly out of convenience. You can also use an in-memory database for maximum performance. But the simplicity of using an RDBMS for all shared state (transient and durable), plus the use of existing database tools for high-availability, tends to work out for many services. Perf of an RDBMS is of course orders of magnitude slower than global variables in memory, but if shared state is small you'll be reading out of the RDBMS's cache anyways, and if you're making a network hop to read/write the data the difference is relatively less. You can also make a big difference by optimizing your database schema for fast reading/writing, for example by removing unneeded indexes and using NOLOCK for all read queries where exact, up-to-the-millisecond accuracy is not required.
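For example, a dirty read of small shared state in T-SQL might look like this (the table and column names are hypothetical):

    -- Read a shared counter without taking shared locks; acceptable when
    -- up-to-the-millisecond accuracy is not required
    SELECT CounterValue
    FROM dbo.SharedState WITH (NOLOCK)
    WHERE CounterName = 'ActiveSessions';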
I'm not saying an RDBMS will always be the best solution for shared state, only that improving shared-state access times are usually not the way that load-balanced apps get their performance-- instead, they get performance by removing the need to synchronously access (and, especially, write to) shared state on every request. That's the second thing I noted above: changing your app to reduce dependence on shared state.
For example, for simple "counters" and similar metrics, apps will often queue up their updates and have a single thread in charge of updating shared state asynchronously from the queue.
For more complex cases, apps may switch from Pessimistic Concurrency (checking that a resource is available beforehand) to Optimistic Concurrency (assuming it's available, and then backing out the work later if you ended up, for example, selling the same item to two different clients!).
Net-net, in load-balanced situations, brute-force solutions often don't work as well as thinking creatively about your dependency on shared state and coming up with inventive ways to avoid waiting on synchronous reads of, or writes to, shared state on every request.
I would not bother with using MSCS (Microsoft Cluster Service) in your scenario. MSCS is a failover solution, meaning it's good at keeping a one-server app highly available even if one of the cluster nodes goes down, but you won't get the scalability and simplicity you'll get from a true load-balanced service. I suspect MSCS does have ways to share state (on a shared disk) but they require setting up an MSCS cluster which involves setting up failover, using a shared disk, and other complexity which isn't appropriate for most load-balanced apps. You're better off using a database or a specialized in-memory solution to store your shared state.
Regarding persistent connections, look into the port rules, because port rules determine which TCP/IP ports are handled and how.
MSDN:
When a port rule uses multiple-host load balancing, one of three client affinity modes is selected. When no client affinity mode is selected, Network Load Balancing load-balances client traffic from one IP address and different source ports on multiple cluster hosts. This maximizes the granularity of load balancing and minimizes response time to clients. To assist in managing client sessions, the default single-client affinity mode load-balances all network traffic from a given client's IP address on a single cluster host. The Class C affinity mode further constrains this to load-balance all client traffic from a single Class C address space.
In an ASP.NET app, what allows session state to be persistent is enabling the client affinity parameter: NLB then directs all TCP connections from one client IP address to the same cluster host, which allows session state to be maintained in host memory.
The client affinity parameter makes sure that a connection is always routed to the server it initially landed on, thereby maintaining application state.
Therefore I believe the same would happen for your Windows-based multi-threaded app if you utilize the affinity parameter.
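For instance, with the NLB PowerShell module you would add a port rule with single affinity roughly like this (a sketch; the interface name and ports are placeholders):

    # Pin each client IP to one cluster host for the duration of its
    # connections (Single affinity, i.e. "sticky" behavior)
    Add-NlbClusterPortRule -InterfaceName "NLB-NIC" `
        -StartPort 443 -EndPort 443 -Protocol Tcp `
        -Affinity Single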
Network Load Balancing Best Practices and Web Farming with the Network Load Balancing Service in Windows Server 2003 might give you some insight.
- Concurrency (check out Apache Cassandra, et al.)
- Speed-of-light issues (if going cross-country or international, you'll want heavy use of transactions)
- Backups and deduplication (companies like FalconStor or EMC can help here in a distributed system; I wouldn't underestimate the need for consulting here)
