Detecting dead applications while the server is alive in NLB - Windows

Windows NLB works great and removes a computer from the cluster when the computer is dead.
But what happens if the application dies while the server still works fine? How have you solved this issue?
Thanks

By not using NLB.
Hardware load balancers often have configurable "probe" functions to determine if a server is responding to requests. This can be by accessing the real application port/URL, or some specific "healthcheck" URL that returns only if the application is healthy.
Other options look at the queue depth or the time taken to respond to requests.
Cisco put it like this:
The Cisco CSM continually monitors server and application availability using a variety of probes, in-band health monitoring, return code checking, and the Dynamic Feedback Protocol (DFP). When a real server or gateway failure occurs, the Cisco CSM redirects traffic to a different location. Servers are added and removed without disrupting service—systems easily are scaled up or down.
(from here: http://www.cisco.com/en/US/products/hw/modules/ps2706/products_data_sheet09186a00800887f3.html#wp1002630)
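If you roll your own probe (from a monitoring box or a software balancer), the check itself can be tiny. A minimal sketch in Python, assuming a hypothetical /healthcheck URL that returns the body "OK" only when the application is healthy:

    import urllib.request

    # Hypothetical probe: the /healthcheck path and the expected "OK" body
    # are assumptions; a real balancer runs an equivalent check on a schedule.
    def probe(host: str, port: int = 80, path: str = "/healthcheck",
              timeout: float = 2.0) -> bool:
        """Return True only if the application answers its health URL correctly."""
        try:
            with urllib.request.urlopen(
                    f"http://{host}:{port}{path}", timeout=timeout) as resp:
                return resp.status == 200 and resp.read().strip() == b"OK"
        except OSError:
            # Connection refused, timeout, DNS failure: treat the app as down.
            return False

    if __name__ == "__main__":
        print(probe("10.0.0.11"))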

Presumably with Windows NLB there is some way to programmatically set the weight of nodes? The nodes should self-monitor and if there is some problem (e.g. a particular node is low on disc space), set its weight to zero so it receives no further traffic.
However, this needs to be carefully engineered and have further human monitoring to ensure that you don't end up with a situation where one fault causes the entire cluster to announce itself down.
You can't really hope to deal with a "Byzantine generals" situation in network load balancing; an appropriately broken node may think it's fine and appear fine, yet be completely unable to do any actual work. The trick is to try to minimise the possibility of these situations happening in production.

There are multiple levels of health check for a network application.
1. is the server machine up?
2. is the application (service) running?
3. is the service accepting network connections?
4. does the service respond appropriately to an "are you ok" request?
5. does the service perform real work? (this will also check back-end systems behind the service you are probing)
My experience with NLB may be incomplete, but I'll describe what I know. NLB can do 1 and 2. With custom coding you can add the other levels with varying difficulty. With some network architectures this can be very difficult.
Most hardware load balancers from vendors like Cisco or F5 can be easily configured to do 3 or 4. Level 5 testing still requires custom coding.
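To make levels 3 through 5 concrete, here is a rough Python sketch; the host, port, and URLs are made-up stand-ins for whatever your service actually exposes:

    import socket
    import urllib.request

    HOST, PORT = "10.0.0.11", 8080  # hypothetical service address

    def level3_accepts_connections() -> bool:
        """Level 3: will the service accept a TCP connection at all?"""
        try:
            with socket.create_connection((HOST, PORT), timeout=2):
                return True
        except OSError:
            return False

    def level4_says_ok() -> bool:
        """Level 4: does it answer a cheap 'are you ok' request?"""
        try:
            with urllib.request.urlopen(f"http://{HOST}:{PORT}/ping", timeout=2) as r:
                return r.status == 200
        except OSError:
            return False

    def level5_does_real_work() -> bool:
        """Level 5: exercise a real code path, which also touches the back-end
        systems (database, downstream services) behind the service probed."""
        try:
            with urllib.request.urlopen(
                    f"http://{HOST}:{PORT}/orders/healthprobe", timeout=5) as r:
                return r.status == 200
        except OSError:
            return False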

We start in the situation where all nodes are part of the cluster but inactive.
We run a custom service monitor which makes a request on the service locally via the external interface. If the response was successful we start the node (allow it to start handling NLB traffic). If the response failed we stop the node from receiving traffic.
All the intermediate steps described by Darron are irrelevant. Whether it worked or not is the only thing we care about. If the machine is inaccessible, the rest of the NLB cluster will treat it as failed anyway.
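A minimal sketch of such a monitor in Python, assuming the wlbs.exe control utility that ships with Windows NLB (the exact commands and the healthcheck URL are assumptions to verify against your setup):

    import subprocess
    import time
    import urllib.request

    SELF_URL = "http://192.0.2.10/healthcheck"  # hypothetical external-interface URL

    def service_ok() -> bool:
        try:
            with urllib.request.urlopen(SELF_URL, timeout=3) as r:
                return r.status == 200
        except OSError:
            return False

    def set_nlb_membership(active: bool) -> None:
        # wlbs.exe ships with Windows NLB; "start" rejoins the cluster,
        # "drainstop" finishes existing connections, then takes no new ones.
        subprocess.run(["wlbs", "start" if active else "drainstop"], check=False)

    if __name__ == "__main__":
        active = None
        while True:
            ok = service_ok()
            if ok != active:            # only act on state changes
                set_nlb_membership(ok)
                active = ok
            time.sleep(5)

Draining rather than stopping matters here: existing connections are allowed to finish while new ones go to the other nodes.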

Related

Load balancer and WebSockets

Our infrastructure is composed of:
1 F5 load balancer
3 nodes
We have an application which uses websockets, so when a user visits our site, the browser opens a websocket to the balancer, which connects it to the first available node, and it works as expected.
Our troubles arrive with maintenance tasks: when we have to update our software, we need to take one node offline at a time, deploy the new release, and then turn it on again. During this task the balancer drops the open websocket connections to the node, and the client retries to connect after a few seconds to the first available node, creating an inconvenience for the client because he could miss a signal (or more).
How can we keep the connection between the client and the balancer while changing the backend websocket server? Is the load balancer enough to achieve our goal, or do we need to change our infrastructure?
To avoid this kind of problem I recommend reading about Azure SignalR. With it you don't need to think about things like the load balancer, a Redis backplane, and other infrastructure that you might otherwise need for a WebSockets connection.
Basically the clients will not connect to your node directly but will be redirected to Azure SignalR. You can read more about it here: https://learn.microsoft.com/en-us/azure/azure-signalr/signalr-overview
Since it is important for your application to maintain the connection, I don't see any other way to achieve zero connection drops to your nodes, given that you need to shut them down.
It's important to understand that the F5 is a full TCP proxy. This means that the F5 is the server to the client and the client to the server. If you are using the websockets protocol then you must apply a websockets profile to the F5 Virtual Server in order for the websockets application to be handled properly by the Load Balancer.
Details of the websockets profile can be found here: https://support.f5.com/csp/article/K14754
If both a websockets and an HTTP profile are applied to the Virtual Server - meaning that you have websockets and web traffic using the same port and LB nodes - then the F5 will treat the websockets traffic as passthrough. Also keep in mind that if this is an HTTPS virtual server, you will need to ensure a client-side and a server-side HTTPS profile (SSL offload) are applied to the Virtual Server.
While there are a variety of ways that you can fiddle with load balancers to minimize the downtime caused by a software upgrade, none of them solve the problem, which is that your application-layer protocol seems to not tolerate some small network outages.
Even if you have a perfect load balancer and your software deploys cause zero downtime, the customer's computer may be on flaky wifi which causes a network dropout for half a second - or going over ethernet and someone reconfigures some routing on their LAN, etc.
I'd suggest having your server maintain a queue of messages for clients (up to some size/time limit) so that when a client drops a connection - whether it be due to load balancers/upgrades - or any other reason, it can continue without disruption.
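A rough sketch of that idea in Python (the buffer size, sequence numbering, and replay request are design choices to adapt, not a prescribed protocol):

    from collections import deque

    class ClientBuffer:
        """Keep the last N messages per client, numbered, so a client that
        reconnects can ask for 'everything after sequence X'."""

        def __init__(self, max_messages: int = 1000):
            self._buf: deque[tuple[int, str]] = deque(maxlen=max_messages)
            self._seq = 0

        def push(self, message: str) -> int:
            self._seq += 1
            self._buf.append((self._seq, message))
            return self._seq

        def replay_after(self, last_seen: int) -> list[str]:
            """Messages the client missed while its websocket was down."""
            return [m for seq, m in self._buf if seq > last_seen]

    # Usage: on reconnect the client sends the last sequence number it saw,
    # and the server replays the gap before resuming live traffic.
    buf = ClientBuffer()
    buf.push("signal-1"); buf.push("signal-2"); buf.push("signal-3")
    print(buf.replay_after(1))   # ['signal-2', 'signal-3']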

Communicate to stateless web Api service from a different application in Azure Service Fabric

I have two different Service Fabric applications. Both are stateless web API models. I have a situation where, from service 1 inside application 1, I need to invoke service 2, which is part of application 2. I am deploying both applications in the same cluster. Can someone advise on the best practice here? What would be the best way to communicate? Please provide a sample as well.
Fabric Transport (aka Service Remoting) is the SDK's built-in communication model. Compared to communication over HTTP or WCF it does a little more, especially on the client side of the communication.
When it comes to communicating with Service Fabric services (or really, any distributed system's services), your communication should take into account that the connection could fail to be established on an initial try, or be interrupted mid-communication, and you really shouldn't build your solution to expect it to always work flawlessly. The reason for this lies in the nature of Service Fabric: at any time it can decide to move primaries from one node to another, the nodes themselves can go down, and the services can crash. Nothing strange about that; the great thing with Service Fabric is that it does a lot of the heavy lifting for you when it comes to maintaining your services and nodes over time.
So, in terms of communication, this means that a client needs to be able to do three things (for it to truly work in a distributed environment):
resolve the address to the service (figure out which node it is on, which port it is listening on, which partition id and replica to target and so on)
connect to the service, package and send requests, and then receive and unpack responses
retry the resolve and connect if the communication fails
Fabric Transport does all this when you are using the Service Remoting clients (like ServiceProxy) and service side listeners.
That's the good part with Fabric Transport: you get all that out of the box, and most of the time you don't have to change the default setup either. The bad part is that it only works for communication inside the cluster, i.e. you cannot communicate from outside the cluster to a service running inside it using Fabric Transport. For that you need HTTP or WCF.
HTTP(s) and WCF (over HTTP(s)) communication allow you to build your own clients and handle the communication yourself. There are a number of samples showing how you can do the resolve, connect and retry for HTTP clients, this one for instance.
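The shape of that resolve/connect/retry loop, sketched in Python purely for illustration (the real Service Fabric clients are .NET; resolve_endpoint here is a hypothetical stand-in for the naming-service lookup):

    import time
    import urllib.request

    def resolve_endpoint(service_name: str) -> str:
        """Placeholder for the resolve step: in Service Fabric this would ask
        the naming service which node and port currently host the service."""
        return "http://10.0.0.5:8080"  # hypothetical resolved address

    def call_with_retry(service_name: str, path: str, retries: int = 3) -> bytes:
        last_error = None
        for attempt in range(retries):
            address = resolve_endpoint(service_name)  # re-resolve every attempt:
            try:                                      # the service may have moved
                with urllib.request.urlopen(address + path, timeout=5) as r:
                    return r.read()
            except OSError as e:
                last_error = e
                time.sleep(2 ** attempt)              # simple exponential backoff
        raise ConnectionError(
            f"{service_name} unreachable after {retries} attempts") from last_error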
According to Microsoft there are three built-in communication options. It's up to you to decide which one works best for you. I'm personally using service remoting, which is easy to set up quickly. It also allows you to do exception handling in your client service.

tcp_tw_recycle behind application level load balancer?

Given that our Linux servers never open direct connections to our clients, is it safe to use tcp_tw_recycle on them?
Those servers are behind an application-level load balancer, and all the connections I see on them are between internal 10.x.x.x addresses.
Thanks
We have such a load balancer provided by AWS (ELB), so I'll provide my advice based on that:
Why gamble? If your overhead/port-consumption is coming from quick client connections, Amazon recommends enabling persistent connections on your ELB instead. (I asked them about this question specifically and got that recommendation...our Amazon contact does not recommend enabling tcp_tw_recycle).
That said, if, say, it's another internal box they're struggling to establish rapid connections with (apache-php chatting with MySQL on behalf of the client without persistent connections), you might be able to get away with it:
If ALL client connections will be via the ELB (please set your security group accordingly), then technically speaking you shouldn't encounter problems for the tcp_tw_recycle timestamp jumping cases I'm aware of:
ELB is a termination point on behalf of the client (their NAT firewall won't factor in, and ELB is not NAT based)
The ELB box(es) will not reset themselves, acquire the same IP address, and still be assigned as your ELB (will be someone else's if it happens at all)
The ELB box(es) will not be replaced by another ELB machine using the same IP and still be serving your traffic as your ELB (will be someone else's if it happens at all)
(Points 2 and 3 are not a guarantee from Amazon, but it does appear to be their behavior, just as stop/start will get you a new private IP for EC2 boxes. If it did happen, I'd imagine it is a thing of extremely low probability.)
You could theoretically run into issues restarting your own boxes if they communicate with other service machines (like MySQL or memcached) and you restart (not stop/start) one of your boxes, or move their elastic IP to another box and are not using private IPs for internal chatter. But you have some control over this. However, if it's all on the AWS cloud (or your fast internal network), issues are extremely unlikely (unless your AWS zone is having a bad day, and you're restarting/replacing your systems for that reason).
A buddy and I had a long-standing argument about this, and he won by proving his point with a long running 4k browser (fast script) load test via Neustar...there were no connection issues from the client side via ELB, and eliminating the overhead helped quite a bit :-)
If you haven't already, consider tcp_tw_reuse (we were using this to keep the ephemeral port range active before the above mentioned test showed the additional merit of eliminating the overhead with tcp_tw_recycle for us). Be sure to watch your counters on ifconfig if you do decide to disable that chunk of the protocol ;-P.
The following is also a good summary resource on the topic of timestamps jumping: Dropping of connections with tcp_tw_recycle
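Before flipping either sysctl, it's worth measuring whether TIME_WAIT sockets are actually your problem. A small Linux-only Python sketch that reads the current setting and counts TIME_WAIT entries straight from /proc:

    from pathlib import Path

    def sysctl(name: str) -> str:
        """Read a kernel setting, e.g. net.ipv4.tcp_tw_reuse."""
        return Path("/proc/sys", name.replace(".", "/")).read_text().strip()

    def time_wait_count() -> int:
        """Count IPv4 sockets in TIME_WAIT by parsing /proc/net/tcp
        (the 'st' column value 06 means TIME_WAIT)."""
        count = 0
        for line in Path("/proc/net/tcp").read_text().splitlines()[1:]:
            fields = line.split()
            if fields[3] == "06":
                count += 1
        return count

    if __name__ == "__main__":
        print("tcp_tw_reuse  =", sysctl("net.ipv4.tcp_tw_reuse"))
        print("TIME_WAIT sockets:", time_wait_count())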

When would you need multiple servers to host one web application?

Is that called "clustering" of servers? When a web request is sent, does it go through the main server, and if the main server can't handle the extra load, then it forwards it to the secondary servers that can handle the load? Also, is one "server" that's up and running the application called an "instance"?
[...] Is that called "clustering" of servers?
Clustering is indeed transparently using multiple nodes that are seen as a single entity: the cluster. Clustering allows you to scale: you can spread your load across all the nodes and, if you need more power, you can add more nodes (short version). Clustering also allows you to be fault tolerant: if one node (physical or logical) goes down, the other nodes can still process requests and your service remains available (short version).
When a web request is sent, does it go through the main server, and if the main server can't handle the extra load, then it forwards it to the secondary servers that can handle the load?
In general, this is the job of a dedicated component called a "load balancer" (hardware or software) that can use many algorithms to balance requests: round-robin, FIFO, LIFO, load-based...
In the case of EC2, you previously had to load balance with round-robin DNS and/or HA Proxy. See Introduction to Software Load Balancing with Amazon EC2. But for some time now, Amazon has launched load balancing and auto-scaling (beta) as part of their EC2 offerings. See Elastic Load Balancing.
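As a toy illustration of the simplest of those algorithms, round-robin, in Python (the backend addresses are made up):

    from itertools import cycle

    class RoundRobinBalancer:
        """Minimal illustration of the round-robin algorithm a load balancer
        might use: each request goes to the next backend in a fixed rotation."""

        def __init__(self, backends: list[str]):
            self._rotation = cycle(backends)

        def pick(self) -> str:
            return next(self._rotation)

    lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    print([lb.pick() for _ in range(5)])
    # ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1', '10.0.0.2']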
Also, is one "server" that's up and running the application called an "instance"?
Actually, an instance can be many things (depending on who's speaking): a machine, a virtual machine, a server (software) up and running, etc.
In the case of EC2, you might want to read Amazon EC2 Instance Types.
Here is a real example:
This specific configuration is hosted at RackSpace in their Managed Colo group.
Requests pass through a Cisco firewall. They are then routed across a gigabit LAN to a Cisco CSS 11501 Content Services Switch (i.e. the load balancer). The load balancer matches the incoming content to a content rule, handles the SSL decryption if necessary, and then forwards the traffic to one of several back-end web servers.
Every 5 seconds, the load balancer requests a URL on each webserver. If the webserver fails (two times in a row, IIRC) to respond with the correct value, that server is not sent any traffic until the URL starts responding correctly.
Further back, behind the webservers, is a MySQL master/slave configuration. Connections may be made to the master (for transactions) or to the slaves for read-only requests.
Memcached is installed on each of the webservers, with 1 GB of RAM dedicated to caching. Each web application may utilize the cluster of memcached servers to cache all kinds of content.
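As an illustration of how a web application might use that memcached pool, here is a cache-aside sketch in Python using the third-party pymemcache client (the node address, key, and render function are all hypothetical):

    from pymemcache.client.base import Client  # third-party memcached client

    cache = Client(("10.0.0.21", 11211))  # hypothetical memcached node

    def render_fragment(key: str) -> bytes:
        """Hypothetical stand-in for an expensive page render."""
        return b"<div>expensive content for %s</div>" % key.encode()

    def get_page_fragment(key: str) -> bytes:
        """Cache-aside: try memcached first, fall back to the expensive
        render, then store the result for the other webservers to reuse."""
        cached = cache.get(key)
        if cached is not None:
            return cached
        fragment = render_fragment(key)
        cache.set(key, fragment, expire=300)  # cache for five minutes
        return fragment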
Deployment is handled using rsync to sync specific directories from a management server out to each webserver. Apache restarts, etc., are handled through similar scripting over ssh from the management server.
The amount of traffic that can be handled through this configuration is significant. The advantages of easy scaling and easy maintenance are great as well.
For clustering, any web request would be handled by a load balancer which, being kept updated on the current loads of the servers forming the cluster, sends the request to the least burdened server. As for whether it's called an "instance"... I believe so, but I'd wait for confirmation on that.
You'd need a very large application to be bothered with thinking about clustering and the "fun" that comes with it, software- and hardware-wise, though. Unless you're looking to start (or are already running) something big, it wouldn't be anything to worry about.
Yes, it can be required for clustering. Typically as the load goes up you might find yourself with a frontend server that does url rewriting, https if required and caching with squid say. The requests get passed on to multiple backend servers - probably using cookies to associate a session with a particular backend if necessary. You might have the database on a separate server also.
I should add that there are other reasons why you might need multiple servers; for instance, there may be a requirement that the database is not on the frontend server for security reasons.

How can I detect another instance of the same Win32 application running on another workstation?

I have a small application, which is free for personal use, but requires a paid license for corporate use.
It is most likely that in a corporate environment my application will run on multiple workstations. If it is the freeware version, I want to show an unobtrusive message. (and continue)
It doesn't have to be bulletproof; if it is not possible (e.g. because of a firewall), the application should just continue. And I don't want to make the user set up some kind of central service to track the instances. I don't want to annoy my users (especially not the paying ones *g*)
Is there any way to achieve this kind of functionality?
I remember an older version of Dreamweaver had this kind of feature. You couldn't run it more than once in the same network.
One way: listen for UDP broadcasts on a specific port, and let each instance send a broadcast UDP packet on that port to the local network. If the application receives such a packet and recognizes its structure, it knows that another instance is running.
You can include license details to avoid messages if two valid licenses are used.
Broadcasts usually aren't routed, so this works on the local network only. (And the user can disable it completely via a firewall too... but if you use some standard port like 53 (DNS), it is unlikely to be blocked.)
The other way is to use a custom server which is informed about all running instances around the world ;-)
There are two primary ways to achieve this:
First, you can set up a small server application on each workstation that communicates with other workstations on the network (personally I would use Bonjour for discovery, but there are other options). The drawback here is that you're going to write quite a bit more code to make this work than option #2.
Second (probably simpler) would be to use WMI to enumerate processes on other workstations (again, probably use a Bonjour-like system for discovery), and find your process running on other machines. The drawback to this is that your enumeration code will require privileges on all machines to conduct the search.
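For the WMI route, a rough Python sketch using the third-party wmi package (the process name, machine list, and discovery are placeholders; the account running this needs WMI/DCOM rights on each target):

    import wmi  # third-party package wrapping Windows WMI; Windows-only

    def find_remote_instances(workstations: list[str],
                              process_name: str = "myapp.exe") -> list[str]:
        """Return the workstations where our process is already running."""
        found = []
        for machine in workstations:
            try:
                conn = wmi.WMI(computer=machine)
                if conn.Win32_Process(name=process_name):
                    found.append(machine)
            except wmi.x_wmi:
                pass  # unreachable or access denied: skip it
        return found

    print(find_remote_instances(["ws-01", "ws-02"]))  # hypothetical machine names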
When the application starts, it sends out a UDP broadcast on a specific port. This will be restricted to the local subnet, and might not make it through firewalls. This is the "is anyone else running, or can I start?" query.
If there are no responses, the application starts as normal, listening for this UDP broadcast. If it sees one, it responds with an "I'm already running; you can't start" packet.
The application that's just started receives this response packet and then refuses to start or (if you don't want to be that strict) displays a warning to the user.
You'd want to include the product ID and license key (or a hash) in the initial request, so that you can have more than one license on the same network. The response probably wants the machine name in it, so that the second user can go and find the first user and ask if they really need to use the application.
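A bare-bones Python sketch of that query/response handshake (the port number and packet contents are arbitrary choices; a real version would add the license hash and machine name as described above):

    import socket

    PORT = 53535                      # hypothetical; pick a port your users can open
    QUERY, BUSY = b"ANYONE-RUNNING?", b"ALREADY-RUNNING"

    def another_instance_running(timeout: float = 2.0) -> bool:
        """Broadcast the 'is anyone else running?' query and wait for a reply."""
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
            s.settimeout(timeout)
            s.sendto(QUERY, ("255.255.255.255", PORT))
            try:
                data, addr = s.recvfrom(1024)
                return data == BUSY       # addr tells you which machine answered
            except socket.timeout:
                return False              # silence: assume we're first

    def answer_queries_forever() -> None:
        """Run in a background thread once this instance has started."""
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind(("", PORT))
            while True:
                data, addr = s.recvfrom(1024)
                if data == QUERY:
                    s.sendto(BUSY, addr)

Note that this is deliberately forgiving: if the broadcast is firewalled away or times out, the application simply starts, which matches the "don't annoy the user" requirement.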
Evil corporation solution:
Have the application call home every time it starts. If more instances wake up than the license allows, tell them not to start. If there is no internet connection, don't start at all.