Redis failover triggers slow response times on some IIS instances - Windows

We have 4 instances of IIS running. The .NET web app persists data to a SQL Server HA cluster and uses Redis as its session store. All of these are hosted on servers in a single data center.
Redis, Redis-Sentinel and HAProxy run as containers on 4 CentOS instances in an HA configuration.
The web app uses the Microsoft.Web.Redis.RedisSessionStateProvider 4.0.1 provider. The connection string in web.config is as follows:
<sessionState mode="Custom" customProvider="SessionStateStore" timeout="50">
  <providers>
    <add name="SessionStateStore"
         type="Microsoft.Web.Redis.RedisSessionStateProvider"
         connectionString="10.40.50.50:26379,10.40.50.51:26379,10.40.50.52:26379,10.40.50.53:26379,ssl=false,password=aPassword,serviceName=master"
         throwOnError="False" />
  </providers>
</sessionState>
Since going into production a couple of weeks ago, I've noticed that some IIS instances report significantly slower response times immediately after a Redis failover, which happens roughly once every few days. Sometimes 2 of the 4 web servers are slow after Redis fails over and their app pools need restarting; other times it may be just 1. On other occasions, all 4 IIS servers report slow responses but correct themselves after a few minutes without needing an app pool restart.
Is there a way to determine why Redis is failing over? And what's a good approach to troubleshooting the IIS side, to identify whether the requests that are active at the time of the failover result in a higher thread count and therefore slower responses?
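
For the "why is Redis failing over" part, the Sentinel containers' logs are usually the quickest source: Sentinel records events such as +sdown, +odown and +switch-master with timestamps, so you can see which node was declared down, by which sentinels, and when the master actually switched. For the IIS side, one possible approach is a small companion console process on each web server (a sketch only, under the assumption that you can run such a process; it is not part of the session state provider itself) that connects through the same endpoints with StackExchange.Redis, logs its connection events, and records the thread count of the w3wp worker processes at the same time, so failover timestamps can be lined up against thread growth per server. The endpoints, password and service name below are copied from the web.config above; everything else is illustrative.

using System;
using System.Diagnostics;
using System.Threading;
using StackExchange.Redis;   // the client library the session state provider builds on

class RedisFailoverMonitor
{
    static void Main()
    {
        // Endpoints, password and service name copied from web.config. AbortOnConnectFail=false
        // keeps the multiplexer retrying instead of throwing at startup. Resolving the master via
        // Sentinel this way assumes a StackExchange.Redis version with Sentinel support.
        var options = ConfigurationOptions.Parse(
            "10.40.50.50:26379,10.40.50.51:26379,10.40.50.52:26379,10.40.50.53:26379");
        options.Password = "aPassword";
        options.ServiceName = "master";
        options.AbortOnConnectFail = false;

        var mux = ConnectionMultiplexer.Connect(options);

        mux.ConnectionFailed += (s, e) =>
            Log($"CONNECTION FAILED   {e.EndPoint} {e.FailureType} {e.Exception?.Message}");
        mux.ConnectionRestored += (s, e) =>
            Log($"CONNECTION RESTORED {e.EndPoint} {e.FailureType}");
        mux.ErrorMessage += (s, e) =>
            Log($"SERVER ERROR        {e.EndPoint} {e.Message}");

        // Every 10 seconds, record the thread count of each w3wp worker process so a failover
        // event can be lined up against thread growth on this web server.
        while (true)
        {
            foreach (var p in Process.GetProcessesByName("w3wp"))
                Log($"w3wp pid={p.Id} threads={p.Threads.Count}");
            Thread.Sleep(TimeSpan.FromSeconds(10));
        }
    }

    static void Log(string message) =>
        Console.WriteLine($"{DateTime.UtcNow:o} {message}");
}

If you'd rather not run custom code, the Process\Thread Count counter for w3wp and the ASP.NET Applications\Requests Executing counter in perfmon capture much the same picture on the IIS side.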

Related

Nginx slow static file serving after a period of inactivity

I have an nginx server deployed as a reverse proxy. Everything works great as long as I use the service regularly.
The issue happens when the nginx service has been inactive (no requests processed) for a few days.
When I then try to launch the application through nginx, downloading the static files takes a long time even though the files are only a few bytes in size.
The issue goes away after I restart my nginx server.
I'm using version 1.15.8.3 of OpenResty.
Any suggestions/help will be highly appreciated.

Why does a Windows system running Cassandra Server hang?

For testing purposes, I have installed a single-node Cassandra server on my 64-bit Windows system, where it runs continuously as a service. After 2 or 3 days of continuous running, my system hangs and does not allow any operations at all, yet the Cassandra server keeps serving requests from client applications without any problem. What is the reason for this problem, and how can I solve it?

Session stickiness on Amazon Web Services

I'm a bit confused about the use of session stickiness on Amazon Web Services. When I deploy my Java web application using Amazon Elastic Beanstalk, I can choose to enable session stickiness and then specify a cookie expiration period.
My application uses cookies for the session (JSESSIONID) as well as for other small things. Most of the website is accessible only after logging in (I use Spring security to manage it). The website will run on up to 25 small EC2 instances.
Should I enable session stickiness? If I don't enable it, does it mean that I could suddenly be logged out because the load balancer took me to another server (not the server that authenticated me)? If I enable session stickiness, do I get logged out when the server that authenticated me gets shut down? Basically, why and when should I use session stickiness?
Thank you very much.
If I don't enable it, does it mean that I could suddenly be logged out because the load balancer took me to another server (not the server that authenticated me)?
Yes
If I enable the session stickiness, do I get logged out when the server that authenticated me gets shut down?
Yes
When using Elastic Beanstalk with a typical Java webapp, I think you will definitely want to enable session stickiness. Otherwise each HTTP request from a user's browser could be routed to a different server.
To get around the issue of the user's session being destroyed when the server they are "stuck" to gets shut down, you would need to look into Tomcat session replication. This isn't something that Elastic Beanstalk comes with out of the box, unfortunately, so in order to set up session replication you would have to create a custom Elastic Beanstalk AMI for your application to use. Also, you would have to use an implementation of Tomcat session replication that does not rely on multicast, since multicast isn't available on AWS, or any other cloud environment that I know of. An example of an implementation that doesn't rely on multicast would be one that uses a database (such as Amazon RDS) or a memcached server (such as Amazon ElastiCache) to make the sessions available across multiple Tomcat instances.
Also note that the Elastic Beanstalk UI only allows you to enable load balancer-generated HTTP cookies. However, after Elastic Beanstalk has created the load balancer, you can go into the EC2 console and modify the load balancer's settings to switch it to application-generated HTTP cookies, and then tell it to use the "JSESSIONID" cookie.
You can also use DynamoDB for Tomcat session sharing: http://docs.aws.amazon.com/AWSSdkDocsJava/latest/DeveloperGuide/java-dg-tomcat-session-manager.html
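
For reference, the console change described above (replacing the duration-based policy with application-controlled stickiness on the JSESSIONID cookie) can also be scripted against the classic ELB API rather than clicked through the EC2 console. The snippet below is only an illustration of the two calls involved, written with the AWS SDK for .NET; the load balancer name, policy name, region and listener port are placeholders you would need to adjust.

using System.Collections.Generic;
using Amazon;
using Amazon.ElasticLoadBalancing;
using Amazon.ElasticLoadBalancing.Model;

class StickinessSetup
{
    static void Main()
    {
        // Classic ELB client; the region and all names below are placeholders.
        var elb = new AmazonElasticLoadBalancingClient(RegionEndpoint.USEast1);

        // 1. Create an app-cookie stickiness policy that follows Tomcat's JSESSIONID cookie.
        elb.CreateAppCookieStickinessPolicy(new CreateAppCookieStickinessPolicyRequest
        {
            LoadBalancerName = "my-beanstalk-elb",
            PolicyName = "jsessionid-stickiness",
            CookieName = "JSESSIONID"
        });

        // 2. Attach the policy to the listener (port 80 here), replacing the
        //    duration-based policy that Elastic Beanstalk configured.
        elb.SetLoadBalancerPoliciesOfListener(new SetLoadBalancerPoliciesOfListenerRequest
        {
            LoadBalancerName = "my-beanstalk-elb",
            LoadBalancerPort = 80,
            PolicyNames = new List<string> { "jsessionid-stickiness" }
        });
    }
}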

AppFabric Cache seems unstable

We're trying to use the AppFabric distributed cache. After a lot of back and forth with non-domain servers we finally put them in a domain, and installation/setup was a bit easier. We got it up and running after fighting through a ton of errors, most of which seem like things AppFabric could trivially have tested for or reported with a more descriptive error message. "Temporary error" does not explain a lot...
But there are still issues.
We set up 3 servers, one of which is the "lead". We finally got the cache working, and we confirmed this by pointing a Network Load Balancer to one server at a time and confirming that we could set a cache entry on one server and retrieve it from another.
Then I restarted the AppFabric Caching service on all servers and suddenly it stopped working. Get-CacheHost says the hosts are up, but we get exceptions like:
ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out
ErrorCode<ERRCA0017>:SubStatus<ES0001>:There is a temporary failure. Please retry later.
Why would this error condition occur by simply restarting the services?
Is AppFabric Cache really ready for production use?
What happens if a server goes offline? Long timeouts?
Are we dependent on the "lead" server being up?
I suspect it will be back up after 5-10 minutes of R&R. It seems to come back by itself sometimes.
Update: It did come up after a few minutes. We have now tested removing one server from the cluster, and it resulted in a long timeout and finally an exception.
We have been debugging this for some time and I'm sharing what we have found so far.
UAC on Windows Server 2008 actually blocks access to the local computer, so commands run against the local computer will fail. Start PowerShell as admin, or turn off UAC completely, to bypass this.
Simply changing the config file manually will not work. You need to use export and import commands.
Firewalls are a major issue: the installer opens the 222* range of ports, but the PowerShell tools use other Windows services. Turning off the firewall on all servers (not recommended) solved the problem.
If a server is removed from the cluster there will be an initial timeout before the cluster can operate again.
After restart the cluster uses 2-5 minutes to get back up.
If restarting and one server is not reachable the startup time is increased.
If the server holding the shared file share for the configuration is not reachable, the services will not start. We tried to solve this by giving each server a private share.
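
On the client side, the ERRCA0017 "temporary failure" responses seen during those 2-5 minute recovery windows are the kind of error the AppFabric client expects callers to retry. Below is a minimal sketch of that pattern, assuming the standard Microsoft.ApplicationServer.Caching client API and placeholder host names; it does not fix the cluster-side startup delay, it just rides it out.

using System;
using System.Collections.Generic;
using System.Threading;
using Microsoft.ApplicationServer.Caching;

class CacheWithRetry
{
    static void Main()
    {
        // Placeholder host names with the default cache port (22233).
        var config = new DataCacheFactoryConfiguration
        {
            Servers = new List<DataCacheServerEndpoint>
            {
                new DataCacheServerEndpoint("cachehost1", 22233),
                new DataCacheServerEndpoint("cachehost2", 22233),
                new DataCacheServerEndpoint("cachehost3", 22233)
            }
        };

        DataCache cache = new DataCacheFactory(config).GetCache("default");
        PutWithRetry(cache, "someKey", "someValue");
    }

    // Retry Put when the cluster reports a transient failure (RetryLater, i.e. ERRCA0017),
    // for example while the cluster is still re-forming after a service restart.
    static void PutWithRetry(DataCache cache, string key, object value, int maxAttempts = 5)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                cache.Put(key, value);
                return;
            }
            catch (DataCacheException ex)
            {
                if (ex.ErrorCode != DataCacheErrorCode.RetryLater || attempt >= maxAttempts)
                    throw;
                Thread.Sleep(TimeSpan.FromSeconds(2 * attempt));   // crude backoff
            }
        }
    }
}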

In Slapd, how do I deal with the "connection table full (64/64)" error?

I'm working on an application running on Windows servers which requires heavy use of LDAP. For now we are stuck with the slapd LDAP server on a Windows platform - it's not great but for various reasons we are stuck with this architecture.
Our system scales with demand, so at peak times there will be more application servers. Each application server is multi-threaded and may make up to 16 concurrent connections to the single LDAP server.
Any time the system tries to make more than 64 concurrent connections to the LDAP server, slapd will block any further connection attempts.
It's obvious that the slapd connection table is maxed out, but how do I make it bigger? The machine we run slapd on is a very powerful 8-core server, so we should theoretically be able to handle a few hundred concurrent connections. Furthermore, a previous incarnation of this project ran slapd on Ubuntu Linux on a dual-core server. It was able to handle twice the load of ours without any problem, so it would appear that our troubles are Windows-specific.
Found the answer:
You have to recompile slapd with the source code changed; there is a preprocessor macro that specifies the connection limit.
