IIS 10 - maxConcurrentRequestsPerCPU setting over 5000 is ignored - performance

I'm facing an issue on IIS 10 (Windows Server 2016, .NET 4 integrated mode, classic ASP.NET app) where a maxConcurrentRequestsPerCPU setting of over 5000 does not seem to be applied. However, setting it to lower values has a clear and visible effect.
I'm using JMeter for load testing. If I set the thread count to 5000, I see a 100% success rate, but anything higher produces failures regardless of how high I set maxConcurrentRequestsPerCPU and requestQueueLimit. For example, testing with 6000 threads consistently yields a 16.6% failure rate.
If I set maxConcurrentRequestsPerCPU to something low like 12, I get an almost 99% failure rate, so I know the lower-bound settings have an effect, just not anything over 5000.
The VM has 2 cores assigned, so with 5000 x 2 cores I feel I should be able to successfully test with 10,000 threads.
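The 16.6% figure is itself consistent with a hard cap of 5000 concurrent requests: with 6000 threads, 1000 of 6000 would be turned away. A minimal sketch of that arithmetic, assuming (as a simplification) that all threads fire at once and that every request beyond the cap fails; the function name is made up for illustration:

```python
def expected_failure_rate(threads: int, cap: int) -> float:
    """Fraction of requests rejected if only `cap` of `threads`
    simultaneous requests are admitted (simplifying assumption)."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    rejected = max(threads - cap, 0)
    return rejected / threads


if __name__ == "__main__":
    # Compare a few JMeter thread counts against an assumed cap of 5000.
    for threads in (5000, 6000, 10000):
        rate = expected_failure_rate(threads, cap=5000)
        print(f"{threads} threads: {rate:.2%} expected failures")
```

For 6000 threads this gives 1/6, which matches the observed failure rate and suggests a single limit of 5000 rather than a per-CPU one.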
Aspnet.config
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
    <system.web>
        <applicationPool
            maxConcurrentRequestsPerCPU="50000"
            maxConcurrentThreadsPerCPU="0"
            requestQueueLimit="50000" />
    </system.web>
    <runtime>
        <legacyUnhandledExceptionPolicy enabled="false" />
        <legacyImpersonationPolicy enabled="true" />
        <alwaysFlowImpersonationPolicy enabled="false" />
        <SymbolReadingPolicy enabled="1" />
        <shadowCopyVerifyByTimestamp enabled="true" />
    </runtime>
    <startup useLegacyV2RuntimeActivationPolicy="true" />
</configuration>
What could be overriding these settings and enforcing the upper bound of 5000?
I feel there is some additional setting that needs to be set to exceed 5000.
Any help / advice would be appreciated at this point. Thanks.

I was running into something similar, and after some digging, I believe the setting you're looking for is in the site configuration and is called appConcurrentRequestLimit. I was able to confirm that lower limits work as expected, but I don't have a good way to test over 5000 connections at the moment. I will update this post if/when I hit > 5000 connections in the future.
Update: Can confirm that this setting did the trick.
To change this value:
Click on the site in the left pane of IIS Manager.
Double-click Configuration Editor in the right pane.
From the "Section" dropdown, choose "system.webServer/serverRuntime".
Set the appConcurrentRequestLimit value (default is 5000).
Click "Apply" in the Actions pane.
Restart the site.
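For anyone who prefers to see the change as configuration rather than GUI clicks, here is a rough sketch of the same edit applied to an XML fragment with Python's standard library. The sample fragment is a hypothetical, stripped-down stand-in and the helper name is made up; where IIS actually stores the value on your server (and whether the section is locked there) is something to verify yourself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down fragment; a real IIS config file
# contains many more sections than this.
SAMPLE_CONFIG = """<configuration>
  <system.webServer>
    <serverRuntime appConcurrentRequestLimit="5000" />
  </system.webServer>
</configuration>"""


def set_app_concurrent_request_limit(config_xml: str, limit: int) -> str:
    """Return config_xml with serverRuntime/@appConcurrentRequestLimit set."""
    root = ET.fromstring(config_xml)
    web_server = root.find("system.webServer")
    if web_server is None:
        web_server = ET.SubElement(root, "system.webServer")
    runtime = web_server.find("serverRuntime")
    if runtime is None:
        runtime = ET.SubElement(web_server, "serverRuntime")
    runtime.set("appConcurrentRequestLimit", str(limit))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_app_concurrent_request_limit(SAMPLE_CONFIG, 20000))
```

Work on a copy of the file and keep a backup; this only illustrates which element and attribute the Configuration Editor steps above are changing.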

If you want to execute more than 5000 requests concurrently, you'll need to increase the requestQueueLimit. The requestQueueLimit restricts the total number of requests in the system. Due to its legacy, it is actually the total number of requests in the system, and not the number of requests in some queue. Its goal is to prevent the server from toppling over due to lack of physical memory, virtual memory, etc. When the limit is reached, incoming requests will receive a quick 503 "Server Too Busy" response.
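To make the "total requests in the system, not just queued ones" point concrete, here is a toy model of that kind of admission check. It is not how ASP.NET implements it; the class and method names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class RequestLimiter:
    """Toy model: the limit counts every request currently in the system,
    whether it is executing or waiting."""
    request_queue_limit: int
    in_system: int = 0

    def try_admit(self) -> bool:
        """Admit a request, or return False (think: quick 503) at the limit."""
        if self.in_system >= self.request_queue_limit:
            return False
        self.in_system += 1
        return True

    def complete(self) -> None:
        """Mark one admitted request as finished."""
        if self.in_system > 0:
            self.in_system -= 1


if __name__ == "__main__":
    limiter = RequestLimiter(request_queue_limit=5000)
    rejected = sum(not limiter.try_admit() for _ in range(6000))
    print(f"rejected {rejected} of 6000 simultaneous requests")
```

In this model a request frees its slot only when it completes, so long-running requests count against the limit for their whole lifetime.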

Just try setting appConcurrentRequestLimit to a value above 5000.

Related

Service Fabric Resource balancer uses stale Reported load

While looking into the resource balancer and dynamic load metrics on Service Fabric, we ran into some questions (Running devbox SDK GA 2.0.135).
In the Service Fabric Explorer (the portal and the standalone application) we can see that the balancing is run very often; most of the time it completes almost instantly, and this happens every second. However, the Load Metric Information on the nodes or partitions is not updated as we report load.
We send a dynamic load report based on our interaction (an HTTP request to a service), increasing the reported load of a single partition by a large amount. This spike becomes visible within about 5 minutes, at which point the balancer actually starts balancing. This seems to be the interval at which the load data gets refreshed. The last reported time gets updated all the time, but without the new value.
We added the metrics to the application manifest and the cluster manifest to make sure they get used in the balancing.
This means the resource balancer uses the same data for 5 minutes. Is this a configurable setting? Is it constrained because it is running on a devbox?
We tried a lot of variables in the cluster manifest, but none seems to affect this refresh time.
If this is not adaptable, can someone explain why you would run the balancer with stale data, and why this 5 minute interval was chosen?

This is indeed a configurable setting, and the default is 5 minutes. The idea behind it is that in prod you have tons of replicas all reporting load all the time, and so you want to batch them up so you don't spam the Cluster Resource Manager with all those as independent messages.
You're probably right in that this value is way too long for local development. We'll look into changing that for the local clusters, but in the meantime you can add the following to your local cluster manifest to change the amount of time we wait by default. If there are other settings already in there, just add the SendLoadReportInterval line. The value is in seconds and you can adjust it accordingly. The below would change the default load reporting interval from 5 minutes (300 seconds) to 1 minute (60 seconds).
<Section Name="ReconfigurationAgent">
    <Parameter Name="SendLoadReportInterval" Value="60" />
</Section>
Please note that doing so does increase load on some of the system services (TANSTAAFL), and as always if you're operating on a generated or complete cluster manifest be sure to Test-ServiceFabricClusterManifest before deploying it. If you're working with a local development cluster the easiest way to get it deployed is probably just to modify the cluster manifest template (by default here: "C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup\NonSecure\ClusterManifestTemplate.xml") and just add the line, then right click on the Service Fabric Local Cluster Manager in your system tray and select "Reset Local Cluster". This will regenerate the local cluster with your changes to the template.
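If you would rather script the template edit than do it by hand, something along these lines could work. It is only a sketch on a simplified fragment: the FabricSettings wrapper and the absence of an XML namespace are assumptions made for illustration (a real manifest will likely need namespace handling), and the helper name is invented.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment standing in for the settings part
# of a cluster manifest (no XML namespace, unlike a real manifest).
SAMPLE_SETTINGS = """<FabricSettings>
  <Section Name="ReconfigurationAgent">
    <Parameter Name="SomeExistingSetting" Value="1" />
  </Section>
</FabricSettings>"""


def set_parameter(settings_xml: str, section_name: str,
                  param_name: str, value: int) -> str:
    """Add or update one Parameter inside the named Section."""
    root = ET.fromstring(settings_xml)
    section = next((s for s in root.iter("Section")
                    if s.get("Name") == section_name), None)
    if section is None:
        section = ET.SubElement(root, "Section", {"Name": section_name})
    param = next((p for p in section.findall("Parameter")
                  if p.get("Name") == param_name), None)
    if param is None:
        param = ET.SubElement(section, "Parameter", {"Name": param_name})
    param.set("Value", str(value))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_parameter(SAMPLE_SETTINGS, "ReconfigurationAgent",
                        "SendLoadReportInterval", 60))
```

Existing parameters in the section are left alone, which mirrors the "just add the SendLoadReportInterval line" advice above.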

MaxConcurrentRequest in selfhost application

I have a self-hosted SignalR application. Everything is OK, but when the number of users exceeds 5000, users start reconnecting rapidly. I know that the default value of appConcurrentRequestLimit is 5000, so I ran this:
cd %windir%\system32\inetsrv
appcmd.exe set config /section:system.webserver/serverRuntime /appConcurrentRequestLimit:100000
But nothing changed. I also increased maxConcurrentRequestsPerCPU and requestQueueLimit according to this, but I still have the problem.
I'm using Windows Server 2012 and IIS 8.

You are shooting in the dark here, and you have no data about the actual performance and what's happening. The users could be reconnecting for different reasons (server timeouts, regular interval reconnects, server errors). There are countless possibilities.
The correct way to know what's happening and measure performance is to run a baseline performance load test using the default configuration, and collect the relevant performance counters like current requests, queued requests, current connections, max connections etc.
You should also collect any relevant Error logs on the server that could help you figure out what's happening.
You can find the full list of performance counters you need below:
Memory
.NET CLR Memory\# Bytes in all Heaps (for w3wp)
ASP.NET
ASP.NET\Requests Current
ASP.NET\Requests Queued
ASP.NET\Requests Rejected
CPU
Processor Information\% Processor Time
TCP/IP
TCPv6\Connections Established
TCPv4\Connections Established
Web Service
Web Service\Current Connections
Web Service\Maximum Connections
Threading
.NET CLR LocksAndThreads\# of current logical Threads
.NET CLR LocksAndThreads\# of current physical Threads
Once you have your baseline performance results on a graph, then you can modify configuration (e.g. modify the number of concurrent requests like you tried above) and then re-run your test, and collect again the same performance counters.
The performance counter results will speak for themselves, and they will lead you to a solution.
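As a rough illustration of the baseline-versus-change comparison described above, here is a small sketch that summarizes two runs of counter samples side by side. The function names are made up, and the numbers in the demo are placeholders, not measurements; in practice you would load the samples exported from your own counter logs.

```python
from statistics import mean


def summarize(samples):
    """Map {counter_name: [values]} to {counter_name: (average, peak)}."""
    return {name: (mean(values), max(values))
            for name, values in samples.items() if values}


def compare(baseline, tuned):
    """Return (counter, baseline_avg, tuned_avg, delta) rows for the
    counters present in both runs, sorted by counter name."""
    base, new = summarize(baseline), summarize(tuned)
    rows = []
    for name in sorted(base.keys() & new.keys()):
        rows.append((name, base[name][0], new[name][0],
                     new[name][0] - base[name][0]))
    return rows


if __name__ == "__main__":
    # Placeholder values purely for illustration, NOT real measurements.
    baseline = {r"ASP.NET\Requests Queued": [0, 120, 300],
                r"ASP.NET\Requests Rejected": [0, 10, 40]}
    tuned = {r"ASP.NET\Requests Queued": [0, 20, 40],
             r"ASP.NET\Requests Rejected": [0, 0, 0]}
    for name, before, after, delta in compare(baseline, tuned):
        print(f"{name}: {before:.1f} -> {after:.1f} ({delta:+.1f})")
```

Comparing the same counters across runs, and changing only one setting between runs, makes it much easier to tell which change actually had an effect.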
You can generate the load with a tool like Crank:
https://github.com/SignalR/SignalR/tree/dev/src/Microsoft.AspNet.SignalR.Crank
In addition you can also check the SignalR troubleshooting guide:
http://www.asp.net/signalr/overview/testing-and-debugging/troubleshooting

IIS Orphaned Requests

We have IIS 7 running a Classic ASP app, and I've been noticing the following issue lately. Over the course of the day, if I look at Server Node --> Worker Processes, some requests seem to pile up there. The elapsed time is something crazy like 12 hours by the end of the day. These requests all sit in the ExecuteRequestHandler stage.
There is no way anything is executing for that long, and I cannot seem to reproduce the issue. I have tried dumping w3wp.exe, using FRT, and all that good stuff, but I have some general questions:
Is there a setting that controls WHEN IIS stops a request? To be specific, in development, if I purposely design a page to be slow (i.e. update a SQL table that's locked), then CLOSE the browser and monitor the requests in IIS, I see that the request still sits there for about 20 seconds before being removed. Is that 20 seconds a random interval, or can it be SET somewhere? To be clear, it's not that the page takes 20 seconds to execute; it will execute forever (in this test case), but it seems IIS gives up on it 20 or so seconds after I log out.
Is there some way to see "orphaned" requests, i.e. requests in the app pool that nobody is waiting for anymore?
What else can I do to try and debug this? A dump of w3wp says there are client connections with an HTTP request state of HTR_READING_CLIENT_REQUEST.
I keep getting suggestions to modify IIS config settings such as AspRequestQueueMax, but every time I try looking those up in ApplicationHost.config I don't see those items set, so either I'm looking in the wrong place, or default values are not explicitly written to the config. This raises 2 questions: a) how do you READ these config values, i.e. get the current value, and b) how do you SET them?

A Classic ASP request will keep running until the script timeout is reached, regardless of whether the client is still connected. I believe the default is 90 seconds, but an .ASP file can override this by setting the Server.ScriptTimeout property directly (which is pretty common). If your request queue is filling up, this is likely the reason, and changing the defaults will not help.
If you can edit the ASP code, you can add logic like this in potentially long running sections:
If Not Response.IsClientConnected Then Call Response.End()
You can also do a global search of your code for Server.ScriptTimeout to find where the abuse is coming from.
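If your editor's global search is awkward for this, a small script can do it. The sketch below assumes your pages use .asp and .inc extensions and only catches plain numeric assignments such as Server.ScriptTimeout = 3600; the function name is made up, so adjust the pattern and extensions to your code base.

```python
import re
from pathlib import Path

# Matches assignments like "Server.ScriptTimeout = 3600" (case-insensitive).
PATTERN = re.compile(r"Server\.ScriptTimeout\s*=\s*(\d+)", re.IGNORECASE)
# Assumed file extensions; adjust to whatever your site actually uses.
EXTENSIONS = {".asp", ".inc"}


def find_script_timeouts(root):
    """Return (file, line_number, seconds) for each numeric
    Server.ScriptTimeout assignment found under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in EXTENSIONS:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            match = PATTERN.search(line)
            if match:
                hits.append((str(path), line_no, int(match.group(1))))
    return hits


if __name__ == "__main__":
    # Scan the current directory and list the longest timeouts first.
    for file, line_no, seconds in sorted(find_script_timeouts("."),
                                         key=lambda hit: -hit[2]):
        print(f"{file}:{line_no}: {seconds} s")
```

Sorting by the timeout value puts the most likely offenders at the top of the list.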
If you do want to change the default script timeout, here is where it is stored:
https://www.iis.net/configreference/system.webserver/asp/limits
To change via the IIS7 GUI go to: (web site) > (features view) > ("IIS" category) > "ASP" > expand "Limits Properties" node > "Script Time-out"

Log file writing extremely delayed in WebSphere App Server

I am experiencing an issue with delayed writes to the application logs for a Java EE web application running in IBM WebSphere v. 7.x. Logging statements take up to an hour to appear in the application logs.
The problem doesn't appear related to heavy loads; WAS is responding to page requests almost instantly, and I am testing against a box that isn't used for performance testing, and on a holiday no less -- there is very little activity on the server.
My guess would be that the thread associated with logging has been configured with very low priority, but I can't figure out where that would be configured via the admin console or the configuration files.
Has anyone else experienced this sort of issue with WebSphere?

It's possible you don't even have enough available threads in the thread pool. That's consistent with the page requests being fast, as they are controlled by the WebContainer threads.
Try increasing it:
Servers > Application Servers > Thread pools > ...
I'm not sure exactly which pool's maximum value to increase. In the worst case, increase them all, and increase them heavily to be sure.
Other options:
Make sure you have enough disk space, or try connecting with JConsole to investigate.

How much load can a standard Jira on Tomcat installation take?

We are running Jira on a 4-way 32-bit RHEL box with 4 GB RAM with no problems so far. However, we anticipate an increase in the number of users and would like to know the maximum number of simultaneous requests that a Tomcat-Jira server can handle. (The Jira application is deployed on a standalone Tomcat server that runs nothing else, so even Tomcat-only statistics would help.)
Currently we average around 6000 hits a day. Does anyone have any statistics from their setups for a comparison?

As a consultant and Atlassian partner, I see many different JIRA installations. Most have 10K issues, some have 200K issues or more. It's rarely the hits/day that are the limit. Usually you'll find your database config is the culprit or sometimes the number of users managed by JIRA - over 8K users with JIRA 3.13.x can get slow.
~Matt

The main thing you should be looking at is the HTTP Connector definition in JIRA's conf/server.xml file. Our JIRA (which I think uses the defaults) shows the config as:
<Connector port="8080"
maxHttpHeaderSize="8192" maxThreads="150" minSpareThreads="25" maxSpareThreads="75" useBodyEncodingForURI="true"
enableLookups="false" redirectPort="8443" acceptCount="100" connectionTimeout="20000" disableUploadTimeout="true" />
The thread values are the ones to consider: they define how many requests the server can process concurrently. A maxThreads value of 150 means a hard limit of 150 requests being processed at the same time.
If you're currently seeing 6000 hits a day, you're going to have to increase that by several orders of magnitude before hitting the limits of the default settings.
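A quick back-of-the-envelope check of that claim, as a sketch. The 1 second average response time is purely an assumption for illustration, the function names are invented, and daily averages hide peak hours, so treat the result as a rough ratio only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(hits_per_day: float) -> float:
    """Average requests per second for a given number of hits per day."""
    return hits_per_day / SECONDS_PER_DAY


def daily_capacity(max_threads: int, avg_response_seconds: float) -> float:
    """Requests per day if every thread is busy all day long."""
    return max_threads / avg_response_seconds * SECONDS_PER_DAY


if __name__ == "__main__":
    current = 6000
    # Assumption: each request takes about 1 second on average.
    capacity = daily_capacity(max_threads=150, avg_response_seconds=1.0)
    print(f"current average: {average_rate(current):.3f} requests/second")
    print(f"theoretical capacity: {capacity:,.0f} requests/day")
    print(f"headroom: about {capacity / current:,.0f}x")
```

Under that assumption the default of 150 threads leaves roughly three orders of magnitude of headroom over 6000 hits a day, which is why the database or user management tends to become the bottleneck first.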
What you should do is set up a stress test. Every application is different, so you need to find where the issues are with your setup. You can use JMeter, a custom application, Selenium, or a number of other options to put your server under load. Grow the number of users slowly, adding a few every 5 minutes. You will start seeing problems and can address them as they are hit. Repeat this until you have reached the number of concurrent users, hits per minute, or whatever other metric is important to you.
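The "add a few users every 5 minutes" ramp can be planned up front. Here is a minimal sketch that only generates the schedule; the function name and the numbers are arbitrary examples, and how you feed the schedule into JMeter or another tool is up to you.

```python
def ramp_schedule(start_users, step, target_users, interval_minutes=5):
    """Yield (minute_offset, user_count) pairs, adding `step` users
    every `interval_minutes` until `target_users` is reached."""
    if step <= 0:
        raise ValueError("step must be positive")
    users, minute = start_users, 0
    while users < target_users:
        yield minute, users
        users += step
        minute += interval_minutes
    # Final stage runs at exactly the target, never above it.
    yield minute, target_users


if __name__ == "__main__":
    for minute, users in ramp_schedule(start_users=10, step=10,
                                       target_users=35):
        print(f"t+{minute:>3} min: {users} concurrent users")
```

Keeping each stage running for a fixed interval gives the counters time to settle, so the stage where problems first appear is easy to identify.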

Resources