Can anyone see a reason not to enable the WSDL Cache in Magento?
I have an EPOS system that periodically talks to Magento from outside the network. When it does, the site suffers a huge dip in speed, as it appears to struggle with the SOAP API. Even hitting the site with an HTTPS request like this:
https://[site-url]/api/v2_soap?wsdl=1
The response can take up to 10 seconds. Sometimes, when many of these requests are made, the server grinds to a halt and many sleeping connections are left in the MySQL database.
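For a rough sense of how much of that time is spent fetching and parsing the WSDL, something like the following shows the difference caching makes (a minimal sketch using PHP's SOAP extension; the URL is a placeholder for the real endpoint):

    <?php
    // Placeholder for the real Magento v2 SOAP endpoint.
    $wsdlUrl = 'https://example.com/api/v2_soap?wsdl=1';

    // With the WSDL cache disabled, every run re-fetches and re-parses the WSDL.
    ini_set('soap.wsdl_cache_enabled', '0');
    $start = microtime(true);
    new SoapClient($wsdlUrl, array('cache_wsdl' => WSDL_CACHE_NONE));
    printf("Uncached WSDL parse: %.2f s\n", microtime(true) - $start);

    // With the disk cache enabled, repeat runs skip the fetch/parse step once the
    // cache file exists (the very first run still pays the full cost).
    ini_set('soap.wsdl_cache_enabled', '1');
    $start = microtime(true);
    new SoapClient($wsdlUrl, array('cache_wsdl' => WSDL_CACHE_DISK));
    printf("Cached WSDL parse: %.2f s\n", microtime(true) - $start);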
On checking whether Magento is configured for WSDL caching, I noticed that it isn't. I didn't develop the site, however, and I'm wondering whether there are any legitimate reasons not to enable this feature.
Maybe this is obvious: for debugging.
I've experienced an issue (running Magento and PHP-FPM) where the WSDL cache became corrupted during a huge spike in traffic, which resulted in 503 errors whenever a SoapClient was constructed. That cache didn't clear through restarts of PHP-FPM, Apache, and the machine. Clearing the SOAP cache solved the problem, but it took some time to debug, and cache issues tend to be extra maddening.
I should say that I have no idea if this is a common issue, but the WSDL cache is a component that, like any component, can break.
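For what it's worth, the cache in question lives at the PHP level (the soap.wsdl_cache_* ini settings) rather than inside Magento, so if you ever hit the same corruption, clearing it looks roughly like this (a sketch that assumes the default cache directory of /tmp and the usual wsdl-* file naming):

    <?php
    // PHP's SOAP extension caches parsed WSDLs on disk, controlled by
    // soap.wsdl_cache_enabled, soap.wsdl_cache_dir and soap.wsdl_cache_ttl.
    $cacheDir = ini_get('soap.wsdl_cache_dir');
    if ($cacheDir === '' || $cacheDir === false) {
        $cacheDir = '/tmp';
    }

    // Remove the cached WSDL files; the next SOAP request re-fetches the WSDL.
    foreach (glob($cacheDir . '/wsdl-*') as $file) {
        unlink($file);
    }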
We are working on an ASP.NET 5 Web API project that is in production now but we are experiencing an issue where it becomes unresponsive intermittently throughout the day.
A few notes about the application architecture. It is an ASP.NET Web API project using a MariaDB database on a separate EC2 instance within the same private network. The connection string uses the private IP of the database server to avoid any name resolution issues. The site is hosted via IIS 10.
The application itself has been developed carefully, following the best practices provided by Microsoft: a heavy focus on async operations, minimizing query response times, and offloading more expensive operations to background services.
The app is extremely responsive. It delivers sub-100ms responses on almost all requests, even the more complicated ones, and that level of performance holds right up until it becomes unresponsive. We tend to see 10-30 requests per second and 300-500 SELECT queries per second at peak usage, so nothing too extreme. However, randomly (2-3 times over a 24-hour period) it will begin hanging on requests and simply not respond. During this time, the database is still extremely responsive, and we are never over 300 connections out of our 512-connection limit.
The resources on the application server itself are never really taxed at all. The CPU never gets above ~20% and memory usage sits around 20-30%.
If I stop the site in IIS and start it again while this is happening, it quickly comes back online. If I don't, it will be down for a few minutes until IIS finally kills it due to a failed health check. There are no real errors generated in response to the issue, other than the typical errors caused by a hanging process, such as connection-terminated errors. The only thing that has given me pause is that there were a few connection timeouts when getting a connection from the pool, but like I said, the connections to the server are never close to the limit.
Also, this app and version have been in production for months, and it wasn't until the traffic volume started to grow that we began seeing these issues. At this point I am at a loss for next troubleshooting steps and I'm seeking suggestions.
In the IIS App Pool advanced settings, set Start Mode to AlwaysRunning.
I never found a root cause for this issue; however, after updating to newer versions of .NET MVC, the issue went away. My best guess is that changes to Kestrel resolved it, although I have no idea which specific change that might have been. I have gone through the changelogs a few times and didn't see anything that specifically jumped out at me.
Using localhost and Tomcat 7, I'm seeing 600-800ms per request in Chrome Developer Tools for a specific webapp. The requests are JS files, CSS files, images, or the initial server response. Some responses are less than 1KB, others are over 100KB.
As a result, it's taking around 10 seconds to load one page of the webapp. When I load the same webapp on our production server, it's taking less than 1 second to load an entire page.
I'm not sure where to continue debugging the issue...
- I've ruled out a browser issue by testing in Safari too.
- I've turned it off and on again, which reduced responses to 500-600ms overall.
- I've cleared out my log files.
- I've ruled out the webapp's frontend entirely by hitting a resource directly, e.g. http://ts.xyz.com:9091/1.0/toolsList/javascript/toolsList.js or http://ts.xyz.com:9091/awake
- I've tested another webapp, and that performs lightning-quick.
So it has to be this particular app, and it has to be something local.
I've seen this kind of behaviour a long time ago, when the web server (Apache httpd back then) was configured to do DNS lookups for its logs - these took an awfully long time, especially when an IP could not be resolved. As it doesn't make sense for a localhost app to be orders of magnitude slower (especially when serving static resources), I'd check for any network-related issues: database connections, logging configuration, DNS lookups, TLS server trust issues (with backends, the database, LDAP or others).
I can't decide whether to add this as "if everything else fails" or as "but first try this:"... you decide:
Compare the setup of your production server with your development server (localhost) and make extra extra extra sure that there's no meaningful difference.
I have an ASP.NET MVC 4 Web API interface that gets about 54k requests a day.
http://myserv.x.com/api/123/getstuff?whatstuff=thisstuff
I have 3 web servers behind a load balancer that are set up to handle the HTTP requests.
On average, response times are ~300ms. However, lately something has gone awry (or maybe it has always been there), as response times sporadically come back in 10-20 seconds. This happens for the same request hitting the same server directly, instead of going through the load balancer.
GIVEN:
- The system has been passed down to me, so there may be gaps in the IIS configuration, etc.
- Database: SQL Server 2008R2
- Web Servers: Windows Server 2008R2 Enterprise SP1
- IIS 7.5
- Using MemoryCache aggressively for model and business objects, with eviction set to 2 hrs
- Looked at the logs but really don't see anything significantly relevant
- One application pool...no other LOB applications running on this server
Assumptions & Ask:
I'm thinking that something is recycling the application pool, or that IIS worker threads are shutting down and restarting, causing each new request to warm up and re-cache everything. It's so sporadic that it's tough to troubleshoot right now. The same request to the same server comes back fast as expected (back-to-back N requests) in about 300ms, since it is cached... but wait about 5, 10, or 20 minutes and that same request to the same server takes 16 seconds.
I have limited tracing to go by, as these are production systems, so I can only expose so much logging detail. Any help, or information from anybody else who has run into this or similar behavior, is appreciated. Thanks.
UPDATE:
The w3wp.exe process grows to ~3 GB. Somehow it gets wiped out and the PID changes, so either it or something else is killing it every 3-4 minutes, and I see tons of warnings in my web server (IIS) log:
A process serving application pool 'MyApplication' suffered a fatal communication error with the Windows Process Activation Service. The process id was '1732'. The data field contains the error number.
After 4-5 days of weighing IIS and configuration problems against internal code issues, I finally found the problem, with little to no help from the WinDbg or DebugDiag IIS tools. Those tools contain so much information, even with mini dumps or log trace stacks, that they can be full of red herrings. The best bet was to reproduce it by setting up an intelligently copied instance of a production system, which we did not have at the time; it took a while for ops to set something up.
Needless to say, the problem had to do with over-caching business objects. There was a race condition where updates on a certain table were updating an attribute on the corresponding business object (updates were coming from multiple servers), which caused an OOC stack overflow that pretty much made the caching recursively cache itself to death, killing the w3wp.exe process and pseudo-recycling it. It was one of those edge cases that was incredibly hard to test and reproduce in a non-production environment.
I have a site that is moving incredibly slowly right now. Both Safari's inspector and Firebug are reporting that most of the load time is due to latency. The actual download is happening in less than a second. There's a lot of database activity in play (though the metrics on that indicate that it's pretty healthy), but what else can cause really high latency? Is it a purely network thing or are there changes I can make to the app to improve the latency numbers?
I'm using YSlow to help identify performance improvements, but on the whole, I don't see it reporting anything that seems crazy unreasonable. Opportunities for improvement, certainly, but nothing that seems like it would cause the huge load times I'm seeing.
Thanks.
UPDATE
Some background and metrics, in case they're useful. This is a CakePHP application, and I'm using my UsersController::login action as the benchmark. To identify how much of a factor the application code is, I've printed a stack trace immediately upon entering UsersController::beforeFilter(). Here's the output:
UsersController::beforeFilter() - APP/controllers/users_controller.php, line 13
Controller::startupProcess() - CORE/cake/libs/controller/controller.php, line 522
Dispatcher::_invoke() - CORE/cake/dispatcher.php, line 187
Dispatcher::dispatch() - CORE/cake/dispatcher.php, line 171
[main] - APP/webroot/index.php, line 83
Load times, as shown by Safari's inspector, range from 11.2 seconds to 52.2 seconds. This would seem to point me away from the application code and toward something with my host, but maybe I'm completely misinterpreting this or oversimplifying it?
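One way to make that split concrete (a minimal sketch, assuming the CakePHP 1.x layout shown in the trace above) is to record a timestamp before the framework boots and log the elapsed time once the controller is reached; if that number is small while the browser reports 11-52 seconds, the time is going to the network or the host rather than the application code:

    // 1) At the very top of app/webroot/index.php, before CakePHP bootstraps:
    define('REQUEST_START', microtime(true));

    // 2) In UsersController::beforeFilter() in app/controllers/users_controller.php:
    function beforeFilter() {
        parent::beforeFilter();
        // Server-side time from the start of index.php to reaching the controller.
        error_log(sprintf(
            'Reached beforeFilter() after %.3f s',
            microtime(true) - REQUEST_START
        ));
    }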
If you cannot directly identify a slow-moving component of your application, there are a number of other steps along the way that can certainly slow your site down. Whenever I'm experiencing unusually long polling, I typically start by looking at the local DNS and then move on to my hosted DNS. Sometimes a cache refresh (on their part, not yours) can cause a lot of polling until their database has caught up.
Otherwise, they might actually have a service outage and your requests are being made to their secondary or backup server. If everything seems fine in terms of domain resolution, your hosting provider might be experiencing a service outage, which can take a number of different shapes, like serving static content from their backups or over-allocating shared resources until everything is running as it should. You can experience a lot of what they call throttling on shared cloud architectures when they have a box go down. On the plus side, you don't have a total outage in that circumstance.
One time, and this was just in a shared grid configuration, I had a processor go to hell. The bizarre part was that static content was still being served from a backup, but it was still polling against our database (which was on a different server) and causing our account to throttle because of over-allocation on the backup. It wasn't our fault, but the host started sending nasty emails about our excessive long polls. The moral of the story is: if it's not your application, and it's out of the blue, somewhere along the line I'll bet you'll find some hardware failure or misconfiguration.
Also, now that I think of it, if you are syndicating outside content (be it server- or browser-side), it might not be in your chain of responsibility at all. If you are serving ads, for example, from a subscriber service, they might be having a high-load period or an outage. These are just the steps that I would take to narrow down the culprit.
This probably won't be the solution for you, but when I had a dog-slow Safari (and Firefox too), I simply changed my DNS servers to OpenDNS (208.67.222.222, 208.67.220.220) and all my problems were resolved.
Is anybody using Windows AppFabric Server for out-of-process state management?
Any feedback, advice would be appreciated.
We are using AppFabric Caching. We tried this and it appeared to work and was easy to set up, etc. There are some very strange settings about persistence when setting up the cache, which need to be read carefully.
Our issue was that on two servers we installed IIS and AppFabric Caching and told the app to try the local one first. When we went into production, it just started to fail. It appears that with only two servers there is a lead server, and if it goes down things stop working; we read that we needed to scale to 3 or more servers to get the behaviour we wanted. That's not an option when you've just gone live and things aren't working, so we switched to SQL Server for now while we look at NCache, ScaleOut and Memcached.
The other issue is that caching and session state are not the same animal: if you lose your cache, it should not be the end of the world, you just put it back together; we need to keep session state for the allotted time period at all costs.