The following issue occurs in multiple browsers, for multiple users, in both Dev and Prod environments.
While working on 4 specific pages, users report that the page freezes at random moments of the day for a few minutes and then resumes afterwards. This is not easily replicable; it happens only after a few tries.
Front-end: I've already checked for JavaScript loops, AJAX call loops and stack overflows, and none of those seems to be the issue. In Chrome's Network tool I'm getting a "Pending" status for X minutes on the JSON returned from an execute-server-side-code process (a client-side timing sketch follows below).
Database: the execution costs for the processes are all in the single digits. No locks.
ORDS: The only information we were able to get was from ORDS' log, which shows it doing something after X minutes, when the page becomes responsive again.
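For reference, here's roughly how the call can be timed from the client so the console records exactly when it stalls. This is a minimal sketch, not specific to APEX/ORDS, and '/your/ords/endpoint' is a placeholder for the real URL:

// Minimal sketch: wrap the suspect call so the console records when the
// request started and when (or whether) the response arrived.
function timedFetch(url) {
  var started = performance.now();
  console.log('[' + new Date().toISOString() + '] request started: ' + url);
  return fetch(url)
    .then(function (response) {
      var secs = ((performance.now() - started) / 1000).toFixed(1);
      console.log('[' + new Date().toISOString() + '] ' + url + ' answered in ' + secs + 's (HTTP ' + response.status + ')');
      return response.json();
    })
    .catch(function (err) {
      var secs = ((performance.now() - started) / 1000).toFixed(1);
      console.error('[' + new Date().toISOString() + '] ' + url + ' failed after ' + secs + 's', err);
      throw err;
    });
}

timedFetch('/your/ords/endpoint').then(function (data) { console.log(data); });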
This all leads me to think that something in ORDS is having an issue (or is showing a symptom of one), or that it's the network.
Has anyone been through a similar issue?
Related
Across all browsers/devices, I find that random pages, at random times, are very slow to load or don't load at all. The browser gets stuck on 'Waiting for website.com'. I will wait 20 seconds and nothing will happen until I manually refresh the page. As I realise this is very vague, can you suggest a) the most likely issues to look for first, or b) some diagnostic tools I could use as a starting point to try to debug the issue, so that my hosts/developers can solve it? Here are some results of recent speed tests.
One thing to add is that it seems to get stuck more often on particular pages, namely the pages where users take practice tests. Each time the user clicks 'Next', their selected answer is inserted into the database. My speculation is that it's potentially an issue with the DB itself, or with the process that inserts into the database. It's when clicking 'Next' that the whole website sometimes just dies, as described above.
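To help narrow that down, here's a rough sketch that flags any network request slower than a threshold, which would confirm whether the 'Next' POST is the thing that stalls. Paste it into the browser console on an affected page (modern browsers only; the 5-second threshold is an arbitrary assumption, adjust as needed):

// Rough sketch: log every resource request that exceeds SLOW_MS,
// so the stalled request identifies itself by URL.
var SLOW_MS = 5000; // arbitrary threshold

new PerformanceObserver(function (list) {
  list.getEntries().forEach(function (entry) {
    if (entry.duration > SLOW_MS) {
      console.warn('slow request: ' + entry.name + ' took ' + (entry.duration / 1000).toFixed(1) + 's');
    }
  });
}).observe({ entryTypes: ['resource'] });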
Results from Google Speed Test
Waterfall image
A wait time of 20 seconds at random times on random pages could be due to stop-the-world garbage collection, so GC logs are probably a good starting point (on a HotSpot JVM, flags along the lines of -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log will produce them).
A thread sampler such as Djigger, which a colleague of mine wrote, might also help you figure out what the machine is doing during those 20 seconds.
If that doesn't help, I suggest using a profiler or, better, an APM tool to monitor what's going on in your system. Those tools give you a broader insight into the internals.
You need to run a few page speed tests and look at the waterfall images.
It is very common on shared servers for the server to be too busy to get to your request. 20 seconds would indicate a serious issue with the server.
Another common reason is that the page links to a third-party resource which is too often unavailable.
In your case the culprit is website.com and I assume that is your site.
Use something like webpagetest.org to run the tests.
In the waterfall image below:
Dark Green is DNS lookup time.
Orange is the time for Browser to connect to server.
Green is the wait time for server to put image in output buffer.
Blue is the time for the server to transmit to the Browser.
The problem with the sample waterfall page is that the index page took 4 seconds to be generated or retrieved. Most likely this is a WordPress site with plugins.
I suspect yours may be 20 seconds. But given the randomness, it is also quite possible that a page resource is stalling.
If it is the index page, then you likely have a poor ISP, and/or one of the other users of the server is hogging the CPU.
Keep running the tests until you see the problem occur.
It will be very obvious where the problem is located.
You can post the waterfall image and send me a message if you have any questions.
Waterfall from webpagetest.org
Several times per day (though we cannot reproduce it ourselves), we're seeing instances of sessions being dropped.
What I mean is I have logs of the user coming to the site, performing a few requests, and then having each of their next few requests get a different session identifier and thus wiping out everything in their session. Same IP, same browser, and all of this happens in the course of a couple seconds. The session timeout is configured to 20 minutes.
It doesn't appear to be related to a specific browser, as users have claimed coworkers don't experience the issue on the same machine.
What's really bizarre is that for some requests I can clearly see one session ID coming in through CGI.HTTP_COOKIE and another one is assigned during the course of the request (by the time we get an error email, which is caused by their lack of session). WTF?
To my knowledge, nothing in our application code could be causing this. We use session variables of course, but don't wipe or reset the session ID cookies. I was under the impression that's completely handled by the server.
I'm ripping my hair out here. Any ideas on even how to go about debugging this would be appreciated.
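If you want to see exactly when the ID flips from the client side, here is a small sketch you can paste into the browser console. The cookie name 'SESSIONID' is a placeholder; substitute whatever your server actually sets, and note this only works if the session cookie is not HttpOnly:

// Sketch: poll the session cookie and log every change, so you can line
// up the moment the ID flips with your server-side request logs.
var COOKIE_NAME = 'SESSIONID'; // placeholder; use your real cookie name

function readCookie(name) {
  var match = document.cookie.match(new RegExp('(?:^|;\\s*)' + name + '=([^;]*)'));
  return match ? match[1] : null;
}

var lastValue = readCookie(COOKIE_NAME);
console.log('initial session cookie:', lastValue);

setInterval(function () {
  var current = readCookie(COOKIE_NAME);
  if (current !== lastValue) {
    console.warn('[' + new Date().toISOString() + '] session cookie changed: ' + lastValue + ' -> ' + current);
    lastValue = current;
  }
}, 1000);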
I have a Drupal 6 site that is frequently (about once a day) going down. The hosting provider is reporting that something in our site code is occupying all Apache threads but keeping them idle, making the server run out of threads to respond to new requests. A simple restart of Apache frees the threads and fixes the issue, though it reoccurs within a few hours or a day.
I have no idea how to troubleshoot this issue and have never come across PHP code doing this. Is there some kind of Apache settings change I can make to capture more information about what might be keeping a thread occupied but idle? What typical PHP routines can cause this behavior? I looked for code that connects to external resources, but didn't see any issues there.
Any hints on what to look at, how to capture more information, or what PHP code could cause this would be most useful.
With Drupal 6 you could have the poormanscron module running from time to time, or even classic cron (via crontab and wget, or similar).
One heavy cron operation could then put your database under heavy load. If your database response time becomes very slow, every HTTP request will become very slow (sessions, for example, live in the database, and several hundred queries are required for a Drupal page). With all requests slowing down, all the available PHP processes may end up in an 'occupied' state.
Restarting Apache stops all current processes. If you run cron via wget and not via drush, cron tasks are a good thing to check (running cron via drush would make it run via php-cli, so restarting Apache would not kill the cron). You can try a module like Elysia Cron to get more details on cron tasks and maybe isolate the long ones (it gives you a report on task durations).
This effect (one request hurting the database badly, all requests slowing down, no more processes available) could also be caused by one bad piece of code in any of your installed modules. That would be harder to detect.
So I would ensure slow queries are tracked in MySQL (see the my.cnf options slow_query_log and long_query_time), then analyse these queries with tools like mysqlsla. The problem is that sometimes one query is so big that all queries become slow, so use the time of the crash to detect the first ones. Also use the MySQL option log_queries_not_using_indexes to track queries not using indexes.
Another way to get all Apache processes stalled on PHP operations with Drupal is a lock problem. Drupal uses its own lock implementation on top of MySQL. You could maybe add some watchdog() calls (Drupal's internal debug messages) in those files to try to detect lock problems.
You could also have some external HTTP requests made by Drupal: calls to external websites like Facebook, Google, some tiny-URL tools, or drupal.org's module update checks (which always try to find all modules, even the ones you wrote). If the remote website is down or filtering your traffic you'll have problems (but the Apache restart would not help with that, so it may not be the cause).
I am using Meteor on Heroku (free tier) with MongoHQ. My app is very simple right now, it loads 3-4 entries from a Collection, but when I deploy it to Heroku, I am seeing ridiculous load times (1-2 minutes). The HTML is rendered immediately. When I deploy to Meteor.com's free server, load times are a lot lower but still about 15 seconds for 4 tiny pieces of data. I'm not seeing this whatsoever when I deploy locally, app pulls data from the DB right away.
It is worth noting that I don't think it's an "idling" issue for Heroku. Even if I already have one browser window with the app just opened, if I use a different browser and try again I still get 1-2 minute load times. Once the data is loaded, however, performance goes back to being great, I can read and write with no problems.
What am I missing? I'm not seeing any errors in the console, mongo shows several queries in the logs and shows that it is responding quickly with 4 documents, but apparently somewhere in the middle there's a traffic jam. Any help with this is greatly appreciated, if I can't get past this Meteor is useless for my needs right now.
UPDATE: I've been watching it closely in Firebug, and it looks like the performance is largely inconsistent. Sometimes a simple refresh will take 1 minute, sometimes it will take 10 seconds. But what I've noticed is that when it's slow, it GETs the sockjs/info file, then right after that the sockjs POST is aborted (sometimes multiple times). When it runs fast, the POST and subsequent POSTs run smoothly.
Slow:
GET http://pocleaderboard.herokuapp.com/sockjs/info 200 OK 22ms
POST http://pocleaderboard.herokuapp.com/sockjs/029/su0d77fb/xhr Aborted
GET http://pocleaderboard.herokuapp.com/sockjs/info 200 OK 27ms
POST http://pocleaderboard.herokuapp.com/sockjs/132/uljqusxd/xhr Aborted
GET http://pocleaderboard.herokuapp.com/sockjs/info 200 OK 28ms
POST http://pocleaderboard.herokuapp.com/sockjs/154/kcbr6a5p/xhr Aborted
Fast(er):
GET http://pocleaderboard.herokuapp.com/sockjs/info 200 OK 1.08s
POST http://pocleaderboard.herokuapp.com/sockjs/755/xiggb555/xhr 200 OK 1.02s
Meteor loads that fast locally because it doesn't depend on your internet connection: the files can just be read from your hard drive and don't need to be downloaded.
And once the data is loaded, it's the same wherever you host, because the client (you) performs all actions on its cached local copy of the Mongo database and then just waits for the server to say whether the action was alright or not (Meteor's latency compensation).
But as for the Heroku loading times, I have no idea, sorry!
UPDATE:
These are the long-polls from SockJS, which Meteor uses.
Normally these polls only get aborted on a hot code push (when a file is added/changed/removed).
So either you or Heroku seems to be writing or changing something in the app directory, because that can cause Meteor to initiate a hot code push.
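One way to check from the client whether the slow loads line up with reconnect cycles is to watch Meteor's reactive connection status. A small client-side sketch (on older Meteor releases the autorun function lives on Deps rather than Tracker):

// Sketch: log every DDP connection state change, so you can see whether
// the aborted SockJS polls coincide with the client reconnecting.
Tracker.autorun(function () {
  var status = Meteor.status();
  console.log('[' + new Date().toISOString() + '] connection: ' + status.status +
    (status.retryCount ? ' (retry #' + status.retryCount + ')' : ''));
});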
Heroku may not support WebSockets, which means you're stuck with the slower polling approach. See this:
https://devcenter.heroku.com/articles/using-socket-io-with-node-js-on-heroku
Sometimes, perhaps once every few hundred AJAX requests, and/or when AJAX requests are executing in quick succession, I've seen a request hang for up to several minutes before it completes. The weird thing is that the request completes successfully AND neither my machine nor the server is really doing anything (e.g. CPU and other resources are not spiking during the "hang").
I've noticed this issue with various web services too, so it isn't just my own website. It also isn't database related as it has happened on non-database sites. It also only seems to show up in non-localhost environments.
When I personally use AJAX, I am also using jQuery, so this might also be a jQuery issue. I also mostly use Firefox, so I don't know if this is just a Firefox issue or if it is an issue any browser could have. I've run into the issue on multiple computers in multiple locations.
If you have run into this issue before and "fixed" it, I would appreciate the solution you came up with.
Use an HTTP debugger like Firebug or Fiddler to track the AJAX requests: see how much time each takes (a possible timeout issue due to a server setting?) and what HTTP response status code it returns when it fails. Use the status code to troubleshoot the issue.
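You can also give the request a client-side timeout so a hung request surfaces as an explicit failure instead of a silent stall. A minimal jQuery sketch ('/api/endpoint' and the 30-second timeout are placeholders):

// Sketch: an explicit timeout turns a multi-minute hang into a
// 'timeout' error you can log (and retry if appropriate).
$.ajax({
  url: '/api/endpoint',   // placeholder URL
  dataType: 'json',
  timeout: 30000          // fail after 30s instead of hanging
}).done(function (data) {
  console.log('response received', data);
}).fail(function (jqXHR, textStatus, errorThrown) {
  // textStatus will be 'timeout', 'error', 'abort', or 'parsererror'
  console.error('request failed:', textStatus, 'HTTP status:', jqXHR.status, errorThrown);
});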