How to resolve recurring OBIEE 500 errors?

We use a SOAP call to invoke an Oracle OBIEE report from a third-party scheduler. On some reports, I get an HTTP 500 error from the Oracle report server after 9 minutes 31 seconds. I have verified it is not a connect-timeout or read-timeout issue. We can re-run these jobs once or twice and they eventually succeed. Other reports run much longer and succeed on the first try.
Service is https://server:port/analytics-ws/saw.dll?SoapImpl=nQSessionService
When the same reports are run from the Oracle scheduler, they seem to be 100% successful on the first execution. Likewise, when they are run from the design GUI, they are 100% successful. None of the servers (db, reporting) seems to show high CPU or memory usage at the time, but the errors do seem more frequent when more reports are running concurrently.
Has anyone run into this odd behavior? Or have any suggestions?
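For context, the invocation is essentially an HTTP POST of a SOAP envelope to the endpoint above, and re-running once or twice eventually succeeds. A minimal retry sketch in Java; the envelope body is a placeholder, the host and port must be filled in, and the retry budget and pause are illustrative, not a recommendation:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class ObieeReportCall {
    // Endpoint from above; fill in the real host and port before running.
    private static final String ENDPOINT =
            "https://server:port/analytics-ws/saw.dll?SoapImpl=nQSessionService";

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(30))
                .build();

        // Placeholder: the real logon/report SOAP envelope goes here.
        String envelope = "<soapenv:Envelope>...</soapenv:Envelope>";

        HttpRequest request = HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "text/xml; charset=utf-8")
                .timeout(Duration.ofMinutes(15)) // well past the 9m31s failure point
                .POST(HttpRequest.BodyPublishers.ofString(envelope))
                .build();

        // Bounded retry: re-running once or twice eventually succeeds,
        // so mirror that with a small retry budget and a pause between tries.
        for (int attempt = 1; attempt <= 3; attempt++) {
            HttpResponse<String> resp =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (resp.statusCode() < 500) {
                System.out.println("Succeeded on attempt " + attempt);
                return;
            }
            System.err.println("Attempt " + attempt + " returned HTTP " + resp.statusCode());
            Thread.sleep(30_000);
        }
        throw new IllegalStateException("Report still failing after 3 attempts");
    }
}
```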

Related

503 error while running JMeter with 400 threads: is it because of server issues?

When I run the thread group with 100 users and a ramp-up period of 25 seconds it works fine, but with 400 users it returns a 503 error. Is this because of server issues?
Given that you have no issues with 100 users but do have issues with 400, it is most probably a server-side overload, so congratulations on finding the bottleneck.
You can either report it as is or dig a little deeper to find the root cause. Suggested steps:
Instead of kicking off all 400 users at once, increase the load gradually while watching the Response Times vs Threads and Transaction Throughput vs Threads charts. Ideally, response time should stay flat and throughput should grow as the number of threads increases. When response time starts rising and throughput starts dropping, you have reached the saturation point, and at that stage you can state the maximum number of users your application can support (see the sketch after this list).
Check your application logs and configuration, as it may not be properly tuned for high loads; you can use 15 Simple ASP.NET Performance Tuning Tips as a reference, or look for a similar guide for your application's technology stack.
Ensure that your application has enough headroom to operate in terms of CPU, RAM, network, etc., as it may simply be a lack of resources; this can be checked with, for example, the JMeter PerfMon Plugin.
Repeat your test with profiler telemetry in place; that way you will be able to localize the problem and pinpoint exactly where the problematic piece of code or inefficient algorithm lives.
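As a rough, tool-agnostic illustration of that ramp-up idea, here is a minimal Java probe that steps the concurrency level and prints average latency and throughput at each step; the target URL and step sizes are placeholders, and a real test would of course stay in JMeter with a proper thread schedule:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class SaturationProbe {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/")).build(); // placeholder target

        // Step the load up gradually instead of jumping straight to 400 users.
        for (int threads : new int[] {50, 100, 200, 400}) {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            AtomicLong totalNanos = new AtomicLong();
            AtomicLong errors = new AtomicLong();
            int requests = threads * 10;
            CountDownLatch done = new CountDownLatch(requests);

            long start = System.nanoTime();
            for (int i = 0; i < requests; i++) {
                pool.submit(() -> {
                    long t0 = System.nanoTime();
                    try {
                        HttpResponse<Void> r =
                                client.send(req, HttpResponse.BodyHandlers.discarding());
                        if (r.statusCode() >= 500) errors.incrementAndGet();
                    } catch (Exception e) {
                        errors.incrementAndGet();
                    } finally {
                        totalNanos.addAndGet(System.nanoTime() - t0);
                        done.countDown();
                    }
                });
            }
            done.await();
            pool.shutdown();

            double elapsedSec = (System.nanoTime() - start) / 1e9;
            System.out.printf("threads=%d avgMs=%.1f throughput=%.1f/s errors=%d%n",
                    threads,
                    totalNanos.get() / 1e6 / requests,
                    requests / elapsedSec,
                    errors.get());
        }
        // The saturation point is the step where avgMs starts climbing
        // while throughput stops growing.
    }
}
```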
If the server isn't down or being restarted, then yes, a 503 indicates overload.
Common causes are a server that is down for maintenance or one that is overloaded.
You need to find out what stops the server from serving 400 concurrent requests/users.
Note that if you are testing in an environment that isn't equal or similar to the production environment, the results may not reflect the load that the production server can endure.

oci8 driver: bad connection intermittently

I have been using oci8 for over a year now for several batch processes, where I made Oracle calls on a fixed schedule without any high number of parallel requests. Recently I started using this driver to process many user requests in parallel using goroutines. The connections succeed about 90% of the time, but for the remaining 10% the driver throws a driver: bad connection error. This generally happens in two situations:
When the connection was left idle for too long (happens for a few requests).
When there is a spike in the number of connections.
Actions taken:
I have already checked my Oracle DB for connection/session limits; there is no such limit set.
I tried forking the repository and adding error logs, but my changes didn't compile.
Most people who have faced this issue mention incorrect handling of multiple simultaneous connections; in my case, that is handled by oci8 itself.
Please help!
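For reference, both symptoms (long-idle connections and connection spikes) usually point at connection-pool hygiene; in Go's database/sql the knobs are SetConnMaxLifetime, SetMaxIdleConns and SetMaxOpenConns on the *sql.DB. The same idea is sketched below in Java with HikariCP, purely as an analogue of the tuning, not the oci8 API; all names and limits are illustrative:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolHygiene {
    public static HikariDataSource newPool() {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl("jdbc:oracle:thin:@//dbhost:1521/ORCL"); // placeholder DSN
        cfg.setUsername("app");    // placeholder
        cfg.setPassword("secret"); // placeholder

        // Retire connections before a firewall or the server silently drops
        // them, which is the usual cause of "bad connection" after long idles.
        cfg.setMaxLifetime(10 * 60 * 1000L); // 10 minutes
        cfg.setIdleTimeout(2 * 60 * 1000L);  // 2 minutes

        // Cap concurrency so a spike queues for a connection instead of
        // opening a flood of fresh sessions (the second failure mode above).
        cfg.setMaximumPoolSize(20);

        return new HikariDataSource(cfg);
    }
}
```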

Openshift proxy timeout

I have an application hosted on OpenShift and I need it to generate some Excel reports. The report generation process can take a long time (over 5 minutes), which causes the client to see a 502 error when the request times out. How and where can I configure my OpenShift stack (a Java webapp running on Tomcat 6) to increase the timeout duration?
5 minutes is an awfully long time for a web request to run. It would be better to have the web request schedule a background job that then notifies the user when the report is done being generated.
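A minimal sketch of that pattern in plain Java (class and names are illustrative): the web request submits the report job to a worker pool and returns a job id immediately, and the client polls a cheap status endpoint instead of holding the connection open for five minutes.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ReportJobService {
    public enum Status { RUNNING, DONE, FAILED }

    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final Map<String, Status> jobs = new ConcurrentHashMap<>();

    /** Called from the web request: returns in milliseconds, not minutes. */
    public String submit(Runnable generateExcelReport) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, Status.RUNNING);
        workers.submit(() -> {
            try {
                generateExcelReport.run();  // the slow (5+ minute) work
                jobs.put(id, Status.DONE);  // e.g. file written to shared storage
            } catch (RuntimeException e) {
                jobs.put(id, Status.FAILED);
            }
        });
        return id;
    }

    /** Called from a lightweight polling endpoint. */
    public Status status(String id) {
        return jobs.getOrDefault(id, Status.FAILED);
    }
}
```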

Spring @Scheduled After A Server Restart

I'm creating a mechanism in my web server whereby a scheduled task will execute every 15 minutes and notify users if any activity has occurred within that time frame. It would work as follows:
Annotate a method with @Scheduled and schedule it to run every 15 minutes
When the task runs, scrape the database for any changes within 15 minutes of the current time
A couple of problems I can see:
If I have to restart the server and it's down for longer than 15 minutes, I would need to look back further than 15 minutes so that no activity is missed.
I'm running a number of Tomcat servers and only one of them needs to execute the task; otherwise, duplicate emails will be sent to users.
Has anyone dealt with this before? I'm thinking that this should really be a task external to the web servers... that would solve the issue of duplicate emails being sent, but it wouldn't solve the server bounce issue.
Any ideas on how to solve would be greatly appreciated!
I would perform the scheduling as follows (sketched below):
On application startup, query the database for tasks (only those whose dirty flag is not yet set) and schedule them.
On each run of a scheduled task, set its dirty flag to record that the task has run.
Because only tasks that are not yet marked dirty are retrieved, the issue of duplicate emails should not occur, even on server startup.
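One way to realize that for a recurring 15-minute task, sketched with Spring and JdbcTemplate; the task_run table and its columns are illustrative, and @EnableScheduling is assumed on a configuration class. The atomic UPDATE doubles as a cross-server claim, so only the Tomcat instance that wins it sends the emails, and the stored last_run timestamp covers downtime longer than 15 minutes:

```java
import java.sql.Timestamp;
import java.time.Instant;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ActivityNotifier {
    private final JdbcTemplate jdbc;

    public ActivityNotifier(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    @Scheduled(fixedRate = 15 * 60 * 1000) // every 15 minutes
    public void notifyUsers() {
        // Claim this window: the atomic UPDATE flips the dirty flag, so when
        // several Tomcat instances fire at once, only the winner proceeds.
        int claimed = jdbc.update(
                "UPDATE task_run SET dirty = 1 WHERE id = 1 AND dirty = 0");
        if (claimed == 0) {
            return; // another instance already claimed this window
        }

        // Look back to the recorded last_run rather than a fixed 15 minutes,
        // so activity during a long server bounce is not missed.
        Timestamp since = jdbc.queryForObject(
                "SELECT last_run FROM task_run WHERE id = 1", Timestamp.class);
        // ... query activity newer than `since` and send notification emails ...

        // Record this run and release the flag for the next window.
        jdbc.update("UPDATE task_run SET last_run = ?, dirty = 0 WHERE id = 1",
                Timestamp.from(Instant.now()));
    }
}
```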

Can I exclude counts from failed webtests in a VS.Net 2010 Loadtest?

I am using Visual Studio 2010 Ultimate to perform load tests. These load tests use recorded web tests.
When running a load test with an increasing number of concurrent users, some steps in my web tests start to fail. The first error is often an internal server error 500. This gives a wrong impression of the average page load time, because these internal server errors are often returned very quickly, in contrast to the time needed to generate a successful response. So, as the load increases, the average page load time drops: for example, if successful pages take 2 s and 20% of requests instead fail fast at 50 ms, the average falls to about 1.6 s even though the site is actually degrading.
Of course, I need to attend to these internal server errors, but in the meantime, I would like to exclude failed webtests from my measurements.
Does anybody know if this can be done?
Thanks in advance.
It may be possible to run your own query on the test results database that ignores errors, but even that will be inaccurate.
Remember that the page return stats are really only useful when read in conjunction with the load on the hardware.
Essentially, the load test records the effect of a given load on your hardware. If your website is returning a large number of 500 error pages quickly, the load on the hardware will be affected, and any page stats will reflect the change in server loading.
You will have to investigate the cause of the 500 errors and either fix the issue or report in your load test results that once a load of 'x' is reached on the servers, pages 'y' will return an internal server error 500 instead of the requested page.
This gives the business owners of your app some information with which to decide whether to fix the problem or live with it.