I have hosted a state machine Worklow as a WCF service..And the workflow is called in an ASP.NET code. I used netTcpContextBinding for workflow hosting. Problem is that if a SendRecieve activity within the workflow is taking a lot of time (say 1 minute) to execute, then it will show transaction aborted error and will terminate.. i have already set the binding values for send, recieve, open, close timeouts to maximum values in both web.config and the app.config..
How can i overcome this issue?
A TransactionScope has a default timeout of 60 seconds so if whatever you are doing in there takes longer it will time out and abort. You can increase the timeout on the TransactionScope but quite frankly the 60 seconds is already quite long. In most cases you are better of at doing any long running work to collect data before the transaction and keep your transaction time as short as possible.
Related
We're currently using H2 version 199 in embedded mode with default nio file protocol and MVStore storage system. The write_delay parameter is set to 60 seconds.
We run a batch insert/update/delete of about 30.000 statements within 2 seconds (in one transaction) followed by another batch of a couple of hundred statements only 30 seconds later (in a second transaction). The next attempt to open a db connection (only 2 minutes later) shows that the DB is corrupt:
File corrupted while reading record: null. Possible solution: use the recovery tool [90030-199]
Since the transactions occur within a minute, we wonder whether the write_delay of 60 seconds might be contributing to the issue.
Changing write_delay to 60s (from a default value of 0.5s) will definitely increase your risk of lost transactions, and I do not see a good reason for doing it. Should not cause a db corruption, though. More likely some thread interruptions do that, since you are running in the same JVM a web server and who knows what else. Using async file store might help in that area, and yes it is stable enough (how much worse it can go for your app, than a database corruption, anyway).
I'm load testing a system with 500 virtual users. I've kept the "Ramp-Up period (in seconds)" option to zero. So, what I understand, JMeter will hit the system with 500 virtual users all at the same time. Please correct me if I'm wrong here.
Now, the summary report shows the average response time for the first page is ~100 seconds!. Which is more than a minute and a half of wait time. But while the JMeter is running, I manually went to the same page/url using a browser and didn't have to wait for that long. It was not even close, the page response was almost immediate for me.
My question is: is there any known issue for the average response time of the first page? Is it JMeter which is taking long to trigger that many users?
Thanks in advance.
--Ishtiaque
There is no issue in Jmeter related to first page response time.
Summary Report shows all response time details in Milliseconds, the value "100" seconds have you converted milliseconds to seconds?
Also in order to make sure that 500 users hit concurrently, use Synchronizing Timer.
Hope this will help.
While the response times will be accurate, you need to consider the affect of starting so many threads at once on both your server and your client.
500 threads to start at once is not insignificant n the client. If your server has the connections, it will start 500 threads as well.
Ramping over a period of time is more realistic loadwise, but still not really indicative of server capability until the threads have all started and settled in.
Databases can also require a settling in period which can affect response times.
Alternative to ramping is introducing a random wait at the start of each thread before firing the first sample. You can then choose not to ramp over time, but still expect resources on the client to suddenly come under load and change the settings if you hit limits. This will make the entire run much more realistic of typical behaviour. However, you need to determine if your use cases are typical.
Although the heap size is increased, i notice there is still longer time as compared to actual response time. Later i realised it was the probe effect (the extra time a tool generates due to test execution)
We are seeing inconsistent performance on Heroku that is unrelated to the recent unicorn/intelligent routing issue.
This is an example of a request which normally takes ~150ms (and 19 out of 20 times that is how long it takes). You can see that on this request it took about 4 seconds, or between 1 and 2 orders of magnitude longer.
Some things to note:
the database was not the bottleneck, and it spent only 25ms doing db queries
we have more than sufficient dynos, so I don't think this was the bottleneck (20 double dynos running unicorn with 5 workers each, we get only 1000 requests per minute, avg response time of 150ms, which means we should be able to serve (60 / 0.150) * 20 * 5 = 40,000 requests per minute. In other words we had 40x the capacity on dynos when this measurement was taken.
So I'm wondering what could cause these occasional slow requests. As I mentioned, anecdotally it seems to happen in about 1 in 20 requests. The only thing I can think of is there is a noisy neighbor problem on the boxes, or the routing layer has inconsistent performance. If anyone has additional info or ideas I would be curious. Thank you.
I have been chasing a similar problem myself, with not much luck so far.
I suppose the first order of business would to be to recommend NewRelic. It may have some more info for you on these cases.
Second, I suggest you look at queue times: how long your request was queued. Look at NewRelic for this, or do it yourself with the "start time" HTTP header that Heroku adds to your incoming request (just print now() minus "start time" as your queue time).
When those failed me in my case, I tried coming up with things that could go wrong, and here's a (unorthodox? weird?) list:
1) DNS -- are you making any DNS calls in your view? These can take a while. Even DNS requests for resolving DB host names, Redis host names, external service providers, etc.
2) Log performance -- Heroku collects all your stdout using their "Logplex", which it then drains to your own defined logdrains, services such as Papertrail, etc. There is no documentation on the performance of this, and writes to stdout from your process could block, theoretically, for periods while Heroku is flushing any buffers it might have there.
3) Getting a DB connection -- not sure which framework you are using, but maybe you have a connection pool that you are getting DB connections from, and that took time? It won't show up as query time, it'll be blocking time for your process.
4) Dyno performance -- Heroku has an add-on feature that will print, every few seconds, some server metrics (load avg, memory) to stdout. I used Graphite to graph those and look for correlation between the metrics and times where I saw increased instances of "sporadic slow requests". It didn't help me, but might help you :)
Do let us know what you come up with.
I have an application running in CF8 which does calls to external systems like search engine and ldaps often. But at times some request never gets the response and is shown always in the the active request list.
Even tho there is request timeout set in the administration, its not getting applied to these scenarios.
I have around 5 request still pending to be finished for the last 20hours !!!
My server settings are as below
Timeout Requests after ( seconds) : 300 sec
Max no of simultaneous requests : 20
Maximum number of running JRun threads : 50
Maximum number of running JRun threads : 1000
Timeout requests waiting in queue after 300 seconds
I read through some articles and found there are cases where threads are never responded or killed. But i dont have a solid solution how can i timeout this or kill this automatically
really appreciated if you guys have some idea on this :)
The ColdFusion timeout does not apply to 'third party' connections.
A long-running LDAP query, for example, will take as long as it needs. When the calling template gets the result from the query your timeout will apply.
This often leads to confusion interpreting errors. You will get an error saying that whichever function after the long running request causes the timeout.
Further reading available here
You can (and probably should) set a timeout on the CFLDAP call itself. http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7f97.html
Thanks, Antony, for recommending my blog entry CF911: Lies, Damned Lies, and CF Request Timeouts...What You May Not Realize. This problem of requests not timing out when expected can be very troublesome and a surprise for most.
But Anooj, while that at least explains WHY they don't die (and you can't kill them within CF), one thing to consider is that you may be able to kill them in the REMOTE server being called, in your case, the LDAP server.
You may be able to go to the administrator of THAT server and on showing them that CF has a long-running request, they may be able to spot and resolve the problem. And if they can, that may free the connection from CF and your request then will stop.
I have just added a new section on this idea to the bottom of that blog entry, as "So is there really nothing I can do for the hung requests?"
Hope that helps.
My application that wraps around Oracle Data pump's executables IMPDP and EXPDP takes random amounts of time for the same work. On further investigation, I see it waiting for again random amounts of time with the event 'wait for unread message on broadcast channel'. This makes the application take anytime b/w 10 minutes to over an hour for the same work.
I fail to understand if this has something to do with the way my application uses these executables, or it has got something to do with Load on my server or something totally alien to me.
There's a bunch of processes and sessions involved in a data pump operation.
I suspect you are looking at the master processes, not at the worker processes. So all that event is saying is that the Master process spends more time waiting for the worker process when the job takes longer. Which is fairly useless information.
You need to monitor the worker processes and see why they are taking longer.
Those wait events are usually considered to be "idle" waits - i.e. Oracle has nothing to do, it is waiting for further data/instructions.