Is it possible to change the resolution time calculation so that it starts not at the issue creation time, but at the time when an issue was transitioned into a certain state?
The use case is as follows: we use a kanban-ish development method, where we create most issues/features/stories in a backlog upfront; this kills the usefulness of the resolution time gadget. In our case, the lead/resolution time should rather be calculated from the time when an issue is pulled into the selected-issues column.
As this calculation is the basis for multiple gadgets, perhaps it could be changed per gadget, in order to avoid unforeseen issues with the other gadgets?
There is a service level management tool, SLAdiator (http://sladiator.com), which calculates resolution/reaction times based on the duration a ticket has spent in a certain status (or statuses). You can view these tickets online as well as get reports.
I have a Spring scheduler which I will be deploying in two different data centers.
My data centers will run in active and passive mode. I am looking for a mechanism where the scheduler in the passive data center starts working when that data center becomes active.
We could do this by manually changing some configuration to true/false, but I am looking for an automated process.
- Initial state:
Data center A active: Scheduler M is running.
Data center B passive: Scheduler M is turned off.
- Maybe three days later:
Data center A passive: Scheduler M is turned off.
Data center B active: Scheduler M is starting.
I don't know your business requirements, but unless you want multiple instances running with only one active, the purpose of having a load balancer is to spread the load across multiple instances of the same application rather than to stick with only one instance.
Anyway, I think an easy way of doing this without a very sophisticated mechanism (which can bring a lot of complexity, depending on where you run your application) would be the following; a sketch in code follows the list:
1. Have a shared location, such as a semaphore table in your database, storing the ID of the application instance that owns the scheduler process.
2. Have a timeout set for each task. Say, if the scheduler is supposed to run every two minutes, set the timeout to two minutes.
3. Have your schedulers always kick off on all application instances.
4. Once the task kicks off, first check whether this instance owns the processing. If yes, do the work; if not, go to step 7.
5. After doing the work, record the timestamp of the task completion in the semaphore table.
6. Wait for the time of the next kick-off to pass.
7. If this instance does not own the processing, check in the semaphore table when the task last ran. If the time since the last run is greater than the timeout set for that process, take ownership of the process (recording your application instance ID in the semaphore table).
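A minimal sketch of steps 1 to 7 in Spring, assuming a hypothetical single-row OWNER_SEMAPHORE table (columns INSTANCE_ID, LAST_RUN), a JdbcTemplate, and @EnableScheduling on the application; every name here is illustrative, not taken from a real system:

import java.sql.Timestamp;
import java.time.Duration;
import java.time.Instant;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;

public class FailoverScheduler {

    private static final Duration TIMEOUT = Duration.ofMinutes(2); // step 2

    private final JdbcTemplate jdbc;
    private final String instanceId; // unique per deployed instance

    public FailoverScheduler(JdbcTemplate jdbc, String instanceId) {
        this.jdbc = jdbc;
        this.instanceId = instanceId;
    }

    @Scheduled(fixedRate = 120_000) // step 3: kicks off on every instance
    public void tick() {
        String owner = jdbc.queryForObject(
                "SELECT instance_id FROM owner_semaphore", String.class);
        if (instanceId.equals(owner)) {                      // step 4
            doWork();
            jdbc.update("UPDATE owner_semaphore SET last_run = ?",
                    Timestamp.from(Instant.now()));          // step 5
        } else {                                             // step 7
            Timestamp lastRun = jdbc.queryForObject(
                    "SELECT last_run FROM owner_semaphore", Timestamp.class);
            if (lastRun.toInstant().plus(TIMEOUT).isBefore(Instant.now())) {
                // owner looks dead: take ownership
                jdbc.update("UPDATE owner_semaphore SET instance_id = ?",
                        instanceId);
            }
        }
    }

    private void doWork() { /* the actual scheduled job */ }
}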
We applied this and it ran very well with one of our applications. In reality it was much more complex than explained above, as we had a lot of application instances and we had to avoid starting an ownership battle between them. To address this we put in place a "permission to process" request concept, so no matter how many instances wanted to take control, only one was granted it.
For another application with similar requirements we used a much easier way to achieve this, but the price we paid was some extra learning curve in using ILock from the Hazelcast IMDG framework. It is really very easy, but keep in mind that the Hazelcast community edition comes with absolutely no security, and paying for a Hazelcast license just to achieve this may be a bit of an expense.
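For the Hazelcast route, a minimal sketch using the Hazelcast 3.x ILock API (in Hazelcast 4+ this was replaced by FencedLock); the lock name and class are illustrative:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ILock;

public class LockGuardedTask {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        // cluster-wide distributed lock; the name is illustrative
        ILock lock = hz.getLock("scheduler-owner");
        if (lock.tryLock()) {          // only one cluster member succeeds
            try {
                // do the scheduled work here
            } finally {
                lock.unlock();
            }
        }
        // members that fail to acquire the lock simply skip this cycle
    }
}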
Again, it all depends on your use case. For us the semaphore table was good enough in the first scenario, but it proved bad in the second one, as the multiple processes trying to update the same table at the same time ended up causing a lot of database contention, which took us to Hazelcast.
Another idea would be a custom health-check implementation that could trigger activating one scheduler or the other depending on the response received.
Hope that helps; these are just ideas from our experience. Good luck.
I am working on a web application that allows its users to optionally execute long-running processes 'in the background'. An example would be some long-running report generation, or deleting thousands of objects simultaneously.
I've implemented this using an ExecutorService defined as a FixedThreadPool using a ThreadFactory. The ThreadFactory (Guava's ThreadFactoryBuilder) is built like this:
new ThreadFactoryBuilder()
        .setNameFormat(clientId + "-BackgroundTask-%d")
        .setDaemon(true)
        .setPriority(Thread.MIN_PRIORITY)
        .build()
I execute the task like this:
Future<TaskStatus> future = clientExecutors.get(clientId)
        .submit(backgroundTask::execute);
taskFutures.put(backgroundTask.getTaskId(), future);
How can I make my web server always prioritize handling new incoming requests (as fast as possible) over executing background tasks?
In other words: it should never happen that a user has to wait a long time while browsing the site just because a lot of background tasks are executing. As you can see above, I tried to do this by setting .setPriority(Thread.MIN_PRIORITY); however, that does not seem to be sufficient.
Furthermore, for now I've set an arbitrary value for the FixedThreadPool size (10) and use it globally for the entire background handling of the application (and all its customers).
Instead, I would like to define a thread pool for each customer, to make sure each customer has the same privilege to run a certain number of tasks in the background. Say each customer has a FixedThreadPool of size 5, and the server hosts a maximum of 50 different customers. That would add up to 250 background tasks running at the same time.
The most important requirement here is: it does not matter how long these background tasks take to execute (say 2 minutes, or 20 minutes). What is important is that each customer has the ability to send 5 tasks to be executed in the background, and each of those is worked on equally. A sketch of the setup I have in mind follows.
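This is only an illustrative sketch of the per-customer pools; the map, the pool size, and the helper name are assumptions of mine, not existing code:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import com.google.common.util.concurrent.ThreadFactoryBuilder;

public class ClientPools {
    private static final int TASKS_PER_CUSTOMER = 5;

    private final ConcurrentMap<String, ExecutorService> clientExecutors =
            new ConcurrentHashMap<>();

    // lazily create one bounded pool per customer
    public ExecutorService poolFor(String clientId) {
        return clientExecutors.computeIfAbsent(clientId, id ->
                Executors.newFixedThreadPool(TASKS_PER_CUSTOMER,
                        new ThreadFactoryBuilder()
                                .setNameFormat(id + "-BackgroundTask-%d")
                                .setDaemon(true)
                                .setPriority(Thread.MIN_PRIORITY)
                                .build()));
    }
}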
I've tested running 30 CPU-intensive background tasks, and it turns out that while these are running and the CPU is near 100%, new incoming requests take a very long time to be handled.
So obviously, I am doing it wrong.
Update 12.09.2017
I've read about microservices, and while that sounds great, I see a big challenge in splitting the necessary parts out of our monolithic application, mostly because nearly every operation might turn into a long-running process given a big enough data selection.
Furthermore, wouldn't I run into the same problem with my microservice? The server running the microservice would suffer the same performance degradation; the only good thing would be that the rest of the web app would not suffer from it anymore.
I've read some posts about introducing Thread.sleep(1), or Thread.sleep in general, into CPU-heavy operations to reduce the amount of CPU they use. I've also read about someone who introduced this as an aspect, so that he could even change the sleep duration dynamically in order to have some control over how much CPU is used.
However, my gut tells me that isn't right either. What do you think about introducing Thread.sleep to lower the amount of CPU used by a task? Is this common practice? If not, what would be the right approach?
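For illustration, this is the kind of throttling I mean, as a rough sketch (the batch size and pause are made-up values, not from any real code):

import java.util.List;

public class ThrottledWork {
    // run CPU-heavy steps, pausing periodically to give request threads CPU time
    public static void runThrottled(List<Runnable> steps, long pauseMillis)
            throws InterruptedException {
        int done = 0;
        for (Runnable step : steps) {
            step.run();                    // the CPU-intensive part
            if (++done % 100 == 0) {       // batch size is arbitrary
                Thread.sleep(pauseMillis); // larger pause = less CPU pressure
            }
        }
    }
}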
I would strongly consider changing your system architecture to offload these long-running requests to a separate instance instead of running them in-process with the general request-serving application. In general, I think it is an anti-pattern to handle both batch and online (or long- and short-running) processing in the same application instance.
Ideally you'd build a standalone microservice to handle these requests, but you could also simply deploy X instances of your existing application and configure your load balancer to route requests on the long-running invocation paths (e.g. POST /myapp/longrunningjob) only to the instances dedicated to running these long-running processes.
I am trying to measure the time the Next button takes to go from one page to another. To do this I start a transaction before pressing the button, press the Next button, and end the transaction when the next page has loaded. Within this transaction I use web_reg_find() to check for a specific text in order to verify the page.
When I ran this in the Controller, the transaction measured 5 seconds; then I modified the transaction content and deleted the web_reg_find(), after which the transaction measured 3 seconds. Is that normal?
Because I am doing a load test, functionality is important, so the transactions are also important. Is there an alternative way to check the content and still preserve the performance measurement?
web_reg_find() runs some logic on the response sent from the server and therefore takes time. LoadRunner is aware that this is not time that would be perceived by a real user and therefore reports it as "wasted time" for the transaction. If you check the log for this transaction you will see something like this:
Notify: Transaction "login" ended with "Pass" status (Duration: 4.6360 Wasted Time: 0.0062).
That is: the time the transaction took, and, out of that time, how much was wasted on LoadRunner's internal operations.
Note that when you open the results in Analysis, the transaction times will be reported without the wasted time (i.e. Analysis will report the time as it is perceived by a real user).
That said, the amount of time taken for the processing of web_reg_find() seems unusually long. As web_reg_find() is both memory- and CPU-bound (holding the page in RAM and running string comparisons), I would look at other possibilities for why it takes an additional two seconds. My hypothesis is that you have a resource-constrained, or oversubscribed, load generator. Look at the performance of a control group for this type of user: one user loaded by itself on a load generator. Compare your control group to the behavior of the global group. If you see a deviation, then it is due to a local resource constraint which shows up as slowed virtual users. This would have an impact on your measurement of response time as well.
I deliberately underload my load generators to avoid any possibility of load-generator coloration, plus I employ a control generator in the group to measure any coloration that may occur.
The time taken by web_reg_find() is calculated as wasted time.
I need to find out the total time a session spends waiting while it is active.
For this I used a query like the one below:
SELECT SUM(wait_time + time_waited) / 1000000
FROM   v$active_session_history
WHERE  session_id = 614
But I feel I'm not getting what I wanted from this query.
The first time I ran it I got 145.980962, the second time 145.953926, and the third time 127.706429.
Ideally, the value should stay the same or increase. But as you can see, the value returned decreases every time.
Please correct me where I'm going wrong.
It does not contain the whole history; v$active_session_history "forgets" older rows. Think of it as a ring of buffers: once all buffers have been written, it restarts from the first buffer.
To get the wait events of a session, look at v$session_event. To get the current (active) event of an active session, use v$session_wait (in recent Oracle versions you can find this information in v$session as well).
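For example, a quick way to see where your session (SID 614 from your query) has spent its wait time; this assumes timed statistics are enabled:

SELECT event,
       total_waits,
       time_waited_micro / 1000000 AS seconds_waited
FROM   v$session_event
WHERE  sid = 614
ORDER  BY time_waited_micro DESC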
NOTE: the v$session_event view will not show you CPU time (which is not an event, but can be seen in v$active_session_history). You can add it from v$sesstat, for example, if needed.
Your bloomer is that you have not understood the nature of v$active_session_history: it is a sample, not a log. That is, each record in ASH is a point in time and doesn't refer back to previous records.
Don't worry, it's a common mistake.
This is a particular problem with WAIT_TIME. This is the total time waited for that specific occurrence of that event. So if a wait event stretches across two samples, in the first record WAIT_TIME will be 1 (one second) and in the next sample it will be 2 (two seconds). However, SUM(WAIT_TIME) would produce a total of 3, which is too much. Of course this is an arithmetic progression, so if the wait event stretches to ten samples (ten seconds), SUM(WAIT_TIME) would produce a total of 55.
Basically, WAIT_TIME is a flag - if it is 0 the session is ON CPU and if it's greater than zero it is WAITING.
TIME_WAITED is only populated when the event has stopped waiting, so SUM(TIME_WAITED) wouldn't give an inflated value. In fact, just the opposite: it will only be populated for wait events which were ongoing at the sample time. So there can be lots of waits which fall between samples and won't show up in that SUM.
This is why ASH is good for highlighting big performance issues and bad for identifying background niggles.
So why doesn't the total time increase each time you run your query? Because ASH is a circular buffer: older records get aged out to make way for new samples. AWR stores a percentage of the ASH records on disk; they are accessible through DBA_HIST_ACTIVE_SESS_HISTORY (the default is one record in ten). So ASH probably purged some samples with high wait times between the second and third runs of your query. You could check that by including MIN(SAMPLE_TIME) in the select list.
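For example, a variant of your own query that also shows the oldest sample still held; if the minimum sample time moves forward between runs, older samples have been purged:

SELECT MIN(sample_time) AS oldest_sample,
       SUM(wait_time + time_waited) / 1000000 AS total_seconds
FROM   v$active_session_history
WHERE  session_id = 614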
Finally, bear in mind that SIDs get reused. The primary key for identifying a session is (SID, SERIAL#). Your query only restricts on SID, so it may be using data from several different sessions.
There is a useful presentation by Graham Wood, one of the Oracle gurus who worked on ASH, called "Sifting through the ASHes". Although it would be better to hear Graham speaking, the slide deck on its own still provides some useful insights. Find it here.
tl;dr
ASH is a sample, not a log. Use it for COUNTs, not SUMs.
"Anything wrong in the way query these tables? "
As I said above, but perhaps didn't make clear enough, DBA_HIST_ACTIVE_SESS_HISTORY only holds a fraction of the records from ASH. So it is even less meaningful to run SUM() on its columns than on the live ASH.
V$SESSION_EVENT, on the other hand, is an actual log of events: its wait times are reliable and accurate. That's why you pay the overhead of enabling timed statistics. Having said that, V$SESSION_EVENT only gives us aggregated values per session, so it's not particularly useful in diagnosis.
I'm running ColdFusion 8 and have a CFC that loops through a set of database records.
Each record contains two fields, image path and image file. I construct a path for every image, upload it to a temp folder, resize it, and then store it to S3.
Depending on the number of records this may take quite some time, and I have not been able to successfully finish the upload cycle with larger sets of images (it eventually times out).
I'm already setting my timeout threshold to 5000, but it still does not seem to be enough.
I can pick up where I left off, because I keep a media log to check against before uploading to S3. This way I can finish the task, but I need to trigger the function five times to upload 400 items.
Question:
Is there a way to avoid a timeout without setting (in the S3 case) httptimeout to some 50000000? And would it make sense to run this in a CFTHREAD, or will that be a problem if the user leaves the import page while the system is still uploading?
Thanks for some insights.
You can use a CFTHREAD to perform the task, but make sure you LOCK THE SCOPE! Otherwise you could end up running this memory-intensive process several times over and kill the server; you only want this process running once at a time if it is so intensive. A sketch follows.
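A minimal sketch of what I mean; the tags and attributes are standard CFML, but the names and the body are illustrative:

<cfthread name="s3ImageUpload" action="run">
    <cflock name="s3ImageUploadLock" type="exclusive" timeout="10" throwontimeout="false">
        <!--- loop the records here: build the path, resize, push to S3 --->
        <!--- only one thread gets the exclusive lock; any other attempt
              quietly skips the block after the 10-second timeout --->
    </cflock>
</cfthread>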
You have other options, though. If this is not something your application users will need to run, and it is a one-off process you are doing, you could set up a scheduled task with an exceedingly long timeout to run overnight, when the server is not under heavy use. This allows you to set the timeout independently of the application, so the rest of the application is unaffected by global timeout changes.
Another option, if this is something users will be doing semi-regularly, is a thread which pushes a notification via email, a log, or other means (Ajax or WebSockets), letting users know their task is complete. This has the upside that the timeouts can be changed, or calculated dynamically at thread generation from the amount of data to be processed. However, if you're not careful you can overload your server with many threads processing large datasets (plus log-file read/write locks will be harder to manage).
I would encourage you, though, to take this away, see what solution works for you, and post your final solution so others can see the outcome.
Hope this helps.