I have a complex context index that gets synchronized nightly. The process takes some 10 minutes, and any updates to this table that touch the index column during synchronization period result in ORA-29861: domain index is marked LOADING/FAILED/UNUSABLE exception - what can I do about it?
I don't think you can do anything else while that thing is synchronizing. If the index update is unavoidable then you'll need to find a way to queue up the requests and retry say every 15 minutes till it/they go through. I would suggest a limit of 3 tries until it goes through or fail gracefully. I make the suggestion of 3 times because if it is supposed to take 10 minutes and it doesn't work in 45 minutes, you have bigger fish to fry I would think. Might as well make it fail gracefully than endlessly repeating on a broken system. Hopefully you don't have so many attempted hits to the database during that period that you end up with a big queue. You could also see if you can synchronize your app with the times your infra people set up for updating indexes. So that you block these transactions at the same time. I don't know how big your organization is or what the systems are like (if you are running Oracle you guys have enough money for something sizable). That means you might have scheduling apps that might help? In any case, unless the DBAs stop doing these updates, you'll have to wait till they finish I would think.
Related
We're running into an issue with the SOS-Berlin JobScheduler running on Windows that is difficult to diagnose* and I would appreciate any guidance.
*Difficult because I don't know Scala (though I do know C++ and Java). It's difficult to navigate this code-base (some of it's in German).
We have a process-class called Foo, that will sometimes burst up outside the limit of how many processes can be run. So, for example, we limit the process-class to 30 processes and 60 want to run. This leaves 30 running and 30 "waiting for process."
The problem is that JobScheduler doesn't seem to prioritize the 30 that are waiting for a process. Instead, any new job that gets fired after the burst receives processes, leaving some jobs waiting indefinitely. Once the number of jobs "waiting for process" hits zero, the jobs clear out immediately.
Further, it seems that when there are a large number of jobs "waiting for process," the run time for tasks doubles or triples. A job that normally takes 20 seconds to run, will spike to 1-2 minutes, further amplifying the issue as processes are not released back to the pool.
Admittedly, we're running an older version of JS, which we're planning to upgrade this/next week. However, I'm wondering if there is something fundamental we're missing. We've turned down the logging, looked for DB locks, added memory to the heap, shut-down some other processes on the server. We've also increased the process pool, but we don't want to push it too far, lest we crush the server. Nothing seems to be alleviating the issue.
Any tuning help would be appreciated!
As a follow-up, we determined the cause of the issue.
Another user had been using the temp directory to store intermediate generated files. The user was not clearing out these files, resulting in 100's of thousands of files in the directory. They were not very large so we didn't notice. For some reason Job Scheduler started to choke based on this. I'm not clear on the reasons.
Clearing the temp directory, scolding the user, and fixing his script fixed the issue.
We have about 10 different Python scripts that download data from the web, read data from a database and write data back to that database. They do so repeatedly every 10 seconds (or 10 seconds after the last task has completed).
The question is, what is the best approach at running these tasks? I can think of a few ways:
a while True that runs the task then sleeps for the interval. It could be guarded by a watchdog like supervisord, making sure it is always up.
having the script execute the task just once, and invoking the script externally once every 10 seconds by another process.
having the script execute the task lets say for 1 hour (every 10 seconds for an hour), and having a watchdog make sure that task runs again once the hour is over.
I would like to avoid long running processes that actually do something because I don't want to deal with memory problems etc over long periods of time.
Additional Information
The scripts are different because they each retrieve data from a different source, and query, calculate and insert different data into the database.
The tasks are performed every 10 seconds since the data being retrieve is in real-time, and we need to not only keep updating it very frequently, but also keep all the historical data in the database.
There are a lot of resources being used by the scripts - MySQL connections, HTTP connections, Redis connections, etc. We have encountered issues with using the long-running approach before, specifically with MySQL connections (things like MySQL server has gone away, even though all connections had been closed). Hence the inclination toward having the scripts run in shorter periods of time.
What are some common approaches at this?
Unless your scripts somehow leak memory (quite unlikely), they should all be the same. So, for sheer simplicity (your time programming/debugging is much more expensive than a few miliseconds of the machine's time, even each 10 seconds!) I'd go for the single script that checks each 10 seconds.
OTOH, checking each 10 seconds sounds like busywork. Can't you set up so that whatever you are monitoring tells you when there are changes? Or batch the records up so you can retrieve, say, a day's worth at at time?
If you are running on linux, cron has granularity of a minute. We have processes we run constantly. Rather than watch them, the script will open a semaphore that gets released when the program finishes normally or not. This way if it runs long and it gets called again by cron, the copy will exit when it can't get the lock. This way you can call it a often as you need to without it stepping on a possibly still running copy.
I need to find out the total time a session is waiting when its is active.
For this i used the query like below...
SELECT (SUM (wait_time + time_waited) / 1000000)
FROM v$active_session_history
WHERE session_id = 614
But, i feel i'm not getting what i wanted using this query.
Like, first time when i ran this query i got 145.980962, # second time=145.953926and #3rd time i got 127.706429.
Ideally, the time should be same or increase. But, as you see, the value returned is reducing everytime.
Please correct me where i'm doing wrong.
It does not contain whole history, v$active_session_history "forgets" older lines. Think about it as a ring of buffers. Once all buffers are written, it restarts from 1st buffer.
To get events of some session, look v$session_event. To get current (active) event of active session: v$session_wait (In recent Oracle versions, you can find this info also in v$session)
NOTE: v$session_event view will not show you CPU time (which is not event but can be seen in v$active_session_history). You can add it, for example, from v$sesstat if needed...
Your bloomer is that you have not understood the nature of v$active_session_history: it is a sample not a log. That is, each record in ASH is a point in time, and doesn't refer back to previous records.
Don't worry, it's a common mistake.
This is a particular problem with WAIT_TIME. This is the total time waited for that specfic occurence of that event. So if the wait event stretches across two samples, in the first record WAIT_TIME will be 1 (one second) and in the next sample it will be 2 (two seconds). However, a SUM(WAIT_TIME) would produce a total of 3 which is too much. Of course this is an arithmetic proghression so if the wait event stretches to ten samples (ten seconds) a SUM(WAIT_TIME) would produce a total of 55.
Basically, WAIT_TIME is a flag - if it is 0 the session is ON CPU and if it's greater than zero it is WAITING.
TIME_WAITED is only populated when the event has stopped waiting. So a SUM(TIME_WAITED) wouldn't give an inflated value. In fact just the opposite: it will only be populated for wait events which were ongoing at the sample time. So there can be lots of waits which fall between the interstices of the samples which won't show up in that SUM.
This is why ASH is good for highlighting big performance issues and bad for identifying background niggles.
So why doesn't the total time doesn't increase each time you run your query? Because ASH is a circular buffer. Older records get aged out to make way for new samples. AWR stores a percentage of the ASH records on disk; they are accessible through DBA_HIST_ACTIVE_SESSION_HIST (the default is one record in ten). So probably ASH purged some samples with high wait times between the second and third times you ran your queries. You could check that by including MIN(SAMPLE_TIME) in the select list.
Finally, bear in mind that SIDs get reused. The primary key for identifying a session is (SID, Serial#), Your query only grouops by SID, so it may use data from several different sessions.
There is a useful presentation by Graham Woods, on of the Oracle gurus who worked on ASH called "Shifting through the ASHes". Altough if would be better to hear Graham speaking, the slide deck on its own still provides some useful insights. Find it here.
tl;dr
ASH is a sample not a log. Use it for COUNTs not SUMs.
"Anything wrong in the way query these tables? "
As I said above, but perhaps didn't make clear enough, DBA_HIST_ACTIVE_SESSION_HIST only holds a fraction of the records from ASH. So it is even less meaningful to run SUM() on its columns than on the live ASH.
Whereas V$SESSION_EVENT is an actual log of events. Its wait times are reliable and accurate. That's why you pay the overhead of enabling timed statistics. Having said which, V$SESSION_EVENT only gives us aggregated values per session, so it's not particularly useful in diagnosis.
I am designing a cloud app and need a worker process which scours my database looking for work, and then performs it.
Most of the info I seem to find on the subject of background tasks in the cloud involves some kind of scheduler and/or queuing system.
What I have doesn't quite fit into the "run this task every 5 minutes" or "add this to the queue to be executed later" models. I think the main difference to my problem is that the workers themselves find work to do, rather than being assigned it by a periodic scheduler or an external process that generates work.
What I have is basically a giant table where each entry has three fields:
job: a small task to be performed, lets say it gets the last message from a twitter account and stores it in the database
the interval at which to perform that job: say every 5 minutes, N.B. the interval is arbitrary and different for each entry in the table
the last date when the job was performed
The way I would implement this is to have a worker which has an infinite loop. When it enters the loop, it scours the database a)looking for items whose date + interval < currentTime, b)when it finds one, it sets date = currentTime, and c)then executes the job. If there is no work ATM, it sleep for a few seconds, then tries again.
I will have many parallel workers scouring the database simultaneously, which is why I do b) first and then c) in the paragraph above. Since there are parallel workers, action a) and b) are atomic operations on the database to prevent work being duplicated. If the worker crashes after a) and b), but before it manages to finish the work, it's no big deal, and the workers can just do it at the next interval; reason for this is that the work is not performed in a time-invariant system so a backlog scenario of failed jobs has no benefit as the tasks have to be performed at their exact intervals, so it's better to skip 1 interval than to have uneven intervals between which the tasks were executed.
My question is whether that is a reasonable implementation strategy? If so, how do I bring this process to life on the cloud (I am using Heroku, but may switch to EC2 in the future)? I still haven't written any code so I would welcome other suggestions (maybe I misunderstood the use cases/applications for queue systems).
This sounds so close to using something like a scheduled job that you might as well tread the well beaten path and do it the more conventional way. There's no reason why you can't schedule a job to run once every few seconds.
However, this idea of looking for work sounds dodgy. What happens if two workers find the same task to run at the same time for instance? Also, are there not triggers in the application which can indicate that work needs doing? It seems strange that you have code 'looking for work'.
You can go a very long way with simple periodic background tasks, so I would exhaust all possibilities in that area before rolling your own.
This is kind of a 2 part question
1) Is there a max number of HttpWebRequests that can be run at the same time in WP7?
I'm going to create a ScheduledTaskAgent to run a PeriodicTask. There will be 2 different REST service calls the first one will get a list of IDs for records that need to be downloaded, the second service will be used to download those records one at a time. I don't know how many records there will be my guestimage would be +-50.
2.) Would making all the individual record requests at once be a bad idea? (assuming that its possible) or should I wait for a request to finish before starting another?
Having just spent a week and a half working at getting a BackgroundAgent to stay within it's memory limits, I would suggest doing them one at a time.
You lose about half your memory to system libraries and the like, your first web request will take another nearly 20%, but it seems to reuse that memory on subsequent requests.
If you need to store the results into a local database, it is going to take a good chunk more. I have found a CompiledQuery uses less memory, which means holding a single instance of your context.
Between each call I would suggest doing a GC.Collect(), I even add a short Thread.Sleep() just to be sure the process has some time to tidying things up.
Another thing I do is track how much memory I am using and attempt to exit gracefully when I get to around 97 or 98%.
You can not use the debugger to test memory limits as the debug memory is much higher and the limits are not enforced. However, for comparative testing between versions of your code, the debugger does produce very similar result on subsequent runs over the same code.
You can track your memory usage with Microsoft.Phone.Info.DeviceStatus.ApplicationCurrentMemoryUsage and Microsoft.Phone.Info.DeviceStatus.ApplicationMemoryUsageLimit
I write a status log into IsolatedStorage so I can see the result of runs on the phone and use ScheduledActionService.LaunchForTest() to kick the off. I then use ShellToast notifications to let me know when the task runs and also when it completes, that way I can launch my app to read the status log without interrupting it.
Tyler,
My 2 cents here.
I don't believe there is any restriction on how mant HTTPWebequests you can spin up. These however have to be async, off course, and may be served from the browser stack. Most modern browsers including IE9 handle over 5 concurrently to the same domain; but you are not guaranteed a request handle immediately. However, it should not matter if you are willing to wait on a separate thread, dump your content on to the request pipe & wait for response on yet another thread. This post (here) has a nice walkthrough of why we need to do this.
Nothing wrong with this approach either, IMO. You're just going to have to wait until all the requests have their respective pipelines & then wait for the responses.
Thanks!
1) Your memory limit in a PeriodicTask or ResourceIntensiveTask is 5 MB. So you definitely should control your requests really careful. I dont think there is a limit in the code.
2)You have only 5 MB. So when you start all your requests at the same time it will terminate immediately.
3) I think you should better use a ResourceIntensiveTask because a PeriodicTask should only run 15 seconds.
Good guide for Multitasking features in Mango: http://blogs.infosupport.com/blogs/alexb/archive/2011/05/26/multi-tasking-in-windows-phone-7-1.aspx
I seem to remember (but can't find the reference right now) that the maximum number of requests that the OS can make at once is 7. You should avoid making this many at once though as it will stop other/system apps from being able to make requests.