A lot of time spent on the waits 'SQL*Net message from client' and 'wait for unread message on broadcast channel' - oracle

My application, which wraps Oracle Data Pump's executables IMPDP and EXPDP, takes a random amount of time to do the same work. On further investigation, I see it waiting, again for random amounts of time, on the event 'wait for unread message on broadcast channel'. As a result, the application takes anywhere between 10 minutes and over an hour for the same work.
I can't tell whether this has to do with the way my application uses these executables, with load on my server, or with something totally alien to me.

There's a bunch of processes and sessions involved in a data pump operation.
I suspect you are looking at the master process, not at the worker processes. All that event tells you is that the master process spends more time waiting for the worker processes when the job takes longer, which is fairly useless information.
You need to monitor the worker processes and see why they are taking longer.

Those wait events are usually considered to be "idle" waits - i.e. Oracle has nothing to do, it is waiting for further data/instructions.


How can I execute long running background code without monopolizing a Goroutine thread?

I have a CPU-bound Go service that receives a high volume of time-sensitive work. As work is performed, data is pushed to a queue to be periodically processed in the background. The processing is low priority, performed by an external package, and can take a long time.
This background processing is causing a problem, because it's not really happening in the background: it's consuming an entire Goroutine thread and forcing the service to run at reduced capacity, which slows down the rate it can process work at.
There are obviously solutions like performing the background work out-of-process, but this would add an unacceptable level of complexity to the service.
Given that the background processing code isn't mine and I can't add yields, is there any way to prevent it from hogging an entire Goroutine thread?
Think of your server as the producer and the background processing as the consumer.
Run the consumer on another machine.
Is the consumer a single process? If so, limit its CPU and memory.

SOS-Berlin JobScheduler process queue logic

We're running into an issue with the SOS-Berlin JobScheduler running on Windows that is difficult to diagnose* and I would appreciate any guidance.
*Difficult because I don't know Scala (though I do know C++ and Java). It's difficult to navigate this code-base (some of it's in German).
We have a process-class called Foo, that will sometimes burst up outside the limit of how many processes can be run. So, for example, we limit the process-class to 30 processes and 60 want to run. This leaves 30 running and 30 "waiting for process."
The problem is that JobScheduler doesn't seem to prioritize the 30 that are waiting for a process. Instead, any new job that gets fired after the burst receives processes, leaving some jobs waiting indefinitely. Once the number of jobs "waiting for process" hits zero, the jobs clear out immediately.
Further, it seems that when there are a large number of jobs "waiting for process," the run time for tasks doubles or triples. A job that normally takes 20 seconds to run will spike to 1-2 minutes, further amplifying the issue as processes are not released back to the pool.
Admittedly, we're running an older version of JS, which we're planning to upgrade this/next week. However, I'm wondering if there is something fundamental we're missing. We've turned down the logging, looked for DB locks, added memory to the heap, and shut down some other processes on the server. We've also increased the process pool, but we don't want to push it too far, lest we crush the server. Nothing seems to be alleviating the issue.
Any tuning help would be appreciated!
As a follow-up, we determined the cause of the issue.
Another user had been using the temp directory to store intermediate generated files. The user was not clearing out these files, resulting in hundreds of thousands of files in the directory. They were not very large, so we didn't notice. For some reason JobScheduler started to choke on this. I'm not clear on the reasons.
Clearing the temp directory, scolding the user, and fixing his script fixed the issue.

beanstalkd allowing one job to be reserved twice

I have a beanstalkd instance with two workers picking jobs from one tube.
I've noticed that occasionally one of the workers will reserve a job that has already been reserved (and is being worked on) by the other worker.
I know there aren't duplicate jobs in the queue.
Why does beanstalkd allow the same job to be reserved twice?
It sounds to me like you didn't implement the protocol properly. You need to handle DEADLINE_SOON, and do TOUCH.
What does DEADLINE_SOON mean?
DEADLINE_SOON is a response to a reserve command indicating that you have a job reserved whose deadline is real soon (current safety margin is approximately 1 second).
If you are frequently receiving DEADLINE_SOON errors on reserve, you should probably consider increasing the TTR on your jobs as it generally indicates you aren’t completing them in time. It may also be that you are failing to delete tasks when you have completed them.
See the mailing list discussion for more information.
How does TTR work?
TTR only applies to a job at the moment it becomes reserved. At that event, a timer (called “time-left” in the job stats) starts counting down from the job’s TTR.
If the timer reaches zero, the job gets put back in the ready queue.
If the job is buried, deleted, or released before the timer runs out, the timer ceases to exist.
If the job is touched before the timer reaches zero, the timer starts over counting down from TTR.
The "touch" command
Allows a worker to request more time to work on a job.
This is useful for jobs that potentially take a long time, but you still want
the benefits of a TTR pulling a job away from an unresponsive worker. A worker
may periodically tell the server that it's still alive and processing a job
(e.g. it may do this on DEADLINE_SOON). The command postpones the auto
release of a reserved job until TTR seconds from when the command is issued.
The jobs take longer to run than the TTR, so they were being returned to the queue and picked up by the other worker.
I now set a larger TTR on the job.
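The TTR behaviour described in this thread can be illustrated with a small in-memory model. This is only a sketch in Python: `ToyQueue`, `Job`, and the worker names are hypothetical stand-ins, not the beanstalkd server or any client library, and time is passed in explicitly as `now` instead of using a real clock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Toy stand-in for a job; not the real server's data structure."""
    job_id: int
    ttr: float                          # time-to-run in seconds
    deadline: Optional[float] = None    # None while the job is ready
    reserved_by: Optional[str] = None


class ToyQueue:
    """In-memory model of reserve / touch / delete with TTR expiry."""

    def __init__(self):
        self.jobs = {}

    def put(self, job_id, ttr):
        self.jobs[job_id] = Job(job_id, ttr)

    def _expire(self, now):
        # A reserved job whose timer reached zero goes back to ready.
        for job in self.jobs.values():
            if job.deadline is not None and now >= job.deadline:
                job.deadline = None
                job.reserved_by = None

    def reserve(self, worker, now):
        self._expire(now)
        for job in self.jobs.values():
            if job.reserved_by is None:
                job.reserved_by = worker
                job.deadline = now + job.ttr    # timer starts at reservation
                return job.job_id
        return None

    def touch(self, worker, job_id, now):
        self._expire(now)
        job = self.jobs.get(job_id)
        if job is None or job.reserved_by != worker:
            return False                        # reservation already lost
        job.deadline = now + job.ttr            # timer restarts from full TTR
        return True

    def delete(self, worker, job_id, now):
        self._expire(now)
        job = self.jobs.get(job_id)
        if job is None or job.reserved_by != worker:
            return False
        del self.jobs[job_id]
        return True


def run_without_touch():
    q = ToyQueue()
    q.put(1, ttr=5)
    first = q.reserve("worker-a", now=0)
    # worker-a is still busy at t=6, but its 5 s TTR has run out.
    second = q.reserve("worker-b", now=6)
    return first, second


def run_with_touch():
    q = ToyQueue()
    q.put(1, ttr=5)
    first = q.reserve("worker-a", now=0)
    q.touch("worker-a", 1, now=4)               # deadline moves to t=9
    second = q.reserve("worker-b", now=6)
    return first, second
```

In this model, `run_without_touch()` hands job 1 to both workers, which is the "reserved twice" symptom from the question, while `run_with_touch()` leaves the second worker with nothing to reserve. Setting a larger TTR, as done above, has the same effect as long as the job finishes within it.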

Repeated tasks - spawn new processes or run continuously?

We have about 10 different Python scripts that download data from the web, read data from a database and write data back to that database. They do so repeatedly every 10 seconds (or 10 seconds after the last task has completed).
The question is, what is the best approach to running these tasks? I can think of a few ways:
a while True that runs the task then sleeps for the interval. It could be guarded by a watchdog like supervisord, making sure it is always up.
having the script execute the task just once, and invoking the script externally once every 10 seconds by another process.
having the script execute the task for, let's say, 1 hour (every 10 seconds for an hour), and having a watchdog make sure that the task runs again once the hour is over.
I would like to avoid long running processes that actually do something because I don't want to deal with memory problems etc over long periods of time.
Additional Information
The scripts are different because they each retrieve data from a different source, and query, calculate and insert different data into the database.
The tasks are performed every 10 seconds since the data being retrieved is in real-time, and we need to not only keep updating it very frequently, but also keep all the historical data in the database.
There are a lot of resources being used by the scripts - MySQL connections, HTTP connections, Redis connections, etc. We have encountered issues with using the long-running approach before, specifically with MySQL connections (things like MySQL server has gone away, even though all connections had been closed). Hence the inclination toward having the scripts run in shorter periods of time.
What are some common approaches to this?
Unless your scripts somehow leak memory (quite unlikely), the approaches should all behave the same. So, for sheer simplicity (your time programming/debugging is much more expensive than a few milliseconds of the machine's time, even every 10 seconds!), I'd go for the single script that checks every 10 seconds.
OTOH, checking every 10 seconds sounds like busywork. Can't you set things up so that whatever you are monitoring tells you when there are changes? Or batch the records up so you can retrieve, say, a day's worth at a time?
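A minimal sketch of the single long-running script might look like the following. The names `run_every` and `task` are hypothetical, and the `iterations` and `sleep` parameters exist only so the loop can be exercised without actually waiting. It waits the interval after each run completes, and it catches exceptions so that one failed run (for example a dropped MySQL connection) does not end the loop.

```python
import time


def run_every(task, interval, iterations=None, sleep=time.sleep):
    """Run task, wait `interval` seconds after it finishes, and repeat.

    iterations=None loops forever; a number limits the runs.
    Returns how many runs raised an exception.
    """
    failures = 0
    done = 0
    while iterations is None or done < iterations:
        try:
            task()
        except Exception as exc:
            # Log and carry on; the next run gets a fresh attempt.
            failures += 1
            print(f"task failed: {exc!r}")
        done += 1
        if iterations is None or done < iterations:
            sleep(interval)
    return failures
```

Opening the database and HTTP connections inside `task` on every run, rather than once at startup, is one way to sidestep stale-connection problems, at the cost of reconnecting every 10 seconds.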
If you are running on Linux, note that cron has a granularity of one minute. We have processes that we run constantly. Rather than watching them, the script takes a semaphore that is released when the program finishes, whether normally or not. That way, if a run takes long and cron starts the script again, the new copy exits when it can't get the lock. You can then call it as often as you need without it stepping on a copy that may still be running.
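One way to get this "exit if a copy is still running" behaviour is an advisory file lock instead of a semaphore. The sketch below uses `fcntl.flock`, so it assumes a Unix-like system; `DEFAULT_LOCK`, `try_lock`, and `main` are hypothetical names chosen for illustration.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file location; any path all copies agree on will do.
DEFAULT_LOCK = os.path.join(tempfile.gettempdir(), "example_task.lock")


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def main(lock_path=DEFAULT_LOCK):
    lock = try_lock(lock_path)
    if lock is None:
        print("previous run still active, exiting")
        return 1
    try:
        pass  # the real task would go here
    finally:
        # Closing the file releases the lock. The kernel also releases it
        # if the process dies, so a crash cannot leave the lock stuck.
        lock.close()
    return 0
```

Because the kernel drops the lock when the holding process exits for any reason, there is no stale lock file to clean up after a crash, which matches the "finishes normally or not" requirement above.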

Erlang "system" memory section keeps growing

I have an application with the following pattern:
2 long running processes that go into hibernate after some idle time, and their memory consumption goes down as expected
N (0 < N < 100) worker processes that do some work and hibernate when idle more than 10 seconds or terminate if idle more than two hours
during the night, when there is no activity, the process memory goes back to almost the same value that was at the application start, which is expected as all the workers have died.
The issue is that "system" section keeps growing (around 1GB/week).
My question is how can I debug what is stored there or who's allocating memory in that area and is not freeing it.
I've already tested lists:keysearch/3 and it doesn't seem to leak memory, as that is the only native thing I'm using (no ports, no drivers, no NIFs, no BIFs, nothing). Erlang version is R15B03.
Here is the current erlang:memory() output (slight traffic, app started on Feb 03):
This is a 64-bit system. As you can see, "system" section has ~270MB and "processes" is at around 100MB (that drops down to ~16MB during the night).
It seems that I've found the issue.
I have a "process_killer" gen_server where processes can subscribe for periodic GC or kill. Its subscribe functions are called on each message received by some processes to postpone the GC/kill (something like re-arm).
This process performs an erlang:monitor if not already monitored to catch a dead process and remove it from the watch list. If I comment out the re-subscription line on each handled message, the "system" area seems to behave normally. That means there is a bug in my process_killer that leaks monitor refs (remember you can call erlang:monitor multiple times and each call creates a reference).
I was led to this idea because I tested a simple module that called erlang:monitor in a loop, and I saw the "system" area grow by ~13 bytes on each call.
The workers themselves were OK because they would die anyway taking their monitors along with them. There is one long running (starts with the app, stops with the app) process that dispatches all the messages to the workers that was calling GC re-arm on each received message, so we're talking about tens of thousands of monitors spawned per hour and never released.
I'm writing this answer here for future reference.
TL;DR: make sure you are not leaking monitor refs on a long running process.
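The shape of the bug and of the fix can be shown with a language-neutral model. The sketch below is Python, not Erlang, and `LeakyWatcher`, `DedupWatcher`, and `process_down` are hypothetical names; it only mirrors the bookkeeping, under the assumption that each subscribe call in the buggy version created a new monitor reference.

```python
import itertools


class LeakyWatcher:
    """Creates a new handle on every subscribe, like calling monitor each time."""

    def __init__(self):
        self._next_ref = itertools.count()
        self.refs = []                  # every handle ever created stays alive

    def subscribe(self, pid):
        self.refs.append((next(self._next_ref), pid))


class DedupWatcher:
    """Keeps at most one handle per pid; re-subscribing only re-arms."""

    def __init__(self):
        self._next_ref = itertools.count()
        self.refs = {}                  # pid -> single handle
        self.rearmed = {}               # pid -> number of re-arm requests

    def subscribe(self, pid):
        if pid not in self.refs:        # monitor only if not already monitored
            self.refs[pid] = next(self._next_ref)
        self.rearmed[pid] = self.rearmed.get(pid, 0) + 1

    def process_down(self, pid):
        # Drop all state for a dead process so nothing accumulates.
        self.refs.pop(pid, None)
        self.rearmed.pop(pid, None)
```

With one long-lived dispatcher re-subscribing on every message, the first class grows by one entry per message, while the second stays at a single entry no matter how many messages arrive.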
