I have an application with the following pattern:
2 long running processes that go into hibernate after some idle time
and their memory consumption goes down as expected
N (0 < N < 100) worker processes that do some work and hibernate when idle more than
10 seconds or terminate if idle more than two hours
during the night,
when there is no activity the process memory goes back to almost the
same value that was at the application start, which is expected as
all the workers have died.
The issue is that "system" section keeps growing (around 1GB/week).
My question is how can I debug what is stored there or who's allocating memory in that area and is not freeing it.
I've already tested lists:keysearch/3 and it doesn't seem to leak memory, as that is the only native thing I'm using (no ports, no drivers, no NIFs, no BIFs, nothing). Erlang version is R15B03.
Here is the current erlang:memory() output (slight traffic, app started on Feb 03):
[{total,378865650},
{processes,100727351},
{processes_used,100489511},
{system,278138299},
{atom,1123505},
{atom_used,1106100},
{binary,4493504},
{code,7960564},
{ets,489944},
{maximum,402598426}]
This is a 64-bit system. As you can see, "system" section has ~270MB and "processes" is at around 100MB (that drops down to ~16MB during the night).
It seems that I've found the issue.
I have a "process_killer" gen_server where processes can subscribe for periodic GC or kill. Its subscribe functions are called on each message received by some processes to postpone the GC/kill (something like re-arm).
This process performs an erlang:monitor if not already monitored to catch a dead process and remove it from watch list. If I comment our the re-subscription line on each handled message, "system" area seems to behave normally. That means it is a bug in my process_killer that does leak monitor refs (remember you can call erlang:monitor multiple times and each call creates a reference).
I was lead to this idea because I've tested a simple module which was calling erlang:monitor in a loop and I have seen ~13 bytes "system" area grow on each call.
The workers themselves were OK because they would die anyway taking their monitors along with them. There is one long running (starts with the app, stops with the app) process that dispatches all the messages to the workers that was calling GC re-arm on each received message, so we're talking about tens of thousands of monitors spawned per hour and never released.
I'm writing this answer here for future reference.
TL;DR; make sure you are not leaking monitor refs on a long running process.
Related
I have a CPU-bound Go service that receives a high volume of time-sensitive work. As work is performed, data is pushed to a queue to be periodically processed in the background. The processing is low priority, performed by an external package, and can take a long time.
This background processing is causing a problem, because it's not really happening in the background: it's consuming an entire Goroutine thread and forcing the service to run at reduced capacity, which slows down the rate it can process work at.
There are obviously solutions like performing the background work out-of-process, but this would add an unacceptable level of complexity to the service.
Given that the background processing code isn't mine and I can't add yields, is there any way to prevent it from hogging an entire Goroutine thread?
your server maybe call producer ,background processing call consumer
consumer running in other machine
consumer is a single progress? if yes limit cpu、mem
Following are the properties I have set -
spring.task.execution.pool.core-size=50
spring.task.execution.pool.max-size=200
spring.task.execution.pool.queue-capacity=100
spring.task.execution.shutdown.await-termination=true
spring.task.execution.shutdown.await-termination-period=10s
spring.task.execution.thread-name-prefix=async-task-exec-
I still see thread names as - "async-task-exec-7200"
Does it mean it is creating 7200 threads?
Also, another issue I observed that #Async would wait for more than 10min to get a thread and relieve the parent thread.
You specified core size of 50 and max size of 200. So your pool will normally run with 50 threads, and when there is extra work, it will spawn additional threads, you'll see "async-task-exec-51", "async-task-exec-52" created and so on. Later, if there is not enough work for all the threads, the pool will kill some threads to get back to just 50. So it may kill thread "async-task-exec-52". The next time it has too much work for 50 threads, it will create a new thread "async-task-exec-53".
So the fact that you see "async-task-exec-7200" means that over the life time of the thread pool it has created 7200 threads, but it will still never have more than the max of 200 running at the same time.
If #Async method is waiting 10 minutes for a thread it means that you have put so much work into the pool that it has already spawned all 200 threads and they are processing, and you have filled up the queue capacity of 100, so now the parent thread has to block(wait) until there is at least a spot in the queue to put the task.
If you need to consistently handle more tasks, you will need a powerful enough machine and enough max threads in the pool. But if your work load is just very spiky, and you don't want to spend on a bigger machine and you are ok with tasks waiting longer sometimes, you might be able to get away with just raising your queue-capacity, so the work will queue up and eventually your threads might catch up (if the task creation gets slower).
Keep trying combinations of these settings to see what will be right for your workload.
We're running into an issue with the SOS-Berlin JobScheduler running on Windows that is difficult to diagnose* and I would appreciate any guidance.
*Difficult because I don't know Scala (though I do know C++ and Java). It's difficult to navigate this code-base (some of it's in German).
We have a process-class called Foo, that will sometimes burst up outside the limit of how many processes can be run. So, for example, we limit the process-class to 30 processes and 60 want to run. This leaves 30 running and 30 "waiting for process."
The problem is that JobScheduler doesn't seem to prioritize the 30 that are waiting for a process. Instead, any new job that gets fired after the burst receives processes, leaving some jobs waiting indefinitely. Once the number of jobs "waiting for process" hits zero, the jobs clear out immediately.
Further, it seems that when there are a large number of jobs "waiting for process," the run time for tasks doubles or triples. A job that normally takes 20 seconds to run, will spike to 1-2 minutes, further amplifying the issue as processes are not released back to the pool.
Admittedly, we're running an older version of JS, which we're planning to upgrade this/next week. However, I'm wondering if there is something fundamental we're missing. We've turned down the logging, looked for DB locks, added memory to the heap, shut-down some other processes on the server. We've also increased the process pool, but we don't want to push it too far, lest we crush the server. Nothing seems to be alleviating the issue.
Any tuning help would be appreciated!
As a follow-up, we determined the cause of the issue.
Another user had been using the temp directory to store intermediate generated files. The user was not clearing out these files, resulting in 100's of thousands of files in the directory. They were not very large so we didn't notice. For some reason Job Scheduler started to choke based on this. I'm not clear on the reasons.
Clearing the temp directory, scolding the user, and fixing his script fixed the issue.
I have a question haunting me for a long time.
Short version:
What's the working paradigm of Windows Message Loop?
Detailed version:
When we start a Windows application (not a console application), we can interact with it through mouse or keyboard. The application retrieve all kinds of messages representing our movements from its meesage queue. And it is Windows that is responsible for collecting our actions and properly feeding messages into this queue. But doesn't this scenario mean that Windows has to run infinitively?
I think the Windows scheduler should be running all the time. It could possibly be invoked by a time interrupt at a pre-defined interval. When the scheduler is trigged by the time interrupt, it swithes current thread for the next pending thread. A single thread can only get its message with GetMessage() when it is scheduled to run.
I am wondering if there's only one Windows application running, will this application got more chance to get its message?
Update - 1 (9:59 AM 11/22/2010)
Here is my latest finding:
According to < Windows via C/C++ 5th Edition > Chapter 7 Section: Thread Priorities
...For example, if your process'
primary thread calls GetMessage() and
the system sees that no messages are
pending, the system suspends your
porcess' thread, relinquishes the
remainder of the thread's time slice,
and immediately assigns the CPU to
another waiting thread.
If no messages show up for GetMessage
to retrieve, the process' primary
thread stays suspended and is never
assigned to a CPU. However, when a
message is placed in the thread's
queue, the system knows that the
thread should no longer be suspended
and assigns the thread to a CPU if no
higher-priority threads need to
execute.
My current understanding is:
In order for the system to know when a message is placed in a thread's queue, I can think of 2 possible approaches:
1 - Centralized approach: It is the system who is responsible to always check EVERY thread's queue. Even that thread is blocked for the lacking of messages. If any message is availabe, the system will change the state of that thread to schedulable. But this checking could be a real burden to the system in my opinion.
2 - Distributed approach: The system doesn't check every thread's queue. When a thread calls GetMessage and find that no message is available, the system will just change the thread's state to blocked, thus not schedulable any more. And in the future no matter who places a message into a blocked thread's queue, it is this "who"(not the system) that is responsible to change the the thread's state from blocked to ready (or whatever state). So this thread is dis-qualified for scheduling by the system and re-qualified by someone else in the regard of GetMessage. What the system cares is just to schedule the runable threads. The system doesn't care where these schedulable threads come from. This approach will avoid the burden in approach 1, and thus avoid the possible bottleneck.
In fact, the key point here is, how are the states of the threads changed? I am not sure if it is really a distributed paradigm as shown in appraoch 2, but could it be a good option?
Applications call GetMessage() in their message loop. If the message queue is empty, the process will just block until another message becomes available. Thus, GetMessage is a processes' way of telling Windows that it doesn't have anything to do at the moment.
I am wondering if there's only one
Windows application running, will this
application got more chance to get its
message?
Well yeah probably, but I think you might be missing a crucial point. Extracting a message from the queue is a blocking call. The data structure used is usually referred to as a blocking queue. The dequeue operation is designed to voluntarily yield the current thread's execution if the queue is empty. Threads can stay parked using a various different methods, but it is likely that thread remains in a waiting state using kernel level mechanisms in this case. Once the signal is given that the queue has items available the thread may go into a ready state and the scheduler will start assigning its fair share of the CPU. In other words, if there are no messages pending for that application then it just sits there in an idle state consuming close to zero CPU time.
The fewer threads you have running (time slices are scheduled to threads, not processes), the more chances any single application will have to pull messages from its queue. Actually, this has nothing to do with Windows messages; it's true for all multithreading; the more threads of the same or higher priority which are running, the fewer time slices any thread will get.
Beyond that, I'm not sure what you are really asking, though...
My application that wraps around Oracle Data pump's executables IMPDP and EXPDP takes random amounts of time for the same work. On further investigation, I see it waiting for again random amounts of time with the event 'wait for unread message on broadcast channel'. This makes the application take anytime b/w 10 minutes to over an hour for the same work.
I fail to understand if this has something to do with the way my application uses these executables, or it has got something to do with Load on my server or something totally alien to me.
There's a bunch of processes and sessions involved in a data pump operation.
I suspect you are looking at the master processes, not at the worker processes. So all that event is saying is that the Master process spends more time waiting for the worker process when the job takes longer. Which is fairly useless information.
You need to monitor the worker processes and see why they are taking longer.
Those wait events are usually considered to be "idle" waits - i.e. Oracle has nothing to do, it is waiting for further data/instructions.