Preventing a heavy process from sinking into the swap file - Windows

Our service tends to fall asleep at night on our client's server and then has a hard time waking up. What seems to happen is that the process heap, which is sometimes several hundred MB, is moved to the swap file. This happens at night, when our service is not used and other jobs are scheduled to run (DB backups, AV scans, etc.). When this happens, after a few hours of inactivity the first call to the service takes up to a few minutes (subsequent calls take seconds).
I'm quite certain it's an issue of virtual memory management, and I really hate the idea of forcing the OS to keep our service in physical memory. I know doing that will hurt other processes on the server and decrease overall server throughput. That said, our clients just want our app to be responsive. They don't care if nightly jobs take longer.
I vaguely remember there's a way to force Windows to keep pages in physical memory, but I really hate that idea. I'm leaning more towards some internal or external watchdog that will periodically exercise higher-level functionality (there is already an internal scheduler that does very little and makes no difference). If there were a third-party tool that provided that kind of service, it would be just as good.
I'd love to hear any comments, recommendations and common solutions to this kind of problem. The service is written in VC2005 and runs on Windows servers.

As you mentioned, forcing the app to stay in memory isn't the best way to share resources on the machine. A quick solution that may work well is to schedule an event that wakes your service up at a specific time each morning, before your clients start to use it. You can schedule it in the Windows Task Scheduler with a simple script or EXE call.

I'm not saying you want to do this, or that it is best practice, but you may find it works well enough for you. It seems to match what you've asked for.
Summary: Touch every page in the process, one page at a time, on a regular basis.
What about a thread that runs in the background and wakes up once every N seconds? Each time the thread wakes up, it attempts to read from address X. The attempt is protected with an exception handler in case you read a bad address. Then it increments X by the size of a page.
With the usual 4 KB page size there are 1,048,576 pages in 4 GB, 786,432 pages in 3 GB and 524,288 pages in 2 GB. To choose N, divide your idle time (overnight dead time) by the number of pages times the number of times you want (or attempt) to hit each page.
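The arithmetic can be sketched as a small C++ helper. It assumes the common 4 KB page size (on a real system, use dwPageSize from GetSystemInfo()) and an 8-hour overnight window; touch_interval is a hypothetical name, not a Windows API. With numbers like these the interval comes out well under a second, so it is easier to work in milliseconds than in whole seconds.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: how long to sleep between page touches so that
// `bytes` of address space is swept `sweeps` times during `idle`.
std::chrono::milliseconds touch_interval(std::uint64_t bytes,
                                         std::uint64_t page_size,
                                         std::uint64_t sweeps,
                                         std::chrono::milliseconds idle)
{
    assert(page_size > 0 && sweeps > 0);
    const std::uint64_t pages = (bytes + page_size - 1) / page_size;  // round up
    const std::uint64_t touches = pages * sweeps;
    // Integer division truncates, so the sweep finishes slightly early.
    return std::chrono::milliseconds(
        idle.count() / static_cast<std::chrono::milliseconds::rep>(touches));
}
```

For example, sweeping 4 GB once in 8 hours gives 27 ms per page, and 2 GB gives 54 ms.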
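```cpp
// Assumes <windows.h>; sizeOfVMPage and N are set up elsewhere.
volatile BYTE *ptr = NULL;   // volatile so the compiler cannot optimize the read away
while (TRUE)
{
    __try
    {
        BYTE b = *ptr;       // touching the page faults it back in if it was paged out
        (void)b;
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
        // ignore, some pages won't be accessible
    }
    ptr += sizeOfVMPage;
    Sleep(N * 1000);         // N seconds between pages
}
```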
You can get the sizeOfVMPage value from the dwPageSize member of the SYSTEM_INFO structure filled in by GetSystemInfo().
Don't try to avoid the exception handler by using if (!IsBadReadPtr(ptr, sizeOfVMPage)), because other threads in the app may be modifying memory protections at the same time. If you come unstuck because of this, it will be almost impossible to identify why (it will most likely be a non-repeatable race condition), so don't waste time with it.
Of course, you'd want to turn this thread off during the day and only run it during your dead-time.

A third approach could be to have your service run a thread that does something trivial, like incrementing a counter, and then sleeps for a fairly long period, say 10 seconds. This should have minimal effect on other applications but keep at least some of your pages available.
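A minimal sketch of that keep-alive thread in standard C++ (C++11 or later, so newer than the VC2005 toolchain in the question; a Win32 thread calling Sleep() would do the same job). KeepAlive is a hypothetical class; the condition variable lets the destructor stop the thread promptly instead of waiting out the rest of a 10-second sleep.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical keep-alive helper: a background thread that does something
// trivial (increments a counter) and then sleeps for `period`.
class KeepAlive {
public:
    explicit KeepAlive(std::chrono::milliseconds period)
        : period_(period), worker_([this] { run(); }) {}

    ~KeepAlive() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();  // wake the thread early instead of waiting out `period`
        worker_.join();
    }

    unsigned long ticks() const { return ticks_.load(); }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stop_) {
            ++ticks_;  // the trivial work
            // Sleep for `period`, but return immediately if asked to stop.
            cv_.wait_for(lock, period_, [this] { return stop_; });
        }
    }

    std::chrono::milliseconds period_;
    std::atomic<unsigned long> ticks_{0};
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_ = false;
    std::thread worker_;  // declared last so every member above is initialized first
};
```

In the service it would be created once, e.g. KeepAlive keepAlive(std::chrono::seconds(10));, and kept for the lifetime of the process. As the answer says, it only keeps the pages it actually touches (the thread's stack and the loop's code) resident, not the whole heap.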

The other thing to ensure is that your data is localized.
In other words: do you really need all 300 MiB of the memory before you can do anything? Can the data structures you use be rearranged so that any particular request could be satisfied with only a few megabytes?
For example:
- If your 300 MiB of heap memory contains facial recognition data, can the data internally be arranged so that male and female face data are each stored together? Or so that big noses are separate from small noses?
- If it has some sort of logical structure, can it be sorted, so that a binary search can skip over a lot of pages?
- If it's a proprietary, in-memory database engine, can the data be better indexed or clustered so that it doesn't require so many memory page hits?
- If they're image textures, can commonly used textures be located near each other?
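To illustrate the sorting point, here is a minimal C++ sketch assuming fixed-size records kept in a single vector sorted by key (Record and find_record are hypothetical). A lookup with std::lower_bound reads only about log2(n) records, so after the data has been paged out only those few pages need to be faulted back in, whereas a linear scan would touch every page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record type standing in for whatever the service keeps in memory.
struct Record {
    std::uint32_t key;
    std::uint32_t payload;
};

// Looks up `key` in `records`, which must be sorted by key.
// Returns nullptr if the key is not present.
const Record* find_record(const std::vector<Record>& records, std::uint32_t key) {
    assert(std::is_sorted(records.begin(), records.end(),
                          [](const Record& a, const Record& b) { return a.key < b.key; }));
    auto it = std::lower_bound(records.begin(), records.end(), key,
                               [](const Record& r, std::uint32_t k) { return r.key < k; });
    if (it == records.end() || it->key != key) return nullptr;
    return &*it;
}
```

For a million 8-byte records (about 8 MB, or roughly 2,000 pages of 4 KB), a lookup reads about 20 records. The is_sorted assertion scans everything, so it only belongs in debug builds.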
Can you really not service a request without all of that data back in memory?
Otherwise: scheduled task at 6 ᴀᴍ to wake it up.

In terms of cost, the cheapest and easiest solution is probably just to buy more RAM for that server, and then you can disable the page file entirely. If you're running 32-bit Windows, just buy 4GB of RAM. Then the entire address space will be backed with physical memory, and the page file won't be doing anything anyway.

Related

Elimination of run time variation over repeated executions of the same program

I am trying to design an Online Programming Contest Judge, and one of the things I need to ensure is that when the same code is compiled (if compilation is required) and given the same input, it takes exactly the same amount of time to execute each time.
Currently, I am using a simple Python script that has 2 threads: one invokes a blocking system call that starts the execution of the test code, and the other keeps track of time and sends a kill signal to the child process after the time limit expires. Incidentally, I am doing this inside a virtual machine for reasons of security and convenience (setting up a proper chroot is way too complicated, and more risky).
However, given identical conditions (i.e., when I restore a snapshot), I still get a variation in execution time of approximately 50 ms either way. As this prevents setting strict time limits, is there any way to eliminate this variation?
I'm not an expert in that field, but I don't think you can do it. Even if you restore the snapshot inside the VM, the state of the "outside" machine is going to be quite different. You have two OSs running, each with multiple processes that will probably compete for resources at some point. If it's a web server or a PC with an internet connection, it can get hit by different numbers of connections (or requests), which will wake up processes that then consume CPU time, etc. If some application accesses the hard disk, the initial position of the disk head matters a lot for seek time, etc.
If you want a "deterministic" limit, you might want to check whether you can count how many instructions were executed by a certain process, or something like that.
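Counting instructions is the most precise option. A coarser but widely available alternative on Linux is to measure the child's CPU time (user + system) instead of wall-clock time; CPU time is less affected by other processes competing for the machine, although it still varies somewhat. The sketch below assumes a Linux/glibc host (fork, execvp and wait4; wait4 is a BSD/Linux call rather than strict POSIX), and run_and_measure_cpu is a hypothetical helper name.

```cpp
#include <cassert>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs argv[0] (searched on PATH) as a child process,
// waits for it, and returns its user + system CPU time in seconds.
// Returns -1.0 if the process could not be started or waited for.
double run_and_measure_cpu(char* const argv[], int* status_out) {
    assert(argv != nullptr && argv[0] != nullptr);
    pid_t pid = fork();
    if (pid < 0) return -1.0;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);  // only reached if exec failed
    }
    int status = 0;
    struct rusage usage {};
    if (wait4(pid, &status, 0, &usage) < 0) return -1.0;  // also fills in the child's usage
    if (status_out != nullptr) *status_out = status;
    const double user = usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1e6;
    const double sys = usage.ru_stime.tv_sec + usage.ru_stime.tv_usec / 1e6;
    return user + sys;
}
```

A judge could compare the returned CPU time against the limit, keeping the existing wall-clock watchdog as a backstop for programs that block or sleep without using CPU.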
Anyway, I've participated in several programming contests, and as far as I know, they don't care about 50 ms differences. If you use a proper algorithm, you get inside the time limit with a really big margin. So I'd advise you to live with it and just include that in the rules.

Why do you use the keyword delete?

I understand that delete returns memory to the heap that was allocated from the heap, but what is the point? Computers have plenty of memory, don't they? And all of the memory is returned as soon as you "X" out of the program.
Example:
Consider a server that allocates a Packet object for each packet it receives (this is bad design, for the sake of the example).
A server, by nature, is intended never to shut down. If you never delete the thousands of Packet objects your server handles per second, your system is going to start swapping and crash within a few minutes.
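A minimal C++ sketch of the server example (Packet and handle_packets are hypothetical): because each packet is deleted once it has been handled, the number of live packets never exceeds one, however long the loop runs. Remove the delete and Packet::live grows by one per packet, which is exactly the leak described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical packet type; `live` counts how many packets currently exist.
struct Packet {
    static int live;
    std::vector<unsigned char> data;
    explicit Packet(std::size_t n) : data(n) { ++live; }
    ~Packet() { --live; }
};
int Packet::live = 0;

// Handles `count` packets one at a time and returns the peak number of
// packets that were alive at once.
int handle_packets(int count) {
    int peak = 0;
    for (int i = 0; i < count; ++i) {
        Packet* raw = new Packet(1500);
        // ... process the packet ...
        if (Packet::live > peak) peak = Packet::live;
        delete raw;  // without this, memory use grows by one packet per iteration
    }
    return peak;
}
```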
Another example:
Consider a video game that allocates particles for the special effects every time a new explosion is created (and never deletes them). In a game like StarCraft (or other recent ones), after a few minutes of hilarity and destruction (and hundreds of thousands of particles), the lag will be so huge that your game turns into a PowerPoint slideshow, making your player unhappy.
Not all programs exit quickly.
Some applications may run for hours, days or longer. Daemons may be designed to run without cease. Programs can easily consume more memory over their lifetime than available on the machine.
In addition, not all programs run in isolation. Most need to share resources with other applications.
There are a lot of reasons why you should manage your memory usage, as well as any other computer resources you use:
What might start off as a lightweight program could soon become more complex; depending on your design, areas of memory consumption may grow exponentially.
Remember you are sharing memory resources with other programs. Being a good neighbour allows other processes to use the memory you free up, and helps to keep the entire system stable.
You don't know how long your program might run for. Some people hibernate their session (or never shut their computer down) and might keep your program running for years.
There are many other reasons; I suggest researching memory allocation for more details on the do's and don'ts.
I see your point that computers have lots of memory, but you are wrong. As an engineer, you have to create programs that use computer resources properly.
Imagine you made a program which runs all the time while the computer is on. It sometimes creates objects/variables with "new". After some time you don't need them anymore, but you don't delete them. This happens from time to time, and you gradually run the machine out of RAM. After a while the user has to terminate your program and launch it again. That is not catastrophic, but it is not comfortable either; what is more, your program may take a while to load. Because of this, the user is left unhappy with your careless decision.
Another thing: when you use "new" to create an object, its constructor is called, and "delete" calls its destructor. Let's say you need to open some file and the destructor closes it, making it accessible to other processes again. In this case you would steal not only memory but also files from other processes.
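The constructor/destructor point can be sketched in C++ as follows. TempFile is a hypothetical wrapper that opens a temporary file with std::tmpfile() in its constructor and closes it in its destructor; the file stays open until delete runs the destructor, so a forgotten delete leaks the file handle as well as the memory.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical wrapper whose destructor releases a file handle.
class TempFile {
public:
    static int open_count;  // files currently held open by TempFile objects

    TempFile() : file_(std::tmpfile()) {
        if (file_ != nullptr) ++open_count;
    }
    ~TempFile() {
        if (file_ != nullptr) {
            std::fclose(file_);  // released only when the destructor runs
            --open_count;
        }
    }
    // Copying is disabled so two objects can never close the same FILE*.
    TempFile(const TempFile&) = delete;
    TempFile& operator=(const TempFile&) = delete;

    bool is_open() const { return file_ != nullptr; }

private:
    std::FILE* file_;
};
int TempFile::open_count = 0;
```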
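If you don't want to use "delete", you can use shared pointers, which free the object automatically through reference counting (a simple form of automatic memory management, not a full garbage collector).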
std::shared_ptr is part of the standard library since C++11 (some older compilers provide it as std::tr1::shared_ptr). If your compiler, or the toolchain you need in order to target older systems such as Windows XP, doesn't provide it, Boost has boost::shared_ptr. To use Boost you need to download it and configure your development environment to use it.
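A minimal sketch of std::shared_ptr in C++11 (Widget and make_shared_widget are hypothetical; boost::shared_ptr offers the same basic interface for older compilers). Each copy of the pointer increments a reference count, and the object is destroyed when the last owner goes away, with no explicit delete anywhere.

```cpp
#include <cassert>
#include <memory>

// Hypothetical object that counts how many instances are alive.
struct Widget {
    static int alive;
    Widget() { ++alive; }
    ~Widget() { --alive; }
};
int Widget::alive = 0;

// Creates a Widget shared by two owners, drops one owner, and returns the
// remaining one.
std::shared_ptr<Widget> make_shared_widget() {
    std::shared_ptr<Widget> first = std::make_shared<Widget>();
    std::shared_ptr<Widget> second = first;  // reference count is now 2
    first.reset();                           // back to 1; the Widget stays alive
    return second;
}
```

One caveat of reference counting: two objects that hold shared_ptrs to each other are never freed, and std::weak_ptr exists to break such cycles.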

Give all possible resources to a program

I created a program in C# that works with 2.5 million records in Oracle Express (a local instance), parsing/splitting those records and creating an additional 5 million records.
I added some code to print timings on the screen, and it seems fairly fast: it processes 1K records every 9 seconds. That still means it takes more than 6 hours to finish.
Now, in Task Manager I can see the program is using at most 6% of CPU and around 50MB of memory. I understand the OS and Oracle itself need resources to operate, but... is there a way to tell this little program "hey, it's OK, go ahead and use at least 50% of the CPU, there are 4GB of RAM, so knock yourself out"?
Note: one of the reasons I'm using a local Oracle Express instance is to reduce the network bottleneck. Also, I might not run this process very often, but I was intrigued to see whether this was at all possible.
Please forgive my noobness,
Thanks!
The operating system will give your program all the resources it needs; the reason your process is not consuming all the CPU is probably that it's waiting on the I/O subsystem more than on the processor.
If you want to see what consuming more CPU cycles looks like, try writing a program that runs a tight infinite loop as fast as possible, and you will see the difference in CPU usage.
A number of thoughts, not really answers I guess, but:
You could raise the priority of the application's thread; however, it's possible that the code is less efficient than you think, so...
Have you run a profiler on it?
If it's currently a single-threaded app, you could see whether you can parse the records in batches and run the batches in parallel.
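The question's program is C#, where tasks or Parallel.ForEach would fill this role; as a language-neutral illustration, here is a minimal C++ sketch of batch-parallel processing. split_record is a hypothetical stand-in for the real parse/split logic (it turns one record into two, mirroring the 2.5 million to 5 million ratio in the question), and split_all runs each contiguous batch on its own std::async task, then joins the results in order. This only helps if the parsing itself is CPU-bound; if the time goes into waiting on Oracle, it mainly adds more concurrent database calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for the real per-record work: split "a|b" into two parts.
std::vector<std::string> split_record(const std::string& rec) {
    std::size_t bar = rec.find('|');
    if (bar == std::string::npos) return {rec};
    return {rec.substr(0, bar), rec.substr(bar + 1)};
}

// Splits `records` into `batches` contiguous batches, processes each batch on
// its own task, and concatenates the results in the original order.
std::vector<std::string> split_all(const std::vector<std::string>& records,
                                   std::size_t batches) {
    assert(batches > 0);
    const std::size_t size = (records.size() + batches - 1) / batches;
    std::vector<std::future<std::vector<std::string>>> futures;
    for (std::size_t start = 0; start < records.size(); start += size) {
        const std::size_t end = std::min(start + size, records.size());
        futures.push_back(std::async(std::launch::async, [&records, start, end] {
            std::vector<std::string> out;
            for (std::size_t i = start; i < end; ++i) {
                std::vector<std::string> parts = split_record(records[i]);
                out.insert(out.end(), parts.begin(), parts.end());
            }
            return out;
        }));
    }
    std::vector<std::string> result;
    for (auto& f : futures) {
        std::vector<std::string> part = f.get();  // waits for that batch
        result.insert(result.end(), part.begin(), part.end());
    }
    return result;
}
```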
Without knowing much detail about how the records are split, is it possible to hand more of that work off to Oracle? Then it would matter less whether the network is involved, whether the instance is local, and so on.
If your app is drawing or updating a screen or UI, that will almost certainly slow the work down. An example: I ran an app which sorted about 10k emails into around 250k lines in a database. If I added an item to a list box for each line, the run time went from short to ridiculous (e.g., I got bored and crashed out). So, again, offloading the work to a thread and doing as few UI updates as possible can help.
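A minimal sketch of throttled progress reporting in C++ (process_with_progress and its callbacks are hypothetical). The work loop reports only every `every` items and once at the end, so 250k rows with a step of 1,000 cause 250 UI updates instead of 250,000. In a real UI the report callback would also need to marshal the update onto the UI thread, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Processes `total` items and calls `report` only every `every` items (and
// once at the end), instead of updating the UI after each item.
void process_with_progress(std::size_t total, std::size_t every,
                           const std::function<void(std::size_t)>& process,
                           const std::function<void(std::size_t)>& report) {
    assert(every > 0);
    for (std::size_t i = 0; i < total; ++i) {
        process(i);
        const std::size_t done = i + 1;
        if (done % every == 0 || done == total) report(done);
    }
}
```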

How to force workflow runtime to use more CPU power?

Hello
I have quite an unusual problem, because I think that in my case the workflow runtime doesn't use enough CPU power. The scenario is as follows:
1) I send a lot of messages to queues, using the EnqueueItem method of the WorkflowRuntime class.
2) I create a new workflow instance with the CreateWorkflow method of the WorkflowRuntime class.
3) I wait until the new workflow has moved to its first state. Under normal conditions this takes dozens of seconds (the workflow is complicated). When messages are being sent to queues at the same time (as described in point 1), it takes 1 minute or more.
I observe low CPU utilization (8 cores), no more than 15%. I should add that I have a separate process that is responsible for workflow logic, and I communicate with it via WCF.
You've got logging, which you think is not a problem, but you don't know. There are many database operations. Those need to block for I/O. Having more cores will only help if different threads can run unimpeded.
I hate to sound like a stuck record, always trotting out the same answer, but you are guessing at what the problem is, and you're asking other people to guess too. People are very willing to guess, but guesses don't work. You need to find out what's happening.
To find out what's happening, the method I use is, get it running under a debugger. (Simplify the problem by going down to one core.) Then pause the whole thing, look at each active thread, and find out what it's waiting for. If it's waiting for some CPU-bound function to complete for some reason, fine - make a note of it. If it's waiting for some logging to complete, make a note. If it's waiting for a DB query to complete, note it. If it's waiting at a mutex for some other thread, note it.
Do this for each thread, and do it several times. Then, you can really say you know what it's doing. When you know what it's waiting for and why, you'll have a pretty good idea how to improve it. That's a variation on this technique.
What are you doing in the work item?
If you have any sort of cross-thread synchronisation (critical sections, etc.), this could cause you to spend time stalling threads that are waiting for resources to become free.
For example, if you are doing any sort of file access, you are going to spend considerable time blocked waiting for loads to complete, and this will leave your threads idle a lot of the time. You could throw more threads at the problem, but then you'd end up generating more disk requests and the resource contention would become even more of a problem.
That's a couple of potential ideas, but I'd really need to know what you are doing before I can be more useful...
Edit: in answer to your comments...
1) OK
2) You'd perform terribly with 2000 threads working flat out, due to switching overhead. In fact, running 20-25 threads on an 8-core machine may be a bad plan too, because if you get them running at high speed they will spend time stealing each other's run time, and regular context switches (software thread switches) are very expensive. They may not be as expensive as the waits your code is suffering, though.
3) Logging? Do you just submit messages to an asynchronous queue that writes them out to disk when it has the opportunity, or are they synchronous file writes? If they are asynchronous, can you guarantee that there isn't a maximum number of requests that can be queued before you DO have to wait? And if you have to wait, how many threads end up in contention for the space that just opened up? There are a lot of ifs there alone.
4) Database operations, even on the best database, are likely to block if 2 threads make similar calls into the database simultaneously. A good database is designed to limit this, but it's quite likely that at least some clashing will happen.
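Point 3 can be made concrete with a minimal bounded queue in C++ (BoundedLogQueue and log_through_queue are hypothetical; the service in the question is .NET, so treat this as an illustration of the mechanism). While the queue has room, push() returns immediately and logging really is asynchronous. Once capacity messages are waiting, every producer blocks in push(), and each pop() wakes just one of them, so the others keep waiting for the next free slot.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded log queue: producers call push(), a writer thread calls pop().
class BoundedLogQueue {
public:
    explicit BoundedLogQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    void push(std::string msg) {
        std::unique_lock<std::mutex> lock(mutex_);
        // This is where "asynchronous" logging stalls the caller once the queue is full.
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(std::move(msg));
        not_empty_.notify_one();
    }

    std::string pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        std::string msg = std::move(items_.front());
        items_.pop_front();
        not_full_.notify_one();  // wakes one producer blocked on a full queue
        return msg;
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<std::string> items_;
};

// Demo: one producer logs `count` lines through a queue of `capacity` slots
// while a writer thread drains it; returns the lines in the order written.
std::vector<std::string> log_through_queue(std::size_t count, std::size_t capacity) {
    BoundedLogQueue queue(capacity);
    std::vector<std::string> written;
    std::thread writer([&] {
        for (std::size_t i = 0; i < count; ++i) written.push_back(queue.pop());
    });
    for (std::size_t i = 0; i < count; ++i) queue.push("line " + std::to_string(i));
    writer.join();
    return written;
}
```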
Suffice it to say you will want to get a good thread profiler to see where time is REALLY being lost. Failing that, you will just have to live with the performance or attack the problem in a different way...
WF3 performance is a little on the slow side. If you are using .NET 4, you will get better performance by moving to WF4. Mind you, this means a rewrite, as WF4 is a completely different product.
As to WF3, there is a white paper here that should give you plenty of information on improving things beyond the standard settings. Look for things like increasing the number of threads used by the DefaultWorkflowSchedulerService, switching to the ManualWorkflowSchedulerService, and disabling performance counters, which are enabled by default.

Effect of short term high VM memory usage (Windows)

In the app I'm writing, I use a lot of in memory containers (C++ std containers but I don't think that's relevant).
During one "task" of my app, in a heavy usage edge-case the private bytes memory usage hits 1GB.
Just as a bit of context, this task is a user-initiated task involving hundreds of thousands of files. It's likely that the user will kick this off and then leave the machine running.
(And no I don't do anything dumb like load files into memory - this footprint is all metadata related to the task in progress).
For most users the memory usage during this task is negligible; it's just the 1% of users who want to do 500,000 "things" instead of 5000 "things".
I was about to embark on a process to move a lot of this in-memory stuff to disk somehow, e.g. scratch file, embedded DB.
But then I thought - "hang on a minute. All of these solutions are essentially caching memory to disk. Isn't that what Virtual Memory is for?".
I'm not interested in persisting this data - it's purely scratch/temporary stuff I need access to while the task is running.
So my question is, what should I do?
I don't want to do a major refactor for that 1%, but I want to know the impact of running an app with that high a memory footprint.
Am I right in saying that I probably wouldn't be able to do much better than the Windows VM manager anyway?
Under what conditions does this become harmful? OK, so yes, if I used up all the real memory then it'd be thrashing to reload pages. But wouldn't I have that anyway in the case of, e.g., an embedded database?
Cheers,
John
Yes, the memory manager will do the job for you. Not without side effects, though: it is going to evict pages from RAM that other processes have mapped and give them to you. Those other processes are going to get slowed down by this; they'll be hit with a page fault when they access such a swapped-out page.
The balancing act here is whether your app is "important" enough to justify those other processes getting short shrift. Usually that's yes on a workstation, and a resounding no on a server.
