PHP CLI - Detect where my memory is wasted - memory-management

I'm using PHP + Zend Framework for several CLI daemons.
They take up quite a bit of memory. I'm assuming the Zend Framework part might be causing this, but I want to have facts showing me where the memory is wasted.
How can I determine where memory is wasted? Is this just a trial + error process?
Also, how can I improve garbage collection? (I read some articles suggesting that this might also be a cause of high memory usage.)

I'd recommend using XDebug's profiler, which should give you the answers you need.
The profiler will generate a cachegrind file, which you can view in a tool such as KCacheGrind to see where your program's bottlenecks and memory usages are.
Find out more on XDebug's profiler page: http://www.xdebug.org/docs/profiler

IME, PHP uses a huge amount of memory for parsing code - try building a simple script which does nothing other than explicitly including all the libs you're using and track the memory usage at start/finish. Compare this with what you see in your actual script.
Htbaa is partially correct: more recent versions of PHP have a much smarter garbage collector. However, the earlier versions still do garbage collection; they just don't find all the cases that the newer GC does. But because it's garbage collection, you'll typically see something of a sawtooth in memory usage given a steady input load.
But good garbage collection won't fix bad code - if you've stored something in a variable which is not on the stack, then you need to unset it when you're done with it.

What version of PHP are you running? Only PHP >=5.3 has a decent garbage collector. PHP <=5.2 can eat all your memory when used to run daemon scripts.

Related

Memory Leak Issue in Windows Phone Development - Silverlight Framework

I am creating a game for Windows Phone using C# and the Silverlight platform. I am new to this technology and am currently facing a memory leak issue.
Following my research, I have tried everything I could find, including cleaning up events and strings and invoking the garbage collector.
Can anyone please give common tips on how best to use the garbage collector and manage memory, since that seems to be the issue right now? When my garbage collector reaches a size of 5 lakh (500,000), it stops collecting new things and the application crashes.
I also tried emptying the garbage collector by passing parameter 0 to GC.Collect, but that crashes the app.
Can you please explain the basic things to take care of, the process to follow to avoid such issues, and the best use of GC.Collect?
Thanks in advance,
Jacob
In general, you should never have to call GC.Collect yourself as unused objects will be automatically collected every few seconds.
As for what can prevent objects from being collected, it comes down to them being "rooted". Roots include:
Any static references
Any references held by the run loop (your Application is the closest thing here)
Anything being displayed on the current page or any page behind it
Anything referenced by any of the above (including UI events), or referenced by anything that is referenced by any of the above (etc).
In the above scenarios, those objects and any objects they hold a reference to cannot be GC'd. So as for advice:
Avoid defining anything as static
Be careful how many objects are held by Application
Avoid a navigation model that allows your back stack to grow to unlimited levels
Potentially look at setting references to large data sets to null in your page/viewmodel's OnNavigatedFrom method and re-initialise them in OnNavigatedTo
I'd recommend using the Windows Phone Profiler, which comes with the 7.1 SDK. It will tell you what objects are in memory and why.
Without seeing any of your code, it is difficult to give specific advice.
However, I strongly suggest you run a memory profiling tool like ANTS Memory Profiler or .Net Memory Profiler. These tools will show you what portions of your code are never released and are very helpful in making the adjustments that you need.

Codeigniter. Autoload models, will things get slower?

I am building an API using Codeigniter.
In this API I got 10 models that I use now and then.
Currently I am loading them when I need them but I am thinking of auto loading
all models instead (to cut down on space in my controllers).
What will I lose by doing this? Will they cause things to slow down?
You are instructing CI to autoload your models into memory, which will increase the memory footprint. I think autoloading won't have much effect on performance if you have plenty of RAM available, but if you run PHP using mod_php then it might cause some slowdown because PHP processes have to be respawned per request.
In any case, before making a decision -- Profile your app! there are two ways to do it.
PECL APD
Xdebug + kcachegrind (linux) or wincachegrind (windows) and it'll show you a few pretty charts that detail the exact timings, counts and memory usage (but you need another extension for that).
I would suggest the PECL APD extension because it's easier to set up.

Are there memory leaks in Linux?

This question is purely theoretical.
I was wondering whether the Linux source code could have memory leaks, and how they debugged it, considering that it is Linux, after all, that deals with each program's memory?
I obviously understand that Linux, being written in C, has to deal itself with malloc and free. What I don't understand is how we measure the operating system's memory leaks.
Note that this question is not Linux-specific; it also addresses the corresponding issues in Windows and MacOS X (darwin).
Quite frequently, non-mainstream drivers and the staging tree have memory leaks. Follow the LKML and you can see occasional fixes for mistakes in the networking code for corner cases handling lists of SKBs.
Due to the nature of the kernel most work is code review and refactoring, but work is ongoing to make more tools:
http://www.linuxfoundation.org/en/Google_Summer_of_Code#kmemtrace_-_Kernel_Memory_Profiler
In certain cases you can use frameworks like Usermode Linux and then use conventional tools like Valgrind to attempt to peer into the running code:
http://user-mode-linux.sourceforge.net/
The implementation of malloc and free (actually brk/sbrk, since malloc and free are implemented by libc in-process) are not magical or special - it's just code, just like anything else, and there are data structures behind it that describe the mappings.
If you want to test correct behavior, one way is to write test programs in user-space that are known to allocate then free all their memory correctly. Run the app, then check the internal memory allocation structures in kernel mode using a debugger (or better yet, make this check a debug assert on process shutdown).
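The user-space half of that test could look something like the sketch below. It is only an illustration: `AllocStats` and `allocate_then_free` are hypothetical names, and the kernel-side check (inspecting the process's mappings at shutdown) is not shown because it depends on the kernel and debugger in use.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical bookkeeping so the test program can prove it balanced its own books.
struct AllocStats {
    std::size_t allocated = 0;
    std::size_t freed = 0;
};

// Allocates `blocks` buffers of `block_size` bytes, touches each one so the
// pages are really backed by memory, then frees every buffer again.
AllocStats allocate_then_free(std::size_t blocks, std::size_t block_size) {
    AllocStats stats;
    std::vector<void*> live;
    live.reserve(blocks);

    for (std::size_t i = 0; i < blocks; ++i) {
        void* p = std::malloc(block_size);
        if (p == nullptr) {
            break;  // out of memory: stop allocating, still free what we have
        }
        std::memset(p, 0xAB, block_size);  // force the pages to be touched
        live.push_back(p);
        ++stats.allocated;
    }

    for (void* p : live) {
        std::free(p);
        ++stats.freed;
    }
    return stats;
}
```

If `allocated` and `freed` match on exit, any memory the kernel still attributes to the process afterwards points at the kernel side rather than at the test program.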
All software has bugs, including operating systems. Some of those bugs will result in memory leaks.
Linux has a kernel debugger to help track down these things, but one has to discover that they exist before one can track them down. Usually, once a bug has been discovered and can be replicated at will, it becomes much easier to fix (relatively speaking! Obviously you need a good coder to do the job). The hard part is finding the bugs in the first place and creating reliable test cases that demonstrate them. This is where you need to have a skilled QA team.
So I guess the short version of this answer is that good QA is as important as good coding.

Can you start and stop boundschecker (DevPartner)?

I'm trying to use boundschecker to analyze a rather complex program. Running the program with boundschecker is almost too slow for it to be of any use since it takes me almost a day to run the program to the point in the code where I suspect the issue exists. Can anyone give me some ideas for how to inspect only certain parts of my software using boundschecker (DevPartner) in Visual Studio 2005?
Thanks in advance for all your help!
I last used BoundsChecker a few years ago, and had the same problems. With large projects, it makes everything run so slowly that it is useless. We ended up ditching it.
But we still needed some of its functionality, though, like you, not for the whole program. So we had to do it ourselves.
In our case, we mainly used it to try and track down memory leaks. If that's your objective as well, there are other options.
1. Visual Studio does a pretty good job of telling you about memory leaks when your program exits.
2. It reports leaks in the order that they were created.
3. It will tell you exactly where the leaked memory was created if your source files have this at the top:
#ifdef _DEBUG
#undef THIS_FILE
static char THIS_FILE[]=__FILE__;
#define new DEBUG_NEW
#endif
Those help a lot, but it's often not enough. Adding that snippet everywhere isn't always feasible. If you use factory classes, knowing where memory was allocated doesn't help at all. So when all else fails, we take advantage of #2.
Add something like the following:
#define LEAK(str) {char *s = (char*)malloc(100); strcpy(s, str);}
Then, pepper your code with "LEAK("leak1");" or whatever. Run the program, and exit it. Your new leaked strings will display in Visual Studio's leak dump surrounding the existing leaks. Keep adding/moving your LEAK statements and re-running the program to narrow your search until you've pinpointed the exact location. Then fix the leak, remove your debugging leaks, and you're all set!
BoundsChecker tracks all memory allocations and releases in extreme detail. It knows, for instance, that such and such a memory allocation was done from the C runtime heap, which in turn was taken from a Win32 heap, which in turn started life as memory allocated by VirtualAlloc. If the application was instrumented (FinalCheck), it also has detailed information as to which pointers reference the memory.
This is one reason (of many) why the thing is slow.
If BC were to connect to an application late, it would have none of this data built up, and would have to either (1) dig it all up at once, or (2) start guessing about things. Neither solution is very practical.
One way to lighten up BoundsChecker is by excluding from instrumentation all but the few modules you are interested in. I know that's not great, because if you knew where the leak was you wouldn't need BoundsChecker. What I usually recommend is that you use BC's Active Check mode first with only Memory Tracking enabled. You miss the API Validations, but you could always rerun that separately. After you run Active Check and you get clues regarding which modules tend to be problematic, only then do you enable instrumentation for the module or modules of interest and their dependencies. We know Final Check is annoyingly slow but, as Mistiano correctly states, with Final Check not only does BC keep a graph of all allocated blocks but also all pointers and contexts to them. Therein lies the magic in how Final Check can nail leaks and corruptions at the point of occurrence, not just on application shutdown or fault. Shameless plug: I work on the DevPartner team. We are releasing DPS 10.5 on February 4, 2011 with x64 application support in BC. Unlike the relatively ancient and undersold BC64 for Itanium, which only provided Active Check, DPS 10.5 provides full Final Check support for x64 applications, both for pure C++ and for native modules running in .NET processes. See microfocus.com under MF Developer for details.

Comparing cold-start to warm start

Our application takes significantly more time to launch after a reboot (cold start) than if it was already opened once (warm start).
Most (if not all) of the difference seems to come from loading DLLs: when the DLLs are in cached memory pages, they load much faster. We tried using ClearMem to simulate rebooting (since it's much less time-consuming than actually rebooting) and got mixed results; on some machines it seemed to simulate a reboot very consistently, and on others it did not.
To sum up my questions are:
Have you experienced differences in launch time between cold and warm starts?
How have you dealt with such differences?
Do you know of a way to dependably simulate a reboot?
Edit:
Clarifications for comments:
The application is mostly native C++ with some .NET (the first .NET assembly that's loaded pays for the CLR).
We're looking to improve load time, obviously we did our share of profiling and improved the hotspots in our code.
Something I forgot to mention was that we got some improvement by re-basing all our binaries so the loader doesn't have to do it at load time.
As for simulating reboots, have you considered running your app from a virtual PC? Using virtualization you can conveniently replicate a set of conditions over and over again.
I would also consider some type of profiling app to spot the bit of code causing the time lag, and then making the judgement call about how much of that code is really necessary, or if it could be achieved in a different way.
It would be hard to truly simulate a reboot in software. When you reboot, all devices in your machine get their reset bit asserted, which should cause all memory system-wide to be lost.
In a modern machine you've got memory and caches everywhere: there's the VM subsystem which is storing pages of memory for the program, then you've got the OS caching the contents of files in memory, then you've got the on-disk buffer of sectors on the harddrive itself. You can probably get the OS caches to be reset, but the on-disk buffer on the drive? I don't know of a way.
How did you profile your code? Not all profiling methods are equal and some find hotspots better than others. Are you loading lots of files? If so, disk fragmentation and seek time might come into play.
Maybe even sticking basic timing information into the code, writing out to a log file and examining the files on cold/warm start will help identify where the app is spending time.
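As a rough sketch of that kind of instrumentation, a small scope timer can write one line per startup phase to a log stream. `ScopedTimer`, `timed_startup_log`, and the `load_config` label are hypothetical names used for illustration, and the loop merely stands in for real initialization work.

```cpp
#include <chrono>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: on destruction, logs how long the enclosing scope took.
class ScopedTimer {
public:
    ScopedTimer(std::string label, std::ostream& log)
        : label_(std::move(label)),
          log_(log),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        const auto ms =
            std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
        log_ << label_ << " took " << ms << " ms\n";
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::string label_;
    std::ostream& log_;
    std::chrono::steady_clock::time_point start_;
};

// Times one placeholder startup phase and returns the log text.
// A real application would pass an std::ofstream opened in append mode instead.
std::string timed_startup_log() {
    std::ostringstream log;
    {
        ScopedTimer timer("load_config", log);
        volatile long sink = 0;
        for (long i = 0; i < 100000; ++i) {
            sink = sink + i;  // stand-in for real initialization work
        }
    }
    return log.str();
}
```

Wrapping each major startup phase in its own timer and comparing the log from a cold start against one from a warm start should show which phases account for the difference.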
Without more information, I would lean towards filesystem/disk cache as the likely difference between the two environments. If that's the case, then you either need to spend less time loading files upfront, or find faster ways to load files.
Example: if you are loading lots of binary data files, speed up loading by combining them into a single file, then slurp the whole file into memory in one read and parse the contents. That means fewer disk seeks and less time spent reading from disk. Again, maybe that doesn't apply.
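A minimal sketch of the single-read step, assuming the combined file fits comfortably in memory. `read_whole_file` is a hypothetical helper name, and parsing the buffer afterwards depends on whatever layout the combined file uses.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads an entire file into a buffer with a single read call.
std::vector<char> read_whole_file(const std::string& path) {
    // Open at the end so tellg() reports the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }

    const std::streamsize size = static_cast<std::streamsize>(in.tellg());
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    in.seekg(0, std::ios::beg);

    std::vector<char> buffer(static_cast<std::size_t>(size));
    if (size > 0 && !in.read(buffer.data(), size)) {
        throw std::runtime_error("short read on " + path);
    }
    return buffer;
}
```

The caller then parses the individual records out of the returned buffer in memory, so the disk is touched once rather than once per data file.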
I don't know offhand of any tools to clear the disk/filesystem cache, but you could write a quick application to read a bunch of unrelated files off of disk to cause the filesystem/disk cache to be loaded with different info.
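Such a quick application might look like the sketch below. `read_and_discard` is a hypothetical name, and whether this actually pushes your application's files out of the cache is an assumption that depends on how much data is read relative to RAM and on the operating system's caching policy.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Reads every listed file in 1 MiB chunks and throws the data away, so the
// filesystem cache fills up with content unrelated to the application.
// Returns the total number of bytes read; unreadable files are skipped.
std::size_t read_and_discard(const std::vector<std::string>& paths) {
    std::vector<char> chunk(1 << 20);
    std::size_t total = 0;

    for (const std::string& path : paths) {
        std::ifstream in(path, std::ios::binary);
        if (!in) {
            continue;  // missing or unreadable file: ignore it
        }
        // A final partial chunk makes read() fail but still reports gcount() > 0.
        while (in.read(chunk.data(), static_cast<std::streamsize>(chunk.size())) ||
               in.gcount() > 0) {
            total += static_cast<std::size_t>(in.gcount());
        }
    }
    return total;
}
```

Pointing it at a set of large files the application never opens is the idea; the returned byte count gives a rough sense of whether enough data was read to matter.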
@Morten Christiansen said:
One way to make apps start cold-start faster (sort of) is used by e.g. Adobe reader, by loading some of the files on startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
That makes the customer pay for initializing our app at every boot even when it isn't used, I really don't like that option (neither does Raymond).
One successful way to speed up application startup is to switch DLLs to delay-load. This is a low-cost change (some fiddling with project settings) but can make startup significantly faster. Afterwards, run depends.exe in profiling mode to figure out which DLLs load during startup anyway, and revert the delay-load on them. Remember that you may also delay-load most Windows DLLs you need.
A very effective technique for improving application cold launch time is optimizing function link ordering.
The Visual Studio linker lets you pass in a file that lists all the functions in the module being linked (or just some of them; it doesn't have to be all of them), and the linker will place those functions next to each other in memory.
When your application is starting up, there are typically calls to init functions throughout your application. Many of these calls will be to a page that isn't in memory yet, resulting in a page fault and a disk seek. That's where slow startup comes from.
Optimizing your application so all these functions are together can be a big win.
Check out Profile Guided Optimization in Visual Studio 2005 or later. One of the things that PGO does for you is function link ordering.
It's a bit difficult to work into a build process, because with PGO you need to link, run your application, and then re-link with the output from the profile run. This means your build process needs to have a runtime environment and deal with cleaning up after bad builds and all that, but the payoff is typically 10+ or more faster cold launch with no code changes.
There's some more info on PGO here:
http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx
As an alternative to function order list, just group the code that will be called within the same sections:
#pragma code_seg(".startUp")
//...
#pragma code_seg
#pragma data_seg(".startUp")
//...
#pragma data_seg
It should be easy to maintain as your code changes, but has the same benefit as the function order list.
I am not sure whether a function order list can specify global variables as well, but using #pragma data_seg for them would simply work.
One way to make apps start cold-start faster (sort of) is used by e.g. Adobe reader, by loading some of the files on startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
Another note, is that .NET 3.5SP1 supposedly has much improved cold-start speed, though how much, I cannot say.
It could be the NICs (LAN cards), and that your app depends on certain other services that require the network to come up. Profiling your application alone may not quite tell you this, so you should examine the dependencies of your application.
If your application is not very complicated, you can just copy all the executables to another directory; launching from there should be similar to a reboot. (Cut and paste does not seem to work; Windows is smart enough to know that files moved to another folder are already cached in memory.)

Resources