Garbage Collection in BizTalk: what would be the wise approach? (performance)

Our BizTalk 2006 application contains two orchestrations which are invoked on a frequent basis (approximately 15 requests per second). We identified possible memory leaks in our application by changing certain throttling thresholds in the host. When we disabled memory-based throttling, the process memory kept increasing until it reached about 1400 MB, after which we started to experience out-of-memory exceptions.
We are forced to restart the host instances when this situation occurs.
We were wondering whether explicitly calling GC.Collect from the orchestration would be fruitful in such a case, and what the cons of that approach might be.
Thanks.

Out of memory exceptions occur only if the garbage collector was unable to free enough memory to perform a requested allocation. This can happen if you have a memory leak, which on a garbage-collected platform means some object references are kept longer than they need to be. Frequent causes of leaks are objects that hold global data (static variables), such as a singleton, a cache, or a pool that keeps references for too long.
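To illustrate the pattern (a minimal, hypothetical C# sketch; the names are made up and not taken from the application in question): a static collection that is only ever added to keeps every entry reachable, so the garbage collector can never reclaim it and the process keeps growing.

    using System.Collections.Generic;

    // Hypothetical example of the leak pattern described above: a static cache
    // with no eviction keeps every stored payload reachable forever.
    public static class MessageCache
    {
        private static readonly Dictionary<string, byte[]> Cache = new Dictionary<string, byte[]>();

        public static void Remember(string messageId, byte[] payload)
        {
            Cache[messageId] = payload; // entries are never removed, so memory only grows
        }
    }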
If you explicitly call GC.Collect, it will also fail to free the memory, for the same reasons the implicit collection failed. So the explicit GC.Collect call would only result in slowing down the orchestration.
If you are calling .NET classes from your orchestrations, I suggest trying to isolate the problem by calling the same classes from a pure .NET application (no BizTalk involved).
It's also possible that there's no leak, but that each instance is consuming too much memory at the same time. BizTalk can usually dehydrate orchestrations when it finds it necessary, but it may be prevented from doing that if a step in the orchestration (or a large atomic scope) takes too long to execute.
1400 MB also looks large for only 15 concurrent instances. Are you manipulating large messages in the orchestration? In that case you can greatly reduce memory usage by avoiding operations that force the whole message to be loaded into memory, and instead manipulating the message using streaming.
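As a rough illustration of the streaming idea (plain .NET, not a BizTalk-specific API; the element name "Order" is just a placeholder): an XmlReader walks the message forward-only and keeps only a small window in memory, whereas loading the same message into an XmlDocument materializes the whole DOM.

    using System.IO;
    using System.Xml;

    public static class OrderCounter
    {
        // Streams over the message body instead of loading it all into memory.
        public static int CountOrders(Stream messageBody)
        {
            int count = 0;
            using (XmlReader reader = XmlReader.Create(messageBody))
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "Order")
                        count++; // examine each element as it streams past
                }
            }
            return count;
        }
    }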

Not knowing BizTalk, my answer may be way off…
I am assuming that having many more orchestration instances running in a process increases the time it takes for a single orchestration instance to complete. Then, as you increase the number of orchestration instances you let run at the same time, at some point the time they take to complete will be large enough that the combined size of the running orchestration instances is too great for your RAM.
I think you need to throttle based on the number of running orchestration instances. If you graph "rate of completion" against "number of running orchestration instances" you will likely see a big flat zone in the middle of the graph; choose your throttling to keep you in the middle of this stable zone.

I agree with the poster above. Trying to clear the memory or resetting the host instance is not the solution, just a band-aid. You need to find where you are leaking memory. I would look at the whole application and not just the orchestration; it is possible that the ports could also be causing your memory leak. Do you use custom functoids in your maps? How about inline code? Custom XSLT?
I would also look at custom pipelines if you are using them.
If it's possible, I would try isolating the different components and putting them under stress and volume tests individually; somehow I don't think your orchestration itself is the problem, but rather a map or a custom component.

Doing a garbage collection isn't going to free up leaked memory, as it is still (in error) referenced by your application somehow. You would only call GC.Collect if you had generated a lot of short-lived objects and knew you were at a good point to free them.
You must identify and fix the leaking code!
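For completeness, a minimal sketch of the narrow case mentioned above where an explicit collection can be defensible (ProcessBatch is a hypothetical method that churns through many short-lived objects); in most situations letting the GC decide is the better choice.

    using System;

    public static class BatchRunner
    {
        public static void Run()
        {
            ProcessBatch(); // hypothetical: allocates and drops many short-lived objects

            // Only at a known quiet point, after heavy temporary allocation:
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect(); // reclaim anything released by finalizers
        }

        private static void ProcessBatch() { /* ... */ }
    }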

I completely agree with most of what the others said: you should look into where your leak is and fix it; calling the GC directly will not help you and in any case is very unlikely to be a reasonable way forward.
I would add, though, that throttling exists to protect your environment from grinding to a halt should you encounter a sudden rise in resource consumption; without throttling it is possible for BizTalk (like any other server) to reach a point where it cannot continue processing and effectively "gets stuck"; throttling allows it to slow down in order to ensure processing is still happening, until the resource consumption level (hopefully) returns to normal.
For that reason I would also suggest that you consider having some throttling configured for your environment, with values tweaked to suit your scenario.

Related

Putting memory limits on .NET Core

I am building an ML application for binary classification using ML.NET. It will have multiple ML models of varying sizes (built using different training data) which will be stored in a SQL Server database as blobs. Clients will send items for classification to this app in random order, and based on the client ID the corresponding model is to be used for classification. To classify an item, the model needs to be read from the database and then loaded into memory. Loading a model into memory takes considerable time depending on its size, and I don't see any way to optimize that. Hence I am planning to cache models in memory. But if I cache many heavy models, it may put pressure on memory and hamper the performance of other processes running on the server, and there is no straightforward way to limit the caching. So I am looking for suggestions on how to handle this.
Spawn a new process
In my opinion this is the only viable option to accomplish what you're trying to do. Spawn a completely new process that communicates (via IPC?) with your "main application". You could set a memory limit using this property https://learn.microsoft.com/en-us/dotnet/api/system.gcmemoryinfo.totalavailablememorybytes?view=net-5.0 or maybe even use a third-party library (e.g. https://github.com/lowleveldesign/process-governor) that kills your process if it reaches a specific amount of RAM. Both of these approaches are quite rough and will basically kill your process.
If you have control over your sidecar application while it is running, it might make sense to actually monitor its RAM usage yourself (see "Getting a process's ram usage") and gracefully stop the process.
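A minimal sketch of using the property linked above from inside the process (assumes .NET Core 3.0 or later, where GC.GetGCMemoryInfo() is available; the 80% threshold is an arbitrary example): before caching another model, check how much of the GC's memory budget is already in use.

    using System;

    public static class CacheGuard
    {
        // Arbitrary example policy: keep at least 20% of the GC's memory budget free.
        private const double MaxLoadFraction = 0.8;

        public static bool HasRoomToCacheAnotherModel()
        {
            GCMemoryInfo info = GC.GetGCMemoryInfo();
            double load = (double)info.MemoryLoadBytes / info.TotalAvailableMemoryBytes;
            return load < MaxLoadFraction;
        }
    }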
Do it yourself solution (not recommended)
Basically there is no built-in way of limiting memory usage per thread or similar.
What counts towards the memory limit?
Shared resources
Since you have a running process, you need to define what exactly counts towards the memory limit. For example, if you have some static Dictionary that is manipulated by the running thread, what does it occupy? Only the diff between the old value and the new value? The whole new value? The key and the value?
There are many more cases like this you'll have to take into consideration.
The actual measuring
You need some kind of way to count the actual memory usage. This will probably be hard/near impossible to "implement":
Reference counting needed?
If you have a hostile thread, it might spawn an unbounded number of references to one object without any new keyword being used. For each reference you'd have to count 32/64 bits.
What about built in types?
It might be "easy" to measure a byte[] included in your own type definition, but what about built in classes? If someone initializes a string with 100MB this might be an amount you need to keep track of.
... and many more ...
As you may have noticed from the previous samples, there is no easy definition of "RAM used by a thread". This is the reason there is also no easy way to get its value.
In my opinion it's insanely complex to do such a thing and requires a lot of definition work on your side. It might be feasible with lots of effort, but I'm not sure if that really is what you want. Even if you manage it, what will you do about it? Only killing the thread might not clean up the resources.
Therefore I'd really think about having an OS-managed, independent process that you can kill whenever you feel like it.
How big are your models? Even large models (100 MB+) load pretty quickly off fast/SSD storage. I would consider caching them on fast drives/SSDs, because pulling them off SQL Server is going to be much slower than raw disk. See if this helps your performance.
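A rough sketch of that suggestion (the names and the temp-folder location are placeholders, not ML.NET APIs): pull each model blob out of the database once, keep it as a file on fast local storage, and load models from disk afterwards.

    using System;
    using System.IO;

    public static class ModelFileCache
    {
        private static readonly string CacheDir = Path.Combine(Path.GetTempPath(), "model-cache");

        // getModelBlobFromDb is a placeholder for whatever reads the blob from SQL Server.
        public static string GetLocalModelPath(string clientId, Func<string, byte[]> getModelBlobFromDb)
        {
            Directory.CreateDirectory(CacheDir);
            string path = Path.Combine(CacheDir, clientId + ".zip");
            if (!File.Exists(path))
                File.WriteAllBytes(path, getModelBlobFromDb(clientId)); // hit the database only once per model
            return path; // load the model from this local file on subsequent requests
        }
    }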

How could I make a Go program use more memory? Is that recommended?

I'm looking for an option similar to -Xmx in Java, that is, a way to assign the maximum runtime memory my Go application can utilise. I was checking the runtime package, but I'm not entirely sure whether that is the way to go.
I tried setting something like this with func SetMaxStack() (likely very stupid):
debug.SetMaxStack(5000000000) // bytes
model.ExcelCreator()
The reason why I am looking to do this is that currently there is an ample amount of RAM available but the application won't consume more than 4-6%. I might be wrong here, but it could be forcing GC to happen much more often than needed, leading to a performance issue.
What I'm doing
Getting a large dataset from an RDBMS and processing it to write out an Excel file.
Another reason why I am looking for such an option is to limit the maximum usage of RAM on the server where it will be ultimately deployed.
Any hints on this would be greatly appreciated.
The current stable Go (1.10) has only a single knob which may be used to trade memory for lower CPU usage by the garbage collection the Go runtime performs.
This knob is called GOGC, and its description reads
The GOGC variable sets the initial garbage collection target percentage. A collection is triggered when the ratio of freshly allocated data to live data remaining after the previous collection reaches this percentage. The default is GOGC=100. Setting GOGC=off disables the garbage collector entirely. The runtime/debug package's SetGCPercent function allows changing this percentage at run time. See https://golang.org/pkg/runtime/debug/#SetGCPercent.
So basically setting it to 200 would supposedly double the amount of memory the Go runtime of your running process may use.
Having said that I'd note that the Go runtime actually tries to adjust the behaviour of its garbage collector to the workload of your running program and the CPU processing power at hand.
I mean that normally there's nothing wrong with your program not consuming lots of RAM: if the collector happens to sweep the garbage fast enough without hampering performance in a significant way, I see no reason to worry. Go's GC is one of the most intensely fine-tuned parts of the runtime, and it works very well in fact.
Hence you may try to take another route:
1. Profile memory allocations of your program.
2. Analyze the profile and try to figure out where the hot spots are, and whether (and how) they can be optimized. You might start here and continue with the gazillion other intros to this stuff.
3. Optimize. Typically this amounts to making certain buffers reusable across different calls to the same function(s) consuming them, preallocating slices instead of growing them gradually, using sync.Pool where deemed useful, etc.
Such measures may actually increase the memory truly used (that is, by live objects, as opposed to garbage) but may lower the pressure on the GC.

Why do you use the keyword delete?

I understand that delete returns memory to the heap that was allocated off the heap, but what is the point? Computers have plenty of memory, don't they? And all of the memory is returned as soon as you "X" out of the program.
Example:
Consider a server that allocates a Packet object for each packet it receives (this is bad design, for the sake of the example).
A server, by nature, is intended to never shut down. If you never delete the thousands of Packet objects your server handles per second, your system is going to be swamped and crash in a few minutes.
Another example:
Consider a video game that allocates particles for its special effects every time a new explosion is created (and never deletes them). In a game like Starcraft (or other recent ones), after a few minutes of hilarity and destruction (and hundreds of thousands of particles), lag will be so huge that your game will turn into a PowerPoint slideshow, effectively making your player unhappy.
Not all programs exit quickly.
Some applications may run for hours, days or longer. Daemons may be designed to run without cease. Programs can easily consume more memory over their lifetime than available on the machine.
In addition, not all programs run in isolation. Most need to share resources with other applications.
There are a lot of reasons why you should manage your memory usage, as well as any other computer resources you use:
What might start off as a lightweight program could soon become more complex; depending on your design, areas of memory consumption may grow exponentially.
Remember you are sharing memory resources with other programs. Being a good neighbour allows other processes to use the memory you free up, and helps to keep the entire system stable.
You don't know how long your program might run for. Some people hibernate their session (or never shut their computer down) and might keep your program running for years.
There are many other reasons; I suggest researching memory allocation for more details on the dos and don'ts.
I see your point that computers have lots of memory, but you are wrong. As an engineer you have to create programs that use computer resources properly.
Imagine you made a program which runs all the time the computer is on. It sometimes creates objects/variables with "new". After some time you don't need them anymore, but you don't delete them. Such a situation occurs from time to time and you gradually take RAM out of stock. After a while the user has to terminate your program and launch it again. That is not so bad, but it is not comfortable either; what is more, your program may take a while to load. Because of this the user suffers from your silly decision.
Another thing: when you use "new" to create an object you call its constructor, and "delete" calls its destructor. Let's say you need to open some file and the destructor closes it and makes it accessible to other processes; in this case you would steal not only memory but also files from other processes.
If you don't want to use "delete" you can use shared pointers (which do the reference counting, a form of garbage collection, for you).
They can be found in the STL as std::shared_ptr. There is one disadvantage: Windows XP SP2 and older do not support this, so if you want to create something for the public you should use Boost, which also has boost::shared_ptr. To use Boost you need to download it from here and configure your development environment to use it.

Memory management for intentionally memory intensive applications

Note: I am aware of the question Memory management in memory intensive application, however that question appears to be about applications that make frequent memory allocations, whereas my question is about applications intentionally designed to consume as much physical memory as is safe.
I have a server application that uses large amounts of memory in order to perform caching and other optimisations (think SQL Server). The application runs on a dedicated machine, and so can (and should) consume as much memory as it wants / is able to, in order to speed up and increase throughput and response times, without worrying about impacting other applications on the system.
The trouble is that if memory usage is underestimated, or if load increases, it's possible to end up with nasty failures as memory allocations fail; in this situation the best thing to do is obviously to free up memory in order to prevent the failure, at the expense of performance.
Some assumptions:
The application is running on a dedicated machine
The memory requirements of the application exceed the physical memory on the machine (that is, if additional memory were available to the application it would always be able to use that memory to improve response times or throughput in some way)
The memory is effectively managed in a way such that memory fragmentation is not an issue.
The application knows what memory can be safely freed, and what memory should be freed first for the least performance impact.
The app runs on a Windows machine
My question is - how should I handle memory allocations in such an application? In particular:
How can I predict whether or not a memory allocation will fail?
Should I leave a certain amount of memory free in order to ensure that core OS operations remain responsive (and don't in that way adversely impact the application's performance), and how can I find out how much memory that is?
The core objective is to prevent failures as a result of using too much memory, while at the same time using up as much memory as possible.
I'm a C# developer, however my hope is that the basic concepts for any such app are the same regardless of the language.
In Linux, memory usage is roughly divided into the following levels:
0 - 30% - no swapping
30 - 60% - swap dirty pages only
60 - 90% - swap clean pages as well, based on LRU policy
90%+ - invoke the OOM (out of memory) killer and kill the process consuming the most memory
Check this - http://linux-mm.org/OOM_Killer
I think Windows might have a similar policy, so you can check the memory stats and make sure you never reach the maximum threshold.
One way to stop consuming more memory is to go to sleep and give more time for memory cleanup threads to run.
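A small sketch of the "check the memory stats" idea on Windows (the 512 MB floor is an arbitrary example): poll the Available MBytes performance counter and start releasing cache when free physical memory drops below a chosen threshold.

    using System.Diagnostics;

    public static class MemoryWatcher
    {
        private static readonly PerformanceCounter AvailableMBytes =
            new PerformanceCounter("Memory", "Available MBytes");

        // Arbitrary example: start shedding cache when less than 512 MB of physical memory is free.
        public static bool ShouldReleaseMemory() => AvailableMBytes.NextValue() < 512;
    }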
That is a very good question, and bound to be subjective as well, because a fundamental aspect of C# is that all memory management is done by the runtime, i.e. the garbage collector. The garbage collector is a non-deterministic entity that manages and sweeps memory for reclaiming; when it kicks in depends on how memory gets allocated and fragmented, so knowing in advance is not an easy thing to do.
Properly managing the memory sounds tedious, but common sense applies, such as the using clause to ensure an object gets disposed. You could put in a single handler to trap OutOfMemoryException, but that is an awkward approach: if the program has run out of memory, does it just seize up and bomb out, or should it wait patiently for the GC to kick in? Again, determining that is tricky.
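As a hedged aside on the "how can I predict whether an allocation will fail" part of the question: .NET also offers System.Runtime.MemoryFailPoint, which probes up front whether a large operation is likely to get the memory it needs (DoMemoryHungryWork is a placeholder; the 500 MB figure is an arbitrary example).

    using System;
    using System.Runtime;

    public static class BigOperation
    {
        public static void Run()
        {
            try
            {
                // Ask the CLR whether roughly 500 MB is likely to be available
                // before committing to the memory-hungry work.
                using (new MemoryFailPoint(500))
                {
                    DoMemoryHungryWork(); // placeholder for the real operation
                }
            }
            catch (InsufficientMemoryException)
            {
                // Free caches / degrade gracefully instead of failing mid-operation.
            }
        }

        private static void DoMemoryHungryWork() { }
    }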
The load on the system can adversely affect the GC's job, almost to the point of a denial of service where everything just grinds to a halt. Again, since the specifications of the machine and the nature of that machine's job are unknown, I cannot answer it fully, but I'll assume it has loads of RAM.
In essence, while an excellent question, I think you should not worry about it and leave it to the .NET CLR to handle the memory allocation/fragmentation as it seems to do a pretty good job.
Hope this helps,
Best regards,
Tom.
Your question reminds me of an old discussion, "So what's wrong with 1975 programming?". The architect of varnish-cache argues that instead of telling the OS to get out of the way and managing all memory yourself, you should rather cooperate with the OS and let it understand what you intend to do with memory.
For example, instead of simply reading data from disk, you should use memory-mapped files. This allows the OS to apply its LRU algorithm to write-back data to disk when memory becomes scarce. At the same time, as long as there is enough memory, your data will stay in memory. Thus, your application may potentially use all memory, without risking getting killed.
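A minimal .NET sketch of the same cooperation idea (the file path and offset are placeholders): map the data file instead of copying it into your own buffers, so the OS page cache decides what stays resident and what gets evicted under pressure.

    using System.IO.MemoryMappedFiles;

    public static class MappedData
    {
        public static byte ReadByteAt(string path, long offset)
        {
            using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(path))
            using (MemoryMappedViewAccessor view = mmf.CreateViewAccessor())
            {
                return view.ReadByte(offset); // pages are faulted in on demand and can be dropped by the OS
            }
        }
    }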

Do static classes cause performance issues on multi-core systems?

The other day a colleague of mine stated that using static classes can cause performance issues on multi-core systems, because the static instance cannot be shared between the processor caches. Is that right? Are there any benchmarks around proving this statement? The statement was made in the context of a .NET development (with C#) related discussion, but it sounds to me like a language- and environment-independent problem.
Thanks for your comments.
I would push your colleague for data or at least references.
The thing is, if you've got shared data, you've got shared data. Whether that's exposed through static classes, a singleton, whatever, isn't terribly important. If you don't need the shared data in the first place, I expect you wouldn't have a static class anyway.
Besides all of this, in any given application there's likely to be a much bigger bottleneck than processor caches for shared data in static classes.
As ever, write the most sensible, readable, maintainable code first - then work out if you have a performance bottleneck and act accordingly.
"[a] static instance cannot be shared between the processor caches. Is that right?"
That statement doesn't make much sense to me. The point of each processor's dedicated cache is that it contains a private copy of a small patch of memory, so that if the processor is running an algorithm that only needs to access that particular memory region, it doesn't have to keep going back to external memory. If we're talking about the static fields inside a static class, the memory for those fields may all fit into a contiguous chunk of memory that will in turn fit into a single processor's (or core's) dedicated cache. But each processor has its own cached copy - it's not "shared". That's the point of caches.
If an algorithm's working set is bigger than a cache then it will defeat that cache. Meaning that as the algorithm runs, it repeatedly causes the processor to pull data from external memory, because all the necessary pieces won't fit in the cache at once. But this is a general problem that doesn't apply specifically to static classes.
I wonder if your colleague was actually talking not about performance but about the need to apply correct locking if multiple threads are reading/writing the same data?
If multiple threads are writing to that data, you'll have cache thrashing (the write on one CPU's cache invalidates the caches of the other CPUs). Your friend is technically correct, but there's a good chance it's not your primary bottleneck, so it doesn't matter.
If multiple threads are reading the data, your friend is flat-out wrong.
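A rough micro-benchmark sketch of that write-contention effect (timings are indicative only and depend heavily on the hardware): two threads hammering counters that sit on the same cache line tend to run slower than the same threads hammering counters padded onto separate cache lines.

    using System;
    using System.Diagnostics;
    using System.Threading.Tasks;

    public static class FalseSharingDemo
    {
        // With stride = 1 the two counters share a cache line; with stride = 8
        // (8 longs = 64 bytes) they land on separate lines on typical CPUs.
        private static readonly long[] Counters = new long[16];

        private static TimeSpan Run(int stride)
        {
            Array.Clear(Counters, 0, Counters.Length);
            Stopwatch sw = Stopwatch.StartNew();
            Parallel.For(0, 2, t =>
            {
                for (long i = 0; i < 50_000_000; i++)
                    Counters[t * stride]++; // each thread writes only its own slot
            });
            return sw.Elapsed;
        }

        public static void Main()
        {
            Console.WriteLine($"adjacent counters: {Run(1)}"); // typically slower: same cache line
            Console.WriteLine($"padded counters:   {Run(8)}"); // typically faster: separate cache lines
        }
    }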
If you don't use any kind of locks or synchronization then static-vs.-non-static won't have any influence on your performance.
If you're using synchronization then you could run into a problem if all threads need to acquire the same lock, but that's only a side-effect of the static-ness and not a direct result of the methods being static.
In any "virtual machine" controlled language (.NET, Java, etc) this control is likely delegated to the underlying OS and likely further down to the BIOS and other scheduling controls. That being said, in the two biggies, .NET and Java, static vs. non-static is a memory issue, not a CPU issue.
Re-iterating saua's point, the impact on the CPU comes from the synchronization and thread control, not the access to the static information.
The problem with CPU cache management is not limited to only static methods. Only one CPU can update any memory address at a time. An object in your virtual machine, and specifically a field in your object, is a pointer to said memory address. Thus, even if I have a mutable object Foo, calling setBar(true) on Foo will only be allowed on a single CPU at a time.
All that being said, the point of .NET and Java is that you shouldn't be spending your time sweating these problems until you can prove that you have a problem and I doubt you will.
If you share mutable data between threads, you need either a lock or a lock-free algorithm (seldom available, and sometimes hard to use, unfortunately).
Having few, widely used, lock-arbitrated resources can lead to bottlenecks.
Static data is similar to a single-instance resource.
Therefore:
If many threads access static data and you use a lock to arbitrate, your threads are going to fight for access.
When designing a highly multithreaded app, try to use many fine-grained locks. Split your data so that a thread can grab one piece and run with it; hopefully no other thread will need to wait for it, because they're busy with their own pieces of data (see the sketch below).
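A small sketch of the fine-grained locking advice (the shard count of 16 is arbitrary): instead of one global lock around a shared dictionary, stripe the data so that threads working on different keys rarely contend for the same lock.

    using System.Collections.Generic;

    public class StripedCounter
    {
        private const int Stripes = 16;
        private readonly object[] _locks = new object[Stripes];
        private readonly Dictionary<string, long>[] _shards = new Dictionary<string, long>[Stripes];

        public StripedCounter()
        {
            for (int i = 0; i < Stripes; i++)
            {
                _locks[i] = new object();
                _shards[i] = new Dictionary<string, long>();
            }
        }

        public void Increment(string key)
        {
            int stripe = (key.GetHashCode() & 0x7fffffff) % Stripes;
            lock (_locks[stripe]) // only threads hashing to this stripe contend
            {
                _shards[stripe].TryGetValue(key, out long value);
                _shards[stripe][key] = value + 1;
            }
        }
    }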
The x86 architecture implements cache snooping to keep data caches in sync on writes, should they happen to cache the same thing... Not all architectures do that in hardware; some depend on software to make sure that the case never occurs.
Even if it were true, I suspect you have plenty of better ways to improve performance. When it gets down to changing static to instance, for processor caching, you'll know you are really pushing the envelope.
