Overheads involved in OSGi

OSGi is a buzzword today. It has many advantages, like loose coupling and reusability, but I want to know whether there are any overheads involved with OSGi, as I have to use it in my project. Does it affect speed, or is there any other kind of overhead? Please help.
Thanks

There is no real overhead. OSGi was designed to work in very memory-constrained environments, and it can actually improve execution speed because each class-load request has a smaller class space to search.
It is always possible for a naïve developer to screw up performance by doing silly things, but that is true of any environment.

In a typical OSGi environment, the two places where you could suspect a performance impact are the bundle classloader and OSGi services. The bundle classloader simply sees a smaller space of classes, so it should not be slower than a normal classloader; in some cases it can even be faster, as Neil wrote. The OSGi services also should not affect performance: basically they are just a way to look up the implementation class for an interface. Once you have the implementation resolved, it is just a method call with no overhead. There is no serialization and no proxies involved.
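To make the "just a method call" point concrete, here is a minimal sketch of a service lookup through the raw OSGi API. GreetingService and its sayHello method are made-up names for this illustration; once the reference is resolved, invoking the service is an ordinary in-VM call:

```java
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceReference;

// Hypothetical service interface, registered by some other bundle.
interface GreetingService {
    void sayHello(String name);
}

// Consumer bundle code: looks the service up, then calls it directly.
public class GreetingClient {

    private final BundleContext context;

    public GreetingClient(BundleContext context) {
        this.context = context;
    }

    public void greet() {
        // Look up the implementation registered under the interface.
        ServiceReference<GreetingService> ref =
                context.getServiceReference(GreetingService.class);
        if (ref == null) {
            return; // no provider currently registered
        }
        GreetingService service = context.getService(ref);
        try {
            // From here on it is a plain method call: no proxy,
            // no serialization, no extra indirection.
            service.sayHello("world");
        } finally {
            context.ungetService(ref);
        }
    }
}
```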

The largest negative production impact of using OSGi is the increase in PermGen space required, due to having multiple versions of classes loaded with different classloaders. Unless you are using JRockit (where PermGen is allocated from the OS as needed), the maximum PermGen size is fixed at JVM startup, and it can be hard to reclaim if you have classloader leaks (which are easy to introduce), making it potentially a somewhat limited resource.
There is no substantial performance impact. There may be some cognitive load in knowing what to expect in corner cases. For instance, instances of the "same class" loaded through different classloaders end up with different types: instanceof returns false in cases where comparing obj.getClass().getName() returns true. But performance will not be affected.
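A small Java sketch of that corner case, assuming a hypothetical greeter.jar containing a class com.example.Greeter: two independent classloaders each define their own Class object for the same class name.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Two classloaders with no shared parent for this class, so each defines
// its own version of com.example.Greeter (both the jar and the class name
// are placeholders for this sketch).
public class ClassIdentityDemo {
    public static void main(String[] args) throws Exception {
        URL[] jar = { new URL("file:greeter.jar") };
        ClassLoader cl1 = new URLClassLoader(jar, null);
        ClassLoader cl2 = new URLClassLoader(jar, null);

        Class<?> a = cl1.loadClass("com.example.Greeter");
        Class<?> b = cl2.loadClass("com.example.Greeter");
        Object fromA = a.getDeclaredConstructor().newInstance();

        System.out.println(a.getName().equals(b.getName())); // true: same name
        System.out.println(a.equals(b));                      // false: distinct types
        System.out.println(b.isInstance(fromA));              // false: the instanceof
                                                              // equivalent fails
    }
}
```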

Related

What do I need to offset the performance setback induced by use of the Spring framework?

I am using Spring with Hibernate to create an enterprise application. Now, due to the abstractions the framework places over the underlying J2EE architecture, there is obviously going to be a runtime performance hit on my app.
What I need to know is the set of factors I should consider when deciding the minimum specs (processor speed, RAM, etc.) for a single host server, running Red Hat Linux 3+ and devoted to this application only, that would produce an efficiency score of, say, 8 out of 10, given a simultaneous-access user base growing by 100 per month.
No clustering is to be used.
No offense, but I'd bet that performance issues are more likely to be due to your application code than Spring.
If you look at the way they've written their source code, you'll see that they pay a great deal of attention to quality.
The only way to know is to profile your app, see where the time is being spent, analyze to determine root cause, correct it, rinse, repeat. That's science. Anything else is guessing.
I've used Spring in a production app that's run without a hitch for three years and counting. No memory leaks, no lost connections, no server bounces, no performance issues. It just runs like butter.
I seriously doubt that using Spring will significantly affect your performance.
What particular aspects of Spring are you expecting to cause performance issues?
There are so many variables here that the only answer is to "suck it and see", but in a scientific manner.
You need to build a server and then benchmark this. Start off with some "commodity" setup, say a 4-core CPU and 2 GB of RAM, then run a benchmark script to see if it meets your needs (which it most likely will!).
If it doesn't, you should be able to calculate the required server size from the numbers you get out of the benchmark -- or -- fix the performance problem so it runs on the hardware you've got.
The important thing is to identify what is limiting your performance. Is your server using all the cores, or are your processes stuck on a single core? Is your JVM getting enough memory? Are you I/O bound or database bound?
Once you know the limiting factors it's pretty easy to work out the solution -- either improve the efficiency of your programs or buy more of the right hardware.
Two things to watch out for with J2EE -- most JVMs have default heap sizes from the last decade, so make sure your JVM has enough heap and stack (at least 1 GB each!) -- and it takes time for all the JIT compiling, object caching, module loading etc. to settle down -- exercise your system for at least an hour before you start benchmarking.
As a toolkit, I don't see Spring itself affecting performance after initialization, but I think Hibernate will. How big this effect is depends on a lot of details, like the DB schema, how much the relational layout differs from the OO layer, and of course how DB access is organized and how often it happens. So I doubt there is a rule of thumb for this. Just try it out by developing significant prototypes using alternative application servers, or try your own small no-ORM, plain-JDBC version.
I've never heard that Spring creates any type of runtime performance hit. Since it uses mainly POJOs I'd be surprised if there was something wrong with it. Other than parsing a lot of XML on startup maybe, but that's solved by using annotations.
Just write your app first and then tune accordingly.
Spring is typically used to create long-lived objects shortly after the application starts. There is virtually no performance cost over the life of the process.
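As a rough illustration of that point, here is a minimal annotation-based sketch (the class names AppConfig, GreetingService and Main are made up): the singleton bean is built once while the context starts, and every use afterwards is a plain method call with no Spring involvement on the hot path.

```java
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Minimal sketch: the singleton bean is created while the context starts
// up; after that there is no per-call cost attributable to Spring.
public class Main {

    @Configuration
    static class AppConfig {
        @Bean
        public GreetingService greetingService() {
            return new GreetingService(); // instantiated eagerly at startup
        }
    }

    static class GreetingService {
        String greet(String name) { return "Hello, " + name; }
    }

    public static void main(String[] args) {
        AnnotationConfigApplicationContext ctx =
                new AnnotationConfigApplicationContext(AppConfig.class);
        GreetingService svc = ctx.getBean(GreetingService.class);
        System.out.println(svc.greet("world")); // ordinary method call
        ctx.close();
    }
}
```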
Which performance setback? In relation to what?
Did you measure the performance before using the framework?
If the Spring framework causes unacceptable performance issues, the obvious solution is not to use it.

Garbage collection in BizTalk: what would be the wise approach?

Our BizTalk 2006 application contains two orchestrations which are invoked on a frequent basis (approximately 15 requests per second). We identified possible memory leaks in our application by making certain throttling threshold changes in the host. When we disabled memory-based throttling, the process memory kept increasing until it reached about 1400 MB, and after that we started to experience out-of-memory exceptions.
We are forced to restart the host instances when this situation occurs.
We were wondering whether explicitly calling GC.Collect from the orchestration would be fruitful in such a case, and what the cons of this approach could be.
Thanks.
Out-of-memory exceptions occur only if the garbage collector was unable to free enough memory to perform a requested allocation. This can happen if you have a memory leak, which on a garbage-collected platform means some object references are kept longer than they need to be. Frequent causes of leaks are objects that hold global data (static variables), such as a singleton, a cache or a pool that keeps references for too long.
If you explicitly call GC.Collect, it will fail to free the memory for the same reasons the implicit collection failed. So the explicit GC.Collect call would only result in slowing down the orchestration.
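The same point holds on any garbage-collected platform. Here is a minimal Java sketch of the leak pattern described above, with System.gc() standing in for GC.Collect: the explicit collection reclaims nothing, because the global collection still references every object.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a "global data" leak: a static collection that is filled but
// never cleared keeps everything reachable, so an explicit collection
// (System.gc() here, GC.Collect in .NET) cannot reclaim any of it.
public class LeakDemo {

    private static final List<byte[]> CACHE = new ArrayList<>();

    public static void main(String[] args) {
        for (int i = 0; i < 2_000; i++) {
            CACHE.add(new byte[100_000]); // roughly 200 MB retained in total
        }

        System.gc(); // frees nothing: every array is still strongly reachable

        Runtime rt = Runtime.getRuntime();
        System.out.printf("used after explicit gc: %d MB%n",
                (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024));
    }
}
```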
If you are calling .NET classes from your orchestrations, I suggest trying to isolate the problem by calling the same classes from a pure .NET application (no BizTalk involved).
It's also possible that there's no leak, but that each instance is consuming too much memory at the same time. BizTalk can usually dehydrate orchestrations when it finds it necessary, but it may be prevented from doing that if a step in the orchestration (or a large atomic scope) takes too long to execute.
1400 MB also looks large for only 15 concurrent instances. Are you doing manipulations on large messages in the orchestration? In that case you can greatly reduce memory usage by avoiding operations that force the whole message to be loaded into memory, and instead manipulate the message using streaming, as sketched below.
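Outside of BizTalk's own streaming APIs, the general idea looks like this in Java (a sketch; the file name and the "Order" element are placeholders): a pull parser visits the document piece by piece, keeping memory use roughly constant, instead of materializing the whole message the way a DOM-style load would.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Streaming pass over a large XML message: elements are inspected one at a
// time rather than loading the entire document into memory.
public class StreamingCountDemo {
    public static void main(String[] args) throws Exception {
        int orderCount = 0;
        try (InputStream in = new FileInputStream("large-message.xml")) {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "Order".equals(reader.getLocalName())) {
                    orderCount++;   // inspect or transform incrementally here
                }
            }
            reader.close();
        }
        System.out.println("orders seen: " + orderCount);
    }
}
```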
Not knowing BizTalk, my answer may be way off…
I am assuming that having many more orchestration instances running in a process increases the time it takes for a single orchestration instance to complete. Then, as you increase the number of orchestration instances that you let run at the same time, at some point the time it takes them to complete will be long enough that the combined size of the running orchestration instances is too great for your RAM.
I think you need to throttle based on the number of running orchestration instances. If you graph "rate of completion" against "number of running orchestration instances" you will likely see a big flat zone in the middle of the graph; choose your throttling to keep you in the middle of this stable zone.
I agree with the poster above. Trying to clear the memory or resetting the host instance is not the solution, just a band-aid. You need to find where you are leaking memory. I would look at the whole application and not just the orchestration; it is possible that the ports could also be causing your memory leak. Do you use custom functoids in your maps? How about inline code? Custom XSLT?
I would also look at custom pipelines if you are using them.
If it's possible I would try isolating the different components and putting them under stress and volume tests individually; somehow I don't think your orchestration itself is the problem, but rather a map or a custom component.
Doing a garbage collect isn't going to free up leaked memory, as it's still (in error) referenced by your application somehow. You would only call GC.Collect if you had generated a lot of short-lived objects and knew you were at a good point to free them.
You must identify and fix the leaking code!
I completely agree with most of what the others said - you should look into where your leak is and fix it; calling the GC directly will not help you, and in any case is very unlikely to be a reasonable way forward.
I would add, though, that throttling exists to protect your environment from grinding to a halt should you encounter a sudden rise in resource consumption; without throttling it is possible for BizTalk (like any other server) to reach a point where it cannot continue processing and effectively "gets stuck". Throttling allows it to slow down in order to ensure processing is still happening, until the resource consumption level (hopefully) returns to normal.
For that reason I would also suggest that you keep some throttling configured for your environment, with the values tweaked to suit your scenario.

Do static classes cause performance issues on multi-core systems?

The other day a colleague of mine stated that using static classes can cause performance issues on multi-core systems, because the static instance cannot be shared between the processor caches. Is that right? Are there any benchmarks around proving this statement? The statement was made in the context of a .NET (C#) development discussion, but it sounds to me like a language- and environment-independent problem.
Thanks for your comments.
I would push your colleague for data or at least references.
The thing is, if you've got shared data, you've got shared data. Whether that's exposed through static classes, a singleton, whatever, isn't terribly important. If you don't need the shared data in the first place, I expect you wouldn't have a static class anyway.
Besides all of this, in any given application there's likely to be a much bigger bottleneck than processor caches for shared data in static classes.
As ever, write the most sensible, readable, maintainable code first - then work out if you have a performance bottleneck and act accordingly.
"[a] static instance cannot be shared between the processor caches. Is that right?"
That statement doesn't make much sense to me. The point of each processor's dedicated cache is that it contains a private copy of a small patch of memory, so that if the processor is running an algorithm that only needs to access that particular memory region, it doesn't have to keep going back to external memory. If we're talking about the static fields inside a static class, the memory for those fields may all fit into a contiguous chunk of memory that will in turn fit into a single processor's (or core's) dedicated cache. But each core has its own cached copy - it's not "shared". That's the point of caches.
If an algorithm's working set is bigger than a cache then it will defeat that cache. Meaning that as the algorithm runs, it repeatedly causes the processor to pull data from external memory, because all the necessary pieces won't fit in the cache at once. But this is a general problem that doesn't apply specifically to static classes.
I wonder if your colleague was actually talking not about performance but about the need to apply correct locking if multiple threads are reading/writing the same data?
If multiple threads are writing to that data, you'll have cache thrashing (the write on one CPU's cache invalidates the caches of the other CPUs). Your friend is technically correct, but there's a good chance it's not your primary bottleneck, so it doesn't matter.
If multiple threads are reading the data, your friend is flat-out wrong.
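Although the question came up in a .NET context, the write-contention effect is at the hardware level, so a rough Java sketch shows the same thing (not a rigorous benchmark; numbers will vary by machine): several threads hammering one shared counter keep invalidating each other's cache line, while per-thread counters do not.

```java
import java.util.concurrent.atomic.AtomicLong;

// Rough illustration of write contention on shared (static) data versus
// thread-local data. The shared run is typically much slower because each
// write invalidates the other cores' cached copy of the counter.
public class ContentionDemo {
    static final AtomicLong SHARED = new AtomicLong();
    static final int THREADS = 4;
    static final long ITERATIONS = 10_000_000L;

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared counter:     " + run(true) + " ms");
        System.out.println("per-thread counter: " + run(false) + " ms");
    }

    static long run(boolean shared) throws InterruptedException {
        Thread[] threads = new Thread[THREADS];
        for (int i = 0; i < THREADS; i++) {
            threads[i] = new Thread(() -> {
                AtomicLong local = new AtomicLong();
                AtomicLong target = shared ? SHARED : local;
                for (long n = 0; n < ITERATIONS; n++) {
                    target.incrementAndGet();
                }
            });
        }
        long start = System.nanoTime();
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```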
If you don't use any kind of locks or synchronization then static-vs.-non-static won't have any influence on your performance.
If you're using synchronization then you could run into a problem if all threads need to acquire the same lock, but that's only a side-effect of the static-ness and not a direct result of the methods being static.
In any "virtual machine" controlled language (.NET, Java, etc) this control is likely delegated to the underlying OS and likely further down to the BIOS and other scheduling controls. That being said, in the two biggies, .NET and Java, static vs. non-static is a memory issue, not a CPU issue.
Re-iterating saua's point, the impact on the CPU comes from the synchronization and thread control, not the access to the static information.
The problem with CPU cache management is not limited to only static methods. Only one CPU can update any memory address at a time. An object in your virtual machine, and specifically a field in your object, is a pointer to said memory address. Thus, even if I have a mutable object Foo, calling setBar(true) on Foo will only be allowed on a single CPU at a time.
All that being said, the point of .NET and Java is that you shouldn't be spending your time sweating these problems until you can prove that you have a problem and I doubt you will.
If you share mutable data between threads, you need either a lock or a lock-free algorithm (seldom available, and sometimes hard to use, unfortunately).
Having few, widely used, lock-arbitrated resources can lead to bottlenecks.
Static data is similar to a single-instance resource.
Therefore:
If many threads access static data and you use a lock to arbitrate, your threads are going to fight for access.
When designing a highly multithreaded app, try to use many fine-grained locks. Split your data so that a thread can grab one piece and run with it; hopefully no other thread will need to wait for it, because they're busy with their own pieces of data.
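A sketch of the fine-grained-lock idea in Java (the class is hypothetical; in practice java.util.concurrent.ConcurrentHashMap already does something similar internally): the data is split into stripes, each guarded by its own lock, so threads touching different keys rarely block each other.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Lock striping: one global lock would serialize every caller; here each
// stripe has its own lock, so contention only happens when two threads hit
// keys that hash to the same stripe.
public class StripedCounter {
    private static final int STRIPES = 16;

    private final List<Object> locks = new ArrayList<>();
    private final List<Map<String, Long>> stripes = new ArrayList<>();

    public StripedCounter() {
        for (int i = 0; i < STRIPES; i++) {
            locks.add(new Object());
            stripes.add(new HashMap<>());
        }
    }

    private int stripeFor(String key) {
        return Math.floorMod(key.hashCode(), STRIPES);
    }

    public void increment(String key) {
        int i = stripeFor(key);
        synchronized (locks.get(i)) {          // only this stripe is locked
            stripes.get(i).merge(key, 1L, Long::sum);
        }
    }

    public long get(String key) {
        int i = stripeFor(key);
        synchronized (locks.get(i)) {
            return stripes.get(i).getOrDefault(key, 0L);
        }
    }
}
```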
The x86 architecture implements cache snooping to keep data caches in sync on writes, should they happen to cache the same thing... Not all architectures do that in hardware; some depend on software to make sure the case never occurs.
Even if it were true, I suspect you have plenty of better ways to improve performance. When it gets down to changing static to instance, for processor caching, you'll know you are really pushing the envelope.

Is log4net much slower than System.Diagnostics.Trace?

I'm investigating the differences between using log4net and System.Diagnostics.Trace for logging, and I'm curious about the performance differences I've observed.
I created a test application to compare the performance of both logging methods in several scenarios, and I'm finding that log4net is significantly slower than the Trace class. For example, in a scenario where I log 1,000 messages with no string formatting, log4net's mean execution time over 1,000 trials is 9.00 ms, while Trace executes with a mean of 1.13 ms. A lot of my test cases have a relatively large amount of variance in the log4net execution times; the periodic nature of outlier long executions seems to suggest GC interference. Poking around with CLR Profiler confirms there is a large number of collections for the ton of log4net.Core.LoggingEvent objects that are generated (to be fair, it looks like Trace generates a ton of Char[] objects as well, but it doesn't display the large variance that log4net does).
One thing I'm keeping in mind here is that even though log4net seems roughly 9 times slower than Trace, the difference is 8 ms over 1,000 iterations; this isn't exactly a significant performance drain. Still, some of my expected use cases might be calling methods that log things hundreds of thousands of times, and these numbers are from my fast machine. On a slower machine more typical of our users' configurations the difference is 170 ms versus 11 ms, which is a bit more alarming.
Is this performance typical of log4net, or are there some gotchas that can significantly increase log4net's performance?
(NOTE: I am aware that string formatting can alter the execution time; I am trying to compare apples to apples and I have test cases with no formatting and test cases with formatting; log4net stays as proportionally slow whether string formatting is used or not.)
The story so far:
Robert Gould has the best answer to the question; I was mainly curious if it was typical to see log4net perform much slower than the Trace class.
Alex Shnayder's answer is interesting information but doesn't really fall under the scope of the question. Half of the intent of introducing this logging is to assist in debugging both logical and performance problems on live systems; our customers put our products in many exotic scenarios that are often difficult to reproduce without expensive and large-scale hardware configurations. My main concern is that a large timing difference between "not logging" and "logging" could change the system's behavior enough that bugs stop reproducing. In the end, the scale of the performance decrease is large but the magnitude is small, so I'm hoping it won't be a problem.
Yes, log4xxx is slower than Trace, since Trace is normally a near-kernel tool, while log4xxx is a much more powerful tool. Personally I prefer log4xxx because of its flexibility, but if you want something with less impact, and you don't really need logs for production (say, in debug builds only), Trace should be enough.
Note: I say log4xxx because exactly the same applies to all languages with a log4 library, not just .NET.
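One gotcha that applies to every log4-style library, and is part of what the question asks about: if a log message is expensive to build, guard it with a level check so disabled levels cost almost nothing. A Java/log4j sketch of the pattern (log4net exposes the equivalent log.IsDebugEnabled property; the Order type is invented for the example):

```java
import org.apache.log4j.Logger;

// Guarded logging: the expensive message construction only runs when DEBUG
// output will actually be emitted, so disabled log statements stay cheap.
public class GuardedLoggingDemo {
    private static final Logger LOG = Logger.getLogger(GuardedLoggingDemo.class);

    void process(Order order) {
        if (LOG.isDebugEnabled()) {
            // String concatenation and describeInDetail() are skipped
            // entirely when the DEBUG level is turned off.
            LOG.debug("Processing order: " + order.describeInDetail());
        }
    }

    // Hypothetical domain type, defined here only to keep the sketch
    // self-contained.
    static class Order {
        String describeInDetail() { return "..."; }
    }
}
```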
You might be interested in the Common.Logging library. It's a thin abstraction wrapper over existing logging implementations and allows you to plug in any logging framework you like at runtime. It is also much faster than System.Diagnostics.Trace, as described in my blog post about performance.
hth,
Erich
From my experience log4net performance isn't an issue in most cases.
The real question is why you would even need to be "logging things hundreds of thousands of times" in a production system.
As I see it, in production you should log only the bare minimum (info and maybe warning level), and only when you need to (e.g. debugging an issue on site) should you activate logging at debug level.
If you want the best of both worlds, log4net will allow you to log to the ASP.NET tracer as well. I turn this option on when I want to get performance stats that are tied to specific events in my logging.
I have just run a test comparing sequential writing to a simple file against using log4net for the same task. log4net is about 400 times slower compared to a StreamWriter, so I consider log4net unsuitable if you are writing huge log files. But I find it very useful for small amounts of log entries and debugging.
Maybe a solution in some cases is to isolate logging in a separate thread.

Caching Schemes for Managed Languages

This is mostly geared toward desktop application developers. How do I design a caching block which plays nicely with the GC? How do I tell the GC that I have just done a cache sweep and it is time to do a GC? How do I get an accurate measure of when it is time to do a cache sweep?
Are there any prebuilt caching schemes which I could borrow some ideas from?
While I obviously cannot speak to the specifics of your application, in most instances you should not tie your caching implementation to some perceived expectation for how the GC will work. As Stu mentions, calling GC.Collect() will force a collection (with overloads for a specific generation) but more often than not doing so will result in worse performance than just letting the GC manage itself.
If you do find (after doing some real performance testing) that you need to interact with the GC, make sure you take into account the different types of GCs that the framework currently has (see here for more information).
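The question is about .NET, but the general shape of a GC-friendly cache is similar on any managed runtime. Here is a Java sketch using SoftReference, which the collector is allowed to clear under memory pressure, so no manual "sweep then force a GC" step is needed; WeakReference or a size-bounded cache would be rough .NET analogues, and the class and loader names are made up for illustration.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// A cache that cooperates with the GC instead of fighting it: values are
// held through soft references, which the collector may clear when memory
// gets tight, and are simply recomputed on the next request.
public class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref != null) ? ref.get() : null; // null if the GC cleared it
        if (value == null) {
            value = loader.apply(key);              // recompute / reload
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }

    // Optional housekeeping: drop entries whose referent has been cleared,
    // so the map itself does not grow without bound.
    public void purgeClearedEntries() {
        map.values().removeIf(r -> r.get() == null);
    }
}
```

Usage would look like new SoftCache<String, byte[]>(key -> loadFromDisk(key)), where loadFromDisk stands for whatever expensive lookup the cache fronts.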
All you'll ever need to know (and then some):
http://msdn.microsoft.com/en-us/library/ee817645.aspx
Oh, and GC.Collect() forces a collect.
