Is log4net much slower than System.Diagnostics.Trace? - performance

I'm investigating the differences between using log4net and System.Diagnostics.Trace for logging, and I'm curious about the performance differences I've observed.
I created a test application to compare the performance of both logging methods in several scenarios, and I'm finding that log4net is significantly slower than the Trace class. For example, in a scenario where I log 1,000 messages with no string formatting, log4net's mean execution time over 1,000 trials is 9.00ms. Trace executes with a mean of 1.13ms. A lot of my test cases have a relatively large amount of variance in the log4net execution times; the periodic nature of outlier long executions seems to suggest GC interference. Poking around with CLR Profiler confirms there are a large amount of collections for a ton of log4net.Core.LoggingEvent objects that are generated (to be fair, it looks like Trace generates a ton of Char[] objects as well, but it doesn't display the large variance that log4net does.)
One thing I'm keeping in mind here are that even though log4net seems roughly 9 times slower than Trace, the difference is 8ms over 1,000 iterations; this isn't exactly a significant performance drain. Still, some of my expected use cases might be calling methods that are logging things hundreds of thousands of times, and these numbers are from my fast machine. On a slower machine more typical of our users' configurations the difference is 170ms to 11ms which is a tiny bit more alarming.
Is this performance typical of log4net, or are there some gotchas that can significantly increase log4net's performance?
(NOTE: I am aware that string formatting can alter the execution time; I am trying to compare apples to apples and I have test cases with no formatting and test cases with formatting; log4net stays as proportionally slow whether string formatting is used or not.)
The story so far:
Robert Gould has the best answer to the question; I was mainly curious if it was typical to see log4net perform much slower than the Trace class.
Alex Shnayder's answer is interesting information but doesn't really fall under the scope of the question. Half of the intent for introducing this logging is to assist in debugging both logical and performance problems on live systems; our customers put our products in many exotic scenarios that are often difficult to reproduce without expensive and large-scale hardware configurations. My main concern is that a large timing difference between "not logging" and "logging" could affect the system in such a way that bugs don't happen. In the end, the scale of the performance decrease is large but the magnitude is small, so I'm hoping it won't be a problem.

yes log4xxx is slower than trace, since trace is normally a near kernel tool, while log4xxx is a much more powerful tool. Personally I prefer log4xxx because of it's fexibility, but if you want something that doesn't impact as much, and you don't really need logs for production,say in debug only trace should be enough.
Note: I use log4xxx because the exact same applies to all languages with a log4 library not just .Net

you might be interested in the Common.Logging library. It's a thin abstraction wrapper over existing logging implementations and allows you to plug in any logging framework you like at runtime. Also it is much faster then System.Diagnostics.Trace as described in my blog post about performance.
hth,
Erich

From my experience log4net performance isn't an issue in most cases.
The real question is, why would you even need to "logging things hundreds of thousands of times" in a production system.
As I see it, in production you should log only bare minimum (info nd may be warning level), and only if need to (debugging an issue on site) should activate debugging at debug level.

If you want the best of both worlds, log4net will allow you to log to the aspnet tracer as well. I turn this option on when I want to get performance stats that are tied in to specific events in my logging.

Have just run a test comparing sequental writing to a simple file compared to using Log4Net for the same task.Log4Net is about 400 times slower comparet to a StreamWriter.So I consider Log4Net not usable if You are writing to huge logfiles. But I find it very usefull for small amounts of log entries and debugging.
Maybe a solution to isolate logging in a separate thread in some cases.

Related

Why there is a huge gap in performance(in terms of time taken) when I use Stream API twice in Java 8? [duplicate]

How do you write (and run) a correct micro-benchmark in Java?
I'm looking for some code samples and comments illustrating various things to think about.
Example: Should the benchmark measure time/iteration or iterations/time, and why?
Related: Is stopwatch benchmarking acceptable?
Tips about writing micro benchmarks from the creators of Java HotSpot:
Rule 0: Read a reputable paper on JVMs and micro-benchmarking. A good one is Brian Goetz, 2005. Do not expect too much from micro-benchmarks; they measure only a limited range of JVM performance characteristics.
Rule 1: Always include a warmup phase which runs your test kernel all the way through, enough to trigger all initializations and compilations before timing phase(s). (Fewer iterations is OK on the warmup phase. The rule of thumb is several tens of thousands of inner loop iterations.)
Rule 2: Always run with -XX:+PrintCompilation, -verbose:gc, etc., so you can verify that the compiler and other parts of the JVM are not doing unexpected work during your timing phase.
Rule 2.1: Print messages at the beginning and end of timing and warmup phases, so you can verify that there is no output from Rule 2 during the timing phase.
Rule 3: Be aware of the difference between -client and -server, and OSR and regular compilations. The -XX:+PrintCompilation flag reports OSR compilations with an at-sign to denote the non-initial entry point, for example: Trouble$1::run # 2 (41 bytes). Prefer server to client, and regular to OSR, if you are after best performance.
Rule 4: Be aware of initialization effects. Do not print for the first time during your timing phase, since printing loads and initializes classes. Do not load new classes outside of the warmup phase (or final reporting phase), unless you are testing class loading specifically (and in that case load only the test classes). Rule 2 is your first line of defense against such effects.
Rule 5: Be aware of deoptimization and recompilation effects. Do not take any code path for the first time in the timing phase, because the compiler may junk and recompile the code, based on an earlier optimistic assumption that the path was not going to be used at all. Rule 2 is your first line of defense against such effects.
Rule 6: Use appropriate tools to read the compiler's mind, and expect to be surprised by the code it produces. Inspect the code yourself before forming theories about what makes something faster or slower.
Rule 7: Reduce noise in your measurements. Run your benchmark on a quiet machine, and run it several times, discarding outliers. Use -Xbatch to serialize the compiler with the application, and consider setting -XX:CICompilerCount=1 to prevent the compiler from running in parallel with itself. Try your best to reduce GC overhead, set Xmx(large enough) equals Xms and use UseEpsilonGC if it is available.
Rule 8: Use a library for your benchmark as it is probably more efficient and was already debugged for this sole purpose. Such as JMH, Caliper or Bill and Paul's Excellent UCSD Benchmarks for Java.
I know this question has been marked as answered but I wanted to mention two libraries that help us to write micro benchmarks
Caliper from Google
Getting started tutorials
http://codingjunkie.net/micro-benchmarking-with-caliper/
http://vertexlabs.co.uk/blog/caliper
JMH from OpenJDK
Getting started tutorials
Avoiding Benchmarking Pitfalls on the JVM
Using JMH for Java Microbenchmarking
Introduction to JMH
Important things for Java benchmarks are:
Warm up the JIT first by running the code several times before timing it
Make sure you run it for long enough to be able to measure the results in seconds or (better) tens of seconds
While you can't call System.gc() between iterations, it's a good idea to run it between tests, so that each test will hopefully get a "clean" memory space to work with. (Yes, gc() is more of a hint than a guarantee, but it's very likely that it really will garbage collect in my experience.)
I like to display iterations and time, and a score of time/iteration which can be scaled such that the "best" algorithm gets a score of 1.0 and others are scored in a relative fashion. This means you can run all algorithms for a longish time, varying both number of iterations and time, but still getting comparable results.
I'm just in the process of blogging about the design of a benchmarking framework in .NET. I've got a couple of earlier posts which may be able to give you some ideas - not everything will be appropriate, of course, but some of it may be.
jmh is a recent addition to OpenJDK and has been written by some performance engineers from Oracle. Certainly worth a look.
The jmh is a Java harness for building, running, and analysing nano/micro/macro benchmarks written in Java and other languages targetting the JVM.
Very interesting pieces of information buried in the sample tests comments.
See also:
Avoiding Benchmarking Pitfalls on the JVM
Discussion on the main strengths of jmh.
Should the benchmark measure time/iteration or iterations/time, and why?
It depends on what you are trying to test.
If you are interested in latency, use time/iteration and if you are interested in throughput, use iterations/time.
Make sure you somehow use results which are computed in benchmarked code. Otherwise your code can be optimized away.
If you are trying to compare two algorithms, do at least two benchmarks for each, alternating the order. i.e.:
for(i=1..n)
alg1();
for(i=1..n)
alg2();
for(i=1..n)
alg2();
for(i=1..n)
alg1();
I have found some noticeable differences (5-10% sometimes) in the runtime of the same algorithm in different passes..
Also, make sure that n is very large, so that the runtime of each loop is at the very least 10 seconds or so. The more iterations, the more significant figures in your benchmark time and the more reliable that data is.
There are many possible pitfalls for writing micro-benchmarks in Java.
First: You have to calculate with all sorts of events that take time more or less random: Garbage collection, caching effects (of OS for files and of CPU for memory), IO etc.
Second: You cannot trust the accuracy of the measured times for very short intervals.
Third: The JVM optimizes your code while executing. So different runs in the same JVM-instance will become faster and faster.
My recommendations: Make your benchmark run some seconds, that is more reliable than a runtime over milliseconds. Warm up the JVM (means running the benchmark at least once without measuring, that the JVM can run optimizations). And run your benchmark multiple times (maybe 5 times) and take the median-value. Run every micro-benchmark in a new JVM-instance (call for every benchmark new Java) otherwise optimization effects of the JVM can influence later running tests. Don't execute things, that aren't executed in the warmup-phase (as this could trigger class-load and recompilation).
It should also be noted that it might also be important to analyze the results of the micro benchmark when comparing different implementations. Therefore a significance test should be made.
This is because implementation A might be faster during most of the runs of the benchmark than implementation B. But A might also have a higher spread, so the measured performance benefit of A won't be of any significance when compared with B.
So it is also important to write and run a micro benchmark correctly, but also to analyze it correctly.
To add to the other excellent advice, I'd also be mindful of the following:
For some CPUs (e.g. Intel Core i5 range with TurboBoost), the temperature (and number of cores currently being used, as well as thier utilisation percent) affects the clock speed. Since CPUs are dynamically clocked, this can affect your results. For example, if you have a single-threaded application, the maximum clock speed (with TurboBoost) is higher than for an application using all cores. This can therefore interfere with comparisons of single and multi-threaded performance on some systems. Bear in mind that the temperature and volatages also affect how long Turbo frequency is maintained.
Perhaps a more fundamentally important aspect that you have direct control over: make sure you're measuring the right thing! For example, if you're using System.nanoTime() to benchmark a particular bit of code, put the calls to the assignment in places that make sense to avoid measuring things which you aren't interested in. For example, don't do:
long startTime = System.nanoTime();
//code here...
System.out.println("Code took "+(System.nanoTime()-startTime)+"nano seconds");
Problem is you're not immediately getting the end time when the code has finished. Instead, try the following:
final long endTime, startTime = System.nanoTime();
//code here...
endTime = System.nanoTime();
System.out.println("Code took "+(endTime-startTime)+"nano seconds");
http://opt.sourceforge.net/ Java Micro Benchmark - control tasks required to determine the comparative performance characteristics of the computer system on different platforms. Can be used to guide optimization decisions and to compare different Java implementations.

How to save CPU cycles in Ruby/Rails?

What are the ways to save CPU cycles in a Ruby web application? As my project is expanding, the average time for rendering, running methods, db queries is increasing. I consider average of 60ms as high.
Any design guidelines to keep machine code tight and compact? Like - Being completely OO?
Using more variables and memory to avoid repeated iteration etc? Consolidating db queries?
try to find out your bottlenecks before actually start tuning everything.
Are you sure the problem is in the CPU? could it be your storage system?, is it your methods? is it your DB? are you using idices in your db? are you using more indices that you need in your db?
There is no all in one solution for your question.
You should run you application against a profiler, which will determine which point of your software eat the most resources. Optimize that part, and repeat the operation, until you're happy with the results.
There are plenty tools for measuring performance, ruby-prof, gperftools , and such stuff like newrelic, check out this keywords on google
Have you looked into caching? It's a very important tool for speeding up your web application and making it scale to high traffic, and in many cases it is orders of magnitude more efficient than regular code optimizations.
Besides the (correct) answers of profiling your code to find the parts that are actually slow, you could also look into alternate Ruby implementations that could be faster for your use case.
See:
http://jruby.org/
http://rubini.us/
for two alternatives that are reported to be faster than the standard 1.9 interpreter.

Do all profilers significantly slow execution?

The profilers I have experience with (mainly the Digital Mars D profiler that comes w/ the compiler) seem to massively slow down the execution of the program being profiled. This has a major effect on my willingness to use a profiler, as it makes profiling a "real" run of a lot of my programs, as opposed to testing on a very small input, impractical. I don't know much about how profilers are implemented. Is a major (>2x) slowdown when profiling pretty much a fact of life, or are there profilers that avoid it? If it can be avoided, are there any fast profilers available for D, preferrably for D2 and preferrably for free?
I don't know about D profilers, but in general there are two different ways a profiler can collect profiling information.
The first is by instrumentation, by injecting logging calls all over the place. This slows down the application more or less. Typically more.
The second is sampling. Then the profiler breaks the application at regular intervals and inspects the call stack. This does not slow down the application very much at all.
The downside of a sampling profiler is that the result is not as detailed as with an instrumenting profiler.
Check the documentation for your profiler if you can run with sampling instead of instrumentation. Otherwise you have some new Google terms in "sampling" and "instrumenting".
My favorite method of profiling slows the program way way down, and that's OK. I run the program under the debugger, with a realistic load, and then I manually interrupt it. Then I copy the call stack somewhere, like to Notepad. So it takes on the order of a minute to collect one sample. Then I can either resume execution, or it's even OK to start it over from the beginning to get another sample.
I do this 10 or 20 times, long enough to see what the program is actually doing from a wall-clock perspective. When I see something that shows up a lot, then I take more samples until it shows up again. Then I stop and really study what it is in the process of doing and why, which may take 10 minutes or more. That's how I find out if that activity is something I can replace with more efficient code, i.e. it wasn't totally necessary.
You see, I'm not interested in measuring how fast or slow it's going. I can do that separately with maybe only a watch. I'm interested in finding out which activities take a large percentage of time (not amount, percentage), and if something takes a large percentage of time, that is the probability that each stackshot will see it.
By "activity" I don't necessarily mean where the PC hangs out. In realistic software the PC is almost always off in a system or library routine somewhere. Typically more important is call sites in our code. If I see, for example, a string of 3 calls showing up on half of the stack samples, that represents very good hunting, because if any one of those isn't truly necessary and can be done away with, execution time will drop by half.
If you want a grinning manager, just do that once or twice.
Even in what you would think would be math-heavy scientific number crunching apps where you would think low-level optimization and hotspots would rule the day, you know what I often find? The math library routines are checking arguments, not crunching. Often the code is not doing what you think it's doing, and you don't have to run it at top speed to find that out.
I'd say yes, both sampling and instrumenting forms of profiling will tax your program heavily - regardless of whose profiler you are using, and on what language.
You could try h3r3tic's xfProf, which is a sampling profiler. Haven't tried it myself, but that guy always makes cool stuff :)
From the description:
If the program is sampled only a few hundred (or thousand)
times per second, the performance overhead will not be noticeable.

Effective Code Instrumentation?

All too often I read statements about some new framework and their "benchmarks." My question is a general one but to the specific points of:
What approach should a developer take to effectively instrument code to measure performance?
When reading about benchmarks and performance testing, what are some red-flags to watch out for that might not represent real results?
There are two methods of measuring performance: using code instrumentation and using sampling.
The commercial profilers (Hi-Prof, Rational Quantify, AQTime) I used in the past used code instrumentation (some of them could also use sampling) and in my experience, this gives the best, most detailed result. Especially Rational Quantity allow you to zoom in on results, focus on sub trees, remove complete call trees to simulate an improvement, ...
The downside of these instrumenting profilers is that they:
tend to be slow (your code runs about 10 times slower)
take quite some time to instrument your application
don't always correctly handle exceptions in the application (in C++)
can be hard to set up if you have to disable the instrumentation of DLL's (we had to disable instrumentation for Oracle DLL's)
The instrumentation also sometimes skews the times reported for low-level functions like memory allocations, critical sections, ...
The free profilers (Very Sleepy, Luke Stackwalker) that I use use sampling, which means that it is much easier to do a quick performance test and see where the problem lies. These free profilers don't have the full functionality of the commercial profilers (although I submitted the "focus on subtree" functionality for Very Sleepy myself), but since they are fast, they can be very useful.
At this time, my personal favorite is Very Sleepy, with Luke StackWalker coming second.
In both cases (instrumenting and sampling), my experience is that:
It is very difficult to compare the results of profilers over different releases of your application. If you have a performance problem in your release 2.0, profile your release 2.0 and try to improve it, rather than looking for the exact reason why 2.0 is slower than 1.0.
You must never compare the profiling results with the timing (real time, cpu time) results of an application that is run outside the profiler. If your application consumes 5 seconds CPU time outside the profiler, and when run in the profiler the profiler reports that it consumes 10 seconds, there's nothing wrong. Don't think that your application actually takes 10 seconds.
That's why you must consistently check results in the same environment. Consistently compare results of your application when run outside the profiler, or when run inside the profiler. Don't mix the results.
Also use a consistent environment and system. If you get a faster PC, your application could still run slower, e.g. because the screen is larger and more needs to be updated on screen. If moving to a new PC, retest the last (one or two) releases of your application on the new PC so you get an idea on how times scale to the new PC.
This also means: use fixed data sets and check your improvements on these datasets. It could be that an improvement in your application improves the performance of dataset X, but makes it slower with dataset Y. In some cases this may be acceptible.
Discuss with the testing team what results you want to obtain beforehand (see Oded's answer on my own question What's the best way to 'indicate/numerate' performance of an application?).
Realize that a faster application can still use more CPU time than a slower application, if the faster one uses multi-threading and the slower one doesn't. Discuss (as said before) with the testing time what needs to be measured and what doesn't (in the multi-threading case: real time instead of CPU time).
Realize that many small improvements may lead to one big improvement. If you find 10 parts in your application that each take 3% of the time and you can reduce it to 1%, your application will be 20% faster.
It depends what you're trying to do.
1) If you want to maintain general timing information, so you can be alert to regressions, various instrumenting profilers are the way to go. Make sure they measure all kinds of time, not just CPU time.
2) If you want to find ways to make the software faster, that is a distinctly different problem.
You should put the emphasis on the find, not on the measure.
For this, you need something that samples the call stack, not just the program counter (over multiple threads, if necessary). That rules out profilers like gprof.
Importantly, it should sample on wall-clock time, not CPU time, because you are every bit as likely to lose time due to I/O as due to crunching. This rules out some profilers.
It should be able to take samples only when you care, such as not when waiting for user input. This also rules out some profilers.
Finally, and very important, is the summary you get.
It is essential to get per-line percent of time.
The percent of time used by a line is the percent of stack samples containing the line.
Don't settle for function-only timings, even with a call graph.
This rules out still more profilers.
(Forget about "self time", and forget about invocation counts. Those are seldom useful and often misleading.)
Accuracy of finding the problems is what you're after, not accuracy of measuring them. That is a very important point. (You don't need a large number of samples, though it does no harm. The harm is in your head, making you think about measuring, rather than what is it doing.)
One good tool for this is RotateRight's Zoom profiler. Personally I rely on manual sampling.

What do I need to offset the performance setback induced by use of the Spring framework?

I am using Spring with Hibernate to create an Enterprise application.
Now, due to the abstractions given by the framework to the underlying J2EE
architecture, there is obviously going to be a runtime performance hit on my app.
What I need to know is a set of factors that I need to consider to make a decision about the minimum specs(Proc speed + RAM etc) that I need for a single host server of the application running RedHat Linux 3+ and devoted to running this application only, that would produce an efficiency score of say 8 out of 10 given a simultaneous-access-userbase increase of 100 per month.
No clustering is to be used.
No offense, but I'd bet that performance issues are more likely to be due to your application code than Spring.
If you look at the way they've written their source code, you'll see that they pay a great deal of attention to quality.
The only way to know is to profile your app, see where the time is being spent, analyze to determine root cause, correct it, rinse, repeat. That's science. Anything else is guessing.
I've used Spring in a production app that's run without a hitch for three years and counting. No memory leaks, no lost connections, no server bounces, no performance issues. It just runs like butter.
I seriously doubt that using Spring will significantly affect your performance.
What particular aspects of Spring are you expecting to cause performance issues?
There are so many variables here that the only answer is to "suck it and see", but, in a scientific manner.
You need to build a server than benchmark this. Start of with some "commodity" setup say 4 core cpu and 2 gig ram, then run a benchmark script to see if it meets your needs. (which most likely it will!).
If it doesnt you should be able to calculate the required server size from the nulbers you get out of the benchmark -- or -- fix the performance problem so it runs on hte hardware youve got.
The important thing is to identiffy what is limmiting your performance. Is you server using all the cores or are your processes stuck on a single core, is your JVM getting enough memory, are you IO bound or database bound.
Once you know the limiting factors its pretty easy to work out the solution -- either improve the efficiency of your programs or buy more of the right hardware.
Two thing to watch out for with J2EE -- most JVMs have default heap sizes from the last decade, make sure your JVM has enough Heap and Stack (at least 1G each!), -- it takes time for all the JIT compiling, object cacheing, module loading etc to settle down -- exercise your system for at least an hour before you start benchmarking.
As toolkit, I don't see Spring itself affecting the performance after initialization, but I think Hibernate will. How big this effect is, depends on a lot of details like the DB-Schema and how much relational layout differs from the OO layer and of course how DB-access is organized and how often DB-access happens etc. So I doubt, there is a rule of thumb to this. Just try out by developing significant prototypes using alternative applications servers or try a own small no-ORM-use-JDBC-version.
I've never heard that Spring creates any type of runtime performance hit. Since it uses mainly POJOs I'd be surprised if there was something wrong with it. Other than parsing a lot of XML on startup maybe, but that's solved by using annotations.
Just write your app first and then tune accordingly.
Spring is typically used to create long-lived objects shortly after the application starts. There is virtually no performance cost over the life of the process.
Which performance setback? In relation to what?
Did you measure the performance before using the framework?
If the Spring framework causes inacceptable performance issues the obvious solution is not to use it.

Resources