Currently we are looking to use Drools in a system for managing knowledge-intensive processes.
To the best of my knowledge, information about performance tests done for Drools, and their results, is scarce and hard to find.
While use cases differ widely, it would be good to know common bottlenecks (the inserts are one, for example) and possible best practices for getting around them in certain scenarios. Also, knowing more about the performance in general could help in evaluating whether Drools is a viable solution to the problems at hand.
Is there any publicly available information about, for example, performance metrics and performance tests for Drools?
The only pitfall I found fundamental while working with Drools concerns authoring when blocks: don't put computations in when blocks.
This is what the Drools documentation says even about getters:
Person( age == 50 )
// this is the same as:
Person( getAge() == 50 )
Note
We recommend using property access (age) over using getters explicitly (getAge()) because of performance enhancements through field indexing. (source)
The then block is by default executed by the same (single) thread, so it is up to you to make it efficient. Use asynchronous processing (@Async in Spring), a thread pool, Java parallel streams, etc.
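For instance, here is a minimal plain-Java sketch of handing the heavy part of a consequence off to a thread pool, so the rule-firing thread only enqueues work; the EnrichmentService class and its process() method are hypothetical names for illustration, not Drools API:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper that a rule's "then" block can call.
// The rule-firing thread only submits the work; the pool does the heavy lifting.
public class EnrichmentService {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    public void enrichAsync(Object fact) {
        pool.submit(() -> process(fact));
    }

    private void process(Object fact) {
        // expensive I/O or computation goes here, off the rule-firing thread
    }

    public void shutdown() {
        pool.shutdown(); // stop accepting work once the session is done
    }
}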
You can use jvisualvm to profile the run, where rules will show up as ordinary Java method calls, and you can use any other tools to analyse bottlenecks in your code, such as this library for tests and live environments.
Related
We are planning a Java 8 migration for our application from Java 7. As part of this migration, the most important thing that we want to achieve is to recompile our source code using JDK 8 and benefit from the performance improvements made in the JVM, the garbage collection model, etc. Besides this, we also want to set the stage to take advantage of the new features added in Java 8.
My question to this group is to get some advice on how we should plan our testing. What are the key areas that we should be watching out for? What are some of the challenges that others have faced?
Note: Our application is intended for low latency use.
A few things...
Some things that compile under JDK 7 might not compile under JDK 8. This is because lots of compiler bugs were fixed and the compiler now sticks much closer to the JLS (this is probably mostly about generics, but it might affect other areas as well).
If you have external libraries, be aware that not all of them are compatible with JDK 8.
HashMap internals have changed. If you rely on a particular iteration order (I have seen that done), such code might now fail; otherwise the internal changes will only make your HashMap usage faster.
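As a small illustration (the class name and map contents are made up): if code needs a stable order, state that explicitly instead of relying on HashMap internals:

import java.util.LinkedHashMap;
import java.util.Map;

public class IterationOrder {
    public static void main(String[] args) {
        // HashMap iteration order is unspecified and did change between JDK 7 and 8.
        // If a stable (insertion) order is required, say so explicitly:
        Map<String, Integer> ordered = new LinkedHashMap<>();
        ordered.put("first", 1);
        ordered.put("second", 2);
        ordered.put("third", 3);

        // Always prints in insertion order, on any JDK.
        ordered.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}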
You say that your app is intended for low latency. Be aware that Stream operations are slower and require more resources than simpler constructs such as plain loops. BUT unless you actually measure this to be an impact (it wasn't in my case when migrating), there's nothing to worry about.
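For illustration only (the class and method names are made up), this is the kind of comparison worth measuring before rewriting anything:

import java.util.List;

public class SumExamples {
    // Stream version: concise, but builds a pipeline and may box values.
    static long sumWithStream(List<Integer> values) {
        return values.stream().mapToLong(Integer::longValue).sum();
    }

    // Plain loop: typically at least as fast on a hot path.
    // Prefer rewriting only if a profiler or a JMH benchmark shows it matters.
    static long sumWithLoop(List<Integer> values) {
        long total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }
}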
This is a great example of where having your test cases in place would help a lot: they would catch all the major problems (if any) at that point.
I'd say that the biggest challenge for me was not the migration itself, but the post-migration. Lots of people (including me) made multiple mistakes in basic things, since lambdas and Streams were quite new. My personal advice here: do not be afraid to ask. Better safe than sorry.
P.S. As noted in a comment, you should also check this guide.
I'm trying to get a better understanding of server languages / frameworks and their potential advantages and disadvantages as used in a microservice environment. Development time is not important to me since this is for my own personal project and learning to use the right tool for the problem is more important to me than the development time required to build the service.
The more I think about it, the more I think that Elixir should be used 90% of the time. The reason is twofold:
1) concurrency means many users can hit the service without failures
2) most microservices have almost no processing overhead: they hit a database and return JSON. I.e. the gains from hitting a database with a faster language are negligible compared to using a slower one. The database in question will determine the speed at which data is returned, not the server language, since the database implementation will itself be written in a lower-level language like C++. (Is this true? Will Elixir + Postgresql be noticeably slower than Go + Postgresql? Or even Ruby + Postgresql? Is the bottleneck Postgresql or the language making the request?)
Assuming the above 2 are true, then it stands to reason to me that I would use Elixir 90% of the time, because I would get a service that is future-proofed against traffic spikes, and it will generally have the same execution speed as any other database-backed REST API.
For the other 10% of the time, where a service requires processing speed, like an image-recognition service, I would implement it in C++, or in Python because it has libraries for image recognition already implemented in C++ (e.g. TensorFlow).
Is this a correct way of thinking about when to use specific languages for a microservice? If not, besides Development Time what else should I consider?
Assuming the above 2 are true, then it stands to reason to me that I would use Elixir 90% of the time [...]
Be careful when making these kinds of statements! They tempt you into choosing the thing you always choose when setting up a new service, when actually you should be thinking about what that service is supposed to do and what languages and frameworks help you get there best! That said: your two premises are true! A DB hit is the most expensive operation and concurrency is a vital tool when handling larger loads. They are true but not complete: There are other conditions you might need to think about like resource consumption, scheduling behavior of your platform etc.
On the subject of languages: managed languages (like, for example, everything based on the JVM or the .NET runtime) always imply a certain static overhead because of their need to do garbage collection, compile code on the fly, do dynamic type deduction at runtime, reflection, etc. This means that they will need more memory and CPU cycles from your machines than languages like C++, Go, Rust and the like.
While you have to do memory management yourself in languages like C++, languages like Go, D and Rust attempt to provide a middle ground towards fully managed languages/runtimes like the JVM or .NET.
What matters at least as much as your choice of languages/runtimes is your architecture. Everything involving classic databases will probably give you trouble on the scaling side of things; everything hitting a disk is going to kill you under load!
So what's my suggestion? Keep all the variables in mind (request latency is not the only metric! Resource consumption can be a killer too!), choose the best language and toolchains for whatever purpose your service has to fulfill, and validate different architectures!
What is a good workflow for detecting performance regressions in R packages? Ideally, I'm looking for something that integrates with R CMD check that alerts me when I have introduced a significant performance regression in my code.
What is a good workflow in general? What other languages provide good tools? Is it something that can be built on top of unit testing, or is it usually done separately?
This is a very challenging question, and one that I'm frequently dealing with, as I swap out different code in a package to speed things up. Sometimes a performance regression comes along with a change in algorithms or implementation, but it may also arise due to changes in the data structures used.
What is a good workflow for detecting performance regressions in R packages?
In my case, I tend to have very specific use cases that I'm trying to speed up, with different fixed data sets. As Spacedman wrote, it's important to have a fixed computing system, but that's almost infeasible: sometimes a shared computer may have other processes that slow things down 10-20%, even when it looks quite idle.
My steps:
Standardize the platform (e.g. one or a few machines, a particular virtual machine, or a virtual machine + specific infrastructure, a la Amazon's EC2 instance types).
Standardize the data set that will be used for speed testing.
Create scripts and fixed intermediate data output (i.e. saved to .rdat files) that involve very minimal data transformations. My focus is on some kind of modeling, rather than data manipulation or transformation. This means that I want to give exactly the same block of data to the modeling functions. If, however, data transformation is the goal, then be sure that the pre-transformed/manipulated data is as close as possible to standard across tests of different versions of the package. (See this question for examples of memoization, caching, etc., that can be used to standardize or speed up non-focal computations. It references several packages by the OP.)
Repeat tests multiple times.
Scale the results relative to fixed benchmarks, e.g. the time to perform a linear regression, to sort a matrix, etc. This can allow for "local" or transient variations in infrastructure, such as may be due to I/O, the memory system, dependent packages, etc.
Examine the profiling output as vigorously as possible (see this question for some insights, also referencing tools from the OP).
Ideally, I'm looking for something that integrates with R CMD check that alerts me when I have introduced a significant performance regression in my code.
Unfortunately, I don't have an answer for this.
What is a good workflow in general?
For me, it's quite similar to general dynamic code testing: is the output (execution time in this case) reproducible, optimal, and transparent? Transparency comes from understanding what affects the overall time. This is where Mike Dunlavey's suggestions are important, but I prefer to go further, with a line profiler.
Regarding a line profiler, see my previous question, which refers to options in Python and Matlab for other examples. It's most important to examine clock time, but also very important to track memory allocation, number of times the line is executed, and call stack depth.
What other languages provide good tools?
Almost all other languages have better tools. :) Interpreted languages like Python and Matlab have the good & possibly familiar examples of tools that can be adapted for this purpose. Although dynamic analysis is very important, static analysis can help identify where there may be some serious problems. Matlab has a great static analyzer that can report when objects (e.g. vectors, matrices) are growing inside of loops, for instance. It is terrible to find this only via dynamic analysis - you've already wasted execution time to discover something like this, and it's not always discernible if your execution context is pretty simple (e.g. just a few iterations, or small objects).
As far as language-agnostic methods, you can look at:
Valgrind & cachegrind
Monitoring of disk I/O, dirty buffers, etc.
Monitoring of RAM (Cachegrind is helpful, but you could just monitor RAM allocation, and lots of details about RAM usage)
Usage of multiple cores
Is it something that can be built on top of unit testing, or is it usually done separately?
This is hard to answer. For static analysis, it can occur before unit testing. For dynamic analysis, one may want to add more tests. Think of it as sequential design (i.e. from an experimental design framework): if the execution costs appear to be, within some statistical allowances for variation, the same, then no further tests are needed. If, however, method B seems to have an average execution cost greater than method A, then one should perform more intensive tests.
Update 1: If I may be so bold, there's another question that I'd recommend including, which is: "What are some gotchas in comparing the execution time of two versions of a package?" This is analogous to assuming that two programs that implement the same algorithm should have the same intermediate objects. That's not exactly true (see this question - not that I'm promoting my own questions, here - it's just hard work to make things better and faster...leading to multiple SO questions on this topic :)). In a similar way, two executions of the same code can differ in time consumed due to factors other than the implementation.
So, some gotchas that can occur, either within the same language or across languages, within the same execution instance or across "identical" instances, which can affect runtime:
Garbage collection - different implementations or languages can hit garbage collection under different circumstances. This can make two executions appear different, though it can be very dependent on context, parameters, data sets, etc. The GC-obsessive execution will look slower.
Caching at the level of the disk, the CPU (e.g. L1, L2, L3 caches), or other levels (e.g. memoization). Often, the first execution will pay a penalty.
Dynamic voltage scaling - This one sucks. When there is a problem, this may be one of the hardest beasties to find, since it can go away quickly. It looks like cacheing, but it isn't.
Any job priority manager that you don't know about.
One method uses multiple cores or does some clever stuff about how work is parceled among cores or CPUs. For instance, getting a process locked to a core can be useful in some scenarios. One execution of an R package may be luckier in this regard, another package may be very clever...
Unused variables, excessive data transfer, dirty caches, unflushed buffers, ... the list goes on.
The key result is: Ideally, how should we test for differences in expected values, subject to the randomness created due to order effects? Well, pretty simple: go back to experimental design. :)
When the empirical differences in execution times are different from the "expected" differences, it's great to have enabled additional system and execution monitoring so that we don't have to re-run the experiments until we're blue in the face.
The only way to do anything here is to make some assumptions. So let us assume an unchanged machine, or else require a 'recalibration'.
Then use a unit-test-like framework, and treat 'has to be done in X units of time' as just another testing criterion to be fulfilled. In other words, do something like
stopifnot( system.time( someExpression )[["elapsed"]] < savedValue + fudge )
so we would have to associate prior timings with given expressions. Equality-testing comparisons from any one of the three existing unit testing packages could be used as well.
Nothing that Hadley couldn't handle, so I think we can almost expect a new package timr after the next long academic break :). Of course, this has to either be optional, because on an "unknown" machine (think: CRAN testing the package) we have no reference point, or else the fudge factor has to "go to 11" to automatically accept on a new machine.
A recent change announced on the R-devel feed could give a crude measure for this.
CHANGES IN R-devel UTILITIES
‘R CMD check’ can optionally report timings on various parts of the check: this is controlled by environment variables documented in ‘Writing R Extensions’.
See http://developer.r-project.org/blosxom.cgi/R-devel/2011/12/13#n2011-12-13
The overall time spent running the tests could be checked and compared to previous values. Of course, adding new tests will increase the time, but dramatic performance regressions could still be seen, albeit manually.
This is not as fine grained as timing support within individual test suites, but it also does not depend on any one specific test suite.
I'm teaching Java EE, especially JPA, Spring and Spring MVC. As I don't have much experience with large projects, it is difficult to know what to present to students about ORM optimisation.
At the present time, I present some classic optimisation tricks:
prepared statements (most ORMs implicitly use them by default)
first and second-level caches
"write first, optimize later"
it is possible to switch off the ORM and send SQL commands directly to the database for very frequent, specialized and costly requests
Does the community see any other ways to optimize ORM usage? I'm especially interested in DAO patterns...
From the developer's point of view, these are the optimization cases they must deal with:
Reduce chattiness between the ORM and the DB. Low chattiness is important, since each round trip between the ORM and the database implies a network interaction, and thus takes at least 0.1 to 1 ms, independently of query complexity (note that maybe 90% of queries are normally fairly simple). A particular case is the SELECT N+1 problem: if processing each row of some query result requires an additional query to be executed (so 1 + count(...) queries are executed in total), the developer must try to rewrite the code in such a way that a nearly constant number of queries is executed. CRUD sequence batching and future queries are other examples of optimizations that reduce the chattiness (described below).
Reduce query complexity. Usually the ORM is helpless here, so this is solely the developer's headache. But APIs that allow executing SQL commands directly are frequently intended for this case as well.
So I can list a few more optimizations:
Future queries: an API allowing the execution of a query to be delayed until the moment its result is actually needed. If several future queries are scheduled at that moment, they're executed all together as a single batch. So the main benefit of this is a reduction in the number of round trips to the database (= reduction of chattiness between the ORM and the DB). Many ORMs implement this, e.g. NHibernate.
CRUD sequence batching: nearly the same, but here INSERT, UPDATE and DELETE statements are batched together to reduce the chattiness. Again, implemented by many ORM tools (a configuration sketch follows after this list).
Combination of above two cases - so-called "generalized batching". AFAIK, so far this is implemented only by DataObjects.Net (the ORM my team works on).
Asynchronous generalized batching: if a batch requires no immediate reply, it is executed asynchronously (though still in sync with other batches sent by the same session, i.e. the underlying connection is used synchronously anyway). Brings noticeable benefits when there are lots of CRUD statements: the code modifying persistent entities is executed in parallel with the DB-side operation. AFAIK, no ORM implements this optimization so far.
All these cases fit under "write first, optimize later" rule (or "express intention first, optimize later").
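To make the CRUD sequence batching above concrete, here is a minimal JPA bootstrap sketch using Hibernate-specific settings; the persistence unit name "demoUnit" and the batch size of 50 are assumptions for illustration:

import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class BatchingConfig {
    public static EntityManagerFactory create() {
        Map<String, String> props = new HashMap<>();
        // Group INSERT/UPDATE/DELETE statements into JDBC batches of up to 50
        // to reduce the number of round trips to the database.
        props.put("hibernate.jdbc.batch_size", "50");
        // Ordering statements by entity type tends to produce larger batches in practice.
        props.put("hibernate.order_inserts", "true");
        props.put("hibernate.order_updates", "true");
        return Persistence.createEntityManagerFactory("demoUnit", props);
    }
}

With batching enabled, flushing many modified entities travels to the database as a handful of JDBC batches instead of one statement per row.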
Another well-known optimization-related API is the prefetch API ("prefetch paths"). The idea behind it is to fetch a graph of objects that is expected to be processed further with a minimal number of queries (or, better, in minimal time). So this API addresses the "SELECT N+1" problem. Again, this is normally expected to be implemented in any serious ORM product.
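In JPA, one common way to express such a prefetch is a fetch join; here is a minimal sketch, where the PurchaseOrder and OrderLine entities are invented for illustration:

import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.OneToMany;

@Entity
class PurchaseOrder {                       // hypothetical entity for illustration
    @Id @GeneratedValue
    Long id;

    @OneToMany(fetch = FetchType.LAZY)      // lazy collection: loaded on first access
    List<OrderLine> lines;
}

@Entity
class OrderLine {                           // hypothetical child entity
    @Id @GeneratedValue
    Long id;
}

class OrderQueries {
    // Naive version: one query for the orders, then one extra query per order
    // as soon as order.lines is touched -> the SELECT N+1 pattern.
    static List<PurchaseOrder> loadNaively(EntityManager em) {
        return em.createQuery("select o from PurchaseOrder o", PurchaseOrder.class)
                 .getResultList();
    }

    // Prefetching version: the collection is fetched in the same statement,
    // so the whole graph arrives in a single round trip.
    static List<PurchaseOrder> loadWithLines(EntityManager em) {
        return em.createQuery(
                "select distinct o from PurchaseOrder o join fetch o.lines",
                PurchaseOrder.class)
            .getResultList();
    }
}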
All the above optimizations are safe from the point of view of transaction isolation - i.e. they don't break it. Caching-related optimizations normally aren't safe in this respect: you must carefully configure caching to ensure you won't get stale objects when getting up-to-date content is important (e.g. on security checks or in some real-time interaction). There are lots of techniques here, starting from the usage of built-in caches and finishing with integration with distributed caches (memcached, etc.). Any approach that solves the problem is good here; personally I would expect an open API allowing integration with any cache I prefer.
P.S. I'm a .NET fanboy, as well as one of DataObjects.Net and ORMeter.NET developers. So I don't know how exactly similar features are implemented in Java, but I'm familiar with the range of available solutions.
How about N+1 queries for collections? For example, see here: ORM Select n + 1 performance; join or no join
Regarding Spring and Spring MVC, you might find this interesting.
It's in C++, not Java, but it shows how to reduce UI source code w.r.t. Spring by an order of magnitude.
Lazy loading via proxies is probably one of the killer features of ORMs.
Additionally, Hibernate is also able to proxy requests like object.collection.count and optimize them, so instead of retrieving the whole collection, only a SELECT COUNT(*) is issued.
You mentioned the DAO pattern, but many in the JPA camp are saying the pattern is dead (I think the Hibernate guys have blogged on this, but I can't remember the link). Have a look at Spring Roo to see how they add the ORM-related functionality directly to the domain model via static methods.
For a set of JPA optimizations, and their resulting improvements, see,
http://java-persistence-performance.blogspot.com/2011/06/how-to-improve-jpa-performance-by-1825.html
My Question:
Performance tests are generally done after an application is integrated with its various modules and is ready for deployment.
Is there any way to identify performance bottlenecks during the development phase? Does code analysis give any hints about performance?
It all depends on the rules that you run during code analysis, but I don't think that you can prevent performance bottlenecks just by code analysis.
From my experience, it looks like performance problems are usually quite complicated, and to find the real problems you have to run performance tests.
No, except in very minor cases (e.g. for Java, use a StringBuilder in a loop rather than repeated string concatenation; a small sketch follows below).
The reason is that you won't know how a particular piece of code will affect the application as a whole, until you're running the whole application with relevant dataset.
For example: changing bubblesort to quicksort wouldn't significantly affect your application if you're consistently sorting lists of a half-dozen elements. Or if you're running the sort once, in the middle of the night, and it doesn't delay other processing.
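To make the StringBuilder case above concrete, here is a tiny sketch (the class and method names are made up):

public class Concat {
    // Each += creates a new String and copies the old contents,
    // so this is O(n^2) in the total length.
    static String joinSlow(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // StringBuilder appends into a growing buffer, which stays roughly O(n).
    static String joinFast(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }
}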
If we are talking .NET, then yes and no... FxCop (or built-in code analysis) has a number of rules in it that deal with performance concerns. However, this list is fairly short and limited in nature.
Having said that, there is no reason that FxCop could not be extended with a lot more rules (heuristic or otherwise) that catch potential problem areas and flag them. It's simply a fact that nobody (that I know of) has put significant work into this (yet).
Generally, no, although from experience I can look at a system I've never seen before and recognize some design approaches that are prone to performance problems:
How big is it, in terms of lines of code, or number of classes? This correlates strongly with performance problems caused by over-design.
How many layers of abstraction are there? Each layer is a chance to spend more cycles than necessary, and this effect compounds, especially if each operation is perceived as being "pretty efficient".
Are there separate data structures that need to be kept in agreement? If so, how is this done? If there is an attempt, through notifications, to keep the data structures tightly in sync, that is a red flag.
Of the categories of input information to the system, does some of it change at low frequency? If so, chances are it should be "compiled" rather than "interpreted". This can be a huge win both in performance and ease of development.
A common motif is this: Programmer A creates functions that wrap complex operations, like DB access to collect a good chunk of information. Programmer A considers this very useful to other programmers, and expects these functions to be used with a certain respect, not casually. Programmer B appreciates these powerful functions and uses them a lot because they get so much done with only a single line of code. (Programmers B and A can be the same person.) You can see how this causes performance problems, especially if distributed over multiple layers.
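A hypothetical sketch of that motif, with CustomerDao and its methods invented for illustration - the convenient one-line call hides a database round trip, and a loop multiplies it:

import java.util.List;

interface CustomerDao {
    String getCustomerSummary(long id);                  // one DB round trip per call (assumed)
    List<String> getCustomerSummaries(List<Long> ids);   // one round trip for the whole batch (assumed)
}

class ReportBuilder {
    private final CustomerDao dao;

    ReportBuilder(CustomerDao dao) {
        this.dao = dao;
    }

    // Looks harmless: each iteration is "one line of code",
    // but every call to getCustomerSummary() pays a full round trip (N queries).
    String buildReportNaively(List<Long> customerIds) {
        StringBuilder report = new StringBuilder();
        for (Long id : customerIds) {
            report.append(dao.getCustomerSummary(id)).append('\n');
        }
        return report.toString();
    }

    // Same result, but the data is fetched once for the whole batch.
    String buildReportBatched(List<Long> customerIds) {
        StringBuilder report = new StringBuilder();
        for (String summary : dao.getCustomerSummaries(customerIds)) {
            report.append(summary).append('\n');
        }
        return report.toString();
    }
}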
Those are the first things that come to mind.