Custom allocator for LDLT - eigen

Is there a way to supply raw buffers for the LDLT class in the same way that raw buffers can be used for the Matrix classes (like https://eigen.tuxfamily.org/dox/group__TutorialMapClass.html)?
Even if I do an in-place LDLT on a mapped matrix, the transpositions are still allocated on the regular C++ heap.
The issue is that the application I'm working on requires a custom allocator.
If there's not a supported way of doing this, what might be the easiest way to work around this? Could I subclass LDLT and just take over the allocation of the different heap variables? That might be easier to maintain than trying to pull the data out of the LDLT class and reproducing the solve code elsewhere.
Thanks!
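For anyone weighing the fallback mentioned in the question (reproducing the factor-and-solve code outside the LDLT class), here is a minimal sketch of what that could look like using caller-supplied buffers only. It is not Eigen code and not Eigen's algorithm: Eigen's LDLT pivots, whereas this sketch does no pivoting and assumes a symmetric positive-definite matrix stored row-major. The names ldlt_in_place and ldlt_solve_in_place are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Factor a symmetric positive-definite n x n matrix in place as A = L * D * L^T.
// `a` is a caller-owned, row-major buffer. On return the strict lower triangle
// holds L (unit diagonal implied) and the diagonal holds D; the upper triangle
// is left untouched. No pivoting is done (an assumption of this sketch), so it
// returns false if it meets a zero pivot.
bool ldlt_in_place(double* a, std::size_t n) {
    assert(a != nullptr);
    for (std::size_t j = 0; j < n; ++j) {
        double d = a[j * n + j];
        for (std::size_t k = 0; k < j; ++k)
            d -= a[j * n + k] * a[j * n + k] * a[k * n + k];
        if (d == 0.0) return false;
        a[j * n + j] = d;
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = a[i * n + j];
            for (std::size_t k = 0; k < j; ++k)
                s -= a[i * n + k] * a[j * n + k] * a[k * n + k];
            a[i * n + j] = s / d;
        }
    }
    return true;
}

// Solve A x = b using the factors produced above; b is overwritten with x.
// No memory is allocated here either.
void ldlt_solve_in_place(const double* a, std::size_t n, double* b) {
    assert(a != nullptr && b != nullptr);
    // Forward substitution: L y = b
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < i; ++k)
            b[i] -= a[i * n + k] * b[k];
    // Diagonal scaling: D z = y
    for (std::size_t i = 0; i < n; ++i)
        b[i] /= a[i * n + i];
    // Back substitution: L^T x = z
    for (std::size_t i = n; i-- > 0;)
        for (std::size_t k = i + 1; k < n; ++k)
            b[i] -= a[k * n + i] * b[k];
}
```

With a = {4, 2, 2, 3} and b = {8, 8}, calling ldlt_in_place(a, 2) followed by ldlt_solve_in_place(a, 2, b) leaves b = {1, 2}. Both buffers can come from whatever allocator the application uses.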

Related

Can a Swift object have arbitrary inline storage in the instance?

For example, an immutable CFString can store the length and the character data in the same block of memory. And, more generally, there is NSAllocateObject(), which lets you specify extra bytes to be allocated after the object’s ivars. The amount of storage is determined by the particular instance rather than being fixed by the class. This reduces memory use (one allocation instead of two) and improves locality of reference. Is there a way to do this with Swift?
A rather late reply. 😄 NSAllocateObject() is now deprecated for some reason. However, NSAllocateObject() is really a wrapper around class_createInstance, which is not deprecated. So, in principle, one could use this to allocate extra bytes for an object instance.
I can't see why this wouldn't work in Swift. But accessing the extra storage would be messy because you'd have to start fooling around with unsafe pointers and the like. Moreover, if you're not the author of the original class, then you risk conflicting with Apple's ivars, especially in cases where you might be dealing with a class cluster which could potentially have a number of different instance sizes, according to the specific concrete implementation.
I think a safer approach would be to make use of objc_setAssociatedObject and objc_getAssociatedObject, which are accessible in Swift. E.g. Is there a way to set associated objects in Swift?
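To make the layout from the question concrete (length and character data in one block, sized per instance), here is a rough sketch of the same idea in C++ rather than Swift or Objective-C. InlineString is a hypothetical type invented for this example, and reaching the trailing bytes through a pointer just past the header is a common low-level idiom rather than something the language standard blesses in every detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical string type: the header and the characters share one heap block.
struct InlineString {
    std::size_t length;  // number of characters, excluding the terminator

    // The character data starts right after the header, in the same block.
    char* data() { return reinterpret_cast<char*>(this + 1); }
    const char* data() const { return reinterpret_cast<const char*>(this + 1); }

    // One allocation covers the header plus len + 1 trailing bytes.
    static InlineString* create(const char* text) {
        assert(text != nullptr);
        const std::size_t len = std::strlen(text);
        void* block = std::malloc(sizeof(InlineString) + len + 1);
        if (block == nullptr) throw std::bad_alloc();
        InlineString* s = new (block) InlineString{len};
        std::memcpy(s->data(), text, len + 1);  // copies the '\0' too
        return s;
    }

    // Instances must be released through destroy(), never with delete.
    static void destroy(InlineString* s) {
        if (s == nullptr) return;
        s->~InlineString();
        std::free(s);
    }
};
```

The amount of trailing storage is decided in create(), so each instance can have a different size while still costing a single allocation.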

Is there a fast way to use Clojure vectors as matrices?

I am trying to use Clojure to process images and I would like to represent images using Clojure data structures. Basically, my first approach was using a vector of vectors and mapv to operate over each pixel value and return a new image representation with the same data structure. However, some basic operations are taking too much time.
Using the JVisualVM profiler, I got the results shown below. Could somebody give me a tip to improve the performance? I can give more details if necessary, but maybe someone can make a good guess just by looking at the costs of seq and next.
You should check out core.matrix and associated libraries for anything to do with matrix computation. core.matrix is a general purpose Clojure API for matrix computation, supporting multiple back-end implementations.
Clojure's persistent data structures are great for most purposes, but are really not suited for fast processing of large matrices. The main problems are:
Immutability: usually a good thing, but can be a killer for low level code where you need to do things like accumulate results in a mutable array for performance reasons.
Boxing: Clojure data structures generally box results (as java.lang.Double etc.), which adds a lot of overhead compared to using primitives.
Sequences: traversing most Clojure data structures as sequences involves the creation of temporary heap objects to hold the sequence elements. Normally not a problem, but when you are dealing with large matrices it becomes problematic.
The related libraries that you may want to look at are:
vectorz-clj : a very fast matrix library that works as a complete core.matrix implementation. The underlying code is pure Java but with a nice Clojure wrapper. I believe it is currently the fastest way of doing general purpose matrix computation in Clojure without resorting to native code. Under the hood, it uses arrays of Java primitives, but you don't need to deal with this directly.
Clatrix: another fast matrix library for Clojure which is also a core.matrix implementation. Uses JBLAS under the hood.
image-matrix : represents a Java BufferedImage as a core.matrix implementation, so you can perform matrix operations on images. A bit experimental right now, but should work for basic use cases
Clisk : a library for procedural image processing. Not so much a matrix library itself, but very useful for creating and manipulating digital images using a Clojure-based DSL.
Depending on what you want to do, the best approach may be to use image-matrix to convert the images into vectorz-clj matrices and do your processing there. Alternatively, Clisk might be able to do what you want out of the box (it has a lot of ready-made filters / distortion effects etc.)
Disclaimer: I'm the lead developer for most of the above libraries. But I'm using them all myself for serious work, so very willing to vouch for their usefulness and help fix any issues you find.
I really think that you should use arrays of primitives for this. Clojure has array support built-in, even though it's not highlighted, and it's for cases just like this, where you have a high volume of numerical data.
Any other approach (vectors, even Java collections) will result in all of your numbers being boxed individually, which is very wasteful. Arrays of primitives (int, double, byte, whatever is appropriate) don't have this problem, and that's why they're there. People feel shy about using arrays in Clojure, but they're there for a reason, and this is it. And it'll be good portable Clojure code -- int-array works in both JVM Clojure and ClojureScript.
Try arrays and benchmark.
Clojure's transients offer a middle ground between full persistence and no persistence at all, like you would get with a standard Java array. This allows you to build the image using fast mutate-in-place operations (which are limited to the current thread) and then call persistent! to convert it in constant time to a proper persistent structure for manipulation in the rest of the program.
It looks like you are also seeing a lot of overhead from working with sequences over the contents of the image. If transients don't make enough of a difference, you may next want to consider using normal Java arrays and structuring the code to access the array elements directly.
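The "flat array of primitives with direct element access" idea from the last two answers is not specific to Clojure. As a language-neutral illustration, here is a small sketch in C++ (not Clojure), assuming an 8-bit grayscale image stored row-major. GrayImage and invert are hypothetical names, not part of any library mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical grayscale image: one contiguous buffer of unboxed bytes
// instead of a vector of vectors of boxed numbers.
struct GrayImage {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> pixels;  // row-major, width * height entries

    GrayImage(std::size_t w, std::size_t h)
        : width(w), height(h), pixels(w * h, 0) {}

    // Direct indexed access; no intermediate sequence objects are created.
    std::uint8_t& at(std::size_t x, std::size_t y) {
        assert(x < width && y < height);
        return pixels[y * width + x];
    }
};

// Invert every pixel in place with a single pass over the flat buffer.
void invert(GrayImage& img) {
    for (std::uint8_t& p : img.pixels)
        p = static_cast<std::uint8_t>(255 - p);
}
```

In Clojure the analogous structure would be a primitive array plus the same y * width + x index arithmetic; whether that is fast enough is something to benchmark, as suggested above.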

Java code ported to Objective-C is very slow

I want to illustrate a concrete example to understand if there are best (and worst) practices when java code is rewritten in Objective-C.
I've ported the Java implementation of org.apache.wicket.util.diff.myers to Objective-C on OSX Snow Leopard (Xcode 4) but the result runs very slowly compared to the Java version.
The method with the worst performance is buildPath. It mainly does:
sparse array access (the diagonal variable; this array is allocated inside the method and isn't returned)
random array access (the orig and rev variables)
allocation of PathNode and its subclasses (an object with three properties, only one of which is an element used internally by the array)
string comparison
Cocoa doesn't have any collection class that works easily with sparse arrays, so I've allocated an array with malloc. This dramatically improved on the first version, which was based on NSDictionary and a zillion NSNumber objects allocated to be used as keys.
The PathNode allocation is done using the normal syntax [[MyClass alloc] init]. The nodes aren't autoreleased because they are added to an NSMutableArray (but they are released immediately after being added to the array).
Random access to the arrays is done using [NSArray objectAtIndex:index]. I think (but I may be wrong) that moving to a C-style array wouldn't speed things up much.
Do you have any ideas for improving performance, or for where the bottlenecks might be?
Using Instruments, 74% of the time is spent on allocation. How can I improve allocation?
EDIT: I've submitted my actual implementation to GitHub. Obviously it is an alpha version, not ready for production, and it doesn't use any efficient Objective-C constructs.
You're off to an excellent start. You've profiled the code, isolated the actual bottleneck, and are now focused on how to address it.
The first question is which allocation is costly? Obviously you should focus on that one first.
There are several efficient ways to deal with sparse arrays. First, look at NSPointerArray, which is designed to hold NULL values. It does not promise to be efficient for sparse arrays, but @bbum (who knows such things) suggests it is.
Next, look at NSMapTable, which is certainly efficient for sparse collections (it's a dictionary), and supports non-object keys (i.e. you don't need to create an NSNumber).
Beyond that, if allocation really is your problem, there are various tricks to work around it. The most common is to reuse objects rather than destroying one and creating another. This is how UITableViewCell works (and NSCell in a different way).
Finally, if you switch to Core Foundation objects, you can create your own specialized memory allocator, but that really is a last resort.
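The reuse trick can be sketched as a small free-list pool. This is written in C++ rather than Objective-C, and it is only an illustration of the pattern: PathNode here is a stand-in with guessed fields (i, j, prev), and PathNodePool is a hypothetical class, not something from the asker's code or from Cocoa.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for the PathNode from the question; the fields are assumptions.
struct PathNode {
    int i = 0;
    int j = 0;
    PathNode* prev = nullptr;
};

// Minimal free-list pool: released nodes are kept and handed out again,
// so a long diff run does not hit the allocator once per node.
class PathNodePool {
public:
    PathNode* acquire(int i, int j, PathNode* prev) {
        PathNode* node = nullptr;
        if (!free_.empty()) {
            node = free_.back();  // reuse a previously released node
            free_.pop_back();
        } else {
            storage_.push_back(std::make_unique<PathNode>());
            node = storage_.back().get();
        }
        node->i = i;
        node->j = j;
        node->prev = prev;
        return node;
    }

    // The caller must not use `node` after releasing it.
    void release(PathNode* node) {
        assert(node != nullptr);
        free_.push_back(node);
    }

    // Number of nodes actually allocated so far.
    std::size_t allocated() const { return storage_.size(); }

private:
    std::vector<std::unique_ptr<PathNode>> storage_;  // owns every node
    std::vector<PathNode*> free_;                     // nodes ready for reuse
};
```

All nodes are freed together when the pool is destroyed, which matches a structure that only lives for the duration of one buildPath call.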
Note that 10.6 supports ARC (without zeroing weak references). ARC dramatically improves performance around a lot of common memory management patterns. For example, the very common pattern of "retain+autorelease+return" is highly optimized under ARC. ("retain" doesn't exist in the language in ARC, but it does still exist in the compiler, and ARC is much faster than doing it by hand.) I highly recommend switching to ARC in any code you can.
You can use the NSPointerArray class as a replacement for your sparse array. NSPointerArray allows null elements.
If you post the code that's generating the bulk of your allocations, we might be able to help you more.

Overhead of memory allocator

I've been creating a temporary object stack, mainly for heap-based STL structures that only have temporary lifetimes, but also for any other temporary, dynamically sized allocation. A single stack handles all types, storing them in an unrolled linked list.
I've come a cropper with alignment. I can get the alignment with std::alignment_of<T>, but this isn't really great, because I need the alignment of the next type I want to allocate. Right now, I've just arbitrarily sized each object at a multiple of 16, which as far as I know, is the maximal alignment for any x86 or x64 type. But now, I'm having two pointers of memory overhead per object, as well as the cost of allocating them in my vector, plus the cost of making every size round up to a multiple of 16.
On the plus side, construction and destruction is fast and reliable.
How does this compare to regular operator new/delete? And, what kind of test suites can I run? I'm pretty pleased with my current progress and don't want to find out later that it's bugged in some nasty subtle fashion, so any advice on testing the operations would be nice.
This doesn't really answer your question, but Boost has just recently added a memory pool library in the most recent version.
It may not be exactly what you want, but there is a thorough treatment of alignment which might spark an idea? If the docs are not enough, there is always the source code.
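On the alignment problem from the question: one common alternative to rounding every size up to 16 is to align the bump pointer at the moment of each allocation, when the type being allocated (and so its alignment) is known. Below is a minimal sketch assuming a fixed-size, caller-provided buffer. BumpArena is a hypothetical class for illustration, not the asker's stack and not Boost's pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <utility>

// Hypothetical fixed-capacity bump arena over caller-provided memory.
class BumpArena {
public:
    BumpArena(void* buffer, std::size_t size)
        : base_(static_cast<unsigned char*>(buffer)), size_(size), used_(0) {}

    // Returns storage aligned to `align` (a power of two), or nullptr
    // when the arena is exhausted. Padding is added only when needed.
    void* allocate(std::size_t bytes, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::uintptr_t current =
            reinterpret_cast<std::uintptr_t>(base_) + used_;
        const std::uintptr_t aligned =
            (current + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        const std::size_t padding = static_cast<std::size_t>(aligned - current);
        if (padding > size_ - used_ || bytes > size_ - used_ - padding)
            return nullptr;
        used_ += padding + bytes;
        return reinterpret_cast<void*>(aligned);
    }

    // Constructs a T in the arena using alignof(T). Destructors are not run
    // by the arena; that remains the caller's responsibility.
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        void* p = allocate(sizeof(T), alignof(T));
        return p ? new (p) T(std::forward<Args>(args)...) : nullptr;
    }

    std::size_t used() const { return used_; }

private:
    unsigned char* base_;
    std::size_t size_;
    std::size_t used_;
};
```

A char followed by a double then costs one byte plus only the padding needed to reach the double's alignment, rather than two 16-byte slots. How this compares with operator new/delete in practice would still need measuring.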

Gc using type information

Does anyone know of a GC algorithm which utilises type information to allow incremental collection, optimised collection, parallel collection, or some other nice feature?
By type information, I mean real semantics. Let me give an example: suppose we have an OO style class with methods to maintain a list which hide the representation. When the object becomes unreachable, the collector can just run down the list deleting all the nodes. It knows they're all unreachable now, because of encapsulation. It also knows there's no need to do a general scan of the nodes for pointers, because it knows all the nodes are the same type.
Obviously, this is a special case and easily handled with destructors in C++. The real question is whether there is a way to analyse the types used in a program, and direct the collector to use the resulting information to advantage. I guess you'd call this a type directed garbage collector.
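For reference, the special case described above looks roughly like this when written by hand with a C++ destructor. IntList is a hypothetical class made up for this sketch, and the live_nodes counter exists only so the effect can be observed.

```cpp
#include <cassert>

// Hypothetical encapsulated list. Nodes are never exposed, so when the list
// dies its nodes are known to be unreachable and can be freed without any
// general pointer scan: every node has the same type and layout.
class IntList {
public:
    IntList() = default;
    IntList(const IntList&) = delete;             // keep ownership unique
    IntList& operator=(const IntList&) = delete;

    ~IntList() {
        // "Run down the list deleting all the nodes."
        while (head_ != nullptr) {
            Node* next = head_->next;
            delete head_;
            head_ = next;
            --live_nodes();
        }
    }

    void push_front(int value) {
        head_ = new Node{value, head_};
        ++live_nodes();
    }

    // Count of nodes currently allocated across all lists (demo only).
    static int& live_nodes() {
        static int count = 0;
        return count;
    }

private:
    struct Node {
        int value;
        Node* next;
    };
    Node* head_ = nullptr;
};
```

The question is whether a collector could derive this "owner frees everything" knowledge from the types automatically instead of having it spelled out in a destructor.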
The idea of at least exploiting containers for garbage collection in some way is not new, though in Java, you cannot generally assume that a container holds the only reference to objects within it, so your approach will not work in that context.
Here are a couple of references. One is for leak detection, and the other (from my research group) is about improving cache locality.
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4814126
http://www.cs.umass.edu/~emery/pubs/06-06.pdf
You might want to visit Richard Jones's extensive garbage collection bibliography for more references, or ask the folks on gc-list.
I don't think it has anything to do with a specific algorithm.
When the GC computes the graph of object relationships, the information that a Collection object is solely responsible for the elements of the list is implicitly present in the graph, if the compiler was good enough to extract it.
Whatever GC algorithm is chosen, the benefit depends more on how the compiler/runtime extracts this information.
Also, I would avoid C and C++ with GC. Because of pointer arithmetic, aliasing and the possibility to point within an object (reference on a data member or in an array), it's incredibly hard to perform accurate garbage collection in these languages. They have not been crafted for it.

Resources