For example, an immutable CFString can store the length and the character data in the same block of memory. And, more generally, there is NSAllocateObject(), which lets you specify extra bytes to be allocated after the object’s ivars. The amount of storage is determined by the particular instance rather than being fixed by the class. This reduces memory use (one allocation instead of two) and improves locality of reference. Is there a way to do this with Swift?
A rather late reply. 😄 NSAllocateObject() is now deprecated, for some reason. However, NSAllocateObject() is really a wrapper around class_createInstance(), which is not deprecated. So, in principle, one could use that to allocate extra bytes for an object instance.
I can't see why this wouldn't work in Swift. But accessing the extra storage would be messy, because you'd have to start fooling around with unsafe pointers and the like. Moreover, if you're not the author of the original class, you risk conflicting with Apple's ivars, especially if you're dealing with a class cluster, which could have a number of different instance sizes depending on the specific concrete implementation.
I think a safer approach would be to make use of objc_setAssociatedObject and objc_getAssociatedObject, which are accessible from Swift. See, e.g., Is there a way to set associated objects in Swift?
Would it be possible for a programming language to consistently have Rust-style ownership and lifetimes (for automatic memory management) while dropping the requirement that only one mutable reference to a piece of data can exist at any time (used to suppress data races)?
In other words, are Rust-style ownership and lifetimes and Rust-style borrow checking two separable concepts? Alternatively, are these two ideas inherently entangled at a semantic level?
Would it be possible for a programming language to consistently have Rust-style ownership and lifetimes (for automatic memory management) while dropping the requirement that only one mutable reference to a piece of data can exist at any time (used to suppress data races)?
A language can do anything, so sure.
The problem is that dropping this requirement would be a UB-nest in a language like Rust. If mutable references no longer have to be unique, they have no purpose, so you just have references (always mutable) whose only property is being lexically scoped. That means you can hold a reference to a sub-part of an object while a second reference mutates the object in a way that invalidates the sub-part (e.g. a reference to a vec item while clearing the vec [0]), and the first reference is now dangling: it points to garbage.
The way to solve that would be to… add a GC? And from that point on the value of "rust-style ownership and references" becomes… limited to nonexistent: you need a GC for non-lexical automated memory management anyway, and your references can keep objects alive, so having all types be affine by default isn't very useful.
Now what can be useful (and what some languages explore) is for sub-normal types to be opt-in, so types would be normal by default but could be opted into being affine, linear, or even ordered, on a needs basis. This would be solely a type safety measure.
If so, are there any existing languages which achieve this?
Not to my knowledge.
If not, why not?
Because nobody's written one? Affine types by default are useful to Rust, but they're not super useful in general, so most of the research and design has focused on linear types, which provide more guarantees and are therefore more useful if only a small subset of your types are going to be sub-normal.
[0] Which shows that "data races" are not solely about concurrency; it's an entire class of problems which occur commonly in sequential code (e.g. iterator invalidation).
This page has been quite confusing for me.
It says:
Memory management in newLISP does not rely on a garbage collection algorithm. Memory is not marked or reference-counted. Instead, a decision whether to delete a newly created memory object is made right after the memory object is created.
newLISP follows a one reference only (ORO) rule. Every memory object not referenced by a symbol is obsolete once newLISP reaches a higher evaluation level during expression evaluation. Objects in newLISP (excluding symbols and contexts) are passed by value copy to other user-defined functions. As a result, each newLISP object only requires one reference.
Further down, I see:
All lists, arrays and strings are passed in and out of built-in functions by reference.
I can't make sense of these two.
How can newLISP "not rely on a garbage collection algorithm", and yet pass things by reference?
For example, what would it do in the case of circular references?!
Is it even possible for a LISP to not use garbage collection, without making performance go down the drain? (I assume you could always pass things by value, or you could always perform a full-heap scan whenever you think it might be necessary, but then it seems to me like that would insanely hurt your performance.)
If so, how would it deal with circular references? If not, what do they mean?
Perhaps reading http://www.newlisp.org/ExpressionEvaluation.html helps in understanding the http://www.newlisp.org/MemoryManagement.html paper better. Regarding circular references: they do not exist in newLISP; there is no way to create them. The performance question is addressed in a sub-chapter of that memory management paper and here: http://www.newlisp.org/benchmarks/
Maybe working and experimenting with newLISP - i.e. trying to create a circular reference - will clear up most of the questions.
I want to illustrate a concrete example to understand if there are best (and worst) practices when java code is rewritten in Objective-C.
I've ported the Java implementation of org.apache.wicket.util.diff.myers to Objective-C on OSX Snow Leopard (Xcode 4) but the result runs very slowly compared to the Java version.
The method with the worst performance is buildPath; it mainly does:
sparse array access (the diagonal variable; this array is allocated inside the method and isn't returned)
random array access (the orig and rev variables)
allocation of PathNode and its subclasses (an object with three properties; the only property is an element used internally by the array)
string comparisons
Cocoa doesn't have any collection class that makes it easy to work with sparse arrays, so I've allocated an array with malloc; this dramatically improved on the first version, which was based on NSDictionary and zillions of NSNumber objects allocated just to be used as keys.
The PathNode allocations are done using the normal [[MyClass alloc] init] syntax; they aren't autoreleased because they are added to an NSMutableArray (and are released immediately after being added to the array).
Random access to the array is done using [NSArray objectAtIndex:index]; I think (but I could be wrong) that moving it to a C-like array wouldn't speed things up much.
Do you have any ideas for improving performance, or where the bottlenecks can be found?
Using Instruments, 74% of the time is spent on allocation. How can I improve allocation?
EDIT: I've submitted my actual implementation to GitHub; obviously it's an alpha version, not ready for production, and doesn't use any efficient Objective-C constructs.
You're off to an excellent start. You've profiled the code, isolated the actual bottleneck, and are now focused on how to address it.
The first question is: which allocation is costly? Obviously you should focus on that one first.
There are several efficient ways to deal with sparse arrays. First, look at NSPointerArray, which is designed to hold NULL values. It does not promise to be efficient for sparse arrays, but bbum (who knows such things) suggests it is.
Next, look at NSMapTable, which is certainly efficient for sparse collections (it's a dictionary) and supports non-object keys (i.e. you don't need to create an NSNumber).
If allocation really is your problem, there are various tricks to work around it. The most common is to reuse objects rather than destroying one and creating another. This is how UITableViewCell works (and NSCell, in a different way).
Finally, if you switch to Core Foundation objects, you can create your own specialized memory allocator, but that really is a last resort.
Note that 10.6 supports ARC (without zeroing weak references). ARC dramatically improves performance around a lot of common memory-management patterns. For example, the very common "retain+autorelease+return" pattern is highly optimized under ARC. ("retain" doesn't exist in the language under ARC, but it does still exist in the compiler, and ARC is much faster than doing it by hand.) I highly recommend switching to ARC in any code you can.
You can use the NSPointerArray class as a replacement for your sparse array. NSPointerArray allows null elements.
If you post the code that's generating the bulk of your allocations, we might be able to help you more.
Does anyone know of a GC algorithm which utilises type information to allow incremental collection, optimised collection, parallel collection, or some other nice feature?
By type information, I mean real semantics. Let me give an example: suppose we have an OO style class with methods to maintain a list which hide the representation. When the object becomes unreachable, the collector can just run down the list deleting all the nodes. It knows they're all unreachable now, because of encapsulation. It also knows there's no need to do a general scan of the nodes for pointers, because it knows all the nodes are the same type.
Obviously, this is a special case and easily handled with destructors in C++. The real question is whether there is way to analyse types used in a program, and direct the collector to use the resulting information to advantage. I guess you'd call this a type directed garbage collector.
The idea of at least exploiting containers for garbage collection in some way is not new, though in Java, you cannot generally assume that a container holds the only reference to objects within it, so your approach will not work in that context.
Here are a couple of references. One is for leak detection, and the other (from my research group) is about improving cache locality.
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4814126
http://www.cs.umass.edu/~emery/pubs/06-06.pdf
You might want to visit Richard Jones's extensive garbage collection bibliography for more references, or ask the folks on gc-list.
I don't think it has anything to do with a specific algorithm.
When the GC computes the object-relationship graph, the information that a Collection object is solely responsible for the elements of its list is implicitly present in the graph, provided the compiler was good enough to extract it.
Whatever GC algorithm is chosen, the information depends more on how the compiler/runtime extracts it.
Also, I would avoid C and C++ with a GC. Because of pointer arithmetic, aliasing, and the possibility of pointing within an object (a reference to a data member or into an array), it's incredibly hard to perform accurate garbage collection in these languages. They were not crafted for it.
Suppose one has a big structure with lots of member variables.
Some function needs to access 4-5 elements of the structure to do its work, so which of the scenarios below would be more cache-effective (fewer cache misses)?
1.) Pass a pointer to the structure as an argument to the function, which in turn will access the needed elements. (Assume that the elements are not contiguous in the structure declaration and are far apart.)
2.) Pass the individual structure members as arguments to the function.
Does this choice affect the performance of the code from a cache perspective in the first place?
Thanks.
-AD
Ignoring cache issues, passing a pointer will always be fastest, as there is no overhead of copying the interesting fields.
Well ... If the members accessed are many cache lines apart, then it would probably help to get them all collected (on the stack, or even in registers if possible) as arguments, if the function does many accesses. If not, the extra overhead of reading out the arguments and setting up the call might eat up the benefit.
I think this is a micro-optimization, and that you should profile both cases, and then document any change to the code that you do as a result of said profiling (since it won't be obvious to the casual observer, later on).
A memory access is a memory access. It doesn't matter whether it happens in the caller or the callee. Ignoring the cache, there are several reasons to pass a pointer (pass by reference).
Separation of concerns dictates that the callee should decide what it wants to access.
Passing more parameters may increase pressure on the register file and/or cause more accesses to the stack.
Passing a single argument is more readable than several. (Arguably related to separation of concerns.)
The only way to improve cache performance is to improve locality. Arrange the variables to be consecutive in the struct (or whatever) definition. Arrange the algorithm to access each structure only once. If these aren't simple changes to make, and the program is cache-bound, then getting good performance will just take that much programming effort.