Serializing objects as raw memory - performance

It seems that if it were possible to serialize objects as the raw chunks of memory their properties and fields occupy, then it ought to be that much faster to communicate these objects to another system: the receiving side would only have to allocate memory for those bytes and fix up the reference pointers to point where they should.
Yes, I know that's a little oversimplified, and there are probably a plethora of reasons why it's difficult to do (circular references, for example). But I'm wondering if anyone has tried it, and whether there is a way to do it, perhaps with objects that meet certain restrictions.
On the one hand this is probably me just trying to micro-optimize, but on the other hand it really does seem like this could be useful in certain scenarios where performance is vital.

Obviously this kind of serialization is going to be faster than JSON any day (XML is slow by definition. In fact, I think that's what the L stands for. It was supposed to be XMS, but because it's so slow they missed the S and ended up with an L). However, I doubt it would beat efficient binary serializations such as Google's Protocol Buffers in real world scenarios.
If your serialized entities hold no references to other entities, and your memory layout on the two sides is exactly the same (same alignment, same order, etc...), you'll earn a little bit of performance by copying the memory buffer once, instead of doing so in chunks. However, the second you have to reconstruct references, memory copying is going to be trivial compared to looking up the referenced object. Copying memory is fast, especially when done in order, minimizing cache misses.
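As a concrete sketch of that reference-free case (the Sample type and helper names are made up for illustration, not from the answer): a trivially copyable struct can be shipped as one flat byte copy, provided both sides agree on layout, padding, and byte order.

#include <cstring>
#include <cstdint>
#include <vector>
#include <type_traits>

// A fixed-layout, reference-free record: no pointers, no virtual
// functions, so its bytes can be copied verbatim.
struct Sample {
    std::uint32_t id;
    float         x, y, z;
};
static_assert(std::is_trivially_copyable<Sample>::value,
              "raw-byte copy is only safe for trivially copyable types");

std::vector<unsigned char> serialize(const Sample& s) {
    std::vector<unsigned char> buf(sizeof(Sample));
    std::memcpy(buf.data(), &s, sizeof(Sample));  // one flat copy
    return buf;
}

Sample deserialize(const unsigned char* bytes) {
    Sample s;
    std::memcpy(&s, bytes, sizeof(Sample));       // one flat copy back
    return s;
}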

Things like raw memory addresses will completely break across serialization and deserialization. However, if you're clever and careful, you could devise a mechanism by which such a data structure is serialized anyway. Maybe translate addresses to offsets in bytes from the buffer's base?
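A minimal sketch of that offset idea (FlatNode is a hypothetical type, not from the answer): the writer stores each link as the node's distance from the buffer base, and the reader adds its own base address back.

#include <cstdint>

// A node whose link is stored as a byte offset from the buffer base
// (0 meaning "no next node"), so the buffer can be loaded at any address.
struct FlatNode {
    std::int32_t  value;
    std::uint32_t next_offset;  // writer stores (next_address - base)
};

// Follow a link after the buffer has been loaded at `base`.
inline FlatNode* next(unsigned char* base, const FlatNode& n) {
    return n.next_offset
        ? reinterpret_cast<FlatNode*>(base + n.next_offset)
        : nullptr;
}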

Related

Performance loss through obfuscation?

I’ve read that good obfuscation techniques don’t merely replace method names with something obscure, but also, for instance, replace strings in the source code with byte arrays and add methods to convert those back to the original strings.
This might be one of those questions leading to opinion-based answers, but I’m going to ask it anyway: is there any general notion of how much performance loss an application suffers when such an obfuscation method is applied? I have in mind software that leans heavily on a database, i.e., queries exist in the code as, for instance, C# strings or StringBuilder entities.
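For a concrete picture of the technique described above, here is a hypothetical sketch in C++ (the key, bytes, and function are invented; real obfuscators vary): the literal is stored as a byte array and decoded on every access.

#include <string>
#include <cstddef>

// "SELECT" with every byte XOR'd against the key 0x5A at build time.
static const unsigned char kQuery[] = {0x09, 0x1F, 0x16, 0x1F, 0x19, 0x0E};

// This loop runs on every access; it replaces what would otherwise be
// a direct reference to a string literal.
std::string deobfuscate(const unsigned char* data, std::size_t len) {
    std::string out(len, '\0');
    for (std::size_t i = 0; i < len; ++i)
        out[i] = static_cast<char>(data[i] ^ 0x5A);
    return out;
}
// usage: std::string q = deobfuscate(kQuery, sizeof kQuery);  // "SELECT"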
Yes, string obfuscation has a significant performance impact, at the micro-level. With obfuscation, instead of a direct memory lookup you have code that has to execute (every time), and it is usually somewhat complicated, so it is necessarily much worse at the micro-performance level.
However, that cost usually doesn't matter; the time required for the database call (or showing the UI dialog, or sending the error to a log, or network traffic, or ...) is going to be orders of magnitude higher than the cost of converting the string. In most cases, the cost of the conversion is essentially invisible.
As with everything, careful testing is wise, but usually the costs are only "visible" if you are accessing obfuscated strings in a tight loop that is already CPU-performance-sensitive.

Modern data structures

I just realized all the data structures I regularly use are really old and really simple. Linked lists, hash tables, trees, and even the more complex variants such as VLists or RBTrees are all pretty old inventions.
Most of them were conceived for a serial, single CPU world and require adapting to work in parallel environments.
What kind of newer, better data structures do we have? Why are they not widely used?
I understand using a plain old linked list if you have to implement it yourself and prefer the simplicity, but with a huge STL and piles of third-party libraries like Guava or Boost at hand, why am I still placing locks around hashes?
Don't we have potentially standard, hard-proven modern data structures that can actually replace the trusty old-timers?
There is nothing wrong with the old ones. A good way to keep flexibility is to separate concerns: normal (old-style) data structures are concerned with how data is stored. Locking is a completely different concern, which should not be part of the data structure.
Locking is a potentially expensive operation, so if you can, you should lock multiple structures at once to optimize your code. That is, lock critical sections, not data structures, as sketched below. If you add locking directly to your data structures, you lose the possibility to optimize this way, and you introduce implicit synchronisation points that you may not want and cannot control.
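As an illustrative sketch (the types and names here are made up): one caller-owned mutex covers an update that touches two plain, unsynchronised containers, which is exactly the optimization an internally locking structure would forbid.

#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

std::mutex section_lock;                          // guards BOTH containers below
std::vector<std::string> names;                   // plain, unsynchronised
std::unordered_map<std::string, int> name_index;  // plain, unsynchronised

void add(const std::string& name) {
    std::lock_guard<std::mutex> guard(section_lock);  // one lock per section
    names.push_back(name);
    name_index[name] = static_cast<int>(names.size() - 1);
}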
This leaves a different aspect of your question unanswered: why do we need locking at all? The answer is that sometimes there is just no way around it. You either need a lock somewhere, rely completely on atomic operations, or disallow mutation altogether.
The first option, baking the locks into the data structures themselves, is not ideal, as I have pointed out above: you lose potential for optimization and you get implicit synchronisation points.
Only using atomic operations in your data structure (i.e., non-locking structures) is still an open research area, and might not always be possible. I know of some non-locking structures (queues, lists, and so on), but I have never heard of a non-locking tree. Non-locking structures also tend to be much more complicated and slower, so we still need better structures for thread-local data, and can only add these to our data-structure zoo.
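For a taste of what "only atomic operations" means, here is a sketch of the push half of a Treiber stack, one of the classic non-locking structures (illustrative; a production version also needs a safe pop, which is where the real difficulty lies):

#include <atomic>

// Non-locking stack: push retries a compare-and-swap instead of
// taking a lock.
template <typename T>
class LockFreeStack {
    struct Node { T value; Node* next; };
    std::atomic<Node*> head{nullptr};
public:
    void push(T v) {
        Node* n = new Node{std::move(v), head.load()};
        // If another thread moved head in the meantime,
        // compare_exchange_weak reloads it into n->next and we retry.
        while (!head.compare_exchange_weak(n->next, n)) {}
    }
    // pop() is omitted: solving its ABA and memory-reclamation problems
    // is exactly why these structures are "much more complicated".
};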
Not having mutable data structures at all is, in my opinion, the best option. Mutability is often more of a hassle than it is worth. However, this is a concept from functional programming and only makes sense in such an environment. Functional programming is still regarded as an esoteric concept by many programmers, and most languages used in production work mainly with non-functional concepts (which does not mean functional code is actually more complicated or more error-prone; it just reflects the current state of training among developers). In my opinion, functional programming will become more widespread once people notice that it solves their threading problems in a blink. Several other languages are already borrowing from functional languages, so this is probably where we will find the next evolution of data structures.
If you want lock-free data structures, study persistent data structures. These are most popular in the functional programming world, but are applicable in other domains as well. Most persistent DSs are variants of plain lists, trees, etc., but newer ones such as hash tries have surfaced in recent years.
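A tiny sketch of the structural sharing behind persistent structures (illustrative C++, though these are most at home in functional languages): "adding" to a persistent list allocates one new node and shares the old tail, which readers can traverse concurrently because nodes never mutate.

#include <memory>

// Immutable singly linked list: push_front returns a NEW list whose
// tail is shared with the old one.
template <typename T>
struct PList {
    struct Node {
        T value;
        std::shared_ptr<const Node> next;
    };
    std::shared_ptr<const Node> head;

    PList push_front(T v) const {
        return {std::make_shared<const Node>(Node{std::move(v), head})};
    }
};
// usage: auto a = PList<int>{}.push_front(1);
//        auto b = a.push_front(2);  // `a` is unchanged; `b` shares its node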

Why don't Haskell compilers facilitate deterministic memory management?

With the wealth of type information available why can't Haskell runtimes avoid running GC to clean up? It should be possible to figure out all usages and insert appropriate calls to alloc/release in the compiled code, right? This would avoid the overhead of a runtime GC.
It is sensible to ask whether functional programming languages can do less GC by tracking usage. Although the general problem of whether some data can safely be discarded is undecidable (because of conditional branching), it's surely plausible to work harder statically and find more opportunities for direct deallocation.
It's worth paying attention to the work of Martin Hofmann and his team on the Mobile Resource Guarantees project, who made type-directed memory (de/re)allocation a major theme. The thing that makes their stuff work, though, is something Haskell doesn't have in its type system: linearity. If you know that a function's input data are secret from the rest of the computation, you can reallocate the memory they occupy. The MRG work is particularly nice because it manages a realistic exchange rate between deallocation for one type and allocation for another, which turns into good old-fashioned pointer-mangling underneath a purely functional exterior. In fact, lots of lovely parsimonious mutation algorithms (e.g., pointer-reversing traversal, overwrite-the-tail-pointer construction) can be made to look purely functional (and be checked for nasty bugs) using these techniques.
In effect, the linear typing of resources gives a conservative but mechanically checkable approximation to the kind of usage analysis that might well help to reduce GC. Interesting questions then include how to mix this treatment cleanly (deliberate adverb choice) with the usual persistent deal. It seems to me that quite a lot of intermediate data structures have an initial single-threaded phase in recursive computation, before being either shared or dropped when the computation finishes. It may be possible to reduce the garbage generated by such processes.
TL;DR There are good typed approaches to usage analysis which cut GC, but Haskell has the wrong sort of type information just now to be particularly useful for this purpose.
Region-based memory management is what programmers in C and C++ often end up programming by hand: allocate a chunk of memory (a "region", "arena", etc.), allocate the individual data in it, use them, and eventually deallocate the whole chunk when you know none of the individual data are needed any more. Work in the 90s by Tofte, Aiken, and others (including yours truly, with our respective colleagues) showed that it is possible to statically infer region allocation and deallocation points automatically ("region inference"), in such a way as to guarantee that chunks are never freed too early and, in practice, early enough to avoid holding on to memory long after the last data in it was needed. The ML Kit for Regions, for example, is a full Standard ML compiler based on region inference. In its final version it is combined with intra-region garbage collection: if static inference shows there is a long-living region, garbage collection is used inside it. You get to have your cake and eat it, too: you have garbage collection for long-living data, but a lot of data is managed like a stack, even though it would ordinarily end up on the heap.
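The hand-rolled C/C++ version mentioned at the start might look like this minimal bump-allocator sketch (illustrative only, and nothing like the ML Kit's inference machinery): allocation is a pointer bump, and the whole region is released at once.

#include <cstddef>
#include <cstdlib>
#include <new>

class Region {
    unsigned char* base;
    std::size_t    used = 0;
    std::size_t    capacity;
public:
    explicit Region(std::size_t bytes)
        : base(static_cast<unsigned char*>(std::malloc(bytes))),
          capacity(bytes) {
        if (!base) throw std::bad_alloc{};
    }
    ~Region() { std::free(base); }  // one deallocation frees everything

    // Suitable for trivially destructible data; destructors never run.
    void* allocate(std::size_t n) {
        n = (n + alignof(std::max_align_t) - 1)
            & ~(alignof(std::max_align_t) - 1);  // keep allocations aligned
        if (used + n > capacity) throw std::bad_alloc{};
        void* p = base + used;
        used += n;
        return p;
    }
};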
Consider the following pseudo-code:
func a = if some_computation a
         then a
         else 0
To know whether a is garbage after calling func a, the compiler has to know the result of some_computation a. If it could do that in the general case (which would require solving the halting problem), then there'd be no need to emit code for this function at all, let alone garbage collect it. Type information is not sufficient.
It's not easy to determine object lifetimes under lazy evaluation. The JHC compiler does have (had?) region-based memory management, which tries to deallocate memory as soon as a value's lifetime is over.
I'm also curious exactly what you mean by deterministic memory management.
Type information mostly has to do with compile time, whereas memory management is a runtime concern, so I don't think the two are related to each other.

Overhead of memory allocator

I've been creating a temporary object stack, mainly for heap-based STL structures that only have temporary lifetimes, but also for any other temporary dynamically sized allocation. A single stack serves all types, storing them in an unrolled linked list.
I've come a cropper with alignment. I can get the alignment with std::alignment_of<T>, but this isn't really great, because I need the alignment of the next type I want to allocate. Right now, I've just arbitrarily rounded each object's size up to a multiple of 16, which, as far as I know, is the maximum alignment for any x86 or x64 type. But now I'm paying two pointers of memory overhead per object, plus the cost of allocating them in my vector, plus the cost of rounding every size up to a multiple of 16.
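For reference, the round-up computation per allocation is tiny (a sketch; align_up is an illustrative helper, not from the original code), and using alignof(T) per request avoids padding everything to the worst-case 16:

#include <cstddef>

// Round `offset` up to the next multiple of `alignment`
// (alignment must be a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}
// e.g. align_up(13, alignof(double)) == 16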
On the plus side, construction and destruction is fast and reliable.
How does this compare to regular operator new/delete? And, what kind of test suites can I run? I'm pretty pleased with my current progress and don't want to find out later that it's bugged in some nasty subtle fashion, so any advice on testing the operations would be nice.
This doesn't really answer your question, but Boost has just added a memory pool library in its most recent version.
It may not be exactly what you want, but it contains a thorough treatment of alignment that might spark an idea. If the docs are not enough, there is always the source code.

What are your strategies to keep the memory usage low?

Ruby is truly memory-hungry - but also worth every single bit.
What do you do to keep memory usage low? Do you avoid big strings and use smaller arrays/hashes instead, or is it not something you worry about, leaving the job to the garbage collector?
Edit: I found a nice article about this topic here - old but still interesting.
I've found Phusion's Ruby Enterprise Edition (a fork of mainline Ruby with much-improved garbage collection) to make a dramatic difference in memory usage... Plus, they've made it extraordinarily easy to install (and to remove, if you find the need).
You can find out more and download it on their website.
I really don't think it matters all that much.
Making your code less readable in order to improve memory consumption is something you should only ever do if you need to. And by need, I mean you have a specific case in the performance profile and specific metrics indicating that the change will address the issue.
If you have an application where memory is going to be the limiting factor, then Ruby may not be the best choice. That said, I have found that my Rails apps generally consume about 40-60 MB of RAM per Mongrel instance. In the scheme of things, this isn't very much.
You might be able to run your application on the JVM with JRuby; the Ruby VM is currently not as advanced as the JVM for memory management and garbage collection. The 1.9 release adds many improvements, and there are alternative VMs under development as well.
Choose data structures that are efficient representations, scale well, and do what you need.
Use algorithms that work with efficient data structures rather than bloated but easier ones.
Look elsewhere: Ruby has a C bridge, and it's much easier to be memory-conscious in C than in Ruby.
Ruby developers are quite lucky since they don’t have to manage the memory themselves.
Be aware that Ruby allocates objects; for instance, something as simple as
100.times{ 'foo' }
allocates 100 string objects (strings are mutable, so each iteration creates a new String that needs its own memory allocation).
If you are using a library that allocates a lot of objects, make sure no lighter alternative is available and that your choice is worth paying the garbage-collector cost (you might not have a lot of requests/s, or might not care about a few dozen ms per request).
Creating a hash object really allocates more than one object; for instance,
{'joe' => 'male', 'jane' => 'female'}
doesn’t allocate 1 object but 7 (one hash and four strings, plus two copies of the key strings, since Ruby dups and freezes string keys on insertion).
If you can, use symbol keys, since a given symbol is allocated only once. However, because symbols are never garbage collected, make sure not to build them from fully dynamic values such as usernames, otherwise you will ‘leak’ memory.
Example: somewhere in your app, you call to_sym on a user’s name:
hash[current_user.name.to_sym] = something
With hundreds of users that could be OK, but what happens when you have one million users? Here are the numbers:
ruby-1.9.2-head >
# Current memory usage : 6608K
# Now, add one million randomly generated short symbols
ruby-1.9.2-head > 1000000.times { (Time.now.to_f.to_s).to_sym }
# Current memory usage : 153M, even after a garbage-collector run.
# Now, imagine the symbols were just 20x longer than that:
ruby-1.9.2-head > 1000000.times { (Time.now.to_f.to_s * 20).to_sym }
# Current memory usage : 501M
Never convert uncontrolled input to symbols without checking the arguments first; this can easily lead to a denial of service.
Also be careful with deeply nested loops over large collections: each extra level multiplies the number of iterations, and with it the number of temporary objects you allocate.
Here are some related links:
http://merbist.com
http://blog.monitis.com
When deploying a Rails/Rack webapp, use REE or some other copy-on-write friendly interpreter.
Tweak the garbage collector (see https://www.engineyard.com/blog/tuning-the-garbage-collector-with-ruby-1-9-2 for example)
Try to cut down the number of external libraries/gems you use since additional code uses memory.
If you have a part of your app that is really memory-intensive, then it may be worth rewriting it as a C extension, or complementing it by invoking other, faster, better-optimized programs (if you have to process vast amounts of text data, maybe you can replace that code with calls to grep, awk, sed, etc.).
I am not a ruby developer but I think some techniques and methods are true of any language:
Use the minimum size variable suitable for the job
Destroy and close variables and connections when not in use
However if you have an object you will need to use many times consider keeping it in scope
In loops that manipulate a big string, do the work on a smaller string and then append it to the bigger string
Use decent (try catch finally) error handling to make sure objects and connections are closed
When dealing with data sets only return the minimum necessary
Other than in extreme cases, memory usage isn't something to worry about. The time you'd spend trying to reduce memory usage would buy a LOT of gigabytes.
Take a look at Small Memory Software - Patterns for Systems with Limited Memory. You don't specify what sort of memory constraint, but I assume RAM. While not Ruby-specific, I think you'll find some useful ideas in this book - the patterns cover RAM, ROM and secondary storage, and are divided into major techniques of small data structures, memory allocation, compression, secondary storage, and small architecture.
The only thing we've ever had which has actually been worth worrying about is RMagick.
The solution is to make sure you're using RMagick version 2 and to call Image#destroy! when you're done with an image.
Avoid code like this:
str = ''
veryLargeArray.each do |foo|
  str += foo
  # but str << foo is fine (read update below)
end
which will create each intermediate string value as a String object and then remove its only reference on the next iteration. This junks up the memory with tons of increasingly long strings that have to be garbage collected.
Instead, use Array#join:
str = veryLargeArray.join('')
This is implemented in C very efficiently and doesn't incur the String creation overhead.
UPDATE: Jonas is right in the comment below. My warning holds for += but not <<.
I'm pretty new at Ruby, but so far I haven't found it necessary to do anything special in this regard (that is, beyond what I just tend to do as a programmer generally). Maybe this is because memory is cheaper than the time it would take to seriously optimize for it (my Ruby code runs on machines with 4-12 GB of RAM). It might also be because the jobs I'm using it for are not long-running (i.e. it's going to depend on your application).
I'm using Python, but I guess the strategies are similar.
I try to use small functions/methods, so that local variables get automatically garbage collected when you return to the caller.
In larger functions/methods I explicitly delete large temporary objects (like lists) when they are no longer needed. Closing resources as early as possible might help too.
Something to keep in mind is the life cycle of your objects. If your objects are not passed around much, the garbage collector will eventually kick in and free them. However, if you keep referencing them, it may take some cycles for the garbage collector to free them. This is particularly true in Ruby 1.8, where the garbage collector uses a poor implementation of the mark-and-sweep technique.
You may run into this situation when you apply some "design patterns", like Decorator, that keep objects in memory for a long time. It may not be obvious when trying examples in isolation, but in real-world applications where thousands of objects are created at the same time, the cost of this memory growth is significant.
When possible, use arrays instead of other data structures. Try not to use floats when integers will do.
Be careful when using gem/library methods. They may not be memory optimized. For example, the Ruby PG::Result class has a method 'values' which is not optimized. It will use a lot of extra memory. I have yet to report this.
Replacing the malloc(3) implementation with jemalloc will immediately decrease your memory consumption by up to 30%. I've created the 'jemalloc' gem to achieve this instantly.
'jemalloc' GEM: Inject jemalloc(3) into your Ruby app in 3 min
I try to keep arrays, lists, and datasets as small as possible. Individual objects do not matter much, as creation and garbage collection are pretty fast in most modern languages.
When you have to read some huge dataset from the database, make sure to read it in a forward-only manner and process it in small chunks instead of loading everything into memory first.
Don't use a lot of symbols; they stay in memory until the process is killed, because symbols never get garbage collected.
