In this video (Escape from the Ivory Tower - The Haskell Journey), Simon Peyton Jones says that making Haskell lazy helped them cope with the resource constraints of the machines they had at the time, and that laziness also brought a whole lot of other benefits.
He then says that the challenge they face now is that laziness hurts performance.
My question is: why did making Haskell lazy have an impact on performance?
If you're not going to use the result of something, then storing it lazily and then never executing it is more efficient than uselessly executing it. That much is immediately obvious.
However, if you are going to execute it, then storing it lazily and executing it later is less efficient than just executing it now. There's more indirection involved: it takes time to note down all the details you need for the execution, and it takes time to load them all back when you realise you actually need the result.
This is particularly the case with something like adding two machine-width integers. If your operands are already in CPU registers, then adding them immediately is a single machine instruction. Under laziness, we instead laboriously write the operands and the pending operation onto the heap as a thunk, and then fetch it all back later (quite possibly with a bunch of cache misses and pipeline stalls).
On top of that, sometimes a computation isn't all that expensive and produces a small result, but the details we need to store to run it later are quite large. The canonical example is totaling a list: the result might be a single 32-bit integer, but the list to be totaled could be huge. That is extra work for the garbage collector, which now has to manage data that would otherwise be dead and eligible for deallocation.
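A rough analogue in Ruby, using a lambda as a hand-rolled thunk (the names are made up for illustration):
# Eager: the total is computed immediately, so list can be
# collected as soon as nothing else references it.
list = Array.new(1_000_000) { rand(100) }
eager_total = list.sum

# "Lazy": the closure captures list, pinning a million elements
# in memory until something finally forces the thunk.
thunk = -> { list.sum }
total = thunk.call   # only after this can list become garbage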
In general, laziness used right can result in massive performance gains, but laziness used wrong results in appalling performance disasters. And laziness can be very tricky to reason about; this stuff isn't easy. With experience, you do gradually get used to it though.
I am reading this article about how to program Ruby in a functional style.
https://code.google.com/p/tokland/wiki/RubyFunctionalProgramming
One of the examples that caught my attention is the following:
# No (mutable):
output = []
output << 1
output << 2 if i_have_to_add_two
output << 3
# Yes (immutable):
output = [1, (2 if i_have_to_add_two), 3].compact
Whereas the "mutable" option is less safe because we changing the value of the array, the inmutable one seems less efficient because it calls .compact. That means that it has to iterate the array to return a new one without the nil's values.
In that case, which option is preferible? And in general how do you choose between immutability (functional) vs performance (in the case when the imperative solution is faster)?
You're not wrong. It is often the case that a purely functional solution will be slower than a destructive one. Immutable values generally mean a lot more allocation has to go on unless the language is very well optimized for them (which Ruby isn't).
However, it doesn't often matter. Worrying about the performance of specific operations is not a great use of your time 99% of the time. Shaving off a microsecond from a piece of code that runs 100 times a second is simply not a win.
The best approach is usually to do whatever makes your code cleanest. Very often this means taking advantage of the functional features of the language — for example, map and select instead of map! and keep_if. Then, if you need to speed things up, you have a nice, clean codebase that you can make changes to without fear that your changes will make one piece of code stomp over another piece's data.
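For instance, with a hypothetical users array (active? and name are made-up methods):
# Functional: builds new arrays and leaves users untouched.
names = users.select { |u| u.active? }.map { |u| u.name.upcase }

# Destructive: spares the intermediate arrays but mutates users
# in place, so anything else holding a reference sees the change.
users.keep_if { |u| u.active? }
users.map! { |u| u.name.upcase }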
I have a large dataset from an analytics provider.
It arrives in JSON and I parse it into a hash, but due to the size of the set I'm ballooning to over a gig in memory usage. Almost everything starts as strings (a few values are numerical), and while of course the keys are duplicated many times, many of the values are repeated as well.
So I was thinking, why not symbolize all the (non-numerical) values, as well?
I've found some discussion of potential problems, but I figure it would be nice to have a comprehensive description for Ruby, since the problems seem to depend on the implementation of the interning process (what happens when you symbolize a string).
I found this talking about Java:
Is it good practice to use java.lang.String.intern()?
The interning process can be expensive
Interned strings are never de-allocated, resulting in a memory leak
(Except there's some contention on that last point.)
So, can anyone give a detailed explanation of when not to intern strings in Ruby?
When the set of things in question is open (i.e., dynamic, with no fixed inventory), you should not convert them into symbols: each symbol created will never be garbage collected, and will cause a memory leak.
When the set of things in question is closed (i.e., static, with a fixed inventory), converting them into symbols is the better choice: each symbol is created only once and then reused, which saves memory.
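For instance (user, params, and store are hypothetical here):
# Closed set: a fixed inventory of states, so symbols are safe and cheap.
STATUSES = [:pending, :active, :banned]
user.status = :active

# Open set: user-controlled strings. Every distinct value mints a new
# symbol, so the symbol table grows without bound. Don't do this.
params.each { |key, value| store[key.to_sym] = value }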
The interning process can be expensive
There is always a tradeoff between memory and computing power, so try some of the best practices out there and benchmark to figure out what's right for you. A few suggestions I'd like to mention:
Symbols are an excellent choice for hash keys:
{name: "my name"}
Freeze strings to save memory, and try to keep a small string pool (see the sketch after this list):
person[:country] = "USA".freeze
Have fun with Ruby GC tuning.
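The freeze trick works because recent Rubies (2.1 and later) deduplicate frozen string literals, so repeated evaluations share a single object. A quick check:
person = {}
person[:country] = "USA".freeze
"USA".freeze.equal?("USA".freeze)   # => true: one shared frozen String

# Ruby >= 2.3 can apply this to every literal in a file with the
# magic comment "# frozen_string_literal: true" at the top.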
Interned strings are never de-allocated, resulting in a memory leak
Ruby 2.2 introduced symbol garbage collection, so this concern is no longer valid for dynamically created symbols. However, overuse of frozen strings and symbols can still hurt performance.
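You can watch the collection happen with Symbol.all_symbols (a rough sketch; exact numbers vary by version and environment):
before = Symbol.all_symbols.size
100_000.times { |i| "sym_#{i}".to_sym }   # dynamically created symbols
GC.start
after = Symbol.all_symbols.size   # on Ruby >= 2.2, close to before again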
My team is working on an MMO server in Ruby, and we opted to start moving computationally intensive operations into a C extension. As part of that effort, we moved the actual data storage into C (using Data_Get_Struct and all that). So, for example, each Ruby "Zone" object has an associated "ZoneKernel::Zone" C struct where the actual binary data is stored.
Basically, I'm wondering if this is a terrible idea or not. I'm not super familiar with the Ruby internals, but it seems like the data should be fine as long as the parent Zone stays in memory on the Ruby side (thus preventing garbage collection of the C data).
One caveat is that we've been getting semi-regular "Stack Consistency Errors" that crash our server. This seems like it could be a related memory issue (rather than just your garden-variety segfault); if anyone has any knowledge of what that might be, I would appreciate that as well!
As stated in the documentation to Data_Wrap_Struct(klass, mark, free, ptr) function:
The free argument is the function to free the pointer allocation. If this is -1, the pointer will be just freed.
These mark / free functions are invoked during GC execution.
Your wrapped native structure will be automatically freed when its corresponding Ruby object is finalized. Until that happens, your data will not be freed unless you do so manually.
Writing C extensions doesn't guarantee a performance boost, but it almost always increases the complexity of your code. Profile your server in order to measure your performance gains, and develop your Zone class in pure Ruby if viable.
In general I like to keep any data that could change outside of my source. Loading it from YAML or a database means you can tweak the data to your heart's content without needing to recompile. Obviously, if your compile time and load times are fast then it's not as big an issue, but it's still a good idea to separate the two.
I favor YAML, because it's a standard format, so you could access the same file from any number of languages. You could load it directly into the C side, or into the Ruby side, depending on what seems faster/smarter.
Ruby is truly memory-hungry - but also worth every single bit.
What do you do to keep memory usage low? Do you avoid big strings and use smaller arrays/hashes instead, or is it not something you worry about, letting the garbage collector do its job?
Edit: I found a nice article about this topic here - old but still interesting.
I've found Phusion's Ruby Enterprise Edition (a fork of mainline Ruby with much-improved garbage collection) to make a dramatic difference in memory usage... Plus, they've made it extraordinarily easy to install (and to remove, if you find the need).
You can find out more and download it on their website.
I really don't think it matters all that much.
Making your code less readable in order to improve memory consumption is something you should only ever do if you need it. And by need, I mean you have a specific performance profile and specific metrics indicating that the change will address the issue.
If you have an application where memory is going to be the limiting factor, then Ruby may not be the best choice. That said, I have found that my Rails apps generally consume about 40-60 MB of RAM per Mongrel instance. In the scheme of things, this isn't very much.
You might be able to run your application on the JVM with JRuby; the Ruby VM is currently not as advanced as the JVM for memory management and garbage collection. The 1.9 release is adding many improvements, and there are alternative VMs under development as well.
Choose data structures that are efficient representations, scale well, and do what you need.
Use algorithms that work with efficient data structures rather than bloated but easier ones.
Look elsewhere: Ruby has a C bridge, and it's much easier to be memory-conscious in C than in Ruby.
Ruby developers are quite lucky since they don’t have to manage the memory themselves.
Be aware that Ruby allocates objects; for instance, something as simple as
100.times{ 'foo' }
allocates 100 string objects (strings are mutable, so each occurrence requires its own memory allocation).
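You can count those allocations yourself (a sketch; GC timing can skew the numbers slightly):
GC.disable
before = ObjectSpace.count_objects[:T_STRING]
100.times { 'foo' }
puts ObjectSpace.count_objects[:T_STRING] - before   # => roughly 100
GC.enable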
If you are using a library that allocates a lot of objects, make sure no alternatives are available and that the choice is worth the garbage-collector cost (you might not have a lot of requests per second, or might not care about a few dozen ms per request).
Creating a hash really allocates more than one object; for instance,
{'joe' => 'male', 'jane' => 'female'}
doesn't allocate 1 object but 7: one hash and four string literals, plus two extra copies of the keys (Ruby dups and freezes string keys on insertion).
If you can, use symbol keys, as they won't be garbage collected. However, precisely because they won't be garbage collected, make sure not to use totally dynamic keys, like converting a username to a symbol; otherwise you will 'leak' memory.
Example: somewhere in your app, you apply to_sym to a user's name, like:
hash[current_user.name.to_sym] = something
With hundreds of users that could be OK, but what happens if you have one million users? Here are the numbers:
ruby-1.9.2-head >
# Current memory usage: 6608K
# Now, add one million randomly generated short symbols
ruby-1.9.2-head > 1000000.times { (Time.now.to_f.to_s).to_sym }
# Current memory usage: 153M, even after a garbage collector run.
# Now, imagine if the symbols were 20x longer than that?
ruby-1.9.2-head > 1000000.times { (Time.now.to_f.to_s * 20).to_sym }
# Current memory usage: 501M
Be careful never to convert uncontrolled input into symbols without validating it first; doing so can easily lead to a denial of service.
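A common defence is to whitelist before symbolizing; params and the field names below are made up:
ALLOWED_FIELDS = %w[name email country].freeze
field = params['field']
raise ArgumentError, "unknown field: #{field}" unless ALLOWED_FIELDS.include?(field)
hash[field.to_sym] = params['value']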
As an aside, avoid nesting loops more than three levels deep: it makes maintenance difficult. That is a readability rule of thumb, though, not a performance one.
Here are some related links:
http://merbist.com
http://blog.monitis.com
When deploying a Rails/Rack webapp, use REE or some other copy-on-write friendly interpreter.
Tweak the garbage collector (see https://www.engineyard.com/blog/tuning-the-garbage-collector-with-ruby-1-9-2 for example)
Try to cut down the number of external libraries/gems you use since additional code uses memory.
If you have a part of your app that is really memory-intensive, it may be worth rewriting it as a C extension, or complementing it by invoking other, faster, better-optimized programs (if you have to process vast amounts of text data, maybe you can replace that code with calls to grep, awk, sed, etc.)
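For that last point, streaming through an external tool keeps the data out of Ruby's heap entirely. A sketch with a made-up file name and pattern:
# Count ERROR lines in a huge log without reading it into Ruby strings.
error_count = IO.popen(['grep', '-c', 'ERROR', 'huge.log']) { |io| io.read.to_i }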
I am not a Ruby developer, but I think some techniques and methods hold true in any language:
Use the minimum size variable suitable for the job
Destroy and close variables and connections when not in use
However, if you have an object you will need to use many times, consider keeping it in scope
In any loop that manipulates a big string, do the work on a smaller string and then append it to the bigger one
Use decent (try/catch/finally) error handling to make sure objects and connections are closed (see the sketch after this list)
When dealing with data sets, only return the minimum necessary
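In Ruby, the try/finally equivalent is begin/ensure; process below is a stand-in for your own code:
f = File.open('data.csv')
begin
  process(f)
ensure
  f.close   # runs even if process raises
end

# Or simply use the block form, which closes the file automatically:
# File.open('data.csv') { |f| process(f) }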
Other than in extreme cases, memory usage isn't something to worry about: the time you'd spend trying to reduce it would buy a LOT of gigabytes.
Take a look at Small Memory Software - Patterns for Systems with Limited Memory. You don't specify what sort of memory constraint, but I assume RAM. While not Ruby-specific, I think you'll find some useful ideas in this book - the patterns cover RAM, ROM and secondary storage, and are divided into major techniques of small data structures, memory allocation, compression, secondary storage, and small architecture.
The only thing we've ever had which has actually been worth worrying about is RMagick.
The solution is to make sure you're using RMagick version 2, and to call Image#destroy! when you're done using your image.
Avoid code like this:
str = ''
veryLargeArray.each do |foo|
  str += foo
  # but str << foo is fine (read update below)
end
which will create each intermediate string value as a String object and then remove its only reference on the next iteration. This junks up the memory with tons of increasingly long strings that have to be garbage collected.
Instead, use Array#join:
str = veryLargeArray.join('')
This is implemented in C very efficiently and doesn't incur the String creation overhead.
UPDATE: Jonas is right in the comment below. My warning holds for += but not <<.
I'm pretty new at Ruby, but so far I haven't found it necessary to do anything special in this regard (that is, beyond what I just tend to do as a programmer generally). Maybe this is because memory is cheaper than the time it would take to seriously optimize for it (my Ruby code runs on machines with 4-12 GB of RAM). It might also be because the jobs I'm using it for are not long-running (i.e. it's going to depend on your application).
I'm using Python, but I guess the strategies are similar.
I try to use small functions/methods, so that local variables get automatically garbage collected when you return to the caller.
In larger functions/methods I explicitly delete large temporary objects (like lists) when they are no longer needed. Closing resources as early as possible might help too.
Something to keep in mind is the life cycle of your objects. If your objects are not passed around much, the garbage collector will eventually kick in and free them. However, if you keep referencing them, it may take several GC cycles before they are freed. This is particularly true in Ruby 1.8, where the garbage collector uses a poor implementation of the mark-and-sweep technique.
You may run into this situation when you apply "design patterns" like decorator that keep objects in memory for a long time. It may not be obvious when trying examples in isolation, but in real-world applications, where thousands of objects are created at the same time, the cost of this memory growth can be significant.
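A minimal illustration with an invented decorator class: each layer holds a reference to whatever it wraps, so one long-lived decorator keeps a whole object graph reachable.
class AuditedAccount
  def initialize(account)
    # This reference pins account (and everything it references)
    # for as long as the decorator itself stays alive.
    @account = account
  end
end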
When possible, use arrays instead of other data structures. Try not to use floats when integers will do.
Be careful when using gem/library methods; they may not be memory-optimized. For example, the Ruby PG::Result class has a values method that is not optimized and will use a lot of extra memory. I have yet to report this.
Replacing the malloc(3) implementation with jemalloc can immediately decrease your memory consumption by up to 30%. I've created the 'jemalloc' gem to achieve this instantly.
'jemalloc' GEM: Inject jemalloc(3) into your Ruby app in 3 min
I try to keep arrays, lists, and datasets as small as possible. Individual objects do not matter much, as creation and garbage collection are pretty fast in most modern languages.
In cases where you have to read some huge dataset from the database, make sure to read it in a forward-only manner and process it in small chunks instead of loading everything into memory first.
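With ActiveRecord, for example, find_each streams records in fixed-size batches instead of materializing the whole table (the model name and process are placeholders):
User.find_each(batch_size: 1000) do |user|
  process(user)
end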
Don't use a lot of symbols: they stay in memory until the process is killed, because symbols are never garbage collected (at least prior to Ruby 2.2).