Garbage collection in Ruby?

I can see that related questions have been asked before.
In some languages, there's a specific way to force garbage collection. For example in R, we can call gc() and it will free up memory that was previously used to store objects that have since been removed.
Is there any way to do this in ruby?
In case it's relevant: I'm running a very long loop, and I think it's slowly accumulating a little memory use each iteration, so I would like to force garbage collection every 100th iteration or so just to be sure, e.g. (pseudocode) if index % 100 == 0 then gc(). Also note that I intend to use this in a Rails app, although I don't think that's relevant, since garbage collection is entirely a Ruby feature and has nothing to do with Rails.

No, there is no way to do it in Ruby.
There is a method called GC::start, and the documentation even says:
Initiates garbage collection, even if manually disabled.
But that is not true. GC::start is simply an advisory from your code to the runtime that it would be safe for your application to run a garbage collection now. But that is only a suggestion. The runtime is free to ignore this suggestion.
The majority of programming languages with automatic memory management do not give the programmer control over the garbage collector.
If Ruby had a method to force a garbage collection, then it would be impossible to implement Ruby on the JVM, and neither JRuby nor TruffleRuby could exist; it would be impossible to implement Ruby on .NET, and IronRuby couldn't exist; it would be impossible to implement Ruby on ECMAScript, and Opal couldn't exist; and it would be impossible to implement Ruby on top of existing high-performance garbage collectors, and RubyOMR couldn't exist.
Since it is generally desirable to give implementors the freedom to implement optimizations and make the language faster, language designers are very cautious about specifying features that so drastically restrict what an implementor can do.
I am quite surprised that R has such a feature, especially since it means it is impossible to implement high-performance implementations like FastR in a way that is compliant with the language specification. FastR can be more than 35× faster than GNU R, so it is obvious why it is desirable for something like FastR to exist. But one of the ways FastR achieves its speed is by using a third-party high-performance garbage-collected runtime (either GraalVM or the JVM) that does not allow control over garbage collection, and thus FastR can never be a compliant R implementation.
Interestingly, the documentation of gc() has this to say:
[T]he primary purpose of calling gc is for the report on memory usage.

This is done via GC.start.
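A sketch of the questioner's pseudo-code in real Ruby, with the caveat from the answer above: GC.start is only a request, and the runtime may not honor it.

```ruby
# Request a GC run every 100th iteration. GC.start is a request,
# not a guarantee.
1_000.times do |index|
  # ... per-iteration work here ...
  GC.start if index % 100 == 0
end

# GC.stat exposes collector counters, so you can check whether
# collections are actually happening:
p GC.stat[:count]  # total number of GC runs so far
```

On MRI this will usually trigger collections, but as the answer notes, a conforming implementation is free to treat it as advisory.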

Related

Will using Sorbet's ruby type checker have an impact on a ruby app's performance?

Maybe a newb's question, but if you never ask, you'll never know.
Can using Stripe's Sorbet (https://sorbet.org/) on a RoR app potentially improve the app's performance?
(Performance meaning response times, not robustness / runtime error rate.)
I did some reading on dynamically typed languages (particularly JavaScript in this case) and found out that if we keep sending some function (foo, for example) the same type of objects, the engine does some optimising work on that function, so that when it is invoked again with the same types, the interpreting work will be quicker.
I thought maybe the Ruby interpreter does similar work, which could potentially mean that type checking may increase interpreting speed.
I thought maybe the Ruby interpreter does similar work, which could potentially mean that type checking may increase interpreting speed.
It doesn't yet, but one could potentially build this one day.
The goal of Sorbet was to build a type system for people, as opposed to a type system for computers (the compiler). It can introduce some performance overhead, but since Stripe runs it in production, we keep that overhead in check. Internally, we get paged if the overhead exceeds 7% of CPU time.
I did some reading on dynamically typed languages (particularly JavaScript in this case) and found out that if we keep sending some function (foo, for example) the same type of objects, the engine does some optimising work on that function, so that when it is invoked again with the same types, the interpreting work will be quicker.
Yes, this can be done. What you're describing is a common optimization in just-in-time (JIT) compilers. The technique you seem to be referring to uses runtime profiling, and is in fact a common alternative that achieves this result in the absence of a type system. It's also worth noting that well-built JITs can apply it more aggressively than a type system could, since a type system encodes what could happen, while profiling JITs optimize for what actually happens in practice.
That said, building a JIT is frequently much more work than building an online compiler; thus, depending on the amount of investment one wants to put into speeding up Ruby, either building a JIT or using types can prove better under different real-world constraints.
I thought maybe the Ruby interpreter does similar work, which could potentially mean that type checking may increase interpreting speed.
Summarizing the previous paragraphs: Sorbet's type system does not currently speed up Ruby, but it doesn't slow it down much either.
Type systems can indeed be used to speed up languages, but they aren't your only tool; profiling and JIT compilation are the major competitors.
The optimizations you are talking about apply more to the JIT that is being worked on for the Ruby runtime.
In general, Sorbet aims at type safety by introducing type interfaces or method signatures. They enable static type checks that are applied before deploying the application in order to get rid of "type errors".
Sorbet also comes with a runtime component that can enforce type checks at runtime in your running application, but those will decrease the application's performance, since they wrap method calls in order to check for correct types: https://sorbet.org/docs/runtime#runtime-checked-sig-s
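To see why wrapping method calls costs something, here is a hand-rolled sketch (NOT Sorbet's actual implementation; the CheckedSig module and checked_sig method are hypothetical names) of runtime signature checking: every call goes through an extra layer that validates argument and return types around the real method body.

```ruby
# Hypothetical runtime type-check wrapper, mimicking in spirit what a
# runtime-checked sig does: validate arguments, call the original
# method, validate the return value.
module CheckedSig
  def checked_sig(name, arg_types, return_type)
    original = instance_method(name)
    define_method(name) do |*args|
      args.zip(arg_types).each do |arg, type|
        raise TypeError, "expected #{type}, got #{arg.class}" unless arg.is_a?(type)
      end
      result = original.bind(self).call(*args)
      raise TypeError, "expected #{return_type} return" unless result.is_a?(return_type)
      result
    end
  end
end

class Adder
  extend CheckedSig

  def add(a, b)
    a + b
  end
  checked_sig(:add, [Integer, Integer], Integer)
end

Adder.new.add(1, 2)       # => 3
# Adder.new.add(1, "x")   # raises TypeError before the method body runs
```

Every checked call now pays for the zip, the is_a? tests, and an extra method dispatch, which is exactly the kind of overhead the runtime component trades for safety.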

Hack the JVM to avoid unnecessary bounds checks and casts

There are some languages that support a sufficiently powerful type system that they can prove at compile time that the code does not address an array outside its bounds. My question is that if we were to compile such a language to the JVM, is there some way we could take advantage of that for performance and remove the array bounds checks that occur on every array access?
1) I know that recent JDKs support some array bounds check elimination, but since I know at compile time that certain calls are safe, I could safely remove many more checks.
2) Some might think this doesn't affect performance much, but it most certainly does, especially in array/computation-heavy applications such as scientific computing.
The same question applies to casting. I know something is a certain type, but Java doesn't, because of its limited type system. Is there some way to just tell the JVM to "trust me" and skip any checks?
I realize there is probably no way to do this with the JVM as generally distributed, but would it be reasonable to modify a JVM to add this feature? Is this something that has been done?
It's one of the frustrations of compiling a more powerfully typed language to the JVM: the result is still hampered by Java's limitations.
In principle this cannot be done in a safe fashion without a proof-carrying code (PCC) infrastructure. PCC would allow you to embed your reasoning of safety in the class file. Your embedded proof is checked at class-loading time. The class is not loaded if there is a flaw in the proof.
If the JVM ever allowed you to drop runtime checks without requiring a formal proof, then, as SecurityMatt put it, it would defeat the original philosophy of Java as a safe platform.
The JVM uses a special form of PCC for type-checking local variables in a method. All local variable typing info is used by the class-loading mechanism to check its correctness, but discarded after that. But that's the only instance of PCC concepts used in the JVM. As far as I know there is no general PCC infrastructure for the JVM.
I once heard one existed for the JavaCard platform which supports a small subset of Java. I am not sure if that can be helpful in your problem though.
One of the key features of Java is that it does not need to "trust" the developer to do bounds checking. This eliminates the "buffer overflow" security vulnerabilities which can lead to attackers being able to execute arbitrary code within your application.
By allowing developers the ability to turn off bounds checking, Java would lose one of its key features: that no matter how wrong the Java developer is, there are not going to be any exploitable buffer overflows in his/her code.
If you would like to use a language where the programmer is trusted to manage their own bounds checking, might I suggest C++. This gives you the ability to allocate arrays with no automatic bounds checking (new int[]) and to allocate arrays with inbuilt bounds checking (std::vector).
Additionally, I strongly suggest that before blaming bounds checking for the speed loss in your application, you perform some BENCHMARKING to determine whether there is somewhere else in your code that might be causing the bottleneck.
You may find that for a compiler target that a bytecode language such as MSIL is more suited to your needs than Java bytecode. MSIL is strongly typed and does not suffer from a number of the inefficiencies that you have found in Java.

NoSQL DB written in Ruby?

Just curious, but are any NoSQL DBMSs written in Ruby?
And if not, would it be unwise to create one in Ruby?
Just curious, but are any NoSQL DBMSs written in Ruby?
In 2007, Anthony Eden played around with RDDB, a CouchDB-inspired document-oriented database. He still keeps a copy of the code in his GitHub account.
I vaguely remember that at or around the same time, someone else was also playing around with a database in Ruby. I think it was either inspired by or a reaction to RDDB.
Last but not least, there is the PStore library in the stdlib, which – depending on your definition – may or may not count as a database.
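For reference, here is what minimal PStore usage looks like: a transactional, file-backed key-value store straight from the stdlib (whether it counts as a "database" is, as said, a matter of definition).

```ruby
require 'pstore'
require 'tmpdir'

# PStore persists Ruby objects to a file, with transactional reads
# and writes.
path  = File.join(Dir.mktmpdir, 'inventory.pstore')
store = PStore.new(path)

store.transaction do
  store[:apples] = 3
  store[:pears]  = 5
end

store.transaction(true) do   # true = read-only transaction
  p store[:apples]           # => 3
end
```

Writes are atomic at the transaction level, which is more than many toy key-value stores offer, but there is no concurrency story beyond file locking.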
And if not, would it be unwise to create one in Ruby?
The biggest problem I see in Ruby is its concurrency primitives. Threads and locks are so 1960s. If you want to support multiple concurrent users, then you obviously need concurrency, although if you want to build an embedded in-process database, then this is much less of a concern.
Other than that, there are some not-so-stellar implementations of Ruby, but that is not a limitation of Ruby but of those particular implementations, and it applies to pretty much every other programming language as well. Rubinius (especially the current development trunk, which adds Ruby 1.9 compatibility and removes the Global Interpreter Lock) and JRuby would both be fine choices.
As an added bonus, Rubinius comes with a built-in actors library for concurrency and JRuby gives you access to e.g. Clojure's concurrency libraries or the Akka actors library.
Performance isn't really much of a concern, I think. Rubinius's Hash class, which is written in 100% pure Ruby, performs comparably to YARV's Hash class, which is written in 100% hand-optimized C. This shows that carefully written Ruby code can be just as fast as C, especially since databases tend to be long-running processes, so Rubinius's or JRuby's dynamic optimizers (and, in the latter case, the JVM's, which C compilers typically do not have) can really get to work.
Ruby is just too slow for any type of DBMS
c/c++/erlang are generally the best choice.
You generally shouldn't care what programming language a DBMS was implemented in, as long as it has all the features you need and is available for use from your application programming language of choice.
So the real question here is: do you need one written in Ruby, or one available for use from Ruby?
In the first case, I doubt you'll find a DBMS natively written in Ruby (any correction of this statement will be appreciated).
In the second case, you should be able to find Ruby bindings/wrappers for any decent DBMS, relational or not.

What makes Ruby slow? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Ruby is slow at certain things. But what parts of it are the most problematic?
How much does the garbage collector affect performance? I know I've had times when running the garbage collector alone took several seconds, especially when working with OpenGL libraries.
I've used matrix math libraries with Ruby that were particularly slow. Is there an issue with how ruby implements basic math?
Are there any dynamic features in Ruby that simply cannot be implemented efficiently? If so, how do other languages like Lua and Python solve these problems?
Has there been recent work that has significantly improved performance?
Ruby is slow. But what parts of it are the most problematic?
It does "late lookup" for methods, to allow for flexibility. This slows it down quite a bit. It also has to remember variable names per context to allow for eval, so its frames and method calls are slower. Also, it currently lacks a good JIT compiler, though MRI 1.9 has a bytecode compiler (which is better), and JRuby compiles down to Java bytecode, which can then be compiled by the HotSpot JVM's JIT compiler, but it ends up being about the same speed as 1.9.
How much does the garbage collector affect performance? I know I've had times when running the garbage collector alone took several seconds, especially when working with OpenGL libraries.
From some of the graphs at http://www.igvita.com/2009/06/13/profiling-ruby-with-googles-perftools/ I'd say it takes about 10%, which is quite a bit; you can decrease that hit by increasing the malloc_limit in gc.c and recompiling.
I've used matrix math libraries with Ruby that were particularly slow. Is there an issue with how ruby implements basic math?
Ruby 1.8 "didn't" implement basic math: it implemented Numeric classes, and you'd call things like Fixnum#+ and Fixnum#/ once per operation, which was slow. Ruby 1.9 cheats a bit by inlining some of the basic math ops.
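One way to glimpse that dispatch cost yourself: on modern MRI, a literal `1 + 2` goes through a specialized bytecode path, while an explicit `send(:+, ...)` forces full dynamic dispatch. A rough, hedged micro-benchmark (absolute numbers vary wildly by machine and Ruby version; only the relative gap is the point):

```ruby
require 'benchmark'

n = 1_000_000
Benchmark.bm(12) do |x|
  x.report('a + b')      { n.times { 1 + 2 } }         # specialized instruction path
  x.report('a.send(:+)') { n.times { 1.send(:+, 2) } } # full method dispatch
end
```

The send variant is typically noticeably slower, which is roughly the penalty every arithmetic operation paid in 1.8 before these inlining tricks.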
Are there any dynamic features in Ruby that simply cannot be implemented efficiently? If so, how do other languages like Lua and Python solve these problems?
Things like eval are hard to implement efficiently, though I'm sure much can be done. The kicker for Ruby is that it has to accommodate somebody in another thread spontaneously changing the definition of a class, so it has to be very conservative.
Has there been recent work that has significantly improved performance?
1.9 is roughly a 2x speedup. It's also more space-efficient. JRuby is constantly trying to improve speed-wise [and probably spends less time in the GC than KRI]. Besides that, I'm not aware of much except the little hobby things I've been working on. Note also that 1.9's strings are at times slower because of encoding friendliness.
Ruby is very good for delivering solutions quickly. Less so for delivering quick solutions. It depends what kind of problem you're trying to solve. I'm reminded of the discussions on the old CompuServe MSBASIC forum in the early 90s: when asked which was faster for Windows development, VB or C, the usual answer was "VB, by about 6 months".
In its MRI 1.8 form, Ruby is - relatively - slow to perform some types of computationally-intensive tasks. Pretty much any interpreted language suffers in that way in comparison to most mainstream compiled languages.
The reasons are several: some fairly easily addressable (the primitive garbage collection in 1.8, for example), some less so.
1.9 addresses some of the issues, although it's probably going to be some time before it becomes generally available. Some of the other implementations that target pre-existing runtimes (JRuby, IronRuby, and MagLev, for example) have the potential to be significantly quicker.
Regarding mathematical performance, I wouldn't be surprised to see fairly slow throughput: it's part of the price you pay for arbitrary precision. Again, pick your problem. I've solved 70+ of the Project Euler problems in Ruby, with almost no solution taking more than a minute to run. How fast do you need it to run, and how soon do you need it?
The most problematic part is "everyone".
Bonus points if that "everyone" didn't really use the language, ever.
Seriously, 1.9 is much faster and is now on par with Python, and JRuby is faster than Jython.
Garbage collectors are everywhere; for example, Java has one, and it's faster than C++ at dynamic memory handling. Ruby isn't well suited for number crunching, but few languages are, so if you have computationally intensive parts in your program, in any language, you'd better rewrite them in C. (Java is fast at math thanks to its primitive types, but it paid dearly for them; they're clearly #1 among the ugliest parts of the language.)
As for dynamic features: they aren't fast, but code without them in static languages can be even slower. For example, Java would use an XML config where Ruby uses a DSL, and that would likely be SLOWER, since XML parsing is costly.
Hmm - I worked on a project a few years ago where I scraped the bottom of the barrel with Ruby performance, and I'm not sure much has changed since. Right now it's caveat emptor: you have to know not to do certain things, and frankly, games / realtime applications would be among them (since you mention OpenGL).
The culprit for killing interactive performance is the garbage collector - others here mention that Java and other environments have garbage collection too, but Ruby's has to stop the world to run. That is to say, it has to stop running your program, scan through every register and memory pointer from scratch, mark the memory that's still in use, and free the rest. The process can't be interrupted while this happens, and as you might have noticed, it can take hundreds of milliseconds.
Its frequency and length of execution are proportional to the number of objects you create and destroy, but unless you disable it altogether, you have no control. My experience was that there were several unsatisfactory strategies to smooth out my Ruby animation loop:
GC.disable / GC.enable around critical animation loops, plus maybe an opportunistic GC.start to force a collection when it can't do any harm. (Because my target platform at the time was a 64MB Windows NT machine, this occasionally caused the system to run out of memory. But fundamentally it's a bad idea: unless you can pre-calculate how much memory you might need before doing this, you risk memory exhaustion.)
Reduce the number of objects you create so the GC has less work to do (reduces the frequency / length of its execution)
Rewrite your animation loop in C (a cop-out, but the one I went with!)
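The first strategy above can be sketched like this (render_frames is a hypothetical name; the risk, as noted, is that memory grows unchecked while collection is suppressed):

```ruby
# Suppress GC inside the hot loop, then trigger a collection at a
# safe point once the critical section is over.
def render_frames(frames)
  GC.disable
  frames.each do |frame|
    # ... allocation-heavy per-frame work ...
  end
ensure
  GC.enable
  GC.start   # opportunistic collection when a pause can't do any harm
end

render_frames([:frame1, :frame2, :frame3])
```

The ensure block matters: if the loop raises, you still want the collector re-enabled, otherwise the process just grows until it dies.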
These days I would probably also see if JRuby would work as an alternative runtime, as I believe it relies on Java's more sophisticated garbage collector.
The other major performance issue I've found was basic I/O, when trying to write a TFTP server in Ruby a while back (yeah, I pick all the best languages for my performance-critical projects; this was just an experiment). The absolute simplest, tightest loop to simply respond to one UDP packet with another, containing the next piece of a file, must have been about 20x slower than the stock C version. I suspect there might have been some improvements to make based around low-level I/O (sysread etc.), but the slowness might just be due to the fact that there is no low-level byte data type: every little read is copied into a String. This is just speculation, though; I didn't take this project much further, but it warned me off relying on snappy I/O.
The main recent speed increase, though I'm not fully up-to-date here, is that the virtual machine implementation was redone for 1.9, resulting in faster code execution. However, I don't think the GC has changed, and I'm pretty sure there's nothing new on the I/O front. But I'm not fully up to date on bleeding-edge Ruby, so someone else might want to chip in here.
I assume that you're asking, "what particular techniques in Ruby tend to be slow."
One is object instantiation. If you are doing large amounts of it, you want to look at (reasonable) ways of reducing it, such as using the flyweight pattern, even if memory usage is not a problem. In one library that I reworked not to create a lot of very similar objects over and over again, I doubled the library's overall speed.
Steve Dekorte: "Writing a Mandelbrot set calculator in a high level language is like trying to run the Indy 500 in a bus."
http://www.dekorte.com/blog/blog.cgi?do=item&id=4047
I recommend learning various tools in order to use the right tool for the job. Matrix transformations can be done efficiently using a high-level API that wraps tight loops of arithmetic-intensive computations. See the RubyInline gem for an example of embedding C or C++ code in a Ruby script.
There is also the Io language, which is much slower than Ruby overall, yet it is used to efficiently render movies at Pixar and outperforms raw C on vector arithmetic by using SIMD acceleration.
http://iolanguage.com
https://renderman.pixar.com/products/tools/it.html
http://iolanguage.com/scm/git/checkout/Io/docs/IoGuide.html#Primitives-Vector
Ruby 1.9.1 is about twice as fast as PHP, and a little bit faster than Perl, according to some benchmarks.
(Update: My source is this (screenshot). I don't know what his source is, though.)
Ruby is not slow. The old 1.8 is, but the current Ruby isn't.
Ruby is slow because it was designed to optimize the programmer's experience, not the program's execution time. Slowness is just a symptom of that design decision. If you would prefer performance to pleasure, you should probably use a different language. Ruby's not for everything.
IMO, dynamic languages are all slow in general. They do at runtime what static languages do at compile time: syntax checking, interpreting, and things like type checking and conversion. This is inevitable, and therefore Ruby is slower than C/C++/Java; correct me if I am wrong.

What are your strategies to keep the memory usage low?

Ruby is truly memory-hungry, but also worth every single bit.
What do you do to keep memory usage low? Do you avoid big strings and use smaller arrays/hashes instead, or is it nothing to worry about, letting the garbage collector do the job?
Edit: I found a nice article about this topic here - old but still interesting.
I've found Phusion's Ruby Enterprise Edition (a fork of mainline Ruby with much-improved garbage collection) to make a dramatic difference in memory usage... Plus, they've made it extraordinarily easy to install (and to remove, if you find the need).
You can find out more and download it on their website.
I really don't think it matters all that much.
Making your code less readable in order to improve memory consumption is something you should only ever do if you need it. And by need, I mean have a specific case for the performance profile and specific metrics that indicate that any change will address the issue.
If you have an application where memory is going to be the limiting factor, then Ruby may not be the best choice. That said, I have found that my Rails apps generally consume about 40-60MB of RAM per Mongrel instance. In the scheme of things, this isn't very much.
You might be able to run your application on the JVM with JRuby; the Ruby VM is currently not as advanced as the JVM in memory management and garbage collection. The 1.9 release adds many improvements, and there are alternative VMs under development as well.
Choose data structures that are efficient representations, scale well, and do what you need.
Use algorithms that work with efficient data structures rather than bloated but easier ones.
Look elsewhere. Ruby has a C bridge, and it's much easier to be memory-conscious in C than in Ruby.
Ruby developers are quite lucky since they don’t have to manage the memory themselves.
Be aware that Ruby allocates objects. For instance, something as simple as
100.times{ 'foo' }
allocates 100 string objects (strings are mutable, and each evaluation requires its own memory allocation).
If you are using a library that allocates a lot of objects, make sure no alternatives are available and that your choice is worth the garbage collector cost. (You might not have a lot of requests/s, or might not care about a few dozen ms per request.)
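You can observe these allocations yourself with GC.stat's cumulative allocation counter (available since Ruby 2.1). The allocated_objects helper below is a hypothetical name, and exact counts vary slightly by Ruby version and settings such as frozen string literals:

```ruby
# Count how many objects a block allocates, using the monotonic
# total_allocated_objects counter (GC runs never decrease it).
def allocated_objects
  before = GC.stat[:total_allocated_objects]
  yield
  GC.stat[:total_allocated_objects] - before
end

p allocated_objects { 100.times { 'foo' } }  # roughly 100 new Strings
p allocated_objects { 100.times { :foo } }   # symbols are interned: close to zero
```

This kind of quick check is handy for deciding whether a library's allocation behavior is actually worth worrying about.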
Creating a hash really allocates more than one object; for instance
{'joe' => 'male', 'jane' => 'female'}
allocates not 1 object but 7 (one hash and six strings: the four literal strings, plus two extra key strings, because Hash copies and freezes its string keys).
If you can, use symbol keys, as they won't be garbage collected. However, because they won't be garbage collected, you want to make sure not to use fully dynamic keys, like converting a username to a symbol; otherwise you will 'leak' memory. (Note: since Ruby 2.2, dynamically created symbols can be garbage collected; the numbers below are from a 1.9.2 build, where they cannot.)
Example: somewhere in your app, you apply to_sym to a user's name, like:
hash[current_user.name.to_sym] = something
When you have hundreds of users, that could be OK, but what happens if you have one million users? Here are the numbers:
ruby-1.9.2-head >
# Current memory usage : 6608K
# Now, add one million randomly generated short symbols
ruby-1.9.2-head > 1000000.times { (Time.now.to_f.to_s).to_sym }
# Current memory usage : 153M, even after a Garbage collector run.
# Now, imagine if symbols are just 20x longer than that ?
ruby-1.9.2-head > 1000000.times { (Time.now.to_f.to_s * 20).to_sym }
# Current memory usage : 501M
Be careful never to convert uncontrolled input into symbols (or validate the arguments first); this can easily lead to a denial of service.
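The safer pattern is to keep user-controlled input as String hash keys rather than interning it into symbols (user_name below is a stand-in for untrusted input):

```ruby
# String keys stay garbage-collectable; interned symbols (pre-2.2)
# would live forever.
sessions  = {}
user_name = "user_#{rand(1_000_000)}"    # untrusted, unbounded input

sessions[user_name] = :logged_in         # safe: plain String key
# sessions[user_name.to_sym] = :logged_in  # risky on pre-2.2 Rubies

p sessions.key?(user_name)  # => true
```

If you genuinely need symbol keys, whitelist the allowed values first so the symbol table stays bounded.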
(Aside: limiting the nesting of loops and functions to about three levels is a reasonable rule of thumb for maintainability, but nesting depth by itself has no direct effect on memory usage.)
Here are some related links:
http://merbist.com
http://blog.monitis.com
When deploying a Rails/Rack webapp, use REE or some other copy-on-write friendly interpreter.
Tweak the garbage collector (see https://www.engineyard.com/blog/tuning-the-garbage-collector-with-ruby-1-9-2 for example)
Try to cut down the number of external libraries/gems you use since additional code uses memory.
If you have a part of your app that is really memory-intensive, then it may be worth rewriting it as a C extension, or replacing it with invocations of other/faster/better-optimized programs (if you have to process vast amounts of text data, maybe you can replace that code with calls to grep, awk, sed, etc.).
I am not a Ruby developer, but I think some techniques and methods hold true in any language:
Use the minimum size of variable suitable for the job
Destroy and close variables and connections when no longer in use
However, if you have an object you will need many times, consider keeping it in scope
For any loop that manipulates a big string, do the work on a smaller string and then append it to the bigger string
Use decent error handling (try/catch/finally) to make sure objects and connections are closed
When dealing with data sets, only return the minimum necessary
Other than in extreme cases, memory usage isn't something to worry about. The time you'd spend trying to reduce memory usage would buy a LOT of gigabytes.
Take a look at Small Memory Software - Patterns for Systems with Limited Memory. You don't specify what sort of memory constraint, but I assume RAM. While not Ruby-specific, I think you'll find some useful ideas in this book - the patterns cover RAM, ROM and secondary storage, and are divided into major techniques of small data structures, memory allocation, compression, secondary storage, and small architecture.
The only thing we've ever had which has actually been worth worrying about is RMagick.
The solution is to make sure you're using RMagick version 2 and to call Image#destroy! when you're done with your image.
Avoid code like this:
str = ''
veryLargeArray.each do |foo|
  str += foo
  # but str << foo is fine (read update below)
end
which creates each intermediate string value as a String object and then drops its only reference on the next iteration. This junks up memory with tons of increasingly long strings that have to be garbage collected.
Instead, use Array#join:
str = veryLargeArray.join('')
This is implemented in C very efficiently and doesn't incur the String creation overhead.
UPDATE: Jonas is right in the comment below. My warning holds for += but not <<.
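A rough comparison of the three approaches (absolute timings vary by machine and Ruby version; the point is the relative gap between += and the other two):

```ruby
require 'benchmark'

words = Array.new(5_000) { 'chunk' }

Benchmark.bm(6) do |x|
  x.report('+=')   { s = String.new; words.each { |w| s += w } }  # new String every iteration
  x.report('<<')   { s = String.new; words.each { |w| s << w } }  # appends in place
  x.report('join') { words.join }
end
```

Both << and join avoid the quadratic copying that += incurs as the accumulated string grows.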
I'm pretty new at Ruby, but so far I haven't found it necessary to do anything special in this regard (that is, beyond what I just tend to do as a programmer generally). Maybe this is because memory is cheaper than the time it would take to seriously optimize for it (my Ruby code runs on machines with 4-12 GB of RAM). It might also be because the jobs I'm using it for are not long-running (i.e. it's going to depend on your application).
I'm using Python, but I guess the strategies are similar.
I try to use small functions/methods, so that local variables get automatically garbage collected when you return to the caller.
In larger functions/methods I explicitly delete large temporary objects (like lists) when they are no longer needed. Closing resources as early as possible might help too.
Something to keep in mind is the life cycle of your objects. If your objects are not passed around much, the garbage collector will eventually kick in and free them. However, if you keep referencing them, it may take some cycles for the garbage collector to free them. This is particularly true in Ruby 1.8, where the garbage collector uses a poor implementation of the mark-and-sweep technique.
You may run into this situation when you try to apply some "design patterns", like Decorator, that keep objects in memory for a long time. It may not be obvious when trying an example in isolation, but in real-world applications where thousands of objects are created at the same time, the cost of memory growth will be significant.
When possible, use arrays instead of other data structures. Try not to use floats when integers will do.
Be careful when using gem/library methods; they may not be memory-optimized. For example, the Ruby PG::Result class has a 'values' method that is not optimized and will use a lot of extra memory. I have yet to report this.
Replacing the malloc(3) implementation with jemalloc will immediately decrease your memory consumption by up to 30%. I've created the 'jemalloc' gem to achieve this instantly.
'jemalloc' GEM: Inject jemalloc(3) into your Ruby app in 3 min
I try to keep arrays, lists, and datasets as small as possible. Individual objects don't matter much, as creation and garbage collection are pretty fast in most modern languages.
In cases where you have to read some huge dataset from the database, make sure to read it in a forward-only manner and process it in small batches instead of loading everything into memory first.
Don't use a lot of symbols; they stay in memory until the process is killed, because symbols never get garbage collected. (This was true up to Ruby 2.1; since 2.2, dynamically created symbols can be collected.)
