I'm developing a simple desktop GUI (Swing) app in JRuby 9.0.0.0-pre2 (the latest version in rvm), which is a step up from 1.7.19. It uses Mechanize to access a corporate website and upload a file. The app has a JFrame with 2 images (a few KB each), a JButton, and a bit of text, and it takes about 8 seconds to show the window. These loading times are unacceptable.
The built-in profiler (jruby --profile script.rb) shows this:
total   self  children  calls  method
 5.36   0.03      5.32    806  Kernel.require
 4.54   0.00      4.53    371  Kernel.require
 4.53   0.00      4.53      8  Kernel.require_relative
 1.28   0.08      1.20   2691  Array#each
 1.20   0.10      1.11     35  Kernel.load
Besides Array#each, all of these are Kernel methods. Is this what Aaron Patterson was talking about at RailsConf 2015? Or is this specific to the JRuby implementation? Can I speed this up? The --client flag didn't help, and I'm not sure I could even turn it on once I warble the app into a jar, even if it did help.
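One mitigation I'm considering (a minimal sketch, assuming the require cost can simply be deferred; the structure below is illustrative, not my actual app): show the Swing window first and load the heavy gems in a background thread, so the UI appears before the requires finish.

require 'java'

frame = javax.swing.JFrame.new('Uploader')
frame.set_size(400, 300)
frame.set_default_close_operation(javax.swing.JFrame::EXIT_ON_CLOSE)
frame.set_visible(true)            # the window appears almost immediately

Thread.new do
  require 'mechanize'              # the expensive require happens off the UI thread
  # ... set up the Mechanize agent and enable the upload button here ...
end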
From reading the docs at https://github.com/ruby-prof/ruby-prof:
It seems that some patches were originally required for these features,
for allocations:
http://rubyforge.org/tracker/index.php?func=detail&aid=11497&group_id=426&atid=1700
for memory use:
http://rubyforge.org/tracker/index.php?func=detail&aid=17676&group_id=1814&atid=7062
The latter of these claims to have been applied.
This github issue seems to back that up:
https://github.com/ruby-prof/ruby-prof/issues/86
But I have had absolutely no luck getting either of these modes to work in a Ruby 2.1.1 install, e.g.:
require 'ruby-prof'

RubyProf.measure_mode = RubyProf::MEMORY
RubyProf.start
1000.times do
  s = "string"   # allocate a fresh String on each iteration
end
RubyProf::FlatPrinter.new(RubyProf.stop).print
Produces:
Thread ID: 4183000
Fiber ID: 8413080
Total: 0.000000
Sort by: self_time

 %self  total   self   wait  child  calls  name
   NaN  0.000  0.000  0.000  0.000      1  Integer#times
   NaN  0.000  0.000  0.000  0.000      1  Global#[No method]
I even went so far as to try installing Ruby 1.9.3 to do this profiling, but rvm seems unable to find any of the old patches. Is there some way to get this working, or have these features been abandoned?
According to a ruby-prof issue, you need a patched version of Ruby to use the memory profiler:
https://github.com/ruby-prof/ruby-prof/issues/166
https://github.com/skaes/rvm-patchsets
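If I read the rvm-patchsets README correctly, installing a patched Ruby looks roughly like this (check the repo for the exact patchset name and supported versions):

rvm install 2.1.1 --patch railsexpress -n railsexpress
rvm use 2.1.1-railsexpress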
I have an R script that loads ggplot2 on its first line.
Loading a library doesn't take much time by itself, but this script may be executed from the command line millions of times, so the speed really matters to me.
Is there a way to speed up this loading process?
Don't restart -- keep a persistent R session and just issue requests to it. Something like Rserve can provide this, and FastRWeb, for example, uses it very well -- with millisecond round-trips for chart generation.
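A minimal sketch of that setup, assuming the Rserve and RSclient packages are installed (the package names and functions are real; the rest is illustrative):

## server side: one persistent R process pays the ggplot2 load cost once
library(Rserve)
Rserve(args = "--no-save")        # starts the server in the background

## client side: each command-line invocation just talks to the running session
library(RSclient)
conn <- RS.connect()              # default host/port: localhost:6311
RS.eval(conn, library(ggplot2))   # effectively a no-op after the first call
RS.close(conn)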
As an addition to @MikeDunlavey's answer:
Actually, both library and require check whether the package is already loaded.
Here are some timings I get with microbenchmark:
> microbenchmark(!exists("qplot"),
                 !existsFunction("qplot"),
                 require("ggplot2"),
                 library("ggplot2"),
                 "package:ggplot2" %in% search())
## results reordered by descending median:
Unit: microseconds
                             expr     min       lq   median       uq     max
3              library("ggplot2") 259.720 262.8700 266.3405 271.7285 448.749
1        !existsFunction("qplot")  79.501  81.8770  83.7870  89.2965 114.182
5              require("ggplot2")  12.556  14.3755  15.5125  16.1325  33.526
4 "package:ggplot2" %in% search()   4.315   5.3225   6.0010   6.5475   9.201
2                !exists("qplot")   3.370   4.4250   5.0300   6.2375  12.165
For comparison, loading for the first time:
> system.time(library(ggplot2))
   user  system elapsed
  0.284   0.016   0.300
(these are seconds!)
In the end, as long as the factor of 3 (about 10 μs) between require and "package:ggplot2" %in% search() doesn't matter, I'd go with require; otherwise, with the %in% search() check.
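In practice the guard is a one-liner; a sketch combining the fastest check above with the actual load:

if (!"package:ggplot2" %in% search()) library(ggplot2)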
What Dirk said, plus you can use the exists function to conditionally load a library, as in:
if (!exists("some.function.defined.in.the.library")) {
  library(the.library)   # placeholders: substitute a function and package of your own
}
So if you put that in the script you can run the script more than once in the same R session.
Mahoro is a libmagic wrapper. Right now my process for reading in a file is:
filetype = Mahoro.new.file(full_path)
File.open(full_path, get_access_string(filetype)) do |f|
  # ... parse the file ...
end
The problem is that Mahoro seems to read the entire file, and not just the header strings. So I get a profiling result like:
 %self  total  self  wait  child  calls  name
  6.02   0.26  0.26  0.00   0.00      1  Mahoro#file
  5.81   4.36  0.25  0.00   4.11      1  Parser#read_from_file
Each is taking 0.25 seconds, which implies that they are duplicating each other's work. Is there a way to hand libmagic the file's contents as a string, so the file only has to be read once? That seems to be the only way to make this process more efficient.
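I'm wondering whether something like this would work (Mahoro#buffer is my assumption about the API, based on libmagic's magic_buffer; worth verifying):

contents = File.binread(full_path)       # read the file a single time
filetype = Mahoro.new.buffer(contents)   # assumed wrapper around magic_buffer
# ... parse `contents` directly instead of re-opening the file ...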
I have been using Ruby for a while now and I find, for bigger projects, it can take up a fair amount of memory. What are some best practices for reducing memory usage in Ruby?
Please keep each answer to one "best practice" and let the community vote it up.
When working with huge arrays of ActiveRecord objects, be very careful. When processing those objects in a loop, if on each iteration you load their related objects using ActiveRecord's has_many, belongs_to, etc., memory usage grows a lot, because each object that belongs to the array grows.
The following technique helped us a lot (simplified example):
students.each do |student|
  cloned_student = student.clone
  ...
  cloned_student.books.detect {...}
  ca_teachers = cloned_student.teachers.detect {|teacher| teacher.address.state == 'CA'}
  ca_teachers.blah_blah
  ...
  # Not sure if the following is necessary, but we have it just in case...
  cloned_student = nil
end
In the code above, cloned_student is the object that grows, but since it is "nullified" at the end of each iteration, this is not a problem even for a huge array of students. If we didn't clone, the loop variable student itself would have grown; because it belongs to the array, the memory it uses is never released for as long as the array object exists.
A different approach works too:
students.each do |student|
  loop_student = Student.find(student.id) # just re-find the record into a local variable
  ...
  loop_student.books.detect {...}
  ca_teachers = loop_student.teachers.detect {|teacher| teacher.address.state == 'CA'}
  ca_teachers.blah_blah
  ...
end
In our production environment we had a background process that once failed to finish because 8 GB of RAM wasn't enough for it. After this small change it uses less than 1 GB to process the same amount of data.
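A related option, if your Rails version has it: ActiveRecord's find_each loads records in batches instead of materializing the whole array, so each batch can be garbage-collected before the next one is fetched. A sketch using the models from the example above:

Student.find_each(batch_size: 500) do |student|
  # each batch of 500 students is loaded, processed, and becomes garbage
  # before the next batch is fetched
  ca_teachers = student.teachers.detect {|teacher| teacher.address.state == 'CA'}
  ca_teachers.blah_blah
end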
Don't abuse symbols.
Each time you create a symbol, Ruby puts an entry in its symbol table. The symbol table is a global hash which never gets emptied.
This is not technically a memory leak, but it behaves like one. Symbols don't take up much memory so you don't need to be too paranoid, but it pays to be aware of this.
A general guideline: if you've actually typed the symbol in code, it's fine (you only have a finite amount of code, after all), but don't call to_sym on dynamically generated or user-input strings, as this opens the door to a potentially ever-increasing number of symbols.
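A quick way to watch this happen, using the standard Symbol.all_symbols (note that Ruby 2.2 later added symbol GC, which can reclaim dynamically created symbols; this answer predates that):

before = Symbol.all_symbols.size
1000.times { |i| "dynamic_#{i}".to_sym }   # symbols built from user input behave the same
puts Symbol.all_symbols.size - before      # => 1000, never reclaimed on pre-2.2 Rubies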
Don't do this:
def method(x)
  x.split(/,/)   # the actual arguments don't matter
end
or this:
def method(x)
  x.gsub(/foo/, 'bar')   # again, the arguments don't matter
end
Both will permanently leak memory in Ruby 1.8.5 and 1.8.6 (not sure about 1.8.7, as I haven't tried it, but I really hope it's fixed). The workaround is stupid and involves creating a local variable. You don't have to use the local, just create one...
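A sketch of that workaround, exactly as described (the variable name is arbitrary and deliberately unused):

def method(x)
  unused = nil   # merely declaring a local avoided the leak on those 1.8.x versions
  x.split(/,/)
end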
Things like this are why I have lots of love for the Ruby language, but no respect for MRI.
Beware of C extensions which allocate large chunks of memory themselves.
As an example, when you load an image using RMagick, the entire bitmap gets loaded into memory inside the Ruby process. This may be 30 MB or so, depending on the size of the image.
However, most of this memory has been allocated by RMagick itself. All Ruby knows about is a wrapper object, which is tiny (1).
Ruby only thinks it's holding onto a tiny amount of memory, so it won't bother running the GC. In actual fact it's holding onto 30 MB.
If you loop over, say, 10 images, you can run yourself out of memory really fast.
The preferred solution is to manually tell the C library to clean up the memory itself - RMagick has a destroy! method which does this. If your library doesn't, however, you may need to forcibly run the GC yourself, even though this is generally discouraged.
(1): Ruby C extensions have callbacks which will get run when the ruby runtime decides to free them, so the memory will eventually be successfully freed at some point, just perhaps not soon enough.
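A minimal sketch of the destroy! approach (Magick::Image.read and destroy! are real RMagick APIs; the paths and sizes are made up):

require 'rmagick'

Dir.glob('photos/*.jpg').each do |path|
  img = Magick::Image.read(path).first   # the full bitmap now lives in C memory
  begin
    thumb = img.resize_to_fit(100, 100)
    thumb.write("thumbs/#{File.basename(path)}")
  ensure
    thumb && thumb.destroy!
    img.destroy!                         # release the C-side bitmap immediately
  end
end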
Measure and detect which parts of your code are creating objects that cause memory usage to go up. Improve and modify your code, then measure again. Sometimes you're using gems or libraries that use up a lot of memory and create a lot of objects as well.
There are many tools out there such as busy-administrator that allow you to check the memory size of objects (including those inside hashes and arrays).
$ gem install busy-administrator
Example # 1: MemorySize.of
require 'busy-administrator'
data = BusyAdministrator::ExampleGenerator.generate_string_with_specified_memory_size(10.mebibytes)
puts BusyAdministrator::MemorySize.of(data)
# => 10 MiB
Example # 2: MemoryUtils.profile
Code
require 'busy-administrator'

results = BusyAdministrator::MemoryUtils.profile(gc_enabled: false) do |analyzer|
  BusyAdministrator::ExampleGenerator.generate_string_with_specified_memory_size(10.mebibytes)
end

BusyAdministrator::Display.debug(results)
Output:
{
  memory_usage:
  {
    before: 12 MiB
    after: 22 MiB
    diff: 10 MiB
  }
  total_time: 0.406452
  gc:
  {
    count: 0
    enabled: false
  }
  specific:
  {
  }
  object_count: 151
  general:
  {
    String: 10 MiB
    Hash: 8 KiB
    BusyAdministrator::MemorySize: 0 Bytes
    Process::Status: 0 Bytes
    IO: 432 Bytes
    Array: 326 KiB
    Proc: 72 Bytes
    RubyVM::Env: 96 Bytes
    Time: 176 Bytes
    Enumerator: 80 Bytes
  }
}
You can also try ruby-prof and memory_profiler. It is better if you test and experiment with different versions of your code, so you can measure the memory usage and performance of each one. This will let you check whether your optimization really worked. You usually use these tools in development / testing mode and turn them off in production.
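For instance, a minimal sketch with the memory_profiler gem (MemoryProfiler.report is its documented entry point; the block contents are just a stand-in for your own code):

require 'memory_profiler'

report = MemoryProfiler.report do
  100.times { 'some string' * 100 }   # code under test goes here
end
report.pretty_print                   # allocated/retained objects by gem, file, and class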