How to eliminate JIT overhead in a Julia executable (with MWE) - compilation

I'm using PackageCompiler hoping to create an executable that eliminates just-in-time compilation overhead.
The documentation explains that I must define a function julia_main to call my program's logic, and write a "snoop file", a script that calls functions I wish to precompile. My julia_main takes a single argument, the location of a file containing the input data to be analysed. So to keep things simple my snoop file simply makes one call to julia_main with a particular input file. So I'd hope to see the generated executable run nice and fast (no compilation overhead) when executed against that same input file.
But alas, that's not what I see. In a fresh Julia instance julia_main takes approx 74 seconds for the first execution and about 4.5 seconds for subsequent executions. The executable file takes approx 50 seconds each time it's called.
My use of the build_executable function looks like this:
julia> using PackageCompiler
julia> build_executable("d:/philip/source/script/julia/jsource/SCRiPTMain.jl",
"testexecutable",
builddir = "d:/temp/builddir4",
snoopfile = "d:/philip/source/script/julia/jsource/snoop.jl",
compile = "all",
verbose = true)
Questions:
Are the above arguments correct to achieve my aim of an executable with no JIT overhead?
Any other advice for me?
Here's what happens in response to that call to build_executable. The lines from Start of snoop file execution! to End of snoop file execution! are emitted by my code.
Julia program file:
"d:\philip\source\script\julia\jsource\SCRiPTMain.jl"
C program file:
"C:\Users\Philip\.julia\packages\PackageCompiler\CJQcs\examples\program.c"
Build directory:
"d:\temp\builddir4"
Executing snoopfile: "d:\philip\source\script\julia\jsource\snoop.jl"
Start of snoop file execution!
┌ Warning: The 'control file' contains the key 'InterpolateCovariance' with value 'true' but that is not supported. Pass a value of 'false' or omit the key altogether.
└ # ValidateInputs d:\Philip\Source\script\Julia\JSource\ValidateInputs.jl:685
Time to build model 20.058000087738037
Saving c:/temp/SCRiPT/SCRiPTModel.jls
Results written to c:/temp/SCRiPT/SCRiPTResultsJulia.json
Time to write file: 3620 milliseconds
Time in method runscript: 76899 milliseconds
End of snoop file execution!
[ Info: used 1313 out of 1320 precompile statements
Build static library "testexecutable.a":
atexit_hook_copy = copy(Base.atexit_hooks) # make backup
# clean state so that any package we use can carelessly call atexit
empty!(Base.atexit_hooks)
Base.__init__()
Sys.__init__() #fix https://github.com/JuliaLang/julia/issues/30479
using REPL
Base.REPL_MODULE_REF[] = REPL
Mod = #eval module $(gensym("anon_module")) end
# Include into anonymous module to not polute namespace
Mod.include("d:\\\\temp\\\\builddir4\\\\julia_main.jl")
Base._atexit() # run all exit hooks we registered during precompile
empty!(Base.atexit_hooks) # don't serialize the exit hooks we run + added
# atexit_hook_copy should be empty, but who knows what base will do in the future
append!(Base.atexit_hooks, atexit_hook_copy)
Build shared library "testexecutable.dll":
`'C:\Users\Philip\.julia\packages\WinRPM\Y9QdZ\deps\usr\x86_64-w64-mingw32\sys-root\mingw\bin\gcc.exe' --sysroot 'C:\Users\Philip\.julia\packages\WinRPM\Y9QdZ\deps\usr\x86_64-w64-mingw32\sys-root' -shared '-DJULIAC_PROGRAM_LIBNAME="testexecutable.dll"' -o testexecutable.dll -Wl,--whole-archive testexecutable.a -Wl,--no-whole-archive -std=gnu99 '-IC:\Users\philip\AppData\Local\Julia-1.2.0\include\julia' -DJULIA_ENABLE_THREADING=1 '-LC:\Users\philip\AppData\Local\Julia-1.2.0\bin' -Wl,--stack,8388608 -ljulia -lopenlibm -m64 -Wl,--export-all-symbols`
Build executable "testexecutable.exe":
`'C:\Users\Philip\.julia\packages\WinRPM\Y9QdZ\deps\usr\x86_64-w64-mingw32\sys-root\mingw\bin\gcc.exe' --sysroot 'C:\Users\Philip\.julia\packages\WinRPM\Y9QdZ\deps\usr\x86_64-w64-mingw32\sys-root' '-DJULIAC_PROGRAM_LIBNAME="testexecutable.dll"' -o testexecutable.exe 'C:\Users\Philip\.julia\packages\PackageCompiler\CJQcs\examples\program.c' testexecutable.dll -std=gnu99 '-IC:\Users\philip\AppData\Local\Julia-1.2.0\include\julia' -DJULIA_ENABLE_THREADING=1 '-LC:\Users\philip\AppData\Local\Julia-1.2.0\bin' -Wl,--stack,8388608 -ljulia -lopenlibm -m64`
Copy Julia libraries to build directory:
7z.dll
BugpointPasses.dll
libamd.2.4.6.dll
libamd.2.dll
libamd.dll
libatomic-1.dll
libbtf.1.2.6.dll
libbtf.1.dll
libbtf.dll
libcamd.2.4.6.dll
libcamd.2.dll
libcamd.dll
libccalltest.dll
libccolamd.2.9.6.dll
libccolamd.2.dll
libccolamd.dll
libcholmod.3.0.13.dll
libcholmod.3.dll
libcholmod.dll
libclang.dll
libcolamd.2.9.6.dll
libcolamd.2.dll
libcolamd.dll
libdSFMT.dll
libexpat-1.dll
libgcc_s_seh-1.dll
libgfortran-4.dll
libgit2.dll
libgmp.dll
libjulia.dll
libklu.1.3.8.dll
libklu.1.dll
libklu.dll
libldl.2.2.6.dll
libldl.2.dll
libldl.dll
libllvmcalltest.dll
libmbedcrypto.dll
libmbedtls.dll
libmbedx509.dll
libmpfr.dll
libopenblas64_.dll
libopenlibm.dll
libpcre2-8-0.dll
libpcre2-8.dll
libpcre2-posix-2.dll
libquadmath-0.dll
librbio.2.2.6.dll
librbio.2.dll
librbio.dll
libspqr.2.0.9.dll
libspqr.2.dll
libspqr.dll
libssh2.dll
libssp-0.dll
libstdc++-6.dll
libsuitesparseconfig.5.4.0.dll
libsuitesparseconfig.5.dll
libsuitesparseconfig.dll
libsuitesparse_wrapper.dll
libumfpack.5.7.8.dll
libumfpack.5.dll
libumfpack.dll
libuv-2.dll
libwinpthread-1.dll
LLVM.dll
LLVMHello.dll
zlib1.dll
All done
julia>
EDIT
I was afraid that creating a minimal working example would be hard, but it was straightforward:
TestBuildExecutable.jl contains:
module TestBuildExecutable
Base.#ccallable function julia_main(ARGS::Vector{String}=[""])::Cint
#show sum(myarray())
return 0
end
#Function which takes approx 8 seconds to compile. Returns a 500 x 20 array of 1s
function myarray()
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;
# PLEASE EDIT TO INSERT THE MISSING 496 LINES, EACH IDENTICAL TO THE LINE ABOVE!
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
end
end #module
SnoopFile.jl contains:
module SnoopFile
currentpath = dirname(#__FILE__)
push!(LOAD_PATH, currentpath)
unique!(LOAD_PATH)
using TestBuildExecutable
println("Start of snoop file execution!")
TestBuildExecutable.julia_main()
println("End of snoop file execution!")
end # module
In a fresh Julia instance, julia_main takes 8.3 seconds for the first execution and half a millisecond for the second execution:
julia> #time TestBuildExecutable.julia_main()
sum(myarray()) = 10000
8.355108 seconds (425.36 k allocations: 25.831 MiB, 0.06% gc time)
0
julia> #time TestBuildExecutable.julia_main()
sum(myarray()) = 10000
0.000537 seconds (25 allocations: 82.906 KiB)
0
So next I call build_executable:
julia> using PackageCompiler
julia> build_executable("d:/philip/source/script/julia/jsource/TestBuildExecutable.jl",
"testexecutable",
builddir = "d:/temp/builddir15",
snoopfile = "d:/philip/source/script/julia/jsource/SnoopFile.jl",
verbose = false)
Julia program file:
"d:\philip\source\script\julia\jsource\TestBuildExecutable.jl"
C program file:
"C:\Users\Philip\.julia\packages\PackageCompiler\CJQcs\examples\program.c"
Build directory:
"d:\temp\builddir15"
Start of snoop file execution!
sum(myarray()) = 10000
End of snoop file execution!
[ Info: used 79 out of 79 precompile statements
All done
Finally, in a Windows Command Prompt:
D:\temp\builddir15>testexecutable
sum(myarray()) = 1000
D:\temp\builddir15>
which took (by my stopwatch) 8 seconds to run, and it takes 8 seconds to run every time it's executed, not just the first time. This is consistent with the executable doing a JIT compile every time it's run, but the snoop file is designed to avoid that!
Version information:
julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-6700 CPU # 3.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 8
JULIA_EDITOR = "C:\Users\Philip\AppData\Local\Programs\Microsoft VS Code\Code.exe"

Looks like you are using Windows.
At some point PackageCompiler.jl will be mature for Windows at which you can try it.

The solution was indeed to wait for progress on PackageCompilerX, as suggested by #xiaodai.
On 10 Feb 2020 what was formerly PackageCompilerX became a new (version 1.0 of) PackageCompiler, with a significantly changed API, and more thorough documentation.
In particular, the MWE above (mutated for the new API to PackageCompiler) now works correctly without any JIT overhead.

Related

Julia parallel computing in IPython Jupyter

I'm preparing a small presentation in Ipython where I want to show how easy it is to do parallel operation in Julia.
It's basically a Monte Carlo Pi calculation described here
The problem is that I can't make it work in parallel inside an IPython (Jupyter) Notebook, it only uses one.
I started Julia as: julia -p 4
If I define the functions inside the REPL and run it there it works ok.
#everywhere function compute_pi(N::Int)
"""
Compute pi with a Monte Carlo simulation of N darts thrown in [-1,1]^2
Returns estimate of pi
"""
n_landed_in_circle = 0
for i = 1:N
x = rand() * 2 - 1 # uniformly distributed number on x-axis
y = rand() * 2 - 1 # uniformly distributed number on y-axis
r2 = x*x + y*y # radius squared, in radial coordinates
if r2 < 1.0
n_landed_in_circle += 1
end
end
return n_landed_in_circle / N * 4.0
end
 
function parallel_pi_computation(N::Int; ncores::Int=4)
"""
Compute pi in parallel, over ncores cores, with a Monte Carlo simulation throwing N total darts
"""
# compute sum of pi's estimated among all cores in parallel
sum_of_pis = #parallel (+) for i=1:ncores
compute_pi(int(N/ncores))
end
return sum_of_pis / ncores # average value
end
 
julia> #time parallel_pi_computation(int(1e9))
elapsed time: 2.702617652 seconds (93400 bytes allocated)
3.1416044160000003
But when I do:
using IJulia
notebook()
And try to do the same thing inside the Notebook it only uses 1 core:
In [5]: #time parallel_pi_computation(int(10e8))
elapsed time: 10.277870808 seconds (219188 bytes allocated)
Out[5]: 3.141679988
So, why isnt Jupyter using all the cores? What can I do to make it work?
Thanks.
Using addprocs(4) as the first command in your notebook should provide four workers for doing parallel operations from within your notebook.
One way to solve this is to create a kernel that always uses 4 cores. For that some manual work is required. I assume that you are on a unix machine.
In the folder ~/.ipython/kernels/julia-0.x, you will find following kernel.json file:
{
"display_name": "Julia 0.3.9",
"argv": [
"/usr/local/Cellar/julia/0.3.9_1/bin/julia",
"-i",
"-F",
"/Users/ch/.julia/v0.3/IJulia/src/kernel.jl",
"{connection_file}"
],
"language": "julia"
}
If you copy the whole folder cp -r julia-0.x julia-0.x-p4, and modify the newly copied kernel.json file:
{
"display_name": "Julia 0.3.9 p4",
"argv": [
"/usr/local/Cellar/julia/0.3.9_1/bin/julia",
"-p",
"4",
"-i",
"-F",
"/Users/ch/.julia/v0.3/IJulia/src/kernel.jl",
"{connection_file}"
],
"language": "julia"
}
The paths will probably be different for you. Note that I only gave the kernel a new name and added the command line argument `-p 4.
You should see a new kernel named Julia 0.3.9 p4 which should always use 4 cores.
Also note that this kernel file will not get updated when you update IJulia, so you have to update it manually whenever you update julia or IJulia.
You can add new kernels using this command:
using IJulia
#for 4 cores
installkernel("Julia_4_threads", env=Dict("JULIA_NUM_THREADS"=>"4"))
#or for 8 cores
installkernel("Julia_8_threads", env=Dict("JULIA_NUM_THREADS"=>"8"))
After restart your VSCode this options will apear you your select kernel option.

Garbage collector in Ruby 2.2 provokes unexpected CoW

How do I prevent the GC from provoking copy-on-write, when I fork my process ? I have recently been analyzing the garbage collector's behavior in Ruby, due to some memory issues that I encountered in my program (I run out of memory on my 60core 0.5Tb machine even for fairly small tasks). For me this really limits the usefulness of ruby for running programs on multicore servers. I would like to present my experiments and results here.
The issue arises when the garbage collector runs during forking. I have investigated three cases that illustrate the issue.
Case 1: We allocate a lot of objects (strings no longer than 20 bytes) in the memory using an array. The strings are created using a random number and string formatting. When the process forks and we force the GC to run in the child, all the shared memory goes private, causing a duplication of the initial memory.
Case 2: We allocate a lot of objects (strings) in the memory using an array, but the string is created using the rand.to_s function, hence we remove the formatting of the data compared to the previous case. We end up with a smaller amount of memory being used, presumably due to less garbage. When the process forks and we force the GC to run in the child, only part of the memory goes private. We have a duplication of the initial memory, but to a smaller extent.
Case 3: We allocate fewer objects compared to before, but the objects are bigger, such that the amount of memory allocated stays the same as in the previous cases. When the process forks and we force the GC to run in the child all the memory stays shared, i.e. no memory duplication.
Here I paste the Ruby code that has been used for these experiments. To switch between cases you only need to change the “option” value in the memory_object function. The code was tested using Ruby 2.2.2, 2.2.1, 2.1.3, 2.1.5 and 1.9.3 on an Ubuntu 14.04 machine.
Sample output for case 1:
ruby version 2.2.2
proces pid log priv_dirty shared_dirty
Parent 3897 post alloc 38 0
Parent 3897 4 fork 0 37
Child 3937 4 initial 0 37
Child 3937 8 empty GC 35 5
The exact same code has been written in Python and in all cases the CoW works perfectly fine.
Sample output for case 1:
python version 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2]
proces pid log priv_dirty shared_dirty
Parent 4308 post alloc 35 0
Parent 4308 4 fork 0 35
Child 4309 4 initial 0 35
Child 4309 10 empty GC 1 34
Ruby code
$start_time=Time.new
# Monitor use of Resident and Virtual memory.
class Memory
shared_dirty = '.+?Shared_Dirty:\s+(\d+)'
priv_dirty = '.+?Private_Dirty:\s+(\d+)'
MEM_REGEXP = /#{shared_dirty}#{priv_dirty}/m
# get memory usage
def self.get_memory_map( pids)
memory_map = {}
memory_map[ :pids_found] = {}
memory_map[ :shared_dirty] = 0
memory_map[ :priv_dirty] = 0
pids.each do |pid|
begin
lines = nil
lines = File.read( "/proc/#{pid}/smaps")
rescue
lines = nil
end
if lines
lines.scan(MEM_REGEXP) do |shared_dirty, priv_dirty|
memory_map[ :pids_found][pid] = true
memory_map[ :shared_dirty] += shared_dirty.to_i
memory_map[ :priv_dirty] += priv_dirty.to_i
end
end
end
memory_map[ :pids_found] = memory_map[ :pids_found].keys
return memory_map
end
# get the processes and get the value of the memory usage
def self.memory_usage( )
pids = [ $$]
result = self.get_memory_map( pids)
result[ :pids] = pids
return result
end
# print the values of the private and shared memories
def self.log( process_name='', log_tag="")
if process_name == "header"
puts " %-6s %5s %-12s %10s %10s\n" % ["proces", "pid", "log", "priv_dirty", "shared_dirty"]
else
time = Time.new - $start_time
mem = Memory.memory_usage( )
puts " %-6s %5d %-12s %10d %10d\n" % [process_name, $$, log_tag, mem[:priv_dirty]/1000, mem[:shared_dirty]/1000]
end
end
end
# function to delay the processes a bit
def time_step( n)
while Time.new - $start_time < n
sleep( 0.01)
end
end
# create an object of specified size. The option argument can be changed from 0 to 2 to visualize the behavior of the GC in various cases
#
# case 0 (default) : we make a huge array of small objects by formatting a string
# case 1 : we make a huge array of small objects without formatting a string (we use the to_s function)
# case 2 : we make a smaller array of big objects
def memory_object( size, option=1)
result = []
count = size/20
if option > 3 or option < 1
count.times do
result << "%20.18f" % rand
end
elsif option == 1
count.times do
result << rand.to_s
end
elsif option == 2
count = count/10
count.times do
result << ("%20.18f" % rand)*30
end
end
return result
end
##### main #####
puts "ruby version #{RUBY_VERSION}"
GC.disable
# print the column headers and first line
Memory.log( "header")
# Allocation of memory
big_memory = memory_object( 1000 * 1000 * 10)
Memory.log( "Parent", "post alloc")
lab_time = Time.new - $start_time
if lab_time < 3.9
lab_time = 0
end
# start the forking
pid = fork do
time = 4
time_step( time + lab_time)
Memory.log( "Child", "#{time} initial")
# force GC when nothing happened
GC.enable; GC.start; GC.disable
time = 8
time_step( time + lab_time)
Memory.log( "Child", "#{time} empty GC")
sleep( 1)
STDOUT.flush
exit!
end
time = 4
time_step( time + lab_time)
Memory.log( "Parent", "#{time} fork")
# wait for the child to finish
Process.wait( pid)
Python code
import re
import time
import os
import random
import sys
import gc
start_time=time.time()
# Monitor use of Resident and Virtual memory.
class Memory:
def __init__(self):
self.shared_dirty = '.+?Shared_Dirty:\s+(\d+)'
self.priv_dirty = '.+?Private_Dirty:\s+(\d+)'
self.MEM_REGEXP = re.compile("{shared_dirty}{priv_dirty}".format(shared_dirty=self.shared_dirty, priv_dirty=self.priv_dirty), re.DOTALL)
# get memory usage
def get_memory_map(self, pids):
memory_map = {}
memory_map[ "pids_found" ] = {}
memory_map[ "shared_dirty" ] = 0
memory_map[ "priv_dirty" ] = 0
for pid in pids:
try:
lines = None
with open( "/proc/{pid}/smaps".format(pid=pid), "r" ) as infile:
lines = infile.read()
except:
lines = None
if lines:
for shared_dirty, priv_dirty in re.findall( self.MEM_REGEXP, lines ):
memory_map[ "pids_found" ][pid] = True
memory_map[ "shared_dirty" ] += int( shared_dirty )
memory_map[ "priv_dirty" ] += int( priv_dirty )
memory_map[ "pids_found" ] = memory_map[ "pids_found" ].keys()
return memory_map
# get the processes and get the value of the memory usage
def memory_usage( self):
pids = [ os.getpid() ]
result = self.get_memory_map( pids)
result[ "pids" ] = pids
return result
# print the values of the private and shared memories
def log( self, process_name='', log_tag=""):
if process_name == "header":
print " %-6s %5s %-12s %10s %10s" % ("proces", "pid", "log", "priv_dirty", "shared_dirty")
else:
global start_time
Time = time.time() - start_time
mem = self.memory_usage( )
print " %-6s %5d %-12s %10d %10d" % (process_name, os.getpid(), log_tag, mem["priv_dirty"]/1000, mem["shared_dirty"]/1000)
# function to delay the processes a bit
def time_step( n):
global start_time
while (time.time() - start_time) < n:
time.sleep( 0.01)
# create an object of specified size. The option argument can be changed from 0 to 2 to visualize the behavior of the GC in various cases
#
# case 0 (default) : we make a huge array of small objects by formatting a string
# case 1 : we make a huge array of small objects without formatting a string (we use the to_s function)
# case 2 : we make a smaller array of big objects
def memory_object( size, option=2):
count = size/20
if option > 3 or option < 1:
result = [ "%20.18f"% random.random() for i in xrange(count) ]
elif option == 1:
result = [ str( random.random() ) for i in xrange(count) ]
elif option == 2:
count = count/10
result = [ ("%20.18f"% random.random())*30 for i in xrange(count) ]
return result
##### main #####
print "python version {version}".format(version=sys.version)
memory = Memory()
gc.disable()
# print the column headers and first line
memory.log( "header") # Print the headers of the columns
# Allocation of memory
big_memory = memory_object( 1000 * 1000 * 10) # Allocate memory
memory.log( "Parent", "post alloc")
lab_time = time.time() - start_time
if lab_time < 3.9:
lab_time = 0
# start the forking
pid = os.fork() # fork the process
if pid == 0:
Time = 4
time_step( Time + lab_time)
memory.log( "Child", "{time} initial".format(time=Time))
# force GC when nothing happened
gc.enable(); gc.collect(); gc.disable();
Time = 10
time_step( Time + lab_time)
memory.log( "Child", "{time} empty GC".format(time=Time))
time.sleep( 1)
sys.exit(0)
Time = 4
time_step( Time + lab_time)
memory.log( "Parent", "{time} fork".format(time=Time))
# Wait for child process to finish
os.waitpid( pid, 0)
EDIT
Indeed, calling the GC several times before forking the process solves the issue and I am quite surprised. I have also run the code using Ruby 2.0.0 and the issue doesn't even appear, so it must be related to this generational GC just like you mentioned.
However, if I call the memory_object function without assigning the output to any variables (I am only creating garbage), then the memory is duplicated. The amount of memory that is copied depends on the amount of garbage that I create - the more garbage, the more memory becomes private.
Any ideas how I can prevent this ?
Here are some results
Running the GC in 2.0.0
ruby version 2.0.0
proces pid log priv_dirty shared_dirty
Parent 3664 post alloc 67 0
Parent 3664 4 fork 1 69
Child 3700 4 initial 1 69
Child 3700 8 empty GC 6 65
Calling memory_object( 1000*1000) in the child
ruby version 2.0.0
proces pid log priv_dirty shared_dirty
Parent 3703 post alloc 67 0
Parent 3703 4 fork 1 70
Child 3739 4 initial 1 70
Child 3739 8 empty GC 15 56
Calling memory_object( 1000*1000*10)
ruby version 2.0.0
proces pid log priv_dirty shared_dirty
Parent 3743 post alloc 67 0
Parent 3743 4 fork 1 69
Child 3779 4 initial 1 69
Child 3779 8 empty GC 89 5
UPD2
Suddenly figured out why all the memory is going private if you format the string -- you generate garbage during formatting, having GC disabled, then enable GC, and you've got holes of released objects in your generated data. Then you fork, and new garbage starts to occupy these holes, the more garbage - more private pages.
So i added a cleanup function to run GC each 2000 cycles (just enabling lazy GC didn't help):
count.times do |i|
cleanup(i)
result << "%20.18f" % rand
end
#......snip........#
def cleanup(i)
if ((i%2000).zero?)
GC.enable; GC.start; GC.disable
end
end
##### main #####
Which resulted in(with generating memory_object( 1000 * 1000 * 10) after fork):
RUBY_GC_HEAP_INIT_SLOTS=600000 ruby gc-test.rb 0
ruby version 2.2.0
proces pid log priv_dirty shared_dirty
Parent 2501 post alloc 35 0
Parent 2501 4 fork 0 35
Child 2503 4 initial 0 35
Child 2503 8 empty GC 28 22
Yes, it affects performance, but only before forking, i.e. increase load time in your case.
UPD1
Just found criteria by which ruby 2.2 sets old object bits, it's 3 GC's, so if you add following before forking:
GC.enable; 3.times {GC.start}; GC.disable
# start the forking
you will get(the option is 1 in command line):
$ RUBY_GC_HEAP_INIT_SLOTS=600000 ruby gc-test.rb 1
ruby version 2.2.0
proces pid log priv_dirty shared_dirty
Parent 2368 post alloc 31 0
Parent 2368 4 fork 1 34
Child 2370 4 initial 1 34
Child 2370 8 empty GC 2 32
But this needs to be further tested concerning the behavior of such objects on future GC's, at least after 100 GC's :old_objects remains constant, so i suppose it should be OK
Log with GC.stat is here
By the way there's also option RGENGC_OLD_NEWOBJ_CHECK to create old objects from the beginning, but i doubt it's a good idea, but may be useful for a particular case.
First answer
My proposition in the comment above was wrong, actually bitmap tables are the savior.
(option = 1)
ruby version 2.0.0
proces pid log priv_dirty shared_dirty
Parent 14807 post alloc 27 0
Parent 14807 4 fork 0 27
Child 14809 4 initial 0 27
Child 14809 8 empty GC 6 25 # << almost everything stays shared <<
Also had by hand and tested Ruby Enterprise Edition it's only half better than worst cases.
ruby version 1.8.7
proces pid log priv_dirty shared_dirty
Parent 15064 post alloc 86 0
Parent 15064 4 fork 2 84
Child 15065 4 initial 2 84
Child 15065 8 empty GC 40 46
(I made the script run strictly 1 GC, by increasing RUBY_GC_HEAP_INIT_SLOTS to 600k)

Xcode 4.6 / 5 with lldb breakpoints not working

I had a problem with using debugger LLDB,
if in "main.c" , I include another file like "a.c" , and set breakpoint in "a.c"
the breakpoint will never been stopped.
Is anyone else getting this?
ok, here is the example
// main.c
#include "a.c"
int main()
{
test();
}
// a.c
void test()
{
return; // (Using UI to)set break point here, the gdb will stop, and lldb will not
}
======================================================================
To trojanfoe:
I had tried these steps in Xcode 4.6.3 command line utilities,
the result is like yours, but my problem is on GUI,
When I use the mouse to set a break point on "a.c", it will not work.
I had tried to stop on main(), and enter this cmd "br list",
here is the message on console,
(lldb) br list
Current breakpoints:
1: file ='a.c', line = 13, locations = 0 (pending)
2: file ='main.c', line = 15, locations = 1, resolved = 1
2.1: where = test`main + 15 at main.c:15, address = 0x0000000100000f3f, resolved, hit count = 1
(lldb)
if you need the log by using command line utilities, please tell me,
thanks~
See "File and line breakpoints are not getting hit" in http://lldb.llvm.org/troubleshooting.html - that seems to be talking about exactly your build scenario, and I've just had this problem. To solve it I not only had to put this in $HOME/.lldbinit:
settings set target.inline-breakpoint-strategy always
I also had to do a clobber (distclean) build and restart Xcode.
NOTE This is not an answer, however I wanted to document the works for me response fully.
OP: Please follow these steps to see how it differs for you.
$ clang -g -o bptest main.c
$ ls -l
total 32
-rw-r--r-- 1 andy staff 110 Oct 31 10:55 a.c
-rwxr-xr-x 1 andy staff 4664 Oct 31 10:56 bptest
drwxr-xr-x 3 andy staff 102 Oct 31 10:56 bptest.dSYM (NOTE THIS)
-rw-r--r-- 1 andy staff 42 Oct 31 10:55 main.c
$ lldb
(lldb) target create bptest
Current executable set to 'bptest' (x86_64).
(lldb) break set -b test
Breakpoint 1: where = bptest`test + 4 at a.c:4, address = 0x0000000100000f34
(lldb) run
Process 9743 launched: '/Users/andy/tmp/bptest/bptest' (x86_64)
Process 9743 stopped
* thread #1: tid = 0x65287, 0x0000000100000f34 bptest`test + 4 at a.c:4, queue = 'com.apple.main-thread, stop reason = breakpoint 1.1
frame #0: 0x0000000100000f34 bptest`test + 4 at a.c:4
1 // a.c
2 void test()
3 {
-> 4 return; // (Using UI to)set break point here, the gdb will stop, and lldb will not
5 }
(lldb) bt
* thread #1: tid = 0x65287, 0x0000000100000f34 bptest`test + 4 at a.c:4, queue = 'com.apple.main-thread, stop reason = breakpoint 1.1
frame #0: 0x0000000100000f34 bptest`test + 4 at a.c:4
frame #1: 0x0000000100000f49 bptest`main + 9 at main.c:4
frame #2: 0x00007fff8eb3f7e1 libdyld.dylib`start + 1
(lldb)
Note: I am using the Xcode 5.0.1 command line utilities.

Ruby big array and memory

I created a big array a, whose memory grew to ~500 MB:
a = []
t = Thread.new do
loop do
sleep 1
print "#{a.size} "
end
end
5_000_000.times do
a << [rand(36**10).to_s(36)]
end
puts "\n size is #{a.size}"
a = []
t.join
After that, I "cleared" a, but the allocated memory didn't change until I killed the process. Is there something special I need to do to remove all these data which were assigned to a from the memory?
If I use the Ruby Garbage Collection Profiler on a lightly modified version of your code:
GC::Profiler.enable
GC::Profiler.clear
a = []
5_000_000.times do
a << [rand(36**10).to_s(36)]
end
puts "\n size is #{a.size}"
a = []
GC::Profiler.report
I get the following output (on Ruby 1.9.3)(some columns and rows removed):
GC 60 invokes.
Index Invoke Time(sec) Use Size(byte) Total Size(byte) ...
1 0.109 131136 409200 ...
2 0.125 192528 409200 ...
...
58 33.484 199150344 260938656 ...
59 36.000 211394640 260955024 ...
The profile starts with 131 136 bytes used, and ends with 211 394 640 bytes used, without decreasing in size anywhere in the run, we can assume that no garbage collection has taken place.
If I then add a line of code which adds a single element to the array a, placed after a has grown to 5 million elements, and then has an empty array assigned to it:
GC::Profiler.enable
GC::Profiler.clear
a = []
5_000_000.times do
a << [rand(36**10).to_s(36)]
end
puts "\n size is #{a.size}"
a = []
# the only change is to add one element to the (now) empty array a
a << [rand(36**10).to_s(36)]
GC::Profiler.report
This changes the profiler output to (some columns and rows removed):
GC 62 invokes.
Index Invoke Time(sec) Use Size(byte) Total Size(byte) ...
1 0.156 131376 409200 ...
2 0.172 192792 409200 ...
...
59 35.375 211187736 260955024 ...
60 36.625 211395000 469679760 ...
61 41.891 2280168 307832976 ...
This profiler run now starts with 131 376 bytes used, which is similar to the previous run, grows, but ends with 2 280 168 bytes used, significantly lower than the previous profile run that ended with 211 394 640 bytes used, we can assume that garbage collection took place this during this run, probably triggered by our new line of code that adds an element to a.
The short answer is no, you don't need to do anything special to remove the data that was assigned to a, but hopefully this gives you the tools to prove it.
You can call GC.start(), but you might not want to. See for example: Ruby garbage collect for a discussion here on Stack Overflow. Basically, I'd let the garbage collector decide for itself when to run unless you have a compelling reason to force it.

sphinx, xmlpipe2, and cassandra

I start to using cassandra and I want to index my db with sphinx.
I wrote ruby script which is used as xmlpipe, and I configure sphinx to use it.
source xmlsrc
{
type = xmlpipe2
xmlpipe_command = /usr/local/bin/ruby /home/httpd/html/app/script/sphinxpipe.rb
}
When I run script from console output looks fine, but when I run indexer sphinx return error
$ indexer test_index
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file '/usr/local/etc/sphinx.conf'...
indexing index 'test_index'...
ERROR: index 'test_index': source 'xmlsrc': attribute 'id' required in <sphinx:document> (line=10, pos=0, docid=0).
total 0 docs, 0 bytes
total 0.000 sec, 0 bytes/sec, 0.00 docs/sec
total 0 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 0 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
my script is very simple
$stdout.sync = true
puts %{<?xml version="1.0" encoding="utf-8"?>}
puts %{<sphinx:docset>}
puts %{<sphinx:schema>}
puts %{<sphinx:field name="body"/>}
puts %{</sphinx:schema>}
puts %{<sphinx:document id="ba32c02e-79e2-11df-9815-af1b5f766459">}
puts %{<body><![CDATA[aaa]]></body>}
puts %{</sphinx:document>}
puts %{</sphinx:docset>}
I use ruby 1.9.2-head, ubuntu 10.04, sphinx 0.9.9
How can I get this to work?
I have answer to my own quiestion :)
sphinx has defined document max id in source code
for 64 bit mashines
#define DOCID_MAX U64C(0xffffffffffffffff)
document id must be less than 18446744073709551615
for 32 bit mashines
#define DOCID_MAX 0xffffffffUL
document id must be less than 4294967295
I used SimpleUUID that why it don't work

Resources