I came across the Timeout module in Ruby and wanted to test it out. I looked at the official documentation at http://ruby-doc.org/stdlib-2.1.1/libdoc/timeout/rdoc/Timeout.html
Here is the code I had:
require 'timeout'
require 'benchmark'
numbers = [*1..80]
Timeout::timeout(5) { numbers.combination(5).count }
=> 24040016
I did some benchmarking tests, and got the following.
10.828000 0.063000 10.891000 11.001676
According to the documentation, this method is supposed to raise an exception if the block does not finish within 5 seconds. If the block finishes within the time frame, it returns the result of the block.
For what it's worth, I've tried timeout with 1 second instead of 5 seconds, and the result of the code block is still returned.
Here is the official documentation
timeout(sec, klass=nil)
Performs an operation in a block, raising an error if it takes longer than sec seconds to complete.
sec: Number of seconds to wait for the block to terminate. Any number may be used,
including Floats to specify fractional seconds. A value of 0 or nil will execute the
block without any timeout.
klass: Exception Class to raise if the block fails to terminate in sec seconds. Omitting
will use the default, Timeout::Error
I am mystified as to why this doesn't work.
The problem is the way MRI (Matz's Ruby Implementation) thread scheduling works. MRI uses a GIL (Global Interpreter Lock), which in practice means only one thread is truly running at a time.
There are some exceptions, but for the majority of the time there is only one thread executing Ruby code at any one time.
Normally you do not notice this, even during heavy computations that consume 100% CPU, because MRI keeps time-slicing the threads at regular intervals so that each thread gets a turn to run.
However, there's one exception where time-slicing isn't active, and that's when a Ruby thread is executing native C code instead of Ruby code.
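As a quick contrast (a minimal sketch of mine, not from the original post), Timeout behaves as documented when the block is ordinary Ruby code, because the scheduler keeps time-slicing and the timeout thread gets its turn to raise:

require 'timeout'

begin
  Timeout::timeout(1) { loop { } }  # infinite busy loop written in pure Ruby
rescue Timeout::Error
  puts "interrupted after roughly 1 second, as documented"
end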
Now it so happens that Array#combination is implemented in pure C:
[1] pry(main)> show-source Array#combination
From: array.c (C Method):
static VALUE
rb_ary_combination(VALUE ary, VALUE num)
{
...
}
When we combine this knowledge with how Timeout.timeout is implemented we can start to get a clue of what is happening:
[7] pry(main)> show-source Timeout#timeout
From: /opt/ruby21/lib/ruby/2.1.0/timeout.rb # line 75:
75: def timeout(sec, klass = nil) #:yield: +sec+
76: return yield(sec) if sec == nil or sec.zero?
77: message = "execution expired"
78: e = Error
79: bl = proc do |exception|
80: begin
81: x = Thread.current
82: y = Thread.start {
83: begin
84: sleep sec
85: rescue => e
86: x.raise e
87: else
88: x.raise exception, message
89: end
90: }
91: return yield(sec)
92: ensure
93: if y
94: y.kill
95: y.join # make sure y is dead.
96: end
97: end
98: end
99: ...
1xx: end
Your code running Array#combination most likely starts executing even BEFORE the timeout thread runs sleep sec on line 84: your block is launched on line 91 through yield(sec).
This means the order of execution actually becomes:
1: [thread 1] numbers.combination(5).count
# ...some time passes while the combinations are calculated ...
2: [thread 2] sleep 5 # <- The timeout thread starts running sleep
3: [thread 1] y.kill # <- The timeout thread is instantly killed
# and never times out.
In order to make sure the timeout thread starts first you can try this, which will most likely trigger the timeout exception this time:
Timeout::timeout(5) { Thread.pass; numbers.combination(5).count }
This is because by running Thread.pass you allow the MRI scheduler to start and run the code on line 82 before the native combination C code executes. However, even in this case the exception won't be triggered until combination exits, because of the GIL.
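If this analysis is right, a small hedged experiment (a sketch of mine, assuming MRI and the roughly 11-second runtime reported in the question) should show the exception arriving only after the C-level call has finished:

require 'timeout'

numbers = [*1..80]

started = Time.now
begin
  # Thread.pass lets the timeout thread start its sleep before the
  # long-running C-level call begins.
  Timeout::timeout(1) { Thread.pass; numbers.combination(5).count }
  puts "no timeout: block returned after #{(Time.now - started).round(1)}s"
rescue Timeout::Error
  # If the explanation above holds, this fires only once combination
  # has finished, i.e. well after the 1-second limit.
  puts "Timeout::Error raised after #{(Time.now - started).round(1)}s"
end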
Unfortunately, there is no way around this in MRI. You would have to use something like JRuby instead, which has truly concurrent threads, or you could run the combination calculation in a separate Process instead of a thread, as sketched below.
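For the Process option, here is one possible sketch (my own illustration, assuming a Unix-like Ruby where Kernel#fork is available; it is not from the original answer). The parent blocks on a pipe read, which releases the GIL, so the timeout thread can interrupt it, and killing the child really stops the computation:

require 'timeout'

numbers = [*1..80]

reader, writer = IO.pipe
pid = fork do
  reader.close
  writer.puts numbers.combination(5).count  # heavy C-level work runs in the child
  writer.close
end
writer.close

begin
  Timeout::timeout(5) do
    result = reader.gets    # blocking IO releases the GIL, so the timeout can fire here
    Process.wait(pid)
    puts "result: #{result}"
  end
rescue Timeout::Error
  Process.kill("KILL", pid) # actually stops the work, unlike Thread#kill behind the GIL
  Process.wait(pid)
  puts "timed out"
end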
Related
I am building a shippable app with Julia using the PackageCompiler.jl library.
https://github.com/JuliaLang/PackageCompiler.jl
I have followed the example here and got it to run as expected:
https://github.com/JuliaLang/PackageCompiler.jl
I am now trying to modify the code piece by piece to replace the toy function with my own function. It involves reading a CSV file, whose path is taken from the command line, into a dataframe. For now, I am just trying to accomplish this task.
I have this code in MyApp.jl:
module MyApp
using Example
using HelloWorldC_jll
using Pkg.Artifacts
using DataFrames, CSV, Statistics
fooifier_path() = joinpath(artifact"fooifier", "bin", "fooifier" * (Sys.iswindows() ? ".exe" : ""))
function julia_main()
try
real_main()
catch
Base.invokelatest(Base.display_error, Base.catch_stack())
return 1
end
return 0
end
function real_main()
# @show ARGS
# @show Base.PROGRAM_FILE
# @show DEPOT_PATH
# @show LOAD_PATH
# @show pwd()
# @show Base.active_project()
# @show Threads.nthreads()
# @show Sys.BINDIR
# display(Base.loaded_modules)
for arg in ARGS
println(arg)
end
println("this part worked!")
df = CSV.read(string(ARGS[1]), DataFrame)[10:end-10,:];
println(df)
return
end
if abspath(PROGRAM_FILE) == @__FILE__
real_main()
end
end # module MyApp
I compile the code with:
using PackageCompiler;
create_app("MyApp", "MyAppCompiled")
and it compiles just fine.
I run it with:
julia MyAppCompiled/bin/MyApp <absolute path to csv>
and it works up until the dataframe portion, where I get this error:
it worked!
MyApp(66457,0x115997dc0) malloc: *** error for object 0x1117d9510: pointer being realloc'd was not allocated
MyApp(66457,0x115997dc0) malloc: *** set a breakpoint in malloc_error_break to debug
signal (6): Abort trap: 6
in expression starting at none:0
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 47484171 (Pool: 47468051; Big: 16120); GC: 49
Abort trap: 6
Can anyone help me figure out what I am doing wrong? I know I am passing the proper path, because if I mistype the path I get the "is not a proper file..." message. I've tried calling with both relative and absolute paths.
I'm testing with a minimal example how futures work when using ProcessPoolExecutor.
First, I want to know the result of my processing functions, then I would like to add complexity to the scenario.
import time
import string
import random
import traceback
import concurrent.futures
from concurrent.futures import ProcessPoolExecutor
def process(*args):
was_ok = True
try:
name = args[0][0]
tiempo = args[0][1]
print(f"Task {name} - Sleeping {tiempo}s")
time.sleep(tiempo)
except:
was_ok = False
return name, was_ok
def program():
amount = 10
workers = 2
data = [''.join(random.choices(string.ascii_letters, k=5)) for _ in range(amount)]
print(f"Data: {len(data)}")
tiempo = [random.randint(5, 15) for _ in range(amount)]
print(f"Times: {len(tiempo)}")
with ProcessPoolExecutor(max_workers=workers) as pool:
try:
index = 0
futures = [pool.submit(process, zipped) for zipped in zip(data, tiempo)]
for future in concurrent.futures.as_completed(futures):
name, ok = future.result()
print(f"Task {index} with code {name} finished: {ok}")
index += 1
except Exception as e:
print(f'Future failed: {e}')
if __name__ == "__main__":
program()
If I run this program, the output is as expected, obtaining all the future results. However, just at the end I also get a failure:
Data: 10
Times: 10
Task utebu - Sleeping 14s
Task klEVG - Sleeping 10s
Task ZAHIC - Sleeping 8s
Task 0 with code klEVG finished: True
Task RBEgG - Sleeping 9s
Task 1 with code utebu finished: True
Task VYCjw - Sleeping 14s
Task 2 with code ZAHIC finished: True
Task GDZmI - Sleeping 9s
Task 3 with code RBEgG finished: True
Task TPJKM - Sleeping 10s
Task 4 with code GDZmI finished: True
Task CggXZ - Sleeping 7s
Task 5 with code VYCjw finished: True
Task TUGJm - Sleeping 12s
Task 6 with code CggXZ finished: True
Task THlhj - Sleeping 11s
Task 7 with code TPJKM finished: True
Task 8 with code TUGJm finished: True
Task 9 with code THlhj finished: True
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/concurrent/futures/process.py", line 101, in _python_exit
thread_wakeup.wakeup()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/concurrent/futures/process.py", line 89, in wakeup
self._writer.send_bytes(b"")
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 183, in send_bytes
self._check_closed()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 136, in _check_closed
raise OSError("handle is closed")
OSError: handle is closed
AFAIK the code doesn't have an error in itself. I've been researching really old questions like the following, without any luck finding a fix:
this one related to the same error msg but for Python 2,
this issue in GitHub which seems to be exactly the same I'm having (and seems not fixed yet...?),
this other issue where the comment I linked to seems to point to the actual problem, but doesn't find a solution to it,
of course the official docs for Python 3.7,
...
And so forth. However, I still haven't found how to solve this behaviour. I even found an old question here on SO that suggested avoiding the as_completed function and using submit instead (and that is where this test came from; before, I just had a map over my process function).
Any idea, fix, explanation or workaround is welcome. Thanks!
The debugger can be programmatically invoked by executing (break). For example, the debugging banner then displays what caused the interrupt, the HELP line, the available restarts, some related info, and finally the source of the interrupt:
debugger invoked on a SIMPLE-CONDITION in thread
#<THREAD "main thread" RUNNING {10010B0523}>:
break
Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [CONTINUE] Return from BREAK.
1: [ABORT ] Exit debugger, returning to top level.
#(
NODE: STATE=<NIL NIL NIL 0.0 0.0
( )> DEPTH=0)
#(
NODE: STATE=<NIL NIL NIL 0.0 0.0
((ACTIVE GATE1) (ACTIVE GATE2) (COLOR RECEIVER1 BLUE) (COLOR RECEIVER2 RED) (COLOR TRANSMITTER1 BLUE) (COLOR TRANSMITTER2 RED) (FREE ME) (LOC CONNECTOR1 AREA5) (LOC CONNECTOR2 AREA7) (LOC ME AREA5))> DEPTH=0)
(DF-BNB1 )
source: (BREAK)
0]
I don't understand the related info between the restarts and the source. Can this info be suppressed? It is sometimes many lines long in my application. I've tried changing the debug & safety optimization settings, but to no effect.
The output you are confused by relates to the place in the code where break was invoked. When I call it from the vanilla Lisp REPL (without SLIME), it displays:
(SB-INT:SIMPLE-EVAL-IN-LEXENV (BREAK) #<NULL-LEXENV>)
However, if I do something wrong in the debugger, here's what happens:
0] q
; in: PROGN (PRINT 1)
; (PROGN Q)
;
; caught WARNING:
; undefined variable: COMMON-LISP-USER::Q
;
; compilation unit finished
; Undefined variable:
; Q
; caught 1 WARNING condition
debugger invoked on a UNBOUND-VARIABLE in thread
#<THREAD "main thread" RUNNING {10005204C3}>:
The variable Q is unbound.
Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [CONTINUE ] Retry using Q.
1: [USE-VALUE ] Use specified value.
2: [STORE-VALUE] Set specified value and use it.
3: [ABORT ] Reduce debugger level (to debug level 1).
4: Return from BREAK.
5: Exit debugger, returning to top level.
((LAMBDA (#:G498)) #<unused argument>)
source: (PROGN Q)
You can see that the last line resembles the output you got on the line starting with source:. Actually, the output we saw consists of three main parts:
1. Description of the condition
2. Listing of the available restarts
3. Debug REPL prompt printed by debug-loop-fun
The last output is part of the prompt and it is generated by the invocation of:
(print-frame-call *current-frame* *debug-io* :print-frame-source t)
So, you can recompile the call providing :print-frame-source nil or try to understand why your current frame looks this way...
In Julia, I want to use addprocs and pmap inside a function that is defined inside a module. Here's a silly example:
module test
using Distributions
export g, f
function g(a, b)
a + rand(Normal(0, b))
end
function f(A, b)
close = false
if length(procs()) == 1 # If there are already extra workers,
addprocs() # use them, otherwise, create your own.
close = true
end
W = pmap(x -> g(x, b), A)
if close == true
rmprocs(workers()) # Remove the workers you created.
end
return W
end
end
test.f(randn(5), 1)
This returns a very long error
WARNING: Module test not defined on process 4
WARNING: Module test not defined on process 3
fatal error on fatal error on WARNING: Module test not defined on process 2
43: : WARNING: Module test not defined on process 5
fatal error on fatal error on 5: 2: ERROR: UndefVarError: test not defined
in deserialize at serialize.jl:504
in handle_deserialize at serialize.jl:477
in deserialize at serialize.jl:696
...
in message_handler_loop at multi.jl:878
in process_tcp_streams at multi.jl:867
in anonymous at task.jl:63
Worker 3 terminated.
Worker 2 terminated.ERROR (unhandled task failure): EOFError: read end of file
WARNING: rmprocs: process 1 not removed
Worker 5 terminated.ERROR (unhandled task failure): EOFError: read end of file
4-element Array{Any,1}:Worker 4 terminated.ERROR (unhandled task failure): EOFError: read end of file
ERROR (unhandled task failure): EOFError: read end of file
ProcessExitedException()
ProcessExitedException()
ProcessExitedException()
ProcessExitedException()
What I'm trying to do is write a package that contains functions that perform operations that can be optionally parallelized at the user's discretion. So a function like f might take an argument par::Bool that does something like I've shown above if the user calls f with par = true and loops otherwise. So from within the definition of f (and within the definition of the module test), I want to create workers and broadcast the Distributions package and the function g to them.
What's wrong with using @everywhere in your function? The following, for example, works fine on my computer.
function f(A, b)
close = false
if length(procs()) == 1 # If there are already extra workers,
addprocs() # use them, otherwise, create your own.
@everywhere begin
using Distributions
function g(a, b)
a + rand(Normal(0, b))
end
end
close = true
end
W = pmap(x -> g(x, b), A)
if close == true
rmprocs(workers()) # Remove the workers you created.
end
return W
end
f(randn(5), 1)
Note: when I first ran this, I needed to recompile the Distributions package, since it had been updated since I last used it. When I tried the above script right after recompiling, it failed, but after I quit Julia and reopened it, it worked fine. Perhaps that is what was causing your error?
Suppose there is a script A that calls function B, both in Julia.
There are some errors in function B that cause the script to stop at runtime.
Is there a neat way to find out which line is causing the error?
It does not make sense to have to manually put messages like println on each line to find out up to which line the code survives and on which line the error happens.
Edit: I am using Linux Red Hat 4.1.2 and Julia version 0.3.6 directly, with no IDE.
Reading the backtrace:
juser#juliabox:~$ cat foo.jl
# line 1 empty comment
foo() = error("This is line 2")
foo() # line 3
juser#juliabox:~$ julia foo.jl
ERROR: This is line 2
in foo at /home/juser/foo.jl:2
in include at ./boot.jl:245
in include_from_node1 at loading.jl:128
in process_options at ./client.jl:285
in _start at ./client.jl:354
while loading /home/juser/foo.jl, in expression starting on line 3
The lines in foo at /home/juser/foo.jl:2 ... while loading /home/juser/foo.jl, in expression starting on line 3 read as: "there was an error at line 2 of the file /home/juser/foo.jl ... while loading /home/juser/foo.jl, in an expression starting on line 3".
Looks pretty clear to me!
Edit: /home/juser/foo.jl:2 means: file /home/juser/foo.jl, line number 2.
Also, you could use the @show macro instead of the println function for debugging purposes:
julia> println(1 < 5 < 10)
true
julia> @show 1 < 5 < 10
(1<5<10) => true
true