I am trying to write a custom EM::Protocol module that can pack/unpack structured binary packets. Packet structure should be defined as name/format pairs, either as a string, some other easily parsable format, or some sort of DSL.
Some quick code to get the idea across:
module PacketProtocol
  def self.included(base)
    base.extend ClassMethods
  end

  def receive_data(data)
    # retrieve packet header
    # find matching packet definition
    # packet.unpack(data)
  end

  module ClassMethods
    def packet(defn)
      # create an instance of Packet (see below) and shove it
      # somewhere I can get to later.
    end
  end
end
module MyHandler
  include PacketProtocol
  packet '<id:S><len:S><msg:A%{len}>'
end

EM.run do
  EM.start_server '0.0.0.0', 8080, MyHandler
end
My goal is to minimize runtime complexity. Packet definitions are static per execution, so I would like to avoid this (crude) implementation:
class Packet
  FmtSize = {
    'S' => 2,
    'A' => Proc.new { |fmt| fmt[1..-1].to_i }
  }

  def initialize(defn)
    @fields = defn.scan(/<([^>]+):([^>]+)>/)
  end

  def pack(data)
    data.values.pack(@fields.map { |name, fmt| fmt % data }.join)
  end

  def unpack(bytes)
    data = {}
    posn = 0
    @fields.each do |name, fmt_tpl|
      fmt = fmt_tpl % data
      len = FmtSize[fmt[0]]
      len = len.call(fmt) if len.is_a?(Proc)
      data[name.to_sym] = bytes[posn..posn + len - 1].unpack(fmt)[0]
      posn += len
    end
    data
  end
end
data = { :id => 1, :len => 5, :msg => 'Hello' }
packet = Packet.new '<id:S><len:S><msg:A%{len}>'
packed = packet.pack(data)
require 'benchmark'

Benchmark.bm(7) do |x|
  x.report('slow') {
    100000.times do
      unpacked = packet.unpack(packed)
    end
  }
  x.report('fast') {
    100000.times do
      data = {}
      data[:id]  = packed[0..1].unpack('S' % data)[0]
      data[:len] = packed[2..3].unpack('S' % data)[0]
      data[:msg] = packed[4..8].unpack('A%{len}' % data)[0]
    end
  }
end
# output:
# user system total real
# slow 1.970000 0.000000 1.970000 ( 1.965525)
# fast 0.140000 0.000000 0.140000 ( 0.146227)
Of the two examples, the version using the Packet class comes out roughly an order of magnitude slower.
SO. The question is:
Is there a way (or a gem) that allows you to generate code at runtime (other than simply eval'ing strings)?
EDIT:
Just found BinData. While its feature set is nice, it benchmarks much slower as well.
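For what it's worth, the usual eval-free route is to do all of the parsing once, at class-definition time, and close over the result with define_method, so the per-packet path is just a flat loop over precompiled fields. Below is a rough sketch along the lines of the PacketProtocol above; the unpack_packet name and the field-compilation details are only illustrative, and length-dependent formats like A%{len} still pay the String#% cost per packet:

module PacketProtocol
  def self.included(base)
    base.extend ClassMethods
  end

  module ClassMethods
    def packet(defn)
      # parse the definition exactly once, when the handler module is defined
      fields = defn.scan(/<([^>:]+):([^>]+)>/).map { |name, fmt| [name.to_sym, fmt] }

      # generate the unpack method with define_method + a closure instead of eval
      define_method(:unpack_packet) do |bytes|
        data = {}
        posn = 0
        fields.each do |name, fmt_tpl|
          fmt = fmt_tpl % data                      # resolves e.g. 'A%{len}' from fields already read
          len = fmt[0] == 'S' ? 2 : fmt[1..-1].to_i # assumes only S and A<n> formats
          data[name] = bytes[posn, len].unpack(fmt).first
          posn += len
        end
        data
      end
    end
  end
end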
I have some objects in an array, objects. Given a certain property/value pair, I need a function that returns the first object that matches. For example, objects.byName "John" should return the first object with name: "John".
Currently I'm doing this:
def self.byName(name)
  id_obj_by_name = {}
  @@objects.each_with_index do |o, index|
    id_obj_by_name[o.name] = index
  end
  @@objects[id_obj_by_name[name]]
end
But it seems very slow, and is using a lot of memory. How can I improve this?
If you need performance, you should consider this approach:
require 'benchmark'

class Foo
  def initialize(name)
    @name = name
  end

  def name
    @name
  end
end

# Using array ######################################################################
test = []

500000.times do |i|
  test << Foo.new("ABC" + i.to_s + "#!###!DS")
end

puts "using array"
time = Benchmark.measure {
  result = test.find { |o| o.name == "ABC250000#!###!DS" }
}
puts time
####################################################################################

# Using a hash #####################################################################
test = {}
i_am_your_object = Object.new

500000.times do |i|
  test["ABC" + i.to_s + "#!###!DS"] = i_am_your_object
end

puts "using hash"
time = Benchmark.measure {
  result = test["ABC250000#!###!DS"]
}
puts time
####################################################################################
Results:
using array
0.060000 0.000000 0.060000 ( 0.060884)
using hash
0.000000 0.000000 0.000000 ( 0.000005)
Try something like
def self.by_name(name)
  @@objects.find { |o| o.name == name }
end
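If the lookup is called often and the array does not change between calls, you can combine both ideas and build the name-to-object hash once, then reuse it. A small sketch (the @@index_by_name variable and the assumption that @@objects is static are mine):

def self.by_name(name)
  # build the index once; ||= on the hash value keeps the *first* object seen for each name
  @@index_by_name ||= @@objects.each_with_object({}) { |o, h| h[o.name] ||= o }
  @@index_by_name[name]
end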
I have a number of ranges that I want to merge together if they overlap. The way I'm currently doing this is by using Sets.
This works. However, when I attempt the same code with larger ranges, as follows, I get a `stack level too deep` (SystemStackError).
require 'set'
ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]
set = Set.new
ranges.each { |r| set << r.to_set }
set.flatten!
sets_subsets = set.divide { |i, j| (i - j).abs == 1 } # this line causes the error
puts sets_subsets
The line that is failing is taken directly from the Ruby Set Documentation.
I would appreciate it if anyone could suggest a fix or an alternative that works for the above example.
EDIT
I have put the full code I’m using here:
Basically it is used to add HTML tags to an amino acid sequence according to some features.
require 'set'
def calculate_formatting_classes(hsps, signalp)
  merged_hsps = merge_ranges(hsps)
  sp = format_signalp(merged_hsps, signalp)
  hsp_class = (merged_hsps - sp[1]) - sp[0]
  rank_format_positions(sp, hsp_class)
end

def merge_ranges(ranges)
  set = Set.new
  ranges.each { |r| set << r.to_set }
  set.flatten
end

def format_signalp(merged_hsps, sp)
  sp_class = sp - merged_hsps
  sp_hsp_class = sp & merged_hsps # overlap regions between sp & merged_hsp
  [sp_class, sp_hsp_class]
end

def rank_format_positions(sp, hsp_class)
  results = []
  results += sets_to_hash(sp[0], 'sp')
  results += sets_to_hash(sp[1], 'sphsp')
  results += sets_to_hash(hsp_class, 'hsp')
  results.sort_by { |s| s[:pos] }
end

def sets_to_hash(set = nil, cl)
  return nil if set.nil?
  hashes = []
  merged_set = set.divide { |i, j| (i - j).abs == 1 }
  merged_set.each do |s|
    hashes << { pos: s.min.to_i - 1, insert: "<span class=#{cl}>" }
    hashes << { pos: s.max.to_i - 0.1, insert: '</span>' } # for ordering
  end
  hashes
end
working_hsp = [Range.new(7, 136), Range.new(143, 178)]
not_working_hsp = [Range.new(73, 856), Range.new(82, 1145),
Range.new(116, 2914), Range.new(3203, 3241)]
sp = Range.new(1, 20).to_set
# working
results = calculate_formatting_classes(working_hsp, sp)
# Not Working
# results = calculate_formatting_classes(not_working_hsp, sp)
puts results
Here is one way to do this:
ranges = [Range.new(73, 856), Range.new(82, 1145),
          Range.new(116, 2914), Range.new(3203, 3241)]

ranges.size.times do
  ranges = ranges.sort_by(&:begin)
  t = ranges.each_cons(2).to_a
  t.each do |r1, r2|
    if (r2.cover? r1.begin) || (r2.cover? r1.end) ||
       (r1.cover? r2.begin) || (r1.cover? r2.end)
      ranges << Range.new([r1.begin, r2.begin].min, [r1.end, r2.end].max)
      ranges.delete(r1)
      ranges.delete(r2)
      t.delete [r1, r2]
    end
  end
end
p ranges
#=> [73..2914, 3203..3241]
The other answers aren't bad, but I prefer a simple recursive approach:
def merge_ranges(*ranges)
  range, *rest = ranges
  return if range.nil?

  # Find the index of the first range in `rest` that overlaps this one
  other_idx = rest.find_index do |other|
    range.cover?(other.begin) || other.cover?(range.begin)
  end

  if other_idx
    # An overlapping range was found; remove it from `rest` and merge
    # it with this one
    other = rest.slice!(other_idx)
    merged = ([range.begin, other.begin].min)..([range.end, other.end].max)
    # Try again with the merged range and the remaining `rest`
    merge_ranges(merged, *rest)
  else
    # No overlapping range was found; move on
    [ range, *merge_ranges(*rest) ]
  end
end
Note: This code assumes each range is ascending (e.g. 10..5 will break it).
Usage:
ranges = [ 73..856, 82..1145, 116..2914, 3203..3241 ]
p merge_ranges(*ranges)
# => [73..2914, 3203..3241]
ranges = [ 0..10, 5..20, 30..50, 45..80, 50..90, 100..101, 101..200 ]
p merge_ranges(*ranges)
# => [0..20, 30..90, 100..200]
I believe your resulting set simply has too many items (2881) for divide: with a two-argument block it has to test every pair of elements (on the order of 2881², roughly 8.3 million block calls) and, if I understand the implementation correctly, it then walks the resulting graph recursively. With a chain of nearly 3000 consecutive integers that recursion is what gives you the stack level too deep error.
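For reference, this is the kind of result divide with a two-argument block produces on a small input (essentially the example from the Set documentation); every pair of elements is passed to the block:

require 'set'

numbers = Set[1, 3, 4, 6, 9, 10, 11]
p numbers.divide { |i, j| (i - j).abs == 1 }
# => something like #<Set: {#<Set: {1}>, #<Set: {3, 4}>, #<Set: {6}>, #<Set: {9, 10, 11}>}>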
Without using sets, you can use this code to merge the ranges:
module RangeMerger
  def merge(range_b)
    if cover?(range_b.first) && cover?(range_b.last)
      self
    elsif cover?(range_b.first)
      self.class.new(first, range_b.last)
    elsif cover?(range_b.last)
      self.class.new(range_b.first, last)
    else
      nil # Unmergable
    end
  end
end

module ArrayRangePusher
  def <<(item)
    if item.kind_of?(Range)
      item.extend RangeMerger
      each_with_index do |own_item, idx|
        own_item.extend RangeMerger
        if new_range = own_item.merge(item)
          self[idx] = new_range
          return self
        end
      end
    end
    super
  end
end
ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]

new_ranges = Array.new
new_ranges.extend ArrayRangePusher

ranges.each do |range|
  new_ranges << range
end

puts ranges.inspect
puts new_ranges.inspect
This will output:
[73..856, 82..1145, 116..2914, 3203..3241]
[73..2914, 3203..3241]
which I believe is the intended output for your original problem. It's a bit ugly, but I'm a bit rusty at the moment.
Edit: I don't think this has anything to do with your original problem before the edits, which was about merging ranges.
I'm scrubbing large data files (1MM+ comma-separated rows). An example row might look like this:
#row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry,R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"
Certain columns must be removed from it, after which the row should look like this:
#row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,R,SKUInfo,05-MAR-14 05:50:24,SourceID,TransactionalID"
QUESTION 1: If I convert a row of data into an Array, which method is preferred for removing elements: Array#delete_at or Array#slice!? I'd like to know which is the more idiomatic option. Performance is a consideration here, and I'm on a Windows machine.
def remove_bad_columns
  ary = @row.split(",")
  ary.delete_at(10)
  ary.delete_at(5)
  @row = ary.join(",")
end
QUESTION 2: I was wondering if one of these methods is implemented using the other. How can I see how these methods are built in Ruby? (The way for is implemented using each, for example.)
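An aside on QUESTION 2: in MRI (at least in the versions current when this was written) both methods are implemented in C, in array.c, so there is no Ruby source to read from within Ruby itself; you have to look at the C sources. One quick way to confirm a method is C-implemented is that source_location returns nil for it:

p Array.instance_method(:delete_at).source_location #=> nil, i.e. not defined in Ruby code
p Array.instance_method(:slice!).source_location    #=> nil as well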
I suggest you use Array#values_at rather than delete_at or slice!:
def remove_vals(str, *indices)
  ary = str.split(",")
  v = (0...ary.size).to_a - indices
  ary.values_at(*v).join(",")
end

@row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry," +
       "R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"

@row = remove_vals(@row, 5, 10)
  #=> "123456789,11122,CustomerName,2014-01-31,2014-02-01,R,SKUInfo," +
  #   "05-MAR-14 05:50:24,SourceID,TransactionalID"
Array#values_at has the advantage over the other two methods that you don't have to worry about the order in which the elements are removed.
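For anyone wondering why the order matters with index-based deletion, here is a tiny illustration (the toy array is mine): deleting the lower index first shifts everything after it to the left, so the second index no longer points at the element you meant.

ary = ("a".."l").to_a  # indices 0..11
ary.delete_at(5)       # removes "f"; everything after it shifts left by one
ary.delete_at(10)      # now removes "l", the element that was originally at index 11
# deleting index 10 first and then index 5 removes "k" and "f", as intended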
The efficiency of this method is not significantly different from that of the other two. If @spickermann would like to add it to his benchmarks, he could use this:
def values_at
  ary = array.split(",")
  v = (0...ary.size).to_a - [5, 10]
  @row = ary.values_at(*v).join(",")
end
There is not really a difference in performance. I would prefer delete_at because it reads nicer.
require 'benchmark'

def array
  "123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry,R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"
end

def delete_at
  ary = array.dup.split(",")
  ary.delete_at(10)
  ary.delete_at(5)
  @row = ary.join(",")
end

def slice!
  ary = array.dup.split(",")
  ary.slice!(10)
  ary.slice!(5)
  @row = ary.join(",")
end

n = 1_000_000
Benchmark.bmbm(15) do |x|
  x.report("delete_at :") { n.times do; delete_at; end }
  x.report("slice! :")    { n.times do; slice!   ; end }
end
# Rehearsal ---------------------------------------------------
# delete_at : 4.560000 0.000000 4.560000 ( 4.566496)
# slice! : 4.580000 0.010000 4.590000 ( 4.576767)
# ------------------------------------------ total: 9.150000sec
#
# user system total real
# delete_at : 4.500000 0.000000 4.500000 ( 4.505638)
# slice! : 4.600000 0.000000 4.600000 ( 4.613447)
I have a method, foo, that yields objects. I want to count the number of objects it yields.
I have
def total_foo
  count = 0
  foo { |f| count += 1 }
  count
end
but there's probably a better way. Any ideas for this new Rubyist?
Here's the definition for foo (it's a helper method in Rails):
def foo(resource = @resource)
  resource.thingies.each do |thingy|
    bar(thingy) { |b| yield b } # bar also yields objects
  end
end
Any method that calls yield can be used to build an Enumerator object, on which you can call count, by means of the Object#to_enum method. Remember that calling count actually runs the iterator, so it should be free of side effects! Here is a runnable example that mimics your scenario:
@resources = [[1, 2], [3, 4]]

def foo(resources = @resources)
  resources.each do |thingy|
    thingy.each { |b| yield b }
  end
end
foo { |i| puts i }
# Output:
# 1
# 2
# 3
# 4
to_enum(:foo).count
# => 4
You can pass an argument to foo:
to_enum(:foo, [[5,6]]).count
# => 2
Alternatively, you can define foo to return an Enumerator when it's called without a block; this is the way the stdlib's iterators work:
def foo(resources = @resources)
  return to_enum(__method__, resources) unless block_given?
  resources.each do |thingy|
    thingy.each { |b| yield b }
  end
end
foo.count
# => 4
foo([[1,2]]).count
# => 2
foo([[1,2]]) { |i| puts i }
# Output:
# 1
# 2
You can also pass a block to to_enum; it is called when you call size on the Enumerator, and its return value is used as the size:
def foo(resources = @resources)
  unless block_given?
    return to_enum(__method__, resources) do
      resources.map(&:size).reduce(0, :+) # thanks to @Ajedi32
    end
  end
  resources.each do |thingy|
    thingy.each { |b| yield b }
  end
end
foo.size
# => 4
foo([]).size
# => 0
In this case using size is slightly faster than count; your mileage may vary.
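If you want to check that on your own data, a rough benchmark along these lines should do; it assumes the last definition of foo above (the one with the size block) and the @resources from earlier, and the numbers will of course vary:

require 'benchmark'

enum = foo # the block-less call returns an Enumerator
Benchmark.bm(7) do |x|
  x.report('size')  { 100_000.times { enum.size } }  # only calls the size block
  x.report('count') { 100_000.times { enum.count } } # iterates the whole enumerator
end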
Assuming you otherwise only care about the side-effect of foo, you could have foo itself count the iterations:
def foo(resource = @resource)
  count = 0
  resource.thingies.each do |thingy|
    bar(thingy) do |b|
      count += 1
      yield b
    end # bar also yields objects
  end
  count
end
And then:
count = foo { |f| whatever... }
You can also ignore the return value if you choose, so just:
foo { |f| whatever... }
in cases where you don't care what the count is.
There may be better ways to handle all of this depending upon the bigger context.
I have the following class:
require 'strscan'
class ConfParser
  include Enumerable

  class Error < StandardError; end

  VERSION = '0.0.1'

  SECTION_REGEX = /^\[       # Opening bracket
                   ([^\]]+)  # Section name
                   \]$       # Closing bracket
                  /x

  PARAMETER_REGEX = /^\s*([^:]+)  # Option
                     :
                     (.*?)$       # Value
                    /x

  attr_accessor :filename, :sections

  CONFIG_DIRECTORY = "./config"
  ENCODING = "UTF-8"

  def self.read(filename, opts = {})
    new(opts.merge(:filename => filename))
  end

  def initialize(opts = {})
    @filename = opts.fetch(:filename)
    @separator = opts.fetch(:separator, ":")
    @file = "#{CONFIG_DIRECTORY}/#{@filename}"
    @content = nil
    @config = Hash.new { |h, k| h[k] = Hash.new }
    load
  end

  def load
raise_error("First line of config file contain be blank") if first_line_empty?
    f = File.open(@file, 'r')
    @content = f.read
    parse!
  ensure
    f.close if f && !f.closed?
  end

  def sections
    @config.keys
  end

  def [](section)
    return nil if section.nil?
    @config[section.to_s]
  end

  def []=(section, value)
    @config[section.to_s] = value
  end

  private

  def parse!
    @_section = nil
    @_current_line = nil
    property = ''
    string = ''

    @config.clear

    scanner = StringScanner.new(@content)
    until scanner.eos?
      @_current_line = scanner.check(%r/\A.*$/) if scanner.bol?
      if scanner.scan(SECTION_REGEX)
        @_section = @config[scanner[1]]
      else
        tmp = scanner.scan_until(%r/([\n"#{@param}#{@comment}] | \z | \\[\[\]#{@param}#{@comment}"])/mx)
        raise_error if tmp.nil?
        len = scanner[1].length
        tmp.slice!(tmp.length - len, len)
        scanner.pos = scanner.pos - len
        string << tmp
      end
    end
    process_property(property, string)
    logger @config
  end

  def process_property(property, value)
    value.chomp!
    return if property.empty? and value.empty?
    return if value.sub!(%r/\\\s*\z/, '')

    property.strip!
    value.strip!

    parse_error if property.empty?

    current_section[property.dup] = unescape_value(value.dup)

    property.slice!(0, property.length)
    value.slice!(0, value.length)
    nil
  end

  def logger(log)
    puts "*" * 50
    puts log
    puts "*" * 50
  end

  def first_line_empty?
    File.readlines(@file).first.chomp.empty?
  end

  def raise_error(msg = 'Error processing line')
    raise Error, "#{msg}: #{@_current_line}"
  end

  def current_section
    @_section ||= @config['header']
  end
end
The above class parses files that are set up like so:
[header]
project: Hello World
budget : 4.5
accessed :205
[meta data]
description : This is a tediously long description of the Hello World
project that you are taking. Tedious isn't the right word, but
it's the first word that comes to mind.
correction text: I meant 'moderately,' not 'tediously,' above.
[ trailer ]
budget:all out of budget.
You start running it like this:
require 'conf_parser'
cf = ConfParser.read "/path/to/conf/file"
For some reason, when the parse! method runs, an infinite loop occurs and I can't figure out why. Any idea why this would be happening? I have never used StringScanner before, so it may be my lack of knowledge of the class.
At the risk of stating the obvious, you are most likely never satisfying scanner.eos?, which in turn would mean that you're not advancing the scan pointer to the end of the string. Since the only change to scanner.pos in the else branch of parse! is to decrement it (i.e. by len), this would be understandable. If the if branch doesn't advance it to the end, you'll never terminate.
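A minimal illustration of the shape that avoids the problem (purely illustrative, not the question's actual grammar): every pass through the loop must either match and consume input or explicitly consume at least one character, so that scanner.pos always moves forward and eos? eventually becomes true.

require 'strscan'

scanner = StringScanner.new("project: Hello World\n[meta data]\n")
until scanner.eos?
  if scanner.scan(/\[([^\]]+)\]\n?/)          # a section header
    puts "section: #{scanner[1]}"
  elsif scanner.scan(/([^:\n]+):\s*(.*)\n?/)  # a key: value line
    puts "#{scanner[1].strip} => #{scanner[2]}"
  else
    scanner.getch                             # always make progress, even on unexpected input
  end
end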