Retrieving from an array an object that satisfies some characteristics - ruby

I have some objects stored in an array. Given a certain property-value pair, I need a function that returns the first object that matches. For example, objects.byName("John") should return the first object whose name is "John".
Currently I'm doing this:
def self.byName(name)
  id_obj_by_name = {}
  @@objects.each_with_index do |o, index|
    id_obj_by_name[o.name] = index
  end
  @@objects[id_obj_by_name[name]]
end
But this seems very slow, and it uses a lot of memory. How can I improve it?

If you need performance, you should consider this approach:
require 'benchmark'

class Foo
  def initialize(name)
    @name = name
  end

  def name
    @name
  end
end

# Using array ######################################################################
test = []
500000.times do |i|
  test << Foo.new("ABC" + i.to_s + "#!###!DS")
end

puts "using array"
time = Benchmark.measure {
  result = test.find { |o| o.name == "ABC250000#!###!DS" }
}
puts time
####################################################################################

# Using a hash #####################################################################
test = {}
i_am_your_object = Object.new
500000.times do |i|
  test["ABC" + i.to_s + "#!###!DS"] = i_am_your_object
end

puts "using hash"
time = Benchmark.measure {
  result = test["ABC250000#!###!DS"]
}
puts time
####################################################################################
Results:
using array
0.060000 0.000000 0.060000 ( 0.060884)
using hash
0.000000 0.000000 0.000000 ( 0.000005)
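To apply that to the original class, build the index once and reuse it across lookups instead of rebuilding it on every call. A minimal sketch, assuming the @@objects class variable from the question (the memoized index must be reset whenever @@objects changes):

def self.by_name(name)
  # Memoize the index in a class-level instance variable; each name maps
  # to the first object seen with that name.
  @index_by_name ||= @@objects.each_with_object({}) do |o, h|
    h[o.name] ||= o
  end
  @index_by_name[name]
end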

Try something like
def self.by_name(name)
  @@objects.find { |o| o.name == name }
end


Array#delete_at or Array#slice!? and how to look up implementations

I'm scrubbing large data files (over 1 million comma-separated rows). An example row might look like this:
@row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry,R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"
Certain columns must be removed from it, after which the row should look like this:
@row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,R,SKUInfo,05-MAR-14 05:50:24,SourceID,TransactionalID"
QUESTION 1: If I convert a row of data into an Array, which method is preferred for removing elements: Array#delete_at or Array#slice!? I'd like to know which is the more idiomatic option. Performance is a consideration here, and I'm on a Windows machine.
def remove_bad_columns
  ary = @row.split(",")
  ary.delete_at(10)
  ary.delete_at(5)
  @row = ary.join(",")
end
QUESTION 2: I was wondering whether one of these methods is implemented using the other. How can I see how the methods are built in Ruby? (How for is implemented in terms of each, for example.)
I suggest you use Array#values_at rather than delete_at or slice!:
def remove_vals(str, *indices)
  ary = str.split(",")
  v = (0...ary.size).to_a - indices
  ary.values_at(*v).join(",")
end
#row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry," +
"R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"
#row = remove_vals(#row, 5, 10)
#=> "123456789,11122,CustomerName,2014-01-31,2014-02-01,R,SKUInfo," +
# "05-MAR-14 05:50:24,SourceID,TransactionalID"
Array#values_at has the advantage over the other two methods that you don't have to worry about the order in which the elements are removed.
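To make the order pitfall concrete (my example, not from the original answer): with delete_at you must delete the higher index first, because deleting the lower one shifts everything after it.

ary = ("a".."l").to_a    # indices 0..11
ary.delete_at(5)         # removes "f"
ary.delete_at(10)        # removes "l", which was originally at index 11
# Deleting index 10 first and then index 5 removes "k" and "f" as intended,
# which is why remove_bad_columns above calls delete_at(10) before delete_at(5).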
The efficiency of this method is not significantly different from that of the other two. If @spickermann would like to add it to his benchmarks, he could use this:
def values_at
  ary = array.split(",")
  v = (0...ary.size).to_a - [5, 10]
  @row = ary.values_at(*v).join(",")
end
There is not really a difference in performance. I would prefer delete_at because it reads more naturally.
require 'benchmark'

def array
  "123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry,R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"
end

def delete_at
  ary = array.dup.split(",")
  ary.delete_at(10)
  ary.delete_at(5)
  @row = ary.join(",")
end

def slice!
  ary = array.dup.split(",")
  ary.slice!(10)
  ary.slice!(5)
  @row = ary.join(",")
end

n = 1_000_000
Benchmark.bmbm(15) do |x|
  x.report("delete_at :") { n.times { delete_at } }
  x.report("slice!    :") { n.times { slice! } }
end
# Rehearsal ---------------------------------------------------
# delete_at : 4.560000 0.000000 4.560000 ( 4.566496)
# slice! : 4.580000 0.010000 4.590000 ( 4.576767)
# ------------------------------------------ total: 9.150000sec
#
# user system total real
# delete_at : 4.500000 0.000000 4.500000 ( 4.505638)
# slice! : 4.600000 0.000000 4.600000 ( 4.613447)
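As for QUESTION 2: Method#source_location returns the file and line of a method written in Ruby, and nil for a method implemented in C, which is the case for both of these in MRI, so the place to read their implementations is array.c in the Ruby source tree. A quick check (my addition, not from the original answers):

# nil means the method is implemented in C (in MRI), so look in array.c
p [].method(:delete_at).source_location  # => nil
p [].method(:slice!).source_location     # => nil

# For comparison, a method defined in Ruby reports where it lives:
require "csv"
p CSV.method(:parse).source_location     # => ["/path/to/csv.rb", <line>]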

Replace keys from the hash with values from the hash

I have a hash (with hundreds of pairs) and I have a string.
I want to replace, in this string, all occurrences of each key from the hash with the corresponding value from the hash.
I understand that I can do something like this
some_hash.each { |key, value| str = str.gsub(key, value) }
However, I am wondering whether there is some better (performance wise) method to do this.
You only need to run gsub once. Since regex (oniguruma) is implemented in C, it should be faster than looping within Ruby.
some_hash = {
  "a" => "A",
  "b" => "B",
  "c" => "C",
}

"abcdefgabcdefg".gsub(Regexp.union(some_hash.keys), some_hash)
# => "ABCdefgABCdefg"
Some benchmarks:
require 'benchmark'

SOME_HASH = Hash[('a'..'z').zip('A'..'Z')]
SOME_REGEX = Regexp.union(SOME_HASH.keys)
SHORT_STRING = ('a'..'z').to_a.join
LONG_STRING = SHORT_STRING * 100
N = 10_000

def sub1(str)
  SOME_HASH.each { |key, value| str = str.gsub(key, value) }
  str
end

def sub2(str)
  SOME_HASH.each { |key, value| str.gsub!(key, value) }
  str
end

def sub_regex(str)
  str.gsub(SOME_REGEX, SOME_HASH)
end

puts RUBY_VERSION
puts "#{ N } loops"
puts
puts "sub1:      #{ sub1(SHORT_STRING) }"
puts "sub2:      #{ sub2(SHORT_STRING) }"
puts "sub_regex: #{ sub_regex(SHORT_STRING) }"
puts

Benchmark.bm(10) do |b|
  b.report('gsub')  { N.times { sub1(LONG_STRING) } }
  b.report('gsub!') { N.times { sub2(LONG_STRING) } }
  b.report('regex') { N.times { sub_regex(LONG_STRING) } }
end
Which outputs:
1.9.3
10000 loops
sub1: ABCDEFGHIJKLMNOPQRSTUVWXYZ
sub2: ABCDEFGHIJKLMNOPQRSTUVWXYZ
sub_regex: ABCDEFGHIJKLMNOPQRSTUVWXYZ
user system total real
gsub 14.360000 0.030000 14.390000 ( 14.412178)
gsub! 1.940000 0.010000 1.950000 ( 1.957591)
regex 0.080000 0.000000 0.080000 ( 0.075038)
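One property worth spelling out (my note, not part of the original answer): Regexp.union escapes regex metacharacters when given plain strings, so keys containing characters like . or * are matched literally rather than as patterns:

subs = { "." => "<dot>", "*" => "<star>" }
Regexp.union(subs.keys)                      # => /\.|\*/
"a.b*c".gsub(Regexp.union(subs.keys), subs)  # => "a<dot>b<star>c"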

Ruby/EventMachine Packet Parser

I am trying to write a custom EM::Protocol module that can pack/unpack structured binary packets. Packet structure should be defined as name/format pairs, either as a string, some other easily parsable format, or some sort of DSL.
Some quick code to get the idea across:
require 'eventmachine'

module PacketProtocol
  def self.included(base)
    base.extend ClassMethods
  end

  def receive_data(data)
    # retrieve packet header
    # find matching packet definition
    # packet.unpack(data)
  end

  module ClassMethods
    def packet(defn)
      # create an instance of Packet (see below) and shove it
      # somewhere i can get to later.
    end
  end
end

module MyHandler
  include PacketProtocol
  packet '<id:S><len:S><msg:A%{len}>'
end

EM.run do
  EM.start_server '0.0.0.0', 8080, MyHandler
end
My goal is to minimize runtime complexity. Packet definitions are static per execution, so I would like to avoid this (crude) implementation:
class Packet
  FmtSizes = {
    'S' => 2,
    'A' => Proc.new { |fmt| fmt[1..-1].to_i }
  }

  def initialize(defn)
    @fields = defn.scan(/<([^>]+):([^>]+)>/)
  end

  def pack(data)
    data.values.pack(@fields.map { |name, fmt| fmt % data }.join)
  end

  def unpack(bytes)
    data = {}
    posn = 0
    @fields.each do |name, fmt|
      fmt = fmt % data
      len = FmtSizes[fmt[0]]
      len = len.call(fmt) if len.is_a?(Proc)
      data[name.to_sym] = bytes[posn..posn + len - 1].unpack(fmt)[0]
      posn += len
    end
    data
  end
end
data = { :id => 1, :len => 5, :msg => 'Hello' }
packet = Packet.new('<id:S><len:S><msg:A%{len}>')
packed = packet.pack(data)
require 'benchmark'

Benchmark.bm(7) do |x|
  x.report('slow') {
    100000.times do
      unpacked = packet.unpack(packed)
    end
  }
  x.report('fast') {
    100000.times do
      data = {}
      data[:id]  = packed[0..1].unpack('S' % data)
      data[:len] = packed[2..3].unpack('S' % data)
      data[:msg] = packed[4..8].unpack('A%{len}' % data)
    end
  }
end

# output:
#             user     system      total        real
# slow    1.970000   0.000000   1.970000 (  1.965525)
# fast    0.140000   0.000000   0.140000 (  0.146227)
Of the two examples, using the Packet class is more than an order of magnitude slower.
SO. The question is:
Is there a way (or a gem) that allows you to generate code at runtime (other than simply eval'ing strings)?
EDIT:
Just found BinData. While its feature set is nice, it also benchmarks much slower.
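No answer is recorded here, but one common middle ground is worth sketching: generate the same straight-line code as the "fast" benchmark case once per packet definition and install it with class_eval. This is still string eval, which the question hoped to avoid, but it runs once at definition time rather than per packet. The sketch below is mine (the names CompiledPacket and compile are made up), and it only handles fixed-size fields followed by at most one trailing variable-length field:

class CompiledPacket
  FIXED_SIZES = { 'S' => 2 }  # the only fixed-size format used in this sketch

  # Generate the same straight-line code as the hand-written "fast" case,
  # then eval it once. Handles fixed-size fields plus at most one trailing
  # variable-length field whose format refers to earlier fields.
  def self.compile(defn)
    posn = 0
    body = "data = {}\n"
    defn.scan(/<([^>]+):([^>]+)>/).each do |name, fmt|
      if FIXED_SIZES.key?(fmt)
        len = FIXED_SIZES[fmt]
        body << "data[:#{name}] = bytes[#{posn}, #{len}].unpack('#{fmt}')[0]\n"
        posn += len
      else
        # e.g. 'A%{len}' % data resolves to 'A5' once :len has been decoded
        body << "data[:#{name}] = bytes[#{posn}..-1].unpack('#{fmt}' % data)[0]\n"
      end
    end
    klass = Class.new
    klass.class_eval("def unpack(bytes)\n#{body}data\nend")
    klass.new
  end
end

packet = CompiledPacket.compile('<id:S><len:S><msg:A%{len}>')
packet.unpack(packed)  # => {:id=>1, :len=>5, :msg=>"Hello"}

A packet compiled this way should unpack at roughly the speed of the hand-written version, since the per-call path contains no loop and no format parsing.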

Convert Input Value to Integer or Float, as Appropriate Using Ruby

I believe I have a good answer to this issue, but I wanted to make sure ruby-philes didn't have a much better way to do this.
Basically, given an input string, I would like to convert the string to an integer, where appropriate, or a float, where appropriate. Otherwise, just return the string.
I'll post my answer below, but I'd like to know if there is a better way out there.
Ex:
to_f_or_i_or_s("0523.49") #=> 523.49
to_f_or_i_or_s("0000029") #=> 29
to_f_or_i_or_s("kittens") #=> "kittens"
I would avoid using regex whenever possible in Ruby. It's notoriously slow.
def to_f_or_i_or_s(v)
  ((float = Float(v)) && (float % 1.0 == 0) ? float.to_i : float) rescue v
end
# Proof of Ruby's slow regex
require 'benchmark'

def regex_float_detection(input)
  input.match('\.')
end

def math_float_detection(input)
  input % 1.0 == 0
end

n = 100_000
Benchmark.bm(30) do |x|
  x.report("Regex") { n.times { regex_float_detection("1.1") } }
  x.report("Math")  { n.times { math_float_detection(1.1) } }
end

#                                    user     system      total        real
# Regex                          0.180000   0.000000   0.180000 (  0.181268)
# Math                           0.050000   0.000000   0.050000 (  0.048692)
A more comprehensive benchmark:
require 'benchmark'

def wattsinabox(input)
  input.match('\.').nil? ? Integer(input) : Float(input) rescue input.to_s
end

def jaredonline(input)
  ((float = Float(input)) && (float % 1.0 == 0) ? float.to_i : float) rescue input
end

def muistooshort(input)
  case input
  when /\A\s*[+-]?\d+\.\d+\z/
    input.to_f
  when /\A\s*[+-]?\d+(\.\d+)?[eE]\d+\z/
    input.to_f
  when /\A\s*[+-]?\d+\z/
    input.to_i
  else
    input
  end
end

n = 1_000_000
Benchmark.bm(30) do |x|
  x.report("wattsinabox")  { n.times { wattsinabox("1.1") } }
  x.report("jaredonline")  { n.times { jaredonline("1.1") } }
  x.report("muistooshort") { n.times { muistooshort("1.1") } }
end

#                                    user     system      total        real
# wattsinabox                    3.600000   0.020000   3.620000 (  3.647055)
# jaredonline                    1.400000   0.000000   1.400000 (  1.413660)
# muistooshort                   2.790000   0.010000   2.800000 (  2.803939)
def to_f_or_i_or_s(v)
  v.match('\.').nil? ? Integer(v) : Float(v) rescue v.to_s
end
A pile of regexes might be a good idea if you want to handle numbers in scientific notation (which String#to_f does):
def to_f_or_i_or_s(v)
  case v
  when /\A\s*[+-]?\d+\.\d+\z/
    v.to_f
  when /\A\s*[+-]?\d+(\.\d+)?[eE]\d+\z/
    v.to_f
  when /\A\s*[+-]?\d+\z/
    v.to_i
  else
    v
  end
end
You could mash both to_f cases into one regex if you wanted.
This will, of course, fail when fed '3,14159' in a locale that uses a comma as a decimal separator.
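To mash them together (my combination, not from the original answer), the two to_f branches collapse into a single alternation that accepts exactly the same strings:

def to_f_or_i_or_s(v)
  case v
  when /\A\s*[+-]?\d+(\.\d+([eE]\d+)?|[eE]\d+)\z/
    v.to_f
  when /\A\s*[+-]?\d+\z/
    v.to_i
  else
    v
  end
end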
Depends on security requirements.
def to_f_or_i_or_s(s)
  eval(s) rescue s
end
I used this method
def to_f_or_i_or_s(value)
  return value if value[/[a-zA-Z]/]
  i = value.to_i
  f = value.to_f
  i == f ? i : f
end
CSV has converters which do this.
require "csv"
strings = ["0523.49", "29","kittens"]
strings.each{|s|p s.parse_csv(converters: :numeric).first}
#523.49
#29
#"kittens"
However for some reason it converts "00029" to a float.
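The likely culprit (my reading of the converters, worth verifying against your CSV version): the :integer converter relies on Kernel#Integer, which treats a leading 0 in a string as an octal prefix, so the integer conversion of "00029" fails (9 is not an octal digit) and the :float converter picks it up instead:

begin
  Integer("00029")  # leading 0 means octal, and 9 is not an octal digit
rescue ArgumentError => e
  p e               # => #<ArgumentError: invalid value for Integer(): "00029">
end
p Float("00029")    # => 29.0, which is why :numeric returns a float here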

ruby fast reading from stdin

What is the fastest way to read a number of 1,000,000 characters (digits) from STDIN and split it into an array of one-character integers (not strings)?
"123456" => [1, 2, 3, 4, 5, 6]
The quickest method I have found so far is as follows:
gets.unpack("c*").map { |c| c - 48 }
Here are some results from benchmarking most of the provided solutions. These tests were run with a 100,000 digit file but with 10 reps for each test.
user system total real
each_char_full_array: 1.780000 0.010000 1.790000 ( 1.788893)
each_char_empty_array: 1.560000 0.010000 1.570000 ( 1.572162)
map_byte: 0.760000 0.010000 0.770000 ( 0.773848)
gets_scan 2.220000 0.030000 2.250000 ( 2.250076)
unpack: 0.510000 0.020000 0.530000 ( 0.529376)
And here is the code that produced them
#!/usr/bin/env ruby
require "benchmark"

MAX_ITERATIONS = 100000
FILE_NAME = "1_million_digits"

def build_test_file
  File.open(FILE_NAME, "w") do |f|
    MAX_ITERATIONS.times { |x| f.syswrite rand(10) }
  end
end

def each_char_empty_array
  STDIN.reopen(FILE_NAME)
  a = []
  STDIN.each_char do |c|
    a << c.to_i
  end
  a
end

def each_char_full_array
  STDIN.reopen(FILE_NAME)
  a = Array.new(MAX_ITERATIONS)
  idx = 0
  STDIN.each_char do |c|
    a[idx] = c.to_i
    idx += 1
  end
  a
end

def map_byte
  STDIN.reopen(FILE_NAME)
  a = STDIN.bytes.map { |c| c - 48 }
  a[-1] == -38 && a.pop
  a
end

def gets_scan
  STDIN.reopen(FILE_NAME)
  gets.scan(/\d/).map(&:to_i)
end

def unpack
  STDIN.reopen(FILE_NAME)
  gets.unpack("c*").map { |c| c - 48 }
end

reps = 10
build_test_file
Benchmark.bm(10) do |x|
  x.report("each_char_full_array: ") { reps.times { |y| each_char_full_array } }
  x.report("each_char_empty_array:") { reps.times { |y| each_char_empty_array } }
  x.report("map_byte: ")             { reps.times { |y| map_byte } }
  x.report("gets_scan ")             { reps.times { |y| gets_scan } }
  x.report("unpack: ")               { reps.times { |y| unpack } }
end
This should be reasonably fast:
a = []
STDIN.each_char do |c|
  a << c.to_i
end
although some rough benchmarking shows this hackish version is considerably faster:
a = STDIN.bytes.map { |c| c - 48 }
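A note on why the c - 48 arithmetic works (my addition): unpack("c*") and bytes both yield raw byte values, and the digit characters "0".."9" occupy ASCII codes 48..57, so subtracting 48 maps them back to 0..9. A trailing newline (byte 10) becomes 10 - 48 = -38, which is exactly the sentinel the map_byte benchmark above pops off the end:

"123\n".unpack("c*")                     # => [49, 50, 51, 10]
"123\n".unpack("c*").map { |c| c - 48 }  # => [1, 2, 3, -38]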
scan(/\d/).map(&:to_i)
This will split any string into an array of integers, ignoring any non-numeric characters. If you want to grab user input from STDIN add gets:
gets.scan(/\d/).map(&:to_i)
