Summary
Given a Hash, what is the most efficient way to create a subset Hash based on a list of keys to use?
h1 = { a:1, b:2, c:3 } # Given a hash...
p foo( h1, :a, :c, :d ) # ...create a method that...
#=> { :a=>1, :c=>3, :d=>nil } # ...returns specified keys...
#=> { :a=>1, :c=>3 } # ...or perhaps only keys that exist
Details
The Sequel database toolkit allows one to create or update a model instance by passing in a Hash:
foo = Product.create( hash_of_column_values )
foo.update( another_hash )
The Sinatra web framework makes available a Hash named params that includes form variables, querystring parameters and also route matches.
If I create a form holding only fields named the same as the database columns and post it to this route, everything works very conveniently:
post "/create_product" do
new_product = Product.create params
redirect "/product/#{new_product.id}"
end
However, this is both fragile and dangerous. It's dangerous because a malicious hacker could post a form with columns not intended to be changed and have them updated. It's fragile because using the same form with this route will not work:
post "/update_product/:foo" do |prod_id|
if product = Product[prod_id]
product.update(params)
#=> <Sequel::Error: method foo= doesn't exist or access is restricted to it>
end
end
So, for robustness and security I want to be able to write this:
post "/update_product/:foo" do |prod_id|
if product = Product[prod_id]
# Only update two specific fields
product.update(params.slice(:name,:description))
# The above assumes a Hash (or Sinatra params) monkeypatch
# I will also accept standalone helper methods that perform the same
end
end
...instead of the more verbose and non-DRY option:
post "/update_product/:foo" do |prod_id|
if product = Product[prod_id]
# Only update two specific fields
product.update({
name:params[:name],
description:params[:description]
})
end
end
Update: Benchmarks
Here are the results of benchmarking the (current) implementations:
user system total real
sawa2 0.250000 0.000000 0.250000 ( 0.269027)
phrogz2 0.280000 0.000000 0.280000 ( 0.275027)
sawa1 0.297000 0.000000 0.297000 ( 0.293029)
phrogz3 0.296000 0.000000 0.296000 ( 0.307031)
phrogz1 0.328000 0.000000 0.328000 ( 0.319032)
activesupport 0.639000 0.000000 0.639000 ( 0.657066)
mladen 1.716000 0.000000 1.716000 ( 1.725172)
The second answer by @sawa is the fastest of all, a hair in front of my tap-based implementation (based on his first answer). Choosing to add the check for has_key? adds very little time, and the result is still more than twice as fast as ActiveSupport.
Here is the benchmark code:
h1 = Hash[ ('a'..'z').zip(1..26) ]
keys = %w[a z c d g A x]
n = 60000
require 'benchmark'
Benchmark.bmbm do |x|
%w[ sawa2 phrogz2 sawa1 phrogz3 phrogz1 activesupport mladen ].each do |m|
x.report(m){ n.times{ h1.send(m,*keys) } }
end
end
I would just use the slice method provided by ActiveSupport:
require 'active_support/core_ext/hash/slice'
{a: 1, b: 2, c: 3}.slice(:a, :c) # => {a: 1, c: 3}
Of course, make sure to update your Gemfile:
gem 'activesupport'
I changed my mind. The previous one doesn't seem to be any good.
class Hash
def slice1(*keys)
keys.each_with_object({}){|k, h| h[k] = self[k]}
end
def slice2(*keys)
h = {}
keys.each{|k| h[k] = self[k]}
h
end
end
Sequel has built-in support for only picking specific columns when updating:
product.update_fields(params, [:name, :description])
That doesn't do exactly the same thing if :name or :description is not present in params, though. But assuming you are expecting the user to use your form, that shouldn't be an issue.
I could always expand update_fields to take an option hash with an option that will skip the value if not present in the hash. I just haven't received a request to do that yet.
Perhaps
class Hash
def slice *keys
select{|k| keys.member?(k)}
end
end
Or you could just copy ActiveSupport's Hash#slice, it looks a bit more robust.
Here are my implementations; I will benchmark and accept faster (or sufficiently more elegant) solutions:
# Implementation 1
class Hash
def slice(*keys)
Hash[keys.zip(values_at *keys)]
end
end
# Implementation 2
class Hash
def slice(*keys)
{}.tap{ |h| keys.each{ |k| h[k]=self[k] } }
end
end
# Implementation 3 - silently ignore keys not in the original
class Hash
def slice(*keys)
{}.tap{ |h| keys.each{ |k| h[k]=self[k] if has_key?(k) } }
end
end
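For readers who would rather not reopen Hash, here is a minimal sketch of a standalone helper covering both behaviours discussed above (include missing keys as nil, or skip them). The name pick and the include_missing: option are made up for this illustration and do not come from any library; it has not been benchmarked against the implementations above.

```ruby
# Hypothetical standalone helper (not part of Ruby, Sequel or Sinatra).
# Returns a new hash holding only the requested keys.
def pick(hash, *keys, include_missing: false)
  keys.each_with_object({}) do |key, result|
    # Copy the value when the key exists, or always when nils are wanted.
    result[key] = hash[key] if include_missing || hash.key?(key)
  end
end

h1 = { a: 1, b: 2, c: 3 }

only_existing = pick(h1, :a, :c, :d)
# only_existing holds :a => 1 and :c => 3

with_nils = pick(h1, :a, :c, :d, include_missing: true)
# with_nils holds :a => 1, :c => 3 and :d => nil
```

On Ruby 2.5 and later, Hash#slice is built in and ignores keys that are not present, so a helper like this is mainly useful on older versions or when nil placeholders are wanted.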
Related
I am developing a Ruby application where I am dynamically invoking methods based on JSON data. Loosely:
def items
# do something
end
def createItem( name:, data:nil )
# do something that requires a name keyword argument
end
def receive_json(json) # e.g. { "cmd":"createItem", "name":"jim" }
hash = JSON.parse(json)
cmd = hash.delete('cmd')
if respond_to?(cmd)
params = Hash[ hash.map{ |k,v| [k.to_sym, v] } ]
method(cmd).arity==0 ? send(cmd) : send(cmd,params)
end
end
As shown above, some methods take no arguments, and some take keyword arguments. Under Ruby 2.1.0 (where I'm developing) the arity of both methods above is 0. However, if I send(cmd,params) always, I get an error for methods that take no parameters.
How can I use send to correctly pass along the keyword arguments when desired, but omit them when not?
Using parameters instead of arity appears to work for my needs:
method(cmd).parameters.empty? ? send(cmd) : send(cmd,opts)
More insight into the richness of the parameters return values:
def foo; end
method(:foo).parameters
#=> []
def bar(a,b=nil); end
method(:bar).parameters
#=> [[:req, :a], [:opt, :b]]
def jim(a:,b:nil); end
method(:jim).parameters
#=> [[:keyreq, :a], [:key, :b]]
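Since parameters reports a type for every entry, it can also be used to separate keyword parameters from positional ones. The following is only a sketch; keyword_names is a hypothetical helper name, and it assumes that only the :keyreq and :key types should count as keywords.

```ruby
# Hypothetical helper: list the keyword parameter names a method accepts.
def keyword_names(meth)
  meth.parameters
      .select { |type, _name| [:keyreq, :key].include?(type) }
      .map { |_type, name| name }
end

def foo; end
def bar(a, b = nil); end
def jim(a:, b: nil); end

keyword_names(method(:foo)) # empty: no parameters at all
keyword_names(method(:bar)) # empty: positional parameters only
keyword_names(method(:jim)) # [:a, :b]
```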
Here's a generic method that picks out only those named values that your method supports, in case you have extra keys in your hash that aren't part of the keyword arguments used by the method:
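module Kernel
  def dispatch(name, args)
    # Keep only the entries of args whose keys match a parameter name of the target method.
    keyargs = method(name).parameters.map do |_type, param|
      [param, args[param]] if args.include?(param)
    end.compact.to_h
    # Double-splat so the hash is passed as keyword arguments (needed on Ruby 3 and later).
    keyargs.empty? ? send(name) : send(name, **keyargs)
  end
end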
h = {a:1, b:2, c:3}
def no_params
p :yay
end
def few(a:,b:99)
p a:a, b:b
end
def extra(a:,b:,c:,z:17)
p a:a, b:b, c:c, z:z
end
dispatch(:no_params,h) #=> :yay
dispatch(:few,h) #=> {:a=>1, :b=>2}
dispatch(:extra,h) #=> {:a=>1, :b=>2, :c=>3, :z=>17}
At first, I thought params is supposed to become empty when the :cmd value is "items", in which case Jesse Sielaff's answer would be correct. But since you seem to be claiming that it isn't, I think that it is your design flaw. Instead of trying to dispatch in that way, you should rather have those methods just gobble the arguments:
def items(name:nil, data:nil)
...
end
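A variation on this suggestion, assuming you are free to change the method signatures: a double-splat parameter lets each method swallow whatever extra keywords arrive without naming them. The method names and return values below are placeholders for illustration, not the asker's real code.

```ruby
# Takes no meaningful arguments, but tolerates any keywords it is sent.
def items(**_ignored)
  :items_called
end

# Requires name:, accepts data:, and ignores everything else.
def create_item(name:, data: nil, **_ignored)
  [name, data]
end

params = { name: "jim", extra: 1 }

items(**params)        # extra keywords are swallowed
create_item(**params)  # name is picked out, :extra is ignored
```

With this shape the caller can always pass the keywords, so no arity or parameters check is needed at the call site.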
I'm scrubbing large data files (+1MM comma-separated rows). An example row might look like this:
@row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry,R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"
Certain columns must be removed from it, after which the row should look like this:
@row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,R,SKUInfo,05-MAR-14 05:50:24,SourceID,TransactionalID"
QUESTION 1: If I convert a row of data into an Array, which method is preferred for removing elements: Array#delete_at or Array#slice!? I'd like to know which is the more idiomatic option. Performance is a consideration here, and I'm on a Windows machine.
def remove_bad_columns
  ary = @row.split(",")
  ary.delete_at(10)
  ary.delete_at(5)
  @row = ary.join(",")
end
QUESTION 2: I was wondering if one of these methods was implemented using the other. How can I see how the methods are built in ruby? (How for is implemented using each, for example.)
I suggest you use Array#values_at rather than delete_at or slice!:
def remove_vals(str, *indices)
ary = str.split(",")
v = (0...ary.size).to_a - indices
ary.values_at(*v).join(",")
end
@row = "123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry," +
"R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"
@row = remove_vals(@row, 5, 10)
#=> "123456789,11122,CustomerName,2014-01-31,2014-02-01,R,SKUInfo," +
# "05-MAR-14 05:50:24,SourceID,TransactionalID"
Array#values_at has the advantage over the other two methods that you don't have to worry about the order in which the elements are removed.
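To make the ordering point concrete, here is a small sketch with made-up field names. It shows how deleting in ascending index order removes the wrong element, while deleting in descending order or using values_at does not.

```ruby
fields = %w[f0 f1 f2 f3 f4 f5]  # goal: drop the elements at indices 1 and 3

# Ascending order: after index 1 is deleted everything shifts left,
# so index 3 now points at what was originally f4.
ascending = fields.dup
ascending.delete_at(1)
ascending.delete_at(3)

# Descending order: later indices are removed first, so earlier ones stay valid.
descending = fields.dup
descending.delete_at(3)
descending.delete_at(1)

# values_at: compute the indices to keep once; order of removal is irrelevant.
keep = (0...fields.size).to_a - [1, 3]
kept = fields.values_at(*keep)

# ascending  is f0 f2 f3 f5 (f4 was removed by mistake)
# descending is f0 f2 f4 f5
# kept       is f0 f2 f4 f5
```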
The efficiency of this method is not significantly different from that of the other two. If @spickermann would like to add it to his benchmarks, he could use this:
def values_at
  ary = array.split(",")
  v = (0...ary.size).to_a - [5,10]
  @row = ary.values_at(*v).join(",")
end
There is not really a difference in performance. I would prefer delete_at because that reads nicer.
require 'benchmark'
def array
"123456789,11122,CustomerName,2014-01-31,2014-02-01,RemoveThisEntry,R,SKUInfo,05-MAR-14 05:50:24,SourceID,RemoveThisEntryToo,TransactionalID"
end
def delete_at
  ary = array.dup.split(",")
  ary.delete_at(10)
  ary.delete_at(5)
  @row = ary.join(",")
end
def slice!
  ary = array.dup.split(",")
  ary.slice!(10)
  ary.slice!(5)
  @row = ary.join(",")
end
require 'benchmark'
n = 1_000_000
Benchmark.bmbm(15) do |x|
x.report("delete_at :") { n.times do; delete_at; end }
x.report("slice! :") { n.times do; slice! ; end }
end
# Rehearsal ---------------------------------------------------
# delete_at : 4.560000 0.000000 4.560000 ( 4.566496)
# slice! : 4.580000 0.010000 4.590000 ( 4.576767)
# ------------------------------------------ total: 9.150000sec
#
# user system total real
# delete_at : 4.500000 0.000000 4.500000 ( 4.505638)
# slice! : 4.600000 0.000000 4.600000 ( 4.613447)
This probably isn't something you should try at home, but for some reason or another I tried to create an array of methods in Ruby.
I started by defining two methods.
irb(main):001:0> def test1
irb(main):002:1> puts "test!"
irb(main):003:1> end
=> nil
irb(main):004:0> def test2
irb(main):005:1> puts "test2!"
irb(main):006:1> end
=> nil
The weird thing happens when you try to put it into an actual array. It seems to run both methods.
irb(main):007:0> array = [test1, test2]
test!
test2!
=> [nil, nil]
And afterwards, the array is empty.
irb(main):008:0> puts array
=> nil
Can someone explain to me why it runs the methods? Other than that the whole exercise is seriously in need of an exorcist?
What you're storing in your array is the result of calling your methods, not the methods themselves.
def test1
puts "foo!"
end
def test2
puts "bar!"
end
You can store references to the actual methods like this:
> arr = [method(:test1), method(:test2)]
# => [#<Method: Object#test1>, #<Method: Object#test2>]
Later, you can call the referenced methods like this:
> arr.each {|m| m.call }
foo!
bar!
@alestanis explained the reason well. If you were trying to store the methods, then you can do what Lars Haugseth says, or you could do the following:
test1 = Proc.new { puts "test!" }
test2 = Proc.new { puts "test2!" }
a = [test1, test2]
This may make your code much more readable.
Here is an irb run.
1.9.3p194 :009 > test1 = Proc.new { puts "test!" }
=> #<Proc:0x00000002798a90@(irb):9>
1.9.3p194 :010 > test2 = Proc.new { puts "test2!" }
=> #<Proc:0x00000002792988@(irb):10>
1.9.3p194 :011 > a = [test1, test2]
=> [#<Proc:0x00000002798a90@(irb):9>, #<Proc:0x00000002792988@(irb):10>]
Your array never contains anything other than two nil values. It tricks you by printing the strings when it is evaluated, but the return value of each method is still nil.
Your code runs the two methods because you're actually calling the methods when you say "test1" and "test2" - parentheses are optional for ruby method calls.
Since both of your methods just contain a "puts", which returns nil, your resulting array is just an array of two nils.
If you had a square method and wanted to create an array with the square values of 2 and 4, you would write
array = [square(2), square(4)]
Here you are doing exactly the same thing, except that your test methods don't return anything and that's why your final array seems empty (actually, it contains [nil, nil]).
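A small sketch of that contrast, assuming a square method like the one mentioned above (it is not defined anywhere in the question): writing the method name calls it and stores the result, while method(:square) stores something that can be called later.

```ruby
def square(n)
  n * n
end

# Each element is the return value of a call made while the array is built.
values = [square(2), square(4)]

# Each element is a Method object; nothing is called until .call is used.
references = [method(:square)]
later = references.first.call(3)

# values is [4, 16]; later is 9
```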
Here's my two pennies' worth. Building on the solutions already posted, here is a working example. What might be handy for some is that it includes method arguments, the use of self (which refers to the instance of the PromotionalRules class once it is instantiated), and an array of symbols, which is neat; I got that from the Ruby docs on the #send method. Hope this helps someone!
class PromotionalRules
PROMOTIONS = [:lavender_heart_promotion, :ten_percent_discount]
def apply_promotions total, basket
@total = total
if PROMOTIONS.count > 0
PROMOTIONS.each { |promotion| @total = self.send promotion, @total, basket }
end
@total.round(2)
end
def lavender_heart_promotion total, basket
if two_or_more_lavender_hearts? basket
basket.map { |item| total -= 0.75 if item == 001 }
end
total
end
def two_or_more_lavender_hearts? basket
n = 0
basket.each do |item|
n += 1 if item == 001
end
n >= 2
end
def ten_percent_discount total, *arg
if total > 60.00
total = total - total/10
end
total
end
end
Thanks to everyone for their help. I love the open-source nature of coding - threads just get better and better as people iterate over each other's solutions!
There is a common idiom of using substitutions like:
def with clazz, &block
yield clazz
clazz
end
with Hash.new do |hash|
  hash.merge!(:a => 1)
end
Is there a way to go further and define #with to have a possibility of doing:
with Hash.new do |hash|
merge!{:a => 1}
end
or even:
with Hash.new do
merge!{:a => 1}
end
?
UPDATE
Later, I accidentally found exactly what I was looking for (a solution similar to the accepted one):
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/19153
UPDATE 2
It was added to sugar-high/dsl in https://github.com/kristianmandrup/sugar-high
UPDATE 3
The docile project on GitHub exploits this idea very nicely.
If you are referring to the way in which Rails does routing then I think you need to do something like this
def with(instance, &block)
instance.instance_eval(&block)
instance
end
with(Hash.new) do
merge!({:a => 1})
merge!({:b => 1})
end
This is how I can see it being done in the Rails source anyway. Start by looking at the draw method in action_pack/lib/action_dispatch/routing/route_set
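As a sketch only (this is not how Rails does it), with could support both call styles from the question by checking the block's arity: a block that names a parameter receives the object as an argument, and a block without parameters is evaluated with the object as self.

```ruby
# Sketch of a `with` that accepts either block style.
def with(object, &block)
  if block.arity == 1
    block.call(object)            # explicit form: with(obj) { |o| ... }
  else
    object.instance_eval(&block)  # implicit form: with(obj) { ... }
  end
  object
end

explicit = with({}) { |hash| hash.merge!(:a => 1) }
implicit = with({}) { merge!(:b => 2) }

# explicit holds :a => 1; implicit holds :b => 2
```

One caveat with the implicit form: inside instance_eval, self is the object being configured, so instance variables and methods of the calling context are not reachable, although local variables still are.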
Isn't your pseudo-Ruby:
with Hash.new do |hash|
merge!{:a => 1}
end
The same thing as using 1.9's tap? For example:
>> x = Hash[:a, :b].tap { |h| h.merge!({:c => :d}) }
=> {:a=>:b, :c=>:d}
You still have to name the block argument of course.
You can use the ruby builtin tap:
Hash.new.tap do |hash|
hash.merge! a: 1
end
This can even be "abused" for multiple objects:
[one_long_name, another_long_name].tap do |(a,b)|
a.prop = b.prop
end
Of course, neither gives you exactly what with would do according to your example: the block won't be evaluated within the instance of the object. But I much prefer to use tap with multiple objects; plus, tap returns self, so it can be chained:
[one_long_name, another_long_name].tap {|(a,b)| a.prop = b.prop }.inspect
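A brief sketch of the return-value difference, using plain hashes as stand-ins: tap always hands back its receiver, which is what makes chaining possible, whereas instance_eval returns whatever the block evaluates to.

```ruby
# tap yields the receiver and then returns it, whatever the block returns.
tapped = {}.tap { |h| h[:a] = 1 }.tap { |h| h[:b] = 2 }

# instance_eval runs the block with the receiver as self and returns the
# block's last expression instead of the receiver.
evaluated = {}.instance_eval { merge!(:a => 1); size }

# tapped is the hash with :a and :b; evaluated is the integer 1
```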
I have an array of Elements, and each element has a property :image.
I would like an array of :images, so what's the quickest and least expensive way to achieve this? Is it just to iterate over the array and push each element into a new array, something like this:
images = []
elements.each {|element| images << element.image}
elements.map {|element| element.image}
This should have about the same performance as your version, but is somewhat more succinct and more idiomatic.
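When the block only calls a single method on each element, the same map can be written with the Symbol#to_proc shorthand. A minimal sketch, using a Struct as a stand-in for the asker's Element class:

```ruby
# Stand-in for the real Element class; only the image attribute matters here.
Element = Struct.new(:image)

elements = [Element.new("a.png"), Element.new("b.png")]

# &:image turns the symbol into a block that calls #image on each element.
images = elements.map(&:image)

# images is ["a.png", "b.png"]
```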
You can use the Benchmark module to test these sorts of things. I ran @sepp2k's version against your original code like so:
require 'benchmark'
class Element
attr_accessor :image
def initialize(image)
@image = image
end
end
elements = Array.new(500) {|index| Element.new(index)}
n = 10000
Benchmark.bm do |x|
x.report do
n.times do
# Globalkeith's version
image = []
elements.each {|element| image << element.image}
end
end
# sepp2k's version
x.report { n.times do elements.map {|element| element.image} end }
end
The output on my machine was consistently (after more than 3 runs) very close to this:
user system total real
2.140000 0.000000 2.140000 ( 2.143290)
1.420000 0.010000 1.430000 ( 1.422651)
This demonstrates that map is significantly faster than manually appending to an array when the array is somewhat large and the operation is performed many times.