Statistics with Ruby - ruby

I'm writing a command-line tool to calculate statistical data, and I'm testing it with minitest. load_data returns an array of user input. I would like to know why this test passes when I use the method concat, but when I use the << operator the test passes even when the input is empty. Aren't these two the same thing?
class Test < MiniTest::Unit::TestCase
  def setup
    @collection = DataSet.new
  end
  def test_data_is_not_empty
    assert ! @collection.load_data.empty?
  end
end
class DataSet
  def initialize
    @collected = []
  end
  def append
    print 'Please input a list of data: '
    value = gets.chomp.split(',').map(&:to_f)
    @collected.concat(value)
  end
end

Use concat:
[ "a", "b" ].concat( ["c", "d"] ) #=> [ "a", "b", "c", "d" ]
a = [ 1, 2, 3 ]
a.concat( [ 4, 5 ] )
a #=> [ 1, 2, 3, 4, 5 ]
In your case, it looks like it would be something like
@collected.concat( load_data )
If load_data is a method of another object, such as @collection, then do
@collected.concat( @collection.load_data )
If you want to append one element, my_input, at a time from within load_data, then do
@collected << my_input
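To make the difference concrete, here is a minimal sketch of what each call does with an empty input array. The variable names are made up for illustration; empty_input stands in for what the split/map line produces when the user types nothing.

```ruby
# Hypothetical stand-in for the parsed input when the user enters nothing.
empty_input = []

with_concat = []
with_concat.concat(empty_input)   # adds each element of empty_input (there are none)

with_shovel = []
with_shovel << empty_input        # adds empty_input itself as one element

p with_concat          # => []
p with_shovel          # => [[]]
p with_concat.empty?   # => true
p with_shovel.empty?   # => false
```

So << always makes the receiver one element longer, even when that element is an empty array, which is why an emptiness test keeps passing.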

Related

Sorting Multiple Directory Entries in Ruby

I have the following files, with the following paths (changed for simplicity):
array = ["root_path/dir1/a/file.jpg",
"root_path/dir1/a/file2.jpg",
"root_path/dir1/b/file3.jpg",
"root_path/dir2/c/file4.jpg"]
How can I sort them into a hash like this?
sort_directory(array)
#=>
{
  "dir1" => {
    "a" => [
      "root_path/dir1/a/file.jpg",
      "root_path/dir1/a/file2.jpg"
    ],
    "b" => [
      "root_path/dir1/b/file3.jpg"
    ]
  },
  "dir2" => {
    "c" => [
      "root_path/dir2/c/file4.jpg"
    ]
  }
}
One way of doing it, using group_by, split and a regex (the final to_h turns the mapped pairs back into a hash):
array.group_by { |dir| dir.split('/')[1] }.map { |k, v| [k, v.group_by { |file| file[/\/([^\/]+)(?=\/[^\/]+\/?\Z)/, 1] }] }.to_h
Here is how you can use recursion to obtain the desired result.
If s = "root_path/dir1/a/b/c/file.jpg", we can regard "root_path" as being at "position" 0, "dir1" at position 1 and so on. The example given by the OP has desired grouping on values at positions 1 and 2, which I will write positions = [1,2].
There is no limit to the number of positions on which to group or their order. For the string above we could write, for example, positions = [2,4,1], so the first grouping would be on position 2, the next on position 4 and the last on position 1 (though I have no idea if that could be of interest).
Code
def hashify(arr, positions)
  recurse(positions, arr.map { |s| s.split("/") })
end
def recurse(positions, parts)
  return parts.map { |a| a.join('/') } if positions.empty?
  pos, *positions = positions
  parts.group_by { |a| a[pos] }.
    each_with_object({}) { |(k,a),g| g[k]=recurse(positions, a) }
end
Example
arr = ["root_path/dir1/a/file.jpg",
"root_path/dir1/a/file2.jpg",
"root_path/dir1/b/file3.jpg",
"root_path/dir2/c/file4.jpg"]
hashify(arr, [1, 2])
#=>{"dir1"=>{"a"=>["root_path/dir1/a/file.jpg", "root_path/dir1/a/file2.jpg"],
# "b"=>["root_path/dir1/b/file3.jpg"]},
# "dir2"=>{"c"=>["root_path/dir2/c/file4.jpg"]}}
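As an illustration of the remark that positions may come in any order, the following sketch repeats the two methods (with the inner variable renamed to rest, purely for readability) and groups on position 2 first, then position 1. The name by_leaf_first is made up for this example.

```ruby
def hashify(arr, positions)
  recurse(positions, arr.map { |s| s.split("/") })
end

def recurse(positions, parts)
  # No positions left: rebuild the original path strings.
  return parts.map { |a| a.join("/") } if positions.empty?
  pos, *rest = positions
  parts.group_by { |a| a[pos] }
       .each_with_object({}) { |(k, a), g| g[k] = recurse(rest, a) }
end

arr = ["root_path/dir1/a/file.jpg",
       "root_path/dir1/a/file2.jpg",
       "root_path/dir1/b/file3.jpg",
       "root_path/dir2/c/file4.jpg"]

by_leaf_first = hashify(arr, [2, 1])
# by_leaf_first is equal to:
#   {"a"=>{"dir1"=>["root_path/dir1/a/file.jpg", "root_path/dir1/a/file2.jpg"]},
#    "b"=>{"dir1"=>["root_path/dir1/b/file3.jpg"]},
#    "c"=>{"dir2"=>["root_path/dir2/c/file4.jpg"]}}
p by_leaf_first
```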
Explanation
Recursive methods are difficult to explain. The best way, in my opinion, is to insert puts statements to show the sequence of calculation. I've also indented a few spaces whenever the method calls itself. Here is how the code might be modified for that purpose.
INDENT = 4
def hashify(arr, positions)
  recurse(positions, arr.map { |s| s.split("/") }, 0)
end
def recurse(positions, parts, lvl)
  puts
  "lvl=#{lvl}".pr(lvl)
  "positions=#{ positions }".pr(lvl)
  if positions.empty?
    "parts=#{parts}".pr(lvl)
    return parts.map { |a| a.join('/') }
  end
  pos, *positions = positions
  "pos=#{pos}, positions=#{positions}".pr(lvl)
  h = parts.group_by { |a| a[pos] }
  "h=#{h}".pr(lvl)
  g = h.each_with_object({}) { |(k,a),g| g[k]=recurse(positions, a, lvl+1) }
  "rv=#{g}".pr(lvl)
  g
end
class String
  def pr(lvl)
    print "#{ ' ' * INDENT * lvl}"
    puts self
  end
end
We now execute this method for the data given in the example.
hashify(arr, [1, 2])
lvl=0
positions=[1, 2]
pos=1, positions=[2]
h={"dir1"=>[["root_path", "dir1", "a", "file.jpg"],
            ["root_path", "dir1", "a", "file2.jpg"],
            ["root_path", "dir1", "b", "file3.jpg"]],
   "dir2"=>[["root_path", "dir2", "c", "file4.jpg"]]}
    lvl=1
    positions=[2]
    pos=2, positions=[]
    h={"a"=>[["root_path", "dir1", "a", "file.jpg"],
             ["root_path", "dir1", "a", "file2.jpg"]],
       "b"=>[["root_path", "dir1", "b", "file3.jpg"]]}
        lvl=2
        positions=[]
        parts=[["root_path", "dir1", "a", "file.jpg"],
               ["root_path", "dir1", "a", "file2.jpg"]]
        lvl=2
        positions=[]
        parts=[["root_path", "dir1", "b", "file3.jpg"]]
    rv={"a"=>["root_path/dir1/a/file.jpg", "root_path/dir1/a/file2.jpg"],
        "b"=>["root_path/dir1/b/file3.jpg"]}
    lvl=1
    positions=[2]
    pos=2, positions=[]
    h={"c"=>[["root_path", "dir2", "c", "file4.jpg"]]}
        lvl=2
        positions=[]
        parts=[["root_path", "dir2", "c", "file4.jpg"]]
    rv={"c"=>["root_path/dir2/c/file4.jpg"]}
rv={"dir1"=>{"a"=>["root_path/dir1/a/file.jpg",
                   "root_path/dir1/a/file2.jpg"],
             "b"=>["root_path/dir1/b/file3.jpg"]},
    "dir2"=>{"c"=>["root_path/dir2/c/file4.jpg"]}}

Ruby: Sort array of objects after an array of name properties

I have an array "sizes" that looks like this:
[#<OPTIONVALUE ID: 5, NAME: "M">,
#<OPTIONVALUE ID: 6, NAME: "M/L">,
#<OPTIONVALUE ID: 7, NAME: "XS/S">]
Consider the values of the attribute NAME. The array is currently sorted as M, M/L, XS/S.
But the sort order should look like this:
@sizes_sort_order = ["XS", "XS/S", "S", "S/M", "M", "M/L", "L", "L/XL", "XL"]
Applied to the former array, the order of the elements should look like this:
[#<SPREE::OPTIONVALUE ID: 7, NAME: "XS/S">,
#<SPREE::OPTIONVALUE ID: 5, NAME: "M">,
#<SPREE::OPTIONVALUE ID: 6, NAME: "M/L">]
def sizes
  @sizes ||= grouped_option_values_by_option_type[Spree::OptionType.find_by!(name: 'size')]
  @sizes_sort_order = ["XS", "XS/S", "S", "S/M", "M", "M/L", "L", "L/XL", "XL"]
  @sizes.map { # sort after @sizes_sort_order }
end
How can I get the elements in the array sorted according to @sizes_sort_order?
You can use Enumerable#sort_by:
my_array.sort_by {|x| @sizes_sort_order.index(x.name) }
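One caveat with index: it returns nil for a name that is not in the list, and sort_by then fails when it has to compare nil with an integer. A possible variation, sketched here with a Struct as a hypothetical stand-in for the real option value class and a made-up "XXL" entry, builds a name-to-position hash once and sends unknown names to the end.

```ruby
# Hypothetical stand-in for Spree::OptionValue; only #name matters here.
OptionValue = Struct.new(:id, :name)

sizes_sort_order = ["XS", "XS/S", "S", "S/M", "M", "M/L", "L", "L/XL", "XL"]

# Map each name to its position once, so each lookup is a hash access
# instead of a linear Array#index scan.
rank = sizes_sort_order.each_with_index.to_h

sizes = [OptionValue.new(5, "M"), OptionValue.new(6, "M/L"),
         OptionValue.new(7, "XS/S"), OptionValue.new(8, "XXL")]

# Names missing from the list get a rank past the end, so they sort last.
sorted = sizes.sort_by { |s| rank.fetch(s.name, sizes_sort_order.size) }

p sorted.map(&:name)   # => ["XS/S", "M", "M/L", "XXL"]
```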
You can include the Comparable module to get a natural sort order for the objects.
http://ruby-doc.org/core-2.2.3/Comparable.html
The Comparable mixin is used by classes whose objects may be ordered.
The class must define the <=> operator, which compares the receiver
against another object, returning -1, 0, or +1 depending on whether
the receiver is less than, equal to, or greater than the other object.
class Size
  include Comparable
  SIZES = ["XS", "XS/S", "S", "S/M", "M", "M/L", "L", "L/XL", "XL"]
  attr_reader :name
  def initialize(id, name)
    @id = id
    @name = name
  end
  def <=>(b)
    SIZES.index(name) <=> SIZES.index(b.name)
  end
end
a = Size.new(5, 'M')
b = Size.new(6, 'M/L')
c = Size.new(7, 'XS/S')
print [a, b, c].sort
[#<Size:0x007f8e910458e0 @id=7, @name="XS/S">, #<Size:0x007f8e910459a8 @id=5, @name="M">, #<Size:0x007f8e91045930 @id=6, @name="M/L">]
This approach involves more steps than ones that employ sort or sort_by, but for larger arrays it may be faster, as no sorting--which is relatively expensive--is involved.
Code
def reorder_by_size(instances, size_order)
  instances.each_with_object({}) { |inst, h| h.update(inst.name=>inst) }.
    values_at(*(size_order & (instances.map { |s| s.name })))
end
Example
First let's create an array of instances of
class Sizes
  attr_reader :name
  def initialize(id, name)
    @id = id
    @name = name
  end
end
like so:
instances = [Sizes.new(5,'M'), Sizes.new(6,'M/L'), Sizes.new(7, 'XS/S')]
#=> [#<Sizes:0x007fa66a955ac0 @id=5, @name="M">,
# #<Sizes:0x007fa66a955a70 @id=6, @name="M/L">,
# #<Sizes:0x007fa66a955a20 @id=7, @name="XS/S">]
Then
reorder_by_size(instances, @sizes_sort_order)
#=> [#<Sizes:0x007fa66a01dfc0 @id=7, @name="XS/S">,
# #<Sizes:0x007fa66a86fdb8 @id=5, @name="M">,
# #<Sizes:0x007fa66a8404f0 @id=6, @name="M/L">]
Explanation
For instances as defined in the example, first create an array of sizes in the desired order:
names = @sizes_sort_order & (instances.map { |s| s.name })
#=> ["XS/S", "M", "M/L"]
Important: the doc for Array#& states, "The order is preserved from the original array."
Now we can create the desired reordering without sorting: build a hash whose keys are the sizes and whose values are the instances, then use Hash#values_at to extract the instances in the desired order.
instances.each_with_object({}) { |inst, h|
  h.update(inst.name=>inst) }.values_at(*names)
#=> [#<Sizes:0x007fa66a01dfc0 @id=7, @name="XS/S">,
# #<Sizes:0x007fa66a86fdb8 @id=5, @name="M">,
# #<Sizes:0x007fa66a8404f0 @id=6, @name="M/L">]
The last operation involves the following two steps.
h = instances.each_with_object({}) { |inst, h| h.update(inst.name=>inst) }
#=> {"M" => #<Sizes:0x007fa66a955ac0 @id=5, @name="M">,
# "M/L" => #<Sizes:0x007fa66a955a70 @id=6, @name="M/L">,
# "XS/S" => #<Sizes:0x007fa66a955a20 @id=7, @name="XS/S">}
h.values_at(*names)
#=> h.values_at(*["XS/S", "M", "M/L"])
#=> h.values_at("XS/S", "M", "M/L")
#=> [#<Sizes:0x007fa66a955a20 @id=7, @name="XS/S">,
# #<Sizes:0x007fa66a955ac0 @id=5, @name="M">,
# #<Sizes:0x007fa66a955a70 @id=6, @name="M/L">]

How to use &proc argument inside method

Array#max_by returns only a single value, but I want to have all values that have the max value.
hashes = [{a: 1, b:2}, {a:2, b:3}, {a:1, b:3}]
max = hashes.map{|h| h[:b]}.max
hashes.select{|h| h[:b] == max}
# => [{a: 2, b: 3}, {a: 1, b: 3}]
This code works fine, and I want to add it to Array class.
class Array
  def max_values_by(&proc)
    max = map(&proc).max
    # I don't know how to use `select` here.
  end
end
How do I access the value of the &proc argument?
Use the proc in the block passed to select by calling it with call:
class Array
  def max_values_by(&proc)
    max = map(&proc).max
    select { |h| proc.call(h) == max }
  end
end
hashes.max_values_by { |h| h[:b] }
=> [{a: 2, b: 3}, {a: 1, b: 3}]
or with yield, which gives identical results:
def max_values_by(&proc)
  max = map(&proc).max
  select { |h| yield(h) == max }
end
Although proc.call is a little longer than yield, I prefer it in this case because it makes it clearer that the same block is being used in two places in the method, and because it's weird to use both the implicit block passing of yield and the explicit passing of &proc in the same method.
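Both versions above run the block twice per element. If the block is expensive, one possible variation is to compute each key once and keep it next to its element. The sketch below is written as a standalone method rather than a patch to Array; it reuses the name max_values_by but takes the collection as a hypothetical items parameter.

```ruby
# Hypothetical standalone helper (not a core method).
# The block is evaluated exactly once per element.
def max_values_by(items)
  keyed = items.map { |item| [yield(item), item] }   # [[key, item], ...]
  max = keyed.map(&:first).max                       # nil for an empty collection
  keyed.select { |key, _item| key == max }.map(&:last)
end

hashes = [{a: 1, b: 2}, {a: 2, b: 3}, {a: 1, b: 3}]
result = max_values_by(hashes) { |h| h[:b] }
# result is equal to [{a: 2, b: 3}, {a: 1, b: 3}]
p result
```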
@DaveSchweisguth suggests a great implementation using select, as you requested. Another way of achieving the same result is by using group_by, like this:
>> hashes.group_by{|h| h[:b]}.max.last
=> [{:a=>2, :b=>3}, {:a=>1, :b=>3}]
or monkey-patched into Array as:
class Array
  def max_values_by(&proc)
    group_by(&proc).max.last
  end
end

How to retrieve values within an array

def self.foo
  [
    ["a","aa"],
    ["b","bb"],
  ]
end
Given "a", I should be able to retrieve "aa"
Given "bb", I should be able to retrieve "b"
How do I do this?
assoc and rassoc are your friends:
ar = [
["a","aa"],
["b","bb"],
]
p ar.assoc("a").last #=> "aa"
p ar.rassoc("bb").first #=> "b"
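Note that assoc and rassoc return nil when nothing matches, so chaining .last or .first directly raises NoMethodError for an unknown key. A small sketch of a nil-safe variant follows; the helper names value_for and key_for are made up for this example, and &. requires Ruby 2.3 or later.

```ruby
pairs = [["a", "aa"], ["b", "bb"]]

# Hypothetical helpers: &. returns nil instead of raising when no pair matches.
def value_for(pairs, key)
  pairs.assoc(key)&.last
end

def key_for(pairs, value)
  pairs.rassoc(value)&.first
end

p value_for(pairs, "a")    # => "aa"
p key_for(pairs, "bb")     # => "b"
p value_for(pairs, "zz")   # => nil
```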
Hash[self.foo].invert["bb"] #=> "b"
Hash[self.foo]["a"] #=> "aa"
Hash[] turns the array of pairs into a hash.
Hash#invert swaps keys and values, so each value maps back to its key.
If you want to try both directions:
Hash[self.foo]["bb"] or Hash[self.foo].invert["bb"] #=> "b"
I would create my own "bimap" implementation, perhaps something like:
class Bimap < Hash
  alias :__put__ :[]=
  def []=(key,value)
    __put__(key,value)
    __put__(value,key)
  end
  alias :__size__ :size
  def size
    __size__ / 2
  end
  # ...any other Hash methods to reimplement?
end

Convert cartesian product to nested hash in ruby

I have a structure with a cartesian product that looks like this (and could go out to arbitrary depth)...
variables = ["var1","var2",...]
myhash = {
{"var1"=>"a", "var2"=>"a", ...}=>1,
{"var1"=>"a", "var2"=>"b", ...}=>2,
{"var1"=>"b", "var2"=>"a", ...}=>3,
{"var1"=>"b", "var2"=>"b", ...}=>4,
}
... it has a fixed structure but I'd like simple indexing so I'm trying to write a method to convert it to this :
nested = {
  "a"=> {
    "a"=> 1,
    "b"=> 2
  },
  "b"=> {
    "a"=> 3,
    "b"=> 4
  }
}
Any clever ideas (that allow for arbitrary depth)?
Maybe like this (not the cleanest way):
def cartesian_to_map(myhash)
  {}.tap do |hash|
    myhash.each do |h|
      (hash[h[0]["var1"]] ||= {}).merge!({h[0]["var2"] => h[1]})
    end
  end
end
Result:
puts cartesian_to_map(myhash).inspect
{"a"=>{"a"=>1, "b"=>2}, "b"=>{"a"=>3, "b"=>4}}
Here is my example.
It uses a method index(hash, fields) that takes the hash, and the fields you want to index by.
It's dirty, and uses a local variable to pass up the current level in the index.
I bet you can make it much nicer.
def index(hash, fields)
  # store the last index of the fields
  last_field = fields.length - 1
  # our indexed version
  indexed = {}
  hash.each do |key, value|
    # our current point in the indexed hash
    point = indexed
    fields.each_with_index do |field, i|
      key_field = key[field]
      if i == last_field
        point[key_field] = value
      else
        # ensure the next point is a hash
        point[key_field] ||= {}
        # move our point up
        point = point[key_field]
      end
    end
  end
  # return our indexed hash
  indexed
end
You can then just call
index(myhash, ["var1", "var2"])
And it should look like what you want
index({
{"var1"=>"a", "var2"=>"a"} => 1,
{"var1"=>"a", "var2"=>"b"} => 2,
{"var1"=>"b", "var2"=>"a"} => 3,
{"var1"=>"b", "var2"=>"b"} => 4,
}, ["var1", "var2"])
==
{
  "a"=> {
    "a"=> 1,
    "b"=> 2
  },
  "b"=> {
    "a"=> 3,
    "b"=> 4
  }
}
It seems to work.
(See it as a gist: https://gist.github.com/1126580)
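The same walk-down-the-levels idea as index can also be written more compactly. This is only a sketch of one alternative, assuming every key hash contains all of the listed fields; the method name nest and the three-variable sample data are made up for illustration.

```ruby
# Hypothetical compact variant of index: works for any number of fields.
def nest(flat, fields)
  flat.each_with_object({}) do |(key, value), nested|
    # All but the last field select intermediate hashes; the last one holds the value.
    *path, leaf = key.values_at(*fields)
    inner = path.reduce(nested) { |node, k| node[k] ||= {} }
    inner[leaf] = value
  end
end

# Made-up sample data with three variables to show the extra depth.
myhash = {
  {"var1"=>"a", "var2"=>"a", "var3"=>"x"} => 1,
  {"var1"=>"a", "var2"=>"b", "var3"=>"x"} => 2,
  {"var1"=>"b", "var2"=>"a", "var3"=>"y"} => 3,
}

nested = nest(myhash, ["var1", "var2", "var3"])
# nested is equal to:
#   {"a"=>{"a"=>{"x"=>1}, "b"=>{"x"=>2}}, "b"=>{"a"=>{"y"=>3}}}
p nested
```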
Here's an ugly-but-effective solution:
nested = Hash[ myhash.group_by{ |h,n| h["var1"] } ].tap{ |nested|
  nested.each do |v1,a|
    nested[v1] = a.group_by{ |h,n| h["var2"] }
    nested[v1].each{ |v2,a| nested[v1][v2] = a.flatten.last }
  end
}
p nested
#=> {"a"=>{"a"=>1, "b"=>2}, "b"=>{"a"=>3, "b"=>4}}
You might consider an alternative representation that is easier to map to and (IMO) just as easy to index:
paired = Hash[ myhash.map{ |h,n| [ [h["var1"],h["var2"]], n ] } ]
p paired
#=> {["a", "a"]=>1, ["a", "b"]=>2, ["b", "a"]=>3, ["b", "b"]=>4}
p paired[["a","b"]]
#=> 2
