Related
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 7 years ago.
Improve this question
Is there anyway of grouping first common letters in an array of strings?
For example:
array = [ 'hello', 'hello you', 'people', 'finally', 'finland' ]
so when i do
array.group_by{ |string| some_logic_with_string }
The result should be,
{
'hello' => ['hello', 'hello you'],
'people' => ['people'],
'fin' => ['finally', 'finland']
}
NOTE: Some test cases are ambiguous and expectations conflict with other tests, you need to fix them.
I guess plain group_by may not work, a further processing is needed.
I have come up with below code that seems to work for all the given test cases in consistent manner.
I have left notes in the code to explain the logic. Only way to fully understand it will be to inspect value of h and see the flow for a simple test case.
def group_by_common_chars(array)
# We will iteratively group by as many time as there are characters
# in a largest possible key, which is max length of all strings
max_len = array.max_by {|i| i.size}.size
# First group by first character.
h = array.group_by{|i| i[0]}
# Now iterate remaining (max_len - 1) times
(1...max_len).each do |c|
# Let's perform a group by next set of starting characters.
t = h.map do |k,v|
h1 = v.group_by {|i| i[0..c]}
end.reduce(&:merge)
# We need to merge the previously generated hash
# with the hash generated in this iteration. Here things get tricky.
# If previously, we had
# {"a" => ["a"], "ab" => ["ab", "abc"]},
# and now, we have
# {"a"=>["a"], "ab"=>["ab"], "abc"=>["abc"]},
# We need to merge the two hashes such that we have
# {"a"=>["a"], "ab"=>["ab", "abc"], "abc"=>["abc"]}.
# Note that `Hash#merge`'s block is called only for common keys, so, "abc"
# will get merged, we can't do much about it now. We will process
# it later in the loop
h = h.merge(t) do |k, o, n|
if (o.size != n.size)
diff = [o,n].max - [o,n].min
if diff.size == 1 && t.value?(diff)
[o,n].max
else
[o,n].min
end
else
o
end
end
end
# Sort by key length, smallest in the beginning.
h = h.sort {|i,j| i.first.size <=> j.first.size }.to_h
# Get rid of those key-value pairs, where value is single element array
# and that single element is already part of another key-value pair, and
# that value array has more than one element. This step will allow us
# to get rid of key-value like "abc"=>["abc"] in the example discussed
# above.
h = h.tap do |h|
keys = h.keys
keys.each do |k|
v = h[k]
if (v.size == 1 &&
h.key?(v.first) &&
h.values.flatten.count(v.first) > 1) then
h.delete(k)
end
end
end
# Get rid of those keys whose value array consist of only elements that
# already part of some other key. Since, hash is ordered by key's string
# size, this process allows us to get rid of those keys which are smaller
# in length but consists of only elements that are present somewhere else
# with a key of larger length. For example, it lets us to get rid of
# "a"=>["aba", "abb", "aaa", "aab"] from a hash like
# {"a"=>["aba", "abb", "aaa", "aab"], "ab"=>["aba", "abb"], "aa"=>["aaa", "aab"]}
h.tap do |h|
keys = h.keys
keys.each do |k|
values = h[k]
other_values = h.values_at(*(h.keys-[k])).flatten
already_present = values.all? do |v|
other_values.include?(v)
end
h.delete(k) if already_present
end
end
end
Sample Run:
p group_by_common_chars ['hello', 'hello you', 'people', 'finally', 'finland']
#=> {"fin"=>["finally", "finland"], "hello"=>["hello", "hello you"], "people"=>["people"]}
p group_by_common_chars ['a', 'ab', 'abc']
#=> {"a"=>["a"], "ab"=>["ab", "abc"]}
p group_by_common_chars ['aba', 'abb', 'aaa', 'aab']
#=> {"ab"=>["aba", "abb"], "aa"=>["aaa", "aab"]}
p group_by_common_chars ["Why", "haven't", "you", "answered", "the", "above", "questions?", "Please", "do", "so."]
#=> {"a"=>["answered", "above"], "do"=>["do"], "Why"=>["Why"], "you"=>["you"], "so."=>["so."], "the"=>["the"], "Please"=>["Please"], "haven't"=>["haven't"], "questions?"=>["questions?"]}
Not sure, if you can sort by all common letters. But if you want to do sort only by first letter then here it is:
array = [ 'hello', 'hello you', 'people', 'finally', 'finland' ]
result = {}
array.each { |st| result[st[0]] = result.fetch(st[0], []) + [st] }
pp result
{"h"=>["hello", "hello you"], "p"=>["people"], "f"=>["finally", "finland"]}
Now result contains your desired hash.
Hmm, you're trying to do something that's pretty custom. I can think of two classical approaches that sort of do what you want: 1) Stemming and 2) Levenshtein Distance.
With stemming you're finding the root word to a longer word. Here's a gem for it.
Levenshtein is a famous algorithm which calculates the difference between two strings. There is a gem for it that runs pretty fast due to a native C extension.
I need to create a signature string for a variable in Ruby, where the variable can be a number, a string, a hash, or an array. The hash values and array elements can also be any of these types.
This string will be used to compare the values in a database (Mongo, in this case).
My first thought was to create an MD5 hash of a JSON encoded value, like so: (body is the variable referred to above)
def createsig(body)
Digest::MD5.hexdigest(JSON.generate(body))
end
This nearly works, but JSON.generate does not encode the keys of a hash in the same order each time, so createsig({:a=>'a',:b=>'b'}) does not always equal createsig({:b=>'b',:a=>'a'}).
What is the best way to create a signature string to fit this need?
Note: For the detail oriented among us, I know that you can't JSON.generate() a number or a string. In these cases, I would just call MD5.hexdigest() directly.
I coding up the following pretty quickly and don't have time to really test it here at work, but it ought to do the job. Let me know if you find any issues with it and I'll take a look.
This should properly flatten out and sort the arrays and hashes, and you'd need to have to some pretty strange looking strings for there to be any collisions.
def createsig(body)
Digest::MD5.hexdigest( sigflat body )
end
def sigflat(body)
if body.class == Hash
arr = []
body.each do |key, value|
arr << "#{sigflat key}=>#{sigflat value}"
end
body = arr
end
if body.class == Array
str = ''
body.map! do |value|
sigflat value
end.sort!.each do |value|
str << value
end
end
if body.class != String
body = body.to_s << body.class.to_s
end
body
end
> sigflat({:a => {:b => 'b', :c => 'c'}, :d => 'd'}) == sigflat({:d => 'd', :a => {:c => 'c', :b => 'b'}})
=> true
If you could only get a string representation of body and not have the Ruby 1.8 hash come back with different orders from one time to the other, you could reliably hash that string representation. Let's get our hands dirty with some monkey patches:
require 'digest/md5'
class Object
def md5key
to_s
end
end
class Array
def md5key
map(&:md5key).join
end
end
class Hash
def md5key
sort.map(&:md5key).join
end
end
Now any object (of the types mentioned in the question) respond to md5key by returning a reliable key to use for creating a checksum, so:
def createsig(o)
Digest::MD5.hexdigest(o.md5key)
end
Example:
body = [
{
'bar' => [
345,
"baz",
],
'qux' => 7,
},
"foo",
123,
]
p body.md5key # => "bar345bazqux7foo123"
p createsig(body) # => "3a92036374de88118faf19483fe2572e"
Note: This hash representation does not encode the structure, only the concatenation of the values. Therefore ["a", "b", "c"] will hash the same as ["abc"].
Here's my solution. I walk the data structure and build up a list of pieces that get joined into a single string. In order to ensure that the class types seen affect the hash, I inject a single unicode character that encodes basic type information along the way. (For example, we want ["1", "2", "3"].objsum != [1,2,3].objsum)
I did this as a refinement on Object, it's easily ported to a monkey patch. To use it just require the file and run "using ObjSum".
module ObjSum
refine Object do
def objsum
parts = []
queue = [self]
while queue.size > 0
item = queue.shift
if item.kind_of?(Hash)
parts << "\\000"
item.keys.sort.each do |k|
queue << k
queue << item[k]
end
elsif item.kind_of?(Set)
parts << "\\001"
item.to_a.sort.each { |i| queue << i }
elsif item.kind_of?(Enumerable)
parts << "\\002"
item.each { |i| queue << i }
elsif item.kind_of?(Fixnum)
parts << "\\003"
parts << item.to_s
elsif item.kind_of?(Float)
parts << "\\004"
parts << item.to_s
else
parts << item.to_s
end
end
Digest::MD5.hexdigest(parts.join)
end
end
end
Just my 2 cents:
module Ext
module Hash
module InstanceMethods
# Return a string suitable for generating content signature.
# Signature image does not depend on order of keys.
#
# {:a => 1, :b => 2}.signature_image == {:b => 2, :a => 1}.signature_image # => true
# {{:a => 1, :b => 2} => 3}.signature_image == {{:b => 2, :a => 1} => 3}.signature_image # => true
# etc.
#
# NOTE: Signature images of identical content generated under different versions of Ruby are NOT GUARANTEED to be identical.
def signature_image
# Store normalized key-value pairs here.
ar = []
each do |k, v|
ar << [
k.is_a?(::Hash) ? k.signature_image : [k.class.to_s, k.inspect].join(":"),
v.is_a?(::Hash) ? v.signature_image : [v.class.to_s, v.inspect].join(":"),
]
end
ar.sort.inspect
end
end
end
end
class Hash #:nodoc:
include Ext::Hash::InstanceMethods
end
These days there is a formally defined method for canonicalizing JSON, for exactly this reason: https://datatracker.ietf.org/doc/html/draft-rundgren-json-canonicalization-scheme-16
There is a ruby implementation here: https://github.com/dryruby/json-canonicalization
Depending on your needs, you could call ary.inspect or ary.to_yaml, even.
I have a Ruby array containing some string values. I need to:
Find all elements that match some predicate
Run the matching elements through a transformation
Return the results as an array
Right now my solution looks like this:
def example
matchingLines = #lines.select{ |line| ... }
results = matchingLines.map{ |line| ... }
return results.uniq.sort
end
Is there an Array or Enumerable method that combines select and map into a single logical statement?
I usually use map and compact together along with my selection criteria as a postfix if. compact gets rid of the nils.
jruby-1.5.0 > [1,1,1,2,3,4].map{|n| n*3 if n==1}
=> [3, 3, 3, nil, nil, nil]
jruby-1.5.0 > [1,1,1,2,3,4].map{|n| n*3 if n==1}.compact
=> [3, 3, 3]
Ruby 2.7+
There is now!
Ruby 2.7 is introducing filter_map for this exact purpose. It's idiomatic and performant, and I'd expect it to become the norm very soon.
For example:
numbers = [1, 2, 5, 8, 10, 13]
enum.filter_map { |i| i * 2 if i.even? }
# => [4, 16, 20]
Here's a good read on the subject.
Hope that's useful to someone!
You can use reduce for this, which requires only one pass:
[1,1,1,2,3,4].reduce([]) { |a, n| a.push(n*3) if n==1; a }
=> [3, 3, 3]
In other words, initialize the state to be what you want (in our case, an empty list to fill: []), then always make sure to return this value with modifications for each element in the original list (in our case, the modified element pushed to the list).
This is the most efficient since it only loops over the list with one pass (map + select or compact requires two passes).
In your case:
def example
results = #lines.reduce([]) do |lines, line|
lines.push( ...(line) ) if ...
lines
end
return results.uniq.sort
end
Another different way of approaching this is using the new (relative to this question) Enumerator::Lazy:
def example
#lines.lazy
.select { |line| line.property == requirement }
.map { |line| transforming_method(line) }
.uniq
.sort
end
The .lazy method returns a lazy enumerator. Calling .select or .map on a lazy enumerator returns another lazy enumerator. Only once you call .uniq does it actually force the enumerator and return an array. So what effectively happens is your .select and .map calls are combined into one - you only iterate over #lines once to do both .select and .map.
My instinct is that Adam's reduce method will be a little faster, but I think this is far more readable.
The primary consequence of this is that no intermediate array objects are created for each subsequent method call. In a normal #lines.select.map situation, select returns an array which is then modified by map, again returning an array. By comparison, the lazy evaluation only creates an array once. This is useful when your initial collection object is large. It also empowers you to work with infinite enumerators - e.g. random_number_generator.lazy.select(&:odd?).take(10).
If you have a select that can use the case operator (===), grep is a good alternative:
p [1,2,'not_a_number',3].grep(Integer){|x| -x } #=> [-1, -2, -3]
p ['1','2','not_a_number','3'].grep(/\D/, &:upcase) #=> ["NOT_A_NUMBER"]
If we need more complex logic we can create lambdas:
my_favourite_numbers = [1,4,6]
is_a_favourite_number = -> x { my_favourite_numbers.include? x }
make_awesome = -> x { "***#{x}***" }
my_data = [1,2,3,4]
p my_data.grep(is_a_favourite_number, &make_awesome) #=> ["***1***", "***4***"]
I'm not sure there is one. The Enumerable module, which adds select and map, doesn't show one.
You'd be required to pass in two blocks to the select_and_transform method, which would be a bit unintuitive IMHO.
Obviously, you could just chain them together, which is more readable:
transformed_list = lines.select{|line| ...}.map{|line| ... }
Simple Answer:
If you have n records, and you want to select and map based on condition then
records.map { |record| record.attribute if condition }.compact
Here, attribute is whatever you want from the record and condition you can put any check.
compact is to flush the unnecessary nil's which came out of that if condition
No, but you can do it like this:
lines.map { |line| do_some_action if check_some_property }.reject(&:nil?)
Or even better:
lines.inject([]) { |all, line| all << line if check_some_property; all }
I think that this way is more readable, because splits the filter conditions and mapped value while remaining clear that the actions are connected:
results = #lines.select { |line|
line.should_include?
}.map do |line|
line.value_to_map
end
And, in your specific case, eliminate the result variable all together:
def example
#lines.select { |line|
line.should_include?
}.map { |line|
line.value_to_map
}.uniq.sort
end
def example
#lines.select {|line| ... }.map {|line| ... }.uniq.sort
end
In Ruby 1.9 and 1.8.7, you can also chain and wrap iterators by simply not passing a block to them:
enum.select.map {|bla| ... }
But it's not really possible in this case, since the types of the block return values of select and map don't match up. It makes more sense for something like this:
enum.inject.with_index {|(acc, el), idx| ... }
AFAICS, the best you can do is the first example.
Here's a small example:
%w[a b 1 2 c d].map.select {|e| if /[0-9]/ =~ e then false else e.upcase end }
# => ["a", "b", "c", "d"]
%w[a b 1 2 c d].select.map {|e| if /[0-9]/ =~ e then false else e.upcase end }
# => ["A", "B", false, false, "C", "D"]
But what you really want is ["A", "B", "C", "D"].
You should try using my library Rearmed Ruby in which I have added the method Enumerable#select_map. Heres an example:
items = [{version: "1.1"}, {version: nil}, {version: false}]
items.select_map{|x| x[:version]} #=> [{version: "1.1"}]
# or without enumerable monkey patch
Rearmed.select_map(items){|x| x[:version]}
If you want to not create two different arrays, you can use compact! but be careful about it.
array = [1,1,1,2,3,4]
new_array = map{|n| n*3 if n==1}
new_array.compact!
Interestingly, compact! does an in place removal of nil. The return value of compact! is the same array if there were changes but nil if there were no nils.
array = [1,1,1,2,3,4]
new_array = map{|n| n*3 if n==1}.tap { |array| array.compact! }
Would be a one liner.
Your version:
def example
matchingLines = #lines.select{ |line| ... }
results = matchingLines.map{ |line| ... }
return results.uniq.sort
end
My version:
def example
results = {}
#lines.each{ |line| results[line] = true if ... }
return results.keys.sort
end
This will do 1 iteration (except the sort), and has the added bonus of keeping uniqueness (if you don't care about uniq, then just make results an array and results.push(line) if ...
Here is a example. It is not the same as your problem, but may be what you want, or can give a clue to your solution:
def example
lines.each do |x|
new_value = do_transform(x)
if new_value == some_thing
return new_value # here jump out example method directly.
else
next # continue next iterate.
end
end
end
I've got a Ruby method like the following:
# Retrieve all fruits from basket that are of the specified kind.
def fruits_of_kind(kind)
basket.select { |f| f.fruit_type == kind.to_s }
end
Right now, you can call this like:
fruits_of_kind(:apple) # => all apples in basket
fruits_of_kind('banana') # => all bananas in basket
and so on.
How do I change the method so that it will correctly handle iterable inputs as well as no inputs and nil inputs? For example, I'd like to be able to support:
fruits_of_kind(nil) # => nil
fruits_of_kind(:apple, :banana) # => all apples and bananas in basket
fruits_of_kind([:apple, 'banana']) # => likewise
Is this possible to do idiomatically? If so, what's the best way to write methods so that they can accept zero, one, or many inputs?
You need to use the Ruby splat operator, which wraps all remaining arguments into an Array and passes them in:
def foo (a, b, *c)
#do stuff
end
foo(1, 2) # a = 1, b = 2, c = []
foo(1, 2, 3, 4, 5) #a = 1, b = 2, c = [3, 4, 5]
In your case, something like this should work:
def fruits_of_kind(*kinds)
kinds.flatten!
basket.select do |fruit|
kinds.each do |kind|
break true if fruit.fruit_type == kind.to_s
end == true #if no match is found, each returns the whole array, so == true returns false
end
end
EDIT
I changed the code to flatten kinds so that you can send in a list. This code will handle entering no kinds at all, but if you want to expressly input nil, add the line kinds = [] if kinds.nil? at the beginning.
Use the VARARGS feature of Ruby.
# Retrieve all fruits from basket that are of the specified kind.
# notice the * prefix used for method parameter
def fruits_of_kind(*kind)
kind.each do |x|
puts x
end
end
fruits_of_kind(:apple, :orange)
fruits_of_kind()
fruits_of_kind(nil)
-sasuke
def fruits_of_kind(kind)
return nil if kind.nil?
result = []
([] << kind).flatten.each{|k| result << basket.select{|f| f.fruit_type == k.to_s }}
result
end
The 'splat' operator is probably the best way to go, but there are two things to watch out for: passing in nil or lists. To modify Pesto's solution for the input/output you'd like, you should do something like this:
def fruits_of_kind(*kinds)
return nil if kinds.compact.empty?
basket.select do |fruit|
kinds.flatten.each do |kind|
break true if fruit.fruit_type == kind.to_s
end == true #if no match is found, each returns the whole array, so == true returns false
end
end
If you pass in nil, the * converts it to [nil]. If you want to return nil instead of an empty list, you have to compact it (remove nulls) to [], then return nil if it's empty.
If you pass in a list, like [:apple, 'banana'], the * converts it to [[:apple, 'banana']]. It's a subtle difference, but it's a one-element list containing another list, so you need to flatten kinds before doing the "each" loop. Flattening will convert it to [:apple, 'banana'], like you expect, and give you the results you're looking for.
EDIT: Even better, thanks to Greg Campbell:
def fruits_of_kind(basket, kind)
return nil if kind.nil?
kind_list = ([] << kind).flatten.map{|kind| kind.to_s}
basket.select{|fruit| kind_list.include?(fruit) }
end
OR (using splat)
def fruits_of_kind(*kinds)
return nil if kinds.compact.empty?
kind_list = kinds.flatten.map{|kind| kind.to_s}
basket.select{|fruit| kind_list.include?(fruit.fruit_type) }
end
There's a nicely expressive use of splat as an argument to array creation that handles your last example:
def foo(may_or_may_not_be_enumerable_arg)
arrayified = [*may_or_may_not_be_enumerable_arg]
arrayified.each do |item|
puts item
end
end
obj = "one thing"
objs = ["multiple", "things", 1, 2, 3]
foo(obj)
# one thing
# => ["one thing"]
foo(objs)
# multiple
# things
# 1
# 2
# 3
# => ["multiple", "things", 1, 2, 3]
In Ruby, what's the difference between {} and []?
{} seems to be used for both code blocks and hashes.
Are [] only for arrays?
The documention isn't very clear.
It depends on the context:
When on their own, or assigning to a variable, [] creates arrays, and {} creates hashes. e.g.
a = [1,2,3] # an array
b = {1 => 2} # a hash
[] can be overridden as a custom method, and is generally used to fetch things from hashes (the standard library sets up [] as a method on hashes which is the same as fetch)
There is also a convention that it is used as a class method in the same way you might use a static Create method in C# or Java. e.g.
a = {1 => 2} # create a hash for example
puts a[1] # same as a.fetch(1), will print 2
Hash[1,2,3,4] # this is a custom class method which creates a new hash
See the Ruby Hash docs for that last example.
This is probably the most tricky one -
{} is also syntax for blocks, but only when passed to a method OUTSIDE the arguments parens.
When you invoke methods without parens, Ruby looks at where you put the commas to figure out where the arguments end (where the parens would have been, had you typed them)
1.upto(2) { puts 'hello' } # it's a block
1.upto 2 { puts 'hello' } # syntax error, ruby can't figure out where the function args end
1.upto 2, { puts 'hello' } # the comma means "argument", so ruby sees it as a hash - this won't work because puts 'hello' isn't a valid hash
Another, not so obvious, usage of [] is as a synonym for Proc#call and Method#call. This might be a little confusing the first time you encounter it. I guess the rational behind it is that it makes it look more like a normal function call.
E.g.
proc = Proc.new { |what| puts "Hello, #{what}!" }
meth = method(:print)
proc["World"]
meth["Hello",","," ", "World!", "\n"]
Broadly speaking, you're correct. As well as hashes, the general style is that curly braces {} are often used for blocks that can fit all onto one line, instead of using do/end across several lines.
Square brackets [] are used as class methods in lots of Ruby classes, including String, BigNum, Dir and confusingly enough, Hash. So:
Hash["key" => "value"]
is just as valid as:
{ "key" => "value" }
The square brackets [ ] are used to initialize arrays.
The documentation for initializer case of [ ] is in
ri Array::[]
The curly brackets { } are used to initialize hashes.
The documentation for initializer case of { } is in
ri Hash::[]
The square brackets are also commonly used as a method in many core ruby classes, like Array, Hash, String, and others.
You can access a list of all classes that have method "[ ]" defined with
ri []
most methods also have a "[ ]=" method that allows to assign things, for example:
s = "hello world"
s[2] # => 108 is ascii for e
s[2]=109 # 109 is ascii for m
s # => "hemlo world"
Curly brackets can also be used instead of "do ... end" on blocks, as "{ ... }".
Another case where you can see square brackets or curly brackets used - is in the special initializers where any symbol can be used, like:
%w{ hello world } # => ["hello","world"]
%w[ hello world ] # => ["hello","world"]
%r{ hello world } # => / hello world /
%r[ hello world ] # => / hello world /
%q{ hello world } # => "hello world"
%q[ hello world ] # => "hello world"
%q| hello world | # => "hello world"
a few examples:
[1, 2, 3].class
# => Array
[1, 2, 3][1]
# => 2
{ 1 => 2, 3 => 4 }.class
# => Hash
{ 1 => 2, 3 => 4 }[3]
# => 4
{ 1 + 2 }.class
# SyntaxError: compile error, odd number list for Hash
lambda { 1 + 2 }.class
# => Proc
lambda { 1 + 2 }.call
# => 3
Note that you can define the [] method for your own classes:
class A
def [](position)
# do something
end
def #rank.[]= key, val
# define the instance[a] = b method
end
end