I just started learning Erlang and really like its list comprehension syntax. For example:
Weather = [{toronto, rain}, {montreal, storms}, {london, fog}, {paris, sun}, {boston, fog}, {vancouver, snow}].
FoggyPlaces = [X || {X, fog} <- Weather].
In this case, FoggyPlaces will evaluate to [london, boston].
What's the best way to do this in Ruby?
For example, given an Array of Hashes like this (very common, I believe):
weather = [{city: 'toronto', weather: :rain}, {city: 'montreal', weather: :storms}, {city: 'london', weather: :fog}, {city: 'paris', weather: :sun}, {city: 'boston', weather: :fog}, {city: 'vancouver', weather: :snow}]
The best I've come up with so far is:
weather.collect {|w| w[:city] if w[:weather] == :fog }.compact
But in this case, I have to call compact to remove nil values, and the example isn't as readable as the Erlang one.
What's more, in the Erlang example both city and weather are atoms. I don't even know how to get something in Ruby that makes as much sense and looks as good.
First off, your data structures aren't equivalent. The equivalent Ruby data structure to your Erlang example would be more like
weather = [[:toronto, :rain], [:montreal, :storms], [:london, :fog],
[:paris, :sun], [:boston, :fog], [:vancouver, :snow]]
Secondly, yes: Ruby has neither list comprehensions nor pattern matching, so the example will probably be more complex. Your list comprehension first filters the foggy cities, then projects the name. Let's do the same in Ruby:
weather.select {|_, weather| weather == :fog }.map(&:first)
# => [:london, :boston]
However, Ruby is centered around objects, whereas you are using abstract data types. With a more object-oriented data abstraction, the code would probably look more like
weather.select(&:foggy?).map(&:city)
which isn't too bad, is it?
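For illustration, here's a minimal sketch of that object-oriented version, assuming a simple value object for the reports (the Report struct and foggy? predicate are my invention, nothing standard):

Report = Struct.new(:city, :conditions) do
  def foggy?
    conditions == :fog
  end
end

weather = [Report.new(:toronto, :rain), Report.new(:montreal, :storms),
           Report.new(:london, :fog), Report.new(:paris, :sun),
           Report.new(:boston, :fog), Report.new(:vancouver, :snow)]

weather.select(&:foggy?).map(&:city)
# => [:london, :boston]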
I have an array with hashes:
test = [
{"type"=>1337, "age"=>12, "name"=>"Eric Johnson"},
{"type"=>1338, "age"=>18, "name"=>"John Doe"},
{"type"=>1339, "age"=>22, "name"=>"Carl Adley"},
{"type"=>1340, "age"=>25, "name"=>"Anna Brent"}
]
I am interested in getting all the hashes where the name key equals a value that can be found in an array:
get_hash_by_name = ["John Doe","Anna Brent"]
Which would end up in the following:
# test_sorted would be:
# {"type"=>1338, "age"=>18, "name"=>"John Doe"}
# {"type"=>1340, "age"=>25, "name"=>"Anna Brent"}
I probably have to iterate with test.each somehow, but I'm still trying to get a grasp of Ruby. Happy for any help!
Here's something to meditate on:
Iterating over an array to find something is slow, even if it's a sorted array. Computer languages have various structures we can use to improve the speed of lookups, and in Ruby, Hash is usually a good starting point. Where an Array is like reading from a sequential file, a Hash is like reading from a random-access file: we can jump right to the record we need.
Starting with your test array-of-hashes:
test = [
{'type'=>1337, 'age'=>12, 'name'=>'Eric Johnson'},
{'type'=>1338, 'age'=>18, 'name'=>'John Doe'},
{'type'=>1339, 'age'=>22, 'name'=>'Carl Adley'},
{'type'=>1340, 'age'=>25, 'name'=>'Anna Brent'},
{'type'=>1341, 'age'=>13, 'name'=>'Eric Johnson'},
]
Notice that I added an additional "Eric Johnson" record. I'll get to that later.
I'd create a hash that maps the array of hashes to a regular hash where the key of each pair is a unique value. The 'type' key/value pair appears to fit that need well:
test_by_types = test.map { |h| [h['type'], h] }.to_h
# => {1337=>{"type"=>1337, "age"=>12, "name"=>"Eric Johnson"},
# 1338=>{"type"=>1338, "age"=>18, "name"=>"John Doe"},
# 1339=>{"type"=>1339, "age"=>22, "name"=>"Carl Adley"},
# 1340=>{"type"=>1340, "age"=>25, "name"=>"Anna Brent"},
# 1341=>{"type"=>1341, "age"=>13, "name"=>"Eric Johnson"}}
Now test_by_types is a hash using the type value to point to the original hash.
If I create a similar hash based on names, where each name, unique or not, points to the type values, I can do fast lookups:
test_by_names = test.each_with_object(Hash.new { |h, k| h[k] = [] }) { |e, h|
  h[e['name']] << e['type']
}
# => {"Eric Johnson"=>[1337, 1341],
# "John Doe"=>[1338],
# "Carl Adley"=>[1339],
# "Anna Brent"=>[1340]}
Notice that "Eric Johnson" points to two records.
Now, here's how we look up things:
get_hash_by_name = ['John Doe', 'Anna Brent']
test_by_names.values_at(*get_hash_by_name).flatten
# => [1338, 1340]
In one quick lookup Ruby returned the matching types by looking up the names.
We can take that output and grab the original hashes:
test_by_types.values_at(*test_by_names.values_at(*get_hash_by_name).flatten)
# => [{"type"=>1338, "age"=>18, "name"=>"John Doe"},
# {"type"=>1340, "age"=>25, "name"=>"Anna Brent"}]
Because this is running against hashes, it's fast. The hashes can be BIG and it'll still run very fast.
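If you want to see that difference yourself, here's a rough benchmark sketch (the data and sizes are made up; the shape of the result is the point):

require 'benchmark'

records = (1..100_000).map { |i| { 'type' => i, 'name' => "Person #{i}" } }
by_type = records.map { |h| [h['type'], h] }.to_h

Benchmark.bm(12) do |b|
  b.report('Array#find') { 1_000.times { records.find { |h| h['type'] == 99_999 } } }
  b.report('Hash#[]')    { 1_000.times { by_type[99_999] } }
end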
Back to "Eric Johnson"...
When dealing with people's names, collisions are likely, which is why test_by_names allows multiple type values; with one lookup all the matching records can be retrieved:
test_by_names.values_at('Eric Johnson').flatten
# => [1337, 1341]
test_by_types.values_at(*test_by_names.values_at('Eric Johnson').flatten)
# => [{"type"=>1337, "age"=>12, "name"=>"Eric Johnson"},
# {"type"=>1341, "age"=>13, "name"=>"Eric Johnson"}]
This will be a lot to chew on if you're new to Ruby, but the Ruby documentation covers it all, so dig through the Hash, Array and Enumerable class documentation.
Also, *, AKA "splat", explodes the elements of the enclosing array into separate parameters suitable for passing into a method. I can't remember where that's documented.
If you're familiar with database design this will look very familiar, because it's similar to how we do database lookups.
The point of all of this is that it's really important to consider how you're going to store your data when you first ingest it into your program. Do it wrong and you'll jump through major hoops trying to do useful things with it. Do it right and the code and data will flow through very easily, and you'll be able to massage/extract/combine the data easily.
Said differently, Arrays are containers useful for holding things you want to access sequentially, such as jobs you want to print, sites you need to access in order, or files you want to delete in a specific order, but they're lousy when you want to look up and work with a record randomly.
Knowing which container is appropriate is important, and for this particular task, it appears that an array of hashes isn't appropriate, since there's no fast way of accessing specific ones.
And that's why I made my comment above asking what you were trying to accomplish in the first place. See "What is the XY problem?" and "XyProblem" for more about that particular question.
You can use select and include?, so
test.select {|object| get_hash_by_name.include? object['name'] }
…should do the job.
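For the sample data in the question, that returns exactly the two expected records:

# => [{"type"=>1338, "age"=>18, "name"=>"John Doe"},
#     {"type"=>1340, "age"=>25, "name"=>"Anna Brent"}]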
I have to search for an item in an array and return the value of the next item. Example:
a = ['abc.df','-f','test.h']
i = a.find_index{|x| x=~/-f/}
puts a[i+1]
Is there any better way other than working with index?
A classical functional approach uses no indexes (xs.each_cons(2) yields each pair of consecutive elements of xs):
xs = ['abc.df', '-f', 'test.h']
(xs.each_cons(2).detect { |x, y| x =~ /-f/ } || []).last
#=> "test.h"
Using Enumerable#map_detect (not core Ruby; it comes from the Facets gem, where it's now called find_yield) simplifies it a little more:
xs.each_cons(2).map_detect { |x, y| y if x =~ /-f/ }
#=> "test.h"
The reason something like array.find{something}.next doesn't exist is that it's an array rather than a linked list. Each item is just its own value; it doesn't have a concept of "the item after me".
#tokland gives a good solution by iterating over the array with each pair of consecutive items, so that when the first item matches, you have your second item handy. There are strong arguments to be made for the functional style, to be sure. Your version is shorter, though, and I'd argue that yours is also more quickly and easily understood at a glance.
If the issue is that you're using it a lot and want something cleaner and more to the point, then of course you could just add it as a singleton method to a:
def a.find_after(&test)
  i = find_index(&test)
  self[i + 1] if i  # returns nil instead of raising when nothing matches
end
Then
a.find_after{|x| x=~/-f/}
is a clear way to find the next item after the first match.
All of that said, I think #BenjaminCox makes the best point about what appears to be your actual goal. If you're parsing command line options, there are libraries that do that well.
I don't know of a cleaner way to do that specific operation. However, it sure looks like you're trying to parse command-line arguments. If so, I'd recommend using the built-in OptionParser module - it'll save a ton of time and hair-pulling trying to parse them yourself.
This article explains how it works.
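For example, a minimal sketch for a -f option (treating -f as taking a file argument is my assumption, based on your sample array):

require 'optparse'

argv = ['abc.df', '-f', 'test.h']
options = {}
OptionParser.new do |opts|
  opts.on('-f FILE', 'file to process') { |file| options[:file] = file }
end.parse!(argv)

options[:file]  # => "test.h"
argv            # => ["abc.df"] (non-option arguments are left in place)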
Your solution working with indexes is fine, as others have commented. You could use Enumerable#drop_while to get the array from your match onward and take the second element of that:
a = ['abc.df','-f','test.h']
f_arg = a.drop_while { |e| e !~ /-f/ }[1]
Given a json array:
[{ "x":"5", "y":"20" },{ "x":"6", "y":"10" },{ "x":"50", "y":"5" }]
I'd like to find argmax(x), such that I can do puts argmax(arr, :arg => "x").y and get 5. How can I elegantly implement this in Ruby?
Edit: Clarified a bit. The idea is that you can specify the field of an element in a list that you want to maximize and the method will return the maximizing element.
I think you want Enumerable#max_by. To get y like you're saying, it would be:
arr.max_by {|hash| hash['x']}['y']
(Well, actually, you'll want the numbers to be numbers instead of strings, since '50' sorts lower than '6'. But I think you get the idea. You can to_i or do whatever processing you need in the block to get the "real" value to sort by.)
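For instance, starting from the JSON in the question and converting in the block:

require 'json'

arr = JSON.parse('[{ "x":"5", "y":"20" },{ "x":"6", "y":"10" },{ "x":"50", "y":"5" }]')
arr.max_by { |hash| hash['x'].to_i }['y']
# => "5"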
In the code below, the order of my items gets changed after the JSON.parse(f) line, i.e., this hash:
{
a => aval,
b => bval,
c => cval,
d => dval
}
becomes something like:
{
b => bval,
c => cval,
a => aval,
d => dval
}
This is a problem because my display code just reads from the json file, so any time I save back to it, and then display, everything gets changed around. Is there anything I can do to retain the order?
CODE:
f = File.read($PLAN_DESC_PATH)
puts "f " + f.to_s
hash = JSON.parse(f)
puts "hash " + hash.to_s
My Ruby version is 1.8.7. I am using Sinatra. I believe I got the JSON gem from here: http://flori.github.com/json/ (sorry, kinda new to this). Thanks!
In Ruby 1.8.7, the Hash class does not maintain order, whether by key or by insertion order. If you need something like that, you would need to implement something like ActiveSupport::OrderedHash (http://rubydoc.info/docs/rails/ActiveSupport/OrderedHash)
In Ruby 1.9.x hashes are ordered by when they are inserted by default (see http://www.ruby-doc.org/core/classes/Hash.html)
When you serialize a hash to JSON, all bets are off for maintaining order of your keys. You'll need some post processing after your serialization to ensure order if that's necessary for you.
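If you do pull in ActiveSupport, usage would be a sketch like this (the exact require path depends on your ActiveSupport version):

require 'active_support/ordered_hash'  # path for ActiveSupport 3.x

h = ActiveSupport::OrderedHash.new
h['a'] = 'aval'
h['b'] = 'bval'
h['c'] = 'cval'
h.keys  # => ["a", "b", "c"], in insertion order, even on Ruby 1.8.7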
No, hash maps are not meant to have a specific ordering. If you need ordering, use something different, like an array; or extract all the keys, sort them the way you want, and iterate in that order.
In any case, making assumptions about the ordering inside maps is something you shouldn't rely on; that's just a fact.
A good alternative would be to have:
[ [a, aval], [b, bval], ... ]
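That survives a JSON round trip, because JSON arrays (unlike objects) are ordered:

require 'json'

pairs = [['a', 'aval'], ['b', 'bval'], ['c', 'cval'], ['d', 'dval']]
JSON.parse(pairs.to_json)
# => [["a", "aval"], ["b", "bval"], ["c", "cval"], ["d", "dval"]]

On Ruby 1.9+ you can convert back with Hash[pairs]; on 1.8.7, Hash[*pairs.flatten].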
Jack answered for Ruby, so I'll answer for JSON. From RFC 4627 (emphasis added):
"An object is an unordered collection of zero or more name/value pairs"
I have a bunch of regression test data. Each test is just a list of messages (associative arrays), mapping message field names to values. There's a lot of repetition within this data.
For example
test1 = [
{ sender => 'client', msg => '123', arg => '900', foo => 'bar', ... },
{ sender => 'server', msg => '456', arg => '800', foo => 'bar', ... },
{ sender => 'client', msg => '789', arg => '900', foo => 'bar', ... },
]
I would like to represent the field data (as a minimal-depth decision tree?) so that each message can be programmatically regenerated using a minimal number of parameters. For example, in the above
foo is always 'bar', so I don't need to mention it
sender and client are correlated, so I only need to mention one or the other
and msg is different each time
So I would like to be able to regenerate these messages with a program along the lines of
write_msg( 'client', '123' )
write_msg( 'server', '456' )
write_msg( 'client', '789' )
where the write_msg function would be composed of nested if statements or subfunction calls using the parameters.
Based on my original data, how can I determine the 'most important' set of parameters, i.e. the ones that will let me recreate my data set using the smallest number of arguments?
The following papers describe algorithms for discovering functional dependencies:
Y. Huhtala, J. Kärkkäinen, P. Porkka, and H. Toivonen. "TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies." The Computer Journal, 42(2):100–111, 1999, doi:10.1093/comjnl/42.2.100.
I. Savnik and P. A. Flach. "Bottom-up Induction of Functional Dependencies from Relations." In Proc. AAAI-93 Workshop: Knowledge Discovery in Databases, pages 174–185, Washington, DC, USA, 1993.
C. Wyss, C. Giannella, and E. Robertson. "FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances." In Proc. Data Warehousing and Knowledge Discovery, pages 101–110, Munich, Germany, 2001, doi:10.1007/3-540-44801-2.
Hong Yao and Howard J. Hamilton. "Mining functional dependencies from data." Data Mining and Knowledge Discovery, 2008, doi:10.1007/s10618-007-0083-9.
There has also been some work on discovering multivalued dependencies:
I. Savnik and P. A. Flach. "Discovery of Multivalued Dependencies from Relations." Intelligent Data Analysis Journal, 4(3):195–211, IOS Press, 2000.
This looks very similar to Database Normalization.
You have a relation (your test data set) and some known functional dependencies ({sender} => arg, {} => foo, and possibly {msg} => sender; if the order of tests is important, add {testNr} => msg), and you want to eliminate redundancies.
Treat your test set as a database table, apply the normalization rules and create equivalent functions (getArgFromSender(sender) etc.) for each join.
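As a hypothetical sketch of what those generated functions could look like for the example data (the lookup table and names are illustrative, derived from the dependencies above):

# one lookup table per discovered dependency
ARG_BY_SENDER = { 'client' => '900', 'server' => '800' }  # {sender} => arg

def write_msg(sender, msg)
  { 'sender' => sender, 'msg' => msg,
    'arg' => ARG_BY_SENDER[sender],
    'foo' => 'bar' }  # {} => foo: constant, so it's hard-coded
end

write_msg('client', '123')
# => {"sender"=>"client", "msg"=>"123", "arg"=>"900", "foo"=>"bar"}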
If the number of fields and records is small:
Brute force it by looping through every combination of fields, and for each combination detect if there are multiple items in the list which map to the same value.
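A hypothetical brute-force sketch along those lines; it reports determinant-set/field pairs, including redundant supersets (all names are illustrative):

def functional_dependencies(records, fields)
  deps = []
  (0...fields.size).each do |n|
    fields.combination(n).each do |lhs|
      (fields - lhs).each do |target|
        # lhs determines target if records agreeing on lhs agree on target
        groups = records.group_by { |r| r.values_at(*lhs) }
        same = groups.values.all? { |rs| rs.map { |r| r[target] }.uniq.size == 1 }
        deps << [lhs, target] if same
      end
    end
  end
  deps
end

records = [
  { 'sender' => 'client', 'msg' => '123', 'arg' => '900', 'foo' => 'bar' },
  { 'sender' => 'server', 'msg' => '456', 'arg' => '800', 'foo' => 'bar' },
  { 'sender' => 'client', 'msg' => '789', 'arg' => '900', 'foo' => 'bar' },
]
functional_dependencies(records, records.first.keys)
# includes [[], "foo"] (foo is constant) and [["sender"], "arg"]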
If you can live with a fairly good choice of fields:
Start off assuming you need all fields. Then select a field at random and see if it can be eliminated; if it can, cross it off the set of fields. Otherwise, choose another field at random and try again. If you find no fields can be eliminated, you've found a reasonable set of fields. Had you chosen other fields first, you might have found a better solution, so you can repeat the whole procedure a few times and pick the best result if you like. This kind of approach is called hill climbing; a sketch follows.
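Here is a sketch of that random-elimination idea, under the assumption that a "good" field set is one on which no two records collide:

def minimal_field_set(records, fields)
  keep = fields.shuffle  # random order => possibly a different local optimum each run
  keep.dup.each do |field|
    candidate = keep - [field]
    # the candidate set is good enough if it still distinguishes every record
    distinct = records.map { |r| r.values_at(*candidate) }.uniq.size == records.size
    keep = candidate if distinct
  end
  keep
end

# with the records array from the previous sketch:
minimal_field_set(records, records.first.keys)  # e.g. => ["msg"]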
(I suspect that this problem is NP-complete, i.e. we probably don't know of an efficient exact solution, so it's not worth losing sleep trying to dream up a perfect one.)