EDIT:
In Short
I have 3 attributes for a single entity, and I have about 100 such entities. I need a good data structure to store them and retrieve them efficiently.
Example:
Let's consider an image with 100 pixels.
Each pixel has three attributes - Red, Green and Blue. I need to store the entire image in terms of its pixels and its RGB values in a data structure like Hash.
An example data structure I was thinking of was something like this:
x = [{:red => 1, :green => 2, :blue => 3}, {:red => 21, :green => 21, :blue => 32}, {:red => 21, :green => 21, :blue => 32}]
My question:
1) Is there a better way to store such sets of data?
2) Is there an efficient way to access such sets of data?
In other words, what's the easiest and most efficient way to store multiple sets of key-value pairs and access them efficiently?
Disclaimer: I'm a newbie to Ruby (made some 50% progress).
Thank you.
I think this is what you're asking, so please clarify if I'm off base. You want a quick and easy way to take a hash and turn it into an object with methods like x.red, correct? An OpenStruct might be the answer:
require 'ostruct'
hash = { :red => 1, :green => 2, :blue => 3 }
colorset = OpenStruct.new(hash)
Then you can call:
colorset.red + colorset.green + colorset.blue
And get:
=> 6
EDIT:
Based on your comments, forget the above, I think you simply need nested hashes with meaningful keys:
colors = { 'fuschia' => { 'red'=> 1 , 'green' => 2, 'blue' => 3 },
'goldenrod' => { 'red'=> 2, 'green' => 3, 'blue'=> 4 } }
Then access values like this:
colors['fuschia']['red']
=> 1
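If the entities have no natural names (as with the pixels in the question), an array of Struct instances is another option worth considering. This is only a sketch: Pixel is a hypothetical name, and the three pixels stand in for the 100 mentioned in the question.

```ruby
# Pixel is a hypothetical name; Struct generates the accessors for us.
Pixel = Struct.new(:red, :green, :blue)

# An image as a flat array of pixels (three shown here instead of 100).
image = [
  Pixel.new(1, 2, 3),
  Pixel.new(21, 21, 32),
  Pixel.new(21, 21, 32)
]

first = image[0]                 # index access, like any array
total_red = image.sum(&:red)     # sum of the red channel across all pixels
as_hash = first.to_h             # back to a plain hash when needed

puts first.red    # 1
puts total_red    # 43
p as_hash         # {:red=>1, :green=>2, :blue=>3} (newer Rubies print {red: 1, ...})
```

Compared with nested hashes, this trades free-form keys for fixed attribute names, which may or may not suit the data.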
Related
I'm trying to complete a project-based assessment for a job interview, and they only offer it in Ruby on Rails, which I know little to nothing about. I'm trying to take one hash that contains two or more hashes of arrays and combine the arrays into one array of hashes, while eliminating duplicate hashes based on an "id":value pair.
So I'm trying to take this:
h = {
'first' =>
[
{ 'authorId' => 12, 'id' => 2, 'likes' => 469 },
{ 'authorId' => 5, 'id' => 8, 'likes' => 735 },
{ 'authorId' => 8, 'id' => 10, 'likes' => 853 }
],
'second' =>
[
{ 'authorId' => 9, 'id' => 1, 'likes' => 960 },
{ 'authorId' => 12, 'id' => 2, 'likes' => 469 },
{ 'authorId' => 8, 'id' => 4, 'likes' => 728 }
]
}
And turn it into this:
[
{ 'authorId' => 12, 'id' => 2, 'likes' => 469 },
{ 'authorId' => 5, 'id' => 8, 'likes' => 735 },
{ 'authorId' => 8, 'id' => 10, 'likes' => 853 },
{ 'authorId' => 9, 'id' => 1, 'likes' => 960 },
{ 'authorId' => 8, 'id' => 4, 'likes' => 728 }
]
Ruby has many ways to achieve this.
My first instinct is to group them by id and pick only the first item from each group:
h.values.flatten.group_by{|x| x["id"]}.map{|k,v| v[0]}
A much cleaner approach is to pick the distinct items based on id after flattening the array of hashes, which is what Cary Swoveland suggested in the comments:
h.values.flatten.uniq { |h| h['id'] }
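To see that the two approaches agree, here is a small sketch run against a trimmed copy of the question's data. The variable names posts, by_group and by_uniq are illustrative only.

```ruby
h = {
  'first' => [
    { 'authorId' => 12, 'id' => 2, 'likes' => 469 },
    { 'authorId' => 5,  'id' => 8, 'likes' => 735 }
  ],
  'second' => [
    { 'authorId' => 9,  'id' => 1, 'likes' => 960 },
    { 'authorId' => 12, 'id' => 2, 'likes' => 469 }
  ]
}

# One flat array of all post hashes (flatten only unwraps arrays, not hashes).
posts = h.values.flatten

# Approach 1: group by id, then keep the first hash of each group.
by_group = posts.group_by { |post| post['id'] }.map { |_id, group| group.first }

# Approach 2: keep the first hash seen for each id.
by_uniq = posts.uniq { |post| post['id'] }

p by_group == by_uniq                  # true
p by_uniq.map { |post| post['id'] }    # [2, 8, 1]
```

Both keep the first occurrence of each id and preserve the order in which ids first appear.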
TL;DR
The simplest solution to the problem that fits the data you posted is h.values.flatten.uniq. You can stop reading here unless you want to understand why you don't need to care about duplicate IDs with this particular data set, or when you might need to care and why that's often less straightforward than it seems.
Near the end I also mention some features of Rails that address edge cases that you don't need for this specific data. However, they might help with other use cases.
Skip ID-Specific Deduplication; Focus on Removing Duplicate Hashes Instead
First of all, you have no duplicate id keys that aren't also part of duplicate Hash objects. Despite the fact that Ruby implementations preserve entry order of Hash objects, a Hash is conceptually unordered. Pragmatically, that means two Hash objects with the same keys and values (even if they are in a different insertion order) are still considered equal. So, perhaps unintuitively:
{'authorId' => 12, 'id' => 2, 'likes' => 469} ==
{'id' => 2, 'likes' => 469, 'authorId' => 12}
#=> true
Given your example input, you don't actually have to worry about unique IDs for this exercise. You just need to eliminate duplicate Hash objects from your merged Array, and you have only one of those.
duplicate_ids =
h.values.flatten.group_by { _1['id'] }
.reject { _2.one? }.keys
#=> [2]
unique_hashes_with_duplicate_ids =
h.values.flatten.group_by { _1['id'] }
.reject { _2.uniq.one? }.count
#=> 0
As you can see, 'id' => 2 is the only ID found in both Hash values, albeit in identical Hash objects. Since you have only one duplicate Hash, the problem has been reduced to flattening the Array of Hash values stored in h so that you can remove any duplicate Hash elements (not duplicate IDs) from the combined Array.
Solution to the Posted Problem
There might be use cases where you need to handle the uniqueness of Hash keys, but this is not one of them. Unless you want to sort your result by some key, all you really need is:
h.values.flatten.uniq
Since you aren't being asked to sort the Hash objects in your consolidated Array, you can avoid the need for another method call that (in this case, anyway) is a no-op.
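As a rough illustration using the data from the question, the following sketch applies the one-liner and then shows the optional sort step for cases where ordering does matter. The names merged and sorted are made up for this example.

```ruby
h = {
  'first' => [
    { 'authorId' => 12, 'id' => 2,  'likes' => 469 },
    { 'authorId' => 5,  'id' => 8,  'likes' => 735 },
    { 'authorId' => 8,  'id' => 10, 'likes' => 853 }
  ],
  'second' => [
    { 'authorId' => 9,  'id' => 1, 'likes' => 960 },
    { 'authorId' => 12, 'id' => 2, 'likes' => 469 },
    { 'authorId' => 8,  'id' => 4, 'likes' => 728 }
  ]
}

# Remove hashes that are equal as a whole; no id-specific logic needed.
merged = h.values.flatten.uniq

# Optional: only if the result should be ordered by some key.
sorted = merged.sort_by { |post| post['id'] }

p merged.size                         # 5
p merged.map { |post| post['id'] }    # [2, 8, 10, 1, 4]
p sorted.map { |post| post['id'] }    # [1, 2, 4, 8, 10]
```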
"Uniqueness" Can Be Tricky Absent Additional Context
The only reason to look at your id keys at all would be if you had duplicate IDs in multiple unique Hash objects, and if that were the case you'd then have to worry about which Hash was the correct one to keep. For example, given:
[ {'id' => 1, 'authorId' => 9, 'likes' => 1_920},
{'id' => 1, 'authorId' => 9, 'likes' => 960} ]
which one of these records is the "duplicate" one? Without other data such as a timestamp, simply chaining uniq { _1['id'] } or merging the Hash objects will net you either the first or the last record, respectively. Consider:
[
{'id' => 1, 'authorId' => 9, 'likes' => 1_920},
{'id' => 1, 'authorId' => 9, 'likes' => 960}
].uniq { _1['id'] }
#=> [{"id"=>1, "authorId"=>9, "likes"=>1920}]
[
{'id' => 1, 'authorId' => 9, 'likes' => 1_920},
{'id' => 1, 'authorId' => 9, 'likes' => 960}
].reduce({}, :merge)
#=> {"id"=>1, "authorId"=>9, "likes"=>960}
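If the records did carry something like a timestamp, the choice could be made explicit instead of depending on position. The sketch below assumes a hypothetical 'updated_at' key with made-up dates; neither is part of the original data.

```ruby
# 'updated_at' and its values are invented for illustration only.
records = [
  { 'id' => 1, 'authorId' => 9, 'likes' => 1_920, 'updated_at' => Time.utc(2024, 1, 1) },
  { 'id' => 1, 'authorId' => 9, 'likes' => 960,   'updated_at' => Time.utc(2024, 1, 2) },
  { 'id' => 4, 'authorId' => 8, 'likes' => 728,   'updated_at' => Time.utc(2024, 1, 1) }
]

# Group the competing versions by id, then keep the most recent of each group.
latest = records
  .group_by { |record| record['id'] }
  .map { |_id, versions| versions.max_by { |record| record['updated_at'] } }

p latest.map { |record| record['likes'] }   # [960, 728]
```

The rule ("most recent wins") is itself an assumption; a different rule, such as "highest like count wins", would only change the max_by block.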
Leveraging Context Like Rails-Specific Timestamp Features
While the uniqueness problem described above may seem out of scope for the question you're currently being asked, understanding the limitations of any kind of data transformation is useful. In addition, knowing that Ruby on Rails supports ActiveRecord::Timestamp and the creation and management of timestamp-related columns within database migrations may be highly relevant in a broader sense.
You don't need to know these things to answer the original question. However, knowing when a given solution fits a specific use case and when it doesn't is important too.
I have an array filled with hashes. The data structure looks like this:
students = [
{
"first_name" => "James",
"last_name" => "Sullivan",
"age" => 20,
"study_results" => {"CAR" => 1, "PR1" => 1, "MA1" => 1, "BEN" => 2, "SDP" => nil}
}
]
I want to find students with the mark 1 from at least two subjects.
I tried to convert the hash with marks into an array and then use the inject method to count the number of 1 and find out if the number is > 1:
students.select{|student|
(
(student["study_results"].to_a)
.inject(0){|sum, x| sum += 1 if x.include?(1)}
) > 1
}
Is there any way to put a condition into the method, or should I find a different way to solve it?
I commend you for attempting to solve this before posting, but you made it unnecessarily complicated. I'd write it like this:
students.select{|student| student['study_results'].values.count(1) >= 2}
That's all, no need for inject. You were misusing it here.
Ruby collections have TONS of useful methods. If you find yourself using inject or each, there's a better method for this, 90% of the time.
Explanation:
student['study_results'] # => {"CAR"=>1, "PR1"=>1, "MA1"=>1, "BEN"=>2, "SDP"=>nil}
student['study_results'].values # => [1, 1, 1, 2, nil]
student['study_results'].values.count(1) # => 3
student['study_results'].values.count(2) # => 1
student['study_results'].values.count(3) # => 0
student['study_results'].values.count(nil) # => 1
Documentation
Hash#values
Array#count
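For completeness, here is a runnable sketch that puts the count-based version next to a repaired inject. The second student is hypothetical and was added only so the filter has something to exclude. The original inject block likely failed because `sum += 1 if ...` evaluates to nil when the condition is false, and that nil then becomes the accumulator.

```ruby
students = [
  { "first_name" => "James", "last_name" => "Sullivan", "age" => 20,
    "study_results" => { "CAR" => 1, "PR1" => 1, "MA1" => 1, "BEN" => 2, "SDP" => nil } },
  # Hypothetical second student with only a single mark of 1.
  { "first_name" => "Mike", "last_name" => "Wazowski", "age" => 21,
    "study_results" => { "CAR" => 1, "PR1" => 3, "MA1" => 2, "BEN" => 2, "SDP" => nil } }
]

# Count-based version: how many marks equal 1?
with_count = students.select { |student| student["study_results"].values.count(1) >= 2 }

# Repaired inject: the block must return the accumulator on every iteration.
with_inject = students.select do |student|
  student["study_results"].inject(0) { |sum, (_subject, mark)| mark == 1 ? sum + 1 : sum } > 1
end

p with_count.map { |student| student["first_name"] }   # ["James"]
p with_count == with_inject                            # true
```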
I heard that the positions of the key value pairs in a hash are not fixed, and could be rearranged.
I would like to know if this is true, and if it is, could someone point me to some documentation? If it is wrong, it would be great to have some documentation to the contrary.
To illustrate, if I have the following hash:
NUMBERS = {
1000 => "M",
900 => "CM",
500 => "D",
400 => "CD",
100 => "C",
90 => "XC",
50 => "L",
40 => "XL",
10 => "X",
9 => "IX",
5 => "V",
4 => "IV",
1 => "I",
}
and iterate through it over and over again, would the first key/value pair possibly not be 1000 => 'M'? Or, are the positions of the key/value pairs fixed by definition, and would have to be manually changed in order for the positions to change?
This question is a more general and basic question about the qualities of hashes. I'm not asking how to get to a certain position in a hash.
Generally hashes (or dictionaries, associative arrays etc...) are considered unordered data structures.
From Wikipedia
In addition, associative arrays may also include other operations such
as determining the number of bindings or constructing an iterator to
loop over all the bindings. Usually, for such an operation, the order
in which the bindings are returned may be arbitrary.
However, since Ruby 1.9, hashes maintain the order in which their keys were inserted.
The answer is right at the top of the Ruby documentation for Hash
Hashes enumerate their values in the order that the corresponding keys
were inserted.
In Ruby you can test it yourself pretty easily
key_indices = {
1000 => 0,
900 => 1,
500 => 2,
400 => 3,
100 => 4,
90 => 5,
50 => 6,
40 => 7,
10 => 8,
9 => 9,
5 => 10,
4 => 11,
1 => 12
}
1_000_000.times do
key_indices.each_with_index do |key_val, i|
raise if key_val.last != i
end
end
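A shorter sketch of what "insertion order" means in practice: updating an existing key leaves its position alone, while deleting and re-inserting a key moves it to the end. The hash below is a cut-down version of the one in the question.

```ruby
numerals = { 1000 => "M", 500 => "D", 100 => "C" }

numerals[500] = "d"              # updating an existing key keeps its position
after_update = numerals.keys

numerals.delete(1000)
numerals[1000] = "M"             # deleting and re-inserting moves the key to the end
after_reinsert = numerals.keys

p after_update      # [1000, 500, 100]
p after_reinsert    # [500, 100, 1000]
```

So the first pair stays 1000 => "M" for as long as nothing deletes that key.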
A hash (also called associative array) is an unordered data structure.
Since Ruby 1.9, though, Ruby keeps the keys in the order they were inserted.
You can find a whole lot more about this here: Is order of a Ruby hash literal guaranteed?
And some here https://ruby-doc.org/core-2.4.1/Hash.html
I have an array that stores banned IP addresses in my application:
bannedips = ["10.10.10.10", "20.20.20.20", "30.30.30.30"]
I want to add more information to each banned IP address (IP address, ban timestamp, ban reason).
How can I do this in Ruby?
In Ruby, multidimensional arrays are simply arrays of arrays:
bannedips = [["10.10.10.10", "more data", "etc"], ["20.20.20.20", ...]]
A better approach would be to use an array of hashes, so you can label values:
bannedips = [{ip: "10.10.10.10", timestamp: 89327414}, ...]
If there are a reasonable number of IPs to be tracked, I'd probably use a simple Hash:
banned_ips = {
"10.10.10.10" => {:timestamp => Time.now, :reason => 'foo'},
"20.20.20.20" => {:timestamp => Time.now, :reason => 'bar'},
"30.30.30.30" => {:timestamp => Time.now, :reason => nil}
}
A hash is a quick and dirty way to create a list that acts like an indexed database; lookups are extremely fast. And, since you can only have a single instance of a particular key, it keeps you from dealing with duplicate data:
banned_ips["20.20.20.20"] # => {:timestamp=>2015-01-02 12:33:19 -0700, :reason=>"bar"}
banned_ips.keys # => ["10.10.10.10", "20.20.20.20", "30.30.30.30"]
As a general programming tip for choosing between arrays and hashes, if you:
have to quickly access a specific value, use a hash, which acts like a random-access database.
want to have a queue or list of values you'll sequentially access, then use an Array.
So, for what you want, retrieving values tied to a specific IP, use a hash. An array, or array-of-arrays would cause the code to waste time looking for the particular value and would slow down as new items were added to the array because of those lookups.
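The difference can be sketched as follows. The names banned_list and banned_index are illustrative, and Array#to_h with a block assumes Ruby 2.6 or later.

```ruby
banned_list = [
  { ip: "10.10.10.10", reason: "foo" },
  { ip: "20.20.20.20", reason: "bar" }
]

# Array: find walks the elements one by one until a match turns up.
from_array = banned_list.find { |entry| entry[:ip] == "20.20.20.20" }

# Hash: build an index keyed by IP once, then look entries up directly.
banned_index = banned_list.to_h { |entry| [entry[:ip], entry] }
from_hash = banned_index["20.20.20.20"]

p from_array == from_hash               # true
p banned_index.key?("99.99.99.99")      # false
```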
There's a point where it becomes more sensible to store this sort of information into a database, and as a developer it's good to learn about them. They're one of many tools we need to have in our toolbox.
Yes, multidimensional arrays are possible in Ruby. Arrays can contain any value, so a multidimensional array is just an array which contains other arrays:
require 'date' # Date is not loaded by default
banned_ips = [
["10.10.10.10", Date.new(2015, 1, 2), "reason"],
["20.20.20.20", Date.new(2014, 12, 28), "reason"],
["30.30.30.30", Date.new(2014, 12, 29), "reason"],
]
Personally though I wouldn't recommend using a multidimensional array for this purpose. Instead, create a class which encapsulates information about the banned IP.
Simple example:
class BannedIP
  attr_reader :ip, :time, :reason

  def initialize(ip, time:, reason: "N/A")
    # Instance variables back the readers declared above.
    @ip = ip
    @time = time
    @reason = reason
  end
end
banned_ips = [
BannedIP.new("10.10.10.10", time: Date.new(2015, 1, 2)),
BannedIP.new("20.20.20.20", time: Date.new(2014, 12, 28)),
BannedIP.new("30.30.30.30", time: Date.new(2014, 12, 29), reason: "Spam"),
]
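A self-contained sketch of how such objects might then be queried. The class is repeated so the snippet runs on its own, and the two queries (by IP and by date) are just examples of what the readers make possible.

```ruby
require 'date'

class BannedIP
  attr_reader :ip, :time, :reason

  def initialize(ip, time:, reason: "N/A")
    @ip = ip
    @time = time
    @reason = reason
  end
end

banned_ips = [
  BannedIP.new("10.10.10.10", time: Date.new(2015, 1, 2)),
  BannedIP.new("30.30.30.30", time: Date.new(2014, 12, 29), reason: "Spam")
]

# Look up one entry by IP and read its attributes through the readers.
spam = banned_ips.find { |banned| banned.ip == "30.30.30.30" }

# Collect the IPs banned on or after a given date.
recent = banned_ips.select { |banned| banned.time >= Date.new(2015, 1, 1) }.map(&:ip)

puts spam.reason               # Spam
puts banned_ips.first.reason   # N/A
p recent                       # ["10.10.10.10"]
```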
One wants to store objects, allowing them to be retrieved via a numerical key. These keys can range from 0 to an arbitrary size (~100K, for instance), but not every natural number in the range has a corresponding object.
One might have the following:
structure[0] => some_obj_a
structure[3] => some_obj_b
structure[7] => some_obj_c
...
structure[100103] => some_obj_z
But all other keys (1, 2, 4, 5, 6, ...) do not have an associated object. The numerical keys are used for retrieval, such that an "ID" is provided to return an object associated to that ID:
ID = get_input_id
my_obj = structure[ID]
What is the most efficient data structure for this scenario in Ruby? And for what reasons? (So far, I can see it being a hash or an array.)
I define "efficient" in terms of:
Least memory used
Fastest lookup times
Fastest entry creation/updates (at arbitrary keys)
An initialization for this structure might be
hsh = Hash.new # or Array.new
hsh[0] = {:id => 0, :var => "a", :count => 45}
hsh[3] = {:id => 3, :var => "k", :count => 32}
hsh[7] = {:id => 7, :var => "e", :count => 2}
You've essentially described a sparse array or a hash.
Hashes are fast and only use the memory they need. There is no "magic" data structure for this that'll be any faster. Use a hash.
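A small sketch of the memory side of that argument, using the keys from the question: assigning to a high index makes an array grow to cover every slot below it, whereas a hash only holds the entries that were actually stored. The record contents are placeholders.

```ruby
sparse_hash = {}
sparse_array = []

[0, 3, 7, 100_103].each do |id|
  record = { id: id }        # placeholder object
  sparse_hash[id] = record
  sparse_array[id] = record  # the array is padded with nil up to this index
end

p sparse_hash.size             # 4
p sparse_array.size            # 100104
p sparse_array.compact.size    # 4
p sparse_hash[5]               # nil (no entry for this key)
```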