An Efficient Numerically Keyed Data Structure in Ruby - ruby

One wants to store objects, allowing them to be retrieved via a numerical key. These keys can range from 0 to an arbitrary size (~100K, for instance), but not every natural number in the range has a corresponding object.
One might have the following:
structure[0] => some_obj_a
structure[3] => some_obj_b
structure[7] => some_obj_c
...
structure[100103] => some_obj_z
But all other keys (1, 2, 4, 5, 6, ...) do not have an associated object. The numerical keys are used for retrieval, such that an "ID" is provided to return an object associated to that ID:
ID = get_input_id
my_obj = structure[ID]
What is the most efficient data structure for this scenario in Ruby? And for what reasons? (So far, I can see it being a hash or an array.)
I define "efficient" in terms of:
Least memory used
Fastest lookup times
Fastest entry creation/updates (at arbitrary keys)
An initialization for this structure might be
hsh = Hash.new # or Array.new
hsh[0] = {:id => 0, :var => "a", :count => 45}
hsh[3] = {:id => 3, :var => "k", :count => 32}
hsh[7] = {:id => 7, :var => "e", :count => 2}

You've essentially described a sparse array or a hash.
Hashes offer fast lookups and store only the keys you actually insert, so they stay memory efficient when the keys are sparse. There is no "magic" data structure for this that'll be any faster. Use a hash.
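A rough sketch of why the sparse keys matter, assuming a hypothetical helper named build_sparse (not part of the question). It stores the same records in a Hash and in an Array so the sizes can be compared:

```ruby
# Hypothetical helper: store the same sparse records in a Hash and in an Array.
def build_sparse(records)
  hash = {}
  array = []
  records.each do |id, obj|
    hash[id] = obj    # one entry per stored key
    array[id] = obj   # grows the array to id + 1 slots, padding gaps with nil
  end
  [hash, array]
end

records = {
  0      => { :id => 0, :var => "a", :count => 45 },
  3      => { :id => 3, :var => "k", :count => 32 },
  100103 => { :id => 100103, :var => "z", :count => 7 }
}

hash, array = build_sparse(records)
puts hash.size           # => 3
puts array.size          # => 100104
puts array.compact.size  # => 3
puts hash[5].inspect     # => nil (a missing key simply returns nil)
```

The array has to reserve a slot for every index up to the largest key, while the hash only holds the three entries that exist. If the keys were dense, an array could be the more compact choice.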

Related

Position of key/value pairs in a hash in Ruby (or any language)

I heard that the positions of the key value pairs in a hash are not fixed, and could be rearranged.
I would like to know if this is true, and if it is, could someone point me to some documentation? If it is wrong, it would be great to have some documentation to the contrary.
To illustrate, if I have the following hash:
NUMBERS = {
1000 => "M",
900 => "CM",
500 => "D",
400 => "CD",
100 => "C",
90 => "XC",
50 => "L",
40 => "XL",
10 => "X",
9 => "IX",
5 => "V",
4 => "IV",
1 => "I",
}
and iterate through it over and over again, would the first key/value pair possibly not be 1000 => 'M'? Or, are the positions of the key/value pairs fixed by definition, and would have to be manually changed in order for the positions to change?
This question is a more general and basic question about the qualities of hashes. I'm not asking how to get to a certain position in a hash.
Generally hashes (or dictionaries, associative arrays etc...) are considered unordered data structures.
From Wikipedia
In addition, associative arrays may also include other operations such
as determining the number of bindings or constructing an iterator to
loop over all the bindings. Usually, for such an operation, the order
in which the bindings are returned may be arbitrary.
However, since Ruby 1.9, hashes maintain the order in which their keys were inserted.
The answer is right at the top of the Ruby documentation for Hash
Hashes enumerate their values in the order that the corresponding keys
were inserted.
In Ruby you can test it yourself pretty easily
key_indices = {
1000 => 0,
900 => 1,
500 => 2,
400 => 3,
100 => 4,
90 => 5,
50 => 6,
40 => 7,
10 => 8,
9 => 9,
5 => 10,
4 => 11,
1 => 12
}
1_000_000.times do
  key_indices.each_with_index do |key_val, i|
    raise if key_val.last != i
  end
end
A hash (also called an associative array) is, in general, an unordered data structure.
Since Ruby 1.9, though, Ruby keeps the keys in the order in which they were inserted.
You can find a whole lot more about this here: Is order of a Ruby hash literal guaranteed?
And some here https://ruby-doc.org/core-2.4.1/Hash.html
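To see what "insertion order" means when a hash is modified, here is a small sketch. The helper move_to_end is a hypothetical name introduced only for this illustration:

```ruby
# Hypothetical helper: delete a key and insert it again with the same value.
def move_to_end(hash, key)
  value = hash.delete(key)
  hash[key] = value
  hash
end

numerals = { 1000 => "M", 500 => "D", 100 => "C" }

numerals[500] = "d"            # updating an existing key keeps its position
puts numerals.keys.inspect     # => [1000, 500, 100]

move_to_end(numerals, 1000)    # a re-inserted key counts as a new insertion
puts numerals.keys.inspect     # => [500, 100, 1000]
puts numerals.first.inspect    # => [500, "d"]
```

So the positions only change if you delete and re-add keys yourself. Iterating over an untouched hash yields the same order every time.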

Is it possible to have multidimensional arrays in Ruby?

I have an array that stores banned IP addresses in my application:
bannedips = ["10.10.10.10", "20.20.20.20", "30.30.30.30"]
I want to add more information to each banned IP address (IP address, ban timestamp, ban reason).
How can I do this in Ruby?
In Ruby, multidimensional arrays are simply arrays of arrays:
bannedips = [["10.10.10.10", "more data", "etc"], ["20.20.20.20", ...]]
A better approach would be to use an array of hashes, so you can label values:
bannedips = [{ip: "10.10.10.10", timestamp: 89327414}, ...]
If there are a reasonable number of IPs to be tracked, I'd probably use a simple Hash:
banned_ips = {
"10.10.10.10" => {:timestamp => Time.now, :reason => 'foo'},
"20.20.20.20" => {:timestamp => Time.now, :reason => 'bar'},
"30.30.30.30" => {:timestamp => Time.now, :reason => nil}
}
A hash is a quick and dirty way to create a list that acts like an indexed database; lookups are extremely fast. And, since you can only have a single instance of a particular key, it keeps you from dealing with duplicate data:
banned_ips["20.20.20.20"] # => {:timestamp=>2015-01-02 12:33:19 -0700, :reason=>"bar"}
banned_ips.keys # => ["10.10.10.10", "20.20.20.20", "30.30.30.30"]
As a general programming tip for choosing between arrays and hashes, if you:
have to quickly access a specific value, use a hash, which acts like a random-access database.
want to have a queue or list of values you'll sequentially access, then use an Array.
So, for what you want, retrieving values tied to a specific IP, use a hash. An array, or array-of-arrays would cause the code to waste time looking for the particular value and would slow down as new items were added to the array because of those lookups.
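A minimal sketch of the two lookup styles, assuming hypothetical helpers find_in_array and find_in_hash and made-up ban reasons:

```ruby
# Hypothetical helpers contrasting the two layouts.
def find_in_array(rows, ip)
  rows.find { |row| row[:ip] == ip }   # scans the elements one by one
end

def find_in_hash(index, ip)
  index[ip]                            # direct lookup by key
end

rows = [
  { :ip => "10.10.10.10", :reason => "foo" },
  { :ip => "20.20.20.20", :reason => "bar" }
]
index = {
  "10.10.10.10" => { :reason => "foo" },
  "20.20.20.20" => { :reason => "bar" }
}

puts find_in_array(rows, "20.20.20.20")[:reason]   # => bar
puts find_in_hash(index, "20.20.20.20")[:reason]   # => bar
puts index.key?("40.40.40.40")                     # => false
```

Both return the same record, but the array version has to walk the list until it finds a match, so it slows down as the list grows.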
There's a point where it becomes more sensible to store this sort of information into a database, and as a developer it's good to learn about them. They're one of many tools we need to have in our toolbox.
Yes, multidimensional arrays are possible in Ruby. Arrays can contain any value, so a multidimensional array is just an array which contains other arrays:
require 'date' # Date.new needs the date library
banned_ips = [
["10.10.10.10", Date.new(2015, 1, 2), "reason"],
["20.20.20.20", Date.new(2014, 12, 28), "reason"],
["30.30.30.30", Date.new(2014, 12, 29), "reason"],
]
Personally though I wouldn't recommend using a multidimensional array for this purpose. Instead, create a class which encapsulates information about the banned IP.
Simple example:
class BannedIP
  attr_reader :ip, :time, :reason

  def initialize(ip, time:, reason: "N/A")
    # Store the arguments in instance variables so the readers above work
    @ip = ip
    @time = time
    @reason = reason
  end
end
banned_ips = [
BannedIP.new("10.10.10.10", time: Date.new(2015, 1, 2)),
BannedIP.new("20.20.20.20", time: Date.new(2014, 12, 28)),
BannedIP.new("30.30.30.30", time: Date.new(2014, 12, 29), reason: "Spam"),
]
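The class approach and the hash approach can be combined. The sketch below repeats the class so it runs on its own, and adds a hypothetical helper, index_by_ip, that builds a Hash keyed by IP address:

```ruby
require 'date'

class BannedIP
  attr_reader :ip, :time, :reason

  def initialize(ip, time:, reason: "N/A")
    @ip = ip
    @time = time
    @reason = reason
  end
end

# Hypothetical helper: index a list of BannedIP objects by their IP address.
def index_by_ip(bans)
  bans.each_with_object({}) { |ban, index| index[ban.ip] = ban }
end

bans = [
  BannedIP.new("10.10.10.10", time: Date.new(2015, 1, 2)),
  BannedIP.new("30.30.30.30", time: Date.new(2014, 12, 29), reason: "Spam")
]

by_ip = index_by_ip(bans)
puts by_ip["30.30.30.30"].reason   # => Spam
puts by_ip["10.10.10.10"].reason   # => N/A
puts by_ip.key?("40.40.40.40")     # => false
```

This keeps the labeled attributes of the class while still giving direct lookup by IP.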

Getting an array of hash values given specific keys

Given certain keys, I want to get an array of values from a hash (in the order I gave the keys). I had done this:
class Hash
  def values_for_keys(*keys_requested)
    result = []
    keys_requested.each do |key|
      result << self[key]
    end
    return result
  end
end
I modified the Hash class because I do plan to use it almost everywhere in my code.
But I don't really like the idea of modifying a core class. Is there a builtin solution instead? (couldn't find any, so I had to write this).
You should be able to use values_at:
values_at(key, ...) → array
Return an array containing the values associated with the given keys. Also see Hash.select.
h = { "cat" => "feline", "dog" => "canine", "cow" => "bovine" }
h.values_at("cow", "cat") #=> ["bovine", "feline"]
The documentation doesn't specifically say anything about the order of the returned array but:
The example implies that the array will match the key order.
The standard implementation does things in the right order.
There's no other sensible way for the method to behave.
For example:
>> h = { :a => 'a', :b => 'b', :c => 'c' }
=> {:a=>"a", :b=>"b", :c=>"c"}
>> h.values_at(:c, :a)
=> ["c", "a"]
I would suggest this instead. Note that it returns the values in the hash's own order, not in the order of given_keys:
your_hash.select{|key,value| given_keys.include?(key)}.values
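The two suggestions above differ in ordering and in how they treat missing keys. A short comparison, where values_via_select is a hypothetical wrapper around the select approach:

```ruby
# Hypothetical wrapper around the select-based suggestion.
def values_via_select(hash, given_keys)
  hash.select { |key, _value| given_keys.include?(key) }.values
end

h = { "cat" => "feline", "dog" => "canine", "cow" => "bovine" }

puts h.values_at("cow", "cat").inspect              # => ["bovine", "feline"]
puts values_via_select(h, ["cow", "cat"]).inspect   # => ["feline", "bovine"]
puts h.values_at("cow", "horse").inspect            # => ["bovine", nil]

begin
  h.fetch_values("cow", "horse")   # Ruby 2.3+: raises instead of returning nil
rescue KeyError
  puts "no such key"
end
```

values_at follows the order of its arguments and fills in nil for unknown keys. The select version follows the hash's insertion order and silently drops unknown keys.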

Multiple Key-Value pairs in a Hash in Ruby

EDIT:
In Short
I have 3 attributes for a single entity, and I have about 100 such entities. I need a good data structure to store them and retrieve them efficiently.
Example:
Let's consider an image with 100 pixels.
Each pixel has three attributes - Red, Green and Blue. I need to store the entire image in terms of its pixels and its RGB values in a data structure like Hash.
An example data structure I was thinking of was something like this:
x = [{:red => 1, :green => 2, :blue => 3}, {:red => 21, :green => 21, :blue => 32}, {:red => 21, :green => 21, :blue => 32}]
My question:
1) Is there a better way to store such sets of data?
2) Is there an efficient way to access such sets of data?
In other words, what's the easiest and most efficient way to store multiple sets of key-value pairs and access them?
Disclaimer: I'm a newbie to Ruby (made some 50% progress).
Thank you.
I think this is what you're asking, so please clarify if I'm off base. You want a quick and easy way to take a hash and turn it into an object with methods like x.red, correct? An OpenStruct might be the answer:
require 'ostruct'
hash = { :red => 1, :green => 2, :blue => 3 }
colorset = OpenStruct.new(hash)
Then you can call:
colorset.red + colorset.green + colorset.blue
And get:
=> 6
EDIT:
Based on your comments, forget the above, I think you simply need nested hashes with meaningful keys:
colors = { 'fuschia' => { 'red'=> 1 , 'green' => 2, 'blue' => 3 },
'goldenrod' => { 'red'=> 2, 'green' => 3, 'blue'=> 4 } }
Then access values like this:
colors['fuschia']['red']
=> 1
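For the pixel example in the question, another option is a Struct stored in a grid of arrays. This is only a sketch: Pixel and blank_image are hypothetical names, and the 10 x 10 size is an assumption chosen to match the 100 pixels mentioned:

```ruby
# Hypothetical Pixel type with one field per color channel.
Pixel = Struct.new(:red, :green, :blue)

# Hypothetical helper: build a width x height grid filled with copies of one color.
def blank_image(width, height, color)
  Array.new(height) { Array.new(width) { color.dup } }
end

image = blank_image(10, 10, Pixel.new(0, 0, 0))
image[2][3].red = 255                  # row 2, column 3

puts image[2][3].to_a.inspect          # => [255, 0, 0]
puts image[0][0].to_a.inspect          # => [0, 0, 0]
puts image.sum { |row| row.size }      # => 100
```

Each pixel gets its own copy of the Struct, so changing one pixel does not affect the others. Access by position is then image[row][column].red.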

Is saving a hash in another hash common practice?

I'd like to save some hash objects to a collection (in the Java world, think of it as a List). I searched online to see if there is a similar data structure in Ruby and found none. For the time being I've been trying to save hash a[] into hash b[], but have been having issues trying to get data out of hash b[].
Are there any built-in collection data structures in Ruby? If not, is saving a hash in another hash common practice?
If it's accessing the hash in the hash that is the problem then try:
>> p = {:name => "Jonas", :pos => {:x=>100.23, :y=>40.04}}
=> {:pos=>{:y=>40.04, :x=>100.23}, :name=>"Jonas"}
>> p[:pos][:x]
=> 100.23
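If some of the nested keys might be absent, Hash#dig (available since Ruby 2.3) walks the same path without raising. A small sketch, using a variable named person and a made-up :size key for the missing case:

```ruby
person = { :name => "Jonas", :pos => { :x => 100.23, :y => 40.04 } }

puts person[:pos][:x]                 # => 100.23 (chained lookup, as above)
puts person.dig(:pos, :x)             # => 100.23
puts person.dig(:size, :x).inspect    # => nil (person[:size][:x] would raise NoMethodError)
```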
There shouldn't be any problem with that.
a = {:color => 'red', :thickness => 'not very'}
b = {:data => a, :reason => 'NA'}
Perhaps you could explain what problems you're encountering.
The question is not completely clear, but I think you want to have a list (array) of hashes, right?
In that case, you can just put them in one array, which is like a list in Java:
a = {:a => 1, :b => 2}
b = {:c => 3, :d => 4}
list = [a, b]
You can retrieve those hashes like list[0] and list[1]
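Once the hashes are in an array, the usual Array methods apply. A brief sketch, where pluck is a hypothetical helper written for this example:

```ruby
# Hypothetical helper: collect the value under `key` from every hash that has it.
def pluck(list, key)
  list.select { |h| h.key?(key) }.map { |h| h[key] }
end

a = { :a => 1, :b => 2 }
b = { :c => 3, :d => 4 }
list = [a, b]
list << { :a => 5 }             # append another hash, like List.add in Java

puts list.size                  # => 3
puts list[1][:d]                # => 4
puts pluck(list, :a).inspect    # => [1, 5]
```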
Lists in Ruby are arrays. You can use Hash#to_a to convert a hash into an array of [key, value] pairs.
If you are trying to combine hash a with hash b, you can use Hash#merge.
EDIT: If you are trying to insert hash a into hash b, you can do
b["Hash a"] = a;
All the answers here so far are about Hash in Hash, not Hash plus Hash, so for reasons of completeness, I'll chime in with this:
# Define two independent Hash objects
hash_a = { :a => 'apple', :b => 'bear', :c => 'camel' }
hash_b = { :c => 'car', :d => 'dolphin' }
# Combine two hashes with the Hash#merge method
hash_c = hash_a.merge(hash_b)
# The combined hash has all the keys from both sets
puts hash_c[:a] # => 'apple'
puts hash_c[:c] # => 'car', not 'camel' since B overwrites A
Note that when you merge B into A, any keys that A had that are in B are overwritten.
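If overwriting is not what you want, Hash#merge also accepts a block that decides what to do when a key exists in both hashes. A sketch, where merge_keeping_both is a hypothetical helper name:

```ruby
# Hypothetical helper: on a key collision, keep both values in an array.
def merge_keeping_both(left, right)
  left.merge(right) { |_key, old_value, new_value| [old_value, new_value] }
end

hash_a = { :a => 'apple', :b => 'bear', :c => 'camel' }
hash_b = { :c => 'car', :d => 'dolphin' }

combined = merge_keeping_both(hash_a, hash_b)
puts combined[:c].inspect   # => ["camel", "car"]
puts combined[:a]           # => apple
puts hash_a[:c]             # => camel (merge returns a new hash; hash_a is unchanged)
```

The block is only called for keys present in both hashes. Keys unique to either side are copied over as they are.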

Resources