Find location of a value in a hash in Ruby - ruby

I have a hash and, when a user supplies a key, I want to find out whether that key comes before or after a target key in the hash.
For example, given the following hash:
my_list = {
key1 => value1,
key2 => value2,
key3 => value3,
key4 => value4
}
Suppose the user chooses key2 and the target key is key3.
My function should then check whether key2 comes before or after the target key, in this case key3.
If it were an array, I would compare the indices of the two values. But I am not sure how to do this with a hash.

You can create a simple sequence look-up table with almost zero effort:
MY_LIST = {
start: 'Start',
middle: 'Middle?',
end: 'End!'
}
MY_LIST_SEQ = MY_LIST.keys.each_with_index.to_h
Now you have something that looks like this:
{:start=>0, :middle=>1, :end=>2}
Which means you can do this:
if MY_LIST_SEQ[a] > MY_LIST_SEQ[b]
# ...
end
No need to use a linear scan each time you want to look something up.
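The look-up table above can be wrapped in a small helper. This is only a sketch: `build_sequence` and `relative_position` are hypothetical names, not part of the original answer, and the approach relies on Ruby hashes preserving insertion order (true since Ruby 1.9).

```ruby
# Build a key => position table once; Ruby hashes keep insertion order.
def build_sequence(hash)
  hash.keys.each_with_index.to_h
end

# Hypothetical helper: returns :before, :after or :same for `key` relative
# to `target`, or nil when either key is missing from the table.
def relative_position(seq, key, target)
  a = seq[key]
  b = seq[target]
  return nil if a.nil? || b.nil?

  case a <=> b
  when -1 then :before
  when 1 then :after
  else :same
  end
end

my_list = { start: 'Start', middle: 'Middle?', end: 'End!' }
seq = build_sequence(my_list)
p relative_position(seq, :middle, :end) # => :before
```

Returning nil for unknown keys avoids the NoMethodError that comparing nil with `>` would raise.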

Related

Perl: Using data from an anonymous hash

I am adding on to another developer's code; we are both new to Perl. I am trying to take a list of IPs and run it through whois. I am unsure how to access the data in the anonymous hash. How do I use it?
He told me the data was stored inside one. This is the only instance I could find mentioning a hash:
## Instantiate a reference to an anonymous hash (key value pair)
my $addr = {};
The anonymous hash is the {} part. It returns a reference which can be stored in a scalar variable, like in your example:
my $addr = {};
To see the structure, you can print it with Data::Dumper:
use Data::Dumper;
print Dumper $addr;
It might show you something like:
$VAR1 = {
'c' => 1,
'a' => 2
};
You access the hash keys using the arrow operator:
print $addr->{"a"};
This is like accessing a regular hash, but with the arrow operator in between.
You can dereference the reference by putting a hash sigil in front:
%$addr
# compare:  %addr    %$addr
#           hash     dereferenced hashref
Here is an anonymous hash:
my $anon_hash = {
  key1 => 'Value 1',
  key2 => 'Value 2',
  key3 => 'Value 3',
};
If you want to access an individual value:
my $value = $anon_hash->{key1};
say $anon_hash->{key2}; # say requires "use feature 'say';" (Perl 5.10+)
If you want to update an individual value:
$anon_hash->{key3} = 'New value 3';
If you want to add a new key/value pair:
$anon_hash->{key4} = 'Value 4';
You can also use all of the standard hash functions (e.g. keys()). You just need to "dereference" your hash reference, which means putting a '%' in front of it.
So, for example, to print all the key/value pairs:
foreach my $key (keys %$anon_hash) {
  say "$key : $anon_hash->{$key}";
}

Filter data based on multiple keys from json array

I have a json array in the following format [{"key1":"abc", "key2": "def", "key3": "ghi"}, {"key1":"abc", "key2": "jkl", "key3": "mno"}, ...]. There's also a table inside the database having four columns: key1, key2, key3 and value. Each of key1, key2 and key3 can hold either * or any other string. What I'm trying to achieve is returning an array of values from the value column when the keys from the database match those in the json array. * should mean any value however an exact match should take precedence over *.
Here's an example to clarify things:
Assume the table contains these rows
1. abc | def | ghi | value1
2. abc | def | * | value2
3. * | def | * | value3
4. * | * | * | value4
If the value being checked is {"key1":"abc", "key2": "def", "key3": "ghi"} then value1 should be returned. If we remove row 1 from the database then value2 should be returned. If row 2 was removed as well, value3 should be returned. If row 3 was removed, value4 should be returned, and finally if row 4 was removed, null should be returned. I'm looking for an efficient way to solve this, assuming that the number of rows in the table is relatively small and the number of entries in the JSON array is much larger.
You need to define your problem more precisely: how much is a * match worth compared to an exact match? For example, which is better for you: a row with 10 * matches but zero exact matches, or the opposite? If you don't define the "weight" of an exact match, then there are many different solutions to your problem.
Then you can define a score for each row based on the number of exact matches and "*" matches (weighted with whatever weights you define), and select the rows with the highest scores (for example, using a sorting algorithm).
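The scoring idea can be sketched as follows. The weights are an assumption, since the question does not define them: each exact match counts 1, each * match counts 0, and a row with any mismatch is rejected. The names `rule_score` and `best_value` are hypothetical, and the table rows are assumed to be loaded into memory as hashes with string keys.

```ruby
# Score one rule against one record: nil if the rule does not match,
# otherwise the number of exact (non-wildcard) matches.
def rule_score(rule, record, keys)
  score = 0
  keys.each do |k|
    if rule[k] == '*'
      next                 # wildcard: matches anything, adds nothing
    elsif rule[k] == record[k]
      score += 1           # exact match
    else
      return nil           # mismatch: the rule does not apply
    end
  end
  score
end

# Return the value of the best-scoring matching rule, or nil if none match.
# On a tie, the earliest rule wins.
def best_value(rules, record, keys = %w[key1 key2 key3])
  best = nil
  best_score = -1
  rules.each do |rule|
    s = rule_score(rule, record, keys)
    next if s.nil? || s <= best_score

    best = rule
    best_score = s
  end
  best && best['value']
end

rules = [
  { 'key1' => 'abc', 'key2' => 'def', 'key3' => 'ghi', 'value' => 'value1' },
  { 'key1' => 'abc', 'key2' => 'def', 'key3' => '*',   'value' => 'value2' },
  { 'key1' => '*',   'key2' => 'def', 'key3' => '*',   'value' => 'value3' },
  { 'key1' => '*',   'key2' => '*',   'key3' => '*',   'value' => 'value4' }
]
record = { 'key1' => 'abc', 'key2' => 'def', 'key3' => 'ghi' }
p best_value(rules, record) # => "value1"
```

Because the table is small, a linear scan over the rules for each JSON entry is probably adequate; a different weighting would only change `rule_score`.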

Ruby: Optimizing storage for holding a huge number of strings, some of them duplicates

I have a text file with two columns. The values in the first column ("key") are all different. The values in the second column are strings with a length between 10 and approximately 200, and some of them are duplicates. The number of duplicates varies. Some strings, especially the longer ones, have no duplicates, while others might have 20 duplicate occurrences.
key1 valueX
key2 valueY
key3 valueX
key4 valueZ
I would like to represent this data as a hash. Because of the large number of keys and the existence of duplicate values, I am wondering whether some method of sharing common strings would be helpful.
The data in the file is more or less constant, i.e. I can put effort (in time or space) into preprocessing it in a suitable way, as long as it can be accessed efficiently once it has been loaded into my application.
I will now outline an algorithm that I believe would solve the problem. My question is whether the algorithm is sound and whether it could be improved. I would also like to know whether using freeze on the strings would provide an additional optimization:
In a separate preprocessing step, I find out which string values are indeed duplicates, and I annotate the data accordingly (i.e. create a third column in the file), so that all occurrences of a repeated string except the first have a pointer to the first occurrence:
key1 valueX
key2 valueY
key3 valueX key1
key4 valueZ
When my application reads the data into memory (line by line), I use this annotation to create a pointer to the original string instead of allocating a new one:
if columns.size == 2
  myHash[columns[0]] = columns[1] # First occurrence of the string
else
  myHash[columns[0]] = myHash[columns[2]] # Subsequent occurrences: reuse the existing String object
end
Will this achieve my goal? Can it be done any better?
One way you could do this is using symbols.
["a", "b", "c", "a", "d", "c"].each do |c|
puts c.intern.object_id
end
417768 #a
313128 #b
312328 #c
417768 #a
433128 #d
312328 #c
Note how both occurrences of a (and of c) got the same object ID.
You can turn a string into a symbol with the intern method. If you intern an equal string you should get the same symbol out, like a flyweight pattern.
If you store the symbol in your hash, each distinct string is held only once. When you need the string, call .to_s on the symbol to get it back. (Note that Symbol#to_s returns a new String object on each call.) Another idea is to cache the strings yourself, i.e. keep an integer-to-string cache hash and store just the integer key in your data structures. When you need the string, look it up.
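A variation on the caching idea, sketched with the string itself as the cache key rather than an integer, so no preprocessing of the file is needed. `load_shared` is a hypothetical helper, and the input is assumed to be whitespace-separated "key value" lines as in the question.

```ruby
# Hypothetical helper: builds the key => value hash while keeping only one
# String object per distinct value.
def load_shared(lines)
  pool = {} # value => the single canonical (frozen) instance
  data = {}
  lines.each do |line|
    key, value = line.chomp.split(' ', 2)
    next if key.nil? || value.nil? # skip blank or malformed lines

    # The first time a value is seen it is frozen and stored in the pool;
    # later equal values reuse that same object.
    data[key] = (pool[value] ||= value.freeze)
  end
  data
end

data = load_shared(["key1 valueX\n", "key2 valueY\n", "key3 valueX\n"])
p data['key1'].equal?(data['key3']) # => true
```

Freezing the pooled strings does not save memory by itself here, but it guards against one key's value being mutated and silently changing the value seen through every other key that shares it.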

Which value for a duplicate key is ignored in a Ruby hash?

If a hash literal contains more than one occurrence of the same key, each pointing to a different value, how does Ruby determine which value is assigned to that key?
In other words,
hash = {keyone: 'value1', keytwo: 'value2', keyone: 'value3'}
results in
warning: duplicated key at line 1 ignored: :keyone
but how do I know which value is assigned to :keyone?
The last one overwrites the previous values. In this case, "value3" becomes the value for :keyone. merge works the same way: when you merge two hashes that have the same keys, the value in the latter hash (the argument, not the receiver) overwrites the other value.
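A small illustration of the merge behaviour described above. `merge_keeping_first` is a hypothetical helper showing how a block passed to merge can reverse the default if the receiver's value should win instead.

```ruby
receiver = { keyone: 'value1', keytwo: 'value2' }
argument = { keyone: 'value3' }

# Default behaviour: the argument's value overwrites the receiver's.
merged = receiver.merge(argument)
p merged[:keyone] # => "value3"

# Hypothetical helper: the block decides conflicts, here keeping the old value.
def merge_keeping_first(first, second)
  first.merge(second) { |_key, old_value, _new_value| old_value }
end

p merge_keeping_first(receiver, argument)[:keyone] # => "value1"
```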
Line numbers on duplicate key warnings can be misleading. As the other answers here confirm, every value of a duplicated key is ignored except for the last value defined for that key.
Using the example in the question across multiple lines:
1 hash1 = {key1: 'value1',
2 key2: 'value2',
3 key1: 'value3'}
4 puts hash1.to_s
keydup.rb:1: warning: duplicated key at line 3 ignored: :key1
{:key1=>"value3", :key2=>"value2"}
The message says "line 3 ignored" but, in fact, it is the value of the key defined at line 1 that is ignored, and the value at line 3 is used, because that is the last value assigned to that key.
IRB is your friend. Try the following in the command line:
irb
hash = {keyone: 'value1', keytwo: 'value2', keyone: 'value3'}
hash[:keyone]
What did you get? Should be "value3".
Best way to check these things is simply to try it out. It's one of the great things about Ruby.
This is spelled out clearly in section 11.5.5.2 Hash constructor of the ISO Ruby Language Specification:
11.5.5.2 Hash constructor
Semantics
[...]
b) 2) For each association Ai, in the order it appears in the program text, take the following steps:
i) Evaluate the operator-expression of the association-key of Ai. Let Ki be the resulting value.
ii) Evaluate the operator-expression of the association-value. Let Vi be the resulting value.
iii) Store a pair of Ki and Vi in H by invoking the method []= on H with Ki and Vi as the arguments.
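The quoted steps can be mimicked by hand, which makes the "last value wins" outcome easy to see. This is only an emulation of the described semantics with a hypothetical `build_hash` helper, not how any particular interpreter implements hash literals.

```ruby
# Store each key/value pair in program order by calling []=,
# as step iii) of the quoted specification describes.
def build_hash(pairs)
  h = {}
  pairs.each { |key, value| h[key] = value }
  h
end

h = build_hash([[:keyone, 'value1'], [:keytwo, 'value2'], [:keyone, 'value3']])
p h[:keyone] # => "value3"
p h.keys     # => [:keyone, :keytwo]
```

The key keeps the position of its first occurrence while taking the value of its last, which matches the output shown in the multi-line example.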

When do you need string as a hash key

Given that using a symbol as a hash key is great according to this post, when do you need to use a string as a hash key?
Key concatenation, e.g.
hash["name" + "xxx"]
may be one such case, but I think the need is rare.
Even key concatenation can be converted easily:
hash[ ("name" + "xxx").to_sym ]
The short answer is that you benefit from avoiding strings as keys in Ruby where the keys are just semantic labels that let you refer to the values in code. In that case, it is clear that symbols do the job more efficiently (provided, as pointed out above, that you are not performing many conversions to generate the labels).
When you are parsing arbitrary data, e.g. XML or JSON, strings as keys might be a more natural way of expressing the structure. Again, the time spent converting strings emitted by a parser into labels could be a factor.
If you're executing hash["name"+"xxx"] many times in a loop, then it can be beneficial to pull the key out of the loop and turn it into a symbol. It's just a performance thing. A given symbol occupies a single location in memory, but a new String object is allocated every time the concatenation is evaluated.
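A rough sketch of that advice. `sum_with_hoisted_key` is a hypothetical function, and the hash is assumed to hold numeric values under the symbol key :namexxx.

```ruby
# Equal strings built at runtime are separate objects,
# while equal symbols are always the same object.
same_string = ('name' + 'xxx').equal?('name' + 'xxx')
same_symbol = ('name' + 'xxx').to_sym.equal?(('name' + 'xxx').to_sym)
p same_string # => false
p same_symbol # => true

# Hypothetical loop: the key is built and converted once, outside the loop.
def sum_with_hoisted_key(hash, times)
  key = ('name' + 'xxx').to_sym
  total = 0
  times.times { total += hash[key] }
  total
end

p sum_with_hoisted_key({ namexxx: 2 }, 3) # => 6
```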
If you have a .yaml file that looks like this:
- thing1: value1
  thing2: value2
  thing3: value3
- thing1: value1
  thing2: value2
  thing3: value3
- thing1: value1
  thing2: value2
  thing3: value3
and you load it with YAML::load_file('filename'), then you will need to use strings for keys.
However, if your yaml file looks like this:
- :thing1: value1
  :thing2: value2
  :thing3: value3
- :thing1: value1
  :thing2: value2
  :thing3: value3
- :thing1: value1
  :thing2: value2
  :thing3: value3
Then you can use symbols for keys. Symbols in this case are preferred on the Ruby side, but the YAML would be cleaner with strings.
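One possible way to keep the YAML clean and still get symbol keys is to convert while loading. This sketch assumes Ruby 2.5 or later, where YAML.load accepts the symbolize_names keyword; `load_things` and the inline document are hypothetical stand-ins for the file.

```ruby
require 'yaml'

# Hypothetical helper: parse YAML text, optionally converting keys to symbols.
def load_things(text, symbolize: false)
  YAML.load(text, symbolize_names: symbolize)
end

text = <<~DOC
  - thing1: value1
    thing2: value2
  - thing1: value3
    thing2: value4
DOC

p load_things(text).first.keys                  # => ["thing1", "thing2"]
p load_things(text, symbolize: true).first.keys # => [:thing1, :thing2]
```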

Resources