Ruby Building an Array - ruby

Argh. Having trouble here trying to work out how to build my array in Ruby.
So I am looping through a result and I want to add the category field as the key and have it so that if the next row has the same category it gets put into that category. If not, it makes a new array and adds it to that.
Here is what I have so far.
data = Array.new
results.each do |row|
data[row.category].push row.field
end
Which is not going to work I know. I want data[row.category] to eventually be (after the loop) an array containing all the row.field's
So I end up with an array that looks like this.
[['Dogs', 5, 12, 2], ['Cats', 4, 5, 9], ['Fish', 25, 82, 23]]
So no matter how many loops I do, if I push it into an array that already exists in data then it just appends it, if the array doesn't exist it creates it and then appends it.
In PHP I would simply do this:
$data[$row['category']][] = $row['field']
With the empty [] denoting to create a new array if there is none. How do I do this in Ruby???

Yeah, you seem to be confused by PHP and its associative arrays (which aren't called arrays in any other language :) ). You need a hash. Try this snippet:
data = results.each_with_object({}) do |row, memo|
memo[row.category] ||= [] # create array unless it exists
memo[row.category] << row.field
end

Unlike in PHP, you cannot use an arbitrary object as the index of an Array. In Ruby, we use Hashes to associate arbitrary objects with other objects.
Your code should work if you change it to:
data = Hash.new { |hash, key| hash[key] = [] }
results.each do |row|
data[row.category] << row.field
end
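As a minimal sketch of how the default-block hash behaves, the example below uses a hypothetical Row struct in place of the real query results (the struct and the sample values are assumptions, not part of the question). It also shows one way to get the nested-array shape the question asked for.

```ruby
# Hypothetical stand-in for the rows returned by the query.
Row = Struct.new(:category, :field)

results = [
  Row.new('Dogs', 5), Row.new('Cats', 4), Row.new('Dogs', 12),
  Row.new('Fish', 25), Row.new('Cats', 5), Row.new('Dogs', 2)
]

# The default block runs the first time a missing key is read,
# storing a fresh array under that key.
data = Hash.new { |hash, key| hash[key] = [] }
results.each { |row| data[row.category] << row.field }

# Flatten each key/value pair into [category, field, field, ...].
nested = data.map { |category, fields| [category, *fields] }

p data   # {"Dogs"=>[5, 12, 2], "Cats"=>[4, 5], "Fish"=>[25]}
p nested # [["Dogs", 5, 12, 2], ["Cats", 4, 5], ["Fish", 25]]
```

Keeping the data as a hash is usually more convenient for lookups; the nested array is only worth building if something downstream needs that exact shape.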

Related

Ruby Nokogiri parsing omit duplicates

I'm parsing XML files and want to avoid adding duplicate values to my array. As it stands, the XML looks like this:
<vulnerable-software-list>
<product>cpe:/a:octopus:octopus_deploy:3.0.0</product>
<product>cpe:/a:octopus:octopus_deploy:3.0.1</product>
<product>cpe:/a:octopus:octopus_deploy:3.0.2</product>
<product>cpe:/a:octopus:octopus_deploy:3.0.3</product>
<product>cpe:/a:octopus:octopus_deploy:3.0.4</product>
<product>cpe:/a:octopus:octopus_deploy:3.0.5</product>
<product>cpe:/a:octopus:octopus_deploy:3.0.6</product>
</vulnerable-software-list>
document.xpath("//entry[
number(substring(translate(last-modified-datetime,'-.T:',''), 1, 12)) > #{last_imported_at} and
cvss/base_metrics/access-vector = 'NETWORK'
]").each do |entry|
product = entry.xpath('vulnerable-software-list/product').map { |product| product.content.split(':')[-2] }
effected_versions = entry.xpath('vulnerable-software-list/product').map { |product| product.content.split(':').last }
puts product
end
However, because of the XML input, this produces quite a few duplicates, so I end up with an array like ['Redhat','Redhat','Redhat','Fedora'].
I already have the effected_versions taken care of, since those values don't duplicate.
Is there a way to make .map add only unique values?
If you need an array of unique values, just call the uniq method on the result of map:
product =
entry.xpath('vulnerable-software-list/product').map do |product|
product.content.split(':')[-2]
end.uniq
There are many ways to do this:
input = ['Redhat','Redhat','Redhat','Fedora']
# approach 1
# self explanatory
result = input.uniq
# approach 2
# iterate through vals, and build a hash with the vals as keys
# since hashes cannot have duplicate keys, it provides a 'unique' check
result = input.each_with_object({}) { |val, memo| memo[val] = true }.keys
# approach 3
# Similar to the previous, we iterate through vals and add them to a Set.
# Adding a duplicate value to a set has no effect, and we can convert it to array
require 'set'
result = input.each_with_object(Set.new) { |val, memo| memo.add(val) }.to_a
If you're not familiar with each_with_object, it's very similar to reduce.
Regarding performance, you can find some info if you search for it, for example "What is the fastest way to make a uniq array?"
From a quick test, uniq was about 5 times faster than each_with_object with a hash, which in turn was about 25% slower than the Set.new approach. That is probably because uniq is implemented in C. I only tested one arbitrary input, though, so this may not hold in all cases.
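To repeat that kind of comparison yourself, a rough sketch with Benchmark.realtime from the standard library might look like the following. The input size and values are assumptions chosen for illustration, and the timings will vary by machine and Ruby version, so none are shown.

```ruby
require 'set'
require 'benchmark'

# Synthetic input (an assumption): 100,000 strings drawn from four values.
names = %w[Redhat Fedora Debian Ubuntu]
input = Array.new(100_000) { |i| names[i % 4] }

via_uniq = input.uniq
via_hash = input.each_with_object({}) { |val, memo| memo[val] = true }.keys
via_set  = input.each_with_object(Set.new) { |val, memo| memo.add(val) }.to_a

# All three keep first-occurrence order, so the results are identical.
same = via_uniq == via_hash && via_hash == via_set

# Wall-clock seconds for each approach.
timings = {
  uniq: Benchmark.realtime { input.uniq },
  hash: Benchmark.realtime { input.each_with_object({}) { |v, m| m[v] = true }.keys },
  set:  Benchmark.realtime { input.each_with_object(Set.new) { |v, m| m.add(v) }.to_a }
}

puts "identical results: #{same}"
timings.each { |label, seconds| puts format('%-5s %.4fs', label, seconds) }
```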

How to generate a unique identifier for a hash with a certain content?

For a caching layer, I need to create a unique sha for a hash. It should be unique for the content of that hash. Two hashes with the same config should have the same sha.
in_2014 = { scopes: [1, 2, 3], year: 2014 }
not_in_2014 = { scopes: [1, 2, 3], year: 2015 }
also_in_2014 = { year: 2014, scopes: [1, 2, 3] }
in_2014 == also_in_2014 #=> true
not_in_2014 == in_2014 #=> false
Now, in order to store it and look it up quickly, it needs to be turned
into something like a shasum. Simply converting to a string does not work,
so generating a hexdigest from it does not work either:
require 'digest'
in_2014.to_s == also_in_2014.to_s #=> false
Digest::SHA2.hexdigest(in_2014.to_s) == Digest::SHA2.hexdigest(also_in_2014.to_s) #=> false
What I want is a shasum or some other identifier that will allow me to
compare the hashes with one another. I want something like the last test that will return true if the contents of the hashes match.
I could sort the hashes before calling to_s, yet that seems kludgy to me. For one,
I am afraid that I am overlooking something there (a sort returns an array, no longer a hash). Is there
something simple that I am missing? Or is this not possible at all?
FWIW, we need this in a scenario like below:
Analysis.find_by_config({scopes: [1,2], year: 2014}).datasets
Analysis.find_by_config({account_id: 1337}).datasets
class Analysis < ActiveRecord::Base
def self.find_by_config(config)
self.find_by(config_digest: shasum_of(config))
end
def self.shasum_of(config)
#WAT?
end
def before_saving
self.config_digest = Analysis.shasum_of(config)
end
end
Note that here, Analysis does not have columns "scopes" or "year" or
"account_id". These are arbitrary configs, that we only need for looking
up the datasets.
I wouldn't recommend the hash method because it is unreliable. You can quickly confirm this by executing {one: 1}.hash in your IRB, the same command in your Rails console, and then in the IRB and/or Rails Console on another machine. The outputs will differ.
Sticking with Digest::SHA2.hexdigest(string) would be wiser.
You'll have to sort the hash and stringify it of course. This is what I would do:
hash.sort.to_s
If you don't want an array, for whatever reason, turn it back into a hash.
Hash[hash.sort].to_s #=> string form of the key-sorted hash
And, for whatever reason, if you don't want to turn the hash into an array and then back into a hash, do the following for hash-to-sorted-hash:
def prepare_for_sum( hash )
hash.keys.sort.each_with_object({}) do |key, return_hash|
return_hash[key] = hash[key]
end.to_s
end
Using some modifications in the method above, you can sort the values too; it can be helpful in case of Array or Hash values.
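Turns out, Ruby has a method for this exact case: Hash#hash. Its value is only stable within a single Ruby process, though, so it suits in-memory comparison rather than a digest stored in the database.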
in_2014.hash == also_in_2014.hash
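A minimal sketch of the sort-then-digest idea from the answers above, extended to nested hashes. The canonicalize helper is hypothetical, and serializing through JSON is an assumption made here to get a stable string. Note that this version treats symbol and string keys as the same key, and that array order still matters.

```ruby
require 'digest'
require 'json'

# Recursively rebuild hashes with their keys in sorted order so that
# insertion order no longer affects the serialized form.
def canonicalize(value)
  case value
  when Hash
    value.sort_by { |key, _| key.to_s }
         .map { |key, val| [key.to_s, canonicalize(val)] }
         .to_h
  when Array
    value.map { |item| canonicalize(item) }
  else
    value
  end
end

# Digest of the canonical JSON form; stable across processes and machines.
def shasum_of(config)
  Digest::SHA256.hexdigest(JSON.generate(canonicalize(config)))
end

in_2014      = { scopes: [1, 2, 3], year: 2014 }
also_in_2014 = { year: 2014, scopes: [1, 2, 3] }
in_2015      = { scopes: [1, 2, 3], year: 2015 }

puts shasum_of(in_2014) == shasum_of(also_in_2014) # true
puts shasum_of(in_2014) == shasum_of(in_2015)      # false
```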

Can I use index to get a set element in Ruby?

Suppose I have a set in Ruby s1:
#<Set: {12, 25}>
I use s1.find_index(12) to get the index 0.
Can I use the index to get back the set element, something like s1[0] to get back 12?
The reason I want to do this is my set elements are large. I want to store links between the set elements. I use the index to store the links.
I am using Ruby 1.9.3
I think you want to use an Array and a Hash for this.
ary = [] # index -> item
hsh = {} # item -> index
# for each item you want to add:
unless hsh[item]
  hsh[item] = ary.size
  ary << item
end
Then when you look up the item in hsh later you will have the index of the item in the list and effectively you will have the internals of your set with a specific caveat
That might not be possible. Set is an unordered list.
Set implements a collection of unordered values with no duplicates. This is a hybrid of Array's intuitive inter-operation facilities and Hash's fast lookup.
You can get an element from a set by its index in this way:
require 'set'
my_set = Set.new([1, 4, 7])
if index = my_set.find_index(4)
  puts my_set.to_a[index]
end
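The Array-plus-Hash idea from the first answer can be wrapped in a small class so that both directions of lookup stay cheap, which avoids calling to_a on a large set each time. This is only a sketch; IndexedSet and its method names are hypothetical, not part of Ruby's standard library.

```ruby
# Hypothetical helper: an insertion-ordered collection of unique items
# that supports lookup by item and by index.
class IndexedSet
  def initialize
    @items = []      # index -> item
    @positions = {}  # item  -> index
  end

  # Adds the item if it is new; returns its index either way.
  def add(item)
    @positions[item] ||= begin
      @items << item
      @items.size - 1
    end
  end

  # Index of the item, or nil if it was never added.
  def index_of(item)
    @positions[item]
  end

  # Item stored at the given index.
  def [](index)
    @items[index]
  end
end

s1 = IndexedSet.new
s1.add(12)
s1.add(25)
s1.add(12)            # duplicate, keeps index 0
puts s1.index_of(12)  # 0
puts s1[0]            # 12
```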

Creating and adding to arrays within a program -- Ruby

I'm a fairly new Ruby user and I was wondering how you create and edit arrays within a program. I'm making a sentence-generator-type program where you can add to arrays of nouns, verbs, and other sentence parts, but I'm currently not sure how to make the arrays in the first place.
Here's the code:
#!/usr/bin/ruby Make_a_Sentence!
#This will eventually create the sentence
def makeSent (*arg)
for i in 0...arg.length
print arg[i].sample if arg[i].kind_of?(Array)
print arg[i] if arg[i].kind_of?(String)
end
end
#this is supposed to add to the array (it's not working)
def addWord (array, word)
if array.kind_of?(Array)
array.push(word)
puts "#{ word } added to #{ array }"
else
puts "#{ array } does not exist"
end
end
#This is supposed to create the arrays
def addType (name)
#name = Array.new
puts "#{ name } created"
end
while 1 > 0
input = gets
$words = input.split
if $words[0] == "addWord" && $words.length == 3
addWord($words[1], $words[2])
end
if $words[0] == "addType" && $words.length == 2
addType($words[1])
end
end
Sorry! I guess I didn't phrase the question well enough! I was mainly wondering how to create new arrays while the program is running, where the arrays have specific names that are given as input. I actually ended up just using hashes for this, but thanks for the responses nonetheless!
Making an Array is done like so:
array = ["val1","val2","val3"]
array = %w{value value value}
# The second example is a shorthand syntax (splits values with just a space)
Familiarize yourself with the documentation: http://www.ruby-doc.org/core-2.1.0/Array.html
Also, when you define a method like def makeSent (*arg), be aware that the * in front of arg is called a splat. It means the method accepts any number of arguments, without the [] syntax, and automatically collects them into an array for you.
So you would call this method like: makeSent(first, second, third), and so on.
Creating arrays is easy: simply enclose the objects in square brackets, like so:
my_array = [12, 29, 36, 42]
another_array = ['something', 64, 'another', 1921]
Notice you can mix and match types within the array. If you want to add items to the end of the array, you can use << to do it:
my_array = [1, 3, 5, 7]
my_array << 9
# my_array is now: [1, 3, 5, 7, 9]
You can access and modify specific items within the array by indexing it within square brackets:
my_array[2] # <-- this will be 5, as arrays are 0-indexed
my_array[2] = 987
# my_array is now [1, 3, 987, 7, 9]
There are a lot of great methods for accessing, modifying, and comparing arrays that you can find in the documentation.
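Since the asker ended up using hashes to create named arrays at runtime, here is a rough sketch of that approach. The add_type and add_word helpers are hypothetical replacements for addType and addWord in the question, and the sample words are made up.

```ruby
# Each word type is a key; its value is the array of words of that type.
word_lists = {}

# Creates an empty list for the type unless one already exists.
def add_type(lists, name)
  lists[name] ||= []
end

# Appends the word if the type exists; returns false otherwise.
def add_word(lists, name, word)
  return false unless lists.key?(name)
  lists[name] << word
  true
end

add_type(word_lists, 'noun')
add_word(word_lists, 'noun', 'dog')
add_word(word_lists, 'noun', 'cat')
added = add_word(word_lists, 'verb', 'runs') # 'verb' was never created

p word_lists # {"noun"=>["dog", "cat"]}
p added      # false
```

The name typed by the user becomes a hash key, so no variable has to be created dynamically.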

Ruby: hash that doesn't remember key values

Is there a hash implementation around that doesn't remember key values? I have to make a giant hash but I don't care what the keys are.
Edit:
Ruby's hash implementation stores the key's value. I would like hash that doesn't remember the key's value. It just uses the hash function to store your value and forgets the key. The reason for this is that I need to make a hash for about 5 gb of data and I don't care what the key values are after creating it. I only want to be able to look up the values based on other keys.
Edit Edit:
The language is kind of confusing. By key's value I mean this:
hsh['value'] = data
I don't care what 'value' is after the hash function stores data in the hash.
Edit^3:
Okay so here's what I am doing: I am generating every 35-letter (nucleotide) kmer for a set of multiple genes. Each gene has an ID. The hash looks like this:
kmers = { 'A...G' => [1, 5, 3], 'G...T' => [4, 9, 9, 3] }
So the hash key is the kmer, and the value is an array containing IDs for the gene(s)/string(s) that have that kmer.
I am querying the hash for kmers in another dataset to quickly find matching genes. I don't care what the hash keys are, I just need to get the array of numbers from a kmer.
>> kmers['A...G']
=> [1, 5, 3]
>> kmers.keys.first
=> "Sorry Dave, I can't do that"
I guess you want a set, although it stores unique keys and no values. It has the fast lookup time of a hash.
Set is included in the standard library.
require 'set'
s = Set.new
s << 'aaa'
p s.merge(['ccc', 'ddd']) #=> #<Set: {"aaa", "ccc", "ddd"}>
Even if there was an oddball hash that just recorded existence (which is how I understand the question) you probably wouldn't want to use it, as the built-in Hash would be simpler, faster, not require a gem, etc. So just set...
h[k] = k
...and call it a day...
I assume the 5 gb string is a genome, and the kmers are 35 base pair nucleotide sequences.
What I'd probably do (slightly simplified) is:
require 'set'
human_genome = File.read("human_genome.txt")
human_kmers = Set.new
# String has no each_cons, so walk the characters and join each window of 35
human_genome.each_char.each_cons(35) do |chars|
  human_kmers << chars.join # adding a duplicate to a Set has no effect
end
unknown_gene = File.read("unknown_gene.txt")
related_to_humans = unknown_gene.each_char.each_cons(35).any? do |chars|
  human_kmers.include?(chars.join)
end
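For reference, the kmer-to-gene-IDs index described in the question can be built by slicing each sequence directly. This is a toy sketch: the gene IDs, the sequences, and the kmer length of 3 are assumptions chosen to keep the example small (the question uses 35).

```ruby
k = 3 # toy kmer length

# Hypothetical gene IDs mapped to toy sequences.
genes = { 1 => 'ACGTAC', 2 => 'GTACGG' }

# kmer => array of IDs of the genes that contain it.
kmers = Hash.new { |hash, key| hash[key] = [] }
genes.each do |id, sequence|
  (0..sequence.length - k).each do |start|
    kmers[sequence[start, k]] << id
  end
end

p kmers['GTA']           # [1, 2]
p kmers['CGT']           # [1]
# fetch avoids creating an empty entry for a kmer that was never indexed.
p kmers.fetch('TTT', []) # []
```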
I have to make a giant hash but I don't care what the keys are.
That is called an array. Just use an array. A hash without keys is not a hash at all and loses its value. If you don't need key-value lookup then you don't need a hash.
Use an Array. An Array indexes by integers instead of keys. http://www.ruby-doc.org/core/classes/Array.html
a = []
a << "hello"
p a #=> ["hello"]
