Performance - Ruby - Compare large array of hashes (dictionary) to primary hash; update resulting value - ruby

I'm attempting to compare my data, which is in the format of an array of hashes, with another large array of hashes (~50K server names and tags) which serves as a dictionary. The dictionary is stripped down to only include the absolutely relevant information.
The code I have works but it is quite slow on this scale and I haven't been able to pinpoint why. I've done verbose printing to isolate the issue to a specific statement (tagged via comments below)--when it is commented out, the code runs ~30x faster.
After reviewing the code extensively, I feel like I'm doing something wrong and perhaps Array#select is not the appropriate method for this task. Thank you so much in advance for your help.
Code:
inventory = File.read('inventory_with_50k_names_and_associate_tag.csv')
# Since my CSV is headerless, I'm forcing manual headers
#dictionary_data = CSV.parse(inventory).map do |name|
Hash[ [:name, :tag].zip(name) ]
end
# ...
# API calls to my app to return an array of hashes is not shown (returns '#app_data')
# ...
#app_data.each do |issue|
# Extract base server name from FQDN (e.g. server_name1.sub.uk => server_name1)
derived_name = issue['name'].split('.').first
# THIS IS THE BLOCK OF CODE that slows down execution 30 fold:
#dictionary_data.select do |src_server|
issue['tag'] = src_server[:tag] if src_server[:asset_name].start_with?(derived_name)
end
end
Sample Data Returned from REST API (#app_data):
#app_data = [{'name' => 'server_name1.sub.emea', 'tag' => 'Europe', 'state' => 'Online'}
{'name' => 'server_name2.sub.us', 'tag' => 'US E.', 'state' => 'Online'}
{'name' => 'server_name3.sub.us', 'tag' => 'US W.', 'state' => 'Failover'}]
Sample Dictionary Hash Content:
#dictionary_data = [{:asset_name => 'server_name1-X98765432', :tag => 'Paris, France'}
{:asset_name => 'server_name2-Y45678920', :tag => 'New York, USA'}
{:asset_name => 'server_name3-Z34534224', :tag => 'Portland, USA'}]
Desired Output:
#app_data = [{'name' => 'server_name1', 'tag' => 'Paris, France', 'state' => 'Up'}
{'name' => 'server_name2', 'tag' => 'New York, USA', 'state' => 'Up'}
{'name' => 'server_name3', 'tag' => 'Portland, USA', 'state' => 'F.O'}]

Assuming "no" on both of my questions in the comments:
#!/usr/bin/env ruby
require 'csv'
#dictionary_data = CSV.open('dict_data.csv') { |csv|
Hash[csv.map { |name, tag| [name[/^.+(?=-\w+$)/], tag] }]
}
#app_data = [{'name' => 'server_name1.sub.emea', 'tag' => 'Europe', 'state' => 'Online'},
{'name' => 'server_name2.sub.us', 'tag' => 'US E.', 'state' => 'Online'},
{'name' => 'server_name3.sub.us', 'tag' => 'US W.', 'state' => 'Failover'}]
STATE_MAP = {
'Online' => 'Up',
'Failover' => 'F.O.'
}
#app_data = #app_data.map do |server|
name = server['name'][/^[^.]+/]
{
'name' => name,
'tag' => #dictionary_data[name],
'state' => STATE_MAP[server['state']],
}
end
p #app_data
# => [{"name"=>"server_name1", "tag"=>"Paris, France", "state"=>"Up"},
# {"name"=>"server_name2", "tag"=>"New York, USA", "state"=>"Up"},
# {"name"=>"server_name3", "tag"=>"Portland, USA", "state"=>"F.O."}]
EDIT: I find it more convenient here to read the CSV without headers, as I don't want it to generate an array of hashes. But to read a headerless CSV as if it had headers, you don't need to touch the data itself, as Ruby's CSV is quite powerful:
CSV.read('dict_data.csv', headers: %i(name tag)).map(&:to_hash)

Related

Elegantly return value(s) matching criteria from an array of nested hashes - in one line

I have been searching for a solution to this issue for a couple of days now, and I'm hoping someone can help out. Given this data structure:
'foo' => {
'bar' => [
{
'baz' => {'faz' => '1.2.3'},
'name' => 'name1'
},
{
'baz' => {'faz' => '4.5.6'},
'name' => 'name2'
},
{
'baz' => {'faz' => '7.8.9'},
'name' => 'name3'
}
]
}
I need to find the value of 'faz' that begins with a '4.', without using each. I have to use the '4.' value as a key for a hash I will create while looping over 'bar' (which obviously I can't do if I don't yet know the value of '4.'), and I don't want to loop twice.
Ideally, there would be an elegant one-line solution to return the value '4.5.6' to me.
I found this article, but it doesn't address the full complexity of this data structure, and the only answer given for it is too verbose; the looping-twice solution is more readable. I'm using Ruby 2.3 on Rails 4 and don't have the ability to upgrade. Are there any Ruby gurus out there who can guide me?
You can use select to filter results.
data = {'foo' => {'bar' => [{'baz' => {'faz' => '1.2.3'}, 'name' => 'name1'}, {'baz' => {'faz' => '4.5.6'}, 'name' => 'name2'}, {'baz' => {'faz' => '7.8.9'}, 'name' => 'name3'}]}}
data.dig('foo', 'bar').select { |obj| obj.dig('baz', 'faz').slice(0) == '4' }
#=> [{"baz"=>{"faz"=>"4.5.6"}, "name"=>"name2"}]
# or if you prefer the square bracket style
data['foo']['bar'].select { |obj| obj['baz']['faz'][0] == '4' }
The answer assumes that every element inside the bar array has the nested attributes baz -> faz.
If you only expect one result you can use find instead.

How to do mongoid 'not_in' 'greater than' query

If I want to search a mongoid model with attribute greater than 100 I would do this.
Model.where({'price' => {'$gt' => 100}})
How do I do search a mongoid model without attribute greater than 100?
Tried this and failed.
Model.not_in({'price' => [{'$gt' => 100}]})
Additional info:
In the end of the day would like to make a query like so:
criteria = {
'price' => [{'$gt' => 100}],
'size' => 'large',
'brand' => 'xyz'
}
Model.not_in(criteria)
As the criteria would be dynamically created.
model without attribute greater than 100 = model with attribute less than or equal to 100?
Model.where({'price' => {'$lte' => 100}})
Try this
Model.where(:price.lte => 100,:size.ne => 'large',:brand.ne => 'xzy')
Try using the .ne() (not equals) operator
Model.where({:price.lte => 100}).ne({:size => 'large', :brand => 'xzy'})
You can also find the Mongoid documentation here http://mongoid.org/en/origin/docs/selection.html#negation

rake import- only adding one line from csv to database

I am attempting to import a CSV file into my rails database (SQLite in Development) following this tutorial. Data is actually getting inserted into my database but it seems to only insert the first record from the CSV File. the rake seems to run without problem. and a running it with --trace reveals no additional information.
require 'csv'
desc "Import Voters from CSV File"
task :import => [:environment] do
file = "db/GOTV.csv"
CSV.foreach(file, :headers => false) do |row|
Voter.create({
:last_name => row[0],
:first_name => row[1],
:middle_name => row[2],
:name_suffix => row[3],
:primary_address => row[4],
:primary_city => row[5],
:primary_state => row[6],
:primary_zip => row[7],
:primary_zip4 => row[8],
:primary_unit => row[9],
:primary_unit_number => row[10],
:phone_number => row[11],
:phone_code => row[12],
:gender => row[13],
:party_code => row[14],
:voter_score => row[15],
:congressional_district => row[16],
:house_district => row[17],
:senate_district => row[18],
:county_name => row[19],
:voter_key => row[20],
:household_id => row[21],
:client_id => row[22],
:state_voter_id => row[23]
})
end
end
Just ran into this as well - guess you solved it some other way, but still might be useful for others.
In my case, the issue seems to be an incompatible change in the CSV library.
I guess you were using Ruby 1.8, where
CSV.foreach(path, rs = nil, &block)
The docs here are severely lacking, actually no docs at all, so have to guess from source: http://ruby-doc.org/stdlib-1.8.7/libdoc/csv/rdoc/CSV.html#method-c-foreach..
Anyway, 'rs' is clearly not an option hash, it looks like the record separator.
In Ruby 1.9 this is nicer: http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html#method-c-foreach
self.foreach(path, options = Hash.new, &block)
so this is the one that supports options such as :headers..

how do i loop an array in ruby?

Hi i just started learning ruby and i'm trying to loop through an array i created looking like this:
server[0] = ['hostname' => 'unknown01', 'ip' => '192.168.0.2', 'port' => '22']
server[1] = ['hostname' => 'unknown02', 'ip' => '192.168.0.3', 'port' => '23']
server[2] = ['hostname' => 'unknown03', 'ip' => '192.168.0.4', 'port' => '24']
i tried using this code:
i=0
server[i].each do |x|
print x['hostname']
print x['ip']
i+=1
end
but it only loops through server[0] how i can loop through server[0-3]
You don't need to use i at all, just do this:
server.each do |x|
print x['hostname']
print x['ip']
end
What you're doing is setting i to 0, so when you get to the server[i].each, you're calling server[0].each. Even though you increment i that doesn't change what you're enumerating over. The correct code is:
server.each
to enumerate over every element in server.
This probably isn't doing what you think it is. Since you're putting your key/value pairs inside of square brackets, it winds up being an array of one hash (other languages call them maps and dictionaries). Probably you want to exchange your square brackets with curly brackets like this:
servers = []
servers[0] = {'hostname' => 'unknown01', 'ip' => '192.168.0.2', 'port' => '22'}
servers[1] = {'hostname' => 'unknown02', 'ip' => '192.168.0.3', 'port' => '23'}
servers[2] = {'hostname' => 'unknown03', 'ip' => '192.168.0.4', 'port' => '24'}
But since you're just setting each one by its index, you don't need to build it up this way, you can just place them in their respective positions within the array.
servers = [
{'hostname' => 'unknown01', 'ip' => '192.168.0.2', 'port' => '22'},
{'hostname' => 'unknown02', 'ip' => '192.168.0.3', 'port' => '23'},
{'hostname' => 'unknown03', 'ip' => '192.168.0.4', 'port' => '24'},
]
Then to iterate over each one you can do:
servers.each do |server|
puts server['hostname']
puts server['ip']
puts
end

Ruby: Create hash with default keys + values of an array

I believe this has been asked/answered before in a slightly different context, and I've seen answers to some examples somewhat similar to this - but nothing seems to exactly fit.
I have an array of email addresses:
#emails = ["test#test.com", "test2#test2.com"]
I want to create a hash out of this array, but it must look like this:
input_data = {:id => "#{id}", :session => "#{session}",
:newPropValues => [{:key => "OWNER_EMAILS", :value => "test#test.com"} ,
{:key => "OWNER_EMAILS", :value => "test2#test2.com"}]
I think the Array of Hash inside of the hash is throwing me off. But I've played around with inject, update, merge, collect, map and have had no luck generating this type of dynamic hash that needs to be created based on how many entries in the #emails Array.
Does anyone have any suggestions on how to pull this off?
So basically your question is like this:
having this array:
emails = ["test#test.com", "test2#test2.com", ....]
You want an array of hashes like this:
output = [{:key => "OWNER_EMAILS", :value => "test#test.com"},{:key => "OWNER_EMAILS", :value => "test2#test2.com"}, ...]
One solution would be:
emails.inject([]){|result,email| result << {:key => "OWNER_EMAILS", :value => email} }
Update: of course we can do it this way:
emails.map {|email| {:key => "OWNER_EMAILS", :value => email} }

Resources