I'm trying to extract data from MongoDB into Elasticsearch. getMongodoc = coll.find().limit(10) finds the first 10 entries in Mongo.
As you can see, result = ec.mongoConn should get its result from the method mongoConn() in class MongoConnector. When I use p hsh (to check that the output is correct), it prints 10 entries, while p result = ec.mongoConn prints #<Enumerator: #<Mongo::Cursor:0x70284070232580 @view=#<Mongo::Collection::View:0x70284066032180 namespace='mydatabase.mycollection' @filter={} @options={"limit"=>10}>>:each>
When I change p hsh to return hsh, p result = ec.mongoConn gets a correct result, but it prints only the first entry, not all 10. It seems that the value of hsh is not passed to result = ec.mongoConn correctly. Can anyone tell me what I am doing wrong? Is this because of the way I call the method?
class MongoConnector
  def mongoConn()
    BSON::OrderedHash.new
    client = Mongo::Client.new([ 'xx.xx.xx.xx:27017' ], :database => 'mydatabase')
    coll = client[:mycollection]
    getMongodoc = coll.find().limit(10)
    getMongodoc.each do |document|
      hsh = symbolize_keys(document.to_hash).select { |hsh| hsh != :_id }
      return hsh
      # p hsh
    end
  end
end

class ElasticConnector < MongoConnector
  include Elasticsearch::API
  CONNECTION = ::Faraday::Connection.new url: 'http://localhost:9200'
  def perform_request(method, path, params, body)
    puts "--> #{method.upcase} #{path} #{params} #{body}"
    CONNECTION.run_request \
      method.downcase.to_sym,
      path,
      (body ? MultiJson.dump(body) : nil),
      {'Content-Type' => 'application/json'}
  end
end

ec = ElasticConnector.new
p result = ec.mongoConn
client = ElasticConnector.new
client.bulk index: 'myindex',
            type: 'test',
            body: result
You are calling return inside a loop (each). This will stop the loop and return the first result. Try something like:
getMongodoc.map do |document|
symbolize_keys(document.to_hash).select { |hsh| hsh != :_id }
end
Notes:
In Ruby you usually don't need the return keyword, as the value of the last evaluated expression is returned automatically. Usually you'd use return to leave a method early and prevent the remaining code from being executed.
In Ruby, snake_case is used for variable and method names (as opposed to CamelCase or camelCase).
map enumerates a collection (by calling the block for every item in the collection) and returns a new array of the same size containing the return values of the block.
You don't need empty parens () on method definitions.
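To illustrate the difference between returning from inside each and using map, here is a minimal sketch. It uses plain hashes with string keys in place of real Mongo documents; the sample data and the method names clean, first_only and all_docs are made up for illustration, and transform_keys (Ruby 2.5+) stands in for the question's symbolize_keys helper.

```ruby
# Stand-in for the documents a Mongo cursor would yield (hypothetical data).
DOCS = [
  { '_id' => 1, 'response' => 'a' },
  { '_id' => 2, 'response' => 'b' },
  { '_id' => 3, 'response' => 'c' }
].freeze

# Symbolize the keys and drop :_id, as the question does.
def clean(document)
  document.transform_keys(&:to_sym).reject { |key, _value| key == :_id }
end

# `return` inside `each` leaves the whole method during the first iteration.
def first_only
  DOCS.each do |document|
    return clean(document)
  end
end

# `map` collects the block's value for every document.
def all_docs
  DOCS.map { |document| clean(document) }
end

p first_only # a single hash
p all_docs   # an array with one hash per document
```

Without the return, each would hand back the collection it iterated over, which is why the original code printed an Enumerator-like object rather than the hashes.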
UPDATE:
The data structure returned by MongoDB is a Hash (BSON is a special kind of serialization). A Hash is a collection of keys ("_id", "response") that point to values. The difference you point out in your comment is the class of the hash key: string vs. symbol
In your case a document in Mongo is represented as Hash, one hash per document
If you want to return multiple documents, then an array is required. More specifically an array of hashes: [{}, {}, ...]
If your target (ES) only accepts one hash at a time, then you will need to loop over the results from Mongo and add them one by one:
list_of_results = get_mongo_data
list_of_results.each do |result|
add_result_to_es(result)
end
I'm generating some local variables in a method. I would like to find a way of returning them in a hash with keys and values, but the generation of that very hash ends up in the return itself. How can I avoid this?
def get_map_libs()
libjq = JSON.parse(File.read(URI.open('https://api.cdnjs.com/libraries/jquery', 'r')))['latest']
liblf = JSON.parse(File.read(URI.open('https://api.cdnjs.com/libraries/leaflet', 'r')))['latest']
libbs = JSON.parse(File.read(URI.open('https://api.cdnjs.com/libraries/twitter-bootstrap', 'r')))['latest']
libfa = JSON.parse(File.read(URI.open('https://api.cdnjs.com/libraries/font-awesome', 'r')))['latest']
return {libjq: libjq, liblf: liblf, libbs: libbs, libfa: libfa}
# return local_variables
end
This method works as expected. There should be a way of grabbing local_variables and returning it. However, when I use that commented-out return (return local_variables), it only returns their keys:
[
[0] :libjq,
[1] :liblf,
[2] :libbs,
[3] :libfa
]
I tried building a return hash r = {} and populating it, however that very hash also shows up in the return. I tried deleting it, but that throws an error when I try to delete itself in itself.
Can this be done or do I have to hard code it like above?
The documentation of local_variables states that it only returns an array with the names of the local variables.
But you could use that method to generate the hash:
local_variables.map { |name| [name, eval(name.to_s)] }.to_h
I think that is a bit error-prone because you might return unexpected variables and their values.
Perhaps it would be better to refactor your method to something like this:
LIBRARIES = {
libjq: 'jquery',
liblf: 'leaflet',
libbs: 'twitter-bootstrap',
libfa: 'font-awesome'
}
def library_urls
LIBRARIES.map { |k, v|
[k, JSON.parse(URI.open("https://api.cdnjs.com/libraries/#{v}").read)['latest']]
}.to_h
end
Well, the problem is that you are assigning the result hash to a local variable, so if you want to return a list of local variables, then of course that variable will be included.
The two simplest solutions I can think of would be:
Filter out the name of that local variable.
Just don't assign it. I.e. where you have r = something, just return something without assigning it to r in the first place. Something like this:
def get_map_libs
libjq = :jq
liblf = :lf
libbs = :bs
libfa = :fa
local_variables.map {|var| [var, binding.local_variable_get(var)] }.to_h
end
get_map_libs
#=> { :libjq => :jq, :liblf => :lf, :libbs => :bs, :libfa => :fa }
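The first option (filtering out the name of the result variable) can be sketched as follows. The variable name result and the placeholder symbol values are only for illustration. Note that local_variables lists every local known in the method, including result itself, which is why it has to be subtracted.

```ruby
def get_map_libs
  libjq = :jq
  liblf = :lf
  # `result` is a local variable too, so remove it from the list by name
  # before looking up the values.
  result = (local_variables - [:result]).map do |name|
    [name, binding.local_variable_get(name)]
  end.to_h
  result
end

p get_map_libs
```

This stays fragile in the same way as the eval variant: any helper variable added to the method later will silently show up in the returned hash unless it is filtered as well.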
Background
I have a collection of nested hashes which present a set of parameters to define application behavior:
custom_demo_options: {
verticals: {
fashion: true,
automotive: false,
fsi: false
},
channels: {
b2b: true,
b2c: true
}
}
website_data: {
verticals: {
fashion: {
b2b: {
code: 'luma_b2b',
url: 'b2b.luma.com'
},
b2c: {
code: 'base',
url: 'luma.com'
}
}
}
}
The choices made in the custom_demo_options hash relate to data stored in the website_data hash and serve to return values from it:
data = []
collection = {}
custom_demo_options[:verticlas].each do |vertical_name, vertical_choice|
# Get each vertical selection
if vertical_choice == true
# Loop through the channels for each selected vertical
custom_demo_options[:channels].each do |channel_name, channel_choice|
# Get each channel selection for each vertical selection
if channel_choice == true
# Loop through the website data for each vertical/channel selection
website_data[:verticals].each do |site_vertical, vertical_data|
# Look at the keys of the [:website_data][:verticals] hash
# If we have a vertical selection that matches a website_data vertical...
if site_vertical == vertical_name
# For each website_data vertical collection...
vertical_data.each do |vertical_channel, channel_value|
# If we have a matching channel in the collection...
if vertical_channel == channel_name
# Add the channel's url and code to the collection hash
collection[:url] = channel_value[:url]
collection[:code] = channel_value[:code]
# Push the collection hash(es) onto the data array
data.push(collection)
end
end
end
end
end
end
end
end
The data pushed to the data array is ultimately used to create the following nginx map definition:
map $http_host $MAGE_RUN_CODE {
luma.com base;
b2b.luma.com luma_b2b;
}
As an example of the relationship between the hashes, if a user sets custom_demo_options[:channels][:b2b] to false, the b2b code/url pair stored in the website_data hash would be removed from the nginx block:
map $http_host $MAGE_RUN_CODE {
luma.com base;
}
Question
The above code works, but I know it's horribly inefficient. I'm relatively new to ruby, but I think this is most likely a logical challenge rather than a language-specific one.
My question is, what is the proper way to connect these hashes rather than using loops as I've done? I've done some reading on hash.select and it seems like this might be the best route, but I'd like to know: are there other approaches I should consider that would optimize this operation?
UPDATE
I've been able to implement the first suggestion (thanks again to the poster); however, I think the second solution will be a better approach. Everything works as described; however, my data structure has changed slightly, and although I understand what the solution is doing, I'm having trouble adapting accordingly. Here's the new structure:
custom_demo_options = {
verticals: {
fashion: true,
automotive: false,
fsi: false
},
channels: {
b2b: true,
b2c: true
},
geos: [
'us_en'
]
}
website_data = {
verticals: {
fashion: {
us_en: {
b2b: {
code: 'luma_b2b',
url: 'b2b.luma.com'
},
b2c: {
code: 'base',
url: 'luma.com'
}
}
}
}
}
So, I've added another level to the hashes: the geo (for example :us_en).
I've tried to adapt the second solution as follows:
class CustomOptionsMap
  attr_accessor :custom_options, :website_data
  def initialize(custom_options, website_data)
    @custom_options = custom_options
    @website_data = website_data[:verticals]
  end
  def data
    verticals = selected_verticals
    channels = selected_channels
    geos = selected_geos
    # I know this is the piece I'm not understanding. How to map channels and geos accordingly.
    verticals.map{ |vertical| @website_data.fetch(vertical).slice(*channels) }
  end
  private
  def selected_geos
    @custom_options[:geos].select{|_,v| v } # I think this is correct, as it extracts the geo from the array and we don't have additional keys
  end
  def selected_verticals
    @custom_options[:verticals].select{|_,v| v }.keys
  end
  def selected_channels
    @custom_options[:channels].select{|_,v| v }.keys
  end
end
demo_configuration = CustomOptionsMap.new(custom_demo_options, website_data)
print demo_configuration.data
Any guidance on what I'm missing regarding the map statement would be very much appreciated.
Object-oriented approach.
Using OOP might be more readable and consistent in this context, as Ruby is an object-oriented language.
By introducing a simple Ruby class and using the activesupport gem, which extends Hash with some useful methods, the same result can be achieved in the following way:
class WebsiteConifg
  attr_accessor :custom_options, :website_data
  def initialize(custom_options, website_data)
    @custom_options = custom_options
    @website_data = website_data[:verticals]
  end
  def data
    verticals = selected_verticals
    channels = selected_channels
    verticals.map{ |vertical| @website_data.fetch(vertical).slice(*channels) }
  end
  private
  def selected_verticals
    @custom_options[:verticals].select{|_,v| v }.keys
  end
  def selected_channels
    @custom_options[:channels].select{|_,v| v }.keys
  end
end
Based on the passed custom_demo_options, we can select only those verticals and channels whose values are set to true.
For your configuration this will return:
selected_verticals # [:fashion]
selected_channels # [:b2b, :b2c]
+data()
This simple public interface iterates through all selected verticals based on the passed options and returns an Array of hashes for the given channels by using slice(keys).
fetch(key)
returns the value for the given key. It is similar to h[:key], except that it raises a KeyError when the key is missing.
h = {a: 2, b: 3}
h.fetch(:a) # 2
h.fetch(:b) # 3
slice(key1, key2) requires activesupport on Ruby versions before 2.5 (Hash#slice has been part of core Ruby since 2.5)
returns a hash containing only the keys passed as arguments. The method accepts multiple arguments; since in our example we have an Array of keys, we can use the * splat operator to comply with this interface.
h = {a: 2, b: 3}
h.slice(:a) # {:a=>2}
h.slice(:a, :b) # {:a=>2, :b=>3}
h.slice(*[:a, :b]) # {:a=>2, :b=>3}
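The behavior for missing keys is where fetch, [] and slice differ most. A short sketch, assuming Ruby 2.5 or later so that Hash#slice is available without activesupport (the key :missing is just an example):

```ruby
h = { a: 2, b: 3 }

p h[:missing]                           # [] returns nil for an unknown key
p h.fetch(:missing, 0)                  # fetch can fall back to a default value
p h.fetch(:missing) { |k| "no #{k}" }   # ... or to the result of a block

begin
  h.fetch(:missing)                     # without a fallback, fetch raises
rescue KeyError => e
  puts "KeyError: #{e.message}"
end

p h.slice(:a, :missing)                 # slice silently skips unknown keys
```

In the class above this means that a selected vertical without website data fails loudly in fetch, while a selected channel without data is simply left out by slice.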
Usage
website_config = WebsiteConifg.new(custom_demo_options, website_data)
website_config.data
# returns
# [{:b2b=>{:code=>"luma_b2b", :url=>"b2b.luma.com"}, :b2c=>{:code=>"base", :url=>"luma.com"}}]
UPDATE
Changed relevant parts:
def data
  verticals = selected_verticals
  channels = selected_channels
  geos = selected_geos
  verticals.map do |vertical|
    verticals_data = @website_data.fetch(vertical)
    # in case of multiple geolocations
    # collecting relevant entries of all of them
    geos_data = geos.map{|geo| verticals_data.fetch(geo) }
    # for each geo-location getting selected channels
    geos_data.map {|geo_data| geo_data.slice(*channels) }
  end.flatten
end
private
# as the `website_data' hash is using symbols, we need to convert string->sym
def selected_geos
  @custom_options[:geos].map(&:to_sym)
end
def selected_verticals
  selected_for(:verticals).keys
end
def selected_channels
  selected_for(:channels).keys
end
def selected_for(key)
  @custom_options[key].select{|_,v| v }
end
The easiest way to understand what kind of output (data) you have at each step of the each/map iterators is to place a debugger there, such as pry or byebug.
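To check the geo-aware data method against the new structure from the question, the pieces can be assembled into one script. This is a sketch under a few assumptions: the class name GeoWebsiteConfig is made up, flat_map is used in place of map followed by flatten, every selected vertical and geo is assumed to exist in website_data (fetch raises KeyError otherwise), and Ruby 2.5+ is assumed for the built-in Hash#slice.

```ruby
class GeoWebsiteConfig
  def initialize(custom_options, website_data)
    @custom_options = custom_options
    @website_data = website_data[:verticals]
  end

  # One hash of selected channels per (vertical, geo) combination.
  def data
    selected_verticals.flat_map do |vertical|
      vertical_data = @website_data.fetch(vertical)
      selected_geos.map { |geo| vertical_data.fetch(geo).slice(*selected_channels) }
    end
  end

  private

  # The geos arrive as strings, while website_data uses symbol keys.
  def selected_geos
    @custom_options[:geos].map(&:to_sym)
  end

  def selected_verticals
    selected_for(:verticals).keys
  end

  def selected_channels
    selected_for(:channels).keys
  end

  def selected_for(key)
    @custom_options[key].select { |_, chosen| chosen }
  end
end

custom_demo_options = {
  verticals: { fashion: true, automotive: false, fsi: false },
  channels: { b2b: true, b2c: true },
  geos: ['us_en']
}
website_data = {
  verticals: {
    fashion: {
      us_en: {
        b2b: { code: 'luma_b2b', url: 'b2b.luma.com' },
        b2c: { code: 'base', url: 'luma.com' }
      }
    }
  }
}

p GeoWebsiteConfig.new(custom_demo_options, website_data).data
```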
Say you have key = :foo and hsh = { foo: 1, bar: 2 } - you want to know the hash's value for that key.
The approach you're using here is essentially
result = nil
hsh.each { |k,v| result = v if k == :foo }
But why do that when you can simply say
result = hsh[:foo]
It seems like you understand how hashes can be iterable structures, and you can traverse them like arrays. But you're overdoing it, and forgetting that hashes are indexed structures. In terms of your code I would refactor it like so:
# fixed typo here: verticlas => verticals
custom_demo_options[:verticals].each do |vertical_name, vertical_choice|
# == true is almost always unnecessary, just use a truthiness check
next unless vertical_choice
custom_demo_options[:channels].each do |channel_name, channel_choice|
next unless channel_choice
vertical_data = website_data[:verticals][vertical_name]
channel_value = vertical_data[channel_name]
# This must be initialized here:
collection = {}
collection[:url] = channel_value[:url]
collection[:code] = channel_value[:code]
data.push(collection)
end
end
You can see that a lot of the nesting and complexity is removed. Note that I am initializing collection at the point where attributes are added to it. This is a little too much to go into here, but I highly advise reading up on mutability in Ruby. Your current code will probably not do what you expect it to, because you're pushing the same collection hash into the array multiple times.
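The mutability issue can be reproduced in isolation. In this sketch the URL list and the method names are made up for illustration; only the pattern (one hash reused across iterations versus a fresh hash per iteration) mirrors the code discussed above.

```ruby
URLS = %w[luma.com b2b.luma.com].freeze

# One hash created outside the loop: every array slot references the same object.
def with_shared_hash
  collection = {}
  data = []
  URLS.each do |url|
    collection[:url] = url # overwrites the value seen through every slot
    data.push(collection)
  end
  data
end

# A new hash per iteration keeps the entries independent.
def with_fresh_hash
  URLS.map { |url| { url: url } }
end

p with_shared_hash.map { |h| h[:url] } # every entry shows the last URL written
p with_fresh_hash.map { |h| h[:url] }  # each entry keeps its own URL
```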
At this point, you could refactor it into a more functional-programming style, with some chained methods, but I'd leave that exercise for you.
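For readers who want a starting point for that exercise, one possible shape is sketched here. The helper names selected_keys, site_entries and nginx_map are invented for this example, and it assumes the original two-level website_data layout (vertical, then channel). Verticals or channels without website data are skipped rather than treated as errors.

```ruby
# Keys of an options hash whose value is truthy.
def selected_keys(options)
  options.select { |_name, chosen| chosen }.keys
end

# All { code:, url: } hashes for the selected vertical/channel combinations.
def site_entries(custom_demo_options, website_data)
  verticals = selected_keys(custom_demo_options[:verticals])
  channels  = selected_keys(custom_demo_options[:channels])

  verticals.flat_map do |vertical|
    # fetch with a default tolerates verticals that have no website data;
    # compact drops channels that are missing for this vertical.
    website_data[:verticals].fetch(vertical, {}).values_at(*channels).compact
  end
end

# Render the entries as an nginx map block.
def nginx_map(entries)
  lines = entries.map { |entry| "  #{entry[:url]} #{entry[:code]};" }
  (['map $http_host $MAGE_RUN_CODE {'] + lines + ['}']).join("\n")
end

custom_demo_options = {
  verticals: { fashion: true, automotive: false, fsi: false },
  channels: { b2b: true, b2c: true }
}
website_data = {
  verticals: {
    fashion: {
      b2b: { code: 'luma_b2b', url: 'b2b.luma.com' },
      b2c: { code: 'base', url: 'luma.com' }
    }
  }
}

puts nginx_map(site_entries(custom_demo_options, website_data))
```

Because the entries are taken straight from website_data and never mutated, the shared-hash problem does not arise in this version.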
I am trying to parse and store some data from a file into a hash map, using string comparison rather than regular expressions, and I am getting errors that I have tried to fix without success.
The file has a structure like:
"key" + "colon" + "value"
in every line. This structure is repeated throughout the file. Every entry has an ID key, almost every entry has at least one "is_a" key, and an entry may also have "is_obsolete" and "replaced_by" keys.
I'm trying to parse it like this:
def get_hpo_data(hpofile="hp.obo")
hpo_data = Hash.new() #Hash map where i want to store all IDs
File.readlines(hpofile).each do |line|
if line.start_with? "id:" #if line is an ID
hpo_id = line[4..13] #Store ID value
hpo_data[hpo_id] = Hash.new() #Setting up hash map for that ID
hpo_data[hpo_id]["parents"] = Array.new()
elsif line.start_with? "is_obsolete:" #If the ID is obsolete
hpo_data[hpo_id]["is_obsolete"] = true #store value in the hash
elsif line.start_with? "replaced_by:" #If the ID is obsolete
hpo_data[hpo_id]["replaced_by"] = line[13..22]
#Store the ID term it was replaced by
elsif line.start_with? "is_a:" #If the ID has a parent ID
hpo_data[hpo_id]["parents"].push(line[6..15])
#Store the parent(s) in the array initialized before
end
end
return hpo_data
end
The structure I was expecting to be created is a global hash in which every ID is itself a hash with its different data (one string, one boolean and an array whose length depends on the number of parent IDs of that ID term), but I'm getting the following error:
table_combination.rb:224:in `block in get_hpo_data': undefined method `[]=' for nil:NilClass (NoMethodError)
This time the error points to the replaced_by elsif statement, but I also get it with any of the other elsif statements, so the code fails when parsing the "is_obsolete", "replaced_by" and "is_a" properties. If I delete these statements, the code successfully creates the global hash with every ID term as a hash.
I also tried giving default values for every hash but it does not solve the problem. I'm even getting a new error not seen before:
table_combination.rb:233:in '[]': no implicit conversion of String into Integer (TypeError)
at this line:
hpo_data[hpo_id]["parents"].push(line[6..15])
Here is an example of how the file looks like for two terms showing the different keys I want to take care of:
[Term]
id: HP:0002578
name: Gastroparesis
def: "Decreased strength of the muscle layer of stomach, which leads to a decreased ability to empty the contents of the stomach despite the absence of obstruction." [HPO:probinson]
subset: hposlim_core
synonym: "Delayed gastric emptying" EXACT layperson [ORCID:0000-0001-5208-3432]
xref: MSH:D018589
xref: SNOMEDCT_US:196753007
xref: SNOMEDCT_US:235675006
xref: UMLS:C0152020
is_a: HP:0002577 ! Abnormality of the stomach
is_a: HP:0011804 ! Abnormal muscle physiology
[Term]
id: HP:0002564
name: obsolete Malformation of the heart and great vessels
is_obsolete: true
replaced_by: HP:0030680
There might be more errors hidden in your code, but one problem is indeed that your hpo_data doesn't have default values.
Calling hpo_data[hpo_id]["replaced_by"] = line[13..22] fails if hpo_id hasn't been initialized.
You could define hpo_data like this:
hpo_data = Hash.new { |hash, key| hash[key] = {'parents' => [] } }
and remove
hpo_data = Hash.new() #Hash map where i want to store all IDs
and
hpo_data[hpo_id] = Hash.new() #Setting up hash map for that ID
hpo_data[hpo_id]["parents"] = Array.new()
Any time you call hpo_data[hpo_id] with a key that does not exist yet, it will automatically be set to {"parents"=>[]}.
As an example:
hpo_data = Hash.new { |hash, key| hash[key] = {'parents' => [] } }
# => {}
hpo_data[1234]
# => {"parents"=>[]}
hpo_data[1234]["parents"] << 6
# => [6]
hpo_data
# => {1234=>{"parents"=>[6]}}
hpo_data[42]["is_obsolete"] = true
# => true
hpo_data
# => {1234=>{"parents"=>[6]}, 42=>{"parents"=>[], "is_obsolete"=>true}}
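One of the other errors may be variable scope: in the question, hpo_id is first assigned inside the block, which makes it local to each block call, so it is nil again when the next line is processed. The sketch below works under that assumption and keeps the current ID in a variable declared before the loop. The method name parse_hpo, the current_id variable and the split-based field extraction are illustrative choices, not the asker's code; it still uses plain string operations rather than regular expressions.

```ruby
# `lines` can be any enumerable of strings, e.g. File.foreach("hp.obo").
def parse_hpo(lines)
  hpo_data = Hash.new { |hash, key| hash[key] = { 'parents' => [] } }
  current_id = nil # declared outside the block so it survives between lines

  lines.each do |line|
    key, value = line.chomp.split(': ', 2)
    next if value.nil? # skips "[Term]" and blank lines

    case key
    when 'id'
      current_id = value
      hpo_data[current_id] # touch the key so terms without parents are stored too
    when 'is_obsolete'
      hpo_data[current_id]['is_obsolete'] = (value == 'true')
    when 'replaced_by'
      hpo_data[current_id]['replaced_by'] = value
    when 'is_a'
      # "HP:0002577 ! Abnormality of the stomach" -> "HP:0002577"
      hpo_data[current_id]['parents'] << value.split(' ! ').first
    end
  end

  hpo_data
end
```

Splitting on the first ": " avoids the hard-coded character offsets, so the code keeps working if an ID is ever longer or shorter than ten characters.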
Context and Code Examples
I have an Array with instances of a class called TimesheetEntry.
Here is the constructor for TimesheetEntry:
def initialize(parameters = {})
  @date = parameters.fetch(:date)
  @project_id = parameters.fetch(:project_id)
  @article_id = parameters.fetch(:article_id)
  @hours = parameters.fetch(:hours)
  @comment = parameters.fetch(:comment)
end
I create an array of TimesheetEntry objects with data from a .csv file:
timesheet_entries = []
CSV.parse(source_file, csv_parse_options).each do |row|
timesheet_entries.push(TimesheetEntry.new(
:date => Date.parse(row['Date']),
:project_id => row['Project'].to_i,
:article_id => row['Article'].to_i,
:hours => row['Hours'].gsub(',', '.').to_f,
:comment => row['Comment'].to_s.empty? ? "N/A" : row['Comment']
))
end
I also have a Set of Hashes, each containing two elements, created like this:
all_timesheets = Set.new []
timesheet_entries.each do |entry|
all_timesheets << { 'date' => entry.date, 'entries' => [] }
end
Now, I want to populate the Array inside of that Hash with TimesheetEntries.
Each Hash array must contain only TimesheetEntries of one specific date.
I have done that like this:
timesheet_entries.each do |entry|
all_timesheets.each do |timesheet|
if entry.date == timesheet['date']
timesheet['entries'].push entry
end
end
end
While this approach gets the job done, it's not very efficient (I'm fairly new to this).
Question
What would be a more efficient way of achieving the same end result? In essence, I want to "split" the Array of TimesheetEntry objects, "grouping" objects with the same date.
You can fix the performance problem by replacing the Set with a Hash, which is a dictionary-like data structure.
This means that your inner loop all_timesheets.each do |timesheet| ... if entry.date ... will simply be replaced by a more efficient hash lookup: all_timesheets[entry.date].
Also, there's no need to create the keys in advance and then populate the date groups. These can both be done in one go:
all_timesheets = {}
timesheet_entries.each do |entry|
all_timesheets[entry.date] ||= [] # create the key if it's not already there
all_timesheets[entry.date] << entry
end
A nice thing about hashes is that you can customize their behavior when a non-existing key is encountered. You can use the constructor that takes a block to specify what happens in this case. Let's tell our hash to automatically add new keys and initialize them with an empty array. This allows us to drop the all_timesheets[entry.date] ||= [] line from the above code:
all_timesheets = Hash.new { |hash, key| hash[key] = [] }
timesheet_entries.each do |entry|
all_timesheets[entry.date] << entry
end
There is, however, an even more concise way of achieving this grouping, using the Enumerable#group_by method:
all_timesheets = timesheet_entries.group_by { |e| e.date }
And, of course, there's a way to make this even more concise, using yet another trick:
all_timesheets = timesheet_entries.group_by(&:date)
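A self-contained way to try the grouping is sketched here. SampleEntry is a simplified stand-in for TimesheetEntry with only two attributes, and the per-day total via transform_values (Ruby 2.4+) is an extra illustration that is not part of the original question.

```ruby
require 'date'

# Simplified stand-in for TimesheetEntry (hypothetical, two fields only).
SampleEntry = Struct.new(:date, :hours)

SAMPLE_ENTRIES = [
  SampleEntry.new(Date.new(2020, 1, 6), 2.5),
  SampleEntry.new(Date.new(2020, 1, 7), 8.0),
  SampleEntry.new(Date.new(2020, 1, 6), 1.5)
].freeze

# Hash of date => array of entries for that date.
def entries_by_date(entries)
  entries.group_by(&:date)
end

# Hash of date => total hours, built on top of the grouping.
def hours_per_day(entries)
  entries_by_date(entries).transform_values { |group| group.sum(&:hours) }
end

entries_by_date(SAMPLE_ENTRIES).each do |date, group|
  puts "#{date}: #{group.size} entries"
end
p hours_per_day(SAMPLE_ENTRIES)
```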
What I'd like to do is pass in a hash of hashes that looks something like this:
input = {
"configVersion" => "someVers",
"box" =>
{
"primary" => {
"ip" => "192.168.1.1",
"host" => "something"
},
"api" => {
"live" => "livekey",
"test" => "testkey"
}
}
}
then iterate over it, continuing if the value is another hash, and generating output with it. The result should be something like this:
configVersion = "someVers"
box.primary.ip = "192.168.1.1"
box.primary.host = "something"
and so on...
I know how to crawl through and continue if the value is a hash, but I'm unsure how to concatenate the whole thing together and pass the value back up. Here is my code:
def crawl(input)
  input.each do |k,v|
    case v
    when Hash
      out << "#{k}."
      crawl(v)
    else
      out << " = '#{v}';"
    end
  end
end
My problem is: where to define out and how to return it all back. I'm very new to Ruby.
You can pass strings between multiple calls of the recursive method and use them like accumulators.
This method uses an ancestors string to build up your dot-notation string of keys, and an output str that collects the output and returns it at the end of the method. The str is passed through every call; the chain variable is a modified version of the ancestor string that changes from call to call:
def hash_to_string(hash, ancestors = "", str = "")
hash.each do |key, value|
chain = ancestors.empty? ? key : "#{ancestors}.#{key}"
if value.is_a? Hash
hash_to_string(value, chain, str)
else
str << "#{chain} = \"#{value}\"\n"
end
end
str
end
hash_to_string input
(This assumes you want your output to be a string formatted as you've shown above)
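If you would rather not pass a mutable string through the calls, a variation is to let every call return an array of lines and join them at the end. This is a sketch, not a replacement for the method above; the name flatten_keys is made up, and it assumes the same output format as shown in the question.

```ruby
# Returns one "dotted.key = \"value\"" string per leaf of the nested hash.
def flatten_keys(hash, ancestors = nil)
  hash.flat_map do |key, value|
    chain = [ancestors, key].compact.join('.')
    if value.is_a?(Hash)
      flatten_keys(value, chain) # recurse; flat_map merges the nested results
    else
      ["#{chain} = \"#{value}\""]
    end
  end
end

input = {
  "configVersion" => "someVers",
  "box" => {
    "primary" => { "ip" => "192.168.1.1", "host" => "something" },
    "api" => { "live" => "livekey", "test" => "testkey" }
  }
}

puts flatten_keys(input).join("\n")
```

Returning an array also makes it easy to sort, filter or count the lines before turning them into a single string.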
This blog post has a decent solution for the recursion and offers a slightly better alternative using the method_missing method available in Ruby.
In general, your recursion is correct, you just want to be doing something different instead of concatenating the output to out.