Dynamic Nested Ruby Loops - ruby

So, What I'm trying to do is make calls to a Reporting API to filter by all possible breakdowns (breakdown the reports by site, avertiser, ad type, campaign, etc...). But, one issue is that the breakdowns can be unique to each login.
Example:
user1: alice123's reporting breakdowns are ["site","advertiser","ad_type","campaign","line_items"]
user2: bob789's reporting breakdowns are ["campaign","position","line_items"]
When I first built the code for this reporting API, I only had one login to test with, so I hard coded the loops for the dimensions (["site","advertiser","ad_type","campaign","line_items"]). So what I did was pinged the API for a report by sites. Then for each site, pinged for advertisers, and each advertiser, I pinged for the next dimension and so on..., leaving me with a nested loop of ~6 layers.
basically what I'm doing:
sites = mechanize.get "#{base_ur}/report?dim=sites"
sites = Yajl::Parser.parse(sites.body) # json parser
sites.each do |site|
advertisers = mechanize.get "#{base_ur}/report?site=#{site.fetch("id")}&dim=advertiser"
advertisers = Yajl::Parser.parse(advertisers.body) # json parser
advertisers.each do |advertiser|
ad_types = mechanize.get "#{base_ur}/report?site=#{site.fetch("id")}&advertiser=#{advertiser.fetch("id")}&dim=ad_type"
ad_types = Yajl::Parser.parse(ad_types.body) # json parser
ad_types.each do |ad_type|
...and so on...
end
end
end
GET <api_url>/?dim=<dimension to breakdown>&site=<filter by site id>&advertiser=<filter by advertiser id>...etc...
At the end of the nested loop, I'm left with a report that's broken down as much granularity as possible.
This works now since I only thought that there was one path of breaking down, but apparently each account could have different dimensions breakdowns.
So what I'm asking is if given an array of breakdowns, how can I set up a nested loop to traverse down dynamically do the granularity singularity?
Thanks.

I'm not sure what your JSON/GET returns exactly but for a problem like this you would need recursion.
Something like this perhaps? It's not very elegant and can definitely be optimised further but should hopefully give you an idea.
some_hash = {:id=>"site-id", :body=>{:id=>"advertiser-id", :body=>{:id=>"ad_type-id", :body=>{:id=>"something-id"}}}}
#breakdowns = ["site", "advertiser", "ad_type", "something"]
def recursive(some_hash, str = nil, i = 0)
if #breakdowns[i+1].nil?
str += "#{#breakdowns[i]}=#{some_hash[:id]}"
else
str += "#{#breakdowns[i]}=#{some_hash[:id]}&dim=#{#breakdowns[i + 1]}"
end
p str
some_hash[:body].is_a?(Hash) ? recursive(some_hash[:body], str.gsub(/dim.*/, ''), i + 1) : return
end
recursive(some_hash, 'base-url/report?')
=> "base-url/report?site=site-id&dim=advertiser"
=> "base-url/report?site=site-id&advertiser=advertiser-id&dim=ad_type"
=> "base-url/report?site=site-id&advertiser=advertiser-id&ad_type=ad_type-id&dim=something"
=> "base-url/report?site=site-id&advertiser=advertiser-id&ad_type=ad_type-id&something=something-id"

If you are just looking to map your data, you can recursively map to a hash as another user pointed out. If you are actually looking to do something with this data while within the loop and want to dynamically recreate the loop structure you listed in your question (though I would advise coming up with a different solution), you can use metaprogramming as follows:
require 'active_support/inflector'
# Assume we are given an input of breakdowns
# I put 'testarr' in place of the operations you perform on each local variable
# for brevity and so you can see that the code works.
# You will have to modify to suit your needs
result = []
testarr = [1,2,3]
b = binding
breakdowns.each do |breakdown|
snippet = <<-END
eval("#{breakdown.pluralize} = testarr", b)
eval("#{breakdown.pluralize}", b).each do |#{breakdown}|
END
result << snippet
end
result << "end\n"*breakdowns.length
eval(result.join)
Note: This method is probably frowned upon, and as I've said I'm sure there are other methods of accomplishing what you are trying to do.

Related

Ruby future buffer?

I want to get a bunch a XML and parse them. They are somewhat large.
So I was thinking I could get and parse them in a future like this:(I currently use Celluloid)
country_xml = {}
country_pool = GetAndParseXML.pool size: 4, args: [#connection]
countries.each do |country|
country_xml[country] = country_pool.future.fetch_xml country
end
countries.each do |country|
xml = country_xml[country]
# Do stuff with the XML!
end
This would be fine if it weren't that it takes up a lot of memory before it's actually needed.
Ideally I want it to maybe buffer up 3 XML files stop and wait until at least 1 is processed then continue. How would I do that?
The first question is: what is it that's taking up the memory? I will assume it's the prased XML documents, as that seems most likely to me.
I think the easiest way would be to create an actor that will fetch and process the XML. If you then create a pool of 3 of these actors you will have at most 3 requests being processed at once.
In vague terms (assuming that you aren't using the Celluloid registry):
class DoStuffWithCountryXml
include Celluloid
exclusive :do_stuff_with_country
def initialize(fetcher)
#fetcher = fetcher
end
def do_stuff_with_country(country)
country_xml = fetcher.fetch_xml country
# Do stuff with country_xml
end
end
country_pool = GetAndParseXML.pool size: 4, args: [#connection]
country_process_pool = DoStuffWithCountryXml.pool size: 3, args: [country_pool]
countries_futures = countries.map { |c| country_process_pool.future.do_stuff_with_country(c) }
countries_stuff = countries_futures.map { |f| f.value }
Note that if this is the only place where GetAndParseXML is used then the pool size might as well be the same as the DoStuffWithXmlActor.
I would not use a Pool at all. You're not benefiting from it. A lot of people seem to feel using a Future and a Pool together is a good idea, but it's usually worse than using one or the other.
In your case, use Future ... but you will also benefit from the upcoming Multiplexer features. Until then, do this... use a totally different strategy than has been tried or suggested:
class HandleXML
include Celluloid
def initialize(fetcher)
#fetcher = fetcher
end
def get_xml(country)
#fetcher.fetch_xml(country)
end
def process_xml(country, xml)
#de Do whatever you need to do with the data.
end
end
def begin_processor(handler, countries, index)
data = handler.future.get_xml(countries[index])
index += 1
data
end
limiter = 3 #de This sets your desired limit.
country_index = 0
data_index = 0
data = {}
processing = []
handler = HandleXML.new(#connection)
#de Load up your initial futures.
limiter.times {
processing << begin_processor(handler, countries, country_index)
}
while data_index < countries.length
data[countries[data_index]] = processor.shift.value
handler.process_xml(countries[data_index],data[countries[data_index]])
#de Once you've taken out one XML set above, load up another.
if country_index < countries.length
processing << begin_processor(handler, countries, country_index)
end
end
The above is just an example of how to do it with Future only, handling 3 at a time. I've not run it and it could have errors, but the idea is demonstrated for you.
The code loads up 3 sets of Country XML, then starts processing that XML. Once it has processed one set of XML, it loads up another, until all the country XML is processed.

Ruby // Random number between range, ensure uniqueness against others existing stored ones

Currently trying to generate a random number in a specific range;
and ensure that it would be unique against others stored records.
Using Mysql. Could be like an id, incremented; but can't be it.
Currently testing other existing records in an 'expensive' manner;
but I'm pretty sure that there would be a clean 1/2 lines of code to use
Currently using :
test = 0
Order.all.each do |ord|
test = (0..899999).to_a.sample.to_s.rjust(6, '0')
if Order.find_by_number(test).nil? then
break
end
end
return test
Thanks for any help
Here your are my one-line solution. It is also the quicker one since calls .pluck to retrieve the numbers from the Order table. .select instantiates an "Order" object for every record (that is very costly and unnecessary) while .pluck does not. It also avoids to iterate again each object with a .map to get the "number" field. We can avoid the second .map as well if we convert, using CAST in this case, to a numeric value from the database.
(Array(0...899999) - Order.pluck("CAST('number' AS UNSIGNED)")).sample.to_s.rjust(6, '0')
I would do something like this:
# gets all existing IDs
existing_ids = Order.all.select(:number).map(&:number).map(&:to_i)
# removes them from the acceptable range
available_numbers = (0..899999).to_a - existing_ids
# choose one (which is not in the DB)
available_numbers.sample.to_s.rjust(6, '0')
I think, you can do something like below :
def uniq_num_add(arr)
loop do
rndm = rand(1..15) # I took this range as an example
# random number will be added to the array, when the number will
# not be present
break arr<< "%02d" % rndm unless arr.include?(rndm)
end
end
array = []
3.times do
uniq_num_add(array)
end
array # => ["02", "15", "04"]

refactor this ruby database/sequel gem lookup

I know this code is not optimal, any ideas on how to improve it?
job_and_cost_code_found = false
timberline_db['SELECT Job, Cost_Code FROM [JCM_MASTER__COST_CODE] WHERE [Job] = ? AND [Cost_Code] = ?', job, clean_cost_code].each do |row|
job_and_cost_code_found = true
end
if job_and_cost_code_found == false then
info = linenum + "," + id + ",,Employees default job and cost code do not exist in timberline. job:#{job} cost code:#{clean_cost_code}"
add_to_exception_output_file(info)
end
You're breaking a lot of simple rules here.
Don't select what you don't use.
You select a number of columns, then completely ignore the result data. What you probably want is a count:
SELECT COUNT(*) AS cost_code_count FROM [JCM_MASTER__COST_CODE] WHERE [Job] = ? AND [Cost_Code] = ?'
Then you'll get one row that will have either a zero or non-zero value in it. Save this into a variable like:
job_and_cost_codes_found = timberline_db[...][0]['cost_code_count']
Don't compare against false unless you need to differentiate between that and nil
In Ruby only two things evaluate as false, nil and false. Most of the time you will not be concerned about the difference. On rare occasions you might want to have different logic for set true, set false or not set (nil), and only then would you test so specifically.
However, keep in mind that 0 is not a false value, so you will need to compare against that.
Taking into account the previous optimization, your if could be:
if job_and_cost_codes_found == 0
# ...
end
Don't use then or other bits of redundant syntax
Most Ruby style-guides spurn useless syntax like then, just as they recommend avoiding for and instead use the Enumerable class which is far more flexible.
Manipulate data, not strings
You're assembling some kind of CSV-like line in the end there. Ideally you'd be using the built-in CSV library to do the correct encoding, and libraries like that want data, not a string they'd have to parse.
One step closer to that is this:
line = [
linenum,
id,
nil,
"Employees default job and cost code do not exist in timberline. job:#{job} cost code:#{clean_cost_code}"
].join(',')
add_to_exception_output_file(line)
You'd presumably replace join(',') with the proper CSV encoding method that applies here. The library is more efficient when you can compile all of the data ahead of time into an array-of-arrays, so I'd recommend doing that if this is the end goal.
For example:
lines = [ ]
# ...
if (...)
# Append an array to the lines to write to the CSV file.
lines << [ ... ]
end
Keep your data in a standard structure like an Array, a Hash, or a custom object, until you're prepared to commit it to its final formatted or encoded form. That way you can perform additional operations on it if you need to do things like filtering.
It's hard to refactor this when I'm not exactly sure what it's supposed to be doing, but assuming that you want to log an error when there's no entry matching a job & code pair, here's what I've come up with:
def fetch_by_job_and_cost_code(job, cost_code)
timberline_db['SELECT Job, Cost_Code FROM [JCM_MASTER__COST_CODE] WHERE [Job] = ? AND [Cost_Code] = ?', job, cost_code]
end
if fetch_by_job_and_cost_code(job, clean_cost_code).none?
add_to_exception_output_file "#{linenum},#{id},,Employees default job and cost code do not exist in timberline. job:#{job} cost code:#{clean_cost_code}"
end

Rails: control ordering of query parameters in url_for?

I'd like to generate a URL where the "p=1" query param appears at the end of the URL, like:
/path?foo=X&bar=Y&p=1
Is it possible to control the ordering of query parameters when generating URLs via:
url_for(params.merge({ p: page_num }))
?
Update:
I tried ChuckE's suggestion below. It turns out that in Ruby 1.9 Hashes are already ordered, so the code in ActiveSupport::OrderedHash is effectively no-op'd. You can verify with Ruby 1.9 that order is preserved:
>> h = {one: 1, two: 2, three: 3 }
{:one=>1, :two=>2, :three=>3}
>> f = h.except(:one)
{:two=>2, :three=>3}
>> f[:one] = 1
1
>> f
{:two=>2, :three=>3, :one=>1}
However, url_for still puts the "p" param first. It seems that any potential solution will need to address how url_for iterates the hash.
After further digging, I see that what's happening is that url_for is actually sorting the parameters by key lexicographically, independent of their insertion order in the hash. Apparently this is being done to aid caching, since URL params are often used for page cache keys.
In short, you can't do it without patching Hash, specifically, you need to override activesupport/core_ext/object/to_param.rb so that Hash#to_param does not call .sort on the return value.
Related question: How to generate custom sorted query string URL in Rails link_to?.
First question is: why would you need something like that? The order which the parameters appear in the url in doesn't influence the way they are fetched by the server, since they are basic key/value associations. So, no matter where the parameter appears, it will always be recognized by the server.
Nonetheless, to answer your question, yes, it is possible. You just have to use ordered hashes. They are available through active support.
opts = OrderedHash.new
opts[:foo] = 'X'
opts[:bar] = 'Y'
opts[:p] = 1
your_helper_url(opts)
Should do the trick for you.

Need help breaking up Ruby code into methods

I am writing a program to better learn to program and I wish to use RSpec so that I can learn that as well. However, as is, the code isn't particularly RSpec friendly, so I need to break it up into methods so that I can test it.
I don't need anyone to write the code for me, but perhaps explain how I can break it up. I am new to programming and this kind of thing (breaking things up into methods) is a really difficult concept for me.
Here's what I have:
if params[:url] != ''
url = params[:url] #line created so I can return url more easily (or, in general)
words = params[:word].gsub("\n", ",").delete("\r").split(",") #.delete redundant?
words.reject!(&:empty?)
words.each(&:lstrip!)
return "#{words}", "#{url}" #so that I can return url, not sure how to do that yet
end
The code is a SERP checker, it takes a url and keywords and checks their location in the search engines.
For url, it'll just be the url of the website the user wishes to check... for word, it would be the keywords they wish to check their site against in Google.. a user may fill out the input form like so:
Corn on the cob,
Fibonacci,
StackOverflow
Chat, Meta, About
Badges
Tags,,
Unanswered
Ask Question
Your code takes a sloppy string and turns it into a clean array. You first clean up the string, then you polish the array. You could define methods for these actions.
def clean_up_words(str)
#code to clean str
str
end
def clean_up_list(arr)
#code to clean arr
arr
end
dirty_list = clean_up_words( params[:word]).split(',')
clean_list = clean_up_list( dirty_list )
def foo params
url = params[:url]
url.empty? ? nil : [params[:word].scan(/[^\s\r,]+/), url]
end
You are assigning url = params[:url]. If you are going to do that, you should do it before other places where you refer to the same thing to reduce the amount of calling [] on param.
You have several conditions on the words to be extracted. (a) Either split by "\n", ",", "\r", (b) the word should not be of 0 length, (c) white characters should be stripped off. All of this can be put together as scan(/[^\s\r,]+/).
You want to return two variables when url is not empty. Use an array in that case.

Resources