Delete from NodeSet during map iteration? - ruby

Is it safe to delete a Node from a NodeSet during iteration? I'm pulling some links out of a bunch of a (anchor) tags, but I want to remove a tag from the set altogether if its link is invalid.
def get_links(nodeset)
  links = nodeset.map do |node|
    begin
      URI.join(node.document.url, node.get_attribute('href'))
    rescue URI::InvalidURIError
      nodeset.delete(node) # Is this safe?
      nil
    end
  end
  links.compact
end

In your example code I think you're not separating your actions well. Don't manipulate your nodeset inside the map; it's not that you can't do it, it's that you shouldn't, for clarity and ease of maintenance. "Map" the URLs separately from removing the bad ones.
At a minimum I'd do something more like:
require 'uri'
def get_valid_links(nodeset)
  return [] if nodeset.empty? # nodeset.first would be nil otherwise
  doc_url = nodeset.first.document.url
  links = nodeset.map do |node|
    begin
      URI.join(doc_url, node['href'])
    rescue URI::InvalidURIError
      nil
    end
  end
  links.compact
end
nodeset = get_valid_links(nodeset)
Doing it that way doesn't alter nodeset unless you explicitly say so, by assigning the compacted/mapped value returned from get_valid_links. That keeps the purpose of the method very clear, and it has no side effects.
I think this is one of those cases where "POLS" ("Principle Of Least Surprise") would kick in. Having the side-effect of munging nodeset inside the method could be very surprising to someone who's trying to maintain the code or use it in a library, and it'd be hard to work around.
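To make the "surprise" concrete, here is a small sketch using a plain Ruby Array (not a Nokogiri NodeSet, whose iteration is implemented separately, so its exact behavior may differ). Deleting from the collection you are iterating shifts the remaining elements, so the loop silently skips some of them; building a new collection avoids that.

```ruby
# Deleting while iterating: each deletion shifts later elements left,
# so the element right after a deleted one is never visited.
items = %w[a b c d]
visited = []
items.each do |item|
  visited << item
  items.delete(item)
end
visited # => ["a", "c"]
items   # => ["b", "d"]

# Non-mutating alternative: reject returns a new Array and leaves the original alone.
letters = %w[a b c d]
kept = letters.reject { |l| l == 'b' } # => ["a", "c", "d"]
```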
From experience, I'll recommend being very careful throwing the contents of href attributes onto the end of a URL and expecting it to be good or useful. Remember that it's possible for the href to be a JavaScript link, which will make an ugly URL.
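One way to act on that advice, sketched under the assumption that you only want web links: resolve each href against the page URL and keep the result only if its scheme is http or https. The helper name http_links and its plain-string inputs are hypothetical, chosen so the sketch runs without Nokogiri. URI.join returns an absolute href (such as javascript:void(0)) unchanged, so the scheme check is what filters it out.

```ruby
require 'uri'

# Hypothetical helper: resolve hrefs against base_url, keep only http(s) URIs.
def http_links(base_url, hrefs)
  hrefs.map do |href|
    begin
      uri = URI.join(base_url, href)
      uri if %w[http https].include?(uri.scheme) # drops javascript:, mailto:, ...
    rescue URI::Error
      nil # unparseable href: skip it
    end
  end.compact
end

links = http_links('http://example.com/index.html',
                   ['/about', 'javascript:void(0)', 'mailto:someone@example.com'])
links.map(&:to_s) # => ["http://example.com/about"]
```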

Related

Recursively check if nested elements exist

To give you some background, I'm using Ruby to create automated tests along with Selenium, Cucumber, Capybara and SitePrism. Some of my tests need to check the text of a certain element on the page, for example:
def get_section_id
  return section.top.course.section_id.text
end
However, I would like to check that all the parent elements exist before calling .text on the nested section_id element. For example, to check the text of this particular element I would do:
if has_section? && section.has_top? && section.top.has_course? && section.top.course.has_section_id?
  return section.top.course.section_id.text
end
Is there any way to recursively check if something exists in Ruby like this? Something that could be called like: has_text?(section.top.course.section_id) maybe?
There is nothing built into Ruby that would do this, because the methods you're calling either return the element or raise an exception. If they returned the element or nil, then Cary Swoveland's suggestion to use the safe navigation operator &. would be the answer.
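For contrast, here is a sketch of the case where &. does work: accessors that return nil for a missing child instead of raising. The Struct stand-ins are hypothetical plain-Ruby objects, not SitePrism sections, and &. needs Ruby 2.3 or later.

```ruby
# Hypothetical stand-ins whose accessors return nil for a missing child.
TopStub    = Struct.new(:course)
CourseStub = Struct.new(:section_id)
LabelStub  = Struct.new(:text)

present = TopStub.new(CourseStub.new(LabelStub.new('CS-101')))
missing = TopStub.new(nil)

found     = present.course&.section_id&.text # => "CS-101"
not_found = missing.course&.section_id&.text # => nil, no NoMethodError raised
```

Note that &. short-circuits the rest of the chain, so a trailing call such as .nil? would also be skipped; that is why the results are stored in variables first.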
The critical thing to remember here is what you're actually trying to do. Since you're writing automated tests, you're (most likely) not trying to check whether or not the elements exist (tests should be predictable and repeatable, so you should know the elements are going to exist) but rather to wait for the elements to exist before getting the text. This means what you really want is probably more like:
def get_section_id
  wait_until_section_visible
  section.wait_until_top_visible
  section.top.wait_until_course_visible
  section.top.course.wait_until_section_id_visible
  return section.top.course.section_id.text
end
You can write a helper method to make that easier, something like:
def get_text_from_nested_element(*args)
  args.reduce(self) do |scope, arg|
    scope.send("wait_until_#{arg}_visible")
    scope.send(arg)
  end.text
end
which could be called as:
def get_section_id
  get_text_from_nested_element(:section, :top, :course, :section_id)
end
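To check how the reduce-based helper walks the chain without a browser, here is a sketch with a hypothetical FakeNode class standing in for SitePrism pages and sections: any wait_until_<name>_visible call is recorded and succeeds immediately, and child elements are looked up by name. Real SitePrism waiters block until the element appears; this stub only mimics their names.

```ruby
# Hypothetical stand-in for SitePrism pages/sections (not the real API).
class FakeNode
  attr_reader :text, :waited

  def initialize(text: nil, **children)
    @text = text
    @children = children
    @waited = []
  end

  # Record wait_until_<name>_visible calls; return child elements by name.
  def method_missing(name, *args)
    if (m = name.to_s.match(/\Await_until_(\w+)_visible\z/))
      @waited << m[1].to_sym
      true
    elsif @children.key?(name)
      @children[name]
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.match?(/\Await_until_\w+_visible\z/) || @children.key?(name) || super
  end

  # The helper from the answer, placed on this class so `self` is the page.
  def get_text_from_nested_element(*args)
    args.reduce(self) do |scope, arg|
      scope.send("wait_until_#{arg}_visible")
      scope.send(arg)
    end.text
  end
end

section_id = FakeNode.new(text: 'CS-101')
course = FakeNode.new(section_id: section_id)
page = FakeNode.new(section: FakeNode.new(top: FakeNode.new(course: course)))
page.get_text_from_nested_element(:section, :top, :course, :section_id) # => "CS-101"
```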
It sounds like you may want something like the following.
arr = [section, :top, :course, :section_id, :text]
arr.reduce { |e,m| e && e.respond_to?(m) && e.public_send(m) }
Because reduce is given no initial argument, the initial value of the memo e is section. Once e becomes nil or false, it keeps that value for the rest of the chain.
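A runnable sketch of that reduce, wrapped in a hypothetical safe_chain helper and exercised with Struct stand-ins (not SitePrism objects). As the earlier answer points out, this only helps when the methods return nil for a missing element; SitePrism's element methods raise instead.

```ruby
# Hypothetical stand-ins for the nested elements.
SectionNode = Struct.new(:top)
TopNode     = Struct.new(:course)
CourseNode  = Struct.new(:section_id)
IdNode      = Struct.new(:text)

# Walk the method chain; once a link is nil/false (or missing), keep that value.
def safe_chain(root, *methods)
  [root, *methods].reduce { |e, m| e && e.respond_to?(m) && e.public_send(m) }
end

full   = SectionNode.new(TopNode.new(CourseNode.new(IdNode.new('CS-101'))))
no_top = SectionNode.new(nil)

safe_chain(full, :top, :course, :section_id, :text)   # => "CS-101"
safe_chain(no_top, :top, :course, :section_id, :text) # => nil
safe_chain(Object.new, :top)                          # => false (no #top method)
```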
Whilst this is a bit old, the fact that &. won't work here, even though it would be the most elegant option, suggests this could be a useful feature.
If you can raise it on GitHub with a sample page where this would be useful, then we could look at getting it introduced.
Luke

How to reset value of local variable within loop?

I'd like to point out I tried quite extensively to find a solution for this and the closest I got was this. However I couldn't see how I could use map to solve my issue here. I'm brand new to Ruby so please bear that in mind.
Here's some code I'm playing with (simplified):
def base_word input
  input_char_array = input.split('') # split string to array of chars
  @file.split("\n").each do |dict_word|
    input_text = input_char_array
    dict_word.split('').each do |char|
      if input_text.include? char.downcase
        input_text.slice!(input_text.index(char))
      end
    end
  end
end
I need to reset the value of input_text back to the original value of input_char_array after each cycle, but from what I gather since Ruby is reference-based, the modifications I make with the line input_text.slice!(input_text.index(char)) are reflected back in the original reference, and I end up assigning input_text to an empty array fairly quickly as a result.
How do I mitigate that? As mentioned I've tried to use .map but maybe I haven't fully wrapped my head around how I ought to go about it.
You can get an independent copy by duplicating the array with dup (a shallow copy: a new Array holding the same elements). This, obviously, uses some extra memory.
input_text = input_char_array.dup
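A sketch of that fix in the shape of the question's loop: each dictionary word gets a fresh copy, so the removals never leak back into input_char_array. The sample words are made up for illustration, and index(char.downcase) is used so the lookup matches the downcased comparison in the question.

```ruby
input_char_array = 'listen'.chars
remainders = %w[silent tin].map do |dict_word|
  input_text = input_char_array.dup # fresh copy for every dictionary word
  dict_word.each_char do |char|
    idx = input_text.index(char.downcase)
    input_text.slice!(idx) if idx   # mutates only the copy
  end
  input_text.join
end
remainders       # => ["", "lse"]
input_char_array # => ["l", "i", "s", "t", "e", "n"]
```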
The Short and Quite Frankly Not Very Good Answer
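Using slice! modifies the array in place: it removes the element and returns it, so every variable that refers to that same array (including input_char_array) sees the change.
If you use plain old slice instead, it returns the element without modifying input_text. (It doesn't remove anything either, which is why this answer is not very good.)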
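The difference is easy to see with two variables pointing at the same Array, as in the question:

```ruby
chars = %w[a b c]
same  = chars               # a second reference to the same Array object

taken = same.slice(0)       # => "a"; the Array is not modified
after_slice = chars.dup     # snapshot: still ["a", "b", "c"]

removed = same.slice!(0)    # => "a"; removed from the shared Array
chars                       # => ["b", "c"], visible through both variables
```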
The Longer and Quite Frankly Much Better Answer
In Ruby, code nested four levels deep is often a smell. Let's refactor, and avoid the need to reset a loop at all.
Instead of splitting the file by newline ourselves, we'll use Ruby's built-in File class to read the lines. File.readlines returns an Array, so memoizing it (the ||= operator) means the file is read only once, even if we run this more than once. (Memoizing an open File handle would not work: once it has been iterated, it sits at end-of-file and yields nothing on the next pass.)
def dictionary
  @dict ||= File.readlines('/path/to/dictionary', chomp: true)
end
We could also immediately make all the words lowercase when we open the file, since every character is downcased individually in the original example.
def downcased_dictionary
  @downcased_dict ||= dictionary.map(&:downcase)
end
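A quick way to see why the memoized value should be an Array rather than an open File: iterating an IO consumes it, so a second pass over the same handle yields nothing. This sketch writes a throwaway dictionary with Tempfile (the words are made up).

```ruby
require 'tempfile'

first_pass, second_pass, words = Tempfile.create('dict') do |f|
  f.write("Apple\nbanana\n")
  f.flush

  handle = File.open(f.path)
  a = handle.map(&:chomp) # reads every line
  b = handle.map(&:chomp) # the handle is at EOF now, so nothing is left to read
  handle.close

  [a, b, File.readlines(f.path, chomp: true)] # block value is returned by create
end

first_pass  # => ["Apple", "banana"]
second_pass # => []
words       # => ["Apple", "banana"], a plain Array that can be walked repeatedly
```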
Next, we'll use Ruby's built-in file and string functions, including #each_char, to do the comparisons and output the results. We don't need to convert any inputs into Arrays (at all!), because #include? works on strings, and #each_char iterates over the characters of a string.
We'll decompose the string-splitting into its own method, so the loop logic and string logic can be understood more clearly.
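Lastly, by using non-destructive string methods (String#sub, which returns a new string, rather than #slice!), we never overwrite input_text, so there is no variable to reset later.
def base_word(input)
  input_text = input.to_s # Coerce in case it's not a string
  # For each dictionary word, start again from input_text and strip that word's characters
  downcased_dictionary.map do |word|
    word.each_char.reduce(input_text) { |remaining, char| slice_base_word(remaining, char) }
  end
end
# Returns a new string without the first occurrence of char; input itself is untouched
def slice_base_word(input, char)
  input.include?(char) ? input.sub(char, '') : input
end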

Ruby way of looping through nested objects

How can I rewrite the following code to be more Ruby-wayish? I'm thinking about inject but can't figure out how to do it.
def nested_page_path(page)
  path = "/#{page.slug}"
  while page.parent_id do
    path.prepend "/#{page.parent.slug}"
    page = page.parent
  end
  path
end
Input is an ActiveRecord object that has 0-5 consecutive parents. And output is something like '/pages/services/law'.
If you know for sure that there are no cycles in your parenting, you can do that recursively, i.e. with a function that calls itself. Five levels of nesting should do just fine; trouble could arise with thousands.
def nested_page_path(page)
  return "" if page.nil? # Or whatever represents the root
  "#{nested_page_path(page.parent)}/#{page.slug}"
end
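To try the recursive version without a database, here it is with a hypothetical PageStub Struct standing in for the ActiveRecord model (only slug and parent are modeled):

```ruby
# Hypothetical stand-in for the AR model: a slug and a parent link.
PageStub = Struct.new(:slug, :parent)

def nested_page_path(page)
  return "" if page.nil? # past the root: contribute nothing
  "#{nested_page_path(page.parent)}/#{page.slug}"
end

pages    = PageStub.new('pages', nil)
services = PageStub.new('services', pages)
law      = PageStub.new('law', services)

nested_page_path(law) # => "/pages/services/law"
```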
But bear in mind that the approach above, as well as yours, will fetch each object in a separate query. That's fine when you've already fetched them, but if not, you're in a bit of N+1 query trouble.
An easy workaround is caching: you can rebuild the stored nested path of this object and its descendants in a before_save callback, but that adds significant overhead to each write. There is a much better way.
By using nested sets you can get the object's hierarchy branch in just one query. Like this:
'/' + page.self_and_ancestors.pluck(:slug).join('/') # self_and_ancestors: nested sets' goodness
What that query does is essentially "fetch me the pages whose ranges enclose my own, ordered by left bound". I'm using awesome_nested_set in my examples.
SELECT "pages"."slug" FROM "pages"
WHERE ("pages"."lft" <= 42) AND ("pages"."rgt" >= 88)
ORDER BY "pages"."lft"
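The same range condition can be sketched in plain Ruby to show why one query is enough. The rows and their lft/rgt bounds below are made-up data (not what awesome_nested_set would necessarily assign), laid out as a valid nested set: a node's ancestors are exactly the rows whose range encloses its own.

```ruby
# Hypothetical in-memory rows with nested-set bounds.
Row = Struct.new(:slug, :lft, :rgt)

ROWS = [
  Row.new('pages',    1, 10),
  Row.new('services', 2, 7),
  Row.new('law',      3, 4),
  Row.new('tax',      5, 6),
  Row.new('about',    8, 9)
].freeze

# Same condition as the SQL above; <= and >= are inclusive, so the target is included.
def self_and_ancestors(target, rows)
  rows.select { |r| r.lft <= target.lft && r.rgt >= target.rgt }.sort_by(&:lft)
end

law  = ROWS.find { |r| r.slug == 'law' }
path = '/' + self_and_ancestors(law, ROWS).map(&:slug).join('/') # => "/pages/services/law"
```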
Without knowing your object structure it's difficult. But something recursive like this should do:
def nested_page_path(page)
  path = "/#{page.slug}"
  return path unless page.parent_id
  path.prepend nested_page_path(page.parent) # parent path already starts with "/"
end
Not sure inject is the simple answer since it operates on an Enumerable and you don’t have an obvious enumerable to start with.
I’d suggest something like this (not unlike your solution):
def nested_page_path(page)
  pages = [page]
  pages << pages.last.parent while pages.last.parent
  '/' + pages.reverse.map(&:slug).join('/')
end
There’s scope for reducing repetition there, but that’s more or less what I’d go with.
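If you do want an Enumerable to feed into map or inject, Enumerator.produce (Ruby 2.7+) can generate the parent chain; raising StopIteration in its block ends the sequence. This is a swapped-in technique, not one from the answers above, and with real ActiveRecord objects it still issues one query per parent. PageNode is a hypothetical stand-in.

```ruby
PageNode = Struct.new(:slug, :parent)

def nested_page_path(page)
  # Yields page, page.parent, ... and stops once there is no parent left.
  chain = Enumerator.produce(page) { |prev| prev.parent or raise StopIteration }
  '/' + chain.map(&:slug).reverse.join('/')
end

root = PageNode.new('pages', nil)
leaf = PageNode.new('law', PageNode.new('services', root))
nested_page_path(leaf) # => "/pages/services/law"
```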

iterate over an array and delete elements conditionally

I want to iterate over an array of URLs, and remove elements from it if there is a timeout with the HTTP request for the given URL. It has been implemented in the following way:
@urls.delete_if do |url|
  begin
    doc = perform_request(some_params)
    break
  rescue TimeoutError
    Rails.logger.warn("URL #{url} times out, will be removed from list")
    true
  end
end
Anyone for a cleaner solution?
There are a lot more things that can go wrong than a timeout, and it's better to ask the affirmative than the negative. That is, ask "does the site respond the way I want?" rather than "does the site not respond the way I want?"
Furthermore, I would encourage practicing immutability, that is, not changing your data in place, but rather creating new versions from the old. My version would look like:
require 'net/http'
def up?(url)
  Net::HTTP.new(url).head('/').kind_of? Net::HTTPOK
end
@urls = %w[www.google.com www.example.com]
valid_urls = @urls.select { |url| up?(url) } #=> [www.google.com]
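The up? check above raises on a timeout or refused connection instead of answering false. A hedged variant that keeps the affirmative question but treats any failure as "not up" could look like this; the port and timeout keywords and their defaults are assumptions for illustration, while open_timeout and read_timeout are standard Net::HTTP settings passed through Net::HTTP.start.

```ruby
require 'net/http'

# true only if HEAD / answers with a 2xx within the timeouts; any error means "not up".
def up?(host, port: 80, timeout: 5)
  Net::HTTP.start(host, port, open_timeout: timeout, read_timeout: timeout) do |http|
    http.head('/').is_a?(Net::HTTPSuccess)
  end
rescue StandardError # covers Net::OpenTimeout/ReadTimeout, refused connections, DNS errors
  false
end

# Nothing normally listens on local port 1, so this should report false quickly.
up?('127.0.0.1', port: 1, timeout: 1)
```

Used as @urls.select { |url| up?(url) }, it still builds a new Array instead of deleting from @urls in place.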
Everything seems reasonable, except the break in the cycle. I also like the cleanness of the solution.

When to use blocks

I love Ruby blocks! The idea behind them is just very very neat and convenient.
I have just looked back over my code from the past week or so, which is basically every single ruby function I ever have written, and I have noticed that not a single one of them returns a value! Instead of returning values, I always use a block to pass the data back!
I have even caught myself contemplating writing a little status class which would allow me to write code like:
something.do_stuff do |status|
  status.success do
    # successful code
  end
  status.fail do
    # fail code
    puts status.error_message
  end
end
Am I using blocks too much? Is there a time to use blocks and a time to use return values?
Are there any gotchas to be aware of? Will my huge use of blocks come and bite me sometime?
The whole thing would be more readable as:
if something.do_stuff
  # successful code
else
  # unsuccessful code
end
or to use a common rails idiom:
if @user.save
  render :action=>:show
else
  @user.errors.each{|attr,msg| logger.info "#{attr} - #{msg}" }
  render :action=>:edit
end
IMHO, avoiding the return of a boolean value is overuse of code blocks.
A block makes sense if:
It allows code to use a resource without having to close that resource
File.open("fname") do |f|
  # do stuff with the file
end # don't have to worry about closing the file
The calling code would have to do non-trivial computation with the result
In this case, you avoid adding the return value to calling scope. This also often makes sense with multiple return values.
something.do_stuff do |res1, res2|
  if res1.foo? and res2.bar?
    foo(res1)
  elsif res2.bar?
    bar(res2)
  end
end # didn't add res1/res2 to the calling scope
Code must be called both before and after the yield
You see this in some of the Rails helpers:
<%= content_tag :div do %>
  <%= content_tag :span, "span content" %>
<% end %>
And of course iterators are a great use case, as they're (considered by ruby-ists to be) prettier than for loops or list comprehensions.
Certainly not an exhaustive list, but I recommend that you don't just use blocks because you can.
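Two of those cases sketched in plain Ruby, with hypothetical names (FakeResource, with_resource, wrap_tag): a method that owns a resource's setup and teardown, using ensure so cleanup happens even if the caller's block raises, and a method that runs code both before and after the yield, loosely mirroring what a helper like content_tag does.

```ruby
# Hypothetical resource that just records what happened to it.
class FakeResource
  attr_reader :events

  def initialize
    @events = []
  end

  def open
    @events << :open
  end

  def close
    @events << :close
  end
end

# The method opens and closes; the block only does the work.
def with_resource(resource)
  resource.open
  yield resource
ensure
  resource.close # runs even if the block raises
end

# Code before and after the yield wraps whatever the block returns.
def wrap_tag(name)
  "<#{name}>#{yield}</#{name}>"
end

res = FakeResource.new
with_resource(res) { |r| r.events << :work }
res.events # => [:open, :work, :close]

wrap_tag(:div) { wrap_tag(:span) { 'span content' } } # => "<div><span>span content</span></div>"
```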
This is what functional programming people call "continuation-passing style". It's a valid technique, though there are cases where it will tend to complicate things more than it's worth. It might be worth revisiting some of the places where you're using it to see whether that's the case in your code. But there's nothing inherently wrong with it.
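For comparison, a minimal sketch of the question's hypothetical status object in continuation-passing style, next to the plain return-value style. Status, do_stuff, and stuff_ok? are illustrative names, not an existing library; note that defining Status#fail shadows Kernel#fail for Status instances only.

```ruby
class Status
  attr_reader :error_message

  def initialize(ok, error_message = nil)
    @ok = ok
    @error_message = error_message
  end

  # Run the block only when the operation succeeded.
  def success
    yield if @ok
  end

  # Run the block only when the operation failed.
  def fail
    yield unless @ok
  end
end

# Continuation-passing style: the result is handed to the caller's block.
def do_stuff(input)
  yield Status.new(input.positive?, input.positive? ? nil : 'input must be positive')
end

# Return-value style: the caller branches on what comes back.
def stuff_ok?(input)
  input.positive?
end

messages = []
do_stuff(-1) do |status|
  status.success { messages << 'ok' }
  status.fail    { messages << status.error_message }
end
messages << (stuff_ok?(3) ? 'ok' : 'failed')
messages # => ["input must be positive", "ok"]
```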
I like this style. It's actually very Ruby-like, and often you'll see projects restructure their code to use this format instead of something less readable.
Returning values makes sense where returning values makes sense. If you have an Article object, you want article.title to return the title. But for this particular example of callbacks, it's stellar style, and it's good that you know how to use them. I suspect that many new to Ruby will never figure out how to do it well.
