Elegant way to loop in ascending and descending order - ruby

I have the following code that parses HTML text and trims (or strips) the paragraphs that are empty. It's similar to .strip on a String object.
doc = Nokogiri::HTML::DocumentFragment.parse(html)
# repetition that I want to collapse
doc.css('p').each do |p|
  if all_children_are_blank?(p)
    p.remove
  else
    break
  end
end
# repetition that I want to collapse
doc.css('p').reverse_each do |p|
  if all_children_are_blank?(p)
    p.remove
  else
    break
  end
end
doc.to_s.strip
Is there a more elegant way to avoid duplicating the code I've marked with comments, in keeping with the principle of code reuse?
Here is what I've come up with but I'm not happy with it yet and wanted to see if there is something better:
doc = Nokogiri::HTML::DocumentFragment.parse(html)
doc.css('p').each do |p|
  if stop(p) then break end
end
doc.css('p').reverse_each do |p|
  if stop(p) then break end
end
doc.to_s.strip
def self.stop(p)
  if all_children_are_blank?(p)
    p.remove
    false
  else
    true
  end
end

If I understand what you're looking for, you would like a simpler way to iterate over the elements you're looking at, in order to remove blank p elements.
Here is a straightforward way to collapse what you've written, without doing a whole lot different:
doc.tap do |d|
  [:each, :reverse_each].each do |sym|
    d.css("p").public_send(sym) do |p|
      if all_children_are_blank?(p)
        p.remove
      else
        break
      end
    end
  end
end.to_s.strip
I have not tested this out, so you might need to tweak it a little. If this were production code, I would probably decompose it into one or more method calls in order to keep things clear.

Maybe something like:
puts "removing a top p" until stop(doc.at('p'))
puts "removing a bottom p" until stop(doc.search('p').last)
or just:
puts "removing a p" until stop(doc.at('p')) && stop(doc.search('p').last)

How about:
[*doc.css('p'), *doc.css('p').reverse].each do |p|
if stop(p) then break end
end
In this case, the splat operator ("*") expands both lists into one array, with the elements in ascending, then descending order. Then you just iterate over the whole group.
Edit:
This won't work properly because of the break statement skipping to the end of everything. So the proper way of doing this, IMHO, would be to assign the block to a variable. And you might as well eliminate the stop function since you are eliminating the duplication of code anyway:
remover = lambda do |p|
  if all_children_are_blank? p
    p.remove
  else
    break
  end
end
doc.css('p').to_a.each(&remover).reverse_each(&remover)
Hope this helps.
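One caveat with the lambda version above: as far as I know, break inside a lambda only returns from the lambda itself, so each(&remover) would keep going past the first non-blank paragraph. A minimal sketch of an alternative is to collect the blank paragraphs at each end with take_while before removing anything. Plain strings stand in for the p nodes here, and blank_edges is a hypothetical helper name, not part of Nokogiri:

```ruby
# Hypothetical helper: returns the elements at the start and end of `items`
# for which the block is true, without touching anything in the middle.
def blank_edges(items, &blank)
  leading  = items.take_while(&blank)
  rest     = items.drop(leading.size)          # avoids counting an item twice
  trailing = rest.reverse.take_while(&blank)
  leading + trailing
end

# Strings stand in for paragraph nodes (an assumption for illustration).
paragraphs = ["", "  ", "first", "", "last", ""]
edges = blank_edges(paragraphs) { |text| text.strip.empty? }
p edges
```

With Nokogiri, the same helper could presumably be used as blank_edges(doc.css('p').to_a) { |p| all_children_are_blank?(p) }.each(&:remove), which keeps the blank paragraph in the middle intact.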

Related

Repeating a block of code until the block itself returns false?

I want to:
pass a block to a method call, and then
pass that entire method call as the condition of a while loop,
even though I don't need to put any logic inside the loop itself.
Specifically, I have an array that I'd like to #reject! certain elements from based on rather complicated logic. Subsequent calls to #reject! may remove elements that were not removed on a previous pass. When #reject! finally stops finding elements to reject, it will return nil. At this point, I would like the loop to stop and the program to proceed.
I thought I could do the following:
while array.reject! do |element|
...
end
end
I haven't actually tried it yet, but this construction throws vim's ruby syntax highlighter for a loop (i.e., it thinks the first do is for the while statement, and thinks the second end is actually the end of the encapsulating method). I also tried rewriting this as an inline while modifier attached to a begin...end block,
begin; end while array.reject! do |element|
...
end
but it still screws up the highlighting in the same way. In any case, it feels like an abuse of the while loop.
The only way I could think of to accomplish this is by assigning the method call as a proc:
proc = Proc.new do
  array.reject! do |element|
    ...
  end
end
while proc.call do; end
which works but feels kludgy, especially with the trailing do; end.
Is there an elegant way to accomplish this?
It's not just vim, while array.reject! do |element| is invalid syntax:
$ ruby -c -e 'while array.reject! do |element| end'
-e:1: syntax error, unexpected '|'
while array.reject! do |element| end
^
You could use { ... } instead of do ... end:
while array.reject! { |element|
    # ...
  }
end
or loop and break:
loop do
  break unless array.reject! do |element|
    # ...
  end
end
a little more explicit:
loop do
  r = array.reject! do |element|
    # ...
  end
  break unless r
end
Ruby lets you move your condition to the end of the loop statement. This makes it easy to store a result inside of the loop and check it against the conditional:
begin
  any_rejected = arr.reject! { … }
end while any_rejected
This would work the same as doing end while arr.reject! { … }, but it's much clearer here what's happening, especially with a complicated reject!.
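To make the begin ... end while pattern concrete, here is a small sketch where one pass of reject! can expose more elements to reject on the next pass. The deps hash and the dependency rule are made up for illustration; only reject! returning nil when nothing was removed comes from the discussion above:

```ruby
# Hypothetical data: each name maps to the name it depends on (nil = no dependency).
deps   = { "a" => nil, "b" => "a", "c" => "x", "d" => "c" }
names  = deps.keys
passes = 0

begin
  passes += 1
  present = names.dup   # snapshot, so the block never inspects a half-modified array
  removed = names.reject! { |name| deps[name] && !present.include?(deps[name]) }
end while removed       # reject! returns nil once a pass removes nothing

p names
p passes
```

Here "c" goes in the first pass (its dependency "x" never existed), "d" goes in the second pass because "c" is now gone, and the third pass removes nothing and ends the loop.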
You're right that the Ruby parser thinks that do belongs to while, and doesn't understand where the second end is coming from. It's a precedence problem.
This code is just to show that it can be done. For how it should be done, see Stefan's answer:
array = (1..1000).to_a
while (array.reject! do |element|
    rand < 0.5
  end)
  p array.size
end
It outputs:
473
238
113
47
30
18
8
1
0
My personal preference in situations where I need to call a method until the return value is what I want is:
:keep_going while my_method
Or more tersely I sometimes use:
:go while my_method
It's one line, and you can use the contents of the symbol to help document what's going on. With your block, I'd personally create a proc/lambda out of it and pass that to reject for clarity.
# Harder to follow, IMHO
:keep_going while array.reject! do |...|
  more_code
end
# Easier to follow, IMHO
simplify = ->(...){ ... }
:keep_simplifying while array.reject!(&simplify)

How to stop outer block from inner block

I'm trying to implement a search function that looks for occurrences of a particular keyword, but if the --max option is provided it should print only that number of lines.
def search_in_file(path_to_file, keyword)
  seen = false
  File::open(path_to_file) do |f|
    f.each_with_index do |line, i|
      if line.include? keyword
        # print the path to the file first, but only if the keyword occurs in the file
        unless seen
          puts path_to_file.to_s.blue
          seen = true
        end
        # print colored line
        puts "#{i+1}:".bold.gray + "#{line}".sub(keyword, keyword.bg_red)
        break if i == #opt[:max] # PROBLEM WITH THIS!!!
      end
    end
  end
  puts "" if seen
end
I tried to use a break statement, but when it's inside the if ... end block I can't break out of the outer each_with_index block.
If I move break outside the if ... end it works, but that's not what I want.
How can I deal with this?
Thanks in advance.
I'm not sure how to implement it in your code as I'm still learning Ruby, but you can try catch and throw to solve this.
def search_in_file(path_to_file, keyword)
  seen = false
  catch :limit_reached do
    # put your code to look in file here...
    throw :limit_reached if i == #opt[:max] # this will break and take you to the end of the catch block
  end
end
Something like this already exists here
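A self-contained sketch of the catch/throw idea, under a few assumptions: it works on an array of lines rather than a file, it counts matches rather than comparing the line index to the limit, and first_matches and its max parameter are hypothetical names, not taken from the question:

```ruby
# Hypothetical helper: collects at most `max` matching lines,
# each paired with its 1-based line number.
def first_matches(lines, keyword, max)
  matches = []
  catch :limit_reached do
    lines.each_with_index do |line, i|
      next unless line.include?(keyword)
      matches << [i + 1, line]
      # throw unwinds through every enclosing block up to the matching catch
      throw :limit_reached if matches.size >= max
    end
  end
  matches
end

lines = ["foo one", "bar", "foo two", "foo three"]
p first_matches(lines, "foo", 2)
```

Because throw unwinds to the matching catch no matter how deeply the blocks are nested, the same shape should also work with File.open wrapped around each_with_index.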

Functionally find mapping of first value that passes a test

In Ruby, I have an array of simple values (possible encodings):
encodings = %w[ utf-8 iso-8859-1 macroman ]
I want to keep reading a file from disk until the results are valid. I could do this:
good = encodings.find{ |enc| IO.read(file, "r:#{enc}").valid_encoding? }
contents = IO.read(file, "r:#{good}")
...but of course this is dumb, since it reads the file twice for the good encoding. I could program it in gross procedural style like so:
contents = nil
encodings.each do |enc|
  if (s=IO.read(file, "r:#{enc}")).valid_encoding?
    contents = s
    break
  end
end
But I want a functional solution. I could do it functionally like so:
contents = encodings.map{|e| IO.read(f, "r:#{e}")}.find{|s| s.valid_encoding? }
…but of course that keeps reading files for every encoding, even if the first was valid.
Is there a simple pattern that is functional, but does not keep reading the file after the first success is found?
If you sprinkle a lazy in there, map will only consume those elements of the array that are used by find - i.e. once find stops, map stops as well. So this will do what you want:
possible_reads = encodings.lazy.map {|e| IO.read(f, "r:#{e}")}
contents = possible_reads.find {|s| s.valid_encoding? }
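To see that the lazy version really stops early, here is a small sketch that records which encodings get tried. It avoids touching the disk: read_as is a hypothetical stand-in for the file read that tags the same raw bytes with each encoding, and the byte string is made up for the demo:

```ruby
attempts = []

# Hypothetical stand-in for reading the file: the same raw bytes, tagged with `enc`.
read_as = lambda do |enc|
  attempts << enc
  "caf\xE9".b.force_encoding(enc)
end

encodings = %w[utf-8 iso-8859-1 macroman]
contents = encodings.lazy.map { |enc| read_as.call(enc) }.find { |s| s.valid_encoding? }

p attempts
p contents.encoding
```

The lone \xE9 byte is not valid UTF-8 but is valid ISO-8859-1, so find stops at the second encoding and macroman is never tried.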
Hopping on sepp2k's answer: If you can't use 2.0, lazy enums can be easily implemented in 1.9:
class Enumerator
  def lazy_find
    self.class.new do |yielder|
      self.each do |element|
        if yield(element)
          yielder.yield(element)
          break
        end
      end
    end
  end
end
a = (1..100).to_enum
p a.lazy_find { |i| i.even? }.first
# => 2
You want to use the break statement:
contents = encodings.each do |e|
  s = IO.read( f, "r:#{e}" )
  s.valid_encoding? and break s
end
The best I can come up with is with our good friend inject:
contents = encodings.inject(nil) do |s,enc|
  s || ((c = File.read(f, encoding: enc)).valid_encoding? && c)
end
This is still sub-optimal because it continues to loop through encodings after finding a match, though it doesn't do anything with them, so it's a minor ugliness. Most of the ugliness comes from...well, the code itself. :/

Ruby best practice : if not empty each do else in one operator

1. I can't find an elegant way to write this code:
if array.empty?
  # process empty array
else
  array.each do |el|
    # process el
  end
end
I'd like to have one loop, without writing array twice. I read this, but there is no solution good enough.
2. I am actually in a HAML template. Same question.
- if array.empty?
  %p No result
- else
  %ul
    - array.each do |el|
      %li= el
What about?
array.each do |x|
  #...
  puts "x",x
end.empty? and begin
  puts "empty!"
end
The cleanest way I've seen this done in HAML (not plain Ruby) is something like:
- array.each do |item|
  %li
    = item.name
- if array.empty?
  %li.empty
    Nothing here.
As mentioned by other answers, there is no need for the else clause because that's already implied in the other logic.
Even if you could do the each-else in one clean line, you wouldn't be able to achieve the markup you're trying to achieve (<p> if array.empty?, <ul> if array.present?). Besides, the HAML you show in your question is the best way to tell the story behind your code, which means it will be more readable and maintainable to other developers, so I don't know why you would want to refactor into something more cryptic.
I don't think there is a much more elegant or readable way to write this. Any way of combining an iteration with a condition will just result in black-boxed code, meaning the condition will most likely end up hidden in an Array extension.
If array is empty, then it will not be iterated, so the each block does not need to be conditioned. Since the return value of each is the receiver, you can put the each block within the empty? condition.
if (array.each do |el|
  # process el
end).empty?
  # process empty array
end
Assuming that "process empty array" leaves it empty after processing, you can leave out the else:
if array.empty?
  # process empty array
end
array.each do |el|
  # process el
end
or in one line:
array.empty? ? process_empty_array : array.each { |el| process_el }
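A runnable sketch of the "each returns its receiver" idea, with the two branches filled in. The render helper and its output strings are hypothetical; it collects lines in an array instead of printing so the result is easy to inspect:

```ruby
# Hypothetical helper that builds output lines instead of printing them.
def render(array)
  lines = []
  # each returns its receiver, so empty? is asked of the original array
  lines << "No result" if array.each { |el| lines << "item: #{el}" }.empty?
  lines
end

p render([])
p render([1, 2])
```

The block never runs for an empty array, so only the empty branch adds a line in that case.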
And if the array may be nil, we can fall back to an empty array:
if (array || []).each do |x|
  #...
  puts "x",x
end.empty?
  puts "empty!"
end
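I saw some people asking how to handle this for nil cases.
The trick is to convert the value to a string. nil converted to a string becomes an empty string, and an empty string stays empty.
nil.to_s.empty?
"".to_s.empty?
Both return true. Note that this only holds for strings: [].to_s is "[]", which is not empty, so for an array nil.to_a.empty? is the equivalent check.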

Is there an implicit keyword in this Ruby Array map code?

Is there a keyword I can use to explicitly tell the map function what the result of that particular iteration should be?
Consider:
a = [1,2,3,4,5]
a.map do |element|
  element.to_s
end
In the above example element.to_s is implicitly the result of each iteration.
There are some situations where I don't want to rely on using the last executed line as the result, I would prefer to explicitly say what the result is in code.
For example,
a = [1,2,3,4,5]
a.map do |element|
  if some_condition
    element.to_s
  else
    element.to_f
  end
end
Might be easier for me to read if it was written like:
a = [1,2,3,4,5]
a.map do |element|
  if some_condition
    result_is element.to_s
  else
    result_is element.to_f
  end
end
So is there a keyword I can use in place of result_is?
return will return from the calling function, and break will stop the iteration early, so neither of those is what I'm looking for.
The last thing left on the stack is automatically the result of a block being called. You're correct that return would not have the desired effect here, but overlook another possibility: Declaring a separate function to evaluate the entries.
For example, a reworking of your code:
def function(element)
  if (some_condition)
    return element.to_s
  end
  element.to_f
end
a.map do |element|
  function(element)
end
There is a nominal amount of overhead on calling the function, but on small lists it should not be an issue. If this is highly performance sensitive, you will want to do it the hard way.
Yes, there is, it's called next. However, using next in this particular case will not improve readability. On the contrary, it will a) confuse the reader and b) give him the impression that the author of that code doesn't understand Ruby.
The fact that everything is an expression in Ruby (there are no statements) and that every expression evaluates to the value of the last sub-expression in that expression is fundamental Ruby knowledge.
Just like return, next should only be used when you want to "return" from the middle of a block. Usually, you only use it as a guard clause.
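For reference, a small sketch of what next with a value looks like when used as a guard clause inside map. The odd?/even split is only a made-up stand-in for some_condition:

```ruby
a = [1, 2, 3, 4, 5]

result = a.map do |element|
  # next ends this iteration early; its argument becomes the block's value
  next element.to_s if element.odd?
  element.to_f
end

p result
```

Odd elements leave the block early as strings, while the rest fall through to the last expression and become floats.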
map collects the value of the last expression evaluated in the block into the resulting array. Your last example is very similar to the following, which follows the expected behavior:
a = [1,2,3,4,5]
a.map do |element|
  result = if some_condition
    element.to_s
  else
    element.to_f
  end
  result
end
No, there is no language keyword in ruby you can use to determine the result mapped into the resulting array before executing other code within the iteration.
You may assign a variable which you then return when some other code has been executed:
a.map do |element|
  result = some_condition ? element.to_s : element.to_f
  #do something else with element
  result
end
Keep in mind the reason for ruby not providing a keyword for this kind of code is that these patterns tend to have a really low readability.
