Unused regex captures in Ruby

I have a script that processes the contents of a file from a CAD program for use in another CAD program. Can the unused variables in the block be skipped, or written around? The script works fine with them in place; I was just curious if there was a cleaner way to write it. Thank you.
string = IO.read("file.txt")
string.scan(/regex/m) {|a,b,c,d,e,f,g|
# captures 7 items, I use 1-4, & 6 below, skipping 5 & 7
print a, b+".ext", c.to_f/25400000, d.to_f/25400000, f,"\n"
}
My question is this: if I'm not using them all, do I still have to declare them all for it to work properly and to keep the ones I use in the correct order?
Elements 5 & 7 may be used at a later time, but for now, they are just part of the regex, for future flexibility.

Since you are receiving the captures as block variables, you cannot skip positions. The problem is with your regex. If you have a group that you don't want to capture, you should use a non-capturing group (?: ) instead of a capturing group ( ). So change the fifth and the seventh ( ) in your regex to (?: ). If you are using Ruby 1.9, or the Oniguruma regex engine on Ruby 1.8.7, you can also use named captures: for example, use (?<foo> ) in the regex and refer to the captured string in the block as $~[:foo]. (Named captures only become local variables such as foo when a regex literal is matched with =~, not inside a scan block.)
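A minimal sketch of both suggestions, assuming a made-up input format (the question's real regex and CAD file layout are not shown). The fifth and seventh fields are matched with (?: ) so scan yields only five values; the named-capture variant reads each field from $~, which scan sets to the current MatchData inside the block. The field names id, file, x, y and mat are hypothetical.

```ruby
# Hypothetical input; the real CAD file format is not shown in the question.
data = "P1 bolt 254000000 508000000 skip steel tail\n" \
       "P2 nut 127000000 254000000 skip brass tail\n"

# Fields 5 and 7 use (?: ) so they are matched but not captured,
# which means scan yields five values per match instead of seven.
re = /(\w+) (\w+) (\d+) (\d+) (?:\w+) (\w+) (?:\w+)/

rows = []
data.scan(re) do |a, b, c, d, f|
  rows << [a, b + ".ext", c.to_f / 25400000, d.to_f / 25400000, f]
end

# Named-capture variant (Ruby 1.9+). Inside the scan block, $~ holds the
# MatchData for the current match, so captures can be read by name.
named_re = /(?<id>\w+) (?<file>\w+) (?<x>\d+) (?<y>\d+) \w+ (?<mat>\w+) \w+/

named_rows = []
data.scan(named_re) do
  m = $~
  named_rows << [m[:id], m[:file] + ".ext", m[:x].to_f / 25400000,
                 m[:y].to_f / 25400000, m[:mat]]
end

rows.each { |r| puts r.join(" ") }
```

Both loops produce the same rows; the named version no longer depends on the position of each group in the pattern.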

You could use an array instead of an explicit list of variables and then pick things out of the array by index:
string.scan(/regex/m) { |a|
print a[0], a[1] + ".ext", a[2].to_f / 25400000, a[3].to_f / 25400000, a[5], "\n"
}
Either that or rework your regular expression to only capture what you need.
You can use the same variable multiple times in the list, so just renaming the things you're not using to unused would probably be the simplest choice:
string.scan(/regex/m) { |a, b, c, d, unused, f, unused|
print a, b + ".ext", c.to_f / 25400000, d.to_f / 25400000, f, "\n"
}
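At least this way it is (or should be) obvious that you're not using the fifth and seventh captures. However, this doesn't work in 1.9, where duplicate parameter names are a syntax error, so in 1.9 you'd have to use unused1 and unused2 (or the name _, which is the one name that may be repeated).
An ideal balance would be to use 1.9's named capture groups, but scan doesn't yield them by name; inside the block you have to read them from $~ (for example $~[:foo]).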
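In Ruby 1.9 and later, repeating a block parameter name is a syntax error ("duplicated argument name"), except for _, which may appear more than once. A short sketch of the "rename the unused ones" idea using _, with a hypothetical seven-group regex and input line:

```ruby
# Hypothetical input and regex with seven capturing groups.
data = "P1 bolt 254000000 508000000 skip steel tail\n"
re = /(\w+) (\w+) (\d+) (\d+) (\w+) (\w+) (\w+)/

out = []
# `_` may be repeated in a parameter list; captures 5 and 7 are ignored.
data.scan(re) do |a, b, c, d, _, f, _|
  out << [a, b + ".ext", c.to_f / 25400000, d.to_f / 25400000, f]
end
p out
```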

Related

Ruby: Add value to Variable and Clamp/limit the Variable in one line

Suppose I have several arrays in Ruby to which I add or subtract values, and afterwards I limit their range, like so:
array[x][y]=array[x][y]+1
array[x][y]=array[x][y].clamp (0..99)
Since I have many different arrays with rather long (index) names, and to avoid writing those names twice in one line, I'd like to achieve something like
array[x][y]+=1.clamp (0..99)
Which is accepted by the interpreter, but doesn't work. It adds, but the value in the array does not get clamped.
Splitting it over two lines
array[x][y]+=1
array[x][y].clamp(0..99)
does also add, but doesn't clamp.
Is there any solution for this to fit the entire command in one line?
Many thanks!
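The #clamp method doesn't take a range as a single argument in Ruby versions before 2.7; it takes two arguments representing the min and max. Also, #clamp does not mutate the object it's called on, and in array[x][y] += 1.clamp(0..99) it is called on 1, not on the sum, because the method call binds more tightly than +=. Clamp the sum and assign the result back: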
array[x][y] = (array[x][y] + 1).clamp(0, 99)
Note that because it's valid to call a method without parentheses, if parentheses are used around an argument list, there should not be any space between the method name and the parentheses. E.g. 1.clamp(0..4) rather than 1.clamp (0..4).
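A small runnable sketch of the pitfall and the fix, using Comparable#clamp(min, max) (Ruby 2.4+). The add_clamped helper is hypothetical, just one way to avoid writing a long array reference twice.

```ruby
array = [[0, 0], [0, 99]]
x, y = 1, 1

# Pitfall from the question: 1.clamp(0, 99) is evaluated first (giving 1),
# and only then added, so the stored value is not clamped.
array[x][y] += 1.clamp(0, 99)
unclamped = array[x][y]   # 100

# Fix: clamp the sum, then assign it back.
array[x][y] = 99
array[x][y] = (array[x][y] + 1).clamp(0, 99)
fixed = array[x][y]       # 99

# Hypothetical helper so the long reference is written only once per call.
def add_clamped(grid, i, j, delta, min, max)
  grid[i][j] = (grid[i][j] + delta).clamp(min, max)
end

add_clamped(array, x, y, -150, 0, 99)
p [unclamped, fixed, array[x][y]]
```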

Selecting key words in a string (that are included in an Array) to change their format in Ruby

I have a big string (text) and an Array of strings (key_words) as below:
text = 'So in this election, we cannot sit back and hope that everything works out for the best. We cannot afford to be tired or frustrated or cynical. No, hear me. Between now and November, we need to do what we did eight years ago and four years ago…'
key_words = ['frustrated', 'tired', 'hope']
My objective is to print each word in ‘text’ while changing the colour and case of the words that are included in key_words. I’ve been able to do that by doing:
require 'colorize'
text.split(/\b/).each do |x|
  if key_words.include?(x.downcase)
    print x.colorize(:red)
  else
    print x
  end
end
However, since I don’t want to include many words in key_words, I want the selection to go beyond an exact match. For example, if:
key_words = ['frustrat', 'tire', 'hope'], the algorithm would select 'Frustration' and 'Frustrated', 'Tiring' and 'Tired', and 'Hope' and 'Hopeful'.
I’ve tried playing with word lengths in both the string and the array, as below, but it seems a very inefficient solution, and I’m getting very confused about how to use the .any? and .include? methods in this scenario.
key_words = ['frustrated', 'tired', 'hope']
key_words_abb = []
key_words.each { |x| key_words_abb << x.downcase[0..-2] }
text.split(/\b/).each do |x|
  if key_words_abb.include?(x.downcase[0..-2])
    print x.colorize(:red)
  else
    print x
  end
end
Since I can’t find a specific solution online I would appreciate your help.
It's worth noting that when doing repeated substitutions on strings, especially longer ones, you'll want your substitution method to be as efficient as possible. Spinning through an array of things to switch out is painfully expensive, especially as that list grows.
Here's a variation on your approach:
replacement = Regexp.new('\b%s\b' % [ Regexp.union(key_words) ])
replaced = text.gsub(replacement) do |s|
s.colorize(:red)
end
puts replaced
If you're using that substitution repeatedly you should persist the Regexp object into a constant. That avoids having to compile it for each string you're adjusting. If the list changes based on factors hard to predict, leave it like this and produce it dynamically.
One thing to note about using Ruby is it's often best to express your code as a series of transformations with output as a final step. Putting things like print in the middle of a loop complicates things unnecessarily. If you want to add an additional step to your loop you have to do a lot of extra work to move that print to a later stage. With the approach here you can just chain on the end and do whatever you want.
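The regex above still matches only whole words, while the question asks for prefix matches. A minimal sketch of one way to extend it, assuming a keyword is a word prefix and matching should ignore case: \b anchors at the start of a word, \w* consumes the rest of it, and /i makes 'frustrat' match 'Frustrated' and 'Frustration'. Note that 'tire' is not a prefix of 'Tiring'; that case would need the stem 'tir'. The red lambda is a hypothetical stand-in for colorize so the sketch runs without the gem.

```ruby
text = "We hope. Tired and Frustrated, still hopeful despite frustration."
key_words = ['frustrat', 'tire', 'hope']

# .source gives the escaped alternation ("frustrat|tire|hope"). Interpolating the
# Regexp object itself would embed (?-mix:...), which switches /i back off inside.
KEYWORD_RE = /\b(?:#{Regexp.union(key_words).source})\w*/i

# Stand-in for word.colorize(:red): wrap the word in ANSI red escape codes.
red = ->(word) { "\e[31m#{word}\e[0m" }

highlighted = text.gsub(KEYWORD_RE) { |word| red.(word) }
puts highlighted
```

As in the answer, the whole string is transformed first and printed once at the end.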

what is the elegant way to replace string without gsub?

I have some dynamic strings that contain the character X. X can appear consecutively or scattered throughout the string. I want to replace those X characters with #.
For example, for abXXcX12XX I want ab#c#12#. That is, a run of consecutive X characters must be replaced by a single #, and a lone X is also replaced by a single #.
I tried:
s = "aXX123Xc56XXX"
s.squeeze('X').gsub('X','#') # => "a#123#c56#"
Is there a more elegant or direct way to do the same operation?
I would use String#tr_s, which the documentation describes as follows:
Processes a copy of str as described under String#tr, then removes duplicate characters in regions that were affected by the translation.
s = "aXX123Xc56XXX"
s.tr_s('X','#') # => "a#123#c56#"
Not sure why you wouldn't use gsub here?
Regexes seem to work pretty well:
"aXX123Xc56XXX".gsub(/X+/, "#")
=> "a#123#c56#"
The reason this works is that /X+/ will match one or more of the X character, so multiple X in a row will generate only one match and be replaced by one #.
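A quick check, on the two sample strings from the question, that the three approaches discussed here (squeeze plus gsub, gsub with /X+/, and tr_s) produce the same result:

```ruby
samples = {
  "abXXcX12XX"    => "ab#c#12#",
  "aXX123Xc56XXX" => "a#123#c56#",
}

results = samples.map do |input, expected|
  [
    expected,
    input.squeeze('X').gsub('X', '#'), # the question's version
    input.gsub(/X+/, '#'),             # one match per run of X
    input.tr_s('X', '#'),              # translate, then squeeze the translated run
  ]
end

results.each { |row| puts row.join("  ") }
```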
While both gsub and tr_s will accomplish the task, here's the compelling reason to use tr_s:
require 'fruity'
STRING = 'aXX123Xc56XXX' * 1000
compare do
  using_tr_s { STRING.tr_s('X', '#') }
  using_gsub { STRING.gsub(/X+/, '#') }
end
Which, after running on my laptop, results in:
Running each test 16 times. Test will take about 1 second.
using_tr_s is faster than using_gsub by 5x ± 0.1
The regular expression engine has seen a lot of speedups, but it will still be beaten for non-anchored lookups. If there were a way to tell it to start at the beginning or end of the string, we'd see it speed up greatly, and for some search/replace actions I've seen it outrun everything else. The pattern used is critical; poorly written ones cripple the engine, so be careful what you use.
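fruity is a third-party gem. If you only have the standard library, a rougher comparison can be made with Benchmark, which prints raw timings without fruity's statistical comparison. This is a sketch; the iteration count is arbitrary, and no particular speedup is implied since timings depend on the machine and Ruby version.

```ruby
require 'benchmark'

INPUT = 'aXX123Xc56XXX' * 1000
ITERATIONS = 200

# Timing only makes sense if both calls produce the same string.
raise 'outputs differ' unless INPUT.tr_s('X', '#') == INPUT.gsub(/X+/, '#')

Benchmark.bm(6) do |bm|
  bm.report('tr_s') { ITERATIONS.times { INPUT.tr_s('X', '#') } }
  bm.report('gsub') { ITERATIONS.times { INPUT.gsub(/X+/, '#') } }
end
```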

Good style for splitting lengthy expressions over lines

If the following is not the best style, what is the best style for the equivalent expression?
if (some_really_long_expression__________ && \
some_other_really_long_expression)
The line continuation feels ugly. But I'm having a hard time finding a better alternative.
The parser doesn't need the backslashes in cases where the continuation is unambiguous. For example, using Ruby 2.0:
if true &&
   true &&
   true
  puts true
end
#=> true
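Applied to the shape in the question, a line that ends with && leaves the expression incomplete, so the parser keeps reading and the backslash can simply be dropped. The two variables below are hypothetical stand-ins for the long expressions.

```ruby
# Hypothetical stand-ins for the question's long expressions.
some_really_long_expression = [3, 5, 7].all?(&:odd?)
some_other_really_long_expression = "line continuation".include?("line")

# Ending the line with && means the condition continues on the next line;
# no backslash is needed.
if (some_really_long_expression &&
    some_other_really_long_expression)
  result = :continued
end

p result
```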
The following are some more-or-less random thoughts about the question of line length from someone who just plays with Ruby and has had no training as a software engineer, so consider yourself forewarned.
I find the problem of long lines is often more the number of characters than the number of operations. The former can be reduced by (drum-roll) shortening variable names and method names. The question, of course, is whether the application of a verbosity filter (aka babbling, prattling or jabbering filter) will make the code harder to comprehend. How often have you seen something fairly close to the following (without \)?
total_cuteness_rating = cats_dogs_and_pigs.map {|animal| \
cuteness_calculation(animal)}.reduce {|cuteness_accumulator, \
cuteness_per_animal| cuteness_accumulator + cuteness_per_animal}
Compare that with:
tot_cuteness = pets.map {|a| cuteness(a)}.reduce(&:+)
Firstly, I see no benefit of long names for local variables within a block (and rarely for local variables in a method). Here, isn't it perfectly obvious what a refers to in the calculation of tot_cuteness? How good a memory do you need to remember what a is when it is confined to a single line of code?
Secondly, whenever possible, use the short (symbol-to-proc) form for enumerable methods that take a block (e.g., reduce(&:+)). This allows us to comprehend what's going on in microseconds, here as soon as our eyes latch onto the +. The same goes for map(&:to_i), map(&:to_s) or map(&:to_f). True, reduce {|tot, e| tot + e} isn't much longer, but it forces the reader's brain to decode two variables as well as the operator, when + is really all it needs.
Another way to shorten lines is to avoid long chains of operations. That comes at a cost, however. As far as I'm concerned, the longer the chain, the better. It reduces the need for temporary variables, reduces the number of lines of code and, possibly most importantly, allows us to read across a line, as most humans are accustomed to, rather than down the page. The above line of code reads, "To calculate total cuteness, calculate each pet's cuteness rating, then sum those ratings". How could it be more clear?
When chains are particularly long, they can be written over multiple lines without using the line-continuation character \:
array.each {|e| blah, blah, ..., blah
  .map {|a| blah, blah, ..., blah
    .reduce {|i| blah, blah, ..., blah }
  }
}
That's no less clear than separate statements. I think this is frequently done in Rails.
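A runnable version of the tot_cuteness chain split across lines, relying on the fact that (since Ruby 1.9) a line starting with "." continues the previous expression. The CUTENESS scores and the cuteness method are hypothetical, since the answer does not define them.

```ruby
# Hypothetical scores; the answer's cuteness(a) is not defined anywhere.
CUTENESS = { cat: 9, dog: 8, pig: 7 }.freeze

def cuteness(animal)
  CUTENESS.fetch(animal)
end

pets = [:cat, :dog, :pig]

# Each line starting with "." continues the chain, so no backslashes are needed.
tot_cuteness = pets
  .map { |a| cuteness(a) }
  .reduce(&:+)

puts tot_cuteness
```

On Ruby 2.4 and later, .sum would do the same job as .reduce(&:+) here.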
What about the use of abbreviations? Which of the following names is most clear?
number_of_dogs
number_dogs
nbr_dogs
n_dogs
I would argue the first three are equally clear, and the last no less clear if the writer consistently prefixes variable names with n_ when that means "number of". Same for tot_, and so on. Enough.
One approach is to encapsulate those expressions inside meaningful methods. And you might be able to break it into multiple methods that you can later reuse.
Other than that, it is hard to suggest anything with the little information you gave. You might be able to get rid of the if statement using command objects or something similar, but I can't tell whether that makes sense for your code because you didn't show it.
Ismael's answer works really well in Ruby (and perhaps in other languages too) for two reasons:
Ruby has very low overhead for creating methods, since there are no type definitions
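A sketch of the "encapsulate the expressions in meaningful methods" suggestion. The question does not show its real conditions, so Order and all three predicate methods here are hypothetical; the point is that the if condition shrinks to a single, readable call and each piece can be reused and tested on its own.

```ruby
# Hypothetical domain object; the question's real expressions are not shown.
Order = Struct.new(:total, :items, :country)

def large_enough?(order)
  order.total >= 50 || order.items.size >= 3
end

def domestic?(order)
  order.country == "US"
end

# The long condition becomes one descriptive method built from smaller ones.
def eligible_for_free_shipping?(order)
  large_enough?(order) && domestic?(order)
end

order = Order.new(60, [:book], "US")
if eligible_for_free_shipping?(order)
  puts "free shipping"
end
```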
It allows you to decouple such logic for reuse or future adaptability and testing
Another option I'll toss out is to compute the logical sub-expressions first and store each result in a variable, e.g.:
# these are short logical expressions testing x, but you can apply the same idea to longer ones
number_gt_5 = x > 5
number_lt_20 = x < 20
number_eq_11 = x == 11
if (number_gt_5 && number_lt_20 && !number_eq_11)
  # do some stuff
end

string parsing optimization : ruby

I am working on a parser that is currently way too slow for my needs (about 40x slower than I would like) and would like advice on ways to increase its speed. I have tried, and am currently using, a custom regex parser, as well as a custom parser using the StringScanner class. I've heard a lot of positive comments about Treetop, and have considered combining the regexes into one huge regex that would cover all matches, but I would like some feedback from people with experience before I rewrite my parser yet again.
The basic rules of the strings that I am parsing are:
3 segments (BoL operators, message, EoL operators)
~6 BoL operators
BoL operators can be in any order
2 EoL operators
EoL operators can be in any order
The quantity of any specific operator can be 0, 1, or >1, but only one is used; the rest are removed and discarded
Operators in the 'message' section of the string are not captured / removed
Whitespace is allowed before and after operators but is not required
Some BoL operators can have whitespace in the setting
My current regex parser works by running the string through a loop that checks for BoL or EoL operators one at a time and cuts them out, ending the loop when there are no more operators of the given type, like so:
loop {
  if input =~ /^\s+/ then input.gsub!(/^\s+/, '') end
  if input =~ /regex for operator_a/
    sets operator_a
    input.gsub!(/regex for operator_a/, '')
  elsif input =~ /regex for operator_b/
    sets operator_b
    input.gsub!(/regex for operator_b/, '')
  elsif input =~ /regex for operator_c/
    sets operator_c
    etc .. etc .. etc..
  else
    break
  end
}
My question: what would be the best way to optimize this code? Treetop, another library/gem that I have not found yet, combining the loops into one huge regex, or something else?
Please restrict all answers and input to the Ruby language. I know that it is not the 'best' tool for this job, but it is the language that I use.
More specific grammar and examples, if that helps:
This is for parsing communication commands sent to a game by users; so far the only commands are say and whisper. The beginning-of-line operators accepted are ::{target}, :{adverb}, ={verb}, and #{direction of}. The end-of-line operators are {emoticon} (e.g. :D :( :)), which sets the adverb if not already set, and end-of-line punctuation, which sets the verb if not already set.
The character ' is an alias for say, and sayto is an alias for say::
Examples:
':happy::my sword=as# my helm Bol command operators work.
{:action=>:say, :adverb=>"happily", :verb=>"ask", :direction=>"my helm", :message=>"Bol command operators work."}
say yep say works
{:action=>:say, :message=>" yep say works"}
sayto my sword yep sayto works as do EoL operators!:)
{:action=>:say, :target=>"my sword", :adverb=>"happily", :verb=>"say", :message=>"yep sayto works as do EoL operators!"}
whisper::my friend : happy Bol command operators work with whisper.
{:action=>:whisper, :target=>"my friend", :adverb=>"happily", :message=>"Bol command operators work with whisper."}
whisp:happy::tinkerbell and they work in a different order.
{:action=>:whisper, :adverb=>"happily", :target=>"tinkerbell", :message=>"and they work in a different order."}
':bash=exclaim::hammer BoL operators work in this order too.
{:action=>:say, :adverb=>"bashfully", :verb=>"exclaim", :target=>"hammer", :message=>"BoL operators work in this order too."}
sayto bells =say :sad #wontwork Bol > Eol and directed !work with directional? :)
{:action=>:say, :verb=>"say", :adverb=>"sadly", :direction=>"wontwork", :message=>"Bol > Eol and directed !work with directional?"}
'all EoL removed closest to end used and reinserted. !!??!?....... :) ? :(
{:action=>:say, :adverb=>"sadly", :verb=>"ask", :message=>"all EoL removed closest to end used and reinserted?"}
Maybe this syntax is useful in your case:
emoti_convert = { ":)" => "happily", ":(" => "sadly" }
re_emoti = Regexp.union(emoti_convert.keys)
str = "It does not work :(. Oh, it does :)!"
p str.gsub(re_emoti, emoti_convert)
#=> "It does not work sadly. Oh, it does happily!"
But if you are trying to define a grammar, this is not the way to go (agreeing with #Dave Newton's comments).
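For comparison, the "combine the per-operator regexes into one" option raised in the question could be sketched as a single anchored regex with one named alternative per BoL operator, applied in a loop. This rests on simplifying assumptions that do not cover the full grammar: every operator value is a single word (so "::my sword" is not handled), the first occurrence of a repeated operator wins, and converting "happy" to "happily" is left out. parse_bol and BOL_RE are hypothetical names; MatchData#named_captures needs Ruby 2.4+.

```ruby
# One anchored regex covering all BoL operators; "::" must come before ":".
BOL_RE = /\A\s*(?:::(?<target>\w+)|:(?<adverb>\w+)|=(?<verb>\w+)|#\s*(?<direction>\w+))/

# Assumes the command word (say, whisp, ...) has already been removed.
def parse_bol(input)
  ops = {}
  rest = input
  while (m = BOL_RE.match(rest))
    # Exactly one alternative participated; the other named groups are nil.
    name, value = m.named_captures.find { |_, v| v }
    ops[name.to_sym] ||= value # first occurrence wins, later ones are discarded
    rest = m.post_match
  end
  [ops, rest.lstrip]
end

ops, message = parse_bol(":happy::tinkerbell and they work in a different order.")
p ops
p message
```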
