What is this regex replacing? - ruby

I have this line in a Ruby file loading program:
row_hash.map{|k,v| v.gsub!(/\A"|"\Z/, '').try(:strip!) if !v.nil? }
I remember adding it, though the reason escapes me. I know that \A and \Z are the start and end of a string, respectively.
I've written regexes intermittently for 15 years, but the "|" is what's really mystifying me.

It strips quotes from strings.
This regex suffers from leaning toothpick syndrome. We can ease that by using %r, balanced delimiters, and extended formatting to ignore whitespace.
%r{ \A" | "\Z }x
It matches a quote at the beginning of the string, or one at the end (or just before a final newline).
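A quick check of the pattern on a few made-up strings (plain core Ruby, no Rails) shows which quotes it removes:

```ruby
# Leading quote, or trailing quote (optionally followed by one final "\n").
QUOTES = %r{ \A" | "\Z }x

'"hello"'.gsub(QUOTES, '')       # => "hello"
"\"hello\"\n".gsub(QUOTES, '')   # => "hello\n"  (\Z matches before the final newline)
'say "hi"'.gsub(QUOTES, '')      # => "say \"hi"  (no leading quote, so only the last one goes)
```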
So looking at it all together...
v.gsub!( %r{ \A" | "\Z }x, '' ).try(:strip) if !v.nil?
gsub! replaces every match, not just the first. So it will match quotes at the beginning and end of v and replace them with nothing, modifying v in place. The end result is that v is stripped of its leading and trailing quotes.
Then there's the blah.try(:strip). That's a Rails extension which is roughly equivalent to...
blah.strip if blah
Since gsub! returns nil if nothing matched, v will be stripped only if it was in quotes: the strip happens after the quotes have been removed, and only if there were quotes to remove. I suspect this is not the intended behavior.
However, strip doesn't alter v in place, so it probably has no effect unless you're using the return value of map, which would make this even more complicated. You probably want try(:strip!).
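The return-value rules this analysis relies on can be checked in plain Ruby. A minimal sketch with made-up strings (unary + is used only to get mutable copies of the literals):

```ruby
quoted = +'" padded "'
plain  = +' padded '

quoted_result = quoted.gsub!(/\A"|"\Z/, '')  # returns the receiver itself, now " padded "
plain_result  = plain.gsub!(/\A"|"\Z/, '')   # => nil (no match, so no change)

copy = quoted.strip   # => "padded", but quoted itself still holds " padded "
quoted.strip!         # strips quoted in place
quoted                # => "padded"
```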
Finally, if !v.nil? means all of that only happens if v isn't nil. Putting it at the end of an already complicated statement makes things even harder to understand.
This is a bit over-complicated for one line. It would be better if the nil check were done separately and the whole thing properly spaced out. I've also decided to use an if condition instead of try to make it obvious that the stripping only happens if the gsub! matches; I don't think that's the desired behavior and want it to be really obvious to anyone reading it.
row_hash.map { |_, v|
  next if v.nil?
  if v.gsub!( %r{ \A" | "\Z }x, '' )
    v.strip!
  end
}
Finally, since the behavior is really specific and finicky (and probably subtly wrong), the inner portion should be turned into a method so it can be named, documented and tested.
row_hash.map { |_,v| v.strip_quotes! }
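A minimal sketch of such a method, under two assumptions: the name strip_quotes! is taken from the line above, and it is written as a plain method that takes the string rather than as a monkey-patched String method. It deliberately keeps the original (probably unintended) behavior of stripping whitespace only when a quote was removed, so the quirk is at least named and testable. Since the return value of map is never used, each_value is used instead:

```ruby
# Hypothetical helper (name from the answer; signature is an assumption).
# Removes a leading and/or trailing double quote from +str+ in place and then,
# only if a quote was actually removed, strips surrounding whitespace too,
# mirroring the original one-liner. Returns +str+ if it changed, else nil.
def strip_quotes!(str)
  return nil if str.nil?
  return nil unless str.gsub!(/\A"|"\Z/, '')

  str.strip!
  str
end

row_hash = { name: +'" Alice "', city: +' Paris ', note: nil }
row_hash.each_value { |v| strip_quotes!(v) }
row_hash[:name]  # => "Alice"
row_hash[:city]  # => " Paris " (no quotes, so it is not stripped either)
```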

It replaces the quote character at the start and end of the string. It ignores other occurrences of the character. Here's a sample of how the regex works.
http://rubular.com/r/pVMbQ9aqSl
"|" does not mean that the pipe is quoted. It is alternation: the regex matches either \A" (the start of the string followed by ") or "\Z (" followed by the end of the string).
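Replacing each match with a visible character instead of an empty string makes the two alternatives easy to see; the sample string here is made up:

```ruby
# Replace matches with "_" instead of "" so the matched quotes are visible.
marked = '"a"b"'.gsub(/\A"|"\Z/, '_')
marked  # => "_a\"b_"  (the middle quote matches neither alternative)
```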
Let me know if this helps.

Related

Match & include? method

My code is about a robot that has 3 possible answers (depending on what you put in the message).
One of these answers depends on whether the input is a question, and to detect that, I think it has to find the "?" symbol in the string.
Should I use the match method, or include?
This code is going to be included in a loop that may answer in 3 possible ways.
Example:
puts "whats your meal today?"
answer = gets.chomp
answer.include? "?"
or
answer.match(/\?/)
Take a look at String#end_with? I think that is what you should use.
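A small comparison on made-up inputs shows why end_with? fits better than include? for spotting a question:

```ruby
answer = "whats your meal today?"
answer_ends   = answer.end_with?("?")   # => true
answer_has    = answer.include?("?")    # => true

mixed = "is it? no, pasta"
mixed_ends    = mixed.end_with?("?")    # => false
mixed_has     = mixed.include?("?")     # => true (a "?" anywhere counts)
```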
Use String#match? instead
String#chomp only removes a trailing line ending (\n, \r\n or \r) from a String, and neither String#chomp nor String#end_with? will handle certain edge cases, such as strings with whitespace characters (including newlines) after the question mark. Instead, use a regular expression with String#match?. For example:
print "Enter a meal: "
answer = gets.chomp
answer.match?(/\?\s*\z/m)
The Regexp literal /\?\s*\z/m makes match? return true if the (possibly multi-line) String in answer contains:
a literal question mark (which is why it's escaped)...
followed by zero or more whitespace characters...
anchored to the absolute end of the string (\z). Because \s also matches \n and \r, trailing line endings such as \n or \r\n are still accepted, although #chomp will generally have removed them already.
This will be more robust than your current solution, and will handle a wider variety of inputs while being more accurate at finding strings that end with a question mark without regard to trailing whitespace or line endings.
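A sketch comparing the two approaches on a few made-up inputs. The /m flag is dropped here on the assumption that it is not needed: in Ruby, /m only changes what . matches, and this pattern contains no dot.

```ruby
QUESTION = /\?\s*\z/   # "?" followed only by optional whitespace up to the very end

inputs = ["Lunch?", "Lunch?  ", "Lunch?\n", "Lunch? no"]
by_regex     = inputs.map { |s| s.match?(QUESTION) }   # => [true, true, true, false]
by_end_with  = inputs.map { |s| s.end_with?("?") }     # => [true, false, false, false]
```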

Working with Ruby class: Capitalizing a string

I'm trying to get my head around how to work with classes in Ruby and would really appreciate some insight on this area. Currently, I've got a rather simple task: convert a string so that each word starts with a capital letter. For example:
Not Jaden-Cased: "How can mirrors be real if our eyes aren't real"
Jaden-Cased: "How Can Mirrors Be Real If Our Eyes Aren't Real"
This is my code currently:
class String
  def toJadenCase
    split
    capitalize
  end
end
#=> usual case: split.map(&:capitalize).join(' ')
Output:
Expected: "The Moment That Truth Is Organized It Becomes A Lie.",
instead got: "The moment that truth is organized it becomes a lie."
I suggest you not pollute the core String class with the addition of an instance method. Instead, just add an argument to the method to hold the string. You can do that as follows, by downcasing the string then using gsub with a regular expression.
def to_jaden_case(str)
str.downcase.gsub(/(?<=\A| )[a-z]/) { |c| c.upcase }
end
to_jaden_case "The moMent That trUth is organized, it becomes a lie."
#=> "The Moment That Truth Is Organized, It Becomes A Lie."
Ruby's regex engine performs the following operations:
(?<=\A| ) : positive lookbehind asserting that the following match is immediately preceded by the start of the string or a space
[a-z]     : match a lowercase letter
(?<=\A| ) can be replaced with the negative lookbehind (?<![^ ]), which asserts that the match is not preceded by a character other than a space.
Notice that by using String#gsub with a regular expression (unlike the split-process-join dance), extra spaces are preserved.
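A sketch of the negative-lookbehind variant alongside the split-process-join version; split_join_case is a hypothetical name used only for comparison, and the double space in the sample is deliberate:

```ruby
# Variant using the negative lookbehind (?<![^ ]) described above.
def to_jaden_case(str)
  str.downcase.gsub(/(?<![^ ])[a-z]/) { |c| c.upcase }
end

# Hypothetical name for the split-process-join approach, for comparison only.
def split_join_case(str)
  str.split.map(&:capitalize).join(' ')
end

s = "how  can mirrors be real"   # note the double space after "how"
to_jaden_case(s)    # => "How  Can Mirrors Be Real"  (spacing preserved)
split_join_case(s)  # => "How Can Mirrors Be Real"   (spaces collapsed)
```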
When spaces are to be matched by a regular expression, one often sees whitespace (\s) matched instead. Here, for example, /(?<=\A|\s)[a-z]/ works fine, but matching whitespace sometimes leads to problems, mainly because \s also matches newlines (\n), as well as tabs and a few other characters. My advice is to match space characters if spaces are what you mean. If tabs are to be matched as well, use a character class ([ \t]).
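A made-up two-line string shows the difference between matching a space and matching \s after a newline:

```ruby
text = "first line\nsecond line"

spaces_only = text.gsub(/(?<=\A| )[a-z]/) { |c| c.upcase }   # => "First Line\nsecond Line"
any_ws      = text.gsub(/(?<=\A|\s)[a-z]/) { |c| c.upcase }  # => "First Line\nSecond Line"
```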
Try (reopening String, as in the question):
class String
  def toJadenCase
    split.map(&:capitalize).join(' ')
  end
end

Generating a character class

I'm trying to censor letters in a word with word.gsub(/[^#{guesses}]/i, '-'), where word and guesses are strings.
When guesses is "", I get this error: RegexpError: empty char-class: /[^]/i. I could handle such cases with an if/else statement, but can I add something to the regex to make it work in one line?
Since you are only matching (or not matching) letters, you can add a non-letter character to your regex, e.g. # or %:
word.gsub(/[^%#{guesses}]/i, '-')
If guesses is empty, the regex is still valid, and since % never appears in a word, adding it to the character class has no visible effect.
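A quick check of the sentinel trick on a made-up word, assuming guesses only ever contains letters (otherwise it should go through Regexp.escape first):

```ruby
word = "Ruby"

# With no guesses the class is [^%], so every letter is censored.
censored = ["", "ub"].map { |guesses| word.gsub(/[^%#{guesses}]/i, '-') }
censored  # => ["----", "-ub-"]
```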
You have two options. One is to skip the substitution entirely when guesses is empty:
unless guesses.empty?
  word.gsub(/[^#{Regexp.escape(guesses)}]/i, '-')
end
Although that's not exactly your intention, it's really the safest plan here and the clearest in terms of code.
Or you could use the tr method instead, though again only for non-empty strings, so this could be substituted inside the unless block:
word.tr('^' + guesses.downcase + guesses.upcase, '-')
Generally tr performs better than gsub if used frequently. It also doesn't require any regular-expression escaping, although tr itself treats ^, - and \ specially.
Since tr treats a lone ^ as a literal character, an empty guesses needs special handling. You can use an embedded ternary, but that makes it considerably harder to see what's going on:
word.tr(guesses.empty? ? '' : ('^' + guesses.downcase + guesses.upcase), '-')
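With the ternary above, an empty guesses passes an empty character set to tr, so nothing is replaced and the whole word stays visible. If every letter should instead be hidden until something is guessed, a sketch along these lines may be closer to the intent; censor is a hypothetical helper, and it assumes guesses contains only letters:

```ruby
# Hypothetical helper: hide every character of +word+ that is not among
# +guesses+ (case-insensitive). Assumes +guesses+ holds only letters,
# because tr treats "^", "-" and "\" specially.
def censor(word, guesses)
  # Nothing guessed yet: hide all ASCII letters.
  return word.tr('a-zA-Z', '-') if guesses.empty?

  word.tr('^' + guesses.downcase + guesses.upcase, '-')
end

censor("Ruby", "")    # => "----"
censor("Ruby", "ub")  # => "-ub-"
```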
This may look somewhat similar to tadman's answer.
Perhaps you should keep the string that represents what you want to hide, instead of what you want to show. Let's call it remains. Then it would be as easy as:
word.tr(remains.upcase + remains.downcase, "-")
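A sketch of how remains might be maintained; the setup (starting with every distinct letter of the word hidden, then deleting each correct guess) is an assumption, not part of the answer:

```ruby
word    = "Ruby"
remains = word.downcase.chars.uniq.join   # assumed setup: every letter starts hidden ("ruby")
remains = remains.delete("u")             # the player correctly guesses "u"

shown = word.tr(remains.upcase + remains.downcase, "-")
shown  # => "-u--"
```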

In Ruby, what's the easiest way to "chomp" at the start of a string instead of the end?

In Ruby, I sometimes need to remove the newline character at the beginning of a string. Currently I do the following, and I want to know the best way to do this. Thanks.
s = "\naaaa\nbbbb"
s.sub!(/^\n?/, "")
lstrip seems to be what you want (assuming trailing white space should be kept):
>> s = "\naaaa\nbbbb" #=> "\naaaa\nbbbb"
>> s.lstrip #=> "aaaa\nbbbb"
From the docs:
Returns a copy of str with leading whitespace removed. See also String#rstrip and String#strip.
http://ruby-doc.org/core-1.9.3/String.html#method-i-lstrip
strip removes both leading and trailing whitespace:
s = "\naaaa\nbbbb"
s.strip!
A little hack to chomp a leading newline:
str = "\nmy string"
chomped_str = str.reverse.chomp.reverse
To be precise, chomp doesn't only delete line endings from the end of a string; given an argument, it can delete an arbitrary suffix.
If the prefix counterpart of that is what you need, you can use String#delete_prefix (Ruby 2.5+):
"\naaaa\nbbbb".delete_prefix("\n")
Unlike strip, this works for an arbitrary prefix, just as chomp does for a suffix.
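A short comparison of the prefix and suffix variants on made-up strings; note that each removes one occurrence only:

```ruby
s = "\n\naaaa\nbbbb"
one_newline = s.delete_prefix("\n")         # => "\naaaa\nbbbb"
one_x_front = "xxabc".delete_prefix("x")    # => "xabc"
one_x_back  = "abcxx".chomp("x")            # => "abcx", the suffix counterpart
```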
So, just for a bit of clarification, there are three ways that you can go about this: sub, reverse.chomp.reverse and lstrip.
I'd recommend against sub because it's a bit less readable, and because you need a regular expression for something that's fairly simple.
So then you're down to reverse.chomp.reverse and lstrip. Most likely you want lstrip, which is probably a bit faster, but keep in mind that the strip operations are not the same as the chomp operations: lstrip removes all leading whitespace, newlines included:
"\n aaa\nbbb".reverse.chomp.reverse # => " aaa\nbbb"
"\n aaa\nbbb".lstrip # => "aaa\nbbb"
If you want to make sure you only remove one character and that it's definitely a newline, use the reverse.chomp.reverse solution. If you consider all leading newlines and whitespace garbage, go with lstrip.
The one case I can think of for using regular expressions would be if you have an unknown number of \rs and \ns at the beginning and want to trim them all but avoid touching any other whitespace. You could use a loop and other String methods for trimming, but it would just be uglier. The performance implications don't really matter that much.
s.sub(/^[\n\r]*/, '')
This removes leading newlines (carriage returns and line feeds, as in chomp), not any whitespace.
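A sketch contrasting this with lstrip on a made-up string. It uses \A, which matches only at the very start of the string, instead of ^, which also matches after any newline; with * and sub the two behave the same here, but with + the ^ version could remove an interior newline:

```ruby
s = "\r\n\n  indented\nrest"

only_line_breaks = s.sub(/\A[\r\n]+/, '')   # => "  indented\nrest" (leading spaces kept)
all_whitespace   = s.lstrip                 # => "indented\nrest"
```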
Not sure if it's the best way but you could try:
s.reverse.chomp.reverse
if you want to leave the trailing newline (if it exists).
This should work for you: s.strip.
A way to do this for whitespace or non-whitespace characters is like this:
s = "\naaaa\nbbbb"
s.slice!("\n") # returns "\n" and removes the first newline from s, wherever it occurs (not only at the start)
puts s # shows s has the first newline removed
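A sketch on made-up strings showing that caveat, plus one way (an assumption, not from the answer) to remove the newline only when it is the first character:

```ruby
s = +"aaaa\nbbbb"
s.slice!("\n")   # => "\n"
s                # => "aaaabbbb" (an interior newline was removed)

t = +"\naaaa\nbbbb"
t.slice!(0) if t.start_with?("\n")   # only touch a newline at the very start
t                # => "aaaa\nbbbb"
```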

ruby global variable dollar sign semicolon ($;) regex equivalent

I have a string "\nbed.bed_id,\nbed.bed_label,\nbed.room_id,\nbed.pool_bed, nbed.record_state\n" and I need to split it on whitespace and commas.
I tried split(/,?\s+/) which works but also leaves a "" at the beginning.
Using split($;) doesn't. What I'm looking for is something like split(/,?$;/). Is there a way to retain the default behavior and just add to it?
(P.S. I know I can do split[1..-1]; there are so many ways to do things in Ruby.)
Update:
My issue was with $;: I wasn't really sure what it was and thought it had a special meaning, because as a variable irb> $; #=> nil. Maybe I missed it, or the documentation has since been updated, but ruby-doc.org says: "If pattern is omitted, the value of $; is used. If $; is nil (which is the default), str is split on whitespace as if ' ' were specified."
Also, the name $; comes from Perl, where it is the subscript separator ($SUBSEP); in Ruby it is the default separator used by split. A further explanation of why the leading whitespace is dropped with $; is here: Why is split(' ') trying to be (too) smart?
You can't stop split() from returning an empty element at the start in this case (trailing empty strings are dropped by default).
Try rejecting empty strings from the array:
string.split(/,?\s+/).reject(&:empty?)
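A sketch of the options on a shortened version of the string from the question:

```ruby
str = "\nbed.bed_id,\nbed.bed_label,\nbed.room_id\n"   # shortened sample

with_blank = str.split(/,?\s+/)                   # => ["", "bed.bed_id", "bed.bed_label", "bed.room_id"]
rejected   = str.split(/,?\s+/).reject(&:empty?)  # => ["bed.bed_id", "bed.bed_label", "bed.room_id"]
stripped   = str.strip.split(/,?\s+/)             # same result, by trimming the ends first
awk_style  = str.split                            # => ["bed.bed_id,", "bed.bed_label,", "bed.room_id"] (commas kept)
```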
With split alone, you can do it like this:
str = "\nbed.bed_id,\nbed.bed_label,\nbed.room_id,\nbed.pool_bed, nbed.record_state\n"
st = str.split(/,?\s+/)
st.shift
st
