Ruby claims " " isn't actually a space?

I'm trying to troubleshoot why Ruby isn't splitting my string by empty spaces. For example:
[1] pry(#<irb>)> msg
=> "!iex <http://test-domain.com.au|test-domain.com.au> <mailto:first.last#test-domain.com.au|first.last#test-domain.com.au> FirstName"
[2] pry(#<irb>)> msg.split(" ")
=> ["!iex <http://test-domain.com.au|test-domain.com.au> <mailto:first.last#test-domain.com.au|first.last#test-domain.com.au> FirstName"]
[3] pry(#<irb>)> msg.include? " "
=> false
[8] pry(#<irb>)> msg.inspect
=> "\"!iex <http://test-domain.com.au|test-domain.com.au> <mailto:first.last#test-domain.com.au|first.last#test-domain.com.au> FirstName\""
[9] pry(#<irb>)>
As you can see above, my string appears to contain spaces, but the split method isn't working on it. I tried to run inspect on the string just to see if something else was being displayed, but it doesn't really make a lot of sense to me.

Either the string contains some other kind of whitespace or you're splitting on some other kind of whitespace. For example, "foo\u2002bar" looks like foo bar but contains an EN SPACE (U+2002) rather than an ASCII space.
Try msg.dump to see the special characters.
2.6.5 :008 > msg = "foo\u2002bar"
=> "foo bar"
2.6.5 :009 > msg.dump
=> "\"foo\\u2002bar\""
To split on any space or tab, including Unicode space characters such as U+2002, split on the [[:blank:]] character class.
2.6.5 :006 > msg.split(/[[:blank:]]/)
=> ["foo", "bar"]
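To find out which character is actually in the string, something like the following may help. It is a minimal sketch that assumes the string is UTF-8; the helper names odd_whitespace and split_on_any_whitespace are hypothetical, not part of Ruby, and [[:space:]] is used instead of [[:blank:]] so that newlines are covered as well.

```ruby
# Hypothetical helper: lists the code points of whitespace characters
# that are not a plain ASCII space.
def odd_whitespace(str)
  str.each_char
     .select { |c| c != " " && c =~ /[[:space:]]/ }
     .map { |c| format("U+%04X", c.ord) }
     .uniq
end

# Hypothetical helper: splits on runs of any whitespace, Unicode included.
def split_on_any_whitespace(str)
  str.split(/[[:space:]]+/)
end

msg = "foo\u2002bar baz"
p odd_whitespace(msg)            # => ["U+2002"]
p split_on_any_whitespace(msg)   # => ["foo", "bar", "baz"]
```

If odd_whitespace returns an empty array, the string only contains ordinary spaces and the problem lies elsewhere.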


Replace words in string by multiple scans

My goal is to turn any two consecutive commas into ",NA,". This means that:
str = ",,,123,,BLAH,," changes to ",NA,123,NA,BLAH,NA,"
",,," changes to ",NA,NA,"
",,,," changes to ",NA,NA,NA,"
",blah,,hi," changes to ",blah,NA,hi,"
There could be anywhere between 1 and 100,000 commas in the strings with any number of characters between the commas. My code is:
str = str.gsub!(",,",",NA,")
# => ",NA,,123,NA,BLAH,NA,"
I am running into issues because the substitution needs to happen multiple times. If I repeat the gsub! call, I hit the error "undefined method gsub! for nil", because gsub! returns the result when it substitutes something but returns nil when there is nothing to substitute.
ruby > ",,,,,,,,,,,,,,,,,,,,,,".gsub(",",",NA")
=> ",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA"
or alternately:
ruby > ",,,,,,,,,,,,,,,,,,,,,,".gsub(",","NA,")
=> "NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"
edit: To handle the use case better (didn't quite get original question):
2.2.0 :004 > str=",,,123,,BLAH,,"
=> ",,,123,,BLAH,,"
2.2.0 :005 > str.split(",")
=> ["", "", "", "123", "", "BLAH"]
2.2.0 :006 > str.split(",").map{|x|x.length == 0 ? "NA" : x}.join(",")
=> "NA,NA,NA,123,NA,BLAH"
According to your use-case (",,,123,,BLAH,," turning into ",NA,123,NA,BLAH,NA,") I'm assuming you want all commas between characters to turn into ,NA,?
This is easily done using regular expressions with gsub.
str=",,,123,,BLAH,,"
str.gsub!(/,+/,",NA,") #returns ",NA,123,NA,BLAH,NA,"
The regular expression /,+/ matches one or more consecutive commas, so each run of commas (a single comma included) is replaced by one ,NA,.
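If the goal is instead to fill every empty field between two adjacent commas, a lookahead may be closer to what the question asks. This is only a sketch under that assumption: it reproduces the ",,,", ",,,," and ",blah,,hi," examples from the question, but for ",,,123,,BLAH,," it yields two NA fields before 123 rather than the single one written in the question. The method name fill_empty_fields is hypothetical.

```ruby
# Hypothetical helper: inserts the placeholder after every comma that is
# immediately followed by another comma. The lookahead (?=,) does not consume
# the second comma, so it can start the next match.
def fill_empty_fields(str, placeholder = "NA")
  str.gsub(/,(?=,)/, ",#{placeholder}")
end

p fill_empty_fields(",,,")         # => ",NA,NA,"
p fill_empty_fields(",,,,")        # => ",NA,NA,NA,"
p fill_empty_fields(",blah,,hi,")  # => ",blah,NA,hi,"
```

Because this uses gsub rather than gsub!, it always returns a string, so the nil problem described in the question does not arise.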

Named capture in Ruby's regular expressions

I am trying to extract information from a line of text with relatively long regular expression. Below is a simplified regexp that describes the problem.
line = "Internet 10.9.68.178 127 c07b.bce9.7d41 ARPA Vlan2"
If I try to match this line directly without trying to 'save' regexp into a variable, it works very well:
[223] pry(main)> /Internet\s+(?<ipaddr>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/ =~ line
=> 0
[224] pry(main)> ipaddr
=> "10.9.68.178"
[225] pry(main)> $1
=> "10.9.68.178"
Now, when I try to do exact same thing with 'stored' version of the regexp, it fails miserably:
[226] pry(main)> ipaddr = nil # ensure that it's cleared before match
[227] pry(main)> myreg = /Internet\s+(?<ipaddr>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/
=> /Internet\s+(?<ipaddr>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/
[228] pry(main)> myreg =~ line
=> 0
[229] pry(main)> ipaddr
=> nil
[230] pry(main)> $1
=> "10.9.68.178"
I have also tried to call match method directly and it seems to work:
[231] pry(main)> myreg.match(line)
=> #<MatchData "Internet 10.9.68.178" ipaddr:"10.9.68.178">
but this means for a simple if statement I need to do something like this:
if m = myreg.match(line)
do_stuff m[:ipaddr]
end
instead of simply
if myreg =~ line
do_stuff ipaddr
end
Any ideas as to why the names are not captured correctly in this instance?
Interesting. I've looked this up in the Ruby Documentation.
It says there:
The assignment does not occur if the regexp is not a literal.
That's why /Internet\s+(?<ipaddr>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/ =~ line works, but myreg =~ line does not.
Thanks for making me learn something new. :)
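When the regexp has to live in a variable, the names are still reachable through MatchData. A minimal sketch, with hypothetical helper names extract_ip and extract_ip_via_last_match:

```ruby
# Hypothetical helper: uses Regexp#match and reads the name from the MatchData.
def extract_ip(line)
  myreg = /Internet\s+(?<ipaddr>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/
  m = myreg.match(line)
  m && m[:ipaddr]   # nil when the line does not match
end

# Hypothetical helper: keeps the =~ form; $~ holds the MatchData of the last match.
def extract_ip_via_last_match(line)
  myreg = /Internet\s+(?<ipaddr>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/
  myreg =~ line ? $~[:ipaddr] : nil
end

line = "Internet 10.9.68.178 127 c07b.bce9.7d41 ARPA Vlan2"
p extract_ip(line)                 # => "10.9.68.178"
p extract_ip_via_last_match(line)  # => "10.9.68.178"
```

The second form keeps the if myreg =~ line shape from the question, at the cost of reading the global $~ instead of a local variable.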

Escape line breaks in puts output

In IRB on Ruby 1.8.7, I have a collection of strings I'm working with that have newlines in them. When these newlines are output, I want to explicitly see the \r and \n characters within my strings. Is there some way to tell puts to escape those characters, or a method similar to puts that will do what I want?
Note that directly evaluating each string isn't satisfactory because I want to be able to do something like this:
=> mystrings.each { |str| puts str.magical_method_to_escape_special_chars }
This is\na string in mystrings.
This is another\n\rstring.
And don't want to have to do this:
=> mystrings[0]
"This is\na string in mystrings."
=> mystrings[1]
"This is another\n\rstring."
...
=> mystrings[1000]
"There are a lot of\n\nstrings!"
I can use the String#dump method:
=> mystrings.each { |str| puts str.dump }
"This is\na string in mystrings."
"This is another\n\rstring."
According to the Ruby documentation for String, String#dump:
Produces a version of str with all nonprinting characters replaced by \nnn notation and all special characters escaped.
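Both dump and inspect wrap their result in double quotes. If the quotes are unwanted, they can be sliced off. A minimal sketch; the helper name escaped is hypothetical:

```ruby
# Hypothetical helper: the escaped form of str without the surrounding quotes.
def escaped(str)
  str.inspect[1..-2]
end

mystrings = ["This is\na string in mystrings.", "This is another\n\rstring."]
mystrings.each { |str| puts escaped(str) }
# prints:
# This is\na string in mystrings.
# This is another\n\rstring.
```

Swapping inspect for dump in the helper additionally escapes non-ASCII characters.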
1.8.7 :001 > s = "hi\nthere"
=> "hi\nthere"
1.8.7 :002 > p s
"hi\nthere"

Are strings in Ruby mutable?

Consider the following code:
$ irb
> s = "asd"
> s.object_id # prints 2171223360
> s[0] = ?z # s is now "zsd"
> s.object_id # prints 2171223360 (same as before)
> s += "hello" # s is now "zsdhello"
> s.object_id # prints 2171224560 (now it's different)
It seems that individual characters can be changed without creating a new string. However, appending to the string apparently creates a new string.
Are strings in Ruby mutable?
Yes, strings in Ruby, unlike in Python, are mutable.
s += "hello" is not appending "hello" to s - an entirely new string object gets created. To append to a string 'in place', use <<, like in:
s = "hello"
s << " world"
s # => "hello world"
ruby-1.9.3-p0 :026 > s="foo"
=> "foo"
ruby-1.9.3-p0 :027 > s.object_id
=> 70120944881780
ruby-1.9.3-p0 :028 > s<<"bar"
=> "foobar"
ruby-1.9.3-p0 :029 > s.object_id
=> 70120944881780
ruby-1.9.3-p0 :031 > s+="xxx"
=> "foobarxxx"
ruby-1.9.3-p0 :032 > s.object_id
=> 70120961479860
So strings are mutable, but the += operator creates a new String, while << modifies the existing one.
To append to a String in Ruby, use <<, not +=.
If you change += to <<, your question answers itself.
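The difference between mutation and reassignment can also be checked without comparing object_id values by hand, by keeping a second reference to the original object. A minimal sketch; the method name append_identity_demo is hypothetical:

```ruby
# Hypothetical demo: returns what happens to a second reference when the
# string is appended to with << and then with +=.
def append_identity_demo
  s = String.new("foo")   # String.new returns an unfrozen string
  original = s            # second reference to the same object

  s << "bar"              # mutates the object in place
  same_after_shovel = s.equal?(original)

  s += "xxx"              # builds a new object and rebinds s
  same_after_plus = s.equal?(original)

  { same_after_shovel: same_after_shovel,
    same_after_plus: same_after_plus,
    original: original,
    s: s }
end

p append_identity_demo
# => {:same_after_shovel=>true, :same_after_plus=>false, :original=>"foobar", :s=>"foobarxxx"}
# (Ruby 3.4 and later print the hash as {same_after_shovel: true, ...})
```

The original reference sees the "bar" appended by << but not the "xxx" added by +=, which is why code holding another reference to the string behaves differently in the two cases.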
Strings in Ruby are mutable, but you can make one immutable by freezing it.
irb(main):001:0> s = "foo".freeze
=> "foo"
irb(main):002:0> s << "bar"
RuntimeError: can't modify frozen String
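A frozen string can be tested with frozen?, and dup returns an unfrozen copy when a mutable version is needed. A minimal sketch; the helper name try_append is hypothetical:

```ruby
# Hypothetical helper: attempts an in-place append and reports the outcome.
def try_append(str, suffix)
  str << suffix
  :appended
rescue RuntimeError   # on Ruby 2.5+ the error is FrozenError, a RuntimeError subclass
  :frozen
end

frozen = "foo".freeze
p frozen.frozen?             # => true
p try_append(frozen, "bar")  # => :frozen

copy = frozen.dup            # dup does not carry over the frozen state
p try_append(copy, "bar")    # => :appended
p copy                       # => "foobar"
```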
Ruby strings are mutable, but to concatenate in place you need << rather than +:
+ does not mutate: it creates a new string object.
<< mutates: it changes the same object.
From what I can make of this pull request, it will become possible in Ruby 3.0 to add a "magic comment" that makes all string literals immutable rather than mutable.
Because it seems you have to add this comment explicitly, the answer to "are strings mutable by default?" will still be yes, but a conditional yes: it depends on whether you wrote the magic comment into your script.
EDIT
I was pointed to this bug/issue on Ruby-Lang.org that definitively states that some types of strings in Ruby 3.0 will in fact be immutable by default.

Split specific string by regular expression

I am trying to get an array containing aaaaa, bbbbb, and ccccc from the split below.
a_string = "aaaaa[x]bbbbb,ccccc"
split_output = a_string.split(%r{[,|........]+})
What should I put in place of ........ ?
No need for a regex when it's just a literal:
irb(main):001:0> a_string = "aaaaa[x]bbbbb"
irb(main):002:0> a_string.split "[x]"
=> ["aaaaa", "bbbbb"]
If you want to split by "open bracket...anything...close bracket" then:
irb(main):003:0> a_string.split /\[.+?\]/
=> ["aaaaa", "bbbbb"]
Edit: I'm still not sure what your criteria are, but let's guess that what you are really doing is looking for runs of two or more of the same character:
irb(main):001:0> a_string = "aaaaa[x]bbbbb,ccccc"
=> "aaaaa[x]bbbbb,ccccc"
irb(main):002:0> a_string.scan(/((.)\2+)/).map(&:first)
=> ["aaaaa", "bbbbb", "ccccc"]
Edit 2: If you want to split by either of the literal strings "," or "[x]" then:
irb(main):003:0> a_string.split /,|\[x\]/
=> ["aaaaa", "bbbbb", "ccccc"]
The | part of the regular expression allows expressions on either side to match, and the backslashes are needed since otherwise the characters [ and ] have special meaning. (If you tried to split by /,|[x]/ then it would split on either a comma or an x character.)
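To avoid escaping the brackets by hand, Regexp.union can build the same alternation from literal strings, since it escapes each string it is given. A minimal sketch; the helper name split_on_literals is hypothetical:

```ruby
# Hypothetical helper: splits str on any of the given literal delimiters.
def split_on_literals(str, *delimiters)
  # Regexp.union escapes each string, so "[x]" is matched literally.
  str.split(Regexp.union(delimiters))
end

p split_on_literals("aaaaa[x]bbbbb,ccccc", ",", "[x]")
# => ["aaaaa", "bbbbb", "ccccc"]
```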
no regex needed, just use "[x]"
