Ruby string split with space as input removes more than just spaces - ruby

I have some text that has this particular character:
When I call the string split() method (with just ' ' as the input) , the  gets removed. What should I do to keep the ?

That's the expected behavior when passing ' '. According to the docs:
If pattern is a single space, str is split on whitespace, with leading and trailing whitespace and runs of contiguous whitespace characters ignored.
With "whitespace" being space (" ") and \t, \n, \v, \f, \r.
"foo bar\nbaz \f qux".split(' ')
#=> ["foo", "bar", "baz", "qux"]
To split on space (U+0020) only, you have to use a regexp:
"foo bar\nbaz \f qux".split(/ /)
#=> ["foo", "bar\nbaz", "\f", "qux"]

Related

Ruby - substitute \n if not \\n

I'm trying to do a regex with lookbehind that changes \n to but not if it's a \\n.
My closest attempt has no effect:
text.gsub /(?<!\\)\n/, ''
Unfortunately, no number of backslashes in the lookbehind seem to fix the problem. How can I address this?
You need to double the backslash before the n in the regex, otherwise it's looking for a newline instead of a literal backslash followed by n:
irb(main):001:0> puts "hello\\nthere\\\\n".gsub(/(?<!\\)\\n/, ' ')
hello there\\n
You don't need anything special. "\n" is a single character. It does not include a "\" or "n" character.
text.gsub(/\n/, "")
But instead of that, you should do:
text.gsub("\n", "")
or
text.tr("\n", "")
But I would do:
text.tr($/, "")

How to replace \r in a string in ruby

I have a string that looks like this.
mystring="The Body of a\r\n\t\t\t\tSpider"
I want to replace all the \r, \n, \t etc with a whitespace.
The code I wrote for this is :
mystring.gsub(/\\./, " ")
But this isn't doing anything to the string.
Help.
\r, \n and \t are escape sequences representing carriage return, line feed and tab. Although they are written as two characters, they are interpreted as a single character:
"\r\n\t".codepoints #=> [13, 10, 9]
Because it is such a common requirement, there's a shortcut \s to match all whitespace characters:
mystring.gsub(/\s/, ' ')
#=> "The Body of a Spider"
Or \s+ to match multiple whitespace characters:
mystring.gsub(/\s+/, ' ')
#=> "The Body of a Spider"
/\s/ is equivalent to /[ \t\r\n\f]/
String#tr is designed for stream symbol substitution. It appears to be a bit quickier, than String#gsub:
mystring.tr "\r", ' '
It hasan insplace version also (this will replace all carriage returns, line feed and spaces with space):
mystring.tr! "\s\r\n\t\f", ' '
Stefen's Answer is really very Cool as always comeup with very short and clean solutions. But here what I tried to remove all special characters. [Posted as just optional solution] ;)
> a = "The Body of a\r\n\t\t\t\tSpider"
=> "The Body of a\r\n\t\t\t\tSpider"
> a.gsub(/[^0-9A-Za-z]/, ' ')
=> "The Body of a Spider"
you can use strip , then add a space to your string
mystring.strip . " "
If you literally has \r\n\t in your string:
mystring="The Body of a\r\n\t\t\t\tSpider"
mystring.split(/[\r\t\n]/)

How to use Ruby's gsub function to replace excessive '\n' on a string

I have this string:
string = "SEGUNDA A SEXTA\n05:24 \n05:48\n06:12\n06:36\n07:00\n07:24\n07:48\n\n08:12 \n08:36\n09:00\n09:24\n09:48\n10:12\n10:36\n11:00 \n11:24\n11:48\n12:12\n12:36\n13:00\n13:24\n13:48 \n14:12\n14:36\n15:00\n15:24\n15:48\n16:12\n16:36 \n17:00\n17:24\n17:48\n18:12\n18:36\n19:00\n19:48 \n20:36\n21:24\n22:26\n23:15\n00:00\n"
And I'd like to replace all \n\n occurrences to only one \n and if it's possible I'd like to remove also all " " (spaces) between the numbers and the newline character \n
I'm trying to do:
string.gsub(/\n\n/, '\n')
but it is replacing \n\n by \\n
Can anyone help me?
The real reason is because single quoted sting doesn't escape special characters (like \n).
string.gsub(/\n/, '\n')
It replaces one single character \n with two characters '\' and 'n'
You can see the difference by printing the string:
[302] pry(main)> puts '\n'
\n
=> nil
[303] pry(main)> puts "\n"
=> nil
[304] pry(main)> string = '\n'
=> "\\n"
[305] pry(main)> string = "\n"
=> "\n"
I think you're looking for:
string.gsub( / *\n+/, "\n" )
This searches for zero or more spaces followed by one or more newlines, and replaces the match with a single newline.

Regex to match any character except a backslash

How can i say "all symbols except backslash" in Ruby character class?
/'[^\]*'/.match("'some string \ hello'") => should be nil
Variant with two backslashed doesn't work
/'[^\\]*'/.match("'some string \ hello'") => 'some string \ hello' BUT should be nil
Your problem is not with your regex; you got that right. Your problem is that your test string does not have a backslash in it. It has an escaped space, instead. Try this:
str = "'some string \\ hello'"
puts str #=> 'some string \ hello'
p /'[^\\]*'/.match(str) #=> nil
You need to escape the backslash:
[^\\]*
because backslash is the escape character in regular expressions, thus escaping the closing bracket here.
If you want to verify that the whole string contains non-backslash characters, then you need anchors:
^[^\\]*$
There's actually not a backslash in your string to match against. Try taking a look at just your input:
"'some string \ hello'".length # => 20
"a\ b".length # => 3
The "\ " in double quotes is being escaped into just a space:
irb(main):014:0> " "[0].to_i # => 32
irb(main):015:0> "\ "[0].to_i # => 32
irb(main):016:0> "\ ".size #=> 1
If you want to match no slash, you'll need two, like in your second example, which looks good to me:
/'[^\\]*'/.match("'some string \\ hello'") # => nil
This is relevant to Csharp.. but idk it might help you.
[^\\\\]*
four slashes instead of two.

Why does '\n' not work, and what does $/ mean?

Why doesn't this code work:
"hello \nworld".each_line(separator = '\n') {|s| p s}
while this works?
"hello \nworld".each_line(separator = $/) {|s| p s}
A 10 second google yielded this:
$/ is the input record separator, newline by default.
The first one doesn't work because you used single quotes. Backslash escape sequences are ignored in single quoted strings. Use double quotes instead:
"hello \nworld".each_line(separator = "\n") {|s| p s}
First, newline is the default. All you need is
"hello \nworld".each_line {|s| p s}
Secondly, single quotes behave differently than double quotes. '\n' means a literal backslash followed by the letter n, whereas "\n" means the newline character.
Last, the special variable $/ is the record separator which is "\n" by default, which is why you don't need to specify the separator in the above example.
Simple gsub! your string with valid "\n" new line character:
text = 'hello \nworld'
text.gsub!('\n', "\n")
After that \n character will act like newline character.

Resources