How to escape newline in regex scan - ruby

str = "This\n is a sample text for test"
str.scan(/\S.{0,15}\S(?=\s|$)|\S+/)
# => ["This", "is a sample text", "for test"]
Here, it splits when the newline (\n) is present. I actually want the output as,
["This\n is a", "sample text for", "test"]
How can I achieve that?

Use the /m modifier which allows the dot to match newlines:
str.scan(/\S.{0,15}\S(?=\s|\z)|\S+/m)
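Also, I suggest you use \z instead of $. In Ruby, $ always matches at the end of any line, while \z matches only at the very end of the string (\Z also allows a trailing newline). It doesn't matter in this example, but it's a good habit to get into. Ruby differs from most other regex flavors on both points: $ is always line-based, and /m makes the dot match newlines (what other flavors call single-line or dotall mode).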
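A minimal sketch of both points, using the string from the question. The variable names and the two-line sample string are only for illustration:

```ruby
str = "This\n is a sample text for test"

# Without /m the dot stops at "\n", so the first chunk ends at the newline.
without_m = str.scan(/\S.{0,15}\S(?=\s|\z)|\S+/)

# With /m the dot also matches "\n", so a chunk can span the line break.
with_m = str.scan(/\S.{0,15}\S(?=\s|\z)|\S+/m)

# $ matches at the end of every line; \z only at the very end of the string.
two_lines   = "first\nsecond"
dollar_hits = two_lines.scan(/\w+$/)   # one hit per line
z_hits      = two_lines.scan(/\w+\z/)  # only the last line

p without_m    # ["This", "is a sample text", "for test"]
p with_m       # ["This\n is a sample", "text for test"]
p dollar_hits  # ["first", "second"]
p z_hits       # ["second"]
```

Note that the pattern allows chunks of up to 17 characters (one \S, up to 15 of anything, one \S), so the exact split depends on that limit.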

Related

Extracting a word with regex

I want to replace $word with another word in the following string:
"Hello $word How are you"
I used /\$(.*)/, /\$(.*)(\s)/, and /\$(.* \s)/. Because of the *, I get the whole string after $, but I only need that word; I need to escape the space. I tried /s, \b, and a few other options, but I cannot figure it out. Any help would be appreciated.
* is a greedy operator, meaning it will match as much as it can while still allowing the remainder of the regular expression to match. The token .* first greedily matches every remaining character in the string. The regex engine then backtracks until the next token, \s, can match, which happens at the last whitespace before the word "you", giving you a result of "word How are".
You can use \S+ in place of .*; it matches one or more non-whitespace characters.
\$\S+
Or to simply match only word characters, you can use the following:
\$\w+
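To make the difference concrete, here is a small sketch comparing the three patterns. The second sample string (with a trailing comma) is an assumption added to show where \S+ and \w+ diverge:

```ruby
str = "Hello $word How are you"

greedy   = str[/\$(.*)\s/, 1]  # .* runs to the end, then backtracks to the last space
non_ws   = str[/\$\S+/]        # stops at the first whitespace
word_chr = str[/\$\w+/]        # stops at the first non-word character

# With punctuation right after the placeholder, \S+ keeps the comma and \w+ does not.
with_comma = "Hello $word, how are you"
non_ws_comma   = with_comma[/\$\S+/]
word_chr_comma = with_comma[/\$\w+/]

p greedy          # "word How are"
p non_ws          # "$word"
p word_chr        # "$word"
p non_ws_comma    # "$word,"
p word_chr_comma  # "$word"
```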
If you only want to replace "$word" using a regex, try this:
"Hello $word How are you".gsub(/\$word/, 'other_word')
Or:
"Hello $word How are you".sub('$word',"*")
You can read more for gsub here: http://www.ruby-doc.org/core-2.2.0/String.html#method-i-gsub
Substituting placeholder words for other words is usually not done with a regex but with the % method and a hash:
h = {word: "aaa", other_word: "bbb"}
p "Hello %{word} How are you. %{other_word}. Bye %{word}" % h
# => "Hello aaa How are you. bbb. Bye aaa"
Consider:
>> string = "Hello $word How are you"
=> "Hello $word How are you"
>> replace_regex = /(?<replace_word>\$\w+)/
=> /(?<replace_word>\$\w+)/
>> string.gsub(replace_regex, "Bob")
=> "Hello Bob How are you"
>> string.match(replace_regex)[:replace_word]
=> "$word"
Note:
replace_regex is a regex with a named capture group, replace_word.
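If several different $placeholders need different replacements, one possible approach is gsub with a block and a lookup hash. This is only a sketch: the helper name fill_placeholders, the group name key, and the sample hash are all hypothetical.

```ruby
# Replace each $name with its value from the hash; unknown names are left as-is.
def fill_placeholders(string, replacements)
  string.gsub(/\$(?<key>\w+)/) do
    match = Regexp.last_match            # MatchData for the current hit
    replacements.fetch(match[:key], match[0])
  end
end

values = { "word" => "Bob", "name" => "Alice" }  # hypothetical lookup table

p fill_placeholders("Hello $word How are you, $name", values)
# "Hello Bob How are you, Alice"
p fill_placeholders("Hi $nobody", values)
# "Hi $nobody"
```

The named group captures the placeholder name without the leading $, so the hash keys can stay plain.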

Regexp for certain character to end of line

I have a string
"So on and so forth $5.99"
I would like to extract everything after the $ until the end of the line.
/$ finds the character $. How do I select the rest of the string? I know it's something \z but I can't get the syntax right.
In regexp $ represents the end of the line.
So in your case you need \$.*$ to match the escaped $ and everything after it (.*) up until the end of the line ($).
No, /$ does not match that character. You need to escape it with a backslash (\$) to match a literal $.
string = "So on and so forth $5.99"
result = string.match(/\$(.*)$/)
puts result[1] #=> "5.99"
If you want to capture everything after the $ up to the end of the string rather than the end of the line, you'll want:
/\$(.*)\z/
See http://rubular.com/r/T4fR1SEl3j
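The difference between $ and \z only shows up once the string has more than one line. A short sketch, where the two-line string text is an assumption added for illustration:

```ruby
line = "So on and so forth $5.99"
text = "Total $5.99\nThanks"

single    = line[/\$(.*)\z/, 1]   # one line: \z and $ behave the same
to_eol    = text[/\$(.*)$/, 1]    # stops at the end of the first line
to_eos    = text[/\$(.*)\z/, 1]   # nil: the dot cannot cross "\n" to reach \z
to_eos_m  = text[/\$(.*)\z/m, 1]  # with /m the dot matches "\n" too

p single    # "5.99"
p to_eol    # "5.99"
p to_eos    # nil
p to_eos_m  # "5.99\nThanks"
```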

Why $ doesn't match \r\n

Can someone explain this:
str = "hi there\r\n\r\nfoo bar"
rgx = /hi there$/
str.match rgx # => nil
rgx = /hi there\s*$/
str.match rgx # => #<MatchData "hi there\r\n\r">
On the one hand it seems like $ does not match \r. But then if I first capture all the white spaces, which also include \r, then $ suddenly does appear to match the second \r, not continuing to capture the trailing "\nfoo bar".
Is there some special rule here about consecutive \r\n sequences? The docs on $ simply say it will match "end of line" which doesn't explain this behavior.
$ is a zero-width assertion. It doesn't match any character, it matches at a position. Namely, it matches either immediately before a \n, or at the end of string.
/hi there\s*$/ matches because \s* matches "\r\n\r", which allows the $ to match at the position before the second \n. The $ could have also matched at the position before the first \n, but the \s* is greedy and matches as much as it can, while still allowing the overall regex to match.
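The behaviour described above can be checked directly. The \r? and lazy \s*? variants below are not from the question; they are shown only as possible ways to tolerate a carriage return before the line end:

```ruby
str = "hi there\r\n\r\nfoo bar"

plain    = str[/hi there$/]      # nil: "\r" sits between "there" and the "\n"
optional = str[/hi there\r?$/]   # allow one optional "\r" before the line end
greedy   = str[/hi there\s*$/]   # \s* grabs "\r\n\r", $ matches before the second "\n"
lazy     = str[/hi there\s*?$/]  # lazy \s*? stops at the first position where $ matches

p plain     # nil
p optional  # "hi there\r"
p greedy    # "hi there\r\n\r"
p lazy      # "hi there\r"
```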

pattern matching in ruby

Could anybody tell me how this expression works?
output = "#{output.gsub(/grep .*$/,'')}"
Before that operation, the value of output is
"df -h | grep /mnt/nand\r\n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"
but after the operation it becomes
"df -h | \n/dev/mtdblock4 248.5M 248.5M 130.7M 117.8M 53% /mnt/nand\r\n "
Please help me.
Your expression is equivalent to:
output.gsub!(/grep .*$/,'')
which is much easier to read.
The . in the regular expression matches all characters except newline (\n) by default. So, in the string provided, it matches "grep /mnt/nand\r" (the carriage return is not a newline, so . matches it too), and will substitute an empty string for that. The result is the provided string, without the matched substring.
Here is a simpler example:
"hello\n\n\nworld".gsub(/hello.*$/,'') => "\n\n\nworld"
In both your provided regex, and the example above, the $ is not necessary. It is used as an anchor to match the end of a line, but since the pattern immediately before it (.*) matches everything up to a newline, it is redundant (but does not cause harm).
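Running the substitution on the string from the question shows why the "\r" disappears from the first line: in Ruby the dot stops only at "\n". The second pattern, with [^\r\n]*, is an assumed alternative for keeping the carriage return, not something from the question:

```ruby
output = "df -h | grep /mnt/nand\r\n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"

# The dot matches "\r", so the carriage return is removed along with the command.
stripped = output.gsub(/grep .*$/, '')

# Excluding both "\r" and "\n" leaves the original line ending intact.
kept_cr = output.gsub(/grep [^\r\n]*/, '')

p stripped  # "df -h | \n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"
p kept_cr   # "df -h | \r\n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"
```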
Since gsub returns a string, your first line is exactly the same as
output = output.gsub(/grep .*$/, '')
which takes the string and removes any occurrence of the regexp pattern
/grep .*$/
i.e. all parts of the string that start with 'grep ' until the end of the string or a line break.
There's a good regexp tester/reference here. This one matches the word "grep", then a space, then any number of characters until the next line break (\n; in Ruby the dot still matches \r). "." by itself means any character except a newline, and ".*" together means any number of them, as many as possible. "$" means the end of a line.
For the '$', see here http://www.regular-expressions.info/reference.html
".*$" means "take every character up to the end of the line"; the regex engine treats "\n" as the end of a line, so the match stops there.

How to replace multiple newlines in a row with one newline using Ruby

I have a script written in ruby. I need to remove any duplicate newlines (e.g.)
\n
\n
\n
to
\n
My current attempt worked (or rather not) using
str.gsub!(/\n\n/, "\n")
Which gave me no change to the output. What am I doing wrong?
This works for me:
#!/usr/bin/ruby
s = "foo\n\n\nbar\nbaz\n\n\nquux"
puts s
s.gsub!(/\n+/, "\n")
puts s
Use the more idiomatic String#squeeze instead of gsub.
str = "a\n\n\nb\n\n\n\n\n\nc"
str.squeeze("\n") # => "a\nb\nc"
You need to match one or more consecutive newlines, however many there are. Your code example will work with just a minor tweak:
str.gsub!(/\n+/, "\n")
For example:
str = "this\n\n\nis\n\n\n\n\na\ntest"
str.gsub!(/\n+/, "\n") # => "this\nis\na\ntest"
Are you sure it shouldn't be /\n\n\n/, "\n"? That is what you seem to want in your question above.
Also, are you sure the input doesn't use Windows newlines ("\r\n")?
EDIT: Additional info
Per comment:
"The amount of newlines can change. Different lines have between 2 and 5 newlines."
If you only want to hit runs of 2 to 5 newlines, try this:
/\n{2,5}/, "\n"
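A quick sketch comparing these options, including the Windows newline case raised above. The sample strings and the (?:\r?\n)+ pattern are assumptions added for illustration:

```ruby
unix    = "a\n\n\nb\n\n\n\n\n\nc"        # runs of 3 and 6 newlines
windows = "a\r\n\r\n\r\nb\r\nc"          # "\r\n" line endings

collapsed  = unix.gsub(/\n+/, "\n")      # any run length
bounded    = unix.gsub(/\n{2,5}/, "\n")  # a run of 6 is only partly collapsed
untouched  = windows.gsub(/\n+/, "\n")   # "\r" separates the "\n"s, so nothing changes
normalized = windows.gsub(/(?:\r?\n)+/, "\n")

p collapsed   # "a\nb\nc"
p bounded     # "a\nb\n\nc"
p untouched   # "a\r\n\r\n\r\nb\r\nc"
p normalized  # "a\nb\nc"
```

The bounded quantifier leaves a stray newline whenever a run is longer than its upper limit, which is why \n+ is usually the safer choice.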
Simply splitting and recombining the lines will give the desired result
>> "one\ntwo\n\nthree\n".split.join("\n")
=> "one\ntwo\nthree"
Edit: I just noticed this will replace ALL whitespace substrings with newlines, e.g.
>> "one two three\n".split.join("\n")
=> "one\ntwo\nthree"
First check that this is what you want!
Simply calling split will also trim out all of your whitespace.
You need to pass \n to split
>> "one ok \ntwo\n\nthree\n".split(/\n+/).join("\n")
=> "one ok \ntwo\nthree"
Additionally, this also works with
spaces on blank lines
any number of back-to-back blank lines
str.gsub!(/\n^\s*\n/, "\n\n")
where
\n is of course a newline
\s is any whitespace character
* denotes zero or more of the preceding token, here \s
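As a rough check of that pattern, the sketch below runs it on a made-up string whose blank lines contain spaces. Because the replacement is "\n\n", a run of blank lines collapses to a single blank line rather than disappearing:

```ruby
messy  = "a\n  \n\n \nb"   # three blank lines, two of them containing spaces
single = "a\nb"            # an ordinary line break with no blank line

collapsed = messy.gsub(/\n^\s*\n/, "\n\n")
unchanged = single.gsub(/\n^\s*\n/, "\n\n")

p collapsed  # "a\n\nb"
p unchanged  # "a\nb"
```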
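Try this; it worked for me:
s = "test\n\n\nbar\n\n\nfooo"
s.gsub("\n\n", '') # => "test\nbar\nfooo"
Note that this removes newlines in pairs, so a run of two or four newlines disappears completely.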
Ruby needs the backslashes escaped differently than you have provided.
str.sub!("\\\\n+\\\\n","\\\\n")
http://www.ruby-forum.com/topic/176239

Resources