Ruby gsub issues - ruby

I have a piece of text that resembled the following:
==EXCLUDE
#lots of lines of text
==EXCLUDE
#this is what I actually want
And so I was trying to remove the unwanted bit by doing:
str.gsub!(/==EX.*?==EXCLUDE/, '')
However, its not working. When I tried to remove the \n chars first, it worked like a dream. The issue is that I can't actually remove the \n characters. How can I do a substitution like this while leaving newlines in place?

By default, the . does not match line break chars. If you enable the m modifier in Ruby (in other languages, this is the s modifier) it should work:
str.gsub!(/==EX.*?==EXCLUDE/m, '')
Here's a live demo on Rubular: http://rubular.com/r/YxLSB1Iq95

Try str.gsub!(/==EX.*?==EXCLUDE/m, '')
That should make it span new lines.

Related

Ruby regex and special characters like dash (—) and »

I'm trying to replace all punctuation and the likes in some text with just a space. So I have the line
text = "—Bonne chance Harry murmura t il »"
How can I remove the dash and the dash and »? I tried
text.gsub( /»|—/, ' ')
which gives an error, not surprisingly. I'm new to ruby and just trying to get a hang of things by writing a script to pull all the words out of a chapter of a book. I figure I'd just remove the punctuation and symbols and just use text.split. Any help would be appreciated. I couldn't find much
It turns out the problem had to do with the utf-8 encoding. Adding
# encoding: utf-8
solved my issues and what #Andrewlton said works great
This should properly substitute in the way you were trying to do it; just add brackets and remove the pipe:
text.gsub(/[»—]/, ' ')
The standard punctuation regexp also works:
text.gsub(/\p{P}/, ' ')
You should be able to use regexp pretty universally, coming from whatever language you know. Hope this helps!

easy issue about Ruby

I would like to know what it does:
File.open(filename,"r").each_file do |line|
if (!line.strip.empty? and !line.starts_with?(" "))
....
.....
end
end
Especially what isstrip? Thanks for your time!
Strip removes all the leading and trailing whitespace chars from a string. In essense the code you pasted checks if the sting contains anything apart from whitespaces AND the first symbol is not a space.

TextMate: remove trailing spaces and save

I'm using a script to remove trailing spaces and then save the file.
The problem is that all my code foldings expand when I use it. How do I change the command so it will keep the code foldings?
You can use foldingStartMarker & foldingStopMarker to indicate the folds to TextMate.
To define a block that starts with { as the last non-space character on the line and stops with } as the first non-space character on the line, we can use the following patterns:
foldingStartMarker = '\{\s*$';
foldingStopMarker = '^\s*\}';
pressing the F1 key will fold any code folds present.
Reference: http://manual.macromates.com/en/navigation_overview#collapsing_text_blocks_foldings

How to remove the extra double quote?

In a malformed .csv file, there is a row of data with extra double quotes, e.g. the last line:
Name,Comment
"Peter","Nice singer"
"Paul","Love "folk" songs"
How can I remove the double quotes around folk and replace the string as:
Name,Comment
"Peter","Nice singer"
"Paul","Love _folk_ songs"
In Ruby 1.9, the following works:
result = subject.gsub(/(?<!^|,)"(?!,|$)/, '_')
Previous versions don't have lookbehind assertions.
Explanation:
(?<!^|,) # Assert that we're not at the start of the line or right after a comma
" # Match a quote
(?!,|$) # Assert that we're not at the end of the line or right before a comma
Of course this assumes that we won't run into pathological cases like
"Mary",""Oh," she said"
If you're not on Ruby 1.9, or just get tired of regexes sometimes, split the string on ,, strip the first/last quotes, replace remaining "s with _s, re-quote, and join with ,.
(We don't always have to worry about efficiency!)
$str = '"folk"';
$new = str_replace('"', '', $str);
/* now $new is only folk, without " */
Meta-strategy:
It's likely the case that the data was manually entered inconsistently, CSV's get messy when people manually enter either field terminators (double quote) or separators (comma) into the field itself. If you can have the file regenerated, ask them to use an extremely unlikely field begin/end marker, like 5 tilde's (~~~~~), and then you can split on "~~~~~,~~~~~" and get the correct number of fields every time.
Unless you have no other choice, get the file regenerated with correct escaping. Any other approach is asking for trouble, because the insertion of unescaped quotes is lossy, and thus cannot be reliably reversed.
If you can't get the file fixed from the source, then Tim Pietzcker's regex is better than nothing, but I strongly recommend that you have your script print all "fixed" lines and check them for errors manually.

Rubular/Ruby discrepancy in captured text

I've carefully cut and pasted from this Rubular window http://rubular.com/r/YH8Qj2EY9j to my code, yet I get different results. The Rubular match capture is what I want. Yet
desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
puts description = $1
end
only gets me the first line, i.e.
<DD>#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
I don't think it's my test data, but that's possible. What am I missing?
(ruby 1.9 on Ubuntu 10.10(
Paste your test data into an editor that is able to display control characters and verify your line break characters. Normally it should be only \n on a Linux system as in your regex. (I had unusual linebreaks a few weeks ago and don't know why.)
The other check you can do is, change your brackets and print your capturing groups. so that you can see which part of your regex matches what.
/^<DD>(.*)\n?(.*)\n/
Another idea to get this to work is, change the .*. Don't say match any character, say match anything, but \n.
^<DD>([^\n]*\n?[^\n]*)\n
I believe you need the multiline modifier in your code:
/m Multiline mode: dot matches newlines, ^ and $ both match line starts and endings.
The following:
#!/usr/bin/env ruby
desc= '<DD>#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377
<DT>la la this should not be matched oh good'
desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
puts description = $1
end
prints
#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377
on my system (Linux, Ruby 1.8.7).
Perhaps your line breaks are really \r\n (Windows style)? What if you try:
desc_pattern = /^<DD>(.*\r?\n?.*)\r?\n/

Resources