Issue copying file into new file gsub with regex, variable and string? - ruby

I'm struggling with a script to target specific XML files in a directory and rename them as copies with a different name.
I put in the puts statements for debugging, and, from what I can tell, everything looks OK until the FileUtils.cp line. I tried this with simpler text and it worked, but my overly complicated cp(file, file.gsub()) seems to be causing problems that I can't figure out.
def piano_treatment(cats)
FileUtils.chdir('12Piano')
src = Dir.glob('*.xml')
src.each do |file|
puts file
cats.each do |text|
puts text
if file =~ /#{text}--\d\d/
puts "Match Found!!"
puts FileUtils.pwd
FileUtils.cp(file, file.gsub!(/#{text}--\d\d/, "#{text}--\d\dBass "))
end
end
end
end
piano_treatment(cats)
I get the following output in Terminal:
12Piano--05Free Stuff--11Test.xml
05Free Stuff
Match Found!!
/Users/mbp/Desktop/Sibelius_Export/12Piano
cp 12Piano--05Free Stuff--ddBass Test.xml 12Piano--05Free Stuff--ddBass Test.xml
/Users/mbp/.rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/fileutils.rb:1551:in `stat': No such file or directory - 12Piano--05Free Stuff--ddBass Test.xml (Errno::ENOENT)
Why is \d\d showing up as "dd" when it should actually be numbers? Is this a single vs. double quote issue? Both yield errors.
Any suggestions are appreciated. Thanks.
EDIT One additional change was needed to this code. The FileUtils.chdir('12Piano') would change the directory for the first iteration of the loop, but it would revert to the source directory after that. Instead I did this:
def piano_treatment(cats)
src = Dir.glob('12Piano/*.xml')
which sets the match path for the whole method.

Your replacement string is not a regex, so \d has no special meaning, but is just a literal string. You need to specify a group in your regex, and then you can use the captured group in your replacement string:
FileUtils.cp(file, file.gsub(/#{text}--(\d\d)/, "#{text}--\\1Bass "))
The parenthesis in the regex form the group, which can be used (by number) in the replacement string: \1 for the first group, \2 for the second, etc. \0 refers to the entire regex match.
Update
Replaced gsub!() with gsub() and escaped the backslash in the replacement string (to treat \1 as the capture group, not a literal character... Doh!).

Related

how to overwrite part of a line in a txt file with regex and .sub in ruby

I have the following layout in a txt file.
[item] label1: comment1 | label2: foo
I have the code below. The goal is to modify part of an existing line in text
def replace_info(item, bar)
return "please create a file first" unless File.exist?('site_info.txt')
IO.foreach('site_info.txt','a+') do |line|
if line.include?(item)
#regex should find the data from the whitespace after the colon all the way to the end.
#this should be equivalent to foo
foo_string = line.scan(/[^"label2: "]*\z/)
line.sub(foo_string, bar)
end
end
end
Please advise. Perhaps my regrex is off, but .sub is correct, but I cannot overwrite line.
Tiny problem: Your regular expression does not do what you think. /[^"label2: "]*\z/ means: any number of characters at the end of line that are not a, b, e, l, ", space, colon or 2 (see Character classes). And scan returns an array, which sub doesn't work with. But that doesn't really matter, because...
Small problem: line.sub(foo_string, bar) doesn't do anything. It returns a changed string, but you don't assign it to anything and it gets thrown away. line.sub!(foo_string, bar) would change line itself, but that leads us to...
Big problem: You cannot just change the read line and expect it to change in the file itself. It's like reading a book, thinking you could write a line better, and expecting it to change the book. The way to change a line in a text file is to read from one file and copy what you read to another. If you change a line between reading and writing, the newly written copy will be different. At the end, you can rename the new file to the old file (which will delete the old file and replace it atomically with the new one).
EDIT: Here's some code. First, I dislike IO.foreach as I like to control the iteration myself (and IMO, IO.foreach is not readable as IO#each_line). In the regular expression, I used lookbehind to find the label without including it into the match, so I can replace just the value; I changed to \Z for a similar reason, to exclude the newline from the match. You should not be returning error messages from functions, that's what exceptions are for. I changed simple include? to #start_with? because your item might be found elsewhere in the line when we wouldn't want to trigger the change.
class FileNotFoundException < RuntimeError; end
def replace_info(item, bar)
# check if file exists
raise FileNotFoundException unless File.exist?('site_info.txt')
# rewrite the file
File.open('site_info.txt.bak', 'wt') do |w|
File.open('site_info.txt', 'rt') do |r|
r.each_line do |line|
if line.start_with?("[#{item}]")
line.sub!(/(?<=label2: ).*?\Z/, bar)
end
w.write(line)
end
end
end
# replace the old file
File.rename('site_info.txt.bak', 'site_info.txt')
end
replace_info("item", "bar")

Regex match anything except ending string

I'm trying to make a regex that matches anything except an exact ending string, in this case, the extension '.exe'.
Examples for a file named:
'foo' (no extension) I want to get 'foo'
'foo.bar' I want to get 'foo.bar'
'foo.exe.bar' I want to get 'foo.exe.bar'
'foo.exe1' I want to get 'foo.exe1'
'foo.bar.exe' I want to get 'foo.bar'
'foo.exe' I want to get 'foo'
So far I created the regex /.*\.(?!exe$)[^.]*/
but it doesn't work for cases 1 and 6.
You can use a positive lookahead.
^.+?(?=\.exe$|$)
^ start of string
.+? non greedily match one or more characters...
(?=\.exe$|$) until literal .exe occurs at end. If not, match end.
See demo at Rubular.com
Wouldn't a simple replacement work?
string.sub(/\.exe\z/, "")
Do you mean regex matching or capturing?
There may be a regex only answer, but it currently eludes me. Based on your test data and what you want to match, doing something like the following would cover both what you want to match and capture:
name = 'foo.bar.exe'
match = /(.*).exe$/.match(name)
if match == nil
# then this filename matches your conditions
print name
else
# otherwise match[1] is the capture - filename without .exe extension
print match[1]
end
string pattern = #" (?x) (.* (?= \.exe$ )) | ((?=.*\.exe).*)";
First match is a positive look-ahead that checks if your string
ends with .exe. The condition is not included in the match.
Second match is a positive look-ahead with the condition included in the
match. It only checks if you have something followed by .exe.
(?x) is means that white spaces inside the pattern string are ignored.
Or don't use (?x) and just delete all white spaces.
It works for all the 6 scenarios provided.

Ruby file output issue

I have written some ruby to automate batch file creation, the problem lies with the resulting output in the GUI;
The files are outputted, but the formatting looks very strange indeed. Also the filenames are all ending in '.txt' but MacOS does not see it this way. i.e. You cannot click to open in Textedit.
Code is as follows;
puts "Please enter amount of files to create: "
file_count = gets.to_i
puts "Thanks! Enter a filename header: "
file_head = gets
puts "And a suffix?"
suffix = gets
puts "Please input your target directory:"
Dir.chdir(gets.chomp)
while file_count != 0
filename = "#{file_head}_#{file_count}#{suffix}"
File.open(filename, "w") {|x| x.write("This is #{filename}.")}
file_count -= 1
end
Tips on shortening length or refactoring are always welcome.
The Kernel#gets documentation contains:
The separator is included with the contents of each record.
By default the separator is a newline (see $/). So both file_head and suffix end with a newline character. filename also does, of course. Thus the extension of your files isn't .txt as it's actually ".txt\n" (in Ruby string notation). The application takes the newline character literally and continues writing the filename on a new line. That's why it looks so strange!
You already know a way to fix it: call String#chomp to get rid of the trailing newline (the separator). See the line in your code that contains Dir.chdir for an example.

How to remove the extra double quote?

In a malformed .csv file, there is a row of data with extra double quotes, e.g. the last line:
Name,Comment
"Peter","Nice singer"
"Paul","Love "folk" songs"
How can I remove the double quotes around folk and replace the string as:
Name,Comment
"Peter","Nice singer"
"Paul","Love _folk_ songs"
In Ruby 1.9, the following works:
result = subject.gsub(/(?<!^|,)"(?!,|$)/, '_')
Previous versions don't have lookbehind assertions.
Explanation:
(?<!^|,) # Assert that we're not at the start of the line or right after a comma
" # Match a quote
(?!,|$) # Assert that we're not at the end of the line or right before a comma
Of course this assumes that we won't run into pathological cases like
"Mary",""Oh," she said"
If you're not on Ruby 1.9, or just get tired of regexes sometimes, split the string on ,, strip the first/last quotes, replace remaining "s with _s, re-quote, and join with ,.
(We don't always have to worry about efficiency!)
$str = '"folk"';
$new = str_replace('"', '', $str);
/* now $new is only folk, without " */
Meta-strategy:
It's likely the case that the data was manually entered inconsistently, CSV's get messy when people manually enter either field terminators (double quote) or separators (comma) into the field itself. If you can have the file regenerated, ask them to use an extremely unlikely field begin/end marker, like 5 tilde's (~~~~~), and then you can split on "~~~~~,~~~~~" and get the correct number of fields every time.
Unless you have no other choice, get the file regenerated with correct escaping. Any other approach is asking for trouble, because the insertion of unescaped quotes is lossy, and thus cannot be reliably reversed.
If you can't get the file fixed from the source, then Tim Pietzcker's regex is better than nothing, but I strongly recommend that you have your script print all "fixed" lines and check them for errors manually.

Rubular/Ruby discrepancy in captured text

I've carefully cut and pasted from this Rubular window http://rubular.com/r/YH8Qj2EY9j to my code, yet I get different results. The Rubular match capture is what I want. Yet
desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
puts description = $1
end
only gets me the first line, i.e.
<DD>#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
I don't think it's my test data, but that's possible. What am I missing?
(ruby 1.9 on Ubuntu 10.10(
Paste your test data into an editor that is able to display control characters and verify your line break characters. Normally it should be only \n on a Linux system as in your regex. (I had unusual linebreaks a few weeks ago and don't know why.)
The other check you can do is, change your brackets and print your capturing groups. so that you can see which part of your regex matches what.
/^<DD>(.*)\n?(.*)\n/
Another idea to get this to work is, change the .*. Don't say match any character, say match anything, but \n.
^<DD>([^\n]*\n?[^\n]*)\n
I believe you need the multiline modifier in your code:
/m Multiline mode: dot matches newlines, ^ and $ both match line starts and endings.
The following:
#!/usr/bin/env ruby
desc= '<DD>#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377
<DT>la la this should not be matched oh good'
desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
puts description = $1
end
prints
#mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377
on my system (Linux, Ruby 1.8.7).
Perhaps your line breaks are really \r\n (Windows style)? What if you try:
desc_pattern = /^<DD>(.*\r?\n?.*)\r?\n/

Resources