Ruby regex gsub a line in a text file - ruby

I need to match a line in a string read from a text file and wrap the captured line with a character.
For example imagine a text file as such:
test
foo
test
bar
I would like to use gsub to output:
XtestX
XfooX
XtestX
XbarX
I'm having trouble matching a line though. I've tried using regex starting with ^ and ending with $, but it doesn't seem to work. Any ideas?
I have a text file that has the following in it:
test
foo
test
bag
The text file is being read in as a command line argument.
So I got
string = IO.read(ARGV[0])
string = string.gsub(/^(test)$/,'X\1X')
puts string
It outputs the exact same thing that is in the text file.

If you're trying to match every line, then
gsub(/^.*$/, 'X\&X')
does the trick. If you only want to match certain lines, then replace .* with whatever you need.
Update:
Replacing your gsub with mine:
string = IO.read(ARGV[0])
string = string.gsub(/^.*$/, 'X\&X')
puts string
I get:
$ gsub.rb testfile
XtestX
XfooX
XtestX
XbarX
Update 2:
As per @CodeGnome, you might try adding chomp:
IO.readlines(ARGV[0]).each do |line|
puts "X#{line.chomp}X"
end
This works equally well for me. My understanding of ^ and $ in regular expressions was that chomping wouldn't be necessary, but maybe I'm wrong.
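One possible reason the original /^(test)$/ attempt changed nothing is that the file was saved with Windows (CRLF) line endings. The question does not confirm this, so treat the sample string below as an assumption. In Ruby, $ matches just before "\n", so a "\r" sitting in front of the newline prevents the match. A minimal sketch:

```ruby
# Assumption: the input uses CRLF line endings, as a file saved on Windows might.
crlf = "test\r\nfoo\r\ntest\r\nbar\r\n"

# `$` matches only before "\n" (or at the end of the string),
# so the "\r" after "test" stops /^(test)$/ from matching.
unchanged = crlf.gsub(/^(test)$/, 'X\1X')

# Allowing an optional "\r" before the line end lets the match succeed;
# the second group puts the "\r" back after the closing X.
wrapped = crlf.gsub(/^(test)(\r?)$/, 'X\1X\2')

puts unchanged == crlf
puts wrapped.inspect
```

If this is the cause, chomping each line (as in Update 2) or normalizing the line endings before calling gsub would also work.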

You can do it in one line like this:
IO.write(filepath, File.open(filepath) { |f| f.read.gsub(/<appId>\d+<\/appId>/, "<appId>42</appId>") })
IO.write truncates the given file by default, so if you read the text first, apply String#gsub, and return the resulting string from File.open in block mode, the file's content is replaced in one fell swoop.
I like the way this reads, but it can of course be written on multiple lines too:
IO.write(filepath, File.open(filepath) do |f|
  f.read.gsub(/<appId>\d+<\/appId>/, "<appId>42</appId>")
end
)
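The same read, substitute, write-back idea can be sketched as a small helper. The name replace_in_file and the temporary file are hypothetical and used only for illustration; they are not part of the answer above.

```ruby
require "tempfile"

# Hypothetical helper (not from the answer): read the file, apply the
# substitution, and write the result back over the original content.
def replace_in_file(path, pattern, replacement)
  updated = File.read(path).gsub(pattern, replacement)
  File.write(path, updated)
  updated
end

# Exercise it on a throwaway file so no real data is touched.
result = Tempfile.create("appid") do |file|
  file.write("<appId>7</appId>\n")
  file.flush
  replace_in_file(file.path, /<appId>\d+<\/appId>/, "<appId>42</appId>")
  File.read(file.path)
end

puts result
```

Reading the whole file into memory is fine for small files; for very large ones, a line-by-line rewrite into a second file may be preferable.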

If your file is input.txt, I'd do the following:
File.open("input.txt") do |file|
  file.each_line do |line|
    puts line.gsub(/^(.*)$/, 'X\1X')
  end
end
(.*) matches any run of characters on the line and captures it as a group.
\1 in the replacement string refers to that captured group.
If you prefer to do it in one line on the whole content, you can do the following:
File.read("input.txt").gsub(/^(.*)$/, 'X\1X')
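The replacement can be written in a few equivalent ways. The sketch below uses an in-memory string instead of input.txt (an assumption, to keep it self-contained) and compares a capture group, the whole-match reference \&, a double-quoted replacement, and the block form:

```ruby
text = "test\nfoo\ntest\nbar\n"

# \1 refers to the first capture group.
with_group = text.gsub(/^(.*)$/, 'X\1X')

# \& refers to the whole match, so no group is needed.
whole_match = text.gsub(/^.*$/, 'X\&X')

# In a double-quoted string the backslash itself must be escaped.
double_quoted = text.gsub(/^(.*)$/, "X\\1X")

# The block form receives each match as a plain string.
with_block = text.gsub(/^.*$/) { |line| "X#{line}X" }

puts with_group
puts [whole_match, double_quoted, with_block].all? { |s| s == with_group }
```

The block form is often the easiest to read once the replacement needs any logic beyond wrapping.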

string.gsub(/^(matchline)$/, 'X\1X')
Uses a backreference (\1) to get the first capture group of the regex, and surround it with X
Example:
string = "test\nfoo\ntest\nbar"
string.gsub!(/^test$/, 'X\&X')
p string
=> "XtestX\nfoo\nXtestX\nbar"

Chomp Line Endings
Your lines probably have newline characters. You need to handle this one way or another. For example, this works fine for me:
$ ruby -ne 'puts "X#{$_.chomp}X"' /tmp/corpus
XtestX
XfooX
XtestX
XbarX

Related

How can I print a string contained in an object as literal with Ruby?

I'm trying to write a program which will detect if a file has \n or \r\n line endings and then fix them. I'm hoping to have the script output some messages to a console, but I'm running into trouble. I can't figure out how to print the line endings as literals.
Here is my method which checks for the line ending type:
def determine_line_ending(filename)
File.open(filename, 'r') do |file|
return file.readline[/\r?\n$/]
end
end
ending = determine_line_ending(ARGV.first)
Supposedly this method will return either \n or \r\n if it matches one of those patterns on the first line of the file.
I would like to then print to the console which ending type was detected but if I use puts ending then it just adds a line ending to the console. I know that if I used puts '\r\n' then it will print them literal, or if I use double quotes I just have to escape the backslashes. But I'm pretty new to Ruby and I'm having a hard time just finding a way to print my variable as a literal instead of a string.
If I understand correctly, you want to print "\n" if the line ending is \n and "\r\n" if it is \r\n.
In that case, String#dump is what you need:
puts ending.dump # => "\n" or "\r\n"
I would use method String#inspect:
s = "abc\n\rdef"
puts s
puts s.inspect
This is a very handy method that is defined on all objects. You can print hashes, arrays, whatever.
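A short sketch comparing the two suggestions. The value of ending is hard-coded here as an assumption; in the question it would come from determine_line_ending.

```ruby
# Assumed value; in the question this comes from determine_line_ending.
ending = "\r\n"

dumped    = ending.dump     # escaped form, wrapped in double quotes
inspected = ending.inspect  # same result for plain ASCII control characters

puts dumped
puts inspected

# Kernel#p is shorthand for `puts obj.inspect`.
p ending
```

For ASCII-only strings like these, the two methods print the same thing. String#dump additionally escapes non-ASCII characters, which String#inspect usually leaves readable.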

Ruby unable to create directories

I'm trying to read a file having string values, line by line and create respective folder/directory for every string value.
#require 'fileutils'
value=File.open('D:\\exercise\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp
print "FOlder names:#{line}"
Dir.mkdir("D:\\exercise\\#{line}")
end
and I'm getting the below error:
read_folders_svn.rb:8:in `mkdir': Invalid argument - Australia (Errno::EINVAL)
from read_folders_svn.rb:8:in `block in <main>'
list.txt file's content below
Australia
USA
EUrope
Africa
ANtartica
Printing the values works fine, but creating the directories fails with the error above. I also tried FileUtils.mkdir and got the same error.
Any suggestions please. Thanks
The error is in the line:
line.chomp
It strips the newline from the tail of line and returns a value that is ignored. It doesn't change the value of line. It still ends with "\n" and this is a character that is not allowed in file names on Windows. The code runs fine on Linux and creates directories whose names end in "\n".
The solution is also simple. Use #chomp! instead:
#require 'fileutils'
value=File.open('D:\\exercise\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp!
print "FOlder names:#{line}"
Dir.mkdir("D:\\exercise\\#{line}")
end
(It might still produce errors, however, because of empty lines in the input).
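The difference between chomp and chomp!, plus one way to skip the empty lines mentioned above, can be sketched as follows. The directory names and the use of a temporary directory are assumptions made so the example does not touch D:\exercise.

```ruby
require "tmpdir"

# chomp returns a new string; chomp! modifies the receiver in place.
line = +"Australia\n"
copy = line.chomp
untouched = line.dup   # still ends with "\n" at this point
line.chomp!

# Assumed sample input, including a blank line.
names = "Australia\nUSA\n\nEUrope\n"
created = []

# Work inside a temporary directory that is removed afterwards.
Dir.mktmpdir do |root|
  names.each_line do |raw|
    name = raw.chomp
    next if name.empty?   # skip blank lines instead of calling mkdir with ""
    Dir.mkdir(File.join(root, name))
    created << name
  end
end

p created
```

Assigning the result of chomp to a new variable avoids relying on in-place mutation, which also keeps working if the strings happen to be frozen.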
Have you checked that the line doesn't contain extra characters? While line.chomp! will solve your problem, line.strip! is probably the more robust variant, especially if you have Windows line endings (\r\n).
Difference between chomp and strip
String#chomp operates on the end of strings, while String#strip
operates on the start and end of strings. String#chomp takes an
optional 'record separator' argument, while String#strip takes no
arguments. If String#chomp is given no arguments it will remove
a trailing line ending from the end of the string being operated
on (\r, \n or \r\n). If String#chomp is passed a string as an
argument, that string is removed from the end of the string being
operated on. String#strip will remove leading and trailing null and
whitespace characters from the string being operated on.
"Cadel Evans".chomp(' Evans') # => "Cadel"
"Cadel Evans\r\n".chomp # => "Cadel Evans"
"\tRobbie McEwen\r\n".strip # => "Robbie McEwen"

Check the formatting of an entire file using regex

I have a file formatted by lines like this (I know it's a terrible format, I didn't write it):
id: 12345 synset: word1,word2
I want to read the entire file and check to see if every line is correct without having to look line by line.
I've looked into File and Regex, but couldn't find what I need. I tried to use File.read to read the entire file all at once, then use m modifier for regex to check multiple lines, but it's not working the way I anticipated (perhaps it's not what I need).
p.s. Ruby newbie :)
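Assuming your file always ends with a newline, this should work:
/\A(id: \d+ synset: \w+,\w+\n)+\z/
In Ruby, ^ and $ match at every line boundary, so \A and \z are needed to anchor the pattern to the whole string. The m modifier only makes . match newlines, so it isn't needed here.
The full Ruby:
content = File.read('myfile.txt')
puts 'file is valid!' if content =~ /\A(id: \d+ synset: \w+,\w+\n)+\z/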
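When validating a whole file with one regex, it matters whether the pattern is anchored to lines (^ and $) or to the entire string (\A and \z). The sketch below uses made-up content with an invalid first line to show the difference:

```ruby
# Made-up content: the first line is invalid, the second is valid.
content = "garbage\nid: 1 synset: a,b\n"

line_anchored   = /^(id: \d+ synset: \w+,\w+\n)+$/
string_anchored = /\A(id: \d+ synset: \w+,\w+\n)+\z/

# ^ can match at the start of the second line, so the invalid first line
# is never examined and the check passes anyway.
loose = line_anchored.match?(content)

# \A and \z force the pattern to cover the whole string.
strict = string_anchored.match?(content)

puts "line anchors accept it:   #{loose}"
puts "string anchors accept it: #{strict}"
```

For an all-lines-must-match check, the string anchors are the safer choice.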
You can use this regex to check each line of the file: ^id:\s*\d+\s+synset:\s*(?:\w+,)*\w+$. You can try the following code, but I don't know any Ruby, I just searched and tested a little. It might work.
line_num = 0
text = File.read('file.txt')
text.each_line do |line|
  line_num += 1
  if !/^id:\s*\d+\s+synset:\s*(?:\w+,)*\w+$/.match(line)
    puts "Line #{line_num} is incorrect"
  end
end
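If it is useful to know which lines are wrong rather than just whether the file is valid, the per-line check can be wrapped in a small method. The names LINE_FORMAT and invalid_line_numbers are hypothetical, and the pattern is a stricter variant that assumes single spaces between fields:

```ruby
# Hypothetical names; the pattern assumes single spaces between fields.
LINE_FORMAT = /\Aid: \d+ synset: \w+(?:,\w+)*\z/

# Returns the 1-based numbers of the lines that do not match LINE_FORMAT.
def invalid_line_numbers(content)
  bad = []
  content.each_line(chomp: true).with_index(1) do |line, number|
    bad << number unless line.match?(LINE_FORMAT)
  end
  bad
end

sample = "id: 1 synset: a,b\nbogus\nid: 2 synset: c\n"
p invalid_line_numbers(sample)
```

An empty result means every line matched. The chomp: keyword and match? need Ruby 2.4 or later.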

Ruby: How to append to each line of a string based on a given regex?

I want to append </tag> to each line where it's missing:
text = '<tag>line 1</tag>
<tag>line2 # no closing tag, append
<tag>line3 # no closing tag, append
line4</tag> # no opening tag, but has a closing tag, so ignore
<tag>line5</tag>'
I tried to create a regular expression to match this but I know its wrong:
text.gsub! /.*?(<\/tag>)\Z/, '</tag>'
How can I create a regular expression to conditionally append each line?
Here you go:
text.gsub!(%r{(?<!</tag>)$}, "</tag>")
Explanation:
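$ matches at the end of each line, while \z matches only at the end of the string. \Z is similar to \z, but also matches just before a trailing newline.
(?<!...) is a negative lookbehind: the match succeeds only if the position is not preceded by </tag>.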
Given the example provided, I'd just do something like this:
text.split(/<\/?tag>/).
reject {|t| t.strip.length == 0 }.
map {|t| "<tag>%s</tag>" % t.strip }.
join("\n")
You're basically treating both <tag> and </tag> as record delimiters, so you can just split on them, reject any blank records, then construct a new combined string from the extracted values. This works nicely when you can't count on newlines being record delimiters and will generally be tolerant of missing tags.
If you're insistent on a pure regex solution, though, and your data format will always match the given format (one record per line), you can use a negative lookbehind:
text.strip.gsub(/(?<!<\/tag>)(\n|$)/, "</tag>\\1")
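A sketch of the negative-lookbehind approach applied to the sample from the question. The inline "# ..." annotations are left out of the string, on the assumption that they are not part of the real data:

```ruby
# Sample from the question, without the inline annotations (an assumption).
text = "<tag>line 1</tag>\n<tag>line2\n<tag>line3\nline4</tag>\n<tag>line5</tag>"

# `$` matches at the end of every line; the lookbehind rejects positions
# that are already preceded by "</tag>", so only the open lines are touched.
fixed = text.gsub(%r{(?<!</tag>)$}, "</tag>")

puts fixed
```

If the string ended with a newline, $ would also match at the very end of the string and add an extra "</tag>" there, which is why stripping the text first can help.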
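One that could work is:
/(<tag>[^\n ]+[^>])[\s]*\n/
This matches each line that opens with <tag> but has no ">" before the trailing newline, and captures the text so it can be kept.
Replace the match with the captured text followed by "</tag>\n", i.e.
text.gsub!(/(<tag>[^\n ]+[^>])[\s]*\n/, "\\1</tag>\n")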
For more polishing, try http://rubular.com/
text = '<tag>line 1</tag>
<tag>line2
<tag>line3
line4</tag>
<tag>line5</tag>'
result = ""
text.each_line do |line|
line.rstrip!
line << "</tag>" if not line.end_with?("</tag>")
result << line << "\n"
end
puts result
--output:--
<tag>line 1</tag>
<tag>line2</tag>
<tag>line3</tag>
line4</tag>
<tag>line5</tag>

How do you loop through a multiline string in Ruby?

Pretty simple question from a first-time Ruby programmer.
How do you loop through a slab of text in Ruby? Everytime a newline is met, I want to re-start the inner-loop.
def parse(input)
...
end
String#each_line
str.each_line do |line|
#do something with line
end
What Iraimbilanja said.
Or you could split the string at new lines:
str.split(/\r?\n|\r/).each { |line| … }
Beware that each_line keeps the line feed chars, while split eats them.
Note the regex I used here will take care of all three line ending formats. String#each_line separates lines by the optional argument sep_string, which defaults to $/, which itself defaults to "\n" simply.
Lastly, if you want to do more complex string parsing, check out the built-in StringScanner class.
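The difference between each_line and split can be seen on a string that mixes all three line-ending styles. The sample string is made up for illustration:

```ruby
# Made-up sample mixing CRLF, LF and a bare CR.
str = "one\r\ntwo\nthree\rfour"

# each_line splits on "\n" by default and keeps the terminator,
# so the bare "\r" does not start a new line.
kept = str.each_line.to_a

# split with this regex handles all three styles and drops the terminators.
pieces = str.split(/\r?\n|\r/)

# With chomp: true (Ruby 2.4+), each_line drops the terminator as well.
chomped = "one\ntwo\n".each_line(chomp: true).to_a

p kept
p pieces
p chomped
```

each_line avoids building the whole array when called with a block, which can matter for large strings.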
You can also do this with any pattern:
str.scan(/\w+/) do |w|
#do something
end
str.each_line(chomp: true) do |line|
  # do something with a clean line without line feed characters
end
This should take care of the newlines (the chomp: option needs Ruby 2.4 or later).
