Check the formatting of an entire file using regex - ruby

I have a file formatted by lines like this (I know it's a terrible format, I didn't write it):
id: 12345 synset: word1,word2
I want to read the entire file and check to see if every line is correct without having to look line by line.
I've looked into File and Regexp, but couldn't find what I need. I tried using File.read to read the entire file all at once, then using the m modifier on the regex to check multiple lines, but it's not working the way I anticipated (perhaps it's not what I need).
p.s. Ruby newbie :)

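Assuming your file always ends with a newline, this should work:
/\A(id: \d+ synset: \w+,\w+\n)+\z/
In Ruby, ^ and $ match at the start and end of every line, so \A and \z are needed to anchor the whole string; with ^ and $, a single run of valid lines anywhere in the file is enough for a match. The m modifier only makes . match newlines, so it isn't needed here.
The full Ruby:
content = File.read('myfile.txt')
puts 'file is valid!' if content =~ /\A(id: \d+ synset: \w+,\w+\n)+\z/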
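A quick comparison of per-line anchors and whole-string anchors on content whose first line is malformed. This is a sketch that assumes the answer's line format (exactly two comma-separated words per synset) and that every line, including the last, ends with "\n":

```ruby
# Per-line anchors (^ $) versus whole-string anchors (\A \z).
LOOSE  = /^(?:id: \d+ synset: \w+,\w+\n)+$/
STRICT = /\A(?:id: \d+ synset: \w+,\w+\n)+\z/

bad  = "not a valid line\nid: 1 synset: a,b\n"
good = "id: 1 synset: a,b\nid: 2 synset: c,d\n"

puts LOOSE.match?(bad)    # true: the second line matches on its own
puts STRICT.match?(bad)   # false: the first line breaks the match
puts STRICT.match?(good)  # true
```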

You can use this regex to check each line of the file: ^id:\s*\d+\s+synset:\s*(?:\w+,)*\w+$. You can try the following code, but I don't know any Ruby, I just searched and tested a little. It might work.
File.foreach('file.txt').with_index(1) do |line, line_num|
  unless line =~ /^id:\s*\d+\s+synset:\s*(?:\w+,)*\w+$/
    puts "Line #{line_num} is incorrect"
  end
end
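If you'd rather collect the bad line numbers than print them, the same per-line pattern can be used with with_index and reject. A minimal sketch; invalid_line_numbers is a hypothetical helper that accepts any IO-like object, so a real file works via File.open(path) { |f| invalid_line_numbers(f) }:

```ruby
require 'stringio'

# The per-line pattern from this answer.
LINE_RE = /^id:\s*\d+\s+synset:\s*(?:\w+,)*\w+$/

# Returns the 1-based numbers of the lines that do not match LINE_RE.
def invalid_line_numbers(io)
  io.each_line.with_index(1)
    .reject { |line, _n| line.match?(LINE_RE) }
    .map { |_line, n| n }
end

sample = StringIO.new("id: 1 synset: a,b\nbroken line\nid: 3 synset: c\n")
p invalid_line_numbers(sample)  # => [2]
```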

Related

How to delete lines from multiple files

I'm trying to read a file (d:\mywork\list.txt) line by line and check whether each string occurs in any of the files (one by one) in a particular directory (d:\new_work).
If it is present in any of the files (there may be one or more), I want to delete the string (e.g. car\yrui3,) from those files and save them.
list.txt:
car\yrui3,
dom\09iuo,
id\byt65_d,
rfc\some_one,
desk\aa_tyt_99,
.........
.........
Directory containing multiple files, d:\new_work:
Rollcar-access.txt
Mycar-access.txt
Newcar-access.txt
.......
......
My code:
value=File.open('D:\\mywork\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp!
print "For the string: #{line}"
Dir.glob("D:/new_work/*-access.txt") do |fn|
print "checking files:#{fn}\n"
text = File.read(fn)
replace = text.gsub(line.strip, "")
File.open(fn, "w") { |file| file.puts replace }
end
end
The issue is that the values are not getting deleted as expected. Also, text is empty when I try to print it.
There are a number of things wrong with your code, and you're not safely handling your file changes.
Meditate on this untested code:
ACCESS_FILES = Dir.glob("D:/new_work/*-access.txt")
File.foreach('D:/mywork/list.txt') do |target|
target = target.strip.sub(/,$/, '')
next if target.empty? # a blank line in list.txt would otherwise match, and delete, every line
ACCESS_FILES.each do |filename|
new_filename = "#{filename}.new"
old_filename = "#{filename}.old"
File.open(new_filename, 'w') do |fileout|
File.foreach(filename) do |line_in|
fileout.puts line_in unless line_in[target]
end
end
File.rename(filename, old_filename)
File.rename(new_filename, filename)
File.delete(old_filename)
end
end
In your code you use:
File.open('D:\\mywork\\list.txt').read
Instead, a shorter and clearer way is:
File.read('D:/mywork/list.txt')
Ruby will automatically adjust the pathname separators based on the OS, so always use forward slashes for readability. From the IO documentation:
Ruby will convert pathnames between different operating system conventions if possible. For instance, on a Windows system the filename "/gumby/ruby/test.rb" will be opened as "\gumby\ruby\test.rb".
The problem with read is that it isn't scalable. Imagine doing this in a long-term production system where your input file had grown into the TB range. You'd halt processing on your system until the whole file could be read. Don't do that.
Instead use foreach to read line-by-line. See "Why is "slurping" a file not a good practice?". That'll remove the need for
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp!
While
Dir.glob("D:/new_work/*-access.txt") do |fn|
is fine, but its placement isn't. You're globbing the directory again for every line of the file being read, wasting CPU. Glob once, store the result, then iterate over that value repeatedly.
Again,
text = File.read(fn)
has scalability issues. Using foreach is a better solution. Again.
Replacing the text using gsub is fast, but it doesn't outweigh the potential problems of scalability when line-by-line IO is just as fast and sidesteps the issue completely:
replace = text.gsub(line.strip, "")
Opening and writing to the same file as you were reading is an accident waiting to happen in a production environment:
File.open(fn, "w") { |file| file.puts replace }
A better practice is to write to a separate new file, rename the old file to something safe, then rename the new file to the old file's name. This preserves the old file in case the code or machine crashes mid-save. Then, when that's finished, it's safe to remove the old file. See "How to search file text for a pattern and replace it with a given value" for more information.
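The write-new/rename/delete-old sequence described here can be wrapped in a small reusable method. A sketch only: safe_filter_lines is a hypothetical helper that keeps the lines for which its block returns true, and the demo runs in a throwaway temp directory rather than D:/new_work:

```ruby
require 'tmpdir'

# Rewrites filename line by line, keeping only lines for which the block is true.
def safe_filter_lines(filename)
  new_filename = "#{filename}.new"
  old_filename = "#{filename}.old"
  File.open(new_filename, 'w') do |fileout|
    File.foreach(filename) do |line|
      fileout.write(line) if yield(line)  # write preserves the original line ending
    end
  end
  File.rename(filename, old_filename)     # keep the original until the swap is done
  File.rename(new_filename, filename)
  File.delete(old_filename)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'Mycar-access.txt')
  File.write(path, "car\\yrui3,\nkeep me\n")
  safe_filter_lines(path) { |line| !line.include?('car\\yrui3') }
  puts File.read(path)  # prints "keep me"
end
```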
A final recommendation is to strip all the trailing commas from your input file. They're not accomplishing anything and are only making you do extra work to process the file.
I just ran your code and it works as expected on my machine. My best guess is that you're not taking the commas at the end of each line in list.txt into account. Try removing them with an extra chomp!:
value=File.open('D:\\mywork\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp!
line.chomp!(",")
print "For the string: #{line}"
Dir.glob("D:/new_work/*-access.txt") do |fn|
print "checking files:#{fn}\n"
text = File.read(fn)
replace = text.gsub(line.strip, "")
File.open(fn, "w") { |file| file.puts replace }
end
end
By the way, you shouldn't need this line: value.gsub!(/\r\n?/, "\n") since you're chomping all the newlines away anyway, and chomp can recognize \r\n by default.
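To see what the extra chomp! does, here is one Windows-style line from list.txt normalized step by step (a sketch; the sample string is made up for illustration):

```ruby
raw = "car\\yrui3,\r\n"
without_newline = raw.chomp          # chomp removes "\r\n" as well as "\n"
target = without_newline.chomp(',')  # chomp(',') removes one trailing comma
p without_newline  # => "car\\yrui3,"
p target           # => "car\\yrui3"
```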

Ruby: sub/gsub at a particular line OR before/after a pattern

I know that I can replace text as below in a file
File.write(file, File.read(file).gsub(/text/, "text_to_replace"))
Can we also use sub/gsub to:
Replace a string on a particular line number (useful when the same string appears at several places in a file)
Example
root#vikas:~# cat file.txt
fix grammatical or spelling errors
clarify meaning without changing it
correct minor mistakes
add related resources or links
root#box27:~#
I want to insert some text at the 3rd line:
root#vikas:~# cat file.txt
fix grammatical or spelling errors
clarify meaning without changing it
Hello, how are you ?
correct minor mistakes
add related resources or links
root#box27:~
Replace a string on the line just before/after matching a pattern
Example
root#vikas:~# cat file.txt
fix grammatical or spelling errors
clarify meaning without changing it
correct minor mistakes
add related resources or links
root#box27:~#
I want to search 'minor mistakes' and put text 'Hello, how are you ?' before that.
root#vikas:~# cat file.txt
fix grammatical or spelling errors
clarify meaning without changing it
Hello, how are you ?
correct minor mistakes
add related resources or links
root#box27:~
Here is the answer.
File.foreach("file.txt") do |line|
  puts "Hello, how are you ?" if line =~ /minor mistakes/
  puts line
end
Here is a Ruby one-liner:
ruby -pe 'puts "Hello, how are you ?" if $_ =~ /minor mistakes/' < file.txt
You can find this functionality in a gem like Thor. Check out the documentation for the inject_into_file method here:
http://www.rubydoc.info/github/erikhuda/thor/master/Thor/Actions#inject_into_file-instance_method.
Here is the source code for the method:
https://github.com/erikhuda/thor/blob/067f6638f95bd000b0a92cfb45b668bca5b0efe3/lib/thor/actions/inject_into_file.rb#L24-L32
If you wish to match on line n (offset from zero):
def match_line_i(fname, line_nbr, regex)
  IO.foreach(fname).with_index { |line, i|
    return line[regex] if i == line_nbr }
  nil
end
or
    return line.scan(regex) if i == line_nbr }
depending on your requirements.
If you wish to match on a given line, then return the previous line, for application of gsub (or whatever):
def return_previous_line(fname, regex)
  last_line = nil
  IO.foreach(fname) do |line|
    return last_line if line =~ regex
    last_line = line
  end
  nil
end
Both methods return nil if there is no match.
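For the question's first case, replacing text on one specific line, the same with_index idea can rebuild the text with sub applied only to that line. A minimal sketch; sub_on_line is a hypothetical helper name, and lines are counted from 1 here (unlike match_line_i, which counts from zero):

```ruby
# Applies sub only on line number line_number (1-based); other lines are untouched.
def sub_on_line(text, line_number, pattern, replacement)
  text.each_line.with_index(1).map { |line, n|
    n == line_number ? line.sub(pattern, replacement) : line
  }.join
end

print sub_on_line("foo\nfoo\nfoo\n", 2, 'foo', 'bar')
# foo
# bar
# foo

# Writing the result back to the file (reads the whole file into memory):
# File.write('file.txt', sub_on_line(File.read('file.txt'), 3, /old/, 'new'))
```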
Okay, since sub/gsub have no such option, here is my code (with slight modifications to BMW's code) for all three options. Hopefully, this helps someone in a similar situation.
Insert text before a pattern
Insert text after a pattern
Insert text at a specific line number
root#box27:~# cat file.txt
fix grammatical or spelling errors
clarify meaning without changing it
correct minor mistakes
add related resources or links
always respect the original author
root#box27:~#
root#box27:~# cat ruby_script
puts "#### Insert text before a pattern"
pattern = 'minor mistakes'
File.foreach("file.txt") do |line|
  puts "Hello, how are you ?" if line =~ /#{pattern}/
  puts line
end
puts "\n\n#### Insert text after a pattern"
pattern = 'meaning without'
File.foreach("file.txt") do |line|
  puts line
  puts "Hello, how are you ?" if line =~ /#{pattern}/
end
puts "\n\n#### Insert text at a particular line"
insert_at_line = 3
File.foreach("file.txt").with_index(1) do |line, line_number|
  puts "Hello, how are you ?" if line_number == insert_at_line
  puts line
end
root#box27:~#
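The scripts here only print the modified text. To change file.txt itself, one option is to build the new text in memory and write it back. A sketch, assuming every line (including the last) ends with a newline; insert_line is a hypothetical helper, not a library method:

```ruby
# Inserts new_line before (or after) every line matching pattern.
def insert_line(text, pattern, new_line, position: :before)
  text.each_line.flat_map do |line|
    next [line] unless line.match?(pattern)
    position == :before ? ["#{new_line}\n", line] : [line, "#{new_line}\n"]
  end.join
end

original = "fix grammatical or spelling errors\n" \
           "clarify meaning without changing it\n" \
           "correct minor mistakes\n"
puts insert_line(original, /minor mistakes/, 'Hello, how are you ?')

# To change file.txt in place (reads the whole file into memory):
# File.write('file.txt', insert_line(File.read('file.txt'), /minor mistakes/, 'Hello, how are you ?'))
```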

I am getting 50 different loops when including a variable in a string using Ruby

I'm trying to get a string to run and print on a separate page with a certain string and a variable concatenated. I thought I had the code right, but all I get is a loop that runs fifty times. This is the code that I am using:
f = File.open("urlfile.txt", "r")
line = ""
while (line = f.gets)
puts "<outline text=\"\" type=\"link\" url=\""+File.read("urlfile.txt")+"\" dateCreated=\"\"/>"
end
f.close
Then this is what it's spitting out, a loop that runs about 50 times:
http://washingtondc.craigslist.org
http://westpalmbeach.craigslist.org
http://westpalmbeach.craigslist.org
http://westslope.craigslist.org
http://westslope.craigslist.org
http://yubasutter.craigslist.org
http://yubasutter.craigslist.org
http://yuma.craigslist.org
http://yuma.craigslist.org
" dateCreated=""/>
and this is what the code should look like when it is spit out
You are re-reading the whole file inside the loop; as hinted by @Mark, you should be using line inside the string interpolation.
Aside: Perhaps it's better to refactor the code to idiomatic Ruby; consider the following, for instance:
lines = File.readlines('urlfile.txt')
lines.each do |line|
puts %|<outline text="" type="link" url="#{line.strip}" dateCreated=""/>|
end
Jikku's answer is correct, but if the file is large, readlines will be expensive. In that case, read the file line by line (as you are already trying to do). Here is the corrected code:
f = File.open("urlfile.txt", "r")
while (line = f.gets)
puts "<outline text=\"\" type=\"link\" url=\""+line.strip+"\" dateCreated=\"\"/>"
end
f.close
It's all your code; there are only two things that I've changed.
You had predefined line = "", which was unnecessary because the while condition assigns it.
While you were already using line to iterate over your file line by line, by doing File.read("urlfile.txt") you were dumping the whole file again in each iteration. Hence "so many loops" as you described in your question.
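Putting both fixes together, a sketch that builds each element from the current line only and streams the file with File.foreach. outline_for is a hypothetical helper name, and it assumes the URLs contain no characters that need XML escaping (such as & or quotes):

```ruby
# Builds one OPML <outline> element from a single line of urlfile.txt.
def outline_for(url)
  %(<outline text="" type="link" url="#{url.strip}" dateCreated=""/>)
end

puts outline_for("http://yuma.craigslist.org\n")
# <outline text="" type="link" url="http://yuma.craigslist.org" dateCreated=""/>

# Applied to the whole file, one line at a time:
# File.foreach('urlfile.txt') { |line| puts outline_for(line) }
```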

How far does .each read? To the end of the line?

Sorry for the newbie question. I was loading a .txt file into the following code:
line_count = 0
File.open("text.txt").each {|line| line_count += 1}
puts line_count
Does .each simply read until the end of a line before passing its value to the code block? Little explanation would be great. Thanks!
You can use .each_line to be more explicit, but yes, each reads one line at a time; see http://www.ruby-doc.org/core-2.0.0/IO.html#method-i-each
f = File.new("testfile")
f.each {|line| puts "#{f.lineno}: #{line}" }
It's really important to read the documentation, because all sorts of things are explained there. For instance, the documentation for each says:
Executes the block for every line in ios, where lines are separated by sep.
sep defaults to the value of the special global variable $/, which is "\n" on every platform. Because "\r\n" also ends in "\n", files with Windows line endings are split into lines correctly too. You can tell Ruby to use a different value for the line-end/separator if you know the file uses something else.
Regarding your code:
I'd do it this way:
line_count = 0
File.foreach("text.txt") do |line|
line_count += 1
end
puts line_count
foreach is very self-explanatory, which is important when writing code. You want it to be self-documenting as much as possible. foreach iterates over "each" line in the file. It also assumes the line-ends are the same as $/, but you can force it to be something different, perhaps the letter "z" or "." or " ", depending on your whim and fancy at the moment.
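Two related idioms, sketched on in-memory strings so they run without text.txt: called without a block, each_line (like File.foreach) returns an Enumerator, so Enumerable#count can replace the manual counter, and a custom separator is passed as an argument (File.foreach takes it second, after the filename):

```ruby
require 'stringio'

# Counting lines without a manual counter.
# With a real file this would be File.foreach("text.txt").count
io = StringIO.new("one\ntwo\nthree\n")
puts io.each_line.count  # prints 3

# Splitting on a custom separator instead of $/.
# With a real file: File.foreach("text.txt", ".") { |chunk| ... }
p "a.b.c".each_line('.').to_a  # => ["a.", "b.", "c"]
```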

Ruby regex gsub a line in a text file

I need to match a line in a text file that has been read into a string and wrap the captured line with a character, for example X.
Imagine a text file like this:
test
foo
test
bar
I would like to use gsub to output:
XtestX
XfooX
XtestX
XbarX
I'm having trouble matching a line though. I've tried using regex starting with ^ and ending with $, but it doesn't seem to work. Any ideas?
I have a text file that has the following in it:
test
foo
test
bar
The text file is being read in as a command line argument.
So I have:
string = IO.read(ARGV[0])
string = string.gsub(/^(test)$/,'X\1X')
puts string
It outputs the exact same thing that is in the text file.
If you're trying to match every line, then
gsub(/^.*$/, 'X\&X')
does the trick. If you only want to match certain lines, then replace .* with whatever you need.
Update:
Replacing your gsub with mine:
string = IO.read(ARGV[0])
string = string.gsub(/^.*$/, 'X\&X')
puts string
I get:
$ gsub.rb testfile
XtestX
XfooX
XtestX
XbarX
Update 2:
As per @CodeGnome, you might try adding chomp:
IO.readlines(ARGV[0]).each do |line|
puts "X#{line.chomp}X"
end
This works equally well for me. My understanding of ^ and $ in regular expressions was that chomping wouldn't be necessary, but maybe I'm wrong.
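One plausible reason the ^(test)$ version printed the file unchanged (an assumption, since the question doesn't say which line endings the file uses) is Windows CRLF endings: $ matches only before "\n" or at the end of the string, so the "\r" left at the end of each line blocks the match. A small demonstration:

```ruby
crlf = "test\r\nfoo\r\n"

# "\r" sits between "test" and "\n", so $ cannot match right after "test".
p crlf.gsub(/^(test)$/, 'X\1X')                      # => "test\r\nfoo\r\n"

# Normalizing the line endings first makes the same pattern work.
p crlf.gsub(/\r\n/, "\n").gsub(/^(test)$/, 'X\1X')  # => "XtestX\nfoo\n"
```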
You can do it in one line like this:
IO.write(filepath, File.open(filepath) {|f| f.read.gsub(/<appId>\d+<\/appId>/, "<appId>42</appId>")})
IO.write truncates the given file by default, so if you read the text first, perform the regex String.gsub and return the resulting string using File.open in block mode, it will replace the file's content in one fell swoop.
I like the way this reads, but it can be written in multiple lines too of course:
IO.write(filepath, File.open(filepath) do |f|
  f.read.gsub(/<appId>\d+<\/appId>/, "<appId>42</appId>")
end
)
If your file is input.txt, I'd do the following:
File.open("input.txt") do |file|
  file.each_line do |line|
    puts line.gsub(/^(.*)$/, 'X\1X')
  end
end
(.*) captures all the characters on the line into a group.
\1 in the replacement string refers to that captured group.
If you prefer to do it in one line on the whole content, you can do it as follows:
File.read("input.txt").gsub(/^(.*)$/, 'X\1X')
string.gsub(/^(matchline)$/, 'X\1X')
This uses a backreference (\1) to get the first capture group of the regex and surround it with X.
Example:
string = "test\nfoo\ntest\nbar"
string.gsub!(/^test$/, 'X\&X')
p string
=> "XtestX\nfoo\nXtestX\nbar"
Chomp Line Endings
Your lines probably have newline characters. You need to handle this one way or another. For example, this works fine for me:
$ ruby -ne 'puts "X#{$_.chomp}X"' /tmp/corpus
XtestX
XfooX
XtestX
XbarX
