How to delete lines from multiple files - ruby

I'm trying to read a file (d:\mywork\list.txt) line by line and check whether each string occurs in any of the files (one by one) in a particular directory (d:\new_work).
If it's present in any of the files (there may be one or more), I want to delete the string (e.g. car\yrui3,) from the respective files and save them.
list.txt:
car\yrui3,
dom\09iuo,
id\byt65_d,
rfc\some_one,
desk\aa_tyt_99,
.........
.........
Directory having multiple files: d:\new_work:
Rollcar-access.txt
Mycar-access.txt
Newcar-access.txt
.......
......
My code:
value=File.open('D:\\mywork\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
  line.chomp!
  print "For the string: #{line}"
  Dir.glob("D:/new_work/*-access.txt") do |fn|
    print "checking files:#{fn}\n"
    text = File.read(fn)
    replace = text.gsub(line.strip, "")
    File.open(fn, "w") { |file| file.puts replace }
  end
end
The issue is that the values are not getting deleted as expected. Also, text is empty when I try to print it.

There are a number of things wrong with your code, and you're not safely handling your file changes.
Meditate on this untested code:
ACCESS_FILES = Dir.glob("D:/new_work/*-access.txt")
File.foreach('D:/mywork/list.txt') do |target|
  target = target.strip.sub(/,$/, '')
  next if target.empty? # a blank target would match, and delete, every line
  ACCESS_FILES.each do |filename|
    new_filename = "#{filename}.new"
    old_filename = "#{filename}.old"
    File.open(new_filename, 'w') do |fileout|
      File.foreach(filename) do |line_in|
        fileout.puts line_in unless line_in[target]
      end
    end
    File.rename(filename, old_filename)
    File.rename(new_filename, filename)
    File.delete(old_filename)
  end
end
In your code you use:
File.open('D:\\mywork\\list.txt').read
Instead, a shorter and clearer way is to use:
File.read('D:/mywork/list.txt')
Ruby will automatically adjust the pathname separators based on the OS, so always use forward slashes for readability. From the IO documentation:
Ruby will convert pathnames between different operating system conventions if possible. For instance, on a Windows system the filename "/gumby/ruby/test.rb" will be opened as "\gumby\ruby\test.rb".
The problem with using read is that it isn't scalable. Imagine if you were doing this in a long-term production system and your input file had grown into the TB range. You'd halt the processing on your system until the file could be read. Don't do that.
Instead use foreach to read line-by-line. See "Why is "slurping" a file not a good practice?". That'll remove the need for
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp!
While
Dir.glob("D:/new_work/*-access.txt") do |fn|
is fine, its placement isn't. You're repeating it for every line read from list.txt, wasting CPU. Read it first and store the value, then iterate over that value repeatedly.
Again,
text = File.read(fn)
has scalability issues. Using foreach is a better solution. Again.
Replacing the text using gsub is fast, but it doesn't outweigh the potential problems of scalability when line-by-line IO is just as fast and sidesteps the issue completely:
replace = text.gsub(line.strip, "")
Opening and writing to the same file as you were reading is an accident waiting to happen in a production environment:
File.open(fn, "w") { |file| file.puts replace }
A better practice is to write to a separate new file, rename the old file to something safe, then rename the new file to the old file's name. This preserves the old file in case the code or machine crashes mid-save. Then, when that's finished, it's safe to remove the old file. See "How to search file text for a pattern and replace it with a given value" for more information.
A final recommendation is to strip all the trailing commas from your input file. They're not accomplishing anything and are only making you do extra work to process the file.
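Both pieces of advice (read the targets and the file list once, and rewrite each file through a new file plus renames) can be combined. The following is a minimal sketch, not the answer's exact code: `safe_rewrite` is a hypothetical helper name, the block decides which lines are kept, and the temporary directory and sample contents only stand in for D:/mywork and D:/new_work.

```ruby
require 'tmpdir'

# Hypothetical helper: rewrite `filename`, keeping only lines for which the block
# returns true. The original is renamed aside until the new file is in place.
def safe_rewrite(filename)
  new_filename = "#{filename}.new"
  old_filename = "#{filename}.old"

  File.open(new_filename, 'w') do |fileout|
    File.foreach(filename) do |line|
      fileout.write(line) if yield(line)
    end
  end

  File.rename(filename, old_filename)
  File.rename(new_filename, filename)
  File.delete(old_filename)
end

result = Dir.mktmpdir do |dir|
  list = File.join(dir, 'list.txt')
  File.write(list, "car\\yrui3,\ndom\\09iuo,\n")
  access = File.join(dir, 'Mycar-access.txt')
  File.write(access, "car\\yrui3,\nkeep me\ndom\\09iuo,\n")

  # Read the targets once, dropping whitespace, trailing commas and blank entries.
  targets = File.foreach(list).map { |t| t.strip.chomp(',') }.reject(&:empty?)

  # Glob once, then rewrite each access file a single time.
  Dir.glob(File.join(dir, '*-access.txt')).each do |fn|
    safe_rewrite(fn) { |line| targets.none? { |t| line.include?(t) } }
  end

  File.read(access)
end

puts result # prints "keep me"
```

Reading every target up front means each access file is rewritten once, instead of once per line of list.txt.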

I just ran your code and it works as expected on my machine. My best guess is that you're not taking the commas at the end of each line in list.txt into account. Try removing them with an extra chomp!:
value=File.open('D:\\mywork\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
  line.chomp!
  line.chomp!(",")
  print "For the string: #{line}"
  Dir.glob("D:/new_work/*-access.txt") do |fn|
    print "checking files:#{fn}\n"
    text = File.read(fn)
    replace = text.gsub(line.strip, "")
    File.open(fn, "w") { |file| file.puts replace }
  end
end
By the way, you shouldn't need this line: value.gsub!(/\r\n?/, "\n") since you're chomping all the newlines away anyway, and chomp can recognize \r\n by default.
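A quick check of that claim: with no argument, String#chomp removes a trailing "\n", "\r\n" or "\r", and chomp(",") then removes the trailing comma. The entries come from list.txt; the Windows-style ending on the first one is an assumption for illustration.

```ruby
# Entries as they might come out of list.txt, one with a Windows ("\r\n") ending.
entries = ["car\\yrui3,\r\n", "dom\\09iuo,\n"]

# chomp drops the line ending, chomp(",") drops the trailing comma.
targets = entries.map { |entry| entry.chomp.chomp(",") }

p targets # prints ["car\\yrui3", "dom\\09iuo"]
```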

Related

Check the formatting of an entire file using regex

I have a file formatted by lines like this (I know it's a terrible format, I didn't write it):
id: 12345 synset: word1,word2
I want to read the entire file and check to see if every line is correct without having to look line by line.
I've looked into File and Regex, but couldn't find what I need. I tried to use File.read to read the entire file all at once, then use m modifier for regex to check multiple lines, but it's not working the way I anticipated (perhaps it's not what I need).
p.s. Ruby newbie :)
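Assuming your file always ends with a newline, this should work. Anchor it with \A and \z rather than ^ and $: in Ruby, ^ and $ always match at line boundaries, so /^(...)+$/ succeeds whenever the file ends with a run of valid lines, even if earlier lines are malformed.
/\A(id: \d+ synset: \w+,\w+\n)+\z/
The full Ruby:
content = File.read('myfile.txt')
puts 'file is valid!' if content =~ /\A(id: \d+ synset: \w+,\w+\n)+\z/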
You can use this regex to check each line of the file: ^id:\s*\d+\s+synset:\s*(?:\w+,)*\w+$. You can try the following code, but I don't know any Ruby; I just searched and tested a little. It might work.
line_num = 0
File.foreach('file.txt') do |line|
  line_num += 1
  if !/^id:\s*\d+\s+synset:\s*(?:\w+,)*\w+$/.match(line)
    puts "Line #{line_num} is incorrect"
  end
end
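If you also want to know which lines are wrong, the same per-line regex can be anchored with \A and \z (start and end of the whole string) and applied to each chomped line. This is a sketch: `invalid_lines` is a hypothetical helper, and the StringIO sample stands in for the real file, which you could check with File.open('file.txt') { |f| invalid_lines(f) }.

```ruby
require 'stringio'

LINE_FORMAT = /\Aid:\s*\d+\s+synset:\s*(?:\w+,)*\w+\z/

# Hypothetical helper: returns the 1-based numbers of lines that don't match.
def invalid_lines(io)
  io.each_line.with_index(1)
    .reject { |line, _num| LINE_FORMAT.match?(line.chomp) }
    .map { |_line, num| num }
end

sample = StringIO.new("id: 12345 synset: word1,word2\nid: x synset: oops\nid: 7 synset: a,b,c\n")
bad = invalid_lines(sample)
p bad # prints [2]
```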

Writing Arrays using grep

I'm trying to search through a specified string and assign the results to an array.
Opening and writing to the "input.txt" and "ms3.txt" files works fine. Putting a normal string in, like reassign << "hello", works fine too; it's only when I use line.grep with the regex that nothing is printed to the console or to the ms3 file. It doesn't even raise any errors.
I've also tried a search and replace: reassign << line.gsub(/[abc]/, '£')
Here's the code
# encoding: utf-8
#!/usr/bin/ruby
file = File.open("input.txt", "w+")
reassign = []
file.each_line do |line|
  reassign << line.grep(/[abc]/)
end
new_file = File.open("ms3.txt", "w+")
new_file.puts(reassign)
new_file.close
Your code can be streamlined a lot to make it more Ruby-like, and to make it behave better:
#! lines have to be first, so reverse the encoding and "shebang" lines.
open takes a block, which lets Ruby close the file automatically when the block exits. This is a very powerful and smart thing to do, as it keeps your file I/O environment clean. It's possible, and often a problem as in your code, to open files but never close them; done in a loop, this can exhaust the file handles available to your process, so further attempts to open files fail. Using the block form avoids this.
The IO class has foreach, which makes it simple to iterate over the lines of a file. Take advantage of it instead of opening the file then using each_line, because it simplifies your code.
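To see the block form at work, the sketch below keeps a reference to the file object only so it can be inspected after the block exits; the temporary directory is just a stand-in for your working directory.

```ruby
require 'tmpdir'

handle = nil

Dir.mktmpdir do |dir|
  File.open(File.join(dir, 'ms3.txt'), 'w') do |fo|
    handle = fo # kept only to inspect it after the block
    fo.puts 'hello'
  end
end

puts handle.closed? # prints true: File.open closed it when the block exited
```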
Here's how I'd initially write your code:
#!/usr/bin/ruby
# encoding: utf-8
reassign = []
File.foreach("input.txt") do |line|
  reassign << line if line[/[abc]/]
end
File.write("ms3.txt", reassign.join)
But, after refactoring it I'd end up with:
#!/usr/bin/ruby
# encoding: utf-8
File.open('ms3.txt', 'w') do |fo|
  fo.puts File.foreach('input.txt').grep(/[abc]/)
end
The open opens the output file using a block to take advantage of automatically closing the file when the block exits.
foreach is an iterator, and normally is used with a block to pass each line read into the block. Instead, I'm letting grep read all the lines found and search for the pattern.
The lines that grep finds matching the pattern are returned as an array to puts, which writes each one, appending "\n" only to those that don't already end with it.
fo.puts directs the output of puts to the output file.
end causes the block to exit, which causes open to close the file.
That's untested but looks correct.
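A runnable check of the refactored version, with the character class written as /[abc]/. The file contents are hypothetical sample data, and the temporary directory stands in for the script's directory.

```ruby
require 'tmpdir'

output = Dir.mktmpdir do |dir|
  input_path  = File.join(dir, 'input.txt')
  output_path = File.join(dir, 'ms3.txt')
  File.write(input_path, "apple\nxyz\ncherry\nfig\n")

  File.open(output_path, 'w') do |fo|
    # foreach without a block returns an Enumerator; grep keeps the matching lines.
    fo.puts File.foreach(input_path).grep(/[abc]/)
  end

  File.read(output_path)
end

print output # prints apple and cherry, one per line
```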
There are several issues with your code:
You open "input.txt" with open mode "w+". According to the documentation, this truncates your file to zero length. An empty file doesn't contain any lines and therefore, file.each_line doesn't invoke the block.
If you want to read from the file, use "r", which is the default:
file = File.open("input.txt")
You don't close file. Use the block form which closes the file automatically:
File.open("input.txt") do |file|
  # ...
end
line is a String and there's no String#grep method. But since File inherits from IO, which includes Enumerable, you can use Enumerable#grep instead:
reassign = file.grep(/[abc]/)
A complete example:
File.open("input.txt") do |file|
  reassign = file.grep(/[abc]/)
  File.open("ms3.txt", "w+") do |new_file|
    new_file.puts(reassign)
  end
end

How to print from specific column range?

I want to grab columns 46 to 245 of only the first line of source.txt and write them to output.txt:
source_file.each { |line|
  File.open(output_file,"a+") { |f|
    f.print ???
  }
}
Bonus: I also need to keep a count of the number of characters in this range, as some may be whitespace, e.g. 38 characters and the rest whitespace.
Example:
source_file: (first line only, columns 45 to 245): 13287912721981239854 + 180 blank columns
output_file: 13287912721981239854
count = 20 characters
Update: appending [46..245].delete(' ').size gives me the desired count.
If I am understanding what you are asking correctly, there's no reason to grab the whole file when you only want the first line. If this isn't what you're asking for, then you need to specify what you're trying to pull out of the source file more clearly.
This should grab the data you need:
output_line = source_file.gets[45..244]
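To get the bonus count as well, here is a minimal sketch assuming the question's columns are counted from 1, so columns 46 to 245 correspond to string indexes 45..244. `first_line_columns` is a hypothetical helper that reads only the first line, and the StringIO stands in for source.txt.

```ruby
require 'stringio'

# Hypothetical helper: returns [text, count] for 1-based columns first_col..last_col
# of the first line of io. text has trailing blanks removed; count ignores spaces.
def first_line_columns(io, first_col, last_col)
  line = io.gets.to_s.chomp
  field = line[(first_col - 1)..(last_col - 1)].to_s
  [field.rstrip, field.delete(' ').size]
end

source = StringIO.new((' ' * 45) + '13287912721981239854' + (' ' * 180) + "\nsecond line\n")
text, count = first_line_columns(source, 46, 245)
puts text  # prints 13287912721981239854
puts count # prints 20
```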
If you write:
source_file.each { |line|
  File.open(output_file,"a+") { |f|
    f.print ???
  }
}
You will open, then close, your output file for each line read from the input file. That is the wrong way to do it, even if you only want to read one line of input.
Instead try something like one of these:
File.open(output_file, 'a') do |fo|
  File.open('path/to/input_file') do |fi|
    fo.puts fi.readline[46..245]
  end
end
This uses IO#readline, which reads a single line from the file. The blocks then fall through, causing both the input and output files to be closed automatically. It also opens the output file as 'a', which is append-only mode. 'a+' is wrong unless you intend to both append and read, which is rarely done. From the documentation:
"a+" Read-write, starts at end of file if file exists,
otherwise creates a new file for reading and
writing
Or:
File.open(output_file, 'a') do |fo|
  File.foreach('path/to/input_file') do |li|
    fo.puts li[46..245]
    break
  end
end
foreach is used most often when we're reading a file line-by-line. It's the mainstay for reading files in a scalable manner. It wants to loop over the file inside the block, which is why break is there, to break out of that loop.
Or:
File.foreach('path/to/input_file') do |li|
  File.write(output_file, li[46..245], :mode => 'a')
  break
end
File.write is useful when you have a blob of text or binary data and want to write it in one chunk, then move on. :mode => 'a' overrides the default mode, which would normally truncate an existing file.
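A small sketch of File.write's two behaviours, using a hypothetical output.txt in a temporary directory: the default mode truncates, while mode: 'a' appends.

```ruby
require 'tmpdir'

contents = Dir.mktmpdir do |dir|
  path = File.join(dir, 'output.txt')
  File.write(path, "first\n")             # default mode truncates (or creates) the file
  File.write(path, "second\n", mode: 'a') # append instead of truncating
  File.read(path)
end

print contents # prints "first" then "second" on separate lines
```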
Maybe this will do the job:
line = File.open("source.txt") { |f| f.readline } # only process the first line
columns = line.split
File.open("output.txt", "w") do |out|
  # columns[...] is nil when the line has fewer than 46 fields, hence the || []
  (columns[46, (245 - 46 + 1)] || []).each do |column|
    out.puts column
  end
end
I have used 245 - 46 + 1 to indicate the number of columns we are interested in. I have also assumed that columns are separated by whitespace. If that is not the case, you will need to change the delimiter passed to split.

Deleting a specific line in a text file?

How can I delete a single, specific line from a text file? For example the third line, or any other line. I tried this:
line = 2
file = File.open(filename, 'r+')
file.each { last_line = file.pos unless file.eof? }
file.seek(last_line, IO::SEEK_SET)
file.close
Unfortunately, it does nothing. I tried a lot of other solutions, but nothing works.
I think you can't do that safely because of file system limitations.
If you really want to do in-place editing, you could read the file into memory, edit it, and then replace the old file. But beware that there are at least two problems with this approach. First, if your program stops in the middle of rewriting, you will get an incomplete file. Second, if your file is too big, it will eat your memory.
file_lines = ''
IO.readlines(your_file).each_with_index do |line, index|
  # Example condition: skip the third line (index 2); put your own condition here
  file_lines += line unless index == 2
end
# <extra string manipulation to file_lines if you wanted>
File.open(your_file, 'w') do |file|
  file.puts file_lines
end
Something along those lines should work, but using a temporary file is much safer and is the standard approach:
require 'fileutils'
File.open(output_file, "w") do |out_file|
  File.foreach(input_file).with_index do |line, index|
    # Example condition: skip the third line (index 2); put your own condition here
    out_file.puts line unless index == 2
  end
end
FileUtils.mv(output_file, input_file)
Your condition can be anything that identifies the unwanted line. For example, file_lines += line unless line.chomp == "aaab" would remove the line "aaab".

How to read lines of a file in Ruby

I was trying to use the following code to read lines from a file, but when reading one file, the contents are all printed on one line:
line_num=0
File.open('xxx.txt').each do |line|
  print "#{line_num += 1} #{line}"
end
With another file, though, the same code prints each line separately.
I have to use stdin, like ruby my_prog.rb < file.txt, where I can't assume what the line-ending character is that the file uses. How can I handle it?
Ruby does have a method for this:
File.readlines('foo').each do |line|
  puts(line)
end
http://ruby-doc.org/core-1.9.3/IO.html#method-c-readlines
File.foreach(filename).with_index do |line, line_num|
  puts "#{line_num}: #{line}"
end
This will execute the given block for each line in the file without slurping the entire file into memory. See: IO::foreach.
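with_index also accepts a starting offset, so numbering can begin at 1 like the question's line_num counter. A small sketch with a hypothetical two-line xxx.txt in a temporary directory:

```ruby
require 'tmpdir'

numbered = Dir.mktmpdir do |dir|
  path = File.join(dir, 'xxx.txt')
  File.write(path, "alpha\nbeta\n")
  # with_index(1) starts counting at 1 instead of 0
  File.foreach(path).with_index(1).map { |line, num| "#{num}: #{line.chomp}" }
end

puts numbered # prints "1: alpha" and "2: beta" on separate lines
```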
I believe my answer covers your new concerns about handling any type of line endings since both "\r\n" and "\r" are converted to Linux standard "\n" before parsing the lines.
To support the "\r" EOL character along with the regular "\n", and "\r\n" from Windows, here's what I would do:
line_num = 0
text = File.read('xxx.txt')
text.gsub!(/\r\n?/, "\n")
text.each_line do |line|
  print "#{line_num += 1} #{line}"
end
Of course this could be a bad idea on very large files since it means loading the whole file into memory.
Your first file has Mac Classic line endings (that’s "\r" instead of the usual "\n"). Read it with
File.foreach('foo', "\r") do |line|
  # ...
end
to specify the line endings.
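The separator argument works the same way on any IO-like object, and chomp accepts the same separator to strip it. This sketch uses a StringIO with hypothetical "\r"-terminated data in place of the file:

```ruby
require 'stringio'

# Mac Classic style data: lines end in "\r" rather than "\n".
io = StringIO.new("first\rsecond\rthird")

lines = io.each_line("\r").map { |line| line.chomp("\r") }

p lines # prints ["first", "second", "third"]
```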
I'm partial to the following approach for files that have headers:
File.open(file, "r") do |fh|
  header = fh.readline
  # Process the header
  while (line = fh.gets)
    # do stuff
  end
end
This allows you to process a header line (or lines) differently than the content lines.
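A concrete version of that pattern, assuming a hypothetical comma-separated file whose first line is a header; the StringIO stands in for the opened file handle.

```ruby
require 'stringio'

fh = StringIO.new("name,qty\napples,3\npears,5\n")

header = fh.readline.chomp.split(',') # the header line is handled separately
rows = []
while (line = fh.gets)                # gets returns nil at end of input
  rows << line.chomp.split(',')
end

p header # prints ["name", "qty"]
p rows   # prints [["apples", "3"], ["pears", "5"]]
```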
This is because of the line ending at the end of each line. Use Ruby's chomp method to remove the trailing "\n" or "\r":
line_num = 0
File.foreach('xxx.txt') do |line|
  puts "#{line_num += 1} #{line.chomp}"
end
How about gets?
myFile = File.open("paths_to_file", "r")
while (line = myFile.gets)
  # do stuff with line
end
myFile.close
Don't forget that if you are concerned about reading in a file that might have huge lines that could swamp your RAM at runtime, you can always read the file piecemeal. See "Why slurping a file is bad".
File.open('file_path', 'rb') do |io|
  while (chunk = io.read(16 * 1024))
    # do something with the chunk,
    # like stream it across a network
    # or write it to another file:
    # other_io.write chunk
  end
end
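Filled in with the "write it to another file" case, the chunked loop might look like this sketch. The 40,000 bytes of sample data and both file names are hypothetical; the point is that only one 16 KB chunk is held in memory at a time.

```ruby
require 'tmpdir'

CHUNK_SIZE = 16 * 1024

copied_size = Dir.mktmpdir do |dir|
  src = File.join(dir, 'big.bin')
  dst = File.join(dir, 'copy.bin')
  File.binwrite(src, 'x' * 40_000) # sample data spanning several chunks

  File.open(src, 'rb') do |io|
    File.open(dst, 'wb') do |other_io|
      # io.read(n) returns nil at end of file, which ends the loop
      while (chunk = io.read(CHUNK_SIZE))
        other_io.write(chunk)
      end
    end
  end

  File.size(dst)
end

puts copied_size # prints 40000
```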
