Delete first two lines and add two lines to file - ruby

I have a text file that starts with:
Title
aaa
bbb
ccc
I don't know what the lines will contain, but I know that the structure of the file will be Title, then an empty line, then the actual lines. I want to modify it to:
New Title
fff
aaa
bbb
ccc
I had this in mind:
lineArray = File.readlines(destinationFile).drop(2)
lineArray.insert(0, 'fff\n')
lineArray.insert(0, '\n')
lineArray.insert(0, 'new Title\n')
File.writelines(destinationFile, lineArray)
but writelines doesn't exist.
`writelines' for File:Class (NoMethodError)
Is there a way to delete the first two lines of the file and add three new lines?

I'd start with something like this:
NEWLINES = {
0 => "New Title",
1 => "\nfff"
}
File.open('test.txt.new', 'w') do |fo|
File.foreach('test.txt').with_index do |li, ln|
fo.puts (NEWLINES[ln] || li)
end
end
Here's the contents of test.txt.new after running:
New Title
fff
aaa
bbb
ccc
The idea is to provide a list of replacement lines in the NEWLINES hash. As each line is read from the original file the line number is checked in the hash, and if the line exists then the corresponding value is used, otherwise the original line is used.
If you read the entire file and then substitute, the code gets a little shorter, but it will have scalability issues:
NEWLINES = [
"New Title",
"",
"fff"
]
file = File.readlines('test.txt')
File.open('test.txt.new', 'w') do |fo|
fo.puts NEWLINES
fo.puts file[(NEWLINES.size - 1) .. -1]
end
It's not very smart but it'll work for simple replacements.
If you really want to do it right, learn how diff works, create a diff file, then let it do the heavy lifting, as it's designed for this sort of task, runs extremely fast, and is used millions of times every day on *nix systems around the world.

Use puts with the whole array:
File.open("destinationFile", "w+") do |f|
f.puts(lineArray)
end
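Combined with the approach from the question, a minimal sketch might look like this. Note the double-quoted strings: in Ruby, '\n' in single quotes is a literal backslash followed by n, not a newline. The temporary file is only a stand-in for destinationFile, and the snake_case names are my own.

```ruby
require "tempfile"

# Hypothetical stand-in for destinationFile, created only for this demo.
file = Tempfile.new("demo")
file.write("Title\n\naaa\nbbb\nccc\n")
file.close
destination_file = file.path

# Drop the old title and the empty line, then prepend the new lines.
line_array = File.readlines(destination_file).drop(2)
line_array.unshift("New Title\n", "\n", "fff\n")

# IO#puts accepts an array and writes each element on its own line,
# adding a newline only where an element does not already end with one.
File.open(destination_file, "w") { |f| f.puts(line_array) }

result = File.read(destination_file)
```

This still reads the whole file into memory, so it is only reasonable for small files.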

If your files are big, the performance and memory implications of reading them into memory in their entirety are worth thinking about. If that's a concern, then your best bet is to treat the files as streams. Here's how I would do it.
First, define your replacement text:
require "stringio"
replacement = StringIO.new <<END
New Title
fff
END
I've made this a StringIO object, but it could also be a File object if your replacement text is in a file.
Now, open your destination file (a new file) and write each line from the replacement text into it.
dest = File.open(dest_fn, 'wb')
replacement.each_line {|ln| dest << ln }
We could have done this more efficiently, but there's a good reason to do it this way: Now we can call replacement.lineno to get the number of lines read, instead of iterating over it a second time to count the lines.
Next, open the original file and seek ahead by calling gets replacement.lineno times:
orig = File.open(orig_fn, 'r')
replacement.lineno.times { orig.gets }
Finally, write the remaining lines from the original file to the new file. We'll do it more efficiently this time with File.copy_stream:
File.copy_stream(orig, dest)
orig.close
dest.close
That's it. Of course, it's a drag closing those files manually (and when we do we should do it in an ensure block), so it's better to use the block form of File.open to automatically close them. Also, we can move the orig.gets calls into the replacement.each_line loop:
File.open(dest_fn, 'wb') do |dest|
File.open(orig_fn, 'r') do |orig|
replacement.each_line {|ln| dest << ln; orig.gets }
File.copy_stream(orig, dest)
end
end

First create an input test file.
FNameIn = "test_in"
text = <<_
Title

How now,
brown cow?
_
#=> "Title\n\nHow now,\nbrown cow?\n"
File.write(FNameIn, text)
#=> 27
Now read and write line-by-line.
FNameOut = "test_out"
File.open(FNameIn) do |fin|
fin.gets; fin.gets
File.open(FNameOut, 'w') do |fout|
fout.puts "New Title"
fout.puts
fout.puts "fff"
until fin.eof?
fout.puts fin.gets
end
end
end
Check the result:
puts File.read(FNameOut)
# New Title
#
# fff
# How now,
# brown cow?
Ruby will close each of the two files when its block terminates.
If the files are not large, you could instead write:
File.write(FNameOut,
["New Title\n", "\n", "fff\n"].concat(File.readlines(FNameIn).drop(2)).join)

Related

How to delete lines from multiple files

I'm trying to read a file (d:\mywork\list.txt) line by line and search if that string occurs in any of the files (one by one) in a particular directory (d:\new_work).
If present in any of the files (may be one or more) I want to delete the string (car\yrui3,) from the respective files and save the respective file.
list.txt:
car\yrui3,
dom\09iuo,
id\byt65_d,
rfc\some_one,
desk\aa_tyt_99,
.........
.........
Directory having multiple files: d:\new_work:
Rollcar-access.txt
Mycar-access.txt
Newcar-access.txt
.......
......
My code:
value=File.open('D:\\mywork\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp!
print "For the string: #{line}"
Dir.glob("D:/new_work/*-access.txt") do |fn|
print "checking files:#{fn}\n"
text = File.read(fn)
replace = text.gsub(line.strip, "")
File.open(fn, "w") { |file| file.puts replace }
end
end
The issue is, values are not getting deleted as expected. Also, text is empty when I tried to print the value.
There are a number of things wrong with your code, and you're not safely handling your file changes.
Meditate on this untested code:
ACCESS_FILES = Dir.glob("D:/new_work/*-access.txt")
File.foreach('D:/mywork/list.txt') do |target|
target = target.strip.sub(/,$/, '')
ACCESS_FILES.each do |filename|
new_filename = "#{filename}.new"
old_filename = "#{filename}.old"
File.open(new_filename, 'w') do |fileout|
File.foreach(filename) do |line_in|
fileout.puts line_in unless line_in[target]
end
end
File.rename(filename, old_filename)
File.rename(new_filename, filename)
File.delete(old_filename)
end
end
In your code you use:
File.open('D:\\mywork\\list.txt').read
Instead, a shorter and clearer way would be to use:
File.read('D:/mywork/list.txt')
Ruby will automatically adjust the pathname separators based on the OS so always use forward slashes for readability. From the IO documentation:
Ruby will convert pathnames between different operating system conventions if possible. For instance, on a Windows system the filename "/gumby/ruby/test.rb" will be opened as "\gumby\ruby\test.rb".
The problem using read is it isn't scalable. Imagine if you were doing this in a long term production system and your input file had grown into the TB range. You'd halt the processing on your system until the file could be read. Don't do that.
Instead use foreach to read line-by-line. See "Why is "slurping" a file not a good practice?". That'll remove the need for
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp!
While
Dir.glob("D:/new_work/*-access.txt") do |fn|
is fine, its placement isn't. You're doing it for every line processed in your file being read, wasting CPU. Read it first and store the value, then iterate over that value repeatedly.
Again,
text = File.read(fn)
has scalability issues. Using foreach is a better solution. Again.
Replacing the text using gsub is fast, but it doesn't outweigh the potential problems of scalability when line-by-line IO is just as fast and sidesteps the issue completely:
replace = text.gsub(line.strip, "")
Opening and writing to the same file as you were reading is an accident waiting to happen in a production environment:
File.open(fn, "w") { |file| file.puts replace }
A better practice is to write to a separate, new, file, rename the old file to something safe, then rename the new file to the old file's name. This preserves the old file in case the code or machine crashes mid-save. Then, when that's finished it's safe to remove the old file. See "How to search file text for a pattern and replace it with a given value" for more information.
A final recommendation is to strip all the trailing commas from your input file. They're not accomplishing anything and are only making you do extra work to process the file.
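The write-then-rename sequence described above could be wrapped in a small helper. The sketch below is an assumption-laden illustration, not the code from the answer: the method name remove_string_from_file is made up, it removes the target string from each line rather than dropping the whole line (which is what the question asked for), and the demo data lives in a temporary directory.

```ruby
require "tmpdir"

# Hypothetical helper: rewrite `filename` with every occurrence of `target`
# removed, going through a ".new" file so the original survives a crash.
def remove_string_from_file(filename, target)
  new_filename = "#{filename}.new"
  old_filename = "#{filename}.old"

  File.open(new_filename, "w") do |fileout|
    File.foreach(filename) do |line_in|
      # A String pattern is matched literally, so backslashes need no escaping.
      fileout.write(line_in.gsub(target, ""))
    end
  end

  File.rename(filename, old_filename) # keep the original until the swap is done
  File.rename(new_filename, filename)
  File.delete(old_filename)
end

# Demo with made-up data in a temporary directory.
contents = Dir.mktmpdir do |dir|
  path = File.join(dir, "Mycar-access.txt")
  File.write(path, "car\\yrui3,dom\\09iuo,\nid\\byt65_d,\n")
  remove_string_from_file(path, "car\\yrui3,")
  File.read(path)
end
```

Only one line of the input is held in memory at a time, and the original file is untouched until the new one has been written completely.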
I just ran your code and it works as expected on my machine. My best guess is that you're not taking the commas at the end of each line in list.txt into account. Try removing them with an extra chomp!:
value=File.open('D:\\mywork\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp!
line.chomp!(",")
print "For the string: #{line}"
Dir.glob("D:/new_work/*-access.txt") do |fn|
print "checking files:#{fn}\n"
text = File.read(fn)
replace = text.gsub(line.strip, "")
File.open(fn, "w") { |file| file.puts replace }
end
end
By the way, you shouldn't need this line: value.gsub!(/\r\n?/, "\n") since you're chomping all the newlines away anyway, and chomp can recognize \r\n by default.
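As a quick illustration of that last point, here is a small sketch with made-up sample strings showing that chomp handles both line-ending styles, and that a second chomp(",") strips the trailing comma:

```ruby
# String#chomp with no argument removes one trailing "\n", "\r\n" or "\r".
windows_line = "car\\yrui3,\r\n"
unix_line    = "car\\yrui3,\n"

# First chomp drops the line ending, the second drops the trailing comma.
cleaned = [windows_line, unix_line].map { |l| l.chomp.chomp(",") }
```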

Replace a specific line in a file using Ruby

I have a text file (a.txt) that looks like the following.
open
close
open
open
close
open
I need to find a way to replace the 3rd line with "close". I did some searching, and most methods involve searching for the line and then replacing it. I can't really do that here since I don't want to turn all the "open" lines into "close".
Essentially (for this case) I'm looking for a write version of IO.readlines("./a.txt") [2].
How about something like:
lines = File.readlines('file')
lines[2] = 'close' << $/
File.open('file', 'w') { |f| f.write(lines.join) }
str = <<-_
my
dog
has
fleas
_
FNameIn = 'in'
FNameOut = 'out'
First, let's write str to FNameIn:
File.write(FNameIn, str)
#=> 17
Here are a couple of ways to replace the third line of FNameIn with "had" when writing the contents of FNameIn to FNameOut.
#1 Read a line, write a line
If the file is large, you should read from the input file and write to the output file one line at a time, rather than keeping large strings or arrays of strings in memory.
fout = File.open(FNameOut, "w")
File.foreach(FNameIn).with_index { |s,i| fout.puts(i==2 ? "had" : s) }
fout.close
Let's check that FNameOut was written correctly:
puts File.read(FNameOut)
my
dog
had
fleas
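Note that IO#puts writes a record separator if the string does not already end with one (see note 1). Also, if fout.close is omitted, FNameOut is only closed when the File object is garbage-collected or the program exits, so an explicit close (or the block form of File.open) is safer.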
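The behaviour of puts with respect to trailing newlines can be checked without touching the disk. This is a small sketch using StringIO as a stand-in for the output file:

```ruby
require "stringio"

out = StringIO.new
out.puts "no newline"      # puts appends "\n"
out.puts "has newline\n"   # already ends with "\n", nothing is added
out.puts ["a", "b\n"]      # arrays are written element by element
written = out.string
```

This is why passing lines read by File.foreach (which keep their "\n") straight to puts does not produce doubled line endings.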
#2 Use a regex
r = /
(?:[^\n]*\n) # Match a line in a non-capture group
{2} # Perform the above operation twice
\K # Discard all matches so far
[^\n]+ # Match next line up to the newline
/x # Free-spacing regex definition mode
File.write(FNameOut, File.read(FNameIn).sub(r,"had"))
puts File.read(FNameOut)
my
dog
had
fleas
1 File.superclass #=> IO, so IO's methods are inherited by File.

Work with strings and arrays in file in Ruby

I have a textfile ("dict.txt") with 8K+ English words:
apple -- description text
angry -- description text
bear -- description text
...
I need to delete all text after "--" on each line of my file.
What is the easiest and fastest way to solve this problem?
Starting with:
words = [
'apple -- description text',
'angry -- description text',
'bear -- description text',
]
If you want just the words preceding --:
words.map{ |w| w.split(/\s-+\s/).first } # => ["apple", "angry", "bear"]
Or:
words.map{ |w| w[/^(.+) --/, 1] } # => ["apple", "angry", "bear"]
If you want the words AND --:
words.map{ |w| w[/^(.+ --)/, 1] } # => ["apple --", "angry --", "bear --"]
If the goal is to create a version of the file without the descriptions:
File.open('new_dict.txt', 'w') do |fo|
File.foreach('dict.txt') do |li|
fo.puts li.split(/\s-+\s/).first
end
end
In general, to avoid scalability problems if/when your input file grows to huge proportions, use foreach to iterate over the input file and process it as single lines. It's a wash as far as processing speed goes when iterating line-by-line or trying to slurp it all in and process it as a buffer or an array. Slurping a huge file can slow a machine to a crawl or crash your code, making it infinitely slower; line-by-line IO is surprisingly fast and doesn't have that potential problem.
File.read("dict.txt").gsub(/(?<=--).*/, "")
output
apple --
angry --
bear --
...
lines_without_description = File.read('dict.txt').lines.map{|line| line[0..line.index('-')+1]}
File.open('dict2.txt', 'w'){|f| f.write(lines_without_description.join("\n"))}
If you want speed, you might want to think about doing it with sed on the command line:
sed 's/ -- .*//' < dict.txt > new_dict.txt
This creates a new file new_dict.txt containing only the words.

How to print from specific column range?

I want to grab only the first line of columns 46 to 245 of source.txt and write it to output.txt
source_file.each { |line|
File.open(output_file,"a+") { |f|
f.print ???
}
Bonus: I also need to keep a count of the number of characters in this range, as some may be whitespace. i.e. 38 characters and the rest whitespace.
Example:
source_file: (first line only, columns 45 to 245): 13287912721981239854 + 180 blank columns
output_file: 13287912721981239854
count = 20 characters
Update: appending [46..245].delete(' ').size gives me the desired count.
If I am understanding what you are asking correctly, there's no reason to grab the whole file when you only want the first line. If this isn't what you're asking for, then you need to specify what you're trying to pull out of the source file more clearly.
This should grab the data you need:
output_line = source_file.gets[45..244]
If you write:
source_file.each { |line|
File.open(output_file,"a+") { |f|
f.print ???
}
}
You will open, then close, your output file for each line read from the input file. That is the wrong way to do it, even if you only want to read one line of input.
Instead try something like one of these:
File.open(output_file, 'a') do |fo|
File.open('path/to/input_file') do |fi|
fo.puts fi.readline[46..245]
end
end
This uses IO#readline, which reads a single line from the file. The block falls through afterwards, causing both the input and output files to be closed automatically. Also, it opens the output file as 'a' which is append-mode only. 'a+' is wrong unless you intend to append and read, which is rarely done. From the documentation:
"a+" Read-write, starts at end of file if file exists,
otherwise creates a new file for reading and
writing
Or:
File.open(output_file, 'a') do |fo|
File.foreach('path/to/input_file') do |li|
fo.puts li[46..245]
break
end
end
foreach is used most often when we're reading a file line-by-line. It's the mainstay for reading files in a scalable manner. It wants to loop over the file inside the block, which is why break is there, to break out of that loop.
Or:
File.foreach('path/to/input_file') do |li|
File.write(output_file, li[46..245], mode: 'a')
break
end
File.write is useful when you have a blob of text or binary, and want to write it in one chunk, then move on. mode: 'a' overrides the default mode, which would normally truncate an existing file, so the text is appended to the end instead.
Maybe this will do the job:
line = f.readline
columns = line.split
File.open("output.txt", "w") do |out|
columns[46, (245 - 46 + 1)].each do |column|
out.puts column
end
end
break # only process first line
I have used 245 - 46 + 1 to indicate this is the number of columns we are interested in. I have also assumed that columns are separated by whitespace. If that is not the case you will need to change the delimiter of split.
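If "columns" means character positions, as the question's update suggests, the extraction and the count from the bonus part could be sketched as follows. The input data is made up for the demo, and the sketch assumes columns are counted from 1, so columns 46 to 245 correspond to string indices 45..244:

```ruby
require "tempfile"

# Made-up input: 45 filler characters, 20 digits, then blank padding.
source = Tempfile.new("source")
source.write("x" * 45 + "13287912721981239854" + " " * 180 + "\n")
source.write("second line, ignored\n")
source.close

# Read only the first line; the file is closed when the block returns.
first_line = File.open(source.path, &:readline)

# Columns 46..245 counted from 1 are indices 45..244 counted from 0.
field = first_line[45..244]
count = field.delete(" ").size
```

If the columns in your data are counted from 0 instead, shift the range accordingly.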

How to get a particular line from a file

Is it possible to extract a particular line from a file knowing its line number? For example, just get the contents of line N as a string from file "text.txt"?
You could get it by index from readlines.
line = IO.readlines("file.txt")[42]
Only use this if it's a small file.
Try one of these two solutions:
file = File.open "file.txt"
#1 solution would eat a lot of RAM
p [*file][n-1]
#2 solution would not
n.times{ file.gets }
p $_
file.close
def get_line_from_file(path, line)
result = nil
File.open(path, "r") do |f|
while line > 0
line -= 1
result = f.gets
end
end
return result
end
get_line_from_file("/tmp/foo.txt", 20)
This is a good solution because:
You don't use File.read, thus you don't read the entire file into memory. Doing so could become a problem if the file is 20MB large and you read often enough so GC doesn't keep up.
You only read from the file until the line you want. If your file has 1000 lines, getting line 20 will only read the 20 first lines into Ruby.
You can replace gets with readline if you want to raise an error (EOFError) instead of returning nil when passing an out-of-bounds line.
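The difference between gets and readline at the end of a file can be seen in a short sketch. The two-line temporary file is made up for the demo:

```ruby
require "tempfile"

file = Tempfile.new("lines")
file.write("one\ntwo\n")
file.close

gets_result = :unset
readline_error = nil

File.open(file.path) do |f|
  2.times { f.gets }      # consume both lines
  gets_result = f.gets    # past the end: returns nil
  begin
    f.readline            # past the end: raises EOFError
  rescue EOFError => e
    readline_error = e
  end
end
```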
File has a nice lineno method.
def get_line(filename, lineno)
File.open(filename,'r') do |f|
f.gets until f.lineno == lineno - 1
f.gets
end
end
linenumber=5
open("file").each_with_index{|line,ind|
if ind+1==linenumber
save=line
# break or exit if needed.
end
}
or
linenumber=5
f=open("file")
while line=f.gets
if $. == linenumber # $. is line number
print "#{f.lineno} #{line}" # another way
# break # break or exit if needed
end
end
f.close
If you just want to get the line and do nothing else, you can use this one liner
ruby -ne '(print $_ and exit) if $.==5' file
If you want a one-liner and do not care about memory usage, use (assuming lines are numbered from 1)
lineN = IO.readlines('text.txt')[n-1]
or
lineN = f.readlines[n-1]
if you already have file opened.
Otherwise it would be better to do like this:
lineN = File.open('text.txt') do |f|
(n-1).times { f.gets } # skip lines preceding line N
f.gets # read line N contents
end
These solutions work if you want only one line from a file, or if you want multiple lines from a file small enough to be read repeatedly. Large files (for example, 10 million lines) take much longer to search for a specific line so it's better to get the necessary lines sequentially in a single read so the large file doesn't get read multiple times.
Create a large file:
File.open('foo', 'a') { |f| f.write((0..10_000_000).to_a.join("\n")) }
Pick which lines will be read from it and make sure they're sorted:
lines = [9_999_999, 3_333_333, 6_666_666].sort
Print out those lines:
File.open('foo') do |f|
lines.each_with_index do |line, index|
(line - (index.zero? ? 0 : lines[index - 1]) - 1).times { f.gets }
puts f.gets
end
end
This solution works for any number of lines, does not load the entire file into memory, reads as few lines as possible, and only reads the file one time.
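The same single-pass idea can also be written with File.foreach and a 1-based index. This is an alternative sketch rather than the code above: the method name fetch_lines is hypothetical, and it returns a hash from line number to line content instead of printing.

```ruby
require "set"
require "tempfile"

# Hypothetical helper: read `path` once and return { line_number => text }
# for the requested 1-based line numbers, stopping after the last one.
def fetch_lines(path, wanted)
  wanted = wanted.to_set
  return {} if wanted.empty?

  last = wanted.max
  found = {}
  File.foreach(path).with_index(1) do |line, number|
    found[number] = line.chomp if wanted.include?(number)
    break if number >= last # nothing further is needed, stop reading
  end
  found
end

# Demo on a small made-up file with 100 lines.
demo = Tempfile.new("foo")
demo.write((1..100).map { |i| "line #{i}\n" }.join)
demo.close

picked = fetch_lines(demo.path, [90, 30, 60])
```

Requested line numbers beyond the end of the file are simply missing from the result.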

Resources