Writing at a specific position in a text file using Ruby

I want to insert data at specific positions in a text file, for example in line 1 starting at position 10. How can I do it using Ruby?
I also want to write fake data into this file using the Faker gem, or in any other way possible, such as a phone number, name, SSN, etc.

Here's a sample script that takes two arguments and writes a modified copy of the first file's contents to the second file:
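require 'faker'
# File.readlines opens the file, reads every line and closes it again.
lines = File.readlines(ARGV[0])
lines[0].gsub!(/^(.{0,10})/, '\1' + Faker::Base.numerify('###').to_s)
# The block form closes (and flushes) the output file when it finishes.
File.open(ARGV[1], 'w') do |output|
  lines.each do |line|
    output.write(line)
  end
end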
If you have an input file that looks like:
12345678901234567890
          ^^^ fake data
the output might look like:
12345678909451234567890
          ^^^ fake data
Since I opened the output file only after reading the whole input file, you can pass the same file name as both the first and the second argument. That isn't exactly inserting the string into the file (the whole file is rewritten, because you can't make room in the middle of an existing file), but it's as close as you'll get.
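For contrast, here is a minimal sketch of writing into the middle of a file directly: opening it in 'r+' mode, moving the cursor with IO#seek and writing overwrites the existing bytes instead of shifting them right. It uses a throwaway file under Dir.mktmpdir so it can run anywhere; the offset is a byte offset, which only equals the character position for ASCII text.

```ruby
require 'tmpdir'

overwritten = Dir.mktmpdir do |dir|
  path = File.join(dir, 'demo.txt')
  File.write(path, "12345678901234567890\n")

  # 'r+' opens for reading and writing without truncating the file.
  File.open(path, 'r+') do |f|
    f.seek(10)      # move the cursor to byte offset 10 of the first line
    f.write('945')  # replaces bytes 10..12; nothing after them moves
  end

  File.read(path)   # Dir.mktmpdir returns the block's value
end

puts overwritten    # prints 12345678909454567890
```

The line keeps its original length and the digits 123 are lost, which is why the approach above reads everything and writes it back out.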
The key line is:
lines[0].gsub!(/^(.{0,10})/, '\1' + Faker::Base.numerify('###').to_s)
It takes the first line and, in place, inserts a random 3-digit number after the first 10 characters. If there are fewer than 10 characters in the first line, it'll append the random data to the end of the line. If you'd prefer not to insert anything in that case, drop the 0, from the range so that exactly 10 characters are required:
/^(.{10})/
Or maybe do something else if lines[0].length < 10.
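If you don't need Faker, the same read, modify and rewrite idea can be wrapped in a small helper. This is a sketch with a hypothetical insert_at method (not part of Ruby or any gem); it is one possible answer to the "fewer than 10 characters" case, padding short lines with spaces so the text always lands at the requested column.

```ruby
require 'tmpdir'

# Hypothetical helper: insert `text` into line `line_no` (1-based) at
# character column `col` (0-based), then rewrite the whole file.
def insert_at(path, line_no, col, text)
  lines = File.readlines(path)
  line = lines.fetch(line_no - 1)    # IndexError if the file is too short
  ending = line[/\r?\n\z/] || ''     # remember the original line ending
  body = line.chomp.ljust(col)       # pad short lines with spaces up to `col`
  lines[line_no - 1] = body.insert(col, text) + ending
  File.write(path, lines.join)
end

result = Dir.mktmpdir do |dir|
  path = File.join(dir, 'data.txt')
  File.write(path, "12345678901234567890\nabc\n")
  insert_at(path, 1, 10, '945')  # long enough: plain insertion
  insert_at(path, 2, 10, 'XYZ')  # too short: padded with spaces first
  File.read(path)
end

p result
```

Padding is a design choice; raising an error or appending at the end of the line (as the regex version does) are equally valid alternatives.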

Related

how to overwrite part of a line in a txt file with regex and .sub in ruby

I have the following layout in a txt file.
[item] label1: comment1 | label2: foo
I have the code below. The goal is to modify part of an existing line in the text file.
def replace_info(item, bar)
  return "please create a file first" unless File.exist?('site_info.txt')
  IO.foreach('site_info.txt','a+') do |line|
    if line.include?(item)
      #regex should find the data from the whitespace after the colon all the way to the end.
      #this should be equivalent to foo
      foo_string = line.scan(/[^"label2: "]*\z/)
      line.sub(foo_string, bar)
    end
  end
end
Please advise. Perhaps my regex is off, but .sub is correct; still, I cannot overwrite the line.
Tiny problem: Your regular expression does not do what you think. /[^"label2: "]*\z/ means: any number of characters at the end of the string that are not a, b, e, l, ", space, colon or 2 (see Character classes). Also, scan returns an array, which sub doesn't accept as a pattern. But that doesn't really matter, because...
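A quick way to see the difference. The second sample line, with the value table, is invented for contrast; FIXED is the lookbehind pattern used in the edited answer's code.

```ruby
# A character class matches single characters, not the literal text "label2: ".
BROKEN = /[^"label2: "]*\z/    # a run of chars other than " l a b e 2 : and space
FIXED  = /(?<=label2: ).*?\Z/  # everything after "label2: " up to the line end

good = '[item] label1: comment1 | label2: foo'
bad  = '[item] label1: comment1 | label2: table'  # invented value for contrast

p good[BROKEN]  # => "foo"    works by accident: f and o are not in the class
p bad[BROKEN]   # => ""       the final "e" is in the class, so only "" can end at \z
p bad[FIXED]    # => "table"

# scan returns an Array, and sub only accepts a Regexp or a String pattern.
begin
  good.sub(good.scan(BROKEN), 'bar')
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```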
Small problem: line.sub(foo_string, bar) doesn't do anything. It returns a changed string, but you don't assign it to anything and it gets thrown away. line.sub!(foo_string, bar) would change line itself, but that leads us to...
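A minimal illustration of the difference between sub and sub! on the sample line from the question (dup is only there to get a mutable copy of the literal):

```ruby
line = '[item] label1: comment1 | label2: foo'.dup

copy = line.sub('foo', 'bar')  # sub returns a NEW string...
p line  # => "[item] label1: comment1 | label2: foo"  (unchanged)
p copy  # => "[item] label1: comment1 | label2: bar"

line.sub!('foo', 'bar')        # ...while sub! modifies the receiver in place
p line  # => "[item] label1: comment1 | label2: bar"
```

Either way, only the string in memory changes; the file on disk is untouched.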
Big problem: You cannot just change the read line and expect it to change in the file itself. It's like reading a book, thinking you could write a line better, and expecting it to change the book. The way to change a line in a text file is to read from one file and copy what you read to another. If you change a line between reading and writing, the newly written copy will be different. At the end, you can rename the new file to the old file (which will delete the old file and replace it atomically with the new one).
EDIT: Here's some code. First, I dislike IO.foreach, as I like to control the iteration myself (and IMO, IO.foreach is not as readable as IO#each_line). In the regular expression, I used a lookbehind to find the label without including it in the match, so I can replace just the value; I changed to \Z for a similar reason, to exclude the newline from the match. You should not return error messages from functions; that's what exceptions are for. I changed the simple include? to #start_with?, because your item might be found elsewhere in the line, where we wouldn't want to trigger the change.
class FileNotFoundException < RuntimeError; end

def replace_info(item, bar)
  # check if file exists
  raise FileNotFoundException unless File.exist?('site_info.txt')
  # rewrite the file
  File.open('site_info.txt.bak', 'wt') do |w|
    File.open('site_info.txt', 'rt') do |r|
      r.each_line do |line|
        if line.start_with?("[#{item}]")
          line.sub!(/(?<=label2: ).*?\Z/, bar)
        end
        w.write(line)
      end
    end
  end
  # replace the old file
  File.rename('site_info.txt.bak', 'site_info.txt')
end

replace_info("item", "bar")

Ruby file output issue

I have written some Ruby to automate batch file creation. The problem lies with the resulting output in the GUI:
The files are created, but their names look very strange indeed. Also, the filenames all end in '.txt', but macOS does not see it that way, i.e. you cannot click to open them in TextEdit.
The code is as follows:
puts "Please enter amount of files to create: "
file_count = gets.to_i
puts "Thanks! Enter a filename header: "
file_head = gets
puts "And a suffix?"
suffix = gets
puts "Please input your target directory:"
Dir.chdir(gets.chomp)
while file_count != 0
  filename = "#{file_head}_#{file_count}#{suffix}"
  File.open(filename, "w") {|x| x.write("This is #{filename}.")}
  file_count -= 1
end
Tips on shortening length or refactoring are always welcome.
The Kernel#gets documentation contains:
The separator is included with the contents of each record.
By default the separator is a newline (see $/). So both file_head and suffix end with a newline character, and filename therefore contains two of them: one after the header and one at the very end. Thus the extension of your files isn't .txt; it's actually ".txt\n" (in Ruby string notation). The application takes the newline characters literally and continues the filename on a new line. That's why it looks so strange!
You already know a way to fix it: call String#chomp to get rid of the trailing newline (the separator). See the line in your code that contains Dir.chdir for an example.
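Putting that fix together with a little refactoring, one possible shape is sketched below. create_files is a hypothetical helper name; the interactive gets calls are replaced by fixed strings so the sketch runs unattended, but in the real script you would pass gets.chomp for the header and suffix.

```ruby
require 'tmpdir'

# Hypothetical helper: creates `count` files named "<head>_<n><suffix>" in `dir`,
# counting down like the original while loop, and returns their paths.
def create_files(dir, count, head, suffix)
  count.downto(1).map do |n|
    path = File.join(dir, "#{head}_#{n}#{suffix}")
    File.write(path, "This is #{File.basename(path)}.")
    path
  end
end

# gets keeps the separator, so strip it with chomp before building names.
head   = "report\n".chomp  # stands in for gets.chomp
suffix = ".txt\n".chomp    # stands in for gets.chomp

created = Dir.mktmpdir do |dir|
  create_files(dir, 3, head, suffix).map { |path| File.basename(path) }
end

p created  # => ["report_3.txt", "report_2.txt", "report_1.txt"]
```

Passing the directory explicitly instead of calling Dir.chdir keeps the helper from changing the process's working directory.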

Ruby scan/gets until EOF

I want to scan an unknown number of lines until all of them have been read. How do I do that in Ruby?
For example:
put returns between paragraphs
for linebreak add 2 spaces at end
_italic_ or **bold**
The input does not come from a file but from STDIN.
There are many ways to do that in Ruby.
Most often, you'll want to process one line at a time, which you can do, for example, with
while line=gets
end
or
STDIN.each_line do |line|
end
or by running ruby with the -n switch, which implies one of the above loops (the line is saved into $_ in each iteration, and you can add BEGIN{} and END{} blocks, just like in awk; this is really good for one-liners).
I wouldn't use STDIN.read, though, as that reads the whole input into memory at once (which may be bad if it is really big).
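A small sketch of the gets loop. The method name read_until_eof is hypothetical, and a StringIO stands in for STDIN so the example runs without a terminal; with real input you would pass STDIN.

```ruby
require 'stringio'

# Reads lines until EOF; gets returns nil at EOF, which ends the loop.
def read_until_eof(io)
  lines = []
  while (line = io.gets)
    lines << line.chomp
  end
  lines
end

input = StringIO.new("put returns between paragraphs\nfor linebreak add 2 spaces at end\n_italic_ or **bold**\n")
lines = read_until_eof(input)
p lines.size  # => 3
```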
Use IO#read (without a length argument, it reads until EOF and returns everything as one String):
lines = STDIN.read
or use gets with nil as the separator argument, which also reads everything up to EOF:
lines = gets(nil)
To signal EOF from the keyboard, type Ctrl + D (Unix) or Ctrl + Z followed by Enter (Windows).
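To see what both calls return, here is a sketch using StringIO objects as stand-ins for STDIN:

```ruby
require 'stringio'

io = StringIO.new("first\nsecond\n")
everything = io.read  # no length: read until EOF, returned as one String
p everything          # => "first\nsecond\n"
p everything.lines    # => ["first\n", "second\n"]
p io.read             # => "" (already at EOF)

io2 = StringIO.new("first\nsecond\n")
p io2.gets(nil)       # => "first\nsecond\n" (nil separator: read everything)
```

Call .lines on the result if you need an array of lines rather than one string.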

Why must I .read() a file I wrote before being able to actually output the content to the terminal?

I am learning Ruby and am messing with reading/writing files right now. When I create the file, 'filename', I can write to it with the .write() method. However, I cannot output the content to the terminal without reopening it after running .read() on it (see line 8: puts write_txt.read()). I have tried running line 8 multiple times, but all that does is output more blank lines. Without line 8, puts txt.read() simply outputs a blank line. The following code also works without the puts in line 8 (simply write_txt.read())
# Unpacks first argument to 'filename'
filename = ARGV.first
# Let's try writing to a file
write_txt = File.new(filename, 'w+')
write_txt.write("OMG I wrote this file!\nHow cool is that?")
# This outputs a blank line THIS IS THE LINE IN QUESTION
puts write_txt.read()
txt = File.open(filename)
# This actually outputs the text that I wrote
puts txt.read()
Why is this necessary? Why is the file that has clearly been written to being read as blank until it is reopened after being read as blank at least once?
When you read or write to a file, there's an internal pointer called a "cursor" that keeps track of where in the file you currently are. When you write a file, the cursor is set to the point after the last byte you wrote, so that if you perform additional writes, they happen after your previous write (rather than on top of it). When you perform a read, you are reading from the current position to the end of the file, which contains...nothing!
You can open a file (cursor position 0), then write the string "Hello" (cursor position 5, since "Hello" is 5 bytes long), and attempting to read from the cursor will cause Ruby to say "Oh hey, there's no more content in this file past cursor position 5", and it will simply return an empty string.
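The cursor positions can be checked directly with IO#pos. A minimal sketch using a file in a temporary directory:

```ruby
require 'tmpdir'

positions = Dir.mktmpdir do |dir|
  File.open(File.join(dir, 'hello.txt'), 'w+') do |f|
    start  = f.pos   # 0 for a freshly created file
    f.write('Hello')
    after  = f.pos   # 5: one past the last byte written
    at_end = f.read  # "": nothing after the cursor
    f.rewind         # back to position 0
    [start, after, at_end, f.read]
  end
end

p positions  # => [0, 5, "", "Hello"]
```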
You can rewind the file cursor with IO#rewind to reset the cursor to the beginning of the file. You may then read the file (which will read from the cursor to the end of the file) normally.
Note that if you perform any writes after rewinding, you will overwrite your previously-written content.
# Unpacks first argument to 'filename'
filename = ARGV.first
# Let's try writing to a file
write_txt = File.new(filename, 'w+')
write_txt.write("OMG I wrote this file!\nHow cool is that?")
write_txt.rewind
puts write_txt.read()
Note, however, that it is generally considered bad practice to both read from and write to the same file handle. You would generally open one file handle for reading and one for writing, as mixing the two can have nasty consequences (such as accidentally overwriting existing content by rewinding the cursor for a read and then performing a write!)
The output is not necessarily written to the file immediately, because Ruby buffers writes. Also, the pointer is at the end of the file, so if you want to read while in read-write mode, you have to reset it. If you want to reopen the file for reading, you can simply close it first (closing also flushes the buffer). Try:
write_txt.write("OMG I wrote this file!\nHow cool is that?")
# Closing flushes any buffered output to the file
write_txt.close
txt = File.open(filename)
puts txt.read()
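If you want to keep the writing handle open, IO#flush is an alternative to closing it. A minimal sketch, using a temporary directory in place of ARGV.first:

```ruby
require 'tmpdir'

seen = Dir.mktmpdir do |dir|
  path = File.join(dir, 'note.txt')
  File.open(path, 'w') do |f|
    f.write("OMG I wrote this file!\nHow cool is that?")
    f.flush         # hand the buffered bytes to the OS without closing
    File.read(path) # a separate read now sees the content
  end
end

puts seen
```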

How to efficiently parse large text files in Ruby

I'm writing an import script that processes a file that has potentially hundreds of thousands of lines (log file). Using a very simple approach (below) took enough time and memory that I felt like it would take out my MBP at any moment, so I killed the process.
#...
File.open(file, 'r') do |f|
  f.each_line do |line|
    # do stuff here to line
  end
end
This file in particular has 642,868 lines:
/code/src/myimport $ wc -l ../nginx.log
642868 ../nginx.log
Does anyone know of a more efficient (memory/cpu) way to process each line in this file?
UPDATE
The code inside the f.each_line loop above simply matches a regex against the line. If the match fails, I add the line to a @skipped array. If it passes, I format the matches into a hash (keyed by the "fields" of the match) and append it to a @results array.
# regex built in `def initialize` (not on each line iteration)
@regex = /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - (.{0})- \[([^\]]+?)\] "(GET|POST|PUT|DELETE) ([^\s]+?) (HTTP\/1\.1)" (\d+) (\d+) "-" "(.*)"/
# ... loop over lines
match = line.match(@regex)
if match.nil?
  @skipped << line
else
  @results << convert_to_hash(match)
end
I'm completely open to this being an inefficient process. I could make the code inside of convert_to_hash use a precomputed lambda instead of figuring out the computation each time. I guess I just assumed it was the line iteration itself that was the problem, not the per-line code.
I just did a test on a 600,000-line file and it iterated over the file in less than half a second. I'm guessing the slowness is not in the file looping but in the line parsing. Can you paste your parse code as well?
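Along those lines, the results array is worth a look too: keeping one hash per line for roughly 640,000 lines may well cost more memory than the line loop itself. A sketch, assuming the results can be handled one at a time: each_parsed (a hypothetical name) streams lines, reuses one precompiled pattern, and yields each hash to a block instead of collecting them. LOG_LINE is a simplified stand-in for the question's regex, not the full nginx format.

```ruby
require 'stringio'

# Simplified, hypothetical pattern for illustration only.
LOG_LINE = /\A(?<ip>\d{1,3}(?:\.\d{1,3}){3}) - - \[(?<time>[^\]]+)\] "(?<method>GET|POST|PUT|DELETE) (?<path>\S+)/

# Streams lines one at a time and yields each parsed hash rather than
# storing it; returns how many lines did not match.
def each_parsed(io)
  skipped = 0
  io.each_line do |line|
    m = LOG_LINE.match(line)
    if m
      yield m.named_captures
    else
      skipped += 1
    end
  end
  skipped
end

log = StringIO.new(<<~LOG)
  127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl"
  garbage line
LOG

paths = []
skipped = each_parsed(log) { |h| paths << h['path'] }
p paths    # => ["/index.html"]
p skipped  # => 1
```

With a real file you would call it as File.open('nginx.log') { |f| each_parsed(f) { |h| ... } } and write each hash to its destination (database, output file) inside the block.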
This blog post includes several approaches to parsing large log files. Maybe that's an inspiration. Also have a look at the file-tail gem.
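If you are using bash (or similar), you might be able to optimize like this:
In input.rb (with -n, Ruby supplies the loop itself, so the script body runs once per line and the current line is in $_):
# Parse $_
then in bash:
cat nginx.log | ruby -n input.rb
The -n flag tells Ruby to assume a 'while gets(); ... end' loop around your script, so don't add a second gets loop of your own: it would silently swallow the first line. Whether this is measurably faster than writing the loop yourself is worth benchmarking, since the per-line work is the same.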
You might also want to look into an existing, prewritten solution to the problem, which may well be faster.

Resources