Ruby scan/gets until EOF

I want to scan an unknown number of lines until all the lines have been scanned. How do I do that in Ruby?
The input does not come from a file but from STDIN.

There are many ways to do that in Ruby.
Most of the time you'll want to process one line at a time, which you can do, for example, with
while line = gets
  # process line here
end
or
STDIN.each_line do |line|
  # process line here
end
or by running ruby with the -n switch, which implies one of the above loops (the current line is saved into $_ on each iteration, and you can add BEGIN{} and END{} blocks, just like in awk; this is really handy for one-liners).
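For instance, two quick sketches of the -n style (the pattern and file name are just placeholders): the first prints matching lines, the second prints the total line count via an END block:
$ ruby -ne 'print $_ if $_ =~ /error/' input.txt
$ ruby -ne 'END { puts $. }' input.txt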
I wouldn't use STDIN.read, though, as that reads the whole input into memory at once (which may be a problem if the input is really big).

Use IO#read (without a length argument, it reads until EOF):
lines = STDIN.read
or use gets with nil as argument:
lines = gets(nil)
To denote EOF, type Ctrl + D (Unix) or Ctrl + Z (Windows).
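A minimal sketch of this slurp-then-process approach (the counting at the end is only for illustration):
text = STDIN.read     # reads everything until EOF
lines = text.lines    # split into an array of lines, keeping the trailing "\n"s
puts "read #{lines.size} lines"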


Writing in a specific position in a text file using Ruby

I want to insert data at specific positions in a text file, for example in line 1 starting at position 10. How can I do that using Ruby?
I also want to put fake data into this file using the faker gem, or in any other way possible, such as a phone number, name, SSN, etc.
Here's a sample script that takes two arguments and writes a modified copy of the first file's contents to the second file:
require 'faker'

input = File.open(ARGV[0], 'r')
lines = input.readlines
input.close

# insert a random 3-digit number after the first (up to) 10 characters of line 1
lines[0].gsub!(/^(.{0,10})/, '\1' + Faker::Base.numerify('###').to_s)

output = File.open(ARGV[1], 'w')
lines.each do |line|
  output.write(line)
end
output.close
If you have an input file that looks like:
12345678901234567890
^^^ fake data
the output might look like:
12345678909451234567890
^^^ fake data
Since I opened the output file after reading the input file, you can pass the same file name as both the first and the second argument. That isn't exactly inserting the string into the file, but it's as close as you'll get.
The key line is:
lines[0].gsub!(/^(.{0,10})/, '\1' + Faker::Base.numerify('###').to_s)
It takes the first line and inserts a random 3-digit number after the first (up to) 10 characters, modifying the line in place. If there are fewer than 10 characters in the first line, the random data ends up appended to the end of the line. If you'd rather not do that for short lines, remove the lower bound of the range in the regex:
/^(.{10})/
Or maybe do something else if lines[0].length < 10.
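For the other kinds of fake data mentioned in the question, the faker gem has dedicated generators; a small sketch (exact generator names can vary a bit between faker versions):
require 'faker'
fake_name  = Faker::Name.name                 # e.g. "Jane Doe"
fake_phone = Faker::PhoneNumber.phone_number  # e.g. "555-123-4567"
puts "#{fake_name}, #{fake_phone}"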

Add line break/new line in IRB?

How do I add a line-break/new-line in IRB/Ruby? The book I'm learning from shows this code:
print "2+3 is equal to "
print 2 + 3
without telling how to go to the second line without hitting Enter, which obviously just runs the program.
You could use a semicolon at the end of the statement, like this: puts "hello"; puts "world"
That book might be taking very tiny steps to introducing this idea:
print "Continues..."
puts "(Up to here)"
The print function just outputs to the terminal exactly what it's given. The puts function does the same but also adds a newline, which is what you want.
The more Ruby way of doing this is either:
puts "2+3 equals #{2+3}" # Using string interpolation
puts "2+3 equals %d" % (2 + 3) # Using sprintf-style interpolation
Now, if you're using irb, that's a Read-Eval-Print Loop (REPL), which means it executes everything you type as soon as you press Enter, by design. If you want to use your original code, you need to force it onto one line:
print "2+3 equals "; print 2+3
Then that will work as expected. The ; statement separator is rarely used in Ruby; most style guides encourage you to split things onto multiple lines, but if you do need a one-liner, this is how.
When writing code in, say, a .rb file, the Return key just formats the code and doesn't execute anything.
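Another option in irb itself: if a line is syntactically incomplete (for example, it ends with an operator), irb waits for the rest of the expression before evaluating, so you can split a statement across lines like this:
print "2+3 is equal to " +
  (2 + 3).to_s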
You can put a semicolon after the first line, like this:
print "2+3 is equal to ";
print 2 + 3

How do I print the line number of the file I am working with via ARGV?

I'm currently opening a file taken at runtime via ARGV:
File.open(ARGV[0]) do |f|
  f.each_line do |line|
Once a match is found I print output to the user.
    if line.match(/(strcpy)/i)
      puts "[!] strcpy does not check for buffer overflows when copying to destination."
      puts "[!] Consider using strncpy or strlcpy (warning, strncpy is easily misused)."
      puts " #{line}"
    end
I want to know how to print out the line number for the matching line in the (ARGV[0]) file.
Using print __LINE__ shows the line number from the Ruby script. I've tried many different variations of print __LINE__ with different string interpolations of #{line} with no success. Is there a way I can print out the line number from the file?
When Ruby's IO class opens a file, it sets the $. global variable to 0. For each line that is read, that variable is incremented. So, to know which line has just been read, simply use $..
Look in the English module for $. or $INPUT_LINE_NUMBER.
We can also use the lineno method that is part of the IO class. I find that a bit more convoluted because we need an IO stream object to tack it onto, whereas $. always works.
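For completeness, here's the lineno variant grafted onto the question's own loop (a sketch with the warning messages trimmed):
File.open(ARGV[0]) do |f|
  f.each_line do |line|
    puts "[!] strcpy found on line #{f.lineno}: #{line}" if line.match(/strcpy/i)
  end
end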
I'd write the loop more simply:
File.foreach(ARGV[0]) do |line|
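Putting that together with $., a sketch of the whole scan:
File.foreach(ARGV[0]) do |line|
  if line.match(/strcpy/i)
    puts "[!] strcpy does not check for buffer overflows when copying to destination."
    puts "    line #{$.}: #{line}"
  end
end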
Something to think about: if you're on a *nix system, you can use the OS's built-in grep or fgrep tool to greatly speed up your processing. The grep family of applications is highly optimized for doing what you want; it can find all occurrences or only the first, can use regular expressions or fixed strings, and can easily be called using Ruby's %x or backtick operators.
puts `grep -inm1 abacus /usr/share/dict/words`
Which outputs:
34:abacus
-inm1 means "ignore character case" (-i), "output line numbers" (-n), and "stop after the first occurrence" (-m1).
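If you want the matches back in Ruby rather than just printed, you can split grep's "line-number:text" output yourself (a sketch, reusing the question's ARGV[0]):
`grep -in strcpy #{ARGV[0]}`.each_line do |hit|
  lineno, text = hit.split(':', 2)
  puts "[!] strcpy found on line #{lineno}: #{text}"
end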

Weird behavior when changing line separator and then changing it back

I was following the advice from this question when trying to read in multi-line input from the command line:
# change line separator
$/ = 'END'
answer = gets
pp answer
However, I get weird behavior from STDIN#gets when I try to change $/ back:
# put it back to normal
$/ = "\n"
answer = gets
pp answer
pp 'magic'
This produces output like this when executed with Ruby:
$ ruby multiline_input_test.rb
this is
a multiline
awesome input string
FTW!!
END
"this is\n\ta multiline\n awesome input string\n \t\tFTW!!\t\nEND"
"\n"
"magic"
(I typed everything up to and including END; the rest is printed by the program, which then exits.)
It does not pause to get input from the user after I change $/ back to "\n". So my question is simple: why?
As part of a larger (but still small) application, I'm trying to devise a way of recording notes, and this behavior is a real problem: the rest of my program won't be able to function properly if I can't reset the line separator. I've tried all manner of double and single quotes, but that doesn't seem to be the issue. Any ideas?
The problem you're having is that your input ends with END\n. Ruby sees the END, and there's still a \n left in the buffer. You do successfully set the input record separator back to \n, so that leftover character is immediately consumed by the second gets, which returns "\n" without waiting for new input.
You therefore have two easy options:
Set the input record separator to END\n (use double quotes so that \n is interpreted as a newline):
$/ = "END\n"
Clear the buffer with an extra call to gets:
$/ = 'END'
answer = gets
gets # Consume extra `\n`
I consider option 1 clearer.
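For reference, here's the question's script with option 1 applied (a minimal sketch):
# change the line separator, including the trailing newline after END
$/ = "END\n"
answer = gets
pp answer

# put it back to normal
$/ = "\n"
answer = gets
pp answer
pp 'magic'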
This shows it working on my system using option 1:
$ ruby multiline_input_test.rb
this is
a multiline
awesome input string
FTW!!
END
"this is\n a multiline\n awesome input string\n FTW!!\nEND\n"
test
"test\n"
"magic"

How to efficiently parse large text files in Ruby

I'm writing an import script that processes a file that has potentially hundreds of thousands of lines (log file). Using a very simple approach (below) took enough time and memory that I felt like it would take out my MBP at any moment, so I killed the process.
#...
File.open(file, 'r') do |f|
  f.each_line do |line|
    # do stuff here to line
  end
end
This file in particular has 642,868 lines:
$ wc -l nginx.log
642868 nginx.log
Does anyone know of a more efficient (memory/cpu) way to process each line in this file?
UPDATE
The code inside the f.each_line block above simply matches a regex against the line. If the match fails, I add the line to a @skipped array. If it passes, I format the matches into a hash (keyed by the "fields" of the match) and append it to a @results array.
# regex built in `def initialize` (not on each line iteration)
@regex = /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - (.{0})- \[([^\]]+?)\] "(GET|POST|PUT|DELETE) ([^\s]+?) (HTTP\/1\.1)" (\d+) (\d+) "-" "(.*)"/

# ... loop over the lines
match = line.match(@regex)
if match.nil?
  @skipped << line
else
  @results << convert_to_hash(match)
end
I'm completely open to this being an inefficient process. I could make the code inside of convert_to_hash use a precomputed lambda instead of figuring out the computation each time. I guess I just assumed it was the line iteration itself that was the problem, not the per-line code.
I just did a test on a 600,000 line file and it iterated over the file in less than half a second. I'm guessing the slowness is not in the file looping but the line parsing. Can you paste your parse code also?
This blog post includes several approaches to parsing large log files; maybe that's an inspiration. Also have a look at the file-tail gem.
If you are using bash (or similar) you might be able to optimize like this:
In input.rb:
while x = gets
  # Parse
end
then in bash:
cat nginx.log | ruby -n input.rb
The -n flag tells ruby to assume a 'while gets(); ... end' loop around your script, which might cause it to do something special to optimize.
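Folding the update's per-line work into that style, input.rb only needs the loop body when run with -n; here's a hedged sketch with a simplified stand-in pattern (the real @regex and convert_to_hash from above would slot in instead):
# input.rb -- run as: ruby -n input.rb nginx.log  (or: cat nginx.log | ruby -n input.rb)
# -n wraps this in `while gets; ...; end`, with the current line in $_
if (match = $_.match(/^(\d{1,3}(?:\.\d{1,3}){3}) /))
  (@results ||= []) << match[1]
else
  (@skipped ||= []) << $_
end
END { puts "matched #{(@results || []).size}, skipped #{(@skipped || []).size}" }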
You might also want to look into a prewritten solution to the problem, as that will be faster.
