How to edit every x amount of lines in a text file in Ruby?

I'm trying to change something in every other line of a text file using Ruby (and in some text files I need to change something every third line, and so on).
I found this question helpful for iterating over every line, but I specifically need help making changes every x amount of lines.
The ### is the part I'm having trouble with (the iterating over x amount of lines.)
text = File.open('fr.txt').read
clean = ### .sub("\n", " ");
new = File.new("edit_fr.txt", "w")
new.puts clean
new.close

You can use modulus division as below, where n refers to the nth line you want to process and i refers to the 0-based index of the file's lines. Modulo arithmetic gives the remainder from integer division, which is 0 whenever the 1-based index (i+1) is a multiple of n.
n = 3 # modify every 3rd line
File.open('edit_fr.txt','w') do |f| # Open the output file
  File.open('fr.txt').each_with_index do |line,i| # Open the input file
    if (i+1) % n == 0 # Every nth line
      f.print line.chomp # Remove newline
    else # Every non-nth line
      f.puts line # Print line
    end
  end
end
More info is available on Wikipedia: http://en.wikipedia.org/wiki/Modulo_operation
In computing, the modulo operation finds the remainder after division of one number by another (sometimes called modulus).
Given two positive numbers, a (the dividend) and n (the divisor), a modulo n (abbreviated as a mod n) is the remainder of the Euclidean division of a by n. For instance, the expression "5 mod 2" would evaluate to 1 because 5 divided by 2 leaves a quotient of 2 and a remainder of 1, while "9 mod 3" would evaluate to 0 because the division of 9 by 3 has a quotient of 3 and leaves a remainder of 0; there is nothing to subtract from 9 after multiplying 3 times 3. (Note that doing the division with a calculator will not show the result referred to here by this operation; the quotient will be expressed as a decimal fraction.)
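As a quick check of the index arithmetic, here is a minimal sketch that picks out the lines for which (i+1) % n is 0. The sample array and the variable names are made up for illustration and are not part of the answer above.

```ruby
n = 3
lines = %w[a b c d e f g] # stand-in for the lines of a file

# Keep the elements whose 1-based position is a multiple of n
selected = lines.each_with_index
                .select { |_line, i| ((i + 1) % n).zero? }
                .map(&:first)

p selected
```

With n = 3, the 3rd and 6th elements are the ones selected.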

every_other = 2
File.open('data.txt') do |f|
  e = f.each
  target_line = nil
  loop do
    every_other.times do
      target_line = e.next
    end
    puts target_line
  end
end

You wish to write each line of an input file to an output file, but you want to modify each nth line of the input file before writing it, beginning with the first line of the file.
Suppose we have defined a method modify, which accepts a line of text as its argument and returns a modified string. Then you can do it like this:
def modify_and_write(in_fname, out_fname, n)
  enum = Array.new(n) { |i| i.zero? ? :process : :skip }.cycle
  f = File.open(out_fname, 'w')
  IO.foreach(in_fname) do |line|
    line = modify(line) if enum.next == :process
    f.puts(line)
  end
  f.close
end
I'm reading one line at a time (rather than using IO#readlines to read the entire file into an array) so that it will work with files of any size.
Suppose:
n = 3
The key here is the enumerator:
enum = Array.new(n) { |i| i.zero? ? :process : :skip }.cycle
#=> #<Enumerator: [:process, :skip, :skip]:cycle>
enum.next #=> :process
enum.next #=> :skip
enum.next #=> :skip
enum.next #=> :process
enum.next #=> :skip
enum.next #=> :skip
enum.next #=> :process
enum.next #=> :skip
...
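To see the cycling enumerator at work end to end, here is a small sketch. The modify method below is a hypothetical stand-in (it just upcases the line), and the temporary file names are assumptions for illustration only.

```ruby
require 'tmpdir'

# Hypothetical modify: upcases the line (an assumption, not the OP's change)
def modify(line)
  line.upcase
end

def modify_and_write(in_fname, out_fname, n)
  enum = Array.new(n) { |i| i.zero? ? :process : :skip }.cycle
  File.open(out_fname, 'w') do |f| # block form closes the file for us
    File.foreach(in_fname) do |line|
      line = modify(line) if enum.next == :process
      f.puts(line)
    end
  end
end

# Try it on a throwaway file in a temporary directory
result = Dir.mktmpdir do |dir|
  in_fname  = File.join(dir, 'in.txt')
  out_fname = File.join(dir, 'out.txt')
  File.write(in_fname, "a\nb\nc\nd\ne\n")
  modify_and_write(in_fname, out_fname, 3)
  File.read(out_fname)
end

puts result
```

With n = 3, the first and fourth lines are the ones passed through modify.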
Edit: after answering I noticed the OP's comment: I need to combine every two lines: line1 /n line2 /n line3 /n line would become line1 space line2 /n line3 space line4, which is not consistent with "I'm trying to change something in every other line in a text file". To address the specific requirement, my solution could be modified as follows:
def combine_lines(in_fname, out_fname, n)
  enum = Array.new(n) { |i| (i==n-1) ? :write : :read }.cycle
  f = File.open(out_fname, 'w')
  combined = []
  IO.foreach(in_fname) do |line|
    combined << line.chomp
    if enum.next == :write
      f.puts(combined.join(' '))
      combined.clear
    end
  end
  f.puts(combined.join(' ')) if combined.any?
  f.close
end
Let's try it:
text =<<_
Now is
the time
for all
good
Rubyists
to do
something
other
than
code.
_
File.write('in',text)
combine_lines('in', 'out', 3)
puts File.read('out')
# Now is the time for all
# good Rubyists to do
# something other than
# code.
You could also use a regex, as @Stefan has done, which would be my preference for less-than-humongous files. Here's another regex implementation:
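def combine_lines(in_fname, out_fname, n)
  IO.write(out_fname,
    IO.read(in_fname)
      .scan(/(?:.*?\n){1,#{n}}/)            # chunks of up to n lines
      .map { |s| s.split.join(' ') + "\n" } # join each chunk's words with spaces
      .join                                 # IO.write needs a string, not an array
  )
end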
combine_lines('in', 'out', 3)
puts File.read('out')
# Now is the time for all
# good Rubyists to do
# something other than
# code.
We could write the above regex with the final / changed to /x to include comments:
r = /
(?: # begin a non-capture group
.*? # match any number of any character, non-greedily
\n # match (the first, because of non-greedily) end-of-line
) # end the non-capture group
{1,#{n}} # match between 1 and n of the preceding non-capture group
/x
{1,#{n}} is "greedy" in the sense that it will match as many lines as possible, up to n. If the number of lines were always a multiple of n, we could instead write {#{n}}, meaning match exactly n non-capture groups (i.e., n lines). However, if the number of lines is not a multiple of n (as in my example above), we need {1,#{n}} so that the last few lines are still matched by the final, shorter match.
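To see the difference between the two quantifiers, here is a small sketch on a made-up 7-line string. The sample text and variable names are assumptions for illustration.

```ruby
n = 3
text = "l1\nl2\nl3\nl4\nl5\nl6\nl7\n"

# {1,n}: each match takes up to n lines, so the leftover line is still matched
chunks = text.scan(/(?:.*?\n){1,#{n}}/)

# {n}: each match must contain exactly n lines, so the leftover line is dropped
exact = text.scan(/(?:.*?\n){#{n}}/)

p chunks
p exact
```

Here chunks has three elements (the last one holding only "l7\n"), while exact has two.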

I think you could do it with just a regex:
EDIT
OK, I knew I could do this with each_slice and a simple regex:
def chop_it(file, num)
  # file name and the number of lines to join
  arr = []
  # create an empty array to hold the lines we create
  File.open(file) do |f|
    # open your file into a `do..end` block, it closes automatically for you
    f.each_slice(num) do |slice|
      # get an array of lines equal to num
      arr << slice.join(' ').gsub(/\n/, '') + "\n"
      # join the lines with ' ', then remove all the newlines and tack one
      # on the end, adding the resulting line to the array.
      # (gsub rather than gsub!, because gsub! returns nil when nothing is replaced)
    end
  end
  arr.join
  # join all of the lines back into one string that can be sent to a file.
end
And there you have it, simple and flexible. Just enter the file name and the number of lines you want reduced down to one line. For example, if you want every two lines joined, call chop_it('data.txt', 2). Every three? chop_it('data.txt', 3).
** old answer **
old_text = File.read('data.txt')
new_text = old_text.gsub(/(?:(^.*)\n(^.*\n))/i,'\1 \2')
The regex matches the first line up to "\n" and the second line up to and including "\n". The substitution returns the two matches with a space between them.
"this is line one\nthis is line two\nthis is line three\nthis is line four\n"
\1 = "this is line one"
\2 = "this is line two\n"
'\1 \2' = "this is line one this is line two\n"
This regex will also handle removing every other blank line in successive blank lines

new = File.new("edit_fr.txt", "w")
File.readlines("test.txt").each_slice(2) do |batch| # or each_slice(3) etc
new.puts batch.map(&:chomp).join(" ")
end
new.close

I need to combine every two lines: line1 /n line2 /n line3 /n line would become line1 space line2 /n line3 space line4
You could read the entire file into a string, use gsub! and with_index to replace every nth newline with space and write the replaced content to a new file:
content = IO.read('fr.txt')
content.gsub!("\n").with_index(1) { |m, i| (i % 2).zero? ? m : ' ' }
IO.write('edit-fr.txt', content)
Input fr.txt:
line1
line2
line3
line4
Output edit-fr.txt:
line1 line2
line3 line4
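The same idea can be generalised to join every n lines. This is only a sketch: the method name join_every is hypothetical, and it works on a copy of the string rather than on a file.

```ruby
# Replace every newline with a space, except each nth one.
# join_every is a hypothetical helper name, not part of the answer above.
def join_every(content, n)
  result = content.dup
  # gsub! without a block returns an enumerator; with_index(1) numbers the
  # newlines from 1, and the block's value becomes the replacement
  result.gsub!("\n").with_index(1) { |m, i| (i % n).zero? ? m : ' ' }
  result
end

joined = join_every("line1\nline2\nline3\nline4\n", 2)
puts joined
```

Note that a trailing newline whose position is not a multiple of n is also turned into a space, so you may want to strip the result and add a final newline yourself.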

Related

Ruby: Using an array list in order to select specific columns

I'm new to Ruby.
Here is the script. I would like to use the selector in line 10 instead of fields[0] etc...
How can I do that?
For the example, the data are embedded.
Don't hesitate to correct me if I'm doing something wrong when opening or writing a file, or anything else; I like to learn.
#!/usr/bin/ruby
filename = "/tmp/log.csv"
selector = [0, 3, 5, 7]
out = File.open(filename + ".rb.txt", "w")
DATA.each_line do |line|
fields = line.split("|")
columns = fields[0], fields[3], fields[5], fields[7]
puts columns.join("|")
out.puts(columns.join("|"))
end
out.close
__END__
20180704150930|rtsp|645645643|30193|211|KLM|KLM00SD624817.ts|172.30.16.34|127299264|VERB|01780000|21103|277|server01|OK
20180704150931|api|456456546|30130|234|VC3|VC300179201139.ts|172.30.16.138|192271838|VERB|05540000|23404|414|server01|OK
20180704150931|api|465456786|30154|443|BAD|BAD004416550.ts|172.30.16.50|280212202|VERB|04740000|44301|18|server01|OK
20180704150931|api|5437863735|30157|383|VSS|VSS0011062009.ts|172.30.16.66|312727922|VERB|05700000|38303|381|server01|OK
20180704150931|api|3453432|30215|223|VAE|VAE00TF548197.ts|172.30.16.74|114127126|VERB|05060000|22305|35|server01|OK
20180704150931|api|312121|30044|487|BOV|BOVVAE00549424.ts|172.30.16.58|69139448|VERB|05300000|48708|131|server01|OK
20180704150931|rtsp|453432123|30127|203|GZD|GZD0900032066.ts|172.30.16.58|83164150|VERB|05460000|20303|793|server01|OK
20180704150932|api|12345348|30154|465|TYH|TYH0011224259.ts|172.30.16.50|279556843|VERB|04900000|46503|241|server01|OK
20180704150932|api|4343212312|30154|326|VAE|VAE00TF548637.ts|172.30.16.3|28966797|VERB|04740000|32601|969|server01|OK
20180704150932|api|312175665|64530|305|TTT|TTT000000011852.ts|172.30.16.98|47868183|VERB|04740000|30501|275|server01|OK
You can get fields at specific indices using Ruby's splat operator (search for 'splat') and Array.values_at like so:
columns = fields.values_at(*selector)
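As a quick illustration of that call, here is a sketch using the first record from the question, shortened to its first nine fields for readability:

```ruby
selector = [0, 3, 5, 7]
line = "20180704150930|rtsp|645645643|30193|211|KLM|KLM00SD624817.ts|172.30.16.34|127299264"

fields = line.split("|")
# The splat turns the array into separate arguments: values_at(0, 3, 5, 7)
columns = fields.values_at(*selector)

puts columns.join("|")
```

values_at returns nil for any index that is out of range, which join then renders as an empty field.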
A couple of coding style suggestions:
1. You may want to make selector a constant, since it's unlikely that you'll want to mutate it further down in your code base.
2. The out, out.close and the loop over DATA can all be condensed into a CSV.open block:
require 'csv'
CSV.open(filename, 'wb') do |csv|
  DATA.each_line do |line|
    csv << line.chomp.split("|").values_at(*selector) # CSV#<< takes a whole row (an array)
  end
end
You can also specify a custom delimiter (pipe | in your case) as noted in this answer like so:
...
CSV.open(filename, 'wb', col_sep: '|') do |csv|
...
Let's begin with a more manageable example. First note that if your string is held by the variable data, each line of the string contains the same number (14) of vertical bars ('|'). Let's reduce that to the first 4 lines of data, with each line terminated immediately before the 6th vertical bar:
str = data.each_line.map { |line| line.split("|").first(6).join("|") }.first(4).join("\n")
puts str
20180704150930|rtsp|645645643|30193|211|KLM
20180704150931|api|456456546|30130|234|VC3
20180704150931|api|465456786|30154|443|BAD
20180704150931|api|5437863735|30157|383|VSS
We need to also modify selector (arbitrarily):
selector = [0, 3, 4]
Now on to answering the question.
There is no need to divide the string into lines, split each line on the vertical bars, select the elements of interest from the resulting array, join the latter with a vertical bar and then lastly join the whole shootin' match with a newline (whew!). Instead, simply use String#gsub to remove all unwanted characters from the string.
terms_per_row = str.each_line.first.count('|') + 1
#=> 6
r = /
(?:^|\|) # match the beginning of a line or a vertical bar in a non-capture group
[^|\n|]+ # match one or more characters other than a vertical bar or newline
/x # free-spacing regex definition mode
line_idx = -1
new_str = str.gsub(r) do |s|
line_idx += 1
selector.include?(line_idx % terms_per_row) ? s : ''
end
puts new_str
20180704150930|30193|211
20180704150931|30130|234
20180704150931|30154|443
20180704150931|30157|383
Lastly, we write new_str to file:
File.write(fname, new_str)

How can I read lines from a file into an array that are not comments or empty?

I have a text file where a line may be either blank, a comment (begins with //) or an instruction (i.e. anything not blank or a comment). For instance:
Hiya #{prefs("interlocutor")}!
// Say morning appropriately or hi otherwise
#{(0..11).include?(Time.now.hour) ? 'Morning' : 'Hi'} #{prefs("interlocutor")}
I'm trying to read the contents of the file into an array where only the instruction lines are included (i.e. skip the blank lines and comments). I have this code (which works):
path = Pathname.new("test.txt")
# Get each line from the file and reject comment lines
lines = path.readlines.reject{ |line| line.start_with?("//") }.map{ |line| line.chomp }
# Reject blank lines
lines = lines.reject{ |line| line.length == 0 }
Is there a more efficient or elegant way of doing it? Thanks.
start_with? takes multiple arguments, so you can do
File.open("test.txt").each_line.reject{|line| line.start_with?("//", "\n")}.map(&:chomp)
in one go.
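For instance, on an in-memory array standing in for the file's lines (the sample lines are made up for illustration):

```ruby
lines = ["Hiya\n", "\n", "// a comment\n", "Bye\n"]

# start_with? returns true if the line begins with ANY of the given prefixes,
# so one call rejects both comment lines and lines that are just a newline
instructions = lines.reject { |line| line.start_with?("//", "\n") }.map(&:chomp)

p instructions
```

This only catches comments that start in the first column and lines that are exactly "\n"; indented comments or whitespace-only lines would still get through.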
I would do it like so, using regex:
def read_commands(path)
  File.read(path).split("\n").reduce([]) do |results, line|
    case line
    when /^\s*\/\/.*$/ # check for comments
    when /^\s*$/ # check for empty lines
    else
      results.push line
    end
    results
  end
end
To break down the regexes:
comments_regex = %r{
^ # beginning of line
\s* # any number of whitespaces
\/\/ # the // sequence
.* # any number of anything
$ # end of line
}x
empty_lines_regex = %r{
^ # beginning of line
\s* # any number of whitespaces
$ # end of line
}x

Replace a specific line in a file using Ruby

I have a text file (a.txt) that looks like the following.
open
close
open
open
close
open
I need to find a way to replace the 3rd line with "close". I did some searching, and most methods involve searching for the line and then replacing it. I can't really do that here, since I don't want to turn every "open" into "close".
Essentially (for this case) I'm looking for a write version of IO.readlines("./a.txt") [2].
How about something like:
lines = File.readlines('file')
lines[2] = 'close' << $/
File.open('file', 'w') { |f| f.write(lines.join) }
str = <<-_
my
dog
has
fleas
_
FNameIn = 'in'
FNameOut = 'out'
First, let's write str to FNameIn:
File.write(FNameIn, str)
#=> 17
Here are a couple of ways to replace the third line of FNameIn with "had" when writing the contents of FNameIn to FNameOut.
#1 Read a line, write a line
If the file is large, you should read from the input file and write to the output file one line at a time, rather than keeping large strings or arrays of strings in memory.
fout = File.open(FNameOut, "w")
File.foreach(FNameIn).with_index { |s,i| fout.puts(i==2 ? "had" : s) }
fout.close
Let's check that FNameOut was written correctly:
puts File.read(FNameOut)
my
dog
had
fleas
Note that IO#puts writes a record separator if the string does not already end with one [1]. Also, if fout.close is omitted, FNameOut is not closed until fout is garbage-collected or the program exits, so closing it explicitly (or using the block form of File.open) is safer.
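A variant of approach #1 that uses the block form of File.open, so the output file is closed as soon as the block finishes, might look like the sketch below. The method name replace_line and the temporary file names are assumptions for illustration.

```ruby
require 'tmpdir'

# Copy in_fname to out_fname, replacing the line at 0-based position index.
# replace_line is a hypothetical helper name.
def replace_line(in_fname, out_fname, index, replacement)
  File.open(out_fname, 'w') do |fout|
    File.foreach(in_fname).with_index do |line, i|
      fout.puts(i == index ? replacement : line)
    end
  end
end

# Try it on a throwaway file in a temporary directory
result = Dir.mktmpdir do |dir|
  in_fname  = File.join(dir, 'in')
  out_fname = File.join(dir, 'out')
  File.write(in_fname, "my\ndog\nhas\nfleas\n")
  replace_line(in_fname, out_fname, 2, "had")
  File.read(out_fname)
end

puts result
```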
#2 Use a regex
r = /
(?:[^\n]*\n) # Match a line in a non-capture group
{2} # Perform the above operation twice
\K # Discard all matches so far
[^\n]+ # Match next line up to the newline
/x # Free-spacing regex definition mode
File.write(FNameOut, File.read(FNameIn).sub(r,"had"))
puts File.read(FNameOut)
my
dog
had
fleas
[1] File.superclass #=> IO, so IO's methods are inherited by File.

How do I count the number of instances of particular words in a paragraph?

I'd like to count the number of times a set of words appear in each paragraph in a text file. I am able to count the number of times a set of words appears in an entire text.
It has been suggested to me that my code is really buggy, so I'll just ask what I would like to do, and if you want, you can look at the code I have at the bottom.
So, given that "frequency_count.txt" has the words "apple pear grape melon kiwi" in it, I want to know how often "apple" shows up in each paragraph of a separate file "test_essay.txt", how often pear shows up, etc., and then for these numbers to be printed out in a series of lines of numbers, each corresponding to a paragraph.
For instance:
apple, pear, grape, melon, kiwi
3,5,2,7,8
2,3,1,6,7
5,6,8,2,3
Where each line corresponds to one of the paragraphs.
I am very, very new to Ruby, so thank you for your patience.
output_file = '/Users/yirenlu/Quora-Personal-Analytics/weka_input6.csv'
o = File.open(output_file, "r+")
common_words = '/Users/yirenlu/Quora-Personal-Analytics/frequency_count.txt'
c = File.open(common_words, "r")
c.each_line{|$line1|
words1 = $line1.split
words1.each{|w1|
the_file = '/Users/yirenlu/Quora-Personal-Analytics/test_essay.txt'
f = File.open(the_file, "r")
rows = File.readlines("/Users/yirenlu/Quora-Personal-Analytics/test_essay.txt")
text = rows.join
paragraph = text.split(/\n\n/)
paragraph.each{|p|
h = Hash.new
puts "this is each paragraph"
p.each_line{|line|
puts "this is each line"
words = line.split
words.each{|w|
if w1 == w
if h.has_key?(w)
h[w1] = h[w1] + 1
else
h[w1] = 1
end
$x = h[w1]
end
}
}
o.print "#{$x},"
}
}
o.print "\n"
o.print "#{$line1}"
}
If you're used to PHP or Perl you may be under the impression that a variable like $line1 is local, but in Ruby it is a global. Use of globals is highly discouraged, and the number of cases where they are strictly required is very small. In most cases you can just omit the $ and use ordinary variables with proper scoping.
This example also suffers from nearly unreadable indentation, though perhaps that was an artifact of the cut-and-paste procedure.
Generally what you want for counters is to create a hash with a default of zero, then add to that as required:
# Create a hash where the default values for each key is 0
counter = Hash.new(0)
# Add to the counters where required
counter['foo'] += 1
counter['bar'] += 2
puts counter['foo']
# => 1
puts counter['baz']
# => 0
You basically have what you need, but everything is all muddled and just needs to be organized better.
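One way the pieces could be organized, as a rough sketch: split the text into paragraphs, count every word of a paragraph into a Hash.new(0), then look up the words of interest. The method name paragraph_counts and the sample text are assumptions for illustration, and the matching here is case-sensitive.

```ruby
# Returns one array of counts per paragraph, in the order given by words.
# paragraph_counts is a hypothetical helper name.
def paragraph_counts(text, words)
  text.split(/\n{2,}/).map do |para|
    counter = Hash.new(0)                 # missing keys default to 0
    para.scan(/\w+/) { |w| counter[w] += 1 }
    words.map { |w| counter[w] }
  end
end

words = %w[apple pear grape melon kiwi]
text  = "apple pear apple\nkiwi\n\npear pear melon\n"

counts = paragraph_counts(text, words)
puts words.join(", ")
counts.each { |row| puts row.join(",") }
```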
Here are two one-liners to calculate frequencies of words in a string.
The first one is a bit easier to understand, but it's less efficient:
txt.scan(/\w+/).group_by{|word| word.downcase}.map{|k,v| [k, v.size]}
# => [['word1', 1], ['word2', 5], ...]
The second solution is:
txt.scan(/\w+/).inject(Hash.new(0)) { |hash, w| hash[w.downcase] += 1; hash}
# => {'word1' => 1, 'word2' => 5, ...}
This could be shorter and easier to read if you use:
The CSV library.
A more functional approach using map and blocks.
require 'csv'
common_words = %w(apple pear grape melon kiwi)
text = File.open("test_essay.txt").read
def word_frequency(words, text)
words.map { |word| text.scan(/\b#{word}\b/).length }
end
CSV.open("file.csv", "wb") do |csv|
paragraphs = text.split /\n\n/
paragraphs.each do |para|
csv << word_frequency(common_words, para)
end
end
Note this is currently case-sensitive but it's a minor adjustment if you want case-insensitivity.
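That adjustment could be as small as adding the i flag to the regex. The sketch below also wraps the word in Regexp.escape, which is an extra precaution of mine (not part of the answer) in case a word contains regex metacharacters.

```ruby
def word_frequency(words, text)
  # /i makes the match case-insensitive; Regexp.escape guards against
  # words that contain characters with a special meaning in a regex
  words.map { |word| text.scan(/\b#{Regexp.escape(word)}\b/i).length }
end

freq = word_frequency(%w[apple pear], "Apple pie, apple tart, PEAR.")
p freq
```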
Here's an alternate answer, which has been tweaked for conciseness (though it is not as easy to read as my other answer).
require 'csv'
words = %w(apple pear grape melon kiwi)
text = File.open("test_essay.txt").read
CSV.open("file.csv", "wb") do |csv|
text.split(/\n\n/).map {|p| csv << words.map {|w| p.scan(/\b#{w}\b/).length}}
end
I prefer the slightly longer but more self-documenting code, but it's fun to see how small it can get.
What about this:
# Create an array of regexes to be used in `scan' in the loop.
# `\b' makes sure that `barfoobar' does not match `bar' or `foo'.
p word_list = File.open("frequency_count.txt"){|io| io.read.scan(/\w+/)}.map{|w| /\b#{w}\b/}
File.open("test_essay.txt") do |io|
loop do
# Add lines to `paragraph' as long as there is a continuous line
paragraph = ""
# A `l.chomp.empty?' becomes true at paragraph border
while l = io.gets and !l.chomp.empty?
paragraph << l
end
p word_list.map{|re| paragraph.scan(re).length}
# The end of file has been reached when `l == nil'
break unless l
end
end
To count how many times one word appears in a text:
text = "word aaa word word word bbb ccc ccc"
text.scan(/\w+/).count("word") # => 4
To count a set of words:
text = "word aaa word word word bbb ccc ccc"
wlist = text.scan(/\w+/)
wset = ["word", "ccc"]
result = {}
wset.each {|word| result[word] = wlist.count(word) }
result # => {"word" => 4, "ccc" => 2}
result["ccc"] # => 2

How do I remove the first n lines from a string in Ruby?

One\n
Two\n
Three\n
Four\n
remove_lines(2) would remove the first two lines, leaving the string:
Three\n
Four\n
s.to_a[2..-1].join # String#to_a exists only in Ruby 1.8; on 1.9 and later use s.lines[2..-1].join
>> s = "One\nTwo\nThree\nFour\n"
=> "One\nTwo\nThree\nFour\n"
>> s.to_a[2..-1].join
=> "Three\nFour\n"
s = "One\nTwo\nThree\nFour"
lines = s.lines
> ["One\n", "Two\n", "Three\n", "Four"]
remaining_lines = lines[2..-1]
> ["Three\n", "Four"]
remaining_lines.join
> "Three\nFour"
String#lines converts the string into an array of lines (retaining the new line character at the end of each string)
[2..-1] specifies the range of lines to return, in this case the third through the last
Array#join concatenates the lines back together, without any space (but since the lines still contain the new line character, we don't need a separator)
In one line:
s.lines[2..-1].join
class String
def remove_lines(i)
split("\n")[i..-1].join("\n")
end
end
Calling "One\nTwo\nThree\nFour\n".remove_lines(2) would result in "Three\nFour". If you need the trailing "\n" you need to extend this method accordingly.
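A possible extension along those lines, sketched as a standalone method rather than a String monkey patch (the name remove_first_lines is hypothetical): String#lines keeps each line's terminator, so joining the remaining lines preserves a trailing "\n" if there was one.

```ruby
# Drop the first n lines of str, keeping the remaining line endings intact.
# remove_first_lines is a hypothetical helper name.
def remove_first_lines(str, n)
  str.lines.drop(n).join
end

with_newline    = remove_first_lines("One\nTwo\nThree\nFour\n", 2)
without_newline = remove_first_lines("One\nTwo\nThree\nFour", 2)
too_many        = remove_first_lines("One\nTwo\n", 5)

p with_newline
p without_newline
p too_many
```

Array#drop returns an empty array when n exceeds the number of lines, so the last call yields an empty string instead of raising an error.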
I had a situation where I needed to support multiple platform EOLN (both \r and \n), and had success with the following:
split(/\r\n|\r|\n/, 2).last
Or the equivalent remove_lines:
def remove_lines(number_of_lines=1)
split(/\r\n|\r|\n/, number_of_lines+1).last
end
Here is a pure regexp one-liner. Hypothetically it should be even faster than the elegant solution provided by @DigitalRoss:
n = 4 # number of lines
str.gsub(/\A(.*\n){#{n}}/,'')
If you know in advance how many lines you want to cut (4 here):
str.gsub(/\A(.*\n){4}/,'')
And if you want to cut only one line:
str.gsub(/\A.*\n/,'')
In order to cut n lines from the tail:
gsub(/(\n.*){#{n}}\Z/,'')
This program removes the first two lines using a regular expression.
text = "One\nTwo\nThree\nFour"
text = text.sub(/\A(?:[^\n]*\n){2}/, '')
# (1) \A(?:[^\n]*\n){2} matches the first 2 lines, anchored at the start of the string
# (2) '' replaces them with nothing
# sub with \A is used because gsub with ^ would also remove later pairs of lines
puts text
EDIT: I just saw that the question is about 'n' lines, not just two lines.
So here is my new answer.
lines_removed = 2
original_text = "One\nTwo\nThree\nFour"
result_text = original_text.sub(Regexp.new("\\A(?:[^\n]*\n){%d}" % lines_removed), '')
# (1) the regex matches the first lines_removed lines, anchored at the start by \A
# (2) the match is replaced with nothing
puts result_text # Prints "Three" and "Four" on separate lines
def remove_lines(str, n)
  res = ""
  # Lines after the first n; an empty array if n exceeds the number of lines
  arr = str.split("\n")[n..-1] || []
  arr.each { |i| res.concat(i + "\n") }
  return res
end
a = "1\n2\n3\n4\n"
b = remove_lines(a, 2)
print b
