I'm trying to count the lines of a file with Ruby, but I can't get either IO or File to count the last line.
What do I mean by last line?
Here's a screenshot of the Atom editor showing that last line.
Ruby returns 20 lines, but I need 21. Here is such a file:
https://copy.com/cJbiAS4wxjsc9lWI
Interesting question (although your example file is cumbersome). Your editor shows a 21st line because the 20th line ends with a newline character. Without a trailing newline character, your editor would show 20 lines.
Here's a simpler example:
a = "foo\nbar"
b = "baz\nqux\n"
A text editor would show:
# file a
1 foo
2 bar
# file b
1 baz
2 qux
3
Ruby, however, sees 2 lines in either case:
a.lines #=> ["foo\n", "bar"]
a.lines.count #=> 2
b.lines #=> ["baz\n", "qux\n"]
b.lines.count #=> 2
You could trick Ruby into recognizing the trailing newline by adding an arbitrary character:
(a + '_').lines #=> ["foo\n", "bar_"]
(a + '_').lines.count #=> 2
(b + '_').lines #=> ["baz\n", "qux\n", "_"]
(b + '_').lines.count #=> 3
Or you could use a Regexp that matches either end of line ($) or end of string (\Z):
a.scan(/$|\Z/) #=> ["", ""]
a.scan(/$|\Z/).count #=> 2
b.scan(/$|\Z/) #=> ["", "", ""]
b.scan(/$|\Z/).count #=> 3
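Both tricks boil down to one rule: an editor displays one more line than the text has newline characters. A minimal sketch of that rule, assuming the whole text is already in memory (`editor_line_count` is a hypothetical helper, not a core method):

```ruby
# Hypothetical helper: the number of lines an editor such as Atom displays.
# Every "\n" starts a new line, so the editor shows newline count + 1,
# including the empty last line that follows a trailing newline.
def editor_line_count(text)
  text.count("\n") + 1
end

puts editor_line_count("foo\nbar")     # file a: 2
puts editor_line_count("baz\nqux\n")   # file b: 3
```

For a file on disk, pass `File.read(path)`. An empty string gives 1, on the assumption that the editor still shows a single empty line.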
Ruby's lines method doesn't count the last empty line.
As a trick, you can add an arbitrary character at the end of your stream.
String#lines returns 2 lines for this example:
1 Hello
2 World
3
Instead, it returns 3 lines in this case:
1 Hello
2 World
3 *
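For large files, the same count can be taken while streaming, without appending a character to anything: count the lines Ruby sees, then add one if the file is empty or its last line ends in a newline. A sketch under that assumption (`file_editor_line_count` is a hypothetical name):

```ruby
# Hypothetical helper: the line count an editor would show for the file at `path`,
# read one line at a time so the whole file never has to be held in memory.
def file_editor_line_count(path)
  count = 0
  last_line = nil
  File.foreach(path) do |line|
    count += 1
    last_line = line
  end
  # An empty file, or a last line ending in "\n", leaves one extra (empty) line in the editor.
  (last_line.nil? || last_line.end_with?("\n")) ? count + 1 : count
end
```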
Related
I'm trying to change something in every other line in a text file using Ruby (and some text files I need to change something every third line and so on.)
I found this question helpful for iterating over every line, but I specifically need help making changes every x amount of lines.
The ### is the part I'm having trouble with (the iterating over x amount of lines.)
text = File.open('fr.txt').read
clean = ### .sub("\n", " ");
new = File.new("edit_fr.txt", "w")
new.puts clean
new.close
You can use the modulo operator, as below, where n says which lines you want to process (every nth line) and i is the 0-based index of the current line. The remainder of the integer division, (i+1) % n, is 0 exactly when the 1-based line number i+1 is a multiple of n.
n = 3                                            # modify every 3rd line
File.open('edit_fr.txt', 'w') do |f|             # open the output file (closed at the end of the block)
  File.foreach('fr.txt').with_index do |line, i| # read the input file line by line
    if (i + 1) % n == 0                          # every nth line
      f.print line.chomp                         # remove the newline
    else                                         # every non-nth line
      f.puts line                                # write the line unchanged
    end
  end
end
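The same modulo test can also produce what the asker said they needed, joining every n lines with a space. A minimal in-memory sketch, assuming the text fits in a String (`join_every_n_lines` is a hypothetical name):

```ruby
# Hypothetical helper: join every n consecutive lines with a space, using the
# same "(i + 1) % n == 0" test as the answer above to decide where a newline goes.
def join_every_n_lines(text, n)
  lines = text.lines.map(&:chomp)
  joined = lines.each_with_index.map do |line, i|
    last_line = (i == lines.size - 1)
    # Keep a newline after every nth line and after the very last line; otherwise use a space.
    separator = (((i + 1) % n).zero? || last_line) ? "\n" : " "
    line + separator
  end
  joined.join
end

print join_every_n_lines("line1\nline2\nline3\nline4\n", 2)
# line1 line2
# line3 line4
```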
More info is available on Wikipedia: http://en.wikipedia.org/wiki/Modulo_operation
In computing, the modulo operation finds the remainder after division of one number by another (sometimes called modulus).
Given two positive numbers, a (the dividend) and n (the divisor), a modulo n (abbreviated as a mod n) is the remainder of the Euclidean division of a by n. For instance, the expression "5 mod 2" would evaluate to 1 because 5 divided by 2 leaves a quotient of 2 and a remainder of 1, while "9 mod 3" would evaluate to 0 because the division of 9 by 3 has a quotient of 3 and leaves a remainder of 0; there is nothing to subtract from 9 after multiplying 3 times 3. (Note that doing the division with a calculator will not show the result referred to here by this operation; the quotient will be expressed as a decimal fraction.)
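The two examples from the quote can be checked with Ruby's % operator, along with the line numbers that the (i + 1) % n == 0 test above selects for n = 3 (a small illustration, not part of the quoted article):

```ruby
# The two examples from the quote, using Ruby's % (modulo) operator.
p 5 % 2          # 1: 5 divided by 2 leaves a remainder of 1
p 9 % 3          # 0: 9 divided by 3 leaves no remainder
p 5.divmod(2)    # [2, 1]: quotient and remainder together

# Which 1-based line numbers does "(i + 1) % n == 0" select when n = 3?
every_third = (1..10).select { |line_no| (line_no % 3).zero? }
p every_third    # [3, 6, 9]
```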
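# Print every second line of data.txt
every_other = 2
File.open('data.txt') do |f|
  e = f.each                  # Enumerator over the file's lines
  target_line = nil
  loop do                     # loop stops quietly when e.next raises StopIteration
    every_other.times do
      target_line = e.next    # advance one line; keep the last one read
    end
    puts target_line
  end
end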
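That answer relies on Kernel#loop ending quietly when Enumerator#next raises StopIteration. A minimal sketch of the same idea on an in-memory array (`every_nth` and `every_nth_by_slice` are hypothetical names), next to an each_slice equivalent:

```ruby
# Hypothetical helper mirroring the loop/next answer above.
def every_nth(lines, n)
  e = lines.each
  picked = []
  loop do
    target = nil
    # Advance n times; e.next raises StopIteration when the input runs out,
    # loop rescues it and stops, so an incomplete final group is dropped.
    n.times { target = e.next }
    picked << target
  end
  picked
end

# Same result with each_slice: keep the last element of every complete slice.
def every_nth_by_slice(lines, n)
  lines.each_slice(n).select { |slice| slice.size == n }.map(&:last)
end

p every_nth(%w[a b c d e], 2)           # ["b", "d"]
p every_nth_by_slice(%w[a b c d e], 2)  # ["b", "d"]
```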
You wish to write each line of an input file to an output file, but you want to modify each nth line of the input file before writing it, beginning with the first line of the file.
Suppose we have defined a method modify, which accepts a line of text as its argument and returns a modified string. Then you can do it like this:
def modify_and_write(in_fname, out_fname, n)
  # :process for the 1st, (n+1)th, (2n+1)th ... line, :skip for the others
  enum = Array.new(n) { |i| i.zero? ? :process : :skip }.cycle
  File.open(out_fname, 'w') do |f|
    IO.foreach(in_fname) do |line|
      line = modify(line) if enum.next == :process
      f.puts(line)
    end
  end
end
I'm reading one line at a time (rather than using IO#readlines to read the entire file into an array) so that it will work with files of any size.
Suppose:
n = 3
The key here is the enumerator:
enum = Array.new(n) { |i| i.zero? ? :process : :skip }.cycle
#=> #<Enumerator: [:process, :skip, :skip]:cycle>
enum.next #=> :process
enum.next #=> :skip
enum.next #=> :skip
enum.next #=> :process
enum.next #=> :skip
enum.next #=> :skip
enum.next #=> :process
enum.next #=> :skip
...
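To see the cycling enumerator at work without creating files, the same logic can run on a String. A sketch in which `modify` is a hypothetical stand-in (it just upcases the line) and `modify_every_nth` is a hypothetical in-memory version of modify_and_write:

```ruby
# Hypothetical stand-in for the answer's `modify` method: here it just upcases the line.
def modify(line)
  line.upcase
end

# In-memory version of modify_and_write: same cycling enumerator, but it takes and
# returns a String so the effect is easy to see.
def modify_every_nth(text, n)
  enum = Array.new(n) { |i| i.zero? ? :process : :skip }.cycle
  text.each_line.map { |line| enum.next == :process ? modify(line) : line }.join
end

first_four = Array.new(3) { |i| i.zero? ? :process : :skip }.cycle.take(4)
p first_four                            # [:process, :skip, :skip, :process]
p modify_every_nth("a\nb\nc\nd\n", 3)   # "A\nb\nc\nD\n"
```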
Edit: after answering, I noticed the OP's comment, "I need to combine every two lines: line1 \n line2 \n line3 \n line4 would become line1 space line2 \n line3 space line4", which is not consistent with "I'm trying to change something in every other line in a text file". To address that specific requirement, my solution could be modified as follows:
def combine_lines(in_fname, out_fname, n)
enum = Array.new(n) { |i| (i==n-1) ? :write : :read }.cycle
f = File.open(out_fname, 'w')
combined = []
IO.foreach(in_fname) do |line|
combined << line.chomp
if enum.next == :write
f.puts(combined.join(' '))
combined.clear
end
end
f.puts(combined.join(' ')) if combined.any?
f.close
end
Let's try it:
text =<<_
Now is
the time
for all
good
Rubyists
to do
something
other
than
code.
_
File.write('in',text)
combine_lines('in', 'out', 3)
puts File.read('out')
# Now is the time for all
# good Rubyists to do
# something other than
# code.
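To try the :read/:write cycle without touching the file system, the same logic can run on a String. A sketch, where `combine_lines_in_string` is a hypothetical name and the output is built in memory:

```ruby
# Hypothetical in-memory variant of combine_lines: the same :read/:write cycle,
# but it takes and returns a String instead of file names.
def combine_lines_in_string(text, n)
  enum = Array.new(n) { |i| i == n - 1 ? :write : :read }.cycle
  out = []
  combined = []
  text.each_line do |line|
    combined << line.chomp
    if enum.next == :write
      out << combined.join(' ')
      combined.clear
    end
  end
  out << combined.join(' ') if combined.any?  # flush a final, incomplete group
  out.map { |l| "#{l}\n" }.join
end

text = "Now is\nthe time\nfor all\ngood\nRubyists\nto do\nsomething\nother\nthan\ncode.\n"
print combine_lines_in_string(text, 3)
```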
You could also use a regex, as @Stefan has done, which would be my preference for less-than-humongous files. Here's another regex implementation:
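def combine_lines(in_fname, out_fname, n)
  IO.write(out_fname,
    IO.read(in_fname)
      .scan(/(?:.*?\n){1,#{n}}/)            # groups of up to n lines
      .map { |s| s.split.join(' ') + "\n" }  # one output line per group
      .join                                  # IO.write needs a String, not an Array
  )
end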
combine_lines('in', 'out', 3)
puts File.read('out')
# Now is the time for all
# good Rubyists to do
# something other than
# code.
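One caveat with this regex, as far as I can tell: every group has to end in "\n", so if the input's last line has no trailing newline, that line is silently dropped. A sketch showing the effect and one possible variant that also accepts a final unterminated line (the `.+\z` alternative is my assumption, not part of the answer):

```ruby
n = 3
# Same words as the example, but the last line has no trailing newline.
text = "Now is\nthe time\nfor all\ngood\nRubyists\nto do\nsomething\nother\nthan\ncode."

# The answer's regex: every group must end in "\n", so "code." is never matched.
original = text.scan(/(?:.*?\n){1,#{n}}/)
p original.size   # 3

# Hypothetical variant: a group may also be a final line that runs to the end of the string (\z).
groups = text.scan(/(?:.*?\n|.+\z){1,#{n}}/)
p groups.size     # 4
p groups.last     # "code."
```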
We could write the above regex with the final / changed to /x to include comments:
r = /
(?: # begin a non-capture group
.*? # match any number of characters other than a newline, non-greedily
\n # match (the first, because of non-greedily) end-of-line
) # end the non-capture group
{1,#{n}} # match between 1 and n of the preceding non-capture group
/x
{1,#{n}} is "greedy" in the sense that it will match as many lines as possible, up to n. If the number of lines were always a multiple of n, we could instead write {#{n}}, meaning match exactly n non-capture groups (i.e., n lines). However, if the number of lines is not a multiple of n (as in my example above), we need {1,#{n}} so that the last few lines are still matched by the final group.
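The difference between {#{n}} and {1,#{n}} is easy to see on the example text from above (a quick illustration with n = 3):

```ruby
n = 3
text = "Now is\nthe time\nfor all\ngood\nRubyists\nto do\nsomething\nother\nthan\ncode.\n"

exact   = text.scan(/(?:.*?\n){#{n}}/)    # exactly n lines per match
up_to_n = text.scan(/(?:.*?\n){1,#{n}}/)  # between 1 and n lines per match

p exact.size     # 3: the leftover "code.\n" cannot fill a group of 3 lines
p up_to_n.size   # 4
p up_to_n.last   # "code.\n"
```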
I think you could do it with just a regex:
EDIT
OK, I knew I could do this with each_slice and a simple regex:
def chop_it(file,num)
#file name and the number of lines to join
arr = []
#create an empty array to hold the lines we create
File.open(file) do |f|
#open your file into a `do..end` block, it closes automatically for you
f.each_slice(num) do |slice|
#get an array of lines equal to num
arr << slice.map(&:chomp).join(' ') + "\n"
#remove each line's newline, join the lines with ' ', and tack one newline
# on the end, adding the resulting line to the array.
end
end
arr.join
#join all of the lines back into one string that can be sent to a file.
end
And there you have it, simple and flexible. Just pass the file name and the number of lines you want reduced down to one line, e.g. if you want every two lines joined, chop_it('data.txt', 2). Every three? chop_it('data.txt', 3). The method returns the combined text as a string, ready to be written to a file.
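chop_it opens a file by name, which makes it awkward to try out. A hypothetical variant, `chop_io`, takes any IO-like object instead, so it can be exercised with a StringIO (like File, StringIO is Enumerable over its lines, so each_slice works the same way):

```ruby
require 'stringio'

# Hypothetical variant of chop_it that takes an IO-like object instead of a file name.
def chop_io(io, num)
  # Each slice is an array of up to num lines; chomp them, join with spaces, end with "\n".
  io.each_slice(num).map { |slice| slice.map(&:chomp).join(' ') + "\n" }.join
end

print chop_io(StringIO.new("line1\nline2\nline3\nline4\nline5\n"), 2)
# line1 line2
# line3 line4
# line5
```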
** old answer **
old_text = File.read('data.txt')
new_text = old_text.gsub(/(^.*)\n(^.*\n)/, '\1 \2')
The regex matches the first line up to "\n" and the second line up to and including "\n". The substitution returns the two matches with a space between them.
"this is line one\nthis is line two\nthis is line three\nthis is line four\n"
\1 = "this is line one"
\2 = "this is line two\n"
'\1 \2' = "this is line one this is line two\n"
This regex will also handle removing every other blank line in successive blank lines
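The substitution can be checked on the four-line example string (a quick illustration; `text` and `joined` are just local names):

```ruby
text = "this is line one\nthis is line two\nthis is line three\nthis is line four\n"

# \1 is the first line without its newline, \2 the second line including its newline.
joined = text.gsub(/(^.*)\n(^.*\n)/, '\1 \2')
print joined
# this is line one this is line two
# this is line three this is line four
```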
new = File.new("edit_fr.txt", "w")
File.readlines("test.txt").each_slice(2) do |batch| # or each_slice(3) etc
new.puts batch.map(&:chomp).join(" ")
end
new.close
I need to combine every two lines: line1 \n line2 \n line3 \n line4 would become line1 space line2 \n line3 space line4
You could read the entire file into a string, use gsub! and with_index to replace every nth newline with space and write the replaced content to a new file:
content = IO.read('fr.txt')
content.gsub!("\n").with_index(1) { |m, i| (i % 2).zero? ? m : ' ' }
IO.write('edit-fr.txt', content)
Input fr.txt:
line1
line2
line3
line4
Output edit-fr.txt:
line1 line2
line3 line4
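The same trick generalizes to every nth newline. A sketch using the non-bang gsub so the input string is left untouched (`join_every_n` is a hypothetical name):

```ruby
# Hypothetical generalization: keep every nth newline, turn the others into spaces.
# gsub("\n") without a block returns an Enumerator; with_index(1) numbers the matches
# from 1, and the block's return value becomes the replacement for each match.
def join_every_n(content, n)
  content.gsub("\n").with_index(1) { |newline, i| (i % n).zero? ? newline : ' ' }
end

p join_every_n("line1\nline2\nline3\nline4\n", 2)  # "line1 line2\nline3 line4\n"
p join_every_n("a\nb\nc\n", 2)                     # "a b\nc "
```

With a line count that isn't a multiple of n, the final newline is also turned into a space, as the second call shows.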
I need to find each occurrence of "$" and change it to a number using a count, e.g. str = "foo $ bar $ foo $ bar $"; *run code here* => "foo 1 bar 2 foo 3 bar 4"
It feels like this should be a lot easier than I'm making it out to be. Here's my code:
def counter(file)
f = File.open(file, "r+")
count = 0
contents = f.readlines do |s|
if s.scan =~ /\$/
count += 1
f.seek(1)
s.sub(/\$/, count.to_s)
else
puts "Total changes: #{count}"
end
end
end
However, I'm not sure whether I'm meant to be using .match, .scan, .find or something else.
When I run this, it doesn't raise any errors, but it doesn't change anything either.
Your call to scan is incorrect (String#scan needs a pattern argument), so it should raise an error.
You can try something along these lines:
count = 0
str = "foo $ bar $ foo $ bar $ "
occurences = str.scan('$')
# => ["$", "$", "$", "$"]
occurences.size.times do str.sub!('$', (count+=1).to_s) end
str
# => "foo 1 bar 2 foo 3 bar 4 "
Explanation:
I find all occurrences of $ in the string, then call sub! once per occurrence, since sub! replaces only the first remaining occurrence each time.
Note: you may want to improve the scan line by using a regex with a boundary match instead of the plain "$", because as written it will also replace a $ inside a word. E.g. exa$mple would become something like exa1mple.
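An alternative that avoids the scan/sub! loop is to give gsub a block: its return value becomes the replacement, so a counter can be incremented per match. A sketch, assuming a "standalone" $ means one with whitespace or a string boundary on both sides (`number_dollars` is a hypothetical name):

```ruby
# Hypothetical helper: replace each standalone "$" with a running count.
# gsub uses the block's return value as the replacement for each match.
# (?<!\S) and (?!\S) require whitespace or the start/end of the string on either side,
# so the "$" inside "exa$mple" is left alone.
def number_dollars(str)
  count = 0
  str.gsub(/(?<!\S)\$(?!\S)/) { (count += 1).to_s }
end

p number_dollars("foo $ bar $ foo $ bar $ ")  # "foo 1 bar 2 foo 3 bar 4 "
p number_dollars("exa$mple $")                # "exa$mple 1"
```

To update the file from the question, one option is File.write(file, number_dollars(File.read(file))).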
Why doesn't your code raise an error?
If you read the documentation for readlines, you will find:
Reads the entire file specified by name as individual lines, and
returns those lines in an array.
Since it reads the entire file at once, there is no point in passing a block to this method; the block is simply ignored. The following example makes this clearer:
contents = f.readlines do |s|
puts "HELLO"
end
# => ["a\n", "b\n", "c\n", "d\n", "asdasd\n", "\n"] #lines of file f
As you can see "HELLO" never gets printed, showing the block code is never executed.
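The same behavior can be reproduced with a StringIO standing in for the file (an assumption made to keep the example self-contained; readlines on a File ignores the block in the same way), next to each_line, which does yield:

```ruby
require 'stringio'

io = StringIO.new("a\nb\nc\n")
calls = 0

lines = io.readlines { calls += 1 }  # readlines does not yield, so this block never runs
calls_after_readlines = calls
p lines                   # ["a\n", "b\n", "c\n"]
p calls_after_readlines   # 0

io.rewind
io.each_line { calls += 1 }          # each_line yields once per line
p calls                   # 3
```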
So I'm reading from an online source file, and the string looks like this:
1. this is line1.X
2. this is "line2X"
3. this is X line4.
4. this is line3.X.X
So I only want to output the whole string with the trailing "X" removed. In this example, only the X at the end of line 1 and line 4 should be removed.
I used chomp, but it only removed the X in line4.
string.each_line { |line| line.chomp("X") }
Should I use chomp or use something else?
String#chomp will work if you change the record separator to include both your chosen character and a newline. For example:
string.each_line { |line| puts line.chomp("X\n") }
The code above will print:
1. this is line1.
2. this is "line2X"
3. this is X line4.
4. this is line3.X.
but it will still return the original string. This may or may not matter for your use case. If it does matter, then you may want to use Kernel#p and String#gsub instead. For example:
p string.gsub(/X$/, '')
#=> "1. this is line1.\n2. this is \"line2X\"\n3. this is X line4.\n4. this is line3.X.\n"
string
#=> "1. this is line1.X\n2. this is \"line2X\"\n3. this is X line4.\n4. this is line3.X.X\n"
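If you'd rather stay close to your original chomp attempt, chomp can do both jobs in two steps: first strip the line terminator, then strip the "X", and finally put the terminator back. A sketch that returns a new string (`strip_trailing_x` is a hypothetical name):

```ruby
# Hypothetical helper: remove one trailing "X" from each line, keep the line breaks,
# and return a new string (the original is not modified).
def strip_trailing_x(string)
  fixed = string.each_line.map do |line|
    body = line.chomp                       # the line without its terminator
    terminator = line[body.length..-1]      # "\n", "\r\n", or "" for an unterminated last line
    body.chomp('X') + terminator            # chomp with an argument strips that suffix if present
  end
  fixed.join
end

string = "1. this is line1.X\n2. this is \"line2X\"\n3. this is X line4.\n4. this is line3.X.X\n"
print strip_trailing_x(string)
```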
Try this:
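string.each_line do |line|
  line = line.chomp                # drop the newline so line[-1] is the last visible character
  line.chop! if line[-1] == 'X'    # compare with ==; a single = would assign 'X'
  puts line
end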
Why not just use gsub?
text =<<_
1. this is line1.X
2. this is "line2X"
3. this is X line4.
4. this is line3.X.X
_
puts text.gsub(/X$/,'')
# 1. this is line1.
# 2. this is "line2X"
# 3. this is X line4.
# 4. this is line3.X.
With this regex:
regex1 = /\z/
the following strings match:
"hello" =~ regex1 # => 5
"こんにちは" =~ regex1 # => 5
but with these regexes:
regex2 = /#$/?\z/
regex3 = /\n?\z/
they show difference:
"hello" =~ regex2 # => 5
"hello" =~ regex3 # => 5
"こんにちは" =~ regex2 # => nil
"こんにちは" =~ regex3 # => nil
What is interfering? The string encoding is UTF-8, and the OS is Linux (i.e., $/ is "\n"). Are the multibyte characters interfering with $/? How?
The problem you reported is definitely a bug in the Regexp engine of RUBY_VERSION #=> "2.0.0", and it already existed in 1.9, whenever the encoding allows multi-byte chars, such as __ENCODING__ #=> #<Encoding:UTF-8>.
It does not depend on Linux; the same behavior can be reproduced on OS X and Windows too.
Until bug 8210 is fixed, we can help by isolating and understanding the cases in which the problem occurs.
This can also be useful for finding workarounds for specific cases.
I understand that the problem occurs when:
searching something before end of string \z.
and the last character of the string is multi-byte.
and the preceding pattern uses the zero-or-one quantifier ?
but the total number of bytes those optional patterns can match is less than the number of bytes in the last character.
The bug may be caused by confusion between the number of bytes and the number of characters that the regular expression engine actually checks.
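The byte lengths that the tests depend on can be checked directly with String#length and String#bytesize (this assumes a UTF-8 source file, Ruby's default source encoding since 2.0):

```ruby
# Characters vs. bytes for the strings used in the tests (UTF-8 source encoding).
s = "んにちは"
p s.length       # 4 characters
p s.bytesize     # 12 bytes
p "は".bytesize  # 3
p "ç".bytesize   # 2
p "\n".bytesize  # 1
```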
A few examples may help:
TEST 1: where the last character, "は", is 3 bytes:
s = "んにちは"
testing for zero or one of ん [3 bytes] before end of string:
s =~ /ん?\z/u #=> 4 # OK it works 3 == 3
when we try with ç [2 bytes]
s =~ /ç?\z/u #=> nil # KO: BUG when 3 > 2
s =~ /x?ç?\z/u #=> 4 # OK it works 3 == ( 1+2 )
when we test for zero or one of \n [1 byte]
s =~ /\n?\z/u #=> nil # KO: BUG when 3 > 1
s =~ /\n?\n?\z/u #=> nil # KO: BUG when 3 > 2
s =~ /\n?\n?\n?\z/u #=> 4 # OK it works 3 == ( 1+1+1)
From the results of TEST 1 we can conclude: if the last multi-byte character of the string is 3 bytes, the 'zero or one before' test only works when we test for at least 3 bytes (not 3 characters) before it.
TEST 2: where the last character, "ç", is 2 bytes:
s = "in French there is the ç"
check for zero or one of ん [3 bytes]
s =~ /ん?\z/u #=> 24 # OK 2 <= 3
check for zero or one of é [2 bytes]
s =~ /é?\z/u #=> 24 # OK 2 == 2
s =~ /x?é?\z/u #=> 24 # OK 2 < (2+1)
test for zero or one of \n [1 byte]
s =~ /\n?\z/u #=> nil # KO 2 > 1 ( the BUG occurs )
s =~ /\n?\n?\z/u #=> 24 # OK 2 == (1+1)
s =~ /\n?\n?\n?\z/u #=> 24 # OK 2 < (1+1+1)
From the results of TEST 2 we can conclude: if the last multi-byte character of the string is 2 bytes, the 'zero or one before' test only works when we check for at least 2 bytes (not 2 characters) before it.
When the multi-byte character is not at the end of the string I found it works correctly.
public gist with my test code available here
In Ruby trunk, the issue has now been accepted as a bug. Hopefully, it will be fixed.
Update: Two patches have been posted in Ruby trunk.
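Where a fixed Ruby isn't available, one way to sidestep the regex for the original question's pattern is to compute the index directly. A sketch, assuming the goal is the position that /#$/?\z/ reports (`content_end_index` is a hypothetical name); String#length counts characters, like the index returned by =~:

```ruby
# Hypothetical workaround: the character index where the text ends, ignoring one
# trailing record separator ($/, "\n" by default), computed without a regex.
# Intended to give the same index as str =~ /#$/?\z/ on a Ruby without the bug.
def content_end_index(str, sep = $/)
  str.end_with?(sep) ? str.length - sep.length : str.length
end

p content_end_index("hello")          # 5
p content_end_index("こんにちは")      # 5
p content_end_index("こんにちは\n")    # 5
```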
In Ruby language, how can I get the number of lines in a string?
There is a lines method for strings; in Ruby 2.0 and later it returns an Array (in 1.8.7 and 1.9 it returned an Enumerator). Either way, call count on the result.
str = "Hello\nWorld"
str.lines.count # 2
str = "Hello\nWorld\n" # trailing newline is ignored
str.lines.count # 2
The lines method was introduced in Ruby 1.8.7. If you're using an older version, check out the answers by @mipadi and @Greg.
One way would be to count the number of line endings (\n or \r\n, depending on the string), the caveat being that if the string does not end in a new line, you'll have to make sure to add one to your count. You could do so with the following:
c = my_string.count("\n")
c += 1 unless my_string[-1, 1] == "\n"   # check the string's last character, not c
You could also just loop through the string and count the lines:
c = 0
my_string.each_line { |line| c += 1 }
Continuing with that solution, you could get really fancy and use inject:
c = my_string.each_line.inject(0) { |count, line| count + 1 }
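A sketch of the line-ending approach with the trailing-newline check applied to the string (`count_lines` is a hypothetical name); it also returns 0 for an empty string, which is what String#lines gives:

```ruby
# Hypothetical helper: count lines by counting "\n", adding one for a last line
# that has no trailing newline. An empty string has no lines at all.
def count_lines(str)
  return 0 if str.empty?
  c = str.count("\n")
  c += 1 unless str.end_with?("\n")
  c
end

p count_lines("Hello\nWorld")    # 2
p count_lines("Hello\nWorld\n")  # 2
p count_lines("")                # 0
```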
string.split("\n").size works nicely. I like that it ignores trailing newlines if they don't contain content.
"Hello\nWorld\n".split("\n") # => ["Hello", "World"]
"hello\nworld\nfoo bar\n\n".split("\n").size # => 3
That might not be what you want, so use lines() as @Anurag suggested instead if you need to honor all newlines.
"hello\nworld\nfoo bar\n\n".lines.count # => 4
"hello\nworld\nfoo bar\n\n".chomp.split("\n",-1).size # => 4
String#chomp gets rid of an end of line if it exists, and the -1 allows empty strings.
Given a file object (here, in Rails):
file = File.open(File.join(Rails.root, 'lib', 'file.json'))
file.readlines.count
returns the number of lines
IO#readlines reads the whole file and returns an array of its lines, split on the line separator (a newline by default), so count gives the number of lines.
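For large files, reading everything into an array isn't necessary. A sketch with hypothetical method names that streams with File.foreach, plus the readlines version using the block form of File.open so the file gets closed:

```ruby
# Hypothetical helpers for counting a file's lines.
# File.foreach without a block returns an Enumerator, so count reads the file line by line.
def file_line_count(path)
  File.foreach(path).count
end

# The readlines approach, with the block form of File.open so the file is closed afterwards.
def file_line_count_readlines(path)
  File.open(path) { |f| f.readlines.count }
end
```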
This will not count blank lines:
string.split("\n").select{ |line| line != "" }.size
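Note that a line containing only spaces is still counted by that expression. If whitespace-only lines should count as blank too, a sketch (`non_blank_line_count` is a hypothetical name):

```ruby
# Hypothetical variant: also skip lines that contain only whitespace.
def non_blank_line_count(string)
  string.each_line.count { |line| !line.strip.empty? }
end

text = "a\n\n  \nb\n"
p text.split("\n").select { |line| line != "" }.size  # 3: the "  " line still counts
p non_blank_line_count(text)                           # 2
```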