How far does .each read? To the end of the line? - ruby

Sorry for the newbie question. Was loading a .txt file into the following code:
line_count = 0
File.open("text.txt").each {|line| line_count += 1}
puts line_count
Does .each simply read until the end of a line before passing its value to the code block? Little explanation would be great. Thanks!

You can use .each_line to be more explicit, but yes, each (http://www.ruby-doc.org/core-2.0.0/IO.html#method-i-each) reads one line at a time and passes it to the block.
f = File.new("testfile")
f.each {|line| puts "#{f.lineno}: #{line}" }

It's really important to read the documentation, because all sorts of things are explained there. For instance, the documentation for each says:
Executes the block for every line in ios, where lines are separated by sep.
sep defaults to the special $/ global variable, the input record separator, which is "\n" unless you change it. That default is the same on every OS; on Windows, a file opened in text mode has each "\r\n" translated to "\n" before each sees it. You can tell Ruby to use a different value for the line-end/separator if you know the file uses something else.
Regarding your code:
I'd do it this way:
line_count = 0
File.foreach("text.txt") do |line|
  line_count += 1
end
puts line_count
foreach is very self-explanatory, which is important when writing code. You want it to be self-documenting as much as possible. foreach iterates over "each" line in the file. It also assumes the line-ends are the same as $/, but you can force it to be something different, perhaps the letter "z" or "." or " ", depending on your whim and fancy at the moment.
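To make the separator behaviour concrete, here is a small sketch. The helper names count_lines and split_records and the sample text are made up for illustration; File.foreach takes the separator as an optional second argument.

```ruby
require "tempfile"

# Hypothetical helper: counts lines using the default separator ($/, normally "\n").
def count_lines(path)
  File.foreach(path).count
end

# Hypothetical helper: splits the file on a custom separator instead of "\n".
def split_records(path, sep)
  File.foreach(path, sep).to_a
end

Tempfile.create("sample") do |f|
  f.write("one\ntwo\nthree\n")
  f.flush

  puts count_lines(f.path)      # 3
  p split_records(f.path, "t")  # ["one\nt", "wo\nt", "hree\n"]
end
```

Note that each piece keeps its separator at the end, just as each line normally keeps its "\n".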

Related

Check the formatting of an entire file using regex

I have a file formatted by lines like this (I know it's a terrible format, I didn't write it):
id: 12345 synset: word1,word2
I want to read the entire file and check to see if every line is correct without having to look line by line.
I've looked into File and Regex, but couldn't find what I need. I tried to use File.read to read the entire file all at once, then use m modifier for regex to check multiple lines, but it's not working the way I anticipated (perhaps it's not what I need).
p.s. Ruby newbie :)
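Assuming your file always ends with a newline, this should work:
/\A(id: \d+ synset: \w+,\w+\n)+\z/
In Ruby, ^ and $ match at the start and end of every line, so the pattern has to be anchored to the whole string with \A and \z; otherwise it can succeed on a file that still contains invalid lines. The m modifier is not needed, since in Ruby it only makes . match newlines.
The full ruby:
content = File.read('myfile.txt')
puts 'file is valid!' if content =~ /\A(id: \d+ synset: \w+,\w+\n)+\z/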
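A quick sketch of how anchoring affects whole-file validation in Ruby, using made-up sample strings. ^ and $ match at line boundaries inside the string, while \A and \z match only at the very start and end.

```ruby
line_anchored   = /^(id: \d+ synset: \w+,\w+\n)+$/
string_anchored = /\A(id: \d+ synset: \w+,\w+\n)+\z/

good = "id: 1 synset: a,b\nid: 2 synset: c,d\n"
bad  = "this line is garbage\nid: 2 synset: c,d\n"

p !!(good =~ string_anchored)  # true
p !!(bad  =~ string_anchored)  # false
p !!(bad  =~ line_anchored)    # true: the match simply starts at the second line
```

So a line-anchored pattern only tells you that some run of valid lines exists, not that every line is valid.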
You can use this regex to check each line of the file: ^id:\s*\d+\s+synset:\s*(?:\w+,)*\w+$. You can try the following code, but I don't know any Ruby, I just searched and tested a little. It might work.
line_num = 0
text = File.read('file.txt')
text.each_line do |line|
  line_num += 1
  unless line =~ /^id:\s*\d+\s+synset:\s*(?:\w+,)*\w+$/
    puts "Line #{line_num} is incorrect"
  end
end

I am getting 50 different loops while including a variable in a string using Ruby

I'm trying to print, on a separate page, a fixed string concatenated with a variable. I thought I had the code right, but all I get is a loop that runs fifty times. This is the code that I am using:
f = File.open("urlfile.txt", "r")
line = ""
while (line = f.gets)
puts "<outline text=\"\" type=\"link\" url=\""+File.read("urlfile.txt")+"\" dateCreated=\"\"/>"
end
f.close
Then this is what it's spitting out, a loop that runs about 50 times:
http://washingtondc.craigslist.org
http://westpalmbeach.craigslist.org
http://westpalmbeach.craigslist.org
http://westslope.craigslist.org
http://westslope.craigslist.org
http://yubasutter.craigslist.org
http://yubasutter.craigslist.org
http://yuma.craigslist.org
http://yuma.craigslist.org
" dateCreated=""/>
and this is what the code should look like when it is spit out
You are re-reading the whole file inside the loop; as hinted by @Mark, you have to use line inside the string interpolation instead.
Aside: perhaps it's better to refactor the code to idiomatic Ruby; consider the following, for instance:
lines = File.readlines('urlfile.txt')
lines.each do |line|
  puts %|<outline text="" type="link" url="#{line.strip}" dateCreated=""/>|
end
Jikku's answer is correct, but if the file is large, readlines will be expensive because it loads every line into memory. In that case, read the file line by line (as you are already trying to do). Here is the corrected code:
f = File.open("urlfile.txt", "r")
while (line = f.gets)
  puts "<outline text=\"\" type=\"link\" url=\""+line.strip+"\" dateCreated=\"\"/>"
end
f.close
It's all your code; there are only two things that I've changed.
You had predefined line, which was unnecessary.
While you were already using line to iterate over your file line by line, by doing File.read("urlfile.txt") you were dumping the whole file again in each iteration. Hence "so many loops", as you described in your question.
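If you would rather not manage the file handle yourself, File.foreach streams the file one line at a time and closes it when done. A minimal sketch; the helper names outline_tag and print_outline_tags are invented here, and the sample URLs are taken from the question's output.

```ruby
require "tempfile"

# Hypothetical helper: builds the <outline> element for a single URL line.
def outline_tag(line)
  %(<outline text="" type="link" url="#{line.strip}" dateCreated=""/>)
end

# Hypothetical helper: prints one element per line without loading the whole file.
def print_outline_tags(path)
  File.foreach(path) { |line| puts outline_tag(line) }
end

Tempfile.create("urlfile") do |f|
  f.write("http://yuma.craigslist.org\nhttp://westslope.craigslist.org\n")
  f.flush
  print_outline_tags(f.path)
end
```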

Alternative code to read and process array by newline in Ruby

My code is supposed to read a file on the server, store its content in an Array, then read the array elements (eventually each element is a line) and split each line into 7 parts by (:)
I wrote this code and it works 100% fine.
lines = File.readlines('/etc/passwd')
lines.each do |line|
line = line.chomp! #I removed the \n
line_arr = line.split(/:/)
puts line_arr.inspect
puts "*************"
end
I just want to know if there is a shortcut to do this, since each element of the array ends with \n.
Maybe I am a bit confused between array elements ending with \n and a string that contains \n.
the content of the file looks like this
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
As for the output, there's no specific format, because I am going to use this part and extend my code later. As long as I can access those 7 parts that I extracted from the line_arr, i should be fine.
thank you
require 'etc'
[].tap {|ary| Etc.passwd {|u|
  # change, uclass and expire are platform-dependent and may not exist on your system
  ary << [u.name, u.passwd, u.uid, u.gid, u.gecos, u.dir, u.shell, u.change,
          u.uclass, u.expire]
}}
Rule of thumb: never try to reimplement behavior that someone else has already written for you. Unless you are really, really, really, REALLY smart.
Actually, now that you have edited your question, I don't even see why you need those arrays in the first place and cannot just use the Etc.passwd iterator and Struct::Passwd directly.
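If you do end up parsing a passwd-style text file yourself, the shortcut asked about is probably the chomp: true keyword, which File.readlines accepts in Ruby 2.4 and later. A sketch under that assumption; passwd_fields is a made-up helper name and the sample lines come from the question.

```ruby
require "tempfile"

# Hypothetical helper: returns one array of fields per line.
# chomp: true (Ruby 2.4+) drops the line terminator while reading;
# the -1 limit keeps trailing empty fields.
def passwd_fields(path)
  File.readlines(path, chomp: true).map { |line| line.split(":", -1) }
end

Tempfile.create("passwd_sample") do |f|
  f.write("root:x:0:0:root:/root:/bin/bash\nsync:x:4:65534:sync:/bin:/bin/sync\n")
  f.flush
  passwd_fields(f.path).each { |fields| p fields }
end
```

This also sidesteps a trap in line = line.chomp!: chomp! returns nil when there is nothing to remove, for example on a last line with no trailing newline.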

Ruby: line by line match range

Is there a way to do the following Perl structure in Ruby?
while( my $line = $file->each_line() ) {
if($line =~ /first_line/ .. /end_line_I_care_about/) {
do_something;
# this will do something on a line per line basis on the range of the match
}
}
In ruby that would read something like:
file.each_line do |line|
if line.match(/first_line/) .. line.match(/end_line_I_care_about/)
do_something;
# this will only do it based on the first match not the range.
end
end
Reading the whole file into memory is not an option and I don't know how big is the chunk of the range.
EDIT:
Thanks for the answers. The answers I got were basically the same as the code I had in the first place. The problem I was having was: "It can test the right operand and become false on the same evaluation it became true (as in awk), but it still returns true once."
"If you don't want it to test the right operand until the next evaluation, as in sed, just use three dots ("...") instead of two. In all other regards, "..." behaves just like ".." does."
I am marking as correct the answer that pointed out that '..' can be turned off in the same evaluation that turns it on.
For reference the code I am using is:
file.each_line do |line|
if line.match(/first_line/) ... line.match(/end_line_I_care_about/)
do_something;
end
end
Yes, Ruby supports flip-flops:
str = "aaa
ON
bbb
OFF
cccc
ON
ddd
OFF
eee"
str.each_line do |line|
puts line if line =~ /ON/..line =~ /OFF/
#puts line if line.match(/ON/)..line.match(/OFF/) #works too
end
Output:
ON
bbb
OFF
ON
ddd
OFF
I'm not perfectly clear on the exact semantics of the Perl code, assuming you want exactly the same. Ruby does have something that looks and works similarly, or perhaps identically: a Range as a condition works as a toggle. The code you presented works exactly as I imagine you intend.
There are a few caveats, however:
Even after you reach the end condition, lines will keep being read until you reach the end of the file. This may be a performance consideration if you expect the end condition to be near the beginning of a large file.
The start condition can be triggered multiple times, flipping the "switch" back on, doing your do_something and testing for the end condition again. This may be fine if your condition is specific enough, or if you want that behavior, but it's something to be aware of.
The end condition can be called at the same time the start condition is called giving you true for just one line.
Here's an alternative:
started = false
file.each_line do |line|
  started = true if line =~ /first_line_condition/
  next unless started
  do_something()
  break if line =~ /last_line_condition/
end
That code reads each line of the file until the start condition is reached. Then it does whatever processing you like starting with that line until you reach a line that matches your end condition, at which point it breaks out of the loop, reading no more lines from the file.
This solution is the closest to your needs. It almost looks like Perl, but this is valid Ruby (although the flip-flop operator is kind of discouraged).
The file is read line by line; it is not fully loaded into memory.
File.foreach("my_file.txt") do |line|
  if (line =~ /first_line/) .. (line =~ /end_line_I_care_about/)
    do_something
  end
end
The parentheses are optional, but they improve readability.
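For anyone unsure how the two-dot and three-dot forms differ, here is a small sketch with made-up data where one line matches both the start and the end condition.

```ruby
lines = ["aaa", "ON and OFF", "bbb", "OFF", "ccc"]

# Two dots: the end condition is tested on the same line that switched it on.
two_dot = []
lines.each do |line|
  two_dot << line if (line =~ /ON/)..(line =~ /OFF/)
end

# Three dots: the end condition is not tested until the following line.
three_dot = []
lines.each do |line|
  three_dot << line if (line =~ /ON/)...(line =~ /OFF/)
end

p two_dot    # ["ON and OFF"]
p three_dot  # ["ON and OFF", "bbb", "OFF"]
```

Ruby 2.6 prints a deprecation warning for flip-flops; the deprecation was reverted in 2.7.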

Ruby: Length of a line of a file in bytes?

I'm writing this little HelloWorld as a followup to this and the numbers do not add up
filename = "testThis.txt"
total_bytes = 0
file = File.new(filename, "r")
file.each do |line|
total_bytes += line.unpack("U*").length
end
puts "original size #{File.size(filename)}"
puts "Total bytes #{total_bytes}"
The result is not the same as the file size. I think I just need to know what format I need to plug in... or maybe I've missed the point entirely. How can I measure the file size line by line?
Note: I'm on Windows, and the file is encoded as type ANSI.
Edit: This produces the same results!
filename = "testThis.txt"
total_bytes = 0
file = File.new(filename, "r")
file.each_byte do |whatever|
total_bytes += 1
end
puts "Original size #{File.size(filename)}"
puts "Total bytes #{total_bytes}"
so anybody who can help now...
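IO#gets and IO#each do return each line with its terminator, but what they return is not always what is stored on disk: on Windows, a file opened in text mode ("r") has every "\r\n" translated to "\n" before your code sees it, and that applies to #each_byte as well. File.size counts the bytes on disk, so the numbers are not going to match up unless you open the file in binary mode ("rb").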
See the relevant Pickaxe section
May I enquire why you're so concerned about the line lengths summing to the file size? You may be solving a harder problem than is necessary...
Aha. I think I get it now.
Lacking a handy iPod (or any other sort, for that matter), I don't know if you want exactly 4K chunks, in which case IO#read(4000) would be your friend (4000 or 4096?) or if you're happier to break by line, in which case something like this ought to work:
class Chunkifier
  # Splits the file at path into chunks of whole lines, aiming for at most 4,000 characters each.
  # String#size counts characters; use bytesize if you need bytes.
  def self.to_chunks(path)
    chunks = [""]
    File.readlines(path).each do |line|
      line.chomp! # strips off \n, \r or \r\n depending on OS
      if chunks.last.size + line.size >= 4_000 # 4096?
        chunks.last.chomp! # remove last line terminator
        chunks << ""
      end
      chunks.last << line + "\n" # or whatever terminator you need
    end
    chunks
  end
end
if __FILE__ == $0
  require 'test/unit'
  class TestFile < Test::Unit::TestCase
    def test_chunking
      # PATH is a constant you define yourself, holding the path of the file to split
      chs = Chunkifier.to_chunks(PATH)
      chs.each do |chunk|
        assert 4_000 >= chunk.size, "chunk is #{chunk.size} bytes long"
      end
    end
  end
end
Note the use of IO#readlines to get all the text in one slurp: #each or #each_line would do as well. I used String#chomp! to ensure that whatever the OS is doing, the bytes at the end are removed, so that \n or whatever can be forced into the output.
I would suggest using File#write, rather than #print or #puts for the output, as the latter have a tendency to deliver OS-specific newline sequences.
If you're really concerned about multi-byte characters, consider taking the each_byte or unpack("C*") options and monkey-patching String, something like this:
class String
  def size_in_bytes
    self.unpack("C*").size
  end
end
The unpack version is about 8 times faster than the each_byte one on my machine, btw.
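On Ruby 1.9 and later the monkey patch is probably unnecessary, because String#bytesize is built in. A small sketch of how the different counts compare; the sample string is made up.

```ruby
text = "h\u00E9llo\n"  # "héllo" plus a newline; é takes two bytes in UTF-8

puts text.length              # 6 characters
puts text.bytesize            # 7 bytes
puts text.unpack("C*").size   # 7, one entry per byte
puts text.unpack("U*").size   # 6, one entry per code point
```

This is also why unpack("U*") in the question counts characters rather than bytes.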
You might try IO#each_byte, e.g.
total_bytes = 0
file_name = "test_this.txt"
File.open(file_name, "r") do |file|
  file.each_byte {|b| total_bytes += 1}
end
puts "Original size #{File.size(file_name)}"
puts "Total bytes #{total_bytes}"
That, of course, doesn't give you a line at a time. Your best option for that is probably to go through the file via each_byte until you encounter \r\n. The IO class provides a bunch of pretty low-level read methods that might be helpful.
You potentially have several overlapping issues here:
Linefeed characters \r\n vs. \n (as per your previous post). Also EOF file character (^Z)?
Definition of "size" in your problem statement: do you mean "how many characters" (taking into account multi-byte character encodings) or do you mean "how many bytes"?
Interaction of the $KCODE global variable (deprecated in ruby 1.9. See String#encoding and friends if you're running under 1.9). Are there, for example, accented characters in your file?
Your format string for #unpack. I think you want C* here if you really want to count bytes.
Note also the existence of IO#each_line (just so you can throw away the while and be a little more ruby-idiomatic ;-)).
The issue is that when you save a text file on Windows, each line break is two characters (characters 13 and 10) and therefore 2 bytes, whereas on Linux it is only 1 (character 10). However, Ruby on Windows reports the pair as a single character, '\n' (character 10). What's worse, if you're on Linux with a Windows file, Ruby will give you both characters.
So, if you know that your files are always coming from windows text files and executed on windows, every time you get a newline character you can add 1 to your count. Otherwise it's a couple of conditionals and a little state machine.
BTW there's no EOF 'character'.
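One way to avoid the state machine, as far as I can tell, is to open the file in binary mode so that no newline translation happens at all. A sketch under that assumption; bytes_per_line is a made-up helper and the sample content is invented.

```ruby
require "tempfile"

# Hypothetical helper: byte length of every line, terminators included.
# "rb" turns off newline translation, so "\r\n" stays two bytes on any OS.
def bytes_per_line(path)
  File.open(path, "rb") do |file|
    file.each_line.map(&:bytesize)
  end
end

Tempfile.create("crlf") do |f|
  f.binmode
  f.write("first\r\nsecond\r\nlast")
  f.flush

  sizes = bytes_per_line(f.path)
  p sizes                              # [7, 8, 4]
  puts sizes.sum == File.size(f.path)  # true
end
```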
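f = File.new("log.txt")
begin
  while (line = f.readline)
    line.chomp! # chomp alone returns a copy and leaves line unchanged
    puts line.length
  end
rescue EOFError
  # readline raises EOFError at the end of the file
ensure
  f.close
end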
Here is a simple solution, presuming that the current file pointer is set to the start of a line in the file being read:
last_pos = file.pos
next_line = file.gets
current_pos = file.pos
backup_dist = last_pos - current_pos # negative; its magnitude is the line's length in bytes
file.seek(backup_dist, IO::SEEK_CUR) # step back so the line can be read again
In this example, "file" is the file from which you are reading. To do this in a loop:
last_pos = file.pos
while (next_line = file.gets)
  current_pos = file.pos
  line_bytes = current_pos - last_pos # length of this line in bytes, terminator included
  last_pos = current_pos
end
