Ruby: line by line match range

Is there a way to do the following Perl structure in Ruby?
while( my $line = $file->each_line() ) {
if($line =~ /first_line/ .. /end_line_I_care_about/) {
do_something;
# this will do something on a line per line basis on the range of the match
}
}
In ruby that would read something like:
file.each_line do |line|
if line.match(/first_line/) .. line.match(/end_line_I_care_about/)
do_something;
# this will only do it based on the first match not the range.
end
end
Reading the whole file into memory is not an option, and I don't know how big the matched range will be.
EDIT:
Thanks for the answers. The answers I got were basically the same as the code I had in the first place. The problem I was having was: "It can test the right operand and become false on the same evaluation it became true (as in awk), but it still returns true once."
"If you don't want it to test the right operand until the next evaluation, as in sed, just use three dots ("...") instead of two. In all other regards, "..." behaves just like ".." does."
I am marking as correct the answer that pointed out that '..' can be turned off in the same evaluation in which it is turned on.
For reference the code I am using is:
file.each_line do |line|
if line.match(/first_line/) ... line.match(/end_line_I_care_about/)
do_something;
end
end
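To make the difference between the two forms concrete, here is a minimal sketch. The helper names and the sample lines are made up for illustration; the second sample line matches both the start and the end pattern, which is the case that caused the trouble.

```ruby
# Hypothetical helpers: keep the lines for which the flip-flop is true.
def two_dot_lines(lines)
  # ".." tests the end pattern on the same line that switched it on.
  lines.select { |l| true if (l =~ /START/)..(l =~ /END/) }
end

def three_dot_lines(lines)
  # "..." waits until the next line before testing the end pattern.
  lines.select { |l| true if (l =~ /START/)...(l =~ /END/) }
end

sample = ["a", "START END", "b", "END", "c"]
p two_dot_lines(sample)    # => ["START END"]
p three_dot_lines(sample)  # => ["START END", "b", "END"]
```

With two dots the range opens and closes on "START END", so only that line is kept. With three dots the range stays open until the later "END" line.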

Yes, Ruby supports flip-flops:
str = "aaa
ON
bbb
OFF
cccc
ON
ddd
OFF
eee"
str.each_line do |line|
puts line if line =~ /ON/..line =~ /OFF/
#puts line if line.match(/ON/)..line.match(/OFF/) #works too
end
Output:
ON
bbb
OFF
ON
ddd
OFF

I'm not perfectly clear on the exact semantics of the Perl code, assuming you want exactly the same. Ruby does have something that looks and works similarly, or perhaps identically: a Range as a condition works as a toggle. The code you presented works exactly as I imagine you intend.
There are a few caveats, however:
Even after you reach the end condition, lines will keep being read until you reach the end of the file. This may be a performance consideration if you expect the end condition to be near the beginning of a large file.
The start condition can be triggered multiple times, flipping the "switch" back on, doing your do_something and testing for the end condition again. This may be fine if your condition is specific enough, or if you want that behavior, but it's something to be aware of.
The end condition can be tested on the same line that triggers the start condition, giving you true for just one line.
Here's an alternative:
started = false
file.each_line do |line|
started = true if line =~ /first_line_condition/
next unless started
do_something()
break if line =~ /last_line_condition/
end
That code reads each line of the file until the start condition is reached. Then it does whatever processing you like starting with that line until you reach a line that matches your end condition, at which point it breaks out of the loop, reading no more lines from the file.
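As a rough, runnable sketch of the same idea, the loop can be wrapped in a method and fed a StringIO in place of a real file. The method name collect_range and the sample text are assumptions for illustration only.

```ruby
require 'stringio'

# Hypothetical helper: collect lines from the first match of `first`
# through the first following match of `last`, then stop reading.
def collect_range(io, first, last)
  collected = []
  started = false
  io.each_line do |line|
    started = true if line =~ first
    next unless started
    collected << line
    break if line =~ last   # stop reading as soon as the range ends
  end
  collected
end

io = StringIO.new("a\nfirst\nb\nlast\nc\nfirst\nd\nlast\n")
p collect_range(io, /first/, /last/)  # => ["first\n", "b\n", "last\n"]
p io.gets                             # => "c\n" (the rest was never read)
```

Unlike the flip-flop, this version handles only the first range and never looks at the second "first" line.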

This solution is the closest to your needs. It almost looks like Perl, but this is valid Ruby (although the flip-flop operator is somewhat discouraged).
The file is read line by line; it is not fully loaded into memory.
File.foreach("my_file.txt") do |line|
if (line =~ /first_line/) .. (line =~ /end_line_I_care_about/)
do_something
end
end
The parentheses are optional, but they improve readability.

Related

How to preserve format of content while writing to another file?

I'm reading some content from a file and use a regex and scan to discard a few things in the file and write the content to another file.
If I look at the newly written file, it has escape characters and "\n" in the file instead of actual new line.
filea.txt is:
test
in run
]
}
end
I'm getting the content between 'test' and 'end' using:
file = File.open('filea.txt', 'r')
result = file.read
regex = /(?<=test) .*?(?=end)/mx
ans = result.scan(regex)
Writing ans to a new file like fileb.txt puts:
in run'\"\n ]\n }
But, if I try writing the entire result, then it has correct content format in fileb.txt.
Your question isn't clear and needs work, but you're using read in a way that can cause scalability problems.
Here's how to accomplish the same sort of task without using read:
content = []
DATA.each_line do |li|
marker = li.lstrip
if marker =~ /^in run/i .. marker =~ /^end of file/i
content << li
end
end
content # => ["in run\n", "]\n", "}\n", "end of file\n"]
__END__
test file
in run
]
}
end of file
The .. operator is a multitool in Ruby (and other languages). We use it to define ranges, but it can also flip-flop between logic states. In this case I'm using it in the second form, a "flip-flop".
When Ruby runs the code it checks
marker =~ /^in run/i
If that is false the if fails and the code continues. If
marker =~ /^in run/i
succeeds, Ruby will remember that it succeeded and immediately test for
marker =~ /^end of file/i
If that fails Ruby will fall into the if block and do whatever is inside the block, then continue as normal.
The next loop of each_line will hit the if tests and .. will remember that
marker =~ /^in run/i
succeeded previously and immediately test the second condition. If it is true it steps into the block and resets itself to a false again, so that any subsequent loops will fail until
marker =~ /^in run/i
returns true again.
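The walkthrough above can be sketched with an explicit state variable. This is only an illustration of the two-dot behaviour as described, not how Ruby implements it internally; flip_flop_filter and active are hypothetical names.

```ruby
# Hypothetical helper that mimics `start .. stop` with a visible flag.
def flip_flop_filter(lines, start_re, stop_re)
  active = false
  lines.select do |li|
    if active
      # Already inside the range: this line is kept, and a stop match
      # resets the flag so later lines fail until start matches again.
      active = false if li =~ stop_re
      true
    elsif li =~ start_re
      # Start matched: immediately test the stop pattern on the same line.
      active = !(li =~ stop_re)
      true
    else
      false
    end
  end
end

lines = ["test file\n", "in run\n", "]\n", "}\n", "end of file\n"]
p flip_flop_filter(lines, /^in run/i, /^end of file/i)
# => ["in run\n", "]\n", "}\n", "end of file\n"]
```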
This logic is really powerful and makes it easy to build code that can scan huge files, extracting portions of them.
There are other ways to do it but they generally run into messier logic.
In the example code I'm also using __END__ which has some rarely seen magic to it also. You'll want to read about __END__ and DATA if you don't understand what's happening.
If you're dealing with files in the GB or TB range, with lots of content you're grabbing, it might be smart to not accumulate too much into your data-gathering array content. A minor tweak will keep that from happening:
if marker =~ /^in run/i .. marker =~ /^end of file/i
content << li
next
end
unless content.empty?
# do something that clears content:
end
In this code I'm using DATA.each_line. In real life you'd want to use File.foreach instead.
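A minimal sketch of the File.foreach version follows. It writes the sample data to a temporary file first so the example can run on its own; extract_section and the temp file are assumptions for illustration.

```ruby
require 'tempfile'

# Hypothetical helper: same flip-flop as above, reading from a path.
def extract_section(path, start_re, stop_re)
  content = []
  File.foreach(path) do |li|   # reads one line at a time
    marker = li.lstrip
    content << li if (marker =~ start_re)..(marker =~ stop_re)
  end
  content
end

Tempfile.create('sample') do |f|
  f.write("test file\nin run\n]\n}\nend of file\n")
  f.flush
  p extract_section(f.path, /^in run/i, /^end of file/i)
  # => ["in run\n", "]\n", "}\n", "end of file\n"]
end
```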

See if the beginning of a line matches a regex character

There are lines inside a file that contain !. I need all other lines. I only want to print lines within the file that do not start with an exclamation mark.
The line of code which I have written so far is:
unless parts.each_line.split("\n" =~ /^!/)
# other bit of nested code
end
But it doesn't work. How do I do it?
As a start I'd use:
File.foreach('foo.txt') do |li|
next if li[0] == '!'
puts li
end
foreach is extremely fast and allows your code to handle any size file - "scalable" is the term. See "Why is "slurping" a file not a good practice?" for more information.
li[0] is a common idiom in Ruby to get the first character of a string. Again, it's very fast and is my favorite way to get there, however consider these tests:
require 'fruity'
STR = '!' + ('a'..'z').to_a.join # => "!abcdefghijklmnopqrstuvwxyz"
compare do
_slice { STR[0] == '!' }
_start_with { STR.start_with?('!') }
_regex { !!STR[/^!/] }
end
# >> Running each test 32768 times. Test will take about 2 seconds.
# >> _start_with is faster than _slice by 2x ± 1.0
# >> _slice is similar to _regex
Using start_with? (or end_with?, its equivalent for the end of a String) is twice as fast, so it looks like I'll be using start_with? and end_with? from now on.
Combine that with foreach and your code will have a decent chance of being fast and efficient.
See "What is the fastest way to compare the start or end of a String with a sub-string using Ruby?" for more information.
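Putting the two together might look like the sketch below. The helper name lines_without_bang and the temporary file are made up for the example.

```ruby
require 'tempfile'

# Hypothetical helper: every line of the file that does not start with "!".
def lines_without_bang(path)
  File.foreach(path).reject { |li| li.start_with?('!') }
end

Tempfile.create('bang') do |f|
  f.write("keep me\n!skip me\nkeep me too\n")
  f.flush
  puts lines_without_bang(f.path)
end
```

Note that reject builds an array of the kept lines. For a very large file, the block form of File.foreach with next if li.start_with?('!') avoids holding them all in memory.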
You can use String#start_with? to find the lines that start with a particular string.
file = File.open('file.txt').read
file.each_line do |line|
unless line.start_with?('!')
print line
end
end
You can also check the index of the first character
unless line[0] === "!"
You can also do this with Regex
unless line.scan(/^!/).any?

I am getting 50 different loops while including a variable in a string using Ruby

I'm trying to get a string to run and print on a separate page with a certain string and a variable concatenated. I thought I had the code right, but all I get is a loop that runs fifty times. This is the code that I am using:
f = File.open("urlfile.txt", "r")
line = ""
while (line = f.gets)
puts "<outline text=\"\" type=\"link\" url=\""+File.read("urlfile.txt")+"\" dateCreated=\"\"/>"
end
f.close
Then this is what it's spitting out:
a loop that runs for about 50 times
http://washingtondc.craigslist.org
http://westpalmbeach.craigslist.org
http://westpalmbeach.craigslist.org
http://westslope.craigslist.org
http://westslope.craigslist.org
http://yubasutter.craigslist.org
http://yubasutter.craigslist.org
http://yuma.craigslist.org
http://yuma.craigslist.org
" dateCreated=""/>
and this is what the code should look like when it is spit out
You are re-reading the file inside the loop; as hinted by @Mark, you need to use line inside the string interpolation.
Aside: perhaps it's better to refactor the code into idiomatic Ruby; consider the following, for instance:
lines = File.open('urlfile.txt', 'r').readlines
lines.each do |line|
puts %|<outline text="" type="link" url="#{line.strip}" dateCreated=""/>|
end
Jikku's answer is correct, but if the file is large, readlines will be expensive. In that case, read the file line by line (as you are already trying to do). Here is the correct code:
f = File.open("urlfile.txt", "r")
while (line = f.gets)
puts "<outline text=\"\" type=\"link\" url=\""+line.strip+"\" dateCreated=\"\"/>"
end
f.close
It's all your code; there are only two things that I've changed.
You had predefined line, which was unnecessary.
While you were already using line to iterate over your file line by line, by doing File.read("urlfile.txt") you were dumping the whole file again in each iteration. Hence "so many loops" as you described in your question.
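As a small sketch of the fix, the tag can be built from each line rather than from the whole file. The method name outline_tags and the two sample URLs are only illustrative.

```ruby
# Hypothetical helper: one <outline> tag per input line.
def outline_tags(lines)
  lines.map do |line|
    # Interpolate the current line, not the contents of the whole file.
    %(<outline text="" type="link" url="#{line.strip}" dateCreated=""/>)
  end
end

urls = ["http://yuma.craigslist.org\n", "http://westslope.craigslist.org\n"]
puts outline_tags(urls)
```

For a real file, File.foreach("urlfile.txt") called without a block returns an enumerator and can be passed in place of urls.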

How far does .each read? To the end of the line?

Sorry for the newbie question. Was loading a .txt file into the following code:
line_count = 0
File.open("text.txt").each {|line| line_count += 1}
puts line_count
Does .each simply read until the end of a line before passing its value to the code block? Little explanation would be great. Thanks!
You can use .each_line to be more explicit, but yes, each reads a line (see http://www.ruby-doc.org/core-2.0.0/IO.html#method-i-each).
f = File.new("testfile")
f.each {|line| puts "#{f.lineno}: #{line}" }
It's really important to read the documentation, because all sorts of things are explained there. For instance, the documentation for each says:
Executes the block for every line in ios, where lines are separated by sep.
sep defaults to the special $/ global variable, the input record separator, which is "\n" by default. You can tell Ruby to use a different value for the line-end/separator if you know the file uses something else.
Regarding your code:
I'd do it this way:
line_count = 0
File.foreach("text.txt") do |line|
line_count += 1
end
puts line_count
foreach is very self-explanatory, which is important when writing code. You want it to be self-documenting as much as possible. foreach iterates over "each" line in the file. It also assumes the line-ends are the same as $/, but you can force it to be something different, perhaps the letter "z" or "." or " ", depending on your whim and fancy at the moment.
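Here is a quick sketch of what changing the separator does. It uses StringIO so it runs without a file; count_records is a hypothetical helper. File.foreach accepts a separator as its second argument in the same way.

```ruby
require 'stringio'

# Hypothetical helper: count the records in an IO-like object,
# splitting on `sep` instead of assuming a newline.
def count_records(io, sep = "\n")
  io.each_line(sep).count
end

p count_records(StringIO.new("one\ntwo\nthree\n"))    # => 3
p count_records(StringIO.new("one.two.three"), ".")   # => 3
p StringIO.new("one.two.three").each_line(".").to_a   # => ["one.", "two.", "three"]
```

Each record keeps its trailing separator, just as each line keeps its "\n" in the default case.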

Can using the ruby flip-flop as a filter be made less kludgy?

In order to get part of text, I'm using a true if kludge in front of a flip-flop:
desired_portion_lines = text.each_line.find_all do |line|
true if line =~ /start_regex/ .. line =~ /finish_regex/
end
desired_portion = desired_portion_lines.join
If I remove the true if bit, it complains
bad value for range (ArgumentError)
Is it possible to make it less kludgy, or should I merely do
desired_portion_lines = ""
text.each_line do |line|
desired_portion_lines << line if line =~ /start_regex/ .. line =~ /finish_regex/
end
Or is there a better approach that doesn't use enumeration?
If you are doing it line by line, my preference is something like this (with printing set to false before the loop):
printing = false if line =~ /finish_regex/
printing = true if line =~ /start_regex/
puts line if printing
If you have it all in one string, I would use split:
mystring.split(/finish_regex/).each do |item|
if item[/start_regex/]
puts item.split(/start_regex/)[-1]
end
end
I think
desired_portion_lines = ""
text.each_line do |line|
desired_portion_lines << line if line =~ /start_regex/ .. line =~ /finish_regex/
end
is perfectly acceptable. The .. operator is very powerful, but not used by a lot of people, probably because they don't understand what it does. Possibly it looks weird or awkward to you because you're not used to using it, but it'll grow on you. It's very common in Perl when dealing with ranges of lines in text files, which is where I first encountered it, and eventually was using it a lot.
The only thing I'd do differently is add some parentheses to visually separate the logical tests from each other, and from the rest of the line:
desired_portion_lines = ""
text.each_line do |line|
desired_portion_lines << line if ( (line =~ /start_regex/) .. (line =~ /finish_regex/) )
end
Ruby (and Perl) coders seem to abhor using parentheses, but I consider them useful for visually separating the logic tests. For me it's a readability and, by extension, a maintenance thing.
The only other thing I can think of that might help, would be to change desired_portion_lines to an array, and push your selected lines onto it. Currently, using desired_portion_lines << line appends to the string, mutating it each time. It might be faster pushing on the array then joining its elements afterward to build your string.
Back to the first example. The true if has to stay, because outside a conditional .. builds a Range instead of acting as a flip-flop, but you can collapse it to one line:
desired_portion = text.each_line.find_all { |line| true if line =~ /start_regex/ .. line =~ /finish_regex/ }.join
The only downside to iterating over all lines in a file using the flip-flop is that if the start pattern can occur multiple times, you'll get each found block added to desired_portion.
You can save three characters by replacing true if with !!() (with the flip flop belonging in between the parentheses).
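For comparison, here is a minimal sketch of two of the variants discussed in this thread: find_all with true if, and pushing onto an array before joining. The method names and sample text are made up for illustration.

```ruby
# Variant 1: find_all with the flip-flop kept inside a conditional.
def portion_with_find_all(text, start_re, finish_re)
  text.each_line.find_all { |line| true if (line =~ start_re)..(line =~ finish_re) }.join
end

# Variant 2: push matching lines onto an array, then join once at the end.
def portion_with_array(text, start_re, finish_re)
  kept = []
  text.each_line do |line|
    kept << line if (line =~ start_re)..(line =~ finish_re)
  end
  kept.join
end

text = "intro\nSTART\nbody\nFINISH\noutro\n"
p portion_with_find_all(text, /START/, /FINISH/)  # => "START\nbody\nFINISH\n"
p portion_with_array(text, /START/, /FINISH/)     # => "START\nbody\nFINISH\n"
```

Both return the same string; which one reads better is largely a matter of taste.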
