I'm reading some content from a file, using a regex with scan to discard a few things, and writing the result to another file.
When I look at the newly written file, it contains escape characters and literal "\n" sequences instead of actual newlines.
filea.txt is:
test
in run
]
}
end
I'm getting the content between 'test' and 'end' using:
file = File.open('filea.txt', 'r')
result = file.read
regex = /(?<=test) .*?(?=end)/mx
ans = result.scan(regex)
Writing ans to a new file, fileb.txt, gives:
in run'\"\n ]\n }
But if I write the entire result instead, fileb.txt has the correct format.
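One likely cause, assuming ans was written with something like File.write or IO#write rather than puts: scan always returns an Array of matches, and writing an Array converts it with to_s, which for arrays is the same as inspect, so the newlines come out as escaped \n sequences inside quotes. Joining the matches first gives the actual text. A minimal sketch, with the sample file inlined as a string instead of reading filea.txt:

```ruby
# Stand-in for File.read('filea.txt'), using the sample content from the question.
result = "test\nin run\n]\n}\nend\n"
regex  = /(?<=test) .*?(?=end)/mx   # the question's regex; /x ignores the literal space

ans = result.scan(regex)   # an Array of matched strings, not a String

as_inspected = ans.to_s    # what writing the Array directly produces: "\n" shown escaped
as_text      = ans.join    # the matched text itself, with real newlines

p ans.class      # => Array
print as_text    # prints the lines between 'test' and 'end'
```

To save it, File.write('fileb.txt', ans.join) would then write real newlines.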
Your question isn't clear and needs work, but you're using read in a way that can cause scalability problems.
Here's how to accomplish the same sort of task without using read:
content = []
DATA.each_line do |li|
  marker = li.lstrip
  if marker =~ /^in run/i .. marker =~ /^end of file/i
    content << li
  end
end
content # => ["in run\n", "]\n", "}\n", "end of file\n"]
__END__
test file
in run
]
}
end of file
The .. operator is a multitool in Ruby (and other languages). We use it to define ranges, but it can also flip-flop between logic states. In this case I'm using it in the second form, a "flip-flop".
When Ruby runs the code, it first checks marker =~ /^in run/i. If that is false, the if fails and the loop continues with the next line. If marker =~ /^in run/i succeeds, Ruby remembers that it succeeded and immediately tests marker =~ /^end of file/i on the same line. Either way the flip-flop returns true for this line, so Ruby falls into the if block, does whatever is inside it, and then continues as normal. (If that end test also succeeded, the flip-flop has already switched itself off again.)
On the next pass through each_line, .. remembers that marker =~ /^in run/i succeeded previously, so it only tests the second condition. It keeps returning true, stepping into the block, for every line until marker =~ /^end of file/i is true; on that line it steps into the block one last time and resets itself to false, so that subsequent lines fail until marker =~ /^in run/i returns true again.
This logic is really powerful and makes it easy to build code that can scan huge files, extracting portions of them.
There are other ways to do it but they generally run into messier logic.
In the example code I'm also using __END__, which has some rarely seen magic of its own. You'll want to read about __END__ and DATA if you don't understand what's happening.
If you're dealing with files in the GB or TB range and grabbing lots of content, it would be smart not to accumulate too much in the data-gathering array content. A minor tweak keeps that from happening:
DATA.each_line do |li|
  marker = li.lstrip
  if marker =~ /^in run/i .. marker =~ /^end of file/i
    content << li
    next
  end
  unless content.empty?
    # do something with content here, then clear it:
    content.clear
  end
end
In this code I'm using DATA.each_line. In real life you'd want to use File.foreach instead.
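Combining the tweak above with File.foreach, here is a sketch that hands each section to a block as soon as it closes, so only one section is held in memory at a time. The method name each_section is hypothetical, and the markers are the ones from the DATA example. Note the flush after the loop: if the end marker is on the file's last line, no later line reaches the unless content.empty? check, so without it the final section would be lost.

```ruby
# Hypothetical helper: yields each "in run" .. "end of file" section of +path+
# as an Array of lines, reading the file one line at a time.
def each_section(path)
  content = []
  File.foreach(path) do |li|
    marker = li.lstrip
    if marker =~ /^in run/i .. marker =~ /^end of file/i
      content << li
      next
    end
    unless content.empty?
      yield content   # the section just closed; let the caller process it
      content = []    # fresh buffer, so the caller may keep the old Array
    end
  end
  yield content unless content.empty?   # flush a section ending on the last line
end
```

Usage: each_section('filea.txt') { |section| puts section.join }. A new Array is created after each yield instead of calling clear, so a caller that stores the sections doesn't see them emptied.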
Related
I'm trying to receive multiple paragraphs at once from a user.
I've tried using gets, but it doesn't seem to be working... it discards the second paragraph:
# The code:
print("Paste your text here: ")
essay = gets
puts(essay)
# Getting user input (the second sentence is a separate paragraph)
Paste your text here: I like cake.
It makes me happy.
# What the computer did for puts(essay):
I like cake.
=> nil
I expected the result to be something like this:
"I like cake.\nIt makes me happy.\n"
But it gave me "I like cake." instead.
How could I end up with my expected result?
Keep appending lines to a string until the user enters an empty line:
str = ""
para = "init"
str << (para = gets) until para.chomp.empty? #or para == "\n"
p str
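One caveat with that one-liner: if input ends (Ctrl-D, or a piped file runs out) before an empty line arrives, gets returns nil and para.chomp raises NoMethodError. A sketch of the same idea that also stops at end of input and leaves the terminating empty line out of the result; read_until_blank is a hypothetical name, and the io parameter (defaulting to $stdin) is only there so it can be exercised with a StringIO:

```ruby
# Hypothetical helper: collects lines from +io+ until an empty line or end of input.
def read_until_blank(io = $stdin)
  str = +""                       # unfrozen buffer to append to
  while (line = io.gets)          # gets returns nil at end of input
    break if line.chomp.empty?    # an empty line ends the text
    str << line
  end
  str
end
```

Usage: print "Paste your text here: " followed by essay = read_until_blank returns "I like cake.\nIt makes me happy.\n" for the input in the question.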
Here's an alternative with slightly different logic:
def getps
  save, $/ = $/, "\n\n"
  gets.chomp
ensure
  $/ = save
end
str = getps
The global variable $/ is the input record separator, which Ruby uses to decide where a line ends; gets reads up to that separator. If we tell Ruby that a line ends with two newlines, gets keeps reading until it sees two newlines in a row. Since we don't need them, we chomp them off (chomp with no argument removes $/). The rest of the code just ensures that $/ is restored afterwards, so later calls to gets aren't permanently affected.
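If you'd rather not touch the global at all, IO#gets also accepts the separator as an argument, which applies to that one call only. A sketch, with read_paragraph as a hypothetical name and io defaulting to $stdin:

```ruby
# Hypothetical helper: reads up to and including the first "\n\n" from +io+.
def read_paragraph(io = $stdin)
  para = io.gets("\n\n")   # the separator argument affects this call only; $/ is untouched
  para&.chomp("\n\n")      # strip the two newlines; nil at end of input
end
```

Like getps, this returns "I like cake.\nIt makes me happy." for the input in the question, but there is no global state to restore.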
I'm trying to read a file (d:\mywork\list.txt) line by line and check whether each string occurs in any of the files (one by one) in a particular directory (d:\new_work).
If it is present in any of the files (there may be one or more), I want to delete the string (e.g. car\yrui3,) from those files and save them.
list.txt:
car\yrui3,
dom\09iuo,
id\byt65_d,
rfc\some_one,
desk\aa_tyt_99,
.........
.........
Directory having multiple files: d:\new_work:
Rollcar-access.txt
Mycar-access.txt
Newcar-access.txt
.......
......
My code:
value=File.open('D:\\mywork\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
  line.chomp!
  print "For the string: #{line}"
  Dir.glob("D:/new_work/*-access.txt") do |fn|
    print "checking files:#{fn}\n"
    text = File.read(fn)
    replace = text.gsub(line.strip, "")
    File.open(fn, "w") { |file| file.puts replace }
  end
end
The issue is that the values are not being deleted as expected. Also, text is empty when I try to print it.
There are a number of things wrong with your code, and you're not safely handling your file changes.
Meditate on this untested code:
ACCESS_FILES = Dir.glob("D:/new_work/*-access.txt")
File.foreach('D:/mywork/list.txt') do |target|
  target = target.strip.sub(/,$/, '')
  next if target.empty?   # an empty target would match, and delete, every line
  ACCESS_FILES.each do |filename|
    new_filename = "#{filename}.new"
    old_filename = "#{filename}.old"
    File.open(new_filename, 'w') do |fileout|
      File.foreach(filename) do |line_in|
        fileout.puts line_in unless line_in[target]
      end
    end
    File.rename(filename, old_filename)
    File.rename(new_filename, filename)
    File.delete(old_filename)
  end
end
In your code you use:
File.open('D:\\mywork\\list.txt').read
Instead, a shorter and clearer way is to use:
File.read('D:/mywork/list.txt')
Ruby automatically adjusts pathname separators for the OS, so always use forward slashes for readability. From the IO documentation:
Ruby will convert pathnames between different operating system conventions if possible. For instance, on a Windows system the filename "/gumby/ruby/test.rb" will be opened as "\gumby\ruby\test.rb".
The problem with using read is that it isn't scalable. Imagine you were doing this in a long-term production system and your input file had grown into the TB range. You'd halt processing on your system until the file could be read. Don't do that.
Instead use foreach to read line-by-line. See "Why is "slurping" a file not a good practice?". That'll remove the need for
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp!
While
Dir.glob("D:/new_work/*-access.txt") do |fn|
is fine, its placement isn't. You're doing it for every line processed in your file being read, wasting CPU. Read it first and store the value, then iterate over that value repeatedly.
Again,
text = File.read(fn)
has scalability issues. Using foreach is a better solution. Again.
Replacing the text using gsub is fast, but it doesn't outweigh the potential problems of scalability when line-by-line IO is just as fast and sidesteps the issue completely:
replace = text.gsub(line.strip, "")
Opening and writing to the same file as you were reading is an accident waiting to happen in a production environment:
File.open(fn, "w") { |file| file.puts replace }
A better practice is to write to a separate, new, file, rename the old file to something safe, then rename the new file to the old file's name. This preserves the old file in case the code or machine crashes mid-save. Then, when that's finished it's safe to remove the old file. See "How to search file text for a pattern and replace it with a given value" for more information.
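The write-new, rename-old, rename-new, delete-old steps described above can be wrapped in a small helper. This is a sketch, not a library method: safe_rewrite is a hypothetical name, and it takes a block that returns each replacement line (or nil to drop the line). That lets it remove just the string, as the question asked, rather than the whole line as the untested code above does.

```ruby
# Hypothetical helper: rewrites +filename+ line by line through the block,
# writing to "<name>.new" and swapping it in only after it is fully written.
def safe_rewrite(filename)
  new_filename = "#{filename}.new"
  old_filename = "#{filename}.old"
  File.open(new_filename, "w") do |fileout|
    File.foreach(filename) do |line_in|
      line_out = yield(line_in)
      fileout.write(line_out) if line_out   # nil drops the line entirely
    end
  end                                       # new file is closed before renaming
  File.rename(filename, old_filename)       # keep the original until the swap is done
  File.rename(new_filename, filename)
  File.delete(old_filename)                 # only now is it safe to remove
end
```

Usage: safe_rewrite('D:/new_work/Mycar-access.txt') { |line| line.gsub('car\yrui3,', '') }. Passing a String to gsub matches it literally, so the backslash needs no regex escaping.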
A final recommendation is to strip all the trailing commas from your input file. They're not accomplishing anything and are only making you do extra work to process the file.
I just ran your code and it works as expected on my machine. My best guess is that you're not taking the commas at the end of each line in list.txt into account. Try removing them with an extra chomp!:
value=File.open('D:\\mywork\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
  line.chomp!
  line.chomp!(",")
  print "For the string: #{line}"
  Dir.glob("D:/new_work/*-access.txt") do |fn|
    print "checking files:#{fn}\n"
    text = File.read(fn)
    replace = text.gsub(line.strip, "")
    File.open(fn, "w") { |file| file.puts replace }
  end
end
By the way, you shouldn't need this line: value.gsub!(/\r\n?/, "\n") since you're chomping all the newlines away anyway, and chomp can recognize \r\n by default.
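A quick illustration of both points, using a made-up list.txt line with a Windows-style line ending: chomp with no argument strips a trailing "\r\n" (or "\n", or "\r"), and chomp with an argument strips that string instead.

```ruby
raw   = "car\\yrui3,\r\n"   # a list.txt-style line ending in "\r\n"
plain = raw.chomp           # no argument: strips "\r\n" (or "\n", or "\r")
clean = plain.chomp(",")    # with an argument: strips that trailing string
p plain   # => "car\\yrui3,"
p clean   # => "car\\yrui3"
```

The non-bang forms are used here so the literal isn't mutated; chomp! does the same work in place, but returns nil when there was nothing to remove.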
I have a file like this:
some content
some other
*********************
useful1 text
useful3 text
*********************
some other content
How do I get the content of the file between the two lines of stars into an array? For example, after processing the file above, the array should contain:
a = ["useful1 text", "useful3 text"]
A really hacky solution is to split the file's content (e.g. from File.read) on the lines of stars, grab the middle part, and then split that into lines, too:
content.split(/^\*+$/)[1].split("\n").reject(&:empty?)
# => ["useful1 text", "useful3 text"]
f = File.open('test_doc.txt', 'r')
content = []
f.each_line do |line|
  content << line.rstrip unless !!(line =~ /^\*(\*)*\*$/)
end
f.close
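The regex pattern /^\*(\*)*\*$/ matches lines that contain only asterisks (at least two of them). !!(line =~ /^\*(\*)*\*$/) always returns a boolean value, so if the pattern does not match, the line is added to the array. Note that this keeps every non-asterisk line, including those outside the two delimiter lines.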
What about this:
def values_between(array, separator)
  array.slice(array.index(separator) + 1..array.rindex(separator) - 1)
end
filepath = '/tmp/test.txt'
lines = %w(trash trash separator content content separator trash)
separator = "separator\n"
File.write filepath, lines.join("\n")
values_between File.readlines(filepath), separator
#=> ["content\n", "content\n"]
I'd do it like this:
lines = []
File.foreach('./test.txt') do |li|
  lines << li if (li[/^\*{5}/] ... li[/^\*{5}/])
end
lines[1..-2].map(&:strip).select{ |l| l > '' }
# => ["useful1 text", "useful3 text"]
/^\*{5}/ means "a line that starts with five '*' characters", so it matches any line beginning with at least five of them.
Here ... is used as a "flip-flop" operator, the less common of the two roles .. and ... play in Ruby. It isn't used often because most people don't seem to understand it, and it's sometimes mistaken for the Range delimiters .. and ....
In this use, Ruby watches for the first test, li[/^\*{5}/], to return true. Once it does, .. or ... will return true until the second condition returns true. In this case we're looking for the same delimiter, so the same test, li[/^\*{5}/], works for both, and this is where the difference between the two versions, .. and ..., comes into play.
.. tests the second condition on the same line that satisfied the first, so with identical tests it would toggle back to false immediately, after returning true for just the opening delimiter. ... waits until the next line before testing the second condition, which avoids seeing the same delimiter line as both the start and the end.
That fills lines, which, before [1..-2].map(&:strip).select{ |l| l > '' } is applied, looks like:
# => ["*********************\n",
# "\n",
# "useful1 text\n",
# "\n",
# "useful3 text\n",
# "\n",
# "*********************\n"]
[1..-2].map(&:strip).select{ |l| l > '' } cleans that up. The slice [1..-2] removes the first and last elements (the delimiter lines). strip then removes leading and trailing whitespace, getting rid of the trailing newlines and leaving empty strings alongside the strings containing the desired text. Finally, select{ |l| l > '' } keeps only the strings that compare greater than the empty string, i.e., the non-empty ones.
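To see the difference between the two forms, here is a small sketch that runs both on the same lines with identical start and end tests. The method names two_dot and three_dot and the sample lines are made up for illustration; each method gets its own flip-flop state, which starts off on every call.

```ruby
# Two-dot form: the end test also runs on the line that turned the flip-flop on.
def two_dot(lines)
  picked = []
  lines.each do |li|
    picked << li if li.start_with?("*****") .. li.start_with?("*****")
  end
  picked
end

# Three-dot form: the end test waits until the following line.
def three_dot(lines)
  picked = []
  lines.each do |li|
    picked << li if li.start_with?("*****") ... li.start_with?("*****")
  end
  picked
end

sample = ["junk\n", "*****\n", "useful1 text\n", "*****\n", "junk\n"]
p two_dot(sample)    # => ["*****\n", "*****\n"]
p three_dot(sample)  # => ["*****\n", "useful1 text\n", "*****\n"]
```

With .. each delimiter line both opens and closes the range, so nothing between them is picked; with ... the block's contents come through.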
See "When would a Ruby flip-flop be useful?" and its related questions, and "What is a flip-flop operator?" for more information and some background. (Perl programmers use .. and ... often, for just this purpose.)
One warning though: If the file has multiple blocks delimited this way, you'll get the contents of them all. The code I wrote doesn't know how to stop until the end-of-file is reached, so you'll have to figure out how to handle that situation if it could occur.
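One way to handle that, sketched under the assumption that only the first delimited block is wanted: break out of File.foreach when the closing delimiter arrives, which also stops reading the rest of the file. first_block is a hypothetical name, and reject(&:empty?) stands in for the select{ |l| l > '' } used above.

```ruby
# Hypothetical helper: returns the stripped, non-empty lines of the first block
# delimited by lines starting with five or more '*', then stops reading.
def first_block(path)
  lines = []
  closed = false
  File.foreach(path) do |li|
    next unless li[/^\*{5}/] ... li[/^\*{5}/]
    lines << li
    if lines.size > 1 && li[/^\*{5}/]   # a second delimiter closes the block
      closed = true
      break                              # File.foreach closes the file on break
    end
  end
  body = closed ? lines[1..-2] : lines.drop(1)   # unterminated block: keep the tail
  body.map(&:strip).reject(&:empty?)
end
```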
My code is supposed to read a file on the server, store its content in an Array, then go through the array elements (each element is a line) and split each line into 7 parts on the colon (:).
I wrote this code and it works 100% fine.
lines = File.readlines('/etc/passwd')
lines.each do |line|
  line = line.chomp! #I removed the \n
  line_arr = line.split(/:/)
  puts line_arr.inspect
  puts "*************"
end
I just want to know if there is a shortcut to do this since each element of the array ends with \n.
Maybe I'm a bit confused about the difference between array elements ending with \n and a string that contains \n.
The content of the file looks like this:
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
As for the output, there's no specific format, because I'm going to use this part and extend my code later. As long as I can access the 7 parts that I extracted into line_arr, I should be fine.
Thank you.
require 'etc'
[].tap do |ary|
  Etc.passwd do |u|
    # the seven /etc/passwd fields; u.change, u.uclass and u.expire exist only on some platforms
    ary << [u.name, u.passwd, u.uid, u.gid, u.gecos, u.dir, u.shell]
  end
end
Rule of thumb: never try to reimplement behavior that someone else has already written for you. Unless you are really, really, really, REALLY smart.
Actually, now that you have edited your question, I don't even see why you need those arrays in the first place and cannot just use the Etc.passwd iterator and Struct::Passwd directly.
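For the literal question, Ruby 2.4 and later can strip the newlines while reading: File.readlines accepts chomp: true. That also sidesteps a bug in the original loop: chomp! returns nil when there is nothing to remove (for example, a last line without a trailing newline), so line = line.chomp! can set line to nil. Passing -1 to split keeps trailing empty fields, so every entry has all seven parts. A sketch, with passwd_fields as a hypothetical name:

```ruby
# Hypothetical helper: returns one 7-element Array per line of a passwd-style file.
def passwd_fields(path = "/etc/passwd")
  File.readlines(path, chomp: true).map do |line|
    line.split(":", -1)   # -1 keeps trailing empty fields (e.g. an empty shell)
  end
end
```

Each element of passwd_fields is then an Array like ["root", "x", "0", "0", "root", "/root", "/bin/bash"], with no trailing "\n" to worry about.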
Is there a way to do the following Perl structure in Ruby?
while( my $line = $file->each_line() ) {
  if($line =~ /first_line/ .. /end_line_I_care_about/) {
    do_something;
    # this will do something on a line per line basis on the range of the match
  }
}
In Ruby that would read something like:
file.each_line do |line|
  if line.match(/first_line/) .. line.match(/end_line_I_care_about/)
    do_something;
    # this will only do it based on the first match not the range.
  end
end
Reading the whole file into memory is not an option, and I don't know how big the range will be.
EDIT:
Thanks for the answers. The answers I got were basically the same as the code I had in the first place. The problem I was having was this behavior of ..: "It can test the right operand and become false on the same evaluation it became true (as in awk), but it still returns true once."
"If you don't want it to test the right operand until the next evaluation, as in sed, just use three dots ("...") instead of two. In all other regards, "..." behaves just like ".." does."
I am marking as correct the answer that pointed out that '..' can turn off on the same evaluation that turned it on.
For reference the code I am using is:
file.each_line do |line|
  if line.match(/first_line/) ... line.match(/end_line_I_care_about/)
    do_something;
  end
end
Yes, Ruby supports flip-flops:
str = "aaa
ON
bbb
OFF
cccc
ON
ddd
OFF
eee"
str.each_line do |line|
  puts line if line =~ /ON/..line =~ /OFF/
  #puts line if line.match(/ON/)..line.match(/OFF/) #works too
end
Output:
ON
bbb
OFF
ON
ddd
OFF
I'm not perfectly clear on the exact semantics of the Perl code, so I'm assuming you want exactly the same behavior. Ruby does have something that looks and works similarly, or perhaps identically: a Range used as a condition works as a toggle. The code you presented works exactly as I imagine you intend.
There are a few caveats, however:
Even after you reach the end condition, lines will keep being read until you reach the end of the file. This may be a performance consideration if you expect the end condition to be near the beginning of a large file.
The start condition can be triggered multiple times, flipping the "switch" back on, doing your do_something and testing for the end condition again. This may be fine if your condition is specific enough, or if you want that behavior, but it's something to be aware of.
The end condition is tested on the same line that triggers the start condition, so a line matching both gives you true for just that one line. (Using ... instead of .. defers the end test to the following line.)
Here's an alternative:
started = false
file.each_line do |line|
  started = true if line =~ /first_line_condition/
  next unless started
  do_something()
  break if line =~ /last_line_condition/
end
That code reads each line of the file until the start condition is reached. Then it does whatever processing you like starting with that line until you reach a line that matches your end condition, at which point it breaks out of the loop, reading no more lines from the file.
This solution is the closest to your needs. It almost looks like Perl, but this is valid Ruby (although the flip-flop operator is somewhat discouraged).
The file is read line by line; it is never fully loaded into memory.
File.foreach("my_file.txt") do |line|   # also closes the file when the loop ends
  if (line =~ /first_line/) .. (line =~ /end_line_I_care_about/)
    do_something
  end
end
The parentheses are optional, but they improve readability.