I'm writing a program to parse a basic text file, and compare certain lines from it to results from a test. I'm using specific words to find the line which should be compared to the result from the test, and then passing or failing the result based upon whether or not the line matches the result (they should be exactly the same). I'm using the following general format:
File.open(file).each do |line|
if line include? "Revision"
if line==result
puts "Correct"
else
puts "Fail"
Most of the cases are just one line, so that's easy enough. But for a few of the cases, my result is 4 lines long, not just one. So, once I find the line I need, I need to check to see if the result is equal to the line of interest plus the following 3 lines after it. This is how the information is formatted in the file being read, and also how the result from the test should look:
Product Serial Number: 12058-2865
Product Part Number: 3456
Product Type: H-Type
Product Version: 2.07
Once the line of interest is found, I just need to compare the line of interest plus the next three lines to the whole result.
if line include? "Product Serial Number"
#if (#this line and the next 3) == result
puts Correct
else
puts "Fail"
How do I do this?
text =<<_
My, oh my
Product Serial Number: 12058-2865
Product Part Number: 3456
Product Type: H-Type
Product Version: 2.07
My, oh my
Product Serial Number: 12058-2865
Product Part Number: 3456
Product Type: H-Type
Product Version: 2.08
My, ho my
Product Serial Number: 12058-2865
Product Part Number: 3456
Product Type: H-Type
Product Version: 2.07
_
result =<<_.lines
Product Serial Number: 12058-2865
Product Part Number: 3456
Product Type: H-Type
Product Version: 2.07
_
#=> ["Product Serial Number: 12058-2865\n", "Product Part Number: 3456\n",
# "Product Type: H-Type\n", "Product Version: 2.07\n"]
FName = "test"
File.write(FName, text)
#=> 339
target = "Product Serial Number"
nbr_result_lines = result.size
#=> 4
lines = File.readlines(FName)
#=> ["My, oh my\n",
# "Product Serial Number: 12058-2865\n",
# ...
# "Product Version: 2.07\n"]
lines.each_with_index do |line, i|
(puts (lines[i, nbr_result_lines] == result ? "Correct" : "Fail")) if
line.match?(target)
end
# "Correct"
# "Fail"
# "Correct"
Note that the array lines[i, nbr_result_lines] will end with one or more nils when i is sufficiently large.
If the file is so large that slurping it into an array is undesirable or infeasible, one could
read the first nbr_result_lines into a buffer (using, say, IO::foreach);
compare target with the first line of the buffer, if a match, compare result with the buffer;
remove the first line of the buffer, add the next line of the file to the end of the buffer and repeat the above, continuing until the buffer has been examined after the last line of the file has been added to it.
There is exist similar answered question: reading a mulitply lines at once
I think if you have file with knowed format and have persisten series of lines, you can read multiply lines to array and iterate over array elements with needed logic.
File.foreach("large_file").each_slice(8) do |eight_lines|
# eight_lines is an array containing 8 lines.
# at this point you can iterate over these lines
end
Yep loop in loop not very good, but better rather multiply if else
Well you can have several approaches for this, the easy way is to go through each line. and try to detect the sequence like this, it should be something similar to state machine for detecting a sequence:
step = 0
File.open('sample-file.txt').each do |line|
if /^Product Serial Number.*/.match? line
puts(step = 1)
elsif /^Product Part Number.*/.match?(line) && step == 1
puts(step = 2)
elsif /^Product Type.*/.match?(line) && step == 2
puts(step = 3)
elsif /^Product Version.*/.match?(line) && step == 3
puts 'correct'
puts(step = 0)
else
puts(step = 0)
end
end
with this result:
ruby read_file.rb
1
2
3
correct
0
0
1
0
0
0
0
0
0
1
2
3
correct
0
0
and this sample file:
Product Serial Number: 12058-2865
Product Part Number: 3456
Product Type: H-Type
Product Version: 2.07
no good line
Product Serial Number: 12058-2865
BAD Part Number: 3456
Product Type: H-Type
Product Version: 2.07
no good line
no good line
no good line
Product Serial Number: 12058-2865
Product Part Number: 3456
Product Type: H-Type
Product Version: 2.07
no good line
Related
I'm using the IO.foreach loop to find a string using regular expressions. I want to append the next block (next line) to the file_names list. How can I do that?
file_names = [""]
IO.foreach("a.txt") { |block|
if block =~ /^file_names*/
dir = # get the next block
file_names.append(dir)
end
}
Actually my input looks like this:
file_names[174]:
name: "vector"
dir_index: 1
mod_time: 0x00000000
length: 0x00000000
file_names[175]:
name: "stl_bvector.h"
dir_index: 2
mod_time: 0x00000000
length: 0x00000000
I have a list of file_names, and I want to capture each of the name, dir_index, mod_time and length properties and put them into the files_names array index according to the file_names index in the text.
You can use #each_cons to get the value of the next 4 rows from the text file:
files = IO.foreach("text.txt").each_cons(5).with_object([]) do |block, o|
if block[0] =~ /file_names.*/
o << block[1..4].map{|e| e.split(':')[1]}
end
end
puts files
#=> "vector"
# 1
# 0x00000000
# 0x00000000
# "stl_bvector.h"
# 2
# 0x00000000
# 0x00000000
Keep in mind that the files array contains subarrays of 4 elements. If the : symbol occurs later in the lines, you could replace the third line of my code with this:
o << block[1..4].map{ |e| e.partition(':').last.strip}
I also added #strip in case you want to remove the whitespaces around the values. With this line changed, the actual array will look something like this:
p files
#=>[["\"vector\"", "1", "0x00000000", "0x00000000"], ["\"stl_bvector.h\"", "2", "0x00000000", "0x00000000"]]
(the values don't contain the \ escape character, that's just the way #p shows it).
Another option, if you know the pattern 1 filename, 4 values will be persistent through the entire text file and the textfile always starts with a filename, you can replace #each_cons with #each_slice and remove the regex completely, this will also speed up the entire process:
IO.foreach("text.txt").each_slice(5).with_object([]) do |block, o|
o << block[1..4].map{ |e| e.partition(':').last.strip }
end
It's actually pretty easy to carve up a series of lines based on a pattern using slice_before:
File.readlines("data.txt").slice_before(/\Afile_names/)
Now you have an array of arrays that looks like:
[
[
"file_names[174]:\n",
" name: \"vector\"\n",
" dir_index: 1\n",
" mod_time: 0x00000000\n",
" length: 0x00000000\n"
],
[
"file_names[175]:\n",
" name: \"stl_bvector.h\"\n",
" dir_index: 2\n",
" mod_time: 0x00000000\n",
" length: 0x00000000"
]
]
Each of these groups could be transformed further, like for example into a Ruby Hash using those keys.
I'm attempting to parse the following XML:
<marketstat><type id="18">
<buy><volume>33000000</volume><avg>40.53</avg><max>65.57</max><min>6.55</min><stddev>26.61</stddev><median>58.56</median><percentile>65.57</percentile></buy>
<sell><volume>494489</volume><avg>69.47</avg><max>69.47</max><min>69.47</min><stddev>0.00</stddev><median>69.47</median><percentile>69.47</percentile></sell>
<all><volume>33494489</volume><avg>40.96</avg><max>69.47</max><min>6.55</min><stddev>26.77</stddev><median>58.56</median><percentile>6.55</percentile></all>
</type><type id="19">
<buy><volume>270000</volume><avg>1707.31</avg><max>3549.38</max><min>239.74</min><stddev>1554.26</stddev><median>239.75</median><percentile>3549.34</percentile></buy>
<sell><volume>48599</volume><avg>24930.45</avg><max>29869.95</max><min>5200.00</min><stddev>9875.66</stddev><median>29869.93</median><percentile>5232.20</percentile></sell>
<all><volume>280926</volume><avg>1957.07</avg><max>10750.00</max><min>239.74</min><stddev>3352.87</stddev><median>1874.31</median><percentile>239.74</percentile></all>
</type></marketstat>
</evec_api>
The pieces of information that I want to retrieve are the minimum sell and maximum buy values, associated with the ID, found here: <sell><min>69.47</min></sell>.
I'm currently using the following to get the XML: marketData = Nokogiri::XML(open(api))
Use xpath to pull out the nodes of interest, then convert them to Floats and pick the value you want. The path to your minimum sell node is /marketstat/type/sell/min, or if you want to use shorthand, // says "anywhere in the document", so you can specify just //sell/min to get all of the minimum sell nodes and //buy/max to get all of the maximum buys.
sells = market_data.xpath('//sell/min').map(&:content).map(&:to_f)
buys = market_data.xpath('//buy/max').map(&:content).map(&:to_f)
puts sells.min, buys.max
The following will print the ID and its corresponding min/max:
marketData = Nokogiri::XML(open(api))
marketData.xpath("//type").each do |i|
puts "#{i.attr('id')}: #{i.xpath('.//max').map {|j| j.text.to_f}.max}"
puts "#{i.attr('id')}: #{i.xpath('.//min').map {|j| j.text.to_f}.min}"
end
Output:
18: 69.47
18: 6.55
19: 29869.95
19: 239.74
I need some help is some unique solution. I have a text file in which I have to replace some value based on some position. This is not a big file and will always contain 5 lines with fixed number of length in all the lines at any given time. But I have to specficaly replace soem text in some position only. Further, i can also put in some text in required position and replace that text with required value every time. I am not sure how to implement this solution. I have given the example below.
Line 1 - 00000 This Is Me 12345 trying
Line 2 - 23456 This is line 2 987654
Line 3 - This is 345678 line 3 67890
Consider the above is the file I have to use to replace some values. Like in line 1, I have to replace '00000' with '11111' and in line 2, I have to replace 'This' with 'Line' or any require four digit text. The position will always remain the same in text file.
I have a solution which works but this is for reading the file based on position and not for writing. Can someone please give a solution similarly for wrtiting aswell based on position
Solution for reading the file based on position :
def read_var file, line_nr, vbegin, vend
IO.readlines(file)[line_nr][vbegin..vend]
end
puts read_var("read_var_from_file.txt", 0, 1, 3) #line 0, beginning at 1, ending at 3
#=>308
puts read_var("read_var_from_file.txt", 1, 3, 6)
#=>8522
I have also tried this solution for writing. This works but I need it to work based on position or based on text present in the specific line.
Explored solution to wirte to file :
open(Dir.pwd + '/Files/Try.txt', 'w') { |f|
f << "Four score\n"
f << "and seven\n"
f << "years ago\n"
}
I made you a working sample anagraj.
in_file = "in.txt"
out_file = "out.txt"
=begin
=>contents of file in.txt
00000 This Is Me 12345 trying
23456 This is line 2 987654
This is 345678 line 3 67890
=end
def replace_in_file in_file, out_file, shreds
File.open(out_file,"wb") do |file|
File.read(in_file).each_line.with_index do |line, index|
shreds.each do |shred|
if shred[:index]==index
line[shred[:begin]..shred[:end]]=shred[:replace]
end
end
file << line
end
end
end
shreds = [
{index:0, begin:0, end:4, replace:"11111"},
{index:1, begin:6, end:9, replace:"Line"}
]
replace_in_file in_file, out_file, shreds
=begin
=>contents of file out.txt
11111 This Is Me 12345 trying
23456 Line is line 2 987654
This is 345678 line 3 67890
=end
I have a relatively big text file with blocks of data layered like this:
ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 0.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 0.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 0.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
(they contain more lines and then are repeated)
I would like first to extract the numerical value after TUNE X = and output these in a text file. Then I would like to extract the numerical value of LINE FREQUENCY and AMPLITUDE as a pair of values and output to a file.
My question is the following: altough I could make something moreorless working using a simple REGEXP I'm not convinced that it's the right way to do it and I would like some advices or examples of code showing how I can do that efficiently with Ruby.
Generally, (not tested)
toggle=0
File.open("file").each do |line|
if line[/TUNE/]
puts line.split("=",2)[-1].strip
end
if line[/Line Frequency/]
toggle=1
next
end
if toggle
a = line.split
puts "#{a[1]} #{a[2]}"
end
end
go through the file line by line, check for /TUNE/, then split on "=" to get last item.
Do the same for lines containing /Line Frequency/ and set the toggle flag to 1. This signify that the rest of line contains the data you want to get. Since the freq and amplitude are at fields 2 and 3, then split on the lines and get the respective positions. Generally, this is the idea. As for toggling, you might want to set toggle flag to 0 at the next block using a pattern (eg SIGNAL CASE or ANALYSIS)
file = File.open("data.dat")
#tune_x = #frequency = #amplitude = []
file.each_line do |line|
tune_x_scan = line.scan /TUNE X = (\d*\.\d*)/
data_scan = line.scan /(\d*\.\d*E[-|+]\d*)/
#tune_x << tune_x_scan[0] if tune_x_scan
#frequency << data_scan[0] if data_scan
#amplitude << data_scan[0] if data_scan
end
There are lots of ways to do it. This is a simple first pass at it:
text = 'ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 0.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 0.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 0.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 1.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 1.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 1.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 2.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 2.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 2.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
'
require 'stringio'
pretend_file = StringIO.new(text, 'r')
That gives us a StringIO object we can pretend is a file. We can read from it by lines.
I changed the numbers a bit just to make it easier to see that they are being captured in the output.
pretend_file.each_line do |li|
case
when li =~ /^TUNE.+?=\s+(.+)/
print $1.strip, "\n"
when li =~ /^\d+\s+(\S+)\s+(\S+)/
print $1, ' ', $2, "\n"
end
end
For real use you'd want to change the print statements to a file handle: fileh.print
The output looks like:
# >> 0.2561890123390808
# >> 0.2561890123391E+00 0.204316425208E-01
# >> 0.2562865535359E+00 0.288712798671E-01
# >> 1.2561890123390808
# >> 1.2561890123391E+00 0.204316425208E-01
# >> 1.2562865535359E+00 0.288712798671E-01
# >> 2.2561890123390808
# >> 2.2561890123391E+00 0.204316425208E-01
# >> 2.2562865535359E+00 0.288712798671E-01
You can read your file line by line and cut each by number of symbol, for example:
to extract tune x get symbols from
10 till 27 on line 2
to extract LINE FREQUENCY get
symbols from 3 till 22 on line 6+n
I'm parsing some text from an XML file which has sentences like
"Subtract line 4 from line 1.", "Enter the amount from line 5"
i want to replace all occurrences of line with line_
eg. Subtract line 4 from line 1 --> Subtract line_4 from line_1
Also, there are sentences like "Are the amounts on lines 4 and 8 the same?" and "Skip lines 9 through 12; go to line 13."
I want to process these sentences to become
"Are the amounts on line_4 and line_8 the same?"
and
"Skip line_9 through line_12; go to line_13."
Here's a working implementation with rspec test. You call it like this: output = LineIdentifier[input]. To test, spec file.rb after installing rspec gem.
require 'spec'
class LineIdentifier
def self.[](input)
output = input.gsub /line (\d+)/, 'line_\1'
output.gsub /lines (\d+) (and|from|through) (line )?(\d+)/, 'line_\1 \2 line_\4'
end
end
describe "LineIdentifier" do
it "should identify line mentions" do
examples = {
#Input Output
'Subtract line 4 from line 1.' => 'Subtract line_4 from line_1.',
'Enter the amount from line 5' => 'Enter the amount from line_5',
'Subtract line 4 from line 1' => 'Subtract line_4 from line_1',
}
examples.each do |input, output|
LineIdentifier[input].should == output
end
end
it "should identify line ranges" do
examples = {
#Input Output
'Are the amounts on lines 4 and 8 the same?' => 'Are the amounts on line_4 and line_8 the same?',
'Skip lines 9 through 12; go to line 13.' => 'Skip line_9 through line_12; go to line_13.',
}
examples.each do |input, output|
LineIdentifier[input].should == output
end
end
end
This works for the specific examples including the ones in the OP comments. As is often the case when using regex to do parsing, it becomes a hodge-podge of additional cases and tests to handle ever-increasing known inputs. This handles the lists of line numbers using a while loop with a non-greedy match. As written, it is simply processing an input line-by-line. To get series of line numbers across line boundaries, it would need to be changed to process it as one chunk with matching across lines.
open( ARGV[0], "r" ) do |file|
while ( line = file.gets )
# replace both "line ddd" and "lines ddd" with line_ddd
line.gsub!( /(lines?\s)(\d+)/, 'line_\2' )
# Now replace the known sequences with a non-greedy match
while line.gsub!( /(line_\d+[a-z]?,?)(\sand\s|\sthrough\s|,\s)(\d+)/, '\1\2line_\3' )
end
puts line
end
end
Sample Data: For this input:
Subtract line 4 from line 1.
Enter the amount from line 5
on lines 4 and 8 the same?
Skip lines 9 through 12; go to line 13.
... on line 10 Form 1040A, lines 7, 8a, 9a, 10, 11b, 12b, and 13
Add lines 2, 3, and 4
It produces this output:
Subtract line_4 from line_1.
Enter the amount from line_5
on line_4 and line_8 the same?
Skip line_9 through line_12; go to line_13.
... on line_10 Form 1040A, line_7, line_8a, line_9a, line_10, line_11b, line_12b, and line_13
Add line_2, line_3, and line_4
sed is your friend:
lines.sed:
#!/bin/sed -rf
s/lines? ([0-9]+)/line_\1/g
s/\b([0-9]+[a-z]?)\b/line_\1/g
lines.txt:
Subtract line 4 from line 1.
Enter the amount from line 5
Are the amounts on lines 4 and 8 the same?
Skip lines 9 through 12; go to line 13.
Enter the total of the amounts from Form 1040A, lines 7, 8a, 9a, 10, 11b, 12b, and 13
Add lines 2, 3, and 4
demo:
$ cat lines.txt | ./lines.sed
Subtract line_4 from line_1.
Enter the amount from line_5
Are the amounts on line_4 and line_8 the same?
Skip line_9 through line_12; go to line_13.
Enter the total of the amounts from Form 1040A, line_7, line_8a, line_9a, line_10, line_11b, line_12b, and line_13
Add line_2, line_3, and line_4
You can also make this into a sed one-liner if you prefer, although the file is more maintainable.