replace every occurrence of 'line 2' with line_2 with regex - ruby

I'm parsing some text from an XML file which has sentences like
"Subtract line 4 from line 1.", "Enter the amount from line 5"
I want to replace every occurrence of "line N" with "line_N",
e.g. Subtract line 4 from line 1 --> Subtract line_4 from line_1
Also, there are sentences like "Are the amounts on lines 4 and 8 the same?" and "Skip lines 9 through 12; go to line 13."
I want to process these sentences to become
"Are the amounts on line_4 and line_8 the same?"
and
"Skip line_9 through line_12; go to line_13."

Here's a working implementation with an rspec test. You call it like this: output = LineIdentifier[input]. To test, run spec file.rb after installing the rspec gem.
require 'spec'
class LineIdentifier
  def self.[](input)
    output = input.gsub(/line (\d+)/, 'line_\1')
    output.gsub(/lines (\d+) (and|from|through) (line )?(\d+)/, 'line_\1 \2 line_\4')
  end
end
describe "LineIdentifier" do
  it "should identify line mentions" do
    examples = {
      # Input => Output
      'Subtract line 4 from line 1.' => 'Subtract line_4 from line_1.',
      'Enter the amount from line 5' => 'Enter the amount from line_5',
      'Subtract line 4 from line 1' => 'Subtract line_4 from line_1',
    }
    examples.each do |input, output|
      LineIdentifier[input].should == output
    end
  end
  it "should identify line ranges" do
    examples = {
      # Input => Output
      'Are the amounts on lines 4 and 8 the same?' => 'Are the amounts on line_4 and line_8 the same?',
      'Skip lines 9 through 12; go to line 13.' => 'Skip line_9 through line_12; go to line_13.',
    }
    examples.each do |input, output|
      LineIdentifier[input].should == output
    end
  end
end

This works for the specific examples including the ones in the OP comments. As is often the case when using regex to do parsing, it becomes a hodge-podge of additional cases and tests to handle ever-increasing known inputs. This handles the lists of line numbers using a while loop with a non-greedy match. As written, it is simply processing an input line-by-line. To get series of line numbers across line boundaries, it would need to be changed to process it as one chunk with matching across lines.
open( ARGV[0], "r" ) do |file|
  while ( line = file.gets )
    # replace both "line ddd" and "lines ddd" with line_ddd
    line.gsub!( /(lines?\s)(\d+)/, 'line_\2' )
    # Now replace the known sequences with a non-greedy match
    while line.gsub!( /(line_\d+[a-z]?,?)(\sand\s|\sthrough\s|,\s)(\d+)/, '\1\2line_\3' )
    end
    puts line
  end
end
Sample Data: For this input:
Subtract line 4 from line 1.
Enter the amount from line 5
on lines 4 and 8 the same?
Skip lines 9 through 12; go to line 13.
... on line 10 Form 1040A, lines 7, 8a, 9a, 10, 11b, 12b, and 13
Add lines 2, 3, and 4
It produces this output:
Subtract line_4 from line_1.
Enter the amount from line_5
on line_4 and line_8 the same?
Skip line_9 through line_12; go to line_13.
... on line_10 Form 1040A, line_7, line_8a, line_9a, line_10, line_11b, line_12b, and line_13
Add line_2, line_3, and line_4
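The cross-line case mentioned above could be handled by treating the whole text as a single string. The following is only a minimal sketch, not part of the original answer: the method name identify_lines is hypothetical, and it relies on \s in a Ruby regex also matching newlines, so a list of numbers may continue on the next line.

```ruby
# Hypothetical helper: rewrites "line N" / "lines N, M and K" in a whole text.
def identify_lines(text)
  # Replace both "line ddd" and "lines ddd" with line_ddd.
  out = text.gsub(/lines?\s(\d+)/, 'line_\1')
  # Keep extending each list until no separator is followed by a bare number.
  # \s+ matches newlines too, so a list may wrap onto the next line.
  loop do
    break unless out.gsub!(/(line_\d+[a-z]?,?)(\s+and\s+|\s+through\s+|,\s+)(\d+)/, '\1\2line_\3')
  end
  out
end

puts identify_lines("Add lines 2, 3,\nand 4")
```

Called on the two-line input above, it prints "Add line_2, line_3," followed by "and line_4". To use it on a file, pass File.read(ARGV[0]) instead of reading line by line.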

sed is your friend:
lines.sed:
#!/bin/sed -rf
s/lines? ([0-9]+)/line_\1/g
s/\b([0-9]+[a-z]?)\b/line_\1/g
lines.txt:
Subtract line 4 from line 1.
Enter the amount from line 5
Are the amounts on lines 4 and 8 the same?
Skip lines 9 through 12; go to line 13.
Enter the total of the amounts from Form 1040A, lines 7, 8a, 9a, 10, 11b, 12b, and 13
Add lines 2, 3, and 4
demo:
$ cat lines.txt | ./lines.sed
Subtract line_4 from line_1.
Enter the amount from line_5
Are the amounts on line_4 and line_8 the same?
Skip line_9 through line_12; go to line_13.
Enter the total of the amounts from Form 1040A, line_7, line_8a, line_9a, line_10, line_11b, line_12b, and line_13
Add line_2, line_3, and line_4
You can also make this into a sed one-liner if you prefer, although the file is more maintainable.

How to access multiple lines at once in ruby

I'm writing a program to parse a basic text file, and compare certain lines from it to results from a test. I'm using specific words to find the line which should be compared to the result from the test, and then passing or failing the result based upon whether or not the line matches the result (they should be exactly the same). I'm using the following general format:
File.open(file).each do |line|
  if line.include? "Revision"
    if line == result
      puts "Correct"
    else
      puts "Fail"
    end
  end
end
Most of the cases are just one line, so that's easy enough. But for a few of the cases, my result is 4 lines long, not just one. So, once I find the line I need, I need to check to see if the result is equal to the line of interest plus the following 3 lines after it. This is how the information is formatted in the file being read, and also how the result from the test should look:
Product Serial Number: 12058-2865
Product Part Number: 3456
Product Type: H-Type
Product Version: 2.07
Once the line of interest is found, I just need to compare the line of interest plus the next three lines to the whole result.
if line.include? "Product Serial Number"
  #if (#this line and the next 3) == result
    puts "Correct"
  else
    puts "Fail"
How do I do this?
text =<<_
My, oh my
Product Serial Number: 12058-2865
Product Part Number: 3456
Product Type: H-Type
Product Version: 2.07
My, oh my
Product Serial Number: 12058-2865
Product Part Number: 3456
Product Type: H-Type
Product Version: 2.08
My, ho my
Product Serial Number: 12058-2865
Product Part Number: 3456
Product Type: H-Type
Product Version: 2.07
_
result =<<_.lines
Product Serial Number: 12058-2865
Product Part Number: 3456
Product Type: H-Type
Product Version: 2.07
_
#=> ["Product Serial Number: 12058-2865\n", "Product Part Number: 3456\n",
# "Product Type: H-Type\n", "Product Version: 2.07\n"]
FName = "test"
File.write(FName, text)
#=> 339
target = "Product Serial Number"
nbr_result_lines = result.size
#=> 4
lines = File.readlines(FName)
#=> ["My, oh my\n",
# "Product Serial Number: 12058-2865\n",
# ...
# "Product Version: 2.07\n"]
lines.each_with_index do |line, i|
  (puts (lines[i, nbr_result_lines] == result ? "Correct" : "Fail")) if
    line.match?(target)
end
# "Correct"
# "Fail"
# "Correct"
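Note that the array lines[i, nbr_result_lines] will contain fewer than nbr_result_lines elements when i is sufficiently large (it is truncated at the end of the array, not padded with nils), so it cannot equal result there.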
If the file is so large that slurping it into an array is undesirable or infeasible, one could:
1. read the first nbr_result_lines into a buffer (using, say, IO::foreach);
2. compare target with the first line of the buffer and, if it matches, compare result with the buffer;
3. remove the first line of the buffer, add the next line of the file to the end of the buffer and repeat the above, continuing until the buffer has been examined after the last line of the file has been added to it.
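The buffered approach described above could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: the method name check_blocks is hypothetical, and it returns the verdicts as an array instead of printing them.

```ruby
# Hypothetical helper: returns "Correct"/"Fail" for every line containing target,
# holding at most result.size lines of the file in memory at a time.
def check_blocks(path, target, result)
  buffer = []
  verdicts = []
  File.foreach(path) do |line|
    buffer << line
    next if buffer.size < result.size       # window not full yet
    if buffer.first.include?(target)
      verdicts << (buffer == result ? "Correct" : "Fail")
    end
    buffer.shift                            # slide the window by one line
  end
  # Whatever is left starts a window shorter than result, so it cannot match.
  buffer.each { |line| verdicts << "Fail" if line.include?(target) }
  verdicts
end
```

With the file and result from the example above, check_blocks(FName, target, result) should give the same three verdicts as the array-based version.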
There is a similar answered question: reading multiple lines at once
I think that if your file has a known format with a consistent series of lines, you can read multiple lines into an array and iterate over the array elements with whatever logic you need.
File.foreach("large_file").each_slice(8) do |eight_lines|
# eight_lines is an array containing 8 lines.
# at this point you can iterate over these lines
end
Yes, a loop inside a loop is not ideal, but it is better than multiple if/else branches.
There are several approaches to this. The easy way is to go through each line and try to detect the sequence, much like a state machine:
step = 0
File.open('sample-file.txt').each do |line|
  if /^Product Serial Number.*/.match? line
    puts(step = 1)
  elsif /^Product Part Number.*/.match?(line) && step == 1
    puts(step = 2)
  elsif /^Product Type.*/.match?(line) && step == 2
    puts(step = 3)
  elsif /^Product Version.*/.match?(line) && step == 3
    puts 'correct'
    puts(step = 0)
  else
    puts(step = 0)
  end
end
with this result:
ruby read_file.rb
1
2
3
correct
0
0
1
0
0
0
0
0
0
1
2
3
correct
0
0
and this sample file:
Product Serial Number: 12058-2865
Product Part Number: 3456
Product Type: H-Type
Product Version: 2.07
no good line
Product Serial Number: 12058-2865
BAD Part Number: 3456
Product Type: H-Type
Product Version: 2.07
no good line
no good line
no good line
Product Serial Number: 12058-2865
Product Part Number: 3456
Product Type: H-Type
Product Version: 2.07
no good line

Ruby question mark in filename

I have a little piece of ruby that creates a file containing tsv content with 2 columns, a date, and a random number.
#!/usr/bin/ruby
require 'date'
require 'set'
startDate=Date.new(2014,11,1)
endDate=Date.new(2015,9,1)
dates=File.new("/PATH_TO_FILE/dates_randoms.tsv","w+")
rands=Set.new
while startDate <= endDate do
random=rand(1000)
while rands.add?(random).nil? do
random=rand(1000)
end
dates.puts("#{startDate.to_s.gsub("-","")}\t#{random}")
startDate=startDate+1
end
Then, from another program, I read this file and create a file named after the random number:
dates_file=File.new(DATES_FILE_PATH,"r")
dates_file.each_line do |line|
parts=line.split("\t")
random=parts.at(1)
table=File.new("#{TMP_DIR}#{random}.tsv","w")
end
But when I go and check the file, I see 645?.tsv, for example.
I initially thought it was the line separator in the TSV file (the one containing the date and the random number), but everything runs on the same Unix filesystem, so it's not a DOS-to-Unix conversion problem.
Some lines from the file:
head dates_randoms.tsv
20141101 356
20141102 604
20141103 680
20141104 668
20141105 995
20141106 946
20141107 354
20141108 234
20141109 429
20141110 384
Any advice?
parts = line.split("\t")
random = parts.at(1)
line there will contain a trailing newline char. So for a line
"whatever\t1234\n"
random will contain "1234\n". That newline char then becomes a part of filename and you see it as a question mark. The simplest workaround is to do some sanitization:
random = parts.at(1).chomp
# alternatively use .strip if you want to remove whitespaces
# from beginning of the value too
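To make the effect visible, here is a small sketch using one record from the sample file (a tab between the two columns is assumed, since the reader splits on "\t"):

```ruby
line = "20141101\t356\n"                 # one record as yielded by each_line
raw = line.split("\t").at(1)             # newline is still attached
random = line.chomp.split("\t").at(1)    # chomp first, then split
filename = "#{random}.tsv"
puts raw.inspect, filename
```

Chomping the whole line before splitting has the same effect as calling chomp on the last field.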

Converting a multi line string to an array in Ruby using line breaks as delimiters

I would like to turn this string
"P07091 MMCNEFFEG
P06870 IVGGWECEQHS
SP0A8M0 VVPVADVLQGR
P01019 VIHNESTCEQ"
into an array in Ruby that looks like this:
["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS", "SP0A8M0 VVPVADVLQGR", "P01019 VIHNESTCEQ"]
Using split doesn't return what I would like because of the line breaks.
This is one way to deal with blank lines:
string.split(/\n+/)
For example,
string = "P07091 MMCNEFFEG
P06870 IVGGWECEQHS
SP0A8M0 VVPVADVLQGR
P01019 VIHNESTCEQ"
string.split(/\n+/)
#=> ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS",
# "SP0A8M0 VVPVADVLQGR", "P01019 VIHNESTCEQ"]
To accommodate files created under Windows (having line terminators \r\n) replace the regular expression with /(?:\r?\n)+/.
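As a quick illustration of the Windows-friendly pattern, here is a sketch with a made-up string that mixes \r\n and \n terminators and contains one blank line:

```ruby
# Sample input with mixed line terminators (constructed for illustration).
mixed = "P07091 MMCNEFFEG\r\n\r\nP06870 IVGGWECEQHS\nSP0A8M0 VVPVADVLQGR"
records = mixed.split(/(?:\r?\n)+/)
p records
#=> ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS", "SP0A8M0 VVPVADVLQGR"]
```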
I like to use this as a pretty generic method for handling newlines and returns:
lines = string.split(/\n+|\r+/).reject(&:empty?)
string = "P07091 MMCNEFFEG
P06870 IVGGWECEQHS
SP0A8M0 VVPVADVLQGR
P01019 VIHNESTCEQ"
Using CSV::parse
require 'csv'
CSV.parse(string).flatten
# => ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS", "SP0A8M0 VVPVADVLQGR", "P01019 VIHNESTCEQ"]
Another way using String#each_line :-
ar = []
string.each_line { |line| ar << line.strip unless line == "\n" }
ar # => ["P07091 MMCNEFFEG", "P06870 IVGGWECEQHS", "SP0A8M0 VVPVADVLQGR", "P01019 VIHNESTCEQ"]
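Building off of @Martin's answer:
lines = string.split("\n").reject(&:blank?)
That'll give you only the lines that have content. Note that blank? comes from ActiveSupport (Rails); in plain Ruby, reject { |l| l.strip.empty? } does the same job.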
Split can take a parameter in the form of the character to use to split, so you can do:
lines = string.split("\n")
I think it should be noted that in some situations, line breaks can include not only newlines (\n) but also carriage returns (\r) and that there could potentially be any combination or quantity thereof. Let's take the following string for example:
str = "Useful Line 1 ....
Useful Line 2
Useful Line 3
Useful Line 4... \n
Useful Line 5\r \n
Useful Line 6\n\r
Useful Line 7\n\r\n\r
Useful Line 8 \r\n\r\n
Useful Line 9\r\r\r Useful Line 10\n\n\n\n\nUseful Line 11 \r Useful Line 12"
To deal with all instances of \n and \r, I would do the following to replace all instances of \r with \n using gsub, and then I would combine all consecutive instances of \n using squeeze(arg):
str.gsub("\r", "\n").squeeze("\n")
which would result in :
#=>
"Useful Line 1 ....
Useful Line 2
Useful Line 3
Useful Line 4...
Useful Line 5
Useful Line 6
Useful Line 7
Useful Line 8
Useful Line 9
Useful Line 10
Useful Line 11
Useful Line 12"
...which brings me to our next issue. Sometimes those extra line breaks contain unwanted whitespace and not truly blank or empty lines. To deal with not only line breaks but also unwanted empty lines, I would add the each_line, reject, and strip method like so:
str.gsub("\r", "\n").squeeze("\n").each_line.reject{|x| x.strip == ""}.join
which would result in the desired string:
#=>
Useful Line 1 ....
Useful Line 2
Useful Line 3
Useful Line 4...
Useful Line 5
Useful Line 6
Useful Line 7
Useful Line 8
Useful Line 9
Useful Line 10
Useful Line 11
Useful Line 12
Now more specifically to the OP, we could then simply use split("\n") to finish it all off (as was already mentioned by others):
str.gsub("\r", "\n").squeeze("\n").each_line.reject{|x| x.strip == ""}.join.split("\n")
or we could skip straight to the desired array by splitting first and rejecting on the resulting array, leaving off the unnecessary join, like so:
str.gsub("\r", "\n").squeeze("\n").split("\n").reject{|x| x.strip == ""}
both of which would result in:
#=>
["Useful Line 1 ....", " Useful Line 2", "Useful Line 3", " Useful Line 4... ", "Useful Line 5", " Useful Line 6", "Useful Line 7", " Useful Line 8 ", "Useful Line 9", " Useful Line 10", "Useful Line 11 ", " Useful Line 12"]
NOTE:
You may also want to strip off leading and trailing whitespace from each line in which case we could replace .join.split("\n") with .map(&:strip) like so:
str.gsub("\r", "\n").squeeze("\n").each_line.reject{|x| x.strip == ""}.map(&:strip)
or
str.gsub("\r", "\n").squeeze("\n").split("\n").reject{|x| x.strip == ""}.map(&:strip)
which would both result in:
#=>
["Useful Line 1 ....", "Useful Line 2", "Useful Line 3", "Useful Line 4...", "Useful Line 5", "Useful Line 6", "Useful Line 7", "Useful Line 8", "Useful Line 9", "Useful Line 10", "Useful Line 11", "Useful Line 12"]
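If only the final stripped array is needed, the steps above might be condensed into one split on runs of \r and \n. This is a sketch of an alternative, not the code from this answer; clean_lines is a hypothetical name and the sample string is made up.

```ruby
# Hypothetical helper: split on any run of \r / \n, trim each piece,
# and drop pieces that were empty or whitespace-only.
def clean_lines(text)
  text.split(/[\r\n]+/).map(&:strip).reject(&:empty?)
end

sample = "Useful Line 1 \r\n\r\n  Useful Line 2\n \r\nUseful Line 3\r\r Useful Line 4"
p clean_lines(sample)
#=> ["Useful Line 1", "Useful Line 2", "Useful Line 3", "Useful Line 4"]
```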

Recovering hex data from a large log-file using Ruby and RegEx

I'm trying to filter/append lines of hex data from a large log-file, using Ruby and RegEx.
The lines of the log-file that I need look like this:
Data: 10 55 61 (+ lots more hex data)
I want to add all of the hex data, for further processing later. The regex /^\sData:(.+)/ should do the trick.
My Ruby-program looks like this:
puts "Start"
fileIn = File.read("inputfile.txt")
fileOut = File.new("outputfile.txt", "w+")
fileOut.puts "Start of regex data\n"
fileIn.each_line do
dataLine = fileIn.match(/^\sData:(.+)/).captures
fileOut.write dataLine
end
fileOut.puts "\nEOF"
fileOut.close
puts "End"
It works - sort of - but the lines in the output file are all the same, just repeating the result of the first regex match.
What am I doing wrong?
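Inside the block you call match on fileIn, which is the entire file contents, so every iteration finds the same first match. Match against the current line instead, and skip lines that don't match (match returns nil for them):
fileIn.each_line do |line|
  match = line.match(/^\sData:(.+)/)
  next unless match
  fileOut.puts match[1]
end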
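Since the log file is described as large, it may also be worth streaming it instead of reading it whole with File.read. A minimal sketch, assuming the same " Data:" line format as in the question; extract_hex is a hypothetical helper name:

```ruby
# Hypothetical helper: collects the payload of every " Data:" line in a log.
# File.foreach reads one line at a time, so the whole file is never held
# in memory (only the extracted payloads are).
def extract_hex(path)
  File.foreach(path)
      .map { |line| line[/^\sData:(.+)/, 1] }   # nil for non-matching lines
      .compact
      .map(&:strip)
end
```

The returned array can then be written out with fileOut.puts or processed further.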

How to write some value to a text file in ruby based on position

I need help with a specific problem. I have a text file in which I have to replace some values based on their position. It is not a big file and will always contain 5 lines, each with a fixed length. But I have to replace text at specific positions only. Alternatively, I could put placeholder text at the required position and replace that placeholder with the required value each time. I am not sure how to implement this. I have given an example below.
Line 1 - 00000 This Is Me 12345 trying
Line 2 - 23456 This is line 2 987654
Line 3 - This is 345678 line 3 67890
Consider the above to be the file in which I have to replace some values. For example, in line 1 I have to replace '00000' with '11111', and in line 2 I have to replace 'This' with 'Line' or any required four-character text. The positions will always remain the same in the text file.
I have a solution that works, but it only reads the file based on position and does not write. Can someone please give a similar solution for writing based on position as well?
Solution for reading the file based on position:
def read_var file, line_nr, vbegin, vend
IO.readlines(file)[line_nr][vbegin..vend]
end
puts read_var("read_var_from_file.txt", 0, 1, 3) #line 0, beginning at 1, ending at 3
#=>308
puts read_var("read_var_from_file.txt", 1, 3, 6)
#=>8522
I have also tried this solution for writing. It works, but I need it to work based on position or based on the text present in a specific line.
Explored solution to write to a file:
open(Dir.pwd + '/Files/Try.txt', 'w') { |f|
f << "Four score\n"
f << "and seven\n"
f << "years ago\n"
}
I made you a working sample anagraj.
in_file = "in.txt"
out_file = "out.txt"
=begin
=>contents of file in.txt
00000 This Is Me 12345 trying
23456 This is line 2 987654
This is 345678 line 3 67890
=end
def replace_in_file in_file, out_file, shreds
  File.open(out_file,"wb") do |file|
    File.read(in_file).each_line.with_index do |line, index|
      shreds.each do |shred|
        if shred[:index]==index
          line[shred[:begin]..shred[:end]]=shred[:replace]
        end
      end
      file << line
    end
  end
end
shreds = [
{index:0, begin:0, end:4, replace:"11111"},
{index:1, begin:6, end:9, replace:"Line"}
]
replace_in_file in_file, out_file, shreds
=begin
=>contents of file out.txt
11111 This Is Me 12345 trying
23456 Line is line 2 987654
This is 345678 line 3 67890
=end
