How to get text between two strings in ruby? - ruby

I have a text file that contains this text:
What's New in this Version
==========================
-This is the text I want to get
-It can have 1 or many lines
-These equal signs are repeated throughout the file to separate sections
Primary Category
================
I just want to get everything between ========================== and Primary Category and store that block of text in a variable. I thought the following match method would work but it gives me, NoMethodError: undefined method `match'
f = File.open(metadataPath, "r")
line = f.readlines
whatsNew = f.match(/==========================(.*)Primary Category/m).strip
Any ideas? Thanks in advance.

f is a File object, not a string, so it has no match method. You want to match against the text in the file. readlines gives you that text as an array of lines, which is awkward to run a regex against, so I prefer to read the whole file into one string:
contents = File.open(metadataPath) { |f| f.read }
contents.match(/==========================(.*)Primary Category/m)[1].strip
The last line produces your desired output:
"-This is the text I want to get\n-It can have 1 or many lines\n-These equal signs are repeated throughout the file to separate sections"

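contents = File.read(metadataPath)
contents =~ /==========================(.*)Primary Category/m
whatsNew = $1
Note that the match has to run against a single string, not the array that readlines returns. You may also want to refine the .*: it is greedy, so if "Primary Category" appears more than once it will match up to the last occurrence. Using .*? stops at the first one.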

Your problem is that readlines gives you an array of strings (one for each line), but the regular expression you're using needs a single string. You could read the file as one string:
contents = File.read(metadataPath)
puts contents[/^=+(.*?)Primary Category/m]
# => ==========================
# => -This is the text I want to get
# => -It can have 1 or many lines
# => -These equal signs are repeated throughout the file to separate sections
# =>
# => Primary Category
or you could join the lines into a single string before applying the regular expression:
lines = File.readlines(metadataPath)
puts lines.join[/^=+(.*?)Primary Category/m]
# => ==========================
# => -This is the text I want to get
# => -It can have 1 or many lines
# => -These equal signs are repeated throughout the file to separate sections
# =>
# => Primary Category
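Both versions above print the whole match, including the row of equal signs and the "Primary Category" heading, because String#[] with only a regex returns the full match. If you want just the block in between, you can pass the capture group index as a second argument. This is a minimal sketch that uses an inline string in place of the file (the sample text is taken from the question; the variable name whats_new is my own):

```ruby
# Inline stand-in for File.read(metadataPath); layout copied from the question.
contents = <<~TEXT
  What's New in this Version
  ==========================
  -This is the text I want to get
  -It can have 1 or many lines
  -These equal signs are repeated throughout the file to separate sections
  Primary Category
  ================
TEXT

# The second argument (1) asks for capture group 1 instead of the whole match.
# /m lets . match newlines, and .*? stops at the first "Primary Category" line.
whats_new = contents[/^=+\n(.*?)^Primary Category/m, 1].strip

puts whats_new
```

If the pattern does not match, String#[] returns nil, so the call to strip would raise; guard for that if the section can be missing.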

The approach I'd take is read in the lines, find out which line numbers are a series of equal signs (using Array#find_index), and group the lines into chunks from the line after the equal signs to the line before (or two lines before) the next lot of equal signs (probably using Enumerable#each_cons(2) and map). That way I don't have to modify much if the section headings change.
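Here is a rough sketch of that idea, under a few assumptions of mine: the lines are already chomped, every row of equal signs has its heading on the line directly above it, and a section body runs up to the line before the next heading. Array#find_index only returns the first hit, so the sketch uses each_index.select to collect all separator positions instead.

```ruby
# Stand-in for File.readlines(metadataPath, chomp: true).
lines = [
  "What's New in this Version",
  "==========================",
  "-This is the text I want to get",
  "-It can have 1 or many lines",
  "Primary Category",
  "================",
  "Games",
]

# Indices of every line that consists only of equal signs.
separators = lines.each_index.select { |i| lines[i].match?(/\A=+\z/) }

# Pair each separator with the next one; nil marks "runs to end of file".
sections = (separators + [nil]).each_cons(2).map do |start, stop|
  heading = lines[start - 1]
  # The line just before the next separator is the next heading,
  # so the body ends two lines before that separator.
  last = stop ? stop - 2 : lines.length - 1
  [heading, lines[(start + 1)..last]]
end.to_h

puts sections["What's New in this Version"]
```

The result is a hash keyed by heading, so nothing in the code depends on the actual heading text.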

Related

How to read the second line in a document.txt and then make a loop that reads line +1 in this document

My bot reads emails one by one from a document.txt file, and after logging in with each email it posts the comments that I have in another file.
The bot already reads the emails, but I want each account to post its own specific comment rather than repeating the same ones.
My idea is to read a specific line from the comments file.
For example, account 1 reads and posts line 1 of the comments file. How can I read the second line from the comments file?
This is the part of the code where I read the comments one by one, but I want to read, for example, line two or three:
file = 'comments.txt'
File.readlines(file).each do |line|
comment = ["#{line}"]
comment.each { |val|
comment = ["#{val}"]
}
end
File.readlines returns an array, so you can index into it or iterate over it however you like:
lines = []
File.readlines(path_to_file, chomp: true).each.with_index(1) do |line, line_number|
lines << (line_number == 2 ? 'Special line' : line)
end
Try the below.
# set the line number to read
line_number = 2 # <== Reading 2nd line
comment = IO.readlines('comments.txt')[line_number-1]
Your code is overwriting the comment variable in each iteration.
I'd write your code like this:
lines = File.readlines('comments.txt')
lines.each do |line|
# entire line
end
Inside the loop you can do whatever you need with each line. Unfortunately I don't fully understand what you want to do (one comment vs. multiple, always the same comment for specific users, etc.), but I hope this helps anyway.
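If the goal is that account 1 posts line 1, account 2 posts line 2, and so on, one option is to read both files into arrays and pair them up by position. This is only a sketch: it writes two small sample files to a temporary directory so it can run on its own, and the file contents and the name pairs are made up for illustration.

```ruby
require 'tmpdir'

pairs = Dir.mktmpdir do |dir|
  emails_path = File.join(dir, 'document.txt')
  comments_path = File.join(dir, 'comments.txt')
  # Sample data standing in for the real files.
  File.write(emails_path, "a@example.com\nb@example.com\n")
  File.write(comments_path, "First comment\nSecond comment\n")

  emails = File.readlines(emails_path, chomp: true)
  comments = File.readlines(comments_path, chomp: true)

  # zip pairs element n of emails with element n of comments.
  emails.zip(comments)
end

pairs.each { |email, comment| puts "#{email}: #{comment}" }
```

If there are more emails than comments, zip fills the missing comments with nil, so check for that before posting.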

Parse CSV file with headers when the headers are part way down the page

I have a CSV file that, as a spreadsheet, looks like this:
I want to parse the spreadsheet with the headers at row 19. Those headers won't always start at row 19, so my question is: is there a simple way to parse this spreadsheet and specify which row holds the headers, say by using the "Date" string to identify the header row?
Right now, I'm doing this:
CSV.foreach(params['logbook'].tempfile, headers: true) do |row|
Flight.create(row.to_hash)
end
but obviously that won't work because it doesn't get the right headers.
I feel like there should be a simple solution to this since it's pretty common to have CSV files in this format.
Let's first create the csv file that would be produced from the spreadsheet.
csv =<<-_
N211E,C172,2004,Cessna,172R,airplane,airplane
C-GPGT,C172,1976,Cessna,172M,airplane,airplane
N17AV,P28A,1983,Piper,PA-28-181,airplane,airplane
N4508X,P28A,1975,Piper,PA-28-181,airplane,airplane
,,,,,,
Flights Table,,,,,,
Date,AircraftID,From,To,Route,TimeOut,TimeIn
2017-07-27,N17AV,KHPN,KHPN,KHPN KHPN,17:26,18:08
2017-07-27,N17AV,KHSE,KFFA,,16:29,17:25
2017-07-27,N17AV,W41,KHPN,,21:45,23:53
_
FName = 'test.csv'
File.write(FName, csv)
#=> 395
We only want the part of the string that begins with "Date,". The easiest option is probably to first extract the relevant text. If the file is not humongous, we can slurp it into a string and then remove the unwanted bit.
str = File.read(FName).gsub(/\A.+?(?=^Date,)/m, '')
#=> "Date,AircraftID,From,To,Route,TimeOut,TimeIn\n2017-07-27,N17AV,
# KHPN,KHPN,KHPN KHPN,17:26,18:08\n2017-07-27,N17AV,KHSE,KFFA,,16:29,
# 17:25\n2017-07-27,N17AV,W41,KHPN,,21:45,23:53\n"
The regular expression that is gsub's first argument could be written in free-spacing mode, which makes it self-documenting:
/
\A # match the beginning of the string
.+? # match one or more characters, lazily
(?=^Date,) # match "Date," at the beginning of a line in a positive lookahead
/mx # multi-line and free-spacing regex definition modes
Now that we have the part of the file we want in the string str, we can use CSV::parse to create the CSV::Table object:
require 'csv'
csv_tbl = CSV.parse(str, headers: true)
#=> #<CSV::Table mode:col_or_row row_count:4>
The option :headers => true is documented in CSV::new.
Here are a couple of examples of how csv_tbl can be used.
csv_tbl.each { |row| p row }
#=> #<CSV::Row "Date":"2017-07-27" "AircraftID":"N17AV" "From":"KHPN"\
# "To":"KHPN" "Route":"KHPN KHPN" "TimeOut":"17:26" "TimeIn":"18:08">
# #<CSV::Row "Date":"2017-07-27" "AircraftID":"N17AV" "From":"KHSE"\
# "To":"KFFA" "Route":nil "TimeOut":"16:29" "TimeIn":"17:25">
# #<CSV::Row "Date":"2017-07-27" "AircraftID":"N17AV" "From":"W41"\
# "To":"KHPN" "Route":nil "TimeOut":"21:45" "TimeIn":"23:53">
(I've used the character '\' to signify that the string continues on the following line, so that readers would not have to scroll horizontally to read the lines.)
csv_tbl.each { |row| p row["From"] }
# "KHPN"
# "KHSE"
# "W41"
Readers who want to know more about how Ruby's CSV class is used may wish to read Darko Gjorgjievski's piece, "A Guide to the Ruby CSV Library, Part 1 and Part 2".
You can use the smarter_csv gem for this. Parse the file once to determine how many rows you need to skip to get to the header row you want, and then use the skip_lines option:
header_offset = <code to determine number of lines above the header>
SmarterCSV.process(params['logbook'].tempfile, skip_lines: header_offset)
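For the "determine number of lines above the header" step, one possibility is to scan for the first line that starts with "Date,". The sketch below is an assumption about how you might do it, not part of smarter_csv: it uses an inline sample instead of the uploaded file, and it feeds the remaining lines to the standard library's CSV.parse so that it runs without the gem. The same header_offset value is what you would pass as skip_lines in the call above.

```ruby
require 'csv'

# Inline stand-in for the uploaded file; rows shortened from the question.
raw = <<~DATA
  N211E,C172,2004,Cessna,172R,airplane,airplane
  ,,,,,,
  Flights Table,,,,,,
  Date,AircraftID,From,To,Route,TimeOut,TimeIn
  2017-07-27,N17AV,KHPN,KHPN,KHPN KHPN,17:26,18:08
  2017-07-27,N17AV,KHSE,KFFA,,16:29,17:25
DATA

lines = raw.lines

# Number of lines above the header row (index of the first "Date," line).
header_offset = lines.find_index { |line| line.start_with?('Date,') }
raise 'no header row found' if header_offset.nil?

# Parse only the header row and everything after it.
table = CSV.parse(lines.drop(header_offset).join, headers: true)

table.each { |row| puts row['From'] }
```

For a large file you would probably want File.foreach instead of loading every line, but the lookup itself stays the same.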
From this format, I think the easiest way is to detect an empty line that comes before the header line. That would also work under changes to the header text. In terms of CSV, that would mean a whole line that has only empty cell items.

How to read a file's content and search for a string in multiple files

I have a text file, out.txt, with more than 100 entries like these:
domain\1esrt
domain\2345p
yrtfj
tkpdp
....
....
I have to read out.txt, line-by-line and check whether the strings like "domain\1esrt" are present in any of the files under a different directory. If present delete only that string occurrence and save the file.
I know how to read a file line-by-line and also know how to grep for a string in multiple files in a directory but I'm not sure how to join those two to achieve my above requirement.
You can create an array with all the words or strings you want to find and then delete/replace:
strings_to_delete = ['aaa', 'domain\1esrt', 'delete_me']
Then read the file and use reject to drop every line that matches one of the elements in the array created before:
# read 'text.txt' and keep only the lines that are not in strings_to_delete
lines = File.readlines('text.txt').reject do |line|
  # compare without the trailing newline
  strings_to_delete.include?(line.strip)
end
Then open the file again, this time for writing, and write back all the lines that didn't match any value in the strings_to_delete array:
File.open('text.txt', 'w') do |file|
  lines.each do |element|
    file.write element
  end
end
The txt file looks like:
aaa
domain\1esrt
domain\2345p
yrtfj
tkpdp
....
....
delete_me
I don't know how well this will perform with a bigger file, but I hope it helps.
I would suggest using gsub here. It will run a regex search on the string and replace it with the second parameter. So if you only have to replace any single string, I believe you can simply run gsub on that string (including the newline) and replace it with an empty string:
new_file_text = text.gsub(/regex_string\n/, "")
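To join the two halves (reading out.txt line by line and editing several files), something along these lines might work. It is a sketch under assumptions of mine: the target files are the *.txt files directly inside one directory, and strip_entries is a hypothetical helper name. It passes each entry to gsub as a plain string rather than a regex, so the backslash in "domain\1esrt" is matched literally and needs no escaping.

```ruby
require 'tmpdir'

# Remove every occurrence of each entry in list_path from the *.txt files in dir.
def strip_entries(list_path, dir)
  needles = File.readlines(list_path, chomp: true).reject(&:empty?)
  Dir.glob(File.join(dir, '*.txt')).each do |path|
    original = File.read(path)
    # A String pattern makes gsub match the text literally.
    cleaned = needles.reduce(original) { |text, needle| text.gsub(needle, '') }
    # Only rewrite files that actually changed.
    File.write(path, cleaned) unless cleaned == original
  end
end

# Small demo in a temporary directory with made-up file contents.
result = Dir.mktmpdir do |dir|
  list = File.join(dir, 'out.txt')
  target_dir = File.join(dir, 'targets')
  Dir.mkdir(target_dir)
  File.write(list, "domain\\1esrt\nyrtfj\n")
  File.write(File.join(target_dir, 'a.txt'), "user=domain\\1esrt;keep=tkpdp\n")
  strip_entries(list, target_dir)
  File.read(File.join(target_dir, 'a.txt'))
end

puts result
```

This removes only the matching text and leaves the rest of the line alone; if you want to drop whole lines instead, filter the lines as in the first answer.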

How do I regex-match an unknown number of repeating elements?

I'm trying to write a Ruby script that replaces all rem values in a CSS file with their px equivalents. This would be an example CSS file:
body{font-size:1.6rem;margin:4rem 7rem;}
The MatchData I'd like to get would be:
# Match 1 Match 2
# 1. font-size 1. margin
# 2. 1.6 2. 4
# 3. 7
However I'm entirely clueless as to how to get multiple and different MatchData results. The RegEx that got me closest is this (you can also take a look at it at Rubular):
/([^}{;]+):\s*([0-9.]+?)rem(?=\s*;|\s*})/i
This will match single instances of value declarations (so it will properly return the desired Match 1 result), but entirely disregards multiples.
I also tried something along the lines of ([0-9.]+?rem\s*)+, but that didn't return the desired result either, and doesn't feel like I'm on the right track, as it won't return multiple result data sets.
EDIT After the suggestions in the answers, I ended up solving the problem like this:
# search for any declarations that contain rem unit values and modify blockwise
output.gsub!(/([^ }{;]+):\s*([^}{;]*[0-9.]rem+[^;]*)(?=\s*;|\s*})/i) do |match|
# search for any single rem value
string = match.gsub(/([0-9.]+)rem/i) do |value|
# convert the rem value to px by multiplying by 10 (this is not universal!)
value = sprintf('%g', Regexp.last_match[1].to_f * 10).to_s + 'px'
end
string += ';' + match # append the original match result to the replacement
match = string # overwrite the matched result
end
You can't capture a dynamic number of match groups (at least not in ruby).
Instead you could do either one of the following:
Capture the whole value and split on space
Use multilevel matching to capture first the whole key/value pair and secondly match the value. You can use blocks on the match method in ruby.
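The second option could look roughly like this. It is a sketch, not the only way to do it: it uses String#scan twice rather than match with a block, first to pull out each declaration and then to pull every rem number out of the value. The variable names are mine.

```ruby
css = 'body{font-size:1.6rem;margin:4rem 7rem;}'

# First level: each "property:value" declaration as a [property, value] pair.
declarations = css.scan(/([^{};:]+):([^{};]+)/)

# Second level: every number followed by "rem" inside the value part.
rem_values = declarations.map do |property, value|
  [property.strip, value.scan(/([0-9.]+)rem/i).flatten]
end.to_h

p rem_values
```

Because the inner scan returns an array, it does not matter whether a declaration has one, two, or four rem values.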
This regex will do the job for your example:
([^}{;]+):(?:([0-9\.]+?)rem\s?)?(?:([0-9\.]+?)rem\s?)
But with this you can't match something like: margin:4rem 7rem 9rem
This is what I've been able to do: DEMO
Regex: (?<={|;)([^:}]+)(?::)([^A-Za-z]+)
And this is what my result looks like:
# Match 1 Match 2
# 1. font-size 1. margin
# 2. 1.6 2. 4
As @koffeinfrei says, dynamic capture isn't possible in Ruby. It would be smarter to capture the whole string and remove spaces.
str = 'body{font-size:1.6rem;margin:4rem 7rem;}'
str.scan(/(?<=[{; ]).+?(?=[;}])/)
.map { |e| e.match /(?<prop>.+):(?<value>.+)/ }
#⇒ [
# [0] #<MatchData "font-size:1.6rem" prop:"font-size" value:"1.6rem">,
# [1] #<MatchData "margin:4rem 7rem" prop:"margin" value:"4rem 7rem">
# ]
The latter match can easily be adapted to return whatever you want: value.split(/\s+/) will return all the values, \d+ instead of .+ will match digits only, etc.

Ruby splitting string into different files

Here I've created an algorithm that extracts an array of the Federalist papers and splits them up saving them into separate files titled "Federalist No." followed by their respective numbers. Everything works perfectly and the files are being created beautifully; however, the only problem I run into now is that it fails to create the last output.
Maybe it's because I've been staring at this for too many hours but I'm at an impasse.
I've inserted the line puts fedSections.length to see what the output is.
Using a smaller version of the compilation of the Fed papers for testing, the terminal output is 3... it creates "Federalist No. 0" a blank document to take into account empty space and "Federalist No. 1" with the first federalist paper. No "Federalist No. 2."
Any thoughts?
# Create new string to add array l to
fedString = " "
for f in 0...l.length-1
fedString += l[f] + ''
end
# Create variables applied to new files
Federalist_No= "Federalist No."
a = "0"
b = "FEDERALIST No."
fedSections = Array.new() # New array to insert Federalist paper to
fedSections = fedString.split("FEDERALIST No.") # Split string into elements of the array at each change in Federalist paper
puts fedSections.length
# Split gives empty string, off by one
for k in 0...fedSections.length-1 # Use of loop to write each Fed paper to its own file
new_text = File.open(Federalist_No + a + ".txt", "w") # Open said file with write capabilities
new_text.puts(b+a) # Write the "FEDERALIST No" and the number from "a"
new_text.puts fedSections[k] # Write contents of string (section of paper) to a file
new_text.close()
a = a.to_i + 1 # Increment "a" by one to accomodate for consecutive papers
a = a.to_s # Restore to string
end
The error is in your for loop
for k in 0...fedSections.length-1
you actually want
for k in 0..fedSections.length-1
The three-dot range ... excludes its end value, so the loop stops one short of the last element; the two-dot range .. includes it.
But as screenmutt said, it is more idiomatic Ruby to use an each loop:
fedSections.each do |section|
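Filled out, the each version might look like the sketch below. The sample text and the temporary directory are stand-ins of mine so that it runs on its own. It also drops the empty first element that split produces, which means no blank "Federalist No. 0" file is created and the numbering starts at 1.

```ruby
require 'tmpdir'

# Tiny stand-in for the full text of the papers.
text = "FEDERALIST No. 1\nGeneral Introduction\nFEDERALIST No. 2\nConcerning Dangers\n"

# split leaves an empty string before the first marker; discard it.
sections = text.split('FEDERALIST No.').reject { |section| section.strip.empty? }

written = Dir.mktmpdir do |dir|
  # with_index(1) numbers the sections starting from 1.
  sections.each.with_index(1) do |section, number|
    path = File.join(dir, "Federalist No. #{number}.txt")
    # Put back the marker that split removed.
    File.write(path, "FEDERALIST No.#{section}")
  end
  Dir.children(dir).sort
end

puts written
```

Since each visits every element, there is no range end to get wrong.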
