I'm trying to read all the text files in a directory, iterate through each file, search each file for certain strings, and delete the lines that contain them. E.g.,
sample.txt
#Wrote for the configuration ideas
mag = some , db\m09oi, id\polki
jio = red\po9i8
[\]
#mag = denk
#jio = tea
I want to delete the lines containing mag.
Output
#Wrote for the configuration ideas
jio = red\po9i8
[\]
#jio = tea
I tried:
Dir.glob("D:\\my_folder\\*.txt") do |file_name|
value = File.read(file_name)
change = value.gsub!(/[#m]ag/, "")
File.open(file_name, "w") { |file| file.puts change }
end
But the lines aren't removed.
Any suggestions please.
It is better to read the file line by line and if a line contains mag, just omit it, and only write other lines to the new file.
— Credits to @WiktorStribiżew
File.write(file_name, File.readlines(file_name, chomp: true).reject do |line|
  line[/\bmag\b/]
end.join($/))
Here we read the file and split it into lines with IO#readlines (chomping the line endings), reject lines that contain mag as a whole word ("magistrate" won't be matched), join the remaining lines back with the line delimiter specified for this particular platform ($/, e.g. \n on Unix), and write the result back.
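For very large files, the line-by-line variant mentioned above avoids loading each whole file into memory. A minimal sketch (the temporary file name is illustrative):
# Stream each file, keep only the lines without "mag" as a whole word,
# then swap the temporary file into place.
Dir.glob("D:/my_folder/*.txt") do |file_name|  # forward slashes work in glob patterns on Windows too
  tmp_name = "#{file_name}.tmp"
  File.open(tmp_name, "w") do |out|
    File.foreach(file_name) do |line|
      out.write(line) unless line =~ /\bmag\b/
    end
  end
  File.rename(tmp_name, file_name)
end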
I have a CSV file that, as a spreadsheet, looks like this:
I want to parse the spreadsheet with the headers at row 19. Those headers won't always start at row 19, so my question is: is there a simple way to parse this spreadsheet and specify which row holds the headers, say by using the "Date" string to identify the header row?
Right now, I'm doing this:
CSV.foreach(params['logbook'].tempfile, headers: true) do |row|
Flight.create(row.to_hash)
end
but obviously that won't work because it doesn't get the right headers.
I feel like there should be a simple solution to this since it's pretty common to have CSV files in this format.
Let's first create the CSV file that would be produced from the spreadsheet.
csv = <<-_
N211E,C172,2004,Cessna,172R,airplane,airplane
C-GPGT,C172,1976,Cessna,172M,airplane,airplane
N17AV,P28A,1983,Piper,PA-28-181,airplane,airplane
N4508X,P28A,1975,Piper,PA-28-181,airplane,airplane
,,,,,,
Flights Table,,,,,,
Date,AircraftID,From,To,Route,TimeOut,TimeIn
2017-07-27,N17AV,KHPN,KHPN,KHPN KHPN,17:26,18:08
2017-07-27,N17AV,KHSE,KFFA,,16:29,17:25
2017-07-27,N17AV,W41,KHPN,,21:45,23:53
_
FName = 'test.csv'
File.write(FName, csv)
#=> 395
We only want the part of the string that begins with "Date,". The easiest option is probably to first extract the relevant text. If the file is not humongous, we can slurp it into a string and then remove the unwanted bit.
str = File.read(FName).gsub(/\A.+?(?=^Date,)/m, '')
#=> "Date,AircraftID,From,To,Route,TimeOut,TimeIn\n2017-07-27,N17AV,
# KHPN,KHPN,KHPN KHPN,17:26,18:08\n2017-07-27,N17AV,KHSE,KFFA,,16:29,
# 17:25\n2017-07-27,N17AV,W41,KHPN,,21:45,23:53\n"
The regular expression that is gsub's first argument could be written in free-spacing mode, which makes it self-documenting:
/
\A # match the beginning of the string
.+? # match any number of characters, lazily
(?=^Date,) # match "Date," at the beginning of a line in a positive lookahead
/mx # multi-line and free-spacing regex definition modes
Now that we have the part of the file we want in the string str, we can use CSV::parse to create the CSV::Table object:
csv_tbl = CSV.parse(str, headers: true)
#=> #<CSV::Table mode:col_or_row row_count:4>
The option :headers => true is documented in CSV::new.
Here are a couple of examples of how csv_tbl can be used.
csv_tbl.each { |row| p row }
#=> #<CSV::Row "Date":"2017-07-27" "AircraftID":"N17AV" "From":"KHPN"\
# "To":"KHPN" "Route":"KHPN KHPN" "TimeOut":"17:26" "TimeIn":"18:08">
# #<CSV::Row "Date":"2017-07-27" "AircraftID":"N17AV" "From":"KHSE"\
# "To":"KFFA" "Route":nil "TimeOut":"16:29" "TimeIn":"17:25">
# #<CSV::Row "Date":"2017-07-27" "AircraftID":"N17AV" "From":"W41"\
# "To":"KHPN" "Route":nil "TimeOut":"21:45" "TimeIn":"23:53">
(I've used the character '\' to signify that the string continues on the following line, so that readers would not have to scroll horizontally to read the lines.)
csv_tbl.each { |row| p row["From"] }
# "KHPN"
# "KHSE"
# "W41"
Readers who want to know more about how Ruby's CSV class is used may wish to read Darko Gjorgjievski's piece, "A Guide to the Ruby CSV Library, Part 1 and Part 2".
You can use the smarter_csv gem for this. Parse the file once to determine how many rows you need to skip to get to the header row you want, and then use the skip_lines option:
header_offset = <code to determine number of lines above the header>
SmarterCSV.process(params['logbook'].tempfile, skip_lines: header_offset)
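For instance, assuming the header row is the first line that begins with "Date," (as in the spreadsheet above), the offset could be computed like this (a rough sketch):
path = params['logbook'].tempfile.path
header_offset = File.foreach(path).find_index { |line| line.start_with?('Date,') }
SmarterCSV.process(path, skip_lines: header_offset)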
From this format, I think the easiest way is to detect an empty line that comes before the header line. That would also work under changes to the header text. In terms of CSV, that would mean a whole line that has only empty cell items.
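A rough sketch of that idea (path stands for the CSV file's path; in the sample above a "Flights Table" title row also sits between the blank line and the header, so rows with fewer than two non-empty cells are skipped as well):
require 'csv'

lines = File.readlines(path)
# A blank spreadsheet line arrives as a row of empty cells, e.g. ",,,,,,"
blank_at = lines.find_index { |line| line.split(',', -1).all? { |cell| cell.strip.empty? } }
# Drop everything up to and including the blank line, then skip any title
# rows with fewer than two non-empty cells; the next row is the header.
rest = lines.drop(blank_at + 1)
rest = rest.drop_while { |line| line.split(',', -1).count { |cell| !cell.strip.empty? } < 2 }
table = CSV.parse(rest.join, headers: true)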
I have a malformed .csv file, caused by some extra \ns. E.g.:
Name,Comment
"Peter","Good morning"
"Paul","How are you
"
"Mary","Fine"
The 2nd row ends with an unwanted, extra \n.
How can I remove all trailing \ns which are not followed by a double-quote " (assume the whole file is read into a string already)?
Don't read the whole thing into a string; use the standard CSV parser (in 1.9) to read it. If you have that in, say, pancakes.csv, then:
require 'csv'
data = CSV.open('pancakes.csv').map { |r| r.map(&:strip) }
# or
data = CSV.open('pancakes.csv').map { |r| r.map(&:chomp) }
Then you'll have this in data:
[
["Name", "Comment"],
["Peter", "Good morning"],
["Paul", "How are you"],
["Mary", "Fine"]
]
So you can get your data all clean and nicely parsed quite simply. And if you just need to clean up the CSV for some other program that can't handle embedded newlines, then you can use CSV to write it back out again.
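For example, the clean-up-and-rewrite step could look like this (file names are illustrative):
require 'csv'

cleaned = CSV.read('pancakes.csv').map { |row| row.map { |field| field.to_s.strip } }
CSV.open('pancakes_clean.csv', 'w') do |csv|
  cleaned.each { |row| csv << row }
end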
You don't need a Regexp for that. It's basically any double-quote on its own line:
csv_string.gsub("\n\"\n", "\"\n")
Why don't you just add a trailing double quote for lines which don't end in a double quote, and remove empty lines (lines that only have a double quote)?
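A rough sketch of that suggestion, assuming data rows are fully quoted as in the sample (the unquoted header line is left alone):
fixed = csv_string.each_line.map do |line|
  line = line.chomp
  next if line == '"'                                           # drop lines that are only a stray quote
  line += '"' if line.start_with?('"') && !line.end_with?('"')  # close an open quoted field
  line
end.compact.join("\n") + "\n"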
I have a text file that contains this text:
What's New in this Version
==========================
-This is the text I want to get
-It can have 1 or many lines
-These equal signs are repeated throughout the file to separate sections
Primary Category
================
I just want to get everything between ========================== and Primary Category and store that block of text in a variable. I thought the following match method would work, but it gives me NoMethodError: undefined method `match'.
f = File.open(metadataPath, "r")
line = f.readlines
whatsNew = f.match(/==========================(.*)Primary Category/m).strip
Any ideas? Thanks in advance.
f is a File object, not the file's contents - you want to match on the text in the file, which you read into line. What I prefer to do instead of reading the text into an array (which is hard to regex on) is to just read it into one string:
contents = File.open(metadataPath) { |f| f.read }
contents.match(/==========================(.*)Primary Category/m)[1].strip
The last line produces your desired output:
"-This is the text I want to get\n-It can have 1 or many lines\n-These equal signs are repeated throughout the file to separate sections"
f = File.open(metadataPath, "r")
contents = f.read
contents =~ /==========================(.*)Primary Category/m
whatsNew = $1
You may want to consider refining the .*, though, as it is greedy.
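For example, a lazy quantifier stops at the first occurrence of "Primary Category" rather than the last:
contents =~ /==========================(.*?)Primary Category/m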
Your problem is that readlines gives you an array of strings (one for each line), but the regular expression you're using needs a single string. You could read the file as one string:
contents = File.read(metadataPath)
puts contents[/^=+(.*?)Primary Category/m]
# => ==========================
# => -This is the text I want to get
# => -It can have 1 or many lines
# => -These equal signs are repeated throughout the file to separate sections
# =>
# => Primary Category
or you could join the lines into a single string before applying the regular expression:
lines = File.readlines(metadataPath)
puts lines.join[/^=+(.*?)Primary Category/m]
# => ==========================
# => -This is the text I want to get
# => -It can have 1 or many lines
# => -These equal signs are repeated throughout the file to separate sections
# =>
# => Primary Category
The approach I'd take is to read in the lines, find out which line numbers are a series of equal signs (using something like Array#find_index), and group the lines into chunks running from the line after one set of equal signs to the line before (or two lines before) the next set, probably using Enumerable#each_cons(2) and map. That way I don't have to modify much if the section headings change.
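A rough sketch of that approach (it collects all separator line numbers with each_index and select, then pairs them up with each_cons(2)):
lines = File.readlines(metadataPath)
# Line numbers of the separator lines (runs of equal signs only)
separators = lines.each_index.select { |i| lines[i] =~ /\A=+\s*\z/ }
# Take what lies between consecutive separators, stopping two lines
# before the next separator so the next section's heading is excluded.
sections = separators.each_cons(2).map { |first, second| lines[(first + 1)...(second - 1)].join }
whats_new = sections.first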
I have a string containing blank lines, which split the content across several lines:
str = "aaa\n\n\nbbb\n\nccc\ddd\n"
I want everything returned on a single line. The output should be:
aaabbbcccddd
I have tried various trim functions but haven't been able to produce that output.
What method should I use here?
The Ruby (and slightly less Perl-ish) way:
new_str = str.delete "\n"
...or if you want to do it in-place:
str.delete! "\n"
str.gsub(/\n/, '')
> str = "aaa\n\n\nbbb\n\nccc\ddd\n"
=> "aaa\n\n\nbbb\n\ncccddd\n"
> str.gsub("\n", "")
=> "aaabbbcccddd"