I'm trying to read all the text files in a directory, iterate through each file, search each file for certain strings, and delete the lines that contain them. E.g.,
sample.txt
#Wrote for the configuration ideas
mag = some , db\m09oi, id\polki
jio = red\po9i8
[\]
#mag = denk
#jio = tea
I want to delete the lines containing mag.
Output
#Wrote for the configuration ideas
jio = red\po9i8
[\]
#jio = tea
I tried:
Dir.glob("D:\\my_folder\\*.txt") do |file_name|
value = File.read(file_name)
change = value.gsub!(/[#m]ag/, "")
File.open(file_name, "w") { |file| file.puts change }
end
But the lines aren't removed.
Any suggestions please.
It is better to read the file line by line and if a line contains mag, just omit it, and only write other lines to the new file.
— Credits to @WiktorStribiżew
File.write(file_name, File.readlines(file_name, chomp: true).reject do |line|
  line[/\bmag\b/]
end.join($/))
Here we read the file and split it into lines with IO#readlines (chomping the line endings), reject lines that contain mag as a whole word ("magistrate" won't be matched), join the remaining lines back with the line delimiter specified for this particular platform ($/, e.g. \n on Unix), and write the result back.
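For very large files, the line-by-line variant mentioned above avoids loading each whole file into memory. A minimal sketch (the temporary file name is illustrative):
# Stream each file, keep only the lines without "mag" as a whole word,
# then swap the temporary file into place.
Dir.glob("D:/my_folder/*.txt") do |file_name|  # forward slashes work in glob patterns on Windows too
  tmp_name = "#{file_name}.tmp"
  File.open(tmp_name, "w") do |out|
    File.foreach(file_name) do |line|
      out.write(line) unless line =~ /\bmag\b/
    end
  end
  File.rename(tmp_name, file_name)
end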
I have a CSV file that, as a spreadsheet, looks like this:
I want to parse the spreadsheet with the headers at row 19. Those headers won't always start at row 19, so my question is: is there a simple way to parse this spreadsheet and specify which row holds the headers, say by using the "Date" string to identify the header row?
Right now, I'm doing this:
CSV.foreach(params['logbook'].tempfile, headers: true) do |row|
Flight.create(row.to_hash)
end
but obviously that won't work because it doesn't get the right headers.
I feel like there should be a simple solution to this since it's pretty common to have CSV files in this format.
Let's first create the CSV file that would be produced from the spreadsheet.
csv = <<-_
N211E,C172,2004,Cessna,172R,airplane,airplane
C-GPGT,C172,1976,Cessna,172M,airplane,airplane
N17AV,P28A,1983,Piper,PA-28-181,airplane,airplane
N4508X,P28A,1975,Piper,PA-28-181,airplane,airplane
,,,,,,
Flights Table,,,,,,
Date,AircraftID,From,To,Route,TimeOut,TimeIn
2017-07-27,N17AV,KHPN,KHPN,KHPN KHPN,17:26,18:08
2017-07-27,N17AV,KHSE,KFFA,,16:29,17:25
2017-07-27,N17AV,W41,KHPN,,21:45,23:53
_
FName = 'test.csv'
File.write(FName, csv)
#=> 395
We only want the part of the string that begins with "Date,". The easiest option is probably to first extract the relevant text. If the file is not humongous, we can slurp it into a string and then remove the unwanted bit.
str = File.read(FName).gsub(/\A.+?(?=^Date,)/m, '')
#=> "Date,AircraftID,From,To,Route,TimeOut,TimeIn\n2017-07-27,N17AV,
# KHPN,KHPN,KHPN KHPN,17:26,18:08\n2017-07-27,N17AV,KHSE,KFFA,,16:29,
# 17:25\n2017-07-27,N17AV,W41,KHPN,,21:45,23:53\n"
The regular expression that is gsub's first argument could be written in free-spacing mode, which makes it self-documenting:
/
\A # match the beginning of the string
.+? # match any number of characters, lazily
(?=^Date,) # match "Date," at the beginning of a line in a positive lookahead
/mx # multi-line and free-spacing regex definition modes
Now that we have the part of the file we want in the string str, we can use CSV::parse to create the CSV::Table object:
csv_tbl = CSV.parse(str, headers: true)
#=> #<CSV::Table mode:col_or_row row_count:4>
The option :headers => true is documented in CSV::new.
Here are a couple of examples of how csv_tbl can be used.
csv_tbl.each { |row| p row }
#=> #<CSV::Row "Date":"2017-07-27" "AircraftID":"N17AV" "From":"KHPN"\
# "To":"KHPN" "Route":"KHPN KHPN" "TimeOut":"17:26" "TimeIn":"18:08">
# #<CSV::Row "Date":"2017-07-27" "AircraftID":"N17AV" "From":"KHSE"\
# "To":"KFFA" "Route":nil "TimeOut":"16:29" "TimeIn":"17:25">
# #<CSV::Row "Date":"2017-07-27" "AircraftID":"N17AV" "From":"W41"\
# "To":"KHPN" "Route":nil "TimeOut":"21:45" "TimeIn":"23:53">
(I've used the character '\' to signify that the string continues on the following line, so that readers would not have to scroll horizontally to read the lines.)
csv_tbl.each { |row| p row["From"] }
# "KHPN"
# "KHSE"
# "W41"
Readers who want to know more about how Ruby's CSV class is used may wish to read Darko Gjorgjievski's piece, "A Guide to the Ruby CSV Library, Part 1 and Part 2".
You can use the smarter_csv gem for this. Parse the file once to determine how many rows you need to skip to get to the header row you want, and then use the skip_lines option:
header_offset = <code to determine number of lines above the header>
SmarterCSV.process(params['logbook'].tempfile, skip_lines: header_offset)
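For instance, assuming the header row is the first line that begins with "Date," (as in the spreadsheet above), the offset could be computed like this (a rough sketch):
path = params['logbook'].tempfile.path
header_offset = File.foreach(path).find_index { |line| line.start_with?('Date,') }
SmarterCSV.process(path, skip_lines: header_offset)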
From this format, I think the easiest way is to detect an empty line that comes before the header line. That would also work under changes to the header text. In terms of CSV, that would mean a whole line that has only empty cell items.
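A rough sketch of that idea (path stands for the CSV file's path; in the sample above a "Flights Table" title row also sits between the blank line and the header, so rows with fewer than two non-empty cells are skipped as well):
require 'csv'

lines = File.readlines(path)
# A blank spreadsheet line arrives as a row of empty cells, e.g. ",,,,,,"
blank_at = lines.find_index { |line| line.split(',', -1).all? { |cell| cell.strip.empty? } }
# Drop everything up to and including the blank line, then skip any title
# rows with fewer than two non-empty cells; the next row is the header.
rest = lines.drop(blank_at + 1)
rest = rest.drop_while { |line| line.split(',', -1).count { |cell| !cell.strip.empty? } < 2 }
table = CSV.parse(rest.join, headers: true)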
I have a malformed .csv file, caused by some extra \ns. E.g.:
Name,Comment
"Peter","Good morning"
"Paul","How are you
"
"Mary","Fine"
The 2nd row ends with an unwanted, extra \n.
How can I remove all trailing \ns which are not followed by a double-quote " (assume the whole file is read into a string already)?
Don't read the whole thing into a string; use the standard CSV parser (in 1.9) to read it. If you have that in, say, pancakes.csv, then:
require 'csv'
data = CSV.open('pancakes.csv').map { |r| r.map(&:strip) }
# or
data = CSV.open('pancakes.csv').map { |r| r.map(&:chomp) }
Then you'll have this in data:
[
["Name", "Comment"],
["Peter", "Good morning"],
["Paul", "How are you"],
["Mary", "Fine"]
]
So you can get your data all clean and nicely parsed quite simply. And if you just need to clean up the CSV for some other program that can't handle embedded newlines, then you can use CSV to write it back out again.
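For example, the clean-up-and-rewrite step could look like this (file names are illustrative):
require 'csv'

cleaned = CSV.read('pancakes.csv').map { |row| row.map { |field| field.to_s.strip } }
CSV.open('pancakes_clean.csv', 'w') do |csv|
  cleaned.each { |row| csv << row }
end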
You don't need a Regexp for that. It's basically any double-quote on its own line:
csv_string.gsub("\n\"\n", "\"\n")
Why don't you just add a trailing double quote for lines which don't end in a double quote, and remove empty lines (lines that only have a double quote)?
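A rough sketch of that suggestion, assuming data rows are fully quoted as in the sample (the unquoted header line is left alone):
fixed = csv_string.each_line.map do |line|
  line = line.chomp
  next if line == '"'                                           # drop lines that are only a stray quote
  line += '"' if line.start_with?('"') && !line.end_with?('"')  # close an open quoted field
  line
end.compact.join("\n") + "\n"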
I have a text file that contains this text:
What's New in this Version
==========================
-This is the text I want to get
-It can have 1 or many lines
-These equal signs are repeated throughout the file to separate sections
Primary Category
================
I just want to get everything between ========================== and Primary Category and store that block of text in a variable. I thought the following match method would work, but it gives me NoMethodError: undefined method `match'.
f = File.open(metadataPath, "r")
line = f.readlines
whatsNew = f.match(/==========================(.*)Primary Category/m).strip
Any ideas? Thanks in advance.
f is a File object, not the file's contents - you want to match on the text in the file, which you read into line. What I prefer to do instead of reading the text into an array (which is hard to regex on) is to just read it into one string:
contents = File.open(metadataPath) { |f| f.read }
contents.match(/==========================(.*)Primary Category/m)[1].strip
The last line produces your desired output:
"-This is the text I want to get\n-It can have 1 or many lines\n-These equal signs are repeated throughout the file to separate sections"
f = File.open(metadataPath, "r")
contents = f.read
contents =~ /==========================(.*)Primary Category/m
whatsNew = $1
You may want to consider refining the .*, though, as it is greedy.
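For example, a lazy quantifier stops at the first occurrence of "Primary Category" rather than the last:
contents =~ /==========================(.*?)Primary Category/m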
Your problem is that readlines gives you an array of strings (one for each line), but the regular expression you're using needs a single string. You could read the file as one string:
contents = File.read(metadataPath)
puts contents[/^=+(.*?)Primary Category/m]
# => ==========================
# => -This is the text I want to get
# => -It can have 1 or many lines
# => -These equal signs are repeated throughout the file to separate sections
# =>
# => Primary Category
or you could join the lines into a single string before applying the regular expression:
lines = File.readlines(metadataPath)
puts lines.join[/^=+(.*?)Primary Category/m]
# => ==========================
# => -This is the text I want to get
# => -It can have 1 or many lines
# => -These equal signs are repeated throughout the file to separate sections
# =>
# => Primary Category
The approach I'd take is to read in the lines, find out which line numbers are a series of equal signs (using something like Array#find_index), and group the lines into chunks running from the line after one set of equal signs to the line before (or two lines before) the next set, probably using Enumerable#each_cons(2) and map. That way I don't have to modify much if the section headings change.
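A rough sketch of that approach (it collects all separator line numbers with each_index and select, then pairs them up with each_cons(2)):
lines = File.readlines(metadataPath)
# Line numbers of the separator lines (runs of equal signs only)
separators = lines.each_index.select { |i| lines[i] =~ /\A=+\s*\z/ }
# Take what lies between consecutive separators, stopping two lines
# before the next separator so the next section's heading is excluded.
sections = separators.each_cons(2).map { |first, second| lines[(first + 1)...(second - 1)].join }
whats_new = sections.first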
I have a string containing blank lines, which split the content across several lines:
str = "aaa\n\n\nbbb\n\nccc\ddd\n"
I want everything returned on a single line. The output should be:
aaabbbcccddd
I have tried various trim functions but haven't been able to produce that output.
What method should I use here?
The Ruby (and slightly less Perl-ish) way:
new_str = str.delete "\n"
...or if you want to do it in-place:
str.delete! "\n"
str.gsub(/\n/, '')
> str = "aaa\n\n\nbbb\n\nccc\ddd\n"
=> "aaa\n\n\nbbb\n\ncccddd\n"
> str.gsub("\n", "")
=> "aaabbbcccddd"