Split a text file in Ruby

I have a text file with several different sections. Each section has a header followed by the actual data. For example:
Header1
x,y,z
x,y,z
x,y,z
Header2
a,b,c
a,b,c
a,b,c
I want to read through the file in one pass and do different things with the data present under each section. I know how to parse the data, but I'm having trouble figuring out how to code the logic for "Do this until hitting Header2, then do something else until Header3, etc."
I'm using Ruby, and I haven't come across any examples of doing this. Any suggestions?

At its simplest, you could do something like this:
# Process lines for header1
def do_header1(line)
puts line.split(/,/).join("|")
end
# Process lines for header2
def do_header2(line)
puts line.split(/,/).map{ |e| e.upcase}.join(",")
end
header1 = false
header2 = false
# Main loop
File.open("file.txt").each_line do |line|
if line.chomp == 'Header1' # or whatever match for header1
header1 = true
header2 = false
next
end
if line.chomp == 'Header2' # or whatever match for header2
header1 = false
header2 = true
next
end
# The two flags are never true at the same time, so at most one handler runs
do_header1(line) if header1
do_header2(line) if header2
end
If the number of headers grows, you can track the current header with a single integer instead of one flag per header:
header = -1
# Main loop
File.open("file.txt").each_line do |line|
if line.chomp == 'Header1' # or whatever match for header1
header = 1
next
end
if line.chomp == 'Header2' # or whatever match for header2
header = 2
next
end
do_header1(line) if header == 1
do_header2(line) if header == 2
end
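The integer can also be replaced by a lookup table that maps each header line to the code that handles its section. The following is only a sketch under assumptions: the HANDLERS table, the process_sections name and the in-memory sample are made up for illustration, and the handlers return strings instead of printing so the result is easy to inspect.

```ruby
require 'stringio'

# Hypothetical table: header line => lambda that processes one data line.
HANDLERS = {
  'Header1' => ->(line) { line.split(',').join('|') },
  'Header2' => ->(line) { line.split(',').map(&:upcase).join(',') }
}.freeze

# Reads io in a single pass and returns the processed data lines.
def process_sections(io)
  current = nil
  results = []
  io.each_line do |raw|
    line = raw.chomp
    if HANDLERS.key?(line)
      current = HANDLERS[line]        # a header line switches the active handler
    elsif current
      results << current.call(line)   # data line: hand it to the active handler
    end
  end
  results
end

sample = StringIO.new("Header1\nx,y,z\nHeader2\na,b,c\n")
puts process_sections(sample)
# prints:
# x|y|z
# A,B,C
```

Adding a new section then means adding one entry to the table rather than another if block in the loop.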

A solution using objects: for each line, ask every parser whether the line is a header that starts a section it can parse.
class Section1Parser
def section? potential_header
potential_header.chomp == 'Header1'
end
def parse line
puts "Section 1: #{line.split(/,/).join("|")}"
end
end
class Section2Parser
def section? potential_header
potential_header.chomp == 'Header2'
end
def parse line
puts "Section 2: #{line.split(/,/).join("|")}"
end
end
parsers = [Section1Parser.new, Section2Parser.new]
selected_parser = nil
File.open("c:\\temp\\file.txt").each_line do |line|
if new_parser_detected = parsers.detect {|p| p.section? line }
selected_parser = new_parser_detected
next # skip header
end
selected_parser.parse line if selected_parser
end

Would something like this work?
File.open('datafile').each_line do |s|
if s =~ /^headerpattern$/
#Start a new parsing block
...
else
#Parse data
...
end
end
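One way to fill in that outline is Enumerable#slice_before, which starts a new group every time a header line is seen. This is a sketch, not the answerer's code: the HEADER pattern and the sections helper are assumptions chosen to fit the sample data in the question.

```ruby
# Assumption: a header line is the word "Header" followed by digits.
HEADER = /\AHeader\d+\z/

# Returns an array of [header, data_lines] pairs.
def sections(text)
  text.each_line.map(&:chomp)
      .slice_before { |line| line.match?(HEADER) }
      .map { |header, *rows| [header, rows] }
end

text = "Header1\nx,y,z\nx,y,z\nHeader2\na,b,c\n"
sections(text).each do |header, rows|
  puts "#{header}: #{rows.length} data line(s)"
end
```

Note that if the input has data lines before the first header, the first pair will have a data line in the header position, so this form fits best when the file is known to start with a header.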

In my case the header was a string of the form OBJECT ObjectType ObjectNumber ObjectName:
if File.exist?("all.txt") then
object_file = File
File.open("all.txt").each_line do |line|
file_name = case
when line.match('^OBJECT Table.*')
"TAB" + line.split[2] + ".TXT"
when line.match('^OBJECT Form.*')
"FOR" + line.split[2] + ".TXT"
when line.match('^OBJECT Report.*')
"REP" + line.split[2] + ".TXT"
when line.match('^OBJECT Dataport.*')
"DAT" + line.split[2] + ".TXT"
when line.match('^OBJECT XMLPort.*')
"XML" + line.split[2] + ".TXT"
when line.match('^OBJECT Codeunit.*')
"COD" + line.split[2] + ".TXT"
when line.match("^OBJECT MenuSuite.*")
"MEN" + line.split[2] + ".TXT"
when line.match('^OBJECT Page.*')
"PAG" + line.split[2] + ".TXT"
when line.match('^OBJECT Query.*')
"QUE" + line.split[2] + ".TXT"
end
unless file_name.nil?
# File.exist? ignores a block, so the delete has to be an explicit conditional
File.delete(file_name) if File.exist?(file_name)
object_file = File.open(file_name,"w")
end
object_file.write(line)
end
end
But there are some prerequisites: the first line of the file must always contain a header, because until a header is seen object_file is still the File class rather than an open file. I'm also not closing the files (this will definitely drag my karma down to zero one day).
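Both caveats can be addressed with a small restructuring. The sketch below is an assumption-laden variant, not the original script: PREFIXES, file_name_for and split_objects are invented names, the type must match one of the listed words exactly, and lines that appear before the first header are skipped instead of raising.

```ruby
# Prefixes copied from the case statement above.
PREFIXES = {
  'Table' => 'TAB', 'Form' => 'FOR', 'Report' => 'REP',
  'Dataport' => 'DAT', 'XMLPort' => 'XML', 'Codeunit' => 'COD',
  'MenuSuite' => 'MEN', 'Page' => 'PAG', 'Query' => 'QUE'
}.freeze

# Returns the output file name for a header line, or nil for any other line.
def file_name_for(line)
  keyword, type, number = line.split
  return nil unless keyword == 'OBJECT' && PREFIXES.key?(type)
  "#{PREFIXES[type]}#{number}.TXT"
end

# Splits source into one file per object inside out_dir.
def split_objects(source, out_dir = '.')
  out = nil
  File.foreach(source) do |line|
    if (name = file_name_for(line))
      out&.close                                  # finish the previous object
      out = File.open(File.join(out_dir, name), 'w')
    end
    out&.write(line)                              # no-op until the first header
  end
ensure
  out&.close                                      # close the last file, even on error
end
```

Opening with mode 'w' already truncates an existing file, so no separate delete step is needed here.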

Related

Return the lines which have no given symbol in Ruby

I want to print the lines from a website's content page which do not start with the symbol "#".
def open(url)
Net::HTTP.get(URI.parse(url))
end
page_content = open('https://virusshare.com/hashes/VirusShare_00000.md5')
line_num=0
page_content.each_line do |lines|
line_num += 1
if lines[0] == "#"
lines.each_line do |line|
if (line_num==1)
puts line
end
end
end
end
Expected result:
2d75cc1bf8e57872781f9cd04a529256
00f538c3d410822e241486ca061a57ee
3f066dd1f1da052248aed5abc4a0c6a1
781770fda3bd3236d0ab8274577dddde
................................
It works when I try to print the lines that start with "#":
lines[0] != "#"
But it does not work in the opposite way.
You could just use a mix of reject and start_with? :
require 'net/http'
def open(url)
Net::HTTP.get(URI.parse(url))
end
page_content = open('https://virusshare.com/hashes/VirusShare_00000.md5')
puts page_content.each_line.reject{ |line| line.start_with?('#') }
It outputs:
2d75cc1bf8e57872781f9cd04a529256
00f538c3d410822e241486ca061a57ee
3f066dd1f1da052248aed5abc4a0c6a1
781770fda3bd3236d0ab8274577dddde
86b6c59aa48a69e16d3313d982791398
42914d6d213a20a2684064be5c80ffa9
10699ac57f1cf851ae144ebce42fa587
248338632580f9c018c4d8f8d9c6c408
999eb1840c209aa70a84c5cf64909e5f
12c4201fe1db96a1a1711790b52a3cf9
................................
If you just want the first line that does not start with "#":
page_content.each_line.find{ |line| !line.start_with?('#') }
Notes
page_content.each_line do |lines|
lines should be called line. It is just one line.
When you call
lines.each_line do |line|
You iterate over "each" line of just one line, so the loop isn't needed at all.
Your code could be :
require 'net/http'
def open(url)
Net::HTTP.get(URI.parse(url))
end
page_content = open('https://virusshare.com/hashes/VirusShare_00000.md5')
page_content.each_line do |line|
puts line if line[0] != "#"
end
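Another option, if a regular expression is acceptable, is Enumerable#grep_v, which keeps the elements that do not match a pattern. This is a sketch on an in-memory string rather than the downloaded page; the non_comment_lines helper and the comment lines in the sample are made up, while the two hash values are the ones quoted above.

```ruby
# Returns the lines of text that do not start with "#", without trailing newlines.
def non_comment_lines(text)
  text.each_line.grep_v(/\A#/).map(&:chomp)
end

# Stand-in for page_content; the "#" lines are invented for the example.
sample = "# header comment\n2d75cc1bf8e57872781f9cd04a529256\n# another comment\n00f538c3d410822e241486ca061a57ee\n"
puts non_comment_lines(sample)
```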

Ruby multiple conditional statement write to same file twice?

I am trying to create a find-and-replace script in Ruby, but I cannot figure out how to write to the same file twice when two conditions match (two different regex patterns are found and need to be replaced in the same file). What I get instead is two concatenated copies of the file, each containing only the changes from one condition.
Here is my code (Specifically pattern3 and pattern4):
print "What extension do you want to modify? "
ext = gets.chomp
if ext == "py"
print("Enter password: " )
pass = gets.chomp
elsif ext == "bat"
print "Enter drive letter: "
drive = gets.chomp
print "Enter IP address and Port: "
ipport = gets.chomp
end
pattern1 = /'Admin', '.+'/
pattern2 = /password='.+'/
pattern3 = /[a-zA-Z]:\\(?i:dir1\\dir2)/
pattern4 = /http:\/\/.+:\d\d\d\d\//
Dir.glob("**/*."+ext).each do |file|
data = File.read(file)
File.open(file, "w") do |f|
if data.match(pattern1)
match = data.match(pattern1)
replace = data.gsub(pattern1, '\''+pass+'\'')
f.write(replace)
puts "File " + file + " modified " + match.to_s
elsif data.match(pattern2)
match = data.match(pattern2)
replace = data.gsub(pattern2, 'password=\''+pass+'\'')
f.write(replace)
puts "File " + file + " modified " + match.to_s
end
if data.match(pattern3)
match = data.match(pattern3)
replace = data.gsub(pattern3, drive+':\dir1\dir2')
f.write(replace)
puts "File " + file + " modified " + match.to_s
if data.match(pattern4)
match = data.match(pattern4)
replace = data.gsub(pattern4, 'http://' + ipport + '/')
f.write(replace)
puts "File " + file + " modified " + match.to_s
end
end
end
end
f.truncate(0) makes things better, but it truncates the first line, since it concatenates from the end of the first modified portion of the file.
Try writing the file only once, after all substitutions:
print "What extension do you want to modify? "
ext = gets.chomp
if ext == "py"
print("Enter password: " )
pass = gets.chomp
elsif ext == "bat"
print "Enter drive letter: "
drive = gets.chomp
print "Enter IP address and Port: "
ipport = gets.chomp
end
pattern1 = /'Admin', '.+'/
pattern2 = /password='.+'/
pattern3 = /[a-zA-Z]:\\(?i:dir1\\dir2)/
pattern4 = /http:\/\/.+:\d\d\d\d\//
Dir.glob("**/*.#{ext}").each do |file|
data = File.read(file)
data.gsub!(pattern1, "'#{pass}'")
data.gsub!(pattern2, "password='#{pass}'")
data.gsub!(pattern3, "#{drive}:\\dir1\\dir2")
data.gsub!(pattern4, "http://#{ipport}/")
File.open(file, 'w') {|f| f.write(data)}
end
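The same idea, applying every substitution to one string and only then writing it, can be tried on an in-memory string first. The following is a sketch under assumptions: RULES, apply_all, the drive letter D and the address 10.0.0.5:8080 are illustrative values, not taken from the question. It uses the block form of gsub, which inserts the replacement literally, so backslashes in a Windows path need no extra escaping.

```ruby
# Hypothetical rule list: [pattern, replacement] pairs applied in order.
RULES = [
  [/[a-zA-Z]:\\(?i:dir1\\dir2)/, 'D:\dir1\dir2'],
  [%r{http://.+:\d\d\d\d/}, 'http://10.0.0.5:8080/']
].freeze

# Returns a copy of text with every rule applied, one after another.
def apply_all(text, rules)
  rules.reduce(text) do |acc, (pattern, replacement)|
    acc.gsub(pattern) { replacement }   # block form: no backreference expansion
  end
end

bat = "cd C:\\DIR1\\dir2\ncurl http://old-host:9999/status\n"
puts apply_all(bat, RULES)
# prints:
# cd D:\dir1\dir2
# curl http://10.0.0.5:8080/status
```

Because each rule works on the result of the previous one, a file that matches both patterns ends up with both changes in a single copy.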

How to split the strings to line by line

I have a text file which contains
1.6.0_43/opt/oracle/agent12c/core/12.1.0.4.0/jdk/bin/java
1.6.0_43/opt/oracle/agent12c/core/12.1.0.4.0/jdk/jre/bin/java
1.5.0/opt/itm/v6.2.2/JRE/lx8266/bin/java
1.7.0_72/u01/java/jdk1.7.0_72/jre/bin/java
1.7.0_72/u01/java/jdk1.7.0_72/bin/java
I am trying to read the file line by line and get the result line by line. Here is my Ruby code:
logfile = "/home/weblogic/javacheck.txt"
java_count = 0
log = Facter::Util::FileRead.read(logfile)
unless log.nil?
log.each_line do |line|
if line.include?('/java')
java_count += 1
val = "#{line}"
But the output is:
"1.6.0_43/opt/oracle/agent12c/core/12.1.0.4.0/jdk/bin/java\n1.6.0_43/opt/oracle/agent12c/core/12.1.0.4.0/jdk/jre/bin/java\n1.5.0/opt/itm/v6.2.2/JRE/lx8266/bin/java\n1.7.0_72/u01/java/jdk1.7.0_72/jre/bin/java\n1.7.0_72/u01/java/jdk1.7.0_72/bin/java\n1.7.0_65/u01/java/jdk1.7.0_65/jre/bin/java\n1.7.0_65/u01/java/jdk1.7.0_65/bin/java\n
How can I split this string into separate lines?
Here is the fixed code:
#!/usr/bin/env ruby
logfile = "/home/weblogic/javacheck.txt"
java_count = 0
# File.foreach yields one line at a time and closes the file when it finishes
File.foreach(logfile) do |line|
if line.include?('/java')
puts line
java_count += 1
end
end
puts java_count
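The "\n" sequences in the output shown in the question are just how Ruby displays newlines inside one string; each_line already yields one line per iteration. A small sketch on an in-memory string makes this visible (java_paths is a hypothetical helper, and the sample reuses two of the paths from the question):

```ruby
# Returns every line that mentions a java binary, without the trailing newline.
def java_paths(text)
  text.each_line.map(&:chomp).select { |line| line.include?('/java') }
end

log = "1.5.0/opt/itm/v6.2.2/JRE/lx8266/bin/java\n1.7.0_72/u01/java/jdk1.7.0_72/bin/java\n"
paths = java_paths(log)
puts paths.length   # number of matching lines
puts paths          # one path per output line
```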

Add URL field to Jekyll post yaml data

I want to programmatically append a single line of YAML front matter to all the source _posts files.
Background: I currently have a Jekyll-powered website that generates its URLs with the following plugin:
require "date"
module Jekyll
class PermalinkRewriter < Generator
safe true
priority :low
def generate(site)
# Until Jekyll allows me to use :slug, I have to resort to this
site.posts.each do |item|
day_of_year = item.date.yday.to_s
if item.date.yday < 10
day_of_year = '00'+item.date.yday.to_s
elsif item.date.yday < 100
day_of_year = '0'+item.date.yday.to_s
end
item.data['permalink'] = '/archives/' + item.date.strftime('%g') + day_of_year + '-' + item.slug + '.html'
end
end
end
end
All this does is generate a URL like /archives/12001-post-title.html, which is the two-digit year (2012), followed by the day of the year on which the post was written (in this case, January 1st).
(Aside: I like this because it essentially creates a UID for every Jekyll post, which can then be sorted by name in the generated _site folder and end up in chronological order).
However, I now want to change the URL scheme for new posts without breaking all my existing URLs when the site is generated. So I need a way to loop through my source _posts folder and append the plugin-generated URL to each post's YAML front matter.
I'm at a loss as to how to do this. I know how to append lines to a text file with Ruby, but how do I do that for all my _posts files AND have that line contain the URL that would be generated by the plugin?
Et voilà! Tested on Jekyll 2.2.0
module Jekyll
class PermalinkRewriter < Generator
safe true
priority :low
def generate(site)
#site = site
site.posts.each do |item|
if not item.data['permalink']
# complete string from 1 to 999 with leading zeros (0)
# 1 -> 001 - 20 -> 020
day_of_year = item.date.yday.to_s.rjust(3, '0')
file_name = item.date.strftime('%g') + day_of_year + '-' + item.slug + '.html'
permalink = '/archives/' + file_name
item.data['permalink'] = permalink
# get the post's data
post_path = item.containing_dir(site.source, "")
full_path = File.join(post_path, item.name)
file_yaml = item.data.to_yaml
file_content = item.content
# rewrites the original post with the new Yaml Front Matter and content
# writes 'in stone !'
File.open(full_path, 'w') do |f|
f.puts file_yaml
f.puts '---'
f.puts "\n\n"
f.puts file_content
end
Jekyll.logger.info "Added permalink " + permalink + " to post " + item.name
end
end
end
end
end
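The permalink format itself can be checked outside Jekyll with plain Date objects. This is a sketch only: permalink_for is a hypothetical helper that mirrors the string building in the plugin, and the date and slug are made-up inputs.

```ruby
require 'date'

# Builds the same '/archives/YYDDD-slug.html' string as the plugin above.
def permalink_for(date, slug)
  day_of_year = date.yday.to_s.rjust(3, '0')   # 64 -> "064"
  "/archives/#{date.strftime('%g')}#{day_of_year}-#{slug}.html"
end

puts permalink_for(Date.new(2014, 3, 5), 'post-title')
# prints: /archives/14064-post-title.html
```

Two details are worth knowing when reusing this. strftime('%j') produces the zero-padded day of the year directly, so it could stand in for the rjust call. Also, '%g' is the ISO 8601 week-based year rather than the calendar year ('%y'), and the two differ for a few days around New Year: Date.new(2012, 1, 1).strftime('%g') is "11", because that day belongs to the last ISO week of 2011.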

How can I do readline arguments completion?

I have a Ruby app which uses readline with command completion.
After the first string (the command) was typed, I would like to be able to complete its arguments. The arguments list should be based on the chosen command.
Does someone have a quick example?
These are the commands:
COMMANDS = [
'collect', 'watch'
].sort
COLLECT = [
'stuff', 'otherstuff'
].sort
comp = proc do |s|
COMMANDS.grep( /^#{Regexp.escape(s)}/ )
end
Readline.completion_proc = comp
Each time I press TAB, the proc block is executed and a command from the COMMANDS array is matched.
After one of the commands was fully matched I would like to start searching for the argument only in the COLLECT array.
Since your question popped up first every time I looked for something like this, I want to share my code for anyone else who needs it.
#!/usr/bin/env ruby
require 'readline'
module Shell
PROMPT = "shell> "
module InputCompletor
CORE_WORDS = %w[ clear help show exit export]
SHOW_ARGS = %w[ list user ]
EXPORT_ARGS = %w[ file ]
COMPLETION_PROC = proc { |input|
case input
when /^(show|export) (.*)/
command = $1
receiver = $2
DISPATCH_TABLE[command].call(receiver)
when /^(h|s|c|e.*)/
receiver = $1
CORE_WORDS.grep(/^#{Regexp.quote(receiver)}/)
when /^\s*$/
puts
CORE_WORDS.map{|d| print "#{d}\t"}
puts
print PROMPT
end
}
def self.show(receiver)
if SHOW_ARGS.grep(/^#{Regexp.quote(receiver)}/).length > 1
SHOW_ARGS.grep(/^#{Regexp.quote(receiver)}/)
elsif SHOW_ARGS.grep(/^#{Regexp.quote(receiver)}/).length == 1
"show #{SHOW_ARGS.grep(/^#{Regexp.quote(receiver)}/).join}"
end
end
def self.export(receiver)
if EXPORT_ARGS.grep(/^#{Regexp.quote(receiver)}/).length > 1
EXPORT_ARGS.grep(/^#{Regexp.quote(receiver)}/)
elsif EXPORT_ARGS.grep(/^#{Regexp.quote(receiver)}/).length == 1
"export #{EXPORT_ARGS.grep(/^#{Regexp.quote(receiver)}/).join}"
end
end
DISPATCH_TABLE = {'show' => lambda {|x| show(x)} ,
'export' => lambda {|x| export(x)}}
end
class CLI
Readline.completion_append_character = ' '
Readline.completer_word_break_characters = "\x00"
Readline.completion_proc = Shell::InputCompletor::COMPLETION_PROC
def initialize
while line = Readline.readline("#{PROMPT}",true)
Readline::HISTORY.pop if /^\s*$/ =~ line
begin
if Readline::HISTORY[-2] == line
Readline::HISTORY.pop
end
rescue IndexError
end
cmd = line.chomp
case cmd
when /^clear/
system('clear')
when /^help/
puts 'no help here'
when /show list/
puts 'nothing to show'
when /^show\s$/
puts 'missing args'
when /export file/
puts 'nothing to export'
when /^export\s$/
puts 'missing args'
when /^exit/
exit
end
end
end
end
end
Shell::CLI.new
After thinking a while, the solution was very simple:
comp = proc do |s|
if Readline.line_buffer =~ /^.* /
COLLECT.grep( /^#{Regexp.escape(s)}/ )
else
COMMANDS.grep( /^#{Regexp.escape(s)}/ )
end
end
Now I just need to turn it into something more flexible/usable.
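One possible way to make it more flexible is to keep the arguments of every command in a table and choose the candidate list from the first word of the line. The sketch below is an assumption, not the poster's final code: ARGUMENTS and candidates are invented names, and the Readline wiring is left as a comment because it needs an interactive terminal.

```ruby
# Hypothetical table: command => arguments it accepts.
ARGUMENTS = {
  'collect' => %w[otherstuff stuff],
  'watch'   => []
}.freeze

# line_buffer is the whole input so far; word is the fragment being completed.
def candidates(line_buffer, word)
  # first_word stays nil until a command has been typed and followed by whitespace
  first_word = line_buffer[/\A\s*(\S+)\s/, 1]
  pool = first_word ? ARGUMENTS.fetch(first_word, []) : ARGUMENTS.keys
  pool.grep(/\A#{Regexp.escape(word)}/)
end

# Wiring into Readline (interactive use only):
#   require 'readline'
#   Readline.completion_proc = proc { |word| candidates(Readline.line_buffer, word) }

p candidates('col', 'col')
p candidates('collect st', 'st')
```

Supporting a new command then only requires a new entry in the table.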
