loop, array and file problem in ruby - ruby

I'm currently learning Ruby, and here is what I'm trying to do:
A script which opens a file, makes a substitution, then compares every line to every other line to see whether it appears more than once.
I first tried to work directly with the string, but I couldn't find out how, so I put every line in an array and compared every element.
But I ran into a first problem.
Here is my code:
#!/usr/bin/env ruby
DOC = "test.txt"
FIND = /,,^M/
SEP = "\n"
#make substitution
puts File.read(DOC).gsub(FIND, SEP)
#open the file and put every line in an array
openFile = File.open(DOC, "r+")
fileArray = openFile.each { |line| line.split(SEP) }
#print fileArray #--> give the name of the object
#Cross the array to compare every items to every others
fileArray.each do |items|
items.chomp
fileArray.each do |items2|
items2.chomp
#Delete if the item already exist
if items = items2
fileArray.delete(items2)
end
end
end
#Save the result in a new file
File.open("test2.txt", "w") do |f|
f.puts fileArray
end
At the end, I only have the name of the array object "fileArray". I printed the object after the split and got the same thing, so I guess the problem comes from there. A little help would be appreciated (if you know how to do this without an array, working just with the lines in the file, that answer is appreciated too).
Thanks !
EDIT:
So, here's my code now
#!/usr/bin/env ruby
DOC = "test.txt"
FIND = /,,^M/
SEP = "\n"
#make substitution
File.read(DOC).gsub(FIND, SEP)
unique_lines = File.readlines(DOC).uniq
#Save the result in a new file
File.open('test2.txt', 'w') { |f| f.puts(unique_lines) }
Can't figure out how to chomp this.

Deleting duplicate lines in a file:
no_duplicate_lines = File.readlines("filename").uniq
No need to write so much code :)

Modify your code like this:
f.puts fileArray.join("\n")
Alternate way:
unique_lines = File.readlines("filename").uniq
# puts(unique_lines.join("\n")) # Uncomment this line and see if the variable holds the result you want...
File.open('filename', 'w') {|f| f.puts(unique_lines.join("\n"))}
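Putting the suggestions above together, the whole task might look roughly like the sketch below. The helper name dedupe_lines and the sample file contents are made up for illustration, and the pattern /,,\r/ is only an assumption about what ,,^M in the question was meant to match.

```ruby
require "tmpdir"

# Hypothetical helper: replaces every match of +find+ with a newline,
# then writes each distinct line once, keeping the first occurrence.
def dedupe_lines(source, target, find)
  text = File.read(source).gsub(find, "\n")
  # lines(chomp: true) strips the line endings, so "a" and "a\n" count as equal
  unique = text.lines(chomp: true).uniq
  File.write(target, unique.join("\n") + "\n")
  unique
end

# Try it on a throwaway file; Dir.mktmpdir returns the value of its block.
unique_lines, written = Dir.mktmpdir do |dir|
  source = File.join(dir, "test.txt")
  target = File.join(dir, "test2.txt")
  File.write(source, "a,,\rb,,\ra\nb\nc")
  result = dedupe_lines(source, target, /,,\r/) # assumed pattern
  [result, File.read(target)]
end
```

The chomp: true keyword needs Ruby 2.4 or later; on older versions, map(&:chomp) after lines does the same job.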

Just a couple of points about the original code:
fileArray = openFile.each { |line| line.split(SEP) }
sets fileArray to a File object, which I suspect wasn't your intention. File#each (the # notation is the Ruby convention for naming an instance method of the given class) executes your block once for each line and then returns the file itself; it is also available under the synonym each_line. By default a line ends at the input record separator $/, which is a newline.
If you were looking to build an array of lines, then you could just have written
fileArray = openFile.readlines
and if you wanted those lines to be chomped (often a good idea) then that could be achieved by something like
fileArray = openFile.readlines.collect { |line| line.chomp }
or even (since File mixes in Enumerable)
fileArray = openFile.collect { |line| line.chomp }
And one other tiny thing: Ruby tests for equality with ==, = is only for assignment, so
if items = items2
will set items to items2 (and will always evaluate as true)
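The points above can be checked with a short experiment. This is only a sketch: the temporary file and the variable names are invented for the demonstration, and readlines with chomp: true assumes Ruby 2.4 or later.

```ruby
require "tempfile"

# Throwaway input file for the demonstration
sample = Tempfile.new("lines")
sample.write("one\ntwo\none\n")
sample.close

# File#each runs the block for every line but returns the File itself,
# so the result of the block (the split) is thrown away.
returned = File.open(sample.path) { |f| f.each { |line| line.split("\n") } }

# readlines builds the array of lines; chomp: true drops the line endings.
lines = File.readlines(sample.path, chomp: true)

# == compares two values, = assigns one to the other.
first, second = "one", "two"
same = (first == second)   # comparison: false here
first = second             # assignment: first is now "two"

sample.unlink
```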

Related

Puts arrays in file using ruby

This is a part of my file:
project(':facebook-android-sdk-3-6-0').projectDir = new File('facebook-android-sdk-3-6-0/facebook-android-sdk-3.6.0/facebook')
project(':Forecast-master').projectDir = new File('forecast-master/Forecast-master/Forecast')
project(':headerListView').projectDir = new File('headerlistview/headerListView')
project(':library-sliding-menu').projectDir = new File('library-sliding-menu/library-sliding-menu')
I need to extract the names of the libs. This is my ruby function:
def GetArray
out_file = File.new("./out.txt", "w")
File.foreach("./file.txt") do |line|
l=line.scan(/project\(\'\:(.*)\'\).projectDir/)
File.open(out_file, "w") do |f|
l.each do |ch|
f.write("#{ch}\n")
end
end
puts "#{l} "
end
end
My function returns this:
[]
[["CoverFlowLibrary"]]
[["Android-RSS-Reader-Library-master"]]
[["library"]]
[["facebook-android-sdk-3-6-0"]]
[["Forecast-master"]]
My problem is that I find nothing in out_file. How can I write to a file? Otherwise, I only need to get the name of the libs in the file.
Meditate on this:
"project(':facebook-android-sdk-3-6-0').projectDir'".scan(/project\(\'\:(.*)\'\).projectDir/)
# => [["facebook-android-sdk-3-6-0"]]
When scan sees the capturing (...), it will create a sub-array. That's not what you want. The knee-jerk reaction is to flatten the resulting array of arrays but that's really just a band-aid on the code because you chose the wrong method.
Instead consider this:
"project(':facebook-android-sdk-3-6-0').projectDir'"[/':([^']+)'/, 1]
# => "facebook-android-sdk-3-6-0"
This is using String's [] method to apply a regular expression with a capture and return that captured text. No sub-arrays are created.
scan is powerful and definitely has its place, but not for this sort of "find one thing" parsing.
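To see the difference side by side, both approaches can be run against one of the lines from the question. This is just an illustration; the variable names are made up.

```ruby
line = "project(':Forecast-master').projectDir = new File('forecast-master/Forecast-master/Forecast')"

# scan with a capture group returns an array of arrays, one per match
scanned = line.scan(/project\(':(.*)'\)\.projectDir/)

# String#[] with a regex and a capture index returns the captured text itself
sliced = line[/':([^']+)'/, 1]

# When nothing matches, String#[] returns nil rather than an empty array
missing = "// no project here"[/':([^']+)'/, 1]
```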
Regarding your code, I'd do something like this untested code:
def get_array
File.open('./out.txt', 'w') do |out_file|
File.foreach('./file.txt') do |line|
l = line[/':([^']+)'/, 1]
out_file.puts l
puts l
end
end
end
Methods in Ruby are NOT camelCase, they're snake_case. Constants, like classes, start with a capital letter and are CamelCase. Don't go all Java on us, especially if you want to write code for a living. So GetArray should be get_array. Also, don't start methods with "get_", and don't call it array; Use to_a to be idiomatic.
When building a regular expression start simple and do your best to keep it simple. It's a maintainability thing and helps to reduce insanity. /':([^']+)'/ is a lot easier to read and understand, and accomplishes the same as your much-too-complex pattern. Regular expression engines are greedy and lazy and want to do as little work as possible, which is sometimes totally evil, but once you understand what they're doing it's possible to write very small/succinct patterns to accomplish big things.
Breaking it down, it basically says "find the first ': then start capturing text until the next '", which is what you're looking for. project( can be ignored, as can ).projectDir.
And actually,
/':([^']+)'/
could really be written
/:([^']+)'/
but I felt generous and looked for the leading ' too.
The problem is that you're opening the file twice: once in:
out_file = File.new("./out.txt", "w")
and then once for each line:
File.open(out_file, "w") do |f| ...
Try this instead:
def GetArray
File.open("./out.txt", "w") do |f|
File.foreach("./file.txt") do |line|
l=line.scan(/project\(\'\:(.*)\'\).projectDir/)
l.each do |ch|
f.write("#{ch}\n")
end # l.each
end # File.foreach
end # File.open
end # def GetArray
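Combining both answers, a version that opens the output file once and uses the simpler pattern might look like the following sketch. The method name extract_project_names, its parameters and the sample input are assumptions made for the example, not part of the original code.

```ruby
require "tmpdir"

# Hypothetical helper: copies every project name found in +source+
# into +target+, one per line, and returns the names.
def extract_project_names(source, target)
  names = []
  # Open the output once, outside the loop, so earlier lines are not overwritten
  File.open(target, "w") do |out|
    File.foreach(source) do |line|
      name = line[/':([^']+)'/, 1]
      next unless name # skip lines without a ':name' entry
      out.puts name
      names << name
    end
  end
  names
end

names, written = Dir.mktmpdir do |dir|
  source = File.join(dir, "file.txt")
  target = File.join(dir, "out.txt")
  File.write(source, <<~TEXT)
    // settings
    project(':Forecast-master').projectDir = new File('forecast-master/Forecast-master/Forecast')
    project(':headerListView').projectDir = new File('headerlistview/headerListView')
  TEXT
  [extract_project_names(source, target), File.read(target)]
end
```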

Read file into an array, excluding the commented-out lines

I'm almost a Ruby noob (I know just enough Ruby to write a basic .erb template or Puppet custom facts). My requirements look fairly simple, but I can't get my head around them.
I'm trying to write a .erb template that reads a file (with space-delimited lines) into an array and then handles each array element according to the requirements. This is what I've got so far:
$fname = "webURI.txt"
def myArray()
#if defined? $fname
if File.exist?($fname) and File.file?($fname)
IO.readlines($fname)
end
end
myArray.each_index do |i|
myLine = myArray[i].split(' ')
puts myLine[0] +"\t=> "+ myLine.last
end
This works just fine, except (for obvious reasons) for lines that are commented out or blank. I also want to make sure that, when split on spaces, a line doesn't have more than two fields in it. Take a file like this:
# This is a COMMENT
#
# Puppet dashboard
puppet controller-all-local.example.co.uk:80
# Nagios monitoring
nagios controller-all-local.example.co.uk::80/nagios
tac talend-tac-local.example.co.uk:8080/org.talend.admin
mng console talend-mca-local.example.co.uk:8080/amc # Line with three fields
So, basically these two things I'd like to achieve:
Read the lines into array, stripping off everything after the first #
Split each element and print a message if the number of fields is more than two
Any help would be greatly appreciated. Cheers!!
Update 25/02
Thanks guy for your help!!
The blank? thing doesn't work at all; it throws this error, and I fail to understand why:
undefined method `blank?' for "\n":String (NoMethodError)
The array myArray that I get actually looks something like this (using p instead of puts):
["\n", "puppet controller-all-local.example.co.uk:80\n", "\n", "\n", "nagios controller-all-local.example.co.uk::80/nagios\n", ..... \n"]
Hence, I had to do this to get around this prob:
$fname = "webURI.txt"
def myArray()
if File.exist?($fname) and File.file?($fname)
IO.readlines($fname).map { |arr| arr.gsub(/#.*/,'') }
end
end
# remove blank lines
SSS = myArray.reject { |ln| ln.start_with?("\n") }
SSS.each_index do |i|
myLine = SSS[i].split(' ')
if myLine.length > 2
puts "Too many arguments!!!"
elsif myLine.length == 1
puts "page"+ i.to_s + "\t=> " + myLine[0]
else
puts myLine[0] +"\t=> "+ myLine.last
end
end
You are most welcome to improve the code. cheers!!
goodArray = myArray.reject do |line|
line.start_with?('#') || line.split(' ').length > 2
end
This rejects every line that either starts with # or splits into more than two elements, returning an array of only the good items.
Edit:
For your inline commenting you can then do
goodArray.map do |line|
line.gsub(/#.*/, '')
end
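The two requirements can also be handled in a single pass. The sketch below is one possible arrangement, not the only one: parse_uri_lines is a hypothetical helper, and it strips comments before counting fields so that a trailing comment does not push a line over the two-field limit. It relies on String#empty? from core Ruby, since blank? comes from ActiveSupport rather than Ruby itself, which would explain the NoMethodError above.

```ruby
# Hypothetical helper: returns [entries, rejected], where entries holds the
# split fields of every usable line and rejected holds lines with too many fields.
def parse_uri_lines(lines)
  entries = []
  rejected = []
  lines.each do |raw|
    # Drop everything from the first # onwards, then surrounding whitespace
    line = raw.sub(/#.*/, "").strip
    next if line.empty? # comment-only or blank line
    fields = line.split
    if fields.length > 2
      rejected << line
    else
      entries << fields
    end
  end
  [entries, rejected]
end

sample = <<~TEXT
  # This is a COMMENT
  #

  puppet controller-all-local.example.co.uk:80
  nagios controller-all-local.example.co.uk::80/nagios
  mng console talend-mca-local.example.co.uk:8080/amc # Line with three fields
TEXT

entries, rejected = parse_uri_lines(sample.lines)
rejected.each { |line| puts "Too many arguments: #{line}" }
entries.each { |fields| puts "#{fields.first}\t=> #{fields.last}" }
```

One caveat of this approach: a URI that itself contains a # fragment would be cut short, so it only suits files where # always starts a comment.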

string compare in Ruby not working

I'm not sure what is going on here. I need to run a string compare on two variables that are Times. One variable is a Time object obtained with the .mtime function. The other variable is taken from a sqlite3 database. I would like to compare these times to see if the modification date is different from the last modification date that is listed in the sqlite3 table. Here is the code for that part.
When I print out the values they look identical... So why is the compare not working?
def scanfile
dir = Dir.new(Dir.pwd)
dir.each do |file|
fileName = File.basename(file)
modTime = File.mtime(file).strftime("%F %T")
lastMod = nil
exists = checkDB(fileName)
if exists == true
$db.execute("SELECT DateMod FROM Files WHERE fileName = '#{fileName}'") do |mod|
lastMod = mod
end
mod = modTime.to_s
printf("modTime: #{mod} lastMod: #{lastMod}\n")
if mod != lastMod
$db.execute("UPDATE Files SET NumMods=NumMods+1 WHERE fileName = '#{fileName}'")
$db.execute("UPDATE Files SET DateMod='#{modTime}' WHERE fileName = '#{fileName}'")
print "#{fileName} updated...\n"
end
else
if fileName != "." && fileName != ".."
inputRecord(fileName, modTime, modTime, 1)
print "#{fileName} inserted...\n"
end
end
end
end
When you use execute (or this version), you'll be working with the result set's rows as arrays of strings, not simple strings. So in here:
$db.execute(...) do |mod|
#...
end
your mod will be an array which contains a single string. The problem is that you're saving that array and treating it like a string; with sufficient to_s calls and similar mangling, you'll get a string that looks right to both you and Ruby and everything will work.
You should unpack the row array yourself:
$db.execute(...) do |mod|
lastMod = mod.first
# ------------^^^^^
end
Well I figured it out. I am not sure why this fixed it because I thought I was basically doing this but using:
if !modTime.to_s.eql? lastMod.to_s
worked out well..
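The mismatch can be reproduced without a database. In the sketch below, row is a hand-written stand-in for a one-column result row, not real sqlite3 output. It also hints at why the to_s workaround may have appeared to help: on Ruby 1.8, Array#to_s joined the elements, whereas on 1.9 and later it returns the inspect form, so unpacking the row is the more dependable fix.

```ruby
# Stand-in for a single-column row as yielded by execute: an array of strings
row = ["2014-02-25 10:15:00"]
mod_time = Time.new(2014, 2, 25, 10, 15, 0).strftime("%F %T")

# Comparing a String with an Array is never equal, even if they print alike
row_equal = (mod_time == row)

# Unpacking the first column gives a String that compares as expected
value_equal = (mod_time == row.first)

# On Ruby 1.9+, Array#to_s includes brackets and quotes, so it does not match
to_s_equal = (mod_time == row.to_s)
```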

Moving to the last line of a file while reading it in a .each loop in Ruby

I'm reading in a file that can contain any number of rows.
I only need to save the first 1000 or so, passed in as a variable "recordsToParse".
If I reach my 1000 line limit, or whatever it's set to, I need to save the trailer information in the file to verify total_records, total_amount etc.
So, I need a way to move my "pointer" from where ever I am in the file to the last line and run through one more time.
file = File.open(file_name)
parsed_file_rows = Array.new
successful_records, failed_records = 0, 0
file_contract = file_contract['File_Contract']
output_file_name = file_name.gsub(/.TXT|.txt|.dat|.DAT/,'')
file.each do |line|
line.chomp!
line_contract = determine_row_type(file_contract, line)
if line_contract
parsed_row = parse_row_by_contract(line_contract, line)
parsed_file_rows << parsed_row
successful_records += 1
else
failed_records += 1
end
if (not recordsToParse.nil?)
if successful_records > recordsToParse
# move "pointer" to last line and go through loop once more
#break;
end
end
end
store_parsed_file('Parsed_File',"#{output_file_name}_parsed", parsed_file_rows)
[successful_records, failed_records]
Use IO#seek with IO::SEEK_END to move your pointer to the end of the file, then scan backwards to the last line break before it; whatever follows that break is your last line.
This is only worthwhile if the file is very big. Otherwise just follow file.each do |line| through to the last line, or read the last line directly with IO.readlines("file.txt")[-1].
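As a rough sketch of the seek approach, the helper below reads backwards from the end in fixed-size chunks until it has seen a line break before the final line. The name last_line, the chunk size and the sample data are assumptions for the example; it also assumes lines end in "\n" or "\r\n".

```ruby
require "tempfile"

# Hypothetical helper: returns the last line of +path+ without its line ending,
# or nil for an empty file, reading only as much of the tail as needed.
def last_line(path, chunk_size = 4096)
  File.open(path, "rb") do |f|
    size = f.size
    return nil if size.zero?
    buffer = "".b
    read_so_far = 0
    while read_so_far < size
      step = [chunk_size, size - read_so_far].min
      read_so_far += step
      # Position the pointer read_so_far bytes before the end of the file
      f.seek(-read_so_far, IO::SEEK_END)
      buffer = f.read(step) + buffer
      # Ignore the file's trailing line ending, then look for an earlier break
      body = buffer.chomp
      idx = body.rindex("\n")
      return body[(idx + 1)..-1] if idx
    end
    buffer.chomp # the whole file is a single line
  end
end

sample = Tempfile.new("records")
sample.write("row 1\nrow 2\nTRAILER total_records=2\n")
sample.close
trailer = last_line(sample.path)
trailer_small_chunks = last_line(sample.path, 4) # force several backward reads
sample.unlink
```

The file is opened in binary mode so that seek offsets are plain byte counts; the returned string is therefore binary-encoded and may need force_encoding if the data is not ASCII.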
The easiest solution is to use a gem like elif
require "elif"
lastline = Elif.open("bigfile.txt") { |f| f.gets }
It reads your lastline in a snap undoubtedly using seek.
This is one of those times I'd take advantage of the OS's head and tail commands using something like:
head = `head -#{ records_to_parse } #{ file_to_read }`.split("\n")
tail = `tail -1 #{ file_to_read }`
head.pop if (head[-1] == tail.chomp)
Then write it all out using something like:
File.open(new_file_to_write, 'w') do |fo|
fo.puts head, tail
end
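Where shelling out is not an option, a pure-Ruby pass can collect the same two pieces. This is a sketch under simple assumptions: head_and_trailer and its parameters are invented names, the trailer is taken to be the final line, and the whole file is still read once, so it trades speed on huge files for portability.

```ruby
require "tempfile"

# Hypothetical helper: returns [head, trailer], where head holds at most
# +limit+ leading lines and trailer is the last line of the file.
def head_and_trailer(path, limit)
  head = []
  trailer = nil
  count = 0
  File.foreach(path, chomp: true) do |line|
    count += 1
    head << line if count <= limit
    trailer = line # keeps being overwritten until the final line
  end
  # For a short file the trailer was also collected into head; drop it there
  head.pop if count <= limit && !head.empty?
  [head, trailer]
end

sample = Tempfile.new("records")
sample.write("row 1\nrow 2\nrow 3\nTRAILER total_records=3\n")
sample.close
head, trailer = head_and_trailer(sample.path, 2)
short_head, short_trailer = head_and_trailer(sample.path, 100)
sample.unlink
```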

Ruby: Deleting last iterated item?

What I'm doing is this: I have one file as input and another as output. I choose a random line in the input, put it in the output, and then delete it from the input.
Now, I've iterated over the file and am on the line I want. I've copied it to the output file. Is there a way to delete it? I'm doing something like this:
for i in 0..number_of_lines_to_remove
line = rand(lines_in_file-2) + 1 #not removing the first line
counter = 0
IO.foreach("input.csv") { |current_line|
if counter == line
File.open("output.csv", "a") { |output|
output.write(current_line)
}
end
counter += 1
}
end
So, I have current_line, but I'm not sure how to remove it from the source file.
Array#delete_at might do. Given an index, it removes the object at that index, returning the object.
input.csv:
one,1
two,2
three,3
Program:
#!/usr/bin/ruby1.8
lines = File.readlines('/tmp/input.csv')
File.open('/tmp/output.csv', 'a') do |file|
file.write(lines.delete_at(rand(lines.size)))
end
p lines # ["two,2\n", "three,3\n"]
output.csv:
one,1
Here is a randomline class. You create a new randomline object by passing it an input file name and an output file name. You can then call the deleterandom method on that object and pass it a number of lines to delete.
The data is stored internally in arrays as well as being put to file. Currently output is in append mode so if you use the same file it will just add to the end, you could change the a to a w if you wanted to start the file fresh each time.
class Randomline
  attr_accessor :inputarray, :outputarray
  def initialize(filein, fileout)
    @filename = filein
    @filein = File.open(filein, "r+")
    @fileoutput = File.open(fileout, "a")
    @inputarray = []
    @outputarray = []
    readin()
  end
  def readin()
    @filein.each do |line|
      @inputarray << line
    end
  end
  def deleterandom(numtodelete)
    numtodelete.times do |num|
      random = rand(@inputarray.size)
      @outputarray << inputarray[random]
      @fileoutput.puts inputarray[random]
      @inputarray.delete_at(random)
    end
    @filein = File.open(@filename, "w")
    @inputarray.each do |line|
      @filein.puts line
    end
  end
end
Here is an example of it being used:
a = Randomline.new("testin.csv","testout.csv")
a.deleterandom(3)
You have to re-write the source-file after removing a line otherwise the modifications won't stick as they're performed on a copy of the data.
Keep in mind that any operation which modifies a file in-place runs the risk of truncating the file if there's an error of any sort and the operation cannot complete.
It would be safer to use some kind of simple database for this kind of thing as libraries like SQLite and BDB have methods for ensuring data integrity, but if that's not an option, you just need to be careful when writing the new input file.
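One common way to reduce the truncation risk is to write the remaining lines to a temporary file next to the original and then rename it into place, so a failure while writing leaves the source untouched. The sketch below assumes both files sit on the same filesystem; move_random_line, its parameters and the ".tmp" suffix are made-up names for the example.

```ruby
require "tmpdir"

# Hypothetical helper: moves one random line (never the first, header line)
# from +source+ to the end of +target+ and returns the moved line.
def move_random_line(source, target, rng = Random.new)
  lines = File.readlines(source)
  return nil if lines.size < 2 # nothing but the header is left
  picked = lines.delete_at(rng.rand(1...lines.size))
  File.open(target, "a") { |out| out.write(picked) }
  # Write the survivors to a sibling file first, then swap it in
  temp = "#{source}.tmp"
  File.write(temp, lines.join)
  File.rename(temp, source)
  picked
end

remaining, moved = Dir.mktmpdir do |dir|
  source = File.join(dir, "input.csv")
  target = File.join(dir, "output.csv")
  File.write(source, "name,value\none,1\ntwo,2\nthree,3\n")
  move_random_line(source, target)
  [File.readlines(source), File.readlines(target)]
end
```

This does not make the append to the output file and the rewrite of the input a single atomic step, so a crash between the two could still leave the line in both files.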

Resources