Reading from disk multiple times possibly causes a bottleneck - Ruby

I'm trying to find out where the bottleneck of a Ruby script is. I suspect it is because the script parses thousands of lines and, for each one, checks whether a certain file is present on disk and, if so, reads its contents.
def sectionsearch(brand, season, video)
mytab.trs.each_with_index do |row, i|
# ...some code goes here...
f = "modeldesc/" + brand.downcase + "/" + modelcode + ".html"
if File.exist?(f)
modeldesc = File.read(f)
else
modeldesc = ""
end
# ...more code here...
end
end
Given that there are no more than 30 different "modelcode" files for thousands of records, I was looking for a different approach that reads the whole content of the folder before the each loop (since it is not going to change during execution).
Is this approach going to speed up my script? Also, is this the right way to implement it?

I would probably use a hash with a default block, which checks for the file the first time an unknown key is requested:
def sectionsearch(brand, season, video)
modeldescrs = Hash.new do |cache, model|
if File.exist?(model)
cache[model] = File.read(model)
else
cache[model] = ''
end
end
mytab.trs.each_with_index do |row, i|
# ...some code goes here...
f = "modeldesc/" + brand.downcase + "/" + modelcode + ".html"
puts modeldescrs[f]
# ...more code here...
end
end
Then just access modeldescrs[f] when you need it (the puts above is only an example). If the key doesn't exist, the block is executed: it looks the file up and populates the cache. See http://www.ruby-doc.org/core-2.0/Hash.html for more on the block form of Hash.new.
You could also make modeldescrs an instance variable if it needs to outlive the method call.
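The question also asks about reading the whole folder before the loop. A minimal sketch of that alternative follows; the helper name preload_model_descriptions and the sample file are made up for illustration, and it assumes every description lives in one directory as <modelcode>.html:

```ruby
require 'tmpdir'

# Hypothetical helper: read every .html file in dir once, keyed by model code.
def preload_model_descriptions(dir)
  descriptions = Hash.new('') # unknown model codes fall back to ""
  Dir.glob(File.join(dir, '*.html')).each do |path|
    descriptions[File.basename(path, '.html')] = File.read(path)
  end
  descriptions
end

# Demo in a throwaway directory so the sketch is self-contained.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'abc123.html'), '<p>Model ABC123</p>')
  descriptions = preload_model_descriptions(dir)
  puts descriptions['abc123']         # content read once, up front
  puts descriptions['missing'].empty? # no file, so the default "" is used
end
```

With only about 30 files either approach should touch the disk at most 30 times; the lazy hash only reads the files that are actually requested, while preloading reads them all whether or not they are used.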

Related

How to write a multidimensional array into separate files and then read from them in order in Ruby

I want to take a file, read the file into my program and split it into characters, split the resulting character array into a multidimensional array of 5,000 characters each, then write each separate array into a file found in the same location.
I have taken a file, read it, and created the multidimensional array. Now I want to write each separate single dimension array into separate files.
The file is obtained via user input. I then created a chain of helper methods: the first stores the file in an array, the second breaks that array down into a multidimensional array, and the last one, at the end of the chain, is currently set up to make a new directory in which I will put these files.
require 'Benchmark/ips'
file = "C:\\test.php"
class String
def file_to_array
file = self
return_file = File.open(file) do |line|
line.each_char.to_a
end
return return_file
end
def file_write
file_to_write = self
if Dir.exist?("I:\\file_to_array")
File.open("I:/file_to_array/tmp.txt", "w") { |file| file.write(file_to_write) }
read_file = File.read("I:/file_to_array/tmp.txt")
else
Dir.mkdir("I:\\file_to_array")
end
end
end
class Array
def file_divider
file_to_divide = self
file_to_separate = []
count = 0
while count != file_to_divide.length
separator = count % 5000
if separator == 0
start = count - 5000
stop = count
file_to_separate << file_to_divide[start..stop]
end
count = count + 1
end
return file_to_separate
end
def file_write
file_to_write = self
if Dir.exist?("I:\\file_to_array")
File.open("I:/file_to_array/tmp.txt", "w") { |file| file.write(file_to_write) }
else
Dir.mkdir("I:\\file_to_array")
end
end
end
Benchmark.ips do |result|
result.report { file.file_to_array.file_divider.file_write }
end
Test.php
<?php
echo "hello world"
?>
This untested code is where I'd start to split text into chunks and save it:
str = "I want to take a file"
str_array = str.scan(/.{1,10}/) # => ["I want to ", "take a fil", "e"]
str_array.each.with_index(1) do |str_chunk, i|
File.write("output#{i}", str_chunk)
end
This doesn't honor word-boundaries.
Reading a separate input file is easy; you can use read if you KNOW the input will never exceed the available memory and you don't care about performance.
Thinking about it further, if you want to read a text file and break its contents into smaller files, then read it in chunks:
input = File.open('input.txt', 'r')
i = 1
until input.eof? do
chunk = input.read(10)
File.write("output#{i}", chunk)
i += 1
end
input.close
Or even better because it automatically closes the input:
File.open('input.txt', 'r') do |input|
i = 1
until input.eof? do
chunk = input.read(10)
File.write("output#{i}", chunk)
i += 1
end
end
Those are not tested but they look about right.
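If the text is already in memory, as in the question's file_to_array step, each_slice can replace the manual counter loop. This is only a sketch: the helper names split_into_chunks and write_chunks, the chunk size of 10 and the output file names are assumptions chosen to keep the demo small (the question uses 5,000 characters).

```ruby
require 'tmpdir'

# Split text into strings of at most size characters each.
def split_into_chunks(text, size)
  text.each_char.each_slice(size).map(&:join)
end

# Write each chunk to its own numbered file in dir; returns the paths written.
def write_chunks(chunks, dir)
  chunks.each_with_index.map do |chunk, index|
    path = File.join(dir, "output#{index + 1}.txt")
    File.write(path, chunk)
    path
  end
end

Dir.mktmpdir do |dir|
  chunks = split_into_chunks('abcdefghij' * 3 + 'xyz', 10)
  paths = write_chunks(chunks, dir)
  puts paths.length            # one file per chunk
  puts File.read(paths.last)   # the final, shorter chunk
end
```

Unlike the question's file_divider, each_slice also keeps the final chunk when the length is not an exact multiple of the chunk size.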
Use standard File API and Serialisation.
File.write('path/to/yourfile.txt', Marshal.dump([1, 2, 3]))
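A round trip might look like the sketch below. The helper names save_object and load_object and the file name are made up for illustration. Marshal produces binary data, so the sketch uses File.binwrite and File.binread instead of File.write, which avoids newline translation on Windows.

```ruby
require 'tmpdir'

# Hypothetical helpers: serialise any marshal-able object to a file and back.
def save_object(path, object)
  File.binwrite(path, Marshal.dump(object))
end

def load_object(path)
  Marshal.load(File.binread(path))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'chunks.bin')
  chunks = [%w[a b c], %w[d e f]]
  save_object(path, chunks)
  puts load_object(path) == chunks # the nested array survives the round trip
end
```

Only load Marshal data you wrote yourself; Marshal.load on untrusted input is unsafe.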

Outputting hash to text file

I am having trouble outputting the contents of my hash to a file. The program is one that manages a list of student records, including their StudentID, first name, last name, Major, and catalog year. Once the user is finished adding records, it is then added to the hash.
Everything in the program works perfectly, except that when I run the quit_program function, it doesn't save the contents to the file. Additionally, I am not getting any errors. Any ideas?
Could it be failing because it has trouble converting the text in my hash, which is alphanumeric, into the text file?
def quit_program()
puts "Save Changes? y/n"
@changes = gets().chomp
if @changes=="y"
@fh=File.open(@file_name, 'w')
@this_string=""
@sDB.each do |key, store_account_data| #line 50
puts "#{key}: #{store_account_data.join(',')}"
end
end
@fh.puts(@this_string)
@fh.close()
end
You're not writing anything to the file. The string @this_string is empty. You should do
@sDB.each do |key, store_account_data|
@fh.puts "#{key}: #{store_account_data.join(',')}"
end
it doesn't save the contents in the file.
The following is NOT how you write to a file:
puts "#{key}: #{store_account_data.join(',')}"
That is how you write to your terminal/console window.
And this code:
@this_string=""
@fh.puts(@this_string)
writes a blank string to the file.
Here is how you write to a file:
class Student
def initialize(sDB, file_name)
@sDB = sDB
@file_name = file_name
end
def save_changes()
puts "Save Changes? y/n"
user_answer = gets().chomp
if user_answer == "y"
File.open(@file_name, 'w') do |f|
@sDB.each do |key, store_account_data| #line 50
f.puts "#{key}: #{store_account_data.join(',')}"
end
end
end
end
end
could it potentially not be working because it is having trouble with
converting the text in my hash, which is alphanumeric, into the text
file?
No. Here is a concrete example you can try:
data = {
"John" => ['a', 123, 'b', 456],
"Sally" => ['c', 789, 'b', 0]
}
File.open('data.txt', 'w') do |f|
data.each do |name, data|
f.puts "#{name}: #{data.join(',')}"
end
end
$ ruby myprog.rb
$ cat data.txt
John: a,123,b,456
Sally: c,789,b,0
Also, ruby indenting is 2 spaces--not 0 spaces or 3 spaces, or anything else.
The answer is given in the error message: undefined local variable or method 'sDB'. (Which you have since removed from your question making the edited version next to impossible to answer.) Where and when is sDB defined in your program? You are evidently attempting to quit before initializing it.
In any case it is not a good thing to be accessing instance variables directly inside other methods. You should use accessor (getter and setter) methods instead. That would have probably prevented this situation from biting you in the first place.
def sdb
@sDB ||= Hash.new
end
def sdb=( key, value )
sdb
@sDB[ key ] = value
end
. . .
You are not properly writing to a file even if @sDB is defined. See Ruby - Printing a hash to txt file for an example.
Your question is missing essential input data, so there's no way to test our suggested changes.
Here's untested code I'd work from:
def quit_program
puts "Save Changes? y/n"
if gets.chomp.downcase == 'y'
File.write(
@file_name,
@s_db.map{ |k, v| "#{ k }: #{ v.join(',') }" }.join("\n")
)
end
end
Note:
@sDB isn't a proper variable name in Ruby. We use snake_case, not camelCase, for variables and method names. ItsAMatterOfReadability. Follow the convention or suffer the wrath of your team members the first time you have a code review.
Don't add empty parentheses to method names (quit_program()) or calls (gets()) unless it's essential to tell the difference between a variable and a method invocation. You should also never name a variable the same as a method because it'll confuse everyone working on the code, so that should never be a consideration.
Don't create a variable (@changes) you use once and throw away, unless what you're doing is so complex you need to break down the operation into smaller chunks. And, if you're doing that, it'd be a really good candidate for refactoring into separate methods, so again, just don't.
When comparing user-input to something you expect, fold the case of their input to match what you expect. (gets.chomp.downcase == 'y'). It really irritates users to enter "y" and fail because you insisted on "Y".
While you can use File.open to create or write to a file, there's less visual noise to use File.write. open is great when you need to use various options for the mode but for plain text write is sufficient.
The whole block used for writing looks like it can be cleaned up to a single map and join, which coerces the data into an array of strings then into a single string.
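To make the map-and-join step concrete, here is a small sketch. The record data, the helper name format_records and the file name are invented for illustration, since the question doesn't show the real contents of the hash:

```ruby
require 'tmpdir'

# Turn { id => [fields...] } into one "id: a,b,c" line per record.
def format_records(records)
  records.map { |id, fields| "#{id}: #{fields.join(',')}" }.join("\n")
end

# Hypothetical sample records: StudentID => [first, last, major, catalog year].
s_db = {
  '1001' => ['Ann', 'Lee', 'Math', 2014],
  '1002' => ['Bob', 'Ray', 'Art', 2015]
}

Dir.mktmpdir do |dir|
  path = File.join(dir, 'students.txt')
  File.write(path, format_records(s_db))
  puts File.read(path)
end
```

Array#join calls to_s on each element, so mixing strings and integers in a record is not a problem.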

Read file into an array excluding the commented-out lines

I'm almost a Ruby-nOOb (I have just enough knowledge of Ruby to write some basic .erb templates or Puppet custom facts). It looks like my requirements are fairly simple, but I can't get my head around them.
Trying to write a .erb template, where it reads a file (with space delimited lines) to an array and then handle each array element according to the requirements. This is what I got so far:
fname = "webURI.txt"
def myArray()
#if defined? $fname
if File.exist?($fname) and File.file?($fname)
IO.readlines($fname)
end
end
myArray.each_index do |i|
myLine = myArray[i].split(' ')
puts myLine[0] +"\t=> "+ myLine.last
end
Which works just fine, except (for obvious reasons) for lines that are commented out or blank. I also want to make sure that, when split on spaces, a line doesn't have more than two fields in it. The file looks like this:
# This is a COMMENT
#
# Puppet dashboard
puppet controller-all-local.example.co.uk:80
# Nagios monitoring
nagios controller-all-local.example.co.uk::80/nagios
tac talend-tac-local.example.co.uk:8080/org.talend.admin
mng console talend-mca-local.example.co.uk:8080/amc # Line with three fields
So, basically these two things I'd like to achieve:
Read the lines into array, stripping off everything after the first #
Split each element and print a message if the number of fields is more than two
Any help would be greatly appreciated. Cheers!!
Update 25/02
Thanks guys for your help!!
The blank? thing doesn't work for me at all; it throws this error, and I don't quite understand why:
undefined method `blank?' for "\n":String (NoMethodError)
The array myArray that I get is actually something like this (using p instead of puts):
["\n", "puppet controller-all-local.example.co.uk:80\n", "\n", "\n", "nagios controller-all-local.example.co.uk::80/nagios\n", ..... \n"]
Hence, I had to do this to get around this prob:
$fname = "webURI.txt"
def myArray()
if File.exist?($fname) and File.file?($fname)
IO.readlines($fname).map { |arr| arr.gsub(/#.*/,'') }
end
end
# remove blank lines
SSS = myArray.reject { |ln| ln.start_with?("\n") }
SSS.each_index do |i|
myLine = SSS[i].split(' ')
if myLine.length > 2
puts "Too many arguments!!!"
elsif myLine.length == 1
puts "page"+ i.to_s + "\t=> " + myLine[0]
else
puts myLine[0] +"\t=> "+ myLine.last
end
end
You are most welcome to improve the code. cheers!!
goodArray = myArray.reject do |line|
line.start_with?('#') || line.split(' ').length > 2
end
This rejects every line that either starts with # or splits into more than two elements, leaving you an array of only the good items.
Edit:
For your inline commenting you can then do
goodArray.map do |line|
line.gsub(/#.*/, '')
end
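Putting both requirements together, one possible sketch strips comments first, skips lines that end up empty, and reports lines with more than two fields instead of silently dropping them. The method name parse_uri_lines and the warning format are assumptions, not part of the original code:

```ruby
# Returns [entries, warnings]: entries are arrays of one or two fields,
# warnings describe lines that had more than two fields.
def parse_uri_lines(lines)
  entries = []
  warnings = []
  lines.each_with_index do |line, index|
    fields = line.sub(/#.*/, '').split # drop everything after the first #
    next if fields.empty?              # blank or comment-only line
    if fields.length > 2
      warnings << "line #{index + 1}: too many fields (#{fields.length})"
    else
      entries << fields
    end
  end
  [entries, warnings]
end

sample = [
  "# This is a COMMENT\n",
  "\n",
  "puppet controller-all-local.example.co.uk:80\n",
  "mng console talend-mca-local.example.co.uk:8080/amc # Line with three fields\n"
]
entries, warnings = parse_uri_lines(sample)
entries.each { |fields| puts "#{fields.first}\t=> #{fields.last}" }
warnings.each { |message| puts message }
```

Stripping the comment before counting fields matters: counting first would flag any valid two-field line that carries a trailing comment.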

Increment part of a string in Ruby

I have a method in a Ruby script that is attempting to rename files before they are saved. It looks like this:
def increment (path)
if path[-3,2] == "_#"
print " Incremented file with that name already exists, renaming\n"
count = path[-1].chr.to_i + 1
return path.chop! << count.to_s
else
print " A file with that name already exists, renaming\n"
return path << "_#1"
end
end
Say you have 3 files with the same name being saved to a directory, we'll say the file is called example.mp3. The idea is that the first will be saved as example.mp3 (since it won't be caught by if File.exists?("#{file_path}.mp3") elsewhere in the script), the second will be saved as example_#1.mp3 (since it is caught by the else part of the above method) and the third as example_#2.mp3 (since it is caught by the if part of the above method).
The problem I have is twofold.
1) if path[-3,2] == "_#" won't work for files with an integer of more than one digit (example_#11.mp3 for example) since the character placement will be wrong (you'd need it to be path[-4,2] but then that doesn't cope with 3 digit numbers etc).
2) I'm never reaching problem 1) since the method doesn't reliably catch file names. At the moment it will rename the first to example_#1.mp3 but the second gets renamed to the same thing (causing it to overwrite the previously saved file).
This is possibly too vague for Stack Overflow but I can't find anything that addresses the issue of incrementing a certain part of a string.
Thanks in advance!
Edit/update:
Wayne's method below seems to work on its own but not when included as part of the whole script: it can increment a file once (from example.mp3 to example_#1.mp3) but doesn't cope with taking example_#1.mp3 and incrementing it to example_#2.mp3. To provide a little more context, when the script finds a file to save it currently passes the name to Wayne's method like this:
file_name = increment(image_name)
File.open("images/#{file_name}.jpeg", 'w') do |output|
open(image_url) do |input|
output << input.read
end
end
I've edited Wayne's script a little so now it looks like this:
def increment (name)
name = name.gsub(/\s{2,}|(http:\/\/)|(www.)/i, '')
if File.exists?("images/#{name}.jpeg")
_, filename, count, extension = *name.match(/(\A.*?)(?:_#(\d+))?(\.[^.]*)?\Z/)
count = (count || '0').to_i + 1
"#{name}_##{count}#{extension}"
else
return name
end
end
Where am I going wrong? Again, thanks in advance.
A regular expression will git 'er done:
#!/usr/bin/ruby1.8
def increment(path)
_, filename, count, extension = *path.match(/(\A.*?)(?:_#(\d+))?(\.[^.]*)?\Z/)
count = (count || '0').to_i + 1
"#{filename}_##{count}#{extension}"
end
p increment('example') # => "example_#1"
p increment('example.') # => "example_#1."
p increment('example.mp3') # => "example_#1.mp3"
p increment('example_#1.mp3') # => "example_#2.mp3"
p increment('example_#2.mp3') # => "example_#3.mp3"
This probably doesn't matter for the code you're writing, but if you ever may have multiple threads or processes using this algorithm on the same files, there's a race condition when checking for existence before saving: Two writers can both find the same filename unused and write to it. If that matters to you, then open the file in a mode that fails if it exists, rescuing the exception. When the exception occurs, pick a different name. Roughly:
loop do
begin
File.open(filename, File::CREAT | File::EXCL | File::WRONLY) do |file|
file.puts "Your content goes here"
end
break
rescue Errno::EEXIST
filename = increment(filename)
redo
end
end
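A self-contained variant of the same idea, runnable in a scratch directory, is sketched below. The method name write_without_clobbering is made up, and retry is used in place of the loop/redo pair, which is one common way to express "try again with a new name":

```ruby
require 'tmpdir'

# Same regular-expression approach as the increment method above.
def increment(path)
  _, filename, count, extension = *path.match(/(\A.*?)(?:_#(\d+))?(\.[^.]*)?\Z/)
  count = (count || '0').to_i + 1
  "#{filename}_##{count}#{extension}"
end

# Create the file only if it does not exist yet; otherwise bump the counter
# and try again. Returns the path that was actually written.
def write_without_clobbering(path, content)
  begin
    File.open(path, File::CREAT | File::EXCL | File::WRONLY) { |file| file.write(content) }
  rescue Errno::EEXIST
    path = increment(path)
    retry
  end
  path
end

Dir.mktmpdir do |dir|
  3.times do
    written = write_without_clobbering(File.join(dir, 'example.mp3'), 'data')
    puts File.basename(written)
  end
end
```

Because the existence check and the creation happen in a single open call, two writers cannot both claim the same name.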
Here's a variation that doesn't accept a file name with an existing count:
def non_colliding_filename( filename )
if File.exist?(filename) # File.exists? was removed in Ruby 3.2
base,ext = /\A(.+?)(\.[^.]+)?\Z/.match( filename ).to_a[1..-1]
i = 1
i += 1 while File.exist?( filename="#{base}_##{i}#{ext}" )
end
filename
end
Proof:
%w[ foo bar.mp3 jim.bob.mp3 ].each do |desired|
3.times{
file = non_colliding_filename( desired )
p file
File.open( file, 'w' ){ |f| f << "tmp" }
}
end
#=> "foo"
#=> "foo_#1"
#=> "foo_#2"
#=> "bar.mp3"
#=> "bar_#1.mp3"
#=> "bar_#2.mp3"
#=> "jim.bob.mp3"
#=> "jim.bob_#1.mp3"
#=> "jim.bob_#2.mp3"

Ruby: Deleting last iterated item?

What I'm doing is this: I have one file as input and another as output. I choose a random line in the input, put it in the output, and then delete it from the input.
Now, I've iterated over the file and am on the line I want. I've copied it to the output file. Is there a way to delete it? I'm doing something like this:
for i in 0..number_of_lines_to_remove
line = rand(lines_in_file-2) + 1 #not removing the first line
counter = 0
IO.foreach("input.csv", "r") { |current_line|
if counter == line
File.open("output.csv", "a") { |output|
output.write(current_line)
}
end
counter += 1
}
end
So, I have current_line, but I'm not sure how to remove it from the source file.
Array#delete_at might do. Given an index, it removes the object at that index and returns it.
input.csv:
one,1
two,2
three,3
Program:
#!/usr/bin/ruby1.8
lines = File.readlines('/tmp/input.csv')
File.open('/tmp/output.csv', 'a') do |file|
file.write(lines.delete_at(rand(lines.size)))
end
p lines # ["two,2\n", "three,3\n"]
output.csv:
one,1
Here is a randomline class. You create a new randomline object by passing it an input file name and an output file name. You can then call the deleterandom method on that object and pass it a number of lines to delete.
The data is stored internally in arrays as well as being written to file. Currently the output is opened in append mode, so if you reuse the same file it will just add to the end; you could change the "a" to a "w" if you wanted to start the file fresh each time.
class Randomline
attr_accessor :inputarray, :outputarray
def initialize(filein, fileout)
@filename = filein
@filein = File.open(filein,"r+")
@fileoutput = File.open(fileout,"a")
@inputarray = []
@outputarray = []
readin()
end
def readin()
@filein.each do |line|
@inputarray << line
end
end
def deleterandom(numtodelete)
numtodelete.times do |num|
random = rand(@inputarray.size)
@outputarray << inputarray[random]
@fileoutput.puts inputarray[random]
@inputarray.delete_at(random)
end
@filein = File.open(@filename,"w")
@inputarray.each do |line|
@filein.puts line
end
end
end
Here is an example of it being used:
a = Randomline.new("testin.csv","testout.csv")
a.deleterandom(3)
You have to re-write the source-file after removing a line otherwise the modifications won't stick as they're performed on a copy of the data.
Keep in mind that any operation which modifies a file in-place runs the risk of truncating the file if there's an error of any sort and the operation cannot complete.
It would be safer to use some kind of simple database for this kind of thing as libraries like SQLite and BDB have methods for ensuring data integrity, but if that's not an option, you just need to be careful when writing the new input file.
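If a database is not an option, one way to reduce the truncation risk is to write the remaining lines to a temporary file and rename it over the original only after the write has succeeded. The sketch below is untested against real data; the method name move_random_line and the ".tmp" suffix are assumptions, and it keeps the first line in place because the question says it must not be removed:

```ruby
require 'tmpdir'

# Move one random line (never the first) from input_path to output_path.
# Returns the moved line, or nil if there is nothing to move.
def move_random_line(input_path, output_path)
  lines = File.readlines(input_path)
  return nil if lines.size < 2

  picked = lines.delete_at(rand(1...lines.size)) # index 0 is never chosen
  File.open(output_path, 'a') { |out| out.write(picked) }

  # Write the new contents beside the original, then swap them in one step.
  tmp_path = "#{input_path}.tmp"
  File.write(tmp_path, lines.join)
  File.rename(tmp_path, input_path)
  picked
end

Dir.mktmpdir do |dir|
  input = File.join(dir, 'input.csv')
  output = File.join(dir, 'output.csv')
  File.write(input, "name,value\none,1\ntwo,2\nthree,3\n")
  move_random_line(input, output)
  puts File.readlines(input).length  # one line fewer than before
  puts File.readlines(output).length # the moved line
end
```

If the process dies while writing the temporary file, the original input is still intact; on POSIX systems the rename replaces the old file in a single step as long as both paths are on the same filesystem.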

Resources