I have
def read_album(music_file)
  music_file.gets # the call I want to run a varying number of times
  album_artist = music_file.gets
  album_title = music_file.gets
  album_genre = music_file.gets.to_i
  tracks = read_tracks(music_file)
  album = Album.new(album_artist, album_title, album_genre, tracks)
  print_album(album)
end
I want to loop the entire block 3 times (maybe using something like 3.times do), but have music_file.gets (the first line in the procedure) run a different number of times per loop: say, just once on the first loop, 5 times on the second, and 8 times on the third. I'm not sure if there's a way to add an index that takes specific values per loop and have music_file.gets repeat according to it, or if there is some other way.
Edit: the text file holds a group of albums in a format similar to the one below. I want to use the number of tracks as a control variable for a loop that reads the album info; music_file.gets is what reads that info.
Albums.txt (the file name, everything below is a separate line of text in the file)
*Number of albums (integer)
*Artist name
*Album name
*Number of Tracks (integer)
*Track 1
*Track 2
*Artist Name
*Album name
*Number of Tracks (integer)
*Track 1
*Track 2
*Track 3
etc. (the number of tracks per album varies)
Given a pair of counts acquired by reading, you could use a nested loop structure. Two basic mechanisms for counted loops are Integer#times and Range#each, illustrated here:
number_of_albums.times do |i| # i goes from 0 to number_of_albums - 1
  # do album stuff, including picking up the value of number_of_tracks
  (1..number_of_tracks).each do |j| # j goes from 1 to number_of_tracks
    # do track stuff
  end
  # do additional stuff if needed
end
If the "stuff" to be done is a one-liner, you can replace do/end with curly braces.
See this tutorial for much more information about the variety of looping options.
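Applied to the Albums.txt layout above, a minimal sketch might look like this. Album, read_tracks, and print_album come from the question; since genre doesn't appear in the sample layout it is omitted here, so treat the Album.new call as an assumption about its constructor.

def read_albums(music_file)
  number_of_albums = music_file.gets.to_i
  number_of_albums.times do
    album_artist = music_file.gets.chomp
    album_title = music_file.gets.chomp
    number_of_tracks = music_file.gets.to_i
    # number_of_tracks controls how many track lines are read
    tracks = (1..number_of_tracks).map { music_file.gets.chomp }
    print_album(Album.new(album_artist, album_title, tracks))
  end
end

File.open("Albums.txt") { |f| read_albums(f) }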
I have a data set that consists of thousands of rows. I would like to count how many times an alarm toggles between ALARM_OPENED and ALARM_NORMALIZED.
Here is a data sample. The alarm toggles twice, so ideally count = 2.
The issue now is that I cannot figure out how to
1) compare ALARM_OPENED and ALARM_NORMALIZED for the event type
2) compare the difference in time between the changes in event (the toggling should happen within a time frame of two seconds)
count = 0
# loop this
if event_type[0] == 'ALARM_OPENED'
  if event_type[1] == 'ALARM_NORMALIZED'
    # time[1] - time[0] <= 2 seconds
    count = count + 1
  end
end
p count
If you can assume that you always have a bunch of OPENED/NORMALIZED pairs, you can slice the array into pairs:
event_type.each_slice(2) do |opened, normalized|
  break unless normalized # unpaired event at the end
  # whatever you want to do with the two events here
end
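Putting the pieces together with the time check, a sketch might look like this. It assumes event_type and time are parallel arrays, with time holding Time objects; that parsing isn't shown in the question, so treat those names as assumptions.

count = 0
event_type.each_slice(2).with_index do |(opened, normalized), i|
  break unless normalized # unpaired event at the end
  opened_at = time[2 * i]         # assumed Time objects
  normalized_at = time[2 * i + 1]
  if opened == 'ALARM_OPENED' && normalized == 'ALARM_NORMALIZED' &&
     normalized_at - opened_at <= 2 # toggle completed within two seconds
    count += 1
  end
end
p count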
My CSV contains about 60 million rows. The 10th column contains some alphanumeric entries, some of which repeat, that I want to convert into integers with a one-to-one mapping. That is, I don't want the same entry in Original.csv to have multiple corresponding integer values in Processed.csv. So, initially, I wrote the following code:
require 'csv'

udids = []
CSV.open('Processed.csv', 'wb') do |csv|
  CSV.foreach('Original.csv', headers: true) do |row|
    # record the value the first time we see it
    unless udids.include?(row[9])
      udids << row[9]
    end
    # the 1-based position in udids is the mapped integer
    udid = udids.index(row[9]) + 1
    csv << [udid]
  end
end
But the program was taking a lot of time, which I soon realized was because it had to check all the previous rows to make sure that only new values get assigned a new integer, and that existing ones are not assigned a new one.
So I thought of hashing them, because when exploring the web about this issue I learned that hash lookups are faster than sequential comparison (I have not read the details of how, but anyway...). So I wrote the following code to hash them:
arrayUDID = []
arrayUser = []

f = File.open("Original.csv", "r")
f.each_line do |line|
  row = line.split(",")
  arrayUDID << row[9]
  arrayUser << row[9]
end
f.close

arrayUser = arrayUser.uniq

# build [value, index] pairs, then turn them into a lookup hash
arrayHash = []
for i in 0..arrayUser.size - 1
  arrayHash << arrayUser[i]
  arrayHash << i
end
hash = Hash[arrayHash.each_slice(2).to_a]

# map every original value to its integer
array1 = hash.values_at(*arrayUDID)

logfile = File.new("Processed.csv", "w")
for i in 0..array1.size - 1
  logfile.print("#{array1[i]}\n")
end
logfile.close
But here again I observed that the program was taking a lot of time, which I figured must be due to the hash (or hash table) running out of memory.
So, can you kindly suggest any method that will work for my huge file in a reasonable amount of time? By reasonable, I mean within 10 hours, because I realize it's going to take some hours at least; it took about 5 hours to extract this dataset from an even bigger one, and my code above had not finished even after 2 days of running. If you can suggest a method that can do the task by leaving the computer on overnight, that would be great. Thanks.
I think this should work:
require 'csv'

udids = {}
unique_count = 1
output_csv = CSV.open("Processed.csv", "w")

CSV.foreach("Original.csv").with_index do |row, i|
  output_csv << row and next if i == 0 # skip first row (header info)

  val = row[9]
  if udids[val.to_sym]
    row[9] = udids[val.to_sym]
  else
    udids[val.to_sym] = unique_count
    row[9] = unique_count
    unique_count += 1
  end
  output_csv << row
end

output_csv.close
The performance depends heavily on how many duplicates there are (the more, the better), but basically it keeps track of each value as a key in a hash and checks whether it has encountered that value yet. If it has, it uses the stored integer; if not, it increments a counter, stores the count as the new value for that key, and continues.
I was able to process a 10 million line test CSV file in about 3 minutes.
My question is: how can I search through an array and replace the string at the current index of the search, without knowing what the string at that index contains?
The code below works through an AJAX file hosted on the internet. It finds the inventory and goes through each weapon in it, adding the weapon's ID to a string (so I can check whether that weapon has been counted before), followed by the number of times it occurs in the inventory. After every weapon has been checked, it goes through all of the IDs added to the string and displays each one along with its number of occurrences, so I know how many of each weapon I have.
This is an example of what I have:
strList = ""
inventory.each do |inv|
amount = 1
exists = false
ids = strList.split(',')
ids.each do |ind|
if (inv['id'] == ind.split('/').first) then
exists = true
amount = ind.split('/').first.to_i
amount += 1
ind = "#{inv['id']}/#{amount.to_s}" # This doesn't seem work as expected.
end
end
if (exists == true) then
ids.push("#{inv['id']}/#{amount.to_s}")
strList = ids.join(",")
end
end
strList.split(",").each do |item|
puts "#{item.split('/').first} (#{item.split('/').last})"
end
Here is an idea of what code I expected (pseudo-code):
inventory = get_inventory()
drawn_inv = ""
loop.inventory do |inv|
  if (inv['id'].occurred_before?)
    inv['id'].count += 1
  end
end loop
loop.inventory do |inv|
  drawn_inv.add(inv['id'] + "/" + inv['id'].count)
end loop
loop.drawn_inv do |inv|
  puts "#{inv}"
end loop
Any help on how to replace that line is appreciated!
EDIT: Sorry for not providing more information on my code. I skipped the less important part at the bottom of the code and displayed commented code instead of actual code; I'll add that now.
EDIT #2: I'll update my description of what it does and what I'm expecting as a result.
EDIT #3: Added pseudo-code.
Thanks in advance,
SteTrezla
You want #each_with_index: http://ruby-doc.org/core-2.2.0/Enumerable.html#method-i-each_with_index
You may also want to look at #gsub, since it takes a block; you may not need to split the string into an array at all. Basically something like strList.gsub(...) { |match| # ... your block }.
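For example, here is a sketch of the each_with_index route applied to the inner loop from the question. Assigning to the block variable ind never changes the array; assigning through the index does. (Note it also reads the stored count with .last, since the entries are "id/count"; the original read .first.)

ids.each_with_index do |ind, idx|
  if inv['id'] == ind.split('/').first
    exists = true
    amount = ind.split('/').last.to_i + 1
    ids[idx] = "#{inv['id']}/#{amount}" # reassign through the index to mutate the array
  end
end
strList = ids.join(",")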
In Ruby, I'm parsing a CSV file and storing the values into a class (each 'column' in the CSV corresponds to an attribute of my class, Expense). My CSV has 15,000 rows, but after I go through all the lines I only have 260 objects of the Expense class instead of 15,000. If I parse a CSV with 100 rows, it works fine: it creates 100 objects of the Expense class. But from 150 rows upward it starts to misbehave; 150 rows returns 36 instances of the class, and 15,000 returns 260. I don't see the logic; it's as if at some point it resets the number of instances of my class and starts counting again.
Is there a limit to the number of instances a class can have in Ruby?
This is not a program, and I'm a real beginner. I'm just going through a CSV, doing some validation on the data, and then writing back to a CSV, so I was looking for a way that didn't require storing the values in a temp file.
Thanks
Code:
class Expense
  attr_accessor :var1, :var2, :var3

  def initialize(var1, var2, var3)
    @var1 = var1
    @var2 = var2
    @var3 = var3
  end

  def self.count
    ObjectSpace.each_object(self).to_a.count
  end
end
old_file = File.read("C:/Folder/OldFile.csv")
new_file = File.new("C:/Folder/NewFile.csv", "w")

puts "Number of rows in input file: #{old_file.count("\n")}"
puts "Number of Expense objects stored before procedure: #{Expense.count}"

# loop through the rows and store each column as an attribute of the class
old_file.each_line do |line|
  # save each column of the row as an element of the array
  attr_ay = []
  line.chomp.each_line(';') do |att|
    attr_ay.push(att.chomp(";"))
  end
  # loop through each attribute and assign the corresponding value of the array
  i = 0
  expense = Expense.new("", "", "")
  expense.instance_variables.each do |att|
    expense.instance_variable_set(att, attr_ay[i])
    new_file.print(expense.instance_variable_get(att) + ";")
    i = i + 1
  end
  # jump to the next line in the new file
  new_file.print "\n"
end
new_file.close

# compare number of rows
new_file = File.read("C:/Folder/NewFile.csv")
puts "Number of rows in output file: #{new_file.count("\n")}"
puts "Number of Expense objects stored after procedure: #{Expense.count}"

# Result:
# Number of rows in input file: 15031
# Number of Expense objects stored before procedure: 0
# Number of rows in output file: 15031
# Number of Expense objects stored after procedure: 57
This answer is based on comments by myself and Max that seem to have resolved the problem.
The code in the question is not being affected by any limit on the number of objects. There is no such limit inherent in the Ruby language, and most implementations allow for very large numbers of objects, more than will fit into terabytes of memory. A script will typically run out of memory or CPU time long before it runs out of object pointers.
Using ObjectSpace to access the objects is the cause of the problem:
ObjectSpace is a useful debugging or metaprogramming tool; you can find current objects using it, but it does not maintain any active references to them.
Ruby clears up unreferenced objects; this is called garbage collection. It does this by checking all active bindings and following their object references (and all their child references, and so on). Anything not marked as "in use" in that mark phase can be removed in a later sweep phase.
Garbage collection runs alongside your code, and only kicks in once the script has consumed at least some memory. This explains the puzzling variation in counts between your different tests.
The fix is to use a container to hold the objects. Simply push-ing them onto an Array creates the necessary references and stops garbage collection from being a problem for you. Use that array, not ObjectSpace, to find and count your objects.
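A minimal sketch of that fix, reusing names from the question:

expenses = [] # real references, so garbage collection cannot reclaim the objects

old_file.each_line do |line|
  attrs = line.chomp.split(";")
  expenses << Expense.new(attrs[0], attrs[1], attrs[2])
end

puts "Number of Expense objects stored: #{expenses.size}"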
I'm generating some load test results with JMeter, and it outputs a nicely formatted CSV file, but now I need to do some number crunching with Ruby. Here is an example of the beginning of the CSV file:
threadName,grpThreads,allThreads,URL,Latency,SampleCount,ErrorCount
Thread Group 1-1,1,1,urlXX,240,1,0
Thread Group 1-1,1,1,urlYY,463,1,0
Thread Group 1-2,1,1,urlXX,200,1,0
Thread Group 1-3,1,1,urlXX,212,1,0
Thread Group 1-2,1,1,urlYY,454,1,0
.
.
.
Thread Group 1-N,1,1,urlXX,210,1,0
Now, for statistics, I need to read the first line of each thread group, add up the Latency fields, and then divide by the number of thread groups to get an average latency. Then I iterate over the second line of every thread group, and so forth.
I was thinking that maybe I would need to write a temporary sorted CSV file for each thread group (the order in which the URLs are hit is always the same within a thread group) and then use those as input: add the first lines, do the math, add the second lines, until there are no more lines.
But since the number of thread groups changes, I haven't been able to write the Ruby so that it can flex around that... any code examples would be really appreciated :)
[update] - Is this what you want, I wonder?
How about this - it's probably inefficient but does it do what you want?
lines = File.readlines("data.csv")
lines.shift # drop the header

# Hash where key is group name; value is a list of hashes with keys {:grp, :lat}
hash = lines.
  map { |l| # turn every line into a hash of group name and its latency
    fs = l.split(",")
    { :grp => fs[0], :lat => fs[4] }
  }.
  group_by { |o| o[:grp] }

# The largest number of lines we have in any group
max_lines = hash.values.map(&:size).max

# AVGS is a list of averages.
# AVGS[0] is the average latency for all the first lines,
# AVGS[1] is the average latency for all second lines, etc.
AVGS =
  (0..(max_lines - 1)).map { |lno| # line number within each group
    total = # latencies for the lno'th line of every group...
      hash.map { |gname, l|
        if l[lno] then l[lno][:lat].to_i else 0 end
      }
    total.reduce { |a, b| a + b } / hash.size # integer average
  }

# So we have max_lines averages - one for each line position across
# the groups. You could do anything with this list of numbers...
# find the average again?
puts AVGS.inspect
Should return something like:
[217, 305] # average for 1st lines, average for 2nd lines
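As a design note, the same computation can lean on the csv library instead of manual splitting. A sketch, assuming the same data.csv layout and Ruby 2.4+ for Array#sum:

require 'csv'

rows = CSV.read("data.csv", headers: true)
groups = rows.group_by { |r| r["threadName"] }.values

max_lines = groups.map(&:size).max
avgs = (0...max_lines).map do |lno|
  lats = groups.map { |g| g[lno] ? g[lno]["Latency"].to_i : 0 }
  lats.sum / groups.size # integer average, as above
end

puts avgs.inspect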