Ruby regular expressions - speed problem - ruby

I want to obtain the information of students in class c79363 ONLY.
The following file(users.txt) contains the user id's
c79363::7117:dputnam,gliao01,hmccon01,crober06,cpurce01,cdavid03,dlevin01,jsmith88
d79363::7117:dputn,gliao0,hmcc01,crob06,cpur01,cdad03,dlen01,jsmh88
f79363::7117:dpnam,gli01,hmcn01,ober06,crce01,cdav03,dln01,jith88
FILENAME=user_info.txt
The other one contains specific information about a user like in this format
jsmith88:*:4185:208:jsmith113:/students/jsmith88:/usr/bin/bash
userd:*:4185:208:jsmith113:/students/jsmith88:/usr/bin/bash
gliao01:*:4185:208:jsmith113:/students/jsmith88:/usr/bin/bash
Here was my solution but was slow! I want to optimize the speed.
pseudo code
I read the file using File.readlines(users.txt) ->
I used split(/,/) -> I then pop array until i had an array with the following values
dputnam,gliao01,hmccon01,crober06,cpurce01,cdavid03,dlevin01,jsmith88
I then continue to read user_info.txt with File.readlines(user_info.txt)
I split(/:/) i have USER ARRAY
Finally I compared the first entry USER ARRAY with my users in class c79363.

user_info = {}
File.foreach("user_info.txt") {|line| user_info[line[/^[^:]+/]] = line.chomp}
class_name = "c79363"
File.foreach("users.txt") do |line|
next unless line[/^[^:]+/] == class_name
line[/[^:]+$/].rstrip.split(/,/).each do |user|
puts user_info[user] if user_info.has_key?(user)
end
end

user_info = {}
File.readlines("users_info.txt").each do |line|
user_info[line.split(/:/,2)[0]] = line.chomp
end
class_name = "c79363"
File.readlines("users.txt").each do |line|
line=line.strip.split(/:/)
if line[0] == class_name then
line[-1].split(",").each {|u| puts user_info[u] if user_info[u] }
end
end

Related

Ruby - reading from .csv and creating objects out of it

I have .csv file with rows of which every row represents one call with certain duration, number etc. I need to create array of Call objects - every Call.new expects Hash of parameters, so it's easy - it just takes rows from CSV. But for some reason it doesn't work - when I invoke Call.new(raw_call) it's nil.
It's also impossible for me to see any output - I placed puts in various places in code (inside blocks etc) and it simply doesn't show anything. I obviously have another class - Call, which holds initialize for Call etc.
require 'csv'
class CSVCallParser
attr_accessor :io
def initialize(io)
self.io = io
end
NAMES = {
a: :date,
b: :service,
c: :phone_number,
d: :duration,
e: :unit,
f: :cost
}
def run
parse do |raw_call|
parse_call(raw_call)
end
end
private
def parse_call(raw_call)
NAMES.each_with_object({}) do |name, title, memo|
memo[name] = raw_call[title.to_s]
end
end
def parse(&block)
CSV.parse(io, headers: true, header_converters: :symbol, &block)
end
end
CSVCallParser.new(ARGV[0]).run
Small sample of my .csv file: headers and one row:
"a","b","c","d","e","f"
"01.09.2016 08:49","International","48627843111","0:29","","0,00"
I noticed a few things that isn't going as expected. In the parse_call method,
def parse_call(raw_call)
NAMES.each_with_object({}) do |name, title, memo|
memo[name] = raw_call[title.to_s]
end
end
I tried to print name, title, and memo. I expected to get :a, :date, and {}, but what I actually got was [:a,:date],{}, and nil.
Also, raw_call headers are :a,:b,:c..., not :date, :service..., so you should be using raw_call[name], and converting that to string will not help, since the key is a symbol in the raw_call.
So I modified the function to
def parse_call(raw_call)
NAMES.each_with_object({}) do |name_title, memo|
memo[name_title[1]] = raw_call[name_title[0]]
end
end
name_title[1] returns the title (:date, :service, etc)
name_title[0] returns the name (:a, :b, etc)
Also, in this method
def run
parse do |raw_call|
parse_call(raw_call)
end
end
You are not returning any results you get, so you are getting nil,
So, I changed it to
def run
res = []
parse do |raw_call|
res << parse_call(raw_call)
end
res
end
Now, if I output the line
p CSVCallParser.new(File.read("file1.csv")).run
I get (I added two more lines to the csv sample)
[{:date=>"01.09.2016 08:49", :service=>"International", :phone_number=>"48627843111", :duration=>"0:29", :unit=>"", :cost=>"0,00"},
{:date=>"02.09.2016 08:49", :service=>"International", :phone_number=>"48622454111", :duration=>"1:29", :unit=>"", :cost=>"0,00"},
{:date=>"03.09.2016 08:49", :service=>"Domestic", :phone_number=>"48627843111", :duration=>"0:29", :unit=>"", :cost=>"0,00"}]
If you want to run this program from the terminal like so
ruby csv_call_parser.rb calls.csv
(In this case, calls.csv is passed in as an argument to ARGV)
You can do so by modifying the last line of the ruby file.
p CSVCallParser.new(File.read(ARGV[0])).run
This will also return the array with hashes like before.
csv = CSV.parse(csv_text, :headers => true)
puts csv.map(&:to_h)
outputs:
[{a:1, b:1}, {a:2, b:2}]

Ruby: file output blank

So I'm trying to build a program which tells me how many days are left till someone's birthday. I have a text file that I'm drawing data from. The problem is the save method is not working for some reason, and nothing is being printed to the output file. Thank you so much in advance!
require 'date'
Person = Struct.new(:name, :bday)
module Family
Member = Hash.new
File.readlines("bday_info.txt").each do |line|
name, bday = line.split(',')
person = Person.new(name, bday)
Member[name] = bday
end
def self.next_bday(name)
birthday = Date.parse(Family::Member[name])
this_year = Date.new(Date.today.year, birthday.month, birthday.day)
next_year = Date.new(Date.today.year + 1, birthday.month, birthday.day)
if this_year > Date.today
puts "\n#{(this_year - Date.today).to_i} days to #{name}'s birthday"
else
puts "\n#{(next_year - Date.today).to_i} days to #{name}'s birthday"
end
end
def self.save(name)
File.open("days_left.txt", "w") do |file|
file.puts "#{next_bday(name)}"
end
end
end
loop do
puts "\nName:"
response = gets.chomp.capitalize
if response.downcase == "quit"
break
elsif Family::Member.has_key?("#{response}") == false
puts "\nYou ain't on the list"
else
Family.next_bday(response)
Family.save(response) #WHY IS THIS LINE NOT WORKING???
end
end
Your Family.next_bday does not return anything (to be more precise - it's returning nil that last puts returns), thus nothing is written to the file.
Other notes(not directly related):
No reason in calling Family.next_bday(response) twice, just save return value from the first call
Family::Member constant looks more like a variable, more clean design will be to make the whole thing a class that takes input file name, reads data in initializer and then stores into a instance variable

Calling multiple methods on a CSV object

I have constructed an Event Manager class that performs parsing actions on a CSV file, and produces html letters using erb. It is part of a jumpstart labs tutorial
The program works fine, but I am unable to call multiple methods on an object without the earlier methods interfering with the later methods. As a result, I have opted to create multiple objects to call instance methods on, which seems like a clunky inelegant solution. Is there a better way to do this, where I can create a single new object and call methods on it?
Like so:
eventmg = EventManager.new("event_attendees.csv")
eventmg.print_valid_phone_numbers
eventmg_2 = EventManager.new("event_attendees.csv")
eventmg_2.print_zipcodes
eventmg_3 = EventManager.new("event_attendees.csv")
eventmg_3.time_targeter
eventmg_4 = EventManager.new("event_attendees.csv")
eventmg_4.day_of_week
eventmg_5 = EventManager.new("event_attendees.csv")
eventmg_5.create_thank_you_letters
The complete code is as follows
require 'csv'
require 'sunlight/congress'
require 'erb'
class EventManager
INVALID_PHONE_NUMBER = "0000000000"
Sunlight::Congress.api_key = "e179a6973728c4dd3fb1204283aaccb5"
def initialize(file_name, list_selections = [])
puts "EventManager Initialized."
#file = CSV.open(file_name, {:headers => true,
:header_converters => :symbol} )
#list_selections = list_selections
end
def clean_zipcode(zipcode)
zipcode.to_s.rjust(5,"0")[0..4]
end
def print_zipcodes
puts "Valid Participant Zipcodes"
#file.each do |line|
zipcode = clean_zipcode(line[:zipcode])
puts zipcode
end
end
def clean_phone(phone_number)
converted = phone_number.scan(/\d/).join('').split('')
if converted.count == 10
phone_number
elsif phone_number.to_s.length < 10
INVALID_PHONE_NUMBER
elsif phone_number.to_s.length == 11 && converted[0] == 1
phone_number.shift
phone_number.join('')
elsif phone_number.to_s.length == 11 && converted[0] != 1
INVALID_PHONE_NUMBER
else
phone_number.to_s.length > 11
INVALID_PHONE_NUMBER
end
end
def print_valid_phone_numbers
puts "Valid Participant Phone Numbers"
#file.each do |line|
clean_number = clean_phone(line[:homephone])
puts clean_number
end
end
def time_targeter
busy_times = Array.new(24) {0}
#file.each do |line|
registration = line[:regdate]
prepped_time = DateTime.strptime(registration, "%m/%d/%Y %H:%M")
prepped_time = prepped_time.hour.to_i
# inserts filtered hour into the array 'list_selections'
#list_selections << prepped_time
end
# tallies number of registrations for each hour
i = 0
while i < #list_selections.count
busy_times[#list_selections[i]] += 1
i+=1
end
# delivers a result showing the hour and the number of registrations
puts "Number of Registered Participants by Hour:"
busy_times.each_with_index {|counter, hours| puts "#{hours}\t#{counter}"}
end
def day_of_week
busy_day = Array.new(7) {0}
d_of_w = ["Monday:", "Tuesday:", "Wednesday:", "Thursday:", "Friday:", "Saturday:", "Sunday:"]
#file.each do |line|
registration = line[:regdate]
# you have to reformat date because of parser format
prepped_date = Date.strptime(registration, "%m/%d/%y")
prepped_date = prepped_date.wday
# adds filtered day of week into array 'list selections'
#list_selections << prepped_date
end
i = 0
while i < #list_selections.count
# i is minus one since days of week begin at '1' and arrays begin at '0'
busy_day[#list_selections[i-1]] += 1
i+=1
end
#busy_day.each_with_index {|counter, day| puts "#{day}\t#{counter}"}
prepared = d_of_w.zip(busy_day)
puts "Number of Registered Participants by Day of Week"
prepared.each{|date| puts date.join(" ")}
end
def legislators_by_zipcode(zipcode)
Sunlight::Congress::Legislator.by_zipcode(zipcode)
end
def save_thank_you_letters(id,form_letter)
Dir.mkdir("output") unless Dir.exists?("output")
filename = "output/thanks_#{id}.html"
File.open(filename,'w') do |file|
file.puts form_letter
end
end
def create_thank_you_letters
puts "Thank You Letters Available in Output Folder"
template_letter = File.read "form_letter.erb"
erb_template = ERB.new template_letter
#file.each do |line|
id = line[0]
name = line[:first_name]
zipcode = clean_zipcode(line[:zipcode])
legislators = legislators_by_zipcode(zipcode)
form_letter = erb_template.result(binding)
save_thank_you_letters(id,form_letter)
end
end
end
The reason you're experiencing this problem is because when you apply each to the result of CSV.open you're moving the file pointer each time. When you get to the end of the file with one of your methods, there is nothing for anyone else to read.
An alternative is to read the contents of the file into an instance variable at initialization with readlines. You'll get an array of arrays which you can operate on with each just as easily.
"Is there a better way to do this, where I can create a single new object and call methods on it?"
Probably. If your methods are interfering with one another, it means you're changing state within the manager, instead of working on local variables.
Sometimes, it's the right thing to do (e.g. Array#<<); sometimes not (e.g. Fixnum#+)... Seeing your method names, it probably isn't.
Nail the offenders down and adjust the code accordingly. (I only scanned your code, but those Array#<< calls on an instance variable, in particular, look fishy.)

Testing a lambda

I am creating an import feature that imports CSV files into several tables. I made a module called CsvParser which parses a CSV file and creates records. My models that receive the create actions extends theCsvParser. They make a call to CsvParser.create and pass the correct attribute order and an optional lambda called value_parser. This lambda transforms values in a hash to a preffered format.
class Mutation < ActiveRecord::Base
extend CsvParser
def self.import_csv(csv_file)
attribute_order = %w[reg_nr receipt_date reference_number book_date is_credit sum balance description]
value_parser = lambda do |h|
h["is_credit"] = ((h["is_credit"] == 'B') if h["is_credit"].present?)
h["sum"] = -1 * h["sum"].to_f unless h["is_credit"]
return [h]
end
CsvParser.create(csv_file, self, attribute_order, value_parser)
end
end
The reason that I'm using a lambda instead of checks inside the CsvParser.create method is because the lambda is like a business rule that belongs to this model.
My question is how i should test this lambda. Should i test it in the model or the CsvParser? Should i test the lambda itself or the result of an array of the self.import method? Maybe i should make another code structure?
My CsvParser looks as follows:
require "csv"
module CsvParser
def self.create(csv_file, klass, attribute_order, value_parser = nil)
parsed_csv = CSV.parse(csv_file, col_sep: "|")
records = []
ActiveRecord::Base.transaction do
parsed_csv.each do |row|
record = Hash.new {|h, k| h[k] = []}
row.each_with_index do |value, index|
record[attribute_order[index]] = value
end
if value_parser.blank?
records << klass.create(record)
else
value_parser.call(record).each do |parsed_record|
records << klass.create(parsed_record)
end
end
end
end
return records
end
end
I'm testing the module itself:
require 'spec_helper'
describe CsvParser do
it "should create relations" do
file = File.new(Rails.root.join('spec/fixtures/files/importrelaties.txt'))
Relation.should_receive(:create).at_least(:once)
Relation.import_csv(file).should be_kind_of Array
end
it "should create mutations" do
file = File.new(Rails.root.join('spec/fixtures/files/importmutaties.txt'))
Mutation.should_receive(:create).at_least(:once)
Mutation.import_csv(file).should be_kind_of Array
end
it "should create strategies" do
file = File.new(Rails.root.join('spec/fixtures/files/importplan.txt'))
Strategy.should_receive(:create).at_least(:once)
Strategy.import_csv(file).should be_kind_of Array
end
it "should create reservations" do
file = File.new(Rails.root.join('spec/fixtures/files/importreservering.txt'))
Reservation.should_receive(:create).at_least(:once)
Reservation.import_csv(file).should be_kind_of Array
end
end
Some interesting questions. A couple of notes:
You probably shouldn't have a return within the lambda. Just make the last statement [h].
If I understand the code correctly, the first and second lines of your lambda are overcomplicated. Reduce them to make them more readable and easier to refactor:
h["is_credit"] = (h['is_credit'] == 'B') # I *think* that will do the same
h['sum'] = h['sum'].to_f # Your original code would have left this a string
h['sum'] *= -1 unless h['is_credit']
It looks like your lambda doesn't depend on anything external (aside from h), so I would test it separately. You could even make it a constant:
class Mutation < ActiveRecord::Base
extend CsvParser # <== See point 5 below
PARSE_CREDIT_AND_SUM = lambda do |h|
h["is_credit"] = (h['is_credit'] == 'B')
h['sum'] = h['sum'].to_f
h['sum'] *= -1 unless h['is_credit']
[h]
end
Without knowing the rationale, it's hard to say where you should put this code. My gut instinct is that it is not the job of the CSV parser (although a good parser may detect floating point numbers and convert them from strings?) Keep your CSV parser reusable. (Note: Re-reading, I think you've answered this question yourself - it is business logic, tied to the model. Go with your gut!)
Lastly, you are defining and the method CsvParser.create. You don't need to extend CsvParser to get access to it, although if you have other facilities in CsvParser, consider making CsvParser.create a normal module method called something like create_from_csv_file

objected oriented and parsing csv

i have to parse csv file which has customers and product they have ordered . customers can repeat for different product . i have to get all the unique customers and products they have ordered. Then print out each customer and there product . i have been asked to do in a object oriented way so
1) should i create a customer objects and have a product as there attribute
2) just write a program using foreach and loop through and store customer and product in a hash and print it out .
what throws me off is i have been asked to do it in a object oriented way. if do it by creating objects how can i store a custom object in memory ? so that if i come across a customer second time i have to add the product and at the end i have to loop through all the objects and print it out . sorry i have bad English thanks reading a long question and for the help.
How can you store a custom object in memory? By creating the object and keeping it in a list, hash, or whatever seems appropriate. (Probably a hash, with the key being whatever unique value you have in your CSV, and the value would be a collection of products.)
Being asked to do it in "an object-oriented way" is a little arbitrary, though.
If you are using FasterCSV or Ruby 1.9 you can extend the parser allowing you to map each CSV row to a custom object.
# http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html#method-c-load
# http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html#method-c-dump
# https://github.com/JEG2/faster_csv/blob/master/test/tc_serialization.rb
require 'csv'
class Person
attr_accessor :id, :name, :email
def self.csv_load(meta, headers, row)
person = Person.new
headers.each.with_index { |h,i|
person.send "#{h}=", row[i]
}
person
end
def self.parse(csv)
meta = "class,#{self.to_s}\n"
CSV.load("#{meta}#{csv}")
end
def dump
self.class.dump([self])
end
def self.dump(people, io='', options={})
CSV.dump(people, io, options).strip
end
def self.csv_meta
[]
end
def csv_headers
%w(id name email)
end
def csv_dump(headers)
headers.map { |h| self.instance_variable_get "##{h}" }
end
end
CSV_DUMP = <<-CSV
class,Person
id=,name=,email=
1,"First Dude",dude#company.com
2,"Second Dude",2nddude#company.com
3,"Third Dude",3rddude#company.com
CSV
CSV_INPUT = <<-CSV
id,name,email
1,"First Dude",dude#company.com
2,"Second Dude",2nddude#company.com
3,"Third Dude",3rddude#company.com
CSV
CSV_DUMP2 = <<-CSV
class,Person
#{CSV_INPUT}
CSV
people = Person.parse(CSV_INPUT)
puts people.inspect
dumped = Person.dump(people)
puts dumped
puts "----"
puts Person.parse(dumped).inspect

Resources