rake import - only adding one line from CSV to database - ruby

I am attempting to import a CSV file into my Rails database (SQLite in development) following this tutorial. Data does get inserted into my database, but only the first record from the CSV file. The rake task seems to run without problems, and running it with --trace reveals no additional information.
require 'csv'
desc "Import Voters from CSV File"
task :import => [:environment] do
  file = "db/GOTV.csv"
  CSV.foreach(file, :headers => false) do |row|
    Voter.create({
      :last_name => row[0],
      :first_name => row[1],
      :middle_name => row[2],
      :name_suffix => row[3],
      :primary_address => row[4],
      :primary_city => row[5],
      :primary_state => row[6],
      :primary_zip => row[7],
      :primary_zip4 => row[8],
      :primary_unit => row[9],
      :primary_unit_number => row[10],
      :phone_number => row[11],
      :phone_code => row[12],
      :gender => row[13],
      :party_code => row[14],
      :voter_score => row[15],
      :congressional_district => row[16],
      :house_district => row[17],
      :senate_district => row[18],
      :county_name => row[19],
      :voter_key => row[20],
      :household_id => row[21],
      :client_id => row[22],
      :state_voter_id => row[23]
    })
  end
end

I just ran into this as well. I guess you solved it some other way, but this might still be useful for others.
In my case, the issue seems to be an incompatible change in the CSV library.
I guess you were using Ruby 1.8, where the signature is:
CSV.foreach(path, rs = nil, &block)
The docs here are severely lacking (there are actually none at all), so I had to guess from the source: http://ruby-doc.org/stdlib-1.8.7/libdoc/csv/rdoc/CSV.html#method-c-foreach
Anyway, 'rs' is clearly not an options hash; it looks like the record separator.
In Ruby 1.9 this is nicer: http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html#method-c-foreach
self.foreach(path, options = Hash.new, &block)
So this is the version that supports options such as :headers.
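To check the behaviour on Ruby 1.9 or later, a small script like the following may help. It is only a sketch: the temporary file and its three rows are made up to stand in for db/GOTV.csv, and a plain array takes the place of Voter.create.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical three-row, headerless file standing in for db/GOTV.csv.
csv_file = Tempfile.new(['voters', '.csv'])
csv_file.write("Smith,John,Q\nDoe,Jane,R\nRoe,Richard,S\n")
csv_file.close

imported = []
# On Ruby 1.9+ the argument after the path is treated as options,
# so :headers => false is honoured and every row is yielded.
CSV.foreach(csv_file.path, :headers => false) do |row|
  imported << { :last_name => row[0], :first_name => row[1], :middle_name => row[2] }
end

puts imported.length
csv_file.unlink
```

If the count printed here matches the number of lines in the file but the rake task still inserts one record, the Ruby version running the task is worth checking.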

Related

Parsing remote csv file : No such file or directory @ rb_sysopen

I am trying to parse a CSV file hosted remotely. I use Rails 6 and Active Storage. The file is stored on the ImportJob model. Its URL can be accessed this way:
ImportJob.last.csv_file.url
the file does exist and is downloadable : http://res.cloudinary.com/dockcyr0z/raw/upload/rghn3zi2190nmc28qwbtr24apqxe.csv
However when trying to parse it
CSV.foreach(url, headers: true, header_converters: :symbol, col_sep: ';') do |row|
puts row
end
I'm getting Errno::ENOENT: No such file or directory @ rb_sysopen - http://res.cloudinary.com/dockcyr0z/raw/upload/rghn3zi2190nmc28qwbtr24apqxe.csv
The same thing happens if I try to open the file first with open(url).
Why am I getting this error? How can I parse this remote CSV file?
Open the URL with URI.parse and change CSV.foreach to CSV.parse:
CSV.parse(URI.parse(url).read, headers: true, header_converters: :symbol, col_sep: ';') do |row|
  puts row
end
# output
{
  :first_name => "Souper",
  :last_name => "Man",
  :email => "dageismar+learner233@gmail.com",
  :role => "CEO",
  :tags => "sales,marketing",
  :avatar_url => "http://res.cloudinary.com/dockcyr0z/image/upload/x3f65o5mepbdhi4fwvww99gjqr7p"
}
{
  :first_name => "Gentil",
  :last_name => "Keum",
  :email => "dageismar+learner234@gmail.com",
  :role => "CEO",
  :tags => "sales,marketing",
  :avatar_url => "http://res.cloudinary.com/dockcyr0z/image/upload/x3f65o5mepbdhi4fwvww99gjqr7p"
}
Update:
Or, as Stefan suggests, just use URI.open(url) instead of URI.parse(url).read.
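The reason for the error is that CSV.foreach expects a path on the local filesystem, whereas CSV.parse works on CSV text that is already in memory. The sketch below shows only the parsing half and needs no network access: the string in body is a made-up stand-in for what URI.parse(url).read would return once open-uri is loaded, and its three columns are a shortened version of the output above.

```ruby
require 'csv'

# Hypothetical stand-in for the downloaded response body.
body = "first_name;last_name;role\nSouper;Man;CEO\nGentil;Keum;CEO\n"

# Same options as in the answer: header row, symbol keys, ';' separator.
people = CSV.parse(body, headers: true, header_converters: :symbol, col_sep: ';').map(&:to_hash)

people.each { |person| puts person.inspect }
```

Downloading first and parsing afterwards also makes it easy to test the parsing options against a small literal string.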

Performance - Ruby - Compare large array of hashes (dictionary) to primary hash; update resulting value

I'm attempting to compare my data, which is in the format of an array of hashes, with another large array of hashes (~50K server names and tags) which serves as a dictionary. The dictionary is stripped down to only include the absolutely relevant information.
The code I have works but it is quite slow on this scale and I haven't been able to pinpoint why. I've done verbose printing to isolate the issue to a specific statement (tagged via comments below)--when it is commented out, the code runs ~30x faster.
After reviewing the code extensively, I feel like I'm doing something wrong and perhaps Array#select is not the appropriate method for this task. Thank you so much in advance for your help.
Code:
inventory = File.read('inventory_with_50k_names_and_associate_tag.csv')
# Since my CSV is headerless, I'm forcing manual headers
@dictionary_data = CSV.parse(inventory).map do |name|
  Hash[ [:name, :tag].zip(name) ]
end
# ...
# API calls to my app to return an array of hashes is not shown (returns '@app_data')
# ...
@app_data.each do |issue|
  # Extract base server name from FQDN (e.g. server_name1.sub.uk => server_name1)
  derived_name = issue['name'].split('.').first
  # THIS IS THE BLOCK OF CODE that slows down execution 30 fold:
  @dictionary_data.select do |src_server|
    issue['tag'] = src_server[:tag] if src_server[:asset_name].start_with?(derived_name)
  end
end
Sample Data Returned from REST API (@app_data):
@app_data = [{'name' => 'server_name1.sub.emea', 'tag' => 'Europe', 'state' => 'Online'},
             {'name' => 'server_name2.sub.us', 'tag' => 'US E.', 'state' => 'Online'},
             {'name' => 'server_name3.sub.us', 'tag' => 'US W.', 'state' => 'Failover'}]
Sample Dictionary Hash Content:
@dictionary_data = [{:asset_name => 'server_name1-X98765432', :tag => 'Paris, France'},
                    {:asset_name => 'server_name2-Y45678920', :tag => 'New York, USA'},
                    {:asset_name => 'server_name3-Z34534224', :tag => 'Portland, USA'}]
Desired Output:
@app_data = [{'name' => 'server_name1', 'tag' => 'Paris, France', 'state' => 'Up'},
             {'name' => 'server_name2', 'tag' => 'New York, USA', 'state' => 'Up'},
             {'name' => 'server_name3', 'tag' => 'Portland, USA', 'state' => 'F.O'}]
Assuming "no" on both of my questions in the comments:
#!/usr/bin/env ruby
require 'csv'
@dictionary_data = CSV.open('dict_data.csv') { |csv|
  Hash[csv.map { |name, tag| [name[/^.+(?=-\w+$)/], tag] }]
}
@app_data = [{'name' => 'server_name1.sub.emea', 'tag' => 'Europe', 'state' => 'Online'},
             {'name' => 'server_name2.sub.us', 'tag' => 'US E.', 'state' => 'Online'},
             {'name' => 'server_name3.sub.us', 'tag' => 'US W.', 'state' => 'Failover'}]
STATE_MAP = {
  'Online' => 'Up',
  'Failover' => 'F.O.'
}
@app_data = @app_data.map do |server|
  name = server['name'][/^[^.]+/]
  {
    'name' => name,
    'tag' => @dictionary_data[name],
    'state' => STATE_MAP[server['state']],
  }
end
p @app_data
# => [{"name"=>"server_name1", "tag"=>"Paris, France", "state"=>"Up"},
#     {"name"=>"server_name2", "tag"=>"New York, USA", "state"=>"Up"},
#     {"name"=>"server_name3", "tag"=>"Portland, USA", "state"=>"F.O."}]
EDIT: I find it more convenient here to read the CSV without headers, as I don't want it to generate an array of hashes. But to read a headerless CSV as if it had headers, you don't need to touch the data itself, as Ruby's CSV is quite powerful:
CSV.read('dict_data.csv', headers: %i(name tag)).map(&:to_hash)
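The slowdown in the question most likely comes from Array#select walking the whole ~50K-entry dictionary once per issue. A hash keyed by the base server name turns each lookup into a single access. The following is a minimal sketch of that idea with inline sample rows instead of a file; the names dictionary_rows and tag_by_name are made up for illustration, and it assumes an exact match on the base name is acceptable in place of the start_with? prefix test.

```ruby
# Hypothetical sample rows, mirroring the question's dictionary format.
dictionary_rows = [
  ['server_name1-X98765432', 'Paris, France'],
  ['server_name2-Y45678920', 'New York, USA'],
  ['server_name3-Z34534224', 'Portland, USA']
]

# Build the index once: base name (text before the final "-suffix") => tag.
tag_by_name = dictionary_rows.each_with_object({}) do |(asset_name, tag), index|
  index[asset_name[/^.+(?=-\w+$)/]] = tag
end

app_data = [
  { 'name' => 'server_name1.sub.emea', 'tag' => 'Europe' },
  { 'name' => 'server_name3.sub.us', 'tag' => 'US W.' },
  { 'name' => 'unknown_host.sub.us', 'tag' => 'US W.' }
]

app_data.each do |issue|
  derived_name = issue['name'].split('.').first
  # One hash lookup per issue; keep the existing tag when there is no entry.
  issue['tag'] = tag_by_name.fetch(derived_name, issue['tag'])
end

p app_data
```

Building the index costs one pass over the dictionary, after which the per-issue work no longer depends on the dictionary size.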

Ruby CSV input value format

I'm using the Ruby CSV module to read in a CSV file.
One of the values in the CSV file has the format XXX_XXXXX, where each X is a digit. I want to treat this value as a string, but the CSV module reads these values in as XXXXXXXX, as numbers, which I do not want.
Options I am currently using:
f = CSV.read('file.csv', :headers => true, :header_converters => :symbol, :converters => :all)
Is there a way to tell CSV not to do that?
f = CSV.read('file.csv', :headers => true, :header_converters => :symbol)
Leave out the :converters => :all; that one tries (among other things) to convert all numeric-looking strings to numbers.
The :converters => :all option causes this. Try the following:
require "csv"
CSV.parse(DATA, :col_sep => ",", :headers => true, :converters => :all).each do |row|
  puts row["numfield"]
end
__END__
textfield,datetimefield,numfield
foo,2008-07-01 17:50:55.004688,123_45678
bar,2008-07-02 17:50:55.004688,234_56789
# gives
# 12345678
# 23456789
and, without the converters:
CSV.parse(DATA, :col_sep => ",", :headers => true).each do |row|
  puts row["numfield"]
end
__END__
textfield,datetimefield,numfield
foo,2008-07-01 17:50:55.004688,123_45678
bar,2008-07-02 17:50:55.004688,234_56789
# gives
# 123_45678
# 234_56789
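The same comparison can be run without the DATA/__END__ trick by parsing a string. This is a small sketch; csv_text is a made-up two-column sample based on the data above.

```ruby
require 'csv'

# Hypothetical sample with an underscore-separated numeric-looking field.
csv_text = "textfield,numfield\nfoo,123_45678\nbar,234_56789\n"

# With :all, the built-in numeric converters turn the field into an Integer.
converted = CSV.parse(csv_text, headers: true, converters: :all).map { |row| row['numfield'] }

# Without converters, every field stays a String exactly as written.
untouched = CSV.parse(csv_text, headers: true).map { |row| row['numfield'] }

p converted
p untouched
```

The conversion happens because Ruby's Integer() accepts underscores between digits, so the value looks like a valid number to the converter.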

Ruby Sinatra app for downloading a file (as streaming)

I have a Sinatra app in which I want to add a download feature. The download takes data from a table and produces an Excel file for the user to download.
require 'csv'
get '/download' do
  data = [{:name => "john", :age => 12, :state => 'ca'}, {:name => "tony", :age => 22, :state => 'va'}]
  # I want to download this data as excel file and the content of file should be as follows:
  # name,age,state
  # john,12,ca
  # tony,22,va
  # I don't want to save data as a temp file on the server and then throw to user for download
  # rather I want to stream data for download on the browser. So I used this,
  # but it is not working
  send_data (data, :type => 'application/csv', :disposition => 'attachment')
end
What am I doing wrong? Or how can I achieve what I am trying to do? I was trying to follow http://sinatra.rubyforge.org/api/classes/Sinatra/Streaming.html
UPDATE:
I am not married to the send_data method. If streaming blocks my server for that duration, then I am open to alternatives.
get '/download' do
  data = [{:name => "john", :age => 12, :state => 'ca'}, {:name => "tony", :age => 22, :state => 'va'}]
  content_type 'application/csv'
  attachment "myfilename.csv"
  data.each{|k, v|
    p v
  }
end
This works for me. I know it is incomplete, as I still have to write the header row, the commas, and the line breaks into the file. But this works.
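One way to fill in the missing header row, commas, and line breaks is to build the CSV text in memory with CSV.generate from the standard library. The sketch below leaves Sinatra out so it runs on its own; rows_to_csv is a hypothetical helper name, and the assumption is that its return value would be used as the last expression of the route, after the content_type and attachment calls shown above.

```ruby
require 'csv'

# Hypothetical helper: turn an array of hashes into CSV text,
# using the keys of the first hash as the header row.
def rows_to_csv(rows)
  return '' if rows.empty?

  CSV.generate do |csv|
    csv << rows.first.keys
    rows.each { |row| csv << row.values }
  end
end

data = [{:name => "john", :age => 12, :state => 'ca'}, {:name => "tony", :age => 22, :state => 'va'}]
puts rows_to_csv(data)
```

This builds the whole file as one string rather than streaming it, which is usually fine for small tables and avoids writing a temp file on the server.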

FasterCSV default options and their usage

FasterCSV has a default options hash:
DEFAULT_OPTIONS = { :col_sep => ",",
:row_sep => :auto,
:quote_char => '"',
:converters => nil,
:unconverted_fields => nil,
:headers => false,
:return_headers => false,
:header_converters => nil,
:skip_blanks => false,
:force_quotes => false }
These options can be overridden by passing a hash to FasterCSV's read and write methods. Most of them are self-explanatory and easy to use, but I couldn't find documentation explaining their usage. Is this information available anywhere? I haven't been able to find any credible source on the internet, and I have had to resort to just trying them out to see what they do.
FasterCSV replaced the former CSV module in the standard library as of Ruby 1.9 and has since been renamed 'CSV'. Have a look at the documentation of its new method (CSV.new) for the options.
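As a quick illustration of a few of these options with the standard library's CSV (the renamed FasterCSV), here is a hedged sketch. The input string is made up, and only :col_sep, :headers, :skip_blanks, :converters, and :force_quotes are exercised.

```ruby
require 'csv'

# Hypothetical input: ';' as the separator, a header row, and one blank line.
input = "name;age\n\njohn;12\ntony;22\n"

rows = CSV.parse(input,
                 col_sep: ';',           # fields are separated by ';' instead of ','
                 headers: true,          # first line supplies the column names
                 skip_blanks: true,      # drop the empty line
                 converters: :integer)   # turn integer-looking fields into Integers

names = rows.map { |row| row['name'] }
ages  = rows.map { |row| row['age'] }

# :force_quotes applies when writing: every field is wrapped in quote_char.
quoted = CSV.generate(force_quotes: true) { |csv| csv << ['john', 12] }

p names, ages, quoted
```

Options that are not passed keep the defaults listed in the hash above.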
