Download files from URLs in an array, naming them by items in another array - ruby

I have a CSV with two columns, and I am pushing each column's data into an array. Column 2 contains URLs of images that I would like to download. How do I name each file after its corresponding value from column 1?
require "open-uri"
require "csv"
members = []
photos = []
CSV.foreach('members.csv', :headers => true) do |csv_obj|
members << csv_obj[0]
photos << csv_obj[1]
end
photos.each {
|x| File.open({value from members array}, 'wb') do |fo|
fo.write open(x).read
end
}

Try this:
require "open-uri"
require "csv"
members = []
photos = []
CSV.foreach('members.csv', :headers => true) do |csv_obj|
members << csv_obj[0]
photos << csv_obj[1]
end
photos.each_with_index do |photo, index|
File.open(members[index], 'wb') do |fo|
fo.write(URI.open(photo) { |file| file.read }) # URI.open: plain open no longer fetches URLs on Ruby 3.0+
end
end
Notes:
Try to include a snippet of the CSV file too; it helps with testing the code.
The code assumes that the members array contains file names with extensions.
The block form of open is used while downloading so that the stream is closed automatically.
I suggest using long, descriptive variable names; they document your intent and make the code more readable.
The 'wb' argument to File.open ensures the file is written in binary mode.
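As an alternative to indexing into members, the two arrays can be paired with zip. This is only a sketch: name_url_pairs and the sample rows are hypothetical, and it assumes the first column holds the file name and the second the URL. It only builds the pairs; the download step from the answer would go where the puts is.

```ruby
require "csv"

# Hypothetical helper: returns [file_name, url] pairs from CSV text whose
# first column is the file name and whose second column is the URL.
def name_url_pairs(csv_text)
  members = []
  photos = []
  CSV.parse(csv_text, headers: true) do |row|
    members << row[0]
    photos << row[1]
  end
  # zip pairs the elements by position, so no index bookkeeping is needed
  members.zip(photos)
end

# Made-up sample data
sample = <<~ROWS
  name,photo
  alice.jpg,http://example.com/a.jpg
  bob.jpg,http://example.com/b.jpg
ROWS

name_url_pairs(sample).each do |file_name, url|
  puts "#{file_name} <- #{url}" # the File.open / download step would go here
end
```

If the arrays are not needed elsewhere, the same loop could run directly inside the CSV block, which avoids holding both columns in memory.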

Related

How do I get this Nokogiri output to write each object to a column in a csv?

I have this code here which outputs a CSV, but when I open the CSV file its just has a 0 in the first two columns.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'csv'
page = Nokogiri::HTML(open("https://www.drugs.com/pharmaceutical-companies.html"))
puts page.class #=> Nokogiri::HTML::Document
pharma_links = page.css("div.col-list-az a")
link= pharma_links.each{|link| puts link['href'] }
company = pharma_links.each{|link| puts link.text}
CSV.open("/Users/file.csv", "wb") do |csv|
csv << [company, link]
end
The problem is that pharma_links.each { |link| ... } returns its receiver (the ENTIRE NodeSet), not the values produced inside the block, so company and link both end up holding the full set of links, and the puts calls only print to the console. You would then have to re-map each company and link into a new array or hash (or pair them by index, if you are lazy AND you know for certain nothing went wrong in either .each call).
To avoid this, simply construct the CSV while you are looping through the data. Each line of the CSV corresponds to one element of pharma_links, so write one row per link as you iterate:
require 'nokogiri'
require 'open-uri'
require 'csv'
page = Nokogiri::HTML(URI.open("https://www.drugs.com/pharmaceutical-companies.html")) # URI.open is needed on Ruby 3.0+
# puts page.class #=> Nokogiri::HTML::Document
pharma_links = page.css("div.col-list-az a")
# Create the CSV and iterate through the links while creating it
# You can also add headers to the CSV on instantiation
CSV.open("file.csv", "wb", write_headers: true, headers: ['url','description']) do |csv|
pharma_links.each do |link|
puts "Adding #{link.text}" # prove that it works :)
csv << [link['href'], link.text]
end
end
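To illustrate the point about return values, here is a small sketch that uses plain hashes as a stand-in for the Nokogiri links. The sample data and the link_rows_csv helper are made up for illustration. It shows that each hands back its receiver while map collects the block's results, and then builds the rows during iteration.

```ruby
require "csv"

# Hypothetical helper: builds CSV text with one row per link.
def link_rows_csv(links)
  CSV.generate do |csv|
    links.each { |link| csv << [link[:href], link[:text]] }
  end
end

# Made-up stand-in for the NodeSet returned by page.css(...)
links = [
  { href: "/pharm/a.html", text: "A Pharma" },
  { href: "/pharm/b.html", text: "B Pharma" }
]

from_each = links.each { |link| link[:text] } # returns links itself
from_map  = links.map { |link| link[:text] }  # returns the block's results

puts from_each.equal?(links) # true: each returned its receiver
p from_map                   # ["A Pharma", "B Pharma"]
puts link_rows_csv(links)
```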

How do I access the filename of the CSV file I just opened?

I have a method that looks like this:
def extract_websites
websites = []
csvs = Dir["#{@dir_name}/#{@state}/*.csv"]
csvs.each do |csv|
CSV.foreach(csv, headers: true) do |row|
websites << row['Website']
end
end
websites.uniq!
end
But what I want to do is, for each CSV file that is opened, detect the name of that file.
How do I do that?
In your sample the variable csv holds the path of the CSV file.
That block variable is also visible inside any nested blocks: scope is shared downwards into inner blocks, but not upwards out of the block.
So:
def extract_websites
websites = []
csvs = Dir["#{@dir_name}/#{@state}/*.csv"]
csvs.each do |csv|
puts File.expand_path(csv) # show the full path for each csv file
CSV.foreach(csv, headers: true) do |row|
puts csv # shows unexpanded path for each row of a csv
websites << row['Website']
end
end
websites.uniq # uniq! would return nil when there are no duplicates
end
should print out the path for each CSV file and for each row.
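If only the file name is wanted rather than the whole path, File.basename can strip the directory part. The sketch below is an assumption-laden variant, not the original method: websites_by_file is a hypothetical helper that takes the directory as an argument and groups the 'Website' values by file name.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: maps each CSV file's base name to the unique
# values found in its 'Website' column.
def websites_by_file(dir)
  Dir[File.join(dir, "*.csv")].sort.each_with_object({}) do |path, result|
    sites = []
    # path is still visible inside this nested block
    CSV.foreach(path, headers: true) { |row| sites << row["Website"] }
    result[File.basename(path)] = sites.uniq
  end
end

# Made-up sample files in a temporary directory
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "ny.csv"), "Name,Website\nA,a.com\nB,a.com\n")
  File.write(File.join(dir, "ca.csv"), "Name,Website\nC,c.com\n")
  p websites_by_file(dir)
end
```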

Reading every line in a CSV and using it to query an API

I have the following Ruby code:
require 'octokit.rb'
require 'csv.rb'
CSV.foreach("actors.csv") do |row|
CSV.open("node_attributes.csv", "wb") do |csv|
csv << [Octokit.user "userid"]
end
end
I have a csv called actors.csv where every row has one entry - a string with a userid.
I want to go through all the rows, and for each row do Octokit.user "userid", and then store the output from each query on a separate row in a CSV - node_attributes.csv.
My code does not seem to do this? How can I modify it to make this work?
require 'csv'
DOC = 'actors.csv'
DOD = 'new_output.csv'
holder = CSV.read(DOC)
You can navigate it by index:
holder[0][0]
# => data in the array
holder[1][0]
# => more data in the array
Make sense?
#make this a loop
profile = []
profile[0] = holder[0][0]
profile[1] = holder[1][0]
profile[2] = 'whatever it is you want to store in the new cell'
CSV.open(DOD, "a") do |data|
data << profile # append the array as one row
end
#end the loop here
That last bit of code will append whatever you want as a new row in the output CSV file.
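Put together as a loop, the idea might look like the sketch below. It opens the output file once, outside the row loop, so earlier rows are not overwritten. write_attributes is a hypothetical helper, and the block passed to it is a stand-in for the API call from the question; nothing here calls Octokit.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: reads one id per row from input_path, yields each id
# to the caller's block, and writes the id plus the block's result as one row.
def write_attributes(input_path, output_path)
  CSV.open(output_path, "w") do |out|
    CSV.foreach(input_path) do |row|
      user_id = row[0]
      out << [user_id, yield(user_id)]
    end
  end
end

# Made-up sample input in a temporary directory
Dir.mktmpdir do |dir|
  input = File.join(dir, "actors.csv")
  output = File.join(dir, "node_attributes.csv")
  File.write(input, "alice\nbob\n")
  # The block stands in for an API lookup on each user id
  write_attributes(input, output) { |user_id| user_id.upcase }
  puts File.read(output)
end
```

In the question's setting, the block would wrap the Octokit lookup and return whichever attributes should go into the row.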

Need help exporting results parsed via Nokogiri to CSV. Only the last parsed result is shown, why?

This is killing me and searching here and the big G is confusing me even more.
I followed the tutorial at Railscasts #190 on Nokogiri and was able to write myself a nice little parser:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://www.target.com/c/movies-entertainment/-/N-5xsx0/Ntk-All/Ntt-wwe/Ntx-matchallpartial+rel+E#navigation=true&facetedValue=/-/N-5xsx0&viewType=medium&sortBy=PriceLow&minPrice=0&maxPrice=10&isleaf=false&navigationPath=5xsx0&parentCategoryId=9975218&RatingFacet=0&customPrice=true"
doc = Nokogiri::HTML(open(url))
puts doc.at_css("title").text
doc.css(".standard").each do |item|
title = item.at_css("span.productTitle a")[:title]
format = item.at_css("span.description").text
price = item.at_css(".price-label").text[/\$[0-9\.]+/]
link = item.at_css("span.productTitle a")[:href]
puts "#{title}, #{format}, #{price}, #{link}"
end
I'm happy with the results and able to see it in the Windows console. However, I want to export the results to a CSV file and have tried numerous ways (with no luck) and I know I'm missing something. My latest updated code (after downloading the html files) is below:
require 'rubygems'
require 'nokogiri'
require 'csv'
@title = Array.new
@format = Array.new
@price = Array.new
@link = Array.new
doc = Nokogiri::HTML(open("index1.html"))
doc.css(".standard").each do |item|
@title << item.at_css("span.productTitle a")[:title]
@format << item.at_css("span.description").text
@price << item.at_css(".price-label").text[/\$[0-9\.]+/]
@link << item.at_css("span.productTitle a")[:href]
end
CSV.open("file.csv", "wb") do |csv|
csv << ["title", "format", "price", "link"]
csv << [@title, @format, @price, @link]
end
It works and spits a file out for me, but with just the last result. I also followed the tutorial at Andrew!: Web Scraping..., but trying to mix what I'm trying to achieve with someone else's process is confusing.
I assume it's looping through all of the results and only printing the last. Can someone give me pointers on how I should loop this (if that's the problem) so that all the results are in their respective columns?
Thanks in advance.
You're storing values in four arrays, but you're not enumerating the arrays when you generate your output.
Here is a possible fix:
CSV.open("file.csv", "wb") do |csv|
csv << ["title", "format", "price", "link"]
until @title.empty?
csv << [@title.shift, @format.shift, @price.shift, @link.shift]
end
end
Note that this is a destructive operation that shifts the values off of the arrays one at a time, so in the end they will all be empty.
There are more efficient ways to read and convert the data, but this will hopefully do what you want for now.
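One non-destructive option is to combine the arrays with zip, which leaves them intact. This is a sketch with made-up sample data; rows_from_columns is a hypothetical helper, and local variables are used in place of the instance variables.

```ruby
require "csv"

# Hypothetical helper: turns four parallel arrays into CSV text,
# one row per position, without modifying the arrays.
def rows_from_columns(titles, formats, prices, links)
  CSV.generate do |csv|
    csv << %w[title format price link]
    titles.zip(formats, prices, links).each { |row| csv << row }
  end
end

# Made-up sample data standing in for the scraped values
titles  = ["Movie A", "Movie B"]
formats = ["DVD", "Blu-ray"]
prices  = ["$5.00", "$9.99"]
links   = ["/p/a", "/p/b"]

puts rows_from_columns(titles, formats, prices, links)
puts titles.length # still 2: nothing was shifted off
```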
There are several things you could do to write this more in the "Ruby way":
require 'rubygems'
require 'nokogiri'
require 'csv'
doc = Nokogiri::HTML(open("index1.html"))
CSV.open('file.csv', 'wb') do |csv|
csv << %w[title format price link]
doc.css('.standard').each do |item|
csv << [
item.at_css('span.productTitle a')[:title],
item.at_css('span.description').text,
item.at_css('.price-label').text[/\$[0-9\.]+/],
item.at_css('span.productTitle a')[:href]
]
end
end
Without sample HTML it's not possible to test this, but, based on your code, it looks like it'd work.
Notice that in your code you're using instance variables. They're not necessary because you aren't defining a class to have an instance of. You can use local variables instead.

How to create CSV file using CSV gem in ruby 1.9.2?

I am new to Ruby 1.9.2. How do I generate a CSV file from a single Ruby script?
Here is the script I wrote:
require 'rubygems'
require 'pg'
require 'active_record'
require 'csv'
class AttachEmail
def generate_csv
begin
filename = "csvout.csv"
users = User.all
users.each do |u|
products = Product.find(:all,:conditions=>["user_id=?",u.id])
CSV.open(filename, 'w') do |csv|
# header row
user_name = u.name
csv << ['Report']
csv << ['Name','Product', 'Item Count']
products.each do |product|
csv << [user_name, product.title,product.count]
end
end
end
rescue Exception => e
puts e
end
end
generate= AttachEmail.new
generate.generate_csv
When I run this script, it produces output like this:
A B C
0 Report
1 Name,Product,Item Count
2 user1,PD123,10,990
But I need each value in a separate column. Can you please help me? Thanks in advance.
First of all, you need to swap loops if you are trying to put all the user data in the same file, and not overwrite it for every user:
CSV.open(filename, 'w') do |csv|
users.each do |u|
products = Product.find(:all,:conditions=>["user_id=?",u.id])
Next, fix your Excel settings (I suspect the output is taken from it, right?) to use a comma as the separator, not "space or comma".
If it still doesn't work, come back with the file contents attached and an example of a CSV file that works for you.
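A minimal sketch of the swapped loops follows. To keep it runnable without a database, User and Product here are plain Struct stand-ins, not the ActiveRecord models from the question, and report_csv and the item_count field are hypothetical names. The point is only that the CSV is opened once and every user's rows are written inside it.

```ruby
require "csv"

# Struct stand-ins for the ActiveRecord models (assumptions for illustration)
User = Struct.new(:id, :name)
Product = Struct.new(:user_id, :title, :item_count)

# Hypothetical helper: writes one header row, then one row per product,
# with the loop over users INSIDE the single CSV block.
def report_csv(users, products)
  CSV.generate do |csv|
    csv << ["Name", "Product", "Item Count"]
    users.each do |user|
      products.select { |item| item.user_id == user.id }.each do |product|
        csv << [user.name, product.title, product.item_count]
      end
    end
  end
end

# Made-up sample data
users = [User.new(1, "user1"), User.new(2, "user2")]
products = [Product.new(1, "PD123", 10), Product.new(2, "PD456", 3)]
puts report_csv(users, products)
```

CSV.generate is used here only so the result can be printed; CSV.open(filename, 'w') takes the same kind of block when writing to a file.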

Resources