I tried the following (with some success):
require 'open-uri'
require 'chunky_png'
image_url = "http://res.cloudinary.com/houlihan-lokey/image/upload/c_limit,h_75,w_120/ixl7z4c1czlvrqnbt0mm.png"
# image_url = "http://res.cloudinary.com/houlihan-lokey/image/upload/c_limit,h_75,w_120/zqw2pgczdzbtyj3aib2o.png" # this works
image_file = open(image_url)
image = ChunkyPNG::Image.from_file(image_file)
puts image.width
Some images work and others don't. The error is:
TypeError: no implicit conversion of StringIO into String
from /Users/theuser/.rvm/gems/ruby-2.0.0-p247/gems/chunky_png-1.3.3/lib/chunky_png/datastream.rb:66:in `initialize'
from /Users/theuser/.rvm/gems/ruby-2.0.0-p247/gems/chunky_png-1.3.3/lib/chunky_png/datastream.rb:66:in `open'
from /Users/theuser/.rvm/gems/ruby-2.0.0-p247/gems/chunky_png-1.3.3/lib/chunky_png/datastream.rb:66:in `from_file'
from /Users/theuser/.rvm/gems/ruby-2.0.0-p247/gems/chunky_png-1.3.3/lib/chunky_png/canvas/png_decoding.rb:53:in `from_file'
from (irb):5
from /Users/theuser/.rvm/rubies/ruby-2.0.0-p247/bin/irb:16:in `<main>'
I will be running this on Heroku and am wondering -- is there a reliable way to achieve this without creating temporary files?
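The issue was with files that were too small for open to create a temp file for. For small responses (under roughly 10 KB) open-uri returns a StringIO instead of a Tempfile, and ChunkyPNG's from_file passes its argument to File.open, which cannot open a StringIO.
The solution is to avoid relying on temp files: read the image into memory and use ChunkyPNG's Image.from_blob: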
require 'open-uri'
require 'chunky_png'
image_url = "http://res.cloudinary.com/houlihan-lokey/image/upload/c_limit,h_75,w_120/ixl7z4c1czlvrqnbt0mm.png"
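# Read the whole response body into a String, whether open-uri returned a StringIO or a Tempfile
# (URI.open is available from Ruby 2.5; on Ruby 3.0+ Kernel#open no longer fetches URLs)
image_data = URI.open(image_url, &:read)
image = ChunkyPNG::Image.from_blob(image_data)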
puts image.width
This reads the whole image into memory, so it may not suit very large images, but it is fine for my application.
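A minimal sketch of why the two kinds of response behave differently, assuming you only need the raw bytes. The helper names read_image_bytes and file_open_accepts? are hypothetical. The first reads either a StringIO or a Tempfile into a String (what from_blob takes); the second shows that File.open, which from_file relies on according to the stack trace, accepts a Tempfile but rejects a StringIO.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: open-uri may hand back a StringIO (small bodies) or a
# Tempfile (larger bodies). Both respond to #read, so this works for either.
def read_image_bytes(io)
  io.rewind if io.respond_to?(:rewind) # start from the beginning in case something already read from it
  io.read
end

# Hypothetical helper: File.open accepts a path or an object with #to_path
# (a Tempfile has one) but raises TypeError for a StringIO, which is the
# "no implicit conversion of StringIO into String" error from the question.
def file_open_accepts?(io)
  File.open(io) { true }
rescue TypeError
  false
end
```

With ChunkyPNG loaded, usage would look like ChunkyPNG::Image.from_blob(read_image_bytes(URI.open(image_url))).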
I have a simple crawler written in Ruby that should crawl specific sites and save data into a CSV file, but when I run it from the Windows command line I get this error:
C:/Ruby24-x64/lib/ruby/2.4.0/csv.rb:1282:in `initialize': No such file or directory @ rb_sysopen - csv/boxers.csv (Errno::ENOENT)
from C:/Ruby24-x64/lib/ruby/2.4.0/csv.rb:1282:in `open'
from C:/Ruby24-x64/lib/ruby/2.4.0/csv.rb:1282:in `open'
from boxers.rb:18:in `<main>'
This is the script:
#!/usr/bin/env ruby
require 'csv'
require 'mechanize'
agent = Mechanize.new{ |agent| agent.history.max_size=0 }
agent.user_agent = 'Mozilla/5.0'
base = "http://baseurl.com/"
division = ARGV[0]
search_url = "http://baseurl.com/ratings.php?sex=M&division=#{division}&pageID="
path='//*[@id="mainContent"]/table/tr[position()>2]'
boxers = CSV.open("csv/file.csv","w")
url = search_url+"1"
begin
page = agent.get(url)
rescue
print " -> error, retrying\n"
retry
end
end
boxers.close
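A common cause of Errno::ENOENT from CSV.open is that the target directory (here csv/) does not exist relative to the directory the script is run from; opening a file for writing creates the file but not missing parent directories. A minimal sketch, assuming the CSV should live next to the script: the hypothetical helper prepare_output_path resolves the path and creates its directory with FileUtils.mkdir_p before the file is opened.

```ruby
require 'fileutils'

# Hypothetical helper: resolve +relative_path+ against +base_dir+ (the
# script's own directory by default, rather than the current working
# directory) and make sure its parent directory exists.
def prepare_output_path(relative_path, base_dir = __dir__)
  full_path = File.expand_path(relative_path, base_dir)
  FileUtils.mkdir_p(File.dirname(full_path)) # no-op if the directory already exists
  full_path
end

# In the crawler this would wrap the path given to CSV.open, e.g.:
#   boxers = CSV.open(prepare_output_path("csv/file.csv"), "w")
```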
I have a method that returns just fine when I pass in a literal URL (see code below). When I build the URL with string interpolation (#{}) so that I can use a different Craigslist location, it throws the error shown at the bottom. My question is twofold:
Why doesn't Nokogiri allow me to open this?
Can I change this to accept the URL?
Code:
def get_post_date(listing_url)
# This method takes in a page and returns a date hopefully in a date format
# but right now text
listing = Nokogiri::HTML(open(listing_url)).css("p")
setter = ""
for element in listing
if element.css('time').text!=""&&setter==""
post_time = "poop" # Time.parse(element.css('time').text)
return "poop"
end
end
end
location = "sfbay"
# THIS throws an error
p get_post_date("#{location}.craigslist.org/sfc/vac/4248712420.html")
# THIS works
p get_post_date("sfbay.craigslist.org/sfc/vac/4248712420.html")
Error:
c:\>ruby cljobs.rb
C:/Ruby193/lib/ruby/1.9.1/open-uri.rb:35:in `initialize': No such file or directory - sfbay.craigslist.org/sfc/vac/4248712420.html (Errno::ENOENT)
from C:/Ruby193/lib/ruby/1.9.1/open-uri.rb:35:in `open'
from C:/Ruby193/lib/ruby/1.9.1/open-uri.rb:35:in `open'
from cljobs.rb:7:in `get_post_date'
from cljobs.rb:40:in `<main>'
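In order to open a URL you need to require OpenURI; otherwise open will try to open a local file. open-uri also only recognizes strings that start with a scheme such as http://, so the URL must include one. The path in the error message has no scheme, which is why it was looked up as a file.
require 'open-uri'
require 'nokogiri'
# listing_url must start with a scheme, e.g. "http://#{location}.craigslist.org/sfc/vac/4248712420.html"
listing = Nokogiri::HTML(open(listing_url))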
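A minimal sketch of building the listing URL so the scheme is never missing. The helper names listing_url and looks_like_url? are hypothetical. The first uses URI::HTTP.build from the standard library. The second uses a pattern that roughly mirrors the check open-uri applies before treating a string as a URL rather than a file path.

```ruby
require 'uri'

# Roughly the pattern open-uri uses to decide whether a string is a URL.
SCHEME_PATTERN = %r{\A[A-Za-z][A-Za-z0-9+\-.]*://}

# Hypothetical helper: true if the string starts with a URL scheme, i.e.
# open-uri would fetch it instead of looking for a local file.
def looks_like_url?(str)
  SCHEME_PATTERN.match?(str)
end

# Hypothetical helper: build an absolute http:// URL for a Craigslist
# listing in the given location.
def listing_url(location, path)
  URI::HTTP.build(host: "#{location}.craigslist.org", path: path).to_s
end
```

The call from the question would then become get_post_date(listing_url(location, "/sfc/vac/4248712420.html")).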
I'm trying to fetch a remote XML file with Mechanize to get Icecast status information, but I'm having trouble converting the Mechanize::File object into a string or another format that XmlSimple can work with.
The XML document looks like this:
<icestats>
<admin>donschoe@stackoverflow.com</admin>
<!-- ... -->
</icestats>
My code currently looks like this:
require 'mechanize'
require 'xmlsimple'
server = 'example.net'
port = 8000
user = 'stackoverflow'
password = 'hackme'
agent = Mechanize.new
agent.user_agent_alias = 'Linux Firefox'
agent.add_auth("http://#{server}:#{port}/admin/status.xml", user, password)
agent.get("http://#{server}:#{port}/admin/status.xml")
xml = agent.current_page
status = XmlSimple.xml_in(xml)
puts status['admin']
This should output: donschoe@stackoverflow.com
But it throws:
/home/user/.gem/ruby/1.9.1/gems/xml-simple-1.1.2/lib/xmlsimple.rb:191:in 'xml_in': Could not parse object of type: <Mechanize::File>. (ArgumentError)
I understand that XmlSimple needs a string, so I tried to convert the Mechanize::File object to a string by replacing the second-to-last line with:
status = XmlSimple.xml_in(xml.to_s)
But this throws an even stranger exception:
/usr/lib64/ruby/1.9.1/rexml/parsers/baseparser.rb:406:in `block in pull_event': Undefined prefix Mechanize: found (REXML::UndefinedNamespaceException)
from /usr/lib64/ruby/1.9.1/set.rb:222:in `block in each'
from /usr/lib64/ruby/1.9.1/set.rb:222:in `each_key'
from /usr/lib64/ruby/1.9.1/set.rb:222:in `each'
from /usr/lib64/ruby/1.9.1/rexml/parsers/baseparser.rb:404:in `pull_event'
from /usr/lib64/ruby/1.9.1/rexml/parsers/baseparser.rb:183:in `pull'
from /usr/lib64/ruby/1.9.1/rexml/parsers/treeparser.rb:22:in `parse'
from /usr/lib64/ruby/1.9.1/rexml/document.rb:231:in `build'
from /usr/lib64/ruby/1.9.1/rexml/document.rb:43:in `initialize'
from /home/user/.gem/ruby/1.9.1/gems/xml-simple-1.1.2/lib/xmlsimple.rb:965:in `new'
from /home/user/.gem/ruby/1.9.1/gems/xml-simple-1.1.2/lib/xmlsimple.rb:965:in `parse'
from /home/user/.gem/ruby/1.9.1/gems/xml-simple-1.1.2/lib/xmlsimple.rb:164:in `xml_in'
from /home/user/.gem/ruby/1.9.1/gems/xml-simple-1.1.2/lib/xmlsimple.rb:203:in `xml_in'
from debugging.rb:16:in `<main>'
What's wrong with my approach? When I download the XML file and use the local copy, the code above works as desired.
I'm especially looking for solutions with Mechanize rather than Nokogiri.
Try changing:
xml = agent.current_page
to:
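xml = agent.current_page.body
body returns the response as a plain String. Calling to_s on the Mechanize::File object instead gives its generic object description (something like #<Mechanize::File:0x...>), which REXML then tries to parse as markup, hence the "Undefined prefix Mechanize: found" error.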
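A minimal sketch of the body-versus-to_s difference that runs without a server. It uses a hypothetical stand-in class, StubFile, instead of Mechanize itself, and parses with REXML directly (the parser XmlSimple uses underneath, according to the stack trace).

```ruby
require 'rexml/document'

# Hypothetical stand-in for Mechanize::File: it exposes the raw response
# text through #body and keeps the default Object#to_s, which is what the
# "Undefined prefix Mechanize:" error suggests the real object does.
class StubFile
  attr_reader :body

  def initialize(body)
    @body = body
  end
end

xml = <<~XML
  <icestats>
    <admin>donschoe@stackoverflow.com</admin>
  </icestats>
XML

page = StubFile.new(xml)

# to_s is only the generic object description, not the XML document.
puts page.to_s

# body is the actual XML string, which the parser can handle.
doc = REXML::Document.new(page.body)
puts doc.root.elements['admin'].text
```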
I have a problem with a script that creates a simple .xls file and writes data to one cell. Here is the code:
require 'spreadsheet'
class Filter
def filter
@excel = Spreadsheet::Workbook.new
@sheet = @excel.create_worksheet
@sheet[0, 0] = "test"
@excel.write 'test.xls'
end
end
f = Filter.new
f.filter
But it raises this error:
C:/Ruby193/lib/ruby/gems/1.9.1/gems/ruby-ole-1.2.11.5/lib/ole/storage/base.rb:62:in `write_nonblock': Bad file descriptor - test.xls (Errno::EBADF)
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/ruby-ole-1.2.11.5/lib/ole/storage/base.rb:62:in `initialize'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/ruby-ole-1.2.11.5/lib/ole/storage/base.rb:78:in `new'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/ruby-ole-1.2.11.5/lib/ole/storage/base.rb:78:in `open'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/spreadsheet-0.7.4/lib/spreadsheet/excel/writer/workbook.rb:453:in `write_from_scratch'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/spreadsheet-0.7.4/lib/spreadsheet/excel/writer/workbook.rb:631:in `write_workbook'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/spreadsheet-0.7.4/lib/spreadsheet/writer.rb:15:in `block in write'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/spreadsheet-0.7.4/lib/spreadsheet/writer.rb:14:in `open'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/spreadsheet-0.7.4/lib/spreadsheet/writer.rb:14:in `write'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/spreadsheet-0.7.4/lib/spreadsheet/workbook.rb:116:in `write'
from filter.rb:10:in `filter'
from filter.rb:15:in `<main>'
This happens because ruby-ole 1.2.11.5 does not support the Windows platform (for more detail, see the ruby-ole issue).
You can use ruby-ole 1.2.11.4 to avoid this problem by activating that version before requiring spreadsheet:
require 'rubygems'
gem 'ruby-ole','1.2.11.4'
require 'spreadsheet'
I've seen these before. First verify that you can write to that file's location.
My guess is that either the file is already open in Excel or your antivirus is blocking the 'threat'.
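The first suggestion (verify that you can write to that location) can be checked from Ruby before calling write. A minimal sketch with a hypothetical helper name, write_problem: it checks that the target directory exists and is writable and, if the file already exists, tries to open it for appending. On Windows that typically fails with a SystemCallError such as Errno::EACCES when another program like Excel holds the file open. It cannot detect antivirus interference that only happens during the actual write.

```ruby
# Hypothetical helper: returns nil if writing +path+ looks possible,
# otherwise a short description of the problem.
def write_problem(path)
  full = File.expand_path(path)
  dir = File.dirname(full)
  return "directory #{dir} does not exist" unless File.directory?(dir)
  return "directory #{dir} is not writable" unless File.writable?(dir)

  if File.exist?(full)
    # Opening in append mode leaves the contents untouched; on Windows it
    # typically raises (e.g. Errno::EACCES) if another program holds the file.
    File.open(full, "ab") {}
  end
  nil
rescue SystemCallError => e
  "cannot open #{full} for writing: #{e.message}"
end

problem = write_problem('test.xls')
puts problem if problem
```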
I am trying to retrieve .csv files from an FTP site and save them all locally in the same folder. My code looks like this:
#! /usr/bin/ruby
require 'logger'
require 'fileutils'
require 'net/ftp'
require 'rubygems'
require 'mysql2'
require 'roo'
require 'date'
# logging setup
log = Logger.new("/path_to_logs/ftp_log.log", 10, 1024000)
log.level = Logger::INFO
export_ftp_path = '/Receive/results/'
export_work_path ='/Users/pierce/results_exports/'
Net::FTP.open('host', 'username', 'password') do |ftp|
log.info("Logged into FTP")
ftp.passive = true
ftp.chdir("#{export_ftp_path}")
ftp.list.each do |file|
log.info("Found file #{file}")
new_file = file[56..115] #take part of the file name and remove spaces and periods
new_file = new_file.gsub(/[.]+/, "")
new_file = new_file.gsub(/\s/, "0")
ftp.gettextfile(file,"#{new_file}")
log.info("Downloaded file #{new_file}")
end
end
And here is the error I receive:
/Users/pierce/.rbenv/versions/1.9.2-p290/lib/ruby/1.9.1/net/ftp.rb:560:in `initialize': No such file or directory - (Errno::ENOENT)
from /Users/pierce/.rbenv/versions/1.9.2-p290/lib/ruby/1.9.1/net/ftp.rb:560:in `open'
from /Users/pierce/.rbenv/versions/1.9.2-p290/lib/ruby/1.9.1/net/ftp.rb:560:in `gettextfile'
from ftp_test.rb:44:in `block (2 levels) in <main>'
from ftp_test.rb:33:in `each'
from ftp_test.rb:33:in `block in <main>'
from /Users/pierce/.rbenv/versions/1.9.2-p290/lib/ruby/1.9.1/net/ftp.rb:116:in `open'
As suggested, here are the values printed by puts file and puts new_file:
file = -rwxr-xr-x 1 1130419 114727 9546 May 17 08:11 results_Wed. 16 May 2012.csv
new_file = results_Wed0230May02012csv
Any suggestions on what to change in gettextfile or within my script to get the files saved correctly?
Use nlst instead of list when you only need the names of the files in a directory. list returns full directory-listing lines (permissions, owner, size, date and name), which you would otherwise have to parse properly.
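When you request the file, you must use the original filename, including all spaces; the local name you save it under can be anything you like, with or without spaces. Your script was passing a whole listing line as the remote filename, so it was requesting the wrong file. The ENOENT in the trace is raised while gettextfile opens the local file, and the file name in the message is empty, which suggests the sliced new_file was also wrong for at least one entry. Use nlst here instead: it returns plain filenames, so no conversion or parsing is needed.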
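A sketch of the download loop rewritten around nlst, assuming the goal is to save each file into a local folder (export_work_path in the question) under a cleaned-up name. The remote file is requested by its exact name; only the local copy is renamed. The method names and the renaming rule in local_name_for (keep the extension, drop other periods, turn whitespace into underscores) are assumptions, not part of the original script.

```ruby
require 'net/ftp'

# Hypothetical naming rule: keep the extension, drop any other periods and
# turn runs of whitespace into underscores.
def local_name_for(remote_name)
  ext = File.extname(remote_name)
  base = File.basename(remote_name, ext)
  base.delete('.').gsub(/\s+/, '_') + ext
end

# Hypothetical method: download every file in +remote_dir+ into +local_dir+.
def download_all(host, user, password, remote_dir, local_dir)
  Net::FTP.open(host, user, password) do |ftp|
    ftp.passive = true
    ftp.chdir(remote_dir)
    # nlst returns bare file names, unlike list, which returns full listing lines
    ftp.nlst.each do |remote_name|
      # Request the file by its exact remote name, spaces included;
      # only the local copy gets the cleaned-up name.
      local_path = File.join(local_dir, local_name_for(remote_name))
      ftp.gettextfile(remote_name, local_path)
    end
  end
end
```

With the values from the question, the call would be download_all('host', 'username', 'password', '/Receive/results/', '/Users/pierce/results_exports/'), and "results_Wed. 16 May 2012.csv" would be saved as results_Wed_16_May_2012.csv.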