Error trying to image scrape - ruby

I'm trying to make a Ruby program that will automatically download the most recent Penny-Arcade comic. Here's the code I have:
require 'mechanize'
agent = Mechanize.new
date_string = Date.today.to_s
page = agent.get('http://www.penny-arcade.com/comic/')
puts page
art_link = page.at('div#comicFrame > a > img')['src']
File.open(date_string, 'wb') do |fo|
  fo.write open(art_link).read
end
And the output I get from running the program is:
$ ruby grab_PA.rb
#<Mechanize::Page:0x007f38bc743af0>
grab_PA.rb:12:in `initialize': No such file or directory # rb_sysopen - http://art.penny-arcade.com/photos/i-QpzhbpN/0/1050x10000/i-QpzhbpN-1050x10000.jpg (Errno::ENOENT)
from grab_PA.rb:12:in `open'
from grab_PA.rb:12:in `block in <main>'
from grab_PA.rb:11:in `open'
from grab_PA.rb:11:in `<main>'
But if I copy that exact link and put it into Firefox, it opens up the image. What's happening here? The program does write an image file to the program's directory with today's date, but the file is empty.

open takes an argument that's a filename, not a URL. If you want to access a URL, you would normally have to do a lot more than simply open a file.
Luckily, Ruby provides a nice wrapper for Net::HTTP, called open-uri.
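Just drop the following line at the top of your program and it should work fine:
require 'open-uri'
(On Ruby 3.0 and later, open-uri no longer overrides Kernel#open, so you also need to change open(art_link) to URI.open(art_link).)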
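For reference, here is a minimal sketch of the whole download step with open-uri, assuming Ruby 2.5 or later (where URI.open is available). The method name save_image and its dir: keyword are hypothetical; like the question's code, it names the file after today's date, here with the image's extension appended.

```ruby
require 'date'
require 'open-uri'

# Hypothetical helper: download +image_url+ into +dir+, naming the file after
# today's date plus the URL's extension (e.g. "2015-06-01.jpg").
# Returns the path that was written.
def save_image(image_url, dir: '.')
  # Assumes the URL path ends in the file extension, as the comic URL does.
  dest = File.join(dir, "#{Date.today}#{File.extname(image_url)}")
  # URI.open fetches http(s) URLs through open-uri; a plain string with no
  # scheme falls through to an ordinary local-file open.
  URI.open(image_url, 'rb') do |remote|
    File.open(dest, 'wb') { |file| file.write(remote.read) }
  end
  dest
end
```

In the question's script this would replace the File.open block with a single call: save_image(art_link).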

Get the image's src (the question's code already does this: art_link holds it), then fetch that URL with agent.get(art_link).
After that, agent.page holds only the image, and you can save it with agent.page.save('image_path_and_name').

Related

Can't open a file in docx gem when I call it through a string

I am using the docx gem to read a docx file. The code works when I write it like this:
require 'docx'
doc = Docx::Document.open('example.docx')
puts doc
It prints the doc perfectly. However, I need to get the path from the user with gets, like this:
require 'docx'
puts "Provide the path of the document:"
document_path = gets.chomp.tr(" ", "") #it makes sure that any accidental whitespace is removed.
doc = Docx::Document.open(document_path)
puts doc
With this code I expect to get the same result as with the former one. The only difference is that the document's path comes from a string variable instead of being written literally in the call. Instead, I get this error:
/var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/file.rb:82:in `initialize': File '/root/Documents/Projects/Wordsworth/example.docx' (Zip::Error)
not found
from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/file.rb:96:in `new'
from /var/lib/gems/2.3.0/gems/rubyzip-1.1.7/lib/zip/file.rb:96:in `open'
from /var/lib/gems/2.3.0/gems/docx-0.2.07/lib/docx/document.rb:25:in `initialize'
from /var/lib/gems/2.3.0/gems/docx-0.2.07/lib/docx/document.rb:50:in `new'
from /var/lib/gems/2.3.0/gems/docx-0.2.07/lib/docx/document.rb:50:in `open'
from test.rb:17:in `<main>'
I visited the docx gem's GitHub page, but in its examples the document name is always written out literally by the coder, never passed in as a variable. I hope I can get some help. Thanks a lot!
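One way to narrow this kind of problem down is to validate the typed path before the gem ever sees it. The sketch below is a hypothetical helper, not part of the docx gem: it strips only leading and trailing whitespace, expands the path to an absolute one, and fails early with a readable message if no file exists there.

```ruby
# Hypothetical helper (not part of the docx gem): turn the text typed by the
# user into an absolute path, or fail with a clear message.
def resolve_document_path(input)
  # strip drops leading/trailing whitespace, including the newline from
  # gets, but keeps spaces inside the path (tr(" ", "") removes all of them).
  path = File.expand_path(input.strip)
  # Check before handing the path to the gem, so a typo is reported here
  # rather than as a Zip::Error deep inside rubyzip.
  raise ArgumentError, "no such file: #{path.inspect}" unless File.file?(path)
  path
end
```

With the docx gem loaded, the script would then call Docx::Document.open(resolve_document_path(gets)). File.expand_path also expands a leading ~ and resolves relative input against the current working directory, which is worth printing (Dir.pwd) when the result is not what you expect.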

Require Not Working in Ruby

I have two files, person.rb and contact_info.rb: person.rb contains a class and contact_info.rb contains a module. When I run load 'person.rb' in irb, it works fine as long as load 'contact_info.rb' is at the top of person.rb.
When I switch that line to require 'contact_info.rb' and run require 'person.rb' in irb, I get an error (included at the bottom of this post) that refers to a completely different file path.
I've googled some solutions, such as using './person.rb' and require_relative 'person', but these don't work either.
I've simplified the code within files to make things easier.
Any help would be awesome.
CODE IN THE person.rb FILE
require 'contact_info.rb'
class Person
  include ContactInfo
end
CODE IN THE contact_info.rb FILE
module ContactInfo
  #some code
end
ERROR MESSAGE THAT I'M GETTING - when I type in require 'person.rb' in irb.
NOTE: when I drag the file into the command line, its path is /Volumes/New\ Passport/All\ Creative/Ruby/module_folder_tut/person.rb
LoadError: cannot load such file -- person.rb
from /Users/paulknight/.rvm/rubies/ruby-2.4.1/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
from /Users/paulknight/.rvm/rubies/ruby-2.4.1/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
from (irb):1
from /Users/paulknight/.rvm/rubies/ruby-2.4.1/bin/irb:11:in `<main>'
The current directory is NOT on Ruby's load path by default. Add it to $: (short for $LOAD_PATH) and everything will work, assuming you launch irb from the directory where person.rb is located:
$: << "."
require "person.rb"
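A sketch of an alternative that does not depend on where irb was started: use require_relative inside person.rb, so contact_info.rb is found next to person.rb, and load person.rb itself by absolute path. The snippet recreates the question's two files in a temporary directory just to show this working.

```ruby
require 'tmpdir'

# Recreate the question's two files in a scratch directory, then load
# person.rb by absolute path from a different working directory.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'contact_info.rb'), <<~RUBY)
    module ContactInfo
    end
  RUBY

  # require_relative resolves 'contact_info' against the directory of
  # person.rb itself, so it does not depend on $LOAD_PATH or Dir.pwd.
  File.write(File.join(dir, 'person.rb'), <<~RUBY)
    require_relative 'contact_info'

    class Person
      include ContactInfo
    end
  RUBY

  # An absolute path bypasses the $LOAD_PATH search entirely.
  require File.join(dir, 'person')
end

puts Person.include?(ContactInfo) # prints "true"
```

In the question's setup that means changing the first line of person.rb to require_relative 'contact_info' and, in irb, requiring person.rb by its full /Volumes/... path.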

No such file or directory # rb_sysopen ruby and csv

First of all, I'd like to say that I'm new to Ruby, so if I'm not able to give you a good picture of what I'm trying to solve, that is the reason.
I'm trying to download the images behind a list of URLs. I've looked around for answers, but I can't seem to find one that works for me. The file has over 70,000 links, and I'm also trying to name the images at the same time. I'm using Ruby 2.3.0, if that is relevant.
Code --
require 'open-uri'
require 'tempfile'
require 'uri'
require 'csv'
def downloadFile(path,url)
  begin
    open(path, "wb+") do |file|
      file << open(url).read
    end
    return true
  rescue
    return false
  end
end
puts Dir.pwd
CSV.foreach("C/Users/b40ssr/RubymineProjects/Bygma/convert/konvertera.CSV", headers:true) do |row|
  downloadFile(row[0], row[1])
end
So the error that I'm getting is
C:/Ruby23/lib/ruby/2.3.0/csv.rb:1265:in `initialize': No such file or directory # rb_sysopen - C/Users/b40ssr/RubymineProjects/Bygma/convert/konvertera.CSV (Errno::ENOENT)
I understand that there is something wrong with the directory, but I can't figure out what it is.
First of all, the path is missing the colon after the drive letter: it should start with "C:/", not "C/". Alternatively, use a relative path.
Second, are you trying to open each row of the CSV file?
CSV.foreach("C:/Users/b40ssr/RubymineProjects/Bygma/convert/konvertera.CSV") iterates over each row in the CSV file.
Or do you want to open each CSV file inside a directory?
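A sketch of the question's loop with failures recorded instead of silently discarded. It assumes, as the question's code does, that column 0 holds the output filename and column 1 the URL; the method name download_all is hypothetical. URI.open requires Ruby 2.5 or later; on the question's Ruby 2.3, open(url) with open-uri loaded does the same job for URLs.

```ruby
require 'csv'
require 'open-uri'

# Hypothetical helper: for each CSV row, download the URL in column 1 and save
# it under the filename in column 0 (the layout the question's code assumes).
# Returns [filename, url, error message] for every row that failed.
def download_all(csv_path)
  failures = []
  CSV.foreach(csv_path, headers: true) do |row|
    dest, url = row[0], row[1]
    begin
      # Binary mode on both sides keeps image bytes intact on Windows.
      URI.open(url, 'rb') { |remote| File.binwrite(dest, remote.read) }
    rescue StandardError => e
      # Record the failure instead of swallowing it, then keep going.
      failures << [dest, url, e.message]
    end
  end
  failures
end
```

Called as download_all("C:/Users/b40ssr/RubymineProjects/Bygma/convert/konvertera.CSV"), it returns the rows that could not be downloaded, so they can be inspected or retried rather than disappearing behind a bare false.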

RubyZip Unzipping .docx, modifying, and zipping back up throws Errno::EACCES error

So, I'm using Nokogiri and Rubyzip to unzip a .docx file, modify the word/document.xml file in it (in this case, just change the text of every <w:t> element to "DREAMS!"), and then zip it back up.
require 'nokogiri'
require 'zip'
zip = Zip::File.open("apple.docx")
doc = zip.find_entry("word/document.xml")
xml = Nokogiri::XML.parse(doc.get_input_stream)
inputs = xml.root.xpath("//w:t")
inputs.each{|element| element.content = "DREAMS!"}
zip.get_output_stream("word/document.xml", "w") {|f| f.write(xml.to_s)}
zip.close
Running the code through IRB line by line works perfectly and makes the changes to the .docx file as I needed, but if I run the script from the command line
ruby xmltodoc.rb
I receive the following error:
C:/Ruby193/lib/ruby/gems/1.9.1/gems/rubyzip-1.1.7/lib/zip/file.rb:416:in `rename': Permission denied - (C:/Users/Bane/Desktop/apple.docx20150326-6016-k9ff1n, apple.docx) (Errno::EACCES)
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/rubyzip-1.1.7/lib/zip/file.rb:416:in `on_success_replace'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/rubyzip-1.1.7/lib/zip/file.rb:308:in `commit'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/rubyzip-1.1.7/lib/zip/file.rb:332:in `close'
from ./xmltodoc.rb:15:in `<main>'
All users on my computer have full permissions for that .docx file. The file also doesn't have any special settings; it's just a new file with one paragraph. This error only shows up on Windows; the script works perfectly on Mac and Ubuntu. Running PowerShell as Admin throws the same error. Any ideas?
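On my Windows 7 system the following works. It reads the entry with zipfile.read, which returns the XML as a string rather than leaving an input stream open, and the block form of Zip::File.open commits and closes the archive when the block ends. A likely cause of the original error (not confirmed here) is that the stream returned by doc.get_input_stream is still open when zip.close tries to rename the temporary file over apple.docx, and Windows refuses to replace a file that has an open handle.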
require 'nokogiri'
require 'zip'
Zip::File.open("#{File.dirname(__FILE__)}/apple.docx") do |zipfile|
  doc = zipfile.read("word/document.xml")
  xml = Nokogiri::XML.parse(doc)
  inputs = xml.root.xpath("//w:t")
  inputs.each { |element| element.content = "DREAMS!" }
  zipfile.get_output_stream("word/document.xml") { |f| f.write(xml.to_s) }
end
Alternatively, you could use the docx gem. Here is an example; the bookmark names are in Dutch because that's the language my MS Office is in.
require 'docx'
# Create a Docx::Document object for our existing docx file
doc = Docx::Document.open('C:\Users\Gebruiker\test.docx'.gsub(/\\/,'/'))
# Insert a single line of text after one of our bookmarks
# p doc.bookmarks['bladwijzer1'].methods
doc.bookmarks['bladwijzer1'].insert_text_after("Hello world.")
# Insert multiple lines of text at our bookmark
doc.bookmarks['bladwijzer3'].insert_multiple_lines(['Hello', 'World', 'foo'])
# Save document to specified path
doc.save('example-edited.docx')

Require not able to find ruby file

I am an absolute beginner in Ruby. I created a small Ruby file, and it runs fine when I run the command ruby "methods.rb", so I know I am in the correct directory.
But when I launch irb and run the command require "methods.rb", I get the following response:
LoadError: cannot load such file -- methods.rb
from /usr/local/rvm/rubies/ruby-1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/core_ext/kernel_require.rb:53:in `require'
from /usr/local/rvm/rubies/ruby-1.9.3-p392/lib/ruby/site_ruby/1.9.1/rubygems/core_ext/kernel_require.rb:53:in `require'
from (irb):1
from /usr/local/rvm/rubies/ruby-1.9.3-p392/bin/irb:16:in `<main>'
Since Ruby 1.9.2, the current directory is no longer on the load path by default.
From irb, you can try require "./methods.rb" instead.
I have a Ruby file called so.rb in the directory /home/kirti/Ruby; it contains only the line p "hello". From IRB, I would first change the current working directory with Dir.chdir, then call load or require:
>> Dir.pwd
=> "/home/kirti"
>> Dir.chdir("/home/kirti/Ruby")
=> 0
>> Dir.pwd
=> "/home/kirti/Ruby"
>> load 'so.rb'
"hello"
=> true
>> require './so.rb'
"hello"
=> true
To add the directory containing the script to the load path (File.expand_path makes the entry absolute, so it keeps working if the working directory changes), use:
$LOAD_PATH.unshift(File.expand_path(File.dirname(__FILE__)))
or, if you have put your dependencies in a 'subdir' of that directory:
$LOAD_PATH.unshift(File.expand_path('subdir', File.dirname(__FILE__)))
If you are going to load things in IRB that are in your current directory, you can do:
irb -I.
Note the dot there, which indicates the current directory.
If you are exploring and changing that file while you are in IRB, use load rather than require: load re-reads the file every time you call it, so it picks up your changes, whereas require loads a given file only once. This means you don't need to exit IRB to see the effect of your changes.
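A small self-contained demonstration of that difference, using a throwaway file in a temporary directory (the file name counter.rb, the global $counter_runs and the method name are made up for the example):

```ruby
require 'tmpdir'

# Write a tiny file that counts how many times it has been executed, then
# compare require and load on it. Returns [first_require, second_require, runs].
def compare_require_and_load
  $counter_runs = 0
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'counter.rb')
    File.write(path, "$counter_runs += 1\n")

    first  = require(path) # runs the file; returns true
    second = require(path) # already in $LOADED_FEATURES: returns false, file not run
    load(path)             # load re-runs the file every time
    [first, second, $counter_runs]
  end
end

p compare_require_and_load # prints [true, false, 2]
```

The second require returns false because Ruby records every required file in $LOADED_FEATURES and skips files it has already seen; load keeps no such record.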
To find out what options you have for IRB, you can do irb --help which is good to do if you are learning the tool.

Resources