Hi, I am running some code that scrapes a web page and then writes out a CSV file. Here is the code:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'csv'
page = Nokogiri::HTML(open("https://www.drugs.com/pharmaceutical-companies.html"))
puts page.class #=> Nokogiri::HTML::Document
pharma_links = page.css("div.col-list-az a")
link= pharma_links.each{|link| puts link['href'] }
company = pharma_links.each{|link| puts link.text}
CSV.open("file.csv", "wb") do |csv|
  csv << [company, link]
end
The code works perfectly until the very end, where I get this error:
C:/Ruby24/lib/ruby/2.4.0/csv.rb:1282:in `initialize': Permission denied @ rb_sysopen - file.csv (Errno::EACCES)
I have literally given myself ownership of the entire C: drive and still receive this error. Please help, I am at my wits' end. Also, I am new to Ruby, so please be explicit in your answers.
I am running Windows 10 with 32-bit Ruby.
You wrote: "I have literally given myself ownership of the entire C: drive."
Contrary to what you assert, you don't have write permissions everywhere on drive C:. Something is preventing you from writing in the current working directory (Dir.getwd).
Presumably you have write access to at least the root of C:? If so, try writing there:
CSV.open("c:/file.csv", "wb") do |csv|
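One way to see where the script is actually trying to write is to ask Ruby directly. The following is a minimal sketch, not the asker's code: first_writable_dir and write_rows are hypothetical helpers, and the sample row is made up to stand in for the scraped links. Note that File.writable? is only a first check and may not reflect every Windows permission rule. The sketch also builds the rows with map-style data, since each returns the original collection rather than the values printed inside the block.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: return the first directory in the list that exists
# and that the current user appears able to write to.
def first_writable_dir(candidates = [Dir.pwd, Dir.tmpdir])
  candidates.find { |dir| File.directory?(dir) && File.writable?(dir) }
end

# Hypothetical helper: write a header plus one CSV row per [company, link] pair.
def write_rows(path, rows)
  CSV.open(path, "wb") do |csv|
    csv << %w[company link]
    rows.each { |row| csv << row }
  end
  path
end

# Made-up sample data; with Nokogiri the rows would come from something like
# pharma_links.map { |a| [a.text, a['href']] }.
rows = [["Example Pharma", "/manufacturer/example.html"]]

puts "Working directory: #{Dir.pwd} (writable: #{File.writable?(Dir.pwd)})"
dir = first_writable_dir || abort("No writable directory found")
puts "Wrote #{write_rows(File.join(dir, 'file.csv'), rows)}"
```

If the working directory turns out not to be writable, running the script from a folder under your user profile is usually simpler than changing permissions on the drive.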
First of all I would like to say that I'm new to Ruby and if I'm not able to give you a good picture of what I'm trying to solve, that is the reason.
I'm trying to convert URLs into images and I've looked around for answers but I can't seem to find an answer that works for me. The file has around 70,000+ links and I'm also trying to name these at the same time. I'm using ruby 2.3.0 if that is relevant.
Code --
require 'open-uri'
require 'tempfile'
require 'uri'
require 'csv'
def downloadFile(path,url)
  begin
    open(path, "wb+") do |file|
      file << open(url).read
    end
    return true
  rescue
    return false
  end
end
puts Dir.pwd
CSV.foreach("C/Users/b40ssr/RubymineProjects/Bygma/convert/konvertera.CSV", headers:true) do |row|
downloadFile(row[0], row[1])
end
So the error that I'm getting is:
C:/Ruby23/lib/ruby/2.3.0/csv.rb:1265:in `initialize': No such file or directory @ rb_sysopen - C/Users/b40ssr/RubymineProjects/Bygma/convert/konvertera.CSV (Errno::ENOENT)
I understand that there is something wrong with the directory, but I can't seem to figure out what it is.
First of all, you can use a relative path, or start the absolute path with "C:/" instead of "C/" (the colon after the drive letter is missing).
Second, are you trying to open each row of the CSV file?
CSV.foreach("C/Users/b40ssr/RubymineProjects/Bygma/convert/konvertera.CSV") iterates over each row of that one CSV file.
Or do you want to open each CSV file inside a directory?
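To illustrate the path point, here is a small sketch that expands the path before opening it and fails early with a readable message. The helper name each_csv_row is hypothetical and the shortened path in the last line is only an example; the idea is simply that a path without the drive colon is relative and gets resolved against Dir.pwd.

```ruby
require 'csv'

# Hypothetical helper: expand the path, raise a clear error if the file is
# missing, then yield each data row (headers: true consumes the header row).
def each_csv_row(path)
  full_path = File.expand_path(path)
  unless File.file?(full_path)
    raise ArgumentError, "CSV not found at #{full_path} (working dir: #{Dir.pwd})"
  end
  CSV.foreach(full_path, headers: true) { |row| yield row }
end

# A path that starts with "C/" has no drive colon, so Ruby treats it as
# relative and resolves it underneath the current working directory:
puts File.expand_path("C/Users/example/konvertera.CSV")
```

Calling each_csv_row("C:/...") { |row| downloadFile(row[0], row[1]) } would then either iterate the rows or report the full path it tried to open.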
I am using Ruby 2.1.0p0 on Mac OS.
I'm parsing a CSV file and grabbing all the URLs, then using Nokogiri and OpenURI to scrape them which is where I'm getting stuck.
When I try to use an each loop to run through the URLs array, I get this error:
initialize': No such file or directory @ rb_sysopen - URL (Errno::ENOENT)
When I manually create an array, and then run through it I get no error. I've tried to_s, URI::encode, and everything I could think of and find on Stack Overflow.
I can copy and paste the URL from the CSV, or from the terminal after using puts on the array, and it opens in my browser with no problem. But when I try to open it with Nokogiri, it fails.
Here's my code:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'uri'
require 'csv'
events = Array.new
CSV.foreach('productfeed.csv') do |row|
  events.push URI::encode(row[0]).to_s
end
events.each do |event|
  page = Nokogiri::HTML(open("#{event}"))
  #eventually, going to find info on the page, and scrape it, but not there yet.
  #something to show I didn't get an error
  puts "open = success"
end
Please help! I am completely out of ideas.
It looks like you're processing the header row, where one of those values is literally "URL". That's not a valid URI, so open-uri won't touch it.
There's a headers option for the CSV module that will make use of the headers automatically. Try turning that on and referring to row["URL"]
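To make that suggestion concrete, here is a minimal sketch. It assumes the feed's first row is a header with a column named URL; the sample rows and the helper name urls_from_csv are made up for illustration. With headers: true, the header row is used for column names instead of being yielded as data.

```ruby
require 'csv'

# Hypothetical helper: return the values of one named column, skipping the
# header row and any rows where that column is empty.
def urls_from_csv(csv_text, column = "URL")
  CSV.parse(csv_text, headers: true).map { |row| row[column] }.compact
end

# Made-up sample feed standing in for productfeed.csv.
sample = <<~FEED
  Name,URL
  First event,http://example.com/events/1
  Second event,http://example.com/events/2
FEED

urls_from_csv(sample).each { |url| puts url }
```

For a file on disk, CSV.foreach('productfeed.csv', headers: true) yields rows that can be indexed the same way with row["URL"].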
I tried doing the same thing and found it to work better using a text file.
Here is what I did.
#!/usr/bin/env python3
# import the os, sys, time and webbrowser modules
import os
import sys
import time
import webbrowser
# open text file as "dataFile" and verify there is data in said file
path = '/home/user/Desktop/urls.txt'
dataFile = open(path, 'r')
if os.path.getsize(path) > 0:
    print("Data file opened successfully")
else:
    print("!!!!NO DATA IN FILE!!!!")
    sys.exit()
# read file line by line, remove any spaces/newlines, and open link in chromium-browser
for line in dataFile:
    url = line.strip()
    if not url:
        continue  # skip blank lines
    print("Opening " + url)
    webbrowser.get('chromium-browser').open_new_tab(url)
# close file and exit
print("Closing Data File")
dataFile.close()
# wait two seconds before printing "Data file closed".
# this is purely for visual effect.
time.sleep(2)
print("Data file closed")
# after opener has run, user is prompted to press enter key to exit.
input("\n\nURL Opener has run. Press the enter key to exit.")
sys.exit()
Hope this helps!
I need to download a file daily from a client that I have SCP but not SSH access to.
The file name will always be /outgoing/Extract/visits_[date]-[timestamp].dat.gz
For example, yesterday's file was called visits_20130604-090003.dat.gz
I cannot rely on the timestamp always being the same, but the date should always be yesterday's date.
My setup so far:
My home directory contains two sub-directories named downloads_fullname and downloads_wildcard.
It also contains a simple Ruby script named foo.rb.
The contents of foo.rb are:
#! /usr/bin/ruby
require 'net/ssh'
require 'net/scp'
yesterday = (Time.now - 86400).strftime('%Y%m%d')
Net::SCP.start('hostname', 'username') do |scp|
  scp.download!('/outgoing/Extract/visits_' + yesterday + '-090003.dat.gz', 'downloads_fullname')
  scp.download!('/outgoing/Extract/visits_' + yesterday + '-*.dat.gz', 'downloads_wildcard')
end
When run, the downloads_fullname directory contains the file, but the downloads_wildcard directory does not.
Is there any way to use wildcards in Net::SCP? Or does anybody have any sly workarounds? I tried \* to no avail.
Thank you Tin Man!!!
To anybody else, here is the code I ended up with following Tin Man's lead:
(Tried to post it as a comment but had formatting issues)
#! /usr/bin/ruby
require 'net/sftp'
yesterday = (Time.now - 86400).strftime('%Y%m%d')
Net::SFTP.start('hostname', 'username') do |sftp|
  sftp.dir.foreach("/outgoing/Extract") do |file|
    if file.name.include? '_' + yesterday + '-'
      sftp.download!('/outgoing/Extract/' + file.name, 'downloads/' + file.name)
    end
  end
end
I don't think you can get there using scp because it expects you to know exactly which file you want, but sftp will let you get a directory listing.
You can use Net::SFTP to programmatically pick your file and request it. This is the example code:
require 'net/sftp'
Net::SFTP.start('host', 'username', :password => 'password') do |sftp|
  # upload a file or directory to the remote host
  sftp.upload!("/path/to/local", "/path/to/remote")
  # download a file or directory from the remote host
  sftp.download!("/path/to/remote", "/path/to/local")
  # grab data off the remote host directly to a buffer
  data = sftp.download!("/path/to/remote")
  # open and write to a pseudo-IO for a remote file
  sftp.file.open("/path/to/remote", "w") do |f|
    f.puts "Hello, world!\n"
  end
  # open and read from a pseudo-IO for a remote file
  sftp.file.open("/path/to/remote", "r") do |f|
    puts f.gets
  end
  # create a directory
  sftp.mkdir! "/path/to/directory"
  # list the entries in a directory
  sftp.dir.foreach("/path/to/directory") do |entry|
    puts entry.longname
  end
end
Based on that, you can list the directory entries, then use find or select on the returned list to pick the entry with the date you need. Pass that filename to sftp.download! to download it to a local file.
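The selection step can be sketched on its own, without a server. This is an illustration under assumptions: pick_extract is a hypothetical helper, the file names follow the visits_[date]-[timestamp].dat.gz pattern from the question, and the list of names is hard-coded here where a real script would collect entry.name values from the SFTP directory listing.

```ruby
require 'date'

# Hypothetical helper: from a list of remote file names, return the first one
# whose name matches visits_<YYYYMMDD>-<anything>.dat.gz for the given date.
def pick_extract(names, date = Date.today - 1)
  stamp = date.strftime('%Y%m%d')
  names.find do |name|
    name.start_with?("visits_#{stamp}-") && name.end_with?('.dat.gz')
  end
end

# Sample names standing in for a directory listing. With Net::SFTP these
# could be gathered inside sftp.dir.foreach by collecting entry.name.
names = [
  'visits_20130603-090003.dat.gz',
  'visits_20130604-090003.dat.gz'
]

puts pick_extract(names, Date.new(2013, 6, 4))
```

Using find returns a single name (or nil when nothing matches), which makes the "no file yet" case easy to detect before calling sftp.download!.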
I see a lot of cool stuff I can add to my Ruby console. For example, a good list is
"My .irbrc for console/irb".
I googled, but all I found is weblogs saying what gems people add to their .irbrc. No one is saying where to find it.
I cannot find "irbrc".
I opened my home folder and, if I type IRB, it goes to the Ruby console, but I can't find this file.
Can someone help me locate it?
It's a dotfile named .irbrc, so you will need to run ls -a in your home directory to see it. If it isn't there, simply create a .irbrc file.
Mine's pretty simple but this is what I have in it:
require 'rubygems'
require 'ap'
require 'irb/completion'
ARGV.concat [ "--readline", "--prompt-mode", "simple" ]
module Readline
  module History
    LOG = "#{ENV['HOME']}/.irb-history"
    def self.write_log(line)
      File.open(LOG, 'ab') {|f| f << "#{line}\n"}
    end
    def self.start_session_log
      write_log("\n# session start: #{Time.now}\n\n")
      at_exit { write_log("\n# session stop: #{Time.now}\n") }
    end
  end
  alias :old_readline :readline
  def readline(*args)
    ln = old_readline(*args)
    begin
      History.write_log(ln)
    rescue
    end
    ln
  end
end
IRB::Irb.class_eval do
  def output_value
    ap @context.last_value
  end
end
Readline::History.start_session_log
require 'irb/ext/save-history'
IRB.conf[:SAVE_HISTORY] = 100
IRB.conf[:HISTORY_FILE] = "#{ENV['HOME']}/.irb-save-history"
IRB.conf[:PROMPT_MODE] = :SIMPLE
require 'irb/completion'
If you are unable to find the file .irbrc in your home directory, simply create it there and fill it with some lines such as:
require "irb/completion"
Then irb will automatically load the completion module when you launch it.
PS: this also works on UNIX/Linux systems.
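The "create it if it is missing" step can also be done from Ruby itself. This is a minimal sketch; ensure_irbrc is a hypothetical helper name, and it assumes the file lives directly in the home directory as described above. It never overwrites an existing file.

```ruby
# Hypothetical helper: make sure a .irbrc exists in the given home directory
# and return its full path. An existing file is left untouched.
def ensure_irbrc(home = Dir.home)
  path = File.join(home, '.irbrc')
  File.write(path, "require 'irb/completion'\n") unless File.exist?(path)
  path
end
```

Running puts ensure_irbrc once prints the location of the file, which also answers the "where is it" part of the question.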
Since Heroku does not allow saving dynamic files to disk, I've run into a dilemma that I am hoping you can help me overcome. I have a text file that I can create in RAM. The problem is that I cannot find a gem or function that would allow me to stream the file to another FTP server. The Net::FTP library I am using requires that I save the file to disk first. Any suggestions?
ftp = Net::FTP.new(domain)
ftp.passive = true
ftp.login(username, password)
ftp.chdir(path_on_server)
ftp.puttextfile(path_to_web_file)
ftp.close
The ftp.puttextfile function is what is requiring a physical file to exist.
StringIO.new provides an object that acts like an opened file. It's easy to create a method like puttextfile that uses a StringIO object instead of a file.
require 'net/ftp'
require 'stringio'
class Net::FTP
  def puttextcontent(content, remotefile, &block)
    f = StringIO.new(content)
    begin
      storlines("STOR " + remotefile, f, &block)
    ensure
      f.close
    end
  end
end
file_content = <<filecontent
<html>
<head><title>Hello!</title></head>
<body>Hello.</body>
</html>
filecontent
ftp = Net::FTP.new(domain)
ftp.passive = true
ftp.login(username, password)
ftp.chdir(path_on_server)
ftp.puttextcontent(file_content, path_to_web_file)
ftp.close
David at Heroku gave a prompt response to a support ticket I entered there.
You can use APP_ROOT/tmp for temporary file output. The existence of files created in this dir is not guaranteed outside the life of a single request, but it should work for your purposes.
Hope this helps,
David
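If you go the temporary-file route instead, a sketch like the one below keeps the file around only for as long as the upload needs it. The helper name with_temp_text_file is hypothetical, and the directory argument is an assumption: pass whatever tmp directory your app is allowed to write to. Tempfile.create with a block removes the file when the block finishes.

```ruby
require 'tempfile'
require 'tmpdir'

# Hypothetical helper: write content to a temporary file in dir, yield the
# file's path to the block, and delete the file afterwards.
def with_temp_text_file(content, dir = Dir.tmpdir)
  Tempfile.create(['upload', '.txt'], dir) do |file|
    file.write(content)
    file.flush # make sure the bytes are on disk before the path is used
    yield file.path
  end
end

# Example: the block receives a real path, which is what a call such as
# ftp.puttextfile(local_path, remote_name) expects.
with_temp_text_file("Hello.\n") do |local_path|
  puts "Temporary file ready at #{local_path}"
end
```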