How to FTP in Ruby without first saving the text file

Since Heroku does not allow saving dynamic files to disk, I've run into a dilemma that I am hoping you can help me overcome. I have a text file that I can create in RAM. The problem is that I cannot find a gem or function that would allow me to stream the file to another FTP server. The Net/FTP gem I am using requires that I save the file to disk first. Any suggestions?
ftp = Net::FTP.new(domain)
ftp.passive = true
ftp.login(username, password)
ftp.chdir(path_on_server)
ftp.puttextfile(path_to_web_file)
ftp.close
The ftp.puttextfile call is what requires a physical file to exist.

StringIO.new returns an object that behaves like an opened file. That makes it easy to write a method like puttextfile that takes a StringIO object instead of a file on disk.
require 'net/ftp'
require 'stringio'
class Net::FTP
  def puttextcontent(content, remotefile, &block)
    f = StringIO.new(content)
    begin
      storlines("STOR " + remotefile, f, &block)
    ensure
      f.close
    end
  end
end
file_content = <<filecontent
<html>
<head><title>Hello!</title></head>
<body>Hello.</body>
</html>
filecontent
ftp = Net::FTP.new(domain)
ftp.passive = true
ftp.login(username, password)
ftp.chdir(path_on_server)
ftp.puttextcontent(file_content, path_to_web_file)
ftp.close
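The same idea also works without reopening the Net::FTP class, since storlines is a public method. The sketch below is only an illustration: upload_text is a hypothetical helper name, and ftp is assumed to be an already logged-in Net::FTP connection (or any object that responds to storlines in the same way).

```ruby
require 'stringio'

# Hypothetical helper: sends in-memory text over an existing FTP connection.
# `ftp` is assumed to respond to storlines(cmd, io), as Net::FTP does.
def upload_text(ftp, content, remote_name)
  io = StringIO.new(content)
  ftp.storlines("STOR #{remote_name}", io)
ensure
  # Close the in-memory stream whether or not the upload succeeded.
  io&.close
end
```

With the connection from the example above, the call would look like upload_text(ftp, file_content, path_to_web_file).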

David at Heroku gave a prompt response to a support ticket I entered there.
You can use APP_ROOT/tmp for temporary file output. The existence of files created in this dir is not guaranteed outside the life of a single request, but it should work for your purposes.
Hope this helps,
David

Related

Ruby: Reading from a file written to by the system process

I'm trying to open a tmpfile in the system $EDITOR, write to it, and then read in the output. I can get it to work, but I am wondering why calling file.read returns an empty string (when the file does have content)
Basically I'd like to know the correct way of reading the file once it has been written to.
require 'tempfile'
file = Tempfile.new("note")
system("$EDITOR #{file.path}")
file.rewind
puts file.read # this puts out an empty string "" .. why?
puts IO.read(file.path) # this puts out the contents of the file
Yes, I will be running this in an ensure block to nuke the file once used ;)
I was running this on ruby 2.2.2 and using vim.
Make sure you are calling open on the file object before attempting to read it in:
require 'tempfile'
file = Tempfile.new("note")
system("$EDITOR #{file.path}")
file.open
puts file.read
file.close
file.unlink
This will also let you avoid calling rewind on the file, since your process hasn't written any bytes to it at the time you open it.
I believe IO.read opens the file by path every time it is called, which is why it worked in that case. Calling .read on an IO-like object that is already open only reads through the existing handle; it does not reopen the file for you.
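One possible explanation, offered as an assumption rather than something confirmed in the question, is that the editor saves by writing a new file and renaming it over the original, so the handle held by the Tempfile still points at the old, empty file. The sketch below simulates that with File.rename on a POSIX system; read_after_external_edit is a hypothetical name used only for this illustration.

```ruby
require 'tempfile'

# Simulates an editor that replaces the file on save (assumed behaviour),
# then compares the already-open handle with a freshly reopened one.
def read_after_external_edit
  file = Tempfile.new('note')

  # Write the "edited" text to a sibling file and rename it over the original.
  replacement = "#{file.path}.new"
  File.write(replacement, "edited text\n")
  File.rename(replacement, file.path)

  file.rewind
  stale = file.read   # read through the handle opened before the rename

  file.open           # Tempfile#open reopens the file by its path
  fresh = file.read

  [stale, fresh]
ensure
  file&.close
  file&.unlink
end
```

On Linux or macOS the first value should come back empty and the second should hold the edited text, which matches the behaviour described in the question.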

Open a local file with open-uri

I am doing data scraping with Ruby and Nokogiri. Is it possible to download and parse a local file in my computer?
I have:
require 'open-uri'
url = "file:///home/nav/Desktop/Scraping/scrap1.html"
It gives error as:
No such file or directory @ rb_sysopen - file:\home/nav/Desktop/Scraping/scrap1.html
If you want to parse a local file with Nokogiri you can do it like this.
file = File.read('/home/nav/Desktop/Scraping/scrap1.html')
doc = Nokogiri::HTML(file)
When you open a local file in a browser, the URL in the address bar is displayed as:
file:///Users/7stud/Desktop/accounts.txt
But that doesn't mean you use that format in a Ruby script. Your Ruby script doesn't send the file name to a browser and then ask the browser to retrieve the file. Your Ruby script searches your file system directly.
The same is true for URLs: your Ruby script doesn't ask your browser to go retrieve a page from the internet, Ruby retrieves the page itself by sending a request using your system's network interface. After all, a browser and a Ruby program are both just computer programs. What your browser can do over a network, a Ruby program can do, too.
This works for me:
require 'open-uri'
text = open('./data.txt').read
puts text
You have to get your path right, though. The only reason I can think of to use open() is if you had an array of filenames and URLs mixed together. If that isn't your situation, see new2code's answer.
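If all you have is a file:// URL string, one option is to pull the path out of it with the standard uri library and then read the file directly. This is a minimal sketch; local_path_from_file_url is a hypothetical helper name, and it assumes the URL contains no percent-encoded characters.

```ruby
require 'uri'

# Hypothetical helper: turn a file:// URL into a plain filesystem path.
# Percent-encoded characters (e.g. %20) are not decoded in this sketch.
def local_path_from_file_url(url)
  uri = URI.parse(url)
  raise ArgumentError, "not a file URL: #{url}" unless uri.scheme == 'file'
  uri.path
end

path = local_path_from_file_url('file:///home/nav/Desktop/Scraping/scrap1.html')
# path can now be handed to File.read or File.open
```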
This is how I do it, according to the documentation.
f = File.open("/home/nav/Desktop/Scraping/scrap1.html")
doc = Nokogiri::HTML(f)
f.close
I would make use of Mechanize and save the file locally, then parse it with Nokogiri like so:
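require 'mechanize'
require 'nokogiri'

# Save the file
agent = Mechanize.new
agent.pluggable_parser.default = Mechanize::Download
current_url = 'http://www.example.com'
file = agent.get(current_url)
# 'page.html' is a placeholder name; save! is given a file path, not a directory
local_path = Rails.root.join('tmp', 'page.html').to_s
file.save!(local_path)
# Read the file
page = Nokogiri::HTML(File.read(local_path))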
Hope that helps!

ruby rss module not reading full path

I am downloading an rss file posted as xml, and saving it with the rss extension.
I then use the rss module to read it as an rss file. The issue I have is the following:
If I create the file (page.rss) with an implicit path and use just that filename to process it with the rss parsing function, everything is fine (downloaded_file = 'page.rss').
If I explicitly type the full path into the script (downloaded_file = "E:/Libraries/Documents/Android dev/page.rss"), everything also works fine.
But if I "calculate" the value of the absolute path with downloaded_file = File.join(Dir.pwd, 'page.rss'), the rss function fails. The value of the variable is apparently the same ("E:/Libraries/Documents/Android dev/page.rss"), but there must be an invisible difference. I would like to be able to use the "calculated" absolute path. I am sure there is a subtle difference in the way this string is interpreted by the rss function. How can I track it down?
Thanks for any suggestion.
Here is my script:
require 'rss'
require 'open-uri'
url = 'http://tutorialspoint.com/android/sampleXML.xml'
downloaded_file = File.join(Dir.pwd, 'page.rss') # FAILS
puts "Path = #{downloaded_file}"#=> "E:/Libraries/Documents/Android dev/page.rss"
downloaded_file = 'page.rss' # WORKS
#downloaded_file = "E:/Libraries/Documents/Android dev/page.rss" # WORKS
puts "Used path/filename: #{downloaded_file}"
File.open(downloaded_file, 'wb') do |file| # Download url content into rss file
  file << open(url).read
end
rss = RSS::Parser.parse(downloaded_file, false) # Read rss from downloaded_file
puts "Title: #{rss.channel.title}"
NEW ANSWER
Okay, so your downloaded_file string has been marked as tainted, and RSS::Parser won't open a tainted file path (see rss/parser.rb, around line 105, for details). The solution is either to untaint the downloaded_file string before you call parse, e.g.:
RSS::Parser.parse(downloaded_file.untaint, false)
or to just open the file for the parser, e.g.:
RSS::Parser.parse(File.open(downloaded_file), false)
I'd never run into this issue before, so thanks! I'd heard of object tainting, but I never had a reason to look into it. There is a bit more information in the question "What are tainted objects, and when should we untaint them?"
PREVIOUS ANSWER
Dir.pwd is going to change depending on where you call the script from. Unless you are calling the script from E:/Libraries/Documents/Android dev, the filepath will be off.
It's better to build your filepath from the location of your script itself. To do so you can add:
ROOT = File.expand_path('..', __FILE__)
downloaded_file = File.join(ROOT, 'page.rss')
# or just downloaded_file = File.expand_path('../page.rss', __FILE__)
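The same approach can be wrapped in a small helper so that the result never depends on the current working directory. This is only a sketch; path_beside is a hypothetical name introduced here for illustration.

```ruby
# Hypothetical helper: resolve `name` relative to the directory that
# contains `script_path`, regardless of where the script is launched from.
def path_beside(script_path, name)
  File.expand_path(name, File.dirname(script_path))
end

# In a real script, pass __FILE__ as the first argument.
downloaded_file = path_beside(__FILE__, 'page.rss')
```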

how to parse XML file remotely from FTP with nokogiri gem, without downloading

require 'net/ftp'
require 'nokogiri'
server = "xxxxxx"
user = "xxxxx"
password = "xxxxx"
ftp = Net::FTP.new(server, user, password)
files = ftp.nlst('File*.xml')
files.each do |file|
  ftp.getbinaryfile(file)
  doc = Nokogiri::XML(open(file))
  # some operations with doc
end
With the code above I can parse the XML file, because it downloads the file first.
But how can I parse a remote XML file without downloading it?
The code above is part of a rake task that loads the Rails environment when run.
UPDATE:
I'm not going to create any file. I will import info into the mongodb using mongoid.
If you simply want to avoid using a temporary local file, you can fetch the file contents directly as a String and process them in memory by passing nil as the local file name:
files.each do |file|
  xml_string = ftp.getbinaryfile(file, nil)
  doc = Nokogiri::XML(xml_string)
  # some operations with doc
end
This still does an FTP fetch of the contents, and XML parsing happens at the client.
It is not really possible to avoid fetching the data in some form or other, and if FTP is the only protocol you have available, then that means copying data over the network using an FTP get. However, it is possible, but far more complicated, to add capabilities to your FTP (or other net-based) server, and return the data in some other form. That could include Nokogiri parsing done remotely on the server, but you'd still need to serialise the end result, fetch it and deserialise it.

Open an IO stream from a local file or url

I know there are libs in other languages that can take a string that contains either a path to a local file or a url and open it as a readable IO stream.
Is there an easy way to do this in ruby?
open-uri is part of the Ruby standard library. It extends open so that you can open a URL as well as a local file. For a local path you get a File object; for a URL you get a StringIO or Tempfile, so in both cases you can call methods like read and readlines.
require 'open-uri'
file_contents = open('local-file.txt') { |f| f.read }
web_contents = open('http://www.stackoverflow.com') {|f| f.read }
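On Ruby 3.0 and later, Kernel#open no longer accepts URLs, and URI.open is the entry point that open-uri provides for them. A small wrapper can keep the "path or URL" convenience on any recent Ruby. The sketch below is an illustration only: open_source is a hypothetical helper name, and it treats only http and https strings as URLs.

```ruby
require 'open-uri'

# Hypothetical helper: open either a local path or an http(s) URL and
# yield a readable IO-like object to the block.
def open_source(source, &block)
  if source.match?(%r{\Ahttps?://}i)
    URI.open(source, &block)   # remote: handled by open-uri
  else
    File.open(source, &block)  # local: plain file access
  end
end
```

Usage mirrors the examples above, for instance open_source('local-file.txt') { |f| f.read }.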
