Open an IO stream from a local file or url - ruby

I know there are libs in other languages that can take a string that contains either a path to a local file or a url and open it as a readable IO stream.
Is there an easy way to do this in ruby?

open-uri is part of the standard Ruby library. Before Ruby 3.0 it redefines the behavior of open so that you can open a URL as well as a local file; on Ruby 3.0 and later, call URI.open for URLs instead. A local path gives you a File and a URL gives you a StringIO or Tempfile, so in either case you can call methods like read and readlines.
require 'open-uri'
file_contents = open('local-file.txt') { |f| f.read }
web_contents = open('http://www.stackoverflow.com') {|f| f.read }
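To accept either kind of string through one call, a small wrapper can dispatch on the prefix. This is only a minimal sketch: the name open_source and the http/https prefix check are assumptions for illustration, not part of open-uri.

```ruby
require 'open-uri'
require 'tempfile'

# Hypothetical helper: yields a readable stream for a URL or a local path.
def open_source(path_or_url, &block)
  if path_or_url.match?(%r{\Ahttps?://}i)
    URI.open(path_or_url, &block)        # remote: StringIO or Tempfile
  else
    File.open(path_or_url, 'r', &block)  # local: File
  end
end

# Usage with a local file (no network needed):
Tempfile.create('example') do |tmp|
  tmp.write("hello\n")
  tmp.flush
  puts open_source(tmp.path) { |f| f.read }
end
```

The explicit branch makes it obvious which kind of stream comes back, which can matter when the caller needs a real file path.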

Related

Save and parse CSV file from URL

I am looking for an implementation that would allow me to download a CSV file from a browser (via a URL), to a point where I can open that file manually and view its contents in CSV form.
I have been doing some research and can see that I should use the IO, CSV or File classes.
I have a URL that looks something like:
"https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"
From what I have read I have:
href = page.find('#csv-download > a')['href']
csv_path = "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"
require 'open-uri'
download = open(csv_path, ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE)
IO.copy_stream(download, 'test.csv')
This actually outputs:
2684
Does that mean I have successfully retrieved the data?
But when I open the downloaded file, its contents are just
#<StringIO:0x00000003e07d30>
Would there be any reason for this?
I'm not sure where to go from here. Could anyone point me in the right direction, please?
This should read from remote, write and then parse the file:
require 'open-uri'
require 'csv'
url = "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"
download = open(url)
IO.copy_stream(download, 'test.csv')
download.rewind # copy_stream leaves the stream at its end, so rewind before parsing
CSV.new(download).each do |l|
  puts l
end
If all you want to do is read a file and save it, it's simple. This untested code should be all that's required:
require 'open-uri'
CSV_PATH = "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"
IO.copy_stream(
open(
CSV_PATH,
ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE
),
'test.csv'
)
OpenURI's open returns an IO stream, which is all you need to make copy_stream happy.
More typically you'll see the open, read, write pattern: open creates the IO stream for the remote document, read retrieves the document's contents, and write outputs them to a file on your local disk. See their documentation for more information.
require 'open-uri'
CSV_PATH = "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"
File.write(
'test.csv',
open(
CSV_PATH,
ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE
).read
)
There might be a scalability advantage to using copy_stream for huge files that potentially wouldn't fit into memory. That'd be a test for the user.
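Both approaches can be tried without a network connection by letting a StringIO stand in for the object that open returns. This is a sketch under that assumption; the method names save_with_copy_stream and save_with_read and the sample data are made up for illustration.

```ruby
require 'stringio'
require 'tmpdir'

# Streams from src to dest_path in chunks; returns the number of bytes copied.
def save_with_copy_stream(src, dest_path)
  IO.copy_stream(src, dest_path)
end

# Reads the whole body into memory, then writes it out in one call.
def save_with_read(src, dest_path)
  File.write(dest_path, src.read)
end

Dir.mktmpdir do |dir|
  body = "id,name\n1,alice\n2,bob\n" # stand-in for the downloaded CSV

  bytes = save_with_copy_stream(StringIO.new(body), File.join(dir, 'a.csv'))
  save_with_read(StringIO.new(body), File.join(dir, 'b.csv'))

  puts bytes # a byte count, like the 2684 seen in the question
  puts File.read(File.join(dir, 'a.csv')) == File.read(File.join(dir, 'b.csv'))
end
```

The two files end up identical; the difference is only in how much of the body is held in memory at once.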
Here is a one-liner I use. Of course, if the file is huge, I might want to stream it or download it first, but this works just fine in 99% of cases.
require 'open-uri'
require 'csv'
csv_data = CSV.readlines(open(download_url), headers: true)
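The same result can be had by reading the stream into a string and handing it to CSV.parse, which works for any object that responds to read. This is a minimal sketch, with a StringIO standing in for the result of open(download_url); parse_csv_stream is a hypothetical name.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: parses CSV text from any object that responds to #read.
def parse_csv_stream(io)
  CSV.parse(io.read, headers: true)
end

download = StringIO.new("name,score\nalice,10\nbob,7\n") # stand-in for open(download_url)
table = parse_csv_stream(download)
table.each { |row| puts "#{row['name']}: #{row['score']}" }
```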

Ruby: Download zip file and extract

I have a Ruby script that downloads a remote ZIP file from a server using Ruby's open command. When I look into the downloaded content, it shows something like this:
PK\x03\x04\x14\x00\b\x00\b\x00\x9B\x84PG\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\n\x00\x10\x00foobar.txtUX\f\x00\x86\v!V\x85\v!V\xF6\x01\x14\x00K\xCB\xCFOJ,RH\x03S\\\x00PK\a\b\xC1\xC0\x1F\xE8\f\x00\x00\x00\x0E\x00\x00\x00PK\x01\x02\x15\x03\x14\x00\b\x00\b\x00\x9B\x84PG\xC1\xC0\x1F\xE8\f\x00\x00\x00\x0E\x00\x00\x00\n\x00\f\x00\x00\x00\x00\x00\x00\x00\x00#\xA4\x81\x00\x00\x00\x00foobar.txtUX\b\x00\x86\v!V\x85\v!VPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00D\x00\x00\x00T\x00\x00\x00\x00\x00
I tried using the Rubyzip gem (https://github.com/rubyzip/rubyzip) along with its class Zip::ZipInputStream like this:
stream = open("http://localhost:3000/foobar.zip").read # this outputs the zip content from above
zip = Zip::ZipInputStream.new stream
Unfortunately, this throws an error:
Failure/Error: zip = Zip::ZipInputStream.new stream
ArgumentError:
string contains null byte
My questions are:
Is it possible, in general, to download a ZIP file and extract its content in-memory?
Is Rubyzip the right library for it?
If so, how can I extract the content?
I found the solution myself and then at stackoverflow :D (How to iterate through an in-memory zip file in Ruby)
input = HTTParty.get("http://example.com/somedata.zip").body
Zip::InputStream.open(StringIO.new(input)) do |io|
while entry = io.get_next_entry
puts entry.name
parse_zip_content io.read
end
end
Download your ZIP file. I'm using HTTParty for this, but you could also use Ruby's open command (require 'open-uri').
Convert it into a StringIO stream using StringIO.new(input)
Iterate over every entry inside the ZIP archive using io.get_next_entry (it returns an instance of Entry)
With io.read you get the content, and with entry.name you get the filename.
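The ArgumentError in the question is what Ruby raises when a string containing a null byte is used as a file path, which suggests the raw ZIP bytes were being treated as a filename. Wrapping them in a StringIO avoids that. This is a small sketch using only the standard library (no rubyzip); zip_bytes is a made-up stand-in for the downloaded body.

```ruby
require 'stringio'

# Returns the error message raised when the bytes are used as a path, or nil.
def path_error_for(bytes)
  File.open(bytes, 'rb') { |f| f.read }
  nil
rescue ArgumentError => e
  e.message
end

zip_bytes = "PK\x03\x04\x14\x00\b\x00".b # made-up stand-in for the start of a ZIP download

puts path_error_for(zip_bytes)

# Wrapped in a StringIO, the same bytes become a readable, rewindable stream.
io = StringIO.new(zip_bytes)
puts io.read(2)
```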
Like I commented in https://stackoverflow.com/a/43303222/4196440, we can just use Zip::File.open_buffer:
require 'open-uri'
require 'zip' # provided by the rubyzip gem
content = open('http://localhost:3000/foobar.zip')
Zip::File.open_buffer(content) do |zip|
zip.each do |entry|
puts entry.name
# Do whatever you want with the content files.
end
end

Reading and writing to and from files - can you do it the same way? (Ruby)

I'm in the process of learning Ruby and reading through Chris Pine's book. I'm learning how to read (and write) files, and came upon this example:
require 'yaml'
test_array = ['Give Quiche A Chance',
'Mutants Out!',
'Chameleonic Life-Forms, No Thanks']
test_string = test_array.to_yaml
filename = 'whatever.txt'
File.open filename, 'w' do |f|
f.write test_string
end
read_string = File.read filename
read_array = YAML::load read_string
puts(read_string == test_string)
puts(read_array == test_array )
The point of the example was to teach me about YAML, but my question is, if you can read a file with:
File.read filename
Can you write to a file in a similar way?:
File.write filename test_string
Sorry if it's a dumb question. I was just curious why it's written the way it was and if it had to be written that way.
Can you write to a file in a similar way?
Actually, yes. And it's pretty much exactly as you guessed:
File.write 'whatever.txt', test_array.to_yaml
I think it is amazing how intuitive Ruby can be.
See IO.write for more details. Note that IO.binwrite is also available, along with IO.read and IO.binread.
The Ruby File class will give you new and open but it inherits from the IO class so you get the read and write methods too.
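The round trip from the question can be written with the one-call forms on both sides. This is a sketch that uses a temporary directory so nothing is left behind; round_trip is an illustrative name.

```ruby
require 'yaml'
require 'tmpdir'

# Writes the object as YAML with File.write, then reads it back with File.read.
def round_trip(object, path)
  File.write(path, object.to_yaml) # opens, truncates, writes, and closes
  YAML.load(File.read(path))       # opens, reads everything, and closes
end

test_array = ['Give Quiche A Chance',
              'Mutants Out!',
              'Chameleonic Life-Forms, No Thanks']

Dir.mktmpdir do |dir|
  puts round_trip(test_array, File.join(dir, 'whatever.txt')) == test_array
end
```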
I think the right way to write into a file is the following:
File.open(yourfile, 'w') { |file| file.write("your text") }
To break this line down:
We first open the file setting the access mode ('w' to overwrite, 'a' to append, etc.)
We then actually write into the file
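The difference between the two access modes can be seen in a short sketch; the file name and the helper name write_twice are placeholders.

```ruby
require 'tmpdir'

# Writes "first" in mode 'w', then "second" in the given mode; returns the file's contents.
def write_twice(path, second_mode)
  File.open(path, 'w') { |file| file.write("first\n") }
  File.open(path, second_mode) { |file| file.write("second\n") }
  File.read(path)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'demo.txt')
  p write_twice(path, 'w') # the second write replaces the first
  p write_twice(path, 'a') # the second write is added after the first
end
```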
You can read or write to a file by specifying the mode you access it through. The Ruby File class is a subclass of IO.
The File class open or new methods take a path and a mode as arguments:
File.open('path', 'mode') alternatively: File.new('path','mode')
Example: to write to an existing file
somefile = File.open('./dir/subdirectory/file.txt', 'w')
##some code to write to file, eg:
array_of_links.each {|link| somefile.puts link }
somefile.close
See the source documentation as suggested above for more details, or similar question here: How to write to file in Ruby?

Write StringIO to Tempfile

I am trying to read an image from a URL in Ruby and then save it to a Tempfile to be processed later.
require 'open-uri'
url = 'http://upload.wikimedia.org/wikipedia/commons/8/89/Robie_House.jpg'
file = Tempfile.new(['temp','.jpg'])
stringIo = open(url)
# this is part I am confused about how to save StringIO to temp file?
file.write stringIo
This does not work: the resulting temp.jpg is not a valid image. I'm not sure how to proceed with this.
Thanks
You're super close:
file.binmode
file.write stringIo.read
open(url) returns a stream object (a StringIO or Tempfile), not the image bytes. Passing that object straight to file.write writes its string representation rather than the data, so you need to call .read on it and pass the result to file.write.
You could also create your tempfile with the correct encoding, like so:
file = Tempfile.new(['temp','.jpg'], :encoding => 'ascii-8bit')
This is the same as setting the file to binmode.
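The difference between writing the stream object and writing its contents can be checked locally. In this sketch a StringIO of arbitrary bytes stands in for open(url), and the helper names are illustrative.

```ruby
require 'stringio'
require 'tempfile'

# Writes the stream's bytes to a binary-mode Tempfile and returns what landed on disk.
def save_contents(stream)
  Tempfile.create(['temp', '.jpg']) do |file|
    file.binmode
    file.write(stream.read)
    file.flush
    File.binread(file.path)
  end
end

# Writes the stream object itself, as in the question, and returns what landed on disk.
def save_object(stream)
  Tempfile.create(['temp', '.jpg']) do |file|
    file.write(stream)
    file.flush
    File.binread(file.path)
  end
end

bytes = "\xFF\xD8\xFF\xE0 fake image data".b # arbitrary bytes, not a real JPEG
puts save_contents(StringIO.new(bytes)) == bytes
puts save_object(StringIO.new(bytes)) # the object's to_s, not the image data
```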

Getting webpage content with Ruby -- I'm having troubles

I want to get the content off this* page. Everything I've looked up gives the solution of parsing CSS elements; but, that page has none.
Here's the only code that I found that looked like it should work:
file = File.open('http://hiscore.runescape.com/index_lite.ws?player=zezima', "r")
contents = file.read
puts contents
Error:
tracker.rb:1:in 'initialize': Invalid argument - http://hiscore.runescape.com/index_lite.ws?player=zezima (Errno::EINVAL)
from tracker.rb:1:in 'open'
from tracker.rb:1
*http://hiscore.runescape.com/index_lite.ws?player=zezima
If you try to format this as a link in the post it doesn't recognize the underscore (_) in the URL for some reason.
You really want to use open() from the Kernel module, which can read from URIs; you just need to require the OpenURI library first:
require 'open-uri'
Used like so:
require 'open-uri'
file = open('http://hiscore.runescape.com/index_lite.ws?player=zezima')
contents = file.read
puts contents
This related SO thread covers the same question:
Open an IO stream from a local file or url
The appropriate way to fetch the content of a website is through the Net::HTTP class in Ruby:
require 'uri'
require 'net/http'
url = "http://hiscore.runescape.com/index_lite.ws?player=zezima"
r = Net::HTTP.get_response(URI.parse(url)) # pass the whole URI so the ?player=... query string is sent
File.open() does not support URIs.
Best wishes,
Fabian
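When building a request by hand, note that URI#path leaves out the query string while URI#request_uri includes it. This sketch only parses the URL from the question and makes no network request; request_parts is a hypothetical helper name.

```ruby
require 'uri'

# Splits a URL into the pieces a request is built from; returns them as a hash.
def request_parts(url)
  uri = URI.parse(url)
  { host: uri.host, path: uri.path, request_uri: uri.request_uri }
end

parts = request_parts('http://hiscore.runescape.com/index_lite.ws?player=zezima')
puts parts[:host]        # hiscore.runescape.com
puts parts[:path]        # /index_lite.ws (query string dropped)
puts parts[:request_uri] # /index_lite.ws?player=zezima
```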
Please use open-uri; it supports both URIs and local files.
require 'open-uri'
contents = open('http://www.google.com') {|f| f.read }
