Write StringIO to Tempfile - ruby

I am trying, in Ruby, to read an image from a URL and then save it to a Tempfile to be processed later.
require 'open-uri'
require 'tempfile'
url = 'http://upload.wikimedia.org/wikipedia/commons/8/89/Robie_House.jpg'
file = Tempfile.new(['temp','.jpg'])
stringIo = open(url)
# this is the part I am confused about: how do I save the StringIO to the temp file?
file.write stringIo
This does not work: the resulting temp.jpg is not a valid image. I'm not sure how to proceed.
Thanks

You're super close:
file.binmode
file.write stringIo.read
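open(url) returns an IO-like object (a StringIO, or a Tempfile for larger responses), not the image bytes themselves. file.write converts a non-string argument with to_s, so your version writes something like #<StringIO:0x...> into the file. Call .read on the stream to get its contents, and pass that to file.write. Calling binmode first makes Ruby write those bytes unchanged.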
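A minimal, self-contained sketch of the fix. So that it runs without network access, it uses a StringIO holding a few made-up bytes as a stand-in for what open(url) returns; the bytes are an illustrative assumption, not a real image.

```ruby
require 'stringio'
require 'tempfile'

# Stand-in for open(url): open-uri returns a StringIO (small bodies) or a
# Tempfile (larger ones). These bytes are made up; they mimic a JPEG header.
remote = StringIO.new("\xFF\xD8\xFF\xE0\x00\x10JFIF".b)

file = Tempfile.new(['temp', '.jpg'])
file.binmode            # write raw bytes, no encoding or newline conversion

# Wrong: file.write(remote) would write remote.to_s, i.e. "#<StringIO:0x...>".
file.write(remote.read) # right: write the bytes the IO contains
file.flush              # push the bytes to disk before anything else reads the path

written = File.binread(file.path)
puts written.bytesize   # => 10
```

IO.copy_stream(remote, file) is an alternative to file.write(remote.read); it copies in chunks rather than building the whole content as one string first.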

You could also create your tempfile with the correct encoding, like so:
file = Tempfile.new(['temp','.jpg'], :encoding => 'ascii-8bit')
This has much the same effect as calling binmode on the file.

Related

In Ruby, open an IO object and pass each line to another object

I need to download a large zipped file, unzip it, and modify each string before saving them to an array.
I'd prefer to read the downloaded zip file one line (entry) at a time and manipulate each line (entry) as it loads, rather than load the whole file into memory.
I have experimented with many IO methods for opening a file this way, but I struggle to pass a line (entry) to a Zip::InputStream object. This is what I have:
require 'tempfile'
require 'zip'
require 'open-uri'
f = open(FILE_URL) # FILE_URL contains the download path to the .zip file
Zip::InputStream.open(f) do |io| # io is a Zip::InputStream
  while (io.get_next_entry)
    io.each do |line|
      # manipulate the line and push it to an array
    end
  end
end
If I use open(FILE_URL).each do |zip_entry|, I cannot figure out how to pass zip_entry to Zip::InputStream; simply calling Zip::InputStream.open(zip_entry) does not work.
Is this scenario possible, or do I have to download the entire zipped file into a Tempfile first? Any pointers on how to solve this would be helpful.

Ruby: Reading from a file written to by the system process

I'm trying to open a temp file in the system $EDITOR, write to it, and then read in the output. I can get it to work, but I am wondering why calling file.read returns an empty string (when the file does have content).
Basically I'd like to know the correct way of reading the file once it has been written to.
require 'tempfile'
file = Tempfile.new("note")
system("$EDITOR #{file.path}")
file.rewind
puts file.read # this puts out an empty string "" .. why?
puts IO.read(file.path) # this puts out the contents of the file
Yes, I will be running this in an ensure block to nuke the file once used ;)
I was running this on Ruby 2.2.2 and using Vim.
Make sure you are calling open on the file object before attempting to read it in:
require 'tempfile'
file = Tempfile.new("note")
system("$EDITOR #{file.path}")
file.open
puts file.read
file.close
file.unlink
This also lets you avoid calling rewind, because Tempfile#open reopens the file and reading starts from the beginning.
I believe IO.read worked because it opens the file by its path every time it is called. Calling .read on the existing Tempfile object instead reads through the file handle that was opened when the Tempfile was created, and that handle does not necessarily see what the editor saved.
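One plausible cause (an assumption; it depends on the editor and its settings) is that the editor saves by writing a new file and renaming it over the original path, as Vim can do depending on its backupcopy option. The Tempfile's original handle then still points at the old, now-unlinked, empty file. The sketch below reproduces this without an editor: the write-and-rename is a hypothetical stand-in for system("$EDITOR ..."). It relies on Unix rename semantics (Linux, macOS); Windows generally refuses to rename over a file that is still open.

```ruby
require 'tempfile'

file = Tempfile.new("note")

# Hypothetical stand-in for system("$EDITOR #{file.path}"): write the new
# content to a separate file, then rename it over the original path.
replacement = "#{file.path}.new"
File.write(replacement, "hello from the editor\n")
File.rename(replacement, file.path)

file.rewind
stale = file.read          # old handle: the original, now-unlinked, empty file
fresh = IO.read(file.path) # opens the path again, so it sees the new file

file.open                  # Tempfile#open reopens whatever is now at file.path
reopened = file.read

p stale    # => ""
p fresh    # => "hello from the editor\n"
p reopened # => "hello from the editor\n"

file.close
file.unlink
```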

Save and parse CSV file from URL

I am looking for an implementation that would allow me to download a CSV file from a browser (via a URL), to a point where I can open that file manually and view its contents in CSV form.
I have been doing some research and can see that I should use the IO, CSV or File classes.
I have a URL that looks something like:
"https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"
From what I have read I have:
href = page.find('#csv-download > a')['href']
csv_path = "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"
require 'open-uri'
download = open(csv_path, ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE)
IO.copy_stream(download, 'test.csv')
This actually outputs:
2684
Does that tell me I have successfully got the data?
When downloading the file, the contents are just
#<StringIO:0x00000003e07d30>
Is there any reason for this?
I'm not sure where to go from here; could anyone point me in the right direction, please?
This should read from the remote URL, write the file, and then parse it:
require 'open-uri'
require 'csv'
url = "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"
download = open(url)
IO.copy_stream(download, 'test.csv')
download.rewind # copy_stream left the stream at EOF; rewind before parsing it again
CSV.new(download).each do |l|
  puts l
end
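Because IO.copy_stream reads the stream to its end, the same stream has to be rewound before CSV can parse it (or you can parse the file that was just written). A self-contained sketch, assuming a StringIO with made-up CSV data as a stand-in for the open-uri result and a temporary directory in place of test.csv. It also shows that the number IO.copy_stream returns (the 2684 in the question) is a byte count.

```ruby
require 'csv'
require 'stringio'
require 'tmpdir'

# Stand-in for open(url): made-up CSV data in a StringIO, so this runs offline.
download = StringIO.new("name,count\nalice,3\nbob,5\n")

bytes = rows = from_file = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'test.csv')

  # IO.copy_stream returns the number of bytes it copied.
  bytes = IO.copy_stream(download, path)

  # The copy left the stream at EOF, so rewind before parsing it again...
  download.rewind
  rows = CSV.new(download, headers: true).map(&:to_h)

  # ...or parse the file that was just written instead.
  from_file = CSV.read(path, headers: true).map(&:to_h)
end

p bytes # => 25
p rows
```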
If all you want to do is read a file and save it, it's simple. This untested code should be all that's required:
require 'open-uri'
require 'openssl'
CSV_PATH = "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"
IO.copy_stream(
  open(
    CSV_PATH,
    ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE # skips certificate verification
  ),
  'test.csv'
)
OpenURI's open returns an IO stream, which is all you need to make copy_stream happy.
More typically, you'll see the open, read, write pattern: open creates the IO stream for the remote document, read retrieves the remote document, and write outputs it to a text file on your local disk. See their documentation for more information.
require 'open-uri'
require 'openssl'
CSV_PATH = "https://mydomain/manage/reporting/index?who=user&users=0&teams=0&datasetName=0&startDate=2015-10-18&endDate=2015-11-17&format=csv"
File.write(
  'test.csv',
  open(
    CSV_PATH,
    ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE # skips certificate verification
  ).read
)
There might be a scalability advantage to using copy_stream for huge files that potentially wouldn't fit into memory. That'd be a test for the user.
Here is a one-liner I use. Of course, if the file is huge, I might want to stream it or download it first, but this works just fine in 99% of cases.
require 'open-uri'
require 'csv'
csv_data = CSV.new(open(download_url), headers: true).read
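CSV.readlines, like CSV.read, expects a file path, whereas CSV.new wraps any IO-like object, which is what open-uri returns. A sketch of parsing such a stream with headers, assuming a StringIO with made-up data in place of open(download_url):

```ruby
require 'csv'
require 'stringio'

# Stand-in for open(download_url): any IO-like object works with CSV.new.
io = StringIO.new("name,count\nalice,3\nbob,5\n")

table = CSV.new(io, headers: true).read # a CSV::Table

p table.headers     # => ["name", "count"]
p table['name']     # => ["alice", "bob"]  (column access by header)
p table[1]['count'] # => "5"               (row, then field by header)
```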

Open an IO stream from a local file or url

I know there are libraries in other languages that can take a string containing either a path to a local file or a URL and open it as a readable IO stream.
Is there an easy way to do this in Ruby?
open-uri is part of the standard Ruby library. On Ruby 2.5 and later it provides URI.open, which can open a URL as well as a local file. (Older versions redefined Kernel#open to do this; that form was deprecated in Ruby 2.7 and removed in 3.0.) It returns a File for local paths and a StringIO or Tempfile for URLs, so you can call methods like read and readlines on the result.
require 'open-uri'
file_contents = URI.open('local-file.txt') { |f| f.read }
web_contents = URI.open('http://www.stackoverflow.com') { |f| f.read }
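If you would rather make the choice explicit than rely on open-uri's string sniffing, a small dispatcher works too. This is a sketch: open_source is a hypothetical helper name, only http(s) URLs are handed to open-uri, and everything else is treated as a local path. The usage below exercises only the local-file branch, since the URL branch needs network access.

```ruby
require 'open-uri'
require 'tmpdir'
require 'uri'

# Hypothetical helper: opens either an http(s) URL or a local path and
# returns (or yields) a readable IO.
def open_source(source, &block)
  uri = begin
    URI.parse(source)
  rescue URI::InvalidURIError
    nil # not a valid URI (e.g. contains spaces), so treat it as a path
  end

  if uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP
    uri.open(&block)      # open-uri adds #open to URI::HTTP
  else
    File.open(source, 'rb', &block)
  end
end

# Usage with a local file.
path = File.join(Dir.tmpdir, 'open_source_demo.txt')
File.write(path, "local contents\n")
local = open_source(path) { |f| f.read }
File.delete(path)

p local # => "local contents\n"
# web = open_source('https://www.stackoverflow.com') { |f| f.read }
```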

Read binary file as string in Ruby

I need an easy way to take a tar file and convert it into a string (and vice versa). Is there a way to do this in Ruby? My best attempt was this:
file = File.open("path-to-file.tar.gz")
contents = ""
file.each {|line|
  contents << line
}
I thought that would be enough to convert it to a string, but then when I try to write it back out like this...
newFile = File.open("test.tar.gz", "w")
newFile.write(contents)
It isn't the same file. Doing ls -l shows the files are of different sizes, although they are pretty close (and opening the file reveals most of the contents intact). Is there a small mistake I'm making or an entirely different (but workable) way to accomplish this?
First, you should open the file as a binary file. Then you can read the entire file in, in one command.
file = File.open("path-to-file.tar.gz", "rb")
contents = file.read
That will get you the entire file in a string.
After that, you probably want to call file.close. If you don't, the file won't be closed until it is garbage-collected, which wastes a file handle for as long as it stays open.
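To round-trip the archive, which is what the question is after, use binary mode on both the read and the write. A sketch, assuming a few made-up bytes in a temporary directory in place of path-to-file.tar.gz:

```ruby
require 'tmpdir'

contents = identical = nil
Dir.mktmpdir do |dir|
  src = File.join(dir, 'original.tar.gz')
  dst = File.join(dir, 'copy.tar.gz')

  # Made-up bytes standing in for a real archive: the gzip magic number plus
  # CR, LF and NUL bytes, which text-mode I/O on Windows could alter.
  File.binwrite(src, "\x1F\x8B\x08\x00\r\n\x00\xFF".b)

  contents = File.open(src, 'rb') { |f| f.read } # whole file as one binary string
  File.open(dst, 'wb') { |f| f.write(contents) } # write it back, also binary

  identical = File.binread(dst) == File.binread(src)
end

p contents.encoding == Encoding::ASCII_8BIT # => true
p identical                                 # => true
```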
If you need binary mode, you'll need to do it the hard way:
s = File.open(filename, 'rb') { |f| f.read }
If not, shorter and sweeter is:
s = IO.read(filename)
To avoid leaving the file open, it is best to pass a block to File.open. This way, the file will be closed after the block executes.
contents = File.open('path-to-file.tar.gz', 'rb') { |f| f.read }
How about some open/close safety:
string = File.open('file.txt', 'rb') { |file| file.read }
Ruby has binary reading:
data = IO.binread("path/filename")
or, on Ruby versions older than 1.9.2:
data = IO.read("path/filename")
On OS X these are the same for me... could the difference be extra "\r" characters on Windows?
In any case, you may be better off reading and writing in binary mode:
contents = File.binread("e.tgz")
File.open("ee.tgz", "wb") { |newFile| newFile.write(contents) }
You can probably encode the tar file in Base64. Base64 will give you a pure ASCII representation of the file that you can store in a plain text file. Then you can retrieve the tar file by decoding the text back.
You would do something like:
require 'base64'
file_contents = Base64.encode64(tar_file_data)
Have a look at the Base64 Ruby docs to get a better idea.
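A round-trip sketch of that idea, assuming some made-up binary data in place of tar_file_data: encoding gives plain ASCII text, and decoding gives back exactly the original bytes.

```ruby
require 'base64'

# Made-up binary data standing in for tar_file_data.
tar_file_data = "\x1F\x8B\x08\x00\r\n\x00\xFF".b * 20

encoded = Base64.encode64(tar_file_data) # ASCII text, line feed every 60 encoded chars
decoded = Base64.decode64(encoded)       # the original bytes again

p encoded.ascii_only?      # => true
p decoded == tar_file_data # => true
```

The encoded string can then be saved with something like File.write("my_tar.txt", encoded) and decoded again after reading it back.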
Ruby 1.9+ has IO.binread (see @bardzo's answer) and also supports passing the encoding as an option to IO.read:
Ruby 1.9
data = File.read(name, {:encoding => 'BINARY'})
Ruby 2+
data = File.read(name, encoding: 'BINARY')
(Note in both cases that 'BINARY' is an alias for 'ASCII-8BIT'.)
If you encode the tar file in Base64 (and store it in a plain text file), you can use
File.open("my_tar.txt").each {|line| puts line}
or
File.new("name_file.txt", "r").each {|line| puts line}
to print each (text) line to the console.
