in Ruby open IO object and pass each line to another object - ruby

I need to download a large zipped file, unzip it and modify each string before I save them to array.
I prefer to read downloaded zipped file line(entry) at a time, and manipulate each line(entry) as they load, rather then load the whole file in the memory.
I experimented with many IO methods of opening a file this way, but I struggle to pass a line(entry) to Zip::InputStream object. This is what I have:
require 'tempfile'
require 'zip'
require 'open-uri'
f = open(FILE_URL) #FILE_URL contains download path to .zip file
Zip::InputStream.open(f) do |io| #io is a String
while (io.get_next_entry)
io.each do |line|
# manipulate the line and push it to an array
end
end
end
if I use open(FILE_URL).each do |zip_entry|, I cannot figure out how to pass zip_entry to Zip::InputStream. Simply Zip::InputStream.open(zip_entry) does not work...
is this scenario possible, or do I have to have content of zipped file downloaded in to Tempfile completely? Any pointers so solve will be helpful

Related

Load gemspec from stdin

I'm trying to adapt some existing code to also handle gems. This existing code needs the version number of the thing in question (here: the gem) and does some git stuff to get the relevant file (here I take the gemspec) in the right version, and then passes it on stdin to another script that extract the version number (and other stuff).
To avoid having to write code to parse a gemspec, I was trying to do:
spec = Gem::Specification::load('-')
puts spec.name
puts spec.version
But I can't make it read from stdin (it works fine if I hardcode a file name, but that won't work in my usecase). Can I do that, or is there another (easy) way to do it?
Gem::Specification.load expects either a File instance or a path to a file as the first argument so the easiest way to solve this would be to simply create a Tempfile instance and write the data from stdin to it.
file = Tempfile.new
begin
file.write(data_from_stdin)
file.rewind
spec = Gem::Specification.load(file)
puts spec.name
puts spec.version
ensure
file.close
file.unlink
end

After loading a url by open-uri, how to handle the generated Tempfile object?

I wanna figure out how to download images from internet then store them locally.
Here's what I did:
require 'open-uri' # => true
file = open "https://s3-ap-southeast-1.amazonaws.com/xxx/Snip20180323_40.png"
# => #<Tempfile:/var/folders/k0/.../T/open-uri20180524-60756-1r44uix>
Then I was confused about this Tempfile object. I found I can get the original url by:
file.base_uri
# => #<URI::HTTPS https://s3-ap-southeast-1.amazonaws.com/xxx/Snip20180323_40.png>
But I failed in finding a method that can directly get the original file name Snip20180323_40.png.
Is there a method that can directly get the original file name from a Tempfile object?
What purpose are Tempfile objects mainly used for? Are they different from normal file objects such as: file_object = File.open('how_old.rb') # => #<File:how_old.rb>?
Can I convert a Tempfile object to a File object?
How can I write this Tempfile as the same name file in a local directory, for example /users/user_name/images/Snip20180323_40.png?
The original filename is only really available in the URL. Just take uri.path.split("/").last.
Tempfiles are effective Files, with the distinction that when it is garbage collected, the underlying file is deleted.
You can copy the underlying file with FileUtils.copy, or you can open the Tempfile, read it, and write it into a new File handle of your choosing.
Something like this should work:
def download_url_to(url, base_path)
uri = URI(url)
filename = uri.path.split("/").last
new_file = File.join(base_path, filename)
response = uri.open
open(new_file, "wb") {|fp| fp.puts response.read }
return new_file
end
It's worth noting that if the file is less than 10kb, you'll get a StringIO object rather than a Tempfile object. The above solution handles both cases. This also just accepts whatever the last part of the path parameter is - it's going to be up to you to sanitize it, as well as the contents of the file itself; you don't want to permit clients to download arbitrary files to your system, in most cases. For example, you may want to be extra sure that the filename doesn't include paths like ..\\..\\.."which may be used to write files to non-intended locations.

Ruby: Reading from a file written to by the system process

I'm trying to open a tmpfile in the system $EDITOR, write to it, and then read in the output. I can get it to work, but I am wondering why calling file.read returns an empty string (when the file does have content)
Basically I'd like to know the correct way of reading the file once it has been written to.
require 'tempfile'
file = Tempfile.new("note")
system("$EDITOR #{file.path}")
file.rewind
puts file.read # this puts out an empty string "" .. why?
puts IO.read(file.path) # this puts out the contents of the file
Yes, I will be running this in an ensure block to nuke the file once used ;)
I was running this on ruby 2.2.2 and using vim.
Make sure you are calling open on the file object before attempting to read it in:
require 'tempfile'
file = Tempfile.new("note")
system("$EDITOR #{file.path}")
file.open
puts file.read
file.close
file.unlink
This will also let you avoid calling rewind on the file, since your process hasn't written any bytes to it at the time you open it.
I believe IO.read will always open the file for you, which is why it worked in that case. Whereas calling .read on an IO-like object does not always open the file for you.

Ruby: Download zip file and extract

I have a ruby script that downloads a remote ZIP file from a server using rubys opencommand. When I look into the downloaded content, it shows something like this:
PK\x03\x04\x14\x00\b\x00\b\x00\x9B\x84PG\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\n\x00\x10\x00foobar.txtUX\f\x00\x86\v!V\x85\v!V\xF6\x01\x14\x00K\xCB\xCFOJ,RH\x03S\\\x00PK\a\b\xC1\xC0\x1F\xE8\f\x00\x00\x00\x0E\x00\x00\x00PK\x01\x02\x15\x03\x14\x00\b\x00\b\x00\x9B\x84PG\xC1\xC0\x1F\xE8\f\x00\x00\x00\x0E\x00\x00\x00\n\x00\f\x00\x00\x00\x00\x00\x00\x00\x00#\xA4\x81\x00\x00\x00\x00foobar.txtUX\b\x00\x86\v!V\x85\v!VPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00D\x00\x00\x00T\x00\x00\x00\x00\x00
I tried using the Rubyzip gem (https://github.com/rubyzip/rubyzip) along with its class Zip::ZipInputStream like this:
stream = open("http://localhost:3000/foobar.zip").read # this outputs the zip content from above
zip = Zip::ZipInputStream.new stream
Unfortunately, this throws an error:
Failure/Error: zip = Zip::ZipInputStream.new stream
ArgumentError:
string contains null byte
My questions are:
Is it possible, in general, to download a ZIP file and extract its content in-memory?
Is Rubyzip the right library for it?
If so, how can I extract the content?
I found the solution myself and then at stackoverflow :D (How to iterate through an in-memory zip file in Ruby)
input = HTTParty.get("http://example.com/somedata.zip").body
Zip::InputStream.open(StringIO.new(input)) do |io|
while entry = io.get_next_entry
puts entry.name
parse_zip_content io.read
end
end
Download your ZIP file, I'm using HTTParty for this (but you could also use ruby's open command (require 'open-uri').
Convert it into a StringIO stream using StringIO.new(input)
Iterate over every entry inside the ZIP archive using io.get_next_entry (it returns an instance of Entry)
With io.read you get the content, and with entry.name you get the filename.
Like I commented in https://stackoverflow.com/a/43303222/4196440, we can just use Zip::File.open_buffer:
require 'open-uri'
content = open('http://localhost:3000/foobar.zip')
Zip::File.open_buffer(content) do |zip|
zip.each do |entry|
puts entry.name
# Do whatever you want with the content files.
end
end

My file is getting shorter and I don't know why

I have a requirement where I need to edit part of xml file and save it, but in my code some part of the xml file it not saving.I want to modify <mtn:ttl>4</mtn:ttl> to <mtn:ttl>9</mtn:ttl>, this part is getting modified in the below code but while writting/saving only part of file is getting chaged or the format of the file is getting chaged, can any one tell me how to solve this? original xml file size is 79kb but after editing and saving its becoming 78kb...
require "rexml/text"
require "rexml/document"
include REXML
File.open("c://conf//cad-mtn-config.xml") do |config_file|
# Open the document and edit the file
config = Document.new(config_file)
if testField.to_s.match(/<mtn:ttl>/)
config.root.elements[4].elements[11].elements[1].elements[1].elements[1].elements[8].text="9"
# Write the result to a new file.
formatter = REXML::Formatters::Default.new
File.open("c://mtn-3//mtn-2.2//conf//cad-mtn-config.xml", 'w') do |result|
formatter.write(config, result)
end
end
end
It looks like your trying to use regular expressions, why not just use rexml? The only requirement is that you need to know where the namespace is located online. Note if it were not mtn:ttl and just ttl you would not need the namespace.
require 'rexml/document'
file_path="path to file"
contents=File.new(file_path).read
xml_doc=REXML::Document.new(contents)
xml_doc.add_namespace('mtn',"http://url to mtn namespace")
xml_doc.root.elements.each('mtn:ttl') do |element|
element.text="9"
end
File.open(file_path,"w") do |data|
data<<xml_doc
end

Resources