How do I get Zlib to compress to a stream in Ruby?

I’m trying to upload files to Amazon S3 using AWS::S3, but I’d like to compress them with Zlib first. AWS::S3 expects its data to be a stream object, i.e. you would usually upload a file with something like
AWS::S3::S3Object.store('remote-filename.txt', open('local-file.txt'), 'bucket')
(Sorry if my terminology is off; I don’t actually know much about Ruby.) I know that I can zlib-compress a file with something like
data = Zlib::Deflate.deflate(File.read('local-file.txt'))
but passing data as the second argument to S3Object.store doesn’t seem to do what I think it does. (The upload goes fine but when I try to access the file from a web browser it doesn’t come back correctly.) How do I get Zlib to deflate to a stream, or whatever kind of object S3Object.store wants?

I think my problem before was not that I was passing the wrong kind of thing to S3Object.store, but that I was generating a zlib-compressed data stream without the header you’d usually find in a .gz file. In any event, the following worked:
str = StringIO.new
gz = Zlib::GzipWriter.new(str)        # gzip wrapper, i.e. the header/trailer a .gz file has
gz.write File.read('local-file.txt')
gz.close                              # close flushes the stream and writes the gzip trailer
AWS::S3::S3Object.store('remote-filename.txt', str.string, 'bucket')
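If you'd rather hand S3Object.store an IO object (the "stream" the question asks about), rewinding the StringIO and passing it directly should also work, since the question shows store accepting the result of open(). A minimal sketch under that assumption:
str.rewind  # reading resumes from the start of the buffer
AWS::S3::S3Object.store('remote-filename.txt', str, 'bucket')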

Related

Missing data when decompressing zlib data with Ruby

I have deflated JSON encoded in a request log and I need to decompress it.
I tried to use zlib, i.e.:
Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(File.read("PATH_OF_FILE"))
It shows only a part of the JSON. Something like:
"{\"seq\":53,\"app_id\":\"567067343352427\",\"app_ver\":\"10.3.2\",\"build_num\":\"46395473\",\"device_id\":\"c12f541a-5936-4477-b6fc-653db675d16"
There is a lot of missing data because the deflated data is too big.
Deflate full data:
Check it here.
After testing, I figured out that only this part is being decompressed:
Check it here.
I'm a bit confused by this. Could someone please help me with it?
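One thing worth ruling out first: File.read can apply newline/encoding translation when the file is opened in text mode (notably on Windows), which silently corrupts compressed data. A minimal sketch that reads the bytes verbatim before inflating (the path is the placeholder from the question):
require "zlib"

# File.binread never does text-mode translation, unlike File.read.
data = File.binread("PATH_OF_FILE")
json = Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(data)  # negative window bits = raw deflate
puts json.bytesize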

open-uri and SAX parsing for a giant XML document

I need to connect to an external XML file (300 MB+) to download and process.
Then run through the XML document and save elements to the database.
I am already doing this no problem on a production server with Saxerator to be gentle on memory. It works great. Here is my issue now --
I need to use open-uri (though there could be alternative solutions?) to grab the file to parse through. The problem is that open-uri has to load the whole file before anything starts parsing, which defeats the entire purpose of using a SAX parser to save memory... any workarounds? Can I just read from the external XML document as a stream? I cannot load the entire file or it crashes my server, and since the document is updated every 30 minutes, I can't just save a copy of it on my server (though that is what I am doing currently to make sure everything is working).
P.S. I am doing this in Ruby.
You may want to try Net::HTTP's streaming interface instead of open-uri. This will give Saxerator (via the underlying Nokogiri::SAX::Parser) an IO object rather than the entire file.
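A minimal sketch of that idea (the URL and the :record tag name are placeholders; the pipe lets Saxerator consume the body while it is still downloading):
require "net/http"
require "saxerator"

uri = URI("http://example.com/big-feed.xml")  # placeholder URL

rd, wr = IO.pipe
producer = Thread.new do
  begin
    Net::HTTP.start(uri.host, uri.port) do |http|
      http.request_get(uri.request_uri) do |response|
        response.read_body { |chunk| wr.write(chunk) }  # push chunks into the pipe as they arrive
      end
    end
  ensure
    wr.close  # signals end-of-input to the reader side
  end
end

Saxerator.parser(rd).for_tag(:record).each do |record|
  # save each element to the database here
end
producer.join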
I took a few minutes to write this up and then realized you tagged this question with ruby. My solution is in Java so I apologize for that. I'm still including it here since it could be useful to you or someone down the road.
This is how I've always processed large external XML files:
XMLReader xmlReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
xmlReader.setFeature("http://xml.org/sax/features/namespaces", true);
XMLFilter filter = new XMLFilterImpl();
filter.setParent(xmlReader);
filter.parse(new InputSource(new BufferedReader(new InputStreamReader(new URL("<url to external document here>").openConnection().getInputStream(),"UTF8"))));

Using gzip compression in Sinatra with Ruby

Note: I had another, similar question about how to gzip data using Ruby's zlib, which technically was answered. I didn't feel I could keep evolving that question since it had been answered, so although this question is related, it is not the same...
The following code (I believe) gzips a static CSS file and stores the result in the result variable. But what do I do with it next? How can I send this data back to the browser so it is recognised as gzip'ed rather than served at the original file size (e.g. when checking my YSlow score I want to see it correctly credit me for gzipping static resources)?
z = Zlib::Deflate.new(6, 31)  # windowBits = 31 selects the gzip wrapper
result = z.deflate(File.read('public/Assets/Styles/build.css'))
result << z.finish # could also have done: result = z.deflate(file, Zlib::FINISH)
z.close
...one thing to note is that in my previous question the respondent clarified that Zlib::Deflate.deflate will not produce gzip-encoded data. It will only produce zlib-encoded data and so I would need to use Zlib::Deflate.new with the windowBits argument equal to 31 to start a gzip stream.
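To make that distinction concrete (a sketch; the leading bytes are fixed by the two formats, not by this code):
require "zlib"

zlib_data = Zlib::Deflate.deflate("hello")            # zlib wrapper
gz = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, 31) # windowBits 31 = gzip wrapper
gzip_data = gz.deflate("hello", Zlib::FINISH)
gz.close

zlib_data.bytes.first(2)  # => [120, 156]  the zlib header (0x78 0x9c)
gzip_data.bytes.first(2)  # => [31, 139]   the gzip magic bytes (0x1f 0x8b)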
But when I run this code I don't actually know what to do with the result variable and its contents. There is no information on the internet (that I can find) about how to send gzip-encoded static resources (JavaScript, CSS, HTML, etc.) to the browser so the page loads quicker. It seems every Ruby article I read is aimed at someone using Ruby on Rails!?
Any help really appreciated.
After zipping the file you would simply return the result and make sure to set the Content-Encoding: gzip header on the response. Google has a nice little introduction to gzip compression and what you have to watch out for. Here is what you could do in Sinatra:
get '/whatever' do
  headers['Content-Encoding'] = 'gzip'
  StringIO.new.tap do |io|
    gz = Zlib::GzipWriter.new(io)
    begin
      gz.write(File.read('public/Assets/Styles/build.css'))
    ensure
      gz.close
    end
  end.string
end
One final word of caution, though. You should probably choose this approach only for content that you created on the fly or if you just want to use gzip compression in a few places.
If, however, your goal is to serve most or even all of your static resources with gzip compression enabled, then it will be a much better solution to rely on what is already supported by your web server instead of polluting your code with this detail. There's a good chance that you can enable gzip compression with some configuration settings. Here's an example of how it is done for nginx.
Another alternative would be to use the Rack::Deflater middleware.
Just to highlight the Rack::Deflater approach as an answer:
As mentioned in the comment above, just enable the middleware in config.ru:
use Rack::Deflater
That's pretty much it!
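A complete config.ru sketch, assuming a classic-style Sinatra app (the "./app" path is a placeholder):
# config.ru
require "rack/deflater"
require "./app"

use Rack::Deflater  # negotiates compression via the request's Accept-Encoding header
run Sinatra::Application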
Since the use case here is compressing web-related data like CSS files, I want to recommend brotli. It was heavily optimized for exactly this purpose, and every modern web browser today supports it.
You can use ruby-brs bindings for ruby.
gem install ruby-brs
require "brs"
require "sinatra"
get "/" do
headers["Content-Encoding"] = "br"
BRS::String.compress File.read("sample.css")
end
You can use the streaming interface instead; it is similar to the Zlib interface.
require "brs"
require "sinatra"
get "/" do
headers["Content-Encoding"] = "br"
StringIO.new.tap do |io|
writer = BRS::Stream::Writer.new io
begin
writer.write File.read("sample.css")
ensure
writer.close
end
end
.string
end
You can also use the nonblocking methods; please see the ruby-brs documentation for more information.

Reading a zip file from a GET request without saving first in Ruby?

I am trying to read a zip file from an HTTP GET request. One way to do it is by saving the response body to a physical file first and then reading that zip file to get at the files inside.
Is there a way to read the files inside directly, without having to save the zip to a physical file first?
My current code:
Net::HTTP.start("clinicaltrials.gov") do |http|
  resp = http.get("/ct2/results/download?id=15002A")
  open("C:\\search_result.zip", "wb") do |file|
    file.write(resp.body)
  end
end

Zip::ZipFile.open("C:\\search_result.zip") do |zipfile|
  xml = zipfile.file.read("search_result.xml")
end
Looks like you're using rubyzip, which can't unzip from an in-memory buffer.
You might want to look at using Chilkat's Ruby Zip Library instead as it supports in-memory processing of zip data. It claims to be able to "Create or open in-memory Zips", though I have no experience with the library myself. Chilkat's library isn't free, however, so that may be a consideration. Not sure if there is a free library that has this feature.
One way might be to implement an in-memory file, so that RubyZip can still play with your "file" without anything else changing. You should take a look at this Ruby hack.
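For what it's worth, newer versions of rubyzip can read from any IO via Zip::InputStream, so a StringIO wrapped around the response body works as exactly that kind of in-memory file. A sketch, assuming a recent rubyzip:
require "net/http"
require "stringio"
require "zip"  # rubyzip

body = Net::HTTP.start("clinicaltrials.gov") do |http|
  http.get("/ct2/results/download?id=15002A").body
end

# Wrap the response body in an in-memory IO and stream entries out of it.
Zip::InputStream.open(StringIO.new(body)) do |zis|
  while (entry = zis.get_next_entry)
    xml = zis.read if entry.name == "search_result.xml"
  end
end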

Ruby CGI Stream Audio

What I want to do is use a CGI script (written in Ruby) to read a binary file off of the filesystem (audio, specifically) and stream it to the browser.
This is how I am doing that so far:
require 'config'
ENV['REQUEST_URI'] =~ /\?(.*)$/
f = $config[:mdir] + '/' + $1.url_decode
f = File.open f, 'rb'
print "Content-Type: audio/mpeg\r\n" # TODO: make it guess the MIME type
print "\r\n"
# Output the file
while blk = f.read(4096)
  $stdout.puts blk
  $stdout.flush
end
f.close
It's not perfect; there are security holes (it exposes the whole filesystem...), but it just isn't working right. It's reading the right file and, as far as I can tell, outputting it in 4 KB blocks like it should. In Safari, if I go to the URL, the audio player shows a question mark. If I use wget to download it, the file appears to work and is about the right size, but it is corrupted: it begins playing fine, then crackles, and then stops.
How should I go about doing this? Do I need to Base64-encode it, and if so, can I do that without loading the whole file into memory in one go?
Btw, this is only going to be used over local area network, and I want easy setup, so I'm not interested in a dedicated streaming server.
You could just use IO#print instead of IO#puts but this has some disadvantages.
Don't do the file handling in Ruby
Ruby is not good at doing stupid tasks very fast. With this code, you will probably not be able to fill your bandwidth.
No CGI
All you want to do is expose a part of the filesystem via HTTP. Here are some options for how to do it:
Set your document root to the folder you want to expose.
Make a symlink to the folder you want to expose.
Write some kind of rule in your web server config to map certain URLs to a folder
Use any of these and the HTTP server will do what you want for you.
X-Sendfile
Some HTTP servers honor a special header called X-Sendfile. If you do
print "X-Sendfile: #{file}"
the server will use the specified file as the response body. In lighttpd this works only through FastCGI, so you would need a FastCGI wrapper like http://cgit.stbuehler.de/gitosis/fcgi-cgi/ . I don't know about Apache and the others.
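In CGI terms that would look roughly like this (the path is illustrative, and each header line needs its terminating CRLF):
print "X-Sendfile: /srv/media/song.mp3\r\n"  # hypothetical path; the server streams this file
print "Content-Type: audio/mpeg\r\n"
print "\r\n"                                  # blank line ends the headers; no body needed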
Use a C extension
C extensions are good at doing stupid tasks fast. I wrote a C extension which does nothing but read from one IO and write to another IO. With this C extension you can fill a gigabit LAN through Ruby: git://johannes.krude.de/io2io
Use it like this:
IO2IO.do(file, STDOUT)
puts is for writing "lines" of "text", and it therefore appends a newline to whatever you give it (turning your MPEGs into garbage). You want syswrite.
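Applied to the loop from the question, the fix looks like this (a sketch; IO#write would also work, since it too sends the bytes verbatim):
$stdout.flush            # flush the buffered header lines before bypassing the buffer
while blk = f.read(4096)
  $stdout.syswrite blk   # writes the bytes verbatim; puts was appending "\n" to every block
end
f.close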
