Buffered Multipart Form Posts in Ruby

I am currently using Net::HTTP in a Ruby script to post files to a website via a multipart form post. It works great for small files, but I frequently have to send very large files using this script, and HTTP#post only seems to accept post data as a String object, which means that the file I'm sending has to be read into memory before anything can be sent. This script is running on a busy production server, so it's unacceptable to gobble up hundreds of megabytes of RAM just to send a file.
Ideally, there'd be a method that could be given a buffer size and an IO object, and would send off buffer-sized chunks of data, reading from the IO object only as required. What would be the best way to make this happen? Did I miss something relevant in Net::HTTP?
Update: Net::HTTP#body_stream(input) looks good, though the documentation is rather... sparse. Anyone able to point me to a good example of this in action?

Actually I managed to upload a file using body_stream. The full source code is here:
http://stanislavvitvitskiy.blogspot.com/2008/12/multipart-post-in-ruby.html

Use Net::HTTP#body_stream(input)
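The blog post linked above builds the multipart body as a stream; the sketch below follows the same idea but is not the post's exact code. The MultiIO class, the form-field name "file", and the upload URL are all illustrative:

    require 'net/http'
    require 'uri'
    require 'stringio'

    # Concatenates several IO objects so Net::HTTP can pull the request
    # body one buffer-sized chunk at a time instead of holding it all in RAM.
    class MultiIO
      def initialize(*ios)
        @ios = ios
        @index = 0
      end

      # Net::HTTP only needs #read: with a length argument, return up to
      # +length+ bytes, or nil once every underlying IO is exhausted.
      def read(length = nil, outbuf = nil)
        data = ''.b # binary accumulator
        while (io = @ios[@index])
          chunk = length ? io.read(length - data.bytesize) : io.read
          if chunk && !chunk.empty?
            data << chunk
          else
            @index += 1 # current IO is done; move on to the next one
          end
          break if length && data.bytesize == length
        end
        return nil if length && data.empty? # nil signals EOF to the caller
        outbuf ? outbuf.replace(data) : data
      end
    end

    boundary = 'rubyboundary1234567890' # any string unlikely to appear in the file
    file = File.open('/path/to/big.file', 'rb')

    head = StringIO.new(
      "--#{boundary}\r\n" \
      "Content-Disposition: form-data; name=\"file\"; filename=\"#{File.basename(file.path)}\"\r\n" \
      "Content-Type: application/octet-stream\r\n\r\n"
    )
    tail = StringIO.new("\r\n--#{boundary}--\r\n")

    uri = URI.parse('http://example.com/upload') # placeholder endpoint
    req = Net::HTTP::Post.new(uri.path)
    req['Content-Type'] = "multipart/form-data; boundary=#{boundary}"
    req.content_length = head.size + file.size + tail.size
    req.body_stream = MultiIO.new(head, file, tail)

    response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }

Setting content_length up front lets Net::HTTP send a plain (non-chunked) body; alternatively, set req['Transfer-Encoding'] = 'chunked' and skip the length. Either way the file is read in small chunks, never all at once.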

Related

How to edit a bio write request

I'm writing a device-mapper which has to edit incoming writes, for example:
Incoming bio request contains a biovec with a page full of 0s, and I want to change it to all 1s.
I tried using bvec_kmap_local() to get the page address; then I can read the data and, if needed, adjust it using memcpy or similar. Initial tests seemed to work, but if I execute things like mkfs on the created dm, I get a lot of segmentation faults. I've narrowed the problem down: it only happens if I actually write something to the mapped page. However, I see no reason why this should cause a fault, since (after checking probably 100 times) I don't access invalid memory. I'm wondering if my method of editing the write is actually correct?
Thanks! I can provide way more information if needed.
So apparently you're not supposed to edit the pages of a bio request, since write submitters (like file systems) expect their data buffers not to be modified. Instead, you should create a new bio containing a page with the data you want, and submit that.

Save file from POST request to disk without storing in memory with Python's BaseHTTPServer

I'm writing an HTTP server in Python 2 with BaseHTTPServer. It needs to accept multiple connections at the same time, and on each connection the user can send a large file through a POST request. However, my understanding is that the whole request will be stored in the server's memory before being processed, and multiple files uploaded at the same time could exceed the amount of memory on the server. Is there any way to stream the file/request directly to a file on disk instead of storing it in memory?
BaseHTTPServer doesn't come with a POST handler out of the box, so you'll have to implement it yourself or find an implementation that works for you. (These are easy to search for; here's one I found that looked straightforward.)
Your question is similar to this question about limiting the max size of a POST; the answer points out that you'll need to read through all of that data in order to ensure proper browser functionality. The comments on that answer suggest other techniques ("e.g. AJAX and realtime notifications via WebSocket." – dmitry-nedbaylo)
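Whatever the language, the underlying technique is the same bounded-buffer copy loop the main question on this page is after: read a fixed-size chunk of the request body, write it to disk, and repeat until Content-Length bytes have been consumed. A sketch of the pattern in Ruby (the helper name, buffer size, and arguments are all illustrative):

    BUF_SIZE = 64 * 1024 # 64 KiB per read keeps memory use bounded

    # Hypothetical helper: copy +length+ bytes from +input_io+ (the
    # request body stream) to +path+ in fixed-size chunks.
    def save_stream(input_io, path, length)
      File.open(path, 'wb') do |out|
        remaining = length
        while remaining > 0
          chunk = input_io.read([BUF_SIZE, remaining].min)
          break if chunk.nil? || chunk.empty?
          out.write(chunk)
          remaining -= chunk.bytesize
        end
      end
    end

In BaseHTTPServer specifically, the request body is available as the file-like object self.rfile and the length comes from the Content-Length header, so the same loop applies.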

`finish': buffer error (Zlib::BufError) while web scraping with Ruby and Mechanize

I'm scraping multiple pages (from an array) and checking links on those pages. Every once in a while the script will crash and I get the following error:
/usr/local/rvm/rubies/ruby-2.0.0-p451/lib/ruby/2.0.0/net/http/response.rb:357:in `finish': buffer error (Zlib::BufError)
I found an answer here that says:

It's possible that you're hitting a URL that points to a load-balancer. One of the hosts behind that load-balancer is misconfigured, or perhaps it's configured differently than its peers, and is returning a gzipped version of the content where the others aren't. I've seen that problem in the past. I've also seen situations where the server said it was returning gzipped content but sent it uncompressed. Or it could be sending zipped, not gzipped. The combinations are many.

The fix is to be sure your code is capable of sensing whether the returned content is compressed. Make sure you're sending the correct acceptable-content HTTP headers for your code to their server too. You have to program defensively: look at the actual content you get back, branch to do the right decompression, then pass that on for parsing.
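In Ruby, that defensive branching might look something like the sketch below (the helper is hypothetical; body is the raw response body and encoding is the value of the Content-Encoding response header):

    require 'zlib'
    require 'stringio'

    def decompress_body(body, encoding)
      case encoding.to_s
      when /gzip/i
        Zlib::GzipReader.new(StringIO.new(body)).read
      when /deflate/i
        begin
          Zlib::Inflate.inflate(body)                       # zlib-wrapped deflate
        rescue Zlib::DataError
          Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(body) # raw deflate stream
        end
      else
        body
      end
    rescue Zlib::GzipFile::Error, Zlib::DataError, Zlib::BufError
      body # the header lied; the content was not actually compressed
    end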
I really don't understand it. I just want to be able to skip that element in the array and move on to the next one. Is there a simple way to do this?
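Yes: rescue the error inside your loop and move on. A minimal sketch, assuming urls is your array, agent is your Mechanize instance, and check_links is whatever your link-checking step is:

    urls.each do |url|
      begin
        page = agent.get(url)
        check_links(page)
      rescue Zlib::BufError => e
        warn "Skipping #{url}: #{e.message}"
        next # carry on with the next element of the array
      end
    end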

When sending a file via AJAX, does it get read into memory first?

I'm writing an uploader that has to be able to transmit files of any size (up to 30 gigs) to the server.
My original intention was to write a Java applet that would break the file up into pieces, send those to the server, and then reassemble them there.
However, someone has suggested that AJAX's XMLHttpRequest can do the job in conjunction with nsIFileInputStream (example here: https://developer.mozilla.org/en/using_xmlhttprequest#Sending_files_using_a_FormData_object) and by using PUT instead of POST.
I'm worried about 2 things and can't seem to find the answer.
1) Will AJAX attempt to read the file into memory before sending it? (That would obviously break the whole thing.)
[EDIT]
This http://www.codeproject.com/KB/ajax/AJAXFileUpload.aspx?msg=2329446 example explicitly states that they're using ActiveXObject because that DOESN'T load the file into memory... which suggests to me that XMLHttpRequest would load it into memory. I'm surprised I'm having such a hard time finding this info, to be honest.
2) How reliable is this approach? I realize that if the connection dies the upload would have to restart from scratch, but realistically, how likely is it that a 30 gig file would arrive at the server over a standard cable connection with an upload throttle of about 0.5 MB/s?
I'm trying something similar using the File API and blob.slice, but it turned out to eat up memory on large files. However, you could use Google Gears, which plays much better with large sliced files. It also doesn't cause errors with the slice order, which FileReader combined with XHR does frequently and randomly.
I do find, however, that uploading files via JavaScript is generally very unstable.

What data formats can AJAX transfer?

I'm new to AJAX, but as an overview I'd like to know what formats you can upload and download. Is it limited to JSON or XML, or can you even send binary types like MP3 or UTF-8 HTML? And finally, do you have full control over the data, byte for byte, in something like a byte array, or is only a string sent/received?
If we are talking about AJAX, we are talking about JavaScript and XMLHttpRequest, right?
XMLHttpRequest, which is just an HTTP request, can transfer anything. But there is no byte array in JavaScript, only strings, numbers and such. Everything you get from an AJAX call is a piece of text (responseText). That might be parsed into XML (which gives you responseXML). Special encodings are more a matter of the HTTP transport.
The binary stuff is not AJAX-dependent but JavaScript-dependent. There are some weird encodings for strings to carry byte data inside JavaScript (especially for images), but there is no general solution.
HTML is not a problem, and it is the most prominent use case: this type of request delivers an HTML string, which is added to some node in the DOM via innerHTML, and that parses the HTML.
Since data is transported via HTTP, you will have to make sure that you use some kind of encoding. One of the most popular is base64 encoding. You can find more information at: http://www.webtoolkit.info/javascript-base64.html
The methodology is to base64-encode the data you would like to send, then base64-decode it at the server (or the client) and use the original data as you intended.
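For example, in Ruby (the language of the main question above) the server-side round trip is a one-liner each way; the file name below is just a placeholder:

    require 'base64'

    data    = File.binread('clip.mp3')       # any binary payload
    encoded = Base64.strict_encode64(data)   # plain ASCII, safe to ship over HTTP
    decoded = Base64.strict_decode64(encoded)
    decoded == data                          # => true: the bytes survive intact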
You can transfer any type of data, either strings or bytes.
You can send anything you like, the problem may be how to handle it once you get it ;)
Standard HTML is probably the most common type of ajax content in use out there - you can choose character encoding too, although it's always best to stick with one type of encoding.
AJAX simply means you're transferring data asynchronously over HTTP with a JavaScript call. So your script makes a "normal" HTTP request using the XmlHttpRequest() object. However, as the name implies, it's really only suited for text-based data formats since you generally want to perform some action on the client side with the data you got back from the server (not always though, sometimes people just send XmlHttpRequests only to update something on the server).
On a side note, I have never seen an application where sending binary data would have been appropriate anyway.
Most often, people choose to send data to the server with POST or GET, which are basically HTTP's methods for transferring name-value pairs. For sending more complex data, for example hierarchical structures, the data needs to be encoded somehow. XML documents can be created natively in JavaScript, sent over to the server, and parsed into whatever data types are necessary. But since XML can be a bit of a pain, many devs use JSON-encoded data instead, because it's easy to generate and easy to parse.
What the server sends back is equally as arbitrary. Usually, you specify a callback function in your Javascript that handles the incoming data. Again, the popular choices are XML and JSON, they parse easily into a document object or an array structure respectively. You could also send plain text or some other packaging but remember that you then have to take care of extracting the usable data from it yourself. Sometimes, it can also be beneficial to send actual HTML fragments to the client to update something on the page directly.
For starters, I suggest you have a look at jQuery. It's a very lightweight framework that abstracts away much of the evil compatibility stuff and lets you write AJAX requests very nicely.
You can move anything that can be sent over HTTP. There are restrictions requiring the call to go to the same domain the page was loaded from, but no restrictions on the content of the transfer. You can do either GET or POST transactions too.
There is a Digg the Blog entry titled DUI.Stream and MXHR that shows off what they call "Multipart XMLHttpRequests." It is alpha code now, but there is a demo that handles images.
