Transfer XML as text or as Stream (Binary) - asp.net-web-api

We would like to transfer an XML document to a Web API that can accept text as well as binary data.
What is the best way to transfer it in terms of traffic size?
Is it better to transfer it as clear text or as a stream of binary data?

If you are concerned that the XML data you want to transfer is too large, then you can try compression, gzip being the most popular. Web API has some built-in functionality for this, but you could also "roll your own" if you like, for example if you want a different compression algorithm.
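To make this concrete, here is a minimal client-side sketch in Python (illustrative only: the endpoint URL is hypothetical, and the server needs something like the handlers linked below to decompress Content-Encoding: gzip on the way in):

import gzip
import urllib.request

xml_payload = b"<order><id>42</id><item>widget</item></order>"
# Tiny samples may not shrink because of gzip's own header overhead;
# real-world XML with lots of repeated tags usually compresses very well.
compressed = gzip.compress(xml_payload)

req = urllib.request.Request(
    "http://example.com/api/orders",  # hypothetical endpoint
    data=compressed,
    headers={
        "Content-Type": "application/xml",
        "Content-Encoding": "gzip",   # tell the server the body is gzipped
    },
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, len(resp.read()))

print("raw:", len(xml_payload), "bytes; gzipped:", len(compressed), "bytes")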
Fortunately, there's plenty of code around to help with compressing and decompressing your data stream. Take a look at the following:
MS nuget: https://www.nuget.org/packages/Microsoft.AspNet.WebApi.MessageHandlers.Compression/
http://benfoster.io/blog/aspnet-web-api-compression (blog article with a link to GitHub code)
https://github.com/benfoster/Fabrik.Common/tree/master/src/Fabrik.Common.WebAPI (the GitHub code mentioned above)
(SO) Compression filter for Web API
Finally, you could consider using Expect: 100-Continue. If an API client is about to send a request with a large entity body, like a POST, PUT, or PATCH, it can send "Expect: 100-continue" in its HTTP headers and wait for a "100 Continue" response before sending the entity body. This allows the API server to verify much of the validity of the request before bandwidth is wasted on a request that will only draw an error response (such as a 401 or a 403). Support for this functionality is not very common, but it can improve API responsiveness and reduce bandwidth in some scenarios (RFC 2616 §8.2.3).
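To see the handshake itself, here is a hedged raw-socket sketch in Python (host and path are placeholders; it assumes the server implements 100-continue, and a robust client would also fall back to sending the body after a short timeout, as the RFC allows):

import socket

HOST, PATH = "example.com", "/upload"  # placeholders
body = b"<big-xml-document/>"

headers = (
    "POST {path} HTTP/1.1\r\n"
    "Host: {host}\r\n"
    "Content-Type: application/xml\r\n"
    "Content-Length: {length}\r\n"
    "Expect: 100-continue\r\n"
    "Connection: close\r\n"
    "\r\n"
).format(path=PATH, host=HOST, length=len(body)).encode("ascii")

with socket.create_connection((HOST, 80)) as sock:
    sock.sendall(headers)               # headers only; hold the body back
    first = sock.recv(4096)             # interim or final response
    if first.startswith(b"HTTP/1.1 100"):
        sock.sendall(body)              # server said "go ahead"
        first = sock.recv(4096)         # now read the real response
    # Otherwise the server rejected us (401, 403, 413, ...) and we never
    # spent bandwidth on the entity body.
    print(first.decode("latin-1", "replace"))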
While I appreciate an answer full of links can be problematic if those links go out-of-date or get deleted, explaining Web API compression here is just too large a subject. I hope my answer steers you in a useful direction.

Related

Save file from POST request to disk without storing in memory with Python's BaseHTTPServer

I'm writing an HTTP server in Python 2 with BaseHTTPServer, and it needs to accept multiple connections at the same time; on each connection the user can send a large file through a POST request. My understanding, however, is that the whole request will be stored in the server's memory before being processed, and multiple files uploaded at the same time could exceed the amount of memory on the server. Is there any way to stream the file/request directly to a file on disk, instead of storing it in memory?
BaseHTTPServer doesn't come with a POST handler out of the box, so you'll have to implement it yourself or find an implementation that works for you. (These are easy to search for; here's one I found that looked straightforward.)
Your question is similar to this question about limiting the maximum size of a POST; the answer points out that you'll need to read through all that data in order to ensure proper browser functionality. The comments on that answer suggest other techniques ("e.g. AJAX and realtime notifications via WebSocket." #dmitry-nedbaylo)
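The core streaming idea is straightforward, because BaseHTTPServer hands you self.rfile, a file object on the socket, so nothing forces you to slurp the whole body. A minimal Python 2 sketch (chunk size and destination path are arbitrary, and multipart parsing is deliberately ignored; the raw body is stored as-is):

import BaseHTTPServer

class UploadHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_POST(self):
        remaining = int(self.headers.getheader('Content-Length'))
        with open('/tmp/upload.bin', 'wb') as out:   # arbitrary destination
            while remaining > 0:
                chunk = self.rfile.read(min(65536, remaining))
                if not chunk:
                    break                            # client hung up early
                out.write(chunk)
                remaining -= len(chunk)
        self.send_response(200)
        self.end_headers()
        self.wfile.write('stored\n')

# Mix SocketServer.ThreadingMixIn into HTTPServer to take several uploads
# concurrently; the plain server below handles one connection at a time.
if __name__ == '__main__':
    BaseHTTPServer.HTTPServer(('', 8000), UploadHandler).serve_forever()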

Calculate size of JSON Data Endpoint?

I am making a call to an endpoint that returns JSON. When I save the data to a file, the total size is 500 kilobytes. What I wanted to do was compress the JSON, but I heard that by just enabling compression on the web server (Apache) I would accomplish the same thing. I have now done that and enabled compression. But how do I get the size of the DOWNLOAD, and not the size of the file if I save it?
It's not quite as simple as just enabling compression on the web server. The HTTP request received by the server must include the Accept-Encoding header to indicate which compression scheme or schemes the client supports.
The most common is: Accept-Encoding: gzip.
You'd likely need to use a packet sniffer or debugging proxy (Fiddler or equivalent) to determine the difference in payload size between the compressed and uncompressed forms. Most HTTP libraries I am aware of decompress the payload before passing it back to the calling code.
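If you would rather measure it in code than with a sniffer, pick an HTTP client that does not transparently decompress. A sketch with Python's http.client, which hands back the body exactly as it came off the wire (host and path are placeholders):

import gzip
import http.client

conn = http.client.HTTPSConnection("example.com")  # placeholder host
conn.request("GET", "/data.json", headers={"Accept-Encoding": "gzip"})
resp = conn.getresponse()
raw = resp.read()                                  # bytes as actually transferred

print("Content-Encoding:", resp.getheader("Content-Encoding"))
print("downloaded bytes:", len(raw))
if resp.getheader("Content-Encoding") == "gzip":
    print("decompressed bytes:", len(gzip.decompress(raw)))
conn.close()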

What data formats can AJAX transfer?

I'm new to AJAX, but as an overview I'd like to know what formats you can upload and download. Is it limited to JSON or XML, or can you even send binary types like MP3, or UTF-8 HTML? And finally, do you have full control over the data, byte for byte, in something like a byte array, or is only a string sent/received?
If we are talking about Ajax, we are talking about JavaScript? And about XMLHttpRequest?
XMLHttpRequest, which is just an HTTP request, can transfer anything. But there is no byte array in JavaScript, only strings, numbers and such. Everything you get back from an Ajax call is a piece of text (responseText); that might be parsed into XML (which gives you responseXML). Special encodings are more a matter of the HTTP transport.
Binary data is not an Ajax limitation but a JavaScript one. There are some odd string encodings for carrying byte data in JavaScript (especially for images), but they are not a general solution.
HTML is not a problem, and it is the most prominent use case: the request delivers an HTML string, which is added to some node in the DOM via innerHTML, which in turn parses the HTML.
Since the data is transported via HTTP, you will have to make sure that you use some kind of text-safe encoding, one of the most popular being Base64. You can find more information at: http://www.webtoolkit.info/javascript-base64.html
The methodology is to Base64-encode the data you would like to send, then Base64-decode it at the server (or the client) and use the original data as you intended.
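To see the round trip and what it costs, a minimal Base64 sketch in Python (the data is arbitrary; the same encode/decode pair exists in JavaScript and virtually every server language):

import base64

original = bytes(range(256))            # arbitrary binary data
encoded = base64.b64encode(original)    # text-safe, fine inside a request body
decoded = base64.b64decode(encoded)     # restores the original bytes exactly

assert decoded == original
print(len(original), "bytes ->", len(encoded), "bytes (~4/3 overhead)")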
You can transfer any type of data, either strings or bytes.
You can send anything you like, the problem may be how to handle it once you get it ;)
Standard HTML is probably the most common type of ajax content in use out there - you can choose character encoding too, although it's always best to stick with one type of encoding.
AJAX simply means you're transferring data asynchronously over HTTP with a JavaScript call. So your script makes a "normal" HTTP request using the XMLHttpRequest object. However, as the name implies, it's really only suited to text-based data formats, since you generally want to perform some action on the client side with the data you got back from the server (not always, though; sometimes people send requests just to update something on the server).
On a side note, I have never seen an application where sending binary data would have been appropriate anyway.
Most often, people choose to send data over to the server with POST or GET (which are basically methods for transferring the name-value pairs inherent to HTTP). For sending more complex data, for example hierarchical structures, it needs to be encoded somehow. XML documents can be created natively in JavaScript, sent over to the server, and parsed into whatever data types are necessary. But since XML can be a bit of a pain, many devs use JSON-encoded data instead, because it's easy to generate and easy to parse.
What the server sends back is equally arbitrary. Usually, you specify a callback function in your JavaScript that handles the incoming data. Again, the popular choices are XML and JSON; they parse easily into a document object or an array structure respectively. You could also send plain text or some other packaging, but remember that you then have to take care of extracting the usable data yourself. Sometimes it can be beneficial to send actual HTML fragments to the client to update something on the page directly.
For starters, I suggest you have a look at jQuery. It's a very lightweight framework that abstracts away much of the evil compatibility stuff and lets you write AJAX requests very nicely.
You can move anything that can be sent over HTTP. There are restrictions requiring the call to be made to the same domain the page was loaded from, but no restrictions on the content of the transfer. You can do either GET or POST transactions, too.
There is a Digg the Blog entry titled DUI.Stream and MXHR that shows off what they call "Multipart XMLHttpRequests." It is alpha code now, but there is a demo that handles images.

Why Is HTTP/SOAP considered to be "thick"

I've heard some opinions that the SOAP/HTTP web service call stack is "thick" or "heavyweight," but I can't really pinpoint why. Would it be considered thick because of the serialization/deserialization of the SOAP envelope and the message? Is that really a heavy-weight operation?
Or is it just considered "thick" compared to a raw/binary data transfer over a fixed connection?
Or is it some other reason? Can anyone shed some light on this?
SOAP is designed to be abstract enough to use other transports besides HTTP. That means, among other things, that it does not take advantage of certain aspects of HTTP (mostly RESTful usage of URLs and methods, e.g. PUT /customers/1234 or GET /customers/1234).
SOAP also bypasses existing TCP/IP mechanisms for the same reason - to be transport-independent. Again, this means it can't take advantage of the transport, such as sequence management, flow control, service discovery (e.g. accept()ing a connection on a well-known port means the service exists), etc.
SOAP uses XML for all of its serialization - while that means that data is "universally readable" with just an XML parser, it introduces so much boilerplate that you really need a SOAP parser in order to function efficiently. And at that point, you (as a software consumer) have lost the benefit of XML anyways; who cares what the payload looks like over the wire if you need libSOAP to handle it anyways.
SOAP requires WSDL in order to describe interfaces. The WSDL itself isn't a problem, but it tends to be advertised as much more "dynamic" than it really is. In many cases, a single WSDL is created, and producer/consumer code is auto-generated from that, and it never changes. Overall, that requires a lot of tooling around without actually solving the original problem (how to communicate between different servers) any better. And since most SOAP services run over HTTP, the original problem was already mostly solved to begin with.
SOAP and WSDL are extremely complicated standards, with many implementations that support different subsets of them. SOAP does not map to a simple foreign-function interface the way XML-RPC does. Instead, you have to understand XML namespaces, envelopes, headers, WSDL, XML schemas, and so on to produce correct SOAP messages. All you need to do to call an XML-RPC service is define an endpoint and call a method on it. For example, in Ruby:
require 'xmlrpc/client'
server = XMLRPC::Client.new2("http://example.com/api")
result = server.call("add", 1, 2)
Besides XML-RPC, there are other techniques that can also be much more simple and lightweight, such as plain XML or JSON over HTTP (frequently referred to as REST, though that implies certain other design considerations). The advantage of something like XML or JSON over HTTP is that it's easy to use from JavaScript or even just a dumb web page with a form submission. It can also be scripted easily from the command line with tools like curl. It works with just about any language as HTTP libraries, XML libraries, and JSON libraries are available almost everywhere, and even if a JSON parser is not available, it is very easy to write your own.
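To underline that point, here is roughly the same call as the Ruby XML-RPC snippet above, done as plain JSON over HTTP in Python (the endpoint and the response shape are hypothetical):

import json
import urllib.request

req = urllib.request.Request(
    "http://example.com/api/add",               # hypothetical endpoint
    data=json.dumps({"a": 1, "b": 2}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:       # POST, since data is given
    print(json.load(resp))                      # e.g. {"result": 3}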
Edit: I should clarify that I am referring to how conceptually heavyweight SOAP is, as opposed to how heavyweight it is in terms of raw amount of data. I think the raw amount of data is less important (though it adds up quickly if you need to handle lots of small requests), while the conceptual weight is quite important, because it means there are many more places where something can go wrong or where there can be an incompatibility.
I agree with the first poster, but would like to add to it. The "thick" and "thin" definitions are relative. With alternatives like JSON and REST emerging, SOAP looks heavy on the surface for "hello world" examples. What makes SOAP, and the WS-* stack in general, heavy is the enterprise/robustness features: JSON cannot be secured in the way that WS-Security-enabled SOAP can. I have not heard SOAP referred to as "thick", but many people who are not XML nuts look at these specifications as heavy or thick. To be clear, I am not speaking for or against either, as both have their place. XML is more verbose and human-readable, and thus "thicker". The last piece is that some people view HTTP, a connection-per-request protocol, as heavy in light of newer web trends like AJAX, where many small requests are made rather than serving up one big page: the connection overhead is large for little benefit.
In summary, there is no real reason other than that someone wants to call SOAP/HTTP thick; it is all relative. Few standards are perfect or suit all scenarios. If I had to guess, some smart web developer thought he was being oh so clever by talking about how thick XML technologies are and how super JSON is. Each has its place.
SOAP's signal-to-noise ratio is too low. For a simple conversation there's too much structural overhead with no data value; and there's too much explicit configuration required (as compared to implicit configuration, like JSON).
It didn't start out that way, but it ended up being a poster-child for what happens to a good idea when a standards committee gets involved.
1 - XML schemas, which are a key part of the WSDL spec, are really, really big and complicated. In practice, tools that do things like map XML Schema to programming-language constructs end up supporting only part of the XML Schema features.
2 - The WS-* specs, e.g. WS-Security and WS-SecureConversation, are again big and complicated. They are almost designed so that no one with fewer resources than Microsoft or IBM would ever be able to implement them completely.
First of all, it depends a lot on how your services are implemented (i.e. you can do a lot to reduce the payload by just being careful of how your method signatures are done).
That said, not only the SOAP envelope but the message itself can be a lot more bulky in XML than in a streamlined binary format. Just choosing the right class and member names can reduce it a lot...
Consider the following examples of serialized returns from methods returning a collection of stuff. Just choosing the right [serialization] name for classes/wrappers and members can make a big difference in the verbosity of the serialized SOAP request/response if you're returning repeated data (e.g. lists/collections/arrays).
Brief / short names:
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfShortIDName xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://tempuri.org/">
  <ShortIDName>
    <id>0</id>
    <name>foo 0</name>
  </ShortIDName>
  <ShortIDName>
    <id>1</id>
    <name>foo 1</name>
  </ShortIDName>
  <ShortIDName>
    <id>2</id>
    <name>foo 2</name>
  </ShortIDName>
  ...
</ArrayOfShortIDName>
Long names:
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfThisClassHasALongClassNameIDName xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://tempuri.org/">
  <ThisClassHasALongClassNameIDName>
    <MyLongMemberNameObjectID>0</MyLongMemberNameObjectID>
    <MyLongMemberNameObjectName>foo 0</MyLongMemberNameObjectName>
  </ThisClassHasALongClassNameIDName>
  <ThisClassHasALongClassNameIDName>
    <MyLongMemberNameObjectID>1</MyLongMemberNameObjectID>
    <MyLongMemberNameObjectName>foo 1</MyLongMemberNameObjectName>
  </ThisClassHasALongClassNameIDName>
  <ThisClassHasALongClassNameIDName>
    <MyLongMemberNameObjectID>2</MyLongMemberNameObjectID>
    <MyLongMemberNameObjectName>foo 2</MyLongMemberNameObjectName>
  </ThisClassHasALongClassNameIDName>
  ...
</ArrayOfThisClassHasALongClassNameIDName>
I considered it "thick" because of the relatively large overhead involved with packaging and unpacking a message (serializing and deserializing).
Consider a web service with a web method called Add that takes two 32-bit integers. The caller packages up two integers and receives a single integer in reply. While there are really only 96 bits of information being transmitted, the SOAP packaging will probably add around 3,000 or more extra bits in each direction: a 30x increase.
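To make that concrete, an illustrative SOAP 1.1 request body for such an Add method might look like the following (tempuri.org is the classic placeholder namespace); at roughly 400 bytes, it is already more than 3,000 bits of packaging around 64 bits of operands:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Add xmlns="http://tempuri.org/">
      <a>1</a>
      <b>2</b>
    </Add>
  </soap:Body>
</soap:Envelope>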
Added to this is the relatively slow performance associated with serializing and deserializing the message into UTF-8 (or whatever) XML. Admittedly it's pretty fast these days, but it's certainly not trivial.
I think it's mainly that the SOAP envelope adds a large amount of overhead to constructing the message, especially for the common case of a simple request with only a few, not-deeply-structured parameters. Compare that to a REST style web service where the parameters are simply included in the URL query.
Then add to that the complexity of WSDL and the typical "enterprise" library implementations...

Buffered Multipart Form Posts in Ruby

I am currently using Net::HTTP in a Ruby script to post files to a website via a multipart form post. It works great for small files, but I frequently have to send very large files using this script, and HTTP#post only seems to accept post data as a String object, which means that the file I'm sending has to be read into memory before anything can be sent. This script is running on a busy production server, so it's unacceptable to gobble up hundreds of megabytes of RAM just to send a file.
Ideally, there'd be a method that could be given a buffer size and an IO object, and would send off buffer-sized chunks of data, reading from the IO object only as required. What would be the best way to make this happen? Did I miss something relevant in Net::HTTP?
Update: Net::HTTP#body_stream(input) looks good, though the documentation is rather... sparse. Anyone able to point me to a good example of this in action?
Actually I managed to upload a file using body_stream. The full source code is here:
http://stanislavvitvitskiy.blogspot.com/2008/12/multipart-post-in-ruby.html
Use Net::HTTP#body_stream(input)
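The Ruby specifics are covered by the links above; for the shape of the technique, here is the same buffered idea sketched in Python (purely illustrative; host, path, and file are placeholders). The point is to hand the HTTP library an open file object so it reads and sends in chunks instead of building one giant String:

import os
import http.client

path = "/tmp/big-upload.bin"                      # placeholder file
conn = http.client.HTTPConnection("example.com")  # placeholder host
with open(path, "rb") as f:
    # http.client reads file-like bodies in blocks rather than slurping
    # them; the explicit Content-Length avoids chunked transfer-encoding.
    conn.request("POST", "/upload", body=f, headers={
        "Content-Length": str(os.path.getsize(path)),
        "Content-Type": "application/octet-stream",
    })
resp = conn.getresponse()
print(resp.status, resp.reason)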
