Calculate the download size of a JSON data endpoint?

I am making a call to an endpoint that returns JSON. When I save the data to a file, the total size is 500 kilobytes. What I wanted to do was compress the JSON, but I heard that just enabling compression on the web server (Apache) accomplishes the same thing. I have now done that and enabled compression. But how do I get the size of the DOWNLOAD, and not the size of the file if I save it?

It's not quite as simple as just enabling compression on the web server. The HTTP request received by the server must include the Accept-Encoding header to indicate which compression scheme or schemes it supports.
The most common is: Accept-Encoding: gzip.
You'd likely need to use a packet sniffer (Fiddler or equivalent) to determine the difference in payload sizes when compressed versus uncompressed. Most HTTP libraries I am aware of decompress the payload before passing it back to the calling code.
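Alternatively, you can measure it from code. Below is a minimal PHP sketch (assuming the cURL extension is available; the URL is a placeholder). Setting Accept-Encoding by hand, rather than through CURLOPT_ENCODING, means cURL does not auto-decompress, so the string you get back is the compressed bytes exactly as downloaded.
<?php
// Sketch: measure the on-the-wire (compressed) size of a gzip-enabled endpoint.
// Assumes the PHP cURL extension; the URL below is a placeholder.
$url = 'https://example.com/api/data.json';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Setting the header manually (instead of via CURLOPT_ENCODING) means cURL
// will NOT decompress the response, so $raw holds the gzipped bytes as sent.
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Encoding: gzip'));

$raw = curl_exec($ch);
curl_close($ch);

echo "Compressed download: " . strlen($raw) . " bytes\n";
// Only meaningful if the server actually gzipped the response.
echo "Decompressed size:   " . strlen(gzdecode($raw)) . " bytes\n";
?>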

Related

What is the best way to generate an ETag based on the timestamp of the resource?

So in one of my projects I have to create an HTTP cache to handle multiple API calls to the server. I read about the ETag header, which can be used with a conditional GET to minimize server load and enable caching. However, I have a problem with generating the ETag. I can use the LAST_UPDATED_TIMESTAMP of the resource as the ETag, or hash it using some hashing algorithm like MD5. But what would be the best way to do this? Are there any cons to using the raw timestamp as the ETag?
Any helpful answer is highly appreciated. Thanks in advance! Cheers!
If your timestamp has enough precision that you can guarantee it will change any time the resource changes, then you can use an encoding of the timestamp (the header value needs to be ASCII).
But bear in mind that ETag may not save you much. It's just a cache revalidation header, so you will still get as many requests from clients; some will just be conditional. You may then be able to avoid sending the payload back if the ETag didn't change, but you will still incur some work figuring that out (maybe a lot less work, so it could be worth it).
In fact, several versions of IIS used the file timestamp to generate an ETag. We tripped over that when building WinGate's cache module, when a whole bunch of files with the same timestamp ended up with the same ETag, and we learned that an ETag is only valid in the context of the request URI.
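To make the timestamp approach concrete, here is a minimal PHP sketch (not from the original answer; it assumes the resource is backed by a file, and $resourceFile is a placeholder). Hashing the path together with the timestamp keeps the value ASCII-safe and ties it to the request URI:
<?php
// Sketch: derive an ETag from the resource's last-modified timestamp and
// answer a conditional GET with 304 Not Modified when nothing changed.
$resourceFile = '/path/to/resource.json';   // placeholder

$timestamp = filemtime($resourceFile);
// Including the path ties the ETag to this URI; md5 keeps it opaque and ASCII.
$etag = '"' . md5($resourceFile . '|' . $timestamp) . '"';

header('ETag: ' . $etag);

if (isset($_SERVER['HTTP_IF_NONE_MATCH']) &&
    trim($_SERVER['HTTP_IF_NONE_MATCH']) === $etag) {
    // Revalidation succeeded: skip the payload entirely.
    http_response_code(304);
    exit;
}

readfile($resourceFile);
?>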

Transfer XML as text or as Stream (Binary)

We would like to transfer XML to a Web API that can accept text as well as binary data.
What is the best way to transfer it in terms of traffic size?
Is it better to transfer it as clear text or as a stream of binary data?
If you are concerned that the XML data you want to transfer is too large, then you can try using compression, gzip compression being the most popular. Web API has some built-in functionality for this, but you could also "roll your own" if you like, for example if you want a different compression algorithm (a rough sketch of the size difference gzip makes follows the links below).
Fortunately, there's plenty of code around to help with compressing and decompressing your data stream. Take a look at the following:
MS nuget: https://www.nuget.org/packages/Microsoft.AspNet.WebApi.MessageHandlers.Compression/
http://benfoster.io/blog/aspnet-web-api-compression (blog article with a link to GitHub code)
https://github.com/benfoster/Fabrik.Common/tree/master/src/Fabrik.Common.WebAPI (the GitHub code mentioned above)
(SO) Compression filter for Web API
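To get a rough sense of what gzip typically buys you on text like XML, you can compress a representative payload and compare sizes. A quick sketch, here in PHP for brevity (payload.xml is just a placeholder file):
<?php
// Rough illustration of the size saving gzip typically gives on text like XML.
// 'payload.xml' is a stand-in for whatever payload you would actually send.
$xml = file_get_contents('payload.xml');

$compressed = gzencode($xml, 6);   // compression level 6 is a common trade-off

printf("Plain text: %d bytes\n", strlen($xml));
printf("Gzipped:    %d bytes\n", strlen($compressed));

// On the receiving end the original stream is restored with gzdecode().
$restored = gzdecode($compressed);
assert($restored === $xml);
?>
Text-heavy formats such as XML usually compress very well, which is often a point in favour of sending clear text plus compression rather than inventing a binary packaging.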
Finally, you could consider using Expect: 100-Continue. If an API client is about to send a request with a large entity body, like a POST, PUT, or PATCH, they can send “Expect: 100-continue” in their HTTP headers, and wait for a “100 Continue” response before sending their entity body. This allows the API server to verify much of the validity of the request before wasting bandwidth to return an error response (such as a 401 or a 403). Supporting this functionality is not very common, but it can improve API responsiveness and reduce bandwidth in some scenarios. (RFC2616 §8.2.3).
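From the client side, this might look like the following PHP cURL sketch (the URL and file name are placeholders). cURL typically adds Expect: 100-continue on its own for large POST bodies; setting the header explicitly just makes the intent visible:
<?php
// Sketch: an API client asking the server to vet the request headers before
// the (large) entity body is uploaded.
$ch = curl_init('https://example.com/api/upload');   // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Expect: 100-continue'));
curl_setopt($ch, CURLOPT_POSTFIELDS, file_get_contents('large-payload.xml'));

$response = curl_exec($ch);
curl_close($ch);
?>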
While I appreciate an answer full of links can be problematic if those links go out-of-date or get deleted, explaining Web API compression here is just too large a subject. I hope my answer steers you in a useful direction.

Can HTTP headers be too big for browsers?

I am building an AJAX application that uses both the HTTP content and HTTP headers to send and receive data. Is there a point where the data received in the HTTP headers won't be read by the browser because it is too big? If yes, what is the limit, and is the behaviour the same in all browsers?
I know that theoretically there is no limit to the size of HTTP headers, but in practice, past what point could I have problems on certain platforms, browsers, or with certain software installed on the client machine? I am looking more for a guideline for the safe use of HTTP headers. In other words, to what extent can HTTP headers be used for transmitting additional data without potential problems coming down the line?
Thanks for all the input on this question; it was very appreciated and interesting. Thomas' answer got the bounty, but Jon Hanna's answer brought up a very good point about proxies.
Short answers:
Same behaviour: No
Lowest limit found in popular browsers:
10 KB per header
256 KB for all headers in one response.
Test results from MacBook running Mac OS X 10.6.4:
Biggest response successfully loaded, all data in one header:
Opera 10: 150MB
Safari 5: 20MB
IE 6 via Wine: 10MB
Chrome 5: 250KB
Firefox 3.6: 10KB
Note
Those outrageously big headers in Opera, Safari, and IE took minutes to load.
Note to Chrome:
Actual limit seems to be 256KB for the whole HTTP header.
Error message appears: "Error 325 (net::ERR_RESPONSE_HEADERS_TOO_BIG): Unknown error."
Note to Firefox:
When sending data through multiple headers, 100 MB worked fine, just split up over 10,000 headers.
My Conclusion:
If you want to support all popular browsers, 10 KB per header seems to be the limit, and 256 KB for all headers together.
My PHP code used to generate those responses:
<?php
ini_set('memory_limit', '1024M');
set_time_limit(90);

// Build a single header value of $bytes characters.
$bytes  = 256000;
$header = str_repeat('1', $bytes);

header("MyData: " . $header);

/* Firefox: the same data split across multiple headers
for ($i = 1; $i < 1000; $i++) {
    header("MyData" . $i . ": " . $header);
}
*/

echo "Length of header: " . ($bytes / 1024) . ' kilobytes';
?>
In practice, while there are rules prohibiting proxies from dropping certain headers (indeed, quite clear rules on which ones can be modified, and even on how to inform a proxy whether it may modify a new header added by a later standard), this only applies to "transparent" proxies, and not all proxies are transparent. In particular, some wipe headers they don't understand as a deliberate security practice.
Also, in practice some do misbehave (though things are much better than they were).
So, beyond the obvious core headers, the amount of header information you can depend on being passed from server to client is zero.
This is just one of the reasons why you should never depend on headers being used well (e.g., be prepared for the client to repeat a request for something it should have cached, or for the server to send the whole entity when you request a range), barring the obvious case of authentication headers (under the fail-to-secure principle).
Two things.
First of all, why not just run a test that gives the browser progressively larger and larger headers and wait till it hits a number that doesn't work? Just run it once in each browser. That's the most surefire way to figure this out. Even if it's not entirely comprehensive, you at least have some practical numbers to go off of, and those numbers will likely cover a huge majority of your users.
Second, I agree with everyone saying that this is a bad idea. It should not be hard to find a different solution if you are really that concerned about hitting the limit. Even if you do test on every browser, there are still firewalls, etc to worry about, and there is absolutely no way you will be able to test every combination (and I'm almost positive that no one else has done this before you). You will not be able to get a hard limit for every case.
Though in theory, this should all work out fine, there might later be that one edge case that bites you in the butt if you decide to do this.
TL;DR: This is a bad idea. Save yourself the trouble and find a real solution instead of a workaround.
Edit: Since you mention that the requests can come from several types of sources, why not just specify the source in a request header and keep the data entirely in the body? Have some kind of Source or ClientType field in the header that specifies where the request is coming from. If it's coming from a browser, include the HTML in the body; if it's coming from a PHP application, put some PHP-specific stuff in there; and so on. If the field is empty, don't add any extra data at all.
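As a rough sketch of how the receiving end of that idea could look in PHP (the X-Client-Type header name and the responses are made up for illustration):
<?php
// Sketch: a hypothetical X-Client-Type request header identifies the caller,
// while all of the actual data stays in the request body.
$clientType = isset($_SERVER['HTTP_X_CLIENT_TYPE'])
    ? $_SERVER['HTTP_X_CLIENT_TYPE']
    : '';
$body = file_get_contents('php://input');

switch ($clientType) {
    case 'browser':
        // Browser caller: respond with an HTML fragment.
        header('Content-Type: text/html; charset=utf-8');
        echo '<p>' . htmlspecialchars($body) . '</p>';
        break;
    case 'php-app':
        // PHP application caller: respond with JSON.
        header('Content-Type: application/json');
        echo json_encode(array('received' => strlen($body)));
        break;
    default:
        // Empty or unknown source: send nothing extra.
        http_response_code(204);
}
?>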
The RFC for HTTP/1.1 clearly does not limit the length of the headers or the body.
According to this page, modern browsers (Firefox, Safari, Opera), with the exception of IE, can handle very long URIs: https://web.archive.org/web/20191019132547/https://boutell.com/newfaq/misc/urllength.html. I know it is different from receiving headers, but it at least shows that they can create and send huge HTTP requests (possibly of unlimited length).
If there's any limit in the browsers, it would be something like the size of the available memory or the limit of a variable type.
Theoretically, there's no limit to the amount of data that can be sent in the browser. It's almost like saying there's a limit to the amount of content that can be in the body of a web page.
If possible, try to transmit the data through the body of the document. To be on the safe side, consider splitting the data up, so that there are multiple passes for loading.

Can I use Google's Protocol buffers for processing LDAP requests in my LDAP server?

I need to process incoming data in a predefined ASN format (coming from a variety of clients that use a BER library to build it) in my application server. This is typically an LDAP server where every request will be in a predefined ASN format. Can I use Google's protocol buffers to process the requests on the server side? Will it help in any way to improve the performance of my server's request handling? Will it in any way reduce the number of malloc() calls that happen while processing ASN messages?
Thanks,
Naga
I don't see how it's likely to help, to be honest. Unless you can change both the server and the client, you'll have to handle the ASN format at some point anyway - where do you think you'd get benefit from converting from one format to another?
If you have a lot of internal processing between different servers after you've received the request, then in that case it may make sense to translate from ASN to a protocol buffer format - but it sounds like you're still going to need ASN handling at the boundary.
The binary format of protobuf is not like BER encoding; you cannot use protobuf to decode those messages.

What data formats can AJAX transfer?

I'm new to AJAX, but as an overview I'd like to know what formats you can upload and download. Is it limited to JSON or XML, or can you even send binary types like MP3 or UTF-8 HTML? And finally, do you have full control over the data, byte for byte, in something like a byte array, or is only a string sent/received?
If we are talking about AJAX, we are talking about JavaScript? And about XMLHttpRequest?
XMLHttpRequest, which is just an HTTP request, can transfer anything. But there is no byte array in JavaScript, only strings, numbers and such. Everything you get from an AJAX call is a piece of text (responseText). That might be parsed into XML (which gives you responseXML). Special encodings should be more a matter of the HTTP transport.
The binary stuff is not AJAX-dependent but JavaScript-dependent. There are some weird encodings for strings to deliver byte data in JavaScript (especially for images), but it is not a general solution.
HTML is not a problem, and that is the most prominent use case. From this type of request you get an HTML string delivered, and that is added to some node in the DOM via innerHTML, which parses the HTML.
Since data is transported via HTTP, you will have to make sure that you use some kind of encoding. One of the most popular is base64 encoding. You can find more information at: http://www.webtoolkit.info/javascript-base64.html
The methodology is to base64-encode the data you would like to send, then base64-decode the data at the server (or the client) and use the original data as you intended.
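For illustration, the server side of that round trip might look like the following PHP sketch ('image.png' is just a placeholder for whatever binary payload you ship):
<?php
// Sketch of the base64 round trip described above, seen from the server.
// 'image.png' stands in for whatever binary payload you need to ship.
$binary  = file_get_contents('image.png');
$encoded = base64_encode($binary);          // text-safe for responseText

header('Content-Type: text/plain');
echo $encoded;                              // the AJAX client decodes this

// Going the other way, data posted by the client is restored with:
//   $original = base64_decode($_POST['data']);
?>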
You can transfer any type of data, either strings or bytes.
You can send anything you like; the problem may be how to handle it once you get it ;)
Standard HTML is probably the most common type of AJAX content in use out there. You can choose the character encoding too, although it's always best to stick with one type of encoding.
AJAX simply means you're transferring data asynchronously over HTTP with a JavaScript call. So your script makes a "normal" HTTP request using the XmlHttpRequest() object. However, as the name implies, it's really only suited for text-based data formats since you generally want to perform some action on the client side with the data you got back from the server (not always though, sometimes people just send XmlHttpRequests only to update something on the server).
On a side note, I have never seen an application where sending binary data would have been appropriate anyway.
Most often, people choose to send data over to the server with POST or GET (which are basically methods to transfer name-value pairs, inherent to HTTP). For sending more complex data, for example hierarchical structures, the data needs to be encoded somehow. XML documents can be created natively in JavaScript, sent over to the server, and parsed into whatever data types are necessary. But since XML can be a bit of a pain, many devs use JSON-encoded data instead because it's easy to generate and easy to parse.
What the server sends back is equally arbitrary. Usually, you specify a callback function in your JavaScript that handles the incoming data. Again, the popular choices are XML and JSON; they parse easily into a document object or an array structure respectively. You could also send plain text or some other packaging, but remember that you then have to take care of extracting the usable data from it yourself. Sometimes, it can also be beneficial to send actual HTML fragments to the client to update something on the page directly.
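As a tiny illustration of that JSON round trip, the server side could be as simple as the following PHP sketch (the payload here is made up); on the client, the callback would run JSON.parse on the responseText, or let a library do it:
<?php
// Sketch: encode an arbitrary structure as JSON for an AJAX callback to parse.
$payload = array(
    'status' => 'ok',
    'items'  => array('first', 'second', 'third'),
);

header('Content-Type: application/json');
echo json_encode($payload);
?>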
For starters, I suggest you have a look at jQuery. It's a very lightweight framework that abstracts away much of the evil compatibility stuff and lets you write AJAX requests very nicely.
You can move anything that can be sent over HTTP. There are restrictions requiring the call to be made to the same domain the page was loaded from, but not on the content of the transfer. You can do either GET or POST transactions too.
There is a Digg the Blog entry titled DUI.Stream and MXHR that shows off what they call "Multipart XMLHttpRequests." It is alpha code now, but there is a demo that handles images.
