Using the HTTP Range Header with a range specifier other than bytes? - ajax

The core question is about the use of the HTTP Headers, including Range, If-Range, Accept-Ranges and a user defined range specifier.
Here is a manufactured example to help illustrate my question. Assume I have a Web 2.0 style application that displays some sort of human readable documents. These documents are editorially broken up into pages (similar to articles you see on news websites). For this example, assume:
There is a document titled "HTTP Range Question" is broken up into three pages.
The shell page (/document/shell/http-range-question) knows the meta information about the document, including the number of pages.
The first readable page of the document is loaded during the page onload event via an ajax GET and inserted onto the page.
A UI control that looks like [ 1 2 3 All ] is at the bottom of the page, and clicking on a number will display that readable page (also loaded via ajax), and clicking "All" will display the entire document. Assume these URLS for the 1, 2, 3 and All use cases:
/document/content/http-range-question?page=1
/document/content/http-range-question?page=2
/document/content/http-range-question?page=3
/document/content/http-range-question
Now to the question. Can I use the HTTP Range headers instead part of the URL (e.g. a querystring parameter)? Maybe something like this on the GET /document/content/http-range-question request:
Range: page=1
It looks like the spec only defines byte ranges as allowable, so even if I made my ajax calls work with my browser and server code, anything in the middle could break the contract (e.g. a caching proxy server).
Range: bytes=0-499
Any opinions or real world examples of custom range specifiers?
Update: I did find a similar question about the Range header (Paging in a Rest Collection) where they mention that Dojo's JsonRestStore uses a custom Range header value.
Range: items=0-24

Absolutely - you are free to specify any range units you like.
From RFC 2616:
3.12 Range Units
HTTP/1.1 allows a client to request
that only part (a range of) the
response entity be included within the
response. HTTP/1.1 uses range units
in the Range (section 14.35) and
Content-Range (section 14.16)
header fields. An entity can be broken
down into subranges according to
various structural units.
range-unit = bytes-unit | other-range-unit
bytes-unit = "bytes"
other-range-unit = token
The only range unit defined by
HTTP/1.1 is "bytes". HTTP/1.1
implementations MAY ignore ranges
specified using other units.
The key piece is the last paragraph. Really what it's saying is that when they wrote the spec for HTTP/1.1, they only outlined the "bytes" token. But, as you can see from the 'other-range-unit' bit, you are free to come up with your own token specifiers.
Coming up with your own Range specifiers does mean that you have to have control over the client and server code that uses that specifier. So, if you own the backend piece that exposes the "/document/content/http-range-question" URI, you are good to go; presumably you're using a modern web framework that lets you inspect the request headers coming in. You could then look at the Range values to perform the backing query correctly.
Furthermore, if you control the AJAX code that makes requests to the backend, you should be able to set the Range header yourself.
However, there is a potential downside which you anticipate in your question: the potential to break caching. If you are using a custom Range unit, any caches between your client and the origin servers "MAY ignore ranges specified using [units other than 'bytes']". So for example, if you had a Squid/Varnish cache between the front and backend, there's no guarantee that the results you're hoping for will be served from the cache!
You might also consider an alternative implementation where, rather than using a query string, you make the page a "parameter" of the URI; e.g.: /document/content/http-range-question/page/1. This would likely be a little more work for you server-side, but it's HTTP/1.1 compliant and caches should treat it properly.
Hope this helps.

bytes is the only unit supported by HTTP 1.1 Specification.

HTTP Range is typically used for recovering interrupted downloads without starting from the beginning.
What you're trying to do would be better handled by OAI-ORE, which allows you to define relationships between multiple documents. (alternative formats, components of the whole, etc)
Unfortunately, it's a relatively new metadata format, and I don't know of any web browsers that ship with native support.

It sounds like you want to change the HTTP spec just to remove a querystring parameter. In order to do this you'd have to modify code on both the client to send the modified header and the server to read from the "Range" header instead of the querystring.
The end result is that this will probably work, but you're breaking all of the standards and existing tools to do so.

Related

Are there any best practices what Controller methods should return?

For example, should controller return updated entity after update?
To fit into your question we can set the "best practices" in the range of the verbs. What is the best option to return for each verb?
I'm going to use the book RESTful Web Services (Chapter 4) to answer your question.
This is textual iformation from the book, none of the following words are mine.
GET
The server sends back a representation in the response entity-body.
POST
Response of POST request usually has an HTTP status code of 201 ("Created"). Its Location header contains the URI of the newly created resource.
PUT
The entity-body contains the client's proposed new representation of the resource. What data this is, and what format it's in, depends on the service.
DELETE
The response entity-body may contain a status message or nothing at all
HEAD
Retrieve a metadata-only representation.
A client can use HEAD to check wheter a resource exists, or find out other information about the resource, without fetching its entire representation. HEAD gives you exactly what a GET request would give you, but without the entity-body
OPTIONS
Contains the HTTP Allow header, which lays out the subset of the uniform interface this resource supports.

What does a Ajax call response like 'for (;;); { json data }' mean? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why do people put code like “throw 1; <dont be evil>” and “for(;;);” in front of json responses?
I found this kind of syntax being used on Facebook for Ajax calls. I'm confused on the for (;;); part in the beginning of response. What is it used for?
This is the call and response:
GET http://0.131.channel.facebook.com/x/1476579705/51033089/false/p_1524926084=0
Response:
for (;;);{"t":"continue"}
I suspect the primary reason it's there is control. It forces you to retrieve the data via Ajax, not via JSON-P or similar (which uses script tags, and so would fail because that for loop is infinite), and thus ensures that the Same Origin Policy kicks in. This lets them control what documents can issue calls to the API — specifically, only documents that have the same origin as that API call, or ones that Facebook specifically grants access to via CORS (on browsers that support CORS). So you have to request the data via a mechanism where the browser will enforce the SOP, and you have to know about that preface and remove it before deserializing the data.
So yeah, it's about controlling (useful) access to that data.
Facebook has a ton of developers working internally on a lot of projects, and it is very common for someone to make a minor mistake; whether it be something as simple and serious as failing to escape data inserted into an HTML or SQL template or something as intricate and subtle as using eval (sometimes inefficient and arguably insecure) or JSON.parse (a compliant but not universally implemented extension) instead of a "known good" JSON decoder, it is important to figure out ways to easily enforce best practices on this developer population.
To face this challenge, Facebook has recently been going "all out" with internal projects designed to gracefully enforce these best practices, and to be honest the only explanation that truly makes sense for this specific case is just that: someone internally decided that all JSON parsing should go through a single implementation in their core library, and the best way to enforce that is for every single API response to get for(;;); automatically tacked on the front.
In so doing, a developer can't be "lazy": they will notice immediately if they use eval(), wonder what is up, and then realize their mistake and use the approved JSON API.
The other answers being provided seem to all fall into one of two categories:
misunderstanding JSONP, or
misunderstanding "JSON hijacking".
Those in the first category rely on the idea that an attacker can somehow make a request "using JSONP" to an API that doesn't support it. JSONP is a protocol that must be supported on both the server and the client: it requires the server to return something akin to myFunction({"t":"continue"}) such that the result is passed to a local function. You can't just "use JSONP" by accident.
Those in the second category are citing a very real vulnerability that has been described allowing a cross-site request forgery via tags to APIs that do not use JSONP (such as this one), allowing a form of "JSON hijacking". This is done by changing the Array/Object constructor, which allows one to access the information being returned from the server without a wrapping function.
However, that is simply not possible in this case: the reason it works at all is that a bare array (one possible result of many JSON APIs, such as the famous Gmail example) is a valid expression statement, which is not true of a bare object.
In fact, the syntax for objects defined by JSON (which includes quotation marks around the field names, as seen in this example) conflicts with the syntax for blocks, and therefore cannot be used at the top-level of a script.
js> {"t":"continue"}
typein:2: SyntaxError: invalid label:
typein:2: {"t":"continue"}
typein:2: ....^
For this example to be exploitable by way of Object() constructor remapping, it would require the API to have instead returned the object inside of a set of parentheses, making it valid JavaScript (but then not valid JSON).
js> ({"t":"continue"})
[object Object]
Now, it could be that this for(;;); prefix trick is only "accidentally" showing up in this example, and is in fact being returned by other internal Facebook APIs that are returning arrays; but in this case that should really be noted, as that would then be the "real" cause for why for(;;); is appearing in this specific snippet.
Well the for(;;); is an infinite loop (you can use Chrome's JavaScript console to run that code in a tab if you want, and then watch the CPU-usage in the task manager go through the roof until the browser kills the tab).
So I suspect that maybe it is being put there to frustrate anyone attempting to parse the response using eval or any other technique that executes the returned data.
To explain further, it used to be fairly commonplace to parse a bit of JSON-formatted data using JavaScript's eval() function, by doing something like:
var parsedJson = eval('(' + jsonString + ')');
...this is considered unsafe, however, as if for some reason your JSON-formatted data contains executable JavaScript code instead of (or in addition to) JSON-formatted data then that code will be executed by the eval(). This means that if you are talking with an untrusted server, or if someone compromises a trusted server, then they can run arbitrary code on your page.
Because of this, using things like eval() to parse JSON-formatted data is generally frowned upon, and the for(;;); statement in the Facebook JSON will prevent people from parsing the data that way. Anyone that tries will get an infinite loop. So essentially, it's like Facebook is trying to enforce that people work with its API in a way that doesn't leave them vulnerable to future exploits that try to hijack the Facebook API to use as a vector.
I'm a bit late and T.J. has basically solved the mystery, but I thought I'd share a great paper on this particular topic that has good examples and provides deeper insight into this mechanism.
These infinite loops are a countermeasure against "Javascript hijacking", a type of attack that gained public attention with an attack on Gmail that was published by Jeremiah Grossman.
The idea is as simple as beautiful: A lot of users tend to be logged in permanently in Gmail or Facebook. So what you do is you set up a site and in your malicious site's Javascript you override the object or array constructor:
function Object() {
//Make an Ajax request to your malicious site exposing the object data
}
then you include a <script> tag in that site such as
<script src="http://www.example.com/object.json"></script>
And finally you can read all about the JSON objects in your malicious server's logs.
As promised, the link to the paper.
This looks like a hack to prevent a CSRF attack. There are browser-specific ways to hook into object creation, so a malicious website could use do that first, and then have the following:
<script src="http://0.131.channel.facebook.com/x/1476579705/51033089/false/p_1524926084=0" />
If there weren't an infinite loop before the JSON, an object would be created, since JSON can be eval()ed as javascript, and the hooks would detect it and sniff the object members.
Now if you visit that site from a browser, while logged into Facebook, it can get at your data as if it were you, and then send it back to its own server via e.g., an AJAX or javascript post.

GET vs POST in AJAX?

Why are there GET and POST requests in AJAX as it does not affect page URL anyway? What difference does it make by passing sensitive data over GET in AJAX as the data is not getting reflected to page URL?
You should use the proper HTTP verb according to what you require from your web service.
When dealing with a Collection URI like: http://example.com/resources/
GET: List the members of the collection, complete with their member URIs for further navigation. For example, list all the cars for sale.
PUT: Meaning defined as "replace the entire collection with another collection".
POST: Create a new entry in the collection where the ID is assigned automatically by the collection. The ID created is usually included as part of the data returned by this operation.
DELETE: Meaning defined as "delete the entire collection".
When dealing with a Member URI like: http://example.com/resources/7HOU57Y
GET: Retrieve a representation of the addressed member of the collection expressed in an appropriate MIME type.
PUT: Update the addressed member of the collection or create it with the specified ID.
POST: Treats the addressed member as a collection in its own right and creates a new subordinate of it.
DELETE: Delete the addressed member of the collection.
Source: Wikipedia
Well, as for GET, you still have the url length limitation. Other than that, it is quite conceivable that the server treats POST and GET requests differently; thus the need to be able to specify what request you're doing.
Another difference between GET and POST is the way caching is handled in browsers. POST response is never cached. GET may or may not be cached based on the caching rules specified in your response headers.
Two primary reasons for having them:
GET requests have some pretty restrictive limitations on size; POST are typically capable of containing much more information.
The backend may be expecting GET or POST, depending on how it's designed. We need the flexibility of doing a GET if the backend expects one, or a POST if that's what it's expecting.
It's simply down to respecting the rules of the http protocol.
Get - calls must be idempotent. This means that if you call it multiple times you will get the same result. It is not intended to change the underlying data. You might use this for a search box etc.
Post - calls are NOT idempotent. It is allowed to make a change to the underlying data, so might be used in a create method. If you call it multiple times you will create multiple entries.
You normally send parameters to the AJAX script, it returns data based on these parameters. It works just like a form that has method="get" or method="post". When using the GET method, the parameters are passed in the query string. When using POST method, the parameters are sent in the post body.
Generally, if your parameters have very few characters and do not contain sensitive information then you send them via GET method. Sensitive data (e.g. password) or long text (e.g. an 8000 character long bio of a person) are better sent via POST method.
Thanks..
I mainly use the GET method with Ajax and I haven't got any problems until now except the following:
Internet Explorer (unlike Firefox and Google Chrome) cache GET calling if using the same GET values.
So, using some interval with Ajax GET can show the same results unless you change URL with irrelevant random number usage for each Ajax GET.
Others have covered the main points (context/idempotency, and size), but i'll add another: encryption. If you are using SSL and want to encrypt your input args, you need to use POST.
When we use the GET method in Ajax, only the content of the value of the field is sent, not the format in which the content is. For example, content in the text area is just added in the URL in case of the GET method (without a new line character). That is not the case in the POST method.

GET vs POST in Ajax

What is the difference between GET and POST for Ajax requests?
I don't see any difference between those two, except that when I use GET, the parameters are send in URL, which for me don't really make any difference, since all requests are made on background and user doesn't find any difference.
edit:
What are PUT and DELETE methods used for?
GET is designed for getting data from the server. POST (and lesser-known friends PUT and DELETE) are designed for modifying data on the server.
A GET request should never cause data to be removed from an application. If you have a link you can click on with a GET to remove data, then Google spidering your site could click on all your "Delete" links.
The canonical answer can be found here, which quotes the HTML 2.0 spec:
If the processing of a form is idempotent (i.e. it has no lasting
observable effect on the state of the
world), then the form method should be
GET. Many database searches have no
visible side-effects and make ideal
applications of query forms.
If the service associated with the processing of a form has side effects
(for example, modification of a
database or subscription to a
service), the method should be POST.
In your AJAX call, you need to use whatever method your server supports. You should always design your server so that operations that modify data are called by POST/PUT/DELETE. Other comments have links to REST, which generally maps C/R/U/D to "POST or PUT"(Create)/GET(Read)/PUT(Update)/DELETE(Delete).
If you're sending large amounts of data, or sensitive data over HTTPS, you will want to use POST. If it's just a simple parameter, I would use GET.
GET requests have a limit to the amount of data that can be sent. I forget the exact number, but this can cause issues if you're sending anything substantial.
Basically the difference between GET and POST is that in a GET request, the parameters are passed in the URL where as in a POST, the parameters are included in the message body.
Whether its AJAX or not is irrelevant. Its about the action that you're taking. I'd recommend following the principles of REST. Which have further provisions for updating, deleting, etc...
GET requests are easier to exploit in CSRF (cross site request forgery) attacks. Namely fake POST requests require Javascript to be enabled on the user side, while fake GET requests are still possible just with img, script tags.
Many web servers limit the length of the data that can be passed as part of the URL, so the GET request may break in odd ways that are hard to debug.
Also, most server software logs URLs in the access logs, so if you pass sensitive information (such as passwords) in a GET request, this will in all likelihood be written to disk in plaintext.
From a REST perspective, GET requests should have no side-effects -- they shouldn't modify data. So, if you're just GETting a resource by ID, this makes sense, but if you're committing changes to a resource, you should be using PUT, POST, or UPDATE for the http verb.
Both are used to send some data and receive some response using that data.
GET: Get information store in server. Ie. Search, tweet, Person Information. If you want to send information then get request send request using process.php?name=subroto
So it basically send information through url. Url cannot handle more than 2083 char. So for blog post can you remember it is not possible?
POST: Post do same thing as get. User registration, User login, Big data send, Blog Post.
If you need to send secure information then use post or for big data as it not go through url.
AJAX: $.get() and $.post() contain features that are subsets of $.ajax(). It has much configuration.
$.get () method, which is a kind of shorthand for $.Ajax (). When using $.get (), instead of passing in an object, you pass in arguments. At minimum, you’ll need the first two arguments, which are the URL of the file you want to retrieve (i.e. ‘test.txt’) and a success callback.
Summary:
$.get( url [, data ] [, success ] [, dataType ] )
$.post( url [, data ] [, success ] [, dataType ] ) // for sending secure or Large information
$.ajax( url [, settings ] ) // More Configaration
First, general information. Use GET if you only read data, use POST if you change something on database, txt files etc.
But the problem is, some browsers cache GET results. I had problems with AJAX requests in IE7, but at last I found out that browser caches GET results. I rethought the flow and changes my request to POST.
So, don't use GET if you don't want caching.
(Of course you can disable caching in GET operations. But I didn't prefer it)
About me, i prefer POST. I reserve get to the events i know the sent value is limited to data i have the "control", for example, to retreive an item with an id. Example, "getitem?id=123", "deleteImtem?id=123", ... For the other cases, when i have a form fillable by a user, i prefer POST.
Like Ryan Smith have said, it's better to use POST to send a large amount of data, and less wories in cases of the use in others language/special chars (generally all majors javascript framework should'nt have any problems to deal with that but i think is less wories to use POST).
For the REST perspective, in my opinion, you can use this with a new project (to keep a consistency with the entire project).
Finally, maybee some programs used in a network (URL loguers (ie.: to see if the employees lost their time on non-autorised sites, ...) proxys, ... ) or any other kind of tool can intercept the query. Somes will show in the reports the params you have sent with GET, considering it like a different web page. But in this situation, is could be not your problem it's changes from a project to an other! ;)
The difference is the same between GET and POST whether you're using Ajax, HTML forms, or curl. Here are the relevant definitions:
GET
POST
If you are passing on any arguments with characters that can get messed up in the URL (such as spaces), you use POST. Otherwise you can use GET.
Generally, if you're just passing on a few tiny arguments you would use GET. But for passing on user submitted information such as blog entries, text, etc, its a good practice to use POST.
There are also certain frameworks that rely completely on segment based urls (such as site.com/products/133 rather than site.com/products.php?id=333 and these frameworks unset the GET variables for security. In such cases you would use POST allt the time.

Ajax and filenames - Best practices

I am using jQuery to call PHP files via the $.get method
function fetchDepartment(company_id)
{
$.get("ajax/fetchDepartment.php?sec=departments&company_id="+company_id, function(data){
$("#department_id").html(data);
});
}
What I am thinking is can I secure the filename even further?
Currently I have a global access check within the .php file that check if the user is logged in, if he can access this data etc.
But I am wondering if there are further steps I can take so a user couldn't see this filename, or what other steps you recommend to take.
Encoded requests
You could make the request details effectively invisible to the casual miscreant by encoding almost all of the URL and then decoding the request details server-side.
The request details would include the action you wish to perform plus the parameters relevant to that action.
All requests would be sent to a single URL, where a server-side process would decode the request details and perform the relevant action as required.
Example Original URL:
/ajax/delete.php?parameter1=foo&parameter2=bar
Request details:
action=delete&parameter1=foo&parameter2=bar
Encoded request details (encoded using base64):
YWN0aW9uPWRlbGV0ZSZwYXJhbWV0ZXIxPWZvbyZwYXJhbWV0ZXIyPWJhcg==
Encoded URL:
/ajax/?request=YWN0aW9uPWRlbGV0ZSZwYXJhbWV0ZXIxPWZvbyZwYXJhbWV0ZXIyPWJhcg==
I don't believe there is native functionality to encode to base64 in JavaScript, but it's far from impossible to find a suitable method or to write your own.
With obfuscated/minified client-side JavaScript it would be quite tricky for someone to determine how to make a request 'by hand'.
Hide implementation details
There are a number of practices you can follow to make your application less susceptible to attack through URL misuse.
Let's start with a URL of: ajax/fetchDepartment.php?sec=departments&company_id=99
There's no need to reveal what server-side technology you're using (PHP) nor, through the query string (sec, company_id), what the query string values actually mean.
Masking the server-side technology
Assuming you have index.php defined as a default, the following URLs are equivalent:
ajax/fetchDepartment.php?sec=departments&company_id=99
ajax/fetchDepartment/index.php?sec=departments&company_id=99
ajax/fetchDepartment/?sec=departments&company_id=99
The third URL does not reveal the server-side technology you're using. This limits the range of possible attacks. It also makes it easier for you to switch over to a different server-side technology without changing your URLs.
Hiding the meaning of request parameters
ajax/fetchDepartment/?sec=departments&company_id=99
ajax/99/departments/
The latter URL still conveys enough information to perform the request without revealing what the information means.
Whilst someone could still change the values, they won't know what they're changing. This will make it more difficult for an attacker to evaluate and understand the result of any URL changes they make.
Pretty much the only way you can obscure the URL for a certain piece of information from the user is by not loading it in through http. Maybe you can load a set of data on the calling page, or another page with a more generic url.

Resources