Related
After the most recent Firefox update (68.0), I am having problems with persistent session data.
When a user logs in, as the page loads, there are various expected CSP violations that send a POST request containing the violation report to the path report-uri directive contains.
Subsequent API GET requests to retrieve user data returns a 403 Forbidden, which (by design) redirects the user back to the login page. Since the user is logged in already, same API requests are sent that result in another 403, which leads to an infinite loop until after an arbitrary number of loops API requests return 200 OK.
All requests (both POST and GET) before and after the update are the same.
It seems to me that the fact that there are CSP report POST requests before the API requests changes something related to the session, which is used by the back-end to determine if the user has the correct privileges.
Could Firefox have changed something about the way it handles CSP report-uri requests or their responses change with the update?
What would be a good way to approach this problem?
Firefox has just been updated to version 68.0.1. The update seems to have fixed this problem. Release notes don't seem to be related to this in a way I can make sense, but regardless, the problem is solved.
tl;dr; About the Same Origin Policy
I have a Grunt process which initiates an instance of express.js server. This was working absolutely fine up until just now when it started serving a blank page with the following appearing in the error log in the developer's console in Chrome (latest version):
XMLHttpRequest cannot load https://www.example.com/
No 'Access-Control-Allow-Origin' header is present on the requested
resource. Origin 'http://localhost:4300' is therefore not allowed access.
What is stopping me from accessing the page?
tl;dr — When you want to read data, (mostly) using client-side JS, from a different server you need the server with the data to grant explicit permission to the code that wants the data.
There's a summary at the end and headings in the answer to make it easier to find the relevant parts. Reading everything is recommended though as it provides useful background for understanding the why that makes seeing how the how applies in different circumstances easier.
About the Same Origin Policy
This is the Same Origin Policy. It is a security feature implemented by browsers.
Your particular case is showing how it is implemented for XMLHttpRequest (and you'll get identical results if you were to use fetch), but it also applies to other things (such as images loaded onto a <canvas> or documents loaded into an <iframe>), just with slightly different implementations.
The standard scenario that demonstrates the need for the SOP can be demonstrated with three characters:
Alice is a person with a web browser
Bob runs a website (https://www.example.com/ in your example)
Mallory runs a website (http://localhost:4300 in your example)
Alice is logged into Bob's site and has some confidential data there. Perhaps it is a company intranet (accessible only to browsers on the LAN), or her online banking (accessible only with a cookie you get after entering a username and password).
Alice visits Mallory's website which has some JavaScript that causes Alice's browser to make an HTTP request to Bob's website (from her IP address with her cookies, etc). This could be as simple as using XMLHttpRequest and reading the responseText.
The browser's Same Origin Policy prevents that JavaScript from reading the data returned by Bob's website (which Bob and Alice don't want Mallory to access). (Note that you can, for example, display an image using an <img> element across origins because the content of the image is not exposed to JavaScript (or Mallory) … unless you throw canvas into the mix in which case you will generate a same-origin violation error).
Why the Same Origin Policy applies when you don't think it should
For any given URL it is possible that the SOP is not needed. A couple of common scenarios where this is the case are:
Alice, Bob, and Mallory are the same person.
Bob is providing entirely public information
… but the browser has no way of knowing if either of the above is true, so trust is not automatic and the SOP is applied. Permission has to be granted explicitly before the browser will give the data it has received from Bob to some other website.
Why the Same Origin Policy applies to JavaScript in a web page but little else
Outside the web page
Browser extensions*, the Network tab in browser developer tools, and applications like Postman are installed software. They aren't passing data from one website to the JavaScript belonging to a different website just because you visited that different website. Installing software usually takes a more conscious choice.
There isn't a third party (Mallory) who is considered a risk.
* Browser extensions do need to be written carefully to avoid cross-origin issues. See the Chrome documentation for example.
Inside the webpage
Most of the time, there isn't a great deal of information leakage when just showing something on a webpage.
If you use an <img> element to load an image, then it gets shown on the page, but very little information is exposed to Mallory. JavaScript can't read the image (unless you use a crossOrigin attribute to explicitly enable request permission with CORS) and then copy it to her server.
That said, some information does leak so, to quote Domenic Denicola (of Google):
The web's fundamental security model is the same origin policy. We
have several legacy exceptions to that rule from before that security
model was in place, with script tags being one of the most egregious
and most dangerous. (See the various "JSONP" attacks.)
Many years ago, perhaps with the introduction of XHR or web fonts (I
can't recall precisely), we drew a line in the sand, and said no new
web platform features would break the same origin policy. The existing
features need to be grandfathered in and subject to carefully-honed
and oft-exploited exceptions, for the sake of not breaking the web,
but we certainly can't add any more holes to our security policy.
This is why you need CORS permission to load fonts across origins.
Why you can display data on the page without reading it with JS
There are a number of circumstances where Mallory's site can cause a browser to fetch data from a third party and display it (e.g. by adding an <img> element to display an image). It isn't possible for Mallory's JavaScript to read the data in that resource though, only Alice's browser and Bob's server can do that, so it is still secure.
CORS
The Access-Control-Allow-Origin HTTP response header referred to in the error message is part of the CORS standard which allows Bob to explicitly grant permission to Mallory's site to access the data via Alice's browser.
A basic implementation would just include:
Access-Control-Allow-Origin: *
… in the response headers to permit any website to read the data.
Access-Control-Allow-Origin: http://example.com
… would allow only a specific site to access it, and Bob can dynamically generate that based on the Origin request header to permit multiple, but not all, sites to access it.
The specifics of how Bob sets that response header depend on Bob's HTTP server and/or server-side programming language. Users of Node.js/Express.js should use the well-documented CORS middleware. Users of other platforms should take a look at this collection of guides for various common configurations that might help.
NB: Some requests are complex and send a preflight OPTIONS request that the server will have to respond to before the browser will send the GET/POST/PUT/Whatever request that the JS wants to make. Implementations of CORS that only add Access-Control-Allow-Origin to specific URLs often get tripped up by this.
Obviously granting permission via CORS is something Bob would only do only if either:
The data was not private or
Mallory was trusted
How do I add these headers?
It depends on your server-side environment.
If you can, use a library designed to handle CORS as they will present you with simple options instead of having to deal with everything manually.
Enable-Cors.org has a list of documentation for specific platforms and frameworks that you might find useful.
But I'm not Bob!
There is no standard mechanism for Mallory to add this header because it has to come from Bob's website, which she does not control.
If Bob is running a public API then there might be a mechanism to turn on CORS (perhaps by formatting the request in a certain way, or a config option after logging into a Developer Portal site for Bob's site). This will have to be a mechanism implemented by Bob though. Mallory could read the documentation on Bob's site to see if something is available, or she could talk to Bob and ask him to implement CORS.
Error messages which mention "Response for preflight"
Some cross-origin requests are preflighted.
This happens when (roughly speaking) you try to make a cross-origin request that:
Includes credentials like cookies
Couldn't be generated with a regular HTML form (e.g. has custom headers or a Content-Type that you couldn't use in a form's enctype).
If you are correctly doing something that needs a preflight
In these cases then the rest of this answer still applies but you also need to make sure that the server can listen for the preflight request (which will be OPTIONS (and not GET, POST, or whatever you were trying to send) and respond to it with the right Access-Control-Allow-Origin header but also Access-Control-Allow-Methods and Access-Control-Allow-Headers to allow your specific HTTP methods or headers.
If you are triggering a preflight by mistake
Sometimes people make mistakes when trying to construct Ajax requests, and sometimes these trigger the need for a preflight. If the API is designed to allow cross-origin requests but doesn't require anything that would need a preflight, then this can break access.
Common mistakes that trigger this include:
trying to put Access-Control-Allow-Origin and other CORS response headers on the request. These don't belong on the request, don't do anything helpful (what would be the point of a permissions system where you could grant yourself permission?), and must appear only on the response.
trying to put a Content-Type: application/json header on a GET request that has no request body the content of which to describe (typically when the author confuses Content-Type and Accept).
In either of these cases, removing the extra request header will often be enough to avoid the need for a preflight (which will solve the problem when communicating with APIs that support simple requests but not preflighted requests).
Opaque responses (no-cors mode)
Sometimes you need to make an HTTP request, but you don't need to read the response. e.g. if you are posting a log message to the server for recording.
If you are using the fetch API (rather than XMLHttpRequest), then you can configure it to not try to use CORS.
Note that this won't let you do anything that you require CORS to do. You will not be able to read the response. You will not be able to make a request that requires a preflight.
It will let you make a simple request, not see the response, and not fill the Developer Console with error messages.
How to do it is explained by the Chrome error message given when you make a request using fetch and don't get permission to view the response with CORS:
Access to fetch at 'https://example.com/' from origin 'https://example.net' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
Thus:
fetch("http://example.com", { mode: "no-cors" });
Alternatives to CORS
JSONP
Bob could also provide the data using a hack like JSONP which is how people did cross-origin Ajax before CORS came along.
It works by presenting the data in the form of a JavaScript program that injects the data into Mallory's page.
It requires that Mallory trust Bob not to provide malicious code.
Note the common theme: The site providing the data has to tell the browser that it is OK for a third-party site to access the data it is sending to the browser.
Since JSONP works by appending a <script> element to load the data in the form of a JavaScript program that calls a function already in the page, attempting to use the JSONP technique on a URL that returns JSON will fail — typically with a CORB error — because JSON is not JavaScript.
Move the two resources to a single Origin
If the HTML document the JS runs in and the URL being requested are on the same origin (sharing the same scheme, hostname, and port) then the Same Origin Policy grants permission by default. CORS is not needed.
A Proxy
Mallory could use server-side code to fetch the data (which she could then pass from her server to Alice's browser through HTTP as usual).
It will either:
add CORS headers
convert the response to JSONP
exist on the same origin as the HTML document
That server-side code could be written & hosted by a third party (such as CORS Anywhere). Note the privacy implications of this: The third party can monitor who proxies what across their servers.
Bob wouldn't need to grant any permissions for that to happen.
There are no security implications here since that is just between Mallory and Bob. There is no way for Bob to think that Mallory is Alice and to provide Mallory with data that should be kept confidential between Alice and Bob.
Consequently, Mallory can only use this technique to read public data.
Do note, however, that taking content from someone else's website and displaying it on your own might be a violation of copyright and open you up to legal action.
Writing something other than a web app
As noted in the section "Why the Same Origin Policy only applies to JavaScript in a web page", you can avoid the SOP by not writing JavaScript in a webpage.
That doesn't mean you can't continue to use JavaScript and HTML, but you could distribute it using some other mechanism, such as Node-WebKit or PhoneGap.
Browser extensions
It is possible for a browser extension to inject the CORS headers in the response before the Same Origin Policy is applied.
These can be useful for development but are not practical for a production site (asking every user of your site to install a browser extension that disables a security feature of their browser is unreasonable).
They also tend to work only with simple requests (failing when handling preflight OPTIONS requests).
Having a proper development environment with a local development server
is usually a better approach.
Other security risks
Note that SOP / CORS do not mitigate XSS, CSRF, or SQL Injection attacks which need to be handled independently.
Summary
There is nothing you can do in your client-side code that will enable CORS access to someone else's server.
If you control the server the request is being made to: Add CORS permissions to it.
If you are friendly with the person who controls it: Get them to add CORS permissions to it.
If it is a public service:
Read their API documentation to see what they say about accessing it with client-side JavaScript:
They might tell you to use specific URLs
They might support JSONP
They might not support cross-origin access from client-side code at all (this might be a deliberate decision on security grounds, especially if you have to pass a personalized API Key in each request).
Make sure you aren't triggering a preflight request you don't need. The API might grant permission for simple requests but not preflighted requests.
If none of the above apply: Get the browser to talk to your server instead, and then have your server fetch the data from the other server and pass it on. (There are also third-party hosted services that attach CORS headers to publically accessible resources that you could use).
Target server must allowed cross-origin request. In order to allow it through express, simply handle http options request :
app.options('/url...', function(req, res, next){
res.header('Access-Control-Allow-Origin', "*");
res.header('Access-Control-Allow-Methods', 'POST');
res.header("Access-Control-Allow-Headers", "accept, content-type");
res.header("Access-Control-Max-Age", "1728000");
return res.sendStatus(200);
});
As this isn't mentioned in the accepted answer.
This is not the case for this exact question, but might help others that search for that problem
This is something you can do in your client-code to prevent CORS errors in some cases.
You can make use of Simple Requests.
In order to perform a 'Simple Requests' the request needs to meet several conditions. E.g. only allowing POST, GET and HEAD method, as well as only allowing some given Headers (you can find all conditions here).
If your client code does not explicit set affected Headers (e.g. "Accept") with a fix value in the request it might occur that some clients do set these Headers automatically with some "non-standard" values causing the server to not accept it as Simple Request - which will give you a CORS error.
This is happening because of the CORS error. CORS stands for Cross Origin Resource Sharing. In simple words, this error occurs when we try to access a domain/resource from another domain.
Read More about it here: CORS error with jquery
To fix this, if you have access to the other domain, you will have to allow Access-Control-Allow-Origin in the server. This can be added in the headers. You can enable this for all the requests/domains or a specific domain.
How to get a cross-origin resource sharing (CORS) post request working
These links may help
This CORS issue wasn't further elaborated (for other causes).
I'm having this issue currently under different reason.
My front end is returning 'Access-Control-Allow-Origin' header error as well.
Just that I've pointed the wrong URL so this header wasn't reflected properly (in which i kept presume it did). localhost (front end) -> call to non secured http (supposed to be https), make sure the API end point from front end is pointing to the correct protocol.
I got the same error in Chrome console.
My problem was, I was trying to go to the site using http:// instead of https://. So there was nothing to fix, just had to go to the same site using https.
This bug cost me 2 days. I checked my Server log, the Preflight Option request/response between browser Chrome/Edge and Server was ok. The main reason is that GET/POST/PUT/DELETE server response for XHTMLRequest must also have the following header:
access-control-allow-origin: origin
"origin" is in the request header (Browser will add it to request for you). for example:
Origin: http://localhost:4221
you can add response header like the following to accept for all:
access-control-allow-origin: *
or response header for a specific request like:
access-control-allow-origin: http://localhost:4221
The message in browsers is not clear to understand: "...The requested resource"
note that:
CORS works well for localhost. different port means different Domain.
if you get error message, check the CORS config on the server side.
In most housing services just add in the .htaccess on the target server folder this:
Header set Access-Control-Allow-Origin 'https://your.site.folder'
I had the same issue. In my case i fixed it by adding addition parameter of timestamp to my URL. Even this was not required by the server I was accessing.
Example yoururl.com/yourdocument?timestamp=1234567
Note: I used epos timestamp
"Get" request with appending headers transform to "Options" request. So Cors policy problems occur. You have to implement "Options" request to your server. Cors Policy about server side and you need to allow Cors Policy on your server side. For Nodejs server:details
app.use(cors)
For Java to integrate with Angular:details
#CrossOrigin(origins = "http://localhost:4200")
You should enable CORS to get it working.
At the moment I have the following understanding (which, I assume, is incomplete and probably even wrong).
A web server receives request from a client. The requests are coming to a particular "path" ("address", "URL") and have a particular type (GET, POST and probably something else?). The GET and POST requests can also come with variables and their values (which can be though as a "dictionary" or "associate array"). The parameters of GET requests are set in the address line (for example: http://example.com?x=1&y=2) while parameters of POST requests are set by the client (user) via web forms (in other words, a user fills in a form and press "Submit" button).
In addition to that we have what is called SESSION (also known as COOKIES). This works the following way. When a web-server gets a request (of GET or POST type) it (web server) checks the values of the sent parameters and based on that it generates and sends back to the client HTML code that is displayed in a browser (and is seen by the user). In addition to that the web servers sends some parameters (which again can be imagined as "dictionary" or "associative arrays"). These parameters are saved by the browser somewhere on the client side and when a client sends a new request, he/she also sends back the session parameters received earlier from the web server. In fact server says: you get this from me, memorize it and next time when you speak to me, give it back (so, I can recognize you).
So, what I do not know is if client can see what exactly is in the session (what parameters are there and what values they have) and if client is able to modify the values of these parameters (or add or remove parameters). But what user can do, he/she might decide not to accept any cookies (or session).
There is also something called "local storage" (it is available in HTML5). It works as follows. Like SESSION it is some information sent by web-server to the client and is also memorized (saved) by the client (if client wants to). In contrast to the session it is not sent back b the client to the server. Instead, JavaScripts running on the client side (and send by web-servers as part of the HTML code) can access information from the local storage.
What I am still missing, is how AJAX is working. It is like by clicking something in the browser users sends (via Browser) a request to the web-server and waits for a response. Then the browser receives some response and use it to modify (but not to replace) the page observed by the user. What I am missing is how the browser knows how to use the response from the web-server. Is it written in the HTML code (something like: if this is clicked, send this request to the web server, and use its answer (provided content) to modify this part of the page).
I am going to answer to your questions on AJAX and LocalStorage, also on a very high level, since your definition strike me as such on a high level.
AJAX stands for Asynchronous JavaScript and XML.
Your browser uses an object called XMLHTTPRequest in order to establish an HTTP request with a remote resource.
The client, being a client, is oblivious of what the remote server entails on. All it has to do is provide the request with a URL, a method and optionally the request's payload. The payload is most commonly a parameter or a group of parameters that are received by the remote server.
The request object has several methods and properties, and it also has its ways of handling the response.
What I am missing is how the browser knows how to use the response
from the web-server.
You simply tell it what to do with the reponse. As mentioned above, the request object can also be told what to do with a response. It will listen to a response, and when such arrives, you tell the client what to do with it.
Is it (the response) written in the HTML code?
No. The response is written in whatever the server served it. Most commonly, it's Unicode. A common way to serve a response is a JSON (JavaScript Object Notation) object.
Whatever happens afterwards is a pure matter of implementation.
LocalStorage
There is also something called "local storage" (it is available in
HTML5). It works as follows. Like SESSION it is some information sent
by web-server to the client and is also memorized (saved) by the
client (if client wants to)
Not entirely accurate. Local Storage is indeed a new feature, introduced with HTML5. It is a new way of storing data in the client, and is unique to an origin. By origin, we refer to a unique protocol and a domain.
The life time of a Local Storage object on a client (again, per unique origin), is entirely up to the user. That said, of course a client application can manipulate the data and decide what's inside a local storage object. You are right about the fact that it is stored and can be used in the client through JavaScript.
Example: some web tracking tools want to have some sort of a back up plan, in case the server that collects user data is unreachable for some reason. The web tracker, sometimes introduced as a JavaScript plugin, can write any event to the local storage first, and release it only when the remote server confirmed that it received the event successfully, even if the user closed the browser.
First of all, this is just a simple explanation to clarify your mind. To explain this stuff in more detail we would need to write a book. This been said, I'll go step by step.
Request
A request is a client asking for / sending data to a server.
This request has the following parts:
An URL (Protocol + hostname/IP + path)
A Method (GET, POST, PUT, DELETE, PATCH, and so on)
Some optional parameters (the way they are sent depends on the method)
Some headers (metadata sent to the server)
Some optional cookies
An optional SESSION ID
Some explanations about this:
Cookies can be set by the client or by the server, but they are always stored by the client's browser. Therefore, the browser can decide whether to accept them or not, or to delete or modify them
Session is stored in the server. The server sends the client a session ID to be able to recongnize him in any future request.
Session and cookies are two different things. One is server side, and the other is client side.
AJAX
I'll ommit the meaning of the acronym as you can easily google it.
The great thing about AJAX is the very first A, that stands for asynchronous, what means that the JS engine (in this case built in the browser) won't block until the response gets back.
To understand how AJAX works, you have to know that it's very much alike a common request, but with the difference that it can be triggered without reloading the web page.
The content of the response it whatever you want it to be. From some HTML code, to a JSON string. Even some plain text.
The way the response is treated depends on the implementation and programming. As an example, you could simply alert() the result of an AJAX call, or you and append it to a DOM element.
Local Storage
This doesn't have much to do with anything.
Local storage is just some disk space offered by the browser so you can save data in the browser that persists even if the page or the browser is closed.
An example
Chrome offers a javascript API to manage local storage. It's client side, and you can programmatically access to this storage and make CRUD opperations. It's just like a non-sql non-relational DB in the browser.
I wil summarize your main questions along with a brief answer right below them:
Q1:
Can the client see what exactly is in the session?
A: No. The client only knows the "SessionID", which is meta-data (all other session-data is stored on server only, and client can't see or alter it). The SessionID is used by the server only to identify the client and to map the application process to it's previous state.
HTTP is a stateless protocol, and this classic technique enables it as stateful.
There are very rare cases when the complete session data is stored on client-side (but in such cases, the server should also encrypt the session data so that the client can't see/alter it).
On the other hand, there are web clients that don't have the capability to store cookies at all, or they have features that prevent storing cookie data (e.g. the ability of the user to reject cookies from domains). In such cases, the workaround is to inject the SessionID into URL parameters, by using HTTP redirects.
Q2:
What's the difference between HTML5 LocalStorage and Session?
LocalStorage can be viewed as the client's own 'session' data, or better said a local data store where the client can save/persist data. Only the client (mainly from javascript) can access and alter the data. Think of it as javascript-controlled persistent storage (with the advantage over cookies that you can control what data, it's structure and the format you want to store it). It's also more advantageous than storing data to cookies - which have their own limitations such as data size and structure.
Q3:
How AJAX works?
In very simple words, AJAX means loading on-demand data on top of an already loaded (HTML) page. A typical http request would load the whole data of a page, while an ajax request would load and update just a portion of the (already-loaded) page.
This being said, an AJAX request is very similar to a standard HTTP Request.
Ajax requests are controlled by the javascript code and it can enrich the interaction with the page. You can request specific segments of data and update sections of the page.
Now, if we remember the old days when any interaction with a website (eg. signing in, navigating to other pages etc.) required a complete page reload? Back then, a lot of unnecessary traffic occurred just to perform any simple action. This in turn impacted site responsiveness, user experience, network traffic etc.
This happened due to browsers incapability (at that time) to [a.] perform a parallel HTTP request to the server and [b]render a partial HTML view.
Modern browsers come with these two features that enables AJAX technology - that is, invoking asynchronous(parallel) HTTP Requests (Ajax HTTP Requests) and they also provide on-the-fly DOM alteration mechanism via javascript (real-time HTML Document Object Model manipulation).
Please let me know if you need more info on these topics, or if there's anything else I can help with.
For a more profound understanding, I also recommend this nice web history article as it explains how everything started from when HTML was created and what was it's purpose (to define [at the time] rich documents), and then how HTTP was initially created and what problem it solved (at the time - to "transfer" static HTML). That explains why it is a stateless protocol.
Later on, as HTML and the WEB evolved, other needs emerged (such as the need to interact with the end-user) - and then the Cookie mechanism enhanced the protocol to enable stateful client-server communication by using session cookies. Then Ajax followed. Nowadays, the cookies come with their own limitations too and we have LocalStorage. Did I also mention WebSockets?
1. Establishing a Connection
The most common way web servers and clients communicate is through a connection which follows Transmission Control Protocol, or TCP. Basically, when using TCP, a connection is established between client and server machines through a series of back-and-forth checks. Once the connection is established and open, data can be sent between client and server. This connection can also be termed a Session.
There is also UDP, or User Datagram Protocol which has a slightly different way of communicating and comes with its own set of pros and cons. I believe recently some browsers may have begun to use a combination of the two in order to get the best results.
There is a ton more to be said here, but unless you are going to be writing a browser (or become a hacker) this should not concern you too much beyond the basics.
2. Sending Packets
Once the client-server connection is established, packets of data can be sent between the two. TCP packets contain various bits of information to assist in communication between the two ports. For web programmers, the most important part of the packet will be the section which includes the HTTP request.
HTTP, Hypertext Transfer Protocol is another protocol which describes what the makeup/format of these client-server communications should be.
A most basic example of the relevant portion of a packet sent from a client to a server is as follows:
GET /index.html HTTP/1.1
Host: www.example.com
The first line here is called the Response line. GET describes the method to be used, (others include POST, HEAD, PUT, DELETE, etc.) /index.html describes the resource requested. Finally, HTTP/1.11 describes the protocol being used.
The second line is in this case the only header field in the request, and in this case it is the HOST field which is sort of an alias for the IP address of the server, given by the DNS.
[Since you mentioned it, the difference between a GET request and a POST request is simply that in a GET request the parameters (ex: form data) is included as part of the Response Line, whereas in a POST request the parameters will be included as part of the Message Body (see below).]
3. Receiving Packets
Depending on the request sent to the server, the server will scratch its head, think about what you asked it, and respond accordingly (aka whatever you program it to do).
Here is an example of a response packet send from the server:
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
...
<html>
<head>
<title>A response from a server</title>
</head>
<body>
<h1>Hello World!</h1>
</body>
</html>
The first line here is the Status Line which includes a numerical code along with a brief text description. 200 OK obviously means success. Most people are probably also familiar with 404 Not Found, for example.
The second line is the first of the Response Header Fields. Other fields often added include date, Content-Length, and other useful metadata.
Below the headers and the necessary empty line is finally the (optional) Message Body. Of course this is usually the most exciting part of the response, as it will contain things like HTML for our browsers to display for us, JSON data, or pretty much anything you can code in a return statement.
4. AJAX, Asynchronous JavaScript and XML
Based off all of that, AJAX is fairly simple to understand. In fact, the packets sent and received can look identical to non-ajax requests.
The only difference is how and when the browser decides to send a request packet. Normally, upon page refresh a browser will send a request to the server. However, when issuing an AJAX request, the programmer simply tells the browser to please send a packet to the server NOW as opposed to on page refresh.
However, given the nature of AJAX requests, usually the Message Body won't contain an entire HTML document, but will request smaller, more specific bits of data, such as a query from a database.
Then, your JavaScript which calls the Ajax can also act based off the response. Any JavaScript method is available as making an Ajax call is just another JavaScript function. Thus, you can do things like innerHTML to add/replace content on your page with some HTML returned by the Ajax call. Alternatively though, you could also do something like make an Ajax call which simply should return True or False, and then you could call some JavaScript function with an if else statement. As you can hopefully see, Ajax has nothing to do with HTML per say, it is just a JavaScript function which makes a request from the server and returns the response, whatever it may be.
5. Cookies
HTTP Protocol is an example of a Stateless Protocol. Basically, this means that each pair of Request and Response (like we described) is treated independently of other requests and responses. Thus, the server does not have to keep track of all the thousands of users who are currently demanding attention. Instead, it can just respond to each request individually.
However, sometimes we wish the server would remember us. How annoying would it be if every time I waned to check my Gmail I had to log in all over again because the server forgot about me?
To solve this problem a server can send Cookies to be stored on the client's machine. The server can send a response which tells the client to store a cookie and what exactly it should contain. The client's browser is in charge of storing these cookies on the client's system, thus the location of these cookies will vary depending on your browser and OS. It is important to realize though that these are just small files stored on the client machine which are in fact readable and writable by anyone who knows how to locate and understand them. As you can imagine, this poses a few different potential security threats. One solution is to encrypt the data stored inside these cookies so that a malicious user won't be able to take advantage of the information you made available. (Since your browser is setting these cookies, there is usually a setting within your browser which you can modify to either accept, reject, or perhaps set a new location for cookies.
This way, when the client makes a request from the server, it can include the Cookie within one of the Request Header Fields which will tell the server, "Hey I am an authenticated user, my name is Bob, and I was just in the middle of writing an extremely captivating blog post before my laptop died," or, "I have 3 designer suits picked out in my shopping cart but I am still planning on searching your site tomorrow for a 4th," for example.
6. Local Storage
HTML5 introduced Local Storage as a more secure alternative to Cookies. Unlike cookies, with local storage data is not actually sent to the server. Instead, the browser itself keeps track of State.
This alternative also allows much larger amounts of data to be stored, as there is no requirement for it to be passed across the internet between client and server.
7. Keep Researching
That should cover the basics and give a pretty clear picture as to what is going on between clients and servers. There is more to be said on each of these points, and you can find plenty of information with a simple Google search.
I'm not sure I understand what types of vulnerabilities this causes.
When I need to access data from an API I have to use ajax to request a PHP file on my own server, and that PHP file accesses the API. What makes this more secure than simply allowing me to hit the API directly with ajax?
For that matter, it looks like using JSONP http://en.wikipedia.org/wiki/JSONP you can do everything that cross-domain ajax would let you do.
Could someone enlighten me?
I think you're misunderstanding the problem that the same-origin policy is trying to solve.
Imagine that I'm logged into Gmail, and that Gmail has a JSON resource, http://mail.google.com/information-about-current-user.js, with information about the logged-in user. This resource is presumably intended to be used by the Gmail user interface, but, if not for the same-origin policy, any site that I visited, and that suspected that I might be a Gmail user, could run an AJAX request to get that resource as me, and retrieve information about me, without Gmail being able to do very much about it.
So the same-origin policy is not to protect your PHP page from the third-party site; and it's not to protect someone visiting your PHP page from the third-party site; rather, it's to protect someone visiting your PHP page, and any third-party sites to which they have special access, from your PHP page. (The "special access" can be because of cookies, or HTTP AUTH, or an IP address whitelist, or simply being on the right network — perhaps someone works at the NSA and is visiting your site, that doesn't mean you should be able to trigger a data-dump from an NSA internal page.)
JSONP circumvents this in a safe way, by introducing a different limitation: it only works if the resource is JSONP. So if Gmail wants a given JSON resource to be usable by third parties, it can support JSONP for that resource, but if it only wants that resource to be usable by its own user interface, it can support only plain JSON.
Many web services are not built to resist XSRF, so if a web-site can programmatically load user data via a request that carries cross-domain cookies just by virtue of the user having visited the site, anyone with the ability to run javascript can steal user data.
CORS is a planned secure alternative to XHR that solves the problem by not carrying credentials by default. The CORS spec explains the problem:
User agents commonly apply same-origin restrictions to network requests. These restrictions prevent a client-side Web application running from one origin from obtaining data retrieved from another origin, and also limit unsafe HTTP requests that can be automatically launched toward destinations that differ from the running application's origin.
In user agents that follow this pattern, network requests typically use ambient authentication and session management information, including HTTP authentication and cookie information.
EDIT:
The problem with just making XHR work cross-domain is that many web services expose ambient authority. Normally that authority is only available to code from the same origin.
This means that a user that trusts a web-site is trusting all the code from that website with their private data. The user trusts the server they send the data to, and any code loaded by pages served by that server. When the people behind a website and the libraries it loads are trustworthy, the user's trust is well-placed.
If XHR worked cross-origin, and carried cookies, that ambient authority would be available to code to anyone that can serve code to the user. The trust decisions that the user previously made may no longer be well-placed.
CORS doesn't inherit these problems because existing services don't expose ambient authority to CORS.
The pattern of JS->Server(PHP)->API makes it possible and not only best, but essential practice to sanity-check what you get while it passes through the server. In addition to that, things like poisened local resolvers (aka DNS Worms) etc. are much less likely on a server, than on some random client.
As for JSONP: This is not a walking stick, but a crutch. IMHO it could be seen as an exploit against a misfeature of the HTML/JS combo, that can't be removed without breaking existing code. Others might think different of this.
While JSONP allows you to unreflectedly execute code from somwhere in the bad wide world, nobody forces you to do so. Sane implementations of JSONP allways use some sort of hashing etc to verify, that the provider of that code is trustwirthy. Again others might think different.
With cross site scripting you would then have a web page that would be able to pull data from anywhere and then be able to run in the same context as your other data on the page and in theory have access to the cookie and other security information that you would not want access to be given too. Cross site scripting would be very insecure in this respect since you would be able to go to any page and if allowed the script on that page could just load data from anywhere and then start executing bad code hence the reason that it is not allowed.
JSONP on the otherhand allows you to get data in JSON format because you provide the necessary callback that the data is passed into hence it gives you the measure of control in that the data will not be executed by the browser unless the callback function does and exec or tries to execute it. The data will be in a JSON format that you can then do whatever you wish with, however it will not be executed hence it is safer and hence the reason it is allowed.
The original XHR was never designed to allow cross-origin requests. The reason was a tangible security vulnerability that is primarily known by CSRF attacks.
In this attack scenario, a third party site can force a victim’s user agent to send forged but valid and legitimate requests to the origin site. From the origin server perspective, such a forged request is not indiscernible from other requests by that user which were initiated by the origin server’s web pages. The reason for that is because it’s actually the user agent that sends these requests and it would also automatically include any credentials such as cookies, HTTP authentication, and even client-side SSL certificates.
Now such requests can be easily forged: Starting with simple GET requests by using <img src="…"> through to POST requests by using forms and submitting them automatically. This works as long as it’s predictable how to forge such valid requests.
But this is not the main reason to forbid cross-origin requests for XHR. Because, as shown above, there are ways to forge requests even without XHR and even without JavaScript. No, the main reason that XHR did not allow cross-origin requests is because it would be the JavaScript in the web page of the third party the response would be sent to. So it would not just be possible to send cross-origin requests but also to receive the response that can contain sensitive information that would then be accessible by the JavaScript.
That’s why the original XHR specification did not allow cross-origin requests. But as technology advances, there were reasonable requests for supporting cross-origin requests. That’s why the original XHR specification was extended to XHR level 2 (XHR and XHR level 2 are now merged) where the main extension is to support cross-origin requests under particular requirements that are specified as CORS. Now the server has the ability to check the origin of a request and is also able to restrict the set of allowed origins as well as the set of allowed HTTP methods and header fields.
Now to JSONP: To get the JSON response of a request in JavaScript and be able to process it, it would either need to be a same-origin request or, in case of a cross-origin request, your server and the user agent would need to support CORS (of which the latter is only supported by modern browsers). But to be able to work with any browser, JSONP was invented that is simply a valid JavaScript function call with the JSON as a parameter that can be loaded as an external JavaScript via <script> that, similar to <img>, is not restricted to same-origin requests. But as well as any other request, a JSONP request is also vulnerable to CSRF.
So to conclude it from the security point of view:
XHR is required to make requests for JSON resources to get their responses in JavaScript
XHR2/CORS is required to make cross-origin requests for JSON resources to get their responses in JavaScript
JSONP is a workaround to circumvent cross-origin requests with XHR
But also:
Forging requests is laughable easy, although forging valid and legitimate requests is harder (but often quite easy as well)
CSRF attacks are a not be underestimated threat, so learn how to protect against CSRF
A customer sometimes sends POST requests with Content-Length: 0 when submitting a form (10 to over 40 fields).
We tested it with different browsers and from different locations but couldn't reproduce the error. The customer is using Internet Explorer 7 and a proxy.
We asked them to let their system administrator see into the problem from their side. Running some tests without the proxy, etc..
In the meantime (half a year later and still no answer) I'm curious if somebody else knows of similar problems with a Content-Length: 0 request. Maybe from inside some Windows network with a special proxy for big companies.
Is there a known problem with Internet Explorer 7? With a proxy system? The Windows network itself?
Google only showed something in the context of NTLM (and such) authentication, but we aren't using this in the web application. Maybe it's in the way the proxy operates in the customer's network with Windows logins? (I'm no Windows expert. Just guessing.)
I have no further information about the infrastructure.
UPDATE: In December 2010 it was possible to inform one administrator about this, incl. links from the answers here. Contact was because of another problem which was caused by the proxy, too. No feedback since then. And the error messages are still there. I'm laughing to prevent me from crying.
UPDATE 2: This problem exists since mid 2008. Every few months the customer is annoyed and wants it to be fixed ASAP. We send them all the old e-mails again and ask them to contact their administrators to either fix it or run some further tests. In December 2010 we were able to send some information to 1 administrator. No feedback. Problem isn't fixed and we don't know if they even tried. And in May 2011 the customer writes again and wants this to be fixed. The same person who has all the information since 2008.
Thanks for all the answers. You helped a lot of people, as I can see from some comments here. Too bad the real world is this grotesque for me.
UPDATE 3: May 2012 and I was wondering why we hadn't received another demand to fix this (see UPDATE 2). Looked into the error protocol, which only reports this single error every time it happened (about 15 a day). It stopped end of January 2012. Nobody said anything. They must have done something with their network. Everything is OK now. From summer 2008 to January 2012. Too bad I can't tell you what they have done.
UPDATE 4: September 2015. The website had to collect some data and deliver it to the main website of the customer. There was an API with an account. Whenever there was a problem they contacted us, even if the problem was clearly on the other side. For a few weeks now we can't send them the data. The account isn't available anymore. They had a relaunch and I can't find the pages anymore that used the data of our site. The bug report isn't answered and nobody complaint. I guess they just ended this project.
UPDATE 5: March 2017. The API stopped working in the summer of 2015. The customer seems to continue paying for the site and is still accessing it in February 2017. I'm guessing they use it as an archive. They don't create or update any data anymore so this bug probably won't reemerge after the mysterious fix of January 2012. But this would be someone else's problem. I'm leaving.
Internet Explorer does not send form fields if they are posted from an authenticated site (NTLM) to a non-authenticated site (anonymous).
This is feature for challange-response situations (NTLM- or Kerberos- secured web sites) where IE can expect that the first POST request immediately leads to an HTTP 401 Authentication Required response (which includes a challenge), and only the second POST request (which includes the response to the challange) will actually be accepted. In these situations IE does not upload the possibly large request body with the first request for performance reasons. Thanks to EricLaw for posting that bit of information in the comments.
This behavior occurs every time an HTTP POST is made from a NTLM authenticated (i.e. Intranet) page to a non-authenticated (i.e. Internet) page, or if the non-authenticated page is part of a frameset, where the frameset page is authenticated.
The work-around is either to use a GET request as the form method, or to make sure the non-authenticated page is opened in a fresh tab/window (favorite/link target) without a partly authenticated frameset. As soon as the authentication model for the whole window is consistent, IE will start to send form contents again.
Definitely related: http://www.websina.com/bugzero/kb/browser-ie.html
Possibly related: KB923155
Full Explanation: IEInternals Blog – Challenge-Response Authentication and Zero-Length Posts
This is easy to reproduce with MS-IE and an NTLM authentication filter on server side. I have the same issue with JCIFS (1.2.), struts 1. and MS-IE 6/7 on XP-SP2. It was finally fixed. There are several workarounds to make it up.
change form method from POST (struts default setting) to GET.
For most pages with small sized forms, it works well. Unfortunately i have possibly more than 50 records to send in HTTP stream back to server side. IE has a GET URL limit 2038 Bytes (not parameter length, but the whole URL length). So this is a quick workaround but not applicable for me.
send a GET before POST action executing.
This was recommended in MS-KB. My project has many legacy procedures and i would not take the risk at the right time. I have never tried this because it still needs some extra authentication processing when GET is received by filter layer based on my understanding from MS-KB and I would not like to change the behavior with other browsers, e.g. Firefox, Opera.
detecting if POST was sent with zero content-length (you may get it from header properties hash structure with your framework).
If so, trigger an NTLM authentication cycle by get challenge code from DC or cache and expect an NTLM response.
When the NTLM type2 msg is received and the session is still valid, you don't really need to authenticate the user but just forward it to the expected action if POST content-length is not zero. BTW, this would increase the network traffics. So check your cache life time setting and SMB session soTimeOut configuration before applying the change plz.
Or, more simple, you may just send a 401-unauthorized status to MS-IE and the browser shall send back POST request with data in reply.
MS-KB has provided a hot-fix with KB-923155 (I could not post more than one link because of a low reputation number :{ ) , but it seems not working. Would someone post a workable hot-fix here? Thanks :) Here is a link for reference, http://www.websina.com/bugzero/kb/browser-ie.html
We have a customer on our system with exactly the same problem. We've pin pointed it down to the proxy/firewall. Microsoft's IAS. It's stripping the POST body and sending content-length: 0. Not a lot we can do to work around however, and down want to use GET requests as this exposes usernames/passwords etc on the URL string. There's nearly 7,000 users on our system and only one with the problem... also only one using Microsoft IAS, so it has to be this.
There's a good chance the problem is that the proxy server in between implements HTTP 1.0.
In HTTP 1.0 you must use the Content-Length header field: (See section 10.4 here)
A valid Content-Length is required on
all HTTP/1.0 POST requests. An
HTTP/1.0 server should respond with a
400 (bad request) message if it cannot
determine the length of the request
message's content.
The request going into the proxy is HTTP 1.1 and therefore does not need to use the Content-Length header field. The Content-Length header is usually used but not always. See the following excerpt from the HTTP 1.1 RFC S. 14.13.
Applications SHOULD use this field to
indicate the transfer-length of the
message-body, unless this is
prohibited by the rules in section
4.4.
Any Content-Length greater than or
equal to zero is a valid value.
Section 4.4 describes how to determine
the length of a message-body if a
Content-Length is not given.
So the proxy server does not see the Content-Length header, which it assumes is absolutely needed in HTTP 1.0 if there is a body. So it assumes 0 so that the request will eventually reach the server. Remember the proxy doesn't know the rules of the HTTP 1.1 spec, so it doesn't know how to handle the situation when there is no Content-Length header.
Are you 100% sure your request is specifying the Content-Length header? If it is using another means as defined in section 4.4 because it thinks the server is 1.1 (because it doesn't know about the 1.0 proxy in between) then you will have your described problem.
Perhaps you can use HTTP GET instead to bypass the problem.
This is a known problem for Internet explorer 6, but not for 7 that I know of. You can install this fix for the IE6 KB831167 fix.
You can read more about it here.
Some questions for you:
Do you know which type of proxy?
Do you know if there is an actual body sent in the request?
Does it happen consistently every time? Or only sometimes?
Is there any binary data sent in the request? Maybe the data starts with a \0 and the proxy has a bug with binary data.
If the user is going through an ISA proxy that uses NTLM authentication, then it sounds like this issue, which has a solution provided (a patch to the ISA proxy)
http://support.microsoft.com/kb/942638POST requests that do not have a POST body may be sent to a Web server that is published in ISA Server 2006
I also had a problem where requests from a customer's IE 11 browser had Content-Length: 0 and did not include the expected POST content. When the customer used Firefox, or Chrome the expected content was included in the request.
I worked out the cause was the customer was using a HTTP URL instead of a HTTPS URL (e.g. http://..., not https://...) and our application uses HSTS. It seems there might be a bug in IE 11 that when a request gets upgraded to HTTPS due to HSTS the request content gets lost.
Getting the customer to correct the URL to https://... resulted in the content being included in the POST request and resolved the problem.
I haven't investigated whether it is actually a bug in IE 11 any further at this stage.
Are you sure these requests are coming from a "customer"?
I've had this issue with bots before; they sometimes probe sites for "contact us" forms by sending blank POST requests based on the action URI in FORM tags they discover during crawling.
Presence and possible values of the ContentLength header in HTTP are described in the HTTP ( I assume 1/1) RFC:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13
In HTTP, it SHOULD be sent whenever the message's length can be determined prior to being transferred
See also:
If a message is received with both a
Transfer-Encoding header field and a Content-Length header field,
the latter MUST be ignored.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4
Maybe your message is carrying a Transfer-Encoding header?
Later edit: also please note "SHOULD" as used in the RFC is very important and not equivalent to "MUST":
3. SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
Ref: http://www.ietf.org/rfc/rfc2119.txt
We had a customer using same website in anonymous and NTLM mode (on different ports). We found out that in our case the 401 was related to Riverbed Steelhead application used for http optimization. The first signal pointing us into that direction was a X-RBT-Optimized-By header. The issue was the Gratuitous 401 feature:
This feature can be used with both per-request and per-connection
authentication but it‘s most effective when used with per-request
authentication. With per-request authentication, every request must be
authenticated against the server before the server would serve the
object to the client. However, most browsers do not cache the server‘s
response requiring authentication and hence it will waste one
round-trip for every GET request. With Gratuitous 401, the client-side
Steelhead appliance will cache the server response and when the client
sends the GET request without any authentication headers, it will
locally respond with a ―401 Unauthorized‖ message and therefore saving
a round trip. Note that the HTTP module does not participate in the
actual authentication itself. What the HTTP module does is to inform
the client that the server requires authentication without requiring
it to waste one round trip.
Google also shows this as an IE (some versions, anyway) bug after an https connection hits the keepalive timeout and reconnects to the server. The solution seems to be configuring the server to not use keepalive for IE under https.
Microsoft's hotfix for KB821814 can set Content-Length to 0:
The hotfix that this article describes implements a code change in Wininet.dll to:
Detect the RESET condition on a POST request.
Save the data that is to be posted.
Retry the POST request with the content length set to 0. This prevents the reset from occurring and permits the authentication process to complete.
Retry the original POST request.
curl sends PUT/POST requests with Content-Length: 0 when configured to use HTTP proxy. It's trick to overcome required buffering in case of first unauthorized PUT/POST request to proxy. In case of GET/HEAD requests curl simply repeats the query. The scheme for PUT/POST is like:
Send first PUT/POST request with Content-Length set to 0.
Get answer. HTTP status code of 407 means we have to use proxy
authorization. Prepare headers for proxy authentication for send request.
Send request again with filled headers for proxy authentication and real data to POST/PUT.