How can I validate HTML input to prevent XSS? - asp.net-mvc-3

For example, StackExchange whitelists a subset of HTML:
https://meta.stackexchange.com/questions/1777/what-html-tags-are-allowed-on-stack-exchange-sites
How could you do that in your controller to make sure user input is safe?

This approach is not identical to StackExchange, but I found the AntiXSS 4.x library to be a simple way to sanitize the input so that only "safe" HTML is allowed.
http://www.microsoft.com/en-us/download/details.aspx?id=28589 You can download a version here, but I linked it for the useful DOCX file. My preferred method is to use the NuGet package manager to get the latest AntiXSS package.
You can use the HtmlSanitizationLibrary assembly found in the 4.x AntiXss library. Note that GetSafeHtml() is in the HtmlSanitizationLibrary, under Microsoft.Security.Application.Sanitizer.
content = Sanitizer.GetSafeHtml(userInput);
This can be done before saving to the database. The advantage is removing malicious content immediately, and not having to worry about it when you output it. The disadvantage is that it won't handle any existing database content, and you do have to apply this any time you're making database updates.
The alternate approach is to use this method every time you output content.
I'd love to hear what the preferred approach is.

You can try the JSoup parser, which, along with sanitizing your HTML input, provides many other features out of the box.
You can visit http://jsoup.org/ for more details on JSoup and download the binary from there.
It provides DOM methods to traverse your HTML tree and get the desired elements.
Although sanitizing your generated HTML to prevent XSS attacks is good practice, I would strongly advise against relying on a parser to prevent XSS by sanitizing your HTML input.
If your HTML tree is very big, the response time will increase many times over. Instead of sanitizing your HTML tree, you should ensure that whatever the user enters in the form is valid and matches the expected value.
You can visit www.owasp.org to learn more about how to avoid XSS attacks. The site provides cheat sheets to help ensure your HTML tree is free from XSS attacks.

ASP.NET's HttpUtility.HtmlEncode() does this for you.
But if you want to block dangerous scripts, DO NOT insert them into your database in the first place. Clean the HTML text before inserting it into the database.
I found a class that does this for you: http://eksith.wordpress.com/2012/02/13/antixss-4-2-breaks-everything/
It works fine, and you can add new tags and attributes to the sanitizer's custom whitelist.
Note: the Microsoft Sanitizer and Anti-XSS Library were not useful for me, but you may want to try them as well.

Related

ajax call vulnerable to xss attack

I have a simple web application in which I make a call to a Java servlet using Ajax from a JSP page (via POST). In the servlet I take data from the database, build a JSON response, and retrieve it in the JSP page. I then use the eval function to parse the JSON and display the data in a div using the innerHTML property. Somehow, this approach seems to be vulnerable to XSS attacks. Can someone provide some pointers on how XSS attacks can be prevented in this use case?
This sounds like DOM Based XSS. There are a few ways of preventing DOM Based XSS. You have to HTML-encode the data either on the server or on the client. HTML-encoding data in the database should always be avoided because it changes the value of the data and will affect how the data is sorted, etc. XSS is an output problem, so it should be solved by the code that is building the HTML, which in your case is JavaScript.
Parse the JSON with JSON.parse() rather than eval(). Newer browsers support it natively; for older browsers, use json2.js.
You should also properly encode the JSON so values cannot break out of quotes etc. Find a decent json encoder and use that on the server side.
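A minimal sketch of the client-side half of this advice, assuming the servlet returns a JSON array of objects: parse the response with JSON.parse() instead of eval(), and HTML-encode each value before it reaches innerHTML. The escapeHtml and renderRows helpers, the name field, and the "result" element id are hypothetical and not taken from the original application.

```javascript
// Hypothetical helper: encode the characters that are significant in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: turn the servlet's JSON response into encoded markup.
// JSON.parse() only accepts JSON; unlike eval(), it never executes code.
function renderRows(jsonText) {
  const rows = JSON.parse(jsonText);
  return rows.map((row) => "<li>" + escapeHtml(row.name) + "</li>").join("");
}

// In the page (assumed element id):
// document.getElementById("result").innerHTML = renderRows(responseText);
```

Where a value is plain text, assigning it to textContent instead of innerHTML avoids the encoding step altogether.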

What is the most efficient way to write headers and footers, 'global' header/footer or 'local' ones?

I'm about to start coding a website, and because this is my first time writing code for a web page, there is something I've been wondering about.
Writing separate header.php and footer.php files is probably the fastest and easiest way to do things.
The problem is, what if for some pages I'd like to use specific JavaScript files and code, and for some I would like to use others?
It would result in more HTTP requests and would eventually impact the performance of the site.
I thought about using if statements in the header and just give every page exactly what it needs, and nothing more.
But which way is more efficient?:
Coding global header.php and footer.php files and separating the codes using if statements OR add the whole header+footer code to every single file (ie local header/footer)?
P.S global and local header/footer is something I just made up, didn't really know how to call it, lol.
The advantage of your "global" header and footer is that 1) they are consistent and changes are "global" and 2) they are included in the pages in server code. So there isn't a lot of HTTP traffic if you do the include on the server side.
You can (and should) do page-specific includes on the server side if at all possible using logic that determines what to load at the time of the Request.
There are other ways to accomplish this but with straight up PHP, what you are considering is the best way.
If you are using a framework like Yii, you can do this sort of thing in layouts but with simple PHP, you are on the right track.
Defining the header and footer in each page (local), causes you to repeat a lot of code and causes maintenance headaches going forward. You have a lot of pages to update for simple header/footer changes.

Why doesn't CodeIgniter's XSS filter clean all?

Why does CodeIgniter's XSS filter only react through regular expressions on specific things instead of sanitizing all input in the first place regardless if the content is tainted or not? Also, why is this done during input and not on output (like it's supposed to be?)
Why does CodeIgniter's XSS filter only react through regular expressions on specific things instead of sanitizing all input in the first place regardless if the content is tainted or not?
This doesn't make much sense. How are we to tell whether or not something is "tainted" without checking it first?
By the definition of CI's xss_clean(), we don't always want to sanitize input. As you mentioned, it's the output that matters, and that's where we need to be mindful of XSS attacks. If we always "sanitize" input with CI's xss_clean(), then how would I, for example, be able to post JavaScript or PHP code examples on my blog, or let users do it in the comments? It would end up getting [removed].
Also, why is this done during input and not on output (like it's supposed to be?)
You do have the option to enable the global xss filter in your CI config, which will run xss_clean() on $_POST, $_GET, and $_COOKIE data automatically before you can get your hands on it. This is the lowest level possible to protect you from yourself, but the option is always available to instead clean the data explicitly. For example:
// With the Input class on $_POST data
$this->input->post('username', TRUE); // Second parameter runs xss_clean
// Using the Security class on any data
$this->security->xss_clean($username);
// Using the Form Validation class to automatically clean the input
$this->form_validation->set_rules('username', '', 'xss_clean');
Since you could still simply use $_POST['username'] instead, by enabling the global filter it will already be xss_cleaned for you. This is the lazy way to do it, and unfortunately once those globals are cleaned, there's no way to undo it.
If you are already aware of when and where XSS attacks can happen - you have the function easily available to use if you wish. Keep in mind that this does not magically make all data "safe", it merely prevents some of the more malicious code injection. Something more harmless like </div> will get past this filter. You should always be sanitizing input explicitly in an appropriate way for the context in which it is used.
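The last point, that the right treatment depends on where the value ends up, can be sketched as follows. This is an illustration in JavaScript rather than CodeIgniter's own API, and the forHtml and forUrlParam helpers and the /search path are hypothetical. The same input is encoded one way for an HTML element body and another way for a URL query value.

```javascript
// Hypothetical helper: encode text for an HTML element body or attribute value.
function forHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: encode text for use as a single URL query-string value.
function forUrlParam(value) {
  return encodeURIComponent(String(value));
}

const input = "</div>";
const htmlOut = "<div>" + forHtml(input) + "</div>"; // stays inside the div as text
const urlOut = "/search?q=" + forUrlParam(input);    // stays inside the q parameter
```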

What does a Ajax call response like 'for (;;); { json data }' mean? [duplicate]

Possible Duplicate:
Why do people put code like “throw 1; <dont be evil>” and “for(;;);” in front of json responses?
I found this kind of syntax being used on Facebook for Ajax calls. I'm confused on the for (;;); part in the beginning of response. What is it used for?
This is the call and response:
GET http://0.131.channel.facebook.com/x/1476579705/51033089/false/p_1524926084=0
Response:
for (;;);{"t":"continue"}
I suspect the primary reason it's there is control. It forces you to retrieve the data via Ajax, not via JSON-P or similar (which uses script tags, and so would fail because that for loop is infinite), and thus ensures that the Same Origin Policy kicks in. This lets them control what documents can issue calls to the API — specifically, only documents that have the same origin as that API call, or ones that Facebook specifically grants access to via CORS (on browsers that support CORS). So you have to request the data via a mechanism where the browser will enforce the SOP, and you have to know about that preface and remove it before deserializing the data.
So yeah, it's about controlling (useful) access to that data.
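As a rough illustration of "know about that preface and remove it before deserializing", a legitimate client could do something like the following. The parseGuardedJson helper is hypothetical and is not Facebook's actual code; the prefix is copied from the response shown in the question.

```javascript
// The guard prefix exactly as it appears in the response above.
const GUARD = "for (;;);";

// Hypothetical helper: remove the guard, then parse the remainder as JSON.
function parseGuardedJson(responseText) {
  if (!responseText.startsWith(GUARD)) {
    throw new Error("Response is missing the expected guard prefix");
  }
  return JSON.parse(responseText.slice(GUARD.length));
}

const data = parseGuardedJson('for (;;);{"t":"continue"}');
```

Only code that can read the raw response text (an Ajax call subject to the Same Origin Policy) gets the chance to strip the prefix; a script tag executes the response as-is.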
Facebook has a ton of developers working internally on a lot of projects, and it is very common for someone to make a minor mistake; whether it be something as simple and serious as failing to escape data inserted into an HTML or SQL template or something as intricate and subtle as using eval (sometimes inefficient and arguably insecure) or JSON.parse (a compliant but not universally implemented extension) instead of a "known good" JSON decoder, it is important to figure out ways to easily enforce best practices on this developer population.
To face this challenge, Facebook has recently been going "all out" with internal projects designed to gracefully enforce these best practices, and to be honest the only explanation that truly makes sense for this specific case is just that: someone internally decided that all JSON parsing should go through a single implementation in their core library, and the best way to enforce that is for every single API response to get for(;;); automatically tacked on the front.
In so doing, a developer can't be "lazy": they will notice immediately if they use eval(), wonder what is up, and then realize their mistake and use the approved JSON API.
The other answers being provided seem to all fall into one of two categories:
misunderstanding JSONP, or
misunderstanding "JSON hijacking".
Those in the first category rely on the idea that an attacker can somehow make a request "using JSONP" to an API that doesn't support it. JSONP is a protocol that must be supported on both the server and the client: it requires the server to return something akin to myFunction({"t":"continue"}) such that the result is passed to a local function. You can't just "use JSONP" by accident.
Those in the second category are citing a very real vulnerability that has been described, which allows a cross-site request forgery via script tags against APIs that do not use JSONP (such as this one), enabling a form of "JSON hijacking". This is done by changing the Array/Object constructor, which allows one to access the information being returned from the server without a wrapping function.
However, that is simply not possible in this case: the reason it works at all is that a bare array (one possible result of many JSON APIs, such as the famous Gmail example) is a valid expression statement, which is not true of a bare object.
In fact, the syntax for objects defined by JSON (which includes quotation marks around the field names, as seen in this example) conflicts with the syntax for blocks, and therefore cannot be used at the top-level of a script.
js> {"t":"continue"}
typein:2: SyntaxError: invalid label:
typein:2: {"t":"continue"}
typein:2: ....^
For this example to be exploitable by way of Object() constructor remapping, it would require the API to have instead returned the object inside of a set of parentheses, making it valid JavaScript (but then not valid JSON).
js> ({"t":"continue"})
[object Object]
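The same check can be reproduced without an interactive shell. The sketch below uses a hypothetical compilesAsStatements helper that only compiles the text and never runs it; it compiles the text as a function body, which is an approximation of a top-level script but follows the same statement grammar for these inputs.

```javascript
// Hypothetical helper: report whether the text compiles as JavaScript statements.
// new Function() only compiles the body here; the resulting function is never called.
function compilesAsStatements(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const bareObject = compilesAsStatements('{"t":"continue"}');      // braces read as a block
const wrappedObject = compilesAsStatements('({"t":"continue"})'); // parentheses force an expression
const bareArray = compilesAsStatements('["continue"]');           // an array literal is an expression
```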
Now, it could be that this for(;;); prefix trick is only "accidentally" showing up in this example, and is in fact being returned by other internal Facebook APIs that are returning arrays; but in this case that should really be noted, as that would then be the "real" cause for why for(;;); is appearing in this specific snippet.
Well the for(;;); is an infinite loop (you can use Chrome's JavaScript console to run that code in a tab if you want, and then watch the CPU-usage in the task manager go through the roof until the browser kills the tab).
So I suspect that maybe it is being put there to frustrate anyone attempting to parse the response using eval or any other technique that executes the returned data.
To explain further, it used to be fairly commonplace to parse a bit of JSON-formatted data using JavaScript's eval() function, by doing something like:
var parsedJson = eval('(' + jsonString + ')');
...this is considered unsafe, however, as if for some reason your JSON-formatted data contains executable JavaScript code instead of (or in addition to) JSON-formatted data then that code will be executed by the eval(). This means that if you are talking with an untrusted server, or if someone compromises a trusted server, then they can run arbitrary code on your page.
Because of this, using things like eval() to parse JSON-formatted data is generally frowned upon, and the for(;;); statement in the Facebook JSON will prevent people from parsing the data that way. Anyone that tries will get an infinite loop. So essentially, it's like Facebook is trying to enforce that people work with its API in a way that doesn't leave them vulnerable to future exploits that try to hijack the Facebook API to use as a vector.
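For contrast with eval(), here is a small sketch of how a strict JSON parser treats the same kinds of input. The tryParseJson helper is hypothetical. JSON.parse() rejects anything that is not pure JSON, so neither the for(;;); prefix nor embedded code is ever executed.

```javascript
// Hypothetical helper: parse strictly as JSON, returning null for anything else.
// (A literal JSON "null" also yields null; a real client would distinguish the two.)
function tryParseJson(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    if (err instanceof SyntaxError) return null;
    throw err;
  }
}

const plain = tryParseJson('{"t":"continue"}');            // valid JSON, parsed
const guarded = tryParseJson('for(;;);{"t":"continue"}');  // rejected, loop never runs
const withCode = tryParseJson('{"t": alert(1)}');          // rejected, alert never runs
```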
I'm a bit late and T.J. has basically solved the mystery, but I thought I'd share a great paper on this particular topic that has good examples and provides deeper insight into this mechanism.
These infinite loops are a countermeasure against "Javascript hijacking", a type of attack that gained public attention with an attack on Gmail that was published by Jeremiah Grossman.
The idea is as simple as it is beautiful: a lot of users tend to be logged in permanently to Gmail or Facebook. So what you do is set up a site, and in your malicious site's JavaScript you override the object or array constructor:
function Object() {
//Make an Ajax request to your malicious site exposing the object data
}
then you include a <script> tag in that site such as
<script src="http://www.example.com/object.json"></script>
And finally you can read all about the JSON objects in your malicious server's logs.
As promised, the link to the paper.
This looks like a hack to prevent a CSRF attack. There are browser-specific ways to hook into object creation, so a malicious website could do that first, and then have the following:
<script src="http://0.131.channel.facebook.com/x/1476579705/51033089/false/p_1524926084=0"></script>
If there weren't an infinite loop before the JSON, an object would be created, since JSON can be eval()ed as javascript, and the hooks would detect it and sniff the object members.
Now if you visit that site from a browser, while logged into Facebook, it can get at your data as if it were you, and then send it back to its own server via e.g., an AJAX or javascript post.

Is the concept of a link inseparable from its html markup?

I'm looking for a strategy for managing links within articles. The body of the article is saved in a database and pulled during page assembly. What all should be saved in the database to easily define and manage links?
Some purists believe that markup should NEVER be stored in the database. Some believe it's OK in moderation. But to me, the notion of a link is almost inseparable from its HTML markup.
Is there a better, more succinct way of representing a link in an article (in a database) than simply embedding "anchor text"?
One idea I've kicked around involves embedding just enough markup to semantically describe areas of interest, and in a different table, map those notions to actual URLs. All encounters of a particular notion get wrapped with the link.
<p>Here is an example of a
<span class="external-reference semantic-web">semantic</span>
approach to link management.</p>
A table then might associate the URL of the article and the key class of 'semantic-web' to a URL like http://en.wikipedia.org/wiki/Semantic_Web
<p>Here is an example of a <span class="external-reference semantic-web">
semantic</span>
approach to link management.</p>
What I like about this approach is that all my URLs are in one location in the database. I could technically change or remove links without touching the body of the article. I also get very good class names for CSS.
I don't like having another table to maintain, and another step/phase in render time. It could slow down response time.
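To make the extra render step concrete, here is a rough sketch of what it might look like. Everything in it is an assumption for illustration: the linkTable map stands in for the database table (and, as a simplification, is keyed on the class only, not on the article URL as well), and applyLinks is a hypothetical helper that uses a regex where real markup would call for an HTML parser.

```javascript
// Hypothetical link table: semantic class key -> URL (a database table in practice).
const linkTable = new Map([
  ["semantic-web", "http://en.wikipedia.org/wiki/Semantic_Web"],
]);

// Hypothetical helper: wrap each marked span in a link when its key is known.
// URLs are assumed to come from a trusted table, so they are not encoded here.
function applyLinks(html, links) {
  const pattern = /<span class="external-reference ([\w-]+)">[\s\S]*?<\/span>/g;
  return html.replace(pattern, (match, key) => {
    const url = links.get(key);
    return url ? '<a href="' + url + '">' + match + "</a>" : match;
  });
}

const body =
  '<p>Here is an example of a <span class="external-reference semantic-web">semantic</span> approach to link management.</p>';
const rendered = applyLinks(body, linkTable);
```

Spans whose key has no entry in the table are left untouched, which is what makes it possible to remove a link without editing the article body.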
Are there any other strategies out there that provide superior link management?
You may want to look at templating (such as Smarty for PHP).
I agree that markup shouldn't normally be held in the database.
However, you might also consider implementing a "pointer" concept, where at each link, you break your storage of the page, add a pointer in the table to the link, then a pointer in the link table to the next segment of content for the page. (I have no idea how complicated that would be - just an idea.)
Or look at how various CMS tools handle the idea. Some just put everything in the database as one big block of text, while others rely on templating, and others may do something else entirely (like object-oriented environments such as Plone).
There are a few attempts to do this that I have seen.
One way to do this is through URL redirects. You can implement a logic component on the server that interprets what the URL is requesting, rather than treating it as a path to the content.
Another approach is to set the links initially to a reference value (which can be looked up in a database) that is resolved at runtime/generation.
Regardless, you will have to reference the material that you wish to link to with some sort of identifier.

Resources