When you're passing variables through your site using GET requests, do you validate them (regular expressions, filters, etc.) before you use them?
Say you have the URL http://www.example.com/?i=45&p=custform. You know that "i" will always be an integer and "p" will always contain only letters and/or numbers. Is it worth the time to make sure that no one has attempted to manipulate the values and then resubmit the page?
Yes. Without a doubt. Never trust user input.
To improve the user experience, input fields can (and IMHO should) be validated on the client. This can pre-empt a round trip to the server that only leads to the same form and an error message.
However, input must always be validated on the server side since the user can just change the input data manually in the GET url or send crafted POST data.
In a worst case scenario you can end up with an SQL injection, or even worse, an XSS vulnerability.
Most frameworks already have some built-in way to clean the input, but even without this it's usually very easy to clean the input using a combination of regular expressions and lookup tables.
Say you know it's an integer, use int.Parse or match it against the regex "^\d+$".
If it's a string and the choices are limited, make a dictionary and run the string through it. If you don't get a match, change the string to a default.
If it's a user-specified string, match it against a strict regex like "^\w+$".
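A minimal C# sketch of those three checks; the page names and defaults are just illustrative assumptions:

using System.Collections.Generic;
using System.Text.RegularExpressions;

static class QueryStringCleaner
{
    // "i" should always be a positive integer; fall back to a default if not.
    public static int CleanId(string raw)
    {
        return int.TryParse(raw, out int id) && id > 0 ? id : 1;
    }

    // "p" is limited to a known set of pages; anything else becomes the default.
    static readonly HashSet<string> KnownPages = new HashSet<string> { "custform", "home" };

    public static string CleanPage(string raw)
    {
        return raw != null && KnownPages.Contains(raw) ? raw : "home";
    }

    // A user-specified string that should only contain word characters.
    public static bool IsSafeWord(string raw)
    {
        return raw != null && Regex.IsMatch(raw, @"^\w+$");
    }
}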
As with any user input, it is extremely important to check that it is what you expect it to be. So yes!
Yes, yes, and thrice yes.
Many web frameworks will do this for you of course, e.g., Struts 2.
One important reason is to check for SQL injection.
So yes, always sanitize user input.
It's not just what the others are saying. Imagine a query string variable called nc, which takes the values 10, 50 and 100 when the user selects 10, 50 and 100 results per page respectively. Now imagine someone changing it to 50000. If you are only checking that it's an integer, you will be showing 50000 results per page, affecting your pageviews, server load, script times and so on. Plus, that could end up pulling your entire database. When you have such rules (10, 50 or 100 results per page), you should additionally check that the value of nc is 10, 50 or 100 only, and if not, set it to a default. This can simply be a min(nc, 100), so it will still work if nc is changed to 25, 75 and so on, but will default to 100 if it sees anything above 100.
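A rough C# sketch of that extra check, assuming the 10/50/100 page sizes from the example:

using System;

static class PagingGuard
{
    static readonly int[] AllowedPageSizes = { 10, 50, 100 };

    public static int CleanPageSize(string raw)
    {
        // Non-numeric or nonsensical input falls back to the smallest default.
        if (!int.TryParse(raw, out int nc) || nc <= 0)
            return 10;
        // Exactly 10, 50 or 100 is accepted as-is.
        if (Array.IndexOf(AllowedPageSizes, nc) >= 0)
            return nc;
        // Anything else is capped: 25 or 75 pass through, 50000 becomes 100.
        return Math.Min(nc, 100);
    }
}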
I want to stress how important this is. I know the first answer discussed SQL injection and XSS vulnerabilities. The latest craze in SQL injection is passing a binary-encoded SQL statement in the query string, which, if it finds a SQL injection hole, will add a <script src="http://reallybadsite.com" /> tag to every text field in your database.
As web developers we have to validate all input, and clean all the output.
Remember a hacker isn't going to use IE to compromise your site, so you can't rely on any validation done in the browser.
Yes, check them as thoroughly as you can. In PHP I always check the types (is_int($i), is_string($p)).
I often encounter advice for protecting a web application against a number of vulnerabilities, like SQL injection and other types of injection, by doing input validation.
It's sometimes even said to be the single most important technique.
Personally, I feel that input validation for security reasons is never necessary and is better replaced with:
if possible, not mixing user input with a programming language at all (e.g. using parameterized SQL statements instead of concatenating input in the query strings)
escaping user input before mixing it with a programming or markup language (e.g. html escaping, javascript escaping, ...)
Of course, for a good UX it's best to catch input that would generate errors on the backend early in the GUI, but that's another matter.
Am I missing something or is the only purpose to try to make up for mistakes against the above two rules?
Yes you are generally correct.
A piece of data is only dangerous when "used". And it is only dangerous if it has special meaning in the context it is used.
For example, <script> is only dangerous if used in output to an HTML page.
Robert'); DROP TABLE Students;-- is only dangerous when used in a database query.
Generally, you want to make this data "safe" as late as possible. Such as HTML encoding when output as HTML to an HTML page, and parameterised when inserting into a database. The big advantage of this is that when the data is later retrieved from these locations, it will be returned in its original, unsanitized format.
So if you have the value A&B O'Leary in an input field, it would be encoded like so:
<input type="hidden" value="A&amp;B O&#39;Leary" />
and if this is submitted to your application, your programming framework will automatically decode it for you back to A&B O'Leary. Same with your DB:
string name = "A&B O'Leary";
string sql = "INSERT INTO Customers (Name) VALUES (@Name)";
SqlCommand command = new SqlCommand(sql);
command.Parameters.AddWithValue("@Name", name);
Simples.
Additionally if you then need to give the user any output in plain text, you should retrieve it from your DB and spit it out. Or in JavaScript - you just JavaScript entity encode (although best avoided for complexity reasons - I find it easier to secure if I only output to HTML then read the values from the DOM).
If you'd HTML encoded it early, then to output to JavaScript/JSON you'd first have to convert it back and then hex entity encode it. It will get messy, some developers will forget they have to decode first, and you will have &amp;s everywhere.
You can use validation as an additional defence, but it should not be the first port of call. For example, if you are validating a UK postcode you would want to whitelist the alphanumeric characters in upper and lower cases. Any other characters would be rejected or removed by your application. This can reduce the chances of SQLi or XSS occurring on your application, but this method falls down where you need inputs to include characters that have special meaning to your output context (" '<> etc). For example, on Stack Overflow if they did not allow characters such as these you would be preventing questions and answers from including code snippets which would pretty much make the site useless.
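As a rough illustration of that postcode example, here is a character-whitelist sketch in C# (it is not a full postcode format validator, and the space in the character class is an assumption, since UK postcodes usually contain one):

using System.Text.RegularExpressions;

static class PostcodeFilter
{
    // Accept upper/lower-case letters, digits and spaces; reject everything else.
    public static bool HasOnlyAllowedCharacters(string input)
    {
        return !string.IsNullOrEmpty(input) && Regex.IsMatch(input, @"^[A-Za-z0-9 ]+$");
    }
}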
Not all SQL statements are parameterizable. For example, if you need to use dynamic identifiers (as opposed to literals). Even whitelisting can be hard, sometimes it needs to be dynamic.
Escaping XSS on output is a good idea. Until you forget to escape it on your admin dashboard too and they steal all your admin's cookies. Don't let XSS in your database.
I'm wondering whether the combination of trim(), strip_tags() and addslashes() is enough to filter values of variables from $_GET and $_POST.
That depends what kind of validation you are wanting to perform.
Here are some basic examples:
If the data is going to be used in MySQL queries, make sure to use mysql_real_escape_string() on the data instead of addslashes().
If it contains file paths, be sure to remove the "../" parts and block access to sensitive filenames.
If you are going to display the data on a web page, make sure to use htmlspecialchars() on it.
But the most important validation is only accepting the values you are expecting, in other words: only allow numeric values when you are expecting numbers, etc.
Short answer: no.
Long answer: it depends.
Basically you can't say that a certain amount of filtering is or isn't sufficient without considering what you want to do with it. For example, the above will allow "javascript:dostuff();" through, which might be OK, or it might not be if you happen to use one of those GET or POST values in the href attribute of a link.
Likewise you might have a rich text area that users can edit, so stripping tags out of that doesn't exactly make sense.
I guess what I'm trying to say is that there is no simple set of steps to sanitizing your data such that you can cross it off and say "done". You always have to consider what that data is doing.
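To illustrate the href example above, here is a minimal C# sketch of only accepting plain http/https links; the scheme whitelist is an assumption for this example:

using System;

static class LinkGuard
{
    // "javascript:dostuff();" fails this check; "https://example.com/page" passes.
    public static bool IsSafeHref(string raw)
    {
        return Uri.TryCreate(raw, UriKind.Absolute, out Uri uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}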
It highly depends on what you are going to use it for.
If you are going to display things as HTML, make absolutely sure you are properly specifying the encoding (e.g.: UTF-8). As long as you strip all tags, you should be fine.
For use in SQL queries, addslashes is not enough! If you use the mysqli library, for example, you want to look at mysqli::real_escape_string. For other DB libraries, use the designated escape function!
If you are going to use the string in javascript, addslashes will not be enough.
If you are paranoid about browser bugs, check out the OWASP Reform library
If you use the data in another context than HTML, other escaping techniques apply.
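The same idea in .NET terms, as a rough sketch (PHP has equivalents such as htmlspecialchars, json_encode and mysqli::real_escape_string); the sample value is made up:

using System;
using System.Web;   // HttpUtility lives in the System.Web assembly

class ContextEncodingDemo
{
    static void Main()
    {
        string userInput = "<b>O'Leary & Sons</b>";   // hypothetical user-supplied value

        Console.WriteLine(HttpUtility.HtmlEncode(userInput));             // writing into HTML
        Console.WriteLine(HttpUtility.JavaScriptStringEncode(userInput)); // writing into a JS string literal
        Console.WriteLine(HttpUtility.UrlEncode(userInput));              // writing into a URL/query string
        // For SQL there is no "encode" step at all: use parameterized queries instead.
    }
}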
I have received requirements that ask to normalize text box content when the user changes the focus to another control on the same data input form. Example normalizations:
whitespace at the start and end of the input is trimmed
If the text box was made empty and this is not valid, replace the content of the text box with the default value
I have a feeling that this is not in line with good GUI design. I have read the Windows UX Guidelines for text boxes but I did not immediately find any relevant rules.
Is normalizing text box content in this way acceptable?
I have definitely seen this before (examples elude me right now) but I personally don't like it when the UI changes my input.
If the UI is smart enough to know how to change my input, then it should accept it as is and change the value when it needs to process it.
When the input changes auto-magically you are now forcing the user to stop and ask themselves why it changed and if they did something wrong or if the application has an error. Don't make the user think!
Generally, you should accept user input exactly as they entered it. Chances are users did it that way for a good reason. For example, imagine a user entering a foreign address, and then your app screws it up trying to format it like a domestic address. At the very least, users entered the input the way they're used to it being, so changing it can make it hard for them to cross-check it.
However, there are several exceptions:
Add defaults to incomplete input. Adding input the user left off (e.g., years to dates, units to dimensions) provides good feedback on how the app is interpreting the input that would otherwise be ambiguous. This also encourages the user to use defaults, making their input more efficient.
Resolve other ambiguities. Change to an unambiguous format if the user’s format is open to interpretation. For example, if you have international users, you may want to change “9-8-09” to “Sep 8 2009” (or “9 Aug 2009”) to provide feedback on what your app considers the month and day to be.
Add delimiters when none provided. Automagically adding standard or even arbitrary delimiters to long alphanumeric strings (e.g., phone numbers, credit card numbers, serial numbers) provides an input display that the users can crosscheck more easily. Sometimes users may enter a string without delimiters in order to go faster or because they are the victim of web abuse by sites that refuse to accept even standard delimiters.
Spelling, grammar, and capitalization correction. Users often appreciate this, but only if there’s also a means to override it. Some users like to use "i" as the first person pronoun.
If the field is used by more than one user, then you probably should automatically format the value in some standard way that accommodates the majority of your users, but that should be done when the value is stored on the backend, not when focus leaves the field. For example, if a user enters a time of 15:30 it should remain as 15:30 as long as the user views the page. However, the next time a user (any user) retrieves the data, it should appear as 3:30pm (if that’s how most of your users are used to seeing time).
Such backend formatting applies to trimming whitespace so that all users can search, find, and sort on the field consistently. It’s probably not a good idea to replace a blank value (or any invalid value) with the default because users are unlikely to anticipate getting that value. An exception would perhaps be changing blank to 0 for numeric fields in situations where obviously blank == none == zero, but again this probably should be done when storing in the backend, not in the field itself. If blank is ambiguous, (e.g., may mean 0 or may mean "I don't know") then the second bullet above applies, and you may want to autocorrect in the field when focus is lost.
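A small C# sketch of doing that normalization when the record is stored rather than in the field itself; the blank-means-zero rule is only an example:

static class QuantityField
{
    // Called when the value is saved to the backend, not when the text box loses focus.
    public static int NormalizeForStorage(string rawInput)
    {
        string trimmed = (rawInput ?? string.Empty).Trim();
        if (trimmed.Length == 0)
            return 0;   // blank == none == zero for this particular field
        return int.TryParse(trimmed, out int value) ? value : 0;
    }
}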
Of course, if your users vary in how they need to have a data type formatted, then you can have different variants of the app that display the data type in different ways for different user groups, or you can make the format of the data type a user preference, but that’s really another issue.
If the user wants it, and the stakeholders ask for it, then it is perfectly safe.
Trimming is very common, and the replacement is common when you are talking about filling a text box with numbers (a 0 instead of a blank).
It's a fairly standard feature, especially the whitespace trimming. The default value replacement raises a larger flag just because it is less common.
I'm pretty sure that I've seen versions of Microsoft Office that do this - putting "pt." after a value in points, for instance. Microsoft's endorsement should be a good sign.
We have quite a few of these kind of requirement. The reason given for forcing a default value rather than a blank space is that it looks better in reports or if the client wants to see the live system. A blank looks a bit like "couldn't be bothered to enter anything". For a similar reason, we often upper-case the text for consistency as the users never use consistent formatting.
I made a report with about 30 different rectangles and textboxes that have different visibility expressions depending on the parameters. (It's a student invoice, and many different messages have to appear depending on the semester.) When I made all the expressions, I coded the parameters in all upper case.

Now I have a problem when users enter lowercase letters: the SQL all works fine since it is not case sensitive, but the different rectangles and textboxes don't show. Is there a way in the report code to first capitalize all the parameters before running the SQL? Or do I actually have to go back to every visibility expression and add separate iif's for upper and lower case? (That seems incredibly silly to have to do.) I can't change my parameters to numbers because I have been given strict requirements for input. Thanks.
I do not know if this is the most elegant solution, but you could accomplish this by following this procedure for every parameter on the Report Parameters page:
1) Rename the parameter, leaving its prompt as that of the old parameter.
2) Add a new parameter with the same name as the old parameter.
3) Mark this new parameter as Hidden.
4) Make sure that the new parameter's available values are marked as non-queried (the available values will never actually be used).
5) Mark the Default Values as Non-queried, using the following syntax:
=ucase(Parameters!OldParameterName.Value)
Can't you just UCASE the params? (Do it in the XML view; it will be quicker, and you might even be able to do a regex find/replace.)
What are the best ways (or at least most common ways) in ASP (VBScript) for input handling? My main concerns are HTML/JavaScript injections & SQL injections. Is there some equivalent to PHP's htmlspecialchars or addslashes, et cetera? Or do I have to do it manually with something like string replace functions?
The bottom line is this:
Always HTML-encode user input before you write it to your page. Server.HTMLEncode() does that for you.
Always use parameterized queries to interface with a database. The ADODB.Command and ADODB.Parameter objects are the right choice here.
Always use the URLScan utility and IIS Lockdown on the IIS server that renders the page, unless it is IIS 6 or later, which does not require these tools anymore.
If you stick to points 1 and 2 slavishly, I can't think of much that can go wrong.
Most vulnerabilities come from not properly encoding user input or building SQL strings from it. If you for some reason come to the point where HTML-encoding user input stands in your way, you have found a design flaw in your application.
I would add one other point to Tomalak's list.
Avoid using concatenation of field values in SQL code. That is, in some cases a stored procedure may build some SQL in a string to subsequently execute. This is fine unless a textual field value is used as part of its construction.
A command parameter can protect SQL code designed to input a value from being hijacked into executing unwanted SQL, but it allows such unwanted SQL to become data in the database. This is a first-level vulnerability. A second-level injection vulnerability exists if the field's value is then used in SQL string concatenation inside a stored procedure.
Another consideration is that this is just minimal protection. All it's doing is rendering attack attempts harmless. However, in many cases it may be better to add to this a system which prevents such data entry altogether and/or alerts admins to a potential injection attack.
This is where input validation becomes important. I don't know of any tools that do this for you, but a few simple regular expressions might help. For example, "<\w+" would detect the attempt to include an HTML tag in the field.
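A minimal sketch of that kind of detection check, shown in C# for consistency with the earlier examples (classic ASP's VBScript has a RegExp object with a Test method that works the same way); how you alert admins or log the attempt is left out:

using System.Text.RegularExpressions;

static class InjectionDetector
{
    // Flags input that looks like it contains an HTML tag so it can be rejected and logged.
    public static bool LooksLikeHtmlTag(string input)
    {
        return !string.IsNullOrEmpty(input) && Regex.IsMatch(input, @"<\w+");
    }
}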