Should I allow underscores in first and last name? - jquery-validate

We have a form that has fields for first and last name. I was asked to allow underscores. I don't know of any sql injection that uses underscores, but I also don't know of anyone with an underscore in their name. Is there a good reason to allow or not allow underscores in names?
EDIT: I'm using parameters and server side validation. This is for client side validation via the jQuery validation plugin.
EDIT 2: I didn't mean for this to become a discussion on whether or not I should do any validation...I just wanted to know know if there was any compelling reason to accept underscores, like I should accept Irish people or hyphens. Based on that, I'm accepting Oren's answer.

You should be as liberal as possible in what you allow as a name. There is no good reason to disallow an underscore, so why do it? There are many horror stories of people who try to utilize software that disallows their actual name. Have a look at Falsehoods Programmers Believe About Names for assumptions you should not make.

DO NOT PREVENT SQL INJECTION USING WHITELISTS!
Have you come across an O'Neill yet?
Instead, use parameters.
I will admit, though, that whitelists will work better than blacklists
Re: EDIT:
You should not do such validation at all.
If your server-side code can handle it, there's nothing wrong with the name --'!#--_.
If your server-side code cannot handle it, it should.

You're doing your validation wrong. When preventing sql injection, just use placeholders or your database library's escape function to escape the data. What characters you use in the name doesn't matter then.

You'll need to allow apostrophes and hyphens (O'Reilly, Double-Barrel). Never heard of an underscore in a name though.

Ideally, you should be able to allow any characters and not have a problem with SQL injection because you are using parameterized queries etc.
Do you disallow '? How do you think Mr O'Reilly likes that?

If you prevent underscores with the assumption that we are not aware of names with underscores, would you do the same for the other dozens (hundreds) of other "special characters"?
Unless there is some reason to block underscores, I would leave it up to the user to be able to enter their name as they want.

Related

Validating FirstName in a web application

I do not want to be too strict as there may be thousands of possible characters in a possible first name
Normal english alphabets, accented letters, non english letters, numbers(??), common punctuation synbols
e.g.
D'souza
D'Anza
M.D. Shah (dots and space)
Al-Rashid
Jatin "Tom" Shah
However, I do not want to except HTML tags, semicolons etc
Is there a list of such characters which is absolutely bad from a web application perspective
I can then use RegEx to blacklist these characters
Background on my application
It is a Java Servlet-JSP based web app.
Tomcat on Linux with MySQL (and sometimes MongoDB) as a backend
What I have tried so far
String regex = "[^<>~##$%;]*";
if(!fname.matches(regex))
throw new InputValidationException("Invalid FirstName")
My question is more on the design than coding ... I am looking for a exhaustive (well to a good degree of exhaustiveness) list of characters that I should blacklist
A better approach is to accept anything anyone wants to enter and then escape any problematic characters in the context where they might cause a problem.
For instance, there's no reason to prohibit people from using <i> in their names (although it might be highly unlikely that it's a legit name), and it only poses a potential problem (XSS) when you are generating HTML for your users. Similarly, disallowing quotes, semi-colons, etc. only make sense in other scenarios (SQL queries, etc.). If the rules are different in different places and you want to sanitize input, then you need all the rules in the same place (what about whitespace? Are you gong to create filenames including the user's first name? If so, maybe you'll have to add that to the blacklist).
Assume that you are going to get it wrong in at least one case: maybe there is something you haven't considered for your first implementation, so you go back and add the new item(s) to your blacklist. You still have users who have already registered with tainted data. So, you can either run through your entire database sanitizing the data (which could take a very very long time), or you can just do what you really have to do anyway: sanitize data as it is being presented for the current medium. That way, you only have to manage the sanitization at the relevant points (no need to protect HTML output from SQL injection attacks) and it will work for all your data, not just data you collect after you implement your blacklist.

Why should tags be space-delimited?

I'm working on a Web application that uses tags like Stackoverflow tags. I've noticed that a lot of sites that use tags make them space-delimited, which disallows a tags like...
"favorite recipes"
Instead they enforce this...
"favorite-recipes" | "favorite_recipes" | "FavoriteRecipes"
If the tags were comma-delimited, an item could have a set of tags like...
"cats, birds, favorite recipes, horses"
I have to decide on the policy for my app.
I guess I like the idea of space-delimited, but if my users aren't programmers they might be more comfortable with the familiar idea of commas denoting a list.
Why are comma-delimited tags unusual? Is there a major downside to them?
I think space delimited tags feel more like typing and you aren't syntactically bound to commas or pipes etc...
I also think that not allowing spaces in tags is kind of nice because you don't have to worry about turning them into URLs they are already ready to go.
Online tools like Delicious use space delimiting for tags, but other tools like Wordpress allow you to have spaces in tags so they require commas. If you do things this way, you will have to create similar tag "slugs" to make sure they work in a URL cleanly (i.e. my tag would be my-tag)
Remember, allowing too much freedom in your tag creation can result in some pretty crazy tags like "Things I like to do on a Saturday afternoon..." etc.
I think the idea is that a tag should be short and to the point.
If you go to type in "favorite recipes", when it doesn't work, you think to yourself "Oh, I should redo this." So you make "favorites" "recipes" instead.
If it was something you really needed, like "pork roast" then it makes sense to make it "pork-roast" but only after you've thought about really joining those works. Perhaps you shouldn't though - perhaps it should be "pork" "roast" so it shows up under searches for pork and searches for roast.
tl;dr It's for user experience and searching so they don't enter something that can't be easily searched for.
Yes, I also think that space-separated tags reduce complexity. And you can enter them faster, because you have the big space key to separate them. Maybe the brain is also less loaded, because it doesn't need to think about where to put a comma delimiter and where not.
I built an app with tags once, and used commas. The only down side to commas is that you have to do more thorough checking of empty tags. For instance in the example:
"George Bush, Bill Clinton, Barack Obama, "
If someone posts the tags with a comma at the end and there is a space it generally will get added to the database.
This is because if you simply check to delete once space you would turn Bill Clinton into BillClinton.
However you can make sure that the tag is at least a certain amount of characters to ensure there is not empty tags. This will not ensure that there is not three or four spaces in a row though.
Anther thing to note is that humans generally put spaces before words and after commas.
So in the example above there is a space after each comma and before each president. This space will be included in the database making the tag:
" Bill Clinton"
instead of
"Bill Clinton"
as the user probably attempted.
Once again you can eliminate both beginning and trailing spaces, but there is more server side code to implement in this case.
If you just use spaces then you can eliminate any unwanted characters like commas etc, and put the tags into an array using space characters to separate them.
I actually prefer tags that are comma or semi-colon delimited for two basic reasons:
It's more natural (user-centric not techie-centric)
It reduces redundancy, as the example noted "favorite-recipes"
"favorite_recipes" "FavoriteRecipes"
And I'm afraid some of the other answers sort of make it sound as if web developers are (how shall I say this) less willing to do the work than, for instance, database developers - who've been addressing some of these same issues for years. (I've coded for web and client-server databases.)

How to do SQL injection on Oracle

I'm doing an audit of a system, which the developers insist is SQL injection proof. This they achieve by stripping out the single-quotes in the login form - but the code behind is not parameterized; it's still using literal SQL like so:
username = username.Replace("'", "");
var sql = "select * from user where username = '" + username + "'";
Is this really secure? Is there another way of inserting a single quote, perhaps by using an escape character? The DB in use is Oracle 10g.
Maybe you can also fail them because not using bind variables will have a very negative impact on performance.
A few tips:
1- It is not necessarily the ' character that can be used as a quote. Try this:
select q'#Oracle's quote operator#' from dual;
2- Another tip from "Innocent Code" book says: Don't massage invalid input to make it valid (by escaping or removing). Read the relevant section of the book for some very interesting examples. Summary of rules are here.
Have a look at the testing guide here: http://www.owasp.org/index.php/Main_Page That should give you more devious test scenarios, perhaps enough to prompt a reassessment of the SQL coding strategy :-)
No, it is not secure. SQL injection doesn't require a single-quote character to succeed. You can use AND, OR, JOIN, etc to make it happen. For example, suppose a web application having a URL like this: http://www.example.com/news.php?id=100.
You can do many things if the ID parameter is not properly validated. For example, if its type is not checked, you could simply use this: ?id=100 AND INSERT INTO NEWS (id, ...) VALUES (...). The same is valid for JOIN, etc. I won't teach how to explore it because not all readers have good intentions like you appear to have. So, for those planning to use a simple REPLACE, be aware that this WILL NOT prevent an attack.
So, no one can have a name like O'Brian in their system?
The single quote check won't help if the parameter is numeric - then 1; DROP TABLE user;-- would cause some trouble =)
I wonder how they handle dates...
If the mechanism for executing queries got smart like PHP, and limited queries to only ever run one query, then there shouldn't be an issue with injection attacks...
What is the client language ? That is, we'd have to be sure exactly what datatype of username is and what the Replace method does in regard to that datatype. Also how the actual concatenation works for that datatype. There may be some character set translation that would translate some quote-like character in UTF-8 to a "regular" quote.
For the very simple example you show it should just work, but the performance will be awful (as per Thilo's comment). You'd need to look at the options for cursor_sharing
For this SQL
select * from user where username = '[blah]'
As long as [blah] didn't include a single quote, it should be interpreted as single CHAR value. If the string was more than 4000 bytes, it would raise an error and I'd be interested to see how that was handled. Similarly an empty string or one consisting solely of single quotes. Control characters (end-of-file, for example) might also give it some issues but that might depend on whether they can be entered at the front-end.
For a username, it would be legitimate to limit the characterset to alphanumerics, and possibly a limited set of punctuation (dot and underscore perhaps). So if you did take a character filtering approach, I'd prefer to see a whitelist of acceptable characters rather than blacklisting single quotes, control characters etc.
In summary, as a general approach it is poor security, and negatively impacts performance. In particular cases, it might (probably?) won't expose any vulnerabilities. But you'd want to do a LOT of testing to be sure it doesn't.

Are unescaped user names incompatible with BNF?

I've got a (proprietary) output from a software that I need to parse. Sadly, there are unescaped user names and I'm scratching my hairs trying to know if I can, or not, describe the files I need to parse using a BNF (or EBNF or ABNF).
The problem, oversimplified (it's really just an example), may look like this:
(data) ::= <username>
<username> ::= (other type of data)
And in some case, instead of appearing at the left or at the right, the username can also appear in the middle of a line.
The problem is that the username is unescaped and there are not enough restrictions on user names (they're printable ASCII, max 20 chars and they can't contain line break). So "=" would be a perfectly valid username, for example. And so would "= 1 = john = 2" (because user, at sign-on, where allowed to choose any user name they wanted and these appear unescaped in the output I've got).
I'm asking because my parser chocked on some very creative usernames (once again, not in my control, they're "weird" and I need to deal with it) and I cannot find an easy way to deal with this. Also note that I do not know in advance the user names (for example I don't have access to a database that would contain all the user names that the users created).
So are unrestricted and unescaped user names incompatibles with BNF?
P.S: be cool with me if I made mistakes, it's my first post on stackoverflow :)
BNF doesn't "care" for user names per-se. It works on the token level. If you define a username token, you can build describe a grammar using BNF based on it.
Your problem should be solved on the lexer level. The lexer should be smart enough to recognize user names, even when they're not escaped, and pass username tokens to the parser.
In theory you could describe all kinds of user names with a grammar, but this heavily depends on the other things in your language. Is = a valid token on its own right? How can you tell a username having = in it apart if it is? I think you'll have to describe the rest of the rules and valid tokens in your language to get a fuller answer here.
It might be possible to work by recognising things that are not usernames and then declaring everything else a username, even if this means parsing from right to left instead of left to right or doing something equally eccentric.
It may be worth looking to see if your input is actually ambiguous: can you find two different situations that lead to identical output being generated? If so, you need to go back and get requirements for which of them to favour, or what sort of error to produce, or whatever. If not, the reason why not might help you work out what your parser or lexer or whatever needs to do.

filtering user input in php

Am wondering if the combination of trim(), strip_tags() and addslashes() is enough to filter values of variables from $_GET and $_POST
That depends what kind of validation you are wanting to perform.
Here are some basic examples:
If the data is going to be used in MySQL queries make sure to use mysql_real_escape_query() on the data instead of addslashes().
If it contains file paths be sure to remove the "../" parts and block access to sensitive filename.
If you are going to display the data on a web page, make sure to use htmlspecialchars() on it.
But the most important validation is only accepting the values you are expecting, in other words: only allow numeric values when you are expecting numbers, etc.
Short answer: no.
Long answer: it depends.
Basically you can't say that a certain amount of filtering is or isn't sufficient without considering what you want to do with it. For example, the above will allow through "javascript:dostuff();", which might be OK or it might not if you happen to use one of those GET or POST values in the href attribute of a link.
Likewise you might have a rich text area where users can edit so stripping tags out of that doesn't exactly make sense.
I guess what I'm trying to say is that there is simple set of steps to sanitizing your data such that you can cross it off and say "done". You always have to consider what that data is doing.
It highly depends where you are going to use it for.
If you are going to display things as HTML, make absolutely sure you are properly specifying the encoding (e.g.: UTF-8). As long as you strip all tags, you should be fine.
For use in SQL queries, addslashes is not enough! If you use the mysqli library for example, you want to look at mysql::real_escape_string. For other DB libraries, use the designated escape function!
If you are going to use the string in javascript, addslashes will not be enough.
If you are paranoid about browser bugs, check out the OWASP Reform library
If you use the data in another context than HTML, other escaping techniques apply.

Resources