Blacklisting vs whitelisting in form input filtering and validation

Which is the preferred approach to sanitizing input coming from the user?
Thank you!

I think whitelisting is the desired approach; however, I have never come across a real whitelist-based HTML form validation. For example, here is a symfony 1.x form with validation, taken from the documentation:
class ContactForm extends sfForm
{
  protected static $subjects = array('Subject A', 'Subject B', 'Subject C');
  public function configure()
  {
    $this->setWidgets(array(
      'name'    => new sfWidgetFormInput(),
      'email'   => new sfWidgetFormInput(),
      'subject' => new sfWidgetFormSelect(array('choices' => self::$subjects)),
      'message' => new sfWidgetFormTextarea(),
    ));
    $this->widgetSchema->setNameFormat('contact[%s]');
    $this->setValidators(array(
      'name'    => new sfValidatorString(array('required' => false)),
      'email'   => new sfValidatorEmail(),
      'subject' => new sfValidatorChoice(array('choices' => array_keys(self::$subjects))),
      'message' => new sfValidatorString(array('min_length' => 4)),
    ));
  }
}
What you cannot see is that it accepts new inputs without validation settings, and it does not check for the presence of inputs that are not registered in the form. So this is blacklist input validation. With a whitelist, you would define an input validator first, and only after that bind an input field to that validator. With a blacklist approach like this, it is easy to forget to add a validator to an input, and the form works perfectly without one, so you would not notice the vulnerability until it is too late...
A hypothetical whitelist approach would look something like this:
class ContactController {
    /**
     * #input("name", type = "string", singleLine = true, required = false)
     * #input("email", type = "email")
     * #input("subject", type = "string", alternatives = ['Subject A', 'Subject B', 'Subject C'])
     * #input("message", type = "string", range = [4,])
     */
    public function post(Inputs $inputs){
        //automatically validates inputs
        //throws error when an input is not on the list
        //throws error when an input has invalid value
    }
}
/**
 * #controller(ContactController)
 * #method(post)
 */
class ContactForm extends sfFormX {
    public function configure(InputsMeta $inputs)
    {
        //automatically binds the form to the input list of the #controller.#method
        //throws error when the #controller.#method.#input is not defined for a widget
        $this->addWidgets(
            new sfWidgetFormInput($inputs->name),
            new sfWidgetFormInput($inputs->email),
            new sfWidgetFormSelect($inputs->subject),
            new sfWidgetFormTextarea($inputs->message)
        );
        $this->widgetSchema->setNameFormat('contact[%s]');
    }
}
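The whitelist idea above (declare each input first, reject anything undeclared, and refuse to bind a widget that has no declared input) can be sketched without any framework. This is a minimal Python sketch, not symfony code: CONTACT_INPUTS, validate() and bind_widgets() are hypothetical names, and the email check is omitted for brevity.

```python
from typing import Callable, Dict, List, Mapping, Optional

# A validator receives the submitted value (None if the field was not sent)
# and returns an error message, or None when the value is acceptable.
Validator = Callable[[Optional[str]], Optional[str]]

def required_string(min_length: int = 1) -> Validator:
    def check(value: Optional[str]) -> Optional[str]:
        if value is None:
            return "is required"
        if len(value) < min_length:
            return f"must be at least {min_length} characters"
        return None
    return check

def optional_single_line(value: Optional[str]) -> Optional[str]:
    if value is not None and ("\n" in value or "\r" in value):
        return "must be a single line"
    return None

def one_of(*choices: str) -> Validator:
    allowed = frozenset(choices)
    def check(value: Optional[str]) -> Optional[str]:
        return None if value in allowed else "is not an allowed choice"
    return check

# The whitelist: every accepted input must be declared here first.
CONTACT_INPUTS: Dict[str, Validator] = {
    "name": optional_single_line,
    "subject": one_of("Subject A", "Subject B", "Subject C"),
    "message": required_string(min_length=4),
}

def validate(schema: Mapping[str, Validator], submitted: Mapping[str, str]) -> Dict[str, str]:
    errors: Dict[str, str] = {}
    # Anything not on the list is an error instead of being silently accepted.
    for name in submitted:
        if name not in schema:
            errors[name] = "is not an expected input"
    # Every declared validator runs, even for fields that were not sent.
    for name, check in schema.items():
        problem = check(submitted.get(name))
        if problem is not None:
            errors[name] = problem
    return errors

def bind_widgets(schema: Mapping[str, Validator], widget_names: List[str]) -> None:
    # Like the hypothetical sfFormX: a widget without a declared input is an error.
    missing = [name for name in widget_names if name not in schema]
    if missing:
        raise ValueError(f"no input declared for widgets: {missing}")
```

Because unknown inputs and undeclared widgets both fail loudly, forgetting a validator shows up immediately instead of silently working.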

The best approach is to use either stored procedures or parameterized queries. Whitelisting is an additional technique that is fine for stopping injection attempts before they reach the database server, but it should not be used as your primary defense. Blacklisting is usually a bad idea because it's usually impossible to filter out all malicious inputs.
BTW, this answer assumes that by sanitizing you mean preventing SQL injection.
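A minimal sketch of a parameterized query, using Python's built-in sqlite3 module as a stand-in (SQLite has no stored procedures, so only the parameterized-query option is shown; the users table and find_user() are hypothetical):

```python
import sqlite3

def find_user(conn: sqlite3.Connection, username: str):
    # The "?" placeholder sends username to the database as data, never as SQL
    # text, so quotes or keywords inside it cannot change the query's structure.
    cur = conn.execute("SELECT id, username FROM users WHERE username = ?", (username,))
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))        # [(1, 'alice')]
print(find_user(conn, "' OR '1'='1"))  # [] : matched literally, not as SQL
```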

Whitelisting (WL) is best practice over blacklisting (BL) whenever it is practicable.
The reason is simple: you can't be reasonably safe by enumerating what is not permitted; an attacker could always find a way you did not think of. If you can state precisely what is allowed, it is simpler and much, much safer!
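To illustrate why enumerating bad input is fragile, this hypothetical Python sketch compares a naive blacklist that strips script tags with a whitelist that only accepts usernames made of letters, digits and underscores (the pattern is an assumption chosen for the example):

```python
import re

def blacklist_filter(text: str) -> str:
    # Naive blacklist: remove the exact strings "<script>" and "</script>".
    return text.replace("<script>", "").replace("</script>", "")

# Hypothetical whitelist: 3 to 20 ASCII letters, digits or underscores.
USERNAME = re.compile(r"[A-Za-z0-9_]{3,20}")

def whitelist_ok(text: str) -> bool:
    # The whole value must consist of allowed characters only.
    return USERNAME.fullmatch(text) is not None

attacks = [
    "<SCRIPT>alert(1)</SCRIPT>",          # different case
    "<scr<script>ipt>alert(1)</script>",  # removing the inner tag re-creates one
    "<img src=x onerror=alert(1)>",       # no script tag at all
]
for attack in attacks:
    print(repr(blacklist_filter(attack)), whitelist_ok(attack))
```

Each attack gets past the blacklist in a different way (case, nesting, a different tag), while the whitelist rejects all three without having to know about any of them.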

Let me answer your question with a few more questions and answers.
Blacklist vs. whitelist restriction
i. Blacklist-based XSS and SQL injection handling verifies input against a list of negative inputs. Basically, you compile a list of all the negative or bad conditions and verify that the input received is not one of them.
ii. Whitelist-based XSS and SQL injection handling verifies input against a list of possible correct inputs. To do this, you compile a list of all the good/positive input values or conditions and verify that the input received matches one of them.
Which one is better to have?
i. An attacker will use any possible means to gain access to your application. This includes trying all sorts of negative or bad conditions, various encoding methods, and appending malicious input data to valid data. Do you think you can think of every possible bad permutation that could occur?
ii. A whitelist is the best way to validate input. You will know exactly what is desired and that no bad types are accepted. Typically, the best way to create a whitelist is with regular expressions. Using regular expressions is a great way to abstract the whitelisting, instead of manually listing every possible correct value.
Build a good regular expression. Just because you are using a regular expression does not mean bad input will not be accepted. Make sure you test your regular expression and confirm that it does not accept invalid input.
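As a sketch of the testing advice above, here is a hypothetical 12-digit rule in Python. An unanchored pattern checked with search() accepts any string that merely contains 12 digits, while fullmatch() requires the whole value to match:

```python
import re

TWELVE_DIGITS = re.compile(r"\d{12}")  # hypothetical rule: exactly 12 digits

def loose_check(value: str) -> bool:
    # Bug: search() is satisfied by 12 digits anywhere in the string.
    return TWELVE_DIGITS.search(value) is not None

def strict_check(value: str) -> bool:
    # fullmatch() requires the entire value to match, with nothing left over.
    return TWELVE_DIGITS.fullmatch(value) is not None

# Test rejected inputs as well as accepted ones.
cases = {
    "123456789012": True,
    "12345678901": False,           # too short
    "1234567890123": False,         # too long
    "123456789012<script>": False,  # valid prefix, malicious suffix
    "123456789012\n": False,        # trailing newline
}
for value, expected in cases.items():
    assert strict_check(value) == expected, value
    print(f"{value!r}: loose={loose_check(value)} strict={strict_check(value)}")
```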

Personally, I gauge the number of allowed versus disallowed characters and go from there. If there are more allowed characters than disallowed ones, I blacklist; otherwise, I whitelist. I don't believe there is any 'standard' that says you should do it one way or the other.
BTW, this answer assumes you want to limit input in form fields such as phone numbers or names :) #posterBelow

As a general rule, it's best to use whitelist validation, since it's easier to accept only the characters you know belong in a field. For example, if you have a field where users enter their phone number, you could use a regex to keep only the digits, drop everything else, and store just the numbers. Note that you should then validate the resulting number as well. Blacklist validation is weaker because a skilled attacker could evade your validation functions or send values your function did not expect. From OWASP, "Sanitize with Blacklist":
Eliminate or translate characters (such as to HTML entities or to remove quotes) in an effort to make the input "safe". Like blacklists, this approach requires maintenance and is usually incomplete. As most fields have a particular grammar, it is simpler, faster, and more secure to simply validate a single correct positive test than to try to include complex and slow sanitization routines for all current and future attacks.
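The phone-number example from this answer (keep only the digits, then validate the result) might look like this in Python. The 10-digit length rule is an assumption for illustration; real rules depend on the country and format you accept.

```python
import re
from typing import Optional

def normalize_phone(raw: str) -> Optional[str]:
    # Whitelist step: keep only ASCII digits. [^0-9] is used instead of \D
    # because \d in Python also matches non-ASCII digits (e.g. Arabic-Indic).
    digits = re.sub(r"[^0-9]", "", raw)
    # Second step: validate what is left (assumed rule: exactly 10 digits).
    if len(digits) != 10:
        return None
    return digits

print(normalize_phone("(555) 123-4567"))
print(normalize_phone("555-1234"))
```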
Realize that this validation is just a first line of defense against attacks. For XSS, you should always escape your output, so you can print any characters needed: escaped characters are changed to their HTML entities, so the browser knows they are data and not something the parser should interpret, which effectively shuts down XSS in HTML content (other contexts, such as inline JavaScript or URLs, need their own encoding). For SQL injection, escape all data before putting it into a query, and try never to use dynamically built queries, as they are the easiest type of query to exploit. Try to use parameterized queries or parameterized stored procedures. Also remember to give each connection only the privileges it needs: if a connection only needs to read data, use a DB account with read-only privileges. Which accounts you need depends mostly on the roles of your users. For more information, please check the sources this information was taken from:
Data Validation OWASP
Guide to SQL Injection OWASP
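A minimal Python sketch of escaping at output time, using the standard html.escape function; render_comment() is a hypothetical helper. This encoding suits HTML element content and quoted attribute values only:

```python
import html

def render_comment(user_text: str) -> str:
    # Escape when producing HTML: <, >, & and both quote characters become
    # entities, so the browser shows them as text instead of parsing markup.
    return "<p>" + html.escape(user_text, quote=True) + "</p>"

print(render_comment("<script>alert('x')</script> & more"))
```

The stored data stays unchanged; only its HTML rendering is escaped, which is why any character can still be displayed.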

The answer generally is, it depends.
For inputs with clearly defined parameters (say the equivalent of a dropdown menu), I would whitelist the options and ignore anything that wasn't one of those.
For free-text inputs, it's significantly more difficult. I subscribe to the school of thought that you should just filter it as best you can so it's as safe as possible (escape HTML, etc.). Another suggestion is to explicitly disallow any invalid input; however, while this might protect against attacks, it might also hurt usability for genuine users.
I think it's just a case of finding the blend that works for you. I can't think of any one solution that would work for all possibilities. Mostly it depends on your user base.

Related

How to match parameter to TWO entities simultaneously?

My bot asks: 'how do you (i.e. customer) want to pay for this product?'
Customer says: 'part in cash and the difference in 48x'
What the customer is saying above is that he wants to pay part in cash and use financing for the rest, and that the financing should be over 48 installments.
Entities:
paymentType: {cash, financed} ; Financed includes 48x as a synonym
numInstallments: {12x, 24x, 36x, 48x} ; 48x is the number of installments desired
Using only the GUI, how can I do this:
IF user says '48x' THEN simultaneously add 'financed' to the paymentType list AND set numInstallments equal to '48x' ?
Apparently the GUI doesn't allow me to do that, unless I'm doing something wrong: the screen that maps a parameter to an entity has a dropdown that apparently allows selecting a single entity, not two, which is what I need.
How can I solve this problem easily through the GUI?
I don't know if what you have in mind is actually feasible in this case.
What you could do is keep the intent and entities as-is and then create several conditions in the page where you fill these parameters, or in another page (I think this is preferred).
In that page you can add different routes, each with its own condition; when a route's condition is true, the route modifies your parameters as you wish.
For example, after asking the user how they'd like to pay, you can have a route going to a "Set parameters" page which has several routes:
The first route has the condition $session.params.numeroDeParcelas != null (you know the user has asked for a specific number of installments, so handle the case by setting the parameters you need in this route: under parameters in the route, write paymentType : "financed").
The second route has another condition, for example $session.params.numeroDeParcelas = null (you know the user hasn't asked for financing, so set the same parameter as before to "cash").
And so on, until you've exhausted your use cases (all payment methods, possibly all types of financing).
Pay attention: the routes are always evaluated in order, so keep this in mind while writing and ordering them. Be specific to avoid triggering the wrong one by mistake, e.g. by creating compound conditions that chain parameter checks, as in $session.params.numeroDeParcelas = null AND $session.params.numInstallments = "36x".
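The first-match-wins ordering described above can be modeled in plain Python to reason about route order before building it in the GUI. This is not Dialogflow CX code: Route and fire_first_matching() are hypothetical, and the sketch assumes, as the answer states, that routes are checked in order and the first true condition wins.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Params = Dict[str, Optional[str]]

@dataclass
class Route:
    name: str
    condition: Callable[[Params], bool]
    set_params: Dict[str, str] = field(default_factory=dict)

def fire_first_matching(routes: List[Route], params: Params) -> Optional[str]:
    # Routes are checked in list order and only the first true condition fires,
    # so specific (compound) conditions must be placed before general ones.
    for route in routes:
        if route.condition(params):
            params.update(route.set_params)
            return route.name
    return None

routes = [
    Route("financed", lambda p: p.get("numeroDeParcelas") is not None,
          {"paymentType": "financed"}),
    Route("cash", lambda p: p.get("numeroDeParcelas") is None,
          {"paymentType": "cash"}),
]

params: Params = {"numeroDeParcelas": "48x"}
print(fire_first_matching(routes, params), params)
```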

I want to check for duplicate values at insert time without using the unique keyword

I made a table with some nullable columns.
I already tried two different queries. One uses
Register_member::where('passport',$passport)->orWhere('adharcardnumber',$adharcardnumber)->get();
and the second is a DB::table-style query.
$row = Register_member::where('passport', $passport)->orWhere('adharcardnumber', $adharcardnumber)->get();
if (!empty($row))
{
    return response()->json(["status" => 0, "message" => "Aadhaar card or passport number already exists."]);
}
if (empty($row))
{
    Register_member::insert([
        'first_name' => request('first_name'), 'middle_name' => request('middle_name'), 'last_name' => request('last_name'),
        'adharcardnumber' => request('adharcardnumber'), 'ocipcinumber' => request('ocipcinumber'), 'passport' => request('passport'),
        'birthday' => request('birthday'), 'mobilecode' => request('mobilecode'), 'mobilenumber' => request('mobilenumber'),
        'email' => request('email'), 'address' => request('address'), 'landmark' => request('landmark'), 'area' => request('area'),
        'gender' => request('gender'), 'pincode' => request('pincode'), 'city_name' => request('city_name'),
        'state_id' => request('state_id'), 'country_id' => request('country_id'), 'sampraday' => request('sampraday'),
        'other' => request('other'), 'sms' => request('sms'),
    ]);
    return response()->json(["status" => 1, "message" => "Member registered successfully."]);
}
If the adharcardnumber or the passport number already exists in the table, return a negative response. Otherwise (neither one exists yet), insert the data into the table.
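One thing worth checking in the question's code: ->get() returns a Collection object, and PHP's empty() is always false for an object, so the first branch always runs; calling ->exists() on the query instead returns a plain boolean. The same check-then-insert flow is sketched below in Python with sqlite3 as a stand-in; the members table and register_member() are hypothetical.

```python
import sqlite3
from typing import Optional

def register_member(conn: sqlite3.Connection, passport: Optional[str], aadhaar: Optional[str]) -> dict:
    # EXISTS yields 1 or 0, so the duplicate test is a real boolean.
    # A None parameter becomes NULL, and "col = NULL" never matches,
    # so a missing passport does not collide with other members' missing passports.
    (taken,) = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM members WHERE passport = ? OR adharcardnumber = ?)",
        (passport, aadhaar),
    ).fetchone()
    if taken:
        return {"status": 0, "message": "Aadhaar card or passport number already exists."}
    conn.execute(
        "INSERT INTO members (passport, adharcardnumber) VALUES (?, ?)",
        (passport, aadhaar),
    )
    conn.commit()
    return {"status": 1, "message": "Member registered successfully."}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, passport TEXT, adharcardnumber TEXT)")
print(register_member(conn, "P123", "111122223333"))
print(register_member(conn, "P999", "111122223333"))
```

Without a unique index, check-then-insert is not atomic: two simultaneous requests can both pass the check and both insert. A UNIQUE constraint is the only airtight guarantee; a check like this is mainly useful for returning a friendly message.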
Let me suggest something that I think will serve you well. You can combine the unique rule with required and regex. This way you use the validation mechanisms Laravel already recommends, which are the best.
As an example, for your Aadhaar card number,
the validation should look like this:
$request->validate([
    // no stray space in the key, and ^...$ so the whole value must be 12 digits
    'adhaar' => ['required', 'unique:users', 'regex:/^\d{12}$/'],
]);
where adhaar is the name of the field in which the Aadhaar number is entered, and $request is the Illuminate\Http\Request instance. (If you use Validator::make() instead of $request->validate(), import the facade with use Illuminate\Support\Facades\Validator;.)
The required rule prevents submitting an empty field, and the regex rule fails if the value does not match the pattern; anchor the pattern with ^ and $ so that the whole value has to match, not just part of it. So I think this is a better way to handle the scenario.
The above solution works for both the Aadhaar and the passport number, though the passport regex will be different.
Please note that these are demo examples; you may need to modify them to suit your needs. I use https://www.phpliveregex.com/ to build and test regexes, and it is good enough.
I hope this gives you an idea of how to begin; if you need more information, let me know in the comments.

In CakePHP validation, do I need a notEmpty rule if I have another rule like alphaNumeric and allowEmpty is false?

If I have a validation rule such as
alphaNumeric' => array(
'rule' => array('alphaNumeric'),
'allowEmpty' => false),
Is there any need to have a notEmpty rule? As I understand it, setting allowEmpty to false makes empty values a violation of the alphaNumeric rule, so other than wanting two different error messages, is there any need for a notEmpty rule?
(Another way to ask this question: is there some separate functionality that a standalone notEmpty rule would provide or be necessary for, other than giving a separate custom message, that I'm not seeing?)
To be perfectly clear: I understand the idea that notEmpty is a standalone rule, whereas allowEmpty is an attribute of a rule. That's not my question. My question is: is there any need for, or value in, adding a notEmpty rule (other than the custom message it would allow for that rule) if you already have an alphaNumeric (or similar) rule that you can just add allowEmpty = false to? Is there any difference in what the rule and the attribute do, other than the rule being standalone?
It really depends on the "other" rule that you're using.
You can see exactly what each rule is ACTUALLY checking for in the CakePHP Validation utility:
https://github.com/cakephp/cakephp/blob/44b7d013ae304a05699179bb4ea0077956c57e10/lib/Cake/Utility/Validation.php
For instance, in that file you can see the alphaNumeric check:
public static function alphaNumeric($check) {
    if (is_array($check)) {
        extract(self::_defaults($check));
    }
    if (empty($check) && $check != '0') {
        return false;
    }
    return self::_check($check, '/^[\p{Ll}\p{Lm}\p{Lo}\p{Lt}\p{Lu}\p{Nd}]+$/Du');
}
In the case of alphaNumeric, you can see that it already has an empty check, so you shouldn't need the allowEmpty => false setting either.
Lastly, to your point, the only benefit I see in adding it as a separate rule is that you can give a better error message to the user.
Please read http://book.cakephp.org/2.0/en/models/model-attributes.html
Model attributes let you set properties that override the default model behavior, and validation rules, in your context, are the business logic of your application.
The answer to your question lies in this link:
http://book.cakephp.org/2.0/en/models/data-validation.html#allowempty
Actually, you are absolutely correct: if a field needs more than one validation and one of them is notEmpty, you can simply use allowEmpty => false instead.
But if your data field needs only a non-empty check, you should use notEmpty for better readability of your code!
I guess I have made my point... thanks.

Best way to make sure username isn't a reserved word?

Let's say I'm building a web application whose user pages can be found at http://example.com/NAME. What's the best way to make sure the username doesn't conflict with a reserved word (e.g. 'about', 'contact', etc.)? I can think of two ways:
Maintain a list somewhere in my code. This is great and all, but means I have another piece of code I have to edit if I decide to, say, change the "about" page to "aboutus".
Request the URI (e.g. http://example.com/someusername) and check whether it exists (doesn't return a 404). This feels kind of like a hack, but on the other hand it does exactly what it's supposed to do. However, I can't reserve anything without making a page for it.
What would be the best way to go about this? Manual validation of usernames is not an option. Thanks!
EDIT: I forgot to mention, the username has to go at the root, like this:
http://example.com/USERNAME
Not like this:
http://example.com/users/USERNAME
Hence why I'm asking this question. This is for technical reasons, don't ask.
I would strongly suggest using a unique path like http://example.com/users/NAME instead. Otherwise, what are you going to do if you want to add a reserved word, but a user has already taken it as their user name? You'll end up with all kinds of potential migration problems down the track.
Alternatively, if you must have something that goes straight off http://example.com/, could you possibly prefix all user names? So that user jerryjvl would translate to link http://example.com/user_jerryjvl?
If there is really no other possible solution, then I'd say either check user names against whatever data source determines what the 'reserved words' are, or make a lookup file / table / structure somewhere that contains all the reserved words.
In the interest of completeness, if you can't change the routing, another possibility is to make your user routes and your non-user routes programmatically distinguishable. For example, if you appended a '_' to the end of each of your user routes, users would be located at http://example.com/NAME_ and no other route would ever end in '_'.
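A minimal Python sketch of the trailing-underscore convention; SITE_PAGES and resolve() are hypothetical:

```python
from typing import Tuple

# Hypothetical site pages that live at the root; none of them ends in "_".
SITE_PAGES = {"about", "contact", "login"}

def resolve(path: str) -> Tuple[str, str]:
    name = path.strip("/")
    # User pages are recognized purely by the trailing "_", so a user may even
    # be called "about" without colliding with the /about page.
    if len(name) > 1 and name.endswith("_"):
        return ("user", name[:-1])
    if name in SITE_PAGES:
        return ("page", name)
    return ("not_found", name)

print(resolve("/about"), resolve("/about_"), resolve("/jerryjvl_"))
```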
How about changing your routing scheme so that users are at example.com/users/NAME ?
I maintain the reserved words inside the code.
This is the Perl code that I use on the http://postbit.com/ website to check whether the username is a reserved word:
# Black list of logins and sub-domains reserved keywords
my @black_list = qw(
    about access account accounts add address adm admin administration
    adult advertising affiliate affiliates ajax analytics android anon
    anonymous api app apps archive atom auth authentication
    ...
);
my $username_normalized = lc($username);
$username_normalized =~ s/\W//gs;    # 'log-in' -> 'login'
for my $this_username (@black_list) {
    if ($username_normalized eq $this_username) {
        die("This username is already taken. Please choose another username.\n");
    }
}
The complete list of reserved names (like 'css', 'images', 'js', 'admin', 'root', 'old', 'test', 'www', 'login', 'devel'...), with more than 300 reserved login names, is posted here:
http://blog.postbit.com/reserved-username-list.html
Only you know what these 'reserved' words are, so it is better to maintain a list and validate against it.
Another method: if you use a CMS, all these keywords ('about', 'contact', etc.) will already be in your database, so validate against it.
Put something like this right next to the text box: "Please use your personal nickname or your real name. Usernames with common words indicating affiliation with the site administration may be revoked."
How about just creating dummy accounts first for all the reserved words? Just list all the possible ones and create them.
If you use
www.example.com/user/name
then there will be no problem, but it seems like you'd like the URL to be short.
Maintain a list somewhere in my code. This is great and all, but means I have another piece of code I have to edit if I decide to, say, change the "about" page to "aboutus".
Your menus should be stored in an array/list. This way you would have only 1 piece of code to edit, not 2. =]
Then, since all menus are in one array, you can match username with elements in the array.
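For example:
$menu = array('About', 'Contact', 'Home');
// compare case-insensitively, so 'about' is rejected as well as 'About'
if (in_array(strtolower($username), array_map('strtolower', $menu), true)) {
    echo 'invalid username';
}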
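Combining the ideas above (one list that also drives the site's menu/routes, plus the Perl snippet's normalization), a hypothetical Python sketch could look like this. ROUTES and EXTRA_RESERVED are illustrative placeholders, not a complete reserved list:

```python
import re
from typing import Dict

# Hypothetical single source of truth: the site's own top-level pages.
ROUTES: Dict[str, str] = {"about": "About", "contact": "Contact", "home": "Home"}
# Names reserved even though no page exists for them yet (illustrative only).
EXTRA_RESERVED = {"admin", "api", "login", "www"}

def normalize(name: str) -> str:
    # Same idea as the Perl snippet: lowercase and drop non-word characters,
    # so "Log-In" and "login" are treated as the same name.
    return re.sub(r"\W", "", name.lower())

def is_reserved(username: str) -> bool:
    reserved = {normalize(r) for r in ROUTES} | {normalize(r) for r in EXTRA_RESERVED}
    return normalize(username) in reserved

print(is_reserved("About"), is_reserved("Log-In"), is_reserved("jerryjvl"))
```

Renaming a page then only means editing ROUTES, which is the "one piece of code to edit" point made above.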
You could always look and see how stackoverflow.com works.

Avoiding duplicate code in input validation

Suppose you have a subsystem that does some kind of work. It could be anything. Obviously, at the entry point(s) to this subsystem there will be certain restrictions on the input. Suppose this subsystem is primarily called by a GUI. The subsystem needs to check all the input it receives to make sure it's valid. We wouldn't want to FireTheMissles() if there was invalid input. The UI is also interested in the validation, though, because it needs to report what went wrong. Maybe the user forgot to specify a target or targeted the missiles at the launchpad itself. Of course, you can just return a null value or throw an exception, but that doesn't tell the user SPECIFICALLY what went wrong (unless, of course, you write a separate exception class for each error, which I'm fine with if that's the best practice).
Of course, even with exceptions, you have a problem. The user might want to know whether the input is valid BEFORE clicking the "Fire Missiles!" button. You could write a separate validation function (of course, IsValid() doesn't really help much because it doesn't tell you what went wrong), but then you'll be calling it from the button click handler and again from the FireTheMissles() function (I really don't know how this changed from a vague subsystem to a missile-firing program). Certainly, this isn't the end of the world, but it seems silly to call the same validation function twice in a row without anything having changed, especially if this validation function requires, say, computing the hash of a 1 GB file.
If the preconditions of the function are clear, the GUI can do its own input validation, but then we're just duplicating the input validation logic, and a change in one might not be reflected in the other. Sure, we may add a check to the GUI to make sure the missile target is not within an allied nation, but if we forget to copy it to the FireTheMissles() routine, we'll accidentally blow up our allies when we switch to a console interface.
So, in short, how do you achieve the following:
Input validation that tells you not just that something went wrong, but what specifically went wrong.
The ability to run this input validation without calling the function which relies on it.
No double validation.
No duplicate code.
Also (I just thought of this), error messages should not be written in the FireTheMissles() method. The GUI is responsible for picking appropriate error messages, not the code the GUI is calling.
"The subsystem needs to check all the input it receives to make sure it's valid"
Think of the inputs not so much as a list of arguments but as a message; it gets easier after that.
The message class has an IsValid member function. It remembers whether IsValid was called and what the result was. It also remembers its state: if the state changes, it needs to be re-validated. The message class also keeps a list of validation errors.
Now the UI builds a TargetMissiles message and can either validate it or pass it directly to the MissileFiring subsystem. The subsystem checks whether the message has been validated; if not, it validates it, and then proceeds or fails accordingly.
The UI gets the message back, with the list of validations already populated.
The messages with their validation sit in a separate library. No code is duplicated.
Does this sound OK?
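A minimal Python sketch of the validated-message idea. TargetMissiles comes from the answer, but its fields, the error codes and fire_the_missiles() are hypothetical. Any expensive check (such as hashing a large file) would go inside validate(), which only reruns when the message changes; returning error codes rather than text leaves the wording to the GUI, as the question asks.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class TargetMissiles:
    # Hypothetical fields; the answer only names the message type.
    target: Optional[Point] = None
    launchpad: Point = (0.0, 0.0)
    validation_runs: int = 0
    _checked_state: Optional[tuple] = field(default=None, repr=False)
    _errors: List[str] = field(default_factory=list, repr=False)

    def _state(self) -> tuple:
        return (self.target, self.launchpad)

    def validate(self) -> List[str]:
        # Reuse the cached result unless the message changed since the last check.
        if self._checked_state != self._state():
            self.validation_runs += 1
            errors: List[str] = []
            if self.target is None:
                errors.append("NO_TARGET")
            elif self.target == self.launchpad:
                errors.append("TARGET_IS_LAUNCHPAD")
            self._errors = errors
            self._checked_state = self._state()
        # Error codes, not user-facing text: the GUI chooses the wording.
        return list(self._errors)

def fire_the_missiles(message: TargetMissiles) -> bool:
    # The subsystem always validates, but a message the UI already validated
    # (and that has not changed since) is not checked a second time.
    if message.validate():
        return False
    return True  # the actual launch would happen here
```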
This is what Model-View-Controller is all about.
You build up a model (a launch which is composed of coordinates, missile types and number of missiles) and the model has a validate method which returns a list of errors/warnings. When you update the model (on key-up, <ENTER>, button-press) you call the validate method and show the user any warnings, errors, etc. (Eclipse has a little area just under the toolbar in a dialog that does this; you might want to look at that.)
When the model is valid, you activate the launch missiles button so that the user knows that they can launch the missiles. If you have an update event that is called particularly frequently or a part of the validation that is particularly costly, you can have a validate_light method on the model that you use for validating only the parts that are easy to do.
When you switch to a console based UI you build up the model from the command line arguments, call the same validate method (and report errors to stderr) and then launch the missiles.
Double the validation. In many cases the validation is trebled (foreign keys and NOT NULL columns in the DB, for example). Depending on your platform, it may be possible to code the validation rules once. For example, your front-end and back-end code could share C# business classes. Alternatively, you could store the validation rules as metadata that both the back end and the front end can access and apply.
In reality, because you need different responses to a validation problem (for example, the Fire Missile button shouldn't even be enabled until the other inputs are valid), there will be different code associated with the same rule.
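As a sketch of the "rules as metadata" option: the rules live in one data structure (which, by assumption, could be serialized to JSON and loaded by the front end too), and a generic checker applies them. FIELD_RULES and its keys are hypothetical; only the back-end side is shown, in Python.

```python
from typing import Any, Dict, List

# Hypothetical shared metadata: each rule is written down exactly once.
FIELD_RULES: Dict[str, Dict[str, Any]] = {
    "latitude": {"type": "number", "min": -90, "max": 90, "required": True},
    "missile_count": {"type": "integer", "min": 1, "max": 4, "required": True},
}

def check_field(name: str, value: Any, rules: Dict[str, Any]) -> List[str]:
    errors: List[str] = []
    if value is None:
        if rules.get("required"):
            errors.append(f"{name}: required")
        return errors
    expected = {"number": (int, float), "integer": (int,)}[rules["type"]]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, expected):
        errors.append(f"{name}: must be {rules['type']}")
        return errors
    if "min" in rules and value < rules["min"]:
        errors.append(f"{name}: below {rules['min']}")
    if "max" in rules and value > rules["max"]:
        errors.append(f"{name}: above {rules['max']}")
    return errors

def check_all(data: Dict[str, Any]) -> List[str]:
    errors: List[str] = []
    for name, rules in FIELD_RULES.items():
        errors.extend(check_field(name, data.get(name), rules))
    return errors
```

The front end would interpret the same FIELD_RULES for instant feedback (e.g. keeping the Fire Missile button disabled), while the back end runs check_all() before acting; the responses differ, but the rule itself exists in one place.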
I'd suggest an input validation class, which takes the input type (an enumeration) in its constructor and provides a public IsValid method.
The IsValid method should return a boolean TRUE for valid and FALSE for invalid. It should also have an OUT parameter that takes a string and assigns a status message to that string. The caller will be free to ignore that message if it wants to, or report it up to the GUI if that's appropriate for the context.
So, in pseudocode (forgive the Delphi-like syntax, but it should be readable to anybody):
type
  //different types of data we might want to validate
  TValidationType = (vtMissileLaunchCodes, vtFirstName,
                     vtLastName, vtSSN);
  TInputValidator = class
  public
    //call the constructor with the validation type
    constructor Create(ValidationType: TValidationType);
    //this should probably be ABSTRACT, implemented by descendants
    //if you took that approach, then you'd have 1 descendant class
    //for each validation type, instead of an enumeration
    function IsValid(InputData: string; var msg: string): boolean;
  end;
And then to use it, you'd do something like this:
procedure ValidateForm;
var
  validator: TInputValidator;
  msg: string;
begin
  validator := TInputValidator.Create(vtSSN);
  try
    //a property can't be passed as a var parameter, so use a local string
    if validator.IsValid(edtSSN.Text, msg) then
      SaveData //it's valid, so save it!
    else
      labelErrorMsg.Text := msg; //not valid: show the error msg in the GUI
  finally
    validator.Free;
  end;
end;
Each piece of data has its own metadata (type, format, unit, mask, range, etc.). Unfortunately, this is not always specified.
The GUI controls need to check the input against the metadata and give warnings/errors if the data is invalid.
Example: a number has a range. The range is provided by the metadata, but the range check is provided by the control.
