Laravel 7 Sanitizing Form Data

Laravel 7 Sanitizing Form Data - laravel

I'm new to Laravel 7 and wondering if there's an out-of-the-box, elegant solution to sanitizing HTML form inputs? Or maybe a trusted third-party package I can download that you recommend? This is for data I will store in a database. Thanks for any help.

One recommended out-of-the-box way of sanitizing data is using the filter_var function that comes with PHP in conjunction with the different sanitize filters. By the way, this is also a cool way to validate input, take a look at the types of filters to find out more.
When working in Laravel projects, I like to use voku/portable-ascii library, because it's already a framework dependency. It is a nice assortment of functions to clean input, remove non-printable characters, and to generally transform any input into ASCII, complete with transliteration and whatnot. It's not always perfect, but usually good enough and gets the job done.
It always depends on what you want to sanitize, how, and why. In many situations you do not need to sanitize the input at all if you stick to the best practices. When working with Eloquent or the Query Builder, your data is automatically escaped and on retrieval, when you output it e.g. via {{ $data }}, they will be properly escaped too.
There are some situations where you should be more cautious, especially if you are handling the raw user input yourself and probably passing it to the system in command line parameters, filenames or such. In those cases it is usually a good idea to be as restrictive as possible and as permissive as necessary. Sometimes a good old preg_replace('/[^0-9A-Z_-]/i', '', $subject) is just the right choice. If you want to be as permissive as possible, give the suggestions above a try.

Related

i18n server-side vs. client-side

Seeking some advice on two approaches to internationalization & localization. I have a web app using Spring MVC and Dojo, and I would like to support multiple languages. So, I could:
Use <spring:message> to generate the appropriate text on the server side using a properties file.
Use dojo/i18n to select the appropriate text on the client side using a js file.
And of course any combination of the two is also an option.
So, what are the pros and cons of each approach? When would you use one vs. the other?

The combination of these two approaches is the only reasonable answer.
Basically, you should try to stick to server-side, and only do client-side when it is really necessary (there is no other way, like you have some dynamically created controls).
Pros and cons? The main con of client-side string externalization is, you won't be able to translate everything correctly. That's because of context. The same English terms might be translated in a different way, depending on the context.
At the same time, you will often need to format message (add parameters to your message tag), which in regular Java you would do by calling MessageFormat.format(). Theoretically, you could do that on the client-side, but this is risky to say the least. You won't have access to original message parts (like dates, some data sources, whatever) and it might hurt translation correctness.
Formatting dates, numbers, etc. is more painful on the client-side. It is possible with Dojo or jQuery Globalize, but the results might not as good as they should be. But Spring has problem with formatting dates, anyway (lack of default local date/time designation, you may only choose from short, medium, long, full, which to me is completely useless).
Another issue might be handling plural forms (non-English). Believe, or not but languages may have more than one plural form (depending on the quantity) and because of that translations might differ. I don't think Dojo is handling it at all (however, I might be mistaken, some time has past since I evaluated it). Spring won't handle it as well, but you may build custom solution based on ICU's PluralRules (or PluralFormat if you're hard enough to learn formatting and want to kill the translators at the same time).
To cut a long story short, doing I18n correctly is far from being easy and you'll get better support on the server side.
BTW. I remember Dojo as quite "heavy", library itself was over 1MB... It might take a while to load it and your application might seem slow comparing to others... That was one of the reasons, I recommended Globalize rather than Dojo for our projects. It might not have so many features, but at least it seems lightweight.

HTML/XSS escape on input vs output

From everything I've seen, it seems like the convention for escaping html on user-entered content (for the purposes of preventing XSS) is to do it when rendering content. Most templating languages seem to do it by default, and I've come across things like this stackoverflow answer arguing that this logic is the job of the presentation layer.
So my question is, why is this the case? To me it seems cleaner to escape on input (i.e. form or model validation) so you can work under the assumption that anything in the database is safe to display on a page, for the following reasons:
Variety of output formats - for a modern web app, you may be using a combination of server-side html rendering, a JavaScript web app using AJAX/JSON, and mobile app that receives JSON (and which may or may not have some webviews, which may be JavaScript apps or server-rendered html). So you have to deal with html escaping all over the place. But input will always get instantiated as a model (and validated) before being saved to db, and your models can all inherit from the same base class.
You already have to be careful about input to prevent code-injection attacks (granted this is usually abstracted to the ORM or db cursor, but still), so why not also worry about html escaping here so you don't have to worry about anything security-related on output?
I would love to hear the arguments as to why html escaping on page render is preferred

In addition to what has been written already:
Precisely because you have a variety of output formats, and you cannot guarantee that all of them will need HTML escaping. If you are serving data over a JSON API, you have no idea whether the client needs it for a HTML page or a text output (e.g. an email). Why should you force your client to unescape "Jack & Jill" to get "Jack & Jill"?
You are corrupting your data by default.
When someone does a keyword search for 'amp', they get "Jack & Jill". Why? Because you've corrupted your data.
Suppose one of the inputs is a URL: http://example.com/?x=1&y=2. You want to parse this URL, and extract the y parameter if it exists. This silently fails, because your URL has been corrupted into http://example.com/?x=1&y=2.
It's simply the wrong layer to do it - HTML related stuff should not be mixed up with raw HTTP handling. The database shouldn't be storing things that are related to one possible output format.
XSS and SQL Injection are not the only security problems, there are issues for every output you deal with - such as filesystem (think extensions like '.php' that cause web servers to execute code) and SMTP (think newline characters), and any number of others. Thinking you can "deal with security on input and then forget about it" decreases security. Rather you should be delegating escaping to specific backends that don't trust their input data.
You shouldn't be doing HTML escaping "all over the place". You should be doing it exactly once for every output that needs it - just like with any escaping for any backend. For SQL, you should be doing SQL escaping once, same goes for SMTP etc. Usually, you won't be doing any escaping - you'll be using a library that handles it for you.
If you are using sensible frameworks/libraries, this is not hard. I never manually apply SQL/SMTP/HTML escaping in my web apps, and I never have XSS/SQL injection vulnerabilities. If your method of building web pages requires you to remember to apply escaping, or end up with a vulnerability, you are doing it wrong.
Doing escaping at the form/http input level doesn't ensure safety, because nothing guarantees that data doesn't get into your database or system from another route. You've got to manually ensure that all inputs to your system are applying HTML escaping.
You may say that you don't have other inputs, but what if your system grows? It's often too late to go back and change your decision, because by this time you've got a ton of data, and may have compatibility with external interfaces e.g. public APIs to worry about, which are all expecting the data to be HTML escaped.
Even web inputs to the system are not safe, because often you have another layer of encoding applied e.g. you might need base64 encoded input in some entry point. Your automatic HTML escaping will miss any HTML encoded within that data. So you will have to do HTML escaping again, and remember to do, and keep track of where you have done it.
I've expanded on these here: http://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea/

The original misconception
Do not confuse sanitation of output with validation.
While <script>alert(1);</script> is a perfectly valid username, it definitely must be escaped before showing on the website.
And yes, there is such a thing as "presentation logic", which is not related to "domain business logic". And said presentation logic is what presentation layer deals with. And the View instances in particular. In a well written MVC, Views are full-blown objects (contrary to what RoR would try to to tell you), which, when applied in web context, juggle multiple templates.
About your reasons
Different output formats should be handled by different views. The rules and restrictions, which govern HTML, XML, JSON and other formats, are different in each case.
You always need to store the original input (sanitized to avoid injections, if you are not using prepared statements), because someone might need to edit it at some point.
And storing original and the xss-safe "public" version is waste. If you want to store sanitized output, because it takes too much resources to sanitize it each time, then you are already pissing at the wrong tree. This is a case, when you use cache, instead of polluting the database.

Why do people use plain english as translation placeholders?

This may be a stupid question, but here goes.
I've seen several projects using some translation library (e.g. gettext) working with plain english placeholders. So for example:
_("Please enter your name");
instead of abstract placeholders (which has always been my instinctive preference)
_("error_please_enter_name");
I have seen various recommendations on SO to work with the former method, but I don't understand why. What I don't get is what do you do if you need to change the english wording? Because if the actual text is used as the key for all existing translations, you would have to edit all the translations, too, and change each key. Or don't you?
Isn't that awfully cumbersome? Why is this the industry standard?
It's definitely not proper normalization to do it this way. Are there massive advantages to this method that I'm not seeing?

Yes, you have to alter the existing translation files, and that is a good thing.
If you change the English wording, the translations probably need to change, too. Even if they don't, you need someone who speaks the other language to check.
You prep a new version, and part of the QA process is checking the translations. If the English wording changed and nobody checked the translation, it'll stick out like a sore thumb and it'll get fixed.

The main language is already existent: you don't need to translate it.
Translators have better context with a real sentence than vague placeholders.
The placeholders are just the keys, it's still possible to change the original language by creating a translation for it. Because when the translation doesn't exists, it uses the placeholder as the translated text.

We've been using abstract placeholders for a while and it was pretty annoying having to write everything twice when creating a new function. When English is the placeholder, you just write the code in English, you have meaningful output from the start and don't have to think about naming placeholders.
So my reason would be less work for the developers.

I like your second approach. When translating texts you always have the problem of homonyms. Like 'open' can mean a state of a window but also the verb to perform the action. In other languages these homonyms may not exist. That's why you should be able to add meaning to your placeholders. Best approach is to put this meaning in your text library. If this is not possible on the platform the framework you use, it might be a good idea to define a 'development language'. This language will add meaning to the text entries like: 'action_open' and 'state_open'. you will off course have to put extra effort i translating this language to plain english (or the language you develop for). I have put this philosophy in some large projects and in the long run this saves some time (and headaches).
The best way in my opinion is keeping meaning separate so if you develop your own translation library or the one you use supports it you can do something like this:
_(i18n("Please enter your name", "error_please_enter_name"));
Where:
i18n(text, meaning)

Interesting question. I assume the main reason is that you don't have to care about translation or localization files during development as the main language is in the code itself.

Well it probably is just that it's easier to read, and so easier to translate. I'm of the opinion that your way is best for scalability, but it does just require that extra bit of effort, which some developers might not consider worth it... and for some projects, it probably isn't.

There's a fallback hierarchy, from most specific locale to the unlocalised version in the source code.
So French in France might have the following fallback route:
fr_FR
fr
Unlocalised. Source code.
As a result, having proper English sentences in the source code ensures that if a particular translation is not provided for in step (1) or (2), you will at least get a proper understandable sentence than random programmer garbage like “error_file_not_found”.
Plus, what do you do if it is a format string: “Sorry but the %s does not exist” ? Worse still: “Written %s entries to %s, total size: %d” ?

Quite old question but one additional reason I haven't seen in the answers yet:
You could end up with more placeholders than necessary, thus more work for translators and possible inconsistent translations. However, good editors like Poedit or Gtranslator can probably help with that.
To stick with your example:
The text "Please enter your name" could appear in a different context in a different template (that the developer is most likely not aware of and shouldn't need to be). E.g. it could be used not as an error but as a prompt like a placeholder of an input field.
If you use
_("Please enter your name");
it would be reusable, the developer can be unaware of the already existing key for an error message and would just use the same text intuitively.
However, if you used
_("error_please_enter_name");
in a previous template, developers wouldn't necessarily be aware of it and would make up a second key (most likely according to a predefined wording scheme to not end up in complete chaos), e.g.
_("prompt_please_enter_name");
which then has to be translated again.
So I think that doesn't scale very well. A pre-agreed wording scheme of suffixes/prefixes e.g. for contexts can never be as precise as the text itself I think (either too verbose or too general, beforehand you don't know and afterwards it's difficult to change) and is more work for the developer that's not worth it IMHO.
Does anybody agree/disagree?

filtering user input in php

Am wondering if the combination of trim(), strip_tags() and addslashes() is enough to filter values of variables from $_GET and $_POST

That depends what kind of validation you are wanting to perform.
Here are some basic examples:
If the data is going to be used in MySQL queries make sure to use mysql_real_escape_query() on the data instead of addslashes().
If it contains file paths be sure to remove the "../" parts and block access to sensitive filename.
If you are going to display the data on a web page, make sure to use htmlspecialchars() on it.
But the most important validation is only accepting the values you are expecting, in other words: only allow numeric values when you are expecting numbers, etc.

Short answer: no.
Long answer: it depends.
Basically you can't say that a certain amount of filtering is or isn't sufficient without considering what you want to do with it. For example, the above will allow through "javascript:dostuff();", which might be OK or it might not if you happen to use one of those GET or POST values in the href attribute of a link.
Likewise you might have a rich text area where users can edit so stripping tags out of that doesn't exactly make sense.
I guess what I'm trying to say is that there is simple set of steps to sanitizing your data such that you can cross it off and say "done". You always have to consider what that data is doing.

It highly depends where you are going to use it for.
If you are going to display things as HTML, make absolutely sure you are properly specifying the encoding (e.g.: UTF-8). As long as you strip all tags, you should be fine.
For use in SQL queries, addslashes is not enough! If you use the mysqli library for example, you want to look at mysql::real_escape_string. For other DB libraries, use the designated escape function!
If you are going to use the string in javascript, addslashes will not be enough.
If you are paranoid about browser bugs, check out the OWASP Reform library
If you use the data in another context than HTML, other escaping techniques apply.

What are the url parameters naming convention or standards to follow

Are there any naming conventions or standards for Url parameters to be followed. I generally use camel casing like userId or itemNumber. As I am about to start off a new project, I was searching whether there is anything for this, and could not find anything. I am not looking at this from a perspective of language or framework but more as a general web standard.

I recommend reading Cool URI's Don't Change by Tim Berners-Lee for an insight into this question. If you're using parameters in your URI, it might be better to rewrite them to reflect what the data actually means.
So instead of having the following:
/index.jsp?isbn=1234567890
/author-details.jsp?isbn=1234567890
/related.jsp?isbn=1234567890
You'd have
/isbn/1234567890/index
/isbn/1234567890/author-details
/isbn/1234567890/related
It creates a more obvious data structure, and means that if you change the platform architecture, your URI's don't change. Without the above structure,
/index.jsp?isbn=1234567890
becomes
/index.aspx?isbn=1234567890
which means all the links on your site are now broken.
In general, you should only use query strings when the user could reasonably expect the data they're retrieving to be generated, e.g. with a search. If you're using a query string to retrieve an unchanging resource from a database, then use URL-rewriting.

There are no standards that I'm aware of. Just be mindful of IE's URL length limit of 2,083 characters.

Standard for URI are defined by RFC2396.
Anything after the standardized portion of the URL is left to you.
You probably only want to follow a particular convention on your parameters based on the framework you use.
Most of the time you wouldn't even really care because these are not under your control, but when they are, you probably want to at least be consistent and try to generate user-friendly bits:
that are short,
if they are meant to be directly accessible by users, they should be easy to remember,
case-insensitive (may be hard depending on the server OS).
follow some SEO guidelines and best practices, they may help you a lot.
I would say that cleanliness and user-friendliness are laudable goals to strive for when presenting URLs.
StackOverflow does a fairly good job of it.

I use lowercase. Depending on the technology you use, QS is either threated as case-sensitive (eg. PHP) or not (eg. ASP). Using lowercase avoids possible confusion.

Like the other answers I've not heard about any conventions.
The only "standard" I would adhere to is to use the more search engine friendly practice of using a URL rewriter.

There are no standards that I know of, and case shouldn't matter.
However within your application (website), you should stick to your own standards. For your own sanity if nothing else.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio