I'm new to FreeMarker and need to understand this problem to decide whether to choose it. I will strip XSS myself, but I don't know whether FreeMarker's other features are safe when the site allows users to edit their own templates.
Oh, goodness no! This is basically equivalent to allowing the user to evaluate arbitrary code. Removing XSS after the fact only removes one potential vulnerability. They'll still be able to do plenty of other things like manipulate POST parameters or perform page redirects.
John is right. And letting users actually edit FreeMarker templates themselves seems odd. If you are outputting user input again (like displaying the search term on the results page), I'd suggest using the ?html string built-in; it'll save you from the most rudimentary XSS attacks (e.g. "you searched for '${term?html}'").
So as others said, it's not safe. However, if those users are employees at your company or something like that (i.e., if they are easily accountable for malevolent actions) then it's not entirely out of question. For more details see: http://freemarker.org/docs/app_faq.html#faq_template_uploading_security
I'm just trying to find a pattern / best practice to come up with interactive user decisions.
So basically I have a (quite large) form the user has to fill in. Once he submits it, an AJAX POST request is sent to the server. Some fault checks are done there first, but some checks require user interaction, e.g. "Is this really correct?". As the returned "document" is always XML, I thought of returning all questions at once, like
bla
bla2
And then iterating through them. I'm using jQuery and the Bootstrap modal for this. If all these questions are answered with yes, I'll send the form again with a parameter like allyes=true.
However, I'm not very happy with that, and I'm wondering whether there is an easier way to code it.
Best regards,
fire
From a user's perspective, it's better if the form tells you there is a problem as you go along, rather than after you've filled it all in. If the checks are field-level, I'd be tempted to validate each field as it is filled in.
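A minimal server-side sketch of the round trip described in the question, written in Python since the question doesn't say what runs on the server: hard errors come back first, then the "Is this really correct?" questions, and a resubmit with allyes=true skips the questions. The field names (name, age), the rules, and the XML element names (result, error, question) are hypothetical placeholders, not the asker's actual format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class CheckResult:
    errors: List[str] = field(default_factory=list)     # hard failures: reject outright
    questions: List[str] = field(default_factory=list)  # soft checks needing confirmation


def check_form(form: dict, all_yes: bool = False) -> CheckResult:
    """Hypothetical check; field names and rules are placeholders."""
    errors = []
    if not str(form.get("name", "")).strip():
        errors.append("Name is required.")
    if errors:
        # Don't ask for confirmations while hard errors remain.
        return CheckResult(errors=errors)

    questions = []
    age = form.get("age")
    if isinstance(age, int) and age > 100:
        questions.append(f"Is an age of {age} really correct?")

    # The client resubmits with allyes=true once every question was answered yes.
    if all_yes:
        questions = []
    return CheckResult(questions=questions)


def to_xml(result: CheckResult) -> str:
    """Serialize the result; element names are assumptions."""
    root = ET.Element("result")
    for msg in result.errors:
        ET.SubElement(root, "error").text = msg
    for msg in result.questions:
        ET.SubElement(root, "question").text = msg
    return ET.tostring(root, encoding="unicode")
```

Since only soft checks are skipped on allyes=true, a client that forges the flag can only bypass the confirmations, never the hard errors.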
From everything I've seen, it seems like the convention for escaping HTML in user-entered content (for the purposes of preventing XSS) is to do it when rendering content. Most templating languages seem to do it by default, and I've come across things like this Stack Overflow answer arguing that this logic is the job of the presentation layer.
So my question is, why is this the case? To me it seems cleaner to escape on input (i.e. form or model validation) so you can work under the assumption that anything in the database is safe to display on a page, for the following reasons:
Variety of output formats: for a modern web app, you may be using a combination of server-side HTML rendering, a JavaScript web app using AJAX/JSON, and a mobile app that receives JSON (and which may or may not have some webviews, which may be JavaScript apps or server-rendered HTML). So you have to deal with HTML escaping all over the place. But input will always get instantiated as a model (and validated) before being saved to the db, and your models can all inherit from the same base class.
You already have to be careful about input to prevent code-injection attacks (granted, this is usually abstracted away into the ORM or db cursor, but still), so why not also handle HTML escaping here so you don't have to worry about anything security-related on output?
I would love to hear the arguments as to why HTML escaping on page render is preferred.
In addition to what has been written already:
Precisely because you have a variety of output formats, and you cannot guarantee that all of them will need HTML escaping. If you are serving data over a JSON API, you have no idea whether the client needs it for an HTML page or for a text output (e.g. an email). Why should you force your client to unescape "Jack &amp; Jill" to get "Jack & Jill"?
You are corrupting your data by default.
When someone does a keyword search for 'amp', they get "Jack & Jill". Why? Because you've corrupted your data.
Suppose one of the inputs is a URL: http://example.com/?x=1&y=2. You want to parse this URL and extract the y parameter if it exists. This silently fails, because your URL has been corrupted into http://example.com/?x=1&amp;y=2.
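The two corruptions above can be reproduced in a few lines. This sketch uses Python's html.escape as a stand-in for whatever escape-on-input step a framework would apply. The "amp;y" key in the last comment reflects recent Python versions, whose parse_qs splits the query only on "&".

```python
import html
from urllib.parse import parse_qs, urlparse

title = "Jack & Jill"
url = "http://example.com/?x=1&y=2"

# What an escape-on-input policy would actually store in the database.
stored_title = html.escape(title)  # "Jack &amp; Jill"
stored_url = html.escape(url)      # "http://example.com/?x=1&amp;y=2"

# A keyword search for "amp" now matches the stored title.
print("amp" in stored_title)

# The original URL parses as expected...
print(parse_qs(urlparse(url).query))
# ...but the stored one has no "y" key any more; the query is split on "&",
# so the second key becomes "amp;y".
print(parse_qs(urlparse(stored_url).query))
```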
It's simply the wrong layer to do it - HTML related stuff should not be mixed up with raw HTTP handling. The database shouldn't be storing things that are related to one possible output format.
XSS and SQL Injection are not the only security problems, there are issues for every output you deal with - such as filesystem (think extensions like '.php' that cause web servers to execute code) and SMTP (think newline characters), and any number of others. Thinking you can "deal with security on input and then forget about it" decreases security. Rather you should be delegating escaping to specific backends that don't trust their input data.
You shouldn't be doing HTML escaping "all over the place". You should be doing it exactly once for every output that needs it - just like with any escaping for any backend. For SQL, you should be doing SQL escaping once, same goes for SMTP etc. Usually, you won't be doing any escaping - you'll be using a library that handles it for you.
If you are using sensible frameworks/libraries, this is not hard. I never manually apply SQL/SMTP/HTML escaping in my web apps, and I never have XSS/SQL injection vulnerabilities. If your method of building web pages requires you to remember to apply escaping, or end up with a vulnerability, you are doing it wrong.
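A minimal sketch of "escape once per output, inside the library" using only the Python standard library: the database keeps the raw text (sqlite3's ? placeholder handles SQL quoting), HTML escaping happens only in the HTML renderer, and the JSON renderer leaves escaping to json.dumps. The table and function names are hypothetical.

```python
import html
import json
import sqlite3


def save_comment(conn: sqlite3.Connection, body: str) -> None:
    # Store the raw text; the driver quotes the value via the ? placeholder.
    conn.execute("INSERT INTO comments (body) VALUES (?)", (body,))


def load_comment(conn: sqlite3.Connection, comment_id: int) -> str:
    row = conn.execute(
        "SELECT body FROM comments WHERE id = ?", (comment_id,)
    ).fetchone()
    return row[0]


def render_html(body: str) -> str:
    # HTML escaping happens once, at the HTML output boundary.
    return "<p>" + html.escape(body) + "</p>"


def render_json(body: str) -> str:
    # JSON clients get the original text; json.dumps applies JSON's own escaping.
    return json.dumps({"body": body})
```

With this split, an email or plain-text renderer added later simply reads the raw text and applies whatever escaping its own format needs.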
Doing escaping at the form/http input level doesn't ensure safety, because nothing guarantees that data doesn't get into your database or system from another route. You've got to manually ensure that all inputs to your system are applying HTML escaping.
You may say that you don't have other inputs, but what if your system grows? It's often too late to go back and change your decision, because by this time you've got a ton of data, and may have compatibility with external interfaces e.g. public APIs to worry about, which are all expecting the data to be HTML escaped.
Even web inputs to the system are not safe, because often another layer of encoding is applied, e.g. you might need base64-encoded input at some entry point. Your automatic HTML escaping will miss any HTML encoded within that data. So you will have to do HTML escaping again, remember to do it, and keep track of where you have done it.
I've expanded on these here: http://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea/
The original misconception
Do not confuse sanitization of output with validation.
While <script>alert(1);</script> is a perfectly valid username, it definitely must be escaped before showing on the website.
And yes, there is such a thing as "presentation logic", which is not related to "domain business logic". That presentation logic is what the presentation layer deals with, and the View instances in particular. In a well-written MVC, Views are full-blown objects (contrary to what RoR would try to tell you) which, when applied in a web context, juggle multiple templates.
About your reasons
Different output formats should be handled by different views. The rules and restrictions, which govern HTML, XML, JSON and other formats, are different in each case.
You always need to store the original input (sanitized to avoid injections, if you are not using prepared statements), because someone might need to edit it at some point.
And storing both the original and the XSS-safe "public" version is a waste. If you want to store sanitized output because sanitizing it each time takes too many resources, then you are already barking up the wrong tree. This is a case where you use a cache instead of polluting the database.
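A minimal sketch of that cache idea, assuming each post has a revision number that is bumped on every edit (a hypothetical schema). It is an in-process, unbounded dict for illustration only; a real deployment would evict old entries or use an external cache.

```python
import html
from typing import Callable, Dict, Tuple


class RenderCache:
    """Caches rendered output keyed by (post_id, revision)."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render
        self._entries: Dict[Tuple[int, int], str] = {}

    def get(self, post_id: int, revision: int, raw_body: str) -> str:
        key = (post_id, revision)
        if key not in self._entries:
            # Only the raw text lives in the database; the safe HTML is derived
            # on demand and can be thrown away and rebuilt at any time.
            self._entries[key] = self._render(raw_body)
        return self._entries[key]


cache = RenderCache(html.escape)
```

Because an edit produces a new revision, stale entries are simply never looked up again, so the cache needs no invalidation logic in this sketch.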
I may need to implement this sometime in the future, but I think the trigger for the question now is mainly curiosity.
I was thinking about how to write a text editor for a website I'll build soon, and looked at how this site (and others) do it, so I thought: isn't it a bit too complicated? If tags have to be used in the first place, why not let users use HTML tags? The only reason I can think of is HTML injection, which I don't know much about, but it sounds like an easy issue to solve, doesn't it?
Thank you.
Simply because not all of your users will know HTML. *bold text* is a lot easier to understand (and to read in its raw form) than <b>bold text</b>, especially once you get into links.
The reason we use Markdown, Textile and the rest is to provide a nice alternative that's accessible to more users.
Of course you can still provide the ability to use HTML to your users (it's in the Markdown spec), but you'll have to do a lot of checking to make sure there's nothing malicious going on: for example, blocking <script>, <iframe>, large images, JavaScript in the form <a href="javascript:alert('...');">, etc.
There are several reasons why you should not use HTML tags in such an editor:
1) It might be less complex for the user if you introduce your own reduced tag set.
2) HTML Injection: There is a big risk of dangerous HTML code getting injected.
If you really want to allow HTML code you have to be very careful.
Historically, systems like BBCode were designed to limit available formatting elements to things that would not break the layout of the site, but now, with more mature and smarter HTML parsers, it's not necessary to invent a new markup language just to bar certain unsafe HTML tags.
The current main reason I've seen is that HTML is foreign to most users, and the HTML substitutes are aimed at providing a simplified version of the formatting directives an everyday user would need.
HTML script injection is most emphatically not an easy problem to solve. HTML is a fairly complicated, non-regular language - detecting all possible vulnerabilities is a really hard problem. Many sites have tried, and failed. It's easier, from a vulnerability-prevention POV, to just prohibit HTML entirely, or allow only a small subset of tags.
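One way to "allow only a small subset" without writing an HTML sanitizer is to escape everything first and then turn a tiny markup of your own into tags you generate yourself. This sketch assumes a hypothetical one-rule markup (*text* becomes bold, following the *bold text* example earlier in this thread); it is not Markdown.

```python
import html
import re

# Hypothetical mini-markup: *text* on a single line becomes bold.
_BOLD = re.compile(r"\*([^*\n]+)\*")


def render_markup(text: str) -> str:
    # 1. Escape everything, so no user-supplied tag or attribute survives.
    escaped = html.escape(text)
    # 2. Re-introduce only tags we generate ourselves. The captured text has
    #    already been escaped, so wrapping it in <b> is safe.
    return _BOLD.sub(r"<b>\1</b>", escaped)
```

Adding links to such a scheme reopens the javascript: problem mentioned above, so any URL would still need its own scheme check.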
I am wondering whether the combination of trim(), strip_tags() and addslashes() is enough to filter the values of variables from $_GET and $_POST.
That depends what kind of validation you are wanting to perform.
Here are some basic examples:
If the data is going to be used in MySQL queries, make sure to use mysql_real_escape_string() on the data instead of addslashes().
If it contains file paths, be sure to remove the "../" parts and block access to sensitive filenames.
If you are going to display the data on a web page, make sure to use htmlspecialchars() on it.
But the most important validation is only accepting the values you are expecting, in other words: only allow numeric values when you are expecting numbers, etc.
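A sketch of two of these rules in Python (the thread is about PHP, so treat this as an illustration of the idea, not a drop-in). For file paths it swaps in a different technique than removing "../": it resolves the path first and then checks that it stays inside a base directory, because naive stripping can be bypassed (removing "../" once from "....//" leaves "../"). The 1 to 4 digit rule and both function names are hypothetical.

```python
import os
import re
from typing import Optional


def parse_quantity(raw: str) -> Optional[int]:
    # Accept only what you expect: here, 1 to 4 ASCII digits (hypothetical rule).
    if re.fullmatch(r"[0-9]{1,4}", raw) is None:
        return None
    return int(raw)


def resolve_inside(base_dir: str, user_path: str) -> Optional[str]:
    # Resolve "..", symlinks and absolute paths first, then check containment,
    # instead of trying to strip "../" out of the string.
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, candidate]) != base:
        return None
    return candidate
```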
Short answer: no.
Long answer: it depends.
Basically you can't say that a certain amount of filtering is or isn't sufficient without considering what you want to do with it. For example, the above will allow through "javascript:dostuff();", which might be OK or it might not if you happen to use one of those GET or POST values in the href attribute of a link.
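For the href case specifically, a common approach is a scheme allowlist checked on the raw value before it is HTML-escaped into the attribute. A minimal Python sketch; the allowed schemes are an assumption to adjust per site, and it deliberately rejects any whitespace or control characters because browsers ignore some of them inside URLs ("java\tscript:").

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "mailto"}  # assumption: adjust per site


def is_safe_href(value: str) -> bool:
    value = value.strip()
    # Refuse whitespace/control characters outright instead of normalizing them.
    if any(ord(ch) <= 0x20 or ord(ch) == 0x7F for ch in value):
        return False
    scheme = urlsplit(value).scheme.lower()
    # Relative links have no scheme; anything else must be on the allowlist.
    return scheme == "" or scheme in ALLOWED_SCHEMES
```

The value still has to be HTML-escaped when written into the attribute; the allowlist only decides whether the link is allowed at all.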
Likewise, you might have a rich text area where users can edit formatted text, so stripping tags out of that doesn't exactly make sense.
I guess what I'm trying to say is that there is no simple set of steps to sanitizing your data such that you can cross it off and say "done". You always have to consider what that data is doing.
It depends heavily on what you are going to use it for.
If you are going to display things as HTML, make absolutely sure you are properly specifying the encoding (e.g.: UTF-8). As long as you strip all tags, you should be fine.
For use in SQL queries, addslashes is not enough! If you use the mysqli library, for example, you want to look at mysqli::real_escape_string. For other DB libraries, use the designated escape function!
If you are going to use the string in javascript, addslashes will not be enough.
If you are paranoid about browser bugs, check out the OWASP Reform library
If you use the data in another context than HTML, other escaping techniques apply.
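As an example of a context where addslashes() falls short: it escapes quotes, backslashes and NUL bytes, but not newlines or a closing </script> tag. A Python sketch of one common approach for putting a string inside an inline <script> block: json.dumps produces a valid JavaScript string literal, and <, > and & are additionally escaped as \u003c, \u003e and \u0026 so the text cannot end the script early. The function name is hypothetical.

```python
import json


def js_string_literal(value: str) -> str:
    # json.dumps yields a valid JavaScript string literal: quotes, backslashes,
    # newlines and other control characters are escaped.
    literal = json.dumps(value)
    # Inside an HTML <script> block, "</script>" would still end the script,
    # so also escape <, > and & as \u003c, \u003e and \u0026.
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )
```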
I'm implementing password + password hint code and want to prevent the user from making the password hint reveal the actual password right away.
Here are the scenarios that I want to prevent:
Let's say that the password is: foobar123
Then the password hint can't be:
"foobar123"
"The password is: foobar123"
"f-o-o-b-a-r-1-2-3" (or any other separator of length x)
"f00bar123" (replace o with zeros)
Several questions:
Am I going overboard with this? Should I just let users pay the price for being security unaware?
Am I missing an obvious scenario that I need to prevent also?
Can each scenario be evaluated using regex? This is the most extendable method of adding future checks that I can think of.
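Instead of one regex per scenario, a sketch that normalizes both strings and then does a plain substring test covers all four listed cases at once: lower-case, undo a few look-alike substitutions, drop every separator, and check forwards and backwards. The look-alike table and the function names are hypothetical and deliberately small.

```python
import re

# Hypothetical look-alike table; applied to both password and hint.
_LOOKALIKES = str.maketrans({"0": "o", "@": "a", "$": "s"})


def _normalize(text: str) -> str:
    # Lower-case, undo look-alike substitutions, drop every separator.
    return re.sub(r"[^a-z0-9]", "", text.lower().translate(_LOOKALIKES))


def hint_reveals_password(password: str, hint: str) -> bool:
    pw = _normalize(password)
    if not pw:
        return False
    h = _normalize(hint)
    # Forwards or backwards ("321raboof") counts as revealing it.
    return pw in h or pw[::-1] in h
```

It is easy to defeat: "foo and bar123" passes this check, since it puts extra letters between the pieces of the password.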
I would simply give the user a fixed set of questions to choose from, to which they supply the answer. In this way you are never exposing user input values, only the user's selected value from your pre-canned list of choices. This would avoid your problem entirely.
Alternatively, if you have the user's email address, you could simply have a password reset that sends a link with an encoded key that allows a one-time password change. This way you need not provide a hint, simply a means of changing the password in response to one of these single-use tickets.
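A minimal sketch of such single-use tickets in Python: a random token goes into the emailed link, only its SHA-256 hash is stored, it expires after a fixed time, and it is deleted on first use. The one-hour lifetime and the in-memory dict standing in for a database table are assumptions.

```python
import hashlib
import secrets
import time
from typing import Dict, Optional, Tuple

TOKEN_TTL_SECONDS = 3600  # assumption: one hour

# In-memory stand-in for a table: sha256(token) -> (user_id, expires_at).
_pending: Dict[str, Tuple[str, float]] = {}


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


def issue_reset_token(user_id: str, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)  # goes into the emailed link
    _pending[_digest(token)] = (user_id, now + TOKEN_TTL_SECONDS)  # store only a hash
    return token


def redeem_reset_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the token is valid; each token works only once."""
    now = time.time() if now is None else now
    entry = _pending.pop(_digest(token), None)  # pop makes it single-use
    if entry is None:
        return None
    user_id, expires_at = entry
    return user_id if now <= expires_at else None
```

Storing only the hash means a leaked copy of the table does not hand out working reset links.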
If your threat model makes password hints acceptable, I think you're going overboard with your meticulous password exposure prevention.
However, if your threat model doesn't make them acceptable, but you're being pressured into offering the feature, then be as fascist as you can.
Finally, don't limit people to canned password hints. They're extremely annoying. They imply that you know what is and isn't public knowledge in my life. Most of the sites I notice canned-only password hints on, offer hints that are all a matter of public record.
Good luck!
Personally, I say you are probably going overboard. But it somewhat depends on the severity of compromised data (e.g. is this a web site to vote for Ms. High School, a web site for a high-end auction house, or a web access form for the CIA?), the number of users, and the likelihood that anyone would sue you for negligence in design after using a bad hint and having their access compromised.
You can use a regex for the dumbest ones (e.g. take 6-character substrings of the password and match those substrings against the hint), as well as a character count for the smart ones: e.g. if the hint uses 60 to 80% of the characters in the password (by count), reject it.
An even more nuanced solution is to count with position, e.g. count "o" only if it comes after "f", but this is probably overboard too.
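The two checks from this answer (6-character substrings and a character-count ratio) could be sketched like this in Python. It uses a plain substring test instead of a regex, and the 0.6 threshold is an assumption taken from the low end of the 60 to 80% range mentioned above.

```python
from collections import Counter


def shares_substring(password: str, hint: str, n: int = 6) -> bool:
    # Any run of n consecutive password characters appearing verbatim in the hint.
    pw, h = password.lower(), hint.lower()
    if not pw:
        return False
    if len(pw) < n:
        return pw in h
    return any(pw[i:i + n] in h for i in range(len(pw) - n + 1))


def char_overlap(password: str, hint: str) -> float:
    """Fraction of the password's characters (with multiplicity) also in the hint."""
    pw, h = Counter(password.lower()), Counter(hint.lower())
    total = sum(pw.values())
    if total == 0:
        return 0.0
    shared = sum(min(count, h[ch]) for ch, count in pw.items())
    return shared / total


def reject_hint(password: str, hint: str, threshold: float = 0.6) -> bool:
    return shares_substring(password, hint) or char_overlap(password, hint) >= threshold
```

The ratio ignores order, so it catches "f-o-o-b-a-r-1-2-3", but long hints written in ordinary prose can reach the threshold by chance against short passwords, so the threshold needs tuning.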
Also consider non-hint solutions (multiple choices, non-verbal hints, e-mailable password change requests).
Does it need to be a hinting model?
The way I've done this in the past is to:
A- Have a security question.
B- Have a captcha.
C- Provide a new temporary password to an email on file only that must be changed on first use.
You can't prevent users from doing something dumb. No matter what protections you put in place, they will find a way to get around them. For example:
"321raboof backwards"
"foo and bar123"
"foobar (124 - 1)"
I don't believe there's a deterministic way to generate a hint, unless you're limiting passwords to something like birthdays or given names.
But they wouldn't be strong passwords would they?
Let the user suggest a hint - and pay the price for an obvious one.
Give plenty of advice that the hint shouldn't be obvious, but I think it must be up to the user to decide.
It's not your problem if they compromise the security of their account. Save on unnecessary coding and testing, and just don't worry about this feature!
I am about to change our password hint model to one with canned choices. To those who said it's the user's own problem if they pick a stupid question and answer, I would mention that it becomes the problem of those who work on our help desk tech support. That's what we're trying to avoid.