Should JSON data contain formatted data? - ajax

When using JSON to populate a section of a page I often encounter that data needs special formatting - formatting that need to match that already on the page, which is done serverside.
A number might need to be formatted as a currency, a special date format or wrapped in for negative values.
But where should this formatting take place - doing it clientside will mean that I need to replicate all the formatting that takes place on the serverside. Doing it serverside and placing the formatted values in the JSON object means a less generic and reusable data set.
What is the recommended approach here?

The generic answer is to format data as late/as close to the user as is possible (or perhaps "practical" is a better term).
Irritatingly this means that its an "it depends" answer - and you've more or less already identified the compromise you're going to have to make i.e. do you remove flexibility/portability by formatting server side or do you potentially introduct duplication by doing it client side.
Personally I would tend towards client side unless there's a very good reason not to do so - simply because we're back to trying to format stuff as close to the user as possible, though I would be a bit concerned about making sure that I'm applying the right formatting rules in the browser.

JSON supports the following basic types:
Numbers,
Strings,
Boolean,
Arrays,
Objects
and Null (empty).
A currency is usually nothing else than a number, but formatted according to country-specific rules. And dates are not (yet) included in JSON at all.
Whatever is recommendable depends on what you do in your application and what kind of JScript libraries you are already using. If you are already formatting alot of data in your server side code, then add it there. If not, and you already have some classes included, which can cope with formatting (JQuery and MooTools have some capabilities), do it in the browser.
So either format them in the client or format them before sending them over - both solutions work.
If you want to delve deeper into this, i recommend this wikipedia article about JSON.

Related

Why does protobuf's FieldMask use field names instead of field numbers?

In the docs for FieldMask the paths use the field names (e.g., foo.bar.buzz), which means renaming the message field names can result in a breaking change.
Why doesn't FieldMask use the field numbers to define the path?
Something like 1.3.1?
You may want to consider filing an issue on the GitHub protocolbuffers repo for a definitive answer from the code's authors.
Your proposal seems logical. Using names may be a historical artifact. There's a possibly relevant comment on an issue thread in that repo:
https://github.com/protocolbuffers/protobuf/issues/3793#issuecomment-339734117
"You are right that if you use FieldMasks then you can't safely rename fields. But for that matter, if you use the JSON format or text format then you have the same issue that field names are significant and can't be changed easily. Changing field names really only works if you use the binary format only and avoid FieldMasks."
The answer for your question lies in the fact FieldMasks are a convention/utility developed on top of the proto3 schema definition language, and not a feature of it (and that utility is not present in all of the language bindings)
While you’re right in your observation that it can break easily (as schemas tend evolve and change), you need to consider this design choice from a user friendliness POV:
If you’re building an API and want to allow the user to select the field set present inside the response payload (the common use case for field masks), it’ll be much more convenient for you to allow that using field paths, rather then binary fields indices, as the latter would force the user of the gRPC/protocol generated code to be “aware” of the schema. That’s not always the desired case when providing API as a code software packages.
While implementing this as a proto schema feature can allow the user to have the best of both worlds (specify field paths, have them encoded as binary indices) for binary encoding, it would also:
Complicate code generation requirements
Still be an issue for plain text encoding.
So, you can understand why it was left as an “external utility”.

Purpose of web app input validation for security reasons

I often encounter advice for protecting a web application against a number of vulnerabilities, like SQL injection and other types of injection, by doing input validation.
It's sometimes even said to be the single most important technique.
Personally, I feel that input validation for security reasons is never necessary and better replaced with
if possible, not mixing user input with a programming language at all (e.g. using parameterized SQL statements instead of concatenating input in the query strings)
escaping user input before mixing it with a programming or markup language (e.g. html escaping, javascript escaping, ...)
Of course for a good UX it's best to catch input that would generate errors on the backand early in the GUI, but that's another matter.
Am I missing something or is the only purpose to try to make up for mistakes against the above two rules?
Yes you are generally correct.
A piece of data is only dangerous when "used". And it is only dangerous if it has special meaning in the context it is used.
For example, <script> is only dangerous if used in output to an HTML page.
Robert'); DROP TABLE Students;-- is only dangerous when used in a database query.
Generally, you want to make this data "safe" as late as possible. Such as HTML encoding when output as HTML to an HTML page, and parameterised when inserting into a database. The big advantage of this is that when the data is later retrieved from these locations, it will be returned in its original, unsanitized format.
So if you have the value A&B O'Leary in an input field, it would be encoded like so:
<input type="hidden" value="A& O'Leary" />
and if this is submitted to your application, your programming framework will automatically decode it for you back to A&B O'Leary. Same with your DB:
string name = "A&B O'Leary";
string sql = "INSERT INTO Customers (Name) VALUES (#Name)";
SqlCommand command = new SqlCommand(sql);
command.Parameters.Add("#Name", name];
Simples.
Additionally if you then need to give the user any output in plain text, you should retrieve it from your DB and spit it out. Or in JavaScript - you just JavaScript entity encode (although best avoided for complexity reasons - I find it easier to secure if I only output to HTML then read the values from the DOM).
If you'd HTML encoded it early, then to output to JavaScript/JSON you'd first have to convert it back then hex entity encode it. It will get messy and some developers will forget they have to decode first and you will have &amps everywhere.
You can use validation as an additional defence, but it should not be the first port of call. For example, if you are validating a UK postcode you would want to whitelist the alphanumeric characters in upper and lower cases. Any other characters would be rejected or removed by your application. This can reduce the chances of SQLi or XSS occurring on your application, but this method falls down where you need inputs to include characters that have special meaning to your output context (" '<> etc). For example, on Stack Overflow if they did not allow characters such as these you would be preventing questions and answers from including code snippets which would pretty much make the site useless.
Not all SQL statements are parameterizable. For example, if you need to use dynamic identifiers (as opposed to literals). Even whitelisting can be hard, sometimes it needs to be dynamic.
Escaping XSS on output is a good idea. Until you forget to escape it on your admin dashboard too and they steal all your admin's cookies. Don't let XSS in your database.

i18n server-side vs. client-side

Seeking some advice on two approaches to internationalization & localization. I have a web app using Spring MVC and Dojo, and I would like to support multiple languages. So, I could:
Use <spring:message> to generate the appropriate text on the server side using a properties file.
Use dojo/i18n to select the appropriate text on the client side using a js file.
And of course any combination of the two is also an option.
So, what are the pros and cons of each approach? When would you use one vs. the other?
The combination of these two approaches is the only reasonable answer.
Basically, you should try to stick to server-side, and only do client-side when it is really necessary (there is no other way, like you have some dynamically created controls).
Pros and cons? The main con of client-side string externalization is, you won't be able to translate everything correctly. That's because of context. The same English terms might be translated in a different way, depending on the context.
At the same time, you will often need to format message (add parameters to your message tag), which in regular Java you would do by calling MessageFormat.format(). Theoretically, you could do that on the client-side, but this is risky to say the least. You won't have access to original message parts (like dates, some data sources, whatever) and it might hurt translation correctness.
Formatting dates, numbers, etc. is more painful on the client-side. It is possible with Dojo or jQuery Globalize, but the results might not as good as they should be. But Spring has problem with formatting dates, anyway (lack of default local date/time designation, you may only choose from short, medium, long, full, which to me is completely useless).
Another issue might be handling plural forms (non-English). Believe, or not but languages may have more than one plural form (depending on the quantity) and because of that translations might differ. I don't think Dojo is handling it at all (however, I might be mistaken, some time has past since I evaluated it). Spring won't handle it as well, but you may build custom solution based on ICU's PluralRules (or PluralFormat if you're hard enough to learn formatting and want to kill the translators at the same time).
To cut a long story short, doing I18n correctly is far from being easy and you'll get better support on the server side.
BTW. I remember Dojo as quite "heavy", library itself was over 1MB... It might take a while to load it and your application might seem slow comparing to others... That was one of the reasons, I recommended Globalize rather than Dojo for our projects. It might not have so many features, but at least it seems lightweight.

HTML/XSS escape on input vs output

From everything I've seen, it seems like the convention for escaping html on user-entered content (for the purposes of preventing XSS) is to do it when rendering content. Most templating languages seem to do it by default, and I've come across things like this stackoverflow answer arguing that this logic is the job of the presentation layer.
So my question is, why is this the case? To me it seems cleaner to escape on input (i.e. form or model validation) so you can work under the assumption that anything in the database is safe to display on a page, for the following reasons:
Variety of output formats - for a modern web app, you may be using a combination of server-side html rendering, a JavaScript web app using AJAX/JSON, and mobile app that receives JSON (and which may or may not have some webviews, which may be JavaScript apps or server-rendered html). So you have to deal with html escaping all over the place. But input will always get instantiated as a model (and validated) before being saved to db, and your models can all inherit from the same base class.
You already have to be careful about input to prevent code-injection attacks (granted this is usually abstracted to the ORM or db cursor, but still), so why not also worry about html escaping here so you don't have to worry about anything security-related on output?
I would love to hear the arguments as to why html escaping on page render is preferred
In addition to what has been written already:
Precisely because you have a variety of output formats, and you cannot guarantee that all of them will need HTML escaping. If you are serving data over a JSON API, you have no idea whether the client needs it for a HTML page or a text output (e.g. an email). Why should you force your client to unescape "Jack & Jill" to get "Jack & Jill"?
You are corrupting your data by default.
When someone does a keyword search for 'amp', they get "Jack & Jill". Why? Because you've corrupted your data.
Suppose one of the inputs is a URL: http://example.com/?x=1&y=2. You want to parse this URL, and extract the y parameter if it exists. This silently fails, because your URL has been corrupted into http://example.com/?x=1&y=2.
It's simply the wrong layer to do it - HTML related stuff should not be mixed up with raw HTTP handling. The database shouldn't be storing things that are related to one possible output format.
XSS and SQL Injection are not the only security problems, there are issues for every output you deal with - such as filesystem (think extensions like '.php' that cause web servers to execute code) and SMTP (think newline characters), and any number of others. Thinking you can "deal with security on input and then forget about it" decreases security. Rather you should be delegating escaping to specific backends that don't trust their input data.
You shouldn't be doing HTML escaping "all over the place". You should be doing it exactly once for every output that needs it - just like with any escaping for any backend. For SQL, you should be doing SQL escaping once, same goes for SMTP etc. Usually, you won't be doing any escaping - you'll be using a library that handles it for you.
If you are using sensible frameworks/libraries, this is not hard. I never manually apply SQL/SMTP/HTML escaping in my web apps, and I never have XSS/SQL injection vulnerabilities. If your method of building web pages requires you to remember to apply escaping, or end up with a vulnerability, you are doing it wrong.
Doing escaping at the form/http input level doesn't ensure safety, because nothing guarantees that data doesn't get into your database or system from another route. You've got to manually ensure that all inputs to your system are applying HTML escaping.
You may say that you don't have other inputs, but what if your system grows? It's often too late to go back and change your decision, because by this time you've got a ton of data, and may have compatibility with external interfaces e.g. public APIs to worry about, which are all expecting the data to be HTML escaped.
Even web inputs to the system are not safe, because often you have another layer of encoding applied e.g. you might need base64 encoded input in some entry point. Your automatic HTML escaping will miss any HTML encoded within that data. So you will have to do HTML escaping again, and remember to do, and keep track of where you have done it.
I've expanded on these here: http://lukeplant.me.uk/blog/posts/why-escape-on-input-is-a-bad-idea/
The original misconception
Do not confuse sanitation of output with validation.
While <script>alert(1);</script> is a perfectly valid username, it definitely must be escaped before showing on the website.
And yes, there is such a thing as "presentation logic", which is not related to "domain business logic". And said presentation logic is what presentation layer deals with. And the View instances in particular. In a well written MVC, Views are full-blown objects (contrary to what RoR would try to to tell you), which, when applied in web context, juggle multiple templates.
About your reasons
Different output formats should be handled by different views. The rules and restrictions, which govern HTML, XML, JSON and other formats, are different in each case.
You always need to store the original input (sanitized to avoid injections, if you are not using prepared statements), because someone might need to edit it at some point.
And storing original and the xss-safe "public" version is waste. If you want to store sanitized output, because it takes too much resources to sanitize it each time, then you are already pissing at the wrong tree. This is a case, when you use cache, instead of polluting the database.

filtering user input in php

Am wondering if the combination of trim(), strip_tags() and addslashes() is enough to filter values of variables from $_GET and $_POST
That depends what kind of validation you are wanting to perform.
Here are some basic examples:
If the data is going to be used in MySQL queries make sure to use mysql_real_escape_query() on the data instead of addslashes().
If it contains file paths be sure to remove the "../" parts and block access to sensitive filename.
If you are going to display the data on a web page, make sure to use htmlspecialchars() on it.
But the most important validation is only accepting the values you are expecting, in other words: only allow numeric values when you are expecting numbers, etc.
Short answer: no.
Long answer: it depends.
Basically you can't say that a certain amount of filtering is or isn't sufficient without considering what you want to do with it. For example, the above will allow through "javascript:dostuff();", which might be OK or it might not if you happen to use one of those GET or POST values in the href attribute of a link.
Likewise you might have a rich text area where users can edit so stripping tags out of that doesn't exactly make sense.
I guess what I'm trying to say is that there is simple set of steps to sanitizing your data such that you can cross it off and say "done". You always have to consider what that data is doing.
It highly depends where you are going to use it for.
If you are going to display things as HTML, make absolutely sure you are properly specifying the encoding (e.g.: UTF-8). As long as you strip all tags, you should be fine.
For use in SQL queries, addslashes is not enough! If you use the mysqli library for example, you want to look at mysql::real_escape_string. For other DB libraries, use the designated escape function!
If you are going to use the string in javascript, addslashes will not be enough.
If you are paranoid about browser bugs, check out the OWASP Reform library
If you use the data in another context than HTML, other escaping techniques apply.

Resources