UPDATE: It was suggested in the comments that I create a wiki for this. I have done so; you can find it here (should you wish to keep tabs on it and/or contribute).
http://vrs.tomelders.com
I've never worked on anything like this before, so I'm completely winging it.
I want to work on an open "standard" or "language", or... well, I don't really know what to call it... to make form validation easier. I'm calling it VRS (Validation Rule Sheets), but at this stage everything is negotiable.
The idea is to create a sheet of rules, similar to CSS, that defines how forms should be validated. This will require:
A Syntax / Specification
A VRS Parser to convert the VRS into something useable
A VRS Processor to compare the form data against the rules and return a response.
A response format.
The benefits of a system like this would be
A platform/language agnostic way to define form validations.
A cross platform, highly portable way to define form validations.
Easy to read, easy to setup, easy to modify.
Client side and backend integration.
First things first, though: what should the syntax / specification look like?
The way I see this working online is that a VRS file could be specified as a hidden field and the application routes the supplied form data through the VRS processor before processing it.
By way of an example, validating the content type of the "name" field would look like this:
name {
content-type: alpha;
}
content-type could be one of three values: alpha, numeric or alphanumeric.
Hopefully that makes sense. I've never done anything like this before, so I'm eager to get other people's input. Here's as far as I've gotten:
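To make the parser and processor steps a bit more concrete, here is a rough Python sketch. It is not part of the proposal: the function names parse_vrs and validate, the regular expressions, and the failure format are all assumptions for illustration, and it only understands the content-type rule (using the "alphanumeric" spelling from the rule list).

```python
import re

# Hypothetical character classes for the three content-type values.
CONTENT_TYPES = {
    "alpha": re.compile(r"[A-Za-z]+"),
    "numeric": re.compile(r"[0-9]+"),
    "alphanumeric": re.compile(r"[A-Za-z0-9]+"),
}

# Matches one 'field { ... }' block; nested braces are not supported.
BLOCK = re.compile(r"([\w-]+)\s*\{([^}]*)\}")


def parse_vrs(sheet):
    """Turn 'field { rule: value; }' blocks into {field: {rule: value}}."""
    rules = {}
    for field, body in BLOCK.findall(sheet):
        declarations = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                key, value = declaration.split(":", 1)
                declarations[key.strip()] = value.strip()
        rules[field] = declarations
    return rules


def validate(rules, form):
    """Return a list of (field, rule) pairs that failed."""
    failures = []
    for field, declarations in rules.items():
        value = form.get(field, "")
        content_type = declarations.get("content-type")
        if content_type and not CONTENT_TYPES[content_type].fullmatch(value):
            failures.append((field, "content-type"))
    return failures


sheet = """
name {
content-type: alpha;
}
"""
rules = parse_vrs(sheet)
print(rules)                               # {'name': {'content-type': 'alpha'}}
print(validate(rules, {"name": "Tom"}))    # []
print(validate(rules, {"name": "Tom42"}))  # [('name', 'content-type')]
```

The list of failed (field, rule) pairs stands in for the "response format" item above; a real format would probably carry more detail.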
------------------------------------------------------------
content-type: (string) alphanumeric | alpha | numeric
Restricts content to either numeric, text or alphanumeric.
------------------------------------------------------------
is-fatal: BOOL
If the rule fails, is it fatal? This could be really useful
for AJAX responses.
------------------------------------------------------------
allow-null: BOOL
Whether a field can be empty or not. Good for required fields
and checkboxes
------------------------------------------------------------
pattern-match: (string) email | url | regex
match the field against a pattern.
------------------------------------------------------------
field-match: (string) field_name
compares a field to another field. eg: password confirmation
------------------------------------------------------------
greater-than: INT | FLOAT
less-than: INT | FLOAT
within-range: INT | FLOAT, INT | FLOAT
Pretty self explanatory. With regard to strings however,
the string length is compared against the params.
------------------------------------------------------------
is-unique: (func) connection(host,user,pass), (func) map(table, field)
Check the value against a field in the database to see if
it's unique.
------------------------------------------------------------
date & time validations
This I'm a bit worried about in terms of terminology. I also
want to include dynamic vars in VRS such as
#now
#today
#thisMonth
#thisYear
------------------------------------------------------------
before: STRING | VAR
after: STRING | VAR
Again, self explanatory. Although I'm unsure what the date/time
format should be. UTC?
------------------------------------------------------------
Elapsed Time:
I'm completely stuck on how to handle conditions like
"years elapsed since date is more than 16"
I don't relish the idea of rules as prolix as
years-elapsed-since-now-are-more-than:18;
years-elapsed-since-now-are-less-than:18;
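One possible way around the prolix names, sketched in Python purely as an illustration: a single rule that takes an operator and a threshold, with the elapsed-years arithmetic done by the processor. The rule shape ("years-elapsed: > 16") and the function names below are hypothetical, not part of the VRS proposal.

```python
from datetime import date


def years_elapsed(since, today):
    """Whole years between two dates, the way an age is computed."""
    years = today.year - since.year
    # Subtract one if this year's anniversary has not happened yet.
    if (today.month, today.day) < (since.month, since.day):
        years -= 1
    return years


def check_years_elapsed(value, operator, threshold, today):
    """Evaluate a hypothetical rule such as 'years-elapsed: > 16'."""
    elapsed = years_elapsed(value, today)
    if operator == ">":
        return elapsed > threshold
    if operator == "<":
        return elapsed < threshold
    raise ValueError("unsupported operator: " + operator)


today = date(2024, 6, 1)  # stands in for the proposed #today variable
print(years_elapsed(date(2008, 6, 2), today))                 # 15
print(check_years_elapsed(date(2000, 1, 1), ">", 16, today))  # True
```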
Finally, I'm debating whether devs should be able to specify the errors/warnings in the VRS, or whether they should do that when handling the response.
So, that's a lot to take in and I hope it's clear. My question(s) I guess are
Good idea / bad idea?
Is this the right kind of syntax?
Are there more elegant ways of naming the rules?
What's missing?
thanks
UPDATE: A few people have stated that this proposed system is a bad idea. If you think so, please provide a scenario in which it wouldn't work. Thinking it's a bad idea is one thing, proving it's a bad idea is another, and I'd like to see proof that it's a bad idea sooner rather than later. If you really think form validation could not be made easier or less tedious, please explain why.
In addition, I'm aware that form validation is not a new issue. However, there is currently no portable, cross platform, cross language solution to address form validation, which is what this proposal is specifically addressing.
I like the idea of putting the error messages in the VRS too. But they should be specific to the rule that failed.
Also, you might consider not developing an entirely new "language" but using something like YAML, for which parsers already exist.
I see this language as being useful as you could use the same VRS for both client- and server-side validation.
PS: This should be community wiki methinks.
I suppose it is a good idea, if you can maintain it yourself.
Remember, you're making the syntax. It's up to you (so long as it looks decent).
No, not really. So long as the names are obvious (look like what they are), and aren't too long or confusing, then they're fine.
Perhaps you should note default values for the rules when they aren't specified.
Good idea / bad idea?
Generally, this kind of thing is a bad idea. That's what PHP is for.
What's wrong with http://www.phpformclass.com/
http://www.x-code.com/vdaemon_web_form_validation.php
or other PHP form management tools?
Is this the right kind of syntax?
No. What's wrong with PHP? It has good syntax for this kind of thing.
Are there more elegant ways of naming the rules?
Yes. PHP object classes. Numerous other projects. You're not the first person to validate form input.
What's missing?
Answering the fundamental question: What's wrong with PHP?
A list of related projects that already do this and specific reasons why your project is better than all the other ones already out there.
I named one of my boolean parameters didInfoChange.
Many people on my team tell me to change it to isInfoChanged, which I don't agree with. It may be because my team members aren't native English speakers (neither am I), but I feel that isInfoChanged just isn't right.
didInfoChange -> Did information change? -> True/False
is pretty understandable in my opinion
isInfoChanged -> is info changed?
just does not sound right.
It's probably not a big deal to fight about this, but I did some searching and people do not really use did for flag names. I'm OK with hasInfoChanged, but has and did are basically the same thing.
I'm wondering why did is not OK.
There are two questions here:
1) Which is better, didInfoChange or isInfoChanged?
The English word "change" can be transitive or intransitive, but in this context it is clear that "the info is changed" and "the info did change" mean exactly the same thing. (There is a subtle difference in connotation, but it is of no importance here.) The two names have the same length. There seems to be no difference except style convention.
2) If your way is better than theirs, what should you do?
Consider the consequences of your actions.
If you have the power to persuade the rest of the team to use your variable name, at no cost, then do so. If doing so would cause stress (e.g. by commanding your subordinates to do something they consider a bad idea), then the improvement in style probably isn't worth the cost to the group dynamic.
If you cannot persuade them, but you can prolong the argument and prevent the team from doing constructive work, then... don't. Use their variable name.
If you cannot prolong the argument, but you can make yourself unpopular by being argumentative, then... don't. Use their variable name.
Besides is, it is also sometimes admissible to use has in naming Boolean getter methods, depending on which auxiliary verb would be used in spoken language. I have never seen did as part of a Boolean identifier.
With hasInfoChanged you would keep the participle ending (e)d. Maybe that satisfies the rest of your team.
infoChanged could be mistaken for an EventHandler-Delegate.
Unfortunately I am not a native English speaker, either.
It depends on the context and what this field "really" expresses in semantics.
didInfoChange puts emphasis on the completed action (past-tense implied) by "did" + action-verb
isInfoChanged puts emphasis on the current state at the moment of asking, by is + state; the past tense is indicated by the passive "changed"
Note: Info is the vague part in the name. is is a common indicator for boolean fields or getters, indicating a question (the same as has or can). did is rather rarely used because we usually ask for the current state at runtime using is. The completion or history can be expressed by other parts of the name, like a specific action verb in the past tense.
Other ways of recording/asking for change
What about asking more about the context of this change:
Who changed the information? Like an audit field (changedBy), also conveying who changed something.
When was the information changed? Like an audit date/time field (changedAt), telling not only that it was changed, but also when.
What information was changed? Like capturing the change itself (lastChange), which could also be null if nothing changed at all.
In most ORM-frameworks which capture audit-information like this (when/who/what was changed) we can see fields like createdBy or createdAt for the initial user and timestamp when created, modifiedBy or modifiedAt for the user and timestamp that last updated or changed the object.
Sometimes also a version-indicator helps to keep track of the number of changes.
Keep it simple
One compromise could be inspired by the KISS principle:
Have a boolean field changed (which could, for enrichment purposes, also hold a timestamp) along with a getter named isChanged that queries the current state, like asking the human question:
is [this] changed ?
Note: [this] is implied when invoking the method on the object like this.isChanged().
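A minimal Python sketch of that compromise, with a timestamp doing double duty as the flag. The class name Record, the changed_at field and the snake_case is_changed spelling are illustrative assumptions, not something from the question:

```python
from datetime import datetime, timezone


class Record:
    """Hypothetical object that tracks whether its info was changed."""

    def __init__(self, info):
        self._info = info
        self.changed_at = None  # None means "never changed"

    @property
    def info(self):
        return self._info

    @info.setter
    def info(self, value):
        # Only a real difference counts as a change.
        if value != self._info:
            self._info = value
            self.changed_at = datetime.now(timezone.utc)

    def is_changed(self):
        """Boolean getter answering 'is [this] changed?'."""
        return self.changed_at is not None


record = Record("draft")
print(record.is_changed())  # False
record.info = "final"
print(record.is_changed())  # True
```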
Problem: we want to use SonarJS, but much of our old JavaScript code uses functions from the Microsoft ASP.Net framework (and the MS AjaxToolkit). As such we have a couple of hundred occurrences of the error "XXX" does not exist. Change its name or declare it so that its usage doesn't result in a "ReferenceError". (where XXX is Sys, Type, $get etc.).
I appreciate that I could suppress these by specifying them all in the sonar.javascript.globals property (as per Elena Vilchik's answer to this question), but it feels like what I really want to do is to add my own bespoke entries in sonar.javascript.environments (called msajax and msajaxtoolkit, say). Then I could be more precise about when to include / exclude these globals.
So I guess I would like to know whether defining my own environment is supported, or if there is a more elegant solution overall.
Thanks in advance.
You are more than welcome to open a pull request for https://github.com/SonarSource/sonar-javascript. Edit "javascript-frontend/src/main/resources/org/sonar/javascript/tree/symbols/globals.json" by adding a new group or groups of names.
This may be a stupid question, but here goes.
I've seen several projects using some translation library (e.g. gettext) working with plain English placeholders. So for example:
_("Please enter your name");
instead of abstract placeholders (which has always been my instinctive preference)
_("error_please_enter_name");
I have seen various recommendations on SO to work with the former method, but I don't understand why. What I don't get is what you do if you need to change the English wording. Because if the actual text is used as the key for all existing translations, you would have to edit all the translations, too, and change each key. Or don't you?
Isn't that awfully cumbersome? Why is this the industry standard?
It's definitely not proper normalization to do it this way. Are there massive advantages to this method that I'm not seeing?
Yes, you have to alter the existing translation files, and that is a good thing.
If you change the English wording, the translations probably need to change, too. Even if they don't, you need someone who speaks the other language to check.
You prep a new version, and part of the QA process is checking the translations. If the English wording changed and nobody checked the translation, it'll stick out like a sore thumb and it'll get fixed.
The main language already exists: you don't need to translate it.
Translators have better context with a real sentence than vague placeholders.
The placeholders are just the keys; it's still possible to change the original language by creating a translation for it. When a translation doesn't exist, the placeholder is used as the translated text.
We've been using abstract placeholders for a while and it was pretty annoying having to write everything twice when creating a new function. When English is the placeholder, you just write the code in English, you have meaningful output from the start and don't have to think about naming placeholders.
So my reason would be less work for the developers.
I like your second approach. When translating texts you always have the problem of homonyms. For example, 'open' can mean the state of a window but also the verb for performing the action. In other languages these homonyms may not exist. That's why you should be able to add meaning to your placeholders. The best approach is to put this meaning in your text library. If this is not possible on the platform or framework you use, it might be a good idea to define a 'development language'. This language adds meaning to the text entries, like 'action_open' and 'state_open'. You will of course have to put extra effort into translating this language to plain English (or the language you develop for). I have applied this philosophy in some large projects, and in the long run it saves some time (and headaches).
The best way in my opinion is keeping meaning separate so if you develop your own translation library or the one you use supports it you can do something like this:
_(i18n("Please enter your name", "error_please_enter_name"));
Where:
i18n(text, meaning)
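A small Python sketch of what such an i18n(text, meaning) lookup might do. The catalog contents and the extra catalog parameter are assumptions made up for the example; gettext itself offers a comparable feature called message context (msgctxt).

```python
# Hypothetical French catalog keyed by (meaning, text), so the homonym
# "Open" can be translated differently per meaning.
CATALOG_FR = {
    ("action_open", "Open"): "Ouvrir",
    ("state_open", "Open"): "Ouvert",
}


def i18n(text, meaning, catalog):
    """Look up text under a meaning; fall back to the source text."""
    return catalog.get((meaning, text), text)


print(i18n("Open", "action_open", CATALOG_FR))   # Ouvrir
print(i18n("Open", "state_open", CATALOG_FR))    # Ouvert
print(i18n("Close", "action_close", CATALOG_FR)) # Close
```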
Interesting question. I assume the main reason is that you don't have to care about translation or localization files during development as the main language is in the code itself.
Well it probably is just that it's easier to read, and so easier to translate. I'm of the opinion that your way is best for scalability, but it does just require that extra bit of effort, which some developers might not consider worth it... and for some projects, it probably isn't.
There's a fallback hierarchy, from most specific locale to the unlocalised version in the source code.
So French in France might have the following fallback route:
fr_FR
fr
Unlocalised. Source code.
As a result, having proper English sentences in the source code ensures that if a particular translation is not provided in step (1) or (2), you will at least get a proper, understandable sentence rather than random programmer garbage like “error_file_not_found”.
Plus, what do you do if it is a format string: “Sorry but the %s does not exist” ? Worse still: “Written %s entries to %s, total size: %d” ?
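The fallback route can be sketched in a few lines of Python. The in-memory catalogs, the French wording and the helper names are made up for the example; a real setup would load translation files instead.

```python
# Hypothetical catalogs keyed by locale; fr_FR has no entries of its own.
CATALOGS = {
    "fr_FR": {},
    "fr": {"Sorry but the %s does not exist": "Désolé, mais %s n'existe pas"},
}


def fallback_chain(locale):
    """'fr_FR' -> ['fr_FR', 'fr']: most specific locale first."""
    chain = [locale]
    if "_" in locale:
        chain.append(locale.split("_", 1)[0])
    return chain


def translate(message, locale):
    for candidate in fallback_chain(locale):
        translated = CATALOGS.get(candidate, {}).get(message)
        if translated is not None:
            return translated
    return message  # step 3: the unlocalised source string


print(translate("Sorry but the %s does not exist", "fr_FR") % "fichier")
# Désolé, mais fichier n'existe pas
print(translate("Written %s entries to %s, total size: %d", "fr_FR")
      % (3, "log.txt", 42))
# Written 3 entries to log.txt, total size: 42
```

Because the untranslated format string falls through to the source text, the %s and %d placeholders still line up with the arguments.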
Quite old question but one additional reason I haven't seen in the answers yet:
You could end up with more placeholders than necessary, and thus more work for translators and possibly inconsistent translations. However, good editors like Poedit or Gtranslator can probably help with that.
To stick with your example:
The text "Please enter your name" could appear in a different context in a different template (that the developer is most likely not aware of and shouldn't need to be). E.g. it could be used not as an error but as a prompt like a placeholder of an input field.
If you use
_("Please enter your name");
it would be reusable, the developer can be unaware of the already existing key for an error message and would just use the same text intuitively.
However, if you used
_("error_please_enter_name");
in a previous template, developers wouldn't necessarily be aware of it and would make up a second key (most likely according to a predefined wording scheme to not end up in complete chaos), e.g.
_("prompt_please_enter_name");
which then has to be translated again.
So I think that doesn't scale very well. A pre-agreed wording scheme of suffixes/prefixes e.g. for contexts can never be as precise as the text itself I think (either too verbose or too general, beforehand you don't know and afterwards it's difficult to change) and is more work for the developer that's not worth it IMHO.
Does anybody agree/disagree?
I am wondering if the combination of trim(), strip_tags() and addslashes() is enough to filter values of variables from $_GET and $_POST.
That depends what kind of validation you are wanting to perform.
Here are some basic examples:
If the data is going to be used in MySQL queries, make sure to use mysql_real_escape_string() on the data instead of addslashes().
If it contains file paths, be sure to remove the "../" parts and block access to sensitive filenames.
If you are going to display the data on a web page, make sure to use htmlspecialchars() on it.
But the most important validation is only accepting the values you are expecting, in other words: only allow numeric values when you are expecting numbers, etc.
Short answer: no.
Long answer: it depends.
Basically you can't say that a certain amount of filtering is or isn't sufficient without considering what you want to do with it. For example, the above will allow through "javascript:dostuff();", which might be OK or it might not if you happen to use one of those GET or POST values in the href attribute of a link.
Likewise you might have a rich text area where users can edit so stripping tags out of that doesn't exactly make sense.
I guess what I'm trying to say is that there is no simple set of steps for sanitizing your data such that you can cross it off and say "done". You always have to consider what that data is doing.
It depends heavily on what you are going to use it for.
If you are going to display things as HTML, make absolutely sure you are properly specifying the encoding (e.g.: UTF-8). As long as you strip all tags, you should be fine.
For use in SQL queries, addslashes is not enough! If you use the mysqli library, for example, you want to look at mysqli::real_escape_string. For other DB libraries, use the designated escape function!
If you are going to use the string in JavaScript, addslashes will not be enough.
If you are paranoid about browser bugs, check out the OWASP Reform library
If you use the data in another context than HTML, other escaping techniques apply.
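To illustrate the "it depends on the context" point, here is a short sketch in Python rather than PHP, using only the standard library. It swaps in parameterized queries for the escape functions named above, which is a different (and commonly preferred) technique; the users table and the function names are made up for the example.

```python
import html
import sqlite3


def render_comment(raw):
    """HTML context: escape the value when it is written into the page."""
    return "<p>" + html.escape(raw) + "</p>"


def parse_page_number(raw):
    """Validation: accept only the values you expect (ASCII digits here)."""
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError("page number must be numeric")
    return int(raw)


def find_user(connection, name):
    """SQL context: let the driver bind the value as a parameter."""
    cursor = connection.execute("SELECT id FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
connection.execute("INSERT INTO users (name) VALUES (?)", ("O'Brien",))

print(render_comment("<script>alert(1)</script>"))
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
print(find_user(connection, "O'Brien"))       # [(1,)]
print(find_user(connection, "x' OR '1'='1"))  # []
```

Each function handles exactly one output context, which is why no single filter applied to $_GET and $_POST up front can replace them.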
In the path:
Format: http://mydomain.com/{category}/{subcategory}/{pageNumber}/{pageSize}
Example: http://mydomain.com/books/thriller/3/25
In the querystring:
Format: http://mydomain.com/{category}/{subcategory}?pageNumber={pageNumber}&pageSize={pageSize}
Example: http://mydomain.com/books/thriller?pageNumber=3&pageSize=25
I like having everything on the path, but my problem with that is that while it is obvious (or at least somewhat obvious) what "books" and "thriller" are in first example, the "3" and "25" seem pretty arbitrary by contrast.
Is there a canonical method for determining what goes where in MVC, or is it really just up to the dev?
I prefer things like pagenumbers to be in the querystring variables. I think there's a difference in descriptiveness between
http://mydomain.com/books/thriller?pagesize=50&page=4
and
http://mydomain.com/books/thriller/50/4
The point (to me) of having clean URLs is for them to be more descriptive and readable, and I find the first example to be just that.
One interesting point made by JohnRudolfLewis is:
One rule of thumb that I follow is that if the argument is required, consider using the path, if the argument is optional, always use querystring arguments.
Overall, I'd stick to whatever makes the url look more readable.
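As a rough, framework-independent illustration of that rule of thumb, the Python sketch below treats category and subcategory as required path segments and the paging values as optional query arguments. The function name and the defaults (page 1, 30 per page) are assumptions for the example.

```python
from urllib.parse import parse_qs, urlsplit


def parse_listing_url(url, default_page=1, default_size=30):
    """Required parts come from the path, optional ones from the query."""
    parts = urlsplit(url)
    # Raises ValueError if either required path segment is missing.
    category, subcategory = parts.path.strip("/").split("/")[:2]
    query = parse_qs(parts.query)
    return {
        "category": category,
        "subcategory": subcategory,
        "page_number": int(query.get("pageNumber", [default_page])[0]),
        "page_size": int(query.get("pageSize", [default_size])[0]),
    }


print(parse_listing_url("http://mydomain.com/books/thriller?pageNumber=3&pageSize=25"))
# {'category': 'books', 'subcategory': 'thriller', 'page_number': 3, 'page_size': 25}
print(parse_listing_url("http://mydomain.com/books/thriller"))
# {'category': 'books', 'subcategory': 'thriller', 'page_number': 1, 'page_size': 30}
```

Leaving the paging values out still yields a valid URL, which is harder to arrange when they are positional path segments.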
This site puts it in the querystring: https://stackoverflow.com/questions?page=2&pagesize=30
Well, it's obviously up to you. But, you're designing a RESTful interface that's supposed to be human readable. The querystring is much better in that regard. Otherwise you're looking at two numbers that could really be anything. And who's going to remember the order?
Is there a canonical method for determining what goes where in MVC, or is it really just up to the dev?
It's up to you.
MVC is about the organization/flow of your server-side code and separating the view from the business layer, not so much about query parameters.
You could also consider the following
Format
http://mydomain.com/{category}/{subcategory}/page/{pageNumber}/results/{pageSize}
Example
http://mydomain.com/books/thriller/page/3/results/25
It is pretty much up to the dev. I would say put the pageSize in the URL.