Handling comments in code when doing i18n - internationalization

I'm in the process of translating a Open Source project from Chinese to English, and I've used i18n (in this case babel) to separate the code from both English and Chinese translations.
Everything's done, except for a rather large number of inline comments in the code.
Obviously, babel can't translate comments inline (and it would be rather obnoxious if it did, anyway. Since code would not be unique across languages and therefore less easily verifiable.)
The way I see it, there are a number of options:
Leave comments in -
Pro: Helps original author, etc.
Con: Makes it distracting for ongoing translation and anyone who doesn't speak the language
Strip out all the comments -
Pro: Code is native-language-agnostic, so it makes sense. Who needs comments anyway? Use the source, Luke!
Con: Goes against SE principles. Could lose something important in understanding how the code works - maybe something's been done to avoid a security risk, etc.
Place English comments near Chinese comments
(Possibly moved to lines above and prefixed with "EN" and "ZH", for example).
Pro: Best of both worlds, comments kept close to code
Con: Not conducive to dictionary-style translation. Can get bulky with more languages.
Create a comment dictionary / notes
Pro: Keeps the comments in a separate file for easy translation.
Con: Difficult to keep synced with code. Not intuitive to remember to update comments related to code when changing coe.
Use a different preprocessor for i18n before/after each development cycle.
Pro: Comments et al would be in your language. Could link this to git pull/push so you only ever see the code in your language.
Con: Bulky, non-obvious process. Could result in code-verification or even compilation errors.
None of these seem like really great solutions.
If you do alot of this, and the code is shared publicly between developers who don't share a native tongue, is there a recommended way to handle translating (or not) comments in the code itself?

I am not sure I understand... You say you separated the code from the languages part. So now you should have code (with comments) + English resources + Chinese resources (i used resources for whatever your programming language use to store localizable content)
Translators only see the resources, not the code, nor the comments. The comments stay untranslated, for the developers.

Short Answer
It seems to be a mixture of:
Strip out all the comments, and
Place English comments near Chinese comments.
Inline comments are almost always trivial - Strip them
Functional comments are not as intrusive - Translate them (possibly with a i18n prefix e.g. "[cn]:" or "[en]:").
Explanation
My meagre amount of research tends to suggest that larger projects make strong attempts to reduce comments and let the code explain itself, instead focusing on code quality which reduces the need for comments.
e.g. From the Linux Kernel Coding guidelines:
NEVER try to explain HOW your code works in a comment: it's much
better to write the code so that the working is obvious, and it's
a waste of time to explain badly written code.
...and from the MySQL coding standards:
Comment your code when you do something that someone else may think is
not trivial.
Both of these standards (and others) recommend minimal function descriptions also, so that's not as obtrusive to understanding the code, and, since function descriptions are generally multi-lined and above the code itself, multiple languages can be included as necessary.
Maybe someone, somewhere has built an interface that can isolate comments into the readers language, but I couldn't (yet) find any that do so.

I always think that API comments exported in the project and private comments in open source projects should be internationalized, which is very convenient for developers in other countries.
On Github, there are actually many developers who use their own national language to comment on some well-known open source projects and some of their own annotations. Most of the reason is that if they do not translate, the efficiency of developers reading comments very low.
Similar to .d.ts in TypeScript, I think function annotation translation can also take a similar form, which is more convenient for the community to feedback translation content, because in fact many developers are willing to do so.

Related

Code "internationalization"

I worked on different projects in different countries and remarked that sometimes the code became internationalized, like
SetLargeurEtHauteur() (for SetWidthAndHeight, fr)
Dim _ListaDeObiecte as List(Of Object) (for _ObjectList, ro)
internal void SohranenieUserov() (for SaveUsers, ru)
etc.
It happens that in countries with Latin alphabet this mix is more pronounced, because there is no need of transliteration.
More than that, often the programming "jargon" is inspired by the project specifications language. There are cases that terms in "project language" have a meaning that is not "translatable" in English.
There are also projects on which works only, say a French team, uses French words (say, Personne, Vehicule, Projet etc).
In that cases I personally add in specifications a "Dictionary" that explains all business object names and only these objects are used in other (French) language.
Say:
Collectif - ensemble des Personnes;
All the actions(Get, Set, Update, Modify, Load, etc) are in English.
Now that "strong" names could be used in code:
AddPersonneToCollectif.
What is your approach to "internationalization"?
PS.
I was amused that VisualStudio compiles and runs projects in .NET with buttons named à la "btnAddÉlève" or "кпкСтоп"...
My personal approach, which is shared by many but not all in the programming community, is that source code should be in English and, if possible, all the development tools should be in English too.
The most important reason for this is being able to share your problems and solutions with the world (like we are doing now in StackOverflow, no less) without having to translate class names, error messages, paths and other artifacts every time.
It also helps consistency, because most libraries are written in English and having element names that mix two languages doesn't really help anyone, besides being a constant focus of internal conflicts when a verb like Add isn't always traslated.
English code also makes it easier to add foreign people to a project without worrying about comprehension and misunderstandings (especially between closely related languages, like Spanish and Portuguese, which have lots of false cognates)
A good link on this subject: http://www.codinghorror.com/blog/2009/03/the-ugly-american-programmer.html
(In case anyone wonders, I'm south-american and English is not my primary language)
Even if everyone on the team is a good English speaker (which is not a given), they may not necessarily know the English equivalent of all the business terminology.
I think it's a project-specific decision what to allow, but I would generally tolerate and in some cases encourage business terms (e.g. entity names) in the local language, but not technical terms (i.e. not Largeur/Hauteur instead of Width/Height).
For example in the financial world in France, everyone knows what is meant by OPCVM and FCP - if you attempt English translations you might end up with more misunderstandings than you do by allowing mixed languages.
I have the same issue with Norwegian currently. I guess it depends on your position in the project, the available time and the role of the software.
In my case, I have decided to keep all terms in an existing protocol and library I am working with in Norwegian, as I can reasonably expect that generations of administrative workers have gotten used to these, and since the library depends on the protocol. In a library wrapper for an international project, I have translated each method name literally, and added an English language documentation of the method.
Comments and documentation on the code are in English.
If designing a software from scratch, I would try to find English terms for all method names and even business terms (if reasonable. I can hardly think of an example where no term can be found though.), to keep it "portable".
If you're writing code that you may be used internationally one day, write it in English. In doubt, write it in English, even comments if you can (although I suppose you can add a few comments in the language of your workplace).
It's not specific to coding unfortunately. English isn't my native language, but I've been able to read a number of technical papers and participate to international conferences with people from all over the world. These collaborations simply wouldn't work if everyone published in their respective native languages.
It may be sad if you feel like defending your language at all cost, but you have to be realistic about it. I suppose English has the advantage to be relatively simple for achieving a basic level: no genders for names, no conjugations, no cases.
Generally code referring to language concepts should be in the same mother language as the programming language (i.e. English - for, while, string are all English words).
It's OK (but not great) to have variables and domain concepts in a local language, but you definitely don't want to be translating List, Object, Decimal, etc. into terms which cause programmers more work in reconciling two languages. Even still, I would strongly lobby to restrict very common domain concepts like Collection, Membership, Person, User and possibly less common domain concepts like Invoice, Receipt to English where this is possible.
It would be like coding half your classes in VB and half in C# - your brain has to make a cognitive shift. While this is good for hybrid apps (JavaScript on the web and C# on the backend) because it helps you keep what's running where clear, it isn't good for a general programming.
In addition, using English for everything makes the domain and language words work together better.
There are always exceptions. There are certain cases where you would use a native word anyway - where the word describes the domain best. For instance, in our (English) code base, we had references to Mexican Spanish terms for certain concepts which were only relevant for people running our software in Mexico. Typically, Japanese terms were spelled out phonetically/Romaji, though - it was difficult for non-Japanese to be able to pronounce the pictograms ;-).
I think I'd call a code base like that "abysmal" rather than "internationalized", but the general rule I've always heard is that if you ever think that someone other than one who speaks your language might ever touch the code, do it in english.
I think that good design guideline is to write code with use of English names and with english comments only (of course if your team is capable to do this, but in case of international team English seems to be natural choose, since it's a most popular language, expecially in IT world).
Good explanation of such guidline is that keywords in most of programming languages are taken from English so writing your code using English names gives more consistent look and thanks to this you end with code that is easier to read.
Another reason is that most of compilers can handle only ascii characters as names of classes, methods, etc. so probably you will end with some strange names when you decide to use some language with alphabet containing non ascii chars.
Third reason that came to my mind is sharing your code on site like SO. Today I opened a post with a piece of code where classes had Spanish names. It was hard for me to guess what was the purpose of this class (even if sometimes is not necessary it is good when you read code and understand all used words:)).
To sum up I think that internationalization of code is not a good idea. You can imagine that keywords in programming languages (e.g. class, try, while) could also be localized and probably you can imagine also how hard life could be then...
To keep things consistent, I would make the code the same (human) language as the (programming) language. That is, if the programming language uses English keywords (like for, switch, public, etc) then keep the rest of the code in English. If you are using a compiler that recognizes (say) the Swahili translations of keywords, then keep the rest of the code in Swahili.
Many APIs have standardized naming schemes that are followed regardless of (human) language, and the accompanying documentation is translated as needed (instead of the source code).
No matter what (human) language you choose, pick one and stick with it. I'd much rather try to wade through source code in German than code that was a mix of German and English.

Does identifier casing really matter?

FxCop thought me (basically, from memory) that functions, classes and properties should be written in MajorCamelCase, while private variables should be in minorCamelCase.
I was talking about a reasonably popular project on IRC and quoted some code. One other guy, a fairly notorious troll who was also a half-op (gasp!) didn't seem to agree. Everything oughta be in the same casing, and he quite fervently favored MajorCamelCase, or even underscore_separation.
Ofcourse, he was just a troll so I reckoned I'd just keep doing it the way I already did. Before I learned the above guidelines, I hardly even had a coherent naming style.
He got me thinking, though -- does stuff like this really matter?
You need to make sure that your code is readable in the future. Please remember that you might want to pass the development of your application to someone else and this person will need to read and understand it. You could stop actively working on a project and return to it after a year - and be suprised that you have to read code carefully to understand how it works.
I believe it was Steve McConnell who said that specific naming style does not really matter (you could use anything you want as long as you are consistent) but this only applies when everyone working on the project agree with you.
In general it is better to adopt community-accepted coding styles where possible to facilitate code reuse and shorten learning curves.
If you don't care about long-term maintanability of your project (or consistency or readability) then no, casing (and coding conventions in general) don't really matter. Otherwise, they do matter. See this.
Your specific coding style doesn't matter (much), so long as it is consistent throughout the project.
This improves readability and understanding, as if an identifier is named in a particular way, the reader can (hopefully) be confident as to what that naming style implies.
As regards CamelCase v underscores, etc: again, it's down to your coding convention. One approach which uses both is to apply a prefix with underscore to indicate the module in which the function, or file-scope/global variable, is used, e.g. Config_Update(), Status_Get().

Are style-enforcement tools useful?

A recent question about StyleCop alerted me to the use of tools to enforce coding style. I would feel very annoyed if I were required to run one of these tools while I was developing. Do people really find them useful? Why or why not?
Everyone that has answered so far has indicated that they think that style/formatting rules are useful, and I am in 100% agreement with that. But what about using a tool for enforcement, rather than a style guide and regular code reviews? Have people found that useful in practice? Why or why not?
Yes, it's very helpful - particularly in large projects. It means you can go to anyone else's code, and it won't look alien to you. This means that people are more portable across projects, which gives a lot more flexibility - both for the person and the company.
The downside is that a lot of time can be spent arguing over which style to use.
There is a difference between a Coding style and a Formatting style.
A coding style enforces good practices.
the body of a 'IF' statement must be wrapped in opening and closing curly brackets
A formatting style is how the code looks.
where the '{' comes in an 'IF' statement.
In a team environment;
a good formatting tool will allow all the developers to see the code the way they want to see the code.
a good style tool will insure all the code follows the same guidelines
I like the concept of StyleCop, although I don't really care for a lot of the rules. Style is just so subjective that I find myself struggling to firmly decide if it should be part of our process or not. I really would prefer to see the team with a unified style, though, which is why I am so torn.
Obviously, the flip-side of the equation, with a tool like FxCop (or Code Analysis for fellow TFS users) is more based on practices, so the decision becomes more technical than personal and stylistic.
If style refers to formatting (like '{' must be at the end or at the beginning of a line), it can be very annoying, especially if merges are involves and if that style is not strictly enforced for all developers.
If style refers to 'good practice" (like the body of a 'if' statement must be wrapped in opening and closing curly brackets), it can be actually very useful.
I think in a large team, a uniform coding style is essential. Having some standard helps with maintainability, in that a new developer can be brought on to maintain old code, with minimal learning curve.
As far as enforcing styling differences (such as where the '{' comes) can be very easily be accomplished by automated tools, without imposing on the development process too much. Eclipse and Visual Studio both have a very rich set of options to format your code automatically based on a large set of options.
Restrictions on programming or formatting style might help reducing friction in a team of more than one person.
Restrictions on language features (especially using only a subset of C#) can help you concentrate on the problem domain instead of having to deal with an overwhelming number of concepts. This does matter if your software has to be robust and thoroughly understandable.
Regards,
tamberg
If you are using a version control system, it can get very ugly if every developer reformats the code towards his own preferences whenever he touches a file. In a place where the developers don't have the necessary communication skills, Wikipedia-like edit-wars can ensue if each developer passive-aggressively sticks to "his" standard.
Overall, manual reformatting also leads to more conflicts on checkins if two people work on the same file.
So if you are using a VCS, I'd even recommend enforcing formatting rules. Enforcing style rules can lead to better code quality.

Do you have coding standards? If so, how are they enforced? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I've worked on a couple of projects where we spent a great deal of time discussing and writing elaborate coding standards covering everything from syntax layout to actual best practices. However, I have also found that these are rarely followed to the full extent. Many developers seem to hesitate to reject a code review based on coding standard violations alone. I.e. violations are committed to the repository on a regular basis.
My questions are: Do you have coding standards? What do they cover? Are they followed by everyone? And what do you do (if anything) to make sure everybody is following the standards?
I'm aware that there is a similar question here, but my concern is not so much how you could do it, but how you are actually going about it and what are the perceived benefits?
I've worked in places with barely-followed coding practices, and others where they're close to being enforced - or at least easily checked.
A few suggestions:
The most important thing is to get buy-in to the idea that consistency trumps your personal preferred style. There should be discussion of the coding standard both before and after it's instituted, but no-one should be allowed to just opt out of it.
Code reviews should be mandatory, with the checkin comment including the username of the reviewer. If you're using a suitably powerful SCM, consider not allowing checkins which don't have a valid reviewer name.
There should be a document which everyone knows about laying out the coding standards. With enough detail, you shouldn't get too much in the way of arguments.
Where possible, automate checking of the conventions (via Lint, CheckStyle, FXCop etc) so it's easy for both the committer and the reviewer to get a quick check of things like ordering import/using directives, whitespace etc.
The benefits are:
Primarily consistency - if you make it so that anyone can feel "at home" in any part of the codebase at any time, it gives you more flexibility.
Spreading best practice - if you ban public fields, mutable structs etc then no-one can accidentally plant a time bomb in your code. (At least, not a time bomb that's covered by the standard. There's no coding standard for perfect code, of course :)
EDIT: I should point out that coding standards are probably most important when working in large companies. I believe they help even in small companies, but there's probably less need of process around the standard at that point. It helps when all the developers know each other personally and are all co-located.
Do you have coding standards?
Yes, differs from project to project.
What does it cover?
Code(class, variable, method, constant), SQL naming and formatting convention
Is it being followed by everyone?
Yes, every new entrant in project could be asked to create a demo project following organization coding convention then it gets reviewed. This exercise makes developer feel at ease before starting real job.
And what do you do (if anything) to make sure everybody is following the standard?
Use StyleCop and FxCop to ensure they are religiously followed. It would show up as warning/error if code fails to comply with organization coding convention.
Visual Studio Team system has nice code anlysis and check-In policies which would prevent developers checking in code that does not comply
Hope, it helps
Thanks,
Maulik Modi
We take use of the Eclipse's save actions and formatters. We do have a suggested standard, but nobody is actually enforcing it, so there are some variations on what is actually formatted, and how.
This is something of a nuisance (for me), as various whitespace variations are committed as updates to the SVN repository...
StyleCop does a good job of enforcing coding layout practices and you can write custom rules for it if something isn't covered in the base rules that is important to you.
I think coding standards are very important. There is nothing more frustrating than trying to find the differences between two revisions of a file only to find that the whole file has been changed by someone who reformatted it all. And I know someone is going to say that that sort of practice should be stamped out, but most IDEs have a 'reformat file' feature (Ctrl-K Ctrl-D in Visual Studio, for example), which makes keeping your code layed out nicely much easier.
I've seen projects fail through lack of coding standards - the curly-brace wars at my last company were contributary to a breakdown in the team.
I've found the best coding standards are not the standards made up by someone in the team. I implemented the standards created by iDesign (click here) in our team, which gets you away from any kind of resentment you might get if you try to implement your own 'standard'.
A quick mention of Code Style Enforcer (click here) which is pretty good for highlighting non-compliance in Visual Studio.
I have a combination of personal and company coding standards that are still evolving to some extent. They cover code structure, testing, and various documents describing various bits of functionality.
As it evolves, it is being adopted and interpreted by the rest of my team. Part of what will happen ultimately is that if there is concensus on some parts then those will hold up while other parts may just remain code that isn't necessarily going to be up to snuff.
I think there may be some respect or professional admiration that act as a way of getting people to follow the coding standards where some parts of it become clear after it is applied, e.g. refactoring a function to be more readable or adding tests to some form, with various "light bulb moments" to borrow a phrase from Oprah.
Another part of the benefit is to see how well do others work, what kinds of tips and techniques do they have and how can one improve over time to be a better developer.
I think the best way to look at coding standards is in terms of what you hope to achieve by applying, and the damage that they can cause if mis-applied. For example, I see the following as quite good;
Document and provide unit tests that illustrate all typical scenarios for usage of a given interface to a given routine or module.
Where possible use the following container classes libraries, etc...
Use asserts to validate incoming parameters and results returned (C & C++)
Minimise scope of all variables
Access object members through methods
Use new and delete over malloc and free
Use the prescribed naming conventions
I don't think that enforcing style beyond this is a great idea, as different programmers are efficient using differing styles. Forcing programmers to change style can be counter productive and lead to lost time and reduced quality. Standards should be kept short and easy to understand.
Oh yes, I'm the coding standard police :) I just wrote a simple script to periodically check and fix the code (my coding standard is simple enough to implement that.) I hope people will get the message after seeing all these "coding convention cleanups" messages :)
We have a kind of 'loose' standard. Maybe because of our inability to have agreement upon some of those 'how many spaces to put there and there', 'where to put my open brace, after the statement or on the next line'.
However, as we have main developers for each of the dedicated modules or components, and some additional developers that may work in those modules, we have the following main rule:
"Uphold the style used by the main developer"
So if he wants to do 3 space-indentation, do it yourself also.
It's not ideal as it might require retune your editor settings, but it keeps the peace :-)
Do you have coding standards?
What does it cover?
Yes, it has naming conventions, mandatory braces after if, while ... , no warning allowed, recommendations for 32/64 bits alignment, no magic number, header guards, variables initialization and formatting rules that favor consistency for legacy code.
Is it being followed by everyone?
And what do you do (if anything) to make sure everybody is following the standard?
Mostly, getting the team agreement and a somewhat lightweight coding standard (less than 20 rules) helped us here.
How it is being enforced ?
Softly, we do not have coding standard cop.
Application of the standard is checked at review time
We have template files that provide the standard boilerplate
I have never seen a project fail because of lack of coding standards (or adherence to them), or even have any effect on productivity. If you are spending any time on enforcing them then you are wasting money. There are so many important things to worry about instead (like code quality).
Create a set of suggested standards for those who prefer to have something to follow, but leave it at that.
JTest by ParaSoft is decent for Java.
Our coding standards are listed in our Programmer's Manual so everyone can easily refer to them. They are effective simply because we have buy in from all team members, because people are not afraid to raise standards and style issues during code reviews, and because they allow a certain level of flexibility. If one programmer creates a new file, and she prefers to place the bracket on the same line as an if statement, that sets the standard for that file. Anyone modifying that file in the future must use the same style to keep things consistent.
I'll admit, when I first read the coding standards, I didn't agree with some of them. For instance, we use a certain style for function declarations that looks like this:
static // Scope
void // Type declaration
func(
char al, //IN: Description of al
intl6u hash_table_size, //IN/OUT: Description of hash_table_size
int8u s) //OUT: Description of s
{
<local declarations>
<statements>
}
I had never seen that before, so it seemed strange and foreign to me at first. My gut reaction was, "Well, that's dumb." Now that I've been here a while, I have adjusted to the style and appreciate how I can quickly comprehend the function declaration because everyone does it this way.

Standards Document

I am writing a coding standards document for a team of about 15 developers with a project load of between 10 and 15 projects a year. Amongst other sections (which I may post here as I get to them) I am writing a section on code formatting. So to start with, I think it is wise that, for whatever reason, we establish some basic, consistent code formatting/naming standards.
I've looked at roughly 10 projects written over the last 3 years from this team and I'm, obviously, finding a pretty wide range of styles. Contractors come in and out and at times, and sometimes even double the team size.
I am looking for a few suggestions for code formatting and naming standards that have really paid off ... but that can also really be justified. I think consistency and shared-patterns go a long way to making the code more maintainable ... but, are there other things I ought to consider when defining said standards?
How do you lineup parenthesis? Do you follow the same parenthesis guidelines when dealing with classes, methods, try catch blocks, switch statements, if else blocks, etc.
Do you line up fields on a column? Do you notate/prefix private variables with an underscore? Do you follow any naming conventions to make it easier to find particulars in a file? How do you order the members of your class?
What about suggestions for namespaces, packaging or source code folder/organization standards? I tend to start with something like:
<com|org|...>.<company>.<app>.<layer>.<function>.ClassName
I'm curious to see if there are other, more accepted, practices than what I am accustomed to -- before I venture off dictating these standards. Links to standards already published online would be great too -- even though I've done a bit of that already.
First find a automated code-formatter that works with your language. Reason: Whatever the document says, people will inevitably break the rules. It's much easier to run code through a formatter than to nit-pick in a code review.
If you're using a language with an existing standard (e.g. Java, C#), it's easiest to use it, or at least start with it as a first draft. Sun put a lot of thought into their formatting rules; you might as well take advantage of it.
In any case, remember that much research has shown that varying things like brace position and whitespace use has no measurable effect on productivity or understandability or prevalence of bugs. Just having any standard is the key.
Coming from the automotive industry, here's a few style standards used for concrete reasons:
Always used braces in control structures, and place them on separate lines. This eliminates problems with people adding code and including it or not including it mistakenly inside a control structure.
if(...)
{
}
All switches/selects have a default case. The default case logs an error if it's not a valid path.
For the same reason as above, any if...elseif... control structures MUST end with a default else that also logs an error if it's not a valid path. A single if statement does not require this.
In the occasional case where a loop or control structure is intentionally empty, a semicolon is always placed within to indicate that this is intentional.
while(stillwaiting())
{
;
}
Naming standards have very different styles for typedefs, defined constants, module global variables, etc. Variable names include type. You can look at the name and have a good idea of what module it pertains to, its scope, and type. This makes it easy to detect errors related to types, etc.
There are others, but these are the top off my head.
-Adam
I'm going to second Jason's suggestion.
I just completed a standards document for a team of 10-12 that work mostly in perl. The document says to use "perltidy-like indentation for complex data structures." We also provided everyone with example perltidy settings that would clean up their code to meet this standard. It was very clear and very much industry-standard for the language so we had great buyoff on it by the team.
When setting out to write this document, I asked around for some examples of great code in our repository and googled a bit to find other standards documents that smarter architects than I to construct a template. It was tough being concise and pragmatic without crossing into micro-manager territory but very much worth it; having any standard is indeed key.
Hope it works out!
It obviously varies depending on languages and technologies. By the look of your example name space I am going to guess java, in which case http://java.sun.com/docs/codeconv/ is a really good place to start. You might also want to look at something like maven's standard directory structure which will make all your projects look similar.

Resources