Migrating from one C language to another, change Style?

I find myself in conflict over which code style I should follow when using a different C-family language.
Currently I am working on different projects in C++, C# and Objective-C.
I noticed there is a lot of discrepancy in the conventions basic frameworks follow. Generally, I don't think it's a bad idea to adhere to these conventions, as it makes code feel more "integrated" into the environment. However, it is hard for me to remember all the differences and apply the principles correctly.
In C#, for example, all methods of a class start with an uppercase letter, while Objective-C seems to prefer camelCase method names.
What tactic would you choose:
One style to rule them all (as far as applicable)
Stick with what is common in the given environment
I especially like the Google style guides, which seem to recommend the latter. However, I disagree with them on using spaces instead of tabs and on their indentation in general (e.g. methods on the same level as the class).

I think you should stick to the "accepted" styles for each language. My rationale for that is that I think it would be much easier to recall what environment you're in when you have to think in the style used for that language. It will also be much easier for someone who is familiar with that environment to look at your code and feel more comfortable with the style and formatting (i.e. less chance for them to misunderstand what they're looking at).

My rule with porting code is: Don't touch it unless you have to.
My rule with modifying old code is: Use the style of the file.
Outside of those two situations, things like coding standards and perhaps your own opinion on good style can come into play.

Related

Handling comments in code when doing i18n

I'm in the process of translating an open-source project from Chinese to English, and I've used i18n tooling (in this case Babel) to separate the code from both the English and Chinese translations.
Everything's done, except for a rather large number of inline comments in the code.
Obviously, Babel can't translate comments inline (and it would be rather obnoxious if it did, since the code would no longer be identical across languages and would therefore be less easily verifiable).
The way I see it, there are a number of options:
Leave comments in -
Pro: Helps original author, etc.
Con: Makes it distracting for ongoing translation and anyone who doesn't speak the language
Strip out all the comments -
Pro: Code is native-language-agnostic, so it makes sense. Who needs comments anyway? Use the source, Luke!
Con: Goes against SE principles. Could lose something important in understanding how the code works - maybe something's been done to avoid a security risk, etc.
Place English comments near Chinese comments
(Possibly moved to lines above and prefixed with "EN" and "ZH", for example).
Pro: Best of both worlds, comments kept close to code
Con: Not conducive to dictionary-style translation. Can get bulky with more languages.
Create a comment dictionary / notes
Pro: Keeps the comments in a separate file for easy translation.
Con: Difficult to keep synced with code. Not intuitive to remember to update comments related to code when changing code.
Use a different preprocessor for i18n before/after each development cycle.
Pro: Comments et al would be in your language. Could link this to git pull/push so you only ever see the code in your language.
Con: Bulky, non-obvious process. Could result in code-verification or even compilation errors.
None of these seem like really great solutions.
If you do a lot of this, and the code is shared publicly between developers who don't share a native tongue, is there a recommended way to handle translating (or not) comments in the code itself?

I am not sure I understand. You say you separated the code from the language-specific parts. So now you should have code (with comments) + English resources + Chinese resources (I use "resources" to mean whatever your programming language uses to store localizable content).
Translators only see the resources, not the code, nor the comments. The comments stay untranslated, for the developers.

Short Answer
It seems to be a mixture of:
Strip out all the comments, and
Place English comments near Chinese comments.
Inline comments are almost always trivial - Strip them
Functional comments are not as intrusive - Translate them (possibly with an i18n prefix, e.g. "[cn]:" or "[en]:").
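As a rough sketch of what a prefixed functional comment could look like, assuming Python and a made-up helper named clamp (neither comes from the project in question):

```python
def clamp(value, low, high):
    """
    [en]: Limit value to the range [low, high].
    [cn]: 将 value 限制在 [low, high] 范围内。
    """
    # No inline comments here: the body is short enough to explain itself.
    return max(low, min(value, high))


print(clamp(15, 0, 10))
```

The description sits above the code, so adding a further language costs one more line and does not get in the way of reading the body.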
Explanation
My meagre amount of research tends to suggest that larger projects make strong attempts to reduce comments and let the code explain itself, instead focusing on code quality which reduces the need for comments.
e.g. From the Linux Kernel Coding guidelines:
NEVER try to explain HOW your code works in a comment: it's much
better to write the code so that the working is obvious, and it's
a waste of time to explain badly written code.
...and from the MySQL coding standards:
Comment your code when you do something that someone else may think is
not trivial.
Both of these standards (and others) also recommend minimal function descriptions, which are less obtrusive to understanding the code. Since function descriptions are generally multi-line and sit above the code itself, multiple languages can be included as necessary.
Maybe someone, somewhere has built an interface that can filter comments into the reader's language, but I couldn't (yet) find one that does.

I think that both the comments on a project's exported API and the private comments in open-source projects should be internationalized, which is very convenient for developers in other countries.
On GitHub, many developers actually annotate well-known open-source projects with comments in their own national language. The main reason is that, without translation, developers read the comments very inefficiently.
Similar to .d.ts files in TypeScript, I think translations of function annotations could take a similar form, which would make it easier for the community to give feedback on the translated content; in fact, many developers are willing to do so.

Does identifier casing really matter?

FxCop taught me (basically, from memory) that functions, classes and properties should be written in MajorCamelCase, while private variables should be in minorCamelCase.
I was talking about a reasonably popular project on IRC and quoted some code. One other guy, a fairly notorious troll who was also a half-op (gasp!) didn't seem to agree. Everything oughta be in the same casing, and he quite fervently favored MajorCamelCase, or even underscore_separation.
Of course, he was just a troll so I reckoned I'd just keep doing it the way I already did. Before I learned the above guidelines, I hardly even had a coherent naming style.
He got me thinking, though -- does stuff like this really matter?

You need to make sure that your code is readable in the future. Please remember that you might want to pass the development of your application to someone else, and this person will need to read and understand it. You could stop actively working on a project and return to it after a year, and be surprised that you have to read the code carefully to understand how it works.
I believe it was Steve McConnell who said that the specific naming style does not really matter (you could use anything you want as long as you are consistent), but this only applies when everyone working on the project agrees with you.
In general it is better to adopt community-accepted coding styles where possible to facilitate code reuse and shorten learning curves.

If you don't care about long-term maintainability of your project (or consistency or readability) then no, casing (and coding conventions in general) doesn't really matter. Otherwise, it does matter.

Your specific coding style doesn't matter (much), so long as it is consistent throughout the project.
This improves readability and understanding, as if an identifier is named in a particular way, the reader can (hopefully) be confident as to what that naming style implies.
As regards CamelCase v underscores, etc: again, it's down to your coding convention. One approach which uses both is to apply a prefix with underscore to indicate the module in which the function, or file-scope/global variable, is used, e.g. Config_Update(), Status_Get().
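Consistency of this kind can be checked mechanically. The following is only a sketch: the three regular expressions and the names classify and off_style are assumptions made for illustration, not the rules of FxCop or any other tool. Single lowercase words and mixed forms such as Config_Update are reported as "other".

```python
import re

# Hypothetical patterns for three common casing styles.
STYLES = {
    "PascalCase": re.compile(r"^[A-Z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$"),
    "camelCase": re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+$"),
    "snake_case": re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)+$"),
}


def classify(name):
    """Return the first style whose pattern matches name, else "other"."""
    for style, pattern in STYLES.items():
        if pattern.match(name):
            return style
    return "other"


def off_style(names, expected):
    """List the names that do not follow the expected casing style."""
    return [name for name in names if classify(name) != expected]


print(off_style(["GetStatus", "UpdateConfig", "read_file"], "PascalCase"))
```

A project that uses the module-prefix convention above would need its own pattern for names like Config_Update.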

What tools for migrating programs from a platform A to B

As a pet project, I was thinking about writing a program to migrate applications written in a language A into a language B.
A and B would be object-oriented languages. I suppose it is a very hard task: mapping language constructs that are alike is doable, but mapping library concepts will be a very long task.
I was wondering what tools to use, I know this has to do with compilation, but I'm a bit afraid to use Lex and Yacc and all that stuff.
I was thinking of maybe using the Eclipse Modeling Framework, which would help me write models (of application code) transformations in a readable form.
But first I would have to write parsers for creating the models (and also create the metamodel from the language grammar).
Are there tools that exist that would make my task easier?

You can use special transformation tools/languages for that, such as TXL or Stratego/XT.
You can also have a look at, and easily try, the Java to Python and Java to Tcl migration projects I made with TXL.
You are right about mapping library concepts. It is a rather hard and long task. There are two ways here:
Fully migrate the class library from language A to B
Migrate classes/functions from language A to the corresponding concepts in language B
The approach you choose depends on your goals and the time/resources available. Also, in many cases you won't be doing a general A->B migration that covers all possible cases; you will just need to convert some particular project or library, so you will see in each particular case what is better to do with classes/libraries.

I think this is almost impossibly hard, especially as a personal project. But if you are going to do it, don't make life even more difficult for yourself by trying to come up with a general solution. Choose two specific real-life programming languages and investigate the possibilities of converting between them. I think you will be shocked by the number of problems and issues this will expose.

There are some tools for direct migration for some combinations of A and B.
There are a variety of reverse engineering and code generation tools for different languages and platforms. It's fairly rare to see reverse engineering tools which capture all the semantics of the source language, and the semantics of UML are not well defined (since it's designed to map to different implementation languages, it doesn't itself define a complete execution model for its behavioural representations), so you're unlikely to be able to reverse engineer and generate code between tools. You may find one tool that does full reverse engineering and full code generation for your A and B languages, and so may be able to get somewhere.
In general you don't use the same idioms on different platforms, so you're more likely to get something which emulates A code on B rather than something which corresponds to a native B solution.

If you want to use Java as the source language (the language you are trying to convert), then you might use the Checkstyle AST (it's used to write rules). It gives you a tree structure with every operation done in the source code. This will be much easier than writing your own parser or using regexes.
You can run com.puppycrawl.tools.checkstyle.gui.Main from checkstyle-4.4.jar to launch a Swing GUI that parses Java source code.

Based on your comment
I'm not sure yet, but I think the source language/framework would be Java/Swing and the target some RIA language like Flex or a Javascript/Ajax framework. – Alain Michel 3 hours ago
Google Web Toolkit might be worth a look.

See this answer: What kinds of patterns could I enforce on the code to make it easier to translate to another programming language?

Are style-enforcement tools useful?

A recent question about StyleCop alerted me to the use of tools to enforce coding style. I would feel very annoyed if I were required to run one of these tools while I was developing. Do people really find them useful? Why or why not?
Everyone that has answered so far has indicated that they think that style/formatting rules are useful, and I am in 100% agreement with that. But what about using a tool for enforcement, rather than a style guide and regular code reviews? Have people found that useful in practice? Why or why not?

Yes, it's very helpful - particularly in large projects. It means you can go to anyone else's code, and it won't look alien to you. This means that people are more portable across projects, which gives a lot more flexibility - both for the person and the company.
The downside is that a lot of time can be spent arguing over which style to use.

There is a difference between a coding style and a formatting style.
A coding style enforces good practices.
E.g. the body of an 'if' statement must be wrapped in opening and closing curly brackets.
A formatting style is how the code looks.
E.g. where the '{' goes in an 'if' statement.
In a team environment:
a good formatting tool will allow all the developers to see the code the way they want to see the code.
a good style tool will ensure all the code follows the same guidelines.

I like the concept of StyleCop, although I don't really care for a lot of the rules. Style is just so subjective that I find myself struggling to firmly decide if it should be part of our process or not. I really would prefer to see the team with a unified style, though, which is why I am so torn.
Obviously, the flip-side of the equation, with a tool like FxCop (or Code Analysis for fellow TFS users) is more based on practices, so the decision becomes more technical than personal and stylistic.

If style refers to formatting (like '{' must be at the end or at the beginning of a line), it can be very annoying, especially if merges are involved and if that style is not strictly enforced for all developers.
If style refers to "good practice" (like the body of an 'if' statement must be wrapped in opening and closing curly brackets), it can actually be very useful.

I think in a large team, a uniform coding style is essential. Having some standard helps with maintainability, in that a new developer can be brought on to maintain old code, with minimal learning curve.
Enforcing styling differences (such as where the '{' goes) can very easily be accomplished by automated tools, without imposing on the development process too much. Eclipse and Visual Studio both have a very rich set of options for formatting your code automatically.

Restrictions on programming or formatting style might help reduce friction in a team of more than one person.
Restrictions on language features (especially using only a subset of C#) can help you concentrate on the problem domain instead of having to deal with an overwhelming number of concepts. This does matter if your software has to be robust and thoroughly understandable.
Regards,
tamberg

If you are using a version control system, it can get very ugly if every developer reformats the code towards his own preferences whenever he touches a file. In a place where the developers don't have the necessary communication skills, Wikipedia-like edit-wars can ensue if each developer passive-aggressively sticks to "his" standard.
Overall, manual reformatting also leads to more conflicts on checkins if two people work on the same file.
So if you are using a VCS, I'd even recommend enforcing formatting rules. Enforcing style rules can lead to better code quality.

Standards Document

I am writing a coding standards document for a team of about 15 developers with a project load of between 10 and 15 projects a year. Amongst other sections (which I may post here as I get to them) I am writing a section on code formatting. So to start with, I think it is wise that, for whatever reason, we establish some basic, consistent code formatting/naming standards.
I've looked at roughly 10 projects written over the last 3 years by this team and I'm, obviously, finding a pretty wide range of styles. Contractors come in and out, and at times even double the team size.
I am looking for a few suggestions for code formatting and naming standards that have really paid off ... but that can also really be justified. I think consistency and shared-patterns go a long way to making the code more maintainable ... but, are there other things I ought to consider when defining said standards?
How do you line up parentheses? Do you follow the same parenthesis guidelines when dealing with classes, methods, try/catch blocks, switch statements, if/else blocks, etc.?
Do you line up fields on a column? Do you notate/prefix private variables with an underscore? Do you follow any naming conventions to make it easier to find particulars in a file? How do you order the members of your class?
What about suggestions for namespaces, packaging or source code folder/organization standards? I tend to start with something like:
<com|org|...>.<company>.<app>.<layer>.<function>.ClassName
I'm curious to see if there are other, more accepted, practices than what I am accustomed to -- before I venture off dictating these standards. Links to standards already published online would be great too -- even though I've done a bit of that already.

First find an automated code formatter that works with your language. Reason: Whatever the document says, people will inevitably break the rules. It's much easier to run code through a formatter than to nit-pick in a code review.
If you're using a language with an existing standard (e.g. Java, C#), it's easiest to use it, or at least start with it as a first draft. Sun put a lot of thought into their formatting rules; you might as well take advantage of it.
In any case, remember that much research has shown that varying things like brace position and whitespace use has no measurable effect on productivity or understandability or prevalence of bugs. Just having any standard is the key.

Coming from the automotive industry, here's a few style standards used for concrete reasons:
Always use braces in control structures, and place them on separate lines. This eliminates problems with people adding code and mistakenly including it, or not including it, inside a control structure.
if(...)
{
}
All switches/selects have a default case. The default case logs an error if it's not a valid path.
For the same reason as above, any if...elseif... control structures MUST end with a default else that also logs an error if it's not a valid path. A single if statement does not require this.
In the occasional case where a loop or control structure is intentionally empty, a semicolon is always placed within to indicate that this is intentional.
while(stillwaiting())
{
    ;
}
Naming standards have very different styles for typedefs, defined constants, module global variables, etc. Variable names include type. You can look at the name and have a good idea of what module it pertains to, its scope, and type. This makes it easy to detect errors related to types, etc.
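The rule about a mandatory default branch can be sketched as follows. This is an illustration in Python rather than automotive C, and the Gear enum, the describe_gear function and the "gearbox" logger name are all invented for the example.

```python
import logging
from enum import Enum

logger = logging.getLogger("gearbox")


class Gear(Enum):
    PARK = "P"
    REVERSE = "R"
    DRIVE = "D"


def describe_gear(gear):
    if gear is Gear.PARK:
        return "parked"
    elif gear is Gear.REVERSE:
        return "reversing"
    elif gear is Gear.DRIVE:
        return "driving"
    else:
        # Mandatory final else: reaching it means the value is not one of
        # the expected cases, so log an error instead of failing silently.
        logger.error("unexpected gear value: %r", gear)
        return "unknown"


print(describe_gear(Gear.DRIVE))
```

If a new member is later added to Gear and this function is not updated, the final else reports it the first time the value shows up.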
There are others, but these are off the top of my head.
-Adam

I'm going to second Jason's suggestion.
I just completed a standards document for a team of 10-12 who work mostly in Perl. The document says to use "perltidy-like indentation for complex data structures." We also provided everyone with example perltidy settings that would clean up their code to meet this standard. It was very clear and very much industry-standard for the language, so we had great buy-in from the team.
When setting out to write this document, I asked around for some examples of great code in our repository and googled a bit to find other standards documents, written by smarter architects than I, to construct a template. It was tough being concise and pragmatic without crossing into micro-manager territory, but very much worth it; having any standard is indeed key.
Hope it works out!

It obviously varies depending on languages and technologies. By the look of your example namespace I am going to guess Java, in which case http://java.sun.com/docs/codeconv/ is a really good place to start. You might also want to look at something like Maven's standard directory structure, which will make all your projects look similar.
