Native Win32 NLS - strange default sub language for Portuguese - winapi

Doing LANGID and LCID gymnastics right now. Just noticed that in native Win32 NLS, SUBLANG_PORTUGUESE_BRAZILIAN is the user default sub language for the LANG_PORTUGUESE primary language. I expected SUBLANG_PORTUGUESE to be the default sub language. Why isn't it?

I suppose it has something to do with the sheer size of the user base: many more people live in Brazil than in Portugal, so it is probably the better default value.
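One detail may also help explain this. In the Windows SDK headers a LANGID packs the primary language into the low 10 bits and the sublanguage into the upper 6 bits, and SUBLANG_DEFAULT has the value 0x01, which is the same numeric value as SUBLANG_PORTUGUESE_BRAZILIAN (SUBLANG_PORTUGUESE is 0x02). The sketch below mirrors the MAKELANGID, PRIMARYLANGID and SUBLANGID macros in Python purely for illustration; the constant values are the ones from winnt.h, while the Python helper names are hypothetical.

```python
# Constant values as defined in the Windows SDK header winnt.h.
LANG_PORTUGUESE = 0x16
SUBLANG_DEFAULT = 0x01
SUBLANG_PORTUGUESE_BRAZILIAN = 0x01
SUBLANG_PORTUGUESE = 0x02


def make_langid(primary, sub):
    """Python equivalent of the MAKELANGID macro (hypothetical helper name)."""
    return ((sub & 0x3F) << 10) | (primary & 0x3FF)


def primary_langid(langid):
    """Python equivalent of PRIMARYLANGID: the low 10 bits."""
    return langid & 0x3FF


def sub_langid(langid):
    """Python equivalent of SUBLANGID: the upper 6 bits."""
    return (langid >> 10) & 0x3F


default_pt = make_langid(LANG_PORTUGUESE, SUBLANG_DEFAULT)
brazil = make_langid(LANG_PORTUGUESE, SUBLANG_PORTUGUESE_BRAZILIAN)
portugal = make_langid(LANG_PORTUGUESE, SUBLANG_PORTUGUESE)

# The default and the Brazilian LANGID are the same number.
print(hex(default_pt), hex(brazil), hex(portugal))
```

So asking for the default sublanguage of LANG_PORTUGUESE produces the same identifier as asking for Brazilian Portuguese explicitly. Whether the numbering was chosen because of the user base is something the headers alone do not tell us.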

Related

Getting the default RTL codepage in Lazarus

The Lazarus Wiki states: "Lazarus (actually its LazUtils package) takes advantage of that API and changes it to UTF-8 (CP_UTF8). It means also Windows users now use UTF-8 strings in the RTL".
In our cross-platform and cross-compiler code, we'd like to detect this specific situation. The GetACP() Windows API function still returns "1252", and so does the GetDefaultTextEncoding() function in Lazarus. But the text (specifically, the filename returned by the FindFirst() function) is UTF-8 encoded, and the code page of the string variable is 65001 too.
So, how do we figure out that the RTL operates with UTF-8 strings by default? I've spent several hours trying to figure this out from the Lazarus source code, but I am probably missing something ...
I understand that in many scenarios we need to inspect the code page of each specific string, but I am interested in a way to find out the default RTL code page, which is UTF-8 in Lazarus but the Windows-defined one in FPC on Windows without Lazarus.
It turns out that there is no single code page variable or function. Results of filesystem API calls are converted to the code page defined in the DefaultRTLFileSystemCodePage variable. The only problem is that this variable is present in the source code and is supposed to be in the system unit, but the compiler doesn't see it.
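To illustrate why the distinction matters, here is a small Python sketch (not Lazarus code; the filename is made up). It assumes the ANSI code page reported by GetACP() is 1252, as in the question. If bytes produced under a UTF-8 default are interpreted with that code page, any non-ASCII filename comes out garbled, while ASCII-only names look identical in both encodings and hide the problem.

```python
# Hypothetical filename; any name with non-ASCII characters shows the effect.
filename = "café.txt"

utf8_bytes = filename.encode("utf-8")    # what a UTF-8 RTL would hold
ansi_bytes = filename.encode("cp1252")   # what a CP1252 RTL would hold

# Interpreting the UTF-8 bytes as CP1252 yields mojibake.
misread = utf8_bytes.decode("cp1252")

print(len(utf8_bytes), len(ansi_bytes))
print(misread)

# ASCII-only names are byte-identical in both encodings.
print("plain.txt".encode("utf-8") == "plain.txt".encode("cp1252"))
```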

How can I write a program in Japanese in Pascal?

I'm teaching myself Pascal and thought mixing Pascal with Japanese sounded like a really good idea the other day, but it appears Pascal only accepts Japanese characters some of the time, and I don't really know why it accepts them at all. Is there something I need to include to allow writing in Japanese with Free Pascal?
You don't mention which Pascal, and you don't describe what "accepts" means. As identifiers or literals in source code, or on standard input?
Delphi 2009 and higher support Unicode identifiers in their UTF-8 sources.
Free Pascal hasn't implemented this yet. It does allow UTF-8 source encoding though, and thus Unicode literals.

Lazarus coding style guide

Style Guide?
Other than http://wiki.freepascal.org/Coding_style, is there a style guide that represents the style followed by a notable and large body of work in Lazarus (and/or FPC and/or Delphi), or some sort of widespread consensus?
Example
I'm looking for things that say something such as
Names of literal constants should be in all uppercase.
Names of variables should use camelCase with initial lowercase
Indent a begin on the line after an if
The above is just an example. I'm aware of well-supported conventions in languages like Java and Perl but not of a predominant convention for programs written using Lazarus or Delphi.
Purpose
My intent is
Try to adopt a common style for all the code I write
Have this style not be too much of a surprise for the majority of programmers who might one day read it.
I'm not working in a business that has established standards.
As a good, detailed style guide I'd consider the Object Pascal Style Guide by Charles Calvert. It is written for Object Pascal, from which Free Pascal descends. In fact, most of the FPC units respect the rules mentioned there.
This article documents a standard style for formatting Delphi code. It is based on the conventions developed by the Delphi team.
You will probably find the most information on this subject with a search term such as "delphi coding conventions". These are very loose standards that are not enforced but can be very helpful in keeping your code readable. Delphi and Lazarus are largely interchangeable in this regard, so what applies to Delphi applies to Lazarus too, and there is much more information available on Delphi. Even old Delphi books are a great resource.

Code "internationalization"

I have worked on different projects in different countries and noticed that sometimes the code became internationalized, like
SetLargeurEtHauteur() (for SetWidthAndHeight, fr)
Dim _ListaDeObiecte as List(Of Object) (for _ObjectList, ro)
internal void SohranenieUserov() (for SaveUsers, ru)
etc.
In countries that use the Latin alphabet this mix is more pronounced, because there is no need for transliteration.
More than that, the programming "jargon" is often inspired by the language of the project specifications. There are cases where terms in the "project language" have a meaning that is not "translatable" into English.
There are also projects on which only, say, a French team works, and which use French words (say, Personne, Vehicule, Projet etc.).
In such cases I personally add a "Dictionary" to the specifications that explains all business object names, and only these objects are named in the other (French) language.
Say:
Collectif - ensemble des Personnes;
All the actions (Get, Set, Update, Modify, Load, etc.) are in English.
Now these "strong" names can be used in code:
AddPersonneToCollectif.
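The convention described above (English verbs for actions, dictionary terms for business objects) could be checked mechanically. The following is a minimal Python sketch, not taken from any of the projects mentioned: the ALLOWED_ACTIONS, GLOSSARY and CONNECTORS sets and both function names are hypothetical, and the splitting only handles ASCII CamelCase identifiers.

```python
import re

# Hypothetical word lists; a real project would take these from its "Dictionary".
ALLOWED_ACTIONS = {"Get", "Set", "Update", "Modify", "Load", "Add"}
GLOSSARY = {"Personne", "Collectif", "Vehicule", "Projet"}
CONNECTORS = {"To", "From", "And", "Of"}


def split_camel_case(identifier):
    """Split an ASCII CamelCase identifier such as AddPersonneToCollectif into words."""
    return re.findall(r"[A-Z][a-z0-9]*", identifier)


def unknown_words(identifier):
    """Return the words that are neither an action, a connector nor a glossary term."""
    known = ALLOWED_ACTIONS | GLOSSARY | CONNECTORS
    return [word for word in split_camel_case(identifier) if word not in known]


print(unknown_words("AddPersonneToCollectif"))
print(unknown_words("SetLargeurEtHauteur"))
```

An empty result means every word in the identifier is either an approved English action or a documented business term; anything else is flagged for review.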
What is your approach to "internationalization"?
PS.
I was amused that Visual Studio compiles and runs .NET projects with buttons named à la "btnAddÉlève" or "кпкСтоп"...
My personal approach, which is shared by many but not all in the programming community, is that source code should be in English and, if possible, all the development tools should be in English too.
The most important reason for this is being able to share your problems and solutions with the world (like we are doing now in StackOverflow, no less) without having to translate class names, error messages, paths and other artifacts every time.
It also helps consistency, because most libraries are written in English, and having element names that mix two languages doesn't really help anyone, besides being a constant source of internal conflict when a verb like Add isn't always translated.
English code also makes it easier to add foreign people to a project without worrying about comprehension and misunderstandings (especially between closely related languages, like Spanish and Portuguese, which have lots of false cognates).
A good link on this subject: http://www.codinghorror.com/blog/2009/03/the-ugly-american-programmer.html
(In case anyone wonders, I'm South American and English is not my primary language.)
Even if everyone on the team is a good English speaker (which is not a given), they may not necessarily know the English equivalent of all the business terminology.
I think it's a project-specific decision what to allow, but I would generally tolerate and in some cases encourage business terms (e.g. entity names) in the local language, but not technical terms (i.e. not Largeur/Hauteur instead of Width/Height).
For example in the financial world in France, everyone knows what is meant by OPCVM and FCP - if you attempt English translations you might end up with more misunderstandings than you do by allowing mixed languages.
I have the same issue with Norwegian currently. I guess it depends on your position in the project, the available time and the role of the software.
In my case, I have decided to keep all terms in an existing protocol and library I am working with in Norwegian, as I can reasonably expect that generations of administrative workers have gotten used to these, and since the library depends on the protocol. In a library wrapper for an international project, I have translated each method name literally, and added an English language documentation of the method.
Comments and documentation on the code are in English.
If designing software from scratch, I would try to find English terms for all method names and even business terms where reasonable (I can hardly think of an example where no term can be found, though), to keep it "portable".
If you're writing code that may be used internationally one day, write it in English. When in doubt, write it in English, even the comments if you can (although I suppose you can add a few comments in the language of your workplace).
Unfortunately, this is not specific to coding. English isn't my native language, but I've been able to read a number of technical papers and participate in international conferences with people from all over the world. These collaborations simply wouldn't work if everyone published in their respective native languages.
It may be sad if you feel like defending your language at all costs, but you have to be realistic about it. I suppose English has the advantage of being relatively simple when it comes to reaching a basic level: no grammatical gender for nouns, very little conjugation, no cases.
Generally code referring to language concepts should be in the same mother language as the programming language (i.e. English - for, while, string are all English words).
It's OK (but not great) to have variables and domain concepts in a local language, but you definitely don't want to be translating List, Object, Decimal, etc. into terms which cause programmers more work in reconciling two languages. Even so, I would strongly lobby to restrict very common domain concepts like Collection, Membership, Person, User, and possibly less common domain concepts like Invoice and Receipt, to English where this is possible.
Otherwise it would be like coding half your classes in VB and half in C#: your brain has to make a cognitive shift. While this is good for hybrid apps (JavaScript on the web and C# on the backend), because it helps you keep clear what's running where, it isn't good for general programming.
In addition, using English for everything makes the domain and language words work together better.
There are always exceptions. There are certain cases where you would use a native word anyway - where the word describes the domain best. For instance, in our (English) code base, we had references to Mexican Spanish terms for certain concepts which were only relevant for people running our software in Mexico. Typically, Japanese terms were spelled out phonetically/Romaji, though - it was difficult for non-Japanese to be able to pronounce the pictograms ;-).
I think I'd call a code base like that "abysmal" rather than "internationalized", but the general rule I've always heard is that if you think that someone who doesn't speak your language might ever touch the code, do it in English.
I think that a good design guideline is to write code using English names and English comments only (if your team is capable of doing this, of course, but for an international team English seems to be the natural choice, since it's the most popular language, especially in the IT world).
A good justification for such a guideline is that the keywords in most programming languages are taken from English, so writing your code using English names gives a more consistent look, and thanks to this you end up with code that is easier to read.
Another reason is that most compilers can handle only ASCII characters in the names of classes, methods, etc., so you will probably end up with some strange names if you decide to use a language whose alphabet contains non-ASCII characters.
A third reason that comes to mind is sharing your code on a site like SO. Today I opened a post with a piece of code where the classes had Spanish names. It was hard for me to guess what the purpose of the class was (even if it is not always necessary, it is good when you can read code and understand all the words used :)).
To sum up, I think that internationalization of code is not a good idea. You can imagine that keywords in programming languages (e.g. class, try, while) could also be localized, and you can probably also imagine how hard life would be then...
To keep things consistent, I would make the code the same (human) language as the (programming) language. That is, if the programming language uses English keywords (like for, switch, public, etc) then keep the rest of the code in English. If you are using a compiler that recognizes (say) the Swahili translations of keywords, then keep the rest of the code in Swahili.
Many APIs have standardized naming schemes that are followed regardless of (human) language, and the accompanying documentation is translated as needed (instead of the source code).
No matter what (human) language you choose, pick one and stick with it. I'd much rather try to wade through source code in German than code that was a mix of German and English.

Finding the current active language on windows

What are the possible ways of finding the currently active language, the one that appears on the Windows language bar?
CultureInfo.CurrentCulture. This has information on the language and culture. If you just want the language name, try CultureInfo.CurrentCulture.ThreeLetterISOLanguageName.
You should look at the Multilingual APIs in Win32 as a starting point. It's not entirely obvious from the documentation which call will provide you what you want, but I think the answer may lie with the calls relating to processes and threads, or preferred languages. You may need to do some experimentation to see which calls give the expected result of matching the language bar selection.
I suspect that the best call to try would be GetThreadUILanguage.

Resources