Every once in a while I'm confronted with displaying a list of available languages, and each and every time I ask myself:
Is it better to display the language in:
the currently selected language
English
in the language according to the button/list item
Examples:
English
German
French
or
English
Deutsch
Français
Is there any convention on which one should be used, is more polite or better in any other way? Are there other options?
I would say it's best to display each language in "its own language" (option #3). You cannot necessarily expect users to know the currently selected language, nor expect them to know English.
What's tricky is how to display the "Select your language" button in a language-neutral way. I usually go for a flag indicating the current language, since that tends to get the message across even though there's not always a 1:1 mapping between countries and languages.
I definitely think you should display in the language that matches the item in the button list.
Reasons:
If it's not the language you're interested in, you won't mind if you don't understand it, as long as you can find your own language.
Think about the last time you called customer service. How many times have you heard something like, "Para Espanol, marque dos"? It's very common, accepted practice to mix different languages in one UI (whether visual or audible).
Think about how you'd feel if you went to a Spanish site, and you couldn't find your language under "E". Maybe, eventually, you'd notice "Ingles", and think it probably translated to "English", but it's definitely better to save the user the trouble of translating and mentally alphabetizing.
The standard (in both senses of the word, i.e. what is actually used in the real world, and what the IETF/W3C/ISO says) is to use ISO 639-1 Alpha-2 language codes. These can be augmented with the full name of the language in English, the name in the language itself, a romanized transliteration of that native name, or any combination thereof.
So, to keep with your example:
[de] German - Deutsch
[en] English
[fr] French - Français
[ja] Japanese - 日本語 (Nihongo)
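A minimal sketch of how such labels could be generated, assuming a hand-maintained table of codes and names. The `LANGUAGES` table and the `language_label` helper are hypothetical names made up for illustration, and the romanized form is left out for brevity.

```python
# Hypothetical sample data: (ISO 639-1 code, English name, native name).
LANGUAGES = [
    ("ja", "Japanese", "日本語"),
    ("de", "German", "Deutsch"),
    ("fr", "French", "Français"),
    ("en", "English", "English"),
]


def language_label(code, english_name, native_name):
    """Build a label such as '[de] German - Deutsch'."""
    if english_name == native_name:
        # Avoid repeating the name when both forms are identical.
        return f"[{code}] {english_name}"
    return f"[{code}] {english_name} - {native_name}"


# Tuples sort by their first element, so this orders the list by code,
# which does not depend on any one language's alphabet.
labels = [language_label(*entry) for entry in sorted(LANGUAGES)]

for label in labels:
    print(label)
```

Sorting by code keeps the order stable no matter which UI language is active.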
There are two options: put the name of the language in the selected locale (or in English) first, followed by the name of the language in itself in parentheses, or the other way around, e.g.:
English
French (Français)
German (Deutsch)
Spanish (Español)
or
English
Français (French)
Deutsch (German)
Español (Spanish)
English language name:
Pros:
Predictable sorting.
No need to think about different text flows.
Cons:
Users who don't speak English might have a harder time finding their language.
If the rest of the application is translated, it might look sloppy or grammatically wrong: Ditt språk är English/votre langue est English.
Language in its own name:
Pros:
Easier for the non-English speaker.
You have to think about encoding and text flow; A useful exercise. :-)
Cons:
Harder to navigate if the user is used to English or has her mind set on finding an English name.
You have to consider all language variants.
What is right really depends on the rest of your application. You might want to consider having all language names translated into all languages. If English is chosen, then you get to pick from:
English
Swedish
French
If Swedish:
Engelska
Svenska
Franska
...and French:
Anglais
Suédois
Français
But then the translation problem has grown from O(n) to O(n^2), which might be acceptable depending on what your current value of n is.
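To make the growth concrete, here is a small sketch that stores the three-language example as a nested table. The `NAMES` layout and the `language_choices` helper are assumptions for illustration, not a recommended storage format.

```python
# The names from the example, keyed by UI language, then by target language.
NAMES = {
    "en": {"en": "English", "sv": "Swedish", "fr": "French"},
    "sv": {"en": "Engelska", "sv": "Svenska", "fr": "Franska"},
    "fr": {"en": "Anglais", "sv": "Suédois", "fr": "Français"},
}


def language_choices(ui_language):
    """Return the language names to show when the UI is in ui_language."""
    return sorted(NAMES[ui_language].values())


# With n languages the table holds n * n names instead of n.
total_names = sum(len(row) for row in NAMES.values())

print(language_choices("sv"))
print(total_names)
```

Three languages already need nine strings; adding a fourth language means seven more (a new row of four plus one new entry in each existing row).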
EDIT
As deceze points out, you will also have to handle the case when a user accidentally switches to a language she doesn't understand, and provide a way back - for example by always including a few major languages.
I find it harder to find "Magyar" in a list of languages.
Because there are languages with non-Latin scripts, this is not a simple first-letter lookup, as I lose focus when I first meet one of these.
Where should I look? At 'M' - Magyar? But where is M? EDIT: M in the (current language's) alphabet, not on the keyboard.
Have a look at this (from Wikipedia):
Български - I know, this is Bulgarian, but
བོད་ཡིག - what is this?
Bosanski
Català
Česky
Dansk
Deutsch
Ελληνικά
I would prefer something like this:
A...
B...
C...
.
.
Hungarian (Magyar)
If the UI were Japanese, I would be Ctrl+F-ing "Magyar", though.
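The list I have in mind could be produced roughly like this. It is only a sketch: the sample data and the `menu_entries` helper are made up for illustration.

```python
# Hypothetical sample data: (native name, English name).
LANGUAGES = [
    ("Български", "Bulgarian"),
    ("Magyar", "Hungarian"),
    ("Ελληνικά", "Greek"),
    ("Deutsch", "German"),
    ("Dansk", "Danish"),
]


def menu_entries(languages):
    """Sort by the English name and show the native name in parentheses."""
    by_english = sorted(languages, key=lambda pair: pair[1])
    return [f"{english} ({native})" for native, english in by_english]


entries = menu_entries(LANGUAGES)

for entry in entries:
    print(entry)
```

Because the sort key is always written in Latin letters, a first-letter lookup works, and the native name is still on the same line for anyone who searches for it.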
Whatever you do, don't use the IP location to set the language.
Google is very annoying about this: when logging on from a new location I get Google in the local language and script. This is particularly annoying anywhere southeast of Croatia.
The worst offender, though, is Microsoft. When I try to purchase software, their servers keep switching languages depending on my location, which in many cases makes it impossible to pay for anything by credit card, because addresses, ZIP codes etc. are validated in the local format and not in the format of the country where the credit card was actually issued. (By the way, MS, the leading digits of a credit card number identify the issuing institution, which is tied to a particular country, so it's not rocket science to work out that a UK postcode format is required rather than, say, a five-digit German postal code.)
Use country flags in combination with the language name in that language (Deutsch, Français, Nederlands, ...).
I don't know of any programming-related conventions about this, but I would prefer to see the name of a language in its own language.
For example:
English
Türkçe
Deutsch
Have a look at your Regional Settings.
This is how Microsoft implemented it. Seems like your version 1.
Screenshot: http://www.freeimagehosting.net/uploads/1c14f9f60d.jpg
I was given the task of coming up with shorter German words for the German version of our software.
It got me to thinking that there should be some sort of standard vocabulary for information technology somewhere. Like there "have to be" terms that most (if not all) German computer users use for what English-speakers call file, database, record, search, search terms, search hits, find and replace, delete, OCR ... you get the idea.
I found ISO 2382 on the ISO Web site, but it only seems to standardize English and French. Is there an equivalent standard for German? How about for Spanish, or for other languages?
I can suggest this book. Although quite dated, it was an attempt to come up with a set of standard computer terms for translating from German to English and back:
Grosses IWT-Wörterbuch der Computertechnik und der Wirtschaftsinformatik. Englisch-Deutsch. Deutsch-Englisch
I will offer up the answer, "no".
Even within English, there are no standard words to describe computer operations as you have presented them. Certainly one can "delete" a file, but one can also "erase" it, "remove" it, or (shudder) "move it to the trash can".
Instead of trying to solve the problem in the large, I suggest you solve the problem in the small. Build a glossary of commonly used German words, and whenever there is an opportunity to expand the Glossary, first look over the existing entries and do your best to reuse the current terminology.
In a way, the reason good English documentation works well is that good writers of English use a glossary-like technique, explicitly or implicitly. If much of your documentation comes from a single source, or a related set of sources, you can make a "translation map" of "when they say X, we say Y". But even such simplifications often require native readers to re-read the translation in context, as languages are not nearly regular enough to do simple substitution without many pitfalls.
As a starting point, The Open Group (www.opengroup.org) seems to have defined glossaries as part of their work on The Open Group Architecture Framework (TOGAF), which appear to be the sort of thing I needed. For example, these document numbers and titles are taken directly from their Web site:
C148 TOGAF® 9.1 Translation Glossary: English – Hrvatski (Croatian)
C149 TOGAF® 9.1 Translation Glossary: English – Castilian Spanish
C146 TOGAF® 9.1 Translation Glossary: English – Portuguese (Portugal)
C13H TOGAF® 9.1 Translation Glossary: English – Slovak
Normally I use Recaptcha for all captcha purposes, but now I'm building a website that is translated into Chinese and Japanese, among other languages. I'd like to make the captcha as accessible to those users as possible. Even if they can read and type English characters (which is not necessarily the case), often times even I as an English-speaker have had trouble figuring out what the word in Recaptcha has to be.
One good solution I've seen (from Google) is to use numbers instead of text. Are there other good solutions? Is there a reliable free captcha service out there such as Recaptcha that offers this option?
Chinese and Japanese users both use keyboards with Latin characters on them. Chinese users enter their thousands of characters via Pinyin (romanized Chinese), so they are very familiar with the same letters that you and I are. Therefore, whatever you are using for English-speaking people can also be used for them.
PS - I know this is an answer to an old post, but I'm hoping this answer will help anyone who comes here with the same question.
I have encountered the same problem in the past. I resolved it by using the following CAPTCHA, which uses numerical validation:
http://www.tipstricks.org/
However, this may not be the best solution for you, so here is an extensive list of different CAPTCHAs you might want to consider (most of them are text based, but some use alternative methods such as numerical expressions):
http://captcha.org/
Hope this helps
This may be a stupid question, but here goes.
I've seen several projects using some translation library (e.g. gettext) working with plain English placeholders. So for example:
_("Please enter your name");
instead of abstract placeholders (which has always been my instinctive preference)
_("error_please_enter_name");
I have seen various recommendations on SO to work with the former method, but I don't understand why. What I don't get is: what do you do if you need to change the English wording? If the actual text is used as the key for all existing translations, you would have to edit all the translations, too, and change each key. Or don't you?
Isn't that awfully cumbersome? Why is this the industry standard?
It's definitely not proper normalization to do it this way. Are there massive advantages to this method that I'm not seeing?
Yes, you have to alter the existing translation files, and that is a good thing.
If you change the English wording, the translations probably need to change, too. Even if they don't, you need someone who speaks the other language to check.
You prep a new version, and part of the QA process is checking the translations. If the English wording changed and nobody checked the translation, it'll stick out like a sore thumb and it'll get fixed.
The main language already exists: you don't need to translate it.
Translators have better context with a real sentence than with vague placeholders.
The placeholders are just the keys, so it's still possible to change the wording in the original language by creating a translation for it. When a translation doesn't exist, the placeholder is used as the translated text.
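A small sketch of that fallback, using Python's standard `gettext` module for illustration. The domain name `myapp` and the directory `no-such-locale-dir` are hypothetical; the sketch assumes no catalog exists there, which is what triggers the fallback.

```python
import gettext

# With fallback=True, gettext.translation() returns a NullTranslations
# object instead of raising an error when no catalog file is found.
# "myapp" and "no-such-locale-dir" are made-up names for this sketch.
translations = gettext.translation(
    "myapp",
    localedir="no-such-locale-dir",
    languages=["de"],
    fallback=True,
)
_ = translations.gettext

# No German catalog is available, so the key itself is returned.
message = _("Please enter your name")
print(message)
```

With an English sentence as the key, the untranslated output is still usable; with an abstract key, the user would see `error_please_enter_name`.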
We've been using abstract placeholders for a while and it was pretty annoying having to write everything twice when creating a new function. When English is the placeholder, you just write the code in English, you have meaningful output from the start and don't have to think about naming placeholders.
So my reason would be less work for the developers.
I like your second approach. When translating texts you always have the problem of homonyms. For example, 'open' can mean the state of a window but also the verb for performing the action. In other languages these homonyms may not exist. That's why you should be able to add meaning to your placeholders. The best approach is to put this meaning in your text library. If this is not possible on the platform or framework you use, it might be a good idea to define a 'development language'. This language adds meaning to the text entries, like 'action_open' and 'state_open'. You will of course have to put extra effort into translating this language to plain English (or the language you develop for). I have applied this philosophy in some large projects, and in the long run it saves some time (and headaches).
The best way in my opinion is keeping meaning separate so if you develop your own translation library or the one you use supports it you can do something like this:
_(i18n("Please enter your name", "error_please_enter_name"));
Where:
i18n(text, meaning)
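A minimal sketch of such an `i18n(text, meaning)` lookup, assuming a catalog keyed by both the meaning and the source text. The `CATALOG_DE` table and its German strings are made up for illustration. GNU gettext offers a comparable mechanism through message contexts (`pgettext`).

```python
# Hypothetical German catalog keyed by (meaning, source text).
CATALOG_DE = {
    ("action_open", "Open"): "Öffnen",
    ("state_open", "Open"): "Geöffnet",
}


def i18n(text, meaning, catalog=CATALOG_DE):
    """Look up text under a given meaning; fall back to the source text."""
    return catalog.get((meaning, text), text)


print(i18n("Open", "action_open"))
print(i18n("Open", "state_open"))
print(i18n("Please enter your name", "error_please_enter_name"))
```

The same English word can then map to two different translations, while anything missing from the catalog still comes out as readable English.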
Interesting question. I assume the main reason is that you don't have to care about translation or localization files during development as the main language is in the code itself.
Well it probably is just that it's easier to read, and so easier to translate. I'm of the opinion that your way is best for scalability, but it does just require that extra bit of effort, which some developers might not consider worth it... and for some projects, it probably isn't.
There's a fallback hierarchy, from most specific locale to the unlocalised version in the source code.
So French in France might have the following fallback route:
fr_FR
fr
Unlocalised. Source code.
As a result, having proper English sentences in the source code ensures that if a particular translation is not provided in step (1) or (2), you will at least get a proper, understandable sentence rather than random programmer garbage like “error_file_not_found”.
Plus, what do you do if it is a format string: “Sorry but the %s does not exist” ? Worse still: “Written %s entries to %s, total size: %d” ?
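The fallback route above can be sketched as follows. This is not how any particular library implements it; the `CATALOGS` table, its French string, and the helper names are assumptions for illustration.

```python
# Hypothetical catalogs: fr_FR has no entries, fr has one.
CATALOGS = {
    "fr_FR": {},
    "fr": {"Sorry but the %s does not exist": "Désolé, mais %s n'existe pas"},
}


def fallback_chain(locale):
    """Return the locales to try, most specific first."""
    chain = [locale]
    if "_" in locale:
        # 'fr_FR' falls back to the bare language 'fr'.
        chain.append(locale.split("_", 1)[0])
    return chain


def translate(message, locale, catalogs=CATALOGS):
    """Try each locale in the chain, then fall back to the source string."""
    for candidate in fallback_chain(locale):
        translated = catalogs.get(candidate, {}).get(message)
        if translated is not None:
            return translated
    # Step 3: unlocalised, straight from the source code.
    return message


print(translate("Sorry but the %s does not exist", "fr_FR") % "fichier")
print(translate("Written %s entries to %s, total size: %d", "fr_FR") % (3, "out.txt", 42))
```

The second call finds no translation at all, yet the format string still produces a sensible sentence because the key is the English text.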
Quite old question but one additional reason I haven't seen in the answers yet:
You could end up with more placeholders than necessary, thus more work for translators and possible inconsistent translations. However, good editors like Poedit or Gtranslator can probably help with that.
To stick with your example:
The text "Please enter your name" could appear in a different context in a different template (that the developer is most likely not aware of and shouldn't need to be). E.g. it could be used not as an error but as a prompt like a placeholder of an input field.
If you use
_("Please enter your name");
it would be reusable, the developer can be unaware of the already existing key for an error message and would just use the same text intuitively.
However, if you used
_("error_please_enter_name");
in a previous template, developers wouldn't necessarily be aware of it and would make up a second key (most likely according to a predefined wording scheme to not end up in complete chaos), e.g.
_("prompt_please_enter_name");
which then has to be translated again.
So I think that doesn't scale very well. A pre-agreed wording scheme of suffixes/prefixes e.g. for contexts can never be as precise as the text itself I think (either too verbose or too general, beforehand you don't know and afterwards it's difficult to change) and is more work for the developer that's not worth it IMHO.
Does anybody agree/disagree?
When I see a small program written for students, I often see something like this (Haskell, German):
ueber = "What the haeck!"
instead of
über = "What the häck!"
As many modern languages are specified to allow non-ASCII characters in declaration names via UTF-8, is there a special reason for avoiding these in a project that is sure to be used only by people who are able to input these characters (say, a team of German students), or is this just for historical reasons?
I know that you should keep names in a-zA-Z_0-9 if you develop an application internationally, but is there any reason to avoid this in a "local" project?
is there a special reason for avoiding these in a project, which is sure to be only for people who are able to input these characters
That is certainly the main reason. Other reasons that come to mind are that many development tools, search functions, editors, parsers, documentation generators, code search engines etc. will not expect non-ASCII input in code.
Also, you never know where your code may be used one day! The smallest innocent school project can grow into a nice open-source tool that gets used around the globe. In that case, ASCII is the lowest common denominator, at least at the moment.
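If a team wants to enforce this, a check along these lines would do. It is a sketch in Python rather than Haskell, and `is_portable_identifier` is a hypothetical helper, not part of any existing linter.

```python
def is_portable_identifier(name):
    """True if name is a valid identifier made of ASCII characters only."""
    return name.isidentifier() and name.isascii()


for name in ("ueber", "über"):
    print(name, is_portable_identifier(name))
```

Both names are legal identifiers in Python, so only the ASCII test tells them apart.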
I've had to work on a project started by French developers. They had to spend quite a bit of time translating their program to English when more people joined the project. Teach your German students this lesson up front, and not only will they be able to share their code with others, they'll no longer need an über or ueber variable either.
BTW, ü is an alphabetic character. + and - are non-alphanumeric, and I'd say it's obvious why they're disliked in function names.
Should I use ISO 639-1 (2-letter abbreviation) or ISO 639-2 (3-letter abbreviation) to store a user's language code? Both are official standards, but which is the de facto standard in the development community? I think ISO 639-1 would be easier to remember, and is probably more popular for that reason, but that's just a guess.
The site I'm building will have a separate site for the US, Brazil, Russia, China, & the UK.
http://en.wikipedia.org/wiki/ISO_639
You should use IETF language tags because they are already used for HTTP/HTML/XML and many other technologies. They are based on several standards including the ISO-639 collection (yes language, region and culture selection are not so simple to define).
I wrote a more detailed article regarding the proper language code selection and usage. The idea is to use the simplest/shorter ISO-639-1 codes and specify more only for special cases. Inside the article there are codes for ~30 most used languages with reasons why I consider one alternative better than another.
In case you want to skip reading the entire article here is a short list of language codes (not to be confused with country codes): ar, cs, da, de, el, en, en-gb, es, fr, fi, he, hu, it, ja, ko, nb, nl, pl, pt, pt-pt, ro, ru, sv, tr, uk, zh, zh-hant
The following points may not be obvious but should be borne in mind:
en is used for en-us (American English), and en-gb is used for British English
pt is used for pt-br, and not for pt-pt, which has far fewer speakers
zh is used instead of zh-hans, zh-CN, ...
zh-hant (Traditional Chinese) is used instead of more specific codes like zh-hant-TW or zh-TW
You can find more explanations inside the article.
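The rules above could be applied with a small normalization step like the one below. This is only a sketch: the `ALIASES` table covers just the cases named above, and `normalize_tag` is a hypothetical helper, not part of any standard library.

```python
# The short list of supported codes from above.
SUPPORTED = {
    "ar", "cs", "da", "de", "el", "en", "en-gb", "es", "fr", "fi", "he",
    "hu", "it", "ja", "ko", "nb", "nl", "pl", "pt", "pt-pt", "ro", "ru",
    "sv", "tr", "uk", "zh", "zh-hant",
}

# More specific tags that should collapse to a shorter code.
ALIASES = {
    "en-us": "en",
    "pt-br": "pt",
    "zh-hans": "zh",
    "zh-cn": "zh",
    "zh-hant-tw": "zh-hant",
    "zh-tw": "zh-hant",
}


def normalize_tag(tag):
    """Map a language tag to one of the supported codes, or None."""
    tag = tag.replace("_", "-").lower()
    tag = ALIASES.get(tag, tag)
    if tag in SUPPORTED:
        return tag
    # Otherwise fall back to the bare language, e.g. 'de-at' -> 'de'.
    primary = tag.split("-", 1)[0]
    return primary if primary in SUPPORTED else None


for tag in ("en-US", "en-GB", "pt_BR", "zh-TW", "de-AT"):
    print(tag, "->", normalize_tag(tag))
```

Lowercasing first means tags written as `zh-TW`, `zh-tw` or `zh_TW` all end up in the same bucket.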
I would go with a derivative of ISO 639. Specifically I like to use this: http://en.wikipedia.org/wiki/IETF_language_tag
I'm no expert, but every site I've ever seen uses ISO 639-1, including the current site I'm working on.
It works for us!
I've only ever seen 2-character language codes in use - so I'd recommend going with them unless your work involves delving into linguistics in some way. If all you're doing is customizing the browsing experience for the world at large, you won't need the extra repertoire offered by 3-character codes.
ISO 639-1 Alpha-2 codes are used pretty much universally.
They are used for example in HTTP content negotiation. If you ever wondered how an international website can automatically show you their homepage in your native language, that's how it works. (Although it's sometimes kinda annoying. I, for example, often get shown the default Apache homepage in German, because the webmaster turned on content negotiation, but only put content for English in.)
Most web browsers use them directly in their settings dialog box.
Most operating systems use them in their settings dialog boxes or configuration files.
Wikipedia uses them in their server names for the different language versions.
In other words: if your users aren't native English speakers, they will probably already have encountered them when configuring their software, because otherwise they wouldn't be able to use their computers.
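For the content negotiation case, the browser sends its preferred languages in the HTTP `Accept-Language` header. Below is a simplified sketch of reading it; the helper names and the `en` default are assumptions, and real servers implement more of the specification than this.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language value, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Negative quality sorts the highest weight first; position
            # keeps the original order among equal weights.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def pick_language(header, available, default="en"):
    """Pick the first preferred language that the site offers."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]
        if primary in available:
            return primary
    return default


print(pick_language("de-CH,de;q=0.9,en;q=0.8", {"en", "de"}))
```

Falling back to a default when nothing matches avoids the situation described above, where a visitor is handed a page in a language the site never actually provided content for.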
The other members of the ISO 639 family are mostly of interest to linguists. Unless you expect Jesus Christ himself (ISO 639-2 Alpha-3 code arc) to visit your website, or maybe Klingons (tlh), ISO 639-1 has more languages than you ever can hope to support.