I want to make a multi-language site, such that all or almost all pages will be available in 2 or more translations. What are the best practices to follow?
For example, I consider these language selection mechanisms:
Cookie-based selection of the preferred language.
Based on Accept-Language header if the cookie is not set.
Based on GeoIP otherwise (probably).
Is there anything else?
How should different translations be served?
as LANG.example.com/page
as example.com/LANG/page
as example.com/page?hl=LANG
...
any of the above with a redirect to example.com/page? (It seems to be discouraged)
How to ensure that all the translations are properly indexed?
Are sitemaps listing all pages plus a correct Content-Language header enough?
What is the best way to let users know that other translations exist without distracting them?
list available languages in the header/footer/sidebar (like Wikipedia)
put “Choose a language” selector next to the content
What is the best policy to deal with missing/outdated translations?
do not display missing pages at all or display a page in a different language?
display old translation, old translation with a warning or a page in a different language?
What else should I take into account? What should I do, and what should I definitely not do?
In addition to Quassnoi's answer, make sure that you use standard RFC 4646 language identifiers (e.g. EN-US, DE-AT); you may already be aware of this. (RFC 4646 has since been superseded by RFC 5646, the current BCP 47.) The CLDR project is an excellent repository of internationalization data (the Supplemental Data is really useful).
If a translation of a specific page is not available, use a language fallback mechanism back to the neutral language; for example "DE-AT", "DE", "" (neutral, e.g. "EN").
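The fallback described above can be sketched as follows. This is only an illustration: the function names are made up, the neutral language is assumed to be "en", and tags are lower-cased because language tags compare case-insensitively.

```python
def fallback_chain(tag: str, neutral: str = "en") -> list[str]:
    """Return the locales to try, most specific first, ending with the neutral one."""
    parts = [p for p in tag.replace("_", "-").split("-") if p]
    # "DE-AT" -> ["de-at", "de"]: drop one subtag at a time from the right.
    chain = ["-".join(parts[:i]).lower() for i in range(len(parts), 0, -1)]
    if neutral.lower() not in chain:
        chain.append(neutral.lower())
    return chain


def pick_translation(tag: str, available: set[str], neutral: str = "en") -> str:
    """Pick the first locale in the fallback chain that has a translation."""
    available_lower = {a.lower() for a in available}
    for candidate in fallback_chain(tag, neutral):
        if candidate in available_lower:
            return candidate
    # Assumption: the neutral language always exists, so it is the last resort.
    return neutral.lower()
```

For a page that exists only in "de" and "en", a visitor asking for "DE-AT" would get the "de" version, and a visitor asking for "fr-CA" would get the neutral one.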
Most recent browsers and the underlying operating systems will correctly show all of the characters required for a locale selector list if the page is encoded correctly (I'd recommend all pages being UTF-8). Ensure that the locale list contains both the native and current-language names to allow both native and non-native speakers to view the specified translations, e.g. "Deutsch (German)" if the current locale is EN-*.
A lot of sites use a flag icon to show the current locale, but this is more relevant to the location and some people may be offended if you show only a dominant flag (e.g. the US or UK flag for English).
It may be worthwhile to have a more visible (semi-graphical) locale selector on the home page if no locale cookie has been submitted, using a combination of GeoIP and Accept-Language to determine the default locale choice.
Semi-related: if your users are located in different time zones, include a time zone preference in their account profile so that time values are displayed in their local time, and store all timestamps in UTC.
Decide early on whether you need to support languages that require double-byte characters (Chinese, Japanese, Korean, etc.); Unicode is the preferable choice. It can be tedious to change later, especially if you have a database that doesn't use Unicode.
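A minimal sketch of the "store UTC, display local" idea, using Python's standard zoneinfo module. The function name and the idea of keeping an IANA zone name (such as "Europe/Berlin") in the profile are assumptions made for the example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_user_time(stored_utc: datetime, user_zone: str) -> datetime:
    """Convert a UTC timestamp from storage into the user's preferred zone."""
    if stored_utc.tzinfo is None:
        # Assumption: naive values coming from storage are already in UTC.
        stored_utc = stored_utc.replace(tzinfo=timezone.utc)
    return stored_utc.astimezone(ZoneInfo(user_zone))


# What gets stored: always UTC.
created_at = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)

# What gets displayed: the same instant in the zone from the user's profile.
berlin = to_user_time(created_at, "Europe/Berlin")
print(berlin.isoformat())
```

Only the presentation changes; the converted value still refers to the same instant as the stored one.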
Cookie-based selection of the preferred language.
Based on Accept-Language header if the cookie is not set.
These two you should support.
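Roughly, the two mechanisms could be combined like this. It is a sketch, not a complete parser: the function names, the supported set, and the "en" default are assumptions, and only the q parameter of the header is interpreted.

```python
from typing import Optional


def parse_accept_language(header: str) -> list[str]:
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag or tag == "*":
            continue
        quality = 1.0  # a missing q parameter means q=1
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(cookie_lang: Optional[str], accept_language: Optional[str],
                    supported: set[str], default: str = "en") -> str:
    """Cookie wins; otherwise the best supported Accept-Language entry."""
    if cookie_lang and cookie_lang.lower() in supported:
        return cookie_lang.lower()
    for tag in parse_accept_language(accept_language or ""):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "de-at" may be served by plain "de"
        if primary in supported:
            return primary
    return default
```

With supported languages {"en", "de"}, a request without a cookie and with the header "de-AT,en;q=0.8" would be served in "de"; a request carrying a valid language cookie ignores the header entirely.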
Put a big English banner at the top of your page that reads "This page in English".
as example.com/LANG/page
This is the best choice.
LANG.example.com isn't good for autocomplete, and the question marks in example.com/page?hl=LANG look ugly.
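With the example.com/LANG/page scheme, the server has to peel the language off the front of the path before routing. A small sketch, assuming a hypothetical set of supported languages and "en" as the default for unprefixed paths:

```python
SUPPORTED = {"en", "de", "fr"}  # hypothetical list of site languages


def split_language_prefix(path: str, supported=SUPPORTED, default: str = "en"):
    """Split '/de/page' into ('de', '/page'); unprefixed paths get the default."""
    head, _, rest = path.lstrip("/").partition("/")
    if head.lower() in supported:
        return head.lower(), "/" + rest
    return default, (path if path.startswith("/") else "/" + path)


def localized_url(lang: str, page: str) -> str:
    """Build the URL of the same page in another language."""
    return f"/{lang}{page}"
```

The second helper is what a language list in the header or footer would use to link to the other translations of the current page.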
list available languages in the header/footer/sidebar (like Wikipedia)
A "Choose a language" drop-down is confusing: it is unintelligible when written in a foreign language the visitor cannot read, and it spoils the overall impression when written in English.
And you always tend to make the mistake of selecting a language you don't even have fonts for, leaving yourself on a page full of question marks.
display old translation with a warning
You know there is something you can read and get the point, but for the details you'd better get a dictionary and read it in English.
I'm currently working on an internationalisation project for a large web application - initially we're just implementing French but more languages will follow in time. One of the issues we've come across is how to display adjectives.
Let's take "Active" as an example. When we received translations back from the company we're using, they returned "Actif(ve)", as English "Active" translates to masculine "Actif" or feminine "Active". We're unsure of how to display this, and wondered if there are any well established conventions in the web development world.
As far as I see it there are three possible scenarios:
We know at development time which noun a given adjective is referring to. In this case we can determine and use the correct gender.
We're referring to a user, either directly ("you") or in the third person. Short of making every user have a gender, I don't see a better approach than displaying both, i.e. "Actif(ve)"
We are displaying the adjective in isolation, not knowing which noun it's referring to. For example in a table of data, some rows might be dealing with a masculine entity, some feminine.
Scenarios 2 and 3 seem to be the toughest ones. Does anyone have any experience handling these issues? Any tips would be appreciated!
This is complex because we cannot foresee all the cases, and there is a risk of drifting into an "opinion-based" answer, so I'll keep it short and generic.
I usually prefer to give the translator context in the string itself, e.g. by providing a template such as _("active {user_name}") (this also lets the word order come out right if a language wants a different ordering).
Then you may need to change the code and templates into _("active {first_name_feminine}") and _("active {first_name_masculine}") (and possibly more for duals, trials, plurals, collectives, honorifics, etc.). Note: check that the translator does not mangle the {} and the string inside; usually you need specific export/import scripts. Alternatively, I add a note to the translator inside the string, and quickly "translate" it into English by removing the note. This can also be automated (be creative in using special Unicode characters, which should not appear in normal text, to delimit such notes).
But if you cannot know the gender, "Actif(ve)" may well be the polite form used in that language. You need to test with a native speaker and expect some back-and-forth changes.
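One way to picture the approach above: keep one variant per gender for each message, plus an inclusive form for when the gender is unknown. The catalog layout, keys, and function name below are invented for the illustration and do not belong to any particular i18n library.

```python
# Hypothetical French catalog: each message carries gendered variants and an
# "unknown" form to use when the gender of the noun or user is not known.
MESSAGES_FR = {
    "status.active": {
        "masculine": "Actif",
        "feminine": "Active",
        "unknown": "Actif(ve)",
    },
    "user.active": {
        "masculine": "{user_name} est actif",
        "feminine": "{user_name} est active",
        "unknown": "{user_name} est actif(ve)",
    },
}


def translate(catalog: dict, key: str, gender: str = None, **params) -> str:
    """Pick the variant for the given gender, else the inclusive form."""
    variants = catalog[key]
    template = variants.get(gender or "unknown", variants["unknown"])
    # Named placeholders let the translator move {user_name} within the sentence.
    return template.format(**params)


print(translate(MESSAGES_FR, "status.active", "feminine"))
print(translate(MESSAGES_FR, "user.active", user_name="Dominique"))
```

Scenario 1 from the question maps to passing a known gender; scenarios 2 and 3 map to omitting it and getting the inclusive form.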
So the question is: can I declare that my application supports en-US and en-GB, and use a single resource file for all of them?
My intention is to make the application available in all English-speaking countries, but it's pointless to maintain different translations, because there is nothing culture-specific to translate.
Given that intention, does it make sense to declare all those specific cultures in the manifest?
Yes - just use one English file and make it the default culture. This way, even when en-GB is selected, for example, the app will fall back to en-US :)
As for date formatting - just be sure to use CurrentCulture - it gets formatting from the Regional and Number settings (and not CurrentUICulture which is for language needs only). This way people with, say, en-US UI language and Number formatting set to de-DE will still see the app in English but have number formatting as German.
There is a common confusion between CurrentCulture and CurrentUICulture, and a common assumption that language equals formatting. That's why I see many 12-hour formats throughout Windows Phone/Store apps that simply ignore my Regional settings. A must-read regarding the confusion about UI and Number formatting: http://forums.asp.net/post/1080435.aspx
I am working on a single page app that requires internationalization features (translation of all static strings into the user's language and setting date and currency formats).
I am using Ember.js, so most of the static strings are in HTML blocks (in templates or views) or in typical JavaScript messages such as "Are you sure you want to delete the ..." (part of the controller files).
I am looking for best practices and experiences on how to abstract all these strings and other locale specific bits out of the application.
The main problem I see is that the user's language is only determined after logging in. But at that moment, the complete application is already loaded (in English), so "redirecting" to another language is not really possible (unless you load all strings of all possible languages at application start, but this would require too much data to be loaded at start).
Any feedback is welcome!
-- UPDATE --
I found in the meantime the ember-i18n library which I can use for the translation of strings (https://github.com/jamesarosen/ember-i18n).
My main question however remains: how can you dynamically load the translation.js file corresponding to a selected language, or to the user's language after login?
And is there a way to store the selected language so that at the next application start, the application uses the correct language (and thus loads the correct translations file before rendering the UI)?
Hope somebody can help.
Marc
You could store the language setting in a cookie, but that is not 100% reliable. Or just load the translation JSON with an Ajax call as soon as you know the language.
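The "load the JSON once the language is known" idea boils down to a small on-demand cache. The sketch below shows the logic in Python rather than in the app's JavaScript; the class name is invented, and `fetch` stands in for whatever Ajax call returns the JSON text for a language.

```python
import json
from typing import Callable, Dict


class TranslationStore:
    """Loads one language's strings on demand and keeps them cached."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch  # returns the raw JSON text for a language code
        self._cache: Dict[str, Dict[str, str]] = {}

    def load(self, lang: str) -> Dict[str, str]:
        if lang not in self._cache:
            self._cache[lang] = json.loads(self._fetch(lang))
        return self._cache[lang]

    def t(self, lang: str, key: str) -> str:
        # Fall back to the key itself so a missing string is visible, not fatal.
        return self.load(lang).get(key, key)


# Stand-in for the server: one JSON document per language.
FAKE_SERVER = {"en": '{"greeting": "Hello"}', "fr": '{"greeting": "Bonjour"}'}
calls = []


def fake_fetch(lang: str) -> str:
    calls.append(lang)
    return FAKE_SERVER[lang]


store = TranslationStore(fake_fetch)
print(store.t("fr", "greeting"))
print(store.t("fr", "greeting"))  # served from the cache, no second fetch
```

Only the language actually in use is ever transferred, which avoids shipping every language at application start.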
Is there a problem if I have both English and Chinese versions of the same title/meta tags under the exact same URL? I detect the language the user has set for the browser (through the HTTP "Accept-Language" header field) and change the titles/meta tags based on that language. I get a large percentage of my traffic from China and felt this was a better-localized user experience for those users, BUT I have no idea how Google would view this. My gut feeling tells me that this is not good for SEO.
Baidu.com, a major Chinese search engine, does in fact pick up my translated tags; however, for other US-based sites it does not translate their English title/meta tags into Chinese. I would think Chinese users are less likely to click on those.
Creating subdomains and/or separate domains for other countries is not an option at this point. That being said, should I only have one language (English) for my title/meta tags to avoid any search engine issues?
Thanks for any advice / wisdom you can offer. Really hoping to get clarity on best practices.
Thanks all!
Yes, it probably is a problem. Search engines see mixed language content. You are not describing how you “detect and change the titles/meta tags based on the users browser language”, but you are probably doing it client-side and using “browser language”, which is wrong whatever it means in detail (it does not specify the user’s preferred language).
To get a more targeted answer, ask a more real question, with a URL.
If you want to get search traffic from search engines in both English and Chinese, you should have two URLs instead of one.
When Googlebot crawls a page, it does not even send the "Accept-Language" header, so you have to serve it your default language. With a single URL there is no way to get your second language indexed, and you won't rank in search engines in multiple languages.
For best SEO, use separate top-level domains, subdomains, or folders for different languages.
http://example.de/
http://example.es/
http://example.com/
http://de.example.com/
http://es.example.com/
http://www.example.com/
http://example.com/de/
http://example.com/es/
http://example.com/en/
I think there is no problem when you use English and Chinese in the same meta tags.
What are the steps to develop a multilingual web application?
Should I store the language texts and resources in a database, or should I use property files or resource files?
I understand that I need to use CurrentCulture with C#, along with CultureFormat etc.
I wanted to know your opinions on the steps to build a multilingual web application.
Doesn't have to be language specific. I'm just looking for steps to build this.
The specific mechanisms are different depending on the platform you are developing on.
As a cursory set of work items:
Separation of code from content. Generally, resources are compiled into assemblies with the help of resource files (in .NET), stored in property files (in Java, though there are other options), or kept in some other location, and referred to by ID. If you want localization costs to be reasonable, you need to avoid changes to the IDs between releases, as most localization tools will treat new IDs as new content.
Identification of areas in the application which make assumptions about the locale of the user, especially date/time, currency, number formatting or input.
Create some mechanism for locale-specific CSS content; not all fonts work for all languages, and not all font-sizes are sane for all languages. Don't paint yourself into a corner of forcing Thai text to be displayed in 8 pt. Also, text directionality is going to be right-to-left for at least two languages.
Design your page content to reflow or resize reasonably when more or less content than you expect is present. Many languages expand 50-80% from English for short strings, and 30-40% for longer pieces of content (that's a rough rule of thumb, not a law).
Identify cultural presumptions made by your UI designers, and try to make them more neutral, or, if you've got money and sanity to burn, localizable. Mailboxes don't look the same everywhere, hand gestures aren't universal, and something that's cute or clever or relies on a visual pun won't necessarily travel well.
Choose appropriate encodings for your supported languages. It's now reasonable to use UTF-8 for all content that's sent to web browsers, regardless of language.
Choose appropriate collation for your databases, or enable alternate collations, if you are dealing with content in multiple languages in your databases. Case-insensitivity works differently in many languages than it does in English, and accent insensitivity is acceptable in some languages and generally inappropriate in others.
Don't assume words are delimited by spaces or that sentences are delimited by punctuation, if you're trying to support search.
Avoid:
Storing localized content in databases, unless there's a really, really, good reason. And then, think again. If you have content that is somewhat dynamic and representatives of each region need to customize it, it may be reasonable to store certain categories of content with an associated locale ID.
Trying to be clever with string concatenation. Also, try not to assume that rules about pluralization or counting work the same for every culture. Make sure, at least, that the order of strings (and controls) can be specified with format strings that are typical of your platform, or well documented in your localization kit if you elect to roll your own for some reason.
Presuming that it's ok for code bugs to be fixed by localizers. That's generally not reasonable, at least if you want to deliver your product within a reasonable time at a reasonable cost; it's sometimes not even possible.
The first step is to internationalize. The second step is to localize. The third step is to translate.
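To make the advice about concatenation and pluralization concrete, here is a small sketch. The catalogs, keys, and function names are invented for the example, and the plural rules are hand-written simplifications for integers; real projects would normally take them from CLDR data.

```python
# Named placeholders let each translation choose its own word order,
# which concatenating "user + ' deleted ' + file" cannot do.
DELETED = {
    "en": "{user} deleted {file}",
    "de": "{file} wurde von {user} gelöscht",
}


def plural_en(n: int) -> str:
    return "one" if n == 1 else "other"


def plural_pl(n: int) -> str:
    # Polish distinguishes 1, 2-4 (but not 12-14), and everything else.
    if n == 1:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


PLURAL_RULES = {"en": plural_en, "pl": plural_pl}

FILES = {
    "en": {"one": "{count} file", "other": "{count} files"},
    "pl": {"one": "{count} plik", "few": "{count} pliki", "many": "{count} plików"},
}


def format_deleted(lang: str, user: str, file: str) -> str:
    return DELETED[lang].format(user=user, file=file)


def format_files(lang: str, count: int) -> str:
    category = PLURAL_RULES[lang](count)
    return FILES[lang][category].format(count=count)


print(format_deleted("de", "Ana", "report.pdf"))
print(format_files("pl", 22))
```

The code never assembles a sentence from fragments; it only selects a complete template by ID and fills in values, which is what keeps the strings translatable.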