i18n URL format with language parameter always in URL - codeigniter

I'm working on a new website and I have a question regarding internationalization (i18n) and SEO.
Here's the case. I'm using this CodeIgniter code to translate my website in 2 languages.
Let's say these languages are English and Dutch.
When using this code, people visiting mydomain.com are redirected to mydomain.com/en/defaultcontrollername
where English is the default language.
I will have a simple selectbox where users can switch to Dutch. The url will then be the same, except "en" is replaced with "nl".
Now my question is: is it bad practice in terms of SEO that the default language is always present in the URL? Also, does an immediate redirect to mydomain.com/en/defaultcontrollername after opening mydomain.com have an effect on SEO?
Is it better to always use a default language and present that first, or to "guess" the language based on browser headers?
Last but not least: do I also have to translate my controller names?
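For reference, here is a rough sketch of how such a setup is commonly wired up in CodeIgniter; the route patterns, controller name and redirect below are illustrative assumptions, not the questioner's actual code:
// application/config/routes.php: strip the language prefix before normal routing
$route['default_controller'] = 'defaultcontrollername';
$route['(en|nl)/(.+)'] = '$2';
$route['(en|nl)'] = $route['default_controller'];

// very early in the request (e.g. index.php or a pre_system hook),
// redirect bare URLs to the default language
if (!preg_match('#^/(en|nl)(/|$)#', $_SERVER['REQUEST_URI'])) {
    header('Location: /en' . $_SERVER['REQUEST_URI'], true, 302);
    exit;
}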

See: How should I structure my urls for both SEO and localization?
Having a folder (or subdomain, or top level domain) with the language is the preferred setup for optimal SEO.
You can translate words in the URL itself (the controller name in your case).
It makes the URLs look more appealing to non-English users.
Any non-ASCII characters have to be URL encoded (UTF-8).
URLs can get very long with even a few international characters.
Browsers often display URLs decoded in the address bar, but they get copied and pasted in their encoded form.
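To make the last two points concrete, here is what that encoding looks like with PHP's rawurlencode (the example words are arbitrary):
// Non-ASCII characters in a URL path are percent-encoded as UTF-8 bytes:
echo rawurlencode('münchen'); // m%C3%BCnchen (one character becomes six)
echo rawurlencode('стол');    // %D1%81%D1%82%D0%BE%D0%BB (four characters become 24)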

Related

Internationalizing Title/Meta Tags ok or bad practice?

Is there a problem if I have both English and Chinese versions of the same title/meta tags under the same exact url? I detect the language the user has set for the browser (through the http header "accept-language" field) and change the titles/meta tags based on the language set. I get a large percentage of my traffic from China and felt this was a better-localized user experience for those users BUT I have no idea how Google would view this. My gut feeling tells me that this is not good for SEO.
Baidu.com, a major Chinese search engine, does in fact pick up my translated tags however for other US based sites it does not translate their English title/meta tags into Chinese. I would think Chinese users are less likely to click on those.
Creating subdomains and/or separate domains for other countries is not an option at this point. That being said, should I only have one language (English) for my title/meta tags to avoid any search engine issues?
Thanks for any advice / wisdom you can offer. Really hoping to get clarity on best practices.
Thanks all!
Yes, it probably is a problem. Search engines see mixed-language content. You don't describe how you "detect and change the titles/meta tags based on the user's browser language", but you are probably doing it client-side based on the "browser language", which is wrong whatever that means in detail (it does not reliably indicate the user's preferred language).
To get a more targeted answer, ask a more concrete question, with a URL.
If you want to get search traffic in both English and Chinese, you should have two URLs instead of one.
When Googlebot crawls a page, it does not even send the "Accept-Language" header, so you end up serving it your default language. With only one URL, there is no way for your second language to be indexed, and you won't rank in search engines in multiple languages.
For best SEO, use separate top level domains, subdomains, or folders for different languages.
http://example.de/
http://example.es/
http://example.com/
http://de.example.com/
http://es.example.com/
http://www.example.com/
http://example.com/de/
http://example.com/es/
http://example.com/en/
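Whichever structure you choose, you can also tell search engines explicitly which URL serves which language with rel="alternate" hreflang annotations (not mentioned above, but the standard way to link language versions). A minimal sketch, assuming the folder-style URLs from the example and using PHP only to print the tags:
// In the <head> of every language version, list all the alternates (including itself):
$alternates = [
    'en' => 'http://example.com/en/',
    'es' => 'http://example.com/es/',
    'de' => 'http://example.com/de/',
];
foreach ($alternates as $lang => $url) {
    echo '<link rel="alternate" hreflang="' . $lang . '" href="' . $url . '">' . "\n";
}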
I think there is no problem when you use English and Chinese in the same meta tags.

@Url.Content not encoding text - ASP.NET MVC with Razor

I need to build a URL like this: /products/myproductdescription/5; it works except when the product description contains a "/".
I build the link with razor in this way:
@product.Description
I thought that using @Url.Content would encode the description, but I get a link with the "/" if it is present in the description.
I tried this way also:
@product.Description
but the result is the same ...
Can someone tell me why that part of the link is not encoded?
Thank you.
You should avoid using special characters in the path portion of a URL. You could use slugs and replace all dangerous characters; that's how Stack Overflow handles question titles in the URL, for example. In that case, in order to be able to uniquely identify the resource, always use an id; the description should only be used for SEO purposes.
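The question is about Razor, but the slug idea itself is language-agnostic; here is a minimal sketch in PHP under those assumptions (the function name and URL shape are illustrative):
function slugify($text) {
    // Transliterate to ASCII where possible (the result is locale dependent)
    $text = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text);
    // Lowercase, then collapse anything that is not a letter or digit into single dashes
    $text = preg_replace('/[^a-z0-9]+/', '-', strtolower($text));
    return trim($text, '-');
}

$productId          = 5;
$productDescription = 'My product / description';

// The id identifies the product; the slug is only there for readability/SEO.
echo '/products/' . $productId . '/' . slugify($productDescription);
// -> /products/5/my-product-description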

Is it possible to create INTERNATIONAL permalinks?

I was wondering how you deal with permalinks on international sites. By permalink I mean a link which is unique and human readable.
For English phrases that's no problem, e.g. /product/some-title/,
but what do you do if the product title is in, say, Chinese?
How do you deal with this problem?
I am implementing an international site and one requirement is to have human-readable URLs.
Thanks for every comment
Characters outside a small subset of US-ASCII are not permitted unencoded in URLs according to the URL specification (RFC 3986), so raw Chinese strings would be out immediately.
Where the product name can be localised, you can use urls like <DOMAIN>/<LANGUAGE>/DIR/<PRODUCT_TRANSLATED>, e.g.:
http://www.example.com/en/products/cat/
http://www.example.com/fr/products/chat/
accompanied by a mod_rewrite rule to the effect of:
RewriteRule ^([a-z]+)/products/([a-z]+)/?$ product_lookup.php?lang=$1&product=$2
For the first example above, this rule will call product_lookup.php?lang=en&product=cat. Inside this script is where you would access the internal translation engine (from the lang parameter, en in this case) to do the same translation you do on the user-facing side to translate, say, "Chat" on the French page, "Cat" on the English, etc.
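A minimal sketch of what product_lookup.php could look like; the translation table below is hypothetical and stands in for the internal translation engine mentioned above:
<?php
// product_lookup.php (illustrative only; a real site would query its own catalogue)
$products = [
    'en' => ['cat' => 1001, 'dog' => 1002],
    'fr' => ['chat' => 1001, 'chien' => 1002],
];

$lang = $_GET['lang'] ?? 'en';
$slug = $_GET['product'] ?? '';

if (!isset($products[$lang][$slug])) {
    http_response_code(404);
    exit('Unknown product');
}
$productId = $products[$lang][$slug];
// ... load product $productId and render the page in language $lang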
Using an external translation API would be a good idea, but tricky to get a reliable one which works correctly in your business domain. Google have opened up a translation API, but it currently only supports a limited number of languages.
English <=> Arabic
English <=> Chinese
English <=> Russian
Take a look at Wikipedia.
They use national characters in URLs.
For example, the Russian home page URL is: http://ru.wikipedia.org/wiki/Заглавная_страница. The browser transparently encodes all non-ASCII characters, replacing them with their codes when sending the URL to the server.
But on the web page all URLs are human-readable.
So you don't need to do anything special -- just put your product names into URLs as is.
The webserver should be able to decode them for your application automatically.
I usually transliterate the non-ASCII characters. For example, "täst" would become "taest". GNU iconv can do this for you (I'm sure there are other libraries):
$ echo täst | iconv -t 'ascii//translit'
taest
Alas, these transliterations are locale dependent: in languages other than German, 'ä' could be transliterated as simply 'a', for example. On the other hand, there should be a transliteration into ASCII for every commonly used character set.
How about a scheme like /productid/{product-id-number}/some-title/,
where the site looks at the {product-id-number} and ignores the 'some-title' part entirely? You can put that title into whatever language or encoding you like, because it's not being used.
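A minimal sketch of that scheme in PHP (the path pattern and lookup are illustrative): the route only cares about the numeric id, so the trailing title can be Chinese, percent-encoded, or anything else.
// e.g. /productid/42/some-title or /productid/42/%E7%8C%AB; only the "42" matters
if (preg_match('#^/productid/(\d+)(?:/.*)?$#', $_SERVER['REQUEST_URI'], $m)) {
    $productId = (int) $m[1];
    // ... look the product up by $productId; the title segment is ignored
} else {
    http_response_code(404);
}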
If memory serves, you're only able to use ASCII characters in URLs. There's a discussion to change that, but I'm fairly positive it hasn't been implemented yet.
That said, you'd need a lookup table where you assign translations of products/titles to whatever word they'll be in the other language. For example:
foo.com/cat will need a translation lookup for "cat", "gato", "neko", etc.
Then your HTTP module, which parses those human-readable segments into an exact URL, will know which page to serve based upon the translations.
Creating a lookup for such a thing seems like overkill to me. I cannot create a lookup for all the different words in all languages. Maybe accessing a translation API would be a good idea.
So as far as I can see it's not possible to use foreign characters in the permalink, as the URL spec does not allow it.
What do you think of encoding the special characters? Are those URLs recognized by Google then?

Are clean URLs a backend or a frontend thing

What do you think: are clean URLs a backend or a frontend 'discipline'?
The answer is BOTH.
For example:
https://stackoverflow.com/questions/203278/are-clean-urls-a-backend-or-a-frontend-thing
The number above is a database id, a back-end thing. Chop off the pretty part and it goes to the same page. Therefore the "are-clean-urls-a-backend-or-a-frontend-thing" is part of the front-end thing.
If we're talking about URLs being 'clean' from an end-user-experience point of view, then I'm going to break the mould a bit and say that URLs in general are not intuitive and they never will be; they are intended to be machine readable.
There is no standard for the format of a URL, so when navigating from site to site humans will never remember how to reach a resource purely by recalling URLs and their 'friendly syntax'. We can argue the toss about whether using '?' and '&' or '/' to identify a resource via a URL is better one way or the other; it doesn't matter. At the end of the day a machine parses it and sends back the result.
We should stop deluding ourselves that people actually type these things in and realise that URIs are for machines, not people.
I have yet to use or remember a URI that goes beyond the first few characters of the http://domain.com/ part of an address, and I've been using the web for a long time. That's what bookmarks are for. Nowhere on a website does it say 'change this part here in our URL to view such-and-such resource', because URLs are usually undocumented and opaque.
Yes, make your URIs SEO friendly (hell, they even change periodically), but forget about the whole 'human/clean' resource identifier thing; it's a mystical pipe dream.
I agree with Vlion that URLs should provide a unique mechanism to bookmark a resource and return to it (unlike some of these abominable web 2.0 ajax/silverlight/flash creations), but the bookmark will never be for humans to comprehend and understand. There seems to be quite a lot of preoccupation and energy spent on dreaming up URL strategies that humans can remember and type in; it's a waste of energy. Let's get on and solve real problems.
Sorry for the rant, but there's a lot of web 2.0 nonsense related to URLs going on in certain circles that is just a total waste of time.
Now that Firefox's Awesome Bar and Google Chrome's Omnibox can be used to search the browsing history, it is much easier for users to find previously visited sites, so having clean URLs may help users find sites in their history more easily.
Making sure the page has an appropriate Title is important (as both browsers search the title as well as the url) but by making sure the url has relevant keywords in it as well, when those keywords are typed in the address bar the urls will be more likely to show up higher in the suggestions as the keyword will be matched twice, in the url and the title.
Also, once a user has typed the name of a site they will be presented with example urls from the site which they can then use as a template for narrowing down their search. So using verbs and nouns in the url for different sections or actions of the site will aid the user to narrow their search to just the part of the site they are interested in, e.g. the /questions/ or /tag/ sections of stackoverflow, or the "/doc" at the end of docs.google.com/doc that can be used to view just document pages on Google docs*.
Since both Firefox and Chrome search for each space separated word typed into the address bar, it could be argued that it isn't necessary for searching that the url be completely human readable, but to allow the user to actually read the keywords they are interested in from the url the amount of "noise" should be kept to a minimum.
* which are of the form http://docs.google.com/Doc?id=gibberish
My perspective is simple:
every place I visit with my browser (with various edge-case exceptions) should be bookmarkable, and Forward/Back should be usable and not destroy any data entry.
Backend for sure. Your server is the one that has to take care of the routing to the resources requested by the URL.
I think the main reasons for using friendly URLs are:
Ease of linking / sharing
Presentation
SEO
So I think it's purely a client-side pleasure. While they're nice on the server as well, they're not mission critical.

What are the best practices for multilanguage sites?

I want to make a multi-language site, such that all or almost all pages will be available in 2 or more translations. What are the best practices to follow?
For example, I consider these language selection mechanisms:
Cookie-based selection of the preferred language.
Based on Accept-Language header if the cookie is not set.
Based on GeoIP otherwise (probably).
Is there anything else?
How should different translations be served?
as LANG.example.com/page
as example.com/LANG/page
as example.com/page?hl=LANG
...
any of the above with a redirect to example.com/page? (It seems to be discouraged)
How to ensure that all the translations are properly indexed?
Sitemaps with all pages + correct Content-Language header are enough?
What is the best way to let the users know there are other translations, but do not distract them?
list available languages in the header/footer/sidebar (like Wikipedia)
put “Choose a language” selector next to the content
What is the best policy to deal with missing/outdated translations?
do not display missing pages at all or display a page in a different language?
display old translation, old translation with a warning or a page in a different language?
What else should I take into account? What should I do, and what should I definitely not do?
In addition to @Quassnoi's answer, ensure that you use standard RFC 4646 language identifiers (e.g. EN-US, DE-AT); you may already be aware of this. The CLDR project is an excellent repository of internationalization data (the Supplemental Data is really useful).
If a translation of a specific page is not available, use a language fallback mechanism back to the neutral language; for example "DE-AT", "DE", "" (neutral, e.g. "EN").
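A minimal sketch of such a fallback chain in PHP (the per-page translation array passed in is hypothetical):
// Fallback: "de-AT" -> "de" -> "" (neutral/default); first hit wins.
function pickTranslation(array $translations, $locale) {
    $chain = [$locale];
    $dash  = strpos($locale, '-');
    if ($dash !== false) {
        $chain[] = substr($locale, 0, $dash);   // "de-AT" -> "de"
    }
    $chain[] = '';                              // neutral language
    foreach ($chain as $candidate) {
        if (isset($translations[$candidate])) {
            return $translations[$candidate];
        }
    }
    return null;
}

echo pickTranslation(['' => 'Hello', 'de' => 'Hallo'], 'de-AT');   // prints "Hallo"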
Most recent browsers and the underlying operating systems will correctly show all of the characters required for a locale selector list if the page is encoded correctly (I'd recommend all pages being UTF-8). Ensure that the locale list contains both the native and current-language names to allow both native and non-native speakers to view the specified translations, e.g. "Deutsch (German)" if the current locale is EN-*.
A lot of sites use a flag icon to show the current locale, but this is more relevant to the location and some people may be offended if you show only a dominant flag (e.g. the US or UK flag for English).
It may be worthwhile to have a more visible (semi-graphical) locale selector on the home page if no locale cookie has been submitted, using a combination of GeoIP and Accept-Language to determine the default locale choice.
Semi-related: if your users are located in different time zones, include a time-zone preference in their account profile for displaying time values in their local time. And store all timestamps using UTC.
Make the decision whether you need support for languages that require double-byte characters (Chinese, Japanese, Korean, etc.) early on; Unicode is the preferable choice. It can be tedious to change later, especially if you have a database that doesn't use Unicode.
Cookie-based selection of the preferred language.
Based on Accept-Language header if the cookie is not set.
These two you should support.
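A rough sketch of that order of precedence in PHP (the cookie name, the supported-language list and the very crude Accept-Language parsing are assumptions; a real implementation should honour q-values):
$supported = ['en', 'nl'];
$default   = 'en';

// 1. Cookie, if it holds a supported language
$lang = isset($_COOKIE['lang']) && in_array($_COOKIE['lang'], $supported, true)
    ? $_COOKIE['lang'] : null;

// 2. Otherwise, first supported primary tag from Accept-Language (q-values ignored)
if ($lang === null) {
    foreach (explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '') as $part) {
        $tag = strtolower(substr(trim($part), 0, 2));
        if (in_array($tag, $supported, true)) { $lang = $tag; break; }
    }
}

// 3. Final fallback to the default language
$lang = $lang ?: $default;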
Put a big English banner at the top of your page that reads "This page in English".
as example.com/LANG/page
This is the best choice.
LANG.example.com isn't good for autocomplete, and the question marks look ugly.
list available languages in the header/footer/sidebar (like Wikipedia)
A "Choose a language" dropdown is confusing: written in the wrong foreign language it is unintelligible, and written in English it spoils the overall impression.
And you always tend to make the mistake of selecting a language you don't even have the fonts for, leaving yourself on a page full of question marks.
display old translation with a warning
That way you know there is something you can read to get the point, but for the details you'd better get a dictionary and read it in English.
