Avoid duplicating hreflang tags - hreflang

How can I avoid duplicating hreflang tags in this situation:
Home: http://mywebsite.com -> uses the browser language (assume that is French for now)
<link rel="alternate" hreflang="fr" href="http://mywebsite.com">
<link rel="alternate" hreflang="fr" href="http://mywebsite.com/fr">
<link rel="alternate" hreflang="en" href="http://mywebsite.com/en">
Thank you specialists

You can set the language and region via hreflang.
Examples:
de: German language content, independent of region
en-GB: English language content, for GB users
de-ES: German language content, for users in Spain
Source:
https://developers.google.com/search/docs/advanced/crawling/localized-versions
The default is then specified with the x-default value.
Source:
https://developers.google.com/search/blog/2013/04/x-default-hreflang-for-international-pages
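As a minimal sketch of that advice applied to the question (Python is used only for illustration, and the helper name build_hreflang_links is hypothetical): keying the alternates by hreflang value means each value can appear only once, and the language-negotiating home URL gets x-default instead of a second "fr" entry.

```python
from html import escape


def build_hreflang_links(alternates):
    """Return one <link> tag per hreflang value.

    `alternates` maps an hreflang value to a URL; because it is a dict,
    each hreflang value can only appear once.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        for lang, url in alternates.items()
    ]


# URLs are taken from the question; x-default marks the home page,
# which picks a language from the browser settings.
links = build_hreflang_links({
    "x-default": "http://mywebsite.com",
    "fr": "http://mywebsite.com/fr",
    "en": "http://mywebsite.com/en",
})
print("\n".join(links))
```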

Related

Why are my search results not in the same charset as my page encoding?

I am using UTF-8 encoding for an html page.
<head>
<meta charset="utf-8">
In the debugger console, document.characterSet returns "UTF-8".
On the page, I have metadata (keywords, description, title) with a valid UTF-8 character: '®', which is UTF-8: 'c2ae'
The character displays correctly in the view source, and on the page title.
But Google and Bing search results show it as 'Â®'. That is, during the web crawl it appears to be converted to ISO-8859-1 or Windows-1252, displaying both bytes, 'c2' and 'ae', as separate characters.
If I replace the character with &reg; => (\u00ae) it shows correctly.
Short of converting my meta data to ISO-8859-1, is there a best practice I should be using for this?
The issue was on the back end: the data was not being transcoded to UTF-8 properly when read from cache. So I feel the best practice is to use the native UTF-8 BMP character with the proper page encoding, rather than being required to use HTML entity values.
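The symptom described in the question can be reproduced in a few lines. This is only an illustrative sketch in Python (not the asker's back-end code): it shows how the two UTF-8 bytes of the character turn into two characters when they are decoded with the wrong charset.

```python
# The registered-trademark sign (U+00AE) is two bytes in UTF-8.
utf8_bytes = "®".encode("utf-8")
print(utf8_bytes.hex())  # c2ae

# Reading those bytes as Windows-1252 yields one character per byte.
misread = utf8_bytes.decode("cp1252")
print(misread)  # Â®

# Decoding with the correct charset restores the single character.
restored = utf8_bytes.decode("utf-8")
print(restored)  # ®
```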
Look at the page's meta tags and confirm that it is not using this:
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
For HTML5 Google recommends:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
Also note this:
Note:
<meta charset="">
Another Note:
Some characters are reserved in HTML ("HTML entities").
These reserved characters must be replaced with character entities, e.g.:
Character   Description            Entity name   Entity number
&           ampersand              &amp;         &#38;
®           registered trademark   &reg;         &#174;

When do I need to use x-default for hreflang?

I run a site in Belgium for which default language is Dutch. Using a selector the user can translate the page into English and French.
When entering the site for the first time it's served in Dutch:
http://example.com/articles/my_article/
The language switcher gives you this English version (this places a language cookie for English):
http://example.com/my_article/?lang=en
The language switcher gives you this French version (this places a language cookie for French):
http://example.com/my_article/?lang=fr
The language switcher gives you this Dutch version (this places a language cookie for Dutch):
http://example.com/my_article/?lang=nl
Now I use the following canonical and alternate hreflang tags on this page:
<link rel='canonical' href='http://example.com/my_article/'/>
<link rel='alternate' hreflang='nl' href='http://example.com/my_article/?lang=nl'/>
<link rel='alternate' hreflang='en' href='http://example.com/my_article/?lang=en'/>
<link rel='alternate' hreflang='fr' href='http://example.com/my_article/?lang=fr'/>
The problem is, when you go back to the following URL after visiting a URL with lang=xy then it'll be served in the language based on the cookie that was previously set:
http://example.com/articles/my_article/
Does that mean I should add x-default for this page?
<link rel="alternate" href="http://example.com/my_article/" hreflang="x-default" />
From my understanding, that is the way it is supposed to work. Once users select a language, they see the content in that language.
X-default should point to a "language/region/country selection page".
In this case, it could be example.com/welcome that shows a menu to select a preferred language.
So x-default should not point to any particular language version; for example, do not choose English as x-default (example.com/my_article/?lang=en). It should point to the language selection page, like a welcome page. That page should be written in whatever language you think is the safest "catch-all", with a design that's easy to navigate even if you don't speak it (country flags with the language name written in the language of the country, stating something like "English language site version" or whatever you think explains it best).
Google explains it here:
https://support.google.com/webmasters/answer/189077?hl=en
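A quick way to sanity-check such a tag set is to parse the head and confirm that no hreflang value is repeated and that x-default is present. The sketch below uses Python's standard html.parser; the names HreflangCollector and check_hreflang are hypothetical, and the /welcome URL is the selection page suggested in this answer, not something the asker's site is known to have.

```python
from collections import Counter
from html.parser import HTMLParser


class HreflangCollector(HTMLParser):
    """Collect the hreflang value of every <link rel="alternate"> tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and attr.get("rel") == "alternate" and "hreflang" in attr:
            self.values.append(attr["hreflang"])


def check_hreflang(head_html):
    """Return (duplicated hreflang values, whether x-default is present)."""
    collector = HreflangCollector()
    collector.feed(head_html)
    counts = Counter(collector.values)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return duplicates, "x-default" in counts


head = """
<link rel="alternate" hreflang="nl" href="http://example.com/my_article/?lang=nl">
<link rel="alternate" hreflang="en" href="http://example.com/my_article/?lang=en">
<link rel="alternate" hreflang="fr" href="http://example.com/my_article/?lang=fr">
<link rel="alternate" hreflang="x-default" href="http://example.com/welcome">
"""
print(check_hreflang(head))  # ([], True)
```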

With AMP HTML, is it legitimate to set the link canonical href attribute to pound (#)?

Is it legitimate to set the canonical link to the pound symbol as shown below, or am I required to enter a physical page name?
<link rel="canonical" href="#">
When testing this, the pound setting does not generate a validation error (ala #development=1). In my scenario, the page using this layout file will not have an alternate "regular HTML" version. The only version will be the AMP HTML version.
For additional context, I'm experimenting with an MVC site that will use AMP HTML. To keep my layout file simple, I'd prefer to use the pound symbol rather than extracting the child page name and applying that to the href attribute. I know how to apply the URL to the partial view via code like so:
<link rel="canonical" href="@HttpContext.Current.Request.Url.AbsoluteUri">
I'm just curious if it's legitimate AMP HTML to use the pound symbol instead. Thank you.
From the documentation:
Required markup
AMP HTML documents MUST:
contain a <link rel="canonical" href="$SOME_URL" /> tag inside their head that points to the regular HTML version of the AMP HTML document or to itself if no such HTML version exists.
So instead of using href="#", you should have it point to itself in order to stay consistent with the AMP specifications.
Validation is evolving; the validator doesn't catch all issues today. The issue with using "#" or any relative URL is that when this document is served elsewhere, such as cdn.ampproject.org, that relative URL will no longer point to your intended canonical. You should instead use an absolute URL: <link rel="canonical" href="URL">.
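The effect of a relative canonical can be illustrated with standard URL resolution. This is a rough sketch in Python; both URLs are hypothetical examples, and the cache path is only meant to stand in for "the same document served from another host".

```python
from urllib.parse import urljoin, urlparse

# Hypothetical URLs: the page on its own origin, and the same page
# served from an AMP cache host.
origin_url = "https://example.com/article.amp.html"
cache_url = "https://cdn.ampproject.org/c/s/example.com/article.amp.html"

# A relative href such as "#" is resolved against whatever URL served the page.
for served_from in (origin_url, cache_url):
    resolved = urljoin(served_from, "#")
    print(served_from, "->", urlparse(resolved).netloc)

# An absolute canonical URL is unaffected by where the document is served.
absolute = urljoin(cache_url, origin_url)
print(absolute)
```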

Firefox and UTF-16 encoding

I'm building a website with the encoding UTF-16. It means that every files (html,jsp) is encoded in UTF-18 and I set in the head of every HTML page :
<meta http-equiv="content-type" content="text/html; charset=UTF-16">
My index page is correctly displayed by Chrome and IE. However, Firefox doesn't render the index. It displays 2 strange characters and the full index page code:
��<!DOCTYPE html> <html> <head> <meta http-equiv="content-type" content="text/html; charset=UTF-16"> ...
Do you know the reason? It should be a problem of encoding, but I don't know where it's located...
Thanks
(Disclosure: I’m the developer responsible for the relevant code in Firefox.)
I'm building a website with the encoding UTF-16.
Please don’t. The short rules are:
1. Never use UTF-16 for interchange.
2. Always use UTF-8 for interchange.
3. If you break rules 1 & 2 and still use UTF-16, at least use the BOM (the right one).
But seriously, don’t break rules 1 and 2.
If you include user-provided content on your pages, using UTF-16 means that your site is vulnerable to socially engineered XSS at least in older browsers. Try this demo in an old version of Firefox (version 20 or older) or in a Presto-based version of Opera.
To avoid the vulnerability, use UTF-8.
It means that every files (html,jsp) is encoded in UTF-18
Uh oh. :-)
and I set in the head of every HTML page :
<meta http-equiv="content-type" content="text/html; charset=UTF-16">
A meta tag works as an internal encoding declaration only when the encoding being used maps the bytes of the meta tag to the same bytes ASCII would. That’s not the case for UTF-16.
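To make that point concrete, here is a small Python sketch (for illustration only) that compares the bytes of the meta tag from the question under UTF-8 and UTF-16. The little-endian variant is an assumption; the question does not say which byte order the files use.

```python
import codecs

meta = '<meta http-equiv="content-type" content="text/html; charset=UTF-16">'

# In UTF-8 the tag is byte-for-byte identical to its ASCII form.
print(meta.encode("utf-8") == meta.encode("ascii"))  # True

# In UTF-16 every character takes two bytes, so a scan for the
# ASCII bytes of the tag does not find it.
utf16 = meta.encode("utf-16-le")
print(utf16[:10])  # b'<\x00m\x00e\x00t\x00a\x00'
print(b"<meta" in utf16)  # False

# A byte order mark at the very start of the file is what signals UTF-16.
print(codecs.BOM_UTF16_LE)  # b'\xff\xfe'
```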
Do you know the reason?
Not without full response headers and the original response body in a hex editor. The general solution, as noted above, is to always use UTF-8 and never use UTF-16 over HTTP.
If your content is in a language for which UTF-16 is more compact than UTF-8, two things:
All the HTML, JS and CSS on the page is more compact in UTF-8.
gzip makes the difference go away.
Check that the server sends a Content-Type header with the correct encoding.
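One way to read the charset out of a Content-Type value is sketched below, using only the Python standard library. The function name charset_from_content_type is hypothetical, and the header strings are sample values; in practice the value would come from the server's actual response headers.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # None
```

If the header carries no charset, or one that differs from the encoding the files are saved in, the browser has to guess, which is the kind of mismatch this answer is warning about.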

Meta tags not valid (html5reset templates)

I am using html5reset as a reset and template for a website. However I am getting all kinds of validation errors on some meta tags:
<meta name="title" >
<meta name="google-site-verification" >
<meta name="copyright" >
<meta name="DC.title" >
<meta name="DC.subject" >
<meta name="DC.creator" >
I could simply remove those meta tags, but I'd rather know why first. Here is the link to validate my website (which is online at a temporary url): http://validator.w3.org/check?uri=http%3A%2F%2Ftanchelmus.be%2Fsten%2Fnl%2Fnews&charset=%28detect+automatically%29&doctype=Inline&group=0
If you use "property" instead of "name", you will pass the HTML5 validator.
<meta property="DC.title" >
<meta property="DC.subject" >
<meta property="DC.creator" >
I'm not entirely sure if the syntax is correct, but I did see this in RDFa documentation here: http://en.wikipedia.org/wiki/RDFa
Another thing to note is that the HTML5 standard is a work in progress. And thus the W3C HTML5 validator is also a work in progress.
This blog post has some interesting background regarding Microformats (eg Dublin Core) & HTML5.
It may not always be pragmatic to develop just to please the Validator, especially one that is a work in progress.
What it looks like you're being told is that the values you're using in the name attribute are not part of the valid set of values you can use.
The WHATWG website identifies many of the standard and other meta name values.
Hello, to validate the Dublin Core tags, you must change dc. to dcterms.
Here you can see an example:
<meta name="dcterms.contributor" content="Your name" />
<meta name="dcterms.keywords" content="Your keywords here" />
Regards!

Resources