How to customise the default language sitemap URL in a multilanguage Hugo website

I have a multilanguage website in Hugo and right now the sitemap generated automatically is the following:
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://domain/en/sitemap.xml</loc>
<lastmod>2022-04-20T08:34:57+02:00</lastmod>
</sitemap>
<sitemap>
<loc>https://domain/it/sitemap.xml</loc>
<lastmod>2022-04-20T08:34:57+02:00</lastmod>
</sitemap>
</sitemapindex>
The issue is that all the content in English, which is the default language, does not contain /en in the URL but simply the slug itself, such as /products or /blog. The Italian content does contain the language indicator in the URL, such as /it/prodotti or /it/blog.
Sitemap-wise, it doesn't seem advisable to have the English sitemap at /en/sitemap.xml; it should be at https://domain/sitemap_en.xml instead.
Any clue on how to customise the localised URL of the sitemap?
Thank you.

Here is the Hugo built-in template for sitemapindex:
https://github.com/gohugoio/hugo/blob/master/tpl/tplimpl/embedded/templates/_default/sitemapindex.xml
It uses the .SitemapAbsURL variable, though I couldn't find in the documentation where it comes from. However, you could rewrite the sitemapindex template yourself, for example using .Permalink.
To override the built-in sitemapindex.xml template, create a new file in either of these locations:
layouts/sitemapindex.xml
layouts/_default/sitemapindex.xml
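A minimal sketch of such an override, based on the embedded template linked above. It assumes the template is executed with the list of language sites (as the embedded one is) and that each site exposes .BaseURL, .Language.Lang and .LastChange; the sitemap_<lang>.xml naming is this question's convention, not something Hugo produces automatically, so the per-language sitemap files themselves would still have to be generated (or redirected) to match those paths:

```go-html-template
{{ printf "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>" | safeHTML }}
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  {{ range . }}
  <sitemap>
    {{/* point at /sitemap_<lang>.xml at the root instead of /<lang>/sitemap.xml */}}
    <loc>{{ .BaseURL }}sitemap_{{ .Language.Lang }}.xml</loc>
    {{ if not .LastChange.IsZero }}
    <lastmod>{{ .LastChange.Format "2006-01-02T15:04:05-07:00" | safeHTML }}</lastmod>
    {{ end }}
  </sitemap>
  {{ end }}
</sitemapindex>
```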

Related

Will sitemaps (with indexes and text url lists) work with cross subdomains?

I have two subdomains I use for my website: static.example.com and www.example.com. Due to the nature of my web server, it is best for me to serve the static content (CSS, JS files and, hopefully, sitemaps) from static.example.com.
I have put Sitemap: https://static.example.com/sitemap.xml into the robots.txt for www.example.com. However, I will need several sitemap indexes with hundreds of thousands to a few million URLs under different subdirectories.
For example, I have the following subdirectories in the main website:
www.example.com/articles
www.example.com/questions
www.example.com/videos
...
Therefore, can I structure my sitemap.xml in this way:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://static.example.com/sitemaps/article.xml</loc>
</sitemap>
<sitemap>
<loc>https://static.example.com/sitemaps/question.xml</loc>
</sitemap>
<sitemap>
<loc>https://static.example.com/sitemaps/video.xml</loc>
</sitemap>
</sitemapindex>
Then for example in the article sitemap:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://static.example.com/sitemaps/article/1-10000.txt</loc>
<lastmod>2021-04-22T19:50:00+00:00</lastmod>
</sitemap>
<sitemap>
<loc>https://static.example.com/sitemaps/article/10001-20000.txt</loc>
<lastmod>2021-04-22T19:50:00+00:00</lastmod>
</sitemap>
</sitemapindex>
And in each .txt file I will list the URLs that point to the main website. For example:
https://www.example.com/article/1
https://www.example.com/article/5
https://www.example.com/article/8
...
Is this structure okay? The cross-submit mechanism explained here explicitly allows me to put my main sitemap under a different domain, and for .txt URL lists it tells me to put them in the highest-level directory, but I didn't see it mention serving URL lists or sitemap indexes from a different subdomain.
Is it possible for me to serve my sitemaps and URL lists this way?
This won't work by default. The sitemaps protocol states (see the section on "Sitemap file location"):
Note that this means that all URLs listed in the Sitemap must use the same protocol (http, in this example) and reside on the same host as the Sitemap. For instance, if the Sitemap is located at http://www.example.com/sitemap.xml, it can't include URLs from http://subdomain.example.com.
However, there are ways to make it work. For example, with Google it will work as long as all subdomains have been verified in Search Console (see details here). More generally, you need to edit the robots.txt file of each host to prove that you own all of them (even if they are just subdomains). See the "Sitemaps & Cross Submits" section of the same sitemaps protocol for details.
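As a sketch of what the cross-submit setup could look like (hostnames as in the question; the exact ownership-verification requirements are up to each search engine):

```text
# robots.txt served at https://www.example.com/robots.txt
# Pointing at a sitemap on another host is a cross-submit: it is accepted
# because controlling this robots.txt proves ownership of www.example.com.
User-agent: *
Allow: /

Sitemap: https://static.example.com/sitemap.xml
```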

Magento does not generate robots.txt

I'm using Magento ver. 1.9.0.1 and I'm having some trouble with robots.txt.
I set INDEX, FOLLOW under System → Configuration → Design, but I can't find a robots.txt file in the main directory.
I can't understand what is wrong. Maybe I can create a robots.txt file myself, but I assumed it was better to let Magento generate it.
Thank you very much.
You have to create robots.txt manually in Magento.
The setting you are using only adds a meta tag to every page, like this:
<meta name="robots" content="INDEX,FOLLOW" />
This setting does not create robots.txt itself. You have to create the robots.txt file yourself in the Magento root folder.
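A minimal hand-written robots.txt to drop into the Magento root folder could look like this (a sketch; the disallowed paths are common Magento 1 internal directories and the sitemap URL is a placeholder, so adjust both to your installation):

```text
# robots.txt placed manually in the Magento root folder
User-agent: *
Disallow: /app/
Disallow: /var/
Disallow: /downloader/

Sitemap: https://www.example.com/sitemap.xml
```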

Master Sitemap link in Header of site or Robots.txt

I have a master sitemap that contains links to other sitemaps and is accessible at a path like:
www.website.com/sitemap.xml
I wanted to ask if this is enough for the search engines, or if I need to link it from my site?
Linking: I know I can use a robots.txt file, but is it possible to just add a link to the head of the site? Something like this (I'm just guessing):
<head>
<link rel="sitemap" type="application/xml" title="Sitemap" href="/sitemap.xml">
</head>
Thank you,
Adam
This is totally okay.
The sitemap should always be located in the root, and that is the only place where search engines will look for it by default.
I suggest using Google's webmaster tools to submit a sitemap for your domain, so you can get indexed and monitor search engine behaviour.
Hopefully this info will help you.

Hide directory in source code

My question is: how is a folder path hidden? First of all, I am using Joomla.
I found a website about 4 months ago (I don't remember its name) which was a Joomla site, and they hid their folder path.
Between the <head> tags there are some link elements; if you look at the source code, you can see this part, and it normally includes the template name.
For example:
<link type="text/css" href="http://www.site.com/templates/template_name/css/style.css" rel="stylesheet"></link>
So we can learn the name of the template from it. But they hid this part: instead of the full path (http://www.site.com/templates/template_name/css), I could see only /style.css.
Do you have any idea?
You need two things to accomplish that.
A (system) plugin that rewrites every template-related URL to the 'official' format, e.g.:
$url = str_replace('/templates/template_name/css/', '/style/', $url);
An .htaccess rewrite rule reverting the change:
RewriteRule ^style/(.*)$ templates/template_name/css/$1 [R=301,L]
If you want to obscure the template name, and a few module and plugin names, in the HTML source, the easy way is to use a CSS and JS compressor like jbetolo or RokBooster.
But keep in mind that this only makes your template name a bit harder to find; it is still discoverable in other ways, for example via some image paths if they are not compressed into the HTML.

Multiple Sitemap: entries in robots.txt?

I have been searching around using Google but I can't find an answer to this question.
A robots.txt file can contain the following line:
Sitemap: http://www.mysite.com/sitemapindex.xml
but is it possible to specify multiple sitemap index files in the robots.txt and have the search engines recognize that and crawl ALL of the sitemaps referenced in each sitemap index file? For example, will this work:
Sitemap: http://www.mysite.com/sitemapindex1.xml
Sitemap: http://www.mysite.com/sitemapindex2.xml
Sitemap: http://www.mysite.com/sitemapindex3.xml
Yes, it is possible to have more than one sitemap index file:
You can have more than one Sitemap index file.
Emphasis mine.
Yes, it is also possible to list multiple sitemap files within robots.txt; see the sitemaps.org site:
You can specify more than one Sitemap file per robots.txt file.
Sitemap: http://www.example.com/sitemap-host1.xml
Sitemap: http://www.example.com/sitemap-host2.xml
Emphasis mine; this can't be misread, I'd say. Simply put: it can be done.
This is also necessary for cross-submits, which is, by the way, why robots.txt was chosen as the mechanism.
Btw Google, Yahoo and Bing, all are members of sitemaps.org:
Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License and has wide adoption, including support from Google, Yahoo!, and Microsoft.
So you can rest assured that your sitemap entries will be read properly by the search engine bots.
Submitting them via the webmaster tools cannot hurt either, as John Mueller commented.
If your sitemap is over 10 MB (uncompressed) or has more than 50,000 entries, Google requires that you split it into multiple sitemaps bundled with a sitemap index file.
Using Sitemap index files (to group multiple sitemap files)
In your robots.txt point to a sitemap index which should look like this:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/sitemap1.xml.gz</loc>
<lastmod>2012-10-01T18:23:17+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://www.example.com/sitemap2.xml.gz</loc>
<lastmod>2012-01-01</lastmod>
</sitemap>
</sitemapindex>
It's recommended to create a sitemap index file rather than listing separate XML sitemap URLs in your robots.txt file.
Then put the sitemap index URL in your robots.txt file like this:
Sitemap: http://www.yoursite.com/sitemap_index.xml
If you want to learn how to create a sitemap index file, follow this guide from sitemaps.org.
Best practice:
Create a separate image sitemap and video sitemap if your website has a huge amount of such content.
Check the spelling of the robots file: it should be robots.txt; don't use robot.txt or any other misspelling.
Put the robots.txt file in the root directory only.
For more info, you can visit the official robots.txt website.
You need to specify this code in your sitemap.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.exemple.com/sitemap1.xml.gz</loc>
</sitemap>
<sitemap>
<loc>http://www.exemple.com/sitemap2.xml.gz</loc>
</sitemap>
</sitemapindex>
source: https://support.google.com/webmasters/answer/75712?hl=fr#
It is possible to list them, but it is up to the search engine to decide what to do with them. I suspect many search engines will either keep digesting more and more entries or, alternatively, take the last sitemap they find as the authoritative one.
I propose rephrasing the question as: "If I want ____ search engine to index my site, can I define multiple sitemaps?"