We are using the robots.txt to reference our sitemap index file.
Now we are going to release the site for new countries. Our website under the .de TLD provides a robots.txt containing a reference to our sitemap index file. The index file refers to different sitemaps that contain our .de links in the loc XML node. Other locales (e.g. for .fr) are listed with xhtml:link below.
Example:
<url>
  <loc>https://xy.de/hallo</loc>
  <xhtml:link>https://xy.fr/hello</xhtml:link>
</url>
The question is now: should we add a robots.txt with a reference to our sitemap index to our .fr site too?
Or might it be enough to place the reference only in the German .de robots.txt, because the locations are described as alternates for each other locale? Or should we swap the loc XML node to the "current" locale? E.g. under https://xy.fr/robots.txt, should there be a sitemap referenced with .fr links in the loc XML node?
The Sitemaps protocol doesn’t mention an xhtml:link element, so consumers following the protocol might ignore it.
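For comparison, Google's documented sitemap extension for localized pages puts the alternate URL in attributes of xhtml:link rather than in element content. A minimal sketch using the URLs from the question (note that each alternate, including the page itself, is listed once):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://xy.de/hallo</loc>
    <xhtml:link rel="alternate" hreflang="de" href="https://xy.de/hallo"/>
    <xhtml:link rel="alternate" hreflang="fr" href="https://xy.fr/hello"/>
  </url>
</urlset>
```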
As a sitemap can only contain URLs from the same host, and a robots.txt file also only works for its host, the typical way is to give each host its own robots.txt file which points to this host’s sitemap (with an absolute URL).
# robots.txt from http://fr.example/robots.txt
Sitemap: http://fr.example/sitemap.xml
# robots.txt from http://de.example/robots.txt
Sitemap: http://de.example/sitemap.xml
The sitemap can be hosted on a different host, but you still need to prove ownership via the robots.txt file (see Sitemaps & Cross Submits).
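As a sketch, a cross-host submission would look like this (the filename sitemap-fr.xml is hypothetical): the robots.txt of the host whose URLs are listed references a sitemap hosted elsewhere, and that reference is what proves ownership.

```
# robots.txt from http://fr.example/robots.txt
# The sitemap is hosted on de.example but lists fr.example URLs;
# this reference is what authorizes the cross submit.
Sitemap: http://de.example/sitemap-fr.xml
```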
Related
I'm looking at referencing the sitemaps for multiple domain-name aliases (handled by routing logic within a Laravel application) in my robots.txt file, but I'm not quite sure what the correct way is to do this. The sitemaps exist and are present and correct; I'm just unsure about the format Google expects, so I'm really looking for SEO-based answers rather than ways to achieve this in code.
I'm thinking I can do this for robots.txt
i.e.,
Sitemap: https://www.main-domain.com/sitemap.xml
Sitemap: https://www.domain-alias1.com/sitemap.xml
Sitemap: https://www.domain-alias2.com/sitemap.xml
Any pro-seo tips would be much appreciated!
Assuming your code is about generating a proper robots.txt with multiple sitemaps, then listing them one per line with the Sitemap: instruction is the right way. These should be located at the same directory level as the robots.txt file (i.e. the root level), so that each sitemap can cover its whole host.
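If the robots.txt is generated by application code, the generation itself is straightforward. A minimal sketch in Python for brevity (the same logic would live in a Laravel route or controller; the domain list is made up for illustration):

```python
# Build a robots.txt body that lists one Sitemap line per domain alias.
def build_robots_txt(domains):
    lines = [f"Sitemap: https://{d}/sitemap.xml" for d in domains]
    # After the sitemap references, allow everything for all crawlers.
    lines += ["User-agent: *", "Disallow:"]
    return "\n".join(lines) + "\n"

print(build_robots_txt(["www.main-domain.com",
                        "www.domain-alias1.com",
                        "www.domain-alias2.com"]))
```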
If I have two web servers serving the same domain, is it possible to have sitemap1.xml on server1 and sitemap2.xml on server2? sitemap1 and sitemap2 will have different dynamic content. Will search engines be able to find and index them?
To make sure they can find both sitemaps, make sure they are declared in your robots.txt file:
Sitemap: http://www.yoursite.com/sitemap1.xml
Sitemap: http://www.yoursite.com/sitemap2.xml
See a summary I maintain for more information.
I'm working on a simple bot for a project, and I noticed that a lot of sites do not have sitemaps in their robots.txt files. There is of course the option to simply crawl the sites in question and discover all possible pages, but that often takes much more time than simply downloading the sitemap.
What is the best way to detect sitemap if it is not mentioned in robots.txt?
Normally it should be placed in the root directory of the domain, e.g. xydomain.xyz/sitemap.xml.
I would only add the sitemap to the robots.txt file if it is placed elsewhere. If a site uses more than one sitemap located somewhere else, they should be listed in a sitemap index file.
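A bot can combine those two conventions: parse robots.txt for Sitemap lines first, and fall back to /sitemap.xml at the root when none are declared. A minimal Python sketch (the parsing is deliberately simple and not a full robots.txt parser):

```python
from urllib.parse import urljoin

def discover_sitemaps(robots_txt, base_url):
    """Return sitemap URLs declared in a robots.txt body,
    falling back to <root>/sitemap.xml when none are declared."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # The directive name is case-insensitive; resolve the URL
        # against the site root in case a relative path appears.
        if line.lower().startswith("sitemap:"):
            sitemaps.append(urljoin(base_url, line.split(":", 1)[1].strip()))
    return sitemaps or [urljoin(base_url, "/sitemap.xml")]
```

For example, feeding it a robots.txt body with no Sitemap line yields the root fallback, while a declared sitemap is returned as-is.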
You can use this online tool to scan your site and create a bespoke sitemap.xml file for your site.
To help your sitemap to be discovered through robots.txt, add the URL of your sitemap at the very top of your robots.txt file (see the example below).
So, the robots.txt file looks like this:
Sitemap: http://www.example.com/sitemap.xml
User-agent: *
Disallow:
I want to have a sitemap structure where the sitemap index file is located in the root path (example.com/sitemaps.xml) and references several sitemap[n].xml files located in a folder (example.com/static/sitemap1.xml). Those sitemap[n].xml files link to webpages that are in the root path (like example.com/helloworld.html).
Is that possible? I'm asking because I know that if a sitemap.xml file is placed within a folder, it can only contain webpages that are under that folder.
Thanks!
I believe you can easily have
example.com/sitemap-index.xml
point to e.g.
example.com/sub1/sitemap.xml
and
example.com/sub2/sitemap.xml
however, each sitemap.xml should only contain URLs within its own subfolder. (From your question, it seems you have those sitemap.xml files link to paths in the root. I doubt that works, but you could try running a small test and submitting it to Google. If there are no errors, then...)
The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/ but can not include URLs starting with http://example.com/images/.
From Google's perspective, the sitemap should be available at the main root of the website, e.g. http://example.com/sitemap.xml. When you submit it through a subdirectory in Webmaster Tools ("http://example.com/catalog/sitemap.xml"), Google won't crawl it and keeps showing a pending index status.
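To illustrate the layout from the question, the index at the root would look roughly like this (filenames taken from the question):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- example.com/sitemaps.xml: lives at the root and references
     child sitemap files stored under /static/ -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/static/sitemap1.xml</loc>
  </sitemap>
</sitemapindex>
```

The scoping rule quoted above applies to the child sitemaps, not the index: because /static/sitemap1.xml sits under /static/, a strict reading of the protocol limits it to URLs under /static/, which is why placing the child sitemaps at the root as well is the safer layout.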
I am trying to find the best way of displaying content that resides under a different server location.
So I have a domain where have the main site content is located at:
/home/user/my_site/www/
and accessed at:
www.example.com
I have another site (a blog) located at:
/home/user/the_blog/www/
I wish to get the blog content to appear at:
www.example.com/news
I was planning on using an .htaccess file at my_site to set the rules for the path:
/news
However, the content for the blog resides outside the .htaccess document root, so although you can set a rule, it won't be able to access this content.
Is it possible to change the document root somewhere higher up the chain?
Or is it possible to just create a symlink for the /news folder? Is this even advisable?
Thanks in advance
Tom
You could set an alias to that location:
Alias /news /home/user/the_blog/www
But that can only be set in the server or virtual host configuration context and not in a .htaccess file.
Since the blog directory isn't in your DocumentRoot, I don't see how mod_rewrite can work here. And I don't think anyone would recommend symlinking. The way I see it, there are only two ways out of this: either change your DocumentRoot or move the blog directory into the current DocumentRoot.
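A sketch of the virtual-host configuration the Alias approach would need, using the paths from the question (mod_alias must be enabled, and the Directory block grants Apache permission to serve files from outside the DocumentRoot):

```
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /home/user/my_site/www

    # Serve the blog from outside the DocumentRoot at /news
    Alias /news /home/user/the_blog/www
    <Directory /home/user/the_blog/www>
        Require all granted
    </Directory>
</VirtualHost>
```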