Domain name alias, sitemap.xml and robots.txt - Laravel

I'm looking to reference the sitemaps for multiple domain name aliases (handled by logic within a Laravel application) in my robots.txt file, but I'm not quite sure what the correct way to do this is. The sitemaps exist and are present and correct; I'm just unsure of the format Google expects, so I'm really looking for SEO-based answers rather than ways to achieve this.
I'm thinking I can do this in robots.txt, i.e.:
Sitemap: https://www.main-domain.com/sitemap.xml
Sitemap: https://www.domain-alias1.com/sitemap.xml
Sitemap: https://www.domain-alias2.com/sitemap.xml
Any pro SEO tips would be much appreciated!

Assuming your code is about generating a proper robots.txt with multiple sitemaps, listing them one per line with the Sitemap: directive is the right way. The sitemaps must be located at the same directory level as the robots.txt file (i.e. root level).
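As a rough illustration of that format, the Python sketch below builds a robots.txt body with one Sitemap: line per URL. It is not Laravel code: the function name build_robots_txt and its disallow parameter are hypothetical, and in a Laravel app the resulting string would simply be returned as plain text from whatever serves /robots.txt.

```python
from typing import Iterable


def build_robots_txt(sitemap_urls: Iterable[str], disallow: Iterable[str] = ()) -> str:
    """Return a robots.txt body with one Sitemap: line per sitemap URL.

    Hypothetical helper: the disallow paths apply to all user agents.
    """
    lines = [f"Sitemap: {url}" for url in sitemap_urls]
    lines.append("User-agent: *")
    disallowed = list(disallow)
    if disallowed:
        lines.extend(f"Disallow: {path}" for path in disallowed)
    else:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


print(build_robots_txt([
    "https://www.main-domain.com/sitemap.xml",
    "https://www.domain-alias1.com/sitemap.xml",
    "https://www.domain-alias2.com/sitemap.xml",
]), end="")
```

Each Sitemap: line stands on its own, so adding another alias only means adding another URL to the list.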

Related

Sitemap on multiple web servers

If I have two web servers serving the same domain, is it possible to have sitemap1.xml on server1 and sitemap2.xml on server2? sitemap1 and sitemap2 will have different dynamic contents. Will search engines be able to find and index them?
To ensure search engines can find both sitemaps, declare them in your robots.txt file:
sitemap: http://www.yoursite.com/sitemap1.xml
sitemap: http://www.yoursite.com/sitemap2.xml
See a summary I maintain for more information.
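To check that a crawler-style parser picks up both declarations, Python's standard urllib.robotparser can be fed the two lines above; its site_maps() method (available since Python 3.8) returns the declared sitemap URLs. This only shows how one parser reads the file, not how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
sitemap: http://www.yoursite.com/sitemap1.xml
sitemap: http://www.yoursite.com/sitemap2.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
# Field names are matched case-insensitively, so "sitemap:" works like "Sitemap:".
# site_maps() returns the URLs in file order, or None if there are no Sitemap lines.
print(parser.site_maps())
# ['http://www.yoursite.com/sitemap1.xml', 'http://www.yoursite.com/sitemap2.xml']
```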

How to create Directory/Folder when using Magento?

We are running Magento on our site www.xsmoke.com. The site is international, so we are using "/country code", e.g. www.xsmoke.com/de/ etc.
Now we would like to install WordPress for only one of the languages, and we want the URL to be "xsmoke.com/de/blog".
But I can't create a folder in that location for the WordPress files because of Magento. Does anyone have an idea for a workaround?
Thanks.
I'm not sure how you've set things up, but it would be possible to do this by using the technique involving directories and symlinks for multiple websites (rather than having Magento include the store codes in the URL). See the answer below on how to do that:
https://magento.stackexchange.com/questions/13171/multiple-country-specific-stores-on-the-same-domain-show-country-selection-firs#answer-13173
You'd then just install WordPress in the /de/blog directory. Otherwise, you might be better off asking another question tagged with .htaccess and asking for a rewrite rule that would handle it in the context of Magento's existing rewrites.

Someone accessing robots.txt on our site

Some weeks ago, we discovered someone requesting the robots.txt file on our site:
http://www.ourdomain.com/robots.txt
I've been doing some research, and it said that robots.txt sets permissions for search engines?
I'm not certain of that...
The reason why I'm asking this is because he is trying to get into that file once again today...
The thing is that we do not have this file on our website... So why is someone trying to access that file? Is it dangerous? Should we be worried?
We have tracked the IP address and it says the location is in Texas, and some weeks ago, it was in Venezuela... Is he using a VPN? Is this a bot?
Can someone explain what this file does and why he is trying to access it?
In a robots.txt (a simple text file) you can specify which URLs of your site should not be crawled by bots (like search engine crawlers).
The location of this file is fixed so that bots always know where to find the rules: the file named robots.txt has to be placed in the document root of your host. For example, when your site is http://example.com/blog, the robots.txt must be accessible from http://example.com/robots.txt.
Polite bots will always check this file before trying to access your pages; impolite bots will ignore it.
If you don’t provide a robots.txt, polite bots assume that they are allowed to crawl everything. To get rid of the 404s, use this robots.txt (which says the same: all bots are allowed to crawl everything):
User-agent: *
Disallow:
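The effect of an empty Disallow: can be checked with Python's urllib.robotparser. This is a small sketch: allowed() is a hypothetical helper, and example.com stands in for your own host.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, agent="*"):
    """Parse robots.txt lines and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# An empty Disallow value blocks nothing...
print(allowed(["User-agent: *", "Disallow:"], "http://example.com/blog/post"))  # True
# ...whereas "Disallow: /" asks polite bots to stay away from the whole site.
print(allowed(["User-agent: *", "Disallow: /"], "http://example.com/blog/post"))  # False
```

When robots.txt is fetched with RobotFileParser.read() and the server answers with a 4xx status other than 401/403 (such as a 404), this parser likewise treats every URL as allowed.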

Is there a way to detect sitemap, if it is not in robots.txt?

I'm working on a simple bot for a project, and I noticed that a lot of sites do not have sitemaps in their robots.txt files. There is of course the option to simply index the sites in question and crawl all possible pages, but that often takes much more time than simply downloading the sitemap.
What is the best way to detect sitemap if it is not mentioned in robots.txt?
Normally it should be placed in the root directory of a domain, like xydomain.xyz/sitemap.xml.
I would only add the sitemap to the robots.txt file if it is placed elsewhere. If a site uses more than one sitemap located elsewhere, they should be listed in a sitemap index file.
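A sitemap index is itself a small XML file whose <sitemap><loc> entries point at the individual sitemaps, using the http://www.sitemaps.org/schemas/sitemap/0.9 namespace. The sketch below builds one with Python's xml.etree.ElementTree; build_sitemap_index is a hypothetical name, and lastmod and other optional elements are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document that lists each child sitemap URL."""
    # Setting xmlns as a plain attribute keeps the child tags unprefixed.
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes & and <
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap_index([
    "https://www.example.com/sitemap-pages.xml",
    "https://www.example.com/sitemap-posts.xml",
]))
```

Because ElementTree escapes characters such as & when serializing, URLs with query strings come out as &amp;, which the sitemap format requires.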
You can use this online tool to scan your site and create a bespoke sitemap.xml file for it.
To help your sitemap be discovered through robots.txt, add its URL at the very top of your robots.txt file (see the example below).
So, the robots.txt file looks like this:
Sitemap: http://www.example.com/sitemap.xml
User-agent: *
Disallow:
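Putting these answers together for the original question, a bot could read Sitemap: lines from robots.txt first and only probe root-level locations when none are declared. The sketch below assumes a caller-supplied fetch(url) function that returns the body text, or None on any failure (injected so the logic can be tested offline); the fallback paths in CANDIDATE_PATHS are common conventions, not something the protocol guarantees.

```python
from typing import Callable, List, Optional
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Conventional guesses for undeclared sitemaps (an assumption, not a standard).
CANDIDATE_PATHS = ["/sitemap.xml", "/sitemap_index.xml"]


def discover_sitemaps(base_url: str, fetch: Callable[[str], Optional[str]]) -> List[str]:
    """Return sitemap URLs for base_url, preferring those declared in robots.txt."""
    robots_body = fetch(urljoin(base_url, "/robots.txt"))
    if robots_body is not None:
        parser = RobotFileParser()
        parser.parse(robots_body.splitlines())
        declared = parser.site_maps()  # None when there are no Sitemap: lines
        if declared:
            return list(declared)
    # Nothing declared: probe the usual root-level locations instead.
    found = []
    for path in CANDIDATE_PATHS:
        url = urljoin(base_url, path)
        if fetch(url) is not None:
            found.append(url)
    return found
```

In a real bot, fetch would wrap an HTTP client and return None for non-200 responses; keeping it as a parameter also makes it easy to add caching or rate limiting.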

mod_rewrite document root change or symlink

I am trying to find the best way of displaying content that resides under a different server location.
So I have a domain where the main site content is located at:
/home/user/my_site/www/
and accessed at:
www.example.com
I have another site (a blog) located at:
/home/user/the_blog/www/
I wish to get the blog content to appear at:
www.example.com/news
I was planning on using an .htaccess file at my_site to set the rules for the path:
/news
However, the content for the blog resides outside the .htaccess document root, so although I can set a rule, it won't be able to access this content.
Is it possible to change the document root somewhere higher up the chain?
Or is it possible to just create a symlink for the /news folder? Is this even advisable?
Thanks in advance
Tom
You could set an alias to that location:
Alias /news /home/user/the_blog/www
But that can only be set in the server or virtual host configuration context and not in a .htaccess file.
Since the blog directory isn't inside your DocumentRoot, I don't see how mod_rewrite can work here. And I don't think anyone would recommend symlinking. The way I see it, there are only two ways out of this: either change your DocumentRoot or move the blog directory into the current DocumentRoot.
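To make the Alias suggestion concrete, here is a simplified Python model of how a request path would map to the filesystem: paths under /news go to the blog directory and everything else falls through to the DocumentRoot. It only illustrates the idea using the paths from the question and is not how mod_alias is implemented; matching on whole path segments (so /newsletter.html is not aliased) reflects my understanding of Alias, so check the mod_alias documentation. On a real server the aliased directory may also need its own <Directory> section granting access.

```python
# Hypothetical values taken from the question; a simplified model only.
DOCUMENT_ROOT = "/home/user/my_site/www"
ALIASES = {"/news": "/home/user/the_blog/www"}


def resolve(url_path: str) -> str:
    """Map a URL path (starting with "/") to the file it would be served from."""
    for prefix, target in ALIASES.items():
        # Match whole path segments: /news and /news/... but not /newsletter.html.
        if url_path == prefix or url_path.startswith(prefix + "/"):
            return target + url_path[len(prefix):]
    # No alias matched, so the path is resolved under the DocumentRoot.
    return DOCUMENT_ROOT + url_path


for path in ("/news/2020/hello.html", "/newsletter.html", "/index.php"):
    print(path, "->", resolve(path))
```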
