We have a sitemap for our site at http://www.gamezebo.com/sitemap.xml.
Some of the URLs in the sitemap are being reported in Webmaster Central as blocked by our robots.txt (see gamezebo.com/robots.txt), although these URLs are not Disallowed in robots.txt. There are other such URLs as well; for example, gamezebo.com/gamelinks is present in our sitemap, but it is being reported as "URL restricted by robots.txt".
Webmaster Central also shows this parse result: "Line 21: Crawl-delay: 10 Rule ignored by Googlebot". What does that mean?
I appreciate your help,
Thanks.
Crawl-delay is not part of the original robots.txt specification, so Googlebot ignores that line (some other crawlers, such as Bing's and Yandex's, do honor it). If you want to limit how fast Google crawls your site, set a custom crawl rate in Google Webmaster Tools under Settings → Crawl rate.
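For illustration, a minimal robots.txt sketch that would trigger that notice looks like this; the Crawl-delay line is harmless, it simply has no effect on Googlebot:
User-agent: *
# Googlebot ignores Crawl-delay; crawlers like Bingbot and YandexBot honor it
Crawl-delay: 10
Disallow: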
Our two products, www.nbook.in and www.a4auto.com, have been removed from all search engines. These projects have sub-domains, and the links from the subdomains are still available. The domain is not blacklisted. A custom sitemap was created, and it is still being indexed even now. I analyzed the URLs with Google Search Console and they seem fine, yet 'site:nbook.in' in Google produces no results. We actually had more than 75,000 links in Google, and everything was working fine until last week. This is affecting our products' Alexa rank and reach. There is no issue with the robots.txt file in the server's document root, and no rule is set to deny any bot or user agent. It is not just Google; all search engines have removed our links, and this is confusing me. We have other products designed and developed in the same way with absolutely no issues. I think there is nothing more to do in Google Search Console/Webmaster Tools. I'm ready to provide more data on request. Please help.
Create a robots.txt in your document root that explicitly allows all crawling (an empty Disallow blocks nothing):
User-agent: *
Disallow:
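After uploading it, you can confirm that Google reads the file correctly with the robots.txt Tester in Search Console (Crawl → robots.txt Tester), then re-submit your sitemap.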
How do I solve the "Wrong pages found" error in Magento's sitemap.xml? It appeared after I set "Use Categories Path for Product URLs" to "No"...
It seems from your question that you had some URLs whose paths have now changed, and you want the earlier URLs removed. If that is the case, there are several options:
Firstly, Google itself removes URLs after they repeatedly come back as not found.
Secondly, you can use Google's URL removal tool (https://www.google.com/webmasters/tools/url-removal), provided you have access to Webmaster Tools.
Thirdly, add the old URLs to robots.txt so that Google no longer crawls them; see the sketch after this list.
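A minimal sketch for the third option, assuming the old product URLs lived under a path like /old-category/ (a hypothetical placeholder; substitute your real paths):
User-agent: *
# hypothetical old path; replace with the actual changed paths
Disallow: /old-category/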
Hope this answers your question.
A few weeks ago, we discovered someone hitting the robots.txt path on our site:
http://www.ourdomain.com/robots.txt
I've done some research, and it says that robots.txt controls what search engines are permitted to do on a site?
I'm not certain about that...
The reason I'm asking is that he tried to get at that file once again today...
The thing is, we do not have this file on our website... So why is someone trying to access it? Is it dangerous? Should we be worried?
We tracked the IP address: it now points to Texas, but a few weeks ago it was in Venezuela... Is he using a VPN? Is this a bot?
Can someone explain what this file does and why someone would try to access it?
In a robots.txt (a simple text file) you can specify which URLs of your site should not be crawled by bots (like search engine crawlers).
The location of this file is fixed so that bots always know where to find the rules: the file named robots.txt has to be placed in the document root of your host. For example, when your site is http://example.com/blog, the robots.txt must be accessible from http://example.com/robots.txt.
Polite bots will always check this file before trying to access your pages; impolite bots will ignore it.
If you don’t provide a robots.txt, polite bots assume that they are allowed to crawl everything. To get rid of the 404s, use this robots.txt (which says the same: all bots are allowed to crawl everything):
User-agent: *
Disallow:
I've read the guidelines on Google News sitemaps but could not find whether a Google News sitemap needs to be referenced in robots.txt.
https://support.google.com/news/publisher/answer/74288?hl=en#sitemapguidelines
Could someone please confirm?
On the documentation page you linked, there is this note:
Once you've created your Sitemap, upload it to the highest-level directory that contains your news articles. Please see this page for further instructions on submitting your Sitemap.
It explains that you can submit the sitemaps in Google Webmaster Tools: Crawl → Sitemaps
No, it does not need to be included in robots.txt if you have submitted it to Google, but it is good practice to reference it there anyway; see the example below.
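A minimal sketch with a placeholder domain and sitemap filename (adjust both to your site); the Sitemap directive stands on its own line and applies regardless of any User-agent group:
User-agent: *
Disallow:

Sitemap: http://example.com/news-sitemap.xml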
My hosting company mixed things up: while trying to limit some search agents, they blocked all of them, including Google, with robots.txt. After I discovered it, I changed the robots.txt content to Allow: / and waited a week for Google to see the change, but nothing happened. Then I removed the robots.txt file completely, and I still see this error:
The result is that my site, which used to get 1,000-1,200 visits per day, has dropped to 200. Please help me solve this. How do I wake Google up to the fact that nothing is stopping it from crawling the site? Is it possible that all those 5,000 URLs that are now reported as blocked have been removed from Google's index?
What you need to do is create a robots.txt that allows your whole site to be crawled (see the sketch below), upload it to the root, then go to Webmaster Tools → Crawl → Fetch as Google and click the red "FETCH" button.
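For that first step, an allow-all robots.txt is simply:
User-agent: *
# an empty Disallow blocks nothing, so everything may be crawled
Disallow: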
Wait a few seconds or just refresh the page, then click "Submit to index" and choose "URL and all linked pages".
Let me know if that helps; I'm pretty sure it will.