Robots.txt in Magento Websites

I have recently started working with a company that has a Magento eCommerce website.
We noticed that traffic dipped considerably in May, and so did our Google rankings.
When I started investigating, I saw that the pages of the ES website were not appearing in Screaming Frog.
Only the homepage showed, and the status said "blocked by robots.txt".
I mentioned this to my developer and they said they would move the robots.txt file to the /pub folder,
but would that not mean the file is in two places? Would this be an issue?
The developer has gone ahead and done this; how long should it take to see whether Screaming Frog can crawl the pages?
Are there any Magento developers who could help with advice on this?
Thanks
Neo

There is a documentation page on how to manage robots.txt in Magento 2.x.
You can use the following to allow all crawlers access to your site:
User-agent: *
Disallow:
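As a side note, in recent Magento 2.x versions the robots.txt content is usually maintained in the admin (under Content > Design > Configuration, in the Search Engine Robots section) and served from pub/, which should be the web root on a correctly configured install, so there shouldn't end up being two competing copies if the file only lives there. If you later want to keep crawlers out of cart and account pages while leaving the catalogue open, a common pattern looks roughly like the sketch below; the disallowed paths are examples only and need adjusting to your store's URL structure:
User-agent: *
Disallow: /checkout/
Disallow: /customer/
Disallow: /catalogsearch/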
Regarding the Googlebot crawl rate, here is some explanation.
According to Google, “crawling and indexing are processes which can take some time and which rely on many factors.” It's estimated that it takes anywhere from a few days to four weeks for Googlebot to index a new site. If you have an older website that isn't being crawled, the design of the site may be the problem. Sometimes sites are temporarily unavailable when Google attempts to crawl them, so checking the crawl stats and looking at errors can help you make changes to get your site crawled. Google also has two different crawlers: a desktop crawler to simulate a user on a desktop, and a mobile crawler to simulate a search from a mobile device.

Related

How to apply PageSpeed Insights results

I'm trying to build a website with WordPress and GoDaddy hosting. I want to make it fast, because people say fast web pages appear at the top of Google (and that mobile page speed is especially important). So I want a very fast site, but my level of knowledge is not very advanced; I progress by learning.
When I test my page with Insights, my mobile score is about 60-70, and the report lists a lot of improvement suggestions below it. I want to learn how to fix them. If you help me with one example, I will do the others myself.
If we start with the first problem, /css?family=… (fonts.googleapis.com), which appears under the "Eliminate render-blocking resources" topic: how do I fix it? What should I do?
Also, the "Coverage" tab shows some source code that isn't being used. For example, I am not using the easy-sheare plugin (second row in the image) on the homepage.
How do I safely remove that code from the homepage? If I can learn how one is done, I can correct the others myself.
The issue you are running into is something I have seen over and over again. GoDaddy and WordPress sites are generally bloated and perform poorly.
Here are some tips to improve your speed and get a better PageSpeed score.
Hosting: Do you need to be on GoDaddy? I have seen this time and time again: most websites on GoDaddy are SLOW. GoDaddy is good for domain registration, not for hosting; most non-tech folks don't know any better. Try Amazon Lightsail, AWS S3, Google Firebase, or Netlify. They all offer much faster page loads by reducing initial server response time, and they are surprisingly simple to learn and deploy.
CDN: You must use a content delivery network (CDN). Check out CloudFront; it offers a free tier that works quite well.
WordPress: This is your real issue. WordPress is neither easy to build with nor easy to maintain, and you need multiple plugins to make the site perform. It may be best to build your own. If you have to stay on WordPress, check out image optimizers, minifiers, and caching plugins. Gumlet, WP Rocket, and ShortPixel are popular for improving speed.
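On the specific /css?family=… item under "Eliminate render-blocking resources", which isn't covered above: one common workaround is to preconnect to the font host and load the Google Fonts stylesheet asynchronously with display=swap. This is only a sketch; the Roboto family here is a placeholder, and your actual fonts.googleapis.com URL will differ.
<!-- load the Google Fonts CSS without blocking first paint; "Roboto" is a placeholder family -->
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto&display=swap" media="print" onload="this.media='all'">
<noscript><link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto&display=swap"></noscript>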

Google Webmaster Tools: sitemaps submitted every day (?)

Having submitted my sitemap.xml through Webmaster Tools normally the first time, I notice the submitted-URL counts plotted every day (next to the indexed ones under the Optimization -> Sitemaps menu) without my doing anything. I use Drupal 7 with the XML sitemap module (http://drupal.org/project/xmlsitemap) and there are no automated tasks enabled.
Does this mean that the URLs are submitted "internally" by Google every day? Or is there something wrong that I need to resolve?
Many thanks for any help.
Google will remember any sitemaps you submit, and its crawler will automatically download them and the associated resources more or less whenever it feels like doing so. This is usually reflected in your Webmaster Tools. In all likelihood it will even do so without you entering your sitemap on their website, if your site gets linked to. The same goes for pretty much any other bot and crawler out in the wild.
No need to worry, everything is doing what it's supposed to. It's a Good Thing(tm) when Google crawls your site frequently :).

Bulk import + export URL rewrites for Magento

I found a "bulk import and export URL rewrites" extension for Magento while searching the internet for how to bulk-redirect URLs from my current live site to the new URLs on the new site, which is still on a development server.
I've asked my programmer to help me out and they've sent me two CSV files: one with all request and target URLs from the current live site (these are often different as well, probably due to earlier redirects), and a similar one for the new site. The current live site has 2500 URLs, the future site 3500 (probably because some old, inactive and unnecessary categories are still present in the new site as well).
I was thinking of pasting the current site's URLs into an Excel sheet and then entering the future URL for each one. A lot of work… Then I thought: can't I limit my work to the approx. 300 URLs that Google has indexed (which can be found through Google Webmaster Tools, as you probably know)?
What would you recommend? Would there be advantages to using such an extension, and which would they be? (Keep in mind that my programmer would upload all of my redirects into a .htaccess file for me.)
Thanks in advance.
Kind regards, Bob.
Axel is giving HORRIBLE advice. 301 redirects tell Google to transfer the old PageRank and authority of a page to the new page in the redirect. Moreover, if other websites linked to those pages, you don't want a bunch of dead links; someone might just remove them.
Even worse, if you don't handle the 404s correctly, Google can and will penalize you.
ALWAYS set up 301 redirects when changing platforms; only someone who either doesn't understand or doesn't care about SEO would suggest otherwise.
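For reference, since the plan is to load the redirects into .htaccess: each old/new pair becomes one permanent-redirect line using Apache's mod_alias syntax. The paths below are made-up placeholders, not your actual URLs, and a few thousand such lines can get unwieldy, which is one argument for handling the mapping inside Magento's own URL rewrite table (or the extension) instead.
# one permanent redirect per retired URL (example paths only)
Redirect 301 /old-category/old-product.html https://www.example.com/new-category/new-product.html
Redirect 301 /old-page.html https://www.example.com/new-page/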

Why use a Google Sitemap?

I've played around with Google Sitemaps on a couple of sites. The lastmod, changefreq, and priority parameters are pretty cool in theory, but in practice I haven't seen them affect much.
And most of my sites don't have a Google Sitemap, and that has worked out fine. Google still crawls the site and finds all of my pages. The old meta robots and robots.txt mechanisms still work when you don't want a page (or directory) to be indexed, and I just leave every other page alone; as long as there's a link to it, Google will find it.
So what reasons have you found to write a Google Sitemap? Is it worth it?
From the FAQ:
Sitemaps are particularly helpful if:
Your site has dynamic content.
Your site has pages that aren't easily discovered by Googlebot during the crawl process—for example, pages featuring rich AJAX or images.
Your site is new and has few links to it. (Googlebot crawls the web by following links from one page to another, so if your site isn't well linked, it may be hard for us to discover it.)
Your site has a large archive of content pages that are not well linked to each other, or are not linked at all.
It also allows you to provide more granular information to Google about the relative importance of pages on your site and how often the spider should come back. And, as mentioned above, if Google deems your site important enough to show sublinks under it in the search results, you can control what appears via the sitemap.
I believe the "special links" in search results are generated from the Google sitemap.
What do I mean by "special links"? Search for "apache": below the first result (the Apache Software Foundation) there are two columns of links ("Apache Server", "Tomcat", "FAQ").
I guess it helps Google prioritize its crawl? But in practice I was involved in a project where we used the gzipped large version of it, and it helped massively. And AFAIK there is nice integration with Webmaster Tools as well.
I am also curious about the topic, but does it cost anything to generate a sitemap?
In theory, anything that costs nothing and may have a potential gain, even if very small or very remote, can be defined as "worth it".
In addition, Google says: "Tell us about your pages with Sitemaps: which ones are the most important to you and how often they change. You can also let us know how you would like the URLs we index to appear." (Webmaster Tools)
I don't think that last part of the quote (letting Google know how you would like the indexed URLs to appear) is possible with the traditional mechanisms that search engines use to discover URLs.
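For anyone who hasn't looked inside one, a minimal sitemap entry carrying the lastmod, changefreq and priority fields discussed above looks roughly like this (example.com and all of the values are placeholders):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/some-page/</loc>
    <lastmod>2012-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>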

Slow down spidering of website

Is there a way to force a spider to slow down its spidering of a website? Anything that can be put in headers or robots.txt?
I thought I remembered reading something about this being possible, but I can't find anything now.
If you're referring to Google, you can throttle the speed at which Google spiders your site by using your Google Webmaster account (Google Webmaster Tools).
There is also this, which you can put in robots.txt
User-agent: *
Crawl-delay: 10
where the crawl delay is specified as the number of seconds between each page crawl. Of course, like everything else in robots.txt, the crawler has to choose to respect it, so YMMV.
Beyond using the Google Webmaster tools for the Googlebot (see Robert Harvey's answer), Yahoo! and Bing support the nonstandard Crawl-delay directive in robots.txt:
http://en.wikipedia.org/wiki/Robots.txt#Nonstandard_extensions
When push comes to shove, however, a misbehaving bot that's slamming your site will just have to be blocked at a higher level (e.g. load balancer, router, caching proxy, whatever is appropriate for your architecture).
See Throttling your web server for a solution using Perl. Randal Schwartz said that he survived a Slashdot attack using this solution.
I don't think robots.txt will do anything except allow or disallow, but most of the search engines will let you customize how they crawl your site.
For example: Bing and Google.
If you have a specific agent that is causing issues, you might either block it specifically or see if you can configure it.
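If a single agent really is the problem and it honours robots.txt, the block is just a scoped version of the earlier example. The bot name below is made up, and a genuinely abusive crawler will ignore this anyway, so server-level blocking remains the fallback:
User-agent: ExampleBadBot
Disallow: /

User-agent: *
Disallow: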
