I'm thinking of using a structure like this for accessing some hypothetical page:
/foo/ID/some-friendly-string
The key part here is "ID", which identifies the page, so everything that isn't the ID is only relevant for SEO. I also want any URL that isn't exactly "/foo/ID/some-friendly-string" to redirect to the original link, e.g.:
/foo/ID ---> /foo/ID/some-friendly-string
/foo/ID/some-friendly-string-blah ---> /foo/ID/some-friendly-string
But what if these links somehow get "polluted" somewhere on the internet and spiders start accessing them with "/foo/ID/some-friendly-string-blah-blah-pollution" URLs? I don't even know if this can happen, but if, say, some bad person decided to post thousands of such "different" links on some well-known forums, then Google would find thousands of "different" URLs 301-redirecting to the same page.
In such a case, would there be some sort of penalty, or is it all the same to Google as long as the endpoint is unique and there is no duplicate content?
I might be a little paranoid about this, but it's just my nature to investigate exploitable situations :)
Thanks for your thoughts
Your approach of using 301 redirects is correct.
301 redirects are very useful if people access your site through several different URLs.
For instance, your page for a given ID can be accessed in multiple ways, say:
http://yoursite.com/foo/ID
http://yoursite.com/foo/ID/some-friendly-string (preferred)
http://yoursite.com/foo/ID/some-friendly-string-blah
http://yoursite.com/some-friendly-string-blah-blah-pollution
It is a good idea to pick one of those URLs (you have decided on http://yoursite.com/foo/ID/some-friendly-string) as your preferred URL and use 301 redirects to send traffic from the other URLs to your preferred one.
I would also recommend adding a canonical link to the HEAD section of the page, e.g.:
<link rel="canonical" href="http://yourwebsite.com/foo/ID/some-friendly-string"/>
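For the 301 itself, the friendly slug for a given ID normally has to come from your application, so mod_rewrite alone can't know the canonical URL. A minimal sketch of a common setup, assuming an Apache front controller at index.php (a hypothetical name) that looks up the slug for the ID and issues the redirect when the requested slug differs:

# .htaccess sketch: send every /foo/ID/... variant to one handler
RewriteEngine On
RewriteRule ^foo/([0-9]+)(/.*)?$ index.php?id=$1 [L,QSA]

# index.php (hypothetical) compares the requested path against the stored slug
# and, if they differ, responds with 301 Moved Permanently and
# Location: /foo/ID/some-friendly-string

That way /foo/ID, /foo/ID/some-friendly-string-blah and any "polluted" variant all answer with the same single 301 target.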
You can get more details on 301 redirects in:
Google Webmaster Tools - Configuration > Change of Address
Google Webmaster Tools Documentation - 301 redirects
I hope that will help you out with your decisions on redirects.
EDIT
I forgot to mention a very good example, namely Stack Overflow. The URL of this question is
http://stackoverflow.com/questions/14318239/seo-301-redirect-limits, but you can access it with http://stackoverflow.com/questions/14318239/blahblah and you will get redirected to the original URL.
Related
I have a problem with someone (using many IP addresses) browsing all over my shop using:
example.com/catalog/category/view/id/$i
I have URL rewrite turned on, so the usual human browsing looks "friendly":
example.com/category_name.html
Therefore, the question is: how do I prevent the shop from being browsed with the "old" (not rewritten) URLs, so that only the "friendly" URLs are allowed?
This is pretty important, since the crawler is using hundreds of threads, which is making the shop really slow.
Since there are many random IP addresses, clearly you can't just block access from a single address or a small group of addresses. You may need to implement some logging that identifies this crawler uniquely (maybe by its user agent, or possibly with some clever use of the Modernizr JavaScript library).
Once you've been able to distinguish some unique identifiers of this crawler, you could probably use a rule in .htaccess (if it's a user-agent thing) to redirect it or otherwise prevent it from consuming your server's oomph.
This SO question provides details on rules for user agents.
Block all bots/crawlers/spiders for a special directory with htaccess
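If the crawler does send a consistent User-Agent header, a hedged sketch of such a rule ("BadBot" is a placeholder string, not something taken from your logs):

RewriteEngine On
# Refuse (403) any request whose User-Agent contains the placeholder string
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteRule ^ - [F]

You could also answer with a 301/302 to some cheap static page instead of a 403 if you prefer redirecting over blocking.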
If the spider crawls all the URLs of the given pattern:
example.com/catalog/category/view/id/$i
then you can just block these URLs in .htaccess. The rewrite from category.html to /catalog/category/view/id/$i is done internally, so blocking external requests to that pattern only affects the bots.
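A sketch of what that could look like in .htaccess, assuming the non-rewritten module URLs all start with /catalog/category/view/id/. %{THE_REQUEST} only contains the original request line sent by the client, so the internal rewrite from the friendly category_name.html URLs keeps working:

RewriteEngine On
# Block direct (external) requests to the non-rewritten module URLs
RewriteCond %{THE_REQUEST} \s/catalog/category/view/id/ [NC]
RewriteRule ^ - [F]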
Once the rewrites are there... they are there. They are stored in the Magento database for many reasons. One is crawlers like the one crawling your site; another is users who might have the old pages bookmarked. There are a number of methods people have come up with to go through and clean up the rewrites (Google it), but as it stands, once they exist they are not easily managed from within Magento.
I might suggest generating a new sitemap and submitting it to the search engine whose crawler is affecting your site. Not only is this crawler crawling tons of pages it doesn't need to, it's also going to see duplicate content (bad juju).
I found a "bulk import and export URL rewrites" extension for Magento while looking into how to bulk-redirect URLs from my current live site to the new URLs of the new site, which is still on a development server.
I've asked my programmer to help me out and they've sent me two CSV files, one with all request and target URLs from the current live site (these are often different as well, probably due to earlier redirects), and a similar one for the new site. The current live site comes with 2,500 URLs, the future site with 3,500 (probably because some old, inactive and unnecessary categories are still present in the new site as well).
I was thinking of pasting the current site's URLs into an Excel sheet and then entering the future URL for each one. A lot of work… Then I thought: can't I limit my work to the approximately 300 URLs that Google has indexed (which can be found through Google Webmaster Tools, as you probably know)?
What would you recommend? Would there be advantages to using such an extension, and if so, which ones (keeping in mind that my programmer would upload all of my redirects into a .htaccess file for me)?
Thanks in advance.
Kind regards, Bob.
Axel is giving HORRIBLE advice. 301 redirects tell Google to transfer the old page's PageRank and authority to the new page in the redirect. Moreover, if other websites linked to those pages, you don't want a bunch of dead links; people might simply remove them.
Even worse, if you don't handle the 404s correctly, Google can and will penalize you.
ALWAYS set up 301 redirects when changing platforms; only someone who either doesn't understand or doesn't care about SEO would suggest otherwise.
We have an application with about 15,000 pages. For SEO reasons we had to change the URLs. Google had already crawled all of these pages under the old URLs, and because of the change we now see a lot of duplicate titles/meta descriptions in Webmaster Tools. Our impressions on Google have dropped and we believe this is the reason; correct me if my assumption is incorrect.
Now we are not able to write a regular expression for the change of URLs using a 301 redirect, because of the nature of the change. The only way to do it would be to write 301 redirects for individual URLs, which is not feasible for 10,000 URLs. Can we use a robots meta tag with NOINDEX instead? My question basically is: if I add a NOINDEX meta tag, will Google remove the already indexed URLs? If not, what are the other ways to remove the old indexed URLs from Google? Another thing I could do is make all the previous pages return 404 errors to avoid the duplicates, but would that be the right thing to do?
Now we are not able to write a regular expression for the change of URLs using a 301 redirect, because of the nature of the change. The only way to do it would be to write 301 redirects for individual URLs, which is not feasible for 10,000 URLs.
Of course you can! I'm rewriting more than 15,000 URLs with mod_rewrite and RewriteMap!
This is just a matter of scripting (echoing all the URLs) and mastering vim, but you can do it, and easily. If you need more information, just ask.
What you can do is a RewriteMap file like this:
/baskinrobbins/branch/branch1/ /baskinrobbins/branch/Florida/Jacksonville/branch1
I've made a huge answer here and you can very easily adapt it to your needs.
I could do that job in 1-2 hours max but I'm expensive ;).
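For completeness, a hedged sketch of how such a map could be wired up. The RewriteMap directive itself has to go in the server or virtual-host configuration (it is not allowed in .htaccess); the map file name redirects.map and the paths here are assumptions:

# httpd.conf / vhost: declare the map (key = old path, value = new path)
RewriteMap redirects txt:/etc/apache2/redirects.map

# vhost or .htaccess: look the requested path up in the map and 301 if a target exists
RewriteEngine On
RewriteCond ${redirects:%{REQUEST_URI}|NOT_FOUND} !=NOT_FOUND
RewriteRule ^ ${redirects:%{REQUEST_URI}} [R=301,L]

The map file itself is just the two-column text format shown above, one old/new pair per line, which is easy to generate from a script or a CSV export.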
Reindexing is slow
It would take weeks for Google to ignore the older URLs anyway.
Use Htaccess 301 redirects
You can add a file on your Apache server, called .htaccess, that lists all the old URLs and the new URLs and instantly redirects the user to the new page. Can you generate such a text file? I'm sure you can loop through the sections in your app and generate a list of URLs.
Use the following syntax.
Redirect 301 /oldpage.html http://www.yoursite.com/newpage.html
Redirect 301 /oldpage2.html http://www.yoursite.com/folder/
This prevents 404 File Not Found errors, and it is better than a meta refresh or redirect tag because the old page is not even served to clients.
I did this for a website that had gone through a recent upgrade, and since Google kept pointing to the older files, we needed to redirect clients to the new content instead.
Where's .htaccess?
Go to your site's root folder, and create/download the .htaccess file to your local computer and edit it with a plain-text editor (i.e. Notepad). If you are using FTP client software and you don't see any .htaccess file on your server, make sure you are viewing invisible/system files.
I'm coding a site that makes heavy use of AJAX to load pages for users with JavaScript, but I also want it to be friendly for users with JavaScript disabled or unavailable. I've covered all the basics; for example, all my links point to canonical links, and JavaScript loads them via AJAX. My "about" page, therefore, is located at /about/, but will load on the main page and will, once finished, utilize hash/hashbang links to enable back-button functionality.
Here's the problem I have: while a hash/hashbang link can be used to load a specific page via AJAX for users with JavaScript, if a user with JavaScript sends such a link to someone without it, the page cannot be loaded for that person using AJAX.
As such, I'd like to be able, if possible, to use .htaccess to redirect hash/hashbang-specified pages to the canonical link. In other words, the exact opposite of what this contributor was trying to achieve.
http://example.com/#!about --> http://example.com/about/
Is it possible with .htaccess, or otherwise without JavaScript? If so, how?
Thanks!
I don't think it's possible to do this on the server side, because the part of the URL after the # is not included in the request sent to the server.
I might be a bit late to the party on this one, but I'm looking into this too. Since your URL already contains the #!, as opposed to just #, you can actually do this. Google will fetch
http://example.com/#!about
as
http://example.com/?_escaped_fragment_=about
Therefore, if you put a 301 redirect on that and use JavaScript to redirect users to the dynamic version of the page, you have practically reached your desired result.
I realise you asked for a no-JavaScript solution, but I figure that was for SEO reasons. For more information, please see this page by Google.
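Since you mentioned .htaccess: a rough sketch of that 301, assuming the hashbang pages hang off the site root, so that #!about arrives at the server as /?_escaped_fragment_=about:

RewriteEngine On
# Google fetches http://example.com/#!about as http://example.com/?_escaped_fragment_=about
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.+)$
RewriteRule ^$ /%1/? [R=301,L]

The trailing ? drops the query string, so the crawler ends up on the clean /about/ URL.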
EDIT: Alternatively, you could use a meta refresh:
<meta http-equiv="refresh" content="5; url=http://example.com/">
Some more on meta refresh here.
It:
1) Does not require JavaScript!
2) Can be SEO friendly!
3) Works with bookmarks and history (etc.)
I hope this helps!
So Google takes:
http://www.mysite.com/mypage/#!pageState
and converts it to:
http://www.mysite.com/mypage/?_escaped_fragment_=pageState
...So... Would it be fair game to redirect that with a 301 status to something like:
http://www.mysite.com/mypage/pagestate/
and then return an HTML snapshot?
My thought is that if you have an existing HTML structure, and you just want to add AJAX as a progressive enhancement, this would be a fair way to do it, provided Google just skipped over _escaped_fragment_ and indexed the redirected URL. Then your AJAX links are configured by JavaScript, and underneath them are the regular links that go to your regular site structure.
So when a user comes in on a static URL (i.e. http://www.mysite.com/mypage/pagestate/ ), the first link they click takes them to the AJAX interface if they have JavaScript, and then it's all AJAX.
On a side note, does anyone know if Yahoo/MSN are on board with this 'spec' (loosely used)? I can't seem to find anything that says for sure.
If you redirect the "?_escaped_fragment_" URL, it will likely result in the final URL being indexed (which might result in a suboptimal user experience, depending on how your site is set up). There might be a reason to do it like that, but it's hard to say in general.
As far as I know, other search engines are not yet following the AJAX-crawling proposal.
You've pretty much got it. I recently did some tests and experimented with sites like Twitter (which uses #!) to see how they handle this. From what I can tell they handle it like you're describing.
If this is your primary URL
http://www.mysite.com/mypage/#!pageState
Google/Facebook will go to
http://www.mysite.com/mypage/?_escaped_fragment_=pageState
You can setup a server-side 301 redirect to a prettier URL, perhaps something like
http://www.mysite.com/mypage/pagestate/
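A rough .htaccess sketch of that server-side 301, assuming exactly the example URLs above (the lowercased /pagestate/ segment is your own naming choice, not something Apache derives for you):

RewriteEngine On
# /mypage/?_escaped_fragment_=pageState  ->  /mypage/pagestate/
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=pageState$
RewriteRule ^mypage/?$ /mypage/pagestate/? [R=301,L]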
On these HTML snapshot pages you can add a client-side redirect to send most people back to the dynamic version of the page. This ensures most people share the dynamic URL. For example, if you try to go to http://twitter.com/brettdewoody it'll redirect you to the dynamic (https://twitter.com/#!/brettdewoody) version of the page.
To answer your last question, both Google and Facebook use the _escaped_fragment_ method right now.