Does Googlebot crawl url with get parameters?

Does Googlebot crawl url with get parameters? - mod-rewrite

I gave up rewriting url because I don't feel the necessity.
So url looks like "http://www.stackoverflow.com?question=35453."
But, I worry about one thing.
Does Googlebot crawl my pages?
Do I need to rewrite my url like "http://www.stackoverflow.com/question/35453"?

Does Googlebot crawl my pages?
Yes. But you can set in google webmaster tools to ignore it.
In a general way, if you want nice url more meaningful for search engines and users, you should avoid parameters and transform to human language words separated by dashes.
So IMHO, the best you could do is:
http://www.stackoverflow.com/question/does-googlebot-crawl-url-with-get-parameters.html.

Related

Crawl depth for URLs added through metadata-and-url feed

We have a need to add specific URLs through metadata-and-url feed and prevent GSA to follow links found on these pages. URLs found on this pages must be ignored even if they specified in Follow Patterns rules.
Is it possible to specify crawl depth for URLs added through metadata-and-url feed or maybe there are some other ways to prevent GSA follow URLs found on specific pages?

You can't solve this problem with just a metadata-and-URL feed. The GSA is going to crawl the links that it finds, unless you can specify patterns to block them.
There are a couple possible solutions I can think of.
You could replace the metadata-and-URL feed with a content feed. You'd then have to fetch whatever you want to index and include that in the feed. Your fetch program could remove all of the links, or it could "break" relative links by specifying an incorrect URL for each of the documents. You'd then have to rewrite the incorrect URLs back to the correct URLs in your search result display page. I've done the second approach before, and that's pretty easy to do.
You could use a crawl proxy to block access to any of the links you don't want the GSA to follow.

The easiest method to prevent this is to add the following to the "HEAD" section of your HTML.
This will prevent the GSA (and any other search engine) from following any links on the page.

Since you say that you can't add the relevant nofollow meta tags to your content then you can handle this using your follow and crawl patterns.
From the official documentation:
Google recommends crawling to the maximum depth, allowing the Google algorithm to present the user with the best search results. You can use URL patterns to control how many levels of subdirectories are included in the index.
For example, the following URL patterns cause the search appliance to crawl the top three subdirectories on the site www.mysite.com:
regexp:www\\.mysite\\.com/[^/]*$
regexp:www\\.mysite\\.com/[^/]*/[^/]*$
regexp:www\\.mysite\\.com/[^/]*/[^/]*/[^/]*$

Internationalizing Title/Meta Tags ok or bad practice?

Is there a problem if I have both English and Chinese versions of the same title/meta tags under the same exact url? I detect the language the user has set for the browser (through the http header "accept-language" field) and change the titles/meta tags based on the language set. I get a large percentage of my traffic from China and felt this was a better-localized user experience for those users BUT I have no idea how Google would view this. My gut feeling tells me that this is not good for SEO.
Baidu.com, a major Chinese search engine, does in fact pick up my translated tags however for other US based sites it does not translate their English title/meta tags into Chinese. I would think Chinese users are less likely to click on those.
Creating sub domains and or separate domains for other countries is not an option at this point. That being said should I only have one language (English) for my title/meta tags to avoid any search engine issues?
Thanks for any advice / wisdom you can offer. Really hoping to get clarity on best practices.
Thanks all!

Yes, it probably is a problem. Search engines see mixed language content. You are not describing how you “detect and change the titles/meta tags based on the users browser language”, but you are probably doing it client-side and using “browser language”, which is wrong whatever it means in detail (it does not specify the user’s preferred language).
To get a more targeted answer, ask a more real question, with a URL.

If you want to get search traffic from search engines in both English and Chinese, you should have two urls instead of one.
When googlebot crawls a page, it does not even send the "Accept-Language" header. You have to send it your default language. When there is one url, there is no way for you to have your second language indexed. You won't be ranked in search engines in multiple languages.
For best SEO, use separate top level domains, subdomains, or folders for different languages.
http://example.de/
http://example.es/
http://example.com/
http://de.example.com/
http://es.example.com/
http://www.example.com/
http://example.com/de/
http://example.com/es/
http://example.com/en/

I think there are no problem when you use English and Chinese in same meta tags.

Do I have to have language in URL? [Codeigniter]

I am making a mutli-language site. I have seen that some websites have urls with languages in them like so:
http://example.com/en/homepage
I hear it is important for SEO, but I was wondering, doesn't that make it more complicated in terms of routing, URI, controllers, rather than just having a session/cookie that holds the desired language?
What are pluses and minuses of each way and which way should I go?
thank you

you could add some lines to your route config and to your core to do what you want.
Here are two links with a lot of information to implement this: http://codeigniter.com/wiki/URI_Language_Identifier/
http://sumonbd.wordpress.com/2009/09/16/develop-multilingual-site-using-codeigniter-i18n-library/

Pretty URLs Vs. Duplicate Content

I'm trying to clear up a grey area about this much talked about topic...
Like most devs, I've made some pretty URLs with mod_rewrite. My sites internal links point to the pretty URLs and things are working nicely.
But, I can still access the old URL if I point to it directly.
Now, this is most certainly going to cause duplicate content issues so after doing some research it seems that 301 redirects are the way to go.
But.... and here's the grey bit...
If you are working on a site with thousands of URLs, what's best practice to achieve this? I don't wantto list 1k+ lines in .htaccess I thought of a regexp in my rewrite rule, but my pretty URLs have names from the database in them... and I can't access that from .htaccess :)
Have I hit a dead end? Is there a way around this? Would Google's canonical tag be a possibility??

Well, I don't know if this is the "definitive" answer, but I have a bunch of "functional" URLS like:
http://www.flipscript.com/product.aspx?cid=7&pid=42&ds=asdjlf8i7sdfkhsjfd978
but I remap the URLs, link to them and list them in my site map as:
http://www.flipscript.com/ambigram-ring.aspx
I haven't seen ANY evidence that identical URLS pointing to the same content within the same domain has any negative impact on SEO.
In fact, over the past year, I have climbed to the #1 position on Google with this in place for my primary keyword.
My theory about why this should be so is that Google applies the duplicate content penalty for entire "clone sites", not for just linking with different URLs to the same content within a single site.

A quick dirty way would be to re-route everything on the site via a PHP file that checks to see if the path is still valid, querying the database if necessary. Use a 301 redirect if the path has permanently moved. Soon enough these "grey urls" should hardly ever come across, and indexes should be updated across search engines. At which point you can remove the router.
If you could specify what your "grey url" looks like I may be able to suggest a better alternative.

"Would Google's canonical tag be a possibility??" -- Why not?
--> It automatically transfers page rank
--> Google recommends canonical tag even if the content differs slightly but is more or less similar.
--> Too many 301 redirects to pages within site are bad for SEO (my personal experience with Bing).
--> Too may 301 redirects increase the effective load time of content for your users (especially bad if the ping times from their location to your server is high).

Are clean URLs a backend or a frontend thing

What do you think.. are clean URLs a backend or frontend 'discipline'

The answer is BOTH.
For example:
https://stackoverflow.com/questions/203278/are-clean-urls-a-backend-or-a-frontend-thing
The number above is a database id, a back-end thing. Chop off the pretty part and it goes to the same page. Therefore the "are-clean-urls-a-backend-or-a-frontend-thing" is part of the front-end thing.

If we're talking url's being 'clean' from an end user experience then I'm going to break the mould a bit and say that url's in general are not intuitive and they never will be, they are intended to be machine readable.
There is no standard to the format of a url such that when navigating from site to site humans will never ever remember how to reach a resource purely through remembering urls and their 'friendly syntax'. We can argue the toss about whether using a '?' and '&' or '/' to express how how to identify a resource via a url; is one method better than the other? it doesn't matter. At the end of the day a machine parses it and sends back the result.
We should stop deluding ourselves that people actually type these things in and realise that uri's are for machines, not people.
I have yet to use/remember a uri that goes beyond the first few characters of the http://domain.com/ part of an address, and I've been using the web since a long time. That's what bookmarks are for. Nowhere on a website does it say 'change this part here in our url to view 'whatever else' resource' because url's are usually undocumented and opaque.
Yes make your uri's SEO friendly (hell even they change periodically) but forget about the whole 'human/clean' resource identifier thing, it's a mystical pipe dream.
I agree with Vlion that url's should provide a unique mechanism to bookmark a resource and return to it (unlike some of these abominable web 2.0 ajax/silverlight/flash creations), but the bookmark will never be for humans to comprehend and understand. There seems to be quite a lot of preoccupation and energy spent in dreaming up url strategies that humans can remember and type in, it's a waste of energy. Let's get on and solve real problems.
Sorry for the rant, but there's a lot of web 2.0 nonsense related to urls going on in certain circles that are just a total waste of time.

Now that Firefox's Awesome bar and Google Chrome's Omnibox address bars can be used to search the browsing history it makes it much easier for users to search their history for previously visited sites, so having clean urls may help the user find sites in their history more easily.
Making sure the page has an appropriate Title is important (as both browsers search the title as well as the url) but by making sure the url has relevant keywords in it as well, when those keywords are typed in the address bar the urls will be more likely to show up higher in the suggestions as the keyword will be matched twice, in the url and the title.
Also, once a user has typed the name of a site they will be presented with example urls from the site which they can then use as a template for narrowing down their search. So using verbs and nouns in the url for different sections or actions of the site will aid the user to narrow their search to just the part of the site they are interested in, e.g. the /questions/ or /tag/ sections of stackoverflow, or the "/doc" at the end of docs.google.com/doc that can be used to view just document pages on Google docs*.
Since both Firefox and Chrome search for each space separated word typed into the address bar, it could be argued that it isn't necessary for searching that the url be completely human readable, but to allow the user to actually read the keywords they are interested in from the url the amount of "noise" should be kept to a minimum.
* which are of the form http://docs.google.com/Doc?id=gibberish

My perspective is simple:
every place I visit with my browser(with various edge case exceptions) should be bookmarkable and Forward/Back should be usable and not destroy any data entry.

Backend for sure. Your server is the one that has to take care of the routing to the resources requested by the URL.

I think the main reasons for using friendly URLs are:
Ease of linking / sharing
Presentation
Seo
So I think it's purely a client-side pleasure. While they're nice on the server as well, they're not mission critical.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Does Googlebot crawl url with get parameters? - mod-rewrite

I gave up rewriting url because I don't feel the necessity. So url looks like "http://www.stackoverflow.com?question=35453." But, I worry about one thing. Does Googlebot crawl my pages? Do I need to rewrite my url like "http://www.stackoverflow.com/question/35453"?

Related

Crawl depth for URLs added through metadata-and-url feed

Internationalizing Title/Meta Tags ok or bad practice?

Do I have to have language in URL? [Codeigniter]

Pretty URLs Vs. Duplicate Content

Are clean URLs a backend or a frontend thing

Categories

Resources