Why use a Google Sitemap?

I've played around with Google Sitemaps on a couple of sites. The lastmod, changefreq, and priority parameters are pretty cool in theory, but in practice I haven't seen them have much effect.
Most of my sites don't have a Google Sitemap, and that has worked out fine. Google still crawls the site and finds all of my pages. The old meta robots and robots.txt mechanisms still work when you don't want a page (or directory) to be indexed. Every other page I just leave alone, and as long as there's a link to it, Google will find it.
So what reasons have you found to write a Google Sitemap? Is it worth it?
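For reference, here is what those three parameters look like in a minimal sitemap file (the URL and values are placeholders, following the sitemaps.org protocol):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/some-page.html</loc>
    <lastmod>2009-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>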

From the FAQ:
Sitemaps are particularly helpful if:
- Your site has dynamic content.
- Your site has pages that aren't easily discovered by Googlebot during the crawl process—for example, pages featuring rich AJAX or images.
- Your site is new and has few links to it. (Googlebot crawls the web by following links from one page to another, so if your site isn't well linked, it may be hard for us to discover it.)
- Your site has a large archive of content pages that are not well linked to each other, or are not linked at all.
It also allows you to provide more granular information to Google about the relative importance of pages in your site and how often the spider should come back. And, as noted in another answer, if Google deems your site important enough to show sitelinks under it in the search results, you may be able to influence what appears there via the sitemap.

I believe the "special links" in search results are generated from the Google Sitemap.
What do I mean by "special link"? Search for "apache": below the first result (the Apache Software Foundation) there are two columns of links ("Apache Server", "Tomcat", "FAQ").

I guess it helps Google prioritize its crawl? In practice, I was involved in a project where we used a large, gzipped sitemap, and it helped massively. And as far as I know there is nice integration with Webmaster Tools as well.
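As an aside, a sitemap (gzipped or not) can also be announced to crawlers from robots.txt instead of being submitted by hand; the host name below is just a placeholder:

Sitemap: http://www.example.com/sitemap.xml.gz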

I am also curious about the topic, but does it cost anything to generate a sitemap?
In theory, anything that costs nothing and has a potential gain, even a very small or very remote one, can be defined as "worth it".
In addition, Google says: "Tell us about your pages with Sitemaps: which ones are the most important to you and how often they change. You can also let us know how you would like the URLs we index to appear." (Webmaster Tools)
I don't think that last part (letting Google know how you would like the indexed URLs to appear) is possible with the traditional mechanisms that search engines use to discover URLs.

Related

AJAX Crawling with question mark instead of hashbang

Where I'm at: I've read Google's documentation on its AJAX crawling, and I've searched around a bit on this website and others, but I'm quite confused, as all the problems seem to address the same issue: AJAX crawling with hashbangs.
I've developed an app which, among other purposes, lets the user search for locations worldwide, using an AJAX searcher quite similar to Google's, but my app exclusively uses the question mark in its AJAX URLs instead of the hashbang. Due to compatibility issues, changing it to the hashbang is not an option.
Not only am I puzzled that I could not find anyone else using the question mark instead of the hashbang, I'm also wondering whether there is any documentation on my issue: how to let Googlebot crawl all my AJAX content when I'm using the question mark instead of a hashbang in my AJAX app.
The AJAX crawling scheme was created explicitly for applications and websites using the hashbang (#!) in their URL structure, because the fragment part of a URL exists only on the client side; the URL rewriting in the spec, i.e. from #! to ?_escaped_fragment_=, is meant to solve exactly that.
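To illustrate the rewriting (example.com and the state token are placeholders): a crawler that sees

http://example.com/search#!q=london

will instead request

http://example.com/search?_escaped_fragment_=q=london

and index the HTML returned for that URL under the original #! address. If your app already uses plain query strings, there is no client-only fragment to translate in the first place.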
Since most of the web already makes use of JavaScript in one way or another, we (Google) needed a better solution, so we started executing JavaScript in the pages we crawled, effectively rendering every page just like a normal browser would. To quote our blog post, Understanding web pages better:
In order to solve this problem, we decided to try to understand pages by executing JavaScript. It’s hard to do that at the scale of the current web, but we decided that it’s worth it. We have been gradually improving how we do this for some time. In the past few months, our indexing system has been rendering a substantial number of web pages more like an average user’s browser with JavaScript turned on.
You can also see what we "see" using Fetch as Google in Search Console (formerly Webmaster Tools); read more about the feature in our post titled Rendering pages with Fetch as Google.
Before you do anything else, please try to fetch a few pages from your site with Fetch as Google. You might not have to do anything at all; it may work out of the box. And the good news is that it's not only Google that renders pages!

SEO with AngularJS and ASP.NET RESTful service

I have developed a website using AngularJS and Web API.
The problem is that the AJAX-rendered content is not crawlable by Google, so no one can find the website using Google search.
After reading many articles on this issue, including:
this one, with all its outgoing explanation links,
Google's AJAX crawling protocol, and also a Stack Overflow question, I couldn't find a proper solution. The ones that mention ASP.NET solutions talk about MVC, and I need only simple REST via Web API; the other articles don't cover ASP.NET at all.
Is there any simple explanation?
I'm the one who asked this same question long ago, so I will answer from my experience:
Firstly, if all your content is accessible via unique URIs (including the hashbang, if you use it), modern search engines should index it just fine. In fact, Google can now index JavaScript-generated content. You can try that via Google Webmaster Tools and see how your site is indexed.
Secondly, there are libraries that help you serve pre-rendered content to search engines if you need to, but in my case I didn't bother much with them, since Google was indexing the JS content nicely.
I've seen others ask this question, and maybe I'm missing something or this is outdated, but I don't see why AngularJS needs to be an issue with SEO.
Say you have a landing page and it has a bunch of links. Assuming you're using html5 mode in AngularJS (and I'm not sure that's 100% necessary) and something like ngRoute, the links on the landing page can work both as "angular" (JavaScript) links and as "old school" (full-page-load) links.
If you're a human user you can click a link and it will do angular magic and adjust the content without loading the full page. Ok, all fine.
But if you instead copy the link and paste it in a new tab or new browser, it will still work - assuming you've set up routes correctly.
I'm not an SEO expert by any stretch of the imagination, but as I understand it, having links that load pages and having those pages have real and useful content is the core of SEO, and done this way, AngularJS should work fine. The key thing to check is if you copy and paste the link (not just click it) that it works.
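A minimal sketch of that setup (the module name and template paths are made up for illustration; this assumes AngularJS 1.x with the ngRoute module included):

// app.js: enable html5 mode so routes use real paths instead of hashbangs.
// index.html also needs <base href="/">, and the server must return the app
// shell for these paths so that pasted links survive a full page load.
var app = angular.module('myApp', ['ngRoute']);
app.config(['$locationProvider', '$routeProvider',
  function ($locationProvider, $routeProvider) {
    $locationProvider.html5Mode(true);
    $routeProvider
      .when('/', { templateUrl: 'views/home.html' })
      .when('/about', { templateUrl: 'views/about.html' })
      .otherwise({ redirectTo: '/' });
  }]);

With this in place, /about works both as an in-app route and as a URL a crawler (or a friend) can request directly.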

What's the best SEO practice when you do an AJAX driven website?

I've encountered several websites built entirely on AJAX, and their SEO seems pretty bad. Does Google really crawl websites like that?
Optimization guides for the different search engines say that bots are unable to crawl such sites. I think Google's bots might use Chrome's engine for some purposes (I remember they took site screenshots at one time), but nevertheless it's the static HTML that matters. Therefore, the usual practice is to generate valid HTML on the server, so the site works even for user agents like Lynx, and then patch it with AJAX, the History API, and all other imaginable bells and whistles, as in the sketch below.
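A bare-bones sketch of that pattern (the link markup and the "content" element ID are made up): every URL returns full HTML from the server, and this script merely upgrades navigation in capable browsers.

// Enhance plain links: fetch the target page in the background, swap the
// content area, and update the address bar with the History API. Browsers
// (and bots) without support simply follow the link the old-fashioned way.
document.addEventListener('click', function (e) {
  var link = e.target;
  if (!link || link.tagName !== 'A' || !window.history.pushState) return;
  e.preventDefault();
  var xhr = new XMLHttpRequest();
  xhr.open('GET', link.href);
  xhr.onload = function () {
    // Swap in the content area of the fetched page and update the URL,
    // so the address bar always shows a real, crawlable address.
    var doc = new DOMParser().parseFromString(xhr.responseText, 'text/html');
    document.getElementById('content').innerHTML =
      doc.getElementById('content').innerHTML;
    history.pushState(null, '', link.href);
  };
  xhr.send();
});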

URI fragments (hashes) and SEO: can such pages get indexed by search engines?

I am building a forum site where posts are retrieved onto the same page as the listing via AJAX. When a new post is shown, the URI fragment is changed (e.g. .php#1_This-is-the-first-post), and the title and meta tags are changed as well.
My question is this: I have read that search engines aren't able to use #these-words, so my entire site won't be indexable (it will look like one page).
What can I do to get around this, or at least make my sub-pages indexable?
NOTE: I have built almost all of the site, so radical changes would be hard. SEO is my weakest geek skill.
Add non-AJAX versions of every page, and link to them from your popups as "permalinks" (or whatever you want to call them). Not only are your pages unavailable to search engines, they also can't be bookmarked or emailed to friends. I recently worked with some designers on a site and talked them out of an AJAX-only design. They ended up putting article "teasers" in popups and making users go to a page with a bookmarkable URL to read the complete texts.
As difficult as it may be, the "best" answer may be to re-architect your site to use the hash-based URL scheme more sparingly.
Short of that, I'd suggest the following (see the sketch after this list):
Create an alternative, non-hash-based URL scheme. This is a must.
Create a sitemap that allows search engines to find your existing pages through the new URL scheme.
Slowly port your site over. You might consider adding the deeper links on the page, or encouraging users to share those links instead of the hash-based ones, etc.
Hope this helps!
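For the porting step, one possible bridge between the two schemes (the /post/ path layout is hypothetical): when a browser lands on a legacy fragment URL, client-side code forwards it to the new crawlable address.

// Forward legacy fragment URLs (e.g. /forum.php#1_This-is-the-first-post)
// to their new permalink form (e.g. /post/1/this-is-the-first-post).
(function () {
  var m = window.location.hash.match(/^#(\d+)_(.+)$/);
  if (m) {
    window.location.replace('/post/' + m[1] + '/' +
      m[2].toLowerCase().replace(/[^a-z0-9]+/g, '-'));
  }
})();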

How do I extend BlogEngine.NET to collect visitor statistics?

I love BlogEngine, but from what I can see it does not collect the standard information about visitors that I would like to have (referrer, browser type, and so on).
When I log in as Admin I have a menu item named "Referrer". I can choose a weekday and I'll be presented with one or two rows like "google.com 4 hits", "itmaskinen.se 6 hits", and so on. But that's not what I want to see; I want to see where my visitors come from (country, IP if possible), how many visitors there are, and so on.
If any of you are familiar with BlogEngine.NET and can point me to where I should put my own logging code, or if you know a visitor-statistics extension that can do it for me, I would be really happy to know. I prefer an extension, because if I make changes to BlogEngine myself, they may break later updates I install.
BlogEngine.NET is blog software written in .NET, found here: http://www.dotnetblogengine.net/
And yes, I prefer to ask this question here rather than in the BlogEngine.NET forum; you know why. ;)
This isn't an extension, but it's what I use to collect all my BlogEngine.NET data, and it should be upgrade-safe.
When you log into the BlogEngine.NET admin screens you can go to Settings > Custom Code > Tracking Script and paste your Google Analytics (http://www.google.com/analytics/) tracking script there. Google Analytics provides all the referrer, browser type, etc. data you were wanting. And what's nice is that you can then create additional accounts for other sites if you choose.
I use both Google Analytics and StatCounter to track visitor stats. I find that each one provides useful information that the other doesn't. And they're both free to a certain extent.
I place their JavaScript code in the site.master file of my custom BE.NET skin.
For Google Analytics I go a step further and pass the username of authenticated users as a custom variable. That way I can match usernames up with the stats. To do this you can use the _setVar JavaScript method on the GA pageTracker, like so:
<script type="text/javascript">
// Load the legacy ga.js library first; without it, _gat below is undefined.
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
var pageTracker = _gat._getTracker("UA-129049-25");
// Record the authenticated username (empty for anonymous visitors) as a custom variable.
var userDefinedValue = '<%= System.Web.Security.Membership.GetUser() != null ? System.Web.Security.Membership.GetUser().UserName : "" %>';
pageTracker._setVar(userDefinedValue);
pageTracker._trackPageview();
</script>
Has anyone noticed that we miss all the hits coming from RSS readers? Syndication.axd does not run the analytics JavaScript, so the vast majority of readers never appear in the statistics, and what we happily analyze is just the less important part: the ad-hoc visitors.
For the vast majority of cases, Google Analytics does just fine. It all depends on how much data you want. For example, if you want to keep track of IP addresses and resolve them to get domain names, or highlight all visits to your blog from, say, your coworkers at the company where you work, you'd have to write some custom code yourself. However, it's all fairly simple; these sorts of things are easily achievable using ASP.NET.
I set up statistics gathering on the IIS website hosting my BlogEngine instance and analyze the logs using WebLog Expert (http://www.weblogexpert.com).
It is more reliable than Google Analytics, since I see truly ALL requests that come to my IIS, whether for an .axd handler or for static content. At one point I also found that Google was misleading me about the number of visits; since then I trust my IIS statistics much more than Google's.
There is a widget which can be used to display visit and online-user statistics.
You can find it at the following links:
http://www.nuget.org/packages/Statistics/
http://www.itnerd.ir/post/2013/07/25/Visits-and-Online-Users-Statistics-widget-for-BlogEngine-2
The instructions are at the second link.
