AJAX: no ! after the # - do the crawlers read this?

I made a site where all requests are sent to the server via AJAX, but there is no ! after the #, so crawlers won't rewrite it to ?_escaped_fragment_=. Every time you click a link on my site, all you do is change the name after the #. Then a request is sent to the server: PHP queries MySQL for the data, JSON with this data comes back, it's parsed, and the content (DOM and text) changes.
In short, all the links do is ask MySQL for data. There is no HTML or anything else behind them.
You can bookmark or share these links and they work.
You can go forward and backward and it works.
The Question:
Do crawlers index my links and the JSON data that comes back from them?
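For clarity, here's a minimal sketch of the navigation scheme I'm describing (the endpoint name and the JSON shape are placeholders):

// Minimal sketch of the hash-based navigation described above.
// The endpoint (/data.php) and the JSON shape are placeholders.
window.addEventListener('hashchange', loadContent);
window.addEventListener('DOMContentLoaded', loadContent);

function loadContent() {
  var page = location.hash.slice(1) || 'home'; // "#about" -> "about"
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/data.php?page=' + encodeURIComponent(page));
  xhr.onload = function () {
    // PHP queries MySQL and replies with JSON, e.g. {"title": ..., "body": ...}
    var data = JSON.parse(xhr.responseText);
    document.getElementById('content').innerHTML = data.body;
    document.title = data.title;
  };
  xhr.send();
}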

AJAX content is not indexed by Google.
You will have to add <noscript> tags if you want to include some static content that will be indexed by the crawler.
There are more elegant solutions, as you mention: building the AJAX links the way, for example, Twitter did.
In any case, you will need the ! after the # if you want the crawler to translate the URL.
For example, if you follow this link you will see my Twitter page, but note the parameters in the link:
https://twitter.com/#!/MuSTa1nE
For the crawler it is translated as:
http://twitter.com/?_escaped_fragment_=/MuSTa1nE
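The translation between the two forms is mechanical. A minimal sketch of it in JavaScript (the function name is mine, not a standard API):

// Sketch of how a #! URL maps to its _escaped_fragment_ form.
function toEscapedFragment(url) {
  var parts = url.split('#!');
  if (parts.length < 2) return url; // no hashbang, nothing to translate
  var separator = parts[0].indexOf('?') === -1 ? '?' : '&';
  return parts[0] + separator + '_escaped_fragment_=' + encodeURIComponent(parts[1]);
}

toEscapedFragment('https://twitter.com/#!/MuSTa1nE');
// -> "https://twitter.com/?_escaped_fragment_=%2FMuSTa1nE"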
You don't want Google to index the ugly URL, so remember to do a 301 redirect.
Review the following article:
http://www.seomoz.org/blog/how-to-allow-google-to-crawl-ajax-content
I hope it will be helpful.

Related

Ajax content indexing, Google

I've followed the instructions from the Google website to enable Ajax crawling on my AngularJS site by adding the following meta tag:
<meta name="fragment" content="!">
The rendered content has some links like:
User 1
User 2
User 3
Also some Ajax tabs which render dynamic content like:
Popular
Recent
Looking at the server logs, GoogleBot did come and correctly passed _escaped_fragment_ in the URI:
_escaped_fragment_=%2fpopular
_escaped_fragment_=%2frecent
The problem is that, looking at the actual indexed content via site:www.somesite.com and at the server logs, I see that GoogleBot attempted to index pages like:
/user/1/#!/popular
/user/1/#!/recent
Why would something like this happen, considering those URLs are relative and don't have #! on them to indicate AJAX content? And is there a way to prevent this?
If those URLs are available on all pages, Google will simply combine them.
So if I go to User 1 and the Popular and Recent tabs are there again, then it's logical that Google loads /user/1#!/popular.
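A tiny illustration of why, using the paths from the question:

// A tab link with href="#!/popular" resolves against the page it
// appears on, so Googlebot derives one such URL per page.
var pages = ['/', '/user/1/', '/user/2/'];
pages.forEach(function (path) {
  console.log(path + '#!/popular');
});
// -> /#!/popular
// -> /user/1/#!/popular
// -> /user/2/#!/popular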
You might want to know that I've solved this puzzle with a script that's on GitHub: https://github.com/kubrickology/Logical-escaped_fragment
Simply build your AJAX pages with: __init()

Why is my AJAX content not being indexed by Google

I have tried to set my site up (http://www.diablo3values.com) according to the guidelines set out here: https://developers.google.com/webmasters/ajax-crawling/. However, it appears that Google has updated its index (because I see the revisions to the meta description tags), but the AJAX content does not show up in the index.
I am trying to use the “Handle pages without hash fragments” option.
If you view either of the following:
http://www.diablo3values.com/?_escaped_fragment_=
http://www.diablo3values.com/about?_escaped_fragment_=
you will correctly see the HTML snapshot with my content (those are the two pages I am most concerned about).
Any ideas? Am I doing something wrong? How do you get Google to correctly recognize the tag?
I'm typing this as an answer, since it got a little too long to be a comment.
First of all, your links seem to point to localhost:8080/about, and not /about, which is probably why Google doesn't index them in the first place.
Second, here's my experience with pushstate urls and Google AJAX crawling:
My experience is that AJAX crawling with pushState URLs is handled a little differently by Google than with hashbang URLs. Since Google won't know that your URL is a pushState URL (it looks just like a regular URL), you need to add <meta name="fragment" content="!"> to all your pages, not only the "root" page. And Google doesn't seem to know that the pages are part of the same application, so it treats every page as a separate AJAX application. So the Google bot will never actually build a navigation structure inside _escaped_fragment_, like _escaped_fragment_=/about, as it would with a hashbang URL (#!/about). Instead, it will request /about?_escaped_fragment_= (which you apparently already have set up). This goes for all your "deep links": instead of /?_escaped_fragment_=/thelink, Google will always request /thelink?_escaped_fragment_=.
But as I said initially, the reason it doesn't work for you is probably that you have localhost:8080 URLs in the HTML generated for your _escaped_fragment_ requests.
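For illustration only, here's roughly what that server side could look like, assuming a Node/Express server (renderSnapshot() is a placeholder for whatever produces your HTML snapshot - a headless browser, server-side templates, etc.):

var express = require('express');
var app = express();

// Serve a static snapshot when Google asks with _escaped_fragment_,
// otherwise fall through to the normal JavaScript application.
app.use(function (req, res, next) {
  if (req.query._escaped_fragment_ !== undefined) {
    // renderSnapshot() is a placeholder, not a real library call.
    renderSnapshot(req.path, function (err, html) {
      if (err) return next(err);
      res.send(html); // make sure this HTML uses production URLs, not localhost
    });
  } else {
    next();
  }
});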
Googlebot only knows to crawl the escaped fragment if your URLs conform to the hashbang convention. As users navigate your site, your URLs need to look like:
http://www.diablo3values.com/
http://www.diablo3values.com/#!contact
http://www.diablo3values.com/#!about
Googlebot actually needs to see these URLs in the page source so that it can follow them. Then it knows to download the following URLs:
http://www.diablo3values.com/?_escaped_fragment_=contact
http://www.diablo3values.com/?_escaped_fragment_=about
On your site you appear to be loading a new page on each click, and then loading the content of each page via AJAX too. This is not how I would expect an AJAX site to work. Usually the purpose of using AJAX is so that the user never has to load a whole new page. When the user clicks, the new content section is loaded and inserted into the page. You serve the navigation once and then you only serve escaped fragments of the content.

Scraping content from an AJAX/JavaScript web page

I need to do some screen scraping on a web page where the content I need is generated by AJAX. On the initial page there is a table with 4 tabs. When you click on any of the tabs the content of the table changes. I need the content from the 3rd tab only.
I have used the Google Chrome 'Inspect Element' tool to see what the requests and POST data were, and I can get the information I need when I put that information (session ID and a lot of other cookie data, as well as POST data) from the Inspect Element result into a PHP curl request. But this only works for the 30 minutes that the session lasts. Does anyone know of a way I can get to this information?
I won't reproduce the code here, but I will point you to the answer.
It's within this book:
http://www.amazon.com/Webbots-Spiders-Screen-Scrapers-Developing/dp/1593273975/ref=dp_ob_image_bk
A must-buy for someone doing what you're doing.
In the end I used HtmlUnit to get the content I needed. I also found the HtmlUnit Scripter very useful for generating the required Java code.
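If you'd rather keep the direct-replay approach instead of a headless browser, the usual fix for the 30-minute limit is to request the initial page first and reuse the session cookie it sets, rather than hard-coding one. A rough sketch, assuming a Node.js environment with a global fetch (all URLs and field names are placeholders):

// Sketch: get a fresh session cookie first, then replay the tab's
// AJAX request. URLs, field names and the tab parameter are placeholders.
async function scrapeThirdTab() {
  // 1. Hit the initial page so the server issues a new session cookie.
  var first = await fetch('https://example.com/page-with-tabs');
  var cookie = first.headers.get('set-cookie'); // e.g. "PHPSESSID=..."

  // 2. Replay the request the third tab makes (as seen in Inspect Element).
  var resp = await fetch('https://example.com/ajax/tab-content', {
    method: 'POST',
    headers: {
      'Cookie': cookie,
      'Content-Type': 'application/x-www-form-urlencoded'
    },
    body: 'tab=3'
  });
  return resp.text(); // the fragment with the third tab's content
}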

Ajax generated pages with different URLs

I couldn't really word the title very well, but here's my problem: I've got a webpage that reads from a database each time the user clicks a button; the content of part of the page is then replaced.
Because it is an AJAX load, everything is done in the background, and so the URL stays the same. This wasn't a problem at all until I realised that I will want a different Facebook comments box for each set of content that is loaded - so if someone comments, it is posted to their Facebook profile, and people who click the link are taken to that specific content.
So... what I need is some way of referencing each set of content, and I've found a site that does exactly that (I'm sure there are a lot of them).
Here's the link.
Each set of content has a different 'hash code' (I don't know the actual name for it) appended to the URL - in this case the code is "#1922934". This allows people to post links to that specific set of content on Facebook etc., and also allows a different Facebook comment box for each set of content.
Does anyone know how such a set-up can be achieved or how these 'hash codes' work?
Here's a document from Wikipedia on it:
http://en.wikipedia.org/wiki/Fragment_identifier
The main idea is that URI fragments are used because they don't cause a page reload. They can also be used to refer to anchors on a web page.
What I would do is, on page load, use JavaScript to read the URI fragment (location.hash), then make a request to your server to load the comments etc. The URI fragment cannot be read by the server; it is only available to the client (browser).
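A minimal sketch of that approach (the element IDs and the /content endpoint are placeholders):

// Sketch: read the fragment on load, fetch the matching content,
// and give the Facebook comments box a fragment-specific URL.
// Element IDs and the /content endpoint are placeholders.
function loadFromHash() {
  var id = location.hash.slice(1); // "#1922934" -> "1922934"
  if (!id) return;
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/content?id=' + encodeURIComponent(id));
  xhr.onload = function () {
    document.getElementById('content').innerHTML = xhr.responseText;
    // e.g. point the comments plugin at data-href="http://example.com/page#1922934"
  };
  xhr.send();
}
window.addEventListener('load', loadFromHash);
window.addEventListener('hashchange', loadFromHash);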
Sounds like you want something like SammyJS.

What to put in HTML snapshot for hash-bang URL for SEO?

I am using hash-bang URLs in my AJAX application, and I am implementing the server side to:
handle ?_escaped_fragment_=key1=value1%26key2=value2
So when I look at Google's FAQ, it says that this URL has an equivalent snapshot.
It is easy to see that the snapshot content is not the same as the content at the corresponding hash-bang URL. That Google example does not help, and therefore my question:
My HTML page has three components/panels/sections that are updated via AJAX. I use the onclick event on the hash-bang URLs to fetch the content from the server and then update the relevant section of the HTML page. My panels are updated independently of each other, and each panel has its own hash-bang URL.
My question is:
Should the HTML snapshot contain the entire page with all 3 sections, or only the updated section?
If I am to return the entire page, it is almost impossible to get the state of the other 2 sections right, so would Googlebot reject my site if the other 2 sections are returned in their default state?
This is a good question; sadly there is no answer for this one :( I'm looking for the same thing. My problem is that EVERYTHING is news loaded with AJAX, so each news item is actually a little piece of text. I'm asking myself whether my snapshots should contain only the current news item, or a full page with all the info I have on my home page plus the current item's content.
Do you have any news on that topic?
