We have a page with over 1000 image, we show only 10 on each page, we load them with ajax, when people "see the images", also using datatable.
Everything works fine, however in Google webmaster tools, I just got thousands of 404 errors, with pages like this:
http://example.com/ajax/%5C%22http:%5C/%5C/example.com%5C/image%5C/1937%5C/image-name%5C%22
Of course if I go to this page, I get a 404 error, because no page like this exists, but I don't understand then why Google fetches URLs like this.
A card url looks like this: example.com/image/a 4 digit number here/image-name
As it gets loaded with ajax it creates that kind of url, which you (as a visitor) never sees but somehow Google fetches it.
Now I added /ajax to robots.txt to disallow fetching it, but I'm not sure if that's the best idea.
Any help would be appreciated.
The most likely reason is that your ajax directory (and possible other directories) is readable and lists your PHP files, which Google can access and parse for more URLs.
For example, if one of your scripts echos JSON with strings like the following, Google will find
<a class=\"quality1\" href=\"http:\/\/example.com\/card\/22\/inner-rage\">
and try to navigate to that link which resolves to
http://example.com/%22http:////example.com//card//22//inner-rage/%22
which is a 404.
You should stop http://example.com/ajax/ from displaying directory contents with either an .htaccess, or drop an empty index.html there.
You've also disallowed /ajax in your robots.txt, so this should also work. Try both.
Related
My site is AJAX but it pulls content from .html files. Some of those files have been indexed without the #!, so they just function as a basic html site. I want to redirect users that land on the non ajax page to the #! version. I tried a redirect (without thinking about it) and it created an endless loop with the dynamic content.
If you look at the code, you will see that it uses js to place the static pages into a content wrapper.
I am equally having trouble with an seo issue, where google does not appear to be requesting the escaped_fragment version... that or I need some help. I thought that since it was pulling content from html files, I could just copy those pages and add name it _escaped_fragment_=page.html it is not working. I tried a redirect, but google fetch just showed the redirect request and not content.
It was a template that I purchased... I figured out how to modify the theme and content, but this is beyond me.
Closed
I decided to scrap the hashbang method. I have real pages, and I decided to let them be searched and indexed. I am waiting on a solution to pull only the body into the ajax content warapper; however, I was able to apply basic CSS to the pages without messing anything up when loaded into the main page via ajax.
I used
$("a").attr("href", function(i, href) (some js stuff to add a hash-- hostname +# href)
to add a hash to the clean urls that were internal from the main menu. This created a loop if added to the pages, so I used a clean url with an onclick redirect to the ajax version. "/" before the link.
onclick="window.location = '/#link.html';return false;"
I had a JS redirect that detected if there was a hash before the page link, and if not, added it; however, google did not like it! Sure the pages are not as nice. That said, I have content for non JS enabled browsers. As soon as I get the main.js modified so that it ignores head elements, I can dress them up even more. Each page has links that will get a user to the ajax version, including the home button "/#".
I have tried to set my site up ( http://www.diablo3values.com )according to the guidelines set out here : https://developers.google.com/webmasters/ajax-crawling/ However, it appears that Google has updated their indexes (because I see the revisions to the meta description tags) but the ajax content does not show up in the index.
I am trying to use the “Handle pages without hash fragments” option.
If you view either of the following:
http://www.diablo3values.com/?_escaped_fragment_=
http://www.diablo3values.com/about?_escaped_fragment_=
you will correctly see the HTML snap shot with my content. (those are the two pages I an most concerned about).
Any Ideas? Am I doing something wrong? How do you get google to correclty recognize the tag.
I'm typing this as an answer, since it got a little to long to be a comment.
First of all, your links seems to point to localhost:8080/about, and not /about, which probably is why google doesn't index it in the first place.
Second, here's my experience with pushstate urls and Google AJAX crawling:
My experience is that ajax crawling with pushstate urls is handled a little differently by google than with hashbang urls. Since google won't know that your url is a pushstate url (since it looks just like a regular url), you need to add <meta name="fragment" content="!"> to all your pages, not only the "root" page. And google doesn't seem to know that the pages are part of the same application, so it treats every page as a separate Ajax application. So the Google bot will never actually create a navigation structure inside _escaped_fragment_, like _escaped_fragment_=/about, as it would with a hashbang url (#!/about). Instead, it will request /about?_escaped_fragment_= (which you aparently already have set up). This goes for all your "deep links". Instead of /?_escaped_fragment_=/thelink, google will always request /thelink?_escaped_fragment_=.
But as said initially, the reason it doesn't work for you is probably because you have localhost:8080 urls in your _escaped_fragment_ generated html.
Googlebot only knows to crawl the escaped fragment if your urls conform to the hash bang standard. As users navigate your site, your urls need to be:
http://www.diablo3values.com/
http://www.diablo3values.com/#!contact
http://www.diablo3values.com/#!about
Googlebot actually needs to see these urls in the source code so that it can follow them. Then it knows to download the following urls:
http://www.diablo3values.com/?_escaped_fragment=contact
http://www.diablo3values.com/?_escaped_fragment=about
On your site you appear to be loading a new page on each click, and then loading the content of each page via AJAX too. This is not how I would expect an AJAX site to work. Usually the purpose of using AJAX is so that the user never has to load a whole new page. When the user clicks, the new content section is loaded and inserted into the page. You serve the navigation once and then you only serve escaped fragments of the content.
Hi Folks,
my first post here, thanks for any help i got already throught reading before.
I am working on a wordpress projekt. And it seems i am missing the overview on my problem.
I use ajax to recieve additional product data. http:url/product/additional_ajax_data...
This works fine, except direct call of the ajax urls. Direct call of a ajax url gives
a 404 not found.
Please dont give instructions like: add 200 ok to header... Cause the project will
consist of some thousand pages and work arounds like this are a no go...
Aditional infos: the urls have no ajax hash tag... And the content will dynamicly loaded depending on last url fragment
I found the solution:
To prevent Wordpress of 404 when calling a ajax url directly, add rewrite endpoints to the system.
You can follow the post from Jon Cave on Wordpress:
http://make.wordpress.org/plugins/2012/06/07/rewrite-endpoints-api/
Works also on custom post_types and custom taxonomys, keep an eye on the type for wich you want to register a custom endpoint rewrite (that may depends on your options from your post type, page type etc...).
If you are sure that url is correct and file is there, check if permissions on file are not too strict. Also check .htaccess to make sure it doesn't black certain file extensions to be loaded directly
When i use the object debugger, the scraper is not able to see my OG content on my page. The debugger says "Can't download: Could not retrieve data from URL.", even though it's a 200 OK and shows the correct fetched and canonical URL. I have a subdomain on it, and it work fine.So not sure what happen to my main domain.
When click on Scraped URL See exactly what our scraper sees for your URL , it just show blank page.
Your site seems to have some HTML errors: http://validator.w3.org/check?uri=http%3A%2F%2Fspandooly.de
You should fix them before attempting to validate your site.
Funny thing, I create a copy of your page, and it seems to validate with no changes in the HTML. Your webserver might be doing something weird (according to the headers, the charset is missing or none):
http://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Fwww.webniraj.com%2Fspandooly.html
I couldn't really word the title very well, but here's my problem: I've got a webpage that reads from a database each time the user clicks a button, the content is then replaced for part of the page.
Because it is an ajax load, everything is done in the background, and so the URL stays the same. This wasn't be a problem at all until I realised that I will want to have a different Facebook comments box for each set of content that is loaded - so if someone comments, it is posted to their facebook profile, people click on the link and are then taken to different content.
So... what I need is some way of referencing each set of content, and I've found a site that does exactly that (I'm sure there are a lot of them).
Here's the link.
Each set of content has a different 'hash code' (because I don't know the actual name for it) which is appended to the URL - in this case the code is "#1922934", this allows people to post links to it that specific set of content on Facebook etc. - and also allows a different Facebook comment box for each set of content.
Does anyone know how such a set-up can be achieved or how these 'hash codes' work?
Here's a document from wikipedia on it.
[http://en.wikipedia.org/wiki/Fragment_identifier][1]
The main idea is that URI fragments are used because they don't cause a page reload. They also can be used to refer to anchors on a web page.
What I would do is on page load use JavaScript to read the URI fragment (location.hash) then make a request to your server to load the comments etc. The URI fragment cannot be read by a server and is only found through a client (browser)
Sounds like you want something like SammyJS.