Recursive wget: alter links - ajax

I am trying to optimize my AJAX fragment links for the Google crawler (which substitutes "#!..." links with "?_escaped_fragment_=..." as described here). I want to check whether the entire site is accessible via the _escaped_fragment_ links I have implemented.
I am curious whether I can use wget's recursive site download to this end and make it substitute "#!" links with "_escaped_fragment_", so that wget sees
abc.com?_escaped_fragment_=arg=value
instead of
abc.com#!arg=value

No, you can't: everything after the # is not sent to the server; it exists only for client-side JavaScript routing.
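What you can do is rewrite the URLs yourself before handing them to wget, since wget never sees the fragment as part of the request. A minimal sketch of the mapping in JavaScript, approximating the scheme's convention of percent-encoding a few unsafe characters in the fragment:

// Map a #! URL to the ?_escaped_fragment_= form Googlebot would request.
function toEscapedFragment(url) {
  const i = url.indexOf('#!');
  if (i === -1) return url;
  const base = url.slice(0, i);
  // The scheme escapes %, #, &, + and whitespace in the fragment value.
  const fragment = url.slice(i + 2).replace(/[%#&+ ]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase());
  const sep = base.includes('?') ? '&' : '?';
  return base + sep + '_escaped_fragment_=' + fragment;
}

console.log(toEscapedFragment('http://abc.com#!arg=value'));
// -> http://abc.com?_escaped_fragment_=arg=value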

Related

Canonical Link Does Not Do Anything

I have two different links for the same website.
One of them is: https://example.com/
The other is: https://www.example.com/
I am doing this in Node.js with Pug, so I have a layout file to which I have added my canonical link in order to affect every page of my website.
My canonical link (in Pug syntax):
link(rel="canonical", href="https://www.example.com/")
I have added this link to the head in my layout.pug.
Why is it not redirecting to the canonical URL?
The "canonical" link type does not redirect browsers to a different URL. Its purpose is to tell search engine spiders which variant of a page to index.

How do I make my hosting detect _escaped_fragment_ and fetch the corresponding HTML?

I have an AJAX site and I'm using hashbangs (#!) in my URLs, with the intention of providing the correct HTML versions when Google's bots replace the #! with ?_escaped_fragment_.
How do I go about routing/proxying/redirecting the URL with _escaped_fragment_ to the corresponding HTML pages? I can't find documentation on this specific part of the process. My first thought was that I should be using a 301 or 302 redirect, but I was told that wasn't the case, albeit without any more info.
Everything after the # in the URL isn't even sent to the server; the fragment is entirely client-side, so for regular visitors you'll need some kind of JavaScript solution that looks at the fragment and makes the appropriate AJAX call to load content. The ?_escaped_fragment_=... requests from Googlebot are different: they're ordinary query strings that do reach the server, so you answer them with normal server-side routing (or rewrite rules) that returns the pre-rendered HTML, not with a 301 or 302 redirect.
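A minimal sketch of that server-side handling, assuming an Express app with pre-rendered snapshots in a snapshots/ directory (the question names no stack, so both are assumptions):

const path = require('path');

// Serve a pre-rendered HTML snapshot when Googlebot asks for
// ?_escaped_fragment_=... instead of the AJAX shell page.
app.use((req, res, next) => {
  const frag = req.query._escaped_fragment_;
  if (frag === undefined) return next();
  // e.g. /?_escaped_fragment_=products -> snapshots/products.html
  const name = path.basename(frag || 'index'); // basename blocks path traversal
  res.sendFile(path.join(__dirname, 'snapshots', name + '.html'));
});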

Will the google crawler use ajax _escaped_fragment_ format when a link redirects to an ajax link?

I know I can make server.com/#!/mystuff AJAX-crawlable, but I want to know if the reverse is possible. If I have server.com/mystuff and that sends a redirect to server.com/#!/mystuff, will the Google crawler then run that URL through its rename process, so that it follows the redirect to server.com/?_escaped_fragment_=mystuff?
According to webmaster tools, no [1].
Question: When should I use _escaped_fragment_ and when should I use #! in my AJAX URLs?
Your site should always use the #! syntax in all URLs that have adopted the AJAX crawling scheme. Googlebot will not follow hyperlinks in the _escaped_fragment_ format.
However, the reality seems to be a bit different: I'm noticing Googlebot following escaped-format links.
[1] https://developers.google.com/webmasters/ajax-crawling/docs/faq

Google crawl ajax / dynamically generated content - SEO

I've got a fairly unique situation that I don't believe any of the other topics here address.
I have an ecommerce module that is dynamically loaded / embedded into third-party sites: no iframe, just JSON fetched by the web client and rendered into content. I have no access to these third-party sites at all, other than my JavaScript file being loaded from their page and dynamically generating the content.
I'm aware of the #! method, but that's no good here: my JS does generate "URLs" within the embedded platform, but they're fake and for the address bar only, and I don't believe Google's crawlers can reach that far.
So my question is: is there a meta tag we can set to point outside the URL, i.e. back to my server with static crawlable content? Pointing the canonical to my server, say... but again I don't think that would work.
If you implement #!, then you have to make sure the URL you're embedded in supports the fragment-parameter versions, which you probably can't: that's server-side stuff.
You probably can't influence the canonical tag of the page either; it again has to be done server-side. Any meta tag you set via JavaScript will not be seen by a bot.
Disqus solved the problem by providing an API so that embedding websites could fetch their comments server-side and render them in plain HTML; WordPress has a plugin to do this. Disqus is also one of the few systems whose AJAX pages Google has worked out how to crawl.
Some plugins ask people to also include a plain link alongside the JavaScript (sketched below). Be careful with this, as you may break Google's guidelines if you do it wrong, but you may be able to integrate the plain link with your plugin so that it directs bots and users to a crawlable version of the content.
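A minimal sketch of that pattern, with a hypothetical embed.js and renderCatalog function (both invented for illustration): the embedding page ships a real anchor that bots and no-JS users can follow, and the script swaps it for the widget.

// In embed.js. The embedding page includes something like:
//   <a class="shop-embed" href="https://yourserver.example/catalog/123">Browse the catalog</a>
//   <script src="https://yourserver.example/embed.js" async></script>
document.addEventListener('DOMContentLoaded', () => {
  document.querySelectorAll('a.shop-embed').forEach((link) => {
    const mount = document.createElement('div');
    link.replaceWith(mount);
    renderCatalog(mount, link.href); // hypothetical widget renderer
  });
});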
Look into Google's crawlable ajax standard (and why it's a bad idea) and canonical URLs.
Now you can actually do this. A complete guide and examples can be found here: https://github.com/kubrickology/Logical-escaped_fragment

Google Search optimisation for ajax calls

I have a page on my site which has a list of things which gets updated frequently. This list is created by calling the server via jsonp, getting json back and transforming it into html. Fast and slick.
Unfortunately, Google isn't able to index it. After reading up on how to get this done according to Google's AJAX crawling guide, I am a bit confused and need some clarification and confirmation:
Only the AJAX pages need to implement the rules, right?
I currently have a rest url like
[site]/base/junkets/browse.aspx?page=1&rows=18&sidx=ScoreAll&sord=desc&callback=jsonp1295964163067
this would need to become something like:
[site]/base/junkets/browse.aspx#page=1&rows=18&sidx=ScoreAll&sord=desc&callback=jsonp1295964163067
And when google calls it like this
[site]/base/junkets/browse.aspx#!page=1&rows=18&sidx=ScoreAll&sord=desc&callback=jsonp1295964163067
I would have to deliver the html snapshot.
Why replace the ? with # ?
Creating HTML snapshots seems very cumbersome. Would it suffice to just serve simple links? In my case I would be happy if Google would only index the things' pages.
It looks like you've misunderstood the AJAX crawling guide. The #! notation is to be used on links to the page your AJAX application lives within, not on the URL of the service your application makes calls to. For example, if I access your app by going to example.com/app/, then you'd make the page crawlable by instead linking to example.com/app/#!page=1.
Now when Googlebot sees that URL in a link, instead of going to example.com/app/#!page=1 – which means issuing a request for example.com/app/ (recall that the hash is never sent to the server) – it will request example.com/app/?_escaped_fragment_=page=1. If _escaped_fragment_ is present in a request, you know to return the static HTML version of your content.
Why is all of this necessary? Googlebot does not execute script (nor does it know how to index your JSON objects), so it has no way of knowing what ends up in front of your users after your scripts run and content is loaded. So your server has to do the heavy lifting of producing an HTML version of what your users ultimately see in the AJAXy version.
So what are your next steps?
First, either change the links pointing to your application to include #!page=1 (or whatever), or add <meta name="fragment" content="!"> to your app's HTML. (See item 3 of the AJAX crawling guide.)
When the user changes pages (if this is applicable), you should also update the hash to reflect the current page. You could simply set location.hash='#!page=n';, but I'd recommend using the excellent jQuery BBQ plugin to help you manage the page's hash. (This way, you can listen to changes to the hash if the user manually changes it in the address bar.) Caveat: the currently released version of BBQ (1.2.1) does not support AJAX crawlable URLs, but the most recent version in the Git master (1.3pre) does, so you'll need to grab it here. Then, just set the AJAX crawlable option:
$.param.fragment.ajaxCrawlable(true);
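If you'd rather not depend on the plugin, here is a minimal plain-JavaScript sketch of the same idea, with loadPage standing in for whatever function fetches and renders a page of results:

// React to #! fragment changes so back/forward navigation and
// hand-edited URLs all land on the right page.
window.addEventListener('hashchange', () => {
  const hash = window.location.hash; // e.g. "#!page=2"
  if (hash.indexOf('#!') === 0) {
    const params = new URLSearchParams(hash.slice(2));
    loadPage(Number(params.get('page')) || 1); // hypothetical loader
  }
});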
Second, you'll have to add some server-side logic to example.com/app/ to detect the presence of _escaped_fragment_ in the query string, and return a static HTML version of the page if it's there. This is where Google's guidance on creating HTML snapshots might be helpful. It sounds like you might want to pursue option 3. You could also modify your service to output HTML in addition to JSON.
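A minimal sketch of that last suggestion (HTML in addition to JSON), assuming an Express-style route and a hypothetical fetchJunkets data accessor:

// One route, two representations: JSONP for the AJAX app, plain
// crawlable HTML for requests carrying _escaped_fragment_.
app.get('/base/junkets/browse', async (req, res) => {
  const rows = await fetchJunkets(req.query); // hypothetical data access
  if ('_escaped_fragment_' in req.query) {
    const items = rows.map((r) => `<li><a href="${r.url}">${r.name}</a></li>`);
    res.send(`<ul>${items.join('')}</ul>`);
  } else {
    res.jsonp(rows); // res.jsonp honours the callback= query parameter
  }
});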
I've more or less given up on this. There really seems to be no alternative to generating the HTML on the server and delivering it in the HTML body if you want Google to index your directory.
I even tried adding a section wrapping a .NET user control which implemented a simple HTML version of the directory, but Google also managed to ignore that.
So in the end my directory has been de-ajaxified. :(
