Changing the URL and not reloading a page (this time without a url fragment) - ajax

Just wondering as to how did Google code this website without it reloading the website when the page AND url are changed.
Take a look and go from page to page.
http://www.20thingsilearned.com/
I've done a similar technique using a # hash change operation (with use of by the onhashchange event) to achieve the same result, however, this technique is far more cleaner.
Is this a chrome-specific feature? Or do all HTML5-enabled browsers support it? Or is there some complex combination of frames that are used together?

Take a look # this question
How does this site change the URL of the browser without changing pages?

Related

cached html pages and dynamic content

I have a caching plugin that creates static html pages from my php/mysql driven site.
On the homepage I've a listing of content <ul><li>content</li></ul>
I've a drop-wdown (select), that loads different set of content.
Obviously this isn't playing well with caching plugin.
I'm don't have website yet, I'm in the thinking phase and I'm trying to understand what problems I might face... could you help me with this little part I explained above?
Not 100% sure what you mean; but you should be able to stop browser caching by adding a random GET variable into the URL. Like time() for example; your browser should see it as a different page.

AJAX URLs & URL Rewriting

I am starting to set up a personal website, and I would like it's layout to look something like
-------------------------------
- Page Header & Menus Go Here -
-------------------------------
- Main Contents -
-------------------------------
- Footers -
-------------------------------
The main question is that I would like it to be a single-page interface in which the main contents are loaded and displayed with a combination of AJAX and jQuery to produce a nice effect. However, I would, of course, like to have the contents bookmark-enabled and indexed by search engines. I have skimmed throught the Single Page Interface Manifesto which explains some nice ways of achieving this, but I wouldn't really like to have my URLs like
http://www.mysite.com/index.php#!section=section1
http://www.mysite.com/index.php#!section=section2
I would, of course, like to re-write them as
http://www.mysite.com/section1
http://www.mysite.com/section2
My questions are this whether this approach is correct/doable and if AJAX URLs are compatible with URL rewriting. What URLS would be indexed by, say, Google anyway?
If you want your page to work without reloading and update at the same time the page's URL, the only way to archieve this is by changing the hash in the URL (location.hash = 'whatever').
URL rewriting cannot be used since the hash is not sent to the server, it's only available in the browser's scope.
Check Facebook or Twitter URLs. They are prettier than #!section=section1 but still need the hash.
Cheers.
If you want to load different content/tabs/some content of page without reloading browser,
Now It is possible with pjax..
you can use something like http://padrino-pjax.heroku.com/
you can try it, go to the link and click on any of links home,dinosaurs,aliens
and you will see It will change url and some content without reloading full page
It is achieved using ajax+push/pop of url in browser
I'm looking for a solution myself for a similar problem (I have a client site with an AJAXed wordpress theme, and these dreadful #! stuff on the URL prevent all the Social sharing plugins I have tried so far, from working correctly).
Apparently, there is a solution (with some drawbacks ofc..). I found about it here: http://moz.com/blog/create-crawlable-link-friendly-ajax-websites-using-pushstate
I know it's like two years since you've asked, but it could be helpful for someone else, or you may wanna check it out just for the sake of the curiosity itself! :-)

Facebook like or share with dynamic document title

I found this problem all over the net but no answer yet, so maybe here someone solved it ...?
I built a page relying heavily on jquery.address. It's got one index page and the rest loads dynamically via Ajax following Google's /#!/ scheme for crawlable pages. Now I want to add Facebooks Like or share button but I can't get it to grab the actual page title or url.
Whatever I do, it always falls back to title and url of the index page. It tried:
(obviously) changing title an openGraph meta on load of the new parts.
"linking" the crawler page (?_escaped_fragmet_=xyx) but specifying the #! page in meta
"sharing" with a given title and url.
I never get anything but a link to the index page or a blank "share" to the right url with title and thumbnail ignored.
Has anyone got a similar setup working?
Thanks for any hints,
thomas
Facebook is actually using #! now and it works! If you build your site so that http://site.de/?_escaped_fragment=something is identical to http://site.de/#!/something all you have to do is "share" the #! url and it'll display the info from the escaped fragment page.
Use this URL to check: http://developers.facebook.com/tools/debug
But: A much cleaner solution to the problem can be found here: http://github.com/browserstate/history.js/wiki/Intelligent-State-Handling
My guess would be that Facebook's crawler doesn't run Javascript and will always display whatever's actually in the page it gets from the server.
Facebook share has a BRUTAL cache, last time I checked it was impossible to change the title / description data once it was scraped :(
The issue I had was the og:url and the actual url of the page did not match. I also read a number of comments about the og data being just after the title element, but I don't think that solved anything.
With regard to issues of caching, it is true that Facebook's caching is "brutal", but it does not cache anything for the lint tool: http://developers.facebook.com/tools/debug.
I use no-hash-bang urls when sharing links. I process the hard links and redirect them to a hash bang client side using javascript. That way if a crawler goes to the hard linked page it will display the information just as it would if javascript were enabled.
Compare:
http://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Flikeapage.com%2F%23!%2FChristmas%2Fvs%2FBacon
and
http://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Flikeapage.com%2FChristmas%2Fvs%2FBacon
Hope this helps.

With Google's #! mess, what effect would a redirect on the converted URL have?

So Google takes:
http://www.mysite.com/mypage/#!pageState
and converts it to:
http://www.mysite.com/mypage/?_escaped_fragment_=pageState
...So... Would be it fair game to redirect that with a 301 status to something like:
http://www.mysite.com/mypage/pagestate/
and then return an HTML snapshot?
My thought is if you have an existing html structure, and you just want to add ajax as a progressive enhancement, this would be a fair way to do it, if Google just skipped over _escaped_fragment_ and indexed the redirected URL. Then your ajax links are configured by javascript, and underneath them are the regular links that go to your regular site structure.
So then when a user comes in on a static url (ie http://www.mysite.com/mypage/pagestate/ ), the first link he clicks takes him to the ajax interface if he has javascript, then it's all ajax.
On a side note does anyone know if Yahoo/MSN onboard with this 'spec' (loosely used)? I can't seem to find anything that says for sure.
If you redirect the "?_escaped_fragment_" URL it will likely result in the final URL being indexed (which might result in a suboptimal user experience, depending on how you have your site setup). There might be a reason to do it like that, but it's hard to say in general.
As far as I know, other search engines are not yet following the AJAX-crawling proposal.
You've pretty much got it. I recently did some tests and experimented with sites like Twitter (which uses #!) to see how they handle this. From what I can tell they handle it like you're describing.
If this is your primary URL
http://www.mysite.com/mypage/#!pageState
Google/Facebook will go to
http://www.mysite.com/mypage/?_escaped_fragment_=pageState
You can setup a server-side 301 redirect to a prettier URL, perhaps something like
http://www.mysite.com/mypage/pagestate/
On these HTML snapshot pages you can add a client-side redirect to send most people back to the dynamic version of the page. This ensures most people share the dynamic URL. For example, if you try to go to http://twitter.com/brettdewoody it'll redirect you to the dynamic (https://twitter.com/#!/brettdewoody) version of the page.
To answer your last question, both Google and Facebook use the _escaped_fragment_ method right now.

Screen scraping an ASP.NET web page to retrieve data displayed in the grid view

I am using RUBY to screen scrap a web page (created in asp.net) which uses gridview to display data. I am successfully able to read the data displayed on page-1 of the grid but unable to figure out how I can move to the next page in the grid to read all the data.
Problem is the page number hyperlinks are not normal hyperlinks (with URL) but instead are javascript hyperlink which causes postback to the same page..
An example of the hyperlink:-
6
I recommend using Watir, a ruby library designed for browser testing, if you're already using ruby for processing. For one thing, it gives you a much nicer interface to the DOM elements on the page, and it makes clicking links like this easier:
ie.link(:text, '6').click
Then, of course you have easier methods for navigating the table as well. It's easy enough to automate this process:
1..total_number_of_pages.each do |next_page|
ie.link(:text, next_page).click
# table processing goes here
end
I don't know your use case, but this approach has its advantages and disadvantages. For one thing, it actually runs a browser instance, so if this is something you need to frequently run quietly in the background in completely automated way, this may not be the best approach. On the other hand, if it's ok to launch a browser instance, then you don't have to worry about all that postback nonsense, and you can just click the link as if you were a user.
Watir: http://wtr.rubyforge.org/
You'll need to figure out the actual URL.
Option 1a: Open the page in a browser with good developer support (e.g. firefox with the web development tools) and look through the source to find where _doPostBack is defined. Figure out what URL it's constructing. Note that it might not be in the main page source, but instead in something that the page loads.
Option 1b: Ditto, but have ruby do it. If you're fetching the page with Net:HTTP you've got the tools to find the definition of __doPostBack already (the body as a string, ruby's grep, and the ability to request additional files, such as those in script tags).
Option 2: Monitor the traffic between a browser and the page (e.g. with a logging proxy) to find out what the URL is.
Option 3: Ask the owner of the web page.
Option 4: Guess. This may not be as bad as it sounds (e.g. if the original URL ends with "...?page=1" or something) but in general this is the least likely to work.
Edit (in response to your comment on the other question):
Assuming you're using the Net:HTTP library, you can do a postback by just replacing your get with a post, e.g. my_http.post(my_url) instead of my_http.get(my_url)
Edit (in response to danieltalsky's answer):
watir may be a really good solution for you (I'm kicking myself for not having thought of it), but be aware that you may have to manually fire the event or go through other hoops to get what you want. As a specific gotcha, with any asynchronous fetch like this you need to make sure that the full response has come back before you scrape it; that isn't a problem when you're doing the request inline yourself.
You will have to perform the postback. The data is pass with a form POST back to the server. Like Markus said use something like FireBug or the Developer Tools in IE 8 and fiddler to watch the traffic. But honestly this is a web form using the bloated GridView and you will be in for a fun adventure. ;)
You'll need to do some investigation in order to figure out what HTTP request the javascript execution is performing. I've used the Mozilla browser with the Firebug plugin and also the "Live HTTP Headers" plugin to help determine what is going on. It will likely become clear to you which requests you will need to make in order to traverse to the next page. Make sure you pay attention to any cookies getting set.
I've had really good success using Mechanize for scraping. It wraps all of the HTTP communication, html parsing and searching(using Nokogiri), redirection, and holding onto cookies. But it doesn't know how to execute Javascript, which is why you will need to figure out what http request to perform on your own.

Resources