Is it possible to retrieve an old cached version of a webpage, older than the most recent cache provided by Google Search? Something like a history of a webpage?
Google's cached link only provides their most recent cache. Is there a way to get to older versions, either via Google or maybe another similar website?
You can use the Wayback Machine (archive.org) for that, or a cache viewer website like cachearchive.com.
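If you want to script this, archive.org also exposes a Wayback Machine availability endpoint (https://archive.org/wayback/available) that returns the archived snapshot closest to a given date. A minimal Ruby sketch; the URL and timestamp below are just examples:

    require "net/http"
    require "json"
    require "uri"

    # Ask the Wayback Machine for the snapshot of `url` closest to `timestamp`
    # (YYYYMMDD). Returns the snapshot URL, or nil if nothing is archived.
    def closest_snapshot(url, timestamp)
      query = URI.encode_www_form(url: url, timestamp: timestamp)
      api   = URI("https://archive.org/wayback/available?#{query}")
      data  = JSON.parse(Net::HTTP.get(api))
      data.dig("archived_snapshots", "closest", "url")
    end

    puts closest_snapshot("example.com", "20100101")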
For example, I search for “JetCommitTransaction”; the search finds https://msdn.microsoft.com/en-us/library/gg269191(v=exchg.10).aspx, which redirects to https://learn.microsoft.com/en-us/previous-versions/
That API is available in all versions of Windows from Win2000 up to the very latest Win10, and is even available to Windows 10 UWP apps (https://learn.microsoft.com/en-us/uwp/win32-and-com/win32-apis#apis-from-esentdll), so it isn't deprecated or anything.
archive.org works but it's slow and inconvenient.
Microsoft is moving all documentation from msdn.microsoft.com to learn.microsoft.com, so I assume this will work itself out after a while, but this is the first time I have seen a completely broken redirect.
I think Google cache and archive.org are your best options for now. You could also try raising it via #docsmsft or a GitHub issue.
We are working on moving all MSDN documentation to learn.microsoft.com. Some of the redirects were accidentally deployed early; those have been rolled back until the migration is complete. In the meantime, the documentation is accessible on MSDN at the links you specified above.
Using the Google Drive SDK, I retrieve the thumbnailLink property for a Google document and then use it to download the generated image, which I cache on a file server. However, I often get a thumbnail of an older version of my document; it could be a version cached by Google Drive.
This thumbnail link has this form:
https://docs.google.com/...&sz=s220
You can get different thumbnail sizes based on the sz argument. The interesting thing is that I see different versions of the thumbnail (older or newer renderings of my document) depending on the value of the sz argument.
Is there a way to get a fresh thumbnail when a Google document has been updated?
There does appear to be some caching going on with the thumbnail URLs, such that certain ranges of sizes are cached together. In my experience those caches do expire however, although not in a way you can completely control (as you've noted requesting a different size can sometimes cause a cache miss). The team is working on changing the thumbnail serving to a new backend that should resolve the issue, but I don't have a timeline for that change.
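Building on the observation above that requesting a different size can sometimes cause a cache miss, here is a hedged Ruby sketch that rewrites the sz argument before downloading the thumbnail. This is a workaround based on observed behaviour, not a documented API; thumbnail_link is assumed to be the value returned by the Drive SDK for the file.

    require "net/http"
    require "uri"

    # Download a Drive thumbnail, rewriting the sz argument (e.g. s220 -> s221)
    # to try to land on a fresher cache entry. Note: Net::HTTP.get_response does
    # not follow redirects, which the thumbnail URL may issue.
    def fetch_thumbnail(thumbnail_link, size: 221)
      uri = URI(thumbnail_link.sub(/sz=s\d+/, "sz=s#{size}"))
      res = Net::HTTP.get_response(uri)
      raise "unexpected response: #{res.code}" unless res.is_a?(Net::HTTPSuccess)
      res.body # raw image bytes
    end

    # File.binwrite("thumb.png", fetch_thumbnail(thumbnail_link))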
I have a requirement to store various bits of browser information such as brand (e.g. IE, Chrome), model, browser version (e.g. 7.0.0.0), OS version (e.g. Windows 7, OS X, Linux), Flash version info, etc. For mobile detection I'm using WURFL, which uses the user agent string and has great support for mobile devices, but not so much for desktop web browsers. I'm using the web patch with WURFL, but to make it useful I would have to add my own override patch to provide some of the items listed above. Is this the best way to do this, or has anyone found a library more suited to this kind of task? If WURFL is the best way, is there an updated and maintained web patch that's more comprehensive than the one provided on the WURFL site?
Based on information received directly from ScientiaMobile, the next minor release of the API (estimated at a couple of weeks from the date of this post) will improve the quality of web browser detection, and they are considering including the web patch in the main repository, removing the need for a separate patch file.
I am using Ajax / jQuery on a webpage I am designing. In order for it to function, I include (at the top of my page) the JavaScript at http://code.jquery.com/jquery-1.4.4.js
This works great and all, but I fear that:
1) the code might get changed without me knowing, and I'd then encounter problems and spend hours or days debugging before finding that the code at this site had changed
2) the website might no longer exist, or that specific file might no longer be hosted, years from now
So would it be safer to save that JavaScript file onto my server and access it from there?
You should use either a Microsoft or Google CDN. It will be much faster, it will be cached for a lot of your users and it's guaranteed to be there, as opposed to the jQuery link you include.
http://code.jquery.com is jQuery's CDN (provided by Media Temple). The code at http://code.jquery.com/jquery-1.4.4.js will never change; if anything needs to change, jQuery will release a new version, which will be at a different URL (this happens all the time; version 1.5b was released today).
The jQuery team know what they're doing, and they set up a CDN so people can easily link to jQuery. They're just as (un)likely to bring down their CDN as Google and Microsoft are to bring down theirs.
See http://docs.jquery.com/Downloading_jQuery for more information.
Having said that, the Google-hosted version (http://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js) seems to be referenced by more websites; this gives your users a small performance advantage, as the file has a better chance of already being cached.
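As an aside (a common pattern, not something the answers above prescribe): you can reference the Google-hosted copy and fall back to a file on your own server if the CDN is ever unreachable. The local path below is just a placeholder.

    <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js"></script>
    <script>
      // If the CDN request failed, window.jQuery is undefined; load a local copy.
      // "/js/jquery-1.4.4.min.js" is an example path on your own server.
      window.jQuery || document.write('<script src="/js/jquery-1.4.4.min.js"><\/script>');
    </script>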
It's safe; notice the version number? As jQuery is updated, that version number will change.
Of course, using a CDN always means it's possible for the content delivery network to go out of business, but that's the case with any server you don't directly control.
You could of course use the Google CDN for jQuery; I highly recommend it.
Relevant:
http://code.google.com/apis/libraries/devguide.html#jquery
What is the best solution to programmatically take a snapshot of a webpage?
The situation is this: I would like to crawl a bunch of webpages and take thumbnail snapshots of them periodically, say once every few months, without having to visit each one manually. I would also like to be able to take JPG/PNG snapshots of websites that might be entirely Flash/Flex, so I'd have to wait until the page loaded before somehow taking the snapshot.
It would be nice if there was no limit to the number of thumbnails I could generate (within reason, say 1000 per day).
Any ideas how to do this in Ruby? Seems pretty tough.
Browsers to do this in: Safari or Firefox, preferably Safari.
Thanks so much.
This really depends on your operating system. What you need is a way to hook into a web browser and save its rendering of the page as an image.
If you are on a Mac, I would imagine your best bet would be to use MacRuby (or RubyCocoa, although I believe that is going to be deprecated in the near future) and then use the WebKit framework to load the page and render it as an image.
This is definitely possible; for inspiration you may wish to look at the Paparazzi! and webkit2png projects.
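If you go the webkit2png route, one simple approach is to shell out to it from a Ruby script, one URL at a time. This is only a hedged sketch: the URLs are examples, and the output file names and any sizing flags you may want should be checked against webkit2png's own help, since they are not spelled out here.

    # Run webkit2png for each URL; it writes PNG thumbnails into the current
    # directory. Check `webkit2png --help` for flags controlling size/output.
    urls = %w[http://example.com http://example.org]
    urls.each do |url|
      ok = system("webkit2png", url)
      warn "webkit2png failed for #{url}" unless ok
    end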
Another option, which isn't dependent on the OS, might be to use the BrowserShots API.
There is no built-in library in Ruby for rendering a web page.
Using Selenium and Ruby is one possibility. You can run Firefox as a headless browser (i.e. on a server).
Here is the source code for Browsershots: http://sourceforge.net/projects/browsershots/files/
If you are using Linux, you could use http://khtml2png.sourceforge.net/ and script it via Ruby.
Some paid services that automate this:
http://webthumb.bluga.net/home
http://www.thumbalizr.com
As viewed by... IE? Firefox? Opera? One of the myriad WebKit engines?
If only it were possible to automate http://browsershots.org :)
Use selenium-rc; it comes with snapshot capabilities.
With JRuby you can use SWT's browser library.