Reload Document into Google Docs Viewer (Clear Cache)

Google Docs Viewer (http://docs.google.com/viewer) creates a cache of a document after the first viewing. To see what I mean, try the following:
Upload file.pdf to your server (e.g., http://example.com).
Visit http://docs.google.com/viewer?url=http://example.com/file.pdf
Upload a new file to replace file.pdf (but use the same name).
Revisit http://docs.google.com/viewer?url=http://example.com/file.pdf.
Google Docs Viewer still shows the old file.pdf.
Anyone know how to correct this?
(I have already tried clearing the browser cache, switching browsers, and logging in with a different Google account to view the link.)

It appears there is no way to clear the cache, although in my experience Google tends to do it automatically about once a day.

Maybe if you append a dynamic query string parameter to the file URL, the cache will not be used.
ex: http://docs.google.com/viewer?url=http://example.com/file.pdf?time=3454354

I added ?time=0 and it seemed to work.
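A small PHP sketch of generating such a cache-busting viewer link (a sketch, assuming you build the link server-side; the time value only needs to change whenever the file changes):

    <?php
    // Make the target URL unique so the viewer cannot reuse its cached copy.
    $fileUrl   = 'http://example.com/file.pdf?time=' . time();
    // Percent-encode the whole thing before handing it to the viewer.
    $viewerUrl = 'http://docs.google.com/viewer?url=' . urlencode($fileUrl);
    echo $viewerUrl, PHP_EOL;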

Related

Copying an offline forum's webpages from the Google cache to a new forum

Please first consider the following scenario:
I had a public forum online. Its entire database was accidentally lost, and I am not able to recover it from anywhere. The only solution left is to get it from cached resources on the web.
I want to know: is there a way to copy the webpages of my forum from the Google cache and put them directly onto my other, newly created forum? It must be possible, but I don't know how. Can anyone help me?
The other question: if I manage to recover all the pages onto my hard disk using Warrick, how is it possible to create a DB from those pages?
Please help me out; I shall be very grateful.
It's not possible to recreate a database from Google's cached version of your site without manually re-entering every single user (and all of their profile information) and rewriting every single post from scratch.

Google Docs Viewer - File Request Timeout

I'm working on a Joomla website, which has a set of documents that need to be displayed using the Google Docs viewer.
Only authenticated users are supposed to reach the files, but a file can also be accessed through a direct path like http://www.example.com/files/somefile.pdf, even without authentication.
So when I try to view a file through the Google Viewer with a link like this:
http://docs.google.com/viewer?url=http://www.example.com/files/somefile.pdf
Files smaller than 100 KB are viewable; for all the rest, an error message is displayed:
Sorry, it took too long to find the document at the original source. Please try again later.
You can also try to download the original document by clicking here.
So I'm not sure whether this is something to do with the Google Docs Viewer, Joomla, or a server-side request timeout.
How can I make every file, irrespective of size, viewable with Google Docs?
If it's PDF only, you can also just use pdf.js from Mozilla directly. Then you should check your URL encoding. If the issue remains, check out https://code.google.com/p/google-api-php-client/ for converting your docs in place. Opening them with pdf.js is still recommended to bypass Google Docs Viewer problems; at least, that is how I got this working properly.
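For the pdf.js route, a hedged sketch of the embed (paths are hypothetical and assume a self-hosted copy of pdf.js unpacked under /pdfjs/ on the same origin as the files):

    <?php
    // Embed the PDF with pdf.js's bundled viewer page instead of the Google viewer.
    // viewer.html ships with the pdf.js distribution; its file parameter must be
    // percent-encoded -- the same encoding check applies to the Google viewer URL.
    $docUrl = '/files/somefile.pdf'; // same-origin path from the question
    printf('<iframe src="/pdfjs/web/viewer.html?file=%s" width="100%%" height="600"></iframe>',
        urlencode($docUrl));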

How to clear Linkedin Share cache?

I have a new description on the page, but when I share the page LinkedIn is still using the old description, which no longer exists. I am after something similar to Facebook Lint.
Any ideas?
You can append a dummy query string value to your URL to make it look like a new URL, and LinkedIn will fetch it again. I've tried it and it works.
For example:
https://www.codeproof.com/?refid=LinkedIn
where refid=LinkedIn is just a dummy value.
If your URL already contains a query string, just append "&refid=LinkedIn" to the end of the URL.
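That logic as a small PHP helper (a sketch; the parameter name refid is arbitrary):

    <?php
    // Append a dummy parameter so LinkedIn sees a "new" URL and re-crawls it.
    function addCacheBuster(string $url, string $param = 'refid', string $value = 'LinkedIn'): string
    {
        $separator = (strpos($url, '?') === false) ? '?' : '&';
        return $url . $separator . $param . '=' . urlencode($value);
    }

    echo addCacheBuster('https://www.codeproof.com/');   // ...?refid=LinkedIn
    echo addCacheBuster('https://example.com/p?id=7');   // ...&refid=LinkedIn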
Unfortunately, appending a query string to the URL no longer works.
From the following StackOverflow post:
LinkedIn's content cache presently stores website information for approximately 7 days before the crawler will revisit the site.
It looks like there is no instant way to clear the cache; you have to wait seven days, remove the media, and re-add it.
Appending a query string to the URL no longer works, so you'll have to wait 7 days.
But if you really need to share your URL with the media you want, you'll have to go with a custom API call.
From the LinkedIn developer docs:
The first time that LinkedIn's crawlers visit a webpage when asked to share content via a URL, the data it finds (Open Graph values or our own analysis) will be cached for a period of approximately 7 days.
This means that if you subsequently change the article's description, upload a new image, fix a typo in the title, etc., you will not see the change represented during any subsequent attempts to share the page until the cache has expired and the crawler is forced to revisit the page to retrieve fresh content.
If you make API calls that directly provide the content to be shared rather than by a URL that requires analysis, LinkedIn will always use the values you provide.
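In other words, the cache only bites when LinkedIn has to crawl a URL. A hedged PHP/cURL sketch of such a direct-content share against LinkedIn's v2 UGC Post endpoint follows; the endpoint is real, but the exact payload field names are from memory of the public docs, and the author URN and access token are placeholders, so verify against the current API reference before relying on it:

    <?php
    // Share by providing title/description in the request body itself,
    // so LinkedIn uses these values instead of its 7-day URL cache.
    $payload = [
        'author'          => 'urn:li:person:YOUR_MEMBER_ID',   // placeholder
        'lifecycleState'  => 'PUBLISHED',
        'specificContent' => [
            'com.linkedin.ugc.ShareContent' => [
                'shareCommentary'    => ['text' => 'Fresh description, no stale cache.'],
                'shareMediaCategory' => 'ARTICLE',
                'media' => [[
                    'status'      => 'READY',
                    'originalUrl' => 'https://example.com/page',
                    'title'       => ['text' => 'Updated title'],
                    'description' => ['text' => 'Updated description'],
                ]],
            ],
        ],
        'visibility' => ['com.linkedin.ugc.MemberNetworkVisibility' => 'PUBLIC'],
    ];

    $ch = curl_init('https://api.linkedin.com/v2/ugcPosts');
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => json_encode($payload),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => [
            'Authorization: Bearer YOUR_ACCESS_TOKEN',   // placeholder
            'Content-Type: application/json',
            'X-Restli-Protocol-Version: 2.0.0',
        ],
    ]);
    echo curl_exec($ch);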
Step 1: Visit https://www.linkedin.com/post-inspector/
Step 2: Enter your URL and click Inspect; you will see the updated preview image
Step 3: Now try sharing your URL on LinkedIn
I've just found a way to force LinkedIn to fetch a fresh version of the page: just create a redirect to your destination page and share the redirect page.
For example:
If the page that you want to share is http://stackoverflow.com,
create a redirect for a page, e.g. https://stackoverflow.com/share-li, that goes to http://stackoverflow.com,
and then share https://stackoverflow.com/share-li on LinkedIn. This way LinkedIn will think it's a new page and will fetch a fresh version.
It's easy to do if you're using WordPress; just install a redirection plugin like this one, for example: https://wordpress.org/plugins/redirection/
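Outside WordPress, the redirect page can be a one-liner. A minimal PHP sketch, using the /share-li path from the example above:

    <?php
    // share-li.php -- share THIS URL on LinkedIn instead of the real page.
    // LinkedIn treats it as a new URL and crawls the target afresh.
    header('Location: https://stackoverflow.com/', true, 302);
    exit;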
For WordPress, these steps worked for me:
On the home page, I removed the featured image and added it as a simple image in the header of the page.
I created a redirect page on my blog (e.g., mydomain.com/social) that redirects all requests to my blog (mydomain.com).
Share the blog again on the social networks and everything will be OK.
It's done =D
Unfortunately, there is none as of now. We are investigating what it would take to expose a similar feature. Please stay tuned. We'll announce it on the developer site at http://developer.linkedin.com.

Rewrite URL à la Google Instant?

I have an e-commerce website built with Ajax and JS. When the user types a search keyword, the list is pulled via Ajax, but the browser URL, in my case, doesn't change, so if the user reloads or simply bookmarks the address, he'll have to start from scratch, losing the keyword input.
I noticed Google, instead, rewrites the URL with the complete query, with no hashtag or complex workaround... apparently.
How can I achieve that? Consider that I have complete control over my server, so I can configure my Apache in any way I want.
Thanks!!
See this question; it's almost the same, except they used Facebook as an example.
How does facebook rewrite the source URL of a page in the browser address bar?
If you watch the URL in Google Instant, it doesn't change until you hit "Search" or pause for a set period of time (2 seconds, I think).
After this delay, Google refreshes the page with those search queries.
I'm not sure what browser you're using, but I get all the search terms after a hashtag in Chrome (e.g., http://www.google.com/#sclient=psy&hl=en&q=test+test+sibilance&aq=3&...). I don't think what you think is occurring is actually happening. It could be done on Chrome and other HTML5 browsers using history.pushState(), but I don't see Google Instant using that method.
Then it is not instant. Without reloading the page you can only change the fragment identifier in the URL.
My experience is that after you change the search, the Google URL is no longer "correct", i.e., it does not represent the latest query.

Content Watermarking

We have members-only paid content that is frequently copied and republished without our permission.
We are trying to 'watermark' our content by including each customer's user ID in a fake CSS class, for example <p class='userid_1234'> (except not so obvious, of course :). That would help us track the source of the copying; we place that class somewhere in the article body.
The problem is, by including user-specific information into an article, it makes it so that the article content is ineligible for caching because it is now unique to each user.
This bumps the page load time from ~.8ms to ~2.5sec for each article page view.
Does anyone know of any watermarking strategies that can still be used with caching?
Alternatively, what can be done to speed up database access? (Ha ha, that's just a tiny topic, I'm sure...)
We're using the CMS Expression Engine, but I'd like to hear about any strategies. They don't have to be EE-specific.
If you're talking about images then you could use PHP to add a watermark to the images.
How can I add an image onto an image in PHP like a watermark
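A minimal GD sketch of that idea (file names are hypothetical; assumes PHP's GD extension is enabled):

    <?php
    // Stamp a PNG watermark onto the bottom-right corner of a JPEG photo.
    $photo  = imagecreatefromjpeg('article-photo.jpg');  // hypothetical source image
    $stamp  = imagecreatefrompng('watermark.png');       // hypothetical watermark
    $margin = 10;
    imagecopy(
        $photo, $stamp,
        imagesx($photo) - imagesx($stamp) - $margin,     // destination x
        imagesy($photo) - imagesy($stamp) - $margin,     // destination y
        0, 0, imagesx($stamp), imagesy($stamp)
    );
    imagejpeg($photo, 'article-photo-watermarked.jpg');
    imagedestroy($photo);
    imagedestroy($stamp);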
It's a tool to help track down the lazy copiers who just copy the source code as-is. This is not preventative, nor is it a deterrent. – Ian
Going by your comment above, you are happy with users copying your content, just not without the formatting etc. So what you could do is provide users with an embed-style snippet of code for that particular content, just like YouTube does with videos. Into that embed code you could add your own links back to your site, utilize your own CSS, etc.
That way you can still allow members to use the content, but it will always come out the way you intended, with links back to your site.
Thanks
You could always cache a version that uses a special string, like #!username!#, and then later fill it in with PHP based on which user is viewing it.
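A minimal sketch of that placeholder approach, assuming the cached article HTML sits in a file and a session already identifies the viewer (paths are hypothetical):

    <?php
    session_start();
    // Serve one shared cached copy, substituting the per-user marker at request time.
    $cached = file_get_contents('/var/cache/articles/1234.html'); // hypothetical cache path
    $userId = (int) ($_SESSION['user_id'] ?? 0);                  // assumes an existing login session
    echo str_replace('#!username!#', 'userid_' . $userId, $cached);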
Another way, I believe, is to switch from caching on the server to letting the browser cache the page locally for a little while. That way it is only cached per user, and it reduces the calls to your database. Because an article is pretty static, you could just let the local computer cache it and pull in comments via JavaScript.
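The browser-side variant boils down to response headers. A sketch (the lifetime is arbitrary):

    <?php
    // Let each visitor's browser keep a private copy for 10 minutes instead of
    // caching a shared copy on the server; Vary keeps proxies from mixing users.
    header('Cache-Control: private, max-age=600');
    header('Vary: Cookie');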
This last one is probably not what you are really looking for, but I'm going to come out and say it anyway: you could stop treating your users like thieves and instead treat the thieves as thieves. Go to the person hosting the servers your content is on and send them an email telling them that copyrighted premium content is being hosted on their servers without your permission. You can even automate that process.
How do you find out what sites are posting your content? Put a link to your site in the body content, and do a Google Search/Blog Search for articles linking to that site. To automate it, use Google Blog Search, because it offers RSS feeds. Anything that has a link back to your site could go into a database with a link to the page; someone could look at it, and if it is the entire article, do a Whois lookup and send them an email.
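A rough PHP sketch of that automation, assuming some blog-search service exposes an RSS feed for a link query (the feed URL is hypothetical):

    <?php
    // Scan an RSS feed of pages linking back to our site and log candidates for review.
    $feedUrl = 'https://blogsearch.example.com/feeds?q=link:example.com&output=rss'; // hypothetical
    $feed    = simplexml_load_file($feedUrl);
    if ($feed !== false) {
        foreach ($feed->channel->item as $item) {
            // A real system would insert these into a review queue instead of printing.
            printf("%s\n%s\n\n", $item->title, $item->link);
        }
    }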
What makes you think adding CSS to something is going to stop people from copying it without that CSS? It's more likely that they are just copying the source of the content you are showing them and ignoring all the styling around it. For example, I use Tamper Data to look at all HTTP requests made by Firefox; if I can see it on the page, I can see it in the logs. Even with all the "protection" some sites try to put in place, it generally never works. I can grab what I want without using any screen capture/recording.
If you were serving FLVs, for example, I would easily be able to grab the source of those even if you overlaid them with some CSS. I think the best approach would be to find the sites publishing your premium content and ask them to remove it. It's either that or watermark the actual content on the fly while sending it to the browser.
