Prevent direct-linking to .zip files - download

I'd like to prevent direct linking (hotlinking) to the .zip files I offer for download on my website.
I've been reading posts for hours now, but I'm not sure which method is best. A plain PHP check doesn't seem safe, and the referrer that .htaccess rules check can be empty, etc.
Which method do you guys use or would suggest?
Cheers

See: http://www.alistapart.com/articles/hotlinking/
and: http://www.webmasterworld.com/forum92/2787.htm

Referrer checking is one option, but as you noted they can be empty or spoofed.
Another possibility is to set a cookie when someone visits normal pages on your site, and check for it when the person tries to download the zip file. This can be circumvented (e.g. by the hot-linker embedding an appropriate cookie-setting page as a 1x1 image alongside the hot link), but it's less likely they'll figure it out. It will also exclude people who block cookies, of course.
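A minimal sketch of that idea, assuming a download.php gatekeeper (the cookie name and paths are illustrative, not from the original suggestion):

<?php
// On your normal pages: mark the visitor (cookie name is an assumption).
setcookie('site_visitor', '1', time() + 3600, '/');

// In download.php: refuse requests that don't carry the cookie.
if (empty($_COOKIE['site_visitor'])) {
    header('HTTP/1.1 403 Forbidden');
    exit('Please use the download page on our site.');
}
header('Content-Type: application/zip');
readfile('/path/outside/webroot/file.zip'); // keep the real file out of the docroot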
Another possibility is to generate limited-time-access URLs on the download page, something along the lines of http://example.com/download.php?file=file.zip&code=some-random-string-here. The link would only be usable for a small number of downloads and/or a short period of time, after which it would no longer function.
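A sketch of that approach in PHP; the expires parameter, the secret, and the paths are assumptions layered on top of the suggestion above:

<?php
$secret = 'change-me'; // server-side secret; all names in this sketch are illustrative

// On the download page: build a link that works for ten minutes.
function makeDownloadUrl($file, $secret) {
    $expires = time() + 600;
    $code = hash_hmac('sha256', $file . '|' . $expires, $secret);
    return '/download.php?file=' . urlencode($file) . '&expires=' . $expires . '&code=' . $code;
}

// In download.php: recompute the code and check the expiry before serving.
$file    = basename($_GET['file']); // basename() blocks ../ path traversal
$expires = (int) $_GET['expires'];
$given   = isset($_GET['code']) ? (string) $_GET['code'] : '';
$valid   = hash_hmac('sha256', $file . '|' . $expires, $secret);
if ($expires < time() || !hash_equals($valid, $given)) {
    header('HTTP/1.1 403 Forbidden');
    exit('Link expired or invalid.');
}
header('Content-Type: application/zip');
readfile('/files/outside/webroot/' . $file);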

Related

Check on which pages an image is used?

Is there a certain way to check which pages on a website use a specific image?
Say I have some image which I don't use on a page anymore, so I'd like to delete it from my server. But I'm not entirely sure whether it's still being used on other pages. Is there a way to check whether it's still being shown somewhere on the site?
You can hook your website up to Google Webmaster Tools and wait; after a while, 404 errors will appear there. This way you can track unused resources and dead ends.
This includes images.
There is a better way if you have direct access to the web server.
Visit every page of your website, or let Google crawl it.
You can then sort the files by last-accessed time; the ones that haven't been accessed recently are not in use.
You have to make sure the images are actually fetched from the pages, so use a history-less, cache-less browser session.
How do you sort files by timestamp in Unix?
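On the shell, ls -ltu lists files by access time. The same idea as a PHP sketch (the path is an assumption, and note that some servers mount filesystems with atime recording disabled, in which case this won't work):

<?php
// List files under images/ by last-accessed time, oldest first. After a full
// cache-less crawl of the site, files whose atime predates the crawl were never requested.
$files = glob('images/*');
usort($files, function ($a, $b) { return fileatime($a) - fileatime($b); });
foreach ($files as $f) {
    echo date('Y-m-d H:i', fileatime($f)) . '  ' . $f . "\n";
}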

Getting images from a URL to a directory

Well, straight to the point, I want to put a URL, and get all the images inside this URL, for example
www.blablabla.com/images
in this images folder I want to get all the images... I already know how to get an image from a specific URL, but I don't know how to get all of them without knowing each exact path. Is there a way to get a list of all the items under a URL path, or something like that?
Well, basically, this can't be done. Well, not under normal circumstances anyway. The problem is that you don't know what files are in that directory.
...unless the server has "directory listing" on. This is considered a security weakness, so the chance of that isn't too high. (The idea is that you are exposing details about your server that you don't have to; while that is no problem on its own, it might make things that could be a security problem known to the world.)
This means that if the server is yours, you can turn directory listing on, or if the server happens to have it turned on, you can visit the URL (www.blablabla.com/images) and see a listing of all the files in that directory. The listing doesn't always look exactly the same, but in general you will get an HTML page with links to all the files in the directory. All you need to do then is retrieve that page and parse the links, ending up with the URLs of the images you want.
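A sketch of that retrieve-and-parse step, assuming directory listing is on and allow_url_fopen is enabled (the URL is from the question; the target folder is illustrative):

<?php
$url  = 'http://www.blablabla.com/images/';
$html = file_get_contents($url); // the listing page itself

$doc = new DOMDocument();
@$doc->loadHTML($html); // listing markup is often sloppy; suppress parse warnings

foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    if (preg_match('/\.(jpe?g|png|gif)$/i', $href)) { // keep only image links
        $name = basename($href);
        copy($url . $name, 'downloads/' . $name);    // save each image locally
    }
}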
If the server is yours, I would recommend at least looking into any other options you might have. One such option could be to make a script that provides all the urls instead of relying on directory listing. This does not have some of the more unfortunate implications that directory listing has (like showing non-images that happen to be in the same directory) and can be more flexible.
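For example, such a script could be as small as this (the directory and extensions are assumptions); the client then fetches this JSON and downloads each URL it lists, with no need for directory listing:

<?php
// list-images.php: expose the image URLs without enabling directory listing.
header('Content-Type: application/json');
$urls = array();
foreach (glob(__DIR__ . '/images/*.{jpg,jpeg,png,gif}', GLOB_BRACE) as $path) {
    $urls[] = '/images/' . basename($path);
}
echo json_encode($urls);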
Another way to do this might be to use a protocol different from HTTP like FTP, SFTP or SCP. These protocols do not have the same flexibility as a script, but they do have even more safety as they easily allow you to restrict access to both the directory listing and your images to only people with correct login details (or private keys). (Of course, if such a protocol is available for your use and it's not your own server, you could use them as well.)

How can I set up custom ImageResizer urls?

I'm just getting started with ImageResizer and I'm stuck on what seem like totally basic questions:
I have an uploader that I use to put images into a directory that's not directly accessible over HTTP. (If I just put an image at, say, /images/myimage.jpg, then anyone could access it by simply asking for it, whereas I want to limit access via thumbnails, watermarks, etc.) So I want to put it at /offlimits/myimage.jpg, but be able to serve it up at /public/images/myimage.jpg.
I don't really want to dump all the images in the same offlimits folder, because putting lots of files in one folder makes Windows unhappy. But I don't want to expose the details of that subdirectory structure either, so where do I put the mapping between the public facing url and the actual image location?
Most generally, I don't necessarily want an image extension at all, so I'd like to say /public/image_id?width=100... and have this map to /offlimits/sub1/sub2/sub3/image_id.jpg.
Can anyone advise about how to set this up?
Three-part questions are generally frowned upon here at SO, but I'll bite anyhow :)
If you're allowing access to images based on authentication, then you need to use ASP.NET's URL Authorization feature. ImageResizer supports URL Authorization rules. If you just don't want the source files available, and want to force them resized or watermarked, read the docs on how to implement arbitrary rules like this.
You can rewrite image paths to your heart's content with Config.Current.Rewrite, which works just like the PostRewrite event mentioned earlier. Just remember you'll have to keep it all straight in your head later.
Image extensions are good things. Don't fight them. They let the server figure out the right mime-type to send and help errant browsers recover from related bugs. They prevent issues on several platforms and make the Save As dialog work. They also significantly improve server efficiency, since the handling logic doesn't have to wait as long. This is particularly relevant because of the design of the IIS/ASP.NET module system.

client-side file caching

If I understand correctly, a browser caches images, JS files, etc. based on their URL (effectively the file name). So there's a danger that if such a file is updated on the server, the browser will keep using a stale cached copy.
A workaround for this problem is to rename all files (as part of the build) such that each file name includes an MD5 hash of its contents, e.g.
foo.js -> foo_AS577688BC87654.js
me.png -> me_32126A88BC3456BB.png
However, in addition to renaming the files themselves, all references to these files must be changed. For example, a tag such as <img src="me.png"/> should be changed to <img src="me_32126A88BC3456BB.png"/>.
Obviously this can get pretty complicated, particularly when you consider that references to these files may be dynamically created within server-side code.
Of course, one solution is to completely disable caching on the browser (and any caches between the server and the browser) using HTTP headers. However, having no caching will create its own set of problems.
Is there a better solution?
Thanks,
Don
The best solution seems to be to version filenames by appending the last-modified time.
You can do it this way: add a rewrite rule to your Apache configuration, like so:
RewriteRule ^(.+)\.(.+)\.(js|css|jpg|png|gif)$ $1.$3
This rewrites any "versioned" URL to the "normal" one internally (no redirect is sent to the browser). The idea is to keep your filenames the same but still benefit from the cache. The alternative of appending a query-string parameter to the URL is not optimal, because some proxies don't cache URLs with query strings.
Then, instead of writing:
<img src="image.png" />
Just call a PHP function:
<img src="<?php versionFile('image.png'); ?>" />
With versionFile() looking like this:
function versionFile($file) {
    // Last-modified time of the file on disk ($file is root-relative, e.g. "/image.png").
    $mtime = filemtime($_SERVER['DOCUMENT_ROOT'] . $file);
    $path  = pathinfo($file);
    // Insert the timestamp before the extension: /image.png -> /image.1234567890.png
    echo $path['dirname'] . '/' . $path['filename'] . '.' . $mtime . '.' . $path['extension'];
}
And that's it! The browser will ask for image.1234567890.png, Apache will rewrite that to image.png, so you benefit from the cache in all cases without out-of-date issues, and you never have to bother with versioning the files on disk.
You can see a detailed explanation of this technique here: http://particletree.com/notebook/automatically-version-your-css-and-javascript-files/
Why not just add a querystring "version" number and update the version each time?
foo.js -> foo.js?version=5
There still is a bit of work during the build to update the version numbers but filenames don't need to change.
Renaming your resources is the way to go, although we use a build number and embed that into the file name instead of an MD5 hash
foo.js -> foo.123.js
as it means that all your resources can be renamed in a deterministic fashion and resolved at runtime.
We then use custom controls to generate links to resources on page load, based on the build number, which is stored in an app setting.
We followed a similar pattern to PJP, using Rails and Nginx.
We wanted user avatar images to be browser cached, but on an avatar's change we needed the cache to be invalidated ASAP.
We added a method to the avatar model to append a timestamp to the file name:
return "/images/#{sourcedir}/#{user.login}-#{self.updated_at.to_s(:flat_string)}.png"
In all places in the code where avatars were used, we referenced this method rather than a URL. In the Nginx configuration, we added this rewrite:
rewrite "^/images/avatars/(.+)-[\d]{12}.png" /images/avatars/$1.png;
rewrite "^/images/small-avatars/(.+)-[\d]{12}.png" /images/small-avatars/$1.png;
This meant if a file changed, its URL in the HTML changed, so the user's browser made a new request for the file. When the request reached Nginx, it got rewritten to the simple name of the file.
I would suggest using ETag-based caching in this situation; see http://en.wikipedia.org/wiki/HTTP_ETag. You can then use the hash as the ETag. A request will still be submitted for each resource, but the browser will only download items that have changed since the last download.
Read up on your web server / platform docs on how to use etags properly, most decent platforms have built-in support.
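In case your platform doesn't handle it for you, a minimal PHP sketch of the hash-as-ETag idea (the file path is illustrative):

<?php
$file = 'static/foo.js';
$etag = '"' . md5_file($file) . '"'; // content hash as the ETag, per the answer above

header('ETag: ' . $etag);
if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && trim($_SERVER['HTTP_IF_NONE_MATCH']) === $etag) {
    header('HTTP/1.1 304 Not Modified'); // client copy is current; send no body
    exit;
}
header('Content-Type: application/javascript');
readfile($file);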
Most modern browsers send an If-Modified-Since header when re-requesting a cacheable resource, letting the server answer 304 Not Modified instead of resending the file. However, not all browsers support If-Modified-Since.
There are three common ways to "force" the browser to refresh a cached resource.
Option 1: Create a query string with a version number, e.g. src="script.js?ver=21". The downside is that many proxy servers won't cache a resource with a query string. It also requires site-wide updating for changes.
Option 2: Create a naming scheme for your files, e.g. src="script083010.js". As with option 1, the downside is that this also requires site-wide updates whenever a file changes.
Option 3: Perhaps the most elegant solution: simply set up the caching headers Last-Modified and Expires on your server (a sketch follows below). The main downside is that users may have to re-download resources that expired yet never changed. Additionally, the Last-Modified header does not work well when content is being served from multiple servers.
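A sketch of option 3 in PHP, in case your server isn't already handling it for static files (the path and lifetime are assumptions):

<?php
$file  = 'static/script.js';
$mtime = filemtime($file);

header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 86400) . ' GMT'); // cache for a day

// Answer the conditional request the browser sends on revisits.
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
    strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $mtime) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}
header('Content-Type: application/javascript');
readfile($file);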
Here are a few resources to check out: Yahoo, Google, and AskApache.com.
This is really only an issue if your web server sets a far-future "Expires" header (setting something like ExpiresDefault "access plus 10 years" in your Apache config). Otherwise, a browser will make a conditional GET, based on the modified time and/or the Etag. You can verify what is happening on your site by using a web proxy or an extension like Firebug (on the Net panel). Your question doesn't mention how your web server is configured, and what headers it is sending with static files.
If you're not setting a far-future Expires header, there's nothing special you need to do. Your web server will usually handle conditional GETs for static files based on last modified time just fine. If you are setting a far-future Expires header then yes, you need to add some sort of version to the file name like your question and the other answers have mentioned already.
I have also been thinking about this for a site I support where it would be a big job to change all references. I have two ideas:
1.
Set distant cache expiry headers and apply the changes you suggest for the most commonly downloaded files. For other files, set the headers so they expire after a very short time, e.g. 10 minutes. Then, if you have 10 minutes of downtime when updating the application, caches will have been refreshed by the time users return to the site. General site navigation should also improve, as the files only need downloading every 10 minutes, not on every click.
2.
Deploy each new version of the application to a different context that contains the version number, e.g. www.site.com/app_2_6_0/. I'm not really sure about this one, as users' bookmarks would be broken on each update.
I believe that a combination of solutions works best:
Set cache expiry times for each type of resource (image, page, etc.) appropriately for that resource. For example:
Your static "About", "Contact", etc. pages probably aren't going to change more than a few times a year, so you could easily put a cache time of a month on them.
Images used in these pages could have eternal cache times, as you are more likely to replace an image than to change one.
Avatar images might have an expiry time of a day.
Some resources need modified dates in their names, for example avatars, generated images, and the like.
Some things should never be cached: new pages, user content, etc. In these cases you should cache on the server, but never on the client side.
In the end you need to carefully consider each type of resource to determine what cache time to instruct the browser to use, and always be conservative if you are unsure. You can increase the time later, but it's much more painful to un-cache something.
You might want to check out the approach taken by the grails "uiperformance" plugin, which you can find here. It does a lot of the things you mention, but automates them (set expiry time to a long time, then increments version numbers when files change).
So if you're using grails, you get this stuff for free. If you are not - maybe you can borrow the techniques employed.
Also, borrowed from the ui-performance page: read the following 14 rules.
ETags seemingly provide a solution for this...
As per http://httpd.apache.org/docs/2.0/mod/core.html#fileetag, we can have Apache generate ETags based on file size (instead of time/inode/etc.). The generated ETag should then be constant across multiple server deployments.
Just enable it in (/etc/apache2/apache2.conf)
FileETag Size
& you should be good!
That way, you can simply reference your images as <img src='/path/to/foo.png' /> and still use all the goodness of HTTP caching.

Content Water Marking

We have members-only paid content that is frequently copied and republished without our permission.
We are trying to ‘watermark’ our content by including each customer’s user id in a fake CSS class, for example <p class='userid_1234'> (except not so obvious, of course :), which would help us track the source of the copying; we place that class somewhere in the article body.
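(For illustration only, one way to make the class less obvious is to derive it from the user id with a keyed hash; the key and the naming scheme in this sketch are assumptions:)

<?php
// Derive an innocuous-looking class name from the customer's user id, so the
// marker can't be spotted or stripped by pattern-matching.
function watermarkClass($userId, $key) {
    return 'c' . substr(hash_hmac('sha256', (string) $userId, $key), 0, 8);
}
// In the template, emit watermarkClass($userId, $key) as the paragraph's class.
// To trace a leaked copy, recompute the hash for each customer id until one
// matches the class found in the republished article.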
The problem is, by including user-specific information into an article, it makes it so that the article content is ineligible for caching because it is now unique to each user.
This bumps the page load time from ~.8ms to ~2.5sec for each article page view.
Does anyone know of any watermarking strategies that can still be used with caching?
Alternatively, what can be done to speed up database access? (Ha, ha, that's just a tiny topic, I'm sure...)
We're using the CMS Expression Engine, but I'd like to hear about any strategies. They don't have to be EE-specific.
If you're talking about images then you could use PHP to add a watermark to the images.
How can I add an image onto an image in PHP like a watermark
It's a tool to help track down the lazy copiers who just copy the source code as-is. This is not preventative, nor is it a deterrent. – Ian
Going by your comment above, you don't mind users copying your content, just not without the formatting etc. So what you could do is provide users an embed-style snippet of code for that particular content, just like YouTube does with videos. Into that embed code you could add your own links back to your site, use your own CSS, etc.
That way you can still allow the members to use the content but it will always come out the way you intended it with links back to your site.
Thanks
You could always cache a version that uses a special string, like #!username!#, and then later fill it in with PHP based on which user is viewing it.
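A sketch of that fill-in step (the cache path and the user lookup are assumptions):

<?php
// Serve the shared cached article, substituting the per-user marker at request time.
$article = file_get_contents('/var/cache/articles/1234.html'); // cached once for everyone
$article = str_replace('#!username!#', 'userid_' . (int) $currentUserId, $article);
echo $article;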
Another option is to switch from caching on the server to letting the browser cache the page locally for a little while. That way it is cached per user, and it still reduces the calls to your database. Because an article is pretty static, you could let the local machine cache it and pull in the comments via JavaScript.
This last one is probably not one you are really looking for, but I'm gonna come out and say it anyway. You could not treat your users like thieves, and instead treat the thieves as thieves. Go to the person hosting the servers your content is on and send them an email telling them copyrighted premium content is being hosted on their servers without your permission. You can even automate that process.
How do you find out what sites are posting your content? Put a link back to your site in the body content, and do a Google Search/Blog Search for articles linking to it. To automate it, use Google Blog Search, because it offers RSS feeds. Any page with a link back to your site could go into a database; someone could review it, and if it reproduces the entire article, do a whois lookup and send the host an email.
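A rough sketch of that automation; the feed URL here is a stand-in for whatever blog-search RSS feed you use, and the review step is left as a simple print-out:

<?php
// Poll a blog-search RSS feed for pages linking back to our site ($feedUrl is hypothetical).
$feedUrl = 'http://blogsearch.example.com/rss?q=link:yoursite.com';
$feed = simplexml_load_file($feedUrl);
if ($feed !== false) {
    foreach ($feed->channel->item as $item) {
        // Queue each candidate page for manual review (and a later whois/takedown email).
        printf("%s\n%s\n\n", $item->title, $item->link);
    }
}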
What makes you think adding CSS to something is going to stop people from copying it without that CSS? It's more likely that they are just copying the source of the content you are showing them and ignoring all the styling around it. For example, I use Tamper Data to look at all HTTP requests made by Firefox; if I can see it on the page, I can see it in the logs. Even with all the "protection" some sites try to put in place, it generally never works. I can grab what I want without using any screen capture/recording.
If you were serving FLVs, for example, I would easily be able to grab the source of that even if you overlaid it with some CSS. I think the best approach would be to contact the sites publishing your premium content and ask them to remove it. It's either that or watermark the actual content on the fly while sending it to the browser.
