How to read a file from the browser's cache? - caching

I have made a GIF file with more (hidden) information in it than just the picture data.
Like so:
<?php
// set variables
$naam = "gebruikersinformatie"; // Dutch for "user information"
$info['age'] = 27;
$info['number'] = '1234.56.789';
$info['name'] = 'Arie Noniem';
$info['unique_hash'] = base64_encode(implode("|", array($_SERVER['HTTP_USER_AGENT'], $_SERVER['HTTP_ACCEPT'], $_SERVER['REMOTE_ADDR'], $_SERVER['REMOTE_PORT'], $_SERVER['HTTP_ACCEPT_LANGUAGE'])));
// encode the information
$info = base64_encode(http_build_query($info));
// build the image: a 1x1 transparent GIF, with the encoded payload appended after the image data
header('Content-type: image/gif');
echo base64_decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7") . "\n" . $info;
?>
Now, I would like to check whether the file is already in the user's cache.
If not: make a new one (with the code above).
If it is in the cache: read that gif.gif file with PHP, as there is more information stored in it.
Question: how do I check whether the file is in the cache? How do I get the browser to cache images with PHP? (What I tried doesn't work correctly.)
And how do I read the cached file, so that PHP gets the real contents from the cached file?
Reason: to avoid the EU cookie law.
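For reference, a minimal sketch of the caching side of this: the script below (not the asker's code) sends caching headers so the browser stores the GIF and revalidates it with a conditional request on later visits; the max-age value and the ETag source are assumptions.
<?php
// gif.php - illustrative only: serve the GIF with caching headers so the browser
// stores it and revalidates with If-None-Match on later visits.
$etag = '"' . md5('some stable identifier for this user') . '"'; // assumed ETag source

header('Cache-Control: private, max-age=31536000'); // assumed one-year lifetime
header('ETag: ' . $etag);

// If the browser already has this version, answer 304 Not Modified and send no body.
if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && trim($_SERVER['HTTP_IF_NONE_MATCH']) === $etag) {
    http_response_code(304);
    exit;
}

header('Content-Type: image/gif');
echo base64_decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7");
Note that the server only ever sees the request headers (such as If-None-Match); PHP cannot read the file back out of the browser's cache, so any stored payload still has to be sent back by the browser in some request.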

I'm afraid you won't be avoiding EU Cookie Law.
Although it has commonly become known as the cookie law, it is a privacy directive whose principles apply to any technology where information is placed on, held on or read from the user's device. So compliance is also necessary for, for instance, Flash files (Local Shared Objects), tracking pixels (invisible one-pixel images typically used for tracking email opens) and other techniques too.
Just for reference, this answer is based on our experience in putting together ukcookieslaw.co.uk to deal specifically with the UK implementation of the EU Directive (noticing the Dutch in your code :-).
Assuming that, at its least privacy-invasive, your solution was doing the same job as a session cookie and providing a necessary function (like maintaining a log-in), one could argue your solution is actually less compliant, as a session cookie will usually be destroyed at the latest when the user quits the browser.
Your more obscured, difficult to inspect, deliberately hidden (I appreciate there's no malicious intent) payload can, and given that most people do not empty their cache each time they quit, will hang around for longer. In fact, in a way you're relying on that.
Without the details one can't take a view, but it may be that the information is more available to third parties, i.e. is there a possibility of caching of the image by intermediaries in the network that you would have to protect against?
You would still have to describe your use of personal data and rely on either implied or explicit consent for placing data on the user's device for your site to be compliant. The problem is that any consent must be INFORMED consent, and it would appear on the face of it that informing the user is the furthest thing from your mind.
I think you need a better reason for your engineering effort :-)
kind regards,
Philip

Related

How to check for an image update on a website?

How can I check, from another website, whether an image file on a website has changed, and then store both the new and the old version?
I'm using this to log the images on the server.
This is just a quick sketch of the simplest approach. If you want more detail on some part, just ask in the comments.
Sketch of solution
Download the image, compute a hash for it and store the image in file system and image ID + hash + file system path (and possibly other info such as time of request) in database.
When checking for an update, get the last available info for the same ID from the database; if the hashes are the same, the image was not updated. If you use a cryptographic hash like MD5 or SHA1 and the hash changed, it is almost certain that the image changed too.
Setup a cronjob to run the script periodically.
To download the image, you could use $img = file_get_contents($url);. MD5 can be computed via $hash = md5($img);, SHA1 via $hash = sha1($img);. For storing use file_put_contents($path, $img);.
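A minimal sketch of that flow, assuming an SQLite table for the metadata (the image URL, image ID, table name and columns are placeholders):
<?php
// Illustrative sketch: fetch the image, hash it, store the file and its metadata.
$url = 'http://example.com/image.png'; // placeholder source image
$imageId = 'logo';                     // placeholder identifier for this image

$img = file_get_contents($url);
if ($img === false) {
    exit("Download failed\n");
}
$hash = md5($img);

if (!is_dir(__DIR__ . '/images')) {
    mkdir(__DIR__ . '/images', 0777, true);
}
$path = __DIR__ . '/images/' . $imageId . '-' . date('YmdHis') . '.png';
file_put_contents($path, $img);

// Assumed schema: images(image_id, hash, path, fetched_at)
$db = new PDO('sqlite:' . __DIR__ . '/images.sqlite');
$db->exec('CREATE TABLE IF NOT EXISTS images (image_id TEXT, hash TEXT, path TEXT, fetched_at TEXT)');

$stmt = $db->prepare('SELECT hash FROM images WHERE image_id = ? ORDER BY fetched_at DESC LIMIT 1');
$stmt->execute(array($imageId));
$lastHash = $stmt->fetchColumn();

$changed = ($lastHash === false || $lastHash !== $hash);

$ins = $db->prepare('INSERT INTO images (image_id, hash, path, fetched_at) VALUES (?, ?, ?, ?)');
$ins->execute(array($imageId, $hash, $path, date('c')));

echo $changed ? "Image changed (or first fetch)\n" : "Image unchanged\n";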
Optimization
There are several ways to optimize the job.
To cut down on memory consumption, download the file directly to the file system using copy($url, $path); (which streams the data instead of holding it all in memory) and only after that compute the hash using $hash = md5_file($path); or $hash = sha1_file($path);. This is better for larger images. The bad thing is that you have to read the data back from the file system to compute the hash, so I think it would be slower.
Side note: Never optimize anything before you know that it really makes the code better. Always measure before, after and compare.
Another optimization could be made to save on data transfers from the remote server, if the server sends reliable caching headers. ETag is the best one because it should be based on the contents of the file: if it does not change, the file should be the same. If you just want to check the headers, use $headers = get_headers($url, 1);. To fetch really just the headers, you should issue the HTTP request via the HEAD method instead of GET. Check the get_headers() manual for more info. To check the headers while also getting the response body, use file_get_contents() along with the $http_response_header special variable.
Issuing requests indicating that you cached the image on last visit (via If-Modified-Since et al.) could serve the same purpose.
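A hedged sketch of the header check (the URL and stored ETag are placeholders; passing a stream context to get_headers() needs PHP 7.1+):
<?php
// Illustrative: fetch only the headers with a HEAD request and compare ETags.
$url = 'http://example.com/image.png'; // placeholder URL
$storedEtag = '"abc123"';              // assumed value saved from a previous check

$context = stream_context_create(['http' => ['method' => 'HEAD']]);
$headers = get_headers($url, true, $context); // true => associative array
if ($headers === false) {
    exit("HEAD request failed\n");
}

$etag = $headers['ETag'] ?? null; // header names come back as the server sent them

if ($etag !== null && $etag === $storedEtag) {
    echo "Server reports the same ETag; skip the download.\n";
} else {
    echo "No usable ETag, or it changed; fall back to downloading and hashing.\n";
}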
Etiquette and legal aspects
I told you how. Now I’ll tell you when (not).
Do not abuse the remote server. Remember that its owner has expenses to keep it up and running and definitely does not want it tied up by your scripts for more than a negligible amount of time, or transferring much data. Adapt your polling period to the type of target server and the size of the image. Adapting it to the estimated frequency of change is not a bad idea either.
Be sure to have the consent of the image's rights holder when storing a copy of it. Licensing can be a messy thing. Be careful, otherwise you can get into trouble.
If you plan to crawl for the images in some way, the robots.txt standard might be of interest to you. That file could tell you that you are not welcome, and you should respect it.
Related questions
Some are more related, some less. People mostly want to watch HTML pages, which has its own specifics; that is also why I did not flag this question as a duplicate of one of these.
https://stackoverflow.com/q/11336431/2157640
https://stackoverflow.com/q/11182825/2157640
https://stackoverflow.com/q/13398512/2157640
https://stackoverflow.com/q/15207145/2157640
https://stackoverflow.com/q/1494488/2157640

html form post/get data security

Okay, this question isn't exactly very clear, because I can't write it as a single question.
I have a game that I am designing using JavaScript, and it is basically a multiplayer game.
So say there are two players, darklord and angel.
angel shoots darklord,
so darklord loses 1 life.
Now what happens is that I use AJAX to submit the amount of life that darklord loses.
And the request is GET /shootout.php?shooter=darklord&life=-1
so this allows me to store darklord's new life total.
Now the problem is, say angel knows about computers, and he starts requesting /shootout.php?shooter=darklord&life=-3
Thus darklord loses more life than he should have. So angel cheated in the game.
Now I want to prevent this kind of request, and I am trying to find a way to hide my requests. I mean, I know I can encrypt the URL. So say I encrypted it such that the request should be GET /enc.php?e=934ufj30jf for darklord to lose a life, with different values of e for angel to lose a life, or gain a point. However, for this to work I will need to send the data to the client, i.e. tell the JavaScript to request this URL.
Now the user can easily go and read the source of the file to find out which requests do what.
I have found and thought of many other ways, but they all only limit the amount of cheating or affect the game-play, etc. None of them eliminates this security problem completely.
So now my question is: how do I make sure that users don't send data that is not real? How do I stop them from cheating?
The best way I have thought of is to use server-side scripts to actually calculate whether it is possible for one player to shoot another and then match that against the client input, but that will affect execution time by a LOT, so I am trying to find other ways. Some public-key encryption?? (The problem is the user can set the data as they want and then encrypt it.) Tokens? (The problem is the user can set the data as they want and then attach the current token.)
So, any other ideas, anyone?
This isn't about hiding requests, it's about implementing proper access controls. Your example is referred to as an insecure direct object reference in that manipulating values in the querystring relating to direct DB objects causes an unintended outcome (have a look at OWASP Top 10 for .NET developers part 4: Insecure direct object reference).
There are a couple of things you can do, but the most important is implementing proper access controls. You must authenticate the caller of the service and authorise them to perform the requested activity (and this all has to happen on the server). In this case, angel should not be able to perform an action on behalf of darklord.
The other thing you can do is use an indirect object reference map (refer to the link above), which obfuscates the IDs of the player with cryptographically strong, user-specific alternatives. You probably don't need this in addition to the access controls but it does give you more unpredictability.
Finally, think about the flip-side as well - if darklord is able to pass the amount of damage as a parameter, what's to stop him from re-issuing the request manually with "life=-100"? It will depend on the specifics of how the attack action is performed, but you're going to want to avoid people gaming this action too.
You have to assume that the user is completely in control of the client JavaScript. The only way to make this secure is to do the check on the server side.
You should not send the result of the action; you should send the action itself.
I.e. "angel shoots darklord from point (7,15) at an angle of 36 degrees".
The server then checks whether that is a valid shot and decreases darklord's life.
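A minimal sketch of what that server-side check might look like in PHP (the session fields and the game-state helpers are assumptions for illustration, not a definitive implementation):
<?php
// shootout.php - illustrative only: the client reports the *action* (where it shot
// from and at what angle); the server decides whether it hit and how much damage it did.
session_start();

if (!isset($_SESSION['player_id'])) {
    http_response_code(403);
    exit;
}

$shooter = $_SESSION['player_id']; // never taken from the request itself
$x     = isset($_POST['x']) ? (float) $_POST['x'] : 0.0;
$y     = isset($_POST['y']) ? (float) $_POST['y'] : 0.0;
$angle = isset($_POST['angle']) ? (float) $_POST['angle'] : 0.0;

// findTargetOnLine(), canShoot() and applyDamage() are hypothetical game-state
// helpers; they would live in your own code.
$target = findTargetOnLine($x, $y, $angle);
if ($target !== null && canShoot($shooter, $target)) {
    applyDamage($target, 1); // the server fixes the damage; the client never states it
    echo json_encode(['hit' => true, 'target' => $target]);
} else {
    echo json_encode(['hit' => false]);
}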
There was an excellent answer given on this subject a couple of years ago. It actually refers to Flash rather than JavaScript, but the security concerns and techniques are going to be applicable to this situation too.
What is the best way to stop people hacking the PHP-based highscore table of a Flash game
You should never have the client tell the server what changes to make in player state (eg. remove X amount of health) because the client could always be cheating. Instead only have the client tell the server what input the player has made and then the server determines what happens as a result of that input.
Although this doesn't remove the possibility of cheating by writing a bot that plays the game automatically (and is better at the game than any human player) you at least remove overt cheating of the "I did 10,000 damage, trust me" variety.
Detecting bots is best done by tracking behavioral data and doing data mining to find cheaters. And if there is no behavioral difference between bots and human players, then who cares about the bots.

AJAX every form element?

Is it better practice to AJAX every form element separately (e.g. send a request onChange, etc.) or to collect all the data and then submit it with one click of Save?
Essentially, auto-save or user-initiated-save?
I would generally say that a user-initiated save is the way to go for most web-applications. If for nothing else, this is how users are used to interacting with web apps; familiarity and ease of use is extremely important in web applications. Not to mention it can cut down on unnecessary traffic.
This is not to say that auto-saving does not have its place, but often it causes unnecessary traffic. For example, if I am auto-saving a contact form, fill out my name, then my email, then go back to the name to change it, that is already 3 requests that have been sent with no benefit - extra work for no added advantage.
Once again, I think it has a lot to do with your application and where you are planning to use it. Inline edits are a case where auto-saving is often used, and there I think it is useful, whereas for a contact form or signup form it would not be a good idea.
I'd say that depends on the nature of your application and whether "auto-save" is a behaviour desired by your users.
"User initiated save" is what a user would expect from their experience with web forms nowadays - I would not deviate from that unless there's a good reason.
It depends on the following factors:
What kind of data are you trying to save? E.g. is it okay to save the data partially, or do you need to save it all at once?
How much data do you want to save? If you have many fields, you might want to send the data in chunks (in the case of wizards) or save everything at once.
It is also a good idea to have data saved (in the background) in a temporary way for large forms, if the user may take a long time to fill in the data (e.g. emails saved as drafts); see the sketch after this list.
It also depends on your web app and the way you have designed your forms. In some forms you may allow certain fields to be modified and saved in place, so that you can fetch additional data, for example.
In most cases it would be good to have an explicit "Save" action for your data forms.
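For the draft-style background save mentioned in the list, a minimal sketch of a server-side endpoint (the session field, file name and storage location are assumptions):
<?php
// save_draft.php - illustrative: store a partially filled form as a per-user draft.
session_start();

if (!isset($_SESSION['user_id'])) {
    http_response_code(403);
    exit;
}

// Whatever fields the form has so far; nothing is treated as "final" yet.
$draft = json_encode($_POST);

// Assumed storage: one draft file per user; a database table would work just as well.
$path = sys_get_temp_dir() . '/draft_' . (int) $_SESSION['user_id'] . '.json';
file_put_contents($path, $draft);

header('Content-Type: application/json');
echo json_encode(['saved' => true, 'at' => date('c')]);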

Techniques to reduce data harvesting from AJAX/JSON services

I was wondering if anyone had come across any techniques to reduce the chances of data exposed through JSON type services on the server (intended to supply AJAX functions) from being harvested by external agents.
It seems to me that the problem is not so difficult if you had say a Flash client consuming the data. Then you could send encrypted data to the client, which would know how to decrypt it. The same method seems impossible with AJAX though, due to the open nature of the Javascript source.
Has anybody implemented a clever technique here?
Whatever the method, it should still allow a genuine AJAX function to consume the data.
Note that I'm not really talking about protecting 'sensitive' information here, the odd record leaking out is not a problem. Rather I am thinking about stopping a situation where the whole DB is hoovered up by bots (either in one go, or gradually over time).
Thanks.
First, I would like to be clear on this:
"It seems to me that the problem is not so difficult if you had say a Flash client consuming the data. Then you could send encrypted data to the client, which would know how to decrypt it. The same method seems impossible with AJAX though, due to the open nature of the Javascript source."
It will be pretty obvious that the information is being sent encrypted to the Flash client, and it won't be that hard for an attacker to find out from your compiled Flash program what is being used to decrypt it, replicate that, and get all the data.
If the data does happen to have the value you are thinking of, you can count on the above.
If this is public information, embrace that & don't combat it - instead find ways to capitalize on it.
If this is information that you are only exposing to a set of users, make sure you have the corresponding authentication / secure communication. Track usage as others have said, and have measures in place that act on it.
The first thing to prevent bots from stealing your data is not technological, it's legal. First, make sure you have the right language in your site's Terms of Use that what you're trying to prevent is actually disallowed and defensible from a legal standpoint. Second, make sure you design your technical strategy with legal issues in mind. For example, in the US, if you put data behind an authentication barrier and an attacker steals it, it's likely a violation of the DMCA law. Third, find a lawyer who can advise you on IP and DMCA issues... nice folks on StackOverflow aren't enough. :-)
Now, about the technology:
A reasonable solution is to require that users be authenticated before they can get access to your sensitive Ajax calls. This allows you to simply monitor per-user usage of your Ajax calls and (manually or automatically) cancel the account of any user who makes too many requests in a particular time period (or too many total requests, if you're trying to defend against a trickle approach).
This approach of course is vulnerable to sophisticated bots who automatically sign up new "users", but with a reasonably good CAPTCHA implementation, it's quite hard to build this kind of bot. (see "circumvention" section at http://en.wikipedia.org/wiki/CAPTCHA)
If you are trying to protect public data (no authentication) then your options are much more limited. As other answers noted, you can try IP-address-based limits (and run afoul of large corporate proxy users) but sophisticated attackers can get around this by distributing the load. There's also likely sophisticated software which watches things like request timing, request patterns, etc. and tries to spot bots. Poker sites, for example, spend a lot of time on this. But don't expect these kinds of systems to be cheap. One easy thing you can do is to mine your web logs (e.g. using Splunk) and find the top N IP addresses hitting your site, and then do a reverse-IP lookup on them. Some will be legitimate corporate or ISP proxies. But if you recognize a competitor's domain name among the list, you can block their domain or follow up with your lawyers.
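As a small illustration of that last step, assuming you have already pulled the top IP addresses out of your logs (the addresses below are placeholders):
<?php
// Illustrative: reverse-resolve the heaviest hitters pulled out of your web logs.
$topIps = ['203.0.113.7', '198.51.100.23']; // assumed output of your log mining

foreach ($topIps as $ip) {
    $host = gethostbyaddr($ip); // returns the hostname, or the IP itself if the lookup fails
    echo $ip . ' => ' . $host . PHP_EOL;
}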
In addition to pre-theft defense, you might also want to think about inserting a "honey pot": deliberately fake information that you can track later. This is how, for example, map manufacturers catch plagiarism: they insert a fake street in their maps and see which other maps show the same fake street. While this doesn't prevent determined folks from sucking out all your data, it does let you find out later who's re-using your data. This can be done by embedding unique text strings in your text output, and then searching for those strings on Google later (assuming your data is re-usable on another public website). If your data is HTML or images, you can include an image which points back to your site, and you can track who is downloading it, and look for patterns you can use to bust the freeloaders.
Note that the javascript encryption approach noted in one of the other answers won't work for non-authenticated sessions-- an attacker can simply download the javascript and run it just like a regular browser would. Moral of the story: public data is essentially indefensible. If you want to keep data protected, put it behind an authentication barrier.
This is obvious, but if your data is publicly searchable by search engines, you'll both need a non-AJAX solution for them (Google won't read your ajax data!) and you'll want to mark those pages NOARCHIVE so your data doesn't show up in Google's cache. You'll also probably want a white list of search engine crawler IP addresses which you allow onto your search-engine-crawlable pages (you can work with Google, Bing, Yahoo, etc. to get these), otherwise malicious bots could simply impersonate Google and get your data.
In conclusion, I want to echo #kdgregory above: make sure that the threat is real enough that it's worth the effort required. Many companies overestimate the interest that other people (both legitimate customers and nefarious actors) have in their business. It might be that yours is an oddball case where you have particularly important data, it's particularly valuable to obtain, it must be publicly accessible without authentication, and your legal recourses will be limited if someone steals your data. But all those together is admittedly an unusual case.
P.S. - another way to think about this problem which may or may not apply in your case. Sometimes it's easier to change how your data works which obviates securing it. For example, can you tie your data in some way to a service on your site so that the data isn't very useful unless it's being used in conjunction with your code. Or can you embed advertising in it, so that wherever it's shown you get paid? And so on. I don't know if any of these mitigations apply to your case, but many businesses have found ways to give stuff away for free on the Internet (and encourage rather than prevent wide re-distribution) and still make money, so a hybrid free/pay strategy may (or may not) be possible in your case.
If you have an internal Memcached box, you could consider using a technique where you create an entry for each IP that hits your server with an hour expiration. Then increment that value each time the IP hits your AJAX endpoint. If the value gets over a particular threshold, fry the connection. If the value expires in Memcached, you know it isn't getting "hoovered away".
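A hedged sketch of that counter using the PHP Memcached extension (the server address and the threshold are assumptions):
<?php
// Illustrative per-IP counter: one key per client IP, expiring after an hour.
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211); // assumed internal Memcached box

$ip        = $_SERVER['REMOTE_ADDR'];
$key       = 'ajax_hits_' . $ip;
$threshold = 1000;                          // assumed hourly limit

// add() only succeeds if the key does not exist yet; it sets the 1-hour expiry.
$memcached->add($key, 0, 3600);
$hits = $memcached->increment($key);

if ($hits !== false && $hits > $threshold) {
    http_response_code(429);                // too many requests: "fry the connection"
    exit;
}

// ...otherwise continue serving the AJAX response as normal.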
This isn't a concrete answer with a proof of concept, but maybe a starting point for you. You could create a javascript function that provides encryption/decryption functions. The javascript would need to be built dynamically, and you would include an encryption key that is unique to the session. On the server side, you'd have an encryption service that uses the key from the session to encrypt your JSON before delivering it.
This would at least prevent someone from listening to your web traffic, pulling information out of your database.
I'm with kdgregory though, it sounds like your data is too open.
Some techniques are listed in Further thoughts on hindering screen scraping.
If you use PHP, Bad behavior is a nice tool to help. If you don't use PHP, it can give some ideas on how to filter (see How it works page).
Incredibill's blog is giving nice tips, lists of User-agents/IP ranges to block, etc...
Here are a variety of suggestions:
Issue tokens required for redemption along with each AJAX request, and expire the tokens (see the sketch after this list).
Track how many queries are coming from each client, and throttle excessive usage based on expected normal usage of your site.
Look for patterns in usage, such as sequential queries, spikes in requests, or queries that occur faster than a human could conduct them.
Check user agents. Many bots don't completely replicate the user-agent info of a browser, and you can eliminate programmatic scraping of your data using this method.
Change the front-end component of your website to redirect to a CAPTCHA (or some other human-verifying mechanism) once a request threshold is exceeded.
Modify your logic so the response data is returned in a few different ways, to complicate the code required to parse it.
Obfuscate your client-side JavaScript.
Block the IPs of offending clients.
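For the token suggestion in the first item, a minimal sketch assuming PHP sessions (the expiry window and parameter name are arbitrary):
<?php
// Illustrative token check for an AJAX endpoint: the page embedding the AJAX call
// is given a short-lived token, and the endpoint redeems it exactly once.
session_start();

// Called by the script that renders the page containing the AJAX call.
function issueToken() {
    $token = bin2hex(random_bytes(16));
    $_SESSION['ajax_token'] = $token;
    $_SESSION['ajax_token_expires'] = time() + 300; // assumed 5-minute window
    return $token; // embed this in the page, e.g. in a JavaScript variable
}

// Called by the AJAX endpoint before it returns any data.
function redeemToken($token) {
    $valid = isset($_SESSION['ajax_token'], $_SESSION['ajax_token_expires'])
        && hash_equals($_SESSION['ajax_token'], (string) $token)
        && time() < $_SESSION['ajax_token_expires'];

    // One-shot: a redeemed (or rejected) token cannot be replayed.
    unset($_SESSION['ajax_token'], $_SESSION['ajax_token_expires']);
    return $valid;
}

// In the endpoint itself:
if (!redeemToken(isset($_GET['token']) ? $_GET['token'] : null)) {
    http_response_code(403);
    exit;
}
// ...otherwise build and return the JSON data as usual.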
Bots usually don't parse JavaScript, so your AJAX code won't be instantly executed. And even if they do, bots usually don't maintain sessions/cookies either. Knowing that, you could reject the request if it is invoked without a valid session/cookie (which is obviously set on the server side beforehand by the request on the parent page).
This does not protect you from the human hazard though. The safest way is to restrict access to users with a login/password. If that is not your intent, well, then you have to live with the fact that it's a public application. You could of course scan logs and maintain blacklists of IP addresses and user agents, but that is going to extremes.

JSONP and Cross-Domain queries - How to Update/Manipulate instead of just read

So I'm reading The Art & Science of Javascript, which is a good book, and it has a good section on JSONP. I've been reading all I can about it today, and even looking through every question here on StackOverflow. JSONP is a great idea, but it only seems to resolve the "Same Origin Problem" for getting data; it doesn't address it for changing data.
Did I just miss all the blogs that talked about this, or is JSONP not the solution I was hoping for?
JSONP results in a SCRIPT tag being generated that points to another server, with any parameters that might be required, as a GET request. e.g.
<script src="http://myserver.com/getjson?customer=232&callback=jsonp543354" type="text/javascript">
</script>
There is technically nothing to stop this sort of request from altering data on the server, e.g. by specifying newName=Tony. Your response could then indicate whether the update succeeded or not. You will be limited by whatever you can fit in a query string. If you go with this approach, add some random element as a parameter so that proxies won't cache it.
Some people may consider that this goes against the way GETs are supposed to work, i.e. they shouldn't cause data to change.
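A minimal sketch of what such an endpoint could look like on the PHP side (the newName parameter comes from the example above; the persistence helper is a hypothetical stand-in):
<?php
// getjson.php - illustrative JSONP endpoint that also applies an update.
// The callback name must be sanitised, or this becomes a script-injection hole.
$callback = 'jsonpCallback';
if (isset($_GET['callback']) && preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $_GET['callback'])) {
    $callback = $_GET['callback'];
}

$customer = isset($_GET['customer']) ? (int) $_GET['customer'] : 0;
$newName  = isset($_GET['newName']) ? $_GET['newName'] : null; // the update parameter from the example above

// Hypothetical stand-in for your real data layer.
function updateCustomerName($customerId, $name) {
    return $customerId > 0 && $name !== '';
}

$ok = ($customer > 0 && $newName !== null) && updateCustomerName($customer, $newName);

header('Content-Type: application/javascript');
echo $callback . '(' . json_encode(['updated' => $ok]) . ');';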
Yes, and honestly I would like to stick to that paradigm. However, I might bend the rule and say that requests which do not alter or deal with CRUCIAL data will be accessible via GET calls... hm...
For instance, I am building a shopping cart system, and I think that allowing the adding/removing/etc. of items to/from a cart could very easily be exposed via GETs, since even though you can change data, you cannot do anything critical with it. If someone maliciously added 1,000 flat-screen monitors to your shopping cart, there would be at least one verification step that would NOT be vulnerable to any attacks (a standard ASP.NET page at that point, with verification and all that jazz).
Is this a good/workable solution in anyone's opinion?
