AJAX and forms? - ajax

I have some forms that communicate with server using AJAX for real reasons: cascade combos, suggestions, multiple correlated selections (e.g. I have {elementary} knowledge of {French} [add], and {good} knowledge of {German} [add]...). I also have some regular fields that I handle trough get.
Thing is that once I've made connection to server-side, it would be easier to me to push all data that way. Is it going too far? What about if I have no reason for AJAX in the first place, and I still use it for pushing form data? I would feel obligated to provide fallback for people with javascript off, but most of the underlying logic would be the same, so it doesn't seam to me as an much of an overhead. It is a kind of data I would push through post anyway, so I'm not losing get parameters that would be good for something.
Any reason not to do things this way?

User experience is an important part of any software product. If you can improve the experience by providing better interactions, there's no reason not to do it.
Make sure though that you write unobtrusive and degradable javascript, so users with screen-readers or javascript-disabled browsers can still complete the interactions.

The only problem with this strategy is that you're in a lot deeper trouble if someone decides they want a non-javascript solution as well. I think it's fairly wise to use the "least fancy" mechanism that will get the desired result on the web. If it's just a form post, then keep it a form post unless there's a reason to do otherwise.

Related

In a Web API, isn't providing a DELETE ALL too dangerous?

In studying Web API design (regardless of the specific technology), I often come across these two uses of the DELETE verb:
DELETE /SomeResource/123 /* deletes entity with ID 123 */
DELETE /SomeResource/ /* deletes all entities */
I always get the feeling that there's something wrong with providing the latter as an operation in most applications. In rare cases where the resource is trivial, sure, why not just blow the whole collection away without a second thought? And yes, I understand that typically it's the client app's job to present an "Are you sure?" confirmation. But I like to envision my API being driven in a safe way even by some low-level agent like Fiddler.
So is there some mechanism I'm missing, like a way for the server/API to initiate some kind of dialog with the client agent to get confirmation before blowing away 10,000 customer records?
EDIT
Yes, assume that I do want to provide the functionality to remove all entities in a given scenario, but feel compelled to avoid it because of the perceived danger (typos, not paying attention, etc; the things that a UI confirmation dialog is for)
You'll NEVER present the option to do a delete all function in a web api.
While there may be exceptions, its usually an awful idea to do this because it allows a user to perform an irreversible action. On the other hand, think of how awful it would be to have someone with ill intent find this endpoint!
Most libraries such as ASP.NET will require each method to be implemented explicitly before accepting a request (this is handled in ASP.NET during routing). So, if you don't provide DELETE /someresource/ the problem will never come up. This is particularly great, because you can't accidentally blow away data with a typo.
If you really, really want this functionality, you'll want to be 100% sure you have a reason to have it because this is a dangerous, dangerous endpoint to have.
Bottom line? I wouldn't provide it to protect your data.

Which is better: {REST API, website} --> {database}, or {website} --> {REST API} --> {database}?

I have a product that gathers and displays measurements of all kinds (won't go into it). The display portion is, as one would expect, a database + website built on top of it (with Symfony).
However, we'll probably be creating an API to expose the data to third-parties as well.
Now, we either have the choice of building both the website and the API on top of the database, or just build the API on top, and have the website implement the API.
I would greatly prefer the latter, since otherwise I'll have to adapt both model layers for the API and the website every time the schema changes (which can be a few times).
If I have the latter I obviously have the advantage of only adapting the API model. If the API contract stays the same, the website wouldn't need adapting.
However, obviously there is a downside in performance.
With website <-> database, vs website <-> API <-> database, the first will obviously be the fastest.
My question is: what is your opinion on this trade-off?
I'm hoping the performance can be almost evened out, since all the machines will be on the same LAN + there will be caching. If that's the case, the ease of development would certainly make my life easier :-)
Looking forward to your opinions and experience!
If there was ever a case of premature optimization, this is it! You're not going to know the answer without more information, and I suspect very much that the performance differences between the two will be so negligible as to be irrelevant in your domain.
The best approach, IMO, is to spike on a few of your models using both approaches and see where that gets you.
No better way to make sure your API is going to be usable by others than to use it yourself. I would go website -> API -> database. Write it once, you can always tune it and "cheat" later if you have too.
Many modern websites use JavaScript (AJAX etc) and then make service calls to an API. If you took that approach you would simply have a carefully designed, reusable API layer in front of your DB.
I find that there's little or no extra effort here, and I'm sceptical that you'll incur noticable performance penalties.

Techniques to reduce data harvesting from AJAX/JSON services

I was wondering if anyone had come across any techniques to reduce the chances of data exposed through JSON type services on the server (intended to supply AJAX functions) from being harvested by external agents.
It seems to me that the problem is not so difficult if you had say a Flash client consuming the data. Then you could send encrypted data to the client, which would know how to decrypt it. The same method seems impossible with AJAX though, due to the open nature of the Javascript source.
Has anybody implemented a clever technique here?
Whatever the method, it should still allow a genuine AJAX function to consume the data.
Note that I'm not really talking about protecting 'sensitive' information here, the odd record leaking out is not a problem. Rather I am thinking about stopping a situation where the whole DB is hoovered up by bots (either in one go, or gradually over time).
Thanks.
First, I would like to clear on this:
It seems to me that the problem is not
so difficult if you had say a Flash
client consuming the data. Then you
could send encrypted data to the
client, which would know how to
decrypt it. The same method seems
impossible with AJAX though, due to
the open nature of the Javascrip
source.
It will be pretty obvious the information is being sent encrypted to the flash client & it won't be that hard for the attacker to find out from your flash compiled program what's being used for this - replicate & get all that data.
If the data does happens to have the value you are thinking, you can count on the above.
If this is public information, embrace that & don't combat it - instead find ways to capitalize on it.
If this is information that you are only exposing to a set of users, make sure you have the corresponding authentication / secure communication. Track usage as others have said, and have measures that act on it,
The first thing to prevent bots from stealing your data is not technological, it's legal. First, make sure you have the right language in your site's Terms of Use that what you're trying to prevent is actually disallowed and defensible from a legal standpoint. Second, make sure you design your technical strategy with legal issues in mind. For example, in the US, if you put data behind an authentication barrier and an attacker steals it, it's likely a violation of the DMCA law. Third, find a lawyer who can advise you on IP and DMCA issues... nice folks on StackOverflow aren't enough. :-)
Now, about the technology:
A reasonable solution is to require that users be authenticated before they can get access to your sensitive Ajax calls. This allows you to simply monitor per-user usage of your Ajax calls and (manually or automatically) cancel the account of any user who makes too many requests in a particular time period. (or too many total requests, if you're trying to defend against a trickle approach).
This approach of course is vulnerable to sophisticated bots who automatically sign up new "users", but with a reasonably good CAPTCHA implementation, it's quite hard to build this kind of bot. (see "circumvention" section at http://en.wikipedia.org/wiki/CAPTCHA)
If you are trying to protect public data (no authentication) then your options are much more limited. As other answers noted, you can try IP-address-based limits (and run afoul of large corporate proxy users) but sophisticated attackers can get around this by distributing the load. There's also likley sophisticated software which watches things like request timing, request patterns, etc. and tries to spot bots. Poker sites, for example, spend a lot of time on this. But don't expect these kinds of systems to be cheap. One easy thing you can do is to mine your web logs (e.g. using Splunk) and find the top N IP addresses hitting your site, and then do a reverse-IP lookup on them. Some will be legitimate corporate or ISP proxies. But if you recognize a compeitor's domain name among the list, you can block their domain or follow up with your lawyers.
In addition to pre-theft defense, you might also want to think about inserting a "honey pot": deliberately fake information that you can track later. This is how, for example, maps manufacturers catch plaigarism: they insert a fake street in their maps and see which other maps show the same fake street. While this doesn't prevent determined folks from sucking out all your data, it does let you find out later who's re-using your data. This can be done by embedding unique text strings in your text output, and then searching for those strings on Google later (assuming your data is re-usable on another public website). If your data is HTML or images, you can include an image which points back to your site, and you can track who is downloading it, and look for patterns you can use to bust the freeloaders.
Note that the javascript encryption approach noted in one of the other answers won't work for non-authenticated sessions-- an attacker can simply download the javascript and run it just like a regular browser would. Moral of the story: public data is essentially indefensible. If you want to keep data protected, put it behind an authentication barrier.
This is obvious, but if your data is publicly searchable by search engines, you'll both need a non-AJAX solution for them (Google won't read your ajax data!) and you'll want to mark those pages NOARCHIVE so your data doesn't show up in Google's cache. You'll also probably want a white list of search engine crawler IP addreses which you allow into your search-engine-crawlable pages (you can work with Google, Bing, Yahoo, etc. to get these), otherwise malicious bots could simply impersonate Google and get your data.
In conclusion, I want to echo #kdgregory above: make sure that the threat is real enough that it's worth the effort required. Many companies overestimate the interest that other people (both legitimate customers and nefarious actors) have in their business. It might be that yours is an oddball case where you have particularly important data, it's particularly valuable to obtain, it must be publicly accessible without authentication, and your legal recourses will be limited if someone steals your data. But all those together is admittedly an unusual case.
P.S. - another way to think about this problem which may or may not apply in your case. Sometimes it's easier to change how your data works which obviates securing it. For example, can you tie your data in some way to a service on your site so that the data isn't very useful unless it's being used in conjunction with your code. Or can you embed advertising in it, so that wherever it's shown you get paid? And so on. I don't know if any of these mitigations apply to your case, but many businesses have found ways to give stuff away for free on the Internet (and encourage rather than prevent wide re-distribution) and still make money, so a hybrid free/pay strategy may (or may not) be possible in your case.
If you have an internal Memcached box, you could consider using a technique where you create an entry for each IP that hits your server with an hour expiration. Then increment that value each time the IP hits your AJAX endpoint. If the value gets over a particular threshold, fry the connection. If the value expires in Memcached, you know it isn't getting "hoovered away".
This isn't a concrete answer with a proof of concept, but maybe a starting point for you. You could create a javascript function that provides encryption/decryption functions. The javascript would need to be built dynamically, and you would include an encryption key that is unique to the session. On the server side, you'd have an encryption service that uses the key from the session to encrypt your JSON before delivering it.
This would at least prevent someone from listening to your web traffic, pulling information out of your database.
I'm with kdgergory though, it sounds like your data is too open.
Some techniques are listed in Further thoughts on hindering screen scraping.
If you use PHP, Bad behavior is a nice tool to help. If you don't use PHP, it can give some ideas on how to filter (see How it works page).
Incredibill's blog is giving nice tips, lists of User-agents/IP ranges to block, etc...
Here are a variety of suggestions:
Issue tokens required for redemption along with each AJAX request. Expire the tokens.
Track how many queries are coming from each client, and throttle excessive usage based on expected normal usage of your site.
Look for patterns in usage such as sequential queries, spikes in requests, or queries that occur faster than a human could conduct.
Check user-agents. Many bots don't completely replicate the user agent info of a browser, and you can eliminate programatic scraping of your data using this method.
Change the front-end component of your website to redirect to a captcha (or some other human verifying mechanism) once a request threshold is exceeded.
Modify your logic so the respsonse data is returned in a few different ways to complicate the code required to parse.
Obsfucate your client-side javascript.
Block IPs of offending clients.
Bots usually doesn't parse Javascript, so your ajax code won't be instantly executed. And if they even do, bots usually doesn't maintain sessions/cookies as well. Knowing that, you could reject the request if it is invoked without a valid session/cookie (which is obviously set on the server side beforehand by the request on the parent page).
This does not protect you from human hazard though. The safest way is to restrict access to users with a login/password. If that is not your intent, well, then you have to live with the fact that it's a public application. You could of course scan logs and maintian blacklists with IP addresses and useragents, but that goes extreme.

Ask the user or try not to bother him?

I have an application that might receive a net request for data from another computer. The data can be grouped into several categories so that filtering can be made upon it.
In this situation two things can happen:
I give the user the ability to filter the information he wants to send (thus reducing bandwidth and providing the user with a powerful feature)
Try not to bother the user with this so that the use of the application remains as simple as possible and decide beforehand what information will be send.
Basically is the old debate between Google UI and "your app UI". The second option is too simple but it limits the user ability to decide exactly the data he wants to send, the second introduce a complexity to the user that might be unneeded.
What alternative do you thing is better?
I think the best is if you can to do the default thing without asking the user, but provide an options menu or similar somewhere so that an interested user can go in and optimize if she wishes. If it makes sense in your situation, it might be an idea to notify the user in a subtle way that there are options that can be configured when they start the operation, without requiring them to take any action.
Without a lot more detail it is hard to say. It depends on the sort of users you will be getting and how skillful they are.
You might be able to do some sort of compromise, where it is simple by default, but has an advanced button for advanced users.
It always depends on the situations. You can assume the default inputs wherever possible and ask the user for more. But in my opinion simplicity is the best. If you need lot on user interventions, you can try wizard kind-of-interfaces.
It depends on how much time you want to put into polishing.
I would say if its a feature you are thinking of adding, its probably a good feature. However, if you have concerns of overwhelming the novice user, have a basic feature and simply add a link like "advanced" next to it.

JSONP and Cross-Domain queries - How to Update/Manipulate instead of just read

So I'm reading The Art & Science of Javascript, which is a good book, and it has a good section on JSONP. I've been reading all I can about it today, and even looking through every question here on StackOverflow. JSONP is a great idea, but it only seems to resolve the "Same Origin Problem" for getting data, but doesn't address it for changing data.
Did I just miss all the blogs that talked about this, or is JSONP not the solution I was hoping for?
JSONP results in a SCRIPT tag being generated to another server with any parameters that might be required as a GET request. e.g.
<script src="http://myserver.com/getjson?customer=232&callback=jsonp543354" type="text/javascript">
</script>
There is technically nothing to stop this sort of request altering data on the server, e.g. specifying newName=Tony. Your response could then be whether the update succeeded or not. You will be limited by whatever you can fit on a querystring. If you are going with this approach add some random element as a parameter so that proxy's won't cache it.
Some people may consider this goes against the way GET's are supposed to work i.e. they shouldn't cause data to change.
Yes, and honestly I would like to stick to that paradigm. However, I might bend the rule and say that, requests which do not alter/deal with CRUCIAL data will be accessible via GET calls... hm...
For instance, I am building a shopping cart system, and I think that allowing the adding/removing/etc of items to/from a cart could very easily be exposed via GETs, since even though you can change data, you cannot do anything critical with it. If someone maliciously added 1,000 flatscreen monitors to your shopping cart, there would be at least one verification step that would NOT be vulnerable to any attacks (a standard ASP.NET page at that point, with verification and all that jazz).
Is this a good/workable solution in anyones' opinion?

Resources