In studying Web API design (regardless of the specific technology), I often come across these two uses of the DELETE verb:
DELETE /SomeResource/123 /* deletes entity with ID 123 */
DELETE /SomeResource/ /* deletes all entities */
I always get the feeling that there's something wrong with providing the latter as an operation in most applications. In rare cases where the resource is trivial, sure, why not just blow the whole collection away without a second thought? And yes, I understand that typically it's the client app's job to present an "Are you sure?" confirmation. But I like to envision my API being driven in a safe way even by some low-level agent like Fiddler.
So is there some mechanism I'm missing, like a way for the server/API to initiate some kind of dialog with the client agent to get confirmation before blowing away 10,000 customer records?
EDIT
Yes, assume that I do want to provide the functionality to remove all entities in a given scenario, but feel compelled to avoid it because of the perceived danger (typos, not paying attention, etc.; the things that a UI confirmation dialog is for).
You should NEVER present a delete-all operation in a web API.
While there may be exceptions, it's usually an awful idea because it allows a user to perform an irreversible action in a single request. Worse still, think of how bad it would be for someone with ill intent to find this endpoint!
Most frameworks, such as ASP.NET, require each method to be implemented explicitly before accepting a request (in ASP.NET this is handled during routing). So if you don't provide DELETE /someresource/, the problem never comes up. This is particularly nice because you can't accidentally blow away data with a typo.
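To illustrate the routing point, here is a minimal sketch using Express and TypeScript (swapped in for ASP.NET purely as an assumed stack): only the by-id route is registered, so a collection-level DELETE has no handler and cannot wipe anything.

import express from "express";

const app = express();

// Only the by-id delete is registered; DELETE /someresource/ has no
// handler, so the framework answers 404 instead of wiping data.
app.delete("/someresource/:id", (req, res) => {
  const id = req.params.id;
  // deleteById(id); // hypothetical data-access call for this sketch
  res.status(204).end();
});

app.listen(3000);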
If you really, really want this functionality, be 100% sure you have a reason for it, because this is a dangerous, dangerous endpoint to expose.
Bottom line? I wouldn't provide it to protect your data.
Is it better practice to AJAX every form element separately (e.g. send a request onChange) or to collect all the data and then submit it with one click of Save?
Essentially, auto-save or user-initiated-save?
I would generally say that a user-initiated save is the way to go for most web applications. If nothing else, this is how users are used to interacting with web apps; familiarity and ease of use are extremely important in web applications. Not to mention it can cut down on unnecessary traffic.
This is not to say that auto-saving does not have its place, but it can often cause unnecessary traffic. For example, if I am auto-saving a contact form and fill out my name, then my email, then go back to my name to change it, that is already three requests sent with no benefit: extra work for no added advantage.
Once again, I think it has a lot to do with your application and where you are planning to use it. Inline edits often use auto-saving, and there I think it is useful, whereas for a contact or signup form it would not be a good idea.
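If auto-save is the right fit, one common way to avoid the wasted requests described above is to debounce the save so that rapid edits collapse into a single request. A minimal client-side sketch in TypeScript, where the /api/contact endpoint and field names are assumptions:

// Debounced auto-save: wait until the user has been idle for 1.5 seconds
// before sending, so several quick edits produce only one request.
let timer: ReturnType<typeof setTimeout> | undefined;

function autoSave(field: string, value: string): void {
  if (timer !== undefined) clearTimeout(timer);
  timer = setTimeout(() => {
    fetch("/api/contact", {                    // assumed endpoint
      method: "PATCH",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ [field]: value }),
    });
  }, 1500);
}

// Usage: hook autoSave to input events; only the last edit in the idle
// window actually reaches the server.
document.querySelector<HTMLInputElement>("#name")
  ?.addEventListener("input", (e) =>
    autoSave("name", (e.target as HTMLInputElement).value));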
I'd say that depends on the nature of your application and whether "auto-save" is a behaviour desired by your users.
"User initiated save" is what a user would expect from their experience with web forms nowadays - I would not deviate from that unless there's a good reason.
It depends on the following factors:
What kind of data are you trying to save? E.g., is it okay to save the data partially, or does it need to be saved all at once?
How much data do you want to save? If you have many fields, you might want to send the data in chunks (as in a wizard) or save everything at once.
It's also a good idea to save data temporarily in the background for large forms, if the user may take a long time to fill them in (e.g. emails saved as drafts).
It also depends on your web app and the way you have designed your forms. In some forms you may allow certain fields to be modified and saved in place, so that you can fetch additional data, for example.
In most cases it is good to have an explicit "Save" action for your data forms.
I am developing an application that allows for a user to manage some individual data points. One of the things that my users will want to do is "delete" but what should that mean?
For a web application, is it better to give the user a "serious delete" option or to use a "trash" system?
Under "serious delete" (I would love to know if there is a better name for this...) you click "delete" and the user is warned: "This is a final and tragic action. Once you do this you will not be able to get -insert data point name here- back, even if you are crying..."
Then if they click delete... well it truly is gone forever.
Under the "trash" model, you never trust that the user really wants to delete... instead you remove the data point from the "main display" and put it into a bucket called "the trash". This gets it out of the user's way, which is what they usually want, but they can get it back if they make a mistake. Obviously this is the way most operating systems have gone.
The advantages of "serious delete" are:
Easy to implement
Easy to explain to users
The disadvantages of "serious delete" are:
it can be tragically final
sometimes, cats walk on keyboards
The advantages of the "trash" system are:
user is safe from themselves
bulk methods like "delete a bunch at once" make more sense
saves support headaches
The disadvantages of the "trash" system are:
For sensitive data, you create an illusion of destruction: users think something is gone, but it is not.
Lots of subtle distinctions make implementation more difficult
Do you "eventually" delete the contents of the trash?
My question is: which one is the right design pattern for modern web applications? How does an "archive" function fit into this? That is how Gmail works. Give enough discussion to justify your answer... I would love to be pointed towards some relevant research.
-FT
Here is a relevant article you might like. It's more focused on business scenarios, but it can apply anywhere.
Don't Delete, Just Don't.
In my personal idiosyncratic view, every irreversible action is a serious design error. These “Are you REALLY, ABSOLUTELY, POSITIVELY sure?” message boxes are pure crap because the user is very quickly conditioned to just click “OK” and be done with it. In fact, I have lost important data despite such dialogs on several occasions.
In other words: these dialog boxes do not add a working fail-safe barrier, just a usability barrier.
Even trash systems have this problem (they just postpone the moment); the best solution from a usability point of view is an infinite history. Of course, implementing this may carry forbidding costs (e.g. in terms of memory usage).
Permanently deleting sensitive data can (and should!) be implemented as a much more deliberate action: it is rarely needed. The only issue is to make it clear to the user that data is normally not lost. Using a history instead of the trash can may help here: the trash can could be misunderstood as a permanent delete, while a (visible) history of actions doesn't give that false impression.
I think you pretty well summarized the pros and cons for the trash model:
Pro: Overall, better for the user.
Con: Overall, harder for the developer.
For a usability specialist like myself, it's a no-brainer. As far as I'm concerned, developers are supposed to work hard to make life easier for users. Frankly, until web apps have Undo capabilities like trash, I don't think we can consider them usable. It's time web apps caught up to 1984.
A few other details:
Another advantage of the trash model is that it can eliminate the need for a confirmation message in most cases. The vast majority of the time, the user is deleting something intentionally, so putting in an extra step with a confirmation message usually just adds work for them. Worse, they get in the habit of smacking OK quickly, which generalizes to other confirmations they really should read. See http://alistapart.com/articles/neveruseawarning.
Part of the beauty of the trash metaphor is that it suggests to the user that deleting is not permanent. Like a physical trash can, objects can be retrieved. If you feature the trash can prominently enough on your page (so users realize they can open it), and include an image of the trash can as an icon next to the Delete menu item, that should be enough to indicate that deleting doesn't destroy, but rather moves things to the trash.
It’s probably acceptable to destroy the oldest deletions in the trash if there are performance considerations. That is once again consistent with the trash can metaphor: users can anticipate that sooner or later “someone” is going to empty the trash. I don’t think users will care if the trash never empties. If they need to truly destroy something sensitive, you can provide an explicit procedure for it (e.g., deleting something that’s already in the trash).
My opinion is that there really is no reason to hard-delete anything that could be of importance to someone. In my experience, the pain of having to restore from a tape backup because a user realized they deleted something last week that they needed is easily avoided by using an 'IsActive' or 'IsDeleted' flag on records so that the user can soft-delete them.
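A minimal sketch of that soft-delete idea, assuming a SQL table with is_deleted/deleted_at columns and a generic db.query helper (all names are placeholders): "delete" just flips the flag, normal reads filter it out, and an optional purge job eventually empties the trash.

// Soft delete: "deleting" only sets a flag; undelete is flipping it back.
// The Db interface, table, and column names are assumptions for this sketch.
interface Db {
  query(sql: string, params: unknown[]): Promise<unknown>;
}

async function softDelete(db: Db, id: number): Promise<void> {
  await db.query(
    "UPDATE records SET is_deleted = TRUE, deleted_at = NOW() WHERE id = $1",
    [id],
  );
}

async function listActive(db: Db): Promise<unknown> {
  // Normal reads never see soft-deleted rows.
  return db.query("SELECT * FROM records WHERE is_deleted = FALSE", []);
}

// Consistent with "someone eventually empties the trash": permanently
// remove rows that were soft-deleted more than 90 days ago.
async function purgeOldTrash(db: Db): Promise<void> {
  await db.query(
    "DELETE FROM records WHERE is_deleted = TRUE AND deleted_at < NOW() - INTERVAL '90 days'",
    [],
  );
}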
Totally context-dependent.
If it's an easily reversible action, just delete it, you might not even need a confirmation
If you might be losing lots of data, a "temporarily deleted items" storage might well be in order
I don't see that the presentation of the application (Web App or not) is a major driver for this decision. I would be more concerned with the value of the data, its ease of recreation and the costs of keeping and housekeeping it.
My feeling is that when your app is acting as a repository of valuable data, then delete-via-trash is preferable.
I was wondering if anyone had come across any techniques to reduce the chances of data exposed through JSON-type services on the server (intended to supply AJAX functions) being harvested by external agents.
It seems to me that the problem is not so difficult if you had say a Flash client consuming the data. Then you could send encrypted data to the client, which would know how to decrypt it. The same method seems impossible with AJAX though, due to the open nature of the Javascript source.
Has anybody implemented a clever technique here?
Whatever the method, it should still allow a genuine AJAX function to consume the data.
Note that I'm not really talking about protecting 'sensitive' information here, the odd record leaking out is not a problem. Rather I am thinking about stopping a situation where the whole DB is hoovered up by bots (either in one go, or gradually over time).
Thanks.
First, I would like to be clear on this:
It seems to me that the problem is not so difficult if you had say a Flash client consuming the data. Then you could send encrypted data to the client, which would know how to decrypt it. The same method seems impossible with AJAX though, due to the open nature of the Javascript source.
It will be pretty obvious that the information is being sent encrypted to the Flash client, and it won't be that hard for an attacker to figure out from your compiled Flash program how it is decrypted, replicate that, and get all the data.
If the data really does have the value you think it does, you can count on the above.
If this is public information, embrace that and don't combat it; instead find ways to capitalize on it.
If this is information that you are only exposing to a set of users, make sure you have the corresponding authentication / secure communication. Track usage as others have said, and have measures that act on it.
The first thing to prevent bots from stealing your data is not technological, it's legal. First, make sure you have the right language in your site's Terms of Use so that what you're trying to prevent is actually disallowed and defensible from a legal standpoint. Second, make sure you design your technical strategy with legal issues in mind. For example, in the US, if you put data behind an authentication barrier and an attacker steals it, it's likely a violation of the DMCA. Third, find a lawyer who can advise you on IP and DMCA issues... nice folks on StackOverflow aren't enough. :-)
Now, about the technology:
A reasonable solution is to require that users be authenticated before they can get access to your sensitive Ajax calls. This allows you to simply monitor per-user usage of your Ajax calls and (manually or automatically) cancel the account of any user who makes too many requests in a particular time period. (or too many total requests, if you're trying to defend against a trickle approach).
This approach of course is vulnerable to sophisticated bots who automatically sign up new "users", but with a reasonably good CAPTCHA implementation, it's quite hard to build this kind of bot. (see "circumvention" section at http://en.wikipedia.org/wiki/CAPTCHA)
If you are trying to protect public data (no authentication) then your options are much more limited. As other answers noted, you can try IP-address-based limits (and run afoul of large corporate proxy users), but sophisticated attackers can get around this by distributing the load. There's also likely sophisticated software which watches things like request timing, request patterns, etc. and tries to spot bots. Poker sites, for example, spend a lot of time on this. But don't expect these kinds of systems to be cheap. One easy thing you can do is to mine your web logs (e.g. using Splunk), find the top N IP addresses hitting your site, and then do a reverse-IP lookup on them. Some will be legitimate corporate or ISP proxies. But if you recognize a competitor's domain name among the list, you can block their domain or follow up with your lawyers.
In addition to pre-theft defense, you might also want to think about inserting a "honey pot": deliberately fake information that you can track later. This is how, for example, map makers catch plagiarism: they insert a fake street in their maps and see which other maps show the same fake street. While this doesn't prevent determined folks from sucking out all your data, it does let you find out later who's re-using your data. This can be done by embedding unique text strings in your text output and then searching for those strings on Google later (assuming your data is re-usable on another public website). If your data is HTML or images, you can include an image which points back to your site, and you can track who is downloading it, and look for patterns you can use to bust the freeloaders.
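As a rough sketch of the honey-pot idea (the record shape and marker format are just assumptions): seed a few fake records whose text carries a unique string you can search for later.

import { randomUUID } from "crypto";

// Fake record carrying a unique, searchable marker. If the marker later
// turns up on someone else's site, you know where their data came from.
function makeHoneypotRecord(): { name: string; note: string } {
  const marker = `hp-${randomUUID()}`; // assumed marker format
  return {
    name: "Example Placeholder Ltd.",   // deliberately fake entry
    note: `Internal reference ${marker}`,
  };
}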
Note that the javascript encryption approach noted in one of the other answers won't work for non-authenticated sessions-- an attacker can simply download the javascript and run it just like a regular browser would. Moral of the story: public data is essentially indefensible. If you want to keep data protected, put it behind an authentication barrier.
This is obvious, but if your data is publicly searchable by search engines, you'll both need a non-AJAX solution for them (Google won't read your ajax data!) and you'll want to mark those pages NOARCHIVE so your data doesn't show up in Google's cache. You'll also probably want a white list of search engine crawler IP addresses which you allow onto your search-engine-crawlable pages (you can work with Google, Bing, Yahoo, etc. to get these); otherwise malicious bots could simply impersonate Google and get your data.
In conclusion, I want to echo #kdgregory above: make sure that the threat is real enough that it's worth the effort required. Many companies overestimate the interest that other people (both legitimate customers and nefarious actors) have in their business. It might be that yours is an oddball case where you have particularly important data, it's particularly valuable to obtain, it must be publicly accessible without authentication, and your legal recourses will be limited if someone steals your data. But all those together is admittedly an unusual case.
P.S. - another way to think about this problem, which may or may not apply in your case: sometimes it's easier to change how your data works in a way that obviates securing it. For example, can you tie your data in some way to a service on your site so that the data isn't very useful unless it's being used in conjunction with your code? Or can you embed advertising in it, so that wherever it's shown you get paid? And so on. I don't know if any of these mitigations apply to your case, but many businesses have found ways to give stuff away for free on the Internet (and encourage rather than prevent wide re-distribution) and still make money, so a hybrid free/pay strategy may (or may not) be possible in your case.
If you have an internal Memcached box, you could consider a technique where you create an entry for each IP address that hits your server, with a one-hour expiration. Then increment that value each time the IP hits your AJAX endpoint. If the value goes over a particular threshold, fry the connection. If the value expires in Memcached, you know it isn't getting "hoovered away".
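A rough sketch of that counter-per-IP idea, using a plain in-memory Map as a stand-in for the Memcached box (the one-hour window and the threshold value are assumptions):

// Stand-in for Memcached: per-IP hit counters that expire after an hour.
const WINDOW_MS = 60 * 60 * 1000; // one-hour window
const THRESHOLD = 1000;           // assumed requests per IP per window

const hits = new Map<string, { count: number; expiresAt: number }>();

// Returns true once an IP has exceeded the threshold within the window.
function isOverLimit(ip: string): boolean {
  const now = Date.now();
  const entry = hits.get(ip);
  if (!entry || entry.expiresAt < now) {
    hits.set(ip, { count: 1, expiresAt: now + WINDOW_MS });
    return false;
  }
  entry.count += 1;
  return entry.count > THRESHOLD;
}

// In the AJAX endpoint: if (isOverLimit(clientIp)) reject the request.
// A real Memcached client would replace the Map with an increment + TTL.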
This isn't a concrete answer with a proof of concept, but maybe a starting point for you. You could create a javascript function that provides encryption/decryption functions. The javascript would need to be built dynamically, and you would include an encryption key that is unique to the session. On the server side, you'd have an encryption service that uses the key from the session to encrypt your JSON before delivering it.
This would at least prevent someone from listening to your web traffic and pulling information out of your database.
I'm with kdgregory though; it sounds like your data is too open.
Some techniques are listed in Further thoughts on hindering screen scraping.
If you use PHP, Bad Behavior is a nice tool to help. If you don't use PHP, it can still give some ideas on how to filter (see the How it works page).
Incredibill's blog gives nice tips, lists of user-agents/IP ranges to block, etc.
Here are a variety of suggestions:
Issue tokens that must be redeemed along with each AJAX request, and expire the tokens (a minimal sketch of this follows the list).
Track how many queries are coming from each client, and throttle excessive usage based on expected normal usage of your site.
Look for patterns in usage such as sequential queries, spikes in requests, or queries that occur faster than a human could perform them.
Check user agents. Many bots don't completely replicate the user-agent info of a browser, and you can cut down on programmatic scraping of your data this way.
Change the front-end component of your website to redirect to a captcha (or some other human verifying mechanism) once a request threshold is exceeded.
Modify your logic so the response data is returned in a few different ways, to complicate the code required to parse it.
Obfuscate your client-side javascript.
Block IPs of offending clients.
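Here is a minimal sketch of the token idea from the first suggestion, written as Express/TypeScript handlers (the route names, header name, and ten-minute lifetime are assumptions): the page obtains a short-lived token, and the AJAX endpoint refuses requests that don't redeem a valid one.

import express from "express";
import { randomUUID } from "crypto";

const app = express();
const tokens = new Map<string, number>(); // token -> expiry timestamp
const TOKEN_TTL_MS = 10 * 60 * 1000;      // assumed ten-minute lifetime

// Issued when the parent page loads (modelled here as a simple endpoint).
app.get("/token", (_req, res) => {
  const token = randomUUID();
  tokens.set(token, Date.now() + TOKEN_TTL_MS);
  res.json({ token });
});

// The AJAX data endpoint only answers requests that redeem a live token.
app.get("/data", (req, res) => {
  const token = String(req.header("X-Api-Token") ?? "");
  const expiry = tokens.get(token);
  if (expiry === undefined || expiry < Date.now()) {
    res.status(403).json({ error: "missing or expired token" });
    return;
  }
  tokens.delete(token); // single use
  res.json({ items: [] }); // real payload goes here
});

app.listen(3000);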
Bots usually don't parse Javascript, so your ajax code won't be instantly executed. And even if they do, bots usually don't maintain sessions/cookies either. Knowing that, you could reject the request if it is invoked without a valid session/cookie (which is obviously set on the server side beforehand by the request for the parent page).
This does not protect you from the human hazard though. The safest way is to restrict access to users with a login/password. If that is not your intent, well, then you have to live with the fact that it's a public application. You could of course scan logs and maintain blacklists of IP addresses and user agents, but that is going to extremes.
I have an application that might receive a network request for data from another computer. The data can be grouped into several categories so that it can be filtered.
In this situation two things can happen:
I give the user the ability to filter the information he wants to send (thus reducing bandwidth and providing the user with a powerful feature)
Try not to bother the user with this, so that the use of the application remains as simple as possible, and decide beforehand what information will be sent.
Basically it is the old debate between the Google UI and "your app's UI". The second option is simpler but limits the user's ability to decide exactly what data to send; the first introduces a complexity to the user that might be unneeded.
Which alternative do you think is better?
I think the best is to do the default thing without asking the user, but provide an options menu or similar somewhere so that an interested user can go in and optimize if she wishes. If it makes sense in your situation, it might be an idea to notify the user in a subtle way that there are options that can be configured when they start the operation, without requiring them to take any action.
Without a lot more detail it is hard to say. It depends on the sort of users you will be getting and how skillful they are.
You might be able to do some sort of compromise, where it is simple by default, but has an advanced button for advanced users.
It always depends on the situation. You can assume the default inputs wherever possible and ask the user for more. But in my opinion simplicity is best. If you need a lot of user intervention, you can try wizard-style interfaces.
It depends on how much time you want to put into polishing.
I would say if it's a feature you are thinking of adding, it's probably a good feature. However, if you are concerned about overwhelming the novice user, keep the basic feature simple and just add a link like "advanced" next to it.
Rather than using a "save" button on a web form, many web designers prefer a "save as you go" approach, where the user's changes to the data are saved immediately once focus leaves, say, a text box.
Has anyone identified a formal pattern for this technique? I especially need to tie it all back to a chunky service call. Concurrency seems to be one of the first issues that comes to mind.
Ajax Patterns has the Object Persistence pattern, which sounds like what you're describing.
You may also want to look at the Fat Client article regarding response issues which might arise.
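For the concurrency concern, one common companion to save-as-you-go is an optimistic version check: each field-level save carries the record version it was based on, and the server rejects stale writes. A client-side sketch in TypeScript (endpoint shape and names are assumptions, not taken from the Object Persistence pattern itself):

// Each save sends the version the client last loaded; a 409 from the
// server means someone else saved first and the edit must be re-applied.
interface FieldSave {
  recordId: number;
  field: string;
  value: string;
  baseVersion: number; // version the client last loaded
}

async function saveField(change: FieldSave): Promise<void> {
  const res = await fetch(`/api/records/${change.recordId}`, {  // assumed endpoint
    method: "PATCH",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(change),
  });
  if (res.status === 409) {
    // Stale write rejected: reload the record and let the user retry
    // against the newer version.
    console.warn("Stale save rejected; refresh the record");
  }
}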
You can try the UX Patterns Explorer by Infragistics.
http://quince.infragistics.com/#/Main
This "pattern" is actually the norm for real world interfaces (the radio doesn't wait for you to hit save before applying your volume changes).
As for the concurrency, one solution is to say that server time is "real" and then use ajax to push changes to clients who may be looking at semantically intersecting views.
-- MarkusQ