Can SMB2 CHANGE_NOTIFY be used to keep an accurate snapshot of a remote directory listing? - smb

SMB2 CHANGE_NOTIFY looks promising, as if it could deliver enough information on subdirectory or subtree updates from the server, so we can keep our listing of remote directory up-to-date by handling the response.
However, it's not a subscription to an event stream, just a one-off command receiving one response, so I suspect that it can be used only as a mere hint to invalidate our cache and reread the directory. When we receive a response, there could be any additional changes before we send another CHANGE_NOTIFY request, and we'll miss the details of these changes.
Is there any way around this problem? Or is rereading directory on learning that it's updated a necessary step?
I want to understand possible solutions on the protocol level (you can imagine I'm using a customized client that I can make do what I want, with some common servers like Windows or smbd3).

Strictly speaking, not even re-reading the directory listing should save you, since the directory can change between re-reading the listing and submitting another CHANGE_NOTIFY request. The race condition just moves to a different spot.
Except there is no race condition.
This took a little digging, but it’s all there in the specification. In MS-SMB2 v20200826 §3.3.5.19 ‘Receiving an SMB2 CHANGE_NOTIFY Request’, it is stated:
The server MUST process a change notification request in the object store as specified by the algorithm in section 3.3.1.3.
In §3.3.1.3 ‘Algorithm for Change Notifications in an Object Store’, we have:
The server MUST implement an algorithm that monitors for changes on an object store. The effect of this algorithm MUST be identical to that used to offer the behavior specified in [MS-CIFS] sections 3.2.4.39 and 3.3.5.59.4.
And in MS-CIFS v20201001 §3.3.5.59.4 ‘Receiving an NT_TRANSACT_NOTIFY_CHANGE Request’ there is this:
If the client has not issued any NT_TRANSACT_NOTIFY_CHANGE Requests on this FID previously, the server SHOULD allocate an empty change notification buffer and associate it with the open directory. The size of the buffer SHOULD be at least equal to the MaxParameterCount field in the SMB_COM_NT_TRANSACT Request (section 2.2.4.62.1) used to transport the NT_TRANSACT_NOTIFY_CHANGE Request. If the client previously issued an NT_TRANSACT_NOTIFY_CHANGE Request on this FID, the server SHOULD already have a change notification buffer associated with the FID. The change notification buffer is used to collect directory change information in between NT_TRANSACT_NOTIFY_CHANGE (section 2.2.7.4) calls that reference the same FID.
Emphasis mine. This agrees with how Samba implements it (the fsp structure persists between individual requests). I wouldn’t expect Microsoft to do worse than that by not keeping the promise they made in their own specification.

Related

Incremental updates using browser cache

The client (an AngularJS application) gets rather big lists from the server. The lists may have hundreds or thousands of elements, which can mean a few megabytes uncompressed (and some users (admins) get much more data).
I'm not planning to let the client get partial results as sorting and filtering should not bother the server.
Compression works fine (factor of about 10) and as the lists don't change often, 304 NOT MODIFIED helps a lot, too. But another important optimization is missing:
As a typical change of the lists are rather small (e.g., modifying two elements and adding a new one), transferring the changes only sounds like a good idea. I wonder how to do it properly.
Something like GET /offer/123/items should always return all the items in the offer number 123, right? Compression and 304 can be used here, but no incremental update. A request like GET /offer/123/items?since=1495765733 sounds like the way to go, but then browser caching does not get used:
either nothing has changed and the answer is empty (and caching it makes no sense)
or something has changed, the client updates its state and does never ask for changes since 1495765733 anymore (and caching it makes even less sense)
Obviously, when using the "since" query, nothing will be cached for the "resource" (the original query gets used just once or not at all).
So I can't rely on the browser cache and I can only use localStorage or sessionStorage, which have a few downsides:
it's limited to a few megabytes (the browser HTTP cache may be much bigger and gets handled automatically)
I have to implement some replacement strategy when I hit the limit
the browser cache stores already compressed data which I don't get (I'd have to re-compress them)
it doesn't work for the users (admins) getting bigger lists as even a single list may already be over limit
it gets emptied on logout (a customer's requirement)
Given that there's HTML 5 and HTTP 2.0, that's pretty unsatisfactory. What am I missing?
Is it possible to use the browser HTTP cache together with incremental updates?
I think there is one thing you are missing: in short, headers. What I'm thinking you could do and that would match (most) of your requirements, would be to:
First GET /offer/123/items is done normally, nothing special.
Subsequents GET /offer/123/items will be sent with a Fetched-At: 1495765733 header, indicating your server when the initial request has been sent.
From this point on, two scenarios are possible.
Either there is no change, and you can send the 304.
If there is a change however, return the new items since the time stamp previously sent has headers, but set a Cache-Control: no-cache from your response.
This leaves you to the point where you can have incremental updates, with caching of the initial megabytes-sized elements.
There is still one drawback though, that the caching is only done once, it won't cache updates. You said that your lists are not updated often so it might already work for you, but if you really want to push this further, I could think of one more thing.
Upon receiving an incremental update, you could trigger in the background another request without the Fetched-At header that won't be used at all by your application, but will just be there to update your http cache. It should not be as bad as it sounds performance-wise since your framework won't update its data with the new one (and potentially trigger re-renders), the only notable drawback would be in term of network and memory consumption. On mobile it might be problematic, but it doesn't sounds like an app intended to be displayed on them anyway.
I absolutely don't know your use-case and will just throw that out there, but are you really sure that doing some sort of pagination won't work? Megabytes of data sounds a lot to display and process for normal humans ;)
I would ditch the request/response cycle entirely and move to a push model.
Specifically, WebSockets.
This is the standard technology used on financial trading websites serving tables of real-time ticker data. Here is one such production application demonstrating the power of WebSockets:
https://www.poloniex.com/exchange#btc_eth
WebSocket applications have two types of state: global and user. The above link will show three tables of global data. When you're logged in, two aditional tables of user data are displayed at the bottom.
This is not HTTP; you won't be able to just slap this into a Java Servlet. You'll need to run a separate process on your server which communicates over TCP. The good news is, there are mature solutions readily available. A Java-based solution with a very decent free licensing option, which includes both client and server APIs (and does integrate with Angular2) is Lightstreamer. They have a well-organized demo page too. There are also adapters available to integrate with your data sources.
You may be hesitant to ditch your existing servlet approach, but this will be less headaches in the long run, and scales marvelously. HTTP polling, even with well-designed header-only requests, do not scale well with large lists which update frequently.
---------- EDIT ----------
Since the list updates are infrequent, WebSockets are probably overkill. Based on the further details provided by comments on this answer, I would recommend a DOM-based, AJAX-updated sorter and filterer such as DataTables, which has some built-in options for caching. In order to reuse client data across sessions, ajax requests in the previous link should be modified to save the current data in the table to localStorage after every ajax request, and when the client starts a new session, populate the table with this data. This will allow the plugin to manage the filtering, sorting, caching and browser-based persistence.
I'm thinking about something similar to Aperçu's idea, but using two requests. The idea is yet incomplete, so bear with me...
The client asks for GET /offer/123/items, possibly with the ETag and Fetched-At headers.
The server answers with
200 and a full list if either header is missing, or when there are too many changes since the Fetched-At timestamp
304 if nothing has changed since then
304 and a special Fetch-More header telling the client that more data is to be fetched otherwise
The last case is violating how HTTP should work, but AFAIK it's the only way letting the browser cache everything what I want it to cache. Since the whole communication is encrypted, proxies can't punish me for violating the spec.
The client reacts to Fetch-Errata by requesting GET /offer/123/items/errata. This way, the resource has got split into two requests. The split is ugly, but an angular $http interceptor can hide the ugliness from the application.
The second request is cacheable, too, and there can be also a Fetched-At header. The details are unclear, but some strong handwavium makes me believe that it can work. Actually, the errata could itself be inaccurate but still useful and get an errata itself.... etc.
With HTTP/1.1, more requests may mean more latency, but having a couple of them should still be profitable because of the saved bandwidth. The server can decide when to stop.
With HTTP/2, multiple requests could be send at once. The server could be make to handle them efficiently as it knows that they belong together. Some more handwavium...
I find the idea strange, but interesting and I'm looking forward to comments. Feel free to downvote me, but please leave an explanation.

Check if S3 file has been modified

How can I use a shell script check if an Amazon S3 file ( small .xml file) has been modified. I'm currently using curl to check every 10 seconds, but it's making many GET requests.
curl "s3.aws.amazon.com/bucket/file.xml"
if cmp "file.xml" "current.xml"
then
echo "no change"
else
echo "file changed"
cp "file.xml" "current.xml"
fi
sleep(10s)
Is there a better way to check every 10 seconds that reduces the number of GET requests? (This is built on top of a rails app so i could possibly build a handler in rails?)
Let me start by first telling you some facts about S3. You might know this, but in case you don't, you might see that your current code could have some "unexpected" behavior.
S3 and "Eventual Consistency"
S3 provides "eventual consistency" for overwritten objects. From the S3 FAQ, you have:
Q: What data consistency model does Amazon S3 employ?
Amazon S3 buckets in all Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES.
Eventual consistency for overwrites means that, whenever an object is updated (ie, whenever your small XML file is overwritten), clients retrieving the file MAY see the new version, or they MAY see the old version. For how long? For an unspecified amount of time. It typically achieves consistency in much less than 10 seconds, but you have to assume that it will, eventually, take more than 10 seconds to achieve consistency. More interestingly (sadly?), even after a successful retrieval of the new version, clients MAY still receive the older version later.
One thing that you can be assured of is: if a client starts download a version of the file, it will download that entire version (in other words, there's no chance that you would receive for example, the first half of the XML file as the old version and the second half as the new version).
With that in mind, notice that your script could fail to identify the change within your 10-second timeframe: you could make multiple requests, even after a change, until your script downloads a changed version. And even then, after you detect the change, it is (unfortunately) entirely possible the the next request would download the previous (!) version, and trigger yet another "change" in your code, then the next would give the current version, and trigger yet another "change" in your code!
If you are OK with the fact that S3 provides eventual consistency, there's a way you could possibly improve your system.
Idea 1: S3 event notifications + SNS
You mentioned that you thought about using SNS. That could definitely be an interesting approach: you could enable S3 event notifications and then get a notification through SNS whenever the file is updated.
How do you get the notification? You would need to create a subscription, and here you have a few options.
Idea 1.1: S3 event notifications + SNS + a "web app"
If you have a "web application", ie, anything running in a publicly accessible HTTP endpoint, you could create an HTTP subscriber, so SNS will call your server with the notification whenever it happens. This might or might not be possible or desirable in your scenario
Idea 2: S3 event notifications + SQS
You could create a message queue in SQS and have S3 deliver the notifications directly to the queue. This would also be possible as S3 event notifications + SNS + SQS, since you can add a queue as a subscriber to an SNS topic (the advantage being that, in case you need to add functionality later, you could add more queues and subscribe them to the same topic, therefore getting "multiple copies" of the notification).
To retrieve the notification you'd make a call to SQS. You'd still have to poll - ie, have a loop and call GET on SQS (which cost about the same, or maybe a tiny bit more depending on the region, than S3 GETs). The slight difference is that you could reduce a bit the number of total requests -- SQS supports long-polling requests of up to 20 seconds: you make the GET call on SQS and, if there are no messages, SQS holds the request for up to 20 seconds, returning immediately if a message arrives, or returning an empty response if no messages are available within those 20 seconds. So, you would send only 1 GET every 20 seconds, to get faster notifications than you currently have. You could potentially halve the number of GETs you make (once every 10s to S3 vs once every 20s to SQS).
Also - you could chose to use one single SQS queue to aggregate all changes to all XML files, or multiple SQS queues, one per XML file. With a single queue, you would greatly reduce the overall number of GET requests. With one queue per XML file, that's when you could potentially "halve" the number of GET request as compared to what you have now.
Idea 3: S3 event notifications + AWS Lambda
You can also use a Lambda function for this. This could require some more changes in your environment - you wouldn't use a Shell Script to poll, but S3 can be configured to call a Lambda Function for you as a response to an event, such as an update on your XML file. You could write your code in Java, Javascript or Python (some people devised some "hacks" to use other languages as well, including Bash).
The beauty of this is that there's no more polling, and you don't have to maintain a web server (as in "idea 1.1"). Your code "simply runs", whenever there's a change.
Notice that, no matter which one of these ideas you use, you still have to deal with eventual consistency. In other words, you'd know that a PUT/POST has happened, but once your code sends a GET, you could still receive the older version...
Idea 4: Use DynamoDB instead
If you have the ability to make a more structural change on the system, you could consider using DynamoDB for this task.
The reason I suggest this is because DynamoDB supports strong consistency, even for updates. Notice that it's not the default - by default, DynamoDB operates in eventual consistency mode, but the "retrieval" operations (GetItem, for example), support fully consistent reads.
Also, DynamoDB has what we call "DynamoDB Streams", which is a mechanism that allows you to get a stream of changes made to any (or all) items on your table. These notifications can be polled, or they can even be used in conjunction with a Lambda function, that would be called automatically whenever a change happens! This, plus the fact that DynamoDB can be used with strong consistency, could possibly help you solve your problem.
In DynamoDB, it's usually a good practice to keep the records small. You mentioned in your comments that your XML files are about 2kB - I'd say that could be considered "small enough" so that it would be a good fit for DynamoDB! (the reasoning: DynamoDB reads are typically calculated as multiples of 4kB; so to fully read 1 of your XML files, you'd consume just 1 read; also, depending on how you do it, for example using a Query operation instead of a GetItem operation, you could possibly be able to read 2 XML files from DynamoDB consuming just 1 read operation).
Some references:
http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
http://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ReceiveMessage.html
I can think of another way by using S3 Versioning; this would require the least amount of changes to your code.
Versioning is a means of keeping multiple variants of an object in the same bucket.
This would mean that every time a new file.xml is uploaded, S3 will create a new version.
In your script, instead of getting the object and comparing it, get the HEAD of the object which contains the VersionId field. Match this version with the previous version to find out if the file has changed.
If the file has indeed changed, get the new file, and also get the new version of that file and save it locally so that next time you can use this version to check if a newer-newer version has been uploaded.
Note 1: You will still be making lots of calls to S3, but instead of fetching the entire file every time, you are only fetching the metadata of the file which is much faster and smaller in size.
Note 2: However, if your aim was to reduce the number of calls, the easiest solution I can think of is using lambdas. You can trigger a lambda function every time a file is uploaded that then calls the REST endpoint of your service to notify you of the file change.
You can use --exact-timestamps
see AWS discussion
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
Instead of using versioning, you can simply compare the E-Tag of the file, which is available in the header, and is similar to the MD-5 hash of the file (and is exactly the MD-5 hash if the file is small, i.e. less than 4 MB, or sometimes even larger. Otherwise, it is the MD-5 hash of a list of binary hashes of blocks.)
With that said, I would suggest you look at your application again and ask if there is a way you can avoid this critical path.

Ruby websocket check if user exist

Using Event-machine and Ruby. Currently I'm making a game were at the end of the turn it checks if other user there. When sending data to the user using ws.send() how can I check if the user actually got the data or is alternative solution?
As the library doesn't provide you with access to the underlying protocol elements, you need to add elements to your application protocol to do this. A typical approach is to add an identifier to each message and response to messages with acknowledgement messages that contain those identifiers.
Note that such an approach will only help you to have a better idea of what has been received by a client. There is no assurance of particular state in the case of errors. An example would be losing a connection after the client as sent an ACK, but the service has not received it.
As a result of the complexity I just mentioned, it is often easier to try to make most operations idempotent - that is able to be replayed without detriment to the system, and to replay readily during/after error conditions. You may additionally find a way to periodically synchronize the relevant state entirely, to avoid the long term continuation of minor errors introduced by loss of data/a connection.

Is there a RESTful way to determine whether a POST will succeed?

Is there a RESTful way to determine whether a POST (or any other non-idempotent verb) will succeed? This would seem to be useful in cases where you essentially need to do multiple idempotent requests against different services, any of which might fail. It would be nice if these requests could be done in a "transaction" (i.e. with support for rollback), but since this is impossible, an alternative is to check whether each of the requests will succeed before actually performing them.
For example suppose I'm building an ecommerce system that allows people to buy t-shirts with custom text printed on them, and this system requires integrating with two different services: a t-shirt printing service, and a payment service. Each of these has a RESTful API, and either might fail. (e.g. the printing company might refuse to print certain words on a t-shirt, say, and the bank might complain if the credit card has expired.) Is there any way to speculatively perform these two requests, so my system will only proceed with them if both requests appear valid?
If not, can this problem be solved in a different way? Creating a resource via a POST with status = pending, and changing this to status = complete if all requests succeed? (DELETE is more tricky...)
HTTP defines the 202 status code for exactly your scenario:
202 Accepted
The request has been accepted for processing, but the processing has not been completed. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place. There is no facility for re-sending a status code from an asynchronous operation such as this.
The 202 response is intentionally non-committal. Its purpose is to allow a server to accept a request for some other process (perhaps a batch-oriented process that is only run once per day) without requiring that the user agent's connection to the server persist until the process is completed. The entity returned with this response SHOULD include an indication of the request's current status and either a pointer to a status monitor or some estimate of when the user can expect the request to be fulfilled.
Source: HTTP 1.1 Status Code Definition
This is similar to 201 Created, except that you are indicating that the request has not been completed and the entity has not yet been created. Your response would contain a URL to the resource representing the "order request", so clients can check the status of the order through this URL.
To answer your question more directly: There is no way to "test" whether a request will succeed before you make it, because you're asking for clairvoyance.
It's not possible to foresee the range of technical problems that could occur when you attempt to make a request in the future. The network may be unavailable, the server may not be able to access its database or external systems it depends on for functioning, there may be a power-cut and the server is offline, a stray neutrino could wander into your memory and bump a 0 to a 1 causing a catastrophic kernel fault.
In order to consume a remote service you need to account for possible failures of any request in isolation of any other processes.
For your specific problem, if the services have no transactional safety, you can't bake any in there and you have to deal with this in a more real-world way. A few options off the top of my head:
Get the T-Shirt company to give you a "test" mechanism, so you can see whether they'll process any given order without actually placing it. It could be that placing an order with them is a two-phase operation, where you construct the order in the first phase (at which time they validate its creation) and then you subsequently ask the order to be processed (after you have taken payment successfully).
Take the credit-card payment first and move your order into a "paid" state. Then attempt to fulfil the order with the T-Shirt service as an asynchronous process. If fulfilment fails and you can identify that the customer tried to get something printed the company is not prepared to produce, you will have to contact them to change their order or produce a refund.
Most organizations will adopt the second approach, due to its technical simplicity and reduced risk to the business. It also has the benefit of being able to cope with the T-Shirt service not being available; the asynchronous process simply waits until the service is available and completes the order at that time.
Exactly. That can be done as you suggest in your last sentence. The idea would be to decopule resource creation (that will always work unless network failures) that represents an "ongoing request" of the "order acceptation", that can be later decided. As POST returns a "Location" header, you can then retrieve in any moment the "status" of your request.
At some point it may become either accepted or rejected. This may be intantaneous or it may take some time, so you have to design your service with these restrictions (i.e. allowing the client to check if his/her order is accepted, or running some kind of hourly/daily service that collect accepted requests).

When do you use POST and when do you use GET?

From what I can gather, there are three categories:
Never use GET and use POST
Never use POST and use GET
It doesn't matter which one you use.
Am I correct in assuming those three cases? If so, what are some examples from each case?
Use POST for destructive actions such as creation (I'm aware of the irony), editing, and deletion, because you can't hit a POST action in the address bar of your browser. Use GET when it's safe to allow a person to call an action. So a URL like:
http://myblog.org/admin/posts/delete/357
Should bring you to a confirmation page, rather than simply deleting the item. It's far easier to avoid accidents this way.
POST is also more secure than GET, because you aren't sticking information into a URL. And so using GET as the method for an HTML form that collects a password or other sensitive information is not the best idea.
One final note: POST can transmit a larger amount of information than GET. 'POST' has no size restrictions for transmitted data, whilst 'GET' is limited to 2048 characters.
In brief
Use GET for safe andidempotent requests
Use POST for neither safe nor idempotent requests
In details
There is a proper place for each. Even if you don't follow RESTful principles, a lot can be gained from learning about REST and how a resource oriented approach works.
A RESTful application will use GETs for operations which are both safe and idempotent.
A safe operation is an operation which does not change the data requested.
An idempotent operation is one in which the result will be the same no matter how many times you request it.
It stands to reason that, as GETs are used for safe operations they are automatically also idempotent. Typically a GET is used for retrieving a resource (a question and its associated answers on stack overflow for example) or collection of resources.
A RESTful app will use PUTs for operations which are not safe but idempotent.
I know the question was about GET and POST, but I'll return to POST in a second.
Typically a PUT is used for editing a resource (editing a question or an answer on stack overflow for example).
A POST would be used for any operation which is neither safe or idempotent.
Typically a POST would be used to create a new resource for example creating a NEW SO question (though in some designs a PUT would be used for this also).
If you run the POST twice you would end up creating TWO new questions.
There's also a DELETE operation, but I'm guessing I can leave that there :)
Discussion
In practical terms modern web browsers typically only support GET and POST reliably (you can perform all of these operations via javascript calls, but in terms of entering data in forms and pressing submit you've generally got the two options). In a RESTful application the POST will often be overriden to provide the PUT and DELETE calls also.
But, even if you are not following RESTful principles, it can be useful to think in terms of using GET for retrieving / viewing information and POST for creating / editing information.
You should never use GET for an operation which alters data. If a search engine crawls a link to your evil op, or the client bookmarks it could spell big trouble.
Use GET if you don't mind the request being repeated (That is it doesn't change state).
Use POST if the operation does change the system's state.
Short Version
GET: Usually used for submitted search requests, or any request where you want the user to be able to pull up the exact page again.
Advantages of GET:
URLs can be bookmarked safely.
Pages can be reloaded safely.
Disadvantages of GET:
Variables are passed through url as name-value pairs. (Security risk)
Limited number of variables that can be passed. (Based upon browser. For example, Internet Explorer is limited to 2,048 characters.)
POST: Used for higher security requests where data may be used to alter a database, or a page that you don't want someone to bookmark.
Advantages of POST:
Name-value pairs are not displayed in url. (Security += 1)
Unlimited number of name-value pairs can be passed via POST. Reference.
Disadvantages of POST:
Page that used POST data cannot be bookmark. (If you so desired.)
Longer Version
Directly from the Hypertext Transfer Protocol -- HTTP/1.1:
9.3 GET
The GET method means retrieve whatever information (in the form of an entity) is identified by the Request-URI. If the Request-URI refers to a data-producing process, it is the produced data which shall be returned as the entity in the response and not the source text of the process, unless that text happens to be the output of the process.
The semantics of the GET method change to a "conditional GET" if the request message includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field. A conditional GET method requests that the entity be transferred only under the circumstances described by the conditional header field(s). The conditional GET method is intended to reduce unnecessary network usage by allowing cached entities to be refreshed without requiring multiple requests or transferring data already held by the client.
The semantics of the GET method change to a "partial GET" if the request message includes a Range header field. A partial GET requests that only part of the entity be transferred, as described in section 14.35. The partial GET method is intended to reduce unnecessary network usage by allowing partially-retrieved entities to be completed without transferring data already held by the client.
The response to a GET request is cacheable if and only if it meets the requirements for HTTP caching described in section 13.
See section 15.1.3 for security considerations when used for forms.
9.5 POST
The POST method is used to request that the origin server accept the
entity enclosed in the request as a new subordinate of the resource
identified by the Request-URI in the Request-Line. POST is designed
to allow a uniform method to cover the following functions:
Annotation of existing resources;
Posting a message to a bulletin board, newsgroup, mailing list,
or similar group of articles;
Providing a block of data, such as the result of submitting a
form, to a data-handling process;
Extending a database through an append operation.
The actual function performed by the POST method is determined by the
server and is usually dependent on the Request-URI. The posted entity
is subordinate to that URI in the same way that a file is subordinate
to a directory containing it, a news article is subordinate to a
newsgroup to which it is posted, or a record is subordinate to a
database.
The action performed by the POST method might not result in a
resource that can be identified by a URI. In this case, either 200
(OK) or 204 (No Content) is the appropriate response status,
depending on whether or not the response includes an entity that
describes the result.
The first important thing is the meaning of GET versus POST :
GET should be used to... get... some information from the server,
while POST should be used to send some information to the server.
After that, a couple of things that can be noted :
Using GET, your users can use the "back" button in their browser, and they can bookmark pages
There is a limit in the size of the parameters you can pass as GET (2KB for some versions of Internet Explorer, if I'm not mistaken) ; the limit is much more for POST, and generally depends on the server's configuration.
Anyway, I don't think we could "live" without GET : think of how many URLs you are using with parameters in the query string, every day -- without GET, all those wouldn't work ;-)
Apart from the length constraints difference in many web browsers, there is also a semantic difference. GETs are supposed to be "safe" in that they are read-only operations that don't change the server state. POSTs will typically change state and will give warnings on resubmission. Search engines' web crawlers may make GETs but should never make POSTs.
Use GET if you want to read data without changing state, and use POST if you want to update state on the server.
My general rule of thumb is to use Get when you are making requests to the server that aren't going to alter state. Posts are reserved for requests to the server that alter state.
One practical difference is that browsers and webservers have a limit on the number of characters that can exist in a URL. It's different from application to application, but it's certainly possible to hit it if you've got textareas in your forms.
Another gotcha with GETs - they get indexed by search engines and other automatic systems. Google once had a product that would pre-fetch links on the page you were viewing, so they'd be faster to load if you clicked those links. It caused major havoc on sites that had links like delete.php?id=1 - people lost their entire sites.
Use GET when you want the URL to reflect the state of the page. This is useful for viewing dynamically generated pages, such as those seen here. A POST should be used in a form to submit data, like when I click the "Post Your Answer" button. It also produces a cleaner URL since it doesn't generate a parameter string after the path.
Because GETs are purely URLs, they can be cached by the web browser and may be better used for things like consistently generated images. (Set an Expiry time)
One example from the gravatar page: http://www.gravatar.com/avatar/4c3be63a4c2f539b013787725dfce802?d=monsterid
GET may yeild marginally better performance, some webservers write POST contents to a temporary file before invoking the handler.
Another thing to consider is the size limit. GETs are capped by the size of the URL, 1024 bytes by the standard, though browsers may support more.
Transferring more data than that should use a POST to get better browser compatibility.
Even less than that limit is a problem, as another poster wrote, anything in the URL could end up in other parts of the brower's UI, like history.
1.3 Quick Checklist for Choosing HTTP GET or POST
Use GET if:
The interaction is more like a question (i.e., it is a safe operation such as a query, read operation, or lookup).
Use POST if:
The interaction is more like an order, or
The interaction changes the state of the resource in a way that the user would perceive (e.g., a subscription to a service), or
The user be held accountable for the results of the interaction.
Source.
There is nothing you can't do per-se. The point is that you're not supposed to modify the server state on an HTTP GET. HTTP proxies assume that since HTTP GET does not modify the state then whether a user invokes HTTP GET one time or 1000 times makes no difference. Using this information they assume it is safe to return a cached version of the first HTTP GET. If you break the HTTP specification you risk breaking HTTP client and proxies in the wild. Don't do it :)
This traverses into the concept of REST and how the web was kinda intended on being used. There is an excellent podcast on Software Engineering radio that gives an in depth talk about the use of Get and Post.
Get is used to pull data from the server, where an update action shouldn't be needed. The idea being is that you should be able to use the same GET request over and over and have the same information returned. The URL has the get information in the query string, because it was meant to be able to be easily sent to other systems and people like a address on where to find something.
Post is supposed to be used (at least by the REST architecture which the web is kinda based on) for pushing information to the server/telling the server to perform an action. Examples like: Update this data, Create this record.
i dont see a problem using get though, i use it for simple things where it makes sense to keep things on the query string.
Using it to update state - like a GET of delete.php?id=5 to delete a page - is very risky. People found that out when Google's web accelerator started prefetching URLs on pages - it hit all the 'delete' links and wiped out peoples' data. Same thing can happen with search engine spiders.
POST can move large data while GET cannot.
But generally it's not about a shortcomming of GET, rather a convention if you want your website/webapp to be behaving nicely.
Have a look at http://www.w3.org/2001/tag/doc/whenToUseGet.html
From RFC 2616:
9.3 GET
The GET method means retrieve whatever information (in the form of
an entity) is identified by the
Request-URI. If the Request-URI refers
to a data-producing process, it is the
produced data which shall be returned
as the entity in the response and not
the source text of the process, unless
that text happens to be the output of
the process.
9.5 POST The POST method is used to request that the origin server
accept the entity enclosed in the
request as a new subordinate of the
resource identified by the Request-URI
in the Request-Line. POST is designed
to allow a uniform method to cover the
following functions:
Annotation of existing resources;
Posting a message to a bulletin board, newsgroup, mailing list, or
similar group of articles;
Providing a block of data, such as the result of submitting a form, to a
data-handling process;
Extending a database through an append operation.
The actual function performed by the
POST method is determined by the
server and is usually dependent on the
Request-URI. The posted entity is
subordinate to that URI in the same
way that a file is subordinate to a
directory containing it, a news
article is subordinate to a newsgroup
to which it is posted, or a record is
subordinate to a database.
The action performed by the POST
method might not result in a resource
that can be identified by a URI. In
this case, either 200 (OK) or 204 (No
Content) is the appropriate response
status, depending on whether or not
the response includes an entity that
describes the result.
I use POST when I don't want people to see the QueryString or when the QueryString gets large. Also, POST is needed for file uploads.
I don't see a problem using GET though, I use it for simple things where it makes sense to keep things on the QueryString.
Using GET will allow linking to a particular page possible too where POST would not work.
The original intent was that GET was used for getting data back and POST was to be anything. The rule of thumb that I use is that if I'm sending anything back to the server, I use POST. If I'm just calling an URL to get back data, I use GET.
Read the article about HTTP in the Wikipedia. It will explain what the protocol is and what it does:
GET
Requests a representation of the specified resource. Note that GET should not be used for operations that cause side-effects, such as using it for taking actions in web applications. One reason for this is that GET may be used arbitrarily by robots or crawlers, which should not need to consider the side effects that a request should cause.
and
POST
Submits data to be processed (e.g., from an HTML form) to the identified resource. The data is included in the body of the request. This may result in the creation of a new resource or the updates of existing resources or both.
The W3C has a document named URIs, Addressability, and the use of HTTP GET and POST that explains when to use what. Citing
1.3 Quick Checklist for Choosing HTTP GET or POST
Use GET if:
The interaction is more like a question (i.e., it is a
safe operation such as a query, read operation, or lookup).
and
Use POST if:
The interaction is more like an order, or
The interaction changes the state of the resource in a way that the user would perceive (e.g., a subscription to a service), or
o The user be held accountable for the results of the interaction.
However, before the final decision to use HTTP GET or POST, please also consider considerations for sensitive data and practical considerations.
A practial example would be whenever you submit an HTML form. You specify either post or get for the form action. PHP will populate $_GET and $_POST accordingly.
In PHP, POST data limit is usually set by your php.ini. GET is limited by server/browser settings I believe - usually around 255 bytes.
From w3schools.com:
What is HTTP?
The Hypertext Transfer Protocol (HTTP) is designed to enable
communications between clients and servers.
HTTP works as a request-response protocol between a client and server.
A web browser may be the client, and an application on a computer that
hosts a web site may be the server.
Example: A client (browser) submits an HTTP request to the server;
then the server returns a response to the client. The response
contains status information about the request and may also contain the
requested content.
Two HTTP Request Methods: GET and POST
Two commonly used methods for a request-response between a client and
server are: GET and POST.
GET – Requests data from a specified resource POST – Submits data to
be processed to a specified resource
Here we distinguish the major differences:
Well one major thing is anything you submit over GET is going to be exposed via the URL. Secondly as Ceejayoz says, there is a limit on characters for a URL.
Another difference is that POST generally requires two HTTP operations, whereas GET only requires one.
Edit: I should clarify--for common programming patterns. Generally responding to a POST with a straight up HTML web page is a questionable design for a variety of reasons, one of which is the annoying "you must resubmit this form, do you wish to do so?" on pressing the back button.
As answered by others, there's a limit on url size with get, and files can be submitted with post only.
I'd like to add that one can add things to a database with a get and perform actions with a post. When a script receives a post or a get, it can do whatever the author wants it to do. I believe the lack of understanding comes from the wording the book chose or how you read it.
A script author should use posts to change the database and use get only for retrieval of information.
Scripting languages provided many means with which to access the request. For example, PHP allows the use of $_REQUEST to retrieve either a post or a get. One should avoid this in favor of the more specific $_GET or $_POST.
In web programming, there's a lot more room for interpretation. There's what one should and what one can do, but which one is better is often up for debate. Luckily, in this case, there is no ambiguity. You should use posts to change data, and you should use get to retrieve information.
HTTP Post data doesn't have a specified limit on the amount of data, where as different browsers have different limits for GET's. The RFC 2068 states:
Servers should be cautious about
depending on URI lengths above 255
bytes, because some older client or
proxy implementations may not properly
support these lengths
Specifically you should the right HTTP constructs for what they're used for. HTTP GET's shouldn't have side-effects and can be safely refreshed and stored by HTTP Proxies, etc.
HTTP POST's are used when you want to submit data against a url resource.
A typical example for using HTTP GET is on a Search, i.e. Search?Query=my+query
A typical example for using a HTTP POST is submitting feedback to an online form.
Gorgapor, mod_rewrite still often utilizes GET. It just allows to translate a friendlier URL into a URL with a GET query string.
Simple version of POST GET PUT DELETE
use GET - when you want to get any resource like List of data based on any Id or Name
use POST - when you want to send any data to server. keep in mind POST is heavy weight operation because for updation we should use PUT instead of POST
internally POST will create new resource
use PUT - when you

Resources