Why do some NiFi HTTP responses take so long? - apache-nifi

I've noticed that when I issue 200 responses via NiFi, the response is typically immediate. However, 404 and 500 errors seem to take so long that they often cause the client to timeout.
Is this an intentional behavior? Or is my HandleHTTPResponse processor possibly setup wrong?
--
Edit: While answered below, it's worth clarifying -- the HandleHTTPResponse was not behaving differently; I just happened to be routing [penalized] flowfiles to processors that were set to give 404/500 error codes ... so, it appeared there was a correlation.

The failed requests might be penalized. Check out the settings on your failure path and update the default 30 seconds value to 0, which makes more sense when handling expected http errors.

Without knowing which responses are taking this long, my guess is that the error responses are being generated by an exception which may be thrown due to an internal timeout (i.e. waiting for some other connection or operation which fails to complete, exhausting the timeout, which leads to the HTTP response taking so long). You can profile these operations in the JVM if you like.

Related

HTTP status code for creating too many resources

If there is a limit on the number of resources created using POST request, what should be the status code?
Let's say, there is a restriction on the number of resources created using POST wherein only 10 resources can be created. The 11th POST request should fail due to the above constraint. What should be the status code?
Should it be 422 with a meaningful message, something along the lines of "Resource count limit reached"? or is there a status code for this?
It really depends on your use-case.
If user is limited in time (let's say 10 per day) but might actually get more credits later automatically, I suggest 429 Too Many Requests as client sent to many requests in one day.
If credits are locked (ie: User only had 10 free credits), I suggest 403 Forbidden as the request is fully understood and processable but the server does deny it due to lack of credits.
Anyway 422 Unprocessable entity is not correct as the request is well formed and server might process it with given credits. Nothing is really missing from the request (from what I understand from your post).
I think that HTTP400 is appropriate, especially if you can provide helpful feedback in the error response. If a user is submitting an invalid payload in the request- it's a bad request. Anything else might get confusing.
Though, HTTP405 (Not Allowed) might be better. If there are no POST more requests accepted by the server for a particular resource that may be more accurate. However it really just depends on the future use of the API.

Kafka Producer is not retrying after Timeout

Intermittently(once or twice in a month) I am seeing the error
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for cart-topic-0: 5109 ms has passed since batch creation plus linger time
in my logs due to which the corresponding message was not processed by Kafka Producer.
Though all the brokers are up and available I'm not sure why this error is being observed. Even the load is not much during this period.
I have set the retries property value to 10 in Producer configs but still, the message was not been retried. Is there anything else I need to add for the Kafka send method? I have gone through the similar issues raised, but there is no proper conclusion for this error.
Can someone please help on how to fix this.
From the KIP proposal which is now addressed
We propose adding a new timeout delivery.timeout.ms. The window of enforcement includes batching in the accumulator, retries, and the inflight segments of the batch. With this config, the user has a guaranteed upper bound on when a record will either get sent, fail or expire from the point when send returns. In other words we no longer overload request.timeout.ms to act as a weak proxy for accumulator timeout and instead introduce an explicit timeout that users can rely on without exposing any internals of the producer such as the accumulator.
So basically, post this now you can additionally be able to configure a delivery timeout and retries for every async send you execute.
I had an issue where retries were not being obeyed, but in my particular case it was because we were calling the get() method on send for synchronous behaviour. We hadn't realized it would impact retries.
In investigating the issue through various paths I came across the definition of the sorts of errors that are retrial
https://kafka.apache.org/11/javadoc/org/apache/kafka/common/errors/RetriableException.html
What had confused me is that timeout was listed as a retrial one.
I would normally have suggested you would want to look into if the delivery of your batches was taking too long and messages in your buffer were expiring due to increased volume, but you've mentioned that the volume isn't particularly high.
Did you determine if increasing the request.timeout.ms has an impact on the frequency of occurrence? It might be more of a treating the symptom step than the cause.

SparkPost API error: 'Too many requests'

When I run requests series like
https://api.sparkpost.com:443/api/v1/suppression-list/he**0#gmail.com
Sometimes, I get error:
name: 'SparkPostError',
errors: [ { message: 'Too many requests' } ],
statusCode: 429
Too many - this is how many?
How long the server keeps track of period? How long the server resets the counter? How to solve this problem?
https://developers.sparkpost.com/api/index.html#header-rate-limiting
After a quick look at the site...
The answer of support
As mentioned before we do limit requests on our endpoints to prevent abuse and while we can’t reveal the actual limits we do recommend that you wait several seconds before making consecutive requests. If you are making these requests via some type of automated process (script, code, etc.), we highly recommend that you add a wait for several seconds upon seeing this 429 error and reattempt after. Decreasing the frequency of your requests and waiting before reattempting is the only way around this error message.
https://developers.sparkpost.com/api/index.html#header-rate-limiting
Rate Limiting
Note: To prevent abuse, our servers enforce request rate limiting, which may trigger responses with HTTP status code 429.
SparkPost implements rate limiting on the following API endpoints:
/api/v1/message-events
/api/v1/metrics/*
The limits imposed here are dynamic but as a general rule, polling these endpoints more than once in 2 minutes may encounter rate limiting and a 429 status code.

Does empty "Expect:" header mean anything?

Many libraries include Expect: 100-continue on all HTTP 1.1 POST and PUT requests by default.
I intend to reduce perceived latency by removing 100-continue mechanism on the client side on those requests for which I know the expense of sending data right away is less than waiting a roundtrip for 100-continue, namely on short requests.
Of course I still want all the other great features of HTTP 1.1, thus only I want to kill Expect: 100-continue header. I have two options:
remove expect header entirely, or
send empty expect header, Expect:\r\n
Is there ever any difference between the two?
Any software that might break for one or the other?
Nothing should break if you remove the Expect header, but I know that Microsoft IIS has had issues with 100 Continue in the past. For example, IIS5 always sends 100 continue responses. So, I wonder if at least some of the uses of it in libraries might be to work around similarly broken behaviour in servers.
Many libraries seem to set this header and then not actually handle 100 Continue properly - e.g. they begin to send the request body immediately without waiting for a 100 Continue and then don't handle the fact that the server might send back any HTTP error code before they've finished sending the request body (the first part's OK, it's the second part which is broken - see later in my answer). This leads me to believe that some authors have just copied it from elsewhere without fully understanding the subtleties.
I can't see any reason to include a blank Expect header - if you're not going to include 100-continue (or some other Expect clause) then omit the header entirely. The only reason to include it would be to work around broken webservers, but I'm not aware of any which behave in this way.
Finally, if you're just looking to reduce roundtrip latencies it seems to me that it wouldn't actually be inconsistent with the RFC to simply begin to transmit the request body immediately. You're not supposed to wait indefinitely to send the request body (as per the RFC), so you're behaving to the spec - it's just your timeout before sending anyway is zero.
You must be aware that servers are at liberty to not send the 100 Continue response if they've already received some of the request body, so you have to handle servers which send 100 Continue, those which send nothing and wait for the full request and those which immediately send any HTTP error code (which may be 417, but more likely a generic 4xx code). In this way, your short requests shouldn't have any overhead (aside from the Expect header) but you won't have to wait for the 100 Continue. Of course, for this approach to work you'll need to be doing things in a way which lets you interrupt the request as soon as the server returns an error code (e.g. non-blocking IO with poll() or select()).
Doing things this way might help keep your code more consistent between small and large requests while reducing the latency. The downside is that it's perhaps not what the RFC authors had in mind, even if it doesn't explicitly violate any of the requirements. Also, it might make your later code more complicated if you're not already doing non-blocking IO or similar.

Is it wrong to return 202 "Accepted" in response to HTTP GET?

I have a set of resources whose representations are lazily created. The computation to construct these representations can take anywhere from a few milliseconds to a few hours, depending on server load, the specific resource, and the phase of the moon.
The first GET request received for the resource starts the computation on the server. If the computation completes within a few seconds, the computed representation is returned. Otherwise, a 202 "Accepted" status code is returned, and the client must poll the resource until the final representation is available.
The reason for this behavior is the following: If a result is available within a few seconds, it needs to be retrieved as soon as possible; otherwise, when it becomes available is not important.
Due to limited memory and the sheer volume of requests, neither NIO nor long polling is an option (i.e. I can't keep nearly enough connections open, nor even can I even fit all of the requests in memory; once "a few seconds" have passed, I persist the excess requests). Likewise, client limitations are such that they cannot handle a completion callback, instead. Finally, note I'm not interested in creating a "factory" resource that one POSTs to, as the extra roundtrips mean we fail the piecewise realtime constraint more than is desired (moreover, it's extra complexity; also, this is a resource that would benefit from caching).
I imagine there is some controversy over returning a 202 "Accepted" status code in response to a GET request, seeing as I've never seen it in practice, and its most intuitive use is in response to unsafe methods, but I've never found anything specifically discouraging it. Moreover, am I not preserving both safety and idempotency?
So, what do folks think about this approach?
EDIT: I should mention this is for a so-called business web API--not for browsers.
If it's for a well-defined and -documented API, 202 sounds exactly right for what's happening.
If it's for the public Internet, I would be too worried about client compatibility. I've seen so many if (status == 200) hard-coded.... In that case, I would return a 200.
Also, the RFC makes no indication that using 202 for a GET request is wrong, while it makes clear distinctions in other code descriptions (e.g. 200).
The request has been accepted for processing, but the processing has not been completed.
We did this for a recent application, a client (custom application, not a browser) POST'ed a query and the server would return 202 with a URI to the "job" being posted - the client would use that URI to poll for the result - this seems to fit nicely with what was being done.
The most important thing here is anyway to document how your service/API works, and what a response of 202 means.
From what I can recall - GET is supposed to return a resource without modifying the server. Maybe activity will be logged or what have you, but the request should be rerunnable with the same result.
POST on the other hand is a request to change the state of something on the server. Insert a record, delete a record, run a job, something like that. 202 would be appropriate for a POST that returned but isn't finished, but not really a GET request.
It's all very puritan and not well practiced in the wild, so you're probably safe by returning 202. GET should return 200. POST can return 200 if it finished or 202 if it's not done.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
In case of a resource that is supposed to have a representation of an entity that is clearly specified by an ID (as opposed to a "factory" resource, as described in the question), I recommend staying with the GET method and, in a situation when the entity/representation is not available because of lazy-creation or any other temporary situation, use the 503 Service Unavailable response code that is more appropriate and was actually designed for situations like this one.
Reasoning for this can be found in the RFCs for HTTP itself (please verify the description of the 503 response code), as well as on numerous other resources.
Please compare with HTTP status code for temporarily unavailable pages. Although that question is about a different use case, it actually relates to the exact same feature of HTTP.

Resources