Breaking a possible infinite loop in AWS step functions - aws-lambda

I am writing a state machine with the following functionality.
Start state -> Lambda1, which calls an external service's Describe API endpoint to get the state attribute of an item, e.g. "isOkay" or "isNotOkay" -> Choice state (depending on the state received): if "isOkay", move to the next state; if "isNotOkay", call Lambda1 again. This repeats until it gets an "isOkay" state. How can I put a limit on this custom retry loop so that I don't get stuck if I never receive an "isOkay" response?

You can pass a counter in your step's input and have the Lambda increment it. When it is returned on each retry it can be checked against a limit; if it crosses that limit, fail the Lambda with a custom exception. Define a separate step for handling the exception.
https://docs.aws.amazon.com/step-functions/latest/dg/input-output-inputpath-params.html
https://docs.aws.amazon.com/step-functions/latest/dg/concepts-error-handling.html
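For illustration, here is a minimal sketch of the Lambda side of that idea as a Node.js (TypeScript) handler. The retryCount field, the MAX_RETRIES limit, the describeItem() helper, and the RetryLimitExceededError name are all assumptions for this example; your Choice state would branch on the returned status and loop back to the Task while passing retryCount through.
// Hypothetical input/output shape carried through the state machine.
interface LoopInput {
  itemId: string;
  retryCount?: number; // counter carried in the step input
}

const MAX_RETRIES = 5; // assumed limit for the custom retry loop

export const handler = async (event: LoopInput) => {
  const retryCount = (event.retryCount ?? 0) + 1;

  // Fail the Lambda with a custom error once the limit is crossed;
  // a Catch on the Task state can route this error to a failure-handling step.
  if (retryCount > MAX_RETRIES) {
    const err = new Error(`Gave up after ${MAX_RETRIES} attempts`);
    err.name = "RetryLimitExceededError";
    throw err;
  }

  const status = await describeItem(event.itemId); // your external Describe API call

  // Return the counter so the Choice state can loop back with it in the input.
  return { itemId: event.itemId, retryCount, status };
};

// Placeholder for the real external service call.
async function describeItem(itemId: string): Promise<"isOkay" | "isNotOkay"> {
  return "isNotOkay";
}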

Related

Project Reactor - react to a timeout that happened downstream

Project Reactor has a variety of timeout() operators.
The most basic variant raises a TimeoutException if no item arrives within the given Duration. The exception is propagated downstream, and a cancel signal is sent upstream.
Basically my question is: is it possible to somehow react (and do something) specifically to a timeout that happened downstream, not just to the cancellation that is sent after the timeout?
My question is based on the requirements of my real business case, and I'm also wondering whether there is a straightforward solution.
I'll simplify my code to make it easier to understand what I want to achieve.
Let's say I have the following reactive pipeline:
Flux.fromIterable(List.of(firstClient, secondClient))
    .concatMap(Client::callApi) // making API calls sequentially
    .collectList() // collecting results of API calls for further processing
    .timeout(Duration.ofMillis(3000)) // the entire process should not take more than duration specified
    .subscribe();
I have multiple clients for making API calls. The business requirement is to call them sequentially, so I call them with concatMap(). Then I collect all the results, and the entire process should not take more than a given Duration.
The Client interface:
interface Client {
    Mono<Result> callApi();
}
And the implementations:
Client firstClient = () ->
    Mono.delay(Duration.ofMillis(2000L)) // simulating delay of first api call
        .map(__ -> new Result())
        // !!! Pseudo-operator just to demonstrate what I want to achieve
        .doOnTimeoutDownstream(() ->
            log.info("First API call canceled due to downstream timeout!")
        );

Client secondClient = () ->
    Mono.delay(Duration.ofMillis(1500L)) // simulating delay of second api call
        .map(__ -> new Result())
        // !!! Pseudo-operator just to demonstrate what I want to achieve
        .doOnTimeoutDownstream(() ->
            log.info("Second API call canceled due to downstream timeout!")
        );
So, if I have not received and collected all the results within the amount of time specified, I need to know which API call was actually canceled due to the downstream timeout, and have some callback for this "event".
I know I could put a doOnCancel() callback on every client call (instead of the pseudo-operator I demonstrated) and it would work, but that callback reacts to cancellation, which may happen due to any error.
Of course, with proper exception handling (onErrorResume(), for example) it would work as I expect; however, I'm interested in whether there is a more direct way to react specifically to the timeout in this case.

Cypress async form validation - how to capture (possibly) quick state changes

I have some async form validation code that I'd like to put under test using Cypress. The code is pretty simple -
1. on user input, enter async validation UI state (or stay in that state if there are previous validation requests that haven't been responded to)
2. send a request to the server
3. receive a response
4. if there are no pending requests, leave async validation UI state
Step 1 is the part I want to test. Right now, this means checking if some element has been assigned some class -- but the state changes can happen very fast, and most of the time (not always!) Cypress times out waiting for something that has ALREADY happened (in other words, step 4 has already occurred by the time we get around to seeing if step 1 happened).
So the failing test looks like:
cy.get("#some-input").type("...");
cy.get("#some-target-element").should("have.class", "class-to-check-for");
Usually, by the time Cypress gets to the second line, step 4 has already run and the test fails. Is there a common pattern I should know about to solve this? I would naturally prefer not to have to change the code under test.
Edit 1:
I'm not certain that I've 100% solved the "race" condition here, but if I use the underlying native elements (discarding the jQuery abstraction), I haven't had a failure yet.
So, changing:
cy.get("#some-input").type("...")
to:
cy.get("#some-input").then(jQueryObj => {
let nativeElement = jQueryObj[0];
nativeElement.value = "...";
nativeElement.dispatchEvent(new Event("input")); // make sure the app knows this element changed
});
And then running Cypress' checks for what classes have / haven't been added has been effective.
You can stub the server request that happens during form validation and slow it down - see the delay parameter: https://docs.cypress.io/api/commands/route.html#Use-delays-for-responses
While the request is delayed, your app's validation UI is showing; you can assert on it, and then once the request finishes, check that the UI goes away.
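A rough sketch of that approach, using the legacy cy.server()/cy.route() API that the linked page documents (the /api/validate URL and the stubbed response body are assumptions; newer Cypress versions do the same thing with cy.intercept()):
cy.server();
cy.route({
  method: "POST",
  url: "/api/validate",      // hypothetical validation endpoint
  response: { valid: true }, // stubbed response body
  delay: 1000                // keep the request "in flight" for 1 second
}).as("validate");

cy.get("#some-input").type("...");

// While the stubbed request is delayed, the async-validation state should be visible.
cy.get("#some-target-element").should("have.class", "class-to-check-for");

// Once the response arrives, the validation UI should go away.
cy.wait("@validate");
cy.get("#some-target-element").should("not.have.class", "class-to-check-for");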

How to retry a failed action?

I went through the Spring Statemachine documentation but did not find clear answers for some scenarios. I would greatly appreciate it if someone could clarify my questions.
Scenario 1: How do I retry errors related to action failures? Let's say I have states S1, S2, and S3, and when we transition from S1 to S2 I want to perform action A2. If action A2 fails, I want to retry it at some time interval. Is that possible using Spring Statemachine?
Consider AWS Step Functions, for example: all the work in a Step Functions state machine is done by Task states, and a Task can be configured for retry.
transitions
    .withExternal()
        .source(States.S1)
        .target(States.S2)
        .event(Events.E1)
        .action(action());
Scenario 2: Let's say the state machine has states S1, S2, and S3, and the current state is S2. If the server goes down, will the state machine execution pick up from where it left off on startup, or will we have to do it all over again?
Scenario 3: When a guard returns false (possibly because of an error condition) and prevents a transition, what happens next?
There are two types of actions in Spring Statemachine - transition actions and state actions. In Scenario 1 you're talking about a transition action.
When you specify a transition action, you can also specify an error action to invoke if the action fails. This is clearly documented in the Spring Statemachine documentation.
.withExternal()
    .source(States.S1)
    .target(States.S2)
    .event(Events.E1)
    .action(action(), errorAction());
In your errorAction() method you can implement your logic.
Possible options are:
- transition to an earlier state and go down the same path again
- transition to a specific state (e.g. a retry state) where you can have your retry logic (e.g. a Task/Executor that retries the action N times), and from there transition to other states (e.g. action succeeded => continue the normal flow; action failed after N retries => transition to a failure terminal state)
There's also the official Tasks example, which demonstrates recovery/retry logic (source code).

Why does RxJS unsubscribe on error?

In short:
How do I keep listening after an error in a stream without putting a .catch before every .subscribe?
If you need more details, they are here:
Let's assume I have a Subject of the current user or null. I occasionally get the data from an API and send it to the Subject. It updates the view accordingly.
But at some point an error occurs on my server, and I want my application to continue working as before, but notify some places about the error and KEEP listening to my Subject.
Initially I thought that if I just do userSubject.error(...), it would only trigger the .catch callbacks and error handlers on subscribers and skip all success handlers and chains.
And that if I then called userSubject.next(...), all my chains and subscribers would work as before.
BUT unfortunately that is not the case. After the first uncaught .error, the subscribers are unsubscribed from the stream and they no longer operate.
So my question: Why???
And what should I do instead if I want to handle the null value normally but also handle errors only in some places?
Here is the link to the RxJS source code where the Subscriber unsubscribes on error:
https://github.com/ReactiveX/rxjs/blob/master/src/Subscriber.ts#L140
Rx observables follow the grammar next*(error|complete)?, meaning that they cannot produce anything after an error or complete notification has been delivered.
An explanation of why this matters can be found in the Rx design guidelines:
The single message indicating that an observable sequence has finished ensures that consumers of the observable sequence can deterministically establish that it is safe to perform cleanup operations.
A single failure further ensures that abort semantics can be maintained for operators that work on multiple observable sequences.
In short, if you want your observers to keep listening to the subject after a server error has occurred, do not deliver that error to the subject, but rather handle it in some other way (e.g. use catch, retry or deliver the error to a dedicated subject).
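For example, a minimal sketch of the "dedicated subject" idea (fetchUser(), the subject names, and the User type are assumptions here, and this uses the pipeable-operator syntax of recent RxJS versions):
import { Observable, Subject, of } from "rxjs";
import { catchError } from "rxjs/operators";

interface User { name: string; }

const userSubject = new Subject<User | null>();  // views subscribe here; it never receives an error
const userErrorSubject = new Subject<Error>();   // only the places that care about failures subscribe here

// Hypothetical API call; replace with your real HTTP request, which may error.
function fetchUser(): Observable<User> {
  return of({ name: "Alice" });
}

function refreshUser(): void {
  fetchUser()
    .pipe(
      catchError(err => {
        userErrorSubject.next(err);  // report the failure without terminating userSubject
        return of(null);             // keep "no user" as an ordinary next value
      })
    )
    .subscribe(user => userSubject.next(user));
}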
Every Observable emits zero or more next notifications followed by at most one error or complete notification, but never both.
For this reason, Subjects have internal state (once they have errored or completed, they stay terminated).
Then it depends on how you construct your chain. For example, you can use retry() to resubscribe to the source Observable on error.
Or when you pass values to your Subject you can send only next notifications and ignore the other two:
.subscribe(v => subject.next(v));
Or, if you want to throw an error when the user is null, you can use any operator that captures exceptions and sends them downstream as error notifications. For example, like this:
.map(v => {
    if (v === null) {
        throw new Error("It's broken");
    }
    return v;
})
Anyway it's hard to give more precise advice without any code.

Safe await on function in another process

TL;DR
How can I safely await the execution of a function (one that takes a str and an int as arguments and doesn't require any other context) in a separate process?
Long story
I have an aiohttp.web API that uses a Boost.Python wrapper around a C++ extension, runs under Gunicorn (and I plan to deploy it on Heroku), and is load-tested with Locust.
About the extension: it has just one function that performs a non-blocking operation - it takes one string (and one integer for timeout management), does some calculations with it, and returns a new string. For every input string there is only one possible output (except on timeout, but in that case a C++ exception must be raised and translated by Boost.Python to a Python-compatible one).
In short, a handler for a specific URL executes the code below:
res = await loop.run_in_executor(executor, func, *args)
where executor is a ProcessPoolExecutor instance, and func is a function from the C++ extension module. (In the real project, this code is in a coroutine method of a class, and func is a classmethod that only executes the C++ function and returns the result.)
Error catching
When a new request arrives, I extract its POST data with request.post() and then store that data in an instance of a custom class named Call (because I have no idea what else to name it). So a Call object contains all the input data (the string), the time the request was received, and the unique id that comes with the request.
Then it proceeds to a class named Handler (not the aiohttp request handler), which passes its input to another class's method with loop.run_in_executor inside. Handler also has a logging system that works like middleware - it reads the id and receive time of every incoming Call object and logs it with a message telling you whether it has just started executing, executed successfully, or ran into trouble. Handler also has a try/except and stores any errors inside the Call object, so that the logging middleware knows what error occurred, or what output the extension returned.
Testing
I have a unit test that just creates 256 coroutines with this code inside and an executor with 256 workers, and it works well.
But when testing with Locust, a problem appears. I use 4 Gunicorn workers and 4 executor workers for this kind of testing. At some point the application just starts to return wrong output.
My Locust TaskSet is configured to log every failed response with all available information: output string, error string, input string (which the application returns as well), and id. All simulated requests are the same, but the id is unique for each one.
The situation is better when Gunicorn's max_requests option is set to 100 requests, but failures still occur.
The interesting thing is that sometimes I can trigger a "wrong output" period simply by stopping and starting the Locust test.
I need a 100% guarantee that my web API works as I expect.
UPDATE & solution
I just asked my teammate to review the C++ code - the problem was global variables. Somehow it wasn't a problem for 256 parallel coroutines, but for Gunicorn it was.
