We are using Spring Cloud Stream with multiple bindings based on the Kafka Streams binder.
The output of /actuator/health correctly lists all our bindings and their state (RUNNING) - see example below.
Our expectation was that when a binding is stopped using
curl -d '{"state":"STOPPED"}' -H "Content-Type: application/json" -X POST http://<host>:<port>/actuator/bindings/mystep1,
it would still be listed, but with threadState = NOT_RUNNING or SHUTDOWN, and that the overall health status would be DOWN.
This is not the case! After stopping a binding, it is removed from the list and the overall state of /actuator/health is still UP.
Is there a reason for this? We would like to have an alert on this execution state of our application.
Are there code examples of how we can achieve this with a customized solution based on KafkaStreamsBinderHealthIndicator?
Example output of /actuator/health with Kafka Streams:
{
  "status": "UP",
  "components": {
    "binders": {
      "status": "UP",
      "components": {
        "kstream": {
          "status": "UP",
          "details": {
            "mystep1": {
              "threadState": "RUNNING",
              ...
            },
            "mystep2": {
              "threadState": "RUNNING",
              ...
            },
            ...
          }
        }
      }
    },
    "refreshScope": {
      "status": "UP"
    }
  }
}
UPDATE on the exact situation:
We do not stop the binding manually via the bindings endpoint.
We have implemented integrated error queues for runtime errors within all processing steps, based on StreamBridge.
The solution also has a kind of circuit-breaker feature: it stops a binding from within the code when a configurable limit of consecutive runtime errors is reached, because we do not want to flood our internal error queues.
Our application is monitored by Icinga via /actuator/health, therefore we would like to get an alarm when one of the bindings is stopped.
Switching Icinga to another endpoint like /actuator/bindings cannot be done easily by our team.
Presently, the Kafka Streams binder health indicator only considers the currently active Kafka Streams processors for the health check, so what you are seeing in the output when the binding is stopped is expected. Since you used the bindings endpoint to stop the binding, you can use /actuator/bindings to get the status of the bindings; there you will see the state of all the bindings in the stopped processor as stopped. Does that satisfy your use case?
If not, please open a new issue in the repository and we could consider making changes in the binder so that the health indicator is configurable by users. At the moment, applications cannot customize the health check implementation. We could also consider adding a property with which you can force the stopped/inactive Kafka Streams processors to be included in the health check output. This is going to be tricky, though - e.g. what should the overall health status be if some processors are down?
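In the meantime, a workaround that does not require changes to the binder is to register an additional custom HealthIndicator, next to the built-in one, that inspects the binding lifecycle state. Below is a minimal sketch, assuming a Spring Cloud Stream version that exposes BindingsLifecycleController (the class backing the /actuator/bindings endpoint) and that Binding#isRunning reflects the stopped state; the binding names and the class name are illustrative, and in recent versions queryState may return a List<Binding<?>> instead of a single Binding, so adjust accordingly.

import java.util.List;

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.cloud.stream.binding.Binding;
import org.springframework.cloud.stream.binding.BindingsLifecycleController;
import org.springframework.stereotype.Component;

// Sketch only: reports DOWN as soon as one of the named bindings is not running,
// so a monitor polling /actuator/health (e.g. Icinga) can raise an alarm.
@Component
public class BindingStateHealthIndicator implements HealthIndicator {

    // Hard-coded here for illustration; these could come from configuration.
    private static final List<String> MONITORED_BINDINGS = List.of("mystep1", "mystep2");

    private final BindingsLifecycleController controller;

    public BindingStateHealthIndicator(BindingsLifecycleController controller) {
        this.controller = controller;
    }

    @Override
    public Health health() {
        Health.Builder builder = Health.up();
        for (String name : MONITORED_BINDINGS) {
            Binding<?> binding = controller.queryState(name);
            boolean running = binding != null && binding.isRunning();
            builder.withDetail(name, running ? "RUNNING" : "STOPPED");
            if (!running) {
                builder.down();
            }
        }
        return builder.build();
    }
}

The same controller exposes changeState(name, State.STOPPED), which is presumably what the circuit-breaker feature described in the update already uses, so this approach keeps the Icinga check on /actuator/health without touching the binder itself.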
Related
I would like to clarify in my mind how different kinds of events could be implemented in an EDA system (a system with an Event-Driven Architecture) that applies DDD (Domain-Driven Design). Let's assume we are not using event sourcing.
More specifically, having read the relevant articles, there seem to be three kinds of events:
Event notification: This kind of event does not carry much detail; it just notifies that the event has happened, providing a way to query for more information.
"type": "paycheck-generated",
"event-id": "537ec7c2-d1a1-2005-8654-96aee1116b72",
"delivery-id": "05011927-a328-4860-a106-737b2929db4e",
"timestamp": 1615726445,
"payload": {
"employee-id": "456123",
"link": "/paychecks/456123/2021/01" }
}
Event-carried state transfer (ECST): This event seems to come in two flavours: either it carries a delta of the information that changed, or it contains all the relevant information (a snapshot) about a resource.
A snapshot carrying the full state:
{
  "type": "customer-updated",
  "event-id": "6b7ce6c6-8587-4e4f-924a-cec028000ce6",
  "customer-id": "01b18d56-b79a-4873-ac99-3d9f767dbe61",
  "timestamp": 1615728520,
  "payload": {
    "first-name": "Carolyn",
    "last-name": "Hayes",
    "phone": "555-1022",
    "status": "follow-up-set",
    "follow-up-date": "2021/05/08",
    "birthday": "1982/04/05",
    "version": 7
  }
}
A delta carrying only what changed:
{
  "type": "customer-updated",
  "event-id": "6b7ce6c6-8587-4e4f-924a-cec028000ce6",
  "customer-id": "01b18d56-b79a-4873-ac99-3d9f767dbe61",
  "timestamp": 1615728520,
  "payload": {
    "status": "follow-up-set",
    "follow-up-date": "2021/05/10",
    "version": 8
  }
}
Domain event: This kind of event lies somewhere between the other two; it carries more information than an event notification, but that information is relevant to a specific domain.
The examples above for each kind are from Khononov's book (Learning Domain-Driven Design: Aligning Software Architecture and Business Strategy).
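For comparison, a domain event might look like the following (this example is illustrative, not from the book): it records a business fact inside the customer domain, carrying more context than a bare notification but less than a full ECST snapshot.
{
  "type": "follow-up-set",
  "event-id": "e5f214d2-ef54-4a18-9cce-2d01d2a43b70",
  "customer-id": "01b18d56-b79a-4873-ac99-3d9f767dbe61",
  "timestamp": 1615728520,
  "payload": {
    "status": "follow-up-set",
    "follow-up-date": "2021/05/10"
  }
}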
Having said all that, I would like to clarify the following questions:
(1) Is the typical use of the event-carried state transfer (ECST) and event notification types of events in the form of integration events (in a DDD EDA system) when communicating with other bounded contexts (via transforming domain events into integration events, depending on the use case)?
(2) Are there one or more other typical categories of events in a Domain-Driven-Designed system utilising Event-Driven Architecture? For example: event notifications when domain errors occur, used to notify the client of the specific error (in which case no aggregate persistence takes place). How are these kinds of errors propagated back to the client, and what would the DDD/EDA community call such events?
I have an AWS Lambda whose trigger is an event from an EventBridge rule.
The rule looks like this:
{
  "detail-type": ["ECS Task State Change"],
  "source": ["aws.ecs"],
  "detail": {
    "stopCode": ["EssentialContainerExited", "UserInitiated"],
    "clusterArn": ["arn:aws:ecs:.........."],
    "containers": {
      "name": ["some name"]
    },
    "lastStatus": ["DEACTIVATING"],
    "desiredStatus": ["STOPPED"]
  }
}
This event is normally emitted when an ECS task's status changes (in this case, when a task is killed).
My questions are:
Can I simulate this event from the command line, perhaps by running aws events put-events --entries file://putevents.json? (What should I write in the putevents.json file?)
Can I simulate this event from JavaScript code?
TL;DR Yes and yes, provided you deal with the limitation that user-generated events cannot have a source that begins with aws.
Send custom events to EventBridge with the PutEvents API. The API is available in the CLI as well as in the SDKs (see AWS JS SDK). The list of custom events you pass in the entries parameter must have three fields at a minimum:
[
  {
    "Source": "my-custom-event", // cannot start with "aws"!
    "DetailType": "ECS Task State Change",
    "Detail": "{}" // a JSON string; copy its contents from the ECS sample events docs
  }
]
The ECS task state change event samples in the ECS documentation make handy templates for your custom events. You can safely prune any non-required field that you don't need for pattern matching.
Custom events are not permitted to mimic the aws system event sources. So amend your rule to also match on your custom source name:
"source": ["aws.ecs", "my-custom-event"],
I'm considering using feature flags in a web-based app that has both JavaScript/HTML and native mobile clients, and am trying to make an informed decision on the following:
Should feature flags be exposed to client applications?
When discussing this with others, two approaches have emerged for how clients deal with feature flags:
1) Clients know nothing about feature flags at all.
Server-side endpoints that respond with data would include extra data to say whether a feature is on or off.
e.g. for a fictional endpoint, /posts, data could be returned like so:
enhanced ui feature enabled:
{
  "enhanced_ui": true,
  "posts": [1, 2, 3, 4, 5]
}
enhanced ui feature disabled:
{
  "enhanced_ui": false,
  "posts": [1, 2, 3, 4, 5]
}
2) Clients can access an endpoint and ask for feature flag states, e.g. /flagstates:
{
  "enhanced_ui": true
}
Clients then use this to hide or show features as required.
Some thoughts:
Approach #1 has fewer moving parts - no client-side libraries are needed for implementing gates at all.
The question comes up though: when dynamic flags are updated, how do clients know? We could implement pub/sub to receive notifications and reload clients, so they'd automatically get the up-to-date data.
Approach #2 feels like it might be easier to manage listening for flag updates, since it's a single endpoint that returns features, and state changes could be pushed out easily; see the sketch below.
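For illustration, here is a minimal sketch of approach #2's endpoint (Spring-style Java purely as an example; FeatureFlagService is a hypothetical application service, not a real library):

import java.util.Map;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical service that knows the current flag states, e.g. from a database.
interface FeatureFlagService {
    Map<String, Boolean> currentStates();
}

@RestController
class FlagStatesController {

    private final FeatureFlagService flags;

    FlagStatesController(FeatureFlagService flags) {
        this.flags = flags;
    }

    // Single endpoint that clients poll (or re-fetch on a pub/sub notification),
    // instead of every data endpoint carrying flag fields.
    @GetMapping("/flagstates")
    public Map<String, Boolean> flagStates() {
        return flags.currentStates(); // e.g. {"enhanced_ui": true}
    }
}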
This is something that interested me as well, as I have a requirement to implement feature flags/switches in the product I am working on. I have been researching this area for the past week, and I will share my findings and thoughts (I am not claiming they are best practice in any way). They are heavily based on ASP.Net Zero and ASP.Net Boilerplate, as I found these to be the closest match to an example implementation of what I am looking for.
Should feature flags be exposed to client applications?
Yes and no. If you are building a software-as-a-service product (potentially with multitenancy), then you will most likely need some sort of management UI where admin users can manage (CRUD/enable/disable) features. This means that if you are building a SPA, you will have to implement endpoints in your API (appropriately secured, of course) that your front end can use to retrieve details about features and their current state for editing purposes. That could look something like this:
"features": [
{
"parentName": "string",
"name": "string",
"displayName": "string",
"description": "string",
"defaultValue": "string",
"inputType": {
"name": "string",
"attributes": {
"additionalProp1": {},
"additionalProp2": {},
"additionalProp3": {}
},
....
The model for features can, of course, vary based on your problem domain, but the above should give you an idea of a generic model for holding feature definitions.
As you can see, there is more to a feature than just a boolean flag saying whether it is enabled - it may have attributes around it. This was not obvious at all to me to begin with, as I had only thought about my problem in the context of fairly simple (true/false) features, whereas there may be features that are a lot more complex.
Lastly, when your users are browsing your app and you are rendering the UI for a tenant who has your EnhancedUI feature enabled, you will need to know that the feature is enabled. In ASP.Net Zero this is done using something called IPermissionService, which is implemented in both the front end and the back end. In the back end, the permission service basically checks whether the user is allowed to access some resource, which in a feature-switch context means checking whether the feature is enabled for the given tenant. In the front end (Angular), the permission service retrieves these permissions (/api/services/app/Permission/GetAllPermissions):
{
  "items": [
    {
      "level": 0,
      "parentName": "string",
      "name": "string",
      "displayName": "string",
      "description": "string",
      "isGrantedByDefault": true
    }
  ]
}
These can then be used to create some sort of RouteGuard: if something is not enabled or not allowed, you can redirect appropriately, for example to an "Upgrade your edition" page.
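To make the back-end half of that concrete, here is a hypothetical Java-flavoured sketch of the permission/feature check (ASP.Net Zero's actual API differs; all names here are illustrative):

// Hypothetical abstraction over wherever feature state is stored per tenant.
interface FeatureChecker {
    boolean isEnabled(String tenantId, String featureName);
}

// Guard consulted before serving a feature-gated resource or route.
class EnhancedUiGuard {

    private final FeatureChecker features;

    EnhancedUiGuard(FeatureChecker features) {
        this.features = features;
    }

    boolean canActivate(String tenantId) {
        // Redirect to an "Upgrade your edition" page when this returns false.
        return features.isEnabled(tenantId, "App.EnhancedUI");
    }
}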
Hopefully this gives you some ideas to think about.
For example, code like this:
os.Stderr.WriteString(rec.(string))
But this will not show up as an error. I know that I can panic after logging and catch the panic in API Gateway (to avoid sending the stack trace to the client) - are there no other ways? The documentation does not mention anything like that.
It seems not possible. I assume you're looking at the metrics in Amazon CloudWatch:
AWS Lambda automatically monitors functions on your behalf, reporting metrics through Amazon CloudWatch. These metrics include total invocations, errors, duration, throttles, DLQ errors and Iterator age for stream-based invocations.
https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions-metrics.html
Now, let's see how they define errors:
Metric "Errors" measures the number of invocations that failed due to errors in the function (response code 4XX).
So, if you want to see the errors on that graph, you have to respond with the proper codes. If you're concerned about exposing the error stack trace, here is a good read: Error handling with API Gateway and Go Lambda functions. The basic idea there is to create a custom lambdaError type, meant to be used by a Lambda handler function to wrap errors before returning them. This custom error message
{
  "code": "TASK_NOT_FOUND",
  "public_message": "Task not found",
  "private_message": "unknown task: foo-bar"
}
will be wrapped in a standard one
{
  "errorMessage": "{\"code\":\"TASK_NOT_FOUND\",\"public_message\":\"Task not found\",\"private_message\":\"unknown task: foo-bar\"}",
  "errorType": "lambdaError"
}
and later mapped in API Gateway, so the end client will see only the public message:
{
  "code": "TASK_NOT_FOUND",
  "message": "Task not found"
}
I'm trying to use the whisk.system/messaging package, specifically the messageHubProduce action.
I created a binding to this package and tried a simple call with Postman.
Following the documentation, I created a simple JSON payload and made a call, but the action is really unstable: the same call sometimes returns success, sometimes a timeout, and sometimes "No brokers available".
I know the implementation of this action is in Python. Has anyone else seen the same symptoms I am getting?
This is the message I'm sending:
{
  "topic": "mytopic",
  "value": "MyMessage",
  "blocking": false
}
These are the results for the same call
messageHubProduce 446d59eb816b4b34a52374a6a24f3efe
{ "error": "The action exceeded its time limits of 60000 milliseconds." }
messageHubProduce 4213b6a495bc4c5aa7af9e299ddd8fcd
{ "success": true }
After working closely with the Message Hub team, we have deployed an updated messageHubProduce action, which should address your stability and performance issues.
Additionally, to provide real-time feedback please feel free to join us on Slack: https://openwhisk.incubator.apache.org/slack.html