Using Cloud Foundry events to audit successful deletion of orgs/spaces

I intend to use the events defined in the Cloud Foundry API to verify that the deletion of an org or space, including all of its service instances, was successful. The CF events should, in principle, provide all the necessary information.
To test this, I created a space with a few deployed apps and service instances bound to the apps. Using
cf curl "/v2/events"
I could see the related events. Now I deleted the space via
cf curl "/v2/spaces/<my-space-guid>?recursive=true&async=true" -X DELETE
Again, executing
cf curl "/v2/events?q=space_guid:<my-space-guid>"
I expected to see a number of events for the deletion of all entities in the space, including the space deletion event itself. Unfortunately, I did not see ANY events for the space. It looks like they were all deleted along with the space. I cannot imagine that this is the intended lifecycle of such (audit) events?!?
Does anybody have first-hand experience with this API? Any reference to details on the defined behaviour?

Related

How to enable disk space metrics in Spring-Boot/Micrometer?

I have read over the Spring Metrics docs and have system metrics enabled in my application.yml file. According to the docs, this should give me metrics prefixed with process., system., and disk.. I see results for the first two, but I am not getting any metrics about disk space usage. I've even looked at the code and found a MeterBinder class named DiskSpaceMetrics that appears to emit the two values disk.free and disk.total. Can someone please tell me how to get my Spring app to send these disk space metrics?
I am sending my metrics to AWS CloudWatch.
I found this question: How to enable DiskSpaceMetrics in io.micrometer. It seems to be about seeing the disk space values in Spring's actuator dashboard. I do see the values there. What I want is for those values to be periodically reported as metric values.
It turns out that my app WAS sending out the metrics. The AWS CloudWatch console just wasn't showing them to me. I brought up the metrics just fine via Grafana. Even once I knew they were there, I could find no way to get the AWS console to show them to me. Strange. I might have to put in a request to AWS asking them what's up with that.
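If anyone needs to register the binder by hand (for example, to monitor a path other than the default), a minimal sketch assuming a standard Spring Boot + Micrometer setup looks roughly like this; the class name and path below are illustrative:

import io.micrometer.core.instrument.binder.MeterBinder;
import io.micrometer.core.instrument.binder.system.DiskSpaceMetrics;
import java.io.File;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DiskSpaceMetricsConfig {

    // Spring Boot binds any MeterBinder bean to the registered MeterRegistry instances,
    // which publishes disk.free and disk.total for the given path.
    @Bean
    public MeterBinder diskSpaceMetrics() {
        return new DiskSpaceMetrics(new File("/"));   // path to monitor; adjust as needed
    }
}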

Is there a way to update a cached in-memory value on all running instances of a serverless function? (AWS, Google, Azure or OpenWhisk)

Suppose I am running a serverless function with a global state variable that is cached in memory. Assuming that the value is cached on multiple running instances, how would an update to the global state be broadcast to every serverless instance so that they all have the updated value?
Is this possible in any of the serverless frameworks?
It depends on the serverless framework you're using, which makes it hard to give a useful answer on Stack Overflow. You'll have to research each of them. And you'll have to review this over time because their underlying implementations can change.
In general, you will be able to achieve your goal as long as you can open up a bidirectional connection from each function instance so that your system outside the function instances can send them updates when it needs to. This is because you can't just send a request and have it reach every backing instance. The serverless frameworks are specifically designed to not work that way. They load balance your requests to the various backing instances. And it's not guaranteed to be round robin, so there's no way for you to be confident you're sending enough duplicate requests for each of the backing instances to have been hit at least once.
However, there is something else built into most serverless frameworks that may stop you, even if you can open long-lived connections from each instance that allow each of them to be reliably messaged at least once. To help keep resources available for functions that need them, inactive functions are often "paused" in some way. Again, each framework will have its own way of doing this.
For example, OpenWhisk has a configurable "grace period" where it allows CPU to be allocated only for a small period of time after the last request for a container. OpenWhisk calls this pausing and unpausing containers. When a container is paused, no CPU is allocated to it, so background processing (like if it's Node.js and you've put something onto the event loop with setInterval) will not run and messages sent to it from a connection it opened will not be responded to.
This will prevent your updates from reliably going out unless you have constant activity that keeps every OpenWhisk container not only warm, but unpaused. And, it goes against the interests of the folks maintaining the OpenWhisk cluster you're deploying to. They will want to pause your container as soon as they can so that the CPU it consumed can be allocated to containers not yet paused instead. They will try to tune their cluster so that containers remain unpaused for a duration as short as possible after a request/event is handled. So, this will be hard for you to control unless you're working with an OpenWhisk deployment you control, in which case you just need to tune it according to your needs.
Network restrictions that interfere with your ability to open these connections may also prevent you from using this architecture.
You should take these factors into consideration if you plan to use a serverless framework and consider changing your architecture if you require global state that would be mutated this way in your system.
Specifically, you should consider switching to a stateless design where, instead of caching happening in each function instance, it happens in a shared service designed for fast caching, like Redis or Memcached. Each function can then check that shared caching service for the data before retrieving it from its source. Many cloud providers that offer serverless compute also offer managed versions of these, so you can often deploy everything to the same place.
Alternatively, if not to a fully stateless design, you could switch to a pull model for caching instead of a push model. Instead of having updates pushed out to each function instance to refresh its cached data, each function would pull fresh data from the source when it detects that the data stored in its memory has expired.
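A minimal sketch of that pull model in Java, assuming the shared cache from the previous paragraph is Redis and using the Jedis client (the class name, key, and TTL below are illustrative):

import redis.clients.jedis.Jedis;

// Each function instance keeps the value in memory, but only trusts it for a short
// TTL before re-reading the shared store, so updates propagate without any push.
public class PulledConfig {

    private static final long TTL_MILLIS = 60_000;   // illustrative expiry
    private static String cachedValue;
    private static long fetchedAtMillis;

    public static synchronized String get(Jedis redis, String key) {
        long now = System.currentTimeMillis();
        if (cachedValue == null || now - fetchedAtMillis > TTL_MILLIS) {
            cachedValue = redis.get(key);             // shared cache is the source of truth
            fetchedAtMillis = now;
        }
        return cachedValue;
    }
}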

Using ILogger to send logging to x-ray via OpenTelemetry

All,
Thanks in advance for your time. We are moving to OpenTelemetry from ILogger/log4net logging to files. We were on-prem; now that we are moving to the cloud, logging to files is not going to work. We use AWS. I have the aws-otel-collector working with tracing. Logging seems to go to the console only - there is no way to get logs to X-Ray via OT. On-prem we made extensive use of file-based logging, and now the auto-instrumentation in OT and AWS does most of what we need. There are times when we all wish we could peek inside the code at runtime and see a few values that the auto-instrumentation does not provide. That is what I would like to log to X-Ray via OT. There are samples (with warnings saying it is not best practice) that explain how to do this with native AWS, but that means I have to run both the aws-otel-collector and the X-Ray daemon. The use of logs would be very limited and judicious, but I would really like to have them covered by one API. Is this possible?
Again - thanks in advance for your time.
Steve
It looks like you are not differentiating between traces and logs. They are not the same. You can include "logs" (the correct term is a span event) in a trace, but that must be done when the traces are generated. If you own the code, check the documentation for how to do that.
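For illustration, attaching such an event to the current span with the OpenTelemetry API looks roughly like this (Java shown here; the .NET API is analogous, and the event name and attribute are placeholders):

import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Span;

public class CheckoutHandler {

    void validateOrder(String orderId) {
        // Attach a log-like event to whatever span the auto-instrumentation
        // has made current; it travels with the trace to the backend.
        Span.current().addEvent("order-validated",
                Attributes.of(AttributeKey.stringKey("order.id"), orderId));
    }
}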
OpenTelemetry (OTEL) is designed for metrics, traces, and logs, but the implementation for logs is still not stable. See https://opentelemetry.io/status/#logging
So for now I would use OTEL only for traces (X-Ray) and metrics (AWS Prometheus). Logs should be processed outside of OTEL and stored in a proper log store - that's not X-Ray (which is trace storage), but OpenSearch, CloudWatch Logs, ...

Caching solution for AWS SSM parameter store to be used with dotnet lambdas

I have a lot of dotnet Lambda microservices using the SSM Parameter Store for configuration purposes. It has been quite advantageous over environment variables, as I'm sharing a lot of configuration across different microservices. Recently, though, I've started pushing its limits: it now affects my throughput and has started costing more than I'd like.
I've considered using the Amazon extension for the dotnet configuration manager, but it falls short of my requirements. I need the configuration to hot-swap to keep the microservices running healthy at high uptime, which won't happen with its current implementation. Deploying all microservices just for a configuration change is not an option either.
This led me to research a caching solution that can at least be invalidated from the outside, but I couldn't come across anything that works with the SSM Parameter Store out of the box.
At worst, I'll need to come up with another microservice with its own DB that takes care of the configuration, but I don't want to go down that path, to be honest.
What is the general approach used in this kind of scenario?
You can resolve SSM parameters into environment variables, for example:
environment:
  VariableName: ${ssm:/serverless-VariableName}
and reference them in your code from the environment. We are using this approach.
This resolves the SSM value when you deploy your app and reuses it without calling the SSM Parameter Store for every request.
To reduce the number of network calls to the SSM Parameter Store, you can assign configuration values from SSM to static properties on application startup, and then use these static properties for configuration values in your application instead of calling the SSM Parameter Store again throughout the life of that particular Lambda instance.
Changes to SSM parameters will then only be reflected in new Lambda instances.
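A minimal sketch of that startup caching (shown in Java with the AWS SDK v2 for illustration; the dotnet SDK call is analogous, and the parameter name is a placeholder):

import software.amazon.awssdk.services.ssm.SsmClient;
import software.amazon.awssdk.services.ssm.model.GetParameterRequest;

public class AppConfig {

    private static final SsmClient SSM = SsmClient.create();

    // Resolved once per execution environment (cold start) and reused for every
    // invocation handled by that instance; new instances pick up changed values.
    public static final String DB_HOST = fetch("/my-app/db-host");   // placeholder name

    private static String fetch(String name) {
        GetParameterRequest request = GetParameterRequest.builder()
                .name(name)
                .withDecryption(true)
                .build();
        return SSM.getParameter(request).parameter().value();
    }
}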
If you are using provisioned concurrency on Lambda, the above solution will not help: changes to SSM parameters will not be reflected in a provisioned Lambda instance because it is always kept in an initialized state. For changes to take effect you need to redeploy, or remove provisioned concurrency and add it back.
If you have a use case where parameter values change frequently and should be reflected in your Lambda code immediately, you can store such values in Secrets Manager and use the AWS-provided client-side caching support for secrets: https://aws.amazon.com/blogs/security/how-to-use-aws-secrets-manager-client-side-caching-in-dotnet/
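The linked post covers the dotnet client; the Java caching library (aws-secretsmanager-caching-java) works along the same lines. A rough sketch, with an illustrative secret name:

import com.amazonaws.secretsmanager.caching.SecretCache;

public class SecretConfig {

    // In-memory cache that periodically refreshes entries, so warm invocations
    // don't hit the Secrets Manager API on every request.
    private static final SecretCache CACHE = new SecretCache();

    public static String dbPassword() {
        return CACHE.getSecretString("my-app/db-password");   // placeholder secret name
    }
}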
I think we need to dig into the architecture behind your question.
Since you are using Lambda, regardless of your configuration, if you are not using provisioned concurrency your container lifecycle will be about 5-10 minutes (the common lifecycle of a shared Lambda container).
That said, if you were using another type of infrastructure, such as K8s (EKS, for example), you could:
Cache this SSM parameter in a distributed cache (ElastiCache).
Create an SSM parameter change event rule in CloudWatch Events.
Put an SNS topic as the target.
Subscribe an HTTP endpoint or a Lambda function to clear this cache entry.
Now, with the cache invalidated, the first app that needs this parameter will fetch the value from the SSM parameter and put it into the cache, and you can also set a TTL here to invalidate it on a schedule.
But because you are running with a serverless approach, creating a TCP connection for each Lambda container (you can share the TCP connection to ElastiCache across multiple invocations) may degrade your performance, so you need to weigh this trade-off:
Verify whether the connection to ElastiCache is a problem for your use case. If it is, you can use a simple SSM parameter cache client with a small TTL (for example, 5 minutes), just to prevent your Lambdas from hitting the SSM parameter limits.

How to download 300k log lines from my application?

I am running a job on my Heroku app that generates about 300k lines of log within 5 minutes. I need to extract all of them into a file. How can I do this?
The Heroku UI only shows logs in real time, since the moment it was opened, and only keeps 10k lines.
I attached a LogDNA add-on as a drain, but its export also only allows 10k lines. To even get the export option, I need to apply a search filter (I typed 2020 because all the lines start with a date, but still...). I can scroll through all the logs to see them, but as I scroll up the bottom gets truncated, so I can't even copy-paste them myself.
I then attached Sumo Logic as a drain, which is better because the export limit is 100k. However, I still need to filter the logs into 30- to 60-second intervals and download them separately. It also exports to a CSV file in reverse order (newest first, not what I want), so I still have to work on the file after it's downloaded.
Is there no option to get actual raw log files in full?
Is there no option to get actual raw log files in full?
There are no actual raw log files.
Heroku's architecture requires that logging be distributed. By default, its Logplex service aggregates log output from all services into a single stream and makes it available via heroku logs. However,
Logplex is designed for collating and routing log messages, not for storage. It retains the most recent 1,500 lines of your consolidated logs, which expire after 1 week.
For longer persistence you need something else. In addition to commercial logging services like those you mentioned, you have several options:
Log to a database instead of files. Something like Apache Cassandra might be a good fit.
Send your logs to a logging server via Syslog (my preference):
Syslog drains allow you to forward your Heroku logs to an external Syslog server for long-term archiving.
Send your logs to a custom logging process via HTTPS.
Log drains also support messaging via HTTPS. This makes it easy to write your own log-processing logic and run it on a web service (such as another Heroku app).
Speaking solely from the Sumo Logic point of view, since that’s the only one I’m familiar with here, you could do this with its Search Job API: https://help.sumologic.com/APIs/Search-Job-API/About-the-Search-Job-API
The Search Job API lets you kick off a search, poll it for status, and then when complete, page through the results (up to 1M records, I believe) and do whatever you want with them, such as dumping them into a CSV file.
But this is only available to trial and Enterprise accounts.
I just looked at Heroku's docs, and it does not look like they have a native way to retrieve more than 1,500 lines; you do have to forward those logs via syslog to a separate server/service.
I think your best solution is going to depend, however, on your use case, such as why specifically you need these logs in a CSV.
