Continuous updates of storage change in AWS S3 - Spring

I have a lifecycle rule enabled on S3 which moves objects to Glacier after 30 days. Since AWS does not yet support event notifications for this transition, I don't have a way to tell my application that an object has moved to Glacier.
My use case is: once an object has moved to Glacier, I want to restrict users from performing any action on that object. Is there a way to get an update once an object moves to Glacier?
I am planning to implement a scheduler (using Spring's @Scheduled) which will run every hour, scan all objects in S3, check whether they have moved to Glacier, and update the application's RDS accordingly.
Let me know if there are other good approaches to handle this use case instead of writing a scheduler.
Regards.

Is there a way to get an update once an object moves to Glacier?
At the moment, there is no S3 event support for this.
Let me know if there are other good approaches to handle this use case instead of writing a scheduler.
The approach you are planning to use seems reasonable. For the scheduler, you could also use a Lambda function with scheduled events, which takes the burden off your current server.
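Here is a minimal sketch of that hourly scan, assuming the AWS SDK for Java v2 and Spring scheduling (@EnableScheduling must be present on a configuration class); the bucket name and the markRestrictedInRds helper are hypothetical placeholders. Note that ListObjectsV2 already returns each object's storage class, so no per-object HEAD request is needed:

```java
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Response;
import software.amazon.awssdk.services.s3.model.S3Object;

@Component
public class GlacierTransitionScanner {

    private final S3Client s3 = S3Client.create();

    @Scheduled(fixedRate = 3_600_000) // every hour
    public void scanForGlacierObjects() {
        String token = null;
        do {
            ListObjectsV2Request request = ListObjectsV2Request.builder()
                    .bucket("my-bucket") // placeholder bucket name
                    .continuationToken(token)
                    .build();
            ListObjectsV2Response response = s3.listObjectsV2(request);
            for (S3Object object : response.contents()) {
                // The listing already carries the storage class per object.
                if ("GLACIER".equals(object.storageClassAsString())) {
                    markRestrictedInRds(object.key()); // hypothetical RDS update
                }
            }
            token = response.nextContinuationToken();
        } while (token != null);
    }

    private void markRestrictedInRds(String key) {
        // Flag the object as restricted in the application's RDS table.
    }
}
```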

Related

Dynamic EC2 resourcing in declarative CloudFormation/Terraform

We are moving our infrastructure to CloudFormation, since it makes it much easier to describe the infrastructure in a clean manner. This works fantastically well for things like security groups, routing, VPCs, and transit gateways.
However, we have two issues which we are struggling with, and which I don't think fit the declarative infrastructure-as-code paradigm that Terraform and CloudFormation embody.
(1) We have a business requirement to run a scheduled batch at specific times of day. These batches are very computationally intensive, so to save costs we run them on an EC2 instance that is brought up at that time and torn down when the batch is finished. However, this seems to require a temporary change to the Terraform/CF files and then a change back. Is there a more native way of doing this?
(2) We store clients' firewall rules for their load balancer (ALB) dynamically and allow clients to edit them. This information cannot be stored in the Terraform/CF files, since it can be changed by clients on demand.
Is there a way of properly doing these things in CF/Terraform?
(1) If you have to use EC2, you could create a Lambda function that starts your EC2 instances, then create a CloudWatch Events rule that triggers the Lambda at your specified date/time. For more details, see https://aws.amazon.com/premiumsupport/knowledge-center/start-stop-lambda-cloudwatch/. Once the job is done, have the EC2 instance shut itself down using the AWS SDK or AWS CLI.
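A minimal sketch of such a Lambda handler, assuming the AWS SDK for Java v2 and aws-lambda-java-core; the instance ID is a placeholder:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import software.amazon.awssdk.services.ec2.Ec2Client;
import software.amazon.awssdk.services.ec2.model.StartInstancesRequest;

public class StartBatchInstance implements RequestHandler<Object, String> {

    private final Ec2Client ec2 = Ec2Client.create();

    @Override
    public String handleRequest(Object event, Context context) {
        // Invoked by a scheduled CloudWatch Events rule at the batch time.
        ec2.startInstances(StartInstancesRequest.builder()
                .instanceIds("i-0123456789abcdef0") // placeholder instance ID
                .build());
        return "started";
    }
}
```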
Alternatively, you could use AWS Lambda itself to run your batch job; you only get charged while the Lambda runs. Likewise, create a CloudWatch Events rule that schedules the Lambda.
(2) You could store the firewall rules in your own DB and modify the actual ALB security group rules using the AWS SDK. I don't think it's a good idea to store these things in Terraform/CF. IMHO, Terraform/CF are great for declaring infrastructure, but they are not a good fit for resources that change dynamically, especially when changed by third parties like your clients.
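As an illustration only, assuming the client-edited rules map to security group ingress entries, here is how a rule edit could be pushed to the ALB's security group with the AWS SDK for Java v2 (group ID, port, and CIDR are placeholders):

```java
import software.amazon.awssdk.services.ec2.Ec2Client;
import software.amazon.awssdk.services.ec2.model.AuthorizeSecurityGroupIngressRequest;
import software.amazon.awssdk.services.ec2.model.IpPermission;
import software.amazon.awssdk.services.ec2.model.IpRange;

public class AlbFirewallUpdater {

    private final Ec2Client ec2 = Ec2Client.create();

    // Called after a client edits a rule; the rule set itself lives in your DB,
    // not in the Terraform/CF files.
    public void allowClientCidr(String securityGroupId, String cidr) {
        ec2.authorizeSecurityGroupIngress(AuthorizeSecurityGroupIngressRequest.builder()
                .groupId(securityGroupId)
                .ipPermissions(IpPermission.builder()
                        .ipProtocol("tcp")
                        .fromPort(443) // placeholder port
                        .toPort(443)
                        .ipRanges(IpRange.builder().cidrIp(cidr).build())
                        .build())
                .build());
    }
}
```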

Can AWS Lambda be used as the backend for getstream.io?

I didn't find any posts related to this topic. It seems natural to use Lambda as a getstream.io backend, but I'm not sure whether getstream.io depends heavily on persistent connections or other architectural choices that would rule Lambda out. Is it a sensible approach? Has anyone made it work? Any advice?
While you can build an entire website only on Lambda, you have to consider the following:
Lambda behind API Gateway has a timeout limit of 30 seconds and a payload size limit (both request and response) of 6 MB. While for most cases this is fine, if you have some really long-running operations or you need to send really big data (like a high-resolution image), you can't do it with this approach and need to think of something else (for instance, you can send an SNS message to another Lambda function with a higher timeout that does all of this asynchronously and then sends the result to the client when it's done, supposing the client is capable of receiving events).
Lambda has cold starts, which in turn slow down your APIs when a client calls them for the first time in a while. The cold start time depends on the language you write your Lambdas in, so you should consider this too. If you are using C# or Java for your Lambdas, then Lambda is probably not the best choice; from this point of view, Node.js and Python seem to be the best choices, with Go rising. And by the way, you can now configure Provisioned Concurrency for your Lambda, which aims to fix the cold start issue, but I haven't used it yet so I can't tell whether it makes a difference (though I'm sure it does).
If done correctly, you'll end up managing hundreds of Lambda functions, while with a standard Docker container under ECS you'd manage a few APIs with multiple endpoints. This point should not be underestimated: on one hand it makes future changes easier, since each Lambda is small and you'll easily find and fix a bug; on the other hand, you have to move across all these functions, which can be a slow process if you don't know exactly which Lambda is responsible for what.
Lambda can't handle sessions, as far as I know. Because the Lambda container gets dropped after some time, you can't store any session inside the Lambda itself. You'll always need external storage for the session so it can be shared across multiple Lambda invocations, such as records in a DynamoDB table (a sketch follows below). This means you have to write that code yourself, while in a classic API (like a .NET Core one) all of this is handled by the framework and you only need to store or retrieve items from the session (most of the time).
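To illustrate the session point, here is a minimal sketch that shares session state across invocations through DynamoDB, assuming a table named sessions with a string partition key sessionId (both assumptions), using the AWS SDK for Java v2:

```java
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

public class SessionStore {

    private final DynamoDbClient dynamo = DynamoDbClient.create();

    // Persist a session attribute so any later Lambda invocation can read it.
    public void put(String sessionId, String userId) {
        dynamo.putItem(PutItemRequest.builder()
                .tableName("sessions") // assumed table name
                .item(Map.of(
                        "sessionId", AttributeValue.fromS(sessionId),
                        "userId", AttributeValue.fromS(userId)))
                .build());
    }

    public String getUserId(String sessionId) {
        Map<String, AttributeValue> item = dynamo.getItem(GetItemRequest.builder()
                .tableName("sessions")
                .key(Map.of("sessionId", AttributeValue.fromS(sessionId)))
                .build()).item();
        return item.isEmpty() ? null : item.get("userId").s();
    }
}
```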
So... yeah! A backend written entirely in Lambda is possible. The company I work for does it, and I must say it is a lot better, both in terms of speed and development time. But those benefits come later: you first need to face all of the issues listed above, and it is not as easy as it might seem.
Yes, you can use AWS Lambda as the backend and integrate with the Stream API there.
Building an entire application directly on Lambda is going to be very complex and requires writing a lot of boilerplate code just to enforce some basic organization and structure in your project.
My recommendation is to use a serverless framework that takes care of keeping your application well organized and of deploying new versions (and environments).
The Serverless Framework is a good option for that: https://serverless.com/framework/docs/providers/aws/guide/intro/

Parse - Updating local DataStore in combination with LiveQuery

I am writing an Android app using the Parse local datastore and LiveQueries, and I am facing a problem in the combination of the two. When I start the app, I fetch all instances of an entity the current user has access to and save them in the local datastore. Additionally, I subscribe to changes of this entity with the same query using a LiveQuery. When such an entity is changed by cloud code, e.g. gets deleted, the event is properly received in the app. However, the entity is not removed from the local datastore. As I have not found anything about this in the documentation or by searching the Internet, my question is whether this is normal behavior and whether I have to apply the changes to the local datastore manually, e.g. by unpinning the entity in this case. The other possibility is that it should update but somehow does not in my case.
If I have to deal with it manually: what is the best way to replace/update the entity in the local datastore when an update happens?
Thanks!
As a solution, I now simply do the pinning/unpinning in every event handler. It really seems that there is no automatic updating of the local datastore.
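For reference, a sketch of what that workaround can look like with the parse-live-query-android client; the Entity class name is a placeholder, and the exact callback shape should be checked against the SDK version you use:

```java
import com.parse.ParseObject;
import com.parse.ParseQuery;
import com.parse.livequery.ParseLiveQueryClient;
import com.parse.livequery.SubscriptionHandling;

public class EntitySync {

    public void subscribe() {
        ParseQuery<ParseObject> query = ParseQuery.getQuery("Entity"); // placeholder class
        ParseLiveQueryClient client = ParseLiveQueryClient.Factory.getClient();
        SubscriptionHandling<ParseObject> handling = client.subscribe(query);

        // Mirror server-side changes into the local datastore by re-pinning.
        handling.handleEvent(SubscriptionHandling.Event.UPDATE,
                (q, object) -> object.pinInBackground());

        // Remove deleted objects from the local datastore.
        handling.handleEvent(SubscriptionHandling.Event.DELETE,
                (q, object) -> object.unpinInBackground());
    }
}
```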

Is my understanding of the AWS Lambda serverless architecture correct?

I am considering using the AWS Lambda serverless architecture for my next project. This is my understanding of the technology, and I would very much appreciate it if somebody could correct me:
You can deploy functions that act as event handlers.
Each event handler is configured to respond to specific events.
If you write the Lambda functions in JavaScript, you can require any other JavaScript modules you write and use them.
All your Lambdas and their required modules are written statelessly; your app's state is ultimately kept in the database.
If you ever want to write some stateful logic, such as keeping the result from one HTTP request, temporarily storing it somewhere, and looking it up in a subsequent request, is this not possible in Lambda?
About your question: Lambdas can use a temporary directory, /tmp, to store files; this has a limit of 512 MB. Since the Lambda container COULD be reused for performance, there is a chance that the file is still there for the next Lambda invocation. This is discouraged, but in some particular cases it can be helpful. Anyway, if you really need it, the better approach would be to use a cache system.
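A small sketch of that pattern, reusing /tmp across warm invocations; the file name and the data source are assumptions:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class TmpCacheHandler implements RequestHandler<Object, String> {

    private static final Path CACHE = Path.of("/tmp/lookup-data.json"); // placeholder file

    @Override
    public String handleRequest(Object event, Context context) {
        try {
            if (Files.notExists(CACHE)) {
                // Cold or recycled container: rebuild the cache, e.g. from S3.
                Files.writeString(CACHE, fetchLookupData());
            }
            // Warm container: the file from a previous invocation may survive here,
            // but never rely on it; treat it strictly as a best-effort cache.
            return Files.readString(CACHE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private String fetchLookupData() {
        return "{}"; // placeholder for the real data source
    }
}
```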
In addition to your considerations, AWS Lambda is not good for:
Keeping state, like files that are downloaded and could be reused later.
Tasks that need control over the OS.
Long-running tasks.
Apps with hard latency requirements.
Also, depending on the database client, many concurrent Lambdas can lead to overhead in database connections, since a client is instantiated for each Lambda container.

When to use AWS Lambda vs. SWF

There is an SNS topic that I would like to listen in on. I understand that I can either use SQS with SWF to work on each event, or have AWS Lambda subscribe directly to the SNS topic to work on each event as it arrives. For each event, all I plan to do is pull out certain information and store it in Elasticsearch.
My question is: when would I use one method versus the other? Is one better when it comes to handling errors?
For your use case you definitely want Lambda.
SWF is much more complicated and is designed for longer processes, with multiple steps, that may take days to complete. For SWF, I generally think of use cases like a customer placing an order on a website, which triggers a workflow that takes the order through all the steps of billing, manufacturing, packaging, shipping, etc.
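For the Lambda route, the handler stays very small. A sketch using aws-lambda-java-events, with the Elasticsearch indexing call left abstract (an assumption, not a prescribed client):

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SNSEvent;

public class SnsToElasticsearch implements RequestHandler<SNSEvent, Void> {

    @Override
    public Void handleRequest(SNSEvent event, Context context) {
        for (SNSEvent.SNSRecord record : event.getRecords()) {
            String message = record.getSNS().getMessage();
            // Pull out the fields you care about and index the document.
            indexIntoElasticsearch(message);
        }
        return null;
    }

    private void indexIntoElasticsearch(String message) {
        // e.g. via the Elasticsearch REST client (implementation omitted).
    }
}
```

On the error-handling part of the question: if the handler throws, Lambda retries failed asynchronous (SNS-triggered) invocations a couple of times by default, which covers transient indexing errors without any workflow machinery.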
