Fargate vs Lambda, when to use which? - aws-lambda

I'm pretty new to the whole Serverless landscape, and am trying to wrap my head around when to use Fargate vs Lambda.
I am aware that Fargate is a serverless subset of ECS, and Lambda is serverless as well but driven by events. But I'd like to be able to explain the two paradigms in simple terms to other folks that are familiar with containers but not that much with AWS and serverless.
Currently we have a couple of physical servers in charge of receiving text files, parsing them out, and populating several db tables with the results. Based on my understanding, I think this would be a use case better suited for Lambda because the process that parses out the text files is triggered by a schedule, is not long running, and ramps down when not in use.
However, if we were to port over one of our servers that receive API calls, we would probably want to use Fargate because we would always need at least one instance of the image up and running.
In terms of containers, and in very general terms would it be safe to say that if the container is designed to do:
docker run <some_input>
Then it is a job for Lambda.
But if the container is designed to do something like:
docker run --expose 80
Then it is a job for Fargate.
Is this a good analogy?

That's the start of a good analogy. However Lambda also has limitations in terms of available CPU and RAM, and a maximum run time of 15 minutes per invocation. So anything that needs more resources, or needs to run for longer than 15 minutes, would be a better fit for Fargate.
Also I'm not sure why you say something is a better fit for Fargate because you "always need at least one instance running". Lambda+API Gateway is a great fit for API calls. API Gateway is always ready to receive the API call and it will then invoke a Lambda function to process it (if the response isn't already cached).

It is important to notice that with Lambda you don't need to build, secure, or maintain a container. You just worry about the code. Now as mentioned already, Lambda has a max run time limit and 3GB memory limit (CPU increases proportionally). Also if it is used sporadically it may need to be pre-warmed (called on a schedule) for extra performance.
Fargate manages docker containers, which you need to define, maintain and secure. If you need more control of what is available in the environment where your code runs, you could potentially use a container (or a server), but that again comes with the management. You also have more options on Memory/CPU size and length of time your run can take to run.
Even for an API server as you mentioned you could put API gateway in front and call Lambda.

As Mark has already mentioned, you can Lambda + API Gateway to expose your lambda function as API.
But lambda has significant limitations in terms of function executions. There are limitations on the programming languages supported, memory consumption and execution time (It was increased to 15 mins recently from the earlier 5 mins). This is where AWS Fargate can help by giving the benefits of both container world and Serverless (FaaS) world. Here you worry only about container (its CPU, memory requirements, IAM policies..) and leave the rest to Amazon ECS by choosing Fargate launch type. ECS will choose the right instance type, manage your cluster, it's auto scaling, optimum utilization.

This is the right analogy, but it is not an exhaustive list to be able to explain the two paradigms.
In general, Lambda is more suitable for serverless applications. Its nature is a function-as-a-service (FaaS). It just does the simple tasks and that’s all. Don’t expect too much more.
It should be considered as the first option for serverless module. But it has more limitations and restrictions. Module architecture elaborated from functional and not-functional requirements, surrounded infrastructure and many other factors.
To make a decision minimum you must review the list of restrictions such as:
Portability
Environment control
Trigger type
Response time
Response size
Process time
Memory usage
These are the main factors. But the list hasn’t covered all the factors and restrictions to consider between both these serverless technologies.
To know more about I recommend this article https://greenm.io/aws-lambda-or-aws-fargate/

Related

dynamic ec2 resourcing in declarative cloud formation/terraform

We are moving our infrastructure to cloud formation since it's much easier to describe the infrastructure in a nice manner. This works fantastically well for things like security groups, routing, VPCs, transit gateways.
However, we have two issues which we are struggling with and I don't think fit the declarative, infrastructure-as-code paradigm which things like terrafrom and cloud formation are.
(1) We have a business requirement where we run a scheduled batch at specific times in the day. These are very computationally intensive. To save costs, we run these on an EC2 which is brought up at that time, then torn down when the batch is finished. However, this seems to require a temporary change to the terraform/CF files, then a change back. Is there a more native way of doing this?
(2) We dynamically store and allow to be edited by clients their firewalling rules on their load balancer (ALB). This information cannot be stored in the terraform/CF files since it can be changed by clients on demand.
Is there a way of properly doing these things in CF/Terraform?
(1) If you have to use EC2, you could create a Lambda that would start your EC2 instances. Then, create a CloudWatch Event that triggers the Lambda at your specified date / time. For more details you can see https://aws.amazon.com/premiumsupport/knowledge-center/start-stop-lambda-cloudwatch/. Once the job is done, have your EC2 shut itself down using the awssdk or awscli.
Alternatively, you could use AWS Lambda to run your batch job. You only get charged when the Lambda runs. Likewise, create a CloudWatch Event rule that schedules the Lambda.
(2) You could store the firewall rules in your own DB and modify the actual ALB SG rules using the awssdk. I don't think it's a good idea to store these things in Terraform/CF. IMHO Terraform/CF are great for declaring infrastructure but won't be a good solution for resources that are dynamically changing, especially by third parties like your clients.

AWS Fargate vs Lambda with Provisioned concurrency

I want to support a light-weight functionality (a DDB lookup and forward the request to backend service) supporting around peak 50 TPS with an acceptable latency <1s, I was thinking about using lambda with provisioned concurrency feature, or is it better to use Fargate?
When should we prefer Fargate to Lambda with Provisioned concurrency feature?
Any pointers to cost, performance study of both the services is helpful.
In general I would prefer Lambda over Fargate for the following reasons:
Scaling is handled for you. This can be particularly important if your traffic is very spiky and unpredictable.
At low volume you might not have to pay anything
It's easier to deploy and configure
Fargate does have some advantages, however.
At a higher load you may find that Fargate is cheaper.
A single Fargate task may be able to handle quite a few requests, so you might not need a lot of scaling. This depends a lot on language and what the code does.
No cold starts. Whilst provisioned concurrency in Lambda helps with this it doesn't deal with traffic spikes. If you have consistent/known spikes in traffic you can probably handle it with autoscaling in Fargate.
There is more that goes into it that what I've stated, and I've certainly over simplified some of my statements, but it's a good starting place.

AWS Lambda vs EC2 REST API

I am not an expert in AWS but have some experience. Got a situation where Angular UI (host on EC2) would have to talk to RDS DB instance. All set so far in the stack except API(middle ware). We are thinking of using Lambda (as our traffic is unknown at this time). Again here we have lot of choices to make on programming side like C# or Python or Node. (we are tilting towards C# or Python based on the some research done and skills also Python good at having great cold start and C# .NET core being stable in terms of performance).
Since we are with Lambda offcourse we should go in the route of API GATEWAY. all set but now, can all the business logic of our application can reside in Lambda? if so wouldnt it Lambda becomes huge and take performance hit(more memory, more computational resources thus higher costs?)? then we thought of lets have Lambda out there to take light weight processing and heavy lifting can be moved to .NET API that host on EC2?
Not sure of we are seeing any issues in this approach? Also have to mention that, Lambda have to call RDS for CRUD operations then should I think to much about concurrency issues? as it might fall into state full category?
The advantage with AWS Lambda here is scaling. As you know already , cuz Lambda is fully managed so we can take advantages of it for this case.
If you host API on EC2, so we don't have "scaling" part in place. And of course, you can start using ECS, Auto Scaling Group ... but it is bring you to another route.
Are you building this application for learning or for production?

Can AWS Lambda be used as the backend for getstream.io?

I didn't find any posts related to this topic. It seems natural to use Lambda as a getstream backend, but I'm not sure if it heavily depends on persistent connections or other architectural choices that would rule it out. Is it a sensible approach? Has anyone made it work? Any advice?
While you can build an entire website only in Lambda, you have to consider the followings:
Lambda behind API Gateway has a timeout limit of 30 seconds and a Payload size limit (both received and sended) of 6MB. While for most of the cases this is fine, if you have some really big operations or you need to send some really big datas (like a high resolution image), you can't do it with this approach, but you need to think about something else (for instance you can send an SNS to another Lambda function with higher timeout that can do all this asynchronously and then send the result to the client when it's done, supposing the client is capable of receiving events)
Lambda has cold starts, which in terms slow down your APIs when a client calls them for the first time in a while. The cold start time depends on the language you are doing your Lambdas, so you might consider this too. If you are using C# or Java for your Lambdas, than this is probably not the best choice. From this point of view, Node.JS and Python seems to be the best choices, with Golang rising. You can find more about it here. And by the way, you can now specify a Provisioned Throughput for your Lambda, which aims to fix the cold start issue, but I haven't used it yet so I can't tell if there is any difference (but I'm sure there is)
If done correctly you'll end up managing hundreds of Lambda functions, while with a standard Docker Container under ECS you'll manage few APIs with multiple endpoints. This point should not be underestimated, as on one side it will make changes easier in the future, since lambda will be small and you'll easily find the bug and fix it, but on the other side you have to move across these functions, which if you don't know exactly which lambda is responsible of what can be a long process
Lambda can't handle sessions as far as I know. Because after some time the Lambda container gets dropped, you can't store any session inside the Lambda itself. You'll always need a structure to store the session so it can be shared across multiple Lambda invocations, such as some records in a DynamoDB table or something else, but this mean that you have to write the code for this, while in a classic API (like a .NET Core one) all of this is handled by the language itself and you only need to store or retrieve items from the session (most of the times)
So... yeah! A backed written entirely in Lambda is possible. The company I work in does it and I must say is a lot better, both in terms of speed and development time. But those benefits comes later, since you need to face all of the reasons I listed above before, and is not as easy as it could seem
Yes, you can use AWS Lambda as backend and integrate with Stream API there.
Building an entire application on Lambda directly is going to be very complex and requires writing lot of boiler plate code just to enforce some basic organization and structure to your project.
My recommendation is use a serverless framework to do this that takes care of keeping your application well organized and to deploy new versions (and environments).
Serverless is a good option for that: https://serverless.com/framework/docs/providers/aws/guide/intro/

Serverless - Running an Express instance in a Lambda function, good or bad?

While learning the Serverless Framework I came across several tutorials showing how to run an Express instance in a Lambda. This seems to me like an overkill and against the purpose of Lambda functions.
The approach usually involves running an Express instance in the Lambda and proxying API Gateway requests to the Express router for internal handling.
To me the trivial approach is to just create an API in API Gateway and route individual requests to a Lambda for handling. Am I missing something?
Taking into account that Lambdas' execution time is 15 minutes, isn't just spinning up the Express instance quite expensive in terms of memory? Also, limited to 100 concurrent Lambda executions would create a bottleneck, no? Wouldn't an EC2 instance be a better fit in such case? Using a Lambda like this seems like an overkill.
The only two benefits I see in running an Express instance in a Lambda are:
In the case of migration of an existing app written in Express, allows to slowly break down the app into API Gateway endpoints.
Internal handling of routing rather than relying on the API Gateway request/response model (proxying to Express router).
What would be the benefit of such approach, in case I am missing something?
Some resources promoting this approach:
Express.js and AWS Lambda — a serverless love story (Slobodan Stojanović, freeCodeCamp)
awslabs/aws-serverless-express (GitHub)
Deploy a REST API using Serverless, Express and Node.js (Alex DeBrie, Serverless Framework Blog)
Most of your points are valid, and it can indeed be called an antipattern to run Express inside your Lambda functions behind an API Gateway.
Should be noted that the initialization time is not that much of a concern. While the execution time of a single invocation is capped at 15 minutes, a single Lambda instance will serve multiple requests after it has been fired up. A frequently invoked single Lambda instance has usually a lifetime or 6 to 9 hours, and is disposed of at about 30 minutes of inactivity. (note that AWS does not publicly disclose these parameters and these figures should only be used as a ballpark). Whoever is the unlucky one to get the cold start and eat the initialization delay could get an additional delay in the thousands of milliseconds however.
The singular main advantage of this approach is, as you said, providing a migration path for existing Node developers with existing Express knowledge and applications. You should generally not consider this approach when developing an application from scratch and implement idiomatic serverless patterns instead (e.g. utilizing API Gateway routing).
Just to reiterate, the main downsides of this approach:
Higher needless overall code complexity due to forgoing API Gateway functionality (routing etc.)
Higher initialization time resulting in longer cold starts
Larger code footprint due to more dependencies
Larger code footprint due to losing tree shaking / individual packaging due to internal routing
P.S. The main contender these days probably wouldn't be a dedicated EC2 instance, and rather Fargate containers running Express in Node.js. This pattern has many of the same benefits as Serverless while keeping existing development patterns and tools largely intact.

Resources