I would like to know which is the better option amongst these two techniques:
Writing a function in EC2 instance using AWS CLI
Writing a function using AWS Lambda
I prefer writing code in AWS Lambda, but I would like to know if there are any specific advantages to using Lambda.
P.S.: The functions I have to execute are almost the same (they use the same algorithm), so there is no difference in functionality.
Regards
I understand that you would like to know if there is any particular advantage of using AWS Lambda instead of using EC2 Instances.
Here are some advantages of AWS Lambda:
Reduced cost. Lambda follows a pay-per-invocation pricing model, unlike AWS EC2, and the first million requests per month fall under the free tier[1]. Depending on your use case, you might save a lot by using AWS Lambda in your production environment.
No system administration overhead. AWS Lambda follows the serverless computing paradigm: there is no need to start servers, configure them to your needs, or maintain them.
An AWS Lambda function can be pretty convenient for automation tasks and can be triggered by a number of services[2]. For example, when you upload a file to an AWS S3 bucket, you could trigger a Lambda function that compresses the file and stores it in another S3 bucket.
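As a rough sketch of that S3-trigger pattern (the bucket name and key scheme are assumptions, not a drop-in implementation):

```python
import gzip
import io
import urllib.parse

# Hypothetical destination bucket; substitute your own.
DEST_BUCKET = "my-compressed-files"

def compress_bytes(data: bytes) -> bytes:
    """Gzip-compress a byte string in memory."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(data)
    return buf.getvalue()

def handler(event, context):
    """Invoked by an s3:ObjectCreated:* notification; writes a
    gzipped copy of each new object to DEST_BUCKET."""
    import boto3  # bundled with the AWS Lambda Python runtime
    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        s3.put_object(Bucket=DEST_BUCKET, Key=key + ".gz",
                      Body=compress_bytes(body))
```

The S3 bucket notification itself is configured on the source bucket (console or CLI), pointing at this function.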
However, Lambda has some disadvantages as well, compared to EC2/ECS:
Lambda functions are prone to cold starts. A cold start usually occurs when a function hasn't been invoked for some time: AWS has to deploy a new container for the function behind the scenes, so invocations may occasionally be delayed[3].
It can get arduous to debug Lambda functions through their logs and metrics in Amazon CloudWatch.
A Lambda function has a maximum execution time of 15 minutes, so it might not be possible to use one for time-consuming operations (e.g., processing large flat files).
Amazon EC2 carries a system administration overhead and might cost a bit more, but there are no cold start issues, and it also works for long-running tasks.
Therefore, you could choose to use EC2 or Lambda depending on your exact use-case.
I hope this answer helps you out.
References
[1]. https://aws.amazon.com/lambda/pricing/
[2]. https://docs.aws.amazon.com/lambda/latest/dg/invoking-lambda-function.html
[3]. https://docs.aws.amazon.com/lambda/latest/dg/running-lambda-code.html
AWS Lambda runs as a stateless service, meaning we can't store files inside the function. We built our whole application with 60 Lambda functions. Of those 60, 54 are triggered by API Gateway; the rest act as service modules (i.e., they are called by other Lambda functions).
If you use Lambda functions as microservices, you gain in both performance and price.
My suggestion: don't create a single Lambda function to run the whole system. Go with the microservices approach.
Related
We are moving our infrastructure to CloudFormation since it's much easier to describe the infrastructure in a nice manner. This works fantastically well for things like security groups, routing, VPCs, and transit gateways.
However, we have two issues we are struggling with, which I don't think fit the declarative, infrastructure-as-code paradigm that tools like Terraform and CloudFormation embody.
(1) We have a business requirement to run a scheduled batch at specific times of the day. These batches are very computationally intensive. To save costs, we run them on an EC2 instance that is brought up at that time, then torn down when the batch is finished. However, this seems to require a temporary change to the Terraform/CF files, then a change back. Is there a more native way of doing this?
(2) We store our clients' firewall rules for their load balancer (ALB) dynamically and allow clients to edit them. This information cannot be kept in the Terraform/CF files since clients can change it on demand.
Is there a way of properly doing these things in CF/Terraform?
(1) If you have to use EC2, you could create a Lambda function that starts your EC2 instances, then create a CloudWatch Events rule that triggers the Lambda at your specified date/time. For more details, see https://aws.amazon.com/premiumsupport/knowledge-center/start-stop-lambda-cloudwatch/. Once the job is done, have the EC2 instance shut itself down using the AWS SDK or the AWS CLI.
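A minimal sketch of such a starter function (the region and instance IDs are placeholders; the scheduled CloudWatch Events rule itself is configured separately):

```python
# Invoked on a schedule by a CloudWatch Events rule.
# REGION and INSTANCE_IDS are placeholders; substitute your own values.
REGION = "eu-west-1"
INSTANCE_IDS = ["i-0123456789abcdef0"]

def handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime
    ec2 = boto3.client("ec2", region_name=REGION)
    resp = ec2.start_instances(InstanceIds=INSTANCE_IDS)
    # Return the IDs we asked to start, for easy inspection in the logs.
    return [i["InstanceId"] for i in resp["StartingInstances"]]
```

At the end of the batch script on the instance, a call such as `aws ec2 stop-instances --instance-ids <own-instance-id>` via the AWS CLI tears the instance down again.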
Alternatively, you could use AWS Lambda to run your batch job. You only get charged when the Lambda runs. Likewise, create a CloudWatch Event rule that schedules the Lambda.
(2) You could store the firewall rules in your own DB and modify the actual ALB security-group rules using the AWS SDK. I don't think it's a good idea to store these things in Terraform/CF. IMHO, Terraform/CF are great for declaring infrastructure, but they aren't a good fit for resources that change dynamically, especially when changed by third parties like your clients.
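A sketch of that idea, with a hypothetical security-group ID and stored-rule shape (both are assumptions about your schema):

```python
# Push client-edited firewall rules from your own DB onto the ALB's
# security group. SG_ID and the rule dict shape are hypothetical.
SG_ID = "sg-0123456789abcdef0"

def rule_to_permission(rule):
    """Translate one stored rule into the EC2 IpPermission structure."""
    return {
        "IpProtocol": "tcp",
        "FromPort": rule["port"],
        "ToPort": rule["port"],
        "IpRanges": [{"CidrIp": rule["cidr"]}],
    }

def apply_rules(rules):
    """Authorize all stored rules on the ALB's security group."""
    import boto3  # assumed available where this runs
    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=SG_ID,
        IpPermissions=[rule_to_permission(r) for r in rules],
    )
```

A real version would also revoke rules the client deleted, by diffing the stored rules against `describe_security_groups` output.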
I am looking to setup an event driven architecture to process messages from SQS and load into AWS S3. The events will be low volume and I was looking at either using Databricks or AWS lambda to process these messages as these are the 2 tools we already have procured.
I wanted to understand which one would be best to use, as I'm struggling to differentiate them for this task: the throughput is only up to 1,000 messages per day and unlikely to go higher for now, so both are capable.
I just wanted to see what other people would consider the differentiators between these two products, so I can future-proof this as best I can.
We have used Lambda more where I work, and it may help to keep things consistent since we have more AWS skills in-house, but we are also looking to build out our Databricks capability, and I personally find it easier to use.
If it were big data, the decision would have been easier.
Thanks
AWS Lambda seems to be a much better choice in this case. Following are some benefits you get with Lambda compared to Databricks.
Pros:
Free of cost: AWS Lambda's free tier covers 1 million requests and 400,000 GB-seconds of compute time per month, which means your rate of 1,000 requests/day will easily be covered. More details here.
Very simple setup: The Lambda function implementation will be very straightforward. Connect the SQS queue to your Lambda function using the AWS Console or the AWS CLI. More details here. The Lambda function code will just be a couple of lines: it receives the message from the SQS queue and writes it to S3.
Logging and monitoring: You won't need any separate setup to track performance metrics (how many messages Lambda processed, how many succeeded, how long they took); these metrics are generated automatically by Amazon CloudWatch. You also get a built-in retry mechanism: just specify the retry policy and AWS Lambda takes care of the rest.
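That "couple of lines" handler might look roughly like this (the bucket name and key scheme are assumptions):

```python
from datetime import datetime, timezone

DEST_BUCKET = "my-landing-bucket"  # hypothetical bucket name

def object_key(message_id):
    """One S3 object per message, keyed by UTC date and message id."""
    today = datetime.now(timezone.utc).strftime("%Y/%m/%d")
    return f"incoming/{today}/{message_id}.json"

def handler(event, context):
    """Invoked by the SQS event source mapping with a batch of messages."""
    import boto3  # bundled with the AWS Lambda Python runtime
    s3 = boto3.client("s3")
    for record in event["Records"]:
        s3.put_object(
            Bucket=DEST_BUCKET,
            Key=object_key(record["messageId"]),
            Body=record["body"].encode("utf-8"),
        )
```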
Cons:
One drawback of this approach is that each invocation of the Lambda will write to a separate file in S3, because S3 doesn't provide an API to append to existing files. So you will get 1,000 files in S3 per day. Maybe you are fine with this (it depends on what you want to do with the data in S3). If not, you will either need a separate job to merge the files periodically, or you'll have to download the existing file from S3, append to it, and upload it back, which makes your Lambda a bit more complex.
Databricks, on the other hand, is built for a different kind of use case: loading large datasets from Amazon S3 and performing analytics, SQL-like queries, building ML models, etc. It isn't suitable for this use case.
I created a new lambda function but do not see cloudfront as an option in the Triggers. Does anybody know why that might be? Thanks
As per AWS's current documentation:
Make sure that you're in the US-East-1 (N. Virginia) Region. You must be in this Region to create Lambda@Edge functions.
See: AWS Tutorial: Creating a Simple Lambda@Edge Function
You cannot add the trigger from the Lambda console. To add a trigger for a cache behavior, you need to do it from the CloudFront console.
It is explained in detail here: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-edge-add-triggers-cf-console.html
CloudFront's Lambda@Edge integration requires that the functions be written in Node.js. It isn't possible to trigger a function in another language directly from CloudFront.
You must create functions with the nodejs6.10 or nodejs8.10 runtime property.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-requirements-limits.html#lambda-requirements-lambda-function-configuration
Of course, the Node.js runtime environment has the AWS JavaScript SDK available, so if you had a really compelling case, you could use it from a JavaScript function to invoke a second, different Lambda function written in another language. It's difficult to imagine a common case where that would make sense, given the added latency and cost. I have, however, used this approach to let Lambda@Edge reach inside a VPC: that can only be done by invoking a second Lambda function (which can be configured with VPC access) from inside the first. The first can't have VPC access itself, because Lambda@Edge functions run in the region nearest to the viewer rather than in a single region, so they cannot run inside a VPC.
I'm pretty new to the whole Serverless landscape, and am trying to wrap my head around when to use Fargate vs Lambda.
I am aware that Fargate is a serverless subset of ECS, and Lambda is serverless as well but driven by events. But I'd like to be able to explain the two paradigms in simple terms to other folks that are familiar with containers but not that much with AWS and serverless.
Currently we have a couple of physical servers in charge of receiving text files, parsing them out, and populating several db tables with the results. Based on my understanding, I think this would be a use case better suited for Lambda because the process that parses out the text files is triggered by a schedule, is not long running, and ramps down when not in use.
However, if we were to port over one of our servers that receive API calls, we would probably want to use Fargate because we would always need at least one instance of the image up and running.
In terms of containers, and in very general terms would it be safe to say that if the container is designed to do:
docker run <some_input>
Then it is a job for Lambda.
But if the container is designed to do something like:
docker run --expose 80
Then it is a job for Fargate.
Is this a good analogy?
That's the start of a good analogy. However, Lambda also has limitations in terms of available CPU and RAM, and a maximum run time of 15 minutes per invocation. So anything that needs more resources, or needs to run for longer than 15 minutes, is a better fit for Fargate.
Also I'm not sure why you say something is a better fit for Fargate because you "always need at least one instance running". Lambda+API Gateway is a great fit for API calls. API Gateway is always ready to receive the API call and it will then invoke a Lambda function to process it (if the response isn't already cached).
It is important to note that with Lambda you don't need to build, secure, or maintain a container; you worry only about the code. As mentioned already, Lambda has a maximum run time and a 3 GB memory limit (CPU scales proportionally with memory). Also, if it is invoked only sporadically, it may need to be pre-warmed (invoked on a schedule) for extra performance.
Fargate runs Docker containers, which you need to define, maintain, and secure. If you need more control over the environment your code runs in, you could use a container (or a server), but that again comes with the management burden. You also have more options for memory/CPU size and for how long your task can run.
Even for an API server, as you mentioned, you could put API Gateway in front and call Lambda.
As Mark has already mentioned, you can use Lambda + API Gateway to expose your Lambda function as an API.
But Lambda has significant limitations on function execution: the supported programming languages, memory consumption, and execution time (recently increased to 15 minutes from the earlier 5). This is where AWS Fargate can help, by giving you the benefits of both the container world and the serverless (FaaS) world. Here you worry only about the container (its CPU and memory requirements, IAM policies, etc.) and leave the rest to Amazon ECS by choosing the Fargate launch type. ECS will choose the right instance type and manage your cluster, its auto scaling, and its optimal utilization.
This is the right analogy, but it is not an exhaustive explanation of the two paradigms.
In general, Lambda is more suitable for serverless applications. By nature it is function-as-a-service (FaaS): it does simple tasks, and that's all. Don't expect too much more.
It should be considered the first option for a serverless module, but it has more limitations and restrictions. A module's architecture follows from its functional and non-functional requirements, the surrounding infrastructure, and many other factors.
At a minimum, to make a decision you should review restrictions such as:
Portability
Environment control
Trigger type
Response time
Response size
Process time
Memory usage
These are the main factors, but the list doesn't cover all the factors and restrictions to consider when choosing between these two serverless technologies.
To learn more, I recommend this article: https://greenm.io/aws-lambda-or-aws-fargate/
I am considering using the AWS Lambda serverless architecture for my next project. This is my understanding of the technology, and I would very much appreciate it if somebody could correct me:
You can deploy functions that act as event handlers.
The event handlers are configured to respond to the events that are provided.
In the case of writing the lambda functions in Javascript, you can require any other Javascript modules you write and use them.
All your Lambda functions and the modules they require are written to be stateless; your app's state is ultimately kept in the database.
If you ever want to write some stateful logic, such as keeping the result of one HTTP request, temporarily storing it somewhere, and looking it up in a subsequent request, is this not possible in Lambda?
About your question: Lambda functions can use a temporary directory, /tmp, to store files, with a limit of 512 MB. Since the Lambda container could be reused for performance, there is a chance that the file is still there for the next invocation. Relying on this is discouraged, but in some particular cases it can be helpful. Anyway, if you really need it, the better approach is to use a cache system.
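A best-effort /tmp cache can be sketched like this (the path and the fetch function are assumptions; a cold start always begins with an empty /tmp, so treat this purely as an optimization):

```python
import os

# /tmp survives only while the Lambda container stays warm.
CACHE_PATH = "/tmp/lookup-cache.txt"

def load_cached(fetch, path=CACHE_PATH):
    """Return data left in /tmp by a previous invocation, or call
    fetch() and cache its result for the next warm invocation."""
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    data = fetch()
    with open(path, "w") as f:
        f.write(data)
    return data
```

On a warm invocation the expensive `fetch()` is skipped; on a cold start it runs again, which is why correctness must never depend on the cache being present.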
In addition to your considerations, AWS Lambda is not good for:
Keeping state, like files that are downloaded and could be reused later.
Controlling the OS.
Long-running tasks.
Apps with hard latency requirements.
Also, depending on the database client, many concurrent Lambdas can lead to an overload of database connections, since a client is instantiated for each Lambda container.