The main question is: how is execution time accounted for in a WSGI application deployed on AWS Lambda?
Suppose I deploy the following simple Flask app:
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!", 200
using Zappa on AWS Lambda, with the following configuration:
{
    "dev": {
        "app_function": "simple_application.app",
        "profile_name": "default",
        "project_name": "simple_application",
        "runtime": "python3.7",
        "s3_bucket": "zappa-deployments-RANDOM",
        "memory_size": 128,
        "keep_warm": false,
        "aws_region": "us-east-1"
    }
}
Now, when AWS receives a request for my website, it will spin up a container with my code inside and let it handle the request. Suppose the request is served in 200ms.
Obviously the Lambda with the WSGI server inside continues running (Zappa by default makes the lambda run for at least 30s).
So now to the various subquestions:
How much time am I charged for the execution?
200ms, because of the request duration
30s, because of the limit I set on my lambda execution time
until the lambda container is killed by AWS to reclaim resources (which could occur even 30-45 minutes later)
If another request comes along (and the first one is still being served), will the second request spin up another Lambda container, or will it be queued until a threshold time has passed?
From reading the AWS Lambda pricing page I expected to be charged just for the 200ms, but I would bet on being charged for 30s because, after all, I'm the one who imposed that limit.
In case I'm charged just for the 200ms (plus the time of subsequent requests) but the container keeps running uninterrupted for 30-45 minutes, I have a third subquestion:
Suppose now that I want to use a global variable as a simple local cache and synchronize it with a database (let's say DynamoDB) before the container is killed.
To do this I would like to raise the execution time limit of my lambdas to 15m, then at lambda creation set a Timer to fire a function that synchronizes the state and aborts the function after 14m30s.
How would the accounted running time change in this setting (i.e. with a Timer that fires after a certain amount of time)?
The proposed lambda code for this subquestion is:
from flask import Flask
from threading import Timer
from db import Visits
import sys

lambda_uuid = "SOME-UUID-OF-LAMBDA-INSTANCE"

# Collects number of visits
visits = 0

def report_visits():
    # Persist the in-memory counter before the container goes away
    Visits(uuid=lambda_uuid, visits=visits).save()
    sys.exit(0)

t = Timer(14 * 60 + 30, report_visits)
t.start()

# Start of flask routes
app = Flask(__name__)

@app.route("/")
def hello():
    global visits  # without this, the assignment below raises UnboundLocalError
    visits = visits + 1
    return "Hello World!", 200
Thanks in advance for any information.
Zappa by default makes the lambda run for at least 30s
I can find no documentation to support this. On the contrary, all the Zappa documentation I can find states that a Zappa function ends as soon as a response is returned (just like any other AWS Lambda function), and as such you are only billed for the milliseconds it took to generate the response.
I do see that the default maximum execution time for a Zappa Lambda function is 30 seconds. Perhaps that is where you are confused? That setting simply tells AWS Lambda to kill your running function instance if it runs for that long.
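For reference, I believe the setting in question is timeout_seconds in the Zappa settings file (check the Zappa docs in case the key name has changed); something like:
{
    "dev": {
        "timeout_seconds": 30
    }
}
Raising it only raises the ceiling at which AWS kills a still-running invocation; it does not make invocations run any longer than they need to.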
From reading the AWS Lambda pricing page I expected to be charged just for the 200ms, but I would bet on being charged for 30s because, after all, I'm the one who imposed that limit.
You would be charged for exactly how long your function runs. So if it runs for 200ms, then you are charged for 200ms.
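To put rough numbers on that, here is a back-of-the-envelope sketch (the per-GB-second and per-request rates below are the published us-east-1 x86 prices at the time of writing; check the pricing page for current values):
# Rough cost of one 200ms invocation at 128MB.
# Assumed rates: ~$0.0000166667 per GB-second, $0.20 per 1M requests.
memory_gb = 128 / 1024              # 0.125 GB
billed_duration_s = 0.200           # what the function actually ran, not the 30s timeout
compute_cost = memory_gb * billed_duration_s * 0.0000166667
request_cost = 0.20 / 1_000_000
print(compute_cost + request_cost)  # roughly $0.0000006 per request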
In case I'm charged just for the 200ms (plus the time of subsequent requests) but the container keeps running uninterrupted for 30-45 minutes, I have a third subquestion:
It doesn't keep running that long, it ends when it returns a response.
You are telling Lambda to kill your function instance if it runs for more than 30 seconds, so it's never going to run for more than 30 seconds with your current settings.
The current maximum run time for an AWS Lambda function is 15 minutes, so it couldn't possibly run for 30-45 minutes anyway.
Related
I'm integrating a lambda function with a standard queue in SQS.
I came across two parameters, batchSize and maxBatchingWindow. My original thinking was that the lambda is triggered either when the number of messages in the queue reaches batchSize or when maxBatchingWindow seconds have passed since the first message came in. In other words, whichever condition is satisfied first invokes the lambda. I couldn't find enough clarification about these two parameters in this documentation.
As a result, I did some experimenting, setting batchSize = 3 and maxBatchingWindow = 300 seconds while setting reservedConcurrency = 1 for the lambda. Then I manually created 3 messages in the queue in quick succession (<< 5 min). However, I didn't observe the lambda being invoked until after the 5 minutes (300 s) had passed. In particular, the Number Of Messages Sent metric of SQS shows a new data point at xx:54:15, while the log group for the lambda updates around xx:59:53 (the lambda does nothing intensive, it just prints out the value of event, so I'm sure that is the right execution).
Does that mean that once maxBatchingWindow is set greater than 0, it becomes the only requirement to invoke the lambda, even if batchSize has been met?
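For context, a minimal sketch of how those two parameters are attached to the trigger (the queue ARN and function name below are placeholders):
import boto3

# Sketch: creating the SQS trigger with a batch size and a batching window.
lambda_client = boto3.client("lambda")
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:my-queue",  # placeholder
    FunctionName="my-function",                                    # placeholder
    BatchSize=3,
    MaximumBatchingWindowInSeconds=300,
)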
I have built my own application around AWS Lambda and Salesforce.
I have around 10 users using my internal app, so we are not talking about big usage.
Daily, I have around 500-1000 SQS tasks processed on a normal day, with each task taking around 1-60 seconds depending on its complexity.
This is working perfectly.
Timeout for my lambda is 900 seconds.
BatchSize = 1
Using Python 3.8
I've created a decorator which allows me to run some of my functions through SQS when they need to be processed asynchronously with FIFO logic.
Everything is working well.
My Lambda function doesn't return anything at the end, but it completes with success (the standard scenario). However, I have noticed that some tasks were going into my DLQ (I only allow processing once; if a message is delivered again it goes into the DLQ immediately).
The thing I don't get is why this is happening.
The Lambda ends with success --> normally the task should be deleted from the initial SQS queue.
So I've added a manual deletion of the processed task at the very end of the function. I've logged the result returned when I call boto3.client.delete_message and I get a 200 status, so everything looks OK... However, once in a while (1 out of 100, so 10 times per day in my case) I can see the task going into the DLQ...
Reprocessing the same task through my standard queue without changing anything... it gets processed successfully (again) and deleted (as expected initially).
What is most problematic to me is that the message sometimes still ends up in the DLQ even though it was deleted. What could be the problem?
Example of my async processor
import json
import sys

import boto3

import settings  # project-local module
import utils     # project-local module


def process_data(event, context):
    """
    By convention, we need to store in the table AsyncTaskQueueName a dict with the following parameters:
    - python_module: used to determine the location of the method to call asynchronously
    - python_function: used to determine the location of the method to call asynchronously
    - uuid: uuid to get the params stored in dynamodb
    """
    print('Start Processing Async')
    client = boto3.client('sqs')
    queue_url = client.get_queue_url(QueueName=settings.AsyncTaskQueueName)['QueueUrl']
    # batch size = 1 so only 1 record to process
    for record in event['Records']:
        try:
            kwargs = json.loads(record['body'])
            print(f'Start Processing Async Data Record:\n{kwargs}')
            python_module = kwargs['python_module']
            python_function = kwargs['python_function']
            # CALLING THE FUNCTION WE WANTED ASYNC, AND DOING ITS STUFF... (WORKING OK)
            getattr(sys.modules[python_module], python_function)(uuid=kwargs['uuid'], is_in_async_processing=True)
            print('End Processing Async Data Record')
            res = client.delete_message(QueueUrl=queue_url, ReceiptHandle=record['receiptHandle'])
            # When the problem I'm monitoring occurs, it goes up to this line, with res status = 200!
            # That's where I'm losing my mind. I can confirm the uuid in the DLQ is the same as in
            # the queue, so we are definitely talking about the same message being moved to the DLQ.
            print(f'End Deleting Async Data Record with status: {res}')
        except Exception:
            # set the visibility timeout to 0 so that the task goes into the DLQ
            client.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=record['receiptHandle'],
                VisibilityTimeout=0
            )
            utils.raise_exception(f'There was a problem during async processing. Event:\n'
                                  f'{json.dumps(event, indent=4, default=utils.jsonize_datetime)}')
Example of today's bug with logs from CloudWatch:
Initial event:
{'Records': [{'messageId': '75587372-256a-47d4-905b-62e1b42e2dad', 'receiptHandle': 'YYYYYY", "python_module": "quote.processing", "python_function": "compute_price_data"}', 'attributes': {'ApproximateReceiveCount': '1', 'SentTimestamp': '1621432888344', 'SequenceNumber': '18861830893125615872', 'MessageGroupId': 'compute_price_data', 'SenderId': 'XXXXX:main-app-production-main', 'MessageDeduplicationId': 'b4de6096-b8aa-11eb-9d50-5330640b1ec1', 'ApproximateFirstReceiveTimestamp': '1621432888344'}, 'messageAttributes': {}, 'md5OfBody': '5a67d0ed88898b7b71643ebba975e708', 'eventSource': 'aws:sqs', 'eventSourceARN': 'arn:aws:sqs:eu-west-3:XXXXX:async_task-production.fifo', 'awsRegion': 'eu-west-3'}]}
Res (after calling delete_message):
End Deleting Async Data Record with status: {'ResponseMetadata': {'RequestId': '7738ffe7-0adb-5812-8701-a6f8161cf411', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '7738ffe7-0adb-5812-8701-a6f8161cf411', 'date': 'Wed, 19 May 2021 14:02:47 GMT', 'content-type': 'text/xml', 'content-length': '215'}, 'RetryAttempts': 0}}
BUT... 75587372-256a-47d4-905b-62e1b42e2dad is in the DLQ after this delete_message. It's driving me crazy.
OK, the problem was that my serverless.yml timeout setting was 900, but the value actually deployed in AWS was not. I may have changed it manually to 1 minute, so my long tasks were released back to the queue after 1 minute and then went immediately to the DLQ.
Hence the deletion didn't help, since the task was already in the DLQ by the time the deletion was made.
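For anyone hitting the same thing, a quick sanity check that the deployed function timeout and the queue visibility timeout are what you expect (a sketch; the function name and queue URL are placeholders):
import boto3

# Sketch: compare the deployed Lambda timeout with the queue's visibility timeout.
lambda_client = boto3.client("lambda")
sqs = boto3.client("sqs")

fn_timeout = lambda_client.get_function_configuration(
    FunctionName="my-function")["Timeout"]  # placeholder name
queue_url = "https://sqs.eu-west-3.amazonaws.com/123456789012/async_task-production.fifo"  # placeholder
visibility = sqs.get_queue_attributes(
    QueueUrl=queue_url,
    AttributeNames=["VisibilityTimeout"])["Attributes"]["VisibilityTimeout"]

print(f"Lambda timeout: {fn_timeout}s, queue visibility timeout: {visibility}s")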
I am currently doing performance testing for the application. We need to test with a number of concurrent users (e.g. 300). We are using a Stepping Thread Group and it is working fine.
The test is about 38 minutes. At some point, when the server memory is overloaded, the memory is cleaned and the restart takes 10 to 20 seconds; during that time we get a 502 - Bad Gateway response.
We have about 6 modules (each in a Transaction Controller) and each controller has about 20 to 30 API calls.
I just want to pause for 20 seconds the first time we encounter a 502. Is it possible to do that? I could use an If Controller to check whether the previous sample is OK, but I cannot add it for all the 20-30 calls, which is a time-consuming process. Is there any other way?
I would check the response code in a PostProcessor and, in case it is a 502 Bad Gateway, get the current thread to sleep using the Java Thread and JMeter APIs, using
JMeterThread getThread() from JMeterContext.
JMeterContext jmctx = JMeterContextService.getContext();
JMeterThread currentThread = jmctx.getThread();
currentThread.sleep(20000);
I am not sure about that currentThread.sleep(20000); because I need to check whether JMeterThread inherits sleep() from Java Thread.
Checking it locally.
More samples are here:
https://www.programcreek.com/java-api-examples/?api=org.apache.jmeter.threads.JMeterContext
I have a for loop that iterates over an array. For each item in the array, it calls a function that makes django-rest-framework requests. Each function call is independent of the others.
If the array has 25 items, it currently takes 30 seconds to complete. I am trying to get the total time down to less than 10 seconds.
Half the time spent in the function is taken up by DRF requests. Would it make sense to replace the for loop with a multiprocessing Pool? If so, how do I ensure each process makes requests over a separate connection using the requests package?
I tried just replacing:
for scenario_id in scenario_ids:
    step_scenario_partial(scenario_id)
with:
pool = Pool(processes=2)
pool.map(step_scenario_partial, scenario_ids)
which failed due to OpenSSL.SSL.Error: [('SSL routines', 'ssl3_get_record', 'decryption failed or bad record mac')]
According to this, the error was due to re-using the same SSL connection in more than one process.
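As an aside, one way to make sure each worker process gets its own connection is to create a separate requests.Session per process via a Pool initializer. This is only a sketch: it assumes step_scenario_partial can be adapted to accept a session argument, which is not shown in the original code.
from multiprocessing import Pool

import requests

_session = None

def _init_worker():
    # Each worker process builds its own Session, and therefore its own SSL connections.
    global _session
    _session = requests.Session()

def _run(scenario_id):
    # Hypothetical: assumes step_scenario_partial accepts a session keyword argument.
    return step_scenario_partial(scenario_id, session=_session)

pool = Pool(processes=2, initializer=_init_worker)
results = pool.map(_run, scenario_ids)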
You can use the concurrent.futures module (docs), which can execute tasks in parallel. Example method that returns a list of response objects:
from concurrent import futures

def execute_all(scenario_ids, num_workers=5):
    '''
    Method to make parallel API calls
    '''
    with futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
        return [result for result in executor.map(step_scenario_partial, scenario_ids)]
The ThreadPoolExecutor uses a pool of threads to execute asynchronous parallel calls. You can experiment with values of num_workers, starting with 5, to ensure the total execution time is <10 seconds.
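Usage would look something like this (a sketch; the timing is just there to check the <10 second target):
import time

start = time.time()
responses = execute_all(scenario_ids, num_workers=5)
print(f"{len(responses)} scenarios in {time.time() - start:.1f}s")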
I need to regulate how often a Mechanize instance connects with an API (at most once every 2 seconds, i.e. the gap between connections must be 2 seconds or more).
So this:
instance.pre_connect_hooks << Proc.new { sleep 2 }
I had thought this would work, and it sort of does, BUT now every method in that class sleeps for 2 seconds, as if the Mechanize instance is touched and told to hold for 2 seconds each time. I'm going to try a post-connect hook, but it is obvious I need something a bit more elaborate; I just don't know what at this point.
The code explains it better, so if you are interested you can follow along here: https://github.com/blueblank/reddit_modbot. Otherwise, my question concerns how to efficiently and effectively rate limit a Mechanize instance to within a specific time frame specified by an API (where overstepping that limit results in dropped requests and bans). I'm also guessing I need to better integrate the Mechanize instance into my class; any pointers on that are appreciated as well.
Pre and post connect hooks are called on every connect, so if there is some redirection it could trigger many times for one request. Try history_added which only gets called once:
instance.history_added = Proc.new {sleep 2}
I use SlowWeb to rate limit calls to a specific URL.
require 'slowweb'
SlowWeb.limit('example.com', 10, 60)
In this case calls to example.com domain are limited to 10 requests every 60 seconds.