Activity cannot send a response with data larger than 32768 characters - ruby

I am trying to invoke a simple Lambda function (it prints "hello world" to the console) from Ruby. However, when I run the code and look at the SWF dashboard, I see the following error:
Reason: An Activity cannot send a response with data larger than 32768 characters. Please limit the size of the response. You can look at the Activity Worker logs to see the original response.
Could someone help me resolve this issue?
The code is as follows:
require 'aws/decider'
require 'aws-sdk'
class U_Act
  extend AWS::Flow::Activities
  activity :b_u do
    {
      version: "1.0"
    }
  end
  def b_u(c_id)
    # Region and credentials are redacted placeholders
    lambda_client = Aws::Lambda::Client.new(
      region: "xxxxxx",
      access_key_id: "XxXXXXXXXXX",
      secret_access_key: "XXXXXXXXXX"
    )
    resp = lambda_client.invoke(
      function_name: "s_u_1" # required
    )
    print "#{resp}"
  end
end
Thanks

According to the AWS documentation, input and result data in SWF cannot be larger than 32,768 characters. This limit applies to activity or workflow execution result data, input data when scheduling activity tasks or workflow executions, and input sent with a workflow execution signal.
Workarounds for this issue:
1. Upload the message to AWS S3 and pass the S3 path of the message between the activities.
2. If you need high performance, store the values in ElastiCache and pass the keys between the activities.
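Both workarounds follow the same idea: keep the large payload in an external store and pass only a small reference through SWF. Below is a minimal sketch of that idea in Python, not the Flow framework's API. `InMemoryStore`, `pack_result` and `unpack_result` are hypothetical names for illustration; in practice the store would be S3 or ElastiCache.

```python
import json
import uuid

MAX_SWF_PAYLOAD_CHARS = 32768  # limit quoted in the error message above


class InMemoryStore:
    """Hypothetical stand-in for S3 or ElastiCache, for illustration only."""

    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        return self._objects[key]


def pack_result(result, store, limit=MAX_SWF_PAYLOAD_CHARS):
    """Return a small JSON envelope; offload the payload if it is too large."""
    payload = json.dumps(result)
    inline = json.dumps({"inline": payload})
    if len(inline) <= limit:
        return inline
    # Too large for SWF: store the payload and pass only its key.
    key = f"swf-results/{uuid.uuid4()}"
    store.put(key, payload)
    return json.dumps({"ref": key})


def unpack_result(envelope, store):
    """Inverse of pack_result: resolve the reference if there is one."""
    data = json.loads(envelope)
    if "ref" in data:
        return json.loads(store.get(data["ref"]))
    return json.loads(data["inline"])
```

The activity would return `pack_result(...)` and the next activity would call `unpack_result(...)` on its input, so the data that travels through SWF stays below the limit regardless of the payload size.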

Form Recognizer Heavy Workload

My use case is the following:
Once a day I upload 1,000 single-page PDFs to Azure Storage and process them with Form Recognizer via the latest Python azure-form-recognizer client.
So far I'm using the async version of the client and I start the 1,000 coroutines concurrently.
# Runs inside an async function; asyncio is imported and analyze_async is defined elsewhere
tasks = {asyncio.create_task(analyze_async(doc)): doc for doc in documents}
pending = set(tasks)
# Handle retry
while pending:
    # backoff in case of 429 (asyncio.sleep does not block the event loop, unlike time.sleep)
    await asyncio.sleep(1)
    # concurrent call return_when all completed
    finished, pending = await asyncio.wait(
        pending, return_when=asyncio.ALL_COMPLETED
    )
    # check if task has exception and register for new run.
    for task in finished:
        doc = tasks[task]
        if task.exception():
            new_task = asyncio.create_task(analyze_async(doc))
            tasks[new_task] = doc
            pending.add(new_task)
I'm not really comfortable with this setup, mainly because the state of the service is unpredictable within a single iteration: it can be up, then return 429, then be up again, which is not deterministic enough for me. Is another approach possible? Should I instead ramp up the transaction rate progressively, starting at 15 (the default TPS), then 50, then 100, until the queue is empty? Or is there another option?
Thx
You need to enable CORS on the storage account and adjust its settings so that Form Recognizer can access the documents for the heavy workload.
Follow this procedure:
1. Create a storage account to upload the files. Choose page blobs for higher performance and ZRS for redundancy.
2. Go to CORS, add the required URL, and set Allowed origins to https://formrecognizer.appliedai.azure.com
3. Go to Containers and upload the documents.
4. Use the container and blob information as the input for the recognizer. Form Recognizer Studio applies limits on the total size of the documents and on the number of characters, so it is better to use Python code with the container you created as the input folder.
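On the throttling concern raised in the question, one possible approach is to cap the number of calls in flight and back off only the calls that were throttled, instead of starting all 1,000 coroutines at once. The sketch below assumes an `analyze` coroutine supplied by the caller and a hypothetical `ThrottledError` standing in for whatever exception the SDK raises on HTTP 429. Note that a cap of 15 concurrent calls is not the same thing as 15 transactions per second, so the value would need tuning.

```python
import asyncio
import random


class ThrottledError(Exception):
    """Hypothetical stand-in for the SDK error raised on HTTP 429."""


async def analyze_with_retry(analyze, doc, semaphore, max_attempts=5, base_delay=1.0):
    """Run analyze(doc) under a concurrency cap, backing off on throttling."""
    for attempt in range(max_attempts):
        try:
            async with semaphore:
                return await analyze(doc)
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter, done outside the semaphore so
            # the slot is free for other documents while this one waits.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))


async def analyze_all(analyze, documents, max_concurrency=15, **retry_kwargs):
    """Analyze every (hashable) document with a bounded number of calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    results = await asyncio.gather(
        *(analyze_with_retry(analyze, doc, semaphore, **retry_kwargs) for doc in documents)
    )
    return dict(zip(documents, results))
```

With this shape, a 429 delays only the affected document, and the overall load on the service stays bounded by `max_concurrency` rather than depending on how many tasks happen to fail in a given round.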

SQS task going in DLQ despite being successful in Lambda + also when deleted manually

I have built my own application around AWS Lambda and Salesforce.
I have around 10 users on my internal app, so we are not talking about heavy usage.
On a normal day, around 500-1,000 SQS tasks are processed, each taking 1-60 seconds depending on its complexity.
This is working perfectly.
Timeout for my lambda is 900.
BatchSize = 1
Using Python 3.8
I've created a decorator that lets me run through SQS those functions that need to be processed asynchronously with FIFO logic.
Everything is working well.
My Lambda function doesn't return anything at the end, but it completes successfully (standard scenario). However, I have noticed that some tasks were going into my DLQ (I only allow processing once; if a message is delivered a second time, it goes into the DLQ immediately).
What I don't get is why this is happening.
Lambda ends successfully --> normally the task should be deleted from the initial SQS queue.
So I've added a manual deletion of the processed task at the very end of the function. I've logged the response of boto3.client.delete_message and I get a 200 status, so everything looks OK. However, once in a while (about 1 in 100, so around 10 times per day in my case) I can see the task going into the DLQ.
When I resubmit the same task to my standard queue without changing anything, it gets processed successfully (again) and deleted (as initially expected).
What bothers me most is that the message still sometimes ends up in the DLQ even after it has been deleted. What could be the problem?
Example of my async processor
import json
import sys
import boto3
# settings and utils are application modules (not shown here)
def process_data(event, context):
    """
    By convention, we need to store in the table AsyncTaskQueueName a dict with the following parameters:
    - python_module: used to determine the location of the method to call asynchronously
    - python_function: used to determine the location of the method to call asynchronously
    - uuid: uuid to get the params stored in dynamodb
    """
    print('Start Processing Async')
    client = boto3.client('sqs')
    queue_url = client.get_queue_url(QueueName=settings.AsyncTaskQueueName)['QueueUrl']
    # batch size = 1 so only 1 record to process
    for record in event['Records']:
        try:
            kwargs = json.loads(record['body'])
            print(f'Start Processing Async Data Record:\n{kwargs}')
            python_module = kwargs['python_module']
            python_function = kwargs['python_function']
            # CALLING THE FUNCTION WE WANTED ASYNC, AND DOING ITS STUFF... (WORKING OK)
            getattr(sys.modules[python_module], python_function)(uuid=kwargs['uuid'], is_in_async_processing=True)
            print('End Processing Async Data Record')
            res = client.delete_message(QueueUrl=queue_url, ReceiptHandle=record['receiptHandle'])
            # When the problem I'm monitoring occurs, it goes up to the next line, with res status = 200 !!
            # That's where I'm losing my mind. I can confirm the uuid in the DLQ being the same as in the queue,
            # so we are definitely talking of the same message which has been moved to the DLQ.
            print(f'End Deleting Async Data Record with status: {res}')
        except Exception:
            # set expire to 0 so that the task goes into DLQ
            client.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=record['receiptHandle'],
                VisibilityTimeout=0
            )
            utils.raise_exception(f'There was a problem during async processing. Event:\n'
                                  f'{json.dumps(event, indent=4, default=utils.jsonize_datetime)}')
Example of today's bug with logs from CloudWatch:
Initial event:
{'Records': [{'messageId': '75587372-256a-47d4-905b-62e1b42e2dad', 'receiptHandle': 'YYYYYY", "python_module": "quote.processing", "python_function": "compute_price_data"}', 'attributes': {'ApproximateReceiveCount': '1', 'SentTimestamp': '1621432888344', 'SequenceNumber': '18861830893125615872', 'MessageGroupId': 'compute_price_data', 'SenderId': 'XXXXX:main-app-production-main', 'MessageDeduplicationId': 'b4de6096-b8aa-11eb-9d50-5330640b1ec1', 'ApproximateFirstReceiveTimestamp': '1621432888344'}, 'messageAttributes': {}, 'md5OfBody': '5a67d0ed88898b7b71643ebba975e708', 'eventSource': 'aws:sqs', 'eventSourceARN': 'arn:aws:sqs:eu-west-3:XXXXX:async_task-production.fifo', 'awsRegion': 'eu-west-3'}]}
Res (after calling delete_message):
End Deleting Async Data Record with status: {'ResponseMetadata': {'RequestId': '7738ffe7-0adb-5812-8701-a6f8161cf411', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '7738ffe7-0adb-5812-8701-a6f8161cf411', 'date': 'Wed, 19 May 2021 14:02:47 GMT', 'content-type': 'text/xml', 'content-length': '215'}, 'RetryAttempts': 0}}
BUT... 75587372-256a-47d4-905b-62e1b42e2dad is in the DLQ after this delete_message. This is driving me crazy.
OK, the problem was that the timeout was set to 900 in my serverless.yml, but not in AWS. I may have changed it manually to 1 minute, so my long tasks were released after 1 minute and then went straight to the DLQ.
Hence the deletion did nothing, since the task was already in the DLQ when the deletion was made.
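A mismatch like this can be caught by comparing the deployed values with the expected ones. The sketch below is one way to do it, assuming boto3 clients are passed in (for example `boto3.client("lambda")` and `boto3.client("sqs")`). `check_timeouts` is a hypothetical helper name; it relies on `get_function_configuration` returning `Timeout` in seconds and on `get_queue_attributes` returning `VisibilityTimeout` as a string.

```python
def check_timeouts(lambda_client, sqs_client, function_name, queue_url, expected_timeout):
    """Compare deployed timeouts with the value expected from serverless.yml.

    Returns a list of human-readable problems (empty if everything matches).
    """
    deployed = lambda_client.get_function_configuration(
        FunctionName=function_name
    )["Timeout"]
    attrs = sqs_client.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["VisibilityTimeout"]
    )["Attributes"]
    visibility = int(attrs["VisibilityTimeout"])  # SQS returns attribute values as strings

    problems = []
    if deployed != expected_timeout:
        problems.append(f"Lambda timeout is {deployed}s, expected {expected_timeout}s")
    if visibility < deployed:
        # A message can become visible again while the function is still running.
        problems.append(
            f"visibility timeout {visibility}s is shorter than Lambda timeout {deployed}s"
        )
    return problems
```

The second check matters because a message whose visibility timeout expires mid-run is delivered again, and with a maximum receive count of 1 that second delivery sends it to the DLQ. AWS recommends a queue visibility timeout of at least six times the function timeout for SQS event sources.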

golang get kubernetes resources(30000+ configmaps) failed

I want to use client-go to get resources in a Kubernetes cluster. Because of the large amount of data, the connection is closed when I list the ConfigMaps:
stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 695; INTERNAL_ERROR
configmaps:
$ kubectl -n kube-system get cm |wc -l
35937
code:
cms, err := client.CoreV1().ConfigMaps("kube-system").List(context.TODO(), v1.ListOptions{})
I tried the Limit parameter and can get some data, but I don't know how to get all of it.
cms, err := client.CoreV1().ConfigMaps("kube-system").List(context.TODO(), v1.ListOptions{Limit: 1000})
I'm new to Go. Any pointers as to how to go about it would be greatly appreciated.
The documentation for v1.ListOptions describes how it works:
limit is a maximum number of responses to return for a list call. If more items exist, the
server will set the continue field on the list metadata to a value that can be used with the
same initial query to retrieve the next set of results.
This means that you should examine the response, save the value of the continue field (as well as the actual results), then reissue the same command but with continue set to the just seen value. Repeat until the returned continue field is empty (or an error occurs).
See the API concepts page for details on handling chunking of large results.
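The loop described above is the same in any language. Here is a minimal, language-neutral sketch of it in Python, not client-go code. `list_page` is a hypothetical callable standing in for the List call, and `make_fake_server` is an in-memory stand-in for the API server. In client-go, the corresponding fields are `Limit` and `Continue` on `ListOptions`, and the continue value is read from the returned list's metadata.

```python
def list_all(list_page, limit=1000):
    """Collect every item by following the continue token until it is empty."""
    items = []
    token = ""
    while True:
        page_items, token = list_page(limit=limit, continue_token=token)
        items.extend(page_items)
        if not token:  # an empty token means there are no more pages
            return items


def make_fake_server(total):
    """Hypothetical in-memory stand-in for the API server's list endpoint."""
    data = [f"cm-{i}" for i in range(total)]

    def list_page(limit, continue_token):
        start = int(continue_token) if continue_token else 0
        end = min(start + limit, len(data))
        next_token = str(end) if end < len(data) else ""
        return data[start:end], next_token

    return list_page
```

Each request carries the token from the previous response, so no single response has to hold all of the objects.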
You should use a ListPager to paginate requests that need to query many objects. The ListPager buffers pages, so it performs better than simply using the Limit and Continue values yourself.

How to detect if SimpleDB domain contains the requested item?

The AWS SimpleDB documentation for the Ruby SDK provides the following example with regard to using the get_attributes method:
resp = client.get_attributes({
domain_name: "String", # required
item_name: "String", # required
attribute_names: ["String"],
consistent_read: false,
})
...and then the following example response:
resp.attributes #=> Array
resp.attributes[0].name #=> String
resp.attributes[0].alternate_name_encoding #=> String
resp.attributes[0].value #=> String
resp.attributes[0].alternate_value_encoding #=> String
It also states the following piece of advice:
If the item does not exist on the replica that was accessed for this operation, an empty set is returned. The system does not return an error as it cannot guarantee the item does not exist on other replicas.
I hope that I'm misunderstanding this, but if your response does return an empty set, then how are you supposed to know if it's because no item exists with the supplied item name, or if your request just hit a replica that doesn't contain your item?
I have never used AWS SimpleDB, but from what I know about replication in Amazon's DynamoDB, the data is usually eventually consistent. While one replica handles your read request, replication of previously written data may still be in progress across the replicas responsible for storing your data. The replica handling your read may therefore not have the data (yet), which is why it cannot respond with an error.
To be 100% sure, you should be able to pass consistent_read: true, which should tell you whether the data exists in AWS SimpleDB or not.
According to the documentation of the get_attributes method:
:consistent_read (Boolean) —
Determines whether or not strong consistency should be enforced when data is read from SimpleDB. If true, any data previously written to SimpleDB will be returned. Otherwise, results will be consistent eventually, and the client may not see data that was written immediately before your read.
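The same check can be sketched in Python with a boto3-style SimpleDB client (`boto3.client("sdb")`), whose `get_attributes` call takes `DomainName`, `ItemName` and `ConsistentRead`. `item_exists` is a hypothetical helper name, and treating the `Attributes` key as optional in the response is an assumption made here to stay on the safe side.

```python
def item_exists(sdb_client, domain_name, item_name):
    """Return True if a consistent read finds at least one attribute for the item."""
    resp = sdb_client.get_attributes(
        DomainName=domain_name,
        ItemName=item_name,
        ConsistentRead=True,  # ask for strong consistency instead of eventual
    )
    # Assumption: the "Attributes" key may be missing when nothing is found.
    return bool(resp.get("Attributes"))
```

With a consistent read, an empty result should mean that the item does not exist, rather than that the replica serving the request has not received it yet.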

Ruby - Elastic Search & RabbitMQ - data import being lost, script crashing silently

Stackers
I have a lot of messages in a RabbitMQ queue (running on localhost in my dev environment). The payload of the messages is a JSON string that I want to load directly into Elastic Search (also running on localhost for now). I wrote a quick ruby script to pull the messages from the queue and load them into ES, which is as follows :
#! /usr/bin/ruby
require 'bunny'
require 'json'
require 'elasticsearch'
# Connect to RabbitMQ to collect data
mq_conn = Bunny.new
mq_conn.start
mq_ch = mq_conn.create_channel
mq_q = mq_ch.queue("test.data")
# Connect to ElasticSearch to post the data
es = Elasticsearch::Client.new log: true
# Main loop - collect the message and stuff it into the db.
mq_q.subscribe do |delivery_info, metadata, payload|
  begin
    es.index index: "indexname",
             type: "relationship",
             body: payload
  rescue
    puts "Received #{payload} - #{delivery_info} - #{metadata}"
    puts "Exception raised"
    exit
  end
end
mq_conn.close
There are around 4,000,000 messages in the queue.
When I run the script, I see a bunch of messages, say 30, being loaded into Elastic Search just fine. However, I see around 500 messages leaving the queue.
root@beep:~# rabbitmqctl list_queues
Listing queues ...
test.data 4333080
...done.
root@beep:~# rabbitmqctl list_queues
Listing queues ...
test.data 4332580
...done.
The script then silently exits without reporting an exception. The begin/rescue block is never triggered, so I don't know why the script is finishing early or losing so many messages. Any clues as to how I should debug this next?
I've added a simple, working example here:
https://github.com/elasticsearch/elasticsearch-ruby/blob/master/examples/rabbitmq/consumer-publisher.rb
It's hard to debug your example when you don't provide examples of the test data.
The Elasticsearch "river" feature is deprecated, and will be removed, eventually. You should definitely invest time into writing your own custom feeder, if RabbitMQ and Elasticsearch are a central part of your infrastructure.
Answering my own question, I then learned that this is a crazy and stupid way to load a message queue of index instructions into Elastic. I created a river and can drain instructions much faster than I could with a ropey script. ;-)
