How do I find the request ID for a failed Lambda invocation? - aws-lambda

On my AWS Lambda dashboard, I see a spike in failed invocations. I want to investigate these errors by looking at the logs for these invocations. Currently, the only way I can filter these invocations is to get the timeline of the failed invocations and then look through the logs.
Is there a way I can search for failed invocations, i.e. ones that did not return a 200, and get a request ID that I can then look up in CloudWatch Logs?

You can use AWS X-Ray for this by enabling it in the AWS Lambda dashboard.
In the X-Ray dashboard you can:
view traces
filter them by status code
see all the details of the invocation, including the request ID, total execution time, etc., such as:
{
  "Document": {
    "id": "ept5e8c459d8d017fab",
    "name": "zucker",
    "start_time": 1595364779.526,
    "trace_id": "1-some-trace-id-fa543548b17a44aeb2e62171",
    "end_time": 1595364780.079,
    "http": {
      "response": {
        "status": 200
      }
    },
    "aws": {
      "request_id": "abcdefg-69b5-hijkl-95cc-170e91c66110"
    },
    "origin": "AWS::Lambda",
    "resource_arn": "arn:aws:lambda:eu-west-1:12345678:function:major-tom"
  },
  "Id": "52dc189d8d017fab"
}
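If you would rather script this than click through the console, the same data can be pulled via the X-Ray API. Here is a rough boto3 sketch under a few assumptions: active tracing is enabled, the one-hour window fits your spike, and the filter expression "error = true OR fault = true" matches the failures you care about (Lambda function errors typically surface as faults):

import json
from datetime import datetime, timedelta

import boto3

xray = boto3.client("xray")
end = datetime.utcnow()
start = end - timedelta(hours=1)  # assumed window around the spike

# Collect the IDs of traces that recorded an error or a fault.
trace_ids = []
paginator = xray.get_paginator("get_trace_summaries")
for page in paginator.paginate(StartTime=start, EndTime=end,
                               FilterExpression="error = true OR fault = true"):
    trace_ids += [summary["Id"] for summary in page["TraceSummaries"]]

# Fetch the full traces (max 5 IDs per call) and dig the Lambda request
# IDs out of the segment documents, as in the JSON shown above.
for i in range(0, len(trace_ids), 5):
    batch = xray.batch_get_traces(TraceIds=trace_ids[i:i + 5])
    for trace in batch["Traces"]:
        for segment in trace["Segments"]:
            doc = json.loads(segment["Document"])
            request_id = doc.get("aws", {}).get("request_id")
            if request_id:
                print(trace["Id"], request_id)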

What I understand from your question is that you are more interested in finding out why your Lambda invocations failed than in finding the request ID for each failed invocation.
You can do this by following the steps below:
1. Go to your Lambda function in the AWS console.
2. There will be three tabs: Configuration, Permissions, and Monitoring.
3. Click on the Monitoring tab. Here you can see the number of invocations, the error count and success rate, and other metrics as well. Click on the Errors metric to see at what time the invocation errors happened. You can read more at Lambda function metrics.
4. If you already know the time at which your function failed, ignore Step 3.
5. Now scroll down. You will find a section called CloudWatch Logs Insights, which shows logs for all the invocations that happened within the specified time range.
6. Adjust your time range under this section. You can choose a predefined time range such as 1h, 3h, or 1d, or a custom time range.
7. Now click on the log stream link after the above filter has been applied. It will take you to the CloudWatch console, where you can see the logs.
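If you prefer to query those logs programmatically, below is a hedged boto3 sketch of the same CloudWatch Logs Insights idea. The log group name /aws/lambda/my-function and the /ERROR/ filter pattern are assumptions; adjust them to your function and log format:

import time
from datetime import datetime, timedelta

import boto3

logs = boto3.client("logs")
end = datetime.utcnow()
start = end - timedelta(hours=3)  # mirrors the 3h predefined range

# Start an asynchronous Logs Insights query for errored invocations.
query_id = logs.start_query(
    logGroupName="/aws/lambda/my-function",  # hypothetical function name
    startTime=int(start.timestamp()),
    endTime=int(end.timestamp()),
    queryString=("fields @timestamp, @requestId, @message "
                 "| filter @message like /ERROR/ "
                 "| sort @timestamp desc"),
)["queryId"]

# Poll until the query finishes, then print one row per matching log line.
while True:
    response = logs.get_query_results(queryId=query_id)
    if response["status"] not in ("Scheduled", "Running"):
        break
    time.sleep(1)

for row in response["results"]:
    print({field["field"]: field["value"] for field in row})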

Related

custom job status for awx/ansible tower workflows

AWX/Ansible Tower has its own REST API service. From the URL structure below I can get information about an in-progress or finished workflow job:
https://<awx-ip>/api/v2/workflow_jobs/<job-id>/
But the "status" field at this URL doesn't show the value I want. If the templates that run in this job didn't hit an error during execution, the value always shows "successful".
So I need a way to expose my desired job status through this REST API service. Maybe the field below can be edited, but I don't know how:
"job_explanation": ""
I only need a field to serve a custom status for the ongoing or completed job. For example "partial", "failed", "successful", "ongoing", etc.
How can I edit or add a field while the job is ongoing and after it is completed? Is there a way to manipulate the fields in the REST API's job stats?
According to the Ansible Tower API Reference Guide, Workflow Jobs, under Retrieve a Workflow Job, the status (choice) field can have the following values:
new: New
pending: Pending
waiting: Waiting
running: Running
successful: Successful
failed: Failed
error: Error
canceled: Canceled
... status about the ongoing or completed job
So the status should already be there.
For example "partial", "failed", "successful", "ongoing" etc.
So it looks like the options you are looking for are already there:
ongoing -> running
partial -> canceled
failed -> failed
successful -> successful
curl --silent -u "${ACCOUNT}:${PASSWORD}" https://${TOWER_URL}/api/v2/workflow_jobs/${jobID}/ | jq .
Resulting in output like
...
"launch_type": "relaunch",
"status": "running",
"failed": false,
"started": "2022-02-04T14:28:04.147633Z",
"finished": null,
"canceled_on": null,
"elapsed": 17.367907,
"job_args": "",
"job_cwd": "",
"job_env": {},
"job_explanation": "",
...
and
...
"launch_type": "relaunch",
"status": "successful",
"failed": false,
"started": "2022-02-04T14:28:04.147633Z",
"finished": "2022-02-04T14:28:24.156419Z",
"canceled_on": null,
"elapsed": 20.009,
"job_args": "",
"job_cwd": "",
"job_env": {},
"job_explanation": "",
...
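If you want to consume those statuses programmatically rather than via curl, a minimal polling sketch with Python's requests library could look like the following; the base URL, job ID, and credentials are placeholders:

import time

import requests

def wait_for_job(base_url, job_id, auth):
    """Poll a workflow job until it leaves the not-yet-finished states."""
    url = f"{base_url}/api/v2/workflow_jobs/{job_id}/"
    while True:
        job = requests.get(url, auth=auth).json()
        if job["status"] not in ("new", "pending", "waiting", "running"):
            return job["status"]  # successful, failed, error or canceled
        time.sleep(5)

print(wait_for_job("https://tower.example.com", 42, ("admin", "password")))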
Maybe the field below can be edited, but I don't know how: "job_explanation": ""
According to List Jobs, the field is documented as
job_explanation: A status field to indicate the state of the job if it wasn't able to run and capture stdout (string)
and therefore it should probably not be edited.
How can I edit or add a field during the ongoing job ...
The REST API is for creating, starting, stopping, and so on of jobs, i.e. for remote controlling the Tower application. The values are set by the application itself; there is no safe way to set them yourself via the API.
... and after it is completed. Is there a way to manipulate the fields on REST API job stats?
It might be possible to alter the job result directly within the PostgreSQL application database backend.
It might also be possible for you to change the application under ansible/awx/ and the behavior of awx/api/urls/workflow_job.py.

Elasticsearch ConnectionTimeout even after setting timeout=100

My code is self-explanatory, so I won't provide additional details.
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])
e1 = {
    "first_name": "nitin",
    "last_name": "panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ['sports', 'music'],
    "timeout": 100,
    # "request_timeout": 100 gives the same error
}
res = es.index(index="test", doc_type="employee", id=1, body=e1)
I've read most posts about this error; however, all they suggest is increasing the timeout, which does not work for me.
This is the error:
ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=10))
You are trying to set timeout inside the body. You are supposed to initialize the Elasticsearch client with the timeout param, or, depending on the client library, there may be a request parameter for individual requests.
The body param, depending on the context, is either the actual search query or the data document (your case). Putting timeout in the body makes Elasticsearch treat it as data to be indexed as part of the document.
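For illustration, here is a small sketch of both options with elasticsearch-py 7.x (note that in the 8.x client the constructor parameter is request_timeout):

from elasticsearch import Elasticsearch

# Option 1: a client-level timeout applied to every request.
es = Elasticsearch(['http://localhost:9200'], timeout=100)

e1 = {
    "first_name": "nitin",
    "last_name": "panwar",
    "age": 27,
    "about": "Love to play cricket",
    "interests": ['sports', 'music'],
}

# Option 2: a per-request timeout, overriding the client default.
res = es.index(index="test", doc_type="employee", id=1, body=e1,
               request_timeout=100)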

View Completed Elasticsearch Tasks

I am trying to run daily tasks using Elasticsearch's Update By Query API. I can find currently running tasks but need a way to view all tasks, including completed ones.
I've reviewed the ES docs for the Update By Query API:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html#docs-update-by-query
And the Task API:
https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html#tasks
The Task API shows how to get the status of a currently running task with GET _tasks/[taskId], or of all running tasks with GET _tasks. But I need to see a history of all tasks that have run.
How do I see a list of all completed tasks?
A bit late to the party, but you can check the tasks in the .tasks system index.
You can query this index like any other regular index.
For the updateByQuery task you can do:
curl -XPOST -sS "elastic:9200/.tasks/_search" -H 'Content-Type: application/json' -d '{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "completed": true
          }
        },
        {
          "term": {
            "task.action": "indices:data/write/update/byquery"
          }
        }
      ]
    }
  }
}'
From the documentation:
The Task Management API is new and should still be considered a beta feature. It allows to retrieve information about the tasks currently executing on one or more nodes in the cluster.
Since it is still in beta, you are currently only able to do the following:
You can retrieve information for a particular task using GET /_tasks/<task_id>, where you can also use the detailed request parameter to get more information about the running tasks (other params are available as per the support).
Tasks can also be listed using GET _cat/tasks, the _cat version of the list tasks command, which accepts the same arguments as the standard list tasks command.
If a long-running task supports cancellation, it can be cancelled with the cancel tasks API, e.g. POST _tasks/oTUltX4IQMOUUVeiohTt8A:12345/_cancel
Tasks can be grouped with GET _tasks?group_by=parents
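For completeness, the same calls are also exposed through elasticsearch-py's tasks client; a rough sketch (the task ID is the placeholder from the example above):

from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])

# List running tasks with details, grouped by parent task.
print(es.tasks.list(detailed=True, group_by="parents"))

# Retrieve a single task by ID.
print(es.tasks.get(task_id="oTUltX4IQMOUUVeiohTt8A:12345"))

# Cancel a cancellable long-running task.
es.tasks.cancel(task_id="oTUltX4IQMOUUVeiohTt8A:12345")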

Elasticsearch Realtime GET support

When I index a document in ES and try to access it before the refresh interval has passed, the search does not return the result. Is there real-time GET support that allows getting a document once it is indexed, regardless of the "refresh rate" of the index? I tried reducing the refresh_interval to 500ms instead of 1s, but my search query happens even before 500ms have passed, and it is not a good idea to reduce the interval further.
After indexing a document, you can GET it immediately without waiting for the refresh interval.
The GET API is real-time
So if you index a new document like this
POST index/type/1
{ "name": "John Doe" }
You can get it immediately without waiting using
GET index/type/1
If you search, however, you'll need to wait for the refresh interval to pass in order to retrieve the new document or call the refresh API.
For completeness' sake, it's worth stating that when indexing you also have the option of refreshing the shards immediately, by passing the refresh=true parameter like below. Note, however, that this can have bad performance implications, so it should be used sparingly.
POST index/type/1?refresh=true
{ "name": "John Doe" }
Also worth noting that in ES 5, you'll have the option of telling ES to wait for a refresh before returning from the create call:
POST index/type/1?refresh=wait_for
{ "name": "John Doe" }
In this case, once the POST request returns, you're guaranteed that the new document is available in the next search call.
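The same flow in elasticsearch-py (a sketch using the newer typeless API; the index name and document are taken from the example above):

from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])

# Index and block until the document is visible to search (ES 5.0+).
es.index(index="index", id=1, body={"name": "John Doe"}, refresh="wait_for")

# A GET by ID is real-time and succeeds even without the refresh flag.
print(es.get(index="index", id=1)["_source"])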

How to use the RabbitMQ HTTP API to see which queues had messages in a ready state

I have a RabbitMQ server set up with thousands of queues, of which only about 5 are persistent. Every now and then there is a backup in a queue that will have about 5-10 messages in a ready state. These messages do not appear to be in the persistent queues. I want to find out which queues had the messages in a ready state, but the only indication that it is happening is on the overview page of the web management console, which covers all queues.
Is there a way to query Rabbit for the stats of messages that were in a ready state over a period of minutes, and for which queue they were in?
I would use the HTTP API.
http://rabbit-broker:15672/api/queues
This will give you a list of the current queue states in JSON, so you'll have to keep polling it. Store "messages_ready" for a given queue "name" over the period you want to monitor. Now you'll be able to see which queues have that backlog spike.
You can use plain curl, or whichever platform you prefer with an HTTP client.
Please note: the user you connect with will need the monitoring tag to access all the queue information.
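As a sketch, polling with Python's requests library could look like this; the host, credentials, and 60-second interval are placeholders:

import time

import requests

while True:
    queues = requests.get(
        "http://rabbit-broker:15672/api/queues",
        auth=("monitor_user", "password"),  # needs the monitoring tag
    ).json()
    for queue in queues:
        if queue.get("messages_ready", 0) > 0:
            print(queue["vhost"], queue["name"], queue["messages_ready"])
    time.sleep(60)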
Out of the box there is no easy way AFAIK, you'd have to manually click through the queues and look at their graphs in the UI for the last hour, which is tedious.
I had similar requirements and I found a better way than polling. The docs say that you can get raw samples via the API if you use special parameters in the request.
For example in your case, if you are interested in messages in the ready state, you can ask your queue for a history of queue lengths, for example the last 60 seconds with samples every 1 second (note that 15672 is the default port used by rabbitmq_management):
http://rabbitHost:15672/api/queues/vhost/queue?lengths_age=60&lengths_incr=1
For default vhost=/ it will be:
http://rabbitHost:15672/api/queues/%2F/queue?lengths_age=60&lengths_incr=1
Then in the result JSON there will be some additional _details objects like this:
"messages_ready_details": {
"avg": 8.524590163934427,
"avg_rate": 0.08333333333333333,
"samples": [{
"timestamp": 1532699694000,
"sample": 5
}, {
"timestamp": 1532699693000,
"sample": 11
},
<... more samples ...>
],
"rate": -6.0
},
"messages_ready": 5,
Then on this raw data you may do any stats you need.
Other raw data samples appear if you use different parameters in the request:
Messages sent and received: msg_rates_age / msg_rates_incr
Bytes sent and received: data_rates_age / data_rates_incr
Queue lengths: lengths_age / lengths_incr
Node statistics (e.g. file descriptors, disk space free): node_stats_age / node_stats_incr
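For example, fetching the last 60 seconds of queue-length samples with Python's requests library might look like this; the host, credentials, and queue name are placeholders:

import requests

# Ask for a 60-second history of queue lengths, sampled every second,
# for a queue on the default vhost (%2F).
resp = requests.get(
    "http://rabbitHost:15672/api/queues/%2F/my-queue",
    params={"lengths_age": 60, "lengths_incr": 1},
    auth=("monitor_user", "password"),
)
details = resp.json()["messages_ready_details"]
for sample in details["samples"]:
    print(sample["timestamp"], sample["sample"])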
