I'm trying to send logs from AWS Lambda using the Datadog extension.
It works, but the logs aren't sent until the Lambda is shut down (as opposed to at the end of each invocation), which leads to a ~10 minute delay before logs appear in Datadog.
The current environment variables for the Lambda are as follows:
DD_API_KEY_SECRET_ARN = secret_arn
DD_CAPTURE_LAMBDA_PAYLOAD = true
DD_ENV = dev
DD_FLUSH_TO_LOG = false
DD_LAMBDA_HANDLER = index.handler
DD_LOG_LEVEL = debug
DD_LOGS_INJECTION = true
DD_SERVERLESS_LOGS_ENABLED = true
DD_SERVICE = MyService
DD_SITE = datadoghq.com
DD_TRACE_ENABLED = true
DD_VERSION = $LATEST
You should take a look at this issue:
https://github.com/DataDog/datadog-lambda-extension/issues/29
Let me quote an answer from it:
Hi @stalar, thanks for reaching out.
This is a known behavior based on the way Lambda Extensions and the
Lambda Logs API work. Once your function finishes running, the
extension is frozen until the next invocation. However, there isn't a
guarantee that we have received logs at that time. Logs may arrive on
the subsequent invocation of the function. Furthermore, if your
function is invoked repeatedly, we will switch to a strategy of
periodically flushing logs to reduce overhead, which may mean that
logs do not immediately appear in Datadog after each and every
invocation.
We are in touch with AWS about possible improvements to resolve this
issue.
Let me know if you have any further questions!
Related
I'm trying to use Winston to send logs to Datadog from an AWS Lambda. The problem with Lambdas is that once we return a response, execution stops, and Winston doesn't get time to flush the logs.
Is there a way I can force the flush before returning? I'm trying this, but it doesn't seem to do the trick:
async function handler (event): Promise<FormattedJSONResponse> {
  const logger = getLogger()
  // do some work
  await closeLogger(logger)
  return awsResponse
}

function closeLogger (logger: Logger): Promise<any> {
  const loggerDone = new Promise((resolve, _) => {
    logger.on('finish', () => {
      resolve(logger)
    })
  })
  logger.end()
  logger.close()
  return loggerDone
}
Versions:
AWS Lambda with Node.js 12
Winston: 3.3.3
Thanks for your help
First of all, I don't understand why you would want to send your logs from within your Lambda function. If you do, your Lambda function runs longer just to process the logs, meaning you are charged for the time it takes to ship them to Datadog.
Instead, you could write the logs to CloudWatch. To avoid high CloudWatch charges, set the retention to a rather short time, maybe one day. On the CloudWatch log group you can then add a subscription whose target is another Lambda function. This "log-processor" Lambda function will process and transform the logs and send them to Datadog. With this architecture your first Lambda function, which contains the business logic, won't fail if Datadog cannot be reached, for instance. It makes your architecture more resilient and gives you better separation of concerns. Yan Cui wrote a great article on "Centralised logging for AWS Lambda".
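If it helps, here is a minimal sketch (not the official Datadog forwarder) of what such a log-processor function could look like in Node.js/TypeScript. It assumes the standard CloudWatch Logs subscription payload (base64-encoded and gzipped); the intake hostname/path, the DD_API_KEY environment variable, and the forwardToDatadog helper are my own placeholders, so verify them against the current Datadog logs API docs:

import * as zlib from 'zlib'
import * as https from 'https'

// Shape of a decoded CloudWatch Logs subscription payload
interface DecodedLogsPayload {
  logGroup: string
  logStream: string
  logEvents: { id: string; timestamp: number; message: string }[]
}

export async function handler (event: { awslogs: { data: string } }): Promise<void> {
  // CloudWatch delivers the payload base64-encoded and gzipped
  const raw = Buffer.from(event.awslogs.data, 'base64')
  const decoded: DecodedLogsPayload = JSON.parse(zlib.gunzipSync(raw).toString('utf8'))

  const logs = decoded.logEvents.map(e => ({
    message: e.message,
    ddsource: 'lambda',
    service: 'MyService',                        // placeholder service name
    ddtags: `log_group:${decoded.logGroup}`
  }))

  await forwardToDatadog(logs)
}

// Placeholder helper: POST the batch to Datadog's HTTP log intake.
// Check the exact hostname, path and headers against the current Datadog docs.
function forwardToDatadog (logs: object[]): Promise<void> {
  return new Promise((resolve, reject) => {
    const req = https.request(
      {
        hostname: 'http-intake.logs.datadoghq.com',
        path: '/api/v2/logs',
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'DD-API-KEY': process.env.DD_API_KEY || ''
        }
      },
      res => {
        res.resume()
        if (res.statusCode && res.statusCode < 300) {
          resolve()
        } else {
          reject(new Error(`Datadog intake returned ${res.statusCode}`))
        }
      }
    )
    req.on('error', reject)
    req.write(JSON.stringify(logs))
    req.end()
  })
}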
Another approach, which still separates your logging from your Lambda function's business logic to some degree, builds on Lambda extensions, namely the Lambda Logs API.
Put simply, Lambda extensions add an extra layer to your function but are not part of the Lambda function's code itself. Probably the best part for you: Datadog already offers a ready-to-use extension, which is responsible for:
Pushing real-time enhanced Lambda metrics, custom metrics, and traces from the Datadog Lambda Library to Datadog.
Forwarding logs from your Lambda function to Datadog.
For more info on Lambda extensions, follow the links mentioned above or have a look at Yan Cui's post "Lambda Logs API: a new way to process Lambda logs in real-time".
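As a rough illustration, instrumenting the handler with the Datadog Lambda Library in Node.js looks like the sketch below (myHandler is a placeholder; when you use the extension layer together with DD_LAMBDA_HANDLER, this wrapping is done for you and your code stays untouched):

import { datadog } from 'datadog-lambda-js'

// Plain business-logic handler; myHandler is just a placeholder name
async function myHandler (event: any) {
  // ... do some work ...
  return { statusCode: 200, body: 'ok' }
}

// datadog() wraps the handler so metrics, traces and logs get correlated.
// With the extension layer plus DD_LAMBDA_HANDLER, this happens automatically.
export const handler = datadog(myHandler)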
After spending four hours on this issue, I found no other way (that works, isn't buggy, and is transport agnostic) than to use an arbitrary timeout before returning a response.
This example is for Next.js, but you can easily remove res: NextApiResponse.
export const gracefulExit = (response: any, res: NextApiResponse) => {
  setTimeout(() => {
    // sessionId comes from the surrounding module in the original answer
    res.send({ ...response, sessionId });
  }, 400);
};
Then in all my serverless functions I don't call res.send({x}) but rather gracefulExit({x}, res).
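For example, a hypothetical Next.js API route using it could look like this (doSomeWork and the import path are placeholders):

import type { NextApiRequest, NextApiResponse } from 'next'
import { gracefulExit } from '../../lib/gracefulExit' // illustrative path to the snippet above

// placeholder for the real business logic
async function doSomeWork (body: unknown) {
  return { ok: true, body }
}

export default async function handler (req: NextApiRequest, res: NextApiResponse) {
  const result = await doSomeWork(req.body)
  // give the log transport ~400 ms to flush before the response ends the invocation
  gracefulExit(result, res)
}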
I have an IIB message flow that runs for a few hours each evening, using a Java loop to perform some actions once per minute.
During that time, if the multi-instance broker that the flow is running on fails over, the failover hangs until this message flow ends its processing (potentially hours later).
Is there any kind of hook I can use in Java to say "if the broker is stopping or failing over, then cancel this processing to let it happen"?
Edit
I have now tried the following code as a test, but even when a request is made to stop the execution group/flow, the booleans all remain true:
Boolean egIsRunning = true;
Boolean aIsRunning = true;
Boolean msgFlowIsRunning = true;

while (egIsRunning && aIsRunning && msgFlowIsRunning)
{
    Thread.sleep(1000);
    ExecutionGroupProxy e = ExecutionGroupProxy.getLocalInstance();
    egIsRunning = e.isRunning();
    ApplicationProxy a = e.getApplicationByName("SANDBOX.APP");
    aIsRunning = a.isRunning();
    MessageFlowProxy m = a.getMessageFlowByName("SANDBOX_MSGFLOW");
    msgFlowIsRunning = m.isRunning();
}
So, I don't think the Integration API is going to help here? Or is there some ".isTryingToStop" method that I'm missing?
I recommend looking at the Integration API.
From within your Java Compute Node you can connect with BrokerProxy.getLocalInstance() (don't forget to disconnect(), otherwise you will eventually run out of memory). Then I would try MessageFlowProxy.isRunning() as an exit criterion for your loop.
Should isRunning not work, there are other options like AdministeredObjectListener to figure out what is happening with your flow.
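A rough sketch of that exit-criterion loop, combining the proxy classes from your edit with the connect/disconnect advice above (the application and flow names are placeholders):

import com.ibm.broker.config.proxy.*;

// Sketch only: poll the flow's state from inside the long-running loop and
// bail out when the flow or execution group is no longer running.
public static void loopWhileFlowIsRunning() throws Exception {
    BrokerProxy broker = BrokerProxy.getLocalInstance();
    try {
        ExecutionGroupProxy eg = ExecutionGroupProxy.getLocalInstance();
        ApplicationProxy app = eg.getApplicationByName("SANDBOX.APP");
        MessageFlowProxy flow = app.getMessageFlowByName("SANDBOX_MSGFLOW");

        while (eg.isRunning() && flow.isRunning()) {
            // ... perform the once-per-minute action here ...
            Thread.sleep(60 * 1000);
        }
    } finally {
        broker.disconnect(); // otherwise the proxy connections leak memory over time
    }
}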
Is there a way in Dagster to receive notifications when certain events occur, such as failures? For example, is there an integration with a tool like Sentry available?
There is a Datadog integration that lets users send events to Datadog. From the docs:
@solid(required_resource_keys={'datadog'})
def datadog_solid(context):
    dd = context.resources.datadog

    dd.event('Man down!', 'This server needs assistance.')
    dd.gauge('users.online', 1001, tags=["protocol:http"])
    dd.increment('page.views')
    dd.decrement('page.views')
    dd.histogram('album.photo.count', 26, tags=["gender:female"])
    dd.distribution('album.photo.count', 26, tags=["color:blue"])
    dd.set('visitors.uniques', 999, tags=["browser:ie"])
    dd.service_check('svc.check_name', dd.WARNING)
    dd.timing("query.response.time", 1234)

    # Use timed decorator
    @dd.timed('run_fn')
    def run_fn():
        pass

    run_fn()

@pipeline(mode_defs=[ModeDefinition(resource_defs={'datadog': datadog_resource})])
def dd_pipeline():
    datadog_solid()

result = execute_pipeline(
    dd_pipeline,
    {'resources': {'datadog': {'config': {'api_key': 'YOUR_KEY', 'app_key': 'YOUR_KEY'}}}},
)
Adding first-class user-configurable hooks for certain events (i.e. failure) is currently a work in progress.
Not sure if this simply was not available yet when the accepted answer was written, but the current Dagster version (0.9.16) has a better mechanism to solve the question at hand.
They now have a hook system, where you can annotate a function to be triggered when either a pipeline has completed successfully or when it has failed.
Code example from the documentation:
@success_hook(required_resource_keys={'slack'})
def slack_on_success(context):
    message = 'solid {} succeeded'.format(context.solid.name)
    context.resources.slack.send_message(message)

@success_hook
def do_something_on_success(context):
    do_something()
I'm trying to capture the output written by each task as it is executed. The code below works as expected when running Gradle with --max-workers 1, but when multiple tasks run in parallel it picks up output written by other tasks running simultaneously.
The API documentation states the following about the getLogging method on Task. From what it says, I judge that it should support capturing output from a single task regardless of any other tasks running at the same time.
getLogging()
Returns the LoggingManager which can be used to control the logging level and standard output/error capture for this task. https://docs.gradle.org/current/javadoc/org/gradle/api/Task.html
graph.allTasks.forEach { Task task ->
    task.ext.capturedOutput = [ ]
    def listener = { task.capturedOutput << it } as StandardOutputListener

    task.logging.addStandardErrorListener(listener)
    task.logging.addStandardOutputListener(listener)

    task.doLast {
        task.logging.removeStandardOutputListener(listener)
        task.logging.removeStandardErrorListener(listener)
    }
}
Have I messed up something in the code above or should I report this as a bug?
It looks like every LoggingManager instance shares an OutputLevelRenderer, which is what your listeners eventually get added to. This did make me wonder why you weren't getting duplicate messages, since you're attaching the same listeners to the same renderer over and over again. But it seems the magic is in BroadcastDispatch, which keeps the listeners in a map keyed by the listener object itself, so you can't have duplicate listeners.
Mind you, for that to hold, the hash code of each listener must be the same, which seems surprising. Anyway, perhaps this is working as intended, perhaps it isn't. It's certainly worth raising an issue to get some clarity on whether Gradle should support listeners per task. Alternatively, raise it on the dev mailing list.
I'm using Sidekiq to perform some heavy processing in the background. I looked online but couldn't find the answers to the following questions. I am using:
Class.delay.use_method(listing_id)
And then, inside the class, I have:
def self.use_method(listing_id)
  listing = Listing.find_by_id(listing_id)
  UserMailer.send_mail(listing)
  Class.call_example_function()
end
Two questions:
How do I make this function idempotent with respect to the UserMailer send_mail call? In other words, if the delayed method runs twice, how do I make sure that it only sends the mail once? Would wrapping it in something like this work?
mail_sent = false
if !mail_sent
  UserMailer.send_mail(listing)
  mail_sent = true
end
I'm guessing not, since the function is tried again and mail_sent is reset to false for the second run-through. So how do I make it so that UserMailer is only run once?
Are functions called within the delayed async method also asynchronous? In other words, is Class.call_example_function() executed asynchronously (i.e. not part of the request/response cycle)? If not, should I use Class.delay.call_example_function()?
Overall, just getting familiar with Sidekiq so any thoughts would be appreciated.
Thanks
I'm coming into this late, but having been around the loop on this and found this Stack Overflow entry appearing prominently via Google, I think it needs clarification.
The issue of idempotency and the issue of unique jobs are not the same thing. The 'unique' gems look at the parameters of a job at the point it is about to be processed. If they find that another job with the same parameters was submitted within some expiry time window, then the new job is not actually processed.
The gems are literally what they say they are: they consider whether an enqueued job is unique within a certain time window. They do not interfere with the retry mechanism. In the case of the O.P.'s question, the e-mail would still get sent twice if Class.call_example_function() threw an error and caused a job retry, even though the previous line of code had already successfully sent the e-mail.
Aside: The sidekiq-unique-jobs gem mentioned in another answer has not been updated for Sidekiq 3 at the time of writing. An alternative is sidekiq-middleware which does much the same thing, but has been updated.
https://github.com/krasnoukhov/sidekiq-middleware
https://github.com/mhenrixon/sidekiq-unique-jobs (as previously mentioned)
There are numerous possible solutions to the O.P.'s email problem and the correct one is something that only the O.P. can assess in the context of their application and execution environment. One would be: If the e-mail is only going to be sent once ("Congratulations, you've signed up!") then a simple flag on the User model wrapped in a transaction should do the trick. Assuming a class User accessible as an association through the Listing via listing.user, and adding in a boolean flag mail_sent to the User model (with migration), then:
listing = Listing.find_by_id(listing_id)

unless listing.user.mail_sent?
  User.transaction do
    listing.user.mail_sent = true
    listing.user.save!
    UserMailer.send_mail(listing)
  end
end

Class.call_example_function()
...so that if the user mailer throws an exception, the transaction is rolled back and the change to the user's flag setting is undone. If the "call_example_function" code throws an exception, then the job fails and will be retried later, but the user's "e-mail sent" flag was successfully saved on the first try so the e-mail won't be resent.
Regarding idempotency, you can use the https://github.com/mhenrixon/sidekiq-unique-jobs gem:
All that is required is that you specifically set the sidekiq option
for unique to true like below:
sidekiq_options unique: true
For jobs scheduled in the future it is possible to set for how long
the job should be unique. The job will be unique for the number of
seconds configured or until the job has been completed.
If you want the unique job to stick around even after it has been successfully processed, then just set unique_unlock_order to anything except :before_yield or :after_yield (unique_unlock_order = :never).
I'm not sure I understand the second part of the question. When you delay a method call, the whole method call is deferred to the Sidekiq process. If by "request/response cycle" you mean that you are running a web server and calling delay from there, then all the calls within use_method are executed in the Sidekiq process, and hence outside of that cycle. They are called synchronously relative to each other, though...
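To make the distinction concrete, here is a small sketch using the example names from the question (the delay extension generates a single background job for the whole method call):

# One job: everything inside use_method runs in the Sidekiq process, in order.
Class.delay.use_method(listing_id)    # enqueues a single job; the web request returns immediately

def self.use_method(listing_id)
  listing = Listing.find_by_id(listing_id)
  UserMailer.send_mail(listing)       # runs inside the Sidekiq worker...
  Class.call_example_function         # ...and only after the mailer call has returned
end

# If call_example_function should be its own background job instead, enqueue it explicitly:
Class.delay.call_example_function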