Dagster failure notification systems - Sentry

Is there a way in Dagster to receive notifications when certain events occur, such as failures? For example, is there an integration with a tool like Sentry available?

There is a Datadog integration that lets users send events to Datadog. From the docs:
from dagster import ModeDefinition, execute_pipeline, pipeline, solid
from dagster_datadog import datadog_resource

@solid(required_resource_keys={'datadog'})
def datadog_solid(context):
    dd = context.resources.datadog

    dd.event('Man down!', 'This server needs assistance.')
    dd.gauge('users.online', 1001, tags=["protocol:http"])
    dd.increment('page.views')
    dd.decrement('page.views')
    dd.histogram('album.photo.count', 26, tags=["gender:female"])
    dd.distribution('album.photo.count', 26, tags=["color:blue"])
    dd.set('visitors.uniques', 999, tags=["browser:ie"])
    dd.service_check('svc.check_name', dd.WARNING)
    dd.timing("query.response.time", 1234)

    # Use timed decorator
    @dd.timed('run_fn')
    def run_fn():
        pass

    run_fn()

@pipeline(mode_defs=[ModeDefinition(resource_defs={'datadog': datadog_resource})])
def dd_pipeline():
    datadog_solid()

result = execute_pipeline(
    dd_pipeline,
    {'resources': {'datadog': {'config': {'api_key': 'YOUR_KEY', 'app_key': 'YOUR_KEY'}}}},
)
Adding first-class, user-configurable hooks for certain events (i.e. failure) is currently work in progress.

This may simply not have been available when the accepted answer was written, but the current Dagster version (0.9.16) has a better mechanism to solve the question at hand.
Dagster now has a hook system that lets you annotate a function to be triggered when a solid either completes successfully or fails.
Code example from the documentation:
from dagster import success_hook

@success_hook(required_resource_keys={'slack'})
def slack_on_success(context):
    message = 'solid {} succeeded'.format(context.solid.name)
    context.resources.slack.send_message(message)

@success_hook
def do_something_on_success(context):
    do_something()
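
Since the question is specifically about failures (and Sentry), the same release also provides a failure_hook decorator. Below is a minimal sketch; it simply calls the plain sentry_sdk client from inside the hook, since no dedicated Dagster-Sentry resource appears in the docs excerpt above, and the DSN and solid are placeholders:

import sentry_sdk
from dagster import execute_pipeline, failure_hook, pipeline, solid

sentry_sdk.init(dsn="https://YOUR_DSN@sentry.io/0")  # placeholder DSN

@failure_hook
def sentry_on_failure(context):
    # Any notification client could be called here; plain sentry_sdk is used
    # purely as an illustration.
    sentry_sdk.capture_message("solid {} failed".format(context.solid.name))

@solid
def flaky_solid(_):
    raise Exception("boom")

# Stacking the hook decorator on top of @pipeline attaches it to every solid;
# hooks can also be attached to individual solids via their with_hooks method.
@sentry_on_failure
@pipeline
def notifying_pipeline():
    flaky_solid()

execute_pipeline(notifying_pipeline, raise_on_error=False)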

Related

Can a client subscribe to multiple graphql subscriptions in one connection?

Say we have a schema like this:
type Subscription {
    objectAddedA: ObjectA
    objectAddedB: ObjectB
}
Can a GraphQL client subscribe to both the objectAddedA and objectAddedB subscriptions at the same time? I'm having a hard time finding good examples of subscriptions on the web, and the GraphQL docs don't seem to mention them at all unless I'm missing it.

We are designing a system that runs in Kubernetes, where a single pod receives API requests to add/update/delete configuration, and we want to use GraphQL subscriptions to push these changes to any pods that care about them (they would be the GraphQL clients). However, there are going to be lots of different object types and potentially several different types of events that they will want to be notified about at any time, so I'm not sure whether you can subscribe to several different subscriptions at once or whether you have to design the schema so that a single subscription gives all the possible events you'll need.
It is possible with graphql-python/gql
See the documentation here
Extract:
import asyncio

import backoff
from gql import Client, gql
from gql.transport.websockets import WebsocketsTransport

# query1, query2, subscription1 and subscription2 are assumed to be documents
# built with gql("...") beforehand.

# First define all your queries using a session argument:

async def execute_query1(session):
    result = await session.execute(query1)
    print(result)

async def execute_query2(session):
    result = await session.execute(query2)
    print(result)

async def execute_subscription1(session):
    async for result in session.subscribe(subscription1):
        print(result)

async def execute_subscription2(session):
    async for result in session.subscribe(subscription2):
        print(result)

# Then create a coroutine which will connect to your API and run all your queries as tasks.
# We use a `backoff` decorator to reconnect using exponential backoff in case of connection failure.
@backoff.on_exception(backoff.expo, Exception, max_time=300)
async def graphql_connection():
    transport = WebsocketsTransport(url="wss://YOUR_URL")
    client = Client(transport=transport, fetch_schema_from_transport=True)

    async with client as session:
        task1 = asyncio.create_task(execute_query1(session))
        task2 = asyncio.create_task(execute_query2(session))
        task3 = asyncio.create_task(execute_subscription1(session))
        task4 = asyncio.create_task(execute_subscription2(session))
        await asyncio.gather(task1, task2, task3, task4)

asyncio.run(graphql_connection())
Actually, the GraphQL standard explicitly says that
Subscription operations must have exactly one root field.
Python's "graphql-core" library enforces this through a validation rule, and libraries based on it (Graphene, Ariadne and Strawberry) follow the rule as well.
This is what the server says if you attempt multiple subscriptions in one request:
"error": {
    "message": "Anonymous Subscription must select only one top level field.",
You can remove this validation rule and see what happens, but remember that you're in no-standards land now, and things usually don't end well there... :D
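To see that rule in action without standing up a server, here is a minimal sketch that runs graphql-core's default validation against the schema from the question (nothing beyond graphql-core itself is assumed):

from graphql import build_schema, parse, validate

schema = build_schema("""
    type Query { ok: Boolean }
    type ObjectA { id: ID }
    type ObjectB { id: ID }
    type Subscription {
        objectAddedA: ObjectA
        objectAddedB: ObjectB
    }
""")

# A single subscription operation selecting two root fields violates the spec rule.
document = parse("""
    subscription {
        objectAddedA { id }
        objectAddedB { id }
    }
""")

for error in validate(schema, document):
    print(error.message)
# -> e.g. "Anonymous Subscription must select only one top level field."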

Rasa Chatbot: Handling repeated scenario

I am working on a follow-up bot. Each user has many tasks; when a user asks about their tasks, the bot fetches them via an API and then displays them one by one, asking the user whether they are able to finish each task today. If the user says yes, the task is marked as completed; if no, the bot asks the user for a finish date.
I have tried many solutions in a custom Action, iterating over the tasks and dispatching a template, but after dispatching, the loop stops and never resumes.
import json
from collections import namedtuple

import requests
# Action, DialogueStateTracker and `headers` come from the surrounding
# project (their imports/definitions were omitted in the question).

class ActionRequestTasks(Action):
    def name(self):
        return "action_request_tasks"

    @staticmethod
    def json2obj(data):
        return json.loads(data, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))

    def run(self, dispatcher, tracker: DialogueStateTracker, domain):
        response = requests.get('url', headers=headers)
        tasks_wrapper = self.json2obj(response.text)
        data = tasks_wrapper.Data

        first_message = "You have {} delayed tasks, I will help you to go through all of them".format(len(data))
        dispatcher.utter_message(first_message)

        for task in data:
            task_message = "Task Title {}\nComplete percentage {}\nStart Date {}\nFinish Date {}".format(
                task.Title, task.PercentComplete, task.StartDate, task.FinishDate)
            dispatcher.utter_message(task_message)
            dispatcher.utter_template("utter_able_to_finish", tracker)

        return []
This sounds like the perfect application for a Form. You can make the API call in the required_slots() method, then use validation to fill the slots depending on the user's response. The form will run until all slots are filled, and then you can decide what to do with the slots in the submit() method (for instance, updating the task status for each one via another request); a rough sketch of that shape follows below.
I recommend reading the docs on Form setup and also checking out the code for formbot to see a working implementation.
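As a sketch of that shape, assuming the Rasa 1.x FormAction API from rasa_sdk (the slot names, intents and utterance text below are made up for illustration):

from typing import Any, Dict, List, Text

from rasa_sdk import Tracker
from rasa_sdk.executor import CollectingDispatcher
from rasa_sdk.forms import FormAction

class TaskFollowUpForm(FormAction):
    """Hypothetical form that asks about one task and follows up on the answer."""

    def name(self) -> Text:
        return "task_follow_up_form"

    @staticmethod
    def required_slots(tracker: Tracker) -> List[Text]:
        # The task list could be fetched from the API here; the finish date is
        # only required if the user said they cannot finish the task today.
        if tracker.get_slot("able_to_finish") is False:
            return ["able_to_finish", "finish_date"]
        return ["able_to_finish"]

    def slot_mappings(self) -> Dict[Text, Any]:
        return {
            "able_to_finish": [
                self.from_intent(intent="affirm", value=True),
                self.from_intent(intent="deny", value=False),
            ],
            "finish_date": self.from_entity(entity="time"),
        }

    def submit(self, dispatcher: CollectingDispatcher, tracker: Tracker,
               domain: Dict[Text, Any]) -> List[Dict]:
        # All required slots are filled at this point; update the task via the
        # API here (mark it complete, or store the promised finish date).
        dispatcher.utter_message("Noted, I've updated the task.")
        return []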

Run when you can

In my Sinatra web application, I have a route:
get "/" do
  temp = MyClass.new("hello", 1)
  redirect "/home"
end
Where MyClass is:
class MyClass
  @instancesArray = []

  def initialize(string, id)
    @string = string
    @id = id
    @instancesArray[id] = self
  end

  def run(id)
    puts @instancesArray[id].string
  end
end
At some point I would want to run MyClass.run(1), but I wouldn't want it to execute immediately, because that would slow down the server's response to some clients. I would want the server to wait to run MyClass.run(temp) until there was some time with a lighter load. How could I tell it to wait until there is an empty/light load, then run MyClass.run(temp)? Can I do that?
Addendum
Here is some sample code for what I would want to do:
$var = 0
get "/" do
  $var = $var + 1 # each time a request is received, it increments
end
After that I would have a loop that counts requests per minute (after a minute it resets $var to 0), and if $var were less than some number, it would run tasks until the load increased.
As Andrew mentioned (correctly—not sure why he was voted down), Sinatra stops processing a route when it sees a redirect, so any subsequent statements will never execute. As you stated, you don't want to put those statements before the redirect because that will block the request until they complete. You could potentially send the redirect status and header to the client without using the redirect method and then call MyClass#run. This will have the desired effect (from the client's perspective), but the server process (or thread) will block until it completes. This is undesirable because that process (or thread) will not be able to serve any new requests until it unblocks.
You could fork a new process (or spawn a new thread) to handle this background task asynchronously from the main process associated with the request. Unfortunately, this approach has the potential to get messy. You would have to code around different situations like the background task failing, or the fork/spawn failing, or the main request process not ending if it owns a running thread or other process. (Disclaimer: I don't really know enough about IPC in Ruby and Rack under different application servers to understand all of the different scenarios, but I'm confident that here there be dragons.)
The most common solution pattern for this type of problem is to push the task into some kind of work queue to be serviced later by another process. Pushing a task onto the queue is ideally a very quick operation, and won't block the main process for more than a few milliseconds. This introduces a few new challenges (where is the queue? how is the task described so that it can be facilitated at a later time without any context? how do we maintain the worker processes?) but fortunately a lot of the leg work has already been done by other people. :-)
There is the delayed_job gem, which seems to provide a nice all-in-one solution. Unfortunately, it's mostly geared towards Rails and ActiveRecord, and the efforts people have made in the past to make it work with Sinatra look to be unmaintained. The contemporary, framework-agnostic solutions are Resque and Sidekiq. It might take some effort to get up and running with either option, but it would be well worth it if you have several "run when you can" type functions in your application.
MyClass.run(temp) is never actually executing. In your current / route you instantiate a new instance of MyClass and then immediately redirect to /home. I'm not entirely sure what the question is, though. If you want something to execute after the redirect, that functionality needs to exist within the /home route.
get '/home' do
  # some code like MyClass.run(some_arg)
end

Basic Sidekiq Questions about Idempotency and functions

I'm using Sidekiq to perform some heavy processing in the background. I looked online but couldn't find the answers to the following questions. I am using:
Class.delay.use_method(listing_id)
And then, inside the class, I have:
def self.use_method(listing_id)
  listing = Listing.find_by_id listing_id
  UserMailer.send_mail(listing)
  Class.call_example_function()
end
Two questions:
How do I make this function idempotent for the UserMailer sendmail? In other words, in case the delayed method runs twice, how do I make sure that it only sends the mail once? Would wrapping it in something like this work?
mail_sent = false
if !mail_sent
  UserMailer.send_mail(listing)
  mail_sent = true
end
I'm guessing not, since the function is tried again and then mail_sent is set to false for the second run-through. So how do I make it so that UserMailer is only run once?
Are functions called within the delayed async method also asynchronous? In other words, is Class.call_example_function() executed asynchronously (not part of the request/response cycle)? If not, should I use Class.delay.call_example_function()?
Overall, just getting familiar with Sidekiq so any thoughts would be appreciated.
Thanks
I'm coming into this late, but having been around the loop and had this StackOverflow entry appearing prominently via Google, it needs clarification.
The issue of idempotency and the issue of unique jobs are not the same thing. The 'unique' gems look at the parameters of a job at the point it is about to be processed. If they find that there was another job with the same parameters which had been submitted within some expiry time window, then the job is not actually processed.
The gems are literally what they say they are; they consider whether an enqueued job is unique or not within a certain time window. They do not interfere with the retry mechanism. In the case of the O.P.'s question, the e-mail would still get sent twice if Class.call_example_function() threw an error, causing a job retry, even though the previous line of code had already successfully sent the e-mail.
Aside: The sidekiq-unique-jobs gem mentioned in another answer has not been updated for Sidekiq 3 at the time of writing. An alternative is sidekiq-middleware which does much the same thing, but has been updated.
https://github.com/krasnoukhov/sidekiq-middleware
https://github.com/mhenrixon/sidekiq-unique-jobs (as previously mentioned)
There are numerous possible solutions to the O.P.'s email problem and the correct one is something that only the O.P. can assess in the context of their application and execution environment. One would be: If the e-mail is only going to be sent once ("Congratulations, you've signed up!") then a simple flag on the User model wrapped in a transaction should do the trick. Assuming a class User accessible as an association through the Listing via listing.user, and adding in a boolean flag mail_sent to the User model (with migration), then:
listing = Listing.find_by_id(listing_id)

unless listing.user.mail_sent?
  User.transaction do
    listing.user.mail_sent = true
    listing.user.save!
    UserMailer.send_mail(listing)
  end
end

Class.call_example_function()
...so that if the user mailer throws an exception, the transaction is rolled back and the change to the user's flag setting is undone. If the "call_example_function" code throws an exception, then the job fails and will be retried later, but the user's "e-mail sent" flag was successfully saved on the first try so the e-mail won't be resent.
Regarding idempotency, you can use the https://github.com/mhenrixon/sidekiq-unique-jobs gem:
All that is required is that you specifically set the sidekiq option for unique to true, like below:
sidekiq_options unique: true
For jobs scheduled in the future it is possible to set for how long the job should be unique. The job will be unique for the number of seconds configured or until the job has been completed.
If you want the unique job to stick around even after it has been successfully processed, then just set the unique_unlock_order to anything except :before_yield or :after_yield (unique_unlock_order = :never).
I'm not sure I understand the second part of the question. When you delay a method call, the whole method call is deferred to the Sidekiq process. If by 'request/response cycle' you mean that you are running a web server and you call delay from there, then all the calls within use_method are made from the Sidekiq process, and hence outside of that cycle. They are called synchronously relative to each other, though.

Running Plone subscriber events asynchronously

Using Plone 4, I have successfully created a subscriber event to do extra processing when a custom content type is saved. I accomplished this by using the Products.Archetypes.interfaces.IObjectInitializedEvent interface.
configure.zcml
<subscriber
    for="mycustom.product.interfaces.IRepositoryItem
         Products.Archetypes.interfaces.IObjectInitializedEvent"
    handler=".subscribers.notifyCreatedRepositoryItem"
    />
subscribers.py
def notifyCreatedRepositoryItem(repositoryitem, event):
    """
    This gets called on IObjectInitializedEvent - which occurs when a new object is created.
    """
    # my custom processing goes here. Should be asynchronous
However, the extra processing can sometimes take too long, and I was wondering if there is a way to run it in the background, i.e. asynchronously.
Is it possible to run subscriber events asynchronously, for example when saving an object?
Not out of the box. You'd need to add asynch support to your environment.
Take a look at plone.app.async; you'll need a ZEO environment and at least one extra instance. The latter will run async jobs you push into the queue from your site.
You can then define methods to be executed asynchronously and push tasks into the queue to run them.
Example code, push a task into the queue:
from zope.component import getUtility
from plone.app.async.interfaces import IAsyncService

async = getUtility(IAsyncService)
async.queueJob(an_async_task, someobject, arg1_value, arg2_value)
and the task itself:
def an_async_task(someobject, arg1, arg2):
    # do something with someobject
where someobject is a persistent object in your ZODB. The IAsyncService.queueJob takes at least a function and a context object, but you can add as many further arguments as you need to execute your task. The arguments must be pickleable.
The task will then be executed by an async worker instance when it can, outside of the context of the current request.
Just to give more options, you could try collective.taskqueue for that; it's really simple and really powerful (and it avoids some of the drawbacks of plone.app.async).
The description on PyPI already has enough to get you up to speed in no time, and you can use Redis for queue management, which is a big plus.
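As a rough sketch of how that typically looks, assuming collective.taskqueue's taskqueue.add API as described on PyPI (the @@do-heavy-processing view name is made up; you would register such a view yourself and configure the queue in your buildout/zcml):

from collective.taskqueue import taskqueue

def notifyCreatedRepositoryItem(repositoryitem, event):
    # Instead of doing the heavy work inline, queue a request to a
    # (hypothetical) browser view that performs it; the queued request is
    # replayed by a worker thread outside the current request/response cycle.
    path = '/'.join(repositoryitem.getPhysicalPath())
    taskqueue.add(path + '/@@do-heavy-processing')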
