Hadoop: Implement a status callback

I am looking for a clean way to implement a Java event system that hooks into Hadoop v2. I know there is a notification URL, and I have used that in the past. What I want to do is hook into JobStatus and have events posted to a queue service for propagating them to clients. I tried extending Job and assigning my custom JobStatus class to the status field using reflection, but this is not working. I have also taken a cursory look at YARN's event system, hoping to add a hook that would let me listen for YARN events and propagate those. I really need an expert opinion on how to accomplish this kind of task: I want to get log messages and status-change events over to a web client in real time.
Thanks for any assistance in advance.
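
For concreteness, here is a minimal polling-based sketch of the goal: mirror JobStatus transitions onto a queue while the job runs. It is a stopgap rather than the push-style hook being asked about, and the broker URL tcp://localhost:61616 and queue name job.events are assumptions, not anything Hadoop provides.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobStatus;

public class JobStatusPublisher {

    // Polls the job until completion and publishes each state transition.
    public static void publishUntilComplete(Job job) throws Exception {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(session.createQueue("job.events"));

        JobStatus.State last = null;
        while (!job.isComplete()) {
            JobStatus status = job.getStatus();      // RPC call to the cluster
            if (status.getState() != last) {         // publish only state transitions
                last = status.getState();
                TextMessage msg = session.createTextMessage(
                        job.getJobID() + " -> " + last
                        + " map=" + status.getMapProgress()
                        + " reduce=" + status.getReduceProgress());
                producer.send(msg);
            }
            Thread.sleep(2000);                      // poll interval
        }
        connection.close();
    }
}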

Related

How to design notification system that sends real-time alerts created by users

I've been thinking about how to design a system that supports user-created scheduled alerts. My problem is that once the alerts are created and inserted into a database, I don't know the best way to go about scheduling them. Polling the database to see which alerts need to go out next doesn't seem entirely right to me.
What are some ways this could be handled at a scale where, say, a million users could each create their own custom alerts, like "change the baby's diaper at 3pm every day"?
This problem is very suitable for cloud platforms. For example, you could use GCP Cloud Scheduler to invoke a cloud function when the alert is supposed to be sent out. The cloud function then calls some API to alert the user.
If cloud platforms are not an option, you could have your application spawn a new thread when an alert is created, and sleep that thread for a certain duration. When it wakes up, it sends the alert. Less elegant and less scalable than the first solution, but it would still work.
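
A rough Java sketch of that second approach, using one shared ScheduledExecutorService instead of a raw thread per alert so the thread count stays bounded; deliver() stands in for whatever notification mechanism you actually use.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class AlertScheduler {

    // One shared pool instead of one thread per alert: pending alerts sit in
    // the executor's delay queue and only occupy a thread while firing.
    private final ScheduledExecutorService pool = Executors.newScheduledThreadPool(4);

    public ScheduledFuture<?> schedule(String userId, String message, long delayMillis) {
        return pool.schedule(() -> deliver(userId, message), delayMillis, TimeUnit.MILLISECONDS);
    }

    private void deliver(String userId, String message) {
        // Hypothetical delivery call: push notification, SMS, email, etc.
        System.out.println("Alert for " + userId + ": " + message);
    }
}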

ActiveMQ: How to delete only some scheduled elements

I have an ActiveMQ messaging system and I want to delete only some scheduled messages from the queue.
I can delete all the scheduled messages via a ScheduledMessage.AMQ_SCHEDULER_ACTION_REMOVEALL message sent to the queue.
I can delete a message by ID by sending an AMQ_SCHEDULER_ACTION_REMOVE message.
But is there a way to delete all messages matching a selector (maybe on a property of the message)?
I checked ActiveMQ's Jolokia REST API, but it seems that information on scheduled messages is not available there.
No, that functionality is not currently supported. You would need to take a look at the source code, implement it yourself, and then contribute it back to the community. There is a fine line, though, where trying to use a message broker as a database will turn around and bite you, so I'd recommend caution on that front.
You'd need to implement a new remove directive like AMQ_SCHEDULER_ACTION_REMOVE_SELECTED, define how the selector works in that case (an SQL-92 string, etc.), and then add an API to the scheduler store interface and implement it in the scheduler implementation in the KahaDB module.
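
For contrast, the two removal actions that do exist today are driven by plain JMS messages sent to the scheduler's management destination. A minimal sketch with the ActiveMQ Java client (connection and session setup omitted):

import javax.jms.Destination;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageProducer;
import javax.jms.Session;
import org.apache.activemq.ScheduledMessage;

public class SchedulerAdmin {

    // Removes every scheduled message on the broker.
    public static void removeAll(Session session) throws JMSException {
        send(session, null);
    }

    // Removes a single scheduled message by its scheduler-assigned id.
    public static void removeById(Session session, String scheduledId) throws JMSException {
        send(session, scheduledId);
    }

    private static void send(Session session, String scheduledId) throws JMSException {
        Destination management =
                session.createTopic(ScheduledMessage.AMQ_SCHEDULER_MANAGEMENT_DESTINATION);
        MessageProducer producer = session.createProducer(management);
        Message remove = session.createMessage();
        if (scheduledId == null) {
            remove.setStringProperty(ScheduledMessage.AMQ_SCHEDULER_ACTION,
                    ScheduledMessage.AMQ_SCHEDULER_ACTION_REMOVEALL);
        } else {
            remove.setStringProperty(ScheduledMessage.AMQ_SCHEDULER_ACTION,
                    ScheduledMessage.AMQ_SCHEDULER_ACTION_REMOVE);
            remove.setStringProperty(ScheduledMessage.AMQ_SCHEDULED_ID, scheduledId);
        }
        producer.send(remove);
        producer.close();
    }
}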

Microservice and RabbitMQ

I am new to microservices and have a question about RabbitMQ / EasyNetQ.
I am sending messages from one microservice to another microservice.
Each microservice is a Web API. I am using CQRS, where my command handler would consume messages off the queue and do some business logic. In order to call the handler, it will need to make a request to the API method.
I would like the message-consuming code to run without my having to explicitly call the API endpoint. Is there an automated way of doing this?
One suggestion could be to create a separate solution: a console app that starts the RabbitMQ connection and listens, reading messages in a loop and calling the Web API endpoint to handle the business logic each time a new message arrives on the queue.
My aim is to create a listener or a startup task that, once messages are in the queue, automatically picks them up and continues with the command handler, but I am not sure how to do the "automatic" part I describe. I was thinking of utilising an Azure WebJob that runs continuously and acts as the consumer.
Looking for a good architectural way of doing this.
The programming language being used is C#.
Much appreciated.
The recommended way of hosting a RabbitMQ subscriber is to write a Windows service using something like the Topshelf library and to subscribe to bus events inside that service on start. We did that in multiple projects with no issues.
If you are using Azure, the best place to host a RabbitMQ subscriber is in a "Worker Role".
"I am using CQRS, where my command handler would consume messages off the queue and do some business logic. In order to call the handler, it will need to make a request to the API method."
Are you sure this is real CQRS? CQRS is when you handle queries and commands differently in your domain logic. Receiving a message via a class called CommandHandler and just reacting to it is not yet CQRS.
"My aim is to create a listener or a startup task that, once messages are in the queue, automatically picks them up and continues with the command handler, but I am not sure how to do the 'automatic' part I describe. I was thinking of utilising an Azure WebJob that runs continuously and acts as the consumer. Looking for a good architectural way of doing this."
The simpler you do that, the better. Don't go searching for complex solutions until you have tried all the simple ones. When I was implementing something similar, I just ran a pool of message-handler scripts from Linux cron. A handler popped a message off the queue, processed it, and terminated. Simple.
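
A sketch of that cron-style, run-once handler using the RabbitMQ Java client (the asker's stack is C#, but the shape is the same): pull a single message with basicGet, acknowledge it, and exit. The queue name "commands" and the localhost broker are assumptions.

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.GetResponse;

public class OneShotHandler {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                          // assumed broker host

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {
            // Pull a single message instead of subscribing.
            GetResponse response = channel.basicGet("commands", false);
            if (response != null) {
                handle(new String(response.getBody(), "UTF-8"));
                channel.basicAck(response.getEnvelope().getDeliveryTag(), false);
            }
        }                                                      // then exit; cron starts the next run
    }

    private static void handle(String body) {
        System.out.println("Processing command: " + body);     // command handling goes here
    }
}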
I think with the CQRS pattern you will have events as well, and corresponding event handlers. Since you are using RabbitMQ for asynchronous communication between the command and query sides, any message put on a specific RabbitMQ channel can be listened for by a callback method.
Receiving messages from the queue is more complex than sending: it works by subscribing a callback function to the queue, and whenever a message arrives the client library invokes that callback. (The RabbitMQ tutorials describe this with the Pika library; the .NET and Java clients follow the same model.)
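
For comparison, the subscription model looks roughly like this with the RabbitMQ Java client; EasyNetQ's Subscribe API gives you the same shape in C#. Queue name and host are again assumptions.

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

public class CommandListener {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                          // assumed broker host
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.queueDeclare("commands", true, false, false, null);

        // The client library invokes this callback for every delivered message.
        DeliverCallback onDeliver = (consumerTag, delivery) -> {
            String body = new String(delivery.getBody(), "UTF-8");
            System.out.println("Dispatching command: " + body);
            // Call the command handler directly here; no HTTP round-trip needed.
        };
        channel.basicConsume("commands", true, onDeliver, consumerTag -> { });
        // The process stays alive and keeps consuming; host it as a service.
    }
}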

Adding a Hadoop Dispatcher listener for eventing

I am looking for a suggestion on how to listen for events from the Hadoop framework. I have spent quite a lot of time investigating solutions to this problem but have not come up with a reasonable one. What I want is a way to add a bridge that listens for events from the MRAppMaster's dispatcher, captures them, and forwards the ones I care about to the listening client. This would allow me to detect when a job is complete without having to register a servlet callback the way the JobCompletion functionality does. Has anyone found a way to register for events from the Dispatcher? I have modified MRAppMaster to create an event bridge that forwards the events to an ActiveMQ server, but this makes my version of Hadoop specialized and non-portable. Any ideas? I have heard that the Timeline Server may help me add this functionality, but I am not sure yet how it works.
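
For what it's worth, here is a sketch of the bridge idea as a decorator around a handler already registered with the dispatcher: normal processing is preserved and each event is mirrored out afterwards. How the wrapper gets wired into MRAppMaster is the invasive part, which is exactly why this ends up as a specialized, non-portable Hadoop build.

import org.apache.hadoop.yarn.event.Event;
import org.apache.hadoop.yarn.event.EventHandler;

// Wraps a handler registered with the dispatcher: every event is processed
// normally, then mirrored to an external transport (e.g. ActiveMQ).
public class ForwardingEventHandler implements EventHandler<Event> {

    private final EventHandler<Event> delegate;

    public ForwardingEventHandler(EventHandler<Event> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void handle(Event event) {
        delegate.handle(event);   // preserve normal AM processing
        forward(event);           // then mirror the event out
    }

    private void forward(Event event) {
        // Publish event.getType() and event.toString() to a queue here.
        // Keep this non-blocking; the dispatcher thread must not stall.
    }
}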

How can I implement Pre- and Post-Commit Hooks in Riak?

There is but scant information on the web as to how to actually implement these features of Riak besides this blog post and a few others. Are any client libraries (ripple etc.) capable of receiving messages via the hook so that working with the changed data in the app (i.e. outside of Riak) becomes possible? Thanks.
It's not possible to have Riak call back into your application. However, if you use the "returnbody" option when storing, you'll get back the value that was actually stored, as modified by pre-commit hooks.
Post-commit hooks are run asynchronously after the object is stored and so should not be used to modify the stored object. One way you might get "messages via the hook" would be to have your post-commit hook post messages to RabbitMQ (or some other queue), which your application could then consume and do its own processing.
I hope that gives you an idea of where to start. In the meantime, we'll add some examples to that wiki page.
