Oozie custom asynchronous action - hadoop

I have a problem implementing a custom asynchronous action in Oozie. My class extends ActionExecutor and overrides the methods initActionType, start, end, check, kill and isCompleted.
In the start method, I want to start a YARN job, which is implemented through my BiohadoopClient class. To make the call asynchronous, I wrapped the client.run() invocation in a Callable:
public void start(final Context context, final WorkflowAction action) {
    ...
    Callable<String> biohadoop = new Callable<String>() {
        @Override
        public String call() throws Exception {
            BiohadoopClient client = new BiohadoopClient();
            client.run();
            return null;
        }
    };
    // submit the Callable to an executor
    executor.submit(biohadoop);
    // set the start data, according to https://oozie.apache.org/docs/4.0.1/DG_CustomActionExecutor.html
    context.setStartData(externalId, callBackUrl, callBackUrl);
    ...
}
This works fine; for example, when I use my custom action in a fork/join manner, the actions run in parallel.
Now the problem is that Oozie remains in a RUNNING state for these actions. It seems impossible to change that to a completed state. The check() method is never called by Oozie; the same is true for the end() method. It doesn't help to set context.setExternalStatus(), context.setExecutionData() and context.setEndData() manually in the Callable (after client.run() has finished). I also tried to queue an ActionEndXCommand manually, but without luck.
When I wait in the start() method for the Callable to complete, the state gets updated correctly, but the execution in fork/join isn't parallel anymore (which seems logical, as the execution waits for the Callable to complete).
The question How external clients notify Oozie workflow with HTTP callback didn't help, as using the callback seems to change nothing (well, I can see in the log files that it happened, but besides that, nothing). The answer there also mentioned that the SSH action runs asynchronously, but I haven't found out how this is done. There is some wrapping inside a Callable, but in the end the call() method of the Callable is invoked directly (no submission to an Executor).
So far I haven't found any example of how to write an asynchronous custom action. Can anybody please help me?
Thanks
Edit
Here are the implementations of initActionType(), start(), check() and end(); the Callable implementation can be found inside the start() method.
The Callable is submitted to an executor in the start() method, after which the executor's shutdown() method is invoked, so the executor shuts down after the Callable has finished. As the next step, context.setStartData(externalId, callBackUrl, callBackUrl) is invoked.
private final AtomicBoolean finished = new AtomicBoolean(false);

public void initActionType() {
    super.initActionType();
    log.info("initActionType() invoked");
}

public void start(final Context context, final WorkflowAction action)
        throws ActionExecutorException {
    log.info("start() invoked");
    // Get parameters from node configuration
    final String parameter = getParameters(action.getConf());
    Callable<String> biohadoop = new Callable<String>() {
        @Override
        public String call() throws Exception {
            log.info("Starting Biohadoop");
            // No difference if check() is called manually
            // or if the next line is commented out
            check(context, action);
            BiohadoopClient client = new BiohadoopClient();
            client.run(parameter);
            log.info("Biohadoop finished");
            finished.set(true);
            // No difference if check() is called manually
            // or if the next line is commented out
            check(context, action);
            return null;
        }
    };
    ExecutorService executor = Executors.newCachedThreadPool();
    biohadoopResult = executor.submit(biohadoop);
    executor.shutdown();
    String externalId = action.getId();
    String callBackUrl = context.getCallbackUrl("finished");
    context.setStartData(externalId, callBackUrl, callBackUrl);
}

public void check(final Context context, final WorkflowAction action)
        throws ActionExecutorException {
    // finished is an AtomicBoolean that is set to true
    // after Biohadoop has finished (see the Callable implementation)
    if (finished.get()) {
        log.info("check(Context, WorkflowAction) invoked - Callable has finished");
        context.setExternalStatus(Status.OK.toString());
        context.setExecutionData(Status.OK.toString(), null);
    } else {
        log.info("check(Context, WorkflowAction) invoked");
        context.setExternalStatus(Status.RUNNING.toString());
    }
}

public void end(Context context, WorkflowAction action)
        throws ActionExecutorException {
    log.info("end(Context, WorkflowAction) invoked");
    context.setEndData(Status.OK, Status.OK.toString());
}

One thing - I can see you are shutting down the executor right after you have submitted the job (executor.shutdown()). That might be causing the issue. Could you please try moving this statement to the end() method instead?

In the end I didn't find a "real" solution to the problem. The solution that worked for me was to implement an action that invokes the Biohadoop instances in parallel using the Java Executor framework. After the invocation, I wait (still inside the action) for the threads to finish.
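For illustration, here is a minimal sketch of that workaround, assuming BiohadoopClient.run() blocks until its YARN job completes; the parameter list and the error code are made up:

// Sketch: a synchronous action that runs the clients in parallel and
// only returns once all of them are done, so Oozie completes it normally.
List<Callable<Void>> jobs = new ArrayList<>();
for (final String parameter : parameters) { // hypothetical parameter list
    jobs.add(new Callable<Void>() {
        @Override
        public Void call() throws Exception {
            new BiohadoopClient().run(parameter); // blocks until the YARN job ends
            return null;
        }
    });
}
ExecutorService executor = Executors.newFixedThreadPool(jobs.size());
try {
    for (Future<Void> f : executor.invokeAll(jobs)) {
        f.get(); // surfaces any failure as an ExecutionException
    }
} catch (InterruptedException | ExecutionException e) {
    throw new ActionExecutorException(ActionExecutorException.ErrorType.ERROR,
            "BIOHADOOP_FAILED", e.getMessage());
} finally {
    executor.shutdown();
}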

Related

Vert.x: how to process HttpRequest with a blocking operation

I've just started with Vert.x and would like to understand the right way of handling potentially long (blocking) operations as part of processing a REST HttpRequest. The application itself is a Spring app.
Here is a simplified REST service I have so far:
public class MainApp {
    // instantiated by Spring
    private AlertsRestService alertsRestService;

    @PostConstruct
    public void init() {
        Vertx.vertx().deployVerticle(alertsRestService);
    }
}
public class AlertsRestService extends AbstractVerticle {
    // instantiated by Spring
    private PostgresService pgService;

    @Value("${rest.endpoint.port:8080}")
    private int restEndpointPort;

    @Override
    public void start(Future<Void> futureStartResult) {
        HttpServer server = vertx.createHttpServer();
        Router router = Router.router(vertx);
        // enable reading of the request body for all routes
        router.route().handler(BodyHandler.create());
        router.route(HttpMethod.GET, "/allDefinitions")
                .handler(this::handleGetAllDefinitions);
        server.requestHandler(router)
                .listen(restEndpointPort,
                        result -> {
                            if (result.succeeded()) {
                                futureStartResult.complete();
                            } else {
                                futureStartResult.fail(result.cause());
                            }
                        }
                );
    }

    private void handleGetAllDefinitions(RoutingContext routingContext) {
        HttpServerResponse response = routingContext.response();
        Collection<AlertDefinition> allDefinitions = null;
        try {
            allDefinitions = pgService.getAllDefinitions();
        } catch (Exception e) {
            response.setStatusCode(500).end(e.getMessage());
            return;
        }
        response.putHeader("content-type", "application/json")
                .setStatusCode(200)
                .end(Json.encodePrettily(allDefinitions));
    }
}
Spring config:
<bean id="alertsRestService" class="com.my.AlertsRestService"
p:pgService-ref="postgresService"
p:restEndpointPort="${rest.endpoint.port}"
/>
<bean id="mainApp" class="com.my.MainApp"
p:alertsRestService-ref="alertsRestService"
/>
Now the question is: how do I properly handle the (blocking) call to my postgresService, which may take a long time if there are many items to get/return?
After researching and looking at some examples, I see a few ways to do it, but I don't fully understand differences between them:
Option 1. convert my AlertsRestService into a Worker Verticle and use the worker thread pool:
public class MainApp {
    private AlertsRestService alertsRestService;

    @PostConstruct
    public void init() {
        DeploymentOptions options = new DeploymentOptions().setWorker(true);
        Vertx.vertx().deployVerticle(alertsRestService, options);
    }
}
What confuses me here is this statement from the Vert.x docs: "Worker verticle instances are never executed concurrently by Vert.x by more than one thread, but can [be] executed by different threads at different times"
Does it mean that all HTTP requests to my alertsRestService are going to be, effectively, throttled to be executed sequentially, by one thread at a time? That's not what I would like: this service is purely stateless and should be able to handle concurrent requests just fine ....
So, maybe I need to look at the next option:
Option 2. convert my service to be a multi-threaded Worker Verticle, by doing something similar to the example in the docs:
public class MainApp {
    private AlertsRestService alertsRestService;

    @PostConstruct
    public void init() {
        DeploymentOptions options = new DeploymentOptions()
                .setWorker(true)
                .setInstances(5) // matches the worker pool size below
                .setWorkerPoolName("the-specific-pool")
                .setWorkerPoolSize(5);
        Vertx.vertx().deployVerticle(alertsRestService, options);
    }
}
So, in this example - what exactly will be happening? As I understand it, the .setInstances(5) directive means that 5 instances of my alertsRestService will be created. I configured this service as a Spring bean, with its dependencies wired in by the Spring framework. However, in this case it seems the 5 instances are not going to be created by Spring, but rather by Vert.x - is that true? And how could I change that to use Spring instead?
Option 3. use the 'blockingHandler' for routing. The only change in the code would be in the AlertsRestService.start() method in how I define a handler for the router:
boolean ordered = false;
router.route(HttpMethod.GET, "/allDefinitions")
.blockingHandler(this::handleGetAllDefinitions, ordered);
As I understand, setting the 'ordered' parameter to FALSE means that the handler can be called concurrently. Does that mean this option is equivalent to Option #2 with multi-threaded Worker Verticles?
What is the difference? That the async multi-threaded execution pertains only to the one specific HTTP request (the one for the /allDefinitions path), as opposed to the whole AlertsRestService Verticle?
Option 4. The last option I found is to use the 'executeBlocking()' directive explicitly to run only the enclosed code in worker threads. I could not find many examples of how to do this with HTTP request handling, so below is my attempt - maybe incorrect. The difference here is only in the implementation of the handler method, handleGetAllAlertDefinitions() - but it is rather involved:
private void handleGetAllAlertDefinitions(RoutingContext routingContext) {
    vertx.executeBlocking(
            fut -> { fut.complete(sendAsyncRequestToDB(routingContext)); },
            false,
            res -> { handleAsyncResponse(res, routingContext); }
    );
}

public Collection<AlertDefinition> sendAsyncRequestToDB(RoutingContext routingContext) {
    Collection<AlertDefinition> allAlertDefinitions = new LinkedList<>();
    try {
        allAlertDefinitions = alertDefinitionsDao.getAllAlertDefinitions();
    } catch (Exception e) {
        routingContext.response().setStatusCode(500)
                .end(e.getMessage());
    }
    return allAlertDefinitions;
}
private void handleAsyncResponse(AsyncResult<Object> asyncResult, RoutingContext routingContext) {
    if (asyncResult.succeeded()) {
        try {
            routingContext.response().putHeader("content-type", "application/json")
                    .setStatusCode(200)
                    .end(Json.encodePrettily(asyncResult.result()));
        } catch (EncodeException e) {
            routingContext.response().setStatusCode(500)
                    .end(e.getMessage());
        }
    } else {
        routingContext.response().setStatusCode(500)
                .end(asyncResult.cause().getMessage());
    }
}
How is this different from the other options? And does Option 4 provide concurrent execution of the handler, or single-threaded execution like in Option 1?
Finally, coming back to the original question: what is the most appropriate Option for handling longer-running operations when handling REST requests?
Sorry for such a long post.... :)
Thank you!
That's a big question, and I'm not sure I'll be able to address it fully. But let's try:
In Option #1 what it actually means is that you shouldn't use ThreadLocal in your worker verticles if you use more than one worker of the same type. Using only one worker means that your requests will be serialised.
Option #2 is simply incorrect. You cannot use setInstances with an instance of a class, only with its name. You're correct, though, that if you choose to use the name of the class, Vert.x will instantiate them.
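For illustration, deploying by class name could look like the following sketch (with Spring-managed dependencies you would additionally need something like a custom VerticleFactory, which is not shown):

DeploymentOptions options = new DeploymentOptions()
        .setWorker(true)
        .setInstances(5); // Vert.x instantiates 5 verticles from the class name
Vertx.vertx().deployVerticle("com.my.AlertsRestService", options);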
Option #3 is less concurrent than using Workers, and shouldn't be used.
Option #4 executeBlocking is basically doing Option #3, and is also quite bad.

run PublishSubject on different thread rxJava

I am running RxJava and creating a subject so I can use the onNext() method to produce data. I am using Spring.
This is my setup:
@Component
public class SubjectObserver {
    private SerializedSubject<SomeObj, SomeObj> safeSource;

    public SubjectObserver() {
        safeSource = PublishSubject.<SomeObj>create().toSerialized();
        safeSource.subscribeOn(<my taskthreadExecutor>);
        safeSource.observeOn(<my taskthreadExecutor>);
        safeSource.subscribe(new Subscriber<AsyncRemoteRequest>() {
            @Override
            public void onNext(AsyncRemoteRequest asyncRemoteRequest) {
                LOGGER.debug("{} invoked.", Thread.currentThread().getName());
                doSomething();
            }
        });
    }

    public void publish(SomeObj myObj) {
        safeSource.onNext(myObj);
    }
}
New data is generated on the RxJava stream via @Autowired private SubjectObserver subjectObserver
and then calling subjectObserver.publish(newDataObjGenerated).
No matter what I specify for subscribeOn() & observeOn():
Schedulers.io()
Schedulers.computation()
my threads
Schedulers.newThread
The onNext() and the actual work inside it are done on the same thread that calls onNext() on the subject to generate/produce data.
Is this correct? If so, what am I missing? I was expecting doSomething() to be done on a different thread.
Update
In my calling class, if I change the way I am invoking the publish method, then of course a new thread is allocated for the subscriber to run on.
taskExecutor.execute(() -> subjectObserver.publish(newlyGeneratedObj));
Thanks,
Each operator on an Observable/Subject returns a new instance with the extra behavior; however, your code just applies subscribeOn and observeOn, then throws away whatever they produced and subscribes to the raw Subject. You should chain the method calls and then subscribe:
safeSource = PublishSubject.<AsyncRemoteRequest>create().toSerialized();
safeSource
    .subscribeOn(<my taskthreadExecutor>)
    .observeOn(<my taskthreadExecutor>)
    .subscribe(new Subscriber<AsyncRemoteRequest>() {
        @Override
        public void onNext(AsyncRemoteRequest asyncRemoteRequest) {
            LOGGER.debug("{} invoked.", Thread.currentThread().getName());
            doSomething();
        }
    });
Note that subscribeOn has no practical effect on a PublishSubject because there is no subscription side-effect happening in its subscribe() method.
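To illustrate that last point, here is a minimal, self-contained sketch (RxJava 1.x, standard schedulers): the emission still happens on the calling thread, and it is observeOn that moves the downstream work:

PublishSubject<String> subject = PublishSubject.create();
subject
    .observeOn(Schedulers.computation()) // downstream work hops threads here
    .subscribe(s -> System.out.println(
            Thread.currentThread().getName() + " handled " + s));
subject.onNext("hello"); // emitted on the caller's thread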

Stop main thread until all events on JavaFX event queue have been executed

While debugging an application, I would like the main thread to wait after each Runnable I put on the JavaFX event queue using
Platform.runLater(new Runnable() ...)
until it has been executed (i.e. is visible). However, there are two twists here:
First, it is not really a standard, GUI-driven JavaFX app. It is rather a script showing and updating a JavaFX stage every now and then. So the structure looks something like this:
public static void main(String[] args) {
    // do some calculations
    SomeView someView = new SomeView(data); // SomeView is basically a wrapper for a stage
    PlotUtils.plotView(someView); // displays SomeView (i.e. the stage)
    // do some more calculations
    someView.updateView(updatedData);
    // do some more calculations
}

public class SomeView {
    private static boolean viewUpdated = false;
    private ObservableList<....> observableData;

    public void updateView(Data data) {
        Platform.runLater(new Runnable() {
            @Override
            public void run() {
                observableData.addAll(data);
                viewUpdated = true;
            }
        });
        // If configured (e.g. using a boolean switch), wait here until
        // the Runnable has been executed and the Stage has been updated.
        // At the moment I am doing this by waiting until viewUpdated has
        // been set to true, but I am looking for a better solution!
    }
}
Second, it should be easy to disable this "feature" of waiting for the Runnable to be executed (this is no problem with the current approach, but should be possible with an alternative approach as well).
What is the best way to do this?
E.g. is there something like a blocking version of executing a Runnable on the JavaFX thread, or is there an easy way to check whether all events on the event queue have been executed / the event queue is empty?
There's also PlatformImpl.runAndWait(), which uses a countdown latch, so long as you don't call it from the JavaFX thread.
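For reference, a minimal sketch of that idea with a CountDownLatch (the general pattern, not the actual PlatformImpl source):

// Runs the action on the FX thread and blocks the caller until it is done.
// Must not be called from the FX Application Thread itself (deadlock).
public static void runAndWait(Runnable action) throws InterruptedException {
    CountDownLatch done = new CountDownLatch(1);
    Platform.runLater(() -> {
        try {
            action.run();
        } finally {
            done.countDown(); // release the waiting caller even if action throws
        }
    });
    done.await();
}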
This is based on the general idea from JavaFX2: Can I pause a background Task / Service?
The basic idea is to submit a FutureTask<Void> to Platform.runLater() and then to call get() on the FutureTask. get() will block until the task has been completed:
// on some background thread:
Runnable runnable = () -> { /* code to execute on FX Application Thread */};
FutureTask<Void> task = new FutureTask<>(runnable, null);
Platform.runLater(task);
task.get();
You must not execute this code block on the FX Application Thread, as this will result in deadlock.
If you want this to be easily configurable, you could do the following:
// Wraps an executor and pauses the current thread
// until the execution of the runnable provided to execute() is complete.
// Caution! Calling the execute() method on this executor from the same thread
// used by the underlying executor will result in deadlock.
public class DebugExecutor implements Executor {

    private final Executor exec;

    public DebugExecutor(Executor executor) {
        this.exec = executor;
    }

    @Override
    public void execute(Runnable command) {
        FutureTask<Void> task = new FutureTask<>(command, null);
        exec.execute(task); // note: execute the wrapping task, not the raw command
        try {
            task.get();
        } catch (InterruptedException interrupt) {
            throw new Error("Unexpected interruption");
        } catch (ExecutionException exc) {
            throw new RuntimeException(exc);
        }
    }
}
Now in your application you can do:
// for debug:
Executor frontExec = new DebugExecutor(Platform::runLater);
// for production:
// Executor frontExec = Platform::runLater ;
and replace all the calls to
Platform.runLater(...) with frontExec.execute(...);
Depending on how configurable you want this, you could create frontExec conditionally based on a command-line argument, or a properties file (or, if you are using a dependency injection framework, you can inject it).
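For example, the choice could hang off a system property (the property name here is made up):

// -Ddebug.fx.sync=true enables the blocking debug behavior
Executor fxNow = Platform::runLater;
Executor frontExec = Boolean.getBoolean("debug.fx.sync")
        ? new DebugExecutor(fxNow)
        : fxNow;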

Spring AsyncTask: update jsf view component

I have a long-running job that must run in the background; after it finishes, I want to update a JSF view component.
I used SimpleAsyncTaskExecutor to do the work. It works well, but when it comes to updating the UI, I get a NullPointerException.
Here is my code
SimpleAsyncTaskExecutor tasks = new SimpleAsyncTaskExecutor();
tasks.submitListenable(new Callable<String>() {
    @Override
    public String call() throws Exception {
        // Do a long-running job, taking approximately 16 seconds
        doTheBigJob();
        // then update the view component by its id
        FacesContext.getCurrentInstance().getPartialViewContext().getRenderIds().add(myComponentId);
        return "";
    }
});
Note: when the time is short (like 2 seconds), no NullPointerException occurs.
Thanks in advance.
FacesContext.getCurrentInstance() returns null because it tries to get the context from a thread-local variable. But because the executing thread was not initialized by JSF (which is done by javax.faces.webapp.FacesServlet) but created by the executor, the thread-local variable is null.
I have no idea why the NullPointerException does not always occur. By default, SimpleAsyncTaskExecutor creates a new thread each time unless you specify a thread pool. When I recreated the example, it happened every time. Maybe it does occur but is not logged properly...
To solve your problem you need to resort to polling. For instance, you can use a property of a backing bean to indicate that the job has finished.
#Named("someBean")
#SessionScoped
public class SomeBean {
private volatile boolean jobDone = false;
public String execute() {
SimpleAsyncTaskExecutor tasks = new SimpleAsyncTaskExecutor();
tasks.submitListenable(new Callable<String>() {
public String call() throws Exception {
//Do long time taking job in approximately 16 seconds
doTheBigJob();
jobDone = true
return "";
}
});
return null;
}
public boolean isJobDone() {
return jobDone;
}
}
On your page, add a component that is rendered when jobDone == true. For instance:
<h:outputText id="jobDoneText" rendered="#{someBean.jobDone}" value="Job finished"/>
Then, using polling and AJAX, you update your current page.
In pure JSF the only way to do polling is to use a combination of JavaScript and JSF AJAX requests.
Alternatively, you can use the PrimeFaces component p:poll to poll for changes.
<p:poll interval="1" update="jobDoneText" />
More about polling in JSF can be found in answers to the following question: JSF, refresh periodically a component with ajax?

spring MVC Callable execution continues even after request timeout?

I have an asynchronous handler method like this:
@RequestMapping("/custom-timeout-handling")
public @ResponseBody WebAsyncTask<String> callableWithCustomTimeoutHandling() {
    Callable<String> callable = new Callable<String>() {
        public String call() throws Exception {
            while (i == 0) {
                System.out.println("inside while loop->");
            }
            return "Callable result";
        }
    };
    return new WebAsyncTask<String>(10000, callable);
}
which executes the while loop until the specified timeout (10 sec).
When the request times out, it executes the handleTimeout method from TimeoutCallableProcessingInterceptor:
public class TimeoutCallableProcessingInterceptor extends CallableProcessingInterceptorAdapter {

    @Override
    public <T> Object handleTimeout(NativeWebRequest request, Callable<T> task) throws Exception {
        throw new IllegalStateException("[" + task.getClass().getName() + "] timed out");
    }
}
Source: I have replaced
Thread.sleep(2000)
with
while (i == 0) {
    System.out.println("inside while loop->");
}
My problem is that even after the timeout (i.e. after the handleTimeout method has finished executing and its response has been sent), the while loop keeps processing until the value of i is changed to something other than zero.
Is the request still held by the server? If so, what is the use of the request timeout?
Thanks in advance...
When a servlet container thread detects that an async Callable has timed out, it invokes handleTimeout() (in its own context). That's the reason you see handleTimeout() getting executed: it is executed by a servlet container thread, not by the thread that runs the Callable.
If you want custom timeout handling, you need to do two things:
Override onTimeout() in your WebAsyncTask. Whatever callable you provide as the callback to onTimeout() will be invoked on a servlet container thread when it detects that your callable has timed out.
Check for timeouts/interruptions in the Callable you have created inside the controller.
If your Callable does not expect and respect interruption ("If the target thread does not poll the interrupted status the interrupt is effectively ignored"), there is no way to interrupt it! Please refer to this answer to learn how to expect and respect interruption.
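Putting both points together, a minimal sketch (whether the worker thread actually receives an interrupt on timeout depends on the configured AsyncTaskExecutor; the loop body is hypothetical):

@RequestMapping("/custom-timeout-handling")
public @ResponseBody WebAsyncTask<String> callableWithCustomTimeoutHandling() {
    Callable<String> callable = () -> {
        // respect interruption: re-check the flag on every iteration
        while (!Thread.currentThread().isInterrupted()) {
            // one unit of work per iteration
        }
        return "stopped after interruption";
    };
    WebAsyncTask<String> task = new WebAsyncTask<>(10000L, callable);
    task.onTimeout(() -> "request timed out"); // runs on a container thread
    return task;
}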
