
Friday, June 22, 2012

Thread Pool Issues and JobLauncher

Clustered Spring Batch and Thread Pools

Scenario

In a scenario where multiple instances of a batch job are distributed across a cluster, or where multiple threads in a single instance may be able to run a job, we need some way of indicating that a given instance is 'tapped out' of resources and can't run a job.  The JobLauncher interface currently declares four possible exceptions (JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException and JobParametersInvalidException), but none of them covers the case where the thread pool is exhausted.
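To make 'tapped out' concrete, here is a minimal sketch using only java.util.concurrent, not Spring itself. It assumes a pool with one thread and no queue capacity; the class and method names are made up for illustration. Spring's ThreadPoolTaskExecutor wraps a pool like this and rethrows the JDK's RejectedExecutionException as a TaskRejectedException.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: shows a bounded pool refusing work once its only thread is busy.
class SaturatedPoolSketch {

    /** Submits two tasks to a one-thread pool with no queue; returns true if the second is rejected. */
    static boolean secondTaskRejected() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(),      // no buffering of waiting tasks
                new ThreadPoolExecutor.AbortPolicy()); // reject by throwing
        CountDownLatch release = new CountDownLatch(1);
        try {
            // First task occupies the only worker thread until we release it.
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            try {
                pool.execute(() -> { });               // no thread free, no queue space
                return false;
            } catch (RejectedExecutionException e) {
                return true;                           // the pool is 'tapped out'
            }
        } finally {
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second task rejected: " + secondTaskRejected());
    }
}
```

With a larger queue or pool the second task would simply wait or run, so whether a node ever rejects work depends on how its executor is sized.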

To enable a given cluster node to reject a job start request, allowing another node to process it instead, we need to handle a TaskRejectedException.  In the current SimpleJobLauncher class in org.springframework.batch.core.launch.support, this exception is caught and logged, and the job execution is immediately marked as failed.  This can be seen as correct behavior (the job failed because it couldn't start), but how do we surface that failure in such a way that we can 'load balance' the job start to another node?

One Solution


One solution would be to intercept the failed JobExecution as it is returned and throw an exception as a consequence, so that the job start request is treated as 'failed to process the request' and can be load balanced to another node.  The components of this are as follows:

- an around aspect tied to org.springframework.batch.core.launch.JobLauncher.run
- the aspect intercepts the returned JobExecution object and checks its status
- if the status is 'failed', it checks the exit status description for the TaskRejectedException
- if it's there, it throws the TaskRejectedException up to the caller
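The steps above can be sketched in plain Java as a decorator. This is not real Spring AOP wiring: Launcher and Execution are stand-in types invented here in place of JobLauncher and JobExecution, and a JDK RejectedExecutionException stands in for TaskRejectedException. It also assumes the exit description contains the exception's class name, which should hold if the launcher records the stack trace there.

```java
import java.util.concurrent.RejectedExecutionException;

// Hypothetical sketch of the around-advice logic using stand-in types.
class RejectionAwareLauncherSketch {

    /** Stand-in for JobExecution: only the two values the check needs. */
    static final class Execution {
        final String status;          // e.g. "COMPLETED" or "FAILED"
        final String exitDescription; // assumed to hold the stack trace on failure

        Execution(String status, String exitDescription) {
            this.status = status;
            this.exitDescription = exitDescription;
        }
    }

    /** Stand-in for JobLauncher.run(...). */
    interface Launcher {
        Execution run(String jobName);
    }

    /** Text searched for in the exit description (an assumption about its content). */
    static final String MARKER = "TaskRejectedException";

    /** Wraps a launcher the way the around aspect would wrap JobLauncher.run. */
    static Launcher rejectionAware(Launcher target) {
        return jobName -> {
            Execution result = target.run(jobName);   // proceed with the real call
            boolean rejected = "FAILED".equals(result.status)
                    && result.exitDescription != null
                    && result.exitDescription.contains(MARKER);
            if (rejected) {
                // Turn the silent failure into an exception the caller can act on.
                throw new RejectedExecutionException("no free threads for job " + jobName);
            }
            return result;                            // any other outcome passes through
        };
    }

    public static void main(String[] args) {
        Launcher busy = rejectionAware(name -> new Execution(
                "FAILED", "org.springframework.core.task.TaskRejectedException: pool full"));
        try {
            busy.run("nightlyImport");
            System.out.println("accepted");
        } catch (RejectedExecutionException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A job that fails for any other reason still comes back as an ordinary failed execution, so only genuine thread pool rejections are turned into exceptions.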

When coupled with Spring Integration, this allows the message transaction to be rolled back to the adapter (e.g. JMS), causing the original request to be placed back on the queue for consumption either by another node or by the same node at a later time.
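The redelivery idea can be illustrated without any messaging framework. The sketch below is a toy model only: an in-memory Deque stands in for the JMS queue, a Consumer stands in for a cluster node, and putting the request back at the head of the queue mimics a rolled-back transaction. None of these names come from Spring Integration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Consumer;

// Hypothetical toy model of 'roll back and redeliver'; not Spring Integration code.
class RedeliverySketch {

    /**
     * Takes one request off the queue and hands it to a node. If the node
     * rejects it, the request is put back so another node can pick it up.
     * Returns true only if the request was consumed.
     */
    static boolean deliverOnce(Deque<String> queue, Consumer<String> node) {
        String request = queue.pollFirst();
        if (request == null) {
            return false;                 // nothing to deliver
        }
        try {
            node.accept(request);
            return true;                  // 'commit': the request stays off the queue
        } catch (RejectedExecutionException e) {
            queue.addFirst(request);      // 'rollback': the request is available again
            return false;
        }
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>();
        queue.add("run nightlyImport");

        Consumer<String> busyNode = r -> {
            throw new RejectedExecutionException("no free threads");
        };
        Consumer<String> idleNode = r -> System.out.println("idle node started: " + r);

        System.out.println("busy node consumed: " + deliverOnce(queue, busyNode));
        System.out.println("idle node consumed: " + deliverOnce(queue, idleNode));
    }
}
```

In a real deployment the transaction manager and the broker's redelivery settings decide when and how often the request reappears, which this model leaves out.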

Considerations

We may see a handler built into the Spring Batch framework in a subsequent release; the aspect approach would allow us to 'unwind' this change when we upgrade.  Personally, I think such a handler is unlikely to be included except in a major release, because it would change the signature of the JobLauncher interface.
