WebSphere Liberty Java Batch: Can we pass a Batchlet step's output to a chunk step as an input parameter at runtime?

In WebSphere Liberty Java Batch, is it possible to pass the output of the first step to the next step as an input parameter?
For example, the first step is a Batchlet and the second step is a chunk step. Once the first step completes its execution, its output should be passed to the second step at runtime.

I'm guessing you are thinking of this in z/OS JCL terms, where a step would write output to a temporary dataset that gets passed to a subsequent step. JSR-352 doesn't get into dataset (or file) allocation; that's up to the application code. So you could certainly have a step that wrote output into a file (or dataset), and a later step could certainly read from that same file (or dataset) if it knew the name. You could make the name a job property that is provided as a property to the batchlet and the reader. You could even externalize the value of the job property as a job parameter.
But nothing is going to delete the file for you at the end of the job (like a temporary dataset would get deleted). You'll need to clean up the file yourself.
Is that what you were asking?
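To make the file-name-as-property idea concrete, here is a minimal sketch of the batchlet side, assuming a job property named outputFile is defined in the JSL and also wired to the chunk step's reader (the class name, property name, and payload are placeholders, not anything from the question):

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import javax.batch.api.AbstractBatchlet;
    import javax.batch.api.BatchProperty;
    import javax.inject.Inject;
    import javax.inject.Named;

    @Named
    public class ProducerBatchlet extends AbstractBatchlet {

        // Injected from the JSL, e.g. <property name="outputFile" value="#{jobParameters['outputFile']}"/>
        @Inject
        @BatchProperty(name = "outputFile")
        private String outputFile;

        @Override
        public String process() throws Exception {
            // Write whatever the next step needs; the chunk step's reader opens the same path.
            Files.write(Paths.get(outputFile), "some output".getBytes());
            return "COMPLETED";
        }
    }

The reader in the chunk step would receive the same outputFile property and open the file in its open() method; the job itself (or a final cleanup step) remains responsible for deleting the file.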

You can use the JobContext user data: JobContext.set/getTransientUserData().
This does not, however, allow you to populate a batch property (via @Inject @BatchProperty) in a way that parallels how you can supply values from XML via substitution with job parameters.
We have raised an issue to consider an enhancement for the next revision of the Batch specification to allow a property value to be set dynamically from an earlier portion of the execution.
In the meantime there is also the possibility to use CDI bean scopes to share information across steps, but this also is not integrated with batch property injection.
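As a rough sketch of the transient-user-data hand-off, assuming a simple String payload (class names are placeholders, and in practice each class would live in its own source file):

    import java.io.Serializable;
    import javax.batch.api.AbstractBatchlet;
    import javax.batch.api.chunk.AbstractItemReader;
    import javax.batch.runtime.context.JobContext;
    import javax.inject.Inject;
    import javax.inject.Named;

    @Named
    public class ProducerBatchlet extends AbstractBatchlet {

        @Inject
        private JobContext jobContext;

        @Override
        public String process() {
            // In-memory only; not persisted to the job repository and not restart-safe.
            jobContext.setTransientUserData("value computed by the batchlet");
            return "COMPLETED";
        }
    }

    @Named
    class ConsumingReader extends AbstractItemReader {

        @Inject
        private JobContext jobContext;

        private String fromBatchlet;

        @Override
        public void open(Serializable checkpoint) {
            fromBatchlet = (String) jobContext.getTransientUserData();
        }

        @Override
        public Object readItem() {
            // Use fromBatchlet while producing items; return null when there is nothing left to read.
            return null;
        }
    }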

Related

Identify a spring-batch job instance with incrementer

Let me briefly describe what I want and what I (maybe) know.
I want Spring Batch to run an async job; in the future, more jobs.
The job gets two parameters: an external id and a year.
The job should be able to be restarted after completion because the user wants to run a job with the same parameters again and again.
Only one job should be executed with the same parameters at the same time.
From outside (web interface) it should be possible to query if a job is running by job name and parameters.
The querier could be different from the job starter so an instance or execution id is not present.
I know that a job instance is the representation of the job (name) and the parameters, and - like you commented - I cannot rerun a job with the same parameters if the instance/execution is marked completed, except if I use an incrementer.
But this changes the parameters by adding a run.id. Now a job is restartable, but neither I nor Spring Batch itself is able to identify a running job instance (by name and original parameters) anymore, because every job run results in a new instance.
And the question "why would one restart a successfully completed job instance?" is easy to answer: the user outside doesn't know about jobs/instances/executions. The user will start some data processing for a year again and again. And it's my task to make that possible :).
So it would be nice if Spring Batch could let the user know "the job with your original parameters is still running".
Question:
What would be a good solution for my needs?
I haven't tried anything yet, but I have thought about it. Maybe I can write my own JobDao for my query? But this will not solve the one-running-instance-at-a-time problem. Or I can customize the JdbcJobInstanceDao or SimpleJobRepository? Maybe I must add my own job_key which contains only the original parameters?
To correctly understand the answer I am going to give to your question, it is important to know the difference and the relation between a job, a job instance and a job execution in Spring Batch. The Domain Language of Batch section of the reference documentation explains this in detail with examples.
The job should be able to be restarted after completion.
This is not possible by design, or more precisely, a job instance cannot be restarted after completion by design (think of it as "why would one restart a successfully completed job instance?").
From outside (web interface) it should be possible to query if an instance is running by job name and parameters. The querier could be different from the job starter, so an instance or execution id is not present.
The JobExplorer is the API you are looking for. You can ask for job instances and job executions as needed.
Question: What would be a good solution for my needs?
In your case, you receive an external ID and a year as a job execution request. Those two parameters can be used as identifying parameters to define job instances. With this in place, if a job instance fails, you can restart it by using the same parameters.
I see no need for an incrementer in your case. The incrementer is useful for jobs whose instances can be defined as a "sequence" that can be "incremented". I see no need to create a custom DAO or JobRepository either; you should be able to implement your requirement with the built-in components by correctly defining what a job instance is.
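For example, a launch along these lines (class and parameter names are illustrative) defines the instance purely by the external ID and the year, so a failed instance can later be restarted with the same parameters:

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.JobParameters;
    import org.springframework.batch.core.JobParametersBuilder;
    import org.springframework.batch.core.launch.JobLauncher;

    public class YearlyJobStarter {

        private final JobLauncher jobLauncher;
        private final Job job;

        public YearlyJobStarter(JobLauncher jobLauncher, Job job) {
            this.jobLauncher = jobLauncher;
            this.job = job;
        }

        public void start(String externalId, long year) throws Exception {
            // Both parameters are identifying by default, so together they define the job instance.
            JobParameters params = new JobParametersBuilder()
                    .addString("externalId", externalId)
                    .addLong("year", year)
                    .toJobParameters();
            jobLauncher.run(job, params);
        }
    }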
For my use case I have to check if an execution for a job/parameters combination is running. The parameters here are without the run.id of an incrementer. This check must be done before a job run and via an explicit REST call. Normally Spring Batch checks for running executions, but because of the incrementer every job instance is unique and it will never find any.
So I created a bean with a check method and made use of jobExplorer.findRunningJobExecutions(jobName). The result can then be compared with the given parameters by iterating over JobExecution.getJobParameters().getParameters().
The bean can be used in the REST method and in my own implementation of JobLauncher.run().
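A sketch of such a check bean could look like this (the externalId/year parameter names are assumptions carried over from the question):

    import java.util.Set;

    import org.springframework.batch.core.JobExecution;
    import org.springframework.batch.core.explore.JobExplorer;

    public class RunningJobChecker {

        private final JobExplorer jobExplorer;

        public RunningJobChecker(JobExplorer jobExplorer) {
            this.jobExplorer = jobExplorer;
        }

        /** True if an execution of the named job is running with the given original parameters. */
        public boolean isRunning(String jobName, String externalId, long year) {
            Set<JobExecution> running = jobExplorer.findRunningJobExecutions(jobName);
            for (JobExecution execution : running) {
                // Compare only the original, business-level parameters and ignore run.id.
                String runningId = execution.getJobParameters().getString("externalId");
                Long runningYear = execution.getJobParameters().getLong("year");
                if (externalId.equals(runningId) && Long.valueOf(year).equals(runningYear)) {
                    return true;
                }
            }
            return false;
        }
    }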
Another solution would be to store the increment separately for each job/parameters combination. But I don't want to do this, not least because I think a framework like Spring Batch should do this for me or support me by reusing/restarting a completed job instance.

JMeter: how to re-execute CSV Data Set Config

I am using JMeter to set up a soak/load test that needs to run each request with different data. The structure I currently have is:
Thread Group (2 Users, 2 Loops)
- Simple controller
-- Java Sampler (Custom Plugin) to convert CSV Formula data into NewThreadData.csv (as variable)
-- Java Sampler (Custom Plugin) to create directory of files created with NewThreadData.csv merged into a template
-- While Controller Condition - js NewThreadData column not = EOF
--- CSV Data Set Config NewThreadData.csv (Recycle: False / Stop on EOF: False) - filename passed as variable
--- JMS Publisher with the FileName as a variable, using the filename column from within NewThreadData.csv
My problem is that on the second loop, the data is updated in NewThreadData.csv, but the CSV Data Set Config in the while loop never runs again.
It appears that the CSV Data Set Config "knows" it has already been run, regardless of the actual CSV data.
Questions
How can I get the CSV Data Set Config to be rerun / re-executed in this scenario?
Are there undocumented variables or means of getting the config to reprocess?
Is there a way to spawn a new thread on each iteration rather than reusing the existing thread, since the CSV config does execute once for each "User" [thread]? I also tried Stop on EOF: True, but that stopped the second loop.
The aim is to eventually ramp up the user count and the number of loops (changing to forever), with about 100 different combinations of data to be inserted on each loop. The formula I am using includes time and thread number to give me data uniqueness, along with other data that is dynamically created from a formula. Recycle on EOF is not feasible as I need to regenerate the CSV contents on each loop. I don't think a single super-CSV is feasible to cover the load and soak scenarios.
Thanks in anticipation. Andrew
I don't think it's possible to "reset" the CSV Data Set Config; it's a configuration element, hence once it reads the file in its current state it will "stick" to that content.
If you're manipulating file content dynamically, I would rather recommend going for the __CSVRead() function instead, which is evaluated at runtime right where it's placed; therefore it doesn't hold the file, and it "rewinds" to the beginning of the file when the last line is read.
More information: How to Pick Different CSV Files at JMeter Runtime
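For reference, the function reads one cell at a time: ${__CSVRead(NewThreadData.csv,0)} returns the first column of the current row, and ${__CSVRead(NewThreadData.csv,next)} advances to the next row (the file name here is the one from the question; in practice it can also come from a variable).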
The CSV Data Set Config element is executed first and only once, even though you have placed it within the While Controller. Hence the CSV Data Set Config element is not suitable for your requirement.
You could use a JSR223 PreProcessor to work with the dynamic CSV files using a supported programming/scripting language (Java, Groovy), as in the sketch below.
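For illustration, a minimal JSR223 PreProcessor sketch (Groovy engine, plain Java syntax) placed inside the While Controller; the csvPath, csvIndex, and fileName variable names and the column layout are made up for this example:

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    // Re-read the regenerated CSV on every iteration instead of relying on CSV Data Set Config.
    List<String> lines = Files.readAllLines(Paths.get(vars.get("csvPath")));
    int index = vars.get("csvIndex") == null ? 0 : Integer.parseInt(vars.get("csvIndex"));

    if (index < lines.size()) {
        String[] columns = lines.get(index).split(",");
        vars.put("fileName", columns[0].trim());      // assumed: first column holds the file name
        vars.put("csvIndex", String.valueOf(index + 1));
    } else {
        vars.put("fileName", "EOF");                  // lets the While Controller condition terminate
    }

The sampler that regenerates NewThreadData.csv would also need to reset csvIndex to 0 at the start of each outer loop.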

Working of onTrigger - NiFi custom processor

I just started learning about custom processors in NiFi. I want to understand a specific aspect of how onTrigger works. I am doing some operations in the onTrigger function using the property values which are defined on the custom processor in the NiFi flow.
Example: a property value in the custom processor takes a string separated by ',' and in the onTrigger function I write code which converts the string into an array of Strings and removes the extra white space.
My question is: will this operation run every time a flow file passes through the custom processor, or will the string be converted only once?
I tried going through the official development docs but couldn't find info on this.
The Java code of a processor is compiled when you run a Maven build to produce the NAR file. The code is not compiled by NiFi itself.
You then deploy a NAR file to a NiFi instance by placing it in the lib directory, and then you use components from that NAR in your flow by adding them to the canvas.
Once a component is on the canvas and it is started, then the onTrigger method is called according to the scheduling strategy.
Whatever code is in onTrigger will run for every execution of the processor, so your code to read the property and split the value will run every time.
If the property supports expression language from flow files, then you need to run this code every time in onTrigger because the resulting value could be different for every flow file.
If the property does not support expression language from flow files, then you can instead use a method annotated with @OnScheduled to process the property value into whatever you need and store it in a member variable of the processor; this way it only happens one time.
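A rough sketch of that pattern (the property name and field names are placeholders, and relationships/flow file handling are omitted for brevity):

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import java.util.stream.Collectors;

    import org.apache.nifi.annotation.lifecycle.OnScheduled;
    import org.apache.nifi.components.PropertyDescriptor;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.exception.ProcessException;
    import org.apache.nifi.processor.util.StandardValidators;

    public class SplitPropertyProcessor extends AbstractProcessor {

        static final PropertyDescriptor VALUES = new PropertyDescriptor.Builder()
                .name("Values")
                .description("Comma-separated list of values")
                .required(true)
                .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
                .build();

        // Parsed once each time the processor is started, not once per flow file.
        private volatile List<String> parsedValues;

        @Override
        protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
            return Collections.singletonList(VALUES);
        }

        @OnScheduled
        public void parseValues(final ProcessContext context) {
            String raw = context.getProperty(VALUES).getValue();
            parsedValues = Arrays.stream(raw.split(","))
                    .map(String::trim)
                    .collect(Collectors.toList());
        }

        @Override
        public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
            // Use parsedValues here; no re-parsing per flow file.
        }
    }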

Always read first n lines on spring batch job restart

I am using the Spring Batch module to read a complex file with multi-line records. The first 3 lines in the file will always contain a header with a few common fields.
These common fields will be used in the processing of subsequent records in the file. The job is restartable.
Suppose the input file has 10 records (please note that the number of records may not be the same as the number of lines, since records can span multiple lines).
Suppose the job runs the first time, starts reading the file from line 1, processes the first 5 records and fails while processing the 6th record.
During this first run, since the job has also parsed the header part (the first 3 lines in the file), the application can successfully process the first 5 records.
Now when the failed job is restarted, it will start from the 6th record and hence will not read the header part this time. Since the application requires certain values contained in the header, the job fails. I would like suggestions so that the restarted job always reads the header part and then starts from where it left off (the 6th record in the above scenario).
Thanks in advance.
I guess the file in question does not change between runs? Then it's not necessary to re-read it; my solution builds on this assumption.
If you use one step, you can:
- implement a LineCallbackHandler
- give it access to the StepExecution context (it's easy with annotations, but it works with interfaces too; just extend StepExecutionListenerSupport)
- save the header values into the ExecutionContext
- extract them from the context and use them wherever you want (see the sketch below)
It should work for restart as well, because Spring Batch saves the values from the first run and will provide the complete ExecutionContext for subsequent runs.
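A minimal sketch of that idea, assuming the reader is a FlatFileItemReader configured with setLinesToSkip(3) and setSkippedLinesCallback(...) so the handler sees each skipped header line (the context key names are made up):

    import org.springframework.batch.core.StepExecution;
    import org.springframework.batch.core.annotation.BeforeStep;
    import org.springframework.batch.item.file.LineCallbackHandler;

    public class HeaderCallbackHandler implements LineCallbackHandler {

        private StepExecution stepExecution;

        // The handler must also be registered as a listener on the step for this to be called.
        @BeforeStep
        public void saveStepExecution(StepExecution stepExecution) {
            this.stepExecution = stepExecution;
        }

        @Override
        public void handleLine(String line) {
            // Called once for each skipped header line.
            int count = stepExecution.getExecutionContext().getInt("headerLineCount", 0);
            stepExecution.getExecutionContext().putString("headerLine" + count, line);
            stepExecution.getExecutionContext().putInt("headerLineCount", count + 1);
        }
    }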
You can make a 2-step job where:
The first step reads the first 3 lines as header information and puts everything you need into the job context (which is therefore saved in the DB for future executions if the job fails). If this step fails, the header info will simply be read again, and if it passes you can be sure the header info will always be in the job context.
The second step can use the same file for input, but this time you can tell it to skip the first 3 lines and read the rest as is. This way you get restartability on that step, and each time the job fails it will resume where it left off. A sketch of the first step follows below.
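Here is what the first step's tasklet might do, assuming the whole header can be stored as a single string (the file path handling and the context key are placeholders):

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    import org.springframework.batch.core.StepContribution;
    import org.springframework.batch.core.scope.context.ChunkContext;
    import org.springframework.batch.core.step.tasklet.Tasklet;
    import org.springframework.batch.item.ExecutionContext;
    import org.springframework.batch.repeat.RepeatStatus;

    public class HeaderReadingTasklet implements Tasklet {

        private final String inputFile;

        public HeaderReadingTasklet(String inputFile) {
            this.inputFile = inputFile;
        }

        @Override
        public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
            List<String> headerLines = Files.readAllLines(Paths.get(inputFile)).subList(0, 3);
            // Store in the job execution context so it is persisted and visible to later steps and restarts.
            ExecutionContext jobContext = chunkContext.getStepContext()
                    .getStepExecution().getJobExecution().getExecutionContext();
            jobContext.put("header", String.join("\n", headerLines));
            return RepeatStatus.FINISHED;
        }
    }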

Set result from previous Reducer as configuration parameter

As part of the calculation logic in a MapReduce workflow, I need to take the result from a reducer as a parameter for the next reducer in the chain.
    Path plc = new Path(args[1] + "/3");        // output path from the previous reducer
    Configuration c4 = new Configuration();
    c4.set("denom", GetLineC.extCount(plc));    // GetLineC.extCount is a function that returns a value
    ControlledJob cJob4 = new ControlledJob(c4);
I'm using JobControl to create the dependency between the jobs and all the configuration. When the program is executed it gives "No such file or directory". In the flow, by the time control reaches this part the file will be present at this location, but since the configuration is instantiated at the beginning, this error shows up.
Is there a way to set the single line output from the previous reducer as a parameter directly?
Well, I think you mean previous job instead of previous reducer. If you're executing the two jobs using the same driver class, you already know the output of the last job, which is a directory. Clearly you're using only one reducer and it will write its output in a part-r-00000 file inside the output path. To set it as a configuration parameter to the next job, you will have to read this file manually.
Are you considering that in GetLineC.extCount(Path path)?
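For illustration, a helper along these lines could read the single-line output once the previous job has actually finished, and only then set it on the next job's Configuration (the "denom" key mirrors the snippet above; everything else is an assumption about your setup):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SingleLineOutput {

        /** Reads the first line of part-r-00000 under the given output directory. */
        public static String read(Configuration conf, Path outputDir) throws Exception {
            FileSystem fs = FileSystem.get(conf);
            Path partFile = new Path(outputDir, "part-r-00000");
            try (BufferedReader reader =
                    new BufferedReader(new InputStreamReader(fs.open(partFile)))) {
                return reader.readLine();
            }
        }
    }

The key point is to delay building the second job's Configuration (or at least the c4.set("denom", ...) call) until the first job has completed, rather than doing it up front when the JobControl graph is assembled.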
