Difference between tWarn and tAssert in Talend

Please let me know the difference between the two components tWarn and tAssert in Talend.
I don't really see why these components were created separately. What is the need/utility of each one in a real use case?

tWarn: This component provides a priority-rated message to the next component. It does not stop your Job in case of error. If you want to kill a Job in case of error, see tDie. (taken from the Talend Help Center docs)
tAssert: This component evaluates the status of a Job execution. It concludes with a boolean result based on an assertive statement related to the execution and feeds the result to tAssertCatcher for proper Job status presentation. (taken from the Talend Help Center docs)
In practice, tWarn is typically paired with tLogCatcher to log a non-fatal issue (say, an empty input file) while the Job keeps running, whereas tAssert formally tests a condition (say, that the output row count matches the input row count) and reports the pass/fail status through tAssertCatcher.
For a use case, please follow the link below:
Error Handling in Talend

Does anyone know how to tell if an SBMJOB has finished?

What I'm trying to do is execute two SBMJOBs, but program A needs to finish before program B can execute. Does anyone know how to do that?
Program A:
SBMJOB CMD(CALL PGM(PROGRAM1) PARM(PARM1 PARM2)) JOB(PROGRAM1)
Program B:
To run this program, I need A to finish first, but how can I validate that?
SBMJOB CMD(CALL PGM(PROGRAM2) PARM(PARM1 PARM2)) JOB(PROGRAM2)
Thanks for the help
It would be easier to submit your two jobs to a JOBQ attached to a subsystem that lets only one job run at a time.
Your second job will then naturally run after the first one finishes.
You can choose the JOBQ with the JOBQ parameter on the SBMJOB command, as in the sketch below.
QBATCH is by default a jobq that allows only one simultaneous job, but check first: it could have been changed on your system.
See the CHGJOBQE command to change the configuration of a jobq. A general explanation of jobqs and subsystems is in the IBM doc "Jobs and job queues".
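A minimal sketch of that approach (assuming the default QGPL/QBATCH job queue still has a maximum of one active job; adjust the library and queue names to your system):
SBMJOB CMD(CALL PGM(PROGRAM1) PARM(PARM1 PARM2)) JOB(PROGRAM1) JOBQ(QGPL/QBATCH)
SBMJOB CMD(CALL PGM(PROGRAM2) PARM(PARM1 PARM2)) JOB(PROGRAM2) JOBQ(QGPL/QBATCH)
Because the queue feeds the subsystem one job at a time, PROGRAM2 will not start until PROGRAM1 has ended.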

How to performance test workflow execution?

I have 2 APIs:
Create a workflow (HTTP POST request)
Check workflow status (HTTP GET request)
I want to performance test how much time a workflow takes to complete.
Tried two ways:
Option 1: Created a Java test that triggers the workflow create API and then polls the status API to check whether the status turns to CREATED. I measure the time this process takes, which gives me the performance results.
Option 2: Used Gatling to do the same:
val createWorkflow = http("create").post("")
  .body(ElFileBody("src/main/resources/weather.json")).asJson
  .check(status.is(200))
  .check(jsonPath("$.id").saveAs("id"))

val statusWorkflow = http("status").get("/${id}")
  .check(jsonPath("$.status").saveAs("status"))
  .asJson.check(status.is(200))

val scn = scenario("CREATING")
  .exec(createWorkflow)
  .repeat(20) { exec(statusWorkflow) }
The Gatling approach didn't really work (or I am doing it in some wrong way). Is there a way in Gatling to chain multiple requests and do something similar to Option 1?
Is there some other tool that can help me performance test such scenarios?
I think something like the snippet below should work, using Gatling's tryMax:
.tryMax(100) {
  pause(1)
    .exec(
      http("status").get("/${id}")
        .asJson
        .check(status.is(200))
        // fail (and retry) until the workflow reports the expected status
        .check(jsonPath("$.status").is("CREATED"))
    )
}
Note: I didn't try this out locally. More information about tryMax:
https://medium.com/@vcomposieux/load-testing-gatling-tips-tricks-47e829e5d449 (Polling: waiting for an asynchronous task)
https://gatling.io/docs/current/advanced_tutorial/#step-05-check-and-failure-management
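Putting it together with the createWorkflow request from the question, a complete polling scenario could look like the sketch below (untested; the group() block is just one way to get the total create-to-CREATED time as a single entry in the Gatling report):

val scnCreateAndWait = scenario("CREATE_AND_WAIT")
  .group("workflow_total") {
    exec(createWorkflow)
      .tryMax(100) {
        pause(1)
          .exec(
            http("status").get("/${id}")
              .asJson
              .check(status.is(200))
              // retried by tryMax until the status is CREATED
              .check(jsonPath("$.status").is("CREATED"))
          )
      }
  }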

How can I have 2 input links in a DataStage sequence job?

As you can see, when the SEQ_DIM_ACCOUNT job executes it has 2 conditions, Success and Failure.
I want to run execute_command_60 when it fails, and once execute_command_60 has run, I want it to continue on to SEQ_DIM_BUSINESS_PARTNER. But when I tried to link execute_command_60 to SEQ_DIM_BUSINESS_PARTNER, I got the error "the destination stage cannot support any more input links".
Is there a way to do that?
Yes, it is possible with the help of a Sequencer stage.
Add it after the Execute_Command and before SEQ_DIM_BUSINESS_PARTNER. This stage can take any number of input links, and you only have to specify whether All or Any of the input links must have run before the flow continues.

Is Parse providing 15 seconds for Cloud Code functions?

I'm currently coding an app that uses Parse as a backend, but I have run into a '124' error. I admit that I do a lot in my cloud functions, but from what I've observed, it doesn't appear to take over 15 seconds. Could someone please confirm this? Below is the output.
E2015-03-06T03:49:52.644Z] v286: Ran cloud function createEvent for user puZNjFVfSm with:
Input: {"RSVPDate":{"__type":"Date","iso":"2015-03-06T04:49:52.000Z"},"description":"Sample event to showcase functionality","group":{"max":5,"min":4},"max":50,"reoccur":{"day":1,"month":1,"stop":{"__type":"Date","iso":"2015-03-06T04:49:52.000Z"},"week":1},"title":"SampleFCFS"}
Failed with: Execution timed out
I2015-03-06T03:49:52.716Z] begin
I2015-03-06T03:49:52.717Z] creating Event - initial checks completed
I2015-03-06T03:49:52.718Z] Finished advanced checks
I2015-03-06T03:49:52.719Z] Event creation start
I2015-03-06T03:49:52.770Z] begin event creation
I2015-03-06T03:49:52.873Z] Finding role: company_employee_z0Zx39OyuY
I2015-03-06T03:49:52.875Z] Added and secured event
I2015-03-06T03:49:52.931Z] attaching role to 425Qy9v9e4
I2015-03-06T03:49:52.934Z] Adding participant
From what I can tell from the timestamps, it looks like I'm only getting around 300 ms on all my runs. Shouldn't I be getting 15 seconds?
Update: I found that the issue was caused by using the addUnique function of Parse Objects with an array of pointers. By inserting ids instead of pointers, the issue was resolved (roughly as in the sketch below).
Thank you for your help.
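For illustration, a hypothetical sketch of that change in Cloud Code (event and participant are placeholder Parse.Object instances, not names from the original post):

// Store plain objectId strings instead of pointers in the array.
event.addUnique("participants", participant.id);   // id string: worked
// event.addUnique("participants", participant);   // pointer: hit the '124' timeout
event.save();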

Hive execution hook

I need to add a custom execution hook in Apache Hive. Please let me know if somebody knows how to do it.
The current environment I am using is given below:
Hadoop: Cloudera version 4.1.2
Operating system: CentOS
Thanks,
Arun
There are several types of hooks, depending on the stage at which you want to inject your custom code:
Driver run hooks (Pre/Post)
Semantic analyzer hooks (Pre/Post)
Execution hooks (Pre/Failure/Post)
Client statistics publisher
If you run a script, the processing flow looks as follows:
Driver.run() takes the command
HiveDriverRunHook.preDriverRun()
(HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
Driver.compile() starts processing the command: creates the abstract syntax tree
AbstractSemanticAnalyzerHook.preAnalyze()
(HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
Semantic analysis
AbstractSemanticAnalyzerHook.postAnalyze()
(HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
Create and validate the query plan (physical plan)
Driver.execute() : ready to run the jobs
ExecuteWithHookContext.run()
(HiveConf.ConfVars.PREEXECHOOKS)
ExecDriver.execute() runs all the jobs
For each job at every HiveConf.ConfVars.HIVECOUNTERSPULLINTERVAL interval:
ClientStatsPublisher.run() is called to publish statistics
(HiveConf.ConfVars.CLIENTSTATSPUBLISHERS)
If a task fails: ExecuteWithHookContext.run()
(HiveConf.ConfVars.ONFAILUREHOOKS)
Finish all the tasks
ExecuteWithHookContext.run() (HiveConf.ConfVars.POSTEXECHOOKS)
Before returning the result: HiveDriverRunHook.postDriverRun() (HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
Return the result.
For each of the hooks I indicated the interface you have to implement. In the brackets is the corresponding configuration property key you have to set in order to register the class at the beginning of the script.
E.g., setting the pre-execution hook (9th stage of the workflow):
HiveConf.ConfVars.PREEXECHOOKS -> hive.exec.pre.hooks:
set hive.exec.pre.hooks=com.example.MyPreHook;
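For reference, a minimal untested sketch of what such a hook class might look like (assuming the Hive 0.11 hook interfaces; the class name and log message are placeholders):

package com.example;

import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;

// Registered with: set hive.exec.pre.hooks=com.example.MyPreHook;
public class MyPreHook implements ExecuteWithHookContext {
    @Override
    public void run(HookContext hookContext) throws Exception {
        // Called just before the jobs are launched; the query plan
        // carries the original query string and other metadata.
        System.err.println("Pre-exec hook, about to run: "
                + hookContext.getQueryPlan().getQueryString());
    }
}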
Unfortunately, these features aren't really documented, but you can always look into the Driver class to see the evaluation order of the hooks.
Remark: I assumed Hive 0.11.0 here; I don't think the Cloudera distribution differs (too much).
A good start is http://dharmeshkakadia.github.io/hive-hook/ , which has examples.
Note: the Hive CLI shows the hook messages on the console; if you execute from Hue, add a logger and you can see the results in the HiveServer2 log instead.
