Running parallel jobs in Talend - parallel-processing

I have a situation where I need to run five different child jobs in Talend in parallel. The problem is that my select query returns five different IDs, and each job instance needs to run with one of those IDs. The problem with the tParallelize component is that it does not allow me to pass context variables to each subjob, i.e. the ID in this particular case.
select id from table limit 5; ----> five instances of the same job, each with a different id as a parameter
Any help would be highly appreciated. Thanks.

I'm not sure if I properly understand what you're doing here, but if you were to break out each of those IDs and store them as five separate context variables, then each job could access its own context variable with the right ID stored in it and use that.
So I would start with your database input component (just select the IDs you want) and feed that into a tFlowToIterate. Connect this via an Iterate flow to a tFixedFlowInput component and create two fields in its schema, "key" and "value". Use the inline table to set "key" to ((Integer)globalMap.get("tFlowToIterate_1_CURRENT_ITERATION")) and "value" to ((String)globalMap.get("row1.SupplierPartNumber")) (substitute your own ID column for SupplierPartNumber here).
I'd then throw this into a tMap component, where I'd put "ContextNumber" + row2.key into the mapped key column just to make the context name a bit more obvious than a bare iteration number, and then feed that directly into a tContextLoad.
From there you can use an OnSubjobOk trigger to your tParallelize component and link all your jobs together. In each job, configure it to use the appropriate context variable.
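For illustration, here is a minimal sketch of the expressions involved, assuming Talend's default names (tFlowToIterate_1, row1 into the tFixedFlowInput, row2 into the tMap) and an id column in the database input; adjust the names and casts to your own schema:

// tFixedFlowInput inline table (schema: key, value)
key:   ((Integer) globalMap.get("tFlowToIterate_1_CURRENT_ITERATION"))
value: String.valueOf(globalMap.get("row1.id"))    // your own id column goes here

// tMap output feeding the tContextLoad (schema: key, value)
key:   "ContextNumber" + row2.key                  // gives ContextNumber1 ... ContextNumber5
value: row2.value

// inside each child job, read its own context variable, e.g.
context.ContextNumber1

Note that the child jobs will only see these variables if their tRunJob components are set to transmit the whole context, or if the values are passed explicitly as context parameters.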

Related

JMeter interactions with UI and database inserts simultaneously

When we create a JMeter script through BlazeMeter or a third-party script recorder, and there are insert/update/delete actions on the UI that affect records, I just want to know: when we run the same JMeter script with 100 users, do those new records get inserted/updated/deleted in the database as well? If yes, what happens to the remaining 99 users' data if there are uniqueness validations on the UI?
When you record a user action it results in hard-coded values, so if you add a "foo" line in the UI it gets added to the database.
When you replay the test with 1 user, then depending on your application implementation:
either another "foo" line will get added to the database,
or you will get an error because the entry is already present.
When you run the same test with 100 users the result will be the same, to wit:
either you will have 100 identical new/updated entries,
or you will have 100 errors.
So I would suggest parameterizing your tests so that each thread (virtual user) operates on its own unique data, for example:
have a CSV file with credentials for 100 users, which can be read using the CSV Data Set Config;
when you add an entry, also consider giving it a unique prefix or postfix (see the example after this list), such as:
the current virtual user number via the __threadNum() function
the current iteration via the ${__jm__Thread Group__idx} pre-defined variable
the current timestamp via the __time() function
a unique GUID-like value via the __UUID() function
etc.
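For example, a request parameter value combining these functions might look like the line below (the parameter name entryName and the foo prefix are purely illustrative); with 100 threads this yields a distinct value per virtual user and per iteration, so the inserts no longer collide on uniqueness checks:

entryName=foo_${__threadNum}_${__jm__Thread Group__idx}_${__UUID()}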

Talend loop for each record

Hi, I am designing a data generation job.
My job looks like this:
tRowGenerator --> tMap --> tFileOutputDelimited.
Let's say my tRowGenerator produces 5 columns with 2 records. I want to iterate over these records, i.e. for each record I want to iterate a certain number of times:
for record 1, iterate 5 times to produce further data;
for record 2, iterate 3 times to produce further data.
Please suggest how to apply this multiply-by-xi logic, where xi can change for each record.
Thanks!
If you want to loop over the data generated by the tRowGenerator, you can use a tLoop in which you put the call to your business rule that determines the number of loops, or when to stop looping.
The logic of the flow in an example job would be:
row1 is a main connection taking the generated values to the tFlowToIterate, which stores them in global variables;
the Iterate link activates the tLoop, which can use the values stored in the global variables to drive your business rule (to get the number of loops, or to decide whether to continue or stop);
the tLoop activates a tJavaFlex that uses the stored global variables to produce the output you like and passes it to the tFileOutputDelimited over a main link (row2).
You have to activate the Append flag on the tFileOutputDelimited to keep the data from the different loops. If needed, you can add a tFileDelete at the beginning of the job to empty the output file before a new processing round.
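As a rough sketch, assume the tRowGenerator produces a column xi holding the per-record repeat count plus an id column, and that Talend's default names apply (row1 feeding the tFlowToIterate, tLoop_1, row2 from the tJavaFlex to the output file); the output schema (id, seq, value) is purely illustrative:

// tLoop ("For" type), driven by the per-record count stored by the tFlowToIterate:
//   From: 1    To: (Integer) globalMap.get("row1.xi")    Step: 1

// tJavaFlex "Main code": emit one row2 record per loop pass
row2.id    = (String)  globalMap.get("row1.id");                    // value of the current source record
row2.seq   = (Integer) globalMap.get("tLoop_1_CURRENT_ITERATION");  // current pass, exposed by tLoop
row2.value = "generated from " + row2.id + " #" + row2.seq;

The tJavaFlex simply reads the per-record values from globalMap and writes one row per tLoop pass, which the appended tFileOutputDelimited then accumulates.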

How to create a new record dynamically using Informatica PowerCenter

I have employees' leave-related data and payment-related information.
e.g. Employee E1 has taken maternity leave this year. She needs to be paid for 6 months, and if she is on leave for a longer duration (like 8 months), I need to create two records for her:
one for the allowed duration and the other for the extended duration.
Employee  LeaveStartDate  LeaveEndDate  Total_days_taken  Total_days_allowed  LeaveType
e1        1Jan2013        31Aug2013     242               186                 ML

Target expected:

Employee  LeaveStartDate  LeaveEndDate  LeaveType
e1        1Jan2013        30June2013    ML
e1        1July2013       31Aug2013     Extended ML
How can I create the second record dynamically in an Informatica mapping?
Generally speaking, we use a Java transformation in Informatica to dynamically create new rows. However, for scenarios like the one you described, where you only need to create one extra row based on some condition, you can achieve this by adding two target instances and populating the second target instance conditionally (using a Router or Filter transformation).
You can do something like this:
Create two sets of ports for LeaveStartDate, LeaveEndDate and LeaveType in an Expression transformation, and calculate their values accordingly. For example:
LeaveStartDate1 -> source LeaveStartDate
LeaveStartDate2 -> LeaveStartDate + Total_days_allowed + 1
Now connect the first set of ports directly to one target instance, and connect the second set of ports to another target instance through a Filter. The filter condition would be something like Total_days_taken > Total_days_allowed. You can also do this using a Router if you prefer.
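As a hedged sketch of those Expression-transformation output ports (ADD_TO_DATE is a standard PowerCenter function; the exact off-by-one handling of the allowed days is up to you):

LeaveStartDate1 -> LeaveStartDate
LeaveEndDate1   -> ADD_TO_DATE(LeaveStartDate, 'DD', Total_days_allowed - 1)
LeaveType1      -> LeaveType
LeaveStartDate2 -> ADD_TO_DATE(LeaveStartDate, 'DD', Total_days_allowed)
LeaveEndDate2   -> LeaveEndDate
LeaveType2      -> 'Extended ' || LeaveType

The second set is routed to the second target instance only when Total_days_taken > Total_days_allowed, as described above.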
Alternatively, you can use two pipelines in the mapping: one to load the records for insert, and a second one to combine the insert with the update.

SSIS For Loop Container equivalent in Talend Open Studio

I have a Talend mapping which needs to be executed based on an ID. I want to pass the ID as a parameter. The mapping should execute for one ID at a time, and I want to loop the execution for each ID, one after the other. This can be achieved in SSIS using the For Loop Container. Can anyone help me find the equivalent in Talend Open Studio?
Thanks in advance.
If you take just the ID part of your input and then link that to the main part of your current job with an Iterate link via a tFlowToIterate component, it should do this automatically. You can access the value from the globalMap using something along the lines of ((String) globalMap.get("row1.Id")).
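If the per-ID work lives in a child job, a minimal sketch of passing the value onwards is a tRunJob placed in the iterated part of the job, with the child job's context parameter (the name currentId here is just an illustration) set from the same globalMap entry:

// value expression in the tRunJob "Context Param" table, for a child parameter named currentId
(String) globalMap.get("row1.Id")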
Alternatively, you can use the tForeach component and set the ID values inside it. After that, connect the Iterate output and reference the current value with ((String)globalMap.get("tForeach_1_CURRENT_VALUE")), where tForeach_1 is the name of your tForeach component.
And set the Query to something like:
"select id, name from employee
where id="+((String)globalMap.get("tForeach_1_CURRENT_VALUE"))

Single Database Call With Many Parameters vs Many Database Calls With Few Parameters

I am writing a Content Management System which can store meta-data about different document-types. Each document-type has its own set of meta-data fields. For example a Letter has fields like "To", "From", "ToAddress", "FromAddress" etc whereas a MinutesOfMeeting has fields like "DateHeldOn", "TimeHeldOn", "AttendedBy" etc.
I am saving this information in the database in two tables: General and Specific. General stores information that is common to all types, such as DocumentOwnerName, DocumentCreatedDate, DocumentSize etc. Specific is not a single table but a set of 35 different tables, one for each document-type.
I have a page which contains a grid showing a list of documents; one record corresponds to one document. Since the grid shows documents of all types, the first row may show a Letter, the second a MinutesOfMeeting, the third a Memo, and so on.
I have also made a search feature where the user can set criteria on the basis of which the document list is retrieved. To make it work, there are four search-related parameters for each field in each of the specific tables, and all of these parameters are passed to a central procedure. This procedure then filters out records on the basis of the criteria.
The problem is that, dealing with 35 document-types with around 10 fields each, I end up with more than a thousand parameters for the procedure. This is a maintenance nightmare, and I am looking for a solution.
One solution is to deal with each specific table individually, get back the IDs, and then union them. This is fine, except that I have to make 36 different calls to the database: one for each specific table plus one for the general table.
It all boils down to a simple architecture choice: should I make a single database call passing many parameters, or many database calls passing few parameters?
Which approach is preferable, and why?
Edit: The web-server and database-server are on the same machine. Therefore, network speed shouldn't matter.
When designing an API where I need a procedure to take a large number of related parameters, or even a variable list of parameters, I use record types, e.g.:
TYPE param_type IS RECORD (
  -- data types are illustrative; TO and FROM are reserved words in PL/SQL,
  -- hence the suffixed field names
  to_name      VARCHAR2(100),
  from_name    VARCHAR2(100),
  to_address   VARCHAR2(400),
  from_address VARCHAR2(400),
  date_held_on DATE,
  time_held_on VARCHAR2(8),
  attended_by  VARCHAR2(4000)
);
PROCEDURE do_search (in_params IN param_type);
The structure of the record is up to you, of course. If the procedure is coded to ignore the record elements that are NULL, then all the caller needs to do is set those elements that are required, e.g.:
DECLARE
  p param_type;  -- param_type and do_search are assumed to be declared in the same package
BEGIN
  p.date_held_on := DATE '2012-01-01';
  do_search(p);
END;
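A hedged sketch of how do_search could then honour only the non-NULL elements inside its query; the table and column names LetterSpecific, ToName, ToAddress and DocumentId are illustrative, not taken from the question:

-- one per-type branch inside do_search; UNION ALL the branches for the other 34 tables
SELECT g.DocumentId
  FROM General g
  JOIN LetterSpecific l ON l.DocumentId = g.DocumentId
 WHERE (in_params.to_name    IS NULL OR l.ToName    = in_params.to_name)
   AND (in_params.to_address IS NULL OR l.ToAddress = in_params.to_address);
-- the same NULL-tolerant pattern extends to the remaining record elements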
