I have the following issue: I am taking records from several tables and combining them with a tUnite component in order to create a .txt file. After that I sort the retrieved rows based on a certain field. The output file looks like this:
myField
HeaderRecord xxxxx
DetailRecord xxxxx
DetailRecord xxxxx
AdditionalRecord empty -> field I want to calculate for Additional Record
DetailRecord xxxxx
DetailRecord xxxxx
AdditionalRecord empty -> field I want to calculate for Additional Record
DetailRecord xxxxx
AdditionalRecord empty -> field I want to calculate for Additional Record
What I want to do is calculate "myField" just for "AdditionalRecord" (TAB3). This field is a progressive number which must work as shown in the example below.
How can I achieve this after merging the records with tUnite?
myField
HeaderRecord xxxxx
DetailRecord xxxxx
DetailRecord xxxxx
AdditionalRecord 000000004 -> field I want to calculate for Additional Record
DetailRecord xxxxx
DetailRecord xxxxx
AdditionalRecord 000000007
DetailRecord xxxxx
AdditionalRecord 000000009
I attached myJob below:
Could anyone help me? Thank you in advance.
It seems I cannot read the value of row6.message_type_id
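One way to get that numbering after the tUnite and tSortRow: keep a running row counter that increments for every record, and emit it (zero-padded) only when the record type is AdditionalRecord. Below is a minimal plain-Java sketch of that logic; the record-type values come from the sample above, and the nine-digit padding is an assumption based on the expected output. In a Talend job the counter could live in a tMap Var such as Numeric.sequence("rowCounter", 1, 1), so it advances on every row.

import java.util.Arrays;
import java.util.List;

public class ProgressiveNumberSketch {
    public static void main(String[] args) {
        // Simulated record stream as it leaves tUnite + tSortRow
        List<String> recordTypes = Arrays.asList(
                "HeaderRecord", "DetailRecord", "DetailRecord", "AdditionalRecord",
                "DetailRecord", "DetailRecord", "AdditionalRecord",
                "DetailRecord", "AdditionalRecord");
        int rowCounter = 0; // increments for EVERY row, not only AdditionalRecord rows
        for (String type : recordTypes) {
            rowCounter++;
            String myField = "AdditionalRecord".equals(type)
                    ? String.format("%09d", rowCounter) // zero-padded progressive number
                    : "";
            System.out.println(type + " " + myField);
        }
    }
}

This prints 000000004, 000000007 and 000000009 on the AdditionalRecord lines, matching the expected output above.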
I'm trying to merge two large text files, eliminate duplicates, and create a new file using PowerShell. I'm not familiar with PowerShell and haven't been able to draft a script to accomplish this task, which is why I need help.
Merge two tab-delimited files with 35 columns.
Sort the merged file by three columns. Note that some column header names contain spaces.
a. Column one is a text field that needs to be sorted ascending
b. Column two is a text field that needs to be sorted descending
c. Column three is a date field that needs to be sorted descending
Identify the first occurrence of each record using column (a) from step 2 and save all columns to a new tab-delimited file.
Current
Customer Name State Date of last purchase + 32 columns...............
ABC Company TX 12/30/2022 11:01:54
DEF Company FL 10/01/2022 09:15:35
ABC Company TX 10/15/2022 03:14:18
ABC Company TX 09/25/2022 08:29:37
DEF Company FL 08/31/2022 10:48:03
DEF Company FL 10/01/2022 02:11:58
Result
Customer Name State Date of last purchase + 32 columns................
ABC Company TX 12/30/2022 11:01:54
DEF Company FL 10/01/2022 09:15:35
I tried several approaches; none have been successful.
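Since no script exists yet, here is a sketch of the intended merge / sort / first-per-customer logic in plain Java (the file names, the tab delimiter, the column positions, and the MM/dd/yyyy HH:mm:ss date format are all assumptions based on the sample; in PowerShell the same steps would typically map to Import-Csv, Sort-Object and Group-Object):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MergeDedupe {
    public static void main(String[] args) throws IOException {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss");
        // 1. Merge the two tab-delimited files (keep the header of the first only)
        List<String> a = Files.readAllLines(Paths.get("file1.txt"));
        List<String> b = Files.readAllLines(Paths.get("file2.txt"));
        List<String[]> rows = Stream.concat(a.stream().skip(1), b.stream().skip(1))
                .map(line -> line.split("\t", -1))
                .collect(Collectors.toList());
        // 2. Sort: column one ascending, column two descending, date descending
        rows.sort(Comparator
                .comparing((String[] r) -> r[0])
                .thenComparing((String[] r) -> r[1], Comparator.reverseOrder())
                .thenComparing((String[] r) -> LocalDateTime.parse(r[2], fmt),
                        Comparator.reverseOrder()));
        // 3. Keep the first occurrence per value of column one
        Map<String, String[]> firstPerKey = new LinkedHashMap<>();
        for (String[] r : rows) firstPerKey.putIfAbsent(r[0], r);
        // 4. Write the result, tab-delimited, with the original header
        List<String> out = new ArrayList<>();
        out.add(a.get(0));
        firstPerKey.values().forEach(r -> out.add(String.join("\t", r)));
        Files.write(Paths.get("result.txt"), out);
    }
}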
I have the following data in one column of my CSV file:
`column1`
row1 : {'name':'Steve Jobs','location':'America','status':'none'}
row2 : {'name':'Mark','location':'America','status':'present'}
row3 : {'name':'Elan','location':'Canada','status':'present'}
I want the output for that column to be:
`name` `location` `status`
Steve Jobs America none
Mark America present
Elan Canada present
But sometimes I have a row value like {'name':'Steve Jobs','location':'America','status':'none'},{'name':'Mark','location':'America','status':'present'} (two records in one row).
Please help!
You have to use the tMap and tExtractDelimitedFields components.
Flow:
Below is the step-by-step explanation:
1. Original data: row1 : {'name':'Steve Jobs','location':'America','status':'none'}
2. Substring the value inside the braces using the function below (note that substring's end index is exclusive, so indexOf("}") is the correct end bound):
row1.Column0.substring(row1.Column0.indexOf("{") + 1, row1.Column0.indexOf("}"))
Now the result is: 'name':'Steve Jobs','location':'America','status':'none'
3. Extract the single column into multiple columns using tExtractDelimitedFields. Since the fields are separated by commas, the delimiter should be set to comma. And since we have 3 fields in the data, create 3 fields in the component schema. Below is a screenshot of the tExtractDelimitedFields component configuration.
Now the result is,
name location status
'name':'Steve Jobs' 'location':'America' 'status':'none'
'name':'Mark' 'location':'America' 'status':'present'
'name':'Elan' 'location':'Canada' 'status':'present'
Then, using one more tMap, strip the field labels and single quotes from the data:
row2.name.replaceAll("'name':", "").replaceAll("'", "")
row2.location.replaceAll("'location':", "").replaceAll("'", "")
row2.status.replaceAll("'status':", "").replaceAll("'", "")
Your final result is below,
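To see the whole chain in one place, here is the same logic as a small standalone Java program (Talend expressions are plain Java). Note that splitting on "," only works while the values themselves contain no commas, so rows carrying several {...} records at once, as mentioned in the question, would need extra handling first:

public class ExtractFieldsDemo {
    public static void main(String[] args) {
        String column0 = "{'name':'Steve Jobs','location':'America','status':'none'}";
        // Step 2: strip the braces (substring's end index is exclusive)
        String inner = column0.substring(column0.indexOf("{") + 1, column0.indexOf("}"));
        // Step 3: what tExtractDelimitedFields does with a comma delimiter
        String[] fields = inner.split(",");
        // Step 4: drop the labels and the single quotes, as in the second tMap
        String name = fields[0].replaceAll("'name':", "").replaceAll("'", "");
        String location = fields[1].replaceAll("'location':", "").replaceAll("'", "");
        String status = fields[2].replaceAll("'status':", "").replaceAll("'", "");
        System.out.println(name + " | " + location + " | " + status);
        // prints: Steve Jobs | America | none
    }
}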
I am trying to replace 3 values in one column, but I would like to do it in one step instead of three. I don't want to have Replaced Value1, Replaced Value2 and Replaced Value3 steps.
Imagine you have in column Cars only these values: Volkswagen, Renault and Dacia. You want to replace them like:
Volkswagen --> VW
Renault --> RN
Dacia --> DC
Is it possible to do it in one step instead of three? I am trying to use the Table.ReplaceValue function.
Many thanks
One of the methods is creating a RenameCars table like this:
After adding this table to Power Query you may use the following formula:
= Table.TransformColumns(YourTable, {"Cars", each
try RenameCars{[Name = _]}[Name_mod] otherwise _})
Another way (if your list of replacements is quite short) is using the Record.FieldOrDefault function. In this case a supporting table is not needed.
= Table.TransformColumns(YourTable, {"Cars", each
Record.FieldOrDefault([Volkswagen = "VW", Renault = "RN", Dacia = "DC"],_,_)})
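Record.FieldOrDefault does a lookup that falls back to a default (here, the original value) when no field matches. For illustration only, not Power Query, the same idea expressed in Java is a map lookup with a default:

import java.util.Map;

public class LookupWithDefault {
    public static void main(String[] args) {
        // Unknown values fall through unchanged, mirroring
        // Record.FieldOrDefault([Volkswagen = "VW", ...], _, _)
        Map<String, String> renames = Map.of(
                "Volkswagen", "VW",
                "Renault", "RN",
                "Dacia", "DC");
        for (String car : new String[] {"Volkswagen", "Renault", "Dacia", "Skoda"}) {
            System.out.println(car + " -> " + renames.getOrDefault(car, car));
        }
    }
}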
Similar to #Aleksei's answer, you can use the Table.ReplaceValue function instead if you prefer:
= Table.ReplaceValue(YourTable, each [Car], each RenameCars{[Name = [Car]]}[Name_mod], Replacer.ReplaceText, {"Car"})
This assumes you have the RenameCars table as well.
I'm trying to build a report using BIRT. I defined several data sources: two CSV files and a MySQL database. The query that retrieves data from the database looks like this:
SELECT applicationType, STATUS, COUNT(*)
FROM cards
GROUP BY applicationType, STATUS;
Then I created a table with three columns that outputs these values from the query:
So far so good. But I want to output the values from the CSV files instead of the raw applicationType and status codes. The first file, apptype.csv, has the following structure:
applicationType,apptypedescr
1,"Common Type"
2,"Type 1"
...
and the second one, statuscards.csv, has the following structure:
status,statuscards
1,"Blocked"
2,"Normal"
...
And instead of:
Application type | Card status | Count
-----------------|-------------|------
1 | 2 | 55
I want to output the following:
Application type | Card status | Count
-----------------|-------------|------
Common Type | Normal | 55
I also created a New Joint Data Set to bind the MySQL dataset and the first file's dataset:
But I don't know how to change the table now. As far as I understand, [applicationType] in the first column should be replaced with [apptypedescr]:
but I'm not able to drag this field into the table; it can be added to the report only outside the table. How can I bind these values from the CSV files to the data from the MySQL query in the table?
I did this by setting the new dataset for the table in Properties -> Binding -> Data Set. After this the report was built properly:
Is it possible to append columns in MapReduce while processing data?
Example:
I have an input dataset with 3 columns [EMPID, EMPNAME, EMP_DEPT] and I want to process this data using MapReduce. In the reduce phase, is it possible to add a new column, say TIMESTAMP (the system timestamp when the record gets processed)? The output of the reducer should be EMPID, EMPNAME, EMP_DEPT, TIMESTAMP.
Input Data:
EMPID EMPNAME EMP_DEPT
1 David HR
2 Sam IT
Output Data:
EMPID EMPNAME EMP_DEPT Timestamp
1 David HR XX:XX:XX:XX
2 Sam IT XX:XX:XX:XX
It seems the purpose of your MapReduce job is just to add the timestamp "column" (judging by your input and output examples, there is no other modification/transformation/processing of the EMPID, EMPNAME and EMP_DEPT fields). If that is the case, the only thing you have to do is append the timestamp to the read lines ("rows") in the mapper; then let the reducer join all the new "rows". Workflow:
Each input file is split into many chunks:
(input file) --> splitter --> split1, split2, ..., splitN
Each split content is:
split1 = {"1 David HR", "2 Sam IT"}
split2 = {...}
Splits are assigned to mappers (one per split), which output (key, value) pairs; in this case, a common key for all the pairs is enough:
(offset, "1 David HR") --> mapper1 --> ("key", "1 David HR 2015-06-13 12:23:45.242")
(offset, "2 Sam IT") --> mapper1 --> ("key", "2 Sam IT 2015-06-13 12:23:45.243")
...
"..." --> mapper2 --> ...
...
The reducer receives, for each distinct key, an array with all the pairs output by the mappers that share that key:
("key", ["1 David HR 2015-06-13 12:23:45.242", "2 Sam IT 2015-06-13 12:23:45.243"]) --> reducer --> (output file)
If your aim is to further process the original data in some way, do it in the mapper, in addition to the timestamping.
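A minimal sketch of that workflow with the Hadoop Java API (class names are illustrative; the single shared key funnels every row through one reducer, exactly as in the workflow above):

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class TimestampJob {

    // Mapper: append the processing timestamp to every input line
    // and emit it under one common key.
    public static class TimestampMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        private static final Text COMMON_KEY = new Text("key");

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String ts = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
                    .format(new Date());
            context.write(COMMON_KEY, new Text(line + "\t" + ts));
        }
    }

    // Reducer: receives all timestamped rows for the common key
    // and writes them out unchanged.
    public static class TimestampReducer
            extends Reducer<Text, Text, Text, NullWritable> {
        @Override
        protected void reduce(Text key, Iterable<Text> rows, Context context)
                throws IOException, InterruptedException {
            for (Text row : rows) {
                context.write(row, NullWritable.get());
            }
        }
    }
}

If the grouping step is not actually needed, a map-only job (job.setNumReduceTasks(0)) would append the timestamps without forcing all rows through a single reducer.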