Process an input file having multiple name value pairs in a line - etl

I am writing a Kettle transformation.
My input file looks like the following:
sessionId=40936a7c-8af9|txId=40936a7d-8af9-11e|field3=val3|field4=val4|field5=myapp|field6=03/12/13 15:13:34|
Now, how do I process this file? I am completely at a loss.
The first step is a CSV file input with | as the delimiter.
My analysis will be based on the "value" part of each name/value pair.
Has anyone processed such files before?

Since you have already split the records into fields of the form 'key=value', you could use an expression transform to cut each string in two by locating the position of the = character, and create two out ports where one holds the key and the other the value.
From there it depends on what you want to do with the information: if you want to store them as key/value, route them through a union, or use a router transform to send them to different targets.
Here is an example of an expression to split the pairs:

You could use the Modified Java Script Value step; add it after the step that reads the pipe-delimited line.
Then do some parsing in JavaScript like this:
// Assumes the incoming field (here: sessionId) still holds the pipe-delimited text
var mainArr = new Array();
var sessionIdSplit = sessionId.toString().split("|");
for (var y = 0; y < sessionIdSplit.length; y++) {
    mainArr[y] = sessionIdSplit[y].toString();
    // here you can add another loop to parse again and split the key=value
}
Alert("mainArr: " + mainArr);
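For reference, the two-level split (first on |, then on the first =) can be sketched outside Kettle in plain Java. This is only an illustration under assumptions: the class name KeyValueLine and the method parse are made up for this example, and the whole line is assumed to arrive as one string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueLine {
    // Splits a line such as "k1=v1|k2=v2|" into an ordered map of keys to values.
    static Map<String, String> parse(String line) {
        Map<String, String> pairs = new LinkedHashMap<>();
        // split() takes a regex, so the pipe has to be escaped
        for (String token : line.split("\\|")) {
            int eq = token.indexOf('=');
            if (eq < 0) {
                continue; // skip empty or malformed tokens
            }
            // Split on the first '=' only, so values may contain '=' themselves
            pairs.put(token.substring(0, eq), token.substring(eq + 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String line = "sessionId=40936a7c-8af9|txId=40936a7d-8af9-11e|field5=myapp|field6=03/12/13 15:13:34|";
        parse(line).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

Once the values sit in a map like this, the analysis on the "value" part can look them up by key instead of by column position.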


Office Script - Split strings in vector & use dynamic cell address - Run an excel script from Power Automate

I'm completely new to Office Scripts (with only old experience in Python and C++) and I'm trying to run a rather "simple" Office Script on Excel from Power Automate. The goal is to fill specific cells (always the same ones; their position shouldn't change) in the Excel file.
The Power Automate part is working; the problem is using the information sent to Excel, in Excel.
The script takes three variables from Power Automate (all three are strings) and should fill specific cells based on them:
CMQ_Name: string to use as is.
Version: string to use as is.
PT_Name: string with names separated by ";". The goal is to split it into as many strings as needed (I'm storing them in an array) and write each name in cells one below the other, always starting at the same position (cell A2).
I'm able to use CMQ_Name and Version and put them in the cells they're supposed to go in; that part already works.
However, I cannot make the last part (the PT_Name handling, part 2 in the code below) work.
Learning this has been pretty frustrating, as some elements seem to work sometimes and not at other times. Newbie me is probably having syntax issues more than anything...
function main(workbook: ExcelScript.Workbook,
CMQ_Name: string,
Version: string,
PT_Name: string )
{
// create reference for each sheet in the excel document
let NAMES = workbook.getWorksheet("CMQ_NAMES");
let TERMS = workbook.getWorksheet("CMQ_TERMS");
//------Part 1: Update entries in sheet CMQ_NAMES
NAMES.getRange("A2").setValues(CMQ_Name);
NAMES.getRange("D2").setValues(Version);
//Update entries in sheet CMQ_TERMS
TERMS.getRange("A2").setValues(CMQ_Name);
//-------Part 2: work with PT_Name
//Split PT_Name
let ARRAY1: string[] = PT_Name.split(";");
let CELL: string;
let B: string = "B"
for (var i = 0; i < ARRAY1.length; i++) {
CELL = B.concat(i.toString());
NAMES.getRange(CELL).setValues(ARRAY1[i]);
}
}
I have several problems:
1. Some parts (basically anything marked in red) are flagged as a problem and I have no idea why. Some research indicated these could be false positives, other sources say not. It's not the biggest problem either, as the code sometimes seems to work despite these warnings. The message is:
Argument of type 'string' is not assignable to parameter of type '(string | number | boolean)[][]'.
2. I couldn't find a way to use a variable as the address to select a specific cell to write to, which is preventing the for loop at the end from working.
I've been bashing my head against this for a week now without solving it. I tried several workarounds and other syntaxes without much success: writing the first two strings to cells works, working with the third string doesn't.
Could you kindly take a look?
Thank you!!
EDIT: Thanks to the below comment, I managed to make it work:
function main(
    workbook: ExcelScript.Workbook,
    CMQ_Name: string,
    Version: string,
    PT_Name: string)
{
    // create a reference for each worksheet
    let NAMES = workbook.getWorksheet("CMQ_NAMES");
    let TERMS = workbook.getWorksheet("CMQ_TERMS");
    //------Part 0: clear previous info
    TERMS.getRange("B2:B200").clear();
    //------Part 1: Update entries in sheet CMQ_NAMES
    NAMES.getRange("A2").setValue(CMQ_Name);
    NAMES.getRange("D2").setValue(Version);
    //Update entries in sheet CMQ_TERMS
    TERMS.getRange("A2").setValue(CMQ_Name);
    //-------Part 2: work with PT_Name
    //Split PT_Name
    let ARRAY1: string[] = PT_Name.split(";");
    let CELL: string;
    let B: string = "B";
    // Rows start at 2, so row i holds ARRAY1[i - 2]
    for (let i = 2; i < ARRAY1.length + 2; i++) {
        CELL = B.concat(i.toString());
        //console.log(CELL); //debugging
        TERMS.getRange(CELL).setValue(ARRAY1[i - 2]);
    }
}
You're using setValues() (plural), which accepts a two-dimensional array of values containing the data for the given rows and columns.
You need to use setValue() instead, as that takes a single value rather than an array.
https://learn.microsoft.com/en-us/javascript/api/office-scripts/excelscript/excelscript.range?view=office-scripts#excelscript-excelscript-range-setvalue-member(1)
As for using a variable to retrieve a single cell (or set of cells for that matter), you really just need to use the getRange() method to do that, this is a basic example ...
function main(workbook: ExcelScript.Workbook) {
    let cellAddress: string = "A4";
    let range: ExcelScript.Range = workbook.getWorksheet("Data").getRange(cellAddress);
    console.log(range.getAddress());
}
If you want a range that spans multiple cells, just change cellAddress to something like A4:C10.
That method also accepts named ranges.

How would I convert a txt file containing a lot of symbols into a array?

I just have a quick question. The program is supposed to create a character array and fill it with the content of a text file containing a lot of random symbols like &, ?, !, letters, and numbers. I am not allowed to create separate arrays; I have to put the characters into the 2D array instead. How would I go about doing so? I already know the number of rows and columns because the file states them at the top, before the symbols start. Here's what I have so far:
char [][]charArray=new char[a][b];
for(int z=0;z<charArray.length;z++)
{
for(int y=0;y<charArray[y].length;y++)
{
charArray[y]=fileReader.next();
}
}
So a is the number of rows and b is the number of columns to read. When I run the program, it reports an incompatible types error: it was expecting a char[] and found a String.
PS: fileReader is my Scanner for reading from the file. Thanks!
First of all, you need to use more descriptive names for your variables. For example, why name the variable a when a really represents the number of rows in the file? Instead, use numRows (and likewise for b, use numCols). Also, you really should name your scanner scanner. There is a FileReader class and your fileReader variable name is misleading---it makes everyone think you're using a FileReader instead of a Scanner. Finally, the brackets used to declare an array type in Java are normally placed adjacent to the type name, as in char[][] instead of char [][]. This does not change the way the code executes, but it conforms better to common convention.
Now, to your problem. You stated that the number of rows and columns is declared at the beginning of the file. This solution assumes the file does in fact contain numRows rows and numCols columns. Basically, Scanner.next() returns a String. You can use String.toCharArray() to convert the String to a char[]. Then you simply copy the characters to the appropriate position in your charArray.
Scanner scanner = new Scanner(theFile);
char[][] charArray = new char[numRows][numCols];
for (int i = 0; i < numRows; i++) {
    final char[] aLine = scanner.next().toCharArray();
    for (int j = 0; j < aLine.length; j++) {
        charArray[i][j] = aLine[j];
    }
}
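A self-contained variant that also reads the dimensions from the top of the file might look like the sketch below. The header layout (two integers, rows then columns) is an assumption, since the question does not show the file, and the names CharGridReader and read are made up for this example. It also assumes no row contains whitespace, because Scanner.next() stops at whitespace.

```java
import java.util.Scanner;

class CharGridReader {
    // Reads "rows cols" followed by one whitespace-free token per row.
    static char[][] read(Scanner scanner) {
        int numRows = scanner.nextInt();
        int numCols = scanner.nextInt();
        char[][] grid = new char[numRows][numCols];
        for (int i = 0; i < numRows; i++) {
            char[] line = scanner.next().toCharArray();
            // Copy at most numCols characters so an over-long line cannot overflow the row
            System.arraycopy(line, 0, grid[i], 0, Math.min(line.length, numCols));
        }
        return grid;
    }

    public static void main(String[] args) {
        // A String-backed Scanner stands in for the file here
        char[][] grid = read(new Scanner("2 3\n&?!\na1b\n"));
        System.out.println(new String(grid[0]));
        System.out.println(new String(grid[1]));
    }
}
```

With a real file, new Scanner(new File(path)) would replace the String-backed Scanner; the rest stays the same.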

Can we aggregate dynamic number of rows using Talend Open Studio

I'm a beginner in Talend Open Studio, and I'm trying to do the transformation below.
From a SQL Table that contains:
DeltaStock             Date
---------------------------
+50 (initial stock)    J0
+80                    J1
-30                    J2
...                    ...
I want to produce this table:
Stock   Date
------------
50      J0
130     J1
100     J2
...     ...
Do you think this could be possible using TOS? I thought of using tAggregateRow, but I didn't find it appropriate to my issue.
There's probably an easier way to do this using the tMemorizeRows component but the first thought that comes to mind is to use the globalMap to store a rolling sum.
In Talend it is possible to store an object (any value of any type) in the globalMap so that it can be retrieved later on in the job. This happens automatically if you ever use a tFlowToIterate component, which lets you retrieve the values of the row currently being iterated on from the globalMap.
A very basic sample job might look like this:
In this we have a tJava component that only initialises the rolling sum in the globalMap with the following code:
//Initialise the rollingSum global variable
globalMap.put("rollingSum", 0);
After this we connect this component onSubjobOk to make sure we only carry on if we've managed to put the rollingSum into the globalMap.
I then provide my data using a tFixedFlowInput component which allows me to easily hardcode some values for this example job. You could easily replace this with any input. I have used your sample input data from the question:
We then process the data using a tJavaRow which will do some transformations on the data row by row. I've used the following code which works for this example:
//Needs java.util.regex.Pattern and java.util.regex.Matcher to be imported
//(the component's advanced settings are the usual place for imports)
//Initialise the operator and the value variables
String operator = "";
Integer value = 0;
//Get the current rolling sum
Integer rollingSum = (Integer) globalMap.get("rollingSum");
//Extract the operator
Pattern p = Pattern.compile("^([+-])([0-9]+)$");
Matcher m = p.matcher(input_row.deltaStock);
//If we have any matches from the regular expression search then extract the operator and the value
if (m.find()) {
    operator = m.group(1);
    value = Integer.parseInt(m.group(2));
}
//Conditional to use the operator
if ("+".equals(operator)) {
    rollingSum += value;
} else if ("-".equals(operator)) {
    rollingSum -= value;
} else {
    System.out.println("The operator provided wasn't a + or a -");
}
//Put the new rollingSum back into the globalMap
globalMap.put("rollingSum", rollingSum);
//Output the data
output_row.stock = rollingSum;
output_row.date = input_row.date;
There's quite a lot going on there but basically it starts by getting the current rollingSum from the globalMap.
Next, it uses a regular expression to split up the deltaStock string into an operator and a value. From this it uses the operator provided (plus or minus) to either add the deltaStock to the rollingSum or subtract the deltaStock from the rollingSum.
After this it then adds the new rollingSum back into the globalMap and outputs the 2 columns of stock and date (unchanged).
In my sample job I then output the data using a tLogRow which will print the values of the data to the console. I typically select the table formatting option in it and in this case I get the following output:
.-----+----.
|tLogRow_8 |
|=----+---=|
|stock|date|
|=----+---=|
|50   |J0  |
|130  |J1  |
|100  |J2  |
'-----+----'
Which should be what you were looking for.
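Stripped of the Talend components, the rolling-sum idea comes down to the following plain Java sketch. It is only an illustration: the class RollingStock and the method runningTotals are names invented for this example, and a local variable plays the role that the globalMap entry plays in the job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RollingStock {
    // Turns signed deltas such as "+50" or "-30" into running stock levels.
    static List<Integer> runningTotals(List<String> deltas) {
        List<Integer> totals = new ArrayList<>();
        int rollingSum = 0; // stands in for the "rollingSum" entry in the globalMap
        for (String delta : deltas) {
            // Integer.parseInt accepts a leading '+' or '-' (Java 7 and later)
            rollingSum += Integer.parseInt(delta.trim());
            totals.add(rollingSum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Sample deltas from the question
        System.out.println(runningTotals(Arrays.asList("+50", "+80", "-30")));
    }
}
```

Because each output row depends on all rows before it, the order in which the rows arrive matters; in the job, that means the input has to be sorted by date before it reaches the tJavaRow.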
You should be able to do it in Talend Open Studio.
I have attached an image showing the job, the content of the tJavaRow and the execution result.
Below the tFixedFlowInput that simulates the input, I left a tJDBCInput, which is what you should use to read the data from your DB. Ideally you can use a specific tXXXInput for your DB instead of the generic JDBC one.
Here is some simple code for the tJavaRow.
//Code generated according to input schema and output schema
output_row.delta = input_row.delta;
output_row.date = input_row.date;
output_row.rollingSum =
    Integer.parseInt(globalMap.get("rollingSum").toString());
int delta = Integer.parseInt(input_row.delta);
output_row.rollingSum += delta;
// Save rolling SUM for next round
globalMap.put("rollingSum", output_row.rollingSum);
Beware of the exceptions parseInt() can throw (a NumberFormatException on malformed input). Handle them in whatever way suits your job.
In my projects I usually have a SafeParse library that does not throw exceptions but instead returns a default value that I pass in together with the value to be parsed.
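A helper along those lines could look like the sketch below. This is not the SafeParse library mentioned above, which is not shown; SafeParseSketch and its parseInt method are hypothetical names used only to illustrate the "default value instead of exception" idea.

```java
class SafeParseSketch {
    // Returns defaultValue instead of throwing when text is not a valid integer.
    static int parseInt(String text, int defaultValue) {
        if (text == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseInt("-30", 0));   // a valid delta is parsed normally
        System.out.println(parseInt("n/a", 0));   // malformed input falls back to the default
    }
}
```

Whether silently falling back to a default is acceptable depends on the data: for a stock calculation, a bad row treated as 0 hides the problem, so logging or rejecting the row may be the safer choice.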

generate a different number of columns based on input number

Suppose I have some XML data that has an unknown number of sub-nodes. Is there a method that allows me to pass the number of sub-nodes into the program as a parameter and have it process them? My current code is something like this:
SourceXML = LOAD '$input' using org.apache.pig.piggybank.storage.XMLLoader('$TopNode') as test:chararray;
test2 = LIMIT SourceXML 3;
test3 = FOREACH test2 GENERATE REGEX_EXTRACT(test,'<$tag1>(.*)</$tag1>',1),
REGEX_EXTRACT(test,'<$tag2>(.*)</$tag2>',1);
dump test3;
However, I may not know in advance how many simple elements there are in the target data (how many $tag# parameters there are). I am hoping to use a .txt parameter file that looks something like this:
input=/inputpath/lowerlevelsofpath
numberSimpleElements=3
tag1=tag1name
tag2=tag2name
tag3=tag3name
with a REGEX_EXTRACT being done for each tag listed in the parameter file.
Any ideas on how to accomplish this?
You could do the following:
1. Split the text with a regex so that each row holds a single value.
2. Generate a (tag, value) pair for each row.
3. Join the (tag, value) pairs with the list of tags.
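To make the three steps concrete, here is the same logic as a plain Java sketch rather than a Pig script. It only illustrates the idea under assumptions: the names TagValueJoin and extract are invented, the elements are assumed to be simple (no attributes, no nested tags), and a set lookup stands in for the join against the tag list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagValueJoin {
    // Matches a simple element such as <name>value</name> (no attributes, no nested tags)
    private static final Pattern SIMPLE_ELEMENT =
        Pattern.compile("<([A-Za-z_][\\w.-]*)>([^<]*)</\\1>");

    // Steps 1 and 2: find every (tag, value) pair; step 3: keep only the wanted tags.
    static Map<String, String> extract(String xml, Set<String> wantedTags) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = SIMPLE_ELEMENT.matcher(xml);
        while (m.find()) {
            String tag = m.group(1);
            if (wantedTags.contains(tag)) {
                result.put(tag, m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> wanted = new HashSet<>(Arrays.asList("tag1name", "tag2name", "tag3name"));
        String record = "<top><tag1name>a</tag1name><tag2name>b</tag2name><other>c</other></top>";
        System.out.println(extract(record, wanted));
    }
}
```

The point of the design is that the number of tags no longer appears in the code: the tag list can grow or shrink without adding or removing a per-tag extraction expression.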

How to rename Pipe fields in cascading?

On two separate occasions, I've had to rename all the fields in a Pipe in order to join (using Merge or CoGroup). What I have done recently is:
//papa and pepe are existing pipes that contain similar values but different field names
papa = new Retain(papa, fieldsFrom);
pepe = new Retain(pepe, fieldsTo);
//Where fieldsFrom.size() == fieldsTo.size() and the field positions match
for (int i = 0; i < fieldsFrom.size(); i++) {
    pepe = new Rename(pepe, fieldsFrom.select(new Fields(i)),
        fieldsTo.select(new Fields(i)));
}
//this allows me to do this
Pipe retVal = new Merge(papa, pepe);
Obviously this is pretty fragile, since I need to ensure that the field positions in fieldsFrom and fieldsTo remain constant, that they are the same size, etc.
Is there a better, less fragile way to merge without going through all the ceremony above?
You can eliminate some ceremony by utilizing Rename's ability to handle aligned from/to fields like this:
pepe = new Rename(pepe, fieldsFrom, fieldsTo);
But this only eliminates the for loop; yes, you must ensure fieldsFrom and fieldsTo are the same size and aligned to correctly express the rename.
cascading.jruby addresses this by wrapping renaming in a function that accepts a mapping rather than aligned from/to fields.
It is also the case that Merge requires incoming pipes to declare the same fields, but CoGroup only requires that you provide declaredFields to ensure there are no name collisions on the output (all fields propagate through, even grouping keys from all inputs).
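One way to make the aligned from/to requirement harder to get wrong is to keep the rename as an explicit old-name to new-name mapping and derive both aligned lists from it, in the spirit of the mapping-based wrapper mentioned above. A minimal plain-Java sketch follows; RenameMapping and toAlignedNames are hypothetical names, not part of Cascading, and the sketch deliberately stops at producing the two name arrays so that it runs without Cascading on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RenameMapping {
    // Returns { fromNames, toNames }, aligned by position and equal in length by construction.
    static String[][] toAlignedNames(Map<String, String> mapping) {
        String[] from = new String[mapping.size()];
        String[] to = new String[mapping.size()];
        int i = 0;
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            from[i] = entry.getKey();
            to[i] = entry.getValue();
            i++;
        }
        return new String[][] { from, to };
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output order is predictable
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("user_id", "id");
        mapping.put("user_name", "name");
        String[][] aligned = toAlignedNames(mapping);
        // Assumption: in a real flow these arrays would be wrapped in Fields
        // and handed to Rename in place of fieldsFrom and fieldsTo.
        System.out.println(String.join(",", aligned[0]) + " -> " + String.join(",", aligned[1]));
    }
}
```

Since each old name is written next to its new name, a size mismatch or a shifted position cannot occur; the remaining risk is only a wrong name in the mapping itself.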
