2 differents Readers to fill an item in the same step - spring

I have this situation. I have a csv file with some information, I have to complete this information with a database table registers ad write a new file with these info.
I guess that I should use a MultipleReader implementation, one to read my file and other to read my database (Some like this example: https://bigzidane.wordpress.com/2016/09/15/spring-batch-multiple-sources-as-input/) But I need pass conditions to te query in relation a the current item being processed.
Any way, If it is possible, I need configure a query data in my reader2 with info getted in my reader1. How I could make this?
This a little resume of my problem:
Input File (Reader1)
Id;data1;data2;data3
Database (Reader2)
Id|data4;data5;data6
Output File
Id;data1;data2;data3;data4;data5;data6
Sorry My english. Any link to articles or docs is good.

This is a common pattern known as the "driving query pattern" and is described in the Common patterns section of the reference documentation.
You can use a FlatFileItemReader to read your input file and an item processor to query the database for the current item and enrich it with additional data.
Another idea is to load your flat file in a staging table in the database and use a database item reader to join data.

Related

How to set start and end row or interval rows for CSV in Nifi?

I want to get particular part of excel file in Nifi. My Nifi template like that;
GetFileProcessor
ConvertExcelToCSVProcessor
PutDatabaseRecordProcessor
I should parse data between step 2 and 3.
Is there a solution for getting specific rows and columns ?
Note:If there is a option for cutting ConvertExcelToCSVProcessor, it will work for me.
You can use Record processors between ConvertExcelToCSV and PutDatabaseRecord.
to remove or override a column use UpdateRecord. this processor can receive your data via CSVReader and prepare an output for PutDatabaseRecord or QueryRecord . check View usage -> Additional Details...
in order to filter by column use QueryRecord.
here an example. this example receives data through CSVReader and makes some aggregations, you can as well do some filtering according to doc
also this post had helped me to understand Records in Nifi

How to read an excel sheet and put the cell value within different text fields through UiPath?

How to read an excel sheet and put the cell value within different text fields through UiPath?
I have a excel sheet as follows:
I have read the excel contents and to iterate over the contents later I have stored the contents in a Output Data Table as follows:
Read Range - Output:
DataTable: CVdatatable
Output Data Table
DataTable: CVdatatable
Text: opCVdatatable
Screenshot:
Finally, I want to read the text opCVdatatable in a iteration and write them into text fields. So in the desired Input fileds I mentioned opCVdatatable or opCVdatatable+ "[k(enter)]" as required.
Screenshot:
But UiPath seems to start from the begining of the Output Data Table whenever I called for opCVdatatable.
Inshort, each desired Input fileds are iteratively getting filled up by all the data with the data stored in the Output Data Table.
Can someone help me out please?
My first recommendation is to use Workbook: Read range activity to read data from Excel because it is quicker, works in the background, and does not require excel to be installed on the system.
Start your sequence like this (note the add headers property is not checked):
You do not need to use Output Data Table because this activity outputs a string containing all row items. What you want to do instead is to access the items in the data table and output each one as a string in your type into, e.g., CVDatatable.Rows(0).Item(0).ToString, like so:
You mention you want to read the text opCVdatatable in an iteration and write them into text fields. This is a little bit more complex, but i'll give you an example. You can use a For Each Row activity and loop through each row in CVDatatable, setting the index property if required. See below:
The challenge is to get the selector correct here and make it dynamic, so that it targets a different text field per iteration. The selector for the type into activity will depend on the system you are targeting, but here is an example:
And the selector for this:
Also, here is a working XAML file for you to test.
Hope this helps.
Chris
Here's a different, more general approach. Instead of including the target in the process itself, the Excel would be modified to include parts of a selector:
Note that column B now contains an identifier, and this ID depends on the application you will be working with. For example, here's my sample app looks like. As you can see, the first text box has an id of 585, the second one is 586, and so on (note that you can work with any kind of identifier including the control's name if exposed to UiPath):
Now, instead of adding multiple Type Into elements to your workflow, you would add just a single one, loop over each of the datatable's row, and then create a dynamic selector:
In my case the selector for the Type Into activity looks as follows:
"<wnd cls='#32770' title='General' /><wnd ctrlid='" + row(1).ToString() + "' />"
This will allow you to maintain the process from the Excel sheet alone - if there's a new field that needs to be mapped, just add it to your sheet. No changes to the Workflow are required.

Get data's source in kettle

When I use kettle , I was wandering how to get a table column's source column. Just for an example , after I have merged two tables into one table based on primary key already , Given any column in output table , I could judge whether table it belongs to and get the original column name in original table. Thank you for helping and sorry for my poor English...
http://i.stack.imgur.com/xoR0s.png
When I was given any field in table3 (suppose a field named A in table3) , I could know where it comes from without the graphical view (from java code or other ways) , like the original table name (here are input1 or input2) and the original column name(maybe B in input1 , but represents A in table3). Besides I use mysql.
There are a couple of ways to do this:
1) Manually. If you right-click on the output step and choose Show Output fields (or whatever it's called), you will see the "origin step" for each of the outgoing fields. You can do the same for input fields. Then you can trace them back to those origin steps, and repeat the process of viewing the input fields at those steps, and seeing those fields' origins, and so on. This is probably not what you're looking for.
2) With code. Prior to 6.0, you'd need to programmatically perform the same operations as are listed in option 1 above. In 6.0 there is the Data Lineage capability, which offers the LineageClient API that can find the origin fields for the specified output fields. For more information see my blog post describing the Data Lineage capability. Also I put a Gremlin Console in the PDI Marketplace, to make the use of LineageClient easier (and you can visually see the lineage graph too).

Oracle - build dimension from a file based data source

I'm trying to build a star schema in Oracle 12c. In my case my data source is not a relational database but a single excel/csv file which is populated via a google form, which means I don't have any sort of reference from a source system such as auto incremental keys/ids. Now what would be the best approach to build a star schema given this condition?
File row sample:
<submitted timestamp>,<submitted by user>,<region>,<country>,<branch>,<branch location>,<branch area>,<branch type>,<branch name>,<branch private? yes/no value>,<the following would be all "fact" values (measurements),...,...,...
In case i wanted to build a "branch" dimension, how would I handle updates/inserts after the first load into the dimension table?
Thought solution so far:
I had thought of making a concatenated string "key" with the branch values, which would make it unique (underscore would be the "glue" to concatenate the values), eg:
<region>_<country>_<branch>_<branch location> as branch_key
I would insert all the distinct branches into a staging table, including they branch_key column for each one of them, then when trying to load into the dimension I could compare which key does not exists yet in my dimension table and then insert it. As for updates, I'm a bit stuck on how to handle that, I had thought of having another file mapping which branches are active having a expiration date column. Basically trying to simulate what I could do having the data in a database instead of CSV files.
This is all I can think of so far, do you have any other recommendations/ideas on how to implement this? Take on consideration that the data source cannot as in I have to read these csv files, since data is not stored anywhere else.
Thank you.

Reading XML-files with StAX / Kettle (Pentaho)

I'm doing an ETL-process with Pentaho (Spoon / Kettle) where I'd like to read XML-file and store element values to db.
This works just fine with "Get data from XML" -component...but the XML file is quite big, several giga bytes, and there fore reading the file takes too long.
Pentaho Wiki says:
The existing Get Data from XML step is easier to use but uses DOM
parsers that need in memory processing and even the purging of parts
of the file is not sufficient when these parts are very big.
The XML Input Stream (StAX) step uses a completely different approach
to solve use cases with very big and complex data stuctures and the
need for very fast data loads...
There fore I'm now trying to do the same with StAX, but it just doesn't seem to work out like planned. I'm testing this with XML-file which only has one element group. The file is read and then mapped/inserted to table...but now I get multiple rows to table where all the values are "undefined" and some rows where I have the right values. In total I have 92 rows in the table, even though it should only have one row.
Flow goes like:
1) read with StAX
2) Modified Java Script Value
3) Output to DB
At step 2) I'm doing as follow:
var id;
if ( xml_data_type_description.equals("CHARACTERS") &&
xml_path.equals("/labels/label/id") ) {
id = xml_data_value; }
...
I'm using positional-staz.zip from http://forums.pentaho.com/showthread.php?83480-XPath-in-Get-data-from-XML-tool&p=261230#post261230 as an example.
How to use StAX for reading XML-file and storing the element values to DB?
I've been trying to look for examples but haven't found much. The above example uses "Filter Rows" -component before inserting the rows. I don't quite understand why it's being used, can't I just map the values I need? It might be that this problem occurs because I don't use, or know how to use, Filter Rows -component.
Cheers!
I posted a possible StAX-based solution on the forum listed above, but I'll post the gist of it here since it is awaiting moderator approval.
Using the StAX parser, you can select just those elements that you care about, namely those with a data type of CHARACTERS. For the forum example, you basically need to denormalize the rows in sets of 4 (EXPR, EXCH, DATE, ASK). To do this you add the row number to the stream (using an Add Sequence step) then use a Calculator to determine a "bucket number" = INT((rownum-1)/4). This will give you a grouping field for a Row Denormaliser step.
When the post is approved, you'll see a link to a transformation that uses StAX and the method I describe above.
Is this what you're looking for? If not please let me know where I misunderstood and maybe I can help.

Resources