Talend: Save variable for later use - etl

I'm trying to save a value from a spreadsheet's header for later use as a new column value.
This is the reduced version with value (XYZ) in header:
The value in header must be used for new column CODE:
This is my design:
tFilterRow_1 is used to reject rows without values in A, B, C columns.
There is a conditional in tJavaRow_1 to set a global variable:
if(String.valueOf(row1.col_a).equals("CODE:")){
globalMap.putIfAbsent("code", row1.col_b);
}
The Var expression in tMap_1 to get the global variable is:
(String)globalMap.get("code")
The Var "code" is mapped to column "code" but I'm getting this output:
a1|b1|c1|
a2|b2|c2|
a3|b3|c3|
What am I missing, or is there a better approach for this scenario?
Thanks in advance.

Short answer:
In tJavaRow, use input_row or the actual rowN (in this case, row4).
Longer answer (how I'd do it):
I'd let the Excel flow in as-is. With a little Java we can simply skip the first few rows and let the rest of the flow go through.
So the tFilterRow + tJavaRow combination can be replaced with a single tJavaFlex.
In the tJavaFlex I'd use:
begin:
boolean contentFound = false;
main:
if(input_row.col1 != null && input_row.col1.equalsIgnoreCase("Code:") ) {
globalMap.put("code",input_row.col2);
}
if(input_row.col1 != null && input_row.col1.equalsIgnoreCase("Column A:") ) {
contentFound = true;
} else {
if(false == contentFound) continue;
}
This way you'll simply skip the first few records (i.e. the header) and only process the actual data.
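The begin/main split of the tJavaFlex can be tried outside Talend. What follows is only a sketch, written in JavaScript so it runs on its own: the `rows` array, the `extractData` function and the Map standing in for `globalMap` are hypothetical, and unlike the Java above it also drops the column-title row, which is an assumption about the wanted output.

```javascript
// Hypothetical stand-in for the spreadsheet rows that Talend would stream in.
const rows = [
  { col1: "CODE:", col2: "XYZ", col3: null },
  { col1: null, col2: null, col3: null },
  { col1: "Column A:", col2: "Column B:", col3: "Column C:" },
  { col1: "a1", col2: "b1", col3: "c1" },
  { col1: "a2", col2: "b2", col3: "c2" },
];

function extractData(inputRows) {
  const globalMap = new Map(); // plays the role of Talend's globalMap
  let contentFound = false;    // "begin" part: initialised once
  const output = [];

  for (const row of inputRows) { // "main" part: runs once per row
    const first = row.col1 == null ? "" : row.col1.toLowerCase();
    if (first === "code:") {
      globalMap.set("code", row.col2); // remember the header value
    }
    if (first === "column a:") {
      contentFound = true;
      continue; // assumption: the column-title row itself is not wanted
    }
    if (!contentFound) continue; // still inside the header block
    // data row: attach the remembered header value as a new column
    output.push({ ...row, code: globalMap.get("code") });
  }
  return output;
}

console.log(extractData(rows));
```

The key point is ordering: the header value is stored before any data row is emitted, so it is available when the new column is filled.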

Related

Is there any better way to check if the same data is present in a table in .Net core 3.1?

I'm pulling data from a third party api. The api runs multiple times in a day. So, if the same data is present in the table it should ignore that record, else if there are any changes it should update that record or insert a new record if anything new shows up in the json received.
I'm using the below code for inserting any new data.
var input = JsonConvert.DeserializeObject<List<DeserializeLookup>>(resultJson).ToList();
var entryset = input.Select(y => new Lookup
{
lookupType = "JOBCODE",
code = y.Code,
description = y.Description,
isNew = true,
lastUpdatedDate = DateTime.UtcNow
}).ToList();
await _context.Lookup.AddRangeAsync(entryset);
await _context.SaveChangesAsync();
But after the first run, when the API runs again, it inserts the same data into the table again, so duplicate entries end up in the table. To handle this, I added a foreach loop like the one below before inserting data into the table.
foreach (var item in input)
{
if (!_context.Lookup.Any(r =>
r.code== item.Code))
{
//above insert code
}
}
But this doesn't work as expected. Also, the API takes a long time to run when I add the foreach loop. Is there a solution to this in .NET Core 3.1?
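var newList = new List<Lookup>();
foreach (var item in input)
{
    if (!_context.Lookup.Any(r => r.code == item.Code))
    {
        // collect only the records that are not in the table yet
        newList.Add(new Lookup
        {
            lookupType = "JOBCODE",
            code = item.Code,
            description = item.Description,
            isNew = true,
            lastUpdatedDate = DateTime.UtcNow
        });
    }
}
await _context.Lookup.AddRangeAsync(newList);
await _context.SaveChangesAsync();
It would be better to try it this way: collect the new records in the loop, then insert them with a single AddRangeAsync and SaveChangesAsync call.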
I’m on my phone so forgive me for not being able to format the code in my response. The solution to your problem is something I actually just encountered myself while syncing data from an azure function and third party app and into a sql database.
Depending on your table schema, you would need one column with a unique identifier. Make this column a primary key (first step to preventing duplicates). Here’s a resource for that: https://www.w3schools.com/sql/sql_primarykey.ASP
The next step you want to take care of is your stored procedure. You’ll need to perform what’s commonly referred to as an UPSERT. To do this you’ll need to merge a table with the incoming data...on a specified column (whichever is your primary key).
That would look something like this:
MERGE Table_1 AS T1
USING Incoming_Data AS source
ON T1.column1 = source.column1
-- you can use AND / OR here to match on additional values or combinations
WHEN MATCHED THEN
    UPDATE SET T1.column2 = source.column2
    -- etc. for more columns
WHEN NOT MATCHED THEN
    INSERT (column1, column2, column3) VALUES (source.column1, source.column2, source.column3);
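What the MERGE does row by row can be illustrated outside the database. This is only a sketch in JavaScript under assumptions: the `upsert` function, the in-memory Map standing in for the table, and the sample rows are all hypothetical. Unlike the MERGE above, it also skips rows whose values have not changed, which is the "ignore" case from the question.

```javascript
// table: Map keyed by the unique column; incoming: array of plain row objects.
function upsert(table, incoming, key) {
  const result = { inserted: 0, updated: 0, unchanged: 0 };
  for (const row of incoming) {
    const existing = table.get(row[key]);
    if (existing === undefined) {
      table.set(row[key], { ...row });  // WHEN NOT MATCHED THEN INSERT
      result.inserted++;
    } else if (Object.keys(row).some((k) => existing[k] !== row[k])) {
      Object.assign(existing, row);     // WHEN MATCHED THEN UPDATE
      result.updated++;
    } else {
      result.unchanged++;               // same data: leave the record alone
    }
  }
  return result;
}

// Hypothetical sample data: one record is already stored.
const lookupTable = new Map([["A1", { code: "A1", description: "Welder" }]]);
const firstRun = upsert(
  lookupTable,
  [
    { code: "A1", description: "Welder" },
    { code: "B2", description: "Fitter" },
  ],
  "code"
);
console.log(firstRun);
```

Because the lookup is by key in a Map, each incoming row is checked once, instead of issuing one query per row as in the foreach loop from the question.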
First of all, you should decouple the format in which you get your data from your actual data handling. In your case: get rid of the JSON before you actually interpret the data.
Alas, I haven't got a clue what your data represents, so let's assume your data is a sequence of Customer Orders. When you get new data, you want to add all new orders and update changed orders.
So somewhere you have a method with input your json data, and as output a sequence of Orders:
IEnumerable<Order> InterpretJsonData(string jsonData)
{
...
}
You know JSON better than I do; besides, this conversion is beside the point of your question.
You wrote:
So, if the same data is present in the table it should ignore that record, else if there are any changes it should update that record or insert a new record
You need an Equality Comparer
To detect whether there are Added or Changed Customer Orders, you need something to detect whether Order A equals Order B. There must be at least one unique field by which you can identify an Order, even if all other values of the Order have changed.
This unique value is usually called the primary key, or the Id. I assume your Orders have an Id.
So if your new Order data contains an Id that was not available before, then you are certain that the Order was Added.
If your new Order data has an Id that was already in previously processed Orders, then you have to check the other values to detect whether it was changed.
For this you need equality comparers: one that says that two Orders are equal if they have the same Id, and one that checks all values for equality.
A standard pattern is to derive your comparer from class EqualityComparer<Order>
class OrderComparer : EqualityComparer<Order>
{
public static IEqualityComparer<Order> ByValue = new OrderComparer();
... // TODO implement
}
First I'll show you how to use this to detect additions and changes, then I'll show you how to implement it.
Somewhere you have access to the already processed Orders:
IEnumerable<Order> GetProcessedOrders() {...}
var jsondata = FetchNewJsonOrderData();
// convert the jsonData into a sequence of Orders
IEnumerable<Order> orders = this.InterpretJsonData(jsondata);
To detect which Orders are added or changed, you could make a Dictionary of the already processed orders and check the orders one by one to see whether they are changed:
IEqualityComparer<Order> comparer = OrderComparer.ByValue;
Dictionary<int, Order> processedOrders = this.GetProcessedOrders()
    .ToDictionary(order => order.Id);
foreach (Order order in orders)
{
    if (processedOrders.TryGetValue(order.Id, out Order originalOrder))
    {
        // order already existed. Is it changed?
        if (!comparer.Equals(order, originalOrder))
        {
            // unequal!
            this.ProcessChangedOrder(order);
            // remember the changed values of this Order
            processedOrders[order.Id] = order;
        }
        // else: no changes, nothing to do
    }
    else
    {
        // Added!
        this.ProcessAddedOrder(order);
        processedOrders.Add(order.Id, order);
    }
}
Immediately after Processing the changed / added order, I remember the new value, because the same Order might be changed again.
If you want this in a LINQ fashion, you have to GroupJoin the Orders with the ProcessedOrders, to get "Orders with their zero or more Previously processed Orders" (there will probably be zero or one Previously processed order).
var ordersWithTPreviouslyProcessedOrder = orders.GroupJoin(this.GetProcessedOrders(),
order => order.Id, // from every Order take the Id
processedOrder => processedOrder.Id, // from every previously processed Order take the Id
// parameter resultSelector: from every Order, with its zero or more previously
// processed Orders make one new:
(order, previouslyProcessedOrders) => new
{
Order = order,
ProcessedOrder = previouslyProcessedOrders.FirstOrDefault(),
})
.ToList();
I use GroupJoin instead of Join, because this way I also get the "Orders that have no previously processed orders" (= new orders). If you would use a simple Join, you would not get them.
I do a ToList, so that in the next statements the group join is not done twice:
var addedOrders = ordersWithTPreviouslyProcessedOrder
.Where(orderCombi => orderCombi.ProcessedOrder == null);
var changedOrders = ordersWithTPreviouslyProcessedOrder
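    // skip the added orders (no previous version), then keep only the ones that differ
    .Where(orderCombi => orderCombi.ProcessedOrder != null
        && !comparer.Equals(orderCombi.Order, orderCombi.ProcessedOrder));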
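The difference between GroupJoin and a plain Join described above can be tried in isolation. The following is a minimal JavaScript sketch, not the LINQ implementation: the `groupJoin` helper, the sample orders and the `total` field are hypothetical and only serve the illustration.

```javascript
// Left group join: every outer item is kept, paired with zero or more inner matches.
function groupJoin(outer, inner, outerKey, innerKey, resultSelector) {
  const lookup = new Map();
  for (const item of inner) {
    const k = innerKey(item);
    if (!lookup.has(k)) lookup.set(k, []);
    lookup.get(k).push(item);
  }
  // An outer item without matches gets an empty array instead of being dropped.
  return outer.map((o) => resultSelector(o, lookup.get(outerKey(o)) || []));
}

// Hypothetical sample data.
const newOrders = [
  { id: 1, total: 10 },
  { id: 2, total: 25 },
  { id: 3, total: 5 },
];
const previouslyProcessed = [
  { id: 1, total: 10 },
  { id: 2, total: 20 },
];

const combined = groupJoin(
  newOrders,
  previouslyProcessed,
  (order) => order.id,
  (processed) => processed.id,
  (order, matches) => ({
    order,
    processedOrder: matches.length > 0 ? matches[0] : null,
  })
);

// Added: no previously processed order. Changed: a previous order that differs.
const added = combined.filter((c) => c.processedOrder === null);
const changed = combined.filter(
  (c) => c.processedOrder !== null && c.order.total !== c.processedOrder.total
);
console.log(added.map((c) => c.order.id), changed.map((c) => c.order.id));
```

With an inner join, order 3 would not appear in `combined` at all, so it could never be reported as added.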
Implementation of "Compare by Value"
// equal if all values equal
public override bool Equals(Order x, Order y)
{
    if (x == null) return y == null; // true if both null, false if x null but y not null
    if (y == null) return false; // because x not null
    if (Object.ReferenceEquals(x, y)) return true;
    if (x.GetType() != y.GetType()) return false;
    // compare all properties one by one:
    return x.Id == y.Id
        && x.Date == y.Date
        && ...
}
GetHashCode has one rule: if X equals Y, then they must have the same hash code. If they are not equal, there is no rule, but lookups are more efficient if they have different hash codes. Make a tradeoff between calculation speed and hash code uniqueness.
In this case: If two Orders are equal, then I am certain that they have the same Id. For speed I don't check the other properties.
public override int GetHashCode(Order x)
{
    if (x == null)
        return 0x34339d98; // just a hash code for all null Orders
    else
        return x.Id.GetHashCode();
}

How to remove or hide the data value not required from table in birt tool

How do I remove or hide the data value not required from table in birt tool?
I tried filtering on the values; it works in some places but not in groups that have multiple values.
I need to filter out some of the values so that they are not displayed in the data tab of the table.
I have a column without any value that I need to filter out (it is not truly empty, though: when I checked, I found that it contains blank spaces). Only the columns with non-blank values should be displayed.
How can I remove those columns from the data set?
You can of course try scripting the data source query but you can also run a script on the table when it is created to hide the empty column.
Try this script in the table's onCreate event:
var mycolumnCount = this.getRowData().getColumnCount();
var DisNull = false;
for (var i = 1; i < mycolumnCount; i++) {
    var temp = this.getRowData().getColumnValue(i);
    if (temp == "") {
        DisNull = true;
    } else {
        // a non-empty value was found, so stop checking
        DisNull = false;
        break;
    }
}
if (DisNull == true) {
    this.getStyle().display = "none";
}
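The question mentions that the "empty" values actually contain blank spaces, which a comparison with "" would not catch. A possible helper for that check is sketched below; `isBlank` and `allBlank` are hypothetical names, not BIRT functions, and the code is plain JavaScript that makes no BIRT calls.

```javascript
// Treat null, undefined, "" and whitespace-only strings as blank.
function isBlank(value) {
  if (value === null || value === undefined) {
    return true;
  }
  // A regular expression is used instead of trim() so the helper also works
  // in older script engines.
  return String(value).replace(/^\s+|\s+$/g, "") === "";
}

// True only if every value in the array is blank.
function allBlank(values) {
  for (var i = 0; i < values.length; i++) {
    if (!isBlank(values[i])) {
      return false;
    }
  }
  return true;
}

console.log(isBlank("   "), isBlank("x"), allBlank(["", "  ", null]));
```

In the onCreate script, the comparison with "" could be swapped for a call to such a helper, assuming the helper is defined where the script can see it.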

How to fetch data for multiple values of a parameter one by one

I have multiple values for one parameter, and I want to fetch data for every value of the parameter in a BIRT report. I'm getting data for only one value of the parameter, not all of them. I'm using a scripted data source with the open and fetch methods. Thanks.
open (in the data set):
importPackage(Packages.com.abc.test.events);
var TlantNo = params["tlant"].value;
var reqNo = params["Number"].value;
poreEvents = new StdPoreReqEvents();
poreEvents.setReqNo(reqNo);
poreEvents.setTlantNo(TlantNo);
poreEvents.open();
fetch
var poreRO = poreEvents.fetch();
if (poreRO == null) {
return false;
} else
{
row["REQ_NO"] = poreRO.getReqNo();
row["REQ_DATE"] = poreRO.getReqDate();
return true;
}
A report parameter with multiple values is an array, so we have to iterate over it in the scripted dataset.
In open event of the dataset, we only have to initialize a global index:
i=0;
In the fetch event, process each iteration with something like the script below. Pay special attention to how we get the value of reqNo:
importPackage(Packages.com.abc.test.events);
if (params["Number"].value!=null && i<params["Number"].value.length){
var TlantNo = params["tlant"].value;
var reqNo = params["Number"].value[i];
//ETC. do here your stuff with porevents, declare poreRO, check if result is null
row["REQ_NO"] = poreRO.getReqNo();
row["REQ_DATE"] = poreRO.getReqDate();
i++; //Important: increment this even if poreRO is null, otherwise infinite loop
return true; //should return true even if poreRO was null, to process next rows
}else{
return false;
}
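The open/fetch contract used above (the engine calls fetch repeatedly until it returns false) can be simulated outside BIRT. This is only a sketch: `params`, `openDataSet`, `fetchRow` and `runDataSet` are hypothetical stand-ins, and the lookup through poreEvents is replaced by copying the parameter value into the row.

```javascript
// Hypothetical stand-in for BIRT's params object: a multi-value parameter is an array.
var params = { Number: { value: ["REQ-1", "REQ-2", "REQ-3"] } };
var index; // global index shared by open and fetch

function openDataSet() {
  index = 0; // open only resets the position
}

function fetchRow(row) {
  var values = params["Number"].value;
  if (values != null && index < values.length) {
    row["REQ_NO"] = values[index]; // the real script would query with this value
    index++;                       // always advance, otherwise fetch never ends
    return true;                   // a row was produced, call me again
  }
  return false;                    // no more values: stop fetching
}

// Plays the role of the report engine driving the scripted data set.
function runDataSet() {
  var rows = [];
  openDataSet();
  var row = {};
  while (fetchRow(row)) {
    rows.push(row);
    row = {};
  }
  return rows;
}

console.log(runDataSet());
```

Each call to fetch handles exactly one parameter value, which is why the original script, with no index, only ever produced data for a single value.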
Another approach is to define a dataset whose output column holds the values of the multi-value parameter, fetched with an IN query on that parameter. Then pass that output column value to the other datasets as an input/output parameter and bind it to their parameter.
It resolved my problem :)

Overwrite data dropdown values in search form (multipleSearch)

I'm having a colModel entry like this:
{name:'status', index:'status', sorttype:"text", xmlmap:"STATUS", width:"90", stype: 'select', searchoptions:{sopt: ['eq','ne'], value:':all;Hold:Hold;4-Eye-Check:4-Eye-Check;Approved:Approved;Rejected:Rejected;Closed:Closed'}},
That works fine as long as it's used in the FilterToolBar, but when I open the NavGridSearch I run into trouble: the entry "all" no longer works. The query in the FilterToolBar seems to ignore my empty value, but the NavGridSearch doesn't.
Is there a wildcard that could be used instead of an empty string, so that all entries are returned regardless of whether I search for all status entries in the FilterToolBar or the NavGridSearch?
I use the newest open-source jqGrid library (4.3.2).
Thanks in advance!
I am trying to understand the use case for a criteria of 'all'. It seems the purpose of 'all' is that you want the grid to ignore this search criteria and return all rows regardless of this value? If that is the case, why would the user even want to select this search criteria - they can just remove it and the same effect will be achieved. Or am I missing something?
Update
The grid uses function createEl : function(eltype,options,vl,autowidth, ajaxso) to create the select for the search form. Unfortunately this select always adds the all criteria to the list, even though it has a search value of "" which will not match any rows. One workaround is to modify the grid to skip select options that have an empty value. Using the source file jquery.jqGrid.src.js you can add the following code to skip the all option:
else if (sv.length > 1 && sv[0].length === 0) continue;
Here it is within the context of createEl:
if(typeof options.value === 'string') {
so = options.value.split(delim);
for(i=0; i<so.length;i++){
sv = so[i].split(sep);
if(sv.length > 2 ) {
sv[1] = $.map(sv,function(n,ii){if(ii>0) { return n;} }).join(sep);
} else if (sv.length > 1 && sv[0].length === 0) continue; // <-- Change this line
ov = document.createElement("option");
ov.setAttribute("role","option");
ov.value = sv[0]; ov.innerHTML = sv[1];
elem.appendChild(ov);
if (!msl && ($.trim(sv[0]) == $.trim(vl) || $.trim(sv[1]) == $.trim(vl))) { ov.selected ="selected"; }
if (msl && ($.inArray($.trim(sv[1]), ovm)>-1 || $.inArray($.trim(sv[0]), ovm)>-1)) {ov.selected ="selected";}
}
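The effect of the added line can be seen in isolation with a small sketch. `parseSelectOptions` is a hypothetical helper, not part of jqGrid; it assumes ";" as the entry delimiter and ":" as the key/label separator, as in the value string from the question.

```javascript
// Split a "key:label;key:label" string into option objects.
function parseSelectOptions(value, skipEmptyKeys) {
  var delim = ";";
  var sep = ":";
  var result = [];
  var entries = value.split(delim);
  for (var i = 0; i < entries.length; i++) {
    var parts = entries[i].split(sep);
    var key = parts[0];
    var label = parts.slice(1).join(sep); // keep any extra ":" in the label
    // Mirrors the workaround: drop entries such as ":all" whose key is empty.
    if (skipEmptyKeys && parts.length === 2 && key.length === 0) {
      continue;
    }
    result.push({ value: key, label: label });
  }
  return result;
}

var statusValues = ":all;Hold:Hold;4-Eye-Check:4-Eye-Check;Approved:Approved";
console.log(parseSelectOptions(statusValues, false).length); // "all" entry kept
console.log(parseSelectOptions(statusValues, true).length);  // "all" entry skipped
```

The filter toolbar would still need the empty entry, so the skip only makes sense for the select built for the search form.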
If you need a minified version, update the "src" version of jqGrid and then run it through the Google Closure Compiler.
I am not sure this is a general purpose change, which is why I am calling it a workaround for right now. Longer term a better solution needs to be found so jqGrid can be patched... I'll try to find some more time later to revisit this issue.
Does that help?

How do I set a parameter to a list of values in a BIRT report?

I have a DataSet with a query like this:
select s.name, w.week_ending, w.sales
from store s, weekly_sales_summary w
where s.id=w.store_id and s.id = ?
I would like to modify the query to allow me to specify a list of store IDs, like:
select s.name, w.week_ending, w.sales
from store s, weekly_sales_summary w
where s.id=w.store_id and s.id IN (?)
How do I accomplish this in BIRT? What kind of parameter do I need to specify?
The easy part is the report parameter: set the display type to be List Box, then check the Allow Multiple Values option.
Now the hard part: unfortunately, you can't bind a multi-value report parameter to a dataset parameter (at least, not in version 3.2, which is what I'm using). There's a posting on the BIRT World blog here:
http://birtworld.blogspot.com/2009/03/birt-multi-select-statements.html
that describes how to use a code plug-in to bind multi-select report parameters to a report dataset.
Unfortunately, when I tried it, it didn't work. If you can get it to work, that's the method I would recommend; if you can't, then the alternative would be to modify the dataset's queryText, to insert all the values from the report parameter into the query at the appropriate point. Assuming s.id is numeric, here's a function that can be pasted into the beforeOpen event script for the datasource:
function fnMultiValParamSql ( pmParameterName, pmSubstituteString, pmQueryText )
{
strParamValsSelected=reportContext.getParameterValue(pmParameterName);
strSelectedValues="";
for (var varCounter=0;varCounter<strParamValsSelected.length;varCounter++)
{
strSelectedValues += strParamValsSelected[varCounter].toString()+",";
}
strSelectedValues = strSelectedValues.substring(0,strSelectedValues.length-1);
return pmQueryText.replace(pmSubstituteString,strSelectedValues);
}
which can then be called from the beforeOpen event script for the dataset, like this:
this.queryText = fnMultiValParamSql ( "rpID", "0 /*rpID*/", this.queryText );
assuming that your report parameter is called rpID. You will need to amend your query to look like this:
select s.name, w.week_ending, w.sales
from store s, weekly_sales_summary w
where s.id=w.store_id and s.id IN (0 /*rpID*/)
The 0 is included in the query so that the query is valid at design time and the dataset values bind correctly to the report; at runtime, this hard-coded 0 is replaced by the selected values.
However, this approach is potentially very dangerous, as it could make you vulnerable to SQL Injection attacks: http://en.wikipedia.org/wiki/SQL_injection , as demonstrated here: http://xkcd.com/327/ .
In the case of purely numeric values selected from a predefined picklist, a SQL injection attack should not be possible; however, the same approach is vulnerable where freeform entry strings for the parameter are allowed.
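One way to reduce that risk when the IDs are numeric is to validate every value before it is inserted into the query text. The sketch below is plain JavaScript under assumptions: `buildNumericInList` is a hypothetical helper, not a BIRT function, and it only accepts integer values.

```javascript
// Build a comma-separated list for an IN clause, rejecting anything that is
// not a plain integer.
function buildNumericInList(values) {
  if (values == null || values.length === 0) {
    throw new Error("no values selected");
  }
  var parts = [];
  for (var i = 0; i < values.length; i++) {
    var text = String(values[i]);
    if (!/^-?\d+$/.test(text)) {
      throw new Error("not an integer: " + text);
    }
    parts.push(text);
  }
  return parts.join(",");
}

var query = "select s.name from store s where s.id IN (0 /*rpID*/)";
console.log(query.replace("0 /*rpID*/", buildNumericInList([3, 7, 12])));
```

A value such as "1); DROP TABLE store; --" fails the integer check and never reaches the query text.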
FYI: the BIRT World article should work (I wrote it) but that was an earlier solution to the problem.
We have created an open source plugin that you can add to BIRT that has a much cleaner solution to this problem. The Bind Parameters function in the birt-functions-lib provides a simple way to do multi-selects from multi-value parameters.
If you are still interested have a look at the birt-functions-lib project on Eclipse Labs.
Here's another one. Based on some hints I found elsewhere and extended to preserve the number of parameters in your data set SQL. This solution works with a JavaScript function that you call at OnBeforeOpen of the data set:
prepare(this);
function prepare(dataSet) {
while (dataSet.queryText.indexOf("#XYZ?")>=0) {
dataSet.queryText = dataSet.queryText.replace(
"#XYZ?",
"('"+params["products"].value.join("','")+"') or ?=0"
);
}
}
In your query, replace occurrences of (?) with #XYZ?. The method above makes sure that the query contains the actual values and still has a parameter (so that the dataset editor and preview don't complain).
Note: Beware of SQL injection, e.g. by not allowing string values.
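The text produced by this kind of placeholder replacement can be checked outside BIRT. The following is a hedged sketch: `expandInPlaceholder` is a hypothetical function, and doubling single quotes is an assumption about the escaping the target database expects, so it reduces but does not remove the injection risk.

```javascript
// Replace every occurrence of the placeholder with a quoted value list,
// keeping one "?" so the data set still has a parameter to bind.
function expandInPlaceholder(queryText, placeholder, values) {
  var quoted = [];
  for (var i = 0; i < values.length; i++) {
    // Assumption: a single quote inside a value is escaped by doubling it.
    quoted.push("'" + String(values[i]).replace(/'/g, "''") + "'");
  }
  var replacement = "(" + quoted.join(",") + ") or ?=0";
  // split/join replaces all occurrences without needing a loop.
  return queryText.split(placeholder).join(replacement);
}

console.log(
  expandInPlaceholder("select * from p where name in #XYZ?", "#XYZ?", ["a", "b"])
);
```

Using split and join also avoids the special meaning that "$" sequences have in the replacement argument of String.replace.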
I created a more general solution, which handles optional/required parameters behaviour too. When parameter is not required and user doesn't select any value, the IN-clause gets disabled. It also allows the user to select both real values and null value.
In report initialize script I add this code:
/** Fulfill IN-clause in a data set query,
* using a List box report parameter.
* Placeholder must be the parentheses after IN keyword with whatever you want inside.
* If required is false then the whole IN-clause in the query
* must be surrounded by parentheses.
* dataType and required refer to the parameter; they must be passed,
* but it would be better to find a way to retrieve them inside this function
* (given parameter name).
*/
function fulfillInClause(dataSet, placeholder, param, dataType, required) {
if (dataSet.queryText.indexOf(placeholder)>=0) {
var paramValue = params[param].value;
var emptyParam = (paramValue==null || paramValue.length<=0);
//build the list of possible values
// paramValue==null check in ternary operators
// will prevent exceptions when user doesn't select any value
// (it will not affect the query if param is optional,
// while we will never arrive here if it is required)
var replacement = " (";
if (dataType == "string")
replacement += (emptyParam ? "''" : createList(paramValue, ",", "'", "varchar(10)") );
else if (dataType == "integer")
replacement += (emptyParam ? "0" : createList(paramValue, ",", "" , "int" ) );
else
//TODO implement more cases
return;
replacement += ") ";
//if param is not required and user doesn't select any value for it
//then nullify the IN clause with an always-true clause
if (!required && emptyParam)
replacement += " or 0=0 ";
//put replacement in the query
dataSet.queryText = dataSet.queryText.replace( placeholder, replacement );
//DEBUG
params["debug" + dataSet.name + "Query"]=dataSet.queryText;
}
}
/** Create a string list of array values,
* separated by separator and each of them surrounded by a pair of surrounders
*/
function createList(array, separator, surrounder, sqlDataType){
var result = "";
for(var i=0; i<array.length; i++) {
if(result.length>0)
result += separator;
if(array[i]!=null)
result += surrounder + array[i] + surrounder;
else
result += "cast(null as " + sqlDataType + ")";
}
return result;
}
Usage example
In dataset query put your special IN-clause:
select F1, F2
from T1
where F3='Bubi'
and ( F4 in (''/*?customers*/) )
In beforeOpen script of the dataset with the IN-clause write:
fulfillInClause(this, "(''/*?customers*/)", "customers", "string", false);
Note that I used a placeholder which allows the query to run also before the replacement (eg. it has quotes as F4 is a varchar). You can build a placeholder that fits your case.

Resources