Aggregator transformation gives only one row - informatica-powercenter

In an Aggregator transformation, I used the MAX function. I am getting a single row for each group, but multiple rows have the maximum value. How can I get the details of the other rows?

In such a case you need to calculate the MAX value per group, just as you seem to have done, and then join the result back to the original dataset with a Joiner transformation, using the value and the group key as join conditions. As a result you should get all the rows that hold the max value for their group.
This should look more or less as depicted below:
SRC--SQ--AGG-\
      \       \
       \------JNR--TGT
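The aggregate-then-join-back pattern can also be sketched outside PowerCenter. The Python below is only an illustration of the logic, not Informatica code; the sample rows and the field names `group`, `value` and `detail` are hypothetical.

```python
from collections import defaultdict

# Hypothetical source rows; "group" is the group key, "value" the aggregated column.
rows = [
    {"group": "A", "value": 10, "detail": "a1"},
    {"group": "A", "value": 10, "detail": "a2"},
    {"group": "A", "value": 7, "detail": "a3"},
    {"group": "B", "value": 5, "detail": "b1"},
]

# Step 1 (Aggregator): MAX(value) per group, which yields one row per group.
max_per_group = defaultdict(lambda: float("-inf"))
for row in rows:
    max_per_group[row["group"]] = max(max_per_group[row["group"]], row["value"])

# Step 2 (Joiner): join back to the source on group key AND value,
# so every source row that holds the group maximum survives.
rows_with_max = [row for row in rows if row["value"] == max_per_group[row["group"]]]

for row in rows_with_max:
    print(row)
```

Both `a1` and `a2` are kept for group A because they tie on the maximum, which is the behaviour the single Aggregator output cannot give on its own.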

Related

Select columns in a Polars LazyFrame based on a condition without collect?

We often want to remove columns from a LazyFrame that don't fit a condition or threshold evaluated over that column (variance, number of missing values, number of unique values). It's possible to evaluate a condition over a LazyFrame columnwise, collect that condition, and pass it as a list to the same LazyFrame (see this question). Is it possible to do this without evaluating an intermediate result?
A toy example would be to select only the columns that have 10 or more unique values. I can do this following the example from the linked question:
threshold = 10
df = ldf.select(
    ldf.select(pl.all().n_unique())
    .melt()
    .filter(pl.col("value") >= threshold)
    .select("variable")
    .collect()  # this evaluates the condition over the dataframe
    .to_series()
    .to_list()
).collect()
I would like to do this with only one collect() statement at the end.
This is impossible without a collect. With LazyFrames you are building a computation graph, and every node in that graph has a schema that is known before the query runs.
The schema cannot be known if the columns you select depend on running the query.
In short, you have to collect and then continue lazily from that point.

How to get the sum of values of a column in tmap?

I have 2 columns: Matches (Integer) and Accounts_type (String). I want to create a third column that holds the proportion of matches played by each account type. I am new to Talend and have been stuck on this for the past 2 days; I did a lot of research, but to no avail. Please help.
You can do it like this:
You need to read your source data twice (I used tFixedFlowInput_1 and tFixedFlowInput_2 with the same data). The idea is to calculate the total of your matches in tAggregateRow_1, which simply sums all Matches without a group-by column, and then use that total as a lookup.
The tMap then joins your source data with the calculated total. Since the total will always be one record, you don't need any join column. You then simply divide Matches by Total as required.
This is supposing you have unique values in Account_type; if you don't, you need to add another tAggregateRow between your source and tMap_1, in order to get sum of Matches for each Account_type (group by Account_type).
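The calculation the job performs can be sketched in Python to make the data flow explicit. This is an illustration only, not Talend code; the sample rows are hypothetical, and the per-type aggregation corresponds to the extra tAggregateRow mentioned for non-unique account types.

```python
from collections import defaultdict

# Hypothetical source rows: (Account_type, Matches).
rows = [("Free", 30), ("Premium", 50), ("Free", 20)]

# Extra tAggregateRow (group by Account_type): sum of Matches per account type.
matches_by_type = defaultdict(int)
for account_type, matches in rows:
    matches_by_type[account_type] += matches

# tAggregateRow_1 (no group-by column): a single record holding the grand total.
total = sum(matches for _, matches in rows)

# tMap: every row is joined to the single total record, then Matches / Total.
proportions = {
    account_type: matches / total
    for account_type, matches in matches_by_type.items()
}

print(proportions)
```

If the total could be zero in real data, the division would need a guard; the sketch assumes at least one match is present.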

Power Query, avg value based on the values appearing within a specified date range

Context:
I have a data set for the weights of truck and trailer combinations coming into my site over the span of a few years. I have organized my data by seasons as I am trying to prove that the truck:trailers in winter are noticeably heavier due to ice, snow, and mud. The theory is, if the tare weight is higher in this season (the weight of the truck after it empties its load) than its Avg tare weight (which I need to calculate from the data) it can be deduced that the truck:trailer combinations are coming in with extra weight that we pay for in part as some snow/ice/mud falls off in the trailer emptying process.
What I've done so far:
I've defined a custom date range for my seasons
I've grouped by Truck:Trailer with two aggregations: a count, to get a duplicates column, and all rows, to keep all my details
I've filtered out every combination I've seen fewer than 50 times, as I want good representation for each truck:trailer combo so that I can better emphasize repeated patterns
I've added an index column to better keep track of the individuals before expanding the details
What I need to do:
I only want to work with truck:trailer combinations which have weighed in for all four seasons at least once
I need to find the average tare weight of the truck:trailer combinations based over the extended range for both summer and autumn (the dry time of the year) while preserving the raw tare data for all seasons, as I need to eventually compare the winter tare values to this average.
example of my data
When I'm finished I'd like the data to look something like this
Pivot Chart
query data
For your first question (all seasons) you can add a column that holds the distinct count of the values in [Season] for each [Driver:Trailer]. Then filter your table on that column, keeping only the rows where it equals 4. To achieve this, add the following M code to your script in the Advanced Editor and change the expression after in to #"DistinctCount Season".
#"DistinctCount Season" = Table.Join(
    #"insert name previous step", "Driver:Trailer",
    Table.Group(
        #"insert name previous step", {"Driver:Trailer"},
        {{"DistinctCountSeasons", each Table.RowCount(Table.Distinct(_, "Season")), type number}}
    ),
    "Driver:Trailer"
)
Insert the name of your previous step where indicated.
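The distinct-count-then-filter idea can be sketched in Python for readers who want to check the logic outside Power Query. The records and combination labels below are hypothetical, and the requirement of exactly four seasons is taken from the question.

```python
from collections import defaultdict

# Hypothetical weigh-in records: (Driver:Trailer, Season).
records = [
    ("T1:R1", "Winter"), ("T1:R1", "Spring"), ("T1:R1", "Summer"),
    ("T1:R1", "Autumn"), ("T1:R1", "Winter"),
    ("T2:R2", "Winter"), ("T2:R2", "Summer"),
]

# Group step: collect the distinct seasons seen for each combination.
seasons_seen = defaultdict(set)
for combo, season in records:
    seasons_seen[combo].add(season)

# Join + filter step: keep only records whose combination appears in all 4 seasons.
kept = [rec for rec in records if len(seasons_seen[rec[0]]) == 4]

print(kept)
```

A repeated season (the second Winter row for `T1:R1`) does not inflate the count because a set is used, which mirrors Table.Distinct on [Season].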
For second question:
You can use a matrix visual for that in your report. First create a measure:
AverageTare = AVERAGE('table'[Tare])
Then put [Season] on Rows and [AverageTare] on Values. You can create a group (right-click on [Season] in the FIELDS pane) called [DrySeason], to combine the values for Summer and Autumn.
If that doesn't work for you, explore the AVERAGEX function.
EDIT
In Excel you can use a PivotTable. Put [Season] on Rows and [Tare] on Values. Right-click a value in the PivotTable, select Value Field Settings and choose Average. Then select the seasons you want to group, right-click and select Group.
EDIT 2
To add a column in the Power Query Editor that holds the average [Tare] for the [Season] in each row, add the following steps to your script in the Advanced Editor:
#"GroupedSeasonAvg" = Table.Group(#"Insert name previous step", {"Season"}, {{"AVG", each List.Average([Tare]), type number}}),
#"JoinOnSeason" = Table.NestedJoin(#"Insert name previous step",{"Season"},GroupedSeasonAvg,{"Season"},"AVGGrouped"),
#"ExtractSeasonAVG" = Table.ExpandTableColumn(JoinOnSeason, "AVGGrouped", {"AVG"}, {"SeasonAVG"})
It works something like this:
"GroupedSeasonAvg": Creates a table with the average [Tare] for each [Season].
"JoinOnSeason": Creates a new column of nested tables by joining the [Season] value of each row to [Season] in the grouped table.
"ExtractSeasonAVG": Expands each nested table and keeps only [AVG], renamed to [SeasonAVG].
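The three steps (group, join, expand) amount to attaching a per-season average to every row while leaving the raw tare values intact. A minimal Python sketch of that logic follows; the sample rows are hypothetical and only the column names `Season`, `Tare` and `SeasonAVG` come from the steps above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows; the raw Tare values are preserved in the output.
rows = [
    {"Season": "Winter", "Tare": 110.0},
    {"Season": "Winter", "Tare": 130.0},
    {"Season": "Summer", "Tare": 100.0},
]

# GroupedSeasonAvg: one average per Season.
tares_by_season = defaultdict(list)
for row in rows:
    tares_by_season[row["Season"]].append(row["Tare"])
season_avg = {season: mean(tares) for season, tares in tares_by_season.items()}

# JoinOnSeason + ExtractSeasonAVG: look the average up for every row.
with_avg = [{**row, "SeasonAVG": season_avg[row["Season"]]} for row in rows]

for row in with_avg:
    print(row)
```

To compare winter tares against a dry-season baseline instead, the grouping key could be a derived dry/wet label rather than the season itself; that variation is an assumption and not part of the steps above.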

DAX: Use measure outcome to populate calculated column without recalculating measure per row

I have two tables in PowerBI. One called 'Fact_WorstInstance' contains rows of (Index,Instance). For example:
1,2
2,1
3,2
One called 'Fact_AllInstances' contains rows of (Index,Instance,Value). For example:
1,1,'Red'
1,2,'Green'
2,1,'Amber'
2,2,'Red'
2,3,'Brown'
3,1,'Green'
3,2,'Blue'
The first table is essentially a pointer to the worst entry in the second table for the given index (as categorised by some external system).
There is a slicer on which Indexes are visible to the user.
What I want to do is find the worst instance value for the highest visible Index in the 'Fact_WorstInstance' table, and then get all the Index and Value rows from the 'Fact_AllInstances' table for that Instance.
For example, if the slicer isn't filtering, then (3,2) should be the active row from the 'Fact_WorstInstance' table, and this should be used to get the Instance 2 rows
1,2,'Green'
2,2,'Red'
3,2,'Blue'
from the 'Fact_AllInstances' table.
I tried to do this in many different ways. One was to create a measure on 'Fact_WorstInstance' that gives the highest visible row, use this measure to create a calculated column on 'Fact_AllInstances' (1 for worst, 0 for not worst), and then use this calculated column as a filter in Power BI.
The measure itself gives the expected value. The problem I have is when the measure is used to create the calculated column, I cannot find a way to stop the Index being filtered based on the row of calculated column - and therefore the measure outcome changes for each row.
My measure:
Worst Entry = CALCULATE(FIRSTNONBLANK(Fact_WorstInstance[Instance],1),filter(ALLSELECTED(Fact_WorstInstance),Fact_WorstInstance[Index]=MAX(Fact_WorstInstance[Index])))
My column:
WorstColumn = if(Fact_AllInstances[Instance]=[Worst Entry],1,0)
So instead of getting the output above, I get
1,2,'Green'
2,1,'Amber' --> because for Index 2, the measure gives Instance 1 as worst
3,2,'Blue'
This is a possible solution you might want to implement.
First of all, calculated columns are not affected by slicers or page filters; you would need a measure for that, so the way you are approaching the problem won't work.
Create an additional calculated table that holds the unique Instance values. On the Modeling tab in Power BI there is an icon for creating a New Table, where you can use an expression to produce the table.
Use this expression:
InstancesCalcTable = VALUES(Fact_WorstInstance[Instance])
Now you have a table called InstancesCalcTable in your model.
Drag the Instance column in the InstancesCalcTable and drop it in the Instance column of the Fact_WorstInstance, this will create a relationship between InstancesCalcTable and Fact_WorstInstance via Instance. A line between both tables will be drawn in the Relationships view, double click that line and you will see the Edit Relationship window.
Make sure it looks like this:
Then do the same for creating the relationship between InstancesCalcTable and Fact_AllInstances.
You will end with a model like this:
Then you can use the Index column of the Fact_WorstInstance table in a slicer, and it will filter the Fact_AllInstances table down to the selected instances.
However, if no filter is applied, all rows in Fact_AllInstances will be shown.

DAX Running Total Based on Non-Date Columns

I need to do a running total and filter by two non-date fields.
All data is in a single table that is used to house Cycle Times for Part Numbers at each of their respective steps. This table contains a record for each combination of Step and Part. I need to get a running total for Cycle Times on all complete and current steps.
For example in the Table View, if you were to go to CycleHoursComplete at StepNo 40 I need it to = 2.86 or the sum of StepCycleHours with StepNo's <= the StepNo for the given record.
Currently "CycleHoursComplete" = CALCULATE( SUM ( Routing[StepCycleHours] ) , GROUPBY(Routing, Routing[PartNo]), (Routing[StepNo] <= Routing[StepNo] )), and that is not working.
You can easily do running totals with non-date fields as long as you have a numerical identifier to sort by, such as an ID column or in your case, the StepNo column.
You just need to use a combination of FILTER and EARLIER functions. Here's the DAX:
=CALCULATE(SUM([StepCycleHours]),ALL(Cycle),Cycle[StepNo]<=EARLIER(Cycle[StepNo]))
Note my table is called "Cycle"- you would need to replace that with your table name (Routing).
I noticed you tried to use similar logic in your formula, but you left out EARLIER, which is what lets the filter refer to the StepNo of the current row.
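The row-by-row logic behind the running total can be sketched in Python. This is an illustration, not DAX: the sample rows are hypothetical, and restricting the sum to the current row's PartNo is an assumption based on the question (the formula above removes all filters with ALL and does not partition by part).

```python
# Hypothetical Routing rows; column names follow the question.
routing = [
    {"PartNo": "P1", "StepNo": 10, "StepCycleHours": 1.0},
    {"PartNo": "P1", "StepNo": 20, "StepCycleHours": 0.5},
    {"PartNo": "P1", "StepNo": 40, "StepCycleHours": 0.25},
    {"PartNo": "P2", "StepNo": 10, "StepCycleHours": 2.0},
]


def cycle_hours_complete(current, table):
    """Sum StepCycleHours over rows of the same part with StepNo <= the current row's.

    `current` plays the role of the outer row context that EARLIER refers to.
    The PartNo condition is an assumption added for illustration.
    """
    return sum(
        row["StepCycleHours"]
        for row in table
        if row["PartNo"] == current["PartNo"] and row["StepNo"] <= current["StepNo"]
    )


totals = [cycle_hours_complete(row, routing) for row in routing]
print(totals)
```

Dropping the `PartNo` condition reproduces the behaviour of a formula that only compares StepNo, which mixes hours from different parts when the table holds more than one.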
