Duplicating values down for each "matching value" in a separate column

Duplicating values down for each "matching value" in a separate column - filter

I'm looking for an automated process that will let me replicate an identification number down against a list of reference numbers in a separate column. Every time the identification number changes in column B, it would need to repeat in every box below it until a new identification number is listed.
I've provided the existing data structure, and the requested format color coded to better showcase what is needed in the example at the link below:
DEMO FILE!

=ARRAYFORMULA({"Identification Number";
IF(LEN(C4:C), VLOOKUP(ROW(B4:B), FILTER({ROW(B4:B), B4:B}, LEN(B4:B)), 2), )})

Related

Google Sheets - Removing the used items from drop-down lists

In Google Sheets I have a column containing a list of available Serial Numbers (say, column A).
Somewhere else (say, column B) a user must choose the serial number used among those listed in column A; I use Data Validation with a drop-down list in order to prevent the user to use a non-existent serial number.
My goal is to allow the user only choose the remaining available serial numbers, by removing from the drop-down list all the serial numbers already used.
By using the FILTER function, combined with MATCH and ISNA, I am able to create a column of available serial numbers (say, column C). The function used is:
=FILTER(A2:A;ISNA(MATCH(A2:A;B2:B;0))).
Then I moved the Data Validation list of column B (where the user must select the serial number used) from column A (all serial numbers) to column C (filtered serial numbers). I also added the "Reject input" in the Data Validation form, so I can allow the user only to enter a value listed in column C.
It works, but all the previously entered serial numbers on column B have a small red triangle showing that data is not valid. Of course, this happens because all entered values are removed from the data validation list.
I could simply ignore the red triangles, but I don't like this solution that much, because it always looks like there's an error on the sheet, and when we will have many data inside it would be difficult to distinguish this problem from any others.
Is there a different way to solve?
Thanks

with formula only you can use:
=TRANSPOSE(FILTER(A2:A, NOT(COUNTIF(C3:C4, A2:A)), A2:A<>""))
=TRANSPOSE(FILTER(A2:A, NOT(COUNTIF({C2; C4}, A2:A)), A2:A<>""))
=TRANSPOSE(FILTER(A2:A, NOT(COUNTIF(C2:C3, A2:A)), A2:A<>""))
then hide columns and use validation:
where this is the result:
demo sheet
update:
1st fx:
=TRANSPOSE(FILTER(A2:A, NOT(COUNTIF(C3:C4, A2:A)), A2:A<>""))
2nd and every next fx:
=TRANSPOSE(FILTER(A$2:A, NOT(COUNTIF({
INDIRECT("C2:C"&ROW()-1); INDIRECT("C"&ROW()+1&":C")}, A$2:A)), A$2:A<>""))

How to make groups in an input and select a specific row in each of them in Talend?

I am working on a Talend transformation process (we are using Talend 6.4).
, and I don't know how to implement the current requirement.
I have an input consisting in :
Two columns that are my group keys (Account and Product), but are not unique (the same Account x Product couple can happen in multiple rows)
A criterion column (Contract end date), which will help me decide which row I want to keep for each group
Some "tail" data that need to be passed to the following step of the processing (the contract number)
The rule to implement is:
Keep only one record per group
The selected record must be one with no end date or, if all have end date, with the biggest end date
The selected record can be random in case there is a tie
See the transformation applying those rules on some dummy data:
I thought first to do the following:
sort by Account, Product, End_date (nulls first)
"select first" in each group
but I am not skilled enough to know whether the second transformation exists in Talend.
Regards,
Pierre

Very interesting Talend question.
You need to create something like this job.
here a link to the zip file to import in your Talend

The answer from #MBDIA seem to be working, however I would like to share what we did to fulfill our requirement.
See our Talend process here:
The first tMap (tMap_3) acts like a tReplicate and a tMap, and sends:
in the upper branch only the Account and Product references, that are then deduplicated by the tAggregateRow_1.
in the lower branch all data and computed fields that enables us to take care of the case where the date is missing (instead of defaulting to 31/12/9999, we compute a flag (0 or 1) that we use in the sort step afterwards).
In the second part of the process, we first apply the sort to the whole data on Account, Product, Empty date flag (computed before), End date (desc) and use a second tMap to make a join on both branches (on Account x Product), only keeping First Match in order to keep the first record as per our requirement.

Generate new format from a non-system generated report using Power Query

I have an excel file which is non-system generated report format.
I wish to calculate and generate another new output.
Given the Report format as below:-
1) Inside the query when load this excel file, how can I create a new column to copy and paste on the first found value (1#51) at column at the next record, if the next record is empty. Once, if detected a new value (1#261) then copy and paste to the subsequent null value of few next records till this end?
2) The final aim is to generate a new output to auto match/calculate the money to be assign to different reference. As shown below:-
The reference A ~ E is sharing the 3 bank Ref (28269,28542 & RMP) , was thinking to read the same data source a few times, first time to read the column A ~ O(QueryRef) and 2nd time to read the same source to read from A, Q ~ V(QueryBank).
After this I do not have idea how I can allocate the $$ from Query Bank to QueryRef based on the Sum of Total AR.
Eg,
Total Amt of BankRef 28269, $57,044.67 is sufficient to cover Ref#A $10,947.12
BankRef 28269 still sufficient to cover Ref#B $27,647.60
BankRef 28269 left only $18,449.95 , hence the balance of 28269 be allocate to Ref#C.
Remaining balance of Ref#C will need to use BankRef28542 to cover,i.e. $1,812.29
Ref#D will then be allocated of the remaining balance of BankRef28542, i.e. $4,595.32
Ref#D still left $13,350.03 unallocated, hence this will use BankRef#RMP
Ref#E only need $597.66, and BankRef#RMP is sufficient to cover this.
I am not sure if my above case study can be solved using power query or not, due to me still being a newbie # Power Query? Or this is too complicate to handle hence we need to write a program to auto matching this kinds of scenario?
Attached is the sample source file and output :
https://www.dropbox.com/sh/dyecwcdz2qg549y/AACzezsXBwAf8eHUNxxLD1eWa?dl=0
Any advice/opinion/guidance is very much appreciated.

Answering question one:
You have a feature in Powerquery called FILL, DOWN or UP.
For a selected column you can copy the first non empty value to all rows under until a new non empty row is found and so on.

Populate C with max value of B for each row of related values in A

Goal: Using Excel 2010, how can I get each change number's (listed in column A) to show the Max Business Criticality (from column B) to display in column C via formula(s)?
Let me explain: I have a list of Change tickets (in column A) where the change number will likely be listed multiple times (due to different locations and servers). This means each change number may be listed once or may be listed 20 times. Each occurrence of the change number is assigned a Business Criticality (again, based on different locations and servers). This value is captured in column B.
In column C, of the same table, I need to return the max criticality associated with each change and have that be displayed in each row (see image for desired result - colored for ease of differentiating change numbers).
Everything I've seen involves creating a distinct list of change numbers separate from the source table. I'd rather not do that.
How can I get each change number's MaxCrit to display in column C?
I'll try to attach the file if I can figure out how (and/or if I have rights to do so).

In C2 with Ctrl+Shift+Enter and copied down to suit:
=MAX(IF(A:A=A2,B:B))
with

Get data's source in kettle

When I use kettle , I was wandering how to get a table column's source column. Just for an example , after I have merged two tables into one table based on primary key already , Given any column in output table , I could judge whether table it belongs to and get the original column name in original table. Thank you for helping and sorry for my poor English...
http://i.stack.imgur.com/xoR0s.png
When I was given any field in table3 (suppose a field named A in table3) , I could know where it comes from without the graphical view (from java code or other ways) , like the original table name (here are input1 or input2) and the original column name(maybe B in input1 , but represents A in table3). Besides I use mysql.

There are a couple of ways to do this:
1) Manually. If you right-click on the output step and choose Show Output fields (or whatever it's called), you will see the "origin step" for each of the outgoing fields. You can do the same for input fields. Then you can trace them back to those origin steps, and repeat the process of viewing the input fields at those steps, and seeing those fields' origins, and so on. This is probably not what you're looking for.
2) With code. Prior to 6.0, you'd need to programmatically perform the same operations as are listed in option 1 above. In 6.0 there is the Data Lineage capability, which offers the LineageClient API that can find the origin fields for the specified output fields. For more information see my blog post describing the Data Lineage capability. Also I put a Gremlin Console in the PDI Marketplace, to make the use of LineageClient easier (and you can visually see the lineage graph too).

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio