Subtract Duplicates between 2 Arrays - filter

Say I have column A which contains 20 unique alphabetical names, and column B which contains 5 alphabetical names. I want to write a formula that counts the unique names in column A and subtracts matching names that exist in column B. For example, if I have A2 = Tom, A3 = Mike, A4 = Ben, A5 = Sam; B2 = Ben then it takes 4 unique names from column A and subtracts the 1 matching name in column B to equal 3. I also want this formula to ignore blank cells across both column ranges.

=COUNTA(IFERROR(UNIQUE(FILTER(A:A, NOT(COUNTIF(B:B, A:A)), LEN(A:A)))))
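As a cross-check on what the formula is meant to compute, here is a minimal Python sketch of the same count. It assumes the two columns have already been read into plain lists; the list names and the helper `count_unmatched` are hypothetical and not part of the spreadsheet.

```python
def count_unmatched(col_a, col_b):
    """Count distinct non-blank names in col_a that do not appear in col_b."""
    # Treat None and empty or whitespace-only strings as blank cells.
    names_a = {name for name in col_a if name and name.strip()}
    names_b = {name for name in col_b if name and name.strip()}
    # Set difference removes every name that also exists in column B.
    return len(names_a - names_b)


col_a = ["Tom", "Mike", "Ben", "Sam", "", None]
col_b = ["Ben", ""]
result = count_unmatched(col_a, col_b)
print(result)  # 3 for the example in the question
```

Unlike COUNTIF, the comparison in this sketch is case-sensitive, so "ben" and "Ben" would count as different names.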

Related

Switching values between columns IF

I have various columns with numeric data in them, and I was wondering whether I can switch values between columns if a condition is met: if the value in column A is equal to 0 and the value in column B is different from 0, then I would like to switch those values so that column A gets the value from B and vice versa.
I was trying to do that with Table.ReplaceValue, but the problem is that once I replace a value in column A with the one from column B, my condition is no longer met for the next replacement.
Example:
If a table looks like this:
PART NO   COLUMN A   COLUMN B
1         120        0
2         0          80
3         130        140
I'd like it to change like this:
PART NO   COLUMN A   COLUMN B
1         120        0
2         80         0
3         130        140
Add column > Custom column, named COLUMN A.1:
= if (insert your test here) then [COLUMN B] else [COLUMN A]
For the condition described in the question, the test would be [COLUMN A] = 0 and [COLUMN B] <> 0.
Add column > Custom column, named COLUMN B.1:
= if [COLUMN A] = [COLUMN A.1] then [COLUMN B] else [COLUMN A]
Then right-click and remove the original two columns, and rename the new ones.
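The reason the two-helper-column approach works is that both new values are computed from the original row before anything is overwritten. A minimal Python sketch of that idea, assuming each row is a (part_no, a, b) tuple (the function name `swap_if_zero` is hypothetical):

```python
def swap_if_zero(rows):
    """Return new rows where A and B are swapped when A == 0 and B != 0."""
    swapped = []
    for part_no, a, b in rows:
        if a == 0 and b != 0:
            # Tuple assignment reads both originals before writing either one,
            # so the condition cannot be invalidated halfway through the swap.
            a, b = b, a
        swapped.append((part_no, a, b))
    return swapped


rows = [(1, 120, 0), (2, 0, 80), (3, 130, 140)]
result = swap_if_zero(rows)
print(result)  # [(1, 120, 0), (2, 80, 0), (3, 130, 140)]
```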

Merge without proc sorting in SAS

I have two similar data tables that look like the following:
Data 1:                  Data 2:
categorical   value      categorical   value
Sex                      Sex
Male          2          Male          3
Female        3          Female        1
Weight                   Weight
Mean          50         Mean          49
Median        53         Median        51
I would like to merge them without having to proc sort. How can I do so? I know classically, I would have to proc sort by categorical, and then merge by categorical but I don't want an alphabetized categorical category.
Desired output:
categorical   value   value2
Sex
Male          2       3
Female        3       1
Weight
Mean          50      49
Median        53      51
If it's one to one, each line with each line, just omit the BY statement in a data step merge.
data want;
merge t1 t2 (rename=(value=new_value));
run;
Alternatively, a left join in PROC SQL matches rows on categorical rather than by position. Without an ORDER BY clause, PROC SQL does not guarantee the order of the result rows.
proc sql;
create table dataMerged as
select data1.categorical, data1.value, data2.value as value2
from data1 LEFT JOIN data2
on data1.categorical = data2.categorical;
quit;
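To illustrate the difference between the two approaches, here is a small Python sketch of a keyed merge that keeps the first table's row order. It assumes each table is a list of (categorical, value) pairs with unique category labels; the function name `merge_keep_order` is hypothetical.

```python
def merge_keep_order(data1, data2):
    """Left-join data2 onto data1 by category, keeping data1's row order."""
    # Build a lookup from category to value for the second table.
    lookup = dict(data2)
    # Walk data1 in its existing order; missing categories yield None.
    return [(cat, val, lookup.get(cat)) for cat, val in data1]


data1 = [("Sex", None), ("Male", 2), ("Female", 3),
         ("Weight", None), ("Mean", 50), ("Median", 53)]
data2 = [("Sex", None), ("Male", 3), ("Female", 1),
         ("Weight", None), ("Mean", 49), ("Median", 51)]

result = merge_keep_order(data1, data2)
for row in result:
    print(row)
```

A purely positional merge (the data step without BY) would instead be `zip(data1, data2)`, which is only safe when both tables list the same categories in the same order.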

Split last column into two equal halves in unix

I need to split the last column into two separate columns and delete part of it.
Currently all the values in the last column have six digits. I need to split them into two separate columns.
The first column should contain the first three digits and the second column the next three digits.
Ultimately I want to delete the newly created second column.
Data -
ID c1 c2 c3 c4 c5
12 A XY 123 456 657098
The new file should be created as below -
Data 2
ID c1 c2 c3 c4 c5
12 A XY 123 456 657
Thanks
You can use this awk that checks length of last column for each row:
awk 'length($NF) == 6 { $NF = substr($NF, 1, 3) } 1' file
Output:
ID c1 c2 c3 c4 c5
12 A XY 123 456 657
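For readers more comfortable with Python, the same per-line logic can be sketched as follows. This is only an illustration of what the awk one-liner does, assuming whitespace-separated fields; the function name `truncate_last_field` and its parameters are hypothetical.

```python
def truncate_last_field(line, width=6, keep=3):
    """Keep only the first `keep` characters of the last field if it is `width` long."""
    fields = line.split()
    if fields and len(fields[-1]) == width:
        fields[-1] = fields[-1][:keep]
    # Fields are re-joined with single spaces on every line.
    return " ".join(fields)


lines = ["ID c1 c2 c3 c4 c5", "12 A XY 123 456 657098"]
result = [truncate_last_field(line) for line in lines]
print("\n".join(result))
```

The header line is left alone because its last field, c5, is not six characters long.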

Combining every column-combination of an arbitrary number of matrices

I'm trying to figure out a way to do a certain "reduction"
I have a varying number of matrices of varying size, e.g
1 2 2 2 5 6...70 70
3 7 8 9 7 7...88 89
1 3 4
2 7 7
3 8 8
9 9 9
.
.
44 49 49 49 49 49 49
50 50 50 50 50 50 50
87 87 88 89 90 91 92
What I need to do (and I hope that I'm explaining this clearly enough) is to combine any possible
combination of columns from these matrices, this means that one column might be
1
3
1
2
3
9
.
.
.
44
50
87
Which would reduce down to
1
2
3
9
.
.
.
44
50
87
The reason why I'm doing this is because I need to find the smallest unique combined column
What am I trying to accomplish
For those interested, I'm trying to find the smallest set of gene knockouts
to disable reactions. Here, every matrix represents a reaction, and the columns represent the indices of
the genes that would disable that reaction.
The method may be as brute force as needed, as these matrices rarely become overwhelmingly large,
and the reaction combinations won't be long either
The problem
I can't (as far as I know) create a for loop with an arbitrary number of iterators, and the number of
matrices (reactions to disable) is arbitrary.
Clarification
If I have matrices A,B,C with columns a1,a2...b1,b2...c1...cn what I need
are the columns [a1 b1 c1], [a1, b1, c2], ..., [a1 b1 cn] ... [an bn cn]
Solution
Courtesy of Michael Ohlrogge below.
Extension of his answer, for completeness
His solution ends with
MyProd = product(Array_of_ColGroups...)
Which gets the job done
And picking up where he left off
collection = collect(MyProd); #MyProd is an iterator
merged_cols = Array[] # the rows of 'collection' are arrays of arrays
for (i,v) in enumerate(collection)
# I apologize for this line
push!(merged_cols, sort!(unique(vcat(v...))))
end
# find all lengths so I can find which is the minimum
lengths = map(x -> length(x), merged_cols);
loc_of_shortest = find(broadcast((x,y) -> length(x) == y, merged_cols,minimum(lengths)))
best_gene_combos = merged_cols[loc_of_shortest]
tl;dr - complete solution:
# example matrices
a = rand(1:50, 8,4); b = rand(1:50, 10,5); c = rand(1:50, 12,4);
Matrices = [a,b,c];
toJagged(x) = [x[:,i] for i in 1:size(x,2)];
JaggedMatrices = [toJagged(x) for x in Matrices];
Combined = [unique(i) for i in JaggedMatrices[1]];
for n in 2:length(JaggedMatrices)
Combined = [unique([i;j]) for i in Combined, j in JaggedMatrices[n]];
end
Lengths = [length(s) for s in Combined];
Minima = findin(Lengths, min(Lengths...));
SubscriptsArray = ind2sub(size(Lengths), Minima);
ComboTuples = [((i[j] for i in SubscriptsArray)...) for j in 1:length(Minima)]
Explanation:
Assume you have matrix a and b
a = rand(1:50, 8,4);
b = rand(1:50, 10,5);
Express them as a jagged array, columns first
A = [a[:,i] for i in 1:size(a,2)];
B = [b[:,i] for i in 1:size(b,2)];
Concatenate rows for all column combinations using a list comprehension; remove duplicates on the spot:
Combined = [unique([i;j]) for i in A, j in B];
You now have all column combinations of a and b, as concatenated rows with duplicates removed. Find the lengths easily:
Lengths = [length(s) for s in Combined];
If you have more than two matrices, perform this process iteratively in a for loop, e.g. by using the Combined matrix in place of a. e.g. if you have a matrix c:
c = rand(1:50, 12,4);
C = [c[:,i] for i in 1:size(c,2)];
Combined = [unique([i;j]) for i in Combined, j in C];
Once you have the Lengths array as a multidimensional array (as many dimensions as input matrices, where the size of each dimension is the number of columns in each matrix), you can find the column combinations that correspond to the lowest value (there may well be more than one combination), via a simple ind2sub operation:
Minima = findin(Lengths, min(Lengths...));
SubscriptsArray = ind2sub(size(Lengths), Minima)
(e.g. for a randomized run with 3 input matrices, I happened to get 5 results with the minimal length of 19. The result of ind2sub was ([4,4,3,4,4],[3,3,4,5,3],[1,3,3,3,4]).)
You can convert this further to a list of "Column Combination" tuples with a (somewhat ugly) list comprehension:
ComboTuples = [((i[j] for i in SubscriptsArray)...) for j in 1:length(Minima)]
# results in:
# 5-element Array{Tuple{Int64,Int64,Int64},1}:
# (4,3,1)
# (4,3,3)
# (3,4,3)
# (4,5,3)
# (4,3,4)
Ok, let's see if I understand this. You've got n matrices and want all combinations with one column from each of the n matrices? If so, how about the product() (for Cartesian product) from the Iterators package?
using Iterators
n = 3
Array_of_Arrays = [rand(3,3) for idx = 1:n] ## arbitrary representation of your set of arrays.
Array_of_ColGroups = Array(Array, length(Array_of_Arrays))
for (idx, MyArray) in enumerate(Array_of_Arrays)
Array_of_ColGroups[idx] = [MyArray[:,jdx] for jdx in 1:size(MyArray,2)]
end
MyProd = product(Array_of_ColGroups...)
This will create an iterator object which you can then loop over to consider the specific combinations of columns.
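The Cartesian-product idea carries over directly to other languages. Below is a minimal brute-force sketch in Python using itertools.product, which handles an arbitrary number of matrices without nested loops. It assumes each matrix is given as a list of columns (each column a list of gene indices); the function name `smallest_gene_sets` is hypothetical, and column indices are 0-based here, unlike in the Julia answers.

```python
from itertools import product


def smallest_gene_sets(matrices):
    """Try one column from every matrix and keep the smallest merged gene sets.

    Returns (min_size, combos), where combos is a list of
    (column_index_tuple, sorted_gene_list) pairs that reach min_size.
    """
    best_size = None
    best = []
    index_ranges = [range(len(cols)) for cols in matrices]
    # product(*index_ranges) yields every choice of one column per matrix.
    for idx in product(*index_ranges):
        genes = set()
        for cols, j in zip(matrices, idx):
            genes.update(cols[j])  # merging into a set removes duplicates
        size = len(genes)
        if best_size is None or size < best_size:
            best_size, best = size, [(idx, sorted(genes))]
        elif size == best_size:
            best.append((idx, sorted(genes)))
    return best_size, best


# Three small example matrices, each written as a list of columns.
a = [[1, 3], [2, 7]]
b = [[1, 2, 3], [3, 7, 8]]
c = [[1, 3], [9, 9]]

best_size, best = smallest_gene_sets([a, b, c])
print(best_size, best)  # 3 [((0, 0, 0), [1, 2, 3])]
```

The number of combinations is the product of the column counts, so this only stays practical while the matrices are small, as the question says they are.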

Aggregation Operation in Kettle / Pentaho

I'm trying to do an aggregate operation between some columns from an Excel file input. I have the following case:
Column 1   Column 2   Column 3
X          $15        A
X          $20        A
Y          $1         B
Y          $1         B
Y          $3         C
I want to achieve this aggregation:
Column 1   Column 2   Column 3
X          $35        A
Y          $2         B
Y          $3         C
As you can see, Columns 1 and 3 are the grouping criteria, and I want the sum of Column 2 for each group.
Is there any way to do this in Pentaho Data Integration? I've tried "Join Rows" and "Join Rows (as a cartesian product)", but I got no results.
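Take a look at the Group By step. It lets you group by Column 1 and Column 3 and sum Column 2. Note that Group By expects its input to be sorted on the grouping fields (for example with a Sort rows step placed before it); the Memory Group By step does not have that requirement.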
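The aggregation that the Group By step performs can be sketched outside Pentaho as follows. This is a plain Python illustration, assuming the dollar amounts have already been parsed to numbers; the function name `group_sum` is hypothetical.

```python
from collections import defaultdict


def group_sum(rows):
    """Sum the amount for each (Column 1, Column 3) pair, in first-seen order."""
    totals = defaultdict(int)
    for col1, amount, col3 in rows:
        totals[(col1, col3)] += amount
    # Dicts keep insertion order, so groups appear in the order first seen.
    return [(col1, total, col3) for (col1, col3), total in totals.items()]


rows = [("X", 15, "A"), ("X", 20, "A"), ("Y", 1, "B"), ("Y", 1, "B"), ("Y", 3, "C")]
result = group_sum(rows)
print(result)  # [('X', 35, 'A'), ('Y', 2, 'B'), ('Y', 3, 'C')]
```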

Resources