Complex condition for new column in Power Query - powerquery

In Power Query (M) there is a table with columns A and B. Column C is generated. Column C gets the value "T" if A < 3 and the value of B also appears in column B on a row where A >= 3. This is true for the 4th row.
Furthermore, to complete it:
if column A >= 3 then column C = column B
if column A < 3 and NOT (the value of B exists elsewhere in column B where A is >= 3) then column C = "n"
Is there a (simple) way to write this in M?
A  B  C
-  -  -
3  x  x
3  x  x
3  x  x
1  x  T
2  y  n
2  y  n
Thanks in advance!

UPDATED
You can use this:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Added Custom" = Table.AddColumn(Source, "C", (i) =>
        if i[A] >= 3 then i[B]
        // i[A] < 3 from here on: "T" if any row with the same B has A >= 3
        else if Table.RowCount(
            Table.SelectRows(Source, each [B] = i[B] and [A] >= 3)
        ) > 0 then "T"
        else "n"
    )
in
    #"Added Custom"
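To sanity-check the rule outside Power Query, here is a minimal Python sketch of the same logic. The function name add_column_c and the (A, B) tuple layout are made up for illustration and are not part of the answer above.

```python
def add_column_c(rows):
    """rows: list of (a, b) tuples; returns a list of (a, b, c) tuples."""
    # Values of B that occur on at least one row where A >= 3.
    b_with_large_a = {b for a, b in rows if a >= 3}
    result = []
    for a, b in rows:
        if a >= 3:
            c = b            # copy B when A >= 3
        elif b in b_with_large_a:
            c = "T"          # A < 3 and B also appears on a row with A >= 3
        else:
            c = "n"          # A < 3 and no such row exists
        result.append((a, b, c))
    return result


rows = [(3, "x"), (3, "x"), (3, "x"), (1, "x"), (2, "y"), (2, "y")]
print([c for _, _, c in add_column_c(rows)])  # ['x', 'x', 'x', 'T', 'n', 'n']
```

Building the set of qualifying B values once avoids rescanning the whole table for every row, which is what Table.SelectRows does inside Table.AddColumn.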

Related

Cumulative Sum if previous row in another column is the same

I'm trying to do something that is simple in Excel, but I need it to be added in my query. I am trying to create a new column holding a cumulative sum of the time values in one column, for as long as the name column equals the previous row. Before and after examples are below.
Data Input Table
ID  Time
--  ----
A   2
B   3
C   1
D   0.5
E   1
E   3
E   5
F   2
G   3
G   4
H   1
Table After Query
ID  Time  BeforeStart
--  ----  -----------
A   2     0
B   3     0
C   1     0
D   0.5   0
E   1     0
E   3     1
E   5     4
F   2     0
G   3     0
G   4     3
H   1     0
Basically, if ID equals the ID in the row above, BeforeStart is the sum of Time and BeforeStart from that row; otherwise it is 0.
In Power Query, try the query below and replace Table2 with the name of your source data table:
let
    // Adds, within one group, the total of the Time values above each row
    xFunction = (xTable as table) as table =>
        let
            #"Added Index" = Table.AddIndexColumn(xTable, "Index", 1, 1, Int64.Type),
            // List.Sum returns null for an empty list, so the first row of each group defaults to 0
            #"Added Custom" = Table.AddColumn(#"Added Index", "Running Total", each List.Sum(List.FirstN(#"Added Index"[Time], [Index] - 1)) ?? 0)
        in
            #"Added Custom",
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"ID", type text}, {"Time", type number}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"data", each xFunction(_), type table}}),
    #"Expanded data" = Table.ExpandTableColumn(#"Grouped Rows", "data", {"Time", "Running Total"}, {"Time", "Running Total"})
in
    #"Expanded data"
Alternatively, in Excel itself, try this formula in row 2 and fill it down:
=IF(COUNTIF($A2:$A$2,$A2)=1,0,SUMPRODUCT((A2:$A$2=A2)*B2:$B$2)-B2)
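The rule from the question (reset to 0 whenever ID changes, otherwise add up the Time values above) can also be sketched in a few lines of Python. This is only an illustration under the assumption that rows with the same ID are adjacent, as in the sample data; the name before_start is hypothetical.

```python
def before_start(rows):
    """rows: list of (id, time) tuples; returns a list of (id, time, before_start)."""
    out = []
    prev_id = object()   # sentinel that never equals a real ID
    running = 0
    for id_, time in rows:
        if id_ != prev_id:
            running = 0              # a new ID starts a new run
        out.append((id_, time, running))
        running += time              # total of the Time values seen so far in this run
        prev_id = id_
    return out


data = [("A", 2), ("B", 3), ("C", 1), ("D", 0.5), ("E", 1), ("E", 3),
        ("E", 5), ("F", 2), ("G", 3), ("G", 4), ("H", 1)]
print([b for _, _, b in before_start(data)])
# [0, 0, 0, 0, 0, 1, 4, 0, 0, 3, 0]
```

Unlike the grouped query, this version restarts the sum when an ID reappears after a different ID, which matches the "equals the row above" wording of the question.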

How to match a value from a table up to a particular position in Oracle?

I have to write a query to match values in two tables, Table A and Table B. Table A has values in column XYZ such as "91517181" and "915171812". I want to check whether they exist in Table B, but in Table B the value in column ABC is "9151718", and another column in Table B gives its match length as "10", which means it covers anything up to "9151718XXX".
So I have to write a query where the value from Table A matches the value in Table B, because the value in Table B covers up to 10 characters.
Kindly help...
I think that you need something like this:
table a:      table b:
xyz           x           y
----------    ----------  ---
9151718       9151718     10
91517181      91360       5
913601

select a.xyz, rpad(xyz, b.y, 'x') result, b.x pattern, b.y len
from a
left join b on a.xyz like b.x||'%' and length(a.xyz)<=b.y

xyz         result      pattern     len
----------  ----------  ----------  ---
9151718     9151718xxx  9151718     10
91517181    91517181xx  9151718     10
913601                                   <- not matched
I think something like this:
select * from a where
exists(select 'x' from b where substr(a.xyz, 1, length(b.x)) = b.x and length(a.xyz) <= b.y)
x - prefix value in b
y - maximum match length in b
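The matching rule used above (the value must start with the prefix from table b and be no longer than the stored length) can be checked with a small Python sketch. The function match_prefix and the list-of-tuples layout are assumptions made for illustration only.

```python
def match_prefix(value, patterns):
    """Return the first (prefix, max_len) pair that value matches, else None."""
    for prefix, max_len in patterns:
        # Same test as: a.xyz like b.x||'%' and length(a.xyz) <= b.y
        if value.startswith(prefix) and len(value) <= max_len:
            return (prefix, max_len)
    return None


patterns = [("9151718", 10), ("91360", 5)]
for value in ["9151718", "91517181", "913601"]:
    print(value, match_prefix(value, patterns))
# 9151718 ('9151718', 10)
# 91517181 ('9151718', 10)
# 913601 None
```

The last value starts with 91360 but has 6 characters, one more than the allowed 5, so it stays unmatched just as in the SQL result.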

How to find rows of a matrix with the same ordering of unique and duplicated elements, but not necessarily the same value

I wasn't quite sure how to phrase this question. Suppose I have the following matrix:
A=[1 0 0;
0 0 1;
0 1 0;
0 1 1;
0 1 2;
3 4 4]
Given row 1, I want to find all rows where:
the elements that are unique in row 1, are unique in the same column in the other row, but don't necessarily have the same value
and if there are elements with duplicate values in row 1, there are duplicate values in the same columns in the other row, but not necessarily the same value
For example, in matrix A, if I was given row 1 I would like to find rows 4 and 6.
Can't test this right now, but I think the following will work:
A=[1 0 0;
0 0 1;
0 1 0;
0 1 1;
0 1 2;
3 4 4];
B = zeros(size(A));
for ii = 1:size(A,1)
r = A(ii,:);
B(ii,1) = 1;
for jj = 2:size(A,2)
c = find(r(1:jj-1)==r(jj), 1); % first earlier column holding the same value
if numel(c) > 0
B(ii,jj) = B(ii,c);
else
B(ii,jj) = B(ii,jj-1)+1;
end
end
end
At the end of this we have an array B in which "like indices have like values" and the rows you are looking for are now identical.
Now you can do
[C, ia, ic] = unique(B,'rows','stable');
disp('The answer you want is ');
disp(ia);
And the answer you want will be in the variable ia. See http://www.mathworks.com/help/matlab/ref/unique.html#btb0_8v . I am not 100% sure that you can use the rows and stable parameters in the same call - but I think you can.
Try it and see if it works - and ask questions if you need more info.
Here is a simple method
B = NaN(size(A)); %//Preallocation
for row = 1:size(A,1)
[~,~,B(row,:)] = unique(A(row,:), 'stable');
end
find(ismember(B(2:end,:), B(1,:), 'rows')) + 1
A simple solution without loops:
row = 1; %// row used as reference
equal = bsxfun(@eq, A, permute(A, [1 3 2]));
equal = reshape(equal,size(A,1),[]); %// linearized signature of each row
result = find(ismember(equal,equal(row,:),'rows')); %// find matching rows
result = setdiff(result,row); %// remove reference row, if needed
The key is to compute a "signature" of each row, meaning the equality relationship between all combinations of its elements. This is done with bsxfun. Then, rows with the same signature can be easily found with ismember.
Thanks, Floris. The unique call didn't work correctly and I think you meant to use matrix B in it, too. Here's what I managed to do, although it's not as clean:
A=[1 0 0 1;
0 0 1 3;
0 1 0 1;
0 1 1 0;
0 1 2 2;
3 4 4 3;
5 9 9 4];
B = zeros(size(A));
for ii = 1:size(A,1)
r = A(ii,:);
B(ii,1) = 1;
for jj = 2:size(A,2)
c = find(r(1:jj-1)==r(jj), 1); % first earlier column holding the same value
if numel(c) > 0
B(ii,jj) = B(ii,c);
else
B(ii,jj) = max(B(ii,:))+1; % need max to generalize to more columns
end
end
end
match = zeros(size(A,1)-1,size(A,2));
for i=2:size(A,1)
for j=1:size(A,2)
if B(i,j) == B(1,j)
match(i-1,j)=1;
end
end
end
index=find(sum(match,2)==size(A,2));
In the nested loops I check whether the elements in the rows below row 1 match up in the correct column. If there is a perfect match, that row of match should sum to the number of columns.
When I generalize this for the specific problem I'm working on the matrix fills with a certain set of base size(A,2) numbers. So for base 4 and greater, a max statement is needed in the else statement for no matches. Otherwise, for certain number combinations in a given row, a duplication of an element may occur when there is none.
An overview would be to reduce each row to a "signature" counting element repeats, i.e., your row 1 becomes 1, 2. Then check for equal signatures.
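As a rough illustration of the signature idea in this thread, here is a Python sketch that relabels each row by order of first appearance (the same thing the third output of unique(..., 'stable') gives in MATLAB) and then compares signatures. The names signature and rows_like are hypothetical.

```python
def signature(row):
    """Relabel values by order of first appearance, e.g. [3, 4, 4] -> (1, 2, 2)."""
    labels = {}
    # setdefault returns the existing label, or stores and returns the next unused one
    return tuple(labels.setdefault(v, len(labels) + 1) for v in row)


def rows_like(matrix, ref_index):
    """0-based indices of the rows whose signature equals that of the reference row."""
    ref = signature(matrix[ref_index])
    return [i for i, row in enumerate(matrix)
            if i != ref_index and signature(row) == ref]


A = [[1, 0, 0],
     [0, 0, 1],
     [0, 1, 0],
     [0, 1, 1],
     [0, 1, 2],
     [3, 4, 4]]
print([i + 1 for i in rows_like(A, 0)])  # 1-based like MATLAB: [4, 6]
```

Because new labels always continue from the number of distinct values seen so far, this does not suffer from the duplicate-label issue that made the max call necessary in the loop version.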

Efficient way of finding rows in which A>B

Suppose M is a matrix where each row represents a randomized sequence of a pool of N objects, e.g.,
1 2 3 4
3 4 1 2
2 1 3 4
How can I efficiently find all the rows in which a number A comes before a number B?
e.g., A=1 and B=2; I want to retrieve the first and the second rows (in which 1 comes before 2)
There you go:
[iA jA] = find(M.'==A);
[iB jB] = find(M.'==B);
sol = find(iA<iB)
Note that this works because, according to the problem specification, every number is guaranteed to appear once in each row.
To find rows of M with a given prefix (as requested in the comments): let prefix be a vector with the sought prefix (for example, prefix = [1 2]):
find(all(bsxfun(@eq, M(:,1:numel(prefix)).', prefix(:))))
Something like the following code should work. It checks whether A comes before B in each row.
temp = [1 2 3 4;
3 4 1 2;
2 1 3 4];
A = 1;
B = 2;
orderMatch = zeros(1,size(temp,1));
for i = 1:size(temp,1)
match1= temp(i,:) == A;
match2= temp(i,:) == B;
aIndex = find(match1,1);
bIndex = find(match2,1);
if aIndex < bIndex
orderMatch(i) = 1;
end
end
solution = find(orderMatch);
This gives orderMatch = [1,1,0], because the first two rows have 1 coming before 2 but the third row does not, so solution is [1 2].
UPDATE
Added the find function on orderMatch to give row indices, as suggested by Luis.
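For comparison, the same check is short in Python. This sketch assumes, like the answers above, that A and B each appear in every row (list.index raises ValueError otherwise); the function name is made up.

```python
def rows_where_a_before_b(matrix, a, b):
    """1-based indices of the rows in which a appears before b."""
    return [i for i, row in enumerate(matrix, start=1)
            if row.index(a) < row.index(b)]


M = [[1, 2, 3, 4],
     [3, 4, 1, 2],
     [2, 1, 3, 4]]
print(rows_where_a_before_b(M, 1, 2))  # [1, 2]
```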

Aggregation Operation in Kettle / Pentaho

I'm trying to do an aggregate operation between some columns from an Excel file input. I have the following case:
Column 1 Column 2 Column 3
X $15 A
X $20 A
Y $1 B
Y $1 B
Y $3 C
And I want to achieve this aggregation:
Column 1 Column 2 Column 3
X $35 A
Y $2 B
Y $3 C
As you can see, Column 1 and Column 3 are the criteria for the aggregation; in this case, I want the sum of Column 2.
Is there any way to do this in Pentaho Data Integration? I've tried "Join Rows" and "Join Rows (As a cartesian product)", but I got no results.
Take a look at the Group By step. It should allow you to group by Column 1 and Column 3 and sum Column 2. Note that Group By expects its input to be sorted on the grouping fields, so put a Sort Rows step before it, or use Memory Group By instead.
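To make the expected result concrete, here is a small Python sketch of the same aggregation: group on (Column 1, Column 3) and sum Column 2. It is only an illustration of what the grouping should produce, not Kettle code, and it assumes the dollar amounts have already been converted to plain numbers.

```python
from collections import defaultdict


def group_and_sum(rows):
    """rows: (col1, amount, col3) tuples; sums amount per (col1, col3) pair."""
    totals = defaultdict(int)
    for col1, amount, col3 in rows:
        totals[(col1, col3)] += amount
    # dicts keep insertion order, so groups appear in order of first occurrence
    return [(col1, total, col3) for (col1, col3), total in totals.items()]


rows = [("X", 15, "A"), ("X", 20, "A"), ("Y", 1, "B"), ("Y", 1, "B"), ("Y", 3, "C")]
print(group_and_sum(rows))
# [('X', 35, 'A'), ('Y', 2, 'B'), ('Y', 3, 'C')]
```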

Resources