Consider the following CQL query:
MATCH (n:Label1) WITH n
OPTIONAL MATCH (n)-[r:REL_1]-(:Label2 {id: 5})
WHERE r IS NULL OR r.d < 12345 OR (r.d = 12345 OR r.c < 2)
WITH n,r LIMIT 100
WITH COLLECT({n: n, r: r}) AS rows
MERGE (c:Label2 {id: 5})
WITH c,
[b IN rows WHERE b.r.d IS NULL OR b.r.d < 12345] AS null_less_rows,
[c IN rows WHERE (c.r.d = 12345 AND c.r.c < 2)] AS other_rows
WITH null_less_rows, other_rows, c, null_less_rows+other_rows AS rows, size(null_less_rows+other_rows) AS count
UNWIND null_less_rows AS null_less_row
MERGE(s:Label1 {id: null_less_row.n.id})
MERGE(s)-[:REL_1 {d: 12345, c: 1}]->(c)
WITH DISTINCT other_rows, c, rows, count
UNWIND other_rows AS other_row
MATCH(s:Label1 {id: other_row.n.id})-[str:REL_1]->(c) SET str.c = str.c + 1
WITH rows, count
RETURN rows, count
When I execute the query, it should return rows and count (according to the query). But instead of returning rows and count, it only gives the result summary:
Set 200 properties, created 100 relationships, statement completed in 13 ms.
Is there a problem with the query structure, or with improper use of the UNWIND clause?
If other_rows is null or empty, UNWIND will not produce any rows, so nothing reaches the final RETURN.
You could solve it with:
UNWIND CASE coalesce(size(other_rows), 0) WHEN 0 THEN [null] ELSE other_rows END AS other_row
In addition to Michael Hunger's answer:
UNWIND (CASE other_rows WHEN [] THEN [{n: {id: -2}}] ELSE other_rows END) AS other_row
As I am operating on the values of the array, I use a placeholder map instead of null; I need an extra condition on the dummy id so that the downstream operations can't throw any error messages.
This applies to both cases (other_rows and null_less_rows).
I want to use Power Query to group by a field (the field is [Project]), then get the top 3 scoring rows from the master table for each project; but if there are more than 3 rows with a score of 15 or over, they should all be included. At minimum, 3 rows must be extracted every time.
Essentially I'm trying to combine the Keep Rows function with my formula of "=if(score>=15,1,0)".
Setting the query to records with a score of 15 or greater doesn't work for projects where the highest scores are, for example, 1, 7 and 15. That would only return 1 row, but we need 3 as a minimum.
Setting it to the top 3 scores only would omit qualifying rows in a table where, say, the highest scores are 18, 19 and 20 but other rows also score 15 or more.
Is there a way to combine the two functions to say "Choose the top 3 rows, but choose the top n rows if there are n rows with score >= 15"?
As far as I understand, you are trying to do the following (Alexis Olson proposed much the same):
let
    Source = Excel.CurrentWorkbook(){[Name="Table"]}[Content],
    group = Table.Group(Source, {"Project"}, {"temp", each Table.SelectRows(Table.AddIndexColumn(Table.Sort(_, {"Score", 1}), "i", 1, 1), each [i] <= 3 or [Score] >= 15)}),
    expand = Table.ExpandTableColumn(group, "temp", {"Score"})
in
    expand
Or:
let
    Source = Excel.CurrentWorkbook(){[Name="Table"]}[Content],
    group = Table.Group(Source, {"Project"}, {"temp", each [a = Table.Sort(_, {"Score", 1}), b = Table.FirstN(a, 3) & Table.SelectRows(Table.Skip(a, 3), each [Score] >= 15)][b]}),
    expand = Table.ExpandTableColumn(group, "temp", {"Score"})
in
    expand
Or:
let
    Source = Excel.CurrentWorkbook(){[Name="Table"]}[Content],
    group = Table.Group(Source, {"Project"}, {"Score", each [a = List.Sort([Score], 1), b = List.FirstN(a, 3) & List.Select(List.Skip(a, 3), each _ >= 15)][b]}),
    expand = Table.ExpandListColumn(group, "Score")
in
    expand
Note: if there are more columns in the table that you want to keep, then for the first and second variants you may just add those columns to the last step. The last variant doesn't offer that option, and its code would need to be modified.
Sort by the Score column in descending order and then add an Index column (go to Add Column > Index Column > From 1).
Then filter on the Index column choosing to keep values less than or equal to 3. This should produce a step with this M code:
= Table.SelectRows(#"Added Index", each [Index] <= 3)
Now you just need to make a small adjustment to also include any score 15 or greater:
= Table.SelectRows(#"Added Index", each [Index] <= 3 or [Score] >= 15)
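To make the selection rule itself concrete (independently of Power Query), here is a minimal Python sketch; the scores, threshold and helper name are made up for illustration:
from math import inf  # not required; shown only to keep the sketch self-contained

def select_scores(scores, top_n=3, threshold=15):
    # Keep the top `top_n` scores, plus every score at or above the threshold.
    ranked = sorted(scores, reverse=True)
    return [s for i, s in enumerate(ranked) if i < top_n or s >= threshold]

print(select_scores([1, 7, 15]))           # [15, 7, 1] -> the minimum of 3 rows
print(select_scores([20, 19, 18, 17, 2]))  # [20, 19, 18, 17] -> all rows >= 15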
For key equality lookup, we can use a data type like a hashmap, but is there a data structure for looking up values matching arbitrary ranges?
The Rust code below emulates this using a match expression, but I'd like to avoid hard-coding the cases.
fn main() {
    let x = 5;
    match x {
        d if d <= 0 => println!("d <= 0"),
        d if 1 < d && d <= 3 => println!("1 < d <= 3"),
        d if 4 < d && d <= 6 => println!("4 < d <= 6"),
        _ => {} // note the guards leave gaps at 1 and 4, which fall through here
    }
}
(Rust playground)
You could create a list of ranges, with start and end values. Sort that list by start value.
When you get a query, do a binary search on the starting values. When your value is greater than or equal to the starting value, and less than or equal to the ending value, you know you got the right range.
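As a concrete sketch of this idea (in Python rather than Rust, with made-up ranges, and assuming the ranges don't overlap), using the standard bisect module:

import bisect

# Ranges as (start, end) pairs, inclusive on both ends, sorted by start.
ranges = [(0, 2), (3, 5), (6, 8), (9, 10)]
starts = [r[0] for r in ranges]

def lookup(x):
    # Find the rightmost range whose start is <= x.
    i = bisect.bisect_right(starts, x) - 1
    if i >= 0 and ranges[i][0] <= x <= ranges[i][1]:
        return ranges[i]
    return None  # x falls into a gap or outside all ranges

print(lookup(4))   # (3, 5)
print(lookup(11))  # None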
If you have a relatively small total range (say, integers from 1 to 1000), you could pre-fill an array of references to ranges. Say you have the 4 ranges and the possible query values are 0 through 10:
range1: 0, 2
range2: 3, 5
range3: 6, 8
range4: 7, 10
Your array, indexed by query values 0 through 10, would then be [range1, range1, range1, range2, range2, range2, range3, range3, range3, range4, range4]. (Note that range3 and range4 overlap at 7 and 8, so you have to pick a precedence for those slots; here range3 wins.)
You could extend that to however large you want it, depending on how much memory you want to spend. That gives you a direct lookup.
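A minimal Python sketch of the pre-filled table (the values and the precedence rule are assumptions for illustration; earlier ranges win on overlap):

# Ranges as (start, end) pairs, inclusive; earlier entries win on overlap.
ranges = [(0, 2), (3, 5), (6, 8), (7, 10)]
MAX_VALUE = 10

# Pre-fill a direct-lookup table: table[x] holds the range covering x (or None).
table = [None] * (MAX_VALUE + 1)
for r in ranges:
    for x in range(r[0], r[1] + 1):
        if table[x] is None:  # keep the first (highest-precedence) match
            table[x] = r

print(table[4])  # (3, 5)
print(table[9])  # (7, 10)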
I am looking for an algorithm that works like this:
permutateBuckets([A,B,C])
and gives the following result:
[ [[A,B,C]],
[[A,B],[C]], [[A,C],[B]], [[B,C],[A]], [[A],[B,C]], [[B],[A,C]], [[C],[A,B]],
[[A],[B],[C]], [[A],[C],[B]], [[B],[A],[C]], [[B],[C],[A]], [[C],[A],[B]], [[C],[B],[A]]
]
In general:
The permutations for [1,2,...,n] should include all possible arrangements of the input values into 1 up to n buckets; the order of values within a bucket is not relevant (e.g. [1,2] equals [2,1]), only the order of the buckets themselves matters (e.g. [[1,2],[3]] is different from [[3],[1,2]]).
Each input element has to be in exactly one bucket for a result to be valid (e.g. an input of [1,2] cannot give [[1]] (missing 2) or [[1,2],[1]] (1 appears twice) as output).
The simplest approach is recursive:
Start with the list [[A]].
Then insert each new item in all possible places:
 - before the current sublists
 - between sublists
 - after the current sublists
 - into every existing sublist
For example, the list [[B],[A]] produces 5 new lists with item C; the places to insert C are:
[ [B] [A] ]
^ ^ ^ ^ ^
and the three level-2 lists [[A],[B]], [[B],[A]], [[A,B]] produce 5+5+3=13 level-3 lists.
Alternative way:
Generate all n-length nondecreasing sequences from 1,1,...,1 up to 1,2,...,n (each value rises by at most 1 over the previous one), and generate the unique permutations of every sequence.
The values in these permutations give the bucket number for each item. For example, the sequence 122 has 3 unique permutations, corresponding to these distributions:
1 2 2   [1],[2, 3]
2 1 2   [2],[1, 3]
2 2 1   [3],[1, 2]
In any case, the number of distributions rises very quickly (the ordered Bell numbers: 1, 3, 13, 75, 541, 4683, 47293, 545835, 7087261, 102247563, ...).
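As a quick sanity check on these counts: the ordered Bell numbers satisfy the recurrence a(n) = sum over k=1..n of C(n,k)*a(n-k), with a(0) = 1 (choose the k items that form the first bucket). A few lines of Python reproduce the sequence:

from math import comb

def ordered_bell(n):
    a = [1] * (n + 1)  # a[0] = 1
    for m in range(1, n + 1):
        # choose the k items that form the first bucket
        a[m] = sum(comb(m, k) * a[m - k] for k in range(1, m + 1))
    return a[n]

print([ordered_bell(n) for n in range(1, 7)])  # [1, 3, 13, 75, 541, 4683]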
Implementation of the iterative approach in Delphi (full FreePascal-compatible code at ideone):
procedure GenDistributions(N: Integer);
var
  seq, t, i, mx: Integer;
  Data: array of Byte;
  Dist: TBytes2D;
begin
  SetLength(Data, N);
  // there are n-1 places for incrementing,
  // so 2^(n-1) possible sequences
  for seq := 0 to 1 shl (N - 1) - 1 do begin
    t := seq;
    mx := 0;
    Data[0] := mx;
    for i := 1 to N - 1 do begin
      mx := mx + (t and 1); // check the lowest bit
      Data[i] := mx;
      t := t shr 1;
    end;
    // here Data contains a nondecreasing sequence 0..mx (increments of 0 or 1);
    // Data[i] is the number of the sublist that item i belongs to
    repeat
      Dist := nil;
      SetLength(Dist, mx + 1); // reset result array into [][][] state
      for i := 0 to N - 1 do
        Dist[Data[i]] := Dist[Data[i]] + [i]; // add item to its computed sublist
      PrintOut(Dist);
    until not NextPerm(Data); // generates the next permutation, if any
  end;
end;
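For comparison, here is a rough Python equivalent of the same iterative idea (not part of the original answer; itertools.permutations plus a set stands in for NextPerm):

from itertools import permutations

def distributions(items):
    n = len(items)
    # Each (n-1)-bit mask encodes a nondecreasing sequence of bucket numbers
    # (increment of 0 or 1 at each step), as in the Delphi code above.
    for mask in range(1 << (n - 1)):
        seq, mx = [0], 0
        for i in range(n - 1):
            mx += (mask >> i) & 1
            seq.append(mx)
        # Each unique permutation of the sequence assigns items to buckets.
        for perm in set(permutations(seq)):
            buckets = [[] for _ in range(mx + 1)]
            for item, b in zip(items, perm):
                buckets[b].append(item)
            yield buckets

print(sum(1 for _ in distributions(['A', 'B', 'C'])))  # 13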
And now the Python recursive implementation (ideone):
import copy

cnt = 0

def ModifySublist(Ls, idx, value):
    # return a copy of Ls with value appended to the sublist at idx
    res = copy.deepcopy(Ls)
    res[idx].append(value)
    return res

def InsertSublist(Ls, idx, value):
    # return a copy of Ls with a new sublist [value] inserted at position idx
    res = copy.deepcopy(Ls)
    res.insert(idx, [value])
    return res

def GenDists(AList, Level, Limit):
    global cnt
    if Level == Limit:
        print(AList)
        cnt += 1
    else:
        for i in range(len(AList)):
            GenDists(ModifySublist(AList, i, Level), Level + 1, Limit)
            GenDists(InsertSublist(AList, i, Level), Level + 1, Limit)
        GenDists(InsertSublist(AList, len(AList), Level), Level + 1, Limit)

GenDists([], 0, 3)
print(cnt)
Edit: #mhmnn cloned this code in JavaScript using custom items for output.
I need to DELETE relationships of a particular type on each node that I iterate over with FOREACH.
In detail:
PROFILE MATCH (n:Label1)-[r1:REL1]-(a:Label2)
WHERE a.prop1 = 2
WITH n
WITH COLLECT(n) AS rows
WITH [a IN rows WHERE a.prop2 < 1484764200] AS less_than_rows,
[b IN rows WHERE b.prop2 = 1484764200 AND b.prop3 < 2] AS other_rows
WITH size(less_than_rows) + size(other_rows) AS count, less_than_rows, other_rows
FOREACH (sub IN less_than_rows |
MERGE (sub)-[r:REL2]-(:Label2)
DELETE r
MERGE(l2:Label2{id:540})
MERGE (sub)-[:APPEND_TO {s:0}]->(l2)
SET sub.prop3=1, sub.prop2=1484764200)
WITH DISTINCT other_rows, count
FOREACH (sub IN other_rows |
MERGE(l2:Label2{id:540})
MERGE (sub)-[:APPEND_TO {s:0}]->(l2)
SET sub.prop3=sub.prop3+1)
RETURN count
As FOREACH does not support MATCH, I used MERGE to achieve this. But the query is very slow when I execute it (it takes around 1 minute).
But if I execute it without the FOREACH clauses (no updating), it takes around 1 second.
Problem: clearly the problem is with FOREACH, or with the operations inside FOREACH.
I want to delete a particular relationship, create another relationship, and set some properties on the node.
Note: I showed the whole query in case there is another way to achieve the same requirement (besides FOREACH, I also tried CASE WHEN).
I noticed a few things about your original query:
MERGE(l2:Label2 {id:540}) should be moved out of both FOREACH clauses, since it only needs to be done once. This is slowing down the query. In fact, if you expect the node to already exist, you can use a MATCH instead.
MERGE (sub)-[:APPEND_TO {s:0}]->(l2) may not do what you intended, since it will only match existing relationships in which the s property is still 0. If s is not 0, you will end up creating an additional relationship. To ensure that there is a single relationship and that its s value is (reset to) 0, you should remove the {s:0} test from the pattern and use SET to set the s value; this should also speed up the MERGE, since it will not need to do a property value test.
This version of your query should fix the above issues, and be faster (but you will have to try it out to see how much faster):
PROFILE
MATCH (n:Label1)-[:REL1]-(a:Label2)
WHERE a.prop1 = 2
WITH COLLECT(n) AS rows
WITH
[a IN rows WHERE a.prop2 < 1484764200] AS less_than_rows,
[b IN rows WHERE b.prop2 = 1484764200 AND b.prop3 < 2] AS other_rows
WITH size(less_than_rows) + size(other_rows) AS count, less_than_rows, other_rows
MERGE(l2:Label2 {id:540})
FOREACH (sub IN less_than_rows |
MERGE (sub)-[r:REL2]-(:Label2)
DELETE r
MERGE (sub)-[r2:APPEND_TO]->(l2)
SET r2.s = 0, sub.prop3 = 1, sub.prop2 = 1484764200)
WITH DISTINCT l2, other_rows, count
FOREACH (sub IN other_rows |
MERGE (sub)-[r3:APPEND_TO]->(l2)
SET r3.s = 0, sub.prop3 = sub.prop3+1)
RETURN count;
If you only intend to set the s value to 0 when the APPEND_TO relationship is being created, then use the ON CREATE clause instead of SET:
PROFILE
MATCH (n:Label1)-[:REL1]-(a:Label2)
WHERE a.prop1 = 2
WITH COLLECT(n) AS rows
WITH
[a IN rows WHERE a.prop2 < 1484764200] AS less_than_rows,
[b IN rows WHERE b.prop2 = 1484764200 AND b.prop3 < 2] AS other_rows
WITH size(less_than_rows) + size(other_rows) AS count, less_than_rows, other_rows
MERGE(l2:Label2 {id:540})
FOREACH (sub IN less_than_rows |
MERGE (sub)-[r:REL2]-(:Label2)
DELETE r
MERGE (sub)-[r2:APPEND_TO]->(l2)
ON CREATE SET r2.s = 0
SET sub.prop3 = 1, sub.prop2 = 1484764200)
WITH DISTINCT l2, other_rows, count
FOREACH (sub IN other_rows |
MERGE (sub)-[r3:APPEND_TO]->(l2)
ON CREATE SET r3.s = 0
SET sub.prop3 = sub.prop3+1)
RETURN count;
Instead of FOREACH, you can UNWIND the collection of rows and process those. You can also use OPTIONAL MATCH instead of MERGE, so you avoid the fallback creation behavior of MERGE when a match isn't found. See how this compares:
PROFILE
MATCH (n:Label1)-[:REL1]-(a:Label2)
WHERE a.prop1 = 2
WITH COLLECT(n) AS rows
WITH [a IN rows WHERE a.prop2 < 1484764200] AS less_than_rows,
[b IN rows WHERE b.prop2 = 1484764200 AND b.prop3 < 2] AS other_rows
WITH size(less_than_rows) + size(other_rows) AS count, less_than_rows, other_rows
// faster to do it here, only 1 row so it executes once
MERGE(l2:Label2{id:540})
UNWIND less_than_rows as sub
OPTIONAL MATCH (sub)-[r:REL2]-(:Label2)
DELETE r
MERGE (sub)-[:APPEND_TO {s:0}]->(l2)
SET sub.prop3=1, sub.prop2=1484764200
WITH DISTINCT other_rows, count, l2
UNWIND other_rows as sub
MERGE (sub)-[:APPEND_TO {s:0}]->(l2)
SET sub.prop3=sub.prop3+1
RETURN count
I'm using Pig 0.10.0. I want to merge bags in a FOREACH. Let's say I have the following visitors alias:
(a, b, {1, 2, 3, 4}),
(a, d, {1, 3, 6}),
(a, e, {7}),
(z, b, {1, 2, 3})
I want to group the tuples on the first field and merge the bags with set semantics to get the following tuples:
({1, 2, 3, 4, 6, 7}, a, 6)
({1, 2, 3}, z, 3)
The first field is the union of the bags with set semantics. The second field of the tuple is the group field. The third field is the number of items in the bag.
I tried several variations of the following code (replacing SetUnion with GROUP/DISTINCT, etc.) but always failed to achieve the desired behavior:
DEFINE SetUnion datafu.pig.bags.sets.SetUnion();
grouped = GROUP visitors by (FirstField);
merged = FOREACH grouped {
VU = SetUnion(visitors.ThirdField);
GENERATE
VU as Vu,
group as FirstField,
COUNT(VU) as Cnt;
}
dump merged;
Can you explain where I'm wrong and how to implement the desired behavior?
I finally managed to achieve the desired behavior. A self-contained example of my solution follows:
Data file:
a b 1
a b 2
a b 3
a b 4
a d 1
a b 3
a b 6
a e 7
z b 1
z b 2
z b 3
Code:
-- Prepare data
in = LOAD 'data' USING PigStorage()
AS (One:chararray, Two:chararray, Id:long);
grp = GROUP in by (One, Two);
cnt = FOREACH grp {
ids = DISTINCT in.Id;
GENERATE
ids as Ids,
group.One as One,
group.Two as Two,
COUNT(ids) as Count;
}
-- Interesting code follows
grp2 = GROUP cnt by One;
cnt2 = FOREACH grp2 {
ids = FOREACH cnt.Ids generate FLATTEN($0);
GENERATE
ids as Ids,
group as One,
COUNT(ids) as Count;
}
describe cnt2;
dump grp2;
dump cnt2;
Describe:
Cnt: {Ids: {(Id: long)},One: chararray,Two: chararray,Count: long}
grp2:
(a,{({(1),(2),(3),(4),(6)},a,b,5),({(1)},a,d,1),({(7)},a,e,1)})
(z,{({(1),(2),(3)},z,b,3)})
cnt2:
({(1),(2),(3),(4),(6),(1),(7)},a,7)
({(1),(2),(3)},z,3)
Since the code uses a FOREACH nested inside a FOREACH, it requires Pig 0.10.0 or later.
I will leave the question unresolved for a few days, since a cleaner solution probably exists.
Found a simpler solution for this.
current_input = load '/idn/home/ksing143/tuple_related_data/tough_grouping.txt' USING PigStorage() AS (col1:chararray, col2:chararray, col3:int);
/* But we do not need column 2. Hence eliminating to avoid confusion */
relevant_input = foreach current_input generate col1, col3;
relevant_distinct = DISTINCT relevant_input;
relevant_grouped = group relevant_distinct by col1;
/* This will give */
(a,{(a,1),(a,2),(a,3),(a,4),(a,6),(a,7)})
(z,{(z,1),(z,2),(z,3)})
relevant_grouped_advance = foreach relevant_grouped generate (relevant_distinct.col3) as col3, group, COUNT(relevant_distinct.col3) as count_val;
/* This will give desired result */
({(1),(2),(3),(4),(6),(7)},a,6)
({(1),(2),(3)},z,3)
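For reference, the desired group/union/count transformation is easy to sanity-check outside Pig; here is a small Python sketch over the same sample data (column 2 dropped, as in the answer above):

from collections import defaultdict

# (col1, col3) pairs mirroring the data file, with col2 dropped
rows = [("a", 1), ("a", 2), ("a", 3), ("a", 4), ("a", 1),
        ("a", 3), ("a", 6), ("a", 7), ("z", 1), ("z", 2), ("z", 3)]

groups = defaultdict(set)
for key, value in rows:
    groups[key].add(value)  # set semantics: duplicates collapse

for key, values in sorted(groups.items()):
    print(sorted(values), key, len(values))
# [1, 2, 3, 4, 6, 7] a 6
# [1, 2, 3] z 3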