ArangoDB 3.2 traversal: exclude edge collection - filter

I am doing an AQL traversal with ArangoDB 3.2 in which I retrieve the nodes connected to my vertexCollection like this:
For v, e, p IN 1..10 ANY vertexCollection GRAPH myGraph OPTIONS {uniqueVertices: "global", bfs:true}
RETURN v._id
Now I want to skip the nodes on paths where a particular edge collection is used. I know I can filter on particular attributes in lists, like FILTER p.edges[*].type ALL == 'whatever', but I cannot find how to combine this with IS_SAME_COLLECTION() to filter by collection.
I discard the option of specifying exactly the edgeCollection in the traversal instead of the GRAPH because it's just one particular edgeCollection that I want to skip vs. many that I want to go through.
I don't know whether there is already an implementation for 'skip edge collection' or something like that in a graph traversal, so far I could not find it.
Note:
I tried to filter like this
For v, e, p IN 1..10 ANY vertexCollection GRAPH myGraph OPTIONS {uniqueVertices: "global", bfs:true}
FILTER NOT IS_SAME_COLLECTION('edgeToSkip', e._id)
RETURN v._id
But here I simply skip the nodes directly connected with edge 'edgeToSkip' but not all nodes within the path where 'edgeToSkip' is present. So I need, not only to exclude that particular edge, but stop traversing when it is found.
Thanks
UPDATE:
I found a workaround: I gather all edges present in a path and then filter the path out if the edge collection I want to skip appears in it. Note that I changed from uniqueVertices: "global" to uniqueVertices: "path".
For v, e, p IN 1..10 ANY vertexCollection GRAPH myGraph OPTIONS {uniqueVertices: "path", bfs:true}
// collect edge names (collection name) in the current path
LET ids = (
FOR edge IN p.edges
RETURN PARSE_IDENTIFIER(edge)["collection"]
)
// filter out if edge name (edgeToSkip) is present
FILTER 'edgeToSkip' NOT IN ids
RETURN v._id
This way, once edgeToSkip appears in a path, no vertex beyond it on that path is returned, while the vertices reached before edgeToSkip still are.
If the graph is like this:
vertexA --edge1--> vertexB --edge2--> vertexC --edgeToSkip--> vertexD --edge3--> vertexE
Will return:
vertexA, vertexB and vertexC (but not vertexD and vertexE)
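
To make the intended result concrete, here is a minimal sketch in plain Python (not ArangoDB code) of a breadth-first search that never crosses an edge from the skipped collection. The edge list, the function name `reachable_without` and the choice to include the start vertex in the result are assumptions made for illustration only.

```python
from collections import deque

# Hypothetical edge list modelled on the example graph: (from, to, edge collection).
EDGES = [
    ("vertexA", "vertexB", "edge1"),
    ("vertexB", "vertexC", "edge2"),
    ("vertexC", "vertexD", "edgeToSkip"),
    ("vertexD", "vertexE", "edge3"),
]


def reachable_without(edges, start, skipped_collection):
    """Breadth-first search that never crosses an edge of skipped_collection."""
    neighbours = {}
    for src, dst, collection in edges:
        if collection == skipped_collection:
            continue  # pruning: this edge is never followed
        # Edges are followed in both directions, like ANY in the traversal.
        neighbours.setdefault(src, []).append(dst)
        neighbours.setdefault(dst, []).append(src)

    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, []):
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


print(reachable_without(EDGES, "vertexA", "edgeToSkip"))
# ['vertexA', 'vertexB', 'vertexC']
```

Because the skipped edges are dropped before the search starts, nothing behind them is ever visited, which is the "stop traversing" behaviour the question asks for.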

Related

Neo4j - Performance: Find Nodes with a certain label with a relation to the source node

I have the following requirements:
Given a source node
Find all nodes within a certain range (e.g. 4 hops)
The destination nodes have a special label "x"
Restrict the label type of the nodes in the path to the destination
Return only the shortest path length (e.g. if I find a node at 2 hops, don't also find nodes at 3 or 4 hops)
Return all nodes needed for showing the destination nodes and the path between
I managed to create a query, but the performance is not good. I assume it is because of the high number of nodes with the label "x".
MATCH path = allShortestPaths((source)-[*..4]-(destination))
WHERE source.objectID IN ['001614914']
AND source:Y
AND destination:X
AND ALL(x IN nodes(path)[1..] WHERE any(l in labels(x) WHERE l in ['A', 'B', 'C']))
WITH path
LIMIT 1000
WITH COLLECT(path) AS paths, MIN(length(path)) AS minLength
WITH FILTER(p IN paths WHERE length(p)= minLength) AS pathList
LIMIT 25
UNWIND pathList as path
WITH [n in nodes(path)] as nodes
return nodes
Profile:
If I change the query to not use the shortest path function, it works well when the source does not have a lot of outgoing paths:
MATCH path = ((source)-[*..4]-(destination))
WHERE source.objectID IN ['001614914']
AND source:Y
AND destination:X
AND ALL(x IN nodes(path)[1..] WHERE any(l in labels(x) WHERE l in ['A', 'B', 'C']))
WITH path
LIMIT 1000
WITH COLLECT(path) AS paths, MIN(length(path)) AS minLength
WITH FILTER(p IN paths WHERE length(p)= minLength) AS pathList
LIMIT 25
UNWIND pathList as path
WITH [n in nodes(path)] as nodes
return nodes
Profile:
But if I have a source node with many children, this also performs badly.
At the moment I am wondering whether it would be better to start with a simple search for all destinations and then call a shortest path function on each destination found, but I am not sure.
e.g.
MATCH (source)-[*..4]-(destination)
WHERE source.objectID IN ['001614914']
AND source:Y
AND destination:X
WITH destination
LIMIT 100
call apoc (shortest path ...)
...
Or is there a better approach?
You may want to try APOC's path expander using 'NODE_GLOBAL' uniqueness, it typically performs better than a variable-length match. It also has a means for whitelisting nodes during traversal, but this does apply to the start node too, so we'd have to include :Y in the whitelist.
See if this works for you:
MATCH (source:Y)
WHERE source.objectID IN ['001614914']
CALL apoc.path.expandConfig(source, {labelFilter:'+A|B|C|Y', maxLevel:4, uniqueness:'NODE_GLOBAL'}) YIELD path
WITH path, last(nodes(path)) as destination
WHERE destination:X AND NONE(node in TAIL(nodes(path)) WHERE node:Y)
// all the rest is the same as your old query
WITH path
LIMIT 1000
WITH COLLECT(path) AS paths, MIN(length(path)) AS minLength
WITH FILTER(p IN paths WHERE length(p)= minLength) AS pathList
LIMIT 25
UNWIND pathList as path
RETURN NODES(path) as nodes
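
As a rough illustration of why a globally unique, label-restricted expansion stays cheap, here is a plain Python sketch (not APOC and not Cypher). The toy graph, the label sets and the function names are hypothetical. Unlike the APOC whitelist described above, this sketch exempts the start node from the label check.

```python
from collections import deque


def expand(adjacency, labels, source, allowed, max_level):
    """BFS from source that visits every node at most once (global uniqueness)
    and only steps onto nodes carrying at least one allowed label."""
    paths = {source: [source]}  # node -> first (and therefore shortest) path found
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if len(paths[node]) - 1 >= max_level:
            continue  # hop limit reached, do not expand this node
        for nxt in adjacency.get(node, ()):
            if nxt in paths or not (labels[nxt] & allowed):
                continue
            paths[nxt] = paths[node] + [nxt]
            queue.append(nxt)
    return paths


def nearest_with_label(adjacency, labels, source, allowed, target, max_level=4):
    """Return the shortest paths from source to nodes carrying the target label."""
    paths = expand(adjacency, labels, source, allowed, max_level)
    hits = [p for n, p in paths.items() if n != source and target in labels[n]]
    if not hits:
        return []
    best = min(len(p) for p in hits)
    return [p for p in hits if len(p) == best]


def undirected(pairs):
    """Build an adjacency dict that can be walked in both directions."""
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


# Hypothetical data: "s" is the :Y source, d1/d2/d3 carry :X, "z" has a forbidden label.
ADJ = undirected([("s", "n1"), ("n1", "d1"), ("s", "n2"), ("n2", "n3"),
                  ("n3", "d2"), ("s", "z"), ("z", "d3")])
LABELS = {"s": {"Y"}, "n1": {"A"}, "d1": {"X", "B"}, "n2": {"B"}, "n3": {"C"},
          "d2": {"X", "A"}, "z": {"Q"}, "d3": {"X", "A"}}

print(nearest_with_label(ADJ, LABELS, "s", {"A", "B", "C"}, "X"))
# [['s', 'n1', 'd1']]
```

Each node is enqueued at most once, so the work grows with the number of reachable nodes rather than with the number of distinct paths, which is the property the variable-length match lacks.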

How to fetch a subgraph of first neighbors in neo4j?

I fetch first n neighbors of a node with this query in neo4j:
(in this example, n = 6)
I have a weighted graph, and so I also order the results by weight:
START start_node=node(1859988)
MATCH start_node-[rel]-(neighbor)
RETURN DISTINCT neighbor,
rel.weight AS weight ORDER BY weight DESC LIMIT 6;
I would like to fetch a whole subgraph, including second neighbors (the first neighbors of the first six children).
I tried something like:
START start_node=node(1859988)
MATCH start_node-[rel]-(neighbor)
FOREACH (neighbor | MATCH neighbor-[rel2]-(neighbor2) )
RETURN DISTINCT neighbor1, neighbor2, rel.proximity AS proximity ORDER BY proximity DESC LIMIT 6, rel2.proximity AS proximity ORDER BY proximity DESC LIMIT 6;
the syntax is still wrong but I am also uncertain about the output:
I would like to have a table of tuples parent, children and weight:
[node_A - node_B - weight]
I would like to see if it is performing better one query or six queries.
Can someone help in clarifying how to iterate a query (FOREACH) and format the output?
thank you!
Ok, I think I understand. Here's another attempt based on your comment:
MATCH (start_node)-[rel]-(neighbor)
WHERE ID(start_node) IN {source_ids}
WITH
neighbor, rel
ORDER BY rel.proximity
WITH
collect({neighbor: neighbor, rel: rel})[0..6] AS neighbors_and_rels
UNWIND neighbors_and_rels AS neighbor_and_rel
WITH
neighbor_and_rel.neighbor AS neighbor,
neighbor_and_rel.rel AS rel
MATCH (neighbor)-[rel2]-(neighbor2)
WITH
neighbor,
rel,
neighbor2,
rel2
ORDER BY rel2.proximity
WITH
neighbor,
rel,
collect([neighbor2, rel2])[0..6] AS neighbors_and_rels2
UNWIND neighbors_and_rels2 AS neighbor_and_rel2
RETURN
neighbor,
rel,
neighbor_and_rel2[0] AS neighbor2,
neighbor_and_rel2[1] AS rel2
It's a bit long, but hopefully it gives you the idea at least
First you should avoid using START as it will (hopefully) eventually go away.
So to get a neighborhood you could use variable length paths to get all of the paths away from the node
MATCH path=(start_node)-[rel*1..3]-(neighbor)
WHERE ID(start_node) = 1859988
RETURN path, nodes(path) AS nodes, EXTRACT(rel IN rels(path) | rel.weight) AS weights;
Then you can take the path / nodes and combine them in memory with your language of choice.
EDIT:
Also take a look at this SO Question: Fetch a tree with Neo4j
It shows how to get the output as a set of start/end nodes for each of the relationships which can be nicer in many cases.
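
If the paths are combined in application code, the requested table of [parent - child - weight] tuples could be built along these lines. This is only a sketch in Python over a hypothetical in-memory weighted adjacency map; the graph data and the function names `top_neighbours` and `subgraph_rows` are assumptions, not part of any Neo4j API.

```python
def top_neighbours(graph, node, n, exclude=()):
    """Return the n heaviest (neighbour, weight) pairs of node, skipping excluded nodes."""
    candidates = [(nb, w) for nb, w in graph.get(node, {}).items() if nb not in exclude]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:n]


def subgraph_rows(graph, start, n):
    """List (parent, child, weight) rows for the first and second neighbours of start."""
    rows = []
    for child, weight in top_neighbours(graph, start, n):
        rows.append((start, child, weight))
        # Second level: the child's own best neighbours, not walking back to start.
        for grandchild, weight2 in top_neighbours(graph, child, n, exclude=(start,)):
            rows.append((child, grandchild, weight2))
    return rows


# Hypothetical undirected weighted graph: node -> {neighbour: weight}.
GRAPH = {
    "s": {"a": 0.9, "b": 0.5, "c": 0.1},
    "a": {"s": 0.9, "d": 0.7, "e": 0.2},
    "b": {"s": 0.5, "f": 0.4},
    "c": {"s": 0.1},
}

for row in subgraph_rows(GRAPH, "s", 2):
    print(row)
# ('s', 'a', 0.9)
# ('a', 'd', 0.7)
# ('a', 'e', 0.2)
# ('s', 'b', 0.5)
# ('b', 'f', 0.4)
```

The limit is applied per parent, which mirrors what the collect(...)[0..6] slices in the Cypher answer are meant to do.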

Create an addEdge() Gremlin query that won't duplicate for Titan

Is there a way to create a unique edge between two vertices on a Titan graph and confirm it can't be created again, unless deleted and recreated?
Basically I need to create:
vertex1--follows-->vertex2
But I keep creating multiple edges for the same relationship:
vertex1--follows-->vertex2
vertex1--follows-->vertex2
vertex1--follows-->vertex2
vertex1--follows-->vertex2
My basic addEdge query is this:
def follow(target)
grem = "g.addEdge(
g.V('id', '#{id}').next(),
g.V('id', '#{target.id}').next(),
'follows',
[since:#{Time.now.year}]
)"
$graph.execute(grem).results
end
What I am trying to find is something like this
def follow(target)
grem = "g.addEdge(
g.V('id', '#{id}').next(),
g.V('id', '#{target.id}').next(),
'follows',
[since:#{Time.now.year}]
).unique(Direction.OUT)"
$graph.execute(grem).results
end
In this document there is a method called unique, but I cannot seem to get this to work on edges, only properties of vertices.
https://github.com/thinkaurelius/titan/wiki/Type-Definition-Overview
I could run a query before the create addEdge to check for an existing edge, but that seems hacky and could cause issues with a race condition.
Is it possible a method exists which I can append to addEdge which will prevent creating a duplicate edge if an edge already exists?
Or, is there a way to create a unique property label on an edge?
Here is a gremlin session of the issue:
gremlin> g.makeType().name('follows').unique(IN).makeEdgeLabel();
==>v[36028797018964558]
gremlin> u = g.addVertex([name:'brett'])
==>v[120004]
gremlin> u2 = g.addVertex([name:'brettU'])
==>v[120008]
gremlin> e = g.addEdge(u, u2, 'follows')
==>e[2w5N-vdy-2F0LaTPQK2][120004-follows->120008]
gremlin> e = g.addEdge(u, u2, 'follows')
An edge with the given type already exists on the in-vertex
Display stack trace? [yN]
gremlin> e = g.addEdge(u2, u, 'follows')
==>e[2w5P-vdC-2F0LaTPQK2][120008-follows->120004]
gremlin> u3 = g.addVertex([name:'brett3'])
==>v[120012]
gremlin> e = g.addEdge(u3, u, 'follows')
An edge with the given type already exists on the in-vertex
Display stack trace? [yN] N
gremlin> g.E
==>e[2w5N-vdy-2F0LaTPQK2][120004-follows->120008]
==>e[2w5P-vdC-2F0LaTPQK2][120008-follows->120004]
Setting up the unique(IN|BOTH|OUT) creates an issue where we can only have one follower per user. This of course would make it impossible to have a user -> follows -> [users] relationship.
Here is another example of trying to set a unique property on an edge, this fails also:
gremlin> g.makeType().name('follows_id').unique(BOTH).makeEdgeLabel();
==>v[36028797018964942]
gremlin> u = g.addVertex([name:'brett'])
==>v[200004]
gremlin> u2 = g.addVertex([name:'brett2'])
==>v[200008]
gremlin> u3 = g.addVertex([name:'brett3'])
==>v[200012]
gremlin> e = g.addEdge(u, u2, 'follows', [follows_id:'200004-20008'])
Value must be a vertex
Display stack trace? [yN] N
gremlin> g.E
==>e[4c9z-Q1S-2F0LaTPQQu][200004-follows->200008]
gremlin> e = g.addEdge(u, u2, 'follows', [follows_id:'200004-20008'])
Value must be a vertex
Display stack trace? [yN] N
gremlin> g.E
==>e[4c9z-Q1S-2F0LaTPQQu][200004-follows->200008]
==>e[4c9B-Q1S-2F0LaTPQQu][200004-follows->200008]
To close the loop here, this question was answered in the Aurelius Graphs Mailing List. Basically:
we don't really see a use case for a uniqueness constraints to apply
to pairs of vertices (a la - only one edge can exist between vertex A
and B) for these reasons:
most times, you can get rid of the duplication quite cheaply on the query side with a dedup(): v.out('follows').dedup().....
the likelihood of conflict is much lower (due to the N^2 combinations of vertices) which makes locks just waaaay to expensive
compared to the likelihood of conflict.
In short, you should validate edge existence in your application as it cannot be enforced by Titan.
This prevents duplication in application code versus a DB configuration and solves the issue we were having.
grem = "
if(g.V('uid', '#{id}').out('follows').has('id', g.V('uid', '#{target.id}').next().id).hasNext() == true){
println 'already connected'
} else{
g.addEdge(
g.V('uid', '#{id}').next(),
g.V('uid', '#{target.id}').next(),
'follows',
[since:(new java.util.Date()).getTime()]
)
}"
$graph.execute(grem).results
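
The check-then-create pattern above can be sketched independently of Titan. The following Python example is a toy in-memory model, not Titan or Gremlin code; the class `FollowGraph` and its method names are hypothetical. It uses a lock to make the check and the insert one atomic step, which is an addition of this sketch: the Gremlin script above has no such guard, so the race condition mentioned in the question still applies to it.

```python
import threading


class FollowGraph:
    """Toy edge store that allows at most one 'follows' edge per ordered vertex pair."""

    def __init__(self):
        self._edges = {}
        self._lock = threading.Lock()

    def follow(self, source_id, target_id, since):
        """Create the edge unless it exists; return True only if it was created."""
        key = (source_id, "follows", target_id)
        with self._lock:  # check and insert happen as one step
            if key in self._edges:
                return False
            self._edges[key] = {"since": since}
            return True

    def edge_count(self):
        return len(self._edges)


graph = FollowGraph()
print(graph.follow("brett", "brettU", 2013))   # True, edge created
print(graph.follow("brett", "brettU", 2013))   # False, already connected
print(graph.follow("brettU", "brett", 2013))   # True, opposite direction is a different edge
print(graph.follow("brett3", "brettU", 2013))  # True, a user can have many followers
print(graph.edge_count())                      # 3
```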

Create groups from sets of nodes

I have a list of sets (a,b,c,d,e in below example). Each of the sets contains a list of nodes in that set (1-6 below). I was wondering that there probably is a general known algorithm for achieving the below, and I just do not know about it.
sets[
a[1,2,5,6],
b[1,4,5],
c[1,2,5],
d[2,5],
e[1,6],
]
I would like to generate a new structure, a list of groups, with each group having
all the (sub)sets of nodes that appear in multiple sets
references to the original sets those nodes belong to
So the above data would become (order of groups irrelevant).
group1{nodes[2,5],sets[a,c,d]}
group2{nodes[1,2,5],sets[a,c]}
group3{nodes[1,6],sets[a,e]}
group4{nodes[1,5],sets[a,b,c]}
I am assuming I can get the data in as an array/object structure and manipulate that, and then spit the resulting structure out in whatever format needed.
It would be a plus if:
all groups had a minimum of 2 nodes and 2 sets.
when a subset of nodes is contained in a bigger set that forms a group, then only the bigger set gets a group: in this example, nodes 1,2 do not have a group of their own since all the sets they have in common already appear in group2.
(The sets are stored in XML, which I have also managed to convert to JSON so far, but this is irrelevant. I can understand procedural (pseudo)code but also something like a skeleton in XSLT or Scala could help to get started, I guess.)
Go through the list of sets. For each set S
Go through the list of groups. For each group G
If S can be a member of G (i.e. if G's set is a subset of S), add S to G.
If S cannot be a member of G but the intersection of S and G's set contains more than one node, make a new group for that intersection and add it to the list.
Give S a group of its own and add it to the list.
Combine any groups that have the same set.
Delete any group with only one member set.
For example, with your example sets, after reading a and b the list of groups is
[1,2,5,6] [a]
[1,5] [a,b]
[1,4,5] [b]
And after reading c it's
[1,2,5,6] [a]
[1,5] [a,b,c]
[1,4,5] [b]
[1,2,5] [a,c]
There are slightly more efficient algorithms, if speed is a problem.
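
One way to reach the same kind of result in runnable form is sketched below in Python. It is not the incremental procedure described above: it swaps in a simpler technique, namely closing the node sets under intersection and then looking up which original sets contain each candidate. The function name `build_groups` and the dictionary input format are assumptions.

```python
def build_groups(sets):
    """Map each shared node set (>= 2 nodes) to the names of the sets containing it (>= 2 sets)."""
    node_sets = {name: frozenset(nodes) for name, nodes in sets.items()}

    # Close the family of node sets under intersection, keeping candidates with >= 2 nodes.
    candidates = set(node_sets.values())
    frontier = set(candidates)
    while frontier:
        fresh = set()
        for candidate in frontier:
            for nodes in node_sets.values():
                common = candidate & nodes
                if len(common) >= 2 and common not in candidates:
                    fresh.add(common)
        candidates |= fresh
        frontier = fresh

    groups = {}
    for candidate in candidates:
        if len(candidate) < 2:
            continue
        members = sorted(name for name, nodes in node_sets.items() if candidate <= nodes)
        if len(members) >= 2:  # a group needs at least two member sets
            groups[tuple(sorted(candidate))] = members
    return groups


SETS = {"a": [1, 2, 5, 6], "b": [1, 4, 5], "c": [1, 2, 5], "d": [2, 5], "e": [1, 6]}

for nodes, members in sorted(build_groups(SETS).items()):
    print(nodes, members)
# (1, 2, 5) ['a', 'c']
# (1, 5) ['a', 'b', 'c']
# (1, 6) ['a', 'e']
# (2, 5) ['a', 'c', 'd']
```

Every candidate is an intersection of original sets, so two different candidates can never end up with the same list of member sets. That is why nodes 1,2 get no group of their own here: the sets that share them (a and c) already share the larger set 1,2,5.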
/*
Pseudocode algorithm for creating groups data from a set dataset, further explained in the project documentation. This is based on
http://stackoverflow.com/questions/1644387/create-groups-from-sets-of-nodes
I am assuming
- Group is a structure (class) the objects of which contain two lists: a list of sets and a list of nodes (group.nodes). Its constructor accepts a list of nodes and a reference to a Set object
- Set is a list structure (class), the objects (set) of which contain the nodes of the list in set.nodes
- groups and sets are both list structures that can contain arbitrary objects which can be iterated with foreach().
- you can get the objects two lists have in common as a new list with intersection()
- you can count the number of objects in a list with length()
*/
//Create groups, going through the original sets
foreach(sets as set){
if(groups.length()==0){
groups.addGroup(new Group(set.nodes, set));
}
else{
foreach (groups as group){
if(group.nodes.length() == intersection(group.nodes,set.nodes).length()){
// the group is a subset of the set, so just add the set as a member the group
group.addset(set);
if (group.nodes.length() < set.nodes.length()){
// if the set has more nodes than the group that already exists,
// create a new group for the nodes of the set, with set as a member of that group
groups.addGroup(new Group(set.nodes, set));
}
}
// If group is not a subset of set, and the intersection of the nodes of the group
// and the nodes of the set
// is greater than one (they have more than one person in common), create a new group with
// those nodes they have in common, with set as a member of that group
else if(group.nodes.length() > intersection(group.nodes,set.nodes).length()
&& intersection(group.nodes,set.nodes).length()>1){
groups.addGroup(new Group(intersection(group.nodes,set.nodes), set));
}
}
}
}
// Cleanup time!
foreach(groups as group){
//delete any group with only one member set (for it is not really a group then)
if (group.sets.length<2){
groups.remove(group);
}
// combine any groups that have the same set of nodes. Is this really needed?
foreach(groups as group2){
//skip the group itself; two distinct groups have the same nodes when they are the
//same size and their intersection has that size too.
if (group2 != group
&& group.nodes.length() == group2.nodes.length()
&& intersection(group.nodes,group2.nodes).length() == group.nodes.length()){
foreach(group2.sets as set2){
if(!group.hasset(set2)){
group.addset(set2);
}
}
groups.remove(group2);
}
}
}

Calculating paths in a graph

I have to write a method that builds a list of all the paths in a graph. My graph has only one start node and one finish node. Each node has a list with its children and another list with its parents. I have to build another list containing all the paths (each of them in its own list).
Any suggestion??
It depends on whether it is acyclic or not. Clearly a cycle will result in infinitely many paths (once round the loop, twice round, 3 times round, etc.). If the graph is acyclic then you should be able to do a depth-first search (DFS) (http://en.wikipedia.org/wiki/Depth-first_search) and simply count the number of times you encounter the destination node.
First familiarize yourself with basic graph algorithms (try a textbook, or google). Figure out which one best suits the problem you are solving, and implement it. You may need to adapt the algorithm a little, but in general there are widely known algorithms for all basic graph problems.
If you have a GraphNode class that looks something like this:
public class GraphNode
{
public IEnumerable<GraphNode> Children { get; set; }
// ...
}
Then this should do the work:
using System.Collections.Generic;
using System.Linq;

public static class GraphPathFinder
{
    public static IEnumerable<IEnumerable<GraphNode>> FindAllPathsTo(this GraphNode startNode, GraphNode endNode)
    {
        List<IEnumerable<GraphNode>> results = new List<IEnumerable<GraphNode>>();
        Stack<GraphNode> currentPath = new Stack<GraphNode>();
        currentPath.Push(startNode);
        FindAllPathsRecursive(endNode, currentPath, results);
        return results;
    }

    private static void FindAllPathsRecursive(GraphNode endNode, Stack<GraphNode> currentPath, List<IEnumerable<GraphNode>> results)
    {
        // A Stack enumerates from the top down, so reverse it to store the path from start to end.
        if (currentPath.Peek() == endNode) results.Add(currentPath.Reverse().ToList());
        else
        {
            // Skip children already on the current path so that cycles are not followed.
            foreach (GraphNode node in currentPath.Peek().Children.Where(p => !currentPath.Contains(p)))
            {
                currentPath.Push(node);
                // Pass the same results list down; a fresh list here would discard every path found.
                FindAllPathsRecursive(endNode, currentPath, results);
                currentPath.Pop();
            }
        }
    }
}
It's a simple implementation of the DFS algorithm. No error checking, optimizations, thread-safety etc...
Also, if you are sure that your graph does not contain cycles, you may remove the Where clause in the foreach statement in the last method.
Hope this helped.
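
For comparison, the same depth-first idea can be sketched in a few lines of Python over a plain dictionary of children. The dictionary layout and the function name `all_paths` are assumptions made for this illustration.

```python
def all_paths(children, start, end):
    """Return every simple path from start to end as a list of nodes."""
    paths = []
    current = [start]

    def visit(node):
        if node == end:
            paths.append(list(current))  # copy, because current keeps changing
            return
        for child in children.get(node, []):
            if child in current:
                continue  # already on this path, so following it would loop
            current.append(child)
            visit(child)
            current.pop()

    visit(start)
    return paths


# Hypothetical graph: node -> list of children.
CHILDREN = {"s": ["a", "b"], "a": ["b", "t"], "b": ["t"]}

print(all_paths(CHILDREN, "s", "t"))
# [['s', 'a', 'b', 't'], ['s', 'a', 't'], ['s', 'b', 't']]
```

The `child in current` check plays the role of the Where clause in the C# version: it keeps each path simple, so the search also terminates on graphs that contain cycles.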
You could generate every possible combination of vertices (using combinatorics) and filter out the paths that don't exist (where the vertices aren't joined by an edge or the edge has the wrong direction on it).
You can improve on this basic idea by having the code that generates the combinations check what remaining vertices are available from the current vertex.
This is all assuming you have acyclic graphs and wish to visit each vertex exactly once.
