Create an addEdge() Gremlin query that won't duplicate for Titan - ruby

Is there a way to create a unique edge between two vertices on a Titan graph and ensure it can't be created again unless it is deleted and recreated?
Basically I need to create:
vertex1--follows-->vertex2
But I keep creating multiple edges for the same relationship:
vertex1--follows-->vertex2
vertex1--follows-->vertex2
vertex1--follows-->vertex2
vertex1--follows-->vertex2
My basic addEdge query is this:
def follow(target)
grem = "g.addEdge(
g.V('id', '#{id}').next(),
g.V('id', '#{target.id}').next(),
'follows',
[since:#{Time.now.year}]
)"
$graph.execute(grem).results
end
What I am looking for is something like this:
def follow(target)
grem = "g.addEdge(
g.V('id', '#{id}').next(),
g.V('id', '#{target.id}').next(),
'follows',
[since:#{Time.now.year}]
).unique(Direction.OUT)"
$graph.execute(grem).results
end
The Titan type-definition documentation linked below describes a method called unique, but I can only get it to work on vertex properties, not on edges.
https://github.com/thinkaurelius/titan/wiki/Type-Definition-Overview
I could run a query before addEdge to check for an existing edge, but that seems hacky and could cause race-condition issues.
Is there a method I can append to addEdge that prevents creating a duplicate edge if one already exists?
Or, is there a way to create a unique property label on an edge?
Here is a Gremlin session showing the issue:
gremlin> g.makeType().name('follows').unique(IN).makeEdgeLabel();
==>v[36028797018964558]
gremlin> u = g.addVertex([name:'brett'])
==>v[120004]
gremlin> u2 = g.addVertex([name:'brettU'])
==>v[120008]
gremlin> e = g.addEdge(u, u2, 'follows')
==>e[2w5N-vdy-2F0LaTPQK2][120004-follows->120008]
gremlin> e = g.addEdge(u, u2, 'follows')
An edge with the given type already exists on the in-vertex
Display stack trace? [yN]
gremlin> e = g.addEdge(u2, u, 'follows')
==>e[2w5P-vdC-2F0LaTPQK2][120008-follows->120004]
gremlin> u3 = g.addVertex([name:'brett3'])
==>v[120012]
gremlin> e = g.addEdge(u3, u, 'follows')
An edge with the given type already exists on the in-vertex
Display stack trace? [yN] N
gremlin> g.E
==>e[2w5N-vdy-2F0LaTPQK2][120004-follows->120008]
==>e[2w5P-vdC-2F0LaTPQK2][120008-follows->120004]
Setting unique(IN|BOTH|OUT) limits the number of 'follows' edges per vertex rather than per pair of vertices: with unique(IN), for example, each user can have only one follower. That makes a user -> follows -> [users] relationship impossible.
Here is another example, trying to set a unique property on an edge; this also fails:
gremlin> g.makeType().name('follows_id').unique(BOTH).makeEdgeLabel();
==>v[36028797018964942]
gremlin> u = g.addVertex([name:'brett'])
==>v[200004]
gremlin> u2 = g.addVertex([name:'brett2'])
==>v[200008]
gremlin> u3 = g.addVertex([name:'brett3'])
==>v[200012]
gremlin> e = g.addEdge(u, u2, 'follows', [follows_id:'200004-20008'])
Value must be a vertex
Display stack trace? [yN] N
gremlin> g.E
==>e[4c9z-Q1S-2F0LaTPQQu][200004-follows->200008]
gremlin> e = g.addEdge(u, u2, 'follows', [follows_id:'200004-20008'])
Value must be a vertex
Display stack trace? [yN] N
gremlin> g.E
==>e[4c9z-Q1S-2F0LaTPQQu][200004-follows->200008]
==>e[4c9B-Q1S-2F0LaTPQQu][200004-follows->200008]

To close the loop here, this question was answered on the Aurelius Graphs mailing list. Basically:
we don't really see a use case for uniqueness constraints that apply to pairs of vertices (a la "only one edge can exist between vertex A and B") for these reasons:
1. most times, you can get rid of the duplication quite cheaply on the query side with a dedup(): v.out('follows').dedup().....
2. the likelihood of conflict is much lower (due to the N^2 combinations of vertices), which makes locks just waaaay too expensive compared to the likelihood of conflict.
In short, you should validate edge existence in your application as it cannot be enforced by Titan.
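To make the first reason concrete, here is a minimal sketch in plain Python (not Titan or Gremlin code): a hypothetical in-memory edge list that, like Titan without a pairwise constraint, can hold parallel `follows` edges, and a neighbor lookup that drops repeats the way `dedup()` does on the query side. The vertex names come from the session above; the functions are illustrative only.

```python
# Hypothetical in-memory edge list standing in for the graph; like Titan
# without a pairwise constraint, it can hold parallel 'follows' edges.
edges = [
    ("brett", "follows", "brettU"),
    ("brett", "follows", "brettU"),  # accidental duplicate
    ("brett", "follows", "brett3"),
    ("brettU", "follows", "brett"),
]


def out_neighbors(vertex, label):
    """Like v.out(label): one entry per outgoing edge, so duplicates repeat."""
    return [dst for src, lbl, dst in edges if src == vertex and lbl == label]


def out_neighbors_dedup(vertex, label):
    """Like v.out(label).dedup(): drop repeats, keep first-seen order."""
    return list(dict.fromkeys(out_neighbors(vertex, label)))


print(out_neighbors("brett", "follows"))        # ['brettU', 'brettU', 'brett3']
print(out_neighbors_dedup("brett", "follows"))  # ['brettU', 'brett3']
```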

This handles duplication in application code rather than through database configuration, and it solved the issue we were having:
grem = "
if(g.V('uid', '#{id}').out('follows').has('id', g.V('uid', '#{target.id}').next().id).hasNext() == true){
println 'already connected'
} else{
g.addEdge(
g.V('uid', '#{id}').next(),
g.V('uid', '#{target.id}').next(),
'follows',
[since:(new java.util.Date()).getTime()]
)
}"
$graph.execute(grem).results
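The script above checks and then creates in two steps, so the race condition raised in the question is still possible when two requests run at the same time. A minimal sketch of making check-and-create atomic, assuming a single application process and using a hypothetical in-memory `EdgeStore` class in place of the graph (it is not a Titan or Rexster API): the existence check and the insert run under one `threading.Lock`. It does nothing for other processes or servers writing to the same graph; that would need coordination outside this sketch.

```python
import threading
import time


class EdgeStore:
    """Hypothetical in-process stand-in for the graph; NOT a Titan client.

    The existence check and the insert run under one lock, so two threads in
    this process cannot both create the same (out_v, label, in_v) edge.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._edges = {}  # (out_v, label, in_v) -> edge properties

    def add_edge_once(self, out_v, in_v, label, **props):
        key = (out_v, label, in_v)
        with self._lock:
            if key in self._edges:
                return False  # corresponds to 'already connected'
            self._edges[key] = props
            return True

    def count(self):
        with self._lock:
            return len(self._edges)


store = EdgeStore()


def follow(follower, target):
    # `since` mirrors the millisecond timestamp used in the Gremlin script.
    return store.add_edge_once(follower, target, "follows",
                               since=int(time.time() * 1000))


# Eight concurrent attempts to create the same edge: only one succeeds.
results = []
threads = [threading.Thread(target=lambda: results.append(follow("brett", "brettU")))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.count(True), store.count())
```

Following in the other direction (`follow("brettU", "brett")`) is a different key, so it still succeeds, which keeps the user -> follows -> [users] shape possible.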


How to get level(depth) number of two connected nodes in neo4j

I'm using Neo4j as a graph database to store users' connection details. I want to show the level of one user with respect to another in their connections, like LinkedIn: first-level connection, second-level connection, third-level connection, and "3+" for anything beyond the third level. I don't know how to do this with Neo4j; I searched but couldn't find a solution. If anybody knows how, please help me implement this functionality.
To find the shortest "connection level" between 2 specific people, just get the shortest path and add 1:
MATCH path = shortestpath((p1:Person)-[*..]-(p2:Person))
WHERE p1.id = 1 AND p2.id = 2
RETURN LENGTH(path) + 1 AS level
NOTE: You may want to put a reasonable upper bound on the variable-length relationship pattern (e.g., [*..6]) to avoid having the query take too long or run out of memory in a large DB. You should probably ignore very distant connections anyway.
It would be something like this:
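For intuition about what the shortest-path query computes, here is a breadth-first search sketch in Python over a hypothetical adjacency map (`graph`, `hops_between` and the sample ids are illustrative). It returns the number of relationships on the shortest path, i.e. the value `LENGTH(path)` yields before the answer adds 1, and `max_hops` plays the role of the suggested `[*..6]` bound.

```python
from collections import deque

# Hypothetical undirected connections: person id -> set of person ids.
graph = {
    1: {2, 3},
    2: {1, 4},
    3: {1},
    4: {2, 5},
    5: {4},
}


def hops_between(graph, start, goal, max_hops=6):
    """Number of relationships on the shortest path, or None if goal is not
    reachable within max_hops (cf. the [*..6] bound)."""
    if start == goal:
        return 0
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_hops:
            continue  # do not expand past the bound
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


print(hops_between(graph, 1, 2))              # 1
print(hops_between(graph, 1, 5))              # 3
print(hops_between(graph, 1, 5, max_hops=2))  # None
```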
// get all persons (or users)
MATCH (p:Person)
// create a set of unique combinations , assuring that you do
// not do double work
WITH COLLECT(p) AS personList
UNWIND personList AS personA
UNWIND personList AS personB
WITH personA,personB
WHERE id(personA) < id(personB)
// find the shortest path between any two nodes
MATCH path=shortestPath( (personA)-[:LINKED_TO*]-(personB) )
// return the distance ( = path length) between the two nodes
RETURN personA.name AS nameA,
personB.name AS nameB,
CASE WHEN length(path) > 3 THEN '3+'
ELSE toString(length(path))
END AS distance
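The same table can be sketched in Python, assuming the connections are already loaded into a hypothetical adjacency map keyed by name: one breadth-first search per person gives the distances to everyone else, each pair is kept once (`a < b`, mirroring `id(personA) < id(personB)`), and distances above 3 are reported as `'3+'`. Pairs with no path are skipped, just as the `MATCH` yields no row for them.

```python
from collections import deque

# Hypothetical undirected connections: a chain Ann - Bob - Cid - Dan - Eve.
graph = {
    "Ann": {"Bob"},
    "Bob": {"Ann", "Cid"},
    "Cid": {"Bob", "Dan"},
    "Dan": {"Cid", "Eve"},
    "Eve": {"Dan"},
}


def bfs_distances(graph, source):
    """Hop count from source to every reachable person."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return dist


def connection_levels(graph):
    """(a, b, level) for each connected pair with a < b; levels above 3 become '3+'."""
    rows = []
    for a in sorted(graph):
        dist = bfs_distances(graph, a)
        for b in sorted(graph):
            if a < b and b in dist:
                d = dist[b]
                rows.append((a, b, "3+" if d > 3 else str(d)))
    return rows


for row in connection_levels(graph):
    print(row)
```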

ArangoDB 3.2 traversal: exclude edge collection

I am doing an AQL traversal with ArangoDB 3.2 in which I retrieve the nodes connected to my vertexCollection like this:
For v, e, p IN 1..10 ANY vertexCollection GRAPH myGraph OPTIONS {uniqueVertices: "global", bfs:true}
RETURN v._id
And now I want to skip the vertices reached through paths that use a particular edge collection. I know I can filter on particular attributes in lists, like FILTER p.edges[*].type ALL == 'whatever', but I cannot find how to apply this with IS_SAME_COLLECTION() to filter by collection.
I rule out specifying the edge collections explicitly in the traversal instead of GRAPH, because there is just one particular edge collection I want to skip versus many that I want to traverse.
I don't know whether there is already an implementation of "skip edge collection" or something similar for graph traversals; so far I could not find one.
Note:
I tried to filter like this
For v, e, p IN 1..10 ANY vertexCollection GRAPH myGraph OPTIONS {uniqueVertices: "global", bfs:true}
FILTER NOT IS_SAME_COLLECTION('edgeToSkip', e._id)
RETURN v._id
But this only skips the vertices reached directly through an 'edgeToSkip' edge, not all vertices on paths that contain 'edgeToSkip'. I need not only to exclude that particular edge but to stop traversing once it is found.
Thanks
UPDATE:
I found a workaround: I gather all edges present in a path and then filter the path out if the edge collection I want to skip is among them. Note that I changed uniqueVertices: "global" to uniqueVertices: "path".
For v, e, p IN 1..10 ANY vertexCollection GRAPH myGraph OPTIONS {uniqueVertices: "path", bfs:true}
// collect the edge collection names in the current path
LET ids = (
FOR edge IN p.edges
RETURN PARSE_IDENTIFIER(edge)["collection"]
)
// filter out paths that contain an edge from edgeToSkip
FILTER 'edgeToSkip' NOT IN ids
RETURN v._id
This way, once edgeToSkip is found in a path, no vertex beyond it on that path is returned, but the vertices before 'edgeToSkip' still are.
For example, if the graph looks like this:
vertexA --edge1--> vertexB --edge2--> vertexC --edgeToSkip--> vertexD --edge3--> vertexE
the query returns:
vertexA, vertexB and vertexC (but not vertexD and vertexE)
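Note that the `FILTER` only removes results; the traversal itself still walks past `edgeToSkip`. A Python sketch of the behaviour the question actually asks for (stop traversing at that edge collection) is a breadth-first search that never follows such an edge. The edge list, the `"collection/key"` id strings and the function names are hypothetical; `collection_of` only mimics what `PARSE_IDENTIFIER(...)["collection"]` extracts, and the `seen` set roughly corresponds to `uniqueVertices: "global"`.

```python
from collections import deque

# Hypothetical edges (from, to, edge_id); ids use ArangoDB's "collection/key" form.
edges = [
    ("vertexA", "vertexB", "edge1/1"),
    ("vertexB", "vertexC", "edge2/1"),
    ("vertexC", "vertexD", "edgeToSkip/1"),
    ("vertexD", "vertexE", "edge3/1"),
]


def collection_of(edge_id):
    """Collection part of an id, like PARSE_IDENTIFIER(...)["collection"]."""
    return edge_id.split("/", 1)[0]


def traverse(start, skip_collection, min_depth=1, max_depth=10):
    """BFS in ANY direction that never follows an edge from skip_collection,
    so vertices reachable only through such an edge are never visited."""
    adjacency = {}
    for frm, to, eid in edges:
        adjacency.setdefault(frm, []).append((to, eid))
        adjacency.setdefault(to, []).append((frm, eid))  # ANY direction
    seen = {start}  # each vertex visited at most once
    queue = deque([(start, 0)])
    result = []
    while queue:
        vertex, depth = queue.popleft()
        if depth >= min_depth:
            result.append(vertex)
        if depth == max_depth:
            continue
        for nxt, eid in adjacency.get(vertex, []):
            if collection_of(eid) == skip_collection or nxt in seen:
                continue  # prune: do not walk past the skipped collection
            seen.add(nxt)
            queue.append((nxt, depth + 1))
    return result


print(traverse("vertexA", "edgeToSkip"))  # ['vertexB', 'vertexC']
```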

TensorFlow: how to dump the result of the placement algorithm

I'm curious about model parallelism, and I've read the code from Yaroslav Bulatov. In that example, the model (in TensorFlow, the Graph) is partitioned manually into different partitions (left_network and right_network).
So I was wondering: if I have to make partitions manually, what do simple_placer.cc and graph_partition.cc do to the whole graph? I'm still not clear on this at all.
My understanding (let me know if anything is wrong):
If the graph has 8 partitions (subgraphs), which can be seen as 8 jobs, and there are 4 workers, the partitions can be distributed to the workers through:
explicit annotations via tf.device(), or
distributed training with tf.train.replica_device_setter(), which shares the variables across parameter servers and otherwise puts all ops on the worker device.
But how does the graph get partitioned?
I want to see what each subgraph (set of op nodes) looks like.
Can I dump the result, or do I need to trace or modify some source file, and if so, which one?
Please let me know if any concepts are wrong or vague.
I'm new to this, so any opinion is appreciated.
In the code below, is matmul an op node, and would it be partitioned into different jobs?
y_ = tf.placeholder(tf.float32, [None, 10])
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x, W) + b
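You can get the result of the placement algorithm by passing additional options when you call tf.Session.run():
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x, W) + b
sess = tf.Session()
sess.run(tf.global_variables_initializer())  # W and b must be initialized first
options = tf.RunOptions(output_partition_graphs=True)
metadata = tf.RunMetadata()
# x is a placeholder, so it has to be fed; one all-zero example is enough here.
sess.run(y, feed_dict={x: np.zeros((1, 784), np.float32)},
         options=options, run_metadata=metadata)
# `metadata` now contains information about what happened during the `run()` call.
for partition in metadata.partition_graphs:
    # `partition` is a `tf.GraphDef` representing all the nodes that ran on a single
    # device. All nodes in `partition` have the same `device` value.
    device = partition.node[0].device
    for node in partition.node:
        # e.g. print each node or store it in a dictionary for further analysis.
        print(device, node.name, node.op)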
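As for what graph_partition.cc does with the placer's output: conceptually, once every op has a device, the graph is split into one subgraph per device, and each edge that crosses devices is cut and bridged by a send/recv pair (TensorFlow's own ops for this are `_Send` and `_Recv`). The Python below is only a toy model of that idea for the `y = tf.matmul(x, W) + b` example, with a hypothetical worker/ps placement; it does not use or reproduce TensorFlow's actual code. It also answers the matmul question in miniature: an op is assigned to a single device, so it ends up in exactly one partition.

```python
from collections import defaultdict

# Hypothetical placer output for y = tf.matmul(x, W) + b: op name -> device.
placement = {
    "x": "/job:worker/task:0/cpu:0",
    "W": "/job:ps/task:0/cpu:0",
    "b": "/job:ps/task:0/cpu:0",
    "MatMul": "/job:worker/task:0/cpu:0",
    "add": "/job:worker/task:0/cpu:0",
}
# Data edges as (producer, consumer).
edges = [("x", "MatMul"), ("W", "MatMul"), ("MatMul", "add"), ("b", "add")]


def partition(placement, edges):
    """Split a placed graph into one subgraph per device (toy model).

    Every op lands in exactly one partition. An edge whose endpoints sit on
    different devices is cut and replaced by a send op on the producer's
    device and a matching recv op on the consumer's device.
    """
    parts = defaultdict(lambda: {"nodes": [], "edges": []})
    for op, device in placement.items():
        parts[device]["nodes"].append(op)
    for src, dst in edges:
        src_dev, dst_dev = placement[src], placement[dst]
        if src_dev == dst_dev:
            parts[src_dev]["edges"].append((src, dst))
        else:
            send, recv = f"send_{src}_to_{dst}", f"recv_{src}_to_{dst}"
            parts[src_dev]["nodes"].append(send)
            parts[src_dev]["edges"].append((src, send))
            parts[dst_dev]["nodes"].append(recv)
            parts[dst_dev]["edges"].append((recv, dst))
    return dict(parts)


parts = partition(placement, edges)
for device, subgraph in parts.items():
    print(device, subgraph["nodes"])
```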

How to fetch a subgraph of first neighbors in neo4j?

I fetch the first n neighbors of a node with this query in Neo4j
(in this example, n = 6).
I have a weighted graph, so I also order the results by weight:
START start_node=node(1859988)
MATCH start_node-[rel]-(neighbor)
RETURN DISTINCT neighbor,
rel.weight AS weight ORDER BY weight DESC LIMIT 6;
I would like to fetch a whole subgraph, including second neighbors (first neighbors of first six children).
I tried something like:
START start_node=node(1859988)
MATCH start_node-[rel]-(neighbor)
FOREACH (neighbor | MATCH neighbor-[rel2]-(neighbor2) )
RETURN DISTINCT neighbor1, neighbor2, rel.proximity AS proximity ORDER BY proximity DESC LIMIT 6, rel2.proximity AS proximity ORDER BY proximity DESC LIMIT 6;
The syntax is still wrong, and I am also unsure about the output.
I would like a table of (parent, child, weight) tuples:
[node_A - node_B - weight]
I would like to see whether one query performs better than six queries.
Can someone help clarify how to iterate a query (FOREACH) and format the output?
thank you!
Ok, I think I understand. Here's another attempt based on your comment:
MATCH (start_node)-[rel]-(neighbor)
WHERE ID(start_node) IN {source_ids}
WITH
neighbor, rel
ORDER BY rel.proximity DESC
WITH
collect({neighbor: neighbor, rel: rel})[0..6] AS neighbors_and_rels
UNWIND neighbors_and_rels AS neighbor_and_rel
WITH
neighbor_and_rel.neighbor AS neighbor,
neighbor_and_rel.rel AS rel
MATCH (neighbor)-[rel2]-(neighbor2)
WITH
neighbor,
rel,
neighbor2,
rel2
ORDER BY rel2.proximity DESC
WITH
neighbor,
rel,
collect([neighbor2, rel2])[0..6] AS neighbors_and_rels2
UNWIND neighbors_and_rels2 AS neighbor_and_rel2
RETURN
neighbor,
rel,
neighbor_and_rel2[0] AS neighbor2,
neighbor_and_rel2[1] AS rel2
It's a bit long, but hopefully it gives you the idea at least
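The same two-level "top k, then top k of each" selection can be sketched client-side in Python, which may help when comparing one query against six. The weighted adjacency map and node names are hypothetical, and the sketch assumes a higher proximity means a closer neighbor (the question sorts `DESC`). Unlike the Cypher above, it deliberately skips the edge leading back to the start node when picking second neighbors.

```python
import heapq

# Hypothetical weighted, undirected graph: node -> {neighbor: proximity}.
graph = {
    "A": {"B": 0.9, "C": 0.5, "D": 0.7},
    "B": {"A": 0.9, "E": 0.4, "F": 0.8},
    "C": {"A": 0.5},
    "D": {"A": 0.7, "G": 0.6},
    "E": {"B": 0.4},
    "F": {"B": 0.8},
    "G": {"D": 0.6},
}


def top_k(graph, node, k, exclude=()):
    """The k highest-proximity neighbors of node (ORDER BY ... DESC LIMIT k)."""
    candidates = [(n, w) for n, w in graph.get(node, {}).items() if n not in exclude]
    return heapq.nlargest(k, candidates, key=lambda item: item[1])


def two_level_subgraph(graph, start, k=6):
    """(parent, child, weight) rows: top k neighbors, then top k of each."""
    rows = []
    for child, weight in top_k(graph, start, k):
        rows.append((start, child, weight))
        # Skip the edge back to start when picking second neighbors.
        for grandchild, weight2 in top_k(graph, child, k, exclude={start}):
            rows.append((child, grandchild, weight2))
    return rows


for row in two_level_subgraph(graph, "A", k=2):
    print(row)
```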
First you should avoid using START as it will (hopefully) eventually go away.
To get a neighborhood, you could use variable-length paths to get all of the paths leading away from the node:
MATCH path=(start_node)-[*1..3]-(neighbor)
WHERE ID(start_node) = 1859988
RETURN path, nodes(path) AS nodes, [r IN relationships(path) | r.weight] AS weights;
Then you can take the path / nodes and combine them in memory with your language of choice.
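A minimal Python sketch of that in-memory step, assuming each returned row has been reduced to a list of node ids plus the list of weights along the path (the row shape and ids are hypothetical): consecutive nodes in each path give (parent, child, weight) tuples, and a dictionary drops the repeats that overlapping paths produce.

```python
# Hypothetical rows from the query above, each reduced to node ids plus the
# weights along the path (nodes[i] -> nodes[i + 1] has weight weights[i]).
rows = [
    {"nodes": [1859988, 11], "weights": [0.9]},
    {"nodes": [1859988, 11, 21], "weights": [0.9, 0.4]},
    {"nodes": [1859988, 12], "weights": [0.7]},
    {"nodes": [1859988, 11, 22], "weights": [0.9, 0.8]},
]


def edge_table(rows):
    """Unique (parent, child, weight) tuples, in first-seen order."""
    seen = {}
    for row in rows:
        nodes, weights = row["nodes"], row["weights"]
        for parent, child, weight in zip(nodes, nodes[1:], weights):
            seen.setdefault((parent, child), weight)
    return [(parent, child, weight) for (parent, child), weight in seen.items()]


for parent, child, weight in edge_table(rows):
    print(parent, child, weight)
```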
EDIT:
Also take a look at this SO Question: Fetch a tree with Neo4j
It shows how to get the output as a set of start/end nodes for each of the relationships which can be nicer in many cases.

FB profile connectivity

I tried to search for the answer to a problem from one of my interviews but found no solution. Can anyone help me with this question? Here is the problem description:
Given the names of two people, A and B, both of whom exist on FB, determine whether there is any connectivity between them. If connectivity exists, you have to give the exact path of connectivity.
By connectivity they mean that B could be a friend of C, who is a friend of A. In this way there is connectivity between A and B, and the path would be A -> C -> B.
You can use bidirectional search.
The main idea:
1. AGroup = {A}, BGroup = {B}.
2. While intersect(AGroup, BGroup) is the empty set:
   2.1 Expand every person in AGroup that you have not expanded yet and insert the results into AGroup.
   2.2 Expand every person in BGroup that you have not expanded yet and insert the results into BGroup.
   2.3 If neither AGroup nor BGroup has changed, return "A and B are not connected".
3. Let S be a person who is in both AGroup and BGroup.
4. Now you have the path from A to S and the path from B to S (to recover them, record for each newly inserted person the person whose expansion added them).
5. Return A->...->S->...->B.
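The steps above, with the parent bookkeeping needed to read the two half-paths back, can be sketched in Python as follows. The friendship map and names are hypothetical, friendships are assumed to be symmetric, and each side expands one full BFS layer per round. The result is a connecting path; this sketch does not attempt to prove it is the shortest one.

```python
from collections import deque


def bidirectional_path(friends, a, b):
    """Return a list of people from a to b, or None if they are not connected."""
    if a == b:
        return [a]
    parent_a = {a: None}  # person -> who expanded them, on A's side
    parent_b = {b: None}  # same, on B's side
    frontier_a, frontier_b = deque([a]), deque([b])

    def expand(frontier, parents, other_parents):
        # Expand one full layer of not-yet-expanded people; return a person
        # already reached from the other side if the two groups meet.
        for _ in range(len(frontier)):
            person = frontier.popleft()
            for friend in friends.get(person, ()):
                if friend not in parents:
                    parents[friend] = person
                    frontier.append(friend)
                if friend in other_parents:
                    return friend
        return None

    while frontier_a and frontier_b:  # an empty frontier means no more growth
        meet = expand(frontier_a, parent_a, parent_b)
        if meet is None:
            meet = expand(frontier_b, parent_b, parent_a)
        if meet is not None:
            path = []
            node = meet
            while node is not None:  # meet -> ... -> a
                path.append(node)
                node = parent_a[node]
            path.reverse()  # a -> ... -> meet
            node = parent_b[meet]
            while node is not None:  # continue toward b
                path.append(node)
                node = parent_b[node]
            return path
    return None


# Hypothetical symmetric friendship lists.
friends = {"A": ["C"], "C": ["A", "B"], "B": ["C"], "D": []}
print(bidirectional_path(friends, "A", "B"))  # ['A', 'C', 'B']
print(bidirectional_path(friends, "A", "D"))  # None
```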
