How to get level(depth) number of two connected nodes in neo4j - spring-boot

I'm using neo4j as a graph database to store user's connections detail into this. here I want to show the level of one user with respect to another user in their connections like Linkedin. for example- first layer connection, second layer connection, third layer and above the third layer shows 3+. but I don't know how this happens using neo4j. i searched for this but couldn't find any solution for this. if anybody knows about this then please help me to implement this functionality.

To find the shortest "connection level" between 2 specific people, just get the shortest path and add 1:
MATCH path = shortestpath((p1:Person)-[*..]-(p2:Person))
WHERE p1.id = 1 AND p2.id = 2
RETURN LENGTH(path) + 1 AS level
NOTE: You may want to put a reasonable upper bound on the variable-length relationship pattern (e.g., [*..6]) to avoid having the query taking too long or running out of memory in a large DB). You should probably ignore very distant connections anyway.

it would be something like this
// get all persons (or users)
MATCH (p:Person)
// create a set of unique combinations , assuring that you do
// not do double work
WITH COLLECT(p) AS personList
UNWIND personList AS personA
UNWIND personList AS personB
WITH personA,personB
WHERE id(personA) < id(personB)
// find the shortest path between any two nodes
MATCH path=shortestPath( (personA)-[:LINKED_TO*]-(personB) )
// return the distance ( = path length) between the two nodes
RETURN personA.name AS nameA,
personB.name AS nameB,
CASE WHEN length(path) > 3 THEN '3+'
ELSE toString(length(path))
END AS distance

Related

Finding optimal swapping paths in employees moving to different cities

We have a problem where we want to find the optimal path for swapping employees' locations across the country.
Hypothetically, a company allows for employees to request to move to another city only if a vacancy is available in that city, and also if someone is willing to take their soon-to-be vacant position. Examine the example:
Employee A who currently works in Los Angeles wants to move to Boston.
Employee B who currently works in Boston wants to move to New York.
Employee C who currently works in New York wants to move to Los Angeles.
In the above triangle, we can grant all three employees the permission to do the move, since there won't be any vacancies once they move. But the situation gets more complex when:
Multiple employees are competing for the same location. We can solve this with a hypothetical score of some sort, like more years working for the company gets the priority.
We have more cities to consider. (in the hundreds)
We have more employees to consider. (in the hundreds of thousands)
Ultimately the goal is to grant the highest number of move permissions without leading to any vacancies in the system.
We're currently exploring the idea of simulating all the swapping paths, and then selecting the one that generates the highest number of moves.
But I feel that this problem existed in the wild before, I just don't know what keywords to look for in order to get more insights. Any ideas? What algorithms should we look into?
Remove the impossible move requests, like this
A,B are specific cities. n is amy city
RAB is a request to move from A to B
RAn is a request to move from A
RnA is a reuest to move to A
CAn is the number of requests to move from A
CnA is the number of requests to move to A
set flag TRUE
WHILE ( flag == TRUE )
set flag = FALSE
LOOP A over all cities
IF CAn > CnA then not all RAn can be permitted.
Remove lower scoring requests until CAn == CnA.
set flag TRUE
Once these "impossible" moves are removed, all of the remaining move requests are "in-balance". That is, all of the move requests to a city are equal all of those from a city. From that point on it no longer matters which cycles you choose to implement: once you implement them, the remainings requests are still all in-balance. And no matter which move-cycle and which order they are implemented in, it stays in-balance until all remaining requests are zero, and the total number of moves will be exactly the same no matter how they are implemented. ( This explanation is due to https://stackoverflow.com/users/109122/rbarryyoung )
Here is C++ code implementing this
void removeForbidden()
{
bool flag = true;
while (flag)
{
flag = false;
for (auto &city : sCity)
{
auto vFrom = RequestCountFrom(city);
auto vTo = RequestCountTo(city);
if (vFrom.size() > vTo.size())
{
for (int k = vTo.size(); k < vFrom.size(); k++)
{
vFrom[k]->allowed = false;
}
flag = true;
}
}
}
std::cout << "Permitted moves:\n";
for (auto &R : vRequest)
{
if (R.allowed)
std::cout << R.text();
}
}
The complete application code is at https://gist.github.com/JamesBremner/5f49beaca59a7a7043e356fbb35f0d09
The input is a space delimited text file with 4 columns: employee name, employee score, from city, to city
Here is sample input based on your example but adding another request that cannot be permitted
e1 1 a b
e2 1 b c
e3 1 c a
e4 0 a c
The output from this is
Permitted moves:
e1 1 a b
e2 1 b c
e3 1 c a
Note: I have not implemented the scoring. For simplicity I assume that move requests are entered in order of descending score. So, the requests that are dropped when necessary, change according to the order you enter them. I assume you will be able to implement whatever scoring system you require. Also note that, unless you calculate a unique score for every request from a city, then which requests are denied may vary with the order of input.
I was about to post this in a comment but it was more than the the actually allowed characters.
I'm not sure about existing advanced algorithms that could potentially solve this problem, but you can custom fit some fundamental ones:
An employee wanting to move from city1 to some city2 is a directed edge from city1 to city2. Make sure that if 2 employees want to move from A to B, you add 2 directed edges for that or somehow keep count of the quantity.
Find disjoint components of the graph.
In each disjoint component, find the largest possible circle. A circle means A -> B -> C -> A.
Remove those edges and keep count of the number of successful swaps.
Rpeat until there are no circles in any of the disjoint components.
This is a greedy algorithm. At the moment I'm still not quite sure if it would produce the optimal solution in each and every situation. Any input is appreciated.

Can this Neo4j query be optimized?

I have rather large dataset (20mln nodes, 200mln edges), simplest shortestPath queries finish in milliseconds, everything is great.
But... I need to allow shortestPath to have ZERO or ONE relation of type 999 and it can be only the first from the start node.
So, my query became like this:
MATCH (one:Obj{oid:'startID'})-[r1*0..1]-(b:Obj)
WHERE all(rel in r1 where rel.val = 999)
WITH one, b
MATCH (two:Obj{oid:'endID'}), path=shortestPath((one) -[0..21]-(two))
WHERE ALL (x IN RELATIONSHIPS(path)
WHERE (x.val > -1 and x.val<101) or (x.val=999 or x.val=998)) return path
it runs in milliseconds when there's a short path (up to 2-4), but can take 5 or 20 seconds for paths like 5++. Maybe I've composed inefficient query?
This question will be bountied when available.
Some of your requirements are a bit unclear to me, so I'll reiterate my understanding and offer a solution.
You want to inspect the shortest paths between a start and end node.
The paths returned should have ZERO or ONE relationship with a val of 999. If it's ONE relationship with that value, it should be the first.
Here's an attempt based on that logic:
MATCH (start:Obj {oid:'startID'}),
(end:Obj {oid:'endID'}),
path=shortestPath((start)-[1..21]->(end))
WITH path, relationships(path) AS rels
WHERE all(r IN relationships WHERE r.val != 999)
OR (relationships[0].val = 999
AND all(r IN relationships[1..] WHERE r.val != 999))
RETURN path
I haven't had a chance to test on actual data, but hopefully this logic and approach at least point you in the right direction.
Also note: it's possible the entire WHERE clause at the end could be reduced to:
WHERE all(r IN relationships[1..] WHERE r.val != 999)
Meaning you don't even need to check the first relationship.

How to fetch a subgraph of first neighbors in neo4j?

I fetch first n neighbors of a node with this query in neo4j:
(in this example, n = 6)
I have a weighted graph, and so I also order the results by weight:
START start_node=node(1859988)
MATCH start_node-[rel]-(neighbor)
RETURN DISTINCT neighbor,
rel.weight AS weight ORDER BY proximity DESC LIMIT 6;
I would like to fetch a whole subgraph, including second neighbors (first neighbors of first six children).
I tried smtg like :
START start_node=node(1859988)
MATCH start_node-[rel]-(neighbor)
FOREACH (neighbor | MATCH neighbor-[rel2]-(neighbor2) )
RETURN DISTINCT neighbor1, neighbor2, rel.proximity AS proximity ORDER BY proximity DESC LIMIT 6, rel2.proximity AS proximity ORDER BY proximity DESC LIMIT 6;
the syntax is still wrong but I am also uncertain about the output:
I would like to have a table of tuples parent, children and weight:
[node_A - node_B - weight]
I would like to see if it is performing better one query or six queries.
Can someone help in clarifying how to iterate a query (FOREACH) and format the output?
thank you!
Ok, I think I understand. Here's another attempt based on your comment:
MATCH (start_node)-[rel]-(neighbor)
WHERE ID(start_node) IN {source_ids}
WITH
neighbor, rel
ORDER BY rel.proximity
WITH
collect({neighbor: neighbor, rel: rel})[0..6] AS neighbors_and_rels
UNWIND neighbors_and_rels AS neighbor_and_rel
WITH
neighbor_and_rel.neighbor AS neighbor,
neighbor_and_rel.rel AS rel
MATCH neighbor-[rel2]-(neighbor2)
WITH
neighbor,
rel,
neighbor2,
rel2
ORDER BY rel.proximity
WITH
neighbor,
rel,
collect([neighbor2, rel2])[0..6] AS neighbors_and_rels2
UNWIND neighbors_and_rels2 AS neighbor_and_rel2
RETURN
neighbor,
rel,
neighbor_and_rel2[0] AS neighbor2,
neighbor_and_rel2[1] AS rel2
It's a bit long, but hopefully it gives you the idea at least
First you should avoid using START as it will (hopefully) eventually go away.
So to get a neighborhood you could use variable length paths to get all of the paths away from the node
MATCH path=start_node-[rel*1..3]-(neighbor)
WHERE ID(start_node) = 1859988
RETURN path, nodes(path) AS nodes, EXTRACT(rel IN rels(path) | rel.weight) AS weights;
Then you can take the path / nodes and combine them in memory with your language of choice.
EDIT:
Also take a look at this SO Question: Fetch a tree with Neo4j
It shows how to get the output as a set of start/end nodes for each of the relationships which can be nicer in many cases.

find endpoints for range given a value within the range

I am trying to solve a simple problem, but at the moment I cannot think of a better solution. I am testing an API that is not documented.
There is an ID used to fetch objects and it has a min and max value with random values missing in-between. I'm trying to test the responses I receive for random objects, but to find objects, I need to have valid IDs.
It would be very inefficient to test random numbers and hope that I get an object back. The best I can do is find a range, get a random number between that range and check if it exists before conducting tests.
A sample list of all of the IDs in the database might look like this:
[1005, 25984, 25986, 29587, 30000, ...]
Assuming the deviation from one value to another will never exceed C, e.g. from the first value to the next value, the difference will never be greater than a pre-defined constant, how would you calculate the min/max of the range given only one value in the range?
Starting from a given value and looping until the last value is found is horrible but that is how it was implemented by previous devs. Below is pseudocode that more or less covers what they do.
// this can be any valid object ID from the database
// assuming the ID's in the database are [1005, 25984, 25986, 29587, 30000]
// "i" could be any one of these values
var i = givenPredefinedObjectId;
var deviation = 100;
// objectWithIdExists() is going to lookup an object with the ID "i" in the database
// if there is no object with the ID "i" , it will return false
// otherwise the object will get tested and return true
while(objectWithIdExists(i)){
i++;
}
for(i; i < i+deviation; i++){
if(objectWithIdExists(i)){
goto while loop;
}
}
endPoint = i - deviation;
Assuming there is no knowledge about the possible values except you can check if they exist and you are given one valid value (there is no array with all possible IDs, that was just an example), how would you find the min/max values?
Unbounded binary search is feasible, with a factor of C slowdown. Given an algorithm for unbounded binary search that, given access to the oracle less_equal(n) for some natural number n, returns n in time O(log n), implement the oracle on input k by querying all of the IDs C*k, C*k+1, ..., C*k+C-1 and reporting that k is less than or equal to n if and only if one ID is found. The running time is O(C*log((max-min)/C)).

Speed dating algorithm

I work in a consulting organization and am most of the time at customer locations. Because of that I rarely meet my colleagues. To get to know each other better we are going to arrange a dinner party. There will be many small tables so people can have a chat. In order to talk to as many different people as possible during the party, everybody has to switch tables at some interval, say every hour.
How do I write a program that creates the table switching schedule? Just to give you some numbers; in this case there will be around 40 people and there can be at most 8 people at each table. But, the algorithm needs to be generic of course
heres an idea
first work from the perspective of the first person .. lets call him X
X has to meet all the other people in the room, so we should divide the remaining people into n groups ( where n = #_of_people/capacity_per_table ) and make him sit with one of these groups per iteration
Now that X has been taken care of, we will consider the next person Y
WLOG Y be a person X had to sit with in the first iteration itself.. so we already know Y's table group for that time-frame.. we should then divide the remaining people into groups such that each group sits with Y for every consecutive iteration.. and for each iteration X's group and Y's group have no person in common
.. I guess, if you keep doing something like this, you will get an optimal solution (if one exists)
Alternatively you could crowd source the problem by giving each person a card where they could write down the names of all the people they got dine with.. and at the end of event, present some kind of prize to the person with the most names in their card
This sounds like an application for genetic algorithm:
Select a random permutation of the 40 guests - this is one seating arrangement
Repeat the random permutation N time (n is how many times you are to switch seats in the night)
Combine the permutations together - this is the chromosome for one organism
Repeat for how ever many organisms you want to breed in one generation
The fitness score is the number of people each person got to see in one night (or alternatively - the inverse of the number of people they did not see)
Breed, mutate and introduce new organisms using the normal method and repeat until you get a satisfactory answer
You can add in any other factors you like into the fitness, such as male/female ratio and so on without greatly changing the underlying method.
Why not imitate real world?
class Person {
void doPeriodically() {
do {
newTable = random (numberOfTables);
} while (tableBusy(newTable))
switchTable (newTable)
}
}
Oh, and note that there is a similar algorithm for finding a mating partner and it's rumored to be effective for those 99% of people who don't spend all of their free time answering programming questions...
Perfect Table Plan
You might want to have a look at combinatorial design theory.
Intuitively I don't think you can do better than a perfect shuffle, but it's beyond my pre-coffee cognition to prove it.
This one was very funny! :D
I tried different method but the logic suggested by adi92 (card + prize) is the one that works better than any other I tried.
It works like this:
a guy arrives and examines all the tables
for each table with free seats he counts how many people he has to meet yet, then choose the one with more unknown people
if two tables have an equal number of unknown people then the guy will choose the one with more free seats, so that there is more probability to meet more new people
at each turn the order of the people taking seats is random (this avoid possible infinite loops), this is a "demo" of the working algorithm in python:
import random
class Person(object):
def __init__(self, name):
self.name = name
self.known_people = dict()
def meets(self, a_guy, propagation = True):
"self meets a_guy, and a_guy meets self"
if a_guy not in self.known_people:
self.known_people[a_guy] = 1
else:
self.known_people[a_guy] += 1
if propagation: a_guy.meets(self, False)
def points(self, table):
"Calculates how many new guys self will meet at table"
return len([p for p in table if p not in self.known_people])
def chooses(self, tables, n_seats):
"Calculate what is the best table to sit at, and return it"
points = 0
free_seats = 0
ret = random.choice([t for t in tables if len(t)<n_seats])
for table in tables:
tmp_p = self.points(table)
tmp_s = n_seats - len(table)
if tmp_s == 0: continue
if tmp_p > points or (tmp_p == points and tmp_s > free_seats):
ret = table
points = tmp_p
free_seats = tmp_s
return ret
def __str__(self):
return self.name
def __repr__(self):
return self.name
def Switcher(n_seats, people):
"""calculate how many tables and what switches you need
assuming each table has n_seats seats"""
n_people = len(people)
n_tables = n_people/n_seats
switches = []
while not all(len(g.known_people) == n_people-1 for g in people):
tables = [[] for t in xrange(n_tables)]
random.shuffle(people) # need to change "starter"
for the_guy in people:
table = the_guy.chooses(tables, n_seats)
tables.remove(table)
for guy in table:
the_guy.meets(guy)
table += [the_guy]
tables += [table]
switches += [tables]
return switches
lst_people = [Person('Hallis'),
Person('adi92'),
Person('ilya n.'),
Person('m_oLogin'),
Person('Andrea'),
Person('1800 INFORMATION'),
Person('starblue'),
Person('regularfry')]
s = Switcher(4, lst_people)
print "You need %d tables and %d turns" % (len(s[0]), len(s))
turn = 1
for tables in s:
print 'Turn #%d' % turn
turn += 1
tbl = 1
for table in tables:
print ' Table #%d - '%tbl, table
tbl += 1
print '\n'
This will output something like:
You need 2 tables and 3 turns
Turn #1
Table #1 - [1800 INFORMATION, Hallis, m_oLogin, Andrea]
Table #2 - [adi92, starblue, ilya n., regularfry]
Turn #2
Table #1 - [regularfry, starblue, Hallis, m_oLogin]
Table #2 - [adi92, 1800 INFORMATION, Andrea, ilya n.]
Turn #3
Table #1 - [m_oLogin, Hallis, adi92, ilya n.]
Table #2 - [Andrea, regularfry, starblue, 1800 INFORMATION]
Because of the random it won't always come with the minimum number of switch, especially with larger sets of people. You should then run it a couple of times and get the result with less turns (so you do not stress all the people at the party :P ), and it is an easy thing to code :P
PS:
Yes, you can save the prize money :P
You can also take look at stable matching problem. The solution to this problem involves using max-flow algorithm. http://en.wikipedia.org/wiki/Stable_marriage_problem
I wouldn't bother with genetic algorithms. Instead, I would do the following, which is a slight refinement on repeated perfect shuffles.
While (there are two people who haven't met):
Consider the graph where each node is a guest and edge (A, B) exists if A and B have NOT sat at the same table. Find all the connected components of this graph. If there are any connected components of size < tablesize, schedule those connected components at tables. Note that even this is actually an instance of a hard problem known as Bin packing, but first fit decreasing will probably be fine, which can be accomplished by sorting the connected components in order of biggest to smallest, and then putting them each of them in turn at the first table where they fit.
Perform a random permutation of the remaining elements. (In other words, seat the remaining people randomly, which at first will be everyone.)
Increment counter indicating number of rounds.
Repeat the above for a while until the number of rounds seems to converge.

Resources