Finding optimal swapping paths in employees moving to different cities - algorithm

We have a problem where we want to find the optimal path for swapping employees' locations across the country.
Hypothetically, a company allows employees to request a move to another city only if a vacancy is available in that city and someone is willing to take their soon-to-be-vacant position. Consider this example:
Employee A who currently works in Los Angeles wants to move to Boston.
Employee B who currently works in Boston wants to move to New York.
Employee C who currently works in New York wants to move to Los Angeles.
In the above triangle, we can grant all three employees the permission to do the move, since there won't be any vacancies once they move. But the situation gets more complex when:
Multiple employees are competing for the same location. We can solve this with a hypothetical score of some sort, e.g. more years working for the company gets priority.
We have more cities to consider (in the hundreds).
We have more employees to consider (in the hundreds of thousands).
Ultimately the goal is to grant the highest number of move permissions without leading to any vacancies in the system.
We're currently exploring the idea of simulating all the swapping paths, and then selecting the one that generates the highest number of moves.
But I feel that this problem has existed in the wild before; I just don't know what keywords to look for in order to get more insights. Any ideas? What algorithms should we look into?

Remove the impossible move requests, like this:
A, B are specific cities. n is any city.
RAB is a request to move from A to B
RAn is a request to move from A
RnA is a request to move to A
CAn is the number of requests to move from A
CnA is the number of requests to move to A
set flag = TRUE
WHILE ( flag == TRUE )
    set flag = FALSE
    LOOP A over all cities
        IF CAn > CnA then not all RAn can be permitted.
            Remove lower scoring requests until CAn == CnA.
            set flag = TRUE
Once these "impossible" moves are removed, all of the remaining move requests are "in-balance". That is, for every city the number of move requests to it equals the number of move requests from it. (The loop ends only when no city has more requests out than in, and since every request leaves one city and enters another, the totals out and in are equal, so no city can have fewer requests out than in either.) From that point on it no longer matters which cycles you choose to implement: once you implement them, the remaining requests are still all in-balance. No matter which move-cycles are chosen and in which order they are implemented, it stays in-balance until the remaining requests reach zero, and the total number of moves will be exactly the same no matter how they are implemented. ( This explanation is due to https://stackoverflow.com/users/109122/rbarryyoung )
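To illustrate why the order does not matter, here is a minimal Python sketch (not part of the linked application) that splits an already balanced request list into closed chains of moves. The function name `split_into_cycles` and the `(employee, from_city, to_city)` tuple layout are assumptions made for this example only.

```python
from collections import defaultdict


def split_into_cycles(requests):
    """Split balanced requests into closed chains of moves.

    requests: list of (employee, from_city, to_city) tuples, assumed to be
    in-balance (every city has as many requests in as out).
    """
    outgoing = defaultdict(list)
    for req in requests:
        outgoing[req[1]].append(req)

    cycles = []
    for start in list(outgoing):
        # Keep peeling closed chains off this city until it has no requests left.
        while outgoing[start]:
            cycle = []
            city = start
            while True:
                # In a balanced set, a city we have just entered always
                # has an unused request leaving it, so pop() cannot fail.
                req = outgoing[city].pop()
                cycle.append(req)
                city = req[2]
                if city == start:
                    break
            cycles.append(cycle)
    return cycles


if __name__ == "__main__":
    demo = [("e1", "a", "b"), ("e2", "b", "c"), ("e3", "c", "a")]
    for chain in split_into_cycles(demo):
        print(" -> ".join(name for name, _, _ in chain))
```

Whichever chain is peeled off first, every request ends up in exactly one chain, so the total number of moves is simply the number of balanced requests.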
Here is C++ code implementing this
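void removeForbidden()
{
    // Repeat until a full pass over the cities removes nothing.
    bool flag = true;
    while (flag)
    {
        flag = false;
        for (auto &city : sCity)
        {
            // Requests leaving / entering this city
            // (assumed to contain only requests that are still allowed).
            auto vFrom = RequestCountFrom(city);
            auto vTo = RequestCountTo(city);
            if (vFrom.size() > vTo.size())
            {
                // More leavers than arrivals: deny the surplus,
                // which sits at the end of the score-ordered list.
                for (size_t k = vTo.size(); k < vFrom.size(); k++)
                {
                    vFrom[k]->allowed = false;
                }
                flag = true;
            }
        }
    }
    std::cout << "Permitted moves:\n";
    for (auto &R : vRequest)
    {
        if (R.allowed)
            std::cout << R.text();
    }
}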
The complete application code is at https://gist.github.com/JamesBremner/5f49beaca59a7a7043e356fbb35f0d09
The input is a space delimited text file with 4 columns: employee name, employee score, from city, to city
Here is sample input based on your example but adding another request that cannot be permitted
e1 1 a b
e2 1 b c
e3 1 c a
e4 0 a c
The output from this is
Permitted moves:
e1 1 a b
e2 1 b c
e3 1 c a
Note: I have not implemented the scoring. For simplicity I assume that move requests are entered in order of descending score, so which requests are dropped when necessary depends on the order you enter them. I assume you will be able to implement whatever scoring system you require. Also note that unless you calculate a unique score for every request from a city, which requests are denied may vary with the order of input.
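For readers who want to experiment with the scoring, here is a hedged Python sketch of the same pruning loop that sorts by score instead of relying on input order. It is not the linked application's code: the function name `prune_requests` and the `(name, score, from_city, to_city)` tuple layout are assumptions, and employee names are assumed to be unique.

```python
def prune_requests(requests):
    """Return the requests that can be granted without leaving a vacancy.

    requests: list of (name, score, from_city, to_city) tuples.
    """
    allowed = list(requests)
    changed = True
    while changed:
        changed = False
        for city in {r[2] for r in allowed}:
            leaving = [r for r in allowed if r[2] == city]
            arriving = sum(1 for r in allowed if r[3] == city)
            if len(leaving) > arriving:
                # Highest scores first; the sort is stable, so equal scores
                # keep their input order.
                leaving.sort(key=lambda r: r[1], reverse=True)
                dropped = {r[0] for r in leaving[arriving:]}
                allowed = [r for r in allowed if r[0] not in dropped]
                changed = True
    return allowed


if __name__ == "__main__":
    sample = [("e1", 1, "a", "b"), ("e2", 1, "b", "c"),
              ("e3", 1, "c", "a"), ("e4", 0, "a", "c")]
    for name, score, src, dst in prune_requests(sample):
        print(name, score, src, dst)
```

This version rescans the whole list for every city, which is fine for a demonstration; with hundreds of thousands of requests you would probably keep per-city lists instead.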

I was about to post this as a comment, but it exceeded the allowed number of characters.
I'm not sure about existing advanced algorithms that could potentially solve this problem, but you can custom fit some fundamental ones:
An employee wanting to move from city1 to some city2 is a directed edge from city1 to city2. Make sure that if 2 employees want to move from A to B, you add 2 directed edges for that or somehow keep count of the quantity.
Find disjoint components of the graph.
In each disjoint component, find the largest possible circle. A circle means A -> B -> C -> A.
Remove those edges and keep count of the number of successful swaps.
Repeat until there are no circles in any of the disjoint components.
This is a greedy algorithm. At the moment I'm still not quite sure if it would produce the optimal solution in each and every situation. Any input is appreciated.
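Here is a rough Python sketch of the steps above, with one deliberate simplification: instead of the largest circle it removes whichever circle a depth-first search finds first (finding the longest simple cycle is NP-hard in general). The names `find_cycle` and `greedy_moves` and the `(from_city, to_city)` pair layout are assumptions for this example.

```python
from collections import defaultdict


def find_cycle(graph):
    """Return one directed cycle as a list of cities, or None."""
    color = {}  # 1 = on the current DFS path, 2 = fully explored
    path = []

    def dfs(city):
        color[city] = 1
        path.append(city)
        for nxt, count in graph.get(city, {}).items():
            if count == 0:
                continue  # all requests on this edge are already used
            if color.get(nxt) == 1:
                return path[path.index(nxt):]
            if nxt not in color:
                found = dfs(nxt)
                if found:
                    return found
        color[city] = 2
        path.pop()
        return None

    for start in list(graph):
        if start not in color:
            found = dfs(start)
            if found:
                return found
    return None


def greedy_moves(requests):
    """Count the moves granted by repeatedly removing any cycle found."""
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst in requests:
        graph[src][dst] += 1  # parallel requests are kept as a count
    moves = 0
    while True:
        cycle = find_cycle(graph)
        if cycle is None:
            return moves
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            graph[a][b] -= 1
        moves += len(cycle)


if __name__ == "__main__":
    print(greedy_moves([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]))
    print(greedy_moves([("a", "c"), ("c", "a"), ("a", "b"), ("b", "c")]))
```

The two calls in the demo use the same four requests in a different order and grant a different number of moves, which shows that taking an arbitrary cycle first is not always optimal.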

Related

How to get level(depth) number of two connected nodes in neo4j

I'm using neo4j as a graph database to store users' connection details. I want to show the level of one user with respect to another user in their connections, like LinkedIn does: for example, first-layer connection, second-layer connection, third-layer connection, and anything above the third layer shown as 3+. I don't know how to do this with neo4j. I searched for this but couldn't find a solution. If anybody knows how, please help me implement this functionality.
To find the shortest "connection level" between 2 specific people, just get the shortest path and add 1:
MATCH path = shortestpath((p1:Person)-[*..]-(p2:Person))
WHERE p1.id = 1 AND p2.id = 2
RETURN LENGTH(path) + 1 AS level
NOTE: You may want to put a reasonable upper bound on the variable-length relationship pattern (e.g., [*..6]) to keep the query from taking too long or running out of memory in a large DB. You should probably ignore very distant connections anyway.
It would be something like this:
// get all persons (or users)
MATCH (p:Person)
// create a set of unique combinations , assuring that you do
// not do double work
WITH COLLECT(p) AS personList
UNWIND personList AS personA
UNWIND personList AS personB
WITH personA,personB
WHERE id(personA) < id(personB)
// find the shortest path between any two nodes
MATCH path=shortestPath( (personA)-[:LINKED_TO*]-(personB) )
// return the distance ( = path length) between the two nodes
RETURN personA.name AS nameA,
       personB.name AS nameB,
       CASE WHEN length(path) > 3 THEN '3+'
            ELSE toString(length(path))
       END AS distance
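For comparison, the same labelling can be sketched outside the database as a plain breadth-first search. This is only an illustration in Python, not something Neo4j runs: the function name `connection_level` and the dict-of-sets `adjacency` structure are assumptions.

```python
from collections import deque


def connection_level(adjacency, source, target):
    """Return '1', '2', '3' or '3+' for connected users, None otherwise.

    adjacency: dict mapping a user to the set of directly linked users
    (links are assumed to be stored in both directions).
    """
    if source == target:
        return "0"
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        user, dist = queue.popleft()
        for other in adjacency.get(user, ()):
            if other == target:
                level = dist + 1
                return "3+" if level > 3 else str(level)
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None


if __name__ == "__main__":
    chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
             "d": {"c", "e"}, "e": {"d"}}
    print(connection_level(chain, "a", "d"), connection_level(chain, "a", "e"))
```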

Can this Neo4j query be optimized?

I have a rather large dataset (20 million nodes, 200 million edges). The simplest shortestPath queries finish in milliseconds; everything is great.
But... I need to allow shortestPath to have ZERO or ONE relation of type 999, and it can only be the first one from the start node.
So, my query became like this:
MATCH (one:Obj{oid:'startID'})-[r1*0..1]-(b:Obj)
WHERE all(rel in r1 where rel.val = 999)
WITH one, b
MATCH (two:Obj{oid:'endID'}), path=shortestPath((one) -[*0..21]-(two))
WHERE ALL (x IN RELATIONSHIPS(path)
WHERE (x.val > -1 and x.val<101) or (x.val=999 or x.val=998)) return path
It runs in milliseconds when there's a short path (up to 2-4), but can take 5 or 20 seconds for paths of length 5 or more. Maybe I've composed an inefficient query?
This question will be bountied when available.
Some of your requirements are a bit unclear to me, so I'll reiterate my understanding and offer a solution.
You want to inspect the shortest paths between a start and end node.
The paths returned should have ZERO or ONE relationship with a val of 999. If it's ONE relationship with that value, it should be the first.
Here's an attempt based on that logic:
MATCH (start:Obj {oid:'startID'}),
      (end:Obj {oid:'endID'}),
      path=shortestPath((start)-[*1..21]->(end))
WITH path, relationships(path) AS rels
WHERE all(r IN rels WHERE r.val <> 999)
   OR (rels[0].val = 999
       AND all(r IN rels[1..] WHERE r.val <> 999))
RETURN path
I haven't had a chance to test on actual data, but hopefully this logic and approach at least point you in the right direction.
Also note: it's possible the entire WHERE clause at the end could be reduced to:
WHERE all(r IN rels[1..] WHERE r.val <> 999)
Meaning you don't even need to check the first relationship.
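The claim that the two conditions are interchangeable can be checked with a few lines of Python. This is only a sanity check of the logic on plain lists of `val` values, not Cypher; the names `full_check` and `reduced_check` are made up for the example.

```python
from itertools import product


def full_check(vals):
    """No 999 anywhere, or a 999 first and none afterwards."""
    return all(v != 999 for v in vals) or (
        vals[0] == 999 and all(v != 999 for v in vals[1:]))


def reduced_check(vals):
    """Only require that no relationship after the first has val 999."""
    return all(v != 999 for v in vals[1:])


# Compare both conditions on every short path over a few sample values.
for length in range(1, 6):
    for vals in product([5, 998, 999], repeat=length):
        assert full_check(vals) == reduced_check(vals)
```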

How to fetch a subgraph of first neighbors in neo4j?

I fetch first n neighbors of a node with this query in neo4j:
(in this example, n = 6)
I have a weighted graph, and so I also order the results by weight:
START start_node=node(1859988)
MATCH start_node-[rel]-(neighbor)
RETURN DISTINCT neighbor,
rel.weight AS weight ORDER BY weight DESC LIMIT 6;
I would like to fetch a whole subgraph, including second neighbors (first neighbors of first six children).
I tried something like:
START start_node=node(1859988)
MATCH start_node-[rel]-(neighbor)
FOREACH (neighbor | MATCH neighbor-[rel2]-(neighbor2) )
RETURN DISTINCT neighbor1, neighbor2, rel.proximity AS proximity ORDER BY proximity DESC LIMIT 6, rel2.proximity AS proximity ORDER BY proximity DESC LIMIT 6;
The syntax is still wrong, but I am also uncertain about the output:
I would like to have a table of tuples of parent, child and weight:
[node_A - node_B - weight]
I would also like to see whether one query performs better than six queries.
Can someone help clarify how to iterate a query (FOREACH) and format the output?
thank you!
Ok, I think I understand. Here's another attempt based on your comment:
MATCH (start_node)-[rel]-(neighbor)
WHERE ID(start_node) IN {source_ids}
WITH
  neighbor, rel
ORDER BY rel.proximity
WITH
  collect({neighbor: neighbor, rel: rel})[0..6] AS neighbors_and_rels
UNWIND neighbors_and_rels AS neighbor_and_rel
WITH
  neighbor_and_rel.neighbor AS neighbor,
  neighbor_and_rel.rel AS rel
MATCH (neighbor)-[rel2]-(neighbor2)
WITH
  neighbor,
  rel,
  neighbor2,
  rel2
ORDER BY rel2.proximity
WITH
  neighbor,
  rel,
  collect([neighbor2, rel2])[0..6] AS neighbors_and_rels2
UNWIND neighbors_and_rels2 AS neighbor_and_rel2
RETURN
  neighbor,
  rel,
  neighbor_and_rel2[0] AS neighbor2,
  neighbor_and_rel2[1] AS rel2
It's a bit long, but hopefully it gives you the idea at least.
First you should avoid using START as it will (hopefully) eventually go away.
So to get a neighborhood you could use variable length paths to get all of the paths away from the node
MATCH path=(start_node)-[rel*1..3]-(neighbor)
WHERE ID(start_node) = 1859988
RETURN path, nodes(path) AS nodes, EXTRACT(rel IN rels(path) | rel.weight) AS weights;
Then you can take the path / nodes and combine them in memory with your language of choice.
EDIT:
Also take a look at this SO Question: Fetch a tree with Neo4j
It shows how to get the output as a set of start/end nodes for each of the relationships which can be nicer in many cases.
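Combining the returned paths in memory could look roughly like the Python sketch below. It assumes each result row has already been converted to a pair `(nodes, weights)`, where `nodes` is the list of node ids along one path and `weights` the matching list of relationship weights; that row layout and the name `paths_to_edges` are assumptions, not something a driver produces by itself.

```python
def paths_to_edges(rows):
    """Flatten path rows into unique (parent, child, weight) tuples."""
    edges = set()
    for nodes, weights in rows:
        # Consecutive nodes on a path are joined by the relationship
        # whose weight sits at the same position in `weights`.
        for parent, child, weight in zip(nodes, nodes[1:], weights):
            edges.add((parent, child, weight))
    return sorted(edges)


if __name__ == "__main__":
    rows = [([1, 2], [0.5]), ([1, 2, 3], [0.5, 0.9]), ([1, 4], [0.7])]
    for parent, child, weight in paths_to_edges(rows):
        print(parent, child, weight)
```

Because longer paths repeat the first hops of shorter ones, collecting the tuples in a set removes the duplicates.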

Algorithm to create unique random concatenation of items

I'm thinking about an algorithm that will create X most unique concatenations of Y parts, where each part can be one of several items. For example 3 parts:
part #1: 0,1,2
part #2: a,b,c
part #3: x,y,z
And one possible (random) result of 5 concatenations:
0ax
1by
2cz
0bz (note that '0by' would be "less unique" than '0bz' because 'by' has already occurred)
2ay (note that 'a' hasn't followed '2' yet, and 'y' hasn't followed 'a' yet)
A simple BAD result for the next concatenation:
1cy ('c' hasn't followed '1', 'y' hasn't followed 'c', BUT '1'-'y' has already occurred as a first-last pair)
Simple GOOD next results would be:
0cy ('c' hasn't followed '0', 'y' hasn't followed 'c', and '0'-'y' hasn't occurred as a first-last pair)
1az
1cx
I know that this approach limits the possible results, but once all the fully unique possibilities are gone, the algorithm should continue and try to keep as much uniqueness as is available (repeating as little as possible).
Consider real example:
Boy/Girl/Martin
bought/stole/get
bottle/milk/water
And I want results like:
Boy get milk
Martin stole bottle
Girl bought water
Boy bought bottle (not water, because of 'bought+water' and not milk, because of 'Boy+milk')
Maybe start with a tree of all combinations, but how to select most unique trees first?
Edit: According to this sample data, we can see that creating fully unique results for 4 words * 3 possibilities gives us only 3 results:
Martin stole a bottle
Boy bought an milk
He get hard water
But more results can be requested. So the 4th result should have the most available uniqueness, like Martin bought hard milk, not Martin stole a water
Edit: Some start for a solution ?
Imagine each part as a barrel which can be rotated: the last item becomes the first when rotating down, and the first becomes the last when rotating up. Now, set the barrels like this:
Martin|stole |a   |bottle
Boy   |bought|an  |milk
He    |get   |hard|water
Now, write the sentences as we see them, then rotate the first barrel UP once, the second twice, the third three times, and so on. We get these sentences (note that the third barrel did one full rotation):
Boy   |get   |a   |milk
He    |stole |an  |water
Martin|bought|hard|bottle
And we get the next solutions. We can repeat the process one more time to get more solutions:
He    |bought|a   |water
Martin|get   |an  |bottle
Boy   |stole |hard|milk
The problem is that the first barrel stays connected with the last one, because they rotate in parallel.
I'm wondering whether it would be more unique if I rotated the last barrel one more time in the last solution (but then I create other connections like an-water, although this would be repeated only 2 times, not 3 times like now). I don't know whether "barrels" are a good way of thinking here.
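The barrel idea can be written down in a few lines of Python. This is a sketch under the assumption that all barrels have the same number of items; the function name `barrel_sentences` is made up, and barrel number k (counting from 1) is rotated up k positions per round, as described above.

```python
def barrel_sentences(barrels, rounds):
    """Read the rows of the barrels, rotating barrel k up by k each round."""
    size = len(barrels[0])
    sentences = []
    for rnd in range(rounds):
        for row in range(size):
            # After `rnd` rounds, barrel k (0-based) has moved up by
            # rnd * (k + 1) positions, wrapping around its length.
            sentences.append(tuple(
                barrel[(row + rnd * (k + 1)) % size]
                for k, barrel in enumerate(barrels)))
    return sentences


if __name__ == "__main__":
    barrels = [["Martin", "Boy", "He"],
               ["stole", "bought", "get"],
               ["a", "an", "hard"],
               ["bottle", "milk", "water"]]
    for words in barrel_sentences(barrels, 3):
        print(" ".join(words))
```

The modulo arithmetic also makes the weakness visible: with 3 items per barrel, rotating by 1 and rotating by 4 are the same shift, so the first and fourth barrels always move together.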
I think that we should first find a definition of uniqueness.
For example, what causes uniqueness to drop? Using a word that was already used? Is repeating 2 words next to each other less unique than repeating a word with a gap of other words in between? So, this problem can be subjective.
But I think that across a lot of sequences, each word should be used a similar number of times (like selecting a word randomly and removing it from a set, and after all words have been taken, refilling the set so they can be picked again) - this is easy to do.
But even if each word is used a similar number of times, we should do something to avoid repeating connections between words. I think that repeating words far from each other is more unique than repeating them next to each other.
Anytime you need a new concatenation, just generate a completely random one, calculate its fitness, and then either accept that concatenation or reject it (probabilistically, that is).
const C = 1.0
function CreateGoodConcatenation()
{
    for (rejectionCount = 0; ; rejectionCount++)
    {
        candidate = CreateRandomConcatenation()
        fitness = CalculateFitness(candidate) // returns 0 < fitness <= 1
        r = GetRand(zero to one)
        adjusted_r = Math.pow(r, C * rejectionCount + 1) // bias toward acceptability as rejectionCount increases
        if (adjusted_r < fitness)
        {
            return candidate
        }
    }
}
CalculateFitness should never return zero. If it does, you might find yourself in an infinite loop.
As you increase C, less ideal concatenations are accepted more readily.
As you decrease C, you face increased iterations for each call to CreateGoodConcatenation (plus less entropy in the result)
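A possible Python version of this accept/reject loop is sketched below. The fitness function is my own assumption, since the answer leaves it open: it counts how often each pair of items (including non-adjacent pairs such as first-last) has already been used together and returns 1 / (1 + repeats), which is never zero. The class and method names are likewise made up for the example.

```python
import random
from itertools import combinations


class ConcatenationPicker:
    def __init__(self, parts, c=1.0, rng=None):
        self.parts = parts
        self.c = c
        self.rng = rng or random.Random()
        self.pair_counts = {}  # how often each positioned pair was used

    def _pairs(self, candidate):
        # Every pair of positions counts, so first-last repeats are seen too.
        return [(i, a, j, b)
                for (i, a), (j, b) in combinations(enumerate(candidate), 2)]

    def fitness(self, candidate):
        repeats = sum(self.pair_counts.get(p, 0) for p in self._pairs(candidate))
        return 1.0 / (1.0 + repeats)  # always in (0, 1]

    def pick(self):
        rejections = 0
        while True:
            candidate = tuple(self.rng.choice(part) for part in self.parts)
            r = self.rng.random()
            # A growing exponent pushes r toward 0, so acceptance gets
            # easier after every rejection.
            adjusted = r ** (self.c * rejections + 1)
            if adjusted < self.fitness(candidate):
                for p in self._pairs(candidate):
                    self.pair_counts[p] = self.pair_counts.get(p, 0) + 1
                return candidate
            rejections += 1


if __name__ == "__main__":
    picker = ConcatenationPicker([["Boy", "Girl", "Martin"],
                                  ["bought", "stole", "get"],
                                  ["bottle", "milk", "water"]])
    for _ in range(5):
        print(" ".join(picker.pick()))
```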

Speed dating algorithm

I work in a consulting organization and am most of the time at customer locations. Because of that I rarely meet my colleagues. To get to know each other better we are going to arrange a dinner party. There will be many small tables so people can have a chat. In order to talk to as many different people as possible during the party, everybody has to switch tables at some interval, say every hour.
How do I write a program that creates the table switching schedule? Just to give you some numbers: in this case there will be around 40 people and there can be at most 8 people at each table. But the algorithm needs to be generic, of course.
Here's an idea:
First work from the perspective of the first person. Let's call him X.
X has to meet all the other people in the room, so we should divide the remaining people into n groups (where n = #_of_people/capacity_per_table) and make him sit with one of these groups per iteration.
Now that X has been taken care of, we will consider the next person, Y.
WLOG let Y be a person X had to sit with in the first iteration itself, so we already know Y's table group for that time-frame. We should then divide the remaining people into groups such that each group sits with Y in one of the following iterations, and for each iteration X's group and Y's group have no person in common.
I guess, if you keep doing something like this, you will get an optimal solution (if one exists).
Alternatively you could crowdsource the problem by giving each person a card where they write down the names of all the people they got to dine with, and at the end of the event present some kind of prize to the person with the most names on their card.
This sounds like an application for genetic algorithm:
Select a random permutation of the 40 guests - this is one seating arrangement
Repeat the random permutation N times (N is how many times you are to switch seats in the night)
Combine the permutations together - this is the chromosome for one organism
Repeat for how ever many organisms you want to breed in one generation
The fitness score is the number of people each person got to see in one night (or alternatively - the inverse of the number of people they did not see)
Breed, mutate and introduce new organisms using the normal method and repeat until you get a satisfactory answer
You can add in any other factors you like into the fitness, such as male/female ratio and so on without greatly changing the underlying method.
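As a starting point, the chromosome and a fitness score could be sketched in Python like this. The breeding and mutation steps are left out; the function names and the choice to score a schedule by the number of distinct pairs that shared a table are assumptions made for the example.

```python
import random
from itertools import combinations


def random_schedule(n_people, n_rounds, rng):
    """One organism: a random seating permutation for every round."""
    return [rng.sample(range(n_people), n_people) for _ in range(n_rounds)]


def fitness(schedule, table_size):
    """Number of distinct pairs of guests that shared a table at least once."""
    met = set()
    for seating in schedule:
        # Consecutive slices of the permutation are the tables of this round.
        for start in range(0, len(seating), table_size):
            table = seating[start:start + table_size]
            met.update(frozenset(pair) for pair in combinations(table, 2))
    return len(met)


if __name__ == "__main__":
    rng = random.Random()
    population = [random_schedule(40, 5, rng) for _ in range(20)]
    best = max(population, key=lambda s: fitness(s, 8))
    print(fitness(best, 8), "of", 40 * 39 // 2, "pairs met")
```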
Why not imitate real world?
class Person {
    void doPeriodically() {
        do {
            newTable = random (numberOfTables);
        } while (tableBusy(newTable));
        switchTable (newTable);
    }
}
Oh, and note that there is a similar algorithm for finding a mating partner and it's rumored to be effective for those 99% of people who don't spend all of their free time answering programming questions...
Perfect Table Plan
You might want to have a look at combinatorial design theory.
Intuitively I don't think you can do better than a perfect shuffle, but it's beyond my pre-coffee cognition to prove it.
This one was very funny! :D
I tried different methods, but the logic suggested by adi92 (card + prize) works better than any other I tried.
It works like this:
a guy arrives and examines all the tables
for each table with free seats he counts how many people he has yet to meet, then chooses the one with the most unknown people
if two tables have an equal number of unknown people then the guy chooses the one with more free seats, so that there is a higher probability of meeting more new people
at each turn the order in which people take their seats is random (this avoids possible infinite loops). This is a "demo" of the working algorithm in Python:
import random
class Person(object):
    def __init__(self, name):
        self.name = name
        self.known_people = dict()
    def meets(self, a_guy, propagation = True):
        "self meets a_guy, and a_guy meets self"
        if a_guy not in self.known_people:
            self.known_people[a_guy] = 1
        else:
            self.known_people[a_guy] += 1
        if propagation: a_guy.meets(self, False)
    def points(self, table):
        "Calculates how many new guys self will meet at table"
        return len([p for p in table if p not in self.known_people])
    def chooses(self, tables, n_seats):
        "Calculate what is the best table to sit at, and return it"
        points = 0
        free_seats = 0
        ret = random.choice([t for t in tables if len(t)<n_seats])
        for table in tables:
            tmp_p = self.points(table)
            tmp_s = n_seats - len(table)
            if tmp_s == 0: continue
            if tmp_p > points or (tmp_p == points and tmp_s > free_seats):
                ret = table
                points = tmp_p
                free_seats = tmp_s
        return ret
    def __str__(self):
        return self.name
    def __repr__(self):
        return self.name
def Switcher(n_seats, people):
    """calculate how many tables and what switches you need
    assuming each table has n_seats seats"""
    n_people = len(people)
    n_tables = -(-n_people // n_seats)  # round up so nobody is left without a seat
    switches = []
    while not all(len(g.known_people) == n_people-1 for g in people):
        tables = [[] for t in range(n_tables)]
        random.shuffle(people) # need to change "starter"
        for the_guy in people:
            table = the_guy.chooses(tables, n_seats)
            tables.remove(table)
            for guy in table:
                the_guy.meets(guy)
            table += [the_guy]
            tables += [table]
        switches += [tables]
    return switches
lst_people = [Person('Hallis'),
              Person('adi92'),
              Person('ilya n.'),
              Person('m_oLogin'),
              Person('Andrea'),
              Person('1800 INFORMATION'),
              Person('starblue'),
              Person('regularfry')]
s = Switcher(4, lst_people)
print("You need %d tables and %d turns" % (len(s[0]), len(s)))
turn = 1
for tables in s:
    print('Turn #%d' % turn)
    turn += 1
    tbl = 1
    for table in tables:
        print(' Table #%d - ' % tbl, table)
        tbl += 1
    print('\n')
This will output something like:
You need 2 tables and 3 turns
Turn #1
Table #1 - [1800 INFORMATION, Hallis, m_oLogin, Andrea]
Table #2 - [adi92, starblue, ilya n., regularfry]
Turn #2
Table #1 - [regularfry, starblue, Hallis, m_oLogin]
Table #2 - [adi92, 1800 INFORMATION, Andrea, ilya n.]
Turn #3
Table #1 - [m_oLogin, Hallis, adi92, ilya n.]
Table #2 - [Andrea, regularfry, starblue, 1800 INFORMATION]
Because of the randomness it won't always come up with the minimum number of switches, especially with larger sets of people. You should then run it a couple of times and take the result with the fewest turns (so you do not stress all the people at the party :P ), and it is an easy thing to code :P
PS:
Yes, you can save the prize money :P
You can also take a look at the stable matching problem. The classic solution to this problem is the Gale-Shapley algorithm. http://en.wikipedia.org/wiki/Stable_marriage_problem
I wouldn't bother with genetic algorithms. Instead, I would do the following, which is a slight refinement on repeated perfect shuffles.
While (there are two people who haven't met):
Consider the graph where each node is a guest and edge (A, B) exists if A and B have NOT sat at the same table. Find all the connected components of this graph. If there are any connected components of size < tablesize, schedule those connected components at tables. Note that even this is actually an instance of a hard problem known as bin packing, but first fit decreasing will probably be fine, which can be accomplished by sorting the connected components from biggest to smallest, and then putting each of them in turn at the first table where it fits.
Perform a random permutation of the remaining elements. (In other words, seat the remaining people randomly, which at first will be everyone.)
Increment counter indicating number of rounds.
Repeat the above for a while until the number of rounds seems to converge.
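The connected-components step of this approach could be sketched in Python as follows. Only that step is shown, not the packing or the random seating; the function name `unmet_components` and the representation of past meetings as a set of `frozenset` pairs are assumptions for the example.

```python
def unmet_components(people, met):
    """Connected components of the 'have NOT sat together yet' graph.

    met: set of frozenset({a, b}) pairs that already shared a table.
    Returns the components sorted from biggest to smallest, ready for
    first fit decreasing.
    """
    unseen = set(people)
    components = []
    while unseen:
        root = unseen.pop()
        component, frontier = [root], [root]
        while frontier:
            person = frontier.pop()
            # Everyone not yet visited who has never sat with `person`.
            linked = {q for q in unseen if frozenset((person, q)) not in met}
            unseen -= linked
            component.extend(linked)
            frontier.extend(linked)
        components.append(component)
    return sorted(components, key=len, reverse=True)


if __name__ == "__main__":
    guests = ["Ann", "Bob", "Cid", "Dee"]
    met = {frozenset(p) for p in [("Ann", "Cid"), ("Ann", "Dee"),
                                  ("Bob", "Cid"), ("Bob", "Dee"),
                                  ("Cid", "Dee")]}
    print(unmet_components(guests, met))
```

In the demo only Ann and Bob have not met, so they form one component and would be seated together first.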

Resources