Matching items 1-1 with partial information in rust - algorithm

I have encountered the following situation when writing my program and I am not sure what is the best way to go about solving it, there is probably some algorithm that would be best for it, but I am not sure on what would be the correct terms for it.
I have 3 lists:
An unordered list of detailed information about individual users.
An unordered list of users
An list of "hints" for matching the users to their information, varying in correctness
I know that every single information item belongs to one user that is in my list, and the number of users and information items is equal.
The "hints" that I have can be described in the style of "User B has a number that is >= 8 at index 52 of their information item", or "User C has a number that is equal to 6 at index 12 of their information item".
The point is that for any given information item, I can check if it either satisfies the hint for that user, or it doesn't satisfy the hint of a user. Of course, a hint can be true for multiple items and an item cannot belong to more than one user.
It is important to note that the amount of hints I have will not necessarily be enough to match every user to an item, but I want to nevertheless narrow it down as much as possible with the information that I have, and maybe get results that would describe, for example, that item 5 belongs to A, item 6 belongs to either B, C, item 5 belongs to e or f, but item 6 certainly belongs to c if item 5 belongs to f.
I also do not trust all my hints equally. I have some hints that are almost certainly true, and I want these to be assumed to not be able to contradict themselves, and if they do contradict or create an impossible situation with the provided info item list, that it is found out about the error. And I also have hints that I don't trust as much, and these might contradict themselves, and if so the trusted ones are given priority, and these less trusted ones are only used as a last resort to suggest ways to narrow down the possibilities even further only at the very end.
I am using rust and petgraph for my program until I am at this point, I would think that there is a way that you are supposed to create a graph for this and use some algorithm on it to solve this problem.

You can set this up as a bipartite graph, use edge weights to reflect your information, and then find the maximum matching.
You have two sets of nodes: one for users, and another for information. There are weighted edges between users and information nodes, with the edge weight representing our confidence in the connection.
I'm assuming for every hint you have a confidence score between 0 and 1 representing how likely you think the hint is to be true.
Here's how to convert hints to edge weights: Say we have 3 hints for a (user, info) pair: h1, h2, h3, with confidence scores c1, c2, c3. Then the edge between user and info gets the score 1 - ((1 - c1) * (1 - c2) * (1 - c3)).
E.g., say our confidence scores are 0.9, 0.3, 0.2. Then the edge weight becomes 1 - ((1 - 0.9) * (1 - 0.3) * (1 - 0.2)) = 1 - 0.1 * 0.7 * 0.8 = 0.944.
Leave any (user, info) pairs with no hints as having no edge.
Then, find the maximum weight matching in a bipartite graph. (e.g. https://www.geeksforgeeks.org/maximum-bipartite-matching/). This will maximize your confidence per the scores above.
Finally we're left with (user, info) pairs with no hints, which you can pair up however you like.

Related

Optimize event seat assignments with Corona restrictions

Problem:
Given a set of group registrations, each for a varying number of people (1-7),
and a set of seating groups (immutable, at least 2m apart) varying from 1-4 seats,
I'd like to find the optimal assignment of people groups to seating groups:
People groups may be split among several seating groups (though preferably not)
Seating groups may not be shared by different people groups
(optional) the assignment should minimize the number of 'wasted' seats, i.e. maximize the number of seats in empty seating groups
(ideally it should run from within a Google Apps script, so memory and computational complexity should be as small as possible)
First attempt:
I'm interested in the decision problem (is it feasible?) as well as the optimization problem (see optional optimization function). I've modeled it as a SAT problem, but this does not find an optimal solution.
For this reason, I've tried to model it as an optimization problem. I'm thinking along the lines of a (remote) variation of multiple-knapsack, but I haven't been able to name it yet:
items: seating groups (size -> weight)
knapsacks: people groups (size -> container size)
constraint: combined item weight >= container size
optimization: minimize the number of items
As you can see, the constraint and optimization are inverted compared to the standard problem. So my question is: Am I on the right track here or would you go about it another way? If it's correct, does this optimization problem have a name?
You could approach this as an Integer Linear Programming Problem, defined as follows:
let P = the set of people groups, people group i consists of p_i people;
let T = the set of tables, table j has t_j places;
let x_ij be 1 if people from people group i are placed at table j, 0 otherwise
let M be a large penalty factor for empty seats
let N be a large penalty factor for splitting groups
// # of free spaces = # unavailable - # occupied
// every time a group uses more than one table,
// a penalty of N * (#tables - 1) is incurred
min M * [SUM_j(SUM_i[x_ij] * t_j) - SUM_i(p_i)] + N * SUM_i[(SUM_j(x_ij) - 1)]
// at most one group per table
s.t. SUM_i(x_ij) <= 1 for all j
// every group has enough seats
SUM_j(x_ij * t_j) = p_i for all i
0 <= x_ij <= 1
Although this minimises the number of empty seats, it does not minimise the number of tables used or maximise the number of groups admitted. If you'd like to do that, you could expand the objective function by adding a penalty for every group turned away.
ILPs are NP-hard, so without the right solvers, it might not be possible to make this run with Google Apps. I have no experience with that, so I'm afraid I can't help you. But there are some methods to reduce your search space.
One would be through something called column generation. Here, the problem is split into two parts. The complex master problem is your main research question, but instead of the entire solution space, it tries to find the optimum from different candidate assignments (or columns).
The goal is then to define a subproblem that recommends these new potential solutions that are then incorporated in the master problem. The power of a good subproblem is that it should be reducable to a simpler model, like Knapsack or Dijkstra.

Dynamic graph algorithm shortest path

I have a problem which I am converting into a TSP kind of problem so that I can explain it here and I am looking for any existing algorithms that might help me.
There are a list of places that I need to visit and I need to visit them all.
There are some places that have to be visited as the first x of n (IE, they need to be first 3 or first 5 places visited). (where the number is arbitrary)
There are some other places that have to be visited as the last y of n (IE, they need to be last 3 or last 5 places visited).
The places are could be categorized (some may not have a category), for those in a category, they need to visited as far away from each other (ie, if 2 places are categorized as green, then I would like to visit as many other category places as possible between these green categorized places)
Here is an example list:
A: category green: last 3
B: category none: ordering none
C: category pink: first 3
D: category none: ordering none
E: category none: last 3
F: category green: ordering none
G: category pink: first 3
The order I would like to come up with is:
G(pink,first3) -> F(green,none) -> C(pink,first3) -> D(none,none) -> B(none,none) -> E(none,last3) -> A(green,last3)
Explanation:
G came first, to keep it as far away from C as possible.
F came next to keep it as far away from A as possible.
C came next as it needed to be in first 3. C and G could be interchanged
D B could be placed anywhere
E came next as it had to be last 3
A came last as it had to be last 3 and by placing it at the end, it was as far as possible from F.
My idea is to evaluate each edge cost and the edge cost would be dynamically calculated. So if you tried to visit A and then F it would have a high cost, as opposed to visiting A and then some other place and then F (where the number of places in between would some how be part of the cost). Also, I would introduce a start and end place and so, if I had to visit some places as first x, I would be able to give it a low cost if start was within N places of that place. Same for the end.
I was wondering if there is a graph algorithm that can account for such dynamic weights/cost and determine the shortest path from start to end?
note: In some cases a best case may not be available, and that would be ok, as long as I can show that the cost is high because there wasnt enough category separation (eg: all places were in the same category).
Brute force algorithm
Initial idea I had is: Given a list of places, come up with all combinations of place ordering and calculate the costs between each and then choose the cheapest. (but this would mean evaluating n! where for 8 that would be 362880 orders that i would have to evaluate! why 8, cause that is what I believe will be the average number of places to evaluate)
But is there an algorithm that I could potentially use to determine it without testing all orderings.
One thing you could do:
Order the places as follows: first 1, ... first n, unordered, last n, ... last 1.
Go through the list and separate elemnts with the same color where possible without violating the previous order
Calculate the cost of this list and store it as the current best
Use this list to determine the order in which you evaluate permutations
While you build permutations, keep track of the cost
Abort building the current permutation when the cost exceeds the current best (including the theoretical minimum cost for the remaining places, if there is any)
Also abort when you have the theoretically possible best score.

Assign items to a minimal number of groups

This is a popular CS pattern, but I'm apparently missing some keywords because I'm not having any luck searching for it.
I have a set of 4 items: [A,B,C,D].
I have 3 groups: 1, 2, 3.
Group 1 can accept A or B.
Group 2 can accept B or C.
Group 3 can accept C or D.
Assign the items in a way that minimizes the number of groups used.
I.e. the solution would be:
Group 1: [A,B]
Group 2: []
Group 3: [C,D]
How would I solve this programatically? I know I've seen this before, so any keywords or links to point me in the right direction would be very appreciated.
This is the set covering problem. It is NP hard, so finding the true minimal set is hard in general and requires exponential time. The greedy algorithm which takes the set covering the most remaining elements may give good approximations. For covering sets with bounded size it can be also solved in reasonable time. For further details see http://en.m.wikipedia.org/wiki/Set_cover_problem
If we visualize this problem as a Graph problem, which all the groups and items are nodes in the graph, and there are edges connect between the group and items, we can see that this problem is a small case of Vertex Cover
However, if the number of items is small (less than 16), we can use dynamic programming to solve it easily.

Combinatorial best match

Say I have a Group data structure which contains a list of Element objects, such that each group has a unique set of elements.:
public class Group
{
public List<Element> Elements;
}
and say I have a list of populations who require certain elements, in such a way that each population has a unique set of required elements:
public class Population
{
public List<Element> RequiredElements;
}
I have an unlimited quantity of each defined Group, i.e. they are not consumed by populations.
Say I am looking at a particular Population. I want to find the best possible match of groups such that there is minimum excess elements, and no unmatched elements.
For example: I have a population which needs wood, steel, grain, and coal. The only groups available are {wood, herbs}, {steel, coal, oil}, {grain, steel}, and {herbs, meat}.
The last group - {herbs, meat} isn't required at all by my population so it isn't used. All others are needed, but herbs and oil are not required so it is wasted. Furthermore, steel exists twice in the minimum set, so one lot of steel is also wasted. The best match in this example has a wastage of 3.
So for a few hundred Population objects, I need to find the minimum wastage best match and compute how many elements are wasted.
How do I even begin to solve this? Once I have found a match, counting the wastage is trivial. Finding the match in the first place is hard. I could enumerate all possibilities but with a few thousand populations and many hundreds of groups, it's quite a task. Especially considering this whole thing sits inside each iteration of a simulated annealing algorithm.
I'm wondering whether I can formulate the whole thing as a mixed-integer program and call a solver like GLPK at each iteration.
I hope I have explained the problem correctly. I can clarify anything that's unclear.
Here's my binary program, for those of you interested...
x is the decision vector, an element of {0,1}, which says that the population in question does/doesn't receive from group i. There is an entry for each group.
b is the column vector, an element of {0,1}, which says which resources the population in question does/doesn't need. There is an entry for each resource.
A is a matrix, an element of {0,1}, which says what resources are in what groups.
The program is:
Minimise: ((Ax - b)' * 1-vector) + (x' * 1-vector);
Subject to: Ax >= b;
The constraint just says that all required resources must be satisfied. The objective is to minimise all excess and the total number of groups used. (i.e. 0 excess with 1 group used is better than 0 excess with 5 groups used).
You can formulate an integer program for each population P as follows. Use a binary variable xj to denote whether group j is chosen or not. Let A be a binary matrix, such that Aij is 1 if and only if item i is present in group j. Then the integer program is:
min Ei,j (xjAij)
s.t. Ej xjAij >= 1 for all i in P.
xj = 0, 1 for all j.
Note that you can obtain the minimum wastage by subtracting |P| from the optimal solution of the above IP.
Do you mean the Maximum matching problem?
You need to build a bipartite graph, where one of the sides is your populations and the other is groups, and edge exists between group A and population B if it have it in its set.
To find maximum edge matching you can easily use Kuhn algorithm, which is greatly described here on TopCoder.
But, if you want to find mimimum edge dominating set (the set of minimum edges that is covering all the vertexes), the problem becomes NP-hard and can't be solved in polynomial time.
Take a look at the weighted set cover problem, I think this is exactly what you described above. A basic description of the (unweighted) problem can be found here.
Finding the minimal waste as you defined above is equivalent to finding a set cover such that the sum of the cardinalities of the covering sets is minimal. Hence, the weight of each set (=a group of elements) has to be defined equal to its cardinality.
Since even the unweighted the set cover problem is NP-complete, it is not likely that an efficient algorithm for your problem instances exist. Maybe a good greedy approximation algorithm will be sufficient or your purpose? Googling weighted set cover provides several promising results, e.g. this script.

How do I pick the most beneficial combination of items from a set of items?

I'm designing a piece of a game where the AI needs to determine which combination of armor will give the best overall stat bonus to the character. Each character will have about 10 stats, of which only 3-4 are important, and of those important ones, a few will be more important than the others.
Armor will also give a boost to 1 or all stats. For example, a shirt might give +4 to the character's int and +2 stamina while at the same time, a pair of pants may have +7 strength and nothing else.
So let's say that a character has a healthy choice of armor to use (5 pairs of pants, 5 pairs of gloves, etc.) We've designated that Int and Perception are the most important stats for this character. How could I write an algorithm that would determine which combination of armor and items would result in the highest of any given stat (say in this example Int and Perception)?
Targeting one statistic
This is pretty straightforward. First, a few assumptions:
You didn't mention this, but presumably one can only wear at most one kind of armor for a particular slot. That is, you can't wear two pairs of pants, or two shirts.
Presumably, also, the choice of one piece of gear does not affect or conflict with others (other than the constraint of not having more than one piece of clothing in the same slot). That is, if you wear pants, this in no way precludes you from wearing a shirt. But notice, more subtly, that we're assuming you don't get some sort of synergy effect from wearing two related items.
Suppose that you want to target statistic X. Then the algorithm is as follows:
Group all the items by slot.
Within each group, sort the potential items in that group by how much they boost X, in descending order.
Pick the first item in each group and wear it.
The set of items chosen is the optimal loadout.
Proof: The only way to get a higher X stat would be if there was an item A which provided more X than some other in its group. But we already sorted all the items in each group in descending order, so there can be no such A.
What happens if the assumptions are violated?
If assumption one isn't true -- that is, you can wear multiple items in each slot -- then instead of picking the first item from each group, pick the first Q(s) items from each group, where Q(s) is the number of items that can go in slot s.
If assumption two isn't true -- that is, items do affect each other -- then we don't have enough information to solve the problem. We'd need to know specifically how items can affect each other, or else be forced to try every possible combination of items through brute force and see which ones have the best overall results.
Targeting N statistics
If you want to target multiple stats at once, you need a way to tell "how good" something is. This is called a fitness function. You'll need to decide how important the N statistics are, relative to each other. For example, you might decide that every +1 to Perception is worth 10 points, while every +1 to Intelligence is only worth 6 points. You now have a way to evaluate the "goodness" of items relative to each other.
Once you have that, instead of optimizing for X, you instead optimize for F, the fitness function. The process is then the same as the above for one statistic.
If, there is no restriction on the number of items by category, the following will work for multiple statistics and multiple items.
Data preparation:
Give each statistic (Int, Perception) a weight, according to how important you determine it is
Store this as a 1-D array statImportance
Give each item-statistic combination a value, according to how much said item boosts said statistic for the player
Store this as a 2-D array itemStatBoost
Algorithm:
In pseudocode. Here assume that itemScore is a sortable Map with Item as the key and a numeric value as the value, and values are initialised to 0.
Assume that the sort method is able to sort this Map by values (not keys).
//Score each item and rank them
for each statistic as S
for each item as I
score = itemScore.get(I) + (statImportance[S] * itemStatBoost[I,S])
itemScore.put(I, score)
sort(itemScore)
//Decide which items to use
maxEquippableItems = 10 //use the appropriate value
selectedItems = new array[maxEquippableItems]
for 0 <= idx < maxEquippableItems
selectedItems[idx] = itemScore.getByIndex(idx)

Resources