Formulating an algorithm for a group sorting program with exclusion factors

I'm trying to formulate an equation/algorithm to solve this problem (for a program I'm writing):
Rules:
A person, p, that is to be sorted can exclude n people from the list. Those excluded people cannot be in the same group as p.
The list will contain around 100-150 people.
A group should contain 5-7 people (ideally 6)
My current thoughts:
Take the list count and divide it by 6, which gives me the number of groups.
Feed people into the groups until an exclusion occurs. When this happens, try to move the mismatched people into other groups, based on some sort of scoring system, until proper groups are formed (a rough sketch of this greedy pass is given below).
However, I still feel like I need to put a limit on the number of people each person is allowed to exclude.
My question is basically how I would figure out how many people a certain person can exclude to make this endeavor possible. Considering there will be around 150 people, each with their own list of people to exclude, is it even possible? Some exceptions are of course allowed. Ideas and thoughts are also appreciated!
I'm planning to write the program in Java.
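A minimal sketch of that greedy pass in Java, assuming people are plain int IDs and the exclusions come as a map from each person to the set of people they exclude; the class and method names are placeholders, and the score-based repair step is only hinted at:

import java.util.*;

// Minimal sketch of the greedy assignment described above. People are int IDs;
// 'excludes' maps each person to the people they refuse to share a group with.
public class GroupSorter {

    static List<List<Integer>> assign(List<Integer> people,
                                      Map<Integer, Set<Integer>> excludes,
                                      int targetSize) {
        int groupCount = Math.max(1, people.size() / targetSize);
        List<List<Integer>> groups = new ArrayList<>();
        for (int i = 0; i < groupCount; i++) groups.add(new ArrayList<>());

        List<Integer> unplaced = new ArrayList<>();
        for (int person : people) {
            // Put the person into the smallest group that has no conflict with them.
            List<Integer> best = null;
            for (List<Integer> group : groups) {
                if (conflicts(person, group, excludes)) continue;
                if (best == null || group.size() < best.size()) best = group;
            }
            if (best != null) best.add(person);
            else unplaced.add(person);   // left for a later score-based swap/repair pass
        }
        // A real solver would now resolve 'unplaced' by swapping members between groups.
        return groups;
    }

    static boolean conflicts(int person, List<Integer> group,
                             Map<Integer, Set<Integer>> excludes) {
        for (int member : group) {
            if (excludes.getOrDefault(person, Set.of()).contains(member)) return true;
            if (excludes.getOrDefault(member, Set.of()).contains(person)) return true;
        }
        return false;
    }
}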

Related

How to optimize events scheduling with unknown future events?

Scenario:
I need to give users the opportunity to book different times for the service.
The caveat is that I don't have bookings in advance; I need to fill them as they come in.
Bookings can be represented as key-value pairs:
[startTime, duration]
So, for example, [9, 3] would mean the event starts at 9 o'clock and has a duration of 3 hours.
Rules:
users come in one by one; there is never a batch of user requests
no bookings can overlap
the service is available 24/7, so no need to worry about "working time"
users choose the duration on their own
obviously, once a user chooses and confirms his booking, we cannot shuffle it anymore
we don't want gaps shorter than some amount of time. This is based on the probability that future users will fill the gap: for example, if the distribution of durations over users' bookings is such that the probability of future users filling a gap shorter than x hours is less than p, then we want a rule that no gap can be shorter than x. (For the purpose of this question we can assume x is hardcoded; here I just explain the reasoning.)
the goal is to maximize the service-busy duration
My thinking so far...
I keep the list of bookings made so far
I also keep track of gaps (as they are potential slots for new users' bookings)
When a new user comes with his booking [startTime, duration], I first check for the ideal case where gapLength = duration. If there is no such gap, I find all slots (gaps) that satisfy the condition gapLength - duration > minimumGapDuration and order them in descending order by that gapLength - duration value.
I assign the user to the first gap, i.e. the one with the maximum value of gapLength - duration, since that gives me the highest probability that the gap remaining after this booking will also get filled in the future (a sketch of this selection step follows the list).
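A minimal sketch of that selection step in Java, assuming whole-hour times, a hypothetical Gap(start, length) record, and a known minimumGapDuration:

import java.util.*;

// Minimal sketch of the gap-selection step above. Gap is a hypothetical helper
// type; times and durations are plain integer hours for simplicity.
public class GapPicker {

    record Gap(int start, int length) {}

    static Optional<Gap> pickGap(List<Gap> gaps, int duration, int minimumGapDuration) {
        // Ideal case: a gap that the booking fills exactly.
        for (Gap g : gaps) {
            if (g.length() == duration) return Optional.of(g);
        }
        // Otherwise: among gaps whose remainder stays above the minimum,
        // take the one leaving the largest remainder.
        return gaps.stream()
                .filter(g -> g.length() - duration > minimumGapDuration)
                .max(Comparator.comparingInt((Gap g) -> g.length() - duration));
    }

    public static void main(String[] args) {
        List<Gap> gaps = List.of(new Gap(0, 4), new Gap(9, 8), new Gap(20, 3));
        System.out.println(pickGap(gaps, 5, 2));   // picks the [9, 8] gap, leaving a 3-hour remainder
    }
}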
Questions:
Are there some problems with my approach that I am missing?
Are there some algorithms solving this particular problem?
Is there some usual approach (a good starting point) which I could start with and optimize later? (I am actually trying to get enough info to start without making some critical mistake; optimizations can/should come later.)
PS.
From my research so far it sounds like this might be a case for constraint programming. I would like to avoid it if possible, as I have no clue about it (maybe it's simple, I just don't know), but if it makes a real difference I will go for its benefits and implement it.
I went through Stack Overflow for similar problems but didn't find one with unknown future events. If there is such a question and this is a direct duplicate, please refer me to it.

Subset sum in a specific case

I need to reconcile bank transactions. A match is when the sum of a group of transactions is zero. The grade of the match is higher when the group is smaller.
This is a variation of the subset sum problem (NP-complete).
However:
There are not so many transactions (usually up to 10K)
The sum is limited to 10M, so maybe under these conditions there is a practical solution.
The maximum group size can be limited to 10 transactions.
Thanks to anyone who helps.
You can use dynamic programming as mentioned by sudomakeinstall2. You will need to store, for each sum, the sums used to get to it (so you can trace back to the transactions and build the actual group, not just answer true/false); a sketch of this idea follows the answer.
If a sum has too many paths (too many possibilities to get to this sum), then it's meaningless (too many possibilities to reconcile) and you might ignore long paths.
When calculating sums, use filters for transactions with similar references/dates/details.
You might want to make a few iterations. First try to find only small groups. This should be fast, and then you can remove all the transactions in those groups before going for larger groups.
Hope this helps.
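A minimal sketch of that idea in Java, assuming amounts are integer cents and groups are capped at maxGroupSize; the class and method names are placeholders, and it returns the first zero-sum group it finds rather than guaranteeing the smallest one:

import java.util.*;

// Minimal sketch: for every reachable sum keep the smallest group of transaction
// indices producing it, and prune groups that grow beyond maxGroupSize.
public class ZeroSumGroups {

    static List<Integer> findZeroGroup(long[] amounts, int maxGroupSize) {
        Map<Long, List<Integer>> bestGroupForSum = new HashMap<>();
        for (int i = 0; i < amounts.length; i++) {
            // Snapshot so transaction i is not reused within the same step.
            Map<Long, List<Integer>> snapshot = new HashMap<>(bestGroupForSum);
            for (Map.Entry<Long, List<Integer>> e : snapshot.entrySet()) {
                List<Integer> group = e.getValue();
                if (group.size() >= maxGroupSize) continue;        // ignore long paths
                long newSum = e.getKey() + amounts[i];
                List<Integer> extended = new ArrayList<>(group);
                extended.add(i);
                if (newSum == 0) return extended;                  // a reconciling match
                bestGroupForSum.merge(newSum, extended,
                        (a, b) -> a.size() <= b.size() ? a : b);
            }
            bestGroupForSum.merge(amounts[i], new ArrayList<>(List.of(i)),
                    (a, b) -> a.size() <= b.size() ? a : b);
        }
        return null;   // no group of at most maxGroupSize transactions sums to zero
    }

    public static void main(String[] args) {
        long[] amounts = { 500, -200, -300, 125, 75 };
        System.out.println(findZeroGroup(amounts, 10));            // prints [0, 1, 2]
    }
}

To honor the "smaller groups grade higher" rule, collect every zero-sum group found and rank them by size instead of returning the first; the iteration idea above (find small groups first, remove them, then look for larger ones) fits naturally on top of this.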

Creating random overlapping groups

I'm trying to populate a database with sample data, and I'm hoping there's an algorithm out there that can speed up this process.
I have a database of sample people and I need to create a sample network of friend pairings. For example, person 1 might be friends with persons 2, 3, 4, and 7, and person 2 would obviously be friends with person 1, but not necessarily with any of the others.
I'm hoping to find a way to automate the process of creating these randomly generated lists of friends within certain parameters, like a minimum and maximum number of friends.
Does something like this exist or could someone point me in the right direction?
So I'm not sure if this is the ideal solution, but it worked for me. Generally the steps were:
Start with an array of people.
Copy the array and shuffle it.
Give each person in the first array a random number (within a range) of random friends (second array).
Remove the person from their own list of friends.
Iterate through each person in each friend list and check whether the owner of the list is in that friend's list; if not, add them.
I used a pool of 1000 people with an initial friend range of 3-10, and after adding the reciprocals the final range was about 5-27, which was good enough for me. A sketch of these steps follows.
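A minimal sketch of those steps in Java, assuming people are plain int IDs; the class and method names are placeholders:

import java.util.*;

// Minimal sketch of the steps above: give everyone a random initial friend list,
// skip self-friendship, then make friendships reciprocal in a second pass.
public class FriendNetworkGenerator {

    static Map<Integer, Set<Integer>> generate(int peopleCount, int minFriends,
                                               int maxFriends, Random rng) {
        List<Integer> pool = new ArrayList<>();
        Map<Integer, Set<Integer>> friends = new HashMap<>();
        for (int id = 0; id < peopleCount; id++) {
            pool.add(id);
            friends.put(id, new HashSet<>());
        }

        for (int id = 0; id < peopleCount; id++) {
            Collections.shuffle(pool, rng);                            // the copy-and-shuffle step
            int wanted = minFriends + rng.nextInt(maxFriends - minFriends + 1);
            for (int j = 0; j < pool.size() && friends.get(id).size() < wanted; j++) {
                int candidate = pool.get(j);
                if (candidate != id) friends.get(id).add(candidate);   // nobody befriends themselves
            }
        }

        // Reciprocation pass: if A lists B, make sure B also lists A.
        for (Map.Entry<Integer, Set<Integer>> e : friends.entrySet()) {
            for (int friend : e.getValue()) {
                friends.get(friend).add(e.getKey());
            }
        }
        return friends;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> network = generate(1000, 3, 10, new Random());
        System.out.println("Person 0's friends: " + network.get(0));
    }
}

As the final 5-27 range shows, the reciprocation pass can push some people well past the upper bound; if maxFriends must be strict, the second pass needs a rejection or rebalancing step.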

n! combinations, how to find best one without killing computer?

I'll get straight to it. I'm working on a web or phone app that is responsible for scheduling. I want students to input courses they have taken, and I give them possible combinations of courses they should take that fit their requirements.
However, let's say there are 150 courses that fit their requirements and they're looking for 3 courses. That would be 150C3 combinations, right?
Would it be feasible to run something like this in a browser or on a mobile device?
First of all you need a smarter algorithm which can prune the search tree. Also, if you are doing this for the same set of courses over and over again, doing the computation on the server would be better, and perhaps precomputing a suitable data structure can reduce the execution time of the queries. For example, you can create a tree where each sub-tree under a node contains nodes that are 'compatible'.
Sounds to me like you're viewing this completely wrong. At most institutions there are 1) curriculum requirements for graduation, and 2) prerequisites for many requirements and electives. This isn't a pure combinatorial problem, it's a dependency tree. For instance, if Course 201, Course 301, and Course 401 are all required for the student's major, higher numbers have the lower numbered ones as prereqs, and the student is a Junior, you should be strongly recommending that Course 201 be taken ASAP.
Yay, mathematics I think I can handle!
If there are 150 courses and you have to choose 3, then the number of possibilities is (150*149*148)/(3*2*1) = 551,300 (correction per jerry), which is certainly better than 150 factorial, which has a whole lot more zeros ;)
Now, you really don't want to build an array that size, and you don't have to! All web languages have a way of randomly choosing an element from an array, so you can take the array of eligible courses and request 3 random, unique entries from it.
While the number of potential course combinations is very large, based on your post I see no reason to even attempt to calculate them. The task of randomly selecting k items from an n-sized list is delightfully trivial, even for old, slow devices!
Is there any particular reason you'd need to calculate all the potential course combinations, instead of just grab-bagging one random selection as a suggestion? If not, problem solved!
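A minimal sketch of that grab-bag selection in Java; the course names and class name are just placeholders:

import java.util.*;

// Minimal sketch: pick k random, distinct courses from the eligible list instead
// of enumerating all C(n, k) combinations.
public class CourseSuggester {

    static List<String> suggest(List<String> eligibleCourses, int k, Random rng) {
        List<String> copy = new ArrayList<>(eligibleCourses);
        Collections.shuffle(copy, rng);                   // O(n), fine even on a phone
        return copy.subList(0, Math.min(k, copy.size()));
    }

    public static void main(String[] args) {
        List<String> courses = List.of("CS101", "MATH201", "PHYS150", "ENG110", "BIO120");
        System.out.println(suggest(courses, 3, new Random()));
    }
}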
Option 1 (Time/Space costly): let the user on a mobile phone browse the list of (150*149*148) possible choices, page by page, with the processing done on the server side.
Option 2 (Simple): instead of the (150*149*148)-item decision tree, provide a 150-item bag; if he chooses an item from the bag, remove it from the bag.
Option 3 (Complex): expand your decision tree (possible choices) using a dependency tree (a parent course requires child courses), the list of courses already taken by the student, and his track/level.
As far as I know, most educational systems use the third option, which requires having a profile for the student.

Multi Attribute Matching of Profiles

I am trying to solve a problem for a dating site. Here is the problem:
Each user of the app will have some attributes - like the books he reads, the movies he watches, music, TV shows, etc. These are the defined top-level attribute categories. Each of these categories can have any number of values, e.g. in books: Fountain Head, Love Story ...
Now, I need to match users based on profile attributes. Here is what I am planning to do:
Store the data with reverse indexing, i.e. each of Fountain Head, Love Story, etc. is an index key to the set of users with that attribute.
When a new user joins, get the attributes of this user, find the index keys for this user, get all the users for these keys, and bucket-sort (or radix sort or similar) to rank users by how many times they appear in this merged list.
Is this good, bad, worse? Any other suggestions?
Thanks
Ajay
The algorithm you described is not bad, although it uses a very simple notion of similarity between people.
Let us make it more adjustable, without creating complicated matching criteria. Let's say people who like the same book are more similar than people who listen to the same music. The same goes for every interest. That is, similarity in different fields has different weights.
Like you said, you can keep a list for each interest (like a book, a song, etc.) of the people who have it in their profile. Then, say you want to find matches for a user g:
for each interest i in g's interests:
    for each person p in the list of i:
        if p and g have mismatching sexual preferences:
            continue
        if p is already in g's match list:
            g->match_list[p].score += i->match_weight
        else:
            add p to g->match_list with score i->match_weight
sort g->match_list based on score
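The same pass sketched in Java, with a hypothetical User type and a placeholder preference check standing in for details the pseudocode leaves open:

import java.util.*;

// Minimal Java sketch of the weighted scoring pass above. The User type and its
// fields are hypothetical placeholders, not part of the original post.
class User {
    String name;
    Set<String> interests = new HashSet<>();
    User(String name) { this.name = name; }
}

class Matcher {
    // Reverse index from the question: interest -> users who list that interest.
    Map<String, List<User>> usersByInterest = new HashMap<>();
    // Per-interest weights ("liking the same book counts more than the same song").
    Map<String, Double> weightOfInterest = new HashMap<>();

    List<Map.Entry<User, Double>> matchesFor(User g) {
        Map<User, Double> scores = new HashMap<>();
        for (String interest : g.interests) {
            double weight = weightOfInterest.getOrDefault(interest, 1.0);
            for (User p : usersByInterest.getOrDefault(interest, List.of())) {
                if (p == g || !compatible(g, p)) continue;   // skip self and mismatched preferences
                scores.merge(p, weight, Double::sum);
            }
        }
        List<Map.Entry<User, Double>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort(Map.Entry.<User, Double>comparingByValue().reversed());
        return ranked;
    }

    boolean compatible(User a, User b) {
        return true;   // placeholder for the sexual-preference check in the pseudocode
    }
}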
The choice of weights is not a simple task though. You would need a lot of psychology to get that right. Using your common sense, however, you could get values that are not that far off.
In general, matching people is much more complicated than summing some scores. For example, a certain set of matching interests may have more (or in some cases less) effect than the sum of them taken individually. Also, an interest of one person may result in outright rejection by the other, no matter what other matching interests exist (take two otherwise very similar people where one loves and the other hates Twilight, for example).
