Algorithm for equal groups according to parameters

I have data on a set of people. Each person has grades for a few parameters.
I want to divide the people into N groups that are as equal as possible in all the parameters.
The parameters are ranked by priority. For example, it is most important that parameter 1 is
equal across the groups, the second parameter has second priority, and the last parameter has the lowest priority.
For example:
There are 100 people with data like this:
people1 = ["param1"=12,"param2"=70,"param3"=6]
people2 = ["param1"=9,"param2"=79,"param3"=2]
and I want to divide the people into 3 groups (of more or less the same size)
whose grades are as equal as possible.
Can someone help me or suggest an idea?
Thanks in advance.

This post makes me think of being a kid and playing soccer in the yard with other kids.
Two captains were selected, and they took turns choosing one player each from the pool for their team. This way the teams were balanced at the end.
You can certainly make an algorithm from this story; it's super easy (even for kids :) and gives good results on large amounts of data.
The only thing you need is to sort the data by the "strength" of the players and divide them.
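The captains' draft can be sketched in Python roughly as follows. This is only a minimal sketch under assumptions that are not in the original post: `split_into_groups` is a hypothetical name, "strength" is taken to be the tuple (param1, param2, param3) so that param1 dominates the ordering, and the picks run in "snake" order (1, 2, 3, 3, 2, 1, ...) so that the first captain does not always get the better player.

```python
def split_into_groups(people, n_groups, keys=("param1", "param2", "param3")):
    """Deal people into n_groups like captains picking teams in turn."""
    # Strongest first; earlier keys have higher priority (assumed "strength").
    ranked = sorted(people, key=lambda p: tuple(p[k] for k in keys), reverse=True)
    groups = [[] for _ in range(n_groups)]
    for i, person in enumerate(ranked):
        round_no, pos = divmod(i, n_groups)
        # Snake order: reverse the picking direction on every other round.
        target = pos if round_no % 2 == 0 else n_groups - 1 - pos
        groups[target].append(person)
    return groups


# The first two rows come from the question; the rest are made up.
people = [
    {"name": "people1", "param1": 12, "param2": 70, "param3": 6},
    {"name": "people2", "param1": 9, "param2": 79, "param3": 2},
    {"name": "people3", "param1": 11, "param2": 65, "param3": 4},
    {"name": "people4", "param1": 10, "param2": 80, "param3": 9},
    {"name": "people5", "param1": 8, "param2": 55, "param3": 1},
    {"name": "people6", "param1": 7, "param2": 90, "param3": 3},
]

for number, group in enumerate(split_into_groups(people, 3), start=1):
    print(number, [p["name"] for p in group])
```

Because the sort is lexicographic, the lower-priority parameters only break ties in param1. That is one possible reading of the priority requirement, not the only one.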

Related

How to design an algorithm to put elements into groups with constraints?

I was given a task of putting students into groups (to prepare a coding camp), but with several constraints. Though I've finished the task by hand, I'd like to know whether algorithms for tasks like this already exist, or how I could design one.
Background: 40 students in total, with these attributes:
gender: F/M
grade: Year 1/2
school: School 1/School 2/...
early assessment result: Rank from 1 to 40
Constraints: All of them need to be satisfied.
Exactly 4 people per group
Each group needs to have at least one girl
Each group needs to have at least one Year 2 student
The 4 group members need to come from 4 different schools
Each group needs to have at least one student who ranked in the top 10 in the early assessment
What I'm expecting:
The Best: An existing algorithm/program for this kind of problem
Or, An algorithm for this specific problem
Or at least, Some ideas of creating an algorithm for this specific problem
My thoughts:
Since I've succeeded in making the groups by hand, I know that a solution exists for my current dataset. But an algorithm should first check whether a solution can exist at all, by checking that the number of girls and the number of Year 2 students are each at least 10 (pigeonhole principle: 10 groups each need one), along with some other conditions. Constraint 5 is obviously the easiest and can provide a base solution for the rest. However, I still cannot find a systematic way of doing it. Perhaps brute force and randomization can help? I'm not sure.
And sorry, since the data is confidential, I can not post it.
Update: After consulting a friend, here is a possible method:
First put the top 1 to 10 into 10 different groups.
Then iterate through groups. If the only person in the group is a boy/girl, try to add a girl/boy from a different school.
Then the problem size is reduced from 2^40 to 2^20, making brute force a viable solution.
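Before searching, it helps to be able to check a candidate group and to rule out impossible inputs. The sketch below is an illustration under assumptions rather than a solver: the `Student` record and both function names are hypothetical, and the necessary conditions are just pigeonhole-style checks (passing them does not guarantee that a solution exists).

```python
from collections import Counter
from dataclasses import dataclass

GROUP_SIZE = 4
TOP_N = 10  # "top 10 in early assessment"


@dataclass(frozen=True)
class Student:  # hypothetical record; field names are assumptions
    name: str
    gender: str  # "F" or "M"
    year: int    # 1 or 2
    school: str
    rank: int    # 1 = best


def group_is_valid(group):
    """Check the five constraints for one group."""
    return (
        len(group) == GROUP_SIZE
        and any(s.gender == "F" for s in group)
        and any(s.year == 2 for s in group)
        and len({s.school for s in group}) == GROUP_SIZE
        and any(s.rank <= TOP_N for s in group)
    )


def necessary_conditions_hold(students):
    """Pigeonhole checks: failing any of them proves no solution exists."""
    if not students or len(students) % GROUP_SIZE:
        return False
    n_groups = len(students) // GROUP_SIZE
    largest_school = max(Counter(s.school for s in students).values())
    return (
        sum(s.gender == "F" for s in students) >= n_groups
        and sum(s.year == 2 for s in students) >= n_groups
        and sum(s.rank <= TOP_N for s in students) >= n_groups
        and largest_school <= n_groups  # one student per school per group
    )
```

A brute-force or randomized search could then call `group_is_valid` on every candidate group and stop at the first grouping in which all groups pass.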

Creating random overlapping groups

I'm trying to populate a database with sample data, and I'm hoping there's an algorithm out there that can speed up this process.
I have a database of sample people and I need to create a sample network of friend pairings. For example, person 1 might be friends with persons 2, 3, 4, and 7, and person 2 would obviously be friends with person 1, but not necessarily with any of the others.
I'm hoping to find a way to automate the process of creating these randomly generated lists of friends within certain parameters, like a minimum and maximum number of friends.
Does something like this exist or could someone point me in the right direction?
I'm not sure if this is the ideal solution, but it worked for me. Generally the steps were:
Start with an array of people.
Copy the array and shuffle it.
Give each person in the first array a random number (within a range) of random friends (second array).
Remove the person from their own list of friends.
Iterate through each person in each friend list and see if the owner of the list is in their friend's list and if not, add it.
I used a pool of 1000 people with an initial range of 3-10 friends, and after adding the reciprocals the final range was about 5-27, which was good enough for me.
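The steps above can be sketched in Python roughly like this. It is a slight variation, not the poster's actual code: instead of shuffling a copy and removing the person afterwards, it samples directly from everyone else. The function name, its parameters, and the `seed` argument are assumptions made for illustration.

```python
import random


def make_friend_network(people, min_friends=3, max_friends=10, seed=None):
    """Return {person: set of friends} with every friendship reciprocated."""
    rng = random.Random(seed)
    friends = {}
    for person in people:
        # Everyone except the person themselves is a candidate friend.
        others = [p for p in people if p != person]
        low = min(min_friends, len(others))
        high = min(max_friends, len(others))
        friends[person] = set(rng.sample(others, rng.randint(low, high)))
    # Add the reciprocal links: if B is in A's list, put A in B's list.
    for person, own_friends in friends.items():
        for friend in own_friends:
            friends[friend].add(person)
    return friends


network = make_friend_network(list(range(1000)), 3, 10, seed=42)
sizes = [len(f) for f in network.values()]
print(min(sizes), max(sizes))
```

As in the answer, adding the reciprocals pushes the upper end of the range above `max_friends`; if a hard maximum matters, links would have to be rejected once either side is full.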

How to give players a score on a ranking/prediction task?

I have a website built with php/mysql, and I am looking for help in communicating to a Programmer what I want him to do with a Poll/Prediction game that I am trying to create.
For purposes of discussion, assume a game where perhaps 100 players try to predict the top 5 finishers in a Golf Tournament of perhaps 9 Golfers.
I am looking for help in how to create and assign a score based upon the accuracy of prediction.
The players provide a rank ordering using a drag and drop function to order the players from 1 through 5. This ordering has already been coded, and the ranks are stored somehow in the DB (I do not know how).
My initial thinking is to ask the coder to create a script which will assign a score from 1 to 5 for each Golfer that the player nominated to be in the Top 5.
So, a player who predicted perfectly would be awarded a perfect score of 12345.
His first golfer received a 1 for finishing first, second a 2 for finishing second, third golfer receives a 3 for finishing third, and so on.
Anybody less than perfect would have a score higher than 12345.
Players who got the first four positions correct would have to be differentiated on the basis of the finish of their fifth Golfer.
So, one might score 12347 and the other 12348 and the player with the highest score (12348) would be the loser in a matchup of the two players.
A player who did poorly, might have a score of 53419.
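For illustration, here is a minimal Python sketch of the scheme just described. The function names and the `finish_position` mapping are hypothetical; nothing here reflects how the ranks are actually stored in the DB. Gluing digits together only works while every finishing position is a single digit (9 golfers or fewer), so a tuple-based variant that compares the same way is shown alongside it.

```python
def concatenated_score(predicted, finish_position):
    """Glue together the actual finishing positions of the predicted golfers."""
    return int("".join(str(finish_position[g]) for g in predicted))


def position_tuple(predicted, finish_position):
    """Same ordering as above, but still correct with 10 or more golfers."""
    return tuple(finish_position[g] for g in predicted)


# Hypothetical tournament result: golfer -> actual finishing position.
finish_position = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5,
                   "F": 6, "G": 7, "H": 8, "I": 9}

print(concatenated_score(["A", "B", "C", "D", "E"], finish_position))
print(concatenated_score(["A", "B", "C", "D", "G"], finish_position))
print(position_tuple(["E", "C", "D", "A", "I"], finish_position))
```

Lower is better in both variants, and sorting players by either value gives the same ranking as long as positions stay below 10.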
Question:
Is this a viable way of creating a score which the players of my game can be ranked upon?
Is it possible to instead simply have something like a Spearman Rank-Order Correlation calculated comparing the Actual Finish Positions with the Predicted Finish Positions for each player,
and then rank players on the basis of the correlation coefficients for their rankings?
Thanks for any help in clarifying how to conceptualize this before approaching a programmer who gets annoyed when I don't really know what I want him to do ahead of time.
It's quite an interesting problem.
It seems that there are three components that need to be considered in the scoring: the number of correct predictions, the order of correct predictions, and the weight of correct predictions.
For example, assume the truth is:
1,5,10,15,20
Here are some predictions:
1,6,7,8,9 : only predicted first one
2,1,10,21,30 : 1 and 10, but the order of 1 is incorrect
20,15,1,5,30 : hit four in the top 5, but the orders are incorrect
It depends on what you value most. You may first check how many of the top 5 the user has predicted, add a value for each, and then penalize wrong orders. The weight for each position should also be different, so that
1,5,10,15,20 will rank higher than 1,5,10,20,15, which in turn ranks higher than 1,10,5,20,15
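One way to combine those three components is sketched below. The concrete numbers are assumptions chosen for illustration, not something the answer specifies: position i (counting from 0) is worth n - i points, a golfer in the right position earns the full weight, and a golfer who finished in the top 5 but in a different position earns half of it.

```python
def weighted_score(predicted, truth, partial_credit=0.5):
    """Higher is better. Earlier positions carry more weight."""
    n = len(truth)
    truth_set = set(truth)
    score = 0.0
    for i, golfer in enumerate(predicted[:n]):
        weight = n - i  # 5, 4, 3, 2, 1 for a top-5 prediction
        if golfer == truth[i]:
            score += weight                   # right golfer, right position
        elif golfer in truth_set:
            score += partial_credit * weight  # right golfer, wrong position
    return score


truth = [1, 5, 10, 15, 20]
for prediction in ([1, 5, 10, 15, 20], [1, 5, 10, 20, 15], [1, 10, 5, 20, 15]):
    print(prediction, weighted_score(prediction, truth))
```

With these weights the three predictions come out in the order described above; other weights or a distance-based penalty would work equally well if they match what you value.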
Spearman may work, but I feel it could be too coarse for your purpose.
This is actually very similar to a problem that search engines have. E.g., in search engine evaluation, the actual outcomes are the preferred results provided by humans, and the predicted outcomes are the results delivered by the search engine. In both your task and for search engines, I'd guess you care a lot more about the accuracy of the winner than the accuracy of the 5th place finisher. If that is the case, then mean average precision is probably a good measure.
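As a rough sketch of how that measure could be computed here, assume a predicted golfer counts as "relevant" if they actually finished in the top 5. The function names are hypothetical, and treating relevance as binary is an assumption of this sketch: it rewards putting true top-5 golfers early, but does not reward getting their exact order right.

```python
def average_precision(predicted, actual_top):
    """Average precision of one player's ordered prediction."""
    relevant = set(actual_top)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, golfer in enumerate(predicted, start=1):
        if golfer in relevant:
            hits += 1
            precision_sum += hits / k  # precision among the first k picks
    return precision_sum / len(relevant)


def mean_average_precision(predictions, actual_tops):
    """Mean of the average precision over several tournaments."""
    scores = [average_precision(p, a) for p, a in zip(predictions, actual_tops)]
    return sum(scores) / len(scores) if scores else 0.0


truth = [1, 5, 10, 15, 20]
for prediction in ([1, 6, 7, 8, 9], [2, 1, 10, 21, 30], [20, 15, 1, 5, 30]):
    print(prediction, average_precision(prediction, truth))
```

For a single tournament, players can be ranked by `average_precision` directly; the mean only comes into play when scores are accumulated over several tournaments.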

Simulating amazon.com best seller for books

I was just browsing amazon.com, and one interesting thing that caught my eye is how they calculate best sellers for books.
I was thinking of writing a sample program to calculate this. Suppose I am calculating best sellers for the month: I would just sum the sales count of each individual book and show the top 10. Is that OK, or am I missing something?
EDIT
One more interesting thing can happen: suppose the book with id1 sold 10 copies on the first day and nothing after that, while the book with id2 sells 1 or 2 copies regularly. How would that affect the best-seller calculation? Thanks.
Sounds about right. Depends on how exactly you want to define it.
"best sellers" is the number of units sold.
Another way to do it, if you don't want to fix it to one month, is to pick some decay function of the sale's age t (like inverse-square decay, 1/t^2) and add up the sales weighted by that function.
This way, even though you don't have a fixed time window, you look at both newcomers and old books. Your function should look like this:
scores = {}
for a_book in books:
    score = 0
    for a_sale in sales[a_book]:
        # 1 + days avoids dividing by zero for a sale made today
        score += 1 / (1 + days(now() - a_sale.time())) ** 2
    scores[a_book] = score
I think you get the idea. You can try different functions like exp(-days) or different powers. Experiment and see what makes sense for you.
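A self-contained version of the same idea might look like the sketch below. The data layout (a dict from book id to a list of sale timestamps), the function names, the 7-day half-life, and the sample data are all assumptions made for illustration; the sample mimics the id1/id2 scenario from the question.

```python
from datetime import datetime, timedelta


def inverse_square(age_days):
    """Weight of a sale that is age_days old (1 for a sale made today)."""
    return 1.0 / (1 + age_days) ** 2


def exponential(age_days, half_life_days=7.0):
    """Alternative decay: a sale loses half its weight every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def best_sellers(sales, now, weight=inverse_square, top=10):
    """Return book ids sorted by decayed sales score, best first."""
    scores = {
        book: sum(weight((now - sold_at).days) for sold_at in times)
        for book, times in sales.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top]


now = datetime(2024, 1, 31)
sales = {
    # id1: 10 copies sold 29 days ago, nothing since.
    "id1": [now - timedelta(days=29)] * 10,
    # id2: one copy every fourth day, including today.
    "id2": [now - timedelta(days=d) for d in range(0, 30, 4)],
}
print(best_sellers(sales, now))
print(best_sellers(sales, now, weight=exponential))
print(best_sellers(sales, now, weight=lambda age_days: 1))  # plain count
```

With a constant weight this reduces to the plain monthly sum from the question, so the decay function is the only thing that needs tuning.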

Rating Algorithm

I'm trying to develop a rating system for an application I'm working on. Basically, the app allows you to rate an object from 1 to 5 (represented by stars). But I of course know that just keeping a rating count and adding the rating to the number itself is not feasible.
So the first thing that came to mind was dividing the received rating by the total number of ratings given. For example, if the object has received a rating of 2 from a user and has been rated 100 times, I might add 2/100. However, I believe this method is not good enough because 1) it is a naive approach, and 2) to get the number of times the object has been rated I have to do a lookup in the DB, which might end up having time complexity O(n).
So I was wondering what alternative and possibly better ways there are to approach this problem.
You can keep 2 additional values in the DB: the number of times the object was rated and the total sum of all its ratings. This way, to update the object's rating you only need to:
Add the new rating to the total sum and increment the number of times it was rated.
Divide the total sum by the number of times it was rated.
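In code, the two stored values could be handled roughly as follows. This is an in-memory sketch with a hypothetical class name; in the real application `total` and `count` would be two columns on the object's row, updated in a single statement.

```python
class RatingTotals:
    """Running sum and count of ratings; the average costs O(1) to update."""

    def __init__(self):
        self.total = 0  # sum of all ratings received
        self.count = 0  # number of ratings received

    def add(self, stars):
        if not 1 <= stars <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.total += stars
        self.count += 1

    @property
    def average(self):
        # No ratings yet: report 0.0 rather than dividing by zero.
        return self.total / self.count if self.count else 0.0


totals = RatingTotals()
for stars in (5, 4, 2):
    totals.add(stars)
print(totals.average)
```

No scan over past ratings is needed, which addresses the O(n) lookup concern in the question.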
There are many approaches to this, but before that, check:
Whether all feedback givers are treated equally or some have more weight than others (like a panel review, etc.)
Whether the objective is to provide only an average, or a score band or similar. Consider a scenario like this website, which shows a total reputation score.
And yes, if an average is to be computed, you need the total and the count of feedback and then have to compute it; that's plain maths. But if you need any other method, be prepared for more compute cycles. You will have to balance database hits against compute cycles, but that's the next stage of design. First get your requirements and your approach to the solution in place.
I think you should keep separate counters for 1 stars, 2 stars, and so on. To calculate the rating, you'd compute rating = (1*numOneStars + 2*numTwoStars + 3*numThreeStars + 4*numFourStars + 5*numFiveStars) / (numOneStars + numTwoStars + numThreeStars + numFourStars + numFiveStars)
This way you can, like Amazon, also show how many people voted 1 star and how many voted 5 stars...
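A small sketch of the per-star counters, again in memory and with a hypothetical class name; the five counters would normally be five columns in the DB.

```python
class StarCounts:
    """One counter per star value; supports both average and breakdown."""

    def __init__(self):
        self.counts = [0, 0, 0, 0, 0]  # counts[0] = 1-star votes, ...

    def add(self, stars):
        if not 1 <= stars <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.counts[stars - 1] += 1

    def average(self):
        votes = sum(self.counts)
        if votes == 0:
            return 0.0
        weighted = sum(star * n for star, n in enumerate(self.counts, start=1))
        return weighted / votes

    def breakdown(self):
        """Map each star value to its number of votes."""
        return {star: n for star, n in enumerate(self.counts, start=1)}


ratings = StarCounts()
for stars in (5, 5, 1, 3):
    ratings.add(stars)
print(ratings.average(), ratings.breakdown())
```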
Have you considered a vote up/down mechanism over numbers of stars? It doesn't directly solve your problem but it's worth noting that other sites such as YouTube, Facebook, StackOverflow etc all use +/- voting as it is often much more effective than star based ratings.

Resources