What is the field of study of this algorithm problem?

In my app, people can rate other people with a mark out of ten. Every day at midnight, I would like to run an algorithm that computes the best "match" for each person.
At the end of the day, I will have, for example:
(ID_person_who_gave_mark, ID_person_who_received_mark, mark)
(1, 2, 7.5) // 1 gave to 2 a 7.5/10
(1, 3, 9) // etc..
(1, 4, 6)
(2, 1, 5.5)
(2, 3, 4)
(2, 4, 8)
(3, 1, 3)
(3, 2, 10)
(3, 4, 9)
(4, 1, 2)
// no (4, 2, xx) because 4 didn't give any mark to 2
(4, 3, 6.5)
At the end of the algorithm, I would like every person to have their best match, that is, the best compromise to "make everyone happy".
In my example, I would say that person 1 gave a 9/10 to 3 but 3 gave a 3/10 to 1, so they definitely can't match; 1 gave a 7.5/10 to 2 and 2 gave a 5.5/10 to 1, so why not; and finally, 1 gave a 6/10 to 4 but 4 gave a 2/10 to 1, so they can't match (under 5/10 = they can't match). So for person 1, the only possible match is 2, but I have to check whether 1 is also a good match for 2.
2 gave a 4/10 to 3, so (2, 3) is out (under 5/10), but 2 gave an 8/10 to 4, so (2, 4) would be much better for 2 than (2, 1).
Let's look at 4: 4 didn't give any mark to 2, so they can't match. Only one possibility is left for 2, so we make the match (2-1).
Now let's look at 3: with 1 it's out (3/10); with 2 it would be great for 3 (10/10), but 2 gave 3 a 4/10, so it's out. 3 gave a nice 9/10 to 4, so that would be good too. Let's check 4: 4 gave 2/10 to 1, so it's out with 1 (under 5/10), and gave a pretty good 6.5 to 3. So the best match is finally between 4 and 3.
So our final matches in this example are (1-2) and (3-4).
Even when I do this intuitively, as I just did, it's hard to find a systematic procedure that computes all of this.
Can you help me "mathematize" this problem, so that I have an algorithm that could do the calculation for, let's say, 50,000 people? Or at least, what field of study should I look into to solve this efficiently?
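One way to make this concrete (a sketch under stated assumptions, not a full answer): treat people as vertices of a graph, add an edge between two people only when each gave the other at least 5/10, and weight the edge by the lower of the two marks. Using the lower mark as the "compromise" score is an assumption. Choosing the pairs is then a weighted matching problem on a general graph (graph theory / combinatorial optimization). The sketch uses a simple greedy matching that keeps taking the heaviest remaining edge; it runs in O(E log E) for E mutual ratings but is only guaranteed to reach at least half of the best possible total weight. An exact maximum-weight matching needs Edmonds' blossom algorithm. The function names are hypothetical.

```python
def mutual_edges(marks, threshold=5.0):
    """Return (weight, a, b) for every pair who rated each other >= threshold.

    ASSUMPTION: the weight is the lower of the two marks (a compromise score).
    """
    given = {(giver, receiver): mark for giver, receiver, mark in marks}
    edges = []
    for (a, b), mark_ab in given.items():
        if a < b and (b, a) in given:  # visit each unordered pair once
            mark_ba = given[(b, a)]
            if mark_ab >= threshold and mark_ba >= threshold:
                edges.append((min(mark_ab, mark_ba), a, b))
    return edges


def greedy_matching(marks, threshold=5.0):
    """Greedily pair people, taking the highest-weight mutual edges first."""
    matched = set()
    pairs = []
    for weight, a, b in sorted(mutual_edges(marks, threshold), reverse=True):
        if a not in matched and b not in matched:
            matched.update((a, b))
            pairs.append((a, b))
    return pairs


marks = [
    (1, 2, 7.5), (1, 3, 9), (1, 4, 6),
    (2, 1, 5.5), (2, 3, 4), (2, 4, 8),
    (3, 1, 3), (3, 2, 10), (3, 4, 9),
    (4, 1, 2), (4, 3, 6.5),
]
print(greedy_matching(marks))  # [(3, 4), (1, 2)]
```

On the question's data, only (1, 2) and (3, 4) survive the 5/10 filter in both directions, so the sketch reproduces the manual result.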

Related

THE JOSEPHUS PROBLEM? (concrete mathematics)

"Thanks in advance for giving your precious time..."
In our variation, we start with n people numbered 1 to n around a circle, and we eliminate every second remaining person until only one survives.
As the saying goes, smart mathematicians are not ashamed of thinking small!
So we will start with a group of only 10 people around the circle.
The elimination order is 2, 4, 6, 8, 10, 3, 7, 1, 9, so 5 survives. The problem: determine the survivor's number, J(n).
We just saw that J(10) = 5. We might conjecture that J(n) = n/2 when n is even, and the case n = 2 supports the conjecture: J(2) = 1. But a few other small cases dissuade us: the conjecture fails for n = 4 and n = 6.
n    | 1 | 2 | 3 | 4 | 5 | 6
J(n) | 1 | 1 | 3 | 1 | 3 | 5
For n = 1 there is no second person to eliminate, so clearly J(1) = 1. For n = 2, person 2 is next to person 1 in the circle, so the second person (2) gets eliminated and J(2) = 1. That is clear and fine! But with 3 people in the circle, the 2nd person gets eliminated and 1 and 3 are left, so why does the book show that J(3) = 3?
Here I am unable to understand why J(3) = 3, J(4) = 1 and J(6) = 5.
We know that every second person (that is, the person immediately after the person we are currently looking at) will be killed.
For n = 3:
(1) 2 3 (looking at 1, kills 2)
1 (3) (looking at 3, kills 1)
(3) 3 survives
For n = 4:
(1) 2 3 4 (looking at 1, kills 2)
1 (3) 4 (looking at 3, kills 4)
(1) 3 (looking at 1, kills 3)
(1) 1 survives
For n = 6:
(1) 2 3 4 5 6 (looking at 1, kills 2)
1 (3) 4 5 6 (looking at 3, kills 4)
1 3 (5) 6 (looking at 5, kills 6)
(1) 3 5 (looking at 1, kills 3)
1 (5) (looking at 5, kills 1)
(5) 5 survives
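The traces above can be reproduced with a short simulation. As a sketch (function names are hypothetical): keep the circle in a `collections.deque`, rotate past the person we are looking at and remove the next one. The closed form J(2^m + l) = 2l + 1 for 0 <= l < 2^m, which Concrete Mathematics derives for this variant, is included so the two can be checked against each other.

```python
from collections import deque


def josephus_order(n):
    """Eliminate every second person among 1..n; return (order, survivor)."""
    circle = deque(range(1, n + 1))  # circle[0] is the person we are looking at
    order = []
    while len(circle) > 1:
        circle.rotate(-1)               # move past the person we are looking at
        order.append(circle.popleft())  # the next person is eliminated
    return order, circle[0]


def josephus_closed_form(n):
    """J(n) = 2l + 1 where n = 2**m + l and 0 <= l < 2**m."""
    m = n.bit_length() - 1  # largest m with 2**m <= n
    l = n - (1 << m)
    return 2 * l + 1


print(josephus_order(10))  # ([2, 4, 6, 8, 10, 3, 7, 1, 9], 5)
print([josephus_closed_form(n) for n in range(1, 7)])  # [1, 1, 3, 1, 3, 5]
```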

Creating simple repeating number sequence in SPSS

I want to create the following sequence in SPSS syntax. I've tried LOOP and DO REPEAT, but cannot figure out how to re-create this:
1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
Your question is really not clear enough, so I'm just guessing. Please edit your question so we can know if this is the right solution (and for the benefit of future readers).
If what you want is a variable that has the values 1, 1, 1, 2, 2, 2, 3, 3, 3, etc., here is a way to get that:
compute MyVar=trunc(($casenum-1)/3)+1.
exe.
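To see why the formula works, here is the same arithmetic written in Python (an illustration only, not SPSS syntax; the function and its `repeat` parameter are hypothetical). $casenum starts at 1, so subtracting 1, dividing by 3 and truncating gives 0, 0, 0, 1, 1, 1, ..., and adding 1 makes the sequence start at 1. For non-negative values, Python's `//` floor division gives the same result as truncation.

```python
def repeating_sequence(n_cases, repeat=3):
    """Mimic compute MyVar=trunc(($casenum-1)/3)+1 for case numbers 1..n_cases."""
    return [(casenum - 1) // repeat + 1 for casenum in range(1, n_cases + 1)]


print(repeating_sequence(15))  # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
```

Changing the divisor (3 in the SPSS line) changes how many times each value repeats.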

Getting the combination of face values that gives the highest score in a dice game

I'm working on a dice game for school and I'm having trouble figuring out how to calculate the result automatically. (We don't have to do it automatically; I could just let the player choose which dice to use and then check that the choices are valid.) But now that I have started to think about it, I can't stop...
The problem is as follows:
I have six dice, and they are normal dice with values 1-6.
In this example, I have already rolled the dice and they have the following values:
[2, 2, 2, 1, 1, 1]
But I don't know how to calculate the combinations so that as many dice combinations as possible whose values add up to 3 (in this example) are used.
The values should be added together (for example, a die with value 1 and another die with value 2 together make 3). There are different rounds in the game, where the aim is to reach different values (which can be a combination (sum) of die values). For example,
dice values: [2, 2, 2, 2, 2, 2]
could give the user a total of 12 points if 4 is the goal for the current round:
2 + 2 = 4
2 + 2 = 4
2 + 2 = 4
If the goal of the round were 6 instead, it would be
2 + 2 + 2 = 6
2 + 2 + 2 = 6
instead, which would also give the player 12 points (6 + 6).
[1, 3, 6, 6, 6, 6]
with a goal of 3 would use only the die with value 3 and discard the rest, since there is no way to add them up to three.
For the first example, [2, 2, 2, 1, 1, 1] with a goal of 3, the dice should be paired as
2 + 1 = 3
2 + 1 = 3
2 + 1 = 3
would give the user 9 points.
But if it were calculated the wrong way, with the ones used up together (1 + 1 + 1) instead of each 1 being paired with a 2, the player would get only 3 points and the twos couldn't be used.
Another example is:
[1, 2, 3, 4, 5, 6]
where every combination that adds up to 6 gives the user points:
[6], [5, 1], [4, 2]
user gets 18 points (3 * 6)
[1, 2, 3], [6]
user gets 12 points (2 * 6) (here the user gets six points less because of adding up 1 + 2 + 3 instead of doing as in the example above)
A die can have a value between 1 and 6.
I haven't really done much more than think about it. I'm pretty sure I could do it right now, but the solution would scale really badly if, for example, I wanted to use 8 dice instead, and every time I start programming it I think there has to be a better/easier way of doing it... Does anyone have any suggestions on where to start? I tried searching for an answer and I'm sure it's out there, but I have trouble formulating a query that gives me relevant results...
With problems that look confusing like this, it is a really good idea to start by working through some examples. We have 6 dice, each with a range of [1 to 6]. The possible combinations we could make are therefore:
target = 2
1 die: 2
2 dice: 1+1
target = 3
1 die: 3
2 dice: 2+1
3 dice: 1+1+1
target = 4
1 die: 4
2 dice: 3+1, 2+2
3 dice: 2+1+1
4 dice: 1+1+1+1
target = 5
1 die: 5
2 dice: 4+1, 3+2
3 dice: 3+1+1, 2+2+1
4 dice: 2+1+1+1
5 dice: 1+1+1+1+1
See the pattern? Hint: we go backwards from the target down to 1 for the first number we add; then, given this first number and the size of the combination, there is a limit to how big the subsequent numbers can be!
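The pattern in the hint can be written as a short recursive generator (a sketch; the name `combinations_for` is hypothetical). It counts the first number down from the target, and every later number is capped at the previous one, so each combination is produced once, largest number first. The size of each combination falls out of the recursion rather than being fixed in advance.

```python
def combinations_for(target, largest=6):
    """Yield each way to reach `target` as a sum of die faces (1..largest),
    written largest-first so every combination appears exactly once."""
    if target == 0:
        yield []
        return
    # Go backwards from the biggest usable first number down to 1.
    for first in range(min(target, largest), 0, -1):
        # Later numbers may not exceed `first`: that is the limit on their size.
        for rest in combinations_for(target - first, first):
            yield [first] + rest


print(list(combinations_for(5)))
# [[5], [4, 1], [3, 2], [3, 1, 1], [2, 2, 1], [2, 1, 1, 1], [1, 1, 1, 1, 1]]
```

Grouping the output by `len` gives the "1 die / 2 dice / ..." listing above.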
There is a finite list of possible combinations. You can start by looking for 1-die scores and removing those dice from the pool. Then move on to look for 2-dice scores, and so on.
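The size-by-size approach above is easy to implement, but it doesn't say which group to pick when several candidates share dice. If you want a guaranteed best score for a handful of dice, a small exhaustive search with memoization (a swapped-in technique, not the method described above) is still cheap: take the smallest remaining die, either leave it unused or put it in some group that reaches the target, and recurse on what's left. The name `best_score` is hypothetical, and the search is exponential in the number of dice, which is fine for 6 or 8 dice.

```python
from functools import lru_cache
from itertools import combinations


def best_score(dice, target):
    """Highest score: target * the most disjoint groups of dice that each sum to target."""

    @lru_cache(maxsize=None)
    def best_groups(remaining):
        # `remaining` is a sorted tuple of unused dice values.
        if not remaining:
            return 0
        first, rest = remaining[0], remaining[1:]
        best = best_groups(rest)  # option 1: leave the first die unused
        need = target - first
        if need < 0:  # every remaining die is already above the target
            return best
        if need == 0:  # the first die reaches the target on its own
            return max(best, 1 + best_groups(rest))
        seen = set()
        # option 2: combine the first die with some of the other dice
        for size in range(1, len(rest) + 1):
            for idx in combinations(range(len(rest)), size):
                group = tuple(rest[i] for i in idx)
                if sum(group) != need or group in seen:
                    continue
                seen.add(group)
                left = tuple(v for i, v in enumerate(rest) if i not in idx)
                best = max(best, 1 + best_groups(left))
        return best

    return target * best_groups(tuple(sorted(dice)))


print(best_score([2, 2, 2, 1, 1, 1], 3))  # 9
print(best_score([1, 2, 3, 4, 5, 6], 6))  # 18
```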
If you want to read more about this sub-field of mathematics, the term you need to look for is "Combinatorics". Have fun!

Moving maximum variant

Yesterday, I got asked the following question during a technical interview.
Imagine that you are working for a news agency. At every discrete point of time t, a story breaks. Some stories are more interesting than others. This "hotness" is expressed as a natural number h, with greater numbers representing hotter news stories.
Given a stream S of n stories, your job is to find the hottest story out of the most recent k stories for every t >= k.
So far, so good: this is the moving maximum problem (also known as the sliding window maximum problem), and there is a linear-time algorithm that solves it.
Now the question gets harder. Of course, older stories are usually less hot compared to newer stories. Let the age a of the most recent story be zero, and let the age of any other story be one greater than the age of its succeeding story. The "improved hotness" of a story is then defined as max(0, min(h, k - a)).
Here's an example:
n = 11, k = 4
S indices:                0  1  2  3  4  5  6  7  8  9 10
S values:                 1  3  1  7  1  3  9  3  1  3  1
mov max hot indices:               3  3  3  6  6  6  6  9
mov max hot values:                7  7  7  9  9  9  9  3
mov max imp-hot indices:           3  3  5  6  7  7  9  9
mov max imp-hot values:            4  3  3  4  3  3  3  3
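A direct O(nk) implementation of the definition is handy for checking any faster solution against the example above. A sketch (function names are hypothetical); with 0-based indices the example's first output is at t = k - 1 = 3.

```python
def improved_hotness(h, age, k):
    """max(0, min(h, k - a)) as defined in the question."""
    return max(0, min(h, k - age))


def moving_max_imp_hot_naive(values, k):
    """O(n * k) reference: for each t >= k - 1, scan the k most recent stories."""
    result = []
    for t in range(k - 1, len(values)):
        result.append(max(improved_hotness(values[i], t - i, k)
                          for i in range(t - k + 1, t + 1)))
    return result


print(moving_max_imp_hot_naive([1, 3, 1, 7, 1, 3, 9, 3, 1, 3, 1], 4))
# [4, 3, 3, 4, 3, 3, 3, 3]
```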
I was at a complete loss with this question. I thought about adding the index to every element before computing the maximum, but that gives you the answer for when the hotness of a story decreases by one at every step, regardless of whether it reached the hotness bound or not.
Can you find an algorithm for this problem with sub-quadratic (ideally: linear) running time?
I'll sketch a linear-time solution to the original problem involving a double-ended queue (deque) and then extend it to improved hotness with no loss of asymptotic efficiency.
Original problem: keep a deque containing the stories that are (1) newer or hotter than every other story so far and (2) in the window. At any given time, the hottest story in the queue is at the front. New stories are pushed onto the back of the deque, after popping every story from the back until a hotter story is found. Stories are popped from the front as they age out of the window.
For example:
S indices: 0 1 2 3 4 5 6 7 8 9 10
S values: 1 3 1 7 1 3 9 3 1 3 1
deque: (front) [] (back)
push (0, 1)
deque: [(0, 1)]
pop (0, 1) because it's not hotter than (1, 3)
push (1, 3)
deque: [(1, 3)]
push (2, 1)
deque: [(1, 3), (2, 1)]
pop (2, 1) and then (1, 3) because they're not hotter than (3, 7)
push (3, 7)
deque: [(3, 7)]
push (4, 1)
deque: [(3, 7), (4, 1)]
pop (4, 1) because it's not hotter than (5, 3)
push (5, 3)
deque: [(3, 7), (5, 3)]
pop (5, 3) and then (3, 7) because they're not hotter than (6, 9)
push (6, 9)
deque: [(6, 9)]
push (7, 3)
deque: [(6, 9), (7, 3)]
push (8, 1)
deque: [(6, 9), (7, 3), (8, 1)]
pop (8, 1) and (7, 3) because they're not hotter than (9, 3)
push (9, 3)
deque: [(6, 9), (9, 3)]
push (10, 1)
pop (6, 9) because it exited the window
deque: [(9, 3), (10, 1)]
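The trace above corresponds to the following sketch of the original (non-improved) moving maximum. It stores indices in the deque rather than (index, value) pairs, which is an implementation choice, and the name `moving_max` is hypothetical. Popping on `<=` keeps the most recent story on ties, which matches the index row of the example.

```python
from collections import deque


def moving_max(values, k):
    """Sliding-window maximum in O(n); returns (index, value) for each t >= k - 1."""
    q = deque()  # indices whose values strictly decrease from front to back
    out = []
    for t, v in enumerate(values):
        # Pop from the back every story that is not hotter than the new one.
        while q and values[q[-1]] <= v:
            q.pop()
        q.append(t)
        # At most one story leaves the window [t - k + 1, t] per step.
        if q[0] <= t - k:
            q.popleft()
        if t >= k - 1:
            out.append((q[0], values[q[0]]))
    return out


print(moving_max([1, 3, 1, 7, 1, 3, 9, 3, 1, 3, 1], 4))
# [(3, 7), (3, 7), (3, 7), (6, 9), (6, 9), (6, 9), (6, 9), (9, 3)]
```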
To handle the new problem, we modify how we handle aging stories. Instead of popping stories as they slide out of the window, we pop the front story as soon as its age bound k - a drops to or below its hotness h, because from then on its improved hotness is k - a and simply decreases by one per step. When determining the top story, only the front of the deque and the most recently popped story need to be considered.
In Python:
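import collections

# A deque entry: a story's raw hotness and the time it broke.
Elem = collections.namedtuple('Elem', ('hot', 't'))

def winmaximphot(hots, k):
    """Yield the maximum improved hotness of the last k stories for each t >= k - 1."""
    q = collections.deque()  # hotness strictly decreasing from front to back
    oldtop = 0  # improved hotness of the most recently popped front story
    for t, hot in enumerate(hots):
        # Drop stories from the back that are not hotter than the new one.
        while q and q[-1].hot <= hot:
            del q[-1]
        q.append(Elem(hot, t))
        # Pop the front story once its age bound k - age has dropped to its hotness;
        # from then on its improved hotness is k - age, tracked in oldtop.
        while q and q[0].hot >= k - (t - q[0].t) > 0:
            oldtop = k - (t - q[0].t)
            del q[0]
        if t + 1 >= k:
            yield max(oldtop, q[0].hot) if q else oldtop
        oldtop = max(0, oldtop - 1)  # the popped story ages by one step

print(list(winmaximphot([1, 3, 1, 7, 1, 3, 9, 3, 1, 3, 1], 4)))  # [4, 3, 3, 4, 3, 3, 3, 3]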
The idea is the following: each breaking story will beat all previous stories after k - h steps. For example, with k == 30 and hotness h == 28, the story will be hotter than all previous stories after 2 steps.
Let's keep track of all the moments in time when the next story will become the hottest. At step i, the moment when the current story will beat all previous ones is i + k - h.
So we will have a sequence of objects {news_date | news_beats_all_previous_ones_date}, kept in increasing order of news_beats_all_previous_ones_date:
{i1 | i1+k-h} {i3 | i3+k-h} {i4 | i4+k-h} {i7 | i7+k-h} {i8 | i8+k-h}
At the current step we get i9 + k - h; we add it to the end of this list after removing all entries with a larger value (since the sequence is increasing, this is easy).
Once the first element's news_beats_all_previous_ones_date equals the current date (i), that story becomes the answer to the sliding-window query, and we remove this item from the sequence.
So you need a data structure that can add to the end and remove from both the beginning and the end: a deque. The time complexity of the solution is O(n).

Suggestions for optimizing length of bins for a time period

I have an optimisation problem where I need to optimise the lengths of a fixed number of bins over a known period of time. The bins should contain as few overlapping items with the same tag as possible (see the definition of items and tags below).
If the problem can be solved heuristically that is fine, the exact optimum is not important.
I was wondering if anybody has suggestions for approaches to try, or any idea what this problem is called.
The problem
Let's say we have n items, each with two attributes: a tag and a time range.
For an example we have the following items:
(tag: time range (s))
(1: 0, 2)
(1: 4, 5)
(1: 7, 8)
(1: 9, 15)
(2: 0, 5)
(2: 7, 11)
(2: 14, 20)
(3: 4, 6)
(3: 7, 11)
(4: 5, 15)
When plotted, this would look as follows:
Let's say we have to bin this 20-second period into 4 groups. We could do this with 4 groups of 5 seconds each.
It would look something like this:
The number of overlapping items with the same tag would be
Group 1: 1 (tag 1)
Group 2: 2 (tag 1 and tag 3)
Group 3: 2 (tag 2)
Group 4: 0
Total overlapping items: 5
Another possible grouping into 4 groups would use lengths of 4, 3, 2 and 11 seconds.
The number of overlapping items with the same tag would be:
Group 1: 0
Group 2: 0
Group 3: 0
Group 4: 1 (tag 2)
Total overlapping items: 1
Attempts to solve (brute force)
I can find the optimum solution by dividing the whole period into small segments (say 1 second each; for the above example there would be 20 segments).
I can then find all the integer compositions of 20 into 4 parts. This gives C(19, 3) = 969 different compositions, e.g.:
(1, 1, 4, 14), (9, 5, 5, 1), (1, 4, 4, 11), (13, 3, 3, 1), (3, 4, 4, 9), (10, 5, 4, 1), (7, 6, 6, 1), (1, 3, 5, 11), (2, 4, 4, 10) ......
For (1, 1, 4, 14) the grouping would be 4 groups of 1, 1, 4 and 14 seconds.
I then find the composition with the best score (smallest number of overlapping tags).
The problem with this approach is that it can only be done on relatively small numbers as the number of compositions of an integer gets incredibly large when the size of the integer increases.
Therefore, if my data spans 1000 seconds and I use 1-second segments, the run time would be far too long.
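Because the total score is just the sum of the per-bin scores, there is no need to enumerate compositions: a dynamic program over cut positions (a swapped-in technique, not something from the question) finds the exact optimum with roughly groups × T² / 2 bin evaluations for T one-second segments, which is a few million for T = 1000 and a handful of groups. The scoring rule in `bin_cost` is an assumption about how overlaps are counted (for each tag, the number of its items touching the bin's interior, minus one); replace it with the real definition. All names here are hypothetical, and the sketch assumes total >= groups.

```python
from collections import Counter

# Items from the question: (tag, start, end) in seconds.
ITEMS = [
    (1, 0, 2), (1, 4, 5), (1, 7, 8), (1, 9, 15),
    (2, 0, 5), (2, 7, 11), (2, 14, 20),
    (3, 4, 6), (3, 7, 11),
    (4, 5, 15),
]


def bin_cost(items, a, b):
    """ASSUMED score for the bin [a, b): for each tag, the number of its items
    that overlap the bin, minus one (so a lone item costs nothing)."""
    counts = Counter(tag for tag, start, end in items if start < b and end > a)
    return sum(c - 1 for c in counts.values())


def best_bins(items, total, groups):
    """Exact minimum total cost over all ways to cut [0, total) into `groups`
    bins with positive integer lengths. Returns (cost, lengths)."""
    inf = float("inf")
    # best[g][t]: cheapest way to cover [0, t) with g bins; cut[g][t]: last cut.
    best = [[inf] * (total + 1) for _ in range(groups + 1)]
    cut = [[0] * (total + 1) for _ in range(groups + 1)]
    best[0][0] = 0
    for g in range(1, groups + 1):
        for t in range(g, total + 1):
            for s in range(g - 1, t):  # the g-th bin is [s, t)
                if best[g - 1][s] == inf:
                    continue
                cost = best[g - 1][s] + bin_cost(items, s, t)
                if cost < best[g][t]:
                    best[g][t], cut[g][t] = cost, s
    # Walk back through the stored cuts to recover the bin lengths.
    lengths, t = [], total
    for g in range(groups, 0, -1):
        s = cut[g][t]
        lengths.append(t - s)
        t = s
    return best[groups][total], lengths[::-1]


print(best_bins(ITEMS, 20, 4))
```

Under this scoring rule, the best 4-group cost for the example is 1, the same score as the 4, 3, 2, 11 grouping; precomputing `bin_cost` for every (s, t) pair would make larger inputs faster.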
Attempts to solve (heuristically)
I have tried a genetic-algorithm-type approach, where chromosomes are randomly created compositions of lengths and genes are the individual lengths of each group.
Due to the nature of the data, though, I am struggling to do any meaningful crossover/mutation.
Does anyone have any suggestions?
