Can't create random groups in reasonable running time - performance

I have a personal project I'm working on, and I have an issue that I can't seem to solve (well, I can't solve it quickly).
Let's say I have a group of x people (numbered 1..x) and a group of x elements.
I want to create x groups (group number i is for person number i), and each group contains y different elements.
For example: if I have 10 people and 10 elements, and I want each group to have 2 elements:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 7 | 4 | 0 | 6 | 2 | 8 | 3 | 1 | 9 | 5 |
| 6 | 9 | 5 | 8 | 7 | 0 | 2 | 3 | 1 | 4 |
The top row represents the people (0..9);
the two numbers below each person say which elements they have.
Notice: every element appears exactly two times (no more and no less).
Also notice that person number i can't have element number i.
For example: person number 3 can't have element number 3.
My problem is how to create those groups (quickly).
The best solution I found so far is to create a matrix with x columns and y rows;
take an array of size x, shuffle it, and check whether it can be inserted into the matrix. If it can, move on to the next row; if it can't, shuffle it again and check whether it can be inserted now.
The problem is that even with small numbers (1000 people/elements and 50 elements in each group) the code is very slow.
The problem is the shuffling: when it tries to fill a row (around row ~13) it needs to reshuffle many times until it finds an arrangement it can place into the matrix.
Does anyone know how this thing can be done quickly? Any ideas will be welcomed!!
Thx.

You can iterate through the people and, for each person's number, place y copies of that element in random groups that are not full. Just keep an array of the vacant groups, remove a group once it fills up, and of course exclude the current group from the random selection.
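A rough sketch of that idea in Python (the function name and the restart-on-dead-end handling are mine; the greedy placement can occasionally paint itself into a corner near the end, so the sketch simply starts over when that happens):

import random

def assign_groups(x, y, max_attempts=1000):
    # Greedily place y copies of each element i into random groups other than group i.
    for _ in range(max_attempts):
        groups = [[] for _ in range(x)]     # groups[i] = elements assigned to person i
        open_groups = set(range(x))         # groups that still have fewer than y elements
        ok = True
        for element in range(x):
            candidates = list(open_groups - {element})   # never place element i in group i
            if len(candidates) < y:          # dead end - not enough open groups left
                ok = False
                break
            for g in random.sample(candidates, y):
                groups[g].append(element)
                if len(groups[g]) == y:
                    open_groups.remove(g)
        if ok:
            return groups
    raise RuntimeError("no assignment found, increase max_attempts")

print(assign_groups(10, 2))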

In mathematical terms, what you want is to generate a random permutation without fixed points. This is called a derangement (see here for more details, including the probability of a random permutation being a derangement). If you google "generate random derangement" or something similar, you will find several implementations.
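For example, a rejection-sampling sketch in Python (shuffle until there are no fixed points; a random permutation is a derangement with probability of roughly 1/e, so on average only about 2.7 shuffles are needed, regardless of n):

import random

def random_derangement(n):
    # Shuffle until no element is left at its own index.
    items = list(range(n))
    while True:
        random.shuffle(items)
        if all(items[i] != i for i in range(n)):
            return items

print(random_derangement(10))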


How to decide the probability percentage in question

I have the below question:
In the first part of the question, it says the probability that the selected person will be male is 0.44, which means the number of males is 25*0.44 = 11. That's OK.
In the second part, the probability that the selected person will be a male who was born before 1960 is 0.28. Does that mean 0.28 out of the total number, which is 25, or out of the number of males?
I mean, should the number of males who were born before 1960 equal 25*0.28 OR 11*0.28?
I find it easiest to think of these sorts of problems as contingency tables.
You use a matrix layout to express the distributions in terms of two or more factors or characteristics, each having two or more categories. The table can be constructed either with probabilities (proportions) or with counts, and switching back and forth is easy based on the total count in the table. Entries in the table are the intersections of the categories, corresponding to "and" in a verbal description. The numbers to the right or at the bottom of the table are called marginals, because they're found in the margins of the table, and are always the sum of the row or column entries in which they occur. The total probability (or count) in the table is found by summing across all the rows and columns. The marginal distribution of gender would be found by summing across rows, and the marginal distribution of birthdays would be found by summing across the columns.
Based on this, you can inferentially determine other values as indicated by the entries in parentheses below. With one more entry, either for gender or in the marginal row for birthdays, you'd be able to fill in the whole table inferentially. (This is related to the concept of degrees of freedom - how many pieces of info can you fill in independently before the others are determined by the known constraint that the totals are fixed or that probability adds to 1.)
Probabilities

                Birthday
          |  < 1960  | >= 1960 |
          |__________|_________|_______
Gender  F |          |         | (0.56)
        M |   0.28   |  (0.16) |  0.44
          |__________|_________|_______
          |     ?    |    ?    |  1.00
Counts

                Birthday
          |  < 1960  | >= 1960 |
          |__________|_________|_______
Gender  F |          |         |  (14)
        M |     7    |   (4)   |   11
          |__________|_________|_______
          |     ?    |    ?    |   25
Conditional probability corresponds to limiting yourself to the subset of rows or columns specified in the condition. If you had been asked what is the probability of a birthday < 1960 given the gender is male, i.e., P{birthday < 1960 | M} in relatively standard notation, you'd be restricting your focus to just the M row, so the answer would be 7/11 = 0.28/0.44. Computationally, you take the probabilities or counts in the qualifying table entries and express them as a proportion of the probabilities or counts of the specified (given) marginal entries. This is often written in prob & stats texts as P(A|B) = P(AB)/P(B), where AB is a set shorthand for A and B (intersection).
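For reference, here is the same arithmetic as a short Python snippet (variable names are mine):

total = 25
p_male = 0.44                              # P(M)
p_male_pre1960 = 0.28                      # P(M and birthday < 1960)

males = total * p_male                     # 11
males_pre1960 = total * p_male_pre1960     # 7
males_post1960 = males - males_pre1960     # 4  -> the (4) entry above
females = total - males                    # 14 -> the (14) entry above

# Conditional probability: P(birthday < 1960 | M) = P(M and < 1960) / P(M)
p_pre1960_given_male = p_male_pre1960 / p_male    # 7/11 = 0.636...
print(males, males_pre1960, males_post1960, females, p_pre1960_given_male)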
0.44 = 11/25 of the people are male.
0.28 = 7/25 of the people are male & born before 1960.

Normalize 5-star rating to make rating more uniform

I have a system where people rate different items on a scale from 0-5. The issue is, not everyone rates the same items, and the scoring is not objective. The goal is to achieve a fair comparison between items so that an item's score is not affected too much if one of the scorers is very "lenient" or "harsh." In actuality, there may be 100 items, each one scored twice, but here is an example dataset where 4 people scored 12 items, each one being scored twice:
| Item | Score 1 | Score 2 |
|------|---------|---------|
|   1  |  5  (A) |  4  (C) |
|   2  |  5  (A) |  3  (C) |
|   3  |  4  (A) |  2  (C) |
|   4  |  5  (A) |  5  (D) |
|   5  |  3  (A) |  0  (D) |
|   6  |  5  (A) |  3  (D) |
|   7  |  3  (B) |  1  (D) |
|   8  |  4  (B) |  1  (D) |
|   9  |  4  (B) |  2  (D) |
|  10  |  4  (B) |  3  (C) |
|  11  |  4  (B) |  3  (C) |
|  12  |  5  (B) |  4  (C) |
In this table, the letter next to each score identifies the person who gave it: the person who gave score 1 to items 1-6 is person A, the one who gave score 1 to items 7-12 is person B, the one who gave score 2 to items 1-3 and 10-12 is person C, and the one who gave score 2 to items 4-9 is person D.
Informally, if we assume person C was the closest to each item's objective score, we might reason as follows:
Person A generally gave higher scores than C on items 1-3, so he is "lenient."
D gave low scores to all of his items except item 4, which must then have been truly good. He gave scores generally lower than A, so perhaps his scores should be adjusted slightly upwards.
B gave higher scores than D, and a bit higher than C, so he is a bit "lenient".
Thus, we might produce adjusted scores for each item. For example, even though item 2 has a higher average score than item 9, they are probably on par, considering A is generally lenient and D is generally harsh. The question is: how do we do this programmatically? I thought we might make several transformation functions that turn a raw score into an adjusted score, say A, B, C, and D. For example, we might have A(5)=3.7 because when A rates an item as 5, it is really in the 3-4 range. Then, we want to minimize
|A(x_0a)-C(x_0c)|^2 + |D(x_1d)-A(x_1a)|^2 + |B(x_2b)-D(x_2d)|^2 + |C(x_3c)-B(x_3b)|^2
where x_ip is a vector which consists of person p's ratings for items 3i+1, 3i+2, and 3i+3. We might make A, B, C, and D linear transformations, for example. How then do you optimize it? And is this the best way to eliminate the harshness or leniency of scorers without throwing away their ratings?
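One possible way to set this up (a sketch, not a definitive answer): pin one rater, e.g. C, to the identity so the trivial "map everything to the same constant" solution is ruled out, treat the other transformations as linear (a*score + b), and minimize the squared disagreements per item with an off-the-shelf least-squares solver. The data below is the example table; the function and variable names are mine.

from scipy.optimize import least_squares

# (item, rater, score) triples from the example table above
ratings = [
    (1, 'A', 5), (1, 'C', 4),  (2, 'A', 5), (2, 'C', 3),  (3, 'A', 4), (3, 'C', 2),
    (4, 'A', 5), (4, 'D', 5),  (5, 'A', 3), (5, 'D', 0),  (6, 'A', 5), (6, 'D', 3),
    (7, 'B', 3), (7, 'D', 1),  (8, 'B', 4), (8, 'D', 1),  (9, 'B', 4), (9, 'D', 2),
    (10, 'B', 4), (10, 'C', 3), (11, 'B', 4), (11, 'C', 3), (12, 'B', 5), (12, 'C', 4),
]

free = ['A', 'B', 'D']     # C is pinned to the identity as the reference rater

def transform(rater, score, params):
    # params holds (a, b) pairs for the free raters; C is left unchanged.
    if rater == 'C':
        return score
    k = 2 * free.index(rater)
    return params[k] * score + params[k + 1]

def residuals(params):
    # One residual per item: the disagreement between its two adjusted scores.
    out = []
    for item in sorted({i for i, _, _ in ratings}):
        (r1, s1), (r2, s2) = [(r, s) for i, r, s in ratings if i == item]
        out.append(transform(r1, s1, params) - transform(r2, s2, params))
    return out

result = least_squares(residuals, x0=[1.0, 0.0] * len(free))   # start at a=1, b=0
print(result.x)   # fitted (a, b) for A, B, D

Pinning C is an arbitrary anchoring choice; any single rater (or the overall mean) could serve as the reference instead.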

Tic Tac Toe heuristic AI [duplicate]

This question already has answers here:
What algorithm for a tic-tac-toe game can I use to determine the "best move" for the AI?
(10 answers)
Closed 7 years ago.
I designed a simple AI for a 3x3 Tic Tac Toe game. However, I didn't want to do either a complete search or minimax. Instead I thought of a heuristic that would evaluate values for all 9 fields, and the AI would then choose the field with the highest value. The problem is, I have absolutely no idea how to determine whether it is a perfect (unbeatable) algorithm.
And here are the details:
Every Field belongs to several WinPaths on the grid. The middle one has 4 (horizontal, vertical and two diagonals), corners have 3 each (horizontal, vertical and one diagonal), sides have only 2 each (horizontal and vertical). The value of each Field equals the sum of its WinPath values, and a WinPath's value depends on its contents:
Empty: [ | | ] - 1 point
One symbol: [X| | ] - 10 points // can be any symbol in any place
Two different symbols: [X|O| ] - 0 points // they can be arranged in any possible way
Two identical opponents symbols: [X|X| ] - 100 points // arranged in any of three ways
Two identical "my" symbols: [O|O| ] - 1000 points // arranged in any of three ways
This way, for example, the opening position has the values below:
3 | 2 | 3
---+---+---
2 | 4 | 2
---+---+---
3 | 2 | 3
However, a later position can look like this (X is moving now):
X | 10| O
---+---+---
O | O |110
---+---+---
X | 20| 20
So is there any reliable way to find out whether this is a perfect algorithm, or whether it has any weaknesses?
PS. I was trying (from the player's perspective) to create a fork situation so I could beat this AI, but I have failed.
Wikipedia: tic-tac-toe says that there are only 362,880 possible tic-tac-toe games. A brute force approach to proving your algorithm would be to exhaustively search the game tree, having your opponent try each possible move at each turn, and see if your algorithm ever loses (it's guaranteed a win or draw if perfect). The space is small enough that a program could do this very quickly. Of course, you would then be faced with proving that your test program is correct.
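A minimal sketch of that brute-force check in Python, using the field values from the question (the symbols, the first-index tie-breaking, and all names are my assumptions, since the question doesn't specify them):

WIN_PATHS = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in WIN_PATHS:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def heuristic_move(board, me, opp):
    # Score every empty field with the point scheme from the question and pick
    # the highest one (ties broken by lowest index - an assumption of this sketch).
    def path_value(path):
        cells = [board[i] for i in path]
        mine, theirs = cells.count(me), cells.count(opp)
        if mine == 2:                  return 1000   # two of my symbols
        if theirs == 2:                return 100    # two opponent symbols
        if mine == 1 and theirs == 1:  return 0      # two different symbols
        if mine + theirs == 1:         return 10     # one symbol
        return 1                                     # empty path
    scores = {i: sum(path_value(p) for p in WIN_PATHS if i in p)
              for i in range(9) if board[i] == ' '}
    return max(scores, key=lambda i: (scores[i], -i))

def bot_never_loses(board, bot_to_move):
    # The bot ('X') plays the heuristic; the opponent ('O') tries every legal
    # reply. Returns False if any line of play beats the bot.
    w = winner(board)
    if w == 'O':
        return False
    if w == 'X' or ' ' not in board:
        return True
    if bot_to_move:
        i = heuristic_move(board, 'X', 'O')
        return bot_never_loses(board[:i] + 'X' + board[i + 1:], False)
    return all(bot_never_loses(board[:i] + 'O' + board[i + 1:], True)
               for i in range(9) if board[i] == ' ')

print(bot_never_loses(' ' * 9, True))     # bot moves first
print(bot_never_loses(' ' * 9, False))    # opponent moves first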
To know if your bot is good enough, you have to play a lot of games: bot vs. top players and bot vs. the best bots available (this is usually done for much more complicated games like chess or Go).
Did you try this (I play first)?
13| 12| 13
---+---+---
12| 14| 12
---+---+---
13| 12| O
Right?
| |
---+---+---
| X |
---+---+---
| | O
O |20 |30
---+---+---
20| X |20
---+---+---
30|20 | O
If I understand it correctly, X's next move will be on a corner:
O |20 | X
---+---+---
20| X |20
---+---+---
30|20 | O
And from here I win.
If your bot passes this test (i.e. if I missed something), then your solution looks perfect.

Faster computation to get multiples of number at different levels

Here is the scenario:
We have several items that are shipped to many stores. We want to be able to allocate a certain quantity of each item to a store based on need. Each of these stores is also associated to a specific warehouse.
The catch is that at the warehouse level, the total quantity of each item must be a multiple of a number (6 for example).
I have already calculated out the quantity needed by each store at store level, but they do not sum up to a multiple of 6 at the warehouse level.
My solution, using Excel, was this:
I use a SUMIFS formula to keep track of the sum of each item allocated at the warehouse level, and another MOD(6) formula that calculates how far the total is from a multiple of 6. Then my actual VBA code loops through and subtracts 1 (if MOD <= 3) or adds 1 (if MOD > 3) from the store-level units needed until MOD = 0 for all rows.
Now this works for me, but is extremely slow even when I have just ~5000 rows.
I am looking for a faster solution, because every time I subtract from/add to the units needed, the SUMIFS and MOD need to be recalculated.
EDIT: (trying to be clearer)
I have a template file that I paste my data into with the following setup:
+------+-------+-----------+----------+--------------+--------+
| Item | Store | Warehouse | StoreQty | WarehouseQty | Mod(6) |
+------+-------+-----------+----------+--------------+--------+
| 1 | 1 | 1 | 2 | 8 | 2 |
| 1 | 2 | 1 | 3 | 8 | 2 |
| 1 | 3 | 1 | 1 | 8 | 2 |
| 1 | 4 | 1 | 2 | 8 | 2 |
| 2 | 1 | 2 | 1 | 4 | 2 |
| 2 | 2 | 2 | 3 | 4 | 2 |
+------+-------+-----------+----------+--------------+--------+
Currently the WarehouseQty column is a SUMIFS formula summing up the StoreQty for each Item-Store combo that is associated with the Warehouse. So I guess the Warehouse/WarehouseQty columns are actually duplicated several times, once for every Item-Store combo that shows up. The WarehouseQty is the one that needs to be a multiple of 6.
Screen updating can be turned OFF to speed up lengthy computations like this:
Application.ScreenUpdating = FALSE
The opposite assignment turns screen updating back on again.
Put the data into an array first, rather than working on the cells directly, then put the data back after you have manipulated it - this will be much faster.
An example which uses your criteria:
Option Explicit

Sub test()
    Dim q() 'this array will hold the range values
    Dim i As Long

    q = Range("C2:C41") 'put the data into the array - *ALWAYS* 2 dimensions, even for a single column

    For i = LBound(q) To UBound(q) 'use the bounds in case it's a dynamic array - 1 To 40 would have worked here
        Select Case q(i, 1) Mod 6 'calculate the remainder
            Case 0 To 3
                q(i, 1) = q(i, 1) - (q(i, 1) Mod 6) 'round down to a multiple of 6
            Case 4 To 5
                q(i, 1) = q(i, 1) - (q(i, 1) Mod 6) + 6 'round up for the larger remainders
        End Select
    Next i

    Range("D2:D41") = q 'write the data back
End Sub
Guessing you may find that turning off the screen refresh helps quite a chunk, and that you therefore won't need any more suggestions.
Another option would be to reduce the adjustment to a multiple of 6 to a handful of If statements, depending on the value of Mod(6).
You could also address how you sum up the quantity of a particular item across all stores: using a pivot table and reading the sum totals from there is a lot quicker than using SUMIFS in a macro.
Based on your modifications to the question:
You're correct that you could have huge amounts of replication doing the calculation row by row, as well as adjusting the quantity by a single unit at a time even though you know exactly how many units you need to add / remove from the mod(6) formula.
Could you not create a new sheet with all your possible combinations of product ID and store? You could then use SUMIFS() for each of these unique combinations and, in a final step, round up/down at the warehouse level.

Relative quality of sorted array

I have 2 sorting algorithms that produce different results (I sort info by relevancy). As a result, both ways give me the same items in a different order. I know that the first algorithm provides better results than the second. I want to get a relative value (from 0 to 1) that means "the first N values of array2 have 0.73 of the quality of the first N values of array1" (I compare the first elements because the user sees them without taking any action).
The first thing that comes to mind is to use the sum of differences between positions in array1 and array2.
For example:
array1: 1 2 3 4 | 5 6 7 8 9
array2: 8 6 2 3 | 7 4 1 5 9 - positions in array1
array2*: 5 5 2 3 | (positions greater than 4 are replaced with 5 so the relative value stays in the range 0..1)
I want to compare first 4 elements:
S = 1 + 2 + 3 + 4 = 10 - the reference sum, i.e. the maximum possible deviation
D = |1 - 5| + |2 - 5| + |3 - 2| + |4 - 3| = 9 - this is the absolute deviation
To calculate the relative quality I use the following formula: (S - D)/S = 0.1.
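For concreteness, the worked example above as a short Python snippet (variable names are mine):

reference_positions = [1, 2, 3, 4]    # ideal positions of the first N items (array1)
actual_positions = [5, 5, 2, 3]       # array2* - capped positions of array2's first N items

S = sum(reference_positions)                                                # 10, maximum deviation
D = sum(abs(r - a) for r, a in zip(reference_positions, actual_positions))  # 4 + 3 + 1 + 1 = 9
print((S - D) / S)                                                          # 0.1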
Are there any standard algorithms for this? What are the disadvantages of this algorithm?
What you are looking for is probably DCG [Discounted Cumulative Gain] and nDCG [normalized DCG], which are used to measure ranking relevance.
This assumes one list [let it be list2] is a baseline - the "absolute truth" - and list1 should be as close as possible to it.
The idea is that if the first element is out of order, it matters more than if the 10th element is out of order.
The solution is described in more detail, with an example, in my answer in this post [sorry for advertising myself, it just seems to fit well here]. The basic idea is to evaluate:
DCG(list1)/DCG(list2)
where the relevance of each element is derived from list2 itself, for example: rel_i = 1/log(1+i)
Notes:
Of course, DCG can be calculated only on the relevant first n elements and not on the entire list.
This solution will yield a result of 1 if list1 == list2.
This solution assumes that what matters is only where elements appear, not their numerical values. It completely disregards the numerical values.
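A small Python sketch of that ratio on the arrays from the question (function names, the log bases, and the choice of array1 as the baseline are my assumptions):

import math

def dcg(ranking, relevance):
    # Standard DCG: relevance discounted by log2 of (1-based position + 1).
    return sum(relevance[item] / math.log2(pos + 2)
               for pos, item in enumerate(ranking))

def relative_quality(candidate, baseline, n):
    # Relevance of each element is derived from the baseline list itself:
    # rel_i = 1 / log(1 + i) for the item at 1-based position i, as suggested above.
    relevance = {item: 1.0 / math.log(1 + pos + 1)
                 for pos, item in enumerate(baseline)}
    return dcg(candidate[:n], relevance) / dcg(baseline[:n], relevance)

# The arrays from the question, comparing the first 4 elements
# (array1 comes from the better algorithm, so it serves as the baseline):
array1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
array2 = [8, 6, 2, 3, 7, 4, 1, 5, 9]
print(relative_quality(array2, array1, 4))

As noted above, the ratio is 1 when the two lists agree, and it decreases as high-relevance items drift down the candidate list.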
