Amazon Quicksight - how to calculate Percentage of Total UNIQUE values? - amazon-quicksight

How to calculate percent of count of UNIQUE values?
E.g. I have a dataset with people who can pick multiple symptoms (i.e each person can have 0 to 10 values).
person 1 - symptom A, B
person 2 - symptom B, C, D
person 3 - no symptoms
person 4 - symptom A
etc.
E.g. if total UNIQUE count of people is 4 and 2 of them have picked symptom A, then I'd like to see:
A = 2/4 = 50%
Currently QuickSight calculates shares against the total row count rather than the unique count of people. Because one person can have multiple symptoms, A comes out as 2/6 = 33%, which is not what I need.
I have not found a way to do this in QuickSight. Is it possible?
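To make the intended arithmetic concrete, here is a minimal sketch in plain Python. It is not QuickSight syntax; it only illustrates the calculation being asked for. The row layout (one row per person and symptom, with `None` for a person without symptoms) and the name `symptom_shares` are assumptions made for this illustration.

```python
from collections import defaultdict

# Assumed layout: one row per (person, symptom); a person without
# symptoms still gets one row, with None as the symptom.
rows = [
    (1, "A"), (1, "B"),
    (2, "B"), (2, "C"), (2, "D"),
    (3, None),
    (4, "A"),
]

def symptom_shares(rows):
    """Share of distinct people who reported each symptom."""
    all_people = {person for person, _ in rows}
    people_by_symptom = defaultdict(set)
    for person, symptom in rows:
        if symptom is not None:
            people_by_symptom[symptom].add(person)
    # Denominator is the distinct count of people, not the row count.
    return {s: len(p) / len(all_people) for s, p in people_by_symptom.items()}

shares = symptom_shares(rows)
print(shares["A"])  # 0.5, i.e. 2 distinct people out of 4
```

The key point is that both the numerator and the denominator are distinct counts of people, and the denominator ignores the symptom grouping.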

Related

Algorithms for optimal student seating arrangements

Say I need to place n=30 students into groups of between 2 and 6, and I collect the following preference data from each student:
Student Name: Tom
Likes to sit with: Jimi, Eric
Doesn't like to sit with: John, Paul, Ringo, George
It's implied that they're neutral about any other student in the overall class that they haven't mentioned.
How might I best run a large number of simulations of many different/random grouping arrangements, to be able to determine a score for each arrangement, through which I could then pick the "most optimal" score/arrangement?
Alternatively, are there any other methods by which I might be able to calculate a solution that satisfies all of the supplied constraints?
I'd like a generic method that can be reused on different class sizes each year, but within each simulation run, the following constants and variables apply:
Constants: Total number of students, Student preferences
Variables: Group sizes, Student Groupings, Number of different group arrangements/iterations to test
Thanks in advance for any help/advice/pointers provided.
I believe you can state this as an explicit mathematical optimization problem.
Define the binary decision variables:
x(p,g) = 1 if person p is assigned to group g
0 otherwise
I used:
I used your data set with 28 persons, and your preference matrix (with -1,+1,0 elements). For groups, I used 4 groups of 6 and 1 group of 4. A solution can look like:
---- 80 PARAMETER solution using MIQP model
group1 group2 group3 group4 group5
aimee 1
amber-la 1
amber-le 1
andrina 1
catelyn-t 1
charlie 1
charlotte 1
cory 1
daniel 1
ellie 1
ellis 1
eve 1
grace-c 1
grace-g 1
holly 1
jack 1
jade 1
james 1
kadie 1
kieran 1
kristiana 1
lily 1
luke 1
naz 1
nibah 1
niko 1
wiki 1
zeina 1
COUNT 6 6 6 6 4
Notes:
This model can be linearized, so it can be fed into a standard MIP solver
I solved this directly as a MIQP model (actually the solver reformulated the model into a MIP). The model solved in a few seconds.
Probably we need to add extra logic to make sure one person is not getting a really bad assignment. We optimize here only the total sum. This overall sum may allow an individual to get a bad deal. It is an interesting exercise to take this into account in the model. There are some interesting trade-offs.
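A minimal sketch of the kind of objective described above, assuming the total being maximized is the sum of preference values over all pairs of people placed in the same group (the exact model is not reproduced in this answer). The tiny preference data, the group sizes and the helper names are made up for illustration, and brute-force enumeration stands in for the MIQP solver, so this is only practical for very small classes.

```python
from itertools import permutations

# Hypothetical preferences: pref[p][q] is +1 (likes), -1 (dislikes);
# anything missing is treated as 0 (neutral).
names = ["tom", "jimi", "eric", "john"]
pref = {
    "tom":  {"jimi": 1, "eric": 1, "john": -1},
    "jimi": {"tom": 1},
    "eric": {"john": 1},
    "john": {"eric": 1, "tom": -1},
}
group_sizes = [2, 2]

def total_score(groups):
    """Sum pref[p][q] over all ordered pairs (p, q) that share a group."""
    return sum(
        pref.get(p, {}).get(q, 0)
        for group in groups
        for p in group
        for q in group
        if p != q
    )

def split(order, sizes):
    """Cut an ordering of students into consecutive groups of the given sizes."""
    groups, start = [], 0
    for size in sizes:
        groups.append(order[start:start + size])
        start += size
    return groups

def best_assignment(names, sizes):
    """Brute force: try every ordering and keep the highest-scoring grouping."""
    return max((split(order, sizes) for order in permutations(names)), key=total_score)

best = best_assignment(names, group_sizes)
print(best, total_score(best))
```

For a real class of 28 or 30 students the number of orderings is far too large to enumerate, which is why a MIP/MIQP solver or a heuristic search is needed there.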
A first approach: create an n x n matrix, where n is the total number of students. Rows and columns are indexed by student, and each row holds that student's preferences for sitting with each of the other students. Fill the cells with 1 = likes to sit with, -1 = does not like to sit with, 0 = neutral. The main diagonal (i,i) is filled with zeroes too.
        Mark  Maria  John  Peter
Mark      0     1     -1     1
Maria     0     0     -1     1
John     -1     1      0     1
Peter     0
Score calculations are based on sums of these values. For example, John likes to sit with Maria (1), but Maria doesn't like to sit with John (-1), so the pair scores 0. The best result for a pair is when both like each other, for a sum of 2.
Then, for the chosen group size, calculate the score of each possible combination. The bigger the score, the better the arrangement. Combinations exclude the main diagonal, i.e. John grouped with John is not a valid combination/group.
In a group size of 2, best score is 2
In a group size of 3, best score is 6,
In a group size of 4, best score is 12
In a group size of n, best score would be (n-1)*n
Now, from the list of combinations/groups ordered by score, take the tuples with the highest scores first, skipping any tuple that contains a student who has already been placed.
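The steps above can be sketched as follows. This is only an illustration under assumptions: Peter's row is not given in the example, so it is treated as all neutral here, and the names `group_score` and `greedy_groups` are made up. Picking the best tuples greedily is a heuristic and is not guaranteed to give the best overall arrangement.

```python
from itertools import combinations

students = ["Mark", "Maria", "John", "Peter"]
# likes[i][j]: how student i feels about sitting with student j (1, -1 or 0).
likes = [
    [0, 1, -1, 1],   # Mark
    [0, 0, -1, 1],   # Maria
    [-1, 1, 0, 1],   # John
    [0, 0, 0, 0],    # Peter (assumed neutral, not given in the example)
]

def group_score(group):
    """Sum both directions of every pair in the group; the diagonal is never used."""
    return sum(likes[i][j] + likes[j][i] for i, j in combinations(group, 2))

def greedy_groups(n_students, size):
    """Rank all groups of the given size, then take the best non-overlapping ones."""
    ranked = sorted(combinations(range(n_students), size), key=group_score, reverse=True)
    used, chosen = set(), []
    for group in ranked:
        if used.isdisjoint(group):  # skip tuples with already placed students
            chosen.append(group)
            used.update(group)
    return chosen

pairs = greedy_groups(len(students), 2)
print([[students[i] for i in g] for g in pairs])
```

With this data the John/Maria pair scores 0, as in the example above, and the greedy pass ends up with Mark/Maria and John/Peter.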
In recent research, PSO (particle swarm optimization) was used to assign students to an unknown number of groups of 4 to 6. PSO showed improved capabilities compared to a GA (genetic algorithm). I think that research is what you need.
The paper is: Forming automatic groups of learners using particle swarm optimization for applications of differentiated instruction
You can find the paper here: https://doi.org/10.1002/cae.22191
Perhaps the researchers could guide you through researchgate: https://www.researchgate.net/publication/338078753
Regarding the optimal seating, you need to specify an objective function with the specific data.

Algorithm: Fill different baskets

Let's assume I have 3 different baskets with a fixed capacity
And n-products which provide different value for each basket -- you can only pick whole products
Each product should be limited to a maximum amount (e.g. you can pick product A at most 5 times)
Every product adds 0 or more value to each basket, and products come in all kinds of variations
Now I want a list of all possible combinations of products fitting in the baskets, ordered by accuracy (e.g. basket 1 being 5% too full would mean 5% less accuracy)
Edit: Example
Basket A capacity 100
Basket B capacity 80
Basket C capacity 30
fake products
Product 1 (A: 5, B: 10, C: 1)
Product 2 (A: 20 B: 0, C: 0)
There might be hundreds more products
Best fit with max 5 each would be
5 times Product 1
4 times Product 2
Result
A: 105
B: 50
C: 5
Accuracy: (qty_used / max_qty) * 100 = (160 / 210) * 100 = 76.190%
Next would be another combination with less accuracy
Any pointers in the right direction are highly appreciated. Thanks.
Edit:
Instead of the above method, accuracy should be expressed as an error, and the list should be in ascending order of error.
Error(Basket x) = (|max_qty(x) - qty_used(x)| / max_qty(x)) * 100
and the overall error should be the weighted average of the errors of all baskets.
Total Error = [Σ (Error(x) * max_qty(x))] / [Σ (max_qty(x))]
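A minimal sketch of the error formulas above, applied to the example baskets and the two fake products. The function names and the brute-force enumeration over all quantities from 0 to 5 are assumptions for illustration; with hundreds of products the number of combinations explodes, so plain enumeration would not scale.

```python
from itertools import product

capacity = {"A": 100, "B": 80, "C": 30}
products = {
    "Product 1": {"A": 5, "B": 10, "C": 1},
    "Product 2": {"A": 20, "B": 0, "C": 0},
}

def fill(quantities):
    """Amount added to each basket by the chosen product quantities."""
    return {b: sum(products[p][b] * n for p, n in quantities.items()) for b in capacity}

def basket_error(used, basket):
    """Error(Basket x) = |max_qty(x) - qty_used(x)| / max_qty(x) * 100."""
    return abs(capacity[basket] - used[basket]) / capacity[basket] * 100

def total_error(used):
    """Capacity-weighted average of the per-basket errors."""
    weighted = sum(basket_error(used, b) * capacity[b] for b in capacity)
    return weighted / sum(capacity.values())

def ranked_combinations(max_each=5):
    """All quantity combinations (0..max_each per product), lowest error first."""
    names = list(products)
    combos = []
    for counts in product(range(max_each + 1), repeat=len(names)):
        quantities = dict(zip(names, counts))
        combos.append((total_error(fill(quantities)), quantities))
    return sorted(combos, key=lambda item: item[0])

used = fill({"Product 1": 5, "Product 2": 4})
ranked = ranked_combinations()
print(used, round(total_error(used), 3))
print(ranked[0])
```

For the example (5 times Product 1, 4 times Product 2) the weighted total error works out to 6000 / 210, roughly 28.57%, and that combination is also the first entry of the ranked list.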

How to define an algorithm that gives a ranking number for a dentist?

I have some problems with defining an algorithm that will calculate a ranking number for a dentist.
Assume, we have three different dentists:
dentist number 1: has 125 patients, and out of the 125 patients the dentist has booked a time with 75 of them. 60% of them got a time.
dentist number 2: has 5 patients, and out of the 5 patients the dentist has booked a time with 4 of them. 80% of them got a time.
dentist number 3: has 25 patients, and out of the 25 patients the dentist has booked a time with 14 of them. 56% of them got a time.
If we use the formula:
patients booked time with / totalpatients * 100
it will not be the right way to calculate the ranking: the higher the percentage, the better the dentist would rank, which is wrong. Calculated that way, the dentists would be ranked:
dentist number 2 would have a ranking of 1. (80% got a time).
dentist number 1 would have a ranking of 2 (60% got a time).
dentist number 3 would have a ranking of 3. (56% got a time).
But, it should be in this way:
dentist number 1 = ranking 1
dentist number 2 = ranking 2
dentist number 3 = ranking 3
I don't know how to make an algorithm that also takes the number of patients into account as a factor in the ranking calculation.
It is quite arbitrary how you define what makes a better dentist in terms of number of patients and the percentage of those that have an appointment with them.
Let's call the number of patients P, the number of those that have an appointment A, and the function determining how "good" a dentist is f. So f would be a function of P and A: f(P, A).
One component of f could indeed be what you already calculated: A/P.
Another component would have to be P, but I would think that the effect on f(P, A) of increasing P by 1 should be much higher for a low P than for a high P, so this component should not be a linear function. It would also be practical if this component had a value between 0 and 1, just like the other component.
Taking all this together, I suggest this definition of f, which will give a number between 0 and 1:
f(P,A) = 1/3 * P/(10 + P) + 2/3 * A/P
For the different dentists, this results in:
1: 1/3 * 125/135 + 2/3 * 75/125 = 0.7086419753...
2: 1/3 * 5/15 + 2/3 * 4/5 = 0.6444444444...
3: 1/3 * 25/35 + 2/3 * 14/25 = 0.6114285714...
You could play a bit with the constant factors in the formula, like increasing the term 10. Or you could change the factors 1/3 and 2/3 making sure that their sum is 1.
This is just one way to do it. There are an infinity of other ways...
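A small sketch of the formula above, so the constants are easy to play with. The parameter names `k` (the term 10) and `weight_volume` (the factor 1/3) are my own labels for illustration.

```python
def dentist_score(patients, appointments, k=10, weight_volume=1 / 3):
    """f(P, A) = w * P / (k + P) + (1 - w) * A / P, a value between 0 and 1."""
    volume = patients / (k + patients)   # grows quickly for small P, flattens for large P
    rate = appointments / patients       # share of patients with an appointment
    return weight_volume * volume + (1 - weight_volume) * rate

# (P, A) per dentist, taken from the question.
dentists = {1: (125, 75), 2: (5, 4), 3: (25, 14)}
scores = {d: dentist_score(p, a) for d, (p, a) in dentists.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)  # [1, 2, 3]
```

Raising `k` makes a small patient count count against a dentist for longer; changing `weight_volume` shifts the balance between volume and booking rate while keeping the two weights summing to 1.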

Summarize different category rankings

I determine the rankings of e.g. 1000 participants in multiple categories.
The results look something like this:
Participant/Category/Place (lower is better):
A|1|1.
A|2|1.
A|3|1.
A|4|7.
B|1|2.
B|2|2.
B|3|2.
B|4|4.
[...]
Now I want to summarize the rankings. The standard method would be to sum up all places and divide by the number of categories:
Participant A: (1+1+1+7) / 4 = 2.5
Participant B: (2+2+2+4) / 4 = 2.5
But I want to prefer participant A, because they won 3 of 4 categories.
I could define fixed points for all places, e.g.:
Place|Points
1|1000
2|500
3|250
4|125
5|62.5
6|31.25
7|15.625
[...]
Participant A: 1000+1000+1000+15.625 = 3015.625
Participant B: 500+500+500+125 = 1625
The problem now is that I want to give every place some points, so that low places can still be sorted. But when I keep dividing the available points by 2, I run out of decimal precision (available points / 2^number of places).
What can I do?
How about using harmonic mean?
4 / (1/1 + 1/1 + 1/1 + 1/7) = 1.272727
4 / (1/2 + 1/2 + 1/2 + 1/4) = 2.285714
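A quick way to check these numbers is Python's standard library, which has a harmonic mean built in. The dictionary layout is just an assumption for the example; lower is still better.

```python
from statistics import harmonic_mean

# Places per participant across the four categories (from the question).
places = {"A": [1, 1, 1, 7], "B": [2, 2, 2, 4]}

summary = {name: harmonic_mean(p) for name, p in places.items()}
ordered = sorted(summary, key=summary.get)  # best (lowest) first
print(summary)
print(ordered)  # ['A', 'B']
```

Because the harmonic mean is dominated by the small values, three first places outweigh one seventh place, so A ranks ahead of B even though their arithmetic means are equal.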

How to optimize Cartesian product

Is there a better way to compute a Cartesian product? Since every use of a Cartesian product is a special case, I think I need to explain what I'm trying to achieve and why I ended up computing one. Please tell me whether a Cartesian product is the only solution for my problem and, if so, how to improve its performance.
Background:
We are trying to help customers to buy products cheaper.
Let's say a customer ordered 5 products (prod1, prod2, prod3, prod4, prod5).
Each ordered product is offered by different vendors.
Representation Format 1:
Vendor 1 - offers prod1, prod2, prod4
vendor 2 - offers prod1, prod5
vendor 3 - offers prod1, prod2, prod5
vendor 4 - offers prod1
vendor 5 - offers prod2
vendor 6 - offers prod3, prod4
In other words
Representation Format 2:
Prod 1 - offered by vendor1, vendor2, vendor3, vendor4
Prod 2 - offered by vendor5, vendor3, vendor1
prod 3 - offered by vendor6
prod 4 - offered by vendor1, vendor6
prod 5 - offered by vendor3, vendor2
Now, to choose the best vendor based on price, we can sort the offers for each product by price and take the first one.
In that case we choose
prod 1 from vendor 1
prod 2 from vendor 5
prod 3 from vendor 6
prod 4 from vendor 1
prod 5 from vendor 3
Complexity:
Since we chose 4 unique vendors, we need to pay 4 shipping prices.
Also each vendor has a minimum purchase order. If we don't meet it, then we end up paying that charge as well.
In order to choose the best combination of products, we have to compute the Cartesian product of the offered products and calculate the total price of each combination.
total price computation algorithm:
foreach unique vendor
    if (sum (product price offered by specific vendor * quantity) < minimum purchase order limit specified by specific vendor)
        totalprice += sum (product price * quantity) + minimum purchase charge + shipping price
    else
        totalprice += sum (product price * quantity) + shipping price
end foreach
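The pseudocode above could look like this in Python. This is only a sketch: the data layout (a list of `(vendor, unit_price, quantity)` tuples and a per-vendor dictionary with `shipping`, `minimum_order` and `minimum_charge`) and all the numbers are hypothetical.

```python
def total_price(order_lines, vendors):
    """Total for one combination: goods + shipping (+ minimum charge if below the limit).

    order_lines: list of (vendor, unit_price, quantity) for the chosen offers.
    vendors: per-vendor shipping price, minimum order limit and minimum purchase charge.
    """
    subtotal = {}
    for vendor, unit_price, quantity in order_lines:
        subtotal[vendor] = subtotal.get(vendor, 0) + unit_price * quantity

    total = 0
    for vendor, amount in subtotal.items():
        info = vendors[vendor]
        total += amount + info["shipping"]
        if amount < info["minimum_order"]:
            total += info["minimum_charge"]  # penalty for not meeting the minimum
    return total

# Hypothetical numbers, only to exercise the function.
vendors = {
    "vendor1": {"shipping": 5, "minimum_order": 50, "minimum_charge": 10},
    "vendor6": {"shipping": 7, "minimum_order": 20, "minimum_charge": 4},
}
order = [("vendor1", 10, 2), ("vendor1", 8, 1), ("vendor6", 30, 1)]
print(total_price(order, vendors))  # 80
```

Here vendor1's goods total 28, which is below its minimum of 50, so it pays 28 + 5 + 10 = 43; vendor6 meets its minimum and pays 30 + 7 = 37.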
In our case
{vendor1, vendor2, vendor3, vendor4}
{vendor1, vendor3, vendor5}
{vendor6}
{vendor1, vendor6}
{vendor2, vendor3}
4 * 3 * 1 * 2 * 2 = 48 combinations need to be computed to find the best combination.
{vendor1, vendor1, vendor6, vendor1, vendor2} = totalprice1,
{vendor1, vendor3, vendor6, vendor1, vendor2} = totalprice2,
...
{vendor4, vendor5, vendor6, vendor6, vendor3} = totalprice48
Now sort the computed total prices to find the best combination.
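For illustration, the enumeration itself can be sketched with `itertools.product`. Prices are not given in the question, so this sketch only counts the combinations and the number of distinct vendors (that is, shipping charges) in each one; the dictionary layout is an assumption.

```python
from itertools import product

# Representation Format 2: vendors offering each ordered product.
offers = {
    "prod1": ["vendor1", "vendor2", "vendor3", "vendor4"],
    "prod2": ["vendor1", "vendor3", "vendor5"],
    "prod3": ["vendor6"],
    "prod4": ["vendor1", "vendor6"],
    "prod5": ["vendor2", "vendor3"],
}

# One tuple per combination: the vendor chosen for prod1..prod5, in order.
combinations = list(product(*offers.values()))

# Each distinct vendor in a combination means one shipping charge.
shipping_charges = [len(set(combo)) for combo in combinations]

print(len(combinations))      # 48
print(min(shipping_charges))  # 2
```

The fewest shipping charges any combination can have is 2 (for example everything from vendor3 and vendor6), which is the kind of trade-off the cheapest-per-product choice misses.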
Actual problem:
If the customer orders 15 or more products, and assuming each product is offered by 8 unique vendors, we end up computing at least 8^15 = 35184372088832 combinations, which takes more than a couple of hours. If the customer orders more than 20 products, it takes more than a couple of days.
Is there a way to approach this problem from a different angle?
Your problem can get even more complex. A simple example:
Product       1     2     3
Vendor 1     10    20    40
Vendor 2     20    10    40
--------------------------
needed cnt  100   100    25
You need 100 units of P1, 100 of P2, and 25 of P3.
P1 can be purchased for 1000 at V1, P2 for 1000 at V2, and P3 for 1000 at V1 or V2.
Now shipping would be free if you purchase for 1500 or more at a vendor, but would otherwise cost you 200 at each vendor.
So if you order everything at V1, you would pay 4000:
1000+2000+1000+0 (shipping), or the same sum at V2:
2000+1000+1000+0. Or split:
1000+0+0+200 = 1200 at V1 plus
0+1000+1000+0 = 2000 at V2,
which sums up to 3200 and could be found by your method.
But you could split the purchase of product 3 this way:
1000+0+500+0 = 1500 at V1 plus
0+1000+500+0 = 1500 at V2
which only sums up to 3000 and would not be found by your method.
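The three totals above can be reproduced with a short sketch. The names `order_cost`, `FREE_SHIPPING_FROM` and `SHIPPING` are made up, and the split case assumes product 3 can be divided into 12.5 units per vendor, which is what the 500 + 500 arithmetic implies.

```python
# Unit prices per vendor for products 1, 2, 3 (from the table above).
prices = {"V1": [10, 20, 40], "V2": [20, 10, 40]}
FREE_SHIPPING_FROM = 1500  # goods total from which shipping is free
SHIPPING = 200             # shipping charge below that total

def order_cost(allocation):
    """allocation maps vendor -> quantities of products 1..3 bought there."""
    total = 0
    for vendor, counts in allocation.items():
        goods = sum(c * p for c, p in zip(counts, prices[vendor]))
        if goods == 0:
            continue  # nothing ordered at this vendor, so no shipping either
        total += goods + (0 if goods >= FREE_SHIPPING_FROM else SHIPPING)
    return total

all_v1 = {"V1": [100, 100, 25]}
whole = {"V1": [100, 0, 0], "V2": [0, 100, 25]}
split = {"V1": [100, 0, 12.5], "V2": [0, 100, 12.5]}

print(order_cost(all_v1), order_cost(whole), order_cost(split))
```

The split allocation reaches the free-shipping threshold at both vendors, which is why it beats every allocation that buys each product from a single vendor.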
Afaik, there is established research on such topics, and the keywords are matrices and systems of equations.
You can describe your problem as
f(c11, p11) + f(c12, p12) + f(c13, p13) = c1 => dc1
f(c21, p21) + f(c22, p22) + f(c23, p23) = c2 => dc2
...
f(c31, p31) + f(c32, p32) + f(c33, p33) = c3 => dc3
where cij is the count of product j at vendor i and pij is the price of product j at vendor i, but f(c11,p11) is not just count*price, but a function of count and price, since there might be a quantity discount. The right side is the purchase total for vendor i.
This is without the purchase discount, which has to be modeled on top. If the discount on shipping depends only on the total cost, it can be modeled just from ci => dci.
You would try to minimize sum (dc1+dc2+...+dcm).
