How to find average of tuples in relational algebra calculator - relational-algebra

I'm trying to use the group-by operator to find the average number of books checked out by students of one specific department. However, it keeps returning the average over all checked-out books from all students.
What I have so far:
γ avg(Books_Quantity) -> y (Student) ⨝ (σ Department = 'Computer_Science' (Student))
The output should be 1.75, but I get the average across all departments instead:
y    Student.Student_ID  Student.Student_Name  Student.Department  Student.Books_Quantity
1.5  1                   John                  Computer_Science    2
1.5  2                   Lisa                  Computer_Science    1
1.5  5                   Xina                  Computer_Science    3
1.5  7                   Chang                 Computer_Science    1

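I found the answer. The selection has to go inside the grouping operation, so that the Student relation is filtered before the average is computed. In my first attempt, the average was computed over the whole Student relation and only then joined with the selected rows. Like so: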
γ avg(Books_Quantity) -> y (σ Department = 'Computer_Science' (Student))
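As a rough cross-check of the order of operations, here is a minimal Python sketch of "filter first, then average". The four Computer_Science rows come from the output above; the other rows (IDs 3, 4, 6, 8 and their departments) are hypothetical and were chosen only so that the overall average is 1.5.

```python
# Rows with IDs 1, 2, 5, 7 come from the question; the others are hypothetical.
students = [
    {"id": 1, "name": "John", "department": "Computer_Science", "books": 2},
    {"id": 2, "name": "Lisa", "department": "Computer_Science", "books": 1},
    {"id": 3, "name": "Ann", "department": "Mathematics", "books": 1},
    {"id": 4, "name": "Bob", "department": "Mathematics", "books": 2},
    {"id": 5, "name": "Xina", "department": "Computer_Science", "books": 3},
    {"id": 6, "name": "Raj", "department": "Physics", "books": 1},
    {"id": 7, "name": "Chang", "department": "Computer_Science", "books": 1},
    {"id": 8, "name": "Eva", "department": "Physics", "books": 1},
]


def average_books(rows, department=None):
    """Average the book count, optionally restricted to one department."""
    if department is not None:
        # The selection happens BEFORE the aggregation.
        rows = [r for r in rows if r["department"] == department]
    return sum(r["books"] for r in rows) / len(rows)


cs_average = average_books(students, "Computer_Science")
overall_average = average_books(students)
print(cs_average, overall_average)
```

Averaging without the filter corresponds to the first expression, where every selected row is simply paired with the overall average.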

Related

Power Pivot - Aggregate within groups to determine max value

I'm looking for a DAX formula (for Power Pivot) that aggregates within certain groups and across other groups to determine the maximum.
Here's my data table:
State  Customer  Fruit   Qty
NY     A         Apple   5
NY     A         Orange  1
NY     A         Pear    5
NY     B         Apple   1
NY     B         Orange  6
NY     C         Apple   2
NY     C         Orange  2
NY     C         Pear    5
CA     D         Orange  4
CA     D         Pear    2
I want to determine the most popular fruit by State (ignoring Customer). In NY, there are a total of 8 apples, 9 oranges, and 10 pears. So the formula should return Pear.
Resulting in a table like this:
State  Dominant Fruit
NY     Pear
CA     Orange
What is the Power Pivot formula I need for that Dominant Fruit column on the resulting table? Thanks
You can create a measure that ranks the fruit quantities per state, like so:
Ranking = RANKX( ALLEXCEPT( 'Table','Table'[Customer],'Table'[State] ) , CALCULATE( SUM( 'Table'[Qty] ) ) )
This measure gives the "Dominant Fruit" (the one with the highest quantity) a rank of 1.
You can then add a filter on the visual to show only the rows where the rank is 1.
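To sanity-check the expected result outside of Power Pivot, the aggregation can be sketched in plain Python. This is not DAX and not a replacement for the measure; it only sums Qty per state and fruit (ignoring Customer) and keeps the fruit with the largest total. The function name `dominant_fruit` is made up for this sketch.

```python
from collections import defaultdict

# (State, Customer, Fruit, Qty) rows from the question's data table.
rows = [
    ("NY", "A", "Apple", 5), ("NY", "A", "Orange", 1), ("NY", "A", "Pear", 5),
    ("NY", "B", "Apple", 1), ("NY", "B", "Orange", 6),
    ("NY", "C", "Apple", 2), ("NY", "C", "Orange", 2), ("NY", "C", "Pear", 5),
    ("CA", "D", "Orange", 4), ("CA", "D", "Pear", 2),
]


def dominant_fruit(rows):
    """Return {state: fruit with the highest total Qty}, ignoring Customer."""
    totals = defaultdict(lambda: defaultdict(int))
    for state, _customer, fruit, qty in rows:
        totals[state][fruit] += qty
    # On a tie, max() keeps the fruit that was seen first.
    return {state: max(fruits, key=fruits.get) for state, fruits in totals.items()}


result = dominant_fruit(rows)
print(result)
```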

Algorithms for optimal student seating arrangements

Say I need to place n=30 students into groups of between 2 and 6, and I collect the following preference data from each student:
Student Name: Tom
Likes to sit with: Jimi, Eric
Doesn't like to sit with: John, Paul, Ringo, George
It's implied that they're neutral about any other student in the overall class that they haven't mentioned.
How might I best run a large number of simulations of many different/random grouping arrangements, to be able to determine a score for each arrangement, through which I could then pick the "most optimal" score/arrangement?
Alternatively, are there any other methods by which I might be able to calculate a solution that satisfies all of the supplied constraints?
I'd like a generic method that can be reused on different class sizes each year, but within each simulation run, the following constants and variables apply:
Constants: Total number of students, Student preferences
Variables: Group sizes, Student Groupings, Number of different group arrangements/iterations to test
Thanks in advance for any help/advice/pointers provided.
I believe you can state this as an explicit mathematical optimization problem.
Define the binary decision variables:
x(p,g) = 1 if person p is assigned to group g
         0 otherwise
I used your data set with 28 persons, and your preference matrix (with -1,+1,0 elements). For groups, I used 4 groups of 6 and 1 group of 4. A solution can look like:
---- 80 PARAMETER solution using MIQP model
group1 group2 group3 group4 group5
aimee 1
amber-la 1
amber-le 1
andrina 1
catelyn-t 1
charlie 1
charlotte 1
cory 1
daniel 1
ellie 1
ellis 1
eve 1
grace-c 1
grace-g 1
holly 1
jack 1
jade 1
james 1
kadie 1
kieran 1
kristiana 1
lily 1
luke 1
naz 1
nibah 1
niko 1
wiki 1
zeina 1
COUNT 6 6 6 6 4
Notes:
This model can be linearized, so it can be fed into a standard MIP solver
I solved this directly as a MIQP model (actually the solver reformulated the model into a MIP). The model solved in a few seconds.
Probably we need to add extra logic to make sure one person is not getting a really bad assignment. We optimize here only the total sum. This overall sum may allow an individual to get a bad deal. It is an interesting exercise to take this into account in the model. There are some interesting trade-offs.
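For comparison with the solver-based model, here is a minimal sketch of the random-simulation idea from the question. It is not the MIQP model: it only evaluates the total preference score of an arrangement and keeps the best of many random ones, with no guarantee of optimality. Tom's preferences follow the question; all other preferences, the group sizes and the helper names are assumptions made for illustration.

```python
import random

# Tom's preferences follow the question; every other entry is hypothetical.
# +1 = likes to sit with, -1 = doesn't like to sit with, missing = neutral (0).
prefs = {
    "Tom": {"Jimi": 1, "Eric": 1, "John": -1, "Paul": -1, "Ringo": -1, "George": -1},
    "Jimi": {"Tom": 1},
    "John": {"Paul": 1},
    "Paul": {"John": 1},
    "Ringo": {"George": 1},
}
students = ["Tom", "Jimi", "Eric", "John", "Paul", "Ringo", "George"]


def total_score(groups, prefs):
    """Sum prefs[p][q] over every ordered pair (p, q) that shares a group."""
    return sum(
        prefs.get(p, {}).get(q, 0)
        for group in groups
        for p in group
        for q in group
        if p != q
    )


def random_grouping(students, sizes, rng):
    """Shuffle the students and cut the list into groups of the given sizes."""
    shuffled = list(students)
    rng.shuffle(shuffled)
    groups, start = [], 0
    for size in sizes:
        groups.append(shuffled[start:start + size])
        start += size
    return groups


def best_of_random(students, sizes, prefs, trials=2000, seed=0):
    """Keep the best-scoring arrangement out of `trials` random ones."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        groups = random_grouping(students, sizes, rng)
        score = total_score(groups, prefs)
        if score > best_score:
            best, best_score = groups, score
    return best, best_score


best, best_score = best_of_random(students, sizes=[3, 2, 2], prefs=prefs)
print(best_score, best)
```

Because only the total is maximized, this sketch shares the limitation noted above: one student can still end up with a poor individual assignment.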
A first approach would be to create an n x n matrix, where n is the total number of students. Rows and columns are indexed by student, and each row holds one student's preferences toward the students in the columns. Fill the cells with 1 = likes to sit with, -1 = the opposite, 0 = neutral. The main diagonal (i,i) is filled with zeroes too.
       Mark  Maria  John  Peter
Mark      0      1    -1      1
Maria     0      0    -1      1
John     -1      1     0      1
Peter                         0
Score calculations are based on sums of these values. For example, John likes to sit with Maria (1), but Maria doesn't like to sit with John (-1), so the result is 0. The best result for a pair is when both like each other, for a score (sum) of 2.
Then, based on the group sizes, calculate the score of each possible combination. The bigger the score, the better the arrangement. Combinations ignore the values on the main diagonal, i.e. John grouped with John himself is not a valid combination/group.
In a group of size 2, the best score is 2
In a group of size 3, the best score is 6
In a group of size 4, the best score is 12
In a group of size n, the best score would be (n-1)*n
Now, in the list of combinations/groups ordered by score, take the tuples with the highest scores first, while skipping any tuple that repeats a student who has already been placed.
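The matrix approach above can be sketched as follows. The rows for Mark, Maria and John are taken from the example; Peter's row is assumed to be all zeroes (neutral), which is an assumption of this sketch. The greedy pick is a heuristic and is not guaranteed to find the best overall arrangement.

```python
from itertools import combinations

names = ["Mark", "Maria", "John", "Peter"]
# pref[i][j] is how student i feels about sitting with student j.
# Peter's row is ASSUMED neutral; the other rows come from the example.
pref = [
    [0, 1, -1, 1],   # Mark
    [0, 0, -1, 1],   # Maria
    [-1, 1, 0, 1],   # John
    [0, 0, 0, 0],    # Peter (assumed)
]


def group_score(group):
    """Sum the preferences of every ordered pair in the group, skipping the diagonal."""
    return sum(pref[i][j] for i in group for j in group if i != j)


def greedy_groups(size):
    """Take the highest-scoring groups first, skipping groups that reuse a student."""
    candidates = sorted(
        combinations(range(len(names)), size), key=group_score, reverse=True
    )
    used, chosen = set(), []
    for group in candidates:
        if used.isdisjoint(group):
            chosen.append(tuple(names[i] for i in group))
            used.update(group)
    return chosen


pairs = greedy_groups(2)
print(pairs)
```

Note that enumerating every combination grows quickly with class size, so for 30 students and groups of up to 6 this exhaustive listing becomes expensive.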
In a recent study, particle swarm optimization (PSO) was used to divide students into an unknown number of groups of 4 to 6. PSO showed improved capabilities compared to a genetic algorithm (GA). I think that study is all you need.
The paper is: Forming automatic groups of learners using particle swarm optimization for applications of differentiated instruction
You can find the paper here: https://doi.org/10.1002/cae.22191
Perhaps the researchers could guide you through researchgate: https://www.researchgate.net/publication/338078753
Regarding the optimal seating, you need to specify an objective function for your specific data.

Task with graph - looking for advice

I have a task involving a graph. I'm not looking for code, only for an idea. I don't even know where to start. The task is as follows:
The first line contains two numbers, n and q:
n is the number of cities and q is the number of days.
The next line contains n integers n1, n2, n3, n4...n_n, where n_i means that we can earn n_i money in the city with number i.
The next n-1 lines describe the connections between cities a and b.
Each line has the form a, b, c, which means that
a is connected with b (and b with a) and the cost of this path is c.
Next come q lines that describe changes. There are 2 cases:
the form 1 v d, which means that from the dawn of day i the profit from city v will be d,
and the form 2 a b d, which means that from the dawn of day i the cost of the path between a and b (and b to a) will be d.
Our task is to print the id of the city where we will sleep after each day i.
We start from the city with index 1, and when we're in city number 2, we sleep in that city.
For example:
input:
4 4
10 20 30 50
1 2 5
2 3 7
2 4 57
1 3 28
1 1 25
2 3 2 1
2 2 4 13
output:
3 1 3 4
Sorry for my English :/ As I said before, I'm not looking for code, just a general idea.
#EDIT
Maybe this will be useful info: when we go to the city with index B, we spend the night there. The question is where we will spend our nights, i.e. what our path looks like.

Summarize different category rankings

I determine the rankings of, e.g., 1000 participants in multiple categories.
The results look something like this:
Participant/Category/Place (lower is better):
A|1|1.
A|2|1.
A|3|1.
A|4|7.
B|1|2.
B|2|2.
B|3|2.
B|4|4.
[...]
Now I want to summarize the rankings. The standard method would be to sum up all places and divide by the number of categories:
Participant A: (1+1+1+7) / 4 = 2.5
Participant B: (2+2+2+4) / 4 = 2.5
But I want to favor participant A, because they won 3 of 4 categories.
I could define fixed points for all places, e.g.:
Place|Points
1|1000
2|500
3|250
4|125
5|62.5
6|31.25
7|15.625
[...]
Participant A: 1000+1000+1000+15.625 = 3015.625
Participant B: 500+500+500+125 = 1625
The problem is that I want to give every place some points, so that low places can still be sorted. But if I keep dividing the available points by 2, I run out of decimal places (available points / 2^number of places).
What can I do?
How about using the harmonic mean?
4 / (1/1 + 1/1 + 1/1 + 1/7) = 1.272727
4 / (1/2 + 1/2 + 1/2 + 1/4) = 2.285714
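The two calculations above can be reproduced with Python's standard library. This is a small sketch using only the places from the question; lower values are better, as with the places themselves.

```python
from statistics import harmonic_mean

# Places per category for the two participants from the question.
places = {"A": [1, 1, 1, 7], "B": [2, 2, 2, 4]}

# Harmonic mean = number of categories / sum of reciprocal places.
scores = {name: harmonic_mean(p) for name, p in places.items()}

# Lower is better, so sort ascending.
ranking = sorted(scores, key=scores.get)
print(scores, ranking)
```

The harmonic mean is dominated by the small values, so a few first places outweigh one poor result. It needs no fixed points table, which avoids the precision problem with repeated halving.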

User submitted rankings

I was looking to have members submit their top-10 list of something, or their top 10 rankings, then have some algorithm combine the results. Is there something out there like that?
Thanks!
Ahhhh, that's open-ended alright. Let's consider a simple case where only two people vote:
1 ALPHA
2 BRAVO
3 CHARLIE
1 ALPHA
2 DELTA
3 BRAVO
We can't go purely by count... ALPHA should obviously win, though it has the same votes as BRAVO. Yet, we must avoid a case where just a few first place votes dominate a massive amount of 10th place votes. To do this, I suggest the following:
$score = log($num_of_answers - $rank + 2)
First place would then be worth just a bit over one point, and tenth place would get .3 points (using a base-10 log). That logarithmic scaling prevents ridiculous dominance, yet still gives weight to rankings. From those example votes (and assuming they were the top 3 of a list of 10), you would get:
ALPHA: 2.08
BRAVO: 1.95
DELTA: 1.0
CHARLIE: .95
Why? Well, that's subjective. I feel out of a very long list that 4,000 10th place votes is worth more than 1,000 1st place votes. You may scale it differently by changing the base of your log (natural, 2, etc.), or choose a different system.
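A minimal Python sketch of this logarithmic scoring, assuming a base-10 log and lists of 10 answers (both inferred from the point values quoted above). The function name `log_scores` and the ballot layout are made up for illustration.

```python
import math
from collections import defaultdict


def log_scores(ballots, num_of_answers=10):
    """Add log10(num_of_answers - rank + 2) points for each ranked item."""
    totals = defaultdict(float)
    for ballot in ballots:
        for rank, item in enumerate(ballot, start=1):
            totals[item] += math.log10(num_of_answers - rank + 2)
    return dict(totals)


# The two example votes, treated as the top 3 of a list of 10.
ballots = [
    ["ALPHA", "BRAVO", "CHARLIE"],
    ["ALPHA", "DELTA", "BRAVO"],
]
scores = log_scores(ballots)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Swapping `math.log10` for `math.log` or `math.log2` changes how strongly high ranks are rewarded relative to low ones.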
You could just add up, for each item, the positions it was given by each user, and then sort the items by their totals (lowest total first).
e.g.:
A = (a,b,c)
B = (a,c,b)
C = (b,a,c)
D = (c,b,a)
E = (a,c,b)
F = (c,a,b)
a = 1 + 1 + 2 + 3 + 1 + 2 = 10
b = 2 + 3 + 1 + 2 + 3 + 3 = 14
c = 3 + 2 + 3 + 1 + 2 + 1 = 12
Thus,
a
c
b
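The sum-of-positions idea can be sketched in a few lines of Python. The ballots are the six from the example; the function name `rank_sums` is hypothetical. The sketch assumes every user ranks every item, so for real top-10 lists some default position for unlisted items would have to be chosen.

```python
from collections import defaultdict

# Each user's ranking, best item first, as in the example.
ballots = {
    "A": ("a", "b", "c"),
    "B": ("a", "c", "b"),
    "C": ("b", "a", "c"),
    "D": ("c", "b", "a"),
    "E": ("a", "c", "b"),
    "F": ("c", "a", "b"),
}


def rank_sums(ballots):
    """Add up the position (1 = best) each item received across all ballots."""
    totals = defaultdict(int)
    for ranking in ballots.values():
        for position, item in enumerate(ranking, start=1):
            totals[item] += position
    return dict(totals)


totals = rank_sums(ballots)
# The lowest total wins.
order = sorted(totals, key=totals.get)
print(totals, order)
```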
I think you could solve this problem by using a max flow algorithm, to create an aggregate ranking, assuming the following:
Each unique item from the list of items is a node in a graph. E.g. if there are 10 things to vote on, there are 10 nodes.
An edge goes from node *a* to node *b* if *a* is immediately before *b* in a _single user submitted_ ranking.
The last node created from a _single user submitted_ ranking will have an edge pointed at the *sink*
The first node created from a _single user submitted_ ranking will have an incoming edge from the *source*
This should get you an aggregated top-10 list.
