Intersecting Sorted Sets in Redis - data-structures

I have a sorted set in Redis containing values like those below:
ZADD ranking1 0 "Kyle Neath"
ZADD ranking1 1 "Cameron McEfee"
ZADD ranking1 2 "Ben Bliekamp"
ZADD ranking1 3 "Justin Palmer"
ZADD ranking2 0 "Cameron McEfee"
ZADD ranking2 1 "Justin Palmer"
ZADD ranking2 2 "Kyle Neath"
ZADD ranking2 3 "Ben Bliekamp"
... and so on.
Is there a way to fetch the scores for a certain person and return them in list form? As an example, calling Kyle Neath would return [0, 2]. Should I be modeling this differently to achieve the same thing?

With the current layout of the data, the only way to get the list is to issue one ZSCORE per ranking.
Besides these sorted sets, you could keep one hash per person holding their position in each ranking. Memory usage won't be much higher, since strings are reused and hashes are pretty cheap.
For example:
HMSET "Kyle Neath" ranking1 0 ranking2 2
HMSET "Cameron McEfee" ranking1 1 ranking2 0
HMSET "Ben Bliekamp" ranking1 2 ranking2 3
HMSET "Justin Palmer" ranking1 3 ranking2 1
And to fetch the list:
HVALS "Kyle Neath"
But you will have to ensure consistency of the sorted sets and the hashes in your application code.
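The two layouts can be compared with a small sketch. This is a pure-Python stand-in for the Redis data rather than real client code: the dictionaries and the helper names `scores_for` and `build_person_hashes` are hypothetical and only mirror what ZSCORE and HVALS would return.

```python
# In-memory stand-in for the sorted sets: ranking name -> {member: score}.
rankings = {
    "ranking1": {"Kyle Neath": 0, "Cameron McEfee": 1,
                 "Ben Bliekamp": 2, "Justin Palmer": 3},
    "ranking2": {"Cameron McEfee": 0, "Justin Palmer": 1,
                 "Kyle Neath": 2, "Ben Bliekamp": 3},
}


def scores_for(person, rankings):
    """One lookup per ranking, like issuing one ZSCORE per sorted set."""
    return [scores[person]
            for _name, scores in sorted(rankings.items())
            if person in scores]


def build_person_hashes(rankings):
    """Invert the rankings into one mapping per person (the HMSET layout)."""
    hashes = {}
    for name, scores in rankings.items():
        for person, score in scores.items():
            hashes.setdefault(person, {})[name] = score
    return hashes


person_hashes = build_person_hashes(rankings)
print(scores_for("Kyle Neath", rankings))          # [0, 2]
print(list(person_hashes["Kyle Neath"].values()))  # [0, 2], like HVALS
```

In a real deployment both structures would have to be written together (for example in one MULTI/EXEC transaction) so that they do not drift apart.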

Related

How to convert layers of many-to-one relationships into input data without losing information?

I apologize if the question doesn't make much sense.
I'm trying to convert data into inputs for a classifier. I have multiple layers of many to one data which I need to "attach" to the top layer. I will try to explain with an example:
Let's say you have a household, each household contains at least one person (represented with categorical data male/female/other), each person has some amount of pets (represented with categorical data dog/cat/rat/etc..). Is it possible to represent this data into one row (for the household) without losing information?
One way I could think of doing it was the count of the amount of data for each category, so a household would have 2 males, 1 female, 2 dogs and 1 cat. Except this loses the information about how the data itself is structured, like if the female has all of the pets, that data doesn't tell you that.
The other way would be structuring each household into a database, so each row is a person containing m/f/o and the amount of each pet, then performing some dimensionality reduction technique to put it all on one row for the household but I'm not sure if this is feasible.
So yeah, any advice would be appreciated.
Have you tried one-hot encoding to represent each on a single line? Would this method work?
import pandas as pd
df = pd.DataFrame({'household': ["M", "M", "F", "M", "F", "F"],
                   'pets': ["Cat", "Dog", "Cat", "Hamster", "Dog", "Cat"],
                   'amount of pets': [2, 1, 1, 2, 2, 3]})
df = pd.get_dummies(df, columns=["pets"])
print(df)
  household  amount of pets  pets_Cat  pets_Dog  pets_Hamster
0         M               2         1         0             0
1         M               1         0         1             0
2         F               1         1         0             0
3         M               2         0         0             1
4         F               2         0         1             0
5         F               3         1         0             0
print(df.iloc[:1])
  household  amount of pets  pets_Cat  pets_Dog  pets_Hamster
0         M               2         1         0             0
This way, a single line shows both the animal owned and how many of them. Is this the representation you want?
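If one row per household is required, one possible extension (not part of the answer above) is to count sex-by-pet combinations alongside the plain category counts, so the row still records which kind of person owns which kind of pet. The household data and the helper `household_row` below are hypothetical, and the encoding still cannot tell two people of the same sex apart.

```python
from collections import Counter

# Hypothetical household: each person is (sex, list of pets owned).
household = [
    ("M", ["Dog", "Dog"]),
    ("M", []),
    ("F", ["Cat"]),
]


def household_row(people):
    """Flatten one household into a single dict of counts.

    Besides per-category counts, it counts (sex, pet) pairs so that
    some of the ownership structure survives the flattening.
    """
    row = Counter()
    for sex, pets in people:
        row[f"sex_{sex}"] += 1
        for pet in pets:
            row[f"pet_{pet}"] += 1
            row[f"{sex}_owns_{pet}"] += 1
    return dict(row)


print(household_row(household))
```

A list of such dicts can be passed to `pd.DataFrame(...).fillna(0)` to get one fixed-width row per household.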

Avoid creation of Dense Matrix on a large dataset in Python

You might think this question has already been answered, but I didn't find anything useful on the internet.
I am trying to find similar users in a movies dataset, using Jaccard distance. Each user has a userId from 1 to 1000. For each user, we store the movies he has seen (movieId) and the rating he has left. movieIds are integers from 1 to 100,000 and ratings are values from 1 to 10. If a rating is 0, we assume that the user didn't watch that particular movie.
So, our dense matrix should look like this:
movie1 | movie2 | movie3 | movie4 | .... | movie100000
user1: 5 0 3 2 .... 0
user2: 0 0 1 4 .... 0
user3: 0 0 0 0 .... 1
..... .... .... .... .... .... ....
user1000: 0 2 0 0 .... 0
Notice that because the dataset is so large, there will be a lot of zeros. Also, this matrix has size 1000 x 100000. This means that a normal laptop will need a lot of RAM, and the algorithm will spend a lot of time just constructing the dense matrix. That's why I am trying to avoid this approach. Performance matters a lot.
Is there any way to work around this?
Thanks in advance.
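One way to sidestep the dense matrix, sketched under the assumption that Jaccard distance is computed on the set of watched movies (ratings only decide whether a movie counts as watched): store one set of movieIds per user. The input format of `(userId, movieId, rating)` triples and the function names are hypothetical.

```python
from collections import defaultdict


def build_watched(ratings):
    """Map each userId to the set of movieIds with a non-zero rating."""
    watched = defaultdict(set)
    for user_id, movie_id, rating in ratings:
        if rating > 0:
            watched[user_id].add(movie_id)
    return watched


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def most_similar(user_id, watched):
    """Return the other user with the smallest Jaccard distance."""
    others = (u for u in watched if u != user_id)
    return min(others,
               key=lambda u: jaccard_distance(watched[user_id], watched[u]))


# The non-zero entries of the first three rows of the example matrix.
ratings = [(1, 1, 5), (1, 3, 3), (1, 4, 2),
           (2, 3, 1), (2, 4, 4),
           (3, 100000, 1)]
watched = build_watched(ratings)
print(jaccard_distance(watched[1], watched[2]))  # 1 - 2/3
print(most_similar(1, watched))                  # 2
```

Memory then grows with the number of ratings rather than with users times movies. If matrix operations are needed later, `scipy.sparse` offers compressed formats built on the same idea.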

Algorithms for optimal student seating arrangements

Say I need to place n=30 students into groups of between 2 and 6, and I collect the following preference data from each student:
Student Name: Tom
Likes to sit with: Jimi, Eric
Doesn't like to sit with: John, Paul, Ringo, George
It's implied that they're neutral about any other student in the overall class that they haven't mentioned.
How might I best run a large number of simulations of many different/random grouping arrangements, to be able to determine a score for each arrangement, through which I could then pick the "most optimal" score/arrangement?
Alternatively, are there any other methods by which I might be able to calculate a solution that satisfies all of the supplied constraints?
I'd like a generic method that can be reused on different class sizes each year, but within each simulation run, the following constants and variables apply:
Constants: Total number of students, Student preferences
Variables: Group sizes, Student Groupings, Number of different group arrangements/iterations to test
Thanks in advance for any help/advice/pointers provided.
I believe you can state this as an explicit mathematical optimization problem.
Define the binary decision variables:
x(p,g) = 1 if person p is assigned to group g
         0 otherwise
I used your data set with 28 persons, and your preference matrix (with -1,+1,0 elements). For groups, I used 4 groups of 6 and 1 group of 4. A solution can look like:
---- 80 PARAMETER solution using MIQP model
group1 group2 group3 group4 group5
aimee 1
amber-la 1
amber-le 1
andrina 1
catelyn-t 1
charlie 1
charlotte 1
cory 1
daniel 1
ellie 1
ellis 1
eve 1
grace-c 1
grace-g 1
holly 1
jack 1
jade 1
james 1
kadie 1
kieran 1
kristiana 1
lily 1
luke 1
naz 1
nibah 1
niko 1
wiki 1
zeina 1
COUNT 6 6 6 6 4
Notes:
This model can be linearized, so it can be fed into a standard MIP solver
I solved this directly as a MIQP model (actually the solver reformulated the model into a MIP). The model solved in a few seconds.
Probably we need to add extra logic to make sure one person is not getting a really bad assignment. We optimize here only the total sum. This overall sum may allow an individual to get a bad deal. It is an interesting exercise to take this into account in the model. There are some interesting trade-offs.
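The linearization mentioned in the notes can be sketched as follows. The objective itself is not shown in the text, so the quadratic form below is an assumption: w_{p,q} stands for the combined preference score of persons p and q, and y_{p,q,g} is an auxiliary variable introduced only for this illustration.

```latex
% Assumed quadratic objective: reward pairs placed in the same group.
\max \sum_{g} \sum_{p<q} w_{p,q}\, x_{p,g}\, x_{q,g}

% Standard linearization of a product of two binary variables:
% y_{p,q,g} replaces x_{p,g} x_{q,g}.
\begin{align*}
\max \quad & \sum_{g} \sum_{p<q} w_{p,q}\, y_{p,q,g} \\
\text{s.t.} \quad & y_{p,q,g} \le x_{p,g} \\
                  & y_{p,q,g} \le x_{q,g} \\
                  & y_{p,q,g} \ge x_{p,g} + x_{q,g} - 1 \\
                  & 0 \le y_{p,q,g} \le 1
\end{align*}
```

With binary x the three inequalities force y to equal the product, so y can stay continuous and the model remains a MIP.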
A first approach: create an n x n matrix, where n is the total number of students. Row and column indexes are the ordinals of the students, and each row holds that student's preferences for sitting with the others. Fill the cells with 1 = likes to sit with, -1 = the opposite, 0 = neutral. The main diagonal (i,i) is filled with zeroes too.
       Mark  Maria  John  Peter
Mark      0      1    -1      1
Maria     0      0    -1      1
John     -1      1     0      1
Peter     0
Score calculations are based on sums of these values. For example, John likes to sit with Maria (1), but Maria doesn't like to sit with John (-1), so the result is 0. The best result for a pair is when both like each other, for a sum of 2.
Then, based on the group sizes, calculate the score of each possible combination. The bigger the score, the better the arrangement. Combinations exclude the values on the main diagonal, i.e. John grouped with John himself is not a valid combination/group.
In a group of size 2, the best score is 2.
In a group of size 3, the best score is 6.
In a group of size 4, the best score is 12.
In a group of size n, the best score is (n-1)*n.
Now, from the list of combinations/groups ordered by score, take the tuples with the highest scores first, while avoiding tuples that repeat a student.
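The scoring rule above, combined with the random simulations the question asks about, can be sketched like this. The function names, the fixed `sizes` list and the seed are assumptions for illustration. Peter's row is incomplete in the matrix above, so his missing entries default to neutral (0) here.

```python
import random

# pref[a][b] = 1 if a likes sitting with b, -1 if not, 0 (or absent) if neutral.
pref = {
    "Mark":  {"Maria": 1, "John": -1, "Peter": 1},
    "Maria": {"Mark": 0, "John": -1, "Peter": 1},
    "John":  {"Mark": -1, "Maria": 1, "Peter": 1},
    "Peter": {},  # incomplete in the source; treated as neutral (assumption)
}


def group_score(group, pref):
    """Sum pref[a][b] over all ordered pairs a != b in the group."""
    return sum(pref.get(a, {}).get(b, 0)
               for a in group for b in group if a != b)


def arrangement_score(groups, pref):
    return sum(group_score(g, pref) for g in groups)


def random_search(students, sizes, pref, iterations=1000, seed=0):
    """Try random arrangements with the given group sizes; keep the best."""
    rng = random.Random(seed)
    best_groups, best_score = None, float("-inf")
    for _ in range(iterations):
        order = list(students)
        rng.shuffle(order)
        groups, start = [], 0
        for size in sizes:
            groups.append(order[start:start + size])
            start += size
        score = arrangement_score(groups, pref)
        if score > best_score:
            best_groups, best_score = groups, score
    return best_groups, best_score


print(group_score(["John", "Maria"], pref))  # 1 + (-1) = 0
groups, score = random_search(list(pref), [2, 2], pref, iterations=200)
print(groups, score)
```

Random search gives no optimality guarantee. For 30 students it is only a baseline to compare against an exact or metaheuristic method.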
In recent research, PSO (particle swarm optimization) was used to sort students into an unknown number of groups of 4 to 6. PSO showed improved capabilities compared to a GA (genetic algorithm). I think that specific research is all you need.
The paper is: Forming automatic groups of learners using particle swarm optimization for applications of differentiated instruction
You can find the paper here: https://doi.org/10.1002/cae.22191
Perhaps the researchers could guide you through ResearchGate: https://www.researchgate.net/publication/338078753
Regarding the optimal seating, you need to specify an objective function with your specific data.

Redis: Group & Sum multiple ZSET Sorted Sets into one Sorted Set

If I have two sorted sets with different members and different scores:
ZADD set1 10 "player1"
ZADD set1 15 "player2"
ZADD set1 5 "player3"
ZADD set2 30 "player1"
ZADD set2 22 "player3"
I need to merge the two sets above, grouping all players and summing the scores of those that appear in both, to get this:
set3 40 "player1"
set3 15 "player2"
set3 27 "player3"
One way I tried is to fetch this data into Ruby objects and do the grouping logic. I am looking for ways to do this within Redis.
You're in luck, because Redis supports this out of the box!
ZINTERSTORE destination numkeys key [key ...] [WEIGHTS weight [weight ...]] [AGGREGATE SUM|MIN|MAX]
Summary: Intersect multiple sorted sets and store the resulting sorted set in a new key.
So in your case:
ZINTERSTORE set3 2 set1 set2 AGGREGATE SUM
And voilà! set3 contains the common players with summed scores:
127.0.0.1:6379> ZRANGE set3 0 -1 WITHSCORES
1) "player3"
2) "27"
3) "player1"
4) "40"
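Note that player2, who appears only in set1, is missing from this result, because an intersection keeps only members present in every input set. Redis also provides ZUNIONSTORE, which takes the same arguments and keeps members that appear in any input. The difference can be seen in a pure-Python model of the two commands with AGGREGATE SUM. The helper `zstore` is hypothetical and is not a Redis client call.

```python
def zstore(sets, mode="union"):
    """Model ZUNIONSTORE / ZINTERSTORE with AGGREGATE SUM on plain dicts.

    sets: list of {member: score} dicts.
    mode: "union" keeps members found in any set,
          "inter" keeps only members found in every set.
    """
    members = set(sets[0])
    for s in sets[1:]:
        members = members | set(s) if mode == "union" else members & set(s)
    return {m: sum(s[m] for s in sets if m in s) for m in members}


set1 = {"player1": 10, "player2": 15, "player3": 5}
set2 = {"player1": 30, "player3": 22}

print(sorted(zstore([set1, set2], mode="inter").items()))
# [('player1', 40), ('player3', 27)]
print(sorted(zstore([set1, set2], mode="union").items()))
# [('player1', 40), ('player2', 15), ('player3', 27)]
```

Against a real server, the union result corresponds to `ZUNIONSTORE set3 2 set1 set2 AGGREGATE SUM`, which matches the output asked for in the question.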

Array problem using if and do loop

This is my code:
data INDAT8; set INDAT6;
Array myarray{24,27};
goodgroups=0;
do i=2 to 24 by 2;
do j=2 to 27;
if myarray[i,j] gt 1 then myarray[i+1,j] = 'bad';
else if myarray[i,j] eq 1 and myarray[i+1,j] = 1 then myarray[i+1,j]= 'good';
end;
end;
run;
proc print data=INDAT8;
run;
Problem:
I have data in this format (just an example, for n=2):
X Y info
2 1 good
2 4 bad
3 2 good
4 1 bad
4 4 good
6 2 good
6 3 good
Now, the above data is sorted (7 rows in total). I need to form groups of 2, 3 or 4 rows separately and generate a graph. In the above data, I made groups of 2 rows. The third row is left alone, as there is no other row with X=3 to form a group with. A group can be formed only from rows with the same X value, NOT with rows of other X values.
Now, I check whether both rows have “good” in the info column. If both rows have “good”, the group formed is also good; otherwise it is bad. In the above example, the third (last) group is a “good” group. The rest are all bad groups. Once I’m done with all the rows, I calculate the total number of good groups divided by the total number of groups.
In the above example, the output will be: Total no. of good groups/Total no. of groups => 1/3.
This is the case of n=2 (size of group).
Now, for n=3, we make groups of 3 rows, and for n=4, groups of 4 rows, and find the good/bad groups in a similar way. If all the rows in a group have “good”, the result is a good group; otherwise it is bad.
Example: n= 3
2 1 good
2 4 bad
2 6 good
3 2 good
4 1 good
4 4 good
4 6 good
6 2 good
6 3 good
In the above case, I left out the 4th row and the last 2 rows, as I can’t make a group of 3 rows with them. The first group's result is “bad” and the last group's result is “good”.
Output: 1/ 2
For n= 4:
2 1 good
2 4 good
2 6 good
2 7 good
3 2 good
4 1 good
4 4 good
4 6 good
6 2 good
6 3 good
6 4 good
6 5 good
In this case, I make groups of 4 and find the result. The 5th, 6th, 7th and 8th rows are left behind or ignored. I made 2 groups of 4 rows and both are “good” groups.
Output: 2/2
So, after getting the 3 output values for n=2, n=3 and n=4, I will plot a graph of these values.
If you can help in any language using arrays, if and do loops, it would be great.
I can change my code accordingly.
Update:
The answer for this doesn't have to be in SAS. Since it is more algorithm-related than anything, I will accept suggestions in any language as long as they show how to accomplish this using arrays and do loops.
I am having trouble understanding your problem statement, but from what I can gather here is what I can suggest:
Place the data into bins and then process the summary data.
Implementation 1
Assumption: You don't know what the range of the first column will be, or the distribution will be sparse.
Create a hash table. The key will be the item you are doing your grouping on. The value will be the count seen so far.
Process each record. If the key already exists, increment the count (the value for that key in the hash). Otherwise add the key and set the value to 1.
Continue until you have processed all records.
Count the number of keys in the hash table and the number of values that meet your threshold.
Implementation 2
Assumption: You know the range of the first column and the distribution is reasonably dense.
Create an array of integers with enough elements so the index can match the column value. Initialize all elements to zero. This array will hold your count for each item you are grouping on.
Process each record. Examine the value of the first column. Increment the corresponding index in the array. (So if you have "2 1 good", do groupCount[2]++.)
Continue until you have processed all records.
Walk each element in the array. Count how many items are non-zero (meaning they appeared at least once) and how many items meet your threshold.
You can use the same approach for gathering the good and bad counts.
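Implementation 1 can be sketched in Python as below. It rests on two assumptions that the question leaves open: an X value forms a group when it has at least n rows, and a group is good only when every row for that X value is good. The function name `good_group_ratio` is hypothetical.

```python
from collections import defaultdict


def good_group_ratio(records, n):
    """Return (good groups, total groups) for group size n.

    records: iterable of (x, y, info) rows, grouped on x.
    """
    counts = defaultdict(int)       # rows seen per x value
    good_counts = defaultdict(int)  # rows marked "good" per x value
    for x, _y, info in records:
        counts[x] += 1
        if info == "good":
            good_counts[x] += 1
    # Assumption: an x value forms a group once it has at least n rows.
    groups = [x for x, c in counts.items() if c >= n]
    # Assumption: a group is good only if all of its rows are good.
    good = [x for x in groups if good_counts[x] == counts[x]]
    return len(good), len(groups)


# The n=2 example from the question.
rows = [(2, 1, "good"), (2, 4, "bad"), (3, 2, "good"),
        (4, 1, "bad"), (4, 4, "good"), (6, 2, "good"), (6, 3, "good")]
print(good_group_ratio(rows, 2))  # (1, 3)
```

Running the function once per group size (n=2, 3, 4, each with its own data) gives the three ratios to plot.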
