Extract possible sample combinations from multiple count constraints - algorithm

I have some input data like this:
unique ID  Q1  Q2  Q3
1          1   1   2
2          1   1   2
3          1   0   3
4          2   0   1
5          3   1   2
6          4   1   3
My goal is to extract a subset of rows that satisfies the following conditions:
total count: 4
Q1=1 count: 2
Q1=2 count: 1
Q2=1 count: 1~3
Q3=1 count: 1
In this case, the data sets with ids [1, 2, 4, 5] and [2, 3, 4, 5] are both acceptable answers.
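To make the conditions concrete, here is a minimal sketch that only checks whether a candidate selection meets them; it is not a solver. The names `ROWS`, `CONSTRAINTS`, `TOTAL` and `satisfies` are hypothetical, and the data is the six sample rows from the question.

```python
# Sample rows from the question: id -> (Q1, Q2, Q3)
ROWS = {
    1: (1, 1, 2),
    2: (1, 1, 2),
    3: (1, 0, 3),
    4: (2, 0, 1),
    5: (3, 1, 2),
    6: (4, 1, 3),
}

# Each constraint: (column index, required value, min count, max count)
CONSTRAINTS = [
    (0, 1, 2, 2),  # Q1=1 count: 2
    (0, 2, 1, 1),  # Q1=2 count: 1
    (1, 1, 1, 3),  # Q2=1 count: 1~3
    (2, 1, 1, 1),  # Q3=1 count: 1
]
TOTAL = 4  # total count: 4


def satisfies(ids):
    """Return True if the selected ids meet the total and every count constraint."""
    if len(ids) != TOTAL or len(set(ids)) != TOTAL:
        return False
    for column, value, low, high in CONSTRAINTS:
        count = sum(1 for i in ids if ROWS[i][column] == value)
        if not low <= count <= high:
            return False
    return True


print(satisfies([1, 2, 4, 5]))
print(satisfies([2, 3, 4, 5]))
```

A checker like this is cheap (one pass per constraint), so the hard part of the problem is the search over subsets, not the validation of a single candidate.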
In reality, I may have 6000+ rows of data and up to 12 count constraints like the ones above. Each count may vary from 1 to 50.
I've written a solution which first groups all ids by each condition, then uses depth-first search to exhaustively try all possible combinations between the groups. (I believe this is a brute-force solution...)
However, I always run out of memory and time before I get a possible answer.
My questions are:
What is the lowest possible time complexity for this problem? (I believe it is a kind of subset sum problem, but I am not sure.)
How can I solve this problem without brute force? I'm considering dynamic programming or a decision tree, but I believe I would probably run out of memory with either of them. Or could I solve it using each data row's probabilities/entropy? (I would appreciate more details on this.)
My brute-force code is not worth reading at all, so I'll skip posting it.

Related

Practical example of the Optimal Page Replacement Algorithm, with 4 Tiles

I'm doing some theoretical examples with different page replacement algorithms, in order to get a better understanding for when I actually write the code. I'm kind of confused about this example.
Given below is a physical memory with 4 tiles (4 sections?). The following pages are visited one after the other:
R = 1, 2, 3, 2, 4, 5, 3, 6, 1, 4, 2, 3, 1, 4
Run the optimal page replacement algorithm on R with 4 tiles.
I know that when a page needs to be swapped in, the operating system swaps out the page whose next use will occur farthest in the future. In practice I'll have:
Time    1  2  3  4  5  6  7  8  9 10 11 12 13 14
Page    1  2  3  2  4  5  3  6  1  4  2  3  1  4
Tile 1  1  1  1
Tile 2     2  2
Tile 3        3
Tile 4
I'm not sure what happens at time 4, because we get page 2 but that's already present in memory. Normally, if it were another number like 6, it would go in Tile 4, but I'm lost in this case.
At time t=4, page 2 is already present, so there is no need to do anything. You can just skip it and move on to the next time step.
If it were another number like 6, you would move it into a free slot if one is available; otherwise you would find the page that won't be used for the longest time in the future and swap that one out.
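The rule described above can be sketched as a small simulation. This is an illustrative sketch, not a reference implementation; the function name `optimal_page_faults` is hypothetical, and a page that is never used again is treated as being used infinitely far in the future.

```python
def optimal_page_faults(refs, frames):
    """Simulate optimal page replacement and return the number of page faults."""
    memory = []  # pages currently held in the tiles
    faults = 0
    for i, page in enumerate(refs):
        if page in memory:
            continue  # hit: nothing to do, move on to the next time step
        faults += 1
        if len(memory) < frames:
            memory.append(page)  # a free tile is available
            continue

        def next_use(p):
            # Position of the next reference to p, or infinity if none.
            try:
                return refs.index(p, i + 1)
            except ValueError:
                return float("inf")

        # Evict the page whose next use lies farthest in the future.
        victim = max(memory, key=next_use)
        memory[memory.index(victim)] = page
    return faults


R = [1, 2, 3, 2, 4, 5, 3, 6, 1, 4, 2, 3, 1, 4]
print(optimal_page_faults(R, 4))  # prints 7
```

On the reference string from the question, the faults occur at times 1, 2, 3, 5, 6, 8 and 11; time 4 is a hit and changes nothing.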

Looking for a sorting algorithm

I am looking for a sorting algorithm to help me in my work. My objective is the following: after receiving an input of this kind:
5 4
1 2
2 3
3 4
4 5
On the first line, the first number tells me how many ids I have, and the second number tells me how many connections. The following lines list the connections; each one tells me that the first id comes before the second one. For example: 1 comes before 2, 2 comes before 3, and so on. And if an impossible situation occurs:
3 2
1 2
2 3
3 1
or
2 2
1 2
2 1
I want to be able to send an error message.
Is there an algorithm that already does this? Or can you give me some guidelines on how to start my work? I do not want your code, just some help/tips/advice. Thanks in advance for your time.
From your description, I think you are probably looking for topological sorting.
This is based on the assumption that an 'impossible situation' occurs when one connection says that A comes before B but other connections, directly or through a chain, say that B comes before A. In other words, the connections form a cycle.
Link for topological sort:
Topological Sorting
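For readers who do want to see the idea in code, here is a minimal sketch of one common way to do a topological sort (Kahn's algorithm), assuming ids are numbered 1 to n as in the question's input. The function name `topological_order` is hypothetical. If some ids can never be placed, the connections contain a cycle and an error is raised.

```python
from collections import deque


def topological_order(n, edges):
    """Return the ids 1..n in an order that respects every (before, after) pair.

    Raises ValueError when the connections contradict each other (a cycle).
    """
    successors = {v: [] for v in range(1, n + 1)}
    indegree = {v: 0 for v in range(1, n + 1)}
    for before, after in edges:
        successors[before].append(after)
        indegree[after] += 1

    # Start with the ids that nothing has to come before.
    ready = deque(v for v in successors if indegree[v] == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)

    if len(order) != n:
        raise ValueError("impossible situation: the connections form a cycle")
    return order


print(topological_order(5, [(1, 2), (2, 3), (3, 4), (4, 5)]))  # [1, 2, 3, 4, 5]
```

Calling it with the connections 1 2, 2 3, 3 1 raises the `ValueError`, which is the place to emit the error message.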

How to find a recurring pattern into a list of numbers?

I need a way to find a pattern in a list of values. In particular, every second I get a value in a range (e.g. 1-3), and I want to find the recurring pattern in this list of values.
If I plot these values in an x,y system, I'd get something resembling Nyquist–Shannon sampling. It could be very interesting to work on this.
I could also plot these values and work on visual pattern recognition (neural networks...).
input:
instant value
1 1
2 2
3 3
4 1
5 2
6 3
7 1
output->1,2,3
What would be the best way to proceed?
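One possible reading of the example above is "find the shortest block that, repeated, reproduces the whole list". A minimal sketch under that assumption follows; it only handles exact repetition with no noise, and the function name `shortest_repeating_pattern` is hypothetical.

```python
def shortest_repeating_pattern(values):
    """Return the shortest prefix whose repetition reproduces the whole list."""
    n = len(values)
    for period in range(1, n + 1):
        # A period fits if every value equals the one `period` steps earlier.
        if all(values[i] == values[i - period] for i in range(period, n)):
            return values[:period]
    return values  # only reached for an empty list


print(shortest_repeating_pattern([1, 2, 3, 1, 2, 3, 1]))  # [1, 2, 3]
```

If no shorter period fits, the whole list is returned, which means no repetition was observed in the data seen so far.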

Recommendation system and baseline predictors

I have a bunch of data where the first column represents users, the second column represents movies, and the third is a ten-point rating.
0 0 9
0 1 8
1 1 4
1 2 6
2 2 7
And I have to predict the third number for another set of data (user, movie, ?):
0 2
1 0
2 0
2 1
I use this method to find the bias values https://youtube.com/watch?v=dGM4bNQcVKI and this method to make predictions https://www.youtube.com/watch?v=4RSigTais8o.
Bias value for user number 0: (9 + 8) / 2 = 8.5, and 8.5 - 1.5 = 7.
Bias value for movie number 2: (6 + 7) / 2 = 6.5, and 6.5 - 1.5 = 5.
And the baseline predictor:
1.5 + 7 + 5 = 13.5, but the expected result in the contest is 7.052009.
According to the problem description, the result of my recommendation system should be:
0 2 7.052009
1 0 6.687943
2 0 6.995272
2 1 6.687943
Where is my mistake?
The raw average is the average of ALL the present scores: (9+8+4+6+7) / 5 = 6.8. I don't see that number anywhere in your calculation, so I guess that's your error.
In the video, the professor used the raw average of 3.5 in all the calculations, including the bias calculation, but he skipped how to reach that number. If you add all the numbers in the table from the video and divide by their count, you get 3.5.
0 2 8.2 is the answer for the first one, using your videos as a guide. The videos claim to have avoided calculus, so the different final answers in the contest probably come from using the "full" method.
0 2 ?, user 0 (row 0: 9 8 x), movie 2 (column 2: x 6 7)
raw average = 6.8
bias user 0: (9+8) / 2 - 6.8 = 1.7
bias movie 2: (6+7) / 2 - 6.8 = -0.3
prediction: 6.8+1.7-0.3 = 8.2
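The arithmetic above can be reproduced with a short sketch. It follows the simple averaging method worked through here, not whatever method produced the contest's reference answers, and the function name `baseline_prediction` is hypothetical. It assumes the user and the movie each have at least one rating.

```python
# (user, movie, rating) triples from the question
ratings = [(0, 0, 9), (0, 1, 8), (1, 1, 4), (1, 2, 6), (2, 2, 7)]


def baseline_prediction(ratings, user, movie):
    """Raw average plus user bias plus movie bias."""
    raw_average = sum(r for _, _, r in ratings) / len(ratings)
    user_scores = [r for u, _, r in ratings if u == user]
    movie_scores = [r for _, m, r in ratings if m == movie]
    # Each bias is the group's mean rating minus the raw average.
    user_bias = sum(user_scores) / len(user_scores) - raw_average
    movie_bias = sum(movie_scores) / len(movie_scores) - raw_average
    return raw_average + user_bias + movie_bias


print(round(baseline_prediction(ratings, 0, 2), 2))  # 8.2
```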
The problem looks like a variation of the Netflix contest. The contest's host knows the actual answers (the ratings) but doesn't give them to you. You are expected to guess/predict them, and the winner of the contest is the one who gets closest to the actual answers.
The winner of your contest got the closest, but he got there using an unknown method, or his own variation of a known method. If your goal is to match his answer exactly, you are better off asking him what method he used and how he modified it, and then trying to replicate his results.
If this were homework and not a contest, the teacher would expect you to use the "correct" method he taught you (there's no set method, just many methods that work with different accuracy), and you'd have to use it exactly as he taught it. But this is a contest, so your goal is to find a base method that approximates the answers best (the one you used has very low accuracy) and tinker with it a bit to get even better results.
If you want to understand the link, I suggest you do some research and later ask a statistics question, because it's just plain statistics. You can try to understand the link or research matrix factorization on your own. Remember that to get contest-winning results (or close to them) you won't be able to use a simple method like the one you found in the YouTube video; you will need a method with a lot more math.

sorting cards with wildcards

I am programming a card game and I need to sort a stack of cards by their rank, so that they form a gapless sequence.
In this special game the card with value 2 can be used as a wildcard, so for example the cards
2 3 5
should be sorted like this
3 2 5
because the 2 replaces the 4; otherwise it would not be a valid sequence.
However, the cards
2 3 4
should stay as they are.
Restriction: only one '2' can be used as a wildcard.
2 2 3 4
would also stay as it is, because the first 2 would replace the ACE (or 1, whatever you call it).
The following would not be a valid input sequence, since one of the 2s must be used as a wildcard and the other not, and it is then not possible to make up a gapless sequence.
2 4 2 6
Now I have difficulty figuring out whether a 2 is used as a wildcard or not. Once I have that, I think I can do the rest of the sorting.
Thanks for any algorithmic help on this problem!
EDIT in response to your clarification to your new requirement:
You're implying that you'll never get data for which a gapless sequence cannot be formed. (If only I could have such guarantees in the real world.) So:
Do you have a 2?
No: your sequence must already be gapless.
Yes: you need to figure out where to put it. Sort your input. Do you see a gap? Since you can only use one 2 as a wildcard, there can be at most one gap.
    No: treat the 2 as a legitimate number two.
    Yes: move the 2 to the gap to fill it in.
EDIT in response to your new requirement:
In this case, just look for the highest single gap, and plug it with a 2 if you have a 2 available.
Original answer:
Since your sequence must be gapless, you could count the number of 2s you have and the sizes of all the gaps that are present. Then just fill in the highest gap for which you have a sufficient number of 2s.
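The most recent version of this answer (the first EDIT above) can be sketched as follows. This is a minimal sketch that assumes the input can always form a gapless sequence with at most one 2 acting as a wildcard; the function name `place_wildcard` is hypothetical. It scans from the top so that, if more than one gap were present, the highest one is the one filled.

```python
def place_wildcard(cards):
    """Sort the cards and move one 2 into a single missing rank, if there is one."""
    ordered = sorted(cards)
    if 2 not in ordered:
        return ordered  # no wildcard: the sequence must already be gapless

    rest = ordered.copy()
    rest.remove(2)  # tentatively set one 2 aside as the wildcard

    # Look for a place where exactly one rank is missing, highest first.
    for i in range(len(rest) - 2, -1, -1):
        if rest[i + 1] - rest[i] == 2:
            return rest[: i + 1] + [2] + rest[i + 1:]

    # No gap found: the 2 stays at the front of the sorted sequence.
    return ordered


print(place_wildcard([2, 3, 5]))     # [3, 2, 5]
print(place_wildcard([2, 3, 4]))     # [2, 3, 4]
print(place_wildcard([2, 2, 3, 4]))  # [2, 2, 3, 4]
```

Input such as 2 4 2 6, which the question rules out as invalid, is not detected by this sketch.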

Resources