Related
For educational purposes I am trying to learn the quicksort algorithm. Instead of checking out an implementation on the web or implementing directly from the pseudocode on Wikipedia, I am trying a "hard way" approach.
I watched this lecture from CS50 https://www.youtube.com/watch?v=aQiWF4E8flQ&t=305s in order to understand how the numbers move while being "quick sorted". My implementation, which I will show below, works perfectly for the example provided in the video. The initial unsorted array in the video is this:
[6, 5, 1, 3, 8, 4, 7, 9, 2]
That's my code in Python 3:
len_seq = int(input())
print("len_seq", len_seq)
seq_in = list(map(int, input().split()))
print("seq_in", seq_in)

def my_quick_sort(seq):
    wall_index = 0
    pivot_corect_final_index_list = []
    while wall_index < len_seq:
        pivot = seq[-1]
        print("pivot", pivot)
        print("wall_index", wall_index)
        for i in range(wall_index, len_seq):
            print("seq[i]", seq[i])
            if seq[i] < pivot:
                print("before inside swap,", seq)
                seq[wall_index], seq[i] = seq[i], seq[wall_index]
                print("after inside swap,", seq)
                wall_index = wall_index + 1
                print("wall_index", wall_index)
        print("before outside swap,", seq)
        seq[wall_index], seq[-1] = seq[-1], seq[wall_index]
        print("after outside swap,", seq)
        pivot_corect_final_index = wall_index
        print("pivot correct final index", pivot_corect_final_index)
        pivot_corect_final_index_list.append(pivot_corect_final_index)
        print("pivot_corect_final_index_list", pivot_corect_final_index_list)
        wall_index = wall_index + 1
        print("wall_index", wall_index)
    return seq
print(my_quick_sort(seq_in))
To use Harvard's CS50 example in my code, you need to input this:
9
6 5 1 3 8 4 7 9 2
The algorithm works fine and returns the correct output:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
Continuing my study, I tried to implement Khan Academy's example: https://www.khanacademy.org/computing/computer-science/algorithms/quick-sort/a/overview-of-quicksort
The unsorted list in this case is:
[9, 7, 5, 11, 12, 2, 14, 3, 10, 6]
You need to input the following in my code in order to run it:
10
9 7 5 11 12 2 14 3 10 6
Unlike the Harvard example, in this case my implementation does not work perfectly. It returns:
[5, 2, 3, 6, 7, 9, 10, 11, 12, 14]
As you can see, all the numbers that I treated as pivots end up in the correct position. However, some numbers behind the pivots are not right.
Reading Khan Academy's article, it seems that my implementation is right on the partition step, but not on the conquer step. I am trying to avoid looking at a final solution; I am trying to improve what I have built so far. Not sure if this is the best method, but that's what I am trying right now.
How can I fix the conquer step? Is it necessary to introduce a recursive approach? How can I do that inside my iterative process?
And should that step be introduced after successfully treating each pivot?
Thanks for your patience in reading this long post.
Can't comment, not enough reputation.
In the first pass of your algorithm, you correctly place all elements smaller than the pivot to the left of the pivot. However, since your wall_index increases (e.g. from 0 to 1), you ignore the leftmost element at index 0 (it might not be in the correct position, so it should not be ignored).
In the Khan Academy test case, the number 5 gets placed at the leftmost index in the first pass and then gets ignored by subsequent passes, so it gets stuck on the left. Similarly, trying this modification of the Harvard example
9
6 5 1 3 8 4 7 2 9
yields
[6, 5, 1, 3, 8, 4, 7, 2, 9]
After the first partitioning, you have to make sure to apply quicksort to both the arrays to the left and to the right of the pivot. For example, after the first pivot (6) is placed in the correct position for the Khan example (what you labeled as the outside swap),
[5, 2, 3, 6, 12, 7, 14, 9, 10, 11]
<--1--> p <--------2--------->
you have to apply the same quicksort to both subarrays 1 and 2 in the diagram above. I suggest you try out the recursive implementation first, which will give you a good idea of the algorithm, then try to implement it iteratively.
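To make the conquer step concrete, here is a minimal recursive sketch (my own code, not the asker's, but using the same last-element pivot scheme): after the partition places the pivot, the same procedure is applied to the subarrays on both sides.

```python
def quick_sort(seq, lo=0, hi=None):
    """In-place quicksort using the last element of the range as pivot."""
    if hi is None:
        hi = len(seq) - 1
    if lo >= hi:
        return seq
    pivot = seq[hi]
    wall = lo
    # Partition: move everything smaller than the pivot left of the wall.
    for i in range(lo, hi):
        if seq[i] < pivot:
            seq[wall], seq[i] = seq[i], seq[wall]
            wall += 1
    # Place the pivot at its final position (the "outside swap").
    seq[wall], seq[hi] = seq[hi], seq[wall]
    # Conquer: recurse on both sides of the pivot.
    quick_sort(seq, lo, wall - 1)
    quick_sort(seq, wall + 1, hi)
    return seq

print(quick_sort([9, 7, 5, 11, 12, 2, 14, 3, 10, 6]))
# [2, 3, 5, 6, 7, 9, 10, 11, 12, 14]
```

The `lo`/`hi` bounds replace the single global `wall_index`: each recursive call only looks at its own slice, so the leftmost elements are no longer skipped.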
I wrote a sieve using Akka Streams to find the prime members of an arbitrary source of Int:
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ActorMaterializerSettings, ThrottleMode}
import akka.stream.scaladsl.{Flow, Sink, Source}

import scala.concurrent.ExecutionContext
import scala.concurrent.duration._

object Sieve extends App {
  implicit val system = ActorSystem()
  implicit val mat = ActorMaterializer(ActorMaterializerSettings(system))
  implicit val ctx = implicitly[ExecutionContext](system.dispatcher)

  val NaturalNumbers = Source.fromIterator(() => Iterator.from(2))

  val IsPrimeByEurithmethes: Flow[Int, Int, _] = Flow[Int].filter {
    case n: Int =>
      (2 to Math.floor(Math.sqrt(n)).toInt).par.forall(n % _ != 0)
  }

  NaturalNumbers
    .via(IsPrimeByEurithmethes)
    .throttle(100000, 1.second, 100000, ThrottleMode.Shaping)
    .to(Sink.foreach(println))
    .run()
}
Ok, so this appears to work decently well. However, there are at least a few potential areas of concern:
The modulo checks are run using par.forall, i.e. they are totally hidden within the Flow that filters, but I can see how it would be useful to have a Map from the candidate n to another Map of each n % _. Maybe.
I am checking way too many of the candidates needlessly - both in terms of checking n that I will already know are NOT prime based on previous results, and by checking n % _ that are redundant. In fact, even if I think the n is prime, it suffices to check only the known primes up until that point.
The second point is my more immediate concern.
I think I can prove rather easily that there is a more efficient way - by filtering out the source given each NEW prime.
So then....
2, 3, 4, 5, 6, 7, 8, 9, 10, 11... => (after finding p=2)
2, 3, 5, 7, 9, , 11... => (after finding p=3)
2, 3, 5, 7, , 11... => ...
Now after finding a p and filtering the source, we need to know whether the next candidate is a prime. Well, we can say for sure it is prime if the largest known prime is greater than its root, which will always happen, I believe, so it suffices to just pick the next element...
2, 3, 4, 5, 6, 7, 8, 9, 10, 11... => (after finding p=2) PICK n(2) = 3
2, 3, 5, 7, 9, , 11... => (after finding p=3) PICK n(3) = 5
2, 3, 5, 7, , 11... => (after finding p=5) PICK n(5) = 7
This seems to me like a rewriting of the originally-provided sieve to do far fewer checks at the cost of introducing a strict sequential dependency.
Another idea - I could remove the constraint by working things out in terms of symbols, like the minimum set of modulo checks that necessitates primality, etc.
Am I barking up the wrong tree? If not, how can I go about messing with my source in this manner?
I just started fiddling around with akka streams recently so there might be better solutions than this (especially since the code feels kind of clumsy to me) - but your second point seemed to be just the right challenge for me to try out building a feedback loop within akka streams.
Find my full solution here: https://gist.github.com/MartinHH/de62b3b081ccfee4ae7320298edd81ee
The main idea was to accumulate the primes that are already found and merge them with the stream of incoming natural numbers so the primes-check could be done based on the results up to N like this:
def isPrime(n: Int, primesSoFar: SortedSet[Int]): Boolean =
!primesSoFar.exists(n % _ == 0) &&
!(primesSoFar.lastOption.getOrElse(2) to Math.floor(Math.sqrt(n)).toInt).par.exists(n % _ == 0)
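The accumulate-primes idea is independent of Akka Streams. Here is a minimal Python sketch of the same trial division against only the primes found so far (the function name is mine):

```python
import math

def primes_up_to(limit):
    """Incremental trial division: test each candidate only against
    the primes already found, and only up to its square root."""
    primes = []
    for n in range(2, limit + 1):
        root = math.isqrt(n)
        if all(n % p != 0 for p in primes if p <= root):
            primes.append(n)
    return primes

print(primes_up_to(30))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

This is the sequential dependency the question describes: each candidate's check depends on the primes emitted before it, which is exactly what the feedback loop in the Akka solution has to express.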
As part of a program I'm writing I need to make sure a variable does not equal any number that is the result of multiplying 2 numbers in a given list. For example: I've got a list Primes = [2, 3, 5, 7, 11] and I need to make sure that X does not equal any two of those numbers multiplied together such as 6 (2*3) or 55 (5*11) etc...
The code I have is as follows:
list(Numbers) :-
    Numbers = [X, Y, Sum],
    between(3, 6, Y),
    between(3, 6, X),
    Primes = [2, 3, 5, 7, 11],
    Sum is X + Y,
    (Code i need help with)
The above code will type out results of [3,3,6], [4,3,7], [5,3,8] and so on. Now what I want is to be able to identify when Sum is equal to a prime * prime and exclude that from the results. Something like Sum \= Prime * Prime. However, I don't know how to loop through the elements in Primes in order to multiply two elements together, and then do that for all elements in the list.
Hope this makes sense; I'm not great at explaining things.
Thanks in advance.
This is inefficient, but easy to code:
...
forall((nth(I,Primes,X),nth(J,Primes,Y),J>I), Sum =\= X*Y).
I think you could use that loop to initialize a list of precomputed factors, then use memberchk/2.
In SWI-Prolog use nth1/3 instead of nth/3
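For comparison, here is the precomputation idea sketched in Python (all names are mine): build the set of pairwise products of distinct primes once, then exclude any sum found in it.

```python
from itertools import combinations

primes = [2, 3, 5, 7, 11]
# Precompute every product of two distinct primes once.
products = {p * q for p, q in combinations(primes, 2)}

# Same search space as the Prolog clause: X and Y both range over 3..6.
results = [(x, y, x + y)
           for y in range(3, 7)
           for x in range(3, 7)
           if (x + y) not in products]
print(results)
```

The membership test against a precomputed set plays the role of memberchk/2 over a precomputed factor list in the Prolog suggestion above.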
I'm programming a Killer Sudoku Solver in Ruby and I try to take human strategies and put them into code. I have implemented about 10 strategies but I have a problem on this one.
In killer sudoku, we have "zones" of cells and we know the sum of these cells and we know possibilities for each cell.
Example :
Cell 1 can be 1, 3, 4 or 9
Cell 2 can be 2, 4 or 5
Cell 3 can be 3, 4 or 9
The sum of all cells must be 12
I want my program to try all possibilities to eliminate possibilities. For instance, here, cell 1 can't be 9 because you can't make 3 by adding two numbers possible in cells 2 and 3.
So I want that, for any number of cells, it removes the possibilities that are impossible by trying them and seeing that they don't work.
How can I get this working?
There's multiple ways to approach the general problem of game solving, and emulating human strategies is not always the best way. That said, here's how you can solve your question:
1st way, brute-forcy
Basically, we want to try all possibilities of the combinations of the cells, and pick the ones that have the correct sum.
cell_1 = [1,3,4,9]
cell_2 = [2,4,5]
cell_3 = [3,4,9]
all_valid_combinations = cell_1.product(cell_2,cell_3).select {|combo| combo.sum == 12}
# => [[1, 2, 9], [3, 5, 4], [4, 4, 4], [4, 5, 3]]
#.sum isn't a built-in function, it's just used here for convenience
To pare this down to individual cells, you could do:
cell_1 = all_valid_combinations.map {|combo| combo[0]}.uniq
# => [1, 3, 4]
cell_2 = all_valid_combinations.map {|combo| combo[1]}.uniq
# => [2, 5, 4]
. . .
If you don't have a large set of cells, this way is easier to code. It can get a bit inefficient, though. For small problems, this is the way I'd use.
2nd way, backtracking search
Another well known technique takes the problem from the other approach. Basically, for each cell, ask "Can this cell be this number, given the other cells?"
So, starting with cell 1: can the number be 1? To check, we see if cells 2 and 3 can sum to 11 (12-1).
* Can cell 2 have the value 2? To check, can cell 3 sum to 9 (11-2)?
and so on. In very large cases, where you could have many many valid combinations, this will be slightly faster, as you can return 'true' on the first time you find a valid number for a cell. Some people find recursive algorithms a bit harder to grok, though, so your mileage may vary.
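Here is a minimal sketch of that recursive check, written in Python for brevity (the same logic ports directly to Ruby); `valid_values` and `feasible` are hypothetical names of mine:

```python
def valid_values(cells, target):
    """For each cell, keep only the values for which some choice of
    values in all the other cells reaches the target sum exactly."""
    def feasible(others, remaining):
        if not others:
            return remaining == 0
        # Return True on the first value that still allows a solution.
        return any(feasible(others[1:], remaining - v) for v in others[0])

    return [[v for v in cell
             if feasible(cells[:i] + cells[i + 1:], target - v)]
            for i, cell in enumerate(cells)]

cells = [[1, 3, 4, 9], [2, 4, 5], [3, 4, 9]]
print(valid_values(cells, 12))
# [[1, 3, 4], [2, 4, 5], [3, 4, 9]]
```

Note how 9 is eliminated from cell 1, as in the question: no values of cells 2 and 3 can sum to 3.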
I'm looking for a way to make the best possible combination of people in groups. Let me sketch the situation.
Say we have persons A, B, C and D. Furthermore we have groups 1, 2, 3, 4 and 5. Both are examples and can be less or more. Each person gives a rating to each other person. So for example A rates B a 3, C a 2, and so on. Each person also rates each group. (Say ratings are 0-5). Now I need some sort of algorithm to distribute these people evenly over the groups while keeping them as happy as possible (as in: They should be in a highrated group, with highrated people). Now I know it's not possible for the people to be in the best group (the one they rated a 5) but I need them to be in the best possible solution for the entire group.
I think this is a difficult question, and I would be happy if someone could direct me to some more information about this types of problems, or help me with the algo I'm looking for.
Thanks!
EDIT:
I see a lot of great answers, but this problem is too great for me to solve correctly. However, the answers posted so far give me a great starting point to look further into the subject. Thanks a lot already!
After establishing that this is an NP-hard problem, I would suggest a heuristic solution using Artificial Intelligence tools.
A possible approach is steepest ascent hill climbing [SAHC].
First, we define our utility function (let it be u). It can be the sum of total happiness in all groups.
Next, we define our 'world': S is the set of all possible partitions.
For each legal partition s of S, we define:
next(s) = {all possibilities of moving one person to a different group}
All we have to do now is run SAHC with random restarts:
1. best <- -INFINITY
2. while there is more time:
3.     choose a random partition as a starting point, denote it as s.
4.     NEXT <- next(s)
5.     if max{ u(NEXT) } < u(s): // s is the top of the hill
5.1.       if u(s) > best: best <- u(s) // if s is better than the previous result - store it.
5.2.       go to 2. // restart the hill climbing from a different random point.
6.     else:
6.1.       s <- argmax{ u(NEXT) } // move to the best neighbor
6.2.       go to 4.
7. return best // when out of time, return the best score found so far.
It is an anytime algorithm, meaning it will get a better result the more time you give it to run, and eventually [at time infinity] it will find the optimal result.
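The restart loop above can be sketched compactly in Python. This is a toy version under stated assumptions: the utility here is a placeholder (a hypothetical table of each person's rating of each group, ignoring person-to-person ratings), and the even-distribution constraint on group sizes is omitted for brevity:

```python
import random

def hill_climb(n_people, n_groups, utility, restarts=50):
    """Steepest-ascent hill climbing with random restarts.
    A partition is a list mapping each person index to a group index."""
    best_score, best = float("-inf"), None
    for _ in range(restarts):
        s = [random.randrange(n_groups) for _ in range(n_people)]
        while True:
            # next(s): all partitions reachable by moving one person.
            neighbors = [s[:i] + [g] + s[i + 1:]
                         for i in range(n_people)
                         for g in range(n_groups) if g != s[i]]
            top = max(neighbors, key=utility)
            if utility(top) <= utility(s):  # s is the top of the hill
                break
            s = top
        if utility(s) > best_score:         # store the best hilltop seen
            best_score, best = utility(s), s
    return best, best_score

# Toy utility: happiness = sum of each person's rating of their own group.
ratings = [[3, 1, 3], [5, 2, 4], [1, 1, 1], [2, 3, 4], [5, 5, 5], [4, 5, 1]]
u = lambda s: sum(ratings[i][g] for i, g in enumerate(s))
print(hill_climb(6, 3, u))
```

With a real utility that also counts person-to-person ratings, only the `u` function changes; the climbing loop stays the same.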
The problem is NP-hard: you can reduce from Maximum Triangle Packing (finding at least k vertex-disjoint triangles in a graph) to the version where there are k groups of size 3, no one cares which group he is in, and everyone rates everyone else either 0 or 1. So even this very special case is hard.
To solve it, I would try using an ILP: have binary variables g_ik indicating that person i is in group k, with constraints to ensure a person is only in one group and a group has an appropriate size. Further, binary variables t_ijk that indicate that persons i and j are together in group k (ensured by t_ijk <= 0.5 g_ik + 0.5 g_jk) and binary variables t_ij that indicate that i and j are together in any group (ensured by t_ij <= sum_k t_ijk). You can then maximize the happiness function under these constraints.
This ILP has very many variables, but modern solvers are pretty good and this approach is very easy to implement.
This is an example of an optimization problem. It is a very well
studied type of problems with very good methods to solve them. Read
Programming Collective Intelligence which explains it much better
than me.
Basically, there are three parts to any kind of optimization problem.
The input to the problem solving function.
The solution outputted by the problem solving function.
A scoring function that evaluates how optimal the solution is by
scoring it.
Now the problem can be stated as finding the solution that produces
the highest score. To do that, you first need to come up with a format
to represent a possible solution that the scoring function can then
score. Assuming 6 persons (0-5) and 3 groups (0-2), this python data structure
would work and would be a possible solution:
output = [
[0, 1],
[2, 3],
[4, 5]
]
Persons 0 and 1 are put in group 0, persons 2 and 3 in group 1, and so
on. To score this solution, we need to know the input and the rules for
calculating the output. The input could be represented by this data
structure:
input = [
[0, 4, 1, 3, 4, 1, 3, 1, 3],
[5, 0, 1, 2, 1, 5, 5, 2, 4],
[4, 1, 0, 1, 3, 2, 1, 1, 1],
[2, 4, 1, 0, 5, 4, 2, 3, 4],
[5, 5, 5, 5, 0, 5, 5, 5, 5],
[1, 2, 1, 4, 3, 0, 4, 5, 1]
]
Each list in the list represents the rating the person gave. For
example, in the first row, the person 0 gave rating 0 to person 0 (you
can't rate yourself), 4 to person 1, 1 to person 2, 3 to 3, 4 to 4 and
1 to person 5. Then he or she rated the groups 0-2 3, 1 and 3
respectively.
So above is an example of a valid solution to the given input. How do
we score it? That's not specified in the question, only that the
"best" combination is desired, so I'll arbitrarily decide that the
score for a solution is the sum of each person's happiness. Each
person's happiness is determined by adding his or her rating of the
group to the average of his or her ratings of the other persons in
the group.
Here is the scoring function:
N_GROUPS = 3
N_PERSONS = 6

def score_solution(input, output):
    tot_score = 0
    for person, ratings in enumerate(input):
        # Check what group the person is a member of.
        for group, members in enumerate(output):
            if person in members:
                # Check what rating the person gave the group.
                group_rating = ratings[N_PERSONS + group]
                # Check what rating the person gave the others.
                others = list(members)
                others.remove(person)
                if not others:
                    # Protect against zero division.
                    person_rating = 0
                else:
                    person_ratings = [ratings[o] for o in others]
                    person_rating = sum(person_ratings) / float(len(person_ratings))
                tot_score += group_rating + person_rating
    return tot_score
It should return a score of 37.0 for the given solution. Now what
we'll do is to generate valid outputs while keeping track of which one
is best until we are satisfied:
from random import choice

def gen_solution():
    groups = [[] for x in range(N_GROUPS)]
    for person in range(N_PERSONS):
        choice(groups).append(person)
    return groups

# Generate 10000 solutions.
solutions = [gen_solution() for x in range(10000)]
# Score them.
solutions = [(score_solution(input, sol), sol) for sol in solutions]
# Sort by score, take the best.
best_score, best_solution = sorted(solutions)[-1]
print('The best solution is %s with score %.2f' % (best_solution, best_score))
Running this on my computer produces:
The best solution is [[0, 1], [3, 5], [2, 4]] with score 47.00
Obviously, you may think it is a really stupid idea to randomly just
generate solutions to throw at the problem, and it is. There are much
more sophisticated methods to generate solutions such as simulated
annealing or genetic optimization. But they all build upon the same
framework as given above.