How to know when ShearSorting is done


I'm currently doing some shear sorting and can't figure out when the operation is supposed to be done for an n x n matrix.
What I'm doing now is copying the matrix to a temp matrix at the start of each iteration of the loop, then comparing the original and temp matrices at the end of each iteration; if they are the same, I break out of the loop and exit. I don't like this approach because we always end up going through one extra iteration after the matrix is sorted and done, which wastes CPU time and cycles.
There has to be a better way to do this check. I keep finding references to log(n) as the number of iterations needed, but I don't think they mean log(n) literally: log(5) for a 5x5 matrix is roughly 0.7, which can't be a number of iterations.
Any suggestions?

So I know shear sort takes about log(n) iterations to complete (base 2, rounded up), so for a 5x5 matrix we will have 3 runs for rows and 3 runs for columns. But what if the 5x5 matrix I was given is almost sorted and only needs one or 2 more iterations to be completed? In that case I don't see the point in iterating 6 times through it, as that would be a waste of CPU power and cycles.
The other option is the one you describe: copy the matrix to a temporary matrix at the start of each iteration of the shearSort function, compare the 2 matrices at the end of the iteration, and if they are the same, we know we are done. (Here an iteration means both a row sort and a column sort, since a matrix might not need a row sort at first but still need a column sort afterwards.) This saves CPU cycles when the matrix doesn't need all N + 1 iterations, but it has a drawback: when all N + 1 iterations are needed, we end up doing N + 3 to finish (the 2 extra are one row pass and one column pass whose only purpose is to confirm that the 2 matrices are the same).
To solve this we would have to use a combination of both solutions:
We still copy the matrix at the start and compare it to the temp matrix at the end. If they are equal before we reach N + 1 iterations, we are done and do not need to go any further; if not, we run iteration N + 1 and stop afterwards, since at that point we know the matrix is sorted.
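A minimal Python sketch of this combined approach, assuming the matrix is a list of lists sorted into snake order (even rows ascending, odd rows descending) and taking N = ceil(log2 n). Instead of copying the matrix, it records whether any row or column sort actually changed something, which gives the same early-exit test without a temporary matrix. The function name `shear_sort` is hypothetical.

```python
def shear_sort(m):
    """Shear-sort an n x n matrix (list of lists) in place into snake order.

    Even rows end up ascending and odd rows descending, so reading the rows
    in that snake order gives the values sorted. Returns the number of
    row+column iterations actually performed.
    """
    n = len(m)
    if n == 0:
        return 0
    # ceil(log2 n) for n >= 1, computed without floating point.
    log_n = (n - 1).bit_length()
    max_iters = log_n + 1          # the "N + 1" cap
    iters = 0
    for _ in range(max_iters):
        iters += 1
        changed = False
        # Row phase: even rows ascending, odd rows descending.
        for i in range(n):
            new_row = sorted(m[i], reverse=(i % 2 == 1))
            if new_row != m[i]:
                m[i] = new_row
                changed = True
        # Column phase: every column ascending from top to bottom.
        for j in range(n):
            col = [m[i][j] for i in range(n)]
            new_col = sorted(col)
            if new_col != col:
                for i in range(n):
                    m[i][j] = new_col[i]
                changed = True
        # Early exit: nothing moved in a full iteration, so the matrix is done.
        if not changed:
            break
    return iters
```

For a 5x5 matrix, N = 3, so at most 4 iterations run, and a matrix that is already in snake order stops after 1.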

Related

Is the computational complexity of counting runs in cribbage O(N*log(N)) in the worst case?

In the card game cribbage, counting the runs for a hand during the show (one of the stages of a turn in the game) means reporting the longest increasing subsequence whose values increase by exactly 1 at each step. If duplicate values are part of this subsequence, then a double run (or triple, quadruple, et cetera) is reported.
Some examples:
("A","2","3","4","5") => (1,5) Single run for 5
("A","2","3","4","4") => (2,4) Double run for 4
("A","2","3","3","3") => (3,3) Triple run for 3
("A","2","3","4","6") => (1,4) Single run for 4
("A","2","3","5","6") => (1,3) Single run for 3
("A","2","4","5","7") => (0,0) No runs
To handle hands larger than the cribbage hand size of 5, a run is selected if it has the maximum product of the number of duplicates of a subsequence and that subsequence's length.
Some relevant examples:
("A","2","2","3","5","6","7","8","9","T","J") => (1,7) Single run for 7
("A","2","2","3","5","6","7","8") => (2,3) Double run for 3
My method for finding the maximum scoring run is as follows:
1. Create a list of ranks and sort it. O(N*log(N))
2. Create a list to store the length of the maximum run and how many duplicates of it exist. Initialize it to [1 duplicate, 1 long].
3. Create an identical list to store the current run.
4. Create a flag that indicates whether the duplicate you've encountered is not the initial duplicate of this value. Initialize it to False.
5. Create a variable to store the increase in duplicate subsequences if additional duplicate values are found after the initial duplicate. Initialize it to 1.
6. Iterate over the differences between adjacent elements. O(N)
7. If the difference is greater than one, the run has ended. Check whether the product of the elements of the max run is less than that of the current run and the current run has length 3 or greater. If so, the current run becomes the maximum run. Either way, reset the current run list to [1,1], reset the flag to False, and reset the increment for duplicate subsequences to 1. Iterate to the next value.
8. If the difference is 1, increment the length of the current run by 1 and set the flag to False. Iterate to the next value.
9. If the difference is 0 and the flag is False, set the increment for duplicate subsequences equal to the current number of duplicates for the run. Then double the number of duplicates for the run and set the flag to True. Iterate to the next value.
10. If the difference is 0 and the flag is True, increase the number of duplicates for the run by the increment for duplicate subsequences.
11. After the iteration, check the current run list against the max run as in step 7 and set the max run accordingly.
I believe this is O(N*(1+log(N))). I believe this is the best time complexity, but I am not sure how to prove this or what a better algorithm would look like. Is there a way to do this without sorting the list first that achieves a better time complexity? If not, how does one go about proving this is the best time complexity?

Time complexity of an algorithm is a well-traveled path. Rather than writing a fresh formal proof for each algorithm, the complexity community usually works with modular pseudo-code and standard reductions. For instance, a for loop over the input is O(N) (surprise); sorting a list by comparing elements needs on the order of N log N comparisons in the general case, so O(N log N) is the best a comparison sort can do. For a good treatment, see Big O, how do you calculate/approximate it?.
Note: O(N x (1+log(N))) is slightly sloppy notation. Only the dominant term, the one that grows fastest as N approaches infinity, is kept. N(1 + log N) = N + N log N, so drop the lower-order N: it's simply O(N log N).
As I suggested in a comment, you can simply count elements. Keep a list of counts, indexed by your card values. For discussing the algorithm, don't use the "dirty" data of character representations: "A23456789TJQK"; simply use their values, either 0-12 or 1-13.
count = [0] * 13        # one counter per rank value 0-12
for rank in hand:
    count[rank] += 1
This is a linear pass through the data, O(N).
Now traverse your array of counts, finding the longest sequence of non-zero values. This is a fixed-length list of 13 elements, touching each element only once: O(1). If you accumulate a list of multiples (card counts), you'll also have your combinatoric factors at the end.
The resulting algorithm and code are, therefore, O(N).
For instance, let's shorten this to 7 card values, 0-6. Given the input integers
1 2 1 3 6 1 3 5 0
You make the first pass to count items:
[1 3 1 2 0 1 1]
A second pass gives you a max run length of 4, with counts [1 3 1 2].
You report a run of 4 with a triple and a double, or the point count
4 * (1 * 3 * 1 * 2) = 24
You can also count the pair values (2 points per pair; a triple holds 3 pairs and a double holds 1):
2 * 3 + 2 * 1 = 8
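A minimal Python sketch of the counting approach, applied to the question's scoring rule (runs of 3 or more ranks, chosen by the largest product of multiplicity and length). The rank encoding `RANK_INDEX` and the function name `best_run` are hypothetical; ties keep the earlier, lower-ranked run.

```python
RANK_INDEX = {r: i for i, r in enumerate("A23456789TJQK")}  # A=0 ... K=12

def best_run(hand):
    """Return (multiplicity, length) of the best run, or (0, 0) if none.

    multiplicity is the product of the card counts along the run, so a
    double run gives 2, a triple run 3, a double-double run 4, and so on.
    """
    counts = [0] * len(RANK_INDEX)
    for card in hand:                     # O(N) counting pass
        counts[RANK_INDEX[card]] += 1

    best, best_score = (0, 0), 0
    i = 0
    while i < len(counts):                # fixed 13 slots: O(1)
        if counts[i] == 0:
            i += 1
            continue
        # Walk a stretch of consecutive ranks that are all present.
        j, mult = i, 1
        while j < len(counts) and counts[j] > 0:
            mult *= counts[j]
            j += 1
        length = j - i
        # Runs need 3+ ranks; keep the one with the largest multiplicity * length.
        if length >= 3 and mult * length > best_score:
            best, best_score = (mult, length), mult * length
        i = j
    return best
```

The whole thing is one O(N) pass plus a constant-size scan, so no sort is needed.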

Heuristics for this (probably) NP-complete puzzle game

I asked whether this problem was NP-complete on the Computer Science forum, but asking for programming heuristics seems better suited to this site. So here goes.
You are given an NxN grid of unit squares and 2N binary strings of length N. The goal is to fill the grid with 0's and 1's so that each string appears once and only once in the grid, either horizontally (left to right) or vertically (top down), or to determine that no such solution exists. If N is not fixed, I suspect this is an NP-complete problem. However, are there any heuristics that can speed up the search beyond brute force, i.e. trying all ways to fill in the grid with N vertical strings?

I remember programming this for a friend who had the 5x5 physical version of this game, but I used brute force back then. I can only think of this heuristic:
Consider a 4x4 map with these 8 strings (read each from left to right):
1 1 0 1
1 0 0 1
1 0 1 1
1 0 1 0
1 1 1 1
1 0 0 0
0 0 1 1
1 1 1 0
(Note that this is already solved, since the second 4 is the first 4 transposed)
First attempt:
We will choose columns from left to right. Since 7 of the 8 strings start with 1, we will try to put the string with the most 1s in the first column (so that rows are easier to lay down once the columns are done).
In the second position, most strings have 0, so you can try putting a string with the most zeros in the second column, and so on.
I would call this a wide-1 prediction, since it only looks at one column at a time.
(Possible) Improvement:
You can look at 2 columns at a time (a wide-2 prediction, if I may call it that). In this case, among the 8 strings the most common combination of first two bits is 10 (4/8), so you would like to choose the first two columns so that the combination 10 occurs as often as possible (here, 1111 followed by 1000 puts 10 at the start of 3 of the 4 rows).
(Of course you don't have to stop at 2)
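The wide-X statistics are just prefix counts. A minimal Python sketch with hypothetical helpers `prefix_counts` (how often each leading combination occurs among the strings) and `row_prefixes` (what the rows start with once some leftmost columns are placed), run on the 4x4 example:

```python
from collections import Counter

def prefix_counts(strings, width):
    """How often each leading `width`-bit combination occurs (wide-X statistics)."""
    return Counter(s[:width] for s in strings)

def row_prefixes(columns):
    """Prefixes the rows get once `columns` are placed as the leftmost columns."""
    return ["".join(bits) for bits in zip(*columns)]

strings = ["1101", "1001", "1011", "1010", "1111", "1000", "0011", "1110"]
print(prefix_counts(strings, 1))        # wide-1
print(prefix_counts(strings, 2))        # wide-2
print(row_prefixes(["1111", "1000"]))   # rows after choosing the first 2 columns
```

For the example, 1 leads 7 of the strings; among two-bit prefixes, 10 occurs 4 times, 11 three times and 00 once; and placing 1111 then 1000 gives the rows the prefixes 11, 10, 10, 10.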
Weaknesses:
I don't know if this would work. I just made it up and thought it might work.
If you choose a wide-X prediction, the number of possibilities grows exponentially with X.
This can fail completely if the combinations are evenly distributed.
What you can do:
As I said, this game has a physical 5x5 adaptation, except that there you can also lay strings right-to-left and bottom-to-top. If you found its name, you could google further; unfortunately I don't remember it.

Sounds like you want the crossword grid filling algorithm:
1. First, build 2N subsets of your 2N strings. Each subset holds all the strings with a particular bit at a particular position, so subset(0,3) is all the strings that have a 0 in the 3rd position and subset(1,5) is all the strings that have a 1 in the 5th position.
2. The algorithm is a basic brute-force depth-first search that tries all possible mappings of strings to slots in the grid, with severe pruning of impossible branches.
3. Your search state is a set of assignments of strings to slots plus, for each remaining slot, the set of strings that could still go there. The initial state has 0 assignments and 2N sets, all of which contain all 2N strings.
4. At each step of the search, pick the most constrained set (the set with the fewest elements). Try each element of that set in turn in its slot (adding it to the assignments and removing the set), and constrain the remaining sets by removing the chosen string and intersecting each crossing set with subset(X,N) (computed in step 1), where X is the bit of the chosen string at the crossing and N is the row/column number of the chosen string.
5. If you find an empty set when picking above, there is no solution with the choices made so far, so backtrack up the tree to a different choice.
This still takes exponential time in the worst case, but it is about as fast as you can get it. Since the most time-consuming step is the set intersections, representing each set as a 2N-bit bitmask is very fast: for N=32, the sets fit in a 64-bit word and can be intersected with a single AND instruction. It also helps to have a POPCOUNT instruction, since you also need set sizes.
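A minimal Python sketch of this search, using Python ints as the 2N-bit sets (so `&` plays the role of the AND instruction and counting 1 bits plays the role of POPCOUNT). The slot numbering (rows 0..N-1, columns N..2N-1), the returned slot-to-string-index mapping, and the name `solve` are assumptions for illustration; it also assumes the 2N strings are distinct.

```python
def solve(n, strings):
    """Fill an n x n grid so each of the 2n strings is used exactly once.

    Slots 0..n-1 are rows (read left to right), slots n..2n-1 are columns
    (read top down). Returns {slot: string index} or None if no solution.
    Sets of candidate strings are Python ints used as 2n-bit bitmasks.
    """
    # Step 1: subset[b][p] = bitmask of strings with bit b at position p.
    subset = [[0] * n for _ in range(2)]
    for i, s in enumerate(strings):
        for p, ch in enumerate(s):
            subset[int(ch)][p] |= 1 << i
    full = (1 << len(strings)) - 1

    def popcount(x):
        return bin(x).count("1")

    def search(assigned, cands):
        if not cands:                      # every slot filled
            return dict(assigned)
        # Most constrained remaining slot (fewest candidate strings).
        slot = min(cands, key=lambda k: popcount(cands[k]))
        options = cands[slot]              # empty set: loop skipped, backtrack
        while options:
            low = options & -options       # lowest set bit = next candidate
            options ^= low
            s = strings[low.bit_length() - 1]
            new = {}
            for other, mask in cands.items():
                if other == slot:
                    continue
                mask &= ~low               # chosen string can't be reused
                if slot < n <= other:      # row `slot` crosses column other-n
                    mask &= subset[int(s[other - n])][slot]
                elif other < n <= slot:    # column slot-n crosses row `other`
                    mask &= subset[int(s[other])][slot - n]
                new[other] = mask
            assigned[slot] = low.bit_length() - 1
            result = search(assigned, new)
            if result is not None:
                return result
            del assigned[slot]
        return None

    return search({}, {k: full for k in range(2 * n)})
```

Calling `solve(4, ["1101", "1001", "1011", "1010", "1111", "1000", "0011", "1110"])` finds a valid grid for the 4x4 example, while `solve(2, ["00", "01", "10", "11"])` returns None because no 2x2 grid uses all four strings.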

This can be solved as a 0/1 integer linear program with O(N^2) variables and constraints. First there are variables Xij which are 1 if string i is assigned to line j (where j=1 to N are rows and j = (N+1) to 2N are columns). Then there is a variable for each square in the grid, which indicates if the entry is 0 or 1. If the position of the square is (i,j) with variable Yij then the sum of all X variables for line j that correspond to strings that have a 1 in position i is equal to Yij, and the sum of all X variables for line j that correspond to strings that have a 0 in position i is equal to (1 - Yij). And similarly for line i and position j. Finally, the sum of all X variables Xij for each string i (summed over all lines j) is equal to 1.
There has been a lot of research in speeding up solvers for 0/1 integer programming so this may be able to often handle fairly large N (like N=100) for many examples. Also, in some cases, solving the relaxed non-integer linear program and rounding the solution off to 0/1 may produce a valid solution, in polynomial time.

We could choose the first lg 2N rows out of the 2N strings, and then since 2^(lg 2N) = 2N, in a lot of cases there shouldn't be very many ways to assign the N columns so that the prefixes of length lg 2N are respected. Then all the rows are filled in so they can be checked to see if a solution has been found. We can also try assigning more rows in the beginning, and fill in different combinations of rows besides the initial rows. (e.g. we can try filling in contiguous rows starting anywhere in the grid).
Running time for assigning lg 2N rows out of 2N strings is O((2N)^(lg 2N)) = O(2^((lg 2N)^2)), which grows slower than 2^N. Assigning columns to match the prefixes is the part that's the hardest to predict run time. If a prefix occurs K times among the assigned rows, and there are M remaining strings that have the prefix, then the number of assignments for this prefix is M*(M-1)...(M-K+1). The total number of possible column assignments is the product of these terms over all prefixes that occur among the rows. If this gets to be too large, the number of rows initially assigned can be increased. But it's hard to predict the worst-case run time unless an assumption is made like the NxN grid is filled in randomly.
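The count of column assignments in the last paragraph can be computed directly. A minimal Python sketch (requires Python 3.8+ for `math.perm` and `math.prod`; the name `column_assignments` is hypothetical) that takes the already-assigned top rows and the strings still available:

```python
from collections import Counter
from math import perm, prod

def column_assignments(rows, remaining):
    """Number of ways to pick columns consistent with the assigned top rows.

    Column c must start with the bits rows[0][c], rows[1][c], ...; if that
    prefix is needed K times and M remaining strings start with it, it
    contributes M*(M-1)*...*(M-K+1) = perm(M, K) ways.
    """
    needed = Counter("".join(bits) for bits in zip(*rows))
    have = Counter(s[:len(rows)] for s in remaining)
    return prod(perm(have[p], k) for p, k in needed.items())
```

For instance, taking 1101 and 1001 from the 4x4 example as the top two rows with the other six strings remaining, the columns need the prefixes 11, 10, 00, 11, which gives 2 * 3 * 1 = 6 candidate column assignments; a prefix that no remaining string has yields 0.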

From the given array of numbers, find all groups of 3 numbers with sum N

Given is a array of numbers:
1, 2, 8, 6, 9, 0, 4
We need to find all groups of three numbers that sum to a value N (say 11 in this example). Here, the possible groups of three are:
{1,2,8}, {1,4,6}, {0,2,9}
The first solution I could think of was O(n^3). Later I improved it a little, to O(n^2 log n), with this approach:
1. Sort the array.
2. Select any two numbers and perform a binary search for the third element.
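A minimal Python sketch of this O(n^2 log n) approach using the standard `bisect` module. It assumes the values are distinct (true for the example) and searches only to the right of the second index, so each group is reported once; `three_sum_bsearch` is a hypothetical name.

```python
from bisect import bisect_left

def three_sum_bsearch(nums, target):
    """Sort, fix two values, binary-search the third: O(n^2 log n)."""
    a = sorted(nums)
    n = len(a)
    found = []
    for i in range(n):
        for j in range(i + 1, n):
            want = target - a[i] - a[j]
            k = bisect_left(a, want, j + 1)   # search only to the right of j
            if k < n and a[k] == want:
                found.append((a[i], a[j], a[k]))
    return found

print(three_sum_bsearch([1, 2, 8, 6, 9, 0, 4], 11))
```

On the example this reports (0, 2, 9), (1, 2, 8) and (1, 4, 6).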
Can it be improved further with some other approaches?

You can certainly do it in O(n^2): for each i in the array, test whether two other values sum to N-i.
You can test in O(n) whether two values in a sorted array sum to k by sweeping from both ends at once. If the sum of the two elements you're on is too big, decrement the "right-to-left" index to make it smaller. If the sum is too small, increment the "left-to-right" index to make it bigger. If there's a pair that works, you'll find it, and you perform at most 2*n iterations before you run out of road at one end or the other. You might need code to ignore the value you're using as i, depending on what the rules are.
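A minimal Python sketch of the O(n^2) sweep, again assuming distinct values. Restricting both indices to positions after i is one way to "ignore the value you're using as i", and it also keeps each group from being reported more than once; `three_sum_two_pointer` is a hypothetical name.

```python
def three_sum_two_pointer(nums, target):
    """For each a[i], sweep inward from both ends for a pair summing to target - a[i]."""
    a = sorted(nums)
    n = len(a)
    found = []
    for i in range(n - 2):
        want = target - a[i]
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = a[lo] + a[hi]
            if s < want:
                lo += 1          # too small: move the left index right
            elif s > want:
                hi -= 1          # too big: move the right index left
            else:
                found.append((a[i], a[lo], a[hi]))
                lo += 1          # distinct values: both ends can move on
                hi -= 1
    return found

print(three_sum_two_pointer([1, 2, 8, 6, 9, 0, 4], 11))
```

The inner sweep is linear and runs once per i, which gives O(n^2) overall after the O(n log n) sort.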
You could instead use some kind of dynamic programming, working down from N, and you probably end up with time something like O(n*N) or so. Realistically I don't think that's any better: it looks like all your numbers are non-negative, so if n is much bigger than N then before you start you can quickly throw out any large values from the array, and also any duplicates beyond 3 copies of each value (or 2 copies, as long as you check whether 3*i == N before discarding the 3rd copy of i). After that step, n is O(N).

Determining running time of an algorithm to compare two arrays

I want to know how to determine the run time of an algorithm written in pseudocode so that I can familiarize myself with run time. For example, how do you work out the run time of an algorithm that compares 2 arrays to determine whether they are not the same?
Array 1 = [1, 5, 3, 2, 10, 12]
Array 2 = [3, 2, 1, 5, 10, 12]
These two arrays are not the same since they are ordered differently.
My pseudocode is:
1) set current pointer to first number in first array
2) set second pointer to first number in second array
3) while ( current pointer != " ") compare with same position element in other array
4) if (current pointer == second pointer)
move current pointer to next number
move second pointer to next number
5) else (output that arrays are not the same)
end loop
So I am assuming first off my code is correct. I know step 4 executes only once since it only takes 1 match to display arrays are not the same. So step 4 takes only constant time (1). I know step 1 and 2 only execute once also.
So far I know the run time is 3 + ? (? being the run time of the loop itself).
Now I am lost on the loop part. Does the loop run n times (n being number of numbers in the array?), since worst case might be every single number gets matched? Am I thinking of run time in the right way?
If someone can help with this, I'll appreciate it.
Thanks!

What you are asking about is called the time-complexity of your algorithm. We talk about the time complexity of algorithms using so called Big-O notation.
Big-O notation is a method for talking about the approximate number of steps our algorithms take relative to the size of the algorithm's input, in the worst possible case for an input of that size.
Your algorithm runs in O(n) time (pronounced "big-oh of n" or "order n" or sometimes we just say "linear time").
You already know that steps 1,2, and 4 all run in a constant number of steps relative to the size of the array. We say that those steps run in O(1) time ("constant time").
So let's consider step 3:
If there are n elements in the array, then step 3 needs to do n comparisons in the worst case. So we say that step 3 takes O(n) time.
Since the algorithm takes O(n) time on step 3, and all other steps are faster, we say that the total time complexity of your algorithm is O(n).
When we write O(f), where f is some function, we mean that the algorithm runs within some constant factor of f for large values.
Take your algorithm for example. For large values of n (say n = 1000), the algorithm doesn't take exactly n steps. Suppose that a comparison takes 5 instructions to complete in your algorithm, on your machine of choice. (It could be any constant number, I'm just choosing 5 for example.) And suppose that steps 1, 2, 4 all take some constant number of steps each, totalling 10 instructions for all three of those steps.
Then for n = 1000 your algorithm would take:
Steps 1 + 2 + 4 = 10 instructions. Step 3 = 5*1000 = 5000 instructions.
This is a total of 5010 instructions. This is about 5*n instructions, which is a constant factor of n, which is why we say it is O(n).
For very large n, the 10 in f = 5*n + 10 becomes more and more insignificant, as does the 5. For this reason, we simply reduce the function to f is within a constant factor of n for large n by saying f is in O(n).
In this way it's easy to express the idea that a quadratic function like f1 = n^2 + 2 is always larger than any linear function like f2 = 10000*n + 50000 when n is large enough, by simply writing f1 as O(n^2) and f2 as O(n).
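To see the linear growth concretely, here is a minimal Python version of the question's comparison that also counts comparisons. The length check and the early return on the first mismatch are assumptions that the pseudocode leaves implicit; `arrays_differ` is a hypothetical name.

```python
def arrays_differ(a, b):
    """Return (differ, comparisons) for an element-by-element comparison."""
    if len(a) != len(b):            # assumption: different lengths means different
        return True, 0
    comparisons = 0
    for x, y in zip(a, b):          # step 3: walk both arrays in lockstep
        comparisons += 1
        if x != y:                  # step 5: the first mismatch ends the loop
            return True, comparisons
    return False, comparisons       # worst case: all n pairs compared

print(arrays_differ([1, 5, 3, 2, 10, 12], [3, 2, 1, 5, 10, 12]))
print(arrays_differ(list(range(1000)), list(range(1000))))
```

The question's arrays differ at the first position, so only 1 comparison is made, while two identical arrays of length 1000 force all 1000 comparisons: the worst case grows linearly with n.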

You are correct. The running time is O(n) where n is the number of elements in the arrays. Each time you add 1 element to the arrays, you would have to execute the loop 1 more time in the worst case.

Simple Algorithm Question

I'm watching the free MIT Algorithms course on iTunesU and I'm stuck on the first lecture.
Take an insertion sort: its time is really T(n/2) in the worst case (reverse-ordered array/list), but they say that this is Theta(n^2). I would have thought this would be Theta(n). I'm lost as to how they jump to concluding this is n squared, and Wikipedia is not helping either. Can someone dumb it down further?

Insertion-sorting an array of 4 elements that starts in reverse order:
4 3 2 1
first, insert the "4" into its proper position in an array of length 1 (i.e. do nothing).
next, insert the "3" into its proper position in an array of length 2:
3 4 2 1
(we had to move the 3 and the 4)
next, insert the "2" into its proper position in an array of length 3:
2 3 4 1
(we had to move the 2, the 3 and the 4)
next, insert the "1"
1 2 3 4
(we had to move the 1, the 2, the 3 and the 4)
We performed n steps, and step k required moving k elements (or k-1 swaps, depending on how you want to look at it). The sum of k from 1 to n is n(n+1)/2, which is Theta(n^2).
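A minimal Python sketch that counts element shifts during an array insertion sort that scans backwards through the sorted part. On reversed input, step k shifts k-1 elements, so the total is 0 + 1 + ... + (n-1) = n(n-1)/2; `insertion_sort_shifts` is a hypothetical name.

```python
def insertion_sort_shifts(a):
    """Insertion-sort list `a` in place; return how many elements were shifted."""
    shifts = 0
    for k in range(1, len(a)):
        x = a[k]
        j = k - 1
        # Walk back through the sorted prefix, shifting larger items right.
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = x
    return shifts

print(insertion_sort_shifts([4, 3, 2, 1]))
```

For the 4-element example this makes 6 shifts; doubling n roughly quadruples the count, which is the Theta(n^2) behaviour.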
In the case of a simple linked list structure[*], we can move an object into its proper place in O(1), but in general finding the proper place still requires a linear search through the already-sorted part of the data, so it still ends up O(n^2) for general input. A basic insertion sort for a linked list happens to deal very well with reverse-ordered data, though, because it always finds the correct insertion position immediately. So we get n steps of O(1) each, for a total O(n) running time on this particular data.
Assuming we still choose the first unsorted element to insert, and that we search forward through the sorted part of the list at each step, then the worst case of insertion sort for a list is already-sorted data, and is Theta(n^2) again.
[*] meaning, nothing fancy like a skip list.

From Wikipedia:
"The worst case input is an array sorted in reverse order. In this case every iteration of the inner loop will scan and shift the entire sorted subsection of the array before inserting the next element. For this case insertion sort has a quadratic running time (i.e., O(n^2))."
The outer loop iterates over the array/list to sort; the inner loop iterates over the partially sorted array/list. If the input is already sorted, you can see that you iterate all the way to the end of the sorted container every time.
Here's more explanation in pseudocode:
for element in unsorted_container
    for current_element in sorted_container
        if element < current_element    -> never happens when the input is already sorted
            InsertBefore(element, current_element)
            stop scanning (break)
    if element not inserted
        InsertAtEnd(element)    <- always executes here, since every element goes at the end
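The pseudocode above as runnable Python, counting comparisons (`forward_insertion_sort` is a hypothetical name). A Python list stands in for the sorted container, so `insert` itself is not O(1) here as it would be for a linked list; only comparisons are counted.

```python
def forward_insertion_sort(items):
    """Build a sorted list, searching forward from the front for each
    insertion point. Returns (sorted_list, comparisons)."""
    out = []
    comparisons = 0
    for element in items:
        pos = len(out)                      # default: InsertAtEnd
        for idx, current in enumerate(out):
            comparisons += 1
            if element < current:           # InsertBefore(element, current)
                pos = idx
                break
        out.insert(pos, element)
    return out, comparisons

print(forward_insertion_sort(list(range(100)))[1])
print(forward_insertion_sort(list(range(99, -1, -1)))[1])
```

Already-sorted input makes every search run to the end, giving n(n-1)/2 comparisons (4950 for n = 100), while reversed input needs only n-1 (99), matching the linked-list discussion in the previous answer.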
