How to understand how many comparisons are made in the merge algorithm? - algorithm

In the merge algorithm, why is the number of comparisons at most N and at least N/2? I thought the number of comparisons was at most N/2, since there would be at most N/2 calls to less(aux[j], aux[i]). Or do the comparisons include the statements
if (i > mid) and else if (j > hi)? Thanks!
public static void merge(Comparable[] a, int lo, int mid, int hi)
{
    for (int k = lo; k <= hi; k++)   // copy a[lo..hi] into the auxiliary array
        aux[k] = a[k];
    int i = lo, j = mid + 1;         // i scans the left half, j the right half
    for (int k = lo; k <= hi; k++)
        if      (i > mid)              a[k] = aux[j++];  // left half exhausted: no compare
        else if (j > hi)               a[k] = aux[i++];  // right half exhausted: no compare
        else if (less(aux[j], aux[i])) a[k] = aux[j++];  // the only real comparison
        else                           a[k] = aux[i++];
}

As an example, suppose you want to merge these lists together:
1 3 5 7 9
2 4 6 8 10
This would work as follows:
1 3 5 7 9
2 4 6 8 10
^
+--------------- Compare 1 and 2, output 1
3 5 7 9
2 4 6 8 10
^
+--------------- Compare 3 and 2, output 2
3 5 7 9
4 6 8 10
^
+--------------- Compare 3 and 4, output 3
5 7 9
4 6 8 10
^
+--------------- Compare 5 and 4, output 4
5 7 9
6 8 10
^
+--------------- Compare 5 and 6, output 5
7 9
6 8 10
^
+--------------- Compare 7 and 6, output 6
7 9
8 10
^
+--------------- Compare 7 and 8, output 7
9
8 10
^
+--------------- Compare 9 and 8, output 8
9
10
^
+--------------- Compare 9 and 10, output 9
10
^
+--------------- Output 10
Here, there are 10 total elements and, as you can see, 9 comparisons were needed. The maximum number of compares for an N-element merge is actually N - 1: if the output alternates back and forth between the two lists, one comparison is made per element output, except for the very last one. The case where only N / 2 comparisons are made is actually the best possible case, not the worst; it happens only when all elements in one list are strictly smaller than all elements in the other.
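To make the counting concrete, here is a small Python sketch of the same merge (the function name merge_count is mine), instrumented so that only the element-to-element compares are counted; copying the tail of an exhausted list costs nothing:

```python
# A Python sketch of the merge above, counting only the element-to-element
# comparisons (the less(aux[j], aux[i]) calls in the Java code).
def merge_count(left, right):
    """Merge two sorted lists; return (merged, number_of_comparisons)."""
    merged, compares = [], 0
    i, j = 0, 0
    while i < len(left) and j < len(right):
        compares += 1                      # one compare per element output here
        if right[j] < left[i]:
            merged.append(right[j]); j += 1
        else:
            merged.append(left[i]); i += 1
    # Once one list is exhausted, the rest is copied with no comparisons.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, compares

# Worst case: values alternate between the lists -> N - 1 compares.
print(merge_count([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])[1])   # 9
# Best case: one list entirely smaller -> N / 2 compares.
print(merge_count([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])[1])   # 5
```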

Related

Did I understand the following quicksort algorithm correctly?

I'm trying to understand the given algorithm and here are my thoughts:
A is the given array... x stands for the number on the left side of the pivot element, y stands for the number on the right side of the pivot element. (Let's say the pivot element is the element at the very right of the array.) And A[y] stands for the pivot element?
If I understood correctly, the algorithm first searches from x towards y until the first number greater than or equal to A[y] is found, then searches from y towards x until the first number smaller than or equal to A[y] is found.
After that, both numbers are swapped, and this is repeated as long as i hasn't reached j. In the end the numbers left of i are smaller than A[y] and the numbers right of j are larger than A[y]... Also, A[y] is moved to the middle.
What do you think about this? Am I right?
Maybe you could give an example with a random array? I don't think I can construct one myself yet.
Algorithm Quicksort
func int div (A array; x, y integer) {
    num = A[y];
    i = x;
    j = y-1;
    repeat
        while (A[i] <= num and i < y)
            i = i+1;
        end while;
        while (A[j] >= num and j > x)
            j = j-1;
        end while;
        if i < j then
            swap (A[i], A[j]);
        end if;
    until i >= j;
    swap (A[i], A[y]);
    return i;
}
From the algorithm it looks like x and y mark the left and right bounds of the sorting algorithm within an array (to sort the complete array, you'd use x = 0 and y = A.length - 1). The pivot element is the rightmost one (at index y).
Then, i starts at x (the left bound) and compares each element with the pivot A[y]. It increases up to an index, at which the element is larger than A[y].
j starts at y (the right bound) and does the complementary: it decreases until it reaches an index, at which an element smaller than A[y] is found.
If i is still smaller than j, these two elements (at index i and j) are then swapped, meaning all elements from x to i are smaller than A[y], all elements from j to y are larger than A[y]. In this case, there are still elements between the indices i and j, that have not been 'seen' by the algorithm.
This procedure is repeated until eventually, the complete array is partitioned into the lower 'half' (all smaller than A[y]) and the upper 'half' (all larger than A[y]). One element larger than A[y] is then swapped with A[y] and the index between the two partitions is then returned.
In conclusion, your algorithm only partitions the array into elements smaller than A[y] and elements larger than A[y]. Repeating this recursively on both partitions will eventually sort them completely.
Example:
A = [ 4 7 9 2 5 1 3 8 6 ]
--> called with x = 0, y = 8 (thus, partitioning the complete array)
e.g. div([ 4 7 9 2 5 1 3 8 6 ], 0, 8)
xi j y
[ 4 7 9 2 5 1 3 8 | 6 ] x=0, y=8, A[y]=6, i=0, j=7
x i j y
[ 4 7 9 2 5 1 3 8 | 6 ] x=0, y=8, A[y]=6, i=1, j=7
A[i]=7 > A[y]=6 -> first while loop ends
x i j y
[ 4 7 9 2 5 1 3 8 | 6 ] x=0, y=8, A[y]=6, i=1, j=6
A[j]=3 < A[y]=6 -> second while loop ends
x i j y
[ 4 3 9 2 5 1 7 8 | 6 ] x=0, y=8, A[y]=6, i=1, j=6
swapped A[i] and A[j] (i and j stay the same!) -> repeat, since still i < j
x i j y
[ 4 3 9 2 5 1 7 8 | 6 ] x=0, y=8, A[y]=6, i=2, j=6
x i j y
[ 4 3 9 2 5 1 7 8 | 6 ] x=0, y=8, A[y]=6, i=2, j=5
x i j y
[ 4 3 1 2 5 9 7 8 | 6 ] x=0, y=8, A[y]=6, i=2, j=5
swapped -> repeat (i<j)
x i j y
[ 4 3 1 2 5 9 7 8 | 6 ] x=0, y=8, A[y]=6, i=3, j=5
x i j y
[ 4 3 1 2 5 9 7 8 | 6 ] x=0, y=8, A[y]=6, i=4, j=5
x ij y
[ 4 3 1 2 5 9 7 8 | 6 ] x=0, y=8, A[y]=6, i=5, j=5
first while ends again
x j i y
[ 4 3 1 2 5 9 7 8 | 6 ] x=0, y=8, A[y]=6, i=5, j=4
second while ends again
i > j -> don't swap
i >= j -> leave the repeat loop
swap A[i] and A[y]
x j i y
[ 4 3 1 2 5 6 7 8 | 9 ] x=0, y=8, A[y]=9 (the pivot 6 is now at index 5), i=5, j=4
return i=5
Now, all elements left of index i are smaller than your ORIGINAL A[y] (which now sits at index i; the value at index y is different!) and all elements between i+1 and y are larger than the ORIGINAL A[y].
Since the pivot at index i is already in its final place, you'd next call div([ 4 3 1 2 5 6 7 8 9 ], 0, 4) and div([ 4 3 1 2 5 6 7 8 9 ], 6, 8), or more abstractly: div(A, x, i-1) and div(A, i+1, y).
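For reference, the pseudocode above can be transcribed directly into Python (x and y are inclusive bounds, the function name div is kept from the pseudocode); running it on the example array reproduces the trace:

```python
# A direct Python rendering of the pseudocode: Hoare-style partition with
# the rightmost element A[y] as pivot; x and y are inclusive bounds.
def div(A, x, y):
    num = A[y]                 # pivot is the rightmost element
    i, j = x, y - 1
    while True:
        while A[i] <= num and i < y:
            i += 1
        while A[j] >= num and j > x:
            j -= 1
        if i < j:
            A[i], A[j] = A[j], A[i]    # swap the out-of-place pair
        if i >= j:                     # the "until i >= j" check
            break
    A[i], A[y] = A[y], A[i]    # put the pivot into its final place
    return i

A = [4, 7, 9, 2, 5, 1, 3, 8, 6]
print(div(A, 0, 8), A)         # 5 [4, 3, 1, 2, 5, 6, 7, 8, 9]
```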

Unclear on 2D Matrix Transposal Method

How would one go about transposing a 2D matrix in the following manner?
I understand that there is some sort of pattern to this, but hard-coding is not the way, so if someone could provide some advice that would be great.
Original:
4 5 2 0
7 2 1 4
9 4 2 0
7 8 9 3
into
Transpose:
3 0 4 0
9 2 1 2
8 4 2 5
7 9 7 4
for (i = 1; i <= n; i++) {          /* indices are 1-based here */
    for (j = 1; j <= n - i; j++) {  /* stop at the secondary diagonal */
        aux = a[i][j];
        a[i][j] = a[n-j+1][n-i+1];
        a[n-j+1][n-i+1] = aux;
    }
}
By looking at the matrix you can see that row i is swapped with column n-i+1; in other words, elements symmetric about the secondary (anti-)diagonal are swapped, which is not the usual transpose.
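The same swap can be written 0-based; here is a Python sketch (the function name anti_transpose is mine) that reproduces the example above by exchanging a[i][j] with a[n-1-j][n-1-i]:

```python
# Reflect a square matrix in place across its secondary (anti-)diagonal
# by swapping a[i][j] with a[n-1-j][n-1-i]; 0-based indices.
def anti_transpose(a):
    n = len(a)
    for i in range(n):
        for j in range(n - 1 - i):   # stop at the anti-diagonal, so each pair swaps once
            a[i][j], a[n-1-j][n-1-i] = a[n-1-j][n-1-i], a[i][j]
    return a

m = [[4, 5, 2, 0],
     [7, 2, 1, 4],
     [9, 4, 2, 0],
     [7, 8, 9, 3]]
for row in anti_transpose(m):
    print(row)                       # [3, 0, 4, 0] / [9, 2, 1, 2] / [8, 4, 2, 5] / [7, 9, 7, 4]
```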

Division algorithm with decimal bignum

EDIT: I rebased my bignum class to use std::bitset and I just implemented long division fine. I just didn't know any class to store bits. (like std::bitset)
I'm making a bignum class with std::string to use decimal characters as internal representation. I tried implementing division with a simple algorithm:
while N ≥ D do
N := N - D
end
return N
And of course it was slow. I tried implementing long division but that was too hard to do with decimal characters.
Thanks in advance.
Instead of subtracting D over and over, you could find the highest value of the form D^(2^n) that still fits, subtract that, and then repeat the steps with the remaining value until the remainder is less than D.
Pseudocode
0 result=0
1 powerOfD = D
2 while (powerOfD*powerOfD<N)
3 powerOfD=powerOfD*powerOfD
4 result = result+powerOfD/D, N=N-powerOfD;
5 if(N>=D)
6 goto 1
7 return result
Example 31/3 (N=31, D=3)
0 result=0
1 powerOfD = 3;
2 3*3 < 31 TRUE
3 powerOfD= 3*3;
2 9*9 < 31 FALSE
4 result=0+9/3; N=31 - 9
5 22> 3 TRUE
6 goto 1
1 powerOfD = 3
2 3*3 < 22 TRUE
3 powerOfD= 3*3;
2 9*9 < 22 FALSE
4 result=3+9/3; N=22 - 9
5 13> 3 TRUE
6 goto 1
1 powerOfD = 3
2 3*3 < 13 TRUE
3 powerOfD= 3*3;
2 9*9 < 13 FALSE
4 result=6+9/3; N=13 - 9
5 4> 3 TRUE
6 goto 1
1 powerOfD = 3
2 3*3 < 4 FALSE
4 result=9+3/3; N=4-3
5 1> 3 FALSE
7 return 10
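A closely related speed-up uses doubling instead of squaring, which also avoids the powerOfD/D division inside the loop (the pseudocode above needs a division to compute each quotient chunk); this is essentially binary long division. A Python sketch, with the helper name divide being mine:

```python
# Binary long division using only add, subtract, and doubling:
# find the largest D * 2^k <= N, subtract it, and credit 2^k to the quotient.
def divide(N, D):
    """Return (quotient, remainder) of N divided by D, D > 0."""
    assert D > 0
    q = 0
    while N >= D:
        chunk, part = D, 1          # invariant: chunk == D * part
        while chunk + chunk <= N:
            chunk += chunk          # double the subtrahend...
            part += part            # ...and its contribution to the quotient
        N -= chunk
        q += part
    return q, N

print(divide(31, 3))    # (10, 1), matching the 31/3 example above
```

On a bignum class the doublings become cheap shifts or self-additions, so only O(log(N/D)) subtractions are needed per chunk instead of one per unit of the quotient.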

Matlab: removing rows when there are repeated values in columns

I have a problem with removing the rows in which some column values are identical.
I have used for and if loops, but the run time is too long.
I was wondering if there is a more efficient method with a faster run time.
say
A=[ 2 4 6 8;
3 9 7 9;
4 8 7 6;
8 5 4 6;
2 10 11 2]
I would want the result to be
A=[ 2 4 6 8;
4 8 7 6;
8 5 4 6]
eliminating the 2nd row because of the repeated '9' and removing the 5th row because of the repeated '2'.
You can use sort and diff to identify the rows with repeated values
A = A(all(diff(sort(A'))),:)
returns
A =
2 4 6 8
4 8 7 6
8 5 4 6
The trick here is how to find the rows with repeated values in an efficient manner.
How about this:
% compare all-vs-all for each row using `bsxfun`
>> c = bsxfun( @eq, A, permute( A, [1 3 2] ) );
>> c = sum( c, 3 ); % count the number of matches each element has in the row
>> c = any( c > 1, 2 ); % indicates rows with repeated values - an element with more than one match
>> A = A( ~c, : )
A =
2 4 6 8
4 8 7 6
8 5 4 6
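For readers without MATLAB, the sort/diff trick from the first answer (sort each row, then drop it if any two adjacent sorted values are equal, i.e. a zero difference) can be sketched in Python; the function name is mine:

```python
# Keep only the rows with no repeated values: after sorting a row,
# a repeat shows up as two equal neighbours (a zero diff).
def drop_rows_with_repeats(A):
    keep = []
    for row in A:
        s = sorted(row)
        if all(s[k] != s[k + 1] for k in range(len(s) - 1)):
            keep.append(row)
    return keep

A = [[2, 4, 6, 8],
     [3, 9, 7, 9],
     [4, 8, 7, 6],
     [8, 5, 4, 6],
     [2, 10, 11, 2]]
print(drop_rows_with_repeats(A))   # [[2, 4, 6, 8], [4, 8, 7, 6], [8, 5, 4, 6]]
```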

sorting integers with restrictions

If we have an array of integers, how can we determine the minimum number of steps required to sort them (in ascending order) if the only allowed operation per step is moving an element to either end?
E.g if we have
7 8 9 11 1 10
then in the 1st step one can move 11 to the right end, and in the 2nd step move 1 to the left end, to get 1 7 8 9 10 11. Hence total steps = 2.
Can bubble sort be applied here? But the worst-case complexity would be O(n^2) then. So how can we do this more efficiently?
Thanks.
Here is a solution that takes O(n log n) time, O(n) auxiliary space, and exactly n MoveToFront operations.
Given the input array, A, make a second array, B, with value/index pairs, like so...
7 8 9 11 1 10 -> (7 0) (8 1) (9 2) (11 3) (1 4) (10 5)
[this step takes O(n) time and O(n) space]
then sort B in descending order of the value of each pair...
(7 0) (8 1) (9 2) (11 3) (1 4) (10 5) -> (11 3) (10 5) (9 2) (8 1) (7 0) (1 4)
[this step takes O(n log n) time]
prepare a binary search tree, T.
Now for each element in B do the following...
Let I be the index of this element.
Let V be the sum of I and the number of elements in T that are greater than I.
Do a MoveToFront operation on the Vth element of A.
Add I to T.
[Both of the operations on T take O(log n) time]
Here is this algorithm working on your example array
(11 3)
I := 3
V := 3 + 0 = 3
MoveToFront(3): 7 8 9 11 1 10 -> 11 7 8 9 1 10
T: () -> (3)
(10 5)
I := 5
V := 5 + 0 = 5
MoveToFront(5): 11 7 8 9 1 10 -> 10 11 7 8 9 1
T: (3) -> (3 5)
(9 2)
I := 2
V := 2 + 2 = 4
MoveToFront(4): 10 11 7 8 9 1 -> 9 10 11 7 8 1
T: (3 5) -> (2 3 5)
(8 1)
I := 1
V := 1 + 3 = 4
MoveToFront(4): 9 10 11 7 8 1 -> 8 9 10 11 7 1
T: (2 3 5) -> (1 2 3 5)
(7 0)
I := 0
V := 0 + 4 = 4
MoveToFront(4): 8 9 10 11 7 1 -> 7 8 9 10 11 1
T: (1 2 3 5) -> (0 1 2 3 5)
(1 4)
I := 4
V := 4 + 1 = 5
MoveToFront(5): 7 8 9 10 11 1 -> 1 7 8 9 10 11
T: (0 1 2 3 5) -> (0 1 2 3 4 5)
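The steps above can be sketched in Python, using a sorted list queried with bisect in place of the binary search tree T (note that MoveToFront on a Python list is O(n) per operation, so this sketch does not achieve the O(n log n) bound; the function name is mine):

```python
import bisect

# Sort the array using exactly n MoveToFront operations, following the
# answer's algorithm: process indices in descending order of value.
def sort_by_move_to_front(A):
    moves = 0
    # B: original indices, ordered by descending value (the value/index pairs)
    B = sorted(range(len(A)), key=lambda k: A[k], reverse=True)
    T = []                                             # sorted list standing in for the BST
    for I in B:
        greater = len(T) - bisect.bisect_right(T, I)   # elements of T greater than I
        V = I + greater                                # current position of the element
        A.insert(0, A.pop(V))                          # MoveToFront on the V-th element
        moves += 1
        bisect.insort(T, I)
    return moves

A = [7, 8, 9, 11, 1, 10]
print(sort_by_move_to_front(A), A)    # 6 [1, 7, 8, 9, 10, 11]
```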
I imagine you can find ways to sort these arrays that use fewer than n MoveToFront/Back operations, but I don't think you can find those in less than O(n log n) time. If those operations are slow, though, then it might be worth using an algorithm that takes more time to plan so you can do fewer of those operations.
Can you clarify this problem a little bit? When you say list, do you mean a linked list or do you mean an array? If it's not a linked list, I'm a little confused by the limited operation set. If it is a linked list you can probably modify quicksort which runs in average case O(nlgn) time.
Essentially the data structure you are mentioning is a linked list. For linked lists you can use quicksort or mergesort ( O(nlogn) ).
That doesn't sound to me like a sorting problem. You need to just find how many movements you need to do, but you don't need to sort the array. I bet that could be done faster than O(n log n)
I propose this solution:
just count how many indices i have a[i-1] > a[i]. That will be your result.
Argumentation:
If you have a[i-1] > a[i], it means that either a[i-1] or a[i] isn't in its place. So you can:
1) move a[i-1] to the beginning of the array, or
2) move a[i] to the end of the array.
Upd. You need to define which of a[i-1] or a[i] you are moving; for your example array 7 8 9 11 1 10 there are two comparisons that show elements aren't in place: 11 > 1 and 11 > 10. So that is definitely a task for dynamic programming. But it is still faster than O(n log n).
