Dissecting mergesort routine - algorithm

Disclaimer: this question asks for an explanation of code and of an algorithm. It is not intended to fix or optimize anything, but rather to facilitate understanding.
My understanding of sorting routines is not great. I asked for help with converting existing mergesort code from the integer type to the string type here: delphi mergesort for string arrays. After I received my answer, I set out to understand the sorting routine.
A couple of resources came in handy:
http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/merge/mergen.htm
http://www.youtube.com/watch?v=9Qk1t66g7IU
I attempted to dissect the code to follow it along. This question is not an attempt to validate my own understanding of mergesort, but rather to show the sorting routine in a clear manner. Its value is for people trying to understand mergesort better. This matters because other sorts are easier to understand once you understand one prototype well.
My question is: why did we add 1 to the length passed to SetLength and to Result?
SetLength(AVals, Length(Vals) div 2 + 1);
Result := 1 + PerformMergeSort(0, High(Vals));
And why did we subtract 1 here? EDIT: I think k would be out of bounds if we did not subtract 1?
Result := k - 1;
Here is the code in question. BTW, this is an optimized mergesort, as it copies only half of the array into the temporary buffer:
function MergeSortRemoveDuplicates(var Vals: array of Integer): Integer;
var
  AVals: array of Integer;

  //returns index of the last valid element
  function Merge(I0, I1, J0, J1: Integer): Integer;
  var
    i, j, k, LC: Integer;
  begin
    LC := I1 - I0;
    //copy the valid part of the left run of Vals into temporary array AVals
    for i := 0 to LC do
      AVals[i] := Vals[i + I0];
    k := I0;
    i := 0;
    j := J0;
    while (i <= LC) and (j <= J1) do
      if AVals[i] < Vals[j] then begin
        Vals[k] := AVals[i];
        inc(i);
        inc(k);
      end else if AVals[i] > Vals[j] then begin
        Vals[k] := Vals[j];
        inc(k);
        inc(j);
      end else begin //duplicate
        Vals[k] := AVals[i];
        inc(i);
        inc(j);
        inc(k);
      end;
    //copy the rest
    while i <= LC do begin
      Vals[k] := AVals[i];
      inc(i);
      inc(k);
    end;
    if k <> j then begin
      while j <= J1 do begin
        Vals[k] := Vals[j];
        inc(k);
        inc(j);
      end;
      Result := k - 1;
    end else
      Result := J1; //k = j: the rest of the right run is already in place
  end;

  //returns index of the last valid element
  function PerformMergeSort(ALo, AHi: Integer): Integer;
  var
    AMid, I1, J1: Integer;
  begin
    //It would be wise to use Insertion Sort when (AHi - ALo) is small (about 32-100)
    if ALo < AHi then
    begin
      AMid := (ALo + AHi) shr 1;
      I1 := PerformMergeSort(ALo, AMid);
      J1 := PerformMergeSort(AMid + 1, AHi);
      Result := Merge(ALo, I1, AMid + 1, J1);
    end else
      Result := ALo;
  end;

begin
  if Length(Vals) = 0 then begin
    Result := 0; //nothing to sort; otherwise an empty array would report 1 element
    Exit;
  end;
  SetLength(AVals, Length(Vals) div 2 + 1);
  Result := 1 + PerformMergeSort(0, High(Vals));
end;
Here is my understanding, with a very small modification:
function MergeSortRemoveDuplicates(var Vals: array of Integer): Integer;
var
  AVals: array of Integer;

  //returns index of the last valid element
  function Merge(I0, I1, J0, J1: Integer): Integer;
  var
    i, j, k, LC: Integer;
  begin
    // difference between the midpoint on the left side
    // (between low(Original_array) and the true midpoint of Original_array)
    // and I0, which is Low(Original_array),
    // here equal to zero (0)
    // so is LC a quarter point in Original_array??
    LC := I1 - I0;
    // here we walk from the beginning of the array
    // and copy the elements between zero and LC
    // Vals[i + I0] is a funny expression: 0 + 0,
    // then 1 + 0 and so on. I guess this covers the case where we
    // start from a non-zero-based array??
    for i := 0 to LC do
      AVals[i] := Vals[i + I0];
    // k equals low(Original_array)
    k := I0;
    // i will be our zero-based counter into Copy_array
    i := 0;
    // j will be (midpoint + 1), i.e. the
    // first element of the right side of the array
    j := J0;
    // while we look at Copy_array elements
    // from the first element (low(Copy_array))
    // and at Original_array from midpoint + 1 to high(Original_array),
    // we sort them
    while (i <= LC) and (j <= J1) do
      // if the value in Copy_array is smaller than the one in Original_array,
      // we move it to the beginning of Original_array
      // (remember, position k is the first element)
      if AVals[i] < Vals[j] then begin
        Vals[k] := AVals[i];
        // move to the next element in Copy_array
        inc(i);
        // move to the next element in Original_array
        inc(k);
      // if the value in Copy_array is larger,
      // we move the smaller value from position j of Original_array (j starts at midpoint + 1)
      // to position k of Original_array (k is now in the lower part of Original_array)
      end else if AVals[i] > Vals[j] then begin
        Vals[k] := Vals[j];
        // move k to the next element in Original_array
        inc(k);
        // move j to the next element in Original_array
        inc(j);
      // if the value in Original_array is equal to the element in Copy_array,
      // copy it once and advance all three counters,
      // so we end up with one copy of the duplicate and discard the other
      end else begin //duplicate
        Vals[k] := AVals[i];
        inc(i);
        inc(j);
        inc(k);
      end;
    //copy the rest
    while i <= LC do begin
      Vals[k] := AVals[i];
      inc(i);
      inc(k);
    end;
    // if the counters do not end up at the same element,
    // this means there may be some elements left over on
    // the right side of Original_array.
    // This explains why k does not equal j: there are still elements left over,
    // so copy them to Original_array
    // starting at position k.
    if k <> j then begin
      while j <= J1 do begin
        Vals[k] := Vals[j];
        inc(k);
        inc(j);
      end;
      // why k - 1?
      // the function needs a result, otherwise the return value would be undefined
      // I don't understand this part
      Result := k - 1;
    end else
      Result := J1; // k = j: the leftovers are already in place, ending at J1
  end;

  //returns index of the last valid element
  function PerformMergeSort(ALo, AHi: Integer): Integer;
  var
    AMid, I1, J1: Integer;
  begin
    //It would be wise to use Insertion Sort when (AHi - ALo) is small (about 32-100)
    if ALo < AHi then
    begin
      AMid := (ALo + AHi) shr 1;             // midpoint
      I1 := PerformMergeSort(ALo, AMid);      // recursive call; I1 is the last valid index on the left
      J1 := PerformMergeSort(AMid + 1, AHi);  // recursive call; J1 is the last valid index on the right
      Result := Merge(ALo, I1, AMid + 1, J1);
    end else
      Result := ALo;
  end;

begin
  // test if the array length is even; then we can split it nicely down the middle
  if Length(Vals) mod 2 = 0 then
  begin
    SetLength(AVals, Length(Vals) shr 1);
    Result := PerformMergeSort(0, High(Vals));
  end
  else
  // the length is odd, so let us add 1 to it to make it even
  // (shr 1 is essentially dividing by 2, but doing it at the bit level)
  begin
    SetLength(AVals, (Length(Vals) + 1) shr 1);
    Result := PerformMergeSort(0, High(Vals));
  end;
end;

This is my modification of the code presented by the author, intended to remove duplicates during sorting. Some explanations:
External function:
We have to provide a buffer (AVals) to store half of the initial array. Length(Vals) div 2 + 1 provides enough space for both odd- and even-sized arrays without unnecessary complication. A tighter value (for all cases) is (Length(Vals) + 1) div 2.
The internal function PerformMergeSort returns the index of the last valid element, but the outer function returns the count of valid elements (as commented in the cited topic), so I use 1 + PerformMergeSort(...).
Reason: internally we have to work with indexes, but the end user of this function needs to know the new array length.
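The buffer-size and index-to-count arithmetic can be checked directly. For a chunk 0..n-1 (assuming n >= 1), PerformMergeSort picks AMid = (n - 1) shr 1, so the left run that Merge copies into AVals has at most AMid + 1 elements; deeper recursive calls only see smaller chunks, so the top-level left run is the largest copy ever made. Writing r for the index returned by PerformMergeSort(0, n - 1):

```latex
\text{left}(n) = \left\lfloor \frac{n-1}{2} \right\rfloor + 1
             = \left\lceil \frac{n}{2} \right\rceil
             = \left\lfloor \frac{n+1}{2} \right\rfloor
             \le \left\lfloor \frac{n}{2} \right\rfloor + 1

% n = 7: left = 3 + 1 = 4,  (7 + 1) div 2 = 4,  7 div 2 + 1 = 4
% n = 8: left = 3 + 1 = 4,  (8 + 1) div 2 = 4,  8 div 2 + 1 = 5  (one spare slot)

\text{count} = r - 0 + 1 = r + 1
```

So both buffer sizes are safe; they agree for odd n, and Length(Vals) div 2 + 1 wastes one element for even n.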
Internal function PerformMergeSort:
It takes the start and end indexes of an array chunk, sorts this chunk, and returns the index of the last valid element. After the recursive calls we have this situation:
Invariant: both chunks are sorted, neither contains duplicates, and the left segment has non-zero length.
*****ACDEFG****BCDEGHILM******
     ^    ^    ^       ^
     |    |    |       |
     ALo  I1   AMid+1  J1
     I0   I1   J0      J1    //as named in Merge
     \____/
     LC+1 elements
And after merging:
*****ABCDEFGHILM**************
     ^         ^^
     |         ||__k
     |         |
     ALo       Result
Internal function Merge:
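Use the provided example with pen and paper: step through the merge and see how it works. Note that k is incremented after every element written, so when merging finishes it points one position past the last written element; that is why Merge returns k - 1 (the Result position in the second picture).
Concerning the copy loop: we copy (LC+1) elements to the temporary buffer AVals, using the start segment of AVals (always starting from 0) and the proper segment of the main array (starting from I0, which is usually non-zero).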
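To see the k - 1 relationship concretely, here is a small Free Pascal console program (a sketch, untested in Delphi) that runs the same merge steps on the example row from the diagram, specialised to Char. The program name and the Char specialisation are illustrative. It also handles the k = j case by returning J1: when the whole left run is smaller than the right run and no duplicate was dropped, k stops right at the first untouched right-run element, so k - 1 would be too small.

```pascal
program MergeTrace;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$C+} { enable Assert }

var
  Vals: array[0..29] of Char;  { the 30-character row from the diagram }
  AVals: array of Char;        { temporary copy of the left run }

{ Same steps as the answer's Merge, specialised to Char.
  Returns the index of the last valid element after merging. }
function Merge(I0, I1, J0, J1: Integer): Integer;
var
  i, j, k, LC: Integer;
begin
  LC := I1 - I0;               { LC + 1 valid elements in the left run }
  for i := 0 to LC do
    AVals[i] := Vals[i + I0];
  k := I0;                     { next write position }
  i := 0;
  j := J0;
  while (i <= LC) and (j <= J1) do
  begin
    if AVals[i] < Vals[j] then
    begin
      Vals[k] := AVals[i]; Inc(i);
    end
    else if AVals[i] > Vals[j] then
    begin
      Vals[k] := Vals[j]; Inc(j);
    end
    else
    begin                      { duplicate: keep one copy, skip both }
      Vals[k] := AVals[i]; Inc(i); Inc(j);
    end;
    Inc(k);                    { every branch writes exactly one element }
  end;
  while i <= LC do
  begin
    Vals[k] := AVals[i]; Inc(i); Inc(k);
  end;
  if k = j then
    Exit(J1);                  { rest of the right run is already in place }
  while j <= J1 do
  begin
    Vals[k] := Vals[j]; Inc(j); Inc(k);
  end;
  Result := k - 1;             { k is one past the last element written }
end;

var
  Row: string;
  i, LastIdx: Integer;
begin
  Row := '*****ACDEFG****BCDEGHILM******';
  for i := 0 to 29 do
    Vals[i] := Row[i + 1];     { strings are 1-based }
  SetLength(AVals, 6);         { LC + 1 = 10 - 5 + 1 }
  LastIdx := Merge(5, 10, 15, 23);  { I0 = 5, I1 = 10, J0 = 15, J1 = 23 }
  Write('Result = ', LastIdx, ', merged run: ');
  for i := 5 to LastIdx do
    Write(Vals[i]);
  WriteLn;
  Assert(LastIdx = 15);

  { k = j case: left run 'A' at index 0, right run 'B' at index 1 }
  Vals[0] := 'A';
  Vals[1] := 'B';
  Assert(Merge(0, 0, 1, 1) = 1);
end.
```

It should print `Result = 15, merged run: ABCDEFGHILM`, matching the second picture: Result sits on M and k ends one slot to its right.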

Related

Pascal bubble sort print each sorted line

I have a bubble sort algorithm that works correctly, but I want to set it up so it prints each line during the process, before the final output (19 lines). I have tried almost everything, but it doesn't print correctly:
program Bubble_Sort;
const N = 20;
var
  d : array[1..N] of integer;
var
  i,j,x : integer;
begin
  randomize;
  for i := 1 to N do d[i] := random(100);
  writeln('Before sorting:'); writeln;
  for i := 1 to N do write(d[i], ' ');
  writeln;
  for j := 1 to N - 1 do
    for i := 1 to N - 1 do
      write(d[i], ' ');
  if d[i] > d[i+1] then
  begin
    x := d[i]; d[i] := d[i+1]; d[i+1] := x;
  end;
  writeln('After sorting:'); writeln;
  for i := 1 to N do write(d[i], ' ');
  writeln;
end.
The outer loop in the center of your code, the for j ... loop, runs once per bubble iteration. That is where you want to output the state of the sort. Because you will then have more than one statement within that for j ... loop, you must also add a begin .. end pair:
for j := 1 to N - 1 do
begin
  //one round of sorting
  //display result so far
end;
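The sorting is OK as you have it, except that when you added write(d[i], ' '), presumably to output the result of one iteration, you changed the execution order completely: without a begin .. end, the body of the for i loop is now only the write, and the if/swap runs just once, after both loops have finished.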
Remove the write(d[i], ' '); from where it is now.
To display the sorting result after each iteration, add a new for k ... loop followed by a writeln (and declare k alongside i, j and x):
for k := 1 to N do
  write(d[k], ' ');
writeln;
The final sorting and progress display should be structured like this:
for j := 1 to N - 1 do
begin
  for i := 1 to N - 1 do
    // one round of sorting
  for k := 1 to N do
    // output result of one sorting round
end;
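Putting the pieces of the answer together, the complete program might look like the sketch below (written for Free Pascal; k is the extra loop variable the answer introduces, and the Assert loop near the end is an added sanity check, not part of the original code).

```pascal
program Bubble_Sort;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$C+} { enable Assert }
const
  N = 20;
var
  d: array[1..N] of Integer;
  i, j, k, x: Integer;
begin
  Randomize;
  for i := 1 to N do
    d[i] := Random(100);
  WriteLn('Before sorting:');
  WriteLn;
  for i := 1 to N do
    Write(d[i], ' ');
  WriteLn;
  for j := 1 to N - 1 do
  begin
    { one round of sorting: compare and swap neighbours }
    for i := 1 to N - 1 do
      if d[i] > d[i + 1] then
      begin
        x := d[i]; d[i] := d[i + 1]; d[i + 1] := x;
      end;
    { display the result of this round: N - 1 = 19 lines in total }
    for k := 1 to N do
      Write(d[k], ' ');
    WriteLn;
  end;
  for i := 1 to N - 1 do       { added check: the array is now sorted }
    Assert(d[i] <= d[i + 1]);
  WriteLn('After sorting:');
  WriteLn;
  for i := 1 to N do
    Write(d[i], ' ');
  WriteLn;
end.
```

Each of the 19 intermediate lines shows the array after one full pass, with the largest remaining value having bubbled one step further to the right.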

Optimize a perfect number check to O(sqrt(n))

Part of the program I have checks if an input number is a perfect number. We're supposed to find a solution that runs in O(sqrt(n)). The rest of my program runs in constant time, but this function is holding me back.
function Perfect(x: integer): boolean;
var
  i: integer;
  sum: integer=0;
begin
  for i := 1 to x-1 do
    if (x mod i = 0) then
      sum := sum + i;
  if sum = x then
    exit(true)
  else
    exit(false);
end;
This runs in O(n) time, and I need to cut it down to O(sqrt(n)) time.
These are the options I've come up with:
(1) Find a way to make the for loop go from 1 to sqrt(x)...
(2) Find a way to check for a perfect number that doesn't use a for loop...
Any suggestions? I appreciate any hints, tips, instruction, etc. :)
You need to iterate the loop not for i := 1 to x-1 but for i := 2 to trunc(sqrt(x)).
The largest divisor of x is x itself, but we do not take it into account when looking for perfect numbers. Its partner divisor is 1, so we add 1 to sum up front (i.e. initialize it with 1, not 0).
For every divisor i found in that range, x div i is also a divisor, so the code if (x mod i = 0) then sum := sum + i; can be converted to:
if (x mod i = 0) then
begin
  sum := sum + i;
  sum := sum + (x div i);
end;
And so we get the following code:
function Perfect(x: integer): boolean;
var
  i, sum, sqrtx: integer;
begin
  if x < 2 then
    exit(false);               // 1 is not a perfect number
  sum := 1;                    // 1 always divides x; x itself is excluded
  sqrtx := trunc(sqrt(x));
  i := 2;
  while i <= sqrtx do
  begin
    if (x mod i = 0) then
    begin
      sum := sum + i;
      if i <> x div i then     // avoid adding the same divisor twice,
        sum := sum + (x div i) // e.g. 2 and 4 div 2 when x = 4
    end;
    inc(i);
  end;
  exit(sum = x);
end;
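One way to gain confidence in the O(sqrt(n)) version is to compare it with the original O(n) loop for every x in a range. The sketch below is a Free Pascal program; PerfectSlow and PerfectFast are illustrative names, and it uses i * i <= x instead of trunc(sqrt(x)) to stay in integer arithmetic (the same bound here, though i * i can overflow for x very close to MaxInt).

```pascal
program PerfectCheck;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$C+} { enable Assert }

{ O(n) reference: add up every proper divisor 1..x-1 (x >= 1 assumed). }
function PerfectSlow(x: Integer): Boolean;
var
  i, sum: Integer;
begin
  sum := 0;
  for i := 1 to x - 1 do
    if x mod i = 0 then
      sum := sum + i;
  Result := sum = x;
end;

{ O(sqrt(x)): every divisor i <= sqrt(x) is paired with x div i. }
function PerfectFast(x: Integer): Boolean;
var
  i, sum: Integer;
begin
  if x < 2 then
    Exit(False);               { 1 is not perfect }
  sum := 1;                    { 1 divides x; its partner x is excluded }
  i := 2;
  while i * i <= x do          { same bound as i <= trunc(sqrt(x)) }
  begin
    if x mod i = 0 then
    begin
      sum := sum + i;
      if i <> x div i then     { a square root must be counted once }
        sum := sum + x div i;
    end;
    Inc(i);
  end;
  Result := sum = x;
end;

var
  x: Integer;
begin
  for x := 1 to 10000 do
  begin
    Assert(PerfectFast(x) = PerfectSlow(x));
    if PerfectFast(x) then
      WriteLn(x);
  end;
end.
```

It should print 6, 28, 496 and 8128, the perfect numbers up to 10000, and stop with an assertion failure if the two versions ever disagree.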

All sums of a number

I need an algorithm to print all possible sums of a number (partitions).
For example: for 5 I want to print:
1+1+1+1+1
1+1+1+2
1+1+3
1+2+2
1+4
2+3
5
I am writing my code in Pascal. So far I have this:
Program Partition;
Var
  pole :Array [0..100] of integer;
  n :integer;

{functions and procedures}
function Minimum(a, b :integer): integer;
Begin
  if (a > b) then Minimum := b
  else Minimum := a;
End;

procedure Rozloz(cislo, i :integer);
Var
  j, soucet :integer;
Begin
  soucet := 0;
  if (cislo = 0) then
  begin
    for j := i - 1 downto 1 do
    begin
      soucet := soucet + pole[j];
      if (soucet <> n) then
        Write(pole[j], '+')
      else Write(pole[j]);
    end;
    soucet := 0;
    Writeln()
  end
  else
  begin
    for j := 1 to Minimum(cislo, pole[i - 1]) do
    begin
      pole[i] := j;
      Rozloz(cislo - j, i + 1);
    end;
  end;
End;
{functions and procedures}

{Main program}
Begin
  Read(n);
  pole[0] := 101;
  Rozloz(n, 1);
  Readln;
End.
It works well, but instead of the output I want, I get this:
1+1+1+1+1
2+1+1+1
2+2+1
3+1+1
3+2
4+1
5
I can't figure out how to print it the right way. Thank you for your help.
EDIT: changing for j:=i-1 downto 1 to for j:=1 to i-1 solves one problem. But my output is still this: (1+1+1+1+1) (2+1+1+1) (2+2+1) (3+1+1) (3+2) (4+1) (5), and it should be: (1+1+1+1+1) (1+1+1+2) (1+1+3) (1+2+2) (1+4) (2+3) (5). The main problem is with the 5th and 6th elements: they should be in the opposite order.
I won't attempt Pascal, but here is pseudocode for a solution that prints things in the order that you want.
procedure print_partition(partition):
    print "("
    print partition.join("+")
    print ") "

procedure finish_and_print_all_partitions(partition, i, n):
    for j in (i..floor(n/2)):
        partition.append(j)
        finish_and_print_all_partitions(partition, j, n-j)
        partition.pop()
    partition.append(n)
    print_partition(partition)
    partition.pop()

procedure print_all_partitions(n):
    finish_and_print_all_partitions([], 1, n)
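For reference, a direct Pascal translation of the pseudocode might look like the Free Pascal program below. It is a sketch, not the answerer's code: Parts, Len, Count, PrintPartition and FinishAndPrint are illustrative names, and it prints one partition per line instead of wrapping each in parentheses. Because every new part is at least as large as the previous one, the parts come out in non-decreasing order.

```pascal
program Partitions;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$C+} { enable Assert }

var
  Parts: array[1..100] of Integer;  { the partition being built }
  Len: Integer = 0;                 { number of parts currently in Parts }
  Count: Integer = 0;               { partitions printed so far }

procedure PrintPartition;
var
  j: Integer;
begin
  for j := 1 to Len do
  begin
    if j > 1 then
      Write('+');
    Write(Parts[j]);
  end;
  WriteLn;
  Inc(Count);
end;

{ finish_and_print_all_partitions: every part added is >= MinPart }
procedure FinishAndPrint(MinPart, Rest: Integer);
var
  j: Integer;
begin
  for j := MinPart to Rest div 2 do
  begin
    Inc(Len);
    Parts[Len] := j;            { partition.append(j) }
    FinishAndPrint(j, Rest - j);
    Dec(Len);                   { partition.pop() }
  end;
  Inc(Len);
  Parts[Len] := Rest;           { the remainder becomes the last part }
  PrintPartition;
  Dec(Len);
end;

begin
  FinishAndPrint(1, 5);         { replace 5 with Read(n) to read the number }
  Assert(Count = 7);
end.
```

For n = 5 it prints 1+1+1+1+1, 1+1+1+2, 1+1+3, 1+2+2, 1+4, 2+3 and 5, one per line, which is the order requested in the question.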

2^n calculator in pascal for n={bigger numbers}

First of all, please excuse my bad English.
I'm a student. My teacher gave me a problem in Pascal for my coursework.
I must write a program that calculates 2^n for big values of n. I've written it, but there is a problem: my program returns 0 for values of n bigger than 30. My code is below. Please help me. Thanks in advance.
function control(a: integer): boolean;
var
  b: boolean;
begin
  if (a >= 10) then b := true
  else b := false;
  control := b;
end;

const
  n = 200000000;
var
  a: array[1..n] of integer;
  i, j, c, t, rsayi: longint; k: string;
begin
  writeln('2^n');
  write('n=');
  read(k);
  a[1] := 1;
  rsayi := 1;
  val(k, t, c);
  for i := 1 to t do
    for j := 1 to t div 2 do
    begin
      a[j] := a[j] * 2;
    end;
  for i := 1 to t div 2 do
  begin
    if control(a[j]) = true then
    begin
      a[j + 1] := a[j + 1] + (a[j] div 10);
      a[j] := a[j] mod 10;
      rsayi := rsayi + 1;
    end;
  end;
  for j := rsayi downto 1 do write(a[j]);
end.
The first (nested) loop boils down to t multiplications by 2 on each of the elements a[1] .. a[t div 2].
Thirty multiplications by two is as far as you can go with a 32-bit signed integer: the largest positive value is 2^31-1, so 2^31 is out of reach.
So the first loop doesn't work, and you will probably have to rethink your strategy.
Here is a quick and dirty program that computes all 2^n up to some given, possibly large, n. The program repeatedly doubles the number in array a, which is stored in base 10 with the lowest digit in a[1]. Note that it's not particularly fast, so it would not be wise to use it for n = 200000000.
program powers;
const
  n = 2000; { largest power to compute }
  m = 700;  { length of array, should be at least n*log10(2) + 1 }
var
  a: array[1 .. m] of integer;
  carry, s, p, i, j: integer;
begin
  p := 1;
  a[1] := 1;
  for i := 1 to n do
  begin
    carry := 0;
    for j := 1 to p do
    begin
      s := 2*a[j] + carry;
      if s >= 10 then
      begin
        carry := 1;
        a[j] := s - 10
      end
      else
      begin
        carry := 0;
        a[j] := s
      end
    end;
    if carry > 0 then
    begin
      p := p + 1;
      a[p] := 1
    end;
    write(i, ': ');
    for j := p downto 1 do
      write(a[j]);
    writeln
  end
end.
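The same doubling loop can also be wrapped in a function that returns a single power as a decimal string. This is a Free Pascal sketch; PowerOfTwo is a hypothetical name. It sizes the digit array with n div 3 + 1 because 2^n has floor(n*log10(2)) + 1 digits and log10(2) < 1/3, so no floating point is needed for the bound.

```pascal
program PowerOfTwoDemo;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$C+} { enable Assert }

{ Returns 2^n in decimal by repeated doubling, one digit per array cell,
  least significant digit first, as in the program above. }
function PowerOfTwo(n: Integer): string;
var
  digits: array of Integer;
  p, i, j, s, carry: Integer;
begin
  SetLength(digits, n div 3 + 1);  { enough room for every digit of 2^n }
  digits[0] := 1;
  p := 1;                          { number of digits in use }
  for i := 1 to n do
  begin
    carry := 0;
    for j := 0 to p - 1 do
    begin
      s := 2 * digits[j] + carry;
      digits[j] := s mod 10;
      carry := s div 10;
    end;
    if carry > 0 then              { the number grew by one digit }
    begin
      digits[p] := carry;
      Inc(p);
    end;
  end;
  Result := '';
  for j := p - 1 downto 0 do
    Result := Result + Chr(Ord('0') + digits[j]);
end;

begin
  WriteLn(PowerOfTwo(100));
  Assert(PowerOfTwo(0) = '1');
  Assert(PowerOfTwo(10) = '1024');
  Assert(PowerOfTwo(64) = '18446744073709551616');
  Assert(PowerOfTwo(100) = '1267650600228229401496703205376');
end.
```

Storing one decimal digit per cell keeps every intermediate value below 20, so the overflow that breaks the question's code cannot occur here.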

delphi mergesort for string arrays [closed]

Closed 10 years ago.
I found this mergesort code at http://www.explainth.at/en/delphi/dsort.shtml (the site is down, but try the Wayback Machine or this site: http://read.pudn.com/downloads192/sourcecode/delphi_control/901147/Sorts.pas__.htm), but the array it defines is not an array of string:
type TSortArray = array[0..8191] of Double;
I want to pass an array of string, possibly eliminating duplicates (would this be a union?), and preserve the original order if possible, so that I can later sort it back to the original index positions (minus the duplicates, of course) and pass the array back for further processing. I am working with very large files containing millions of strings (14 to 30 million), so TStringList is not an option. The best option for these large files seems to be arrays of string or arrays of records (or maybe a singly linked list?), sorted with a stable algorithm made for large amounts of data.
How can I change this to take array of string?
How can it be further modified to delete or at least mark duplicates?
Is it possible to store original index number to place back strings in original position?
Are arrays of string or arrays of record better for a large number of strings, compared to a singly linked list?
The questions are listed in order of importance, so if you answer only question number 1, that is fine. Thank you in advance for all your input.
procedure MergeSort(var Vals:TSortArray;ACount:Integer);
var AVals:TSortArray;

  procedure Merge(ALo,AMid,AHi:Integer);
  var i,j,k,m:Integer;
  begin
    i:=0;
    for j:=ALo to AMid do
    begin
      AVals[i]:=Vals[j];
      inc(i);
      //copy lower half or Vals into temporary array AVals
    end;
    i:=0;j:=AMid + 1;k:=ALo;//j could be undefined after the for loop!
    while ((k < j) and (j <= AHi)) do
      if (AVals[i] < Vals[j]) then
      begin
        Vals[k]:=AVals[i];
        inc(i);inc(k);
      end else
      begin
        Vals[k]:=Vals[j];
        inc(k);inc(j);
      end;
    {locate next greatest value in Vals or AVals and copy it to the
     right position.}
    for m:=k to j - 1 do
    begin
      Vals[m]:=AVals[i];
      inc(i);
    end;
    //copy back any remaining, unsorted, elements
  end;

  procedure PerformMergeSort(ALo,AHi:Integer);
  var AMid:Integer;
  begin
    if (ALo < AHi) then
    begin
      AMid:=(ALo + AHi) shr 1;
      PerformMergeSort(ALo,AMid);
      PerformMergeSort(AMid + 1,AHi);
      Merge(ALo,AMid,AHi); // <==== passing the array as string causes AV breakdown here
    end;
  end;

begin
  SetLength(AVals, ACount);
  PerformMergeSort(0,ACount - 1);
end;
Answer to the second question: a mergesort modification that deletes duplicates. It should work for strings too.
//returns new valid length
function MergeSortRemoveDuplicates(var Vals: array of Integer): Integer;
var
  AVals: array of Integer;

  //returns index of the last valid element
  function Merge(I0, I1, J0, J1: Integer): Integer;
  var
    i, j, k, LC: Integer;
  begin
    LC := I1 - I0;
    //copy the valid part of the left run of Vals into temporary array AVals
    for i := 0 to LC do
      AVals[i] := Vals[i + I0];
    k := I0;
    i := 0;
    j := J0;
    while (i <= LC) and (j <= J1) do
      if AVals[i] < Vals[j] then begin
        Vals[k] := AVals[i];
        inc(i);
        inc(k);
      end else if AVals[i] > Vals[j] then begin
        Vals[k] := Vals[j];
        inc(k);
        inc(j);
      end else begin //duplicate
        Vals[k] := AVals[i];
        inc(i);
        inc(j);
        inc(k);
      end;
    //copy the rest
    while i <= LC do begin
      Vals[k] := AVals[i];
      inc(i);
      inc(k);
    end;
    if k <> j then begin
      while j <= J1 do begin
        Vals[k] := Vals[j];
        inc(k);
        inc(j);
      end;
      Result := k - 1;
    end else
      Result := J1; //k = j: the rest of the right run is already in place
  end;

  //returns index of the last valid element
  function PerformMergeSort(ALo, AHi: Integer): Integer;
  var
    AMid, I1, J1: Integer;
  begin
    //It would be wise to use Insertion Sort when (AHi - ALo) is small (about 32-100)
    if ALo < AHi then
    begin
      AMid := (ALo + AHi) shr 1;
      I1 := PerformMergeSort(ALo, AMid);
      J1 := PerformMergeSort(AMid + 1, AHi);
      Result := Merge(ALo, I1, AMid + 1, J1);
    end else
      Result := ALo;
  end;

begin
  if Length(Vals) = 0 then begin
    Result := 0; //nothing to sort; otherwise an empty array would report 1 element
    Exit;
  end;
  SetLength(AVals, Length(Vals) div 2 + 1);
  Result := 1 + PerformMergeSort(0, High(Vals));
end;
//short test
var
  A: array of Integer;
  i, NewLen: Integer;
begin
  Randomize;
  SetLength(A, 12);
  for i := 0 to High(A) do
    A[i] := Random(10);
  NewLen := MergeSortRemoveDuplicates(A);
  SetLength(A, NewLen);
  for i := 0 to High(A) do
    Memo1.Lines.Add(IntToStr(A[i]))
end;
A simple modification for strings (only the declarations change; the < and > comparisons work on strings as well):
function MergeSortRemoveDuplicates(var Vals: array of String): Integer;
var
  AVals: array of String;
and a test case:
var
  Arr: array of string;
  i, n: Integer;
begin
  with TStringList.Create do try
    LoadFromFile('F:\m2.txt'); //contains some equal strings
    SetLength(Arr, Count);
    for i := 0 to Count - 1 do
      Arr[i] := Strings[i];
  finally
    Free
  end;
  n := MergeSortRemoveDuplicates(Arr);
  for i := 0 to n - 1 do
    Memo1.Lines.Add(Arr[i]);
end;
You'd need to modify the declaration of TSortArray from array of Double to array of string (or array of MyRecord).
The comparisons in the nested Merge procedure need to be made compatible with strings. Check anywhere that determines whether AVal[x] < / > AVal[y]. Delphi has functions for this (AnsiCompareText / AnsiCompareStr, depending on whether you want case sensitivity).
That should work, but if you hadn't done this in your earlier attempts, Delphi should have complained about type mismatches rather than giving an AV, so there may be something else going on.
I think duplicate checking should be done post-sort: it only requires one scan through the data.
If you want to store the original index, you will probably need to use an array of record (Data: string; OriginalIndex: Integer). The code in the Merge procedure then needs to be modified to pass Vals[x].Data to the comparison routines. Filling in the OriginalIndex values is a quick scan before calling the sort.
Not 100% sure, to be honest: it's easier to move large contiguous chunks of data around with linked lists than with arrays, while arrays don't need messing about with pointers. If your dataset is sufficiently large, you may even need to resort to streaming to disk, which is likely to drive your choice more than either of those points.
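The record approach and the post-sort duplicate scan can be combined as in the Free Pascal sketch below: a stable merge sort by Data, one scan that keeps the first record of each run of equal strings (which, thanks to stability, is the one with the smallest OriginalIndex), and a second sort by OriginalIndex to put the survivors back in their original order. TIndexedString, Less, MergeSort, RemoveDuplicates and Sample are illustrative names. Comparisons use Pascal's ordinal string operators; swap in AnsiCompareStr or AnsiCompareText from SysUtils if you need locale-aware or case-insensitive ordering, and use the same comparison in the duplicate scan.

```pascal
program IndexedDedup;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$C+} { enable Assert }

type
  { a string plus the position it had before sorting }
  TIndexedString = record
    Data: string;
    OriginalIndex: Integer;
  end;
  TIndexedArray = array of TIndexedString;

const
  Sample: array[0..5] of string = ('pear', 'apple', 'pear', 'fig', 'apple', 'kiwi');

{ Order by Data when ByData is True, otherwise by OriginalIndex. }
function Less(const A, B: TIndexedString; ByData: Boolean): Boolean;
begin
  if ByData then
    Result := A.Data < B.Data
  else
    Result := A.OriginalIndex < B.OriginalIndex;
end;

{ Stable top-down merge sort of V[Lo..Hi]; Tmp is scratch space. }
procedure MergeSort(var V, Tmp: TIndexedArray; Lo, Hi: Integer; ByData: Boolean);
var
  Mid, i, j, k: Integer;
begin
  if Lo >= Hi then
    Exit;
  Mid := (Lo + Hi) shr 1;
  MergeSort(V, Tmp, Lo, Mid, ByData);
  MergeSort(V, Tmp, Mid + 1, Hi, ByData);
  for k := Lo to Mid do
    Tmp[k] := V[k];                    { copy only the left run }
  i := Lo; j := Mid + 1; k := Lo;
  while (i <= Mid) and (j <= Hi) do
  begin
    { take from the right only if strictly smaller: equal keys keep their order }
    if Less(V[j], Tmp[i], ByData) then
    begin
      V[k] := V[j]; Inc(j);
    end
    else
    begin
      V[k] := Tmp[i]; Inc(i);
    end;
    Inc(k);
  end;
  while i <= Mid do                    { copy leftover left elements; }
  begin                                { leftover right ones are already in place }
    V[k] := Tmp[i]; Inc(i); Inc(k);
  end;
end;

{ One scan over data sorted by Data: keep the first record of each run. }
function RemoveDuplicates(var V: TIndexedArray): Integer;
var
  i, w: Integer;
begin
  Result := 0;
  if Length(V) = 0 then
    Exit;
  w := 0;                              { index of the last kept record }
  for i := 1 to High(V) do
    if V[i].Data <> V[w].Data then
    begin
      Inc(w);
      V[w] := V[i];
    end;
  Result := w + 1;
end;

var
  V, Tmp: TIndexedArray;
  i, n: Integer;
begin
  SetLength(V, Length(Sample));
  for i := 0 to High(Sample) do
  begin
    V[i].Data := Sample[i];
    V[i].OriginalIndex := i;           { the quick scan before sorting }
  end;
  SetLength(Tmp, Length(V));
  MergeSort(V, Tmp, 0, High(V), True);  { sort by Data }
  n := RemoveDuplicates(V);
  SetLength(V, n);
  MergeSort(V, Tmp, 0, High(V), False); { restore the original order }
  for i := 0 to High(V) do
    WriteLn(V[i].OriginalIndex, ': ', V[i].Data);
  Assert(n = 4);
  Assert((V[0].Data = 'pear') and (V[1].Data = 'apple') and
         (V[2].Data = 'fig') and (V[3].Data = 'kiwi'));
end.
```

It should print `0: pear`, `1: apple`, `3: fig` and `5: kiwi`: the first occurrence of each string, in its original order. For tens of millions of strings the scratch array doubles the memory needed, which is one more reason the disk-streaming point above may dominate the design.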
