delphi mergesort for string arrays [closed] - algorithm

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
Found this coded mergesort on http://www.explainth.at/en/delphi/dsort.shtml (site down but try wayback machine or this site: http://read.pudn.com/downloads192/sourcecode/delphi_control/901147/Sorts.pas__.htm) but essentially the array defined was not for an array of string.
type TSortArray = array[0..8191] of Double;
I want to pass an array of string that would possibly eliminate duplicates (this would be Union?) and preserve original order if possible for later resorting it back to original index position minus the duplicates of course (original index) so array can be passed back for further processing. I am using very large files of strings with millions of strings (14 to 30 million) so TStringList is not an option. Best option for these large files is to use arrays of string or arrays of records (or maybe single linked list??) and sort with stable algorithm made for large amount of data.
How can I change this to take array of string?
How can it be further modified to delete or at least mark duplicates?
Is it possible to store original index number to place back strings in original position?
Are arrays of string or arrays of record better for large number of strings as compared to a single linked list?
Questions are listed in order of importance so if you answer question number 1 only that is fine. Thank you in advance for all your input.
procedure MergeSort(var Vals:TSortArray;ACount:Integer);
var AVals:TSortArray;
procedure Merge(ALo,AMid,AHi:Integer);
var i,j,k,m:Integer;
begin
i:=0;
for j:=ALo to AMid do
begin
AVals[i]:=Vals[j];
inc(i);
//copy lower half or Vals into temporary array AVals
end;
i:=0;j:=AMid + 1;k:=ALo;//j could be undefined after the for loop!
while ((k < j) and (j <= AHi)) do
if (AVals[i] < Vals[j]) then
begin
Vals[k]:=AVals[i];
inc(i);inc(k);
end else
begin
Vals[k]:=Vals[j];
inc(k);inc(j);
end;
{locate next greatest value in Vals or AVals and copy it to the
right position.}
for m:=k to j - 1 do
begin
Vals[m]:=AVals[i];
inc(i);
end;
//copy back any remaining, unsorted, elements
end;
procedure PerformMergeSort(ALo,AHi:Integer);
var AMid:Integer;
begin
if (ALo < AHi) then
begin
AMid:=(ALo + AHi) shr 1;
PerformMergeSort(ALo,AMid);
PerformMergeSort(AMid + 1,AHi);
Merge(ALo,AMid,AHi); <==== passing the array as string causes AV breakdown here
end;
end;
begin
SetLength(AVals, ACount);
PerformMergeSort(0,ACount - 1);
end;

Answer to the second question: Mergesort modification with duplicate deleting. Should work for strings.
//returns new valid length
function MergeSortRemoveDuplicates(var Vals: array of Integer):Integer;
var
AVals: array of Integer;
//returns index of the last valid element
function Merge(I0, I1, J0, J1: Integer):Integer;
var
i, j, k, LC:Integer;
begin
LC := I1 - I0;
for i := 0 to LC do
AVals[i]:=Vals[i + I0];
//copy lower half or Vals into temporary array AVals
k := I0;
i := 0;
j := J0;
while ((i <= LC) and (j <= J1)) do
if (AVals[i] < Vals[j]) then begin
Vals[k] := AVals[i];
inc(i);
inc(k);
end else if (AVals[i] > Vals[j]) then begin
Vals[k]:=Vals[j];
inc(k);
inc(j);
end else begin //duplicate
Vals[k] := AVals[i];
inc(i);
inc(j);
inc(k);
end;
//copy the rest
while i <= LC do begin
Vals[k] := AVals[i];
inc(i);
inc(k);
end;
if k <> j then
while j <= J1 do begin
Vals[k]:=Vals[j];
inc(k);
inc(j);
end;
Result := k - 1;
end;
//returns index of the last valid element
function PerformMergeSort(ALo, AHi:Integer): Integer; //returns
var
AMid, I1, J1:Integer;
begin
//It would be wise to use Insertion Sort when (AHi - ALo) is small (about 32-100)
if (ALo < AHi) then
begin
AMid:=(ALo + AHi) shr 1;
I1 := PerformMergeSort(ALo, AMid);
J1 := PerformMergeSort(AMid + 1, AHi);
Result := Merge(ALo, I1, AMid + 1, J1);
end else
Result := ALo;
end;
begin
SetLength(AVals, Length(Vals) div 2 + 1);
Result := 1 + PerformMergeSort(0, High(Vals));
end;
//short test
var
A: array of Integer;
i, NewLen: Integer;
begin
Randomize;
SetLength(A, 12);
for i := 0 to High(A) do
A[i] := Random(10);
NewLen := MergeSortRemoveDuplicates(A);
SetLength(A, NewLen);
for i := 0 to High(A) do
Memo1.Lines.Add(IntToStr(A[i]))
end;
Simple modification for strings:
function MergeSortRemoveDuplicates(var Vals: array of String):Integer;
var
AVals: array of String;
and test case:
var
List: TStringList;
Arr: array of string;
i, n: Integer;
begin
with TStringList.Create do try
LoadFromFile('F:\m2.txt'); //contains some equal strings
SetLength(Arr, Count);
for i := 0 to Count - 1 do
Arr[i] := Strings[i];
finally
Free
end;
n := MergeSortRemoveDuplicates(Arr);
for i := 0 to n - 1 do
Memo1.Lines.Add(Arr[i]);
end;

You'd need to modify the declaration TSortArray from array of double to array of string (or array of MyRecord)
The comparison routines in the Merge nested proc needs to be made compatible for strings. Check for anywhere that determines whether AVal[x] < / > AVal[y]. Delphi has procedures for this (AnsiCompareText / AnsiCompareStr depending on whether you want case-sensitivity)
That should work, but if you hadn't done this in your earlier attempts then Delphi should have complained about type mismatches rather than giving an AV, so there may be something else going on
I think duplicate checking should be done post-sort - it only requires one scan through of the data
If you want to store original index data then you will probably need to use an array of record (data: string; OriginalIndex: integer). Code in the Merge procedure then needs to be modified to pass Vals[x].Data to comparison routines. Filling the OriginalIndex values will be a quick scan before calling the Merge procedure
Not 100% sure, to be honest - it's easier to move large contiguous chunks of data with linked lists than with arrays, and arrays don't need messing about with pointers. If your dataset is sufficiently large you may even need to resort to streaming to disk which is likely to drive your choice more than either of those points.

Related

Insertion Sort - TStringList Delphi

i'm trying to sort TStringList of integers from a text file with Insertion and Selection Sort .Selection Sort works ok , but the Insertion Sort doesnt work with my code . Can someone tell me where i'm wrong ? My 'numbers.txt' has 5000 lines of numbers. Thanks in advance
UPDATE : I have edited my code a bit , it works now with Insertion-Sort but it sorts just 4 indexes of integer as on the image
var
i, Position, n: integer;
Value: string;
begin
n := Items.Count;
for i := 1 to n - 1 do
begin
Value := Items[i];
Position := i-1;
while (Position >0) and (Items[Position]>Value) do
begin
Items[Position+1]:= Items[Position] ;
Position := Position -1 ;
end;
Items[Position+1] := Value;
end;
end;
Your data in the image is sorting exactly as it should, because you're sorting on string values, and based on the comparison you're making the order is perfect. "1143" falls exactly between the string values "11413" and "11443", because the comparison is made character by character out to the length of the shortest of the values. "1141" < "1143" < "1144", based on the first four characters of each string.
If you want an actual integer sort, then you need to convert the two values to integer before comparing them. Something like this should work (note I did not test your overall sort logic - I just used values that demonstrate the concept):
program Project1;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils, System.Classes;
var
i, Position, n: integer;
Value: integer;
Items: TStringList;
begin
Items := TStringList.Create;
try
Items.DelimitedText := '1116,11170,11178,11206,1122,11221,11228';
n := Items.Count;
for i := 1 to n - 1 do
begin
Value := StrToInt(Items[i]);
Position := i - 1;
while (Position > 0) and (StrToInt(Items[Position]) > Value) do
begin
Items[Position + 1]:= Items[Position];
Position := Position - 1 ;
end;
Items[Position+1] := IntToStr(Value);
end;
for i := 0 to Items.Count - 1 do
WriteLn(Items[i]);
finally
Items.Free;
end;
ReadLn;
end.
The output I got from the code above in a console window:
1116
1122
11170
11178
11206
11221
11228

How many times does one number divide into another, and how much is left over?

I need an algorithm in Delphi to generate partitions for a specified integer value.
Example: for 13 if 5 is specified as the max value for partition it will give 5,5,3; if 4 is specified as max partition value the result should be 4,4,4,1, and so on.
It's simple enough to solve the problem using div and mod. Here's an example program that I don't think needs any further explanation:
program IntegerPartitions;
{$APPTYPE CONSOLE}
function Partitions(const Total, Part: Integer): TArray<Integer>;
var
Count: Integer;
Rem: Integer;
i: Integer;
begin
Assert(Total>0);
Assert(Part>0);
Count := Total div Part;
Rem := Total mod Part;
if Rem=0 then
SetLength(Result, Count)
else
SetLength(Result, Count+1);
for i := 0 to Count-1 do
Result[i] := Part;
if Rem<>0 then
Result[Count] := Rem;
end;
var
Value: Integer;
begin
for Value in Partitions(13, 5) do
Writeln(Value);
Readln;
end.

Dissecting mergesort routine

Disclaimer: this is a question that requires some explanation of code and algorithm. It is not intended to fix anything or optimize anything but rather facilitate understanding.
My understanding of sorting routines is not great. I asked for help with converting an already available code for mergesort from integer type to string type here: delphi mergesort for string arrays. After I received my answer I set out to understand the sorting routine.
Couple of resources came handy to help understanding:
http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/merge/mergen.htm
http://www.youtube.com/watch?v=9Qk1t66g7IU
I attempted to dissect the code to follow it along. This question is not my attempt to validate my own understanding of mergesort, but rather show the sorting routine in a clear manner. The value of this question is for people attempting to understand mergesort better. This is essential as other sorts can be understood easier if you understand one prototype well.
My question is why did we add "1" to set length and to "Result"
SetLength(AVals, Length(Vals) div 2 + 1);
Result := 1 + PerformMergeSort(0, High(Vals));
and why did we subtract "1" here? EDIT: I think K will be out of bounds if not subtract 1?
Result := k - 1;
here is the code in this question; BTW this is an optimized mergesort as it copies only half the array:
function MergeSortRemoveDuplicates(var Vals: array of Integer):Integer;
var
AVals: array of Integer;
//returns index of the last valid element
function Merge(I0, I1, J0, J1: Integer):Integer;
var
i, j, k, LC:Integer;
begin
LC := I1 - I0;
for i := 0 to LC do
AVals[i]:=Vals[i + I0];
//copy lower half or Vals into temporary array AVals
k := I0;
i := 0;
j := J0;
while ((i <= LC) and (j <= J1)) do
if (AVals[i] < Vals[j]) then begin
Vals[k] := AVals[i];
inc(i);
inc(k);
end else if (AVals[i] > Vals[j]) then begin
Vals[k]:=Vals[j];
inc(k);
inc(j);
end else begin //duplicate
Vals[k] := AVals[i];
inc(i);
inc(j);
inc(k);
end;
//copy the rest
while i <= LC do begin
Vals[k] := AVals[i];
inc(i);
inc(k);
end;
if k <> j then
while j <= J1 do begin
Vals[k]:=Vals[j];
inc(k);
inc(j);
end;
Result := k - 1;
end;
//returns index of the last valid element
function PerformMergeSort(ALo, AHi:Integer): Integer; //returns
var
AMid, I1, J1:Integer;
begin
//It would be wise to use Insertion Sort when (AHi - ALo) is small (about 32-100)
if (ALo < AHi) then
begin
AMid:=(ALo + AHi) shr 1;
I1 := PerformMergeSort(ALo, AMid);
J1 := PerformMergeSort(AMid + 1, AHi);
Result := Merge(ALo, I1, AMid + 1, J1);
end else
Result := ALo;
end;
begin
SetLength(AVals, Length(Vals) div 2 + 1);
Result := 1 + PerformMergeSort(0, High(Vals));
end;
here is my understanding with very small modification:
function MergeSortRemoveDuplicates(var Vals: array of Integer):Integer;
var
AVals: array of Integer;
//returns index of the last valid element
function Merge(I0, I1, J0, J1: Integer):Integer;
var
i, j, k, LC:Integer;
begin
// difference between mid-point on leftside
// between low(Original_array) and midpoint(true Original_array midpoint)
// subtracting I0 which is Low(Original_array)
// or here equals zero(0)
// so LC is quarter point in Original_array??
LC := I1 - I0;
// here we walk from begining of array
// and copy the elements between zero and LC
// this is funny call that Vals[i + I0] like 0 + 0
// then 1 + 0 and so on. I guess this guarantees if we are
// starting from non-zero based array??
for i := 0 to LC do
AVals[i]:=Vals[i + I0];
// k equal low(Original_array)
k := I0;
// I will be our zero based counter element
i := 0;
// J will be (midpoint + 1) or
// begining element of right side of array
j := J0;
// while we look at Copy_array elements
// between first element (low(Copy_array)
// and original_array from midpoint + 1 to high(Original_array)
// we start to sort it
while ((i <= LC) and (j <= J1)) do
// if the value at Copy_array is smaller than the Original_array
// we move it to begining of Original_array
// remember position K is first element
if (AVals[i] < Vals[j]) then begin
Vals[k] := AVals[i];
// move to next element in Copy_array
inc(i);
// move to next element in Original_array
inc(k);
// if the value at copy_array is larger
// then we move smaller value from J Original_array (J is midpoint+1)
// to position K original_array (K now is the lower part of ) Original_array)
end else if (AVals[i] > Vals[j]) then begin
Vals[k]:=Vals[j];
//move K to the next element in Original_array
inc(k);
// move j to next element in Original_array
inc(j);
// if the value in Original_array is equal to the element in Copy_array
// do nothing and count everything up
// so we end up with one copy from duplicate and disregard the rest
end else begin //duplicate
Vals[k] := AVals[i];
inc(i);
inc(j);
inc(k);
end;
//copy the rest
while i <= LC do begin
Vals[k] := AVals[i];
inc(i);
inc(k);
end;
// if the counters do not endup at the same element
// this means we have some that maybe leftover on
// the right side of the Original_array.
// This explains why K does not equal J : there are still elements left over
// then copy them to Original_array
// starting at position K.
if k <> j then
while j <= J1 do begin
Vals[k]:=Vals[j];
inc(k);
inc(j);
end;
// why K - 1?
// function needs result so return will be null if called
// I don't understand this part
Result := k - 1;
end;
//returns index of the last valid element
function PerformMergeSort(ALo, AHi:Integer): Integer; //returns
var
AMid, I1, J1:Integer;
begin
//It would be wise to use Insertion Sort when (AHi - ALo) is small (about 32-100)
if (ALo < AHi) then
begin
AMid:=(ALo + AHi) shr 1; // midpoint
I1 := PerformMergeSort(ALo, AMid); //recursive call I1 is a data point on the left
J1 := PerformMergeSort(AMid + 1, AHi); // recursive call I1 is a data point on the right
Result := Merge(ALo, I1, AMid + 1, J1);
end else
Result := ALo;
end;
begin
// test if array is even then we can split nicely down middle
if Length(Vals) mod 2 = 0 then
begin
SetLength(AVals, Length(Vals) shr 1);
Result := PerformMergeSort(0, High(Vals));
end
else
//array is odd let us add 1 to it and make it even
// shr 1 is essentially dividing by 2 but doing it on the bit level
begin
SetLength(AVals, (Length(Vals) + 1) shr 1);
Result := PerformMergeSort(0, High(Vals));
end;
end;
This is my modification of code presented by author, intended to remove duplicates during the sorting. Some explanations:
External function:
We should provide buffer (AVals) to store half of the initial aray. Length(Vals) div 2 + 1 provides enough space for odd- and even-sized arrays without unnecessary complication. Better value (for all cases): Length(Vals + 1) div 2
Internal procedure PerformMergeSort returns index of the last valid element, but external procedure returns count of valid elements (it was commented in the cited topic), so I use (1 + PerformMergeSort()).
Reasons: internally we have to work with indexes, but end user of this procedure should know new array length.
Internal function PerformMergeSort:
It takes start and end indexes of array chunk, sorting this chunk and returns index of the last valid element. After recursive calls we have this situation.
Invariant: both chunks are sorted, they don't contain duplicates, non-zero length of left segment
*****ACDEFG****BCDEGHILM******
^ ^ ^ ^
| | | |
Alo I1 AMid+1 J1
I0 I1 J0 J1 //as named in Merge
\____/
LC+1 elements
And after merging:
*****ABCDEFGHILM**************
^ ^^
| ||__k
| |
Alo Result
Internal function Merge:
Use provided example, pen and paper, step through merging, see how it works.
Concerning to copy cycle: we copy (LC+1) elements to temporary buffer AVals, using start segment of AVals (always starting from 0) and proper segment of the main array (starting from I0, it is usually non-zero)

Comparing sorting types in Pascal

I am going to write a program that asks the user to enter items then choose the sorting type to sort them (bubble, inserion and selection).
After that I have to compare these types by time efficiency.
I wrote code for the first part, but I didn't know how to use a function in Pascal to write the second part (comparison).
This is what I have done:
Begin
writeln('How many Number you would like to sort:');
readln(size);
For m := 1 to size do
Begin
if m=1 then
begin
writeln('');
writeln('Input the first value: ');
readln(IntArray[m]);
end
else if m=size then
begin
writeln('Input the last value: ');
readln(IntArray[m]);
end
else
begin
writeln('Input the next value: ');
readln(IntArray[m]);
End;
End;
writeln('Values Before The Sort: ');
for m:=0 to size-1 do
writeln(IntArray[m+1]);
writeln();
begin
repeat
writeln(' ~*~...~*~ ~*~...~*~ ~*~...~*~ ~*~...~*~');
writeln('1. Insertion Sort.');
writeln('2. Bubble Sort.');
writeln('3. Selection Sort. ');
writeln('4. Exist ');
writeln('');
writeln('Enter your choice number: ');
readln(sort);
case sort of
1: begin {when choice = 1}
writeln('');
writeln(' The sorted numbers by Insertion sort are ~> ');
For i := 2 to size do
Begin
index := intarray[i];
j := i;
While ((j > 1) AND (intarray[j-1] > index)) do
Begin
intarray[j] := intarray[j-1];
j := j - 1;
End;
intarray[j] := index;
End;
for i:= 1 to size do
writeln(intarray[i]);
end;
2: begin {when choice = 2}
writeln('');
writeln(' The sorted numbers by bubble sort are ~> ');
For i := size-1 DownTo 1 do
For j := 2 to i do
If (intArray[j-1] > intarray[j]) then
Begin
temp := intarray[j-1];
intarray[j-1] := intarray[j];
intarray[j] := temp;
End;
for i:= 1 to size do
writeln(intarray[i]);
end;
3: begin {when choice = 3}
writeln('');
writeln(' The sorted numbers by selection sort are ~> ');
for i:=1 to size do
begin
j:= i ;
for index:= i +1 to size do
if intarray[index]<intarray[j] then
j:=index;
temp:=intarray[j];
intarray[j]:=intarray[i];
intarray[i]:=temp;
end;
for i:= 1 to size do
writeln(intarray[i]);
end;
4: begin
writeln('*~...~*~ Thank U For used Our Program We Hope You Enjoyed ~*~...~*~ ');
end;
end;
until sort = 4 ;
end;
end.
I hope that I will find the answer here...
I hope you know the TIME complexity of Bubble, Insertion and Selection sort.
If you know you can just compare like that
if (time_complexity_bub>time_complexity_ins)and(time_complexity_bub>time_complexity_sel) then writeln('Bubble Sort is the WORST !!!');
if (time_complexity_ins>time_complexity_bub)and(time_complexity_ins>time_complexity_sel) then writeln('Insertion Sort is the WORST !!!');
if (time_complexity_sel>time_complexity_ins)and(time_complexity_bub<time_complexity_sel) then writeln('Selection Sort is the WORST !!!');
If you have other questions you can ask me :D ...
Pascal supports >,>=,=,<= and < for comparison, but it seems you already know this:
if intarray[index]<intarray[j] then
So maybe you have to explain your question a bit clearer.
I think author is not sure how to measure time in Pascal.
I don't know what compiler you're using, but overall pattern is like:
var
startTime : TDateTime;
overallTime : TDateTime;
begin
startTime := SomeFunctionToGetCurrentTimeWithMicroseconds;
SomeLongOperation;
overalltime := SomeFunctionToGetCurrentTimeWithMicroseconds() - startTime;
end.

How do you implement Levenshtein distance in Delphi?

I'm posting this in the spirit of answering your own questions.
The question I had was: How can I implement the Levenshtein algorithm for calculating edit-distance between two strings, as described here, in Delphi?
Just a note on performance:
This thing is very fast. On my desktop (2.33 Ghz dual-core, 2GB ram, WinXP), I can run through an array of 100K strings in less than one second.
function EditDistance(s, t: string): integer;
var
d : array of array of integer;
i,j,cost : integer;
begin
{
Compute the edit-distance between two strings.
Algorithm and description may be found at either of these two links:
http://en.wikipedia.org/wiki/Levenshtein_distance
http://www.google.com/search?q=Levenshtein+distance
}
//initialize our cost array
SetLength(d,Length(s)+1);
for i := Low(d) to High(d) do begin
SetLength(d[i],Length(t)+1);
end;
for i := Low(d) to High(d) do begin
d[i,0] := i;
for j := Low(d[i]) to High(d[i]) do begin
d[0,j] := j;
end;
end;
//store our costs in a 2-d grid
for i := Low(d)+1 to High(d) do begin
for j := Low(d[i])+1 to High(d[i]) do begin
if s[i] = t[j] then begin
cost := 0;
end
else begin
cost := 1;
end;
//to use "Min", add "Math" to your uses clause!
d[i,j] := Min(Min(
d[i-1,j]+1, //deletion
d[i,j-1]+1), //insertion
d[i-1,j-1]+cost //substitution
);
end; //for j
end; //for i
//now that we've stored the costs, return the final one
Result := d[Length(s),Length(t)];
//dynamic arrays are reference counted.
//no need to deallocate them
end;

Resources