How to make a sorting method in Smalltalk

I am trying to write a new sorting method in Smalltalk. Does anyone know how to translate this Java sorting code to Squeak?
public static void SelectionSort ( int [ ] num )
{
    int i, j, first, temp;
    for ( i = num.length - 1; i > 0; i-- )
    {
        first = 0; //initialize to subscript of first element
        for(j = 1; j <= i; j ++) //locate smallest element between positions 1 and i.
        {
            if( num[j] < num[first] )
                first = j;
        }
        temp = num[first]; //swap smallest found with element in position i.
        num[first] = num[ i ];
        num[i] = temp;
    }
}

Short answer:
You don't have to.
To sort an array, just send it the message #asSortedCollection. For instance, inspect this in a workspace:
#(7 2 8 5) asSortedCollection
Long answer:
Since I assume you want to see how you would implement the equivalent of your Java code in Smalltalk if you really had to, here's a relatively "literal translation" you can test in a workspace (tested in Pharo, should work in Squeak as well):
| someNumbers |
someNumbers := #(7 2 8 5) copy. "See comments below for an explanation."
someNumbers size to: 1 by: -1 do: [:eachOuterIndex |
    | indexOfSmallest swapValue |
    indexOfSmallest := 1.
    1 to: eachOuterIndex do: [:eachInnerIndex |
        (someNumbers at: eachInnerIndex) < (someNumbers at: indexOfSmallest)
            ifTrue: [ indexOfSmallest := eachInnerIndex ]
    ].
    swapValue := someNumbers at: indexOfSmallest.
    someNumbers at: indexOfSmallest put: (someNumbers at: eachOuterIndex).
    someNumbers at: eachOuterIndex put: swapValue.
].
^someNumbers
Clearly, there are a few changes from your version, such as using explicit naming, which is one of Smalltalk's hallmark conventions (in particular, indexOfSmallest should be clearer than first, which is misleading since it's not necessarily the first index), and decreasing the scope of the variables you called first and temp. See @Leandro's answer for a version that uses your own variable names if you have trouble with the "translation".
If the code were to live in a method, I would probably put it in the SequenceableCollection hierarchy (or maybe you'd want to add yours as a subclass in there if you want to override other behaviour), and the start of it could look something like this:
copySortedDescending
    "Answer a copy of the receiver, sorted in descending order."
    | copy |
    copy := self copy.
    copy size to: 1 by: -1 do: [:eachOuterIndex |
        "... and so on..."
    ].
    ^copy
Again, note that I'm deliberately changing the name, because I don't think selectionSort is a descriptive name of what the method does, and I wouldn't use a collection as an argument to a method living somewhere else - the knowledge of how to do the sorting belongs on the collection itself.
I'm sure you could come up with a better roll-your-own-answer very easily, though. For instance, you could try sending a SequenceableCollection instance the message sort: and pass a sort block as an argument, in which you could specify how you want your elements to be sorted.

Here is a line by line translation. Row numbers are not part of the code.
1. selectionSort: num
2.     | first temp |
3.     num size to: 1 by: -1 do: [:i |
4.         first := 1. "initialize to subscript of first element"
5.         1 to: i do: [:j |
6.             "locate smallest element between positions 1 and i"
7.             (num at: j) < (num at: first) ifTrue: [first := j]].
8.         temp := num at: first. "swap smallest with element in position i"
9.         num at: first put: (num at: i).
10.        num at: i put: temp]
Remarks:
1. No argument type declaration. No answer type.
2. Block arguments i and j are declared inside their blocks (lines 3 and 5). In Smalltalk, indexed collections are 1-based.
3. num.length -> num size. The decreasing for loop translates into a to:by:do: message.
4. Assignment = becomes := and 0 becomes 1 (see remark 2 above).
5. The increasing for loop translates into a to:do: message.
6. Comments are enclosed between double quotes.
7. num[j] translates into the at: j message. if translates into an ifTrue: message.
8. temp could have been declared in the first block: do: [:i | | temp |....
9. The array assignment num[first] = num[i] also becomes a message send, at:put:.
10. Same as 9. Note also that you could have used the cascade syntax for lines 9 and 10:
num
at: first put: (num at: i);
at: i put: temp
No need to answer num because it's been modified in place by the method. See, however, the interesting discussion that originated in Amos' answer: Why shouldn't I store into literal arrays in Smalltalk?
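For readers who want to check the behaviour of this loop outside a Smalltalk image, here is a minimal Python sketch of the same algorithm. The function name copy_sorted_descending is only illustrative (borrowed from the copySortedDescending idea above), and the sketch works on a copy rather than modifying its argument. Because the smallest remaining element is moved to position i on each pass, the result comes out in descending order.

```python
def copy_sorted_descending(numbers):
    """Return a copy of numbers sorted in descending order (selection sort)."""
    result = list(numbers)  # work on a copy so the caller's list is untouched
    for i in range(len(result) - 1, 0, -1):
        first = 0  # index of the smallest element seen so far in result[0..i]
        for j in range(1, i + 1):
            if result[j] < result[first]:
                first = j
        # move the smallest remaining element to position i
        result[first], result[i] = result[i], result[first]
    return result


print(copy_sorted_descending([7, 2, 8, 5]))  # [8, 7, 5, 2]
```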


What is wrong with the recursive algorithm developed for the below problem?

I have tried to solve an algorithmic problem. I have come up with a recursive algorithm to solve the same. This is the link to the problem:
https://codeforces.com/problemset/problem/1178/B
This problem is not from any contest that is currently going on.
I coded my algorithm and ran it on a few test cases; it turns out that it counts more than the correct amount. I went through my thought process again and again but could not find any mistake. I have written my algorithm (not the code, just the recursive function I have in mind) below. Where did I go wrong? What was the mistake in my thought process?
Let my recursive function be called count; it takes one of the three forms below as the algorithm proceeds.
count(i,'o',0) = count(i+1,'o',0) [+ count(i+1,'w',1) --> iff the (i)th element of the string is 'o']
count(i,'w',0) = count(i+1,'w',0) [+ count(i+2,'o',0) --> iff (i)th and (i+1)th elements are both equal to 'v']
count(i,'w',1) = count(i+1,'w',1) [+ 1 + count(i+2,'w',0) --> iff (i)th and (i+1)th elements are both equal to 'v']
Note: the recursive calls inside the square brackets [.] are made iff the condition after the arrow is satisfied.
Explanation: The main idea behind the recursive function developed is to count the number of occurrences of the given sequence. The count function takes 3 arguments:
argument 1: The index of the string on which we are currently located.
argument 2: The pattern we are looking for (if this argument is 'o' it means that we are looking for the letter 'o' -- i.e. at which index it is there. If it is 'w' it means that we are looking for the pattern 'vv' -- i.e. we are looking for 2 consecutive indices where this pattern occurs.)
argument 3: This can be either 1 or 0. If it is 1, it means that we are looking for the 'vv' pattern having already found the 'o', i.e. the trailing 'vv' of vvovv. If it is 0, it means that we are searching for the 'vv' pattern that will be the beginning of the pattern vvovv, i.e. its leading 'vv'.
I will initiate the algorithm with count(0,'w',0) -- it means, we are at the 0th index of the string, we are looking for the pattern 'vv', and this 'vv' will be the prefix of the 'vvovv' pattern we wish to find.
So, the output of count(0,'w',0) should be my answer. Now comes the trouble: for the input "vvovooovovvovoovoovvvvovo" (say input1), my program (which is based on the above algorithm) gives the expected answer (= 50). But when I append "vv" to that input to get a new input, "vvovooovovvovoovoovvvvovovv" (say input2), and run my algorithm again, I get 135 as the answer, while the correct answer is 75 (this is the answer the solution code returns). Why is this happening? Where did I make an error?
Also, one more doubt: if the output for input1 is 50, shouldn't the output for input2 be at least twice that? All of the subsequences present in input1 are present in input2 too, and all of those subsequences can also form a new subsequence with the appended 'vv', which would mean at least 100 favourable subsequences.
P.S. This is the link to the solution code https://codeforces.com/blog/entry/68534
This question doesn't need recursion or dynamic programming.
The basic idea is to count how many ws we have before and after each o.
If you have a run of X consecutive vs, it means you have X - 1 ws (and none if the run is a single v).
Let's use vvvovvv as an example. We know that before and after the o we have 3 vs, which means 2 ws. To evaluate the answer, just multiply 2x2 = 4.
For each o we find, we just need to multiply the ws before and after it, sum it all and this is our answer.
We can find how many ws there are before and after each o in linear time.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// a run of v_count consecutive vs contains v_count - 1 ws
long long convert_v_to_w(long long v_count){
    return max(0LL, v_count - 1);
}

int main(){
    string s = "vvovooovovvovoovoovvvvovovvvov";
    int n = s.size();
    // vectors instead of variable-length arrays, which are not standard C++
    vector<long long> wBefore(n, 0), wAfter(n, 0);
    long long v_count = 0, wb = 0, wa = 0;
    //counting ws before each o
    int i = 0;
    while(i < n){
        v_count = 0;
        while(i < n && s[i] == 'v'){
            v_count++;
            i++;
        }
        wb += convert_v_to_w(v_count);
        if(i < n && s[i] == 'o'){
            wBefore[i] = wb;
        }
        i++;
    }
    //counting ws after each o
    i = n - 1;
    while(i >= 0){
        v_count = 0;
        while(i >= 0 && s[i] == 'v'){
            v_count++;
            i--;
        }
        wa += convert_v_to_w(v_count);
        if(i >= 0 && s[i] == 'o'){
            wAfter[i] = wa;
        }
        i--;
    }
    //evaluating answer by multiplying ws before and after each o
    //(long long, because the sum can overflow a 32-bit int on long inputs)
    long long ans = 0;
    for(int i = 0; i < n; i++){
        if(s[i] == 'o') ans += wBefore[i] * wAfter[i];
    }
    cout<<ans<<endl;
}
output: 100
complexity: O(n) time and space
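As a cross-check on the two inputs from the question, here is a minimal Python sketch of the same counting idea. The name wow_factor and the prefix-count bookkeeping are assumptions made for illustration (the C++ above keeps separate before/after arrays instead); the underlying rule is unchanged: each o contributes (ws to its left) times (ws to its right).

```python
def wow_factor(s):
    """Count subsequences forming "wow", where each w is a pair of adjacent v's."""
    n = len(s)
    # before[i] = number of adjacent "vv" pairs that end before index i
    before = [0] * (n + 1)
    for i in range(n):
        is_pair_end = i > 0 and s[i - 1] == "v" and s[i] == "v"
        before[i + 1] = before[i] + (1 if is_pair_end else 0)
    total = before[n]
    # no pair can end on an 'o', so the pairs to its right are total - before[i]
    return sum(before[i] * (total - before[i])
               for i, ch in enumerate(s) if ch == "o")


input1 = "vvovooovovvovoovoovvvvovo"
print(wow_factor(input1))          # 50
print(wow_factor(input1 + "vv"))   # 75
```

This also speaks to the doubling doubt in the question: appending "vv" adds exactly one w to the right of every o, so the total grows by the sum of the left-hand counts (25 for input1), giving 50 + 25 = 75 rather than 100.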

Cost constant of pseudocode

I understand where the cost comes from for lines 1-4, 6, and 9-10. But why does line 5 cost 10 and line 7 cost 6?
Max-Heap-Increase-Key(A[1...n]: array of number, i: 1<= i <= n, key)
1  if key < A[i]
2      then print " "
3  A[i] = key
4  parent = floor(i/2)
5  while i > 1 and A[parent] < A[i]
6      temp = A[i]
7      A[i] = A[parent]
8      A[parent] = temp
9      i = parent
10     parent = floor(i/2)
The constant costs for a single execution of each line are as follows:
1) 5,
2) 1,
3) 4,
4) 4,
5) 10,
6) 4,
7) 6,
8) 4,
9) 2,
10) 4
Count cost 1 for: reading a variable, writing to variable, using an array index to locate memory location, reading or writing to array index, arithmetic op, comparison (where <= or >= counts twice) and a boolean operation.
Let's look at line 5:
while i > 1 and A[parent] < A[i]
According to the rules for what we should count:
Reading a variable: i is read twice, A twice, and parent once, so there are five read operations.
Reading from an array: Twice from the array A.
Comparison: One > and one <, so there are two comparisons.
Boolean operation: One and.
Total cost is 10.
And line 7:
A[i] = A[parent]
According to the rules:
Reading a variable: A is read twice, i once, and parent once.
Reading from an array: Once, on the right-hand side.
Writing to an array: Once, on the left-hand side.
So the total cost is 6.
It remains uncertain what "using an array index to locate memory location" is supposed to mean if this is different to "reading or writing to array index". Perhaps this should be counted instead of loading the variable A? That would be strange, but it is also strange to describe it as a separate cost from reading/writing to an array.
Generally speaking, a variable like A holds a pointer to an array, so accessing an array like A[i] requires loading that pointer, then loading the index variable, and then doing the read or the write. The read or write operation consumes the pointer and index loaded in the previous two operations.
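The tallies above can be written out as a small Python sketch. The dictionary layout and the helper name line_cost are purely illustrative; the counts follow the counting rules quoted in the question, with each variable read, array access, comparison and boolean operation costing 1. Lines 6 and 9 are included only as a sanity check that the same rules reproduce their listed costs.

```python
# Each tally maps a kind of unit-cost operation to how often it occurs on the line.
TALLIES = {
    5: {  # while i > 1 and A[parent] < A[i]
        "variable reads (i, i, A, A, parent)": 5,
        "array reads (A[parent], A[i])": 2,
        "comparisons (>, <)": 2,
        "boolean operations (and)": 1,
    },
    6: {  # temp = A[i]
        "variable reads (A, i)": 2,
        "array reads (A[i])": 1,
        "variable writes (temp)": 1,
    },
    7: {  # A[i] = A[parent]
        "variable reads (A, A, i, parent)": 4,
        "array reads (A[parent])": 1,
        "array writes (A[i])": 1,
    },
    9: {  # i = parent
        "variable reads (parent)": 1,
        "variable writes (i)": 1,
    },
}


def line_cost(line_number):
    """Sum the unit costs recorded for one pseudocode line."""
    return sum(TALLIES[line_number].values())


for line_number in sorted(TALLIES):
    print(line_number, line_cost(line_number))
```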

Represent a word with an alphabet

This is an interview question:
Imagine an alphabet of words. Example:
a ==> 1
b ==> 2
c ==> 3
.
z ==> 26
ab ==> 27
ac ==> 28
.
az ==> 51
bc ==> 52
and so on.
Such that the sequence of characters need to be in ascending order only (ab is valid but ba is not). Given any word print its index if valid and 0 if not.
Input Output
ab 27
ba 0
aez 441
Note: Brute-force is not allowed. Here is the link to the question: http://www.careercup.com/question?id=21117662
I can understand that solution as:
The total number of words is 2^26 - 1.
For a given word, shorter words occur first.
Let n be the length of the word.
The total number of words with size less than n is C(26, 1) + C(26, 2) + ... + C(26, n - 1).
Then calculate how many words of the same size come before the given word.
The sum of the two numbers plus one is the result.
Reference: sites.google.com/site/spaceofjameschen/annnocements/printtheindexofawordwithlettersinascendingorder--microsoft
In the sample solution, I understood how the author calculated number of words with size less than word.size(). But, in the code, I am not too sure about how to find number of words of the same size as word.size() that occur before 'word'.
Precisely, this bit:
char desirableStart;
i = 0;
while( i < str.size()){
    desirableStart = (i == 0) ? 'a' : str[i - 1] + 1;
    for(int j = desirableStart; j < str[i]; ++j){
        index += NChooseK('z' - j, str.size() - i - 1); // Choose str.size() - i - 1 in the available charset
    }
    i ++;
}
Can someone help me understand this bit? Thanks.
First of all (you probably got this part, but for completeness' sake), the NChooseK function calculates the binomial coefficient, i.e. the number of ways to choose k elements from a set of n elements. This function is referred to as C(n, k) in your comments, so I will use the same notation.
Since the letters are sorted and not repeating, this is exactly the number of ways one can create the n-letter words described in the problem, so this is why the first part of the function is getting you at the right position:
int index = 0;
int i = 1;
while(i < str.size()) {
    // choose *i* letters out of 26
    index += NChooseK(26, i);
    i++;
}
For example, if your input was aez, this would get the index of the word yz, which is the last possible 2-letter combination: C(26, 1) + C(26, 2) = 351.
At this point, you have the initial index of your n-letter word, and need to see how many combinations of n-letter words you need to skip to get to the end of the word. To do this, you have to check each individual letter and count all possible combinations of letters starting with one letter after the previous one (the desirableStart variable in your code), and ending with the letter being examined.
For example, for aez you would proceed as follows:
Get the last 2-letter word index (yz).
Increase index by one (this is actually done at the end of your code, but it makes more sense to do it here to keep the correct positions): now we are at index of abc.
First letter is a, no need to increase. You are still at abc.
Second letter is e, count combinations for 2nd letter from b to e. This will get you to aef (note that f is the first valid 3rd character in this example, and desirableStart takes care of that).
Third letter is z, count combinations for 3rd letter, from f to z. This will get you to aez.
That's what the last part of your code does:
// get to str.size() initial index (moved this line up)
index++;
i = 0;
while (i < str.size()) {
    // if previous letter was `e`, we need to start with `f`
    desirableStart = (i == 0) ? 'a' : str[i - 1] + 1;
    // add all combinations until you get to the current letter
    for (int j = desirableStart; j < str[i]; ++j) {
        int validLettersRemaining = 'z' - j; // a count, so int rather than char
        int numberOfLettersToChoose = str.size() - i - 1;
        index += NChooseK(validLettersRemaining, numberOfLettersToChoose);
    }
    i++;
}
return index;
There is no difference between the computation of the number of words of the same size and its counterpart for shorter words.
You may be led astray by array indexing in C, which starts at 0. Although i < str.size() might suggest otherwise, the last iteration of this loop actually counts words of the same size as the word whose index is being computed.
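Putting the two parts together, the whole computation might look like the following Python sketch. The function name word_index and the validity check are assumptions added for illustration (the quoted C++ fragment does not show how invalid words are rejected); math.comb plays the role of NChooseK.

```python
from math import comb


def word_index(word):
    """Return the 1-based index of word, or 0 if it is not strictly ascending a-z."""
    if not word or any(not ("a" <= c <= "z") for c in word):
        return 0
    if any(a >= b for a, b in zip(word, word[1:])):
        return 0  # letters must be strictly ascending, e.g. "ba" is invalid
    n = len(word)
    # every shorter word comes first: C(26, 1) + ... + C(26, n - 1)
    index = sum(comb(26, k) for k in range(1, n))
    for i, c in enumerate(word):
        # the smallest letter allowed at position i
        start = ord("a") if i == 0 else ord(word[i - 1]) + 1
        for j in range(start, ord(c)):
            # fix letter j at position i, then choose the remaining
            # n - i - 1 letters from the letters after j
            index += comb(ord("z") - j, n - i - 1)
    return index + 1


for w in ("ab", "ba", "aez"):
    print(w, word_index(w))
```

For the examples in the question this prints 27, 0 and 441.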

return index of sequence of repeating numbers in array

Given an array:
array = [16 16 16 22 23 23 23 25 52 52 52]
I want to return a list of indices that point to the elements of runs of three repeating numbers.
In this case that would be:
indices = find_sequence(nbr_repeats = 3)
print indices
[0 1 2 4 5 6 8 9 10]
What is the fastest and most elegant algorithm to implement find_sequence?
The simplest way I know of: keep track of the first place you saw a number. Keep going until you find a different number; then, if the sequence is long enough, add all the indices from the start of the sequence to just before the end.
(Of course, you'll have to check the sequence length after you're done checking elements, too. I did it by iterating one past the end and just skipping the element check on the last iteration.)
To find_repeats (input : list, minimum : integer):
    start := 0
    result := []
    for each x from 0 to (input length):
        ' "*or*" here is a short-circuit or
        ' so we don't go checking an element that doesn't exist
        if x == (input length) *or* input[x] != input[start]:
            if (x - start) >= minimum:
                append [start...(x - 1)] to result
            start := x
    return result
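The pseudocode above might be rendered in Python roughly as follows. This is a sketch, with the parameter names values and minimum chosen for illustration; note that a run longer than minimum is reported in full, not just its first minimum elements.

```python
def find_repeats(values, minimum):
    """Return the indices of every run of equal elements at least minimum long."""
    result = []
    start = 0  # index where the current run began
    # iterate one past the end so the final run is flushed too
    for x in range(len(values) + 1):
        # short-circuit "or": values[x] is never read when x is past the end
        if x == len(values) or values[x] != values[start]:
            if x - start >= minimum:
                result.extend(range(start, x))
            start = x
    return result


array = [16, 16, 16, 22, 23, 23, 23, 25, 52, 52, 52]
print(find_repeats(array, 3))  # [0, 1, 2, 4, 5, 6, 8, 9, 10]
```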
Based on OP's assumption:
the list is sorted
the largest frequency is nbr_repeats
This might work:
def find_sequence(nbr_repeats, l):
    res = []
    current = -1
    count = 0
    idx = 0
    for i in l:
        if i == current:
            count += 1
            if count == nbr_repeats:
                for k in reversed(range(nbr_repeats)):
                    res.append(idx-k)
        else:
            current = i
            count = 1
        idx += 1
    return res
This looks to me like a special case of the Boyer-Moore string search algorithm, and since any language you use will contain optimisations for string search, perhaps the most elegant answer is to treat your data as a character array (i.e. a string) and use your language's built-in string search functions. Note that this only works if your numbers fit into your language's supported character set (e.g. no numbers above 127 in ASCII).
Since you did not specify a language, here is some pseudocode:
find_sequence(array: array of int, nbr_repeats: int) : array of int
    retVal = empty array of int   // the returned array
    last = empty array of int     // current run of equal elements
    i = -1
    for each element e in array
        ++i
        if (isempty(last) or e == first(last))
            add(last, e)          // the run starts or continues
        else
            if (size(last) >= nbr_repeats)
                for (j = size(last); j > 0; --j)
                    add(retVal, i-j)   // indices of the run that just ended
            last = [e]            // new element starts a new run
    if (size(last) >= nbr_repeats)
        for (j = size(last) - 1; j >= 0; --j)
            add(retVal, i-j)      // handle end of array properly
    return retVal
Edit: removed comment about sorting as it would mangle the original indices.
Note: you could also just keep the last element and its seen-count instead of maintaining a list of last same elements

Algorithm for series

Given the series A, B, C, ..., Z, AA, AB, ..., AZ, BA, BB, ..., ZZ, AAA, ..., write a function that takes an integer n and returns the string representation. Can somebody tell me the algorithm to find the nth value in the series?
Treat those strings as numbers in base 26 with A=0. It's not quite an exact translation because in real base 26 A=AA=AAA=0, so you have to make some adjustments as necessary.
Here's a Java implementation:
static String convert(int n) {
    int digits = 1;
    for (int j = 26; j <= n; j *= 26) {
        digits++;
        n -= j;
    }
    String s = "";
    for (; digits --> 0 ;) {
        s = (char) ('A' + (n % 26)) + s;
        n /= 26;
    }
    return s;
}
This converts 0=A, 26=AA, 702=AAA as required.
Without giving away too much (since this question seems to be a homework problem), what you're doing is close to the same as translating that integer n into base 26. Good luck!
If, as others suspect, this is homework, then this answer probably won't be much help. If this is for a real-world project though, it might make sense to write a generator instead, which is an easy and idiomatic thing to do in some languages, such as Python. Something like this:
def letterPattern():
    pattern = [0]
    while True:
        yield pattern
        pattern[0] += 1
        # iterate through all numbers in the list *except* the last one
        for i in range(0,len(pattern)-1):
            if pattern[i] == 26:
                pattern[i] = 0
                pattern[i+1] += 1
        # now if the last number is 26, set it to zero, and append another zero to the end
        if pattern[-1] == 26:
            pattern[-1] = 0
            pattern.append(0)
Except instead of yielding pattern itself you would reverse it, and map 0 to A, 1 to B, etc. then yield the string. I've run the code above and it seems to work, but I haven't tested it extensively at all.
I hope you'll find this readable enough to implement, even if you don't know Python. (For the Pythonistas out there, yes the "for i in range(...)" loop is ugly and unpythonic, but off the top of my head, I don't know any other way to do what I'm doing here)
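For completeness, here is a sketch of the variant described above, where the generator reverses the pattern and maps 0 to A, 1 to B, and so on before yielding. The name letter_strings is made up for this example; the carry logic is the same as in letterPattern.

```python
from itertools import islice


def letter_strings():
    """Yield "A", "B", ..., "Z", "AA", "AB", ... indefinitely."""
    pattern = [0]  # digits in base 26, least significant letter first
    while True:
        # reverse so the most significant letter comes first, then map 0 -> A
        yield "".join(chr(ord("A") + d) for d in reversed(pattern))
        pattern[0] += 1
        # propagate carries through every position except the last
        for i in range(len(pattern) - 1):
            if pattern[i] == 26:
                pattern[i] = 0
                pattern[i + 1] += 1
        # a carry out of the last position adds a new leading letter
        if pattern[-1] == 26:
            pattern[-1] = 0
            pattern.append(0)


print(list(islice(letter_strings(), 25, 28)))  # ['Z', 'AA', 'AB']
```

Counting from zero, the item at position 702 is AAA, which agrees with the Java convert function shown earlier.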
