I was looking through the selection sort algorithm on cprogramming.com
and I think I found an error in the implementation.
If you work through the algorithm, there's a variable called "index_of_min" which I believe should be "index_of_max" (since when I tested it, it was sorting largest to smallest).
Thinking that it was a typo or a minor mistake, I checked out some other websites like Wikipedia and some lesser-known websites like Geekpedia. It seems like they all call it index_of_min.
When I ran it through the debugger, it really seemed to me that it's the max value's index. Am I making a mistake somewhere?
Edit: As Svante pointed out, only the cprogramming.com implementation is wrong. Wikipedia and Geekpedia are fine.
The Wikipedia and Geekpedia sites seem to be correct; the cprogramming.com implementation actually has a bug. This:
if (array[index_of_min] < array[y])
{ index_of_min = y; }
has the order reversed, it should be:
if (array[y] < array[index_of_min])
{ index_of_min = y; }
Another fix would be to call the variable index_of_max, but I would expect a sorting algorithm to sort smallest to largest, and if this expectation is shared by the majority of programmers (as I presume), the principle of least astonishment rather demands the above fix.
I've only just read the code, but it looks like you're right: either index_of_min is misnamed or the comparison is backwards.
It isn't as strange as it might seem to see this error in several places. It's quite likely that each is copied from a single common source.
From Cprogramming.com: "It works by selecting the smallest (or largest, if you want to sort from big to small) element of the array and placing it at the head of the array." So they have it sorting from large to small; the code isn't wrong, nor is the variable naming. index_of_min keeps track of the starting point in the array (0) and then moves forward in that array, i.e. index_of_min keeps the smallest index value. Do not get it confused with whatever the value is at that index.
You are right. The code from that website (shown below) is incorrect.
for(int x = 0; x < n; x++)
{
    int index_of_min = x;
    for(int y = x; y < n; y++)
    {
        if(array[index_of_min] < array[y]) /* Here's the problem */
        {
            index_of_min = y;
        }
    }
    int temp = array[x];
    array[x] = array[index_of_min];
    array[index_of_min] = temp;
}
At the end of the inner loop, for(int y=x; y<n; y++), the variable, index_of_min, holds the index of the maximum value. Assuming it was designed to sort the array from largest to smallest, this is a poorly named variable.
If you want the array sorted smallest to largest (as one would expect), you need to reverse the if statement:
if (array[y] < array[index_of_min])
{
    index_of_min = y;
}
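For reference, here is a minimal, self-contained sketch in Java of selection sort with the comparison the right way around (sorting smallest to largest). The class and method names are made up for illustration:

public class SelectionSortSketch {
    public static void selectionSort(int[] array) {
        int n = array.length;
        for (int x = 0; x < n; x++) {
            int indexOfMin = x;
            for (int y = x + 1; y < n; y++) {
                // keep the index of the smallest remaining element
                if (array[y] < array[indexOfMin]) {
                    indexOfMin = y;
                }
            }
            int temp = array[x];
            array[x] = array[indexOfMin];
            array[indexOfMin] = temp;
        }
    }

    public static void main(String[] args) {
        int[] a = { 5, 3, 8, 1, 9, 2 };
        selectionSort(a);
        System.out.println(java.util.Arrays.toString(a)); // [1, 2, 3, 5, 8, 9]
    }
}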
I have code that searches a sorted array and returns the index of the first occurrence of k.
I am wondering whether it's possible to write this code using
while(left<right)
instead of
while(left<=right)
Here is the full code:
public static int searchFirstOfK(List<Integer> A, int k) {
    int left = 0, right = A.size() - 1, result = -1;
    // A.subList(left, right + 1) is the candidate set.
    while (left <= right) {
        int mid = left + ((right - left) / 2);
        if (A.get(mid) > k) {
            right = mid - 1;
        } else if (A.get(mid) == k) {
            result = mid;
            // Nothing to the right of mid can be the first occurrence of k.
            right = mid - 1;
        } else { // A.get(mid) < k
            left = mid + 1;
        }
    }
    return result;
}
How do I know when to use "left is less than or equal to right" and when to use just "left is less than right"?
Building on this answer to another binary search question: How can I simplify this working Binary Search code in C?
If you want to find the position of the first occurrence, you can't stop when you find a matching element. Your search should look like this (of course this assumes that the list is sorted):
int findFirst(List<Integer> list, int valueToFind)
{
    int pos = 0;
    int limit = list.size();
    while (pos < limit)
    {
        int testpos = pos + ((limit - pos) >> 1);
        if (list.get(testpos) < valueToFind)
            pos = testpos + 1;
        else
            limit = testpos;
    }
    if (pos < list.size() && list.get(pos) == valueToFind)
        return pos;
    else
        return -1;
}
Note that we only need to do one comparison per iteration. The binary search finds the unique position where all the preceding elements are less than valueToFind and all the following elements are greater or equal, and then it checks to see if the value you're looking for is actually there.
The linked answer highlights several advantages of writing a binary search this way.
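For example, assuming findFirst is in scope, a quick check might look like this (the list values are just an example):

List<Integer> a = java.util.Arrays.asList(1, 2, 2, 2, 5, 7);
System.out.println(findFirst(a, 2)); // prints 1, the index of the first 2
System.out.println(findFirst(a, 4)); // prints -1, since 4 is not present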
Simply put: no.
Consider the case of an array having only one element, i.e. {0}, where the element to be searched for is 0 as well.
In this case, left == right, but if your condition is while(left<right), then searchFirstOfK will return -1.
This answer is in the context of the posted code. If we are talking about alternatives that let us use while(left<right), then Matt Timmermans's answer is correct and is an even better approach.
(Benchmark omitted: a chart comparing the OP's approach, "normal binary search", with Matt Timmermans's approach, "optimized binary search", for a list containing values between 0 and 5000000, accompanied the original answer.)
This is an extremely interesting question. There is a way to make your binary search right every time: the key is determining the correct ranges and avoiding getting stuck on a single element.
while (left + 1 < right)
{
    m = (left + right) / 2;
    if (check condition is true)
        left = m;
    else
        right = m;
}
The only key thing to remember is the invariant: left is always the largest index satisfying the condition so far, and right is always the smallest index not satisfying it. Maintain that and you won't get stuck. Once you understand dividing the range this way, you will never fail at binary search.
With the initialization chosen as above, the loop ends with left on the largest condition-satisfying element.
By changing the initialization (and the condition) you can get other variants, such as the smallest condition-satisfying element.
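To make that concrete for the original question, here is a hedged sketch (in Java, with a made-up method name) of the same idea applied to finding the first occurrence of k. left always points at a value less than k (index -1 acts as a virtual minus infinity) and right always points at a value greater than or equal to k (index A.size() acts as a virtual plus infinity):

public static int searchFirstOfKInvariant(List<Integer> A, int k) {
    int left = -1, right = A.size();
    while (left + 1 < right) {
        int mid = left + ((right - left) / 2); // strictly between left and right
        if (A.get(mid) < k) {
            left = mid;  // mid still satisfies "less than k"
        } else {
            right = mid; // mid is >= k
        }
    }
    // right is now the smallest index whose value is >= k
    return (right < A.size() && A.get(right) == k) ? right : -1;
}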
I have been going through Code Jam archives. I am really struggling at the solution of The Price Is Wrong of Code Jam 2008
The problem statement is -
You're playing a game in which you try to guess the correct retail price of various products for sale. After guessing the price of each product in a list, you are shown the same list of products sorted by their actual prices, from least to most expensive. (No two products cost the same amount.) Based on this ordering, you are given a single chance to change one or more of your guesses.
Your program should output the smallest set of products such that, if you change your prices for those products, the ordering of your guesses will be consistent with the correct ordering of the product list. The products in the returned set should be listed in alphabetical order. If there are multiple smallest sets, output the set which occurs first lexicographically.
For example, assume these are your initial guesses:
code = $20
jam = $15
foo = $40
bar = $30
google = $60
If the correct ordering is code jam foo bar google, then you would need to change two of your prices in order to match the correct ordering. You might change one guess to read jam = $30 and another guess to read bar = $50, which would match the correct ordering and produce the output set bar jam. However, the output set bar code comes before bar jam lexicographically, and you can match the correct ordering by changing your guesses for these items as well.
Example
Input
code jam foo bar google
20 15 40 30 60
Output
Case #1: bar code
I am not asking for the exact solution, just for how I should proceed with the problem.
Thanks in advance.
Okay after struggling a bit, I got both small & large cases accepted.
Before posting my ugly ugly code, here is a brief explanation:
First, based on the problem statement and the limits of the parameters, it is intuitive to think that the core of the problem is simply finding a Longest Increasing Subsequence (LIS). It does rely on experience to see that quickly, though (as with most problems in competitive programming).
Think of it like this: if I can find the set of items whose prices form a LIS, then the items left over are the smallest set you need to change.
But you need to fulfil one more requirement, which I think is the hardest part of this problem: when there exist multiple smallest sets, you have to find the lexicographically smallest one. That is the same as saying: find the LIS with the lexicographically largest names (then we throw those away, and the items left are the answer).
To do this, there are many ways, but as the limits are so small (N <= 64), you can use basically whatever algorithm (O(N^4)? O(N^5)? Go ahead!)
My accepted method is to add a stupid twist into the traditional O(N^2) dynamic programming for LIS:
Let DP(i) be the length of the LIS within positions [0..i] of the price sequence, with position i required to be chosen.
Also use an array of set<string> to store the optimal set of item names that achieves DP(i); we update this array together with the dynamic programming that computes DP(i).
Then, after the dynamic programming, simply find the lexicographically largest set of item names among those of maximum length and exclude it from the original item set. The items left are the answer.
Here is my accepted ugly ugly code in C++14; most of the lines handle the troublesome I/O. Please tell me if anything is unclear, and I can provide a few examples to elaborate.
#include<bits/stdc++.h>
using namespace std;

int T, n, a[70], dp[70], mx = 0;
vector<string> name;
set<string> ans, dp2[70];
string s;
char c;

bool compSet(set<string> st1, set<string> st2) {
    if (st1.size() != st2.size()) return true;
    auto it1 = st1.begin();
    auto it2 = st2.begin();
    for (; it1 != st1.end(); it1++, it2++)
        if ((*it1) > (*it2)) return true;
        else if ((*it1) < (*it2)) return false;
    return false;
}

int main() {
    cin >> T;
    getchar();
    for (int qwe = 1; qwe <= T; qwe++) {
        mx = n = 0; s = ""; ans.clear(); name.clear();
        while (c = getchar(), c != '\n') {
            if (c == ' ') n++, name.push_back(s), ans.insert(s), s = "";
            else s += c;
        }
        name.push_back(s); ans.insert(s); s = ""; n++;
        for (int i = 0; i < n; i++) cin >> a[i];
        getchar();
        for (int i = 0; i < n; i++)
            dp[i] = 1, dp2[i].clear(), dp2[i].insert(name[i]);
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < i; j++) {
                if (a[j] < a[i] && dp[j] + 1 >= dp[i]) {
                    dp[i] = dp[j] + 1;
                    set<string> tmp = dp2[j];
                    tmp.insert(name[i]);
                    if (compSet(tmp, dp2[i])) dp2[i] = tmp;
                }
            }
            mx = max(mx, dp[i]);
        }
        set<string> tmp;
        for (int i = 0; i < n; i++) {
            if (dp[i] == mx) if (compSet(dp2[i], tmp)) tmp = dp2[i];
        }
        for (auto x : tmp)
            ans.erase(x);
        printf("Case #%d: ", qwe);
        for (auto it = ans.begin(); it != ans.end(); ) {
            cout << *it;
            if (++it != ans.end()) cout << ' ';
            else cout << '\n';
        }
    }
    return 0;
}
Well, based on the problem you have specified, suppose I tell you that you don't need to give me the order or the names of the products; rather, you just need to tell me -
the number of product values that will change.
What would your answer be?
Basically the problem has then been reduced to the following statement -
You are given a list of numbers and you want to make some changes to the list so that the numbers end up in increasing order, while keeping the number of individual elements you change to a minimum.
How would you solve this?
If you find the Longest Increasing Subsequence (LIS) in the list of numbers you have, then you just need to subtract its length from the length of the list.
Why, you ask?
Well, because if you want the number of changes made to the list to be minimum, then leaving the longest increasing subsequence as it is and changing only the other values will definitely give the optimal answer.
Let's take an example -
We have - 2 10 4 6 8
How many changes would be made to this list?
The longest increasing subsequence (2 4 6 8) has length 4.
So if we leave those 4 values as they are and change the remaining ones, we would only have to change 5 (list length) - 4 = 1 value.
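As a hedged sketch of just that counting step (a standard O(N^2) LIS in Java; the method name is made up):

static int minChanges(int[] prices) {
    int n = prices.length;
    int[] lis = new int[n]; // lis[i] = length of the LIS ending at position i
    int best = 0;
    for (int i = 0; i < n; i++) {
        lis[i] = 1;
        for (int j = 0; j < i; j++) {
            if (prices[j] < prices[i] && lis[j] + 1 > lis[i]) {
                lis[i] = lis[j] + 1;
            }
        }
        best = Math.max(best, lis[i]);
    }
    return n - best; // minimal number of guesses to change
}

For the list above, minChanges(new int[] { 2, 10, 4, 6, 8 }) returns 1.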
Now addressing your original problem, you need to print the product names. Well if you exclude the elements present in the LIS you should get your answer.
But wait!
What happens when you have many subsequences with the same LIS length? How will you choose the lexicographically smallest answer?
Well, why don't you think about that in terms of the LIS itself? That should be enough to get you started, right?
I have an assignment to create an algorithm to find duplicates in an array which includes number values, but it does not say which kind of numbers, integers or floats. I have written the following pseudocode:
FindingDuplicateAlgorithm(A) // A is the array
    mergeSort(A);
    for int i <- 0 to i < A.length
        if A[i] == A[i+1]
            i++
            return A[i]
        else
            i++
Have I created an efficient algorithm?
I think there is a problem in my algorithm: it returns duplicate numbers several times. For example, if the array contains 2 at two indexes, I will have ...2, 2,... in the output. How can I change it to return each duplicate only once?
I think it is a good algorithm for integers, but does it also work well for floating-point numbers?
To handle duplicates, you can do the following:
if A[i] == A[i+1]:
    result.append(A[i])      # collect found duplicates in a list
    while A[i] == A[i+1]:    # skip the entire range of duplicates
        i++                  # until a new value is found
Do you want to find Duplicates in Java?
You may use a HashSet.
HashSet h = new HashSet();
for (Object a : A) {
    boolean b = h.add(a);
    boolean duplicate = !b;
    if (duplicate) {
        // do something with a
    }
}
The return value of add() is defined as: true if the set did not already contain the specified element.
EDIT:
I know HashSet is optimized for inserts and contains operations, but I'm not sure if it's fast enough for your concerns.
EDIT2:
I've seen that you recently added the homework tag. I would not prefer my answer if it's homework, because it may be too "high-level" for an algorithms lesson.
http://download.oracle.com/javase/1.4.2/docs/api/java/util/HashSet.html#add%28java.lang.Object%29
Your answer seems pretty good. First sorting and then simply checking neighboring values gives you O(n log(n)) complexity, which is quite efficient.
Merge sort is O(n log(n)) while checking neighboring values is simply O(n).
One thing though (as mentioned in one of the comments) you are going to get a stack overflow (lol) with your pseudocode. The inner loop should be (in Java):
for (int i = 0; i < array.length - 1; i++) {
...
}
Then also, if you actually want to display which numbers (and or indexes) are the duplicates, you will need to store them in a separate list.
I'm not sure what language you need to write the algorithm in, but there are some really good C++ solutions in response to my question here. Should be of use to you.
O(n) algorithm: traverse the array and try to insert each element into a hashtable/set, with the number as the hash key. If you cannot insert it, then it's a duplicate.
Your algorithm contains a buffer overrun. i starts with 0, so I assume the indexes into array A are zero-based, i.e. the first element is A[0], the last is A[A.length-1]. Now i counts up to A.length-1, and in the loop body accesses A[i+1], which is out of the array for the last iteration. Or, simply put: If you're comparing each element with the next element, you can only do length-1 comparisons.
If you only want to report each duplicate once, I'd use a bool variable firstDuplicate that is set to false when you find a duplicate and back to true when the number differs from the next one. Then you report a duplicate number only when firstDuplicate is true, so each run of equal values is reported just once.
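A hedged Java sketch of that idea (the names are made up), again assuming the array has already been sorted:

static void printEachDuplicateOnce(int[] sorted) {
    boolean firstDuplicate = true;
    for (int i = 0; i < sorted.length - 1; i++) {
        if (sorted[i] == sorted[i + 1]) {
            if (firstDuplicate) {
                System.out.print(sorted[i] + " "); // report this run of equal values once
                firstDuplicate = false;
            }
        } else {
            firstDuplicate = true; // a new value starts here
        }
    }
}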
// Prints each duplicate value using the "negation" trick: every value is
// assumed to lie in [0, inputArray.length), so it can be used as an index,
// and the sign of the entry at that index records whether the value was seen.
public void printDuplicates(int[] inputArray) {
    if (inputArray == null) {
        throw new IllegalArgumentException("Input array can not be null");
    }
    int length = inputArray.length;
    if (length == 1) {
        return; // a single element cannot be a duplicate
    }
    for (int i = 0; i < length; i++) {
        if (inputArray[Math.abs(inputArray[i])] >= 0) {
            // first time this value is seen: flip the sign at its index
            inputArray[Math.abs(inputArray[i])] = -inputArray[Math.abs(inputArray[i])];
        } else {
            // the sign was already flipped, so this value is a duplicate
            System.out.print(Math.abs(inputArray[i]) + " ");
        }
    }
}
Let's say you have a piece of code where you have a for-loop, followed by another for-loop, and so on. Now, which one is preferable?
Give every counter variable the same name:
for (int i = 0; i < someBound; i++) {
    doSomething();
}

for (int i = 0; i < anotherBound; i++) {
    doSomethingElse();
}
Give them different names:
for (int i = 0; i < someBound; i++) {
    doSomething();
}

for (int j = 0; j < anotherBound; j++) {
    doSomethingElse();
}
I think the second one would be somewhat more readable; on the other hand, I'd use j, k and so on to name inner loops... What do you think?
I reuse the variable name in this case. The reason being that i is sort of international programmerese for "loop control variable whose name isn't really important". j is a bit less clear on that score, and once you have to start using k and beyond it gets kind of obscure.
One thing I should add is that when you use nested loops, you do have to go to j, k, and beyond. Of course if you have more than three nested loops, I'd highly suggest a bit of refactoring.
The first one is good for me, because that would allow you to use j, k in your inner loops, and because you are resetting i = 0 in the second loop there won't be any issues with the old value being used.
The way you wrote your loops, the counter is not supposed to be used outside the loop body. So there's nothing wrong with using the same variable name.
As for readability, i, j, k are commonly used as variable names for counters. So it is even better to use them rather than pick the next letter over and over again.
I find it interesting that so many people have different opinions on this. Personally I prefer the first method, if for no other reason than to keep j and k open. I could see why people would prefer the second one for readability, but I think any coder worth handing a project over to is going to be able to see what you're doing with the first situation.
The variable should be named something related to the operation or the boundary condition.
For example:
'indexOfPeople',
'activeConnections', or
'fileCount'.
If you are going to use 'i', 'j', and 'k', then reserve 'j' and 'k' for nested loops.
void doSomethingInALoop() {
    for (int i = 0; i < someBound; i++) {
        doSomething();
    }
}

void doSomethingElseInALoop() {
    for (int i = 0; i < anotherBound; i++) {
        doSomethingElse();
    }
}
If the loops are doing the same things (the loop control -- not the loop body, i.e. they are looping over the same array or same range), then I'd use the same variable.
If they are doing different things -- a different array, or whatever -- then I'd use different variables.
So, on the one-hundredth loop, you'd name the variable "zzz"?
The question is really irrelevant since the variable is defined local to the for-loop. Some flavors of C, such as on OpenVMS, require using different names. Otherwise, it amounts to programmer's preference, unless the compiler restricts it.
Say I have y distinct values and I want to select x of them at random. What's an efficient algorithm for doing this? I could just call rand() x times, but the performance would be poor if x, y were large.
Note that combinations are needed here: each value should have the same probability to be selected but their order in the result is not important. Sure, any algorithm generating permutations would qualify, but I wonder if it's possible to do this more efficiently without the random order requirement.
The question "How do you efficiently generate a list of K non-repeating integers between 0 and an upper bound N" covers this case for permutations.
Robert Floyd invented a sampling algorithm for just such situations. It's generally superior to shuffling then grabbing the first x elements since it doesn't require O(y) storage. As originally written it assumes values from 1..N, but it's trivial to produce 0..N and/or use non-contiguous values by simply treating the values it produces as subscripts into a vector/array/whatever.
In pseudocode, the algorithm runs like this (stealing from Jon Bentley's Programming Pearls column "A Sample of Brilliance").
initialize set S to empty
for J := N-M+1 to N do
    T := RandInt(1, J)
    if T is not in S then
        insert T in S
    else
        insert J in S
That last bit (inserting J if T is already in S) is the tricky part. The bottom line is that it assures the correct mathematical probability of inserting J so that it produces unbiased results.
It's O(x) [1] and O(1) with regard to y, with O(x) storage.
Note that, in accordance with the combinations tag in the question, the algorithm only guarantees equal probability of each element occurring in the result, not of their relative order in it.
[1] O(x^2) in the worst case for the hash map involved, which can be neglected since it's a virtually nonexistent pathological case where all the values have the same hash.
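A hedged Java sketch of Floyd's algorithm as described above, producing m distinct values from 0..n-1 (treat them as subscripts if your values aren't contiguous; the names are made up):

static java.util.Set<Integer> floydSample(java.util.Random rand, int n, int m) {
    java.util.Set<Integer> s = new java.util.HashSet<>();
    for (int j = n - m; j < n; j++) {
        int t = rand.nextInt(j + 1); // uniform in 0..j
        if (!s.add(t)) {             // t was already chosen...
            s.add(j);                // ...so insert j instead, which keeps the sample unbiased
        }
    }
    return s;
}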
Assuming that you want the order to be random too (or don't mind it being random), I would just use a truncated Fisher-Yates shuffle. Start the shuffle algorithm, but stop once you have selected the first x values, instead of "randomly selecting" all y of them.
Fisher-Yates works as follows:
select an element at random, and swap it with the element at the end of the array.
Recurse (or more likely iterate) on the remainder of the array, excluding the last element.
Steps after the first do not modify the last element of the array. Steps after the first two don't affect the last two elements. Steps after the first x don't affect the last x elements. So at that point you can stop - the top of the array contains uniformly randomly selected data. The bottom of the array contains somewhat randomized elements, but the permutation you get of them is not uniformly distributed.
Of course this means you've trashed the input array - if this means you'd need to take a copy of it before starting, and x is small compared with y, then copying the whole array is not very efficient. Do note though that if all you're going to use it for in future is further selections, then the fact that it's in somewhat-random order doesn't matter, you can just use it again. If you're doing the selection multiple times, therefore, you may be able to do only one copy at the start, and amortise the cost.
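A hedged Java sketch of the truncated shuffle (it trashes the input array, as described; it assumes 0 <= x <= a.length, and the names are made up):

static int[] sampleByPartialShuffle(java.util.Random rand, int[] a, int x) {
    int n = a.length;
    for (int i = n - 1; i >= n - x; i--) {
        int j = rand.nextInt(i + 1);             // pick one of the i+1 not-yet-fixed elements
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp; // move it to the current end
    }
    return java.util.Arrays.copyOfRange(a, n - x, n); // the last x slots are the sample
}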
If you really only need to generate combinations - where the order of elements does not matter - you may use combinadics as they are implemented e.g. here by James McCaffrey.
Contrast this with k-permutations, where the order of elements does matter.
In the first case (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), (3,2,1) are considered the same - in the latter, they are considered distinct, though they contain the same elements.
In case you need combinations, you may really only need to generate one random number (albeit it can be a bit large) - that can be used directly to find the m-th combination.
Since this random number represents the index of a particular combination, it follows that your random number should be between 0 and C(n,k).
Calculating combinadics might take some time as well.
It might just not be worth the trouble - besides, Jerry's and Federico's answers are certainly simpler than implementing combinadics.
However if you really only need a combination and you are bugged about generating the exact number of random bits that are needed and none more... ;-)
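If you do want to go that way, here is a hedged sketch in Java (not McCaffrey's implementation; just a plain lexicographic unranking with made-up names) that maps a single random rank in [0, C(n,k)) to the corresponding k-combination of {0, ..., n-1}. Note that the binomial values overflow a long for large n:

static long binomial(int n, int k) {
    if (k < 0 || k > n) return 0;
    long result = 1;
    for (int i = 1; i <= k; i++) {
        result = result * (n - k + i) / i; // stays integral at every step
    }
    return result;
}

static int[] unrankCombination(int n, int k, long rank) {
    int[] combo = new int[k];
    int value = 0;
    for (int pos = 0; pos < k; pos++) {
        while (true) {
            // number of combinations whose next chosen element is exactly "value"
            long block = binomial(n - value - 1, k - pos - 1);
            if (rank < block) break;
            rank -= block; // skip that whole block and try the next value
            value++;
        }
        combo[pos] = value++;
    }
    return combo; // e.g. unrankCombination(5, 3, 6) is {1, 2, 3}
}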
While it is not clear whether you want combinations or k-permutations, here is C# code for the latter (yes, we could generate only a complement if x > y/2, but then we would have been left with a combination that must be shuffled to get a real k-permutation):
static class TakeHelper
{
    public static IEnumerable<T> TakeRandom<T>(
        this IEnumerable<T> source, Random rng, int count)
    {
        T[] items = source.ToArray();
        count = count < items.Length ? count : items.Length;
        for (int i = items.Length - 1; count-- > 0; i--)
        {
            int p = rng.Next(i + 1);
            yield return items[p];
            items[p] = items[i];
        }
    }
}

class Program
{
    static void Main(string[] args)
    {
        Random rnd = new Random(Environment.TickCount);
        int[] numbers = new int[] { 1, 2, 3, 4, 5, 6, 7 };
        foreach (int number in numbers.TakeRandom(rnd, 3))
        {
            Console.WriteLine(number);
        }
    }
}
Here is another, more elaborate implementation that generates k-permutations, which I had lying around and which I believe is in a way an improvement over existing algorithms if you only need to iterate over the results. While it also needs to generate x random numbers, it only uses O(min(y/2, x)) memory in the process:
/// <summary>
/// Generates unique random numbers
/// <remarks>
/// Worst case memory usage is O(min((emax-imin)/2, num))
/// </remarks>
/// </summary>
/// <param name="random">Random source</param>
/// <param name="imin">Inclusive lower bound</param>
/// <param name="emax">Exclusive upper bound</param>
/// <param name="num">Number of integers to generate</param>
/// <returns>Sequence of unique random numbers</returns>
public static IEnumerable<int> UniqueRandoms(
    Random random, int imin, int emax, int num)
{
    int dictsize = num;
    long half = (emax - (long)imin + 1) / 2;
    if (half < dictsize)
        dictsize = (int)half;
    Dictionary<int, int> trans = new Dictionary<int, int>(dictsize);
    for (int i = 0; i < num; i++)
    {
        int current = imin + i;
        int r = random.Next(current, emax);
        int right;
        if (!trans.TryGetValue(r, out right))
        {
            right = r;
        }
        int left;
        if (trans.TryGetValue(current, out left))
        {
            trans.Remove(current);
        }
        else
        {
            left = current;
        }
        if (r > current)
        {
            trans[r] = left;
        }
        yield return right;
    }
}
The general idea is to do a Fisher-Yates shuffle and memorize the transpositions in the permutation.
It has not been published anywhere, nor has it received any peer review whatsoever. I believe it is a curiosity rather than something of practical value. Nonetheless, I am very open to criticism and would generally like to know if you find anything wrong with it - please consider this, and add a comment, before downvoting.
A little suggestion: if x >> y/2, it's probably better to select at random y - x elements, then choose the complementary set.
The trick is to use a variation of the shuffle, in other words a partial shuffle.
function random_pick( a, n )
{
    N = len(a);
    n = min(n, N);
    picked = array_fill(0, n, 0); backup = array_fill(0, n, 0);
    // partially shuffle the array, and generate unbiased selection simultaneously
    // this is a variation on fisher-yates-knuth shuffle
    for (i=0; i<n; i++) // O(n) times
    {
        selected = rand( 0, --N ); // unbiased sampling N * N-1 * N-2 * .. * N-n+1
        value = a[ selected ];
        a[ selected ] = a[ N ];
        a[ N ] = value;
        backup[ i ] = selected;
        picked[ i ] = value;
    }
    // restore partially shuffled input array from backup
    // optional step, if needed it can be ignored
    for (i=n-1; i>=0; i--) // O(n) times
    {
        selected = backup[ i ];
        value = a[ N ];
        a[ N ] = a[ selected ];
        a[ selected ] = value;
        N++;
    }
    return picked;
}
Note: the algorithm is strictly O(n) in both time and space, produces unbiased selections (it is a partial unbiased shuffle), and is non-destructive on the input array (it is restored from the backup), though that restore step is optional.
adapted from here
Update:
Another approach, using only a single call to the PRNG (pseudo-random number generator) in [0,1], is described by Ivan Stojmenovic in "On Random and Adaptive Parallel Generation of Combinatorial Objects" (section 3), with O(N) worst-case complexity.
Here is a simple way to do it which is only inefficient if Y is much larger than X.
void randomly_select_subset(
    int X, int Y,
    const int * inputs, int * outputs
) {
    int i, r;
    /* start with the first X values */
    for( i = 0; i < X; ++i ) outputs[i] = inputs[i];
    /* each later value replaces a random slot with probability X/(i+1) */
    for( i = X; i < Y; ++i ) {
        r = rand_inclusive( 0, i ); /* random index in [0, i] */
        if( r < X ) outputs[r] = inputs[i];
    }
}
Basically, copy the first X of your distinct values to your output array, and then, for each remaining value, randomly decide whether or not to include it; the same random number picks which element of the (mutable) output array it replaces. This is the classic reservoir-sampling idea: after processing the first i values, each of them sits in the output with equal probability X/i.
If, for example, you have 2^64 distinct values, you can use a symmetric-key algorithm with a 64-bit block (for example Blowfish) to quickly reshuffle all combinations.
for(i=0; i<x; i++)
e[i] = encrypt(key, i)
This is not random in the pure sense but can be useful for your purpose.
If you want to work with an arbitrary number of distinct values using cryptographic techniques, you can, but it's more complex.
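A hedged Java sketch of the idea (assuming the standard javax.crypto Blowfish provider and, say, a 16-byte key; the class and method names are made up). Because a block cipher is a permutation on its 64-bit block space, encrypting the counters 0..x-1 yields x distinct values:

import java.nio.ByteBuffer;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class CipherSampleSketch {
    // Encrypt 0..x-1 under a 64-bit-block cipher; the outputs are guaranteed distinct.
    static long[] sample(byte[] key, int x) throws Exception {
        Cipher cipher = Cipher.getInstance("Blowfish/ECB/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "Blowfish"));
        long[] out = new long[x];
        for (int i = 0; i < x; i++) {
            byte[] block = ByteBuffer.allocate(8).putLong(i).array();
            out[i] = ByteBuffer.wrap(cipher.doFinal(block)).getLong();
        }
        return out;
    }
}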