C/C++/Java/C#: help parsing numbers - algorithm

I've got a real problem (it's not homework, you can check my profile). I need to parse data whose formatting is not under my control.
The data look like this:
6,852:6,100,752
So there's first a number made of up to 9 digits, followed by a colon.
Then I know for sure that, after the colon:
there's at least one valid combination of numbers that add up to the number before the colon
I know exactly how many numbers add up to the number before the colon (two in this case, but it can go as high as ten numbers)
In this case, 6852 is 6100 + 752.
My problem: I need to find these numbers (in this example, 6100 + 752).
It is unfortunate that in the data I'm forced to parse, the separator between the numbers (the comma) is also the separator used inside the numbers themselves (6100 is written as 6,100).
Once again: that unfortunate formatting is not under my control and, once again, this is not homework.
I need to solve this for up to 10 numbers that need to add up.
Here's an example with three numbers adding up to 6855:
6,855:360,6,175,320
I fear that there are cases where there would be two possible different solutions. HOWEVER if I get a solution that works "in most cases" it would be enough.
How do you typically solve such a problem in a C-style bracket language?

Well, I would start with the brute force approach and then apply some heuristics to prune the search space. Just split the list on the right by commas and iterate over all possible ways to group them into n terms (where n is the number of terms in the solution). You can use the following two rules to skip over invalid possibilities.
(1) You know that any group of 1 or 2 digits must begin a term.
(2) You know that no candidate term in your comma delimited list can be greater than the total on the left. (This also tells you the maximum number of digit groups that any candidate term can have.)

Recursive implementation (pseudo code):
int total; // The total read before the colon
// Takes the list of tokens as integers after the colon
// tokens is the set of tokens left to analyse,
// partialList is the partial list of numbers built so far
// sum is the sum of numbers in partialList
// Aggregate takes 2 ints XXX and YYY and returns XXX,YYY (= XXX*1000+YYY)
function getNumbers(tokens, sum, partialList) =
    if isEmpty(tokens)
        if sum = total return partialList
        else return null // Got to the end with the wrong sum
    // Option 1: tokens[0] is a number on its own
    var result1 = getNumbers(tokens[1:end], sum+tokens[0], Append(partialList, tokens[0]))
    if result1 <> null return result1
    // Option 2: tokens[0] and tokens[1] form one number (needs at least two tokens)
    if length(tokens) >= 2
        var result2 = getNumbers(tokens[2:end], sum+Aggregate(tokens[0], tokens[1]), Append(partialList, Aggregate(tokens[0], tokens[1])))
        if result2 <> null return result2
    return null // No solution overall
You can do a lot better from different points of view, like tail recursion, pruning (you can have XXX,YYY only if YYY has 3 digits)... but this may work well enough for your app.
Divide-and-conquer would make for a nice improvement.

I think you should try all possible ways to parse the string and calculate the sum and return a list of those results that give the correct sum. This should be only one result in most cases unless you are very unlucky.
One thing to note that reduces the number of possibilities is that there is only an ambiguity if you have aa,bbb and bbb is exactly 3 digits. If you have aa,bb there is only one way to parse it.
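As a rough illustration of the "try every parse" idea, here is a minimal Python sketch. The function name split_numbers and its count parameter (how many numbers are expected) are hypothetical. It assumes that every group after the first group of a number has exactly three digits, and that no number is written with leading zeros. It returns every grouping that matches the total, so an ambiguous input would yield more than one result.

```python
def split_numbers(line, count):
    """Return every way to regroup the comma-separated tokens after the
    colon into `count` numbers that add up to the total before the colon."""
    left, right = line.strip().split(":")
    total = int(left.replace(",", ""))
    tokens = right.split(",")
    solutions = []

    def search(pos, remaining, terms):
        if pos == len(tokens):
            if remaining == 0 and len(terms) == count:
                solutions.append(list(terms))
            return
        if len(terms) == count:
            return  # tokens are left over but no numbers are left to place
        digits = tokens[pos]
        if len(digits) > 1 and digits[0] == "0":
            return  # assumption: a number never starts with a zero-padded group
        end = pos + 1
        # Prune: a candidate number can never exceed what is left of the total.
        while int(digits) <= remaining:
            terms.append(int(digits))
            search(end, remaining - int(digits), terms)
            terms.pop()
            # Only a group of exactly three digits can continue the current number.
            can_extend = (end < len(tokens) and len(tokens[end]) == 3
                          and digits != "0")
            if not can_extend:
                break
            digits += tokens[end]
            end += 1

    search(0, total, [])
    return solutions


print(split_numbers("6,852:6,100,752", 2))      # [[6100, 752]]
print(split_numbers("6,855:360,6,175,320", 3))  # [[360, 6175, 320]]
```

Keeping the tokens as strings preserves the digit count of each group, which is what makes the three-digit rule usable for pruning.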

Reading in C++:
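#include <iostream>
#include <sstream>
#include <utility>
#include <vector>
std::pair<int,std::vector<int> > read_numbers(std::istream& is)
{
    std::pair<int,std::vector<int> > result;
    int i;
    char ch;
    // Read the total; it may itself contain thousands separators (6,852).
    if(!(is >> result.first)) throw "foo!";
    while(is >> ch && ch == ',') {
        if(!(is >> i)) throw "foo!";
        result.first = result.first * 1000 + i;
    }
    if(!is || ch != ':') throw "foo!";
    // Read the comma-separated groups after the colon.
    // Note: reading a group as int drops leading zeros ("075" becomes 75).
    for(;;) {
        if(!(is >> i)) {
            if(is.eof()) return result;
            else throw "bar!";
        }
        result.second.push_back(i);
        if(is >> ch)
            if(ch != ',') throw "foobar!";
        is >> std::ws;
    }
}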
void f()
{
std::istringstream iss("6,852:6,100,752");
std::pair<int,std::vector<int> > foo = read_numbers(iss);
std::vector<int> result = get_winning_combination( foo.first
, foo.second.begin()
, foo.second.end() );
for( std::vector<int>::const_iterator i=result.begin(); i!=result.end(); ++i)
std::cout << *i << " ";
}
The actual cracking of the numbers is left as an exercise to the reader. :)

I think your main problem is deciding how to actually parse the numbers. The rest is just rote work with strings->numbers and iteration over combinations.
For instance, in the examples you gave, you could heuristically decide that a single-digit number followed by a three-digit number is, in fact, a four-digit number. Does a heuristic such as this hold true over a larger dataset? If not, you're also likely to have to iterate over the possible input parsing combinations, which means the naive solution is going to have a big polynomial complexity (O(n^x), where x is > 4).
Actually checking for which numbers add up is easy to do using a recursive search.
// Requires: using System.Collections.Generic; using System.Linq;
List<int> GetSummands(int total, int numberOfElements, List<int> values)
{
    if (numberOfElements == 0)
    {
        if (total == 0)
            return new List<int>(); // Empty list.
        else
            return null; // Indicate no solution.
    }
    else if (total < 0)
    {
        return null; // Indicate no solution.
    }
    else
    {
        for (int i = 0; i < values.Count; ++i)
        {
            List<int> summands = GetSummands(
                total - values[i], numberOfElements - 1, values.Skip(i + 1).ToList());
            if (summands != null)
            {
                // Found solution.
                summands.Add(values[i]);
                return summands;
            }
        }
        return null; // No remaining value leads to a solution.
    }
}

Related

Need help understanding this dynamic programming solution

So the problem being asked is:
A message containing letters from A-Z is being encoded to numbers using the following mapping:
'A' -> 1
'B' -> 2
...
'Z' -> 26
Given a non-empty string containing only digits, determine the total number of ways to decode it.
Example 1:
Input: "12"
Output: 2
Explanation: It could be decoded as "AB" (1 2) or "L" (12).
Example 2:
Input: "226"
Output: 3
Explanation: It could be decoded as "BZ" (2 26), "VF" (22 6), or "BBF" (2 2 6).
I solved it very inefficiently and was looking at other solutions and saw that dynamic programming was a good method to approach this problem. Since DP is new to me, I've been reading about it and am now coming back to the solution I saw and I'm trying to understand the logic behind the bottom down approach this guy used.
function numDecodings(s) {
if (s.length === 0) return 0;
const N = s.length;
const dp = Array(N+1).fill(0);
dp[0] = 1;
dp[1] = s[0] === '0' ? 0 : 1;
for (let i = 2; i <= N; i++) {
if (s[i-1] !== '0') {
dp[i] += dp[i-1];
}
if (s[i-2] === '1' || s[i-2] === '2' && s[i-1] <= '6') {
dp[i] += dp[i-2];
}
}
return dp[N];
}
First, let's straighten out some terms:
There are "top-down" and "bottom-up" approaches. "bottom-down" is not a useful term.
DP is a "bottom-up" approach, in that each solution is built from smaller, earlier ones.
The code has a "memo" array, dp. You may see the term "memoization" in your readings. This means that when we first compute a solution to a particular sub-problem, we will make a memo of it (remember the solution), indexed by the parameters. Thereafter, any time we need the solution, we'll simply look it up instead of recomputing it.
At each position in the string, we remember how many ways there are to code the string up to this point, and then compute how many ways total there are when we add the current character to that prefix.
Very briefly:
If the current character is not 0, then we can count one way to continue any previous string: take this as encoding a letter a-i. In this case, every encoding so far is still valid, so we carry forward that count: dp[i] += dp[i-1].
If the previous and current character form a legal encoding, then we can also take those as a 2-digit encoding (letters j-z), and we carry forward the count from before this 2-character code: dp[i] += dp[i-2].
That's all there is to the algorithm. Note that this does not handle all possible digit sequences: if the code reaches a point where there are no possible continuations, it will simply allow dp[i] to remain 0, and continue without issuing a message. For instance, given the input 12000226, the algorithm will identify two ways to encode 12 and one way to encode 120 (1 20), and then drop to 0 when it hits the next two zeroes. Because every later entry is built only from dp[i-1] and dp[i-2], the count never recovers, and the function returns 0 for the whole string.
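To see what the loop does on a given input, it can help to print the whole table. The following is a Python port of the function from the question, as a sketch; the name decoding_table is hypothetical, and it returns the full dp list rather than only the last entry.

```python
def decoding_table(s):
    """Return the dp table: dp[i] is the number of ways to decode s[:i]."""
    if not s:
        return [0]
    n = len(s)
    dp = [0] * (n + 1)
    dp[0] = 1                          # the empty prefix has one decoding
    dp[1] = 0 if s[0] == "0" else 1
    for i in range(2, n + 1):
        # Current digit taken alone (letters a-i).
        if s[i - 1] != "0":
            dp[i] += dp[i - 1]
        # Previous and current digit taken together as 10..26 (letters j-z).
        if s[i - 2] == "1" or (s[i - 2] == "2" and s[i - 1] <= "6"):
            dp[i] += dp[i - 2]
    return dp


print(decoding_table("12"))        # [1, 1, 2]
print(decoding_table("226"))       # [1, 1, 2, 3]
print(decoding_table("12000226"))  # [1, 1, 2, 1, 0, 0, 0, 0, 0]
```

The last entry of each table is the value the original function returns.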

Code Jam 2008 "Price Is Wrong" - Explanation

I have been going through the Code Jam archives, and I am really struggling with the solution to The Price Is Wrong from Code Jam 2008.
The problem statement is:
You're playing a game in which you try to guess the correct retail price of various products for sale. After guessing the price of each product in a list, you are shown the same list of products sorted by their actual prices, from least to most expensive. (No two products cost the same amount.) Based on this ordering, you are given a single chance to change one or more of your guesses.
Your program should output the smallest set of products such that, if you change your prices for those products, the ordering of your guesses will be consistent with the correct ordering of the product list. The products in the returned set should be listed in alphabetical order. If there are multiple smallest sets, output the set which occurs first lexicographically.
For example, assume these are your initial guesses:
code = $20
jam = $15
foo = $40
bar = $30
google = $60
If the correct ordering is code jam foo bar google, then you would need to change two of your prices in order to match the correct ordering. You might change one guess to read jam = $30 and another guess to read bar = $50, which would match the correct ordering and produce the output set bar jam. However, the output set bar code comes before bar jam lexicographically, and you can match the correct ordering by changing your guesses for these items as well.
Example
Input
code jam foo bar google
20 15 40 30 60
Output
Case #1: bar code
I am not asking for an exact solution, but for how I should proceed with the problem.
Thanks in advance.
Okay after struggling a bit, I got both small & large cases accepted.
Before posting my ugly ugly code, here is some brief explanation:
First, based on the problem statement and the limits of the parameters, it is intuitive to think that the core of the problem is simply finding the Longest Increasing Subsequence (LIS). Spotting that quickly does rely on experience, though (as in most of competitive programming).
Think of it like this: if I can find the set of items whose prices form an LIS, then the items left over are the smallest set that you need to change.
But you need to fulfil one more requirement, which I think is the hardest part of this problem: when multiple smallest sets exist, you have to find the lexicographically smallest one. That is the same as finding the LIS with the lexicographically largest set of names (we then throw those away, and the items left are the answer).
There are many ways to do this, but as the limits are so small (N <= 64), you can use basically whatever algorithm you like (O(N^4)? O(N^5)? Go ahead!).
My accepted method is to add a stupid twist to the traditional O(N^2) dynamic programming for LIS:
Let DP(i) be the length of the LIS in number[0..i], given that number i must be chosen.
Also use an array of set<string> to store the optimal set of item names that achieves DP(i); we update this array while doing the dynamic programming for DP(i).
Then, after the dynamic programming, simply find the lexicographically largest set of item names, and exclude them from the original item set. The items left are the answer.
Here is my accepted ugly ugly code in C++14. Most of the lines handle the troublesome I/O; please tell me if it's not clear, and I can provide a few examples to elaborate.
#include<bits/stdc++.h>
using namespace std;
int T, n, a[70], dp[70], mx=0;
vector<string> name;
set<string> ans, dp2[70];
string s;
char c;
bool compSet(set<string> st1, set<string> st2){
if(st1.size() != st2.size()) return true;
auto it1 = st1.begin();
auto it2 = st2.begin();
for(; it1 != st1.end(); it1++, it2++)
if((*it1) > (*it2)) return true;
else if((*it1) < (*it2)) return false;
return false;
}
int main() {
cin >> T;
getchar();
for(int qwe=1;qwe<=T;qwe++){
n=0; mx=1; s=""; ans.clear(); name.clear(); // every case has at least one item, so the LIS length is at least 1
while(c=getchar(), c != '\n'){
if(c == ' ') n++, name.push_back(s), ans.insert(s),s="";
else s+=c;
}
name.push_back(s); ans.insert(s); s=""; n++;
for(int i=0; i<n; i++) cin >> a[i];
getchar();
for(int i=0 ;i<n;i++)
dp[i] = 1, dp2[i].clear(), dp2[i].insert(name[i]);
for(int i=1; i<n; i++){
for(int j=0; j<i;j++){
if(a[j] < a[i] && dp[j]+1 >= dp[i]){
dp[i] = dp[j]+1;
set<string> tmp = dp2[j];
tmp.insert(name[i]);
if(compSet(tmp, dp2[i])) dp2[i] = tmp;
}
}
mx = max(mx, dp[i]);
}
set<string> tmp;
for(int i=0; i<n; i++) {
if(dp[i] == mx) if(compSet(dp2[i], tmp)) tmp = dp2[i];
}
for(auto x : tmp)
ans.erase(x);
// Print the names, and end the line even when no item has to change.
printf("Case #%d:", qwe);
for(const auto& x : ans) printf(" %s", x.c_str());
printf("\n");
}
return 0;
}
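The twist described in this answer can be sketched compactly in Python. This is only an illustration under the same assumption the answer makes (that keeping the lexicographically largest LIS name set leaves the lexicographically smallest set to change); the function name smallest_change_set is hypothetical.

```python
def smallest_change_set(names, prices):
    """Return the sorted names whose prices must change.

    best[i] holds (length, sorted names) for the best increasing
    subsequence that ends at item i. Tuples compare by length first and
    then by the sorted names, so max() prefers longer subsequences and,
    among equally long ones, the lexicographically larger name set."""
    if not names:
        return []
    best = []
    for i in range(len(names)):
        candidate = (1, (names[i],))
        for j in range(i):
            if prices[j] < prices[i]:
                length, kept = best[j]
                option = (length + 1, tuple(sorted(kept + (names[i],))))
                candidate = max(candidate, option)
        best.append(candidate)
    _, kept = max(best)
    return sorted(set(names) - set(kept))


names = ["code", "jam", "foo", "bar", "google"]
prices = [20, 15, 40, 30, 60]
print(" ".join(smallest_change_set(names, prices)))  # bar code
```

Storing a full name set per position costs extra memory, but with N <= 64 that is negligible.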
Well, based on the problem you have specified, suppose I tell you that you don't need to give me the order or the names of the products; rather, you just need to tell me:
The number of product values that will change.
What would your answer be?
Basically, the problem has then been reduced to the following statement:
You are given a list of numbers and you want to change the list so that the numbers are in increasing order, while changing as few individual elements as possible.
How would you solve this?
If you find the Longest Increasing Subsequence (LIS) in the list of numbers you have, then you just need to subtract the LIS length from the length of the list.
Why, you ask?
Because if you want the number of changes made to the list to be minimal, then leaving the longest increasing subsequence as it is and changing the other values will definitely give the optimal answer.
Let's take an example -
We have - 2 10 4 6 8
How many changes would be made to this list?
The longest increasing subsequence length is - 4.
So if we leave 4 item values as they are and change the other remaining values then we would only have to change 5(list length) - 4 = 1 values.
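The counting step can be checked with a few lines of Python. This is a minimal sketch using the classic O(N^2) dynamic programming for the LIS length; the function name min_changes is hypothetical.

```python
def min_changes(values):
    """Minimum number of elements to change so the list becomes strictly
    increasing: the list length minus the LIS length."""
    n = len(values)
    lis = [1] * n  # lis[i] = length of the longest increasing subsequence ending at i
    for i in range(n):
        for j in range(i):
            if values[j] < values[i]:
                lis[i] = max(lis[i], lis[j] + 1)
    return n - max(lis, default=0)


print(min_changes([2, 10, 4, 6, 8]))      # 1
print(min_changes([20, 15, 40, 30, 60]))  # 2
```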
Now addressing your original problem, you need to print the product names. Well if you exclude the elements present in the LIS you should get your answer.
But wait!
What happens when you have many subsequences with the same LIS length? How will you choose the lexicographically smallest answer?
Well why don't you think about it in terms of LIS itself. This should be good enough to get you started right?

Is this a good Primality Checking Solution?

I have written this code to check whether a number is prime (for numbers up to 10^9+7).
Is this a good method?
What will the time complexity be?
What I have done is build an unordered_set that stores the prime numbers up to sqrt(n).
When checking whether a number is prime, I first check whether it is less than the largest number in the table.
If it is, it is looked up in the table, so the complexity should be O(1) in this case.
If it is larger, the number is put through a divisibility test against the prime numbers in the set.
#include<iostream>
#include<set>
#include<math.h>
#include<unordered_set>
#define sqrt10e9 31623
using namespace std;
unordered_set<long long> primeSet = { 2, 3 }; //used for fast lookups
void genrate_prime_set(long range) //this generates prime number upto sqrt(10^9+7)
{
bool flag;
set<long long> tempPrimeSet = { 2, 3 }; //a temporay set is used for genration
set<long long>::iterator j;
for (int i = 3; i <= range; i = i + 2)
{
//cout << i << " ";
flag = true;
for (j = tempPrimeSet.begin(); *j * *j <= i; ++j)
{
if (i % (*j) == 0)
{
flag = false;
break;
}
}
if (flag)
{
primeSet.insert(i);
tempPrimeSet.insert(i);
}
}
}
bool is_prime(long long i,unordered_set<long long> primeSet)
{
bool flag = true;
if(i <= sqrt10e9) //if number exist in the lookup table
return primeSet.count(i);
//if it doesn't iterate through the table
for (unordered_set<long long>::iterator j = primeSet.begin(); j != primeSet.end(); ++j)
{
if (*j * *j <= i && i % (*j) == 0)
{
flag = false;
break;
}
}
return flag;
}
int main()
{
//long long testCases, a, b, kiwiCount;
bool primeFlag = true;
//unordered_set<int> primeNum;
genrate_prime_set(sqrt10e9);
cout << primeSet.size()<<"\n";
cout << is_prime(9999991,primeSet);
return 0;
}
This doesn't strike me as a particularly efficient way to do the job at hand.
Although it probably won't make a big difference in the end, the efficient way to generate all the primes up to some specific limit is clearly to use a sieve--the sieve of Eratosthenes is simple and fast. There are a couple of modifications that can be faster, but for the small size you're dealing with, they're probably not worthwhile.
These normally produce their output in a more effective format than you're currently using as well. In particular, you typically just dedicate one bit to each possible prime (i.e., each odd number) and end up with it zeroed if the number is composite, and one if it's prime (you can, of course, reverse the sense if you prefer).
Since you only need one bit for each odd number from 3 to 31623, this requires only about 16 K bits, or about 2K bytes--a truly minuscule amount of memory by modern standards (especially: little enough to fit in L1 cache quite easily).
Since the bits are stored in order, it's also trivial to compute and test by the factors up to the square root of the number you're testing instead of testing against all the numbers in the table (including those greater than the square root of the number you're testing, which is obviously a waste of time). This also optimizes access to the memory in case some of it's not in the cache (i.e., you can access all the data in order, making life as easy as possible for the hardware prefetcher).
If you wanted to optimize further, I'd consider just using the sieve to find all primes up to 10^9+7, and look up inputs. Whether this is a win will depend (heavily) upon the number of queries you can expect to receive. A quick check shows that a simple implementation of the Sieve of Eratosthenes can find all primes up to 10^9 in about 17 seconds. After that, each query is (of course) essentially instantaneous (i.e., the cost of a single memory read). This does require around 120 megabytes of memory for the result of the sieve, which would once have been a major consideration, but (except on fairly limited systems) normally wouldn't be any more.
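As an illustration of the sieve-based approach, here is a minimal Python sketch. For readability it uses one byte per number rather than one bit per odd number, so it is not the compact layout described above. The names primes_up_to, SMALL_PRIMES and is_prime are hypothetical.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: return all primes <= limit in increasing order."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            # Multiples below p*p were already crossed out by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = 0
    return [n for n in range(limit + 1) if flags[n]]


# 31623 * 31623 > 10**9 + 7, so these primes cover every input in range.
SMALL_PRIMES = primes_up_to(31623)


def is_prime(n):
    """Trial division by primes in order, stopping at the square root of n."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if p * p > n:
            break
        if n % p == 0:
            return False
    return True


print(primes_up_to(30))      # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(is_prime(10**9 + 7))   # True
```

Because the primes are kept in increasing order, the loop can stop as soon as p * p exceeds n, which an unordered set does not allow.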
The very short answer: do research on the subject, starting with the term "Miller-Rabin"
The short answer is no:
Looking for factors of a number is a poor way to check for primality
Exhaustively searching through primes is a poor way to look for factors
Especially if you search through every prime, rather than just the ones less than or equal to the square root of the number
Doing a primality test on each number in turn is a poor way to generate a list of primes
Also, you should take in primeSet by reference rather than copy, if it really needs to be a parameter.
Note: testing small primes to see if they divide a number is a useful first step of a primality test, but should generally only be used for the smallest primes before switching to a better method
No, it's not a very good way to determine if a number is prime. Here is pseudocode for a simple primality test that is sufficient for numbers in your range; I'll leave it to you to translate to C++:
function isPrime(n)
d := 2
while d * d <= n
if n % d == 0
return False
d := d + 1
return True
This works by trying every potential divisor up to the square root of the input number n; if no divisor has been found, then the input number could not be composite, meaning of the form n = p × q, because one of the two divisors p or q must be at most the square root of n while the other is at least the square root of n.
There are better ways to determine primality; for instance, after initially checking if the number is even (and hence prime only if n = 2), it is only necessary to test odd potential divisors, halving the amount of work necessary. If you have a list of primes up to the square root of n, you can use that list as trial divisors and make the process even faster. And there are other techniques for larger n.
But that should be enough to get you started. When you are ready for more, come back here and ask more questions.
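For reference, the odd-divisors-only refinement mentioned above might look like this in Python. It is a sketch rather than a tuned implementation, and the name is_prime_trial is hypothetical.

```python
def is_prime_trial(n):
    """Trial division: handle even numbers first, then test odd divisors only."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2  # 2 is the only even prime
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


print(is_prime_trial(97))         # True
print(is_prime_trial(10**9 + 7))  # True
print(is_prime_trial(9))          # False
```

For inputs up to about 10^9 the loop runs at most roughly 16,000 times, which is why simple trial division is sufficient in this range.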
I can only suggest a way to use a library function in Java to check the primality of a number. As for the other questions, I do not have any answers.
The java.math.BigInteger.isProbablePrime(int certainty) method returns true if this BigInteger is probably prime, and false if it's definitely composite. If certainty is ≤ 0, true is returned. You could use it if you rewrite your code in Java.
Parameters
certainty - a measure of the uncertainty that the caller is willing to tolerate: if the call returns true the probability that this BigInteger is prime exceeds (1 - 1/2^certainty). The execution time of this method is proportional to the value of this parameter.
Return Value
This method returns true if this BigInteger is probably prime, false if it's definitely composite.
Example
The following example shows the usage of math.BigInteger.isProbablePrime() method
import java.math.*;
public class BigIntegerDemo {
public static void main(String[] args) {
// create 3 BigInteger objects
BigInteger bi1, bi2, bi3;
// create 3 Boolean objects
Boolean b1, b2, b3;
// assign values to bi1, bi2
bi1 = new BigInteger("7");
bi2 = new BigInteger("9");
// perform isProbablePrime on bi1, bi2
b1 = bi1.isProbablePrime(1);
b2 = bi2.isProbablePrime(1);
b3 = bi2.isProbablePrime(-1);
String str1 = bi1+ " is prime with certainity 1 is " +b1;
String str2 = bi2+ " is prime with certainity 1 is " +b2;
String str3 = bi2+ " is prime with certainity -1 is " +b3;
// print b1, b2, b3 values
System.out.println( str1 );
System.out.println( str2 );
System.out.println( str3 );
}
}
Output
7 is prime with certainity 1 is true
9 is prime with certainity 1 is false
9 is prime with certainity -1 is true

Words with at least 2 common letters [closed]

A string is named 2-consistent if each word has at least 2 letters in common with the next word.
For example
"Atom another era" (atom has a and t in common with another, and another has e and a in common with era; the answer is not unique).
First of all I need a data structure which takes 2 words and answers in constant time at the question "Do these words have at least 2 letters in common?"
Now, given a string of n words I need to find the longest 2-consistent substring.
I can't figure out what data structure to use. I thought of a radix tree or a prefix tree, but I could not find the answer. Can you help me?
Assuming unaccented letters and ignoring capitalization, for each word you can store a bit-field in a 32-bit integer where bits 0-25 are set to 1 if the corresponding letter from a-z is present.
The integer can be computed in linear time like this:
int getBitField(char* word)
{
int bits = 0;
while(*word)
bits |= 1 << ((*word++) - 'a');
return bits;
}
If the words are assumed to be words in English or some other language, with a maximum word length then the difference between constant and linear time is fairly meaningless because (for the sake of argument) all words less than the maximum length can be padded out with non-matching characters, which will result in a constant time algorithm.
Once you have the bit fields for two words you can test if they are 2-consistent in constant time by ANDing them together and checking if the result is not zero (which would indicate no letters in common) and not a power of 2 (which would indicate only one letter in common as only a single bit is set). You can test for a power of 2 by ANDing a number with itself minus 1.
bool is2Consistent(int word1bits, int word2bits)
{
int common = word1bits & word2bits;
return (common & (common - 1)) != 0;
}
This won't work if you intend to define words like 'meet' and 'beef' which have repeated letters as 2-consistent.
If you wanted to test for 3-consistency, you just need to add an extra line to the function:
bool is3Consistent(int word1bits, int word2bits)
{
int common = word1bits & word2bits;
common &= (common - 1);
return (common & (common - 1)) != 0;
}
ANDing an integer with itself minus one just clears the least significant set bit, so you could apply it an arbitrary number of times to test for 4-consistency, 5-consistency etc.
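The same bit-field idea generalises to any k. Here is a minimal Python sketch; the names letter_mask and share_at_least are hypothetical. Unlike the C function above, it lowercases the word and skips characters outside a-z.

```python
def letter_mask(word):
    """Bit k is set if the k-th letter of the alphabet occurs in word."""
    mask = 0
    for ch in word.lower():
        if "a" <= ch <= "z":
            mask |= 1 << (ord(ch) - ord("a"))
    return mask


def share_at_least(word1, word2, k):
    """True if the two words have at least k distinct letters in common."""
    common = letter_mask(word1) & letter_mask(word2)
    for _ in range(k - 1):
        common &= common - 1  # clear the lowest set bit
    return common != 0


print(share_at_least("Atom", "another", 2))  # True
print(share_at_least("meet", "beef", 2))     # False
```

In practice the masks would be computed once per word and stored, so that each pairwise test is only the AND and the bit-clearing steps.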
Part 1: Are wordOne and wordTwo 2-consistent ?
public bool IsWordsTwoConsistent(string first, string second)
{
    // letters[k] > 0 means letter k occurs in the first word (assumes only a-z after lowercasing).
    int[] letters = new int[26];
    int countDoubles = 0;
    foreach (char c in first.ToLower())
    {
        letters[c - 'a']++;
    }
    foreach (char c in second.ToLower())
    {
        if (letters[c - 'a'] > 0)
        {
            countDoubles++;
            letters[c - 'a'] = 0; // count each shared letter only once
        }
        if (countDoubles > 1)
            return true;
    }
    return false;
}
Part 2: Longest 2-consistent substring
public int GetPositionLongestTwoConsistentSubstring(string input)
{
string[] wordsArray = input.Split(' ');
int maxLocation = -1, maxLength = 0;
int candLocation = -1, candLength = 0; //candiadate
for (int i = 0 ; i < wordsArray.Length - 1 ; i++)
{
if (IsWordsTwoConsistent(wordsArray[i], wordsArray[i+1]))
{
candLength++;
if (candLocation == -1)
candLocation = i;
}
else
{
if (candLength > maxLength)
{
maxLength = candLength;
maxLocation = candLocation;
}
candLength = 0;
candLocation = -1;
}
}
if (candLength > maxLength)
{
maxLength = candLength;
maxLocation = candLocation;
}
return maxLocation;
}
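For comparison, the single pass over adjacent word pairs can be sketched in Python as follows. The function names are hypothetical. The sketch returns both the start index and the length of the run, measured in words, and it treats a single word as a run of length 1, which is an assumption the question does not settle.

```python
def share_two_letters(a, b):
    """True if the words have at least two distinct letters in common."""
    return len(set(a.lower()) & set(b.lower())) >= 2


def longest_two_consistent_run(text):
    """Return (start_index, word_count) of the longest 2-consistent run."""
    words = text.split()
    if not words:
        return (-1, 0)
    best_start, best_len = 0, 1
    start = 0  # index of the first word of the current run
    for i in range(1, len(words)):
        if not share_two_letters(words[i - 1], words[i]):
            start = i  # the chain is broken, so a new run starts here
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return (best_start, best_len)


print(longest_two_consistent_run("xx Atom another era zz"))  # (1, 3)
```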
First of all I need a data structure which takes 2 words and answers
in constant time at the question "Do these words have at least 2
letters in common?"
Easy. First compute the adjacency matrix for the dictionary you are using where 'adjacent' is defined to mean 'having at least two letters in common'. I disagree with the comments above, storing even a comprehensive English dictionary isn't very much data these days. Storing the full adjacency matrix might take too much space for your liking, so use sparse array facilities.
Now, bear in mind that an English word is just a number in base-26 (or base-52 if you insist on distinguishing capital letters) so looking up the row and column for a pair of words is a constant-time operation and you have the solution to your question.
Oh sure, this consumes space and takes a fair amount of pre-computation but OP asks about a data structure for answering the question in constant time.

Algorithm to find duplicate in an array

I have an assignment to create an algorithm to find duplicates in an array of numeric values, but it does not say which kind of numbers: integers or floats. I have written the following pseudocode:
FindingDuplicateAlgorithm(A) // A is the array
mergeSort(A);
for int i <- 0 to i<A.length
if A[i] == A[i+1]
i++
return A[i]
else
i++
Have I created an efficient algorithm?
I think there is a problem in my algorithm: it returns duplicate numbers several times. For example, if the array includes 2 at several indexes, I will have ...2, 2,... in the output. How can I change it to return each duplicate only once?
I think it is a good algorithm for integers, but does it work well for float numbers too?
To handle duplicates, you can do the following:
if A[i] == A[i+1]:
    result.append(A[i]) # collect found duplicates in a list
    while i+1 < A.length and A[i] == A[i+1]: # skip the entire range of duplicates
        i++ # until a new value is found
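Put together, the sort-and-skip idea might look like this in Python. It is a sketch: the function name duplicates_sorted is hypothetical, and Python's built-in sort stands in for the merge sort from the question. Equality is exact, so floats that differ only by rounding error are not reported as duplicates.

```python
def duplicates_sorted(values):
    """Return each duplicated value once, in increasing order."""
    ordered = sorted(values)
    result = []
    i = 0
    while i + 1 < len(ordered):
        if ordered[i] == ordered[i + 1]:
            result.append(ordered[i])
            # Skip the entire run of equal values.
            while i + 1 < len(ordered) and ordered[i] == ordered[i + 1]:
                i += 1
        i += 1
    return result


print(duplicates_sorted([3, 2, 2, 1, 3, 3]))  # [2, 3]
print(duplicates_sorted([1.5, 2.0, 1.5]))     # [1.5]
print(duplicates_sorted([0.1 + 0.2, 0.3]))    # []
```

The last call shows the float caveat: 0.1 + 0.2 and 0.3 are different floating-point values, so if near-equal values should count as duplicates, the comparison would need a tolerance.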
Do you want to find Duplicates in Java?
You may use a HashSet.
// Requires: import java.util.HashSet; import java.util.Set;
Set<Object> h = new HashSet<Object>();
for (Object a : A) {
    boolean added = h.add(a);
    boolean duplicate = !added;
    if (duplicate) {
        // do something with a
    }
}
The return-Value of add() is defined as:
true if the set did not already
contain the specified element.
EDIT:
I know HashSet is optimized for inserts and contains operations. But I'm not sure if its fast enough for your concerns.
EDIT2:
I've seen you recently added the homework tag. I would not recommend my answer if it's homework, because it may be too "high-level" for an algorithm lesson.
http://download.oracle.com/javase/1.4.2/docs/api/java/util/HashSet.html#add%28java.lang.Object%29
Your answer seems pretty good. First sorting and then simply checking neighboring values gives you O(n log(n)) complexity, which is quite efficient.
Merge sort is O(n log(n)) while checking neighboring values is simply O(n).
One thing though (as mentioned in one of the comments) you are going to get a stack overflow (lol) with your pseudocode. The inner loop should be (in Java):
for (int i = 0; i < array.length - 1; i++) {
...
}
Then also, if you actually want to display which numbers (and or indexes) are the duplicates, you will need to store them in a separate list.
I'm not sure what language you need to write the algorithm in, but there are some really good C++ solutions in response to my question here. Should be of use to you.
O(n) algorithm: traverse the array and try to insert each element into a hashtable/set with the number as the hash key. If you cannot insert it, then it's a duplicate.
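A minimal Python sketch of this hash-based approach follows. The function name duplicates_hashed is hypothetical; a second set is an added detail so that each duplicate is reported only once. The O(n) bound is an expected-time bound that relies on hashing behaving well.

```python
def duplicates_hashed(values):
    """Return each duplicated value once, in the order the repeats are found."""
    seen = set()
    reported = set()
    result = []
    for v in values:
        if v in seen:
            if v not in reported:
                reported.add(v)
                result.append(v)
        else:
            seen.add(v)
    return result


print(duplicates_hashed([1, 2, 3, 2, 1, 1]))  # [2, 1]
```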
Your algorithm contains a buffer overrun. i starts with 0, so I assume the indexes into array A are zero-based, i.e. the first element is A[0], the last is A[A.length-1]. Now i counts up to A.length-1, and in the loop body accesses A[i+1], which is out of the array for the last iteration. Or, simply put: If you're comparing each element with the next element, you can only do length-1 comparisons.
If you only want to report duplicates once, I'd use a bool variable firstDuplicate, that's set to false when you find a duplicate and true when the number is different from the next. Then you'd only report the first duplicate by only reporting the duplicate numbers if firstDuplicate is true.
// Assumes every value v satisfies 0 < v < inputArray.length. The array is modified:
// the entry at index v is negated to mark that v has already been seen.
public void printDuplicates(int[] inputArray) {
    if (inputArray == null) {
        throw new IllegalArgumentException("Input array can not be null");
    }
    int length = inputArray.length;
    if (length <= 1) {
        return; // A single element cannot have a duplicate.
    }
    for (int i = 0; i < length; i++) {
        int value = Math.abs(inputArray[i]);
        if (inputArray[value] >= 0) {
            inputArray[value] = -inputArray[value]; // Mark value as seen.
        } else {
            System.out.print(value + " "); // Slot already negative: value seen before.
        }
    }
}

Resources