Generate all unique substrings for given string

Given a string s, what is the fastest method to generate a set of all its unique substrings?
Example: for s = "aba" we would get substrs = {"a", "b", "ab", "ba", "aba"}.
The naive algorithm would be to traverse the entire string, generating the substrings of each length 1..n in turn, yielding an O(n^2) upper bound.
Is a better bound possible?
(this is technically homework, so pointers-only are welcome as well)

As other posters have said, there are potentially O(n^2) substrings for a given string, so printing them out cannot be done faster than that. However there exists an efficient representation of the set that can be constructed in linear time: the suffix tree.
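To make the idea of a compact, linear-size index concrete, here is a minimal Python sketch. Note the swap: it builds a suffix automaton rather than the suffix tree named above, simply because the automaton fits in a few dozen lines; both index every substring in space linear in the string length. The function name `count_distinct_substrings` is made up for illustration, and the dict-based transitions assume a small alphabet.

```python
def count_distinct_substrings(s):
    """Count distinct non-empty substrings of s with a suffix automaton."""
    length = [0]      # length[v]: longest string recognised at state v
    link = [-1]       # link[v]: suffix link of state v
    trans = [{}]      # trans[v]: outgoing transitions of state v
    last = 0
    for ch in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        trans.append({})
        p = last
        # Add the new transition along the suffix-link chain.
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split state q by cloning it with a shorter longest string.
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                trans.append(dict(trans[q]))
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    # State v accounts for length[v] - length[link[v]] distinct substrings.
    return sum(length[v] - length[link[v]] for v in range(1, len(length)))


print(count_distinct_substrings("aba"))  # 5
```

The count comes out without ever materialising the substrings, which is why it can beat the quadratic bound that applies to listing them.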

There is no way to do this faster than O(n^2), because a string has O(n^2) substrings in total. If you have to generate them all, their number is n(n + 1) / 2 in the worst case, hence the lower bound of Ω(n^2).

The first is brute force, with complexity O(N^3), which could be brought down to O(N^2 log(N)).
The second uses a HashSet and has complexity O(N^2).
The third uses LCP: it first builds all the suffixes of the given string, with worst case O(N^2) and best case O(N log(N)).
First Solution:-
import java.util.Scanner;
public class DistinctSubString {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.print("Enter The string");
String s = in.nextLine();
long startTime = System.currentTimeMillis();
int L = s.length();
int N = L * (L + 1) / 2;
String[] Comb = new String[N];
for (int i = 0, p = 0; i < L; ++i) {
for (int j = 0; j < (L - i); ++j) {
Comb[p++] = s.substring(j, i + j + 1);
}
}
/*
* for(int j=0;j<N;++j) { System.out.println(Comb[j]); }
*/
boolean[] val = new boolean[N];
for (int i = 0; i < N; ++i)
val[i] = true;
int counter = N;
int p = 0, start = 0;
for (int i = 0, j; i < L; ++i) {
p = L - i;
for (j = start; j < (start + p); ++j) {
if (val[j]) {
//System.out.println(Comb[j]);
for (int k = j + 1; k < start + p; ++k) {
if (Comb[j].equals(Comb[k])) {
counter--;
val[k] = false;
}
}
}
}
start = j;
}
System.out.println("Substrings are " + N
+ " of which unique substrings are " + counter);
long endTime = System.currentTimeMillis();
System.out.println("It took " + (endTime - startTime) + " milliseconds");
}
}
Second Solution:-
import java.util.*;
public class DistictSubstrings_usingHashTable {
public static void main(String args[]) {
// create a hash set
Scanner in = new Scanner(System.in);
System.out.print("Enter The string");
String s = in.nextLine();
int L = s.length();
long startTime = System.currentTimeMillis();
Set<String> hs = new HashSet<String>();
// add elements to the hash set
for (int i = 0; i < L; ++i) {
for (int j = 0; j < (L - i); ++j) {
hs.add(s.substring(j, i + j + 1));
}
}
System.out.println(hs.size());
long endTime = System.currentTimeMillis();
System.out.println("It took " + (endTime - startTime) + " milliseconds");
}
}
Third Solution:-
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
public class LCPsolnFroDistinctSubString {
public static void main(String[] args) throws IOException {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
System.out.println("Enter Desired String ");
String string = br.readLine();
int length = string.length();
String[] arrayString = new String[length];
for (int i = 0; i < length; ++i) {
arrayString[i] = string.substring(length - 1 - i, length);
}
Arrays.sort(arrayString);
for (int i = 0; i < length; ++i)
System.out.println(arrayString[i]);
long num_substring = arrayString[0].length();
for (int i = 0; i < length - 1; ++i) {
int j = 0;
for (; j < arrayString[i].length(); ++j) {
if (!((arrayString[i].substring(0, j + 1)).equals((arrayString)[i + 1]
.substring(0, j + 1)))) {
break;
}
}
num_substring += arrayString[i + 1].length() - j;
}
System.out.println("unique substrings = " + num_substring);
}
}
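Fourth Solution (recursive; note that this enumerates all subsequences of the string, not only contiguous substrings):-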
public static void printAllCombinations(String soFar, String rest) {
if(rest.isEmpty()) {
System.out.println(soFar);
} else {
printAllCombinations(soFar + rest.substring(0,1), rest.substring(1));
printAllCombinations(soFar , rest.substring(1));
}
}
Test case:- printAllCombinations("", "abcd");

For big-O, the best you could do would be O(n^2).
No need to reinvent the wheel. The material below is based on sets rather than strings, so you will have to take the concepts and apply them to your own situation.
Algorithms
Really Good White Paper from MS
In depth PowerPoint
Blog on string perms

Well, since there are potentially n*(n+1)/2 different substrings (+1 for the empty substring), I doubt you can do better than O(n^2) in the worst case. The easiest thing is to generate them and use some nice O(1) lookup table (such as a hash map) to exclude duplicates right when you find them.
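A minimal sketch of that generate-and-deduplicate idea in Python, using the built-in `set` as the hash-based lookup table. The function name `unique_substrings` is only illustrative.

```python
def unique_substrings(s):
    """Collect every distinct non-empty substring of s in a hash set."""
    seen = set()
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            # Slicing copies end - start characters; the set drops duplicates.
            seen.add(s[start:end])
    return seen


print(sorted(unique_substrings("aba")))  # ['a', 'ab', 'aba', 'b', 'ba']
```

The lookup itself is expected O(1) per insert, but hashing and copying each slice still costs time proportional to its length.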

class SubstringsOfAString {
public static void main(String args[]) {
String string = "Hello", sub = null;
System.out.println("Substrings of \"" + string + "\" are :-");
for (int i = 0; i < string.length(); i++) {
for (int j = 1; j <= string.length() - i; j++) {
sub = string.substring(i, j + i);
System.out.println(sub);
}
}
}
}

class program
{
List<String> lst = new List<String>();
String str = "abc";
public void func()
{
subset(0, "");
lst.Sort();
lst = lst.Distinct().ToList();
foreach (String item in lst)
{
Console.WriteLine(item);
}
}
void subset(int n, String s)
{
for (int i = n; i < str.Length; i++)
{
lst.Add(s + str[i].ToString());
subset(i + 1, s + str[i].ToString());
}
}
}

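This prints unique substrings, skipping any substring whose reverse has already been listed (which is why 'ba' is missing from the second output below).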
https://ideone.com/QVWOh0
def uniq_substring(test):
    lista = []
    [lista.append(test[i:i+k+1]) for i in range(len(test)) for k in
     range(len(test)-i) if test[i:i+k+1] not in lista and
     test[i:i+k+1][::-1] not in lista]
    print(lista)
uniq_substring('rohit')
uniq_substring('abab')
['r', 'ro', 'roh', 'rohi', 'rohit', 'o', 'oh', 'ohi', 'ohit', 'h',
'hi', 'hit', 'i', 'it', 't']
['a', 'ab', 'aba', 'abab', 'b', 'bab']

Many answers that include two for loops and a .substring() call claim O(N^2) time complexity. However, it is important to note that the worst case for a .substring() call in Java (since Java 7 update 6) is O(N), because the characters are copied. So adding a .substring() call to your code raises the exponent of N by one.
Therefore, two for loops with a .substring() call inside them have O(N^3) time complexity.
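The cubic cost can be checked by counting how many characters get copied when every substring is materialised. The following Python sketch (the helper name `chars_copied` is hypothetical) sums the lengths of all substrings and compares the result against the closed form n(n+1)(n+2)/6, which grows as n^3.

```python
def chars_copied(n):
    """Total characters copied if each substring of a length-n string is built."""
    total = 0
    for start in range(n):
        for end in range(start + 1, n + 1):
            total += end - start  # a copying substring call costs its length
    return total


for n in (4, 10, 100):
    # The two printed numbers agree for every n.
    print(n, chars_copied(n), n * (n + 1) * (n + 2) // 6)
```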

It can only be done in O(n^2) time, as the total number of substrings of a string of length n is n(n+1)/2 (all of them unique when the characters are distinct).
Example:
string s = "abcd"
pass 0: (all the strings are of length 1)
a, b, c, d = 4 strings
pass 1: (all the strings are of length 2)
ab, bc, cd = 3 strings
pass 2: (all the strings are of length 3)
abc, bcd = 2 strings
pass 3: (all the strings are of length 4)
abcd = 1 strings
Using this pattern, we can write a solution that makes O(n^2) print calls and uses constant extra space.
The source code is as below:
#include<stdio.h>
void print(char arr[], int start, int end)
{
int i;
for(i=start;i<=end;i++)
{
printf("%c",arr[i]);
}
printf("\n");
}
void substrings(char arr[], int n)
{
int pass,j,start,end;
int no_of_strings = n-1;
for(pass=0;pass<n;pass++)
{
start = 0;
end = start+pass;
for(j=no_of_strings;j>=0;j--)
{
print(arr,start, end);
start++;
end = start+pass;
}
no_of_strings--;
}
}
int main()
{
char str[] = "abcd";
substrings(str,4);
return 0;
}

The naive algorithm takes O(n^3) time, not O(n^2) time.
There are O(n^2) substrings.
If you insert those O(n^2) substrings into, for example, a tree-based set,
the set performs O(lg n) comparisons per string to check whether it already exists in the set.
On top of that, each string comparison takes O(n) time.
Therefore it takes O(n^3 lg n) time if you use a set, and you can reduce that to O(n^3) time if you use a hash table instead.
The point is that these are string comparisons, not number comparisons.
One of the best approaches is to use a suffix array together with the longest common prefix (LCP) algorithm, which reduces the time for this problem to O(n^2).
Building the suffix array takes O(n) time with a linear-time construction algorithm.
Computing the LCP of one pair of suffixes takes O(n) time.
Since the LCP is computed for each pair of adjacent suffixes in the suffix array, the total time to find the number of distinct substrings is O(n^2).
Likewise, if you want to print all distinct substrings, it takes at least O(n^2) time.
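The counting step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the suffix array is built by naively sorting suffix slices (not a linear-time construction), and the function name `count_distinct_via_lcp` is invented here. Each suffix contributes its length minus the LCP it shares with the previous suffix in sorted order.

```python
def count_distinct_via_lcp(s):
    """Count distinct substrings as n(n+1)/2 minus the sum of adjacent LCPs."""
    n = len(s)
    # Naive suffix array: sort start positions by the suffix they begin.
    sa = sorted(range(n), key=lambda i: s[i:])
    total = n * (n + 1) // 2
    for a, b in zip(sa, sa[1:]):
        k = 0
        # LCP of two adjacent suffixes, compared character by character.
        while a + k < n and b + k < n and s[a + k] == s[b + k]:
            k += 1
        total -= k  # these k prefixes were already counted for suffix a
    return total


print(count_distinct_via_lcp("banana"))  # 15
```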

Try this code using a suffix array and longest common prefix. It can also give you the total number of unique substrings. The code might give a stack overflow in visual studio but runs fine in Eclipse C++. That's because it returns vectors for functions. Haven't tested it against extremely long strings. Will do so and report back.
// C++ program for building LCP array for given text
#include <bits/stdc++.h>
#include <vector>
#include <string>
using namespace std;
#define MAX 100000
int cum[MAX];
// Structure to store information of a suffix
struct suffix
{
int index; // To store original index
int rank[2]; // To store ranks and next rank pair
};
// A comparison function used by sort() to compare two suffixes
// Compares two pairs, returns 1 if first pair is smaller
int cmp(struct suffix a, struct suffix b)
{
return (a.rank[0] == b.rank[0])? (a.rank[1] < b.rank[1] ?1: 0):
(a.rank[0] < b.rank[0] ?1: 0);
}
// This is the main function that takes a string 'txt' of size n as an
// argument, builds and return the suffix array for the given string
vector<int> buildSuffixArray(string txt, int n)
{
// A structure to store suffixes and their indexes
struct suffix suffixes[n];
// Store suffixes and their indexes in an array of structures.
// The structure is needed to sort the suffixes alphabetically
// and maintain their old indexes while sorting
for (int i = 0; i < n; i++)
{
suffixes[i].index = i;
suffixes[i].rank[0] = txt[i] - 'a';
suffixes[i].rank[1] = ((i+1) < n)? (txt[i + 1] - 'a'): -1;
}
// Sort the suffixes using the comparison function
// defined above.
sort(suffixes, suffixes+n, cmp);
// At this point, all suffixes are sorted according to first
// 2 characters. Let us sort suffixes according to first 4
// characters, then first 8 and so on
int ind[n]; // This array is needed to get the index in suffixes[]
// from original index. This mapping is needed to get
// next suffix.
for (int k = 4; k < 2*n; k = k*2)
{
// Assigning rank and index values to first suffix
int rank = 0;
int prev_rank = suffixes[0].rank[0];
suffixes[0].rank[0] = rank;
ind[suffixes[0].index] = 0;
// Assigning rank to suffixes
for (int i = 1; i < n; i++)
{
// If first rank and next ranks are same as that of previous
// suffix in array, assign the same new rank to this suffix
if (suffixes[i].rank[0] == prev_rank &&
suffixes[i].rank[1] == suffixes[i-1].rank[1])
{
prev_rank = suffixes[i].rank[0];
suffixes[i].rank[0] = rank;
}
else // Otherwise increment rank and assign
{
prev_rank = suffixes[i].rank[0];
suffixes[i].rank[0] = ++rank;
}
ind[suffixes[i].index] = i;
}
// Assign next rank to every suffix
for (int i = 0; i < n; i++)
{
int nextindex = suffixes[i].index + k/2;
suffixes[i].rank[1] = (nextindex < n)?
suffixes[ind[nextindex]].rank[0]: -1;
}
// Sort the suffixes according to first k characters
sort(suffixes, suffixes+n, cmp);
}
// Store indexes of all sorted suffixes in the suffix array
vector<int>suffixArr;
for (int i = 0; i < n; i++)
suffixArr.push_back(suffixes[i].index);
// Return the suffix array
return suffixArr;
}
/* To construct and return LCP */
vector<int> kasai(string txt, vector<int> suffixArr)
{
int n = suffixArr.size();
// To store LCP array
vector<int> lcp(n, 0);
// An auxiliary array to store inverse of suffix array
// elements. For example if suffixArr[0] is 5, the
// invSuff[5] would store 0. This is used to get next
// suffix string from suffix array.
vector<int> invSuff(n, 0);
// Fill values in invSuff[]
for (int i=0; i < n; i++)
invSuff[suffixArr[i]] = i;
// Initialize length of previous LCP
int k = 0;
// Process all suffixes one by one starting from
// first suffix in txt[]
for (int i=0; i<n; i++)
{
/* If the current suffix is at n-1, then we don’t
have next substring to consider. So lcp is not
defined for this substring, we put zero. */
if (invSuff[i] == n-1)
{
k = 0;
continue;
}
/* j contains index of the next substring to
be considered to compare with the present
substring, i.e., next string in suffix array */
int j = suffixArr[invSuff[i]+1];
// Directly start matching from k'th index as
// at-least k-1 characters will match
while (i+k<n && j+k<n && txt[i+k]==txt[j+k])
k++;
lcp[invSuff[i]] = k; // lcp for the present suffix.
// Deleting the starting character from the string.
if (k>0)
k--;
}
// return the constructed lcp array
return lcp;
}
// Utility function to print an array
void printArr(vector<int>arr, int n)
{
for (int i = 0; i < n; i++)
cout << arr[i] << " ";
cout << endl;
}
// Driver program
int main()
{
int t;
cin >> t;
//t = 1;
while (t > 0) {
//string str = "banana";
string str;
cin >> str; // >> k;
vector<int>suffixArr = buildSuffixArray(str, str.length());
int n = suffixArr.size();
cout << "Suffix Array : \n";
printArr(suffixArr, n);
vector<int>lcp = kasai(str, suffixArr);
cout << "\nLCP Array : \n";
printArr(lcp, n);
// cum will hold the number of distinct substrings, if that's what you want (total = cum[n-1])
cum[0] = n - suffixArr[0];
// vector <pair<int,int>> substrs[n];
int count = 1;
for (int i = 1; i <= n-suffixArr[0]; i++) {
//substrs[0].push_back({suffixArr[0],i});
string sub_str = str.substr(suffixArr[0],i);
cout << count << " " << sub_str << endl;
count++;
}
for(int i = 1;i < n;i++) {
cum[i] = cum[i-1] + (n - suffixArr[i] - lcp[i - 1]);
int end = n - suffixArr[i];
int begin = lcp[i-1] + 1;
int begin_suffix = suffixArr[i];
for (int j = begin, k = 1; j <= end; j++, k++) {
//substrs[i].push_back({begin_suffix, lcp[i-1] + k});
// cout << "i push " << i << " " << begin_suffix << " " << k << endl;
string sub_str = str.substr(begin_suffix, lcp[i-1] +k);
cout << count << " " << sub_str << endl;
count++;
}
}
/*int count = 1;
cout << endl;
for(int i = 0; i < n; i++){
for (auto it = substrs[i].begin(); it != substrs[i].end(); ++it ) {
string sub_str = str.substr(it->first, it->second);
cout << count << " " << sub_str << endl;
count++;
}
}*/
t--;
}
return 0;
}
And here's a simpler algorithm:
#include <iostream>
#include <string.h>
#include <vector>
#include <string>
#include <algorithm>
#include <time.h>
using namespace std;
char txt[100000], *p[100000];
int m, n;
int cmp(const void *p, const void *q) {
int rc = memcmp(*(char **)p, *(char **)q, m);
return rc;
}
int main() {
std::cin >> txt;
int start_s = clock();
n = strlen(txt);
int k; int i;
int count = 1;
for (m = 1; m <= n; m++) {
for (k = 0; k+m <= n; k++)
p[k] = txt+k;
qsort(p, k, sizeof(p[0]), &cmp);
for (i = 0; i < k; i++) {
if (i != 0 && cmp(&p[i-1], &p[i]) == 0){
continue;
}
char cur_txt[100000];
memcpy(cur_txt, p[i],m);
cur_txt[m] = '\0';
std::cout << count << " " << cur_txt << std::endl;
count++;
}
}
cout << --count << endl;
int stop_s = clock();
float run_time = (stop_s - start_s) / double(CLOCKS_PER_SEC);
cout << endl << "distinct substrings \t\tExecution time = " << run_time << " seconds" << endl;
return 0;
}
Both algorithms listed are simply too slow for extremely long strings, though. I tested them against a string of length over 47,000 and they took over 20 minutes to complete: 1200 seconds for the first and 1360 seconds for the second, and that is just counting the unique substrings without outputting to the terminal. So you can probably expect a working solution for strings of length up to about 1000. Both solutions did compute the same total number of unique substrings. I also tested both algorithms against strings of length 2000 and 10,000: the first took 0.33 s and 12 s, the second 0.535 s and 20 s. So it looks like the first algorithm is faster in general.

Here is my code in Python. It generates all possible subsequences of any given string (which include, but are not limited to, its contiguous substrings).
def find_substring(str_in):
    substrs = []
    if len(str_in) <= 1:
        return [str_in]
    s1 = find_substring(str_in[:1])   # results for the first character
    s2 = find_substring(str_in[1:])   # results for the rest of the string
    for s11 in s1:
        substrs.append(s11)
        for s21 in s2:
            # prepend the first character to every result from the rest
            substrs.append("%s%s" % (s11, s21))
    for s21 in s2:
        substrs.append(s21)
    return set(substrs)
If you pass str_in = "abcdef" to the function, it generates the following results:
a, ab, abc, abcd, abcde, abcdef, abcdf, abce, abcef, abcf, abd, abde, abdef, abdf, abe, abef, abf, ac, acd, acde, acdef, acdf, ace, acef, acf, ad, ade, adef, adf, ae, aef, af, b, bc, bcd, bcde, bcdef, bcdf, bce, bcef, bcf, bd, bde, bdef, bdf, be, bef, bf, c, cd, cde, cdef, cdf, ce, cef, cf, d, de, def, df, e, ef, f

Related

Maximum product prefix string

The following is a demo question from a coding interview site called codility:
A prefix of a string S is any leading contiguous part of S. For example, "c" and "cod" are prefixes of the string "codility". For simplicity, we require prefixes to be non-empty.
The product of prefix P of string S is the number of occurrences of P multiplied by the length of P. More precisely, if prefix P consists of K characters and P occurs exactly T times in S, then the product equals K * T.
For example, S = "abababa" has the following prefixes:
"a", whose product equals 1 * 4 = 4,
"ab", whose product equals 2 * 3 = 6,
"aba", whose product equals 3 * 3 = 9,
"abab", whose product equals 4 * 2 = 8,
"ababa", whose product equals 5 * 2 = 10,
"ababab", whose product equals 6 * 1 = 6,
"abababa", whose product equals 7 * 1 = 7.
The longest prefix is identical to the original string. The goal is to choose such a prefix as maximizes the value of the product. In above example the maximal product is 10.
Below is my poor solution in Java, requiring O(N^2) time. It is apparently possible to do this in O(N). I was thinking of Kadane's algorithm, but I can't think of any way to encode some information at each step that lets me find the running max. Can anyone think of an O(N) algorithm for this?
import java.util.HashMap;
class Solution {
public int solution(String S) {
int N = S.length();
if(N<1 || N>300000){
System.out.println("Invalid length");
return(-1);
}
HashMap<String,Integer> prefixes = new HashMap<String,Integer>();
for(int i=0; i<N; i++){
String keystr = "";
for(int j=i; j>=0; j--) {
keystr += S.charAt(j);
if(!prefixes.containsKey(keystr))
prefixes.put(keystr,keystr.length());
else{
int newval = prefixes.get(keystr)+keystr.length();
if(newval > 1000000000)return 1000000000;
prefixes.put(keystr,newval);
}
}
}
int maax1 = 0;
for(int val : prefixes.values())
if(val>maax1)
maax1 = val;
return maax1;
}
}
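One possible route to linear time, offered here as a sketch rather than as the intended Codility solution, is the Z-function: z[i] is the length of the longest common prefix of S and S[i:]. The prefix of length K occurs at position i exactly when z[i] >= K, so a single backward pass over a histogram of Z-values yields every occurrence count. The names `z_function` and `max_prefix_product` are chosen for illustration, and the 1,000,000,000 cap from the Java code is left out.

```python
def z_function(s):
    """z[i] = length of the longest common prefix of s and s[i:]."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n
    left = right = 0  # [left, right) is the rightmost match window found
    for i in range(1, n):
        if i < right:
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def max_prefix_product(s):
    """Maximum over prefixes P of len(P) * (number of occurrences of P in s)."""
    n = len(s)
    histogram = [0] * (n + 1)  # histogram[k] = positions with Z-value exactly k
    for value in z_function(s):
        histogram[value] += 1
    best = 0
    occurrences = 0
    for k in range(n, 0, -1):
        occurrences += histogram[k]  # positions where the length-k prefix occurs
        best = max(best, k * occurrences)
    return best


print(max_prefix_product("abababa"))  # 10
```

For "abababa" the Z-values are [7, 0, 5, 0, 3, 0, 1], which reproduces the occurrence counts 4, 3, 3, 2, 2, 1, 1 listed in the question.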

Here's an O(n log n) version based on suffix arrays. There are O(n) construction algorithms for suffix arrays; I just don't have the patience to code them.
Example output (producing this output isn't O(n), but it's only to show that we can indeed compute all the scores):
4*1 a
3*3 aba
2*5 ababa
1*7 abababa
3*2 ab
2*4 abab
1*6 ababab
Basically you have to reverse the string and compute the suffix array (SA) and the longest common prefix (LCP) array.
Then you have to traverse the SA array backwards, looking for LCPs that match the entire suffix (a prefix in the original string). If there's a match, increment the counter; otherwise reset it to 1. Each suffix (prefix) receives a "score" (SCR) that corresponds to the number of times it appears in the original string.
#include <algorithm>
#include <iostream>
#include <cstring>
#include <string>
#define MAX 10050
using namespace std;
int RA[MAX], tempRA[MAX];
int SA[MAX], tempSA[MAX];
int C[MAX];
int Phi[MAX], PLCP[MAX], LCP[MAX];
int SCR[MAX];
void suffix_sort(int n, int k) {
memset(C, 0, sizeof C);
for (int i = 0; i < n; i++)
C[i + k < n ? RA[i + k] : 0]++;
int sum = 0;
for (int i = 0; i < max(256, n); i++) {
int t = C[i];
C[i] = sum;
sum += t;
}
for (int i = 0; i < n; i++)
tempSA[C[SA[i] + k < n ? RA[SA[i] + k] : 0]++] = SA[i];
memcpy(SA, tempSA, n*sizeof(int));
}
void suffix_array(string &s) {
int n = s.size();
for (int i = 0; i < n; i++)
RA[i] = s[i] - 1;
for (int i = 0; i < n; i++)
SA[i] = i;
for (int k = 1; k < n; k *= 2) {
suffix_sort(n, k);
suffix_sort(n, 0);
int r = tempRA[SA[0]] = 0;
for (int i = 1; i < n; i++) {
int s1 = SA[i], s2 = SA[i-1];
bool equal = true;
equal &= RA[s1] == RA[s2];
equal &= RA[s1+k] == RA[s2+k];
tempRA[SA[i]] = equal ? r : ++r;
}
memcpy(RA, tempRA, n*sizeof(int));
}
}
void lcp(string &s) {
int n = s.size();
Phi[SA[0]] = -1;
for (int i = 1; i < n; i++)
Phi[SA[i]] = SA[i-1];
int L = 0;
for (int i = 0; i < n; i++) {
if (Phi[i] == -1) {
PLCP[i] = 0;
continue;
}
while (s[i + L] == s[Phi[i] + L])
L++;
PLCP[i] = L;
L = max(L-1, 0);
}
for (int i = 1; i < n; i++)
LCP[i] = PLCP[SA[i]];
}
void score(string &s) {
SCR[s.size()-1] = 1;
int sum = 1;
for (int i=s.size()-2; i>=0; i--) {
if (LCP[i+1] < s.size()-SA[i]-1) {
sum = 1;
} else {
sum++;
}
SCR[i] = sum;
}
}
int main() {
string s = "abababa";
s = string(s.rbegin(), s.rend()) +".";
suffix_array(s);
lcp(s);
score(s);
for(int i=0; i<s.size(); i++) {
string ns = s.substr(SA[i], s.size()-SA[i]-1);
ns = string(ns.rbegin(), ns.rend());
cout << SCR[i] << "*" << ns.size() << " " << ns << endl;
}
}
Most of this code (especially the suffix array and LCP implementations) I have been using for some years in contests. This version in particular I adapted from one I wrote some years ago.

public class Main {
public static void main(String[] args) {
String input = "abababa";
String prefix;
int product;
int maxProduct = 0;
for (int i = 1; i <= input.length(); i++) {
prefix = input.substring(0, i);
String substr;
int occurs = 0;
for (int j = prefix.length(); j <= input.length(); j++) {
substr = input.substring(0, j);
if (substr.endsWith(prefix))
occurs++;
}
product = occurs*prefix.length();
System.out.println("product of " + prefix + " = " +
prefix.length() + " * " + occurs +" = " + product);
maxProduct = (product > maxProduct)?product:maxProduct;
}
System.out.println("maxProduct = " + maxProduct);
}
}

I worked on this challenge for more than 4 days, reading a lot of documentation, and found a solution that is intended to be O(N).
I got 81%. The idea is simple: use a sliding window.
def solution(s: String): Int = {
var max = s.length // length of the string
var i, j = 1 // start with i=j=1 (i is the beginning of the slide and j the end of the slide)
val len = s.length // the length of the string
val count = Array.ofDim[Int](len) // to store intermediate results
while (i < len - 1 || j < len) {
if (i < len && s(0) != s(i)) {
while (i < len && s(0) != s(i)) { // if the begin of the slide is different from
// the first letter of the string skip it
i = i + 1
}
}
j = i + 1
var k = 1
while (j < len && s(j).equals(s(k))) { // check for equality and update the array count
if (count(k) == 0) {
count(k) = 1
}
count(k) = count(k) + 1
max = math.max((k + 1) * count(k), max)
k = k + 1
j = j + 1
}
i = i + 1
}
max // return the max
}

Dividing array in two equal parts such that difference of sum of numbers of each array is minimum [duplicate]

Given a set of numbers, divide the numbers into two subsets such that difference between the sum of numbers in two subsets is minimal.
This is the idea that I have, but I am not sure if this is a correct solution:
Sort the array
Take the first 2 elements. Consider them as 2 sets (each having 1 element)
Take the next element from the array.
Decide in which set should this element go (by computing the sum => it should be minimum)
Repeat
Is this the correct solution? Can we do better?

The decision version of the problem you are describing is an NP-complete problem and it is called the partition problem. There are a number of approximations which provide, in many cases, optimal or, at least, good enough solutions.
The simple algorithm you described is the way playground kids would pick teams. This greedy algorithm performs remarkably well if the numbers in the set are of similar orders of magnitude.
The article The Easiest Hardest Problem, in American Scientist, gives an excellent analysis of the problem. You should go through and read it!

No, that doesn't work. There is no known polynomial-time solution (none exists unless P=NP). In general the best you can do is look at all the different subsets. Have a look at the subset sum problem.
Consider the list [0, 1, 5, 6]. You will claim {0, 5} and {1, 6}, when the best answer is actually {0, 1, 5} and {6}.
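The counterexample is easy to verify with a short Python sketch. `greedy_diff` is one reading of the procedure from the question (sort, seed the two sets with the first two elements, then always add to the lighter set), and `exhaustive_diff` tries every subset; both names are made up for this illustration.

```python
from itertools import combinations


def greedy_diff(nums):
    """Difference produced by the greedy procedure described in the question."""
    sums = [0, 0]
    for idx, x in enumerate(sorted(nums)):
        if idx < 2:
            sums[idx] += x  # the first two elements seed the two sets
        else:
            lighter = 0 if sums[0] <= sums[1] else 1
            sums[lighter] += x
    return abs(sums[0] - sums[1])


def exhaustive_diff(nums):
    """Smallest achievable difference, found by trying every subset."""
    total = sum(nums)
    best = abs(total)
    for size in range(len(nums) + 1):
        for chosen in combinations(nums, size):
            best = min(best, abs(total - 2 * sum(chosen)))
    return best


print(greedy_diff([0, 1, 5, 6]))      # 2
print(exhaustive_diff([0, 1, 5, 6]))  # 0
```

The exhaustive search is exponential in the number of elements, so it is only useful as a reference for small inputs.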

No, your algorithm is wrong: it follows a greedy approach.
I implemented your approach and it failed on the test case below:
(You may try here)
A greedy algorithm:
#include<bits/stdc++.h>
#define rep(i,_n) for(int i=0;i<_n;i++)
using namespace std;
#define MXN 55
int a[MXN];
int main() {
//code
int t,n,c;
cin>>t;
while(t--){
cin>>n;
rep(i,n) cin>>a[i];
sort(a, a+n);
reverse(a, a+n);
long long sum1 = 0, sum2 = 0;
rep(i,n){
cout<<a[i]<<endl;
if(sum1<=sum2)
sum1 += a[i];
else
sum2 += a[i];
}
cout<<abs(sum1-sum2)<<endl;
}
return 0;
}
Test case:
1
8
16 14 13 13 12 10 9 3
Wrong Ans: 6
16 13 10 9
14 13 12 3
Correct Ans: 0
16 13 13 3
14 12 10 9
The greedy algorithm fails because it always tries to minimize the current difference without exploring or knowing about later possibilities. In a correct solution you might put an element into the set that currently has the larger sum and compensate later with a much smaller element, as in the test case above.
Correct Solution:
To understand the solution, you will need to understand all below problems in order:
0/1 Knapsack with Dynamic Programming
Partition Equal Subset Sum with DP
Solution
My Code (Same logic as this):
#include<bits/stdc++.h>
#define rep(i,_n) for(int i=0;i<_n;i++)
using namespace std;
#define MXN 55
int arr[MXN];
int dp[MXN][MXN*MXN];
int main() {
//code
int t,N,c;
cin>>t;
while(t--){
rep(i,MXN) fill(dp[i], dp[i]+MXN*MXN, 0);
cin>>N;
rep(i,N) cin>>arr[i];
int sum = accumulate(arr, arr+N, 0);
dp[0][0] = 1;
for(int i=1; i<=N; i++)
for(int j=sum; j>=0; j--)
dp[i][j] |= (dp[i-1][j] | (j>=arr[i-1] ? dp[i-1][j-arr[i-1]] : 0));
int res = sum;
for(int i=0; i<=sum/2; i++)
if(dp[N][i]) res = min(res, abs(i - (sum-i)));
cout<<res<<endl;
}
return 0;
}

Combinations over combinations approach:
import itertools as it
def min_diff_sets(data):
    """
    Parameters:
    - `data`: input list.
    Return:
    - min diff between sum of numbers in two sets
    """
    if len(data) == 1:
        return data[0]
    s = sum(data)
    # `a` is list of all possible combinations of all possible lengths (from 1
    # to len(data) )
    a = []
    for i in range(1, len(data)):
        a.extend(list(it.combinations(data, i)))
    # `b` is list of all possible pairs (combinations) of all elements from `a`
    b = it.combinations(a, 2)
    # `c` is going to be final correct list of combinations.
    # Let's apply 2 filters:
    # 1. leave only pairs where: sum of all elements == sum(data)
    # 2. leave only pairs where: flat list from pairs == data
    c = filter(lambda x: sum(x[0])+sum(x[1])==s, b)
    c = filter(lambda x: sorted([i for sub in x for i in sub])==sorted(data), c)
    # `res` = [min_diff_between_sum_of_numbers_in_two_sets,
    #          ((set_1), (set_2))
    #         ]
    res = sorted([(abs(sum(i[0]) - sum(i[1])), i) for i in c],
                 key=lambda x: x[0])
    return min([i[0] for i in res])
if __name__ == '__main__':
    assert min_diff_sets([10, 10]) == 0, "1st example"
    assert min_diff_sets([10]) == 10, "2nd example"
    assert min_diff_sets([5, 8, 13, 27, 14]) == 3, "3rd example"
    assert min_diff_sets([5, 5, 6, 5]) == 1, "4th example"
    assert min_diff_sets([12, 30, 30, 32, 42, 49]) == 9, "5th example"
    assert min_diff_sets([1, 1, 1, 3]) == 0, "6th example"

The recursive approach is to generate all possible sums from the values of the array and check which solution is optimal.
To generate the sums, we either include the i'th item in set 1 or leave it out, i.e., include it in set 2.
With memoization, both the time and the space complexity are O(n*sum).
public class MinimumSubsetSum {
static int dp[][];
public static int minDiffSubsets(int arr[], int i, int calculatedSum, int totalSum) {
if(dp[i][calculatedSum] != -1) return dp[i][calculatedSum];
/**
* If i=0, then the sum of one subset has been calculated as we have reached the last
* element. The sum of another subset is totalSum - calculated sum. We need to return the
* difference between them.
*/
if(i == 0) {
return Math.abs((totalSum - calculatedSum) - calculatedSum);
}
//Including the ith element
int iElementIncluded = minDiffSubsets(arr, i-1, arr[i-1] + calculatedSum,
totalSum);
//Excluding the ith element
int iElementExcluded = minDiffSubsets(arr, i-1, calculatedSum, totalSum);
int res = Math.min(iElementIncluded, iElementExcluded);
dp[i][calculatedSum] = res;
return res;
}
public static void util(int arr[]) {
int totalSum = 0;
int n = arr.length;
for(Integer e : arr) totalSum += e;
dp = new int[n+1][totalSum+1];
for(int i=0; i <= n; i++)
for(int j=0; j <= totalSum; j++)
dp[i][j] = -1;
int res = minDiffSubsets(arr, n, 0, totalSum);
System.out.println("The min difference between two subset is " + res);
}
public static void main(String[] args) {
util(new int[]{3, 1, 4, 2, 2, 1});
}
}

We can use dynamic programming (similar to the way we check whether a set can be partitioned into two equal-sum subsets). We then find the largest achievable subset sum that does not exceed half of the total; this is our first partition.
The second partition is the total sum minus firstPart.
The answer is the difference between the first and second partitions.
public int minDiffernce(int set[]) {
int sum = 0;
int n = set.length;
for(int i=0; i<n; i++)
sum+=set[i];
// target is half of the total sum: the difference is smallest (0 at best) when one subset reaches exactly half
int target = sum/2;
boolean[][] dp = new boolean[n+1][target+1];
for(int i = 0; i<=n; i++)
dp[i][0] = true;
for(int i= 1; i<=n; i++){
for(int j = 1; j<=target;j++){
if(set[i-1]>j) dp[i][j] = dp[i-1][j];
else dp[i][j] = dp[i-1][j] || dp[i-1][j-set[i-1]];
}
}
// we now find the max sum possible starting from target
int firstPart = 0;
for(int j = target; j>=0; j--){
if(dp[n][j] == true) {
firstPart = j; break;
}
}
int secondPart = sum - firstPart;
return Math.abs(firstPart - secondPart);
}
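The same reachability table can be compressed into a single Python integer used as a bitset. This is a sketch of a variation, not the code above translated line by line: it assumes the inputs are non-negative integers, and the name `min_partition_diff` is illustrative. Bit j of `reachable` is set when some subset sums to j.

```python
def min_partition_diff(nums):
    """Minimum difference between two subset sums (non-negative integers only)."""
    total = sum(nums)
    reachable = 1  # bit 0 set: the empty subset sums to 0
    for x in nums:
        # Every previously reachable sum j also makes j + x reachable.
        reachable |= reachable << x
    # Largest reachable sum not exceeding half of the total.
    for j in range(total // 2, -1, -1):
        if (reachable >> j) & 1:
            return total - 2 * j


print(min_partition_diff([16, 14, 13, 13, 12, 10, 9, 3]))  # 0
print(min_partition_diff([3, 1, 4, 2, 2, 1]))              # 1
```

The running time is still pseudo-polynomial, since the integer grows to about `total` bits, but each shift-and-or processes many sums at once.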

One small change: reverse the order, i.e. start with the largest number and work down. This tends to reduce the error, although the greedy result is still not guaranteed to be optimal.

Are you sorting your subset into descending order or ascending order?
Think about it like this, the array {1, 3, 5, 8, 9, 25}
if you were to divide, you would have {1,8,9} =18 {3,5,25} =33
If it were sorted into descending order it would work out a lot better
{25,1}=26 {9,8,5,3}=25
So your solution is basically correct, it just needs to make sure to take the largest values first.
EDIT: Read tskuzzy's post. Mine does not work

This is a variation of the knapsack and subset sum problems.
In the subset sum problem, we are given n positive integers and a value k, and we have to find the subset whose sum is as large as possible while being less than or equal to k.
In the problem above we are given an array, and we have to find the subset whose sum is as large as possible while being less than or equal to total_sum/2 (where total_sum is the sum of the array values).
That subset sum can be found using a variation of the knapsack algorithm, taking the given array values as both weights and profits. The final answer is total_sum - 2*dp[n][total_sum/2]. Have a look at the code below for a clear understanding.
#include<iostream>
#include<cstdio>
#include<algorithm>
using namespace std;
int main()
{
int n;
cin>>n;
int arr[n+1],sum=0; // arr is indexed from 1 to n below, so it needs n+1 slots
for(int i=1;i<=n;i++)
cin>>arr[i],sum+=arr[i];
int temp=sum/2;
int dp[n+1][temp+2];
for(int i=0;i<=n;i++)
{
for(int j=0;j<=temp;j++)
{
if(i==0 || j==0)
dp[i][j]=0;
else if(arr[i]<=j)
dp[i][j]=max(dp[i-1][j],dp[i-1][j-arr[i]]+arr[i]);
else
{
dp[i][j]=dp[i-1][j];
}
}
}
cout<<sum-2*dp[n][temp]<<endl;
}

This can be solved using a binary search (BST).
First sort the array, say arr1.
To start, create another array arr2 containing the last element of arr1 (remove this element from arr1).
Now repeat the following steps until no swap happens:
1. Check arr1 for an element which can be moved to arr2, using BST, such that the diff is less than the MIN diff found so far.
2. If we find an element, move it to arr2 and go to step 1 again.
3. If we don't find any element in the steps above, do steps 1 & 2 for arr2 & arr1, i.e. now check if we have any element in arr2 which can be moved to arr1.
4. Continue steps 1-4 until we don't need any swap.
Then we have the solution.
Sample Java Code:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
/**
* Divide an array so that the difference between these 2 is min
*
* @author shaikhjamir
*
*/
public class DivideArrayForMinDiff {
/**
* Create 2 arrays and try to find the element from 2nd one so that diff is
* min than the current one
*/
private static int sum(List<Integer> arr) {
int total = 0;
for (int i = 0; i < arr.size(); i++) {
total += arr.get(i);
}
return total;
}
private static int diff(ArrayList<Integer> arr, ArrayList<Integer> arr2) {
int diff = sum(arr) - sum(arr2);
if (diff < 0)
diff = diff * -1;
return diff;
}
private static int MIN = Integer.MAX_VALUE;
private static int binarySearch(int low, int high, ArrayList<Integer> arr1, int arr2sum) {
if (low > high || low < 0)
return -1;
int mid = (low + high) / 2;
int midVal = arr1.get(mid);
int sum1 = sum(arr1);
int resultOfMoveOrg = (sum1 - midVal) - (arr2sum + midVal);
int resultOfMove = (sum1 - midVal) - (arr2sum + midVal);
if (resultOfMove < 0)
resultOfMove = resultOfMove * -1;
if (resultOfMove < MIN) {
// lets do the swap
return mid;
}
// this is positive number greater than min
// which mean we should move left
if (resultOfMoveOrg < 0) {
// 1,10, 19 ==> 30
// 100
// 20, 110 = -90
// 29, 111 = -83
return binarySearch(low, mid - 1, arr1, arr2sum);
} else {
// resultOfMoveOrg > 0
// 1,5,10, 15, 19, 20 => 70
// 21
// For 10
// 60, 31 it will be 29
// now if we move 1
// 71, 22 ==> 49
// but now if we move 20
// 50, 41 ==> 9
return binarySearch(mid + 1, high, arr1, arr2sum);
}
}
private static int findMin(ArrayList<Integer> arr1) {
ArrayList<Integer> list2 = new ArrayList<>(arr1.subList(arr1.size() - 1, arr1.size()));
arr1.remove(arr1.size() - 1);
while (true) {
int index = binarySearch(0, arr1.size() - 1, arr1, sum(list2)); // high is the last valid index
if (index != -1) {
int val = arr1.get(index);
arr1.remove(index);
list2.add(val);
Collections.sort(list2);
MIN = diff(arr1, list2);
} else {
// now try for arr2
int index2 = binarySearch(0, list2.size() - 1, list2, sum(arr1)); // high is the last valid index
if (index2 != -1) {
int val = list2.get(index2);
list2.remove(index2);
arr1.add(val);
Collections.sort(arr1);
MIN = diff(arr1, list2);
} else {
// no switch in both the cases
break;
}
}
}
System.out.println("MIN==>" + MIN);
System.out.println("arr1==>" + arr1 + ":" + sum(arr1));
System.out.println("list2==>" + list2 + ":" + sum(list2));
return 0;
}
public static void main(String args[]) {
ArrayList<Integer> org = new ArrayList<>();
org = new ArrayList<>();
org.add(1);
org.add(2);
org.add(3);
org.add(7);
org.add(8);
org.add(10);
findMin(org);
}
}

you can use bits to solve this problem by looping over all the possible combinations using bits:
main algorithm:
for(int i = 0; i < 1<<n; i++) {
int s = 0;
for(int j = 0; j < n; j++) {
if(i & 1<<j) s += arr[j];
}
int curr = abs((total-s)-s);
ans = min(ans, curr);
}
Use long long for larger inputs.
I also found a recursive solution and a dynamic programming solution; I used both approaches to solve the question and both worked perfectly fine for larger inputs. Hope this helps :) link to solution
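The bit-enumeration loop above can also be written as a small self-contained Python function. This is a sketch only (the name `min_difference_bruteforce` is made up here); it takes O(2^n * n) time, so it is usable only for small arrays.

```python
def min_difference_bruteforce(values):
    """Try every subset, encoded as a bitmask, and keep the smallest difference."""
    n = len(values)
    total = sum(values)
    best = abs(total)  # mask 0: everything on one side
    for mask in range(1 << n):
        # Sum the elements whose bit is set in this mask.
        s = sum(values[j] for j in range(n) if mask & (1 << j))
        best = min(best, abs((total - s) - s))
    return best


print(min_difference_bruteforce([1, 2, 3, 7, 8, 10]))
```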

Please check this logic which I have written for this problem. It worked for the few scenarios I checked. Please comment on the solution.
Approach:
Sort the main array and divide it into 2 teams.
Then start making the teams equal by shifting and swapping elements from one array to the other, based on the conditions mentioned in the code.
If the smallest element of the bigger array (the array with the bigger sum) is less than the difference of the sums, shift elements from the bigger array to the smaller array. Only elements of the bigger array whose value is less than or equal to the difference are shifted. When all the elements of the bigger array are greater than the difference, the shifting stops and swapping happens. I am just swapping the last elements of the arrays (it could be made more efficient by finding which two elements to swap), but this still worked. Let me know if this logic fails in any scenario.
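import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
public class SmallestDifference {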
static int sum1 = 0, sum2 = 0, diff, minDiff;
private static List<Integer> minArr1;
private static List<Integer> minArr2;
private static List<Integer> biggerArr;
/**
* @param args
*/
public static void main(String[] args) {
SmallestDifference sm = new SmallestDifference();
Integer[] array1 = { 2, 7, 1, 4, 5, 9, 10, 11 };
List<Integer> array = new ArrayList<Integer>();
for (Integer val : array1) {
array.add(val);
}
Collections.sort(array);
CopyOnWriteArrayList<Integer> arr1 = new CopyOnWriteArrayList<>(array.subList(0, array.size() / 2));
CopyOnWriteArrayList<Integer> arr2 = new CopyOnWriteArrayList<>(array.subList(array.size() / 2, array.size()));
diff = Math.abs(sm.getSum(arr1) - sm.getSum(arr2));
minDiff = array.get(0);
sm.updateSum(arr1, arr2);
System.out.println(arr1 + " : " + arr2);
System.out.println(sum1 + " - " + sum2 + " = " + diff + " : minDiff = " + minDiff);
int k = arr2.size();
biggerArr = arr2;
while (diff != 0 && k >= 0) {
while (diff != 0 && sm.findMin(biggerArr) < diff) {
sm.swich(arr1, arr2);
int sum1 = sm.getSum(arr1), sum2 = sm.getSum(arr2);
diff = Math.abs(sum1 - sum2);
if (sum1 > sum2) {
biggerArr = arr1;
} else {
biggerArr = arr2;
}
if (minDiff > diff || sm.findMin(biggerArr) > diff) {
minDiff = diff;
minArr1 = new CopyOnWriteArrayList<>(arr1);
minArr2 = new CopyOnWriteArrayList<>(arr2);
}
sm.updateSum(arr1, arr2);
System.out.println("Shifting : " + sum1 + " - " + sum2 + " = " + diff + " : minDiff = " + minDiff);
}
while (k >= 0 && minDiff > array.get(0) && minDiff != 0) {
sm.swap(arr1, arr2);
diff = Math.abs(sm.getSum(arr1) - sm.getSum(arr2));
if (minDiff > diff) {
minDiff = diff;
minArr1 = new CopyOnWriteArrayList<>(arr1);
minArr2 = new CopyOnWriteArrayList<>(arr2);
}
sm.updateSum(arr1, arr2);
System.out.println("Swapping : " + sum1 + " - " + sum2 + " = " + diff + " : minDiff = " + minDiff);
k--;
}
k--;
}
System.out.println(minArr1 + " : " + minArr2 + " = " + minDiff);
}
private void updateSum(CopyOnWriteArrayList<Integer> arr1, CopyOnWriteArrayList<Integer> arr2) {
SmallestDifference sm1 = new SmallestDifference();
sum1 = sm1.getSum(arr1);
sum2 = sm1.getSum(arr2);
}
private int findMin(List<Integer> biggerArr2) {
Integer min = biggerArr2.get(0);
for (Integer integer : biggerArr2) {
if(min > integer) {
min = integer;
}
}
return min;
}
private int getSum(CopyOnWriteArrayList<Integer> arr) {
int sum = 0;
for (Integer val : arr) {
sum += val;
}
return sum;
}
private void swap(CopyOnWriteArrayList<Integer> arr1, CopyOnWriteArrayList<Integer> arr2) {
int l1 = arr1.size(), l2 = arr2.size(), temp2 = arr2.get(l2 - 1), temp1 = arr1.get(l1 - 1);
arr1.remove(l1 - 1);
arr1.add(temp2);
arr2.remove(l2 - 1);
arr2.add(temp1);
System.out.println(arr1 + " : " + arr2);
}
private void swich(CopyOnWriteArrayList<Integer> arr1, CopyOnWriteArrayList<Integer> arr2) {
Integer e;
if (sum1 > sum2) {
e = this.findElementJustLessThanMinDiff(arr1);
arr1.remove(e);
arr2.add(e);
} else {
e = this.findElementJustLessThanMinDiff(arr2);
arr2.remove(e);
arr1.add(e);
}
System.out.println(arr1 + " : " + arr2);
}
private Integer findElementJustLessThanMinDiff(CopyOnWriteArrayList<Integer> arr1) {
Integer e = arr1.get(0);
int tempDiff = diff - e;
for (Integer integer : arr1) {
if (diff > integer && (diff - integer) < tempDiff) {
e = integer;
tempDiff = diff - e;
}
}
return e;
}
}

A possible solution here- https://stackoverflow.com/a/31228461/4955513
This Java program seems to solve this problem, provided one condition is fulfilled- that there is one and only one solution to the problem.

I'll convert this problem to the subset sum problem.
Let's take the array int[] A = { 10,20,15,5,25,33 };
It should be divided into {25 20 10} and {33 15 5}, and the answer is 55-53=2.
Notation: SUM == sum of whole array
sum1 == sum of subset1
sum2 == sum of subset2
step 1: get the sum of the whole array: SUM=108
step 2: whichever way we divide the array into two parts, one thing remains true:
sum1 + sum2 = SUM
step 3: if our intention is to get the minimum sum difference, then sum1 and sum2 should be near SUM/2 (example: sum1=54 and sum2=54, then diff=0)
step 4: let's try combinations
sum1 = 54 AND sum2 = 54 (not possible to divide like this)
sum1 = 55 AND sum2 = 53 (possible and our solution, should break here)
sum1 = 56 AND sum2 = 52
sum1 = 57 AND sum2 = 51 .......so on
pseudo code
SUM=Array.sum();
sum1 = SUM/2;
sum2 = SUM-sum1;
while(true){
if(subSetSuMProblem(A,sum1) && subSetSuMProblem(A,sum2)){
print "possible"
break;
}
else{
sum1++;
sum2--;
}
}
Java code for the same
import java.util.ArrayList;
import java.util.List;
public class MinimumSumSubsetPrint {
public static void main(String[] args) {
int[] A = {10, 20, 15, 5, 25, 33}; // same array as in the explanation above
int sum = 0;
for (int i = 0; i < A.length; i++) {
sum += A[i];
}
subsetSumDynamic(A, sum);
}
private static boolean subsetSumDynamic(int[] A, int sum) {
int n = A.length;
boolean[][] T = new boolean[n + 1][sum + 1];
// sum2[0][0]=true;
for (int i = 0; i <= n; i++) {
T[i][0] = true;
}
for (int i = 1; i <= n; i++) {
for (int j = 1; j <= sum; j++) {
if (A[i - 1] > j) {
T[i][j] = T[i - 1][j];
} else {
T[i][j] = T[i - 1][j] || T[i - 1][j - A[i - 1]];
}
}
}
int sum1 = sum / 2;
int sum2 = sum - sum1;
while (true) {
if (T[n][sum1] && T[n][sum2]) {
printSubsets(T, sum1, n, A);
printSubsets(T, sum2, n, A);
break;
} else {
sum1 = sum1 - 1;
sum2 = sum - sum1;
System.out.println(sum1 + ":" + sum2);
}
}
return T[n][sum];
}
private static void printSubsets(boolean[][] T, int sum, int n, int[] A) {
List<Integer> sumvals = new ArrayList<Integer>();
int i = n;
int j = sum;
while (i > 0 && j > 0) {
if (T[i][j] == T[i - 1][j]) {
i--;
} else {
sumvals.add(A[i - 1]);
j = j - A[i - 1];
i--;
}
}
System.out.println();
for (int p : sumvals) {
System.out.print(p + " ");
}
System.out.println();
}
}

Here is a recursive approach:
def helper(arr, sumCal, sumTot, n):
    # sumCal is the sum of the chosen subset; the other subset sums to sumTot - sumCal
    if n == 0:
        return abs((sumTot - sumCal) - sumCal)
    return min(helper(arr, sumCal + arr[n-1], sumTot, n-1), helper(arr, sumCal, sumTot, n-1))
def minimum_subset_diff(arr, n):
    total = 0
    for i in range(n):
        total += arr[i]
    return helper(arr, 0, total, n)
Here is a top-down dynamic programming approach to reduce the time complexity:
dp = {}  # memo keyed by (n, sumCal); sumTot never changes during one run
def helper_dp(arr, sumCal, sumTot, n):
    if n == 0:
        return abs((sumTot - sumCal) - sumCal)
    if (n, sumCal) not in dp:
        # store the result so each (n, sumCal) state is computed only once
        dp[(n, sumCal)] = min(helper_dp(arr, sumCal + arr[n-1], sumTot, n-1), helper_dp(arr, sumCal, sumTot, n-1))
    return dp[(n, sumCal)]
def minimum_subset_diff_dp(arr, n):
    dp.clear()  # cached results depend on arr, so reset between calls
    total = 0
    for i in range(n):
        total += arr[i]
    return helper_dp(arr, 0, total, n)

int ModDiff(int a, int b)
{
if(a < b)return b - a;
return a-b;
}
int EqDiv(int *a, int l, int *SumI, int *SumE)
{
static int tc = 0;
int min = ModDiff(*SumI,*SumE);
for(int i = 0; i < l; i++)
{
swap(a,0,i);
a++;
int m1 = EqDiv(a, l-1, SumI,SumE);
a--;
swap(a,0,i);
*SumI = *SumI + a[i];
*SumE = *SumE - a[i];
swap(a,0,i);
a++;
int m2 = EqDiv(a,l-1, SumI,SumE);
a--;
swap(a,0,i);
*SumI = *SumI - a[i];
*SumE = *SumE + a[i];
min = min3(min,m1,m2);
}
return min;
}
Call the function with SumI = 0 and SumE = the sum of all the elements in a.
This O(n!) solution does compute how to divide the given array into 2 parts such that the difference is minimal.
But it is definitely not practical due to the n! time complexity; I am looking to improve this using DP.

#include<bits/stdc++.h>
using namespace std;
bool ison(int i,int x)
{
if((i>>x) & 1)return true;
return false;
}
int main()
{
// cout<<"enter the number of elements : ";
int n;
cin>>n;
int a[n];
for(int i=0;i<n;i++)
cin>>a[i];
int sumarr1[(1<<n)-1];
int sumarr2[(1<<n)-1];
memset(sumarr1,0,sizeof(sumarr1));
memset(sumarr2,0,sizeof(sumarr2));
int index=0;
vector<int>v1[(1<<n)-1];
vector<int>v2[(1<<n)-1];
for(int i=1;i<(1<<n);i++)
{
for(int j=0;j<n;j++)
{
if(ison(i,j))
{
sumarr1[index]+=a[j];
v1[index].push_back(a[j]);
}
else
{
sumarr2[index]+=a[j];
v2[index].push_back(a[j]);
}
}index++;
}
int ans=INT_MAX;
int ii;
for(int i=0;i<index;i++)
{
if(abs(sumarr1[i]-sumarr2[i])<ans)
{
ii=i;
ans=abs(sumarr1[i]-sumarr2[i]);
}
}
cout<<"first partitioned array : ";
for(int i=0;i<v1[ii].size();i++)
{
cout<<v1[ii][i]<<" ";
}
cout<<endl;
cout<<"2nd partitioned array : ";
for(int i=0;i<v2[ii].size();i++)
{
cout<<v2[ii][i]<<" ";
}
cout<<endl;
cout<<"minimum difference is : "<<ans<<endl;
}

Many answers mention getting an 'approximate' solution within a very acceptable time bound. But since this was asked in an interview, I don't expect they need an approximation algorithm. I don't expect they need a naive exponential algorithm either.
Coming to the problem: assuming the maximum value of the sum of the numbers is known, it can in fact be solved in time polynomial in n and that sum using dynamic programming. Refer to this link:
https://people.cs.clemson.edu/~bcdean/dp_practice/dp_4.swf

Hi, I think this problem can be approached in linear time on a sorted array, with no polynomial-time algorithm required: rather than choosing the next element, you can choose the next two elements and decide which element goes to which side, in a way that minimizes the difference. Suppose we have
{0,1,5,6},
choose {0,1}
{0} , {1}
choose 5,6
{0,6}, {1,5}
But that is still not an exact solution: at the end there will be a difference x between the sums of the 2 arrays,
and there can be a better solution with a difference less than x.
For that, apply one more greedy pass over the sorted half-sized arrays
and move an element of value x/2 (or nearby) from one set to the other, or exchange elements whose difference is x/2, so that the difference is minimized.

Find a pair of elements from an array whose sum equals a given number

Given array of n integers and given a number X, find all the unique pairs of elements (a,b), whose summation is equal to X.
The following is my solution, it is O(nLog(n)+n), but I am not sure whether or not it is optimal.
int main(void)
{
int arr [10] = {1,2,3,4,5,6,7,8,9,0};
findpair(arr, 10, 7);
}
void findpair(int arr[], int len, int sum)
{
std::sort(arr, arr+len);
int i = 0;
int j = len -1;
while( i < j){
while((arr[i] + arr[j]) <= sum && i < j)
{
if((arr[i] + arr[j]) == sum)
cout << "(" << arr[i] << "," << arr[j] << ")" << endl;
i++;
}
j--;
while((arr[i] + arr[j]) >= sum && i < j)
{
if((arr[i] + arr[j]) == sum)
cout << "(" << arr[i] << "," << arr[j] << ")" << endl;
j--;
}
}
}

There are 3 approaches to this solution:
Let the sum be T and n be the size of array
Approach 1:
The naive way to do this would be to check all combinations (n choose 2). This exhaustive search is O(n^2).
Approach 2:
A better way would be to sort the array. This takes O(n log n)
Then for each x in array A,
use binary search to look for T-x. This will take O(nlogn).
So, overall search is O(n log n)
Approach 3 :
The best way
would be to insert every element into a hash table (without sorting). This takes O(n) as constant time insertion.
Then for every x,
we can just look up its complement, T-x, which is O(1).
Overall the run time of this approach is O(n).
You can read more here. Thanks.
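As an illustration of Approach 2, here is a minimal Python sketch using the standard `bisect` module. It treats the values as distinct (duplicates are collapsed, so a pair like (x, x) is never reported), and the function name `pairs_with_sum_sorted` is just an assumption for this example.

```python
from bisect import bisect_left


def pairs_with_sum_sorted(values, target):
    """Sort once, then binary-search the complement of each element: O(n log n)."""
    a = sorted(set(values))  # distinct values only
    pairs = []
    for i, x in enumerate(a):
        y = target - x
        # Search only to the right of i so each pair is found once.
        j = bisect_left(a, y, i + 1)
        if j < len(a) and a[j] == y:
            pairs.append((x, y))
    return pairs


print(pairs_with_sum_sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], 7))
```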

# Let arr be the given array.
# And K be the give sum
for i=0 to arr.length - 1 do
# key is the element and value is its index.
hash(arr[i]) = i
end-for
for i=0 to arr.length - 1 do
# if K - arr[i] exists in the hash at a different index then we found a pair
if hash contains (K - arr[i]) and hash(K - arr[i]) != i
print "pair i , hash(K - arr[i]) has sum K"
end-if
end-for
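A runnable take on the same hashing idea, as a sketch only: instead of the index map used in the pseudocode, it keeps a set of values seen so far, which handles a repeated value such as (5, 5) and reports each pair once. The function name `unique_pairs_with_sum` is made up for this example.

```python
def unique_pairs_with_sum(values, target):
    """One pass with a hash set: O(n) expected time, O(n) extra space."""
    seen = set()   # values encountered so far
    pairs = set()  # each pair stored as (smaller, larger) to avoid repeats
    for x in values:
        y = target - x
        if y in seen:
            pairs.add((min(x, y), max(x, y)))
        seen.add(x)
    return pairs


print(sorted(unique_pairs_with_sum([2, 45, 7, 3, 5, 1, 8, 9], 10)))
```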

Implementation in Java : Using codaddict's algorithm (Maybe slightly different)
import java.util.HashMap;
import java.util.Map;
public class ArrayPairSum {
public static void main(String[] args) {
int []a = {2,45,7,3,5,1,8,9};
printSumPairs(a,10);
}
public static void printSumPairs(int []input, int k){
Map<Integer, Integer> pairs = new HashMap<Integer, Integer>();
for(int i=0;i<input.length;i++){
if(pairs.containsKey(input[i]))
System.out.println(input[i] +", "+ pairs.get(input[i]));
else
pairs.put(k-input[i], input[i]);
}
}
}
For input = {2,45,7,3,5,1,8,9} and if Sum is 10
Output pairs:
3,7
8,2
9,1
Some notes about the solution :
We iterate only once through the array --> O(n) time
Insertion and lookup time in Hash is O(1).
Overall time is O(n), although it uses extra space in terms of hash.

Solution in java. You can add all the String elements to an ArrayList of strings and return the list. Here I am just printing it out.
void numberPairsForSum(int[] array, int sum) {
HashSet<Integer> set = new HashSet<Integer>();
for (int num : array) {
if (set.contains(sum - num)) {
String s = num + ", " + (sum - num) + " add up to " + sum;
System.out.println(s);
}
set.add(num);
}
}

Python Implementation:
import itertools
numbers = [1, 1, 2, 3, 4, 5]
unique_numbers = set(numbers)
targetsum = 5
for n in itertools.combinations(unique_numbers, 2):
    if n[0] + n[1] == targetsum:
        print(str(n[0]) + " + " + str(n[1]))
Output:
1 + 4
2 + 3

C++11, run time complexity O(n):
#include <vector>
#include <unordered_map>
#include <utility>
std::vector<std::pair<int, int>> FindPairsForSum(
const std::vector<int>& data, const int& sum)
{
std::unordered_map<int, size_t> umap;
std::vector<std::pair<int, int>> result;
for (size_t i = 0; i < data.size(); ++i)
{
if (0 < umap.count(sum - data[i]))
{
size_t j = umap[sum - data[i]];
result.push_back({data[i], data[j]});
}
else
{
umap[data[i]] = i;
}
}
return result;
}

Here is a solution which takes into account duplicate entries. It is written in JavaScript and assumes the array is sorted. The solution runs in O(n) time and does not use any extra memory aside from a few variables.
var count_pairs = function(_arr,x) {
if(!x) x = 0;
var pairs = 0;
var i = 0;
var k = _arr.length-1;
if((k+1)<2) return pairs;
var halfX = x/2;
while(i<k) {
var curK = _arr[k];
var curI = _arr[i];
var pairsThisLoop = 0;
if(curK+curI==x) {
// if midpoint and equal find combinations
if(curK==curI) {
var comb = 1;
while(--k>=i) pairs+=(comb++);
break;
}
// count pair and k duplicates
pairsThisLoop++;
while(_arr[--k]==curK) pairsThisLoop++;
// add k side pairs to running total for every i side pair found
pairs+=pairsThisLoop;
while(_arr[++i]==curI) pairs+=pairsThisLoop;
} else {
// if we are at a mid point
if(curK==curI) break;
var distK = Math.abs(halfX-curK);
var distI = Math.abs(halfX-curI);
if(distI > distK) while(_arr[++i]==curI);
else while(_arr[--k]==curK);
}
}
return pairs;
}
I solved this during an interview for a large corporation. They took it but not me.
So here it is for everyone.
Start at both sides of the array and slowly work your way inwards, making sure to count duplicates if they exist.
It only counts pairs but can be reworked to
find the pairs
find pairs < x
find pairs > x
Enjoy!
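The inward walk with duplicate counting can be sketched in Python as below. This is not a translation of the JavaScript above, just a simpler variant of the same two-pointer idea; it assumes the input is already sorted, and the name `count_pairs_sorted` is chosen for this example.

```python
def count_pairs_sorted(a, target):
    """Count index pairs (i, k), i < k, with a[i] + a[k] == target in a sorted list."""
    i, k = 0, len(a) - 1
    count = 0
    while i < k:
        s = a[i] + a[k]
        if s < target:
            i += 1
        elif s > target:
            k -= 1
        elif a[i] == a[k]:
            # Everything between i and k is the same value: choose any 2 of m.
            m = k - i + 1
            count += m * (m - 1) // 2
            break
        else:
            # Count the run of equal values on each side, then multiply.
            left = 1
            while a[i + 1] == a[i]:
                left += 1
                i += 1
            right = 1
            while a[k - 1] == a[k]:
                right += 1
                k -= 1
            count += left * right
            i += 1
            k -= 1
    return count


print(count_pairs_sorted([1, 1, 2, 3, 3], 4))
```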

O(n)
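def find_pairs(L, total):
    s = set(L)
    if total % 2 == 0:
        edgeCase = total // 2
        # a value can pair with itself only if it occurs at least twice
        if L.count(edgeCase) >= 2:
            print(edgeCase, edgeCase)
        s.discard(edgeCase)
    for i in s:
        diff = total - i
        # i < diff prints each pair once instead of in both orders
        if diff in s and i < diff:
            print(i, diff)
L = [2,45,7,3,5,1,8,9]
total = 10
find_pairs(L, total)
Methodology: a + b = c, so instead of looking for (a, b) we look for a = c - b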

Implementation in Java : Using codaddict's algorithm:
import java.util.Hashtable;
public class Range {
public static void main(String[] args) {
// TODO Auto-generated method stub
Hashtable mapping = new Hashtable();
int a[]= {80,79,82,81,84,83,85};
int k = 160;
for (int i=0; i < a.length; i++){
mapping.put(a[i], i);
}
for (int i=0; i < a.length; i++){
if (mapping.containsKey(k - a[i]) && (Integer)mapping.get(k-a[i]) != i){
System.out.println(k-a[i]+", "+ a[i]);
}
}
}
}
Output:
81, 79
79, 81
If you want duplicate pairs (eg: 80,80) also then just remove && (Integer)mapping.get(k-a[i]) != i from the if condition and you are good to go.

Just attended this question on HackerRank and here's my 'Objective C' Solution:
-(NSNumber*)sum:(NSArray*) a andK:(NSNumber*)k {
NSMutableDictionary *dict = [NSMutableDictionary dictionary];
long long count = 0;
for(long i=0;i<a.count;i++){
if(dict[a[i]]) {
count++;
NSLog(@"a[i]: %@, dict[array[i]]: %@", a[i], dict[a[i]]);
}
else{
NSNumber *calcNum = @(k.longLongValue-((NSNumber*)a[i]).longLongValue);
dict[calcNum] = a[i];
}
}
return @(count);
}
Hope it helps someone.

this is the implementation of O(n*lg n) using binary search implementation inside a loop.
#include <iostream>
using namespace std;
bool *inMemory;
int pairSum(int arr[], int n, int k)
{
int count = 0;
if(n==0)
return count;
for (int i = 0; i < n; ++i)
{
int start = 0;
int end = n-1;
while(start <= end)
{
int mid = start + (end-start)/2;
if(i == mid)
break;
else if((arr[i] + arr[mid]) == k && !inMemory[i] && !inMemory[mid])
{
count++;
inMemory[i] = true;
inMemory[mid] = true;
}
else if(arr[i] + arr[mid] >= k)
{
end = mid-1;
}
else
start = mid+1;
}
}
return count;
}
int main()
{
int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
inMemory = new bool[10];
for (int i = 0; i < 10; ++i)
{
inMemory[i] = false;
}
cout << pairSum(arr, 10, 11) << endl;
return 0;
}

In python
arr = [1, 2, 4, 6, 10]
diff_hash = {}
expected_sum = 3
for i in arr:
    if i in diff_hash:
        print(i, diff_hash[i])
    key = expected_sum - i
    diff_hash[key] = i

Nice solution from Codeaddict. I took the liberty of implementing a version of it in Ruby:
def find_sum(arr,sum)
result ={}
h = Hash[arr.map {|i| [i,i]}]
arr.each { |l| result[l] = sum-l if h[sum-l] && !result[sum-l] }
result
end
To allow duplicate pairs (1,5), (5,1) we just have to remove the && !result[sum-l] instruction

Here is Java code for three approaches:
1. Using Map O(n), HashSet can also be used here.
2. Sort array and then use BinarySearch to look for complement O(nLog(n))
3. Traditional BruteForce two loops O(n^2)
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
public class PairsEqualToSum {
public static void main(String[] args) {
int a[] = {1,10,5,8,2,12,6,4};
findPairs1(a,10);
findPairs2(a,10);
findPairs3(a,10);
}
//Method1 - O(N) use a Map to insert values as keys & check for number's complement in map
static void findPairs1(int[]a, int sum){
Map<Integer, Integer> pairs = new HashMap<Integer, Integer>();
for(int i=0; i<a.length; i++){
if(pairs.containsKey(sum-a[i]))
System.out.println("("+a[i]+","+(sum-a[i])+")");
else
pairs.put(a[i], 0);
}
}
//Method2 - O(nlog(n)) using Sort
static void findPairs2(int[]a, int sum){
Arrays.sort(a);
for(int i=0; i<a.length; i++){
int complement = sum - a[i];
int foundAtIndex = Arrays.binarySearch(a,complement);
if(foundAtIndex > i) // complement found at a later index: each pair prints once and an element such as "5" never pairs with itself
System.out.println("("+a[i]+","+(sum-a[i])+")");
}
}
//Method 3 - Brute Force O(n^2)
static void findPairs3(int[]a, int sum){
for(int i=0; i<a.length; i++){
for(int j=i+1; j<a.length;j++){
if(a[i]+a[j] == sum)
System.out.println("("+a[i]+","+a[j]+")");
}
}
}
}

A Simple program in java for arrays having unique elements:
import java.util.*;
public class ArrayPairSum {
public static void main(String[] args) {
int []a = {2,4,7,3,5,1,8,9,5};
sumPairs(a,10);
}
public static void sumPairs(int []input, int k){
Set<Integer> set = new HashSet<Integer>();
for(int i=0;i<input.length;i++){
if(set.contains(input[i]))
System.out.println(input[i] +", "+(k-input[i]));
else
set.add(k-input[i]);
}
}
}

A simple Java code snippet for printing the pairs below:
public static void count_all_pairs_with_given_sum(int arr[], int S){
if(arr.length < 2){
return;
}
HashSet values = new HashSet(arr.length);
for(int value : arr)values.add(value);
for(int value : arr){
int difference = S - value;
if(values.contains(difference) && value<difference){
System.out.printf("(%d, %d) %n", value, difference);
}
}
}

Another solution in Swift: the idea is to create a hash set that stores the values of (sum - currentValue) and compare these to the current value in the loop. The complexity is O(n).
func findPair(list: [Int], _ sum: Int) -> [(Int, Int)]? {
var hash = Set<Int>() //save list of value of sum - item.
var dictCount = [Int: Int]() //to avoid the case A*2 = sum where we have only one A in the array
var foundKeys = Set<Int>() //to avoid duplicated pair in the result.
var result = [(Int, Int)]() //this is for the result.
for item in list {
//keep track of count of each element to avoid problem: [2, 3, 5], 10 -> result = (5,5)
if (!dictCount.keys.contains(item)) {
dictCount[item] = 1
} else {
dictCount[item] = dictCount[item]! + 1
}
//if my hash does not contain the (sum - item) value -> insert to hash.
if !hash.contains(sum-item) {
hash.insert(sum-item)
}
//check if current item is the same as another hash value or not, if yes, return the tuple.
if hash.contains(item) &&
(dictCount[item] > 1 || sum != item*2) // check if we have item*2 = sum or not.
{
if !foundKeys.contains(item) && !foundKeys.contains(sum-item) {
foundKeys.insert(item) //add to found items in order to not to add duplicated pair.
result.append((item, sum-item))
}
}
}
return result
}
//test:
let a = findPair([2,3,5,4,1,7,6,8,9,5,3,3,3,3,3,3,3,3,3], 14) //will return (8,6) and (9,5)

My Solution - Java - Without duplicates
public static void printAllPairSum(int[] a, int x){
System.out.printf("printAllPairSum(%s,%d)\n", Arrays.toString(a),x);
if(a==null||a.length==0){
return;
}
int length = a.length;
Map<Integer,Integer> reverseMapOfArray = new HashMap<>(length,1.0f);
for (int i = 0; i < length; i++) {
reverseMapOfArray.put(a[i], i);
}
for (int i = 0; i < length; i++) {
Integer j = reverseMapOfArray.get(x - a[i]);
if(j!=null && i<j){
System.out.printf("a[%d] + a[%d] = %d + %d = %d\n",i,j,a[i],a[j],x);
}
}
System.out.println("------------------------------");
}

This prints the pairs and avoids duplicates using bitwise manipulation.
public static void findSumHashMap(int[] arr, int key) {
Map<Integer, Integer> valMap = new HashMap<Integer, Integer>();
for(int i=0;i<arr.length;i++)
valMap.put(arr[i], i);
int indicesVisited = 0;
for(int i=0;i<arr.length;i++) {
if(valMap.containsKey(key - arr[i]) && valMap.get(key - arr[i]) != i) {
if(!((indicesVisited & ((1<<i) | (1<<valMap.get(key - arr[i])))) > 0)) {
int diff = key-arr[i];
System.out.println(arr[i] + " " +diff);
indicesVisited = indicesVisited | (1<<i) | (1<<valMap.get(key - arr[i]));
}
}
}
}

I bypassed the bit manipulation and just compared the index values: a pair is printed only when the complement's index is less than the loop iteration value (i in this case). This will not print duplicate pairs or duplicate array elements.
public static void findSumHashMap(int[] arr, int key) {
Map<Integer, Integer> valMap = new HashMap<Integer, Integer>();
for (int i = 0; i < arr.length; i++) {
valMap.put(arr[i], i);
}
for (int i = 0; i < arr.length; i++) {
if (valMap.containsKey(key - arr[i])
&& valMap.get(key - arr[i]) != i) {
if (valMap.get(key - arr[i]) < i) {
int diff = key - arr[i];
System.out.println(arr[i] + " " + diff);
}
}
}
}

in C#:
int[] array = new int[] { 1, 5, 7, 2, 9, 8, 4, 3, 6 }; // given array
int sum = 10; // given sum
for (int i = 0; i <= array.Count() - 1; i++)
if (array.Contains(sum - array[i]))
Console.WriteLine("{0}, {1}", array[i], sum - array[i]);

One solution can be this, but it is not optimal (the complexity of this code is O(n^2)):
public class FindPairsEqualToSum {
private static int inputSum = 0;
public static List<String> findPairsForSum(int[] inputArray, int sum) {
List<String> list = new ArrayList<String>();
List<Integer> inputList = new ArrayList<Integer>();
for (int i : inputArray) {
inputList.add(i);
}
for (int i : inputArray) {
int tempInt = sum - i;
if (inputList.contains(tempInt)) {
String pair = String.valueOf(i + ", " + tempInt);
list.add(pair);
}
}
return list;
}
}

A simple Python version of the code that finds a pair summing to zero and can be modified to find k:
def sumToK(lst):
    k = 0  # <- define the k here
    d = {}  # build a dictionary
    # build the hashmap: key = value in lst, value = its index
    for index, val in enumerate(lst):
        d[val] = index
    # find the key; if a key is in the dict, and not at the same index as the current key
    for i, val in enumerate(lst):
        if (k - val) in d and d[k - val] != i:
            return True
    return False
The run time complexity of the function is O(n), and the space complexity is O(n) as well.

public static int[] f (final int[] nums, int target) {
int[] r = new int[2];
r[0] = -1;
r[1] = -1;
int[] vIndex = new int[0Xfff];
for (int i = 0; i < nums.length; i++) {
int delta = 0Xff;
int gapIndex = target - nums[i] + delta;
if (vIndex[gapIndex] != 0) {
r[0] = vIndex[gapIndex];
r[1] = i + 1;
return r;
} else {
vIndex[nums[i] + delta] = i + 1;
}
}
return r;
}

An O(n) solution (it cannot be less than O(n), since every element has to be read at least once):
function(array,k)
var map = {};
for element in array
if(map(k-element))
return {k-element,element}
map(element) = true;

Solution in Python using list comprehension
f = [[i, j] for i in lst for j in lst if j + i == X]
O(N^2)
It also gives both ordered pairs, (a,b) and (b,a).

I can do it in O(n). Let me know when you want the answer. Note it involves simply traversing the array once with no sorting, etc... I should mention too that it exploits commutativity of addition and doesn't use hashes but wastes memory.
using System;
using System.Collections.Generic;
/*
An O(n) approach exists by using a lookup table. The approach is to store the value in a "bin" that can easily be looked up(e.g., O(1)) if it is a candidate for an appropriate sum.
e.g.,
for each a[k] in the array we simply put the it in another array at the location x - a[k].
Suppose we have [0, 1, 5, 3, 6, 9, 8, 7] and x = 9
We create a new array,
indexes value
9 - 0 = 9 0
9 - 1 = 8 1
9 - 5 = 4 5
9 - 3 = 6 3
9 - 6 = 3 6
9 - 9 = 0 9
9 - 8 = 1 8
9 - 7 = 2 7
THEN the only values that matter are the ones who have an index into the new table.
So, say when we reach 9 or equal we see if our new array has the index 9 - 9 = 0. Since it does we know that all the values it contains will add to 9. (note in this case it's obvious there is only 1 possible one but it might have multiple index values in it which we need to store).
So effectively what we end up doing is only having to move through the array once. Because addition is commutative we will end up with all the possible results.
For example, when we get to 6 we get the index into our new table as 9 - 6 = 3. Since the table contains that index value we know the values.
This is essentially trading off speed for memory.
*/
namespace sum
{
class Program
{
static void Main(string[] args)
{
int num = 25;
int X = 10;
var arr = new List<int>();
for(int i = 0; i <= num; i++) arr.Add((new Random((int)(DateTime.Now.Ticks + i*num))).Next(0, num*2));
Console.Write("["); for (int i = 0; i < num - 1; i++) Console.Write(arr[i] + ", "); Console.WriteLine(arr[arr.Count-1] + "] - " + X);
var arrbrute = new List<Tuple<int,int>>();
var arrfast = new List<Tuple<int,int>>();
for(int i = 0; i < num; i++)
for(int j = i+1; j < num; j++)
if (arr[i] + arr[j] == X)
arrbrute.Add(new Tuple<int, int>(arr[i], arr[j]));
int M = 500;
var lookup = new List<List<int>>();
for(int i = 0; i < 1000; i++) lookup.Add(new List<int>());
for(int i = 0; i < num; i++)
{
// Check and see if we have any "matches"
if (lookup[M + X - arr[i]].Count != 0)
{
foreach(var j in lookup[M + X - arr[i]])
arrfast.Add(new Tuple<int, int>(arr[i], arr[j]));
}
lookup[M + arr[i]].Add(i);
}
for(int i = 0; i < arrbrute.Count; i++)
Console.WriteLine(arrbrute[i].Item1 + " + " + arrbrute[i].Item2 + " = " + X);
Console.WriteLine("---------");
for(int i = 0; i < arrfast.Count; i++)
Console.WriteLine(arrfast[i].Item1 + " + " + arrfast[i].Item2 + " = " + X);
Console.ReadKey();
}
}
}

I implemented the logic in Scala without a Map. It counts duplicate pairs, since the counter loops through all elements of the array. If duplicate pairs are needed, you can simply return the value pc
val arr = Array[Int](8, 7, 2, 5, 3, 1, 5)
val num = 10
var pc = 0
for(i <- arr.indices) {
if(arr.contains(Math.abs(arr(i) - num))) pc += 1
}
println(s"Pairs: ${pc/2}")
It is working with duplicates values in the array as well.

GOLANG Implementation
func findPairs(slice1 []int, sum int) [][]int {
pairMap := make(map[int]int)
var SliceOfPairs [][]int
for i, v := range slice1 {
if valuei, ok := pairMap[v]; ok {
//fmt.Println("Pair Found", i, valuei)
SliceOfPairs = append(SliceOfPairs, []int{i, valuei})
} else {
pairMap[sum-v] = i
}
}
return SliceOfPairs
}

function findPairOfNumbers(arr, targetSum) {
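// Sort numerically; the default sort() compares elements as strings.
arr = arr.sort(function(a, b) { return a - b; });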
var low = 0, high = arr.length - 1, sum, result = [];
while(low < high) {
sum = arr[low] + arr[high];
if(sum < targetSum)
low++;
else if(sum > targetSum)
high--;
else if(sum === targetSum) {
result.push({val1: arr[low], val2: arr[high]});
high--;
}
}
return (result || false);
}
var pairs = findPairOfNumbers([1,2,3,4,5,6,7,8,9,0], 7);
if(pairs.length) {
console.log(pairs);
} else {
console.log("No pair of numbers found that sums to " + 7);
}

Minimum window width in string x that contains all characters of string y

Find minimum window width in string x that contains all characters of another string y. For example:
String x = "coobdafceeaxab"
String y = "abc"
The answer should be 5, because the shortest substring in x that contains all three letters of y is "bdafc".
I can think of a naive solution with complexity O(n^2 * log(m)), where n = len(x) and m = len(y). Can anyone suggest a better solution? Thanks.
Update: now that I think of it, if I change my set to tr1::unordered_map, then I can cut the complexity down to O(n^2), because insertion and deletion should both be O(1).

time: O(n) (One pass)
space: O(k)
This is how I would do it:
Create a hash table for all the characters from string Y. (I assume all characters are different in Y).
First pass:
Start from the first character of string X.
Update the hash table: for example, for key 'a' store its location (say 1).
Keep doing this until you have seen all characters from Y (until every key in the hash table has a value).
If you see a character again, store its newer location and discard the older one.
Once the first pass is done, take the smallest and the biggest value from the hash table.
That is the minimum window observed so far.
Now go to the next character in string X, update the hash table and see if you get a smaller window.
Edit:
Lets take an example here:
String x = "coobdafceeaxab"
String y = "abc"
First initialize a hash table from characters of Y.
h[a] = -1
h[b] = -1
h[c] = -1
Now, Start from first character of X.
First character is c, h[c] = 0
Second character (o) is not part of hash, skip it.
..
Fourth character (b), h[b] = 3
..
Sixth character (a), update hash table h[a] = 5.
Now all keys in the hash table have a value.
Smallest value is 0 (of c) and highest value is 5 (of a), minimum window so far is 6 (0 to 5).
First pass is done.
Take next character. f is not part of hash table, skip it.
Next character (c), update hash table h[c] = 7.
Find new window, smallest value is 3 (of b) and highest value is 7 (of c).
New window is 3 to 7 => 5.
Keep on doing it till last character of string X.
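I hope it's clear now.
Edit
There were some concerns about finding the max and min values in the hash table.
We can maintain a doubly linked list, ordered by position, whose nodes are referenced from the hash table.
Whenever a character's position changes, its node is unlinked and appended at the tail, and the hash table entry is updated to point to it.
Both of these operations are O(1).
Total space would be m + m (m hash table entries plus m list nodes, where m is the length of Y).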
Edit
Here is small visualisation of above problem:
For "coobdafceeaxab" and "abc"
step-0:
Initial doubly linked-list = NULL
Initial hash-table = NULL
step-1:
Head<->[c,0]<->tail
h[c] = [0, 'pointer to c node in LL']
step-2:
Head<->[c,0]<->[b,3]<->tail
h[c] = [0, 'pointer to c node in LL'], h[b] = [3, 'pointer to b node in LL'],
Step-3:
Head<->[c,0]<->[b,3]<->[a,5]<->tail
h[c] = [0, 'pointer to c node in LL'], h[b] = [3, 'pointer to b node in LL'], h[a] = [5, 'pointer to a node in LL']
Minimum Window => difference from tail and head => (5-0)+1 => Length: 6
Step-4:
Update entry of C to index 7 here. (Remove 'c' node from linked-list and append at the tail)
Head<->[b,3]<->[a,5]<->[c,7]<->tail
h[c] = [7, 'new pointer to c node in LL'], h[b] = [3, 'pointer to b node in LL'], h[a] = [5, 'pointer to a node in LL'],
Minimum Window => difference from tail and head => (7-3)+1 => Length: 5
And so on..
Note that above Linked-list update and hash table update are both O(1).
Please correct me if I am wrong..
Summary:
Time complexity: O(n) with one pass
Space Complexity: O(k) where k is length of string Y
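
The hash table plus linked list idea above can be sketched in Python with collections.OrderedDict, which keeps its keys in insertion order and so plays both roles at once. This is only an illustration under the same assumption the answer makes (all characters in Y are distinct); the function name min_window_width and the choice to return None when no window exists are made up for the example.

```python
from collections import OrderedDict

def min_window_width(x, y):
    """Width of the shortest substring of x containing every character of y.

    Assumes the characters of y are distinct. Returns None if no window exists.
    """
    needed = set(y)
    last_seen = OrderedDict()   # char -> latest index, oldest entry first
    best = None
    for i, ch in enumerate(x):
        if ch not in needed:
            continue
        # drop the old position (if any) and re-insert at the "tail"
        last_seen.pop(ch, None)
        last_seen[ch] = i
        if len(last_seen) == len(needed):
            oldest = next(iter(last_seen.values()))   # "head" of the list
            width = i - oldest + 1
            if best is None or width < best:
                best = width
    return best

print(min_window_width("coobdafceeaxab", "abc"))
```

For the example strings this follows the same steps as the visualisation: the window is 6 wide after 'a' at index 5 and shrinks to 5 when 'c' is seen again at index 7.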

I found this very nice O(N) time complexity version at http://leetcode.com/2010/11/finding-minimum-window-in-s-which.html, and shortened it slightly (I removed a continue in the first while loop, which simplified the condition of the second while loop). Note that this solution allows for duplicates in the second string, while many of the above answers do not.
private static String minWindow(String s, String t) {
int[] needToFind = new int[256];
int[] hasFound = new int[256];
for(int i = 0; i < t.length(); ++i) {
needToFind[t.charAt(i)]++;
}
int count = 0;
int minWindowSize = Integer.MAX_VALUE;
int start = 0, end = -1;
String window = "";
while (++end < s.length()) {
char c = s.charAt(end);
if(++hasFound[c] <= needToFind[c]) {
count++;
}
if(count < t.length()) continue;
while (hasFound[s.charAt(start)] > needToFind[s.charAt(start)]) {
hasFound[s.charAt(start++)]--;
}
if(end - start + 1 < minWindowSize) {
minWindowSize = end - start + 1;
window = s.substring(start, end + 1);
}
}
return window;
}

Here's my solution in C++:
int min_width(const string& x, const set<char>& y) {
vector<int> at;
for (int i = 0; i < x.length(); i++)
if (y.count(x[i]) > 0)
at.push_back(i);
int ret = x.size();
int start = 0;
map<char, int> count;
for (int end = 0; end < at.size(); end++) {
count[x[at[end]]]++;
while (count[x[at[start]]] > 1)
count[x[at[start++]]]--;
if (count.size() == y.size() && ret > at[end] - at[start] + 1)
ret = at[end] - at[start] + 1;
}
return ret;
}
Edit: Here's an implementation of Jack's idea. It's the same time complexity as mine, but without the inner loop that confuses you.
int min_width(const string& x, const set<char>& y) {
int ret = x.size();
map<char, int> index;
set<int> index_set;
for (int j = 0; j < x.size(); j++) {
if (y.count(x[j]) > 0) {
if (index.count(x[j]) > 0)
index_set.erase(index[x[j]]);
index_set.insert(j);
index[x[j]] = j;
if (index.size() == y.size()) {
int i = *index_set.begin();
if (ret > j-i+1)
ret = j-i+1;
}
}
}
return ret;
}
In Java it can be implemented nicely with LinkedHashMap:
static int minWidth(String x, HashSet<Character> y) {
int ret = x.length();
Map<Character, Integer> index = new LinkedHashMap<Character, Integer>();
for (int j = 0; j < x.length(); j++) {
char ch = x.charAt(j);
if (y.contains(ch)) {
index.remove(ch);
index.put(ch, j);
if (index.size() == y.size()) {
int i = index.values().iterator().next();
if (ret > j - i + 1)
ret = j - i + 1;
}
}
}
return ret;
}
All operations inside the loop take constant time (assuming hashed elements disperse properly).

There is an O(n) solution to this problem. It is very well described in this article:
http://www.leetcode.com/2010/11/finding-minimum-window-in-s-which.html
Hope it helps.

This is my solution in C++, just for reference.
Update: originally I used std::set; I have now changed it to tr1::unordered_map to cut the complexity down to n^2. Otherwise the two implementations look pretty similar, so to keep this post from getting too long I only list the improved solution.
#include <algorithm>
#include <iostream>
#include <tr1/unordered_map>
#include <string>
using namespace std;
using namespace std::tr1;
typedef tr1::unordered_map<char, int> hash_t;
// Returns min substring width in which sentence contains all chars in word
// Returns sentence's length + 1 if not found
size_t get_min_width(const string &sent, const string &word) {
size_t min_size = sent.size() + 1;
hash_t char_set; // char set that word contains
for (size_t i = 0; i < word.size(); i++) {
char_set.insert(hash_t::value_type(word[i], 1));
}
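// Try every start position; written as an addition so size_t cannot underflow
for (size_t i = 0; i + char_set.size() <= sent.size(); i++) {
hash_t s = char_set;
// No need to scan past a window that is already as wide as the best one
for (size_t j = i; j < min(i + min_size, sent.size()); j++) {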
s.erase(sent[j]);
if (s.empty()) {
size_t size = j - i + 1;
if (size < min_size) min_size = size;
break;
}
}
}
return min_size;
}
int main() {
const string x = "coobdafceeaxab";
const string y = "abc";
cout << get_min_width(x, y) << "\n";
}

An implementation of Jack's idea.
public int smallestWindow(String str1, String str2){
if(str1==null || str2==null){
throw new IllegalArgumentException();
}
Map<String, Node> map=new HashMap<String, Node>();
Node head=null, current=null;
for(int i=0;i<str1.length();i++){
char c=str1.charAt(i);
if(head==null){
head=new Node(c);
current=head;
map.put(String.valueOf(c), head);
}
else{
current.next=new Node(c);
current.next.pre=current;
current=current.next;
map.put(String.valueOf(c), current);
}
}
Node end=current;
int min=Integer.MAX_VALUE;
int count=0;
for(int i=0;i<str2.length();i++){
char c = str2.charAt(i);
Node n=map.get(String.valueOf(c));
if(n!=null){
if(n.index==Integer.MAX_VALUE){
count++;
}
n.index=i;
if(n==head){
Node temp=head;
head=head.next;
if(head==null){//one node
return 1;
}
head.pre=null;
temp.pre=end;
end.next=temp;
temp.next=null;
end=temp;
}
else if(end!=n){
n.pre.next=n.next;
n.next.pre=n.pre;
n.pre=end;
n.next=null;
end.next=n;
end=n;
}
if(count==str1.length()){
min=Math.min(end.index-head.index+1, min);
}
}
}
System.out.println(map);
return min;
}

Simple java solution using the sliding window. Extending NitishMD's idea above:
public class StringSearchDemo {
public String getSmallestSubsetOfStringContaingSearchString(String toMatch,
String inputString) {
if (inputString.isEmpty() || toMatch.isEmpty()) {
return null;
}
// List<String> results = new ArrayList<String>(); // optional you can comment this out
String smallestMatch = "";
// String largestMatch = "";
int startPointer = 0, endPointer = 1;
HashMap<Character, Integer> toMatchMap = new HashMap<Character, Integer>();
for (char c : toMatch.toCharArray()) {
if (toMatchMap.containsKey(c)) {
toMatchMap.put(c, (toMatchMap.get(c) + 1));
} else {
toMatchMap.put(c, 1);
}
}
int totalCount = getCountofMatchingString(toMatchMap, toMatch);
for (int i = 0; i < inputString.length();) {
if (!toMatchMap.containsKey(inputString.charAt(i))) {
endPointer++;
i++;
continue;
}
String currentSubString = inputString.substring(startPointer,
endPointer);
if (getCountofMatchingString(toMatchMap, currentSubString) >= totalCount) {
// results.add(currentSubString); // optional you can comment this out
if (smallestMatch.length() > currentSubString.length()) {
smallestMatch = currentSubString;
} else if (smallestMatch.isEmpty()) {
smallestMatch = currentSubString;
}
// if (largestMatch.length() < currentSubString.length()) {
// largestMatch = currentSubString;
// }
startPointer++;
} else {
endPointer++;
i++;
}
}
// System.out.println("all possible combinations = " + results); // optional, you can comment this out
// System.out.println("smallest result = " + smallestMatch);
// System.out.println("largest result = " + largestMatch);
return smallestMatch;
}
public int getCountofMatchingString(HashMap<Character, Integer> toMatchMap,
String toMatch) {
int match = 0;
HashMap<Character, Integer> localMap = new HashMap<Character, Integer>();
for (char c : toMatch.toCharArray()) {
if (toMatchMap.containsKey(c)) {
if (localMap.containsKey(c)) {
if (localMap.get(c) < toMatchMap.get(c)) {
localMap.put(c, (localMap.get(c) + 1));
match++;
}
} else {
localMap.put(c, 1);
match++;
}
}
}
return match;
}
public static void main(String[] args) {
String inputString = "zxaddbddxyy由ccbbwwaay漢字由来";
String matchCriteria = "a由";
System.out.println("input=" + inputString);
System.out.println("matchCriteria=" + matchCriteria);
String result = (new StringSearchDemo())
.getSmallestSubsetOfStringContaingSearchString(matchCriteria, inputString);
System.out.println("smallest possible match = " + result);
}
}

Efficiently reverse the order of the words (not characters) in an array of characters

Given an array of characters which forms a sentence of words, give an efficient algorithm to reverse the order of the words (not characters) in it.
Example input and output:
>>> reverse_words("this is a string")
'string a is this'
It should be O(N) time and O(1) space (split() and pushing on / popping off the stack are not allowed).
The puzzle is taken from here.

A solution in C/C++:
void swap(char* str, int i, int j){
char t = str[i];
str[i] = str[j];
str[j] = t;
}
void reverse_string(char* str, int length){
for(int i=0; i<length/2; i++){
swap(str, i, length-i-1);
}
}
void reverse_words(char* str){
int l = strlen(str);
//Reverse string
reverse_string(str,strlen(str));
int p=0;
//Find word boundaries and reverse word by word
for(int i=0; i<l; i++){
if(str[i] == ' '){
reverse_string(&str[p], i-p);
p=i+1;
}
}
//Finally reverse the last word.
reverse_string(&str[p], l-p);
}
This should be O(n) in time and O(1) in space.
Edit: Cleaned it up a bit.
The first pass over the string is obviously O(n/2) = O(n). The second pass is O(n + combined length of all words / 2) = O(n + n/2) = O(n), which makes this an O(n) algorithm.

pushing a string onto a stack and then popping it off - is that still O(1)?
essentially, that is the same as using split()...
Doesn't O(1) mean in-place? This task gets easy if we can just append strings and stuff, but that uses space...
EDIT: Thomas Watnedal is right. The following algorithm is O(n) in time and O(1) in space:
reverse string in-place (first iteration over string)
reverse each (reversed) word in-place (another two iterations over string)
find first word boundary
reverse inside this word boundary
repeat for next word until finished
I guess we would need to prove that step 2 is really only O(2n)...

#include <algorithm>
#include <string>
#include <boost/next_prior.hpp>
void reverse(std::string& foo) {
using namespace std;
std::reverse(foo.begin(), foo.end());
string::iterator begin = foo.begin();
while (1) {
string::iterator space = find(begin, foo.end(), ' ');
std::reverse(begin, space);
// Stop before advancing: stepping past foo.end() is undefined behaviour
if (space == foo.end())
break;
begin = boost::next(space);
}
}

Here is my answer. No library calls and no temp data structures.
#include <stdio.h>
void reverse(char* string, int length){
int i;
for (i = 0; i < length/2; i++){
string[length - 1 - i] ^= string[i] ;
string[i] ^= string[length - 1 - i];
string[length - 1 - i] ^= string[i];
}
}
int main () {
char string[] = "This is a test string";
char *ptr;
int i = 0;
int word = 0;
ptr = (char *)&string;
printf("%s\n", string);
int length=0;
while (*ptr++){
++length;
}
reverse(string, length);
printf("%s\n", string);
for (i=0;i<length;i++){
if(string[i] == ' '){
reverse(&string[word], i-word);
word = i+1;
}
}
reverse(&string[word], i-word); //for last word
printf("\n%s\n", string);
return 0;
}

In pseudo code:
reverse input string
reverse each word (you will need to find word boundaries)
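
The two pseudo-code steps can be sketched in Python as follows. Python strings are immutable, so this sketch assumes the sentence is given as a list of characters (matching the "array of characters" in the question); the helper name reverse_range is made up for the example, and only a single space is treated as a word boundary.

```python
def reverse_range(chars, lo, hi):
    """Reverse chars[lo..hi] (inclusive) in place."""
    while lo < hi:
        chars[lo], chars[hi] = chars[hi], chars[lo]
        lo += 1
        hi -= 1

def reverse_words(chars):
    """Reverse the order of space-separated words in a list of characters."""
    n = len(chars)
    # Step 1: reverse the whole input
    reverse_range(chars, 0, n - 1)
    # Step 2: reverse each word; a word ends at a space or at the end of input
    start = 0
    for i in range(n + 1):
        if i == n or chars[i] == " ":
            reverse_range(chars, start, i - 1)
            start = i + 1
    return chars

print("".join(reverse_words(list("this is a string"))))
```

Each character is swapped at most once per step, so the work is linear in the length of the input, and only a few index variables are used besides the array itself.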

@Daren Thomas
Implementation of your algorithm (O(N) in time, O(1) in space) in D (Digital Mars):
#!/usr/bin/dmd -run
/**
* to compile & run:
* $ dmd -run reverse_words.d
* to optimize:
* $ dmd -O -inline -release reverse_words.d
*/
import std.algorithm: reverse;
import std.stdio: writeln;
import std.string: find;
void reverse_words(char[] str) {
// reverse whole string
reverse(str);
// reverse each word
for (auto i = 0; (i = find(str, " ")) != -1; str = str[i + 1..length])
reverse(str[0..i]);
// reverse last word
reverse(str);
}
void main() {
char[] str = cast(char[])("this is a string");
writeln(str);
reverse_words(str);
writeln(str);
}
Output:
this is a string
string a is this

in Ruby
"this is a string".split.reverse.join(" ")

In C: (C99)
#include <stdio.h>
#include <string.h>
void reverseString(char* string, int length)
{
char swap;
for (int i = 0; i < length/2; i++)
{
swap = string[length - 1 - i];
string[length - 1 - i] = string[i];
string[i] = swap;
}
}
int main (int argc, const char * argv[]) {
char teststring[] = "Given an array of characters which form a sentence of words, give an efficient algorithm to reverse the order of the words (not characters) in it.";
printf("%s\n", teststring);
int length = strlen(teststring);
reverseString(teststring, length);
int i = 0;
while (i < length)
{
int wordlength = strspn(teststring + i, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
reverseString(teststring + i, wordlength);
i += wordlength + 1;
}
printf("%s\n", teststring);
return 0;
}
This gives output:
Given an array of characters which
form a sentence of words, give an
efficient algorithm to reverse the
order of the words (not characters) in
it.
.it in )characters not( words the
of order the reverse to algorithm
efficient an give ,words of sentence a
form which characters of array an
Given
This takes at most 4N time, with small constant space.
Unfortunately, it doesn't handle punctuation or case gracefully.

O(N) in space and O(N) in time solution in Python:
def reverse_words_nosplit(str_):
    """
    >>> f = reverse_words_nosplit
    >>> f("this is a string")
    'string a is this'
    """
    iend = len(str_)
    s = ""
    while True:
        ispace = str_.rfind(" ", 0, iend)
        if ispace == -1:
            s += str_[:iend]
            break
        s += str_[ispace+1:iend]
        s += " "
        iend = ispace
    return s

You would use what is known as an iterative recursive function, which is O(N) in time as it takes N (N being the number of words) iterations to complete and O(1) in space as each iteration holds its own state within the function arguments.
(define (reverse sentence-to-reverse)
(reverse-iter (sentence-to-reverse ""))
(define (reverse-iter(sentence, reverse-sentence)
(if (= 0 string-length sentence)
reverse-sentence
( reverse-iter( remove-first-word(sentence), add-first-word(sentence, reverse-sentence)))
Note: I have written this in Scheme, in which I am a complete novice, so apologies for the lack of correct string manipulation.
remove-first-word finds the first word boundary of sentence, then takes that section of characters (including space and punctuation) and removes it and returns new sentence
add-first-word finds the first word boundary of sentence, then takes that section of characters (including space and punctuation) and adds it to reverse-sentence and returns new reverse-sentence contents.

THIS PROGRAM IS TO REVERSE THE SENTENCE USING POINTERS IN "C language" By Vasantha kumar & Sundaramoorthy from KONGU ENGG COLLEGE, Erode.
NOTE: The sentence is expected to end with a dot (.), which is dropped from the output.
#include<stdio.h>
#include<string.h>
int main()
{
const char *s="this is good.";
const char *end=s+strlen(s); /* one past the last character */
const char *p,*q;
if(end>s&&end[-1]=='.')
end--; /* ignore the trailing dot */
while(end>s)
{
/* move p back to the first character of the last unprinted word */
for(p=end;p>s&&p[-1]!=' ';p--);
for(q=p;q<end;q++)
printf("%c",*q);
printf(" ");
/* step over the space that precedes the word, if any */
end=(p>s)?p-1:s;
}
return 0;
}

Push each word onto a stack. Pop all the words off the stack.

using System;
namespace q47407
{
class MainClass
{
public static void Main(string[] args)
{
string s = Console.ReadLine();
string[] r = s.Split(' ');
for(int i = r.Length-1 ; i >= 0; i--)
Console.Write(r[i] + " ");
Console.WriteLine();
}
}
}
edit: i guess i should read the whole question... carry on.

Efficient in terms of my time: took under 2 minutes to write in REBOL:
reverse_words: func [s [string!]] [form reverse parse s none]
Try it out:
reverse_words "this is a string"
"string a is this"

A C++ solution:
#include <string>
#include <iostream>
using namespace std;
string revwords(string in) {
string rev;
int wordlen = 0;
for (int i = in.length(); i >= 0; --i) {
if (i == 0 || iswspace(in[i-1])) {
if (wordlen) {
for (int j = i; wordlen--; )
rev.push_back(in[j++]);
wordlen = 0;
}
if (i > 0)
rev.push_back(in[i-1]);
}
else
++wordlen;
}
return rev;
}
int main() {
cout << revwords("this is a sentence") << "." << endl;
cout << revwords(" a sentence with extra spaces ") << "." << endl;
return 0;
}

A Ruby solution.
# Reverse all words in string
def reverse_words(string)
return string if string == ''
reverse(string, 0, string.size - 1)
bounds = next_word_bounds(string, 0)
while bounds[:from] < string.size
reverse(string, bounds[:from], bounds[:to])
bounds = next_word_bounds(string, bounds[:to] + 1)
end
string
end
# Reverse a single word between indices "from" and "to" in "string"
def reverse(s, from, to)
# number of swaps needed for the inclusive range from..to
half = (to - from + 1) / 2
half.times do
s[from], s[to] = s[to], s[from]
# move both ends towards the middle
from, to = from.next, to.pred
end
s
end
# Find the boundaries of the next word starting at index "from"
def next_word_bounds(s, from)
from = s.index(/\S/, from) || s.size
to = s.index(/\s/, from + 1) || s.size
return { from: from, to: to - 1 }
end

in C#, in-place, O(n), and tested:
static char[] ReverseAllWords(char[] in_text)
{
int lindex = 0;
int rindex = in_text.Length - 1;
if (rindex > 1)
{
//reverse complete phrase
in_text = ReverseString(in_text, 0, rindex);
//reverse each word in resultant reversed phrase
for (rindex = 0; rindex <= in_text.Length; rindex++)
{
if (rindex == in_text.Length || in_text[rindex] == ' ')
{
in_text = ReverseString(in_text, lindex, rindex - 1);
lindex = rindex + 1;
}
}
}
return in_text;
}
static char[] ReverseString(char[] intext, int lindex, int rindex)
{
char tempc;
while (lindex < rindex)
{
tempc = intext[lindex];
intext[lindex++] = intext[rindex];
intext[rindex--] = tempc;
}
return intext;
}

This problem can be solved with O(n) in time and O(1) in space. The sample code looks as mentioned below:
public static string reverseWords(String s)
{
char[] stringChar = s.ToCharArray();
int length = stringChar.Length, tempIndex = 0;
Swap(stringChar, 0, length - 1);
for (int i = 0; i < length; i++)
{
if (i == length-1)
{
Swap(stringChar, tempIndex, i);
tempIndex = i + 1;
}
else if (stringChar[i] == ' ')
{
Swap(stringChar, tempIndex, i-1);
tempIndex = i + 1;
}
}
return new String(stringChar);
}
private static void Swap(char[] p, int startIndex, int endIndex)
{
while (startIndex < endIndex)
{
p[startIndex] ^= p[endIndex];
p[endIndex] ^= p[startIndex];
p[startIndex] ^= p[endIndex];
startIndex++;
endIndex--;
}
}

Algorithm:
1).Reverse each word of the string.
2).Reverse resultant String.
public class Solution {
public String reverseWords(String p) {
String reg=" ";
if(p==null||p.length()==0||p.equals(""))
{
return "";
}
String[] a=p.split("\\s+");
StringBuilder res=new StringBuilder();
for(int i=0;i<a.length;i++)
{
String temp=doReverseString(a[i]);
res.append(temp);
res.append(" ");
}
String resultant=doReverseString(res.toString());
System.out.println(res);
return resultant.toString().replaceAll("^\\s+|\\s+$", "");
}
public String doReverseString(String s) {
char str[]=s.toCharArray();
int start=0,end=s.length()-1;
while(start<end)
{
char temp=str[start];
str[start]=str[end];
str[end]=temp;
start++;
end--;
}
String a=new String(str);
return a;
}
public static void main(String[] args)
{
Solution r=new Solution();
String main=r.reverseWords("kya hua");
//System.out.println(re);
System.out.println(main);
}
}

A one liner:
l="Is this as expected ??"
" ".join(each[::-1] for each in l[::-1].split())
Output:
'?? expected as this Is'

The algorithm for this problem is a two-step process: the first step reverses the individual words of the string, and the second step reverses the whole string. The implementation takes O(n) time and O(1) space.
#include <stdio.h>
#include <string.h>
void reverseStr(char* s, int start, int end);
int main()
{
char s[] = "This is test string";
int start = 0;
int end = 0;
int i = 0;
while (1) {
if (s[i] == ' ' || s[i] == '\0')
{
reverseStr(s, start, end-1);
start = i + 1;
end = start;
}
else{
end++;
}
if(s[i] == '\0'){
break;
}
i++;
}
reverseStr(s, 0, strlen(s)-1);
printf("\n\noutput= %s\n\n", s);
return 0;
}
void reverseStr(char* s, int start, int end)
{
char temp;
int j = end;
int i = start;
for (i = start; i < j ; i++, j--) {
temp = s[i];
s[i] = s[j];
s[j] = temp;
}
}


