minimum length window in string1 where string2 is subsequence - algorithm

The main DNA sequence (a string) is given (let's say string1), along with another string to search for (let's say string2). You have to find the minimum-length window in string1 in which string2 is a subsequence.
string1 = "abcdefababaef"
string2 = "abf"
Approaches that I thought of, but which do not seem to work:
1. Use the longest common subsequence (LCS) approach and check whether length of LCS = length of string2. But this only tells me whether string2 is present in string1 as a subsequence, not the smallest window.
2. The KMP algorithm, but I am not sure how to modify it.
3. Prepare a map {character: positions of the character in string1} for the characters that occur in string2. Like:
{ a : 0,6,8,10
b : 1,7,9
f : 5,12 }
And then use some approach to find the min window while still maintaining the order of "abf".
I am not sure whether I am thinking in the right direction or am totally off.
Is there a known algorithm for this, or does anyone know any approach? Kindly suggest.
Thanks in advance.

You can run LCS and then recover all maximal occurrences of string2 inside string1 by recursing over the DP table of the LCS result. Then compute the window length of each such occurrence and take the minimum. You can also prune a branch as soon as it already exceeds the size of the current smallest window found.
See "Reading out all LCSs" here:
http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

Dynamic Programming!
Here is a C++ implementation:
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main() {
    string a, b;
    cin >> a >> b;
    int m = a.size(), n = b.size();
    int inf = 100000000;
    // dp[i][j] = length of the shortest prefix a[j...k] such that b[i...] is a subsequence of a[j...k]
    vector< vector<int> > dp(n + 1, vector<int>(m + 1, inf));
    dp[n] = vector<int>(m + 1, 0); // b[n...] = "", so dp[n][j] = 0 for each j
    for (int i = n - 1; i >= 0; --i) {
        for (int j = m - 1; j >= 0; --j) {
            if (b[i] == a[j]) dp[i][j] = 1 + dp[i+1][j+1];
            else dp[i][j] = 1 + dp[i][j+1];
        }
    }
    int l, r, min_len = inf;
    for (int i = 0; i < m; ++i) {
        if (dp[0][i] < min_len) {
            min_len = dp[0][i];
            l = i, r = i + min_len;
        }
    }
    if (min_len == inf) {
        cout << "no solution!\n";
    } else {
        for (int i = l; i < r; ++i) {
            cout << a[i];
        }
        cout << '\n';
    }
    return 0;
}

I found a similar interview question on CareerCup, the only difference being that it uses an array of integers instead of characters. I borrowed an idea and made a few changes; let me know if you have any questions after reading this C++ code.
What I am trying to do here is: the for loop in the main function loops over all elements of the given array and finds the positions where the first element of the subarray occurs. Once such a position is found, I call the find_subsequence function, which recursively matches the elements of the given array against the subarray while preserving the order of elements. Finally, find_subsequence returns the position of the last matched element, and I compute the size of the window.
Please excuse my English, wish I could explain it better.
#include "stdafx.h"
#include "iostream"
#include "vector"
#include "set"
using namespace std;
class Solution {
public:
int find_subsequence(const vector<int> &s, const vector<int> &c, int arrayStart, int subArrayStart) {
    // The whole subarray has been matched: return the position of its last matched element.
    if (subArrayStart == (int)c.size() - 1) return arrayStart;
    // Ran out of array elements before matching the whole subarray.
    if (arrayStart + 1 >= (int)s.size()) return -1;
    if (s[arrayStart + 1] == c[subArrayStart + 1])
        return find_subsequence(s, c, arrayStart + 1, subArrayStart + 1);
    else
        return find_subsequence(s, c, arrayStart + 1, subArrayStart);
}
};
int main()
{
vector<int> v = { 1,5,3,5,6,7,8,5,6,8,7,8,0,7 };
vector<int> c = { 5,6,8,7 };
Solution s;
int size = INT_MAX;
int j = -1;
for (int i = 0; i <v.size(); i++) {
if(v[i]==c[0]){
int x = s.find_subsequence(v, c, i-1, -1);
if (x > -1) {
if (x - i + 1 < size) {
size = x - i + 1;
j = i;
}
if (size == c.size())
break;
}
}
}
cout << size <<" "<<j;
return 0;
}

Related

Longest increasing subarray after adding or subtracting an amount of at most K to each element

Given an array, we can add to or subtract from each element an amount of at most K, to make the longest possible increasing (non-strictly, as the example shows) subarray.
Example: given an array a=[6,4,3,2] and K=1, we can subtract 1 from a[2] and add 1 to a[4] (1-indexed), so the array becomes a=[6,3,3,3] and the LIS is [3,3,3].
An algorithm of complexity O(n) is possible, by considering a "state" approach.
For each index i, the state corresponds to the three values that we can get: A[i]-K, A[i], A[i]+K.
Then, for a given index, for each state s = 0, 1, 2, we can calculate the maximum increasing sequence length terminating at this state.
length[i+1][s] = 1 + max (length[i][s'], if val[i][s'] <= val[i+1][s], for s' = 0, 1, 2)
We can use the fact that length[i][s] is increasing with s.
In practice, if we are only interested in the final maximum length, we don't need to store all the length values.
Here is a simple C++ implementation, to illustrate this algorithm. It only provides the maximum length.
#include <iostream>
#include <vector>
#include <array>
#include <string>
struct Status {
std::array<int, 3> val;
std::array<int, 3> l_seq; // length sequences
};
int longuest_ascending_seq (const std::vector<int>& A, int K) {
int max_length = 0;
int n = A.size();
if (n == 0) return 0;
Status previous, current;
previous = {{A[0]-K, A[0]-K, A[0]-K}, {0, 0, 0}};
for (int i = 0; i < n; ++i) {
current.val = {A[i]-K, A[i], A[i] + K};
for (int j = 0; j < 3; ++j) {
int x = current.val[j];
if (x >= previous.val[2]) {
current.l_seq[j] = previous.l_seq[2] + 1;
} else if (x >= previous.val[1]) {
current.l_seq[j] = previous.l_seq[1] + 1;
} else if (x >= previous.val[0]) {
current.l_seq[j] = previous.l_seq[0] + 1;
} else {
current.l_seq[j] = 1;
}
}
if (current.l_seq[2] > max_length) max_length = current.l_seq[2];
std::swap (previous, current);
}
return max_length;
}
int main() {
std::vector<int> A = {6, 4, 3, 2, 0};
int K = 1;
auto ans = longuest_ascending_seq (A, K);
std::cout << ans << std::endl;
return 0;
}

Dynamic Programming Coin Change Limited Coins

Dynamic Programming Change Problem (Limited Coins).
I'm trying to create a program that takes as INPUT:
int coinValues[]; //e.g [coin1,coin2,coin3]
int coinLimit[]; //e.g [2 coin1 available,1 coin2 available,...]
int amount; //the amount we want change for.
OUTPUT:
int DynProg[]; //of size amount+1.
The output should be an array of size amount+1, in which each cell represents the optimal (minimum) number of coins needed to give change for the amount equal to that cell's index.
EXAMPLE: Let's say that we have the cell of Array at index: 5 with a content of 2.
This means that in order to give change for the amount of 5(INDEX), you need 2(cell's content) coins (Optimal Solution).
Basically I need exactly the output of the first array of this video (C[p]). It's exactly the same problem, with the big DIFFERENCE of LIMITED COINS.
Link to Video.
Note: See the video to understand, ignore the 2nd array of the video, and have in mind that I don't need the combinations, but the DP array, so then I can find which coins to give as change.
Thank you.
Consider the following pseudocode:
for every coin denomination v = coinValues[i]:
    repeat coinLimit[i] times:
        for k from amount - v down to 0 (downward, so that one pass uses this coin at most once):
            if C[k] is reachable and C[k] + 1 < C[k+v] then
                replace C[k+v] with C[k] + 1 and set S[k+v] = v
Is it clear?
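A minimal C++ sketch of this pseudocode (the example denominations, limits, and amount are made up for illustration; C[] and S[] follow the names in the pseudocode, and iterating k downward inside a pass is what keeps one physical coin from being counted twice in that pass):
#include <iostream>
#include <vector>
#include <climits>
using namespace std;

int main() {
    vector<int> coinValues = {5, 3, 2};   // example denominations (made up)
    vector<int> coinLimit  = {2, 1, 2};   // how many of each coin is available
    int amount = 9;

    const int INF = INT_MAX / 2;
    vector<int> C(amount + 1, INF);       // C[k] = fewest coins that sum to k
    vector<int> S(amount + 1, 0);         // S[k] = last coin that improved C[k] (for reconstruction)
    C[0] = 0;

    for (size_t i = 0; i < coinValues.size(); ++i) {
        int v = coinValues[i];
        for (int pass = 0; pass < coinLimit[i]; ++pass) {
            // Walk k downward so this particular copy of the coin is used at most once per pass.
            for (int k = amount - v; k >= 0; --k) {
                if (C[k] != INF && C[k] + 1 < C[k + v]) {
                    C[k + v] = C[k] + 1;
                    S[k + v] = v;
                }
            }
        }
    }

    if (C[amount] == INF) cout << "no solution\n";
    else cout << C[amount] << " coins\n";   // prints "3 coins" here (5 + 2 + 2)
    return 0;
}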
O(nk) solution from an editorial I wrote a while ago:
We start with the basic DP solution that runs in O(k*sum(c)). We have our dp array, where dp[i][j] stores the least possible number of coins from the first i denominations that sum to j. We have the following transition: dp[i][j] = min(dp[i - 1][j - cnt * value[i]] + cnt) for cnt from 0 to j / value[i].
To optimize this to an O(nk) solution, we can use a deque to memorize the minimum values from the previous iteration and make the transitions O(1). The basic idea is that if we want to find the minimum of the last m values in some array, we can maintain an increasing deque that stores possible candidates for the minimum. At each step, we pop off values at the end of the deque greater than the current value before pushing the current value into the back deque. Since the current value is both further to the right and less than the values we popped off, we can be sure they will never be the minimum. Then, we pop off the first element in the deque if it is more than m elements away. The minimum value at each step is now simply the first element in the deque.
We can apply a similar optimization trick to this problem. For each coin type i, we compute the elements of the dp array in this order: For each possible value of j % value[i] in increasing order, we process the values of j which when divided by value[i] produces that remainder in increasing order. Now we can apply the deque optimization trick to find min(dp[i - 1][j - cnt * value[i]] + cnt) for cnt from 0 to j / value[i] in constant time.
Pseudocode:
let n = number of coin denominations
let k = amount of change needed
let v[i] = value of the ith denomination, 1 indexed
let c[i] = maximum number of coins of the ith denomination, 1 indexed
let dp[i][j] = the fewest number of coins needed to sum to j using the first i coin denominations

for i from 1 to k:
    dp[0][i] = INF
for i from 1 to n:
    for rem from 0 to v[i] - 1:
        let d = empty double-ended-queue
        for j from 0 to (k - rem) / v[i]:
            let currval = rem + v[i] * j
            if dp[i - 1][currval] is not INF:
                while d is not empty and dp[i - 1][d.back() * v[i] + rem] + j - d.back() >= dp[i - 1][currval]:
                    d.pop_back()
                d.push_back(j)
            if d is not empty and j - d.front() > c[i]:
                d.pop_front()
            if d is empty:
                dp[i][currval] = INF
            else:
                dp[i][currval] = dp[i - 1][d.front() * v[i] + rem] + j - d.front()
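A hedged C++ translation of this pseudocode, with dp rolled into two 1-D arrays (the denominations, limits, and target in main are made-up example values):
#include <iostream>
#include <vector>
#include <deque>
using namespace std;

int main() {
    // Made-up example: 3 denominations, target k = 9, arrays 1-indexed as in the pseudocode.
    int n = 3, k = 9;
    vector<int> v = {0, 5, 3, 2};   // values of the denominations
    vector<int> c = {0, 2, 1, 2};   // per-denomination limits

    const long long INF = 1e18;
    vector<long long> dpPrev(k + 1, INF), dpCur(k + 1, INF);
    dpPrev[0] = 0;                  // dp[0][0] = 0, dp[0][j] = INF otherwise

    for (int i = 1; i <= n; ++i) {
        for (int rem = 0; rem < v[i]; ++rem) {
            deque<int> d;           // candidate values of j (number of coins of type i used so far)
            for (int j = 0; rem + v[i] * j <= k; ++j) {
                int currval = rem + v[i] * j;
                if (dpPrev[currval] != INF) {
                    while (!d.empty() &&
                           dpPrev[d.back() * v[i] + rem] + j - d.back() >= dpPrev[currval])
                        d.pop_back();
                    d.push_back(j);
                }
                if (!d.empty() && j - d.front() > c[i])
                    d.pop_front();
                dpCur[currval] = d.empty() ? INF
                                           : dpPrev[d.front() * v[i] + rem] + j - d.front();
            }
        }
        swap(dpPrev, dpCur);
    }

    if (dpPrev[k] >= INF) cout << "no solution\n";
    else cout << dpPrev[k] << "\n";   // prints 3 for this example (5 + 2 + 2)
    return 0;
}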
This is what you are looking for.
Assumption made: coin values are in descending order.
public class CoinChangeLimitedCoins {
public static void main(String[] args) {
int[] coins = { 5, 3, 2, 1 };
int[] counts = { 2, 1, 2, 1 };
int target = 9;
int[] nums = combine(coins, counts);
System.out.println(minCount(nums, target, 0, 0, 0));
}
private static int minCount(int[] nums, int target, int sum, int current, int count){
if(sum == target) return count;
if(current >= nums.length) return -1;
if(sum + nums[current] <= target){
return minCount(nums, target, sum+nums[current], current+1, count+1);
} else {
return minCount(nums, target, sum, current+1, count);
}
}
private static int[] combine(int[] coins, int[] counts) {
int sum = 0;
for (int count : counts) {
sum += count;
}
int[] returnArray = new int[sum];
int returnArrayIndex = 0;
for (int i = 0; i < coins.length; i++) {
int count = counts[i];
while (count != 0) {
returnArray[returnArrayIndex] = coins[i];
returnArrayIndex++;
count--;
}
}
return returnArray;
}
}
You can check this question: Minimum coin change problem with limited amount of coins.
BTW, I created a C++ program based on the algorithm from the link above:
#include <iostream>
#include <map>
#include <vector>
#include <algorithm>
#include <limits>
using namespace std;
void copyVec(vector<int> from, vector<int> &to){
for(vector<int>::size_type i = 0; i < from.size(); i++)
to[i] = from[i];
}
vector<int> makeChangeWithLimited(int amount, vector<int> coins, vector<int> limits)
{
vector<int> change;
vector<vector<int>> coinsUsed( amount + 1 , vector<int>(coins.size()));
vector<int> minCoins(amount+1,numeric_limits<int>::max() - 1);
minCoins[0] = 0;
vector<int> limitsCopy(limits.size());
copy(limits.begin(), limits.end(), limitsCopy.begin());
for (vector<int>::size_type i = 0; i < coins.size(); ++i)
{
while (limitsCopy[i] > 0)
{
for (int j = amount; j >= 0; --j)
{
int currAmount = j + coins[i];
if (currAmount <= amount)
{
if (minCoins[currAmount] > minCoins[j] + 1)
{
minCoins[currAmount] = minCoins[j] + 1;
copyVec(coinsUsed[j], coinsUsed[currAmount]);
coinsUsed[currAmount][i] += 1;
}
}
}
limitsCopy[i] -= 1;
}
}
if (minCoins[amount] == numeric_limits<int>::max() - 1)
{
return change;
}
copy(coinsUsed[amount].begin(),coinsUsed[amount].end(), back_inserter(change) );
return change;
}
int main()
{
vector<int> coins;
coins.push_back(20);
coins.push_back(50);
coins.push_back(100);
coins.push_back(200);
vector<int> limits;
limits.push_back(100);
limits.push_back(100);
limits.push_back(50);
limits.push_back(20);
int amount = 0;
cin >> amount;
while(amount){
vector<int> change = makeChangeWithLimited(amount,coins,limits);
for(vector<int>::size_type i = 0; i < change.size(); i++){
cout << change[i] << "x" << coins[i] << endl;
}
if(change.empty()){
cout << "IMPOSSIBE\n";
}
cin >> amount;
}
system("pause");
return 0;
}
Code in C#:
private static int MinCoinsChangeWithLimitedCoins(int[] coins, int[] counts, int sum)
{
var dp = new int[sum + 1];
Array.Fill(dp, int.MaxValue);
dp[0] = 0;
for (int i = 0; i < coins.Length; i++) // n
{
int coin = coins[i];
for (int j = 0; j < counts[i]; j++) //
{
for (int s = sum; s >= coin ; s--) // sum
{
int remainder = s - coin;
if (remainder >= 0 && dp[remainder] != int.MaxValue)
{
dp[s] = Math.Min(1 + dp[remainder], dp[s]);
}
}
}
}
return dp[sum] == int.MaxValue ? -1 : dp[sum];
}

How can I maximise the number of ribbon pieces for a ribbon of given length n?

I have a ribbon, its length is n. I want to cut the ribbon in a way that fulfils the following two conditions:
1. After the cutting each ribbon piece should have length a, b or c.
2. After the cutting the number of ribbon pieces should be maximum.
Find the number of maximum pieces after required cutting.
Input is of the form n, a, b, c where n is the original length of the ribbon, and a, b, c are the allowed lengths of the pieces.
For eg: I/P = 5 5 3 2
O/P = 2
Now, I realize that this calls for a DP solution: a one-dimensional DP where dp[n] represents the maximum number of pieces for a ribbon of length n.
Now, I am not sure if the recurrence relations will be of the form,
dp[n] = dp[n-a] + a;
dp[n] = dp[n-b] + b;
dp[n] = dp[n-c] + c;
Is this correct or there is some other way?
Edit: Implementation according to the first post:
#include <iostream>
#include <cmath>
using namespace std;
int dp[100000];
int maxi (int a,int b,int c);
int main (void)
{
int n,a,b,c;
cin>>n>>a>>b>>c;
for (int i = 0; i <= n; i++)
{
if ( i == 0 )
dp[i] = 0;
else
dp[i] = maxi(dp[i-a],dp[i-b],dp[i-c])+1;
}
cout<<dp[n]<<"\n";
return 0;
}
int maxi (int a,int b,int c)
{
int ret;
if ( a > b )
ret = a;
else
ret = b;
if ( ret < c )
ret = c;
return ret;
}
if n < 0:
    dp[n] = -infinity
if n == 0:
    dp[n] = 0
if n > 0:
    dp[n] = 1 + max(dp[n-a], dp[n-b], dp[n-c])

for (int i = 0; i <= n; i++)
{
    if (i == 0)
        dp[i] = 0;
    else {
        int A = (i-a>=0) ? dp[i-a] : -n-1;
        int B = (i-b>=0) ? dp[i-b] : -n-1;
        int C = (i-c>=0) ? dp[i-c] : -n-1;
        dp[i] = maxi(A,B,C)+1;
    }
}
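Putting the corrected recurrence into a complete program, as a sketch (a large negative value stands in for minus infinity so unreachable lengths can never win the max, and a negative dp[n] is reported as no valid cutting):
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
    int n, a, b, c;
    cin >> n >> a >> b >> c;

    const int NEG = -1000000;            // stands in for minus infinity (unreachable length)
    vector<int> dp(n + 1, NEG);
    dp[0] = 0;                           // a ribbon of length 0 needs 0 pieces

    for (int i = 1; i <= n; ++i) {
        int best = NEG;
        if (i - a >= 0) best = max(best, dp[i - a]);
        if (i - b >= 0) best = max(best, dp[i - b]);
        if (i - c >= 0) best = max(best, dp[i - c]);
        if (best != NEG) dp[i] = best + 1;   // one more piece of length a, b or c
    }

    if (dp[n] < 0) cout << "no valid cutting\n";
    else cout << dp[n] << "\n";          // prints 2 for the input "5 5 3 2"
    return 0;
}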

How to find the subarray that has sum closest to zero or a certain value t in O(nlogn)

Actually it is problem #10 of chapter 8 of Programming Pearls, 2nd edition. It asks two questions: given an array A[] of integers (positive and nonpositive), how can you find a contiguous subarray of A[] whose sum is closest to 0? Or closest to a certain value t?
I can think of a way to solve the closest-to-0 problem: calculate the prefix-sum array S[], where S[i] = A[0]+A[1]+...+A[i], then sort S by value while keeping each element's original index. To find the subarray sum closest to 0, just iterate over the sorted S, take the difference of each pair of neighbouring values, and keep the minimum absolute difference.
The question is, what is the best way to solve the second problem, closest to a certain value t? Can anyone give code or at least an algorithm? (If anyone has a better solution to the closest-to-zero problem, answers are welcome too.)
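For reference, a minimal sketch of the prefix-sum-and-sort idea described above for the closest-to-zero case (the sample array is arbitrary; a prefix sum of 0 at index -1 is added so subarrays starting at index 0 are also considered):
#include <iostream>
#include <vector>
#include <utility>
#include <algorithm>
using namespace std;

int main() {
    vector<int> A = {8, -3, 2, 1, -4, 10, -5};     // example input
    int n = A.size();

    // (prefix sum, index of last element included); sum 0 at index -1 covers subarrays starting at 0
    vector<pair<long long, int>> S;
    S.push_back({0, -1});
    long long run = 0;
    for (int i = 0; i < n; ++i) {
        run += A[i];
        S.push_back({run, i});
    }
    sort(S.begin(), S.end());

    // Two prefix sums that are closest in value bound the subarray whose sum is closest to 0.
    long long best = -1;
    int l = 0, r = 0;
    for (size_t i = 0; i + 1 < S.size(); ++i) {
        long long diff = S[i + 1].first - S[i].first;    // non-negative, since S is sorted
        if (best < 0 || diff < best) {
            best = diff;
            l = min(S[i].second, S[i + 1].second) + 1;
            r = max(S[i].second, S[i + 1].second);
        }
    }
    cout << "sum closest to zero is " << best << ", over A[" << l << ".." << r << "]\n";
    return 0;
}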
To solve this problem, you can build an interval tree of your own, or a balanced binary search tree, or even benefit from the STL map, in O(n log n).
The following uses an STL map, with lower_bound().
#include <map>
#include <iostream>
#include <algorithm>
using namespace std;
int A[] = {10,20,30,30,20,10,10,20};
// return (i, j) s.t. A[i] + ... + A[j] is nearest to value c
pair<int, int> nearest_to_c(int c, int n, int A[]) {
map<int, int> bst;
bst[0] = -1;
// barriers
bst[-int(1e9)] = -2;
bst[int(1e9)] = n;
int sum = 0, start, end, ret = c;
for (int i=0; i<n; ++i) {
sum += A[i];
// it->first >= sum-c, and with the minimal value in bst
map<int, int>::iterator it = bst.lower_bound(sum - c);
int tmp = -(sum - c - it->first);
if (tmp < ret) {
ret = tmp;
start = it->second + 1;
end = i;
}
--it;
// it->first < sum-c, and with the maximal value in bst
tmp = sum - c - it->first;
if (tmp < ret) {
ret = tmp;
start = it->second + 1;
end = i;
}
bst[sum] = i;
}
return make_pair(start, end);
}
// demo
int main() {
int c;
cin >> c;
pair<int, int> ans = nearest_to_c(c, 8, A);
cout << ans.first << ' ' << ans.second << endl;
return 0;
}
You can adapt your method. Assuming you have an array S of prefix sums, as you wrote, and already sorted in increasing order of sum value. The key concept is to not only examine consecutive prefix sums, but instead use two pointers to indicate two positions in the array S. Written in a (slightly pythonic) pseudocode:
left = 0                       # Initialize window of length 0 ...
right = 0                      # ... at the beginning of the array
best = ∞                       # Keep track of best solution so far
while right < length(S):       # Iterate until window reaches the end of the array
    diff = S[right] - S[left]
    if diff < t:               # Window is getting too small
        if t - diff < best:    # We have a new best subarray
            best = t - diff
            # remember left and right as well
        right = right + 1      # Make window bigger
    else:                      # Window getting too big
        if diff - t < best:    # We have a new best subarray
            best = diff - t
            # remember left and right as well
        left = left + 1        # Make window smaller
The complexity is bound by the sorting. The above search will take at most 2n=O(n) iterations of the loop, each with computation time bound by a constant. Note that the above code was conceived for positive t.
The code was conceived for positive elements in S and positive t. If any negative integers crop up, you might end up with a situation where the original index of right is smaller than that of left, so you'd end up with a subarray sum of -t. You can check for this condition in the if … < best checks, but if you only suppress such cases there, I believe you might be missing some relevant cases. Bottom line: take this idea, think it through, but you'll have to adapt it for negative numbers.
Note: I think that this is the same general idea which Boris Strandjev wanted to express in his solution. However, I found that solution somewhat hard to read and harder to understand, so I'm offering my own formulation of this.
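A direct C++ rendering of the pseudocode above, under the same assumptions spelled out in this answer (S already sorted, non-negative elements, positive t); it returns only the best distance, with the best left/right to be remembered where the comments indicate:
#include <iostream>
#include <vector>
using namespace std;

// S: prefix sums, already sorted in increasing order; t: positive target value.
// Returns the smallest |(S[right] - S[left]) - t| encountered.
long long closestToT(const vector<long long>& S, long long t) {
    size_t left = 0, right = 0;
    long long best = -1;                 // -1 plays the role of "infinity / not set yet"
    while (right < S.size()) {
        long long diff = S[right] - S[left];
        if (diff < t) {                  // window is getting too small
            if (best < 0 || t - diff < best) best = t - diff;   // new best; remember left/right
            ++right;                     // make window bigger
        } else {                         // window is getting too big
            if (best < 0 || diff - t < best) best = diff - t;   // new best; remember left/right
            ++left;                      // make window smaller
        }
    }
    return best;
}

int main() {
    vector<long long> S = {0, 4, 5, 7, 8, 8, 9, 14};   // example sorted prefix sums
    cout << closestToT(S, 6) << "\n";                  // best achievable distance to t = 6
    return 0;
}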
Your solution for the 0 case seems ok to me. Here is my solution to the second case:
You again calculate the prefix sums and sort.
You initialize two indices, start to 0 (the first index in the sorted prefix array) and end to last (the last index of the sorted prefix array).
You start iterating start over 0...last, and for each start you find the corresponding end - the last index at which the prefix sum is such that prefix[start] + prefix[end] > t. When you find that end, the best solution for this start is either prefix[start] + prefix[end] or prefix[start] + prefix[end - 1] (the latter taken only if end > 0).
The most important thing is that you do not search for end from scratch for each start - prefix[start] increases in value as you iterate over all possible values of start, which means that in each iteration you are interested only in values <= the previous value of end.
You can stop iterating when start > end.
You take the best of the values obtained for all start positions.
It can easily be proved that this will give you complexity of O(n logn) for the entire algorithm.
I found this question by accident. Although it's been a while, I'll just post it: an O(n log n) time, O(n) space algorithm. This is runnable Java code. Hope this helps people.
import java.util.*;
public class FindSubarrayClosestToZero {
void findSubarrayClosestToZero(int[] A) {
int curSum = 0;
List<Pair> list = new ArrayList<Pair>();
// Prefix sum 0 at index -1, so subarrays that start at index 0 are also considered
list.add(new Pair(0, -1));
// 1. create prefix array: curSum array
for(int i = 0; i < A.length; i++) {
curSum += A[i];
Pair pair = new Pair(curSum, i);
list.add(pair);
}
// 2. sort the prefix array by value
Collections.sort(list, valueComparator);
// printPairList(list);
System.out.println();
// 3. compute pair-wise value diff: Triple< diff, i, i+1>
List<Triple> tList = new ArrayList<Triple>();
for(int i=0; i < list.size()-1; i++) {
Pair p1 = list.get(i);
Pair p2 = list.get(i+1);
int valueDiff = p2.value - p1.value;
Triple Triple = new Triple(valueDiff, p1.index, p2.index);
tList.add(Triple);
}
// printTripleList(tList);
System.out.println();
// 4. Sort by min diff
Collections.sort(tList, valueDiffComparator);
// printTripleList(tList);
Triple res = tList.get(0);
int startIndex = Math.min(res.index1 + 1, res.index2);
int endIndex = Math.max(res.index1 + 1, res.index2);
System.out.println("\n\nThe subarray whose sum is closest to 0 is: ");
for(int i= startIndex; i<=endIndex; i++) {
System.out.print(" " + A[i]);
}
}
class Pair {
int value;
int index;
public Pair(int value, int index) {
this.value = value;
this.index = index;
}
}
class Triple {
int valueDiff;
int index1;
int index2;
public Triple(int valueDiff, int index1, int index2) {
this.valueDiff = valueDiff;
this.index1 = index1;
this.index2 = index2;
}
}
public static Comparator<Pair> valueComparator = new Comparator<Pair>() {
public int compare(Pair p1, Pair p2) {
return p1.value - p2.value;
}
};
public static Comparator<Triple> valueDiffComparator = new Comparator<Triple>() {
public int compare(Triple t1, Triple t2) {
return t1.valueDiff - t2.valueDiff;
}
};
void printPairList(List<Pair> list) {
for(Pair pair : list) {
System.out.println("<" + pair.value + " : " + pair.index + ">");
}
}
void printTripleList(List<Triple> list) {
for(Triple t : list) {
System.out.println("<" + t.valueDiff + " : " + t.index1 + " , " + t.index2 + ">");
}
}
public static void main(String[] args) {
int A1[] = {8, -3, 2, 1, -4, 10, -5}; // -3, 2, 1
int A2[] = {-3, 2, 4, -6, -8, 10, 11}; // 2, 4, 6
int A3[] = {10, -2, -7}; // 10, -2, -7
FindSubarrayClosestToZero f = new FindSubarrayClosestToZero();
f.findSubarrayClosestToZero(A1);
f.findSubarrayClosestToZero(A2);
f.findSubarrayClosestToZero(A3);
}
}
Solution time complexity: O(N log N)
Solution space complexity: O(N)
[Note: this problem can't be solved in O(N) as some have claimed.]
Algorithm:-
Compute the cumulative (prefix sum) array, here cum[], of the given array
Sort the cumulative array
The answer is the minimum of cum[i+1] - cum[i] over all i ∈ [1, n-1] (1-based index)
C++ Code:-
#include<bits/stdc++.h>
#define M 1000010
#define REP(i,n) for (int i=1;i<=n;i++)
using namespace std;
typedef long long ll;
ll a[M],n,cum[M],ans=numeric_limits<ll>::max(); //cum->cumulative array
int main() {
ios::sync_with_stdio(false);cin.tie(0);cout.tie(0);
cin>>n; REP(i,n) cin>>a[i],cum[i]=cum[i-1]+a[i];
sort(cum+1,cum+n+1);
REP(i,n-1) ans=min(ans,cum[i+1]-cum[i]);
cout<<ans; //min +ve difference from 0 we can get
}
After thinking about this problem some more, I found that @frankyym's solution is the right one. I have made some refinements to the original solution; here is my code:
#include <map>
#include <stdio.h>
#include <algorithm>
#include <limits.h>
using namespace std;
#define IDX_LOW_BOUND -2
// Return [i..j] range of A
pair<int, int> nearest_to_c(int A[], int n, int t) {
map<int, int> bst;
int presum, subsum, closest, i, j, start, end;
bool unset;
map<int, int>::iterator it;
bst[0] = -1;
// Barriers. Assume that no prefix sum is equal to INT_MAX or INT_MIN.
bst[INT_MIN] = IDX_LOW_BOUND;
bst[INT_MAX] = n;
unset = true;
// This initial value is always overwritten afterwards.
closest = 0;
presum = 0;
for (i = 0; i < n; ++i) {
presum += A[i];
for (it = bst.lower_bound(presum - t), j = 0; j < 2; --it, j++) {
if (it->first == INT_MAX || it->first == INT_MIN)
continue;
subsum = presum - it->first;
if (unset || abs(closest - t) > abs(subsum - t)) {
closest = subsum;
start = it->second + 1;
end = i;
if (closest - t == 0)
goto ret;
unset = false;
}
}
bst[presum] = i;
}
ret:
return make_pair(start, end);
}
int main() {
int A[] = {10, 20, 30, 30, 20, 10, 10, 20};
int t;
scanf("%d", &t);
pair<int, int> ans = nearest_to_c(A, 8, t);
printf("[%d:%d]\n", ans.first, ans.second);
return 0;
}
As a side note: I agree with the algorithms provided in the other answers here. There is another algorithm off the top of my head: make another copy of A[], call it B[], where each element is A[i]-t/n, i.e. B[0]=A[0]-t/n, B[1]=A[1]-t/n ... B[n-1]=A[n-1]-t/n. Then the second problem is transformed into the first problem: once the smallest subarray of B[] closest to 0 is found, the subarray of A[] closest to t is found at the same time. (It is kinda tricky if t is not divisible by n; the precision has to be chosen appropriately. Also the runtime is O(n).)
I think there is a little bug in the closest-to-0 solution: at the last step we should not only inspect the difference between neighbouring elements, but also between elements that are not adjacent to each other, if one of them is bigger than 0 and the other one is smaller than 0.
Sorry, I thought I was supposed to get all answers for the problem; didn't see that it only requires one.
Can't we use dynamic programming to solve this question, similar to Kadane's algorithm? Here is my solution to this problem. Please comment if this approach is wrong.
#include <bits/stdc++.h>
using namespace std;
int main() {
//code
int test;
cin>>test;
while(test--){
int n;
cin>>n;
vector<int> A(n);
for(int i=0;i<n;i++)
cin>>A[i];
int closest_so_far=A[0];
int closest_end_here=A[0];
int start=0;
int end=0;
int lstart=0;
int lend=0;
for(int i=1;i<n;i++){
if(abs(A[i]-0)<abs(A[i]+closest_end_here-0)){
closest_end_here=A[i]-0;
lstart=i;
lend=i;
}
else{
closest_end_here=A[i]+closest_end_here-0;
lend=i;
}
if(abs(closest_end_here-0)<abs(closest_so_far-0)){
closest_so_far=closest_end_here;
start=lstart;
end=lend;
}
}
for(int i=start;i<=end;i++)
cout<<A[i]<<" ";
cout<<endl;
cout<<closest_so_far<<endl;
}
return 0;
}
Here is a code implementation in Java:
public class Solution {
/**
* @param nums: A list of integers
* @return: A list of integers including the index of the first number
* and the index of the last number
*/
public ArrayList<Integer> subarraySumClosest(int[] nums) {
// write your code here
ArrayList<Integer> result = new ArrayList<Integer>();
// Guard first: bail out before nums is dereferenced anywhere else
if(nums == null || nums.length < 2){
result.add(0);
result.add(0);
return result;
}
int len = nums.length;
int[] sum = new int[len];
HashMap<Integer,Integer> mapHelper = new HashMap<Integer,Integer>();
int min = Integer.MAX_VALUE;
int curr1 = 0;
int curr2 = 0;
sum[0] = nums[0];
for(int i = 1;i < len;i++){
sum[i] = sum[i-1] + nums[i];
}
for(int i = 0;i < len;i++){
if(mapHelper.containsKey(sum[i])){
result.add(mapHelper.get(sum[i])+1);
result.add(i);
return result;
}
else{
mapHelper.put(sum[i],i);
}
}
Arrays.sort(sum);
for(int i = 0;i < len-1;i++){
if(Math.abs(sum[i] - sum[i+1]) < min){
min = Math.abs(sum[i] - sum[i+1]);
curr1 = sum[i];
curr2 = sum[i+1];
}
}
if(mapHelper.get(curr1) < mapHelper.get(curr2)){
result.add(mapHelper.get(curr1)+1);
result.add(mapHelper.get(curr2));
}
else{
result.add(mapHelper.get(curr2)+1);
result.add(mapHelper.get(curr1));
}
return result;
}
}

Generate all unique substrings for given string

Given a string s, what is the fastest method to generate a set of all its unique substrings?
Example: for str = "aba" we would get substrs={"a", "b", "ab", "ba", "aba"}.
The naive algorithm would be to traverse the entire string, generating substrings of length 1..n in each iteration, yielding an O(n^2) upper bound on their number.
Is a better bound possible?
(this is technically homework, so pointers-only are welcome as well)
As other posters have said, there are potentially O(n^2) substrings for a given string, so printing them out cannot be done faster than that. However there exists an efficient representation of the set that can be constructed in linear time: the suffix tree.
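A full suffix tree is a fair amount of code; a closely related structure, the suffix automaton, can also be built in near-linear time and yields the number of distinct substrings directly. A minimal sketch (note this substitutes a suffix automaton for the suffix tree mentioned above; map-based transitions make it O(n log sigma)):
#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;

// Suffix automaton: a compact alternative to the suffix tree for counting distinct substrings.
struct SuffixAutomaton {
    struct State { int len = 0, link = -1; map<char, int> next; };
    vector<State> st;
    int last;

    explicit SuffixAutomaton(const string& s) {
        st.reserve(2 * s.size() + 2);
        st.push_back(State());          // initial state (empty string)
        last = 0;
        for (char c : s) extend(c);
    }

    void extend(char c) {
        int cur = (int)st.size();
        st.push_back(State());
        st[cur].len = st[last].len + 1;
        int p = last;
        while (p != -1 && !st[p].next.count(c)) { st[p].next[c] = cur; p = st[p].link; }
        if (p == -1) {
            st[cur].link = 0;
        } else {
            int q = st[p].next[c];
            if (st[p].len + 1 == st[q].len) {
                st[cur].link = q;
            } else {
                State cloneState = st[q];            // copy transitions and suffix link of q
                cloneState.len = st[p].len + 1;
                int clone = (int)st.size();
                st.push_back(cloneState);
                while (p != -1 && st[p].next[c] == q) { st[p].next[c] = clone; p = st[p].link; }
                st[q].link = clone;
                st[cur].link = clone;
            }
        }
        last = cur;
    }

    // Every state v except the initial one contributes len(v) - len(link(v)) distinct substrings.
    long long countDistinct() const {
        long long total = 0;
        for (size_t v = 1; v < st.size(); ++v)
            total += st[v].len - st[st[v].link].len;
        return total;
    }
};

int main() {
    cout << SuffixAutomaton("aba").countDistinct() << "\n";   // 5: a, b, ab, ba, aba
    return 0;
}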
There is no way to do this faster than O(n^2), because there are a total of O(n^2) substrings in a string: if you have to generate them all, their number will be n(n + 1) / 2 in the worst case, hence the Ω(n^2) lower bound.
The first one is brute force, which has complexity O(N^3) and could be brought down to O(N^2 log(N)).
The second one uses a HashSet, which has complexity O(N^2).
The third one uses LCP, after initially finding all the suffixes of the given string; it has worst case O(N^2) and best case O(N log(N)).
First Solution:-
import java.util.Scanner;
public class DistinctSubString {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.print("Enter The string");
String s = in.nextLine();
long startTime = System.currentTimeMillis();
int L = s.length();
int N = L * (L + 1) / 2;
String[] Comb = new String[N];
for (int i = 0, p = 0; i < L; ++i) {
for (int j = 0; j < (L - i); ++j) {
Comb[p++] = s.substring(j, i + j + 1);
}
}
/*
* for(int j=0;j<N;++j) { System.out.println(Comb[j]); }
*/
boolean[] val = new boolean[N];
for (int i = 0; i < N; ++i)
val[i] = true;
int counter = N;
int p = 0, start = 0;
for (int i = 0, j; i < L; ++i) {
p = L - i;
for (j = start; j < (start + p); ++j) {
if (val[j]) {
//System.out.println(Comb[j]);
for (int k = j + 1; k < start + p; ++k) {
if (Comb[j].equals(Comb[k])) {
counter--;
val[k] = false;
}
}
}
}
start = j;
}
System.out.println("Substrings are " + N
+ " of which unique substrings are " + counter);
long endTime = System.currentTimeMillis();
System.out.println("It took " + (endTime - startTime) + " milliseconds");
}
}
Second Solution:-
import java.util.*;
public class DistictSubstrings_usingHashTable {
public static void main(String args[]) {
// create a hash set
Scanner in = new Scanner(System.in);
System.out.print("Enter The string");
String s = in.nextLine();
int L = s.length();
long startTime = System.currentTimeMillis();
Set<String> hs = new HashSet<String>();
// add elements to the hash set
for (int i = 0; i < L; ++i) {
for (int j = 0; j < (L - i); ++j) {
hs.add(s.substring(j, i + j + 1));
}
}
System.out.println(hs.size());
long endTime = System.currentTimeMillis();
System.out.println("It took " + (endTime - startTime) + " milliseconds");
}
}
Third Solution:-
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
public class LCPsolnFroDistinctSubString {
public static void main(String[] args) throws IOException {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
System.out.println("Enter Desired String ");
String string = br.readLine();
int length = string.length();
String[] arrayString = new String[length];
for (int i = 0; i < length; ++i) {
arrayString[i] = string.substring(length - 1 - i, length);
}
Arrays.sort(arrayString);
for (int i = 0; i < length; ++i)
System.out.println(arrayString[i]);
long num_substring = arrayString[0].length();
for (int i = 0; i < length - 1; ++i) {
int j = 0;
for (; j < arrayString[i].length(); ++j) {
if (!((arrayString[i].substring(0, j + 1)).equals((arrayString)[i + 1]
.substring(0, j + 1)))) {
break;
}
}
num_substring += arrayString[i + 1].length() - j;
}
System.out.println("unique substrings = " + num_substring);
}
}
Fourth Solution:-
public static void printAllCombinations(String soFar, String rest) {
if(rest.isEmpty()) {
System.out.println(soFar);
} else {
printAllCombinations(soFar + rest.substring(0,1), rest.substring(1));
printAllCombinations(soFar , rest.substring(1));
}
}
Test case:- printAllCombinations("", "abcd");
For big-O ... the best you could do would be O(n^2).
No need to reinvent the wheel; it's not based on strings but on sets, so you will have to take the concepts and apply them to your own situation:
Algorithms
Really Good White Paper from MS
In depth PowerPoint
Blog on string perms
Well, since there are potentially n*(n+1)/2 different substrings (+1 for the empty substring), I doubt you can do better than O(n^2) (worst case). The easiest thing is to generate them and use some nice O(1) lookup table (such as a hashmap) for excluding duplicates right when you find them.
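A minimal C++ sketch of that generate-and-deduplicate idea (an unordered_set plays the role of the O(1) lookup table; note that building each substring itself costs up to O(n), so the whole thing is O(n^3) in the worst case even though only O(n^2) substrings are produced):
#include <iostream>
#include <string>
#include <unordered_set>
using namespace std;

int main() {
    string s = "aba";                        // example input
    unordered_set<string> seen;              // the O(1) lookup table for deduplication
    for (size_t i = 0; i < s.size(); ++i)
        for (size_t len = 1; i + len <= s.size(); ++len)
            seen.insert(s.substr(i, len));   // duplicates are silently discarded
    cout << seen.size() << " unique substrings:\n";
    for (const string& sub : seen) cout << sub << "\n";
    return 0;
}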
class SubstringsOfAString {
public static void main(String args[]) {
String string = "Hello", sub = null;
System.out.println("Substrings of \"" + string + "\" are :-");
for (int i = 0; i < string.length(); i++) {
for (int j = 1; j <= string.length() - i; j++) {
sub = string.substring(i, j + i);
System.out.println(sub);
}
}
}
}
class program
{
List<String> lst = new List<String>();
String str = "abc";
public void func()
{
subset(0, "");
lst.Sort();
lst = lst.Distinct().ToList();
foreach (String item in lst)
{
Console.WriteLine(item);
}
}
void subset(int n, String s)
{
for (int i = n; i < str.Length; i++)
{
lst.Add(s + str[i].ToString());
subset(i + 1, s + str[i].ToString());
}
}
}
This prints unique substrings.
https://ideone.com/QVWOh0
def uniq_substring(test):
    lista = []
    [lista.append(test[i:i+k+1]) for i in range(len(test)) for k in
     range(len(test)-i) if test[i:i+k+1] not in lista and
     test[i:i+k+1][::-1] not in lista]
    print lista

uniq_substring('rohit')
uniq_substring('abab')

['r', 'ro', 'roh', 'rohi', 'rohit', 'o', 'oh', 'ohi', 'ohit', 'h',
 'hi', 'hit', 'i', 'it', 't']
['a', 'ab', 'aba', 'abab', 'b', 'bab']
Many answers that include 2 for loops and a .substring() call claim O(N^2) time complexity. However, it is important to note that the worst case for a .substring() call in Java (post update 6 in Java 7) is O(N). So by adding a .substring() call in your code, the order of N has increased by one.
Therefore, 2 for loops and a .substring() call within those loops equals an O(N^3) time complexity.
It can only be done in O(n^2) time, as the total number of substrings of a string is n(n+1)/2 in the worst case.
Example:
string s = "abcd"
pass 0: (all the strings are of length 1)
a, b, c, d = 4 strings
pass 1: (all the strings are of length 2)
ab, bc, cd = 3 strings
pass 2: (all the strings are of length 3)
abc, bcd = 2 strings
pass 3: (all the strings are of length 4)
abcd = 1 strings
Using this pattern, we can write a solution with O(n^2) time complexity and constant space complexity.
The source code is as below:
#include<stdio.h>
void print(char arr[], int start, int end)
{
int i;
for(i=start;i<=end;i++)
{
printf("%c",arr[i]);
}
printf("\n");
}
void substrings(char arr[], int n)
{
int pass,j,start,end;
int no_of_strings = n-1;
for(pass=0;pass<n;pass++)
{
start = 0;
end = start+pass;
for(j=no_of_strings;j>=0;j--)
{
print(arr,start, end);
start++;
end = start+pass;
}
no_of_strings--;
}
}
int main()
{
char str[] = "abcd";
substrings(str,4);
return 0;
}
The naive algorithm takes O(n^3) time, not O(n^2) time.
There are O(n^2) substrings.
If you put those O(n^2) substrings into, say, a set, the set performs O(lg n) comparisons for each string to check whether it already exists in the set or not.
Besides, a string comparison itself takes O(n) time.
Therefore, it takes O(n^3 lg n) time if you use a set, and you can reduce it to O(n^3) time if you use a hash table instead of a set.
The point is that these are string comparisons, not number comparisons.
So one of the best approaches is to use a suffix array and the longest common prefix (LCP) algorithm, which reduces this problem to O(n^2) time:
Building a suffix array: O(n) with a linear-time construction algorithm.
Time for the LCP array: O(n).
Since an LCP is taken for each pair of adjacent suffixes in the suffix array, the total time is O(n^2) to find the lengths of the distinct substrings.
Besides, if you want to print all distinct substrings, it takes O(n^2) time.
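For the counting step alone, once a suffix array sa[] and an LCP array lcp[] are available (built by whatever method), the tally described above takes only a few lines. A sketch, assuming the Kasai convention that lcp[i] is the LCP of the suffixes starting at sa[i] and sa[i+1]:
#include <vector>

// Number of distinct substrings given a suffix array sa[] and an LCP array lcp[].
long long countDistinct(const std::vector<int>& sa, const std::vector<int>& lcp) {
    int n = (int)sa.size();
    long long total = 0;
    for (int i = 0; i < n; ++i) {
        total += n - sa[i];                  // every prefix of the suffix starting at sa[i]
        if (i + 1 < n) total -= lcp[i];      // subtract prefixes shared with the next suffix, counted once
    }
    return total;
}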
Try this code using a suffix array and longest common prefix. It can also give you the total number of unique substrings. The code might give a stack overflow in Visual Studio but runs fine in Eclipse C++ (that's because the functions return vectors by value). I haven't tested it against extremely long strings; will do so and report back.
// C++ program for building LCP array for given text
#include <bits/stdc++.h>
#include <vector>
#include <string>
using namespace std;
#define MAX 100000
int cum[MAX];
// Structure to store information of a suffix
struct suffix
{
int index; // To store original index
int rank[2]; // To store ranks and next rank pair
};
// A comparison function used by sort() to compare two suffixes
// Compares two pairs, returns 1 if first pair is smaller
int cmp(struct suffix a, struct suffix b)
{
return (a.rank[0] == b.rank[0])? (a.rank[1] < b.rank[1] ?1: 0):
(a.rank[0] < b.rank[0] ?1: 0);
}
// This is the main function that takes a string 'txt' of size n as an
// argument, builds and return the suffix array for the given string
vector<int> buildSuffixArray(string txt, int n)
{
// A structure to store suffixes and their indexes
struct suffix suffixes[n];
// Store suffixes and their indexes in an array of structures.
// The structure is needed to sort the suffixes alphabetically
// and maintain their old indexes while sorting
for (int i = 0; i < n; i++)
{
suffixes[i].index = i;
suffixes[i].rank[0] = txt[i] - 'a';
suffixes[i].rank[1] = ((i+1) < n)? (txt[i + 1] - 'a'): -1;
}
// Sort the suffixes using the comparison function
// defined above.
sort(suffixes, suffixes+n, cmp);
// At this point, all suffixes are sorted according to first
// 2 characters. Let us sort suffixes according to first 4
// characters, then first 8 and so on
int ind[n]; // This array is needed to get the index in suffixes[]
// from original index. This mapping is needed to get
// next suffix.
for (int k = 4; k < 2*n; k = k*2)
{
// Assigning rank and index values to first suffix
int rank = 0;
int prev_rank = suffixes[0].rank[0];
suffixes[0].rank[0] = rank;
ind[suffixes[0].index] = 0;
// Assigning rank to suffixes
for (int i = 1; i < n; i++)
{
// If first rank and next ranks are same as that of previous
// suffix in array, assign the same new rank to this suffix
if (suffixes[i].rank[0] == prev_rank &&
suffixes[i].rank[1] == suffixes[i-1].rank[1])
{
prev_rank = suffixes[i].rank[0];
suffixes[i].rank[0] = rank;
}
else // Otherwise increment rank and assign
{
prev_rank = suffixes[i].rank[0];
suffixes[i].rank[0] = ++rank;
}
ind[suffixes[i].index] = i;
}
// Assign next rank to every suffix
for (int i = 0; i < n; i++)
{
int nextindex = suffixes[i].index + k/2;
suffixes[i].rank[1] = (nextindex < n)?
suffixes[ind[nextindex]].rank[0]: -1;
}
// Sort the suffixes according to first k characters
sort(suffixes, suffixes+n, cmp);
}
// Store indexes of all sorted suffixes in the suffix array
vector<int>suffixArr;
for (int i = 0; i < n; i++)
suffixArr.push_back(suffixes[i].index);
// Return the suffix array
return suffixArr;
}
/* To construct and return LCP */
vector<int> kasai(string txt, vector<int> suffixArr)
{
int n = suffixArr.size();
// To store LCP array
vector<int> lcp(n, 0);
// An auxiliary array to store inverse of suffix array
// elements. For example if suffixArr[0] is 5, the
// invSuff[5] would store 0. This is used to get next
// suffix string from suffix array.
vector<int> invSuff(n, 0);
// Fill values in invSuff[]
for (int i=0; i < n; i++)
invSuff[suffixArr[i]] = i;
// Initialize length of previous LCP
int k = 0;
// Process all suffixes one by one starting from
// first suffix in txt[]
for (int i=0; i<n; i++)
{
/* If the current suffix is at n-1, then we don’t
have next substring to consider. So lcp is not
defined for this substring, we put zero. */
if (invSuff[i] == n-1)
{
k = 0;
continue;
}
/* j contains index of the next substring to
be considered to compare with the present
substring, i.e., next string in suffix array */
int j = suffixArr[invSuff[i]+1];
// Directly start matching from k'th index as
// at-least k-1 characters will match
while (i+k<n && j+k<n && txt[i+k]==txt[j+k])
k++;
lcp[invSuff[i]] = k; // lcp for the present suffix.
// Deleting the starting character from the string.
if (k>0)
k--;
}
// return the constructed lcp array
return lcp;
}
// Utility function to print an array
void printArr(vector<int>arr, int n)
{
for (int i = 0; i < n; i++)
cout << arr[i] << " ";
cout << endl;
}
// Driver program
int main()
{
int t;
cin >> t;
//t = 1;
while (t > 0) {
//string str = "banana";
string str;
cin >> str; // >> k;
vector<int>suffixArr = buildSuffixArray(str, str.length());
int n = suffixArr.size();
cout << "Suffix Array : \n";
printArr(suffixArr, n);
vector<int>lcp = kasai(str, suffixArr);
cout << "\nLCP Array : \n";
printArr(lcp, n);
// cum will hold the number of substrings if that's what you want (total = cum[n-1])
cum[0] = n - suffixArr[0];
// vector <pair<int,int>> substrs[n];
int count = 1;
for (int i = 1; i <= n-suffixArr[0]; i++) {
//substrs[0].push_back({suffixArr[0],i});
string sub_str = str.substr(suffixArr[0],i);
cout << count << " " << sub_str << endl;
count++;
}
for(int i = 1;i < n;i++) {
cum[i] = cum[i-1] + (n - suffixArr[i] - lcp[i - 1]);
int end = n - suffixArr[i];
int begin = lcp[i-1] + 1;
int begin_suffix = suffixArr[i];
for (int j = begin, k = 1; j <= end; j++, k++) {
//substrs[i].push_back({begin_suffix, lcp[i-1] + k});
// cout << "i push " << i << " " << begin_suffix << " " << k << endl;
string sub_str = str.substr(begin_suffix, lcp[i-1] +k);
cout << count << " " << sub_str << endl;
count++;
}
}
/*int count = 1;
cout << endl;
for(int i = 0; i < n; i++){
for (auto it = substrs[i].begin(); it != substrs[i].end(); ++it ) {
string sub_str = str.substr(it->first, it->second);
cout << count << " " << sub_str << endl;
count++;
}
}*/
t--;
}
return 0;
}
And here's a simpler algorithm:
#include <iostream>
#include <string.h>
#include <vector>
#include <string>
#include <algorithm>
#include <time.h>
using namespace std;
char txt[100000], *p[100000];
int m, n;
int cmp(const void *p, const void *q) {
int rc = memcmp(*(char **)p, *(char **)q, m);
return rc;
}
int main() {
std::cin >> txt;
int start_s = clock();
n = strlen(txt);
int k; int i;
int count = 1;
for (m = 1; m <= n; m++) {
for (k = 0; k+m <= n; k++)
p[k] = txt+k;
qsort(p, k, sizeof(p[0]), &cmp);
for (i = 0; i < k; i++) {
if (i != 0 && cmp(&p[i-1], &p[i]) == 0){
continue;
}
char cur_txt[100000];
memcpy(cur_txt, p[i],m);
cur_txt[m] = '\0';
std::cout << count << " " << cur_txt << std::endl;
count++;
}
}
cout << --count << endl;
int stop_s = clock();
float run_time = (stop_s - start_s) / double(CLOCKS_PER_SEC);
cout << endl << "distinct substrings \t\tExecution time = " << run_time << " seconds" << endl;
return 0;
}
Both algorithms listed are simply too slow for extremely long strings, though. I tested the algorithms against a string of length over 47,000 and they took over 20 minutes to complete, with the first one taking 1200 seconds and the second one taking 1360 seconds, and that's just counting the unique substrings without outputting to the terminal. So for strings of length up to about 1000 you will probably get a working solution. Both solutions did compute the same total number of unique substrings, though. I did test both algorithms against string lengths of 2,000 and 10,000. The times were, for the first algorithm: 0.33 s and 12 s; for the second algorithm: 0.535 s and 20 s. So it looks like in general the first algorithm is faster.
Here is my code in Python. It generates all possible subsequences (not just contiguous substrings) of any given string, as the sample output below shows.
def find_substring(str_in):
    substrs = []
    if len(str_in) <= 1:
        return [str_in]
    s1 = find_substring(str_in[:1])
    s2 = find_substring(str_in[1:])
    for s11 in s1:
        substrs.append(s11)
        for s21 in s2:
            substrs.append("%s%s" % (s11, s21))
    for s21 in s2:
        substrs.append(s21)
    return set(substrs)
If you pass str_ = "abcdef" to the function, it generates the following results:
a, ab, abc, abcd, abcde, abcdef, abcdf, abce, abcef, abcf, abd, abde, abdef, abdf, abe, abef, abf, ac, acd, acde, acdef, acdf, ace, acef, acf, ad, ade, adef, adf, ae, aef, af, b, bc, bcd, bcde, bcdef, bcdf, bce, bcef, bcf, bd, bde, bdef, bdf, be, bef, bf, c, cd, cde, cdef, cdf, ce, cef, cf, d, de, def, df, e, ef, f
