Best algorithm to pair items of two queues

I have to find the best algorithm to define the pairing between the items of two lists, as in the figure. A pair is valid only if the number in the node of list A is lower than the number in the node of list B, and the links must not cross. The quality of the matching algorithm is determined by the total number of links.
I first tried a very simple algorithm: take a node in list A and then look for the first node in list B that is higher than it. The second figure shows a test case where this algorithm is not the best one.
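To make the weakness of this greedy strategy concrete, here is a small Python sketch of it. It assumes one particular reading of the strategy: A is scanned left to right, and each A[i] takes the first B[j] with A[i] < B[j] to the right of the last B index already used, so that links never cross. The name greedy_pairing is illustrative. In the example input, the large first value of A grabs the last element of B and blocks every other link.

```python
from typing import List, Tuple


def greedy_pairing(A: List[int], B: List[int]) -> List[Tuple[int, int]]:
    """Greedy sketch: each A[i], in order, takes the first B[j] after the last
    used B index with A[i] < B[j] (an assumed reading of the strategy)."""
    pairs = []
    j = 0  # first B index that can still be used without crossing a link
    for i, x in enumerate(A):
        k = j
        while k < len(B) and not x < B[k]:
            k += 1
        if k < len(B):
            pairs.append((i, k))
            j = k + 1
    return pairs


print(greedy_pairing([10, 1, 1, 1, 1, 1, 1, 1], [2, 2, 2, 2, 2, 2, 2, 101]))
# [(0, 7)]: a single link, while pairing A[1..7] with B[0..6] gives 7 links
```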

Simple backtracking can work (it may not be the most efficient approach, but it will certainly work).
For each legal pairing A[i], B[j], there are two choices:
take it, and make it illegal to pair any A[x], B[y] that would cross it (x>i and y<j, or x<i and y>j), as well as any other pair that reuses A[i] or B[j]
don't take it, and look at other possible pairs
By incrementally adding legal pairs to a set of pairs, you will eventually exhaust all legal pairings down a path. The number of valid pairings in a path is what you seek to maximize; this algorithm looks at all possible answers and is guaranteed to find the best one.
Pseudocode:
function search(currentPairs):
    bestPairing = currentPairs
    for each currently legal pair:
        nextPairing = search(copyOf(currentPairs) + this pair)
        if length of nextPairing > length of bestPairing:
            bestPairing = nextPairing
    return bestPairing
Initially, you pass an empty currentPairs. Searching for legal pairs is the tricky part. You can use 3 nested loops that look at all A[x], B[y] and, if A[x] < B[y], check against all currentPairs to see whether there is a crossing line (the cost of this is roughly O(n^3)); or you can keep a boolean matrix of valid pairings that you update at each level (less computation time, down to O(n^2), but more expensive in terms of memory).
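A minimal Python sketch of this take-it-or-leave-it search, assuming A and B are plain lists of numbers. Instead of scanning every currently legal pair, it walks through A in order and remembers the last B index used, which turns the no-crossing rule into a simple index comparison. The names best_pairing and search are illustrative. It explores the same space of answers and is still exponential in the worst case.

```python
from typing import List, Tuple


def best_pairing(A: List[int], B: List[int]) -> List[Tuple[int, int]]:
    """Exhaustive backtracking: a largest list of non-crossing (i, j) pairs with A[i] < B[j]."""

    def search(i: int, last_j: int) -> List[Tuple[int, int]]:
        if i == len(A):
            return []
        # Choice 1: leave A[i] unpaired.
        best = search(i + 1, last_j)
        # Choice 2: pair A[i] with a B[j] to the right of the last used B index;
        # since A is processed in order, this is exactly "the new link does not cross".
        for j in range(last_j + 1, len(B)):
            if A[i] < B[j]:
                candidate = [(i, j)] + search(i + 1, j)
                if len(candidate) > len(best):
                    best = candidate
        return best

    return search(0, -1)


print(len(best_pairing([3, 7, 2, 1, 5, 9, 2, 2], [4, 5, 10, 1, 12, 3, 6, 7])))  # 6
print(len(best_pairing([10, 1, 1, 1, 1, 1, 1, 1], [2, 2, 2, 2, 2, 2, 2, 101])))  # 7
```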

Here is a Java implementation.
For convenience, I first build a map with the valid choices from each entry of list (array) a to b.
Then I loop through the list, trying both making no choice and each valid choice for a connection to b.
Since you can't go back without crossing the existing connections, I keep track of the maximum index assigned in b.
Works at least for the two examples...
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class ListMatcher {
private int[] a ;
private int[] b ;
private Map<Integer,List<Integer>> choicesMap;
public ListMatcher(int[] a, int[] b) {
this.a = a;
this.b = b;
choicesMap = makeMap(a,b);
}
public Map<Integer,Integer> solve() {
Map<Integer,Integer> solution = new HashMap<>();
return solve(solution, 0, -1);
}
private Map<Integer,Integer> solve(Map<Integer,Integer> soFar, int current, int max) {
// done
if (current >= a.length) {
return soFar;
}
// make no choice from this entry
Map<Integer, Integer> solution = solve(new HashMap<>(soFar),current+1, max);
for (Integer choice : choicesMap.get(current)) {
if (choice > max) // can't go back
{
Map<Integer,Integer> next = new HashMap<>(soFar);
next.put(current, choice);
next = solve(next, current+1, choice);
if (next.size() > solution.size()) {
solution = next;
}
}
}
return solution;
}
// init possible choices
private Map<Integer, List<Integer>> makeMap(int[] a, int[] b) {
Map<Integer,List<Integer>> possibleMap = new HashMap<>();
for(int i = 0; i < a.length; i++) {
List<Integer> possible = new ArrayList<>();
for(int j = 0; j < b.length; j++) {
if (a[i] < b[j]) {
possible.add(j);
}
}
possibleMap.put(i, possible);
}
return possibleMap;
}
public static void main(String[] args) {
ListMatcher matcher = new ListMatcher(new int[]{3,7,2,1,5,9,2,2},new int[]{4,5,10,1,12,3,6,7});
System.out.println(matcher.solve());
matcher = new ListMatcher(new int[]{10,1,1,1,1,1,1,1},new int[]{2,2,2,2,2,2,2,101});
System.out.println(matcher.solve());
}
}
Output
(format: zero-based index_in_a=index_in_b)
{2=0, 3=1, 4=2, 5=4, 6=5, 7=6}
{1=0, 2=1, 3=2, 4=3, 5=4, 6=5, 7=6}
Your solution isn't the one printed because the no-choice branch is evaluated first, and a later solution only replaces it if it is strictly larger.
You can change this by processing the loop first...

Thanks to David's suggestion, I finally found the algorithm. It is an LCS approach, replacing the '=' comparison with a '>'.
Recursive approach
The recursive approach is very straightforward. V and G are the two vectors, of sizes n and m respectively (with a 0 added at the beginning of both, so the real values start at index 1). Starting from the end, if the last item of G is larger than the last item of V, return 1 + the function evaluated without both last items; otherwise, return the max of the function with the last item of G removed and the function with the last item of V removed.
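#include <vector>
#include <algorithm>
using namespace std;
// V and G are 1-based: index 0 holds the padding 0, real values are V[1..n] and G[1..m]
int evaluateMaxRecursive(const vector<int>& V, const vector<int>& G, int n, int m) {
    if ((n == 0) || (m == 0)) {
        return 0;
    }
    if (V[n] < G[m]) {
        return 1 + evaluateMaxRecursive(V, G, n - 1, m - 1);
    } else {
        return max(evaluateMaxRecursive(V, G, n - 1, m), evaluateMaxRecursive(V, G, n, m - 1));
    }
}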
The recursive approach is only practical for a small number of items, because the same sublists are re-evaluated many times during the recursion (the running time grows exponentially).
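The repeated work can be removed with memoization. This is a minimal Python sketch of the same recurrence, assuming 0-based lists without the padding 0, where functools.lru_cache stores the result for each (n, m) so every subproblem is solved once and the running time becomes O(n*m). The names evaluate_max and f are illustrative. The recursion depth can reach n + m, so for long inputs the table-based approach avoids Python's default recursion limit.

```python
from functools import lru_cache
from typing import Sequence


def evaluate_max(V: Sequence[int], G: Sequence[int]) -> int:
    """Max number of non-crossing links V[i] -> G[j] with V[i] < G[j]."""

    @lru_cache(maxsize=None)
    def f(n: int, m: int) -> int:
        # n and m are the lengths of the prefixes V[:n] and G[:m]
        if n == 0 or m == 0:
            return 0
        if V[n - 1] < G[m - 1]:
            # the two last items can always be linked in an optimal solution
            return 1 + f(n - 1, m - 1)
        return max(f(n - 1, m), f(n, m - 1))

    return f(len(V), len(G))


print(evaluate_max([3, 7, 2, 1, 5, 9, 2, 2], [4, 5, 10, 1, 12, 3, 6, 7]))  # 6
```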
Non recursive approach
The non-recursive approach goes in the opposite direction and works with a table that is filled in after clearing the first row and first column to 0. The max value is the value in the bottom-right corner of the table, table[n][m]. Here V and G are indexed from 0 (no padding).
#include <vector>
#include <algorithm>
using namespace std;
int evaluateMax(const vector<int>& V, const vector<int>& G, int n, int m) {
    // (n+1) x (m+1) table; row 0 and column 0 stay 0
    vector<vector<int>> table(n + 1, vector<int>(m + 1, 0));
    for (int i = 1; i < m + 1; i++)
        for (int t = 1; t < n + 1; t++) {
            if (G[i - 1] > V[t - 1]) {
                table[t][i] = 1 + table[t - 1][i - 1];
            }
            else {
                table[t][i] = max(table[t][i - 1], table[t - 1][i]);
            }
        }
    return table[n][m];
}
You can find more details here LCS - Wikipedia

Related

Class Z behaves like which well-known data structure?

I am working with this question, which I am unsure about:
Class Z behaves like which well-known data structure?
The possible answers are:
A. (LIFO) Stack.
B. (FIFO) Queue.
C. Priority queue.
D. Union–Find.
By looking at the code, I think the answer is D, union-find. If we look at the methods query, last and first, we see that it uses a union-find data structure to determine whether two elements are in the same set.
public class Z
{
int[] next, prev;
Z(int N) {
prev = new int[N];
next = new int[N];
for (int i = 0; i<N; ++i) {
// put element i in a list of its own
next[i] = i;
prev[i] = i;
}
}
int first(int i) {
// return first element of list containing i
while (i != prev[i]) i = prev[i];
return i;
}
int last(int i) {
// return last element of list containing i
while (i != next[i]) i = next[i];
return i;
}
void update(int i, int j) {
int f = first(j);
int l = last(i);
next[l] = f;
prev[f] = l;
}
boolean query(int i, int j) {
return last(i) == last(j);
}
}
Yes, you're right: it can be used as a union-find data structure. If z is an instance of this class, then Union can be written as if (!z.query(i, j)) z.update(i, j), and Find can be written as z.last(i).
Details
Z keeps the integers 0, 1, ..., N-1 in a set of disjoint lists, with each integer initially in its own list. update(i, j) appends the list containing j to the list containing i. first(i) and last(i) return the first and last element of the list containing i. query(i, j) reports whether i and j are in the same list.
The implementation requires update(i, j) to be called only if i and j are not already in the same list (otherwise the lists become loops, and subsequent calls to any of the methods may not terminate), and its efficiency is poor because the usual disjoint-set optimizations (such as union by rank and path compression) aren't made.
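As a sketch of the Union/Find usage described above, here is a direct Python transcription of class Z, assuming the same semantics as the Java code with plain lists. union and find are hypothetical helper names; union only calls update when query says the two elements are in different lists, which avoids the looping problem.

```python
class Z:
    """Python transcription of class Z: each element starts in a list of its own."""

    def __init__(self, n: int) -> None:
        self.next = list(range(n))
        self.prev = list(range(n))

    def first(self, i: int) -> int:
        while i != self.prev[i]:
            i = self.prev[i]
        return i

    def last(self, i: int) -> int:
        while i != self.next[i]:
            i = self.next[i]
        return i

    def update(self, i: int, j: int) -> None:
        # append j's list after i's list (caller must ensure they differ)
        f, l = self.first(j), self.last(i)
        self.next[l] = f
        self.prev[f] = l

    def query(self, i: int, j: int) -> bool:
        return self.last(i) == self.last(j)


def union(z: Z, i: int, j: int) -> None:
    if not z.query(i, j):
        z.update(i, j)


def find(z: Z, i: int) -> int:
    return z.last(i)


z = Z(5)
union(z, 0, 1)
union(z, 3, 4)
union(z, 1, 0)  # no-op: 0 and 1 are already in the same list
print(z.query(0, 1), z.query(1, 3))  # True False
union(z, 1, 4)
print(find(z, 0) == find(z, 4))  # True
```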

Maximum subarray sum modulo M

Most of us are familiar with the maximum sum subarray problem. I came across a variant of this problem which asks the programmer to output the maximum of all subarray sums modulo some number M.
The naive approach to solve this variant would be to find all possible subarray sums (which would be of the order of N^2 where N is the size of the array). Of course, this is not good enough. The question is - how can we do better?
Example: Let us consider the following array:
6 6 11 15 12 1
Let M = 13. In this case, the subarray 6 6 (or 12, or 6 6 11 15, or 11 15 12) yields the maximum sum (= 12).
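For reference, the naive quadratic approach can be written in a few lines of Python; it is handy as a correctness check for the faster methods. The function name is illustrative.

```python
from typing import List


def max_mod_subarray_bruteforce(A: List[int], M: int) -> int:
    """O(N^2): try every start index and extend the subarray one element at a time."""
    best = 0
    for start in range(len(A)):
        s = 0
        for end in range(start, len(A)):
            s = (s + A[end]) % M
            best = max(best, s)
    return best


print(max_mod_subarray_bruteforce([6, 6, 11, 15, 12, 1], 13))  # 12
```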
We can do this as follows:
Maintain an array sum where sum[i] contains the sum of A[0..i] modulo M.
For each index i, we need to find the maximum sub-sum that ends at this index:
For each subarray (start + 1, i), the sum modulo M is
int a = (sum[i] - sum[start] + M) % M
So we can only get a sub-sum larger than sum[i] if sum[start] is larger than sum[i], and the best choice is the sum[start] that is as close to sum[i] as possible. If there is no such sum[start], the best candidate is sum[i] itself (the subarray A[0..i]).
This is easy to do with a binary search tree.
Pseudo code:
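int[] sum = new int[n];
sum[0] = A[0] % M;
Tree tree;   // any balanced binary search tree (ordered set)
tree.add(sum[0]);
int result = sum[0];
for (int i = 1; i < n; i++) {
    sum[i] = (sum[i - 1] + A[i]) % M;
    result = max(sum[i], result);   // subarray A[0..i]
    // smallest stored prefix sum strictly greater than sum[i], if any
    int a = tree.getMinimumValueLargerThan(sum[i]);
    if (a exists)
        result = max((sum[i] - a + M) % M, result);
    tree.add(sum[i]);
}
print result;
Time complexity: O(n log n)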
Let A be our input array with zero-based indexing. We can reduce A modulo M without changing the result.
First of all, let's reduce the problem to a slightly easier one by computing an array P representing the prefix sums of A, modulo M:
A = 6 6 11 2 12 1
P = 6 12 10 12 11 12
Now let's process the possible left borders of our solution subarrays in decreasing order. This means that we will first determine the optimal solution that starts at index n - 1, then the one that starts at index n - 2 etc.
In our example, if we chose i = 3 as our left border, the possible subarray sums are represented by the suffix P[3..n-1] plus a constant a = A[i] - P[i]:
a = A[3] - P[3] = 2 - 12 = 3 (mod 13)
P + a = * * * 2 1 2
The global maximum is reached for one of these left borders as well. Since we can insert the suffix values from right to left, we have now reduced the problem to the following:
Given a set of values S and integers x and M, find the maximum of (s + x) mod M over all s in S.
This one is easy: Just use a balanced binary search tree to manage the elements of S. Given a query x, we want to find the largest value in S that is smaller than M - x (that is the case where no overflow occurs when adding x). If there is no such value, just use the largest value of S. Both can be done in O(log |S|) time.
Total runtime of this solution: O(n log n)
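Python's standard library has no balanced binary search tree, so this sketch of the reduced query uses a sorted list with bisect instead: the query itself is O(log |S|), but inserting into a list costs O(|S|). It assumes S is non-empty and that x and all values lie in [0, M). best_shift is an illustrative name; the logic mirrors the lower_bound step of the C++ code.

```python
import bisect
from typing import List


def best_shift(S_sorted: List[int], x: int, M: int) -> int:
    """Max over s in S of (s + x) % M, for a non-empty sorted list S_sorted."""
    # Values s < M - x do not wrap around, so the largest of them gives s + x.
    k = bisect.bisect_left(S_sorted, M - x)
    # The overall largest value (which may wrap around) is the fallback.
    best = (S_sorted[-1] + x) % M
    if k > 0:
        best = max(best, S_sorted[k - 1] + x)
    return best


# Left border i = 3 of the example: S = {P[3], P[4], P[5]} = {12, 11, 12}, a = 3
print(best_shift([11, 12, 12], 3, 13))  # 2
```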
Here's some C++ code to compute the maximum sum. It would need some minor adaptations to also return the borders of the optimal subarray:
#include <bits/stdc++.h>
using namespace std;
int max_mod_sum(const vector<int>& A, int M) {
vector<int> P(A.size());
for (int i = 0; i < A.size(); ++i)
P[i] = (A[i] + (i > 0 ? P[i-1] : 0)) % M;
set<int> S;
int res = 0;
for (int i = A.size() - 1; i >= 0; --i) {
S.insert(P[i]);
int a = (A[i] - P[i] + M) % M;
auto it = S.lower_bound(M - a);
if (it != begin(S))
res = max(res, *prev(it) + a);
res = max(res, (*prev(end(S)) + a) % M);
}
return res;
}
int main() {
// random testing to the rescue
for (int i = 0; i < 1000; ++i) {
int M = rand() % 1000 + 1, n = rand() % 1000 + 1;
vector<int> A(n);
for (int i = 0; i< n; ++i)
A[i] = rand() % M;
int should_be = 0;
for (int i = 0; i < n; ++i) {
int sum = 0;
for (int j = i; j < n; ++j) {
sum = (sum + A[j]) % M;
should_be = max(should_be, sum);
}
}
assert(should_be == max_mod_sum(A, M));
}
}
For me, the explanations here were hard to follow, since I didn't get the searching/sorting part: how we search and sort was unclear.
We all know that we need to build prefixSum, meaning the sum of all elements from 0 to i, modulo m.
I guess, what we are looking for is clear.
Knowing that subarray[i][j] = (prefix[i] - prefix[j] + m) % m (the modulo sum of the elements from index j+1 to i), for a given prefix[i] the maximum is always reached for the prefix[j] that is as close as possible to prefix[i] but slightly bigger.
E.g. for m = 8 and prefix[i] = 5, we are looking for the smallest value greater than 5 in our prefix array.
For efficient search (binary search) we sort the prefixes.
What we cannot do is build all the prefix sums first, then iterate again from 0 to n and look up the index in the sorted prefix array, because we could end up with an endIndex that is smaller than our startIndex, which is no good.
Therefore, we iterate from 0 to n, treating the current index as the endIndex of our potential max subarray sum, and look in our sorted prefix array (empty at the beginning), which contains only the sorted prefixes between 0 and endIndex.
import bisect
def maximumSum(coll, m):
    n = len(coll)
    maxSum, prefixSum = 0, 0
    sortedPrefixes = []  # prefix sums seen so far, kept sorted
    for endIndex in range(n):
        prefixSum = (prefixSum + coll[endIndex]) % m
        maxSum = max(maxSum, prefixSum)
        # first stored prefix strictly greater than prefixSum
        startIndex = bisect.bisect_right(sortedPrefixes, prefixSum)
        if startIndex < len(sortedPrefixes):
            maxSum = max(maxSum, prefixSum - sortedPrefixes[startIndex] + m)
        bisect.insort(sortedPrefixes, prefixSum)
    return maxSum
From your question, it seems that you have created an array to store the cumulative sums (Prefix Sum Array), and are calculating the sum of the sub-array arr[i:j] as (sum[j] - sum[i] + M) % M. (arr and sum denote the given array and the prefix sum array respectively)
Calculating the sum of every sub-array results in an O(n*n) algorithm.
The question that arises is -
Do we really need to consider the sum of every sub-array to reach the desired maximum?
No!
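For a given j, the value (sum[j] - sum[i] + M) % M is maximal when sum[i] (with i < j) is the smallest prefix sum greater than sum[j]; the best possible case is a difference of 1, which gives M - 1.
Finding that sum[i] with a balanced binary search tree reduces the algorithm to O(nlogn).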
You can take a look at this explanation! https://www.youtube.com/watch?v=u_ft5jCDZXk
There are already a bunch of great solutions listed here, but I wanted to add one that has O(nlogn) runtime without using a balanced binary tree, which isn't in the Python standard library. This solution isn't my idea, but I had to think a bit as to why it worked. Here's the code, explanation below:
def maximumSum(a, m):
    prefixSums = [(0, -1)]
    for idx, el in enumerate(a):
        prefixSums.append(((prefixSums[-1][0] + el) % m, idx))
    prefixSums = sorted(prefixSums)
    maxSeen = prefixSums[-1][0]
    for (a, a_idx), (b, b_idx) in zip(prefixSums[:-1], prefixSums[1:]):
        if a_idx > b_idx and b > a:
            maxSeen = max((a-b) % m, maxSeen)
    return maxSeen
As with the other solutions, we first calculate the prefix sums, but this time we also keep track of the index of the prefix sum. We then sort the prefix sums, as we want to find the smallest difference between prefix sums modulo m - sorting lets us just look at adjacent elements as they have the smallest difference.
At this point you might think we're neglecting an essential part of the problem - we want the smallest difference between prefix sums, but the larger prefix sum needs to appear before the smaller prefix sum (meaning it has a smaller index). In the solutions using trees, we ensure that by adding prefix sums one by one and recalculating the best solution.
However, it turns out that we can look at adjacent elements and just ignore the ones that don't satisfy our index requirement. This confused me for some time, but the key realization is that the optimal solution will always come from two adjacent elements. I'll prove this by contradiction. Let's say that the optimal solution comes from two non-adjacent prefix sums x and z with indices k and i respectively, where z > x (it's sorted!) and k > i:
x ... z
k ... i
Let's consider one of the numbers between x and z, and let's call it y with index j. Since the list is sorted, x < y < z.
x ... y ... z
k ... j ... i
The prefix sum y must have index j < i, otherwise it would be part of a better solution with z. But if j < i, then j < k and y and x form a better solution than z and x! So any elements between x and z must form a better solution with one of the two, which contradicts our original assumption. Therefore the optimal solution must come from adjacent prefix sums in the sorted list.
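A quick way to build confidence in the adjacency argument is to compare the sorted-prefix approach with brute force on random inputs. This is a sketch of such a test in Python; maximum_sum_adjacent restates the code above with renamed loop variables, and the seed, sizes and value ranges are arbitrary choices.

```python
import random
from typing import List


def maximum_sum_adjacent(a: List[int], m: int) -> int:
    # Sort (prefix sum, index) pairs and only compare neighbours
    # where the larger prefix sum has the smaller index.
    prefix = [(0, -1)]
    for idx, el in enumerate(a):
        prefix.append(((prefix[-1][0] + el) % m, idx))
    prefix.sort()
    best = prefix[-1][0]
    for (lo, lo_idx), (hi, hi_idx) in zip(prefix, prefix[1:]):
        if lo_idx > hi_idx and hi > lo:
            best = max(best, (lo - hi) % m)
    return best


def maximum_sum_brute(a: List[int], m: int) -> int:
    best = 0
    for i in range(len(a)):
        s = 0
        for j in range(i, len(a)):
            s = (s + a[j]) % m
            best = max(best, s)
    return best


random.seed(0)
for _ in range(2000):
    m = random.randint(1, 30)
    a = [random.randint(0, 50) for _ in range(random.randint(1, 12))]
    assert maximum_sum_adjacent(a, m) == maximum_sum_brute(a, m), (a, m)
print("ok")
```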
Here is Java code for the maximum subarray sum modulo. It handles the case where the tree contains no element strictly greater than s[i].
// requires: import java.util.TreeSet;
public static long maxModulo(long[] a, final long k) {
long[] s = new long[a.length];
TreeSet<Long> tree = new TreeSet<>();
s[0] = a[0] % k;
tree.add(s[0]);
long result = s[0];
for (int i = 1; i < a.length; i++) {
s[i] = (s[i - 1] + a[i]) % k;
// find least element in the tree strictly greater than s[i]
Long v = tree.higher(s[i]);
if (v == null) {
// no such v: the best subarray ending at i is a[0..i], whose sum is s[i]
result = Math.max(s[i], result);
} else {
result = Math.max((s[i] - v + k) % k, result);
}
tree.add(s[i]);
}
return result;
}
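A few points from my side that might help someone understand the problem better.
In Python you do not need to add +M in the modulo calculation: as mentioned, the % operator handles negative numbers well (for a positive M the result is never negative), so a % M == (a + M) % M. This does not hold in Java or C/C++, where the result of % takes the sign of the left operand.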
As mentioned, the trick is to build the proxy sum table such that
proxy[n] = (a[1] + ... + a[n]) % M
This then allows one to represent the sum of the subarray a[j+1..i] as
subarraySum[j+1, i] = (proxy[i] - proxy[j]) % M
The implementation trick is to build the proxy table as we iterate through the elements, instead of pre-building it first and then using it. This is because for each new element a[i] we want to compute proxy[i] and find a proxy[j] (j < i) that is bigger than, but as close as possible to, proxy[i] (ideally bigger by 1, because this results in a remainder of M - 1). For this we need a data structure that keeps the proxy table sorted while it is being built and can quickly find the closest bigger element to proxy[i]. bisect.bisect_right is a good choice in Python.
See my Python implementation below (hope this helps but I am aware this might not necessarily be as concise as others' solutions):
import bisect
def maximumSum(a, m):
    prefix_sum = [a[0] % m]
    prefix_sum_sorted = [a[0] % m]
    current_max = prefix_sum_sorted[0]
    for elem in a[1:]:
        prefix_sum_next = (prefix_sum[-1] + elem) % m
        prefix_sum.append(prefix_sum_next)
        idx_closest_bigger = bisect.bisect_right(prefix_sum_sorted, prefix_sum_next)
        if idx_closest_bigger >= len(prefix_sum_sorted):
            current_max = max(current_max, prefix_sum_next)
            bisect.insort_right(prefix_sum_sorted, prefix_sum_next)
            continue
        if prefix_sum_sorted[idx_closest_bigger] > prefix_sum_next:
            current_max = max(current_max, (prefix_sum_next - prefix_sum_sorted[idx_closest_bigger]) % m)
        bisect.insort_right(prefix_sum_sorted, prefix_sum_next)
    return current_max
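The remark about negative operands of % can be checked quickly in Python. math.fmod is used here only to show the C-style behaviour (the result takes the sign of the dividend), which is how % behaves on integers in Java and C/C++; the values are arbitrary.

```python
import math

M = 13
a = 3 - 6  # a negative difference of two prefix sums

# Python's % takes the sign of the divisor, so adding M first changes nothing.
assert a % M == (a + M) % M == 10

# math.fmod follows C semantics: the result takes the sign of the dividend,
# which is why Java/C/C++ code usually writes (a + M) % M.
assert math.fmod(a, M) == -3.0
print(a % M, math.fmod(a, M))  # 10 -3.0
```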
A complete Java implementation in O(n*log(n)):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.TreeSet;
import java.util.stream.Stream;
public class MaximizeSumMod {
public static void main(String[] args) throws Exception{
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
Long times = Long.valueOf(in.readLine());
while(times --> 0){
long[] pair = Stream.of(in.readLine().split(" ")).mapToLong(Long::parseLong).toArray();
long mod = pair[1];
long[] numbers = Stream.of(in.readLine().split(" ")).mapToLong(Long::parseLong).toArray();
printMaxMod(numbers,mod);
}
}
private static void printMaxMod(long[] numbers, Long mod) {
Long maxSoFar = (numbers[numbers.length-1] + numbers[numbers.length-2])%mod;
maxSoFar = (maxSoFar > (numbers[0]%mod)) ? maxSoFar : numbers[0]%mod;
numbers[0] %=mod;
for (Long i = 1L; i < numbers.length; i++) {
long currentNumber = numbers[i.intValue()]%mod;
maxSoFar = maxSoFar > currentNumber ? maxSoFar : currentNumber;
numbers[i.intValue()] = (currentNumber + numbers[i.intValue()-1])%mod;
maxSoFar = maxSoFar > numbers[i.intValue()] ? maxSoFar : numbers[i.intValue()];
}
if(mod.equals(maxSoFar+1) || numbers.length == 2){
System.out.println(maxSoFar);
return;
}
long previousNumber = numbers[0];
TreeSet<Long> set = new TreeSet<>();
set.add(previousNumber);
for (Long i = 2L; i < numbers.length; i++) {
Long currentNumber = numbers[i.intValue()];
Long ceiling = set.ceiling(currentNumber);
if(ceiling == null){
set.add(numbers[i.intValue()-1]);
continue;
}
if(ceiling.equals(currentNumber)){
set.remove(ceiling);
Long greaterCeiling = set.ceiling(currentNumber);
if(greaterCeiling == null){
set.add(ceiling);
set.add(numbers[i.intValue()-1]);
continue;
}
set.add(ceiling);
ceiling = greaterCeiling;
}
Long newMax = (currentNumber - ceiling + mod);
maxSoFar = maxSoFar > newMax ? maxSoFar :newMax;
set.add(numbers[i.intValue()-1]);
}
System.out.println(maxSoFar);
}
}
Adding STL C++11 code based on the solution suggested by Pham Trung. Might be handy.
#include <iostream>
#include <set>
int main() {
int N;
std::cin>>N;
for (int nn=0;nn<N;nn++){
long long n,m;
std::set<long long> mSet;
long long maxVal = 0; //positive input values
long long sumVal = 0;
std::cin>>n>>m;
mSet.insert(m);
for (long long q=0;q<n;q++){
long long tmp;
std::cin>>tmp;
sumVal = (sumVal + tmp)%m;
auto itSub = mSet.upper_bound(sumVal);
maxVal = std::max(maxVal,(m + sumVal - *itSub)%m);
mSet.insert(sumVal);
}
std::cout<<maxVal<<"\n";
}
}
As you can read on Wikipedia, there is a solution called Kadane's algorithm, which computes the maximum subarray sum by looking at the maximum subarray ending at position i, for all positions i, while iterating once over the array. It solves the ordinary (non-modular) maximum subarray problem with runtime complexity O(n).
Unfortunately, I think that Kadane's algorithm isn't able to find all possible solutions when more than one solution exists.
An implementation in Java (I haven't tested it):
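// requires: import java.util.Arrays;
public int[] kadanesAlgorithm (int[] array) {
    int start = 0;          // start of the subarray ending at the current position
    int bestStart = 0;
    int bestEnd = 0;
    int current = array[0]; // max sum of a subarray ending at position i
    int best = array[0];    // max sum seen so far
    for (int i = 1; i < array.length; i++) {
        if (current + array[i] < array[i]) {
            // better to start a new subarray at i
            current = array[i];
            start = i;
        } else {
            current += array[i];
        }
        if (current > best) {
            best = current;
            bestStart = start;
            bestEnd = i;
        }
    }
    return Arrays.copyOfRange(array, bestStart, bestEnd + 1);
}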
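I feel my thoughts are aligned with what has already been posted, but just in case, here is a Kotlin O(NlogN) solution:
// function wrapper and signature assumed; the original snippet was only the body
fun maximumSum(a: LongArray, m: Long): Long {
    val seen = sortedSetOf(0L)
    var prev = 0L
    return a.map { x ->
        val z = (prev + x) % m
        prev = z
        seen.add(z)
        // smallest earlier prefix strictly greater than z, if any
        seen.higher(z)?.let { y -> (z - y + m) % m } ?: z
    }.maxOrNull() ?: 0L
}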
Implementation in Java using TreeSet:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.TreeSet;
public class Main {
public static void main(String[] args) throws IOException {
BufferedReader read = new BufferedReader(new InputStreamReader(System.in)) ;
String[] str = read.readLine().trim().split(" ") ;
int n = Integer.parseInt(str[0]) ;
long m = Long.parseLong(str[1]) ;
str = read.readLine().trim().split(" ") ;
long[] arr = new long[n] ;
for(int i=0; i<n; i++) {
arr[i] = Long.parseLong(str[i]) ;
}
long maxCount = 0L ;
TreeSet<Long> tree = new TreeSet<>() ;
tree.add(0L) ;
long prefix = 0L ;
for(int i=0; i<n; i++) {
prefix = (prefix + arr[i]) % m ;
maxCount = Math.max(prefix, maxCount) ;
Long temp = tree.higher(prefix) ;
if(temp != null) {
maxCount = Math.max((prefix-temp+m)%m, maxCount) ;
}
tree.add(prefix) ;
}
System.out.println(maxCount);
}
}
Here is a Java implementation for this problem that uses a TreeSet for an optimized solution:
public static long maximumSum2(long[] arr, long n, long m)
{
long x = 0;
long prefix = 0;
long maxim = 0;
TreeSet<Long> S = new TreeSet<Long>();
S.add((long)0);
// Traversing the array.
for (int i = 0; i < n; i++)
{
// Finding prefix sum.
prefix = (prefix + arr[i]) % m;
// Finding maximum of prefix sum.
maxim = Math.max(maxim, prefix);
// Find the smallest element strictly greater than prefix,
// i.e., greater than or equal to prefix + 1
// (0 if there is none; a real match is always > 0).
Long higher = S.higher(prefix);
long it = higher != null ? higher : 0;
if (it != 0)
{
maxim = Math.max(maxim, prefix - it + m);
}
// adding prefix in the set.
S.add(prefix);
}
return maxim;
}
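// Kadane's algorithm for the ordinary (non-modular) maximum subarray sum;
// returns 0 when every element is negative.
public static int MaxSequence(int[] arr)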
{
int maxSum = 0;
int partialSum = 0;
int negative = 0;
for (int i = 0; i < arr.Length; i++)
{
if (arr[i] < 0)
{
negative++;
}
}
if (negative == arr.Length)
{
return 0;
}
foreach (int item in arr)
{
partialSum += item;
maxSum = Math.Max(maxSum, partialSum);
if (partialSum < 0)
{
partialSum = 0;
}
}
return maxSum;
}
Modify Kadane's algorithm to keep track of the number of occurrences. Below is the code.
#python3
#source: https://github.com/harishvc/challenges/blob/master/dp-largest-sum-sublist-modulo.py
#Time complexity: O(n)
#Space complexity: O(n)
# Note: only the prefix sums a[0..i] are examined here, not every subarray.
def maxContiguousSum(a, K):
    sum_so_far = 0
    max_sum = 0
    count = {}  # keep track of occurrence
    for i in range(0, len(a)):
        sum_so_far += a[i]
        sum_so_far = sum_so_far % K
        if sum_so_far > 0:
            max_sum = max(max_sum, sum_so_far)
            if sum_so_far in count.keys():
                count[sum_so_far] += 1
            else:
                count[sum_so_far] = 1
        else:
            # sum_so_far is 0 here: Python's % never returns a negative value for K > 0
            # IMPORTANT: reset sum_so_far
            sum_so_far = 0
    return max_sum, count.get(max_sum, 0)

a = [6, 6, 11, 15, 12, 1]
K = 13
max_sum, count = maxContiguousSum(a, K)
print("input >>> %s max sum=%d #occurrence=%d" % (a, max_sum, count))

How to perform K-swap operations on an N-digit integer to get maximum possible number

I recently went through an interview and was asked this question. Let me explain the question properly:
Given a number M (N-digit integer) and K number of swap operations (a swap operation can swap 2 digits), devise an algorithm to get the maximum possible integer.
Examples:
M = 132 K = 1 output = 312
M = 132 K = 2 output = 321
M = 7899 K = 2 output = 9987
My solution (algorithm in pseudo-code): I used a max-heap to get the maximum digit out of the N digits in each of the K operations and then swapped it into place.
for(int i = 0; i<K; i++)
{
int max_digit_currently = GetMaxFromHeap();
// The above function GetMaxFromHeap() pops out the maximum currently and deletes it from heap
int index_to_swap_with = GetRightMostOccurenceOfTheDigitObtainedAbove();
// This returns me the index of the digit obtained in the previous function
// e.g. if I have 436659 and K=2 given,
// then after K=1 I'll have 936654 and after K=2, I should have 966354 and not 963654.
// Now, the swap part comes. Here the gotcha is, say with the same above example, I have K=3.
// If I do GetMaxFromHeap() I'll get 6 when K=3, but I should not swap it,
// rather I should continue for next iteration and
// get GetMaxFromHeap() to give me 5 and then get 966534 from 966354.
if (Value_at_index_to_swap == max_digit_currently)
continue;
else
DoSwap();
}
Time complexity: O(K*( N + log_2(N) ))
// K-times [log_2(N) for popping out number from heap & N to get the rightmost index to swap with]
The above strategy fails in this example:
M = 8799 and K = 2
Following my strategy, I'll get M = 9798 after K=1 and M = 9978 after K=2. However, the maximum I can get is M = 9987 after K=2.
What did I miss?
Also suggest other ways to solve the problem & ways to optimize my solution.
I think the missing part is that, after you've performed the K swaps as in the algorithm described by the OP, you're left with some numbers that you can swap between themselves. For example, for the number 87949, after the initial algorithm we would get 99748. However, after that we can swap 7 and 8 "for free", i.e. not consuming any of the K swaps. This would mean "I'd rather not swap the 7 with the second 9 but with the first".
So, to get the max number, one would perform the algorithm described by the OP and remember the numbers which were moved to the right, and the positions to which they were moved. Then, sort these numbers in decreasing order and put them in the positions from left to right.
This is something like a separation of the algorithm in two phases - in the first one, you choose which numbers should go in the front to maximize the first K positions. Then you determine the order in which you would have swapped them with the numbers whose positions they took, so that the rest of the number is maximized as well.
Not all the details are clear, and I'm not 100% sure it handles all cases correctly, so if anyone can break it - go ahead.
This is a recursive function, which sorts the possible swap values for each (current-max) digit:
function swap2max(string, K) {
    // the recursion end:
    if (string.length == 0 || K == 0)
        return string
    m = getMaxDigit(string)
    // an array of indices of the maxdigits to swap in the string
    indices = []
    // a counter for the length of that array, to determine how many chars
    // from the front will be swapped
    len = 0
    // an array of digits to be swapped
    front = []
    // and the index of the last of those:
    right = 0
    // get those indices, in a loop with 2 conditions:
    // * just run backwards through the string, until we meet the swapped range
    // * no more swaps than left (K)
    for (i = string.length; i-- > right && len < K;) {
        if (m == string[i]) {
            // omit digits that are already in the right place
            while (right <= i && string[right] == m)
                right++
            // and when they need to be swapped
            if (i >= right) {
                front.push(string[right++])
                indices.push(i)
                len++
            }
        }
    }
    // sort the digits to swap with
    front.sort()
    // and swap them
    for (i = 0; i < len; i++)
        string.setCharAt(indices[i], front[i])
    // the first len digits are the max ones
    // the rest the result of calling the function on the rest of the string
    return m.repeat(right) + swap2max(string.substr(right), K - len)
}
This is all pseudocode, but it converts fairly easily to other languages. This solution is nonrecursive and runs in linear time in both the worst and the average case.
You are provided with the following functions:
function k_swap(n, k1, k2):
temp = n[k1]
n[k1] = n[k2]
n[k2] = temp
int : operator[k]
// gets or sets the kth digit of an integer
property int : magnitude
// the number of digits in an integer
You could do something like the following:
int input = [some integer] // input value
int digitcounts[10] = {0, ...} // all zeroes
int digitpositions[10] = {0, ...} // all zeroes
bool filled[input.magnitude] = {false, ...} // all falses
for d = input[i = 0 => input.magnitude]:
    digitcounts[d]++ // count number of occurrences of each digit
digitpositions[0] = 0;
for i = 1 => input.magnitude:
    digitpositions[i] = digitpositions[i - 1] + digitcounts[i - 1] // output positions
for i = 0 => input.magnitude:
    digit = input[i]
    if filled[i] == true:
        continue
    k_swap(input, i, digitpositions[digit])
    filled[digitpositions[digit]] = true
    digitpositions[digit]++
I'll walk through it with the number input = 724886771
computed digitcounts:
{0, 1, 1, 0, 1, 0, 1, 3, 2, 0}
computed digitpositions:
{0, 0, 1, 2, 2, 3, 3, 4, 7, 9}
swap steps:
swap 0 with 0: 724886771, mark 0 visited
swap 1 with 4: 724876781, mark 4 visited
swap 2 with 5: 724778881, mark 5 visited
swap 3 with 3: 724778881, mark 3 visited
skip 4 (already visited)
skip 5 (already visited)
swap 6 with 2: 728776481, mark 2 visited
swap 7 with 1: 788776421, mark 1 visited
swap 8 with 6: 887776421, mark 6 visited
output number: 887776421
Edit:
This doesn't address the question correctly. If I have time later, I'll fix it but I don't right now.
How I would do it (in pseudo-C, nothing fancy), assuming a fantasy integer array is passed where each element represents one decimal digit:
int[] sortToMaxInt(int[] M, int K) {
for (int i = 0; K > 0 && i < M.size() - 1; i++) {
if (swapDec(M, i)) K--;
}
return M;
}
bool swapDec(int[]& M, int i) {
/* no need to try and swap the value 9 as it is the
* highest possible value anyway. */
if (M[i] == 9) return false;
int max_dec = 0;
int max_idx = 0;
for (int j = i+1; j < M.size(); j++) {
if (M[j] >= max_dec) {
max_idx = j;
max_dec = M[j];
}
}
if (max_dec > M[i]) {
M.swapElements(i, max_idx);
return true;
}
return false;
}
Off the top of my head, so if anyone spots a fatal flaw, please let me know.
Edit: based on the other answers posted here, I probably grossly misunderstood the problem. Anyone care to elaborate?
You start with max-number(M, N, 1, K).
max-number(M, N, pos, k)
{
    if k == 0 or pos > N
        return M
    max-digit = 0
    for i = pos to N
        if M[i] > max-digit
            max-digit = M[i]
    if M[pos] == max-digit
        return max-number(M, N, pos + 1, k)
    maxs = new list
    maxs.add(M)
    for i = (pos + 1) to N
        if M[i] == max-digit
            M2 = copy of M
            swap(M2, i, pos)
            maxs.add(max-number(M2, N, pos + 1, k - 1))
    return maxs.max()
}
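A runnable Python sketch of the max-number recursion, under the assumption that the number is given as a string of digits. At each position it either keeps a digit that is already the maximum of the remaining suffix, or tries every occurrence of that maximum to its right, and it remembers the best string seen. The search is exponential in the worst case, so it is meant for small inputs and as a reference to test faster ideas against; max_number is an illustrative name.

```python
def max_number(digits: str, k: int) -> str:
    """Largest number reachable from `digits` with at most k swaps."""
    best = digits

    def search(d: list, pos: int, k: int) -> None:
        nonlocal best
        s = "".join(d)
        if s > best:  # same length, so string order equals numeric order
            best = s
        if k == 0 or pos >= len(d):
            return
        max_digit = max(d[pos:])
        if d[pos] == max_digit:
            search(d, pos + 1, k)  # already in place: no swap used
            return
        for i in range(pos + 1, len(d)):
            if d[i] == max_digit:
                d[pos], d[i] = d[i], d[pos]
                search(d, pos + 1, k - 1)
                d[pos], d[i] = d[i], d[pos]  # undo the swap (backtrack)

    search(list(digits), 0, k)
    return best


print(max_number("8799", 2))  # 9987
```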
Here's my approach (it's not foolproof, but it covers the basic cases). First we'll need a function that extracts each DIGIT of an INT into a container:
std::shared_ptr<std::deque<int>> getDigitsOfInt(const int N)
{
int number(N);
std::shared_ptr<std::deque<int>> digitsQueue(new std::deque<int>());
while (number != 0)
{
digitsQueue->push_front(number % 10);
number /= 10;
}
return digitsQueue;
}
You obviously want to create the inverse of this, so convert such a container back to an INT:
const int getIntOfDigits(const std::shared_ptr<std::deque<int>>& digitsQueue)
{
int number(0);
for (std::deque<int>::size_type i = 0, iMAX = digitsQueue->size(); i < iMAX; ++i)
{
number = number * 10 + digitsQueue->at(i);
}
return number;
}
You will also need to find the position of the MAX_DIGIT. It would be great to use std::max_element, but when several elements equal the maximum it returns an iterator to the first of them, and here we want the last one. So let's implement our own max search:
int getLastMaxDigitOfN(const std::shared_ptr<std::deque<int>>& digitsQueue, int startPosition)
{
assert(!digitsQueue->empty() && digitsQueue->size() > static_cast<std::deque<int>::size_type>(startPosition));
int maxDigitPosition(0);
int maxDigit(digitsQueue->at(startPosition));
for (std::deque<int>::size_type i = startPosition, iMAX = digitsQueue->size(); i < iMAX; ++i)
{
const int currentDigit(digitsQueue->at(i));
if (maxDigit <= currentDigit)
{
maxDigit = currentDigit;
maxDigitPosition = i;
}
}
return maxDigitPosition;
}
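From here on it's straightforward: at each position, swap the right-most (last) maximum digit into place, until you run out of swaps or digits:
const int solution(const int N, const int K)
{
    std::shared_ptr<std::deque<int>> digitsOfN = getDigitsOfInt(N);
    int pos(0);
    int RemainingSwaps(K);
    const int digitCount = static_cast<int>(digitsOfN->size());
    // pos advances on every iteration; otherwise the loop never ends
    // when the digit at pos is already the maximum.
    while (RemainingSwaps > 0 && pos < digitCount)
    {
        int lastHDPosition = getLastMaxDigitOfN(digitsOfN, pos);
        // Compare values, not positions, so no swap is wasted on equal digits.
        if (digitsOfN->at(lastHDPosition) != digitsOfN->at(pos))
        {
            std::swap(digitsOfN->at(lastHDPosition), digitsOfN->at(pos));
            --RemainingSwaps;
        }
        ++pos;
    }
    return getIntOfDigits(digitsOfN);
}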
There are unhandled corner-cases but I'll leave that up to you.
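I assumed K = 2 (it is hard-coded as the last argument of solve(array, array.length-1, 2)), but you can change the value! Note that this version counts each swap of two adjacent digits as one swap.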
Java code
public class Solution {
public static void main (String args[]) {
Solution d = new Solution();
System.out.println(d.solve(1234));
System.out.println(d.solve(9812));
System.out.println(d.solve(9876));
}
public int solve(int number) {
int[] array = intToArray(number);
int[] result = solve(array, array.length-1, 2);
return arrayToInt(result);
}
private int arrayToInt(int[] array) {
String s = "";
for (int i = array.length-1 ;i >= 0; i--) {
s = s + array[i]+"";
}
return Integer.parseInt(s);
}
private int[] intToArray(int number){
String s = number+"";
int[] result = new int[s.length()];
for(int i = 0 ;i < s.length() ;i++) {
result[s.length()-1-i] = Integer.parseInt(s.charAt(i)+"");
}
return result;
}
private int[] solve(int[] array, int endIndex, int num) {
if (endIndex == 0)
return array;
int size = num ;
int firstIndex = endIndex - size;
if (firstIndex < 0)
firstIndex = 0;
int biggest = findBiggestIndex(array, endIndex, firstIndex);
if (biggest!= endIndex) {
if (endIndex-biggest==num) {
while(num!=0) {
int temp = array[biggest];
array[biggest] = array[biggest+1];
array[biggest+1] = temp;
biggest++;
num--;
}
return array;
}else{
int n = endIndex-biggest;
for (int i = 0 ;i < n;i++) {
int temp = array[biggest];
array[biggest] = array[biggest+1];
array[biggest+1] = temp;
biggest++;
}
return solve(array, endIndex - 1, num - n); // n adjacent swaps were used, num - n remain
}
}else{
return solve(array, --endIndex, num);
}
}
private int findBiggestIndex(int[] array, int endIndex, int firstIndex) {
int result = firstIndex;
int max = array[firstIndex];
for (int i = firstIndex; i <= endIndex; i++){
if (array[i] > max){
max = array[i];
result = i;
}
}
return result;
}
}

Linear time algorithm for 2-SUM

Given an integer x and a sorted array a of N distinct integers, design a linear-time algorithm to determine if there exist two distinct indices i and j such that a[i] + a[j] == x
This is a special case of the subset-sum problem.
Here is my solution; I don't know whether it was known before. Imagine a 3D plot of a function of two variables i and j:
sum(i,j) = a[i]+a[j]
For every i there is a j such that a[i]+a[j] is closest to x. All these (i, j) pairs form a closest-to-x line, and we just need to walk along it looking for a[i]+a[j] == x:
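#include <iostream>
#include <vector>
// Walks the closest-to-x line: i moves right, j moves left.
bool findPair(const std::vector<int>& a, int x) {
    int i = 0;
    // Start j at the last element. Starting at lower_bound(x) can skip valid
    // pairs, e.g. a = {-20, 1, 10, 30}, x = 10 misses -20 + 30.
    int j = static_cast<int>(a.size()) - 1;
    while (i < j) {  // i < j keeps the two indices distinct
        int sum = a[i] + a[j];
        if (sum == x) {
            std::cout << "found: " << i << " " << j << std::endl;
            return true;
        }
        if (sum > x) j--;
        else i++;
    }
    std::cout << " not found\n";
    return false;
}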
Complexity: O(n)
Think in terms of complements.
Iterate over the list and, for each item, work out the number needed to reach x (its complement). Put both the number and its complement into a hash set. While iterating, check whether the number or its complement is already in the set; if so, a pair has been found.
Edit: since I have some time, here is some pseudo-ish code (Java):
boolean find(int[] array, int x) {
HashSet<Integer> s = new HashSet<Integer>();
for(int i = 0; i < array.length; i++) {
if (s.contains(array[i]) || s.contains(x-array[i])) {
return true;
}
s.add(array[i]);
s.add(x-array[i]);
}
return false;
}
Given that the array is sorted (WLOG in ascending order), we can do the following:
Algorithm A_1:
We are given (a_1, ..., a_n, m) with a_1 < ... < a_n.
Put one pointer at the top of the list (a_n) and one at the bottom (a_1).
Compute the sum where both pointers are.
If the sum is greater than m, move the top pointer down.
If the sum is less than m, move the bottom pointer up.
If the pointers meet (here we assume each number can be used only once), report unsat.
Otherwise an equal sum is found; report sat.
It is clear that this is O(n): each step moves one pointer, so at most n - 1 sums are computed before the pointers meet. The proof of correctness is left as an exercise.
This is merely a subroutine of the Horowitz and Sahni (1974) algorithm for SUBSET-SUM. (Note, however, that almost all general-purpose SUBSET-SUM algorithms contain such a routine: Schroeppel and Shamir (1981), Howgrave-Graham and Joux (2010), Becker and Joux (2011).)
If we were given an unordered list, this approach would take O(n log n) time: sort the list first (e.g. with mergesort), then apply A_1.
In a first pass, search for the first value that is > ceil(x/2). Let's call this value L.
From the index of L, search backwards until you find the other operand that matches the sum.
This takes about 2*n steps, i.e. O(n).
This can be extended to use binary search.
Use binary search to find L, the minimum of the elements of a that are > ceil(x/2).
Do the same for R, but now using L as the upper bound of the searchable range.
This approach takes about 2*log(n) steps.
Here's a Python version that uses a dictionary and each number's complement. It runs in linear expected time, O(N):
def twoSum(N, x):
    seen = {}  # value -> index of the numbers visited so far
    for i in range(len(N)):
        complement = x - N[i]
        if complement in seen:
            return True
        seen[N[i]] = i
    return False
# Test
print(twoSum([2, 7, 11, 15], 9))  # True
print(twoSum([2, 7, 11, 15], 3))  # False
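Iterate over the array once to store each number's index in a hash map, then look up each number's complement. With std::unordered_map the expected time complexity is O(n); a std::map would make it O(n log n). The returned indices are 1-based.
#include <unordered_map>
#include <vector>
using namespace std;
vector<int> twoSum(vector<int> &numbers, int target) {
    unordered_map<int, int> summap;  // value -> index
    vector<int> result;
    for (int i = 0; i < (int)numbers.size(); i++) {
        summap[numbers[i]] = i;
    }
    for (int i = 0; i < (int)numbers.size(); i++) {
        int searched = target - numbers[i];
        auto it = summap.find(searched);
        // Skip a match with the element itself (target == 2 * numbers[i]).
        if (it != summap.end() && it->second != i) {
            result.push_back(i + 1);
            result.push_back(it->second + 1);
            break;
        }
    }
    return result;
}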
I would just add the difference to a HashSet<T> like this:
public static bool Find(int[] array, int toReach)
{
HashSet<int> hashSet = new HashSet<int>();
foreach (int current in array)
{
if (hashSet.Contains(current))
{
return true;
}
hashSet.Add(toReach - current);
}
return false;
}
Note: The code is mine but the test file was not. Also, this idea for the hash function comes from various readings on the net.
An implementation in Scala. It uses a hashMap and a custom (yet simple) mapping for the values. I agree that it does not make use of the sorted nature of the initial array.
The hash function
I fix the bucket size by dividing each value by 10000. That divisor can vary with the bucket size you want, and it can be tuned to the input range.
So, for example, key 1 is responsible for all the integers from 10000 to 19999.
Impact on search scope
What that means is that for a current value n, for which you're looking to find a complement c such that n + c = x (x being the target sum), there are only 3 possible buckets in which the complement can be, as long as |x| is smaller than the bucket size:
-key
-key + 1
-key - 1
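Here is a minimal Python sketch of the bucketing idea. It assumes integer inputs and a target t with |t| smaller than the bucket width; otherwise the complement can fall outside the three neighbouring buckets. Keys truncate toward zero, like the Long division in the Scala code below. The names bucket_key, two_sum_bucketed and the width parameter are illustrative.

```python
from collections import defaultdict


def bucket_key(v, width):
    """Integer division truncating toward zero (not Python's floor //)."""
    q = abs(v) // width
    return q if v >= 0 else -q


def two_sum_bucketed(values, t, width=10000):
    """Return a pair (c, v) of entries at different positions with c + v == t,
    or None. Assumes abs(t) < width."""
    buckets = defaultdict(list)
    for idx, v in enumerate(values):
        buckets[bucket_key(v, width)].append((idx, v))
    for key, members in buckets.items():
        # The complement of a value in bucket `key` can only be in these buckets.
        candidates = []
        for k in (-key - 1, -key, -key + 1):
            candidates.extend(buckets.get(k, []))
        for i, v in members:
            for j, c in candidates:
                if i != j and v + c == t:  # i != j: never pair an entry with itself
                    return (c, v)
    return None
```

Each value is only compared with the contents of three buckets, which keeps the work small when the values are spread over many buckets.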
Let's say that your numbers are in a file of the following form:
0
1
10
10
-10
10000
-10000
10001
9999
-10001
-9999
10000
5000
5000
-5000
-1
1000
2000
-1000
-2000
Here's the implementation in Scala
import scala.collection.mutable
import scala.io.Source
object TwoSumRed {
val usage = """
Usage: scala TwoSumRed.scala [filename]
"""
def main(args: Array[String]): Unit = {
val carte = createMap(args) match {
case None => return
case Some(m) => m
}
var t: Int = 1
carte.foreach {
case (bucket, values) => {
var toCheck: Array[Long] = Array[Long]()
if (carte.contains(-bucket)) {
toCheck = toCheck ++: carte(-bucket)
}
if (carte.contains(-bucket - 1)) {
toCheck = toCheck ++: carte(-bucket - 1)
}
if (carte.contains(-bucket + 1)) {
toCheck = toCheck ++: carte(-bucket + 1)
}
values.foreach { v =>
toCheck.foreach { c =>
if ((c + v) == t) {
println(s"$c and $v forms a 2-sum for $t")
return
}
}
}
}
}
}
def createMap(args: Array[String]): Option[mutable.HashMap[Int, Array[Long]]] = {
var carte: mutable.HashMap[Int,Array[Long]] = mutable.HashMap[Int,Array[Long]]()
if (args.length == 1) {
val filename = args.toList(0)
val lines: List[Long] = Source.fromFile(filename).getLines().map(_.toLong).toList
lines.foreach { l =>
val idx: Int = (l / 10000).toInt // Long division, truncates toward zero
if (carte.contains(idx)) {
carte(idx) = carte(idx) :+ l
} else {
carte += (idx -> Array[Long](l))
}
}
Some(carte)
} else {
println(usage)
None
}
}
}
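Another option (assuming using namespace std, a sorted array a of N distinct integers and the target x): build b[i] = x - a[N-1-i], the complements of a in ascending order, then scan a and b together like a merge. A common value that comes from two different elements means a pair exists.
vector<int> b(N);
for (int i = 0; i < N; i++)
{
    b[i] = x - a[N - 1 - i];  // complements, ascending because a is ascending
}
for (int i = 0, j = 0; i < N && j < N;)
{
    if (a[i] == b[j])
    {
        // b[j] is the complement of a[N-1-j]; it must be a different element.
        if (i != N - 1 - j)
        {
            cout << "found";
            return;
        }
        i++;
        j++;
    }
    else if (a[i] < b[j])
        i++;
    else
        j++;
}
cout << "not found";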
Here is a linear-time solution (O(n) time, O(1) space) that finds the pair with the largest sum and its indices:
public void twoSum(int[] arr){
    if (arr.length < 2) return;
    // Track the two largest values seen so far, with their indices.
    int biggerIndex = (arr[0] >= arr[1]) ? 0 : 1;
    int smallerIndex = 1 - biggerIndex;
    int bigger = arr[biggerIndex];
    int smaller = arr[smallerIndex];
    for (int i = 2; i < arr.length; i++) {
        if (arr[i] > bigger) {
            // the old maximum becomes the second largest
            smaller = bigger;
            smallerIndex = biggerIndex;
            bigger = arr[i];
            biggerIndex = i;
        } else if (arr[i] > smaller) {
            smaller = arr[i];
            smallerIndex = i;
        }
    }
    int max = bigger + smaller;
    System.out.println("Biggest sum is: " + max + " with indices [" + biggerIndex + "," + smallerIndex + "]");
}
Solution
We need an array to store the indices.
Check whether the array is empty or contains fewer than 2 elements.
Define the start and end points of the array.
Iterate until the pointers meet or a match is found.
If the sum equals the target, record the indices.
Otherwise, move the start pointer to the right if the sum is too small, or the end pointer to the left if it is too large.
For more info: http://www.prathapkudupublog.com/2017/05/two-sum-ii-input-array-is-sorted.html
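The steps above can be sketched in Python as follows. The sketch assumes the input list is sorted in ascending order and returns 0-based indices, or None when no pair exists. The name two_sum_sorted is illustrative.

```python
def two_sum_sorted(a, x):
    """Return 0-based indices (i, j), i < j, with a[i] + a[j] == x, or None.
    Assumes a is sorted in ascending order."""
    # Fewer than two elements: no pair is possible.
    if len(a) < 2:
        return None
    lo, hi = 0, len(a) - 1  # start and end points
    while lo < hi:
        s = a[lo] + a[hi]
        if s == x:
            return (lo, hi)
        if s < x:
            lo += 1  # sum too small: move the start pointer right
        else:
            hi -= 1  # sum too large: move the end pointer left
    return None
```

Every iteration moves one pointer, so the loop runs at most len(a) - 1 times. It uses O(1) extra space and needs no hashing.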
Credit to leonid.
Here is his solution in Java, if you want to give it a shot.
I removed the early return, so if the array is sorted but DOES allow duplicates, it keeps printing matching pairs instead of stopping at the first one:
static boolean cpp(int[] a, int x) {
    int i = 0;
    int j = a.length - 1;
    boolean found = false;
    while (i < j) {  // i < j keeps the two indices distinct
        int sum = a[i] + a[j];
        if (sum == x) {
            System.out.printf("found %s, %s \n", i, j);
            found = true;  // no return: keep looking for more pairs
        }
        if (sum > x) j--;
        else i++;
    }
    if (!found) System.out.println("not found");
    return found;
}
The classic linear-time two-pointer solution does not require hashing, so it can also solve related problems such as approximate sum (finding the pair sum closest to a target).
First, a simple n log n solution: walk through array elements a[i], and use binary search to find the best a[j].
To get rid of the log factor, use the following observation: because the list is sorted, a[i] increases as i increases, so the best corresponding a[j] decreases, both in value and in index j. This gives the two-pointer solution: start with indices lo = 0, hi = N-1 (pointing to a[0] and a[N-1]). For a[0], find the best a[hi] by decreasing hi. Then increment lo and, for each a[lo], decrease hi until a[lo] + a[hi] is the best. The algorithm can stop when it reaches lo == hi.
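Here is a Python sketch of the two-pointer walk applied to the approximate version. It assumes a sorted list with at least two elements. It uses the usual formulation of the walk described above: move lo right when the current sum is too small and hi left when it is too large, keeping the best pair seen so far. The name closest_pair_sum is illustrative.

```python
def closest_pair_sum(a, x):
    """Return (i, j), i < j, minimising |a[i] + a[j] - x|.
    Assumes a is sorted ascending and len(a) >= 2."""
    lo, hi = 0, len(a) - 1
    best = (lo, hi)
    while lo < hi:
        s = a[lo] + a[hi]
        # Keep the pair whose sum is nearest to the target so far.
        if abs(s - x) < abs(a[best[0]] + a[best[1]] - x):
            best = (lo, hi)
        if s == x:
            break  # cannot do better than an exact match
        if s < x:
            lo += 1
        else:
            hi -= 1
    return best
```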

Dynamic programming exercise for string cutting

I have been working on the following problem from this book.
A certain string-processing language offers a primitive operation which splits a string into two pieces. Since this operation involves copying the original string, it takes n units of time for a string of length n, regardless of the location of the cut. Suppose, now, that you want to break a string into many pieces. The order in which the breaks are made can affect the total running time. For example, if you want to cut a 20-character string at positions 3 and 10, then making the first cut at position 3 incurs a total cost of 20+17=37, while doing position 10 first has a better cost of 20+10=30.
I need a dynamic programming algorithm that, given m cuts, finds the minimum cost of cutting a string into m+1 pieces.
The divide and conquer approach seems to me the best one for this kind of problem. Here is a Java implementation of the algorithm:
Note: the array m must be sorted in ascending order (use Arrays.sort(m); the code below also needs import java.util.Arrays).
public int findMinCutCost(int[] m, int n) {
int cost = n * m.length;
for (int i=0; i<m.length; i++) {
cost = Math.min(findMinCutCostImpl(m, n, i), cost);
}
return cost;
}
private int findMinCutCostImpl(int[] m, int n, int i) {
if (m.length == 1) return n;
int cl = 0, cr = 0;
if (i > 0) {
cl = Integer.MAX_VALUE;
int[] ml = Arrays.copyOfRange(m, 0, i);
int nl = m[i];
for (int j=0; j<ml.length; j++) {
cl = Math.min(findMinCutCostImpl(ml, nl, j), cl);
}
}
if (i < m.length - 1) {
cr = Integer.MAX_VALUE;
int[] mr = Arrays.copyOfRange(m, i + 1, m.length);
int nr = n - m[i];
for (int j=0; j<mr.length; j++) {
mr[j] = mr[j] - m[i];
}
for (int j=0; j<mr.length; j++) {
cr = Math.min(findMinCutCostImpl(mr, nr, j), cr);
}
}
return n + cl + cr;
}
For example:
int n = 20;
int[] m = new int[] { 3, 10 }; // sorted, as required above
System.out.println(findMinCutCost(m, n));
will print 30.
** Edit **
I have implemented two other methods to answer the problem in the question.
1. Median cut approximation
This method recursively cuts each piece at the cut position closest to its middle. The results are not always the best solution, but it offers a non-negligible speed gain (on the order of +100000% in my tests) for a negligible difference from the minimal cut cost.
public int findMinCutCost2(int[] m, int n) {
if (m.length == 0) return 0;
if (m.length == 1) return n;
float half = n/2f;
int bestIndex = 0;
for (int i=1; i<m.length; i++) {
if (Math.abs(half - m[bestIndex]) > Math.abs(half - m[i])) {
bestIndex = i;
}
}
int cl = 0, cr = 0;
if (bestIndex > 0) {
int[] ml = Arrays.copyOfRange(m, 0, bestIndex);
int nl = m[bestIndex];
cl = findMinCutCost2(ml, nl);
}
if (bestIndex < m.length - 1) {
int[] mr = Arrays.copyOfRange(m, bestIndex + 1, m.length);
int nr = n - m[bestIndex];
for (int j=0; j<mr.length; j++) {
mr[j] = mr[j] - m[bestIndex];
}
cr = findMinCutCost2(mr, nr);
}
return n + cl + cr;
}
2. A single-pass multi-cut
Instead of calculating the minimal cost, just use different indices and buffers to split the string in one pass. Since every character is copied exactly once, the cost is always n, which is what the method returns. Plus, the method actually splits the string into substrings.
public int findMinCutCost3(int[] m, int n) {
char[][] charArr = new char[m.length+1][];
charArr[0] = new char[m[0]];
for (int i=0, j=0, k=0; j<n; j++) {
if (i < m.length && j == m[i]) {
if (++i >= m.length) {
charArr[i] = new char[n - m[i-1]];
} else {
charArr[i] = new char[m[i] - m[i-1]];
}
k=0;
}
// copy after the boundary check, so character j lands in the piece it belongs to
//charArr[i][k++] = string[j]; // string is the actual string to split
}
return n;
}
Note: that this last method could easily be modified to accept a String str argument instead of n and set n = str.length(), and return a String[] array from charArr[][].
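As the note suggests, once the cut positions are known, the string itself can be split in one pass. Here is a small Python illustration that assumes the cut positions are sorted and lie within the string. split_at is a hypothetical helper name.

```python
def split_at(s, cuts):
    """Split s at the sorted positions in cuts, returning len(cuts) + 1 pieces."""
    bounds = [0] + list(cuts) + [len(s)]
    # Each piece runs from one boundary to the next.
    return [s[start:end] for start, end in zip(bounds, bounds[1:])]
```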
For dynamic programming, I claim that all you really need to know is what the state space should be - how to represent partial problems.
Here we are dividing a string up into m+1 pieces by creating new breaks. I claim that a good state space is a set of (a, b) pairs, where a is the location of the start of a substring and b is the location of the end of the same substring, both counted as break indices in the final, fully broken-down string. The cost associated with each pair is the minimum cost of breaking it up. If b <= a + 1, then the cost is 0, because there are no more breaks to put in. If b is larger, then the possible locations for the next break in that substring are the breaks a+1, a+2, ..., b-1. The next break costs the length of the substring between breaks a and b regardless of where we put it, but if we put it at break k, the minimum cost of the later breaks is cost(a, k) + cost(k, b).
So to solve this with dynamic programming, build up a table (a, b) of minimum costs, where you can work out the cost of breaks on strings with k sections by considering k - 1 possible breaks and then looking up the costs of strings with at most k - 1 sections.
One way to expand on this would be to start by creating a table T[a, b] and setting all entries in that table to infinity. Then go over the table again and where b <= a+1 put T[a,b] = 0. This fills in entries representing sections of the original string which need no further cuts. Now scan through the table and for each T[a,b] with b > a + 1 consider every possible k such that a < k < b and if min_k ((length between breaks a and b) + T[a,k] + T[k,b]) < T[a,b] set T[a,b] to that minimum value. This recognizes where you now know a way to chop up the substrings represented by T[a,k] and T[k,b] cheaply, so this gives you a better way to chop up T[a,b]. If you now repeat this m times you are done - use a standard dynamic programming backtrack to work out the solution. It might help if you save the best value of k for each T[a,b] in a separate table.
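Here is a bottom-up Python sketch of this table. It assumes the cut positions are given as offsets into a string of length n. Instead of repeating the relaxation m times, it fills T[a][b] in order of increasing b - a. That way the sub-costs T[a][k] and T[k][b] are already final when they are read. It returns only the minimum cost; recovering the order of the cuts would need the separate table of best k mentioned above. The name min_cut_cost is illustrative.

```python
def min_cut_cost(n, cuts):
    """Minimum total cost to cut a string of length n at every position in
    cuts, where splitting a piece of length L costs L."""
    # Break points including both ends; T[a][b] covers the substring
    # between break points a and b.
    p = [0] + sorted(cuts) + [n]
    size = len(p)
    T = [[0] * size for _ in range(size)]  # b == a + 1: no more breaks, cost 0
    for span in range(2, size):
        for a in range(size - span):
            b = a + span
            # Next break at some k strictly between a and b.
            best = min(T[a][k] + T[k][b] for k in range(a + 1, b))
            T[a][b] = (p[b] - p[a]) + best
    return T[0][size - 1]
```

There are O(m^2) table entries and each one tries O(m) values of k, so the sketch runs in O(m^3) time.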
Python code:
mincost(n, cut_list) = min over k in cut_list of { n + mincost(k, left_cut_list) + mincost(n-k, right_cut_list) }
def splitstr(n, cut_list):
    if len(cut_list) == 0:
        return (0, [])
    min_positions = []
    min_cost = float("inf")
    for k in cut_list:
        left_split = [x for x in cut_list if x < k]
        # shift the right-hand cuts into the right piece's coordinates
        right_split = [x - k for x in cut_list if x > k]
        lcost = splitstr(k, left_split)
        rcost = splitstr(n - k, right_split)
        cost = n + lcost[0] + rcost[0]
        positions = [k] + lcost[1] + [x + k for x in rcost[1]]
        if cost < min_cost:
            min_cost = cost
            min_positions = positions
    return (min_cost, min_positions)
print(splitstr(20, [3, 10, 16]))  # (40, [10, 3, 16])
print(splitstr(20, [3, 10]))  # (30, [10, 3])
print(splitstr(5, [1, 2, 3, 4, 5]))  # (13, [2, 1, 3, 4, 5])
print(splitstr(1, [1]))  # (1, [1]) # m cuts m+1 substrings
Here is a C++ implementation. It's an O(n^3) implementation using DP. It assumes that the cut array is sorted; if it is not, sorting it takes only O(k log k) time, so the asymptotic time complexity remains the same.
#include <iostream>
#include <vector>
#include <climits>
using namespace std;
int main(){
    int n, k;
    // Input: string length n, number of cuts k, then the k cut positions.
    while (cin >> n >> k) {
        vector<int> cut(k);
        for (int i = 0; i < k; i++)
            cin >> cut[i];
        // a[i][j] = minimum cost of making every cut strictly between positions i and j
        vector<vector<int>> a(n + 1, vector<int>(n + 1, 0));
        for (int gap = 1; gap <= n; gap++) {
            for (int i = 0, j = gap; j <= n; i++, j++) {
                int best = INT_MAX;
                for (int m = 0; m < k; m++) {
                    if (cut[m] > i && cut[m] < j) {
                        int cost = (j - i) + a[i][cut[m]] + a[cut[m]][j];
                        if (cost < best)
                            best = cost;
                    }
                }
                // No cut inside (i, j): nothing to pay.
                a[i][j] = (best == INT_MAX) ? 0 : best;
            }
        }
        cout << a[0][n] << endl;
    }
    return 0;
}

Resources