Algorithms: Interesting diffing algorithm

Algorithms: Interesting diffing algorithm - algorithm

This came up in a real-world situation, and I thought I would share it, as it could lead to some interesting solutions. Essentially, the algorithm needs to diff two lists, but let me give you a more rigorous definition of the problem.
Mathematical Formulation
Suppose you have two lists, L and R each of which contain elements from some underlying alphabet S. Moreover, these lists have the property that the common elements that they have appear in order: that is to say, if L[i] = R[i*] and L[j] = R[j*], and i < j then i* < j*. The lists need not have any common elements at all, and one or both may be empty. [Clarification: You may assume no repetitions of elements.]
The problem is to produce a sort of "diff" of the lists, which may be viewed as new list of ordered pairs (x,y) where x is from L and y is from R, with the following properties:
If x appears in both lists, then (x,x) appears in the result.
If x appears in L, but not in R, then (x,NULL) appears in the result.
If y appears in R, but not in L, then (NULL,y) appears in the result.
and finally
The result list has "the same" ordering as each of the input lists: it shares, roughly speaking, the same ordering property as above with each of the lists individually (see example).
Examples
L = (d)
R = (a,b,c)
Result = ((NULL,d), (a,NULL), (b,NULL), (c,NULL))
L = (a,b,c,d,e)
R = (b,q,c,d,g,e)
Result = ((a,NULL), (b,b), (NULL,q), (c,c), (d,d), (NULL,g), (e,e))
Does anyone have any good algorithms to solve this? What is the complexity?

There is a way to do this in O(n), if you're willing to make a copy of one of the lists in a different data structure. This is a classic time/space tradeoff.
Create a hash map of the list R, with the key being the element and the value being the original index into the array; in C++, you could use unordered_map from tr1 or boost.
Keep an index to the unprocessed portion of list R, initialized to the first element.
For each element in list L, check the hash map for a match in list R. If you do not find one, output (L value, NULL). If there is a match, get the corresponding index from the hash map. For each unprocessed element in list R up to the matching index, output (NULL, R value). For the match, output (value, value).
When you have reached the end of list L, go through the remaining elements of list R and output (NULL, R value).
Edit: Here is the solution in Python. To those who say this solution depends on the existence of a good hashing function - of course it does. The original poster may add additional constraints to the question if this is a problem, but I will take an optimistic stance until then.
def FindMatches(listL, listR):
result=[]
lookupR={}
for i in range(0, len(listR)):
lookupR[listR[i]] = i
unprocessedR = 0
for left in listL:
if left in lookupR:
for right in listR[unprocessedR:lookupR[left]]:
result.append((None,right))
result.append((left,left))
unprocessedR = lookupR[left] + 1
else:
result.append((left,None))
for right in listR[unprocessedR:]:
result.append((None,right))
return result
>>> FindMatches(('d'),('a','b','c'))
[('d', None), (None, 'a'), (None, 'b'), (None, 'c')]
>>> FindMatches(('a','b','c','d','e'),('b','q','c','d','g','e'))
[('a', None), ('b', 'b'), (None, 'q'), ('c', 'c'), ('d', 'd'), (None, 'g'), ('e','e')]

The worst case, as defined and using only equality, must be O(n*m). Consider the following two lists:
A[] = {a,b,c,d,e,f,g}
B[] = {h,i,j,k,l,m,n}
Assume there exists exactly one match between those two "ordered" lists. It will take O(n*m) comparisons since there does not exist a comparison which removes the need for other comparisons later.
So, any algorithm you come up with is going to be O(n*m), or worse.

Diffing ordered lists can be done in linear time by traversing both lists and matching as you go. I will try to post some psuedo Java code in an update.
Since we don't know the ordering algorithm and can't determine any ordering based on less than or greater than operators, we must consider the lists unordered. Also, given how the results are to be formatted you are faced with scanning both lists (at least until you find a match and then you can bookmark and start from there again). It will still be O(n^2) performance, or yes more specifically O(nm).

This is exactly like sequence alignment, you can use the Needleman-Wunsch algorithm to solve it. The link includes the code in Python. Just make sure you set the scoring so that a mismatch is negative and a match is positive and an alignment with a blank is 0 when maximizing. The algorithm runs in O(n * m) time and space, but the space complexity of this can be improved.
Scoring Function
int score(char x, char y){
if ((x == ' ') || (y == ' ')){
return 0;
}
else if (x != y){
return -1;
}
else if (x == y){
return 1;
}
else{
puts("Error!");
exit(2);
}
}
Code
#include <stdio.h>
#include <stdbool.h>
int max(int a, int b, int c){
bool ab, ac, bc;
ab = (a > b);
ac = (a > c);
bc = (b > c);
if (ab && ac){
return a;
}
if (!ab && bc){
return b;
}
if (!ac && !bc){
return c;
}
}
int score(char x, char y){
if ((x == ' ') || (y == ' ')){
return 0;
}
else if (x != y){
return -1;
}
else if (x == y){
return 1;
}
else{
puts("Error!");
exit(2);
}
}
void print_table(int **table, char str1[], char str2[]){
unsigned int i, j, len1, len2;
len1 = strlen(str1) + 1;
len2 = strlen(str2) + 1;
for (j = 0; j < len2; j++){
if (j != 0){
printf("%3c", str2[j - 1]);
}
else{
printf("%3c%3c", ' ', ' ');
}
}
putchar('\n');
for (i = 0; i < len1; i++){
if (i != 0){
printf("%3c", str1[i - 1]);
}
else{
printf("%3c", ' ');
}
for (j = 0; j < len2; j++){
printf("%3d", table[i][j]);
}
putchar('\n');
}
}
int **optimal_global_alignment_table(char str1[], char str2[]){
unsigned int len1, len2, i, j;
int **table;
len1 = strlen(str1) + 1;
len2 = strlen(str2) + 1;
table = malloc(sizeof(int*) * len1);
for (i = 0; i < len1; i++){
table[i] = calloc(len2, sizeof(int));
}
for (i = 0; i < len1; i++){
table[i][0] += i * score(str1[i], ' ');
}
for (j = 0; j < len1; j++){
table[0][j] += j * score(str1[j], ' ');
}
for (i = 1; i < len1; i++){
for (j = 1; j < len2; j++){
table[i][j] = max(
table[i - 1][j - 1] + score(str1[i - 1], str2[j - 1]),
table[i - 1][j] + score(str1[i - 1], ' '),
table[i][j - 1] + score(' ', str2[j - 1])
);
}
}
return table;
}
void prefix_char(char ch, char str[]){
int i;
for (i = strlen(str); i >= 0; i--){
str[i+1] = str[i];
}
str[0] = ch;
}
void optimal_global_alignment(int **table, char str1[], char str2[]){
unsigned int i, j;
char *align1, *align2;
i = strlen(str1);
j = strlen(str2);
align1 = malloc(sizeof(char) * (i * j));
align2 = malloc(sizeof(char) * (i * j));
align1[0] = align2[0] = '\0';
while((i > 0) && (j > 0)){
if (table[i][j] == (table[i - 1][j - 1] + score(str1[i - 1], str2[j - 1]))){
prefix_char(str1[i - 1], align1);
prefix_char(str2[j - 1], align2);
i--;
j--;
}
else if (table[i][j] == (table[i - 1][j] + score(str1[i-1], ' '))){
prefix_char(str1[i - 1], align1);
prefix_char('_', align2);
i--;
}
else if (table[i][j] == (table[i][j - 1] + score(' ', str2[j - 1]))){
prefix_char('_', align1);
prefix_char(str2[j - 1], align2);
j--;
}
}
while (i > 0){
prefix_char(str1[i - 1], align1);
prefix_char('_', align2);
i--;
}
while(j > 0){
prefix_char('_', align1);
prefix_char(str2[j - 1], align2);
j--;
}
puts(align1);
puts(align2);
}
int main(int argc, char * argv[]){
int **table;
if (argc == 3){
table = optimal_global_alignment_table(argv[1], argv[2]);
print_table(table, argv[1], argv[2]);
optimal_global_alignment(table, argv[1], argv[2]);
}
else{
puts("Reqires to string arguments!");
}
return 0;
}
Sample IO
$ cc dynamic_programming.c && ./a.out aab bba
__aab
bb_a_
$ cc dynamic_programming.c && ./a.out d abc
___d
abc_
$ cc dynamic_programming.c && ./a.out abcde bqcdge
ab_cd_e
_bqcdge

No real tangible answer, only vague intuition. Because you don't know the ordering algorithm, only that the data is ordered in each list, it sounds vaguely like the algorithms used to "diff" files (e.g. in Beyond Compare) and match sequences of lines together. Or also vaguely similar to regexp algorithms.
There can also be multiple solutions. (never mind, not if there are not repeated elements that are strictly ordered. I was thinking too much along the lines of file comparisons)

This is a pretty simple problem since you already have an ordered list.
//this is very rough pseudocode
stack aList;
stack bList;
List resultList;
char aVal;
char bVal;
while(aList.Count > 0 || bList.Count > 0)
{
aVal = aList.Peek; //grab the top item in A
bVal = bList.Peek; //grab the top item in B
if(aVal < bVal || bVal == null)
{
resultList.Add(new Tuple(aList.Pop(), null)));
}
if(bVal < aVal || aVal == null)
{
resultList.Add(new Tuple(null, bList.Pop()));
}
else //equal
{
resultList.Add(new Tuple(aList.Pop(), bList.Pop()));
}
}
Note... this code WILL NOT compile. It is just meant as a guide.
EDIT Based on the OP comments
If the ordering algorithm is not exposed, then the lists must be considered unordered.
If the lists are unordered, then the algorithm has a time complexity of O(n^2), specifically O(nm) where n and m are the number of items in each list.
EDIT
Algorithm to solve this
L(a,b,c,d,e)
R(b,q,c,d,g,e)
//pseudo code... will not compile
//Note, this modifies aList and bList, so make copies.
List aList;
List bList;
List resultList;
var aVal;
var bVal;
while(aList.Count > 0)
{
aVal = aList.Pop();
for(int bIndex = 0; bIndex < bList.Count; bIndex++)
{
bVal = bList.Peek();
if(aVal.RelevantlyEquivalentTo(bVal)
{
//The bList items that come BEFORE the match, are definetly not in aList
for(int tempIndex = 0; tempIndex < bIndex; tempIndex++)
{
resultList.Add(new Tuple(null, bList.Pop()));
}
//This 'popped' item is the same as bVal right now
resultList.Add(new Tuple(aVal, bList.Pop()));
//Set aVal to null so it doesn't get added to resultList again
aVal = null;
//Break because it's guaranteed not to be in the rest of the list
break;
}
}
//No Matches
if(aVal != null)
{
resultList.Add(new Tuple(aVal, null));
}
}
//aList is now empty, and all the items left in bList need to be added to result set
while(bList.Count > 0)
{
resultList.Add(new Tuple(null, bList.Pop()));
}
The result set will be
L(a,b,c,d,e)
R(b,q,c,d,g,e)
Result ((a,null),(b,b),(null,q),(c,c),(d,d),(null,g),(e,e))

I don't think you have enough information. All you've asserted is that elements that match match in the same order, but finding the first matching pair is an O(nm) operation unless you have some other ordering that you can determine.

SELECT distinct l.element, r.element
FROM LeftList l
OUTER JOIN RightList r
ON l.element = r.element
ORDER BY l.id, r.id
Assumes the ID of each element is its ordering. And of course, that your lists are contained in a Relational Database :)

Related

Sum of different elements in tuples nonexponential algorithm

I was working on something, and I was able to reduce a problem to a particular form: given n tuples each of k integers, say: (a1,a2,a3,a4) , (b1,b2,b3,b4) , (c1,c2,c3,c4) , (d1,d2,d3,d4), I wish to choose any number of tuples, that, when added to each other, give a tuple with no positive elements. If I choose tuples a and b, I get tuple (a1+b1,a2+b2,a3+b3,a4+b4). So, if a = (1,-2,2,0) and b=(-1, 1, -3,0) then a+b =(0,-1,-1,0) which includes no positive numbers, hence is a solution of the problem.
Is there a way to obtain a solution (or verify its nonexistence) using a method other than checking the sum of all subset tuples, which takes 2^n steps?
Since this question is from my head, and not a particular textbook, I do not know the proper way to express it, and research to find an answer has been completely futile. Most of my searches directed me to the subset sum problem, where we choose k elements from a list that sum to a particular question. My problem could be said to be a complication of that: we choose a group of tuples from a list, and we want the sum of each element in these tuples to be <=0.
Edit: Thanks to the link provided, and due to the comments that indicated that a less than exponential solution is difficult, solving the question for the tuples whose elements range between -1,0, and 1 will be enough for me. Furthermore, the tuples will have ranging from 10,000-20,000 integers, and there will be no more than 1000 tuples. Each tuple has at most 10 1's, and 10 -1's, and the rest are zeroes
If anyone could also prove that it is some sort of NP, that would be great.
I failed to come up with a DP solution, and sorting doesn't seem useful

This can be solved in pseudo polynomial time with the given constraints using dynamic programming.
Explanation
This is similar to the pseudo polynomial time dynamic programming solution for the subset sum problem. It is only extended to multiple dimensions (4).
Time complexity
O(n * sum4) or in this case, since sum has been bounded by n,
O(n5)
Solution
Demo
Here is a top-down dynamic programming solution with memoization in C++.
const int N = 50;
int a[50][4]= {{0, 1, -1, 0},
{1, -1, 0, 0},
{-1, -1, 0, -1}};
unordered_map<int, bool> dp[N];
bool subset(int n, int sum1, int sum2, int sum3, int sum4)
{
// Base case: No tuple selected
if (n == -1 && !sum1 && !sum2 && !sum3 && !sum4)
return true;
// Base case: No tuple selected with non-zero sum
else if(n == -1)
return false;
else if(dp[n].find(hashsum(sum1, sum2, sum3, sum4)) != dp[n].end() )
return dp[n][hashsum(sum1, sum2, sum3, sum4)];
// Include the current element
bool include = subset(n - 1,
sum1 - a[n][0],
sum2 - a[n][1],
sum3 - a[n][2],
sum4 - a[n][3]);
// Exclude the current element
bool exclude = subset(n - 1, sum1, sum2, sum3, sum4);
return dp[n][hashsum(sum1, sum2, sum3, sum4)] = include || exclude;
}
For memoization, the hashsum is calculated as follows:
int hashsum(int sum1, int sum2, int sum3, int sum4) {
int offset = N;
int base = 2 * N;
int hashSum = 0;
hashSum += (sum1 + offset) * 1;
hashSum += (sum2 + offset) * base;
hashSum += (sum3 + offset) * base * base;
hashSum += (sum4 + offset) * base * base * base;
return hashSum;
}
The driver code can then search for any non-positive sum as follows:
int main()
{
int n = 3;
bool flag = false;
int sum1, sum2, sum3, sum4;
for (sum1 = -n; sum1 <= 0; sum1++) {
for (sum2 = -n; sum2 <= 0; sum2++) {
for (sum3 = -n; sum3 <= 0; sum3++) {
for (sum4 = -n; sum4 <= 0; sum4++) {
if (subset(n - 1, sum1, sum2, sum3, sum4)) {
flag = true;
goto done;
}
}
}
}
}
done:
if (flag && (sum1 || sum2 || sum3 || sum4))
cout << "Solution found. " << sum1 << ' ' << sum2 << ' ' << sum3 << ' ' << sum4 << std::endl;
else
cout << "No solution found.\n";
return 0;
}
Note that a trivial solution with sums (0, 0, 0, 0} where no element is ever selected always exists and thus is left out in the driver code.

Time limit exceeded in my code given below

Question:
Lapindrome is defined as a string which when split in the middle, gives two halves having the same characters and same frequency of each character. If there are odd number of characters in the string, we ignore the middle character and check for lapindrome. For example gaga is a lapindrome, since the two halves ga and ga have the same characters with same frequency. Also, abccab, rotor and xyzxy are a few examples of lapindromes. Note that abbaab is NOT a lapindrome. The two halves contain the same characters but their frequencies do not match.
Your task is simple. Given a string, you need to tell if it is a lapindrome.
Input:
First line of input contains a single integer T, the number of test cases.
Each test is a single line containing a string S composed of only lowercase English alphabet.
Output:
For each test case, output on a separate line: "YES" if the string is a lapindrome and "NO" if it is not.
Constraints:
1 ≤ T ≤ 100
2 ≤ |S| ≤ 1000, where |S| denotes the length of S
#include <stdio.h>
#include <string.h>
int found;
int lsearch(char a[], int l, int h, char p) {
int i = l;
for (i = l; i <= h; i++) {
if (a[i] == p) {
found = 0;
return i;
}
}
return -1;
}
int main() {
char s[100];
int q, z, i, T;
scanf("%d", &T);
while (T--) {
q = 0;
scanf("%s", &s);
if (strlen(s) % 2 == 0)
for (i = 0; i < (strlen(s) / 2); i++) {
z = lsearch(s, strlen(s) / 2, strlen(s) - 1, s[i]);
if (found == 0) {
found = -1;
s[z] = -2;
} else
q = 1;
} else
for (i = 0; i < (strlen(s) / 2); i++) {
z = lsearch(s, 1 + (strlen(s) / 2), strlen(s) - 1, s[i]);
if (found == 0) {
found = -1;
s[z] = -2;
} else
q = 1;
}
if (strlen(s) % 2 == 0)
for (i = (strlen(s) / 2); i < strlen(s); i++) {
if (s[i] != -2)
q = 1;
} else
for (i = (strlen(s) / 2) + 1; i < strlen(s); i++) {
if (s[i] != -2)
q = 1;
}
if (q == 1)
printf("NO\n");
else
printf("YES\n");
}
}
I am getting correct output in codeblocks but the codechef compiler says time limit exceeded. Please tell me why it says so

For each of O(n) characters you do a O(n) search leading to a O(n^2) algorithm. Throw a thousand character string at it, and it is too slow.
This is solvable in two standard ways. The first is to sort each half of the string and then compare. The second is to create hash tables for letter frequency and then compare.

Linear time algorithm for 2-SUM

Given an integer x and a sorted array a of N distinct integers, design a linear-time algorithm to determine if there exists two distinct indices i and j such that a[i] + a[j] == x

This is type of Subset sum problem
Here is my solution. I don't know if it was known earlier or not. Imagine 3D plot of function of two variables i and j:
sum(i,j) = a[i]+a[j]
For every i there is such j that a[i]+a[j] is closest to x. All these (i,j) pairs form closest-to-x line. We just need to walk along this line and look for a[i]+a[j] == x:
int i = 0;
int j = lower_bound(a.begin(), a.end(), x) - a.begin();
while (j >= 0 && j < a.size() && i < a.size()) {
int sum = a[i]+a[j];
if (sum == x) {
cout << "found: " << i << " " << j << endl;
return;
}
if (sum > x) j--;
else i++;
if (i > j) break;
}
cout << " not found\n";
Complexity: O(n)

think in terms of complements.
iterate over the list, figure out for each item what the number needed to get to X for that number is. stick number and complement into hash. while iterating check to see if number or its complement is in hash. if so, found.
edit: and as I have some time, some pseudo'ish code.
boolean find(int[] array, int x) {
HashSet<Integer> s = new HashSet<Integer>();
for(int i = 0; i < array.length; i++) {
if (s.contains(array[i]) || s.contains(x-array[i])) {
return true;
}
s.add(array[i]);
s.add(x-array[i]);
}
return false;
}

Given that the array is sorted (WLOG in descending order), we can do the following:
Algorithm A_1:
We are given (a_1,...,a_n,m), a_1<...,<a_n.
Put a pointer at the top of the list and one at the bottom.
Compute the sum where both pointers are.
If the sum is greater than m, move the above pointer down.
If the sum is less than m, move the lower pointer up.
If a pointer is on the other (here we assume each number can be employed only once), report unsat.
Otherwise, (an equivalent sum will be found), report sat.
It is clear that this is O(n) since the maximum number of sums computed is exactly n. The proof of correctness is left as an exercise.
This is merely a subroutine of the Horowitz and Sahni (1974) algorithm for SUBSET-SUM. (However, note that almost all general purpose SS algorithms contain such a routine, Schroeppel, Shamir (1981), Howgrave-Graham_Joux (2010), Becker-Joux (2011).)
If we were given an unordered list, implementing this algorithm would be O(nlogn) since we could sort the list using Mergesort, then apply A_1.

First pass search for the first value that is > ceil(x/2). Lets call this value L.
From index of L, search backwards till you find the other operand that matches the sum.
It is 2*n ~ O(n)
This we can extend to binary search.
Search for an element using binary search such that we find L, such that L is min(elements in a > ceil(x/2)).
Do the same for R, but now with L as the max size of searchable elements in the array.
This approach is 2*log(n).

Here's a python version using Dictionary data structure and number complement. This has linear running time(Order of N: O(N)):
def twoSum(N, x):
dict = {}
for i in range(len(N)):
complement = x - N[i]
if complement in dict:
return True
dict[N[i]] = i
return False
# Test
print twoSum([2, 7, 11, 15], 9) # True
print twoSum([2, 7, 11, 15], 3) # False

Iterate over the array and save the qualified numbers and their indices into the map. The time complexity of this algorithm is O(n).
vector<int> twoSum(vector<int> &numbers, int target) {
map<int, int> summap;
vector<int> result;
for (int i = 0; i < numbers.size(); i++) {
summap[numbers[i]] = i;
}
for (int i = 0; i < numbers.size(); i++) {
int searched = target - numbers[i];
if (summap.find(searched) != summap.end()) {
result.push_back(i + 1);
result.push_back(summap[searched] + 1);
break;
}
}
return result;
}

I would just add the difference to a HashSet<T> like this:
public static bool Find(int[] array, int toReach)
{
HashSet<int> hashSet = new HashSet<int>();
foreach (int current in array)
{
if (hashSet.Contains(current))
{
return true;
}
hashSet.Add(toReach - current);
}
return false;
}

Note: The code is mine but the test file was not. Also, this idea for the hash function comes from various readings on the net.
An implementation in Scala. It uses a hashMap and a custom (yet simple) mapping for the values. I agree that it does not makes use of the sorted nature of the initial array.
The hash function
I fix the bucket size by dividing each value by 10000. That number could vary, depending on the size you want for the buckets, which can be made optimal depending on the input range.
So for example, key 1 is responsible for all the integers from 1 to 9.
Impact on search scope
What that means, is that for a current value n, for which you're looking to find a complement c such as n + c = x (x being the element you're trying ton find a 2-SUM of), there is only 3 possibles buckets in which the complement can be:
-key
-key + 1
-key - 1
Let's say that your numbers are in a file of the following form:
0
1
10
10
-10
10000
-10000
10001
9999
-10001
-9999
10000
5000
5000
-5000
-1
1000
2000
-1000
-2000
Here's the implementation in Scala
import scala.collection.mutable
import scala.io.Source
object TwoSumRed {
val usage = """
Usage: scala TwoSumRed.scala [filename]
"""
def main(args: Array[String]) {
val carte = createMap(args) match {
case None => return
case Some(m) => m
}
var t: Int = 1
carte.foreach {
case (bucket, values) => {
var toCheck: Array[Long] = Array[Long]()
if (carte.contains(-bucket)) {
toCheck = toCheck ++: carte(-bucket)
}
if (carte.contains(-bucket - 1)) {
toCheck = toCheck ++: carte(-bucket - 1)
}
if (carte.contains(-bucket + 1)) {
toCheck = toCheck ++: carte(-bucket + 1)
}
values.foreach { v =>
toCheck.foreach { c =>
if ((c + v) == t) {
println(s"$c and $v forms a 2-sum for $t")
return
}
}
}
}
}
}
def createMap(args: Array[String]): Option[mutable.HashMap[Int, Array[Long]]] = {
var carte: mutable.HashMap[Int,Array[Long]] = mutable.HashMap[Int,Array[Long]]()
if (args.length == 1) {
val filename = args.toList(0)
val lines: List[Long] = Source.fromFile(filename).getLines().map(_.toLong).toList
lines.foreach { l =>
val idx: Int = math.floor(l / 10000).toInt
if (carte.contains(idx)) {
carte(idx) = carte(idx) :+ l
} else {
carte += (idx -> Array[Long](l))
}
}
Some(carte)
} else {
println(usage)
None
}
}
}

int[] b = new int[N];
for (int i = 0; i < N; i++)
{
b[i] = x - a[N -1 - i];
}
for (int i = 0, j = 0; i < N && j < N;)
if(a[i] == b[j])
{
cout << "found";
return;
} else if(a[i] < b[j])
i++;
else
j++;
cout << "not found";

Here is a linear time complexity solution O(n) time O(1) space
public void twoSum(int[] arr){
if(arr.length < 2) return;
int max = arr[0] + arr[1];
int bigger = Math.max(arr[0], arr[1]);
int smaller = Math.min(arr[0], arr[1]);
int biggerIndex = 0;
int smallerIndex = 0;
for(int i = 2 ; i < arr.length ; i++){
if(arr[i] + bigger <= max){ continue;}
else{
if(arr[i] > bigger){
smaller = bigger;
bigger = arr[i];
biggerIndex = i;
}else if(arr[i] > smaller)
{
smaller = arr[i];
smallerIndex = i;
}
max = bigger + smaller;
}
}
System.out.println("Biggest sum is: " + max + "with indices ["+biggerIndex+","+smallerIndex+"]");
}

Solution
We need array to store the indices
Check if the array is empty or contains less than 2 elements
Define the start and the end point of the array
Iterate till condition is met
Check if the sum is equal to the target. If yes get the indices.
If condition is not met then traverse left or right based on the sum value
Traverse to the right
Traverse to the left
For more info :[http://www.prathapkudupublog.com/2017/05/two-sum-ii-input-array-is-sorted.html

Credit to leonid
His solution in java, if you want to give it a shot
I removed the return, so if the array is sorted, but DOES allow duplicates, it still gives pairs
static boolean cpp(int[] a, int x) {
int i = 0;
int j = a.length - 1;
while (j >= 0 && j < a.length && i < a.length) {
int sum = a[i] + a[j];
if (sum == x) {
System.out.printf("found %s, %s \n", i, j);
// return true;
}
if (sum > x) j--;
else i++;
if (i > j) break;
}
System.out.println("not found");
return false;
}

The classic linear time two-pointer solution does not require hashing so can solve related problems such as approximate sum (find closest pair sum to target).
First, a simple n log n solution: walk through array elements a[i], and use binary search to find the best a[j].
To get rid of the log factor, use the following observation: as the list is sorted, iterating through indices i gives a[i] is increasing, so any corresponding a[j] is decreasing in value and in index j. This gives the two-pointer solution: start with indices lo = 0, hi = N-1 (pointing to a[0] and a[N-1]). For a[0], find the best a[hi] by decreasing hi. Then increment lo and for each a[lo], decrease hi until a[lo] + a[hi] is the best. The algorithm can stop when it reaches lo == hi.

Given an array of numbers, find out if 3 of them add up to 0

Given an array of numbers, find out if 3 of them add up to 0.
Do it in N^2, how would one do this?

O(n^2) solution without hash tables (because using hash tables is cheating :P). Here's the pseudocode:
Sort the array // O(nlogn)
for each i from 1 to len(array) - 1
iter = i + 1
rev_iter = len(array) - 1
while iter < rev_iter
tmp = array[iter] + array[rev_iter] + array[i]
if tmp > 0
rev_iter--
else if tmp < 0
iter++
else
return true
return false
Basically using a sorted array, for each number (target) in an array, you use two pointers, one starting from the front and one starting from the back of the array, check if the sum of the elements pointed to by the pointers is >, < or == to the target, and advance the pointers accordingly or return true if the target is found.

Not for credit or anything, but here is my python version of Charles Ma's solution. Very cool.
def find_sum_to_zero(arr):
arr = sorted(arr)
for i, target in enumerate(arr):
lower, upper = 0, len(arr)-1
while lower < i < upper:
tmp = target + arr[lower] + arr[upper]
if tmp > 0:
upper -= 1
elif tmp < 0:
lower += 1
else:
yield arr[lower], target, arr[upper]
lower += 1
upper -= 1
if __name__ == '__main__':
# Get a list of random integers with no duplicates
from random import randint
arr = list(set(randint(-200, 200) for _ in range(50)))
for s in find_sum_to_zero(arr):
print s
Much later:
def find_sum_to_zero(arr):
limits = 0, len(arr) - 1
arr = sorted(arr)
for i, target in enumerate(arr):
lower, upper = limits
while lower < i < upper:
values = (arr[lower], target, arr[upper])
tmp = sum(values)
if not tmp:
yield values
lower += tmp <= 0
upper -= tmp >= 0

put the negative of each number into a hash table or some other constant time lookup data structure. (n)
loop through the array getting each set of two numbers (n^2), and see if their sum is in the hash table.

First sort the array, then for each negative number (A) in the array, find two elements in the array adding up to -A. Finding 2 elements in a sorted array that add up to the given number takes O(n) time, so the entire time complexity is O(n^2).

C++ implementation based on the pseudocode provided by Charles Ma, for anyone interested.
#include <iostream>
using namespace std;
void merge(int originalArray[], int low, int high, int sizeOfOriginalArray){
// Step 4: Merge sorted halves into an auxiliary array
int aux[sizeOfOriginalArray];
int auxArrayIndex, left, right, mid;
auxArrayIndex = low;
mid = (low + high)/2;
right = mid + 1;
left = low;
// choose the smaller of the two values "pointed to" by left, right
// copy that value into auxArray[auxArrayIndex]
// increment either left or right as appropriate
// increment auxArrayIndex
while ((left <= mid) && (right <= high)) {
if (originalArray[left] <= originalArray[right]) {
aux[auxArrayIndex] = originalArray[left];
left++;
auxArrayIndex++;
}else{
aux[auxArrayIndex] = originalArray[right];
right++;
auxArrayIndex++;
}
}
// here when one of the two sorted halves has "run out" of values, but
// there are still some in the other half; copy all the remaining values
// to auxArray
// Note: only 1 of the next 2 loops will actually execute
while (left <= mid) {
aux[auxArrayIndex] = originalArray[left];
left++;
auxArrayIndex++;
}
while (right <= high) {
aux[auxArrayIndex] = originalArray[right];
right++;
auxArrayIndex++;
}
// all values are in auxArray; copy them back into originalArray
int index = low;
while (index <= high) {
originalArray[index] = aux[index];
index++;
}
}
void mergeSortArray(int originalArray[], int low, int high){
int sizeOfOriginalArray = high + 1;
// base case
if (low >= high) {
return;
}
// Step 1: Find the middle of the array (conceptually, divide it in half)
int mid = (low + high)/2;
// Steps 2 and 3: Recursively sort the 2 halves of origianlArray and then merge those
mergeSortArray(originalArray, low, mid);
mergeSortArray(originalArray, mid + 1, high);
merge(originalArray, low, high, sizeOfOriginalArray);
}
//O(n^2) solution without hash tables
//Basically using a sorted array, for each number in an array, you use two pointers, one starting from the number and one starting from the end of the array, check if the sum of the three elements pointed to by the pointers (and the current number) is >, < or == to the targetSum, and advance the pointers accordingly or return true if the targetSum is found.
bool is3SumPossible(int originalArray[], int targetSum, int sizeOfOriginalArray){
int high = sizeOfOriginalArray - 1;
mergeSortArray(originalArray, 0, high);
int temp;
for (int k = 0; k < sizeOfOriginalArray; k++) {
for (int i = k, j = sizeOfOriginalArray-1; i <= j; ) {
temp = originalArray[k] + originalArray[i] + originalArray[j];
if (temp == targetSum) {
return true;
}else if (temp < targetSum){
i++;
}else if (temp > targetSum){
j--;
}
}
}
return false;
}
int main()
{
int arr[] = {2, -5, 10, 9, 8, 7, 3};
int size = sizeof(arr)/sizeof(int);
int targetSum = 5;
//3Sum possible?
bool ans = is3SumPossible(arr, targetSum, size); //size of the array passed as a function parameter because the array itself is passed as a pointer. Hence, it is cummbersome to calculate the size of the array inside is3SumPossible()
if (ans) {
cout<<"Possible";
}else{
cout<<"Not possible";
}
return 0;
}

This is my approach using Swift 3 in N^2 log N...
let integers = [-50,-40, 10, 30, 40, 50, -20, -10, 0, 5]
First step, sort array
let sortedArray = integers.sorted()
second, implement a binary search method that returns an index like so...
func find(value: Int, in array: [Int]) -> Int {
var leftIndex = 0
var rightIndex = array.count - 1
while leftIndex <= rightIndex {
let middleIndex = (leftIndex + rightIndex) / 2
let middleValue = array[middleIndex]
if middleValue == value {
return middleIndex
}
if value < middleValue {
rightIndex = middleIndex - 1
}
if value > middleValue {
leftIndex = middleIndex + 1
}
}
return 0
}
Finally, implement a method that keeps track of each time a set of "triplets" sum 0...
func getTimesTripleSumEqualZero(in integers: [Int]) -> Int {
let n = integers.count
var count = 0
//loop the array twice N^2
for i in 0..<n {
for j in (i + 1)..<n {
//Sum the first pair and assign it as a negative value
let twoSum = -(integers[i] + integers[j])
// perform a binary search log N
// it will return the index of the give number
let index = find(value: twoSum, in: integers)
//to avoid duplications we need to do this check by checking the items at correspondingly indexes
if (integers[i] < integers[j] && integers[j] < integers[index]) {
print("\([integers[i], integers[j], integers[index]])")
count += 1
}
}
}
return count
}
print("count:", findTripleSumEqualZeroBinary(in: sortedArray))
prints--- count: 7

void findTriplets(int arr[], int n)
{
bool found = false;
for (int i=0; i<n-1; i++)
{
unordered_set<int> s;
for (int j=i+1; j<n; j++)
{
int x = -(arr[i] + arr[j]);
if (s.find(x) != s.end())
{
printf("%d %d %d\n", x, arr[i], arr[j]);
found = true;
}
else
s.insert(arr[j]);
}
}
if (found == false)
cout << " No Triplet Found" << endl;
}

Algorithm to find next greater permutation of a given string

I want an efficient algorithm to find the next greater permutation of the given string.

Wikipedia has a nice article on lexicographical order generation. It also describes an algorithm to generate the next permutation.
Quoting:
The following algorithm generates the next permutation lexicographically after a given permutation. It changes the given permutation in-place.
Find the highest index i such that s[i] < s[i+1]. If no such index exists, the permutation is the last permutation.
Find the highest index j > i such that s[j] > s[i]. Such a j must exist, since i+1 is such an index.
Swap s[i] with s[j].
Reverse the order of all of the elements after index i till the last element.

A great solution that works is described here: https://www.nayuki.io/page/next-lexicographical-permutation-algorithm. And the solution that, if next permutation exists, returns it, otherwise returns false:
function nextPermutation(array) {
var i = array.length - 1;
while (i > 0 && array[i - 1] >= array[i]) {
i--;
}
if (i <= 0) {
return false;
}
var j = array.length - 1;
while (array[j] <= array[i - 1]) {
j--;
}
var temp = array[i - 1];
array[i - 1] = array[j];
array[j] = temp;
j = array.length - 1;
while (i < j) {
temp = array[i];
array[i] = array[j];
array[j] = temp;
i++;
j--;
}
return array;
}

Using the source cited by #Fleischpfanzerl:
We follow the steps as below to find the next lexicographical permutation:
nums = [0,1,2,5,3,3,0]
nums = [0]*5
curr = nums[-1]
pivot = -1
for items in nums[-2::-1]:
if items >= curr:
pivot -= 1
curr = items
else:
break
if pivot == - len(nums):
print('break') # The input is already the last possible permutation
j = len(nums) - 1
while nums[j] <= nums[pivot - 1]:
j -= 1
nums[j], nums[pivot - 1] = nums[pivot - 1], nums[j]
nums[pivot:] = nums[pivot:][::-1]
> [1, 3, 0, 2, 3, 5]
So the idea is:
The idea is to follow steps -
Find a index 'pivot' from the end of the array such that nums[i - 1] < nums[i]
Find index j, such that nums[j] > nums[pivot - 1]
Swap both these indexes
Reverse the suffix starting at pivot

Homework? Anyway, can look at the C++ function std::next_permutation, or this:
http://blog.bjrn.se/2008/04/lexicographic-permutations-using.html

We can find the next largest lexicographic string for a given string S using the following step.
1. Iterate over every character, we will get the last value i (starting from the first character) that satisfies the given condition S[i] < S[i + 1]
2. Now, we will get the last value j such that S[i] < S[j]
3. We now interchange S[i] and S[j]. And for every character from i+1 till the end, we sort the characters. i.e., sort(S[i+1]..S[len(S) - 1])
The given string is the next largest lexicographic string of S. One can also use next_permutation function call in C++.

nextperm(a, n)
1. find an index j such that a[j….n - 1] forms a monotonically decreasing sequence.
2. If j == 0 next perm not possible
3. Else
1. Reverse the array a[j…n - 1]
2. Binary search for index of a[j - 1] in a[j….n - 1]
3. Let i be the returned index
4. Increment i until a[j - 1] < a[i]
5. Swap a[j - 1] and a[i]
O(n) for each permutation.

I came across a great tutorial.
link : https://www.youtube.com/watch?v=quAS1iydq7U
void Solution::nextPermutation(vector<int> &a) {
int k=0;
int n=a.size();
for(int i=0;i<n-1;i++)
{
if(a[i]<a[i+1])
{
k=i;
}
}
int ele=INT_MAX;
int pos=0;
for(int i=k+1;i<n;i++)
{
if(a[i]>a[k] && a[i]<ele)
{
ele=a[i];pos=i;
}
}
if(pos!=0)
{
swap(a[k],a[pos]);
reverse(a.begin()+k+1,a.end());
}
}

void Solution::nextPermutation(vector<int> &a) {
int i, j=-1, k, n=a.size();
for(i=0; i<n-1; i++) if(a[i] < a[i+1]) j=i;
if(j==-1) reverse(a.begin(), a.end());
else {
for(i=j+1; i<n; i++) if(a[j] < a[i]) k=i;
swap(a[j],a[k]);
reverse(a.begin()+j+1, a.end());
}}

A great solution that works is described here: https://www.nayuki.io/page/next-lexicographical-permutation-algorithm.
and if you are looking for
source code:
/**
* method to find the next lexicographical greater string
*
* #param w
* #return a new string
*/
static String biggerIsGreater(String w) {
char charArray[] = w.toCharArray();
int n = charArray.length;
int endIndex = 0;
// step-1) Start from the right most character and find the first character
// that is smaller than previous character.
for (endIndex = n - 1; endIndex > 0; endIndex--) {
if (charArray[endIndex] > charArray[endIndex - 1]) {
break;
}
}
// If no such char found, then all characters are in descending order
// means there cannot be a greater string with same set of characters
if (endIndex == 0) {
return "no answer";
} else {
int firstSmallChar = charArray[endIndex - 1], nextSmallChar = endIndex;
// step-2) Find the smallest character on right side of (endIndex - 1)'th
// character that is greater than charArray[endIndex - 1]
for (int startIndex = endIndex + 1; startIndex < n; startIndex++) {
if (charArray[startIndex] > firstSmallChar && charArray[startIndex] < charArray[nextSmallChar]) {
nextSmallChar = startIndex;
}
}
// step-3) Swap the above found next smallest character with charArray[endIndex - 1]
swap(charArray, endIndex - 1, nextSmallChar);
// step-4) Sort the charArray after (endIndex - 1)in ascending order
Arrays.sort(charArray, endIndex , n);
}
return new String(charArray);
}
/**
* method to swap ith character with jth character inside charArray
*
* #param charArray
* #param i
* #param j
*/
static void swap(char charArray[], int i, int j) {
char temp = charArray[i];
charArray[i] = charArray[j];
charArray[j] = temp;
}
If you are looking for video explanation for the same, you can visit here.

This problem can be solved just by using two simple algorithms searching and find smaller element in just O(1) extra space and O(nlogn ) time and also easy to implement .
To understand this approach clearly . Watch this Video : https://www.youtube.com/watch?v=DREZ9pb8EQI

def result(lst):
if len(lst) == 0:
return 0
if len(lst) == 1:
return [lst]
l = []
for i in range(len(lst)):
m = lst[i]
remLst = lst[:i] + lst[i+1:]
for p in result(remLst):
l.append([m] + p)
return l
result(['1', '2', '3'])

Start traversing from the end of the list. Compare each one with the previous index value.
If the previous index (say at index i-1) value, consider x, is lower than the current index (index i) value, sort the sublist on right side starting from current position i.
Pick one value from the current position till end which is just higher than x, and put it at index i-1. At the index the value was picked from, put x. That is:
swap(list[i-1], list[j]) where j >= i, and the list is sorted from index "i" onwards
Code:
public void nextPermutation(ArrayList<Integer> a) {
for (int i = a.size()-1; i > 0; i--){
if (a.get(i) > a.get(i-1)){
Collections.sort(a.subList(i, a.size()));
for (int j = i; j < a.size(); j++){
if (a.get(j) > a.get(i-1)) {
int replaceWith = a.get(j); // Just higher than ith element at right side.
a.set(j, a.get(i-1));
a.set(i-1, replaceWith);
return;
}
}
}
}
// It means the values are already in non-increasing order. i.e. Lexicographical highest
// So reset it back to lowest possible order by making it non-decreasing order.
for (int i = 0, j = a.size()-1; i < j; i++, j--){
int tmp = a.get(i);
a.set(i, a.get(j));
a.set(j, tmp);
}
}
Example :
10 40 30 20 => 20 10 30 40 // 20 is just bigger than 10
10 40 30 20 5 => 20 5 10 30 40 // 20 is just bigger than 10. Numbers on right side are just sorted form of this set {numberOnRightSide - justBigger + numberToBeReplaced}.

This is efficient enough up to strings with 11 letters.
// next_permutation example
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;
void nextPerm(string word) {
vector<char> v(word.begin(), word.end());
vector<string> permvec; // permutation vector
string perm;
int counter = 0; //
int position = 0; // to find the position of keyword in the permutation vector
sort (v.begin(),v.end());
do {
perm = "";
for (vector<char>::const_iterator i = v.begin(); i != v.end(); ++i) {
perm += *i;
}
permvec.push_back(perm); // add permutation to vector
if (perm == word) {
position = counter +1;
}
counter++;
} while (next_permutation(v.begin(),v.end() ));
if (permvec.size() < 2 || word.length() < 2) {
cout << "No answer" << endl;
}
else if (position !=0) {
cout << "Answer: " << permvec.at(position) << endl;
}
}
int main () {
string word = "nextperm";
string key = "mreptxen";
nextPerm(word,key); // will check if the key is a permutation of the given word and return the next permutation after the key.
return 0;
}

I hope this code might be helpful.
int main() {
char str[100];
cin>>str;
int len=strlen(len);
int f=next_permutation(str,str+len);
if(f>0) {
print the string
} else {
cout<<"no answer";
}
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio