Minimum edit distance of two anagrams given two swap operations - algorithm

Given two anagrams S and P, what is the minimum edit distance from S to P when there are only two operations:
swap two adjacent elements
swap the first and the last element
If this question is simplified to allow only the first operation (i.e. swapping two adjacent elements), it becomes "similar to" the classical problem of finding the minimum number of swaps needed to sort an array of numbers (a link to the solution is given below):
Sorting a sequence by swapping adjacent elements using minimum swaps
I say "similar to" because, when the two anagrams consist of distinct characters:
S: A B C D
P: B C A D
Then we can define the ordering in P like this
P: B C A D
   1 2 3 4
Then based on this ordering the string S becomes
S: A B C D
   3 1 2 4
Then we can use the solution given in the link to solve this question.
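A minimal sketch of this reduction, assuming the linked solution amounts to counting inversions (the standard result for adjacent swaps). The function names are illustrative only. For repeated characters the sketch pairs the k-th occurrence of a character in S with its k-th occurrence in P; that pairing is an assumption added here, not something stated in the question.

```python
from collections import defaultdict, deque


def target_indices(s, p):
    """Relabel each character of s with its (1-based) position in p."""
    if sorted(s) != sorted(p):
        raise ValueError("s and p must be anagrams")
    positions = defaultdict(deque)
    for index, ch in enumerate(p, start=1):
        positions[ch].append(index)
    # Assumption: the k-th occurrence of a character in s is sent to the
    # k-th occurrence of that character in p.
    return [positions[ch].popleft() for ch in s]


def count_inversions(seq):
    """Return (sorted copy, number of inversions) using merge sort."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, left_inv = count_inversions(seq[:mid])
    right, right_inv = count_inversions(seq[mid:])
    merged, inversions = [], left_inv + right_inv
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
            # Every element still waiting in `left` is larger than right[j].
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def min_adjacent_swaps(s, p):
    """Adjacent swaps needed to turn s into p (inversions of the relabelling)."""
    return count_inversions(target_indices(s, p))[1]


print(target_indices("ABCD", "BCAD"))      # [3, 1, 2, 4]
print(min_adjacent_swaps("ABCD", "BCAD"))  # 2
```

For the example above, the relabelled string 3 1 2 4 has two inversions, matching the two swaps A-B and then A-C.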
However, I have two questions:
In the simplified question, where we can only swap two adjacent elements, how can we get the minimum number of swaps if the anagrams contain duplicate elements? For example:
S: C D B C D A A
P: A A C D B C D
How can the complete question, with both swap operations, be solved?

One approach is to use A* search (http://en.wikipedia.org/wiki/A*_search_algorithm). The heuristic (the estimated remaining cost) is half of the sum, over all elements, of the shortest distance from each element to the nearest position where it could possibly end up. The reason for the half is that even an ideal swap moves both of its elements one step closer to where they want to go, so the sum of distances can shrink by at most 2 per swap.
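A rough sketch of that A* search, under a few assumptions that are not in the answer: distances are measured circularly (because the first and last elements can be swapped), states are whole strings, and all names are hypothetical. The state space grows factorially, so this is only practical for short strings.

```python
import heapq


def neighbors(state):
    """All states reachable with one allowed swap."""
    n = len(state)
    for i in range(n - 1):              # swap two adjacent elements
        s = list(state)
        s[i], s[i + 1] = s[i + 1], s[i]
        yield tuple(s)
    if n > 2:                           # swap the first and the last element
        s = list(state)
        s[0], s[-1] = s[-1], s[0]
        yield tuple(s)


def heuristic(state, target_positions):
    """Half the summed circular distance to the nearest matching target slot."""
    n = len(state)
    total = 0
    for i, ch in enumerate(state):
        total += min(min(abs(i - j), n - abs(i - j)) for j in target_positions[ch])
    return total / 2


def min_swaps_astar(s, p):
    if sorted(s) != sorted(p):
        raise ValueError("s and p must be anagrams")
    start, goal = tuple(s), tuple(p)
    target_positions = {}
    for j, ch in enumerate(goal):
        target_positions.setdefault(ch, []).append(j)
    best = {start: 0}                   # cheapest known cost per state
    heap = [(heuristic(start, target_positions), 0, start)]
    while heap:
        _, cost, state = heapq.heappop(heap)
        if state == goal:
            return cost
        if cost > best[state]:
            continue                    # stale heap entry
        for nxt in neighbors(state):
            new_cost = cost + 1
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                estimate = new_cost + heuristic(nxt, target_positions)
                heapq.heappush(heap, (estimate, new_cost, nxt))
    return None


print(min_swaps_astar("ABCD", "BCAD"))  # 2
print(min_swaps_astar("ABCD", "DBCA"))  # 1
```

Because one swap changes each of the two moved elements' distances by at most one step, the halved sum never overestimates the remaining cost, which is what A* needs to return a minimum.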

Related

How to find sum of elements in a range in a grid efficiently?

Given an n by n grid of positive numbers, what is the best complexity that can be achieved for obtaining the sum of the elements in a rectangular range whose opposite corners are given as (x1,y1) and (x2,y2)? There will be q such queries.
PS: Considering the naive solution, the complexity is O(q*n^2).
Suppose you sum along the rows as in m69's comment to produce a matrix where every element is the sum of the corresponding element in the original matrix and all elements to the left of it. Then you do the same thing summing down the columns of that matrix of sums and you get a matrix where every element is the sum of a rectangular sub-array of elements to its left and above it.
Now take four points in this array of sums:
A B
C D
The value D - B - C + A is the sum of the rectangular region that has one corner at D and whose other corners lie just on D's side of A, B, and C, as you can see by working out how many times the points in the various regions are added and subtracted. So after O(n^2) pre-processing you can answer each query in O(1) time.
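The construction above can be sketched as follows. This is only an illustration: the function names are made up, coordinates are assumed to be 0-based and inclusive with x as the row and y as the column, and the table is padded with a zero row and column so that corners on the grid edge need no special case.

```python
def build_prefix_sums(grid):
    """prefix[i][j] = sum of grid[0..i-1][0..j-1] (padded with zeros)."""
    rows = len(grid)
    cols = len(grid[0]) if grid else 0
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    # First pass: running sums along each row.
    for i in range(rows):
        running = 0
        for j in range(cols):
            running += grid[i][j]
            prefix[i + 1][j + 1] = running
    # Second pass: running sums down each column of the row sums.
    for j in range(1, cols + 1):
        for i in range(1, rows + 1):
            prefix[i][j] += prefix[i - 1][j]
    return prefix


def rectangle_sum(prefix, x1, y1, x2, y2):
    """Sum of grid[x1..x2][y1..y2], i.e. D - B - C + A on the padded table."""
    return (prefix[x2 + 1][y2 + 1]   # D
            - prefix[x1][y2 + 1]     # B
            - prefix[x2 + 1][y1]     # C
            + prefix[x1][y1])        # A


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
prefix = build_prefix_sums(grid)
print(rectangle_sum(prefix, 1, 1, 2, 2))  # 28
```

Building the table is O(n^2) and each query reads four entries, so q queries cost O(n^2 + q) in total.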

Minimising distance between related items in an array

I have an array of related items, { A, B, C, D }.
C is dependent on A.
D is dependent on B and C.
So, I calculate the total distance between items in this permutation as the sum of distances between:
C and A (2),
D and B (2),
D and C (1).
So, we have a total of 5 in this permutation.
However, an optimal solution would be {A, C, D, B}, which has a total distance of 3.
I have a (much more complicated) list of about 200 items, which I want to optimise as best as I can, and I'm not aware of any sorting algorithms that sort in this way. Can anyone point me in the direction of an existing algorithm?
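To make the distance measure concrete, here is a small sketch that scores a permutation the way the question describes. The function name and the dictionary layout (item mapped to the items it depends on) are assumptions made for illustration.

```python
def total_distance(order, dependencies):
    """Sum of |position(item) - position(dependency)| over all dependencies."""
    position = {item: index for index, item in enumerate(order)}
    return sum(abs(position[item] - position[dep])
               for item, deps in dependencies.items()
               for dep in deps)


# C depends on A; D depends on B and C.
dependencies = {"C": ["A"], "D": ["B", "C"]}
print(total_distance(["A", "B", "C", "D"], dependencies))  # 5
print(total_distance(["A", "C", "D", "B"], dependencies))  # 3
```

A scoring function like this can be used to compare candidate orderings, whichever method produces them.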
From Comments:
A plot of the data would look like below- (Apologies for the formatting!)
#Dependencies #Items
0 9
1 27
2 57
3 55
4 11
5 3
6 1
I believe what you're looking for is Topological Sorting.
This algorithm is used on directed graphs. Here the items (the letters) form the nodes of the graph and the dependencies form the directed edges.
This algorithm is an application of depth first search and is used to order jobs.
This is a pretty neat explanation.
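A minimal sketch of a depth-first topological sort, using the example from the question. The names are hypothetical. Note that it only guarantees that every item comes after the items it depends on; the total distance of the resulting order would still need to be checked separately.

```python
def topological_order(dependencies):
    """Order items so that each one appears after everything it depends on.

    `dependencies` maps an item to the list of items it depends on.
    """
    IN_PROGRESS, DONE = 1, 2
    state = {}
    order = []

    def visit(item):
        if state.get(item) == DONE:
            return
        if state.get(item) == IN_PROGRESS:
            raise ValueError("dependency cycle detected")
        state[item] = IN_PROGRESS
        for dep in dependencies.get(item, ()):
            visit(dep)               # place dependencies first
        state[item] = DONE
        order.append(item)

    for item in dependencies:
        visit(item)
    return order


dependencies = {"A": [], "B": [], "C": ["A"], "D": ["B", "C"]}
print(topological_order(dependencies))  # ['A', 'B', 'C', 'D']
```

For a list of about 200 items the recursion stays shallow unless the dependency chains are very long.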

Understanding sorting solution to finding a triplet from each of 3 linked-lists whose sum is equal to a given number

Question:
Given three linked lists, say a, b and c, find one node from each list such that the sum of the values of the nodes is equal to a given number.
For example, if the three linked lists are 12->6->29, 23->5->8 and 90->20->59, and the given number is 101, the output should be the triplet "6 5 90".
An O(n²) solution is described on GeeksforGeeks: (paraphrased)
b and c are sorted in ascending and descending order respectively using merge sort. Then, for every pair of b and c (1st element of b and 1st element of c form a pair and so on), we check for all values of a.
I'm not wondering about the implementation, just the algorithm. How does this algorithm produce the right solution?
The algorithm basically converts the 3-SUM problem to a 2-sum problem.
You have list b sorted in ascending order and c sorted in descending order.
For each element ai in a, you have to check whether there is a pair (bj, ck), with bj in b and ck in c, such that:
bj + ck = SUM - ai
This can be done by traversing the lists b & c simultaneously.
Keep one pointer for each of the lists b and c (say p and q), both starting at the head of their list.
While neither p nor q has run off the end of its list:
    If (*p + *q == SUM - ai)
        return with success.
    Else if (*p + *q < SUM - ai)
        p = p->next
    Else (*p + *q > SUM - ai)
        q = q->next
The idea is that p starts at the lowest element in list b and q starts at the highest element in c. So if the sum is too small, a higher number in b needs to be considered, and if it is too large, a smaller number in c needs to be considered.
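The walk described above can be sketched like this. Plain Python lists stand in for the linked lists and the function name is made up; the pointer movement is the same.

```python
def find_triplet(a, b, c, target):
    """Return (x, y, z) with x in a, y in b, z in c and x + y + z == target."""
    b_sorted = sorted(b)                 # ascending
    c_sorted = sorted(c, reverse=True)   # descending
    for ai in a:
        needed = target - ai
        p, q = 0, 0                      # p walks b upwards, q walks c downwards
        while p < len(b_sorted) and q < len(c_sorted):
            pair_sum = b_sorted[p] + c_sorted[q]
            if pair_sum == needed:
                return (ai, b_sorted[p], c_sorted[q])
            if pair_sum < needed:
                p += 1                   # need a larger value from b
            else:
                q += 1                   # need a smaller value from c
    return None


print(find_triplet([12, 6, 29], [23, 5, 8], [90, 20, 59], 101))  # (6, 5, 90)
```

Each pass over b and c is linear, and it is repeated once per element of a, which gives the O(n²) bound after the initial sorting.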

Find cardinality of set

I have faced the following problem recently:
We have a sequence A of M consecutive integers, beginning at A[1] = 1:
1,2,...M (example: M = 8 , A = 1,2,3,4,5,6,7,8 )
We have the set T consisting of all possible subsequences made from L_T consecutive terms of A.
(example L_T = 3 , subsequences are {1,2,3},{2,3,4},{3,4,5},...). Let's call the elements of T "tiles".
We have the set S consisting of all possible subsequences of A that have length L_S. ( example L_S = 4, subsequences like {1,2,3,4} , {1,3,7,8} ,...{4,5,7,8} ).
We say that an element s of S can be "covered" by K "tiles" of T if there exist K tiles in T such that the union of their sets of terms contains the terms of s as a subset. For example, subsequence {1,2,3} is possible to cover with 2 tiles of length 2 ({1,2} and {3,4}), while subsequence {1,3,5} is not possible to "cover" with 2 "tiles" of length 2, but is possible to cover with 2 "tiles" of length 3 ({1,2,3} and {4,5,6}).
Let C be the subset of elements of S that can be covered by K tiles of T.
Find the cardinality of C given M, L_T, L_S, K.
Any ideas on how to tackle this problem would be appreciated.
Assume M is divisible by the tile length L_T (written T below for brevity), so that we have an integer number M/T of tiles covering all elements of the initial set (otherwise the statement is currently unclear).
First, let us count F (P): it will be almost the number of subsequences of length L_S which can be covered by no more than P tiles, but not exactly that.
Formally, F (P) = choose (M/T, P) * choose (P*T, L_S).
We start by choosing exactly P covering tiles: the number of ways is choose (M/T, P).
When the tiles are fixed, we have exactly P * T distinct elements available, and there are choose (P*T, L_S) ways to choose a subsequence.
Well, this approach has a flaw.
Note that, when we chose a tile but did not use its elements at all, we in fact counted some subsequences more than once.
For example, if we fixed three tiles numbered 2, 6 and 7, but used only 2 and 7, we counted the same subsequences again and again when we fixed three tiles numbered 2, 7 and whatever.
The problem described above can be countered by a variation of the inclusion-exclusion principle.
Indeed, a subsequence which uses only Q tiles out of the P selected tiles is counted choose (M/T - Q, P - Q) times instead of only once: Q of the P choices are fixed, but the other P - Q are chosen arbitrarily from the remaining M/T - Q tiles.
Define G (P) as the number of subsequences of length L_S whose elements lie in exactly P tiles.
Then, F (P) is the sum for Q from 0 to P of the products G (Q) * choose (M/T - Q, P - Q).
Working from P = 0 upwards, we can calculate all the values of G by calculating the values of F.
For example, we get G (2) from knowing F (2), G (0) and G (1), and also the equation connecting F (2) with G (0), G (1) and G (2).
After that, the answer is simply sum for P from 0 to K of the values G (P).
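The recurrence can be sketched as below, under the answer's own assumption that the tiles are the M / L_T non-overlapping blocks of A. The multiplicity factor is taken as choose (N - Q, P - Q) with N = M / L_T tiles, since the extra P - Q tiles are picked among the tiles not already used. The function name is hypothetical.

```python
from math import comb


def count_coverable(M, L_T, L_S, K):
    """Count length-L_S subsequences touching at most K non-overlapping tiles."""
    if M % L_T != 0:
        raise ValueError("this sketch assumes M is divisible by L_T")
    n_tiles = M // L_T
    G = []  # G[P] = subsequences whose elements lie in exactly P tiles
    for P in range(min(K, n_tiles) + 1):
        # F(P): pick P tiles, then pick L_S elements among their P * L_T terms.
        F = comb(n_tiles, P) * comb(P * L_T, L_S)
        # Remove subsequences that really use only Q < P of the chosen tiles.
        overcount = sum(G[Q] * comb(n_tiles - Q, P - Q) for Q in range(P))
        G.append(F - overcount)
    return sum(G)


print(count_coverable(6, 2, 2, 1))  # 3
print(count_coverable(6, 2, 2, 2))  # 15
```

With M = 6 and tiles {1,2}, {3,4}, {5,6}, only the three pairs that coincide with a tile fit in one tile, while all 15 pairs fit in two.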

Matching points in 2 D space

I have 2 matrices A and B, of size m x 2 and n x 2 respectively. Their rows are the m and n points in Euclidean space.
The task I wish to perform is to match the maximum number of points from A and B (assuming A has fewer points than B), given the condition that the distance between matched points is less than a threshold d and each pair is unique.
I have seen the nearest-point-pairs approach, but it won't work for my problem because, for every point in A, it selects the nearest point left in B. However, it may happen that the first pair I picked from A and B was wrong, leading to fewer matching pairs.
I am looking for a fast solution since both A and B consist of about 1000 points each. Again, some points will be left over, and I am aware that this could somehow lead to an exhaustive search.
I am looking for a solution that uses built-in MATLAB functions, or data structures such as kd-trees for which MATLAB code is available. As mentioned, I have to find unique nearest matching points from B to A.
You can use pdist2 to compute the pairwise distances between two sets of observations (of different sizes). The resulting distance matrix has one row per point in A and one column per point in B, and you can probe it for all values below the desired threshold.
A = randn(1000, 2);
B = randn(500, 2);
D = pdist2(A, B, 'euclidean'); % pairwise euclidean distances, 1000 x 500
d = 0.5; % threshold
indexD = D < d; % true where a point of A is within d of a point of B
pointsA = any(indexD, 2); % points in A with at least one candidate in B
pointsB = any(indexD, 1); % points in B with at least one candidate in A
The two vectors provide logical indexes to the points in A and B that have at least one candidate match on the other side, i.e. at least one point of the other matrix at a distance below d. The resulting sets will be composed of all elements of matrix A (or B) that lie within distance d of some element of the other matrix B (or A).
You can also generalize to more than 2 dimensions or different distance metrics.
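The thresholded distance matrix only yields candidate pairs. One way to turn candidates into the largest set of unique pairs is maximum bipartite matching; this is a technique swapped in here and not part of the answer above. The sketch below uses the simple augmenting-path method (Kuhn's algorithm) in Python rather than MATLAB, with hypothetical names, and is meant for small inputs because it recurses along augmenting paths.

```python
from math import dist


def max_unique_matches(points_a, points_b, d):
    """Largest set of one-to-one pairs (i, j) with dist(A[i], B[j]) < d."""
    # candidates[i] lists the indices of B within distance d of A[i].
    candidates = [[j for j, q in enumerate(points_b) if dist(p, q) < d]
                  for p in points_a]
    match_of_b = [None] * len(points_b)   # index in A matched to each B point

    def try_assign(i, visited):
        for j in candidates[i]:
            if j in visited:
                continue
            visited.add(j)
            # Take j if it is free, or if its current partner can move elsewhere.
            if match_of_b[j] is None or try_assign(match_of_b[j], visited):
                match_of_b[j] = i
                return True
        return False

    for i in range(len(points_a)):
        try_assign(i, set())
    return sorted((i, j) for j, i in enumerate(match_of_b) if i is not None)


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.5, 0.0), (-0.5, 0.0)]
print(max_unique_matches(A, B, 0.6))  # [(0, 1), (1, 0)]
```

In this example a greedy nearest-neighbour pass could pair A[0] with B[0] and leave A[1] unmatched, whereas the augmenting step reassigns A[0] to B[1] so that both points get a partner.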
