Minimising distance between related items in an array - algorithm

I have an array of related items, { A, B, C, D }.
C is dependent on A.
D is dependent on B and C.
So, I calculate the total distance between items in this permutation as the sum of distances between:
C and A (2),
D and B (2),
D and C (1).
So, we have a total of 5 in this permutation.
However, an optimal solution would be {A, C, D, B}, which has a total distance of 3.
I have a (much more complicated) list of about 200 items, which I want to optimise as well as I can, and I'm not aware of any sorting algorithm that sorts in this way. Can anyone point me in the direction of an existing algorithm?
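The distance calculation described above can be sketched as follows. This is only an illustration: the names `total_distance` and `best_order_bruteforce` are made up for this example, and the brute-force search is practical only for a handful of items, not for a list of 200.

```python
from itertools import permutations

def total_distance(order, deps):
    """Sum of index distances between each item and the items it depends on."""
    pos = {item: i for i, item in enumerate(order)}
    return sum(abs(pos[item] - pos[d]) for item, ds in deps.items() for d in ds)

def best_order_bruteforce(items, deps):
    """Try every permutation (hypothetical helper, only viable for tiny inputs)."""
    return min(permutations(items), key=lambda order: total_distance(order, deps))

# C depends on A; D depends on B and C.
deps = {"C": ["A"], "D": ["B", "C"]}
print(total_distance(["A", "B", "C", "D"], deps))   # 5
print(total_distance(["A", "C", "D", "B"], deps))   # 3
best = best_order_bruteforce(["A", "B", "C", "D"], deps)
print(best, total_distance(best, deps))
```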
From Comments:
A plot of the data would look like this (apologies for the formatting!):
#Dependencies   #Items
0               9
1               27
2               57
3               55
4               11
5               3
6               1

I believe what you're looking for is topological sorting.
This algorithm works on directed graphs. Here the letters form the nodes of the graph and the dependencies form the directed edges.
It is an application of depth-first search and is commonly used to order jobs.
This is a pretty neat explanation.
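As a rough sketch of the depth-first approach, assuming the dependencies are given as a dictionary mapping each item to the items it depends on (the function name `topological_order` is only illustrative). Note that a topological order guarantees only that every item comes after its dependencies; by itself it does not minimise the total distance.

```python
def topological_order(items, deps):
    """Return items ordered so that every item follows its dependencies."""
    order = []
    state = {}  # 1 = visit in progress, 2 = finished

    def visit(node):
        if state.get(node) == 2:
            return
        if state.get(node) == 1:
            raise ValueError("dependency cycle detected at %r" % (node,))
        state[node] = 1
        for dep in deps.get(node, []):
            visit(dep)          # place dependencies first
        state[node] = 2
        order.append(node)      # post-order: node comes after its dependencies

    for item in items:
        visit(item)
    return order

# C depends on A; D depends on B and C.
print(topological_order(["A", "B", "C", "D"], {"C": ["A"], "D": ["B", "C"]}))
```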

Related

Is there any way to count the total number of topological sorts of a DAG without finding all possible orders?

Suppose this is the question.
How can I calculate the total number of topological sorts without finding all the orders?
In general this is #P-complete. This particular graph happens to be series–parallel, however, which makes it easy. Graphs in series cause the number of possibilities for each graph to be multiplied. For the particular graph you show, there are three diamonds in series, each of which has two valid extensions, so there are eight possibilities.
Check the image here
For each of the rectangles [u,v], either u can appear first or v can appear first.
So for each of the pairs [a,b], [c,d], [e,f], we have two choices.
The remaining elements p, q, r, s have only one choice each, because we have to start with p and end with s:
p -> (a or b) -> q -> (c or d) -> r -> (e or f) -> s.
Total = 1 * 2 * 1 * 2 * 1 * 2 * 1 = 8
Hence a total of 8 topological orderings is possible.
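For small graphs the count can be double-checked with a dynamic program over subsets of nodes, without listing any ordering. The sketch below is exponential in the number of nodes, so it is only a sanity check. The edge list is an assumption reconstructed from the description above (three diamonds in series), not taken from the image.

```python
def count_topological_orders(nodes, edges):
    """Count linear extensions of a DAG with a DP over node subsets (bitmasks)."""
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    preds = [0] * n                      # preds[v] = bitmask of v's predecessors
    for u, v in edges:
        preds[index[v]] |= 1 << index[u]

    ways = [0] * (1 << n)                # ways[mask] = orderings of exactly the set `mask`
    ways[0] = 1
    for mask in range(1 << n):
        if ways[mask] == 0:
            continue
        for v in range(n):
            bit = 1 << v
            # v may be placed next if it is unused and all its predecessors are placed
            if not (mask & bit) and (preds[v] & mask) == preds[v]:
                ways[mask | bit] += ways[mask]
    return ways[(1 << n) - 1]

nodes = list("pabqcdresf")
edges = [("p", "a"), ("p", "b"), ("a", "q"), ("b", "q"),
         ("q", "c"), ("q", "d"), ("c", "r"), ("d", "r"),
         ("r", "e"), ("r", "f"), ("e", "s"), ("f", "s")]
print(count_topological_orders(nodes, edges))   # 8
```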

Finding two minimum spanning trees in graph such that their sum is minimal

I'm trying to solve a fairly complex graph problem. We are given an undirected graph with N (N <= 10) nodes and M (M <= 25) edges.
We want to choose two sets of edges, A and B. No edge may appear in both A and B, and some edges may be left out of both sets. Each edge has a value assigned to it, and we want to minimise the total sum of the two trees.
Please note that in each of the sets A and B the edges must form a connected graph on all N nodes.
Example
N = 2, M = 3
Edges: 1 - 2 (value 10), 1 - 2 (value 20), 2 - 1 (value 30). We want to return the result 30: set A takes the first edge and set B takes the second edge.
N = 5
M = 8
Edges: {
(1,2,10),
(1,3,10),
(1,4,10),
(1,4,20),
(1,5,20),
(2,3,20),
(3,4,20),
(4,5,30),
}
Set A contains the edges {(1,2,10), (1,3,10), (1,4,10), (1,5,20)},
while set B contains {(1,4,20), (2,3,20), (3,4,20), (4,5,30)}.
What I tried
First I coded a greedy solution: I generated the first minimum spanning tree and then built the second one from the remaining edges, but it fails on some test cases. So I started thinking about this solution:
We want to split the edges into two groups, and each group should have exactly N - 1 edges so that it contains no unwanted edges. In the worst case we will therefore use (N-1) + (N-1) edges, that is, 18 edges. These are small numbers, so we can run a backtracking algorithm with some optimizations to solve this problem.
I still haven't coded the backtracking because I'm not sure whether it will work. Please tell me what you think. Thanks in advance.
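For inputs this small, one possible exhaustive sketch (not necessarily the backtracking described above) is to try every set of N - 1 edges that forms a spanning tree as A, and then take the cheapest spanning tree of the remaining edges as B. The function names are illustrative only, and this has not been tuned for the largest allowed inputs, where the number of candidate sets for A reaches roughly two million.

```python
from itertools import combinations

def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def is_spanning_tree(n, tree_edges):
    """True if the n-1 given edges connect nodes 1..n without a cycle."""
    parent = list(range(n + 1))
    for u, v, _ in tree_edges:
        ru, rv = find(parent, u), find(parent, v)
        if ru == rv:
            return False
        parent[ru] = rv
    return len(tree_edges) == n - 1

def mst_cost(n, edges):
    """Kruskal's algorithm; returns None if the edges do not connect all nodes."""
    parent = list(range(n + 1))
    cost = used = 0
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
            cost += w
            used += 1
    return cost if used == n - 1 else None

def best_two_trees(n, edges):
    """Minimum total value of two edge-disjoint spanning trees, or None."""
    best = None
    for chosen in combinations(range(len(edges)), n - 1):
        tree_a = [edges[i] for i in chosen]
        if not is_spanning_tree(n, tree_a):
            continue
        picked = set(chosen)
        rest = [e for i, e in enumerate(edges) if i not in picked]
        cost_b = mst_cost(n, rest)
        if cost_b is None:
            continue
        total = sum(w for _, _, w in tree_a) + cost_b
        if best is None or total < best:
            best = total
    return best

print(best_two_trees(2, [(1, 2, 10), (1, 2, 20), (2, 1, 30)]))   # 30
```

On the second example all eight edges must be used, so the result is simply the sum of all edge values, 140.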

Minimum edit distance of two anagrams given two swap operations

Given two anagrams S and P, what is the minimum edit distance from S to P when there are only two operations:
swap two adjacent elements
swap the first and the last element
If this question is simplified to allow only the first operation (i.e. swapping two adjacent elements), then it is "similar to" the classical algorithm question of finding "the minimum number of swaps for sorting an array of numbers" (solution link is given below):
Sorting a sequence by swapping adjacent elements using minimum swaps
I say "similar to" because when the two anagrams have all distinct characters:
S: A B C D
P: B C A D
Then we can define the ordering in P like this:
P: B C A D
   1 2 3 4
Then, based on this ordering, the string S becomes:
S: A B C D
   3 1 2 4
Then we can use the solution given in the link to solve this question.
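For the distinct-character case, the relabelling above reduces the problem to counting inversions, which equals the minimum number of adjacent swaps needed to sort the relabelled sequence. A minimal sketch, assuming all characters are distinct (the function names are illustrative):

```python
def count_inversions(seq):
    """Count pairs i < j with seq[i] > seq[j] using merge sort."""
    if len(seq) <= 1:
        return 0, list(seq)
    mid = len(seq) // 2
    left_count, left = count_inversions(seq[:mid])
    right_count, right = count_inversions(seq[mid:])
    merged, count = [], left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
            count += len(left) - i   # every remaining left element is larger
    merged.extend(left[i:])
    merged.extend(right[j:])
    return count, merged

def min_adjacent_swaps_distinct(s, p):
    """Adjacent swaps needed to turn s into p, assuming distinct characters."""
    rank = {ch: i for i, ch in enumerate(p)}     # ordering defined by P
    relabelled = [rank[ch] for ch in s]          # S expressed in that ordering
    return count_inversions(relabelled)[0]

print(min_adjacent_swaps_distinct("ABCD", "BCAD"))   # 2
```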
However, I have two questions:
In the simplified question, where we can only swap two adjacent elements, how can we get the minimum number of swaps if the anagrams contain duplicate elements? For example,
S: C D B C D A A
P: A A C D B C D
How can we solve the complete question with both swap operations?
One approach is to use A* search (http://en.wikipedia.org/wiki/A*_search_algorithm). Your heuristic cost function is half of the sum of the shortest distances from each element to the nearest position where that element could possibly go. The reason for the half is that, in the absolutely ideal case, every swap moves both of the swapped elements one step closer to where they need to go.
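A small sketch of this idea follows. It rests on one assumption about the heuristic that the answer does not spell out: distances are measured circularly, because the first/last swap lets an element wrap around the ends. The function name is illustrative, and since the number of states grows factorially this is only suitable for short strings.

```python
import heapq
from math import inf

def min_swaps_astar(s, p):
    """Fewest swaps (adjacent, or first<->last) turning s into p; None if impossible."""
    if sorted(s) != sorted(p):
        return None
    n = len(s)
    start, target = tuple(s), tuple(p)
    where = {}                                   # character -> its positions in the target
    for j, ch in enumerate(target):
        where.setdefault(ch, []).append(j)

    def heuristic(state):
        # Half the sum of circular distances to the nearest matching target slot:
        # one swap moves two elements by one step each, so this never overestimates.
        total = 0
        for i, ch in enumerate(state):
            total += min(min(abs(i - j), n - abs(i - j)) for j in where[ch])
        return total / 2

    best = {start: 0}
    heap = [(heuristic(start), 0, start)]
    while heap:
        _, g, state = heapq.heappop(heap)
        if state == target:
            return g
        if g > best[state]:
            continue                             # stale queue entry
        for i in range(n):
            j = (i + 1) % n                      # i = n - 1 gives the first/last swap
            nxt = list(state)
            nxt[i], nxt[j] = nxt[j], nxt[i]
            nxt = tuple(nxt)
            if g + 1 < best.get(nxt, inf):
                best[nxt] = g + 1
                heapq.heappush(heap, (g + 1 + heuristic(nxt), g + 1, nxt))
    return None

print(min_swaps_astar("ABCD", "BCAD"))   # 2
print(min_swaps_astar("ABCD", "DBCA"))   # 1 (a single first/last swap)
```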

k means clustering sample data

I am writing a program to implement k-means clustering.
Consider a simple input with 4 vertices a, b, c and d with the following edge costs:
[vertex1] [vertex2] [edge cost]
a b 1
a c 2
a d 3
b d 4
c d 5
Now I need to make the program run until I get 2 clusters.
My doubt is: in the first step, when I calculate the minimum distance, it is a->b (edge cost 1). Now I should consider ab as a single cluster. If that is the case, what will be the distance of ab from c and d?
The K-means algorithm works as follows:
1. Choose k points as initial centroids (hence the "k" in the name).
2. Calculate the distance from every vertex to each of the k chosen centroids.
3. Assign each vertex to its closest centroid.
4. Recalculate the position of each centroid as the mean of all the vertices assigned to it (hence "k-means": one mean calculation for each of the k centroids).
5. Go to step 2, and stop when, in step 3, no vertex gets assigned to a different centroid, or when your error condition is satisfied.
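The steps above can be sketched in plain Python as follows. This is a minimal illustration that assumes the vertices already have coordinates and that the initial centroids are supplied by the caller. The names `kmeans` and `squared_distance` are made up for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, initial_centroids, max_iter=100):
    """Return (assignment, centroids) after iterating until assignments stop changing."""
    centroids = [tuple(c) for c in initial_centroids]          # step 1
    assignment = None
    for _ in range(max_iter):
        # steps 2 and 3: distance to every centroid, pick the closest
        new_assignment = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:                       # step 5: nothing moved
            break
        assignment = new_assignment
        # step 4: each centroid becomes the mean of its members
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:                                        # keep empty clusters in place
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans(points, [(0, 0), (10, 10)]))
```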
In your case, as you have an undirected graph, it would be better to generate coordinates for each vertex from the edge distances and then apply the algorithm.
If you don't want to do this initial processing, you may calculate the distance from a vertex to all other reachable vertices, but you'd have to do this on every iteration, which is quite an unnecessary overhead.
For your undirected graph:
[vertex1] [vertex2] [edge cost]
a b 1
a c 2
a d 3
b d 4
c d 5
The table of distances would be something like:
    a    b    c    d
a   0    1    2    3
b   1    0   (1)   4
c   2   (1)   0    5
d   3    4    5    0
(1): b to c = (b to a) + (a to c) = 1 + 2 = 3
If this is the table you want, simply apply Dijkstra's algorithm to your graph once for each vertex, and use the resulting table as your table of distances.
The table will then hold the minimal distances, but if you have another policy for calculating distances, it is entirely up to you to decide how to compute them.
Notice also that, if your graph were directed, the matrix would not necessarily be symmetric, as it is in this case.
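A short sketch of building that table by running Dijkstra's algorithm from every vertex, using the edge list from the question (the adjacency-list representation and the function name are choices made for this example):

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source to every vertex of a weighted graph."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale queue entry
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

edges = [("a", "b", 1), ("a", "c", 2), ("a", "d", 3), ("b", "d", 4), ("c", "d", 5)]
graph = {}
for u, v, w in edges:                     # undirected: add both directions
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

table = {v: dijkstra(graph, v) for v in graph}
for v in sorted(table):
    print(v, [table[v][u] for u in sorted(table)])
```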

Finding least number of bit sequence ORs to achieve all 1's?

I'm trying to find anything that may help with this task: I have a variable number of bit sequences (that will all individually be the same length) and I need to find which combination of sequences would OR to all 1's, using as few sequences as possible. I was thinking to start with whichever sequence had the most 1's and try filling in the blanks, but since I haven't worked with bit comparisons really I didn't know if there was some algorithm or property of bit logic that would simplify this. Thanks.
This problem, unfortunately, is NP-hard in the most general case, by a reduction from the set cover problem. In the set cover problem, you have a collection of sets of elements and want to find the smallest number of them whose union contains every element. You can easily reduce the set cover problem to your problem by constructing a bitvector for each set, with a 1 in each position whose item the set contains and a 0 otherwise. The smallest number of bitvectors whose OR gives all 1s then corresponds to the smallest group of sets whose union contains all elements.
For example, given the sets {a, b, e}, {b, c}, {b, d, f}, and {a, f}, you would get these bitvectors:
{a, b, e} 110010
{b, c} 011000
{b, d, f} 010101
{a, f} 100001
Since the set cover problem is known to be NP-hard, this means that unless P = NP there is no polynomial-time algorithm for your problem. Worse, it is known that, unless P = NP, you cannot approximate the optimal solution in polynomial time to within a factor better than O(log n), where n is the total number of elements. You are probably best off looking for heuristics, or staying content with the O(log n) approximation given by the greedy algorithm.
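To make the reduction concrete, here is a small sketch that builds the bitvectors from the example sets and then finds a smallest combination by trying all combinations in increasing size. The exhaustive search is exponential and is included only to illustrate the problem on tiny inputs. The function names are made up for this example.

```python
from itertools import combinations

def sets_to_bitvectors(sets, universe):
    """One bit string per set: '1' where the set contains that element."""
    return ["".join("1" if x in s else "0" for x in universe) for s in sets]

def smallest_or_cover(bitvectors):
    """Indices of a smallest group of equal-length bit strings whose OR is all 1s."""
    if not bitvectors:
        return None
    full = (1 << len(bitvectors[0])) - 1
    values = [int(b, 2) for b in bitvectors]
    for size in range(1, len(values) + 1):       # try small groups first
        for combo in combinations(range(len(values)), size):
            acc = 0
            for i in combo:
                acc |= values[i]
            if acc == full:
                return combo
    return None                                  # some bit is 0 in every sequence

sets = [{"a", "b", "e"}, {"b", "c"}, {"b", "d", "f"}, {"a", "f"}]
vectors = sets_to_bitvectors(sets, "abcdef")
print(vectors)                      # ['110010', '011000', '010101', '100001']
print(smallest_or_cover(vectors))   # (0, 1, 2)
```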
Hope this helps!
I thought a bit about this problem and here's the idea I came up with:
1. First, create a list for every bit position; each list contains every sequence that has a '1' at that bit. This takes O(n*m), where n is the number of sequences and m is the length of a sequence.
2. Then count how often each bit sequence occurs across all the lists, put these [sequence, count] tuples in a structure (an AVL tree, a heap, or whatever you like) and sort them. (I mean: sequence 'a' occurs 15 times over all lists and sequence 'b' 10 times.) This again takes O(n*m), because O(n log n) < O(n*m).
3. In the next step, take the sequence with the highest priority and remove all the lists from step 1 which contain this sequence. Then go back to step 2 until you have eliminated all the lists. In the worst case you'll have to do this m times.
So in total we have a running time of O(n * m^2).
Correct me if I misunderstood a part of the question or if I made a mistake ;)
Here is a little example of what I mean:
Bit Strings:
a: 100101
b: 010001
c: 011100
d: 000010
So this will create the Lists:
L1: a
L2: b,c
L3: c
L4: a, c
L5: d
L6: a, b
Then we will count and sort:
a: 3
c: 3
b: 2
d: 1
So we take a in our final list and delete the following Lists:
L1, L4, L6
Now we count again:
c: 2
b: 1
d: 1
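So we take c into our list and delete:
L2, L3
So we have only L5 left, which contains only d.
So we have found our final set: a, c, d. (It is minimal in this example, but a greedy choice like this is not guaranteed to be minimal in general.)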
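The procedure above can be sketched roughly as follows. This is a plain greedy heuristic that recounts from scratch on every round instead of maintaining a sorted structure, and it breaks ties by first occurrence, which is an arbitrary choice. The function name is illustrative.

```python
def greedy_cover(seqs):
    """Greedily pick sequences (name -> bit string) until every bit is covered."""
    if not seqs:
        return []
    length = len(next(iter(seqs.values())))
    # Step 1: for every bit position, list the sequences with a '1' there.
    lists = {i: [name for name, bits in seqs.items() if bits[i] == "1"]
             for i in range(length)}
    if any(not names for names in lists.values()):
        return None                              # some bit can never be set
    chosen = []
    while lists:
        # Step 2: count how often each sequence occurs in the remaining lists.
        counts = {}
        for names in lists.values():
            for name in names:
                counts[name] = counts.get(name, 0) + 1
        # Step 3: take the most frequent sequence and drop the lists it covers.
        best = max(counts, key=counts.get)
        chosen.append(best)
        lists = {i: names for i, names in lists.items() if best not in names}
    return chosen

seqs = {"a": "100101", "b": "010001", "c": "011100", "d": "000010"}
print(greedy_cover(seqs))   # ['a', 'c', 'd']
```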

Resources