Drop values to keep only N occurrences - java-8

I was doing some katas from Codewars today. I had to write a function that keeps only N occurrences of each element of an array, for example:
{1,2,3,4,1}, N=1 -> {1,2,3,4}
{2,2,2,2}, N=2 -> {2,2}
I came up with this solution using streams:
public static int[] deleteNth(int[] elements, int maxOcurrences) {
    List<Integer> ints = Arrays.stream(elements)
            .boxed()
            .collect(Collectors.toList());
    return ints.stream().filter(x -> Collections.frequency(ints, x) <= maxOcurrences)
            .mapToInt(Integer::intValue)
            .toArray();
}
So, first I box the ints to Integers, then filter out elements whose frequency is higher than N.
But this isn't working, because repeated elements have the same frequency regardless of their positions, so the filter call either keeps all of them or drops all of them. How can I fix this to get the correct values?
PS: I know that's O(n^2), but that isn't a problem for me.

The solution I've found to accomplish the task at hand is as follows:
public static int[] deleteNth(int[] elements, int maxOccurrences) {
    return Arrays.stream(elements)
            .boxed()
            .collect(Collectors.groupingBy(Function.identity(),
                    LinkedHashMap::new,
                    Collectors.counting()))
            .entrySet()
            .stream()
            .flatMapToInt(entry ->
                    IntStream.generate(entry::getKey)
                            .limit(Math.min(maxOccurrences, entry.getValue())))
            .toArray();
}
We first group the elements and apply Collectors.counting() as a downstream collector to get the count of each element. After that is done we simply emit each number at most min(maxOccurrences, count) times and collect to an array with the eager toArray operation.
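For reference, here is how the method behaves on the examples from the question (a minimal usage sketch, assuming the method above is in scope along with the usual java.util, java.util.function and java.util.stream imports):

System.out.println(Arrays.toString(deleteNth(new int[] {1, 2, 3, 4, 1}, 1))); // [1, 2, 3, 4]
System.out.println(Arrays.toString(deleteNth(new int[] {2, 2, 2, 2}, 2)));    // [2, 2]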

Actually, your filter excludes every element whose total frequency is greater than the maxOcurrences value:
.filter(x -> Collections.frequency(ints, x) <= maxOcurrences)
I am not sure that a full Stream solution is the best choice for this use case, since you want to add values according to how many of them have been "currently collected" so far.
Here is how I would implement it:
public class DeleteN {

    public static void main(String[] args) {
        System.out.println(Arrays.toString(deleteNth(new int[] { 1, 2, 3, 4, 1 }, 1)));
        System.out.println(Arrays.toString(deleteNth(new int[] { 2, 2, 2, 2 }, 2)));
    }

    public static int[] deleteNth(int[] elements, int maxOcurrences) {
        Map<Integer, Long> actualOccurencesByNumber = new HashMap<>();
        List<Integer> result = new ArrayList<>();
        Arrays.stream(elements)
              .forEach(i -> {
                  Long actualValue = actualOccurencesByNumber.computeIfAbsent(i, k -> Long.valueOf(0L));
                  if (actualValue < maxOcurrences) {
                      result.add(i);
                      actualOccurencesByNumber.computeIfPresent(i, (k, v) -> v + 1L);
                  }
              });
        return result.stream().mapToInt(i -> i).toArray();
    }
}

Output:
[1, 2, 3, 4]
[2, 2]

I think this is a great case not to use streams. Streams are not always the best approach when stateful operations are involved.
But it can definitely be done, and since the question asks specifically for streams, you can use the following approaches.
Using forEachOrdered
You can use forEachOrdered, which should respect the encounter order (here obviously the stream has to be sequential):
public static int[] deleteNth(int[] elements, int maxOcurrs) {
    List<Integer> list = new ArrayList<>();
    Arrays.stream(elements).forEachOrdered(elem -> {
        if (Collections.frequency(list, elem) < maxOcurrs) list.add(elem);
    });
    return list.stream().mapToInt(Integer::intValue).toArray();
}
Using collect
Under certain circumstances you can use the collect method to accomplish this.
When the stream is ordered and sequential, which is the case for Arrays.stream(elements).boxed(), the collect() method does not use the combiner (this is true for the current Java 8 and Java 9 releases, but is not guaranteed to stay the same in future releases, because many optimizations can occur).
This implementation keeps the order of the stream and, as mentioned before, works fine in the current releases. Like the linked answer says, and also in my personal opinion, I find it very unlikely that the implementation of collect for sequential streams will ever need to use the combiner.
The code using the collect method is the following:
public static int[] deleteNth(int[] elements, int maxOcurrs) {
    return Arrays.stream(elements).boxed()
            .collect(() -> new ArrayList<Integer>(),
                    (list, elem) -> {
                        if (Collections.frequency(list, elem) < maxOcurrs) list.add(elem);
                    },
                    (list1, list2) -> {
                        throw new UnsupportedOperationException("Undefined combiner");
                    })
            .stream()
            .mapToInt(Integer::intValue)
            .toArray();
}
This collector creates an ArrayList, and before adding each new element it checks whether maxOcurrs has already been reached; if not, it adds the element. As mentioned before, and in the linked answer, the combiner is not called at all. This performs a little better than n^2.
More information on why the combiner is not called for sequential streams can be found here.

Related

Explain the use of HashMap: Write a method to compute all permutations of a string whose characters are NOT necessarily unique

I came across the question below and don't fully understand the usage of the HashMap, in particular the lines map.put(c, count - 1) and map.put(c, count).
Can anyone explain?
Permutations with Duplicates: Write a method to compute all
permutations of a string whose characters are not necessarily unique.
The list of permutations should not have duplicates.
public static HashMap<Character, Integer> getFreqTable(String s) {
    HashMap<Character, Integer> map = new HashMap<Character, Integer>();
    for (char c : s.toCharArray()) {
        if (!map.containsKey(c)) {
            map.put(c, 0);
        }
        map.put(c, map.get(c) + 1);
    }
    return map;
}

public static void getPerms(HashMap<Character, Integer> map, String prefix, int remaining, ArrayList<String> result) {
    if (remaining == 0) {
        result.add(prefix);
        return;
    }
    for (Character c : map.keySet()) {
        int count = map.get(c);
        if (count > 0) {
            map.put(c, count - 1);
            getPerms(map, prefix + c, remaining - 1, result);
            map.put(c, count);
        }
    }
}

public static ArrayList<String> getPerms(String s) {
    ArrayList<String> result = new ArrayList<String>();
    HashMap<Character, Integer> map = getFreqTable(s);
    getPerms(map, "", s.length(), result);
    return result;
}

public static void main(String[] args) {
    String s = "aab";
    ArrayList<String> result = getPerms(s);
    System.out.println(result.toString());
}
Update
Thanks @trincot for his answer.
Sorry for not making it clear. I understand the use of a HashMap in general, but I was looking for the reasoning behind using it for this permutation question, particularly when the input contains duplicate characters.
For example, the reasoning why using a HashMap and recursive backtracking resolves this issue. I debugged and traced getPerms, but I cannot understand the backtracking logic intuitively. The backtracking controls whether or not a given permutation is generated, but I could not have come up with it myself.
Below is a trace of the first part of getPerms. X means the branch is not executed because the count of a or b is zero.
aab -> aab,aba,baa
a2 b1
"" 3
a:2
a:1,
p(a,2)
a:0
p(aa,1)
a: X aaa
b: b=0
p(aab,0)
re: aab
b=1
a=1
b:1
b=0
p(ab,1)
a:0
a=0
p(aba,0)
a:1
b:0
X abb
a=2
b:1
Update 2
Below is another example that explains why using the HashMap helps.
without HashMap
ab
[aa, ab, ba, bb]
ab
a
a b
aa
bb
b
b a
ba
bb
with HashMap
ab
[ab, ba]
This shows that using the HashMap and backtracking avoids duplicate permutations when the input contains repeated characters.
getFreqTable creates a HashMap whose keys are the characters of the input and whose values are the occurrence counts of the corresponding characters. So for the input "aacbac", this function returns a HashMap that can be described as follows:
"a": 3
"b": 1
"c": 2
This is a very common use of a HashMap. As it provides quick lookup of a key (of a character in this case) and quick insertion of a new key, it is the ideal solution for counting the occurrence of each character in the input.
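As a side note, the same counting loop can be written more compactly with Java 8's Map.merge. A small sketch of the equivalent idiom (not part of the original code; s is the input string):

HashMap<Character, Integer> map = new HashMap<>();
for (char c : s.toCharArray()) {
    map.merge(c, 1, Integer::sum); // insert 1, or add 1 to the existing count
}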
Then this map is used to select characters for a permutation. Whenever a character is selected for use in a permutation, its counter is decreased:
map.put(c, count - 1);
And when backtracking from recursion (which will produce all permutations with that character c as prefix), that counter is restored again:
map.put(c, count);
Whenever the counter for a certain character is 0, it cannot be selected anymore for a permutation. This is why there is this condition:
if (count > 0)
I hope this explains it.
Addendum
By maintaining the count of duplicate letters, this algorithm avoids making an artificial distinction between those duplicate letters, by which one permutation "aa" would be considered different from another "aa" just because the two letters were swapped with each other. Decrementing a counter does not care where exactly that duplicate came from (its position in the input); it just "takes one of them, no matter which".

To compare 2 integer arrays using Java 8 Features [duplicate]

Is it possible to do this without the external for-each loop over b? I need to identify the common values in 2 arrays using Java 8.
Integer a[] = {1, 2, 3, 4};
Integer b[] = {9, 8, 2, 3};
for (Integer b1 : b) {
    Stream.of(a).filter(a1 -> (a1.compareTo(b1) == 0)).forEach(System.out::println);
}
Output: 2 3
I would suggest using sets if you only want the common values (i.e. not taking duplicates into account)
Integer a[] = {1, 2, 3, 4};
Integer b[] = {9, 8, 2, 3};
Set<Integer> aSet = new HashSet<>(Arrays.asList(a));
Set<Integer> bSet = new HashSet<>(Arrays.asList(b));
aSet.retainAll(bSet);
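After retainAll, aSet holds only the common values; a minimal usage line to show them (the iteration order of a HashSet is not guaranteed):

System.out.println(aSet); // e.g. [2, 3]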
Maybe something like this:
public static void main(String[] args) {
    Integer a[] = {1, 2, 3, 4};
    Integer b[] = {9, 8, 2, 3};
    Stream<Integer> as = Arrays.stream(a).distinct();
    Stream<Integer> bs = Arrays.stream(b).distinct();
    List<Integer> collect = Stream.concat(as, bs)
            .collect(Collectors.groupingBy(Function.identity()))
            .entrySet()
            .stream()
            .filter(e -> e.getValue().size() > 1)
            .map(e -> e.getKey())
            .collect(Collectors.toList());
    System.out.println(collect);
}
We merge the two arrays into one stream;
groupingBy groups the merged values, so each value maps to the list of its occurrences;
then we keep only the entries whose list is longer than 1, since those are the values present in both streams;
map each entry to its key to extract the duplicated value;
print it.
Edit: added distinct to the initial streams.

Finding triplicates in 4 lists

I'm trying to find, given 4 arrays of N strings, a string that is common to at least 3 of the arrays in O(N*log(N)) time, and if it exists return the lexicographically first string.
What I tried was creating an array of size 4*N and adding items from the 4 arrays to it while removing duplicates. Then I did a quicksort on the big array to find the first triplicate, if any.
Does anyone know a better solution?
You can do this in O(n log n), with constant extra space. It's a standard k-way merge problem, after sorting the individual lists. If the individual lists can contain duplicates, then you'll need to remove the duplicates during the sorting.
So, assuming you have list1, list2, list3, and list4:
Sort the individual lists, removing duplicates
Create a priority queue (min-heap) of length 4
Add the first item from each list to the heap
last-key = ""
last-key-count = 0
while not done
    remove the smallest item from the min-heap
    add to the heap the next item from the list that contained the item you just removed
    if the item matches last-key
        increment last-key-count
        if last-key-count == 3 then
            output last-key
            exit done
    else
        last-key-count = 1
        last-key = item key
end while
// if you get here, there was no triplicate item
An alternate way to do this is to combine all the lists into a single list, then sort it. You can then go through it sequentially to find the first triplicate. Again, if the individual lists can contain duplicates, you should remove them before you combine the lists.
combined = list1.concat(list2.concat(list3.concat(list4)))
sort(combined)
last-key = ""
last-key-count = 0
for i = 0 to combined.length-1
    if combined[i] == last-key
        last-key-count++
        if last-key-count == 3
            exit done
    else
        last-key = combined[i]
        last-key-count = 1
end for
// if you get here, no triplicate was found
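As a concrete illustration of this combine-and-sort variant, here is a small Java sketch (my own, not the answerer's code; it assumes per-list duplicates are removed before combining):

import java.util.*;
import java.util.stream.*;

public class FirstTriplicate {
    // Deduplicate each list, combine, sort, and scan for the first value that
    // appears at least 3 times. Because the combined list is sorted, that value
    // is also the lexicographically smallest triplicate.
    static Optional<String> firstTriplicate(List<List<String>> lists) {
        List<String> combined = lists.stream()
                .flatMap(list -> list.stream().distinct())
                .sorted()
                .collect(Collectors.toList());
        String lastKey = null;
        int count = 0;
        for (String s : combined) {
            count = s.equals(lastKey) ? count + 1 : 1;
            lastKey = s;
            if (count == 3) {
                return Optional.of(s);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<List<String>> lists = Arrays.asList(
                Arrays.asList("xxx", "xxx", "xxx", "zzz", "aaa"),
                Arrays.asList("ttt", "bbb", "ddd", "iii", "aaa"),
                Arrays.asList("sss", "kkk", "uuu", "rrr", "zzz"),
                Arrays.asList("iii", "zzz", "lll", "hhh", "aaa"));
        System.out.println(firstTriplicate(lists).orElse("NO ANSWER")); // aaa
    }
}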
Here we have 4 arrays of N strings, where N = 5. My approach to get all triplicates is:
Get the 1st string of the 1st array and add it to a Map<String, Set<Integer>>, with the array number in the Set (I'm using a hash map because insertion and search are O(1));
Get the 1st string of the 2nd array and add it to the Map<String, Set<Integer>>, with the array number in the Set;
Repeat step 2, but using the 3rd and 4th arrays instead of the 2nd;
Repeat steps 1, 2 and 3, but using the 2nd string instead of the 1st;
Repeat steps 1, 2 and 3, but using the 3rd string instead of the 1st;
Etc.
In the worst case, we will have N*4 map operations, which is within the O(N*log(N)) bound.
public class Main {

    public static void main(String[] args) {
        String[][] arr = {
            { "xxx", "xxx", "xxx", "zzz", "aaa" },
            { "ttt", "bbb", "ddd", "iii", "aaa" },
            { "sss", "kkk", "uuu", "rrr", "zzz" },
            { "iii", "zzz", "lll", "hhh", "aaa" }};
        List<String> triplicates = findTriplicates(arr);
        Collections.sort(triplicates);
        for (String word : triplicates)
            System.out.println(word);
    }

    public static List<String> findTriplicates(String[][] arr) {
        Map<String, Set<Integer>> map = new HashMap<String, Set<Integer>>();
        List<String> triplicates = new ArrayList<String>();
        final int N = 5;
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < 4; j++) {
                String str = arr[j][i];
                if (map.containsKey(str)) {
                    map.get(str).add(j);
                    if (map.get(str).size() == 3)
                        triplicates.add(str);
                } else {
                    Set<Integer> set = new HashSet<Integer>();
                    set.add(j);
                    map.put(str, set);
                }
            }
        }
        return triplicates;
    }
}
Output:
aaa
zzz
OK, if you don't care about the constant factors, this can be done in O(N), where N is the total size of the strings. It is important to distinguish the number of strings from their total size for practical purposes. (At the end I propose an alternative version which is O(n log n), where n is the number of string comparisons.)
You need one map string -> int for the counts, and one temporary already_counted map string -> bool; the latter is basically a set. The important thing is to use the unordered/hash versions of the associative containers, to avoid log factors.
For each array, for each element, you check whether the current element is in the already_counted set. If not, do count[current_string]++ and add it to the set. Before moving on to the next array, empty the already_counted set.
Now you basically need a min search. Go over each element of count and, if an element has value 3 or more, compare its key to your current min. Voilà: min is the lowest string with 3 or more occurrences.
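As an illustration only (not the answerer's code), here is the counting idea sketched in Java, with HashMap/HashSet playing the role of the unordered containers:

import java.util.*;

public class TriplicateByCounting {
    // count.get(s) = number of distinct lists containing s; the per-list
    // alreadyCounted set prevents duplicates within one list from inflating
    // the count. Finally, take the smallest key whose count is >= 3.
    static Optional<String> firstTriplicate(List<List<String>> lists) {
        Map<String, Integer> count = new HashMap<>();
        for (List<String> list : lists) {
            Set<String> alreadyCounted = new HashSet<>();
            for (String s : list) {
                if (alreadyCounted.add(s)) {
                    count.merge(s, 1, Integer::sum);
                }
            }
        }
        return count.entrySet().stream()
                .filter(e -> e.getValue() >= 3)
                .map(Map.Entry::getKey)
                .min(Comparator.naturalOrder());
    }
}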
You don't need the N log N factor, because you do not need all the triplicates, so no sorting or ordered data structures are needed. You have O(3*N) (again, N is the total size of all strings). This is an overestimation; later I give a more detailed estimate.
Now, the caveat is that this method is based on string hashing, which is O(S), where S is the size of the string, and is done twice to deal with per-array repetitions. So it might alternatively be faster, at least in a C++ implementation, to use the ordered versions of the containers. There are two reasons for this:
Comparing strings might be faster than hashing them. If the strings are different, a comparison returns relatively fast, whereas hashing always goes over the whole string and is quite a bit more complicated.
They are contiguous in memory, which is cache friendly.
Hashing also has problems with rehashing, etc.
If the number of strings is not large, or if their size is very big, I would place my bet on the ordered versions. Also, if count is ordered, you get an edge in finding the least element, because it's the first one with count >= 3, though in the worst case you will get tons of a* entries with count 1 and a single z with 3.
So, to sum it all up, let n be the number of string comparisons and N the number of string hashes.
The hash-based method is O(2N + n), and with some trickery you can bring the constant factor down by 1, e.g. by reusing the hash for count and already_counted, or by combining both data structures, for example via a bitset. So you would get O(N + n).
A pure string-comparison-based method would be O(2 n log n + n). Maybe it would be possible to use hinting to drop the constant, but I am not sure.
It can be solved in O(N) using a Trie.
You loop over the 4 lists one by one; for each list, you insert its strings into the Trie.
When inserting a string s from list L, mark in its terminal node that list L contains s; if s now occurs in at least 3 lists and is lexicographically smaller than the current answer, update the answer.
Here is sample C++ code; you can input 4 lists of 5 strings each to test it.
http://ideone.com/fTmKgJ
#include<bits/stdc++.h>
using namespace std;

vector<vector<string>> lists;
string ans = "";

struct TrieNode
{
    TrieNode* l[128];
    int n;
    TrieNode()
    {
        memset(l, 0, sizeof(TrieNode*) * 128);
        n = 0;
    }
} *root = new TrieNode();

void add(string s, int listID)
{
    TrieNode* p = root;
    for (auto x : s)
    {
        if (!p->l[x]) p->l[x] = new TrieNode();
        p = p->l[x];
    }
    p->n |= (1 << listID);
    if (__builtin_popcount(p->n) >= 3 && (ans == "" || s < ans)) ans = s;
}

int main() {
    for (int i = 0; i < 4; i++) {
        string s;
        vector<string> v;
        for (int i = 0; i < 5; i++) {
            cin >> s;
            v.push_back(s);
        }
        lists.push_back(v);
    }
    for (int i = 0; i < 4; i++) {
        for (auto s : lists[i]) {
            add(s, i);
        }
    }
    if (ans == "") cout << "NO ANSWER" << endl;
    else cout << ans << endl;
    return 0;
}

Understanding a recursive function involving generators

I've come across the following recursive algorithm, written here in Swift, that given an array, produces a generator that generates sub-arrays that are one element shorter than the original array. The sub arrays are created by removing one element at every index.
i.e. the input [1,2,3] would return a generator that generates [1,2], [2,3], and [1,3].
The algorithm works, but I'm having real trouble understanding how. Could someone explain what's happening, or offer advice on how to analyze or understand it? Thanks in advance
// Main algorithm
func smaller1<T>(xs: [T]) -> GeneratorOf<[T]> {
    if let (head, tail) = xs.decompose {
        var gen1: GeneratorOf<[T]> = one(tail)
        var gen2: GeneratorOf<[T]> = map(smaller1(tail)) { smallerTail in
            return [head] + smallerTail
        }
        return gen1 + gen2
    }
    return one(nil)
}

// Auxiliary functions used
func map<A, B>(var generator: GeneratorOf<A>, f: A -> B) -> GeneratorOf<B> {
    return GeneratorOf {
        return generator.next().map(f)
    }
}

func one<X>(x: X?) -> GeneratorOf<X> {
    return GeneratorOf(GeneratorOfOne(x))
}
The code is taken from the book 'Functional Programming in Swift' by Chris Eidhof, Florian Kugler, and Wouter Swierstra
Given an array [a_1,…,a_n], the code:
Generates the sub-array [a_2,…,a_n];
For each sub-array B of [a_2,…,a_n] (generated recursively), generates [a_1] + B.
For example, given the array [1,2,3], we:
Generate [2,3];
For each sub-array B of [2,3] (namely, [3] and [2]), generate [1] + B (this generates [1,3] and [1,2]).
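If it helps to see the recursive structure without the generator machinery, here is an eager Java analogue (my own sketch; it builds the sub-arrays as lists instead of generating them lazily):

import java.util.*;

public class Smaller {
    // Eager analogue of the Swift generator: for [a_1, ..., a_n] return the
    // tail [a_2, ..., a_n] first, then a_1 prepended to each sub-array of the tail.
    static <T> List<List<T>> smaller(List<T> xs) {
        List<List<T>> result = new ArrayList<>();
        if (xs.isEmpty()) return result;
        T head = xs.get(0);
        List<T> tail = xs.subList(1, xs.size());
        result.add(new ArrayList<>(tail));      // drop the head
        for (List<T> sub : smaller(tail)) {     // drop one element of the tail...
            List<T> withHead = new ArrayList<>();
            withHead.add(head);                 // ...and keep the head
            withHead.addAll(sub);
            result.add(withHead);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(smaller(Arrays.asList(1, 2, 3))); // [[2, 3], [1, 3], [1, 2]]
    }
}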

Stable topological sort

Let's say I have a graph whose nodes are stored in a sorted list. I now want to topologically sort this graph while keeping the original order where the topological order is undefined.
Are there any good algorithms for this?
One possibility is to compute the lexicographically least topological order. The algorithm is to maintain a priority queue containing the nodes whose effective in-degree (over nodes not yet processed) is zero. Repeatedly dequeue the node with the least label, append it to the order, decrement the effective in-degrees of its successors, enqueue the ones that now have in-degree zero. This produces 1234567890 on btilly's example but does not in general minimize inversions.
The properties I like about this algorithm are that the output has a clean definition obviously satisfied by only one order and that, whenever there's an inversion (node x appears after node y even though x < y), x's largest dependency is larger than y's largest dependency, which is an "excuse" of sorts for inverting x and y. A corollary is that, in the absence of constraints, the lex least order is sorted order.
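For illustration, a short Java sketch of this priority-queue (Kahn-style) approach; the names lexLeastOrder and adj are mine, not from the answer, and node labels are assumed to be integers:

import java.util.*;

public class LexLeastTopoSort {
    // Kahn's algorithm with a min-heap: repeatedly emit the smallest-labeled
    // node whose remaining in-degree is zero. adj maps a node to its successors.
    static List<Integer> lexLeastOrder(Set<Integer> nodes, Map<Integer, List<Integer>> adj) {
        Map<Integer, Integer> inDegree = new HashMap<>();
        for (int n : nodes) inDegree.put(n, 0);
        for (List<Integer> successors : adj.values())
            for (int s : successors) inDegree.merge(s, 1, Integer::sum);

        PriorityQueue<Integer> ready = new PriorityQueue<>();
        for (int n : nodes) if (inDegree.get(n) == 0) ready.add(n);

        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int node = ready.poll();
            order.add(node);
            for (int s : adj.getOrDefault(node, Collections.emptyList()))
                if (inDegree.merge(s, -1, Integer::sum) == 0) ready.add(s);
        }
        return order; // shorter than nodes.size() if the graph contains a cycle
    }
}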
The problem is two-fold:
Topological sort
Stable sort
After much trial and error I came up with a simple algorithm that resembles bubble sort but uses topological order criteria.
I thoroughly tested the algorithm on full graphs with all edge combinations, so it can be considered well proven.
Cyclic dependencies are tolerated and resolved according to the original order of elements in the sequence. The resulting order is perfect and represents the closest possible match.
Here is the source code in C#:
static class TopologicalSort
{
    /// <summary>
    /// Delegate definition for dependency function.
    /// </summary>
    /// <typeparam name="T">The type.</typeparam>
    /// <param name="a">The A.</param>
    /// <param name="b">The B.</param>
    /// <returns>
    /// Returns <c>true</c> when A depends on B. Otherwise, <c>false</c>.
    /// </returns>
    public delegate bool TopologicalDependencyFunction<in T>(T a, T b);

    /// <summary>
    /// Sorts the elements of a sequence in dependency order according to comparison function with Gapotchenko algorithm.
    /// The sort is stable. Cyclic dependencies are tolerated and resolved according to original order of elements in sequence.
    /// </summary>
    /// <typeparam name="T">The type of the elements of source.</typeparam>
    /// <param name="source">A sequence of values to order.</param>
    /// <param name="dependencyFunction">The dependency function.</param>
    /// <param name="equalityComparer">The equality comparer.</param>
    /// <returns>The ordered sequence.</returns>
    public static IEnumerable<T> StableOrder<T>(
        IEnumerable<T> source,
        TopologicalDependencyFunction<T> dependencyFunction,
        IEqualityComparer<T> equalityComparer)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        if (dependencyFunction == null)
            throw new ArgumentNullException("dependencyFunction");
        if (equalityComparer == null)
            throw new ArgumentNullException("equalityComparer");

        var graph = DependencyGraph<T>.TryCreate(source, dependencyFunction, equalityComparer);
        if (graph == null)
            return source;

        var list = source.ToList();
        int n = list.Count;

    Restart:
        for (int i = 0; i < n; ++i)
        {
            for (int j = 0; j < i; ++j)
            {
                if (graph.DoesXHaveDirectDependencyOnY(list[j], list[i]))
                {
                    bool jOnI = graph.DoesXHaveTransientDependencyOnY(list[j], list[i]);
                    bool iOnJ = graph.DoesXHaveTransientDependencyOnY(list[i], list[j]);
                    bool circularDependency = jOnI && iOnJ;
                    if (!circularDependency)
                    {
                        var t = list[i];
                        list.RemoveAt(i);
                        list.Insert(j, t);
                        goto Restart;
                    }
                }
            }
        }

        return list;
    }

    /// <summary>
    /// Sorts the elements of a sequence in dependency order according to comparison function with Gapotchenko algorithm.
    /// The sort is stable. Cyclic dependencies are tolerated and resolved according to original order of elements in sequence.
    /// </summary>
    /// <typeparam name="T">The type of the elements of source.</typeparam>
    /// <param name="source">A sequence of values to order.</param>
    /// <param name="dependencyFunction">The dependency function.</param>
    /// <returns>The ordered sequence.</returns>
    public static IEnumerable<T> StableOrder<T>(
        IEnumerable<T> source,
        TopologicalDependencyFunction<T> dependencyFunction)
    {
        return StableOrder(source, dependencyFunction, EqualityComparer<T>.Default);
    }

    sealed class DependencyGraph<T>
    {
        private DependencyGraph()
        {
        }

        public IEqualityComparer<T> EqualityComparer
        {
            get;
            private set;
        }

        public sealed class Node
        {
            public int Position
            {
                get;
                set;
            }

            List<T> _Children = new List<T>();

            public IList<T> Children
            {
                get
                {
                    return _Children;
                }
            }
        }

        public IDictionary<T, Node> Nodes
        {
            get;
            private set;
        }

        public static DependencyGraph<T> TryCreate(
            IEnumerable<T> source,
            TopologicalDependencyFunction<T> dependencyFunction,
            IEqualityComparer<T> equalityComparer)
        {
            var list = source as IList<T>;
            if (list == null)
                list = source.ToArray();

            int n = list.Count;
            if (n < 2)
                return null;

            var graph = new DependencyGraph<T>();
            graph.EqualityComparer = equalityComparer;
            graph.Nodes = new Dictionary<T, Node>(n, equalityComparer);

            bool hasDependencies = false;

            for (int position = 0; position < n; ++position)
            {
                var element = list[position];

                Node node;
                if (!graph.Nodes.TryGetValue(element, out node))
                {
                    node = new Node();
                    node.Position = position;
                    graph.Nodes.Add(element, node);
                }

                foreach (var anotherElement in list)
                {
                    if (equalityComparer.Equals(element, anotherElement))
                        continue;

                    if (dependencyFunction(element, anotherElement))
                    {
                        node.Children.Add(anotherElement);
                        hasDependencies = true;
                    }
                }
            }

            if (!hasDependencies)
                return null;

            return graph;
        }

        public bool DoesXHaveDirectDependencyOnY(T x, T y)
        {
            Node node;
            if (Nodes.TryGetValue(x, out node))
            {
                if (node.Children.Contains(y, EqualityComparer))
                    return true;
            }
            return false;
        }

        sealed class DependencyTraverser
        {
            public DependencyTraverser(DependencyGraph<T> graph)
            {
                _Graph = graph;
                _VisitedNodes = new HashSet<T>(graph.EqualityComparer);
            }

            DependencyGraph<T> _Graph;
            HashSet<T> _VisitedNodes;

            public bool DoesXHaveTransientDependencyOnY(T x, T y)
            {
                if (!_VisitedNodes.Add(x))
                    return false;

                Node node;
                if (_Graph.Nodes.TryGetValue(x, out node))
                {
                    if (node.Children.Contains(y, _Graph.EqualityComparer))
                        return true;

                    foreach (var i in node.Children)
                    {
                        if (DoesXHaveTransientDependencyOnY(i, y))
                            return true;
                    }
                }
                return false;
            }
        }

        public bool DoesXHaveTransientDependencyOnY(T x, T y)
        {
            var traverser = new DependencyTraverser(this);
            return traverser.DoesXHaveTransientDependencyOnY(x, y);
        }
    }
}
And a small sample application:
class Program
{
    static bool DependencyFunction(char a, char b)
    {
        switch (a + " depends on " + b)
        {
            case "A depends on B":
                return true;
            case "B depends on D":
                return true;
            default:
                return false;
        }
    }

    static void Main(string[] args)
    {
        var source = "ABCDEF";
        var result = TopologicalSort.StableOrder(source.ToCharArray(), DependencyFunction);
        Console.WriteLine(string.Concat(result));
    }
}
Given the input elements {A, B, C, D, E, F} where A depends on B and B depends on D the output is {D, B, A, C, E, F}.
UPDATE:
I wrote a small article about the stable topological sort objective, the algorithm, and its proof. I hope this gives more explanation and is useful to developers and researchers.
You have insufficient criteria to specify what you're looking for. For instance consider a graph with two directed components.
1 -> 2 -> 3 -> 4 -> 5
6 -> 7 -> 8 -> 9 -> 0
Which of the following sorts would you prefer?
6, 7, 8, 9, 0, 1, 2, 3, 4, 5
1, 2, 3, 4, 5, 6, 7, 8, 9, 0
The first results from breaking all ties by putting the lowest node as close to the head of the list as possible. Thus 0 wins. The second results from trying to minimize the number of times that A < B and B appears before A in the topological sort. Both are reasonable answers. The second is probably more pleasing.
I can easily produce an algorithm for the first. To start, take the lowest node, and do a breadth-first search to locate the distance to the shortest root node. Should there be a tie, identify the set of nodes that could appear on such a shortest path. Take the lowest node in that set, and place the best possible path from it to a root, and then place the best possible path from the lowest node we started with to it. Search for the next lowest node that is not already in the topological sort, and continue.
Producing an algorithm for the more pleasing version seems much harder. See http://en.wikipedia.org/wiki/Feedback_arc_set for a related problem that strongly suggests that it is, in fact, NP-complete.
Here's an easy iterative approach to topological sorting: continually remove a node with in-degree 0, along with its edges.
To achieve a stable version, just modify to: continually remove the smallest-index node with in-degree 0, along with its edges.
In pseudo-python:
# N is the number of nodes, labeled 0..N-1
# edges[i] is a list of nodes j, corresponding to edges (i, j)
inDegree = [0] * N
for i in range(N):
for j in edges[i]:
inDegree[j] += 1
# Now we maintain a "frontier" of in-degree 0 nodes.
# We take the smallest one until the frontier is exhausted.
# Note: You could use a priority queue / heap instead of a list,
# giving O(NlogN) runtime. This naive implementation is
# O(N^2) worst-case (when the order is very ambiguous).
frontier = []
for i in range(N):
if inDegree[i] == 0:
frontier.append(i)
order = []
while frontier:
i = min(frontier)
frontier.remove(i)
for j in edges[i]:
inDegree[j] -= 1
if inDegree[j] == 0:
frontier.append(j)
# Done - order is now a list of the nodes in topological order,
# with ties broken by original order in the list.
The depth-first search algorithm on Wikipedia worked for me:
const assert = chai.assert;

const stableTopologicalSort = ({
    edges,
    nodes
}) => {
    // https://en.wikipedia.org/wiki/Topological_sorting#Depth-first_search
    const result = [];
    const marks = new Map();

    const visit = node => {
        if (marks.get(node) !== `permanent`) {
            assert.notEqual(marks.get(node), `temporary`, `not a DAG`);
            marks.set(node, `temporary`);
            edges.filter(([, to]) => to === node).forEach(([from]) => visit(from));
            marks.set(node, `permanent`);
            result.push(node);
        }
    };

    nodes.forEach(visit);
    return result;
};

const graph = {
    edges: [
        [5, 11],
        [7, 11],
        [3, 8],
        [11, 2],
        [11, 9],
        [11, 10],
        [8, 9],
        [3, 10]
    ],
    nodes: [2, 3, 5, 7, 8, 9, 10, 11]
};

assert.deepEqual(stableTopologicalSort(graph), [5, 7, 11, 2, 3, 8, 9, 10]);

<script src="https://cdnjs.cloudflare.com/ajax/libs/chai/4.2.0/chai.min.js"></script>
I interpret "stable topological sort" as a linearization of a DAG such that ranges in the linearization where the topological order doesn't matter are sorted lexicographically. This can be solved with the DFS method of linearization, modified so that nodes are visited in lexicographical order.
I have a Python Digraph class with a linearization method which looks like this:
def linearize_as_needed(self):
    if self.islinearized:
        return

    # Algorithm: DFS Topological sort
    # https://en.wikipedia.org/wiki/Topological_sorting#Depth-first_search
    temporary = set()
    permanent = set()
    L = []

    def visit(vertices):
        for vertex in sorted(vertices, reverse=True):
            if vertex in permanent:
                pass
            elif vertex in temporary:
                raise NotADAG
            else:
                temporary.add(vertex)
                if vertex in self.arrows:
                    visit(self.arrows[vertex])
                L.append(vertex)
                temporary.remove(vertex)
                permanent.add(vertex)
        # print('visit: {} => {}'.format(vertices, L))

    visit(self.vertices)

    self._linear = list(reversed(L))
    self._iter = iter(self._linear)
    self.islinearized = True
Here self.vertices is the set of all vertices, and self.arrows holds the adjacency relation as a dict of left nodes to sets of right nodes.
