My naive maximal clique finding algorithm runs faster than Bron-Kerbosch's. What's wrong? - ruby

In short, my naive code (in Ruby) looks like:
# $seen is a hash to memoize previously seen sets
# $sparse is a hash of usernames to a list of neighboring usernames
# $sets is the list of output clusters
$seen = {}
$sets = []

def subgraph(set, adj)
  hash = (set + adj).sort
  return if $seen[hash]
  $sets.push set.sort.join(", ") if adj.empty? and set.size > 2
  adj.each { |node| subgraph(set + [node], $sparse[node] & adj) }
  $seen[hash] = true
end

$sparse.keys.each do |vertex|
  subgraph([vertex], $sparse[vertex])
end
And my Bron-Kerbosch implementation:
def bron_kerbosch(set, points, exclude)
  $sets.push set.sort.join(', ') if set.size > 2 and exclude.empty? and points.empty?
  points.each_with_index do |vertex, i|
    points[i] = nil
    bron_kerbosch(set + [vertex],
                  points & $sparse[vertex],
                  exclude & $sparse[vertex])
    exclude.push vertex
  end
end

bron_kerbosch [], $sparse.keys, []
I also implemented pivoting and degeneracy ordering, which cut down on bron_kerbosch execution time, but not enough to overtake my initial solution. It seems wrong that this is the case; what algorithmic insight am I missing? Here is a writeup with more detail if you need to see fully working code. I've tested this on pseudo-random sets up to a million or so edges in size.

I don't know how you generate the random graphs for your tests, but I suspect you use a function that draws edges according to a uniform distribution, which gives you a very homogeneous graph. That's a common problem when testing graph algorithms: it is very difficult to create good test cases (often as hard as solving the original problem).
The maximal-clique problem is a well-known NP-hard problem, and both algorithms (the naive one and Bron-Kerbosch) have the same worst-case complexity, so we can't expect an improvement on every test case, only on particular kinds of inputs. Because you used a uniform distribution to generate your graphs, those particular inputs never show up.
That's why the performance of both algorithms is very similar on your data, and since the Bron-Kerbosch algorithm carries a little more bookkeeping than the naive one, the naive one comes out faster.
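If you want a test case where the structural difference should actually show up, one option (a sketch I would try; the helper names here are made up, not code from the question) is to plant a large clique inside an otherwise sparse uniform random graph:
import random

def planted_clique_graph(n, edge_prob, clique_size):
    # Sparse background graph with edges drawn uniformly at random...
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < edge_prob:
                adj[u].add(v)
                adj[v].add(u)
    # ...plus one planted clique that the search actually has to dig out.
    clique = random.sample(range(n), clique_size)
    for u in clique:
        for v in clique:
            if u != v:
                adj[u].add(v)
    return adj

graph = planted_clique_graph(n=2000, edge_prob=0.01, clique_size=30)
On graphs like this, the pruning choices the two algorithms make start to matter, which is exactly what a uniform random graph hides.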

Related

Is Recursion W/Memoization In Staircase Problem Bottom-Up?

Considering the classical staircase problem as "Davis has a number of staircases in his house and he likes to climb each staircase 1, 2, or 3 steps at a time. Being a very precocious child, he wonders how many ways there are to reach the top of the staircase."
My approach is to use memoization with recursion as
# Time O(N), Space O(N), DP Bottom Up + Memoization
def stepPerms(n, memo={}):
    if n < 3:
        return n
    elif n == 3:
        return 4
    if n in memo:
        return memo[n]
    else:
        memo[n] = stepPerms(n - 1, memo) + stepPerms(n - 2, memo) + stepPerms(n - 3, memo)
        return memo[n]
The question that comes to my mind is: is this solution bottom-up or top-down? My reasoning is that since we go all the way down before calculating the upper N values (imagine the recursion tree), I consider this bottom-up. Is this correct?
Recursion strategies are, as a general rule, top-down approaches, whether they use memoization or not. The underlying algorithm design technique is dynamic programming, which is traditionally built in a bottom-up fashion.
I noticed that you wrote your code in Python, and Python is generally not happy about deep recursion (small depths are okay, but performance quickly takes a hit, and there is a maximum recursion depth of 1000 - unless that has changed since I last read about it).
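(If you want to keep the recursive version, you can check or raise that limit with the standard sys module, though raising it is usually the wrong fix:)
import sys

print(sys.getrecursionlimit())   # typically 1000 by default in CPython
sys.setrecursionlimit(5000)      # possible, but an iterative version is safer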
If we make a bottom-up dynamic programming version, we can get rid of the recursion entirely, and we can also recognise that we only need a constant amount of space, since we are only really interested in the last 3 values:
def stepPerms(n):
    if n < 1:
        return n
    memo = [1, 2, 4]
    if n <= 3:
        return memo[n - 1]
    for i in range(3, n):
        memo[i % 3] = sum(memo)
    return memo[(n - 1) % 3]
Notice how much simpler the logic is, apart from the index being one less than the step count, since positions start at 0 rather than 1.
In the top-down approach, a complex problem is divided into subproblems, so your recursive solution is a top-down approach. The bottom-up approach, on the other hand, begins with the elementary cases and then combines them further.
A bottom-up version of this solution would be:
def stepPerms(n):
    memo = {}
    for i in range(0, 3):
        memo[i] = i
    memo[3] = 4
    for i in range(4, n + 1):
        memo[i] = memo[i - 1] + memo[i - 2] + memo[i - 3]
    return memo[n]

Why should we use Dynamic Programming with Memoization in order to solve - Minimum Number of Coins to Make Change

The Problem Statement:
Given an infinite supply of coins of values {C1, C2, ..., Cn} and a sum X, find the minimum number of coins needed to represent X.
Most of the solutions on the web include dynamic programming with memoization. Here is an example from Youtube: https://www.youtube.com/watch?v=Kf_M7RdHr1M
My question is: why don't we sort the array of coins in descending order first and start exploring recursively by minimizing the sum until we reach 0? When we reach 0, we know that we have found the needed coins to make up the sum. Because we sorted the array in descending order, we know that we will always choose the greatest coin. Therefore, the first time the sum reaches down to 0, the count will have to be minimum.
I'd greatly appreciate if you help understand the complexity of my algorithm and compare it to the dynamic programming with memoization approach.
For simplicity, we are assuming there will always be a "$1" coin and thus there is always a way to make up the sum.
import java.util.*;

public class Solution {
    public static void main(String[] args) {
        MinCount cnt = new MinCount(new Integer[]{1, 2, 7, 9});
        System.out.println(cnt.count(12));
    }
}

class MinCount {
    Integer[] coins;

    public MinCount(Integer[] coins) {
        Arrays.sort(coins, Collections.reverseOrder());
        this.coins = coins;
    }

    public int count(int sum) {
        if (sum < 0) return Integer.MAX_VALUE;
        if (sum == 0) return 0;
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < coins.length; i++) {
            int val = count(sum - coins[i]);
            if (val < min) min = val;
            if (val != Integer.MAX_VALUE) break;
        }
        return min + 1;
    }
}
Suppose that you have coins worth $1, $50, and $52, and that your total is $100. Your proposed algorithm would produce a solution that uses 49 coins ($52 + $1 + $1 + … + $1 + $1); but the correct minimum result requires only 2 coins ($50 + $50).
(Incidentally, I think it's cheating to write
For simplicity, we are assuming there will always be a "$1" coin and thus there is always a way to make up the sum.
when this is not in the problem statement, and therefore not assumed in other sources. That's a bit like asking "Why do sorting algorithms always put a lot of effort into rearranging the elements, instead of just assuming that the elements are in the right order to begin with?" But as it happens, even assuming the existence of a $1 coin doesn't let you guarantee that the naïve/greedy algorithm will find the optimal solution.)
I will complement the answer that has already been provided to your question with some algorithm design advice.
The solution that you propose is what is called a "greedy algorithm": a problem solving strategy that makes the locally optimal choice at each stage with the hope of finding a global optimum.
In many problems, a greedy strategy does not produce an optimal solution. The best way to disprove the correctness of an algorithm is to find a counter-example, such as the case of the "$52", "$50", and "$1" coins. To find counter-examples, Steven Skiena gives the following advice in his book "The Algorithm Design Manual":
Think small: when an algorithm fails, there is usually a very simple example on which it fails.
Hunt for the weakness: if the proposed algorithm is of the form "always take the biggest" (that is, a greedy algorithm), think about why that might prove to be the wrong thing to do. In particular, ...
Go for a tie: A devious way to break a greedy algorithm is to provide instances where everything is the same size. This way the algorithm may have nothing to base its decision on.
Seek extremes: many counter-examples are mixtures of huge and tiny, left and right, few and many, near and far. It is usually easier to verify or reason about extreme examples than more muddled ones.
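For comparison, here is a minimal sketch (mine, not taken from the linked video) of the memoized top-down DP that the standard solutions use; instead of committing to the largest coin, it tries every coin at every remaining sum and keeps the best result:
from functools import lru_cache

def min_coins(coins, total):
    coins = tuple(coins)

    @lru_cache(maxsize=None)
    def best(remaining):
        # fewest coins that sum to `remaining`, or None if it cannot be done
        if remaining == 0:
            return 0
        options = [best(remaining - c) for c in coins if c <= remaining]
        options = [o + 1 for o in options if o is not None]
        return min(options) if options else None

    return best(total)

print(min_coins([1, 50, 52], 100))   # 2 (50 + 50), where the greedy uses 49 coins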
# recursive solution in python
import sys

class Solution:
    def __init__(self):
        self.ans = 0
        self.maxint = sys.maxsize - 1

    def minCoins(self, coins, m, v):
        res = self.solve(coins, m, v)
        if res >= self.maxint:
            return -1
        return res

    def solve(self, coins, m, v):
        if m == 0 and v > 0:
            return self.maxint
        if v == 0:
            return 0
        if coins[m - 1] <= v:
            self.ans = min(self.solve(coins, m, v - coins[m - 1]) + 1,
                           self.solve(coins, m - 1, v))
            return self.ans
        else:
            self.ans = self.solve(coins, m - 1, v)
            return self.ans

Reducing the time complexity of this algorithm

I'm playing a game that has a weapon-forging component, where you combine two weapons to get a new one. The sheer number of weapon combinations (see "6.1. Blade Combination Tables" at http://www.gamefaqs.com/ps/914326-vagrant-story/faqs/8485) makes it difficult to figure out what you can ultimately create out of your current weapons through repeated forging, so I tried writing a program that would do this for me. I give it a list of weapons that I currently have, such as:
francisca
tabarzin
kris
and it gives me the list of all weapons that I can forge:
ball mace
chamkaq
dirk
francisca
large crescent
throwing knife
The problem is that I'm using a brute-force algorithm that scales extremely poorly; it takes about 15 seconds to calculate all possible weapons for seven starting weapons, and a few minutes to calculate for eight starting weapons. I'd like it to be able to calculate up to 64 weapons (the maximum that you can hold at once), but I don't think I'd live long enough to see the results.
function find_possible_weapons(source_weapons)
{
    for (i in source_weapons)
    {
        for (j in source_weapons)
        {
            if (i != j)
            {
                result_weapon = combine_weapons(source_weapons[i], source_weapons[j]);
                new_weapons = array();
                new_weapons.add(result_weapon);
                for (k in source_weapons)
                {
                    if (k != i && k != j)
                        new_weapons.add(source_weapons[k]);
                }
                find_possible_weapons(new_weapons);
            }
        }
    }
}
In English: I attempt every combination of two weapons from my list of source weapons. For each of those combinations, I create a new list of all weapons that I'd have following that combination (that is, the newly-combined weapon plus all of the source weapons except the two that I combined), and then I repeat these steps for the new list.
Is there a better way to do this?
Note that combining weapons in the reverse order can change the result (Rapier + Firangi = Short Sword, but Firangi + Rapier = Spatha), so I can't skip those reversals in the j loop.
Edit: Here's a breakdown of the test example that I gave above, to show what the algorithm is doing. A line in brackets shows the result of a combination, and the following line is the new list of weapons that's created as a result:
francisca,tabarzin,kris
[francisca + tabarzin = chamkaq]
chamkaq,kris
[chamkaq + kris = large crescent]
large crescent
[kris + chamkaq = large crescent]
large crescent
[francisca + kris = dirk]
dirk,tabarzin
[dirk + tabarzin = francisca]
francisca
[tabarzin + dirk = francisca]
francisca
[tabarzin + francisca = chamkaq]
chamkaq,kris
[chamkaq + kris = large crescent]
large crescent
[kris + chamkaq = large crescent]
large crescent
[tabarzin + kris = throwing knife]
throwing knife,francisca
[throwing knife + francisca = ball mace]
ball mace
[francisca + throwing knife = ball mace]
ball mace
[kris + francisca = dirk]
dirk,tabarzin
[dirk + tabarzin = francisca]
francisca
[tabarzin + dirk = francisca]
francisca
[kris + tabarzin = throwing knife]
throwing knife,francisca
[throwing knife + francisca = ball mace]
ball mace
[francisca + throwing knife = ball mace]
ball mace
Also, note that duplicate items in a list of weapons are significant and can't be removed. For example, if I add a second kris to my list of starting weapons so that I have the following list:
francisca
tabarzin
kris
kris
then I'm able to forge the following items:
ball mace
battle axe
battle knife
chamkaq
dirk
francisca
kris
kudi
large crescent
scramasax
throwing knife
The addition of a duplicate kris allowed me to forge four new items that I couldn't before. It also increased the total number of forge tests to 252 for a four-item list, up from 27 for the three-item list.
Edit: I'm getting the feeling that solving this would require more math and computer science knowledge than I have, so I'm going to give up on it. It seemed like a simple enough problem at first, but then, so does the Travelling Salesman. I'm accepting David Eisenstat's answer since the suggestion of remembering and skipping duplicate item lists made such a huge difference in execution time and seems like it would be applicable to a lot of similar problems.
Start by memoizing the brute force solution, i.e., sort source_weapons, make it hashable (e.g., convert to a string by joining with commas), and look it up in a map of input/output pairs. If it isn't there, do the computation as normal and add the result to the map. This often results in big wins for little effort.
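As a rough sketch of that idea (my code, not the OP's; it assumes a combine_weapons(a, b) lookup like the one in the question's pseudocode, returning None when a pair has no recipe):
def find_possible_weapons(source_weapons, seen=None, results=None):
    if seen is None:
        seen, results = set(), set()
    key = ','.join(sorted(source_weapons))      # order-independent multiset key
    if key in seen:
        return results
    seen.add(key)
    for i, a in enumerate(source_weapons):
        for j, b in enumerate(source_weapons):
            if i == j:
                continue
            forged = combine_weapons(a, b)      # order matters, per the question
            if forged is None:
                continue
            results.add(forged)
            rest = [w for k, w in enumerate(source_weapons) if k not in (i, j)]
            find_possible_weapons(rest + [forged], seen, results)
    return results
The point of the sorted, comma-joined key is that it treats the inventory as a multiset: duplicate weapons stay significant, but different orderings of the same inventory are only explored once.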
Alternatively, you could do a backward search. Given a multiset of weapons, form predecessors by replacing one of the weapons with two weapons that forge it, in all possible ways. Starting with the singleton list consisting of the singleton multiset consisting of the goal weapon, repeatedly expand the list by predecessors of list elements and then cull multisets that are supersets of others. Stop when you reach a fixed point.
If linear programming is an option, then there are systematic ways to prune search trees. In particular, let's make the problem easier by (i) allowing an infinite supply of "catalysts" (maybe not needed here?) (ii) allowing "fractional" forging, e.g., if X + Y => Z, then 0.5 X + 0.5 Y => 0.5 Z. Then there's an LP formulation as follows. For all i + j => k (i and j forge k), the variable x_{ijk} is the number of times this forge is performed.
minimize sum_{i, j => k} x_{ijk} (to prevent wasteful cycles)
for all i: sum_{j, k: j + k => i} x_{jki}
- sum_{j, k: j + i => k} x_{jik}
- sum_{j, k: i + j => k} x_{ijk} >= q_i,
for all i + j => k: x_{ijk} >= 0,
where q_i is 1 if i is the goal item, else minus the number of i initially available. There are efficient solvers for this easy version. Since the reactions are always 2 => 1, you can always recover a feasible forging schedule for an integer solution. Accordingly, I would recommend integer programming for this problem. The paragraph below may still be of interest.
I know shipping an LP solver may be inconvenient, so here's an insight that will let you do without. This LP is feasible if and only if its dual is bounded. Intuitively, the dual problem is to assign a "value" to each item such that, however you forge, the total value of your inventory does not increase. If the goal item is valued at more than the available inventory, then you can't forge it. You can use any method that you can think of to assign these values.
I think you are unlikely to get a good general answer to this question, because if there were an efficient algorithm to solve your problem, then it would also be able to solve NP-complete problems.
For example, consider the problem of finding the maximum number of independent rows in a binary matrix.
This is a known NP-complete problem (e.g. by showing equivalence to the maximum independent set problem).
We can reduce this problem to your question in the following manner:
We start by holding one weapon for each column in the binary matrix, and then imagine that each row describes an alternative way of making a new weapon (say, a battle axe).
We construct the weapon translation table such that to make the battle axe using method i, we need all weapons j such that M[i,j] is equal to 1 (this may involve inventing some additional weapons).
Then we construct a series of super weapons which can be made by combining different numbers of our battle axes.
For example, the mega ultimate battle axe may require 4 battle axes to be combined.
If we are able to work out the best weapon that can be constructed from your starting weapons, then we have solved the problem of finding the maximum number of independent rows in the original binary matrix.
It's not a huge saving; however, looking at the source document, there are times when combining weapons produces the same weapon as one of the weapons that was combined. I assume that you won't want to do this, as you'll end up with fewer weapons.
So if you added a check for whether the result_weapon is the same type as one of the inputs, and in that case didn't go ahead and recursively call find_possible_weapons(new_weapons), you'd trim the search down a little.
The other thing I can think of is that you are not keeping track of work done, so if find_possible_weapons(new_weapons) returns a weapon you have already obtained by combining other weapons, you may well be performing the same search branch multiple times.
e.g. if you have a, b, c, d, e, f, g, and a + b = x and c + d = x, then your algorithm will perform two rounds of comparing x against e, f, and g. So if you keep track of what you've already computed, you'll be onto a winner...
Basically, you have to trim the search tree. There are loads of different techniques to do this: it's called search. If you want more advice, I'd recommend going to the computer science stack exchange.
If you are still struggling, then you could always start weighting items/resulting items, and only focus on doing the calculation on 'high gain' objects...
You might want to start by creating a Weapon[][] matrix, to show the results of forging each pair. You could map the name of the weapon to the index of the matrix axis, and lookup of the results of a weapon combination would occur in constant time.
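A dictionary keyed by the ordered pair does the same job without the name-to-index mapping. Here is a sketch; the two sample recipes are taken from the question, and the rest of the table would be filled in from the linked combination tables:
# Constant-time forge lookups from a dict keyed by the ordered pair.
FORGE = {
    ('rapier', 'firangi'): 'short sword',
    ('firangi', 'rapier'): 'spatha',
}

def combine_weapons(a, b):
    return FORGE.get((a, b))    # None when the pair has no recipe

print(combine_weapons('rapier', 'firangi'))    # short sword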

Partition a Set into k Disjoint Subsets

Given a set S, partition it into k disjoint subsets such that the difference of their sums is minimal.
Say S = {1,2,3,4,5} and k = 2; then {{3,4},{1,2,5}} is a solution, since the sums {7,8} have minimal difference. For S = {1,2,3} and k = 2 it will be {{1,2},{3}}, since the difference in sums is 0.
The problem is similar to the Partition Problem from The Algorithm Design Manual, except that Steven Skiena discusses a method to solve it without rearrangement.
I was going to try simulated annealing, so I was wondering if there is a better method.
Thanks in advance.
The pseudo-polynomial-time algorithm for the knapsack problem can be used for k = 2. The best a single subset can do is sum(S)/2. Run the subset-sum recurrence
arr[0] = true
for s in S:
    for i in sum(S) - s down to 0:    # downwards, so each element is used at most once
        if arr[i] then arr[i + s] = true
then look at sum(S)/2, followed by sum(S)/2 +/- 1, etc.
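Fleshed out as a Python sketch (my code, same idea as the pseudocode above):
def best_two_way_split(S):
    total = sum(S)
    reachable = [False] * (total + 1)
    reachable[0] = True
    for s in S:
        for i in range(total - s, -1, -1):   # downwards: each element used once
            if reachable[i]:
                reachable[i + s] = True
    # closest achievable subset sum to total / 2
    best = max(i for i in range(total // 2 + 1) if reachable[i])
    return best, total - best                # the two subset sums

print(best_two_way_split([1, 2, 3, 4, 5]))   # (7, 8), difference 1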
For k >= 3 I believe this is NP-complete, like the 3-partition problem.
The simplest way to do it for k >= 3 is just to brute-force it. Here's one way; I'm not sure if it's the fastest or cleanest.
import copy

arr = [1, 2, 3, 4]

def t(k, accum, index):
    print(accum, k)
    if index == len(arr):
        if k == 0:
            return copy.deepcopy(accum)
        else:
            return []
    element = arr[index]
    result = []
    for set_i in range(len(accum)):
        if k > 0:
            clone_new = copy.deepcopy(accum)
            clone_new[set_i].append([element])
            result.extend(t(k - 1, clone_new, index + 1))
        for elem_i in range(len(accum[set_i])):
            clone_new = copy.deepcopy(accum)
            clone_new[set_i][elem_i].append(element)
            result.extend(t(k, clone_new, index + 1))
    return result

print(t(3, [[]], 0))
Simulated annealing might be good, but since the 'neighbors' of a particular solution aren't really clear, a genetic algorithm might be better suited to this. You'd start out by randomly picking a group of subsets and 'mutate' by moving numbers between subsets.
If the sets are large, I would definitely go for stochastic search. I don't know exactly what spinning_plate means by saying that "the neighborhood is not clearly defined". Of course it is: you either move one item from one set to another, or swap items between two different sets, and that is a simple neighborhood. I would use both operations in stochastic search (which in practice could be tabu search or simulated annealing).
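For concreteness, those two moves might look like this (a sketch; the partition is represented as a list of lists, and the helper name is made up):
import random

def random_neighbor(partition):
    # One neighborhood step: move one item to another subset, or swap two items.
    new = [list(subset) for subset in partition]
    i, j = random.sample(range(len(new)), 2)
    if not new[i] and not new[j]:
        return new                       # two empty subsets: nothing to change
    if not new[i]:
        i, j = j, i                      # make sure the source subset is non-empty
    if new[j] and random.random() < 0.5:
        # swap one random item between subsets i and j
        a, b = random.randrange(len(new[i])), random.randrange(len(new[j]))
        new[i][a], new[j][b] = new[j][b], new[i][a]
    else:
        # move a random item from subset i to subset j
        new[j].append(new[i].pop(random.randrange(len(new[i]))))
    return new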

Algorithm for solving set problem

If I have a set of values (which I'll call x), and a number of subsets of x:
What is the best way to work out all possible combinations of subsets whose union is equal to x, but none of which intersect with each other?
An example might be:
if x is the set of the numbers 1 to 100, and I have four subsets:
a = 0-49
b = 50-100
c = 50-75
d = 76-100
then the possible combinations would be:
a + b
a + c + d
What you describe is called the Exact cover problem. The general solution is Knuth's Algorithm X, with the Dancing Links algorithm being a concrete implementation.
Given a well-order on the elements of x (make one up if necessary, this is always possible for finite or countable sets):
Let "sets chosen so far" be empty. Consider the smallest element of x. Find all sets which contain x and which do not intersect with any of the sets chosen so far. For each such set in turn recurse, adding the chosen set to "sets chosen so far", and looking at the smallest element of x not in any chosen set. If you reach a point where there is no element of x left, then you've found a solution. If you reach a point where there is no unchosen set containing the element you're looking for, and which does not intersect with any of the sets that you already have selected, then you've failed to find a solution, so backtrack.
This uses stack proportional to the number of non-intersecting subsets, so watch out for that. It also uses a lot of time - you can be far more efficient if, as in your example, the subsets are all contiguous ranges.
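Here is roughly what that procedure looks like in Python (a sketch of the description above, using the question's example sets; the names are mine):
def exact_covers(x, subsets, chosen=None):
    # Yield every list of pairwise-disjoint subsets whose union covers x.
    if chosen is None:
        chosen = []
    uncovered = set(x).difference(*chosen) if chosen else set(x)
    if not uncovered:
        yield list(chosen)
        return
    smallest = min(uncovered)            # the well-order: just use min()
    for s in subsets:
        if smallest in s and all(s.isdisjoint(c) for c in chosen):
            chosen.append(s)
            yield from exact_covers(x, subsets, chosen)
            chosen.pop()                 # backtrack

x = set(range(0, 100))
a, b = frozenset(range(0, 50)), frozenset(range(50, 100))
c, d = frozenset(range(50, 75)), frozenset(range(75, 100))
names = {a: 'a', b: 'b', c: 'c', d: 'd'}
for cover in exact_covers(x, [a, b, c, d]):
    print([names[s] for s in cover])     # ['a', 'b'] and ['a', 'c', 'd']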
Here's a bad way (recursive, does a lot of redundant work), but at least it's actual code and is probably halfway to the "efficient" solution.
def unique_sets(sets, target):
    if not sets and not target:
        yield []
    for i, s in enumerate(sets):
        intersect = s.intersection(target) and not s.difference(target)
        sets_without_s = sets[:i] + sets[i+1:]
        if intersect:
            for us in unique_sets(sets_without_s, target.difference(s)):
                yield us + [s]
        else:
            for us in unique_sets(sets_without_s, target):
                yield us

class named_set(set):
    def __init__(self, items, name):
        set.__init__(self, items)
        self.name = name

    def __repr__(self):
        return self.name

a = named_set(range(0, 50), name='a')
b = named_set(range(50, 100), name='b')
c = named_set(range(50, 75), name='c')
d = named_set(range(75, 100), name='d')

for s in unique_sets([a, b, c, d], set(range(0, 100))):
    print(s)
A way (which may not be the best way) is:
1. Create a set of all the pairs of subsets which overlap.
2. For every combination of the original subsets, say "false" if the combination contains one or more of the pairs listed in step 1; otherwise say "true" if the union of the subsets equals x (e.g. if the total number of elements in the subsets equals the number of elements in x).
The actual algorithm seems largely dependent on the choice of subsets, product operation, and equate operation. For addition (+), it seems like you could find a summation to suit your needs (the sum of 1 to 100 is similar to your a + b example). If you can do this, your algorithm is obviously O(1).
If you have a tougher product or equate operator (let's say taking a product of two terms means summing the strings and finding the SHA-1 hash), you may be stuck doing nested loops, which would be O(n^x) where x is the number of terms/variables.
Depending on the subsets you have to work with, it might be advantageous to use a more naive algorithm, one where you don't have to compare entire subsets but only upper and lower bounds.
If you are talking about random subsets, not necessarily ranges, then Nick Johnson's suggestion will probably be the best choice.
