I have no problem in implementing this algorithm in any imperative language, but I am struggling implementing it in Clojure or any other functional language. A lot of algorithms are described in terms of working with mutable data structures and imperative loops and it is hard for me to translate all of those to a functional domain.
Here is my incomplete attempt (a draft, not a working implementation) at implementing it in Clojure using adjacency lists as graph representation:
(ns karger.core
(:require [clojure.string :as string]))
(defn load-data []
(zipmap
(range 1 1000)
(map rest (map read-string
(string/split (slurp "data.txt") #"\n")))))
(defn min-cut [graph]
(let [start (rand-int (count graph))
end (rand-int (graph start))
start-list (nth graph start)]
(for [x (graph end)
:when (not= x start)]
(assoc graph start (conj start-list x)))
))
(count (load-data))
Can anyone give me a reference implementation of this algorithm (preferably written in Clojure)? Also I would like if someone gave me a general advice of translating an algorithm described in imperative terms to a functional domain.
Thanks in advance!
UPDATE #1
Here is a link to algorithm implementation written in Python: http://pastebin.com/WwWCtxpu
There are fundamental problems with your code as is:
your start-list binding is a number and cannot be conjed to'
you are calling assoc and ignoring the return value, thus making it a no-op.
you are using for as if it were a looping construct (it is a list comprehension)
you are calling nth on a hash-map, which will always fail (zipmap returns a hash-map)
In general the idea in functional programming is to lift mutable variables into immutable local bindings by making the "state of the world" entirely encapsulated by the function arguments, and making function calls with refined versions of that state.
Here is a working implementation from scratch based on the python solution you posted, and the graph file used by the java example here
(ns min-cut.core
(:require [clojure.java.io :as io]
[clojure.string :as string]
[clojure.pprint :refer [pprint]]))
(defn make-maps
[filename]
(reduce (fn [graph line]
(let [[node & edges] (->> line
(#(string/split % #"\W+"))
(remove #{""})
(map read-string))]
(assoc graph node (set edges))))
{}
(line-seq (io/reader filename))))
(defn karger
[graph]
(if (<= (count (keys graph))
2)
(count (graph (apply min (keys graph))))
(let [start (rand-nth (keys graph))
finish (rand-nth (vec (graph start)))
graph (loop [g graph
[edge & edges] (seq (graph finish))]
(if-not edge
g
(recur
(if (= edge start)
g
(update-in g [start] conj edge))
edges)))
graph (loop [g graph
[edge & edges] (seq (graph finish))]
(if-not edge
g
(let [gr (update-in g [edge] disj finish)
gr (if (= edge start)
gr
(update-in gr [edge] conj start))]
(recur gr edges))))
graph (dissoc graph finish)]
(recur graph))))
(defn -main
[& [file]]
(let [file (or file "kargerAdj.txt")
graph (make-maps file)]
(println "min cut is: "
(reduce min (repeatedly 1801 #(karger graph))))))
This is a very literal translation of the python code, so there are multiple places this code could be improved. For starters the two loops in the karger function could likely be replaced by a single reduce which would be much more concise and clear.
Note that no value that is created in this code is ever mutated - values are rebound but none of the incoming data structures are changed, and the only global definitions used are the functions make-maps, karger, and -main - all data is locally bound and passed to the next user.
Related
Using dictionaries, graphs, and lists, I'm attempting to implement the Dijkstra algorithm with BFS in Clojure. The issue is that I can't get it to work correctly; it won't work when I ask it to return the solution with weights, and it also won't work when I ask it to produce the graph without weights.
The problem occurs, most likely, at the BFS function; I would very appreciate any assistance with this.
I tried to search for information online and spoke with some of my classmates, but unfortunately it didn't help at all.
The files: e-roads-2020-full.clj + project.clj are in GitHub https://github.com/wfgemyd/clojure
;; state 0 - not encountered at all
;; state 1 - in the open queue
;; state 2 - current vertex
;; state 3 - visited
(defn al-papi [queue graph] ;;looks for the best vertex and dist
(loop [queue queue
best-distance nil
best-vertex nil]
(if (empty? queue)
best-vertex
(let [queue-label (first queue)
queue-vertex (get #(:vertices graph) queue-label)]
(if (or (nil? best-vertex) (< #(:distance queue-vertex) best-distance))
(recur (rest queue) #(:distance queue-vertex) queue-vertex)
(recur (rest queue) best-distance best-vertex))))))
(defn graph-bfs!
([graph]
(graph-bfs! graph (first (keys #(:vertices graph)))))
([graph start]
(graph-bfs! graph start (fn [vertex] nil)))
([graph start func]
(graph-bfs! graph start func first))
([graph start func func-m]
(let [vertices #(:vertices graph)]
(loop [queue (list start)]
(when (not (empty? queue))
(let [current-label (if (= func-m al-papi)(func-m queue graph)(func-m queue))
rest-queue (rest-queue! queue current-label)
current-vertex (get vertices current-label)
visited-status (:visited current-vertex)
current-neighbors #(:neighbors current-vertex)
unseen-neighbors (filter
(fn [label]
(= #(:visited (get vertices label)) 0))
current-neighbors)
]
(dosync (ref-set visited-status 2))
(func current-vertex)
(dosync (ref-set visited-status 3))
(doseq [label unseen-neighbors]
(dosync
(ref-set (:visited (get vertices label)) 1)))
(recur (concat rest-queue unseen-neighbors))))))))
(defn graph-dijkstra-mark! [graph finish use-weights]
(let [vertices #(:vertices graph)
start-vertex (get vertices finish)]
(graph-reset! graph)
(dosync
(ref-set (:distance start-vertex) 0))
(if (not use-weights)
(graph-bfs! graph
finish
(fn [vertex]
(let [next-distance (inc #(:distance vertex))]
(doseq [neighbor-label #(:neighbors vertex)]
(let [neighbor (get vertices neighbor-label)]
(if (= #(:visited neighbor) 0)
(dosync
(ref-set (:distance neighbor) next-distance))))))))
(graph-bfs! graph
finish
(fn [vertex]
(doseq [neighbor-label #(:neighbors vertex)]
(let [neighbor (get vertices neighbor-label)
next-distance (+ #(:distance vertex) (get-edge-weight graph (:label vertex) neighbor-label))]
(println "There is bfs!")
(when (or (= #(:visited neighbor) 0) (> #(:distance neighbor) next-distance))
(dosync
(ref-set (:distance neighbor) next-distance))))))
al-papi))))
Found the bug, it was in the back trace function, it was returning the ref and not the actual data that was needed in the function, plus the function "al-papi" was returning the best-vertex and not the :label of the best-vertex.
For a class project I am implementing the Bron-Kerbosch algorithm for finding maximal cliques in a graph. With help from others on SO I have gotten down to the final few issues.
This link (http://pastebin.com/2GUPZFnR) contains a SSCCE of my current implementation that outlines the issue. The issue I think lies with my use of disj to find the intersection of two lists. I think this based on the error given when I call BK-Call with the "sanity" input.
fptests.core> (BK-Call (sanity1))
ClassCastException clojure.lang.PersistentList$EmptyList cannot be cast to clojure.lang.IPersistentSet clojure.core/disj (core.clj:1449)
This error tracks down to a few lines in my Bron-Kerbosch function itself
(defn Bron-Kerbosch [r p x graph cliques]
(cond (and (empty? p) (empty? x)) (conj cliques r)
:else
(let [neigh (neighV graph (dec (count p)))]
(loop [loop-clq '(cliques)
loop-cnt '(dec (count p))
loop-p '(p)
loop-x '(x)]
(cond (= -1 loop-cnt) loop-clq
:else
(recur (conj loop-clq (Bron-Kerbosch (conj r loop-cnt) (conj p neigh) (disj x neigh)))
(dec loop-cnt)
(disj p loop-cnt)
(conj x loop-cnt)))))))
Specifically in the recursive call to the function in the recur form. Though this issue I think applies to all uses of conj and disj. It seems that conj "works" but not in the manner I assumed.
fptests.core> (disj '(1) '(1 2 3))
ClassCastException clojure.lang.PersistentList cannot be cast to clojure.lang.IPersistentSet clojure.core/disj (core.clj:1449)
fptests.core> (conj '(1) '(2 3))
((2 3) 1)
I assumed that (conj '(1) '(2 3)) would return (1 2 3) and not (1 (2 3)). So it seems my use of lists in the function overall is the issue. Is there a way that I could overcome this issue?
I would have to imagine there are functions like conj and disj that work with lists. I guess my other option if this is not true is to use some other data structure in the algorithm. What would be appropriate?
use cons, rest and concat.
user=> (cons 1 '(2 3))
(1 2 3)
user=> (cons '(1) '(2 3))
((1) 2 3)
user=> (cons 1 ())
(1)
user=> (rest '(1 2 3))
(2 3)
user=> (concat '(1) '(2 3))
(1 2 3)
I would indeed suggest using a data structure more suited to the problem at hand, namely a set.
You're already using Clojure hash sets in your graph representation ((repeat n #{}) in empty-graph); the r, p and x parameters of the Bron-Kerbosch step function are all conceptually sets, so it makes sense to represent them as Clojure sets as well.
With that choice of representation, things will become simpler for you – set intersection can be computed using clojure.set/intersection, disj works for removing individual keys etc.
On a separate note, it would be more natural to use if rather than cond in Bron-Kerbosch (both conds in that function actually only have two branches). More importantly, you'll want to remove the quotes from the init expressions in your loop – '(dec (count p)) (notice the '), to take one example, is a two-element list, not a number. (It's also somewhat unusual in Clojure for function names to include capital letters, but of course this is purely a matter of style.)
I wrote the code below for game I am working on. But it seems a little slow. If you have not checked the code yet, it's the A* search/pathfinding algorithm. It takes about 100-600 ms for a 100x100 grid, depending on the heuristic used (and consequently the number of tiles visited).
There are no reflection warnings. However, I suspect boxing might be an issue. But I don't know how to get rid of boxing in this case, because the computation is split among several functions. Also, I save tiles/coordinates as vectors of two numbers, like this: [x y]. But then the numbers will be boxed, right? A typical piece of code, if you don't want to read through it all, is: (def add-pos (partial mapv + pos)) where pos is the aforementioned kind of two-number vector. There are sereval of places where the numbers are manipulated in a way similar to add-pos above, and put back in a vector afterwards. Is there any way to optimize code like this? Any other tips is welcome too, performance-related or other.
EDIT: Thinking some more about it, I came up with a few follow-up questions: Can a Clojure function ever return primitives? Can a Clojure function ever take primitives (without any boxing)? Can I put primitives in a type/record without boxing?
(ns game.server.pathfinding
(:use game.utils)
(:require [clojure.math.numeric-tower :as math]
[game.math :as gmath]
[clojure.data.priority-map :as pm]))
(defn walkable? [x]
(and x (= 1 x)))
(defn point->tile
([p] (apply point->tile p))
([x y] [(int x) (int y)]))
(defn get-tile [m v]
"Gets the type of the tile at the point v in
the grid m. v is a point in R^2, not grid indices."
(get-in m (point->tile v)))
(defn integer-points
"Given an equation: x = start + t * step, returns a list of the
values for t that make x an integer between start and stop,
or nil if there is no such value for t."
[start stop step]
(if-not (zero? step)
(let [first-t (-> start ((if (neg? step) math/floor math/ceil))
(- start) (/ step))
t-step (/ 1 (math/abs step))]
(take-while #((if (neg? step) > <) (+ start (* step %)) stop)
(iterate (partial + t-step) first-t)))))
(defn crossed-tiles [[x y :as p] p2 m]
(let [[dx dy :as diff-vec] (map - p2 p)
ipf (fn [getter]
(integer-points (getter p) (getter p2) (getter diff-vec)))
x-int-ps (ipf first)
y-int-ps (ipf second)
get-tiles (fn [[x-indent y-indent] t]
(->> [(+ x-indent x (* t dx)) (+ y-indent y (* t dy))]
(get-tile m)))]
(concat (map (partial get-tiles [0.5 0]) x-int-ps)
(map (partial get-tiles [0 0.5]) y-int-ps))))
(defn clear-line?
"Returns true if the line between p and p2 passes over only
walkable? tiles in m, otherwise false."
[p p2 m]
(every? walkable? (crossed-tiles p p2 m)))
(defn clear-path?
"Returns true if a circular object with radius r can move
between p and p2, passing over only walkable? tiles in m,
otherwise false.
Note: Does not currently work for objects with a radius >= 0.5."
[p p2 r m]
(let [diff-vec (map (partial * r) (gmath/normalize (map - p2 p)))
ortho1 ((fn [[x y]] (list (- y) x)) diff-vec)
ortho2 ((fn [[x y]] (list y (- x))) diff-vec)]
(and (clear-line? (map + ortho1 p) (map + ortho1 p2) m)
(clear-line? (map + ortho2 p) (map + ortho2 p2) m))))
(defn straighten-path
"Given a path in the map m, remove unnecessary nodes of
the path. A node is removed if one can pass freely
between the previous and the next node."
([m path]
(if (> (count path) 2) (straighten-path m path nil) path))
([m [from mid to & tail] acc]
(if to
(if (clear-path? from to 0.49 m)
(recur m (list* from to tail) acc)
(recur m (list* mid to tail) (conj acc from)))
(reverse (conj acc from mid)))))
(defn to-mid-points [path]
(map (partial map (partial + 0.5)) path))
(defn to-tiles [path]
(map (partial map int) path))
(defn a*
"A* search for a grid of squares, mat. Tries to find a
path from start to goal using only walkable? tiles.
start and goal are vectors of indices into the grid,
not points in R^2."
[mat start goal factor]
(let [width (count mat)
height (count (first mat))]
(letfn [(h [{pos :pos}] (* factor (gmath/distance pos goal)))
(g [{:keys [pos parent]}]
(if parent
(+ (:g parent) (gmath/distance pos (parent :pos)))
0))
(make-node [parent pos]
(let [node {:pos pos :parent parent}
g (g node) h (h node)
f (+ g h)]
(assoc node :f f :g g :h h)))
(get-path
([node] (get-path node ()))
([{:keys [pos parent]} path]
(if parent
(recur parent (conj path pos))
(conj path pos))))
(free-tile? [tile]
(let [type (get-in mat (vec tile))]
(and type (walkable? type))))
(expand [closed pos]
(let [adj [[1 0] [0 1] [-1 0] [0 -1]]
add-pos (partial mapv + pos)]
(->> (take 4 (partition 2 1 (cycle adj)))
(map (fn [[t t2]]
(list* (map + t t2) (map add-pos [t t2]))))
(map (fn [[d t t2]]
(if (every? free-tile? [t t2]) d nil)))
(remove nil?)
(concat adj)
(map add-pos)
(remove (fn [[x y :as tile]]
(or (closed tile) (neg? x) (neg? y)
(>= x width) (>= y height)
(not (walkable? (get-in mat tile)))))))))
(add-to-open [open tile->node [{:keys [pos f] :as node} & more]]
(if node
(if (or (not (contains? open pos))
(< f (open pos)))
(recur (assoc open pos f)
(assoc tile->node pos node)
more)
(recur open tile->node more))
{:open open :tile->node tile->node}))]
(let [start-node (make-node nil start)]
(loop [closed #{}
open (pm/priority-map start (:f start-node))
tile->node {start start-node}]
(let [[curr _] (peek open) curr-node (tile->node curr)]
(when curr
(if (= curr goal)
(get-path curr-node)
(let [exp-tiles (expand closed curr)
exp-nodes (map (partial make-node curr-node) exp-tiles)
{:keys [open tile->node]}
(add-to-open (pop open) tile->node exp-nodes)]
(recur (conj closed curr) open tile->node))))))))))
(defn find-path [mat start goal]
(let [start-tile (point->tile start)
goal-tile (point->tile goal)
path (a* mat start-tile goal-tile)
point-path (to-mid-points path)
full-path (concat [start] point-path [goal])
final-path (rest (straighten-path mat full-path))]
final-path))
I recommend the Clojure High Performance Programming book for addressing questions like yours.
There are functions to unbox primitives (byte, short, int, long, float, double).
Warn-on-reflection does not apply to numeric type reflection / failure to optimize numeric code. There is a lib to force warnings for numeric reflection - primitive-math.
You can declare the types of function arguments and function return values (defn ^Integer foo [^Integer x ^Integer y] (+ x y)).
Avoid apply if you want performance.
Avoid varargs (a common reason to need apply) if you want performance. Varargs functions create garbage on every invocation (in order to construct the args map, which usually is not used outside the function body). partial always constructs a varargs function. Consider replacing the varargs (partial * x) with #(* x %), the latter can be optimized much more aggressively.
There is a tradeoff with using primitive jvm single-type arrays (they are mutible and fixed in length, which can lead to more complex and brittle code), but they will perform better than the standard clojure sequential types, and are available if all else fails to get the performance you need.
Also, use criterium to compare various implementations of your code, it has a bunch of tricks to help rule out the random things that affect execution time so you can see what really performs best in a tight loop.
Also, regarding your representation of a point as [x y] - you can reduce the space and lookup overhead of the collection holding them with (defrecord point [x y]) (as long as you know they will remain two elements only, and you don't mind changing your code to ask for (:x point) or (:y point)). You could further optimize by making or using a simple two-number java class (with the tradeoff of losing immutibility).
I'm trying to understand what is the idiomatic way in Clojure to recurse through a tree or a list represented by a Clojure list (or another collection type).
I could write the following to count the elements in a flat collection (ignore the fact that it's not tail-recursive):
(defn length
([xs]
(if (nil? (seq xs))
0
(+ 1 (length (rest xs))))))
Now in Scheme or CL all the examples only ever do this over lists, so the idiomatic base case test in those languages would be (nil? xs). In Clojure we'd like this function to work on all collection types, so is the idiomatic test (nil? (seq xs)), or maybe (empty? xs), or something completely different?
The other case I'd like to consider is tree traversal, i.e. traversing through a list or vector that represents a tree, e.g. [1 2 [3 4].
For example, counting the nodes in a tree:
(defn node-count [tree]
(cond (not (coll? tree)) 1
(nil? (seq tree)) 0
:else (+ (node-count (first tree)) (node-count (rest tree)))))
Here we use (not (coll? tree)) to check for atoms, whereas in Scheme/CL we'd use atom?. We also use (nil? (seq tree)) to check for an empty collection. And finally we use first and rest to destructure the current tree to the left branch and the rest of the tree.
So to summarise, are the following forms idiomatic in Clojure:
(nil? (seq xs)) to test for the empty collection
(first xs) and (rest xs) to dig into the collection
(not (coll? xs)) to check for atoms
The idiomatic test for a non-empty seqable is (seq coll):
(if (seq coll)
...
)
The nil? is unnecessary, since a non-nil return value from seq is guaranteed to be a seq and thus neither nil nor false and therefore truthy.
If you want to deal with the nil case first, you can change the if to if-not or seq to empty?; the latter is implemented as a composition of seq with not (which is why it is not idiomatic to write (not (empty? xs)), cf. the docstring of empty?).
As for first / rest -- it's useful to remember about the strict variant of rest, next, the use of which is more idiomatic than wrapping rest in a seq.
Finally, coll? checks if its argument is a Clojure persistent collection (an instance of clojure.lang.IPersistentCollection). Whether this is an appropriate check for "non-atoms" depends on whether the code needs to handle Java data structures as non-atoms (via interop): e.g. (coll? (java.util.HashSet.)) is false, as is (coll? (into-array [])), but you can call seq on both. There is a function called seqable? in core.incubator in the new modular contrib which promises to determine whether (seq x) would succeed for a given x.
I personally like the following approach to recurse through a collection:
(defn length
"Calculate the length of a collection or sequence"
([coll]
(if-let [[x & xs] (seq coll)]
(+ 1 (length xs))
0)))
Features:
(seq coll) is idiomatic for testing whether a collection is empty (as per Michal's great answer)
if-let with (seq coll) automatically handles both the nil and empty collection case
You can use destructuring to name the first and next elements as you like for use in your function body
Note that in general it is better to write recursive functions using recur if possible, so that you get the benefits of tail recursion and don't risk blowing up the stack. So with this in mind, I'd actually probably write this specific function as follows:
(defn length
"Calculate the length of a collection or sequence"
([coll]
(length coll 0))
([coll accumulator]
(if-let [[x & xs] (seq coll)]
(recur xs (inc accumulator))
accumulator)))
(length (range 1000000))
=> 1000000
I am working on a problem which requires finding all paths between two nodes in a directed graph. The graph may have cycles.
Notice that this particular implementation approach is an iterative DFS.
Several approaches I've considered are as follows -
BFS does not appear to have a way to neatly manage this kind of pathing relationships between nodes.
I don't see an easy mechanism for a DFS recursive algorithm to pass up the path when the terminating node is found. (Likely enough it could be done, if I implement a Maybe monad kind of thing).
Creating a GRAPH-PARENT routine. That would add a decent amount of churn (& bugs) in the existing code.
Abstractly, what needs to happen is a tree needs to be generated, with the start node as root, and all leafs are the terminating nodes. Each path from leaf to root is a legal path. That is what a recursive DFS would trace out.
I'm reasonably sure it can be done here, but I don't see exactly how to do it.
I've defined a protocol for this algorithm where GRAPH-EQUAL and GRAPH-NEXT can be defined for arbitrary objects.
The debug node type is a SEARCH-NODE, and it has the data accessor SEARCH-NODE-DATA.
(defun all-paths (start end)
(let ((stack (list start))
(mark-list (list start)) ;I've chosen to hold marking information local to all-paths, instead of marking the objects themselves.
(all-path-list '())) ; Not used yet, using debug statements to think about the problem
(do () ;; intializing no variables
;; While Stack still has elements
((not stack))
(let ((item (pop stack)))
;; I'm looking at the item.
(format t "I: ~a~%" (search-node-data item))
(cond ((graph-equal item end)
(format t "*Q: ~a~%" (loop for var in stack collect (search-node-data var)))
;;Unmark the terminal node so we can view it it next time.
(setf mark-list (remove item mark-list))))
(loop for next in (graph-next item)
do
(cond ((not (in next mark-list :test #'graph-equal))
;; mark the node
(push next mark-list)
;;Put it on the stack
(push next stack))))))))
See A Very General Method for Computing Shortest Paths for an algorithm that can return all paths in a graph (even when there are cycles) as regular expressions over the alphabet of edges in finite time (assuming a finite graph).
You need to pass the path list (mark-list) along with the nodes, since that is part of the state. I've renamed it path in this code:
(defun all-paths (start end)
(let ((stack (list '(start (start)))) ; list of (node path) elements
(all-path-list '()))
(do ()
((not stack))
(let ((item (pop stack)))
(let ((node (first item))
(path (second item)))
(format t "I: ~a~%" (search-node-data node))
(cond ((graph-equal node end)
(format t "*Q: ~a~%"
(loop for var in path
collect (search-node-data var)))))
(loop for next in (graph-next node)
do
(cond ((not (in next path :test #'graph-equal))
(push (list next (cons next path)) stack)))))))))