List processing in Clojure, tail recursion needed - algorithm

Given a sorted list of intervals, e.g.
(def lst (list [7 10] [32 35]))
I need to implement a function that adds a new interval to the list. If the new interval is adjacent to any of those from the list, they should be merged:
(= (add-range [1 3] lst) (list [1 3] [7 10] [32 35])) ;; prepend left
(= (add-range [1 6] lst) (list [1 10] [32 35])) ;; merge left
(= (add-range [11 20] lst) (list [7 20] [32 35])) ;; merge right
(= (add-range [11 31] lst) (list [7 35])) ;; merge left and right
This is my implementation:
(defn add-range
  [range range-list]
  (if (empty? range-list)
    (list range)
    (let
      [lo (first range)
       hi (second range)
       head (first range-list)
       head-lo (dec (first head))
       head-hi (inc (second head))]
      (if (< hi head-lo)
        (cons range range-list)
        (if (= hi head-lo)
          (cons [lo (second head)] (rest range-list))
          (if (= lo head-hi)
            (recur [(first head) hi] (rest range-list))
            (cons head (add-range range (rest range-list)))))))))
It works and looks quite elegant too, but the last line contains a recursive call to add-range that cannot be replaced with recur, because it is not in tail position. I'm planning to have long range lists in my application and I don't want to blow up the stack.
How can this be rewritten using tail recursion?
Is there another approach to solve the problem? Lazy sequences maybe?
UPDATE
The sorted list is actually not required. This can be a set or even an unsorted list, but it would be really nice to do it in a single pass.
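On the "lazy sequences" idea from the question: a minimal sketch (the name add-range-lazy is hypothetical; it keeps the same adjacency rules as the question's add-range) wraps the body in lazy-seq. The non-tail call then only creates a thunk, and the result is realized one element at a time, so a long range list should not grow the call stack. One caveat, stated as an assumption about usage: if you call it repeatedly on its own unrealized output, force each result (for example with doall) so that lazy layers do not nest deeply.

```clojure
(defn add-range-lazy
  "Hypothetical lazy variant of the question's add-range. Same adjacency
  rules; the non-tail call sits inside lazy-seq, so consuming the result
  does not grow the call stack."
  [[lo hi :as r] range-list]
  (lazy-seq
    (if-let [[[h-lo h-hi :as head] & more] (seq range-list)]
      (cond
        (< hi (dec h-lo)) (cons r range-list)              ; prepend, no merge
        (= hi (dec h-lo)) (cons [lo h-hi] more)             ; merge with head
        (= lo (inc h-hi)) (add-range-lazy [h-lo hi] more)   ; absorb head, keep going
        :else             (cons head (add-range-lazy r more)))
      (list r))))
```

Usage: (add-range-lazy [1 6] (list [7 10] [32 35])) yields ([1 10] [32 35]), and counting the result of adding an interval past the end of a 100000-element list walks the whole list without recursion on the stack.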

Using a sorted set you can implement it as:
;; first the constructor
(defn ranges [& rs]
  (apply sorted-set-by
         (fn [[from-a to-a] [from-b to-b]]
           (< to-a (dec from-b))) rs))
;; then add-range itself
(defn add-range [ranges [from to :as r]]
  (let [rs (subseq ranges >= [from from] <= [to to]) ; stored ranges touching [from to]
        ranges (reduce disj ranges rs)]
    (conj ranges
          (let [[from'] (or (first rs) r)
                [_ to'] (or (last rs) r)]
            [(min from from') (max to to')]))))
Let's try your tests:
=> (def lst (ranges [7 10] [32 35]))
#'user/lst
=> (add-range lst [1 3])
#{[1 3] [7 10] [32 35]}
=> (add-range lst [1 6])
#{[1 10] [32 35]}
=> (add-range lst [11 20])
#{[7 20] [32 35]}
=> (add-range lst [11 35])
#{[7 35]}
Addendum #1: add-range is O((m + 1) log n) where n is the size of the ranges set and m the number of merged intervals.

In my experience, making something tail recursive involves passing all local state as arguments. Looking at the algorithm, the local state here is the range items that have already been processed, i.e. final result = (ranges passed over unchanged) + merged range + (ranges that no longer need to be considered).
Consider the following version; it explicitly passes along the items already processed.
(defn add-range
  [range-obj ranges]
  (loop [processed []
         range-obj range-obj
         remaining (list* ranges)]
    (if (empty? remaining)
      (conj processed range-obj)
      (let [[lo hi] range-obj
            [h-lo h-hi :as head] (first remaining)
            upper-merge-threshold (dec h-lo)
            lower-merge-threshold (inc h-hi)]
        (cond
          (< hi upper-merge-threshold) (into processed
                                             (conj remaining range-obj))
          (= hi upper-merge-threshold) (into processed
                                             (conj (rest remaining) [lo h-hi]))
          (= lo lower-merge-threshold) (recur processed
                                              [h-lo hi]
                                              (rest remaining))
          :else (recur (conj processed head)
                       range-obj
                       (rest remaining)))))))
My version accepts a vector (a list works too) and returns a vector. You could modify the relevant code to make it return a list instead.
As for whether there is a better algorithm, I don't know. I have simply converted your algorithm to be tail recursive.
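On the UPDATE (unsorted input, single pass): assuming the stored intervals are pairwise non-overlapping and non-adjacent (which holds when they are only ever built by merging), one reduce over the collection is enough, because any stored interval that touches the merged result must already touch the new interval. A sketch with the hypothetical name add-range-unsorted; unlike the question's version it also merges overlapping intervals, and it returns an unordered vector.

```clojure
(defn add-range-unsorted
  "Hypothetical single-pass variant for an unsorted collection of
  pairwise non-touching [lo hi] intervals. Returns a vector (unordered)."
  [[lo hi] ranges]
  (let [;; overlapping or adjacent to the new interval [lo hi]
        touches? (fn [[a b]] (and (<= a (inc hi)) (<= lo (inc b))))
        step (fn [[[m-lo m-hi] kept] [a b :as r]]
               (if (touches? r)
                 [[(min m-lo a) (max m-hi b)] kept]   ; grow the merged interval
                 [[m-lo m-hi] (conj kept r)]))        ; keep r unchanged
        [merged kept] (reduce step [[lo hi] []] ranges)]
    (conj kept merged)))
```

Because it only needs reduce, the input can be a vector, a list or a set.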

Related

multiple filters in a single iteration

Suppose I have a list of tuples like so:
[["type_2" "val_x"] ["type_1" "val_y"] ["type_1" "val_z"]]
I'd like to filter them, so that I have two separate collections like this:
[["type_2" "val_x"]]
[["type_1" "val_y"] ["type_1" "val_z"]]
I can run filter twice. I'm wondering if it's possible to achieve the same result in a single iteration with functional programming?
This is the desired interface:
(multiple-filter predicate_fn_1 predicate_fn_2 coll)
While (vals (group-by first... would work OK in your case, it is not universal. Here is one variant (of many possible ones) for applying multiple filters:
(defn classify [items & preds]
  (loop [[x & xs :as items] items
         res (repeat (count preds) [])]
    (if (empty? items)
      res
      (recur xs
             (mapv #(if (% x) (conj %2 x) %2) preds res)))))
In the REPL:
user> (classify [[:a 10] [:a 20] [:b 30] [:d 2] [:c 40] [:d 1]]
                #(= (first %) :a)
                #(= (first %) :b)
                #(= (first %) :d))
[[[:a 10] [:a 20]] [[:b 30]] [[:d 2] [:d 1]]]
Or the same with reduce:
(defn classify [items & preds]
  (reduce (fn [res x] (mapv #(if (% x) (conj %2 x) %2) preds res))
          (repeat (count preds) [])
          items))
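For comparison, the group-by route mentioned at the start of this answer produces the two collections directly for the question's data. It is "not universal" because it only covers predicates of the form "has this key"; arbitrary predicates (possibly overlapping) need something like classify. group-by returns a map, so the order of vals should be treated as unspecified in general. The names below are illustrative.

```clojure
(def tuples [["type_2" "val_x"] ["type_1" "val_y"] ["type_1" "val_z"]])

;; group-by builds a map from key (here: the type string) to a vector of items
(def by-type (group-by first tuples))

;; vals yields one collection per distinct key; do not rely on their order
(def split-colls (vals by-type))
```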
The classify function by #leetwinski fails to satisfy your desired interface; as an example, here is a compliant implementation:
(defn multiple-filter [& preds-and-coll]
  (let [[preds coll] ((juxt drop-last last) preds-and-coll)]
    (mapv #(filterv % coll) preds)))
Example:
(multiple-filter (comp #{"type_1"} first)
                 (comp #{"type_2"} first)
                 [["type_2" "val_x"] ["type_1" "val_y"] ["type_1" "val_z"]])
;;=> [[["type_1" "val_y"] ["type_1" "val_z"]] [["type_2" "val_x"]]]
I haven't implemented this as a single iteration because that would complicate this answer and not affect the algorithmic complexity, but feel free to replace my implementation using mapv and filterv with #leetwinski's single-iteration implementation.
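Following that suggestion, here is one way to combine the two: the interface of multiple-filter (predicates first, collection last) with the single-pass reduce from classify. The name multiple-filter-1pass is hypothetical. Each item is still tested against every predicate, so the amount of work matches the mapv/filterv version; only the number of traversals of the collection changes.

```clojure
(defn multiple-filter-1pass
  "Hypothetical single-pass variant of multiple-filter: same interface
  (predicates first, collection last), one reduce over the collection."
  [& preds-and-coll]
  (let [preds (vec (butlast preds-and-coll))
        coll  (last preds-and-coll)]
    (reduce (fn [buckets x]
              ;; test x against every predicate, appending it to each matching bucket
              (mapv (fn [pred bucket] (if (pred x) (conj bucket x) bucket))
                    preds buckets))
            (vec (repeat (count preds) []))   ; one empty bucket per predicate
            coll)))
```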

Clojure: Find locations of "1" in a string and print out them in the format of intervals

I'm trying to solve the following problem:
Given a string consisting of "1" and "0", find all the locations of "1", and print them in the format of intervals.
For example:
"00101110101110" => 3, 5-7, 9, 11-13
My (ugly) solution:
(defn bar [x]
  (letfn [(foo [mystr]
            (->>
              (map-indexed vector mystr)
              (filter #(= (second %) \1))
              (map (comp inc first))
              (partition-all 2 1)
              (filter #(= 2 (count %)))))]
    (let [y (map #(if (> (- (second %) (first %)) 1) (print (first %) ", " (second %) "-")) (foo x))]
      (print (ffirst y) "-" y (last (last (foo x)))))))
Explanation:
First, I find the locations of "1" in the given string:
(->>
  (map-indexed vector mystr)
  (filter #(= (second %) \1))
  (map (comp inc first)))
"00101110101110" => (3 5 6 7 9 11 12 13)
Then, I partition the list of positions into a sequence of 2-element tuples. If there is a 1-element tuple at the end of that sequence, I drop it:
(->>
  (map-indexed vector mystr)
  (filter #(= (second %) \1))
  (map (comp inc first))
  (partition-all 2 1)
  (filter #(= 2 (count %))))
"00101110101110" => ((3 5) (5 6) (6 7) (7 9) (9 11) (11 12) (12 13))
Finally, I print out the first position in the first tuple and the second one in the last tuple, while using
(map #(if (> (- (second %) (first %)) 1) (print (first %) ", " (second %) "-")) (foo x)) to get the middle part.
The input:
(bar "00101110101110")
The final result:
3 , 5 -nil - (nil nil 7 , 9 -nil 9 , 11 -nil nil nil nil) 13
My questions:
How can I remove the nils in the final result?
How can I solve this problem in a more concise way?
In order to understand how to remove nils from the final result, let's understand how they get in there in the first place. The value bound to name y in the last let form is actually a sequence of all nil values. The function bar itself also returns nil. This occurs because print always returns nil and if returns nil when the condition is false and the "else" form is not present. Effectively, every value in the sequence returned by foo is converted to a nil. Non-nil values in the output are values printed as a side effect. nil and non-nil values are mixed because map is lazy and the mapping function is applied only when the last print realizes the lazy sequence y. Needless to say, using map for side effects is a bad idea.
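The mechanism described above can be reproduced in isolation. In this small sketch (the names ys and printed are illustrative), the mapped function returns nil for every element, and the printing only happens when the lazy sequence is realized:

```clojure
;; `print` returns nil, and `if` without an else branch returns nil,
;; so every element of ys will be nil.
(def ys (map #(if (odd? %) (print %)) [1 2 3]))

;; Nothing is printed when ys is defined, because map is lazy.
;; Realizing ys runs the prints; with-out-str captures their output.
(def printed (with-out-str (doall ys)))
```

Here printed is "13" (the side effects) while ys itself is (nil nil nil).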
So the simplest way to remove nils from the output is to avoid nil values altogether.
(->> "00101110101110"
     (map-indexed vector)                          ;; ([0 \0] [1 \0] [2 \1]...
     (partition-by second)                         ;; (([0 \0] [1 \0]) ([2 \1]) ([3 \0]) ([4 \1] [5 \1] [6 \1]) ...
     (filter #(= \1 (-> % first second)))          ;; (([2 \1]) ([4 \1] [5 \1] [6 \1])...
     (map #(map (comp inc first) %))               ;; ((3) (5 6 7) (9) (11 12 13))
     (map #(if (next %) [(first %) (last %)] %))   ;; ((3) [5 7] (9) [11 13])
     (map #(clojure.string/join "-" %))            ;; ("3" "5-7" "9" "11-13")
     (clojure.string/join ", "))
;; => "3, 5-7, 9, 11-13"
I found this problem kind of interesting, so I tried to attack it with the approach from this talk: higher parallelism by mapping the data into a more convenient space and then combining sub-solutions in parallel. To that end, I focused on producing the intervals themselves in parallel, using transducers to perform all intermediate steps, then making an eduction and folding over that. This kind of organisation calls for a number of helper functions, so it is maybe not as concise, but hopefully interesting anyway.
I go through an intermediate representation as nested vectors: [accepted boundary], where the interval represented by the 2-vector boundary grows until there is a discontinuity, in which case it is added to the end of accepted.
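(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

(defn indices "Transducer finding indices of an element occurring in a sequence"
  [element]
  (keep-indexed #(when (= element %2) %1)))

(defn combine "Combine two series of intervals (left chunk first)"
  ([] [[] nil])
  ([[acc-a bnd-a :as left] [acc-b bnd-b :as right]]
   (cond
     (nil? bnd-a) right   ; left chunk contained no matches
     (nil? bnd-b) left    ; right chunk contained no matches
     :else
     (let [[a b] bnd-a
           [c d] (or (first acc-b) bnd-b)]   ; first interval of the right chunk
       (cond
         (not (<= b c (inc b))) [(into (conj acc-a bnd-a) acc-b) bnd-b]
         (empty? acc-b)         [acc-a [a d]]
         :else                  [(into (conj acc-a [a d]) (subvec acc-b 1)) bnd-b])))))

(defn plus "Add an interval to the series"
  ([] [[] nil])
  ([[accepted boundary] to-add]
   (if (nil? boundary)
     [accepted to-add]
     (let [[a b] boundary
           [c d] to-add]
       (if (<= b c (inc b))
         [accepted [a d]]                       ; to-add extends the boundary
         [(conj accepted boundary) to-add]))))) ; gap: accept boundary, start anew

(defn printable-indices [element the-seq]
  ;; Note: an eduction is not foldable, so r/fold falls back to a serial
  ;; reduce here; folding over a vector is what splits the work into chunks.
  (let [[accepted boundary]
        (r/fold combine plus
                (eduction (comp (indices element)
                                (map inc)              ; 1-based positions, as in the question
                                (map #(vector % %)))
                          the-seq))
        ;; The reduction is done, so the last boundary is now accepted too.
        fixed-up (cond-> accepted boundary (conj boundary))]
    (str/join ", " (map (fn [[a b]] (if (= a b) (str a) (str a \- b))) fixed-up))))

;; (printable-indices \1 "00101110101110") ;=> "3, 5-7, 9, 11-13"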
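Because an eduction is not foldable, r/fold over it falls back to a single serial reduce, so the combine step is never exercised there. To actually split the work, one can fold over a vector. A sketch under that assumption: plus-iv and combine-iv are hypothetical restatements of plus and combine (this combine also handles chunks with no matches, and drops the first, not the last, accepted interval of the right chunk when merging), and the hypothetical runs takes the chunk size n passed to r/fold, which makes it easy to check the result for many different chunkings.

```clojure
(require '[clojure.core.reducers :as r])

(defn plus-iv
  "Reducing fn: add the interval iv to [accepted boundary]."
  ([] [[] nil])
  ([[acc bnd] [c d :as iv]]
   (if (nil? bnd)
     [acc iv]
     (let [[a b] bnd]
       (if (<= b c (inc b))
         [acc [a d]]                  ; iv extends the open boundary
         [(conj acc bnd) iv])))))     ; gap: close boundary, start a new one

(defn combine-iv
  "Combining fn: join the results of two adjacent chunks, left before right."
  ([] [[] nil])
  ([[acc-a bnd-a :as left] [acc-b bnd-b :as right]]
   (cond
     (nil? bnd-a) right               ; left chunk had no matches
     (nil? bnd-b) left                ; right chunk had no matches
     :else
     (let [[a b] bnd-a
           [c d] (or (first acc-b) bnd-b)]   ; first interval of the right chunk
       (cond
         (not (<= b c (inc b))) [(into (conj acc-a bnd-a) acc-b) bnd-b]
         (empty? acc-b)         [acc-a [a d]]
         :else                  [(into (conj acc-a [a d]) (subvec acc-b 1)) bnd-b])))))

(defn runs
  "1-based [from to] runs of element in s, folded in chunks of size n."
  [n element s]
  (let [points (into [] (comp (keep-indexed #(when (= element %2) (inc %1)))
                              (map #(vector % %)))
                     s)
        [acc bnd] (r/fold n combine-iv plus-iv points)]
    (cond-> acc bnd (conj bnd))))
```

For example, (runs 2 \1 "00101110101110") should give [[3 3] [5 7] [9 9] [11 13]] regardless of the chunk size.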

Clojure Group Sequential Occurrences - Improve Function

I'm trying to group items that appear directly beside each other, as long as they are each in a given "white-list". Groupings must have at least two items to be included.
For example, first arg is the collection, second arg the whitelist.
(group-sequential [1 2 3 4 5] [2 3])
>> ((2 3))
(group-sequential ["The" "quick" "brown" "healthy" "fox" "jumped" "over" "the" "fence"]
["quick" "brown" "over" "fox" "jumped"])
>> (("quick" "brown") ("fox" "jumped" "over"))
(group-sequential [1 2 3 4 5 6 7] [2 3 6])
>> ((2 3))
This is what I've come up with:
(defn group-sequential
  [haystack needles]
  (loop [l haystack acc '()]
    (let [[curr more] (split-with #(some #{%} needles) l)]
      (if (< (count curr) 2)
        (if (empty? more) acc (recur (rest more) acc))
        (recur (rest more) (cons curr acc))))))
It works, but is pretty ugly. I wonder if there's a much simpler idiomatic way to do it in Clojure? (You should have seen the fn before I discovered split-with :)
I bet there's a nice one-liner with partition-by or something, but it's late and I can't quite seem to make it work.
(defn group-sequential [coll white]
  (->> coll
       (map (set white))
       (partition-by nil?)
       (filter (comp first next))))
... a tidier version of Diego Basch's method.
Here's my first attempt:
(defn group-sequential [xs wl]
  (let [s (set wl)
        f (map #(if (s %) %) xs)
        xs' (partition-by nil? f)]
    (remove #(or (nil? (first %)) (= 1 (count %))) xs')))
(defn group-sequential
  [coll matches]
  (let [matches-set (set matches)]
    (->> (partition-by (partial contains? matches-set) coll)
         (filter #(contains? matches-set (first %)))   ; keep only whitelisted runs
         (remove #(< (count %) 2)))))
Ok, I realized partition-by is pretty close to what I'm looking for, so I created this function which seems a lot more in line with the core stuff.
(defn partition-if
  "Returns a lazy seq of partitions of items that match the filter"
  [pred coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (let [[in more0] (split-with pred s)
            [out more] (split-with (complement pred) more0)]
        (if (empty? in)
          (partition-if pred more)
          (cons in (partition-if pred more)))))))
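(partition-if #(some #{%} [2 3 6]) [1 2 3 4 5 6 7])
>> ((2 3) (6))
Note that partition-if keeps single-item runs such as (6); to enforce the two-item minimum, drop them afterwards, e.g. (remove #(< (count %) 2) (partition-if ...)).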
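The same grouping can also be written with the partition-by and filter transducers (available since Clojure 1.7). A sketch with the hypothetical name group-sequential-xf; testing membership with contains? rather than calling the set directly keeps it correct even if nil or false were whitelisted:

```clojure
(defn group-sequential-xf
  "Hypothetical transducer variant: runs of at least two whitelisted items."
  [coll white]
  (let [white? (partial contains? (set white))]
    (sequence (comp (partition-by white?)                         ; runs of in/out items
                    (filter #(and (white? (first %)) (next %))))   ; whitelisted, 2+ items
              coll)))
```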

How do I partition a sequence into increasing sub sequences in Clojure?

I have a sequence of integers and I would like to partition it into increasing segments, using as few segments as possible. So I want to have
(segmentize [1 2 3 4 3 8 9 1 7] <=)
;=> [[1 2 3 4][3 8 9][1 7]]
I have implemented segmentize as follows:
(defn segmentize [col lte]
  (loop [col col s [] res []]
    (cond (empty? col) (conj res s)
          (empty? s) (recur (rest col) (conj s (first col)) res)
          (lte (last s) (first col)) (recur (rest col) (conj s (first col)) res)
          :else (recur col [] (conj res s)))))
But I was wondering if there is already some handy clojure function that does exactly this, or if there is a more idiomatic way to do this.
You can build this with partition-by
(defn segmentize [cmp coll]
  (let [switch (reductions = true (map cmp coll (rest coll)))]
    (map (partial map first) (partition-by second (map list coll switch)))))
(segmentize <= [1 2 3 4 3 8 9 1 7])
;=> ((1 2 3 4) (3 8 9) (1 7))
The first two maps of the last line may be changed to mapv if you really want vectors rather than lazy sequences.
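To see why the partition-by version works, look at the intermediate switch sequence for the example input. The initial true covers the first element, and every failed comparison flips the flag (because (= flag false) is the negation of flag), so each run of equal flags is exactly one increasing segment. The names below are illustrative:

```clojure
(def coll [1 2 3 4 3 8 9 1 7])

;; one comparison per adjacent pair: 4<=3 and 9<=1 are the only failures
(def pair-results (map <= coll (rest coll)))

;; fold the results with =: a true result keeps the flag, a false one flips it
(def switch (reductions = true pair-results))
;; (true true true true false false false true true)
```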
Another lazy implementation. Basically, count how many consecutive pairs of numbers return true for the lte function (take-while), then split the original collection after that many elements plus one (split-at). Repeat with the remainder of the collection:
(defn segmentize
  [coll lte]
  (lazy-seq
    (when-let [s (seq coll)]
      (let [pairs-in-segment (take-while (fn [[a b]] (lte a b)) (partition 2 1 s))
            [segment remainder] (split-at (inc (count pairs-in-segment)) s)]
        (cons segment
              (segmentize remainder lte))))))
This is a special case of some of the sequence-handling functions in org.flatland/useful, specifically flatland.useful.seq/partition-between:
(partition-between (partial apply >) xs)
If you require a from-scratch implementation with no external dependencies, I'd prefer dAni's answer.
Here is my version of segmentize (I called it split-when):
(defn split-when [f s]
  (reduce (fn [acc [a b]]
            (if (f b a)
              (conj acc [b])
              (update-in acc [(dec (count acc))] conj b)))
          [[(first s)]]
          (partition 2 1 s)))
(split-when < [1 2 3 4 3 8 9 1 7])
;; [[1 2 3 4] [3 8 9] [1 7]]
Because everybody loves lazy sequences:
(defn segmentize [coll cmp]
  (if-let [c (seq coll)]
    (lazy-seq
      (let [[seg rem] (reduce (fn [[head tail] x]
                                (if (cmp (last head) x)
                                  [(conj head x) (next tail)]
                                  (reduced [head tail])))
                              [(vec (take 1 c)) (drop 1 c)]
                              (drop 1 c))]
        (cons seg (segmentize rem cmp))))))
The code to compute each segment could probably be made a little less verbose using loop/recur, but I tend to find reduce more readable most of the time.
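Regarding the loop/recur remark: a sketch of that variant (hypothetical name segmentize-loop) builds each segment eagerly in a loop while keeping the sequence of segments lazy. It uses peek instead of last, which is constant-time on a vector:

```clojure
(defn segmentize-loop
  "Hypothetical loop/recur variant: each segment is built eagerly,
  the sequence of segments stays lazy."
  [coll cmp]
  (lazy-seq
    (when-let [[x & xs] (seq coll)]
      (loop [seg [x], xs xs]
        (if (and xs (cmp (peek seg) (first xs)))
          (recur (conj seg (first xs)) (next xs))     ; extend current segment
          (cons seg (segmentize-loop xs cmp)))))))    ; close it, continue lazily
```

Since the outer sequence is lazy, (take 2 (segmentize-loop (cycle [1 2 3]) <=)) also works on infinite input.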

Stumped with functional breadth-first tree traversal in Clojure?

Say I have a tree defined as per the recommendation in this post, although it's a vector in my case, which hopefully shouldn't matter (they're vectors in Programming Clojure book):
(def tree [1 [[2 [4] [5]] [3 [6]]]])
which should be something like:
    1
   / \
  2   3
 / \  |
4   5 6
Now, I'd like to do a breadth-first traversal of the tree without any of the traditional means such as the queue, and instead use exclusively the stack to pass information around. I know this isn't the easiest route, but I'm doing it mostly as an exercise. Also, at this point I'm not planning to return a collection (I'll figure that out afterwards as an exercise) but instead just print out the nodes as I travel through them.
My current solution (just starting out with Clojure, be nice):
(defn breadth-recur
  [queue]
  (if (empty? queue)
    (println "Done!")
    (let [collections (first (filter coll? queue))]
      (do
        ; print out nodes on the current level, they will not be wrapped
        ; in a [] vector and thus coll? will return false
        (doseq [node queue] (if (not (coll? node)) (println node)))
        (recur (reduce conj (first collections) (rest collections)))))))
The last line is not working as intended and I'm stumped about how to fix it. I know exactly what I want: I need to peel each layer of vectors and then concatenate the results to pass into recur.
The issue I'm seeing is mostly a:
IllegalArgumentException Don't know how to create ISeq from: java.lang.Long
Basically conj doesn't like appending a vector to a long, and if I swap conj for concat, then I fail when one of the two items I'm concatenating isn't a vector. Both conj and concat fail when facing:
[2 [4] [5] [3 [6]]]
I feel like I'm missing a really basic operation here that would work both on vectors and primitives in both positions.
Any suggestions?
Edit 1:
The tree should actually be (thanks Joost!):
(def tree [1 [2 [4] [5]] [3 [6]]])
However we still haven't found a breadth-first solution.
Since apparently there is still no breadth-first solution posted, here is a simple algorithm, implemented first eagerly, and then transformed to be lazy:
(defn bfs-eager [tree]
  (loop [ret [], queue (conj clojure.lang.PersistentQueue/EMPTY tree)]
    (if (seq queue)
      (let [[node & children] (peek queue)]
        (recur (conj ret node) (into (pop queue) children)))
      ret)))
(defn bfs-lazy [tree]
  ((fn step [queue]
     (lazy-seq
       (when (seq queue)
         (let [[node & children] (peek queue)]
           (cons node
                 (step (into (pop queue) children)))))))
   (conj clojure.lang.PersistentQueue/EMPTY tree)))
Your tree data is incorrect. It should be [1 [2 [4] [5]] [3 [6]]]
Also, you're mixing the tree traversal with printing and building up a result. Things get simpler if you concentrate on doing the hard part separately:
(def tree [1 [2 [4] [5]] [3 [6]]])
NOTE THIS IS DEPTH-FIRST. SEE BELOW
(defn bf "return elements in tree, depth-first"
  [[el left right]] ;; a tree is a seq of one element,
                    ;; followed by left and right child trees
  (if el
    (concat [el] (bf left) (bf right))))
(bf tree)
=> (1 2 4 5 3 6)
CORRECT VERSION
(defn bf [& roots]
  (if (seq roots)
    (concat (map first roots)                  ;; values in roots
            (apply bf (mapcat rest roots)))))  ;; recursively for children
(bf tree)
=> (1 2 3 4 5 6)
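If the tree may be very deep, the recursive bf above uses one stack frame per level, because (apply bf ...) is evaluated eagerly as an argument to concat. A loop/recur variant of the same level-by-level idea (hypothetical name bf-levels) avoids that and still needs no queue:

```clojure
(defn bf-levels
  "Hypothetical loop/recur variant of bf: values of tree, level by level."
  [tree]
  (loop [level [tree]   ; all subtrees at the current depth
         acc   []]      ; values collected so far
    (if (seq level)
      (recur (mapcat rest level)            ; children of every node, in order
             (into acc (map first) level))  ; values at this depth
      acc)))
```

With the tree from the question, (bf-levels [1 [2 [4] [5]] [3 [6]]]) returns [1 2 3 4 5 6].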
This might help: I wrote an algorithm to evaluate whether a tree is symmetric, and it uses a breadth-first traversal:
(defn node-values [nodes]
  (map first nodes))
(defn node-children [nodes]
  (mapcat next nodes))
(defn depth-traversal [nodes]
  (if (not (empty? nodes))
    (cons (node-values nodes) (depth-traversal (node-children nodes)))))
(defn tree-symmetric? [tree]
  (every?
    (fn [depth] (= depth (reverse depth)))
    (depth-traversal (list tree))))
(def tree '(1 (2 (3) (4)) (2 (4) (3))))
(node-values (list tree)) ; (1)
(node-children (list tree)) ; ((2 (3) (4)) (2 (4) (3)))
(depth-traversal (list tree)) ; ((1) (2 2) (3 4 4 3))
(tree-symmetric? tree) ; true
Many combinations of reduce and conj can be replaced with a single into call.
In the case above, with reduce you may need to pass an initial empty vector to reduce
to make conj happy.
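A small illustration of that remark, using the child list from the question: reduce with conj and an explicit [] seed gives the same result as a single into call, and the "peel one layer and concatenate" step the question asks about is into with the cat transducer (Clojure 1.7+). The names are illustrative:

```clojure
(def children [[2 [4] [5]] [3 [6]]])

;; reduce + conj needs an explicit [] seed ...
(def via-reduce (reduce conj [] children))

;; ... and into does the same in a single call
(def via-into (into [] children))

;; peeling one layer and concatenating the pieces
(def peeled (into [] cat children))
;; [2 [4] [5] 3 [6]]
```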
