Clojure Group Sequential Occurrences - Improve Function - algorithm

I'm trying to group items that appear directly beside each other, so long as they are each in a given "white-list". Groupings must have at least two items to be included.
For example, first arg is the collection, second arg the whitelist.
(group-sequential [1 2 3 4 5] [2 3])
>> ((2 3))
(group-sequential ["The" "quick" "brown" "healthy" "fox" "jumped" "over" "the" "fence"]
["quick" "brown" "over" "fox" "jumped"])
>> (("quick" "brown") ("fox" "jumped" "over"))
(group-sequential [1 2 3 4 5 6 7] [2 3 6])
>> ((2 3))
This is what I've come up with:
(defn group-sequential
  [haystack needles]
  (loop [l haystack acc '()]
    (let [[curr more] (split-with #(some #{%} needles) l)]
      (if (< (count curr) 2)
        (if (empty? more) acc (recur (rest more) acc))
        (recur (rest more) (cons curr acc))))))
It works, but is pretty ugly. I wonder if there's a much simpler idiomatic way to do it in Clojure? (You should have seen the fn before I discovered split-with :)
I bet there's a nice one-liner with partition-by or something, but it's late and I can't quite seem to make it work.

(defn group-sequential [coll white]
  (->> coll
       (map (set white))
       (partition-by nil?)
       (filter (comp first next))))
... a tidier version of Diego Basch's method.
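To see why this pipeline works, here is a sketch that names each intermediate step for the third example from the question. The names marked, runs and groups are only for illustration and are not part of the answer. It assumes the whitelist contains no nil or false, since those could not be told apart from "not whitelisted".

```clojure
;; A set used as a function returns the item if it is a member, else nil.
(def marked (map (set [2 3 6]) [1 2 3 4 5 6 7]))
;; => (nil 2 3 nil nil 6 nil)

;; partition-by starts a new run whenever nil? changes its answer.
(def runs (partition-by nil? marked))
;; => ((nil) (2 3) (nil nil) (6) (nil))

;; (comp first next) is truthy only when a run has a non-nil second item,
;; which drops both the nil runs and the single-item runs.
(def groups (filter (comp first next) runs))
;; => ((2 3))
```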

Here's my first attempt:
(defn group-sequential [xs wl]
  (let [s   (set wl)
        f   (map #(if (s %) %) xs)
        xs' (partition-by nil? f)]
    (remove #(or (nil? (first %)) (= 1 (count %))) xs')))

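(defn group-sequential
  [coll matches]
  (let [matches-set (set matches)]
    (->> (partition-by (partial contains? matches-set) coll)
         ;; every item in a run gives the same contains? result, so checking
         ;; the first item is enough; clojure.set/subset? would also compare
         ;; counts and reject a run that is longer than the whitelist
         (filter #(contains? matches-set (first %)))
         (remove #(< (count %) 2)))))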

Ok, I realized partition-by is pretty close to what I'm looking for, so I created this function which seems a lot more in line with the core stuff.
(defn partition-if
  "Returns a lazy seq of partitions of items that match the filter"
  [pred coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (let [[in more0] (split-with pred s)
            [out more] (split-with (complement pred) more0)]
        (if (empty? in)
          (partition-if pred more)
          (cons in (partition-if pred more)))))))
(partition-if #(some #{%} [2 3 6]) [1 2 3 4 5 6 7])
>> ((2 3) (6))
Runs of a single item, like (6) here, still have to be filtered out afterwards to meet the "at least two items" requirement.
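Since partition-if keeps matching runs of any length, a length filter is still needed to get the behaviour asked for in the question. A minimal sketch, assuming a hypothetical wrapper named group-sequential* and a whitelist without nil or false (so a set can serve as the predicate); partition-if is repeated in slightly condensed form so the snippet runs on its own:

```clojure
;; partition-if as above, with the unused `out` binding replaced by drop-while
(defn partition-if
  [pred coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (let [[in more0] (split-with pred s)
            more       (drop-while (complement pred) more0)]
        (if (empty? in)
          (partition-if pred more)
          (cons in (partition-if pred more)))))))

;; Hypothetical wrapper: keep only runs with at least two items.
;; (next run) is nil for a one-item run, so `filter next` drops it.
(defn group-sequential*
  [coll whitelist]
  (let [allowed? (set whitelist)]
    (filter next (partition-if allowed? coll))))

(group-sequential* [1 2 3 4 5 6 7] [2 3 6])
;; => ((2 3))
```

Unlike the loop version in the question, the groups come out in their original order and the result stays lazy.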

Related

Scheme filter but the result list contains indices not values

Is there an easy way to create a kind of filter whose result is the indices instead of the values?
for example:
'(#true #false #true) -> '(0 2)
You can create a custom filter procedure that returns a list of indices for elements of a list where a predicate produces a true value, as such:
(define (filtr pred lst)
  (for/list ([i lst]
             [n (in-naturals)]
             #:when (pred i))
    n))
For example,
> (filtr number? '(1 2 3 a b c 8 d 19 e f))
'(0 1 2 6 8)
> (filtr (lambda (x) (and x)) '(#true #false #true))
'(0 2)
Of course there is a way to do it; there are many ways to do it. Here's one way.
(define (func xs)
  (let loop ((index 0) (xs xs))
    (cond ((empty? xs) empty)
          ((car xs) (cons index (loop (add1 index) (cdr xs))))
          (else (loop (add1 index) (cdr xs))))))
(func '(#true #false #true #false #false #true #true))
;; => '(0 2 5 6)
Racket has a function for this (not sure if it was present when this question was asked): indexes-where, in racket/list.
Example:
> (indexes-where '(1 2 0 3 0 4) zero?)
'(2 4)

List processing in clojure, tail recursion needed

Given a sorted list of intervals, e.g.
(def lst (list [7 10] [32 35]))
I need to implement a function that adds a new interval to the list. If the new interval is adjacent to any of those from the list, they should be merged:
(= (add-range [1 3] lst) (list [1 3] [7 10] [32 35])) ;; prepend left
(= (add-range [1 6] lst) (list [1 10] [32 35])) ;; merge left
(= (add-range [11 20] lst) (list [7 20] [32 35])) ;; merge right
(= (add-range [11 31] lst) (list [7 35])) ;; merge left and right
This is my implementation:
(defn add-range
  [range range-list]
  (if (empty? range-list)
    (list range)
    (let [lo      (first range)
          hi      (second range)
          head    (first range-list)
          head-lo (dec (first head))
          head-hi (inc (second head))]
      (if (< hi head-lo)
        (cons range range-list)
        (if (= hi head-lo)
          (cons [lo (second head)] (rest range-list))
          (if (= lo head-hi)
            (recur [(first head) hi] (rest range-list))
            (cons head (add-range range (rest range-list)))))))))
It works and looks quite elegant too, but the last line contains a recursive call to add-range that cannot be replaced with recur because it is not in tail position. I'm planning to have long range lists in my application and I don't want to blow the stack.
How can this be rewritten using tail recursion?
Is there another approach to solve the problem? Lazy sequences maybe?
UPDATE
The sorted list is actually not required. This can be a set or even an unsorted list, but it would be really nice to do it in a single pass.
Using a sorted set you can implement it as:
;; first the constructor
(defn ranges [& rs]
  (apply sorted-set-by
         (fn [[from-a to-a] [from-b to-b]]
           (< to-a (dec from-b)))
         rs))
;; then add-range itself
(defn add-range [ranges [from to :as r]]
  ;; rs: all stored ranges that overlap or touch [from to]
  (let [rs     (subseq ranges >= [from from] <= [to to])
        ranges (reduce disj ranges rs)]
    (conj ranges
          (let [[from'] (or (first rs) r)
                [_ to'] (or (last rs) r)]
            [(min from from') (max to to')]))))
Let's try your tests:
=> (def lst (ranges [7 10] [32 35]))
#'user/lst
=> (add-range lst [1 3])
#{[1 3] [7 10] [32 35]}
=> (add-range lst [1 6])
#{[1 10] [32 35]}
=> (add-range lst [11 20])
#{[7 20] [32 35]}
=> (add-range lst [11 35])
#{[7 35]}
Addendum #1: add-range is O((m + 1) log n) where n is the size of the ranges set and m the number of merged intervals.
In my experience, making something tail recursive involves passing all local state as arguments. Looking at the algorithm, the local state is the list of range items already processed, i.e. final result = (ranges ignored + merged range + ranges that need not be considered).
Consider the following version, which explicitly passes a seq of already processed items.
(defn add-range
  [range-obj ranges]
  (loop [processed []
         range-obj range-obj
         remaining (list* ranges)]
    (if (empty? remaining)
      (conj processed range-obj)
      (let [[lo hi] range-obj
            [h-lo h-hi :as head] (first remaining)
            upper-merge-threshold (dec h-lo)
            lower-merge-threshold (inc h-hi)]
        (cond
          (< hi upper-merge-threshold) (into processed
                                             (conj remaining range-obj))
          (= hi upper-merge-threshold) (into processed
                                             (conj (rest remaining) [lo h-hi]))
          (= lo lower-merge-threshold) (recur processed
                                              [h-lo hi]
                                              (rest remaining))
          :else (recur (conj processed head)
                       range-obj
                       (rest remaining)))))))
My version accepts a vector and returns a vector. You could modify the relevant code to make it accept a list and return a list.
As for whether there is a better algorithm, I don't know. I have simply converted your algorithm to be tail recursive.
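On the "lazy sequences maybe?" part of the question: one option is to keep the original recursive shape but wrap the body in lazy-seq, so the non-tail call only builds a lazy cell instead of a stack frame. The following is a sketch under that assumption, not a tested replacement; the name add-range-lazy is mine, and like the original it only merges adjacent intervals, not overlapping ones.

```clojure
(defn add-range-lazy
  [[lo hi :as r] range-list]
  (lazy-seq
    (if-let [[[h-lo h-hi :as head] & more] (seq range-list)]
      (cond
        ;; new range ends before head (with a gap): prepend it
        (< hi (dec h-lo)) (cons r range-list)
        ;; new range touches head on the left: merge and stop
        (= hi (dec h-lo)) (cons [lo h-hi] more)
        ;; new range touches head on the right: merge and keep looking
        (= lo (inc h-hi)) (add-range-lazy [h-lo hi] more)
        ;; unrelated head: emit it; the recursive call is deferred by lazy-seq
        :else             (cons head (add-range-lazy r more)))
      ;; ran off the end of the list: the new range goes last
      (list r))))

(def lst (list [7 10] [32 35]))
(add-range-lazy [11 31] lst)
;; => ([7 35])
```

The result is a lazy seq rather than a list, which still compares equal to the lists in the question's tests.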

Clojure: Find locations of "1" in a string and print out them in the format of intervals

I'm trying to solve the following problem:
Given a string consisting of "1" and "0", find all the locations of "1", and print them in the format of intervals.
For example:
"00101110101110" => 3, 5-7, 9, 11-13
My (ugly) solution:
(defn bar [x]
  (letfn [(foo [mystr]
            (->>
              (map-indexed vector mystr)
              (filter #(= (second %) \1))
              (map (comp inc first))
              (partition-all 2 1)
              (filter #(= 2 (count %)))))]
    (let [y (map #(if (> (- (second %) (first %)) 1) (print (first %) ", " (second %) "-")) (foo x))]
      (print (ffirst y) "-" y (last (last (foo x)))))))
Explanation:
At first, I find the locations of "1" in the given string:
(->>
  (map-indexed vector mystr)
  (filter #(= (second %) \1))
  (map (comp inc first)))
"00101110101110" => (3 5 6 7 9 11 12 13)
Then, I partition the list of positions into a sequence of 2-element tuples. If there is a 1-element tuple at the end of that sequence, drop it:
(->>
  (map-indexed vector mystr)
  (filter #(= (second %) \1))
  (map (comp inc first))
  (partition-all 2 1)
  (filter #(= 2 (count %))))
"00101110101110" => ((3 5) (5 6) (6 7) (7 9) (9 11) (11 12) (12 13))
Finally, I print out the first position in the first tuple and the second one in the last tuple, while employing
(map #(if (> (- (second %) (first %)) 1) (print (first %) ", " (second %) "-")) (foo x)) to get the middle part.
The input:
(bar "00101110101110")
The final result:
3 , 5 -nil - (nil nil 7 , 9 -nil 9 , 11 -nil nil nil nil) 13
My questions:
How can I remove the nils in the final result?
How can I solve this problem in a more concise way?
In order to understand how to remove nils from the final result, let's understand how they get in there in the first place. The value bound to name y in the last let form is actually a sequence of all nil values. The function bar itself also returns nil. This occurs because print always returns nil and if returns nil when the condition is false and the "else" form is not present. Effectively, every value in the sequence returned by foo is converted to a nil. Non-nil values in the output are values printed as a side effect. nil and non-nil values are mixed because map is lazy and the mapping function is applied only when the last print realizes the lazy sequence y. Needless to say, using map for side effects is a bad idea.
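The two effects described above (print returning nil, and map delaying the printing until the sequence is realized) can be seen in isolation. A small sketch; the vector [1 2 3] and the names printed and output are just for illustration:

```clojure
;; map is lazy: defining the seq prints nothing yet.
(def printed (map print [1 2 3]))

;; Realizing the seq runs the prints; with-out-str captures them as a string.
(def output (with-out-str (doall printed)))
;; output  => "123"
;; printed => (nil nil nil), because print returns nil

;; For side effects, doseq is the usual tool: it is eager and returns nil once.
(doseq [x [1 2 3]]
  (print x))
```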
So the simplest way to remove nils from the output is to avoid nil values altogether.
(->> "00101110101110"
     (map-indexed vector)                         ;; ([0 \0] [1 \0] [2 \1]...
     (partition-by second)                        ;; (([0 \0] [1 \0]) ([2 \1]) ([3 \0]) ([4 \1] [5 \1] [6 \1]) ...
     (filter #(= \1 (-> % first second)))         ;; (([2 \1]) ([4 \1] [5 \1] [6 \1])...
     (map #(map (comp inc first) %))              ;; ((3) (5 6 7) (9) (11 12 13))
     (map #(if (next %) [(first %) (last %)] %))  ;; ((3) [5 7] (9) [11 13])
     (map #(clojure.string/join "-" %))           ;; ("3" "5-7" "9" "11-13")
     (clojure.string/join ", "))
;; => "3, 5-7, 9, 11-13"
I found this problem kinda interesting, so I tried to attack it with the approach from this talk: Higher parallelism by mapping the data into a more convenient space and then in parallel combining sub-solutions. To that end, I focused on producing the intervals themselves in parallel; using transducers to perform all intermediate steps, then making an eduction and folding over that. This kind of organisation makes for a number of helper functions and such, so maybe not as good on the concision but hopefully interesting anyway.
I go through an intermediate representation as nested vectors: [accepted boundary], where the interval represented by the 2-vector boundary grows until there is a discontinuity, in which case it is added to the end of accepted.
(require '[clojure.core.reducers] '[clojure.string])
(defn indices "Transducer finding indices of an element occurring in a sequence"
  [element]
  (keep-indexed #(when (= element %2) %1)))
(defn combine "Combine two series of intervals"
  ([] [[] nil])
  ([[acc-a bnd-a] [acc-b bnd-b]]
   (let [[[a b] [c d]] [bnd-a (first acc-b)]]
     (if (<= b c (inc b))
       ;; the first interval of acc-b is merged into [a d], so keep only the rest
       [(into acc-a (concat [[a d]] (rest acc-b))) bnd-b]
       [(into acc-a (concat [bnd-a] acc-b)) bnd-b]))))
(defn plus "Add an interval to the series"
  ([] [[] nil])
  ([[accepted boundary] to-add]
   (if (nil? boundary)
     [accepted to-add]
     (let [[[a b] [c d]] [boundary to-add]]
       (if (<= b c (inc b))
         [accepted [a d]]
         [(conj accepted boundary) to-add])))))
(defn printable-indices [element the-seq]
  (let [glommed (clojure.core.reducers/fold combine plus (eduction (comp (indices element) (map #(vector % %))) the-seq))
        fixed-up (conj (first glommed) (last glommed))] ;; Because the reduction is done, the last boundary is now accepted.
    (clojure.string/join ", " (map (fn [[a b]] (if (= a b) (str a) (str a \- b))) fixed-up))))

How do I partition a sequence into increasing sub sequences in Clojure?

I have a sequence of integers and I would like to partition it into increasing segments, using as few segments as possible. So I want to have
(segmentize [1 2 3 4 3 8 9 1 7] <=)
;=> [[1 2 3 4][3 8 9][1 7]]
I have implemented segmentize as follows:
(defn segmentize [col lte]
  (loop [col col s [] res []]
    (cond (empty? col) (conj res s)
          (empty? s) (recur (rest col) (conj s (first col)) res)
          (lte (last s) (first col)) (recur (rest col) (conj s (first col)) res)
          :else (recur col [] (conj res s)))))
But I was wondering if there is already some handy clojure function that does exactly this, or if there is a more idiomatic way to do this.
You can build this with partition-by:
(defn segmentize [cmp coll]
  (let [switch (reductions = true (map cmp coll (rest coll)))]
    (map (partial map first) (partition-by second (map list coll switch)))))
(segmentize <= [1 2 3 4 3 8 9 1 7])
;=> ((1 2 3 4) (3 8 9) (1 7))
The first two maps of the last line may be changed to mapv if you really want vectors rather than lazy sequences.
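For reference, a sketch of what that mapv variant could look like. The name segmentize-v is mine, chosen only to keep it apart from the lazy version; the logic is otherwise unchanged.

```clojure
(defn segmentize-v [cmp coll]
  ;; switch flips between true and false each time cmp fails for a
  ;; neighbouring pair, so equal neighbours in switch mark one segment
  (let [switch (reductions = true (map cmp coll (rest coll)))]
    (mapv (partial mapv first)
          (partition-by second (map list coll switch)))))

(segmentize-v <= [1 2 3 4 3 8 9 1 7])
;; => [[1 2 3 4] [3 8 9] [1 7]]
```

The trade-off is that mapv is eager, so this variant realizes the whole input and cannot be used on infinite sequences.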
Another lazy implementation. Basically, find out how many consecutive pairs of numbers return true for the "lte" function (take-while + segment) and then split the original collection by that number. Repeat with the remainder of the collection:
(defn segmentize
  [coll lte]
  (lazy-seq
    (when-let [s (seq coll)]
      (let [pairs-in-segment (take-while (fn [[a b]] (lte a b)) (partition 2 1 s))
            [segment remainder] (split-at (inc (count pairs-in-segment)) s)]
        (cons segment
              (segmentize remainder lte))))))
This is a special case of some of the sequence-handling functions in org.flatland/useful, specifically flatland.useful.seq/partition-between:
(partition-between (partial apply >) xs)
If you require a from-scratch implementation with no external dependencies, I'd prefer dAni's answer.
Here is my version of segmentize (I called it split-when):
(defn split-when [f s]
  (reduce (fn [acc [a b]]
            (if (f b a)
              (conj acc [b])
              (update-in acc [(dec (count acc))] conj b)))
          [[(first s)]]
          (partition 2 1 s)))
(split-when < [1 2 3 4 3 8 9 1 7])
;; [[1 2 3 4] [3 8 9] [1 7]]
Because everybody loves lazy sequences:
(defn segmentize [coll cmp]
  (if-let [c (seq coll)]
    (lazy-seq
      (let [[seg rem] (reduce (fn [[head tail] x]
                                (if (cmp (last head) x)
                                  [(conj head x) (next tail)]
                                  (reduced [head tail])))
                              [(vec (take 1 c)) (drop 1 c)]
                              (drop 1 c))]
        (cons seg (segmentize rem cmp))))))
The code to compute each segment could probably be made a little less verbose using loop/recur, but I tend to find reduce more readable most of the time.
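For comparison, here is roughly what the loop/recur variant mentioned above might look like. This is a sketch, not a drop-in replacement: the name segmentize-loop is mine, and it uses peek instead of last because peek is constant-time on vectors.

```clojure
(defn segmentize-loop [coll cmp]
  (lazy-seq
    (when-let [[x & more] (seq coll)]
      (let [[seg remaining]
            ;; grow seg while the next item compares fine with seg's last item
            (loop [seg [x], xs more]
              (if (and (seq xs) (cmp (peek seg) (first xs)))
                (recur (conj seg (first xs)) (rest xs))
                [seg xs]))]
        (cons seg (segmentize-loop remaining cmp))))))

(segmentize-loop [1 2 3 4 3 8 9 1 7] <=)
;; => ([1 2 3 4] [3 8 9] [1 7])
```

Each segment is computed eagerly, but the sequence of segments is lazy, so taking the first segment of an unbounded input only consumes what that segment needs.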

Filter a list into two parts by a predicate

I want to do
(filter-list-into-two-parts #'evenp '(1 2 3 4 5))
; => ((2 4) (1 3 5))
where a list is split into two sub-lists depending on whether a predicate evaluates to true. It is easy to define such a function:
(defun filter-list-into-two-parts (predicate list)
  (list (remove-if-not predicate list) (remove-if predicate list)))
but I would like to know if there is a built-in function in Lisp that can do this, or perhaps a better way of writing this function?
I don't think there is a built-in and your version is sub-optimal because it traverses the list twice and calls the predicate on each list element twice.
(defun filter-list-into-two-parts (predicate list)
  (loop for x in list
        if (funcall predicate x) collect x into yes
        else collect x into no
        finally (return (values yes no))))
I return two values instead of a list of them; this is more idiomatic: you use multiple-value-bind to extract yes and no from the multiple values returned instead of destructuring-bind to parse the list, and it conses less and is faster (see also values function in Common Lisp).
A more general version would be
(defun split-list (key list &key (test 'eql))
  (let ((ht (make-hash-table :test test)))
    (dolist (x list ht)
      (push x (gethash (funcall key x) ht '())))))
(split-list (lambda (x) (mod x 3)) (loop for i from 0 to 9 collect i))
==> #S(HASH-TABLE :TEST FASTHASH-EQL (2 . (8 5 2)) (1 . (7 4 1)) (0 . (9 6 3 0)))
Using REDUCE:
(reduce (lambda (a b)
          (if (evenp a)
              (push a (first b))
              (push a (second b)))
          b)
        '(1 2 3 4 5)
        :initial-value (list nil nil)
        :from-end t)
In dash.el there is a function -separate that does exactly what you ask:
(-separate 'evenp '(1 2 3 4)) ; => '((2 4) (1 3))
You can ignore the rest of the post if you use -separate. I had to implement Haskell's partition function in Elisp. Elisp is similar [1] in many respects to Common Lisp, so this answer will be useful for coders of both languages. My code was inspired by similar implementations for Python.
(defun partition-push (p xs)
  (let (trues falses) ; initialized to nil, nil = '()
    (mapc (lambda (x) ; like mapcar but for side-effects only
            (if (funcall p x)
                (push x trues)
              (push x falses)))
          xs)
    (list (reverse trues) (reverse falses))))
(defun partition-append (p xs)
  (reduce (lambda (r x)
            (if (funcall p x)
                (list (append (car r) (list x))
                      (cadr r))
              (list (car r)
                    (append (cadr r) (list x)))))
          xs
          :initial-value '(() ()) ; (list nil nil)
          ))
(defun partition-reduce-reverse (p xs)
  (mapcar #'reverse ; reverse both lists
          (reduce (lambda (r x)
                    (if (funcall p x)
                        (list (cons x (car r))
                              (cadr r))
                      (list (car r)
                            (cons x (cadr r)))))
                  xs
                  :initial-value '(() ())
                  )))
push destructively prepends an element to a list. I didn't use Elisp's add-to-list, because it only adds the same element once. mapc is a map function [2] that doesn't accumulate results. As Elisp, like Common Lisp, has separate namespaces for functions and variables [3], you have to use funcall to call a function received as a parameter. reduce is a higher-order function [4] that accepts the :initial-value keyword, which allows for versatile usage. append concatenates a variable number of lists.
In the code, partition-push is imperative Common Lisp that uses the widespread "push and reverse" idiom: you first build the lists by prepending, which is O(1) per element, and then reverse them once in O(n). Appending a single element to a list is O(n), because lists are implemented as chains of cons cells, so appending n items is O(n²). partition-append illustrates adding to the end. As I'm a functional programming fan, I wrote the side-effect-free version with reduce in partition-reduce-reverse.
Emacs has a benchmarking macro, benchmark-run. I ran it against these three functions, and against -separate for comparison. The first element of each returned list is the total elapsed time in seconds. As you can see, appending to a list is extremely slow, while the functional variant is the quickest of my three; -separate is faster still.
ELISP> (benchmark-run 100 (-separate #'evenp (number-sequence 0 1000)))
(0.043594004 0 0.0)
ELISP> (benchmark-run 100 (partition-push #'evenp (number-sequence 0 1000)))
(0.468053176 7 0.2956386049999793)
ELISP> (benchmark-run 100 (partition-append #'evenp (number-sequence 0 1000)))
(7.412973128 162 6.853687342999947)
ELISP> (benchmark-run 100 (partition-reduce-reverse #'evenp (number-sequence 0 1000)))
(0.217411618 3 0.12750035599998455)
References
[1] Differences between Common Lisp and Emacs Lisp
[2] Map higher-order function
[3] Technical Issues of Separation in Function Cells and Value Cells
[4] Fold higher-order function
I don't think that there is a partition function in the Common Lisp standard, but there are libraries that provide such a utility (with documentation and source).
CL-USER> (ql:quickload :arnesi)
CL-USER> (arnesi:partition '(1 2 3 4 5) 'evenp 'oddp)
((2 4) (1 3 5))
CL-USER> (arnesi:partition '(1 2 b "c") 'numberp 'symbolp 'stringp)
((1 2) (B) ("c"))
