Clojure: Find locations of "1" in a string and print them out in the format of intervals - algorithm

I'm trying to solve the following problem:
Given a string consisting of "1" and "0", find all the locations of "1", and print them in the format of intervals.
For example:
"00101110101110" => 3, 5-7, 9, 11-13
My (ugly) solution:
(defn bar [x]
(letfn [(foo [mystr]
(->>
(map-indexed vector mystr)
(filter #(= (second %) \1))
(map (comp inc first))
(partition-all 2 1)
(filter #(= 2 (count %)))))]
(let [y (map #(if (> (- (second %) (first %)) 1) (print (first %) ", " (second %) "-")) (foo x))]
(print (ffirst y) "-" y (last (last (foo x)))))))
Explanation:
At first, I find the locations of "1" in the given string:
(->>
(map-indexed vector mystr)
(filter #(= (second %) \1))
(map (comp inc first)))
"00101110101110" => (3 5 6 7 9 11 12 13)
Then, I partition the list of positions into a sequence of 2-element tuples. If there is a 1-element tuple at the end of that sequence, I drop it:
(->>
(map-indexed vector mystr)
(filter #(= (second %) \1))
(map (comp inc first))
(partition-all 2 1)
(filter #(= 2 (count %))))
"00101110101110" => ((3 5) (5 6) (6 7) (7 9) (9 11) (11 12) (12 13))
At last, I print out the first position in the first tuple and the second one in the last tuple, while employing
(map #(if (> (- (second %) (first %)) 1) (print (first %) ", " (second %) "-")) (foo x)) to get the middle part.
The input:
(bar "00101110101110")
The final result:
3 , 5 -nil - (nil nil 7 , 9 -nil 9 , 11 -nil nil nil nil) 13
My questions:
How can I remove the nils in the final result?
How can I solve this problem in a more concise way?

In order to understand how to remove nils from the final result, let's understand how they get in there in the first place. The value bound to name y in the last let form is actually a sequence of all nil values. The function bar itself also returns nil. This occurs because print always returns nil and if returns nil when the condition is false and the "else" form is not present. Effectively, every value in the sequence returned by foo is converted to a nil. Non-nil values in the output are values printed as a side effect. nil and non-nil values are mixed because map is lazy and the mapping function is applied only when the last print realizes the lazy sequence y. Needless to say, using map for side effects is a bad idea.
So the simplest way to remove nils from the output is to avoid nil values altogether.
(->> "00101110101110"
(map-indexed vector) ;; ([0 \0] [1 \0] [2 \1]...
(partition-by second) ;; (([0 \0] [1 \0]) ([2 \1]) ([3 \0]) ([4 \1] [5 \1] [6 \1]) ...
(filter #(= \1 (-> % first second))) ;; (([2 \1]) ([4 \1] [5 \1] [6 \1])...
(map #(map (comp inc first) %)) ;; ((3) (5 6 7) (9) (11 12 13))
(map #(if (next %) [(first %) (last %)] %)) ;; ((3) [5 7] (9) [11 13])
(map #(clojure.string/join "-" %)) ;; ("3" "5-7" "9" "11-13")
(clojure.string/join ", "))
;; => "3, 5-7, 9, 11-13"
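The point about print returning nil and map being lazy can be checked in isolation. The following is a minimal sketch, not part of the answer above; the names returned, printed and printed-eagerly are made up for illustration.

```clojure
;; `print` returns nil and `when` returns nil on a false test,
;; so mapping this function yields nothing but nils.
;; `with-out-str` captures the printed output; `doall` forces the lazy seq
;; while the capture is still active.
(def returned (atom nil))

(def printed
  (with-out-str
    (reset! returned
            (doall (map #(when (odd? %) (print %)) [1 2 3])))))
;; @returned is (nil nil nil); printed is "13"

;; For side effects, `doseq` is eager and makes the intent explicit.
(def printed-eagerly
  (with-out-str
    (doseq [x [1 2 3] :when (odd? x)]
      (print x))))
;; printed-eagerly is "13"
```

The values worth keeping are the printed strings, not the return value of the mapping, which is why building the string first and printing it once is simpler.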

I found this problem kinda interesting, so I tried to attack it with the approach from this talk: Higher parallelism by mapping the data into a more convenient space and then in parallel combining sub-solutions. To that end, I focused on producing the intervals themselves in parallel; using transducers to perform all intermediate steps, then making an eduction and folding over that. This kind of organisation makes for a number of helper functions and such, so maybe not as good on the concision but hopefully interesting anyway.
I go through an intermediate representation as nested vectors: [accepted boundary], where the interval represented by the 2-vector boundary grows until there is a discontinuity, in which case it is added to the end of accepted.
(defn indices "Transducer finding indices of an element occurring in a sequence"
[element]
(keep-indexed #(when (= element %2) %1)))
(defn combine "Combine two series of intervals"
([] [[] nil])
([[acc-a bnd-a] [acc-b bnd-b]]
(let[ [[a b] [c d]] [bnd-a (first acc-b)] ]
(if (<= b c (inc b))
[(into acc-a (concat [[a d]] (pop acc-b) )) bnd-b]
[(into acc-a (concat [bnd-a] acc-b)) bnd-b]))))
(defn plus "Add an interval to the series"
([] [[] nil])
([[accepted boundary] to-add]
(if (nil? boundary)
[accepted to-add]
(let[[[a b] [c d]] [boundary to-add]]
(if (<= b c (inc b))
[accepted [a d]]
[(conj accepted boundary) to-add])))))
(defn printable-indices [element the-seq]
(let [glommed (clojure.core.reducers/fold combine plus (eduction (comp (indices element) (map #(vector % %))) the-seq))
fixed-up (conj (first glommed) (last glommed))] ;; The reduction is done, so the last boundary is now accepted.
(clojure.string/join ", " (map (fn [[a b]] (if (= a b) (str a) (str a \- b))) fixed-up))))
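For comparison, here is a minimal sequential sketch of the same interval-growing idea without the reducers machinery. The names grow and interval-string are hypothetical. Positions are 1-based here to match the question, and the open interval is kept as the last element of the vector instead of in a separate boundary slot.

```clojure
(require '[clojure.string :as str])

;; Extend the last interval when `i` directly follows it, otherwise open a new one.
(defn grow [intervals i]
  (let [[a b] (peek intervals)]          ; (peek []) is nil, so a and b are nil at the start
    (if (and b (= i (inc b)))
      (conj (pop intervals) [a i])
      (conj intervals [i i]))))

(defn interval-string [element s]
  (->> s
       (keep-indexed (fn [i x] (when (= x element) (inc i))))  ; 1-based positions
       (reduce grow [])
       (map (fn [[a b]] (if (= a b) (str a) (str a "-" b))))
       (str/join ", ")))

(interval-string \1 "00101110101110")
;; => "3, 5-7, 9, 11-13"
```

This gives up the parallel fold, so it only shows the step function, not the combining of sub-solutions.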

Related

Combinations in scheme without direct map?

Related to this question: Pair combinations in scheme, I'm trying to write a function that creates possible sequences of a list. I'm also trying to annotate it to myself with some lets, rather than putting everything in maps. Here is what I have so far:
(define (remove-from-list elem L)
(filter (lambda (x) (not (= x elem))) L))
(define (prepend-element-to-list-of-lists elem L)
(map (lambda (x) (append (list elem) x)) L))
(define (perm L)
; returns a list of lists, so base case will be '(()) rather than '()
(if (null? L) '(())
; we will take out the first element, this is our "prepend-item"
(let ((prepend-element (car L))
(list-minus-self (remove-from-list (car L) L)))
; prepend-to-list-of-lists
(let ((other-lists-minus-self (perm list-minus-self)))
(prepend-element-to-list-of-lists prepend-element other-lists-minus-self)
))))
(perm '(1 2 3))
((1 2 3)) ; seems to be stopping before doing the recursive cases/iterations.
What I'm trying to do here is to take out the first element of a list, and prepend that to all list-of-lists that would be created by the procedure without that element. For example, for [1,2,3] the first case would be:
Take out 1 --> prepended to combinations from [2,3], and so eventually it comes to [1,2,3] and [1,3,2].
However, I wanted to see if I can do this without map, with the function just calling itself. Is there a way to do that, or is map the only way to do the above for 1, then 2, then 3, and so on?
And related to this, for the "working normal case", why does the following keep nesting parentheticals?
(define (perm L)
(if (null? L) '(())
; (apply append <-- why is this part required?
(map (lambda (elem)
(map (lambda (other_list) (cons elem other_list))
(perm (remove-from-list elem L))))
L)))
; )
That is, without doing an (apply append) outside the map, I get the "correct" answer, but with tons of nested parens: (((1 (2 (3))) (1 (3 (2)))) ((2 (1 (3))) (2 (3 (1)))) ((3 (1 (2))) (3 (2 (1))))). I suppose if someone could just show an example of a more basic setup where a map 'telescopes' without the big function that might be helpful.
Regarding "where do the parens come from": it's about types. The function being mapped turns an "element" into a "list of elements", so if you map it over a list of elements, you turn each element in the list into a list of elements:
[ 1, 2, 3, ] -->
[ [ 1a, 1b ], [2a], [] ]
, say, (in general; not with those functions in question). And since there's recursion there, we then have something like
[ [ [1a1], [] ], [[]], [] ]
, and so on.
So map foo is listof elts -> listof (listof elts):
`foo` is: elt -> (listof elts)
-------------------------------------------------------
`map foo` is: listof elts -> listof (listof elts)
But if we apply append after the map on each step, we've leveled it into the listof elts -> listof elts,
`map foo`: listof elts -> listof (listof elts)
`apply append`: listof (listof elts) -> listof elts
----------------------------------------------------------------------
`flatmap foo`: listof elts -> listof elts
and so no new parens are popping up -- since they are leveled at each step when they appear, so they don't accumulate like that; the level of nestedness stays the same.
That's what apply append does: it removes the inner parens:
(apply append [ [x, ...], [y, ...], [z, ...] ] ) ==
( append [x, ...] [y, ...] [z, ...] ) ==
[ x, ..., y, ..., z, ... ]
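So, as an example (taking flatmap to be map followed by apply append, i.e. (define (flatmap f xs) (apply append (map f xs)))):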
> (define (func x) (if (= 0 (remainder x 3)) '()
(if (= 0 (remainder x 2)) (list (+ x 1))
(list (+ x 1) (+ x 2)))))
> (display (map func (list 1 2 3 4)))
((2 3) (3) () (5))
> (display (map (lambda (xs) (map func xs)) (map func (list 1 2 3 4))))
(((3) ()) (()) () ((6 7)))
> (display (flatmap func (list 1 2 3 4)))
(2 3 3 5)
> (display (flatmap func (flatmap func (list 1 2 3 4))))
(3 6 7)
Now that the types fit, the flatmap funcs compose nicely, unlike without the flattening. Same happens during recursion in that function. The deeper levels of recursion work on the deeper levels of the result list. And without the flattening this creates more nestedness.
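For readers following the Clojure threads on this page, the same leveling can be sketched in Clojure, where the built-in mapcat plays the role of flatmap. This is a port for illustration, not part of the answer above; func mirrors the Scheme example, and perm assumes distinct elements, as the original remove-from-list does.

```clojure
;; Port of the Scheme `func`: returns a list of 0, 1 or 2 elements.
(defn func [x]
  (cond (zero? (rem x 3)) '()
        (zero? (rem x 2)) (list (+ x 1))
        :else             (list (+ x 1) (+ x 2))))

(map func [1 2 3 4])                   ;; => ((2 3) (3) () (5))   nesting grows
(mapcat func [1 2 3 4])                ;; => (2 3 3 5)            leveled
(mapcat func (mapcat func [1 2 3 4]))  ;; => (3 6 7)              still flat

;; Permutations with the flattening done at every level of the recursion.
(defn perm [xs]
  (if (empty? xs)
    '(())
    (mapcat (fn [x]
              (map #(cons x %) (perm (remove #{x} xs))))
            xs)))

(perm [1 2 3])
;; => ((1 2 3) (1 3 2) (2 1 3) (2 3 1) (3 1 2) (3 2 1))
```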

List processing in clojure, tail recursion needed

Given a sorted list of intervals, e.g.
(def lst (list [7 10] [32 35]))
I need to implement a function that adds a new interval to the list. If the new interval is adjacent to any of those from the list, they should be merged:
(= (add-range [1 3] lst) (list [1 3] [7 10] [32 35])) ;; prepend left
(= (add-range [1 6] lst) (list [1 10] [32 35])) ;; merge left
(= (add-range [11 20] lst) (list [7 20] [32 35])) ;; merge right
(= (add-range [11 31] lst) (list [7 35])) ;; merge left and right
This is my implementation:
(defn add-range
[range range-list]
(if (empty? range-list)
(list range)
(let
[lo (first range)
hi (second range)
head (first range-list)
head-lo (dec (first head))
head-hi (inc (second head))]
(if (< hi head-lo)
(cons range range-list)
(if (= hi head-lo)
(cons [lo (second head)] (rest range-list))
(if (= lo head-hi)
(recur [(first head) hi] (rest range-list))
(cons head (add-range range (rest range-list)))))))))
It works and looks quite elegant too, but the last line contains a recursive call to add-range that cannot be replaced with recur because it is not in tail position. I'm planning to have long range lists in my application and I don't want to blow the stack.
How can this be rewritten using tail recursion?
Is there another approach to solve the problem? Lazy sequences maybe?
UPDATE
The sorted list is actually not required. This can be a set or even an unsorted list, but it would be really nice to do it in a single pass.
Using a sorted set you can implement it as:
;; first the constructor
(defn ranges [& rs]
(apply sorted-set-by
(fn [[from-a to-a] [from-b to-b]]
(< to-a (dec from-b))) rs))
;; then add-range itself
(defn add-range [ranges [from to :as r]]
(let [rs (subseq ranges <= [from from] <= [to to])
ranges (reduce disj ranges rs)]
(conj ranges
(let [[from'] (or (first rs) r)
[_ to'] (or (last rs) r)]
[(min from from') (max to to')]))))
Let's try your tests:
=> (def lst (ranges [7 10] [32 35]))
#'user/lst
=> (add-range lst [1 3])
#{[1 3] [7 10] [32 35]}
=> (add-range lst [1 6])
#{[1 10] [32 35]}
=> (add-range lst [11 20])
#{[7 20] [32 35]}
=> (add-range lst [11 35])
#{[7 35]}
Addendum #1: add-range is O((m + 1) log n) where n is the size of the ranges set and m the number of merged intervals.
In my experience, making something tail recursive involves passing all local state as arguments. Looking at the algorithm, the local state is the range items already processed, i.e. final result = (ranges ignored + merged-range + ranges not required to be considered).
Consider the following version, it explicitly passes a seq of already processed items.
(defn add-range
[range-obj ranges]
(loop [processed []
range-obj range-obj
remaining (list* ranges)]
(if (empty? remaining)
(conj processed range-obj)
(let [[lo hi] range-obj
[h-lo h-hi :as head] (first remaining)
upper-merge-threshold (dec h-lo)
lower-merge-threshold (inc h-hi)]
(cond
(< hi upper-merge-threshold) (into processed
(conj remaining range-obj))
(= hi upper-merge-threshold) (into processed
(conj (rest remaining) [lo h-hi]))
(= lo lower-merge-threshold) (recur processed
[h-lo hi]
(rest remaining))
:else (recur (conj processed head)
range-obj
(rest remaining)))))))
My version accepts a vector and returns a vector. You could modify the relevant code to make it accept a list and return a list.
As for whether there is a better algorithm, I don't know. I have simply converted your algorithm to be tail recursive.
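On the "lazy sequences maybe?" part of the question, one possible sketch uses split-with twice. It is not taken from any answer above: the name add-range-lazy is hypothetical, and it assumes the input is sorted and its intervals neither overlap nor touch each other.

```clojure
;; Split the sorted ranges into three parts: strictly left of the new range,
;; touching or overlapping it, and strictly right of it.
(defn add-range-lazy [[lo hi] ranges]
  (let [[before more]    (split-with (fn [[_ b]] (< b (dec lo))) ranges)
        [touching after] (split-with (fn [[a _]] (<= a (inc hi))) more)
        ;; with no touching ranges, (apply min lo ()) is just lo
        merged           [(apply min lo (map first touching))
                          (apply max hi (map second touching))]]
    (concat before [merged] after)))

(def lst (list [7 10] [32 35]))

(add-range-lazy [1 3] lst)    ;; => ([1 3] [7 10] [32 35])
(add-range-lazy [1 6] lst)    ;; => ([1 10] [32 35])
(add-range-lazy [11 20] lst)  ;; => ([7 20] [32 35])
(add-range-lazy [11 31] lst)  ;; => ([7 35])
```

No recursion is involved, so the stack does not grow with the length of the list. If many calls are chained, wrapping each result in vec or doall avoids piling up nested lazy concat calls.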

Clojure Group Sequential Occurrences - Improve Function

I'm trying to group items that appear directly beside each other, so long as each of them is in a given "white-list". A grouping must have at least two items to be included.
For example, first arg is the collection, second arg the whitelist.
(group-sequential [1 2 3 4 5] [2 3])
>> ((2 3))
(group-sequential ["The" "quick" "brown" "healthy" "fox" "jumped" "over" "the" "fence"]
["quick" "brown" "over" "fox" "jumped"])
>> (("quick" "brown") ("fox" "jumped" "over"))
(group-sequential [1 2 3 4 5 6 7] [2 3 6])
>> ((2 3))
This is what I've come up with:
(defn group-sequential
[haystack needles]
(loop [l haystack acc '()]
(let [[curr more] (split-with #(some #{%} needles) l)]
(if (< (count curr) 2)
(if (empty? more) acc (recur (rest more) acc))
(recur (rest more) (cons curr acc))))))
It works, but is pretty ugly. I wonder if there's a much simpler idiomatic way to do it in Clojure? (You should have seen the fn before I discovered split-with :)
I bet there's a nice one-liner with partition-by or something, but it's late and I can't quite seem to make it work.
(defn group-sequential [coll white]
(->> coll
(map (set white))
(partition-by nil?)
(filter (comp first next))))
... a tidier version of Diego Basch's method.
Here's my first attempt:
(defn group-sequential [xs wl]
(let [s (set wl)
f (map #(if (s %) %) xs)
xs' (partition-by nil? f)]
(remove #(or (nil? (first %)) (= 1 (count %))) xs')))
(defn group-sequential
[coll matches]
(let [matches-set (set matches)]
(->> (partition-by (partial contains? matches-set) coll)
(filter #(clojure.set/subset? % matches-set))
(remove #(< (count %) 2)))))
Ok, I realized partition-by is pretty close to what I'm looking for, so I created this function which seems a lot more in line with the core stuff.
(defn partition-if
"Returns a lazy seq of partitions of items that match the filter"
[pred coll]
(lazy-seq
(when-let [s (seq coll)]
(let [[in more0] (split-with pred s)
[out more] (split-with (complement pred) more0)]
(if (empty? in)
(partition-if pred more)
(cons in (partition-if pred more)))))))
(partition-if #(some #{%} [2 3 6]) [1 2 3 4 5 6 7])
>> ((2 3) (6))
;; single-item groups such as (6) still have to be dropped, e.g. with (filter next ...)
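Putting the pieces from the answers together, a compact sketch might look like the following. It is one possible combination, not any single answer's code; the local name allowed is made up, and contains? is used so that nil or false items in the collection do not confuse the test.

```clojure
(defn group-sequential [coll whitelist]
  (let [allowed (set whitelist)]
    (->> coll
         ;; runs of whitelisted items alternate with runs of other items
         (partition-by #(contains? allowed %))
         ;; keep whitelisted runs that have at least two items
         (filter #(and (next %) (contains? allowed (first %)))))))

(group-sequential [1 2 3 4 5] [2 3])
;; => ((2 3))
(group-sequential [1 2 3 4 5 6 7] [2 3 6])
;; => ((2 3))
```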

How do I partition a sequence into increasing sub sequences in Clojure?

I have a sequence of integers and I would like to partition it into increasing segments, with as few segments as possible. So I want to have
(segmentize [1 2 3 4 3 8 9 1 7] <=)
;=> [[1 2 3 4][3 8 9][1 7]]
I have implemented segmentize as follows:
(defn segmentize [col lte]
(loop [col col s [] res []]
(cond (empty? col) (conj res s)
(empty? s) (recur (rest col) (conj s (first col)) res)
(lte (last s) (first col)) (recur (rest col) (conj s (first col)) res)
:else (recur col [] (conj res s)))))
But I was wondering if there is already some handy clojure function that does exactly this, or if there is a more idiomatic way to do this.
You can build this with partition-by
(defn segmentize [cmp coll]
(let [switch (reductions = true (map cmp coll (rest coll)))]
(map (partial map first) (partition-by second (map list coll switch)))))
(segmentize <= [1 2 3 4 3 8 9 1 7])
;=> ((1 2 3 4) (3 8 9) (1 7))
The first two maps of the last line may be changed to mapv if you really want vectors rather than lazy sequences.
Another lazy implementation. Basically, find out how many consecutive pairs of numbers return true for the "lte" function (take-while + segment) and then split the original collection at that number. Repeat with the remainder of the collection:
(defn segmentize
[coll lte]
(lazy-seq
(when-let [s (seq coll)]
(let [pairs-in-segment (take-while (fn [[a b]] (lte a b)) (partition 2 1 s))
[segment reminder] (split-at (inc (count pairs-in-segment)) s)]
(cons segment
(segmentize reminder lte))))))
This is a special case of some of the sequence-handling functions in org.flatland/useful, specifically flatland.useful.seq/partition-between:
(partition-between (partial apply >) xs)
If you require a from-scratch implementation with no external dependencies, I'd prefer dAni's answer.
Here is my version of segmentize (I called it split-when):
(defn split-when [f s]
(reduce (fn [acc [a b]]
(if (f b a)
(conj acc [b])
(update-in acc [(dec (count acc))] conj b)))
[[(first s)]]
(partition 2 1 s)))
(split-when < [1 2 3 4 3 8 9 1 7])
;; [[1 2 3 4] [3 8 9] [1 7]]
Because everybody loves lazy sequences:
(defn segmentize [coll cmp]
(if-let [c (seq coll)]
(lazy-seq
(let [[seg rem] (reduce (fn [[head tail] x]
(if (cmp (last head) x)
[(conj head x) (next tail)]
(reduced [head tail])))
[(vec (take 1 c)) (drop 1 c)]
(drop 1 c))]
(cons seg (segmentize rem cmp))))))
The code to compute each segment could probably be made a little less verbose using loop/recur, but I tend to find reduce more readable most of the time.
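As a rough illustration of that last remark, the segment computation could be written with loop/recur along these lines. This is a sketch under the same argument order as the version above; the name segmentize-loop is hypothetical.

```clojure
(defn segmentize-loop [coll cmp]
  (lazy-seq
    (when-let [[x & more] (seq coll)]
      (let [[seg remaining]
            ;; grow the current segment while consecutive items satisfy cmp
            (loop [seg [x] xs more]
              (if (and (seq xs) (cmp (peek seg) (first xs)))
                (recur (conj seg (first xs)) (rest xs))
                [seg xs]))]
        (cons seg (segmentize-loop remaining cmp))))))

(segmentize-loop [1 2 3 4 3 8 9 1 7] <=)
;; => ([1 2 3 4] [3 8 9] [1 7])
```

Each segment is built eagerly as a vector, while the sequence of segments stays lazy.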

Filter a list into two parts by a predicate

I want to do
(filter-list-into-two-parts #'evenp '(1 2 3 4 5))
; => ((2 4) (1 3 5))
where a list is split into two sub-lists depending on whether a predicate evaluates to true. It is easy to define such a function:
(defun filter-list-into-two-parts (predicate list)
(list (remove-if-not predicate list) (remove-if predicate list)))
but I would like to know if there is a built-in function in Lisp that can do this, or perhaps a better way of writing this function?
I don't think there is a built-in and your version is sub-optimal because it traverses the list twice and calls the predicate on each list element twice.
(defun filter-list-into-two-parts (predicate list)
(loop for x in list
if (funcall predicate x) collect x into yes
else collect x into no
finally (return (values yes no))))
I return two values instead of a list of them; this is more idiomatic. You will use multiple-value-bind to extract yes and no from the multiple values returned, instead of using destructuring-bind to parse the list. It also conses less and is faster (see also the values function in Common Lisp).
A more general version would be
(defun split-list (key list &key (test 'eql))
(let ((ht (make-hash-table :test test)))
(dolist (x list ht)
(push x (gethash (funcall key x) ht '())))))
(split-list (lambda (x) (mod x 3)) (loop for i from 0 to 9 collect i))
==> #S(HASH-TABLE :TEST FASTHASH-EQL (2 . (8 5 2)) (1 . (7 4 1)) (0 . (9 6 3 0)))
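For readers coming from the Clojure threads on this page, the built-in group-by is a close counterpart of split-list, and the two-way split can be built on top of it. This is a sketch for comparison only, not a Common Lisp answer; split-by is a hypothetical name.

```clojure
;; group-by makes a single pass and keeps the original order within each group.
(group-by #(mod % 3) (range 10))
;; => {0 [0 3 6 9], 1 [1 4 7], 2 [2 5 8]}

;; Two-way split in one pass; `boolean` normalizes truthy/falsey results
;; so that the keys are exactly true and false.
(defn split-by [pred coll]
  (let [groups (group-by (comp boolean pred) coll)]
    [(get groups true []) (get groups false [])]))

(split-by even? [1 2 3 4 5])
;; => [[2 4] [1 3 5]]

;; The shorter two-pass idiom, analogous to remove-if-not plus remove-if:
((juxt filter remove) even? [1 2 3 4 5])
;; => [(2 4) (1 3 5)]
```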
Using REDUCE:
(reduce (lambda (a b)
(if (evenp a)
(push a (first b))
(push a (second b)))
b)
'(1 2 3 4 5)
:initial-value (list nil nil)
:from-end t)
In dash.el there is a function -separate that does exactly what you ask:
(-separate 'evenp '(1 2 3 4)) ; => '((2 4) (1 3))
You can ignore the rest of the post if you use -separate. I had to implement Haskell's partition function in Elisp. Elisp is similar [1] in many respects to Common Lisp, so this answer will be useful for coders of both languages. My code was inspired by similar implementations for Python.
(defun partition-push (p xs)
(let (trues falses) ; initialized to nil, nil = '()
(mapc (lambda (x) ; like mapcar but for side-effects only
(if (funcall p x)
(push x trues)
(push x falses)))
xs)
(list (reverse trues) (reverse falses))))
(defun partition-append (p xs)
(reduce (lambda (r x)
(if (funcall p x)
(list (append (car r) (list x))
(cadr r))
(list (car r)
(append (cadr r) (list x)))))
xs
:initial-value '(() ()) ; (list nil nil)
))
(defun partition-reduce-reverse (p xs)
(mapcar #'reverse ; reverse both lists
(reduce (lambda (r x)
(if (funcall p x)
(list (cons x (car r))
(cadr r))
(list (car r)
(cons x (cadr r)))))
xs
:initial-value '(() ())
)))
push is a destructive macro that prepends an element to a list. I didn't use Elisp's add-to-list, because it adds a given element only once. mapc is a map function [2] that doesn't accumulate results. Since Elisp, like Common Lisp, has separate namespaces for functions and variables [3], you have to use funcall to call a function received as a parameter. reduce is a higher-order function [4] that accepts an :initial-value keyword, which allows for versatile usage. append concatenates a variable number of lists.
In the code, partition-push is imperative Common Lisp that uses the widespread "push and reverse" idiom: you first build the lists by prepending, which is O(1) per element, and then reverse them once, which is O(n). Appending a single item to a list is O(n) because lists are implemented as cons cells, so appending n items is O(n²). partition-append illustrates adding to the end. As I'm a functional programming fan, I wrote the side-effect-free version with reduce in partition-reduce-reverse.
Emacs has a profiling tool. I ran it against these three functions, with -separate for comparison. The first element of each returned list is the total elapsed time in seconds. As you can see, appending to a list is extremely slow, while the functional variant is the quickest of my three.
ELISP> (benchmark-run 100 (-separate #'evenp (number-sequence 0 1000)))
(0.043594004 0 0.0)
ELISP> (benchmark-run 100 (partition-push #'evenp (number-sequence 0 1000)))
(0.468053176 7 0.2956386049999793)
ELISP> (benchmark-run 100 (partition-append #'evenp (number-sequence 0 1000)))
(7.412973128 162 6.853687342999947)
ELISP> (benchmark-run 100 (partition-reduce-reverse #'evenp (number-sequence 0 1000)))
(0.217411618 3 0.12750035599998455)
References
[1] Differences between Common Lisp and Emacs Lisp
[2] Map higher-order function
[3] Technical Issues of Separation in Function Cells and Value Cells
[4] Fold higher-order function
I don't think that there is a partition function in the Common Lisp standard, but there are libraries that provide such a utility (with documentation and source).
CL-USER> (ql:quickload :arnesi)
CL-USER> (arnesi:partition '(1 2 3 4 5) 'evenp 'oddp)
((2 4) (1 3 5))
CL-USER> (arnesi:partition '(1 2 b "c") 'numberp 'symbolp 'stringp)
((1 2) (B) ("c"))
