How do I partition a sequence into increasing sub sequences in Clojure? - algorithm

I have a sequence of integers and I would like to partition them into increasing segments and I want to have as little as possible segments. So I want to have
(segmentize [1 2 3 4 3 8 9 1 7] <=)
;=> [[1 2 3 4][3 8 9][1 7]]
I have implemented segmentize as follows:
(defn segmentize [col lte]
(loop [col col s [] res []]
(cond (empty? col) (conj res s)
(empty? s) (recur (rest col) (conj s (first col)) res)
(lte (last s) (first col)) (recur (rest col) (conj s (first col)) res)
:else (recur col [] (conj res s)))))
But I was wondering if there is already some handy clojure function that does exactly this, or if there is a more idiomatic way to do this.

You can build this with partition-by
(defn segmentize [cmp coll]
(let [switch (reductions = true (map cmp coll (rest coll)))]
(map (partial map first) (partition-by second (map list coll switch)))))
(segmentize <= [1 2 3 4 3 8 9 1 7])
;=> ((1 2 3 4) (3 8 9) (1 7))
The first two maps of the last line may be changed to mapv if you really want vectors rather than lazy sequences.

Another lazy implementation. Basically find out how many consecutive pairs of numbers return true for the "lte" function (take-while + segment) and then split the original collection by that number. Repeat with the reminder collection:
(defn segmentize
[coll lte]
(lazy-seq
(when-let [s (seq coll)]
(let [pairs-in-segment (take-while (fn [[a b]] (lte a b)) (partition 2 1 s))
[segment reminder] (split-at (inc (count pairs-in-segment)) s)]
(cons segment
(segmentize reminder lte))))))

This is a special case of some of the sequence-handling functions in org.flatland/useful, specifically flatland.useful.seq/partition-between:
(partition-between (partial apply >) xs)
If you require a from-scratch implementation with no external dependencies, I'd prefer dAni's answer.

Here is my version of segmentize (I called in split-when):
(defn split-when [f s]
(reduce (fn [acc [a b]]
(if (f b a)
(conj acc [b])
(update-in acc [(dec (count acc))] conj b)))
[[(first s)]]
(partition 2 1 s)))
(split-when < [1 2 3 4 3 8 9 1 7])
;; [[1 2 3 4] [3 8 9] [1 7]]

Because everybody loves lazy sequences:
(defn segmentize [coll cmp]
(if-let [c (seq coll)]
(lazy-seq
(let [[seg rem] (reduce (fn [[head tail] x]
(if (cmp (last head) x)
[(conj head x) (next tail)]
(reduced [head tail])))
[(vec (take 1 c)) (drop 1 c)]
(drop 1 c))]
(cons seg (segmentize rem cmp))))))
The code to compute each segment could probably be made a little less verbose using loop/recur, but I tend to find reduce more readable most of the time.

Related

Is there a Racket builtin to repeat an element n times as a list?

I realize this is trivially to implement myself, but I was curious if Racket has a builtin or syntactic equivalent to the following in Python:
>>> n = 5
>>> element = "arbitrary string"
>>> [element] * n
['arbitrary string', 'arbitrary string', 'arbitrary string', 'arbitrary string', 'arbitrary string']
If not, what is the idiomatic way to do this sort of thing in Racket? Right now my way of doing the above in Racket is:
(let ((n 5)
(element "arbitrary string"))
(map (λ (x) element)
(range n)))
Any suggestions are immensely appreciated, thanks!
Python's ["arbitrary string"] * 5 can be translated to (make-list 5 "arbitrary string") in Racket.
However, that's often not what you want, because elements are shared. This is totally fine for immutable values, but for mutable values, it could have an undesired consequence:
In Python
>>> xs = [[]] * 5
>>> xs[0].append(1)
>>> xs
[[1], [1], [1], [1], [1]]
In Racket:
> (define xs (make-list 5 (box 0)))
> (set-box! (first xs) 1)
> xs
'(#&1 #&1 #&1 #&1 #&1)
In Python, you could use the list comprehension to avoid the problem:
>>> xs = [[] for x in range(5)]
>>> xs[0].append(1)
>>> xs
[[1], [], [], [], []]
In Racket, you can use build-list.
> (define xs (build-list 5 (thunk* (box 0))))
> (set-box! (first xs) 1)
> xs
'(#&1 #&0 #&0 #&0 #&0)
Here are ways that also avoid the problem:
(build-list 5 (thunk* (box 0)))
(for/list ([x (in-range 5)]) (box 0))
(for/list ([x (range 5)]) (box 0))
(map (thunk* (box 0)) (range 5))
The last two are not recommended, because it needs to create the list (range 5) first, which is inefficient (it's similar to Python's ["hello" for x in list(range(5))]).
Note that (thunk* v) is equivalent to (lambda (ignored...) v), which is the reason why you get "fresh" v, avoiding the element sharing problem. However, if you intentionally want to share elements, you can also use (const v) instead of (thunk* v).
> (define xs (build-list 5 (const (box 0))))
> (set-box! (first xs) 1)
> xs
'(#&1 #&1 #&1 #&1 #&1)
Lastly, build-list in fact gives you the index as well. I used thunk* previously because for your problem, we don't need the index. However, if you need it, you can use it:
> (build-list 5 (lambda (x) (* 2 x)))
'(0 2 4 6 8)
You are looking for make-list. Here (make-list k v) makes a list with k elements all of which are v. There is a similar functions make-vector that, well, makes vectors:
> (make-list 5 "foo")
'("foo" "foo" "foo" "foo" "foo")
> (make-vector 5 1)
'#(1 1 1 1 1)
Lookup both make-list and build-list and compare.

How to find partitions of a list in Scheme

Say there is any given list in Scheme. This list is ‘(2 3 4)
I want to find all possible partitions of this list. This means a partition where a list is separated into two subsets such that every element of the list must be in one or the other subsets but not both, and no element can be left out of a split.
So, given the list ‘(2 3 4), I want to find all such possible partitions. These partitions would be the following: {2, 3} and {4}, {2, 4} and {3}, and the final possible partition being {3, 4} and {2}.
I want to be able to recursively find all partitions given a list in Scheme, but I have no ideas on how to do so. Code or psuedocode will help me if anyone can provide it for me!
I do believe I will have to use lambda for my recursive function.
I discuss several different types of partitions at my blog, though not this specific one. As an example, consider that an integer partition is the set of all sets of positive integers that sum to the given integer. For instance, the partitions of 4 is the set of sets ((1 1 1 1) (1 1 2) (1 3) (2 2) (4)).
The process is building the partitions is recursive. There is a single partition of 0, the empty set (). There is a single partition of 1, the set (1). There are two partitions of 2, the sets (1 1) and (2). There are three partitions of 3, the sets (1 1 1), (1 2) and (3). There are five partitions of 4, the sets (1 1 1 1), (1 1 2), (1 3), (2 2), and (4). There are seven partitions of 5, the sets (1 1 1 1 1), (1 1 1 2), (1 2 2), (1 1 3), (1 4), (2 3) and (5). And so on. In each case, the next-larger set of partitions is determined by adding each integer x less than or equal to the desired integer n to all the sets formed by the partition of n − x, eliminating any duplicates. Here's how I implement that:
Petite Chez Scheme Version 8.4
Copyright (c) 1985-2011 Cadence Research Systems
> (define (set-cons x xs)
(if (member x xs) xs
(cons x xs)))
> (define (parts n)
(if (zero? n) (list (list))
(let x-loop ((x 1) (xs (list)))
(if (= x n) (cons (list n) xs)
(let y-loop ((yss (parts (- n x))) (xs xs))
(if (null? yss) (x-loop (+ x 1) xs)
(y-loop (cdr yss)
(set-cons (sort < (cons x (car yss)))
xs))))))))
> (parts 6)
((6) (3 3) (2 2 2) (2 4) (1 1 4) (1 1 2 2) (1 1 1 1 2)
(1 1 1 1 1 1) (1 1 1 3) (1 2 3) (1 5))
I'm not going to solve your homework for you, but your solution will be similar to the one given above. You need to state your algorithm in recursive fashion, then write code to implement that algorithm. Your recursion is going to be something like this: For each item in the set, add the item to each partition of the remaining items of the set, eliminating duplicates.
That will get you started. If you have specific questions, come back here for additional help.
EDIT: Here is my solution. I'll let you figure out how it works.
(define range (case-lambda ; start, start+step, ..., start+step<stop
((stop) (range 0 stop (if (negative? stop) -1 1)))
((start stop) (range start stop (if (< start stop) 1 -1)))
((start stop step) (let ((le? (if (negative? step) >= <=)))
(let loop ((x start) (xs (list)))
(if (le? stop x) (reverse xs) (loop (+ x step) (cons x xs))))))
(else (error 'range "unrecognized arguments"))))
(define (sum xs) (apply + xs)) ; sum of elements of xs
(define digits (case-lambda ; list of base-b digits of n
((n) (digits n 10))
((n b) (do ((n n (quotient n b))
(ds (list) (cons (modulo n b) ds)))
((zero? n) ds)))))
(define (part k xs) ; k'th lexicographical left-partition of xs
(let loop ((ds (reverse (digits k 2))) (xs xs) (ys (list)))
(if (null? ds) (reverse ys)
(if (zero? (car ds))
(loop (cdr ds) (cdr xs) ys)
(loop (cdr ds) (cdr xs) (cons (car xs) ys))))))
(define (max-lcm xs) ; max lcm of part-sums of 2-partitions of xs
(let ((len (length xs)) (tot (sum xs)))
(apply max (map (lambda (s) (lcm s (- tot s)))
(map sum (map (lambda (k) (part k xs))
(range (expt 2 (- len 1)))))))))
(display (max-lcm '(2 3 4))) (newline) ; 20
(display (max-lcm '(2 3 4 6))) (newline) ; 56
You can find all 2-partitions of a list using the built-in combinations procedure. The idea is, for every element of a (len-k)-combination, there will be an element in the k-combination that complements it, producing a pair of lists whose union is the original list and intersection is the empty list.
For example:
(define (2-partitions lst)
(define (combine left right)
(map (lambda (x y) (list x y)) left right))
(let loop ((m (sub1 (length lst)))
(n 1))
(cond
((< m n) '())
((= m n)
(let* ((l (combinations lst m))
(half (/ (length l) 2)))
(combine (take l half)
(reverse (drop l half)))))
(else
(append
(combine (combinations lst m)
(reverse (combinations lst n)))
(loop (sub1 m) (add1 n)))))))
then you can build the partitions as:
(2-partitions '(2 3 4))
=> '(((2 3) (4))
((2 4) (3))
((3 4) (2)))
(2-partitions '(4 6 7 9))
=> '(((4 6 7) (9))
((4 6 9) (7))
((4 7 9) (6))
((6 7 9) (4))
((4 6) (7 9))
((4 7) (6 9))
((6 7) (4 9)))
Furthermore, you can find the max lcm of the partitions:
(define (max-lcm lst)
(define (local-lcm arg)
(lcm (apply + (car arg))
(apply + (cadr arg))))
(apply max (map local-lcm (2-partitions lst))))
For example:
(max-lcm '(2 3 4))
=> 20
(max-lcm '(4 6 7 9))
=> 165
To partition a list is straightforward recursive non-deterministic programming.
Given an element, we put it either into one bag, or the other.
The very first element will go into the first bag, without loss of generality.
The very last element must go into an empty bag only, if such is present at that time. Since we start by putting the first element into the first bag, it can only be the second:
(define (two-parts xs)
(if (or (null? xs) (null? (cdr xs)))
(list xs '())
(let go ((acc (list (list (car xs)) '())) ; the two bags
(xs (cdr xs)) ; the rest of list
(i (- (length xs) 1)) ; and its length
(z '()))
(if (= i 1) ; the last element in the list is reached:
(if (null? (cadr acc)) ; the 2nd bag is empty:
(cons (list (car acc) (list (car xs))) ; add only to the empty 2nd
z) ; otherwise,
(cons (list (cons (car xs) (car acc)) (cadr acc)) ; two solutions,
(cons (list (car acc) (cons (car xs) (cadr acc))) ; adding to
z))) ; either of the two bags;
(go (list (cons (car xs) (car acc)) (cadr acc)) ; all solutions after
(cdr xs) ; adding to the 1st bag
(- i 1) ; and then,
(go (list (car acc) (cons (car xs) (cadr acc))) ; all solutions
(cdr xs) ; after adding
(- i 1) ; to the 2nd instead
z))))))
And that's that!
In writing this I was helped by following this earlier related answer of mine.
Testing:
(two-parts (list 1 2 3))
; => '(((2 1) (3)) ((3 1) (2)) ((1) (3 2)))
(two-parts (list 1 2 3 4))
; => '(((3 2 1) (4))
; ((4 2 1) (3))
; ((2 1) (4 3))
; ((4 3 1) (2))
; ((3 1) (4 2))
; ((4 1) (3 2))
; ((1) (4 3 2)))
It is possible to reverse the parts before returning, or course; I wanted to keep the code short and clean, without the extraneous details.
edit: The code makes use of a technique by Richard Bird, of replacing (append (g A) (g B)) with (g' A (g' B z)) where (append (g A) y) = (g' A y) and the initial value for z is an empty list.
Another possibility is for the nested call to go to be put behind lambda (as the OP indeed suspected) and activated when the outer call to go finishes its job, making the whole function tail recursive, essentially in CPS style.

List processing in clojure, tail recursion needed

Given a sorted list of intervals, e.g.
(def lst (list [7 10] [32 35]))
I need to implement a function that adds a new interval to the list. If the new interval is adjacent to any of those from the list, they should be merged:
(= (add-range [1 3] lst) (list [1 3] [7 10] [32 35])) ;; prepend left
(= (add-range [1 6] lst) (list [1 10] [32 35])) ;; merge left
(= (add-range [11 20] lst) (list [7 20] [32 35])) ;; merge right
(= (add-range [11 31] lst) (list [7 35])) ;; merge left and right
This is my implementation:
(defn add-range
[range range-list]
(if (empty? range-list)
(list range)
(let
[lo (first range)
hi (second range)
head (first range-list)
head-lo (dec (first head))
head-hi (inc (second head))]
(if (< hi head-lo)
(cons range range-list)
(if (= hi head-lo)
(cons [lo (second head)] (rest range-list))
(if (= lo head-hi)
(recur [(first head) hi] (rest range-list))
(cons head (add-range range (rest range-list)))))))))
It works and looks quite elegant too, but the last line contains a recursive call add-range which can not be replaced with recur because it is not the last call. I'm planning to have long range lists in my application and I don't want to blow up the stack.
How this can be rewritten using the tail recursion?
Is there another approach to solve the problem? Lazy sequences maybe?
UPDATE
The sorted list is actually not required. This can be a set or even an unsorted list, but it would be really nice to do it in a single pass.
Using a sorted set you can implement it as:
;; first the constructor
(defn ranges [& rs]
(apply sorted-set-by
(fn [[from-a to-a] [from-b to-b]]
(< to-a (dec from-b))) rs))
;; then add-range itself
(defn add-range [ranges [from to :as r]]
(let [rs (subseq ranges <= [from from] <= [to to])
ranges (reduce disj ranges rs)]
(conj ranges
(let [[from'] (or (first rs) r)
[_ to'] (or (last rs) r)]
[(min from from') (max to to')]))))
Let's try your tests:
=> (def lst (ranges [7 10] [32 35]))
#'user/lst
=> (add-range lst [1 3])
#{[1 3] [7 10] [32 35]}
=> (add-range lst [1 6])
#{[7 10] [32 35]}
=> (add-range lst [11 20])
#{[7 20] [32 35]}
=> (add-range lst [11 35])
#{[7 35]}
Addendum #1: add-range is O((m + 1) log n) where n is the size of the ranges set and m the number of merged intervals.
In my experience making something tail recursive involves passing as arguments all local state. Looking at the algo, it looks like already processed range items is the local state. ie, final result = (ranges ignored + merged-range + ranges not required to be considered).
Consider the following version, it explicitly passes a seq of already processed items.
(defn add-range
[range-obj ranges]
(loop [processed []
range-obj range-obj
remaining (list* ranges)]
(if (empty? remaining)
(conj processed range-obj)
(let [[lo hi] range-obj
[h-lo h-hi :as head] (first remaining)
upper-merge-threshold (dec h-lo)
lower-merge-threshold (inc h-hi)]
(cond
(< hi upper-merge-threshold) (into processed
(conj remaining range-obj))
(= hi upper-merge-threshold) (into processed
(conj (rest remaining) [lo h-hi]))
(= lo lower-merge-threshold) (recur processed
[h-lo hi]
(rest remaining))
:else (recur (conj processed head)
range-obj
(rest remaining)))))))
My version accepts a vector and returns a vector. You could modify the relevant code to make it accept a list and return a list.
As for is there a better algorithm, I don't know. I have simply converted your algo to be tail recursive.

Clojure: Find locations of "1" in a string and print out them in the format of intervals

I'm tring to solve such a problem:
Given a string consisting of "1" and "0", find all the locations of "1", and print them in the format of intervals.
For example:
"00101110101110" => 3, 5-7, 9, 11-13
My (ugly) solution:
(defn bar [x]
(letfn [(foo [mystr]
(->>
(map-indexed vector mystr)
(filter #(= (second %) \1))
(map (comp inc first))
(partition-all 2 1)
(filter #(= 2 (count %)))))]
(let [y (map #(if (> (- (second %) (first %)) 1) (print (first %) ", " (second %) "-")) (foo x))]
(print (ffirst y) "-" y (last (last (foo x)))))))
Explanation:
At first, I find the locations of "1" in the given string:
(->>
(map-indexed vector mystr)
(filter #(= (second %) \1))
(map (comp inc first)))
"00101110101110" => (3 5 6 7 9 11 12 13)
Then, I partition the list of positions into a sequence of 2-element-tuples. If there is an 1-element-tuple at the end of that sequence, drop it:
(->>
(map-indexed vector mystr)
(filter #(= (second %) \1))
(map (comp inc first))
(partition-all 2 1)
(filter #(= 2 (count %))))
"00101110101110" => ((3 5) (5 6) (6 7) (7 9) (9 11) (11 12) (12 13))
At last, I print out the first position in the first tuple and the second one in the last tuple, while employing
(map #(if (> (- (second %) (first %)) 1) (print (first %) ", " (second %) "-")) (foo x)) to get the middle part.
The input:
(bar "00101110101110")
The final result:
3 , 5 -nil - (nil nil 7 , 9 -nil 9 , 11 -nil nil nil nil) 13
My questions:
How can I remove the nils in the final result?
How can I solve this problem in a more concise way?
In order to understand how to remove nils from the final result, let's understand how they get in there in the first place. The value bound to name y in the last let form is actually a sequence of all nil values. The function bar itself also returns nil. This occurs because print always returns nil and if returns nil when the condition is false and the "else" form is not present. Effectively, every value in the sequence returned by foo is converted to a nil. Non-nil values in the output are values printed as a side effect. nil and non-nil values are mixed because map is lazy and the mapping function is applied only when the last print realizes the lazy sequence y. Needless to say, using map for side effects is a bad idea.
So the simplest way to remove nils from the output is to avoid nil values altogether.
(->> "00101110101110"
(map-indexed vector) ;; ([0 \0] [1 \0] [2 \1]...
(partition-by second) ;; (([0 \0] [1 \0]) ([2 \1]) ([3 \0]) ([4 \1] [5 \1] [6 \1]) ...
(filter #(= \1 (-> % first second))) ;; (([2 \1]) ([4 \1] [5 \1] [6 \1])...
(map #(map (comp inc first) %)) ;; ((3) (5 6 7) (9) (11 12 13))
(map #(if (next %) [(first %) (last %)] %)) ;; ((3) [5 7] (9) [11 13])
(map #(clojure.string/join "-" %)) ;; ("3" "5-7" "9" "11-13")
(clojure.string/join ", "))
;; => "3, 5-7, 9, 11-13"
I found this problem kinda interesting, so I tried to attack it with the approach from this talk: Higher parallelism by mapping the data into a more convenient space and then in parallel combining sub-solutions. To that end, I focused on producing the intervals themselves in parallel; using transducers to perform all intermediate steps, then making an eduction and folding over that. This kind of organisation makes for a number of helper functions and such, so maybe not as good on the concision but hopefully interesting anyway.
I go through an intermediate representation as nested vectors: [accepted boundary], where the interval represented by the 2-vector boundary grows until there is a discontinuity, in which case it is added to the end of accepted.
(defn indices "Transducer finding indices of an element occuring in a sequence"
[element]
(keep-indexed #(when (= element %2) %1)))
(defn combine "Combine two series of intervals"
([] [[] nil])
([[acc-a bnd-a] [acc-b bnd-b]]
(let[ [[a b] [c d]] [bnd-a (first acc-b)] ]
(if (<= b c (inc b))
[(into acc-a (concat [[a d]] (pop acc-b) )) bnd-b]
[(into acc-a (concat [bnd-a] acc-b)) bnd-b]))))
(defn plus "Add an interval to the series"
([] [[] nil])
([[accepted boundary] to-add]
(if (nil? boundary)
[accepted to-add]
(let[[[a b] [c d]] [boundary to-add]]
(if (<= b c (inc b))
[accepted [a d]]
[(conj accepted boundary) to-add])))))
(defn printable-indices [element the-seq]
(let[glommed (clojure.core.reducers/fold combine plus (eduction (comp (indices \1) (map #(vector % %))) the-seq))
fixed-up (conj (first glommed) (last glommed))] ;;Because the reduction is done, the last boundary is now accepted.
(clojure.string/join ", " (map (fn [[a b]](if (= a b) (str a) (str a \- b)))) fixed-up)))

Clojure Group Sequential Occurrences - Improve Function

I'm trying to group items that appear directly beside each other, so long as they are each in a given "white-list". Groupings must have at least two or more items to be included.
For example, first arg is the collection, second arg the whitelist.
(group-sequential [1 2 3 4 5] [2 3])
>> ((2 3))
(group-sequential ["The" "quick" "brown" "healthy" "fox" "jumped" "over" "the" "fence"]
["quick" "brown" "over" "fox" "jumped"])
>> (("quick" "brown") ("fox" "jumped" "over"))
(group-sequential [1 2 3 4 5 6 7] [2 3 6])
>> ((2 3))
This is what I've come up with:
(defn group-sequential
[haystack needles]
(loop [l haystack acc '()]
(let [[curr more] (split-with #(some #{%} needles) l)]
(if (< (count curr) 2)
(if (empty? more) acc (recur (rest more) acc))
(recur (rest more) (cons curr acc))))))
It works, but is pretty ugly. I wonder if there's a much simpler idiomatic way to do it in Clojure? (You should have seen the fn before I discovered split-with :)
I bet there's a nice one-liner with partition-by or something, but it's late and I can't quite seem to make it work.
(defn group-sequential [coll white]
(->> coll
(map (set white))
(partition-by nil?)
(filter (comp first next))))
... a tidier version of Diego Basch's method.
Here's my first attempt:
(defn group-sequential [xs wl]
(let [s (set wl)
f (map #(if (s %) %) xs)
xs' (partition-by nil? f)]
(remove #(or (nil? (first %)) (= 1 (count %))) xs')))
(defn group-sequential
[coll matches]
(let [matches-set (set matches)]
(->> (partition-by (partial contains? matches-set) coll)
(filter #(clojure.set/subset? % matches-set))
(remove #(< (count %) 2)))))
Ok, I realized partition-by is pretty close to what I'm looking for, so I created this function which seems a lot more in line with the core stuff.
(defn partition-if
"Returns a lazy seq of partitions of items that match the filter"
[pred coll]
(lazy-seq
(when-let [s (seq coll)]
(let [[in more0] (split-with pred s)
[out more] (split-with (complement pred) more0)]
(if (empty? in)
(partition-if pred more)
(cons in (partition-if pred more)))))))
(partition-if #(some #{%} [2 3 6]) [1 2 3 4 5 6 7])
>> ((2 3))

Resources