How to remove sequential matches in vector in Clojure?

How to remove sequential matches in vector in Clojure? - algorithm

Let's say I have a vector ["a" "b" "c" "a" "a" "b"]. If given a sequence ["a" "b"], how can I remove all instances of that sequence (in order)? Here, the result would just be ["c" "a"].

If sequences that need to be removed are known in advance, core.match may be useful for your task:
(require '[clojure.core.match :refer [match]])
(defn remove-patterns [seq]
(match seq
["a" "b" & xs] (remove-patterns xs)
[x & xs] (cons x (remove-patterns xs))
[] ()))
(remove-patterns ["a" "b" "c" "a" "a" "b"]) ;; => ("c" "a")

The short answer is to treat it as a string and do a regex remove:
(defn remove-ab [v]
(mapv str (clojure.string/replace (apply str v) #"ab" "")))
(remove-ab ["a" "b" "c" "a" "a" "b"])
=> ["c" "a"]
The long answer is to implement your own regex state machine by iterating through the sequence, identifying matches, and returning a sequence without them.
Automat can help with making your own low level regex state machine:
https://github.com/ztellman/automat
Instaparse can be used to make rich grammas:
https://github.com/Engelberg/instaparse
You don't really need a library for such a small match, you can implement it as a loop:
(defn remove-ab [v]
(loop [[c & remaining] v
acc []
saw-a false]
(cond
(nil? c) (if saw-a (conj acc "a") acc) ;; terminate
(and (= "b" c) saw-a) (recur remaining acc false) ;; ignore ab
(= "a" c) (recur remaining (if saw-a (conj acc "a") acc) true) ;; got a
(and (not= "b" c) saw-a) (recur remaining (conj (conj acc "a") c) false) ;; keep ac
:else (recur remaining (conj acc c) false)))) ;; add c
But getting all the conditions right can be tricky... hence why a formal regex or state machine is advantageous.
Or a recursive definition:
(defn remove-ab [[x y & rest]]
(cond
(and (= x "a") (= y "b")) (recur rest)
(nil? x) ()
(nil? y) [x]
:else (cons x (remove-ab (cons y rest)))))

Recursive solution for a 2-element subsequence:
(defn f [sq [a b]]
(when (seq sq)
(if
(and
(= (first sq) a)
(= (second sq) b))
(f (rest (rest sq)) [a b])
(cons (first sq) (f (rest sq) [a b])))))
not exhaustively tested but seems to work.

A simple solution using lazy-seq, take and drop working for any finite subseq and any (including infinite) sequence that needs to be filtered:
(defn remove-subseq-at-start
[subseq xs]
(loop [xs xs]
(if (= (seq subseq) (take (count subseq) xs))
(recur (drop (count subseq) xs))
xs)))
(defn remove-subseq-all [subseq xs]
(if-let [xs (seq (remove-subseq-at-start subseq xs))]
(lazy-seq (cons (first xs) (remove-subseq subseq (rest xs))))
()))
(deftest remove-subseq-all-test
(is (= ["c" "a"] (remove-subseq-all ["a" "b"] ["a" "b" "a" "b" "c" "a" "a" "b"])))
(is (= ["a"] (remove-subseq-all ["a" "b"] ["a"])))
(is (= ["a" "b"] (remove-subseq-all [] ["a" "b"])))
(is (= [] (remove-subseq-all ["a" "b"] ["a" "b" "a" "b"])))
(is (= [] (remove-subseq-all ["a" "b"] nil)))
(is (= [] (remove-subseq-all [] [])))
(is (= ["a" "b" "a" "b"] (->> (remove-subseq-all ["c" "d"] (cycle ["a" "b" "c" "d"]))
(drop 2000000)
(take 4))))
(is (= (seq "ca") (remove-subseq-all "ab" "ababcaab"))))

If you can ensure that the input is a vector, we can use subvec to check on every element whether the following subvector of the same length matches the pattern. If so, we omit it, otherwise we move ahead to the next element in the vector:
(let [pattern ["a" "b"]
source ["a" "b" "c" "a" "a" "b"]]
(loop [source source
pattern-length (count pattern)
result []]
(if (< (count source) pattern-length)
(into [] (concat result source))
(if (= pattern (subvec source 0 pattern-length))
; skip matched part of source
(recur (subvec source pattern-length) pattern-length result)
; otherwise move ahead one element and save it as result
(recur (subvec source 1) pattern-length
(conj result (first source)))))))
With general sequences, you could use the same approach, substituting take and drop as appropriate.

Related

Combinations in scheme without direct map?

Related to this question: Pair combinations in scheme, I'm trying to write a function that creates possible sequences of a list. I'm also trying to annotate it to myself with some lets, rather than putting everything in maps. Here is what I have so far:
(define (remove-from-list elem L)
(filter (lambda (x) (not (= x elem))) L))
(define (prepend-element-to-list-of-lists elem L)
(map (lambda (x) (append (list elem) x)) L))
(define (perm L)
; returns a list of lists, so base case will be '(()) rather than '()
(if (null? L) '(())
; we will take out the first element, this is our "prepend-item"
(let ((prepend-element (car L))
(list-minus-self (remove-from-list (car L) L)))
; prepend-to-list-of-lists
(let ((other-lists-minus-self (perm list-minus-self)))
(prepend-element-to-list-of-lists prepend-element other-lists-minus-self)
))))
(perm3 '(1 2 3))
((1 2 3)) ; seems to be stopping before doing the recursive cases/iterations.
What I'm trying to do here is to take out the first element of a list, and prepend that to all list-of-lists that would be created by the procedure without that element. For example, for [1,2,3] the first case would be:
Take out 1 --> prepended to combinations from [2,3], and so eventually it comes to [1,2,3] and [1,3,2].
However, I was seeing if I can do this without map and just calling itself. Is there a way to do that, or is map the only way to do the above for 1, then 2, then 3, ...
And related to this, for the "working normal case", why does the following keep nesting parentheticals?
(define (perm L)
(if (null? L) '(())
; (apply append <-- why is this part required?
(map (lambda (elem)
(map (lambda (other_list) (cons elem other_list))
(perm (remove-from-list elem L))))
L)))
; )
That is, without doing an (apply append) outside the map, I get the "correct" answer, but with tons of nested parens: (((1 (2 (3))) (1 (3 (2)))) ((2 (1 (3))) (2 (3 (1)))) ((3 (1 (2))) (3 (2 (1))))). I suppose if someone could just show an example of a more basic setup where a map 'telescopes' without the big function that might be helpful.

Regarding "where do parens come from", it's about types: the function being mapped turns "element" into a "list of elements", so if you map it over a list of elements, you turn each element in the list into a list of elements: ....
[ 1, 2, 3, ] -->
[ [ 1a, 1b ], [2a], [] ]
, say, (in general; not with those functions in question). And since there's recursion there, we then have something like
[ [ [1a1], [] ], [[]], [] ]
, and so on.
So map foo is listof elts -> listof (listof elts):
`foo` is: elt -> (listof elts)
-------------------------------------------------------
`map foo` is: listof elts -> listof (listof elts)
But if we apply append after the map on each step, we've leveled it into the listof elts -> listof elts,
`map foo`: listof elts -> listof (listof elts)
`apply append`: listof (listof elts) -> listof elts
----------------------------------------------------------------------
`flatmap foo`: listof elts -> listof elts
and so no new parens are popping up -- since they are leveled at each step when they appear, so they don't accumulate like that; the level of nestedness stays the same.
That's what apply append does: it removes the inner parens:
(apply append [ [x, ...], [y, ...], [z, ...] ] ) ==
( append [x, ...] [y, ...] [z, ...] ) ==
[ x, ..., y, ..., z, ... ]
So, as an example,
> (define (func x) (if (= 0 (remainder x 3)) '()
(if (= 0 (remainder x 2)) (list (+ x 1))
(list (+ x 1) (+ x 2)))))
> (display (map func (list 1 2 3 4)))
((2 3) (3) () (5))
> (display (map (lambda (xs) (map func xs)) (map func (list 1 2 3 4))))
(((3) ()) (()) () ((6 7)))
> (display (flatmap func (list 1 2 3 4)))
(2 3 3 5)
> (display (flatmap func (flatmap func (list 1 2 3 4))))
(3 6 7)
Now that the types fit, the flatmap funcs compose nicely, unlike without the flattening. Same happens during recursion in that function. The deeper levels of recursion work on the deeper levels of the result list. And without the flattening this creates more nestedness.

Scheme - generate all distinct permutations of a list

While reading a certain book about functional programming and scheme (and Racket) in particular, I happened upon an exercise which states the following:
`
"Write a function 'rp' which takes, as an argument, a list 'lp' of pairs '(a . n)',
where 'a' is either a symbol or a number and 'n' is a natural number,
and which returns the list of all the lists, whose elements are the 'a's defined by
the pairs in 'lp', each one appearing exactly 'n' times."
For some reason this is really cryptic, but what it basically asks for is the list of all distinct permutations of a list containing n times the number/symbol a.
E.g : [[(rp '((a . 2) (b . 1))]] = '((a a b) (a b a) (b a a))
Generating the permutations, ignoring the distinct part, is fairly easy since there is a, relatively, straight forward recursive definition:
The list of permutations of an empty list, is a list containing an empty list.
The list of permutations of 3 elements a b c is a list containing the lists of all permutations of
a and b where, for each one, c has been inserted in all possible positions.
Which I translated in the following racket code:
(define permut
(lambda(ls)
(if(null? ls) '(())
(apply append
(map (lambda(l) (insert_perm (car ls) l))
(permut (cdr ls)))))))
(define insert_perm
(lambda(x ls)
(if(null? ls) (list (list x))
(cons (cons x ls)
(map (lambda(l) (cons (car ls) l))
(insert_perm x (cdr ls)))))))
This works, but does not return distinct permutations. Taking into account the duplicates seems to me much more complicated. Is there a simple modification of the simple permutation case that I cannot see? Is the solution completely different? Any help would be appreciated.

The change is pretty simple. When you have no duplicate, the following works:
The list of permutations of 3 elements a b c is a list containing the lists of all permutations of a and b where, for each one, c has been inserted in all possible positions.
With duplicates, the above doesn't work anymore. A permutation of 2 elements a = "a", b = "b" is:
"a" "b"
"b" "a"
Now, consider c = "a". If you insert it in all possible positions, then you would get:
c "a" "b" = "a" "a" "b"
"a" c "b" = "a" "a" "b"
"a" "b" c = "a" "b" "a"
c "b" "a" = "a" "b" "a"
"b" c "a" = "b" "a" "a"
"b" "a" c = "b" "a" "a"
So instead, make a restriction that when you are inserting, you will only do it before the first occurrence of the same element that exists in the list that you are inserting to:
c "a" "b" = "a" "a" "b" -- this is OK. c comes before the first occurrence of "a"
"a" c "b" = "a" "a" "b" -- this is not OK. c comes after the first occurrence of "a"
"a" "b" c = "a" "b" "a" -- this is not OK
c "b" "a" = "a" "b" "a" -- this is OK
"b" c "a" = "b" "a" "a" -- this is OK
"b" "a" c = "b" "a" "a" -- this is not OK
This gives:
"a" "a" "b"
"a" "b" "a"
"b" "a" "a"
as desired.
Moreover, you can see that this algorithm is a generalization of the algorithm that doesn't work with duplicates. When there's no duplicate, there's no "first occurrence", so you are allowed to insert everywhere.
By the way, here's how I would format your code in Racket/Scheme style:
(define (permut ls)
(if (null? ls)
'(())
(apply append
(map (lambda (l) (insert-perm (car ls) l))
(permut (cdr ls))))))
(define (insert-perm x ls)
(if (null? ls)
(list (list x))
(cons (cons x ls)
(map (lambda (l) (cons (car ls) l))
(insert-perm x (cdr ls))))))

After some thought I came up with my own recursive definition that seems to work. This solution is an alternative to the one proposed in the answer by #Sorawee Porncharoenwase and can be defined as follows:
The distinct permutations of a list containing only one kind of element
(e.g '(a a a)) is the list itself.
if (f l) gives the list of distinct permutations (lists) of l,
where l contains x times each distinct element el_i, 0<=i<=n
and if ll is the list l plus one element el_i, 0<=i<=n+1 (distinct or not)
Then the distinct permutations of ll is a list containing
all the following possible concatenations:
el_i + (f l/{el_i}), where l/{el_i} is the list l excluding its ith distinct element.
To illustrate this definition, consider the following examples:
The list of all distinct permutations of (a b c) is the list containing
a + {(b c) (c b)} = (a b c) (a c b)
b + {(a c) (c a)} = (b a c) (b c a)
c + {(a b) (b a)} = (c a b) (c b a)
The list of all distinct permutations of (a a b) is the list containing:
a + {(a b) (b a)} = (a a b) (a b a)
b + {(a a)} = (b a a)
etc...
Similarly, the list of all distinct permutations of (a a b c) is:
a + {(a b c) ...} = (a a b c) (a a c b) (a b a c) (a b c a) (a c a b) (a c b a)
b + {(a a c) ...} = (a a c) (a c a) (c a a)
c + {(a a b) ...} = (a a b) (a b a) (b a a)
This leads to the following implementation:
(define unique_perm
(lambda(ls)
(if (= (length ls) 1)
(list (build-list (cdar ls) (const (caar ls))))
(apply append (map (lambda(p) (map (lambda(l) (cons (car p) l)) (unique_perm (update_ls ls p)))) ls)))))
(define update_ls
(lambda(ls p)
(cond ((null? ls) ls)
((equal? (caar ls) (car p))
(if (= (- (cdar ls) 1) 0)
(cdr ls)
(cons (cons (caar ls) (- (cdar ls) 1)) (cdr ls))))
(else (cons (car ls) (update_ls (cdr ls) p))))))
Example:
> (unique_perm_2 '((a . 3) (b . 2)))
'((a a a b b) (a a b a b) (a a b b a) (a b a a b) (a b a b a) (a b b a a) (b a a a b) (b a a b a) (b a b a a) (b b a a a))

Clojure Group Sequential Occurrences - Improve Function

I'm trying to group items that appear directly beside each other, so long as they are each in a given "white-list". Groupings must have at least two or more items to be included.
For example, first arg is the collection, second arg the whitelist.
(group-sequential [1 2 3 4 5] [2 3])
>> ((2 3))
(group-sequential ["The" "quick" "brown" "healthy" "fox" "jumped" "over" "the" "fence"]
["quick" "brown" "over" "fox" "jumped"])
>> (("quick" "brown") ("fox" "jumped" "over"))
(group-sequential [1 2 3 4 5 6 7] [2 3 6])
>> ((2 3))
This is what I've come up with:
(defn group-sequential
[haystack needles]
(loop [l haystack acc '()]
(let [[curr more] (split-with #(some #{%} needles) l)]
(if (< (count curr) 2)
(if (empty? more) acc (recur (rest more) acc))
(recur (rest more) (cons curr acc))))))
It works, but is pretty ugly. I wonder if there's a much simpler idiomatic way to do it in Clojure? (You should have seen the fn before I discovered split-with :)
I bet there's a nice one-liner with partition-by or something, but it's late and I can't quite seem to make it work.

(defn group-sequential [coll white]
(->> coll
(map (set white))
(partition-by nil?)
(filter (comp first next))))
... a tidier version of Diego Basch's method.

Here's my first attempt:
(defn group-sequential [xs wl]
(let [s (set wl)
f (map #(if (s %) %) xs)
xs' (partition-by nil? f)]
(remove #(or (nil? (first %)) (= 1 (count %))) xs')))

(defn group-sequential
[coll matches]
(let [matches-set (set matches)]
(->> (partition-by (partial contains? matches-set) coll)
(filter #(clojure.set/subset? % matches-set))
(remove #(< (count %) 2)))))

Ok, I realized partition-by is pretty close to what I'm looking for, so I created this function which seems a lot more in line with the core stuff.
(defn partition-if
"Returns a lazy seq of partitions of items that match the filter"
[pred coll]
(lazy-seq
(when-let [s (seq coll)]
(let [[in more0] (split-with pred s)
[out more] (split-with (complement pred) more0)]
(if (empty? in)
(partition-if pred more)
(cons in (partition-if pred more)))))))
(partition-if #(some #{%} [2 3 6]) [1 2 3 4 5 6 7])
>> ((2 3))

How do I partition a sequence into increasing sub sequences in Clojure?

I have a sequence of integers and I would like to partition them into increasing segments and I want to have as little as possible segments. So I want to have
(segmentize [1 2 3 4 3 8 9 1 7] <=)
;=> [[1 2 3 4][3 8 9][1 7]]
I have implemented segmentize as follows:
(defn segmentize [col lte]
(loop [col col s [] res []]
(cond (empty? col) (conj res s)
(empty? s) (recur (rest col) (conj s (first col)) res)
(lte (last s) (first col)) (recur (rest col) (conj s (first col)) res)
:else (recur col [] (conj res s)))))
But I was wondering if there is already some handy clojure function that does exactly this, or if there is a more idiomatic way to do this.

You can build this with partition-by
(defn segmentize [cmp coll]
(let [switch (reductions = true (map cmp coll (rest coll)))]
(map (partial map first) (partition-by second (map list coll switch)))))
(segmentize <= [1 2 3 4 3 8 9 1 7])
;=> ((1 2 3 4) (3 8 9) (1 7))
The first two maps of the last line may be changed to mapv if you really want vectors rather than lazy sequences.

Another lazy implementation. Basically find out how many consecutive pairs of numbers return true for the "lte" function (take-while + segment) and then split the original collection by that number. Repeat with the reminder collection:
(defn segmentize
[coll lte]
(lazy-seq
(when-let [s (seq coll)]
(let [pairs-in-segment (take-while (fn [[a b]] (lte a b)) (partition 2 1 s))
[segment reminder] (split-at (inc (count pairs-in-segment)) s)]
(cons segment
(segmentize reminder lte))))))

This is a special case of some of the sequence-handling functions in org.flatland/useful, specifically flatland.useful.seq/partition-between:
(partition-between (partial apply >) xs)
If you require a from-scratch implementation with no external dependencies, I'd prefer dAni's answer.

Here is my version of segmentize (I called in split-when):
(defn split-when [f s]
(reduce (fn [acc [a b]]
(if (f b a)
(conj acc [b])
(update-in acc [(dec (count acc))] conj b)))
[[(first s)]]
(partition 2 1 s)))
(split-when < [1 2 3 4 3 8 9 1 7])
;; [[1 2 3 4] [3 8 9] [1 7]]

Because everybody loves lazy sequences:
(defn segmentize [coll cmp]
(if-let [c (seq coll)]
(lazy-seq
(let [[seg rem] (reduce (fn [[head tail] x]
(if (cmp (last head) x)
[(conj head x) (next tail)]
(reduced [head tail])))
[(vec (take 1 c)) (drop 1 c)]
(drop 1 c))]
(cons seg (segmentize rem cmp))))))
The code to compute each segment could probably be made a little less verbose using loop/recur, but I tend to find reduce more readable most of the time.

zip up two list of different lengths in elisp

I have two lists:
(setq x (list "a" "b" "c"))
(setq y (list "1" "2" "3" "4"))
How can I create a list of cons cells (("a" . "1") ("b" . "2") ("c" . "3") ("a" . "4")) with the shorter list recycled?

Here's my take:
(require 'cl-lib)
(cl-mapcar #'list (setcdr (last x) x) y)
I'd add a check for which of them is larger, but that would spoil the brevity:).

There is surely a simpler way to do it, but here's a version that turns the input sequences into infinite lists and zips them together:
(defun* cycle-iterator (xs &optional (idx 0) (len (length xs)))
"Create an iterator that will cycle over the elements in XS.
Return a cons, where the car is the current value and the cdr is
a function to continue the iteration."
(cons (nth (mod idx len) xs)
(eval `(lambda () (cycle-iterator ',xs ,(1+ idx) ,len)))))
(defun cycle-take (xs n)
"Take N elements from XS, cycling the elements if N exceeds the length of XS."
(loop
when (plusp n)
;; Creating the iterator returns the first value. Subsequent calls can then
;; be processed in a loop.
with (value . iterator) = (cycle-iterator xs)
with acc = (list value)
repeat (1- n) do (destructuring-bind (val . next) (funcall iterator)
(setq iterator next)
(setq acc (cons val acc)))
finally (return (nreverse acc))))
(defun cycling-zip (xs ys)
"Zip XS and YS together, cycling elements to ensure the result
is as long as the longest input list."
(loop
with limit = (max (length xs) (length ys))
for x in (cycle-take xs limit)
for y in (cycle-take ys limit)
collect (cons x y)))
;; Usage:
(cycling-zip '("a" "b" "c") '("1" "2" "3" "4"))
; => (("a" . "1") ("b" . "2") ("c" . "3") ("a" . "4"))

This answer requires dash list manipulation library. Before attacking your problem, it's good to find the length of the longest list. The first way I came up with is:
(require 'dash)
(require 'dash-functional)
(length (-max-by (-on '> 'length) (list x y))) ; 4
-on is a smart function from package dash-functional that accepts a comparator, a key on which to compare, and returns a function that compares on this key. Therefore (-max-by (-on '> 'length) xs) finds an element in xs, whose length is biggest. But this expression is too smart for its own sake, and dash-functional only works in Emacs 24 due to lexical scoping. Let's rewrite it, inspired by Python solution:
(-max (-map 'length (list x y))) ; 4
To take first n elements from an infinite cycled list, do (-take n (-cycle xs)). Therefore to create an alist, where elements from a smaller list are cycled, write:
(let ((len (-max (-map 'length (list x y)))))
(flet ((cycle (xs) (-take len (-cycle xs))))
(-zip (cycle x) (cycle y)))) ; (("a" . "1") ("b" . "2") ("c" . "3") ("a" . "4"))

I went with the recursive approach that seemed natural for lisp.
(defun zip (xs ys)
(cond
((or (null xs) (null ys)) ())
(t (cons (cons (car xs) (car ys)) (zip (cdr xs) (cdr ys))))))

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

How to remove sequential matches in vector in Clojure? - algorithm

Let's say I have a vector ["a" "b" "c" "a" "a" "b"]. If given a sequence ["a" "b"], how can I remove all instances of that sequence (in order)? Here, the result would just be ["c" "a"].

Recursive solution for a 2-element subsequence: (defn f [sq [a b]] (when (seq sq) (if (and (= (first sq) a) (= (second sq) b)) (f (rest (rest sq)) [a b]) (cons (first sq) (f (rest sq) [a b]))))) not exhaustively tested but seems to work.

Related

Combinations in scheme without direct map?

Scheme - generate all distinct permutations of a list

Clojure Group Sequential Occurrences - Improve Function

How do I partition a sequence into increasing sub sequences in Clojure?

zip up two list of different lengths in elisp

Categories

Resources