Suppose I have a list of tuples like so:
[["type_2" "val_x"] ["type_1" "val_y"] ["type_1" "val_z"]]
I'd like to filter them, so that I have two separate collections like this:
[["type_2" "val_x"]]
[["type_1" "val_y"] ["type_1" "val_z"]]
I can run filter twice. I'm wondering if it's possible to achieve the same result in a single iteration with functional programming?
This is the desired interface:
(multiple-filter predicate_fn_1 predicate_fn_2 coll)
While (vals (group-by first ...)) would work fine in your case, it is not universal. Here is one variant (of many possible ones) that applies multiple filters:
(defn classify [items & preds]
(loop [[x & xs :as items] items
res (repeat (count preds) [])]
(if (empty? items)
res
(recur xs
(mapv #(if (% x) (conj %2 x) %2) preds res)))))
In the REPL:
user> (classify [[:a 10] [:a 20] [:b 30] [:d 2] [:c 40] [:d 1]]
#(= (first %) :a)
#(= (first %) :b)
#(= (first %) :d))
[[[:a 10] [:a 20]] [[:b 30]] [[:d 2] [:d 1]]]
Or the same with reduce:
(defn classify [items & preds]
(reduce (fn [res x] (mapv #(if (% x) (conj %2 x) %2) preds res))
(repeat (count preds) [])
items))
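For the two-type data in the question, the group-by approach mentioned at the start of this answer can be sketched as follows. The var names tuples and grouped are only illustrative.

```clojure
;; Sample data taken from the question.
(def tuples [["type_2" "val_x"] ["type_1" "val_y"] ["type_1" "val_z"]])

;; group-by walks the collection once and buckets items by (first item).
(def grouped (group-by first tuples))

(get grouped "type_1") ; the two "type_1" tuples, in their original order
(get grouped "type_2") ; the single "type_2" tuple
```

This only works when each item belongs to exactly one bucket, which is why the predicate-based classify is the more general tool.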
The classify function by @leetwinski does not satisfy your desired interface. As an example, here is a compliant implementation:
(defn multiple-filter [& preds-and-coll]
(let [[preds coll] ((juxt drop-last last) preds-and-coll)]
(mapv #(filterv % coll) preds)))
Example:
(multiple-filter (comp #{"type_1"} first)
(comp #{"type_2"} first)
[["type_2" "val_x"] ["type_1" "val_y"] ["type_1" "val_z"]])
;;=> [[["type_1" "val_y"] ["type_1" "val_z"]] [["type_2" "val_x"]]]
I haven't implemented this as a single iteration because that would complicate this answer without changing the algorithmic complexity, but feel free to replace my mapv and filterv implementation with @leetwinski's single-iteration one.
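Combining the two answers, a single-pass version with the requested argument order might look like the sketch below. The name multiple-filter-once is made up here to avoid clashing with the definition above; it is not from either answer.

```clojure
(defn multiple-filter-once
  "Takes any number of predicates followed by a collection and returns
  one vector of matches per predicate, walking the collection once."
  [& preds-and-coll]
  (let [preds (butlast preds-and-coll)
        coll  (last preds-and-coll)]
    (reduce (fn [buckets x]
              ;; add x to every bucket whose predicate accepts it
              (mapv (fn [pred bucket]
                      (if (pred x) (conj bucket x) bucket))
                    preds
                    buckets))
            (vec (repeat (count preds) []))
            coll)))

(multiple-filter-once (comp #{"type_1"} first)
                      (comp #{"type_2"} first)
                      [["type_2" "val_x"] ["type_1" "val_y"] ["type_1" "val_z"]])
;;=> [[["type_1" "val_y"] ["type_1" "val_z"]] [["type_2" "val_x"]]]
```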
I would like to implement a function which maps over a sequence of maps and updates values when predicates match.
Here is a first working draft:
(defn update-if
([m k pred f]
(let [init (get m k)]
(if (and (not-nil? init) (pred init))
(update m k f)
m)))
([m bindings]
(reduce-kv
(fn [agg k v]
(let [[pred f] v]
(update-if agg k pred f)))
m bindings)))
(update-if {:a 1 :b 2} {:a [even? inc] :b [even? dec]}) ;; ==> {:a 1 :b 1}
(update-if {:a 1 :b 2} :b even? dec) ;; ==> {:a 1 :b 1}
(defn map-when
"Walks a collection of associative collections
and applies functions based on predicates
Output :
(map-when {:a [even? inc] :b [nan? zero]} '({:a 1 :b NaN} {:a 2 :b 7} {:a 4 :b NaN}))
=
({:a 1 :b 0} {:a 3 :b 7} {:a 5 :b 0})"
([bindings data]
(reduce
(fn [acc row]
(conj acc (update-if row bindings)))
'() data))
([pred f data]
(map
(fn [x]
(if (and (not-nil? x) (pred x))
(f x)
x))
data)))
The not-nil? check is important here because nil just means the data is missing.
The function takes around 2s to process 1 million random {:a :b} maps (random generation included).
I find it odd that no function exists for this in core or a core-related library.
Are there any performance hints to improve this? I tried transient, but it does not work on empty lists '().
Thanks
You should look at the specter library. It probably has what you are looking for. Example:
(def data {:a [{:aa 1 :bb 2}
{:cc 3}]
:b [{:dd 4}]})
;; Manual Clojure
(defn map-vals [m afn]
(->> m (map (fn [[k v]] [k (afn v)])) (into {})))
(map-vals data
(fn [v]
(mapv
(fn [m]
(map-vals
m
(fn [v] (if (even? v) (inc v) v))))
v)))
;; Specter
(transform [MAP-VALS ALL MAP-VALS even?] inc data)
Generate just the necessary lambda to maximize reusability.
(defn cond-update-fn [clauses]
(fn [m]
(reduce (fn [m [k [pred f]]]
(cond-> m
(and (contains? m k)
(pred (get m k))) (update k f)))
m
clauses)))
If your preds and fns are known at compile time, writing a macro instead (left as an exercise for the reader) gives higher performance because there is no overhead from iterating over the preds.
Reuse in any context:
(def input [{:a 42, :b 42} {:a 42,:b 43}])
(def cond-update
(cond-update-fn {:a [even? inc]
:b [odd? dec]}))
(map cond-update input)
;-> ({:a 43, :b 42} {:a 43, :b 42})
;; Transducer
(into [] (map cond-update) input)
;-> [{:a 43, :b 42} {:a 43, :b 42}]
;; Standalone
(cond-update {:a 32})
;-> {:a 33}
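The question treats nil as missing data, whereas the contains? check above still passes a nil value to the predicate when the key is present. A minimal sketch of a nil-skipping variant, assuming the clauses are given as a map (the name cond-update-fn-nil-safe is hypothetical):

```clojure
(defn cond-update-fn-nil-safe
  "Like cond-update-fn, but leaves a key alone when its value is nil."
  [clauses]
  (fn [m]
    (reduce-kv (fn [acc k [pred f]]
                 (let [v (get acc k)]
                   ;; some? is true for everything except nil
                   (if (and (some? v) (pred v))
                     (assoc acc k (f v))
                     acc)))
               m
               clauses)))

(def safe-update
  (cond-update-fn-nil-safe {:a [even? inc]
                            :b [odd? dec]}))

(safe-update {:a nil :b 43})
;-> {:a nil, :b 42}
```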
Let's say I have a vector ["a" "b" "c" "a" "a" "b"]. If given a sequence ["a" "b"], how can I remove all instances of that sequence (in order)? Here, the result would just be ["c" "a"].
If sequences that need to be removed are known in advance, core.match may be useful for your task:
(require '[clojure.core.match :refer [match]])
(defn remove-patterns [seq]
(match seq
["a" "b" & xs] (remove-patterns xs)
[x & xs] (cons x (remove-patterns xs))
[] ()))
(remove-patterns ["a" "b" "c" "a" "a" "b"]) ;; => ("c" "a")
The short answer is to treat it as a string and do a regex remove:
(defn remove-ab [v]
(mapv str (clojure.string/replace (apply str v) #"ab" "")))
(remove-ab ["a" "b" "c" "a" "a" "b"])
=> ["c" "a"]
The long answer is to implement your own regex state machine by iterating through the sequence, identifying matches, and returning a sequence without them.
Automat can help with making your own low level regex state machine:
https://github.com/ztellman/automat
Instaparse can be used to make rich grammars:
https://github.com/Engelberg/instaparse
You don't really need a library for such a small match, you can implement it as a loop:
(defn remove-ab [v]
(loop [[c & remaining] v
acc []
saw-a false]
(cond
(nil? c) (if saw-a (conj acc "a") acc) ;; terminate
(and (= "b" c) saw-a) (recur remaining acc false) ;; ignore ab
(= "a" c) (recur remaining (if saw-a (conj acc "a") acc) true) ;; got a
(and (not= "b" c) saw-a) (recur remaining (conj (conj acc "a") c) false) ;; keep ac
:else (recur remaining (conj acc c) false)))) ;; add c
But getting all the conditions right can be tricky... hence why a formal regex or state machine is advantageous.
Or a recursive definition:
(defn remove-ab [[x y & rest]]
(cond
(and (= x "a") (= y "b")) (recur rest)
(nil? x) ()
(nil? y) [x]
:else (cons x (remove-ab (cons y rest)))))
Recursive solution for a 2-element subsequence:
(defn f [sq [a b]]
(when (seq sq)
(if
(and
(= (first sq) a)
(= (second sq) b))
(f (rest (rest sq)) [a b])
(cons (first sq) (f (rest sq) [a b])))))
Not exhaustively tested, but it seems to work.
A simple solution using lazy-seq, take and drop. It works for any finite subseq and any sequence to be filtered, including infinite ones:
(defn remove-subseq-at-start
[subseq xs]
(loop [xs xs]
(if (= (seq subseq) (take (count subseq) xs))
(recur (drop (count subseq) xs))
xs)))
(defn remove-subseq-all [subseq xs]
(if-let [xs (seq (remove-subseq-at-start subseq xs))]
(lazy-seq (cons (first xs) (remove-subseq-all subseq (rest xs))))
()))
(deftest remove-subseq-all-test
(is (= ["c" "a"] (remove-subseq-all ["a" "b"] ["a" "b" "a" "b" "c" "a" "a" "b"])))
(is (= ["a"] (remove-subseq-all ["a" "b"] ["a"])))
(is (= ["a" "b"] (remove-subseq-all [] ["a" "b"])))
(is (= [] (remove-subseq-all ["a" "b"] ["a" "b" "a" "b"])))
(is (= [] (remove-subseq-all ["a" "b"] nil)))
(is (= [] (remove-subseq-all [] [])))
(is (= ["a" "b" "a" "b"] (->> (remove-subseq-all ["c" "d"] (cycle ["a" "b" "c" "d"]))
(drop 2000000)
(take 4))))
(is (= (seq "ca") (remove-subseq-all "ab" "ababcaab"))))
If you can ensure that the input is a vector, we can use subvec to check on every element whether the following subvector of the same length matches the pattern. If so, we omit it, otherwise we move ahead to the next element in the vector:
(let [pattern ["a" "b"]
source ["a" "b" "c" "a" "a" "b"]]
(loop [source source
pattern-length (count pattern)
result []]
(if (< (count source) pattern-length)
(into [] (concat result source))
(if (= pattern (subvec source 0 pattern-length))
; skip matched part of source
(recur (subvec source pattern-length) pattern-length result)
; otherwise move ahead one element and save it as result
(recur (subvec source 1) pattern-length
(conj result (first source)))))))
With general sequences, you could use the same approach, substituting take and drop as appropriate.
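A minimal sketch of that take/drop substitution, wrapped in a function whose name (remove-pattern) is made up for illustration. Like the subvec version, it scans left to right and does not re-examine elements that become adjacent after a removal.

```clojure
(defn remove-pattern
  "Removes every left-to-right occurrence of pattern from coll.
  Works on any finite seqable input, not just vectors."
  [pattern coll]
  (let [n       (count pattern)
        pattern (seq pattern)]
    (loop [source (seq coll)
           result []]
      (cond
        (empty? source)
        result

        ;; (pos? n) guards against an empty pattern matching everywhere
        (and (pos? n) (= pattern (take n source)))
        (recur (drop n source) result)

        :else
        (recur (rest source) (conj result (first source)))))))

(remove-pattern ["a" "b"] '("a" "b" "c" "a" "a" "b"))
;; => ["c" "a"]
```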
I'm trying to group items that appear directly beside each other, so long as they are each in a given "white-list". Groupings must have at least two items to be included.
For example, first arg is the collection, second arg the whitelist.
(group-sequential [1 2 3 4 5] [2 3])
>> ((2 3))
(group-sequential ["The" "quick" "brown" "healthy" "fox" "jumped" "over" "the" "fence"]
["quick" "brown" "over" "fox" "jumped"])
>> (("quick" "brown") ("fox" "jumped" "over"))
(group-sequential [1 2 3 4 5 6 7] [2 3 6])
>> ((2 3))
This is what I've come up with:
(defn group-sequential
[haystack needles]
(loop [l haystack acc '()]
(let [[curr more] (split-with #(some #{%} needles) l)]
(if (< (count curr) 2)
(if (empty? more) acc (recur (rest more) acc))
(recur (rest more) (cons curr acc))))))
It works, but is pretty ugly. I wonder if there's a much simpler idiomatic way to do it in Clojure? (You should have seen the fn before I discovered split-with :)
I bet there's a nice one-liner with partition-by or something, but it's late and I can't quite seem to make it work.
(defn group-sequential [coll white]
(->> coll
(map (set white))
(partition-by nil?)
(filter (comp first next))))
... a tidier version of Diego Basch's method.
Here's my first attempt:
(defn group-sequential [xs wl]
(let [s (set wl)
f (map #(if (s %) %) xs)
xs' (partition-by nil? f)]
(remove #(or (nil? (first %)) (= 1 (count %))) xs')))
(defn group-sequential
[coll matches]
(let [matches-set (set matches)]
(->> (partition-by (partial contains? matches-set) coll)
(filter #(contains? matches-set (first %))) ; keep only the whitelisted runs
(remove #(< (count %) 2)))))
Ok, I realized partition-by is pretty close to what I'm looking for, so I created this function which seems a lot more in line with the core stuff.
(defn partition-if
"Returns a lazy seq of partitions of items that match the filter"
[pred coll]
(lazy-seq
(when-let [s (seq coll)]
(let [[in more0] (split-with pred s)
[out more] (split-with (complement pred) more0)]
(if (empty? in)
(partition-if pred more)
(cons in (partition-if pred more)))))))
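(partition-if #(some #{%} [2 3 6]) [1 2 3 4 5 6 7])
>> ((2 3) (6))
Single-item groups such as (6) still have to be dropped afterwards, for example with (filter next ...), to get ((2 3)).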
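For comparison, here is a self-contained sketch of the original group-sequential requirement (runs of whitelisted items, at least two long) built only from partition-by and contains?. It is one possible arrangement, not the asker's final code, and unlike the nil-mapping variants it also copes with a whitelist that contains nil or false.

```clojure
(defn group-sequential
  "Returns runs of adjacent items from coll that are all in whitelist,
  keeping only runs with two or more items."
  [coll whitelist]
  (let [allowed (set whitelist)]
    (->> coll
         ;; split into alternating runs of allowed / not-allowed items
         (partition-by #(contains? allowed %))
         ;; keep allowed runs that have a second element
         (filter #(and (contains? allowed (first %)) (next %))))))

(group-sequential [1 2 3 4 5 6 7] [2 3 6])
;; => ((2 3))
```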
What is the best way to test whether a list contains a given value in Clojure?
In particular, the behaviour of contains? is currently confusing me:
(contains? '(100 101 102) 101) => false
I could obviously write a simple function to traverse the list and test for equality, but there must surely be a standard way to do this?
Ah, contains?... supposedly one of the top five FAQs re: Clojure.
It does not check whether a collection contains a value; it checks whether an item could be retrieved with get or, in other words, whether a collection contains a key. This makes sense for sets (which can be thought of as making no distinction between keys and values), maps (so (contains? {:foo 1} :foo) is true) and vectors (but note that (contains? [:foo :bar] 0) is true, because the keys here are indices and the vector in question does "contain" the index 0!).
To add to the confusion, in cases where it doesn't make sense to call contains?, it simply returns false; this is what happens in (contains? :foo 1) and also (contains? '(100 101 102) 101). Update: In Clojure ≥ 1.5 contains? throws when handed an object of a type that doesn't support the intended "key membership" test.
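The key-versus-value distinction described above can be checked directly at the REPL. The literals below are arbitrary examples.

```clojure
;; Maps: contains? looks at keys, never at values.
(contains? {:foo 1} :foo)     ; => true
(contains? {:foo 1} 1)        ; => false

;; Vectors: the "keys" are the indices.
(contains? [:foo :bar] 0)     ; => true
(contains? [:foo :bar] :foo)  ; => false
(contains? [:foo :bar] 2)     ; => false, index out of range

;; Sets: keys and values coincide, so this behaves like membership.
(contains? #{:foo :bar} :foo) ; => true
```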
The correct way to do what you're trying to do is as follows:
; most of the time this works
(some #{101} '(100 101 102))
When searching for one of a bunch of items, you can use a larger set; when searching for false / nil, you can use false? / nil? -- because (#{x} x) returns x, thus (#{nil} nil) is nil; when searching for one of multiple items some of which may be false or nil, you can use
(some (zipmap [...the items...] (repeat true)) the-collection)
(Note that the items can be passed to zipmap in any type of collection.)
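A small illustration of why the zipmap form helps when the targets may be nil or false. The names targets and target-map are only for this sketch.

```clojure
(def targets [nil false :x])

;; A set returns the matched element itself, so finding nil yields nil.
(some (set targets) [1 2 nil])   ; => nil

;; A map from each target to true returns true for any hit,
;; including nil and false.
(def target-map (zipmap targets (repeat true)))

(some target-map [1 2 nil])      ; => true
(some target-map [1 false 3])    ; => true
(some target-map [1 2 3])        ; => nil
```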
Here's my standard util for the same purpose:
(defn in?
"true if coll contains elm"
[coll elm]
(some #(= elm %) coll))
You can always call java methods with .methodName syntax.
(.contains [100 101 102] 101) => true
I know that I'm a little bit late, but what about:
(contains? (set '(101 102 103)) 102)
At least in Clojure 1.4 this outputs true :)
(not= -1 (.indexOf '(101 102 103) 102))
Works, but below is better:
(some #(= 102 %) '(101 102 103))
For what it is worth, this is my simple implementation of a contains function for lists:
(defn list-contains? [coll value]
(let [s (seq coll)]
(if s
(if (= (first s) value) true (recur (rest s) value))
false)))
If you have a vector or list and want to check whether a value is contained in it, you will find that contains? does not work.
Michał has already explained why.
; does not work as you might expect
(contains? [:a :b :c] :b) ; = false
There are four things you can try in this case:
Consider whether you really need a vector or list. If you use a set instead, contains? will work.
(contains? #{:a :b :c} :b) ; = true
Use some, wrapping the target in a set, as follows:
(some #{:b} [:a :b :c]) ; = :b, which is truthy
The set-as-function shortcut will not work if you are searching for a falsy value (false or nil).
; will not work
(some #{false} [true false true]) ; = nil
In these cases, you should use the built-in predicate function for that value, false? or nil?:
(some false? [true false true]) ; = true
If you will need to do this kind of search a lot, write a function for it:
(defn seq-contains? [coll target] (some #(= target %) coll))
(seq-contains? [true false true] false) ; = true
Also, see Michał’s answer for ways to check whether any of multiple targets are contained in a sequence.
Here's a quick function out of my standard utilities that I use for this purpose:
(defn seq-contains?
"Determine whether a sequence contains a given item"
[sequence item]
(if (empty? sequence)
false
(reduce #(or %1 %2) (map #(= %1 item) sequence))))
Here's the classic Lisp solution:
(defn member?
"True if list contains at least one instance of elt"
[list elt] ; the docstring must come before the parameter vector
(cond
(empty? list) false
(= (first list) elt) true
:else (recur (rest list) elt)))
I've built upon j-g-faustus's version of "list-contains?". It now takes any number of arguments.
(defn list-contains?
([collection value]
(let [sequence (seq collection)]
(if sequence (some #(= value %) sequence))))
([collection value & next]
(if (list-contains? collection value) (apply list-contains? collection next))))
It is as simple as using a set - similar to maps, you can just drop it in the function position. It evaluates to the value if in the set (which is truthy) or nil (which is falsey):
(#{100 101 102} 101) ; 101
(#{100 101 102} 99) ; nil
If you're checking against a reasonably sized vector/list you won't have until runtime, you can also use the set function:
; (def nums '(100 101 102))
((set nums) 101) ; 101
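Building the set costs a full pass over the list, so when the same list is queried repeatedly it may be worth converting it once and reusing the result. A sketch under that assumption (num-set is an illustrative name); using contains? on the set also gives a real boolean and handles nil members.

```clojure
(def nums '(100 101 102 nil))

;; Convert once, query many times.
(def num-set (set nums))

(contains? num-set 101) ; => true
(contains? num-set 99)  ; => false
(contains? num-set nil) ; => true, whereas (num-set nil) returns nil
```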
The recommended way is to use some with a set - see documentation for clojure.core/some.
You could then use some within a real true/false predicate, e.g.
(defn in? [coll x] (if (some #{x} coll) true false))
(defn in?
[needle coll]
(when (seq coll)
(or (= needle (first coll))
(recur needle (next coll)))))
(defn first-index
[needle coll]
(loop [index 0
needle needle
coll coll]
(when (seq coll)
(if (= needle (first coll))
index
(recur (inc index) needle (next coll))))))
(defn which?
"Checks if any of elements is included in coll and says which one
was found as first. Coll can be map, list, vector and set"
[ coll & rest ]
(let [ncoll (if (map? coll) (keys coll) coll)]
(reduce
#(or %1 (first (filter (fn[a] (= a %2))
ncoll))) nil rest )))
Example usage: (which? [ 1 2 3 ] 3) or (which? #{ 1 2 3} 4 5 3)
Since Clojure is built on Java, you can just as easily call the .indexOf Java method. It returns the index of the first occurrence of an element in a collection, or -1 if the element is not found.
Making use of this we could simply say:
(not= (.indexOf [1 2 3 4] 3) -1)
=> true
The problem with the 'recommended' solution is that it breaks when the value you are seeking is 'nil'. I prefer this solution:
(defn member?
"I'm still amazed that Clojure does not provide a simple member function.
Returns true if `item` is a member of `series`, else nil."
[item series]
(and (some #(= item %) series) true))
There are convenient functions for this purpose in the Tupelo library. In particular, the functions contains-elem?, contains-key?, and contains-val? are very useful. Full documentation is present in the API docs.
contains-elem? is the most generic and is intended for vectors or any other clojure seq:
(testing "vecs"
(let [coll (range 3)]
(isnt (contains-elem? coll -1))
(is (contains-elem? coll 0))
(is (contains-elem? coll 1))
(is (contains-elem? coll 2))
(isnt (contains-elem? coll 3))
(isnt (contains-elem? coll nil)))
(let [coll [ 1 :two "three" \4]]
(isnt (contains-elem? coll :no-way))
(isnt (contains-elem? coll nil))
(is (contains-elem? coll 1))
(is (contains-elem? coll :two))
(is (contains-elem? coll "three"))
(is (contains-elem? coll \4)))
(let [coll [:yes nil 3]]
(isnt (contains-elem? coll :no-way))
(is (contains-elem? coll :yes))
(is (contains-elem? coll nil))))
Here we see that for an integer range or a mixed vector, contains-elem? works as expected for both existing and nonexistent elements in the collection. For maps, we can also search for any key-value pair (expressed as a len-2 vector):
(testing "maps"
(let [coll {1 :two "three" \4}]
(isnt (contains-elem? coll nil ))
(isnt (contains-elem? coll [1 :no-way] ))
(is (contains-elem? coll [1 :two]))
(is (contains-elem? coll ["three" \4])))
(let [coll {1 nil "three" \4}]
(isnt (contains-elem? coll [nil 1] ))
(is (contains-elem? coll [1 nil] )))
(let [coll {nil 2 "three" \4}]
(isnt (contains-elem? coll [1 nil] ))
(is (contains-elem? coll [nil 2] ))))
It is also straightforward to search a set:
(testing "sets"
(let [coll #{1 :two "three" \4}]
(isnt (contains-elem? coll :no-way))
(is (contains-elem? coll 1))
(is (contains-elem? coll :two))
(is (contains-elem? coll "three"))
(is (contains-elem? coll \4)))
(let [coll #{:yes nil}]
(isnt (contains-elem? coll :no-way))
(is (contains-elem? coll :yes))
(is (contains-elem? coll nil)))))
For maps & sets, it is simpler (& more efficient) to use contains-key? to find a map entry or a set element:
(deftest t-contains-key?
(is (contains-key? {:a 1 :b 2} :a))
(is (contains-key? {:a 1 :b 2} :b))
(isnt (contains-key? {:a 1 :b 2} :x))
(isnt (contains-key? {:a 1 :b 2} :c))
(isnt (contains-key? {:a 1 :b 2} 1))
(isnt (contains-key? {:a 1 :b 2} 2))
(is (contains-key? {:a 1 nil 2} nil))
(isnt (contains-key? {:a 1 :b nil} nil))
(isnt (contains-key? {:a 1 :b 2} nil))
(is (contains-key? #{:a 1 :b 2} :a))
(is (contains-key? #{:a 1 :b 2} :b))
(is (contains-key? #{:a 1 :b 2} 1))
(is (contains-key? #{:a 1 :b 2} 2))
(isnt (contains-key? #{:a 1 :b 2} :x))
(isnt (contains-key? #{:a 1 :b 2} :c))
(is (contains-key? #{:a 5 nil "hello"} nil))
(isnt (contains-key? #{:a 5 :doh! "hello"} nil))
(throws? (contains-key? [:a 1 :b 2] :a))
(throws? (contains-key? [:a 1 :b 2] 1)))
And, for maps, you can also search for values with contains-val?:
(deftest t-contains-val?
(is (contains-val? {:a 1 :b 2} 1))
(is (contains-val? {:a 1 :b 2} 2))
(isnt (contains-val? {:a 1 :b 2} 0))
(isnt (contains-val? {:a 1 :b 2} 3))
(isnt (contains-val? {:a 1 :b 2} :a))
(isnt (contains-val? {:a 1 :b 2} :b))
(is (contains-val? {:a 1 :b nil} nil))
(isnt (contains-val? {:a 1 nil 2} nil))
(isnt (contains-val? {:a 1 :b 2} nil))
(throws? (contains-val? [:a 1 :b 2] 1))
(throws? (contains-val? #{:a 1 :b 2} 1)))
As seen in the tests, each of these functions works correctly when searching for nil values.
Another option:
((set '(100 101 102)) 101)
Use java.util.Collection#contains():
(.contains '(100 101 102) 101)
I found this late, but this is what I'm doing:
(some (partial = 102) '(101 102 103))