I've come across the following recursive algorithm, written here in Swift, that, given an array, produces a generator of sub-arrays that are each one element shorter than the original array. The sub-arrays are created by removing the element at each index in turn.
For example, the input [1,2,3] would return a generator that generates [1,2], [2,3] and [1,3].
The algorithm works, but I'm having real trouble understanding how. Could someone explain what's happening, or offer advice on how to analyze or understand it? Thanks in advance.
// Main algorithm
func smaller1<T>(xs:[T]) -> GeneratorOf<[T]> {
if let (head, tail) = xs.decompose {
var gen1:GeneratorOf<[T]> = one(tail)
var gen2:GeneratorOf<[T]> = map(smaller1(tail)) {
smallerTail in
return [head] + smallerTail
}
return gen1 + gen2
}
return one(nil)
}
// Auxiliary functions used
func map<A, B>(var generator:GeneratorOf<A>, f:A -> B) -> GeneratorOf<B> {
return GeneratorOf {
return generator.next().map(f)
}
}
func one<X>(x:X?) -> GeneratorOf<X> {
return GeneratorOf(GeneratorOfOne(x))
}
The code is taken from the book 'Functional Programming in Swift' by Chris Eidhof, Florian Kugler, and Wouter Swierstra.
Given an array [a_1,…,a_n], the code:
Generates the sub-array [a_2,…,a_n];
For each sub-array B of [a_2,…,a_n] (generated recursively), generates [a_1] + B.
For example, given the array [1,2,3], we:
Generate [2,3];
For each sub-array B of [2,3] (namely, [3] and [2]), generate [1] + B (this generates [1,3] and [1,2]).
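To make the two steps concrete, here is a rough transcription of the same recursion into Scala. It is only a sketch: it assumes List stands in for the Swift array and Iterator for GeneratorOf, and the object name SmallerSketch is made up for this illustration.

```scala
object SmallerSketch {
  // Lazily yields every list obtained by dropping exactly one element of xs.
  def smaller1[T](xs: List[T]): Iterator[List[T]] = xs match {
    case head :: tail =>
      // Step 1: drop the head itself.
      // Step 2: keep the head and drop one element from the tail instead.
      Iterator.single(tail) ++ smaller1(tail).map(head :: _)
    case Nil =>
      // Nothing can be removed from an empty list (the Swift code's one(nil)).
      Iterator.empty
  }
}

// SmallerSketch.smaller1(List(1, 2, 3)).toList
// yields List(2, 3), List(1, 3), List(1, 2), in that order
```

Reading the result from left to right mirrors the explanation: the first item comes from step 1, and the remaining items are the recursive results with the head put back in front.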
I'm trying to find the most optimal way of finding pairs in a Scala collection. For example,
val list = List(1,2,3)
should produce these pairs
(1,2) (1,3) (2,1) (2,3) (3,1) (3,2)
My current implementation seems quite expensive. How can I optimize it further?
val pairs = list.flatMap { currentElement =>
val clonedList: mutable.ListBuffer[Int] = list.to[ListBuffer]
val currentIndex = list.indexOf(currentElement)
val removedValue = clonedList.remove(currentIndex)
clonedList.map { y =>
(currentElement, y)
}
}
val l = Array(1,2,3,4)
val result = scala.collection.mutable.HashSet[(Int, Int)]()
for(i <- 0 until l.size) {
for(j<- (i+1) until l.size) {
result += l(i)->l(j)
result += l(j)->l(i)
}
}
There are several optimizations here. First, the inner loop only traverses the list from the current element to the end, which halves the number of iterations. Second, object creation is kept to a minimum: only the tuples are created and added to a mutable HashSet. Finally, the HashSet handles duplicates for free. An additional optimization would be to check whether the set already contains the tuple, to avoid creating an object for nothing.
For 1,000 elements, it takes less than 1s on my laptop. 7s for 10k distinct elements.
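If you want the pairs in a predictable order, the same index-based idea can also be written without a mutable set. This is a minimal sketch rather than a benchmarked replacement; the name PairsSketch and the choice of Vector (for fast indexed access) are assumptions of the example. Note that it pairs by position, so duplicate values in the input produce duplicate pairs, unlike the HashSet version.

```scala
object PairsSketch {
  // All ordered pairs (xs(i), xs(j)) with i != j, pairing by position rather than by value.
  def orderedPairs[A](xs: Vector[A]): Vector[(A, A)] =
    for {
      i <- xs.indices.toVector
      j <- xs.indices
      if i != j
    } yield (xs(i), xs(j))
}

// PairsSketch.orderedPairs(Vector(1, 2, 3))
// yields (1,2), (1,3), (2,1), (2,3), (3,1), (3,2)
```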
Note that recursively, you could do it this way:
def combi(s : Seq[Int]) : Seq[(Int, Int)] =
if(s.isEmpty)
Seq()
else
s.tail.flatMap(x=> Seq(s.head -> x, x -> s.head)) ++ combi(s.tail)
It takes a little bit more than 1s for 1000 elements.
If "most optimal" can be read differently (most of the time I take it to mean whatever lets me be most productive), I suggest the following approach:
val originalList = (1 to 1000) toStream
def orderedPairs[T](list: Stream[T]) = list.combinations(2).map( p => (p(0), p(1)) ).toStream
val pairs = orderedPairs(originalList) ++ orderedPairs(originalList.reverse)
println(pairs.slice(0, 1000).toList)
I am new to the Breeze library and I would like to convert a Map[Int, Double] to breeze.linalg.SparseVector, and ideally without having to specify a fixed length of the SparseVector. I managed to achieve the goal with this clumsy code:
import breeze.linalg.{SparseVector => SBV}
val mySparseVector: SBV[Double] = new SBV[Double](Array.empty, Array.empty, 10000)
myMap foreach { e => mySparseVector(e._1) = e._2 }
Not only do I have to specify a fixed length of 10,000, but the code also runs in O(n), where n is the size of the map. Is there a better way?
You can use VectorBuilder. There's a (sadly) undocumented feature where if you tell it the length is -1, it will happily let you add things. You will have to (annoyingly) set the length before you construct the result...
import breeze.linalg.VectorBuilder
val vb = new VectorBuilder[Double](length = -1)
myMap foreach { e => vb.add(e._1, e._2) }
vb.length = myMap.keys.max + 1
vb.toSparseVector
(Your code is actually O(n^2), because a SparseVector has to stay sorted, so you're repeatedly moving elements around in an array. VectorBuilder gives you O(n log n), which is the best you can do.)
I saw this example in "Programming in Scala" chapter 24 "Collections in depth". This example shows two alternative ways to implement a tree:
by extending Traversable[Int] - here the complexity of def foreach[U](f: Int => U): Unit would be O(N).
by extending Iterable[Int] - here the complexity of def iterator: Iterator[Int] would be O(N log(N)).
This is to demonstrate why it would be helpful to have two separate traits, Traversable and Iterable.
sealed abstract class Tree
case class Branch(left: Tree, right: Tree) extends Tree
case class Node(elem: Int) extends Tree
sealed abstract class Tree extends Traversable[Int] {
def foreach[U](f: Int => U) = this match {
case Node(elem) => f(elem)
case Branch(l, r) => l foreach f; r foreach f
}
}
sealed abstract class Tree extends Iterable[Int] {
def iterator: Iterator[Int] = this match {
case Node(elem) => Iterator.single(elem)
case Branch(l, r) => l.iterator ++ r.iterator
}
}
Regarding the implementation of foreach they say:
traversing a balanced tree takes time proportional to the number of
elements in the tree. To see this, consider that for a balanced tree
with N leaves you will have N - 1 interior nodes of class Branch. So
the total number of steps to traverse the tree is N + N - 1.
That makes sense. :)
However, they mention that the concatenation of the two iterators in the iterator method has time complexity of log(N), so the total complexity of the method would be N log(N):
Every time an element is produced by a concatenated iterator such as
l.iterator ++ r.iterator, the computation needs to follow one
indirection to get at the right iterator (either l.iterator, or
r.iterator). Overall, that makes log(N) indirections to get at a leaf
of a balanced tree with N leaves. So the cost of visiting all elements of a tree went up from about 2N for the foreach traversal method to N log(N) for the traversal with iterator.
????
Why does the computation of the concatenated iterator need to get at a leaf of the left or right iterator?
The pun on "collections in depth" is apt. The depth of the data structure matters.
When you invoke top.iterator.next(), each interior Branch delegates to the iterator of the Branch or Node below it, giving a call chain of depth log(N).
You incur that call chain on every next().
Using foreach, you visit each Branch or Node just once.
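To see the per-element call chain in numbers, here is a small instrumented sketch. It does not use the standard library's ++; instead it assumes a deliberately naive concatenating iterator (the hypothetical Concat class below) that forwards every next() to a child, which is the behaviour the book describes. All names in IndirectionCount are invented for this illustration.

```scala
object IndirectionCount {
  sealed trait Tree
  final case class Branch(left: Tree, right: Tree) extends Tree
  final case class Leaf(elem: Int) extends Tree

  // Counts every next() call, including the ones forwarded to child iterators.
  var hops = 0L

  // Naive concatenation: each next() is handed on to one of the two children.
  final class Concat(lhs: Iterator[Int], rhs0: => Iterator[Int]) extends Iterator[Int] {
    private lazy val rhs = rhs0
    def hasNext: Boolean = lhs.hasNext || rhs.hasNext
    def next(): Int = { hops += 1; if (lhs.hasNext) lhs.next() else rhs.next() }
  }

  final class Single(elem: Int) extends Iterator[Int] {
    private var done = false
    def hasNext: Boolean = !done
    def next(): Int = { hops += 1; done = true; elem }
  }

  def iterator(t: Tree): Iterator[Int] = t match {
    case Leaf(e)      => new Single(e)
    case Branch(l, r) => new Concat(iterator(l), iterator(r))
  }

  // Balanced tree whose leaves hold lo, lo + 1, ..., hi - 1 (requires hi > lo).
  def balanced(lo: Int, hi: Int): Tree =
    if (hi - lo == 1) Leaf(lo)
    else {
      val mid = (lo + hi) / 2
      Branch(balanced(lo, mid), balanced(mid, hi))
    }
}
```

With 8 leaves the tree has depth 3, so each of the 8 elements costs 3 forwarded calls plus the leaf's own call: after draining IndirectionCount.iterator(IndirectionCount.balanced(0, 8)) the counter stands at 32, i.e. N * (log2(N) + 1), whereas foreach would touch each of the 15 nodes exactly once.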
Edit: Not sure if this helps, but here is an example of eagerly locating the leaves but lazily producing the values. It would overflow the stack or be slower in older versions of Scala, but the implementation of chained ++ has since been improved: it is now a flat chain that gets shorter as it is consumed.
sealed abstract class Tree extends Iterable[Int] {
def iterator: Iterator[Int] = {
def leafIterator(t: Tree): List[Iterator[Int]] = t match {
case Node(_) => t.iterator :: Nil
case Branch(left, right) => leafIterator(left) ::: leafIterator(right)
}
this match {
case n @ Node(_) => Iterator.fill(1)(n.value)
case Branch(left @ Node(_), right @ Node(_)) => left.iterator ++ right.iterator
case b @ Branch(_, _) =>
leafIterator(b).foldLeft(Iterator[Int]())((all, it) => all ++ it)
}
}
}
case class Branch(left: Tree, right: Tree) extends Tree {
override def toString = s"Branch($left, $right)"
}
case class Node(elem: Int) extends Tree {
def value = {
Console println "An expensive leaf calculation"
elem
}
override def toString = s"Node($elem)"
}
object Test extends App {
// many leaves
val n = 1024 * 1024
val ns: List[Tree] = (1 to n).map(Node(_)).toList
var b = ns
while (b.size > 1) {
b = b.grouped(2).map { case left :: right :: Nil => Branch(left, right) }.toList
}
Console println s"Head: ${b.head.iterator.take(3).toList}"
}
In this implementation, the topmost branch does NOT know how many elements there are in its left and right sub-branches.
Therefore, the iterator is built recursively with a divide-and-conquer approach, which is clearly visible in the iterator method: you descend into each node (case Branch), produce the iterator of each single leaf (case Node => ...), and then join them.
Without visiting each and every node, it would not know which elements there are or how the tree is structured (odd branches allowed vs. not allowed, etc.).
EDIT:
Let's have a look inside the ++ method on Iterator.
def ++[B >: A](that: => GenTraversableOnce[B]): Iterator[B] = new Iterator.JoinIterator(self, that)
and then at Iterator.JoinIterator
private[scala] final class JoinIterator[+A](lhs: Iterator[A], that: => GenTraversableOnce[A]) extends Iterator[A] {
private[this] var state = 0 // 0: lhs not checked, 1: lhs has next, 2: switched to rhs
private[this] lazy val rhs: Iterator[A] = that.toIterator
def hasNext = state match {
case 0 =>
if (lhs.hasNext) {
state = 1
true
} else {
state = 2
rhs.hasNext
}
case 1 => true
case _ => rhs.hasNext
}
def next() = state match {
case 0 =>
if (lhs.hasNext) lhs.next()
else {
state = 2
rhs.next()
}
case 1 =>
state = 0
lhs.next()
case _ =>
rhs.next()
}
override def ++[B >: A](that: => GenTraversableOnce[B]) =
new ConcatIterator(this, Vector(() => that.toIterator))
}
From that we can see that joining iterators just creates a recursive structure in the rhs field. Let's look at that more closely.
Consider a balanced tree with the structure level 1: [A]; level 2: [B][C]; level 3: [D][E][F][G].
When you create a JoinIterator, you preserve the existing lhs iterator. However, you always call .toIterator on rhs, which means that for each subsequent level the rhs part is reconstructed. So for B ++ C you get a structure that looks like A.lhs (which stands for B) and A.rhs (which stands for C.toIterator), where C.toIterator in turn stands for C.lhs and C.rhs, and so on. Hence the added complexity.
I hope this answers your question.
I would like to take random samples from very large lists while maintaining the order. I wrote the script below, but it requires .map(idx => ls(idx)) which is very wasteful. I can see a way of making this more efficient with a helper function and tail recursion, but I feel that there must be a simpler solution that I'm missing.
Is there a clean and more efficient way of doing this?
import scala.util.Random
def sampledList[T](ls: List[T], sampleSize: Int) = {
Random
.shuffle(ls.indices.toList)
.take(sampleSize)
.sorted
.map(idx => ls(idx))
}
val sampleList = List("t","h","e"," ","q","u","i","c","k"," ","b","r","o","w","n")
// imagine the list is much longer though
sampledList(sampleList, 5) // List(e, u, i, r, n)
EDIT:
It appears I was unclear: I am referring to maintaining the order of the values, not the original List collection.
If by
maintaining the order of the values
you mean keeping the elements in the sample in the same order as in the ls list, then with a small modification to your original solution the performance can be greatly improved:
import scala.util.Random
def sampledList[T](ls: List[T], sampleSize: Int) = {
Random.shuffle(ls.zipWithIndex).take(sampleSize).sortBy(_._2).map(_._1)
}
This solution has a complexity of O(n + k*log(k)), where n is the list's size, and k is the sample size, while your solution is O(n + k * log(k) + n*k).
Here is a (more complex) alternative that has O(n) complexity. You can't do any better in terms of complexity (though you could get better performance by using another collection, in particular one with a constant-time size implementation). I did a quick benchmark which indicated that the speedup is very substantial.
import scala.util.Random
import scala.annotation.tailrec
def sampledList[T](ls: List[T], sampleSize: Int) = {
@tailrec
def rec(list: List[T], listSize: Int, sample: List[T], sampleSize: Int): List[T] = {
require(listSize >= sampleSize,
s"listSize must be >= sampleSize, but got listSize=$listSize and sampleSize=$sampleSize"
)
list match {
case hd :: tl =>
if (Random.nextInt(listSize) < sampleSize)
rec(tl, listSize-1, hd :: sample, sampleSize-1)
else rec(tl, listSize-1, sample, sampleSize)
case Nil =>
require(sampleSize == 0, // Should never happen
s"sampleSize must be zero at the end of processing, but got $sampleSize"
)
sample
}
}
rec(ls, ls.size, Nil, sampleSize).reverse
}
The above implementation simply iterates over the list and keeps (or not) the current element according to a probability which is designed to give the same chance to each element. My logic may have a flaw, but at first blush it seems sound to me.
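One way to check the "same chance for each element" claim is to compute the inclusion probability exactly instead of sampling. The sketch below follows the rule used above (keep the current element with probability samples-still-needed divided by elements-still-to-visit) and recurses over both outcomes. The object and function names are invented for this illustration, and the recursion is exponential in pos, so it is only meant for small inputs.

```scala
object InclusionProbability {
  // Probability that the element at 0-based position pos, out of m elements
  // still to be visited, ends up in the sample when j picks are still needed.
  // Assumes 0 <= j <= m and 0 <= pos < m.
  def prob(pos: Int, m: Int, j: Int): Double =
    if (j == 0) 0.0
    else if (pos == 0) j.toDouble / m
    else {
      val take = j.toDouble / m // chance that the current head is kept
      take * prob(pos - 1, m - 1, j - 1) + (1 - take) * prob(pos - 1, m - 1, j)
    }
}
```

For a list of 10 elements and a sample of 3, prob(p, 10, 3) comes out as 0.3 (up to rounding) for every position p from 0 to 9. The same follows by induction: j/m * (j-1)/(m-1) + (m-j)/m * j/(m-1) = j/m.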
Here's another O(n) implementation that should have a uniform probability for each element:
import scala.util.Random
import scala.collection.mutable.ListBuffer
implicit class SampleSeqOps[T](s: Seq[T]) {
def sample(n: Int, r: Random = Random): Seq[T] = {
assert(n >= 0 && n <= s.length)
val res = ListBuffer[T]()
val length = s.length
var samplesNeeded = n
for { (e, i) <- s.zipWithIndex } {
val p = samplesNeeded.toDouble / (length - i)
if (p >= r.nextDouble()) {
res += e
samplesNeeded -= 1
}
}
res.toSeq
}
}
I'm using it frequently with collections > 100'000 elements and the performance seems reasonable.
It's probably the same idea as in Régis Jean-Gilles's answer but I think the imperative solution is slightly more readable in this case.
Perhaps I don't quite understand, but since Lists are immutable you don't really need to worry about 'maintaining the order' since the original List is never touched. Wouldn't the following suffice?
def sampledList[T](ls: List[T], sampleSize: Int) =
Random.shuffle(ls).take(sampleSize)
While my previous answer has linear complexity, it does have the drawback of requiring two passes, the first one being needed to compute the length before doing anything else. Besides affecting the running time, we might want to sample a very large collection for which it is neither practical nor efficient to load the whole collection in memory at once, in which case we'd like to be able to work with a simple iterator.
As it happens, we don't need to invent anything to fix this. There is a simple and clever algorithm called reservoir sampling which does exactly this (building a sample as we iterate over a collection, all in one pass). With a minor modification we can also preserve the order, as required:
import scala.util.Random
def sampledList[T](ls: TraversableOnce[T], sampleSize: Int, preserveOrder: Boolean = false, rng: Random = new Random): Iterable[T] = {
val result = collection.mutable.Buffer.empty[(T, Int)]
for ((item, n) <- ls.toIterator.zipWithIndex) {
if (n < sampleSize) result += (item -> n)
else {
val s = rng.nextInt(n + 1) // uniform in 0..n, so item n is kept with probability sampleSize / (n + 1)
if (s < sampleSize) {
result(s) = (item -> n)
}
}
}
if (preserveOrder) {
result.sortBy(_._2).map(_._1)
}
else result.map(_._1)
}
I have got a list of lists where the content is a vector of characters. For example:
yoda <- list(a=list(c("A","B","C"), c("B","C","D")), b=list(c("D","C"), c("B","C","D","E","F")))
This is a much shorter version of what I am actually working with: my real list has 11 members, each with about 12 sublists. For each list member I need to pick one sublist, e.g. one list for "a" and one list for "b". I would like to find which combination of sublists gives the greatest number of unique values. In this simple example it would be the first sublist in "a" and the second sublist in "b", giving a final answer of:
c("A","B","C","D","E","F")
At the moment I have just got a huge number of nested loops and it seems to be taking forever. Here is the poor bit of code:
res <- list()
for (a in 1:length(extra.pats[[1]])) {
for (b in 1:length(extra.pats[[2]])) {
for (c in 1:length(extra.pats[[3]])) {
for (d in 1:length(extra.pats[[4]])) {
for (e in 1:length(extra.pats[[5]])) {
for (f in 1:length(extra.pats[[6]])) {
for (g in 1:length(extra.pats[[7]])) {
for (h in 1:length(extra.pats[[8]])) {
for (i in 1:length(extra.pats[[9]])) {
for (j in 1:length(extra.pats[[10]])) {
for (k in 1:length(extra.pats[[11]])) {
res[[paste(a,b,c,d,e,f,g,h,i,j,k, sep="_")]] <- unique(c(extra.pats[[1]][[a]], extra.pats[[2]][[b]], extra.pats[[3]][[c]], extra.pats[[4]][[d]], extra.pats[[5]][[e]], extra.pats[[6]][[f]], extra.pats[[7]][[g]], extra.pats[[8]][[h]], extra.pats[[9]][[i]], extra.pats[[10]][[j]], extra.pats[[11]][[k]]))
}
}
}
}
}
}
}
}
}
}
}
If anyone has any ideas on how to do this properly, that would be great.
Here's a proposal:
# create all possible combinations
comb <- expand.grid(yoda)
# find unique values for each combination
uni <- lapply(seq(nrow(comb)), function(x) unique(unlist(comb[x, ])))
# count the unique values
len <- lapply(uni, length)
# extract longest combination
uni[which.max(len)]
[[1]]
[1] "A" "B" "C" "D" "E" "F"
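For comparison, the same brute-force idea (enumerate every combination, take the union of each, keep the largest) can be sketched in Scala. This is only an illustration: BestCombination, allChoices and best are names invented for the example, every group is assumed to be non-empty, and like expand.grid it is only feasible when the number of combinations is small.

```scala
object BestCombination {
  // Lazily enumerates every way of picking one candidate set from each group.
  def allChoices[A](groups: List[List[Set[A]]]): Iterator[List[Set[A]]] = groups match {
    case Nil => Iterator.single(Nil)
    case group :: rest =>
      for {
        choice <- group.iterator
        others <- allChoices(rest)
      } yield choice :: others
  }

  // Union of the combination with the most distinct values.
  def best[A](groups: List[List[Set[A]]]): Set[A] =
    allChoices(groups)
      .map(choice => choice.foldLeft(Set.empty[A])(_ union _))
      .maxBy(_.size)
}
```

Applied to the yoda example (with each character vector turned into a Set), best returns the six letters A to F, matching the R output above.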
Your current problem dimensions prohibit an exhaustive search: 11 lists with about 12 sublists each give roughly 12^11 ≈ 7.4 × 10^11 combinations. Here is an example of a suboptimal algorithm. While simple, maybe you'll find that it gives you "good enough" results.
The algorithm goes as follows:
1. Look at your first list: pick the item with the highest number of unique values.
2. Look at the second list: pick the item that brings the highest number of new unique values in addition to what you already selected in step 1.
3. Repeat until you have reached the end of your list.
The code:
good.cover <- function(top.list) {
selection <- vector("list", length(top.list))
num.new.unique <- function(x, y) length(setdiff(y, x))
for (i in seq_along(top.list)) {
score <- sapply(top.list[[i]], num.new.unique, x = unlist(selection))
selection[[i]] <- top.list[[i]][which.max(score)]
}
selection
}
Let's make up some data:
items.universe <- apply(expand.grid(list(LETTERS, 0:9)), 1, paste, collapse = "")
random.length <- function()sample(3:6, 1)
random.sample <- function(i)sample(items.universe, random.length())
random.list <- function(i)lapply(letters[1:12], random.sample)
initial.list <- lapply(1:11, random.list)
Now run it:
system.time(final.list <- good.cover(initial.list))
# user system elapsed
# 0.004 0.000 0.004
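The same greedy steps can be sketched in Scala as a single fold that carries the values covered so far. As with the R version, this is a heuristic and not guaranteed to find the best combination; GreedyCover and goodCover are hypothetical names, and each group is assumed to contain at least one candidate.

```scala
object GreedyCover {
  // For each group in turn, pick the candidate that adds the most values not yet covered.
  def goodCover[A](groups: Seq[Seq[Set[A]]]): Seq[Set[A]] = {
    val start = (Set.empty[A], Vector.empty[Set[A]])
    val (_, picked) = groups.foldLeft(start) {
      case ((covered, acc), candidates) =>
        val best = candidates.maxBy(c => (c diff covered).size)
        (covered union best, acc :+ best)
    }
    picked
  }
}
```

The work is proportional to the total number of candidates across all groups, instead of the product of the group sizes that an exhaustive search would need.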