Positional before Named argument list parsing - arguments

How would you do that with parser combinators?
def namedAfterPos[P, N](pos: Parser[P], nmd: Parser[N], sep: Parser[_] = ",") = ???
List("a", "a,a,a", "a,a,a=b,a=b", "a=b, a=b") map (_ parseWith namedAfterPos("a", "a=b")) map {case Success(res, _) => res}
val Failure("positional is not expected after named", pos) = "a,a=b,a" parseWith namedAfterPos("a", "a=b")

OK, here is my approach:
scala> def namedAfterPos[P, N](pos: Parser[P], nmd: Parser[N], sep: Parser[_] = ",") = {
// NB! http://stackoverflow.com/q/38041980/6267925 discusses that defining type
// aliases for the result type is a bad programming practice, which we use anyway!
// Nevertheless, define the result type: first comes the list of positional params, then the named ones.
type Res = (List[P], List[N]); // named and positional have different types
def namedOrPositional(positional: Boolean, acc: Res): Parser[Res] = {
// once association accepted, we'll try the comma and go for next or exit
def recur(positional: Boolean, acc: Res) =
(sep flatMap {_ => namedOrPositional(positional, acc)}) | success(acc);
// named association should always be acceptable - stop accepting positionals when found
(nmd flatMap {n => recur(false, acc match {case (p,nn)=> (p, n :: nn)})}) |
// if named failed - we have positional arg
(pos flatMap {p =>
// proceed if positionals are still accepted
if (positional) recur(true, acc match {case (pp, n) => (p :: pp, n)})
// if they are not - report parsing failure
else failure("positional is not expected after named")})
};
namedOrPositional(true, (Nil, Nil)) // start collecting the args
}
defined function namedAfterPos
scala> List("a", "a,a,a", "a,a,a=b,a=b", "a=b, a=b") map (
_ p namedAfterPos("a", "a=b")) map {case Success(res, _) => res}
res67: List[(List[P], List[N])] = List((List(a), List()), (List(a, a, a), List()), (List(a, a), List(a=b, a=b)), (List(), List(a=b, a=b)))
val Failure("positional is not expected after named", pos) = "a,a=b,a" p namedAfterPos("a", "a=b")
pos: Input = scala.util.parsing.input.CharSequenceReader#1726cd4
// a named param usually has the form identifier "=" expr, and positionals are expressions
scala> def paredArgList[K,V](name: Parser[K] = identifier, value: Parser[V] = expr) =
pared(namedAfterPos(value, name ~ ("=" ~> value) map {case n~v => (n,v)}))
defined function paredArgList
scala> List("a+b-1", "b=1+1", "a,a+1", "b=3+1,c=c+1", "1,b=g+g,d=123,bd=123+1") map ("(" + _ + ")" p paredArgList()) map {case Success(res, _) => res}
res70: List[(List[P], List[N])] = List((List(a + b - 1), List()), (List(), List((b,1 + 1))), (List(a + 1, a), List()), (List(), List((c,c + 1), (b,3 + 1))), (List(1), List((bd,123 + 1), (d,123), (b,g + g))))

Related

Given: aabcdddeabb => Expected: [(a,2),(b,1),(c,1),(d,3),(e,1),(a,1),(b,1)] in Scala

I'm really interested in how this algorithm can be implemented. If possible, it would be great to see an implementation with and without recursion. I am new to the language so I would be very grateful for help. All I could come up with was this code and it goes no further:
print(counterOccur("aabcdddeabb"))
def counterOccur(string: String) =
string.toCharArray.toList.map(char => {
if (!char.charValue().equals(char.charValue() + 1)) (char, counter)
else (char, counter + 1)
})
I realize that it's not even close to the truth, I just don't even have a clue what else could be used.
First solution, using recursion. I take the string Char by Char and check whether the last element of the Vector holds the same character as the current one. If it does, I update the last element by increasing its count (the first case). If it does not, I append a new element to the Vector (the second case). Once all Chars of the string are consumed, I return the result.
import scala.annotation.tailrec
def counterOccur(string: String): Vector[(Char, Int)] = {
@tailrec
def loop(str: List[Char], result: Vector[(Char, Int)]): Vector[(Char, Int)] = {
str match {
case x :: xs if result.lastOption.exists(_._1.equals(x)) =>
val count = result(result.size - 1)._2
loop(xs, result.updated(result.size - 1, (x, count + 1)))
case x :: xs =>
loop(xs, result :+ (x, 1))
case Nil => result
}
}
loop(string.toList, Vector.empty[(Char, Int)])
}
println(counterOccur("aabcdddeabb"))
Second solution, without recursion. It works the same way, but uses foldLeft instead of recursion.
def counterOccur2(string: String): Vector[(Char, Int)] = {
string.foldLeft(Vector.empty[(Char, Int)])((r, v) => {
val lastElementIndex = r.size - 1
if (r.lastOption.exists(lv => lv._1.equals(v))) {
r.updated(lastElementIndex, (v, r(lastElementIndex)._2 + 1))
} else {
r :+ (v, 1)
}
})
}
println(counterOccur2("aabcdddeabb"))
You can use a very simple foldLeft to accumulate. You also don't need toCharArray and toList because strings are implicitly convertible to Seq[Char]:
"aabcdddeabb".foldLeft(collection.mutable.ListBuffer[(Char,Int)]()){ (acc, elm) =>
acc.lastOption match {
case Some((c, i)) if c == elm =>
acc.dropRightInPlace(1).addOne((elm, i+1))
case _ =>
acc.addOne((elm, 1))
}
}
Here is a solution using foldLeft and a custom State case class:
def countConsecutives[A](data: List[A]): List[(A, Int)] = {
final case class State(currentElem: A, currentCount: Int, acc: List[(A, Int)]) {
def result: List[(A, Int)] =
((currentElem -> currentCount) :: acc).reverse
def nextState(newElem: A): State =
if (newElem == currentElem)
this.copy(currentCount = this.currentCount + 1)
else
State(
currentElem = newElem,
currentCount = 1,
acc = (this.currentElem -> this.currentCount) :: this.acc
)
}
object State {
def initial(a: A): State =
State(
currentElem = a,
currentCount = 1,
acc = List.empty
)
}
data match {
case a :: tail =>
tail.foldLeft(State.initial(a)) {
case (state, newElem) =>
state.nextState(newElem)
}.result
case Nil =>
List.empty
}
}
You can see the code running here.
One possibility is to use the unfold method. This method is defined for several collection types, here I'm using it to produce an Iterator (documented here for version 2.13.8):
def spans[A](as: Seq[A]): Iterator[Seq[A]] =
Iterator.unfold(as) {
case head +: tail =>
val (span, rest) = tail.span(_ == head)
Some((head +: span, rest))
case _ =>
None
}
unfold starts from a state and applies a function that returns either:
- None, to signal that the collection has ended
- Some of a pair that contains the next item of the collection we want to produce and the "remaining" state that will be fed to the next iteration.
In this example in particular, we start from a sequence of A called as (which can be a sequence of characters) and at each iteration:
- if there's at least one item
  - we split head and tail
  - we further split the tail into the longest prefix that contains items equal to the head and the rest
  - we return the head and the prefix we got above as the next item
  - we return the rest of the collection as the state for the following iteration
- otherwise, we return None as there's nothing more to be done
The result is a fairly flexible function that can be used to group together spans of equal items. You can then define the function you wanted initially in terms of this:
def spanLengths[A](as: Seq[A]): Iterator[(A, Int)] =
spans(as).map(a => a.head -> a.length)
This can probably be made more generic and its performance improved, but I hope it is a helpful example of another possible approach. While folding a collection is a recursive approach, unfolding is referred to as a corecursive one (Wikipedia article).
You can play around with this code here on Scastie.
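The head/span/rest steps described above can also be written as plain recursion on a List, without unfold. This is only a sketch of the same idea; the name runLengths is made up for illustration and does not appear in the answers above.

```scala
// Sketch: run-length grouping by direct recursion (hypothetical helper name).
def runLengths[A](xs: List[A]): List[(A, Int)] = xs match {
  case Nil => Nil
  case head :: tail =>
    // longest prefix of the tail equal to the head, and whatever follows it
    val (same, rest) = tail.span(_ == head)
    (head, same.length + 1) :: runLengths(rest)
}

val runs: List[(Char, Int)] = runLengths("aabcdddeabb".toList)
```

The recursion is not tail recursive; its depth equals the number of runs, so an input with a very large number of runs would be better served by the unfold or foldLeft versions.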
For
str = "aabcdddeabb"
you could extract matches of the regular expression
rgx = /(.)\1*/
to obtain the array
["aa", "b", "c", "ddd", "e", "a", "bb"]
and then map each element of the array to the desired string.¹
def counterOccur(str: String): List[(Char, Int)] = {
"""(.)\1*""".r
.findAllIn(str)
.map(m => (m.charAt(0), m.length)).toList
}
counterOccur("aabcdddeabb")
// => res0: List[(Char, Int)] = List((a,2), (b,1), (c,1), (d,3), (e,1), (a,1), (b,2))
The regular expression reads: "match any character and save it to capture group 1 ((.)), then match the content of capture group 1 zero or more times (\1*)".
1. Scala code kindly provided by @Thefourthbird.

Scala how to define an ordering for Rationals

I have to implement compareRationals as something like
(a, b) => {
the body goes here
}
To compare two fractions, transform them so that they both have the same denominator, then order the two results by their numerators. To make sure they have the same denominator, I need to find the least common denominator. My code currently works for all the println statements except println(insertionSort2(List(rationals))). I really need help defining compareRationals so that println(insertionSort2(List(rationals))) should be List(fourth, third, half).
Object {
def insertionSort2[A](xs: List[A])(implicit ord: Ordering[A]): List[A] = {
def insert2(y: A, ys: List[A]): List[A] =
ys match {
case List() => y :: List()
case z :: zs =>
if (ord.lt(y, z)) y :: z :: zs
else z :: insert2(y, zs)
}
xs match {
case List() => List()
case y :: ys => insert2(y, insertionSort2(ys))
}
}
class Rational(x: Int, y: Int) {
private def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)
private val g = gcd(x, y)
lazy val numer: Int = x / g
lazy val denom: Int = y / g
}
val compareRationals: (Rational, Rational) => Int =
implicit val rationalOrder: Ordering[Rational] =
new Ordering[Rational] {
def compare(x: Rational, y: Rational): Int = compareRationals(x, y)
}
def main(args: Array[String]): Unit = {
val half = new Rational(1, 2)
val third = new Rational(1, 3)
val fourth = new Rational(1, 4)
val rationals = List(third, half, fourth)
println(insertionSort2(List(4,2,9,5,8))(Ordering.Int))
println(insertionSort2(List(4,2,9,5,8)))
println(insertionSort2(List(rationals)))
}
}
}
I think this is all you need.
val compareRationals: (Rational, Rational) => Int =
(x,y) => x.numer * y.denom - y.numer * x.denom
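The one-liner above is compact, but the Int subtraction can overflow for large numerators and denominators, and cross-multiplication only gives the right sign when both denominators are positive. A minimal sketch that guards against both, under the assumption that denominators are positive: the Rational class here is a simplified stand-in for the one in the question, and the sketch sorts the list of rationals directly rather than `List(rationals)`, which would be a list containing one list.

```scala
// Hypothetical stand-in for the question's Rational; this sketch assumes y > 0.
final class Rational(x: Int, y: Int) {
  require(y > 0, "sketch assumes a positive denominator")
  private def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)
  private val g = math.abs(gcd(x, y)) // keep the sign on the numerator
  val numer: Int = x / g
  val denom: Int = y / g
  override def toString: String = s"$numer/$denom"
}

// a/b < c/d  <=>  a*d < c*b when b, d > 0; Long products avoid Int overflow.
val compareRationals: (Rational, Rational) => Int =
  (a, b) => java.lang.Long.compare(a.numer.toLong * b.denom, b.numer.toLong * a.denom)

implicit val rationalOrder: Ordering[Rational] =
  new Ordering[Rational] {
    def compare(a: Rational, b: Rational): Int = compareRationals(a, b)
  }

val half = new Rational(1, 2)
val third = new Rational(1, 3)
val fourth = new Rational(1, 4)
val sortedRationals: List[Rational] = List(third, half, fourth).sorted
```

With the implicit Ordering in scope, the same list could be passed to insertionSort2 from the question instead of calling sorted.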

How to get a termination reason from a recursive function?

Suppose a function is looped to produce a numeric result. The looping is stopped either if the iterations maximum is reached or the "optimality" condition is met. In either case, the value from the current loop is output. What is a functional way to get both this result and the stopping reason?
For illustration, here's my Scala implementation of the "Square Roots" example in 4.1 of https://www.cs.kent.ac.uk/people/staff/dat/miranda/whyfp90.pdf.
object SquareRootAlg {
def next(a: Double)(x: Double): Double = (x + a/x)/2
def repeat[A](f: A=>A, a: A): Stream[A] = a #:: repeat(f, f(a))
def loopConditional[A](stop: (A, A) => Boolean)(s: => Stream[A] ): A = s match {
case a #:: t if t.isEmpty => a
case a #:: t => if (stop(a, t.head)) t.head else loopConditional(stop)(t)}
}
E.g., to find the square root of 4:
import SquareRootAlg._
val cond = (a: Double, b: Double) => (a-b).abs < 0.01
val alg = loopConditional(cond) _
val s = repeat(next(4.0), 4.0)
alg(s.take(3)) // = 2.05, "maxIters exceeded"
alg(s.take(5)) // = 2.00000009, "optimality reached"
This code works, but doesn't give me the stopping reason. So I'm trying to write a method
def loopConditionalInfo[A](stop: (A, A)=> Boolean)(s: => Stream[A]): (A, Boolean)
outputting (2.05, false) in the first case above, and (2.00000009, true) in the second. Is there a way to write this method without modifying the next and repeat methods? Or would another functional approach work better?
Typically, you need to return a value that includes both a stopping reason and the result. Using the (A, Boolean) return signature you propose allows for this.
Your code would then become:
import scala.annotation.tailrec
object SquareRootAlg {
def next(a: Double)(x: Double): Double = (x + a/x)/2
def repeat[A](f: A=>A, a: A): Stream[A] = a #:: repeat(f, f(a))
@tailrec // Checks that the function is truly tail recursive.
def loopConditional[A](stop: (A, A) => Boolean)(s: => Stream[A] ): (A, Boolean) = {
val a = s.head
val t = s.tail
if(t.isEmpty) (a, false)
else if(stop(a, t.head)) (t.head, true)
else loopConditional(stop)(t)
}
}
Just return the booleans without modifying anything else:
object SquareRootAlg {
def next(a: Double)(x: Double): Double = (x + a/x)/2
def repeat[A](f: A => A, a: A): Stream[A] = a #:: repeat(f, f(a))
def loopConditionalInfo[A]
(stop: (A, A)=> Boolean)
(s: => Stream[A])
: (A, Boolean) = s match {
case a #:: t if t.isEmpty => (a, false)
case a #:: t =>
if (stop(a, t.head)) (t.head, true)
else loopConditionalInfo(stop)(t)
}
}
import SquareRootAlg._
val cond = (a: Double, b: Double) => (a-b).abs < 0.01
val alg = loopConditionalInfo(cond) _
val s = repeat(next(4.0), 4.0)
println(alg(s.take(3))) // = 2.05, "maxIters exceeded"
println(alg(s.take(5)))
prints
(2.05,false)
(2.0000000929222947,true)
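If a bare Boolean feels too opaque at the call site, one alternative is a small sealed type that names the stopping reason. The sketch below assumes Scala 2.13+ (LazyList in place of the deprecated Stream); StopReason, Converged, MaxItersReached and loopWithReason are illustrative names, not part of the original code.

```scala
import scala.annotation.tailrec

// Hypothetical ADT naming why the loop stopped.
sealed trait StopReason
case object Converged extends StopReason
case object MaxItersReached extends StopReason

// Walks a finite sequence of approximations and reports why it stopped.
@tailrec
def loopWithReason[A](stop: (A, A) => Boolean)(s: LazyList[A]): (A, StopReason) = {
  val a = s.head // throws on empty input, like the versions above
  val t = s.tail
  if (t.isEmpty) (a, MaxItersReached)
  else if (stop(a, t.head)) (t.head, Converged)
  else loopWithReason(stop)(t)
}

// Newton iteration for the square root of 4, starting from 4.0
val approximations: LazyList[Double] = LazyList.iterate(4.0)(x => (x + 4.0 / x) / 2)
val closeEnough = (a: Double, b: Double) => (a - b).abs < 0.01

val afterThree = loopWithReason(closeEnough)(approximations.take(3))
val afterFive = loopWithReason(closeEnough)(approximations.take(5))
```

Callers can then pattern match on the reason, and further reasons can be added later without changing the return type.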

Compare the Elements of List and creating a [key value Pairs or Maps] based on logic in Scala

I have a list with the following data. I have to compare the elements of the list and create a map under a specified condition: /dev/sftp/SFTP.csv should map to /dev/sftp/SFTP_schema.json, and likewise for the other elements.
List[String] = List(
"/dev/sftp/SFTP.csv" ,
"/dev/sftp/test_schema.json" ,
"/dev/sftp/SFTP_schema.json",
"/dev/sftp/test.csv"
)
I have a large set. What is the fastest way to do it?
So, you essentially want to invert a map.flatMap { case (k, v) => List(k, v) }? That looks like fun... How about this:
val input = List(
"/dev/sftp/SFTP.csv" ,
"/dev/sftp/test_schema.json" ,
"/dev/sftp/SFTP_schema.json",
"/dev/sftp/test.csv"
)
val res = input.
groupBy(s => s.
split("/").
last.
replaceAll("\\.csv","").
replaceAll("_schema\\.json","")
).
map {
case (k, v1 :: v2 :: Nil) =>
if (v1.endsWith("csv")) (v1, v2)
else (v2, v1)
case sthElse => throw new Error(
"Invalid combination of csv & schema.json: " + sthElse
)
}
println(res)
Produces:
// Map(
// /dev/sftp/SFTP.csv -> /dev/sftp/SFTP_schema.json,
// /dev/sftp/test.csv -> /dev/sftp/test_schema.json
// )
As method:
def invertFlatMapToUnionKeyValue(input: List[String]): Map[String, String] = {
input.
groupBy(s => s.split("/").last.
replaceAll("\\.csv","").
replaceAll("_schema\\.json",""
)).
map {
case (k, v1 :: v2 :: Nil) =>
if (v1.endsWith("csv")) (v1, v2)
else (v2, v1)
case sthElse => throw new Error(
"Invalid combination of csv & schema.json: " + sthElse
)
}
}
You can partition a list into 2, based on a predicate:
val (csvs, jsons) = input.partition (n => n.endsWith (".csv"))
// csvs: List[String] = List(/dev/sftp/SFTP.csv, /dev/sftp/test.csv)
// jsons: List[String] = List(/dev/sftp/test_schema.json, /dev/sftp/SFTP_schema.json)
Then just iterate over the names, stripping off .csv and _schema.json:
for (c <- csvs;
j <- jsons;
if (c.substring (0, c.length - 4) == j.substring (0, j.length - 12))) yield
(c, j)
to combine the matches.
If we can assume that every csv always has a json schema entry, one approach is to partition the list and zip the two halves after sorting. This also relies on both sorted lists ending up in the same relative order.
val (csvs, jsons) = input.partition (n => n.endsWith (".csv"))
csvs.sorted.zip(jsons.sorted)
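For a large input, the nested for-comprehension compares every csv with every json, and the sorted zip depends on the two lists lining up. A sketch that avoids both by indexing the schema files in a Map and looking each csv up once; pairCsvWithSchema is a hypothetical name, and returning an Option for a missing schema is an assumption about the desired behaviour.

```scala
// Sketch: pair each .csv with its _schema.json through a Map lookup.
def pairCsvWithSchema(paths: List[String]): Map[String, Option[String]] = {
  val (csvs, others) = paths.partition(_.endsWith(".csv"))
  // index schema files by their path without the "_schema.json" suffix
  val schemaByStem: Map[String, String] =
    others
      .filter(_.endsWith("_schema.json"))
      .map(j => j.stripSuffix("_schema.json") -> j)
      .toMap
  // a csv without a matching schema maps to None instead of throwing
  csvs.map(c => c -> schemaByStem.get(c.stripSuffix(".csv"))).toMap
}

val paired = pairCsvWithSchema(List(
  "/dev/sftp/SFTP.csv",
  "/dev/sftp/test_schema.json",
  "/dev/sftp/SFTP_schema.json",
  "/dev/sftp/test.csv"
))
```

Each path is touched a constant number of times, so the work should grow roughly linearly with the size of the list.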

Simplest way to get the top n elements of a Scala Iterable

Is there a simple and efficient solution to determine the top n elements of a Scala Iterable? I mean something like
iter.toList.sortBy(_.myAttr).take(2)
but without having to sort all elements when only the top 2 are of interest. Ideally I'm looking for something like
iter.top(2, _.myAttr)
see also: Solution for the top element using an Ordering: In Scala, how to use Ordering[T] with List.min or List.max and keep code readable
Update:
Thank you all for your solutions. In the end, I took the original solution of user unknown and adapted it to use Iterable and the pimp-my-library pattern:
implicit def iterExt[A](iter: Iterable[A]) = new {
def top[B](n: Int, f: A => B)(implicit ord: Ordering[B]): List[A] = {
def updateSofar (sofar: List [A], el: A): List [A] = {
//println (el + " - " + sofar)
if (ord.compare(f(el), f(sofar.head)) > 0)
(el :: sofar.tail).sortBy (f)
else sofar
}
val (sofar, rest) = iter.splitAt(n)
(sofar.toList.sortBy (f) /: rest) (updateSofar (_, _)).reverse
}
}
case class A(s: String, i: Int)
val li = List (4, 3, 6, 7, 1, 2, 9, 5).map(i => A(i.toString(), i))
println(li.top(3, _.i))
My solution (bound to Int, but it should be easy to change to Ordered; a few minutes please):
def top (n: Int, li: List [Int]) : List[Int] = {
def updateSofar (sofar: List [Int], el: Int) : List [Int] = {
// println (el + " - " + sofar)
if (el < sofar.head)
(el :: sofar.tail).sortWith (_ > _)
else sofar
}
/* better readable:
val sofar = li.take (n).sortWith (_ > _)
val rest = li.drop (n)
(sofar /: rest) (updateSofar (_, _)) */
(li.take (n). sortWith (_ > _) /: li.drop (n)) (updateSofar (_, _))
}
usage:
val li = List (4, 3, 6, 7, 1, 2, 9, 5)
top (2, li)
For the list above, take the first 2 (4, 3) as the starting TopTen (TopTwo).
Sort them so that the first element is the bigger one (if any).
Repeatedly iterate through the rest of the list (li.drop(n)) and compare the current element with the maximum of the list of minimums; replace it if necessary, and re-sort.
Improvements:
Throw away Int, and use ordered.
Throw away (_ > _) and use a user-Ordering to allow BottomTen. (Harder: pick the middle 10 :) )
Throw away List, and use Iterable instead
update (abstraction):
def extremeN [T](n: Int, li: List [T])
(comp1: ((T, T) => Boolean), comp2: ((T, T) => Boolean)):
List[T] = {
def updateSofar (sofar: List [T], el: T) : List [T] =
if (comp1 (el, sofar.head))
(el :: sofar.tail).sortWith (comp2 (_, _))
else sofar
(li.take (n) .sortWith (comp2 (_, _)) /: li.drop (n)) (updateSofar (_, _))
}
/* still bound to Int:
def top (n: Int, li: List [Int]) : List[Int] = {
extremeN (n, li) ((_ < _), (_ > _))
}
def bottom (n: Int, li: List [Int]) : List[Int] = {
extremeN (n, li) ((_ > _), (_ < _))
}
*/
def top [T] (n: Int, li: List [T])
(implicit ord: Ordering[T]): Iterable[T] = {
extremeN (n, li) (ord.lt (_, _), ord.gt (_, _))
}
def bottom [T] (n: Int, li: List [T])
(implicit ord: Ordering[T]): Iterable[T] = {
extremeN (n, li) (ord.gt (_, _), ord.lt (_, _))
}
top (3, li)
bottom (3, li)
val sl = List ("Haus", "Garten", "Boot", "Sumpf", "X", "y", "xkcd", "x11")
bottom (2, sl)
To replace List with Iterable seems to be a bit harder.
As Daniel C. Sobral pointed out in the comments, a high n in topN can lead to a lot of sorting work, so it can be useful to do a manual insertion sort instead of repeatedly sorting the whole list of top-n elements:
def extremeN [T](n: Int, li: List [T])
(comp1: ((T, T) => Boolean), comp2: ((T, T) => Boolean)):
List[T] = {
def sortedIns (el: T, list: List[T]): List[T] =
if (list.isEmpty) List (el) else
if (comp2 (el, list.head)) el :: list else
list.head :: sortedIns (el, list.tail)
def updateSofar (sofar: List [T], el: T) : List [T] =
if (comp1 (el, sofar.head))
sortedIns (el, sofar.tail)
else sofar
(li.take (n) .sortWith (comp2 (_, _)) /: li.drop (n)) (updateSofar (_, _))
}
top/bottom method and usage as above. For small groups of top/bottom elements, the sorting is rarely called: a few times in the beginning, and then less and less often over time. For example, 70 times with top (10) of 10 000, and 90 times with top (10) of 100 000.
Here's another solution that is simple and has pretty good performance.
def pickTopN[T](k: Int, iterable: Iterable[T])(implicit ord: Ordering[T]): Seq[T] = {
val q = collection.mutable.PriorityQueue[T](iterable.toSeq:_*)
val end = Math.min(k, q.size)
(1 to end).map(_ => q.dequeue())
}
The Big O is O(n + k log n), where k <= n. So the performance is linear for small k and at worst n log n.
The solution can also be optimized to be O(k) for memory but O(n log k) for performance. The idea is to use a MinHeap to track only the top k items at all times. Here's the solution.
def pickTopN[A, B](n: Int, iterable: Iterable[A], f: A => B)(implicit ord: Ordering[B]): Seq[A] = {
val seq = iterable.toSeq
val q = collection.mutable.PriorityQueue[A](seq.take(n):_*)(ord.on(f).reverse) // initialize with first n
// invariant: keep the top k scanned so far
seq.drop(n).foreach(v => {
q += v
q.dequeue()
})
q.dequeueAll.reverse
}
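A variant of the bounded-heap idea above, sketched under two assumptions: the input is traversed once without copying it to a Seq, and an element is only enqueued when it beats the smallest element currently kept. It targets Scala 2.13; topNBy and Item are hypothetical names used for illustration.

```scala
import scala.collection.mutable

// Sketch: keep at most n items in a min-heap keyed by f, in a single pass.
def topNBy[A, B](n: Int, items: Iterable[A])(f: A => B)(implicit ord: Ordering[B]): List[A] = {
  // the reversed ordering turns PriorityQueue (a max-heap) into a min-heap on f
  val heap = mutable.PriorityQueue.empty[A](ord.on(f).reverse)
  items.foreach { item =>
    if (heap.size < n) heap.enqueue(item)
    else if (n > 0 && ord.gt(f(item), f(heap.head))) {
      heap.dequeue() // drop the smallest of the kept items
      heap.enqueue(item)
    }
  }
  // dequeueAll yields smallest-first here, so reverse for largest-first
  heap.dequeueAll.reverse.toList
}

// Hypothetical element type for the usage example
case class Item(name: String, score: Int)
val items = List(4, 3, 6, 7, 1, 2, 9, 5).map(i => Item(i.toString, i))
val best = topNBy(3, items)(_.score)
```

Putting f in its own parameter list lets the compiler infer B from `_.score` once A is known.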
Yet another version:
val big = (1 to 100000)
def maxes[A](n:Int)(l:Traversable[A])(implicit o:Ordering[A]) =
l.foldLeft(collection.immutable.SortedSet.empty[A]) { (xs,y) =>
if (xs.size < n) xs + y
else {
import o._
val first = xs.firstKey
if (first < y) xs - first + y
else xs
}
}
println(maxes(4)(big))
println(maxes(2)(List("a","ab","c","z")))
Using a Set forces the result to have unique values. A List-based variant keeps duplicates:
def maxes2[A](n:Int)(l:Traversable[A])(implicit o:Ordering[A]) =
l.foldLeft(List.empty[A]) { (xs,y) =>
import o._
if (xs.size < n) (y :: xs).sortWith(lt)
else {
val first = xs.head
if (first < y) (y :: xs.tail).sortWith(lt)
else xs
}
}
You don't need to sort the entire collection in order to determine the top N elements. However, I don't believe that this functionality is supplied by the standard library, so you would have to roll your own, possibly using the pimp-my-library pattern.
For example, you can get the nth element of a collection as follows:
class Pimp[A, Repr <% TraversableLike[A, Repr]](self : Repr) {
def nth(n : Int)(implicit ord : Ordering[A]) : A = {
val trav : TraversableLike[A, Repr] = self
var ltp : List[A] = Nil
var etp : List[A] = Nil
var mtp : List[A] = Nil
trav.headOption match {
case None => error("Cannot get " + n + " element of empty collection")
case Some(piv) =>
trav.foreach { a =>
val cf = ord.compare(piv, a)
if (cf == 0) etp ::= a
else if (cf > 0) ltp ::= a
else mtp ::= a
}
if (n < ltp.length)
new Pimp[A, List[A]](ltp.reverse).nth(n)(ord)
else if (n < (ltp.length + etp.length))
piv
else
new Pimp[A, List[A]](mtp.reverse).nth(n - ltp.length - etp.length)(ord)
}
}
}
(This is not very functional; sorry)
It's then trivial to get the top n elements:
def topN(n : Int)(implicit ord : Ordering[A], bf : CanBuildFrom[Repr, A, Repr]) ={
val b = bf()
val elem = new Pimp[A, Repr](self).nth(n)(ord)
import util.control.Breaks._
breakable {
var soFar = 0
self.foreach { tt =>
if (ord.compare(tt, elem) < 0) {
b += tt
soFar += 1
}
}
assert (soFar <= n)
if (soFar < n) {
self.foreach { tt =>
if (ord.compare(tt, elem) == 0) {
b += tt
soFar += 1
}
if (soFar == n) break
}
}
}
b.result()
}
Unfortunately I'm having trouble getting this pimp to be discovered via this implicit:
implicit def t2n[A, Repr <% TraversableLike[A, Repr]](t : Repr) : Pimp[A, Repr]
= new Pimp[A, Repr](t)
I get this:
scala> List(4, 3, 6, 7, 1, 2, 8, 5).topN(4)
<console>:9: error: could not find implicit value for evidence parameter of type (List[Int]) => scala.collection.TraversableLike[A,List[Int]]
List(4, 3, 6, 7, 1, 2, 8, 5).topN(4)
^
However, the code actually works OK:
scala> new Pimp(List(4, 3, 6, 7, 1, 2, 8, 5)).topN(4)
res3: List[Int] = List(3, 1, 2, 4)
And
scala> new Pimp("ioanusdhpisjdmpsdsvfgewqw").topN(6)
res2: java.lang.String = adddfe
If the goal is to not sort the whole list then you could do something like this (of course it could be optimized a tad so that we don't change the list when the number clearly shouldn't be there):
List(1,6,3,7,3,2).foldLeft(List[Int]()){(l, n) => (n :: l).sorted.take(2)}
I implemented such a ranking algorithm recently in the Rank class of Apache Jackrabbit (in Java though). See the take method for the gist of it. The basic idea is to quicksort but terminate prematurely as soon as the top n elements have been found.
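The "quicksort, but stop early" idea can be sketched in Scala roughly as follows. This is not the Jackrabbit code, only an illustration of the technique on an immutable List; partialQuickTop is a hypothetical name, the pivot is simply the first element (no randomization), so already-sorted input degrades to quadratic time and deep recursion.

```scala
// Sketch: returns the n largest elements in descending order.
def partialQuickTop[T](xs: List[T], n: Int)(implicit ord: Ordering[T]): List[T] =
  xs match {
    case _ if n <= 0 => Nil
    case Nil => Nil
    case pivot :: rest =>
      val (larger, notLarger) = rest.partition(ord.gt(_, pivot))
      val fromLarger = partialQuickTop(larger, n)
      // stop early: the side above the pivot already supplied n elements
      if (fromLarger.length >= n) fromLarger
      // otherwise take the pivot and fill the remainder from the other side
      else fromLarger ::: pivot :: partialQuickTop(notLarger, n - fromLarger.length - 1)
  }

val topThree = partialQuickTop(List(4, 3, 6, 7, 1, 2, 9, 5), 3)
```

Only the partitions that can still contribute to the top n are sorted; everything below the cut-off is partitioned once and then ignored.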
Here is an asymptotically O(n) solution.
def top[T](data: List[T], n: Int)(implicit ord: Ordering[T]): List[T] = {
require( n < data.size)
def partition_inner(shuffledData: List[T], pivot: T): List[T] =
shuffledData.partition( e => ord.compare(e, pivot) > 0 ) match {
case (left, right) if left.size == n => left
case (left, x :: rest) if left.size < n =>
partition_inner(util.Random.shuffle(data), x)
case (left @ y :: rest, right) if left.size > n =>
partition_inner(util.Random.shuffle(data), y)
}
val shuffled = util.Random.shuffle(data)
partition_inner(shuffled, shuffled.head)
}
scala> top(List.range(1,10000000), 5)
Due to recursion, this solution will take longer than some of the non-linear solutions above and can cause java.lang.OutOfMemoryError: GC overhead limit exceeded.
But it is slightly more readable, IMHO, and in a functional style. Just for a job interview ;).
More importantly, this solution can be easily parallelized.
def top[T](data: List[T], n: Int)(implicit ord: Ordering[T]): List[T] = {
require( n < data.size)
@scala.annotation.tailrec
def partition_inner(shuffledData: List[T], pivot: T): List[T] =
shuffledData.par.partition( e => ord.compare(e, pivot) > 0 ) match {
case (left, right) if left.size == n => left.toList
case (left, right) if left.size < n =>
partition_inner(util.Random.shuffle(data), right.head)
case (left, right) if left.size > n =>
partition_inner(util.Random.shuffle(data), left.head)
}
val shuffled = util.Random.shuffle(data)
partition_inner(shuffled, shuffled.head)
}
For small values of n and large lists, getting the top n elements can be implemented by picking out the max element n times:
def top[T](n:Int, iter:Iterable[T])(implicit ord: Ordering[T]): Iterable[T] = {
def partitionMax(acc: Iterable[T], it: Iterable[T]): Iterable[T] = {
val max = it.max(ord)
val (nextElems, rest) = it.partition(ord.gteq(_, max))
val maxElems = acc ++ nextElems
if (maxElems.size >= n || rest.isEmpty) maxElems.take(n)
else partitionMax(maxElems, rest)
}
if (iter.isEmpty) iter.take(0)
else partitionMax(iter.take(0), iter)
}
This does not sort the entire list and takes an Ordering. I believe every method I call in partitionMax is O(list size) and I only expect to call it n times at most, so the overall efficiency for small n will be proportional to the size of the iterator.
scala> top(5, List.range(1,1000000))
res13: Iterable[Int] = List(999999, 999998, 999997, 999996, 999995)
scala> top(5, List.range(1,1000000))(Ordering[Int].on(- _))
res14: Iterable[Int] = List(1, 2, 3, 4, 5)
You could also add a branch for when n gets close to size of the iterable, and switch to iter.toList.sortBy(_.myAttr).take(n).
It does not return the type of collection provided, but you can look at How do I apply the enrich-my-library pattern to Scala collections? if this is a requirement.
An optimised solution using PriorityQueue, with a time complexity of O(n log k). The approach given in the update sorts the sofar list every time, which is not needed; below, this is avoided by using a PriorityQueue.
import scala.language.implicitConversions
import scala.language.reflectiveCalls
import collection.mutable.PriorityQueue
implicit def iterExt[A](iter: Iterable[A]) = new {
def top[B](n: Int, f: A => B)(implicit ord: Ordering[B]) : List[A] = {
def updateSofar (sofar: PriorityQueue[A], el: A): PriorityQueue[A] = {
if (ord.compare(f(el), f(sofar.head)) < 0){
sofar.dequeue
sofar.enqueue(el)
}
sofar
}
val (sofar, rest) = iter.splitAt(n)
(PriorityQueue(sofar.toSeq:_*)( Ordering.by( (x :A) => f(x) ) ) /: rest) (updateSofar (_, _)).dequeueAll.toList.reverse
}
}
case class A(s: String, i: Int)
val li = List (4, 3, 6, 7, 1, 2, 9, 5).map(i => A(i.toString(), i))
println(li.top(3, -_.i))

Resources