How to compose functions returning applicatives with Scalaz validation

While learning Scalaz 6, I'm trying to write type-safe readers returning validations. Here are my new types:
type ValidReader[S,X] = (S) => Validation[NonEmptyList[String],X]
type MapReader[X] = ValidReader[Map[String,String],X]
and I have two functions creating map-readers for ints and strings (*):
def readInt( k: String ): MapReader[Int] = ...
def readString( k: String ): MapReader[String] = ...
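For reference, here is roughly what the readers might look like (a simplified sketch; the error messages are made up, and the full runnable implementation is in the gist linked below):
// assuming import scalaz._ and import Scalaz._ are in scope
def readString(k: String): MapReader[String] = m =>
  m.get(k) match {
    case Some(v) => Success(v)
    case None    => Failure(NonEmptyList("missing key: " + k))
  }

def readInt(k: String): MapReader[Int] = m =>
  readString(k)(m) match {
    case Success(s) =>
      try Success(s.toInt)
      catch { case _: NumberFormatException => Failure(NonEmptyList(k + " is not an int: " + s)) }
    case Failure(e) => Failure(e)
  }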
Given the following map:
val data = Map( "name" -> "Paul", "age" -> "8" )
I can write two readers to retrieve the name and age:
val name = readString( "name" )
val age = readInt( "age" )
println( name(data) ) //=> Success("Paul")
println( age(data) ) //=> Success(8)
Everything works fine, but now I want to compose both readers to build a Boy instance:
case class Boy( name: String, age: Int )
My best take is:
val boy = ( name |#| age ) {
  (n,a) => ( n |#| a ) { Boy(_,_) }
}
println( boy(data) ) //=> Success(Boy(Paul,8))
It works as expected, but the expression is awkward with two levels of applicative builders. Is there a way to get the following syntax to work?
val boy = ( name |#| age ) { Boy(_,_) }
(*) Full and runnable implementation in: https://gist.github.com/1891147
Update: Here is the compiler error message that I get when trying the line above or Daniel's suggestion:
[error] ***/MapReader.scala:114: type mismatch;
[error] found : scalaz.Validation[scalaz.NonEmptyList[String],String]
[error] required: String
[error] val boy = ( name |#| age ) { Boy(_,_) }
[error] ^

How about this?
val boy = (name |#| age) {
  (Boy.apply _).lift[({type V[X]=ValidationNEL[String,X]})#V]
}
or using a type alias:
type VNELStr[X] = ValidationNEL[String,X]
val boy = (name |#| age) apply (Boy(_, _)).lift[VNELStr]
This is based on the following error message at the console:
scala> name |#| age apply Boy.apply
<console>:22: error: type mismatch;
found : (String, Int) => MapReader.Boy
required: (scalaz.Validation[scalaz.NonEmptyList[String],String],
scalaz.Validation[scalaz.NonEmptyList[String],Int]) => ?
So I just lifted Boy.apply to take the required type.

Note that since Reader and Validation (given a Semigroup for the error type E) are both Applicative, their composition is also Applicative. Using Scalaz 7 this can be expressed as:
import scalaz.Reader
import scalaz.Reader.{apply => toReader}
import scalaz.{Validation, ValidationNEL, Applicative, Kleisli, NonEmptyList}
//type IntReader[A] = Reader[Int, A] // has some ambiguous implicit resolution problem
type IntReader[A] = Kleisli[scalaz.IdInstances#Id, Int, A]
type ValNEL[A] = ValidationNEL[Throwable, A]
val app = Applicative[IntReader].compose[ValNEL]
Now we can combine both layers with a single operation (map2 here) on the composed Applicative:
val f1 = toReader((x: Int) => Validation.success[NonEmptyList[Throwable], String](x.toString))
val f2 = toReader((x: Int) => Validation.success[NonEmptyList[Throwable], String]((x+1).toString))
val f3 = app.map2(f1, f2)(_ + ":" + _)
f3.run(5) should be_==(Validation.success("5:6"))

Related

How can I define a method that takes an Ordered[T] Array in Scala?

I'm implementing some basic algorithms in Scala (following Cormen's book) to refresh my memory on the subject, starting with insertion sort. Done like this, it works correctly:
class InsertionSort extends Sort {
  def sort ( items : Array[Int] ) : Unit = {
    if ( items.length < 2 ) {
      throw new IllegalArgumentException( "Array must be bigger than 1" )
    }
    1.until( items.length ).foreach( ( currentIndex ) => {
      val key = items(currentIndex)
      var loopIndex = currentIndex - 1
      while ( loopIndex > -1 && items(loopIndex) > key ) {
        items.update( loopIndex + 1, items(loopIndex) )
        loopIndex -= 1
      }
      items.update( loopIndex + 1, key )
    } )
  }
}
But this is for Int only and I would like to use generics and Ordered[A] so I could sort any type that is ordered. When I change the signature to be like this:
def sort( items : Array[Ordered[_]] ) : Unit
The following spec doesn't compile:
"sort correctly with merge sort" in {
val items = Array[RichInt](5, 2, 4, 6, 1, 3)
insertionSort.sort( items )
items.toList === Array[RichInt]( 1, 2, 3, 4, 5, 6 ).toList
}
And the compiler error is:
Type mismatch, expected: Array[Ordered[_]], actual Array[RichInt]
But isn't RichInt an Ordered[RichInt]? How should I define this method signature in a way that it would accept any Ordered object?
EDIT
In case anyone is interested, the final source is available here.
Actually RichInt is not an Ordered[RichInt] but an Ordered[Int]. It is true that scala.runtime.RichInt <: Ordered[_], but class Array is invariant in its type parameter T, so an Array[RichInt] is not an Array[Ordered[_]]. A view bound works, though:
scala> def f[T <% Ordered[T]](arr: Array[T]) = { arr(0) < arr(1) }
f: [T](arr: Array[T])(implicit evidence$1: T => Ordered[T])Boolean
scala> f(Array(1,2,3))
res2: Boolean = true
scala>
You can do this with a context bound on the type parameter:
scala> def foo[T : Ordering](arr: Array[T]) = {
| import math.Ordering.Implicits._
| arr(0) < arr(1)
| }
foo: [T](arr: Array[T])(implicit evidence$1: Ordering[T])Boolean
Such that usage is:
scala> foo(Array(2.3, 3.4))
res1: Boolean = true
The advantage of this is that you don't have to use the type's default ordering if you don't want to:
scala> foo(Array("z", "bc"))
res4: Boolean = false
scala> foo(Array("z", "bc"))(Ordering.by(_.length))
res3: Boolean = true
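Applied to the insertion sort from the question, the context-bound version might look like this (a sketch only; the Sort base trait is left out since it would need a type parameter as well):
import math.Ordering.Implicits._

class InsertionSort {
  // Generic version of the sort from the question, using an Ordering
  // context bound instead of Array[Ordered[_]].
  def sort[T: Ordering](items: Array[T]): Unit = {
    if (items.length < 2) {
      throw new IllegalArgumentException("Array must be bigger than 1")
    }
    for (currentIndex <- 1 until items.length) {
      val key = items(currentIndex)
      var loopIndex = currentIndex - 1
      while (loopIndex > -1 && items(loopIndex) > key) {
        items(loopIndex + 1) = items(loopIndex)
        loopIndex -= 1
      }
      items(loopIndex + 1) = key
    }
  }
}
Calling new InsertionSort().sort(Array(5, 2, 4, 6, 1, 3)) then sorts in place using the implicit Ordering[Int].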

How to avoid overhead when passing a Map[Integer, String] where a Map[Number, String] is expected?

The problem:
I have a mutable.Map[Integer, String] that I want to pass to two methods:
def processNumbers(nums: Map[Number, String])
def processIntegers(nums: mutable.Map[Integer, String])
After getting a compile error, I ended up with this:
val ints: mutable.Map[Integer, String] = mutable.Map.empty[Integer, String]
//init of ints
val nums: Map[Number, String] = ints.toMap[Number, String]
processNumbers(nums)
processIntegers(ints)
With a little experiment, I found that doing it this way has significant overhead: the type conversion step multiplies the execution time by 10.
All in all, the type conversion is really just there to please the compiler, so how can I do this without any overhead?
For info, the code of my experiment:
package qndTests

import scala.collection.mutable

object TypeTest {
  var hashNums = 0
  var hashIntegers = 0

  def processNumbers(nums: Map[Number, String]): Unit = {
    nums.foreach(num => {
      hashNums += num._1.hashCode + num._2.hashCode
    })
  }

  def processNumbers2(nums: mutable.Map[Integer, String]): Unit = {
    nums.foreach(num => {
      hashNums += num._1.hashCode + num._2.hashCode
    })
  }

  def processIntegers(nums: mutable.Map[Integer, String]): Unit = {
    nums.foreach(num => {
      hashIntegers += num._1.hashCode + num._2.hashCode
    })
  }

  def test(ints: mutable.Map[Integer, String], convertType: Boolean): Unit = {
    if (convertType)
      println("run test with type conversion")
    else
      println("run test without type conversion")
    val start = System.nanoTime
    hashNums = 0
    hashIntegers = 0
    val nTest = 10
    for (i <- 0 to nTest) {
      if (convertType) {
        val nums: Map[Number, String] = ints.toMap[Number, String] // how much does that cost?
        processNumbers(nums)
      } else {
        processNumbers2(ints)
      }
      processIntegers(ints)
    }
    val end = System.nanoTime
    println("nums: " + hashNums)
    println("ints: " + hashIntegers)
    println(end - start)
  }

  def main(args: Array[String]): Unit = {
    val ints: mutable.Map[Integer, String] = mutable.Map.empty[Integer, String]
    val testSize = 1000000
    println("creating a map of " + testSize + " elements")
    for (i <- 0 to testSize) ints.put(i, i.toBinaryString)
    println("done")
    test(ints, false)
    test(ints, true)
  }
}
and its output:
creating a map of 1000000 elements
done
run test without type conversion
nums: -1650117013
ints: -1650117013
2097538520
run test with type conversion
nums: -1650117013
ints: -1650117013
25423803480
--> about 2 seconds in the first case versus 25 seconds in the second one!
As you've seen, Map[A,B] is invariant in the key type A, so you need a conversion of some kind to treat a Map[A1,B] (where A1 <: A) as a Map[A,B]. However, if you can change the definition of def processNumbers(nums: Map[Number, String]), you could try something like:
def processNumbers[T <: Number](nums: Map[T, String])
and pass the Map[Integer, String] without conversion.
Would that help solve your problem?
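A rough sketch of that idea (the body of processNumbers is just illustrative, and the parameter is typed as collection.Map here so the mutable map from the question can be passed as-is):
import scala.collection.mutable

// Hypothetical generic version; T is inferred at the call site.
def processNumbers[T <: Number](nums: collection.Map[T, String]): Unit =
  nums.foreach { case (k, v) => println(k.doubleValue + " -> " + v) }

val ints = mutable.Map[Integer, String](Integer.valueOf(1) -> "one")
processNumbers(ints) // T = Integer, no toMap conversion needed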

Extending Seq.sortBy in Scala

Say I have a list of names.
case class Name(val first: String, val last: String)
val names = Name("c", "B") :: Name("b", "a") :: Name("a", "B") :: Nil
If I now want to sort that list by last name (and if that is not enough, by first name), it is easily done.
names.sortBy(n => (n.last, n.first))
// List[Name] = List(Name(a,B), Name(c,B), Name(b,a))
But what if I'd like to sort this list based on some other collation for strings?
Unfortunately, the following does not work:
val o = new Ordering[String]{ def compare(x: String, y: String) = collator.compare(x, y) }
names.sortBy(n => (n.last, n.first))(o)
// error: type mismatch;
// found : java.lang.Object with Ordering[String]
// required: Ordering[(String, String)]
// names.sortBy(n => (n.last, n.first))(o)
Is there any way that allows me to change the ordering without having to write an explicit sortWith call with multiple if-else branches to deal with all cases?
Well, this almost does the trick:
names.sorted(o.on((n: Name) => n.last + n.first))
On the other hand, you can do this as well:
implicit val o = new Ordering[String]{ def compare(x: String, y: String) = collator.compare(x, y) }
names.sortBy(n => (n.last, n.first))
This locally defined implicit will take precedence over the one defined on the Ordering object.
One solution is to extend the otherwise implicitly used Tuple2 ordering. Unfortunately, this means writing out Tuple2 in the code.
names.sortBy(n => (n.last, n.first))(Ordering.Tuple2(o, o))
I'm not 100% sure what methods you think collator should have.
But you have the most flexibility if you define the ordering on the case class:
val o = new Ordering[Name]{
  def compare(a: Name, b: Name) =
    3*math.signum(collator.compare(a.last,b.last)) +
      math.signum(collator.compare(a.first,b.first))
}
names.sorted(o)
but you can also provide an implicit conversion from a string ordering to a name ordering:
def ostring2oname(os: Ordering[String]) = new Ordering[Name] {
  def compare(a: Name, b: Name) =
    3*math.signum(os.compare(a.last,b.last)) + math.signum(os.compare(a.first,b.first))
}
and then you can use any String ordering to sort Names:
def oo = new Ordering[String] {
  def compare(x: String, y: String) = x.length compare y.length
}
val morenames = List("rat","fish","octopus")
scala> morenames.sorted(oo)
res1: List[java.lang.String] = List(rat, fish, octopus)
Edit: A handy trick, in case it wasn't apparent, is that if you want to order by N things and you're already using compare, you can just multiply the signum of each comparison by 3^k (with the first-to-order being multiplied by the largest power of 3) and add.
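For instance, a sketch comparing three keys of a plain tuple this way:
// Order (String, String, Int) triples by all three fields using the powers-of-3 trick:
def compare3(a: (String, String, Int), b: (String, String, Int)): Int =
  9 * math.signum(a._1 compare b._1) +
  3 * math.signum(a._2 compare b._2) +
      math.signum(a._3 compare b._3)
Since each signum is -1, 0, or 1, the weight 9 can never be outvoted by 3 + 1, so earlier keys always dominate later ones.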
If your comparisons are very time-consuming, you can easily add a cascading compare:
class CascadeCompare(i: Int) {
  def tiebreak(j: => Int) = if (i!=0) i else j
}
implicit def break_ties(i: Int) = new CascadeCompare(i)
and then
def ostring2oname(os: Ordering[String]) = new Ordering[Name] {
  def compare(a: Name, b: Name) =
    os.compare(a.last,b.last) tiebreak os.compare(a.first,b.first)
}
(Just be careful to nest them as x tiebreak (y tiebreak (z tiebreak w)) so you don't do the implicit conversion a bunch of times in a row.)
(If you really need fast compares, then you should write it all out by hand, or pack the orderings in an array and use a while loop. I'll assume you're not that desperate for performance.)
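For completeness, a rough sketch of that array-plus-while-loop variant (byKeys is a made-up helper name):
def byKeys[T](keys: Array[Ordering[T]]): Ordering[T] = new Ordering[T] {
  def compare(a: T, b: T): Int = {
    var i = 0
    var c = 0
    while (c == 0 && i < keys.length) {
      c = keys(i).compare(a, b)
      i += 1
    }
    c
  }
}
// usage: names.sorted(byKeys(Array(Ordering.by((n: Name) => n.last), Ordering.by((n: Name) => n.first))))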

Slow Scala assert

We've been profiling our code recently and we've come across a few annoying hotspots. They're in the form
assert(a == b, a + " is not equal to " + b)
Because some of these asserts can be in code called a huge number of times, the string concatenation starts to add up. assert is defined as:
def assert(assumption : Boolean, message : Any) = ....
why isn't it defined as:
def assert(assumption : Boolean, message : => Any) = ....
That way it would be evaluated lazily. Given that it's not defined that way, is there an inline way of calling assert with a message param that is evaluated lazily?
Thanks
Lazy evaluation also has some overhead for the function object that gets created. If your message object is already fully constructed (a static message) this overhead is unnecessary.
The appropriate method for your use case would be sprintf-style:
assert(a == b, "%s is not equal to %s", a, b)
As long as there is a specialized overload
assert(Boolean, String, Any, Any)
this implementation has no extra overhead; only the general case
assert(Boolean, String, Any*)
pays the cost of the varargs array.
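A minimal sketch of such an overload (assertf is a made-up name, not part of Predef):
// Hypothetical helper, not part of Predef:
def assertf(assumption: Boolean, fmt: String, x: Any, y: Any): Unit =
  if (!assumption) throw new AssertionError(fmt.format(x, y))

val a = 1; val b = 1
assertf(a == b, "%s is not equal to %s", a, b) // fmt.format only runs on failure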
Overriding toString gives you lazy evaluation as well, but is not readable:
assert(a == b, new { override def toString = a + " is not equal to " + b })
It is by-name; I changed it over a year ago.
http://www.scala-lang.org/node/825
Current Predef:
@elidable(ASSERTION)
def assert(assertion: Boolean, message: => Any) {
  if (!assertion)
    throw new java.lang.AssertionError("assertion failed: "+ message)
}
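So with the current by-name signature, the message in the question is only built when the assertion fails; a quick check (assuming a Scala version with the definition above):
def expensiveMessage(x: Any) = { println("building message"); x.toString }
val a = 1
val b = 1
assert(a == b, expensiveMessage(a) + " is not equal to " + expensiveMessage(b))
// prints nothing, because the assertion holds and the message is never evaluated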
Thomas' answer is great, but just in case you like the idea of the last answer but dislike the unreadability, you can get around it:
object LazyS {
  def apply(f: => String): AnyRef = new {
    override def toString = f
  }
}
Example:
object KnightSpeak {
  override def toString = { println("Turned into a string") ; "Ni" }
}
scala> assert(true != false , LazyS("I say " + KnightSpeak))
scala> println( LazyS("I say " + KnightSpeak) )
Turned into a string
I say Ni
Try: assert( a == b, "%s is not equal to %s".format(a, b))
The format call should only happen when the assert needs the string, since the message parameter is by-name. format is added to RichString via an implicit conversion.

How to read immutable data structures from a file in Scala

I have a data structure made of Jobs each containing a set of Tasks. Both Job and Task data are defined in files like these:
jobs.txt:
JA
JB
JC
tasks.txt:
JB T2
JA T1
JC T1
JA T3
JA T2
JB T1
The process of creating objects is the following:
- read each job, create it and store it by id
- read each task, retrieve its job by id, create the task, and store it in the job
Once the files are read, this data structure is never modified. So I would like the tasks within jobs to be stored in an immutable set, but I don't know how to do that in an efficient way. (Note: the map storing the jobs may be left mutable.)
Here is a simplified version of the code:
class Task(val id: String)

class Job(val id: String) {
  val tasks = collection.mutable.Set[Task]() // This should be immutable
}

val jobs = collection.mutable.Map[String, Job]() // This is ok to be mutable

// read jobs
for (line <- io.Source.fromFile("jobs.txt").getLines) {
  val job = new Job(line.trim)
  jobs += (job.id -> job)
}

// read tasks
for (line <- io.Source.fromFile("tasks.txt").getLines) {
  val tokens = line.split("\t")
  val job = jobs(tokens(0).trim)
  val task = new Task(job.id + "." + tokens(1).trim)
  job.tasks += task
}
Thanks in advance for every suggestion!
The most efficient way to do this would be to read everything into mutable structures and then convert to immutable ones at the end, but this might require a lot of redundant coding for classes with a lot of fields. So instead, consider using the same pattern that the underlying collection uses: a job with a new task is a new job.
Here's an example that doesn't even bother reading the jobs list--it infers it from the task list. (This is an example that works under 2.7.x; recent versions of 2.8 use "Source.fromPath" instead of "Source.fromFile".)
object Example {
  class Task(val id: String) {
    override def toString = id
  }

  class Job(val id: String, val tasks: Set[Task]) {
    def this(id0: String, old: Option[Job], taskID: String) = {
      this(id0, old.getOrElse(EmptyJob).tasks + new Task(taskID))
    }
    override def toString = id + " does " + tasks.toString
  }

  object EmptyJob extends Job("", Set.empty[Task]) { }

  def read(fname: String): Map[String, Job] = {
    val map = new scala.collection.mutable.HashMap[String, Job]()
    scala.io.Source.fromFile(fname).getLines.foreach(line => {
      line.split("\t") match {
        case Array(j, t) => {
          val jobID = j.trim
          val taskID = t.trim
          map += (jobID -> new Job(jobID, map.get(jobID), taskID))
        }
        case _ => /* Handle error? */
      }
    })
    new scala.collection.immutable.HashMap() ++ map
  }
}
scala> Example.read("tasks.txt")
res0: Map[String,Example.Job] = Map(JA -> JA does Set(T1, T3, T2), JB -> JB does Set(T2, T1), JC -> JC does Set(T1))
An alternate approach would read the job list (creating jobs as new Job(jobID,Set.empty[Task])), and then handle the error condition of when the task list contained an entry that wasn't in the job list. (You would still need to update the job list map every time you read in a new task.)
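A sketch of that alternate approach, reusing the Task and Job classes from the example above (the jobs file is read first, and a task referring to an unknown job simply throws):
def readStrict(jobsFile: String, tasksFile: String): Map[String, Job] = {
  val map = new scala.collection.mutable.HashMap[String, Job]()
  // seed the map from the job list
  scala.io.Source.fromFile(jobsFile).getLines.foreach { line =>
    val jobID = line.trim
    map += (jobID -> new Job(jobID, Set.empty[Task]))
  }
  // add tasks, failing if a task refers to an unknown job
  scala.io.Source.fromFile(tasksFile).getLines.foreach { line =>
    line.split("\t") match {
      case Array(j, t) =>
        val jobID = j.trim
        val old = map(jobID) // throws NoSuchElementException for unknown jobs
        map += (jobID -> new Job(jobID, old.tasks + new Task(t.trim)))
      case _ => // malformed line: handle error?
    }
  }
  new scala.collection.immutable.HashMap() ++ map
}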
I made a few changes for it to run on Scala 2.8 (mostly fromPath instead of fromFile, and () after getLines). It may be using a few Scala 2.8 features, most notably groupBy. Probably toSet as well, but that one is easy to adapt on 2.7.
I don't have the files to test it, but I changed this stuff from val to def, and the type signatures, at least, match.
class Task(val id: String)
class Job(val id: String, val tasks: Set[Task])

// read tasks
val tasks = (
  for {
    line <- io.Source.fromPath("tasks.txt").getLines().toStream
    tokens = line.split("\t")
    jobId = tokens(0).trim
    task = new Task(jobId + "." + tokens(1).trim)
  } yield jobId -> task
).groupBy(_._1).map { case (key, value) => key -> value.map(_._2).toSet }

// read jobs
val jobs = Map() ++ (
  for {
    line <- io.Source.fromPath("jobs.txt").getLines()
    job = new Job(line.trim, tasks(line.trim))
  } yield job.id -> job
)
You could always delay the object creation until you have all the data read in from the file, like:
case class Task(id: String)
case class Job(id: String, tasks: Set[Task])

import scala.collection.mutable.{Map, ListBuffer}

val jobIds = Map[String, ListBuffer[String]]()

// read jobs
for (line <- io.Source.fromFile("jobs.txt").getLines) {
  val job = line.trim
  jobIds += (job -> new ListBuffer[String]())
}

// read tasks
for (line <- io.Source.fromFile("tasks.txt").getLines) {
  val tokens = line.split("\t")
  val job = tokens(0).trim
  val task = job + "." + tokens(1).trim
  jobIds(job) += task
}

// create objects
val jobs = jobIds.map { j =>
  Job(j._1, Set() ++ j._2.map { Task(_) })
}
To deal with more fields, you could (with some effort) make a mutable version of your immutable classes, used for building. Then, convert as needed:
case class Task(id: String)
case class Job(val id: String, val tasks: Set[Task])

object Job {
  class MutableJob {
    var id: String = ""
    var tasks = collection.mutable.Set[Task]()
    def immutable = Job(id, Set() ++ tasks)
  }
  def mutable(id: String) = {
    val ret = new MutableJob
    ret.id = id
    ret
  }
}

val mutableJobs = collection.mutable.Map[String, Job.MutableJob]()

// read jobs
for (line <- io.Source.fromFile("jobs.txt").getLines) {
  val job = Job.mutable(line.trim)
  mutableJobs += (job.id -> job)
}

// read tasks
for (line <- io.Source.fromFile("tasks.txt").getLines) {
  val tokens = line.split("\t")
  val job = mutableJobs(tokens(0).trim)
  val task = Task(job.id + "." + tokens(1).trim)
  job.tasks += task
}

val jobs = for ((k, v) <- mutableJobs) yield (k, v.immutable)
One option here is to have some mutable but transient configurer class along the lines of the MutableJob above, but then pass this through in some immutable form to your actual class:
val jobs: immutable.Map[String, Job] = {
  val mJobs = readMutableJobs
  immutable.Map(mJobs.toSeq: _*)
}
Then of course you can implement readMutableJobs along the lines you have already coded.
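For example, readMutableJobs could be sketched like this, keeping the map as the mutable, transient part and building immutable Job values (the case classes from the previous answer) as the tasks arrive:
def readMutableJobs: collection.mutable.Map[String, Job] = {
  val mJobs = collection.mutable.Map[String, Job]()
  // read jobs: start every job with an empty task set
  for (line <- io.Source.fromFile("jobs.txt").getLines) {
    val id = line.trim
    mJobs += (id -> Job(id, Set.empty[Task]))
  }
  // read tasks: replace the stored job with a copy that includes the new task
  for (line <- io.Source.fromFile("tasks.txt").getLines) {
    val tokens = line.split("\t")
    val old = mJobs(tokens(0).trim)
    val task = Task(old.id + "." + tokens(1).trim)
    mJobs += (old.id -> old.copy(tasks = old.tasks + task))
  }
  mJobs
}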
