Is there an idiomatic way to handle a collection of Validation in Scalaz6?
val results:Seq[Validation[A,B]]
val exceptions = results.collect{case Failure(exception)=>exception}
exceptions.foreach{logger.error("Error when starting up ccxy gottware",_)}
val success = results.collect{case Success(data)=>data}
success.foreach {data => data.push}
if (exceptions.isEmpty)
containers.foreach( _.start())
I could think of using a fold when looping on results, but what about the final test?
The usual way to work with a list of validations is to use sequence to turn the list into a Validation[A, List[B]], which will be empty (i.e., a Failure) if there were any errors along the way.
Sequencing a Validation accumulates errors (as opposed to Either, which fails immediately) in the semigroup of the left-hand type. This is why you often see ValidationNEL (where the NEL stands for NonEmptyList) used instead of simply Validation. So for example if you have this result type:
import scalaz._, Scalaz._
type ExceptionsOr[A] = ValidationNEL[Exception, A]
And some results:
val results: Seq[ExceptionsOr[Int]] = Seq(
"13".parseInt.liftFailNel, "42".parseInt.liftFailNel
)
Sequencing will give you the following:
scala> results.sequence
res0: ExceptionsOr[Seq[Int]] = Success(List(13, 42))
If we had some errors like this, on the other hand:
val results: Seq[ExceptionsOr[Int]] = Seq(
"13".parseInt.liftFailNel, "a".parseInt.liftFailNel, "b".parseInt.liftFailNel
)
We'd end up with a Failure (note that I've reformatted the output to make it legible here):
scala> results.sequence
res1: ExceptionsOr[Seq[Int]] = Failure(
NonEmptyList(
java.lang.NumberFormatException: For input string: "a",
java.lang.NumberFormatException: For input string: "b"
)
)
So in your case you'd write something like this:
val results: Seq[ValidationNEL[A, B]]
results.sequence match {
case Success(xs) => xs.foreach(_.push); containers.foreach(_.start())
case Failure(exceptions) => exceptions.foreach(
logger.error("Error when starting up ccxy gottware", _)
)
}
See my answers here and here for more detail about sequence and about Validation more generally.
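To make the accumulating behaviour of sequence concrete without depending on Scalaz, here is a minimal sketch using only the standard library. It assumes Either[List[E], A] as a stand-in for ValidationNEL[E, A]; the names SequenceSketch, Errors and parse are made up for illustration and are not part of Scalaz.

```scala
object SequenceSketch {
  // Stand-in for ValidationNEL[E, A]: Left carries every error seen so far.
  type Errors[E, A] = Either[List[E], A]

  // Turns a sequence of results into one result, collecting all errors
  // instead of stopping at the first one.
  def sequence[E, A](results: Seq[Errors[E, A]]): Errors[E, List[A]] =
    results.foldRight[Errors[E, List[A]]](Right(Nil)) {
      case (Right(a), Right(as)) => Right(a :: as)
      case (Left(es), Right(_))  => Left(es)
      case (Right(_), Left(es))  => Left(es)
      case (Left(e1), Left(e2))  => Left(e1 ++ e2) // keep errors in input order
    }

  // Parses an Int, reporting bad input as a one-element error list.
  def parse(s: String): Errors[String, Int] =
    try Right(s.toInt)
    catch { case _: NumberFormatException => Left(List("not an integer: " + s)) }
}
```

Matching on the result of SequenceSketch.sequence then gives the same two branches as the match on results.sequence above: one that sees all the successes, one that sees all the errors.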
Could you guys help me out with using an Array and streaming (?) over it, so that each single element (a String) is used to save a Movie to the db and return a Flux? The Spring-specific stuff isn't important, just the way to iterate over the alphabet and create random Movies. What's the best and most Kotlin-ish way of doing this?
val alphabet = arrayOf("A".."Z")
val exampleMovies: Flux<Movie> = Flux.just(alphabet)
.flatMap { movieRepository.save(Movie(name = it)) }
I'm getting compilation error:
Error:(15, 62) Kotlin: Type mismatch: inferred type is Array<ClosedRange<String>>! but String? was expected
The problem is that arrayOf("A".."Z") will give an Array<ClosedRange<String>>, i.e. the array has one element of type ClosedRange. What you actually wanted to have is an Array<String> with elements A, B, C, ..., Z I guess? Unfortunately, the range operator doesn't work like this for Strings, explained here.
Instead, create that array by mapping a CharRange accordingly:
val alphabet = ('A'..'Z').map(Char::toString).toTypedArray()
Using Scala and Spark, I have the following construction:
val rdd1: RDD[String] = ...
val rdd2: RDD[(String, Any)] = ...
val rdd1pairs = rdd1.map(s => (s, s))
val result = rdd2.join(rdd1pairs)
.map { case (_: String, (e: Any, _)) => e }
The purpose of mapping rdd1 into a PairRDD is the join with rdd2 in the subsequent step. However, I am actually only interested in the values of rdd2, hence the mapping step in the last line which omits the keys. Actually, this is an intersection between rdd2 and rdd1 performed with Spark's join() for efficiency reasons.
My question refers to the keys of rdd1pairs: they are created for syntactical reasons only (to allow the join) in the first map step and are later discarded without any usage. How does the compiler handle this? Does it matter in terms of memory consumption whether I use the String s (as shown in the example)? Should I replace it by null or 0 to save a little memory? Does the compiler actually create and store these objects (references) or does it notice that they are never used?
In this case, I think the outcome depends on what Spark does at run time rather than on the compiler, that is, on whether Spark can optimise its execution pipeline to avoid the redundant duplication of s. I'm not sure, but I think Spark will create rdd1pairs in memory.
Instead of mapping to (String, String) you could use (String, Unit):
rdd1.map(s => (s,()))
What you're doing is basically a filter of rdd2 based on rdd1. If rdd1 is significantly smaller than rdd2, another method would be to represent the data of rdd1 as a broadcast variable rather than an RDD, and simply filter rdd2. This avoids any shuffling or reduce phase, so may be quicker, but will only work if the data of rdd1 is small enough to fit on each node.
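The filter idea can be sketched roughly as follows. Plain collections stand in for the RDDs so that the snippet runs without Spark, and the sample data is invented. In a real job the key set would be wrapped with sc.broadcast(...) and read through .value inside the filter.

```scala
object FilterSketch {
  // Plain collections stand in for the RDDs; the sample data is made up.
  val rdd1: Seq[String] = Seq("a", "c")
  val rdd2: Seq[(String, Any)] = Seq("a" -> 1, "b" -> 2, "c" -> 3)

  // In Spark this set would be the broadcast variable.
  val keys: Set[String] = rdd1.toSet

  // Keep the values of rdd2 whose key occurs in rdd1: no join, no dummy pair values.
  val result: Seq[Any] = rdd2.collect { case (k, v) if keys(k) => v }
}
```

One difference to keep in mind: a join emits a row per matching pair, so repeated keys in rdd1 would repeat values in the result, whereas the filter keeps each element of rdd2 at most once.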
EDIT:
Considering how using Unit rather than String saves space, consider the following examples:
object size extends App {
(1 to 1000000).map(i => ("foo"+i, ()))
val input = readLine("prompt> ")
}
and
object size extends App {
(1 to 1000000).map(i => ("foo"+i, "foo"+i))
val input = readLine("prompt> ")
}
Measuring with the jstat command, as described in the question "How to check heap usage of a running JVM from the command line?", shows that the first version uses significantly less heap than the second.
Edit 2:
Unit is effectively a singleton object with no contents, so logically, it should not require any serialization. The fact that the type definition contains Unit tells you all you need to be able to deserialize a structure which has a field of type Unit.
Spark uses Java Serialization by default. Consider the following:
object Main extends App {
import java.io.{ObjectOutputStream, FileOutputStream}
case class Foo (a: String, b:String)
case class Bar (a: String, b:String, c: Unit)
val str = "abcdef"
val foo = Foo("abcdef", "xyz")
val bar = Bar("abcdef", "xyz", ())
val fos = new FileOutputStream( "foo.obj" )
val fo = new ObjectOutputStream( fos )
val bos = new FileOutputStream( "bar.obj" )
val bo = new ObjectOutputStream( bos )
fo writeObject foo
bo writeObject bar
fo.close() // flush and release both files
bo.close()
}
The two files are of identical size:
�� sr Main$Foo3�,�z \ L at Ljava/lang/String;L bq ~ xpt abcdeft xyz
and
�� sr Main$Bar+a!N��b L at Ljava/lang/String;L bq ~ xpt abcdeft xyz
The problem
I'll start with a simplified parsing problem. Suppose I've got a list of strings that I want to parse into a list of integers, and that I want to accumulate errors. This is pretty easy in Scalaz 7:
val lines = List("12", "13", "13a", "14", "foo")
def parseLines(lines: List[String]) = lines.traverseU(_.parseInt.toValidationNel)
We can confirm that it works as expected:
scala> parseLines(lines).fold(_.foreach(t => println(t.getMessage)), println)
For input string: "13a"
For input string: "foo"
This is nice, but suppose the list is pretty long, and that I decide I want to capture more information about the context of the errors in order to make clean-up easier. For the sake of simplicity I'll just use (zero-indexed) line numbers here to represent the position, but the context could also include a file name or other information.
Passing around position
One simple approach is to pass the position to my line parser:
type Position = Int
case class InvalidLine(pos: Position, message: String) extends Throwable(
f"At $pos%d: $message%s"
)
def parseLine(line: String, pos: Position) = line.parseInt.leftMap(
_ => InvalidLine(pos, f"$line%s is not an integer!")
)
def parseLines(lines: List[String]) = lines.zipWithIndex.traverseU(
(parseLine _).tupled andThen (_.toValidationNel)
)
This also works:
scala> parseLines(lines).fold(_.foreach(t => println(t.getMessage)), println)
At 2: 13a is not an integer!
At 4: foo is not an integer!
But in more complex situations passing the position around like this gets unpleasant.
Wrapping errors
Another option is to wrap the error produced by the line parser:
case class InvalidLine(pos: Position, underlying: Throwable) extends Throwable(
f"At $pos%d: ${underlying.getMessage}%s",
underlying
)
def parseLines(lines: List[String]) = lines.zipWithIndex.traverseU {
case (line, pos) => line.parseInt.leftMap(InvalidLine(pos, _)).toValidationNel
}
And again, it works just fine:
scala> parseLines(lines).fold(_.foreach(t => println(t.getMessage)), println)
At 2: For input string: "13a"
At 4: For input string: "foo"
But sometimes I have a nice error ADT and this kind of wrapping doesn't feel particularly elegant.
Returning "partial" errors
A third approach would be to have my line parser return a partial error that needs to be combined with some additional information (the position, in this case). I'll use Reader here, but we could just as well represent the failure type as something like Position => Throwable. We can reuse our first (non-wrapping) InvalidLine above.
def parseLine(line: String) = line.parseInt.leftMap(
error => Reader(InvalidLine((_: Position), error.getMessage))
)
def parseLines(lines: List[String]) = lines.zipWithIndex.traverseU {
case (line, pos) => parseLine(line).leftMap(_.run(pos)).toValidationNel
}
Once again this produces the desired output, but also feels kind of verbose and clunky.
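For comparison, here is the same "partial error" idea written against the standard library only, using the plain function type Position => InvalidLine mentioned above instead of Reader. It is a sketch under those assumptions: Either[List[InvalidLine], List[Int]] stands in for the ValidationNel result, and InvalidLine is a plain case class rather than a Throwable.

```scala
object PartialErrorSketch {
  type Position = Int
  final case class InvalidLine(pos: Position, message: String)

  // The line parser knows nothing about positions: on failure it returns
  // a function that still needs the position to build the final error.
  def parseLine(line: String): Either[Position => InvalidLine, Int] =
    try Right(line.toInt)
    catch {
      case e: NumberFormatException =>
        Left((pos: Position) => InvalidLine(pos, e.getMessage))
    }

  // The caller supplies the position and accumulates every error.
  def parseLines(lines: List[String]): Either[List[InvalidLine], List[Int]] = {
    val parsed = lines.zipWithIndex.map { case (line, pos) =>
      parseLine(line).left.map(f => f(pos))
    }
    val errors = parsed.collect { case Left(e) => e }
    if (errors.isEmpty) Right(parsed.collect { case Right(n) => n }) else Left(errors)
  }
}
```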
Question
I come across this kind of problem all the time—I'm parsing some messy data and want nice helpful error messages, but I also don't want to thread a bunch of position information through all of my parsing logic.
Is there some reason to prefer one of the approaches above? Are there better approaches?
I use a combination of your first and second options with locally-requested stackless exceptions for control flow. This is the best thing I've found to keep error handling both completely bulletproof and mostly out of the way. The basic form looks like this:
Ok.or[InvalidLine]{ bad =>
if (somethingWentWrong) bad(InvalidLine(x))
else y.parse(bad) // Parsers should know about sending back info!
}
where bad throws an exception when called that returns the data passed to it, and the output is a custom Either-like type. If it becomes important to inject additional context from the outer scope, adding an extra transformer step is all it takes to add context:
Ok.or[Invalid].explain(i: InvalidLine => Invalid(i, myFile)) { bad =>
// Parsing logic
}
Actually creating the classes to make this work is a little more fiddly than I want to post here (especially since there are additional considerations in all of my actual working code which obscures the details), but this is the logic.
Oh, and since this ends up just being an apply method on a class, you can always
val validate = Ok.or[Invalid].explain(/* blah */)
validate { bad => parseA }
validate { bad => parseB }
and all the usual tricks.
(I suppose it's not fully obvious that the type signature of bad is bad: InvalidLine => Nothing, and the type signature of the apply is (InvalidLine => Nothing) => T.)
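Since the real classes aren't shown, here is a guess at the minimal shape such a helper could take. Everything in it is hypothetical: OkSketch.or stands in for Ok.or, the standard library's Either replaces the custom Either-like type, and ControlThrowable supplies the stackless exception.

```scala
import scala.util.control.ControlThrowable

object OkSketch {
  // Hypothetical carrier for the early exit. ControlThrowable skips
  // stack-trace capture, which is what keeps the exception cheap.
  private final class Escape(val token: AnyRef, val value: Any) extends ControlThrowable

  // Hypothetical stand-in for Ok.or[E]: runs the body and turns a call
  // to `bad` into a Left, everything else into a Right.
  def or[E, T](body: (E => Nothing) => T): Either[E, T] = {
    val token = new AnyRef // identifies this particular call to `or`
    try Right(body(e => throw new Escape(token, e)))
    catch {
      case esc: Escape if esc.token eq token => Left(esc.value.asInstanceOf[E])
    }
  }

  // Example use: `bad` can bail out from anywhere inside the block.
  def parsePositive(s: String): Either[String, Int] = or[String, Int] { bad =>
    if (s.isEmpty) bad("empty input")
    val n = try s.toInt catch { case _: NumberFormatException => bad("not a number: " + s) }
    if (n <= 0) bad("not positive: " + n) else n
  }
}
```

The token check means that a nested call to or only catches its own escapes, so an outer bad invoked from inside an inner block still propagates to the outer handler.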
An overly-simplified solution could be:
import scala.util.{Try, Success, Failure}
def parseLines(lines: List[String]): List[Try[Int]] =
lines map { l => Try (l.toInt) }
val lines = List("12", "13", "13a", "14", "foo")
println("LINES: " + lines)
val parsedLines = parseLines(lines)
println("PARSED: " + parsedLines)
val anyFailed: Boolean = parsedLines.exists(_.isFailure)
println("FAILURES EXIST?: " + anyFailed)
val failures: List[Throwable] = parsedLines.collect { case Failure(e) => e }
println("FAILURES: " + failures)
val parsedWithIndex = parsedLines.zipWithIndex
println("PARSED LINES WITH INDEX: " + parsedWithIndex)
val failuresWithIndex = parsedWithIndex.filter{ case (v, i) => v.isFailure }
println("FAILURES WITH INDEX: " + failuresWithIndex)
Prints:
LINES: List(12, 13, 13a, 14, foo)
PARSED: List(Success(12), Success(13), Failure(java.lang.NumberFormatException: For input string: "13a"), Success(14), Failure(java.lang.NumberFormatException: For input string: "foo"))
FAILURES EXIST?: true
FAILURES: List(java.lang.NumberFormatException: For input string: "13a", java.lang.NumberFormatException: For input string: "foo")
PARSED LINES WITH INDEX: List((Success(12),0), (Success(13),1), (Failure(java.lang.NumberFormatException: For input string: "13a"),2), (Success(14),3), (Failure(java.lang.NumberFormatException: For input string: "foo"),4))
FAILURES WITH INDEX: List((Failure(java.lang.NumberFormatException: For input string: "13a"),2), (Failure(java.lang.NumberFormatException: For input string: "foo"),4))
All of this could be wrapped in a helper class that abstracts the parsing function, generalizes the input and output types, and even lets you choose the error type, whether that's an exception or something else.
What I'm suggesting is a simple map-based approach; the exact types can be chosen to suit the task.
The annoying thing is that you have to keep a reference to parsedWithIndex to recover the indexes alongside the exceptions, unless the exceptions themselves carry the index and other context information.
Example of an implementation:
case class Transformer[From, To](input: List[From], f: From => To) {
import scala.util.{Try, Success, Failure}
lazy val transformedWithIndex: List[(Try[To], Int)] =
input.map(l => Try(f(l))).zipWithIndex
def failuresWithIndex =
transformedWithIndex.filter { case (v, i) => v.isFailure }
lazy val failuresExist: Boolean =
! failuresWithIndex.isEmpty
def successfulOnly: List[To] =
for {
(e, _) <- transformedWithIndex
value <- e.toOption
} yield value
}
val lines = List("12", "13", "13a", "14", "foo")
val res = Transformer(lines, (l: String) => l.toInt)
println("FAILURES EXIST?: " + res.failuresExist)
println("PARSED LINES WITH INDEX: " + res.transformedWithIndex)
println("SUCCESSFUL ONLY: " + res.successfulOnly)
Prints:
FAILURES EXIST?: true
PARSED LINES WITH INDEX: List((Success(12),0), (Success(13),1), (Failure(java.lang.NumberFormatException: For input string: "13a"),2), (Success(14),3), (Failure(java.lang.NumberFormatException: For input string: "foo"),4))
SUCCESSFUL ONLY: List(12, 13, 14)
Try can be replaced with Either or your own custom Failure.
This does feel a bit more object oriented rather than functional.
I'm trying to use Scalaz 7 Validation in my app. However, I'm having an issue getting the |@| applicative functor to coalesce my failures. Here's the code I have:
type ValidationResult = ValidationNel[String, Unit]
def validate[A: ClassTag](instance: A, fieldNames: Option[Seq[String]] = None): ValidationResult = {
val fields = classTag[A].runtimeClass.getDeclaredFields
val fieldSubset = fieldNames match {
case Some(names) => fields.filter { field => names.contains(field.getName) }
case None => fields
}
fieldSubset.map {
field => field.getAnnotations.toSeq.map {
field.setAccessible(true)
val (name, value) = (field.getName, field.get(instance))
field.setAccessible(false)
annotation => annotation match {
case min: Min => minValidate(name, value, min.value())
case size: Size => sizeValidate(name, value, size.min(), size.max())
}
}
}.flatten[ValidationResult].foldLeft(().successNel[String])(_ |@| _)
}
The minValidate and sizeValidate functions just return ValidationResults.
The problem is, this code won't compile. The error message is:
Type mismatch, expected F0.type#M[NotInferedB], actual: ValidationResult
I have no idea what that means... do I need to give Scala more type info?
What I'm trying to accomplish is, if all fields are successNels, then return that, otherwise, return a combination of all the failureNels.
Has |@| changed since the previous version of Scalaz? Because even if I do something like:
().successNel |@| ().successNel
I get the same error.
Update
I started poking around the Scalaz source and I found the +++ which seems to do what I want.
What's the difference between +++ and |@|?
Scalaz's applicative builder syntax (|@|) gives you a way of "lifting" functions into an applicative functor. Suppose we have the following results, for example:
val xs: ValidationNel[String, List[Int]] = "Error!".failNel
val ys: ValidationNel[String, List[Int]] = List(1, 2, 3).success
val zs: ValidationNel[String, List[Int]] = List(4, 5).success
We can lift the list concatenation function (++) into the Validation like this:
scala> println((ys |@| zs)(_ ++ _))
Success(List(1, 2, 3, 4, 5))
scala> println((xs |@| ys)(_ ++ _))
Failure(NonEmptyList(Error!))
scala> println((xs |@| xs)(_ ++ _))
Failure(NonEmptyList(Error!, Error!))
This syntax is a little weird—it's very unlike how you lift functions into an applicative functor in Haskell, for example, and is designed this way primarily to outsmart Scala's fairly stupid type inference system. See my answer here or blog post here for more discussion.
One part of the weirdness is that xs |@| ys doesn't really mean anything on its own: it's essentially an argument list that's waiting to be applied to a function that it will lift into its applicative functor and apply to itself.
The +++ on Validation is a much simpler kind of creature—it's just the addition operation for the Semigroup instance for the type (note that you could equivalently use Scalaz's semigroup operator |+| here in place of +++). You give it two Validation results with matching semigroup types and it gives you another Validation—not some awful ApplyOps thing.
As a side note, in this case the addition operation for Validation's semigroup is the same as the semigroup operation for the right side lifted into the Validation:
scala> (xs |+| ys) == (xs |@| ys)(_ |+| _)
res3: Boolean = true
This won't always be the case, however (it's not for \/, for example, where the semigroup accumulates errors but the applicative functor doesn't).
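The distinction can be illustrated without Scalaz. The sketch below is only an analogy under stated assumptions: Either[List[String], List[Int]] stands in for the validation type, append plays the role of the semigroup operation, and liftFailFast plays the role of an applicative that stops at the first error, as described for \/. All names are made up.

```scala
object CombineSketch {
  // Stand-in for a validation of List[Int] with a list of error messages.
  type V = Either[List[String], List[Int]]

  // Semigroup-style append: takes two results and appends both sides,
  // accumulating errors when both have failed.
  def append(a: V, b: V): V = (a, b) match {
    case (Right(x), Right(y)) => Right(x ++ y)
    case (Left(e1), Left(e2)) => Left(e1 ++ e2)
    case (Left(e), _)         => Left(e)
    case (_, Left(e))         => Left(e)
  }

  // Applicative-style lifting of a two-argument function that stops
  // at the first failure (right-biased Either, Scala 2.12 or later).
  def liftFailFast(a: V, b: V)(f: (List[Int], List[Int]) => List[Int]): V =
    for { x <- a; y <- b } yield f(x, y)

  val xs: V = Left(List("Error!"))
  val ys: V = Right(List(1, 2, 3))
  val zs: V = Right(List(4, 5))
}
```

With these definitions append(xs, xs) keeps both error messages while liftFailFast(xs, xs)(_ ++ _) keeps only the first, which is the kind of disagreement the last sentence above refers to.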
I have a Validation object
val v: Validation[String, Option[Int]]
I need to perform a second validation that checks whether the actual Int value equals 100, for example. If I do
val vv = v.map(_.map(intValue => if (intValue == 100)
intValue.success[String]
else
"Bad value found".fail[Int]))
I get:
Validation[String, Option[Validation[String, Int]]]
How can I get vv as a Validation[String, Option[Int]] as well, in the most concise way?
=========
Found a possible solution of my own:
val validation: Validation[String, Option[Int]] = Some(100).success[String]
val validatedTwice: Validation[String, Option[Int]] = validation.fold(
_ => validation, // if Failure then return it
_.map(validateValue _) getOrElse validation // validate Successful result
)
def validateValue(value: Int): Validation[String, Option[Int]] = {
if (value == 100)
Some(value).success[String]
else
"Bad value".fail[Option[Int]]
}
It works, but it doesn't look concise or elegant.
==============
A second solution of my own, which also looks over-complicated:
val validatedTwice2: Validation[String, Option[Int]] = validation.flatMap(
_.map(validateValue _).map(_.map(Some(_))) getOrElse validation)
def validateValue(value: Int): Validation[String, Int] = {
if (value == 100)
value.success[String]
else
"Bad value".fail[Int]
}
Your solution is over-complicated. The following will suffice!
v flatMap (_.filter(_ == 100).toSuccess("Bad value found"))
The toSuccess comes from OptionW and converts an Option[A] into a Validation[X, A] taking the value provided for the failure case in the event that the option is empty. The flatMap works like this:
Validation[X, A]
=> (A => Validation[X, B])
=> (via flatMap) Validation[X, B]
That is, flatMap maps and then flattens (join in scalaz-parlance):
Validation[X, A]
=> (A => Validation[X, B])
=> (via map) Validation[X, Validation[X, B]]
=> (via join) Validation[X, B]
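The map-then-join reading of flatMap can be checked with the standard library's right-biased Either (Scala 2.12 or later) standing in for Validation. This is a sketch, not Scalaz code; V, join and check are names chosen for the example.

```scala
object JoinSketch {
  type V[A] = Either[String, A]

  // Flattens one level of nesting, keeping whichever failure comes first.
  def join[A](v: V[V[A]]): V[A] = v match {
    case Right(inner) => inner
    case Left(e)      => Left(e)
  }

  // A second validation step that can itself fail.
  def check(i: Int): V[Int] =
    if (i == 100) Right(i) else Left("Bad value found")

  // Mapping and then joining gives the same result as flatMap.
  def viaMapJoin(v: V[Int]): V[Int] = join(v.map(check))
  def viaFlatMap(v: V[Int]): V[Int] = v.flatMap(check)
}
```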
First, let's set up some type aliases because typing this out repeatedly will get old pretty fast. We'll tidy up your validation logic a little too while we're here.
type V[X] = Validation[String, X]
type O[X] = Option[X]
def checkInt(i: Int): V[Int] = Validation.fromEither(i != 100 either "Bad value found" or i)
val v: V[O[Int]] = ??? // placeholder for your existing validation
this is where we're starting out - b1 is equivalent to your vv situation
val b1: V[O[V[Int]]] = v.map(_.map(checkInt))
so let's sequence the option to flip over the V[O[V[Int]]] into a V[V[O[Int]]]
val b2: V[V[O[Int]]] = v.map(_.map(checkInt)).map(_.sequence[V, Int])
or if you're feeling lambda-y it could have been
sequence[({type l[x] = Validation[String, x]})#l, Int]
next we flatten out that nested validation - we're going to pull in the Validation monad because we actually do want the fastfail behaviour here, although it's generally not the right thing to do.
implicit val monad = Validation.validationMonad[String]
val b3: V[O[Int]] = v.map(_.map(checkInt)).map(_.sequence[V, Int]).join
So now we've got a Validation[String, Option[Int]], so we're there, but this is still pretty messy. Let's use some equational reasoning to tidy it up.
By the second functor law we know that:
X.map(_.f).map(_.g) = X.map(_.f.g) =>
val i1: V[O[Int]] = v.map(_.map(checkInt).sequence[V, Int]).join
and by the definition of a monad:
X.map(f).join = X.flatMap(f) =>
val i2: V[O[Int]] = v.flatMap(_.map(checkInt).sequence[V, Int])
and then we apply the free theorem of traversal:
(I struggled with that bloody paper so much, but it looks like some of it sunk in!):
X.map(f).sequence = X.traverse(f andThen identity) = X.traverse(f) =>
val i3: V[O[Int]] = v.flatMap(_.traverse[V, Int](checkInt))
so now we're looking at something a bit more civilised. I imagine there's some trickery to be played with the flatMap and traverse, but I've run out of inspiration.
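The final flatMap-plus-traverse shape can also be tried out without Scalaz. In the sketch below Either[String, A] stands in for Validation[String, A], and traverse is a hand-written version specialised to Option; both are assumptions made so that the example is self-contained.

```scala
object TraverseSketch {
  type V[A] = Either[String, A]

  def checkInt(i: Int): V[Int] =
    if (i == 100) Right(i) else Left("Bad value found")

  // Option traversal: run f on the value if there is one, and move the
  // Option inside the result. An empty Option is trivially a success.
  def traverse[A, B](o: Option[A])(f: A => V[B]): V[Option[B]] = o match {
    case Some(a) => f(a).map(Some(_))
    case None    => Right(None)
  }

  // Same shape as v.flatMap(_.traverse[V, Int](checkInt)).
  def validateTwice(v: V[Option[Int]]): V[Option[Int]] =
    v.flatMap(o => traverse(o)(checkInt))
}
```

Note that a successful None passes through unchanged here, since there is no value to check.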
Use flatMap, like so:
v.flatMap(_.parseInt.fail.map(_.getMessage).validation)