Accessing position information in a Scala combinator parser kills performance

I wrote a new combinator for my parser in Scala. It's a variation of the ^^ combinator which also passes position information on. But accessing the position information of the input element is really expensive. In my case, parsing a big example takes around 3 seconds without position information; with it, it takes over 30 seconds. I wrote a runnable example where the runtime is about 50% higher when accessing the position.
Why is that? How can I get a better runtime?
Example:
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.combinator.Parsers
import scala.util.matching.Regex
import scala.language.implicitConversions

object FooParser extends RegexParsers with Parsers {
  var withPosInfo = false

  def b: Parser[String] = regexB("""[a-z]+""".r) ^^# { case (b, x) => b + " ::" + x.toString }

  def regexB(p: Regex): BParser[String] = new BParser(regex(p))

  class BParser[T](p: Parser[T]) {
    def ^^#[U](f: ((Int, Int), T) => U): Parser[U] = Parser { in =>
      val source = in.source
      val offset = in.offset
      val start = handleWhiteSpace(source, offset)
      val inwo = in.drop(start - offset)
      p(inwo) match {
        case Success(t, in1) =>
          var a = 3
          var b = 4
          if (withPosInfo) { // takes a lot of time
            a = inwo.pos.line
            b = inwo.pos.column
          }
          Success(f((a, b), t), in1)
        case ns: NoSuccess => ns
      }
    }
  }

  def main(args: Array[String]) = {
    val r = "foo" * 50000000
    var now = System.nanoTime
    parseAll(b, r)
    var us = (System.nanoTime - now) / 1000
    println("without: %d us".format(us))
    withPosInfo = true
    now = System.nanoTime
    parseAll(b, r)
    us = (System.nanoTime - now) / 1000
    println("with : %d us".format(us))
  }
}
Output:
without: 2952496 us
with : 4591070 us

Unfortunately, I don't think you can use the same approach. The problem is that line numbers end up implemented by scala.util.parsing.input.OffsetPosition, which builds a list of every line break every time it is created. So with string input it will scan the entire input on every call to pos (twice in your example). See the code for CharSequenceReader and OffsetPosition for more details.
There is one quick thing you can do to speed this up:

val ip = inwo.pos
a = ip.line
b = ip.column

to at least avoid creating the position twice. But that still leaves you with a lot of redundant work. I'm afraid that to really solve the problem you'll have to build the index as OffsetPosition does yourself, just once, and then keep referring to it.
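For illustration, here is a minimal sketch of such a cached index (a hypothetical helper, not part of the library): it scans the input once for line breaks and then answers line/column queries by binary search, which is exactly the work OffsetPosition redoes on every call.

// Hypothetical helper: build the line index once and keep referring to it.
final class LineIndex(source: CharSequence) {
  // offsets at which each line starts, computed in a single pass
  private val lineStarts: Array[Int] = {
    val starts = scala.collection.mutable.ArrayBuffer(0)
    var i = 0
    while (i < source.length) {
      if (source.charAt(i) == '\n') starts += i + 1
      i += 1
    }
    starts.toArray
  }

  // 1-based line: index of the greatest line start that is <= offset
  def line(offset: Int): Int = {
    var lo = 0
    var hi = lineStarts.length - 1
    while (lo < hi) {
      val mid = (lo + hi + 1) / 2
      if (lineStarts(mid) <= offset) lo = mid else hi = mid - 1
    }
    lo + 1
  }

  // 1-based column of the offset within its line
  def column(offset: Int): Int = offset - lineStarts(line(offset) - 1) + 1
}

With one LineIndex built per parse, ^^# could set a = idx.line(start) and b = idx.column(start) instead of going through inwo.pos at all.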
You could also file a bug report / make an enhancement request. This is not a very good way to implement the feature.

Related

Using DFS to print out a Tree in Kotlin

I am working on trees and wanted to print out a tree using a stack.
Here's what I've got so far.
class TreeNode<T>(var key: T) {
    var left: TreeNode<T>? = null
    var right: TreeNode<T>? = null
}

fun depthFirstValues(root: TreeNode<Char>) {
    val stack = mutableListOf(root)
    while (stack.size > 0) {
        val current = stack.removeFirst()
        // println(current.key)
        if (current.right!!.equals(true)) stack.add(current.right!!)
        if (current.left!!.equals(true)) stack.add(current.left!!)
    }
    println(stack)
}
fun buildTree(): TreeNode<Char> {
    val a = TreeNode('a')
    val b = TreeNode('b')
    val c = TreeNode('c')
    val d = TreeNode('d')
    val e = TreeNode('e')
    val f = TreeNode('f')
    a.left = b
    a.right = c
    b.left = d
    b.right = e
    c.right = f
    return a
}
I got an empty list as the return value. I have been tinkering with it throughout the day but am not sure how to get it to work. Any help will be appreciated. Thank you.
I see three major problems with your code.
You need an extra collection to store the results of the depth-first traversal if you want to keep the entire traversal and print it out at the end. You can't just reuse the stack that drives the traversal, because that stack is guaranteed to be empty when the algorithm ends - the while loop only exits once stack.size > 0 becomes false, i.e. when the stack is empty.
You are not actually using stack like a stack. You are removing elements from its front (removeFirst) but adding to its end (add), like a queue. To use it like a stack, you should add to and remove from the same end of the list.
You are not checking nulls correctly. current.right!!.equals(true) is false if current.right is not null, and throws an exception if it is null - that doesn't make much sense at all, does it?
Fixing these issues, we have:
fun depthFirstValues(root: TreeNode<Char>) {
    val stack = mutableListOf(root)
    val result = mutableListOf<Char>()
    while (stack.isNotEmpty()) {
        val current = stack.removeLast()
        current.left?.let { stack.add(it) }
        current.right?.let { stack.add(it) }
        result.add(current.key) // could also get rid of "result" and just println(current.key) here
    }
    println(result)
}
When applied to your tree, it prints [a, c, f, b, e, d].

Better way to scan data using scala and spark

Problem
The input data has 2 types of records; let's call them R and W.
I need to traverse this data in sequence from top to bottom, in such a way that if the current record is of type W, it is merged with a map (let's call it workMap). If the key of that W-type record is already present in the map, the value of this record is added to it; otherwise a new entry is made into workMap.
If the current record is of type R, the workMap calculated up to that record is attached to the current record.
For example, if this is the order of records -

W1 - a -> 2
W2 - b -> 3
W3 - a -> 4
R1
W4 - c -> 1
R2
W5 - c -> 4

where W1, W2, W3, W4 and W5 are of type W, and R1 and R2 are of type R.
At the end of this function, I should have the following -

R1 - { a -> 6, b -> 3 }          // merged(W1, W2, W3)
R2 - { a -> 6, b -> 3, c -> 1 }  // merged(W1, W2, W3, W4)
     { a -> 6, b -> 3, c -> 5 }  // the final workMap, merged(W1, W2, W3, W4, W5)

I want all the R-type records attached to the intermediate workMaps calculated up to that point, and the final workMap after the last record is processed.
Here is the code that I have written -
def calcPerPartition(itr: Iterator[(InputKey, InputVal)]): Iterator[(ReportKey, ReportVal)] = {
  val workMap = mutable.HashMap.empty[WorkKey, WorkVal]
  val reportList = mutable.ArrayBuffer.empty[(ReportKey, ReportVal)]

  while (itr.hasNext) {
    val (iKey, iVal) = itr.next()
    if (iKey.recordType == reportType) {
      // creates a new (ReportKey, ReportVal)
      reportList += getNewReportRecord(workMap, iKey, iVal)
    } else {
      // if iKey is already present, merges the values; otherwise adds a new entry
      updateWorkMap(workMap, iKey, iVal)
    }
  }

  val workList: Seq[(ReportKey, ReportVal)] = workMap.toList.map(convertToReport)
  reportList.iterator ++ workList.iterator
}
The ReportKey class looks like this -

case class ReportKey(
  // the type of record - report or work
  rType: Int,
  date: String,
  .....
)
There are two problems with this approach that I am asking for help with -
I have to keep track of reportList - a list of R-type records attached to intermediate workMaps. As the data grows, this list grows too, and I am running into OutOfMemoryErrors.
I have to combine the reportList and workMap records in the same data structure and then return them. If there is a more elegant way, I would definitely consider changing this design.
For the sake of completeness - I am using Spark. The function calcPerPartition is passed as an argument to mapPartitions on an RDD. I need the workMaps from each partition to do some additional calculations later.
I know that if I didn't have to return workMaps from each partition, the problem would become much easier, like this -
...
val workMap = mutable.HashMap.empty[WorkKey, WorkVal]

val scan = itr.scanLeft[Option[(ReportKey, ReportVal)]](None) {
  (acc: Option[(ReportKey, ReportVal)], curr: (InputKey, InputVal)) =>
    if (curr._1.recordType == reportType) {
      val rec = getNewReportRecord(workMap, curr._1, curr._2)
      Some(rec)
    } else {
      updateWorkMap(workMap, curr._1, curr._2)
      None
    }
}

val reportList = scan.filter(_.isDefined).map(_.get)
// workMap is still empty at this point, because scanLeft on an iterator is lazy
...
Sure, I could do a reduce operation on the input data to derive the final workMap, but that would mean looking at the data twice. Considering that the input data set is huge, I want to avoid that too. But unfortunately I need the workMaps at a later step.
So, is there a better way to solve the above problem? If problem 2 can't be solved at all (according to this), is there any other way I can avoid storing R records (reportList) in a list or scanning the data more than once?
I don't yet have a better design for the second problem - whether you can avoid combining reportList and workMap into a single data structure - but we can certainly avoid storing R-type records in a list.
Here is how we can re-write the calcPerPartition from the question -
def calcPerPartition(itr: Iterator[(InputKey, InputVal)]): Iterator[(ReportKey, ReportVal)] = {
  val workMap = mutable.HashMap.empty[WorkKey, WorkVal]

  val reports = new Iterator[Option[(ReportKey, ReportVal)]]() {
    override def hasNext: Boolean = itr.hasNext

    override def next(): Option[(ReportKey, ReportVal)] = {
      val (iKey, iVal) = itr.next()
      if (iKey.recordType == reportType) {
        Some(getNewReportRecord(workMap, iKey, iVal))
      } else {
        // otherwise update the work map but don't accumulate anything
        updateWorkMap(workMap, iKey, iVal)
        None
      }
    }
  }

  // Drop the Nones lazily; the final workMap is converted and appended only
  // once the input iterator is exhausted (++ evaluates its argument lazily).
  reports.flatMap(_.iterator) ++ workMap.iterator.map(convertToReport)
}
Instead of storing results in a list, we define an iterator: R-type report records are emitted as the input is consumed, and the final workMap is converted and appended only after the input iterator is exhausted. That solved most of the memory issues we had around this.
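For completeness, here is roughly how this could plug into Spark - a hedged sketch, not code from the question: input is assumed to be an RDD[(InputKey, InputVal)], and splitting the output afterwards assumes convertToReport marks its results with a non-report rType.

// Hypothetical wiring: run calcPerPartition over each partition of the input.
val perPartition = input.mapPartitions(calcPerPartition)

// R-type report records and each partition's final workMap records travel in
// the same iterator; separate them afterwards by the record-type field.
val reports       = perPartition.filter { case (key, _) => key.rType == reportType }
val finalWorkMaps = perPartition.filter { case (key, _) => key.rType != reportType }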

Using par map to increase performance

The code below runs a comparison of users and writes to a file. I've removed some code to make it as concise as possible, but speed is an issue in this code as well:
import scala.collection.JavaConversions._

object writedata {
  def getDistance(str1: String, str2: String) = {
    val zipped = str1.zip(str2)
    val numberOfEqualSequences = zipped.count(_ == ('1', '1')) * 2
    val p = zipped.count(_ == ('1', '1')).toFloat * 2
    val q = zipped.count(_ == ('1', '0')).toFloat * 2
    val r = zipped.count(_ == ('0', '1')).toFloat * 2
    val s = zipped.count(_ == ('0', '0')).toFloat * 2
    (q + r) / (p + q + r)
  }

  case class UserObj(id: String, nCoordinate: String)

  val userList = new java.util.ArrayList[UserObj]
  for (a <- 1 to 100) {
    userList.add(new UserObj("2", "101010"))
  }

  def using[A <: { def close(): Unit }, B](param: A)(f: A => B): B =
    try { f(param) } finally { param.close() }

  def appendToFile(fileName: String, textData: String) =
    using(new java.io.FileWriter(fileName, true)) { fileWriter =>
      using(new java.io.PrintWriter(fileWriter)) { printWriter =>
        printWriter.println(textData)
      }
    }

  var counter = 0

  for (xUser <- userList.par) {
    userList.par.map(yUser => {
      if (!xUser.id.isEmpty && !yUser.id.isEmpty)
        synchronized {
          appendToFile("c:\\data-files\\test.txt", getDistance(xUser.nCoordinate, yUser.nCoordinate).toString)
        }
    })
  }
}
The code above was previously an imperative solution, so the .par functionality was within an inner and an outer loop. I'm attempting to convert it to a more functional implementation while also taking advantage of Scala's parallel collections framework.
In this example the data set size is 100, but in the code I'm working on the size is 8000, which translates to 64,000,000 comparisons. I'm using a synchronized block so that multiple threads are not writing to the same file at the same time. A performance improvement I'm considering is populating a separate collection within the inner loop (userList.par.map(yUser => {) and then writing that collection out to a separate file.
Are there other methods I can use to improve performance, so that I can handle a list that contains 8000 items instead of the 100 in the example above?
I'm not sure if you removed too much code for clarity, but from what I can see, there is absolutely nothing that can run in parallel, since the only thing you are doing is writing to a file.
EDIT:
One thing that you should do is to move the getDistance(...) computation before the synchronized call to appendToFile, otherwise your parallelized code ends up being sequential.
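Concretely, that reordering could look like the following sketch against the question's loop (only the file append stays inside the lock):

for (xUser <- userList.par; yUser <- userList.par) {
  if (!xUser.id.isEmpty && !yUser.id.isEmpty) {
    // compute the distance outside the lock, in parallel
    val d = getDistance(xUser.nCoordinate, yUser.nCoordinate).toString
    // only the (cheap, serialized) write happens under synchronized
    synchronized { appendToFile("c:\\data-files\\test.txt", d) }
  }
}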
Instead of calling a synchronized appendToFile, I would call appendToFile in a non-synchronized way, but have each call to that method add the new line to some synchronized queue. Then I would have another thread that flushes the queue to disk periodically. But then you would also need something to make sure that the queue is flushed when all computations are done, so that could get complicated...
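A minimal sketch of that queue-and-writer-thread idea, using java.util.concurrent (the names and the poison-pill shutdown are illustrative choices, not fixed API):

import java.util.concurrent.LinkedBlockingQueue

object AsyncWriter {
  private val queue = new LinkedBlockingQueue[String]()
  private val POISON = new String("STOP") // sentinel marking the end of input

  private val writer = new Thread(new Runnable {
    def run(): Unit = {
      val out = new java.io.PrintWriter(new java.io.FileWriter("test.txt", true))
      try {
        var line = queue.take() // blocks until a line is available
        while (line ne POISON) {
          out.println(line)
          line = queue.take()
        }
      } finally out.close()
    }
  })
  writer.start()

  // called from any worker thread; LinkedBlockingQueue is thread-safe
  def append(line: String): Unit = queue.put(line)

  // call once after all computations are done: drains the queue, then stops
  def close(): Unit = { queue.put(POISON); writer.join() }
}

Workers would call AsyncWriter.append(...) instead of the synchronized appendToFile, and the main thread would call AsyncWriter.close() after the parallel loops finish.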
Alternatively, you could keep your code and simply drop the synchronization around the call to appendToFile. It seems that println itself is synchronized. However, that would be risky, since println is not officially specified to be synchronized and that could change in future versions.
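As for the improvement the question itself floats - populating a collection in the inner loop and writing it out in one go - that could look like this sketch (one buffered write per xUser instead of one lock acquisition per pair):

for (xUser <- userList.par) {
  if (!xUser.id.isEmpty) {
    // build all distances for this xUser in memory first
    val lines = userList.par
      .filter(yUser => !yUser.id.isEmpty)
      .map(yUser => getDistance(xUser.nCoordinate, yUser.nCoordinate).toString)
      .mkString("\n")
    // then take the lock once per xUser rather than once per comparison
    synchronized { appendToFile("c:\\data-files\\test.txt", lines) }
  }
}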

Scala stateful actor, recursive calling faster than using vars?

Sample code below. I'm a little curious why MyActor is faster than MyActor2. MyActor recursively calls process/react and keeps state in the function parameters whereas MyActor2 keeps state in vars. MyActor even has the extra overhead of tupling the state but still runs faster. I'm wondering if there is a good explanation for this or if maybe I'm doing something "wrong".
I realize the performance difference is not significant but the fact that it is there and consistent makes me curious what's going on here.
Ignoring the first two runs as warmup, I get:
MyActor:
559
511
544
529
vs.
MyActor2:
647
613
654
610
import scala.actors._

object Const {
  val NUM = 100000
  val NM1 = NUM - 1
}

trait Send[MessageType] {
  def send(msg: MessageType)
}

// Test 1 using recursive calls to maintain state
abstract class StatefulTypedActor[MessageType, StateType](val initialState: StateType) extends Actor with Send[MessageType] {
  def process(state: StateType, message: MessageType): StateType

  def act = proc(initialState)

  def send(message: MessageType) = {
    this ! message
  }

  private def proc(state: StateType) {
    react {
      case msg: MessageType => proc(process(state, msg))
    }
  }
}

object MyActor extends StatefulTypedActor[Int, (Int, Long)]((0, 0)) {
  override def process(state: (Int, Long), input: Int) = input match {
    case 0 =>
      (1, System.currentTimeMillis())
    case input: Int =>
      state match {
        case (Const.NM1, start) =>
          println((System.currentTimeMillis() - start))
          (Const.NUM, start)
        case (s, start) =>
          (s + 1, start)
      }
  }
}

// Test 2 using vars to maintain state
object MyActor2 extends Actor with Send[Int] {
  private var state = 0
  private var strt: Long = 0

  def send(message: Int) = {
    this ! message
  }

  def act =
    loop {
      react {
        case 0 =>
          state = 1
          strt = System.currentTimeMillis()
        case input: Int =>
          state match {
            case Const.NM1 =>
              println((System.currentTimeMillis() - strt))
              state += 1
            case s =>
              state += 1
          }
      }
    }
}

// main: Run testing
object TestActors {
  def main(args: Array[String]): Unit = {
    val a = MyActor
    // val a = MyActor2
    a.start()
    testIt(a)
  }

  def testIt(a: Send[Int]) {
    for (_ <- 0 to 5) {
      for (i <- 0 to Const.NUM) {
        a send i
      }
    }
  }
}
EDIT: Based on Vasil's response, I removed the loop and tried it again. MyActor2, based on vars, then leapfrogged ahead and is now around 10% or so faster. So the lesson is: if you are confident that you won't end up with a stack-overflowing backlog of messages, and you want to squeeze out every little bit of performance, don't use loop; just call the act() method recursively.
Change for MyActor2:
override def act() =
  react {
    case 0 =>
      state = 1
      strt = System.currentTimeMillis()
      act()
    case input: Int =>
      state match {
        case Const.NM1 =>
          println((System.currentTimeMillis() - strt))
          state += 1
        case s =>
          state += 1
      }
      act()
  }
Such results are caused by the specifics of your benchmark (a lot of small messages that fill the actor's mailbox quicker than it can handle them).
Generally, the workflow of react is the following:
The actor scans the mailbox;
If it finds a message, it schedules it for execution;
When the scheduling completes, or when there are no messages in the mailbox, the actor suspends (Actor.suspendException is thrown).
In the first case, when the handler finishes processing the message, execution proceeds straight to the react method, and, as long as there are lots of messages in the mailbox, the actor immediately schedules the next message to execute, and only after that suspends.
In the second case, loop schedules the execution of react in order to prevent a stack overflow (which might be your case with Actor #1, because tail recursion in process is not optimized), and thus execution doesn't proceed to react immediately, as in the first case. That's where the millis are lost.
UPDATE (taken from here):
Using loop instead of recursive react effectively doubles the number of tasks that the thread pool has to execute in order to accomplish the same amount of work, which in turn makes any overhead in the scheduler far more pronounced when using loop.
Just a wild stab in the dark: it might be due to the exception thrown by react in order to escape the loop. Exception creation is quite heavy. I don't know how often it does that, but it should be possible to check with a catch and a counter.
The overhead in your test depends heavily on the number of threads that are present (try using only one thread with scala -Dactors.corePoolSize=1!). I'm finding it difficult to figure out exactly where the difference arises; the only real difference is that in one case you use loop and in the other you do not. loop does a fair bit of work, since it repeatedly creates function objects using "andThen" rather than iterating. I'm not sure whether this is enough to explain the difference, especially in light of the heavy usage by scala.actors.Scheduler$.impl and ExceptionBlob.

Extending Seq.sortBy in Scala

Say I have a list of names.
case class Name(val first: String, val last: String)
val names = Name("c", "B") :: Name("b", "a") :: Name("a", "B") :: Nil
If I now want to sort that list by last name (and if that is not enough, by first name), it is easily done.
names.sortBy(n => (n.last, n.first))
// List[Name] = List(Name(a,B), Name(c,B), Name(b,a))
But what if I'd like to sort this list based on some other collation for strings?
Unfortunately, the following does not work:
val o = new Ordering[String] { def compare(x: String, y: String) = collator.compare(x, y) }
names.sortBy(n => (n.last, n.first))(o)
// error: type mismatch;
//  found   : java.lang.Object with Ordering[String]
//  required: Ordering[(String, String)]
//        names.sortBy(n => (n.last, n.first))(o)
Is there any way that allows me to change the ordering without having to write an explicit sortWith method with multiple if-else branches in order to deal with all the cases?
Well, this almost does the trick:
names.sorted(o.on((n: Name) => n.last + n.first))
On the other hand, you can do this as well:
implicit val o = new Ordering[String]{ def compare(x: String, y: String) = collator.compare(x, y) }
names.sortBy(n => (n.last, n.first))
This locally defined implicit will take precedence over the one defined on the Ordering object.
One solution is to extend the otherwise implicitly used Tuple2 ordering. Unfortunately, this means writing out Tuple2 in the code:
names.sortBy(n => (n.last, n.first))(Ordering.Tuple2(o, o))
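For reference, the question never defines collator; one plausible stand-in (an assumption, not the asker's code) is java.text.Collator, which compares strings according to a locale's collation rules:

import java.text.Collator
import java.util.Locale

// Hypothetical collator for the snippets above; the question leaves it undefined.
val collator = Collator.getInstance(Locale.GERMAN)

val o = new Ordering[String] {
  def compare(x: String, y: String) = collator.compare(x, y)
}

With such an o, Ordering.Tuple2(o, o) sorts accented names the way the locale expects rather than by raw code points.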
I'm not 100% sure what methods you think collator should have.
But you have the most flexibility if you define the ordering on the case class:
val o = new Ordering[Name] {
  def compare(a: Name, b: Name) =
    3 * math.signum(collator.compare(a.last, b.last)) +
      math.signum(collator.compare(a.first, b.first))
}
names.sorted(o)
but you can also provide an implicit conversion from a string ordering to a name ordering:
def ostring2oname(os: Ordering[String]) = new Ordering[Name] {
  def compare(a: Name, b: Name) =
    3 * math.signum(os.compare(a.last, b.last)) + math.signum(os.compare(a.first, b.first))
}
and then you can use any String ordering to sort Names:
def oo = new Ordering[String] {
  def compare(x: String, y: String) = x.length compare y.length
}
val morenames = List("rat","fish","octopus")
scala> morenames.sorted(oo)
res1: List[java.lang.String] = List(rat, fish, octopus)
Edit: A handy trick, in case it wasn't apparent, is that if you want to order by N things and you're already using compare, you can just multiply each thing by 3^k (with the first-to-order being multiplied by the largest power of 3) and add.
If your comparisons are very time-consuming, you can easily add a cascading compare:
class CascadeCompare(i: Int) {
  def tiebreak(j: => Int) = if (i != 0) i else j
}
implicit def break_ties(i: Int) = new CascadeCompare(i)
and then
def ostring2oname(os: Ordering[String]) = new Ordering[Name] {
  def compare(a: Name, b: Name) =
    os.compare(a.last, b.last) tiebreak os.compare(a.first, b.first)
}
(Just be careful to nest them x tiebreak (y tiebreak (z tiebreak w)) so you don't do the implicit conversion a bunch of times in a row.)
(If you really need fast compares, then you should write it all out by hand, or pack the orderings in an array and use a while loop. I'll assume you're not that desperate for performance.)
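If you ever do get that desperate, the write-it-out-by-hand option mentioned above might look like this sketch: keep the comparison functions in an array and loop until one of them breaks the tie (names are illustrative).

// Cascade through comparison functions, stopping at the first nonzero result.
def cascade[T](cmps: Array[(T, T) => Int])(a: T, b: T): Int = {
  var i = 0
  while (i < cmps.length) {
    val c = cmps(i)(a, b)
    if (c != 0) return c
    i += 1
  }
  0
}

val byLastThenFirst = new Ordering[Name] {
  private val cmps: Array[(Name, Name) => Int] = Array(
    (a, b) => collator.compare(a.last, b.last),
    (a, b) => collator.compare(a.first, b.first)
  )
  def compare(a: Name, b: Name) = cascade(cmps)(a, b)
}

This avoids both the tupling and the repeated implicit conversions, at the cost of writing the plumbing yourself.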
