For instance, how can I write code in ATS that traverses a given string as is done by the following C code:
while ((c = *str++) != 0) do_something(c);
Well, there is always a combinator-based solution:
(str).foreach()(lam(c) => do_something(c))
The following solution is easy, accessible and doesn't require any unsafe features (but it does use one advanced feature: indexed string type).
fun
loop {n:int}(p0: string(n)): void =
if string_isnot_empty (p0) then let
val c = (g0ofg1)(string_head(p0))
val p0 = string_tail(p0)
in
do_something(c); loop(p0)
end
Full code: https://glot.io/snippets/ejpwxk2xzx
The following solution makes use of UNSAFE but it should be really easy to access:
staload
UNSAFE =
"prelude/SATS/unsafe.sats"
fun
loop(p0: ptr): void = let
val c = $UNSAFE.ptr_get<char>p0)
in
if isneqz(c) then (do_something(c); loop(ptr_succ<char>(p0)) else ()
end
val () = loop(string2ptr(str))
Related
I'm opening a CSV file and reading it using BufReader and splitting each line into a vector. Then I try to insert or update the count in a HashMap using a specific column as key.
let mut map: HashMap<&str, i32> = HashMap::new();
let reader = BufReader::new(input_file);
for line in reader.lines() {
let s = line.unwrap().to_string();
let tokens: Vec<&str> = s.split(&d).collect(); // <-- `s` does not live long enough
if tokens.len() > c {
println!("{}", tokens[c]);
let count = map.entry(tokens[c].to_string()).or_insert(0);
*count += 1;
}
}
The compiler kindly tells me s is shortlived. Storing from inside a loop a borrowed value to container in outer scope? suggests "owning" the string, so I tried to change
let count = map.entry(tokens[c]).or_insert(0);
to
let count = map.entry(tokens[c].to_string()).or_insert(0);
but I get the error
expected `&str`, found struct `std::string::String`
help: consider borrowing here: `&tokens[c].to_string()`
When I prepend ampersand (&) the error is
creates a temporary which is freed while still in use
note: consider using a `let` binding to create a longer lived
There is some deficiency in my Rust knowledge about borrowing. How can I make the hashmap own the string passed as key?
The easiest way for this to work is for your map to own the keys. This means that you must change its type from HasMap<&str, i32> (which borrows the keys) to HashMap<String, i32>. At which point you can call to_string to convert your tokens into owned strings:
let mut map: HashMap<String, i32> = HashMap::new();
let reader = BufReader::new(input_file);
for line in reader.lines() {
let s = line.unwrap().to_string();
let tokens:Vec<&str> = s.split(&d).collect();
if tokens.len() > c {
println!("{}", tokens[c]);
let count = map.entry(tokens[c].to_string()).or_insert(0);
*count += 1;
}
}
Note however that this means that tokens[c] will be duplicated even if it was already present in the map. You can avoid the extra duplication by trying to modify the counter with get_mut first, but this requires two lookups when the key is missing:
let mut map: HashMap<String, i32> = HashMap::new();
let reader = BufReader::new(input_file);
for line in reader.lines() {
let s = line.unwrap().to_string();
let tokens:Vec<&str> = s.split(&d).collect();
if tokens.len() > c {
println!("{}", tokens[c]);
if let Some (count) = map.get_mut (tokens[c]) {
*count += 1;
} else {
map.insert (tokens[c].to_string(), 1);
}
}
}
I don't know of a solution that would only copy the key when there was no previous entry but still do a single lookup.
http://www.ats-lang.org/Documents.html includes "Introduction to Programming in ATS", which includes the assertion that fileref_get_line_string returns a Strptr1 (a look in filebas.dats shows that it returns a String via strptr2string), and it includes this code:
#include "share/atspre_staload.hats"
#include "share/atspre_staload_libats_ML.hats"
implement main0() = loop() where
fun loop(): void = let
val isnot = fileref_isnot_eof(stdin_ref)
in
if isnot then let
val line = fileref_get_line_string(stdin_ref)
val () = print_string(line)
val () = strptr_free(line)
in
loop()
end else ()
end
end
Which throws a type error if the strptr_free line is included. If that line isn't included, the program blatantly leaks memory. Is there current documentation or are there ATS2 examples that show how the fileref_* words are supposed to be used? What is the ATS2 version of the code above?
There are two versions of fileref_get_line_string: one in prelude/filebas and
the other in libats/ML/filebas. For getting linear strings, you need
the former:
#include
"share/atspre_staload.hats"
implement
main0() = loop() where
fun
loop(): void = let
val
isnot =
fileref_isnot_eof(stdin_ref)
in
if isnot then let
val line =
fileref_get_line_string(stdin_ref)
val () =
print_strptr(line)
val () = free(line)
in
loop()
end else ()
end
end
I have a string sequence Seq[String] which represents stdin input lines.
Those lines map to a model entity, but it is not guaranteed that 1 line = 1 entity instance.
Each entity is delimited with a special string that will not occur anywhere else in the input.
My solution was something like:
val entities = lines.mkString.split(myDelimiter).map(parseEntity)
parseEntity implementation is not relevant, it gets a String and maps to a case class which represents the model entity
The problem is with a given input, I get an OutOfMemoryException on the lines.mkString. Would a fold/foldLeft/foldRight be more efficient? Or do you have any better alternative?
You can solve this using akka streams and delimiter framing. See this section of the documentation for the basic approach.
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Framing, Source}
import akka.util.ByteString
val example = (0 until 100).mkString("delimiter").grouped(8).toIndexedSeq
val framing = Framing.delimiter(ByteString("delimiter"), 1000)
implicit val system = ActorSystem()
implicit val mat = ActorMaterializer()
Source(example)
.map(ByteString.apply)
.via(framing)
.map(_.utf8String)
.runForeach(println)
The conversion to and from ByteString is a bit annoying, but Framing.delimiter is only defined for ByteString.
If you are fine with a more pure functional approach, fs2 will also offer primitives to solve this problem.
Something that worked for me if you are reading from a stream (your mileage may vary). Slightly modified version of Scala LineIterator:
class EntityIterator(val iter: BufferedIterator[Char]) extends AbstractIterator[String] with Iterator[String] {
private[this] val sb = new StringBuilder
def getc() = iter.hasNext && {
val ch = iter.next
if (ch == '\n') false // Replace with your delimiter here
else {
sb append ch
true
}
}
def hasNext = iter.hasNext
def next = {
sb.clear
while (getc()) { }
sb.toString
}
}
val entities =
new EnityIterator(scala.io.Source.fromInputStream(...).iter.buffered)
entities.map(...)
Below code runs a comparison of users and writes to file. I've removed some code to make it as concise as possible but speed is an issue also in this code :
import scala.collection.JavaConversions._
object writedata {
def getDistance(str1: String, str2: String) = {
val zipped = str1.zip(str2)
val numberOfEqualSequences = zipped.count(_ == ('1', '1')) * 2
val p = zipped.count(_ == ('1', '1')).toFloat * 2
val q = zipped.count(_ == ('1', '0')).toFloat * 2
val r = zipped.count(_ == ('0', '1')).toFloat * 2
val s = zipped.count(_ == ('0', '0')).toFloat * 2
(q + r) / (p + q + r)
} //> getDistance: (str1: String, str2: String)Float
case class UserObj(id: String, nCoordinate: String)
val userList = new java.util.ArrayList[UserObj] //> userList : java.util.ArrayList[writedata.UserObj] = []
for (a <- 1 to 100) {
userList.add(new UserObj("2", "101010"))
}
def using[A <: { def close(): Unit }, B](param: A)(f: A => B): B =
try { f(param) } finally { param.close() } //> using: [A <: AnyRef{def close(): Unit}, B](param: A)(f: A => B)B
def appendToFile(fileName: String, textData: String) =
using(new java.io.FileWriter(fileName, true)) {
fileWriter =>
using(new java.io.PrintWriter(fileWriter)) {
printWriter => printWriter.println(textData)
}
} //> appendToFile: (fileName: String, textData: String)Unit
var counter = 0; //> counter : Int = 0
for (xUser <- userList.par) {
userList.par.map(yUser => {
if (!xUser.id.isEmpty && !yUser.id.isEmpty)
synchronized {
appendToFile("c:\\data-files\\test.txt", getDistance(xUser.nCoordinate , yUser.nCoordinate).toString)
}
})
}
}
The above code was previously an imperative solution, so the .par functionality was within an inner and outer loop. I'm attempting to convert it to a more functional implementation while also taking advantage of Scala's parallel collections framework.
In this example the data set size is 10 but in the code im working on
the size is 8000 which translates to 64'000'000 comparisons. I'm
using a synchronized block so that multiple threads are not writing
to same file at same time. A performance improvment im considering
is populating a separate collection within the inner loop ( userList.par.map(yUser => {)
and then writing that collection out to seperate file.
Are there other methods I can use to improve performance. So that I can
handle a List that contains 8000 items instead of above example of 100 ?
I'm not sure if you removed too much code for clarity, but from what I can see, there is absolutely nothing that can run in parallel since the only thing you are doing is writing to a file.
EDIT:
One thing that you should do is to move the getDistance(...) computation before the synchronized call to appendToFile, otherwise your parallelized code ends up being sequential.
Instead of calling a synchronized appendToFile, I would call appendToFile in a non-synchronized way, but have each call to that method add the new line to some synchronized queue. Then I would have another thread that flushes that queue to disk periodically. But then you would also need to add something to make sure that the queue is also flushed when all computations are done. So that could get complicated...
Alternatively, you could also keep your code and simply drop the synchronization around the call to appendToFile. It seems that println itself is synchronized. However, that would be risky since println is not officially synchronized and it could change in future versions.
I wrote a new combinator for my parser in scala.
Its a variation of the ^^ combinator, which passes position information on.
But accessing the position information of the input element really cost performance.
In my case parsing a big example need around 3 seconds without position information, with it needs over 30 seconds.
I wrote a runnable example where the runtime is about 50% more when accessing the position.
Why is that? How can I get a better runtime?
Example:
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.combinator.Parsers
import scala.util.matching.Regex
import scala.language.implicitConversions
object FooParser extends RegexParsers with Parsers {
var withPosInfo = false
def b: Parser[String] = regexB("""[a-z]+""".r) ^^# { case (b, x) => b + " ::" + x.toString }
def regexB(p: Regex): BParser[String] = new BParser(regex(p))
class BParser[T](p: Parser[T]) {
def ^^#[U](f: ((Int, Int), T) => U): Parser[U] = Parser { in =>
val source = in.source
val offset = in.offset
val start = handleWhiteSpace(source, offset)
val inwo = in.drop(start - offset)
p(inwo) match {
case Success(t, in1) =>
{
var a = 3
var b = 4
if(withPosInfo)
{ // takes a lot of time
a = inwo.pos.line
b = inwo.pos.column
}
Success(f((a, b), t), in1)
}
case ns: NoSuccess => ns
}
}
}
def main(args: Array[String]) = {
val r = "foo"*50000000
var now = System.nanoTime
parseAll(b, r)
var us = (System.nanoTime - now) / 1000
println("without: %d us".format(us))
withPosInfo = true
now = System.nanoTime
parseAll(b, r)
us = (System.nanoTime - now) / 1000
println("with : %d us".format(us))
}
}
Output:
without: 2952496 us
with : 4591070 us
Unfortunately, I don't think you can use the same approach. The problem is that line numbers end up implemented by scala.util.parsing.input.OffsetPosition which builds a list of every line break every time it is created. So if it ends up with string input it will parse the entire thing on every call to pos (twice in your example). See the code for CharSequenceReader and OffsetPosition for more details.
There is one quick thing you can do to speed this up:
val ip = inwo.pos
a = ip.line
b = ip.column
to at least avoid creating pos twice. But that still leaves you with a lot of redundant work. I'm afraid to really solve the problem you'll have to build the index as in OffsetPosition yourself, just once, and then keep referring to it.
You could also file a bug report / make an enhancement request. This is not a very good way to implement the feature.