Task.async in Elixir Stream - parallel-processing

I want to do a parallel map over a big list. The code looks somewhat like this:
big_list
|> Stream.map(&Task.async(Module, :do_something, [&1]))
|> Stream.map(&Task.await(&1))
|> Enum.filter filter_fun
But I was checking Stream implementation and as far as I understand Stream.map combines the functions and applies combined function to elements in the stream, which would mean that the sequence is like this:
Take first element
Create async task
Wait for it to finish
Take second elelemnt...
In that case, it doesn't do it in parallel. Am I right or am I missing something?
If I am right, what about this code?
Stream.map Task.async ...
|> Enum.map Task.await ...
Is that going to run in parallel?

The second one also doesn't do what you want. You can see it clearly with this code:
defmodule Test do
def test do
[1,2,3]
|> Stream.map(&Task.async(Test, :job, [&1]))
|> Enum.map(&Task.await(&1))
end
def job(number) do
:timer.sleep 1000
IO.inspect(number)
end
end
Test.test
You'll see a number, then a 1 second wait, another number, and so on. The key here is that you want to create the tasks as soon as possible, so you shouldn't use the
lazy Stream.map at all. Instead use the eager Enum.map at that point:
|> Enum.map(&Task.async(Test, :job, [&1]))
|> Enum.map(&Task.await(&1))
On the other hand you can use Stream.map when awaiting, as long as you do some eager operation later, like your filter. That way the awaits will be interspersed with any processing you might be doing on the results.

Elixir 1.4 provides the new Task.async_stream/5 function that will return a stream that runs a given function concurrently on each item in an enumerable.
There are also options to specify the maximum number of workers and a timeout, using the :max_concurrency and :timeout options parameters.
Please note that you don't have to await this Task, because the function returns a stream, so you can either use Enum.to_list/1 or use Stream.run/1.
This will make your example run concurrently:
big_list
|> Task.async_stream(Module, :do_something, [])
|> Enum.filter(filter_fun)

You can try Parallel Stream.
stream = 1..10 |> ParallelStream.map(fn i -> i * 2 end)
stream |> Enum.into([])
[2,4,6,8,10,12,14,16,18,20]
UPD
Or better use Flow

Related

Why do we need to add [] to a iterable array and * to asyncio.gather?

Using this example
import time
import asyncio
async def main(x):
print(f"Starting Task {x}")
await asyncio.sleep(3)
print(f"Finished Task {x}")
async def async_io():
tasks = []
for i in range(10):
tasks += [main(i)]
await asyncio.gather(*tasks)
if __name__ == "__main__":
start_time = time.perf_counter()
asyncio.run(async_io())
print(f"Took {time.perf_counter() - start_time} secs")
I noticed that we need to create a list that keeps track of the tasks to do. Understandable, but then why do we add the [] wrapper over the main(i) function? And also in the asyncio.gather(*tasks), why do we need to add the asterisk there as well?
why do we add the [] wrapper over the main(i) function?
There are a few ways to add items into a list. One such way, the way you've chosen, is by concatenating two lists together.
>>> [1] + [2]
[1, 2]
Trying to concatenate a list and something else will lead to a TypeError.
In your particular case you're using augmented assignment, a (often more performant) shorthand for
tasks = tasks + [main(i)]
Another way to accomplish this is with append.
tasks.append(main(i))
If your real code matches your example code, an even better way to spell all of this is
tasks = [main(i) for i in range(10)]
in the asyncio.gather(*tasks), why do we need to add the asterisk there as well?
Because gather will to run each positional argument it receives. Calls to gather should look like
asyncio.gather(main(0))
asyncio.gather(main(0), main(1))
Since there are times when you need to use a variable number of positional arguments, Python offers the unpacking operator (* in the case of lists).
If you felt so inclined, your example can be rewritten as
async def async_io():
await asyncio.gather(*[main(i) for i in range(10)])

Slowdowns of performance of ets select

I did some tests around performance of selection from ets tables and noted weird behaviour. For example we have a simple ets table (without any specific options) which stores key/value - a random string and a number:
:ets.new(:table, [:named_table])
for _i <- 1..2000 do
:ets.insert(:table, {:crypto.strong_rand_bytes(10)
|> Base.url_encode64
|> binary_part(0, 10), 100})
end
and one entry with known key:
:ets.insert(:table, {"test_string", 200})
Now there is simple stupid benchmark function which tries to select test_string from the ets table multiple times and to measure time of each selection:
test_fn = fn() ->
Enum.map(Enum.to_list(1..10_000), fn(x) ->
:timer.tc(fn() ->
:ets.select(:table, [{{:'$1', :'$2'},
[{:'==', :'$1', "test_string"}],
[:'$_']}])
end)
end) |> Enum.unzip
end
Now If I take a look at maximum time with Enum.max(timings) it will return a value which is approximately 10x times greater than almost of all other selections. So, for example:
iex(1)> {timings, _result} = test_fn.()
....
....
....
iex(2)> Enum.max(timings)
896
iex(3)> Enum.sum(timings) / length(timings)
96.8845
We may see here that maximum value is almost 10x times greater than average value.
What's happening here? Is it somehow related to GC, time for memory allocation or something like this? Do you have any ideas why selection from an ets table may give such slowdowns sometimes or how to profile this.
UPD.
here is the graph of timings distribution:
match_spec, the 2nd argument of the select/2 is making it slower.
According to an answer on this question
Erlang: ets select and match performance
In trivial non-specific use-cases, select is just a lot of work around match.
In non-trivial more common use-cases, select will give you what you really want a lot quicker.
Also, if you are working with tables with set or ordered_set type, to get a value based on a key, use lookup/2 instead, as it is lot faster.
On my pc, following code
def lookup() do
{timings, _} = Enum.map(Enum.to_list(1..10_000), fn(_x) ->
:timer.tc(fn() ->
:ets.lookup(:table, "test_string")
end)
end) |> Enum.unzip
IO.puts Enum.max(timings)
IO.puts Enum.sum(timings) / length(timings)
end
printed
0
0.0
While yours printed
16000
157.9
In case you are interested, here you can find the NIF C code for ets:select.
https://github.com/erlang/otp/blob/9d1b3bb0db87cf95cb821af01189f6d6be072f79/erts/emulator/beam/erl_db.c

In Haskell, is it correct measuring performance using a timestamp obtained at the beginning and in the end of a function execution?

I want to measure the performance of a Haskell function. This function is executed concurrently.
Is it correct to measure its performance using timestamps that getCurrentTime function returns? Does lazyness affects the measuring?
I want to save these times on a log. I have looked some logging libraries, but the time they return is not as precise as the timestamp that getCurrentTime returns. I use XES format on my log.
The code i use is something like this: (i did not compile it)
import Data.Time.Clock
measuredFunction:: Int -> IO (Int,UTCTime,UTCTime)
measuredFunction x = do
time' <- getCurrentTime
--performs some IO action that returns x'
time'' <- getCurrentTime
return (x',time',time'')
runTest :: Int -> Int -> IO ()
runTest init end = do
when (init <= end) (do
forkIO (do
(x',time',time'') <- measuredFunction 1
-- saves time' and time '' in a log
)
runTest (init+1) end )
It depends on the function. Some values have all their information immediately, whereas others can have expensive stuff going on "beyond the top layer". Here's a contrived example:
example :: (Int, Int)
example = (1+1, head [ x | x <- [1..], x == 10^6 ])
If you load this up in ghci, you will see (2, printed, and then after some delay, the remainder of the value 1000000) is printed. If you get a value like this, then the function will "return" "before" the expensive sub-value has been computed. But you can use deepseq to ensure that a value is computed all the way and doesn't have any sub-computations left.
Benchmarking is subtle, and there are a lot of ways to do it wrong (especially in Haskell). Fortunately we have a very good benchmarking library called criterion
(tutorial) which I definitely recommend you use if you are trying to get reliable results.

appending results to array after parallel processing

Some eloquency questions:
A. How to add a list that was formed from parallel processing directly to the a Concurrent results array in an eloquent way.
let results = System.Collections.Concurrent.ConcurrentBag<string>()
let tasks = System.Collections.Generic.List<string>()
tasks.add("a")
tasks.add("b")
let answers = tasks
|> Seq.map asyncRequest
|> Async.Parallel
|> Async.RunSynchronously
|> Array.toList
Array.append results answers
Attempt Is there a way to append via pipe operator?
let answers = tasks
|> Seq.map asyncRequest
|> Async.Parallel
|> Async.RunSynchronously
|> Array.append results
B. Is there a way to add items via List constructor?
let tasks = System.Collections.Generic.List<string>()
tasks.add("a")
tasks.add("b")
C. Is there a way to construct a queue from array using Queue constructor?
let items: string[] = [|"a", "b", "c"|]
let jobs = System.Collections.Generic.Queue<string>()
items |> Array.map jobs.Enqueue |> ignore
A. you can't use Array.append on results, because results is a ConcurrentBag, but Array.append expects its argument to be an Array. To add stuff to ConcurrentBag, use its Add method. Add items one by one:
tasks
|> Seq.map asyncRequest
|> Async.Parallel
|> Async.RunSynchronously
|> Array.iter results.Add
Adding items one by one is a little inefficient. If your ConcurrentBag is really created right in the same function, as your example shows, you may consider using its constructor that takes an IEnumerable<T>:
let answers = tasks
|> Seq.map asyncRequest
|> Async.Parallel
|> Async.RunSynchronously
let results = System.Collections.Concurrent.ConcurrentBag<string>( answers )
B. yes, there is a way to add stuff to a System.Collections.Generic.List<T>. This class provides a handy Add method for this purpose:
tasks.Add "a"
tasks.Add "b"
Enclosing the argument in parentheses (as in your attempt) is not necessary, but allowed:
tasks.Add("a")
tasks.Add("b")
C. yes, there is a way to construct a queue from an array. The Queue class has a constructor that takes an IEnumerable<T>, and arrays implement IEnumerable<T>, so you can call that constructor on an array:
let jobs = System.Collections.Generic.Queue<string>( items )
Please note that you hardly needed my help to get any of the above information. Everything is freely available on MSDN (see links above) or from autocompletion/intellisense in your favorite code editor.

Seq.map faster than a regular for loop?

I'm learning F# and one thing that preoccupies me about this language is performance. I've written a small benchmark where I compare idiomatic F# to imperative-style code written in the same language - and much to my surprise, the functional version comes out significantly faster.
The benchmark consists of:
Reading in a text file using File.ReadAllLines
Reversing the order of characters within each line
Writing back the result to the same file using File.WriteAllLines.
Here's the code:
open System
open System.IO
open System.Diagnostics
let reverseString(str:string) =
new string(Array.rev(str.ToCharArray()))
let CSharpStyle() =
let lines = File.ReadAllLines("text.txt")
for i in 0 .. lines.Length - 1 do
lines.[i] <- reverseString(lines.[i])
File.WriteAllLines("text.txt", lines)
let FSharpStyle() =
File.ReadAllLines("text.txt")
|> Seq.map reverseString
|> (fun lines -> File.WriteAllLines("text.txt", lines))
let benchmark func message =
// initial call for warm-up
func()
let sw = Stopwatch.StartNew()
for i in 0 .. 19 do
func()
printfn message sw.ElapsedMilliseconds
[<EntryPoint>]
let main args =
benchmark CSharpStyle "C# time: %d ms"
benchmark FSharpStyle "F# time: %d ms"
0
Whatever the size of the file, the "F#-style" version completes in around 75% of the time of the "C#-style" version. My question is, why is that? I see no obvious inefficiency in the imperative version.
Seq.map is different from Array.map. Because sequences (IEnumerable<T>) are not evaluated until they are enumerated, in the F#-style code no computation actually happens until File.WriteAllLines loops through the sequence (not array) generated by Seq.map.
In other words, your C#-style version is reversing all the strings and storing the reversed strings in an array, and then looping through the array to write out to the file. The F#-style version is reversing all the strings and writing them more-or-less directly to the file. That means the C#-style code is looping through the entire file three times (read to array, build reversed array, write array to file), while the F#-style code is looping through the entire file only twice (read to array, write reversed lines to file).
You'd get the best performance of all if you used File.ReadLines instead of File.ReadAllLines combined with Seq.map - but your output file would have to be different from your input file, as you'd be writing to output while still reading from input.
The Seq.map form has several advantages over a regular loop. It can precompute the function reference just once; it can avoid the variable assignments; and it can use the input sequence length to presize the result array.

Resources