F# image manipulation performance problem

I am currently trying to improve the performance of an F# program to make it as fast as its C# equivalent. The program applies a filter array to a buffer of pixels. All memory access is done through pointers.
Here is the C# code that is applied to each pixel of an image:
unsafe private static byte getPixelValue(byte* buffer, double* filter, int filterLength, double filterSum)
{
    double sum = 0.0;
    for (int i = 0; i < filterLength; ++i)
    {
        sum += (*buffer) * (*filter);
        ++buffer;
        ++filter;
    }
    sum = sum / filterSum;
    if (sum > 255) return 255;
    if (sum < 0) return 0;
    return (byte) sum;
}
The F# code looks like this and takes three times as long as the C# program:
let getPixelValue (buffer:nativeptr<byte>) (filterData:nativeptr<float>) filterLength filterSum : byte =
    let rec accumulatePixel (acc:float) (buffer:nativeptr<byte>) (filter:nativeptr<float>) i =
        if i > 0 then
            let newAcc = acc + (float (NativePtr.read buffer) * (NativePtr.read filter))
            accumulatePixel newAcc (NativePtr.add buffer 1) (NativePtr.add filter 1) (i-1)
        else
            acc
    let acc = (accumulatePixel 0.0 buffer filterData filterLength) / filterSum
    match acc with
    | _ when acc > 255.0 -> 255uy
    | _ when acc < 0.0 -> 0uy
    | _ -> byte acc
Using mutable variables and a for loop in F# results in the same speed as using recursion. All projects are configured to run in Release mode with code optimization turned on.
How could the performance of the F# version be improved?
EDIT:
The bottleneck seems to be in (NativePtr.get buffer offset). If I replace this code with a fixed value and also replace the corresponding code in the C# version with a fixed value, I get about the same speed for both programs. In fact, in C# the speed does not change at all, but in F# it makes a huge difference.
Can this behaviour possibly be changed or is it rooted deeply in the architecture of F#?
EDIT 2:
I refactored the code again to use for-loops. The execution speed remains the same:
let mutable acc = 0.0
let mutable f = filterData
let mutable b = tBuffer
for i in 1 .. filter.FilterLength do
    acc <- acc + (float (NativePtr.read b)) * (NativePtr.read f)
    f <- NativePtr.add f 1
    b <- NativePtr.add b 1
If I compare the IL code of a version that uses (NativePtr.read b) with another version that is identical except that it uses the fixed value 111uy instead of reading from the pointer, only the following lines in the IL code change:
111uy has IL-Code ldc.i4.s 0x6f (0.3 seconds)
(NativePtr.read b) has IL-Code lines ldloc.s b and ldobj uint8 (1.4 seconds)
For comparison: C# does the filtering in 0.4 seconds.
The fact that reading the filter does not impact performance while reading from the image buffer does is somewhat confusing. Before I filter a line of the image, I copy the line into a buffer that has the length of a line. That's why the read operations are not spread all over the image but stay within this buffer, which has a size of about 800 bytes.

If we look at the IL that the C# compiler generates for the inner loop, which traverses both buffers in parallel (relevant part):
L_0017: ldarg.0
L_0018: ldc.i4.1
L_0019: conv.i
L_001a: add
L_001b: starg.s buffer
L_001d: ldarg.1
L_001e: ldc.i4.8
L_001f: conv.i
L_0020: add
and the F# compiler's output:
L_0017: ldc.i4.1
L_0018: conv.i
L_0019: sizeof uint8
L_001f: mul
L_0020: add
L_0021: ldarg.2
L_0022: ldc.i4.1
L_0023: conv.i
L_0024: sizeof float64
L_002a: mul
L_002b: add
we'll notice that the C# code uses only the add instruction, while F# needs both mul and add. But on each step we only need to increment the pointers (by sizeof byte and sizeof float respectively), not compute a general address (addrBase + n * sizeof byte), so the F# mul is unnecessary (it always multiplies by 1).
The reason is that C# defines the ++ operator for pointers, while F# provides only the add : nativeptr<'T> -> int -> nativeptr<'T> function:
[<NoDynamicInvocation>]
let inline add (x : nativeptr<'a>) (n:int) : nativeptr<'a> = to_nativeint x + nativeint n * (# "sizeof !0" type('a) : nativeint #) |> of_nativeint
So it's not "rooted deeply" in F#, it's just that module NativePtr lacks inc and dec functions.
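To make the difference concrete, here is a rough sketch in Python, used only as neutral pseudocode with plain integers standing in for addresses. `add` mirrors the general computation shown above; `inc` is a hypothetical helper of the kind the module lacks, not an existing NativePtr function.

```python
def add(address: int, n: int, elem_size: int) -> int:
    """General form: advance by n elements, which needs a multiply and an add."""
    return address + n * elem_size


def inc(address: int, elem_size: int) -> int:
    """Hypothetical single-step form: advance by exactly one element, add only."""
    return address + elem_size


# Stepping a float64 pointer (8 bytes per element) by one element
# lands on the same address either way; only the work done differs.
start = 1000
assert add(start, 1, 8) == inc(start, 8)
```

For a step of one element the two agree, so the multiply in the general form is pure overhead in this loop.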
Btw, I suspect the above sample could be written in a more concise manner if the arguments were passed as arrays instead of raw pointers.
UPDATE:
The following code gives only about a 1% speed-up, even though it seems to generate IL very similar to the C# version:
let getPixelValue (buffer:nativeptr<byte>) (filterData:nativeptr<float>) filterLength filterSum : byte =
    let rec accumulatePixel (acc:float) (buffer:nativeptr<byte>) (filter:nativeptr<float>) i =
        if i > 0 then
            let newAcc = acc + (float (NativePtr.read buffer) * (NativePtr.read filter))
            accumulatePixel newAcc (NativePtr.ofNativeInt <| (NativePtr.toNativeInt buffer) + (nativeint 1)) (NativePtr.ofNativeInt <| (NativePtr.toNativeInt filter) + (nativeint 8)) (i-1)
        else
            acc
    let acc = (accumulatePixel 0.0 buffer filterData filterLength) / filterSum
    match acc with
    | _ when acc > 255.0 -> 255uy
    | _ when acc < 0.0 -> 0uy
    | _ -> byte acc
Another thought: it might also depend on the number of calls to getPixelValue your test does (F# splits this function into two methods while C# does it in one).
Could you post your test code here?
Regarding arrays: I'd expect the code to be at least more concise (and not unsafe).
UPDATE #2:
Looks like the actual bottleneck here is byte->float conversion.
C#:
L_0003: ldarg.1
L_0004: ldind.u1
L_0005: conv.r8
F#:
L_000c: ldarg.1
L_000d: ldobj uint8
L_0012: conv.r.un
L_0013: conv.r8
For some reason F# takes the path byte->float32->float64, while C# does only byte->float64. I'm not sure why that is, but with the following hack my F# version runs at the same speed as C# on gradbot's test sample (BTW, thanks gradbot for the test!):
let inline preadConvert (p : nativeptr<byte>) = (# "conv.r8" (# "ldobj !0" type (byte) p : byte #) : float #)
let inline pinc (x : nativeptr<'a>) : nativeptr<'a> = NativePtr.toNativeInt x + (# "sizeof !0" type('a) : nativeint #) |> NativePtr.ofNativeInt

let rec accumulatePixel_ed (acc, buffer, filter, i) =
    if i > 0 then
        accumulatePixel_ed
            (acc + (preadConvert buffer) * (NativePtr.read filter),
             (pinc buffer),
             (pinc filter),
             (i-1))
    else
        acc
Results:
    adrian 6374985677.162810 1408.870900 ms
   gradbot 6374985677.162810 1218.908200 ms
        C# 6374985677.162810 227.832800 ms
 C# Offset 6374985677.162810 224.921000 ms
   mutable 6374985677.162810 1254.337300 ms
     ed'ka 6374985677.162810 227.543100 ms
LAST UPDATE
It turned out that we can achieve the same speed even without any hacks:
let rec accumulatePixel_ed_last (acc, buffer, filter, i) =
    if i > 0 then
        accumulatePixel_ed_last
            (acc + (float << int16 <| NativePtr.read buffer) * (NativePtr.read filter),
             (NativePtr.add buffer 1),
             (NativePtr.add filter 1),
             (i-1))
    else
        acc
All we need to do is convert the byte to, say, int16 and then to float. This avoids the 'costly' conv.r.un instruction.
PS: relevant conversion code from "prim-types.fs":
let inline float (x: ^a) =
    (^a : (static member ToDouble : ^a -> float) (x))
     when ^a : float = (# "" x : float #)
     when ^a : float32 = (# "conv.r8" x : float #)
     // [skipped]
     when ^a : int16 = (# "conv.r8" x : float #)
     // [skipped]
     when ^a : byte = (# "conv.r.un conv.r8" x : float #)
     when ^a : decimal = (System.Convert.ToDouble((# "" x : decimal #)))
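As a sanity check on the int16 workaround, the following Python sketch compares the two conversion routes numerically. It is an illustration only and does not execute any IL. It assumes, as described above, that the slow route passes through a 32-bit float, and the function names are made up for this example.

```python
import struct


def via_float32(byte_value: int) -> float:
    """Store the value as a 32-bit float, read it back, then widen to 64 bits."""
    (narrow,) = struct.unpack("<f", struct.pack("<f", float(byte_value)))
    return float(narrow)


def via_int16(byte_value: int) -> float:
    """Widen the byte to a signed 16-bit integer first, then convert to 64 bits."""
    (widened,) = struct.unpack("<h", struct.pack("<h", byte_value))
    return float(widened)


def conversions_agree() -> bool:
    """True if both routes give the same double for every possible byte value."""
    return all(via_float32(b) == via_int16(b) == float(b) for b in range(256))
```

Every value from 0 to 255 is exactly representable in both a 32-bit float and an int16, so under that assumption the detour changes only which instructions are emitted, not the numeric result.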

How does this compare? It has fewer calls to NativePtr.
let getPixelValue (buffer:nativeptr<byte>) (filterData:nativeptr<float>) filterLength filterSum : byte =
    let accumulatePixel (acc:float) (buffer:nativeptr<byte>) (filter:nativeptr<float>) length =
        let rec accumulate acc offset =
            if offset < length then
                let newAcc = acc + (float (NativePtr.get buffer offset) * (NativePtr.get filter offset))
                accumulate newAcc (offset + 1)
            else
                acc
        accumulate acc 0
    let acc = (accumulatePixel 0.0 buffer filterData filterLength) / filterSum
    match acc with
    | _ when acc > 255.0 -> 255uy
    | _ when acc < 0.0 -> 0uy
    | _ -> byte acc
F# source code of NativePtr.
[<NoDynamicInvocation>]
[<CompiledName("AddPointerInlined")>]
let inline add (x : nativeptr<'T>) (n:int) : nativeptr<'T> = toNativeInt x + nativeint n * (# "sizeof !0" type('T) : nativeint #) |> ofNativeInt
[<NoDynamicInvocation>]
[<CompiledName("GetPointerInlined")>]
let inline get (p : nativeptr<'T>) n = (# "ldobj !0" type ('T) (add p n) : 'T #)

My results on a larger test.
    adrian 6374730426.098020 1561.102500 ms
   gradbot 6374730426.098020 1842.768000 ms
        C# 6374730426.098020 150.793500 ms
 C# Offset 6374730426.098020 150.318900 ms
   mutable 6374730426.098020 1446.616700 ms
F# test code
open Microsoft.FSharp.NativeInterop
open System.Runtime.InteropServices
open System.Diagnostics
open AccumulatePixel
#nowarn "9"

let test size fn =
    let bufferByte = Marshal.AllocHGlobal(size * 4)
    let bufferFloat = Marshal.AllocHGlobal(size * 8)
    let bi = NativePtr.ofNativeInt bufferByte
    let bf = NativePtr.ofNativeInt bufferFloat
    let random = System.Random()
    // fill both buffers with random data (valid indices are 0 .. size-1)
    for i in 0 .. size - 1 do
        NativePtr.set bi i (byte <| random.Next() % 256)
        NativePtr.set bf i (random.NextDouble())
    // run one candidate and print its name, result and elapsed time
    let duration (f, name) =
        let stopWatch = Stopwatch.StartNew()
        let time = f(0.0, bi, bf, size)
        stopWatch.Stop()
        printfn "%10s %f %f ms" name time stopWatch.Elapsed.TotalMilliseconds
    List.iter duration fn
    Marshal.FreeHGlobal bufferFloat
    Marshal.FreeHGlobal bufferByte

let rec accumulatePixel_adrian (acc, buffer, filter, i) =
    if i > 0 then
        let newAcc = acc + (float (NativePtr.read buffer) * (NativePtr.read filter))
        accumulatePixel_adrian (newAcc, (NativePtr.add buffer 1), (NativePtr.add filter 1), (i - 1))
    else
        acc

let accumulatePixel_gradbot (acc, buffer, filter, length) =
    let rec accumulate acc offset =
        if offset < length then
            let newAcc = acc + (float (NativePtr.get buffer offset) * (NativePtr.get filter offset))
            accumulate newAcc (offset + 1)
        else
            acc
    accumulate acc 0

let accumulatePixel_mutable (acc, buffer, filter, length) =
    let mutable acc = 0.0
    let mutable f = filter
    let mutable b = buffer
    for i in 1 .. length do
        acc <- acc + (float (NativePtr.read b)) * (NativePtr.read f)
        f <- NativePtr.add f 1
        b <- NativePtr.add b 1
    acc

[
    accumulatePixel_adrian, "adrian";
    accumulatePixel_gradbot, "gradbot";
    AccumulatePixel.getPixelValue, "C#";
    AccumulatePixel.getPixelValueOffset, "C# Offset";
    accumulatePixel_mutable, "mutable";
]
|> test 100000000

System.Console.ReadLine() |> ignore
C# test code
namespace AccumulatePixel
{
    public class AccumulatePixel
    {
        unsafe public static double getPixelValue(double sum, byte* buffer, double* filter, int filterLength)
        {
            for (int i = 0; i < filterLength; ++i)
            {
                sum += (*buffer) * (*filter);
                ++buffer;
                ++filter;
            }
            return sum;
        }

        unsafe public static double getPixelValueOffset(double sum, byte* buffer, double* filter, int filterLength)
        {
            for (int i = 0; i < filterLength; ++i)
            {
                sum += buffer[i] * filter[i];
            }
            return sum;
        }
    }
}

Related

Competitive programming using Haskell

I am currently trying to refresh my Haskell knowledge by solving some Hackerrank problems.
For example:
https://www.hackerrank.com/challenges/maximum-palindromes/problem
I've already implemented an imperative solution in C++ which got accepted for all test cases. Now I am trying to come up with a pure functional solution in (reasonably idiomatic) Haskell.
My current code is
module Main where

import Control.Monad
import qualified Data.ByteString.Char8 as C
import Data.Bits
import Data.List
import qualified Data.Map.Strict as Map
import qualified Data.IntMap.Strict as IntMap
import Debug.Trace

-- precompute factorials
compFactorials :: Int -> Int -> IntMap.IntMap Int
compFactorials n m = go 0 1 IntMap.empty
  where
    go a acc map
      | a < 0 = map
      | a < n = go a' acc' map'
      | otherwise = map'
      where
        map' = IntMap.insert a acc map
        a' = a + 1
        acc' = (acc * a') `mod` m

-- precompute invs
compInvs :: Int -> Int -> IntMap.IntMap Int -> IntMap.IntMap Int
compInvs n m facts = go 0 IntMap.empty
  where
    go a map
      | a < 0 = map
      | a < n = go a' map'
      | otherwise = map'
      where
        map' = IntMap.insert a v map
        a' = a + 1
        v = (modExp b (m-2) m) `mod` m
        b = (IntMap.!) facts a

modExp :: Int -> Int -> Int -> Int
modExp b e m = go b e 1
  where
    go b e r
      | (.&.) e 1 == 1 = go b' e' r'
      | e > 0 = go b' e' r
      | otherwise = r
      where
        r' = (r * b) `mod` m
        b' = (b * b) `mod` m
        e' = shift e (-1)

-- precompute frequency table
initFreqMap :: C.ByteString -> Map.Map Char (IntMap.IntMap Int)
initFreqMap inp = go 1 map1 map2 inp
  where
    map1 = Map.fromList $ zip ['a'..'z'] $ repeat 0
    map2 = Map.fromList $ zip ['a'..'z'] $ repeat IntMap.empty
    go idx m1 m2 inp
      | C.null inp = m2
      | otherwise = go (idx+1) m1' m2' $ C.tail inp
      where
        m1' = Map.update (\v -> Just $ v+1) (C.head inp) m1
        m2' = foldl' (\m w -> Map.update (\v -> liftM (\c -> IntMap.insert idx c v) $ Map.lookup w m1') w m)
                m2 ['a'..'z']

query :: Int -> Int -> Int -> Map.Map Char (IntMap.IntMap Int)
      -> IntMap.IntMap Int -> IntMap.IntMap Int -> Int
query l r m freqMap facts invs
  | x > 1 = (x * y) `mod` m
  | otherwise = y
  where
    calcCnt cs = cr - cl
      where
        cl = IntMap.findWithDefault 0 (l-1) cs
        cr = IntMap.findWithDefault 0 r cs
    f1 acc cs
      | even cnt = acc
      | otherwise = acc + 1
      where
        cnt = calcCnt cs
    f2 (acc1,acc2) cs
      | cnt < 2 = (acc1 ,acc2)
      | otherwise = (acc1',acc2')
      where
        cnt = calcCnt cs
        n = cnt `div` 2
        acc1' = acc1 + n
        r = choose acc1' n
        acc2' = (acc2 * r) `mod` m
    -- calc binomial coefficient using Fermat's little theorem
    choose n k
      | n < k = 0
      | otherwise = (f1 * t) `mod` m
      where
        f1 = (IntMap.!) facts n
        i1 = (IntMap.!) invs k
        i2 = (IntMap.!) invs (n-k)
        t = (i1 * i2) `mod` m
    x = Map.foldl' f1 0 freqMap
    y = snd $ Map.foldl' f2 (0,1) freqMap

main :: IO()
main = do
  inp <- C.getLine
  q <- readLn :: IO Int
  let modulo = 1000000007
  let facts = compFactorials (C.length inp) modulo
  let invs = compInvs (C.length inp) modulo facts
  let freqMap = initFreqMap inp
  forM_ [1..q] $ \_ -> do
    line <- getLine
    let [s1, s2] = words line
    let l = (read s1) :: Int
    let r = (read s2) :: Int
    let result = query l r modulo freqMap facts invs
    putStrLn $ show result
It passes all small and medium test cases, but it times out on the large test cases.
The key to solving this problem is to precompute some things once at the beginning and use them to answer the individual queries efficiently.
Now, the main problems where I need help are:
Initial profiling shows that the IntMap lookup operation seems to be the main bottleneck. Is there a better alternative to IntMap for memoization? Or should I look at Vector or Array, which I believe will lead to "uglier" code?
Even in its current state, the code doesn't look nice (by functional standards) and is as verbose as my C++ solution. Any tips to make it more idiomatic? Other than the IntMap usage for memoization, do you spot any other obvious issues that could hurt performance?
And are there any good sources where I can learn how to use Haskell more effectively for competitive programming?
A sample large test case on which the current code times out:
input.txt
output.txt
For comparison my C++ solution:
#include <vector>
#include <string>
#include <iostream>

#define MOD 1000000007L

// modular exponentiation by repeated squaring
long mod_exp(long b, long e) {
    long r = 1;
    while (e > 0) {
        if ((e & 1) == 1) {
            r = (r * b) % MOD;
        }
        b = (b * b) % MOD;
        e >>= 1;
    }
    return r;
}

// binomial coefficient from precomputed factorials and inverse factorials
long n_choose_k(int n, int k, const std::vector<long> &fact_map, const std::vector<long> &inv_map) {
    if (n < k) {
        return 0;
    }
    long l1 = fact_map[n];
    long l2 = (inv_map[k] * inv_map[n-k]) % MOD;
    return (l1 * l2) % MOD;
}

int main() {
    std::string s;
    int q;
    std::cin >> s >> q;
    std::vector<std::vector<long>> freq_map;
    std::vector<long> fact_map(s.size()+1);
    std::vector<long> inv_map(s.size()+1);
    for (int i = 0; i < 26; i++) {
        freq_map.emplace_back(std::vector<long>(s.size(), 0));
    }
    // freq_map[j][i] = occurrences of letter j in s[0..i]
    std::vector<long> acc_map(26, 0);
    for (int i = 0; i < s.size(); i++) {
        acc_map[s[i]-'a']++;
        for (int j = 0; j < 26; j++) {
            freq_map[j][i] = acc_map[j];
        }
    }
    fact_map[0] = 1;
    inv_map[0] = 1;
    for (int i = 1; i <= s.size(); i++) {
        fact_map[i] = (i * fact_map[i-1]) % MOD;
        inv_map[i] = mod_exp(fact_map[i], MOD-2) % MOD;
    }
    while (q--) {
        int l, r;
        std::cin >> l >> r;
        std::vector<long> x(26, 0);
        long t = 0;
        long acc = 0;
        long result = 1;
        for (int i = 0; i < 26; i++) {
            auto cnt = freq_map[i][r-1] - (l > 1 ? freq_map[i][l-2] : 0);
            if (cnt % 2 != 0) {
                t++;
            }
            long n = cnt / 2;
            if (n > 0) {
                acc += n;
                result *= n_choose_k(acc, n, fact_map, inv_map);
                result = result % MOD;
            }
        }
        if (t > 0) {
            result *= t;
            result = result % MOD;
        }
        std::cout << result << std::endl;
    }
}
UPDATE:
DanielWagner's answer has confirmed my suspicion that the main problem in my code was the usage of IntMap for memoization. Replacing IntMap with Array made my code perform similar to DanielWagner's solution.
module Main where

import Control.Monad
import Data.Array (Array)
import qualified Data.Array as A
import qualified Data.ByteString.Char8 as C
import Data.Bits
import Data.List
import qualified Data.Map.Strict as Map
import Debug.Trace

-- precompute factorials
compFactorials :: Int -> Int -> Array Int Int
compFactorials n m = A.listArray (0,n) $ scanl' f 1 [1..n]
  where
    f acc a = (acc * a) `mod` m

-- precompute invs
compInvs :: Int -> Int -> Array Int Int -> Array Int Int
compInvs n m facts = A.listArray (0,n) $ map f [0..n]
  where
    f a = (modExp ((A.!) facts a) (m-2) m) `mod` m

modExp :: Int -> Int -> Int -> Int
modExp b e m = go b e 1
  where
    go b e r
      | (.&.) e 1 == 1 = go b' e' r'
      | e > 0 = go b' e' r
      | otherwise = r
      where
        r' = (r * b) `mod` m
        b' = (b * b) `mod` m
        e' = shift e (-1)

-- precompute frequency table
initFreqMap :: C.ByteString -> Map.Map Char (Array Int Int)
initFreqMap inp = Map.fromList $ map f ['a'..'z']
  where
    n = C.length inp
    f c = (c, A.listArray (0,n) $ scanl' g 0 [0..n-1])
      where
        g x j
          | C.index inp j == c = x+1
          | otherwise = x

query :: Int -> Int -> Int -> Map.Map Char (Array Int Int)
      -> Array Int Int -> Array Int Int -> Int
query l r m freqMap facts invs
  | x > 1 = (x * y) `mod` m
  | otherwise = y
  where
    calcCnt freqMap = cr - cl
      where
        cl = (A.!) freqMap (l-1)
        cr = (A.!) freqMap r
    f1 acc cs
      | even cnt = acc
      | otherwise = acc + 1
      where
        cnt = calcCnt cs
    f2 (acc1,acc2) cs
      | cnt < 2 = (acc1 ,acc2)
      | otherwise = (acc1',acc2')
      where
        cnt = calcCnt cs
        n = cnt `div` 2
        acc1' = acc1 + n
        r = choose acc1' n
        acc2' = (acc2 * r) `mod` m
    -- calc binomial coefficient using Fermat's little theorem
    choose n k
      | n < k = 0
      | otherwise = (f1 * t) `mod` m
      where
        f1 = (A.!) facts n
        i1 = (A.!) invs k
        i2 = (A.!) invs (n-k)
        t = (i1 * i2) `mod` m
    x = Map.foldl' f1 0 freqMap
    y = snd $ Map.foldl' f2 (0,1) freqMap

main :: IO()
main = do
  inp <- C.getLine
  q <- readLn :: IO Int
  let modulo = 1000000007
  let facts = compFactorials (C.length inp) modulo
  let invs = compInvs (C.length inp) modulo facts
  let freqMap = initFreqMap inp
  replicateM_ q $ do
    line <- getLine
    let [s1, s2] = words line
    let l = (read s1) :: Int
    let r = (read s2) :: Int
    let result = query l r modulo freqMap facts invs
    putStrLn $ show result
I think you've shot yourself in the foot by trying to be too clever. Below I'll show a straightforward implementation of a slightly different algorithm that is about 5x faster than your Haskell code.
Here's the core combinatoric computation. Given a character frequency count for a substring, we can compute the number of maximum-length palindromes this way:
Divide all the frequencies by two, rounding down; call this the div2-frequencies. We'll also want the mod2-frequencies, which is the set of letters for which we had to round down.
Sum the div2-frequencies to get the total length of the palindrome prefix; its factorial gives an overcount of the number of possible prefixes for the palindrome.
Take the product of the factorials of the div2-frequencies. This tells the factor by which we overcounted above.
Take the size of the mod2-frequencies, or choose 1 if there are none. We can extend any of the palindrome prefixes by one of the values in this set, if there are any, so we have to multiply by this size.
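The steps above can be sketched roughly as follows. This is Python rather than Haskell, purely to illustrate the arithmetic. The function names are hypothetical, factorials are recomputed on every call instead of being cached, and the single modular inverse at the end uses Fermat's little theorem, which applies because 1000000007 is prime.

```python
from collections import Counter

MOD = 1_000_000_007


def factorial_mod(n: int) -> int:
    """n! reduced modulo MOD (no caching; fine for a sketch)."""
    result = 1
    for k in range(2, n + 1):
        result = result * k % MOD
    return result


def palindrome_count(substring: str) -> int:
    """Number of maximum-length palindromes buildable from the letters, mod MOD."""
    freqs = list(Counter(substring).values())
    div2 = [f // 2 for f in freqs]            # letters available for the prefix
    odd_letters = sum(f % 2 for f in freqs)   # candidates for the middle letter
    overcount = factorial_mod(sum(div2))      # all orderings of the prefix
    correction = 1
    for d in div2:                            # orderings of identical letters
        correction = correction * factorial_mod(d) % MOD
    inverse = pow(correction, MOD - 2, MOD)   # one inversion per query
    return max(1, odd_letters) * overcount % MOD * inverse % MOD
```

For example, "week" has one pair of e's and two leftover letters, so `palindrome_count("week")` gives 2 ("ewe" and "eke").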
For the overcounting step, it's not super obvious to me whether it would be faster to store precomputed inverses for factorials, and take their product, or whether it's faster to just take the product of all the factorials and do one inverse operation at the very end. I'll do the latter, because it just intuitively seems faster to do one inversion per query than one lookup per repeated letter, but what do I know? Should be easy to test if you want to try to adapt the code yourself.
There's only one other quick insight I had vs. your code, which is that we can cache the frequency counts for prefixes of the input; then computing the frequency count for a substring is just pointwise subtraction of two cached counts. Your precomputation on the input I find to be a bit excessive in comparison.
Without further ado, let's see some code. As usual there's some preamble.
module Main where
import Control.Monad
import Data.Array (Array)
import qualified Data.Array as A
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as M
import Data.Monoid
Like you, I want to do all my computations on cheap Ints and bake in the modular operations where possible. I'll make a newtype to make sure this happens for me.
newtype Mod1000000007 = Mod Int deriving (Eq, Ord)

instance Num Mod1000000007 where
    fromInteger = Mod . (`mod` 1000000007) . fromInteger
    Mod l + Mod r = Mod ((l+r) `rem` 1000000007)
    Mod l * Mod r = Mod ((l*r) `rem` 1000000007)
    negate (Mod v) = Mod ((1000000007 - v) `rem` 1000000007)
    abs = id
    signum = id

instance Integral Mod1000000007 where
    toInteger (Mod n) = toInteger n
    quotRem a b = (a * b^1000000005, 0)
I baked in the base of 1000000007 in several places, but it's easy to generalize by giving Mod a phantom parameter and making a HasBase class to pick the base. Ask a fresh question if you're not sure how and are interested; I'll be happy to do a more thorough writeup. There's a few more instances for Mod that are basically uninteresting and primarily needed because of Haskell's wacko numeric class hierarchy:
instance Show Mod1000000007 where show (Mod n) = show n
instance Real Mod1000000007 where toRational (Mod n) = toRational n
instance Enum Mod1000000007 where
    toEnum = Mod . (`mod` 1000000007)
    fromEnum (Mod n) = n
Here's the precomputation we want to do for factorials...
type FactMap = Array Int Mod1000000007
factMap :: Int -> FactMap
factMap n = A.listArray (0,n) (scanl (*) 1 [1..])
...and for precomputing frequency maps for each prefix, plus getting a frequency map given a start and end point.
type FreqMap = Map Char Int

freqMaps :: String -> Array Int FreqMap
freqMaps s = go where
    go = A.listArray (0, length s)
        (M.empty : [M.insertWith (+) c 1 (go A.! i) | (i, c) <- zip [0..] s])

substringFreqMap :: Array Int FreqMap -> Int -> Int -> FreqMap
substringFreqMap maps l r = M.unionWith (-) (maps A.! r) (maps A.! (l-1))
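For readers less familiar with Haskell, the prefix-count idea can also be sketched in Python. This is only an illustration with hypothetical names, not a translation of the code above: it stores one counter per prefix and gets the counts for a 1-based, inclusive range by subtracting two of them.

```python
from collections import Counter


def prefix_counts(s: str) -> list:
    """prefixes[i] holds the letter counts of the first i characters of s."""
    prefixes = [Counter()]
    for ch in s:
        nxt = prefixes[-1].copy()
        nxt[ch] += 1
        prefixes.append(nxt)
    return prefixes


def substring_counts(prefixes: list, l: int, r: int) -> Counter:
    """Letter counts of s[l..r] (1-based, inclusive) via pointwise subtraction."""
    # Counter subtraction drops letters whose count falls to zero.
    return prefixes[r] - prefixes[l - 1]
```

With `prefixes = prefix_counts("week")`, the call `substring_counts(prefixes, 2, 3)` yields a count of 2 for "e" and nothing else.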
Implementing the core computation described above is just a few lines of code, now that we have suitable Num and Integral instances for Mod1000000007:
palindromeCount :: FactMap -> FreqMap -> Mod1000000007
palindromeCount facts freqs
    = toEnum (max 1 mod2Freqs)
    * (facts A.! sum div2Freqs)
    `div` product (map (facts A.!) div2Freqs)
    where
    (div2Freqs, Sum mod2Freqs) = foldMap (\n -> ([n `quot` 2], Sum (n `rem` 2))) freqs
Now we just need a short driver to read stuff and pass it around to the appropriate functions.
main :: IO ()
main = do
    inp <- getLine
    q <- readLn
    let freqs = freqMaps inp
        facts = factMap (length inp)
    replicateM_ q $ do
        [l,r] <- map read . words <$> getLine
        print . palindromeCount facts $ substringFreqMap freqs l r
That's it. Notably I made no attempt to be fancy about bitwise operations and didn't do anything fancy with accumulators; everything is in what I would consider idiomatic purely-functional style. The final count is about half as much code that runs about 5x faster.
P.S. Just for fun, I replaced the last line with print (l+r :: Int)... and discovered that about half the time is spent in read. Ouch! Seems there's still plenty of low-hanging fruit if this isn't fast enough yet.

Implementing a FIR filter using Vectors

I have implemented a FIR filter in Haskell. I don't know that much about FIR filters and my code is heavily based on an existing C# implementation. Therefore, I have a feeling that my implementation has too much of a C# style and is not really Haskell-like. I would like to know if there is a more idiomatic Haskell way of implementing my code. Ideally, I'm looking for some combination of higher-order functions (map, filter, fold, etc.) that implements the algorithm.
My Haskell code looks like this:
applyFIR :: Vector Double -> Vector Double -> Vector Double
applyFIR b x = generate (U.length x) help
  where
    help i = if i >= (U.length b - 1) then loop i (U.length b - 1) else 0
    loop yi bi = if bi < 0 then 0 else b !! bi * x !! (yi-bi) + loop yi (bi-1)
    vec !! i = unsafeIndex vec i -- Shorthand for unsafeIndex
This code is based on the following C# code:
public float[] RunFilter(double[] x)
{
    int M = coeff.Length;
    int n = x.Length;
    //y[n]=b0x[n]+b1x[n-1]+....bmx[n-M]
    var y = new float[n];
    for (int yi = 0; yi < n; yi++)
    {
        double t = 0.0f;
        for (int bi = M - 1; bi >= 0; bi--)
        {
            if (yi - bi < 0) continue;
            t += coeff[bi] * x[yi - bi];
        }
        y[yi] = (float) t;
    }
    return y;
}
As you can see, it's almost a straight copy. How can I turn my implementation into a more Haskell-like one? Do you have any ideas? The only thing I could come up with was using Vector.generate.
I know that the DSP library has an implementation available. But it uses lists and is way too slow for my use case. This Vector implementation is a lot faster than the one in DSP.
I've also tried implementing the algorithm using Repa. It is faster than the Vector implementation. Here is the result:
applyFIR :: V.Vector Float -> Array U DIM1 Float -> Array D DIM1 Float
applyFIR b x = R.traverse x id (\_ (Z :. i) -> if i >= len then loop i (len - 1) else 0)
  where
    len = V.length b
    loop :: Int -> Int -> Float
    loop yi bi = if bi < 0 then 0 else (V.unsafeIndex b bi) * x !! (Z :. (yi-bi)) + loop yi (bi-1)
    arr !! i = unsafeIndex arr i
First of all, I don't think that your initial vector code is a faithful translation - that is, I think it disagrees with the C# code. For example, suppose that both "x" and "b" ("b" is coeff in C#) have length 3, and have all values of 1.0. Then for y[0] the C# code would produce x[0] * coeff[0], or 1.0. (it would hit continue for all other values of bi)
With your Haskell code, however, help 0 produces 0. Your Repa version seems to suffer from the same problem.
So let's start with a more faithful translation:
applyFIR :: Vector Double -> Vector Double -> Vector Double
applyFIR b x = generate (U.length x) help
  where
    help i = loop i (min i $ U.length b - 1)
    loop yi bi = if bi < 0 then 0 else b !! bi * x !! (yi-bi) + loop yi (bi-1)
    vec !! i = unsafeIndex vec i -- Shorthand for unsafeIndex
Now, you're basically doing a calculation like this for computing, say, y[3]:
... b[3] | b[2] | b[1] | b[0]
    x[0] | x[1] | x[2] | x[3] | x[4] | x[5] | ....

multiply

b[3]*x[0] | b[2]*x[1] | b[1]*x[2] | b[0]*x[3]

sum

y[3] = b[3]*x[0] + b[2]*x[1] + b[1]*x[2] + b[0]*x[3]
So one way to think of what you're doing is "take the b vector, reverse it, and to compute spot i of the result, line b[0] up with x[i], multiply all the corresponding x and b entries, and compute the sum".
So let's do that:
applyFIR :: Vector Double -> Vector Double -> Vector Double
applyFIR b x = generate (U.length x) help
  where
    revB = U.reverse b
    bLen = U.length b
    help i = let sliceLen = min (i+1) bLen
                 bSlice = U.slice (bLen - sliceLen) sliceLen revB
                 xSlice = U.slice (i + 1 - sliceLen) sliceLen x
             in U.sum $ U.zipWith (*) bSlice xSlice
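The same "reverse b, slide it along x, multiply and sum" idea can be checked quickly outside Haskell. The following is a minimal Python sketch with plain lists standing in for unboxed vectors; `apply_fir` is a name made up for this illustration.

```python
def apply_fir(b: list, x: list) -> list:
    """y[i] = sum of b[k] * x[i - k] over every k for which i - k >= 0."""
    rev_b = b[::-1]
    b_len = len(b)
    y = []
    for i in range(len(x)):
        # Near the start only the first i + 1 coefficients overlap the input.
        slice_len = min(i + 1, b_len)
        b_slice = rev_b[b_len - slice_len:]
        x_slice = x[i + 1 - slice_len:i + 1]
        y.append(sum(bv * xv for bv, xv in zip(b_slice, x_slice)))
    return y
```

With b and x both equal to [1.0, 1.0, 1.0], this gives 1.0 for y[0], matching the C# behaviour described above rather than the 0 produced by the original translation.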

How to wrap last/first element when building interpolation?

I have this code that iterates over some samples and builds a simple linear interpolation between the points:
foreach sample:
    base = floor(index_pointer)
    frac = index_pointer - base
    out = in[base] * (1 - frac) + in[base + 1] * frac
    index_pointer += speed
    // restart
    if(index_pointer >= sample_length)
    {
        index_pointer = 0
    }
With "speed" equal to 1, this works. But if speed is different from 1 (i.e. index_pointer gets a fractional part), I need to wrap the last/first element while keeping the interpolation consistent.
How would you do this? Double indexes?
Here's an example of the values I have. Let's say in is an array of 4 values: [8, 12, 16, 20].
It will be:
1.0*in[0] + 0.0*in[1]=8
0.28*in[0] + 0.72*in[1]=10.88
0.56*in[1] + 0.44*in[2]=13.76
0.84*in[2] + 0.16*in[3]=16.64
0.12*in[2] + 0.88*in[3]=19.52
0.4*in[3] + 0.6*in[4]=8 // wrong; here I need to wrap
The last point is wrong: in[4] is treated as 0 because I don't have an in[4]. The result should instead combine the 0.4 weight on the last sample with the weight of the first sample (I think?).
Just wrap around the indices:
out = in[base] * (1 - frac) + in[(base + 1) % N] * frac
where % is the modulo operator and N is the number of input samples.
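As a minimal sketch of that formula in Python (the function name is made up for this example, and the base index is wrapped as well so that any non-negative position is accepted):

```python
import math


def interpolate_wrapped(samples: list, index_pointer: float) -> float:
    """Linear interpolation where the neighbour of the last sample is the first."""
    n = len(samples)
    base = math.floor(index_pointer)
    frac = index_pointer - base
    return samples[base % n] * (1 - frac) + samples[(base + 1) % n] * frac
```

For the sample data [8, 12, 16, 20] and a position of 3.6, this blends 0.4 of in[3] with 0.6 of in[0], giving about 12.8 instead of the 8 from the unwrapped version.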
This procedure generates the following line for your sample data (the dashed lines are the interpolated sample points, the circles are the input values):
I think I understand the problem now (answer only applies if I really did...):
You sample values at a nominal speed sn. But actually your sampler samples at a real speed s, where s != sn. Now, you want to create a function which re-samples the series, sampled at speed s, so it yields a series as if it were sampled with speed sn by means of linear interpolation between 2 adjacent samples. Or, your sampler jitters (has variances in time when it actually samples, which is sn + Noise(sn)).
Here is my approach - a function named "re-sample". It takes the sample data and a list of desired re-sample-points.
For any re-sample point which would index outside the raw data, it returns the respective border value.
let resample (data : float array) times =
    let N = Array.length data
    let maxIndex = N-1
    let weight (t : float) =
        t - (floor t)
    let interpolate x1 x2 w = x1 * (1.0 - w) + x2 * w
    let interp t1 t2 w =
        //printfn "t1 = %d t2 = %d w = %f" t1 t2 w
        interpolate (data.[t1]) (data.[t2]) w
    let inter t =
        let t1 = int (floor t)
        match t1 with
        | x when x >= 0 && x < maxIndex ->
            let t2 = t1 + 1
            interp t1 t2 (weight t)
        | x when x >= maxIndex -> data.[maxIndex]
        | _ -> data.[0]
    times
    |> List.map (fun t -> t, inter t)
    |> Array.ofList

let raw_data = [8; 12; 16; 20] |> List.map float |> Array.ofList
let resampled = resample raw_data [0.0..0.2..4.0]
And yields:
val resample : data:float array -> times:float list -> (float * float) []
val raw_data : float [] = [|8.0; 12.0; 16.0; 20.0|]
val resampled : (float * float) [] =
[|(0.0, 8.0); (0.2, 8.8); (0.4, 9.6); (0.6, 10.4); (0.8, 11.2); (1.0, 12.0);
(1.2, 12.8); (1.4, 13.6); (1.6, 14.4); (1.8, 15.2); (2.0, 16.0);
(2.2, 16.8); (2.4, 17.6); (2.6, 18.4); (2.8, 19.2); (3.0, 20.0);
(3.2, 20.0); (3.4, 20.0); (3.6, 20.0); (3.8, 20.0); (4.0, 20.0)|]
Now, I still fail to understand the "wrap around" part of your question. In the end, interpolation (in contrast to extrapolation) is only defined for values in [0..N-1]. So it is up to you to decide whether the function should produce a run-time error or simply use the edge values (or 0) for time values outside the bounds of your raw data array.
EDIT
As it turned out, the question is also about how to use a cyclic (ring) buffer for this.
Here is a version of the resample function that uses a cyclic buffer, along with some operations:
update adds a new sample value to the ring buffer.
read reads the content of a ring buffer element as if it were a normal array, indexed from [0..N-1].
initXXX functions create the ring buffer in various forms.
length returns the length or capacity of the ring buffer.
The ring buffer logic is factored into a module to keep it all clean.
module Cyclic =
    let wrap n x = x % n // % is modulo operator, just like in C/C++
    type Series = { A : float array; WritePosition : int }
    let init (n : int) =
        { A = Array.init n (fun i -> 0.);
          WritePosition = 0
        }
    let initFromArray a =
        let n = Array.length a
        { A = Array.copy a;
          WritePosition = 0
        }
    let initUseArray a =
        let n = Array.length a
        { A = a;
          WritePosition = 0
        }
    let update (sample : float ) (series : Series) =
        let wrapper = wrap (Array.length series.A)
        series.A.[series.WritePosition] <- sample
        { series with
            WritePosition = wrapper (series.WritePosition + 1) }
    let read i series =
        let n = Array.length series.A
        let wrapper = wrap (Array.length series.A)
        series.A.[wrapper (series.WritePosition + i)]
    let length (series : Series) = Array.length (series.A)
let resampleSeries (data : Cyclic.Series) times =
    let N = Cyclic.length data
    let maxIndex = N-1
    let weight (t : float) =
        t - (floor t)
    let interpolate x1 x2 w = x1 * (1.0 - w) + x2 * w
    let interp t1 t2 w =
        interpolate (Cyclic.read t1 data) (Cyclic.read t2 data) w
    let inter t =
        let t1 = int (floor t)
        match t1 with
        | x when x >= 0 && x < maxIndex ->
            let t2 = t1 + 1
            interp t1 t2 (weight t)
        | x when x >= maxIndex -> Cyclic.read maxIndex data
        | _ -> Cyclic.read 0 data
    times
    |> List.map (fun t -> t, inter t)
    |> Array.ofList
let input = raw_data
let rawSeries0 = Cyclic.initFromArray input
(resampleSeries rawSeries0 [0.0..0.2..4.0]) = resampled
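To make the wrap-around behaviour concrete, here is a minimal self-contained sketch of the same ring-buffer idea. The names Ring, push and at are illustrative only and are not part of the Cyclic module above; the sketch assumes a fixed capacity of 4 and, unlike update above, copies the array on every write so the old value stays intact.

```fsharp
// Minimal ring buffer sketch (hypothetical names, not the Cyclic module above).
type Ring = { Data : float[]; WritePos : int }

// Overwrite the oldest slot and advance the write position, wrapping at the end.
let push (sample : float) (r : Ring) =
    let data = Array.copy r.Data
    data.[r.WritePos] <- sample
    { Data = data; WritePos = (r.WritePos + 1) % data.Length }

// Read element i, counted from the oldest sample (i = 0) to the newest.
let at (i : int) (r : Ring) =
    r.Data.[(r.WritePos + i) % r.Data.Length]

let ring0 = { Data = [| 8.0; 12.0; 16.0; 20.0 |]; WritePos = 0 }
let ring1 = ring0 |> push 24.0   // overwrites 8.0, the oldest sample

// Logical view, oldest first: [12.0; 16.0; 20.0; 24.0]
let view = [ for i in 0 .. 3 -> at i ring1 ]
```

The physical array now holds 24, 12, 16, 20, but reading through at yields the samples in arrival order, which is what the resampling code relies on.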

Optimising F# answer for Euler #4

I have recently begun learning F#, hoping to use it for mathematically heavy algorithms in C# applications and to broaden my knowledge.
I have so far avoided StackOverflow as I didn't want to see the answer to this until I came to one myself.
I want to be able to write very efficient F# code, focused on performance first and then maybe on other qualities, such as conciseness (number of lines etc.).
Project Euler Question 4:
A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99.
Find the largest palindrome made from the product of two 3-digit numbers.
My Answer:
let IsPalindrome (x:int) = if x.ToString().ToCharArray() = Array.rev(x.ToString().ToCharArray()) then x else 0
let euler4 = [for i in [100..999] do
for j in [i..999] do yield i*j]
|> Seq.filter(fun x -> x = IsPalindrome(x)) |> Seq.max |> printf "Largest product of two 3-digit numbers is %d"
I tried using option and returning Some(x) and None in IsPalindrome but kept getting compile errors, as I was passing in an int and returning an int option. I got a NullReferenceException when trying to return None.Value.
Instead I return 0 if the number isn't a palindrome; unfortunately, these 0s go into the sequence.
Maybe I could order the sequence and then take the top value instead of using Seq.max? Or filter out results > 1?
Would this be better? Any advice would be much appreciated, even if it's general F# advice.
Efficiency being a primary concern, using string allocation/manipulation to find a numeric palindrome seems misguided – here's my approach:
module NumericLiteralG =
    let inline FromZero () = LanguagePrimitives.GenericZero
    let inline FromOne () = LanguagePrimitives.GenericOne
module Euler =
    let inline isNumPalindrome number =
        let ten = 1G + 1G + 1G + 1G + 1G + 1G + 1G + 1G + 1G + 1G
        let hundred = ten * ten
        let rec findHighDiv div =
            let div' = div * ten
            if number / div' = 0G then div else findHighDiv div'
        let rec impl n div =
            div = 0G || n / div = n % ten && impl (n % div / ten) (div / hundred)
        findHighDiv 1G |> impl number
    let problem004 () =
        { 100 .. 999 }
        |> Seq.collect (fun n -> Seq.init (1000 - n) ((+) n >> (*) n))
        |> Seq.filter isNumPalindrome
        |> Seq.max
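The string-free idea above can also be sketched without the generic numeric literals, assuming plain non-negative int inputs. The names reverseDigits and isPalindromeInt are illustrative and do not come from the answer; this variant reverses the digits arithmetically instead of comparing the outer digits pairwise.

```fsharp
// Reverse the decimal digits of a non-negative int without allocating a string.
let reverseDigits (n : int) =
    let rec loop remaining acc =
        if remaining = 0 then acc
        else loop (remaining / 10) (acc * 10 + remaining % 10)
    loop n 0

// A number is a palindrome when it equals its own digit reversal.
let isPalindromeInt n = n >= 0 && reverseDigits n = n

// Same search space as the question: all products i * j with 100 <= i <= j <= 999.
let largest =
    seq {
        for i in 100 .. 999 do
            for j in i .. 999 -> i * j
    }
    |> Seq.filter isPalindromeInt
    |> Seq.max
```

Here largest evaluates to 906609, the same product reported for Euler 3 in the timing log further down.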
Here's one way to do it:
open System
/// handy extension for reversing a string
type System.String with
    member s.Reverse() = String(Array.rev (s.ToCharArray()))
let isPalindrome x = let s = string x in s = s.Reverse()
seq {
    for i in 100..999 do
        for j in i..999 -> i * j
}
|> Seq.filter isPalindrome
|> Seq.max
|> printfn "The answer is: %d"
let IsPalindrom (str:string)=
    let rec fn(a,b)=a>b||str.[a]=str.[b]&&fn(a+1,b-1)
    fn(0,str.Length-1)
let IsIntPalindrome = (string>>IsPalindrom)
let sq={100..999}
sq|>Seq.map (fun x->sq|>Seq.map (fun y->(x,y),x*y))
|>Seq.concat|>Seq.filter (snd>>IsIntPalindrome)|>Seq.maxBy (snd)
just my solution:
let isPalin x =
    x.ToString() = new string(Array.rev (x.ToString().ToCharArray()))
let isGood num seq1 = Seq.exists (fun elem -> (num % elem = 0 && (num / elem) < 999)) seq1
{998001 .. -1 .. 10000} |> Seq.filter(fun x -> isPalin x) |> Seq.filter(fun x -> isGood x {999 .. -1 .. 100}) |> Seq.nth 0
The simplest way is to go from 999 down to 100, because the answer is much more likely to be the product of two large numbers.
j can then start from i, because the other way around was already tested.
Other optimisations would generate the products in descending order, but that makes everything a little more difficult. In general this is expressed as list merging.
Haskell (my best try in functional programming)
merge f x [] = x
merge f [] y = y
merge f (x:xs) (y:ys)
  | f x y = x : merge f xs (y:ys)
  | otherwise = y : merge f (x:xs) ys
compare_tuples (a,b) (c,d) = a*b >= c*d
gen_mul n = (n,n) : merge compare_tuples
                      ( gen_mul (n-1) )
                      ( map (\x -> (n,x)) [n-1,n-2 .. 1] )
is_product_palindrome (a,b) = x == reverse x where x = show (a*b)
main = print $ take 10 $ map ( \(a,b)->(a,b,a*b) )
     $ filter is_product_palindrome $ gen_mul 9999
Output (less than 1 s), the first 10 palindromes:
[(9999,9901,99000099),
(9967,9867,98344389),
(9999,9811,98100189),
(9999,9721,97200279),
(9999,9631,96300369),
(9999,9541,95400459),
(9999,9451,94500549),
(9767,9647,94222249),
(9867,9547,94200249),
(9999,9361,93600639)]
One can see that this sequence is lazily generated from large to small.
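The simpler idea from the start of this answer (count down from the largest factors, let j start at i) can be sketched in F# as well. This is not a translation of the Haskell merge; it is a plain descending search with an early exit, and the names isPalin and largestPalindromeProduct are assumptions made for the example.

```fsharp
// String-free palindrome test by digit reversal (assumes n >= 0).
let isPalin (n : int) =
    let rec rev r acc = if r = 0 then acc else rev (r / 10) (acc * 10 + r % 10)
    rev n 0 = n

// Search from large factors downwards, keeping j >= i so each pair is tried once.
let largestPalindromeProduct lo hi =
    let mutable best = 0
    let mutable i = hi
    // Once i * hi cannot beat the best hit so far, no smaller i can either.
    while i >= lo && i * hi > best do
        let mutable j = hi
        // Products shrink as j decreases, so stop at the first one that is too small.
        while j >= i && i * j > best do
            if isPalin (i * j) then best <- i * j
            j <- j - 1
        i <- i - 1
    best
```

Calling largestPalindromeProduct 10 99 gives 9009 and largestPalindromeProduct 100 999 gives 906609, the values quoted in the problem statement and in the timing log below. The early exits are what avoid scanning all pairs.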
Optimized version:
let Euler dgt=
    let [mine;maxe]=[dgt-1;dgt]|>List.map (fun x->String.replicate x "9"|>int)
    let IsPalindrom (str:string)=
        let rec fn(a,b)=a>b||str.[a]=str.[b]&&fn(a+1,b-1)
        fn(0,str.Length-1)
    let IsIntPalindrome = (string>>IsPalindrom)
    let rec fn=function
        |x,y,max,a,_ when a=mine->x,y,max
        |x,y,max,a,b when b=mine->fn(x,y,max,a-1,maxe)
        |x,y,max,a,b->a*b|>function
            |m when b=maxe&&m<max->x,y,max
            |m when m>max&&IsIntPalindrome(m)->fn(a,b,m,a-1,maxe)
            |m when m>max->fn(x,y,max,a,b-1)
            |_->fn(x,y,max,a-1,maxe)
    fn(0,0,0,maxe,maxe)
Log (switch #time on):
> Euler 2;;
Real: 00:00:00.004, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
val it : int * int * int = (99, 91, 9009)
> Euler 3;;
Real: 00:00:00.004, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
val it : int * int * int = (993, 913, 906609)
> Euler 4;;
Real: 00:00:00.002, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
val it : int * int * int = (9999, 9901, 99000099)
> Euler 5;;
Real: 00:00:00.702, CPU: 00:00:00.686, GC gen0: 108, gen1: 1, gen2: 0
val it : int * int * int = (99793, 99041, 1293663921) //int32 overflow
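The int32 overflow noted on the last result can be checked directly. A small sketch, using only the two factors from the log line above; the binding names are illustrative:

```fsharp
let a = 99793
let b = 99041

// int (Int32) multiplication wraps silently by default in F#.
let wrapped = a * b                  // 1293663921, the value shown in the log
let exact = int64 a * int64 b        // 9883598513L, the true product

// The Checked operators raise instead of wrapping.
let overflowDetected =
    try
        Checked.( * ) a b |> ignore
        false
    with :? System.OverflowException -> true
```

The wrapped value 1293663921 happens to read the same in both directions, which is why the search reports it, while the true product 9883598513 is not a palindrome at all.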
Extended to BigInteger:
let Euler dgt=
    let [mine;maxe]=[dgt-1;dgt]|>List.map (fun x->new System.Numerics.BigInteger(String.replicate x "9"|>int))
    let IsPalindrom (str:string)=
        let rec fn(a,b)=a>b||str.[a]=str.[b]&&fn(a+1,b-1)
        fn(0,str.Length-1)
    let IsIntPalindrome = (string>>IsPalindrom)
    let rec fn=function
        |x,y,max,a,_ when a=mine->x,y,max
        |x,y,max,a,b when b=mine->fn(x,y,max,a-1I,maxe)
        |x,y,max,a,b->a*b|>function
            |m when b=maxe&&m<max->x,y,max
            |m when m>max&&IsIntPalindrome(m)->fn(a,b,m,a-1I,maxe)
            |m when m>max->fn(x,y,max,a,b-1I)
            |_->fn(x,y,max,a-1I,maxe)
    fn(0I,0I,0I,maxe,maxe)
Check:
Euler 5;;
Real: 00:00:02.658, CPU: 00:00:02.605, GC gen0: 592, gen1: 1, gen2: 0
val it :
System.Numerics.BigInteger * System.Numerics.BigInteger *
System.Numerics.BigInteger =
(99979 {...}, 99681 {...}, 9966006699 {...})

How do I translate this Haskell to F#?

I'm trying to learn F# by translating some Haskell code I wrote a very long time ago, but I'm stuck!
percent :: Int -> Int -> Float
percent a b = (fromInt a / fromInt b) * 100
freqs :: String -> [Float]
freqs ws = [percent (count x ws) (lowers ws) | x <- ['a' .. 'z']]
I've managed this:
let percent a b = (float a / float b) * 100.
although I don't like having to put the . after the 100.
What is the name of the operation I am performing in freqs, and how do I translate it to F#?
Edit: count and lowers are Char -> String -> Int and String -> Int respectively, and I have translated these already.
This is a list comprehension, and in F# it looks like the last two lines below:
// stub out since don't know the implementation
let count (c:char) (s:string) = 4
let lowers (s:string) = 10
// your code
let percent a b = (float a / float b) * 100.
let freq ws = [for x in ['a'..'z'] do
                   yield percent (count x ws) (lowers ws)]
More generally I think Haskell list comprehensions have the form suggested by the example below, and the corresponding F# is shown.
// Haskell
// [e(x,y) | x <- l1, y <- l2, pred(x,y)]
// F#
[for x in l1 do
    for y in l2 do
        if pred(x,y) then
            yield e(x,y)]
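As a concrete instance of that general form, here is a small sketch with two generators and one guard. The ranges and the predicate x < y are chosen only for illustration:

```fsharp
// Haskell: [(x, y) | x <- [1..3], y <- [1..3], x < y]
let pairs =
    [ for x in 1 .. 3 do
        for y in 1 .. 3 do
            if x < y then
                yield (x, y) ]
// pairs = [(1, 2); (1, 3); (2, 3)]
```

Each Haskell generator becomes a nested for, and the guard becomes an if without an else branch.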
Note that Brian's F# code:
let freq ws = [for x in ['a'..'z'] do yield percent (count x ws) (lowers ws)]
Can be written more elegantly as:
let freq ws = [for x in 'a'..'z' -> percent (count x ws) (lowers ws)]
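For a version that runs end to end, count and lowers need real bodies. The question only gives their types, so the definitions below are assumptions made for this sketch (count as the number of occurrences of a character, lowers as the number of lowercase letters), not the asker's own translations:

```fsharp
// Hypothetical helpers: the question states only their types, not their bodies.
let count (c : char) (s : string) =
    s |> Seq.filter (fun ch -> ch = c) |> Seq.length
let lowers (s : string) =
    s |> Seq.filter (fun ch -> System.Char.IsLower ch) |> Seq.length

let percent a b = (float a / float b) * 100.0

// One percentage per letter 'a' .. 'z', as in the Haskell freqs.
let freqs ws = [ for x in 'a' .. 'z' -> percent (count x ws) (lowers ws) ]

let result = freqs "abba"
// result.[0] = 50.0 ('a'), result.[1] = 50.0 ('b'), every other entry is 0.0
```

Note that an input without any lowercase letters makes lowers return 0, so the float division yields NaN rather than raising an exception.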

Resources