Why is head-tail pattern matching so much faster than indexing? - performance

I was working on a HackerRank problem today and initially wrote it with indexing, and it was incredibly slow for most of the test cases because they were huge. I then decided to switch it to head:tail pattern matching and it just zoomed. The difference was night and day, but I can't figure out why it made such a difference in efficiency. Here is the code for reference, in case it is useful.
Most efficient attempt with indexing
count :: Eq a => Integral b => a -> [a] -> b
count e [] = 0
count e (a:xs) = (count e xs +) $ if a == e then 1 else 0
fullCheck :: String -> Bool
fullCheck a = prefixCheck 0 (0,0,0,0) a (length a) && (count 'R' a == count 'G' a) && (count 'Y' a == count 'B' a)
prefixCheck :: Int -> (Int, Int, Int, Int) -> String -> Int -> Bool
prefixCheck n (r',g',y',b') s l
    | n == l = True
    | otherwise =
        ((<= 1) $ abs $ r - g) && ((<= 1) $ abs $ y - b)
        && prefixCheck (n+1) (r,g,y,b) s l
    where c = s !! n
          r = if c == 'R' then r' + 1 else r'
          g = if c == 'G' then g' + 1 else g'
          y = if c == 'Y' then y' + 1 else y'
          b = if c == 'B' then b' + 1 else b'
run :: Int -> IO ()
run 0 = putStr ""
run n = do
    a <- getLine
    print $ fullCheck a
    run $ n - 1
main :: IO ()
main = do
    b <- getLine
    run $ read b
head:tail pattern matching attempt
count :: Eq a => Integral b => a -> [a] -> b
count e [] = 0
count e (a:xs) = (count e xs +) $ if a == e then 1 else 0
fullCheck :: String -> Bool
fullCheck a = prefixCheck (0,0,0,0) a && (count 'R' a == count 'G' a) && (count 'Y' a == count 'B' a)
prefixCheck :: (Int, Int, Int, Int) -> String -> Bool
prefixCheck (r,g,y,b) [] = r == g && y == b
prefixCheck (r',g',y',b') (h:s) = ((<= 1) $ abs $ r - g) && ((<= 1) $ abs $ y - b)
    && prefixCheck (r,g,y,b) s
    where r = if h == 'R' then r' + 1 else r'
          g = if h == 'G' then g' + 1 else g'
          y = if h == 'Y' then y' + 1 else y'
          b = if h == 'B' then b' + 1 else b'
run :: Int -> IO ()
run 0 = putStr ""
run n = do
    a <- getLine
    print $ fullCheck a
    run $ n - 1
main :: IO ()
main = do
    b <- getLine
    run $ read b
For reference as well, the question was
You are given a sequence of N balls in 4 colors: red, green, yellow and blue. The sequence is full of colors if and only if all of the following conditions are true:
There are as many red balls as green balls.
There are as many yellow balls as blue balls.
Difference between the number of red balls and green balls in every prefix of the sequence is at most 1.
Difference between the number of yellow balls and blue balls in every prefix of the sequence is at most 1.
Where a prefix of a string is any substring from the beginning to m where m is less than the size of the string

You already got the answer in the comments as to why list indexing takes linear time: s !! n has to walk n cells from the head of the list, so indexing inside a loop makes the whole check quadratic in the length of the string. But if you are interested in a more Haskell-style solution to the HackerRank problem you're referring to, even head-tail pattern matching is unnecessary. A more performant solution can be written with a right fold:
import Control.Applicative ((<$>))
import Control.Monad (replicateM_)
solve :: String -> Bool
solve s = foldr go (\r g y b -> r == g && y == b) s 0 0 0 0
  where
    go x run r g y b
        | 1 < abs (r - g) || 1 < abs (y - b) = False
        | x == 'R' = run (r + 1) g y b
        | x == 'G' = run r (g + 1) y b
        | x == 'Y' = run r g (y + 1) b
        | x == 'B' = run r g y (b + 1)
main :: IO ()
main = do
    n <- read <$> getLine
    replicateM_ n $ getLine >>= print . solve
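To make the linear cost of !! concrete, here is a minimal sketch. The names indexAt, sumByIndex and sumByMatch are made up for this illustration, and indexAt is a simplified stand-in for the real (!!) in base, which is more optimized but still has to walk the list from its head.

```haskell
-- Simplified stand-in for (!!): it must step over n cells to reach index n.
indexAt :: [a] -> Int -> a
indexAt (x:_)  0 = x
indexAt (_:xs) n = indexAt xs (n - 1)
indexAt []     _ = error "indexAt: index too large"

-- Quadratic: every call to indexAt restarts from the head of the list.
sumByIndex :: [Int] -> Int
sumByIndex xs = go 0
  where
    len = length xs
    go n
        | n == len  = 0
        | otherwise = indexAt xs n + go (n + 1)

-- Linear: each step consumes exactly one cell and never looks back.
sumByMatch :: [Int] -> Int
sumByMatch []     = 0
sumByMatch (x:xs) = x + sumByMatch xs

main :: IO ()
main = do
    let xs = [1 .. 2000]
    print (sumByIndex xs)  -- 2001000, after roughly n^2 / 2 list steps
    print (sumByMatch xs)  -- 2001000, after n list steps
```

Both functions return the same sum; only the number of list cells visited differs, which is the gap the question observed on large inputs.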

Related

Implementing LLL algorithm in Haskell

I'm implementing the LLL basis reduction algorithm in Haskell. I'm basing my code on the pseudocode on Wikipedia. Here is what I have so far. Apologies for the code dump; I strongly suspect the issue lies in lll but I'm giving everything just in case.
import Linear as L
import Data.List (sort)
f v x = v `L.dot` x
gram_schmidt b =
    let aux vs us =
            case vs of
                v:t -> let vus = map (\u -> project u v) us
                           s = foldr (^+^) zero vus
                           u = v ^-^ s in
                       aux t (us++[u])
                [] -> us
    in aux b []
swap :: Int -> Int -> [a] -> [a]
swap i j xs =
    let elemI = xs !! i
        elemJ = xs !! j
        left = take i xs
        middle = take (j - i - 1) (drop (i + 1) xs)
        right = drop (j + 1) xs
    in left ++ [elemJ] ++ middle ++ [elemI] ++ right
update i xs new =
    let left = take (i-1) xs
        right = drop (i) xs
    in left ++ [new] ++ right
sort_vecs vs = map snd (sort (zip (map norm vs) vs))
lll :: Int -> [[Double]] -> Double -> [[Double]]
lll d b delta =
    let b' = gram_schmidt b
        aux :: [[Double]] -> [[Double]] -> Int -> [[Double]]
        aux b b' k =
            if k >= d then
                b
            else
                let aux2 :: [[Double]] -> [[Double]] -> Int -> [[Double]]
                    aux2 b b' j =
                        if j < 0 then
                            let mu = (f (b!!k) (b'!!(k-1))) / (f (b'!!(k-1)) (b'!!(k-1))) in
                            if f (b'!!k) (b'!!k) >= (delta-mu^2) * f (b'!!(k-1)) (b'!!(k-1)) then
                                aux b b' (k+1)
                            else
                                let bb = swap k (k-1) b
                                    bb' = gram_schmidt bb in
                                aux bb bb' (max (k-1) 1)
                        else
                            let mu = (f (b!!k) (b'!!j)) / (f (b'!!j) (b'!!j)) in
                            if abs mu > 0.5 then
                                let bk = b!!k
                                    bj = b!!j
                                    bb = update k b (bk ^-^ (fromIntegral (round mu)) *^ bj)
                                    bb' = gram_schmidt bb in
                                aux2 bb bb' (j-1)
                            else
                                aux2 b b' (j-1)
                in aux2 b b' (k-1)
    in sort_vecs (aux b b' 1)
My issue is that it seems to find a basis of a sublattice. In particular, lll d [[-0.8526334764831849,-3.125000000000004e-2],[-1.2941941738241598,4.419417382415916e-2]] 0.75 returns [[0.41107277914220997,0.10669417382415924],[-1.2941941738241598,4.419417382415916e-2]], a basis for an index-2 sublattice, with basis vectors that are almost parallel. I've been staring at this code for ages to no avail. (I thought there was an issue with update, where (i-1) should be (i) and (i) should be (i+1), but changing this caused an infinite loop.) Any help is greatly appreciated.
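Two of the list helpers look worth checking against the reported output. With 0-based indices, update k as written replaces element k-1 rather than element k, which would explain why the first returned vector equals b1 - 2*b0 while b1 itself is unchanged. In addition, swap k (k-1) is called with i > j, a case in which the helper as written returns a list two elements longer than its input. The sketch below shows index-safe variants; the names updateAt and swapAt are hypothetical, and whether these two changes are enough to make lll correct has not been checked here.

```haskell
-- Replace the element at 0-based index i (assumes 0 <= i < length xs).
updateAt :: Int -> a -> [a] -> [a]
updateAt i new xs = take i xs ++ [new] ++ drop (i + 1) xs

-- Swap the elements at 0-based indices i and j, given in either order
-- (assumes both indices are within bounds).
swapAt :: Int -> Int -> [a] -> [a]
swapAt i j xs
    | i == j    = xs
    | i > j     = swapAt j i xs
    | otherwise = take i xs ++ [xs !! j]
                  ++ take (j - i - 1) (drop (i + 1) xs)
                  ++ [xs !! i] ++ drop (j + 1) xs

main :: IO ()
main = do
    print (updateAt 1 'x' "abc")  -- "axc"
    print (swapAt 1 0 "ab")       -- "ba"
    print (swapAt 2 1 "abcd")     -- "acbd"
```

Note the argument order: updateAt takes the new element before the list, unlike update in the question.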

Something wrong with my PollardP1_rho code but I don't know how to fix it

I tried to use the MillerRabin + PollardP1_rho method to factorize an integer into primes in Python 3, to reduce the time complexity as much as I could. But it failed some tests. I know where the problem is, but I am a beginner in algorithms and I don't know how to fix it, so I will put all the relevant code here.
import random
def gcd(a, b):
    """
    a, b: integers
    returns: a positive integer, the greatest common divisor of a & b.
    """
    if a == 0:
        return b
    if a < 0:
        return gcd(-a, b)
    while b > 0:
        c = a % b
        a, b = b, c
    return a
def mod_mul(a, b, n):
    # Calculate a * b % n iteratively.
    result = 0
    while b > 0:
        if (b & 1) > 0:
            result = (result + a) % n
        a = (a + a) % n
        b = (b >> 1)
    return result
def mod_exp(a, b, n):
    # Calculate (a ** b) % n iteratively.
    result = 1
    while b > 0:
        if (b & 1) > 0:
            result = mod_mul(result, a, n)
        a = mod_mul(a, a, n)
        b = (b >> 1)
    return result
def MillerRabinPrimeCheck(n):
    if n in {2, 3, 5, 7, 11}:
        return True
    elif (n == 1 or n % 2 == 0 or n % 3 == 0 or n % 5 == 0 or n % 7 == 0 or n % 11 == 0):
        return False
    k = 0
    u = n - 1
    while not (u & 1) > 0:
        k += 1
        u = (u >> 1)
    random.seed(0)
    s = 5 # If the result isn't right, then increase s.
    for i in range(s):
        x = random.randint(2, n - 1)
        if x % n == 0:
            continue
        x = mod_exp(x, u, n)
        pre = x
        for j in range(k):
            x = mod_mul(x, x, n)
            if (x == 1 and pre != 1 and pre != n - 1):
                return False
            pre = x
        if x != 1:
            return False
    return True
def PollardP1_rho(n, c):
    '''
    Consider c as a constant integer.
    '''
    i = 1
    k = 2
    x = random.randrange(1, n - 1) + 1
    y = x
    while 1:
        i += 1
        x = (mod_mul(x, x, n) + c) % n
        d = gcd(y - x, n)
        if 1 < d < n:
            return d
        elif x == y:
            return n
        elif i == k:
            y = x
            k = (k << 1)
result = []
def PrimeFactorsListGenerator(n):
    if n <= 1:
        pass
    elif MillerRabinPrimeCheck(n) == True:
        result.append(n)
    else:
        a = n
        while a == n:
            a = PollardP1_rho(n, random.randrange(1,n - 1) + 1)
        PrimeFactorsListGenerator(a)
        PrimeFactorsListGenerator(n // a)
When I tried to test this:
PrimeFactorsListGenerator(4)
it didn't stop, and kept looping on this call:
PollardP1_rho(4, random.randrange(1,4 - 1) + 1)
I have already tested the functions before PollardP1_rho and they work normally, so I know that PollardP1_rho cannot deal with the number 4 correctly, nor with the number 5. How can I fix that?
I have solved it myself.
There is one mistake in the code.
I should not use the variable 'result' outside the function as a global variable; I should define it inside the function and use result.extend() so that it stays valid through the whole recursive process. So I rewrote PollardP1_rho(n, c) and PrimeFactorsListGenerator(n):
def Pollard_rho(x, c):
    '''
    Consider c as a constant integer.
    '''
    i, k = 1, 2
    x0 = random.randint(0, x)
    y = x0
    while 1:
        i += 1
        x0 = (mod_mul(x0, x0, x) + c) % x
        d = gcd(y - x0, x)
        if d != 1 and d != x:
            return d
        if y == x0:
            return x
        if i == k:
            y = x0
            k += k
def PrimeFactorsListGenerator(n):
    result = []
    if n <= 1:
        return None
    if MillerRabinPrimeCheck(n):
        return [n]
    p = n
    while p >= n:
        p = Pollard_rho(p, random.randint(1, n - 1))
    result.extend(PrimeFactorsListGenerator(p))
    result.extend(PrimeFactorsListGenerator(n // p))
    return result
#PrimeFactorsListGenerator(400)
#PrimeFactorsListGenerator(40000)
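There is an additional tip: in Python you don't need a hand-written mod_mul(a, b, n) at all. Python integers have arbitrary precision, so (a * b) % n is already exact, and the built-in three-argument pow(a, b, n) performs optimized modular exponentiation, so it can stand in for mod_exp(a, b, n).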

Knuth-Morris-Pratt implementation in Haskell -- Index out of bounds

I've used the pseudocode from Wikipedia in an attempt to write a KMP algorithm in Haskell.
It's giving "index out of bounds" when I try to search beyond the length of the pattern and I can't seem to find the issue; my "fixes" have only ruined the result.
import Control.Monad
import Control.Lens
import qualified Data.ByteString.Char8 as C
import qualified Data.Vector.Unboxed as V
(!) :: C.ByteString -> Int -> Char
(!) = C.index
-- Make the table for the KMP. Directly from Wikipedia. Works as expected for inputs from Wikipedia article.
mkTable :: C.ByteString -> V.Vector Int
mkTable pat = make 2 0 (ix 0 .~ (negate 1) $ V.replicate l 0)
  where
    l = C.length pat
    make :: Int -> Int -> V.Vector Int -> V.Vector Int
    make p c t
        | p >= l = t
        | otherwise = proc
      where
        proc | pat ! (p-1) == pat ! c
                 = make (p+1) (c+1) (ix p .~ (c+1) $ t)
             | c > 0 = make p (t V.! c) t
             | otherwise = make (p+1) c (ix p .~ 0 $ t)
kmp :: C.ByteString -> C.ByteString -> V.Vector Int -> Int
kmp text pat tbl = search 0 0
  where
    l = C.length text
    search m i
        | m + i >= l = l
        | otherwise = cond
      where
        -- The conditions for the loop, given in the wiki article
        cond | pat ! i == text ! (m+i)
                 = if i == C.length pat - 1
                       then m
                       else search m (i+1)
             | tbl V.! i > (-1)
                 = search (m + i - (tbl V.! i)) (tbl V.! i)
             | otherwise
                 = search 0 (m+1)
main :: IO()
main = do
    t <- readLn
    replicateM_ t $ do
        text <- C.getLine
        pat <- C.getLine
        -- kmp returns an Int, so print it rather than passing it to putStrLn
        print $ kmp text pat (mkTable pat)
Simple solution: I mixed up m and i in the last condition of kmp.
    | otherwise = search 0 (m+1)
becomes
    | otherwise = search (m+1) 0
and the issue is resolved.
Aside from that, it's necessary to use unboxed arrays in the ST monad or the table generation takes an absurd amount of time.
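As a rough illustration of that last remark, the table could be built in place with a mutable unboxed vector, so that each write is O(1) instead of copying the whole vector. This is only a sketch: the name mkTableST is made up, it follows the same loop as mkTable above, and it relies on V.create from the vector package, which runs the ST action and freezes the result.

```haskell
import Control.Monad (when)
import qualified Data.ByteString.Char8 as C
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as MV

-- Same table as mkTable, filled in place inside ST.
mkTableST :: C.ByteString -> V.Vector Int
mkTableST pat = V.create (do
    t <- MV.replicate l 0
    when (l > 0) $ MV.write t 0 (-1)
    let go p c
            | p >= l = return ()
            | C.index pat (p - 1) == C.index pat c = do
                MV.write t p (c + 1)       -- extend the current match
                go (p + 1) (c + 1)
            | c > 0 = MV.read t c >>= \c' -> go p c'  -- fall back
            | otherwise = do
                MV.write t p 0             -- no candidate prefix left
                go (p + 1) c
    go 2 0
    return t)
  where
    l = C.length pat

main :: IO ()
main = print (mkTableST (C.pack "ABCDABD"))  -- [-1,0,0,0,0,1,2]
```

The printed table for "ABCDABD" is the one that follows from tracing the loop in mkTable by hand.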

how to implement sigma in haskell?

I'm learning Haskell and I'm trying to solve http://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&page=show_problem&problem=1098
I solved this problem using another language, so I know the solution:
answerTo(n, d) = func(n, d) - func(n, d - 1)
func(n, d) = Σ(func(k - 2, d -1) * func(n - k, d)) | 2 <= k <= n, k is even
func(0, d) = 1
I need to implement func in Haskell, but I don't know how to.
func n d
    | n < 0 || d < 0 = 0
    | n == 0 && d >= 0 = 1
    | otherwise = --need to implement Σ(func (k - 2) (d -1)) * (func (n - k) d) | 2 <= k <= n, k is even
I solved it this way:
func (n, d)
    | n == 0 && d >= 0 = 1
    | n < 0 || d < 0 = 0
    | otherwise = sum (zipWith (*) (map func arg1) (map func arg2))
    where
        arg1 = [(k - 2, d - 1) | k <- filter even [2..n]]
        arg2 = [(n - k, d) | k <- filter even [2..n]]
Are there other, more graceful solutions?
The Math.NumberTheory.Primes.Factorization module in the arithmoi package includes a σ function that may be the one you need, along with other related functions. You can click through to the source code for that function if you like.
Edit
Now that you've clarified what you mean, the answer is simple. You need to enumerate the values you need and then sum them. The easiest way to do this is using the sum function to sum a list that you create using a list comprehension:
func n d
    | n < 0 || d < 0 = 0
    | n == 0 && d >= 0 = 1
    | otherwise = sum [func (k - 2) (d - 1) * func (n - k) d | k <- [2,4..n]]
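The list-comprehension version recomputes the same subproblems many times, which may become slow for larger n. As an addition that is not part of the original answer, the same recurrence can be memoized with a lazy array from Data.Array. The names funcTable, cell and look are made up for this sketch, and it assumes n >= 0 and d >= 0.

```haskell
import Data.Array (Array, array, (!))

-- Table of func i j for 0 <= i <= n, 0 <= j <= d. The elements are lazy
-- thunks that refer back to the table, so each entry is computed once.
funcTable :: Int -> Int -> Array (Int, Int) Integer
funcTable n d = table
  where
    table = array ((0, 0), (n, d))
        [((i, j), cell i j) | i <- [0 .. n], j <- [0 .. d]]
    cell :: Int -> Int -> Integer
    cell 0 _ = 1
    cell i j = sum [look (k - 2) (j - 1) * look (i - k) j | k <- [2, 4 .. i]]
    -- Out-of-range arguments are 0, as in the first guard of func.
    look :: Int -> Int -> Integer
    look i j
        | i < 0 || j < 0 = 0
        | otherwise      = table ! (i, j)

answerTo :: Int -> Int -> Integer
answerTo n d = t ! (n, d) - (if d > 0 then t ! (n, d - 1) else 0)
  where
    t = funcTable n d

main :: IO ()
main = do
    print (answerTo 6 2)      -- 3
    print (answerTo 300 150)  -- 1
```

The value for (6, 2) can be checked by hand from the recurrence: func 6 2 is 4 and func 6 1 is 1.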

Optimizing the damerau version of the levenshtein algorithm to better than O(n*m)

Here is the algorithm (in Ruby):
#http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
def self.dameraulevenshtein(seq1, seq2)
  oneago = nil
  thisrow = (1..seq2.size).to_a + [0]
  seq1.size.times do |x|
    twoago, oneago, thisrow = oneago, thisrow, [0] * seq2.size + [x + 1]
    seq2.size.times do |y|
      delcost = oneago[y] + 1
      addcost = thisrow[y - 1] + 1
      subcost = oneago[y - 1] + ((seq1[x] != seq2[y]) ? 1 : 0)
      thisrow[y] = [delcost, addcost, subcost].min
      if (x > 0 and y > 0 and seq1[x] == seq2[y-1] and seq1[x-1] == seq2[y] and seq1[x] != seq2[y])
        thisrow[y] = [thisrow[y], twoago[y-2] + 1].min
      end
    end
  end
  return thisrow[seq2.size - 1]
end
My problem is that with a seq1 of length 780, and seq2 of length 7238, this takes about 25 seconds to run on an i7 laptop. Ideally, I'd like to get this reduced to about a second, since it's running as part of a webapp.
I found that there is a way to optimize the vanilla levenshtein distance such that the runtime drops from O(n*m) to O(n + d^2) where n is the length of the longer string, and d is the edit distance. So, my question becomes, can the same optimization be applied to the damerau version I have (above)?
Yes, the optimization can be applied to the Damerau version. Here is some Haskell code to do this (I don't know Ruby):
distd :: Eq a => [a] -> [a] -> Int
distd a b
    = last (if lab == 0 then mainDiag
            else if lab > 0 then lowers !! (lab - 1)
                 else{- < 0 -} uppers !! (-1 - lab))
    where mainDiag = oneDiag a b (head uppers) (-1 : head lowers)
          uppers = eachDiag a b (mainDiag : uppers) -- upper diagonals
          lowers = eachDiag b a (mainDiag : lowers) -- lower diagonals
          eachDiag a [] diags = []
          eachDiag a (bch:bs) (lastDiag:diags) = oneDiag a bs nextDiag lastDiag : eachDiag a bs diags
              where nextDiag = head (tail diags)
          oneDiag a b diagAbove diagBelow = thisdiag
              where doDiag [_] b nw n w = []
                    doDiag a [_] nw n w = []
                    doDiag (apr:ach:as) (bpr:bch:bs) nw n w = me : (doDiag (ach:as) (bch:bs) me (tail n) (tail w))
                        where me = if ach == bch then nw else if ach == bpr && bch == apr then nw else 1 + min3 (head w) nw (head n)
                    firstelt = 1 + head diagBelow
                    thisdiag = firstelt : doDiag a b firstelt diagAbove (tail diagBelow)
          lab = length a - length b
          min3 x y z = if x < y then x else min y z
distance :: [Char] -> [Char] -> Int
distance a b = distd ('0':a) ('0':b)
The code above is an adaptation of this code.
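When experimenting with diagonal-based variants like this one, it can help to have a plain quadratic reference to cross-check results on small inputs. The sketch below is such a baseline and not the optimized algorithm: it fills the full O(n*m) table for the restricted (optimal string alignment) form of the Damerau-Levenshtein recurrence, using a lazy array. The name osaDistance is made up for this example.

```haskell
import Data.Array (Array, array, listArray, (!))

-- Unoptimized O(n*m) reference: optimal string alignment distance
-- (insert, delete, substitute, and swap of two adjacent elements).
osaDistance :: Eq a => [a] -> [a] -> Int
osaDistance xs ys = table ! (n, m)
  where
    n  = length xs
    m  = length ys
    xa = listArray (1, n) xs
    ya = listArray (1, m) ys
    table :: Array (Int, Int) Int
    table = array ((0, 0), (n, m))
        [((i, j), cell i j) | i <- [0 .. n], j <- [0 .. m]]
    cell i 0 = i
    cell 0 j = j
    cell i j = minimum (transposition ++ [del, ins, sub])
      where
        del = table ! (i - 1, j) + 1
        ins = table ! (i, j - 1) + 1
        sub = table ! (i - 1, j - 1) + (if xa ! i == ya ! j then 0 else 1)
        -- Empty unless the last two elements of both prefixes are swapped.
        transposition =
            [ table ! (i - 2, j - 2) + 1
            | i > 1, j > 1, xa ! i == ya ! (j - 1), xa ! (i - 1) == ya ! j ]

main :: IO ()
main = do
    print (osaDistance "ab" "ba")           -- 1
    print (osaDistance "kitten" "sitting")  -- 3
    print (osaDistance "ca" "abc")          -- 3
```

Because it always fills the whole table, this version has the same quadratic cost the question is trying to avoid, so it is only suitable as a correctness check.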
