Why does my balanced-brackets stack algorithm fail within nested OPEN brackets? - algorithm

I know that there are thousands of examples on the internet, and questions here, about using a stack to check whether a sequence of brackets - "(){}[]" - is balanced.
However, everything I've found is a simple algorithm that stops at the first unbalanced pair and returns a boolean or a string saying whether the sequence is balanced.
My problem goes beyond that: I need to collect all the "parse" errors and then print all of them on screen!
Example: "([}])"
When a common algorithm finds the "}", the program would return false or "NO".
My algorithm needs to add an "Error: close curly brace does not match" to a list, so that when the program finishes parsing the string, all the errors are returned.
This is the complete program:
module LE2.Stack.Delim
  ( parse
  , erroAbreParen
  , erroAbreCol
  , erroAbreChaves
  , erroFechaParen
  , erroFechaCol
  , erroFechaChaves
  ) where

import           GHC.Exts      ( sortWith )
import qualified LE2.Stack.TAD as Stack
import           Debug.Trace   ( trace )

-- e.g. tokenize "a(b[}c])" == [('(',0),('[',1),('}',2),(']',3),(')',4)]
-- (indices count only the delimiter characters that survive the filter)
tokenize :: String -> [(Char, Int)]
tokenize s = zip (filter ehDelim s) [0 ..]

parse :: String -> [String]
parse lines = go (tokenize lines) Stack.new []
 where
  go x y z | trace ("go: " ++ show x ++ " " ++ show y ++ " " ++ show z) False =
    undefined
  -- end of input: anything still on the stack came from an unmatched opener;
  -- swap maps each stored closer back to its opener's error message
  go [] st parsed
    | Stack.isEmpty st = map fst $ sort parsed
    | otherwise        = map fst . sort $ parsed ++ map swap ((Stack.<<>) st)
  -- an opener pushes the closer it expects, tagged with the opener's position
  go (('(', idx) : xs) st parsed = go xs (Stack.push st (')', idx)) parsed
  go (('[', idx) : xs) st parsed = go xs (Stack.push st (']', idx)) parsed
  go (('{', idx) : xs) st parsed = go xs (Stack.push st ('}', idx)) parsed
  -- a closer arriving on an empty stack has nothing to match
  go (pair : xs) st parsed | Stack.isEmpty st = go xs st $ apply pair : parsed
  -- a closer that matches the expected one on top of the stack pops it
  go ((')', _) : xs) st parsed | Just (')', _) <- Stack.peek st =
    let (Just _, st') = Stack.pop st in go xs st' parsed
  go ((']', _) : xs) st parsed | Just (']', _) <- Stack.peek st =
    let (Just _, st') = Stack.pop st in go xs st' parsed
  go (('}', _) : xs) st parsed | Just ('}', _) <- Stack.peek st =
    let (Just _, st') = Stack.pop st in go xs st' parsed
  -- fall-through: the closer does not match the top of the stack
  go (pair : xs) st parsed = go xs st $ apply pair : parsed

  swap ('(', idx) = apply (')', idx)
  swap (')', idx) = apply ('(', idx)
  swap ('[', idx) = apply (']', idx)
  swap (']', idx) = apply ('[', idx)
  swap ('{', idx) = apply ('}', idx)
  swap ('}', idx) = apply ('{', idx)

  apply ('(', idx) = (erroAbreParen, idx)
  apply (')', idx) = (erroFechaParen, idx)
  apply ('[', idx) = (erroAbreCol, idx)
  apply (']', idx) = (erroFechaCol, idx)
  apply ('{', idx) = (erroAbreChaves, idx)
  apply ('}', idx) = (erroFechaChaves, idx)

  -- report errors in source order, by token index
  sort = sortWith (abs . snd)

erroFechaParen :: String
erroFechaParen = "Erro: fecha parentêses não casa!"

erroFechaCol :: String
erroFechaCol = "Erro: fecha colchetes não casa!"

erroFechaChaves :: String
erroFechaChaves = "Erro: fecha chaves não casa!"

erroAbreParen :: String
erroAbreParen = "Erro: abre parentêses não casa!"

erroAbreCol :: String
erroAbreCol = "Erro: abre colchetes não casa!"

erroAbreChaves :: String
erroAbreChaves = "Erro: abre chaves não casa!"

ehDelim :: Char -> Bool
ehDelim ch | elem ch "()[]{}" = True
           | otherwise        = False
I know I may have overcomplicated the problem, but this is one of my first versions!
I also wrote some tests:
module LE2.Stack.DelimSpec where

import Test.Hspec
import LE2.Stack.Delim

spec :: Spec
spec = do
  describe "testa o casamento de parênteses, colchetes e chaves!" $ do
    it "deve retornar a quantidade correta de erros" $ do
      let as = "{}[]()"
      let bs = ")]}"
      let cs = "([{"
      let ds = "([]{}"
      let es = "()[{}"
      let fs = "()[]{"
      let xs = "()[]{}"
      let ys = "([{}])"
      let zs = "{[()]}"

      (length $ parse as) `shouldBe` 0
      (length $ parse bs) `shouldBe` 3
      (length $ parse cs) `shouldBe` 3
      (length $ parse ds) `shouldBe` 1
      (length $ parse es) `shouldBe` 1
      (length $ parse fs) `shouldBe` 1
      (length $ parse xs) `shouldBe` 0
      (length $ parse ys) `shouldBe` 0
      (length $ parse zs) `shouldBe` 0

    it "deve retornar os erros corretos" $ do
      let as = "([{"
      let bs = "([]{}"
      let cs = "()[{}"
      let ds = "()[]{"
      let es = "({}])"
      let fs = "([}])"
      let gs = "[{}])"
      let hs = "]]{}}{"
      let is = "(])]{{"
      let js = "{}{[))"
      let ks = "([{})"
      let ls = "([{])"
      let ms = "([{}]"
      let xs = "("
      let ys = "["
      let zs = "{"

      parse as `shouldBe` [erroAbreParen, erroAbreCol, erroAbreChaves]
      parse bs `shouldBe` [erroAbreParen]
      parse cs `shouldBe` [erroAbreCol]
      parse ds `shouldBe` [erroAbreChaves]
      parse es `shouldBe` [erroFechaCol]
      parse fs `shouldBe` [erroFechaChaves]
      parse gs `shouldBe` [erroFechaParen]
      parse hs `shouldBe` [erroFechaCol, erroFechaCol, erroFechaChaves, erroAbreChaves]
      parse is `shouldBe` [erroFechaCol, erroFechaCol, erroAbreChaves, erroAbreChaves]
      parse js `shouldBe` [erroAbreChaves, erroAbreCol, erroFechaParen, erroFechaParen]
      parse ks `shouldBe` [erroAbreCol]
      parse ls `shouldBe` [erroAbreChaves]
      parse ms `shouldBe` [erroAbreParen]
      parse xs `shouldBe` [erroAbreParen]
      parse ys `shouldBe` [erroAbreCol]
      parse zs `shouldBe` [erroAbreChaves]
These tests break only in these cases:

    let ks = "([{})"
    let ls = "([{])"
    let ms = "([{}]"

So I'm really confused about WHY they break only with nested unbalanced open brackets. I could determine that my logic fails in this clause:

    go (pair : xs) st parsed = go xs st $ apply pair : parsed

because by then the program has already pushed the closing brackets it expects for the surrounding openers, and when a non-matching closer falls through to this clause those entries just stay on the stack. For instance, with ks = "([{})" the final ')' arrives while (']', 1) sits on top of (')', 0): this clause reports the ')', and the two leftover stack entries are then reported as unmatched openers at the end of the input, so I get three errors instead of the single erroAbreCol the test expects.
But why, and how could I solve this?

You can try changing

    go (pair : xs) st parsed = go xs st $ apply pair : parsed

to

    go (pair : xs) st parsed =
      let (_, st') = Stack.pop st in go (pair : xs) st' $ apply pair : parsed

I really can't guarantee that it will work, but in the ks case the problem is that you have the stack (']', 1) : (')', 0) : [] before the ')' and the stack remains unchanged after the ')'. If you pop and then try the matching ')' again against the stack (')', 0) : [], it may work.
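For comparison, here is a small self-contained sketch of another way to obtain the grouping the failing tests seem to expect. It does not use the OP's Stack.TAD (a plain list plays the role of the stack), and checkAll, nome, openErr and closeErr are made-up names, so treat it as an illustration of the idea rather than a drop-in replacement: when a closer does not match the opener on top, look deeper into the stack; if it matches an enclosing opener, blame the openers sitting in between, otherwise blame the closer itself. Leftover openers at the end are reported too, and everything is sorted back into source order.

import Data.List ( sortOn )

matching :: Char -> Char
matching ')' = '('
matching ']' = '['
matching '}' = '{'
matching c   = c

checkAll :: String -> [String]
checkAll s = map snd . sortOn fst $ go (zip [0 ..] (filter (`elem` "()[]{}") s)) []
 where
  -- end of input: every opener still on the stack is unmatched
  go [] stack = [ openErr c i | (i, c) <- stack ]
  go ((i, c) : rest) stack
    | c `elem` "([{" = go rest ((i, c) : stack)   -- push openers
    | otherwise = case break ((== matching c) . snd) stack of
        -- the closer matches an enclosing opener: blame the openers in between
        (skipped, _ : deeper) -> [ openErr o j | (j, o) <- skipped ] ++ go rest deeper
        -- the closer matches nothing on the stack: blame the closer itself
        (_, [])               -> closeErr c i : go rest stack

  openErr  c i = (i, "Erro: abre "  ++ nome c ++ " não casa!")
  closeErr c i = (i, "Erro: fecha " ++ nome c ++ " não casa!")
  nome c | c `elem` "()" = "parentêses"
         | c `elem` "[]" = "colchetes"
         | otherwise     = "chaves"

With this grouping, "([{})" reports only the unmatched "[" (erroAbreCol), and as far as I can tell the other expectations in the spec come out the same way.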

Related

Performance of Conway's Game of Life using the Store comonad

I've written a simple implementation of Conway's Game of Life using the Store comonad (see code below). My problem is that the grid generation is getting visibly slower from the fifth iteration onwards. Is my issue related to the fact that I'm using the Store comonad? Or am I making a glaring mistake? As far as I could tell, other implementations, which are based on the Zipper comonad, are efficient.
import Control.Comonad

data Store s a = Store (s -> a) s

instance Functor (Store s) where
  fmap f (Store g s) = Store (f . g) s

instance Comonad (Store s) where
  extract (Store f a) = f a
  duplicate (Store f s) = Store (Store f) s

type Pos = (Int, Int)

seed :: Store Pos Bool
seed = Store g (0, 0)
 where
  g ( 0,  1) = True
  g ( 1,  0) = True
  g (-1, -1) = True
  g (-1,  0) = True
  g (-1,  1) = True
  g _        = False

neighbours8 :: [Pos]
neighbours8 = [(x, y) | x <- [-1..1], y <- [-1..1], (x, y) /= (0, 0)]

move :: Store Pos a -> Pos -> Store Pos a
move (Store f (x, y)) (dx, dy) = Store f (x + dx, y + dy)

count :: [Bool] -> Int
count = length . filter id

getNrAliveNeighs :: Store Pos Bool -> Int
getNrAliveNeighs s = count $ fmap (extract . move s) neighbours8

rule :: Store Pos Bool -> Bool
rule s = let n = getNrAliveNeighs s
         in case extract s of
              True  -> 2 <= n && n <= 3
              False -> n == 3

blockToStr :: [[Bool]] -> String
blockToStr = unlines . fmap (fmap f)
 where
  f True  = '*'
  f False = '.'

getBlock :: Int -> Store Pos a -> [[a]]
getBlock n store@(Store _ (x, y)) =
  [[extract (move store (dx, dy)) | dy <- yrange] | dx <- xrange]
 where
  yrange = [(x - n)..(y + n)]
  xrange = reverse yrange

example :: IO ()
example = putStrLn
        $ unlines
        $ take 7
        $ fmap (blockToStr . getBlock 5)
        $ iterate (extend rule) seed
The store comonad per se doesn't really store anything (except in an abstract sense that a function is a "container"); it has to recompute every queried cell from scratch. After k applications of extend rule, reading a single cell expands into roughly 9^k evaluations of the original grid function, since each cell asks about itself and its eight neighbours in the previous generation, so this clearly gets very inefficient after a couple of iterations.
You can alleviate this with no change to your code though, if you just back the s -> a function with some memoisation:
import Data.MemoTrie

instance HasTrie s => Functor (Store s) where
  fmap f (Store g s) = Store (memo $ f . g) s

instance HasTrie s => Comonad (Store s) where
  extract (Store f a) = f a
  duplicate (Store f s) = Store (Store f) s
I haven't tested whether this really gives acceptable performance.
Incidentally, Edward Kmett had an explicitly-memoised version in an old version of the comonad-extras package, but it's gone now. I recently checked whether it still works (it seems to, after adjusting the dependencies).
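If you would rather leave the instances alone, a sketch of roughly the same idea is to re-memoise after each generation instead. This assumes the memotrie package and the OP's Store, Pos, rule, seed, blockToStr and getBlock from above; memoStep and example' are made-up names, and the snippet is untested:

import Data.MemoTrie ( memo )

-- wrap the function inside the Store in a memo table after every extend step
memoStep :: Store Pos Bool -> Store Pos Bool
memoStep w = case extend rule w of
  Store f s -> Store (memo f) s

-- then iterate memoStep instead of iterate (extend rule)
example' :: IO ()
example' = putStrLn
         $ unlines
         $ take 7
         $ fmap (blockToStr . getBlock 5)
         $ iterate memoStep seed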

F# - Algorithm and strings

Let's say I have a string of length N that contains only 0s and 1s. I want to split that string into multiple strings, where each string contains only one kind of digit.
Example:
00011010111
Should be split into:
000
11
0
1
0
111
The only solution I can think of is using a for loop with a string builder (written in pseudocode below, more C#-like, sorry):
result = new list<string>
tmpChar = ""
tmpString = ""
for each character c in MyString
    if tmpChar != c
        if tmpString != ""
            result.add tmpString
            tmpString = ""
        endIf
        tmpChar = c
    endIf
    tmpString += tmpChar
endFor
result.add tmpString
Do you have any other solution, maybe a clever one that uses a more functional approach?
I think Seq.scan would be a good fit for this; it is a very procedural problem in nature, preserving the order like that. But here is code that I believe does what you are asking.
"00011010111"
|> Seq.scan (fun (s, i) x ->
match s with
| Some p when p = x -> Some x, i
| _ -> Some x, i + 1 ) (None, 0)
|> Seq.countBy id
|> Seq.choose (function
| (Some t, _), n -> Some(t, n)
| _ -> None )
|> Seq.toList
Perhaps something along the lines of:
let result =
    let rec groupWhileSame xs result =
        match xs with
        | a when a |> Seq.isEmpty -> result
        | _ ->
            let head = xs |> Seq.head
            let str = xs |> Seq.takeWhile ((=) head)
            let rest = xs |> Seq.skipWhile ((=) head)
            groupWhileSame rest (Seq.append result [str])
    groupWhileSame (myStr) []
Seq.fold (fun (acc: string list) x ->
    match acc with
    | y :: rst when y.StartsWith(string x) -> (string x) + y :: rst
    | _ -> (string x) :: acc)
    []
    "00011010111"
Consider this function (which is generic):
let chunk s =
    if Seq.isEmpty s then []
    else
        let rec chunk items chunks =
            if Seq.isEmpty items then chunks
            else
                let chunks' =
                    match chunks with
                    | [] -> [(Seq.head items, 1)]
                    | x :: xs ->
                        let c, n = x in let c' = Seq.head items in
                        if c = c' then (c, n + 1) :: xs else (c', 1) :: x :: xs
                chunk (Seq.tail items) chunks'
        chunk s [] |> List.rev
It returns a list of tuples, where each tuple represents an item and its repetitions.
So
"00011010111" |> Seq.toList |> chunk
actually returns
[('0', 3); ('1', 2); ('0', 1); ('1', 1); ('0', 1); ('1', 3)]
Basically, we're doing run length encoding (which is admittedly a bit wasteful in the case of your example string).
To get the list of strings that you want, we use code like the following:
"00011010111"
|> Seq.toList
|> chunk
|> List.map (fun x -> let c,n = x in new string(c, n))
Here's a working version of OP's proposal with light syntax:
let chunk (s: string) =
    let result = System.Collections.Generic.List<string>()
    let mutable tmpChar = ""
    let mutable tmpString = ""
    for c in s do
        if tmpChar <> string c then
            if tmpString <> "" then
                result.Add tmpString
                tmpString <- ""
            tmpChar <- string c
        tmpString <- tmpString + tmpChar
    result.Add tmpString
    result
No attempt was made to follow a functional style.

Why is the "better" digit-listing function slower?

I was playing around with Project Euler #34, and I wrote these functions:
import Data.Time.Clock.POSIX
import Data.Char

digits :: (Integral a) => a -> [Int]
digits x
  | x < 10    = [fromIntegral x]
  | otherwise = let (q, r) = x `quotRem` 10 in (fromIntegral r) : (digits q)

digitsByShow :: (Integral a, Show a) => a -> [Int]
digitsByShow = map (\x -> ord x - ord '0') . show
I thought that for sure digits has to be the faster one, as we don't convert to a String. I could not have been more wrong. I ran the two versions via pe034:
pe034 digitFunc = sum $ filter sumFactDigit [3..2540160]
 where
  sumFactDigit :: Int -> Bool
  sumFactDigit n = n == (sum $ map sFact $ digitFunc n)

  sFact :: Int -> Int
  sFact n
    | n == 0 = 1
    | n == 1 = 1
    | n == 2 = 2
    | n == 3 = 6
    | n == 4 = 24
    | n == 5 = 120
    | n == 6 = 720
    | n == 7 = 5040
    | n == 8 = 40320
    | n == 9 = 362880

main = do
  begin <- getPOSIXTime
  print $ pe034 digitsByShow -- or digits
  end <- getPOSIXTime
  print $ end - begin
After compiling with ghc -O, digits consistently takes .5 seconds, while digitsByShow consistently takes .3 seconds. Why is this so? Why is the function which stays within Integer arithmetic slower, whereas the function which goes through string conversion is faster?
I ask this because I come from programming in Java and similar languages, where the % 10 trick of generating digits is way faster than the "convert to String" method. I haven't been able to wrap my head around the fact that converting to a string could be faster.
This is the best I can come up with.
digitsV2 :: (Integral a) => a -> [Int]
digitsV2 n = go n []
 where
  go x xs
    | x < 10    = fromIntegral x : xs
    | otherwise = case quotRem x 10 of
        (q, r) -> go q (fromIntegral r : xs)
when compiled with -O2 and tested with Criterion
digits runs in 470.4 ms
digitsByShow runs in 421.8 ms
digitsV2 runs in 258.0 ms
results may vary
Edit: I am not sure why building the list like this helps so much, but you can improve your code's speed by strictly evaluating quotRem x 10. You can do this with BangPatterns:

  | otherwise = let !(q, r) = x `quotRem` 10 in (fromIntegral r) : (digits q)

or with case:

  | otherwise = case quotRem x 10 of
      (q, r) -> fromIntegral r : digits q

Doing this drops digits down to 323.5 ms.
Edit: times without using Criterion:
digits       = 464.3 ms
digitsStrict = 328.2 ms
digitsByShow = 259.2 ms
digitsV2     = 252.5 ms
Note: the criterion package measures software performance.
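For reference, here is a sketch of how one might benchmark the three functions with Criterion; it assumes the criterion package, that digits, digitsV2 and digitsByShow from above are in scope, and an arbitrary input range:

import Criterion.Main

main :: IO ()
main = defaultMain
  [ bench "digits"       $ nf (map digits)       inputs
  , bench "digitsV2"     $ nf (map digitsV2)     inputs
  , bench "digitsByShow" $ nf (map digitsByShow) inputs
  ]
 where
  inputs = [3 .. 100000] :: [Int]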
Let's investigate why @No_signal's solution is faster.
I made three runs of ghc:
ghc -O2 -ddump-simpl digits.hs >digits.txt
ghc -O2 -ddump-simpl digitsV2.hs >digitsV2.txt
ghc -O2 -ddump-simpl show.hs >show.txt
digits.hs

digits :: (Integral a) => a -> [Int]
digits x
  | x < 10    = [fromIntegral x]
  | otherwise = let (q, r) = x `quotRem` 10 in (fromIntegral r) : (digits q)

main = return $ digits 1

digitsV2.hs

digitsV2 :: (Integral a) => a -> [Int]
digitsV2 n = go n []
 where
  go x xs
    | x < 10    = fromIntegral x : xs
    | otherwise = let (q, r) = x `quotRem` 10 in go q (fromIntegral r : xs)

main = return $ digitsV2 1

show.hs

import Data.Char

digitsByShow :: (Integral a, Show a) => a -> [Int]
digitsByShow = map (\x -> ord x - ord '0') . show

main = return $ digitsByShow 1
If you'd like to view the complete txt files, I placed them on ideone (rather than paste a 10000 char dump here):
digits.txt
digitsV2.txt
show.txt
If we carefully look through digits.txt, it appears that this is the relevant section:
lvl_r1qU = __integer 10
Rec {
Main.$w$sdigits [InlPrag=[0], Occ=LoopBreaker]
:: Integer -> (# Int, [Int] #)
[GblId, Arity=1, Str=DmdType <S,U>]
Main.$w$sdigits =
\ (w_s1pI :: Integer) ->
case integer-gmp-1.0.0.0:GHC.Integer.Type.ltInteger#
w_s1pI lvl_r1qU
of wild_a17q { __DEFAULT ->
case GHC.Prim.tagToEnum# # Bool wild_a17q of _ [Occ=Dead] {
False ->
let {
ds_s16Q [Dmd=<L,U(U,U)>] :: (Integer, Integer)
[LclId, Str=DmdType]
ds_s16Q =
case integer-gmp-1.0.0.0:GHC.Integer.Type.quotRemInteger
w_s1pI lvl_r1qU
of _ [Occ=Dead] { (# ipv_a17D, ipv1_a17E #) ->
(ipv_a17D, ipv1_a17E)
} } in
(# case ds_s16Q of _ [Occ=Dead] { (q_a11V, r_X12h) ->
case integer-gmp-1.0.0.0:GHC.Integer.Type.integerToInt r_X12h
of wild3_a17c { __DEFAULT ->
GHC.Types.I# wild3_a17c
}
},
case ds_s16Q of _ [Occ=Dead] { (q_X12h, r_X129) ->
case Main.$w$sdigits q_X12h
of _ [Occ=Dead] { (# ww1_s1pO, ww2_s1pP #) ->
GHC.Types.: # Int ww1_s1pO ww2_s1pP
}
} #);
True ->
(# GHC.Num.$fNumInt_$cfromInteger w_s1pI, GHC.Types.[] # Int #)
}
}
end Rec }
digitsV2.txt:
lvl_r1xl = __integer 10
Rec {
Main.$wgo [InlPrag=[0], Occ=LoopBreaker]
:: Integer -> [Int] -> (# Int, [Int] #)
[GblId, Arity=2, Str=DmdType <S,U><L,U>]
Main.$wgo =
\ (w_s1wh :: Integer) (w1_s1wi :: [Int]) ->
case integer-gmp-1.0.0.0:GHC.Integer.Type.ltInteger#
w_s1wh lvl_r1xl
of wild_a1dp { __DEFAULT ->
case GHC.Prim.tagToEnum# # Bool wild_a1dp of _ [Occ=Dead] {
False ->
case integer-gmp-1.0.0.0:GHC.Integer.Type.quotRemInteger
w_s1wh lvl_r1xl
of _ [Occ=Dead] { (# ipv_a1dB, ipv1_a1dC #) ->
Main.$wgo
ipv_a1dB
(GHC.Types.:
# Int
(case integer-gmp-1.0.0.0:GHC.Integer.Type.integerToInt ipv1_a1dC
of wild2_a1ea { __DEFAULT ->
GHC.Types.I# wild2_a1ea
})
w1_s1wi)
};
True -> (# GHC.Num.$fNumInt_$cfromInteger w_s1wh, w1_s1wi #)
}
}
end Rec }
I actually couldn't find the relevant section for show.txt. I'll work on that later.
Right off the bat, digitsV2.hs produces shorter code. That's probably a good sign for it.
digits.hs seems to be following this pseudocode:

def digits(w_s1pI):
    if w_s1pI < 10: return [fromInteger(w_s1pI)]
    else:
        ds_s16Q = quotRem(w_s1pI, 10)
        q_X12h = ds_s16Q[0]
        r_X12h = ds_s16Q[1]
        wild3_a17c = integerToInt(r_X12h)
        ww1_s1pO = r_X12h
        ww2_s1pP = digits(q_X12h)
        ww2_s1pP.pushFront(ww1_s1pO)
        return ww2_s1pP
digitsV2.hs seems to be following this pseudocode:

def digitsV2(w_s1wh, w1_s1wi=[]):  # actually disguised as go(), as @No_signal wrote
    if w_s1wh < 10:
        w1_s1wi.pushFront(fromInteger(w_s1wh))
        return w1_s1wi
    else:
        ipv_a1dB, ipv1_a1dC = quotRem(w_s1wh, 10)
        w1_s1wi.pushFront(integerToInt(ipv1_a1dC))
        return digitsV2(ipv_a1dB, w1_s1wi)
It might not be that these functions mutate lists like my pseudocode suggests, but this immediately suggests something: it looks as if digitsV2 is fully tail-recursive, whereas digits is not (it may have to use some Haskell trampoline or something). It appears as if Haskell needs to hold on to all the remainders in digits before it can push them onto the front of the list, whereas in digitsV2 it can just push them and forget about them. This is purely speculation, but it is well-founded speculation.

How to abstract non-linear list iteration schemes into a re-usable algorithm?

On the odd chance that someone has a brilliant idea...
I am not sure if there is a good way to generalize that.
EDIT: I think it might be nice to explain exactly what the inputs and outputs are. The code below is only how I approached the solution.
Inputs: data, recipe
data: a set of (string, string list) pairs, here also called a "set of named lists"
recipe: a list of commands
Command Print (literal | list reference)
Adds the literal to the output, or, if it is a list reference, adds the head of the referenced list to the output.
Command While (list reference)
When the referenced list is not empty --> next command.
When the referenced list is empty --> skip the entries in the recipe list past the matching Wend.
Command Wend (list reference)
Replace the referenced list with tail (referenced list).
When the referenced list is empty, continue with the next command.
When the referenced list is not empty, the next command is the matching While above.
Outputs: string list
The best answer is the shortest implementation that allows the algorithm to be re-used in new contexts.
This is not just a programming problem for the fun of it, by the way. It is basically what happens if you try to implement data-driven text templating.
The code below is my attempt to solve this problem.
The first code snippet is a non-generalized solution.
The second code snippet is an attempt to isolate the algorithm.
If you play with the code, simply paste the second snippet below the first snippet and both versions are working.
The whole topic is about understanding better how to separate the iteration algorithm from the rest of the code and then simply apply it, in contrast to having all the other code mixed in.
Would it not be great if there were a way to abstract the way the statements are processed and the looping of the while/wend, such that it can be reused in my main code, just as I keep re-using other "iteration schemes", such as List.map?
The commonalities between my main code and this study are:
An evolving "environment" which is threaded through all steps of the computation.
Collections, which need to be iterated in a well-formed nested manner. (Malformed would be: while x while y wend x wend y)
A series of "execution steps" forms the body of each of those "while wend" loops.
Done in a "pure" manner. As you will note, nothing is mutable in the study. I want to keep it that way.
Each "While" introduces a new scope (as for binding values), which is discarded again, once the while loop is done.
So, I am looking for something like:
run: CommandClassifier -> commandExecutor -> Command list -> EnvType -> EnvType
where CommandClassifier could be a function of the form Command -> NORMAL|LOOP_START|LOOP_END
and commandExecutor: Command -> EnvType -> EnvType.
Of course, nesting of those while-blocks would not be limited to 2 (I just tried to keep testProgram() small).
Side note: the "commands list" is an AST from a preceding parser run, but that should not really matter.
type MiniLanguage =
    | Print of string
    | While of string
    | Wend of string

let testProgram =
    [ Print("Hello, I am your Mini Language program")
      While("names")
      Print("<names>")
      While("pets")
      Print("<pets>")
      Wend("pets")
      Print("Done with pets.")
      Wend("names")
      Print("Done with names.")
    ]
type MiniEnvironment = { Bindings : Map<string, string>; Collections : Map<string, string list> }

let init collections =
    { Bindings = Map.empty; Collections = Map.ofList collections }

let bind n v env =
    let newBindings =
        env.Bindings
        |> Map.remove n
        |> Map.add n v
    { env with Bindings = newBindings; }

let unbind n env =
    { env with Bindings = Map.remove n env.Bindings; }

let bindingValue n env =
    if env.Bindings.ContainsKey n then
        Some(env.Bindings.Item n)
    else
        None

let consumeFirstFromCollection n env =
    if env.Collections.ContainsKey n then
        let coll = env.Collections.Item n
        match coll with
        | [] -> env |> unbind n
        | _ ->
            let first = coll.Head
            let newCollections =
                env.Collections
                |> Map.remove n
                |> Map.add n coll.Tail
            { env with Collections = newCollections }
            |> bind n first
    else failwith ("Unknown collection: " + n)
// All do functions take env - the execution environment - as last argument.
// All do functions return (a new) env as single return parameter.
let rec doPrint (s : string) env =
    if s.StartsWith("<") && s.EndsWith(">") then
        match bindingValue (s.Substring(1, s.Length - 2)) env with
        | Some(v) -> v
        | _ -> s
    else s
    |> printfn "%s"
    env
let rec skipPastWend name code =
    match code with
    | (Wend(cl) :: rest) when cl = name -> rest
    | [] -> failwith "No Wend found!"
    | (_ :: rest) -> skipPastWend name rest

let rec doWhileX name code env =
    match code with
    | (Print(s) :: rest) -> env |> (doPrint s) |> doWhileX name rest
    | (While(cn) :: rest) -> env |> doWhile cn rest |> ignore; env |> doWhileX name (skipPastWend cn rest)
    | (Wend(cn) :: rest) when cn = name -> env
    | [] -> failwith ("While without Wend for: " + name)
    | _ -> failwith ("nested while refering to same collection!")

and doWhile name code env =
    let e0 = env |> consumeFirstFromCollection name
    match bindingValue name e0 with
    | Some(s) ->
        e0 |> doWhileX name code |> doWhile name code
    | None -> env
let rec run (program : MiniLanguage list) env =
    match program with
    | (Print(s) :: rest) -> env |> (doPrint s) |> run rest
    | (While(cn) :: rest) ->
        env
        |> doWhile cn rest |> ignore
        env |> run (skipPastWend cn program)
    | (Wend(cn) :: rest) -> failwith "wend found in run()"
    | [] -> env

let test() =
    init [ "names", ["name1"; "name2"; "name3"; ]; "pets", ["pet1"; "pet2"] ]
    |> run testProgram
    |> printfn "%A"
(*
Running test() yields:
Hello, I am your Mini Language program
name1
pet1
pet2
Done with pets.
name2
pet1
pet2
Done with pets.
name3
pet1
pet2
Done with pets.
Done with names.
{Bindings = map [];
Collections =
map [("names", ["name1"; "name2"; "name3"]); ("pets", ["pet1"; "pet2"])];}
*)
Here is my first attempt at isolating the algorithm. The number of callbacks is not entirely pretty. Can anyone come up with something simpler?
// The only function I had to "modify" to work with the new "generalized" algorithm.
let consumeFirstFromCollection1 n env =
    if env.Collections.ContainsKey n then
        let coll = env.Collections.Item n
        match coll with
        | [] -> (env |> unbind n , false)
        | _ ->
            let first = coll.Head
            let newCollections =
                env.Collections
                |> Map.remove n
                |> Map.add n coll.Tail
            ({ env with Collections = newCollections }
             |> bind n first , true)
    else failwith ("Unknown collection: " + n)

type NamedList<'n,'t when 'n : comparison> = 'n * List<'t>
type Action<'a,'c> = 'c -> 'a -> 'a
type LoopPreparer<'a,'c> = 'c -> 'a -> 'a * bool
type CommandType = | RUN | BEGIN | END
type CommandClassifier<'c> = 'c -> CommandType
type Skipper<'c> = 'c -> List<'c> -> List<'c>

type InterpreterContext<'a,'c> =
    { classifier : CommandClassifier<'c>
      executor : Action<'a,'c>
      skipper : Skipper<'c>
      prepareLoop : LoopPreparer<'a,'c>
      isMatchingEnd : 'c -> 'c -> bool
    }

let interpret (context : InterpreterContext<'a,'c>) (program : 'c list) (env : 'a) : 'a =
    let rec loop front (code : 'c list) e =
        let e0, hasData = e |> context.prepareLoop front
        if hasData
        then
            e0
            |> loop1 front (code)
            |> loop front (code)
        else e
    and loop1 front code e =
        match code with
        | x :: more when (context.classifier x) = RUN ->
            //printfn "RUN %A" x
            e |> context.executor x |> loop1 front more
        | x :: more when (context.classifier x) = BEGIN ->
            //printfn "BEGIN %A" x
            e |> loop x more |> ignore
            e |> loop1 front (context.skipper x more)
        | x :: more when (((context.classifier x) = END) && (context.isMatchingEnd front x)) ->
            //printfn "END %A" x
            e
        | [] -> failwith "No END."
        | _ -> failwith "TODO: Not sure which case this is. But it is not a legal one!"
    let rec interpr code e =
        match code with
        | [] -> e
        | (first :: rest) ->
            match context.classifier first with
            | RUN -> env |> context.executor first |> interpr rest
            | BEGIN ->
                e |> loop first rest |> ignore
                e |> interpr (context.skipper first rest)
            | END -> failwith "END without BEGIN."
    interpr program env
let test1() =
    let context : InterpreterContext<MiniEnvironment, MiniLanguage> =
        { classifier = fun c -> match c with | MiniLanguage.Print(_) -> RUN | MiniLanguage.While(_) -> BEGIN | MiniLanguage.Wend(_) -> END;
          executor = fun c env -> match c with | Print(s) -> doPrint s env | _ -> failwith "Not a known command.";
          skipper = fun c cl -> match c with | While(n) -> skipPastWend n cl | _ -> failwith "first arg of skipper SHALL be While!"
          prepareLoop = fun c env -> match c with | While(n) -> (consumeFirstFromCollection1 n env) | _ -> failwith "first arg of skipper SHALL be While!"
          isMatchingEnd = fun cwhile cx -> match cwhile, cx with | (While(n), Wend(p)) when n = p -> true | _ -> false
        }
    init [ "names", ["name1"; "name2"; "name3"; ]; "pets", ["pet1"; "pet2"] ]
    |> interpret context testProgram
    |> printfn "%A"

Parsing a list of tokens into an expression tree

I want to parse expressions like those in typical Haskell source. I get an input stream, which is already tokenized and annotated with fixity and precedence. The set of operators is not known at compile time and may be arbitrary. The output should be a tree representing the expression. Here's a bit of what I tried:
-- A single token of the input stream
data Token a
  = Prefix a
  | Infix a Int Fixity -- The Int parameter represents the precedence
  | LBrace
  | RBrace
  deriving (Show, Eq)

data Fixity
  = InfixL
  | InfixR
  | InfixC
  deriving (Show, Eq)

data Expression a
  = Atom a
  | Apply a a
  deriving Show

-- Wrapped into Either, if the expression is malformed.
exprToTree :: [Token a] -> Either String (Expression a)
exprToTree = undefined
For the sake of simplicity, I don't treat lambdas specially; they are just atoms.
But now I am completely lost. How can I convert the stream of atoms into a tree? Can somebody please point me to an algorithm or help me find one?
In a nutshell then, even though you have a token list, you still need a parser.
Parsec can handle alternative token streams, but you'll probably have to refer to the manual - a PDF available at Daan Leijen's "legacy" home page - http://legacy.cs.uu.nl/daan/download/parsec/parsec.pdf. You can roll your own parser without using a combinator library, but you will be re-implementing some fraction of Parsec. As far as I remember, UU_parsing expects to work with a separate scanner, so it's another option.
Although it doesn't handle parsing you might find Lennart Augustsson's "Lambda calculus cooked four ways" helpful for other things - http://www.augustsson.net/Darcs/Lambda/top.pdf
Edit - here is a partly worked-out plan of how you can go about it with Parsec; for details you'll have to consult section 2.11 of the manual.
Suppose you have this data type for concrete "internal" tokens:
data InternalTok = Ident String
                 | BinOpPlus
                 | BinOpMinus
                 | UnaryOpNegate
                 | IntLiteral Int
  deriving (Show)
Then you end up with these types for the Parsec token and parser:
type MyToken = Token InternalTok
type MyParser a = GenParser MyToken () a
Define a helper function as per the Parsec manual - this handles show and pos so individual definitions are shorter; cf. the mytoken function on page 19.
mytoken :: (MyToken -> Maybe a) -> MyParser a
mytoken test = token showToken posToken testToken
  where
    showToken tok = show tok
    posToken tok  = no_pos
    testToken tok = test tok
For the moment your token type does not track source position, so:
no_pos :: SourcePos
no_pos = newPos "" 0 0 0
For each terminal you have to define a token function:
identifier :: MyParser MyToken
identifier = mytoken (\tok -> case tok of
    a@(Prefix (Ident _)) -> Just a
    _ -> Nothing)

intLiteral :: MyParser MyToken
intLiteral = mytoken (\tok -> case tok of
    a@(Prefix (IntLiteral _)) -> Just a
    _ -> Nothing)

binPlus :: MyParser MyToken
binPlus = mytoken (\tok -> case tok of
    a@(Infix BinOpPlus _ _) -> Just a
    _ -> Nothing)

binMinus :: MyParser MyToken
binMinus = mytoken (\tok -> case tok of
    a@(Infix BinOpMinus _ _) -> Just a
    _ -> Nothing)

unaryNegate :: MyParser MyToken
unaryNegate = mytoken (\tok -> case tok of
    a@(Prefix UnaryOpNegate) -> Just a
    _ -> Nothing)
Edit - to handle custom infix operators you'll need these token parsers:
tokInfixL :: Int -> MyParser MyToken
tokInfixL n = mytoken $ \tok -> case tok of
    a@(Infix _ i InfixL) | i == n -> Just a
    _ -> Nothing

tokInfixR :: Int -> MyParser MyToken
tokInfixR n = mytoken $ \tok -> case tok of
    a@(Infix _ i InfixR) | i == n -> Just a
    _ -> Nothing

tokInfixC :: Int -> MyParser MyToken
tokInfixC n = mytoken $ \tok -> case tok of
    a@(Infix _ i InfixC) | i == n -> Just a
    _ -> Nothing

tokPrefix :: MyParser MyToken
tokPrefix = mytoken (\tok -> case tok of
    a@(Prefix _) -> Just a
    _ -> Nothing)
Now you can define the parser - you need to fix the number of levels of precedence beforehand; there is no way around that, as you need to code a parser for each level.
The top-level expression parser simply calls the highest-precedence parser:
pExpression :: Parser Expression
pExpression = expression10
For each precedence level you need a parser roughly like this; you'll have to work out non-assoc for yourself. Also you might need to do some work on chainl / chainr - I've only written a parser in this style with UU_Parsing, so it might be slightly different for Parsec. Note Apply is usually at the highest precedence level.
expression10 :: Parser Expression
expression10 =
        Apply <$> identifier <*> pExpression
    <|> Prefix <$> tokPrefix <*> pExpression
    <|> chainl (Infix <$> tokInfixL 10) expression9
    <|> chainr (Infix <$> tokInfixR 10) expression9

expression9 :: Parser Expression
expression9 =
        Prefix <$> tokPrefix <*> pExpression
    <|> chainl (Infix <$> tokInfixL 9) expression8
    <|> chainr (Infix <$> tokInfixR 9) expression8

...
You'll have to extend your syntax to handle IntLiterals and Identifiers which are at level 0 in precedence:
expression0 :: Parser Expression
expression0 =
        IntLit <$> intLiteral
    <|> Ident <$> identifier
    <|> ...
Edit - for unlimited precedence - if you only have application and Atom, maybe something like this would work. Note you'll have to change the tokInfixL and tokInfixR parsers to no longer match on the precedence level, and you may have to experiment with the order of the alternatives.
expression :: Parser Expression
expression =
        Apply <$> identifier <*> expression
    <|> Prefix <$> tokPrefix <*> expression
    <|> chainl (Infix <$> tokInfixL) expression
    <|> chainr (Infix <$> tokInfixR) expression
    <|> intLiteral
    <|> identifier

intLiteral :: Parser Expression
intLiteral = Atom . convert <$> intLiteral
  where
    convert = ??

identifier :: Parser Expression
identifier = Atom . convert <$> intLiteral
  where
    convert = ??
After searching the web for another topic, I found this nice piece of code that does exactly what I want. Have a look:
data Op     = Op String Prec Fixity deriving Eq
data Fixity = Leftfix | Rightfix | Nonfix deriving Eq
data Exp    = Var Var | OpApp Exp Op Exp deriving Eq
type Prec   = Int
type Var    = String

data Tok = TVar Var | TOp Op

-- Parsing: precedence climbing over the token list
parse :: [Tok] -> Exp
parse (TVar x : rest) = fst (parse1 (Var x) (-1) Nonfix rest)

parse1 :: Exp -> Int -> Fixity -> [Tok] -> (Exp, [Tok])
parse1 e p f [] = (e, [])
parse1 e p f inp@(TOp op@(Op _ p' f') : TVar x : rest)
  | p' == p && (f /= f' || f == Nonfix)
  = error "ambiguous infix expression"
  | p' < p || p' == p && (f == Leftfix || f' == Nonfix)
  = (e, inp)
  | otherwise
  = let (r, rest') = parse1 (Var x) p' f' rest in
    parse1 (OpApp e op r) p f rest'

-- Printing
instance Show Exp where
  showsPrec _ (Var x) = showString x
  showsPrec p e@(OpApp l (Op op _ _) r) =
    showParen (p > 0) $ showsPrec 9 l . showString op . showsPrec 9 r

-- Testing
plus   = TOp (Op "+" 6 Leftfix)
times  = TOp (Op "*" 7 Leftfix)
divide = TOp (Op "/" 7 Leftfix)
gt     = TOp (Op ">" 4 Nonfix)
ex     = TOp (Op "^" 8 Rightfix)

lookupop '+' = plus
lookupop '*' = times
lookupop '/' = divide
lookupop '>' = gt
lookupop '^' = ex

fromstr [x]         = [TVar [x]]
fromstr (x : y : z) = TVar [x] : lookupop y : fromstr z

test1 = fromstr "a+b+c"
test2 = fromstr "a+b+c*d"
test3 = fromstr "a/b/c"
test4 = fromstr "a/b+c"
test5 = fromstr "a/b*c"
test6 = fromstr "1^2^3+4"
test7 = fromstr "a/1^2^3"
test8 = fromstr "a*b/c"
(I took it from this page: http://hackage.haskell.org/trac/haskell-prime/attachment/wiki/FixityResolution/resolve.hs)
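One possible (hypothetical) way to try it out is to parse and print each of the bundled test inputs; this relies only on the definitions above:

main :: IO ()
main = mapM_ (print . parse)
             [test1, test2, test3, test4, test5, test6, test7, test8]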
