Packrat caching: right to left vs. left to right? - algorithm

I'm currently trying to familiarize myself with packrat parsing. I've read the 2002 paper, and section 2.3 describes packrat caching as a preliminary process (which occurs before the actual parsing) in which a full caching table is pre-constructed by reading the input from right to left. Only then can the actual linear parsing from left to right start.
But in every PEG parser implementation I found, the "cache" option is a caching process that occurs during the actual left-to-right parsing. For example, here.
Is there any difference between both approaches?
Thank you.

I recently worked on similar research, ran into exactly the same confusion, and resolved it. Regardless of whether you are still working on this topic, here's my answer.
Your understanding is correct:
A packrat parser scans the input string from left to right
A packrat parser constructs the cache from right to left
But there is just one approach, not two. Let's use a simple Parsing Expression Grammar (PEG) without left recursion as an example: E -> Num + E | Num
(Note that a left-recursive example requires another long explanation; you can refer to CPython's implementation for details.)
The Syntax Directed Translation (SDT) will be something like:
E -> a=Num + b=E { a + b }
E -> Num { Num }
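And we can write a parse_E function as below:
tokens = ['2', '+', '3', '+', '4']  # the input "2 + 3 + 4", tokenized
cache = {'parse_E': {}, 'parse_Char': {}}

def parse_Char(idx):
    # Read one token ('' past the end); memoized like every other rule
    if idx not in cache['parse_Char']:
        tok = tokens[idx] if idx < len(tokens) else ''
        cache['parse_Char'][idx] = tok, idx + 1
    return cache['parse_Char'][idx]

def parse_E(idx):
    # Each (rule, position) result is computed at most once
    if idx in cache['parse_E']:
        return cache['parse_E'][idx]
    lval, nidx = parse_Char(idx)
    if not lval.isdigit():
        result = None  # E must start with a Num; failures are cached too
    elif parse_Char(nidx)[0] == '+' and parse_E(nidx + 1):
        # E -> Num + E  (the recursive result now sits in the cache)
        rval, nnnidx = cache['parse_E'][nidx + 1]
        result = int(lval) + rval, nnnidx
    else:
        # E -> Num  (ordered choice: tried only if the first alternative fails)
        result = int(lval), nidx
    cache['parse_E'][idx] = result
    return result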
According to Bryan Ford's paper, the parser needs to scan the input string from left to right and construct the cache at every position:
for idx in range(len(tokens)):
    parse_E(idx)
    parse_Char(idx)
So, let's look at the cache construction under the hood. Initially, we have an empty cache and the input string:
cache: {'parse_E': {}, 'parse_Char': {}}
input string: `2 + 3 + 4`
The function calls happen in the following order when idx=0. Clearly, the cache is already constructed from right to left at position 0 (before we even get to idx=1 or above).
parse_Char(Y) happens earlier than parse_Char(X) (Y > X)
parse_Char(X) must happen earlier than parse_E(X)
parse_E(0) --- (E -> Num + E) (pending)
-> parse_Char(0) --- 2 (pending)
-> parse_Char(1) --- + (pending)
-> parse_E(2) --- E (E -> Num + E) (pending)
-> parse_Char(2) --- 3 (pending)
-> parse_Char(3) --- + (pending)
-> parse_E(4) --- E (E -> Num) (pending)
-> parse_Char(4) --- 4 (acc)
# Only after parse_Char(4) succeeds and is filled into the cache can parse_E(4) succeed... and so on.
If you want to read a full Python example of a packrat parser implementation, you can check my repository. It contains a handmade packrat parser and a packrat parser generated by CPython's PEG generator from a simple meta-grammar.
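To make the right-to-left fill order visible, here is a minimal memoized sketch of the same grammar (E -> Num + E / Num) that records the position of each parse_E entry at the moment it is stored. The names memo and fill_order are hypothetical, introduced only for this illustration.

```python
tokens = ['2', '+', '3', '+', '4']   # "2 + 3 + 4", tokenized
memo = {}                            # (rule, position) -> result or None
fill_order = []                      # positions, in the order their E entry was stored

def parse_E(idx):
    if ('E', idx) in memo:
        return memo[('E', idx)]
    tok = tokens[idx] if idx < len(tokens) else ''
    if not tok.isdigit():
        result = None                                    # no Num here: E fails
    elif idx + 1 < len(tokens) and tokens[idx + 1] == '+' and parse_E(idx + 2):
        rval, end = memo[('E', idx + 2)]                 # E -> Num + E
        result = (int(tok) + rval, end)
    else:
        result = (int(tok), idx + 1)                     # E -> Num
    memo[('E', idx)] = result
    fill_order.append(idx)
    return result

print(parse_E(0))    # (9, 5)
print(fill_order)    # [4, 2, 0]
```

The call starts at position 0, but the entries are completed from the rightmost position inward, which is the same order a right-to-left table construction would use.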
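For comparison with the question's reading of section 2.3, here is a sketch of the table-filling variant: every position is computed eagerly, iterating from right to left, so each entry only looks up entries to its right that are already filled. The flat table layout is an assumption for illustration; the lazy memoized parser above computes a subset of these same entries (only positions 0, 2 and 4 are reached from position 0).

```python
tokens = ['2', '+', '3', '+', '4']
n = len(tokens)
table = [None] * (n + 1)   # table[i] = result of E at position i (None = failure)

for i in range(n, -1, -1):               # right to left, including end of input
    tok = tokens[i] if i < n else ''
    if not tok.isdigit():
        table[i] = None
    elif i + 1 < n and tokens[i + 1] == '+' and table[i + 2] is not None:
        rval, end = table[i + 2]         # E -> Num + E, reusing an entry to the right
        table[i] = (int(tok) + rval, end)
    else:
        table[i] = (int(tok), i + 1)     # E -> Num

print(table)   # [(9, 5), None, (7, 5), None, (4, 5), None]
```

Both variants store the same result for each (rule, position) pair they compute; the difference is only whether entries nobody needs are computed too.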

Related

Representation of a quadtree string

I am given a string in this form ppeeefpffeefe.
Values:
p represents a parent node
e represents an empty node
f represents a full node
An image that represents this string can be seen here: https://i.stack.imgur.com/GZppc.png
I am writing code in Haskell and trying to convert this representation into a 1024-element list of integers, where 1 represents a black (full) pixel and 0 a white (empty) pixel, assuming the image size is 32x32 pixels.
This is the code I have, but Haskell is giving me trouble. I know that I need to keep track of how many parent nodes I have visited and update the highest level that way. I am trying to take a DFS approach, but anything that does the job will help.
getQuad :: String -> Int -> Int -> Int -> [Int] -> [Int]
getQuad tree level highestLevel pCount result
  | (node == 'p') = result ++ (getQuad (drop 1 tree) (level+1) level 0 result)
  | (node == 'e') = result ++ (getQuad (drop 1 tree) level highestLevel pCount (result ++ (take (getAmount level) [0,0..])))
  | (node == 'f') = result ++ (getQuad (drop 1 tree) level highestLevel pCount (result ++ (take (getAmount level) [1,1..])))
  | otherwise = result
  where
    node = g

getNodeValue :: String -> Char
getNodeValue tree = if (length tree > 0) then tree !! 0 else 'x'

getAmount :: Int -> Int
getAmount l = 1024 `div` (4^l)
Thank you!
I think you're trying to do way too much in a single function. I recommend starting over, and explicitly introducing separate parsing phases (to convert your String to an ADT representing it) and production phases (to convert a value of the ADT to a list of Ints). For example, a suitable ADT might look like this:
data QuadTree = Parent QuadTree QuadTree QuadTree QuadTree
              | Empty
              | Full
              deriving (Eq, Ord, Read, Show)
There are various techniques and libraries for parsing. Given your apparent level of expertise, and the simplicity of the format, I think I might recommend starting by writing the parser by hand and ignoring error-handling. Later you could think about learning about error-handling tools like Maybe or Either and parsing combinator libraries like parsec and friends to make it more flexible to changes in the language.
So, by hand and ignoring error-handling. Here's the skeleton I would put in place and try to fill out. Our parser needs to not just consume a String, but also be able to consume just part of a String and say what's left over: when handling a nested parent node, we need to return to the outer parent the chunk of the string that the inner parent didn't consume. So:
parseQuadTree :: String -> (String, QuadTree)
parseQuadTree ('p':rest) = -- TODO: exercise for the reader
parseQuadTree ('e':rest) = (rest, Empty)
parseQuadTree ('f':rest) = (rest, Full)
parseQuadTree other = error $ "parsing failed, expected a p, e, or f, but got " ++ other ++ " instead"
For example, we might expect the following ghci exchanges once we'd finished this function:
> parseQuadTree "e"
("", Empty)
> parseQuadTree "eef"
("ef", Empty)
> parseQuadTree "peeeeef"
("ef", Parent Empty Empty Empty Empty)
Once you have that, then I'd try to cook up a sensible representation of the 2d result. Perhaps a nested list would do:
type Image = [[Int]]
For example, you might interpret each element of the outer list as a row of the image; its elements are the columns of that row. The three basic operations you need for this thing are pasting images side-by-side horizontally and vertically and creating a blank image.
hcat, vcat :: Image -> Image -> Image
hcat = -- TODO: exercise for the reader
vcat = -- TODO: exercise for the reader
blank :: Int -> Int -> Int -> Image
blank w h pixel = -- TODO: exercise for the reader
-- OR, you could take just one size argument; we only ever need
-- square blank images in the following code
For example, you might expect these ghci exchanges once we'd finished implementing them:
> :set +m
> let x = [[0, 1]
| ,[2, 3]
| ]
| y = [[4, 5]
| ,[6, 7]
| ]
|
> hcat x y
[[0,1,4,5],[2,3,6,7]]
> vcat x y
[[0,1],[2,3],[4,5],[6,7]]
> blank 2 3 4
[[4,4],[4,4],[4,4]]
Now you can write a function which converts a QuadTree to an Image. We'll have to know how big the image is supposed to be, so let's make that an argument to the function.
renderQuadTree :: Int -> QuadTree -> Image
renderQuadTree size (Parent nw ne sw se) = -- TODO: exercise for the reader; use hcat and vcat
  where subtreeSize = size `div` 2
renderQuadTree size Empty = blank size size 0
renderQuadTree size Full = blank size size 1
For example, we might expect some such exchanges at ghci once this is finished:
> renderQuadTree 2 Empty
[[0,0],[0,0]]
> renderQuadTree 2 Full
[[1,1],[1,1]]
> renderQuadTree 2 (Parent Empty Full Full Empty)
[[0,1],[1,0]]
> renderQuadTree 4 (Parent Empty (Parent Full Empty Empty Full) Empty Full)
[[0,0,1,0],[0,0,0,1],[0,0,1,1],[0,0,1,1]]
Finally we could make a top-level function that combines all these into one convenient piece.
getQuad :: String -> [Int]
getQuad s = case parseQuadTree s of
  ("", t) -> concat (renderQuadTree 32 t)
  (s', _) -> error $ "parser did not consume the entire description string, leftovers are: " ++ s'
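As a cross-check for the Haskell exercises, here is the same parse-then-render pipeline sketched in Python rather than Haskell. Assumptions: a Parent's four children are read in the nw, ne, sw, se order of the Parent constructor above (the quadrant order in the linked image is not confirmed), a tree is 'e', 'f', or a list of four subtrees, and all function names are hypothetical.

```python
from typing import List, Tuple, Union

Tree = Union[str, list]      # 'e', 'f', or [nw, ne, sw, se]
Image = List[List[int]]      # rows of pixels

def parse_quadtree(s: str) -> Tuple[str, Tree]:
    """Parse one node from the front of s; return (leftover, tree)."""
    head, rest = s[0], s[1:]           # IndexError on empty input (no error handling)
    if head in 'ef':
        return rest, head
    if head == 'p':
        children = []
        for _ in range(4):             # each child consumes its own chunk
            rest, child = parse_quadtree(rest)
            children.append(child)
        return rest, children
    raise ValueError(f"expected p, e, or f, got {head!r}")

def hcat(x: Image, y: Image) -> Image:
    return [a + b for a, b in zip(x, y)]   # side by side, row by row

def vcat(x: Image, y: Image) -> Image:
    return x + y                            # one on top of the other

def blank(size: int, pixel: int) -> Image:
    return [[pixel] * size for _ in range(size)]

def render(size: int, tree: Tree) -> Image:
    if tree == 'e':
        return blank(size, 0)
    if tree == 'f':
        return blank(size, 1)
    nw, ne, sw, se = (render(size // 2, child) for child in tree)
    return vcat(hcat(nw, ne), hcat(sw, se))

def get_quad(s: str) -> List[int]:
    rest, tree = parse_quadtree(s)
    if rest:
        raise ValueError(f"leftover input: {rest!r}")
    return [pixel for row in render(32, tree) for pixel in row]
```

For the question's string, get_quad("ppeeefpffeefe") yields 1024 values, 448 of them black under the assumed quadrant order.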

Comparing letter occurrence with comparator

I am having a problem sorting a String array based on which strings contain the most occurrences of the letter 'p'; i.e., s1 (apple) should come before s2 (ape)...
I am learning how to implement a Comparator to do this, using s1.compareTo(s2) and a lambda. The big question is: can't I somehow use a stream to do this?
This is how I did it for my String array COUNTRIES, sorting in reverse alphabetical order:
Comparator<String> reverseAlphabetic = (s1,s2) -> -s1.compareToIgnoreCase(s2);
Arrays.sort(COUNTRIES,reverseAlphabetic);
System.out.println("\nCountries in reverse alphabetic order");
for (int i = 0; i < 10; i++)
    System.out.println("\t" + COUNTRIES[i]);
You can do something like this (this original suggestion was struck out by its author; see below):
Comparator<String> comparator = (str1, str2) ->
((str1.length() - str1.replaceAll("p", "").length()) -
(str2.length() - str2.replaceAll("p", "").length()));
List<String> list = Arrays.asList("ape", "apple", "appple");
list.sort(comparator);
Actually, my solution had fundamental mistakes. Holger posted a solution in a comment on my deleted answer:
list.sort(Comparator
.comparingLong((String s) -> s.chars().filter(c -> c == 'p').count()).reversed());
Comment by Holger:
Your first variant is broken too, as a - b - c - d is not the same as a - b - (c - d). It just produces the desired result by accident; a different original order will lead to different results. Yet another reason not to use minus when you mean, e.g., Integer.compare(…, …). A correct and efficient comparator can be as simple as Comparator.comparingLong(s -> s.chars().filter(c -> c=='p').count()) (or, to have the most occurrences first, Comparator.comparingLong((String s) -> s.chars().filter(c -> c=='p').count()).reversed()), though an old-fashioned counting loop would be even better.
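The underlying idea of the fix, sorting by an extracted key instead of writing a subtraction-based comparator, is not Java-specific. As an illustration only (Python, not Java), here is the same ordering with the "old-fashioned counting loop" as the key function; count_p is a hypothetical name.

```python
def count_p(s: str) -> int:
    # Plain counting loop: number of 'p' characters in s
    n = 0
    for ch in s:
        if ch == 'p':
            n += 1
    return n

words = ["ape", "apple", "appple"]
by_most_p = sorted(words, key=count_p, reverse=True)   # most 'p's first
print(by_most_p)   # ['appple', 'apple', 'ape']
```

Because the key is compared directly, there is no arithmetic on comparison results, so the outcome does not depend on the input order beyond the usual stability of equal keys.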

Resolving PREDICT/PREDICT conflicts in LL(1)

I'm working on a simple LL(1) parser generator, and I've run into an issue with PREDICT/PREDICT conflicts given certain input grammars. For example, given an input grammar like:
E → E + E
| P
P → 1
I can remove the left recursion from E, replacing it with a roughly equivalent right-recursive rule, and arrive at the grammar:
E → P E'
E' → + E E'
| ε
P → 1
Next, I can compute the relevant FIRST and FOLLOW sets for the grammar, and end up with the following:
FIRST(E) = { 1 }
FIRST(E') = { +, ε }
FIRST(P) = { 1 }
FOLLOW(E) = { +, EOF }
FOLLOW(E') = { +, EOF }
FOLLOW(P) = { +, EOF }
And finally, using PREDICT(A → α) = { FIRST(α) - ε } ∪ (FOLLOW(A) if ε ∈ FIRST(α) else ∅) to construct the PREDICT sets for the grammar, the resulting sets are as follows.
PREDICT(1. E → P E') = { 1 }
PREDICT(2. E' → + E E') = { + }
PREDICT(3. E' → ε) = { +, EOF }
PREDICT(4. P → 1) = { 1 }
So this is where I run into the conflict: PREDICT(2) and PREDICT(3) both contain +, so I cannot produce a parse table. The grammar is not LL(1), since on + the parser wouldn't be able to choose which rule to apply.
What I'm really wondering is whether it's possible to resolve the conflict, or factor the grammar so that the conflict is avoided, and produce a legal LL(1) grammar without having to modify the original input grammar directly.
The problem here is that your original grammar is ambiguous.
E → E + E
E → P
means that P + P + P can be parsed either as (P + P) + P or P + (P + P). Eliminating left recursion doesn't fix the ambiguity, so the modified grammar is also ambiguous. And ambiguous grammars can't be LL(k) (or, for that matter, LR(k)).
So you need to make the grammar unambiguous:
E → E + P
E → P
(That's the common left-associative version.) Once you eliminate left recursion, you end up with:
E → P E'
E' → + P E'
| ε
Now + is not in FOLLOW(E').
(The example is drawn straight from the Dragon book, but simplified; it's example 4.8 in the rather battered old copy I have.)
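To see the conflict and its disappearance mechanically, here is a sketch of the standard fixed-point computation of FIRST, FOLLOW and PREDICT, using the PREDICT formula from the question. The grammar encoding (a list of (lhs, rhs) pairs, with () for ε and the markers 'ε' and 'EOF') is an assumption made for this illustration.

```python
EPS, EOF = 'ε', 'EOF'

def first_of_seq(seq, first):
    # FIRST of a sequence of symbols; contains EPS only if every symbol can vanish
    out = set()
    for sym in seq:
        f = first[sym] if sym in first else {sym}   # a terminal's FIRST is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}

def predict_sets(grammar, start):
    nts = {lhs for lhs, _ in grammar}
    first = {a: set() for a in nts}
    follow = {a: set() for a in nts}
    follow[start].add(EOF)
    changed = True
    while changed:                      # iterate FIRST and FOLLOW to a fixed point
        changed = False
        for lhs, rhs in grammar:
            f = first_of_seq(rhs, first)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            for i, sym in enumerate(rhs):
                if sym in nts:
                    tail = first_of_seq(rhs[i + 1:], first)
                    new = (tail - {EPS}) | (follow[lhs] if EPS in tail else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    preds = []
    for lhs, rhs in grammar:            # PREDICT = FIRST(rhs) - ε, plus FOLLOW(lhs) if nullable
        f = first_of_seq(rhs, first)
        preds.append((f - {EPS}) | (follow[lhs] if EPS in f else set()))
    return preds

ambiguous = [('E', ('P', "E'")), ("E'", ('+', 'E', "E'")), ("E'", ()), ('P', ('1',))]
unambiguous = [('E', ('P', "E'")), ("E'", ('+', 'P', "E'")), ("E'", ()), ('P', ('1',))]

for name, g in [('ambiguous', ambiguous), ('unambiguous', unambiguous)]:
    p = predict_sets(g, 'E')
    print(name, p[1] & p[2])   # overlap of the two E' alternatives
```

The overlap of the two E' alternatives is {'+'} for the transformed ambiguous grammar and empty for the left-associative rewrite, matching the observation that + is no longer in FOLLOW(E').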
It's worth noting that the transformation used here preserves the set of strings derived by the grammar, but not the derivation. The parse tree which results from the modified grammar is effectively right-associative, so it will need to be reprocessed to recover the desired parse. This fact is rather briefly mentioned by the Dragon book authors:
Although left-recursion elimination and left factoring are easy to do, they make the resulting grammar hard to read and difficult to use for translation purposes. (My emphasis)
They go on to suggest that operator precedence parsing can be used for expressions, and then mention that if an LR parser generator is available, dividing the grammar into a predictive part and an operator-precedence part is no longer necessary.

Syntax directed definition (count number of pairs of parentheses)

Given the following grammar, I have to find the appropriate semantic actions to calculate, for each string of the language, the number of pairs of parentheses in the string.
S -> (L)
S -> a
L -> L, S
L -> S
Usually, to perform this type of exercise, I build a derivation tree of a sample string and then I add the attributes. After that it is easier to find the semantic rules.
So I built this derivation tree for the string "((a, (a), a))", but I can't proceed with the exercise. How do I count the pairs of parentheses? I'm not able to do that...
I don't want the solution, but I'd like someone to help me with the reasoning to use in these cases.
(I'm sorry for the bad tree...)
The OP wrote:
Might these be the correct semantic actions for this grammar?
S -> (L) {S.p = counter + 1}
S -> a {do nothing}
L -> L, S {L.p = S.p}
L -> S {L.p = S.p}
.p is a synthesized attribute.
S -> (S1) { S.count = S1.count + 1 }
S -> S1 S2 { S.count = S1.count + S2.count }
S -> ϵ { S.count = 0 }
(S1 and S2 denote the occurrences of S on the right-hand side.) This should make things clear.
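To check a candidate set of rules against the original grammar, here is a sketch that evaluates a synthesized attribute p bottom-up while parsing. The rules it implements are an assumption, not taken from the thread: S -> (L) { S.p = L.p + 1 }, S -> a { S.p = 0 }, L -> L1 , S { L.p = L1.p + S.p }, L -> S { L.p = S.p }. The left-recursive L is handled with a loop, the usual recursive-descent workaround, which accumulates p exactly as the L -> L1 , S rule specifies.

```python
def count_pairs(text: str) -> int:
    """Synthesized attribute p for  S -> ( L ) | a   and   L -> L , S | S."""
    s = text.replace(' ', '')
    pos = 0

    def expect(ch: str) -> None:
        nonlocal pos
        if pos >= len(s) or s[pos] != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_S() -> int:
        if pos < len(s) and s[pos] == 'a':
            expect('a')              # S -> a        { S.p = 0 }
            return 0
        expect('(')                  # S -> ( L )    { S.p = L.p + 1 }
        lp = parse_L()
        expect(')')
        return lp + 1

    def parse_L() -> int:
        p = parse_S()                # L -> S        { L.p = S.p }
        while pos < len(s) and s[pos] == ',':
            expect(',')              # L -> L1 , S   { L.p = L1.p + S.p }
            p += parse_S()
        return p

    result = parse_S()
    if pos != len(s):
        raise ValueError("trailing input")
    return result

print(count_pairs("((a, (a), a))"))   # 3
```

Comparing this with the OP's attempt shows where it goes wrong: L -> L, S { L.p = S.p } discards the count accumulated in the left part of the list.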

Get mathematica to simplify expression with another equation

I have a very complicated mathematica expression that I'd like to simplify by using a new, possibly dimensionless parameter.
An example of my expression is:
K=a*b*t/((t+f)c*d);
(the actual expression is monstrously large, thousands of characters). I'd like to replace all occurrences of the expression t/(t+f) with p
p=t/(t+f);
The goal here is to find a replacement so that all t's and f's are replaced by p. In this case, the replacement p is a nondimensionalized parameter, so it seems like a good candidate replacement.
I've not been able to figure out how to do this in Mathematica (or whether it's possible). I tried:
eq1= K==a*b*t/((t+f)c*d);
eq2= p==t/(t+f);
Solve[{eq1,eq2},K]
Not surprisingly, this doesn't work. If there were a way to force it to solve for K in terms of p,a,b,c,d, this might work, but I can't figure out how to do that either. Thoughts?
Edit #1 (11/10/11 - 1:30)
[deleted to simplify]
OK, new tack. I've taken p=ton/(ton+toff) and multiplied p by several expressions. I know that p can be completely eliminated. The new expression (in terms of p) is
testEQ = A B p + A^2 B p^2 + (A+B)p^3;
Then I made the substitution for p, and called (normal) FullSimplify, giving me this expression.
testEQ2= (ton (B ton^2 + A^2 B ton (toff + ton) +
A (ton^2 + B (toff + ton)^2)))/(toff + ton)^3;
Finally, I tried all of the suggestions below, except the last (not sure how it works yet!)
Only the eliminate option worked. So I guess I'll try this method from now on. Thank you.
EQ1 = a1 == (ton (B ton^2 + A^2 B ton (toff + ton) +
A (ton^2 + B (toff + ton)^2)))/(toff + ton)^3;
EQ2 = P1 == ton/(ton + toff);
Eliminate[{EQ1, EQ2}, {ton, toff}]
A B P1 + A^2 B P1^2 + (A + B) P1^3 == a1
I should add, if the goal is to make all substitutions that are possible, leaving the rest, I still don't know how to do that. But it appears that if a substitution can completely eliminate a few variables, Eliminate[] works best.
Have you tried this?
K = a*b*t/((t + f) c*d);
Solve[p == t/(t + f), t]
-> {{t -> -((f p)/(-1 + p))}}
Simplify[K /. %[[1]] ]
-> (a b p)/(c d)
EDIT: Oh, and are you aware of Eliminate?
Eliminate[{eq1, eq2}, {t,f}]
-> a b p == c d K && c != 0 && d != 0
Solve[%, K]
-> {{K -> (a b p)/(c d)}}
EDIT 2: Also, in this simple case, solving for K and t simultaneously seems to do the trick, too:
Solve[{eq1, eq2}, {K, t}]
-> {{K -> (a b p)/(c d), t -> -((f p)/(-1 + p))}}
Something along these lines is discussed in the MathGroup post at
http://forums.wolfram.com/mathgroup/archive/2009/Oct/msg00023.html
(I see it has an apocryphal note that is quite relevant, at least to the author of that post.)
Here is how it might be applied in the example above. For purposes of keeping this self contained I'll repeat the replacement code.
replacementFunction[expr_, rep_, vars_] :=
 Module[{num = Numerator[expr], den = Denominator[expr],
   hed = Head[expr], base, expon},
  If[PolynomialQ[num, vars] &&
    PolynomialQ[den, vars] && ! NumberQ[den],
   replacementFunction[num, rep, vars]/
    replacementFunction[den, rep, vars],
   If[hed === Power && Length[expr] == 2,
    base = replacementFunction[expr[[1]], rep, vars];
    expon = replacementFunction[expr[[2]], rep, vars];
    PolynomialReduce[base^expon, rep, vars][[2]],
    If[Head[hed] === Symbol &&
      MemberQ[Attributes[hed], NumericFunction],
     Map[replacementFunction[#, rep, vars] &, expr],
     PolynomialReduce[expr, rep, vars][[2]]]]]]
Your example is now as follows. We take the input, and also the replacement. For the latter we make an equivalent polynomial by clearing denominators.
kK = a*b*t/((t + f) c*d);
rep = Numerator[Together[p - t/(t + f)]];
Now we can invoke the replacement. We list the variables we are interested in replacing, treating 'p' as a parameter. This way it will get ordered lower than the others, meaning the replacements will try to remove them in favor of 'p'.
In[127]:= replacementFunction[kK, rep, {t, f}]
Out[127]= (a b p)/(c d)
This approach has a bit of magic in figuring out what should be the listed "variables". Possibly some further tweakage could be done to improve on that. But I believe that, generally, simply not listing the things we want to use as new replacements is the right way to go.
Over the years there have been variants of this idea on MathGroup. It is possible that some others may be better suited to the specific expression(s) you wish to handle.
--- edit ---
The idea behind this is to use PolynomialReduce to do algebraic replacement. That is to say, we do not try pattern matching but instead use polynomial "canonicalization". But in general we're not working with polynomial inputs, so we apply this idea recursively on PolynomialQ arguments inside functions that have the NumericFunction attribute.
Earlier versions of this idea, along with some more explanation, can be found at the note referenced below, as well as in notes it references (how's that for explanatory recursion?).
http://forums.wolfram.com/mathgroup/archive/2006/Aug/msg00283.html
--- end edit ---
--- edit 2 ---
As observed in the wild, this approach is not always a simplifier. It does algebraic replacement, which involves, under the hood, a notion of "term ordering" (roughly, "which things get replaced by which others?") and thus simple variables may expand to longer expressions.
Another form of term rewriting is syntactic replacement via pattern matching, and other responses discuss using that approach. It has a different drawback, insofar as the generality of patterns to consider might become overwhelming. For example, what does one do with k^2/(w + p^4)^3 when the rule is to replace k/(w + p^4) with q? (Specifically, how do we recognize this as being equivalent to (k/(w + p^4))^2*1/(w + p^4)?)
The upshot is one needs to have an idea of what is desired and what methods might be feasible. This of course is generally problem specific.
One thing that occurs is perhaps you want to find and replace all commonly occurring "complicated" expressions with simpler ones. This is referred to as common subexpression elimination (CSE). In Mathematica this can be done using a function called Experimental`OptimizeExpression[]. Here are several links to MathGroup posts that discuss this.
http://forums.wolfram.com/mathgroup/archive/2009/Jul/msg00138.html
http://forums.wolfram.com/mathgroup/archive/2007/Nov/msg00270.html
http://forums.wolfram.com/mathgroup/archive/2006/Sep/msg00300.html
http://forums.wolfram.com/mathgroup/archive/2005/Jan/msg00387.html
http://forums.wolfram.com/mathgroup/archive/2002/Jan/msg00369.html
Here is an example from one of those notes.
InputForm[Experimental`OptimizeExpression[(3 + 3*a^2 + Sqrt[5 + 6*a + 5*a^2] +
a*(4 + Sqrt[5 + 6*a + 5*a^2]))/6]]
Out[206]//InputForm=
Experimental`OptimizedExpression[Block[{Compile`$1, Compile`$3, Compile`$4,
Compile`$5, Compile`$6}, Compile`$1 = a^2; Compile`$3 = 6*a;
Compile`$4 = 5*Compile`$1; Compile`$5 = 5 + Compile`$3 + Compile`$4;
Compile`$6 = Sqrt[Compile`$5]; (3 + 3*Compile`$1 + Compile`$6 +
a*(4 + Compile`$6))/6]]
--- end edit 2 ---
Daniel Lichtblau
K = a*b*t/((t+f)c*d);
FullSimplify[ K,
TransformationFunctions -> {(# /. t/(t + f) -> p &), Automatic}]
(a b p) / (c d)
Corrected update to show another method:
EQ1 = a1 == (ton (B ton^2 + A^2 B ton (toff + ton) +
A (ton^2 + B (toff + ton)^2)))/(toff + ton)^3;
f = # /. ton + toff -> ton/p &;
FullSimplify[f @ EQ1]
a1 == p (A B + A^2 B p + (A + B) p^2)
I don't know if this is of any value at this point, but hopefully at least it works.
