This data structure stores words, where each letter is a node of a tree, like this:
H -> E -> L -> O(f)
            -> L(f)
  -> I -> G -> H(f)
       -> L -> L(f)
So the root node is H, which has two children, and the (f) indicates the end of a word.
Is this a known data structure? Does it have a name, and does it have known applications?
I want to use it to save memory in a web scraper, so if this structure exists, I'd like to read more about its pros and cons.
This data structure is called a trie (also known as a prefix tree). It organizes strings by their prefixes, so strings with a common prefix share nodes; tries are mainly used to store and search sets of strings by prefix.
This article should help you out:
https://medium.com/basecs/trying-to-understand-tries-3ec6bede0014
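For a concrete sense of how a trie works, here is a minimal sketch in Haskell (the type and function names are my own, not from any particular library): each node records whether a word ends there, plus a map from the next character to a child node.

```haskell
import qualified Data.Map as Map

-- Each node: does a word end here, and a map from next character to child.
data Trie = Trie { isEnd :: Bool, children :: Map.Map Char Trie }

emptyTrie :: Trie
emptyTrie = Trie False Map.empty

insert :: String -> Trie -> Trie
insert []     (Trie _ cs) = Trie True cs           -- mark the end of a word
insert (c:cs) (Trie e m)  = Trie e (Map.insert c child m)
  where child = insert cs (Map.findWithDefault emptyTrie c m)

member :: String -> Trie -> Bool
member []     t = isEnd t
member (c:cs) t = maybe False (member cs) (Map.lookup c (children t))
```

Because words sharing a prefix share nodes, a trie can be considerably more compact than a plain list of strings when many words overlap, which is the memory-saving property relevant to a scraper.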
This grammar was on my midterm exam. It asked me to show that it's ambiguous, but I couldn't find two different parse trees:
K -> QK | ε
Q -> Qa | aQb | ab
If I hadn't noticed that the grammar is left-recursive, I would have written that it is not ambiguous.
Thank you.
K -> QK -> QQK -> QQ
     -> abQ -> abaQb -> abaQab
     -> abaabab

K -> QK -> QQK -> QQQK -> QQQ
     -> QaQQ -> abaQQ -> abaabQ
     -> abaabab
Edit to add some commentary: I'm not sure there's a good general way to solve these. Look for rules that can "do the same thing" (like deriving longer strings), and start there. In this case, the issue is that we can add a Q in multiple ways. You can also try working backwards: imagine strings in the language and how they would be derived in the grammar. If you're looking for the shortest possible counterexamples, this is helpful, since the ambiguity will typically appear fairly late in these strings.
While pondering how best to map, i.e. traverse, an a -> Maybe a Kleisli arrow over an unboxed vector, I looked for an existing implementation. Obviously U.Vector is not Traversable, but it does supply a mapM, which of course works just fine for Maybe.
But the question is: is the Monad constraint really needed? Well, it turns out that even boxed vectors cheat for the Traversable instance: they really just traverse a list, which they convert from/to:
instance Traversable.Traversable Vector where
  {-# INLINE traverse #-}
  traverse f xs = Data.Vector.fromList Applicative.<$> Traversable.traverse f (toList xs)
mono-traversable does the same thing for unboxed vectors as well; there it seems even more gruesome performance-wise.
Now, I wouldn't be surprised if vector was actually able to fuse many of these hacked traversals into a far more efficient form, but still – there seems to be a fundamental problem, preventing us from implementing a traversal on an array right away. Is there any “deep reason” for this inability?
After reading through the relevant source of vector and trying to make mapM work with Applicative, I think the reason why Data.Vector.Unboxed.Vector doesn't have a traverse :: (Applicative f, Unbox a, Unbox b) => (a -> f b) -> Vector a -> f (Vector b) function, and Data.Vector.Vector doesn't have a native traverse, is the fusion code. The offender is the following Stream type:
-- Data/Vector/Fusion/Stream/Monadic.hs, line 137

-- | Result of taking a single step in a stream
data Step s a where
  Yield :: a -> s -> Step s a
  Skip  :: s -> Step s a
  Done  :: Step s a

-- | Monadic streams
data Stream m a = forall s. Stream (s -> m (Step s a)) s
This is used internally to implement mapM. The m will be the same monad as in your initial call to Data.Vector.Unboxed.mapM. But because the spine of this stream lives inside m, it is not possible to work with it if you only have an Applicative instance for m.
See also this issue on the vector GitHub repo: Weaken constraint on mapM.
Disclaimer: I don't really know how fusion works. I don't know how vector works.
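To make the problem concrete, here is a self-contained sketch (simplified from vector's internals; the consumer and helper names are mine) showing where the Monad constraint sneaks in: each step returns the next stream state wrapped in m, so consuming the stream requires >>=, which Applicative alone does not provide.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data Step s a = Yield a s | Skip s | Done

data Stream m a = forall s. Stream (s -> m (Step s a)) s

-- Consuming a stream: each step returns the next state *inside* m,
-- so we must bind (>>=) to continue. That is the Monad constraint.
foldStream :: Monad m => Stream m a -> m [a]
foldStream (Stream step s0) = go s0
  where
    go s = step s >>= \r -> case r of
      Yield a s' -> fmap (a :) (go s')
      Skip s'    -> go s'
      Done       -> return []

-- A stream whose state is the remaining list.
fromList :: Monad m => [a] -> Stream m a
fromList = Stream step
  where
    step []     = return Done
    step (x:xs) = return (Yield x xs)
```

With only Applicative we could build an m (Step s a), but we could never inspect the Step to learn the next state s', so the recursion in go cannot even be expressed.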
Is there an algorithm that implements a purely functional set?
Expected operations would be union, intersection, difference, element?, empty? and adjoin.
Those are not hard requirements though and I would be happy to learn an algorithm that only implements a subset of them.
You can use a purely functional map implementation, where you just ignore the values.
See http://hackage.haskell.org/packages/archive/containers/0.1.0.1/doc/html/Data-IntMap.html (linked to from https://cstheory.stackexchange.com/questions/1539/whats-new-in-purely-functional-data-structures-since-okasaki ).
(Side note: for more information on functional data structures, see http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504 )
A purely functional implementation exists for almost any data structure. In the case of sets or maps, you typically use some form of search tree, e.g. red/black trees or AVL trees. The standard reference for functional data structures is the book by Okasaki:
http://www.cambridge.org/gb/knowledge/isbn/item1161740/
Significant parts of it are available for free via his thesis:
http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf
The links from the answer by @ninjagecko are good. What I've been following recently are the persistent data structures used in Clojure, which are functional, immutable, and persistent.
A description of the implementation of the persistent hash map can be found in this two-part blog post:
http://blog.higher-order.net/2009/09/08/understanding-clojures-persistenthashmap-deftwice/
http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii/
These are implementations of some of the ideas (see the first answer, first entry) found in this reference request question.
The sets that come out of these structures support the functions you need:
http://clojure.org/data_structures#Sets
All that's left is to browse the source code and try to wrap your head around it.
Here is an implementation of a purely functional set in OCaml (it is part of the OCaml standard library).
Is there an algorithm that implements a purely functional set?
You can implement set operations using many different purely functional data structures. Some have better complexity than others.
Examples include:
Lists
Where we have:
List Difference:
(\\) :: Eq a => [a] -> [a] -> [a]
The \\ function is list difference (non-associative). In the result of xs \\ ys, the first occurrence of each element of ys in turn (if any) has been removed from xs. Thus (xs ++ ys) \\ xs == ys.
union :: Eq a => [a] -> [a] -> [a]
The union function returns the list union of the two lists. For example,
"dog" `union` "cow" == "dogcw"
Duplicates, and elements of the first list, are removed from the second list, but if the first list contains duplicates, so will the result. It is a special case of unionBy, which allows the programmer to supply their own equality test.
intersect :: Eq a => [a] -> [a] -> [a]
The intersect function takes the list intersection of two lists. For example,
[1,2,3,4] `intersect` [2,4,6,8] == [2,4]
If the first list contains duplicates, so will the result.
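A short, runnable sketch of these Data.List functions used as set operations (the example names are mine; membership is just elem):

```haskell
import Data.List ((\\), intersect, union)

difference, unioned, intersection :: [Int]
difference   = [1, 2, 3, 4] \\ [2, 3]              -- remove first occurrences
unioned      = [1, 2, 3] `union` [3, 4, 5]          -- keep xs, append new ys
intersection = [1, 2, 3, 4] `intersect` [2, 4, 6, 8]
```

Note that all of these are O(n·m) and need only an Eq constraint; that's fine for small sets, but the tree-based structures described next scale much better.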
Immutable Sets
More efficient data structures can be designed to improve the complexity of set operations. For example, the standard Data.Set library in Haskell implements sets as size-balanced binary trees:
Stephen Adams, "Efficient sets: a balancing act", Journal of Functional Programming 3(4):553-562, October 1993, http://www.swiss.ai.mit.edu/~adams/BB/.
Which is this data structure:
data Set a = Bin !Size !a !(Set a) !(Set a)
           | Tip

type Size = Int
Yielding complexity of:
union, intersection, difference: O(n+m)
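A quick sketch of the operations you listed, using Data.Set from the containers package (which implements the balanced trees described above; the binding names here are mine):

```haskell
import qualified Data.Set as Set

a, b :: Set.Set Int
a = Set.fromList [1, 2, 3, 4]
b = Set.fromList [3, 4, 5]

unionAB, interAB, diffAB :: Set.Set Int
unionAB = Set.union a b            -- union
interAB = Set.intersection a b     -- intersection
diffAB  = Set.difference a b       -- difference

hasThree :: Bool
hasThree = Set.member 3 a          -- element?

isEmpty :: Bool
isEmpty = Set.null (Set.empty :: Set.Set Int)  -- empty?

adjoined :: Set.Set Int
adjoined = Set.insert 0 a          -- adjoin
```

All updates return new sets and share most of their structure with the old ones, which is exactly the persistence property discussed above.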
To preface this, my knowledge of this kind of stuff is puny.
Anyway, I've been developing a context-free grammar to describe the structure of algebraic expressions so I can teach myself how the CYK parsing algorithm works. I understand how such a grammar can work with infix-only algebraic expressions, but I cannot figure out how to develop a grammar that can handle both the unary and binary definitions of the "-" operator.
For reference, here's the grammar I've written (where S is the start symbol) in CNF:
S -> x
A -> O S
S -> L B
B -> S R
S -> K S
O -> +
O -> -
O -> *
O -> /
O -> ^
K -> -
L -> (
R -> )
The problem is: how can the CYK parsing algorithm know ahead of time whether to choose S -> K S or A -> O S when it encounters the "-" operator? Is such a grammar still context-free? And most importantly, since programming languages handle both the binary and unary minus sign, how should I parse this?
This seems like a problem related to finite state automata, and I don't remember everything from my coursework, but I wrote a CYK parser in OCaml, so I'll go ahead and take a shot :)
If you're trying to parse an expression like 3 - -4, for example, your S -> K S rule would consume the -4, and then your A -> O S rule would absorb the - -4. This would eventually work up to the top-most S production rule. Be careful with the grammar you're using, though: the A production rule you listed cannot be reached from S, and you should probably have an S -> S O S rule of some sort.
CYK-style parsing works by exhaustively trying alternatives (effectively backtracking), not through the "knowing ahead of time" that you mentioned in your question. Your algorithm should parse the -4 with the S -> K S rule, and then try to absorb the second - with S -> K S again, because this production rule allows an arbitrarily long chain of unary -. But once it is stuck with the intermediate parse 3 S, it sees that it has no production rule that can reduce this. Once it realizes this is no longer parseable, it will go back, instead try to parse the - with the binary rule (A -> O S), and continue on its merry way.
This means that your grammar remains context-free, since a context-sensitive grammar would need additional context symbols on the left side of its production rules, so you're good in that respect. HTH!
The grammar is ambiguous, and the parser cannot decide which case to take.
You should probably use a grammar like the following:
S -> EXPR
EXPR -> (EXPR)
EXPR -> - EXPR
EXPR -> EXPR + EXPR
EXPR -> EXPR - EXPR
// etc...
Grammars based on algebraic expressions are rather difficult to disambiguate. Here are some examples of problems which need to be addressed:
a+b+c naturally creates two parse trees. To resolve this, you need to settle the associativity of +. You may wish to let a left-to-right parsing strategy take care of this for you, but be careful: exponentiation should probably associate right-to-left.
a+b*c naturally creates two parse trees. To fix this problem, you need to deal with operator precedence.
implicit multiplication (a+bc), if it is allowed, creates all sorts of nightmares, mostly at tokenization.
unary subtraction is problematic, as you mention.
If we want to solve these problems, but still have a fast-parsing grammar specialized for algebra, one approach is to have various "levels" of EXPR, one for each level of binding required by precedence levels. For example,
TERM -> (S)
EXPO -> TERM ^ EXPO
PROD -> PROD * EXPO
PROD -> PROD / EXPO
PROD -> -PROD
SUM -> SUM + PROD
SUM -> SUM - PROD
S -> SUM
This requires that you also allow "promotion" between levels: SUM -> PROD, PROD -> EXPO, EXPO -> TERM, etc., so that derivations can terminate.
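The levelled grammar above translates almost directly into a recursive-descent parser. Here is a minimal sketch in Haskell (all names are mine; single-character variables only, no whitespace handling) showing that precedence, right-associative ^, and unary minus at the PROD level all fall out of the level structure:

```haskell
data Expr = Var Char
          | Neg Expr              -- unary minus: PROD -> - PROD
          | Bin Char Expr Expr    -- binary operator node
  deriving (Show, Eq)

type Parser a = String -> Maybe (a, String)

-- S -> SUM; succeed only if all input is consumed.
parse :: String -> Maybe Expr
parse s = case pSum s of
  Just (e, "") -> Just e
  _            -> Nothing

-- SUM -> SUM + PROD | SUM - PROD | PROD   (left-associative loop)
pSum :: Parser Expr
pSum s = pProd s >>= uncurry loop
  where
    loop e (c:rest) | c `elem` "+-" = do
      (e', rest') <- pProd rest
      loop (Bin c e e') rest'
    loop e rest = Just (e, rest)

-- PROD -> - PROD | PROD * EXPO | PROD / EXPO | EXPO
pProd :: Parser Expr
pProd ('-':s) = do
  (e, rest) <- pProd s
  Just (Neg e, rest)
pProd s = pExpo s >>= uncurry loop
  where
    loop e (c:rest) | c `elem` "*/" = do
      (e', rest') <- pExpo rest
      loop (Bin c e e') rest'
    loop e rest = Just (e, rest)

-- EXPO -> TERM ^ EXPO | TERM   (right-associative)
pExpo :: Parser Expr
pExpo s = do
  (e, rest) <- pTerm s
  case rest of
    ('^':rest') -> do
      (e', rest'') <- pExpo rest'
      Just (Bin '^' e e', rest'')
    _ -> Just (e, rest)

-- TERM -> ( SUM ) | single-letter variable
pTerm :: Parser Expr
pTerm ('(':s) = do
  (e, rest) <- pSum s
  case rest of
    (')':rest') -> Just (e, rest')
    _           -> Nothing
pTerm (c:s) | c `elem` ['a'..'z'] = Just (Var c, s)
pTerm _ = Nothing
```

For example, parse "a--b" reads the first - as binary (SUM level) and the second as unary (PROD level) with no ambiguity, because by the time pProd sees a leading - the binary alternatives have already been handled one level up.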
Hope this helps!
I don't know Haskell syntax, but I know some FP concepts (like algebraic data types, pattern matching, higher-order functions, etc.).
Can someone explain please, what does this code mean:
data Tree ? = Leaf ? | Fork ? (Tree ?) (Tree ?)
rotateR tree = case tree of
  Fork q (Fork p a b) c -> Fork p a (Fork q b c)
As I understand it, the first line is something like a Tree-type declaration (but I don't understand it exactly). The second line includes pattern matching (I don't understand why we need pattern matching here either). And the third line does something absolutely unreadable for a non-Haskell developer. I've found a definition of fork as fork (f,g) x = (f x, g x), but I can't get any further.
First of all, the data type definition should not contain question marks but ordinary type variables:
data Tree a = Leaf a | Fork a (Tree a) (Tree a)
It defines a type Tree that contains elements of some not further specified type a.
The tree is either a Leaf, containing an element of type a, or it is a Fork, containing also an element of type a and two subtrees. The subtrees are Tree structures that contain elements of type a.
Important to note is that Haskell uses parentheses purely for grouping, as in 2 * (2+3), not for function calls. To call a function, the arguments are simply written after the function name, separated by spaces, as in sin 30 or compare "abc" "abd".
In the case statement, the part to the left of -> is a pattern match, the part to the right is the functions result in case the tree actually had the form specified on the left. The pattern Fork q (Fork p a b) c matches if the tree is a Fork (that's the Fork from the data type definition) and the first subtree of it is another Fork. The lowercase letters are all just variables, capturing the different parts of the tree structure matched. So p would be the element contained in the subtree, a would be the subtrees first branch and b the second one.
The right side of the ->, Fork p a (Fork q b c), now builds a new tree from the parts matched in the pattern. The lowercase variables are all the tree parts matched on the left, and the Forks are constructors from the data type definition. It builds a tree that is a Fork and whose second subtree is also a Fork (the part in parentheses). The remaining pieces of this tree are just the parts of the tree that were "dissolved" on the left side.
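Putting the pieces together, here is a runnable version of the type and the rotation (the deriving clauses and the catch-all case are additions of mine, not in the original snippet; the catch-all makes rotateR total):

```haskell
data Tree a = Leaf a | Fork a (Tree a) (Tree a)
  deriving (Show, Eq)

rotateR :: Tree a -> Tree a
rotateR tree = case tree of
  Fork q (Fork p a b) c -> Fork p a (Fork q b c)
  _                     -> tree   -- added: leave other shapes unchanged

-- The rotation moves the left child up to the root:
--       q               p
--      / \             / \
--     p   c    ==>    a   q
--    / \                 / \
--   a   b               b   c
example :: Tree Char
example = Fork 'q' (Fork 'p' (Leaf 'a') (Leaf 'b')) (Leaf 'c')
```

Evaluating rotateR example yields Fork 'p' (Leaf 'a') (Fork 'q' (Leaf 'b') (Leaf 'c')), which is exactly the tree described by the right-hand side of the pattern match.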
I think you misunderstand Fork. It is not a function but a constructor for the type Tree. It is essentially a node in the Tree data structure: each node in a Tree is either a Leaf (with a value) or a Fork (with a value and two subtrees).
Pattern matching is used to transform the structure. My ASCII art is not good enough to give you a drawing, but it sort-of moves 'left nodes' up and 'right nodes' down.
Note: I say you may be misunderstanding Fork, because fork (f,g) x = (f x, g x) is something completely different. It is a higher order function in this case and has nothing to do with your Tree structure.
Hope that helps :),
Carl