Pattern match on chars - frp

I'm pretty new to Elm (elm-server 0.9.2), and I have encountered a problem that has become quite an obstacle for me.
Here is my problem:
According to the version 0.9 documentation, I should be able to write:
stripCommas str =
  case str of
    ',' :: rest -> stripCommas rest
    c :: rest -> c :: stripCommas rest
So to test this I basically did my own function (quite similar :) ):
stripNewLine str =
  case str of
    '\n' :: rest -> stripNewLine rest
    c :: rest -> c :: stripNewLine rest
But both of them fail. After some debugging I noticed this in the generated JavaScript:
var stripNewLine = function(str){
  return function(){
    switch (str.ctor) {
      case '::':
        switch (str._0) {
          case Chr '\n':
            return stripNewLine(str._1);
        }
        return _L.Cons(str._0,stripNewLine(str._1));
    }_E.Case($moduleName,'between lines 22 and 33')}();};
I don't know much about JavaScript, but it seems that Chr '\n' should be Chr('\n'), though I might be wrong... Can someone point me in the right direction? I'm lost...

It is an Elm bug (fixed on master since the latest stable release), and you're right: it's caused by wrongly generated JavaScript.
Additionally, the example code you're copying from that announcement blog post has a logic problem: it performs a non-exhaustive pattern match.
Strings are lists of chars (i.e. String is just a [Char]), so a proper pattern match should also handle the empty-list case, i.e.:
stripCommas str =
  case str of
    [] -> str
    ',' :: rest -> stripCommas rest
    c :: rest -> c :: stripCommas rest

main = asText <| stripCommas "1,2,3,4,5"
You can test this here (select "master/HEAD" from the version options; that is a later version than the current release, which still has the JS generation bug).


Porter Stemmer, Step 1b

Similar to this porter stemming algorithm implementation question, but expanded.
Basically, step 1b is defined as:
Step 1b
    (m>0) EED -> EE     feed      -> feed
                        agreed    -> agree
    (*v*) ED  ->        plastered -> plaster
                        bled      -> bled
    (*v*) ING ->        motoring  -> motor
                        sing      -> sing
My question is: why does feed stem to feed and not fe? All the online Porter stemmers I've tried stem it to feed, but from what I see, it should stem to fe.
My train of thought is:
`feed` does not pass through `(m>0) EED -> EE`, as the measure of `feed` minus the suffix `eed` is m("f"), hence m = 0
`feed` will pass through `(*v*) ED ->`, as there is a vowel in the stem `fe` once the suffix `ed` is removed, so at this point it would stem to `fe`
Can someone explain to me how online Porter Stemmers manage to stem to feed?
Thanks.
It's because "feed" doesn't have a VC (vowel/consonant) combination, therefore m = 0. To remove the "ed" suffix, m > 0 (check the conditions for each step).
The rules for removing a suffix will be given in the form
(condition) S1 -> S2
This means that if a word ends with the suffix S1, and the stem before S1 satisfies the given condition, S1 is replaced by S2. The condition is usually given in terms of m, e.g.
(m > 1) EMENT ->
Here S1 is `EMENT' and S2 is null. This would map REPLACEMENT to REPLAC, since REPLAC is a word part for which m = 2.
Now, in your example:
(m>0) EED -> EE    feed -> feed
Before 'EED', is there a vowel followed by a consonant, repeated more than zero times? The answer is no: before 'EED' there is only "f", which contains no vowel followed by a consonant.
In "feed", m refers to vowel/consonant pairs, and there is no such pair, so m = 0 and the condition m > 0 fails. But in "agreed" the VC pair is "ag", so the condition holds and it is replaced by "agree".
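The measure can be computed mechanically: map the stem to its vowel/consonant form and count the VC transitions. A minimal Python sketch (hypothetical helper name; simplified to treat y as a consonant, which the full algorithm refines):

```python
def measure(stem):
    """Count m, the number of VC (vowel-then-consonant) sequences in a stem.

    Simplified sketch: treats a, e, i, o, u as vowels and everything else
    (including y) as a consonant; full Porter also treats y as a vowel
    when it follows a consonant.
    """
    vowels = "aeiou"
    # Map the stem to a V/C string, e.g. "agr" -> "VCC"
    form = "".join("V" if ch in vowels else "C" for ch in stem)
    # m is the number of times a vowel run is followed by a consonant run
    return form.count("VC")

print(measure("f"))    # stem of "feed" minus "eed": 0, so (m>0) fails
print(measure("agr"))  # stem of "agreed" minus "eed": 1, so agreed -> agree
```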
It's really sad that nobody here actually read the question. This is why feed doesn't get stemmed to fe by rule 2 of step 1b:
The definition of the algorithm states:
In a set of rules written beneath each other, only one is obeyed, and this
will be the one with the longest matching S1 for the given word.
It isn't clearly stated that the conditions are ignored when selecting the rule, but they are. So feed does match the first rule (though the rule isn't applied, since the condition isn't met), and therefore the rest of the rules in 1b are skipped.
The code would approximately look like this:
// 1b
if(word.ends_with("eed")) {           // (m > 0) EED -> EE
    mval = getMvalueOfStem();
    if(mval > 0) {
        word.erase("d");
    }
}
else if(word.ends_with("ed")) {       // (*v*) ED -> NULL
    if(containsVowel(wordStem)) {
        word.erase("ed");
    }
}
else if(word.ends_with("ing")) {      // (*v*) ING -> NULL
    if(containsVowel(wordStem)) {
        word.erase("ing");
    }
}
The important things here are the else ifs.
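The longest-match-first behaviour can also be sketched as runnable code. This is a simplified Python illustration, not the full Porter algorithm (the vowel test ignores the special handling of y, and the cleanup Porter applies after removing ED/ING is omitted):

```python
def contains_vowel(stem):
    # Simplified: ignores Porter's special treatment of 'y'
    return any(ch in "aeiou" for ch in stem)

def measure(stem):
    # m = number of VC sequences in the stem's vowel/consonant form
    form = "".join("V" if ch in "aeiou" else "C" for ch in stem)
    return form.count("VC")

def step1b(word):
    # Only the rule with the longest matching suffix is tried;
    # if its condition fails, the remaining rules are skipped.
    if word.endswith("eed"):            # (m>0) EED -> EE
        stem = word[:-3]
        if measure(stem) > 0:
            return stem + "ee"
        return word                     # condition failed: stop, don't try ED
    elif word.endswith("ed"):           # (*v*) ED ->
        stem = word[:-2]
        if contains_vowel(stem):
            return stem
    elif word.endswith("ing"):          # (*v*) ING ->
        stem = word[:-3]
        if contains_vowel(stem):
            return stem
    return word

print(step1b("feed"))       # feed: EED matches but m("f") = 0, so unchanged
print(step1b("agreed"))     # agree
print(step1b("plastered"))  # plaster
print(step1b("motoring"))   # motor
```

Because "feed" matches the EED suffix first, the ED rule is never consulted, which is exactly why the online stemmers return feed rather than fe.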

Haskell grammar to validate a string in specific format

I would like to define a grammar in Haskell that matches a string in the format "XY12XY" (some alphas followed by some numerics), e.g. variable names in programming languages.
customer123 is a valid variable name, but '123customer' is not a valid variable name.
I am at a loss how to define the grammar and write a validator function that would validate whether a given string is valid variable name. I have been trying to understand and adapt the parser example at: https://wiki.haskell.org/GADT but I just can't get my head around how to tweak it to make it work for my need.
If any kind fellow Haskell gurus would help me define this please:
validate :: ValidFormat -> String -> Bool
validate f [] = False
validate f s = ...
I would like to define the ValidFormat grammar as:
varNameFormat = Concat Alpha $ Concat Alpha Numeric
I'd start with a simple parser and see if that satisfies your needs, unless you can explain why this is not enough for your use case. Parsers are pretty straightforward. I'll give a very simple (and maybe incomplete) example with attoparsec:
import Control.Applicative
import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8 as B

validateVar :: B.ByteString -> Bool
validateVar bstr = case parseOnly variableP bstr of
  Right _ -> True
  Left  _ -> False

variableP :: Parser String
variableP =
  (++)
    <$> many1 letter_ascii            -- must start with one or more letters
    <*> many (digit <|> letter_ascii) -- then any combination of letters/digits
    <*  endOfInput                    -- make sure we don't ignore invalid trailing chars
variableP combines parsers via <*> and requires you to handle both results: the one from many1 letter_ascii and the one from many (digit <|> letter_ascii). In this case we just concatenate the two results via (++); check the types of many1, many, letter_ascii and digit. The <* says "parse this, but discard the result of the right-hand parser" (otherwise you'd have to handle 3 results).
That means if you run the parser on "abc123" you'll get back "abc123". If you parse "1abc" the parser will fail.
Check the type of parseOnly:
parseOnly :: Parser a -> ByteString -> Either String a
We pass it our parser and the bytestring it should parse. If the parser fails we'll get Left <something went wrong>. If the parser succeeds, we'll get Right <our string>. The cool thing is... instead of just giving a string on success, we could do pretty much anything with the results in variableP, as in: use something different than (++), convert the types and whatnot (mind that the Parser type might also have to change then).
Since we only care if the parser succeeded in validateVar, we can just ignore the result in either case.
So instead of defining GADTs for your grammar, you just define Parsers.
You might also find this link useful for a tutorial: http://www.seas.upenn.edu/~cis194/fall14/spring13/lectures.html (week 10 and 11, including the assignments where you basically write your own little parser library)
I've taken this from the examples of regex-applicative:
import Text.Regex.Applicative
import Data.Char
import Data.Maybe

varNameFormat :: RE Char String
varNameFormat = (:) <$> psym isAlpha <*> many (psym isAlphaNum)

validate :: RE Char String -> String -> Bool
validate re str = isJust $ str =~ re
You will have
*Main> validate varNameFormat "a123"
True
*Main> validate varNameFormat "1a23"
False

How to define a wildcard pattern using cerl:c_clause

I'm trying to compile a personal language to Erlang. I want to create a function with pattern matching on clauses.
This is my data:
Data =
[ {a, <a_body> }
, {b, <b_body> }
, {c, <c_body> }
].
This is what I want:
foo(a) -> <a_body>;
foo(b) -> <b_body>;
foo(c) -> <c_body>;
foo(_) -> undefined. %% <- this
This is what I do at the moment:
MkCaseClause =
    fun({Pattern, Body}) ->
        cerl:c_clause([cerl:c_atom(Pattern)], deep_literal(Body))
    end,
WildCardClause = cerl:c_clause([ ??? ], cerl:c_atom(undefined)),
CaseClauses = [MkCaseClause(S) || S <- Data] ++ [WildCardClause],
So please help me define WildCardClause. I saw that if I call my compiled function with neither a nor b nor c, it results in ** exception error: no true branch found when evaluating an if expression in function ....
When I print my Core Erlang code I get this:
'myfuncname'/1 =
    fun (Key) ->
        case Key of
            <'a'> when 'true' -> ...
            <'b'> when 'true' -> ...
            <'c'> when 'true' -> ...
        end
So okay, case is translated to if when the Core is compiled. So I need to specify a true clause, as in an if expression, to get a pure wildcard. I don't know how to do that, since matching true in an if expression and in a case expression have different semantics: in a case, true is not a wildcard.
And what if I wanted to match expressions with wildcards inside, like {sometag,_,_,Thing} -> {ok, Thing}?
Thank you
I've found a way to do this:
...
WildCardVar = cerl:c_var('_Any'),
WildCardClause = cerl:c_clause([WildCardVar], cerl:c_atom(undefined)),
...
It should work for inner wildcards too, but one has to be careful to give a different variable name to each _ wildcard: multiple _ patterns do not need to match each other, but repeated variable names do.
f(X,_, _ ) %% matches f(a,b,c)
f(X,_X,_X) %% doesn't
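When generating such clauses mechanically, a simple counter is enough to guarantee that each inner wildcard gets its own variable name. A tiny, language-neutral Python sketch (hypothetical naming scheme):

```python
import itertools

def wildcard_namer(prefix="_Any"):
    """Return a function that yields a fresh, unique wildcard variable name
    on each call (e.g. _Any0, _Any1, ...), so no two wildcards alias each other."""
    counter = itertools.count()
    return lambda: "{}{}".format(prefix, next(counter))

fresh = wildcard_namer()
print(fresh())  # _Any0
print(fresh())  # _Any1
```

Calling fresh() once per `_` in the source pattern (and passing the result to cerl:c_var/1) avoids accidentally emitting the f(X,_X,_X)-style pattern above.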

Which is more efficient in Erlang: match on two different lines, or match in tuple?

Which of these two is more efficient in Erlang? This:
ValueA = MyRecord#my_record.value_a,
ValueB = MyRecord#my_record.value_b.
Or this:
{ValueA, ValueB} = {MyRecord#my_record.value_a, MyRecord#my_record.value_b}.
?
I ask because the latter sometimes needs multiple lines to fit within the 80-character line length limit I like to keep, and I tend to prefer to avoid doing stuff like this:
{ValueA, ValueB} = { MyRecord#my_record.value_a
, MyRecord#my_record.value_b }.
They generate exactly the same code! If you want less code try using:
#my_record{value_a=ValueA,value_b=ValueB} = MyRecord
which also generates the same code. Generally, if you can, use pattern matching. It is never worse, usually better. In this case they all do the minimum amount of work which is necessary.
In general write the code which is clearest and looks the best and only worry about these types of optimisation when you know that there is a speed problem with this code.
I've done a little test, and it seems they are roughly equivalent:
-module(timeit).
-export([test/0]).

-record(my_record, {value_a, value_b}).

times(N, Fn) ->
    fun () -> do_times(N, Fn) end.

do_times(0, _Fn) ->
    ok;
do_times(N, Fn) ->
    Fn(),
    do_times(N-1, Fn).

test_1() ->
    MyRecord = #my_record{value_a=1, value_b=2},
    timer:tc(times(100000000,
                   fun () ->
                           ValueA = MyRecord#my_record.value_a,
                           ValueB = MyRecord#my_record.value_b,
                           ValueA + ValueB
                   end)).

test_2() ->
    MyRecord = #my_record{value_a=1, value_b=2},
    timer:tc(times(100000000,
                   fun () ->
                           {ValueA, ValueB} = { MyRecord#my_record.value_a,
                                                MyRecord#my_record.value_b },
                           ValueA + ValueB
                   end)).

test() ->
    {test_1(), test_2()}.
44> timeit:test().
{{6042747,ok},{6063557,ok}}
45> timeit:test().
{{5849173,ok},{5822836,ok}}
46>
Btw, I had to add the "ValueA + ValueB" expression so the compiler doesn't treat the ValueA binding in test_1 as dead code. If you remove it, you'll see a big difference in the times because of that.

Debugging HXT performance problems

I'm trying to use HXT to read in some big XML data files (hundreds of MB.)
My code has a space-leak somewhere, but I can't seem to find it. I do have a little bit of a clue as to what is happening thanks to my very limited knowledge of the ghc profiling tool chain.
Basically, the document is parsed, but not evaluated.
Here's some code:
{-# LANGUAGE Arrows, NoMonomorphismRestriction #-}
import Text.XML.HXT.Core
import System.Environment (getArgs)
import Control.Monad (liftM)

main = do file <- (liftM head getArgs) >>= parseTuba
          case file of
            (Left m)  -> print "Failed."
            (Right _) -> print "Success."

data Sentence t = Sentence [Node t] deriving Show
data Node t = Word { wSurface :: !t } deriving Show

parseTuba :: FilePath -> IO (Either String ([Sentence String]))
parseTuba f = do r <- runX (readDocument [] f >>> process)
                 case r of
                   []   -> return $ Left "No parse result."
                   [pr] -> return $ Right pr
                   _    -> return $ Left "Ambiguous parse result!"

process :: (ArrowXml a) => a XmlTree ([Sentence String])
process = getChildren >>> listA (tag "sentence" >>> listA word >>> arr (\ns -> Sentence ns))

word :: (ArrowXml a) => a XmlTree (Node String)
word = tag "word" >>> getAttrValue "form" >>> arr (\s -> Word s)

-- | Gets the tag with the given name below the node.
tag :: (ArrowXml a) => String -> a XmlTree XmlTree
tag s = getChildren >>> isElem >>> hasName s
I'm trying to read a corpus file, and the structure is obviously something like <corpus><sentence><word form="Hello"/><word form="world"/></sentence></corpus>.
Even on the very small development corpus, the program takes ~15 seconds to read it in, of which around 20% is GC time (that's way too much).
In particular, a lot of data is spending way too much time in DRAG state. This is the profile: [heap profile image: monitoring DRAG culprits]. You can see that decodeDocument gets called a lot, and its data is then stalled until the very end of the execution.
Now, I think this should be easily fixed by folding all this decodeDocument stuff into my data structures (Sentence and Word), so the RTS can forget about those thunks. The way it currently happens, though, is that the folding occurs at the very end, when I force evaluation by deconstructing the Either in the IO monad, whereas it could easily happen online. I see no reason for this, and my attempts to strictify the program have so far been in vain. I hope somebody can help me :-)
I just can't even figure out too many places to put seqs and $!s in…
One possible thing to try: the default hxt parser is strict, but there does exist a lazy parser based on tagsoup: http://hackage.haskell.org/package/hxt-tagsoup
I understand that expat can do lazy processing as well: http://hackage.haskell.org/package/hxt-expat
You may want to see if switching parsing backends, by itself, solves your issue.
