Why is my simple SQL grammar failing to parse in Brag? - scheme

I am trying to create a parser for a simple subset of SQL using a grammar written with BNF in Brag. My Brag code looks like this:
#lang brag
statement : "select" fields "from" source joins* filters*
fields : field ("," field)*
field : WORD
source : WORD
joins : join*
join : "join" source "on" "(" condition ")"
filters : "where" condition ("and" | "or" condition)*
condition : field "=" field
But when I attempt to use that grammar to parse a basic SQL statement, I run into the following error:
> (parse-to-datum "select * from table")
Encountered unexpected token of type "s" (value "s") while parsing 'unknown [line=#f, column=#f, offset=#f]
I'm a total beginner to grammars and brag. Any ideas what I'm doing incorrectly?

You need to lex/tokenize the string first: the input to parse/parse-to-datum should be a list of tokens, not a raw string. Also, brag is case-sensitive, so the input should be select rather than SELECT. After you do that, it should work:
> (parse-to-datum (list "select"
                        (token 'WORD "foo")
                        "from "
                        (token 'WORD "bar")
                        " "
                        " "))
'(statement "select" (fields (field "foo")) "from " (source "bar") " " " ")
The case sensitivity is in fact not a problem, since you can normalize case during the tokenization phase.
Your grammar looks odd, however: it probably should not deal with whitespace at all. Whitespace, too, is best handled in the tokenization phase.
See https://beautifulracket.com/bf/the-tokenizer-and-reader.html for more information about tokenization.
An alternative is to use a different parser. https://docs.racket-lang.org/megaparsack/index.html, for instance, can parse a string to a datum (or syntax datum) directly, though it relies on some advanced functional-programming concepts, so in a way it might be more difficult to use.

Related

Do flowground's JSONata expressions support regex?

I'm trying to build a fairly complex expression with a CBR in which I try to identify whether one string contains another. To do so I need to manipulate the second string and use a little bit of regex magic, but it doesn't seem to work. Could anyone confirm whether the JSONata implementation in flowground supports regex inside a "contains" operation? The expression I am using right now is the following:
$not($contains(elements[0].attribs.content,"/" & $replace(elasticio."step_1".body.issue.fields."customfield_22519"[0],"-"," ") &"/i"))
RegEx and $contains are working correctly in combination.
The reason your expression does not work is that the second parameter of $contains is a plain string (something like "/xyz/i"). That string is not interpreted as a regex.
Your expression: $contains( "abc", "/" & "X" & "/i")
Change it to: $contains( "abc", $eval("/" & "B" & "/i") )
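The same pitfall can be demonstrated outside JSONata. As a sketch in Python (used here purely as a neutral test bed; the slash-stripping below is an illustrative stand-in for what $eval does, not flowground API):

```python
import re

text = "AbC"
pattern_string = "/abc/i"  # a plain string that merely *looks* like a regex literal

# Treated as a string, this is just a substring test, which fails:
looks_like_match = pattern_string in text

# Interpreting the string as an actual pattern (roughly what $eval achieves):
slash = pattern_string.rfind("/")
body, flags = pattern_string[1:slash], pattern_string[slash + 1:]
regex = re.compile(body, re.IGNORECASE if "i" in flags else 0)
real_match = regex.search(text) is not None

print(looks_like_match, real_match)  # False True
```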

let: bad syntax (not an identifier and expression for a binding) in: wordslist ###scheme

(define (most-common-word str)
  (let (wordslist str-split str " ")))
I am trying to bind an inner variable to a list of strings, but I get the error "bad syntax". I looked for answers here, but the things I changed didn't help. str-split returns a list of strings, with " " as the separator. Thanks.
It should look like:
(let ([word-list <VALUE>]) <BODY>)
... which establishes a local binding from word-list to the value <VALUE>. This binding is effective only inside the <BODY> form enclosed by the let.
Now, in order to compute <VALUE>, you have to call str-split with the arguments you want (i.e. str and " "). The way you perform a function call is to wrap it in parentheses (this is valid only in a context where the form is evaluated as an expression, not where parentheses denote a binding, for example). So <VALUE> should really be:
(str-split str " ")
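Putting the two pieces together, the corrected definition looks like this (keeping the <BODY> placeholder from above, since the question doesn't show what the function should ultimately compute):

```racket
(define (most-common-word str)
  (let ([word-list (str-split str " ")])
    <BODY>))
```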

SPHINX field search operator issue

I am using sphinx 2.0.4-release with SPH_MATCH_EXTENDED2 query syntax. When I have an "empty value" in my query i.e.:
blah & ''
sphinx ignores it and searches just "blah". It still works the same way when I use the field search operator and the empty value comes last:
#field1 blah #field2 ''
But this query:
#field1 '' #field2 blah
causes the error: syntax error, unexpected TOK_FIELDLIMIT near ' '' #field2 blah'. Of course I can trim empty values, but this behaviour seems illogical to me... Am I doing something wrong? Or is it actually a bug?
Sphinx uses an inverted index. It breaks up the text into words and stores (hashes of) them.
As such, it doesn't index 'nothing' (it's not a word), so you can't search for an empty string.
Strictly speaking, all of those queries are syntax errors, and nonsense. But in some cases Sphinx silently discards the invalid syntax (it falls back to treating the quotes as word characters, which are not in charset_table and so are dropped) and thereby arrives at a 'valid' query, just not the one you intended.
The solution is simply to turn an empty field into a 'word' at indexing time; then you can search for the empty string! For example:
sql_query = SELECT id, title, IF(field1 = '','EMPTY_STRING',field1) AS field1, ....
Then you can just query as
#field1 EMPTY_STRING #field2 blah
What you use as 'EMPTY_STRING' is completely arbitrary.

ruby regex not working to remove class name from sql

I have:
Before gsub, the SQL is:
SELECT record_type.* FROM record_type WHERE (name = 'Registrars')
sql = sql.gsub(/SELECT\s+[^\(][A-Z]+\./mi,"SELECT ")
After gsub, the SQL is:
SELECT record_type.* FROM record_type WHERE (name = 'Registrars')
The desired result is to remove "record_type." from the statement, so after the regex runs it should be:
SELECT * FROM record_type WHERE (name = 'Registrars')
I didn't write this, it's in the asf-soap-adaptor gem. Can someone tell me why it doesn't work, and how to fix?
I suppose it should be written like this...
sql = sql.gsub(/SELECT\s+[^\(][A-Z_]+\./mi,"SELECT ")
... as the code in the question won't match if the field name contains an _ (underscore). I suppose that's why this code is in the gem: it can work under some conditions (i.e., with underscore-less field names).
Still, I admit I don't understand why exactly this replacement should be done. And shouldn't it include a 0-9 check as well? For example, a 'record_id1' field still won't be matched (and replaced) by the character class in the regular expression; you may have to either expand it to [0-9A-Z_] or just replace it entirely with \w.
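To see the difference concretely, here is the same before/after comparison reproduced with Python's re module (used here only as a neutral test bed; the Ruby pattern behaves analogously):

```python
import re

sql = "SELECT record_type.* FROM record_type WHERE (name = 'Registrars')"

# The gem's original class [A-Z]+ cannot cross the underscore in
# "record_type", so the pattern never matches and sql is left unchanged:
unchanged = re.sub(r"SELECT\s+[^(][A-Z]+\.", "SELECT ", sql, flags=re.I)

# Widening the class to \w (letters, digits, underscore) fixes it:
fixed = re.sub(r"SELECT\s+[^(]\w+\.", "SELECT ", sql, flags=re.I)

print(unchanged == sql)  # True
print(fixed)             # SELECT * FROM record_type WHERE (name = 'Registrars')
```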
So your before and after gsubs are the same? I can't tell you why it doesn't work if you don't tell me your expected result. Also, for help with interpreting Ruby regular expressions, check out rubular.com.

ANTLR parse problem

I need to be able to match a certain opening string ('[', then any number of equals signs or none, then '['), and then match the corresponding close (']', then the same number of equals signs, then ']') after some other match rules ((options{greedy=false;}:.)*, if you must know). I have no clue how to do this in ANTLR; how can I do it?
An example: I need to match [===[whatever arbitrary text ]===] but not [===[whatever arbitrary text ]==].
I need it to work for an arbitrary number of equals signs, so therein lies the problem: how do I get it to match the same number of equals signs in the close as in the open? The parser rules supplied so far don't seem to help.
You can't easily write a lexer for this; you need parser rules. Two rules should be sufficient: one responsible for matching the braces, one for matching the equals signs.
Something like this:
braces : '[' ']'
       | '[' equals ']'
       ;
equals : '=' equals '='
       | '=' braces '='
       ;
This should cover the use case you described. I'm not absolutely sure, but you may have to use a predicate in the first alternative of 'equals' to avoid ambiguous interpretations.
Edit:
It is hard to integrate your greedy rule while avoiding a lexer context switch or something similar (hard to do in ANTLR). But if you are willing to embed a little bit of Java in your grammar, you can write a lexer rule.
The following example grammar shows how:
grammar TestLexer;
SPECIAL : '[' { int counter = 0; }
          ('=' { counter++; } )+
          '['
          (options{greedy=false;}:.)*
          ']'
          ('=' { counter--; } )+
          { if (counter != 0) throw new RecognitionException(input); }
          ']'
        ;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
rule : ID
| SPECIAL
;
Your tags mention lexing, but your question itself doesn't. What you're trying to do is non-regular, so I don't think it can be done during lexing (though I don't remember whether ANTLR's lexer is strictly regular; it's been a couple of years since I last used ANTLR).
What you describe should be possible in parsing, however. Here's the grammar for what you described:
thingy : LBRACKET middle RBRACKET;
middle : EQUAL middle EQUAL
       | LBRACKET RBRACKET;
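Outside ANTLR, the "same number of equals signs on both sides" constraint is exactly what a regex backreference expresses. As a quick illustration in Python (this is the Lua-style long-bracket pattern, shown only to clarify the constraint; it is not something you can paste into an ANTLR grammar):

```python
import re

# \1 forces the closing run of '=' to repeat the opening run exactly;
# re.S lets . cross newlines, and *? keeps the content match non-greedy.
LONG_BRACKET = re.compile(r"\[(=*)\[(.*?)\]\1\]", re.S)

m = LONG_BRACKET.match("[===[whatever arbitrary text ]===]")
print(m is not None)   # True
print(m.group(2))      # whatever arbitrary text

# Mismatched counts do not match:
print(LONG_BRACKET.match("[===[whatever arbitrary text ]==]") is None)  # True
```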
