Ruby: Why does an equals sign in a literal regexp cause a parsing error?

These parse and execute fine:
"=".scan(/=/)
"=".scan (/=/)
This causes "unterminated regexp meets end of file":
"=".scan /=/
If I insert something before the = the error goes away:
"=".scan /^=/
What's going on?

I'm guessing that you're hitting this in the parser:
case '/':
if (IS_BEG()) {
lex_strterm = NEW_STRTERM(str_regexp, '/', 0);
return tREGEXP_BEG;
}
if ((c = nextc()) == '=') {
set_yylval_id('/');
lex_state = EXPR_BEG;
return tOP_ASGN;
}
Note the nextc() check in the second if. For reference, tOP_ASGN is:
%token <id> tOP_ASGN /* +=, -= etc. */
so it is used for operator-assign tokens.
This suggests that the /=/ in
'='.scan /=/
is being seen as the divide-assign operator (/=) followed by a start-regex-literal (/).
You'll have trouble (of a slightly different sort) with this:
' ='.scan / =/
but not this:
' ='.scan(/ =/)
There is often ambiguity when a method call doesn't have parentheses. In this case, I think operator precedence rules apply and that's not what you're expecting.
I tend to put parentheses on all my method calls because I'm too old and cranky to want to worry about how the parser is going to behave.
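As a small sketch of the workaround above (the sample string and variable names are illustrative, not from the question), each of these forms keeps the lexer from seeing `/=` right after a space, so none of them depends on how an unparenthesized call is parsed:

```ruby
# Illustrative input; any string containing "=" would do.
text = "a=b=c"

# Parentheses make it unambiguous that the argument is a regexp literal.
with_parens = text.scan(/=/)

# %r{...} does not start with "/", so it cannot be read as the /= operator.
with_percent_r = text.scan(%r{=})

# A character class keeps "=" away from the opening slash.
with_char_class = text.scan(/[=]/)

p with_parens, with_percent_r, with_char_class
```

All three calls return the same array of matches.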

Related

Ruby if ... any? ... include? syntax

I need to check if any elements of a large (60,000+ elements) array are present in a long string of text. My current code looks like this:
if $TARGET_PARTLIST.any? { |target_pn| pdf_content_string.include? target_pn }
self.last_match_code = target_pn
self.is_a_match = true
end
I get a syntax error undefined local variable or method target_pn.
Could someone let me know the correct syntax to use for this block of code? Also, if anyone knows of a quicker way to do this, I'm all ears!
In this case, all your syntax is correct, you've just got a logic error. While target_pn is defined (as a parameter) inside the block passed to any?, it is not defined in the block of the if statement because the scope of the any?-block ends with the closing curly brace, and target_pn is not available outside its scope. A correct (and more idiomatic) version of your code would look like this:
self.is_a_match = $TARGET_PARTLIST.any? do |target_pn|
included = pdf_content_string.include? target_pn
self.last_match_code = target_pn if included
included
end
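To make the scoping point concrete, here is a minimal sketch with made-up data (part_list and pdf_text are hypothetical stand-ins for $TARGET_PARTLIST and pdf_content_string). It shows that the block parameter simply does not exist once the block has closed:

```ruby
# Hypothetical sample data standing in for $TARGET_PARTLIST and the PDF text.
part_list = ["AB-100", "CD-200"]
pdf_text  = "Invoice for part CD-200, qty 4"

# target_pn exists only between the braces of this block.
matched = part_list.any? { |target_pn| pdf_text.include?(target_pn) }

# Outside the block the name is unknown; defined? returns nil instead of raising.
visible_outside = defined?(target_pn)

puts "matched: #{matched}, target_pn visible outside: #{!visible_outside.nil?}"
```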
Alternately, as jvillian so kindly suggests, one could turn the string into an array of words, then do an intersection and see if the resulting set is nonempty. Like this:
self.is_a_match = !($TARGET_PARTLIST &
pdf_content_string.gsub(/[^A-Za-z ]/,"")
.split).empty?
Unfortunately, this approach loses self.last_match_code. As a note, pointed out by Sergio, if you're dealing with non-English languages, the above regex will have to be changed.
Hope that helps!
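One possible way to keep the matching code with the intersection approach, sketched with made-up data: the intersection is itself an array, so its first element could stand in for last_match_code. The variable names here are hypothetical, and because the [^A-Za-z ] filter also strips digits and hyphens, the sketch assumes purely alphabetic part names.

```ruby
# Made-up stand-ins for $TARGET_PARTLIST and pdf_content_string.
part_list = %w[able baker]
pdf_text  = "Are you comfortable? Yes, I am able."

# Same cleanup as above: drop everything except letters and spaces, then split.
words  = pdf_text.gsub(/[^A-Za-z ]/, "").split
common = part_list & words

is_a_match      = !common.empty?
last_match_code = common.first   # nil when nothing matched

p words, common
```

Because whole words are compared, "able" is found as its own word here and not inside "comfortable".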
You should use Enumerable#find rather than Enumerable#any?.
found = $TARGET_PARTLIST.find { |target_pn| pdf_content_string.include? target_pn }
if found
self.last_match_code = found
self.is_a_match = true
end
Note this does not ensure that the string contains a word that is an element of $TARGET_PARTLIST. For example, if $TARGET_PARTLIST contains the word "able", that string will be found in the string, "Are you comfortable?". If you only want to match words, you could do the following.
found = $TARGET_PARTLIST.find { |target_pn| pdf_content_string[/\b#{target_pn}\b/] }
Note this uses the method String#[].
\b is a word boundary in the regular expression, meaning that the first (last) character of the match cannot be preceded (followed) by a word character (a letter, digit or underscore).
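A short sketch of the difference, using the "able" example from above plus a made-up part number "A.1". The Regexp.escape call is an addition that is not in the code above; it is only needed if a part number may contain regexp metacharacters such as ".".

```ruby
text = "Are you comfortable?"

# Plain substring search finds "able" inside "comfortable".
substring_hit = text.include?("able")

# With \b on both sides, "able" must stand alone as a word, so this is nil.
word_hit = text[/\bable\b/]

# Hypothetical part number containing a regexp metacharacter.
target_pn = "A.1"
loose  = "ref AX1 here"[/\b#{target_pn}\b/]                 # "." matches any character
strict = "ref AX1 here"[/\b#{Regexp.escape(target_pn)}\b/]  # "." must be a literal dot

p substring_hit, word_hit, loose, strict
```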
If speed is important it may be faster to use the following.
found = $TARGET_PARTLIST.find { |target_pn|
pdf_content_string.include?(target_pn) && pdf_content_string[/\b#{target_pn}\b/] }
A probably more performant way would be to move all this into native code by letting Regexp search for it.
# needed only once
TARGET_PARTLIST_RE = Regexp.new("\\b(?:#{$TARGET_PARTLIST.sort.map { |pl| Regexp.escape(pl) }.join('|')})\\b")
# to check
self.last_match_code = pdf_content_string[TARGET_PARTLIST_RE]
self.is_a_match = !self.last_match_code.nil?
A much more performant way would be to build a prefix tree and create the regexp using the prefix tree (this optimises the regexp lookup), but this is a bit more work :)
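For what it's worth, here is a rough sketch of the prefix-tree idea, not a tuned implementation. build_trie and trie_to_pattern are hypothetical helper names, and part_list is a made-up stand-in for $TARGET_PARTLIST. The trie is a nested hash, and shared prefixes end up written only once in the generated pattern.

```ruby
# Build a nested-hash prefix tree; the :end key marks a complete word.
def build_trie(words)
  root = {}
  words.each do |word|
    node = root
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[:end] = true
  end
  root
end

# Turn a trie node into a regexp source string covering all words below it.
def trie_to_pattern(node)
  branches = node.reject { |key, _| key == :end }.map do |ch, child|
    Regexp.escape(ch) + trie_to_pattern(child)
  end
  return "" if branches.empty?

  body = branches.size == 1 ? branches.first : "(?:#{branches.join('|')})"
  # If a word ends at this node, whatever follows is optional.
  node[:end] ? "(?:#{body})?" : body
end

part_list = %w[part parts party pump]
part_re   = Regexp.new("\\b(?:#{trie_to_pattern(build_trie(part_list))})\\b")

found   = "two parts left"[part_re]
missing = "a partial match"[part_re]
p part_re, found, missing
```

Whether this actually beats the plain alternation would have to be measured on the real list.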

If statement not working - what is wrong with this syntax?

I get an "Expected Identifier" message against the if line. Any ideas why?
if ([inputA.text isEqualToString:@""]) && ([inputB.text <> isEqualToString:@""]) {
c = 1;
}
I'm trying to say if both inputs are empty...
I presume there isn't an easier way to say if the text is null in Obj C++?
An if statement requires that its condition expression be enclosed in parentheses. You have a compound expression. You've used parentheses around the subexpressions of the logical AND operation (&&), but you haven't surrounded the entire expression in parentheses. (The subexpressions don't actually require parentheses in this case, but they don't hurt.)
Next, you have a random <> in the second subexpression. What is that doing in there? In some languages that is the "not equal" operator, but a) it's not an operator in C or Objective-C, b) it wouldn't go inside a message-send expression like that, and c) you claim you were trying to check that both inputs are empty, so I wouldn't expect you to try to negate the test for equality with the empty string.
So, fixing just those problems yields:
if ([inputA.text isEqualToString:@""] && [inputB.text isEqualToString:@""]) {
c = 1;
}
That said, pie's answer is good, too. It also works if either of the inputs has a nil text property.
if ([inputA.text length]==0 && [inputB.text length]==0)
{
c = 1;
}

Parslet : exclusion clause

I am currently writing a Ruby parser using Ruby, and more precisely Parslet, since I think it is far easier to use than Treetop or Citrus. I create my rules using the official specifications, but there are some statements I cannot write, since they "exclude" some syntax, and I do not know how to do that. Here is an example to illustrate.
Here is a basic rule :
foo::=
any-character+ BUT NOT (foo* escape_character barbar*)
# Knowing that (foo* escape_character barbar*) is included in any-character
How could I translate that using Parslet ? Maybe the absent?/present? stuff ?
Thank you very much, hope someone has an idea....
Have a nice day!
EDIT:
I tried what you said, so here's my translation into Ruby language using parslet:
rule(:line_comment){(source_character.repeat >> line_terminator >> source_character.repeat).absent? >> source_character.repeat(1)}
However, it does not seem to work (the sequence in parens). I did some tests, and came to the conclusion that what's written in my parens is wrong.
Here is a much simpler example. Consider these rules:
# Parslet rules
rule(:source_character) {any}
rule(:line_terminator){ str("\n") >> str("\r").maybe }
rule(:not){source_character.repeat >> line_terminator }
# Which looks like what I try to "detect" up there
I test these rules with this code:
# Code to test :
code = "test
"
But I get that:
Failed to match sequence (SOURCE_CHARACTER{0, } LINE_TERMINATOR) at line 2 char 1.
`- Failed to match sequence (SOURCE_CHARACTER{0, } LINE_TERMINATOR) at line 2 char 1.
   `- Failed to match sequence (' ' ' '?) at line 2 char 1.
      `- Premature end of input at line 2 char 1.
nil
If this sequence doesn't work, my 'complete' rule up there won't ever work... If anyone has an idea, it would be great.
Thank you !
You can do something like this:
rule(:word) { match['^")(\\s'].repeat(1) } # normal word
rule(:op) { str('AND') | str('OR') | str('NOT') }
rule(:keyword) { str('all:') | str('any:') }
rule(:searchterm) { keyword.absent? >> op.absent? >> word }
In this case, the absent? does a lookahead to make sure the next token is not a keyword; if not, then it checks to make sure it's not an operator; if not, finally see if it's a valid word.
An equivalent rule would be:
rule(:searchterm) { (keyword | op).absent? >> word }
Parslet matching is greedy by nature. This means that when you repeat something like
foo.repeat
parslet will match foo until it fails. If foo is
rule(:foo) { any }
you will be on the path to fail, since any.repeat always matches the entire rest of the document!
What you're looking for is something like the string matcher in examples/string_parser.rb (parslet source tree):
rule :string do
str('"') >>
(
(str('\\') >> any) |
(str('"').absent? >> any)
).repeat.as(:string) >>
str('"')
end
What this says is: 'match ", then match either a backslash followed by any character at all, or match any other character, as long as it is not the terminating ".'
So .absent? is really a way to exclude things from a match that follows:
str('foo').absent? >> (str('foo') | str('bar'))
will only match 'bar'. If you understand that, I assume you will be able to resolve your difficulties. Although those will not be the last on your way to a Ruby parser...
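If it helps to see the same idea outside Parslet, here is a sketch using Ruby's built-in Regexp instead, where the negative lookahead (?!...) plays a role similar to absent?. This is an analogy, not Parslet code, and the variable names are made up.

```ruby
# Negative lookahead (?!foo) rejects "foo" before the alternation is tried,
# much like str('foo').absent? >> (str('foo') | str('bar')).
not_foo = /\A(?!foo)(?:foo|bar)/

foo_matches = not_foo.match?("foo")
bar_matches = not_foo.match?("bar")

# Regexp counterpart of the :string rule: either a backslash followed by any
# character, or any character that is not the closing quote.
quoted = /"((?:\\.|(?!").)*)"/m

sample   = 'say "a \"quoted\" word" now'
captured = sample[quoted, 1]

p foo_matches, bar_matches, captured
```

One difference to keep in mind: a regexp engine backtracks, while Parslet's repeat does not give characters back, which is why any.repeat swallows the rest of the input.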

gsub! On an argument doesn't work

I am making a function that turns the first argument into a PHP var (useless, I know), and set it equal to the second argument. I'm trying to gsub! it to get rid of all the characters that can't be used in a PHP var. Here is what I have:
dvar = "$" + name.gsub!(/.?\/!#\#{}$%^&*()`~/, "") { |match| puts match }
I have the puts match there to make sure some of the characters were removed. name is a variable passed into a method in which this is its purpose. I am getting this error:
TypeError: can't convert nil into String
cVar at ./Web.rb:31
(root) at C:\Users\Andrew\Documents\NetBeansProjects\Web\lib\main.rb:13
Web.rb is the file this line is in, and main.rb is the file calling this method. How can I fix this?
EDIT: If I remove the ! in gsub!, it goes through, but the characters aren't removed.
Short answer
Use dvar = "$" + name.tr(".?\/!#\#{}$%^&*()`~", '')
Long answer
The problem you are facing is that the gsub! call is returning nil. You can't concatenate (+) a String with a nil.
That's happening because you have a malformed Regexp. You aren't escaping the special regex symbols, like $, * and ., just for a start. Also, the way it is now, gsub will only match if your string contains all those symbols in sequence. You should use the pipe (|) operator to make an OR-like operation.
gsub! will also return nil if no substitutions happened.
See the documentation for gsub and gsub! here: http://ruby-doc.org/core/classes/String.html#M001186
I think you should replace gsub! with gsub. Do you really need name to change?
Example:
name = "m$var.name$$"
dvar = "$" + name.gsub!(/\$|\.|\*/, "") # $ or . or *
# dvar now contains $mvarname and name is mvarname
Your line, corrected:
dvar = "$" + name.gsub(/\.|\?|\/|\!|\#|\\|\#|\{|\}|\$|\%|\^|\&|\*|\(|\)|\`|\~/, "")
# some things shouldn't (or aren't needed to) be escaped, I don't remember them all right now
As J-_-L pointed out, you could also use a character class ([]), which makes it a little clearer, I guess. Well, it's hard to mentally parse anyway.
dvar = "$" + name.gsub(/[\.\?\/\!\#\\\#\{\}\$\%\^\&\*\(\)\`\~]/, "")
But because what you are doing is simple character replacement, the best method is tr (again reminded by J-_-L!):
dvar = "$" + name.tr(".?\/!#\#{}$%^&*()`~", '')
Way easier to read and make modifications.
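A quick sketch of the nil behaviour described above, with made-up values: gsub! returns nil when it changes nothing, while gsub and tr always hand back a String.

```ruby
# dup gives a mutable copy, so gsub! is safe to call on it.
name = "plain_name".dup

in_place = name.gsub!(/[$.*]/, "")   # nothing to replace, so this is nil
copy     = name.gsub(/[$.*]/, "")    # always a String, changed or not

# tr with an empty replacement deletes the listed characters.
dvar = "$" + 'm$var.name$$'.tr('$.*', '')

p in_place, copy, dvar
```

This is why "$" + name.gsub!(...) fails whenever the name is already clean.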
You cannot pass both a second parameter and a block to gsub (the block is ignored).
The regex is wrong; you forgot the square brackets:
/[.?\/!#\#{}$%^&*()`~]/
Because your regex is wrong, it didn't match anything, and because gsub! returns nil if nothing was replaced, you get this strange nil error.
By the way, you should use gsub rather than gsub! in this case, because you are using the return value (and not name itself), and then the error would not have happened.
I don't see what the block is for.
Just do:
name = 'hello.?\/!##$%^&*()`~hello'
dvar = "$" + name.gsub(/\.|\?|\\|\/|\!|\#|\#|\{|\}|\$|\%|\^|\&|\*|\(|\)|\`|\~/, "")
puts dvar # => "$hellohello"
Or use [] to denote OR:
dvar = "$" + name.gsub(/[\.\?\\\/\!\#\\\#\{\}\$\%\^\&\*\(\)\`\~]/, "")
You have to escape the special characters and then OR them, so that each one is removed individually rather than only when they all appear together.
Also, there is really no need to use gsub! to modify the string in place; use the non-mutating gsub, since you assign the result to a new variable.
gsub! returns nil when nothing is replaced, and String#+ cannot concatenate nil, which gives you the error mentioned.
It seems as if the name object is nil. You may be calling gsub! on nil, which usually complains with a NoMethodError (private method gsub! called for nil:NilClass). Since I don't know which version of Ruby you are using, I am not sure the error would be the same, but it's a good place to start looking.

ANTLR parse problem

I need to be able to match a certain string ('[', then any number of equals signs or none, then '['), then I need to match a matching close bracket (']', then the same number of equals signs, then ']') after some other match rules ((options{greedy=false;}:.)* if you must know). I have no clue how to do this in ANTLR. How can I do it?
An example: I need to match [===[whatever arbitrary text ]===] but not [===[whatever arbitrary text ]==].
I need to do it for an arbitrary number of equals signs as well, so therein lies the problem: how do I get it to match the same number of equals signs in the close as in the open? The parser rules supplied so far don't seem to help.
You can't easily write a lexer for it; you need parsing rules. Two rules should be sufficient: one is responsible for matching the braces, one for matching the equals signs.
Something like this:
braces : '[' ']'
| '[' equals ']'
;
equals : '=' equals '='
| '=' braces '='
;
This should cover the use case you described. I'm not absolutely sure, but you may have to use a predicate in the first alternative of 'equals' to avoid ambiguous interpretations.
Edit:
It is hard to integrate your greedy rule and at the same time avoid a lexer context switch or something similar (hard in ANTLR). But if you are willing to integrate a little bit of Java in your grammar, you can write a lexer rule.
The following example grammar shows how:
grammar TestLexer;
SPECIAL : '[' { int counter = 0; } ('=' { counter++; } )+ '[' (options{greedy=false;}:.)* ']' ('=' { counter--; } )+ { if(counter != 0) throw new RecognitionException(input); } ']';
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
rule : ID
| SPECIAL
;
Your tags mention lexing, but your question itself doesn't. What you're trying to do is non-regular, so I don't think it can be done as part of lexing (though I don't remember if ANTLR's lexer is strictly regular -- it's been a couple of years since I last used ANTLR).
What you describe should be possible in parsing, however. Here's the grammar for what you described:
thingy : LBRACKET middle RBRACKET;
middle : EQUAL middle EQUAL
| LBRACKET RBRACKET;
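To illustrate the "same number of equals signs" constraint outside ANTLR, here is a sketch using a Ruby Regexp with a backreference. This is not ANTLR code, and the variable names are made up. Backreferences go beyond strictly regular languages, which fits the point above that the pattern is non-regular.

```ruby
# \1 must repeat exactly the run of "=" captured after the opening "[".
# (.*?) is non-greedy, mirroring (options{greedy=false;}:.)* in the question.
long_bracket = /\A\[(=*)\[(.*?)\]\1\]\z/m

balanced   = "[===[whatever arbitrary text ]===]".match(long_bracket)
unbalanced = "[===[whatever arbitrary text ]==]".match(long_bracket)

level = balanced[1].length   # number of "=" signs on each side
body  = balanced[2]          # text between the brackets

p level, body, unbalanced
```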
