I have this working pair of rules in Treetop that the perfectionist in me believes should be one and only one rule, or maybe something more beautiful at least:
rule _
crap
/
" "*
end
rule crap
" "* "\\x0D\\x0A"* " "*
end
I'm parsing some expressions that every now and then ended up with "\x0D\x0A". Yeah, not "\r\n" but "\x0D\x0A". Something was double escaped at some point. Long story.
That rule works, but it's ugly and it bothers me. I tried this:
rule _
" "* "\\x0D\\x0A"* " "*
/
" "*
end
which caused
SyntaxError: (eval):1276:in `load_from_string': compile error
(eval):1161: class/module name must be CONSTANT
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:42:in `load_from_string'
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:35:in `load'
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:32:in `open'
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:32:in `load'
Ideally I would like to actually write something like:
rule _
(" " | "\\x0D\\x0A")*
end
but that doesn't work, and while we are at it, I also discovered that you can't have only one * per rule:
rule _
" "*
/
"\n"*
end
that will match " ", but never \n.
I see you're using three different OR chars: /, | and \ (of which only the first means OR).
This works fine:
grammar Language
rule crap
(" " / "\\x0D\\x0A")* {
def value
text_value
end
}
end
end
#!/usr/bin/env ruby
require 'rubygems'
require 'treetop'
require 'polyglot'
require 'language'
parser = LanguageParser.new
value = parser.parse(' \\x0D\\x0A \\x0D\\x0A ').value
print '>' + value + '<'
prints:
> \x0D\x0A \x0D\x0A <
You said "I also discovered that you can't have only one * per rule" (you mean: you CAN have), "that will match " ", but never \n".
Of course; the rule succeeds when it matches zero space characters. You could just use a + instead:
rule _
" "+
/
"\n"*
end
You could also parenthesise the space characters if you want to match any number of space-or-newline characters:
rule _
(" " / "\n")*
end
Your error "class/module name must be CONSTANT" is because the rule name is used as the prefix of a module name to contain any methods attached to your rule. A module name may not begin with an underscore, so you can't use methods in a rule whose name begins with an underscore.
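The naming restriction itself is plain Ruby and can be reproduced without Treetop. A minimal sketch, where `_Crap0` and `Crap0` are made-up module names chosen for illustration (they are not claimed to be the names Treetop generates):

```ruby
# Hypothetical module names, used only to show Ruby's rule for constants.
# A lambda is used so the module definitions are not evaluated inside a method body.
definable = lambda do |source|
  begin
    eval(source)
    true
  rescue SyntaxError
    false
  end
end

underscore_name = definable.call("module _Crap0; end")  # name starts with an underscore
constant_name   = definable.call("module Crap0; end")   # name starts with a capital letter

puts "underscore-prefixed module name accepted? #{underscore_name}"
puts "capitalised module name accepted? #{constant_name}"
```

A module name has to be a constant, and a constant has to start with an uppercase letter, so the underscore-prefixed definition is rejected by the parser.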
I'm trying to concatenate a constant into a string, but I'm getting "syntax error, unexpected unary+, expecting `end' (SyntaxError)".
This is an example of what I have to do:
NAME = "Jane"
def a_function
s = 'Hi' + NAME +' !'
puts s
end
I know you can do "Hi #{NAME}!" but in my case the string has to be with single quotes.
How can I achieve this?
You are missing a space between + and ' !'.
This is one of Ruby's confusing special cases: a single expression like +x is a valid unary expression meaning just x, the same way +1 means 1.
Because of this, Ruby is likely interpreting your expression a + b +c as a + b c, which is invalid, hence the error.
The fix:
s = 'Hi ' + NAME + ' !'
#                 ^------ Note the space here!
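As a small sketch of the point about spacing, the version below keeps a space on both sides of every binary +, and also shows `format` as one possible way to stay with single-quoted strings. The method names `greeting` and `greeting_via_format` are made up for this example.

```ruby
NAME = "Jane"

def greeting
  # A space on both sides of each +, so neither one can be read as unary plus.
  'Hi ' + NAME + ' !'
end

def greeting_via_format
  # format substitutes %s, so no double-quoted interpolation is needed.
  format('Hi %s !', NAME)
end

puts greeting
puts greeting_via_format
```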
I need simple parsing with embedded single and double quotes. For the following input:
" hello 'there ok \"hohh\" ' ciao \"eeee \" \" yessss 'aaa' \" %%55+ "
I need the following output:
["hello", "there ok \"hohh\" ", "ciao", "eeee ", " yessss 'aaa' ", "%%55+"]
Why does the following Ruby code that I came up with work? I do not understand the regex part. I know basic regex, and I assumed that embedded quotes would not work, but they do: single quotes can contain double quotes and vice versa.
text.scan(/\"(.*?)\"|'(.*?)'|([^\s]+)/).flatten.select{|x|x}
No need to solve this with a custom regex; the Ruby standard library contains a module for this: Shellwords.
Manipulates strings like the UNIX Bourne shell
This module manipulates strings according to the word parsing rules of the UNIX Bourne shell.
Usage:
require 'shellwords'
str = " hello 'there ok \"hohh\" ' ciao \"eeee \" \" yessss 'aaa' \" %%55+ "
Shellwords.split(str)
#=> ["hello", "there ok \"hohh\" ", "ciao", "eeee ", " yessss 'aaa' ", "%%55+"]
# Or equivalently:
str.shellsplit
#=> ["hello", "there ok \"hohh\" ", "ciao", "eeee ", " yessss 'aaa' ", "%%55+"]
The above is the "right" answer. Use that. What follows is additional information to explain why to use this, and why your answer "sort-of" works.
Parsing these strings accurately is tricky! Your regex attempt works for most inputs, but does not properly handle various edge cases. For example, consider:
str = "foo\\ bar"
str.shellsplit
#=> ["foo bar"] (correct!)
str.scan(/\"(.*?)\"|'(.*?)'|([^\s]+)/).flatten.select{|x|x}
#=> ["foo\\", "bar"] (wrong!)
The method's implementation does still use a (more complex!) regex under the hood, but also handles edge cases such as invalid inputs - which yours does not.
line.scan(/\G\s*(?>([^\s\\\'\"]+)|'([^\']*)'|"((?:[^\"\\]|\\.)*)"|(\\.?)|(\S))(\s|\z)?/m)
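To illustrate the point about invalid input, here is a small sketch using an unbalanced quote. The input string is made up for the example; the regex is the one from the question, with .compact in place of .select{|x|x}.

```ruby
require 'shellwords'

unbalanced = "foo 'bar"   # the single quote is never closed

# Shellwords refuses the input instead of guessing.
shellwords_result =
  begin
    Shellwords.split(unbalanced)
  rescue ArgumentError => e
    "rejected: #{e.class}"
  end

# The question's regex falls through to the non-whitespace branch
# and silently keeps the stray quote.
regex_result = unbalanced.scan(/\"(.*?)\"|'(.*?)'|([^\s]+)/).flatten.compact

p shellwords_result
p regex_result
```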
So without digging too deeply into the flaws of your approach (but suffice to say, it doesn't always work!), why does it mostly work? Well, your regex:
/\"(.*?)\"|'(.*?)'|([^\s]+)/
...is saying:
If " is found, match as little as possible (.*?) up until the closing ".
Same as above, for single quotes (').
If neither a single nor a double quote is found, match a run of non-whitespace characters ([^\s]+ -- which could equivalently have been written as \S+).
The .flatten is necessary because you're using capture groups ((...)), so scan returns one array of captures per match. Non-capture groups ((?:...)) would avoid this, although each match would then include its surrounding quotes.
The .select{|x|x}, or (effectively) equivalently .compact was also necessary because of these capture groups - since in each match, 2 of the 3 groups were not part of the result.
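A short sketch of what scan returns before and after the clean-up steps. The sample text is invented for the example, and the pattern is a lightly simplified form of the one in the question.

```ruby
text = %q{hello 'there ok' "ciao"}
pattern = /"(.*?)"|'(.*?)'|(\S+)/

# With capture groups, scan yields one sub-array per match,
# holding nil for every group that did not take part in that match.
raw = text.scan(pattern)

# flatten merges the sub-arrays; compact drops the nils.
words = raw.flatten.compact

p raw
p words
```

Each match fills exactly one of the three groups, which is why two of every three entries are nil before compact.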
I was trying to write a single string across multiple lines since it was too long, and I arrived at this solution, which I think looks best, but I couldn't find anything about it in the Ruby documentation:
my_original = 'what I originally had ' +
'across multiple lines'
# executes to: "what I originally had across multiple lines"
new_style = 'new format to '\
'span multiple lines'
# executes to: "new format to span multiple lines"
However, I saw a lot of ways this could be done, and I was wondering whether they use concatenation or interpolation; all I could find was this. Performance is not particularly important in this case, but knowing what goes on under the covers can't hurt. So I was wondering about the differences between these.
my_original = 'what I originally had ' +
'across multiple lines'
# executes to: "what I originally had across multiple lines"
s_one = 'I assume this is a more '
s_two = 'verbose version of the '
s_three = 'first example.'
my_string = s_one + s_two + s_three
# executes to: "I assume this is a more verbose version of the first example."
my_first_solution = 'that breaks whenever'
my_first_solution << 'ruby 3.0 might be released'
# executes to:
style_i_used = 'which can span multiple '\
'lines without having '\
'extra white space'
# executes to: "which can span multiple lines without having extra white space"
another_string = <<-HEREDOC
when you don't mind
really wonky indentation
or having extra spaces.
HEREDOC
# executes to:"when you don't mind \nreally wonky indentation \n or having extra spaces\n and new lines. \n"
# "Is this just a one line string with no special characters?"
another_string = <<~HEREDOC
I assume this is the same as
the non squiggly version with
stripping between lines.
HEREDOC
# executes to: "I assume this is the same as \nthe non squiggly version with \nstripping between lines.\n"
#Edit
another_one =
"I did not originally include; however,
this one also adds a bunch of extra
white space and new lines."
# executes to:"I did not originally include; however,\n this one also adds a bunch of extra \n white space and new lines."
This is a NoMethodError
style_i_used = 'a format to' /
'span multiple lines'
/ is a method call which Strings don't have. \ on the other hand "escapes" the newline so it's treated as one string.
These won't give an error, but are bad:
my_original = 'what I originally had' +
'across multiple lines'
s_one = 'I assume this is the same as '
s_two = 'the first example with the '
s_three = 'code being more verbose'
my_string = s_one + s_two + s_three
my_first_solution = 'that breaks whenever'
my_first_solution << 'ruby 3.0 might be released'
+ and << are method calls. I don't think the Ruby parser is smart enough to see that these can all be compile-time strings so these strings are actually being constructed during run-time with object allocation and everything. That is entirely unnecessary.
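A minimal sketch of the distinction: adjacent literals joined with a trailing backslash are written as one string literal, while + and << can be invoked by name like any other method. The variable names are made up for the example.

```ruby
# Adjacent literals joined by a trailing backslash form a single string literal.
adjacent = 'span multiple ' \
           'lines'

# + and << are ordinary String methods, so they can be sent by name.
plus = 'span multiple '.public_send(:+, 'lines')

appended = String.new('span multiple ')  # String.new returns an unfrozen copy
appended.public_send(:<<, 'lines')

# String defines no / method, which is why the / variant raises NoMethodError.
has_slash = 'span multiple '.respond_to?(:/)

p adjacent
p plus
p appended
p has_slash
```

All three approaches produce the same text; they differ only in whether a method is called to build it.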
The other kinds have different usage/results, so you use whichever is appropriate.
str = "a"\
"b" # result is "ab"
str = "a\
b" # also "ab"
str = "a
b" # result is "a\nb"
str = <<END
 '"a'"
 '"b'"
END
# result is " '\"a'\"\n '\"b'\"\n"
str = <<~END
 '"a'"
 '"b'"
END
# result is "'\"a'\"\n'\"b'\"\n"
another_one =
"what I originally had
across multiple lines"
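For the two indented heredoc forms, a small sketch with made-up content may help. It assumes the four-space indentation shown here is kept exactly as written.

```ruby
# <<- only allows the terminator to be indented; the body keeps its indentation.
dash = <<-TEXT
    keeps its
    indentation
    TEXT

# <<~ additionally strips the common leading whitespace from the body.
squiggly = <<~TEXT
    drops the common
      leading whitespace
    TEXT

p dash
p squiggly
```

Both forms keep the newline at the end of every line, including the last one, so call .chomp or .strip if that is unwanted.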
This is my first entry on StackOverflow and I'm a newbie coder.
So I'm making a simple addition calc and I added commas in the last 2 lines to print out integers ...
What am I missing? The error says
C:/Ruby193/rubystuff/ex1.rb:13: syntax error, unexpected ',' print
("The result of the addition is " +,result)
I thought this was the right thing to do ... I must have missed something simple.
print ("Please enter your name: ")
name = gets
puts ("Hello, " + name)
print ("Enter a number to add: ")
num1 = gets
print ("Enter a second number to add: ")
num2 = gets
result = Integer(num1) + Integer(num2)
print result
print ("The result of the addition is ",result)
print ("So the result of adding " + num1.chomp + " plus " + num2.chomp + " equals: ",result)
Ruby has string interpolation, and I think most would argue that's the most idiomatic way of doing things. RubyMonk does a great job of explaining it.
By changing the print call to puts, you can do:
puts "The result of the addition is #{result}"
There are two ways to pass arguments to a method:
in parentheses directly after the method name
without parentheses with whitespace after the method name
You have white space after the method, ergo you are using option #2 and are passing a single argument ("The result of the addition is ",result) to the method, but ("The result of the addition is ",result) is not legal syntax.
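The two calling styles, and the form that fails, can be sketched as follows. `count_args` is a hypothetical helper that only reports how many arguments it received; the failing call is wrapped in eval so that the rest of the script still parses.

```ruby
# Hypothetical helper: returns the number of arguments it was given.
def count_args(*args)
  args.length
end

with_parens    = count_args("The result is ", 3)        # parentheses directly after the name
without_parens = count_args "The result is ", 3         # no parentheses, whitespace after the name
grouped        = count_args ("The result is " + 3.to_s) # one parenthesised expression as the only argument

# Whitespace followed by a parenthesised, comma-separated list does not parse.
space_then_list =
  begin
    eval('count_args ("The result is ", 3)')
  rescue SyntaxError
    :syntax_error
  end

p with_parens, without_parens, grouped, space_then_list
```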
Is there any way to write code like this so that what it does is clearer?
a = (a.split(" ")[1..-1]).join(" ")
That deletes the first word of a sentence but the code doesn't look expressive at all.
irb(main):024:0> "this is a test".split.drop(1) * " "
=> "is a test"
Edited to add:
Explanation:
By default #split delimits on whitespace.
#drop(1) gets rid of the first entry.
* " " does the same as #join(" ").
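The three steps can be sketched one at a time; the intermediate variable names are made up for the example.

```ruby
sentence = "this is a test"

words = sentence.split   # no argument: split on runs of whitespace
rest  = words.drop(1)    # everything except the first word

via_star = rest * " "    # Array#* with a String argument joins the elements
via_join = rest.join(" ")

p words
p via_star
p via_join
```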
For somebody who is used to reading regexps, this is pretty clean:
a = a.sub(/^\S+\s/, "")
YMMV.
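A quick sketch of how the regex version behaves on a few inputs, next to the split-based version for comparison. The constant name `FIRST_WORD` and the sample strings are invented for the example.

```ruby
FIRST_WORD = /^\S+\s/   # hypothetical name for the pattern used above

one_space   = "this is a test".sub(FIRST_WORD, "")
two_spaces  = "this  is a test".sub(FIRST_WORD, "")  # only one whitespace character is consumed
single_word = "test".sub(FIRST_WORD, "")             # no whitespace follows, so nothing matches

# The split-based version collapses the extra whitespace instead.
via_split = "this  is a test".split.drop(1).join(" ")

p one_space, two_spaces, single_word, via_split
```

The two approaches agree on single-spaced input and differ when words are separated by more than one whitespace character.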
code
a = a.split[1..-1] * " "
explanation
String#split's default parameter is " "
Array * String is an alias for Array.join(String)
On second thought, I'm not sure it's more transparent to someone who is not familiar with Ruby, per se. But anyone who has worked with Ruby strings for a little while will understand what's going on. And it's a lot cleaner than the original version.
UPDATE
As per just-my-correct-opinion's answer (which you all should vote up instead of mine), if you are running Ruby 1.9.1 (which you should be, anyway) or Ruby 1.8.7, you can do:
a = a.split.drop(1) * " "
Maybe making the process explicit will help:
words = a.split
words.shift
a = words.join " "
And if you were using this throughout some code you might want to create the
methods in String and Array to make your code readable. (works in 1.8.6 too)
class String
def words
split
end
end
class Array
def but_first
self[1..-1]
end
def to_sentence
join(' ')
end
end
str = "And, this is a sentence"
puts str.words.but_first.to_sentence
Something like the following:
a = a[a.index(' ') + 1, a.length - (a.index(' ') + 1)]
There is no check for a string without a space, though.
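One way the missing check could look, as a sketch only. `drop_first_word` is a hypothetical helper, and returning an empty string for a one-word input is an assumption about the desired behaviour.

```ruby
# Hypothetical helper: removes everything up to and including the first space.
def drop_first_word(sentence)
  space_at = sentence.index(' ')
  return '' if space_at.nil?   # assumption: a single word leaves nothing behind

  sentence[(space_at + 1)..-1]
end

p drop_first_word("this is a test")
p drop_first_word("single")
```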