I'm trying to build my own evaluator for mathematical expressions in ruby, and before doing that am trying to implement a parser to break the expression into a tree(of arrays). It correctly breaks down expressions with parenthesis, but I am having lots of trouble trying to figure out how to make it correctly break up an expression with operator precedence for addition.
Right now, a string like 1+2*3+4 becomes 1+[2*[3+4]] instead of 1+[2*3]+4. I'm trying to do the simplest solution possible.
Here is my code:
#d = 0
#error = false
#manipulate an array by reference
def calc_expr expr, array
until #d == expr.length
c = expr[#d]
case c
when "("
#d += 1
array.push calc_expr(expr, Array.new)
when ")"
#d += 1
return array
when /[\*\/]/
#d +=1
array.push c
when /[\+\-]/
#d+=1
array.push c
when /[0-9]/
x = 0
matched = false
expr[#d]
until matched == true
y = expr.match(/[0-9]+/,#d).to_s
case expr[#d+x]
when /[0-9]/
x+=1
else matched = true
end
end
array.push expr[#d,x].to_i
#d +=(x)
else
unless #error
#error = true
puts "Problem evaluating expression at index:#{#d}"
puts "Char '#{expr[#d]}' not recognized"
end
return
end
end
return array
end
#expression = ("(34+45)+(34+67)").gsub(" ","")
evaluated = calc #expression
puts evaluated.inspect
For fun, here's a fun regex-based 'parser' that uses the nice "inside-out" approach suggested by #DavidLjungMadison. It performs simple "a*b" multiplication and division first, followed by "a+b" addition and subtraction, and then unwraps any number left in parenthesis (a), and then starts over.
For simplicity in the regex I've only chosen to support integers; expanding each -?\d+ to something more robust, and replacing the .to_i with .to_f would allow it to work with floating point values.
module Math
def self.eval( expr )
expr = expr.dup
go = true
while go
go = false
go = true while expr.sub!(/(-?\d+)\s*([*\/])\s*(-?\d+)/) do
m,op,n = $1.to_i, $2, $3.to_i
op=="*" ? m*n : m/n
end
go = true while expr.sub!(/(-?\d+)\s*([+-])\s*(-?\d+)/) do
a,op,b = $1.to_i, $2, $3.to_i
op=="+" ? a+b : a-b
end
go = true while expr.gsub!(/\(\s*(-?\d+)\s*\)/,'\1')
end
expr.to_i
end
end
And here's a bit of testing for it:
tests = {
"1" => 1,
"1+1" => 2,
"1 + 1" => 2,
"1 - 1" => 0,
"-1" => -1,
"1 + -1" => 0,
"1 - -1" => 2,
"2*3+1" => 7,
"1+2*3" => 7,
"(1+2)*3" => 9,
"(2+(3-4) *3 ) * -6 * ( 3--4)" => 42,
"4*6/3*2" => 16
}
tests.each do |expr,expected|
actual = Math.eval expr
puts [expr.inspect,'=>',actual,'instead of',expected].join(' ') unless actual == expected
end
Note that I use sub! instead of gsub! on the operators in order to survive the last test case. If I had used gsub! then "4*6/3*2" would first be turned into "24/6" and thus result in 4, instead of the correct expansion "24/3*2" → "8*2" → 16.
If you really need to do the expression parsing yourself, then you should search for both sides of an expression (such as '2*3') and replace that with either your answer (if you are trying to calculate the answer) or an expression object (such as your tree of arrays, if you want to keep the structure of the expressions and evaluate later). If you do this in the order of precedence, then precedence will be preserved.
As a simplified example, your expression parser should:
Repeatedly search for all inner parens: /(([^)+]))/ and replace that with a call to the expression parser of $1 (sorry about the ugly regexp :)
Now all parens are gone, so you are looking at math operations between numbers and/or expression objects - treat them the same
Search for multiplication: /(expr|number)*(expr|number)/
Replace this with either the answer or encapsulate the two expressions in
a new expression. Again, depending on whether you need the answer now or
if you need the expression tree.
Search for addition: ... etc ...
If you are calculating the answer now then this is easy, each call to the expression parser eventually (after necessary recursion) returns a number which you can just replace the original expression with. It's a different story if you want to build the expression tree, and how you deal with a mixture of strings and expression objects so you can run a regexp on it is up to you, you could encode a pointer to the expression object in the string or else replace the entire string at the outside with an array of objects and use something similar to regexp to search the array.
You should also consider dealing with unary operators: "3*+3"
(It might simplify things if the very first step you take is to convert all numbers to a simple expression object just containing the number, you might be able to deal with unary operators here, but that can involve tricky situations like "-3++1")
Or just find an expression parsing library as suggested. :)
Related
I am trying to do this test and there are bunch of solutions online and here but I first want to figure out why my solution is wrong even though it seems that it puts right results when I enter certain strings :
Here is what they are asking :
Write a method that takes in a string. Return the longest word in the
string. You may assume that the string contains only letters and
spaces.
You may use the String split method to aid you in your quest.
Here is my solution where I thought I could turn string into array, sort it from max length descending and then just print first element in that new string like this :
def longest_word(sentence)
sentence = sentence.split
sentence.sort_by! { |longest| -longest.length }
return sentence[0]
end
That doesn't seem to work obviously since their test gives me all false..here is the test :
puts("\nTests for #longest_word")
puts("===============================================")
puts(
'longest_word("short longest") == "longest": ' +
(longest_word('short longest') == 'longest').to_s
)
puts(
'longest_word("one") == "one": ' +
(longest_word('one') == 'one').to_s
)
puts(
'longest_word("abc def abcde") == "abcde": ' +
(longest_word('abc def abcde') == 'abcde').to_s
)
puts("===============================================")
So the question is why? And can I just fix my code or the idea is all wrong and I need to do it completely different?
str = "Which word in this string is longest?"
r = /[[:alpha:]]+/
str.scan(r).max_by(&:length)
#=> "longest"
This regular expression reads, "match one or more characters". The outer brackets constitute a character class, meaning one of the characters within the brackets must be matched.
To deal with words that are hyphenated or contain single quotes, the following is an imperfect modification1:
str = "Who said that chicken is finger-licken' good?"
r = /[[[:alpha:]]'-]+/
str.scan(r).max_by(&:length)
#=> "finger-licken'"
This regular expression reads, "match one or more characters that are a letter, apostrophe or hyphen". The outer brackets constitute a character class, meaning one of the characters within the brackets must be matched.
1 I've successfully used "finger-licken'" in scrabble.
I'd write it something like:
str = "Write a method that takes in a string"
str.split.sort_by(&:length).last # => "string"
I have a string that is one character long and can be any possible character value:
irb(main):001:0> "\x0"
=> "\u0000"
I thought this might work:
irb(main):002:0> "\x0" += 1
SyntaxError: (irb):2: syntax error, unexpected tOP_ASGN, expecting $end
"\x0" += 1
^ from /opt/rh/ruby193/root/usr/bin/irb:12:in `<main>'
But, as you can see, it didn't. How can I increment/decrement my character?
Edit:
Ruby doesn't seem to be set up to do this. Maybe I'm approaching this the wrong way. I want to manipulate raw data in terms of 8-bit chunks. How can I best accomplish that sort of operation?
Depending on what the possible values are, you can use String#next:
"\x0".next
# => "\u0001"
Or, to update an existing value:
c = "\x0"
c.next!
This may well not be what you want:
"z".next
# => "aa"
The simplest way I can think of to increment a character's underlying codepoint is this:
c = 'z'
c = c.ord.next.chr
# => "{"
Decrementing is slightly more complicated:
c = (c.ord - 1).chr
# => "z"
In both cases there's the assumption that you won't step outside of 0..255; you may need to add checks for that.
You cannot do:
"\x0" += 1
Because, in Ruby, that is short for:
"\x0" = "\x0" + 1
and it is a syntax error to assign a value to a string literal.
However, given an integer n, you can convert it to a character by using pack. For example,
[97].pack 'U' # => "a"
Similarly, you can convert a character into an integer by using ord. For example:
[300].pack('U').ord # => 300
With these methods, you can easily write your own increment function, as follows:
def step(c, delta=1)
[c.ord + delta].pack 'U'
end
def increment(c)
step c, 1
end
def decrement(c)
step c, -1
end
If you just want to manipulate bytes, you can use String#bytes, which will give you an array of integers to play with. You can use Array#pack to convert those bytes back to a String. (Refer to documentation for encoding options.)
You could use the String#next method.
I think the most elegant method (for alphanumeric chars) would be:
"a".tr('0-9a-z','1-9a-z0')
which would loop the a through to z and through the numbers and back to a.
I reread the question and see, that my answer has nothing to do with the question. I have no answer for manipulationg 8-bit values directly.
I am writing a function which can have two potential forms of input:
This is {a {string}}
This {is} a {string}
I call the sub-strings wrapped in curly-brackets "tags". I could potentially have any number of tags in a string, and they could be nested arbitrarily deep.
I've tried writing a regular expression to grab the tags, which of course fails on the nested tags, grabbing {a {string}, missing the second curly bracket. I can see it as a recursive problem, but after staring at the wrong answer too long I feel like I'm blind to seeing something really obvious.
What can I do to separate out the potential tags into parts so that they can be processed and replaced?
The More Complicated Version
def parseTags( oBody, szText )
if szText.match(/\{(.*)\}/)
szText.scan(/\{(.*)\}/) do |outers|
outers.each do |blah|
if blah.match(/(.*)\}(.*)\{(.*)/)
blah.scan(/(.*)\}(.*)\{(.*)/) do |inners|
inners.each do |tags|
szText = szText.sub("\{#{tags}\}", parseTags( oBody, tags ))
end
end
else
szText = szText.sub("\{#{blah}\}", parseTags( oBody, blah ))
end
end
end
end
if szText.match(/(\w+)\.(\w+)(?:\.([A-Za-z0-9.\[\]": ]*))/)
func = $1+"_"+$2
begin
szSub = self.send func, oBody, $3
rescue Exception=>e
szSub = "{Error: Function #{$1}_#{$2} not found}"
$stdout.puts "DynamicIO Error Encountered: #{e}"
end
szText = szText.sub("#{$1}.#{$2}#{$3!=nil ? "."+$3 : ""}", szSub)
end
return szText
end
This was the result of tinkering too long. It's not clean, but it did work for a case similar to "1" - {help.divider.red.sys.["{pc.login}"]} is replaced with ---------------[ Duwnel ]---------------. However, {pc.attr.str.dotmode} {ansi.col.red}|{ansi.col.reset} {pc.attr.pre.dotmode} {ansi.col.red}|{ansi.col.reset} {pc.attr.int.dotmode} implodes brilliantly, with random streaks of red and swatches of missing text.
To explain, anything marked {ansi.col.red} marks an ansi red code, reset escapes the color block, and {pc.attr.XXX.dotmode} displays a number between 1 and 10 in "o"s.
As others have noted, this is a perfect case for a parsing engine. Regular expressions don't tend to handle nested pairs well.
Treetop is an awesome PEG parser that you might be interested in taking a look at. The main idea is that you define everything that you want to parse (including whitespace) inside rules. The rules allow you to recursively parse things like bracket pairs.
Here's an example grammar for creating arrays of strings from nested bracket pairs. Usually grammars are defined in a separate file, but for simplicity I included the grammar at the end and loaded it with Ruby's DATA constant.
require 'treetop'
Treetop.load_from_string DATA.read
parser = BracketParser.new
p parser.parse('This is {a {string}}').value
#=> ["This is ", ["a ", ["string"]]]
p parser.parse('This {is} a {string}').value
#=> ["This ", ["is"], " a ", ["string"]]
__END__
grammar Bracket
rule string
(brackets / not_brackets)+
{
def value
elements.map{|e| e.value }
end
}
end
rule brackets
'{' string '}'
{
def value
elements[1].value
end
}
end
rule not_brackets
[^{}]+
{
def value
text_value
end
}
end
end
I would recommend instead of fitting more complex regular expressions to this problem, that you look into one of Ruby's grammar-based parsing engines. It is possible to design recursive and nested grammars in most of these.
parslet might be a good place to start for your problem. The erb-alike example, although it does not demonstrate nesting, might be closest to your needs: https://github.com/kschiess/parslet/blob/master/example/erb.rb
I want to replace the last occurrence of a substring in Ruby. What's the easiest way?
For example, in abc123abc123, I want to replace the last abc to ABC. How do I do that?
How about
new_str = old_str.reverse.sub(pattern.reverse, replacement.reverse).reverse
For instance:
irb(main):001:0> old_str = "abc123abc123"
=> "abc123abc123"
irb(main):002:0> pattern="abc"
=> "abc"
irb(main):003:0> replacement="ABC"
=> "ABC"
irb(main):004:0> new_str = old_str.reverse.sub(pattern.reverse, replacement.reverse).reverse
=> "abc123ABC123"
"abc123abc123".gsub(/(.*(abc.*)*)(abc)(.*)/, '\1ABC\4')
#=> "abc123ABC123"
But probably there is a better way...
Edit:
...which Chris kindly provided in the comment below.
So, as * is a greedy operator, the following is enough:
"abc123abc123".gsub(/(.*)(abc)(.*)/, '\1ABC\3')
#=> "abc123ABC123"
Edit2:
There is also a solution which neatly illustrates parallel array assignment in Ruby:
*a, b = "abc123abc123".split('abc', -1)
a.join('abc')+'ABC'+b
#=> "abc123ABC123"
Since Ruby 2.0 we can use \K which removes any text matched before it from the returned match. Combine with a greedy operator and you get this:
'abc123abc123'.sub(/.*\Kabc/, 'ABC')
#=> "abc123ABC123"
This is about 1.4 times faster than using capturing groups as Hirurg103 suggested, but that speed comes at the cost of lowering readability by using a lesser-known pattern.
more info on \K: https://www.regular-expressions.info/keep.html
Here's another possible solution:
>> s = "abc123abc123"
=> "abc123abc123"
>> s[s.rindex('abc')...(s.rindex('abc') + 'abc'.length)] = "ABC"
=> "ABC"
>> s
=> "abc123ABC123"
When searching in huge streams of data, using reverse will definitively* lead to performance issues. I use string.rpartition*:
sub_or_pattern = "!"
replacement = "?"
string = "hello!hello!hello"
array_of_pieces = string.rpartition sub_or_pattern
( array_of_pieces[(array_of_pieces.find_index sub_or_pattern)] = replacement ) rescue nil
p array_of_pieces.join
# "hello!hello?hello"
The same code must work with a string with no occurrences of sub_or_pattern:
string = "hello_hello_hello"
# ...
# "hello_hello_hello"
*rpartition uses rb_str_subseq() internally. I didn't check if that function returns a copy of the string, but I think it preserves the chunk of memory used by that part of the string. reverse uses rb_enc_cr_str_copy_for_substr(), which suggests that copies are done all the time -- although maybe in the future a smarter String class may be implemented (having a flag reversed set to true, and having all of its functions operating backwards when that is set), as of now, it is inefficient.
Moreover, Regex patterns can't be simply reversed. The question only asks for replacing the last occurrence of a sub-string, so, that's OK, but readers in the need of something more robust won't benefit from the most voted answer (as of this writing)
You can achieve this with String#sub and greedy regexp .* like this:
'abc123abc123'.sub(/(.*)abc/, '\1ABC')
simple and efficient:
s = "abc123abc123abc"
p = "123"
s.slice!(s.rindex(p), p.size)
s == "abc123abcabc"
string = "abc123abc123"
pattern = /abc/
replacement = "ABC"
matches = string.scan(pattern).length
index = 0
string.gsub(pattern) do |match|
index += 1
index == matches ? replacement : match
end
#=> abc123ABC123
I've used this handy helper method quite a bit:
def gsub_last(str, source, target)
return str unless str.include?(source)
top, middle, bottom = str.rpartition(source)
"#{top}#{target}#{bottom}"
end
If you want to make it more Rails-y, extend it on the String class itself:
class String
def gsub_last(source, target)
return self unless self.include?(source)
top, middle, bottom = self.rpartition(source)
"#{top}#{target}#{bottom}"
end
end
Then you can just call it directly on any String instance, eg "fooBAR123BAR".gsub_last("BAR", "FOO") == "fooBAR123FOO"
.gsub /abc(?=[^abc]*$)/, 'ABC'
Matches a "abc" and then asserts ((?=) is positive lookahead) that no other characters up to the end of the string are "abc".
our homework is to write a ruby script who calculate a subset of wordlist depending on the expression.
regular binary operations are
&& And operator
|| Or operator
++ Concatenate operator
! Negation operator
A valid call would be like
./eval.rb wordlist && a c
or
./eval.rb wordlist && || a b c
First call means generate a new wordlist which all words have at least one 'a' and 'c'.
So my question is how do I process the arguemnts in a efficent way? Maybe recursiv?
I'm stuck...
Thanks in advance.
Looks like a grammer with prefix notation. A stack is indeed your friend, and the easiest stack to use is the call stack. For example, given this grammar:
expression ::= number | operand number number
operand ::= '+' | '-'
This is the code to evaluate it:
#!/usr/bin/ruby1.8
#tokens = ['*', 2, '+', 3, 4]
def evaluate
token = #tokens.shift # Remove first token from #tokens
case token
when '*'
return evaluate * evaluate
when '+'
return evaluate + evaluate
else
return token
end
end
puts evaluate # => 14
Although this is Ruby, it's close enough to pseudo-code. I've put explicit returns in, although Ruby does not require them, because it may be clearer to someone who does not know Ruby.
Use a stack. The max size would be the number of arguments.