create a binary tree for alphabetical equation - data-structures

I have to create a binary tree to get the valid sentences out of a list of tokens.
Example:
I have a set of tokens: ['(', 'one', '.', 'two', '.', '(', 'zero', '|', 'o', ')', '.', 'six', ')']
and from that I want to obtain the following sentences:
one two zero six
one two o six
Here "." is the concatenation operator, "|" is the OR operator, "()" is the required grouping operator, and "[]" is the optional grouping operator.
I have already created the tokens from the alphabetical equation, but I am not able to build a proper binary tree, one that yields these sentences whenever I traverse it.
I can write the code for a binary tree itself, but I am stuck on the logic for building it from the tokens.
Another example:
Input: ['(', '(', '(', 'one', '.', 'two', ')', '|', 'twelve', ')', '.', '(', 'zero', '|', 'o', ')', '.', 'six', ')']
Output:
one two zero six
one two o six
twelve zero six
twelve o six

It's useful to look at this as an equation parser. What you have is this equation:
(("one" . "two") | "twelve") . (("zero" | "o") . "six")
You can then build the tree from the bottom. ("one" . "two") looks like this:
      .
     / \
"one"   "two"
"twelve", of course, is a node all by itself. And to complete the first part, you put the "|" as a parent node, like this:
        |
       / \
      .   "twelve"
     / \
"one"   "two"
That gives you (("one" . "two") | "twelve").
You do the same kind of thing with the other half of the equation, and add a "." operator as the root node.
There is a formal way of building the tree while you're parsing, using a modification of the Shunting-yard algorithm.
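As a rough illustration of that idea, here is a minimal Python sketch of a shunting-yard style parser that builds the binary tree directly from the token list in the question. The names build_tree, reduce_top and PRECEDENCE are made up for this example, the tuple representation (operator, left, right) is just one possible choice, and the sketch assumes well-formed input with "." binding tighter than "|". It does not handle the optional "[]" groups.

```python
from typing import List, Tuple, Union

# A tree node is either a word (leaf) or a tuple (operator, left, right).
Node = Union[str, Tuple[str, "Node", "Node"]]

# Assumption: "." binds tighter than "|".
PRECEDENCE = {"|": 1, ".": 2}


def build_tree(tokens: List[str]) -> Node:
    """Build a binary expression tree with a shunting-yard style parser."""
    operands: List[Node] = []  # finished subtrees
    operators: List[str] = []  # pending operators and "("

    def reduce_top() -> None:
        # Pop one operator and its two operands, push the combined subtree.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))

    for token in tokens:
        if token == "(":
            operators.append(token)
        elif token == ")":
            while operators[-1] != "(":
                reduce_top()
            operators.pop()  # discard the matching "("
        elif token in PRECEDENCE:
            # Left-associative: first reduce pending operators of equal
            # or higher precedence.
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]):
                reduce_top()
            operators.append(token)
        else:
            operands.append(token)  # a plain word becomes a leaf

    while operators:
        reduce_top()
    return operands[0]


tokens = ['(', 'one', '.', 'two', '.', '(', 'zero', '|', 'o', ')', '.', 'six', ')']
tree = build_tree(tokens)
print(tree)
```

Each reduction creates one parent node with exactly two children, so the result is a strict binary tree; a chain such as a . b . c nests to the left.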

Related

Solving alphabetical expression to obtain sentences using Binary Tree

I'm trying to solve alphabetical expressions to obtain valid sentences.
An example of input and output:
Input
Actual:
((("one" . "two") | "twelve") . ("zero" | "o") . "six")
Tokenized:
['(', '(', '(', 'one', '.', 'two', ')', '|', 'twelve', ')', '.',
'(', 'zero', '|', 'o', ')', '.', 'six', ')']
Output
one two zero six
one two o six
twelve zero six
twelve o six
Operators
() - required grouping operator
[] - optional grouping operator
. - concatenate connecting operator
| - OR connecting operator
The . operator does have a higher precedence than the | operator.
I do not understand how I should go about it. (I want to write the code in Python.)
I assume you already managed to tokenise the input. I would then suggest as a second step to build a tree from the tokens, so that the following input:
((("one" . "two") | "twelve") . ("zero" | "o") . "six")
...would become the following tree:
[figure: tree diagram of the expression]
Note that you could implement the optional words (wrapped in [...]) re-using the OR operator in combination with a None element (which would produce nothing in a final string). For instance ["six"] would be interpreted as if it were (None|"six"), which in the tree would be:
    |
   / \
None  "six"
With the tree you can use recursion to find the possible strings. Each recursive call would return a list of possible strings (maybe just one). In case the children belong to a . (AND) operation, then all lists of strings found for the children must be combined as a Cartesian product. In case they belong to a | (OR) operation, these lists should be just concatenated (since all possible strings for any of the children are possible for the parent).
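To make the two combination rules concrete, here is a small Python sketch on a hand-built binary tree. The tuple layout (operator, left, right), the function name phrases and the sample tree are assumptions made for this illustration only; None stands for the empty alternative of an optional group.

```python
from itertools import product
from typing import List, Tuple, Union

# A node is None (empty alternative), a word, or (operator, left, right).
Node = Union[None, str, Tuple[str, "Node", "Node"]]


def phrases(node: Node) -> List[List[str]]:
    """Return every word sequence the subtree can produce."""
    if node is None:
        return [[]]  # contributes nothing to the final string
    if isinstance(node, str):
        return [[node]]
    op, left, right = node
    lefts, rights = phrases(left), phrases(right)
    if op == "|":
        # OR: anything valid for either child is valid for the parent.
        return lefts + rights
    # ".": Cartesian product, every left phrase followed by every right phrase.
    return [a + b for a, b in product(lefts, rights)]


# Hypothetical tree for: (("one" . "two") | "twelve") . ("zero" | "o") . ["six"]
tree = (".",
        (".", ("|", (".", "one", "two"), "twelve"), ("|", "zero", "o")),
        ("|", None, "six"))

sentences = [" ".join(words) for words in phrases(tree)]
for sentence in sentences:
    print(sentence)
```

Working with lists of words rather than joined strings means the None case needs no special filtering; the words are only joined at the very end.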
Here is some code that could be useful:
import itertools
import re

# subclass list just to distinguish between arguments of a "." or "|" operator
class OrList(list): # for "|" operator arguments
    pass

class AndList(list): # for "." operator arguments
    pass

def parse(tokens):
    # helper function to conditionally either append-to or extend a list
    def add(lst, factor):
        if type(factor) == type(lst):
            lst.extend(factor)
        else:
            lst.append(factor)
        return lst

    # helper function to conditionally return the given list or its only element
    def simplify(lst):
        if isinstance(lst, OrList): # remove duplicates
            for i in range(len(lst)-1,-1,-1):
                if lst.index(lst[i]) < i:
                    del lst[i]
        return lst if len(lst) > 1 else lst[0]

    it = iter(tokens + [""]) # the empty string marks the end of input

    def recur(end):
        terms = OrList()
        factors = AndList()
        token = None
        while token != end:
            token = next(it)
            if token.startswith('"'): # also safe for the empty end marker
                add(factors, token[1:-1]) # remove the quotes
            elif token == "(":
                add(factors, recur(")"))
            elif token == "[": # encode [expr] as if it were: (|expr)
                add(factors, add(OrList([None]), recur("]")))
            else:
                raise ValueError("Expected expression, got {}".format(token))
            token = next(it)
            if token != end and token not in (".", "|"):
                raise ValueError("Expected operator, got {}".format(token))
            if token == "|" or token == end:
                add(terms, simplify(factors))
                factors = AndList()
        return simplify(terms)

    return recur("")

def validPhrases(node): # returns list of strings
    if isinstance(node, list):
        results = [validPhrases(child) for child in node]
        if isinstance(node, OrList):
            # flatten
            results = list(itertools.chain.from_iterable(results))
        else: # instance of AndList
            # all combinations
            results = [" ".join([s for s in combi if s is not None])
                       for combi in itertools.product(*results)]
        return list(set(results)) # remove duplicates
    else: # str
        return [node]

def generate(expr):
    # tokenize
    tokens = re.findall(r'"[^"]*"|\S', expr) # don't throw away the quotes yet
    # make a tree in line with precedence rules
    tree = parse(tokens)
    # finally build all possible strings from this tree
    return validPhrases(tree)

expr = '((("one" . "two" | "twelve") . ("zero" | "o")) . "six")'
print(generate(expr))

What is the fastest way to modify a large string in Ruby?

I need to modify a string in Ruby. Specifically, I'm trying to remove 'holes' from a WKT string. Holes are defined as any single set of parentheses after the first one with numbers within. For example in this string...
POLYGON ((1 2, 3 4), (5 6, 7 8))
I would need to remove , (5 6, 7 8) because this parenthesis data is a hole, and the comma and the space don't belong except to separate sets of parentheses.
I am avoiding ruby methods like match or scan to try to optimize for speed and achieve O(n) speed.
Here's what I have so far.
def remove_holes_from(wkt)
  output_string = ""
  last_3_chars = [ nil, nil, nil ]
  number_chars = [ '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ]
  should_delete_chars = false
  wkt.each_char do |char|
    last_3_chars.shift
    last_3_chars.push(char)
    if should_delete_chars == false
      if number_chars.include?(last_3_chars[0]) && last_3_chars[1] == ")" && last_3_chars[2] == ","
        should_delete_chars = true
        next
      else
        output_string += char
      end
    end
    if should_delete_chars == true
      if number_chars.include?(last_3_chars[0]) && last_3_chars[1] == ")" && last_3_chars[2] == ")"
        should_delete_chars = false
        output_string += char
      else
        next
      end
    end
  end
  output_string
end
The problem I am facing is that for a large polygon, like the United States (over 500,000 characters and over 40,000 points) it takes me 66 seconds to complete this. You can find the string here: https://gist.github.com/cheeseandpepper/9da5ca6ade921da2b4ab
Can anyone think of optimizations to this example I can use? Or maybe a separate approach? Thanks.
Whelp... regex wins!
wkt.gsub(/, \(-?\d.*?\)/, "")
took me 0.003742 seconds
As for the regex
,       Literal comma
(space) Literal space
\(      Literal open parenthesis
-?      Optional negative sign
\d      Any digit (because the previous is optional, we need to make sure we have a digit vs another open parenthesis)
.*?     Any characters, as few as possible (a lazy match; these will be digits, commas, spaces and maybe negative signs)
\)      Literal close parenthesis, which is where the lazy match stops

How to split an expression into an array?

I want to convert an expression string into an array. For the string (10+2)/33, the expected result is ['(', '10', '+', '2', ')', '/', '33']. There may be some spaces between the tokens, and the valid operators are +-*/().
You can use this pattern:
"(10+2)/33".split(/\b|(?!\d)/)
(Obviously, the goal here is to split the string, not to check if characters are allowed).
The idea is to use the fact that (, ), +, -, /, * are single characters that are not in the \w character class. So \b will match when a digit is followed by one of these characters and vice versa. (?!\d) (negative lookahead: not followed by a digit), since it is the second alternative, is like \B(?!\d) and will match between two signs.
If you want to deal with possible spaces, you only need to add \s* in each branch:
"(10+2)/33".split(/\s*\b\s*|\s*(?!\d)/)
Note that it may generate an empty item at the beginning.
To avoid the problem you can use the scan method with a different pattern:
" ( 2 (10+2) / 33) ".scan(/\G\s*\K(?:\d+|\S)/)
Where \G ensures that all matches are contiguous from the start of the string and \K discards everything to its left from the match result (any leading whitespace).
"(10 + 2) / 33".delete(" ").split(/(\D)/).reject(&:empty?)
# => ["(", "10", "+", "2", ")", "/", "33"]
You can also use the scan method.
> s = "(10+2)/33"
> s.scan(/\d+|[^\s\w]/)
=> ["(", "10", "+", "2", ")", "/", "33"]
> s.scan(/\d+|[^\s\d]/)
=> ["(", "10", "+", "2", ")", "/", "33"]

sorted array with Number shuffle

I am working on the Number shuffle problem: https://rubymonk.com/learning/books/1-ruby-primer/problems/154-permutations#solution4802. The exercise asks to:
return a sorted array of all the unique numbers that can be formed
with 3 or 4 digits.
There is a solution (under "See the Solution") below the exercise that looks like this:
def number_shuffle(number)
  no_of_combinations = number.to_s.size == 3 ? 6 : 24
  digits = number.to_s.split(//)
  combinations = []
  combinations << digits.shuffle.join.to_i while
    combinations.uniq.size!=no_of_combinations
  combinations.uniq.sort
end
I have a few questions, can anyone explain me:
1) in 'no_of_combinations' variable what does it mean '3 ? 6 : 24'? i think 3 is amount of digits in the number. question mark ( ? ) is symbol of 'if'- if number digits are 3, the amount of numbers will be 6 in the array of combinations. colon (punctuation) is symbol sign, but i do not know why 24, there are 23 symbols considering white space in the array of combinations.
2) what does it mean << symbol after combinations? i know that it is addition sign, but what does it do here? and also, what it means exclamation mark after 'size' in the following string?
1) in 'no_of_combinations' variable what does it mean '3 ? 6 : 24'?
The expression needs to include the comparison to make sense . . .
number.to_s.size == 3 ? 6 : 24
This is the ternary if. If the comparison before the ? is true, it evaluates to the first value (6 here), if it is false it evaluates to the second value (24 here). The values 6 and 24 are 3! and 4!, the number of ways to order three and four distinct digits. It has nothing to do with literal Symbol values . . . in fact the Ruby parser will always treat the colon as a value separator here, however you space it.
The syntax of this operator is originally from C, and you will find it copied to many other languages.
2) what does it mean << symbol after combinations? i know that it is addition sign, but what does it do here?
It is not an addition sign. This operator does different things depending on the class of the object (on the left). In this example, that is an Array, and << pushes the object on the right onto the end of the array. It is almost identical to push.
and also, what it means exclamation mark after 'size' in the following string?
In this case it is part of !=, or "not equals" comparison operator. The original author could have made this clearer with a bit of whitespace.

convert string to method

I have a string that can be '+', '-', '*' or '/', and two numbers. I need to apply the operation denoted by the string to the numbers. I tried:
op = '+'
(&op.to_sym).call 1, 2
but it won't parse. Please help.
Ruby is not a Polish notation language. Operators are methods called on the left operand, so send the operator name to the first number:
1.send(op, 2)
