Solving alphabetical expression to obtain sentences using Binary Tree - algorithm

I'm trying to solve alphabetical expressions to obtain valid sentences.
Example of input and output would be:-
Input
Actual:
((("one" . "two") | "twelve") . ("zero" | "o") . "six")
Tokenized:
['(', '(', '(', 'one', '.', 'two', ')', '|', 'twelve', ')', '.',
'(', 'zero', '|', 'o', ')', '.', 'six', ')']
Output
one two zero six
one two o six
twelve zero six
twelve o six
Operators
() - required grouping operator
[] - optional grouping operator
. - concatenate connecting operator
| - OR connecting operator
The . operator does have a higher precedence than the | operator.
I don't understand how I should go about it. (I want to write the code in Python.)

I assume you already managed to tokenise the input. I would then suggest as a second step to build a tree from the tokens, so that the following input:
((("one" . "two") | "twelve") . ("zero" | "o") . "six")
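...would become the following tree (where . is the AND operator and | is the OR operator):
.
├── |
│   ├── .
│   │   ├── "one"
│   │   └── "two"
│   └── "twelve"
├── |
│   ├── "zero"
│   └── "o"
└── "six"
Note that you could implement the optional words (wrapped in [...]) by re-using the OR operator in combination with a None element (which would produce nothing in a final string). For instance ["six"] would be interpreted as if it were (None|"six"), which in the tree would be:
|
├── None
└── "six"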
With the tree you can use recursion to find the possible strings. Each recursive call would return a list of possible strings (maybe just one). In case the children belong to a . (AND) operation, then all lists of strings found for the children must be combined as a Cartesian product. In case they belong to a | (OR) operation, these lists should be just concatenated (since all possible strings for any of the children are possible for the parent).
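As a minimal sketch of that recursion, assume the tree is hand-built from nested tuples. The tuple encoding and the names TREE and phrases are hypothetical, chosen only for illustration; they are not the OrList/AndList classes used in the full code further down.

```python
from itertools import product

# Hypothetical encoding: ("and", child, ...) or ("or", child, ...).
# A leaf is a word; None stands for an omitted optional word.
TREE = ("and",
        ("or", ("and", "one", "two"), "twelve"),
        ("or", "zero", "o"),
        "six")

def phrases(node):
    """Return the list of phrases that the given node can produce."""
    if node is None or isinstance(node, str):
        return [node]
    op, *children = node
    child_lists = [phrases(child) for child in children]
    if op == "or":
        # OR: every phrase of every child is possible for the parent
        return [p for lst in child_lists for p in lst]
    # AND: Cartesian product of the children's phrase lists
    results = []
    for combo in product(*child_lists):
        words = [w for w in combo if w]  # drop omitted optional parts
        results.append(" ".join(words) if words else None)
    return results

print(phrases(TREE))
```

For the example tree this yields four phrases, one for each pairing of the two OR nodes.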
Here is some code that could be useful:
import itertools
import re

# subclass list just to distinguish between arguments of a "." or "|" operator
class OrList(list):  # for "|" operator arguments
    pass

class AndList(list):  # for "." operator arguments
    pass

def parse(tokens):
    # helper function to conditionally either append-to or extend a list
    def add(lst, factor):
        if type(factor) == type(lst):
            lst.extend(factor)
        else:
            lst.append(factor)
        return lst

    # helper function to conditionally return the given list or its only element
    def simplify(lst):
        if isinstance(lst, OrList):  # remove duplicates
            for i in range(len(lst) - 1, -1, -1):
                if lst.index(lst[i]) < i:
                    del lst[i]
        return lst if len(lst) > 1 else lst[0]

    it = iter(tokens + [""])  # the empty string marks the end of input

    def recur(end):
        terms = OrList()
        factors = AndList()
        token = None
        while token != end:
            token = next(it)
            if token.startswith('"'):
                add(factors, token[1:-1])  # remove the quotes
            elif token == "(":
                add(factors, recur(")"))
            elif token == "[":  # encode [expr] as if it were: (|expr)
                add(factors, add(OrList([None]), recur("]")))
            else:
                raise ValueError("Expected expression, got {}".format(token))
            token = next(it)
            if token != end and token not in (".", "|"):
                raise ValueError("Expected operator, got {}".format(token))
            if token == "|" or token == end:
                # "." binds tighter than "|": close the current AND group
                add(terms, simplify(factors))
                factors = AndList()
        return simplify(terms)

    return recur("")

def validPhrases(node):  # returns list of strings
    if isinstance(node, list):
        results = [validPhrases(child) for child in node]
        if isinstance(node, OrList):
            # flatten
            results = list(itertools.chain.from_iterable(results))
        else:  # instance of AndList
            # all combinations
            results = [" ".join([s for s in combi if s is not None])
                       for combi in itertools.product(*results)]
        return list(set(results))  # remove duplicates (order is not preserved)
    else:  # str, or None for an omitted optional part
        return [node]

def generate(expr):
    # tokenize
    tokens = re.findall(r'"[^"]*"|\S', expr)  # don't throw away the quotes yet
    # make a tree in line with precedence rules
    tree = parse(tokens)
    # finally build all possible strings from this tree
    return validPhrases(tree)

expr = '((("one" . "two" | "twelve") . ("zero" | "o")) . "six")'
print(generate(expr))

Related

Regex for three expressions with 'AND'

I need to return true if a string matches three regexes. I have a lot of regex options around each regex pattern. I can use separate match/scan for each of the three values, and conjoin them with AND to see if they all return TRUE. The pipe does not work.
In the code below, I need to get TRUE only for the first string, mystr3:
mystr3= ' OK 3 values MyServer and myNode and myuser TRUE '
mystr2= 'has on 2 values mynode## and .myserver should be FALSE'
mystr1= ' has on 1 values Myserver should be FALSE'
regex1 = /\bmyserver\b/i ; regex2 = /\bmynode\b/i ; regex3 = /\bmyuser\b/i
regex = /#{regex1}|#{regex2}|#{regex3}/ ## AND /#{regex2}/ and /#{regex3}/
p 'match3 ' + mystr3.scan(regex).to_s
p 'match2 ' + mystr2.scan(regex).to_s
But I think there should be something easier than that.
To check to see that the string matches all three, you can use lookahead for the subexpression three times:
regex = /^(?=.*#{regex1})(?=.*#{regex2})(?=.*#{regex3})/
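The same lookahead idea can be sketched in Python for illustration (the question itself is about Ruby; the lookahead syntax is the same in both). The helper name matches_all is hypothetical. Each pattern is wrapped in its own lookahead anchored at the start, so all of them must be found somewhere in the string, in any order.

```python
import re

def matches_all(text, patterns):
    """Return True if every pattern matches somewhere in text (case-insensitive)."""
    # One lookahead per pattern; (?:...) keeps any alternation inside a
    # pattern from leaking out of its lookahead.
    lookaheads = "".join(f"(?=.*(?:{p}))" for p in patterns)
    return re.match(lookaheads, text, re.IGNORECASE | re.DOTALL) is not None

patterns = [r"\bmyserver\b", r"\bmynode\b", r"\bmyuser\b"]
mystr3 = " OK 3 values MyServer and myNode and myuser TRUE "
mystr2 = "has on 2 values mynode## and .myserver should be FALSE"
mystr1 = " has on 1 values Myserver should be FALSE"

for s in (mystr3, mystr2, mystr1):
    print(matches_all(s, patterns))
```

Only the first string contains all three words, so only it passes.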

What is the empty statement in Golang?

In Python we can use the pass statement as a placeholder.
What is the equivalent statement in Golang?
A ; or something else?
The Go Programming Language Specification
Empty statements
The empty statement does nothing.
EmptyStmt = .
Notation
The syntax is specified using Extended Backus-Naur Form (EBNF):
Production = production_name "=" [ Expression ] "." .
Expression = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term = production_name | token [ "…" token ] | Group | Option | Repetition .
Group = "(" Expression ")" .
Option = "[" Expression "]" .
Repetition = "{" Expression "}" .
Productions are expressions constructed from terms and the following
operators, in increasing precedence:
| alternation
() grouping
[] option (0 or 1 times)
{} repetition (0 to n times)
Lower-case production names are used to identify lexical tokens.
Non-terminals are in CamelCase. Lexical tokens are enclosed in double
quotes "" or back quotes ``.
The form a … b represents the set of characters from a through b as
alternatives. The horizontal ellipsis … is also used elsewhere in the
spec to informally denote various enumerations or code snippets that
are not further specified. The character … (as opposed to the three
characters ...) is not a token of the Go language.
The empty statement is just that: empty. In EBNF (Extended Backus-Naur Form), EmptyStmt = . means that the production matches an empty string.
For example,
for {
}

var no bool
if true {
} else {
	no = true
}

Grouping regex based on the previous grouping result

I have some parameters that I have to sort into different lists. The prefix determines which list should it belong to.
I use prefixes like c, a, n, o, plus an optional leading hyphen (-) that determines whether the parameter goes into the include list or the exclude list.
I use the regex grouped as:
/^(-?)([o|a|c|n])(\w+)/
But here the third group (\w+) is not generic, and it should actually be dependent on the second group's result. I.e, if the prefix is:
'c' or 'a' -> /\w{3}/
'o' -> /\w{2}/
else -> /\w+/
Can I do this with a single regex? Currently I am using an if condition to do so.
Example input:
Valid:
"-cABS", "-aXYZ", "-oWE", "-oqr", "-ncanbeanyting", "nstillanything", "a123", "-conT" (will go to c_exclude_list)
Invalid:
"cmorethan3chars", "c1", "-a1234", "prefizisnotvalid", "somethingelse", "oABC"
Output: for each arg push to the correct list, ignore the invalid.
c_include_list, c_exclude_list, a_include_list, a_exclude_list etc.
You can use this pattern:
/(-?)\b([aocn])((?:(?<=[ac])\w{3}|(?<=o)\w{2}|(?<=n)\w+))\b/
The idea is to use lookbehinds to check the previous character without including it in the capture group.
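To see the lookbehind pattern at work, here is a sketch in Python, used for illustration since this particular pattern happens to be valid in Python's re module as well. The function name sort_args and the generated list names are assumptions, and fullmatch is used so that the whole argument has to match.

```python
import re
from collections import defaultdict

# Same pattern as above: each lookbehind inspects the prefix letter that
# group 2 has just consumed, so group 3 holds only the payload.
PATTERN = re.compile(
    r"(-?)\b([aocn])((?:(?<=[ac])\w{3}|(?<=o)\w{2}|(?<=n)\w+))\b"
)

def sort_args(args):
    """Push each valid argument onto its include/exclude list."""
    lists = defaultdict(list)
    for arg in args:
        m = PATTERN.fullmatch(arg)  # the whole argument must match
        if m is None:
            continue  # ignore invalid arguments
        hyphen, prefix, body = m.groups()
        kind = "exclude" if hyphen else "include"
        lists[f"{prefix}_{kind}_list"].append(body)
    return dict(lists)

valid = ["-cABS", "-aXYZ", "-oWE", "-oqr", "-ncanbeanyting",
         "nstillanything", "a123", "-conT"]
invalid = ["cmorethan3chars", "c1", "-a1234", "prefizisnotvalid",
           "somethingelse", "oABC"]
print(sort_args(valid + invalid))
```

With the sample input from the question, "-conT" ends up in c_exclude_list as "onT" and all six invalid arguments are skipped.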
Since version 2.0, Ruby has switched from Oniguruma to Onigmo (a fork of Oniguruma), which adds support for conditional regex, among other features.
So you can use the following regex to customize the pattern based on the prefix:
^-?(?:([ca])|(o)|(n))?(?(1)\w{3}|(?(2)\w{2}|(?(3)\w+)))$
Is a single, mind-bending regex the best way to deal with this problem?
Here's a simpler approach that does not employ a regex at all. I suspect that it would be at least as efficient as a single regex, considering that with the latter you must still assign matching strings to their respective arrays. I think it also reads better and would be easier to maintain. The code below should be easy to modify if I have misunderstood some fine points of the question.
Code
def devide_em_up(str)
  h = { a_exclude: [], a_include: [], c_exclude: [], c_include: [],
        o_exclude: [], o_include: [], other_exclude: [], other_include: [] }
  str.split.each do |s|
    exclude = (s[0] == ?-)
    s = s[1..-1] if exclude
    first = s[0]
    s = s[1..-1] if 'cao'.include?(first)
    len = s.size
    case first
    when 'a'
      (exclude ? h[:a_exclude] : h[:a_include]) << s if len == 3
    when 'c'
      (exclude ? h[:c_exclude] : h[:c_include]) << s if len == 3
    when 'o'
      (exclude ? h[:o_exclude] : h[:o_include]) << s if len == 2
    else
      (exclude ? h[:other_exclude] : h[:other_include]) << s
    end
  end
  h
end
Example
Let's try it:
str = "-cABS cABT -cDEF -aXYZ -oWE -oQR oQT -ncanbeany nstillany a123 " +
      "-conT cmorethan3chars c1 -a1234 prefizisnotvalid somethingelse oABC"
devide_em_up(str)
  #=> {:a_exclude=>["XYZ"], :a_include=>["123"],
  #    :c_exclude=>["ABS", "DEF", "onT"], :c_include=>["ABT"],
  #    :o_exclude=>["WE", "QR"], :o_include=>["QT"],
  #    :other_exclude=>["ncanbeany"],
  #    :other_include=>["nstillany", "prefizisnotvalid", "somethingelse"]}

Complementary DNA sequence

I'm having a problem writing this loop; it seems to stop after the second sequence.
I want to return the complementary DNA sequence to the given DNA sequence.
E.g. ('AGATTC') -> ('TCTAAG'), where A:T and C:G
def get_complementary_sequence(dna):
    """(str) -> str
    Return the DNA sequence that is complementary to the given DNA sequence
    >>> get_complementary_sequence('AT')
    ('TA')
    >>> get_complementary_sequence('AGATTC')
    ('TCTAAG')
    """
    x = 0
    complementary_sequence = ''
    for char in dna:
        complementary_sequence = (get_complement(dna))
    return complementary_sequence + (dna[x:x+1])
Can anyone spot why the loop does not continue?
Here is an example how I would do it - only several lines of code really:
DNA = "CCAGCTTATCGGGGTACCTAAATACAGAGATAT"  # example DNA fragment

def complement(sequence):
    # Map each base to its partner. Reverse the string first with
    # sequence[::-1] if you want the reverse complement instead.
    return sequence.translate(str.maketrans('ATCG', 'TAGC'))

print(complement(DNA))
What's wrong?
You're calling:
complementary_sequence = (get_complement(dna))
...n times where n is the length of the string. This leaves you with whatever the return value of get_complement(dna) is in complementary_sequence. Presumably just one letter.
You then return this one letter (complementary_sequence) followed by the substring dna[0:1] (i.e. the first letter in dna), because x is always 0.
This would be why you always get two characters returned.
How to fix it?
Assuming you have a function like:
def get_complement(d):
    return {'T': 'A', 'A': 'T', 'C': 'G', 'G': 'C'}.get(d, d)
...you could fix your function by simply using str.join() and a list comprehension:
def get_complementary_sequence(dna):
    """(str) -> str
    Return the DNA sequence that is complementary to the given DNA sequence
    >>> get_complementary_sequence('AT')
    'TA'
    >>> get_complementary_sequence('AGATTC')
    'TCTAAG'
    """
    return ''.join([get_complement(c) for c in dna])
You call get_complement on all of dna instead of on each char. This simply calls the same function with the same parameters len(dna) times. There's no reason to loop through the chars if you never use them. If get_complement() can take a char, I would recommend:
for char in dna:
    complementary_sequence += get_complement(char)
The implementation of get_complement would take a single character and return its complement.
Also, you're returning complementary_sequence + (dna[x:x+1]). If you want the function to conform to the behavior that you've documented, the + (dna[x:x+1]) will add an extra (wrong) character from the beginning of the dna string. All you need to return is complementary_sequence! Thanks to @Kevin for noticing.
What you're doing:
>>> dna = "1234"
>>> for char in dna:
...     print(dna)
...
1234
1234
1234
1234
What I think is closer to what you want to be doing:
>>> for char in dna:
...     print(char)
...
1
2
3
4
Putting it all together:
# you could also use a list comprehension, with a join() call, but
# this is closer to your original implementation.
def get_complementary_sequence(seq):
    complement = ''
    for char in seq:
        complement += get_complement(char)
    return complement

def get_complement(base):
    complements = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
    return complements[base]

>>> get_complementary_sequence('AT')
'TA'
>>> get_complementary_sequence('AGATTC')
'TCTAAG'

Checking if a string has balanced parentheses

I am currently working on a Ruby problem quiz but I'm not sure if my solution is right. After running the check, it shows that the compilation was successful, but I'm worried it is not the right answer.
The problem:
A string S consisting only of characters '(' and ')' is called properly nested if:
S is empty,
S has the form "(U)" where U is a properly nested string,
S has the form "VW" where V and W are properly nested strings.
For example, "(()(())())" is properly nested and "())" isn't.
Write a function
def nesting(s)
that given a string S returns 1 if S is properly nested and 0 otherwise.
Assume that the length of S does not exceed 1,000,000. Assume that S consists only of characters '(' and ')'.
For example, given S = "(()(())())" the function should return 1 and given S = "())" the function should return 0, as explained above.
Solution:
def nesting ( s )
  # write your code here
  if s == '(()(())())' && s.length <= 1000000
    return 1
  elsif s == ' ' && s.length <= 1000000
    return 1
  elsif
    s == '())'
    return 0
  end
end
Here are descriptions of two algorithms that should accomplish the goal. I'll leave it as an exercise to the reader to turn them into code (unless you explicitly ask for a code solution):
1. Start with a variable set to 0 and loop through each character in the string: when you see a '(', add one to the variable; when you see a ')', subtract one from the variable. If the variable ever goes negative, you have seen too many ')' and can return 0 immediately. If you finish looping through the characters and the variable is not exactly 0, then you had too many '(' and should return 0.
2. Remove every occurrence of '()' in the string (replace with ''). Keep doing this until you find that nothing has been replaced (check the return value of gsub!). If the string is empty, the parentheses were matched. If the string is not empty, it was mismatched.
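For readers who do want code, both descriptions translate fairly directly. The following is a sketch in Python rather than Ruby, with hypothetical function names, that returns 1 or 0 as the quiz asks.

```python
def nesting_by_counter(s):
    """Algorithm 1: track how many parentheses are currently open."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with nothing open
                return 0
    return 1 if depth == 0 else 0  # leftover '(' means unbalanced

def nesting_by_removal(s):
    """Algorithm 2: keep deleting '()' until nothing changes."""
    while "()" in s:
        s = s.replace("()", "")
    return 1 if s == "" else 0

for sample in ["(()(())())", "())", "", ")(", "(("]:
    print(repr(sample), nesting_by_counter(sample), nesting_by_removal(sample))
```

The counter version makes a single pass over the string, while the removal version may rescan it many times, so the counter is the safer choice for inputs near the 1,000,000 character limit.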
You're not supposed to just enumerate the given examples. You're supposed to solve the problem generally. You're also not supposed to check that the length is below 1000000, you're allowed to assume that.
The most straightforward solution to this problem is to iterate through the string and keep track of how many parentheses are open right now. If you ever see a closing parenthesis when no parentheses are currently open, the string is not well-balanced. If any parentheses are still open when you reach the end, the string is not well-balanced. Otherwise it is.
Alternatively, you could turn the specification directly into a regex pattern using the recursive regex feature of Ruby 1.9, if you were so inclined.
My algorithm would use a stack for this purpose; stacks are well suited to this kind of problem.
Algorithm
Define a hash which maps each opening bracket to its closing bracket, for instance {"(" => ")", "{" => "}", and so on...}
Declare a stack (in our case, an array), i.e. brackets = []
Loop through the string using each_char; if the character is a key of the hash (an opening bracket), push it onto brackets
Within the same loop, if the character is a value of the hash (a closing bracket), pop from brackets and check that the popped opening bracket pairs with it; if it does not, or the stack was empty, the string is not balanced
In the end, if the brackets stack is empty, the brackets are balanced.
def brackets_balanced?(string)
  brackets_hash = {"(" => ")", "{" => "}", "[" => "]"}
  brackets = []
  string.each_char do |x|
    if brackets_hash.key?(x)
      brackets.push(x)
    elsif brackets_hash.value?(x)
      # a closer must pair with the most recent unclosed opener
      return false unless brackets_hash[brackets.pop] == x
    end
  end
  brackets.empty?
end
You can solve this problem theoretically, by using a grammar like this:
S ← LSRS | ε
L ← (
R ← )
Here ε is the empty string. The grammar can be parsed easily by a recursive algorithm.
That would be the most elegant solution. Otherwise, as already mentioned here, count the open parentheses.
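One grammar that covers the three cases in the problem statement is S → "(" S ")" S | empty. A recursive parser for it might look like the sketch below, written in Python with hypothetical names. The trailing S is handled by the loop and the nested S by the recursive call, so the recursion depth equals the nesting depth; very deeply nested input would hit Python's recursion limit, which is an argument for the counting approach on large strings.

```python
def parse_s(s, i=0):
    """Consume S -> "(" S ")" S | empty, starting at index i.

    Returns the index just past the parsed S, or -1 if a ")" is missing.
    """
    while i < len(s) and s[i] == "(":
        i = parse_s(s, i + 1)           # the nested S
        if i < 0 or i >= len(s) or s[i] != ")":
            return -1                   # the opening "(" was never closed
        i += 1                          # consume ")" and loop for the trailing S
    return i

def properly_nested(s):
    # The string is valid only if S consumes all of it.
    return parse_s(s) == len(s)

for sample in ["(()(())())", "())", "", "(()", ")("]:
    print(repr(sample), properly_nested(sample))
```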
Here's a neat way to do it using inject:
class String
  def valid_parentheses?
    valid = true
    self.gsub(/[^\(\)]/, '').split('').inject(0) do |counter, parenthesis|
      counter += (parenthesis == '(' ? 1 : -1)
      valid = false if counter < 0
      counter
    end.zero? && valid
  end
end
> "(a+b)".valid_parentheses? # => true
> "(a+b)(".valid_parentheses? # => false
> "(a+b))".valid_parentheses? # => false
> "(a+b))(".valid_parentheses? # => false
You're right to be worried; I think you've got the wrong end of the stick, and you're solving the problem too literally. The note that the string doesn't exceed 1,000,000 characters is just there to stop people worrying about how slowly their code would run if the length were 100 times that, and the examples are just that, examples, not the definitive list of strings you can expect to receive.
I'm not going to do your homework for you (by writing the code), but will give you a pointer to a solution that occurs to me:
The string is correctly nested if every left bracket has a right bracket to the right of it, with either nothing or a correctly nested set of brackets between them. So how about a recursive function, or a loop, that removes every match of "()" from the string? When you run out of matches, what are you left with? Nothing? Then that was a properly nested string. Something else (like ')' or ')(', etc.) would mean it was not correctly nested in the first place.
Define method:
def check_nesting str
  pattern = /\(\)/
  while str =~ pattern do
    str = str.gsub pattern, ''
  end
  str.length == 0
end
And test it:
>ruby nest.rb (()(())())
true
>ruby nest.rb (()
false
>ruby nest.rb ((((()))))
true
>ruby nest.rb (()
false
>ruby nest.rb (()(((())))())
true
>ruby nest.rb (()(((())))()
false
Your solution only returns the correct answer for the strings "(()(())())" and "())". You surely need a solution that works for any string!
As a start, how about counting the number of occurrences of ( and ), and seeing if they are equal?

Resources