Is there a way to check if two regexps can match the same string? [duplicate] - algorithm

This question already has answers here:
Regex: Determine if two regular expressions could match for the same input?
(5 answers)
Closed 10 years ago.
I have two regexps. I need to determine whether it is possible to build a string of a given length that matches both regexps simultaneously. I need an algorithm to do that.
The string's length won't exceed 20 characters.

It depends. For Perl-compatible regular expressions (PCRE), this is not possible in general: features such as backreferences and recursion take them beyond the regular languages, so the construction below does not apply.
For the original, "clean" form of regular languages as defined in the Chomsky hierarchy, it is known that they are closed under intersection; this is already discussed in this thread.
As soon as you have the NFA for the intersection, it is easy to check whether any string matches it: if there is a path from the start state to an accepting state of your NFA, then the symbols along that path spell out the string you are searching for. For DFAs, an algorithm is given here, and it should be simple to adapt it to NFAs. Since you need a string of a given length, look for a path with exactly that many transitions.
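A minimal sketch of the intersection idea, assuming both regexps have already been converted to DFAs by hand. The `DFA` struct, the `common_string` helper and the two example automata are hypothetical names made up for illustration; they do not come from any library. The search walks the product automaton one symbol at a time and keeps one witness string per reachable pair of states, so a length limit of 20 stays cheap.

```ruby
require "set"

# Hypothetical DFA representation: start state, set of accepting states,
# and a transition table mapping [state, symbol] => next state.
# A missing transition means the automaton rejects.
DFA = Struct.new(:start, :accept, :delta)

# Returns one string of exactly `length` symbols accepted by both DFAs,
# or nil if no such string exists.
def common_string(a, b, alphabet, length)
  # Maps each reachable product state [p, q] to one string that reaches it.
  frontier = { [a.start, b.start] => "" }
  length.times do
    next_frontier = {}
    frontier.each do |(p, q), str|
      alphabet.each do |sym|
        np = a.delta[[p, sym]]
        nq = b.delta[[q, sym]]
        next if np.nil? || nq.nil?
        next_frontier[[np, nq]] ||= str + sym
      end
    end
    frontier = next_frontier
  end
  hit = frontier.find { |(p, q), _| a.accept.include?(p) && b.accept.include?(q) }
  hit && hit[1]
end

# Example automaton for a*b: any number of "a" followed by a single "b".
A_STAR_B = DFA.new(0, Set[1], { [0, "a"] => 0, [0, "b"] => 1 })

# Example automaton for strings over {a, b} with an even number of "a".
EVEN_AS = DFA.new(
  :even, Set[:even],
  { [:even, "a"] => :odd, [:odd, "a"] => :even,
    [:even, "b"] => :even, [:odd, "b"] => :odd }
)

puts common_string(A_STAR_B, EVEN_AS, %w[a b], 3).inspect
puts common_string(A_STAR_B, EVEN_AS, %w[a b], 2).inspect
```

Turning a regexp into a DFA is a separate step that this sketch leaves out.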

Related

Ruby regular expression for sequence with specified start and end [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 7 years ago.
I have this string:
mRNA = "gcgagcgagcaugacgcauguactugacaugguuuaaggccgauuagugaaugugcagacgcgcauaguggcgagcuaaaaacat"
I want to upcase subsequences out of this given sequence. A subsequence should start with aug and should end with either uaa, uag or uga.
When I use the following regular expression in combination with gsub!:
mRNA.gsub!(/(aug.*uaa)|(aug.*uag)|(aug.*uga)/, &:upcase)
it results in
gcgagcgagcAUGACGCAUGUACTUGACAUGGUUUAAGGCCGAUUAGUGAAUGUGCAGACGCGCAUAGUGGCGAGCUAAaaacat
I don’t understand why it upcases one whole chunk instead of giving me two subsequences like this:
gcgagcgagcAUGACGCAUGUACTUGACAUGGUUUAAggccgauuagugaAUGUGCAGACGCGCAUAGuggcgagcuaaaaacat
What regular expression can I use to achieve this?
The .* operator is known as "greedy," which means it will grab up as many characters as it can while still matching the pattern.
To grab the smallest possible number of characters, use the "non-greedy" operator, .*?.
Modifying your original regex:
mRNA.gsub!(/(aug.*?uaa)|(aug.*?uag)|(aug.*?uga)/, &:upcase)
There are certainly smaller regexes that will do the job, though. Using @stribizhev's suggestion:
mRNA.gsub!(/aug.*?(?:uaa|uag|uga)/, &:upcase)
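To see the difference in isolation, here is a small sketch on a shortened, made-up sequence (the string `sample` is an assumption for illustration, not the sequence from the question). It uses the non-destructive `gsub` so both patterns can be run against the same input.

```ruby
# Made-up short sequence with two candidate subsequences.
sample = "ccaugxxuaaccaugyyuagcc"

# Greedy: .* runs to the last possible stop codon.
greedy = sample.gsub(/aug.*(?:uaa|uag|uga)/, &:upcase)

# Non-greedy: .*? stops at the first stop codon after each "aug".
lazy = sample.gsub(/aug.*?(?:uaa|uag|uga)/, &:upcase)

puts greedy # => ccAUGXXUAACCAUGYYUAGcc
puts lazy   # => ccAUGXXUAAccAUGYYUAGcc
```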

Is there a reason to use arithmetic expression n*(1/k) over n/k? [duplicate]

This question already has answers here:
Is Multiplying the Inverse Better or Worse?
(11 answers)
Closed 7 years ago.
Sometimes I come across arithmetic expressions like this: n*(1/k).
Such an expression can be written more simply as n/k.
I can imagine that in certain situations the former is more descriptive, if (1/k) represents a well-known quantity, but that is not always the case.
What about performance gains/losses? What about precision?
Is there any hidden reason that some developers use n*(1/k) form?
To me, n*(1/k) gives a less accurate answer. When (1/k) is evaluated, the result may have to be rounded or truncated, which loses accuracy, and the multiplication by n then scales that error and rounds a second time. So as far as I am concerned, n/k is better.
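A small sketch of the precision point, assuming Ruby floats are IEEE 754 doubles (true on common platforms). The helper names `via_reciprocal` and `via_division` are made up for illustration. The value 49 is a well-known case where the two forms disagree; the last two lines show a separate, Ruby-specific pitfall with integer operands.

```ruby
# Two roundings: one for 1.0 / k, one for the multiplication.
def via_reciprocal(n, k)
  n * (1.0 / k)
end

# One rounding: the division itself.
def via_division(n, k)
  n.to_f / k
end

# Values of k in 1..100 for which k * (1/k) differs from k / k.
mismatches = (1..100).select { |k| via_reciprocal(k, k) != via_division(k, k) }
puts mismatches.inspect

puts via_reciprocal(49, 49) == 1.0 # => false
puts via_division(49, 49) == 1.0   # => true

# With integer operands, 1 / k truncates to 0 for k > 1.
puts 10 * (1 / 4) # => 0
puts 10 / 4       # => 2
```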

Find the words in string with no spaces [duplicate]

This question already has answers here:
Detect most likely words from text without spaces / combined words
(5 answers)
Closed 8 years ago.
Suppose we have a string with no spaces:
Input : "putreturnsbetwenparagaphs"
Output : put returns between paragraphs
This could get more complex as more words overlap. How can this be done really fast? If required, it should also correct the spelling while splitting the words. Think about it.
One problem could be the plural or the case of a word. In your example it could be difficult to tell paragraph and paragraphs apart.
Do you have more information? Are the words in some explicit grammatical form, or could any word of a common dictionary occur, in any case, number, etc.?
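One possible starting point is a dictionary-based split with dynamic programming. The sketch below assumes a known word list and correctly spelled input; it does no spelling correction, and the `segment` helper and the tiny dictionary are made up for illustration. Because it tries the longest candidate word first at each position, it picks "returns" over "return" when both are in the dictionary and both lead to a full split.

```ruby
require "set"

# Returns one way to split `text` into dictionary words, or nil if none exists.
def segment(text, dictionary)
  words = dictionary.to_set
  max_len = words.map(&:length).max || 0
  # splits[i] holds one segmentation of the first i characters, or nil.
  splits = Array.new(text.length + 1)
  splits[0] = []
  (1..text.length).each do |stop|
    # Smaller start means a longer candidate word, so long words win.
    ([0, stop - max_len].max...stop).each do |start|
      next if splits[start].nil?
      piece = text[start...stop]
      if words.include?(piece)
        splits[stop] = splits[start] + [piece]
        break
      end
    end
  end
  splits[text.length]
end

dictionary = %w[put return returns between paragraph paragraphs]
puts segment("putreturnsbetweenparagraphs", dictionary).join(" ")
# => put returns between paragraphs
```

Handling misspelled input such as "betwen" would need an extra step, for example matching pieces against the dictionary by edit distance, which this sketch does not attempt.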

Regex to restrict to accept maximum 14 digits [closed]

Closed 10 years ago.
Last time I asked how to check a string for a minimum of 8 digits, and I got the following regex:
/^(?=(.*\d){8,})[\d\(\)\s+-]{8,}$/
You can see the question here: Checking string with minimum 8 digits using regex
Now I want to restrict the string to accept a maximum of 14 digits in the same regex. I tried this:
/^(?=(.*\d){8,14})[\d\(\)\s+-]{8,}$/
No luck. Can anyone please help me fix this?
UPDATE
After getting 2 downvotes I thought it better to write my own. I constructed a regex from the previous one. The following regex works for me:
/^(?=(.*\d){8})(?!(.*\d){15})[\d\(\)\s+-]{8,}$/
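A quick sketch for checking that regex against a few sample inputs. The constant `PHONE`, the helper `valid_phone?` and the sample strings are made up for illustration. The positive lookahead requires at least 8 digits, and the negative lookahead rejects any string in which 15 digits can be found.

```ruby
# The regex from the update above, unchanged.
PHONE = /^(?=(.*\d){8})(?!(.*\d){15})[\d\(\)\s+-]{8,}$/

def valid_phone?(string)
  PHONE.match?(string)
end

puts valid_phone?("1234567")             # 7 digits            => false
puts valid_phone?("12345678")            # 8 digits            => true
puts valid_phone?("+1 (234) 567-89")     # 9 digits            => true
puts valid_phone?("(123) 456-7890 1234") # 14 digits           => true
puts valid_phone?("123456789012345")     # 15 digits           => false
puts valid_phone?("12345678abc")         # disallowed letters  => false
```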
By your request, the regex should be as simple as:
/^\d{8,14}$/
From your answer, and your other question, it seems like you are encoding a whole bunch of different rules into one increasingly complicated regex:
the string must be at least 12 chars long
it can only contain digits, parentheses, + and - signs, and spaces
there must be between 8 and 14 digits
While it's possible to do this with a regex, is it worthwhile? I'd argue that such a complicated regex is impossible to read, and therefore difficult to maintain.
If you split up the different criteria, it'll be much clearer.
string.length >= 12
string =~ /\A[\d()+\s-]+\z/ - note that by using square brackets to create a character class, you don't need to escape things, which also makes it much simpler. The hyphen goes last so that it is read as a literal rather than as a range operator, and \A and \z anchor the whole string rather than a single line.
(8..14).include?(string.count("0-9")) - check out the docs for String#count
So, altogether,
def valid?(string)
  string.length >= 12 &&
    string =~ /\A[\d()+\s-]+\z/ &&
    (8..14).include?(string.count("0-9"))
end
It's a bit longer but it's a heck of a lot more understandable.
Try:
/^(?=(.*\d){8,14}(?!.*\d))[\d\(\)\s+-]{8,}$/
If I got the placement of the negative lookahead right, it should fail to match a string with more than 14 digits. One caveat: .* can itself consume digits, so test it against a string with 15 or more digits before relying on it.
Try also adding the 14 after the second instance of "8," if the previous regex achieved what you wanted, but it really is more complex than just 8-14 digits!

Compare two versions of a text file and find additions/removals with Ruby? [duplicate]

This question already has answers here:
diff a ruby string or array
(12 answers)
Closed 8 years ago.
I am tracking changes in a web-page using Ruby. After I removed all html tags and blank lines, I get an array of lines which needs to be checked for additions/removals assuming that there may be repetitions. Could you recommend a good gem if it has been done already?
I could make the array lines unique and then the problem is avoided. But what if I need to track the repeated lines as well with respect to their position in the text?
Sounds like a textbook case of where you'd want to use the Diff algorithm.
There's a 'diff' gem, although to be fair I've never used it: http://rubygems.org/gems/diff
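For a sense of what such a diff does, here is a minimal pure-Ruby sketch based on the longest common subsequence (LCS). It is not the API of any gem; the `line_diff` helper and the sample arrays are made up for illustration. Because it compares lines by position rather than as a set, repeated lines are handled naturally.

```ruby
# Returns an array of [op, line] pairs, where op is "=" (kept),
# "-" (removed from the old version) or "+" (added in the new version).
def line_diff(old_lines, new_lines)
  n = old_lines.length
  m = new_lines.length
  # lcs[i][j] = LCS length of old_lines[i..-1] and new_lines[j..-1]
  lcs = Array.new(n + 1) { Array.new(m + 1, 0) }
  (n - 1).downto(0) do |i|
    (m - 1).downto(0) do |j|
      lcs[i][j] = if old_lines[i] == new_lines[j]
                    lcs[i + 1][j + 1] + 1
                  else
                    [lcs[i + 1][j], lcs[i][j + 1]].max
                  end
    end
  end

  # Walk both arrays, following the table to emit the edit script.
  result = []
  i = j = 0
  while i < n && j < m
    if old_lines[i] == new_lines[j]
      result << ["=", old_lines[i]]
      i += 1
      j += 1
    elsif lcs[i + 1][j] >= lcs[i][j + 1]
      result << ["-", old_lines[i]]
      i += 1
    else
      result << ["+", new_lines[j]]
      j += 1
    end
  end
  old_lines[i..-1].each { |line| result << ["-", line] }
  new_lines[j..-1].each { |line| result << ["+", line] }
  result
end

old_version = %w[a b a c]
new_version = %w[a a c d]
line_diff(old_version, new_version).each { |op, line| puts "#{op} #{line}" }
```

The table uses memory proportional to the product of the two line counts, which is fine for a single web page but worth keeping in mind for very large files.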
