Find the words in string with no spaces [duplicate] - algorithm

This question already has answers here:
Detect most likely words from text without spaces / combined words
(5 answers)
Closed 8 years ago.
Let's suppose a string with no spaces:
Input : "putreturnsbetwenparagaphs"
Output : put returns between paragraphs
This could get more complex as more words overlap. How can this be achieved really fast? If required, it should also apply spelling corrections while splitting the words.

One problem could be the plural form or case of a word. In your example it could be difficult to tell paragraph apart from paragraphs.
Do you have more information? Are some words in an explicit grammatical form, or could any word of a common dictionary occur in any case, number, etc.?
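A minimal sketch of the splitting step, assuming a known dictionary and correctly spelled input (spell correction is not handled here). The method name segment and the tiny word list are made up for illustration. It uses dynamic programming over prefix lengths and prefers the longest word that ends at each position, which is one way to resolve the paragraph/paragraphs ambiguity mentioned above:

```ruby
require "set"

# Splits `text` into dictionary words, or returns nil if no full split exists.
# `dictionary` is assumed to be a Set of lowercase words (hypothetical data).
def segment(text, dictionary)
  max_len = dictionary.map(&:length).max || 0
  # best[i] holds one list of words that spells text[0, i], or nil if none was found.
  best = Array.new(text.length + 1)
  best[0] = []
  (1..text.length).each do |stop|
    # Try the earliest start first, i.e. the longest candidate word ending at `stop`.
    ([0, stop - max_len].max...stop).each do |start|
      next if best[start].nil?
      word = text[start...stop]
      if dictionary.include?(word)
        best[stop] = best[start] + [word]
        break
      end
    end
  end
  best[text.length]
end

words = Set.new(%w[put return returns between paragraph paragraphs])
result = segment("putreturnsbetweenparagraphs", words)
puts result.join(" ") if result
```

The inner loop only looks back as far as the longest dictionary word, so the running time grows linearly with the input length for a fixed dictionary.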

Related

Bash regex to match a word followed by numbers or not [duplicate]

This question already has an answer here:
Regex - two specific digits followed by optional digits
(1 answer)
Closed 3 years ago.
I want to match these strings: value, value1, value2.
I can match the numbered forms so far, but I also need to match the word with no number after it.
sed -e 's/value[0-9]//g'
You can combine multiple expressions into one by separating them with semicolons. Hope this helps.
sed 's/value[0-9]//g;s/value//g' inputfile
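As an alternative to chaining two substitutions, a single pattern with an optional digit run covers both cases. The sketch below shows the idea in Ruby rather than sed; the sample line is made up, and the pattern value[0-9]* also matches multi-digit suffixes such as value10, which goes slightly beyond the original expression:

```ruby
# Hypothetical sample input containing the bare word and numbered variants.
line = "value value1 value2 other"

# [0-9]* means "zero or more digits", so the bare word matches as well.
stripped = line.gsub(/value[0-9]*/, "")

puts stripped.strip
```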

Ruby regular expression for sequence with specified start and end [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 7 years ago.
I have this string:
mRNA = "gcgagcgagcaugacgcauguactugacaugguuuaaggccgauuagugaaugugcagacgcgcauaguggcgagcuaaaaacat"
I want to upcase subsequences out of this given sequence. A subsequence should start with aug and should end with either uaa, uag or uga.
When I use the following regular expression in combination with gsub!:
mRNA.gsub!(/(aug.*uaa)|(aug.*uag)|(aug.*uga)/, &:upcase)
it results in
gcgagcgagcAUGACGCAUGUACTUGACAUGGUUUAAGGCCGAUUAGUGAAUGUGCAGACGCGCAUAGUGGCGAGCUAAaaacat
I don’t understand why it upcases one whole chunk instead of giving me two subsequences like this:
gcgagcgagcAUGACGCAUGUACTUGACAUGGUUUAAggccgauuagugaAUGUGCAGACGCGCAUAGuggcgagcuaaaaacat
What regular expression can I use to achieve this?
The .* operator is known as "greedy," which means it will grab up as many characters as it can while still matching the pattern.
To grab the smallest possible number of characters, use the "non-greedy" operator, .*?.
Modifying your original regex:
mRNA.gsub!(/(aug.*?uaa)|(aug.*?uag)|(aug.*?uga)/, &:upcase)
There are certainly smaller regexes that will do the job, though. Using @stribizhev's suggestion:
mRNA.gsub!(/aug.*?(?:uaa|uag|uga)/, &:upcase)
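One thing worth checking on the sample string: a lazy match ends at the first uaa, uag or uga found anywhere after aug, even if it is not aligned to a triplet boundary. The expected output in the question looks as if the sequence is read in codons of three letters starting at aug. Under that assumption (it is not stated in the question), a frame-aware variant could be sketched as follows; the variable names are illustrative:

```ruby
mrna = "gcgagcgagcaugacgcauguactugacaugguuuaaggccgauuagugaaugugcagacgcgcauaguggcgagcuaaaaacat"

# Lazy match: ends at the first uaa/uag/uga found anywhere after aug.
lazy = mrna.gsub(/aug.*?(?:uaa|uag|uga)/, &:upcase)

# Frame-aware variant (assumption: codons are read three letters at a time
# starting at aug, so the stop codon must fall on a triplet boundary).
framed = mrna.gsub(/aug(?:[a-z]{3})*?(?:uaa|uag|uga)/, &:upcase)

puts lazy
puts framed
```

On this input the lazy pattern stops early at the uga inside "ctugac", while the frame-aware pattern skips it because it straddles two codons.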

Is there a way to check if two regexps can match the same string? [duplicate]

This question already has answers here:
Regex: Determine if two regular expressions could match for the same input?
(5 answers)
Closed 10 years ago.
I have two regexps. I need to determine whether it is possible to build a string of a given length that matches both regexps simultaneously. I need an algorithm to do that.
The string's length will not exceed 20 characters.
It depends. For Perl-compatible regular expressions (PCRE), this is not generally possible with automata: features such as backreferences and recursion take them beyond regular languages, so the construction described next does not carry over directly.
For the original, "clean" form of regular languages as defined in the Chomsky hierarchy, it is known that they are closed under intersection; this is already discussed in this thread.
As soon as you have the NFA for the intersection, it is easy to check whether any string matches it: if there is a path from the start state to an accepting state of your NFA, then the string spelled out along this path is the string you are searching for. For DFAs, an algorithm is given here; it should be simple to adapt it to NFAs.
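A minimal sketch of the intersection idea for a fixed length, assuming both regexps have already been converted to DFAs by hand. The hash layout (:start, :accept, :delta) and the method name common_string are made up for illustration, and the conversion from a regexp to a DFA is not shown. The search walks the product of the two automata one symbol at a time and keeps one witness string per reachable pair of states:

```ruby
# A DFA is a hypothetical plain hash:
#   start:  initial state
#   accept: array of accepting states
#   delta:  { [state, symbol] => next_state } (a missing entry means "reject")
def common_string(dfa_a, dfa_b, length, alphabet)
  # Maps each reachable product state [a, b] to one string that reaches it.
  frontier = { [dfa_a[:start], dfa_b[:start]] => "" }
  length.times do
    next_frontier = {}
    frontier.each do |(a, b), prefix|
      alphabet.each do |sym|
        na = dfa_a[:delta][[a, sym]]
        nb = dfa_b[:delta][[b, sym]]
        next if na.nil? || nb.nil?
        next_frontier[[na, nb]] ||= prefix + sym
      end
    end
    frontier = next_frontier
  end
  hit = frontier.find { |(a, b), _| dfa_a[:accept].include?(a) && dfa_b[:accept].include?(b) }
  hit && hit[1]
end

# Example automata over the alphabet {a, b}:
# ends_in_b corresponds to /\A[ab]*b\z/, starts_with_a to /\Aa[ab]*\z/.
ends_in_b = { start: 0, accept: [1],
              delta: { [0, "a"] => 0, [0, "b"] => 1, [1, "a"] => 0, [1, "b"] => 1 } }
starts_with_a = { start: 0, accept: [1],
                  delta: { [0, "a"] => 1, [1, "a"] => 1, [1, "b"] => 1 } }

witness = common_string(ends_in_b, starts_with_a, 3, %w[a b])
puts witness.inspect
```

Because the frontier holds at most one entry per pair of states, the work per step is bounded by the product of the two state counts times the alphabet size, which is small for strings of at most 20 characters.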

Load a text file containing both numbers and letters 2

This question is related to "Load a text file containing both numbers and letters", but I am asking for the opposite layout:
blabla<tab>1
blabla<tab>2
...
Do I need fscanf in this case also?
EDIT: the answer to the question mentioned seems to treat individual characters as columns. In my case, I have strings of different lengths.
I found the textread function. It solved the problem.

Compare two versions of a text file and find additions/removals with Ruby? [duplicate]

This question already has answers here:
diff a ruby string or array
(12 answers)
Closed 8 years ago.
I am tracking changes in a web page using Ruby. After removing all HTML tags and blank lines, I get an array of lines which needs to be checked for additions/removals, assuming that there may be repetitions. Could you recommend a good gem if this has been done already?
I could make the array lines unique and then the problem is avoided. But what if I need to track the repeated lines as well with respect to their position in the text?
Sounds like a textbook case of where you'd want to use the Diff algorithm.
There's a 'diff' gem, although to be fair I've never used it: http://rubygems.org/gems/diff
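For illustration, here is a minimal pure-Ruby sketch of a line diff based on the longest common subsequence (LCS). It does not use the gem mentioned above, and the method name line_diff and the tags :same, :removed and :added are made up. Because it compares lines by position, repeated lines are tracked individually:

```ruby
# Returns an array of [tag, line] pairs, where tag is :same, :removed or :added.
def line_diff(old_lines, new_lines)
  n = old_lines.length
  m = new_lines.length
  # lcs[i][j] = length of the LCS of old_lines[i..-1] and new_lines[j..-1]
  lcs = Array.new(n + 1) { Array.new(m + 1, 0) }
  (n - 1).downto(0) do |i|
    (m - 1).downto(0) do |j|
      lcs[i][j] = if old_lines[i] == new_lines[j]
                    lcs[i + 1][j + 1] + 1
                  else
                    [lcs[i + 1][j], lcs[i][j + 1]].max
                  end
    end
  end

  # Walk both arrays from the front, following the LCS table.
  result = []
  i = j = 0
  while i < n && j < m
    if old_lines[i] == new_lines[j]
      result << [:same, old_lines[i]]
      i += 1
      j += 1
    elsif lcs[i + 1][j] >= lcs[i][j + 1]
      result << [:removed, old_lines[i]]
      i += 1
    else
      result << [:added, new_lines[j]]
      j += 1
    end
  end
  old_lines[i..-1].each { |line| result << [:removed, line] }
  new_lines[j..-1].each { |line| result << [:added, line] }
  result
end

changes = line_diff(%w[a b a c], %w[a a c d])
changes.each { |tag, line| puts "#{tag}: #{line}" }
```

The table needs memory proportional to the product of the two line counts, so this sketch suits pages of moderate size.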
