EditPad: How do you work with replacement string conditionals?

I want to search for a letter using a regex and then convert the letter's case. I know I can refer to specific letters and use this:
For example, I can search for ^[adw] and replace with \U0, but I can't do the same when searching for ^\w so that the expression applies to any letter.
I have read a reply from the author of the product that refers to this:
https://www.editpadpro.com/history.html#800
The relevant change listed there is:
Regex: Replacement string conditionals in the form of (?1matched:unmatched) and (?{name}matched:unmatched)
But it's not clear to me what this means. How do I actually use this?
If I have these strings and want to uppercase the first letter of each sentence, how would I use this conditional syntax to achieve it?
an apple a day.
dogs are great animals.
what about a game of football?
The result I want is this:
An apple a day.
Dogs are great animals.
What about a game of football?
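As I read the changelog entry, (?1matched:unmatched) inserts the text "matched" when group 1 took part in the match and "unmatched" when it did not. The Ruby sketch below is only an analogy (it is not EditPad syntax): a gsub block makes the same per-match choice explicitly, here uppercasing an optional lowercase letter captured at the start of each line.

```ruby
# Ruby analogy only; this is not EditPad replacement syntax.
text = "an apple a day.\ndogs are great animals.\nwhat about a game of football?"

# Group 1 optionally captures a lowercase letter at the start of a line
# (in Ruby, ^ always matches at the start of every line).
result = text.gsub(/^([a-z])?/) do
  # Like (?1matched:unmatched): choose the output based on whether group 1 matched.
  # When it did not, the match is empty, so "" leaves the line unchanged.
  $1 ? $1.upcase : ""
end

puts result
```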

Related

Power Query remove repetitive substrings

I have a column in Power Query (standalone Power Query with Excel) containing text like this:
"Hazelnut Berries Nuts Raspberry"
I need to detect whether there is more than one instance of "nut" (or "berry") in it and remove the generic word, so that the result is:
"Hazelnut Raspberry"
I have seen this post, but it only works on repeated whole words.
I'm not entirely certain about your criteria for picking the words you want to remove (PQ is fairly limited in how it can evaluate this with built-in functions anyway). This expression looks through the string and removes any word that starts with "Nut" or "Berr":
Text.Combine(List.Transform(Text.Split("Hazelnut Berries Nuts Raspberry", " "), each if (Text.StartsWith(_, "Nut") or Text.StartsWith(_, "Berr")) then null else _), " ")
This produces your desired output. I don't know whether you need more detailed criteria for evaluating each word, but that would probably require a custom function.
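List.Distinct (https://learn.microsoft.com/en-ie/powerquery-m/list-distinct) may help, with something like: List.Distinct(Text.Split("Hazelnut Berries Nuts Raspberry", " ")). Note that it only removes items that are exact duplicates, and it returns a list, so you would still need Text.Combine to turn the result back into text.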
You might need a bit more processing if your text can contain multiple spaces or other "stuff".
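The prefix-based answer can be sketched in Ruby (not Power Query M) and combined with the question's "more than one instance" test: a generic word is only dropped when its stem also appears inside another word. The stem list is a hypothetical assumption; adjust it to your data.

```ruby
# Hypothetical generic stems; adjust to your data.
GENERIC_STEMS = %w[nut berr].freeze

def remove_generic_words(text, stems = GENERIC_STEMS)
  words = text.split(" ")
  stems.each do |stem|
    # Only act when the stem occurs in more than one word, e.g. "Hazelnut" and "Nuts".
    hits = words.count { |w| w.downcase.include?(stem) }
    next unless hits > 1

    # Drop the generic word(s): the ones that *start* with the stem.
    words.reject! { |w| w.downcase.start_with?(stem) }
  end
  words.join(" ")
end

puts remove_generic_words("Hazelnut Berries Nuts Raspberry")
```

One caveat of this sketch: if every matching word starts with the stem (say "Nuts" and "Nutmeg"), all of them are removed.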

Replace all A with B and replace all B with A

Suppose I want to switch certain pairs of words. Say, for example, I want to switch dogs with cats and mice with rats, so that
This is my opinion about dogs and cats: I like dogs but I don't like cats. This is my opinion about mice and rats: I'm afraid of mice but I'm not afraid of rats.
becomes
This is my opinion about cats and dogs: I like cats but I don't like dogs. This is my opinion about rats and mice: I'm afraid of rats but I'm not afraid of mice.
The naïve approach
text = text.replace("dogs", "cats")
.replace("cats", "dogs")
.replace("mice", "rats")
.replace("rats", "mice")
is problematic because it can replace the same word more than once. Either of the above example sentences would become
This is my opinion about dogs and dogs: I like dogs but I don't like dogs. This is my opinion about mice and mice: I'm afraid of mice but I'm not afraid of mice.
What's the simplest algorithm for replacing string pairs, while preventing something from being replaced multiple times?
Use whichever string search algorithm you deem appropriate, as long as it can search for regular expressions. Search for a regex that matches all the words you want to swap, e.g. dogs|cats|mice|rats. Maintain a separate string for the result, initially empty (in many languages this needs to be some kind of StringBuilder for repeated appending to be fast). For each match, append the characters between the end of the previous match (or the beginning of the string) and the current match, and then append the appropriate replacement (presumably obtained from a hashmap) to the result. Finally, append whatever follows the last match.
Most standard libraries let you do this easily with built-in methods. For a Java example, see the documentation of Matcher.appendReplacement(StringBuffer, String). I recall doing this in C# as well, using the Regex.Replace overload that takes a MatchEvaluator, a delegate (for example a lambda) that decides what to replace each match with.
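A minimal Ruby sketch of the single-pass algorithm described above, assuming the four word pairs from the question. The loop mirrors the append-between-matches steps; the names SWAPS, PATTERN and swap_words are illustrative.

```ruby
# Swap table from the question; each match is looked up here exactly once.
SWAPS = { "dogs" => "cats", "cats" => "dogs", "mice" => "rats", "rats" => "mice" }.freeze
# Regexp.union escapes the keys and builds the alternation dogs|cats|mice|rats.
PATTERN = Regexp.union(SWAPS.keys)

def swap_words(text)
  result = +""   # mutable buffer, the Ruby counterpart of a StringBuilder
  last_end = 0
  while (m = PATTERN.match(text, last_end))
    result << text[last_end...m.begin(0)]  # text between the previous match and this one
    result << SWAPS.fetch(m[0])            # replacement for the matched word
    last_end = m.end(0)
  end
  result << text[last_end..]               # tail after the last match
end
```

In Ruby the built-in equivalent is a single call, text.gsub(PATTERN, SWAPS), because gsub accepts a hash that maps matched text to its replacement. Neither version uses word boundaries, so a word such as "catsup" would also be changed; wrapping the alternation in \b would prevent that.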
A naive solution that avoids these unexpected outcomes is to replace each string with a temporary placeholder, and then replace the placeholders with the final strings. This assumes, however, that you can form a string that is known not to appear in the text, e.g.
text = text.replace("dogs", "{]1[}")
.replace("cats", "{]2[}")
.replace("mice", "{]3[}")
.replace("rats", "{]4[}")
.replace("{]2[}", "dogs")
.replace("{]1[}", "cats")
.replace("{]4[}", "mice")
.replace("{]3[}", "rats")
I am admittedly not very familiar with regex, so my idea is to create an array and then loop through the elements to see whether each one should be replaced. First split() the sentence into an array of words:
String text = "This is my opinion about dogs and cats: I like dogs but I don't like cats.";
String[] sentence = text.split("[^a-zA-Z]"); //can't avoid regex here
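Then use a for loop containing a chain of if/else-if statements to replace words:
for (int i = 0; i < sentence.length; i++) {
    // else-if, so a word swapped by one branch is not swapped back by a later one
    if (sentence[i].equals("cats")) {
        sentence[i] = "dogs";
    } else if (sentence[i].equals("dogs")) {
        sentence[i] = "cats";
    }
    // more similar else-if branches (mice/rats)
}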
Now sentence[] contains the swapped words. With a bit more regex work you could also keep the punctuation marks. I hope this helps, and please let me know if anything could be improved.
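The same split-and-look-up idea can keep the punctuation if the split pattern has a capturing group: Ruby's String#split then returns the delimiters as array elements too, so joining the array restores the original layout. This is a Ruby sketch (a hash instead of the if chain), assuming words consist of ASCII letters only, as in the split pattern above.

```ruby
def swap_tokens(text, swaps = { "dogs" => "cats", "cats" => "dogs", "mice" => "rats", "rats" => "mice" })
  # The capturing group makes split keep the delimiters (spaces, punctuation)
  # as separate elements, so nothing is lost when the array is joined again.
  tokens = text.split(/([^a-zA-Z]+)/)
  # Each token is looked up once, so a swapped word is never swapped back.
  tokens.map { |token| swaps.fetch(token, token) }.join
end

puts swap_tokens("This is my opinion about dogs and cats: I like dogs but I don't like cats.")
```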

What Ruby Regex code can I use for obtaining "out of sight" from the input "outofsight"?

I'm building an application that returns results based on a movie title entered by the user. If the user forgets to put spaces in the title, is there a way I can still take the input and return the correct data? For example, "outofsight" should still be interpreted as "out of sight".
There is no regex that can do this in a good and reliable way. You could try a search server like Solr.
Alternatively, you could offer auto-complete in the GUI (if you have one) on the user's input, and this way mitigate some of the common errors users make.
Example:
User wants to search for "outofsight"
Starts typing "out"
Sees "out of sight" as suggestion
Selects "out of sight" from suggestions
????
PROFIT!!!
There's no regex that can tell you where the word breaks were supposed to be. For example, if the input is "offlight", is it supposed to return "Off Light" or "Of Flight"?
This is impossible without a dictionary and some kind of fuzzy-search algorithm. For the latter see How can I do fuzzy substring matching in Ruby?.
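To see why a dictionary is needed, here is a small Ruby sketch that enumerates every way to split an input into known words. The dictionary is a hypothetical stand-in for a real word list; with it, "offlight" yields both readings, which is exactly the ambiguity described above.

```ruby
require "set"

# Hypothetical dictionary; a real application would load a full word list.
DICTIONARY = Set.new(%w[out of sight off light flight])

# Returns every way to split input into dictionary words. This is exponential
# in the worst case, which is fine for short titles; memoization would help for long ones.
def segmentations(input, dict = DICTIONARY)
  return [[]] if input.empty?

  (1..input.length).flat_map do |i|
    head = input[0, i]
    next [] unless dict.include?(head)

    segmentations(input[i..], dict).map { |rest| [head, *rest] }
  end
end

p segmentations("offlight").map { |words| words.join(" ") }
```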
You could take a string and put \s* in between each character.
So outofsight would be converted to:
o\s*u\s*t\s*o\s*f\s*s\s*i\s*g\s*h\s*t
... and match out of sight.
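A Ruby sketch of this idea, building the pattern from the user's input. Escaping each character, matching case-insensitively and anchoring with \A and \z are my additions, and the titles list is hypothetical. Note that "offlight" matches two titles, so this approach does not resolve the ambiguity mentioned above.

```ruby
def spacing_insensitive_pattern(input)
  # Escape each character, then allow optional whitespace between them.
  body = input.chars.map { |c| Regexp.escape(c) }.join('\s*')
  Regexp.new("\\A#{body}\\z", Regexp::IGNORECASE)
end

# Hypothetical movie titles.
titles = ["Out of Sight", "Off Light", "Of Flight"]

p titles.grep(spacing_insensitive_pattern("outofsight"))
```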
You can't do this with regular expressions, unless you want to store one or more patterns to match for each movie record. That would be silly.
A better approach for catching minor misspellings would be to calculate Levenshtein distances between what the user is typing and your movie titles. However, when your list of movies is large, this will become a rather slow operation, so you're better off using a dedicated search engine like Lucene/Solr that excels at this sort of thing.
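A minimal Ruby sketch of the Levenshtein suggestion, using the standard two-row dynamic-programming formulation. The TITLES list and the closest_title helper are hypothetical; as noted above, scanning every title this way gets slow for a large catalog.

```ruby
# Edit distance: minimum number of single-character insertions, deletions
# and substitutions needed to turn a into b.
def levenshtein(a, b)
  prev = (0..b.length).to_a            # row for the empty prefix of a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical catalog and helper: pick the title closest to the query.
TITLES = ["Out of Sight", "Off Light", "Outbreak"].freeze

def closest_title(query, titles = TITLES)
  titles.min_by { |t| levenshtein(query.downcase, t.downcase) }
end

puts closest_title("outofsight")
```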

ALL CAPS to Normal case

I'm trying to find an elegant way to convert something like this
ALL CAPS TEXT. "WHY ANYONE WOULD USE IT?" THIS IS RIDICULOUS! HELP.
...to normal case. I could more or less find all sentence-starting characters with:
(?<=^|(\. \"?)|(! ))[A-Z] #this regex sure should be more complex
but (standard) Ruby doesn't allow lookbehinds, nor is it possible to apply .capitalize to, say, gsub replacements. I wish I could do this:
"mytext".gsub(/my(regex)/, '\1'.capitalize)
but my current working solution is:
"mytext".split(/\. /).each {|x| p x.capitalize } #but this solution sucks
First of all, notice that what you are trying to do will only be an approximation.
You cannot correctly tell where the sentence boundaries are. You can approximate a sentence start as the beginning of the entire string, or the position right after a period, question mark, or exclamation mark followed by spaces. But then you will incorrectly capitalize "economy" in "U.S. economy".
You cannot correctly tell which words should be capitalized. For example, "John" will become "john".
You may want to do some natural language processing to get a close-to-correct result in many cases, but these methods are only probabilistically correct. You will never get a perfect result.
Understanding these limitations, you might want to do:
mytext.gsub(/.*?(?:[.?!]\s+|\z)/, &:capitalize)
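For the question's sample, the one-liner above leaves "why" and "this" in lowercase: the quote character is what capitalize sees first, and "?" is followed by a quote rather than whitespace. The sketch below pushes the gsub-with-block idea a little further by allowing optional quotes around the sentence break. It is still only an approximation, so the "U.S. economy" and proper-noun problems remain.

```ruby
def sentence_case(text)
  # Lowercase everything, then uppercase the first letter that follows either
  # the start of the string or a sentence-ending mark ([.?!]) with an optional
  # closing quote and whitespace; an opening quote before the letter is allowed.
  text.downcase.gsub(/(\A|[.?!]["']?\s+)(["']?)([a-z])/) do
    "#{$1}#{$2}#{$3.upcase}"
  end
end

puts sentence_case('ALL CAPS TEXT. "WHY ANYONE WOULD USE IT?" THIS IS RIDICULOUS! HELP.')
```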

Skipping Characters in Regex

I have the following data
Animals = Dog Cat Turtle \
Mouse Parrot \
Snake
I would like the regex to construct a match of just the animals with none of the backslashes: Dog Cat Turtle Mouse Parrot Snake
I've got a regex, but need some help finishing it off.
/ANIMALS\s*=\s*([^\\\n]*)/
Since you specified a language, I need to ask you this: Why are you relying on the regex for everything? Don't make the problem harder than it has to be by forcing the regex to do everything.
Try this approach instead...
Use gsub! to get rid of the backslashes.
split the string on any whitespace.
Shift out the first two tokens ("Animals", "=").
Join the array with a single space, or whatever other delimiter.
So the only regex you might need is one for the whitespace delimiter split. I don't know Ruby well enough to say exactly how you would do that, though.
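In Ruby, the four steps above might look like this. The data string is the question's sample with the line continuations written out as backslash plus newline. String#split with no argument already splits on runs of whitespace (newlines included), so no explicit regex is needed for that step.

```ruby
data = "Animals = Dog Cat Turtle \\\nMouse Parrot \\\nSnake"

cleaned = data.dup
cleaned.gsub!("\\", "")        # 1. remove the backslashes
tokens = cleaned.split         # 2. split on any run of whitespace
tokens.shift(2)                # 3. drop "Animals" and "="
animals = tokens.join(" ")     # 4. join with a single space

puts animals
```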
How 'bout the regex \b(?!Animals\b)\w+\b which matches all words that aren't Animals? Use the scan method to collect all such matches, e.g.
matchArray = sourceString.scan(/\b(?!Animals\b)\w+\b/)
Make sure you are matching with ignore-case, because ANIMALS will not match Animals without it.
