Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
I have a list of keywords, call it [cat, dog, bird], and a regex to find a weight: (\dlbs).
I only want to find items that:
1. start with the items in the array: (cat|dog|bird?)
2. match the weight regex: (\dlbs)
3. only have a max of 30 characters (excluding whitespace) between 1. and 2.
4. I do not want to or care to capture the 1-30 characters
Any help appreciated!
This will do it:
(cat|dog|bird)\s*(?:(?:\S\s*){0,30})(\dlbs)
Edited to reflect the "excluding whitespace" point.
It matches, for example, each of the following:
The cat weighs almost exactly 7lbs.
The cat weighs almost exactly 7lbs
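As a quick check, the pattern can be tried out in Python. This is only a sketch: the helper name `find_weight` and the second sample string are made up for illustration, and Python's `re` flavour is assumed to behave like the demo's engine for this pattern.

```python
import re

# Pattern copied from the answer above.
PATTERN = re.compile(r"(cat|dog|bird)\s*(?:(?:\S\s*){0,30})(\dlbs)")

def find_weight(text):
    """Return (keyword, weight) for the first match, or None if there is none."""
    m = PATTERN.search(text)
    return (m.group(1), m.group(2)) if m else None

# 19 non-whitespace characters sit between "cat" and "7lbs", so this matches.
print(find_weight("The cat weighs almost exactly 7lbs."))  # ('cat', '7lbs')

# Hypothetical sample: 31 non-whitespace characters in the gap, so no match.
print(find_weight("bird" + "x" * 31 + "3lbs"))  # None
```

Only the keyword and the weight are captured; the gap sits in a non-capturing group.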
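Note: you appear to have a stray ? in your question in (cat|dog|bird?) - I have ignored it. Also, are you sure you will have \dlbs, not, say, 17 lbs or 17 pounds? You can address those scenarios by allowing several digits (\d+), optional whitespace and either unit, and by making the gap lazy ({0,30}?) so that the leading digits end up in the weight rather than in the gap:
(cat|dog|bird)\s*(?:\S\s*){0,30}?(\d+\s*(?:lb|pound)s)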
(?:\s*\S){0,30} Maximum 30 characters excluding whitespace, without capturing them:
(cat|dog|bird)(?:\s*\S){0,30}\s*(\dlbs)
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
Design an algorithm to determine how many numbers are in a string. Example, given the string "Hello people from the 4 worlds, this is my only 1 program", the output must be 2.
Basically you need to write a simple parser to parse out the numbers in your string. To do that you need to be able to recognise a number correctly, which is a little more complicated than just recognising digits. Something like "-12,348.971" is a number, but contains the characters -,. which are not digits. However, the string "-,." is not itself a number.
Read through the string, character by character. When the parser finds the start of a number, count one more number found, and read through all the characters that form that number. Read '123' as a single number, not three numbers. When you reach the end of the number, skip over non-number characters until you either find the next number or reach the end of the string.
You might want to read up on writing a simple parser in the language of your choice.
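The scan described above might look roughly like this in Python. It is a sketch under stated assumptions, not a full number grammar: the function name `count_numbers` is made up, a leading + or - counts only when a digit follows it directly, and ',' or '.' belongs to a number only when it sits between two digits (so "1.2.3" is treated as one number).

```python
DIGITS = "0123456789"

def count_numbers(text):
    """Count the numbers in text with a single left-to-right scan."""
    count = 0
    i, n = 0, len(text)
    while i < n:
        # Assumption: a sign is part of a number only if a digit follows it.
        if text[i] in "+-" and i + 1 < n and text[i + 1] in DIGITS:
            i += 1
        if text[i] in DIGITS:
            count += 1
            # Consume the whole number so that '123' is counted once.
            while i < n:
                if text[i] in DIGITS:
                    i += 1
                elif text[i] in ",." and i + 1 < n and text[i + 1] in DIGITS:
                    i += 1  # separator between two digits
                else:
                    break
        else:
            i += 1  # not part of a number, skip it
    return count

print(count_numbers("Hello people from the 4 worlds, this is my only 1 program"))  # 2
print(count_numbers("-12,348.971"))  # 1
print(count_numbers("-,."))          # 0
```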
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have a situation where I need to grab a large number from a string. The two cases I'm working with are:
1) when the number is made up of only numbers, like 265038960
2) When the number has a letter appended to it, like 69235M
I've been using the regex pattern
(\d.+)[A-Z]
This works for the second case and grabs '69235' without the 'M', but breaks on the first case where a letter is not found.
How can I use a condition within the regex to only parse out the number whether or not a letter is present at the end of the string?
(\d+[A-Z]?) # capture one or more digits, together with 0 or 1 uppercase letter
It's not clear whether you want to capture the letter or not. If you want to discard the letter:
(\d+)[A-Z]? # capture one or more digits, followed by 0 or 1 uppercase letter
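For instance, in Python (an assumption, since the question does not name a language) the second pattern handles both cases from the question. The helper name `extract_number` is made up for this sketch.

```python
import re

# Second pattern from the answer: the digits are captured, the letter is not.
NUMBER = re.compile(r"(\d+)[A-Z]?")

def extract_number(text):
    """Return the first run of digits in text, or None if there is none."""
    m = NUMBER.search(text)
    return m.group(1) if m else None

print(extract_number("265038960"))  # 265038960
print(extract_number("69235M"))     # 69235
```

Because the letter is optional and sits outside the group, it has no effect on what group 1 contains here.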
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
My problem is simple: I have a database containing 400,000 substrings (movies and tv shows titles).
I'd like to match these titles in a message such as:
I really love Game Of Thrones and Suits, also Spotlight is an awesome movie.
What I need is to match Game Of Thrones, Suits and Spotlight in this string.
I tried to send all titles to wit.ai but it seems that it can't handle 100,000 substrings.
I'm wondering if elasticsearch could do the job?
If this is a common problem, sorry; could you point me in the right direction?
Thanks!
One of the best algorithms for finding strings from a dictionary in a text is the Aho-Corasick algorithm:
dictionary-matching algorithm that locates elements of a finite set of
strings (the "dictionary") within an input text. It matches all
strings simultaneously. The complexity of the algorithm is linear in
the length of the strings plus the length of the searched text plus
the number of output matches.
But I am surprised that your database engine offers no facility for this kind of search. Perhaps it does and you are simply not aware of it?
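To make the idea concrete, here is a compact pure-Python sketch of Aho-Corasick. The function names `build_automaton` and `find_all` are made up for illustration. Matching is case-sensitive and works on raw substrings, so word boundaries and case folding would have to be handled separately.

```python
from collections import deque

def build_automaton(patterns):
    """Build the trie, failure links and per-state output lists."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns that end at each state
    for pat in patterns:
        if not pat:
            continue  # skip empty patterns
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    matches = []
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches

titles = ["Game Of Thrones", "Suits", "Spotlight"]
message = "I really love Game Of Thrones and Suits, also Spotlight is an awesome movie."
print([title for _, title in find_all(message, titles)])
# ['Game Of Thrones', 'Suits', 'Spotlight']
```

With a large title list the automaton would be built once and reused for every message, rather than rebuilt per call as in this sketch.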
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
In order to support users learning English, I want to make a multiple-choice quiz using the vocabulary that the user is studying.
For example, if the user is learning "angel" then I need an algorithm to produce some similar words such as "angle" and "angled"
Another example, if the user is learning "accountant" then I need an algorithm to produce some similar words such as "accounttant" and "acountant", "acounttant"
You could compute the Levenshtein distance from the starting word to each word in your vocabulary and pick the 2 or 3 words with the smallest distance.
Depending on how many words are in your dictionary this might take a long time though, so I would recommend bailing out after a certain (small) number of steps - i.e. if you have made 3 mutations and still haven't arrived at your target word then stop and move on to the next one.
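A rough Python sketch of that approach follows. The names `bounded_levenshtein` and `similar_words` and the default limits are assumptions for illustration. The bail-out is implemented by abandoning a candidate as soon as every entry in the current row of the distance table exceeds the limit, since the final distance can never be smaller than the row minimum.

```python
def bounded_levenshtein(a, b, max_dist):
    """Return the edit distance between a and b, or None if it exceeds max_dist."""
    if abs(len(a) - len(b)) > max_dist:
        return None  # the length difference alone is already too large
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        if min(cur) > max_dist:
            return None  # bail out early: this word is too far away
        prev = cur
    return prev[-1] if prev[-1] <= max_dist else None

def similar_words(word, vocabulary, max_dist=3, limit=3):
    """Return up to `limit` vocabulary words closest to `word`."""
    scored = []
    for cand in vocabulary:
        if cand == word:
            continue  # the word itself is not a useful distractor
        dist = bounded_levenshtein(word, cand, max_dist)
        if dist is not None:
            scored.append((dist, cand))
    scored.sort()
    return [cand for _, cand in scored[:limit]]

print(similar_words("angel", ["zebra", "angled", "angle", "angel"]))
# ['angle', 'angled']
```

Note that this only picks existing vocabulary words. Misspellings such as "acountant" would only appear if they were in the word list to begin with.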
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
What are the best string matching algorithms which can be used to search multiple patterns within a string?
For finding exact matches of a number of different strings, I favour the Aho-Corasick string matching algorithm, but there are several possible contenders, depending on what your patterns are. One starting point for seeing what is in practical use would be to look at the different variants of grep mentioned on Wikipedia or linked from there.