Regex to restrict to accept maximum 14 digits [closed] - ruby

Last time I asked how to check that a string contains a minimum of 8 digits, and I got the following regex:
/^(?=(.*\d){8,})[\d\(\)\s+-]{8,}$/
You can see the question here: Checking string with minimum 8 digits using regex
Now I want the same regex to also restrict the string to a maximum of 14 digits. I tried this:
/^(?=(.*\d){8,14})[\d\(\)\s+-]{8,}$/
No luck. Can anyone help me fix this?
UPDATE
After getting 2 downvotes I thought it better to write my own. I built on the previous regex, and the following works for me:
/^(?=(.*\d){8})(?!(.*\d){15})[\d\(\)\s+-]{8,}$/
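A quick way to check how the regex from the UPDATE behaves is to run it against a few inputs. This is only a minimal sketch: the constant name, the helper `digits_ok?` and the sample strings are invented for illustration. The first lookahead demands at least 8 digits somewhere in the string, and the negative lookahead rejects any string in which 15 digits can be found.

```ruby
# Regex from the UPDATE: at least 8 digits, never 15, allowed characters only.
DIGITS_8_TO_14 = /^(?=(.*\d){8})(?!(.*\d){15})[\d\(\)\s+-]{8,}$/

# Hypothetical helper, named here only for the example.
def digits_ok?(string)
  DIGITS_8_TO_14.match?(string)
end

puts digits_ok?("1234567")            # 7 digits  => false
puts digits_ok?("12345678")           # 8 digits  => true
puts digits_ok?("+1 (234) 567-8901")  # 11 digits => true
puts digits_ok?("12345678901234")     # 14 digits => true
puts digits_ok?("123456789012345")    # 15 digits => false
```

The earlier attempt with `{8,14}` inside a single lookahead fails because a lookahead only needs to find 8 digits to succeed; it says nothing about digits that come after them.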

By your request, the regex should be as simple as:
/^\d{8,14}$/

From your answer, and your other question, it seems like you are encoding a whole bunch of different rules into one increasingly complicated regex:
the string must be at least 12 chars long
it can only contain digits, parentheses, + and - signs, and spaces
there must be between 8 and 14 digits
While it's possible to do this with a regex, is it worthwhile? I'd argue that such a complicated regex is impossible to read, and therefore difficult to maintain.
If you split up the different criteria, it'll be much clearer.
string.length >= 12
string =~ /\A[\d()+\s-]+\z/ - note that inside a character class you don't need to escape the parentheses or the +, which also makes it much simpler. Keep the - last so it is read as a literal hyphen, and use \A and \z because ^ and $ match at line boundaries in Ruby.
(8..14).include?(string.count("0-9")) - check out the docs for String#count
So, altogether,
def valid?(string)
  string.length >= 12 &&
    string =~ /\A[\d()+\s-]+\z/ &&
    (8..14).include?(string.count("0-9"))
end
It's a bit longer but it's a heck of a lot more understandable.
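To see the separate checks at work, here is a small sketch run against a few made-up inputs. The method name `plausible_number?` and the sample strings are assumptions for the example only; the three conditions are the ones listed in this answer, written with `\A`/`\z` anchors and `match?` so the method returns a plain boolean.

```ruby
# Hypothetical helper combining the three checks from this answer.
def plausible_number?(string)
  string.length >= 12 &&                  # overall length
    string.match?(/\A[\d()+\s-]+\z/) &&   # allowed characters only
    (8..14).cover?(string.count("0-9"))   # String#count accepts a range like "0-9"
end

puts "+1 (234) 567-8901".count("0-9")         # => 11
puts plausible_number?("+1 (234) 567-8901")   # => true
puts plausible_number?("12345678")            # too short      => false
puts plausible_number?("1234 5678 9012 345")  # 15 digits      => false
puts plausible_number?("12345678901x")        # bad character  => false
```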

Try:
/^(?=(.*\d){8,14}(?!.*\d))[\d\(\)\s+-]{8,}$/
If I got the placement of the negative lookahead right, it should fail to match a string with more than 14 digits.

Try also adding the 14 after the second instance of "8," if the previous regex achieved what you wanted - but it really is more complex than just 8-14 digits!


Count Number of Sentence Ruby

I searched everywhere and did not manage to find a way to count the number of sentences in a String using Ruby. Does anyone know how to do it?
Example
string = "The best things in an artist’s work are so much a matter of intuition, that there is much to be said for the point of view that would altogether discourage intellectual inquiry into artistic phenomena on the part of the artist. Intuitions are shy things and apt to disappear if looked into too closely. And there is undoubtedly a danger that too much knowledge and training may supplant the natural intuitive feeling of a student, leaving only a cold knowledge of the means of expression in its place. For the artist, if he has the right stuff in him ... "
This string should return number 4.
You can split the text into sentences and count them. Here:
string.scan(/[^\.!?]+[\.!?]/).map(&:strip).count # scan has regex to split string and strip will remove trailing spaces.
# => 4
Explaining regex:
[^\.!?]
Caret inside of a character class [^ ] is the negation operator. Which means we are looking for characters which are not present in list: ., ! and ?.
+
is a greedy quantifier that matches the preceding element one or more times. (capturing our sentences here and ignoring repetitions like ...)
[\.!?]
matching characters ., ! or ?.
In a nutshell, we are capturing all characters that are not ., ! or ? till we get characters that are ., ! or ?. Which basically can be treated as a sentence (in broad senses).
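As a small illustration of the explanation above, here is the same pattern run on a short made-up string (the sample text is not from the question; its trailing "..." mirrors the one in the example). Inside a character class the dot needs no backslash, so the pattern is written without it here.

```ruby
# Made-up sample text with a trailing ellipsis, like the question's example.
text = "One. Two! Three? Four ... "

sentences = text.scan(/[^.!?]+[.!?]/).map(&:strip)
p sentences           # => ["One.", "Two!", "Three?", "Four ."]
puts sentences.count  # => 4
```

The last match stops at the first dot of the ellipsis; the two remaining dots have no non-punctuation characters in front of them, so they produce no further matches.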
I think it makes sense to consider a word char followed by ?, ! or . the delimiter of a sentence:
string.strip.split(/\w[?!.]/).length
#=> 4
So I'm not considering the ... a delimiter when it hangs on its own like that:
"I waited a while ... and then I went home"
But then again, maybe I should...
It also occurs to me that maybe a better delimiter is a punctuation followed by some space and a capital letter:
string.split(/[?!.]\s+[A-Z]/).length
#=> 4
Sentences end with full stops, question marks, and exclamation marks. They can also be
separated with dashes and other punctuation, but we won’t worry about these rare cases here.
The split is simple. Instead of asking Ruby to split the text on one type of character, you simply
ask it to split on any of three types of characters, like so:
txt = "The best things in an artist’s work are so much a matter of intuition, that there is much to be said for the point of view that would altogether discourage intellectual inquiry into artistic phenomena on the part of the artist. Intuitions are shy things and apt to disappear if looked into too closely. And there is undoubtedly a danger that too much knowledge and training may supplant the natural intuitive feeling of a student, leaving only a cold knowledge of the means of expression in its place. For the artist, if he has the right stuff in him ... "
sentence_count = txt.split(/\.|\?|!/).length
puts sentence_count
#=> 7
string.squeeze('.!?').count('.!?')
#=> 4
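For anyone unfamiliar with the two methods in that one-liner, here is a short sketch on a made-up string. `String#squeeze` with an argument collapses runs of the same character from the given set, and `String#count` then counts every remaining character from that set.

```ruby
# Made-up sample: an ellipsis plus a mixed "?!" ending.
text = "Wait ... what?! No."

squeezed = text.squeeze(".!?")
puts squeezed              # => Wait . what?! No.
puts squeezed.count(".!?") # => 4
```

Note that only runs of the same character are collapsed, so a mixed ending such as "?!" still counts twice, and a free-standing ellipsis counts as a sentence end.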

Converting PI digits into text strings

It's kind of interesting that pi's decimal representation never ends and never settles into a permanent repeating pattern. That makes it plausible (though, as far as I know, not proven) that pi contains every possible combination of numbers.
This guy calculated 5 trillion (5 x 10^12) digits of pi :D
http://www.numberworld.org/misc_runs/pi-5t/details.html
From the internet: "Converted into ASCII text, somewhere in that infinite string of digits is the name of every person you will ever love, the date, time and manner of your death, and the answers to all the great questions of the universe."
Wondering if somebody has already converted and analyzed the resulting string for known sequences of letters (words/sentences)?
Check out this page: http://pi.nersc.gov/.
It allows you to search for both character strings and hexadecimal sequences. Note that this search engine only has indexed the first 4 billion decimals of pi, and uses a formula for arbitrarily positioned binary or hexadecimal digits after those indexed.
The idea that Pi contains everything ever is a nice idea, but if it's correct, that means there is also an infinite amount of false things about everything ever. For example, if Pi contains a list of all the people you will ever love, then it will also contain lists that look like a list of the people you will love, but in reality are just a mix of names in a pattern that makes them look legit.
Following the same idea, the date, time, and manner of your death could also be "falsified". For example, let's say you are a man named Jason Delara, and you die at the age of 83 at 11:35 PM in your sleep. In Pi somewhere it can say in ASCII text "Jason Delara will die at age 83, 11:35 PM, passed in his sleep." It would also say somewhere else that "Jason Delara will die at age 35, at 6:00 AM, passed in a car accident." There could be an "infinite" amount of these false predictions.
There's also the fact that, following the idea from above, all but one of the answers to any one of the great questions of the universe in that number are wrong, even if many of the answers make sense. I've thought about this a lot, and I thought "What if there's a part of the number that states which facts are correct and which are not?" The answer is "Then there is an infinite amount of false lists in the number claiming to do the same as the real list." In short, it would be pointless to convert Pi to ASCII text to try and figure everything out.
I know I'm a little late to the party, but I wrote this for anybody who comes here looking for the answers to the universe in an endless, non-repeating decimal.
It is massively convenient that pi is an irrational number we're still finding digits for: if you can't find what you want in the sequence, then by definition it just happens to be later on.
As for it containing hidden information - if you create any random sequence long enough, you'll be able to create simple words from the resulting output.
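A rough sketch of that point: the snippet below turns pseudo-random digits (not digits of pi) into letters and counts how often a few short words appear. The encoding is an assumption made up for this example (each pair of decimal digits, taken modulo 26, becomes one lowercase letter, which is slightly biased towards the early letters), and the word list and seed are arbitrary.

```ruby
# Assumed encoding (not from the thread): each pair of decimal digits,
# taken modulo 26, becomes one lowercase letter.
def digits_to_letters(digits)
  digits.scan(/\d\d/).map { |pair| (pair.to_i % 26 + "a".ord).chr }.join
end

rng = Random.new(42)  # fixed seed so runs are repeatable
digits = Array.new(200_000) { rng.rand(10) }.join
letters = digits_to_letters(digits)

# Short words turn up by chance alone; longer ones get rare very quickly.
%w[pi cat love].each do |word|
  puts "#{word}: #{letters.scan(word).length} hits in #{letters.length} random letters"
end
```

With roughly uniform letters, a given three-letter word is expected about once every 26^3 = 17,576 positions, so finding it in a long enough random string says nothing about hidden meaning.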
Conspiracy theorists just love to see patterns where there are none. They forget the other noise and are endlessly fascinated by mere coincidences.
I would just like to provide further context to this question. Yes, the point is that PI goes on infinitely. That means there are endless possibilities for sentence structure and letter combination. This means every single combination of letters will happen and is happening in PI. So technically, everything in PI could apply to everything in the observable world around us.

Is there a way to check if two regexps can match the same string? [duplicate]

This question already has answers here:
Regex: Determine if two regular expressions could match for the same input?
I have two regexps. I need to determine if it is possible to build string of given length that matches these two regexps simultaneously. I need algorithm to do that.
String's length wouldn't exceed 20 characters.
It depends. For Perl-compatible regular expressions (PCRE), this is not generally possible, as they are Turing complete: you cannot even be sure that matching always terminates.
For the original, "clean" form of regular languages as defined in the Chomsky hierarchy, it is known that they are closed under intersection; this is already discussed in this thread.
As soon as you have the NFA for the intersection, it is easy to check whether any string matches it: if there is a path from the start state to an accepting state of your NFA, then the string spelled out along this path is the string you are searching for. For DFAs, an algorithm is given here; it should be simple to adapt it to NFAs.
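The intersection idea can be sketched as a product construction walked one symbol at a time. This is a minimal sketch under stated assumptions: both regexps have already been converted to DFAs (that conversion is not shown), and the hash layout (`:start`, `:accept`, `:delta`), the method name and the two toy automata are invented for this example. Because the question asks for a string of a given length, the walk proceeds layer by layer and keeps one witness string per reachable pair of states.

```ruby
# A DFA here is a hypothetical hash: :start state, :accept (array of states),
# and :delta mapping [state, symbol] to the next state (missing entry = reject).
def common_string_of_length(dfa_a, dfa_b, alphabet, length)
  # layer maps each reachable product state to one witness string of the current length
  layer = { [dfa_a[:start], dfa_b[:start]] => "" }
  length.times do
    next_layer = {}
    layer.each do |(state_a, state_b), word|
      alphabet.each do |symbol|
        next_a = dfa_a[:delta][[state_a, symbol]]
        next_b = dfa_b[:delta][[state_b, symbol]]
        next if next_a.nil? || next_b.nil?
        next_layer[[next_a, next_b]] ||= word + symbol
      end
    end
    layer = next_layer
  end
  hit = layer.find { |(a, b), _| dfa_a[:accept].include?(a) && dfa_b[:accept].include?(b) }
  hit && hit[1]
end

# Toy automaton for a*b over the alphabet {a, b}.
a_star_b = { start: 0, accept: [1], delta: { [0, "a"] => 0, [0, "b"] => 1 } }
# Toy automaton for strings of even length.
even_length = {
  start: :even, accept: [:even],
  delta: { [:even, "a"] => :odd, [:even, "b"] => :odd,
           [:odd, "a"] => :even, [:odd, "b"] => :even }
}

p common_string_of_length(a_star_b, even_length, %w[a b], 4)  # => "aaab"
p common_string_of_length(a_star_b, even_length, %w[a b], 3)  # => nil
```

Each layer holds at most one entry per pair of states, so for lengths up to 20 the work is bounded by 20 times the product of the two state counts times the alphabet size.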

Generating easy-to-remember random identifiers

As all developers do, we constantly deal with some kind of identifiers as part of our daily work. Most of the time, it's about bugs or support tickets. Our software, upon detecting a bug, creates a package that has a name formatted from a timestamp and a version number, which is a cheap way of creating reasonably unique identifiers to avoid mixing packages up. Example: "Bug Report 20101214 174856 6.4b2".
My brain just isn't that good at remembering numbers. What I would love to have is a simple way of generating alpha-numeric identifiers that are easy to remember.
It takes about 5 minutes to whip up an algorithm like the following in python, which produces halfway usable results:
import random
vowels = 'aeiuy'  # 'o' is left out because it is easily confused with 0
consonants = 'bcdfghjklmnpqrstvwxz'
numbers = '0123456789'
random.seed()
for i in range(30):
    chars = list()
    chars.append(random.choice(consonants))
    chars.append(random.choice(vowels))
    chars.append(random.choice(consonants + numbers))
    chars.append(random.choice(vowels))
    chars.append(random.choice(vowels))
    chars.append(random.choice(consonants))
    print(''.join(chars))
The results look like this:
re1ean
meseux
le1ayl
kuteef
neluaq
tyliyd
ki5ias
This is already quite good, but I feel it is still easy to forget how they are spelled exactly, so that if you walk over to a colleague's desk and want to look one of those up, there's still potential for difficulty.
I know of algorithms that perform trigram analysis on text (say you feed them a whole book in German) and that can generate strings that look and feel like German words and are thus easier to handle generally. This requires lots of data, though, and makes it slightly less suitable for embedding in an application just for this purpose.
Do you know of any published algorithms that solve this problem?
Thanks!
Carl
As you said, your sample is quite good. But if you want random identifiers that can easily be remembered, then you should not mix letters and digits. Instead, you could opt to postfix an alphabetic string with a couple of digits.
Also, in your sample you wisely excluded 'o', but forgot about the 'l', which you can easily confuse with '1'. I suggest you remove the 'l' as well. ;-)
I am not sure that this answers your question, but maybe think about how many unique bug report numbers you need.
Simply using a four-character uppercase alphanumeric key like "BX-3D", you can have 36^4 = 1,679,616 (about 1.7 million) bug reports.
Edit: I just saw your sample. Maybe the results could be considerably improved if you used syllables instead of consonants and vowels.
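The suggestions in these answers (build from syllables, keep letters and digits apart, drop the look-alike characters) could be combined roughly as follows. This is only a sketch in Ruby rather than the question's Python: the syllable list, the method name `memorable_id` and the "three syllables plus two digits" format are all assumptions chosen for illustration, not a published algorithm.

```ruby
# Hypothetical syllable list: consonant + vowel pairs, leaving out "l" and "o"
# (easily confused with 1 and 0), as suggested in the answers above.
CONSONANTS = %w[b d f g k m n p r s t v z]
VOWELS = %w[a e i u]
SYLLABLES = CONSONANTS.product(VOWELS).map(&:join)

# Builds an identifier such as "<syllables>-<digits>"; letters and digits are kept apart.
def memorable_id(rng = Random.new, syllables: 3, digits: 2)
  word = Array.new(syllables) { SYLLABLES[rng.rand(SYLLABLES.size)] }.join
  number = Array.new(digits) { rng.rand(10) }.join
  "#{word}-#{number}"
end

puts memorable_id
puts "#{SYLLABLES.size**3 * 10**2} possible identifiers"  # 52^3 * 100 = 14060800
```

Random identifiers can collide, so an application would still need to check a new identifier against the ones already issued.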

How to elegantly compute the anagram signature of a word in ruby?

Arising out of this question, I'm looking for an elegant (ruby) way to compute the word signature suggested in this answer.
The idea suggested is to sort the letters in the word, and also run length encode repeated letters. So, for example "mississippi" first becomes "iiiimppssss", and then could be further shortened by encoding as "4impp4s".
I'm relatively new to ruby and though I could hack something together, I'm sure this is a one liner for somebody with more experience of ruby. I'd be interested to see people's approaches and improve my ruby knowledge.
edit: to clarify, performance of computing the signature doesn't much matter for my application. I'm looking to compute the signature so I can store it with each word in a large database of words (450K words), then query for words which have the same signature (i.e. all anagrams of a given word, that are actual english words). Hence the focus on space. The 'elegant' part is just to satisfy my curiosity.
The fastest way to create a sorted list of the letters is this:
"mississippi".unpack("c*").sort.pack("c*")
It is quite a bit faster than split('') and join(). For comparison it is also best to pack the array back together into a String, so you don't have to compare arrays.
I'm not much of a Ruby person either, but as I noted on the other comment this seems to work for the algorithm described.
s = "mississippi"
s.split('').sort.join.gsub(/(.)\1{2,}/) { |run| run.length.to_s + run[0,1] }
Of course, you'll want to make sure the word is lowercase, doesn't contain numbers, etc.
As requested, I'll try to explain the code. Please forgive me if I don't get all of the Ruby or reg ex terminology correct, but here goes.
I think the split/sort/join part is pretty straightforward. The interesting part for me starts at the call to gsub. This will replace a substring that matches the regular expression with the return value from the block that follows it. The reg ex finds any character and creates a backreference. That's the "(.)" part. Then, we continue the matching process using the backreference "\1" that evaluates to whatever character was found by the first part of the match. We want that character to be found a minimum of two more times for a total minimum number of occurrences of three. This is done using the quantifier "{2,}".
If a match is found, the matching substring is then passed to the next block of code as an argument thanks to the "|s|" part. Finally, we use the string equivalent of the matching substring's length and append to it whatever character makes up that substring (they should all be the same) and return the concatenated value. The returned value replaces the original matching substring. The whole process continues until nothing is left to match since it's a global substitution on the original string.
I apologize if that's confusing. As is often the case, it's easier for me to visualize the solution than to explain it clearly.
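The steps explained above can be wrapped in a small method. This is a sketch rather than a definitive implementation: the method name `anagram_signature` is made up, the `downcase` call is an added assumption, and `chars` is used in place of `split('')`.

```ruby
# Sort the letters, then replace any run of three or more identical letters
# with "<count><letter>". Runs of one or two letters are left as they are.
def anagram_signature(word)
  word.downcase.chars.sort.join.gsub(/(.)\1{2,}/) { |run| "#{run.length}#{run[0]}" }
end

puts anagram_signature("mississippi")                            # => 4impp4s
puts anagram_signature("listen") == anagram_signature("silent")  # => true
```

Two words are anagrams of each other exactly when their signatures are equal, which is what makes the signature usable as a lookup key.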
I don't see an elegant solution. You could use the split message to get the characters into an array, but then once you've sorted the list I don't see a nice linear-time concatenate primitive to get back to a string. I'm surprised.
Incidentally, run-length encoding is almost certainly a waste of time. I'd have to see some very impressive measurements before I'd think it worth considering. If you avoid run-length encoding, you can anagrammatize any string, not just a string of letters. And if you know you have only letters and are trying to save space, you can pack them 5 bits to a letter.
---Irma Vep
EDIT: the other poster found join which I missed. Nice.
