Password validation - validation

I need to validate a password with the following requirements:
1. Be at least seven characters long
2. Contain at least one letter (a-z or A-Z)
3. Contain at least one number (0-9)
4. Contain at least one symbol (#, $, %, etc)
Can anyone give me the correct expression?

/.{7,}/
/[a-zA-Z]/
/[0-9]/
/[-!@#$%^...]/

For a single regex, the most straightforward way to check all of the requirements would be with lookaheads:
/(?=.*[a-zA-Z])(?=.*\d)(?=.*[^a-zA-Z0-9\s]).{7,}/
Breaking it down:
.{7,} - at least seven characters
(?=.*[a-zA-Z]) - a letter must occur somewhere after the start of the string
(?=.*\d) - same as the letter lookahead, except for a digit
(?=.*[^a-zA-Z0-9\s]) - same again, except for something that is not a letter, digit, or whitespace
However, you might choose to simply use multiple separate regex matches to keep things even more readable. Chances are you aren't validating a ton of passwords at once, so performance isn't really a concern.
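The separate-checks approach could be sketched in Ruby roughly as follows. The `PASSWORD_RULES` table and the `password_errors` helper are illustrative names, not part of the original answer, and the symbol rule reuses the answer's "not a letter, digit, or whitespace" definition. The anchored single-regex variant is shown for comparison; the `\A` and `\z` anchors are an addition.

```ruby
# Hypothetical rule table: each message maps to the pattern that must match.
PASSWORD_RULES = {
  "must be at least seven characters long" => /.{7,}/m,
  "must contain a letter"                  => /[a-zA-Z]/,
  "must contain a digit"                   => /[0-9]/,
  "must contain a symbol"                  => /[^a-zA-Z0-9\s]/
}.freeze

# Returns the messages for every rule the password fails (empty when valid).
def password_errors(password)
  PASSWORD_RULES.reject { |_message, pattern| password.match?(pattern) }.keys
end

# The single-regex version, anchored so the lookaheads always start at the
# beginning of the string (the anchors are an assumption, not in the answer).
SINGLE_REGEX = /\A(?=.*[a-zA-Z])(?=.*\d)(?=.*[^a-zA-Z0-9\s]).{7,}\z/

p password_errors("abc123!")      # => []
p password_errors("abcdefg")      # => ["must contain a digit", "must contain a symbol"]
p "abc123!".match?(SINGLE_REGEX)  # => true
```

One advantage of the rule table is that it can tell the user which requirement failed, which a single combined regex cannot do.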

Related

How can I mask everything but the last four characters of a credit card number (PAN) with "#" symbols? [duplicate]

I have a credit card number like 1234567891234 and I want to show only the last 4 characters of this string, like #########1234. How can I do this?
string.gsub(/.(?=....)/, '*')
=> "*********1234"
gsub without the ! does not mutate the original object that string points to, and it can take regex arguments.
. matches any character that is not a line break, and ?= is a positive lookahead, so any character that is followed by four more non-line-break characters will be replaced with the second gsub parameter, which is *.
string.gsub(/\d(?=[0-9]{4})/, '*')
=> "*********1234"
produces the same output, looking for digits with \d and doing a positive lookahead with [0-9]{4} which matches for four characters between zero and nine.
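Since the question asks for "#" as the masking character, the same lookahead idea could be wrapped in a small helper. This is only a sketch; `mask_all_but_last_four` and its `mask_char` parameter are hypothetical names.

```ruby
# Replace every character that still has at least four characters after it.
def mask_all_but_last_four(value, mask_char = "#")
  # The block form avoids gsub treating backslashes in mask_char specially.
  value.to_s.gsub(/.(?=.{4})/) { mask_char }
end

puts mask_all_but_last_four("1234567891234")  # => #########1234
puts mask_all_but_last_four("123")            # => 123 (nothing to mask)
```

Strings of four characters or fewer come back unchanged, because no character in them has four characters after it.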
Masking All Digits Except Last Four
If you're just trying to mask the credit card number, there are a number of ways to do that. However, what makes it potentially tricky is that credit card numbers can have anywhere from 13-19 digits, although 16 is certainly the most common.
One of the easiest ways to work around this expected variation is to use String#slice! to save the last four digits, and then String#tr to convert the remainder of the digits to your masking character. For example:
def mask_credit_card(card_number)
  # Keep only the digits, whatever the input type or formatting.
  credit_card_number = String(card_number).scan(/\d/).join
  # Remove and save the last four digits.
  last_four_digits = credit_card_number.slice!(-4..-1)
  # Mask the remaining digits, then re-append the saved ones.
  credit_card_number.tr("0-9", "#") << last_four_digits
end

# Test against various lengths & formats.
[
  "1234567890123456",
  "1234-5678-9012-3456",
  1234567890123456,
  "1234-567890-12345",
].map { |card_number| mask_credit_card card_number }
#=> ["############3456", "############3456", "############3456", "###########2345"]
Caveats & Considerations
Some cards like Diners Club can start with a zero, making the card number unsuitable for processing as an Integer. Treating the card number as a String can be more reliable, but forces you to think about how you'll handle unexpected characters or spacing.
Extracting digits with #scan is safer than using #delete when invoking the mask on unsanitized input. For example, String(card_number).delete "-\s\t" would normalize the example data above, but might not catch other unexpected characters. Never trust user input!
If you want to preserve spacing, dashes, and so forth in your masking, you run the risk that a malformed string (e.g. "1234-5678-9012-34 5-6") will yield unexpected results like " 5-6" as the last four digits. It's usually better to normalize your inputs, and apply your chosen formatting to your outputs (e.g. with printf or sprintf) instead. Of course, your specific use case may vary.
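As a rough illustration of "normalize the input, then format the output", the sketch below strips everything but digits, masks, and only then regroups the result for display. The helper name `masked_and_grouped` and the four-character grouping are assumptions made for the example, not something the answer prescribes.

```ruby
# Normalize to digits, mask all but the last four, then group for display.
def masked_and_grouped(card_number, separator = " ")
  digits = card_number.to_s.scan(/\d/).join   # drop dashes, spaces, etc.
  masked = digits.gsub(/\d(?=\d{4})/, "#")    # mask all but the last four
  masked.scan(/.{1,4}/).join(separator)       # regroup in blocks of four
end

puts masked_and_grouped("1234-5678-9012-34 5-6")  # => #### #### #### 3456
```

Because the formatting is applied after masking, the malformed spacing in the input never influences which digits are shown.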

Need help understanding why this string in grep pulls IP addresses rather than this other string

The following statement is from a homework question that I tested and answered, but I don't understand why this line behaves the way it does. I realize why this expression is flawed for finding an IP address, but I don't fully understand its behavior, since the question mark doesn't seem to mean "zero or one times" like it's supposed to.
"user@machine:~$ grep -E '[01]?[0-9][0-9]?' "
To my understanding, "[01]?" should look for a digit 0-1, as indicated by the brackets, while the question mark tells grep to look for zero or one instance only, and similarly with "[0-9]?". The thing is, this line will print an unlimited number of digits, far exceeding 3 digits. I ruled out that it was due to the bracket that isn't followed by a question mark, since it would still print an unlimited number of digits if I piped an echo or used a test .txt file full of numbers.
This example then made me wonder how to find IPs with grep the correct way. I found countless examples like the following expression for IPv4 octets:
\.(25[0-5]\|2[0-4][0-9]\|[01][0-9][0-9]\|[0-9][0-9]).\
Is this telling me to look for any number 2-5 anywhere from 0-5 times? 0-5 is too many digits for an octet. Is it telling me to look for any number 0-5 up to 25 times? Again that's way too many digits for an octet. What does \2[0-4][0-9]\ mean in this case? I'm confused about how this expression finds numbers strictly between 1-255?
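Look at it this way: x?[0-9]x? matches anything that contains a digit, because both x's are optional. You might as well leave them out, because they do not constrain the match at all. grep prints every line containing a match, so a line with a long run of digits is printed in full even though the match itself is at most three characters long.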
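The same effect can be reproduced outside grep. The Ruby sketch below uses the pattern from the question (the constant names are made up for the example) to show that an unanchored match succeeds on any string containing a digit, while the match itself covers at most three characters.

```ruby
PATTERN = /[01]?[0-9][0-9]?/

p "1234567890".match?(PATTERN)   # => true  (the string merely contains a match)
p "1234567890"[PATTERN]          # => "123" (the match itself is short)
p "12345".scan(PATTERN)          # => ["123", "45"]

# Anchoring forces the pattern to account for the whole string.
ANCHORED = /\A[01]?[0-9][0-9]?\z/
p "1234567890".match?(ANCHORED)  # => false
p "199".match?(ANCHORED)         # => true
```

With grep itself, the -o option prints only the matched parts and -x requires the whole line to match, which makes the same distinction visible on the command line.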
25[0-5] looks for 25 followed by a digit in the range 0-5. In other words, the expression matches a number in the range 250-255.
The full expression in your example looks for a number in the range 00-255 by enumerating strings beginning with 25, 20-24, etc; though it's incomplete in that it doesn't permit single-digit numbers.
The expression matches a single octet (incompletely), not an entire IP address. Here is a common way to match an IPv4 address:
(0|[3-9][0-9]?|2([0-4][0-9]?|5[0-5]?|[6-9])?|1([0-9][0-9]?)?)(\.(0|[3-9][0-9]?|2([0-4][0-9]?|5[0-5]?|[6-9])?|1([0-9][0-9]?)?)){3}
where the square brackets express character classes which match a single character out of a set, and the final curly braces {3} express a repetition.
Some regex dialects (e.g. POSIX grep) require backslashes before |, ( and ), but I have used the extended notation (a la grep -E and most online regex exploration tools), which doesn't want backslashes.
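For comparison, here is a sketch of a more conventional way to write the octet alternation, shown in Ruby so it can be tested quickly. `OCTET` and `IPV4` are hypothetical constant names; the pattern assumes the whole string should be an address and that leading zeros are not allowed.

```ruby
# One octet, 0-255, without leading zeros.
OCTET = /25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]/
# Four octets separated by dots; \A and \z anchor the whole string.
IPV4 = /\A(?:#{OCTET})(?:\.(?:#{OCTET})){3}\z/

%w[192.168.0.1 10.0.0.255 256.1.1.1 1.2.3 01.2.3.4].each do |candidate|
  puts "#{candidate}: #{candidate.match?(IPV4)}"
end
# 192.168.0.1: true
# 10.0.0.255: true
# 256.1.1.1: false
# 1.2.3: false
# 01.2.3.4: false
```

Each alternative covers one range (250-255, 200-249, 100-199, 0-99), which is the same enumeration idea as in the expression from the question.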

Count Number of Sentence Ruby

I searched around everywhere and did not manage to find a solution for counting the number of sentences in a String using Ruby. Does anyone know how to do it?
Example
string = "The best things in an artist’s work are so much a matter of intuition, that there is much to be said for the point of view that would altogether discourage intellectual inquiry into artistic phenomena on the part of the artist. Intuitions are shy things and apt to disappear if looked into too closely. And there is undoubtedly a danger that too much knowledge and training may supplant the natural intuitive feeling of a student, leaving only a cold knowledge of the means of expression in its place. For the artist, if he has the right stuff in him ... "
This string should return number 4.
You can split the text into sentences and count them. Here:
string.scan(/[^\.!?]+[\.!?]/).map(&:strip).count # scan has regex to split string and strip will remove trailing spaces.
# => 4
Explaining regex:
[^\.!?]
Caret inside of a character class [^ ] is the negation operator. Which means we are looking for characters which are not present in list: ., ! and ?.
+
is a greedy quantifier that matches the preceding element one or more times, as many times as possible. (This captures our sentences and ignores repetitions like ...)
[\.!?]
matching characters ., ! or ?.
In a nutshell, we capture all characters that are not ., ! or ? until we reach a character that is ., ! or ?. Each such run can be treated as a sentence (in a broad sense).
I think it makes sense to consider a word char followed by a ?! or . the delimiter of a sentence:
string.strip.split(/\w[?!.]/).length
#=> 4
So I'm not considering the ... a delimiter when it hangs on its own like that:
"I waited a while ... and then I went home"
But then again, maybe I should...
It also occurs to me that maybe a better delimiter is a punctuation followed by some space and a capital letter:
string.split(/[?!.]\s+[A-Z]/).length
#=> 4
Sentences end with full stops, question marks, and exclamation marks. They can also be separated with dashes and other punctuation, but we won’t worry about these rare cases here.
The split is simple. Instead of asking Ruby to split the text on one type of character, you simply ask it to split on any of three types of characters, like so:
txt = "The best things in an artist’s work are so much a matter of intuition, that there is much to be said for the point of view that would altogether discourage intellectual inquiry into artistic phenomena on the part of the artist. Intuitions are shy things and apt to disappear if looked into too closely. And there is undoubtedly a danger that too much knowledge and training may supplant the natural intuitive feeling of a student, leaving only a cold knowledge of the means of expression in its place. For the artist, if he has the right stuff in him ... "
sentence_count = txt.split(/\.|\?|!/).length
puts sentence_count
#=> 7
string.squeeze('.!?').count('.!?')
#=> 4
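To see how the approaches above differ, they can be run side by side on a string with a mid-sentence ellipsis. The sample text below is made up for the comparison (it extends the "I waited a while" example) and contains three sentences.

```ruby
sample = "I waited a while ... and then I went home. Did you? Yes!"

counts = {
  "scan for terminated runs"        => sample.scan(/[^.!?]+[.!?]/).count,          # 4
  "split on word char + mark"       => sample.strip.split(/\w[?!.]/).length,       # 3
  "split on mark + space + capital" => sample.split(/[?!.]\s+[A-Z]/).length,       # 3
  "split on any mark"               => sample.split(/\.|\?|!/).length,             # 6
  "squeeze, then count marks"       => sample.squeeze(".!?").count(".!?")          # 4
}

counts.each { |name, count| puts "#{name}: #{count}" }
```

Only the two variants that refuse to treat a free-standing ... as a delimiter report three here; which behavior is preferable depends on the text being processed.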

Regex for phone number validation

I have to find a phone number in a string, subject to these conditions:
It starts with 0
It has 10 or 11 digits (0-9)
It has at most 2 "-" characters (not at the start or end)
Example: 01234567890, 01-234567890, 03-1234-12345.
This is my regex, but it does not work:
/\d+{10,11}|(\d+\-\d+){11,12}|(\d+\-\d+\-\d+){12,13}/
It is a bit tricky. First, your regexp kind of has the right idea. Given that the length changes with the number of dashes, we need to check each case separately. (There might be a better way, but I can't think of one.) However, (\d+-\d+){11,12} does not mean "length being 11-12", but "11-12 repetitions of \d+-\d+", giving you way more than 11-12 characters. Even if it were correct, because of the order of the disjunction, you would not be able to match 0123456789-1, because 10 digits would be found first, and ten digits followed by a dash and another digit would not even be checked.
If you were trying to validate the whole string, it would have been easier, as you can use anchors ^ and $ to find the end. Without it, it is a little trickier:
(?=[\d-]{12,13}(?![\d-]))0\d+-\d+-\d+(?![\d-])|(?=[\d-]{11,12}(?!-|\d))0\d+-\d+(?![\d-])|\d{10,11}
The first part, (?=[\d-]{12,13}(?![\d-]))0\d+-\d+-\d+(?![\d-]), checks for the two-dash pattern. (?=[\d-]{12,13}(?![\d-])) checks whether you have 12-13 digit-or-dash characters (10-11 digits plus two dashes) after which you have neither a digit nor a dash. After making sure there is such a region, we make sure there are exactly two dashes in between digits (and that the whole thing is, again, not followed by a digit or dash; this anchor synchronises the condition in our lookahead and in the main pattern).
The second part, (?=[\d-]{11,12}(?!-|\d))0\d+-\d+(?![\d-]), is analogous, checking for one-dash matches (10-11 digits plus one dash). The third part, \d{10,11}, is trivially simple, and finds no-dash matches.
All of this is under the assumption that sawa's needling is on-point: that 0123456789- is not a match. If it is, you will need to change some plusses into stars.
EDIT2: Thought of a better way.
(?=(?:\d-?){10,11}(?![\d-]))\d+(-\d+){0,2}(?![\d-])
Make sure there are 10-11 digits, and make sure there are 0-2 dashes. The anchor idea is the same as in the previous one.
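A quick way to try this idea in Ruby is sketched below. It is a variation, not the answer's exact pattern: the leading 0 and the (?<![\d-]) lookbehind are additions (assumptions drawn from the "start with 0" requirement), the group is made non-capturing so that scan returns whole matches, and `PHONE` and `find_phone_numbers` are hypothetical names.

```ruby
PHONE = /
  (?<![\d-])                      # not preceded by a digit or dash (addition)
  (?=(?:\d-?){10,11}(?![\d-]))    # 10-11 digits ahead, each optionally followed by a dash
  0\d*(?:-\d+){0,2}               # leading 0 (addition), then at most two dash-separated groups
  (?![\d-])                       # not followed by a digit or dash
/x

# Returns every phone-number-like substring found in the text.
def find_phone_numbers(text)
  text.scan(PHONE)
end

p find_phone_numbers("call 03-1234-12345 or 01234567890 today")
# => ["03-1234-12345", "01234567890"]
p find_phone_numbers("0123456789-")     # => [] (trailing dash)
p find_phone_numbers("0123-45-67-890")  # => [] (three dashes)
```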

Counting words from a mixed-language document

Given a set of lines containing Chinese characters, Latin-alphabet-based words or a mixture of both, I wanted to obtain the word count.
To wit:
this is just an example
这只是个例子
should give 10 words ideally; but of course, without access to a dictionary, 例子 would best be treated as two separate characters. Therefore, a count of 11 words/characters would also be an acceptable result here.
Obviously, wc -w is not going to work. It considers the 6 Chinese characters / 5 words as 1 "word", and returns a total of 6.
How do I proceed? I am open to trying different languages, though bash and python will be the quickest for me right now.
You should split the text on Unicode word boundaries, then count the elements which contain letters or ideographs. If you're working with Python, you could use the uniseg or nltk packages, for example. Another approach is to simply use Unicode-aware regexes but these will only break on simple word boundaries. Also see the question Split unicode string on word boundaries.
Note that you'll need a more complex dictionary-based solution for some languages. UAX #29 states:
For Thai, Lao, Khmer, Myanmar, and other scripts that do not typically use spaces between words, a good implementation should not depend on the default word boundary specification. It should use a more sophisticated mechanism, as is also required for line breaking. Ideographic scripts such as Japanese and Chinese are even more complex. Where Hangul text is written without spaces, the same applies. However, in the absence of a more sophisticated mechanism, the rules specified in this annex supply a well-defined default.
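As a minimal sketch of the simpler "Unicode-aware regex" route (not the dictionary-based segmentation that the annex recommends), the following Ruby snippet counts each Han ideograph as one unit and each whitespace-delimited run of other characters as one word. `rough_word_count` is a hypothetical helper name.

```ruby
# Count every Han character individually, plus every run of
# non-Han, non-whitespace characters as a single word.
def rough_word_count(text)
  text.scan(/\p{Han}|[^\p{Han}\s]+/).size
end

puts rough_word_count("this is just an example\n这只是个例子")  # => 11
puts rough_word_count("这是test")                               # => 3
```

This gives the "acceptable" count of 11 for the example in the question and handles mixed text without a separating space. It also counts stand-alone punctuation (for instance a lone dash or a Chinese full stop) as a word, so real input may need an extra filtering step.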
I thought about a quick hack since Chinese characters are 3 bytes long in UTF8:
(pseudocode)
for each character:
    if character (byte) begins with 1:
        add 1 to total chinese chars
    if it is a space:
        add 1 to total "normal" words
    if it is a newline:
        break
Then take total chinese chars / 3 + total words to get the sum for each line. This will give an erroneous count for the case of mixed languages, but should be a good start.
这是test
However, the above sentence will give a total of 2 (1 for each of the Chinese characters.) A space between the two languages would be needed to give the correct count.

Resources