How can I mask everything but the last four characters of a credit card number (PAN) with "#" symbols? [duplicate] - ruby

This question already has answers here:
How to mask all but last four characters in a string
I have a credit card number like 1234567891234 and I want to show only the last 4 characters of this string, like #########1234. How can I do this?

string.gsub(/.(?=....)/, '*')
=> "*********1234"
gsub (without the !) does not mutate the object that string points to, and it accepts a regex as its pattern argument.
. matches any character except a line break, and (?=...) is a positive lookahead, so every character that is followed by at least four more non-line-break characters is replaced with the second gsub argument, '*'.
string.gsub(/\d(?=[0-9]{4})/, '*')
=> "*********1234"
produces the same output: \d matches a digit, and the positive lookahead (?=[0-9]{4}) requires the next four characters to be digits from 0 to 9.
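The two one-liners above agree on a bare run of digits but diverge once the number contains separators. Below is a minimal sketch comparing them, using "#" as the question asks. The method names mask_any_char and mask_digits_only are hypothetical, and the lookaheads are written as .{4} and \d{4}, which are equivalent to .... and [0-9]{4} (Ruby's \d matches ASCII digits only).

```ruby
# Masks every character that has at least four characters after it.
def mask_any_char(str, mask = "#")
  str.gsub(/.(?=.{4})/, mask)
end

# Masks only digits that are immediately followed by four more digits.
def mask_digits_only(str, mask = "#")
  str.gsub(/\d(?=\d{4})/, mask)
end

puts mask_any_char("1234567891234")          # #########1234
puts mask_digits_only("1234567891234")       # #########1234
puts mask_any_char("1234-5678-9012-3456")    # ###############3456 (dashes masked too)
puts mask_digits_only("1234-5678-9012-3456") # 1234-5678-9012-3456 (nothing masked)
```

The second pattern masks nothing on the dashed input because no digit there is followed by four consecutive digits; normalizing the input first, as the next answer does, avoids that.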

Masking All Digits Except Last Four
If you're just trying to mask the credit card number, there are a number of ways to do that. However, what makes it potentially tricky is that credit card numbers can have anywhere from 13 to 19 digits, although 16 is certainly the most common.
One of the easiest ways to work around this expected variation is to use String#slice! to save the last four digits, and then String#tr to convert the remainder of the digits to your masking character. For example:
def mask_credit_card(card_number)
  credit_card_number = String(card_number).scan(/\d/).join
  last_four_digits = credit_card_number.slice!(-4..-1)
  credit_card_number.tr("0-9", "#") << last_four_digits
end
# Test against various lengths & formats.
[
"1234567890123456",
"1234-5678-9012-3456",
1234567890123456,
"1234-567890-12345",
].map { |card_number| mask_credit_card card_number }
#=> ["############3456", "############3456", "############3456", "###########2345"]
Caveats & Considerations
Some cards like Diners Club can start with a zero, making the card number unsuitable for processing as an Integer. Treating the card number as a String can be more reliable, but forces you to think about how you'll handle unexpected characters or spacing.
Extracting digits with #scan is safer than using #delete when invoking the mask on unsanitized input. For example, String(card_number).delete "-\s\t" would normalize the example data above, but might not catch other unexpected characters. Never trust user input!
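A small illustration of the two points above, using hypothetical inputs (a number with a leading zero, and one that uses dots as separators). Converting through an Integer drops the leading zero, and delete only strips the characters you anticipated, while scan(/\d/) keeps exactly the digits.

```ruby
# A leading zero does not survive a round trip through Integer.
leading_zero = "0123456789012345"
puts leading_zero.to_i.to_s          # 123456789012345 (15 digits, zero gone)

# Unexpected separators: delete only removes what it was told to.
dotted = "1234.5678.9012.3456"
puts dotted.delete("-\s\t")          # 1234.5678.9012.3456 (dots survive)
puts dotted.scan(/\d/).join          # 1234567890123456
```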
If you want to preserve spacing, dashes, and so forth in your masking, you run the risk that a malformed string (e.g. "1234-5678-9012-34 5-6") will yield unexpected results like " 5-6" as the last four digits. It's usually better to normalize your inputs, and apply your chosen formatting to your outputs (e.g. with printf or sprintf) instead. Of course, your specific use case may vary.
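A minimal sketch of "normalize the input, format the output": mask the digits-only string first, then group the result for display. The mask_credit_card method from above is repeated so the block runs on its own; display_masked is a hypothetical helper, and grouping in fours with scan/join is an assumption that suits 16-digit numbers (a 15-digit card might want a different grouping, and sprintf/format works equally well when the length is fixed).

```ruby
def mask_credit_card(card_number)
  credit_card_number = String(card_number).scan(/\d/).join
  last_four_digits = credit_card_number.slice!(-4..-1)
  credit_card_number.tr("0-9", "#") << last_four_digits
end

# Hypothetical display helper: groups the masked string in blocks of four.
def display_masked(card_number, separator = " ")
  mask_credit_card(card_number).scan(/.{1,4}/).join(separator)
end

# The malformed string from above still yields the right last four digits.
puts display_masked("1234-5678-9012-34 5-6")    # #### #### #### 3456
puts display_masked(1234567890123456, "-")      # ####-####-####-3456
```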

Related

Need help understanding why this string in grep pulls IP addresses rather than this other string

The following statement is from a homework question which I tested out and answered, but I don't understand why this line behaves the way it does, and I want to. I realize why this expression is flawed for finding an IP address, but I don't fully understand its behavior, since the question mark doesn't seem to mean "zero or one times" like it's supposed to.
"user#machine:~$ grep -E '[01]?[0-9][0-9]?' "
To my understanding, "[01]?" should match a single 0 or 1, as indicated by the brackets, while the question mark tells grep to look for zero or one instance only; similarly for "[0-9]?". The thing is, this line prints an unlimited number of digits, far exceeding 3. I ruled out that it was due to the bracket expression without a question mark, since it would still print an unlimited number of digits whether I piped an echo or used a test .txt file full of numbers.
The above example then made me wonder how to find IPs with grep the correct way. I found countless examples like the following expression for IPv4 octets:
\.(25[0-5]\|2[0-4][0-9]\|[01][0-9][0-9]\|[0-9][0-9]).\
Is this telling me to look for any number 2-5 anywhere from 0-5 times? 0-5 is too many digits for an octet. Is it telling me to look for any number 0-5 up to 25 times? Again, that's way too many digits for an octet. What does \2[0-4][0-9]\ mean in this case? I'm confused about how this expression finds only numbers between 1 and 255.
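Look at it this way: x?[0-9]x? matches anything which contains a digit, because both x's are optional. You might as well leave them out, because they do not constrain the match at all. And since grep prints every line that contains at least one match, a line with a long run of digits is printed in full, even though each individual match is at most three digits long.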
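The same behaviour can be reproduced with Ruby's regex engine, used here only as a stand-in for grep. match? answers the question grep asks of each line ("is there a match anywhere?"), while scan lists the individual matches, which are short.

```ruby
pattern = /[01]?[0-9][0-9]?/
line = "123456789"

# grep prints a line if the pattern matches anywhere in it.
puts pattern.match?(line)          # true
# The matches themselves are at most three digits long.
p line.scan(pattern)               # ["123", "45", "67", "89"]
# Because [01]? and the trailing [0-9]? are optional, any single digit suffices.
puts pattern.match?("abc7def")     # true
```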
25[0-5] looks for 25 followed by a digit in the range 0-5. In other words, that alternative matches a number in the range 250-255.
The full expression in your example looks for a number in the range 00-255 by enumerating strings beginning with 25, 20-24, etc.; though it's incomplete in that it doesn't permit single-digit numbers.
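To check the claim above, the sketch below takes just the parenthesized alternation from the question, anchors it with \A and \z so it must match a whole string, and tries it against every number from 0 to 255 written without padding.

```ruby
# The octet alternation from the question, anchored to the whole string.
octet = /\A(25[0-5]|2[0-4][0-9]|[01][0-9][0-9]|[0-9][0-9])\z/

accepted = (0..255).map(&:to_s).select { |n| octet.match?(n) }
puts accepted.size     # 246: only 0-9 are rejected
puts accepted.first    # 10

# Single digits only pass when zero-padded; 256 is rejected.
p %w[7 07 256].map { |s| [s, octet.match?(s)] }
# [["7", false], ["07", true], ["256", false]]
```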
The expression matches a single octet (incompletely), not an entire IP address. Here is a common way to match an IPv4 address:
([3-9][0-9]?|2([0-4][0-9]?|5[0-5]?|[6-9])?|1([0-9][0-9]?)?|0)(\.([3-9][0-9]?|2([0-4][0-9]?|5[0-5]?|[6-9])?|1([0-9][0-9]?)?|0)){3}
where the square brackets express character classes which match a single character out of a set, and the final curly braces {3} express a repetition.
Some regex dialects (e.g. the basic regular expressions grep uses by default) require backslashes before ( ) and { }, and GNU grep spells alternation as \|, but I have used the extended notation (à la grep -E and most online regex exploration tools), which doesn't want backslashes.
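A Ruby sketch of the same octet-by-octet approach, written with 5[0-5]? (so 256-259 are excluded), a 0 alternative, and Ruby's non-capturing (?:...) groups, which grep -E does not support. It is anchored with \A and \z so that strings such as 1.2.3.4.5 are rejected; with grep, -x or ^...$ plays that role. The constant names are illustrative.

```ruby
# One octet: 3-99, 2/20-29/200-255, 1/10-19/100-199, or 0 (no leading zeros).
OCTET = "(?:[3-9][0-9]?|2(?:[0-4][0-9]?|5[0-5]?|[6-9])?|1(?:[0-9][0-9]?)?|0)"
IPV4 = /\A#{OCTET}(?:\.#{OCTET}){3}\z/

%w[192.168.0.1 255.255.255.255 0.0.0.0].each { |ip| puts "#{ip}: #{IPV4.match?(ip)}" }  # all true
%w[256.1.1.1 1.2.3 1.2.3.4.5 01.2.3.4].each { |ip| puts "#{ip}: #{IPV4.match?(ip)}" }   # all false
```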

Avoid entering white space in regex password Laravel 5.4

I am trying to write a strong-password regex.
My regex is below. It works perfectly for the following requirements:
Min 1 digit, min 1 lowercase char, min 1 uppercase char, min 1 special char, min 8 chars, max 15 chars
^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[^\w]).{8,15}$
Can somebody suggest how to disallow whitespace?
How about this?
^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[^\w])(?!.*?\s).{8,15}$
I just added a negative lookahead for whitespace in addition to all your positive lookaheads.
As for what it means, it has a bunch of "lookaheads", which mean "only create a match if the current position is followed by ...". It has four different lookaheads:
(?=.*?[A-Z]) // followed by any number of characters and then a capital letter
(?=.*?[a-z]) // followed by any number of characters and then a lowercase letter
(?=.*?[0-9]) // followed by any number of characters and then a number
(?=.*?[^\w]) // followed by any number of characters and then not a word character (0-9a-zA-Z_)
The ^ at the beginning anchors the match to the start of the string. So it basically says the start of the string must be followed by all four conditions specified above. I just added one more condition that says the start may NOT be followed by a whitespace character anywhere later in the string. It's called a "negative lookahead":
(?!.*?\s)
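Laravel validation rules use PHP (PCRE) patterns; the sketch below swaps in Ruby's regex engine, which should treat these particular lookaheads the same way, to show why the extra lookahead is needed: a space is not a word character, so it satisfies [^\w] on its own. Ruby's ^ and $ match at line boundaries, so the sketch anchors with \A and \z instead. The sample passwords are made up.

```ruby
ORIGINAL = /\A(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[^\w]).{8,15}\z/
NO_WHITESPACE = /\A(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[^\w])(?!.*?\s).{8,15}\z/

puts ORIGINAL.match?("Pass w0rd")        # true: the space counts as the "special char"
puts NO_WHITESPACE.match?("Pass w0rd")   # false: rejected by (?!.*?\s)
puts NO_WHITESPACE.match?("Passw0rd!")   # true
```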

Counting words from a mixed-language document

Given a set of lines containing Chinese characters, Latin-alphabet-based words or a mixture of both, I wanted to obtain the word count.
To wit:
this is just an example
这只是个例子
should give 10 words ideally; but of course, without access to a dictionary, 例子 would best be treated as two separate characters. Therefore, a count of 11 words/characters would also be an acceptable result here.
Obviously, wc -w is not going to work. It considers the 6 Chinese characters / 5 words as 1 "word", and returns a total of 6.
How do I proceed? I am open to trying different languages, though bash and python will be the quickest for me right now.
You should split the text on Unicode word boundaries, then count the elements which contain letters or ideographs. If you're working with Python, you could use the uniseg or nltk packages, for example. Another approach is to simply use Unicode-aware regexes but these will only break on simple word boundaries. Also see the question Split unicode string on word boundaries.
Note that you'll need a more complex dictionary-based solution for some languages. UAX #29 states:
For Thai, Lao, Khmer, Myanmar, and other scripts that do not typically use spaces between words, a good implementation should not depend on the default word boundary specification. It should use a more sophisticated mechanism, as is also required for line breaking. Ideographic scripts such as Japanese and Chinese are even more complex. Where Hangul text is written without spaces, the same applies. However, in the absence of a more sophisticated mechanism, the rules specified in this annex supply a well-defined default.
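A minimal Ruby sketch of the Unicode-aware regex route (not full UAX #29 segmentation): every Han character is counted on its own, which the question accepts as a fallback, and any other run of letters, combining marks or digits counts as one word. The helper name count_words is hypothetical. Unlike a byte-counting approach, it also handles text with no space between the two scripts.

```ruby
# Each Han character is one "word"; any other run of letters, marks or
# digits (stopping at a Han character) is one word.
WORD_OR_HAN = /\p{Han}|(?:(?!\p{Han})[\p{L}\p{M}\p{N}])+/

def count_words(text)
  text.scan(WORD_OR_HAN).size
end

sample = "this is just an example\n这只是个例子"
puts count_words(sample)      # 11 (5 English words + 6 characters)
puts count_words("这是test")  # 3
```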
I thought about a quick hack, since common Chinese characters are 3 bytes long in UTF-8:
(pseudocode)
for each byte in the line:
  if the byte begins with a 1 bit (value >= 0x80):
    add 1 to total chinese bytes
  if it is a space:
    add 1 to total "normal" words
  if it is a newline:
    break
Then take total chinese bytes / 3 + total words to get the sum for each line. This will give an erroneous count for mixed-language text, but should be a good start.
这是test
However, the above sentence would give a total of 2 (1 for each of the Chinese characters). A space between the two languages would be needed to give the correct count.

How to encode a number as a string such that the lexicographic order of the generated string is in the same order as the numeric order

For example, if we have the two strings 2 and 10, 10 will come first if we order them lexicographically.
The trivial solution would be to repeat a character n times,
e.g. 2 can be encoded as aa
and 10 as aaaaaaaaaa.
This way the lexicographic order is the same as the numeric one.
But is there a more elegant way to do this?
When converting the numbers to strings make sure that all the strings have the same length, by appending 0s in the front if necessary. So 2 and 10 would be encoded as "02" and "10".
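A minimal sketch of zero-padding, assuming non-negative integers and a fixed width of 5 (an assumption: the width must cover the largest number you expect, and a minus sign would break the ordering).

```ruby
# Pad every number to the same width so string order matches numeric order.
WIDTH = 5

def encode(n)
  format("%0#{WIDTH}d", n)
end

encoded = [10, 2, 100, 9].map { |n| encode(n) }  # ["00010", "00002", "00100", "00009"]
p encoded.sort                                   # ["00002", "00009", "00010", "00100"]
p encoded.sort.map(&:to_i)                       # [2, 9, 10, 100]
```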
While kjampani's solution is probably the best and easiest in normal applications, another way which is more space-efficient is to prepend every string with its own length. Of course, you need to encode the length in a way which is also consistently sorted.
If you know all the strings are fairly short, you can just encode their length as a fixed-length base-X sequence, where X is the number of character codes you're willing to use (popular values are 64, 96, 255 and 256.) Note that you have to use the character codes in lexicographical order, so normal base64 won't work.
One variable-length order-preserving encoding is the one used by UTF-8. (Not UTF-8 directly, which has a couple of corner cases which will get in the way, but the same encoding technique. The order-preserving property of UTF-8 is occasionally really useful.) The full range of such compressed codes can encode values up to 42 bits long, with an average of five payload bits per byte. That's sufficient for pretty long strings; four terabyte long strings are pretty rare in the wild; but if you need longer, it's possible, too, by extending the size prefix over more than one byte.
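A minimal sketch of the length-prefix idea for non-negative integers. It assumes a two-digit decimal length field (so numbers up to 99 digits) rather than the base-X or UTF-8-style schemes described above; the method names are hypothetical. Unlike fixed-width zero-padding, small numbers don't pay for the width of the largest one.

```ruby
# Prefix the decimal digits of n with their count, as a two-digit field.
def encode_with_length(n)
  digits = n.to_s
  format("%02d%s", digits.length, digits)
end

def decode_with_length(s)
  s[2..-1].to_i
end

encoded = [100, 2, 0, 10, 9].map { |n| encode_with_length(n) }
p encoded.sort                                    # ["010", "012", "019", "0210", "03100"]
p encoded.sort.map { |s| decode_with_length(s) }  # [0, 2, 9, 10, 100]
```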
Break the string into successive substrings of letters and numbers, and then sort by comparing corresponding substrings as integers if they are numeric strings (and as strings otherwise):
"aaa2" ---> aaa + 2
"aaa1000" ---> aaa + 1000
aaa == aaa
Since they're equal, we continue:
1000 > 2
Hence, aaa1000 > aaa2.
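A minimal Ruby sketch of this "natural sort" comparison; natural_key is a hypothetical helper name. Each chunk is tagged so that Array#<=> never has to compare an Integer with a String (digit chunks are tagged 0 so they sort before letters, as digits do in ASCII).

```ruby
# Split into runs of digits and non-digits; digit runs compare as integers.
def natural_key(s)
  s.scan(/\d+|\D+/).map { |chunk| chunk.match?(/\A\d/) ? [0, chunk.to_i] : [1, chunk] }
end

p %w[aaa1000 aaa2 b1 aaa10].sort_by { |s| natural_key(s) }
# ["aaa2", "aaa10", "aaa1000", "b1"]
```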

Password validation

I need to validate a password with the following requirements:
1. Be at least seven characters long
2. Contain at least one letter (a-z or A-Z)
3. Contain at least one number (0-9)
4. Contain at least one symbol (#, $, %, etc)
Can anyone give me the correct expression?
/.{7,}/
/[a-zA-Z]/
/[0-9]/
/[-!##$%^...]/
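The four separate patterns above, combined into a hypothetical valid_password? method. The question's symbol list is elided, so this sketch assumes "symbol" means any character that is not a letter, digit or whitespace.

```ruby
# One check per requirement, applied in turn.
def valid_password?(password)
  password.match?(/.{7,}/) &&              # at least seven characters
    password.match?(/[a-zA-Z]/) &&         # at least one letter
    password.match?(/[0-9]/) &&            # at least one digit
    password.match?(/[^a-zA-Z0-9\s]/)      # at least one symbol (assumed definition)
end

puts valid_password?("abc123$")   # true
puts valid_password?("abc1234")   # false: no symbol
puts valid_password?("ab1$")      # false: too short
```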
For a single regex, the most straightforward way to check all of the requirements would be with lookaheads:
/(?=.*[a-zA-Z])(?=.*\d)(?=.*[^a-zA-Z0-9\s]).{7,}/
Breaking it down:
.{7,} - at least seven characters
(?=.*[a-zA-Z]) - a letter must occur somewhere after the start of the string
(?=.*\d) - same as above, except a digit
(?=.*[^a-zA-Z0-9\s]) - same as above, except something that is not a letter, digit, or whitespace
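The single-regex version in Ruby, with \A added so the lookaheads are evaluated from the start of the string, as the breakdown describes. The constant name and sample passwords are illustrative.

```ruby
STRONG = /\A(?=.*[a-zA-Z])(?=.*\d)(?=.*[^a-zA-Z0-9\s]).{7,}/

puts STRONG.match?("abc123$")    # true
puts STRONG.match?("abc1234")    # false: no symbol
puts STRONG.match?("1234567!")   # false: no letter
puts STRONG.match?("ab1$")       # false: too short
```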
However, you might choose to simply use multiple separate regex matches to keep things even more readable; chances are you aren't validating a ton of passwords at once, so performance isn't a huge concern.
