I'm trying to figure out how to write my own regex.
I made a list of viable phone numbers and non-viable ones, and I'm trying to make sure the viable ones match, but I can't figure out how to finish it.
Allowed list
0665363636
06 65 36 36 36
06-65-36-36-36
+33 6 65 36 36 36
Not allowed
06 65 36 36
2336653636
+3366536361
0065363636
I messed around with it a bit and I currently have this:
[0+][63][6 \-3][56\ ][\d{1}][\d \-]\d{2}[\d{1} \-]\d\d? ?\-?\d?\d? ?\d?\d?$
This blocks out numbers 2 and 4 of the not-allowed list, but I can't figure out how to block the others.
Should I require a minimum number of digits? If so, how would I do this?
[Edit: After posting this I see it is very similar to @Lucas' answer. I will let it stand, however, for the alternative presentation.]
I would try constructing a regex for each of the allowed patterns and then take their union to obtain a single regex.
We see that all of the allowable numbers not beginning with + have 10 digits, so I will assume that's a requirement. If different numbers of digits are permitted, that can be dealt with easily.
1. Include 0665363636, exclude 2336653636 and 0065363636
I assume this means the number must begin with the digit 0 and the second digit must not be 0. That's easy:
r1 = /
^ # match start of string
0 # match 0
[1-9] # match any digit 1-9
\d{8} # match 8 digits
$ # match end of string
/x
Test:
'0665363636' =~ r1 #=> 0
'2336653636' =~ r1 #=> nil
'0065363636' =~ r1 #=> nil
That seems to work.
2. Include 06 65 36 36 36, exclude 06 65 36 36
Another easy one:
r2 = /
^ # match start of string
0 # match 0
[1-9] # match any digit 1-9 # or \d if can be zero
(?: # begin a non-capture group
\s # match one whitespace
\d{2} # match two digits
) # end non-capture group
{4} # match the non-capture group 4 times
$ # match end of string
/x
Test:
'06 65 36 36 36' =~ r2 #=> 0
'06 65 36 36' =~ r2 #=> nil
Another apparent success!
We see that 06-65-36-36-36 should also be permitted. That's such a small variant of the above we don't have to bother creating another regex to include in the union; instead we just modify r2 ever-so-slightly:
r2 = /^0[1-9](?:
[\s-] # match one whitespace or a hyphen
\d{2}){4}$
/x
Notice that we don't have to escape the hyphen when it's the first or last character in a character class.
Test:
'06 65 36 36 36' =~ r2 #=> 0
'06-65-36-36-36' =~ r2 #=> 0
Yes!
3. Include +33 6 65 36 36 36, exclude +3366536361
It appears that, when the number begins with a +, the + must be followed by two digits, a space, one digit, and then four pairs of digits, each preceded by a space. We can just write that down:
r3 = /
^ # match start of string
\+ # match +
\d\d # match two digits
\s\d # match one whitespace followed by a digit
(?: # begin a non-capture group
\s # match one whitespace
\d{2} # match two digits
) # end non-capture group
{4} # match the non-capture group 4 times
$ # match end of string
/x
Test:
'+33 6 65 36 36 36' =~ r3 #=> 0
'+3366536361' =~ r3 #=> nil
Nailed it!
Unionize!
r = Regexp.union(r1, r2, r3)
=> /(?x-mi:
^ # match start of string
0 # match 0
[1-9] # match any digit 1-9
\d{8} # match 8 digits
$ # match end of string
)|(?x-mi:^0[1-9](?:
[\s-] # match one whitespace or a hyphen
\d{2}){4}$
)|(?x-mi:
^ # match start of string
\+ # match +
\d\d # match two digits
\s\d # match one whitespace followed by a digit
(?: # begin a non-capture group
\s # match one whitespace
\d{2} # match two digits
) # end capture group
{4} # match capture group 4 times
$ # match end of string
)/
Let's try it:
['0665363636', '06 65 36 36 36', '06-65-36-36-36',
'+33 6 65 36 36 36'].any? { |s| (s =~ r).nil? } #=> false
['06 65 36 36', '2336653636', '+3366536361',
'0065363636'].all? { |s| (s =~ r).nil? } #=> true
Bingo!
Efficiency
Unionizing individual regexes may not produce the most efficient single regex. You must decide whether the benefits of easier initial construction and testing, and of ongoing maintenance, are worth the efficiency penalty. If efficiency is paramount, you might still construct r this way, then tune it by hand.
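One caveat is worth checking before relying on the union: in Ruby, ^ and $ anchor at line boundaries, not string boundaries, so a multi-line string that contains a valid number on one of its lines would still match. A minimal sketch, assuming whole-string validation is what's wanted (the constant names are illustrative and not part of the answer above):

```ruby
# ^ and $ match at the start and end of any line within the string.
LINE_ANCHORED = /^0[1-9]\d{8}$/
# \A and \z match only at the very start and very end of the string.
STRING_ANCHORED = /\A0[1-9]\d{8}\z/

MULTILINE_INPUT = "junk\n0665363636"

puts LINE_ANCHORED.match?(MULTILINE_INPUT)    # true: the second line matches
puts STRING_ANCHORED.match?(MULTILINE_INPUT)  # false: the whole string must match
puts STRING_ANCHORED.match?('0665363636')     # true
```

If the input is already known to be a single line, the two forms behave the same for these patterns.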
Looks like you want to limit the allowed phone numbers to French mobile phone numbers only.
You made a list of valid and invalid strings, which is a good starting point. But then, I think you just wanted to write the pattern in one shot, which is error-prone.
Let's follow a simple methodology and go through the allowed list and craft a very simple regex for each one:
0665363636 -> ^06\d{8}$
06 65 36 36 36 -> ^06(?: \d\d){4}$
06-65-36-36-36 -> ^06(?:-\d\d){4}$
+33 6 65 36 36 36 -> ^\+33 6(?: \d\d){4}$
So far so good.
Now, just combine everything into one regex, and factor it a bit (the 06 part is common to the first 3 cases). The whole alternation is wrapped in a group so that ^ and $ apply to every branch:
^(?:06(?:\d{8}|(?: \d\d){4}|(?:-\d\d){4})|\+33 6(?: \d\d){4})$
Et voilà.
As a side note, you should rather use:
^(?:0[67](?:\d{8}|(?: \d\d){4}|(?:-\d\d){4})|\+33 [67](?: \d\d){4})$
since French mobile phone numbers can also start with 07.
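To check a combined pattern against both lists from the question, something like the following could be used. It is only a sketch: the constant names are made up here, and the alternation is wrapped in a non-capture group so that ^ and $ apply to every branch rather than only to the first and the last.

```ruby
FRENCH_MOBILE = /^(?:0[67](?:\d{8}|(?: \d\d){4}|(?:-\d\d){4})|\+33 [67](?: \d\d){4})$/

ALLOWED     = ['0665363636', '06 65 36 36 36', '06-65-36-36-36', '+33 6 65 36 36 36']
NOT_ALLOWED = ['06 65 36 36', '2336653636', '+3366536361', '0065363636']

puts ALLOWED.all? { |s| s.match?(FRENCH_MOBILE) }       # true
puts NOT_ALLOWED.none? { |s| s.match?(FRENCH_MOBILE) }  # true

# Without the outer group, an over-long number such as this one would
# slip through the first branch, which would have no end anchor.
puts '06653636361'.match?(FRENCH_MOBILE)                # false
```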
Related
I am trying to convert a string of decimal values to hex, grabbing two digits at a time.
So, for example, if I were to convert these decimals two digits at a time
01 67 15 06 01 76 61 73
this would be my result
01430F06014C3D49
I know that str.to_s(16) will convert decimal to hex, but as I said, I need this done two digits at a time so the output is correct, and I have no clue how to do this in Ruby.
Here is what I have tried:
str.upcase.chars.each_slice(2).to_s.(16).join
You can use String#gsub with a regular expression and Kernel#sprintf:
"01 67 15 06 01 76 61 73".gsub(/\d{2} */) { |s| sprintf("%02x", s.to_i) }
#=> "01430f06014c3d49"
The regular expression /\d{2} */ matches two digits followed by zero or more spaces (note that 73 is not followed by a space).
The result of the block calculation replaces the two or three characters that were matched by the regular expression.
sprintf's formatting directive "%02x" converts the integer to its base-16 string representation ('x') and forms a string of at least 2 characters, padded on the left with '0's if necessary.
Alternatively, one could use String#% (with sprintf's formatting directives):
"01 67 15 06 01 76 61 73".gsub(/\d{2} */) { |s| "%02x" % s.to_i }
#=> "01430f06014c3d49"
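If the uppercase output shown in the question is wanted, one variant is to split on whitespace and format each piece with %02X. This is a sketch that assumes the input always consists of whitespace-separated decimal numbers in the range 0-255; the method name is hypothetical.

```ruby
# Convert each whitespace-separated decimal number to two uppercase hex digits.
def decimal_pairs_to_hex(str)
  str.split.map { |pair| format('%02X', pair.to_i) }.join
end

puts decimal_pairs_to_hex('01 67 15 06 01 76 61 73')  # 01430F06014C3D49
```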
I am working on a software problem and I found myself needing to convert a 2-letter string to a 3-digit number. We're talking about English alphabet only (26 letters).
So essentially I need to convert something like AA, AR, ZF, ZZ etc. to a number in the range 0-999.
We have 676 combinations of letters and 1000 numbers, so the range is covered.
Now, I could just write up a map manually, saying that AA = 1, AB = 2 etc., but I was wondering if maybe there is a better, more "mathematical" or "logical" solution to this.
The order of numbers is of course not relevant, as long as the conversion from letters to numbers is unique and always yields the same results.
The conversion should work both ways (from letters to numbers and from numbers to letters).
Does anyone have an idea?
Thanks a lot
Treat A-Z as 1-26 in base 27, with 0 reserved for blanks.
E.g. 'CD' -> 3 * 27 + 4 = 85
85 -> 85 / 27, 85 % 27 = 3, 4 = C, D
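The base-27 idea can be sketched in Ruby as follows. The method and constant names are hypothetical, and the code assumes the input is always exactly two uppercase letters (the blank digit 0 is never produced or decoded here).

```ruby
LETTER_OFFSET = 'A'.ord - 1  # so that 'A' maps to 1 and 'Z' maps to 26

# 'CD' -> 3 * 27 + 4 = 85
def letters_to_number(pair)
  high, low = pair.chars.map { |c| c.ord - LETTER_OFFSET }
  high * 27 + low
end

# 85 -> [3, 4] -> 'CD'
def number_to_letters(number)
  number.divmod(27).map { |digit| (digit + LETTER_OFFSET).chr }.join
end

puts letters_to_number('CD')  # 85
puts number_to_letters(85)    # CD
puts letters_to_number('ZZ')  # 728, still within 0-999
```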
If you don’t have to use consecutive numbers, you can view a two-letter string as a base-36 number. So, in Python, you can just use the int function to convert it into an integer.
int('AA', 36) # 370
int('AB', 36) # 371
#...
int('ZY', 36) # 1294
int('ZZ', 36) # 1295
As for how to convert the number back to a string, you can refer to the method on How to convert an integer to a string in any base?
@furry12: the difference between the first number and the last one is 1295 - 370 = 925 < 999, so the values fit within a span of 1000. That is quite lucky: you can subtract an offset such as 300 from every number, and the results will be in the range 0-999.
def str2num(s):
    return int(s, 36) - 300
print(str2num('AA')) # 70
print(str2num('ZZ')) # 995
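For comparison, the same base-36 idea can be sketched in Ruby, including the reverse direction. The names are hypothetical, and the shift of 300 is the one suggested above.

```ruby
SHIFT = 300  # 'AA' is 370 in base 36, so subtracting 300 keeps results in 0-999

def str_to_num(letters)
  letters.to_i(36) - SHIFT
end

def num_to_str(number)
  # Integer#to_s(36) produces lowercase digits, hence the upcase.
  (number + SHIFT).to_s(36).upcase
end

puts str_to_num('AA')  # 70
puts str_to_num('ZZ')  # 995
puts num_to_str(70)    # AA
```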
Not sure what I'm doing incorrect but I seem to be getting it woefully wrong.
The question is, you are given a string of space separated numbers, and have to return the highest and lowest number.
Note:
All numbers are valid Int32, no need to validate them.
There will always be at least one number in the input string.
Output string must be two numbers separated by a single space, and highest number is first.
def high_and_low(numbers)
# numbers contains a string of space seperated numbers
#return the highest and lowest number
numbers.minmax { |a, b| a.length <=> b.length }
end
Output:
`high_and_low': undefined method `minmax' for "4 5 29 54 4 0 -214 542 -64 1 -3 6 -6":String
minmax is not implemented for a string. You need to split your string into an array first. But note that split will return an array of strings, not numbers, you will need to translate the strings to integers (to_i) in the next step.
Because minmax returns the values in the opposite order than required, you need to rotate the array with reverse and then just join those numbers with whitespace for the final result.
numbers = "4 5 29 54 4 0 -214 542 -64 1 -3 6 -6"
def high_and_low(numbers)
numbers.split.minmax_by(&:to_i).reverse.join(' ')
end
high_and_low(numbers)
#=> "542 -214"
How about:
numbers_array = numbers.split(' ')
"#{numbers_array.max} #{numbers_array.min}"
Since you're starting with a string of numbers, split gives you strings, so you may need to convert each element with .to_i after the call to split.
In that case:
numbers_array = numbers.split(' ').map { |n| n.to_i }
"#{numbers_array.max} #{numbers_array.min}"
As you're starting with a String, you must turn it into an Array to call minmax on it.
Also, make sure to compare Integers by calling .map(&:to_i) on the Array; otherwise the elements are compared as strings, character by character, instead of by numerical value.
def get_maxmin(string)
string.split(' ')
.map(&:to_i)
.minmax
.reverse
.join(' ')
end
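The difference between comparing strings and comparing integers is easy to see on a small input. A short sketch (the sample string is made up for illustration):

```ruby
SAMPLE = '9 10 -3'

# Compared as strings, "9" sorts after "10" because '9' > '1'.
p SAMPLE.split.minmax               # ["-3", "9"]

# Compared as integers, the result is the numerical minimum and maximum.
p SAMPLE.split.map(&:to_i).minmax   # [-3, 10]
```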
There is no need to convert the string to an array.
def high_and_low(str)
str.gsub(/-?\d+/).
reduce([-Float::INFINITY, Float::INFINITY]) do |(mx,mn),s|
n = s.to_i
[[mx,n].max, [mn,n].min]
end
end
high_and_low "4 5 29 54 4 0 -214 542 -64 1 -3 6 -6"
#=> [542, -214]
This uses the form of String#gsub that has one argument and no block, so it returns an enumerator that I've chained to Enumerable#reduce (a.k.a. inject). gsub therefore merely generates matches of the regular expression /-?\d+/ and performs no substitutions.
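To see the enumerator form of gsub on its own, here is a small sketch with a made-up sample string. String#scan is shown alongside as a way to get the same matches as an array.

```ruby
TEXT = '4 5 -214 542'

matches = TEXT.gsub(/-?\d+/)  # no block and no replacement: returns an Enumerator
p matches.class               # Enumerator
p matches.to_a                # ["4", "5", "-214", "542"]
p TEXT                        # "4 5 -214 542" (the string itself is untouched)

# String#scan collects the same matches into an array directly.
p TEXT.scan(/-?\d+/)          # ["4", "5", "-214", "542"]
```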
My solution to this kata
def high_and_low(numbers)
numbers.split.map(&:to_i).minmax.reverse.join(' ')
end
Test.assert_equals(high_and_low("4 5 29 54 4 0 -214 542 -64 1 -3 6 -6"), "542 -214")
#Test Passed: Value == "542 -214"
Some docs about methods:
String#split Array#map Array#minmax Array#reverse Array#join
More about Symbol#to_proc
numbers.split.map(&:to_i) is the same as numbers.split.map { |p| p.to_i }
That said, minmax_by(&:to_i) arguably looks better.
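The two variants differ slightly in what they return before joining: minmax_by keeps the original strings, while map(&:to_i).minmax returns integers. A quick sketch with a made-up input:

```ruby
INPUT = '4 5 29 -214 542'

p INPUT.split.minmax_by(&:to_i)    # ["-214", "542"]  (original strings)
p INPUT.split.map(&:to_i).minmax   # [-214, 542]      (integers)

# After reversing and joining, both give the same output string here.
p INPUT.split.minmax_by(&:to_i).reverse.join(' ')   # "542 -214"
p INPUT.split.map(&:to_i).minmax.reverse.join(' ')  # "542 -214"
```

The results would only differ if a number were written in a non-canonical form such as 007, which minmax_by would return unchanged.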
I'm confused about the optional argument for to_i.
Specifically, what "base" means, and how it impacts the method in this example:
"0a".to_i(16) #=> 10
I have trouble with the optional argument in regards to the string the method is called on. I thought that the return value would just be an integer value of 0.
Simple answer: it's because 0a (or a) in hexadecimal is equal to 10 in decimal.
The base, also called the radix, is the number of unique digits in a numeral system.
In decimal, we have 10 digits, 0 to 9, to represent numbers.
In hexadecimal there are 16 digits: apart from 0 to 9, we use a to f to represent the values 10 to 15.
You can test it like this:
"a".to_i(16)
#=> 10
"b".to_i(16)
#=> 11
"f".to_i(16)
#=> 15
"g".to_i(16)
#=> 0 # Because it's not a correct hexadecimal digit/number.
'2c'.to_i(16)
#=> 44
'2CH2'.to_i(16)
#=> 44 # Extraneous characters past the end of a valid number are ignored, and it's case insensitive.
9.to_s.to_i(16)
#=> 9
10.to_s.to_i(16)
#=> 16
In other words, 10 in Decimal is equal to a in Hexadecimal.
And 10 in Hexadecimal is equal to 16 in Decimal. (Doc for to_i)
Note that hexadecimal numbers are usually written with a 0x prefix, which to_i(16) also accepts:
"0xa".to_i(16)
#=> 10
"0x100".to_i(16)
#=> 256
By the way, you can write these literals directly in Ruby:
num_hex = 0x100
#=> 256
num_bin = 0b100
#=> 4
num_oct = 0o100
#=> 64
num_dec = 0d100
#=> 100
These are hexadecimal, binary, octal, and decimal literals. (The 0d prefix is superfluous, of course; it is only used occasionally for clarity.)
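To make the role of the base concrete, the value of a digit string can be computed by hand: each step multiplies the running total by the base and adds the next digit. The helper below is a hypothetical sketch, not how to_i is implemented; it assumes every character is a valid digit for the given base and does no error checking.

```ruby
DIGITS = ('0'..'9').to_a + ('a'..'z').to_a  # digit characters for bases up to 36

# Hypothetical helper: positional value of a digit string in the given base.
def value_in_base(string, base)
  string.downcase.each_char.reduce(0) do |total, char|
    total * base + DIGITS.index(char)
  end
end

p value_in_base('0a', 16)   # 10
p value_in_base('2c', 16)   # 44  (2 * 16 + 12)
p value_in_base('100', 2)   # 4
p '2c'.to_i(16)             # 44, the built-in agrees
```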
Suppose we have a string str. If str contains only one character, for example, str = "1", then str[-1..1] returns "1".
But if str is longer than that, like str = "anything else", then str[-1..1] returns "" (an empty string).
Why does Ruby interpret string slicing like this?
This behaviour is just how ranges of characters work.
The range start is -1, which is the last character in the string. The range end is 1, which is the second position from the start.
So for a one character string, this is equivalent to 0..1, which is that single character.
For a two character string, this is 1..1, which is the second character.
For a three character string, this is 2..1, which is an empty string. And so on for longer strings.
To get a non-trivial substring, the start position has to represent a position earlier than the end position.
For a single-length string, index -1 is the same as index 0, which is smaller than 1. Thus, [-1..1] gives a non-trivial substring.
For a string longer than a single character, index -1 is larger than index 0. Thus, [-1..1] cannot give a non-trivial substring, and by default, it returns an empty string.
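The effect of the string's length on this range can be checked directly. In the sketch below, the helper name is hypothetical; it returns the equivalent non-negative range together with the actual result of str[-1..1].

```ruby
# Hypothetical helper: show which non-negative range str[-1..1] corresponds to.
def last_to_second(str)
  start = str.length - 1  # index -1 counted from the front
  [start..1, str[-1..1]]
end

p last_to_second('1')              # [0..1, "1"]
p last_to_second('ab')             # [1..1, "b"]
p last_to_second('abc')            # [2..1, ""]
p last_to_second('anything else')  # [12..1, ""]
```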
Writing down the indices usually helps me:
#        0   1   2   3   4   5   6   7   8   9  10  11  12
str =   'a' 'n' 'y' 't' 'h' 'i' 'n' 'g' ' ' 'e' 'l' 's' 'e'   #=> "anything else"
#      -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1
You can refer to each character by either its positive or negative index. For example, you can use either 3 or -10 to refer to "t":
str[3] #=> "t"
str[-10] #=> "t"
and either 7 or -6 to refer to "g":
str[7] #=> "g"
str[-6] #=> "g"
Likewise, you can use each of these indices to retrieve "thing" via a range:
str[3..7] #=> "thing"
str[3..-6] #=> "thing"
str[-10..7] #=> "thing"
str[-10..-6] #=> "thing"
str[-1..1] however would return an empty string, because -1 refers to the last character and 1 refers to the second. It would be equivalent to str[12..1].
But if the string consists of a single character, that range becomes valid:
# 0
str = '1'
# -1
str[-1..1] #=> "1"
In fact, 1 refers to an index after the first character, so 0 would be enough:
str[-1..0] #=> "1"