Regular expression in Ruby: check the total number of digits

I need a one-line gsub that removes all the non-digits in a string, but only if there are no more than three non-digits and the total number of digits is 10.
I have this, which satisfies the first condition:
p "0177/385490".gsub(/((\d+)\D?(\d+)\D?(\d+)\D?+(\d+))/,'\2\3\4\5')
#=>"0177385490"
But when I try this, the {10} check doesn't work:
p "0177/385490".gsub(/((\d+)\D?(\d+)\D?(\d+)\D?+(\d+)){10}/,'\2\3\4\5')
#=>"0177/385490"
How can I do this?
EDIT
I managed to do it like this, but how can I do it in a one-line gsub?
strings = [
  "0473/385 490",
  "0473/385490",
  "0473 38 54 90",
  "0473/385 4901" # this one isn't captured
]
strings.each do |s|
  if /((\d+)\D?(\d+)\D?(\d+)\D?+(\d+))/ =~ s
    if "#{$2}#{$3}#{$4}#{$5}".length == 10
      puts "#{$2}#{$3}#{$4}#{$5}"
    end
  end
end
EDIT: To show why it really needs to be a one-line gsub, here is my routine (more replacements will be added):
def cleanup text
  replacements = [
    {:pattern => /(04\d{2}) (\d{2}) (\d{2}) (\d{2})/, :replace_with => '\1\2\3\4'},
    {:pattern => /(0\d)(\/| |-)(\d{3}) (\d{2}) (\d{2})/, :replace_with => '\1\3\4\5'},
    {:pattern => /(\d{6} )(\d{3})-(\d{2})/, :replace_with => '\1\2 \3'},
    {:pattern => /(\d{2,4})\D?(\d{2,3})\D?(\d{2,3})/, :replace_with => '\1\2\3'}
  ].each{|replacement|text.gsub!(replacement[:pattern], replacement[:replace_with])}
  text
end

I think a one-line gsub wouldn't be overly readable. Here's my approach:
chars, non_chars = s.each_char.partition { |c| c =~ /\d/ }
puts chars.join if chars.size == 10 && non_chars.size <= 3
Clean and easy to read, without any magic variables. Plus it clearly shows the rules you have imposed on the string.

Here's a one-liner with gsub, mostly to illustrate why Michael Kohl's approach is better:
(digits = s.gsub(/\D/, '')).length == 10 && s.length < 14 ? digits : s

You may use something like this:
puts s.gsub(/\D/, '') if (/\A(\d\D?){10}\z/ =~ s) && (/\A(\d+\D){0,3}\d*\z/ =~ s)

You might also want to know about the scan method.
strings.each do |s|
  numbers = s.scan(/\d/).join
  non_numbers = s.scan(/\D/)
  puts numbers if numbers.length == 10 && non_numbers.length < 4
end
But I like the solution by Michael Kohl better.
And then a silly example:
strings.select{|s| s.scan(/\D/).length < 4}.map{|s| s.scan(/\d/).join}.select{|s| s.length==10}

Thanks everyone, but I can't use the answers because I can't insert them into my routine (I edited my question to make that clearer). I found a solution myself. I've upvoted everyone who gave a one-line solution as requested; now I still need to find a way to insert my block as a replacement pattern in the cleanup routine:
p "0177/3854901".gsub(/(\d+)\D?(\d+)\D?(\d+)\D?+(\d+)/){ |m| "#{$1}#{$2}#{$3}#{$4}".length==10 ? "#{$1}#{$2}#{$3}#{$4}":m}
#=> "0177/3854901" isn't replaced because it has 11 digits
p "0177/385490".gsub(/(\d+)\D?(\d+)\D?(\d+)\D?+(\d+)/){ |m| "#{$1}#{$2}#{$3}#{$4}".length==10 ? "#{$1}#{$2}#{$3}#{$4}":m}
#=> "0177385490"
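One way to fit that block into the cleanup routine is to let :replace_with be either a string or a lambda, and to call gsub with a block whenever it is a lambda. This is a minimal sketch under that assumption: the lambda-valued :replace_with and the join_if_ten name are hypothetical, only one of the original patterns is kept for brevity, and unlike the original it returns a new string instead of mutating its argument with gsub!.

```ruby
# Sketch: :replace_with may be a String (passed straight to gsub) or a
# lambda (hypothetical convention) that receives the MatchData.
def cleanup(text)
  # Join the four digit runs only if they add up to exactly 10 digits;
  # otherwise return the matched text unchanged.
  join_if_ten = lambda do |m|
    digits = m.captures.join
    digits.length == 10 ? digits : m[0]
  end

  replacements = [
    { pattern: /(04\d{2}) (\d{2}) (\d{2}) (\d{2})/, replace_with: '\1\2\3\4' },
    { pattern: /(\d+)\D?(\d+)\D?(\d+)\D?(\d+)/, replace_with: join_if_ten }
  ]

  replacements.each do |r|
    replacement = r[:replace_with]
    text = if replacement.respond_to?(:call)
             # Block form: gsub sets the last match before each yield.
             text.gsub(r[:pattern]) { replacement.call(Regexp.last_match) }
           else
             text.gsub(r[:pattern], replacement)
           end
  end
  text
end

p cleanup("0177/385490")  # => "0177385490"
p cleanup("0177/3854901") # => "0177/3854901" (11 digits, left alone)
```

Keeping the lambdas in the same array as the string patterns means new rules can still be added as one-line entries.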

Related

How do I use 'gsub' to make multiple substitutions?

I have a string that only contains one number on either side of "-", like:
"1-3"
I want to get a result like
"01-03"
If the string had two numbers on one side of the dash like:
"1-10"
then I don't want to make any substitutions. I could do a gsub expression like
str.gsub!(/(^|[^\d]])\d[[:space:]]*\-[[:space:]]*\d([^\d]|$)/, '\1')
but I'm not clear how to do it if there are multiple (e.g. two) things to substitute.
You could probably get away with this:
def dashreplace(str)
  # gsub (rather than sub) pads every single-digit pair, not only the first one
  str.gsub(/\b(\d)\-(\d)\b/) do
    '%02d-%02d' % [ $1.to_i, $2.to_i ]
  end
end
dashreplace('1-2')
# => "01-02"
dashreplace('1-20')
# => "1-20"
dashreplace('99-1,2-3')
# => "99-1,02-03"
Is there really a need to use regex here, at all? Seems like an over-complication to me. Assuming you know the string will be in the format: <digits><hyphen><digits>, you could do:
def pad_digits(string)
  left_digits, right_digits = string.split('-')
  if left_digits.length > 1 || right_digits.length > 1
    string
  else
    "%02d-%02d" % [left_digits, right_digits]
  end
end
pad_digits("1-3") # => "01-03"
pad_digits("1-10") # => "1-10"
This is a variant of Tom Lord's answer.
def pad_single_digits(str)
  str.size > 3 ? str : "0%d-0%d" % str.split('-')
end
pad_single_digits "1-3" #=> "01-03"
pad_single_digits "1-10" #=> "1-10"
"0%s-0%s" also works.
You can do:
def nums(s)
  s[/^(\d)(\D+)(\d)$/] ? '0%s%s0%s' % [$1,$2,$3] : s
end
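Because gsub runs its block once per match, the same idea extends to several pairs in one string. A sketch (pad_all_pairs is a hypothetical name); it uses the lookarounds (?<!\d) and (?!\d) instead of \b, which assumes a digit counts as isolated whenever it has no digit neighbour, even if it sits next to a letter:

```ruby
# Pad every single-digit "a-b" pair in the string; pairs where either
# side has more than one digit are left untouched.
def pad_all_pairs(str)
  # (?<!\d) / (?!\d): the digit must not be preceded / followed by another digit
  str.gsub(/(?<!\d)(\d)-(\d)(?!\d)/) { format('%02d-%02d', $1.to_i, $2.to_i) }
end

p pad_all_pairs('1-3')            # => "01-03"
p pad_all_pairs('1-10')           # => "1-10"
p pad_all_pairs('1-3, 4-5, 6-10') # => "01-03, 04-05, 6-10"
```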

Ruby regex checks string for variations of pattern of same length

I was wondering how to construct a regular expression that checks whether a string contains a variation of a pattern with the same length. Say the string is "door boor robo omanyte"; how do I return the words that are variations of [door]?
You can easily get all the possible words using Array#permutation, then scan for them in the provided string:
possible_words = %w[d o o r].permutation.map &:join
# => ["door", "doro", "door", "doro", "droo", "droo", "odor", "odro", "oodr", "oord", "ordo", "orod", "odor", "odro", "oodr", "oord", "ordo", "orod", "rdoo", "rdoo", "rodo", "rood", "rodo", "rood"]
string = "door boor robo omanyte"
string.scan(Regexp.union(possible_words))
# => ["door"]
string = "door close rood example ordo"
string.scan(Regexp.union(possible_words))
# => ["door", "rood", "ordo"]
UPDATE
You can improve scan further by looking for word boundary. Here:
string = "doorrood example ordo"
string.scan(/\b#{possible_words.join('\b|\b')}\b/)
# => ["ordo"]
NOTE
As Cary correctly pointed out in the comments, this approach is quite inefficient for longer words, since a word of n letters has n! permutations. However, it should work fine for the OP's example.
If the comment I left on your question correctly interprets the question, you could do this:
str = "door sit its odor to"
str.split
.group_by { |w| w.chars.sort.join }
.values
.select { |a| a.size > 1 }
#=> [["door", "odor"], ["sit", "its"]]
This assumes all the letters are the same case.
If case is not important, just make a small change:
str = "dooR sIt itS Odor to"
str.split
.group_by { |w| w.downcase.chars.sort.join }
.values
.select { |a| a.size > 1 }
#=> [["dooR", "Odor"], ["sIt", "itS"]]
In my opinion the fastest way to check this is
word_a.chars.sort == word_b.chars.sort
since both words must use exactly the same characters.
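Applied to the original question, that comparison can select the matching words directly, without building a regex at all. A sketch; anagrams_in is a hypothetical helper name, and it assumes words are separated by whitespace and that case matters:

```ruby
# Return the words in str that are rearrangements of target's letters.
def anagrams_in(str, target)
  key = target.chars.sort
  str.split.select { |word| word.chars.sort == key }
end

p anagrams_in("door boor robo omanyte", "door")       # => ["door"]
p anagrams_in("door close rood example ordo", "door") # => ["door", "rood", "ordo"]
```

Sorting each word once is linear in the number of words, so it avoids the n! blow-up of generating permutations.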
IMO, some kind of iteration is definitely needed to build a regular expression that matches this, and not using a regular expression at all may be better.
def variations_of_substr(str, sub)
  # Creates a regexp matching words of the same length as sub,
  # made only of characters that appear in sub.
  patt = "\\b" + ( [ "[#{sub}]{1}" ] * sub.size ).join + "\\b"
  # The above alone isn't enough: the characters in both words
  # must match exactly.
  str.scan( Regexp.new(patt) ).select do |m|
    m.chars.sort == sub.chars.sort
  end
end
variations_of_substr("door boor robo omanyte", "door")
# => ["door"]

Splitting a string into an array of 3-character chunks from the end

I'm looking to split a random numeric string like "12345567" into the array ["12","345","567"] as simply as possible. Basically, I want to turn a number into a human-readable array split at thousands, millions, billions, etc.
My previous solution cuts it from the front rather than the back:
"12345567".scan(/.{1,3}/)
#=> ["123","455","67"]
If you are on Rails, you can use the number_with_delimiter helper. In plain Ruby, you can include it.
require 'action_view'
require 'action_view/helpers'
include ActionView::Helpers::NumberHelper
number_with_delimiter("12345567", :delimiter => ',')
# => "12,345,567"
You can then split the result on the comma to get an Array.
You could try the following:
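Without Rails, a similar result can be sketched in plain Ruby with a lookahead that inserts a comma after every digit followed by a multiple of three digits. delimit is a hypothetical name, and the sketch assumes the input is a non-negative integer or a string of digits only (no decimal point):

```ruby
# Insert a comma after each digit that is followed by 3, 6, 9, ... digits.
def delimit(number)
  number.to_s.gsub(/(\d)(?=(?:\d{3})+\z)/, '\1,')
end

p delimit("12345567")            # => "12,345,567"
p delimit("12345567").split(',') # => ["12", "345", "567"]
```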
> "12345567".scan(/\d+?(?=(?:\d{3})*$)/)
=> ["12", "345", "567"]
\d+? does a non-greedy match of one or more digits, as long as what follows is a multiple of three digits (possibly zero) and then the end of the line. Broken down:
\d+? matches one or more digits, non-greedily.
(?=...) is a positive lookahead assertion: the match must be followed by
(?:\d{3})*, groups of exactly three digits, zero or more times. This matches an empty string, 111, 111111, and so on (multiples of 3).
$ is the end-of-line anchor; it matches the position at the end of the line.
OR
> "12345567".scan(/.{1,3}(?=(?:.{3})*$)/)
=> ["12", "345", "567"]
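The lookahead version does not depend on the input having 8 digits, unlike a pattern that hard-codes the size of the first group. A quick sketch wrapping it in a method (group_thousands is a hypothetical name; it assumes a digits-only string):

```ruby
# Split a digit string into groups of three, counted from the right.
def group_thousands(digits)
  digits.scan(/\d+?(?=(?:\d{3})*$)/)
end

p group_thousands("12345567") # => ["12", "345", "567"]
p group_thousands("1234")     # => ["1", "234"]
p group_thousands("123456")   # => ["123", "456"]
```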
Here's one non-regex solution:
s = "12345567"
sz = s.size
n_first = sz % 3
((n_first>0) ? [s[0,n_first]] : []) + (n_first...sz).step(3).map { |i| s[i,3] }
#=> ["12", "345", "567"]
Another:
s.reverse.chars.each_slice(3).map { |a| a.join.reverse }.reverse
#=> ["12", "345", "567"]
A recursive approach:
def split(str)
  str.size <= 3 ? [str] : (split(str[0..-4]) + [str[-3..-1]])
end
Hardly readable, though. Perhaps a more explicit code layout:
def split(str)
  if str.size <= 3
    [str] # Too short, keep it all.
  else
    split(str[0..-4]) + [str[-3..-1]] # Append the last 3, and recurse on the head.
  end
end
Disclaimer: I haven't tested performance at all (nor tried to make it properly tail-recursive)! It's just an alternative to explore.
It's hard to tell what you want, but maybe:
"12345567".scan(/^..|.{1,3}/)
=> ["12", "345", "567"]

Searching for vowels in Ruby

str = "Find the vowels in this string or else I'll date your sister"
I am looking to count the number of vowels in a string. I believe I have achieved this, but I did it by appending each vowel to an array and taking the length of the array. What's a more common way to do this? Maybe with +=?
x = []
str.chars.to_a.each do |i|
  if i =~ /[aeiou]/
    x.push(i)
  end
end
x.length
But here is an even better answer =). It turns out that there is a String#count method:
str.downcase.count 'aeiou'
#=> 17
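String#count takes tr-style character sets, so uppercase vowels can be listed directly instead of calling downcase, and when several sets are given it counts their intersection. A small sketch; treating y as a consonant is an assumption:

```ruby
str = "Find the vowels in this string or else I'll date your sister"

# Upper- and lowercase vowels listed in one set.
vowels = str.count('aeiouAEIOU')

# Two sets are intersected: lowercase letters AND not a vowel.
consonants = str.downcase.count('a-z', '^aeiou')

puts vowels     # 17
puts consonants # 31
```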
If you want to count the vowels, why not use count:
str.chars.count {|c| c =~ /[aeiou]/i }
Use scan
"Find the vowels in this string or else I'll date your sister".scan(/[aeiou]/i).length
No need to:
str.chars.to_a
In fact, str.chars already returns an Array (since Ruby 2.0):
> String.new.chars.class
=> Array
Refactoring a little
x = []
str.chars.each { |i| x << i if i =~ /[aeiou]/ }
x.length
But maybe an alternative to the best solution could be:
str.chars.map{|x| x if x.match(/[aeiouAEIOU]/)}.join.size
The map block is worth knowing about because you can do something more useful inside it, whereas count only counts.
Without doubt, the best solution for counting the vowels in a string with a block is:
str.chars.count {|c| c =~ /[aeiou]/i }
There are shorter incarnations.
$ irb
>> "Find the vowels in this string or else I'll date your sister".gsub(/[^aeiou]/i, '').length
=> 17
Here's a way that uses String#tr:
str = "Find the vowels in this string or else I'll date your sister"
str.size - str.tr('aeiouAEIOU','').size #=> 17
or
str.size - str.downcase.tr('aeiou','').size #=> 17

Ruby, remove last N characters from a string?

What is the preferred way of removing the last n characters from a string?
irb> 'now is the time'[0...-4]
=> "now is the "
If the characters you want to remove are always the same characters, then consider chomp:
'abc123'.chomp('123') # => "abc"
The advantages of chomp are: no counting, and the code more clearly communicates what it is doing.
With no arguments, chomp removes the DOS or Unix line ending, if either is present:
"abc\n".chomp # => "abc"
"abc\r\n".chomp # => "abc"
From the comments, there was a question of the speed of using #chomp versus using a range. Here is a benchmark comparing the two:
require 'benchmark'
S = 'asdfghjkl'
SL = S.length
T = 10_000
A = 1_000.times.map { |n| "#{n}#{S}" }
GC.disable
Benchmark.bmbm do |x|
  x.report('chomp') { T.times { A.each { |s| s.chomp(S) } } }
  x.report('range') { T.times { A.each { |s| s[0...-SL] } } }
end
Benchmark results (using CRuby 2.1.3p242):
Rehearsal -----------------------------------------
chomp 1.540000 0.040000 1.580000 ( 1.587908)
range 1.810000 0.200000 2.010000 ( 2.011846)
-------------------------------- total: 3.590000sec
user system total real
chomp 1.550000 0.070000 1.620000 ( 1.610362)
range 1.970000 0.170000 2.140000 ( 2.146682)
So chomp is faster than using a range, by ~22%.
Ruby 2.5+
As of Ruby 2.5 you can use delete_suffix or delete_suffix! to achieve this in a fast and readable manner.
See the Ruby documentation for String#delete_suffix and String#delete_suffix!.
If you know what the suffix is, this is idiomatic (and I'd argue, even more readable than other answers here):
'abc123'.delete_suffix('123') # => "abc"
'abc123'.delete_suffix!('123') # => "abc"
It's also significantly faster than the top answer (with the bang method, about 30% less time than chomp in the numbers below). Here's the result of the same benchmark:
user system total real
chomp 0.949823 0.001025 0.950848 ( 0.951941)
range 1.874237 0.001472 1.875709 ( 1.876820)
delete_suffix 0.721699 0.000945 0.722644 ( 0.723410)
delete_suffix! 0.650042 0.000714 0.650756 ( 0.651332)
I hope this is useful. Note that the method doesn't currently accept a regex, so if you don't know the suffix it isn't viable for the time being. However, since the accepted answer (at the time of writing) makes the same assumption, I thought this might be useful to some people.
str = str[0..-1-n]
Unlike the [0...-n], this handles the case of n=0.
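To see the n = 0 difference, here is a sketch wrapping the expression in a method; drop_last is a hypothetical name:

```ruby
# Keep everything except the last n characters.
# 0..(-1 - n) instead of 0...-n: with n == 0, -n is 0 and 0...0 is empty.
def drop_last(str, n)
  str[0..-1 - n]
end

p drop_last("now is the time", 4) # => "now is the "
p drop_last("now is the time", 0) # => "now is the time"
p "now is the time"[0...-0]       # => "" (the n = 0 pitfall)
```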
I would suggest chop. I think it has been mentioned in one of the comments but without links or explanations so here's why I think it's better:
It simply removes the last character from a string and you don't have to specify any values for that to happen.
If you need to remove more than one character, then chomp is your best bet. This is what the Ruby docs have to say about chop:
Returns a new String with the last character removed. If the string
ends with \r\n, both characters are removed. Applying chop to an empty
string returns an empty string. String#chomp is often a safer
alternative, as it leaves the string unchanged if it doesn’t end in a
record separator.
Although this is mostly used to remove separators such as \r\n, I've used it to remove the last character from a simple string, for example the s to make a word singular.
name = "my text"
x = 2 # number of characters to remove
x.times do name.chop! end
Here in the console:
>name = "Nabucodonosor"
=> "Nabucodonosor"
> 7.times do name.chop! end
=> 7
> name
=> "Nabuco"
Dropping the last n characters is the same as keeping the first length - n characters.
Active Support includes String#first and String#last methods which provide a convenient way to keep or drop the first/last n characters:
require 'active_support/core_ext/string/access'
"foobarbaz".first(3) # => "foo"
"foobarbaz".first(-3) # => "foobar"
"foobarbaz".last(3) # => "baz"
"foobarbaz".last(-3) # => "barbaz"
If you are using Rails, try:
"my_string".last(2) # => "ng"
[EDITED]
To get the string WITHOUT the last 2 chars:
n = "my_string".size
"my_string"[0..n-3] # => "my_stri"
Note: the last string char is at n-1. So, to remove the last 2, we use n-3.
Check out the slice() method:
http://ruby-doc.org/core-2.5.0/String.html#method-i-slice
You can always use something like
"string".sub!(/.{X}$/,'')
Where X is the number of characters to remove.
Or with assigning/using the result:
myvar = "string"[0..-X]
where X is one more than the number of characters to remove.
If you're OK with monkey-patching String and want the characters you chop off, try this:
class String
  def chop_multiple(amount)
    amount.times.inject([self, '']){ |(s, r)| [s.chop, r.prepend(s[-1])] }
  end
end
hello, world = "hello world".chop_multiple 5
hello #=> 'hello '
world #=> 'world'
Using regex:
str = 'string'
n = 2 #to remove last n characters
str[/\A.{#{str.size-n}}/] #=> "stri"
x = "my_test"
last_char = x.split('').last
