How to (easily) split a string in Ruby by length and also by delimiter

I need to split a string in Ruby which has the following format:
[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]
i.e. it is a generated JavaScript array. Unfortunately the list is long, and I would like to split it at the comma that separates array elements once a line reaches a specific length suitable for editing in a code editor, while keeping each element intact. For example, the line above with a split width of 15 would become:
[{a:1,b:2,c:3,d:4},
{a:5,b:6,c:7,d:8},
{a:9,b:10,c:11,d:12},
{a:13,b:14,c:15,d:16}]
and with a width of 32 the text would be:
[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},
{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]
Besides the classical "brute force" approach (loop through, check for the separator between } and { while increasing the length, split once the length is greater than the limit and a valid separator is found), is there a more "Rubyish" solution to the problem?
Edit: Naive approach attached; definitely not Rubyish, as I don't have a very strong Ruby background:
def split(what, length)
  str = what.to_s
  result = []
  start = 0 # index where the current line begins
  str.each_char.with_index do |c, i|
    # once the current line is longer than `length`, break after the
    # next '}' and keep the following ',' (or ']') on the same line
    if c == '}' && i - start + 1 > length
      result << str[start..i + 1]
      start = i + 2
    end
  end
  result << str[start..-1] if start < str.length # leftover tail
  result.join("\n")
end

You could use a regex with:
a non-greedy repetition of at least width-2 characters
followed by a }
followed by a , or a ].
data = "[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16},{a:17,b:18,c:19,d:20}]"
def split_data_with_min_width(text, width)
pattern = /
( # capturing group for split
.{#{width-2},}? # at least width-2 characters, but not more than needed
\} # closing curly brace
[,\]] # a comma or a closing bracket
)
/x # free spacing mode
text.split(pattern).reject(&:empty?).join("\n")
end
puts split_data_with_min_width(data, 15)
# [{a:1,b:2,c:3,d:4},
# {a:5,b:6,c:7,d:8},
# {a:9,b:10,c:11,d:12},
# {a:13,b:14,c:15,d:16},
# {a:17,b:18,c:19,d:20}]
puts split_data_with_min_width(data, 32)
# [{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},
# {a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16},
# {a:17,b:18,c:19,d:20}]
The method uses split with a capturing group instead of scan because the last part of the string might not be long enough:
"abcde".scan(/../)
# ["ab", "cd"]
"abcde".split(/(..)/).reject(&:empty?)
# ["ab", "cd", "e"]

Code
def doit(str, min_size)
r = /
(?: # begin non-capture group
.{#{min_size},}? # match at least min_size characters, non-greedily
(?=\{) # match '{' in a positive lookahead
| # or
.+\z # match one or more characters followed by end of string
) # close non-capture group
/x # free-spacing regex definition mode
str.scan(r)
end
Examples
str = "[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]"
doit(str, 18) # same for 2 <= min_size <= 18
#=> ["[{a:1,b:2,c:3,d:4},",
# "{a:5,b:6,c:7,d:8},",
# "{a:9,b:10,c:11,d:12},",
# "{a:13,b:14,c:15,d:16}]"]
doit(str, 19)
#=> ["[{a:1,b:2,c:3,d:4},",
# "{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},",
# "{a:13,b:14,c:15,d:16}]"]
doit(str, 20)
#=> ["[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},",
# "{a:9,b:10,c:11,d:12},",
# "{a:13,b:14,c:15,d:16}]"]
doit(str, 21)
#=> ["[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},",
# "{a:9,b:10,c:11,d:12},",
# "{a:13,b:14,c:15,d:16}]"]
doit(str, 22) # same for 23 <= min_size <= 37
#=> ["[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},",
# "{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]"]
doit(str, 38) # same for 39 <= min_size <= 58
#=> ["[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},",
# "{a:13,b:14,c:15,d:16}]"]
doit(str, 59) # same for min_size > 59
#=> ["[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]"]

Like this?
2.3.1 :007 > a
=> "[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]"
2.3.1 :008 > q = a.gsub("},", "},\n")
=> "[{a:1,b:2,c:3,d:4},\n{a:5,b:6,c:7,d:8},\n{a:9,b:10,c:11,d:12},\n{a:13,b:14,c:15,d:16}]"
2.3.1 :009 > puts q
[{a:1,b:2,c:3,d:4},
{a:5,b:6,c:7,d:8},
{a:9,b:10,c:11,d:12},
{a:13,b:14,c:15,d:16}]
=> nil
2.3.1 :010 >
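The gsub answer above breaks after every element and ignores the requested width. A width-aware variant of the same idea is sketched below, assuming elements are always separated by "}," and contain no nested braces: split after each "}," (a zero-width lookbehind keeps the comma on its element), then append elements to the current line until that line is longer than the width. The method name wrap_elements is hypothetical.

```ruby
# Hypothetical helper: wraps a flat array literal of {...} objects onto
# lines, starting a new line once the current one exceeds `width`.
# Assumes elements are separated by "}," and contain no nested braces.
def wrap_elements(str, width)
  # The zero-width lookbehind splits *after* each "}," so nothing is lost.
  elements = str.split(/(?<=\},)/)
  lines = elements.each_with_object([]) do |elem, acc|
    if acc.empty? || acc.last.length > width
      acc << elem.dup   # start a new line
    else
      acc.last << elem  # keep filling the current line
    end
  end
  lines.join("\n")
end

data = "[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]"
puts wrap_elements(data, 15)
puts wrap_elements(data, 32)
```

With widths 15 and 32 this should reproduce the two layouts given in the question, since the width is compared against the whole line built so far, matching the question's "split once the length is exceeded" rule.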

Related

How to format Phone string in Ruby with hyphens?

I have a problem I can't solve.
I need to write a phone_format method that accepts any phone string and outputs its digits in groups of 3, separated by hyphens:
phone_format("555 123 1234") => "555-123-12-34"
phone_format("(+1) 888 33x19") => "188-833-19"
But if it would end with a single digit, like 999-9, it should become 99-99 instead. Ideally it would be a one-liner.
R = /
\d{2,3} # match 2 or 3 digits (greedily)
(?= # begin positive lookahead
\d{2,3} # match 2 or 3 digits
| # or
\z # match the end of the string
) # end positive lookahead
/x # free-spacing regex definition mode
Conventionally written
R = /\d{2,3}(?=\d{2,3}|\z)/
def doit(str)
s = str.gsub(/\D/,'')
return s if s.size < 4
s.scan(R).join('-')
end
doit "555 123 123"
#=> "555-123-123"
doit "555 123 1234"
#=> "555-123-12-34"
doit "555 123 12345"
#=> "555-123-123-45"
doit "(+1) 888 33x19"
#=> "188-833-19"
doit "123"
#=> "123"
doit "1234"
#=> "12-34"
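Since scan already returns "12" or "123" unchanged for two- or three-digit strings, the size < 4 guard in doit only matters for single-digit input, so the one-liner the question asks for is nearly possible. A sketch folding doit into one expression, with the caveat that a single digit now yields an empty string:

```ruby
# One-line version of the scan-based method above. The size < 4 guard is
# dropped: scan already returns "12" or "123" for 2- or 3-digit strings,
# but a 1-digit input now yields "" because \d{2,3} needs two digits.
def phone_format(str)
  str.gsub(/\D/, '').scan(/\d{2,3}(?=\d{2,3}|\z)/).join('-')
end

puts phone_format("555 123 1234")
puts phone_format("(+1) 888 33x19")
puts phone_format("1234")
```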
Not really a one-liner: you need to handle the special cases.
def cut_by(str, cut)
str.each_char.each_slice(cut).map(&:join).join('-')
end
def phone_format(str)
str = str.gsub(/\D/, '') # cleanup
return str if str.size < 4 # too short to group (a single digit would make str[-4..] nil below)
if str.size == 4 # special case 1
cut_by(str, 2)
elsif str.size % 3 == 1 # special case 2
cut_by(str[0..-5], 3) + "-" + cut_by(str[-4..], 2)
else # normal case
cut_by(str, 3)
end
end

Find nth occurrence of variable regex in Ruby?

I'm writing a method that does what the title says: find the index of the nth occurrence of a particular left bracket chosen by the user. For example, if the user provides a string plus the parameters '{' and 5, it should find the 5th occurrence of '{'; likewise for '(' and '['.
I'm currently doing it with a while loop that compares each character, but this looks ugly and isn't very interesting. Is there a way to do this with a regex? Can you use a variable in a regex?
def _find_bracket_n(str,left_brac,brackets_num)
i = 0
num_of_left_bracs = 0
while i < str.length && num_of_left_bracs < brackets_num
num_of_left_bracs += 1 if str[i] == left_brac
i += 1
end
n_th_lbrac_index = i - 1
end
The offset of the nth instance of a given character in a string is wanted, or nil if the string contains fewer than n instances of that character. I will give four solutions.
chr = "("
str = "a(b(cd((ef(g(hi("
n = 5
Use Enumerable#find_index
count = 0
str.each_char.find_index { |c| c == chr && (count += 1) == n }
#=> 10
Use a regular expression
chr_esc = Regexp.escape(chr)
#=> "\\("
r = /
\A # match the beginning of the string
(?: # begin a non-capture group
.*? # match zero or more characters lazily
#{chr_esc} # match the given character
) # end the non-capture group
{#{n-1}} # perform the non-capture group `n-1` times
.*? # match zero or more characters lazily
#{chr_esc} # match the given character
/x # free-spacing regex definition mode
#=> /
\A # match the beginning of the string
(?: # begin a non-capture group
.*? # match zero or more characters lazily
\( # match the given character
) # end the non-capture group
{4} # perform the non-capture group `n-1` times
.*? # match zero or more characters lazily
\( # match the given character
/x
str =~ r
#=> 0
$~.end(0)-1
#=> 10
For the last line we could instead write
Regexp.last_match.end(0)-1
See Regexp::escape, Regexp::last_match and MatchData#end.
Written conventionally (i.e., not in free-spacing mode), the regex is as follows.
/\A(?:.*?#{chr_esc}){#{n-1}}.*?#{chr_esc}/
Convert characters to offsets, remove offsets to non-matching characters and return the nth offset of those that remain
str.size.times.select { |i| str[i] == chr }[n-1]
#=> 10
n = 20
str.size.times.select { |i| str[i] == chr }[n-1]
#=> nil
Use String#index repeatedly to decapitate substrings
n = 5
s = str.dup
off = n.times.reduce(0) do |acc, _|
  i = s.index(chr)
  break nil if i.nil? # fewer than n occurrences
  s = s[i+1..-1]
  acc + i + 1
end
off && off - 1
#=> 10
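A related sketch, not one of the four solutions above: String#index also accepts a start offset, so each search can resume just past the previous hit instead of building substrings. Because chr is passed as a plain string rather than a regex, no Regexp.escape is needed. The method name nth_index is hypothetical.

```ruby
# Hypothetical helper: offset of the nth occurrence of chr in str, or nil.
# String#index(substring, offset) resumes each search just past the
# previous hit, so no substrings are copied.
def nth_index(str, chr, n)
  return nil if n < 1
  pos = -1
  n.times do
    pos = str.index(chr, pos + 1)
    return nil if pos.nil? # fewer than n occurrences
  end
  pos
end

str = "a(b(cd((ef(g(hi("
p nth_index(str, "(", 5)
p nth_index(str, "(", 20)
```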

How to mask all but last four characters in a string

I've been attempting a coding exercise to mask all but the last four digits or characters of any input.
I think my solution works but it seems a bit clumsy. Does anyone have ideas about how to refactor it?
Here's my code:
def mask(string)
z = string.to_s.length
if z <= 4
return string
elsif z > 4
array = []
string1 = string.to_s.chars
string1[0..((z-1)-4)].each do |s|
array << "#"
end
array << string1[(z-4)..(z-1)]
puts array.join(", ").delete(", ").inspect
end
end
positive lookahead
A positive lookahead makes it pretty easy. If any character is followed by at least 4 characters, it gets replaced:
"654321".gsub(/.(?=.{4})/,'#')
# "##4321"
Here's a description of the regex:
r = /
. # Just one character
(?= # which must be followed by
.{4} # 4 characters
) #
/x # free-spacing mode, allows comments inside regex
Note that the regex only matches one character at a time, even though it needs to check up to 5 characters for each match:
"654321".scan(r)
# => ["6", "5"]
/(.)..../ wouldn't work, because it would consume 5 characters for each iteration:
"654321".scan(/(.)..../)
# => [["6"]]
"abcdefghij".scan(/(.)..../)
# => [["a"], ["f"]]
If you want to parametrize the length of the unmasked string, you can use variable interpolation:
all_but = 4
/.(?=.{#{all_but}})/
# => /.(?=.{4})/
Code
Packing it into a method, it becomes:
def mask(string, all_but = 4, char = '#')
string.gsub(/.(?=.{#{all_but}})/, char)
end
p mask('testabcdef')
# '######cdef'
p mask('1234')
# '1234'
p mask('123')
# '123'
p mask('x')
# 'x'
You could also adapt it for sentences:
def mask(string, all_but = 4, char = '#')
string.gsub(/\w(?=\w{#{all_but}})/, char)
end
p mask('It even works for multiple words')
# "It even #orks for ####iple #ords"
Some notes about your code
string.to_s
Naming things is very important in programming, especially in dynamic languages.
string.to_s
If string is indeed a string, there shouldn't be any reason to call to_s.
If string isn't a string, you should indeed call to_s before gsub, but you should also rename string to something more descriptive:
object.to_s
array.to_s
whatever.to_s
join
puts array.join(", ").delete(", ").inspect
What do you want to do exactly? You could probably just use join:
[1,2,[3,4]].join(", ").delete(", ")
# "1234"
[1,2,[3,4]].join
# "1234"
delete
Note that .delete(", ") deletes every comma and every space, in any order. It doesn't only delete ", " substrings:
",a b,,, cc".delete(', ')
# "abcc"
["1,2", "3,4"].join(', ').delete(', ')
# "1234"
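To make the contrast concrete: String#delete treats its argument as a set of characters, whereas String#gsub with a string pattern removes only that exact substring. A short comparison on the same input as above:

```ruby
# String#delete treats ", " as the character set {',', ' '};
# String#gsub with a string pattern removes only the literal ", ".
s = ",a b,,, cc"
p s.delete(", ")    # removes every ',' and every ' '
p s.gsub(", ", "")  # removes only the single ", " before "cc"
```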
Ruby makes this sort of thing pretty trivial:
class String
def asteriskify(tail = 4, char = '#')
if (length <= tail)
self
else
char * (length - tail) + self[-tail, tail]
end
end
end
Then you can apply it like this:
"moo".asteriskify
# => "moo"
"testing".asteriskify
# => "###ting"
"password".asteriskify(5, '*')
# => "***sword"
Try this one
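def mask(string)
  return string if string.length <= 4 # '#' * negative would raise
  masked = string.dup # don't modify the caller's string
  masked[0..-5] = '#' * (masked.length - 4)
  masked
end
mask("12345678")
=> "####5678"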
I will add my solution to this topic too :)
def mask(str)
str.match(/(.*)(.{4})/)
'#' * ($1 || '').size + ($2 || str)
end
mask('abcdef') # => "##cdef"
mask('x') # => "x"
I offer this solution mainly to remind readers that String#gsub, when given a pattern but no replacement and no block, returns an enumerator.
def mask(str, nbr_unmasked, mask_char)
str.gsub(/./).with_index { |s,i| i < str.size-nbr_unmasked ? mask_char : s }
end
mask("abcdef", 4, '#')
#=> "##cdef"
mask("abcdef", 99, '#')
#=> "abcdef"
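The same enumerator idiom works for any position-dependent replacement. A small sketch (the upcasing rule is only an illustration) showing that the block's return value becomes each match's replacement:

```ruby
# gsub with a pattern but no replacement and no block returns an Enumerator;
# with_index then supplies each match's position, and the block's return
# value becomes the replacement for that match.
enum = "abcdef".gsub(/./)
p enum.class
p enum.with_index { |c, i| i.even? ? c.upcase : c }
```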
Try using tap
def mask_string(str)
  # tap yields str itself, so this modifies the caller's string in place
  str.tap { |p| p[0...-4] = '#' * p[0...-4].length } if str.length > 4
  str
end
mask_string('ABCDEF') # => "##CDEF"
mask_string('AA') # => "AA"
mask_string('S') # => "S"

Removing trailings zeros in string

I have a string and I need to remove trailing zeros after the 2nd decimal place:
remove_zeros("1,2,3,4.2300") #=> "1,2,3,4.23"
remove_zeros("1,2,3,4.20300") #=> "1,2,3,4.203"
remove_zeros("1,2,3,4.0200") #=> "1,2,3,4.02"
remove_zeros("1,2,3,4.0000") #=> "1,2,3,4.00"
Missing zeros don't have to be appended, i.e.
remove_zeros("1,2,3,4.0") #=> "1,2,3,4.0"
How could I do this in Ruby? I tried converting to Float, but the conversion stops at the first ','. Can I write a regular expression for this?
Yes, a regular expression could be used.
R = /
\. # match a decimal point
\d*? # match zero or more digits lazily
\K # forget all matches so far
0+ # match one or more zeroes
(?!\d) # do not match a digit (negative lookahead)
/x # free-spacing regex definition mode
def truncate_floats(str)
str.gsub(R,"")
end
truncate_floats "1,2,3,4.2300"
#=> "1,2,3,4.23"
truncate_floats "1.34000,2,3,4.23000"
#=> "1.34,2,3,4.23"
truncate_floats "1,2,3,4.23003500"
#=> "1,2,3,4.230035"
truncate_floats "1,2,3,4.3"
#=> "1,2,3,4.3"
truncate_floats "1,2,3,4.000"
#=> "1,2,3,4."
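As the last example shows, the regex above strips every trailing zero ("4.000" becomes "4."), while the question asks to keep at least two decimal places. A sketch of a variant, assuming that "keep two decimals" is the intended rule: requiring two digits after the decimal point before any removable zeros gives the question's expected outputs. The constant name TRAILING_ZEROS is hypothetical; remove_zeros is the question's own method name.

```ruby
# Variant of the regex above (assumed rule: keep at least two decimals).
# The \d{2} forces two decimal digits to be kept, so "4.0000" becomes "4.00".
TRAILING_ZEROS = /
  \.      # match a decimal point
  \d{2}   # match the first two decimal digits, which are always kept
  \d*?    # match zero or more further digits lazily
  \K      # forget the match so far; only the zeros below are replaced
  0+      # match one or more zeros
  (?!\d)  # not followed by a digit, i.e. the zeros are trailing
/x

def remove_zeros(str)
  str.gsub(TRAILING_ZEROS, "")
end

p remove_zeros("1,2,3,4.2300")
p remove_zeros("1,2,3,4.0000")
p remove_zeros("1,2,3,4.0")
```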
> a = "1,2,3,4.2300"
> a.split(",").map{|e| e.include?(".") ? e.to_f : e}.join(",")
#=> "1,2,3,4.23"
> a = "1,2,3,4.20300"
> a.split(",").map{|e| e.include?(".") ? e.to_f : e}.join(",")
#=> "1,2,3,4.203"
First, you need to parse the string into its component numbers, then remove the trailing zeros on each number. This can be done by:
1) splitting the string on ',' to get an array of numeric strings
2) for each numeric string, convert it to a Float, then back to a string:
#!/usr/bin/env ruby
def parse_and_trim(string)
number_strings = string.split(',')
number_strings.map { |s| Float(s).to_s }.join(',')
end
p parse_and_trim('1,2,3,4.2300') # => "1.0,2.0,3.0,4.23"
If you really want to remove the trailing '.0' fragments, you could replace the script with this one:
#!/usr/bin/env ruby
def parse_and_trim_2(string)
original_strings = string.split(',')
converted_strings = original_strings.map { |s| Float(s).to_s }
trimmed_strings = converted_strings.map do |s|
s.end_with?('.0') ? s[0..-3] : s
end
trimmed_strings.join(',')
end
p parse_and_trim_2('1,2,3,4.2300') # => "1,2,3,4.23"
These could of course be made more concise, but I've used intermediate variables to clarify what's going on.
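For instance, a more concise sketch of parse_and_trim_2, assuming Ruby 2.5 or later for String#delete_suffix (the method name parse_and_trim_concise is hypothetical). Like the original, it turns "4.0000" into "4" rather than the "4.00" the question asks for.

```ruby
# Concise form of parse_and_trim_2: Float#to_s prints "1.0" for 1, and
# String#delete_suffix (Ruby 2.5+) drops that trailing ".0".
def parse_and_trim_concise(string)
  string.split(',').map { |s| Float(s).to_s.delete_suffix('.0') }.join(',')
end

p parse_and_trim_concise('1,2,3,4.2300')
```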

How to write a regex in a single line

I have this code:
str = 'printf("My name is %s and age is %0.2d", name, age);'
SPECIFIERS = 'diuXxofeEgsc'
format_specifiers = /((?:%(?:\*?([-+]?\d*\.?\d+)*(?:[#{SPECIFIERS}]))))/i
variables = /([.[^"]]*)\);$/
format = str.scan(format_specifiers)
var = str.scan(variables).first.first.split(/,/)
Is there any way a single regex can do that in a couple of lines?
My desired output is:
%s, name
%0.2d, age
I'm a big believer in keeping regular expressions as simple as possible; they can too quickly mushroom into unwieldy, unmaintainable messes. I'd start with something like this, then tweak as necessary:
str = 'printf("My name is %s and age is %0.2d", name, age);'
formats = str.scan(/%[a-z0-9.]+/) # => ["%s", "%0.2d"]
str[/,(.+)\);$/] # => ", name, age);"
vars = str[/,(.+)\);$/].scan(/[a-z]+/) # => ["name", "age"]
puts formats.zip(vars).map{ |a| a.join(', ')}
# >> %s, name
# >> %0.2d, age
Your question has two parts:
Q1: Is it possible to do this with a single regex?
Q2: Can this be done in one or two lines of code?
The answer to both questions is "yes".
format_specifiers = /
%[^\s\"\z]+ # match % followed by > 0 characters other than a
# whitespace, a double-quote or the end of the string
/x # free-spacing regex definition mode
variables = /
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
/x
You can decide, after testing, if these two regexes do their jobs adequately. For testing, refer to Kernel#sprintf.
r = /
(?:#{format_specifiers}) # match format_specifiers in a non-capture group
| # or
(?:#{variables}) # match variables in a non-capture group
/x
#=> /
(?:(?x-mi:
%[^\s\"\z]+ # match % followed by > 0 characters other than a
# whitespace, a double-quote or the end of the string
)) # match format_specifiers in a non-capture group
| # or
(?:(?x-mi:
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
)) # match variables in a non-capture group
/x
r can of course also be written:
/(?:(?x-mi:%[^\s\"\z]+))|(?:(?x-mi:,\s*\K[a-zA-Z]\w*))/
One advantage of constructing r from two regexes is that each of the latter can be tested separately.
str = 'printf("My name is %s and age is %0.2d", name, age);'
arr = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
arr.each_slice(arr.size/2).to_a.transpose.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
I have five lines of code. We could reduce this to two by simply substituting out r in str.scan(r). We could make it a single line by writing:
str.scan(r).tap { |a|
a.replace(a.each_slice(a.size/2).to_a.transpose.map { |s| s.join(', ') }) }
#=> ["%s, name", "%0.2d, age"]
with r substituted out.
The steps here are as follows:
a = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
b = a.each_slice(a.size/2)
#=> a.each_slice(2)
#=> #<Enumerator: ["%s", "%0.2d", "name", "age"]:each_slice(2)>
c = b.to_a
#=> [["%s", "%0.2d"], ["name", "age"]]
d = c.transpose
#=> [["%s", "name"], ["%0.2d", "age"]]
e = d.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
a.replace(e)
#=> ["%s, name", "%0.2d, age"]
The methods used (aside from Array#size) are String#scan, Enumerable#each_slice, Enumerable#to_a, Enumerable#map, Array#transpose and Array#replace.

Resources