I need to split a string in Ruby which has the following format:
[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]
i.e., it is a generated JavaScript array. Unfortunately, the list is long, and I would like to break it at the commas that separate the array elements once a line reaches a given length, so that it can be edited comfortably in a code editor while keeping each element intact. For example, the line above with a split width of 15 would become:
[{a:1,b:2,c:3,d:4},
{a:5,b:6,c:7,d:8},
{a:9,b:10,c:11,d:12},
{a:13,b:14,c:15,d:16}]
and with a width of 32 the text would be:
[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},
{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]
Besides the classic "brute force" approach (loop through the string, track the running length, and split at a separator between } and { once the length exceeds the limit), is there a more "Rubyish" solution to the problem?
Edit: Naive approach attached. It is definitely not Rubyish, as I don't have a very strong Ruby background:
def split(what, length)
  result = []
  line = String.new
  what.to_s.each_char do |c|
    line << c
    # Break only at the comma that follows a closing brace,
    # and only once the current line is longer than `length`.
    if c == ',' && line.end_with?('},') && line.length > length
      result << line
      line = String.new
    end
  end
  # Keep the final piece, which may be shorter than `length`.
  result << line unless line.empty?
  result.join("\n")
end
You could use a regex with:
a non-greedy repetition of at least width-2 characters
followed by a }
followed by a , or a ].
data = "[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16},{a:17,b:18,c:19,d:20}]"
def split_data_with_min_width(text, width)
  pattern = /
    (                  # capturing group for split
      .{#{width-2},}?  # at least width-2 characters, but not more than needed
      \}               # closing curly brace
      [,\]]            # a comma or a closing bracket
    )
  /x                   # free spacing mode
  text.split(pattern).reject(&:empty?).join("\n")
end
puts split_data_with_min_width(data, 15)
# [{a:1,b:2,c:3,d:4},
# {a:5,b:6,c:7,d:8},
# {a:9,b:10,c:11,d:12},
# {a:13,b:14,c:15,d:16},
# {a:17,b:18,c:19,d:20}]
puts split_data_with_min_width(data, 32)
# [{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},
# {a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16},
# {a:17,b:18,c:19,d:20}]
The method uses split with a capturing group instead of scan because the last part of the string might not be long enough:
"abcde".scan(/../)
# ["ab", "cd"]
"abcde".split(/(..)/).reject(&:empty?)
# ["ab", "cd", "e"]
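To make the role of reject(&:empty?) visible, here is a small sketch of what split returns before the empty strings are dropped. When the pattern contains a capturing group, the captured text is kept in the result, and the empty strings are the gaps between back-to-back matches. The variable names are only for illustration.

```ruby
# With a capturing group, split keeps the matched text in the result.
# The empty strings are the gaps between consecutive matches.
raw = "abcde".split(/(..)/)
p raw
# => ["", "ab", "", "cd", "e"]

# Dropping the empty strings leaves the chunks plus the short tail.
pieces = raw.reject(&:empty?)
p pieces
# => ["ab", "cd", "e"]
```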
Code
def doit(str, min_size)
  r = /
      (?:                # begin non-capture group
        .{#{min_size},}? # match at least min_size characters, non-greedily
        (?=\{)           # match '{' in a positive lookahead
        |                # or
        .+\z             # match one or more characters followed by end of string
      )                  # close non-capture group
      /x                 # free-spacing regex definition mode
  str.scan(r)
end
Examples
str = "[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]"
doit(str, 18) # same for all min_size <= 18
#=> ["[{a:1,b:2,c:3,d:4},",
# "{a:5,b:6,c:7,d:8},",
# "{a:9,b:10,c:11,d:12},",
# "{a:13,b:14,c:15,d:16}]"]
doit(str, 19)
#=> ["[{a:1,b:2,c:3,d:4},",
# "{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},",
# "{a:13,b:14,c:15,d:16}]"]
doit(str, 20)
#=> ["[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},",
# "{a:9,b:10,c:11,d:12},",
# "{a:13,b:14,c:15,d:16}]"]
doit(str, 21)
#=> ["[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},",
# "{a:9,b:10,c:11,d:12},",
# "{a:13,b:14,c:15,d:16}]"]
doit(str, 22) # same for 23 <= min_size <= 37
#=> ["[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},",
# "{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]"]
doit(str, 38) # same for 39 <= min_size <= 58
#=> ["[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},",
# "{a:13,b:14,c:15,d:16}]"]
doit(str, 59) # same for min_size > 59
#=> ["[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]"]
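As a quick sanity check on the examples above, the sketch below repeats doit in compact form so that it runs on its own. It confirms that the pieces always concatenate back to the original string, then joins them into the multi-line text the question asks for. The range of sizes tested is an arbitrary choice for illustration.

```ruby
def doit(str, min_size)
  str.scan(/(?:.{#{min_size},}?(?=\{)|.+\z)/)
end

str = "[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]"

# No character is lost or duplicated, whatever the minimum size.
lossless = (1..str.size).all? { |n| doit(str, n).join == str }
p lossless
# => true

# Join with newlines to get the editable, multi-line form.
text = doit(str, 22).join("\n")
puts text
# [{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},
# {a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]
```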
Like this?
2.3.1 :007 > a
=> "[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]"
2.3.1 :008 > q = a.gsub("},", "},\n")
=> "[{a:1,b:2,c:3,d:4},\n{a:5,b:6,c:7,d:8},\n{a:9,b:10,c:11,d:12},\n{a:13,b:14,c:15,d:16}]"
2.3.1 :009 > puts q
[{a:1,b:2,c:3,d:4},
{a:5,b:6,c:7,d:8},
{a:9,b:10,c:11,d:12},
{a:13,b:14,c:15,d:16}]
=> nil
2.3.1 :010 >
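The gsub above breaks after every element and ignores the requested width. A minimal sketch of a width-aware variant follows, assuming the same "at least width-2 characters before the closing brace" convention as the regex answer. The name break_after_width is hypothetical and not part of any answer here.

```ruby
# Hypothetical helper: add a newline after "}," only once the current
# line holds at least width-2 characters before the closing brace.
def break_after_width(text, width)
  min = [width - 2, 0].max
  text.gsub(/.{#{min},}?\},/) { |chunk| "#{chunk}\n" }
end

a = "[{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},{a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]"

puts break_after_width(a, 15)
# [{a:1,b:2,c:3,d:4},
# {a:5,b:6,c:7,d:8},
# {a:9,b:10,c:11,d:12},
# {a:13,b:14,c:15,d:16}]

puts break_after_width(a, 32)
# [{a:1,b:2,c:3,d:4},{a:5,b:6,c:7,d:8},
# {a:9,b:10,c:11,d:12},{a:13,b:14,c:15,d:16}]
```

The trailing element never ends in "}," so it is simply left in place, which sidesteps the "last part too short" problem without a second pattern.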
I've been attempting a coding exercise to mask all but the last four digits or characters of any input.
I think my solution works but it seems a bit clumsy. Does anyone have ideas about how to refactor it?
Here's my code:
def mask(string)
  z = string.to_s.length
  if z <= 4
    return string
  elsif z > 4
    array = []
    string1 = string.to_s.chars
    string1[0..((z-1)-4)].each do |s|
      array << "#"
    end
    array << string1[(z-4)..(z-1)]
    puts array.join(", ").delete(", ").inspect
  end
end
Positive lookahead
A positive lookahead makes it pretty easy. Any character that is followed by at least 4 more characters gets replaced:
"654321".gsub(/.(?=.{4})/,'#')
# "##4321"
Here's a description of the regex:
r = /
  .        # Just one character
  (?=      # which must be followed by
    .{4}   # 4 characters
  )        #
/x         # free-spacing mode, allows comments inside regex
Note that the regex only matches one character at a time, even though it needs to check up to 5 characters for each match:
"654321".scan(r)
# => ["6", "5"]
/(.)..../ wouldn't work, because it would consume 5 characters on each iteration:
"654321".scan(/(.)..../)
# => [["6"]]
"abcdefghij".scan(/(.)..../)
# => [["a"], ["f"]]
If you want to parametrize the length of the unmasked string, you can use variable interpolation:
all_but = 4
/.(?=.{#{all_but}})/
# => /.(?=.{4})/
Code
Packing it into a method, it becomes:
def mask(string, all_but = 4, char = '#')
  string.gsub(/.(?=.{#{all_but}})/, char)
end
p mask('testabcdef')
# '######cdef'
p mask('1234')
# '1234'
p mask('123')
# '123'
p mask('x')
# 'x'
You could also adapt it for sentences:
def mask(string, all_but = 4, char = '#')
  string.gsub(/\w(?=\w{#{all_but}})/, char)
end
p mask('It even works for multiple words')
# "It even #orks for ####iple #ords"
Some notes about your code
string.to_s
Naming things is very important in programming, especially in dynamic languages.
string.to_s
If string is indeed a string, there shouldn't be any reason to call to_s.
If string isn't a string, you should indeed call to_s before gsub, but you should also rename string to something more descriptive:
object.to_s
array.to_s
whatever.to_s
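As a sketch of that advice, the method below names its parameter value (a hypothetical name chosen for this example) and converts it once, up front, so that numbers and symbols can be masked as well. The regex is the positive-lookahead pattern from earlier in this answer.

```ruby
# `value` may be any object; it is converted to a string once, at the top.
def mask(value, all_but = 4, char = '#')
  value.to_s.gsub(/.(?=.{#{all_but}})/, char)
end

p mask(1234567)       # => "###4567"
p mask('testabcdef')  # => "######cdef"
p mask(:abc)          # => "abc"
```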
join
puts array.join(", ").delete(", ").inspect
What do you want to do exactly? You could probably just use join:
[1,2,[3,4]].join(", ").delete(", ")
# "1234"
[1,2,[3,4]].join
# "1234"
delete
Note that .delete(", ") deletes every comma and every space, wherever they appear. It doesn't delete only ", " substrings:
",a b,,, cc".delete(', ')
# "abcc"
["1,2", "3,4"].join(', ').delete(', ')
# "1234"
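For contrast, a short sketch of the two behaviours side by side: delete treats its argument as a set of characters, while gsub with a string pattern removes only the exact two-character sequence. The variable name messy is just for illustration.

```ruby
messy = ",a b,,, cc"

# delete removes every ',' and every ' ', wherever they occur.
p messy.delete(", ")    # => "abcc"

# gsub with a string pattern removes only the exact ", " sequence.
p messy.gsub(", ", "")  # => ",a b,,cc"

# For the original use case, join with no separator avoids both steps.
p [1, 2, [3, 4]].join   # => "1234"
```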
Ruby makes this sort of thing pretty trivial:
class String
  def asteriskify(tail = 4, char = '#')
    if length <= tail
      self
    else
      char * (length - tail) + self[-tail, tail]
    end
  end
end
Then you can apply it like this:
"moo".asteriskify
# => "moo"
"testing".asteriskify
# => "###ting"
"password".asteriskify(5, '*')
# => "***sword"
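Try this one:
def mask(string)
  # Nothing to mask; this also avoids '#' * negative_number, which raises.
  return string if string.length <= 4
  masked = string.dup   # leave the caller's string untouched
  masked[0..-5] = '#' * (masked.length - 4)
  masked
end
mask("12345678")
# => "####5678"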
I will add my solution to this topic too :)
def mask(str)
str.match(/(.*)(.{4})/)
'#' * ($1 || '').size + ($2 || str)
end
mask('abcdef') # => "##cdef"
mask('x') # => "x"
I offer this solution mainly to remind readers that String#gsub, when called with only a pattern (no replacement and no block), returns an enumerator.
def mask(str, nbr_unmasked, mask_char)
  str.gsub(/./).with_index { |s,i| i < str.size-nbr_unmasked ? mask_char : s }
end
mask("abcdef", 4, '#')
#=> "##cdef"
mask("abcdef", 99, '#')
#=> "abcdef"
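A minimal sketch of the enumerator behaviour this relies on, separate from the masking problem: the enumerator yields each match, and whatever the block returns is used as the replacement. The names enum and numbered are only for illustration.

```ruby
# gsub with a pattern only: nothing is replaced yet, we get an Enumerator.
enum = "abc".gsub(/./)
p enum.class
# => Enumerator

# Chaining with_index supplies the match index; the block's return value
# replaces each match, and the whole call returns a new string.
numbered = enum.with_index { |char, i| "#{i}#{char}" }
p numbered
# => "0a1b2c"
```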
Try using tap:
def mask_string(str)
  str.tap { |p| p[0...-4] = '#' * (p[0...-4].length) } if str.length > 4
  str
end
mask_string('ABCDEF') # => "##CDEF"
mask_string('AA') # => "AA"
mask_string('S') # => "S"
I have this code:
str = 'printf("My name is %s and age is %0.2d", name, age);'
SPECIFIERS = 'diuXxofeEgsc'
format_specifiers = /((?:%(?:\*?([-+]?\d*\.?\d+)*(?:[#{SPECIFIERS}]))))/i
variables = /([.[^"]]*)\);$/
format = str.scan(format_specifiers)
var = str.scan(variables).first.first.split(/,/)
Is there any way a single regex can do that in a couple of lines?
My desired output is:
%s, name
%0.2d, age
I'm a big believer in keeping regular expressions as simple as possible; they can too quickly mushroom into unwieldy, unmaintainable messes. I'd start with something like this, then tweak as necessary:
str = 'printf("My name is %s and age is %0.2d", name, age);'
formats = str.scan(/%[a-z0-9.]+/) # => ["%s", "%0.2d"]
str[/,(.+)\);$/] # => ", name, age);"
vars = str[/,(.+)\);$/].scan(/[a-z]+/) # => ["name", "age"]
puts formats.zip(vars).map{ |a| a.join(', ')}
# >> %s, name
# >> %0.2d, age
Your question has two parts:
Q1: Is it possible to do this with a single regex?
Q2: Can this be done in one or two lines of code?
The answer to both questions is "yes".
format_specifiers = /
  %[^\s"]+   # match % followed by > 0 characters other than
             # whitespace or a double quote
/x           # free-spacing regex definition mode
variables = /
  ,\s*       # match comma followed by >= 0 whitespaces
  \K         # forget matches so far
  [a-zA-Z]   # match a letter
  \w*        # match >= 0 word characters
/x
You can decide, after testing, if these two regexes do their jobs adequately. For testing, refer to Kernel#sprintf.
r = /
  (?:#{format_specifiers})   # match format_specifiers in a non-capture group
  |                          # or
  (?:#{variables})           # match variables in a non-capture group
/x
#=> /
  (?:(?x-mi:
  %[^\s"]+   # match % followed by > 0 characters other than
             # whitespace or a double quote
))   # match format_specifiers in a non-capture group
  |                          # or
  (?:(?x-mi:
  ,\s*       # match comma followed by >= 0 whitespaces
  \K         # forget matches so far
  [a-zA-Z]   # match a letter
  \w*        # match >= 0 word characters
))           # match variables in a non-capture group
/x
r can of course also be written:
/(?:(?x-mi:%[^\s"]+))|(?:(?x-mi:,\s*\K[a-zA-Z]\w*))/
One advantage of constructing r from two regexes is that each of the latter can be tested separately.
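A sketch of what testing them separately might look like. The two patterns are condensed to one line each, and the character class is reduced to "not whitespace, not a double quote", which is a simplification made for this sketch.

```ruby
str = 'printf("My name is %s and age is %0.2d", name, age);'

# One-line forms of the two building blocks (simplified for this sketch).
format_specifiers = /%[^\s"]+/
variables         = /,\s*\K[a-zA-Z]\w*/

# Each pattern can be checked on its own before they are combined.
formats = str.scan(format_specifiers)
vars    = str.scan(variables)
p formats  # => ["%s", "%0.2d"]
p vars     # => ["name", "age"]

# Combined, the alternation returns the matches in order of appearance.
combined = str.scan(/(?:#{format_specifiers})|(?:#{variables})/)
p combined # => ["%s", "%0.2d", "name", "age"]
```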
str = 'printf("My name is %s and age is %0.2d", name, age);'
arr = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
arr.each_slice(arr.size/2).to_a.transpose.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
I have five lines of code. We could reduce this to two by simply substituting out r in str.scan(r). We could make it a single line by writing:
str.scan(r).tap { |a|
a.replace(a.each_slice(a.size/2).to_a.transpose.map { |s| s.join(', ') }) }
#=> ["%s, name", "%0.2d, age"]
with r substituted out.
The steps here are as follows:
a = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
b = a.each_slice(a.size/2)
#=> a.each_slice(2)
#=> #<Enumerator: ["%s", "%0.2d", "name", "age"]:each_slice(2)>
c = b.to_a
#=> [["%s", "%0.2d"], ["name", "age"]]
d = c.transpose
#=> [["%s", "name"], ["%0.2d", "age"]]
e = d.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
a.replace(e)
#=> ["%s, name", "%0.2d, age"]
The methods used (aside from Array#size) are String#scan, Enumerable#each_slice, Enumerable#to_a, Enumerable#map, Array#transpose and Array#replace.
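Slicing the array in half assumes that scan returns all the specifiers first, followed by the same number of variable names. As an alternative sketch (a substitution of mine, not the method above), the two kinds of match can be separated by their leading % with Enumerable#partition and then paired with Array#zip:

```ruby
arr = ["%s", "%0.2d", "name", "age"]

# Separate the specifiers from the variable names by their leading '%'.
formats, vars = arr.partition { |s| s.start_with?('%') }

# Pair them up positionally and format each pair.
pairs = formats.zip(vars).map { |pair| pair.join(', ') }
p pairs
# => ["%s, name", "%0.2d, age"]
```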