Split a Ruby string by colon (except inside parenthesis) using regex - ruby

I want to split a string by colon.
This is an example of input:
str = "one[two:[three::four][five::six]]:seven:eight[nine:ten]"
This is an example of output:
array = ["one[two:[three::four][five::six]]", "seven", "eight[nine:ten]"]
The aim is to understand the regex representing the colon outside parentheses and nested parentheses.
But there are some constraints:
The template of regex must be like this: ^(.+)<colon_regex>(.*)<colon_regex>(.*)$
The match must be unique, with three groups.
Can you give me a suggestion?

You can use a very simple regex:
SUB_CHAR = 0.chr
#=> "\x00"
r = /#{SUB_CHAR}/
#=> /\x00/
to be used in s.split(r).
There is of course a catch: you must modify the string you pass to Puppet, (along with the above regex).
str = "one[two:[three::four][five::six]]:seven:eight[nine:ten]"
count = 0
idx = str.size.times.with_object([]) do |i,a|
case str[i]
when '[' then count += 1
when ']' then count -= 1
when ':' then a << i if count.zero?
end
end
#=> [33, 39]
s = str.dup
#=> "one[two:[three::four][five::six]]:seven:eight[nine:ten]"
idx.each { |i| s[i] = SUB_CHAR }
s #=> "one[two:[three::four][five::six]]\u0000seven\u0000eight[nine:ten]"
s.split(r)
#=> ["one[two:[three::four][five::six]]", "seven", "eight[nine:ten]"]

Adapting this nested parenthesis regex, you can do:
txt="one[two:[three::four][five::six]]:seven:eight[nine:ten]"
pat=Regexp.new('((?>[^:\[]+|(\[(?>[^\[\]]+|\g<-1>)*\]))+)')
puts txt.scan(pat).map &:first
one[two:[three::four][five::six]]
seven
eight[nine:ten]

Related

Increment alphabetical string in Ruby on rails

Task I want to solve:
Write a program that takes a string, will perform a transformation and return it.
For each of the letters of the parameter string switch it by the next one in alphabetical order.
'z' becomes 'a' and 'Z' becomes 'A'. Case remains unaffected.
def rotone(param_1)
a = ""
param_1.each_char do |x|
if x.count("a-zA-Z") > 0
a << x.succ
else
a << x
end
end
a
end
And I take this:
Input: "AkjhZ zLKIJz , 23y "
Expected Return Value: "BlkiA aMLJKa , 23z "
Return Value: "BlkiAA aaMLJKaa , 23z "
When iterators find 'z' or 'Z' it increment two times z -> aa or Z -> AA
input = "AkjhZ zLKIJz , 23y"
Code
p input.tr('a-yA-YzZ','b-zB-ZaA')
Output
"BlkiA aMLJKa , 23z"
Your problem is that String#succ (aka String#next) has been designed in a way that does not serve your purpose when the receiver is 'z' or 'Z':
'z'.succ #=> 'aa'
'Z'.succ #=> 'AA'
If you replaced a << x.succ with a << x.succ[0] you would obtain the desired result.
You might consider writing that as follows.
def rotone(param_1)
param_1.gsub(/./m) { |c| c.match?(/[a-z]/i) ? c.succ[0] : c }
end
String#gsub's argument is a regular expression that matches every character (so every character is passed to gsub's block)1.
See also String#match?. The regular expression /[a-z]/i matches every character that is one of the characters in the character class [a-z]. The option i makes the match case-independent, so uppercase letters are matched as well.
Here is alternative way to write the method that employs two hashes that are defined as constants.
CODE = [*'a'..'z', *'A'..'Z'].each_with_object({}) do |c,h|
h[c] = c.succ[0]
end.tap { |h| h.default_proc = proc { |_h,k| k } }
#=> {"a"=>"b", "b"=>"c",..., "y"=>"z", "z"=>"a",
# "A"=>"B", "B"=>"C",..., "Y"=>"Z", "Z"=>"A"}
DECODE = CODE.invert.tap { |h| h.default_proc = proc { |_h,k| k } }
#=> {"b"=>"a", "c"=>"b", ..., "z"=>"y", "a"=>"z",
# "B"=>"A", "C"=>"B", ..., "Z"=>"Y", "A"=>"Z"}
For example,
CODE['e'] #=> "f"
CODE['Z'] #=> "A"
CODE['?'] #=> "?"
DECODE['f'] #=> "e"
DECODE['A'] #=> "Z"
DECODE['?'] #=> "?"
Let's try using gsub, CODE and DECODE with an example string.
str = "The quick brown dog Zelda jumped over the lazy fox Arnie"
rts = str.gsub(/./m, CODE)
#=> "Uif rvjdl cspxo eph Afmeb kvnqfe pwfs uif mbaz gpy Bsojf"
rts.gsub(/./m, DECODE)
#=> "The quick brown dog Zelda jumped over the lazy fox Arnie"
See Hash#merge, Object#tap, Hash#default_proc=, Hash#invert and the form of Sting#gsub that takes a hash as its optional second argument.
Adding the default proc to the hash h causes h[k] to return k if h does not have a key k. Had CODE been defined without the default proc,
CODE = [*'a'..'z', *'A'..'Z'].each_with_object({}) { |c,h| h[c] = c.succ[0] }
#=> {"a"=>"b", "b"=>"c",..., "y"=>"z", "z"=>"a",
# "A"=>"B", "B"=>"C",..., "Y"=>"Z", "Z"=>"A"}
gsub would skip over characters that are not letters:
rts = str.gsub(/./m, CODE)
#=> "UifrvjdlcspxoephAfmebkvnqfepwfsuifmbazgpyBsojf"
Without the default proc we would have to write
rts = str.gsub(/./m) { |s| CODE.fetch(s, s) }
#=> "Uif rvjdl cspxo eph Afmeb kvnqfe pwfs uif mbaz gpy Bsojf"
See Hash#fetch.
1. The regular expression /./ matches every character other than line terminators. Adding the option m (/./m) causes . to match line terminators as well.

How to mask all but last four characters in a string

I've been attempting a coding exercise to mask all but the last four digits or characters of any input.
I think my solution works but it seems a bit clumsy. Does anyone have ideas about how to refactor it?
Here's my code:
def mask(string)
z = string.to_s.length
if z <= 4
return string
elsif z > 4
array = []
string1 = string.to_s.chars
string1[0..((z-1)-4)].each do |s|
array << "#"
end
array << string1[(z-4)..(z-1)]
puts array.join(", ").delete(", ").inspect
end
end
positive lookahead
A positive lookahead makes it pretty easy. If any character is followed by at least 4 characters, it gets replaced :
"654321".gsub(/.(?=.{4})/,'#')
# "##4321"
Here's a description of the regex :
r = /
. # Just one character
(?= # which must be followed by
.{4} # 4 characters
) #
/x # free-spacing mode, allows comments inside regex
Note that the regex only matches one character at a time, even though it needs to check up to 5 characters for each match :
"654321".scan(r)
# => ["6", "5"]
/(.)..../ wouldn't work, because it would consume 5 characters for each iteration :
"654321".scan(/(.)..../)
# => [["6"]]
"abcdefghij".scan(/(.)..../)
# => [["a"], ["f"]]
If you want to parametrize the length of the unmasked string, you can use variable interpolation :
all_but = 4
/.(?=.{#{all_but}})/
# => /.(?=.{4})/
Code
Packing it into a method, it becomes :
def mask(string, all_but = 4, char = '#')
string.gsub(/.(?=.{#{all_but}})/, char)
end
p mask('testabcdef')
# '######cdef'
p mask('1234')
# '1234'
p mask('123')
# '123'
p mask('x')
# 'x'
You could also adapt it for sentences :
def mask(string, all_but = 4, char = '#')
string.gsub(/\w(?=\w{#{all_but}})/, char)
end
p mask('It even works for multiple words')
# "It even #orks for ####iple #ords"
Some notes about your code
string.to_s
Naming things is very important in programming, especially in dynamic languages.
string.to_s
If string is indeed a string, there shouldn't be any reason to call to_s.
If string isn't a string, you should indeed call to_s before gsub but should also rename string to a better description :
object.to_s
array.to_s
whatever.to_s
join
puts array.join(", ").delete(", ").inspect
What do you want to do exactly? You could probably just use join :
[1,2,[3,4]].join(", ").delete(", ")
# "1234"
[1,2,[3,4]].join
# "1234"
delete
Note that .delete(", ") deletes every comma and every whitespace, in any order. It doesn't only delete ", " substrings :
",a b,,, cc".delete(', ')
# "abcc"
["1,2", "3,4"].join(', ').delete(', ')
# "1234"
Ruby makes this sort of thing pretty trivial:
class String
def asteriskify(tail = 4, char = '#')
if (length <= tail)
self
else
char * (length - tail) + self[-tail, tail]
end
end
end
Then you can apply it like this:
"moo".asteriskify
# => "moo"
"testing".asteriskify
# => "###ting"
"password".asteriskify(5, '*')
# => "***sword"
Try this one
def mask(string)
string[0..-5] = '#' * (string.length - 4)
string
end
mask("12345678")
=> "####5678"
I will add my solution to this topic too :)
def mask(str)
str.match(/(.*)(.{4})/)
'#' * ($1 || '').size + ($2 || str)
end
mask('abcdef') # => "##cdef"
mask('x') # => "x"
I offer this solution mainly to remind readers that String#gsub without a block returns an enumerator.
def mask(str, nbr_unmasked, mask_char)
str.gsub(/./).with_index { |s,i| i < str.size-nbr_unmasked ? mask_char : s }
end
mask("abcdef", 4, '#')
#=> "##cdef"
mask("abcdef", 99, '#')
#=> "######"
Try using tap
def mask_string(str)
str.tap { |p| p[0...-4] = '#' * (p[0...-4].length) } if str.length > 4
str
end
mask_string('ABCDEF') # => ##CDEF
mask_string('AA') # => AA
mask_string('S') # => 'S'

Check whether a string contains all the characters of another string in Ruby

Let's say I have a string, like string= "aasmflathesorcerersnstonedksaottersapldrrysaahf". If you haven't noticed, you can find the phrase "harry potter and the sorcerers stone" in there (minus the space).
I need to check whether string contains all the elements of the string.
string.include? ("sorcerer") #=> true
string.include? ("harrypotterandtheasorcerersstone") #=> false, even though it contains all the letters to spell harrypotterandthesorcerersstone
Include does not work on shuffled string.
How can I check if a string contains all the elements of another string?
Sets and array intersection don't account for repeated chars, but a histogram / frequency counter does:
require 'facets'
s1 = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
s2 = "harrypotterandtheasorcerersstone"
freq1 = s1.chars.frequency
freq2 = s2.chars.frequency
freq2.all? { |char2, count2| freq1[char2] >= count2 }
#=> true
Write your own Array#frequency if you don't want to the facets dependency.
class Array
def frequency
Hash.new(0).tap { |counts| each { |v| counts[v] += 1 } }
end
end
I presume that if the string to be checked is "sorcerer", string must include, for example, three "r"'s. If so you could use the method Array#difference, which I've proposed be added to the Ruby core.
class Array
def difference(other)
h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
reject { |e| h[e] > 0 && h[e] -= 1 }
end
end
str = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
target = "sorcerer"
target.chars.difference(str.chars).empty?
#=> true
target = "harrypotterandtheasorcerersstone"
target.chars.difference(str.chars).empty?
#=> true
If the characters of target must not only be in str, but must be in the same order, we could write:
target = "sorcerer"
r = Regexp.new "#{ target.chars.join "\.*" }"
#=> /s.*o.*r.*c.*e.*r.*e.*r/
str =~ r
#=> 2 (truthy)
(or !!(str =~ r) #=> true)
target = "harrypotterandtheasorcerersstone"
r = Regexp.new "#{ target.chars.join "\.*" }"
#=> /h.*a.*r.*r.*y* ... o.*n.*e/
str =~ r
#=> nil
A different albeit not necessarily better solution using sorted character arrays and sub-strings:
Given your two strings...
subject = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
search = "harrypotterandthesorcerersstone"
You can sort your subject string using .chars.sort.join...
subject = subject.chars.sort.join # => "aaaaaaacddeeeeeffhhkllmnnoooprrrrrrssssssstttty"
And then produce a list of substrings to search for:
search = search.chars.group_by(&:itself).values.map(&:join)
# => ["hh", "aa", "rrrrrr", "y", "p", "ooo", "tttt", "eeeee", "nn", "d", "sss", "c"]
You could alternatively produce the same set of substrings using this method
search = search.chars.sort.join.scan(/((.)\2*)/).map(&:first)
And then simply check whether every search sub-string appears within the sorted subject string:
search.all? { |c| subject[c] }
Create a 2 dimensional array out of your string letter bank, to associate the count of letters to each letter.
Create a 2 dimensional array out of the harry potter string in the same way.
Loop through both and do comparisons.
I have no experience in Ruby but this is how I would start to tackle it in the language I know most, which is Java.

Removing trailings zeros in string

I have a string and I need to remove trailing zeros after the 2nd decimal place:
remove_zeros("1,2,3,4.2300") #=> "1,2,3,4.23"
remove_zeros("1,2,3,4.20300") #=> "1,2,3,4.203"
remove_zeros("1,2,3,4.0200") #=> "1,2,3,4.02"
remove_zeros("1,2,3,4.0000") #=> "1,2,3,4.00"
Missing zeros don't have to be appended, i.e.
remove_zeros("1,2,3,4.0") #=> "1,2,3,4.0"
How could I do this in Ruby? I tried with converting into Float but it terminates the string when I encounter a ,. Can I write any regular expression for this?
Yes, a regular expression could be used.
R = /
\. # match a decimal
\d*? # match one or more digits lazily
\K # forget all matches so far
0+ # match one or more zeroes
(?!\d) # do not match a digit (negative lookahead)
/x # free-spacing regex definition mode
def truncate_floats(str)
str.gsub(R,"")
end
truncate_floats "1,2,3,4.2300"
#=> "1,2,3,4.23"
truncate_floats "1.34000,2,3,4.23000"
#=> "1.34,2,3,4.23"
truncate_floats "1,2,3,4.23003500"
#=> "1,2,3,4.230035"
truncate_floats "1,2,3,4.3"
#=> "1,2,3,4.3"
truncate_floats "1,2,3,4.000"
#=> "1,2,3,4."
> a = "1,2,3,4.2300"
> a.split(",").map{|e| e.include?(".") ? e.to_f : e}.join(",")
#=> "1,2,3,4.23"
> a = "1,2,3,4.20300"
> a.split(",").map{|e| e.include?(".") ? e.to_f : e}.join(",")
#=> "1,2,3,4.203"
First, you need to parse the string into its component numbers, then remove the trailing zeros on each number. This can be done by:
1) splitting the string on ',' to get an array of numeric strings
2) for each numeric string, convert it to a Float, then back to a string:
#!/usr/bin/env ruby
def parse_and_trim(string)
number_strings = string.split(',')
number_strings.map { |s| Float(s).to_s }.join(',')
end
p parse_and_trim('1,2,3,4.2300') # => "1.0,2.0,3.0,4.23"
If you really want to remove the trailing '.0' fragments, you could replace the script with this one:
#!/usr/bin/env ruby
def parse_and_trim_2(string)
original_strings = string.split(',')
converted_strings = original_strings.map { |s| Float(s).to_s }
trimmed_strings = converted_strings.map do |s|
s.end_with?('.0') ? s[0..-3] : s
end
trimmed_strings.join(',')
end
p parse_and_trim_2('1,2,3,4.2300') # => "1,2,3,4.23"
These could of course be made more concise, but I've used intermediate variables to clarify what's going on.

Ruby transform string of range measurements into a list of the measurements?

I have a sample string that I would like to transform, from this:
#21inch-#25inch
to this:
#21inch #22inch #23inch #24inch #25inch
Using Ruby, please show me how this can be done.
You can scan your string and working with range of strings:
numbers = "#21inch-#25inch".scan(/\d+/)
=> ["21", "25"]
Range.new(*numbers).map{ |s| "##{s}inch" }.join(" ")
=> "#21inch #22inch #23inch #24inch #25inch"
This solution working only if your string has a format like in your instance. For other cases you should write your own specific solution.
R = /
(\D*) # match zero or more non-digits in capture group 1
(\d+) # match one or more digits in capture group 2
([^\d-]+) # match on or more chars other the digits and hyphens in capture group 3
/x # free-spacing regex definition mode
def spin_out(str)
(prefix, first, units),(_, last, _) = str.scan(R)
(first..last).map { |s| "%s%s%s" % [prefix,s,units] }.join(' ')
end
spin_out "#21inch-#25inch"
#=> "#21inch #22inch #23inch #24inch #25inch"
spin_out "#45cm-#53cm"
#=> "#45cm #46cm #47cm #48cm #49cm #50cm #51cm #52cm #53cm"
spin_out "sz 45cm-sz 53cm"
#=> "sz 45cm sz 46cm sz 47cm sz 48cm sz 49cm sz 50cm sz 51cm sz 52cm sz 53cm"
spin_out "45cm-53cm"
#=> "45cm 46cm 47cm 48cm 49cm 50cm 51cm 52cm 53cm"
For str = "#21inch-#25inch", we obtain
(prefix, first, units),(_, last, _) = str.scan(R)
#=> [["#", "21", "inch"], ["-#", "25", "inch"]]
prefix
#=> "#"
first
#=> "21"
units
#=> "inch"
last
#=> "25"
The subsequent mapping is straightforward.
You can use a regex gsub with a block match replacement, like this:
string = "#21inch-#25inch"
new_string = string.gsub(/#\d+\w+-#\d+\w+/) do |match|
first_capture, last_capture = match.split("-")
first_num = first_capture.gsub(/\D+/, "").to_i
last_num = last_capture.gsub(/\D+/, "").to_i
pattern = first_capture.split(/\d+/)
(first_num..last_num).map {|num| pattern.join(num.to_s) }.join(" ")
end
puts "#{new_string}"
Running this will produce this output:
First: #21inch Last: #25inch
First num: 21 Last num: 25
Pattern: ["#", "inch"]
#21inch #22inch #23inch #24inch #25inch
The last line of output is the answer, and the previous lines show the progression of logic to get there.
This approach should work for other, slightly different unit formats, as well:
#32ft-#49ft
#1mm-5mm
#2acres-5acres
Making this suit multiple purposes will be quite simple. With a slight variation in the regex, you could also support a range format #21inch..#25inch:
/(#\d+\w+)[-.]+(#\d+\w+)/
Happy parsing!

Resources