Ruby: use a regex to replace a line in a string with blank space

Text file text.txt =
--TODAY
This == 5P
That == 4
Them == 4P
Those == 3
--TOMORROW
More == 3P
Yep == 4
Start code:
file = File.new('text.txt','r')
array = file.readlines
def remove_stuff (variable)
variable.map! {|delete| delete.gsub(/-*[A-Z]*[a-z]* *=* [^P]?/,' ')}
end
remove_stuff(array)
array.each {|x| puts x}
=>
--TODAY
P
P
--TOMORROW
P
Desired result
=>
P
P
P
What am I missing? I'm getting a space before the P, and when I match that I lose everything. I only want the P and blank placeholders for the rest.
Swap out the gsub for this line and you get just the numbers. It matches --TODAY:
gsub(/-*[A-Z]*[a-z]* *=*/,'')
Thanks!

This should do it.
text =
"--TODAY
This == 5P
That == 4
Them == 4P
Those == 3
--TOMORROW
More == 3P
Yep == 4"
arr = text.split(/\n/)
#=> ["--TODAY", "This == 5P", "That == 4", "Them == 4P",
# "Those == 3", "--TOMORROW", "More == 3P", "Yep == 4"]
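r = /^\s*(?!--).+\s==\s*\d+P$/
arr.map { |s| s[r] ? 'P' : '' }
#=> ["", "P", "", "P", "", "", "P", ""]
In the regex, (?!--) is a negative lookahead, which ensures the string does not begin with two dashes, possibly following some whitespace. (A lookbehind at that position would inspect the characters before the match position, so it could never see the leading dashes.)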

From this breakdown you can see that the minimum it will match is a space.
But there are no spaces in the lines
--TODAY
--TOMORROW
expanded
-* # optional
[A-Z]* # optional
[a-z]* # optional
[ ]* # optional
=* # optional
[ ] # required <--
[^P]? # optional
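The breakdown can be checked directly in Ruby. The sketch below uses the pattern from the question unchanged; the constant and helper names are made up for illustration. It shows that a line without a space cannot match, and that the match stops just before the P (which is why the replacement space ends up in front of it).

```ruby
# The pattern from the question, unchanged.
QUESTION_PATTERN = /-*[A-Z]*[a-z]* *=* [^P]?/

# Hypothetical helper: returns the part of the line the pattern matches, or nil.
def matched_part(line)
  line[QUESTION_PATTERN]
end

p matched_part("--TODAY")     # nil, because the line contains no space
p matched_part("This == 5P")  # "This == 5", the trailing P is left over
p matched_part("That == 4")   # "That == 4", the whole line is consumed
```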
I would broaden your regex to serve a dual purpose.
One to match the == format, one to match the entire line.
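I tested this with ^ matching at the start of every line, and it worked on your data. In PCRE-style engines that is what the (?m) flag below does; in Ruby, ^ already matches at each line start and (?m) instead makes . match newlines, so leave the flag out there.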
# (?m)^(?:[^=\n]*=+[^=P\n]*|.*)
(?m)
^
(?:
[^=\n]*
=+
[^=P\n]*
|
.*
)
With this you could just do a replace all (with '') on the string.
Then do a split on newline to get them into an array.
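A minimal Ruby sketch of the replace-then-split idea, assuming the sample data from the question. The (?m) flag is deliberately omitted here, because in Ruby it would let .* run across newlines; the constant and method names are illustrative only.

```ruby
SAMPLE = [
  "--TODAY", "This == 5P", "That == 4", "Them == 4P",
  "Those == 3", "--TOMORROW", "More == 3P", "Yep == 4"
].join("\n")

# Either "name == value" up to (not including) a trailing P, or the whole line.
LINE_PATTERN = /^(?:[^=\n]*=+[^=P\n]*|.*)/

def placeholders(text)
  # split with a -1 limit keeps the trailing empty placeholder
  text.gsub(LINE_PATTERN, '').split("\n", -1)
end

p placeholders(SAMPLE)
#=> ["", "P", "", "P", "", "", "P", ""]
```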

Related

Split an array by a repetitive value

I have a variable length array of arbitrary strings. The one consistency is the string "hello" is repeated and I want to partition the array in groups by the string "hello".
So given this:
[
"hello\r\n",
"I\r\n",
"am\r\n",
"Bob\r\n",
"hello\r\n",
"How\r\n",
"are you?\r\n"
]
I want this:
[
[
"hello\r\n",
"I\r\n",
"am\r\n",
"Bob\r\n"
],
[
"hello\r\n",
"How\r\n",
"are you?\r\n"
]
]
What I have tried:
partition = []
last = input.size
index = 0
input.each_with_object([]) do |line, acc|
index += 1
if line == "hello\r\n"
acc << partition
partition = []
partition << line
else
partition << line
end
if index == last
acc << partition
end
acc
end.delete_if(&:blank?)
=> [["hello\r\n", "I\r\n", "am\r\n", "Bob\r\n"], ["hello\r\n", "How\r\n", "are you?\r\n"]]
The result is right, but is it possible to do what I want with ruby array iterators? My solution seems clunky.
You can use Enumerable#slice_before
arr.slice_before { |i| i[/hello/] }.to_a
#=> [["hello\r\n", "I\r\n", "am\r\n", "Bob\r\n"],
# ["hello\r\n", "How\r\n", "are you?\r\n"]]
or more succinctly (as pointed out by @tokland):
arr.slice_before(/hello/).to_a
Here is a method that does not use Enumerable#slice_before, which was introduced in Ruby v1.9.2. It works with v1.9+ (and would work with v1.8.7+ if each_with_object were replaced with reduce/inject).
Assumptions
I have assumed:
all strings preceding the first string beginning with "hello" are discarded
to match "hello" the string must begin "hello" and cannot be a word merely containing hello (e.g., "hellonfire")
Code
def group_em(arr, target)
arr.each_with_object([]) { |s,a| (s =~ /\A#{target}(?!\p{alpha})/) ?
(a << [s]) : (a.last << s unless a.empty?) }
end
Example
arr = ["Ahem\r\n", "hello\r\n", "I\r\n", "hello again\r\n", "am\r\n",
"Bob\r\n", "hellonfire\r\n", "How\r\n", "are you?\r\n"]
group_em(arr, 'hello')
#=> [["hello\r\n", "I\r\n"],
# ["hello again\r\n", "am\r\n", "Bob\r\n", "hellonfire\r\n",
# "How\r\n", "are you?\r\n"]]
Note that "Ahem\r\n" is not included because it does not follow a "hello" string, and "hellonfire\r\n" does not trigger a new slice because it does not match the word "hello".
Discussion
In the example, the regular expression was computed to equal
/\Ahello(?!\p{alpha})/
It could instead be defined in free-spacing mode to make it self-documenting.
/
\A # match the beginning of the string
#{target} # match target word
(?!\p{alpha}) # do not match a letter (negative lookahead)
/x # free-spacing regex definition mode
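For reference, here is a sketch of how the free-spacing pattern could be dropped into the method. The name group_em_x and the use of Regexp.escape (so that a target containing regex metacharacters is matched literally) are additions for illustration, not part of the original answer.

```ruby
def group_em_x(arr, target)
  r = /
    \A                        # match the beginning of the string
    #{Regexp.escape(target)}  # match the target word literally
    (?!\p{alpha})             # not followed by a letter: negative lookahead
  /x
  arr.each_with_object([]) do |s, groups|
    if s =~ r
      groups << [s]           # a matching string starts a new group
    elsif !groups.empty?
      groups.last << s        # strings before the first match are dropped
    end
  end
end

arr = ["Ahem\r\n", "hello\r\n", "I\r\n", "hello again\r\n", "am\r\n",
       "Bob\r\n", "hellonfire\r\n", "How\r\n", "are you?\r\n"]
p group_em_x(arr, 'hello')
```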

How to mask all but last four characters in a string

I've been attempting a coding exercise to mask all but the last four digits or characters of any input.
I think my solution works but it seems a bit clumsy. Does anyone have ideas about how to refactor it?
Here's my code:
def mask(string)
z = string.to_s.length
if z <= 4
return string
elsif z > 4
array = []
string1 = string.to_s.chars
string1[0..((z-1)-4)].each do |s|
array << "#"
end
array << string1[(z-4)..(z-1)]
puts array.join(", ").delete(", ").inspect
end
end
positive lookahead
A positive lookahead makes it pretty easy. If any character is followed by at least 4 characters, it gets replaced :
"654321".gsub(/.(?=.{4})/,'#')
# "##4321"
Here's a description of the regex :
r = /
. # Just one character
(?= # which must be followed by
.{4} # 4 characters
) #
/x # free-spacing mode, allows comments inside regex
Note that the regex only matches one character at a time, even though it needs to check up to 5 characters for each match :
"654321".scan(r)
# => ["6", "5"]
/(.)..../ wouldn't work, because it would consume 5 characters for each iteration :
"654321".scan(/(.)..../)
# => [["6"]]
"abcdefghij".scan(/(.)..../)
# => [["a"], ["f"]]
If you want to parametrize the length of the unmasked string, you can use variable interpolation :
all_but = 4
/.(?=.{#{all_but}})/
# => /.(?=.{4})/
Code
Packing it into a method, it becomes :
def mask(string, all_but = 4, char = '#')
string.gsub(/.(?=.{#{all_but}})/, char)
end
p mask('testabcdef')
# '######cdef'
p mask('1234')
# '1234'
p mask('123')
# '123'
p mask('x')
# 'x'
You could also adapt it for sentences :
def mask(string, all_but = 4, char = '#')
string.gsub(/\w(?=\w{#{all_but}})/, char)
end
p mask('It even works for multiple words')
# "It even #orks for ####iple #ords"
Some notes about your code
string.to_s
Naming things is very important in programming, especially in dynamic languages.
string.to_s
If string is indeed a string, there shouldn't be any reason to call to_s.
If string isn't a string, you should indeed call to_s before gsub but should also rename string to a better description :
object.to_s
array.to_s
whatever.to_s
join
puts array.join(", ").delete(", ").inspect
What do you want to do exactly? You could probably just use join :
[1,2,[3,4]].join(", ").delete(", ")
# "1234"
[1,2,[3,4]].join
# "1234"
delete
Note that .delete(", ") deletes every comma and every whitespace, in any order. It doesn't only delete ", " substrings :
",a b,,, cc".delete(', ')
# "abcc"
["1,2", "3,4"].join(', ').delete(', ')
# "1234"
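If the intent is to remove only the literal ", " separator, one option is gsub with a string pattern. A small sketch for comparison; the helper name strip_separator is made up for illustration.

```ruby
# Removes only exact occurrences of the separator substring.
def strip_separator(str, sep = ", ")
  str.gsub(sep, "")
end

p "a, b,c d".delete(", ")       # every comma and every space is removed
#=> "abcd"
p strip_separator("a, b,c d")   # only the ", " substring is removed
#=> "ab,c d"
```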
Ruby makes this sort of thing pretty trivial:
class String
def asteriskify(tail = 4, char = '#')
if (length <= tail)
self
else
char * (length - tail) + self[-tail, tail]
end
end
end
Then you can apply it like this:
"moo".asteriskify
# => "moo"
"testing".asteriskify
# => "###ting"
"password".asteriskify(5, '*')
# => "***sword"
Try this one
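def mask(string)
# short strings are returned unchanged ('#' * a negative number would raise)
return string if string.length <= 4
# note: this modifies the string passed in
string[0..-5] = '#' * (string.length - 4)
string
end
mask("12345678")
#=> "####5678"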
I will add my solution to this topic too :)
def mask(str)
str.match(/(.*)(.{4})/)
'#' * ($1 || '').size + ($2 || str)
end
mask('abcdef') # => "##cdef"
mask('x') # => "x"
I offer this solution mainly to remind readers that String#gsub without a block returns an enumerator.
def mask(str, nbr_unmasked, mask_char)
str.gsub(/./).with_index { |s,i| i < str.size-nbr_unmasked ? mask_char : s }
end
mask("abcdef", 4, '#')
#=> "##cdef"
mask("abcdef", 99, '#')
#=> "abcdef"
Try using tap
def mask_string(str)
str.tap { |p| p[0...-4] = '#' * (p[0...-4].length) } if str.length > 4
str
end
mask_string('ABCDEF') # => "##CDEF"
mask_string('AA') # => "AA"
mask_string('S') # => "S"
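One thing to keep in mind with the slice-assignment variants above is that they modify the string passed in. If that is not wanted, a non-mutating version could look like the sketch below; the name mask_copy and its default parameters are assumptions for illustration.

```ruby
# Returns a new masked string and leaves the argument untouched.
def mask_copy(str, keep = 4, char = '#')
  return str.dup if str.length <= keep
  char * (str.length - keep) + str[-keep, keep]
end

original = 'ABCDEF'
p mask_copy(original)  #=> "##CDEF"
p original             #=> "ABCDEF"
p mask_copy('AA')      #=> "AA"
```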

Substring within string

I need to extract from an input everything that is after a parameter.
Input: "-a Apple -b Ball -c Chocolate"
Criteria: Need to extract everything after -c.
My output should be Chocolate. I tried split, scan and the output returned two elements. Can anyone help me with this requirement?
Also, please let me know how to handle the case where my input is "-a Apple -c Chocolate -b Ball".
You can use the OptionParser library to do this:
require 'optparse'
arguments = { }
opts = OptionParser.new do |parser|
parser.on('-a=s') do |v|
arguments[:a] = v
end
parser.on('-b=s') do |v|
arguments[:b] = v
end
parser.on('-c=s') do |v|
arguments[:c] = v
end
end
opts.parse("-a Apple -b Ball -c Chocolate".split)
arguments
# => {:a=>"Apple", :b=>"Ball", :c=>"Chocolate"}
It's quite flexible in how it works, so you can define a lot of options and how they're interpreted.
If you really want everything after the marker (-c):
s = "-a Apple -b Ball -c Chocolate"
index = s.index('-c')
everything_after = s[(index + 2)..-1].strip
puts everything_after # => Chocolate
If you want to parse the arguments:
require 'optparse'
opts = OptionParser.new do |parser|
parser.on('-a=s') do |v|
end
parser.on('-b=s') do |v|
end
parser.on('-c=s') do |v|
puts "-c is #{v}"
end
end
opts.parse("-a Apple -b Ball -c Chocolate".split(/\s/))
(you will need to specify all the flags, otherwise the parser will choke)
Or you could simply match the content with a Regexp.
I think you are looking for: <ANYTHING><FLAG><ANYTHING BUT DASH><ANYTHING> where <FLAG> is '-c '
s.match(/\A.*-c\s([^-]*).*\z/) do |match|
p match[1]
end
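The same idea can be wrapped in a small helper that copes with either ordering of the flags. This is only a sketch: value_after is a hypothetical name, and it assumes that values never contain a dash, since the capture stops at the next one.

```ruby
# Returns the text following the flag, up to the next dash, or nil if the
# flag is not present. Assumes values themselves contain no "-".
def value_after(input, flag)
  value = input[/#{Regexp.escape(flag)}\s+([^-]*)/, 1]
  value && value.strip
end

p value_after("-a Apple -b Ball -c Chocolate", "-c")  #=> "Chocolate"
p value_after("-a Apple -c Chocolate -b Ball", "-c")  #=> "Chocolate"
p value_after("-a Apple", "-c")                       #=> nil
```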
Assuming that the input is the command line arguments passed to a ruby script, try:
ARGV[ARGV.index("-c") + 1]
Explanation:
ARGV is an array that includes all the arguments passed to a ruby script. Array#index returns the index of the first element equal to the given object, or nil if there is no such element.
Refer to Array#index for more info.
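Because Array#index returns nil when the flag is missing, the one-liner above raises NoMethodError in that case. A guarded sketch, written against a plain array so it can be tried without command-line arguments (arg_after is a hypothetical helper name):

```ruby
# Returns the element following the flag, or nil if the flag is absent
# or is the last element.
def arg_after(args, flag)
  i = args.index(flag)
  i && args[i + 1]
end

p arg_after(%w[-a Apple -c Chocolate -b Ball], "-c")  #=> "Chocolate"
p arg_after(%w[-a Apple], "-c")                       #=> nil
```

In a script, the first argument would be ARGV.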
s = "-a Apple -b Ball -c Chocolate"
One way: calculate an index
marker = "-c"
s[s.index(marker)+marker.size+1..-1]
#=> "Chocolate"
marker = "-b"
s[s.index(marker)+marker.size+1..-1]
#=> "Ball -c Chocolate"
marker = "-a"
s[s.index(marker)+marker.size+1..-1]
#=> "Apple -b Ball -c Chocolate"
Another way: use a regex
`\K` in the regex below means "forget everything matched so far".
marker = "-c"
s[/#{marker}\s+\K.*/]
#=> "Chocolate"
marker = "-b"
s[/#{marker}\s+\K.*/]
#=> "Ball -c Chocolate"
marker = "-a"
s[/#{marker}\s+\K.*/]
#=> "Apple -b Ball -c Chocolate"
Consider the regex for one of these markers.
marker = "-a"
r = /
#{marker} # match the contents of the variable 'marker'
\s+ # match > 0 whitespace chars
\K # forget everything matched so far
.* # match the rest of the line
/x # free-spacing regex definition mode
#=> /
# -a # match the contents of the variable 'marker'
# \s+ # match > 0 whitespace chars
# \K # forget everything matched so far
# .* # match the rest of the line
# /x
s[r]
#=> "Apple -b Ball -c Chocolate"
But if you really want just the text between markers
I will construct a hash with markers as keys and text as values. First, we will use the following regex to split the string.
r = /
\s* # match >= 0 spaces
\- # match hyphen
( # begin capture group 1
[a-z] # match marker
) # end capture group 1
\s* # match >= 0 spaces
/x # free-spacing regex definition mode
h = s.split(r).drop(1).each_slice(2).to_h
#=> {"a"=>"Apple", "b"=>"Ball", "c"=>"Chocolate"}
With this hash we can retrieve the text for each marker.
h["a"]
#=> "Apple"
h["b"]
#=> "Ball"
h["c"]
#=> "Chocolate"
The steps to create the hash are as follows.
a = s.split(r)
#=> ["", "a", "Apple", "b", "Ball", "c", "Chocolate"]
Notice that, by putting [a-z] within a capture group in the regex, "a", "b" and "c" are included in the array a. (See String#split, third paragraph.)
b = a.drop(1)
#=> ["a", "Apple", "b", "Ball", "c", "Chocolate"]
c = b.each_slice(2)
#=> #<Enumerator: ["a", "Apple", "b", "Ball", "c", "Chocolate"]:each_slice(2)>
We can see the elements of the enumerator c by converting it to an array:
c.to_a
#=> [["a", "Apple"], ["b", "Ball"], ["c", "Chocolate"]]
Lastly,
c.to_h
#=> {"a"=>"Apple", "b"=>"Ball", "c"=>"Chocolate"}

How to write a regex in a single line

I have this code:
str = 'printf("My name is %s and age is %0.2d", name, age);'
SPECIFIERS = 'diuXxofeEgsc'
format_specifiers = /((?:%(?:\*?([-+]?\d*\.?\d+)*(?:[#{SPECIFIERS}]))))/i
variables = /([.[^"]]*)\);$/
format = str.scan(format_specifiers)
var = str.scan(variables).first.first.split(/,/)
Is there any way a single regex can do that in a couple of lines?
My desired output is:
%s, name
%0.2d, age
I'm a big believer in keeping regular expressions as simple as possible; they can too quickly mushroom into unwieldy, unmaintainable messes. I'd start with something like this, then tweak as necessary:
str = 'printf("My name is %s and age is %0.2d", name, age);'
formats = str.scan(/%[a-z0-9.]+/) # => ["%s", "%0.2d"]
str[/,(.+)\);$/] # => ", name, age);"
vars = str[/,(.+)\);$/].scan(/[a-z]+/) # => ["name", "age"]
puts formats.zip(vars).map{ |a| a.join(', ')}
# >> %s, name
# >> %0.2d, age
Your question has two parts:
Q1: Is it possible to do this with a single regex?
Q2: Can this be done in one or two lines of code?
The answer to both questions is "yes".
format_specifiers = /
%[^\s"]+ # match % followed by > 0 characters other than
# whitespace or a double-quote
/x # free-spacing regex definition mode
variables = /
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
/x
You can decide, after testing, if these two regexes do their jobs adequately. For testing, refer to Kernel#sprintf. (Note that \z is not an anchor inside a character class, so it is not listed there; the match stops at the end of the string in any case.)
r = /
(?:#{format_specifiers}) # match format_specifiers in a non-capture group
| # or
(?:#{variables}) # match variables in a non-capture group
/x
#=> /
(?:(?x-mi:
%[^\s"]+ # match % followed by > 0 characters other than
# whitespace or a double-quote
)) # match format_specifiers in a non-capture group
| # or
(?:(?x-mi:
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
)) # match variables in a non-capture group
/x
r can of course also be written:
/(?:(?x-mi:%[^\s"]+))|(?:(?x-mi:,\s*\K[a-zA-Z]\w*))/
One advantage of constructing r from two regexes is that each of the latter can be tested separately.
str = 'printf("My name is %s and age is %0.2d", name, age);'
arr = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
arr.each_slice(arr.size/2).to_a.transpose.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
I have five lines of code. We could reduce this to two by simply substituting out r in str.scan(r). We could make it a single line by writing:
str.scan(r).tap { |a|
a.replace(a.each_slice(a.size/2).to_a.transpose.map { |s| s.join(', ') }) }
#=> ["%s, name", "%0.2d, age"]
with r substituted out.
The steps here are as follows:
a = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
b = a.each_slice(a.size/2)
#=> a.each_slice(2)
#=> #<Enumerator: ["%s", "%0.2d", "name", "age"]:each_slice(2)>
c = b.to_a
#=> [["%s", "%0.2d"], ["name", "age"]]
d = c.transpose
#=> [["%s", "name"], ["%0.2d", "age"]]
e = d.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
a.replace(e)
#=> ["%s, name", "%0.2d, age"]
The methods used (aside from Array#size) are String#scan, Enumerable#each_slice, Enumerable#to_a, Enumerable#map, Array#transpose and Array#replace.

Regexp to match repeated substring

I would like to verify a string containing repeated substrings. The substrings have a particular structure. Whole string has a particular structure (substring split by "|"). For instance, the string can be:
1=23.00|6=22.12|12=21.34|112=20.34
1=23.00|6=22.12|12=21.34
1=23.00|12=21.34
1=23.00**
How can I check that all repeated substrings match a regexp? I tried to check it with:
"1=23.00|6=22.12|12=21.34".match(/([1-9][0-9]*[=][0-9\.]+)+/)
But checking gives true even when several substrings do not match the regexp:
"1=23.00|6=ass|=21.34".match(/([1-9][0-9]*[=][0-9\.]+)+/)
# => #<MatchData "1=23.00" 1:"1=23.00">
The question is whether every repeated substring matches a regex. I understand that the substrings are separated by the character | or $/, the latter being the end of a line. We first need to obtain the repeated substrings:
a = str.split(/[#{$/}\|]/)
.map(&:strip)
.group_by {|s| s}
.select {|_,v| v.size > 1 }
.keys
Next we specify whatever regex you wish to use. I am assuming it is this:
REGEX = /[1-9][0-9]*=[1-9]+\.[0-9]+/
but it could be altered if you have other requirements.
As we wish to determine if all repeated substrings match the regex, that is simply:
a.all? {|s| s =~ REGEX}
Here are the calculations:
str =<<_
1=23.00|6=22.12|12=21.34|112=20.34
1=23.00|6=22.12|12=21.34
1=23.00|12=21.34
1=23.00**
_
c = str.split(/[#{$/}\|]/)
#=> ["1=23.00", "6=22.12", "12=21.34", "112=20.34", "1=23.00",
# "6=22.12", "12=21.34", "1=23.00", "12=21.34", "1=23.00**"]
d = c.map(&:strip)
# same as c, possibly not needed or not wanted
e = d.group_by {|s| s}
# => {"1=23.00" =>["1=23.00", "1=23.00", "1=23.00"],
# "6=22.12" =>["6=22.12", "6=22.12"],
# "12=21.34" =>["12=21.34", "12=21.34", "12=21.34"],
# "112=20.34"=>["112=20.34"], "1=23.00**"=>["1=23.00**"]}
f = e.select {|_,v| v.size > 1 }
#=> {"1=23.00"=>["1=23.00", "1=23.00" , "1=23.00"],
# "6=22.12"=>["6=22.12", "6=22.12"],
# "12=21.34"=>["12=21.34", "12=21.34", "12=21.34"]}
a = f.keys
#=> ["1=23.00", "6=22.12", "12=21.34"]
a.all? {|s| s =~ REGEX}
#=> true
This returns true if any value (the part after "=") occurs more than once, and false if not:
s = "1=23.00|6=22.12|12=21.34|112=20.34|3=23.00"
arr = s.split(/\|/).map { |pair| pair.sub(/\A\d+=/, "") }
arr != arr.uniq # => true
If you want to solve it with a regexp alone (rather than Ruby code), you have to match the whole string, not individual substrings. I added an optional | separator and an end-of-line anchor to your regexp; with a start-of-line anchor as well, it should work the way you want.
^([1-9][0-9]*[=][0-9\.]+[|]*)+$
Try it out.
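An alternative that avoids one large regex is to split on | and require every piece to match an anchored pattern. This is a sketch under assumptions: the value format (digits, a dot, digits) is a guess that is stricter than the [0-9\.]+ in the question, and the names PAIR and all_pairs_valid? are made up for illustration.

```ruby
# One "key=value" pair: key is a positive integer, value is digits.digits.
PAIR = /\A[1-9][0-9]*=[0-9]+\.[0-9]+\z/

def all_pairs_valid?(str)
  parts = str.split("|", -1)  # -1 keeps empty trailing pieces so "1=2.0|" fails
  !parts.empty? && parts.all? { |part| PAIR.match?(part) }
end

p all_pairs_valid?("1=23.00|6=22.12|12=21.34")  #=> true
p all_pairs_valid?("1=23.00|6=ass|=21.34")      #=> false
```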

Resources