Substring within string

Substring within string - ruby

I need to extract from an input everything that is after a parameter.
Input: "-a Apple -b Ball -c Chocolate"
Criteria: Need to extract everything after -c.
My output should be Chocolate. I tried split and scan, but the output had two elements. Can anyone help me with this requirement?
Also, please let me know how to handle the input "-a Apple -c Chocolate -b Ball".

You can use the OptionParser library to do this:
require 'optparse'
arguments = { }
opts = OptionParser.new do |parser|
parser.on('-a=s') do |v|
arguments[:a] = v
end
parser.on('-b=s') do |v|
arguments[:b] = v
end
parser.on('-c=s') do |v|
arguments[:c] = v
end
end
opts.parse("-a Apple -b Ball -c Chocolate".split)
arguments
# => {:a=>"Apple", :b=>"Ball", :c=>"Chocolate"}
It's quite flexible in how it works, so you can define a lot of options and how they're interpreted.
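The second input from the question, where -c is no longer the last flag, needs no special handling, because OptionParser does not depend on flag order. Here is a minimal sketch, assuming only the three flags from the question. The helper name parse_flags is hypothetical, and the "-a VALUE" form is used to declare a flag with a mandatory argument:

```ruby
require 'optparse'

# Hypothetical helper: parses a flag string and returns a hash of values.
def parse_flags(input)
  values = {}
  parser = OptionParser.new do |opts|
    %w[a b c].each do |flag|
      # "-a VALUE" declares a short flag that requires an argument.
      opts.on("-#{flag} VALUE") { |v| values[flag.to_sym] = v }
    end
  end
  parser.parse(input.split)
  values
end

first  = parse_flags("-a Apple -b Ball -c Chocolate")
second = parse_flags("-a Apple -c Chocolate -b Ball")
puts first[:c]  # Chocolate
puts second[:c] # Chocolate
```

Both inputs give the same value for -c, which is the main reason to prefer a parser over index arithmetic here.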

If you really want everything after the marker (-c):
s = "-a Apple -b Ball -c Chocolate"
index = s.index('-c')
everything_after = s[(index + 2)..-1].lstrip # lstrip drops the space that follows the flag
puts everything_after # => Chocolate
If you want to parse the arguments:
require 'optparse'
opts = OptionParser.new do |parser|
parser.on('-a=s') do |v|
end
parser.on('-b=s') do |v|
end
parser.on('-c=s') do |v|
puts "-c is #{v}"
end
end
opts.parse("-a Apple -b Ball -c Chocolate".split(/\s/))
(you will need to specify all the flags, otherwise the parser raises OptionParser::InvalidOption)
Or you could simply match the content with a Regexp.
I think you are looking for: <ANYTHING><FLAG><ANYTHING BUT DASH><ANYTHING> where <FLAG> is '-c '
s.match(/\A.*-c\s([^-]*).*\z/) do |match|
p match[1]
end
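The regex idea above can be wrapped in a small helper that takes the flag as a parameter. This is only a sketch: value_for is a hypothetical name, and it assumes that values never contain a dash, since [^-]* stops at the next one.

```ruby
# Hypothetical helper: returns the text that follows `flag`, up to the next dash.
def value_for(input, flag)
  # Regexp.escape keeps the flag from being read as regex syntax.
  match = input.match(/#{Regexp.escape(flag)}\s+([^-]*)/)
  match && match[1].strip
end

puts value_for("-a Apple -b Ball -c Chocolate", "-c") # Chocolate
puts value_for("-a Apple -c Chocolate -b Ball", "-c") # Chocolate
```

The strip call removes the space left over when another flag follows the value, and the helper returns nil when the flag is absent.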

Assuming that the input is the command line arguments passed to a ruby script, try:
ARGV[ARGV.index("-c") + 1]
Explanation:
ARGV is an array that includes all the arguments passed to a ruby script. Array#index returns the index of the first element equal to the given object, or nil if there is none.
Refer to Array#index for more info.
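Because Array#index returns nil when the flag is missing, the one-liner raises NoMethodError in that case. A guarded sketch follows; flag_value is a hypothetical helper, and a plain array stands in for ARGV so the example can be run anywhere:

```ruby
# Hypothetical helper: returns the argument that follows `flag`, or nil.
def flag_value(args, flag)
  position = args.index(flag)
  position && args[position + 1]
end

args = %w[-a Apple -c Chocolate -b Ball] # stand-in for ARGV
puts flag_value(args, "-c") # Chocolate
```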

s = "-a Apple -b Ball -c Chocolate"
One way: calculate an index
marker = "-c"
s[s.index(marker)+marker.size+1..-1]
#=> "Chocolate"
marker = "-b"
s[s.index(marker)+marker.size+1..-1]
#=> "Ball -c Chocolate"
marker = "-a"
s[s.index(marker)+marker.size+1..-1]
#=> "Apple -b Ball -c Chocolate"
Another way: use a regex
`\K` in the regex below means "forget everything matched so far".
marker = "-c"
s[/#{marker}\s+\K.*/]
#=> "Chocolate"
marker = "-b"
s[/#{marker}\s+\K.*/]
#=> "Ball -c Chocolate"
marker = "-a"
s[/#{marker}\s+\K.*/]
#=> "Apple -b Ball -c Chocolate"
Consider the regex for one of these markers.
marker = "-a"
r = /
#{marker} # match the contents of the variable 'marker'
\s+ # match > 0 whitespace chars
\K # forget everything matched so far
.* # match the rest of the line
/x # free-spacing regex definition mode
#=> /
# -a # match the contents of the variable 'marker'
# \s+ # match > 0 whitespace chars
# \K # forget everything matched so far
# .* # match the rest of the line
# /x
s[r]
#=> "Apple -b Ball -c Chocolate"
But if you really want just the text between markers
I will construct a hash with markers as keys and text as values. First, we will use the following regex to split the string.
r = /
\s* # match >= 0 spaces
\- # match hyphen
( # begin capture group 1
[a-z] # match marker
) # end capture group 1
\s* # match >= 0 spaces
/x # free-spacing regex definition mode
h = s.split(r).drop(1).each_slice(2).to_h
#=> {"a"=>"Apple", "b"=>"Ball", "c"=>"Chocolate"}
With this hash we can retrieve the text for each marker.
h["a"]
#=> "Apple"
h["b"]
#=> "Ball"
h["c"]
#=> "Chocolate"
The steps to create the hash are as follows.
a = s.split(r)
#=> ["", "a", "Apple", "b", "Ball", "c", "Chocolate"]
Notice that, by putting [a-z] within a capture group in the regex, "a", "b" and "c" are included in the array a. (See String#split, third paragraph.)
b = a.drop(1)
#=> ["a", "Apple", "b", "Ball", "c", "Chocolate"]
c = b.each_slice(2)
#=> #<Enumerator: ["a", "Apple", "b", "Ball", "c", "Chocolate"]:each_slice(2)>
We can see the elements of the enumerator c by converting it to an array:
c.to_a
#=> [["a", "Apple"], ["b", "Ball"], ["c", "Chocolate"]]
Lastly,
c.to_h
#=> {"a"=>"Apple", "b"=>"Ball", "c"=>"Chocolate"}
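The same steps also cover the second input from the question, since the hash does not care about flag order. A compact sketch follows, with the regex written on one line; FLAG_SPLIT and flags_to_hash are hypothetical names, and the input is assumed to start with a flag (otherwise drop(1) would discard real text):

```ruby
# One-line form of the split regex: optional spaces, a hyphen,
# a captured lowercase marker, optional spaces.
FLAG_SPLIT = /\s*-([a-z])\s*/

# Hypothetical helper: turns "-a Apple -c Chocolate" into {"a"=>"Apple", "c"=>"Chocolate"}.
def flags_to_hash(input)
  input.split(FLAG_SPLIT).drop(1).each_slice(2).to_h
end

h = flags_to_hash("-a Apple -c Chocolate -b Ball")
puts h["c"] # Chocolate
```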

Related

How do I extract the part of a string whose individual words begin with letters?

I'm using Ruby 2.4. Let's say I have a string that has a number of spaces in it
str = "abc def 123ffg"
How do I capture all the consecutive words at the beginning of the string that begin with a letter? So for example, in the above, I would want to capture
"abc def"
And if I had a string like
"aa22 b cc 33d ff"
I would want to capture
"aa22 b cc"
but if my string were
"66dd eee ff"
I would want to return nothing because the first word of that string does not begin with a letter.

If you can spare the extra spaces between words, you could then split the string and iterate the resulting array with take_while, using a regex to get the desired output; something like this:
str = "abc def 123ffg"
str.split.take_while { |word| word[0] =~ /[[:alpha:]]/ }
#=> ["abc", "def"]
The output is an array, but if a string is needed, you could use join at the end:
str.split.take_while { |word| word[0] =~ /[[:alpha:]]/ }.join(" ")
#=> "abc def"
More examples:
"aa22 b cc 33d ff".split.take_while { |word| word[0] =~ /[[:alpha:]]/ }
#=> ["aa22", "b", "cc"]
"66dd eee ff".split.take_while { |word| word[0] =~ /[[:alpha:]]/ }
#=> []
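For reuse, the take_while approach can be wrapped in a method. This is a sketch with a hypothetical method name; it uses String#match?, which is available from Ruby 2.4 (the version mentioned in the question):

```ruby
# Hypothetical helper: returns the leading words that start with a letter,
# joined by single spaces (runs of whitespace in the input are collapsed).
def leading_letter_words(str)
  str.split.take_while { |word| word.match?(/\A[[:alpha:]]/) }.join(" ")
end

puts leading_letter_words("abc def 123ffg")   # abc def
puts leading_letter_words("aa22 b cc 33d ff") # aa22 b cc
```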

The Regular Expression
There's usually more than one way to match a pattern, although some are simpler than others. A relatively simple regular expression that works with your inputs and expected outputs is as follows:
/\A(?:\s*\p{L}\S*)+/
This matches at the start of the string, followed by one or more repetitions of:
zero or more whitespace characters
followed by a Unicode category of "letter"
followed by zero or more non-whitespace characters
The \A anchor matters: without it, the regex engine would skip ahead and match a later run of words (for example "dd eee ff" in the third input). The repeated non-capturing group is what allows the match to continue until a word starts with a non-letter.
The Proofs
regex = /\A(?:\s*\p{L}\S*)+/
regex.match 'aa22 b cc 33d ff' #=> #<MatchData "aa22 b cc">
regex.match 'abc def 123ffg' #=> #<MatchData "abc def">
regex.match '66dd eee ff' #=> nil

The sub method can be used to replace with an empty string '' everything that needs to be removed from the expression.
In this case, a first sub method is needed to remove the whole text if it starts with a digit. Then another sub will remove everything starting from any word that starts with a digit.
Answer:
str.sub(/^\d+.*/, '').sub(/\s+\d+.*/, '')
Outputs:
str = "abc def 123ffg"
# => "abc def"
str = "aa22 b cc 33d ff"
# => "aa22 b cc"
str = "66dd eee ff"
# => ""

Remove a string pattern and symbols from string

I need to clean up a string from the phrase "not" and hashtags(#). (I also have to get rid of spaces and capslock and return them in arrays, but I got the latter three taken care of.)
Expectation:
"not12345" #=> ["12345"]
" notabc " #=> ["abc"]
"notone, nottwo" #=> ["one", "two"]
"notCAPSLOCK" #=> ["capslock"]
"##doublehash" #=> ["doublehash"]
"h#a#s#h" #=> ["hash"]
"#notswaggerest" #=> ["swaggerest"]
This is the code I have
def some_method(string)
string.split(", ").map{|n| n.sub(/(not)/,"").downcase.strip}
end
All of the tests above do what I need except for the hash ones. I don't know how to get rid of the hashes; I have tried modifying the regex part: n.sub(/(#not)/), n.sub(/#(not)/), n.sub(/[#]*(not)/) to no avail. How can I make the regex remove #?

arr = ["not12345", " notabc", "notone, nottwo", "notCAPSLOCK",
"##doublehash:", "h#a#s#h", "#notswaggerest"]
arr.flat_map { |str| str.downcase.split(',').map { |s| s.gsub(/#|not|\s+/,"") } }
#=> ["12345", "abc", "one", "two", "capslock", "doublehash:", "hash", "swaggerest"]
When the block variable str is set to "notone, nottwo",
s = str.downcase
#=> "notone, nottwo"
a = s.split(',')
#=> ["notone", " nottwo"]
b = a.map { |s| s.gsub(/#|not|\s+/,"") }
#=> ["one", "two"]
Because I used Enumerable#flat_map, "one" and "two" are added to the array being returned. When str #=> "notCAPSLOCK",
s = str.downcase
#=> "notcapslock"
a = s.split(',')
#=> ["notcapslock"]
b = a.map { |s| s.gsub(/#|not|\s+/,"") }
#=> ["capslock"]

Here is one more solution that uses a different technique of capturing what you want rather than dropping what you don't want: (for the most part)
a = ["not12345", " notabc", "notone, nottwo",
"notCAPSLOCK", "##doublehash:","h#a#s#h", "#notswaggerest"]
a.map do |s|
s.strip.downcase.delete("#").scan(/(?<=not)\w+|^[^not]\w+/) # strip first, or ^[^not] matches the leading space in " notabc"
end
#=> [["12345"], ["abc"], ["one", "two"], ["capslock"], ["doublehash"], ["hash"], ["swaggerest"]]
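I had to delete the # characters because of h#a#s#h; otherwise delete could have been avoided with a regex like /(?<=not|^#[^not])\w+/. Note that [^not] is a character class (any single character other than n, o or t), so an input without "not" that starts with one of those letters is not matched.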

You can use this regex to solve your problem. I tested and it works for all of your test cases.
/^\s*#*(not)*/
^ means match start of string
\s* matches any space at the start
#* matches 0 or more #
(not)* matches the phrase "not" zero or more times.
Note: this regex won't work for cases where "not" comes before "#", such as not#hash would return #hash

Fun problem because it can use the most common string functions in Ruby:
result = values.map do |string|
string.strip # Remove spaces in front and back.
.tr('#','') # Transform single characters. In this case remove #
.gsub('not','') # Substitute patterns
.split(', ') # Split into arrays.
end
p result #=>[["12345"], ["abc"], ["one", "two"], ["CAPSLOCK"], ["doublehash"], ["hash"], ["swaggerest"]]
I prefer this way rather than a regexp as it is easy to understand the logic of each line.
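The chain above leaves "CAPSLOCK" in upper case, while the question expects lower case. A variant that also downcases is sketched below; clean_tags is a hypothetical name, and it removes only the first "not" in each comma-separated part, as the question's own code does:

```ruby
# Hypothetical helper: splits on commas, then removes '#', the first "not",
# surrounding spaces and upper case from each part.
def clean_tags(string)
  string.split(",").map do |part|
    part.downcase.delete("#").sub("not", "").strip
  end
end

p clean_tags("notone, nottwo") # ["one", "two"]
p clean_tags("h#a#s#h")        # ["hash"]
```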

Ruby regular expressions allow comments in free-spacing (/x) mode, and #{...} triggers interpolation, so to match a literal octothorpe (#) safely you can escape it:
"#foo".sub(/\#/, "") #=> "foo"

Ruby regex to get text blocks including delimiters

When using scan in Ruby, we are searching for a block within a text file.
Sample file:
sometextbefore
begin
sometext
end
sometextafter
begin
sometext2
end
sometextafter2
We want the following result in an array:
["begin\nsometext\nend","begin\nsometext2\nend"]
With this scan method:
textfile.scan(/begin\s.(.*?)end/m)
we get:
["sometext","sometext2"]
We want the begin and end still in the output, not cut off.
Any suggestions?

You may remove the capturing group completely:
textfile.scan(/begin\s.*?end/m)
The String#scan method returns only the captured values when the pattern contains capturing groups, so removing the group (or making it non-capturing) makes scan return the whole match.
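The difference is easy to see side by side. The sketch below uses the sample file contents from the question as an inline string (an assumption, standing in for the text read from the file):

```ruby
text = "sometextbefore\nbegin\nsometext\nend\nsometextafter\n" \
       "begin\nsometext2\nend\nsometextafter2\n"

# With a capturing group, scan returns one array of captures per match.
with_group = text.scan(/begin\s(.*?)end/m)

# Without a group, scan returns the full matched strings.
without_group = text.scan(/begin\s.*?end/m)

p with_group    # [["sometext\n"], ["sometext2\n"]]
p without_group # ["begin\nsometext\nend", "begin\nsometext2\nend"]
```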
UPDATE
If the lines inside the blocks must be trimmed from leading/trailing whitespace, you can just use a gsub against each matched block of text to remove all the horizontal whitespace (with the help of \p{Zs} Unicode category/property class):
.scan(/begin\s.*?end/m).map { |s| s.gsub(/^\p{Zs}+|\p{Zs}+$/, "") }
Here, each match is passed to a block where /^\p{Zs}+|\p{Zs}+$/ matches either the start of a line with 1+ horizontal whitespace(s) (see ^\p{Zs}+), or 1+ horizontal whitespace(s) at the end of the line (see \p{Zs}+$).

Here's another approach, using Ruby's flip-flop operator. I cannot say I would recommend this approach, but Rubyists should understand how the flip-flop operator works.
First let's create a file.
str =<<_
some
text
at beginning
begin
 some
 text
 1
end
some text
between
begin
 some
 text
 2
end
some text at end
_
#=> "some\ntext\nat beginning\nbegin\n some\n text\n 1\nend\n...at end\n"
FName = "text"
File.write(FName, str)
Now read the file line-by-line into the array lines:
lines = File.readlines(FName)
#=> ["some\n", "text\n", "at beginning\n", "begin\n", " some\n", " text\n",
# " 1\n", "end\n", "some text\n", "between\n", "begin\n", " some\n",
# " text\n", " 2\n", "end\n", "some text at end\n"]
We can obtain the desired result as follows.
lines.chunk { |line| true if line =~ /^begin\s*$/ .. line =~ /^end\s*$/ }.
map { |_,arr| arr.map(&:strip).join("\n") }
#=> ["begin\nsome\ntext\n1\nend", "begin\nsome\ntext\n2\nend"]
The two steps are as follows.
First, select and group the lines of interest, using Enumerable#chunk with the flip-flop operator.
a = lines.chunk { |line| true if line =~ /^begin\s*$/ .. line =~ /^end\s*$/ }
#=> #<Enumerator: #<Enumerator::Generator:0x007ff62b981510>:each>
We can see the objects that will be generated by this enumerator by converting it to an array.
a.to_a
#=> [[true, ["begin\n", " some\n", " text\n", " 1\n", "end\n"]],
# [true, ["begin\n", " some\n", " text\n", " 2\n", "end\n"]]]
Note that the flip-flop operator is distinguished from a range definition by making it part of a logical expression. For that reason we cannot write
lines.chunk { |line| line =~ /^begin\s*$/ .. line =~ /^end\s*$/ }.to_a
#=> ArgumentError: bad value for range
The second step is the following:
b = a.map { |_,arr| arr.map(&:strip).join("\n") }
#=> ["begin\nsome\ntext\n1\nend", "begin\nsome\ntext\n2\nend"]
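For comparison, the same grouping can be written without the flip-flop operator by tracking the "inside a block" state explicitly. This is a sketch of an alternative, not the approach above; extract_blocks is a hypothetical name, and it assumes blocks are not nested:

```ruby
# Hypothetical helper: collects stripped lines from each "begin" line
# through the next "end" line, one joined string per block.
def extract_blocks(lines)
  blocks = []
  current = nil # nil means we are outside a block
  lines.each do |line|
    stripped = line.strip
    current = [] if stripped == "begin"
    current << stripped if current
    if current && stripped == "end"
      blocks << current.join("\n")
      current = nil
    end
  end
  blocks
end

sample = ["before\n", "begin\n", " some\n", " text\n", "end\n", "after\n"]
p extract_blocks(sample) # ["begin\nsome\ntext\nend"]
```

The explicit state variable does the job that the flip-flop does implicitly: it turns on at "begin" and off after "end".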

Ruby has some great methods in Enumerable. slice_before and slice_after can help with this sort of problem:
string = <<EOT
sometextbefore
begin
sometext
end
sometextafter
begin
sometext2
end
sometextafter2
EOT
ary = string.split # => ["sometextbefore", "begin", "sometext", "end", "sometextafter", "begin", "sometext2", "end", "sometextafter2"]
.slice_after(/^end/) # => #<Enumerator: #<Enumerator::Generator:0x007fb1e20b42a8>:each>
.map{ |a| a.shift; a } # => [["begin", "sometext", "end"], ["begin", "sometext2", "end"], []]
ary.pop # => []
ary # => [["begin", "sometext", "end"], ["begin", "sometext2", "end"]]
If you want the resulting sub-arrays joined then that's an easy step:
ary.map{ |a| a.join("\n") } # => ["begin\nsometext\nend", "begin\nsometext2\nend"]

How to write a regex in a single line

I have this code:
str = 'printf("My name is %s and age is %0.2d", name, age);'
SPECIFIERS = 'diuXxofeEgsc'
format_specifiers = /((?:%(?:\*?([-+]?\d*\.?\d+)*(?:[#{SPECIFIERS}]))))/i
variables = /([.[^"]]*)\);$/
format = str.scan(format_specifiers)
var = str.scan(variables).first.first.split(/,/)
Is there any way a single regex can do that in a couple of lines?
My desired output is:
%s, name
%0.2d, age

I'm a big believer in keeping regular expressions as simple as possible; they can too quickly mushroom into unwieldy/unmaintainable messes. I'd start with something like this, then tweak as necessary:
str = 'printf("My name is %s and age is %0.2d", name, age);'
formats = str.scan(/%[a-z0-9.]+/) # => ["%s", "%0.2d"]
str[/,(.+)\);$/] # => ", name, age);"
vars = str[/,(.+)\);$/].scan(/[a-z]+/) # => ["name", "age"]
puts formats.zip(vars).map{ |a| a.join(', ')}
# >> %s, name
# >> %0.2d, age

Your question has two parts:
Q1: Is it possible to do this with a single regex?
Q2: Can this be done in one or two lines of code?
The answer to both questions is "yes".
format_specifiers = /
%[^\s\"\z]+ # match % followed by > 0 characters other than a
# whitespace, a double-quote or the end of the string
/x # free-spacing regex definition mode
variables = /
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
/x
You can decide, after testing, if these two regexes do their jobs adequately. For testing, refer to Kernel#sprintf.
r = /
(?:#{format_specifiers}) # match format_specifiers in a non-capture group
| # or
(?:#{variables}) # match variables in a non-capture group
/x
#=> /
(?:(?x-mi:
%[^\s\"\z]+ # match % followed by > 0 characters other than a
# whitespace, a double-quote or the end of the string
)) # match format_specifiers in a non-capture group
| # or
(?:(?x-mi:
,\s* # match comma followed by >= 0 whitespaces
\K # forget matches so far
[a-zA-Z] # match a letter
\w* # match >= 0 word characters
)) # match variables in a non-capture group
/x
r can of course also be written:
/(?:(?x-mi:%[^\s\"\z]+))|(?:(?x-mi:,\s*\K[a-zA-Z]\w*))/
One advantage of constructing r from two regexes is that each of the latter can be tested separately.
str = 'printf("My name is %s and age is %0.2d", name, age);'
arr = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
arr.each_slice(arr.size/2).to_a.transpose.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
I have five lines of code. We could reduce this to two by simply substituting out r in str.scan(r). We could make it a single line by writing:
str.scan(r).tap { |a|
a.replace(a.each_slice(a.size/2).to_a.transpose.map { |s| s.join(', ') }) }
#=> ["%s, name", "%0.2d, age"]
with r substituted out.
The steps here are as follows:
a = str.scan(r)
#=> ["%s", "%0.2d", "name", "age"]
b = a.each_slice(a.size/2)
#=> a.each_slice(2)
#=> #<Enumerator: ["%s", "%0.2d", "name", "age"]:each_slice(2)>
c = b.to_a
#=> [["%s", "%0.2d"], ["name", "age"]]
d = c.transpose
#=> [["%s", "name"], ["%0.2d", "age"]]
e = d.map { |s| s.join(', ') }
#=> ["%s, name", "%0.2d, age"]
a.replace(e)
#=> ["%s, name", "%0.2d, age"]
The methods used (aside from Array#size) are String#scan, Enumerable#each_slice, Enumerable#to_a, Enumerable#map, Array#transpose and Array#replace.
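If a single combined regex is not a hard requirement, scanning with the two patterns separately and pairing the results with Array#zip avoids the slice-and-transpose step. This is a sketch with simplified one-line patterns, which are assumptions rather than the exact regexes above:

```ruby
str = 'printf("My name is %s and age is %0.2d", name, age);'

format_specifiers = /%[^\s"]+/             # % followed by non-space, non-quote characters
variables         = /,\s*\K[a-zA-Z]\w*/    # an identifier that follows a comma

pairs = str.scan(format_specifiers)
           .zip(str.scan(variables))
           .map { |pair| pair.join(', ') }

puts pairs
# %s, name
# %0.2d, age
```

Like the versions above, this assumes the format string itself contains no commas followed by a letter.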

How to get all leading characters before the first instance of a number

I have product codes that look like:
abc123
abcd23423
I need to get all the leading characters before the first instance of a number, so:
abc
abcd
What's the best way to do this?

"abc123 abcd23423".scan(/(\D*)\d+/)
=> [["abc"], [" abcd"]]
"abc123 abcd23423".scan(/(\D*)\d+/).join
=> "abc abcd"

'abc123 abcd23423'.split(/\d+/).join
or just
'abc123 abcd23423'.gsub(/\d+/,'')

DATA.each do |l|
chars = l[/^([[:alpha:]]+)/, 1] # [:alpha:] = [a-zA-Z]
puts chars
end
__END__
abc123
abcd23423
# >> abc
# >> abcd
If you want to capture the alpha into an array do something like this:
ary = []
DATA.each do |l|
ary << l[/^([[:alpha:]]+)/, 1] # [:alpha:] = [a-zA-Z]
end
ary # => ["abc", "abcd"]
__END__
abc123
abcd23423
I didn't use \D because it means all non-numeric (AKA [^0-9]), but that can be dangerous if you are going to run into any other text that is not an alpha character:
'abc_-$%#123'[/^(\D+)/, 1] # => "abc_-$%#"
For the same reason \w is not necessarily safe:
'abc_-$%#123'[/^(\w+)/, 1] # => "abc_"
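[[:alpha:]] matches alphabetic characters. For ASCII input it is equivalent to [a-zA-Z], although in Ruby it also matches non-ASCII letters in Unicode strings: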
'abc_-$%#123'[/^([a-zA-Z]+)/, 1] # => "abc"
'abc_-$%#123'[/^([[:alpha:]]+)/, 1] # => "abc"

You can use a regular expression which anchors at the beginning of the string (\A) and matches as many non-digit characters (\D*) as possible (* is greedy by default):
processed_codes = codes.map { |code| code[/\A\D*/] }
You can also use String#match, but it returns a MatchData object (or nil when nothing matches), so it takes an extra step to get the string.
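Putting this together for the product codes in the question, here is a minimal sketch. The codes array is an assumption standing in for wherever the codes come from, and [[:alpha:]] is used instead of \D so that punctuation is not picked up:

```ruby
codes = %w[abc123 abcd23423]

# String#[] with a regex returns the matched text; \A anchors at the start.
prefixes = codes.map { |code| code[/\A[[:alpha:]]*/] }

p prefixes # ["abc", "abcd"]
```

Because * allows zero characters, a code that starts with a digit yields an empty string rather than nil.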
