How to extract content within square brackets in ruby - ruby

I am trying to extract the content inside the square brackets.
So far I have been using this, which works but I was wondering instead of using this delete function, if I could directly use something in regex.
a = "This is such a great day [cool awesome]"
a[/\[.*?\]/].delete('[]') #=> "cool awesome"

Almost.
a = "This is such a great day [cool awesome]"
a[/\[(.*?)\]/, 1]
# => "cool awesome"
a[/(?<=\[).*?(?=\])/]
# => "cool awesome"
The first one relies on extracting a group instead of a full match; the second one leverages lookahead and lookbehind to avoid the delimiters in the final match.

You can do this with Regular expression using Regexp#=~.
/\[(?<inside_brackets>.+)\]/ =~ a
=> 25
inside_brackets
=> "cool awesome"
This way you assign inside_brackets to the string which matches the regular expression if any, which I think it is more readable.
Note: Be careful to place the regular expression at the left hand side.

Related

How to change case of letters in string using RegEx in Ruby

Say I have a string : "hEY "
I want to convert it to "Hey "
string.gsub!(/([a-z])([A-Z]+ )/, '\1'.upcase)
That is the idea I have, but it seems like the upcase method does nothing when I use it within the gsub method. Why is that?
EDIT: I came up with this method:
string.gsub!(/([a-z])([A-Z]+ )/) { |str| str.downcase!.capitalize! }
Is there a way to do this within the regex though? I don't really understand the '\1' '\2' thing. Is that backreferencing? How does that work
#sawa Has the simple answer, and you've edited your question with another mechanism. However, to answer two of your questions:
Is there a way to do this within the regex though?
No, Ruby's regex does not support a case-changing feature as some other regex flavors do. You can "prove" this to yourself by reviewing the official Ruby regex docs for 1.9 and 2.0 and searching for the word "case":
https://github.com/ruby/ruby/blob/ruby_1_9_3/doc/re.rdoc
https://github.com/ruby/ruby/blob/ruby_2_0_0/doc/re.rdoc
I don't really understand the '\1' '\2' thing. Is that backreferencing? How does that work?
Your use of \1 is a kind of backreference. A backreference can be when you use \1 and such in the search pattern. For example, the regular expression /f(.)\1/ will find the letter f, followed by any character, followed by that same character (e.g. "foo" or "f!!").
In this case, within a replacement string passed to a method like String#gsub, the backreference does refer to the previous capture. From the docs:
"If replacement is a String it will be substituted for the matched text. It may contain back-references to the pattern’s capture groups of the form \d, where d is a group number, or \k<n>, where n is a group name. If it is a double-quoted string, both back-references must be preceded by an additional backslash."
In practice, this means:
"hello world".gsub( /([aeiou])/, '_\1_' ) #=> "h_e_ll_o_ w_o_rld"
"hello world".gsub( /([aeiou])/, "_\1_" ) #=> "h_\u0001_ll_\u0001_ w_\u0001_rld"
"hello world".gsub( /([aeiou])/, "_\\1_" ) #=> "h_e_ll_o_ w_o_rld"
Now, you have to understand when code runs. In your original code…
string.gsub!(/([a-z])([A-Z]+ )/, '\1'.upcase)
…what you are doing is calling upcase on the string '\1' (which has no effect) and then calling the gsub! method, passing in a regex and a string as parameters.
Finally, another way to achieve this same goal is with the block form like so:
# Take your pick of which you prefer:
string.gsub!(/([a-z])([A-Z]+ )/){ $1.upcase << $2.downcase }
string.gsub!(/([a-z])([A-Z]+ )/){ [$1.upcase,$2.downcase].join }
string.gsub!(/([a-z])([A-Z]+ )/){ "#{$1.upcase}#{$2.downcase}" }
In the block form of gsub the captured patterns are set to the global variables $1, $2, etc. and you can use those to construct the replacement string.
I don't know why you are trying to do it in a complicated way, but the usual way is:
"hEY".capitalize # => "Hey"
If you insist in using a regex and upcase, then you would also need downcase:
"hEY".downcase.sub(/\w/){$&.upcase} # => "Hey"
If you really want to just swap the case of every letter in the string, you can avoid the complexity of regex entirely because There's A Method For That™.
"hEY".swapcase # => "Hey"
"HellO thERe".swapcase # => "hELLo THerE"
There's also swapcase! to do it destructively.

How do I write a regular expression that will match characters in any order?

I'm trying to write a regular expressions that will match a set of characters without regard to order. For example:
str = "act"
str.scan(/Insert expression here/)
would match:
cat
act
tca
atc
tac
cta
but would not match ca, ac or cata.
I read through a lot of similar questions and answers here on StackOverflow, but have not found one that matches my objectives exactly.
To clarify a bit, I'm using ruby and do not want to allow repeat characters.
Here is your solution
^(?:([act])(?!.*\1)){3}$
See it here on Regexr
^ # matches the start of the string
(?: # open a non capturing group
([act]) # The characters that are allowed and a capturing group
(?!.*\1) # That character is matched only if it does not occur once more, Lookahead assertion
){3} # Defines the amount of characters
$
The only special think is the lookahead assertion, to ensure the character is not repeated.
^ and $ are anchors to match the start and the end of the string.
[act]{3} or ^[act]{3}$ will do it in most regular expression dialects. If you can narrow down the system you're using, that will help you get a more specific answer.
Edit: as mentioned by #georgydyer in the comments below, it's unclear from your question whether or not repeated characters are allowed. If not, you can adapt the answer from this question and get:
^(?=[act]{3}$)(?!.*(.).*\1).*$
That is, a positive lookahead to check a match, and then a negative lookahead with a backreference to exclude repeated characters.
Here's how I'd go about it:
regex = /\b(?:#{ Regexp.union(str.split('').permutation.map{ |a| a.join }).source })\b/
# => /(?:act|atc|cat|cta|tac|tca)/
%w[
cat act tca atc tac cta
ca ac cata
].each do |w|
puts '"%s" %s' % [w, w[regex] ? 'matches' : "doesn't match"]
end
That outputs:
"cat" matches
"act" matches
"tca" matches
"atc" matches
"tac" matches
"cta" matches
"ca" doesn't match
"ac" doesn't match
"cata" doesn't match
I use the technique of passing an array into Regexp.union for a lot of things; I works especially well with the keys of a hash, and passing the hash into gsub for rapid search/replace on text templates. This is the example from the gsub documentation:
'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"
Regexp.union creates a regex, and it's important to use source instead of to_s when extracting the actual pattern being generated:
puts regex.to_s
=> (?-mix:\b(?:act|atc|cat|cta|tac|tca)\b)
puts regex.source
=> \b(?:act|atc|cat|cta|tac|tca)\b
Notice how to_s embeds the pattern's flags inside the string. If you don't expect them you can accidentally embed that pattern into another, which won't behave as you expect. Been there, done that and have the dented helmet as proof.
If you really want to have fun, look into the Perl Regexp::Assemble module available on CPAN. Using that, plus List::Permutor, lets us generate more complex patterns. On a simple string like this it won't save much space, but on long strings or large arrays of desired hits it can make a huge difference. Unfortunately, Ruby has nothing like this, but it is possible to write a simple Perl script with the word or array of words, and have it generate the regex and pass it back:
use List::Permutor;
use Regexp::Assemble;
my $regex_assembler = Regexp::Assemble->new;
my $perm = new List::Permutor split('', 'act');
while (my #set = $perm->next) {
$regex_assembler->add(join('', #set));
}
print $regex_assembler->re, "\n";
(?-xism:(?:a(?:ct|tc)|c(?:at|ta)|t(?:ac|ca)))
See "Is there an efficient way to perform hundreds of text substitutions in Ruby?" for more information about using Regexp::Assemble with Ruby.
I will assume several things here:
- You are looking for permutations of given characters
- You are using ruby
str = "act"
permutations = str.split(//).permutation.map{|p| p.join("")}
# and for the actual test
permutations.include?("cat")
It is no regex though.
No doubt - the regex that uses positive/negative lookaheads and backreferences is slick, but if you're only dealing with three characters, I'd err on the side of verbosity by explicitly enumerating the character permutations like #scones suggested.
"act".split('').permutation.map(&:join)
=> ["act", "atc", "cat", "cta", "tac", "tca"]
And if you really need a regex out of it for scanning a larger string, you can always:
Regexp.union "act".split('').permutation.map(&:join)
=> /\b(act|atc|cat|cta|tac|tca)\b/
Obviously, this strategy doesn't scale if your search string grows, but it's much easier to observe the intent of code like this in my opinion.
EDIT: Added word boundaries for false positive on cata based on #theTinMan's feedback.

Remove all non-alphabetical, non-numerical characters from a string?

If I wanted to remove things like:
.!,'"^-# from an array of strings, how would I go about this while retaining all alphabetical and numeric characters.
Allowed alphabetical characters should also include letters with diacritical marks including à or ç.
You should use a regex with the correct character property. In this case, you can invert the Alnum class (Alphabetic and numeric character):
"◊¡ Marc-André !◊".gsub(/\p{^Alnum}/, '') # => "MarcAndré"
For more complex cases, say you wanted also punctuation, you can also build a set of acceptable characters like:
"◊¡ Marc-André !◊".gsub(/[^\p{Alnum}\p{Punct}]/, '') # => "¡MarcAndré!"
For all character properties, you can refer to the doc.
string.gsub(/[^[:alnum:]]/, "")
The following will work for an array:
z = ['asfdå', 'b12398!', 'c98347']
z.each { |s| s.gsub! /[^[:alnum:]]/, '' }
puts z.inspect
I borrowed Jeremy's suggested regex.
You might consider a regular expression.
http://www.regular-expressions.info/ruby.html
I'm assuming that you're using ruby since you tagged that in your post. You could go through the array, put it through a test using a regexp, and if it passes remove/keep it based on the regexp you use.
A regexp you might use might go something like this:
[^.!,^-#]
That will tell you if its not one of the characters inside the brackets. However, I suggest that you look up regular expressions, you might find a better solution once you know their syntax and usage.
If you truly have an array (as you state) and it is an array of strings (I'm guessing), e.g.
foo = [ "hello", "42 cats!", "yöwza" ]
then I can imagine that you either want to update each string in the array with a new value, or that you want a modified array that only contains certain strings.
If the former (you want to 'clean' every string the array) you could do one of the following:
foo.each{ |s| s.gsub! /\p{^Alnum}/, '' } # Change every string in place…
bar = foo.map{ |s| s.gsub /\p{^Alnum}/, '' } # …or make an array of new strings
#=> [ "hello", "42cats", "yöwza" ]
If the latter (you want to select a subset of the strings where each matches your criteria of holding only alphanumerics) you could use one of these:
# Select only those strings that contain ONLY alphanumerics
bar = foo.select{ |s| s =~ /\A\p{Alnum}+\z/ }
#=> [ "hello", "yöwza" ]
# Shorthand method for the same thing
bar = foo.grep /\A\p{Alnum}+\z/
#=> [ "hello", "yöwza" ]
In Ruby, regular expressions of the form /\A………\z/ require the entire string to match, as \A anchors the regular expression to the start of the string and \z anchors to the end.

Regex replace pattern with first char of match & second char in caps

Let's say i have the following string:
"a test-eh'l"
I want to capitalize the start of each word. A word can be separated by a space, apostrophe, hyphen, a forward slash, a period, etc. So I want the string to turn out like this:
"A Test-Eh'L"
I'm not too worried about getting the first character capitalized from the gsub call, as that's easy to do after the fact. However, when I've been using IRB and match method, I only seem to be getting one result. When i use a scan, it collects the matches, but the problem is I cannot really do much with it, as i need to replace the contents of the original string.
Here's what i have so far:
"a test-eh'a".scan(/[\s|\-|\'][a-z]/)
=> [" t", "-e", "'a"]
"a test-eh'a".match(/[\s|\-|\'][a-z]/)
=> #<MatchData " t">
Then if i try the pattern using gsub:
"a test-eh'a".gsub(/[\s|\-|\'][a-z]/, $1)
TypeError: can't convert nil into String
In javascript, i would normally use parenthesis instead of square brackets on the front section. However, i wasn't getting correct results in the scan call when doing so.
"a test-eh'a".scan(/(\s|\-|\')[a-z]/)
=> [[" "], ["-"], ["'"]]
"a test-eh'a".gsub(/(\s|\-|\')[a-z]/, $1)
=> "a'est'h'"
Any help would be appreciated.
Try this:
"a test-eh'a".gsub(/(?:^|\s|-|')[a-z]/) { |r| r.upcase }
# => "A Test-Eh'A"

Remove character from string if it starts with that character?

How can I remove the very first "1" from any string if that string starts with a "1"?
"1hello world" => "hello world"
"112345" => "12345"
I'm thinking of doing
string.sub!('1', '') if string =~ /^1/
but I' wondering there's a better way. Thanks!
Why not just include the regex in the sub! method?
string.sub!(/^1/, '')
As of Ruby 2.5 you can use delete_prefix or delete_prefix! to achieve this in a readable manner.
In this case "1hello world".delete_prefix("1").
More info here:
https://blog.jetbrains.com/ruby/2017/10/10-new-features-in-ruby-2-5/
https://bugs.ruby-lang.org/issues/12694
'invisible'.delete_prefix('in') #=> "visible"
'pink'.delete_prefix('in') #=> "pink"
N.B. you can also use this to remove items from the end of a string with delete_suffix and delete_suffix!
'worked'.delete_suffix('ed') #=> "work"
'medical'.delete_suffix('ed') #=> "medical"
https://bugs.ruby-lang.org/issues/13665
I've answered in a little more detail (with benchmarks) here: What is the easiest way to remove the first character from a string?
if you're going to use regex for the match, you may as well use it for the replacement
string.sub!(%r{^1},"")
BTW, the %r{} is just an alternate syntax for regular expressions. You can use %r followed by any character e.g. %r!^1!.
Careful using sub!(/^1/,'') ! In case the string doesn't match /^1/ it will return nil. You should probably use sub (without the bang).
This answer might be more optimised: What is the easiest way to remove the first character from a string?
string[0] = '' if string[0] == '1'
I'd like to post a tiny improvement to the otherwise excellent answer by Zach. The ^ matches the beginning of every line in Ruby regex. This means there can be multiple matches per string. Kenji asked about the beginning of the string which means they have to use this regex instead:
string.sub!(/\A1/, '')
Compare this - multiple matches with this - one match.

Resources