regex get words between braces and quotes (just the words) - ruby

I have a string in Ruby like this:
str = "enum('cpu','hdd','storage','nic','display','optical','floppy','other')"
Now I'd like to return an array with only the words between the round brackets (...), without the quotes. The regex below works, but it includes 'enum', which I don't need.
str.scan(/\w+/)
The expected result should be:
{"OPTICAL"=>"optical", "DISPLAY"=>"display", "OTHER"=>"other", "FLOPPY"=>"floppy", "STORAGE"=>"storage", "NIC"=>"nic", "HDD"=>"hdd", "CPU"=>"cpu"}
thanks!

I'd suggest using negative lookahead to eliminate words followed by (:
str.scan(/\w+(?!\w|\()/)
Edit: regex updated; the lookahead now also excludes \w, so it won't match word prefixes (such as "enu" from "enum").
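A quick check of the lookahead version against the string from the question (a sketch; the variable name words is just illustrative). The second call shows why \w is needed in the lookahead:

```ruby
str = "enum('cpu','hdd','storage','nic','display','optical','floppy','other')"

# \w+ grabs a run of word characters; the negative lookahead (?!\w|\()
# rejects any run followed by another word character (a prefix such as
# "enu") or by "(" (the function name "enum").
words = str.scan(/\w+(?!\w|\()/)
p words
# => ["cpu", "hdd", "storage", "nic", "display", "optical", "floppy", "other"]

# Without \w in the lookahead, backtracking lets the prefix "enu" through.
p str.scan(/\w+(?!\()/).first
# => "enu"
```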

Based on the output you want, this will work:
str = "enum('cpu','hdd','storage','nic','display','optical','floppy','other')"
arr = str.scan(/'(\w+)'/)
hs = Hash[arr.map { |e| [e.first.upcase,e.first] }]
p hs #=> {"CPU"=>"cpu", "HDD"=>"hdd", "STORAGE"=>"storage", "NIC"=>"nic", "DISPLAY"=>"display", "OPTICAL"=>"optical", "FLOPPY"=>"floppy", "OTHER"=>"other"}
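Because the pattern has a capture group, scan returns one single-element array per match, which is why the answer uses e.first. On Ruby 2.1 or later, Array#to_h can replace Hash[...]; a sketch that destructures each one-element array in the block parameters instead:

```ruby
str = "enum('cpu','hdd','storage','nic','display','optical','floppy','other')"

# With one capture group, scan yields [["cpu"], ["hdd"], ...].
pairs = str.scan(/'(\w+)'/)

# |(word)| unpacks each ["cpu"] into "cpu"; build "WORD" => "word" pairs.
hs = pairs.map { |(word)| [word.upcase, word] }.to_h
p hs
```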

Related

How to remove strings that end with a particular character in Ruby

Based on "How to Delete Strings that Start with Certain Characters in Ruby", I know that the way to remove a string that starts with the character "#" is:
email = email.gsub( /(?:\s|^)#.*/ , "") #removes strings that start with "#"
I want to also remove strings that end in ".". Inspired by "Difference between \A \z and ^ $ in Ruby regular expressions" I came up with:
email = email.gsub( /(?:\s|$).*\./ , "")
Basically, I swapped the caret (^) for a dollar sign ($) and reversed the order of the part after the closing parenthesis (making sure to escape the period). However, it is not doing the trick.
An example I'd like to match and remove is:
"a8&23q2aas."
You were so close.
email = email.gsub( /.*\.\s*$/ , "")
The difference is that you didn't consider how the text you are matching relates to the regex tokens that describe the condition you want. Here, you are trying to find a period (\.) that is followed only by whitespace (\s) or the end of the line ($). I would read the regex above as "any characters of any length, followed by a period, followed by any amount of whitespace, followed by the end of the line."
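To see what that regex does, here it is applied to the example from the question and to two made-up inputs (a sketch). Because $ matches at the end of each line, only a line ending in a period is affected, but .* also removes everything before that period on the same line:

```ruby
# Any characters, a literal period, optional whitespace, then end of line.
ends_with_period = /.*\.\s*$/

p "a8&23q2aas.".gsub(ends_with_period, "")         # => ""
p "keep me\nremove me.".gsub(ends_with_period, "") # => "keep me\n"
p "one two.".gsub(ends_with_period, "")            # => ""
```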
As commenters pointed out, though, there's a simpler way: String#end_with?.
I'd use:
words = %w[#a day in the life.]
# => ["#a", "day", "in", "the", "life."]
words.reject { |w| w.start_with?('#') || w.end_with?('.') }
# => ["day", "in", "the"]
Using a regex is overkill if you're only concerned with the starting or ending character, and the built-in methods are also typically faster than an equivalent regular expression.
I would really like to stick to using gsub....
gsub is the wrong way to remove an element from an array. It could be used to turn the string into an empty string, but that won't remove that element from the array.
def replace_suffix(str, suffix)
  str.end_with?(suffix) ? str[0, str.length - suffix.length] : str
end
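Ruby 2.5 and later also provide String#delete_suffix (and String#delete_prefix), which does the same job as the replace_suffix helper without any manual slicing. A short comparison (the helper is repeated so the block runs on its own):

```ruby
# The helper from the answer: strip suffix only if the string ends with it.
def replace_suffix(str, suffix)
  str.end_with?(suffix) ? str[0, str.length - suffix.length] : str
end

p replace_suffix("a8&23q2aas.", ".")  # => "a8&23q2aas"
p replace_suffix("life", ".")         # => "life"

# Built-in equivalent (Ruby 2.5+); returns a new string.
p "a8&23q2aas.".delete_suffix(".")    # => "a8&23q2aas"
p "life".delete_suffix(".")           # => "life"
```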

Regular Expression to select text between curly braces, Ruby

I'm working on a way to filter and replace broken curly brace tags such as {{hello}. I've tried a few regular expressions from here on Stack Overflow as well as some of my own. The closest I've come is this regex:
(?=(\}(?!\})))((?<!\})\}) which selects the last tag in the example code block below. However, it does not select the entire tag; it only selects the closing curly brace }.
{{hello}}
{{world}}}
{{foobar}}
{{hello}
What I need to do is select any tag that is missing the second ending curly brace like {{hello}. Can anyone help me with the regex to select this type of tag?
filter and replace broken curly brace tags
This problem is really easy to solve if you're not nesting things.
Try this:
[\{]+([^}]+)[}]+
Essentially, you can just replace the match with {{\1}}; Ruby's gsub accepts \1 in a replacement string, while $1 is available inside the block form.
It will work as long as there are one or more of { and } consecutively around the match.
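Applied to the tags from the question, joined into one string for illustration (a sketch showing both the replacement-string and block forms of gsub):

```ruby
tags = "{{hello}}\n{{world}}}\n{{foobar}}\n{{hello}"

# One or more "{", the tag text (captured), one or more "}".
tag_re = /[\{]+([^}]+)[}]+/

# Replacement string: \1 is the captured tag text (single quotes keep it literal).
fixed = tags.gsub(tag_re, '{{\1}}')
puts fixed

# Block form: $1 holds the capture for each match.
fixed_block = tags.gsub(tag_re) { "{{#{$1}}}" }
```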
I assume we are given a string containing substrings beginning with "{{", followed by a "tag", which is a string of characters other than "{" and "}", followed by either "}" or "}}". We wish to return the tags that are followed by only one right brace. For example:
str = "Sue said {{hello}}, Bob said {{world}\nTom said {{foobar}}, Lola said {{hello}"
We can use the following regex:
r = /
  \{\{        # match {{
  ([^}]+)     # match one or more characters other than } in capture group 1
  \}          # match }
  (?:\z|[^}]) # match end of string or a character other than }
              # in a non-capture group
  /x          # free-spacing regex definition mode
str.scan(r).flatten
#=> ["world", "hello"]
The regex could of course be written in the conventional way:
r = /\{\{([^}]+)\}(?:\z|[^}])/
Note that
str.scan(r)
#=> [["world"], ["hello"]]
hence the need for flatten. When a regex contains capture groups, String#scan returns one array of captures per match; see String#scan for details.
Obviously, the same regex works if
str = "{{hello}}\n{{world}\n{{foobar}}\n{{hello}"
str.scan(r).flatten
#=> ["world", "hello"]
If
words = %w| {{hello}} {{world} {{foobar}} {{hello} |
#=> ["{{hello}}", "{{world}", "{{foobar}}", "{{hello}"]
then
words.select { |w| w =~ r }.map { |w| w[/[^{}]+/] }
#=> ["world", "hello"]
I suggest using the following expression:
/(?<!{){{[^{}]+}(?!})/
See the regex101 demo
The pattern matches any string of text that starts with {{ not preceded by {, followed by one or more characters other than { and }, and then a } that is not followed by }. Thus, this pattern matches strings with exactly the {{xxx} structure.
Here is a Ruby demo:
"{{hello}".gsub(/(?<!{){{[^{}]+}(?!})/, "\\0}")
# => "{{hello}}"
Pattern details:
(?<!{) - a negative lookbehind failing the match if a { appears immediately to the left of the current position
{{ - literal {{
[^{}]+ - 1+ characters other than { and } (to allow empty values, use * instead of +)
} - a closing single }
(?!}) - a negative lookahead failing the match if a } appears right after the previously matched }.
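The same pattern can list the broken tags with scan and repair all of them in one gsub pass. A sketch on the four tags from the question joined with newlines; the braces are escaped here, which matches the same text as the unescaped form above:

```ruby
tags = "{{hello}}\n{{world}}}\n{{foobar}}\n{{hello}"

# {{ not preceded by {, tag text without braces, one } not followed by }.
broken = /(?<!\{)\{\{[^{}]+\}(?!\})/

p tags.scan(broken)                   # => ["{{hello}"]
repaired = tags.gsub(broken, "\\0}")  # \0 is the whole match; append a }
puts repaired
```

Only the tag with a single closing brace is touched; {{world}}} has too many braces rather than too few, so it is left alone.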

How to delete all non-digit characters in Ruby (except commas and dashes)

I'm stuck on a task that is hard for me. I have a string that needs to be parsed into an array and some other elements. I'm having trouble with regular expressions, so I'd like to ask for help.
I need to delete all non-digit characters from the string, except commas (,) and dashes (-).
For example:
"!1,2e,3,6..-10" => "1,2,3,6-10"
"ffff5-10...." => "5-10"
"1.2,15" => "12,15"
and so on.
[^0-9,-]+
This should do it for you. Replace the matches with an empty string. See the demo:
https://regex101.com/r/vV1wW6/44
We must have at least one non-regex solution:
def keep_some(str, keepers)
str.delete(str.delete(keepers))
end
keep_some("!1,2e,3,6..-10", "0123456789,-")
#=> "1,2,3,6-10"
keep_some("ffff5-10....", "0123456789,-")
#=> "5-10"
keep_some("1.2,15", "0123456789,-")
#=> "12,15"
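One caveat with keep_some: String#delete reads its argument as a character set (a leading ^ negates it, a - between characters forms a range, \ escapes), and the inner delete passes the leftover characters in as that set. An input such as "^a5" therefore keeps "a" instead of "5". A sketch of a non-regex variant that uses the set syntax deliberately; keep_digits_commas_dashes is a hypothetical name:

```ruby
def keep_some(str, keepers)
  str.delete(str.delete(keepers))
end

# "^0-9,-" means: every character except 0-9, comma, and a literal
# trailing dash. Hypothetical helper name, for illustration only.
def keep_digits_commas_dashes(str)
  str.delete("^0-9,-")
end

p keep_some("^a5", "0123456789,-")             # => "a" (leftover "^a" acts as a negated set)
p keep_digits_commas_dashes("^a5")             # => "5"
p keep_digits_commas_dashes("!1,2e,3,6..-10")  # => "1,2,3,6-10"
p keep_digits_commas_dashes("1.2,15")          # => "12,15"
```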
"!1,2e,3,6..-10".gsub(/[^\d,-]+/, '') # => "1,2,3,6-10"
Use String#gsub with a pattern that matches everything except what you want to keep, and replace it with the empty string. In a regular expression, the negated character class [^whatever] matches everything except the characters in "whatever", so this works:
a_string.gsub /[^0-9,-]/, ''
Note that the hyphen has to come last (or first, or be escaped as \-), as otherwise it will be interpreted as a range indicator.
To demonstrate, I put all your "before" strings into an Array and used Enumerable#map to run the above gsub call on all of them, producing an Array of the "after" strings:
["!1,2e,3,6..-10", "ffff5-10....", "1.2,15"].map { |s| s.gsub /[^0-9,-]/, '' }
# => ["1,2,3,6-10", "5-10", "12,15"]

Remove all non-alphabetical, non-numerical characters from a string?

If I wanted to remove things like:
.!,'"^-# from an array of strings, how would I go about this while retaining all alphabetical and numeric characters.
Allowed alphabetical characters should also include letters with diacritical marks including à or ç.
You should use a regex with the correct character property. In this case, you can invert the Alnum class (Alphabetic and numeric character):
"◊¡ Marc-André !◊".gsub(/\p{^Alnum}/, '') # => "MarcAndré"
For more complex cases, say you also wanted to keep punctuation, you can build a set of acceptable characters like:
"◊¡ Marc-André !◊".gsub(/[^\p{Alnum}\p{Punct}]/, '') # => "¡Marc-André!"
For the full list of character properties, refer to the Ruby Regexp documentation.
string.gsub(/[^[:alnum:]]/, "")
The following will work for an array:
z = ['asfdå', 'b12398!', 'c98347']
z.each { |s| s.gsub! /[^[:alnum:]]/, '' }
puts z.inspect
I borrowed Jeremy's suggested regex.
You might consider a regular expression.
http://www.regular-expressions.info/ruby.html
I'm assuming that you're using Ruby since you tagged it in your post. You could go through the array, test each string against a regexp, and remove or keep it depending on the regexp you use.
A regexp you could use might look something like this:
[^.!,'"^#-]
That matches any character that is not one of the characters inside the brackets. (The hyphen goes last so it is taken literally; written as ^-# it would be read as an invalid, reversed range.) However, I suggest that you look up regular expressions; you might find a better solution once you know their syntax and usage.
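Dropping the leading ^ turns the same character list into a pattern for the characters to remove, which can then be used with gsub on each string. A sketch with a made-up sample array; unlike the \p{Alnum} answers, this only strips the listed characters, so spaces and other symbols survive:

```ruby
# Characters to strip: . ! , ' " ^ # - (hyphen last so it is literal).
unwanted = /[.!,'"^#-]/

strings = ["He said \"hi!\"", "#tag-1", "a^b.c"]
cleaned = strings.map { |s| s.gsub(unwanted, "") }
p cleaned
```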
If you truly have an array (as you state) and it is an array of strings (I'm guessing), e.g.
foo = [ "hello", "42 cats!", "yöwza" ]
then I can imagine that you either want to update each string in the array with a new value, or that you want a modified array that only contains certain strings.
If the former (you want to 'clean' every string in the array), you could do one of the following:
foo.each{ |s| s.gsub! /\p{^Alnum}/, '' } # Change every string in place…
bar = foo.map{ |s| s.gsub /\p{^Alnum}/, '' } # …or make an array of new strings
#=> [ "hello", "42cats", "yöwza" ]
If the latter (you want to select a subset of the strings where each matches your criteria of holding only alphanumerics) you could use one of these:
# Select only those strings that contain ONLY alphanumerics
bar = foo.select{ |s| s =~ /\A\p{Alnum}+\z/ }
#=> [ "hello", "yöwza" ]
# Shorthand method for the same thing
bar = foo.grep /\A\p{Alnum}+\z/
#=> [ "hello", "yöwza" ]
In Ruby, regular expressions of the form /\A………\z/ require the entire string to match, as \A anchors the regular expression to the start of the string and \z anchors to the end.
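The difference from ^...$ shows up with multi-line strings, because in Ruby ^ and $ match at every line boundary. A quick check with a made-up string:

```ruby
s = "hello\n42 cats!"

# ^ and $ are line anchors: the first line alone satisfies the pattern.
p s.match?(/^\p{Alnum}+$/)    # => true

# \A and \z anchor to the whole string, so the second line makes it fail.
p s.match?(/\A\p{Alnum}+\z/)  # => false
```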

Matching attributes list with or without quotes

I'm trying to match a list of attributes that may have quotes around their value, something like this:
aaa=bbb ccc="ddd" eee=fff
What I want to get is a list of key/value without the quotes.
'aaa' => 'bbb', 'ccc' => 'ddd', 'eee' => 'fff'
The code (Ruby) looks like this now:
attrs = {}
str.scan(/(\w+)=(".*?"|\S+)/).each do |k,v|
attrs[k] = v.sub(/^"(.*)"$/, '\1')
end
I don't know if I can get rid of the quotes by just using the regex.
Any idea ?
Thanks !
Try using the pipe for the possible attribute patterns, which are either EQUALS, QUOTE, NO-QUOTE, QUOTE, or EQUALS, NO-WHITESPACE.
str.scan(/(\w+)=("[^"]+"|\S+)/).each do |k, v|
puts "#{k}=#{v}"
end
Tested.
EDIT | Hmm, ok, I give up on a 'pure' regex solution (that will allow whitespace inside the quotes anyway). But you can do this:
attrs = {}
str.scan(/(\w+)=(?:(\w+)|"([^"]+)")/).each do |key, v_word, v_quot|
attrs[key] = v_word || v_quot
end
The key here is to capture the two alternatives and take advantage of the fact that whichever one wasn't matched will be nil.
If you want to allow whitespace around the = just add a \s* on either side of it.
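A sketch of that variant, with \s* on both sides of = and a made-up input that also has a space inside a quoted value:

```ruby
str = 'aaa = bbb ccc="d d" eee =fff'

attrs = {}
# \s* around = tolerates spaces; the alternation captures either a bare
# word (group 2) or the body of a quoted value (group 3).
str.scan(/(\w+)\s*=\s*(?:(\w+)|"([^"]+)")/).each do |key, v_word, v_quot|
  attrs[key] = v_word || v_quot  # the group that didn't match is nil
end
p attrs
```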
I was able to get rid of the quotes in the regex, but only if I matched the quotes as well.
s = "aaa=bbb ccc=\"ddd\" eee=fff"
s.scan(/([^=]*)=(["]*)([^" ]*)(["]*)[ ]*/).each {|k, _, v, _ | puts "key=#{k} value=#{v}" }
Output is:
key=aaa value=bbb
key=ccc value=ddd
key=eee value=fff
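That is: ([^=]*) matches anything but =, then a literal =, (["]*) matches zero or more quotes, ([^" ]*) matches anything but a quote or space, (["]*) matches zero or more quotes, and [ ]* matches zero or more spaces.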
Then just ignore the quote matches in the processing.
I tried a number of combinations with OR's but could not get the operator precedence and matching to work correctly.
I don't know ruby, but maybe something like ([^ =]*)="?((?<=")[^"]*|[^ ]*)"? works?
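Ruby's regex engine supports that lookbehind, so the suggestion can be tried directly. A sketch using scan plus to_h on the question's string and on a made-up string with a space inside the quotes:

```ruby
s = 'aaa=bbb ccc="ddd" eee=fff'

# Key: anything but space or "="; then "=", an optional opening quote, and
# either a quoted body ((?<=") only succeeds right after a quote) or a bare
# run of non-spaces; finally an optional closing quote.
attr_re = /([^ =]*)="?((?<=")[^"]*|[^ ]*)"?/

p s.scan(attr_re).to_h
p 'x="a b" y=c'.scan(attr_re).to_h
```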
