I'm trying to match a list of attributes that may have quotes around their value, something like this:
aaa=bbb ccc="ddd" eee=fff
What I want to get is a list of key/value without the quotes.
'aaa' => 'bbb', 'ccc' => 'ddd', 'eee' => 'fff'
The code (Ruby) currently looks like this:
attrs = {}
str.scan(/(\w+)=(".*?"|\S+)/).each do |k,v|
attrs[k] = v.sub(/^"(.*)"$/, '\1')
end
I don't know if I can get rid of the quotes by just using the regex.
Any ideas?
Thanks!
Try using the pipe for the possible attribute patterns, which are either EQUALS, QUOTE, NO-QUOTE, QUOTE, or EQUALS, NO-WHITESPACE.
str.scan(/(\w+)=("[^"]+"|\S+)/).each do |k, v|
puts "#{k}=#{v}"
end
Tested.
EDIT | Hmm, ok, I give up on a 'pure' regex solution (that will allow whitespace inside the quotes anyway). But you can do this:
attrs = {}
str.scan(/(\w+)=(?:(\w+)|"([^"]+)")/).each do |key, v_word, v_quot|
attrs[key] = v_word || v_quot
end
The key here is to capture the two alternatives and take advantage of the fact that whichever one wasn't matched will be nil.
If you want to allow whitespace around the = just add a \s* on either side of it.
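As a rough sketch of that last suggestion, here is the same scan with \s* added on both sides of the =. The method name parse_attrs and the sample input are made up for illustration:

```ruby
# Same pattern as above, with \s* added on both sides of the "=".
def parse_attrs(str)
  attrs = {}
  str.scan(/(\w+)\s*=\s*(?:(\w+)|"([^"]+)")/) do |key, v_word, v_quot|
    # Only one of the two value groups matches; the other is nil.
    attrs[key] = v_word || v_quot
  end
  attrs
end

parse_attrs('aaa = bbb ccc="d d" eee =fff').each { |k, v| puts "#{k} -> #{v}" }
# aaa -> bbb
# ccc -> d d
# eee -> fff
```

Because the quoted alternative uses [^"]+, whitespace inside the quotes is kept as part of the value.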
I was able to get rid of the quotes in the regex, but only if I matched the quotes as well.
s = "aaa=bbb ccc=\"ddd\" eee=fff"
s.scan(/([^=]*)=(["]*)([^" ]*)(["]*)[ ]*/).each {|k, _, v, _ | puts "key=#{k} value=#{v}" }
Output is:
key=aaa value=bbb
key=ccc value=ddd
key=eee value=fff
(Match not =)=(Match 0 or more ")(Match not " or space)(Match 0 or more ")zero or more spaces
Then just ignore the quote matches in the processing.
I tried a number of combinations with OR's but could not get the operator precedence and matching to work correctly.
I don't know ruby, but maybe something like ([^ =]*)="?((?<=")[^"]*|[^ ]*)"? works?
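For what it's worth, that pattern can be tried as-is in Ruby 1.9 or later, where lookbehind is supported. A quick check against the sample string from the question (the constant name PAIR is just chosen here):

```ruby
# The pattern from the suggestion above, unchanged. (?<=") is a lookbehind,
# which Ruby's regexp engine supports from 1.9 onward.
PAIR = /([^ =]*)="?((?<=")[^"]*|[^ ]*)"?/

pairs = 'aaa=bbb ccc="ddd" eee=fff'.scan(PAIR)
p pairs # => [["aaa", "bbb"], ["ccc", "ddd"], ["eee", "fff"]]
```

The lookbehind makes the first alternative apply only when an opening quote was actually consumed, so unquoted values fall through to [^ ]*.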
I have a task that is hard for me. I have a string that needs to be parsed into an array and some other elements. I'm having trouble with the regexp, so I'd like to ask for help.
I need to delete all non-digits from the string, except commas (,) and dashes (-).
For example:
"!1,2e,3,6..-10" => "1,2,3,6-10"
"ffff5-10...." => "5-10"
"1.2,15" => "12,15"
and so on.
[^0-9,-]+
This should do it for you. Replace with an empty string. See the demo.
https://regex101.com/r/vV1wW6/44
We must have at least one non-regex solution:
def keep_some(str, keepers)
str.delete(str.delete(keepers))
end
keep_some("!1,2e,3,6..-10", "0123456789,-")
#=> "1,2,3,6-10"
keep_some("ffff5-10....", "0123456789,-")
#=> "5-10"
keep_some("1.2,15", "0123456789,-")
#=> "12,15"
"!1,2e,3,6..-10".gsub(/[^\d,-]+/, '') # => "1,2,3,6-10"
Use String#gsub with a pattern that matches everything except what you want to keep, and replace it with the empty string. In a regular expression, the negated character class [^whatever] matches everything except the characters in the "whatever", so this works:
a_string.gsub /[^0-9,-]/, ''
Note that the hyphen has to come last (or first, or be escaped as \-), as otherwise it will be interpreted as a range indicator.
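To see why the position of the hyphen matters, here is a small comparison on one of the sample strings. The third pattern is a deliberately wrong variant written for this illustration:

```ruby
s = "1.2,15"

# Hyphen last: a literal "-", so only digits, commas and hyphens survive.
p s.gsub(/[^0-9,-]/, '')   # => "12,15"

# Hyphen escaped: also literal, wherever it sits in the class.
p s.gsub(/[^0-9\-,]/, '')  # => "12,15"

# Hyphen between "," and "9": a range from "," to "9", which also covers
# "-", "." and "/", so the dot is no longer removed.
p s.gsub(/[^,-9]/, '')     # => "1.2,15"
```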
To demonstrate, I put all your "before" strings into an Array and used Enumerable#map to run the above gsub call on all of them, producing an Array of the "after" strings:
["!1,2e,3,6..-10", "ffff5-10....", "1.2,15"].map { |s| s.gsub /[^0-9,-]/, '' }
# => ["1,2,3,6-10", "5-10", "12,15"]
I got a string in Ruby like this:
str = "enum('cpu','hdd','storage','nic','display','optical','floppy','other')"
Now I'd like to return just an array containing only the words (without the quotes) that are between the round brackets (...). The regex below works, but includes 'enum', which I don't need.
str.scan(/\w+/)
expected result should be:
{"OPTICAL"=>"optical", "DISPLAY"=>"display", "OTHER"=>"other", "FLOPPY"=>"floppy", "STORAGE"=>"storage", "NIC"=>"nic", "HDD"=>"hdd", "CPU"=>"cpu"}
thanks!
I'd suggest using negative lookahead to eliminate words followed by (:
str.scan(/\w+(?!\w|\()/)
Edit: regex updated, now it also excludes \w, so it won't match word prefixes.
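Run against the string from the question, the lookahead version behaves roughly like this (the variable name words is chosen here):

```ruby
str = "enum('cpu','hdd','storage','nic','display','optical','floppy','other')"

# \w+ must not be followed by another word character or by "(".
# "enum" is rejected because of the "("; its prefixes ("enu", "en", "e")
# are rejected because the next character is still a word character.
words = str.scan(/\w+(?!\w|\()/)
p words
# => ["cpu", "hdd", "storage", "nic", "display", "optical", "floppy", "other"]
```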
Based on the output you wanted this will work.
str = "enum('cpu','hdd','storage','nic','display','optical','floppy','other')"
arr = str.scan(/'(\w+)'/)
hs = Hash[arr.map { |e| [e.first.upcase,e.first] }]
p hs #=> {"CPU"=>"cpu", "HDD"=>"hdd", "STORAGE"=>"storage", "NIC"=>"nic", "DISPLAY"=>"display", "OPTICAL"=>"optical", "FLOPPY"=>"floppy", "OTHER"=>"other"}
If I wanted to remove things like:
.!,'"^-# from an array of strings, how would I go about this while retaining all alphabetical and numeric characters.
Allowed alphabetical characters should also include letters with diacritical marks including à or ç.
You should use a regex with the correct character property. In this case, you can invert the Alnum class (Alphabetic and numeric character):
"◊¡ Marc-André !◊".gsub(/\p{^Alnum}/, '') # => "MarcAndré"
For more complex cases, say you wanted also punctuation, you can also build a set of acceptable characters like:
"◊¡ Marc-André !◊".gsub(/[^\p{Alnum}\p{Punct}]/, '') # => "¡Marc-André!"
For all character properties, you can refer to the doc.
string.gsub(/[^[:alnum:]]/, "")
The following will work for an array:
z = ['asfdå', 'b12398!', 'c98347']
z.each { |s| s.gsub! /[^[:alnum:]]/, '' }
puts z.inspect
I borrowed Jeremy's suggested regex.
You might consider a regular expression.
http://www.regular-expressions.info/ruby.html
I'm assuming that you're using ruby since you tagged that in your post. You could go through the array, put it through a test using a regexp, and if it passes remove/keep it based on the regexp you use.
A regexp you might use might go something like this:
[^.!,^#-]
That will tell you if a character is not one of those inside the brackets (the hyphen goes last so that it is not read as a range). However, I suggest that you look up regular expressions; you might find a better solution once you know their syntax and usage.
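A minimal sketch of that idea applied to an array, assuming the goal is to drop exactly the characters listed in the question and keep everything else. The sample strings and the names unwanted and cleaned are made up for this example:

```ruby
strings = ["it's!", "a-b#c", "ça.và"]

# Character class listing exactly the characters to drop; the hyphen is
# placed last so that it is taken literally.
unwanted = /[.!,'"^#-]/

cleaned = strings.map { |s| s.gsub(unwanted, '') }
# cleaned is now ["its", "abc", "çavà"]; letters with diacritics are untouched.
puts cleaned
```

Listing the unwanted characters explicitly leaves anything not in the list alone, which differs from the Alnum-based answers above: those remove every non-alphanumeric character.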
If you truly have an array (as you state) and it is an array of strings (I'm guessing), e.g.
foo = [ "hello", "42 cats!", "yöwza" ]
then I can imagine that you either want to update each string in the array with a new value, or that you want a modified array that only contains certain strings.
If the former (you want to 'clean' every string in the array) you could do one of the following:
foo.each{ |s| s.gsub! /\p{^Alnum}/, '' } # Change every string in place…
bar = foo.map{ |s| s.gsub /\p{^Alnum}/, '' } # …or make an array of new strings
#=> [ "hello", "42cats", "yöwza" ]
If the latter (you want to select a subset of the strings where each matches your criteria of holding only alphanumerics) you could use one of these:
# Select only those strings that contain ONLY alphanumerics
bar = foo.select{ |s| s =~ /\A\p{Alnum}+\z/ }
#=> [ "hello", "yöwza" ]
# Shorthand method for the same thing
bar = foo.grep /\A\p{Alnum}+\z/
#=> [ "hello", "yöwza" ]
In Ruby, regular expressions of the form /\A………\z/ require the entire string to match, as \A anchors the regular expression to the start of the string and \z anchors to the end.
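A small comparison may make the difference from the line anchors ^ and $ clearer. The two-line sample string is invented for this illustration:

```ruby
s = "abc\n!!!"

# ^ and $ anchor to line boundaries, so the first line alone satisfies this.
p(s =~ /^\p{Alnum}+$/)    # => 0

# \A and \z anchor to the whole string, so the second line makes it fail.
p(s =~ /\A\p{Alnum}+\z/)  # => nil
```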
I have special strings like name1="value1" name2='value2'. Values can contain whitespaces and are delimited by either single quotes or double quotes. Names never contain whitespaces. name/value pairs are separated by whitespaces.
I want to parse them into a list of name-value pairs like this
string.magic_split() => { "name1"=>"value1", "name2"=>"value2" }
If Ruby understood lookaround assertions, I could do this by
string.split(/[\'\"](?=\s)/).each do |element|
element =~ /(\w+)=[\'\"](.*)[\'\"]/
hash[$1] = $2
end
but Ruby does not understand lookaround assertions, so I am somewhat stuck.
However, I am sure that there are much more elegant ways to solve this problem anyway, so I turn to you. Do you have a good idea for solving this problem?
This fails on values like '"hi" she said', but it might be good enough.
str = %q(name1="value1" name2='value 2')
p Hash[ *str.chop.split( /' |" |='|="/ ) ]
#=> {"name1"=>"value1", "name2"=>"value 2"}
This is not a complete answer, but Oniguruma, the standard regexp library in Ruby 1.9, supports lookaround assertions. It can be installed as a gem if you are using Ruby 1.8.x.
That said, and as Sorpigal has commented, instead of using a regexp I would be inclined to iterate through the string one character at a time keeping track of whether you are in a name portion, when you reach the equals sign, when you are within quotes and when you reach a matched closing quote. On reaching a closing quote you can put the name and value into the hash and proceed to the next entry.
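The character-by-character approach described above could be sketched roughly as follows. The method name parse_pairs and the three state names are invented for this example, and it assumes well-formed input (every value is quoted and every quote is closed):

```ruby
# Hypothetical helper: parse name="value" / name='value' pairs with a
# small state machine instead of a regexp.
def parse_pairs(input)
  result = {}
  name = "".dup
  value = "".dup
  quote = nil      # the quote character that opened the current value
  state = :name    # one of :name, :open_quote, :value

  input.each_char do |ch|
    case state
    when :name
      if ch == "="
        state = :open_quote
      elsif ch !~ /\s/
        name << ch             # whitespace between pairs is skipped
      end
    when :open_quote
      if ch == '"' || ch == "'"
        quote = ch
        state = :value
      end
    when :value
      if ch == quote           # only the matching quote closes the value
        result[name] = value
        name = "".dup
        value = "".dup
        state = :name
      else
        value << ch
      end
    end
  end
  result
end

pairs = parse_pairs(%q(name1="value 1" name2='"hi" she said'))
puts pairs["name1"] # value 1
puts pairs["name2"] # "hi" she said
```

Since only the quote character that opened a value can close it, a value such as '"hi" she said' survives intact, which is the case the split-based one-liner above gives up on.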
class String
def magic_split
str = self.gsub('"', '\'').gsub('\' ', '\'\, ').split('\, ').map{ |str| str.gsub("'", "").split("=") }
Hash[str]
end
end
This should do it for you.
class SpecialString
def self.parse(string)
string.split.map{|s| s.split("=") }.inject({}) {|h, a| h[a[0]] = a[1].gsub(/"|'/, ""); h }
end
end
Try this: /[='"] ?/
I don't know Ruby syntax but here is a Perl script you could translate
#!/usr/bin/perl
use 5.10.1;
use warnings;
use strict;
use Data::Dumper;
my $str = qq/name1="val ue1" name2='va lue2'/;
my @list = split/[='"] ?/,$str;
my %hash;
for (my $i=0; $i<@list;$i+=3) {
$hash{$list[$i]} = $list[$i+2];
}
say Dumper \%hash;
Output :
$VAR1 = {
'name2' => 'va lue2',
'name1' => 'val ue1'
};
I want to strip leading and trailing quotes, in Ruby, from a string. The quote character will occur 0 or 1 time. For example, all of the following should be converted to foo,bar:
"foo,bar"
"foo,bar
foo,bar"
foo,bar
You could also use the chomp function, but unfortunately it only works at the end of the string. Assuming there were a reverse chomp, you could write:
'"foo,bar"'.rchomp('"').chomp('"')
Implementing rchomp is straightforward:
class String
def rchomp(sep = $/)
self.start_with?(sep) ? self[sep.size..-1] : self
end
end
Note that you could also do it inline, with the slightly less efficient version:
'"foo,bar"'.chomp('"').reverse.chomp('"').reverse
EDIT: Since Ruby 2.5, rchomp(x) is available under the name delete_prefix, and chomp(x) is available as delete_suffix, meaning that you can use
'"foo,bar"'.delete_prefix('"').delete_suffix('"')
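Applied to the four inputs from the question (Ruby 2.5 or later; the variable names are chosen here), that looks roughly like this:

```ruby
inputs = ['"foo,bar"', '"foo,bar', 'foo,bar"', 'foo,bar']

# delete_prefix / delete_suffix remove the argument only when it is
# actually present, so unquoted input passes through unchanged.
results = inputs.map { |s| s.delete_prefix('"').delete_suffix('"') }
p results # => ["foo,bar", "foo,bar", "foo,bar", "foo,bar"]
```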
I can use gsub to search for the leading or trailing quote and replace it with an empty string:
s = "\"foo,bar\""
s.gsub!(/^\"|\"?$/, '')
As suggested by comments below, a better solution is:
s.gsub!(/\A"|"\Z/, '')
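The difference between the two patterns shows up on multi-line input. A small illustration with an invented two-line string:

```ruby
s = "\"foo\n\"bar\""

# ^ and $ match at every line boundary, so the quote that opens the
# second line is stripped too.
p s.gsub(/^"|"?$/, '')   # => "foo\nbar"

# \A and \Z match only at the start and end of the whole string.
p s.gsub(/\A"|"\Z/, '')  # => "foo\n\"bar"
```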
As usual everyone grabs regex from the toolbox first. :-)
As an alternative, I'd recommend looking into .tr('"', '') (AKA "translate"), which, used this way, strips every quote from the string.
Another approach would be
def remove_quotations(str)
  if str.start_with?('"')
    str = str.slice(1..-1)
  end
  if str.end_with?('"')
    str = str.slice(0..-2)
  end
  str # return the result explicitly; otherwise the last `if` may yield nil
end
remove_quotations('"foo,bar"') # => "foo,bar"
It is without RegExps and start_with?/end_with? are nicely readable.
It frustrates me that strip only works on whitespace. I need to strip all kinds of characters! Here's a String extension that will fix that:
class String
def trim sep=/\s/
sep_source = sep.is_a?(Regexp) ? sep.source : Regexp.escape(sep)
pattern = Regexp.new("\\A(#{sep_source})*(.*?)(#{sep_source})*\\z")
self[pattern, 2]
end
end
Output
'"foo,bar"'.trim '"' # => "foo,bar"
'"foo,bar'.trim '"' # => "foo,bar"
'foo,bar"'.trim '"' # => "foo,bar"
'foo,bar'.trim '"' # => "foo,bar"
' foo,bar'.trim # => "foo,bar"
'afoo,bare'.trim /[aeiou]/ # => "foo,bar"
Assuming that quotes can only appear at the beginning or end, you could just remove all quotes, without any custom method:
'"foo,bar"'.delete('"')
I wanted the same but for slashes in url path, which can be /test/test/test/ (so that it has the stripping characters in the middle) and eventually came up with something like this to avoid regexps:
'/test/test/test/'.split('/').reject { |i| i.empty? }.join('/')
Which in this case translates obviously to:
'"foo,bar"'.split('"').select{|i| i != ""}.join('"')
or
'"foo,bar"'.split('"').reject{|i| i.empty?}.join('"')
Regexes can be pretty heavy and lead to some funky errors. If you are not dealing with massive strings and the data is pretty uniform, you can use a simpler approach.
If you know the strings have leading and trailing quotes, you can slice the entire string:
string = "'This has quotes!'"
trimmed = string[1..-2]
puts trimmed # This has quotes!
This can also be turned into a simple function:
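# 34 is the character code for " and 39 is the code for '; add other codes as needed.
def trim_chars(string, char_codes=[34, 39])
# getbyte returns the integer code; string[0] returns a String in Ruby 1.9+
if char_codes.include?(string.getbyte(0)) && char_codes.include?(string.getbyte(-1))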
string[1..-2]
else
string
end
end
You can strip non-optional quotes with scan:
'"foo"bar"'.scan(/"(.*)"/)[0][0]
# => "foo\"bar"