Drop elements from array if regexp does not match - ruby

Is there a way to do this?
I have an array:
["file_1.jar", "file_2.jar","file_3.pom"]
And I want to keep only "file_3.pom", what I want to do is something like this:
array.drop_while{|f| /.pom/.match(f)}
But this way I keep everything in the array but "file_3.pom". Is there a way to do something like "not_match"?
I found these:
f !~ /.pom/ # => leaves all elements in array
OR
f !~ /*.pom/ # => leaves all elements in array
But none of those returns what I expect.

How about select?
selected = array.select { |f| /.pom/.match(f) }
p selected
# => ["file_3.pom"]
Hope that helps!
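For context on why drop_while did not behave as hoped: it only discards leading elements until the block first returns a falsy value, and then keeps everything after that, matching or not. Here is a minimal sketch contrasting it with select and reject. The helper names keep_matching and drop_matching are made up for illustration, and String#match? assumes Ruby 2.4 or later.

```ruby
# Hypothetical helpers, named for illustration only.
def keep_matching(names, pattern)
  names.select { |name| name.match?(pattern) }
end

def drop_matching(names, pattern)
  names.reject { |name| name.match?(pattern) }
end

files = ["file_1.jar", "file_2.jar", "file_3.pom"]

p keep_matching(files, /\.pom\z/)   # => ["file_3.pom"]
p drop_matching(files, /\.pom\z/)   # => ["file_1.jar", "file_2.jar"]

# drop_while stops at the first element for which the block is falsy.
# "file_1.jar" does not match, so nothing is dropped at all.
p files.drop_while { |f| f.match?(/\.pom\z/) }
# => ["file_1.jar", "file_2.jar", "file_3.pom"]
```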

In your case you can use the Enumerable#grep method to get an array of the elements that match a pattern:
["file_1.jar", "file_2.jar", "file_3.pom"].grep(/\.pom\z/)
# => ["file_3.pom"]
As you can see, I've also slightly modified your regular expression so that it matches only strings that end with .pom:
\. matches a literal dot; without the \ it matches any character.
\z anchors the pattern to the end of the string; without it the pattern would match .pom anywhere in the string.
Since you are searching for a literal string, you can also avoid regular expressions altogether, for example by using the methods String#end_with? and Array#select:
["file_1.jar", "file_2.jar", "file_3.pom"].select { |s| s.end_with?('.pom') }
# => ["file_3.pom"]
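Since the question asks for something like "not_match": Enumerable#grep_v (available from Ruby 2.3, as far as I know) is the inverse of grep and returns the elements that do not match. A small sketch, where split_by_pattern is a hypothetical helper name and not part of Ruby:

```ruby
# Hypothetical helper: returns both the matching and the non-matching names.
def split_by_pattern(names, pattern)
  { matching: names.grep(pattern), rest: names.grep_v(pattern) }
end

result = split_by_pattern(["file_1.jar", "file_2.jar", "file_3.pom"], /\.pom\z/)
p result[:matching]  # => ["file_3.pom"]
p result[:rest]      # => ["file_1.jar", "file_2.jar"]
```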

If you want to keep only the strings that match a regexp, you can use Ruby's keep_if method.
Be aware that this method is destructive: it modifies the original array in place.
a = ["file_1.jar", "file_2.jar","file_3.pom"]
a.keep_if{|file_name| /.pom/.match(file_name)}
p a
# => ["file_3.pom"]
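If the original array has to survive, one option is to run keep_if on a copy. This is only a sketch under that assumption; poms_only is a made-up helper name, and it uses end_with? instead of a regexp.

```ruby
# Hypothetical helper: filters a copy so the caller's array is left untouched.
def poms_only(names)
  names.dup.keep_if { |name| name.end_with?(".pom") }
end

original = ["file_1.jar", "file_2.jar", "file_3.pom"]
filtered = poms_only(original)

p filtered  # => ["file_3.pom"]
p original  # => ["file_1.jar", "file_2.jar", "file_3.pom"]
```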

Related

How in ruby delete all non-digits symbols (except commas and dashes)

I've run into a task that is hard for me. I have a string which I need to parse into an array and some other elements. I have trouble with regexps, so I wanted to ask for help.
I need to delete all non-digits from the string, except commas (,) and dashes (-).
For example:
"!1,2e,3,6..-10" => "1,2,3,6-10"
"ffff5-10...." => "5-10"
"1.2,15" => "12,15"
and so on.
[^0-9,-]+
This should do it for you. Replace the matches with an empty string. See the demo.
https://regex101.com/r/vV1wW6/44
We must have at least one non-regex solution:
def keep_some(str, keepers)
str.delete(str.delete(keepers))
end
keep_some("!1,2e,3,6..-10", "0123456789,-")
#=> "1,2,3,6-10"
keep_some("ffff5-10....", "0123456789,-")
#=> "5-10"
keep_some("1.2,15", "0123456789,-")
#=> "12,15"
"!1,2e,3,6..-10".gsub(/[^\d,-]+/, '') # => "1,2,3,6-10"
Use String#gsub with a pattern that matches everything except what you want to keep, and replace it with the empty string. In a regular expression, the negated character class [^whatever] matches everything except the characters in the "whatever", so this works:
a_string.gsub /[^0-9,-]/, ''
Note that the hyphen should come last (or first, or be escaped as \-), as otherwise it will be interpreted as a range indicator.
To demonstrate, I put all your "before" strings into an Array and used Enumerable#map to run the above gsub call on all of them, producing an Array of the "after" strings:
["!1,2e,3,6..-10", "ffff5-10....", "1.2,15"].map { |s| s.gsub /[^0-9,-]/, '' }
# => ["1,2,3,6-10", "5-10", "12,15"]
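Another regex-free option, sketched here on the assumption that only ASCII digits, commas and dashes should survive: String#delete accepts a negated character set directly. The helper name digits_commas_dashes is made up for this example.

```ruby
# Hypothetical helper: keep digits, commas and dashes; delete everything else.
def digits_commas_dashes(str)
  # "^" at the start negates the set, "0-9" is a range,
  # and the trailing "-" is taken literally.
  str.delete("^0-9,-")
end

p digits_commas_dashes("!1,2e,3,6..-10")  # => "1,2,3,6-10"
p digits_commas_dashes("ffff5-10....")    # => "5-10"
p digits_commas_dashes("1.2,15")          # => "12,15"
```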

regex for a pattern at end of string

I have a string which looks like:
hello/world/1.9.2-some-text
hello/world/2.0.2-some-text
hello/world/2.11.0
Through regex I want to get the string after last '/' and until end of line i.e. in above examples output should be 1.9.2-some-text, 2.0.2-some-text, 2.11.0
I tried this - ^(.+)\/(.+)$ which returns me an array of which first object is "hello/world" and 2nd object is "1.9.2-some-text"
Is there a way to just get "1.9.2-some-text" as the output?
Try using a negative character class ([^…]) like this:
[^\/]+$
This will match one or more of any character other than / followed by the end of the string.
You can use a negated match here.
'hello/world/1.9.2-some-text'.match(Regexp.new('[^/]+$'))[0]
# => "1.9.2-some-text"
Meaning any character except: / (1 or more times) followed by the end of the string.
Although, the simplest way would be to split the string.
'hello/world/1.9.2-some-text'.split('/').last
# => "1.9.2-some-text"
OR
'hello/world/1.9.2-some-text'.split('/')[-1]
# => "1.9.2-some-text"
If you do not need to use a regex, the ordinary way of doing such a thing is:
File.basename("hello/world/1.9.2-some-text")
#=> "1.9.2-some-text"
This is one way:
s = 'hello/world/1.9.2-some-text
hello/world/2.0.2-some-text
hello/world/2.11.0'
s.lines.map { |l| l[/.*\/(.*)/,1] }
#=> ["1.9.2-some-text", "2.0.2-some-text", "2.11.0"]
You said, "in above examples output should be 1.9.2-some-text, 2.0.2-some-text, 2.11.0". That's neither a string nor an array, so I assumed you wanted an array. If you want a string, tack .join(', ') onto the end.
Regexes are naturally "greedy", so .*\/ will match all characters up to and including the last / in each line. The second argument, 1, returns the contents of the capture group (.*) (capture group 1).
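For a single path, here is a short sketch of two more ways to get the part after the last slash: String#[] with a regexp, and String#rpartition without one. Both helper names are invented for this example. Note the two differ when the path ends in a slash: the regexp version returns nil, the rpartition version returns an empty string.

```ruby
# Hypothetical helpers, named for illustration only.
def last_segment(path)
  # One or more non-slash characters anchored to the end of the string.
  path[%r{[^/]+\z}]
end

def last_segment_no_regex(path)
  # rpartition splits at the LAST "/" into [before, "/", after].
  path.rpartition("/").last
end

p last_segment("hello/world/1.9.2-some-text")  # => "1.9.2-some-text"
p last_segment_no_regex("hello/world/2.11.0")  # => "2.11.0"
```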

How should I write a "smart" string comparison against an array of possible matches?

I have a string from an input form and would like to compare it against a number of possible validation strings using a case-insensitive comparison, and return true if a match is found.
For example, if the input is input = 'florida' (or 'FL', or 'flor.') and I compare it against validate = ['fl', 'florida'], or some such validation array, it should return true.
I know I could use a select tag with an explicit list of options; however, this is more of an example. In my case, the field can return multiple types of strings, so I'm trying to find a good solution to "parse". Seems like the sort of problem Ruby is good at?
The fastest way to compare multiple strings, especially when you can have variations on them, is to use a regular expression. Ruby has some helper methods to make this easier:
validate = ['fl', 'florida']
regex = /\b(?:#{ Regexp.union(validate.sort_by{ |s| [-s.size, s] }).source })\b/i
regex # => /\b(?:florida|fl)\b/i
'FL'[regex] # => "FL"
'florida'[regex] # => "florida"
'flor.'[regex] # => nil
Remember that in Ruby, only nil and false are false values, and every other result is considered true. A shortcut to force true/false values is to use !! (not not). Compare the above results with these:
!!'FL'[regex] # => true
!!'florida'[regex] # => true
!!'flor.'[regex] # => false
'flor.' didn't match because the pattern is looking for whole-word matches, due to the surrounding \b (word-boundary) markers. Removing them, or adding flor to the pattern, would fix that:
validate = ['fl', 'florida', 'flor']
regex = /\b(?:#{ Regexp.union(validate.sort_by{ |s| [-s.size, s] }).source })\b/i
'flor.'[regex] # => "flor"
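'flor.' itself can't be used as a pattern while the \b markers are present, because the closing \b conflicts with the trailing . in flor. (there is no word boundary after a period at the end of the string). Removing \b: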
validate = ['fl', 'florida', 'flor.']
regex = /(?:#{ Regexp.union(validate.sort_by{ |s| [-s.size, s] }).source })/i
'flor.'[regex] # => "flor."
You can get very expressive with the values in the validate array when passing them to Regexp.union but watch out for union escaping the contents of the string to protect characters that are special in regular expressions:
Regexp.union(%w[a \b dollars$ . * ? +]) # => /a|\\b|dollars\$|\.|\*|\?|\+/
You can control this:
patterns = %w[a \b dollars$ . \* \? \+]
/#{ patterns.join('|') }/ # => /a|\b|dollars$|.|\*|\?|\+/
Sometimes I build a pattern in several steps, other times I can do it all at once. It's something you have to experiment with.
Back to the beginning. The reason unioned patterns are faster is the regular expression engine is very fast, and the string is only searched once, even if multiple elements are OR'd (using |) in the pattern.
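To tie this back to the original "return true if a match is found" requirement, here is a rough sketch that wraps the idea in a predicate. It assumes the whole input should equal one of the candidates, which is a stricter reading than the \b version above, and it assumes Ruby 2.4 or later for Regexp#match?. The names build_matcher and valid_input? are hypothetical.

```ruby
# Hypothetical helpers; the names are illustrative only.
def build_matcher(candidates)
  # Longest alternatives first, as in the answer above.
  longest_first = candidates.sort_by { |s| [-s.size, s] }
  # \A and \z require the whole input to be one of the candidates.
  /\A(?:#{Regexp.union(longest_first).source})\z/i
end

def valid_input?(input, candidates)
  build_matcher(candidates).match?(input.strip)
end

p valid_input?("FL", %w[fl florida])           # => true
p valid_input?("Florida", %w[fl florida])      # => true
p valid_input?("flor.", %w[fl florida])        # => false
p valid_input?("flor.", %w[fl florida flor.])  # => true
```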
I usually just put the values into an array and then use the include? method, which returns true if the input matches any of the elements of the array.
['fl', 'florida', 'FL'].include?(input)
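One caveat: include? compares case-sensitively, so with the list above an input such as 'Fl' or 'Florida' would not be found. A possible sketch of a case-insensitive variant uses String#casecmp?, which assumes Ruby 2.4 or later. The helper name listed? is made up.

```ruby
# Hypothetical helper: true if input equals any allowed value, ignoring case.
def listed?(input, allowed)
  allowed.any? { |candidate| candidate.casecmp?(input) }
end

p listed?("FL", %w[fl florida])       # => true
p listed?("Florida", %w[fl florida])  # => true
p listed?("flor.", %w[fl florida])    # => false
```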

Remove all non-alphabetical, non-numerical characters from a string?

If I wanted to remove things like:
.!,'"^-# from an array of strings, how would I go about this while retaining all alphabetical and numeric characters.
Allowed alphabetical characters should also include letters with diacritical marks including à or ç.
You should use a regex with the correct character property. In this case, you can invert the Alnum class (Alphabetic and numeric character):
"◊¡ Marc-André !◊".gsub(/\p{^Alnum}/, '') # => "MarcAndré"
For more complex cases, say you also wanted to keep punctuation, you can build a set of acceptable characters like:
"◊¡ Marc-André !◊".gsub(/[^\p{Alnum}\p{Punct}]/, '') # => "¡Marc-André!"
For all character properties, you can refer to the doc.
string.gsub(/[^[:alnum:]]/, "")
The following will work for an array:
z = ['asfdå', 'b12398!', 'c98347']
z.each { |s| s.gsub! /[^[:alnum:]]/, '' }
puts z.inspect
I borrowed Jeremy's suggested regex.
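A note on why [[:alnum:]] is used here rather than \W: as far as I know, \w and \W are ASCII-only in Ruby 1.9 and later, so \W would also strip letters with diacritical marks, while the POSIX bracket class is Unicode-aware. A small sketch to compare the two; both helper names are invented, and \u00e9 is just é written as an escape.

```ruby
# Hypothetical helpers for comparing the two character classes.
def strip_non_alnum(text)
  text.gsub(/[^[:alnum:]]/, "")
end

def strip_non_word_ascii(text)
  # \W is [^a-zA-Z0-9_], so accented letters are removed and "_" is kept.
  text.gsub(/\W/, "")
end

p strip_non_alnum("Marc-Andr\u00e9!")       # => "MarcAndré"
p strip_non_word_ascii("Marc-Andr\u00e9!")  # => "MarcAndr"
```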
You might consider a regular expression.
http://www.regular-expressions.info/ruby.html
I'm assuming that you're using Ruby, since you tagged your post with it. You could go through the array, test each element against a regexp, and remove or keep it depending on the result.
A regexp you might use might go something like this:
[^.!,^#-]
That will match any character that is not one of the characters inside the brackets (the hyphen is placed last so that it is not read as a range). However, I suggest that you look up regular expressions; you might find a better solution once you know their syntax and usage.
If you truly have an array (as you state) and it is an array of strings (I'm guessing), e.g.
foo = [ "hello", "42 cats!", "yöwza" ]
then I can imagine that you either want to update each string in the array with a new value, or that you want a modified array that only contains certain strings.
If the former (you want to 'clean' every string the array) you could do one of the following:
foo.each{ |s| s.gsub! /\p{^Alnum}/, '' } # Change every string in place…
bar = foo.map{ |s| s.gsub /\p{^Alnum}/, '' } # …or make an array of new strings
#=> [ "hello", "42cats", "yöwza" ]
If the latter (you want to select a subset of the strings where each matches your criteria of holding only alphanumerics) you could use one of these:
# Select only those strings that contain ONLY alphanumerics
bar = foo.select{ |s| s =~ /\A\p{Alnum}+\z/ }
#=> [ "hello", "yöwza" ]
# Shorthand method for the same thing
bar = foo.grep /\A\p{Alnum}+\z/
#=> [ "hello", "yöwza" ]
In Ruby, regular expressions of the form /\A………\z/ require the entire string to match, as \A anchors the regular expression to the start of the string and \z anchors to the end.

Chaining array into new split function call

I have the following and am trying to split on '.' and then split the returned first part on '-' and return the last of the first part. I want to return 447.
a="cat-vm-447.json".split('.').split('-')
Also, how would I do this as a regular expression? I have this:
a="cat-vm-447.json".split(/-[\d]+./)
but this is splitting on the value. I want to return the number.
I can do this:
a="cat-vm-447.json".slice(/[\d]+/)
and this gives me back 447 but would really like to specify that the - and . surround it. Adding those in regex return them.
First question. split returns an array, so you need to use Array#[] to get the first (0) or last (-1) element of this array. Alternatives are the Array#first and Array#last methods.
a="cat-vm-447.json".split('.')[0].split('-')[-1] # => "447"
Second question. You can capture your number in a group and then get it from the result. It will have index 1; the item with index 0 is the full match ("-447." in your case). You can use the String#[] or String#match methods (among others) to match your regex.
"cat-vm-447.json"[/-(\d+)\./, 1] # => "447"
# or
"cat-vm-447.json".match(/-(\d+)\./)[1] # => "447"
Split returns an array, so you need to specify the index for the next split.
a="cat-vm-447.json".split('.').first.split('-').last
For the regular expression, you need to wrap what you want to capture in parentheses.
/-(\d+)\./
a = "cat-vm-447.json"
b = a.match(/-(\d+)\./)
p b[1] # => "447"
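If the - and . should be required around the number but not returned as part of the match, lookbehind and lookahead assertions are another option. This is only a sketch; number_between_dash_and_dot is a made-up helper name, and it returns nil when nothing matches.

```ruby
# Hypothetical helper: digits that directly follow "-" and directly precede ".".
def number_between_dash_and_dot(filename)
  # (?<=-) asserts a preceding "-", (?=\.) asserts a following "."
  # Neither assertion becomes part of the returned match.
  filename[/(?<=-)\d+(?=\.)/]
end

p number_between_dash_and_dot("cat-vm-447.json")  # => "447"
p number_between_dash_and_dot("cat-vm-447")       # => nil
```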
Try something like that:
if "cat-vm-447.json" =~ /([\d]+)/
p $1
else
p "No matches"
end
The parentheses in the regex extract the result in the $1 variable.
When you split your string the second time, you are actually trying to split an Array instead of a String.
ruby-1.9.3-head :003 > "cat-vm-447.json".split('.')
# => ["cat-vm-447", "json"]
In regexp case, you can use /[-.]/
ruby-1.9.3-head :008 > "cat-vm-447.json".split(/[-.]/)
# => ["cat", "vm", "447", "json"]
ruby-1.9.3-head :009 > "cat-vm-447.json".split(/[-.]/)[2]
# => "447"
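Since the string looks like a file name, here is one more sketch under that assumption: File.basename with ".*" as the second argument strips the extension, after which a single split is enough. The helper name trailing_id is hypothetical.

```ruby
# Hypothetical helper: last dash-separated part of a file name, extension removed.
def trailing_id(filename)
  # File.basename(name, ".*") drops any leading directories and the extension.
  File.basename(filename, ".*").split("-").last
end

p trailing_id("cat-vm-447.json")       # => "447"
p trailing_id("logs/cat-vm-447.json")  # => "447"
```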
