Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
This is my code:
['banana', 'anana', 'naan'].map do |food|
food.reverse
end.select { |f| f.match /^a/ }
# => ["ananab", "anana"]
I wonder whether this regex finds words that include characters other than "a" (negation), or whether it matches words that begin with "a" (ananab and anana, from my understanding).
Can someone help me a little more?
If you thought ^ means negation, that is so only when it is the first character inside [], which denotes a character class. Even if you have a negated class [^a], that does not mean "a string that does not have the character a". It means "a string that has a character other than a".
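A small sketch of that distinction follows. The sample strings are made up for illustration and are not from the question.

```ruby
# [^a] matches ONE character that is not "a"; it does not assert
# that the whole string is free of "a".
samples = ["banana", "aaa", "xyz"]

# "banana" qualifies because of its "b" and "n"; "aaa" does not.
has_non_a = samples.select { |s| s.match?(/[^a]/) }

# Strings with no "a" at all need a negated predicate instead.
no_a_at_all = samples.reject { |s| s.include?("a") }

p has_non_a    # => ["banana", "xyz"]
p no_a_at_all  # => ["xyz"]
```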
Regex is a tool to try to match something. In its implementation, it will try to match a pattern in any way possible by changing the match position, backtracking, etc. If you want to see if a string does not match a pattern, the most straightforward way is to use negation on the predicate, not on the regex. The following will return true when string s does not include an a:
s !~ /a/
But in such a simple case, you can instead do:
!s.include?("a")
or instead of select, you can use reject:
reject{|s| s.include?("a")}
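To see that the three forms above agree, here is a minimal sketch. The extra word 'kiwi' is an assumption added so that the result is not empty.

```ruby
words = ['banana', 'anana', 'naan', 'kiwi']  # 'kiwi' added for illustration

by_regex   = words.select { |s| s !~ /a/ }           # negate the match
by_include = words.select { |s| !s.include?("a") }   # no regex needed
by_reject  = words.reject { |s| s.include?("a") }    # reject instead of select

p by_regex    # => ["kiwi"]
p by_include  # => ["kiwi"]
p by_reject   # => ["kiwi"]
```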
The regex /^a/ means: "match any string where the first character is an a". The ^ character here indicates the start of a string.
I believe what you're looking for is /[^a]/, which means: "match any string that contains a character that is not an a". The [^...] syntax will match any character except those within the brackets.
To really get at what you want, add a * after the class and anchor both ends, giving /^[^a]*$/, which says: "match any string where all the characters from start (^) to finish ($) are not a".
Addendum: As per the comments, anywhere I've written "string", I really mean "string or line". The ^ and $ characters are anchors, pinning the regex to the start or end of the line. Or, in the case of a string without any newlines, it anchors to the start or end of the string.
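The line-versus-string difference only shows up with multi-line input. A minimal sketch, using made-up strings:

```ruby
multi = "banana\napple"

line_start   = multi.match?(/^a/)   # "apple" starts a line
string_start = multi.match?(/\Aa/)  # the string itself starts with "b"

p line_start    # => true
p string_start  # => false

# The same applies to the "no a anywhere" pattern:
two_lines = "xyz\nabc"
per_line     = two_lines.match?(/^[^a]*$/)    # the line "xyz" has no "a"
whole_string = two_lines.match?(/\A[^a]*\z/)  # the string as a whole has one

p per_line      # => true
p whole_string  # => false
```

If the intent is "the whole string", \A and \z are the safer anchors in Ruby.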
It is not clear what your question is. Are you asking what the code you posted does, or how to exclude items from the array that contain the letter "a" ?
The code you posted:
['banana', 'anana', 'naan'].map do |food|
food.reverse
end.select { |f| f.match /^a/ }
does the following:
First it creates a new array where each element is reversed (this assumes each element is a string)
so
['banana', 'anana', 'naan'].map do |food|
food.reverse
end
should result in an array like this:
["ananab", "anana", "naan"]
This is because map takes each element in the array and passes it to the block
do |food|
food.reverse
end
which reverses each element and creates a new array where each element has been "mapped" to the reversed string
then the
.select { |f| f.match /^a/ }
part will create another array from ["ananab", "anana", "naan"]
containing each element that begins with the letter a (/^a/ means begins with "a")
resulting in the array
["ananab", "anana"]
If your question is how to exclude words containing the letter a then
['banana', 'anana', 'naan'].reject { |s| s.include?("a") }
should do what you want (as sawa pointed out)
Try something like this instead:
['banana', 'anana', 'naan'].map(&:reverse).select { |f| !f.include?("a") }
The code does the following: it reverses the strings in the array and then keeps only the strings that do not contain the character "a".
This question already has answers here:
regex - matching non-necessarily consecutive occurrences
(4 answers)
Closed 3 months ago.
I'm trying to match a regex with a string as long as possible. This is the string to look in:
"xxaxxbxxbxbxxbxxbxxbxxdxx"
The pattern to match is:
"bcda"
The pattern is to be interpreted as follows:
b: There are several of them in the string. The first one should match.
c: There isn't one in the string, so nothing is returned.
d: There is just one near the end of the string. It should be returned.
a: There is one at the beginning of the string. Since b, c, and d were sought first and results are returned, a will not be returned.
The expected return is:
"bd"
It may be that a regex match is not the correct way to accomplish this, but I'd like to ask for assistance with one. The basic question is this: can I use regex to generically find a substring that represents as much of an ordered, but not necessarily consecutive, sequence of candidate characters as it possibly can? If so, how?
As @sawa explained, you cannot do this with a single regex. Here is a recursive solution.
def consecutive_matches(str, pattern)
return '' if str.empty? || pattern.empty?
ch, pat = pattern[0], pattern[1..-1]
i = str.index(ch)
if i
ch + consecutive_matches(str[i+1..-1], pat)
else
consecutive_matches(str, pat)
end
end
str = "xxaxxbxxbxbxxbxxbxxbxxdxx"
consecutive_matches(str, "bcda") #=> "bd"
consecutive_matches(str, "abcd") #=> "abd"
consecutive_matches(str, "dabc") #=> "d"
consecutive_matches(str, "cfgh") #=> ""
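The same greedy left-to-right scan can also be written without recursion. This is only a sketch under the same reading of the question; the method name ordered_subsequence is hypothetical.

```ruby
# Iterative variant: walk the pattern once and remember where the
# previous hit ended, so later characters are only sought after it.
def ordered_subsequence(str, pattern)
  pos = 0
  pattern.each_char.with_object(+"") do |ch, found|
    i = str.index(ch, pos)
    next unless i      # not present after pos: skip this character
    found << ch
    pos = i + 1
  end
end

str = "xxaxxbxxbxbxxbxxbxxbxxdxx"
p ordered_subsequence(str, "bcda")  # => "bd"
p ordered_subsequence(str, "abcd")  # => "abd"
```

It avoids building a new substring on every step, which may matter for long inputs.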
It is impossible to do that with a single regex match. A capture in a regex must be a substring of the original string. bd here is not, so there is no way to match that as a single capture.
Based on "How to Delete Strings that Start with Certain Characters in Ruby", I know that the way to remove a string that starts with the character "#" is:
email = email.gsub( /(?:\s|^)#.*/ , "") #removes strings that start with "#"
I want to also remove strings that end in ".". Inspired by "Difference between \A \z and ^ $ in Ruby regular expressions" I came up with:
email = email.gsub( /(?:\s|$).*\./ , "")
Basically I swapped the caret for the dollar sign and reversed the order of the part after the closing parenthesis (making sure to escape the period). However, it is not doing the trick.
An example I'd like to match and remove is:
"a8&23q2aas."
You were so close.
email = email.gsub( /.*\.\s*$/ , "")
The difference is that you didn't consider how the regex tokens line up with the string you are matching. Here, you are trying to find a period (\.) which is followed only by whitespace (\s) or the end of the line ($). I would read the regex above as "Any number of any characters, followed by a period, followed by any amount of whitespace, followed by the end of the line."
As commenters pointed out, though, there's a simpler way: String#end_with?.
I'd use:
words = %w[#a day in the life.]
# => ["#a", "day", "in", "the", "life."]
words.reject { |w| w.start_with?('#') || w.end_with?('.') }
# => ["day", "in", "the"]
Using a regex is overkill for this if you're only concerned with the starting or ending character, and regular expressions will generally be slower than the built-in methods for such a check.
I would really like to stick to using gsub....
gsub is the wrong way to remove an element from an array. It could be used to turn the string into an empty string, but that won't remove that element from the array.
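A minimal sketch of that difference, reusing the words from the example above. The combined patterns are written for this illustration only.

```ruby
words = %w[#a day in the life.]

# gsub can blank a string, but the (now empty) element stays in the array.
blanked = words.map { |w| w.gsub(/\A#.*|.*\.\z/, "") }
p blanked  # => ["", "day", "in", "the", ""]

# reject actually drops the elements.
removed = words.reject { |w| w.match?(/\A#|\.\z/) }
p removed  # => ["day", "in", "the"]
```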
def replace_suffix(str, suffix)
  str.end_with?(suffix) ? str[0, str.length - suffix.length] : str
end
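A short usage sketch for the helper above, with the method repeated so the snippet runs on its own. On Ruby 2.5 or later, the built-in String#delete_suffix gives the same result for these inputs.

```ruby
def replace_suffix(str, suffix)
  str.end_with?(suffix) ? str[0, str.length - suffix.length] : str
end

with_dot    = replace_suffix("a8&23q2aas.", ".")
without_dot = replace_suffix("no-dot", ".")
builtin     = "a8&23q2aas.".delete_suffix(".")  # Ruby 2.5+

p with_dot     # => "a8&23q2aas"
p without_dot  # => "no-dot"
p builtin      # => "a8&23q2aas"
```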
I am going through the Peter Cooper book "Beginning Ruby" and I have some questions regarding some of the string methods and regular expression usage. I think I'm clear on what a regular expression is: "a string that describes a pattern for matching elements in other strings."
So:
"This is a test".scan(/\w\w/) {|x| puts x}
Output:
Th
is
is
te
st
=> "This is a test"
So it prints two characters at a time. I didn't realize it also returns the original string. Why is this?
Also,
"This is a test".scan(/[aeiou]/) { |x| puts x }
What do the brackets do? I think they are called character classes, but I am not sure exactly what they do. The explanation in Cooper's book isn't totally verbose and clear.
Explanation of character classes:
"The last important aspect of regular expressions you need to understand at this stage is
character classes. These allow you to match against a specific set of characters. For example, you can scan through all the vowels in a string:"
Yes, it is called a character class.
A character class defines a set of characters and says, "match one character from this set". There are two forms: a positive class [ ] and a negative class [^ ]. A positive class lists the characters that may match at that position; a negative class lists the characters that may NOT match there, so it matches any single character outside the list.
Explanation of your character class:
[aeiou] # any character of: 'a', 'e', 'i', 'o', 'u'
Without a block, the scan method returns an array of the matches. It optionally accepts a block, in which case each match is passed to the block (much like calling each on that array) and scan returns the original string instead of the array.
Here is the documentation: http://www.ruby-doc.org/core-2.1.3/String.html#method-i-scan
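Both call styles side by side, as a minimal sketch using the string from the question:

```ruby
text = "This is a test"

# No block: scan returns the matches as an array.
pairs = text.scan(/\w\w/)
p pairs  # => ["Th", "is", "is", "te", "st"]

# With a block: each match is yielded, and the receiver is returned.
collected = []
result = text.scan(/\w\w/) { |m| collected << m }
p result.equal?(text)  # => true
p collected == pairs   # => true
```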
To the second question, @hwnd already gave you a clear answer. The best way to learn this is to experiment; regex101.com is the online tool I usually use. It lists explanations for all your matching elements, so it's a wonderful learning resource too.
Some things you might like to try:
123abab12ab1234 with pattern [123]
123abab12ab1234 with pattern [ab]+
123abab12ab1234 with pattern b[1|a]
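If you would rather try these in irb than in an online tool, a sketch like the following works. Note that inside a character class | is a literal character, not alternation, so [1|a] means "1, | or a".

```ruby
s = "123abab12ab1234"

digits   = s.scan(/[123]/)   # one character per match
runs     = s.scan(/[ab]+/)   # + extends the class to a run
b_then   = s.scan(/b[1|a]/)  # "b" followed by "1", "|" or "a"
pipe_hit = "b|".match?(/b[1|a]/)

p digits    # => ["1", "2", "3", "1", "2", "1", "2", "3"]
p runs      # => ["abab", "ab"]
p b_then    # => ["ba", "b1", "b1"]
p pipe_hit  # => true
```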
One thing to remember is that a character class matches ONE character, for example:
str = 'XXXaeiouXXX'
puts str
str.sub!(/[aeiou]/, '.')
puts str
--output:--
XXXaeiouXXX
XXX.eiouXXX
A character class says, "Match this character OR this character OR this character...ONE TIME ".
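To match more than one character, combine the class with a quantifier or use gsub. A minimal sketch on the same string:

```ruby
str = 'XXXaeiouXXX'

one_char   = str.sub(/[aeiou]/, '.')    # first vowel only
one_run    = str.sub(/[aeiou]+/, '.')   # the whole run of vowels
each_vowel = str.gsub(/[aeiou]/, '.')   # every vowel, one at a time
negated    = str.gsub(/[^aeiou]/, '.')  # every non-vowel

p one_char    # => "XXX.eiouXXX"
p one_run     # => "XXX.XXX"
p each_vowel  # => "XXX.....XXX"
p negated     # => "...aeiou..."
```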
Also check out rubular:
http://rubular.com/
I didn't realize it also returns the original string. Why is this?
So that you can chain methods together:
my_str.scan(...).downcase.capitalize.each_char {|char| puts char}.upcase.chomp
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
examp = [["packed"], ["crud"], ["paced"], ["it"], ["emo"], ["wrote"], ["pcd"], ["ppcd"], ["pcd"]]
word = 'pcd'
foo = examp.select { |a| a[0][/[aeiou#{word}]/] }
p foo
Expected output:
[["paced"], ["pcd"]]
Actual output:
[["packed"], ["crud"], ["paced"], ["it"], ["emo"], ["wrote"], ["pcd"]]
Edit:
["ppcd"] and ["pcd"] (second time) added to the array. I forgot to mention that I also want to exclude words that have more than one occurrence of a letter in pcdaeiou, or words that appear more than once. Sorry.
Edit 2:
The specific problem is I want to filter an array by a given string (of letters) plus some other letters. I don't want the output to contain words with anything but, in the example, pcdaeiou. However, duplicates of aeiou are allowed; I just don't want repeated instances of p, c, or d.
[] in a regex denotes a character class. It matches a character that is any of the characters within it. So it's finding any word that contains a, e, i, o, u, p, c or d.
It's not clear what you're really after though... Are you saying that a p, c, and d are required, and that vowels are also allowed, but no other consonants? If so, I'd say it's simplest to use two regexes. One to see if the letters you need are present, and another to make sure it only contains letters you allow.
And for all that is holy... use =~
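examp.select do |a|
  # Note: /#{word}/ only matches the literal substring "pcd"; the second
  # regex checks that every character is one of the allowed letters.
  a[0] =~ /#{word}/ && a[0] =~ /^[#{word}aeiou]+$/
end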
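Under the reading in "Edit 2" of the question, one possible sketch is below. It assumes that each letter of word must appear exactly once, that vowels may repeat, and that duplicate entries should be dropped; the question does not state the first assumption explicitly.

```ruby
examp = [["packed"], ["crud"], ["paced"], ["it"], ["emo"], ["wrote"],
         ["pcd"], ["ppcd"], ["pcd"]]
word  = 'pcd'

# Only the letters of word plus vowels may appear anywhere in the string.
allowed = /\A[#{Regexp.escape(word)}aeiou]+\z/

foo = examp.uniq.select do |a|
  s = a.first
  # Assumption: every letter of word occurs exactly once.
  s.match?(allowed) && word.each_char.all? { |c| s.count(c) == 1 }
end

p foo  # => [["paced"], ["pcd"]]
```

String#count treats characters such as ^ and - specially, so this sketch assumes word holds plain letters.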
I assume the question is, "Select all arrays [str] from examp for which all characters of the string str are a, e, i, o or u, or one of the letters of the value of the variable word".
examp = [["packed"], ["crud"], ["paced"], ["it"], ["emo"], ["wrote"], ["pcd"]]
word = 'pcd'
p examp.flatten.select {|a| a.chars.all? {|c|
"aeiou#{word}".include?(c)}}.map {|e| [e]} # => [["paced"], ["pcd"]]
Edit: if, in my statement above of the question, each character of str "consumes" a matching character of "aeioupcd", then my solution could be modified as follows:
examp = [["packed"], ["crud"], ["paced"], ["it"], ["emo"], ["wrote"], ["pcd"], ["ppcd"]]
str = "aeiou#{word}" # => "aeioupcd"
p examp.flatten.select { |a|
  s = str.dup
  a.chars.all? { |c| s.delete!(c) }
}.map { |e| [e] } # => [["paced"], ["pcd"]]
For a == "ppcd", "aeioupcd" is seen to contain the first "p" because "aeioupcd".delete!("p") returns the modified string "aeioucd", which is truthy (delete! returns nil only when nothing was removed). Because delete! also removes this character from the working copy, the second "p" in "ppcd" is evaluated as "aeioucd".delete!("p") => nil, so "ppcd" is not selected.
If I wanted to remove things like:
.!,'"^-# from an array of strings, how would I go about this while retaining all alphabetical and numeric characters.
Allowed alphabetical characters should also include letters with diacritical marks including à or ç.
You should use a regex with the correct character property. In this case, you can invert the Alnum class (Alphabetic and numeric character):
"◊¡ Marc-André !◊".gsub(/\p{^Alnum}/, '') # => "MarcAndré"
For more complex cases, say you also wanted to keep punctuation, you can build a set of acceptable characters like:
"◊¡ Marc-André !◊".gsub(/[^\p{Alnum}\p{Punct}]/, '') # => "¡Marc-André!"
For all character properties, you can refer to the doc.
string.gsub(/[^[:alnum:]]/, "")
The following will work for an array:
z = ['asfdå', 'b12398!', 'c98347']
z.each { |s| s.gsub! /[^[:alnum:]]/, '' }
puts z.inspect
I borrowed Jeremy's suggested regex.
You might consider a regular expression.
http://www.regular-expressions.info/ruby.html
I'm assuming that you're using ruby since you tagged that in your post. You could go through the array, put it through a test using a regexp, and if it passes remove/keep it based on the regexp you use.
A regexp you might use might go something like this:
[^.!,^#-]
That will tell you if it's not one of the characters inside the brackets. (The hyphen goes last so that it is read as a literal character rather than as a range.) However, I suggest that you look up regular expressions; you might find a better solution once you know their syntax and usage.
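For stripping the specific characters listed in the question, a positive class with gsub may be simpler than testing with a negated one. A minimal sketch; the sample strings are invented for illustration.

```ruby
strings = ["it's", "Marc-André!", "a.b,c", "ça#va^"]

# Inside the class, ^ is literal when it is not first, and - is
# literal when it is last.
unwanted = /[.!,'"^#-]/

cleaned = strings.map { |s| s.gsub(unwanted, "") }
p cleaned  # => ["its", "MarcAndré", "abc", "çava"]
```

Letters with diacritical marks are left alone because only the listed characters are removed.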
If you truly have an array (as you state) and it is an array of strings (I'm guessing), e.g.
foo = [ "hello", "42 cats!", "yöwza" ]
then I can imagine that you either want to update each string in the array with a new value, or that you want a modified array that only contains certain strings.
If the former (you want to 'clean' every string the array) you could do one of the following:
foo.each{ |s| s.gsub! /\p{^Alnum}/, '' } # Change every string in place…
bar = foo.map{ |s| s.gsub /\p{^Alnum}/, '' } # …or make an array of new strings
#=> [ "hello", "42cats", "yöwza" ]
If the latter (you want to select a subset of the strings where each matches your criteria of holding only alphanumerics) you could use one of these:
# Select only those strings that contain ONLY alphanumerics
bar = foo.select{ |s| s =~ /\A\p{Alnum}+\z/ }
#=> [ "hello", "yöwza" ]
# Shorthand method for the same thing
bar = foo.grep /\A\p{Alnum}+\z/
#=> [ "hello", "yöwza" ]
In Ruby, regular expressions of the form /\A………\z/ require the entire string to match, as \A anchors the regular expression to the start of the string and \z anchors to the end.
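The choice of anchors matters as soon as a string contains a newline. A minimal sketch with a made-up input:

```ruby
sneaky = "hello\n!!!"

# ^ and $ anchor to a line, so the first line alone satisfies the pattern.
line_anchored = sneaky.match?(/^\p{Alnum}+$/)

# \A and \z anchor to the whole string, so the "!!!" line makes it fail.
string_anchored = sneaky.match?(/\A\p{Alnum}+\z/)

# \Z (capital) tolerates a single trailing newline; \z does not.
upper_z = "hello\n".match?(/\A\p{Alnum}+\Z/)
lower_z = "hello\n".match?(/\A\p{Alnum}+\z/)

p line_anchored    # => true
p string_anchored  # => false
p upper_z          # => true
p lower_z          # => false
```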