arrays in Ruby, how to handle this situation? - ruby

Let's say I have the following array:
arr = ["", "2121", "8", "myString"]
I want to return false in case the array contains any non-digit symbols.

arr.all? { |s| s =~ /^\d+$/ }
This will check for each element if it consists only of digits (\d) – If any of them does not, false will be returned.
Edit: You didn't completely specify if the empty string is valid or not. If it is, the line has to be rewritten as follows (as per DarkDust):
arr.all? {|s| s =~ /^\d*$/ }

If empty strings are allowed:
def contains_non_digit(array)
!array.select {|s| s =~ /^.*[^0-9].*$/}.empty?
end
Explanation: this filters the array for all strings that match a regular expression. This regex is true for a string that contains at least one non-digit character. If the resulting array is empty, the array contains no non-digit strings. Finally, we need to negate the result, because we want to know the array does contain non-digit strings.

Related

How to return the whole array instead of a single string

I am trying to return all words which have more than four letters in the below exercise.
def timed_reading(max_length, text)
var_b = text.split(" ")
var_b.map do |i|
if i.length >= max_length
return i
end
end
end
print timed_reading(4,"The Fox asked the stork, 'How is the soup?'")
# >> asked
I seem to get only one word.
If you want to filter a list and select only certain kinds of entries, use the select method:
var_b.select do |i|
i.length >= max_length
end
Where that's all you need.
The return i in the middle is confusing things, as that breaks out of the loop and returns a single value from the method itself. Remember that in Ruby, unlike others such as JavaScript, return is often implied and doesn't need to be spelled out explicitly.
Blocks don't normally have return in them for this reason unless they need to interrupt the flow and break out of the method itself.
You don't need to first extract all words from the string and then select those having at least four letters. Instead you can just extract the desired words using String#scan with a regular expression.
str = "The Fox asked the stork, 'How is the soup?'? Très bon?"
str.scan /\p{Alpha}{4,}/
#=> ["asked", "stork", "soup", "Très"]
The regular expression reads, "Match strings containing 4 or more letters". I've used \p{Alpha} (same as \p{L} and [[:alpha:]]) to match unicode letters. (These are documented in Regexp. Search for these expressions there.) You could replace \p{Alpha} with [a-zA-Z], but in that case "Très" would not be matched.
If you wish to also match digits, use \p{Alnum} or [[:alnum:]] instead. While \w also matches letters (English only) and digits, it also matches underscores, which you probably don't want in this situation.
Punctuation can be a problem when words are extracted from the string by splitting on whitespace.
arr = "That is a cow.".split
#=> ["That", "is", "a", "cow."]
arr.select { |word| word.size >= 4 }
#=> ["That", "cow."]
but "cow" has only three letters. If you instead used String#scan to extract words from the string you obtain the desired result.
arr = "That is a cow?".scan /\p{Alpha}+/
#=> ["That", "is", "a", "cow"]
arr.select { |word| word.size >= 4 }
#=> ["That"]
However, if you use scan you may as well use a regular expression to retrieve only words having at least 4 characters, and skip the extra step.

Simple regex - ignoring certain characters

I'm trying to use the match method with an argument of a regex to select a valid phone number, by definition, any string with nine digits.
For example:
9347584987 is valid,
(456)322-3456 is valid,
(324)5688890 is valid.
But
(340)HelloWorld is NOT valid and
456748 is NOT valid.
So far, I'm able to use \d{9} to select the example string of 9 digit characters in a row, but I'm not sure how to specifically ignore any character, such as '-' or '(' or ')' in the middle of the sequence.
What kind of Regex could I use here?
Given:
nums=['9347584987','(456)322-3456','(324)5688890','(340)HelloWorld', '456748 is NOT valid']
You can split on a NON digit and rejoin to remove non digits:
> nums.map {|s| s.split(/\D/).join}
["9347584987", "4563223456", "3245688890", "340", "456748"]
Then filter on the length:
> nums.map {|s| s.split(/\D/).join}.select {|s| s.length==10}
["9347584987", "4563223456", "3245688890"]
Or, you can grab a group of numbers that look 'phony numbery' by using a regex to grab digits and common delimiters:
> nums.map {|s| s[/[\d\-()]+/]}
["9347584987", "(456)322-3456", "(324)5688890", "(340)", "456748"]
And then process that list as above.
That would delineate:
> '123 is NOT a valid area code for 456-7890'[/[\d\-()]+/]
=> "123" # no match
vs
> '123 is NOT a valid area code for 456-7890'.split(/\D/).join
=> "1234567890" # match
I suggest using one regular expression for each valid pattern rather than constructing a single regex. It would be easier to test and debug, and easier to maintain the code. If, for example, "123-456-7890" or 123-456-7890 x231" were in future deemed valid numbers, one need only add a single, simple regex for each to the array VALID_PATTERS below.
VALID_PATTERS = [/\A\d{10}\z/, /\A\(\d{3}\)\d{3}-\d{4}\z/, /\A\(\d{3}\)\d{7}\z/]
def valid?(str)
VALID_PATTERS.any? { |r| str.match?(r) }
end
ph_nbrs = %w| 9347584987 (456)322-3456 (324)5688890 (340)HelloWorld 456748 |
ph_nbrs.each { |s| puts "#{s.ljust(15)} \#=> #{valid?(s)}" }
9347584987 #=> true
(456)322-3456 #=> true
(324)5688890 #=> true
(340)HelloWorld #=> false
456748 #=> false
String#match? made its debut in Ruby v2.4. There are many alternatives, including str.match(r) and str =~ r.
"9347584987" =~ /(?:\d.*){9}/ #=> 0
"(456)322-3456" =~ /(?:\d.*){9}/ #=> 1
"(324)5688890" =~ /(?:\d.*){9}/ #=> 1
"(340)HelloWorld" =~ /(?:\d.*){9}/ #=> nil
"456748" =~ /(?:\d.*){9}/ #=> nil
Pattern: (Rubular Demo)
^\(?\d{3}\)?\d{3}-?\d{4}$ # this makes the expected symbols optional
This pattern will ensure that an opening ( at the start of the string is followed by 3 numbers the a closing ).
^(\(\d{3}\)|\d{3})\d{3}-?\d{4}$
On principle, though, I agree with melpomene in advising that you remove all non-digital characters, test for 9 character length, then store/handle the phone numbers in a single/reliable/basic format.

Finding the first duplicate character in the string Ruby

I am trying to call the first duplicate character in my string in Ruby.
I have defined an input string using gets.
How do I call the first duplicate character in the string?
This is my code so far.
string = "#{gets}"
print string
How do I call a character from this string?
Edit 1:
This is the code I have now where my output is coming out to me No duplicates 26 times. I think my if statement is wrongly written.
string "abcade"
puts string
for i in ('a'..'z')
if string =~ /(.)\1/
puts string.chars.group_by{|c| c}.find{|el| el[1].size >1}[0]
else
puts "no duplicates"
end
end
My second puts statement works but with the for and if loops, it returns no duplicates 26 times whatever the string is.
The following returns the index of the first duplicate character:
the_string =~ /(.)\1/
Example:
'1234556' =~ /(.)\1/
=> 4
To get the duplicate character itself, use $1:
$1
=> "5"
Example usage in an if statement:
if my_string =~ /(.)\1/
# found duplicate; potentially do something with $1
else
# there is no match
end
s.chars.map { |c| [c, s.count(c)] }.drop_while{|i| i[1] <= 1}.first[0]
With the refined form from Cary Swoveland :
s.each_char.find { |c| s.count(c) > 1 }
Below method might be useful to find the first word in a string
def firstRepeatedWord(string)
h_data = Hash.new(0)
string.split(" ").each{|x| h_data[x] +=1}
h_data.key(h_data.values.max)
end
I believe the question can be interpreted in either of two ways (neither involving the first pair of adjacent characters that are the same) and offer solutions to each.
Find the first character in the string that is preceded by the same character
I don't believe we can use a regex for this (but would love to be proved wrong). I would use the method suggested in a comment by #DaveNewton:
require 'set'
def first_repeat_char(str)
str.each_char.with_object(Set.new) { |c,s| return c unless s.add?(c) }
nil
end
first_repeat_char("abcdebf") #=> b
first_repeat_char("abcdcbe") #=> c
first_repeat_char("abcdefg") #=> nil
Find the first character in the string that appears more than once
r = /
(.) # match any character in capture group #1
.* # match any character zero of more times
? # do the preceding lazily
\K # forget everything matched so far
\1 # match the contents of capture group 1
/x
"abcdebf"[r] #=> b
"abccdeb"[r] #=> b
"abcdefg"[r] #=> nil
This regex is fine, but produces the warning, "regular expression has redundant nested repeat operator '*'". You can disregard the warning or suppress it by doing something clunky, like:
r = /([^#{0.chr}]).*?\K\1/
where ([^#{0.chr}]) means "match any character other than 0.chr in capture group 1".
Note that a positive lookbehind cannot be used here, as they cannot contain variable-length matches (i.e., .*).
You could probably make your string an array and use detect. This should return the first char where the count is > 1.
string.split("").detect {|x| string.count(x) > 1}
I'll use positive lookahead with String#[] method :
"abcccddde"[/(.)(?=\1)/] #=> c
As a variant:
str = "abcdeff"
p str.chars.group_by{|c| c}.find{|el| el[1].size > 1}[0]
prints "f"

Remove all non-alphabetical, non-numerical characters from a string?

If I wanted to remove things like:
.!,'"^-# from an array of strings, how would I go about this while retaining all alphabetical and numeric characters.
Allowed alphabetical characters should also include letters with diacritical marks including à or ç.
You should use a regex with the correct character property. In this case, you can invert the Alnum class (Alphabetic and numeric character):
"◊¡ Marc-André !◊".gsub(/\p{^Alnum}/, '') # => "MarcAndré"
For more complex cases, say you wanted also punctuation, you can also build a set of acceptable characters like:
"◊¡ Marc-André !◊".gsub(/[^\p{Alnum}\p{Punct}]/, '') # => "¡MarcAndré!"
For all character properties, you can refer to the doc.
string.gsub(/[^[:alnum:]]/, "")
The following will work for an array:
z = ['asfdå', 'b12398!', 'c98347']
z.each { |s| s.gsub! /[^[:alnum:]]/, '' }
puts z.inspect
I borrowed Jeremy's suggested regex.
You might consider a regular expression.
http://www.regular-expressions.info/ruby.html
I'm assuming that you're using ruby since you tagged that in your post. You could go through the array, put it through a test using a regexp, and if it passes remove/keep it based on the regexp you use.
A regexp you might use might go something like this:
[^.!,^-#]
That will tell you if its not one of the characters inside the brackets. However, I suggest that you look up regular expressions, you might find a better solution once you know their syntax and usage.
If you truly have an array (as you state) and it is an array of strings (I'm guessing), e.g.
foo = [ "hello", "42 cats!", "yöwza" ]
then I can imagine that you either want to update each string in the array with a new value, or that you want a modified array that only contains certain strings.
If the former (you want to 'clean' every string the array) you could do one of the following:
foo.each{ |s| s.gsub! /\p{^Alnum}/, '' } # Change every string in place…
bar = foo.map{ |s| s.gsub /\p{^Alnum}/, '' } # …or make an array of new strings
#=> [ "hello", "42cats", "yöwza" ]
If the latter (you want to select a subset of the strings where each matches your criteria of holding only alphanumerics) you could use one of these:
# Select only those strings that contain ONLY alphanumerics
bar = foo.select{ |s| s =~ /\A\p{Alnum}+\z/ }
#=> [ "hello", "yöwza" ]
# Shorthand method for the same thing
bar = foo.grep /\A\p{Alnum}+\z/
#=> [ "hello", "yöwza" ]
In Ruby, regular expressions of the form /\A………\z/ require the entire string to match, as \A anchors the regular expression to the start of the string and \z anchors to the end.

Must a gsub hash key be a string, not a regexp?

I want to do a sequence of gsubs against one string, so I utilized the fact that gsub can take a hash as the second argument. One thing I wanted to do with gsub is to convert a sequence of one or more space/tab into a single space, so I have something essentially as follows:
gsub(/[ \t]+/, {/[ \t]+/ => ' '})
In my actual code, the first argument is a union of the regexp I gave here, and the second argument includes more key-value pairs.
Now, when I apply this to a string, all of the space/tabs are deleted. I suppose this is because the match to the first argument is not regarded as matching to the key [ \t] in the second argument (hash). Does the match in the second argument hash only looks for exact string match, not regexp match? If so, is there any way to get around it?
This is a related question. If you need to use the hash because many things have to be substituted, this might work:
list = Hash.new{|h,k|if /\s+/ =~ k then ' ' else k end}
list['foo'] = 'bar'
list['apple'] = 'banana'
p "appleabc\t \tabc apple foo".gsub(/\w+|\W+/,list)
#=> "appleabc abc banana bar"
p list
#=>{"foo"=>"bar", "apple"=>"banana"} no garbage
According to the docs, gsub with a hash as the second parameter only matches against literal strings:
'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"
If you want to supply multiple hashes you could work around it by creating a hash, where the key/value pairs are the search => replacement pairs, iterate over the hash, and pass those into the gsub. Because Ruby 1.9+ maintains the insertion order of the hash, you're guaranteed that the search will occur in the order you want.
search_hash = {
'1' => 'one',
'too' => 'two',
/[\t ]+/ => ' '
}
str = "1, too,\t3 , four"
search_hash.each { |n,v| str.gsub!(n, v) }
str #=> "one, two, 3 , four"
If you just want the spaces/tabs replaced with one space, why not just specify that as the replacement, and omit the whole hash?
gsub(/[ \t]+/, ' ')
UPDATE: based on your comment, you can use the block syntax of gsub
gsub(/[ \t]+/) {|match| *do stuff here* }

Resources