How to count consecutive consonants at beginning of a string in Ruby?

I'm still coming to terms with Regex and want to formulate an expression that will let me count the number of successive consonants at the beginning of a string. E.g. 'Cherry' will return 2, 'hello' 1, 'schlepp' 4 and so on. Since the number isn't predetermined (although English probably has some upper limit on initial consonants!) I'd need some flexible expression, but I'm a bit stuck about how to write it. Any pointers would be welcome!

This would work:
'Cherry'[/\A[bcdfghjklmnpqrstvwxyz]*/i].length #=> 2
The regex matches zero or more consonants at the beginning of the string. String#[] returns the matching part, and length gives its length.
You can also express the consonant character class more succinctly by intersecting [a-z] and [^aeiou] via &&:
'Cherry'[/\A[a-z&&[^aeiou]]*/i].length #=> 2
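A minimal sketch wrapping the String#[] approach in a method, so the examples from the question can be checked in one place. The method name leading_consonant_count and the constant CONSONANT_RUN are illustrative, not part of the answer; like the first regex above, it assumes ASCII letters and counts "y" as a consonant.

```ruby
# Illustrative names; same character class as the first regex above.
CONSONANT_RUN = /\A[bcdfghjklmnpqrstvwxyz]*/i

# Returns the number of consonants at the start of str.
# Because the pattern uses `*`, it always matches (possibly the empty
# string), so String#[] never returns nil for a String argument.
def leading_consonant_count(str)
  str[CONSONANT_RUN].length
end

%w[Cherry hello schlepp apple].each do |word|
  puts "#{word}: #{leading_consonant_count(word)}"
end
# Cherry: 2
# hello: 1
# schlepp: 4
# apple: 0
```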

Something along this line would work:
>> 'Cherry'.downcase.split(/([aeiou].*)/).first.length
# => 2
>> 'hello'.downcase.split(/([aeiou].*)/).first.length
# => 1
>> 'schlepp'.downcase.split(/([aeiou].*)/).first.length
# => 4

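Another way is to replace everything from the first vowel to the end of the string with nothing, then take the length (the i flag handles a capital vowel such as the one in 'Apple', and m lets . match newlines too):
'Cherry'.sub(/[aeiou].*/im, "").length #=> 2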

It is not necessary to use a regular expression.
CONSONANTS = (('a'..'z').to_a - 'aeiou'.chars).join
#=> "bcdfghjklmnpqrstvwxyz"
def consecutive_consonants(str)
  e, a = str.each_char.chunk { |c| CONSONANTS.include?(c.downcase) }.first
  # e is true if the first chunk is consonants; e is nil for an empty string
  e ? a.size : 0
end
consecutive_consonants("THIS is nuts") #=> 2
consecutive_consonants("Is this ever cool?") #=> 0
consecutive_consonants("1. this is wrong") #=> 0
Note
enum = "THIS is nuts".each_char.chunk { |c| CONSONANTS.include?(c.downcase) }
#=> #<Enumerator: #<Enumerator::Generator:0x000000010e1a40>:each>
We can see the elements that will be generated by this enumerator by applying Enumerable#entries (or Enumerable#to_a):
enum.entries
#=> [[true, ["T", "H"]], [false, ["I"]], [true, ["S"]], [false, [" ", "i"]],
# [true, ["s"]], [false, [" "]], [true, ["n"]], [false, ["u"]], [true, ["t", "s"]]]
Continuing,
e, a = enum.first
#=> [true, ["T", "H"]]
e #=> true
a #=> ["T", "H"]
a.size
#=> 2
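Since only the first chunk is ever used, Enumerable#take_while gives the same count without building chunks. This is a sketch of that swapped-in technique, not the answer's code; the method name leading_consonants and the constant CONSONANT_CHARS are illustrative.

```ruby
# Illustrative copy of the CONSONANTS string from the answer, under a
# different name so this snippet runs on its own.
CONSONANT_CHARS = (('a'..'z').to_a - 'aeiou'.chars).join

# take_while stops at the first character that is not a consonant, so the
# size of what it returns is the length of the leading consonant run.
def leading_consonants(str)
  str.each_char.take_while { |c| CONSONANT_CHARS.include?(c.downcase) }.size
end

p leading_consonants("THIS is nuts")        # 2
p leading_consonants("Is this ever cool?")  # 0
p leading_consonants("1. this is wrong")    # 0
```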

Related

Print elements of array of arrays of different size in same line in Ruby

Maybe someone could help me with this. I have an array of arrays. The internal arrays have different sizes (from 2 to 4 elements).
letters = [["A", "B"],["C", "D", "F", "G"],["H", "I", "J" ]]
I'm trying to print each inner array on one line, with element[0] and element[1] joined as the first column, element[0], element[1] and element[2] joined as the second column, and element[0], element[1] and element[3] joined as the third column. Elements 2 and 3 do not always exist.
The output I'm trying to get is like this:
AB
CD CDF CDG
HI HIJ
I'm doing it this way, but I'm getting this error:
letters.map{|x| puts x[0]+x[1] + "," + x[0]+x[1]+x[2] + "," + x[0]+x[1]+x[3]}
TypeError: no implicit conversion of nil into String
from (irb):1915:in "+"
from (irb):1915:in "block in irb_binding"
from (irb):1915:in "map"
from (irb):1915
from /usr/bin/irb:11:in "<main>"
letters.each do |a,b,*rest|
puts rest.each_with_object([a+b]) { |s,arr| arr << arr.first + s }.join(' ')
end
prints
AB
CD CDF CDG
HI HIJ
The steps are as follows.
Suppose
letters = [["C", "D", "F", "G"],["H", "I", "J" ]]
Then
enum0 = letters.each
#=> #<Enumerator: [["C", "D", "F", "G"], ["H", "I", "J"]]:each>
The first element of this enumerator is generated and passed to the block, and the three block variables are assigned values.
a, b, *rest = enum0.next
#=> ["C", "D", "F", "G"]
a
#=> "C"
b
#=> "D"
rest
#=> ["F", "G"]
Next, we obtain
enum1 = rest.each_with_object([a+b])
#=> rest.each_with_object(["CD"])
#=> #<Enumerator: ["F", "G"]:each_with_object(["CD"])>
The first element of this enumerator is generated and passed to the block, and the block variables are assigned values.
s, arr = enum1.next
#=> ["F", ["CD"]]
s
#=> "F"
arr
#=> ["CD"]
The block calculation is now performed.
arr << arr.first + s
#=> arr << "CD" + "F"
#=> ["CD", "CDF"]
The second and last element of enum1 is generated and passed to the block, and block variables are assigned values and the block is computed.
s, arr = enum1.next
#=> ["G", ["CD", "CDF"]]
arr << arr.first + s
#=> ["CD", "CDF", "CDG"]
When we attempt to generate another element from enum1, we obtain
enum1.next
#StopIteration: iteration reached an end
Ruby handles the exception by breaking out of the block and returning arr. The elements of arr are then joined:
arr.join(' ')
#=> "CD CDF CDG"
and printed.
The second and last element of enum0 is now generated, passed to the block, and the three block variables are assigned values.
a, b, *rest = enum0.next
#=> ["H", "I", "J"]
a
#=> "H"
b
#=> "I"
rest
#=> ["J"]
The remaining calculations are similar.
Some readers may be unfamiliar with the method Enumerable#each_with_object, which is widely used. Read the doc, but note that here it yields the same result as the code written as follows.
letters.each do |a,b,*rest|
arr = [a+b]
rest.each { |s| arr << arr.first + s }
puts arr.join(' ')
end
By using each_with_object we avoid the need for the statement arr = [a+b] and the separate statement puts arr.join(' '). The functions of those two statements are of course still there in the line using each_with_object, but most Ruby users prefer the flow of chaining each_with_object to join(' '). One other difference is that the variable arr is confined to each_with_object's block, which is good programming practice.
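The chaining works because each_with_object returns the memo object it was given once the iteration finishes. A small sketch using the values from the walkthrough above:

```ruby
# each_with_object passes the same memo object to every iteration and
# returns that object when the iteration is done.
memo = ["CD"]
result = ["F", "G"].each_with_object(memo) { |s, arr| arr << arr.first + s }

p result               # ["CD", "CDF", "CDG"]
p result.equal?(memo)  # true: the very same Array object comes back
puts result.join(' ')  # CD CDF CDG
```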
Looks like you want to join the first two letters, then take the cartesian product with the remaining.
letters.each do |arr|
first = arr.take(2).join
rest = arr.drop(2)
puts [first, [first].product(rest).map(&:join)].join(" ")
end
This provides the exact output you specified.
Just out of curiosity, here is a solution that uses only Enumerable#map.
letters = [["A", "B"],["C", "D", "F", "G"],["H", "I", "J" ]]
letters.map do |f, s, *rest|
rest.unshift(nil).map { |l| [f, s, l].join }.join(' ')
end.each(&method(:puts))
#⇒ AB
# CD CDF CDG
# HI HIJ
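To check the output of the each_with_object answer without reading the console, the same logic can return the lines instead of printing them. A sketch; combination_lines is a hypothetical name, and it assumes every inner array has at least two elements, as in the question.

```ruby
# Hypothetical helper: builds the output lines instead of printing them.
# Each inner array is assumed to have at least two elements.
def combination_lines(letters)
  letters.map do |a, b, *rest|
    # Start with the two-letter prefix, then add prefix + each remaining letter.
    rest.each_with_object([a + b]) { |s, arr| arr << arr.first + s }.join(' ')
  end
end

letters = [["A", "B"], ["C", "D", "F", "G"], ["H", "I", "J"]]
puts combination_lines(letters)
# AB
# CD CDF CDG
# HI HIJ
```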

How do I find the first string of differing case in an array?

I have an array of strings, and all contain at least one letter:
["abc", "FFF", "EEE"]
How do I find the index of the first string that is of a different case than any previous string in the array? The function should give 1 for the above since:
"FFF".eql?("FFF".upcase)
and that condition isn't true for any previous string in the array, whereas:
["P", "P2F", "ccc", "DDD"]
should yield 2 since "ccc" is not capitalized and all its predecessors are.
I know how to find the first string that is capitalized using
string_tokens.find_index { |w| w == w.upcase }
but I can't figure out how to adjust the above to account for differing case.
You could take each consecutive pair of elements and compare their upcase-ness. When they differ, you return the index.
def detect_case_change(ary)
ary.each_cons(2).with_index(1) do |(a, b), idx|
return idx if (a == a.upcase) != (b == b.upcase)
end
nil
end
detect_case_change ["abc", "FFF", "EEE"] # => 1
detect_case_change ["P", "P2F", "ccc", "DDD"] # => 2
This makes some assumptions about your data being composed entirely of 'A'..'Z' and 'a'..'z':
def find_case_mismatch(list)
index = list.each_cons(2).to_a.index do |(a,b)|
a[0].ord & 32 != b[0].ord & 32
end
index && index + 1
end
This compares the character values. 'A' differs from 'a' by one bit, and that bit is always in the same place (0x20).
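The bit claim can be checked directly for the ASCII range the answer assumes; this is a verification sketch, not part of the answer:

```ruby
# Check the claim for every ASCII letter: upper- and lowercase codes
# differ exactly by 0x20 (32).
same_bit = ('A'..'Z').all? { |up| (up.ord ^ up.downcase.ord) == 0x20 }
p same_bit       # true

# That bit is clear for 'A'..'Z' and set for 'a'..'z', which is what `& 32` tests.
p 'F'.ord & 32   # 0
p 'c'.ord & 32   # 32
```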
Enumerable#chunk helps a lot for this task. Its documentation says:
Enumerates over the items, chunking them together based on the return value of the block. Consecutive elements which return the same block value are chunked together.
l1 = ["abc", "FFF", "EEE"]
l2 = ["P", "P2F", "ccc", "DDD"]
p l1.chunk{|s| s == s.upcase }.to_a
# [[false, ["abc"]], [true, ["FFF", "EEE"]]]
p l2.chunk{|s| s == s.upcase }.to_a
# [[true, ["P", "P2F"]], [false, ["ccc"]], [true, ["DDD"]]]
The fact that you need an index makes it a bit less readable, but here's an example. The desired index (if it exists) is the size of the first chunk:
p l1.chunk{|s| s == s.upcase }.first.last.size
# 1
p l2.chunk{|s| s == s.upcase }.first.last.size
# 2
If the case doesn't change at all, it returns the length of the whole array:
p %w(aaa bbb ccc ddd).chunk{|s| s == s.upcase }.first.last.size
# 4
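If nil is preferred when the case never changes (matching the other answers), one option is to look at no more than the first two chunks: a second chunk exists only if the case changes. A sketch; case_change_index is a hypothetical name.

```ruby
# Hypothetical helper name. Takes at most the first two chunks; a second
# chunk exists only if the case actually changes somewhere.
def case_change_index(arr)
  chunks = arr.chunk { |s| s == s.upcase }.first(2)
  chunks.size == 2 ? chunks.first.last.size : nil
end

p case_change_index(["abc", "FFF", "EEE"])      # 1
p case_change_index(["P", "P2F", "ccc", "DDD"]) # 2
p case_change_index(%w(aaa bbb ccc ddd))        # nil
```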
I assume that each element (string) of the array contains at least one letter and only letters of the same case.
def first_case_change_index(arr)
s = arr.map { |s| s[/[[:alpha:]]/] }.join
(s[0] == s[0].upcase ? s.swapcase : s) =~ /[[:upper:]]/
end
first_case_change_index ["abc", "FFF", "EEE"] #=> 1
first_case_change_index ["P", "P2F", "ccc"] #=> 2
first_case_change_index ["P", "P2F", "DDD"] #=> nil
The steps are as follows.
arr = ["P", "2PF", "ccc"]
a = arr.map { |s| s[/[[:alpha:]]/] }
#=> ["P", "P", "c"]
s = a.join
#=> "PPc"
s[0] == s[0].upcase
#=> "P" == "P"
#=> true
t = s.swapcase
#=> "ppC"
t =~ /[[:upper:]]/
#=> 2
Here is another way.
def first_case_change_index(arr)
look_for_upcase = (arr[0] == arr[0].downcase)
arr.each_index.drop(1).find do |i|
(arr[i] == arr[i].upcase) == look_for_upcase
end
end

Most common words in string

I am new to Ruby and trying to write a method that will return an array of the most common word(s) in a string. If there is one word with a high count, that word should be returned. If there are two words tied for the high count, both should be returned in an array.
The problem is that when I pass through the 2nd string, the code only counts "words" twice instead of three times. When the 3rd string is passed through, it returns "it" with a count of 2, which makes no sense, as "it" should have a count of 1.
def most_common(string)
counts = {}
words = string.downcase.tr(",.?!",'').split(' ')
words.uniq.each do |word|
counts[word] = 0
end
words.each do |word|
counts[word] = string.scan(word).count
end
max_quantity = counts.values.max
max_words = counts.select { |k, v| v == max_quantity }.keys
puts max_words
end
most_common('a short list of words with some words') #['words']
most_common('Words in a short, short words, lists of words!') #['words']
most_common('a short list of words with some short words in it') #['words', 'short']
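Your method of counting instances of the word is your problem. "it" is contained in "with", so it's double-counted. Also, scan runs on the original string rather than the downcased one, so "Words" is not counted as "words".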
[1] pry(main)> 'with some words in it'.scan('it')
=> ["it", "it"]
It can be done more easily, though: you can count the occurrences of each value in an array with an each_with_object call, like so:
counts = words.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
This goes through each entry in the array and adds 1 to the value for each word's entry in the hash.
So the following should work for you:
def most_common(string)
words = string.downcase.tr(",.?!",'').split(' ')
counts = words.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
max_quantity = counts.values.max
counts.select { |k, v| v == max_quantity }.keys
end
p most_common('a short list of words with some words') #=> ["words"]
p most_common('Words in a short, short words, lists of words!') #=> ["words"]
p most_common('a short list of words with some short words in it') #=> ["short", "words"]
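On Ruby 2.7 or later, Enumerable#tally builds the same word-count hash as the each_with_object(Hash.new(0)) call in a single step. A sketch assuming Ruby 2.7+; the method name most_common_tally is illustrative.

```ruby
# Requires Ruby 2.7+ for Enumerable#tally.
def most_common_tally(string)
  counts = string.downcase.tr(",.?!", '').split.tally
  max_quantity = counts.values.max
  counts.select { |_word, count| count == max_quantity }.keys
end

p most_common_tally('a short list of words with some words')             # ["words"]
p most_common_tally('Words in a short, short words, lists of words!')    # ["words"]
p most_common_tally('a short list of words with some short words in it') # ["short", "words"]
```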
As Nick has answered your question, I will just suggest another way this can be done. As "high count" is vague, I suggest you return a hash with downcased words and their respective counts. Since Ruby 1.9, hashes retain the order in which key-value pairs were inserted, so we may want to make use of that and return the hash with key-value pairs ordered in decreasing order of values.
Code
def words_by_count(str)
str.gsub(/./) do |c|
case c
when /\w/ then c.downcase
when /\s/ then c
else ''
end
end.split
.group_by {|w| w}
.map {|k,v| [k,v.size]}
.sort_by(&:last)
.reverse
.to_h
end
words_by_count('Words in a short, short words, lists of words!')
The method Array#to_h was introduced in Ruby 2.1. For earlier Ruby versions, one must use:
Hash[str.gsub(/./)... .reverse]
Example
words_by_count('a short list of words with some words')
#=> {"words"=>2, "of"=>1, "some"=>1, "with"=>1,
# "list"=>1, "short"=>1, "a"=>1}
words_by_count('Words in a short, short words, lists of words!')
#=> {"words"=>3, "short"=>2, "lists"=>1, "a"=>1, "in"=>1, "of"=>1}
words_by_count('a short list of words with some short words in it')
#=> {"words"=>2, "short"=>2, "it"=>1, "with"=>1,
# "some"=>1, "of"=>1, "list"=>1, "in"=>1, "a"=>1}
Explanation
Here is what's happening in the second example, where:
str = 'Words in a short, short words, lists of words!'
str.gsub(/./) do |c|... matches each character in the string and sends it to the block to decide what to do with it. As you see, word characters are downcased, whitespace is left alone and everything else is deleted (replaced by an empty string).
s = str.gsub(/./) do |c|
case c
when /\w/ then c.downcase
when /\s/ then c
else ''
end
end
#=> "words in a short short words lists of words"
This is followed by
a = s.split
#=> ["words", "in", "a", "short", "short", "words", "lists", "of", "words"]
h = a.group_by {|w| w}
#=> {"words"=>["words", "words", "words"], "in"=>["in"], "a"=>["a"],
# "short"=>["short", "short"], "lists"=>["lists"], "of"=>["of"]}
b = h.map {|k,v| [k,v.size]}
#=> [["words", 3], ["in", 1], ["a", 1], ["short", 2], ["lists", 1], ["of", 1]]
c = b.sort_by(&:last)
#=> [["of", 1], ["in", 1], ["a", 1], ["lists", 1], ["short", 2], ["words", 3]]
d = c.reverse
#=> [["words", 3], ["short", 2], ["lists", 1], ["a", 1], ["in", 1], ["of", 1]]
d.to_h # or Hash[d]
#=> {"words"=>3, "short"=>2, "lists"=>1, "a"=>1, "in"=>1, "of"=>1}
Note that c = b.sort_by(&:last), d = c.reverse can be replaced by:
d = b.sort_by { |_,k| -k }
#=> [["words", 3], ["short", 2], ["a", 1], ["in", 1], ["lists", 1], ["of", 1]]
but sort followed by reverse is generally faster.
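Because sort_by is not guaranteed to be stable, words with equal counts may come out in any order. If ties should keep the order in which the words first appear, one option is to sort by [-count, first_index]. A sketch of that variant, with two assumptions: the name words_by_count_stable is illustrative, and words are taken as runs of \w characters (so an apostrophe splits a word, unlike the gsub version above).

```ruby
# Illustrative variant. Assumption: a "word" is a run of \w characters,
# so punctuation (including apostrophes) separates words.
def words_by_count_stable(str)
  words = str.downcase.scan(/\w+/)
  counts = words.each_with_object(Hash.new(0)) { |w, h| h[w] += 1 }
  # Hashes keep insertion order, so a pair's index is the position at which
  # its word first appeared; using it as a tie-breaker keeps ties in that order.
  counts.to_a
        .each_with_index
        .sort_by { |(_word, count), i| [-count, i] }
        .map(&:first)
        .to_h
end

p words_by_count_stable('Words in a short, short words, lists of words!').keys
# ["words", "short", "in", "a", "lists", "of"]
```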
def count_words string
word_list = Hash.new(0)
words = string.downcase.delete(',.?!').split
words.each { |word| word_list[word] += 1 }
word_list
end
def most_common_words string
hash = count_words string
max_value = hash.values.max
hash.select { |k, v| v == max_value }.keys
end
most_common_words 'a short list of words with some words'
#=> ["words"]
most_common_words 'Words in a short, short words, lists of words!'
#=> ["words"]
most_common_words 'a short list of words with some short words in it'
#=> ["short", "words"]
Assuming string is a string containing multiple words.
words = string.split(/[.!?,\s]+/)
words.sort_by{|x|words.count(x)}
Here we split the string into words and put them in an array (the + keeps runs of punctuation and spaces, such as ", ", from producing empty strings). We then sort the array by how often each word occurs in it, so the most common words appear at the end.
The same thing can be done in the following way too:
def most_common(string)
counts = Hash.new 0
string.downcase.tr(",.?!",'').split(' ').each{|word| counts[word] += 1}
# For "Words in a short, short words, lists of words!"
# counts ---> {"words"=>3, "in"=>1, "a"=>1, "short"=>2, "lists"=>1, "of"=>1}
max_value = counts.values.max
#max_value ---> 3
counts.select { |key, value| value == max_value }
#returns ---> {"words"=>3}
end
This is just a shorter solution, which you might want to use. Hope it helps :)
This is the kind of question programmers love, isn't it :) How about a functional approach?
# returns array of words after removing certain English punctuations
def english_words(str)
str.downcase.delete(',.?!').split
end
# returns hash mapping element to count
def element_counts(ary)
ary.group_by { |e| e }.inject({}) { |a, e| a.merge(e[0] => e[1].size) }
end
def most_common(ary)
ary.empty? ? nil :
element_counts(ary)
.group_by { |k, v| v }
.sort
.last[1]
.map(&:first)
end
most_common(english_words('a short list of words with some short words in it'))
#=> ["short", "words"]
def firstRepeatedWord(string)
h_data = Hash.new(0)
string.split(" ").each{|x| h_data[x] +=1}
h_data.key(h_data.values.max)
end
def common(string)
counts=Hash.new(0)
words=string.downcase.delete('.,!?').split(" ")
words.each {|k| counts[k]+=1}
p counts.max_by { |_word, count| count } # [word, count] pair with the highest count
end

Check if line contains specific characters in Ruby

How can I check whether a string contains only parentheses, commas, numbers, and any combination of "urld" separated with a string? It should always be in the form below:
(0,3),u,r,u,l,u #True
(0,3),u,r,u,l u #False because of space b/w l and u
I tried something like this, but it seems I'm off:
if !misc_text.match(/^(\d+,\d+\)(,[udlr])+)/)
The examples suggest that not all of the letters 'u', 'r', 'l', 'd' need be present. I have assumed that the string may contain any number of each of these four letters (in addition to the other permitted characters). I have also assumed that 'any combination of "urld" separated with a string' means that each pair of these four characters must be separated by one or more of the other permitted characters. This is one way to accomplish that:
def check_it(s)
(s.chars.chunk {|c| c =~ /[urld]/}.to_a.size == s.count('urld') &&
s =~ /^[(),\durld]*$/) ? true : false
end
check_it('(0,3),u,r,u,l,u') #=> true
check_it('(0,3),u,r,u,l u') #=> false
check_it('(0,3),u,r,u,lu') #=> false
Suppose
s = '(0,3),u,r,u,lu'
Then
a = s.chars.chunk {|c| c =~ /[urld]/}.to_a
#=> [[0, ["u"]], [0, ["r"]], [0, ["u"]], [0, ["l", "u"]]]
a.size #=> 4
s.count('urld') #=> 5
As a.size < s.count('urld'), check_it() returns false.
If instead:
s = '(0,3),u,r,u,l u'
then
a = s.chars.chunk {|c| c =~ /[urld]/}.to_a
#=> [[0, ["u"]], [0, ["r"]], [0, ["u"]], [0, ["l"]], [0, ["u"]]]
a.size #=> 5
a.size == s.count('urld') #=> true
but
s =~ /^[(),\durld]*$/ #=> nil
so check_it() => false.
Perhaps you need this regex: /^(\(\d+,\d+\)(,[udlr])+)$/.
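A sketch of that regex in use. It swaps ^ and $ for \A and \z (an adjustment, not part of the suggestion above), because ^ and $ only anchor to line boundaries and a multi-line string could otherwise slip through. The method name valid_moves? is hypothetical, and String#match? requires Ruby 2.4+.

```ruby
# Anchored with \A and \z so the whole string must match.
# Requires Ruby 2.4+ for String#match?.
VALID_MOVES = /\A\(\d+,\d+\)(,[udlr])+\z/

def valid_moves?(s)
  s.match?(VALID_MOVES)
end

p valid_moves?('(0,3),u,r,u,l,u')  # true
p valid_moves?('(0,3),u,r,u,l u')  # false
p valid_moves?('(0,3),u,r,u,lu')   # false
```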
You can do it as follows, mixing regexp and array operations:
class String
def catch str
str.split('').all? {|l| /,#{l}(,|\b)/ =~ self }
end
end
so:
"(0,3),u,r,u,l,d".catch('ulrd') # => true
"(0,3),u,r,u,l d".catch('ulrd') # => false

How to create two separate arrays from one input?

DESCRIPTION:
The purpose of my code is to take as input a sequence of R's and C's and store each number that follows a letter in the corresponding array.
For example, the input format is as follows: R1C4R2C5
Column Array: [ 4, 5 ] Row Array: [1,2]
My problem is I am getting the output like this:
[" ", 1]
[" ", 4]
[" ", 2]
[" ", 5]
How do I get all the row integers following R in one array, and all the column integers following C in another, separate array? I do not want to create multiple arrays, just two.
Help!
CODE:
puts 'Please input: '
input = gets.chomp
word2 = input.scan(/.{1,2}/)
col = []
row = []
word2.each {|a| col.push(a.split(/C/)) if a.include? 'C' }
word2.each {|a| row.push(a.split(/R/)) if a.include? 'R' }
col.each do |num|
puts num.inspect
end
row.each do |num|
puts num.inspect
end
x = "R1C4R2C5"
col = []
row = []
x.chars.each_slice(2) { |u| u[0] == "R" ? row << u[1] : col << u[1] }
p col
p row
The main problem with your code is that you replicate operations for rows and columns. You want to write "DRY" code, which stands for "don't repeat yourself".
Starting with your code as the model, you can DRY it out by writing a method like this to extract the information you want from the input string, and invoke it once for rows and once for columns:
def doit(s, c)
...
end
Here s is the input string and c is the string "R" or "C". Within the method you want to extract substrings that begin with the value of c and are followed by digits. Your decision to use String#scan was a good one, but you need a different regex:
def doit(s, c)
s.scan(/#{c}\d+/)
end
I'll explain the regex, but let's first try the method. Suppose the string is:
s = "R1C4R2C5"
Then
rows = doit(s, "R") #=> ["R1", "R2"]
cols = doit(s, "C") #=> ["C4", "C5"]
This is not quite what you want, but it is easily fixed. First, though, the regex. The regex first looks for the character #{c}. #{c} interpolates the value of the variable c into the regex, which in this case will be "R" or "C". \d+ means that character must be followed by one or more digits 0-9, as many as are present before the next non-digit (here an "R" or "C") or the end of the string.
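To see what the interpolated pattern looks like, and how it behaves on a multi-digit number, here is a small sketch. The Regexp.escape line goes beyond the answer: it only matters if c could ever contain regex metacharacters, which "R" and "C" do not.

```ruby
# Interpolating c builds the pattern at run time; for c = "R" this is R\d+.
c = "R"
re = /#{c}\d+/
p re.source                   # "R\\d+"
p "R1C4R12C5".scan(re)        # ["R1", "R12"]

# If c could contain regex metacharacters, escape it before interpolating.
plus = /#{Regexp.escape("+")}\d+/
p "+42".match?(plus)          # true
```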
Now let's fix the method:
def doit(s, c)
a = s.scan(/#{c}\d+/)
b = a.map {|str| str[1..-1]}
b.map(&:to_i)
end
rows = doit(s, "R") #=> [1, 2]
cols = doit(s, "C") #=> [4, 5]
Success! As before, a => ["R1", "R2"] if c => "R" and a => ["C4", "C5"] if c => "C". a.map {|str| str[1..-1]} maps each element of a to a string comprised of all characters but the first (e.g., "R12"[1..-1] => "12"), so we have b => ["1", "2"] or b => ["4", "5"]. We then apply map once again to convert those strings to their Integer equivalents (instances of Fixnum in Ruby versions before 2.4). The expression b.map(&:to_i) is shorthand for
b.map {|str| str.to_i}
The last computed quantity is returned by the method, so if it is what you want, as it is here, there is no need for a return statement at the end.
This can be simplified, however, in a couple of ways. Firstly, we can combine the last two statements by dropping the last one and changing the one above to:
a.map {|str| str[1..-1].to_i}
which also gets rid of the local variable b. The second improvement is to "chain" the two remaining statements, which also rids us of the other temporary variable:
def doit(s, c)
s.scan(/#{c}\d+/).map { |str| str[1..-1].to_i }
end
This is typical Ruby code.
Notice that by doing it this way, there is no requirement for row and column references in the string to alternate, and the numeric values can have arbitrary numbers of digits.
Here's another way to do the same thing, that some may see as being more Ruby-like:
s.scan(/[RC]\d+/).each_with_object([[],[]]) {|n,(r,c)|
(n[0]=='R' ? r : c) << n[1..-1].to_i}
Here's what's happening. Suppose:
s = "R1C4R2C5R32R4C7R18C6C12"
Then
a = s.scan(/[RC]\d+/)
#=> ["R1", "C4", "R2", "C5", "R32", "R4", "C7", "R18", "C6", "C12"]
scan uses the regex /[RC]\d+/ to extract substrings that begin with 'R' or 'C' followed by one or more digits, up to the next letter or the end of the string.
b = a.each_with_object([[],[]]) {|n,(r,c)|(n[0]=='R' ? r : c) << n[1..-1].to_i}
#=> [[1, 2, 32, 4, 18], [4, 5, 7, 6, 12]]
The row values are given by [1, 2, 32, 4, 18]; the column values by [4, 5, 7, 6, 12].
Enumerable#each_with_object (Ruby 1.9+) is passed an array comprised of two empty arrays, [[],[]], as its memo object. The first subarray will contain the row values, the second the column values. These two subarrays are represented by the block variables r and c, respectively.
The first element of a is "R1". This is represented in the block by the variable n. Since
"R1"[0] #=> "R"
"R1"[1..-1] #=> "1"
we execute
r << "1".to_i #=> [1]
so now
[r,c] #=> [[1],[]]
The next element of a is "C4", so we will execute:
c << "4".to_i #=> [4]
so now
[r,c] #=> [[1],[4]]
and so on.
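A variant that lets scan capture the letter and the digits separately, so no string slicing is needed. This is a sketch; rows_and_cols is a hypothetical name, and like the answer above it does not require rows and columns to alternate.

```ruby
# Hypothetical helper name.
def rows_and_cols(s)
  rows, cols = [], []
  # With capture groups, scan returns [letter, digits] pairs,
  # e.g. [["R", "1"], ["C", "4"], ...].
  s.scan(/([RC])(\d+)/).each do |letter, digits|
    (letter == 'R' ? rows : cols) << digits.to_i
  end
  [rows, cols]
end

rows, cols = rows_and_cols("R1C4R2C5R32R4C7R18C6C12")
p rows # [1, 2, 32, 4, 18]
p cols # [4, 5, 7, 6, 12]
```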
rows, cols = "R1C4R2C5".scan(/R(\d+)C(\d+)/).flatten.partition.with_index {|_, index| index.even? }
> rows
=> ["1", "2"]
> cols
=> ["4", "5"]
Or
rows = "R1C4R2C5".scan(/R(\d+)/).flatten
=> ["1", "2"]
cols = "R1C4R2C5".scan(/C(\d+)/).flatten
=> ["4", "5"]
And to fix your code use:
word2.each {|a| col.push(a.delete('C')) if a.include? 'C' }
word2.each {|a| row.push(a.delete('R')) if a.include? 'R' }
