From the documentation for String#count I understand the first example, but I do not understand the rest of the examples:
a = "hello world"
a.count "lo" #=> 5
a.count "lo", "o" #=> 2
a.count "hello", "^l" #=> 4
a.count "ej-m" #=> 4
Any explanation will be helpful.
This is one of the dorkiest ruby methods, and pretty lousy documentation to boot. Threw me for a loop. I ended up looking at it because it looked like it should give me the count of occurrences of a given string. Nope. Not remotely close. But here is how I ended up counting string occurrences:
s="this is a string with is thrice"
s.scan(/is/).count # => 3
Makes me wonder why someone asked for this method, and why the documentation is so lousy. Almost like the person documenting the code really did not have a clue as to the human-understandable "business" reason for asking for this feature.
count([other_str]+) → fixnum
Each _other_str_ parameter defines a set of characters to count. The
intersection of these sets defines the characters to count in str. Any
_other_str_ that starts with a caret (^) is negated. The sequence c1–c2
means all characters between c1 and c2.
If you pass more than 1 parameter to count, it will use the intersection of those strings and will use that as the search target:
a = "hello world"
a.count "lo" #=> finds 5 instances of either "l" or "o"
a.count "lo", "o" #=> the intersection of "lo" and "o" is "o", so it finds 2 instances
a.count "hello", "^l" #=> the intersection of "hello" and "everything that is not 'l'" finds 4 instances of either "h", "e" or "o"
a.count "ej-m" #=> finds 4 instances of "e", "j", "k", "l" or "m" (the "j-m" part)
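The intersection rule can be imitated with plain arrays. This is only a sketch of the idea, not how String#count is implemented; count_in_all is a hypothetical helper, and not_l stands in for "^l" restricted to the characters that actually occur in a.

```ruby
a = "hello world"

# Hypothetical helper: counts the characters of str that belong to every given set.
def count_in_all(str, *sets)
  wanted = sets.reduce(:&) # Array#& is set intersection
  str.each_char.count { |c| wanted.include?(c) }
end

lo    = %w[l o]
o     = %w[o]
hello = "hello".chars.uniq
not_l = a.chars.uniq - ["l"] # stand-in for "^l", limited to characters present in a

one_set   = count_in_all(a, lo)           # mirrors a.count "lo"
two_sets  = count_in_all(a, lo, o)        # mirrors a.count "lo", "o"
with_caret = count_in_all(a, hello, not_l) # mirrors a.count "hello", "^l"
```

Each result agrees with the corresponding a.count call from the documentation (5, 2 and 4).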
Let's break these down
a = "hello world"
to count the number of occurrences of the letters l and o
a.count "lo" #=> 5
to find the intersection of lo and o (the only character in both sets is o, so only occurrences of o are counted):
a.count "lo", "o" #=> 2
to count the number of occurrences of h, e, l, l and o, then intersect with any that are not l (which produces the same outcome as finding occurrences of h, e and o):
a.count "hello", "^l" #=> 4
to count the number of occurrences of e and any letter between j and m (j, k, l and m):
a.count "ej-m" #=> 4
Each argument defines a set of characters. The intersection of those sets determines the overall set that count uses to compute a tally.
a = "hello world"
a.count "lo" # l o => 5
a.count "lo", "o" # o => 2
And ^ can be used for negation (all letters in hello, except l)
a.count "hello", "^l" # h e o => 4
Ranges can be defined with -:
a.count "ej-m" # e j k l m => 4
Using gsub in string
a = "hello world hello hello hello hello world world world"
2.1.5 :195 > a.gsub('hello').count
=> 5
I'll take a stab:
Second example: using the wording "The intersection of these sets defines the characters to count in str" the parameters are "lo" and "o". The intersection of these is just "o" of which there are 2 in the string being counted on. Hence the return value of 2.
Third example: This one seems to be saying "any of the characters in 'hello' but not the character 'l'", which follows from the line "Any other_str that starts with a caret (^) is negated". So you can count the letters of "hello world" that belong to the set "hello" (i.e., h,e,l,l,o,o,l), but after intersecting with the set "^l" (which in this string covers h,e,o,space,w,o,r,d) you are left with 4 (i.e., h,e,o,o).
Fourth example: This one basically says "count all the 'e' characters and any character between 'j' and 'm'". There is one 'e' and 3 characters between 'j' and 'm', all 3 of which happen to be the letter 'l', which leaves us with an answer of 4 again.
Related
In every other example I see, str.count is quite simple. It just counts the instances of the parameter in the string. But the example of the method given in the Ruby manual seems incomprehensible (see below). It doesn't even use parentheses! Could anyone help elucidate this for me?
a = "hello world"
a.count "lo" » 5
a.count "lo", "o" » 2
a.count "hello", "^l" » 4
a.count "ej-m" » 4
It's counting the number of occurrences of the letters you passed in as an argument:
a.count("lo") # 5, counts [l, o]
hello world
  ***  * *
# counts all [h, e, o], but not "l" because of the caret
a.count "hello", "^l" # 4
hello world
**  *  *
a.count "ej-m" # counts e, and the characters in range j, k, l, m
hello world
 ***     *
There's a couple special characters:
caret ^ is negated.
The - means a range
The \ escapes the other two and is ignored
I have a hard time understanding the following code segment from the Ruby docs:
a = "hello world"
a.count "lo" #=> 5
a.count "lo", "o" #=> 2
a.count "hello", "^l" #=> 4
a.count "ej-m" #=> 4
"hello^world".count "\\^aeiou" #=> 4
"hello-world".count "a\\-eo" #=> 4
especially this code a.count "ej-m". Can anyone please explain how it works?
Just imagine the "pattern" strings as wrapped by [ and ] from regex syntax, that are matched against each character.
So, if we break a = "hello world" into characters:
[1] pry(main)> a = "hello world"
=> "hello world"
[2] pry(main)> a.split('')
=> ["h", "e", "l", "l", "o", " ", "w", "o", "r", "l", "d"]
And convert "ej-m" to regex wrapped with [ and ] we get /[ej-m]/ - which means either 'e' or any character from 'j' to 'm'(including both):
[3] pry(main)> a.split('').select{|c| c=~ /[ej-m]/}
=> ["e", "l", "l", "l"]
We got 4 matches, which is also the result you get. Essentially a.count "ej-m" is equivalent to:
[4] pry(main)> a.split('').count{|c| c=~ /[ej-m]/}
=> 4
Multiple arguments to the method are simply combined with a logical and: a character is counted only if it matches every pattern:
[5] pry(main)> a.split('').count{|c| c =~ /[hello]/ and c =~ /[^l]/}
=> 4
The sequence c1-c2 means all characters between c1 and c2.
So you are providing a range, basically it counts which characters are in that range (>= c1 && <= c2)
i.e:
a = "hello world"
a.count "a-z"
=> 10
a.count "o-w"
=> 4 #(o, o, r, w)
a.count "e-l"
=> 5 #(h, e, l, l, l)
We find that
"hello world".count("ej-m") #=> 4 (_ell_____l_)
Examine the doc for String#count carefully.
Here is how count might be implemented to deal with patterns that closely resemble the pattern "ej-m".
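def count_letters(str, pattern)
  pattern = pattern.dup # work on a copy so the caller's string is not mutated
  idx = pattern[1..-2].index('-') # look for a hyphen that is neither first nor last
  if idx
    idx += 1 # convert to an index into the full pattern
    before, after = pattern[idx-1], pattern[idx+1]
    pattern[idx-1..idx+1] = (before..after).to_a.join # expand "j-m" to "jklm"
  end
  str.each_char.sum { |c| pattern.include?(c) ? 1 : 0 }
end
count_letters("hello world", "ej-m") #=> 4 (_ell_____l_)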
However, String#count must also do the following.
Allow for multiple ranges in the pattern
"hello1world".count("e0-9j-mv-x") #=> 6 (_ell_1w__l_)
If the pattern begins with the character '^', count the number of characters that do not match the remainder of the pattern
"hello world".count("^ej-m") #=> 7 (h___o*wor_d) * = space to count
"hello^world".count("e^j-m") #=> 5 (_ell_^___l_)
"hello world".count("\^ej-m") #=> 7 (h___o*wor_d) * = space to count
Note that escaping '^' at the beginning of the string makes no difference.
Match a hyphen
"hello-world".count("ej-m-") #=> 5 (_ell_-___l_)
"hello-world".count("-ej-m") #=> 5 (_ell_-___l_)
"hello-world".count("ej\-m") #=> 4 (_ell_____l_)
Note that escaping a hyphen that is not the first or last character of the pattern makes no difference.
Match a backslash
'hello\world'.count("ej-m\\") #=> 5 (_ell_\___l_)
'hello\world'.count("\\ej-m") #=> 4 (_ell_____l_)
Note that a backslash at the beginning of a string is disregarded.
Some of the above results (Ruby v2.4) do not seem to be consistent with the documentation.
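One likely reason for part of the surprise is Ruby's string-literal handling rather than count itself: in a double-quoted literal, an unrecognized escape such as "\^" or "\-" collapses to the bare character, so count never sees a backslash. The sketch below shows this next to the doubled-backslash forms taken from the documentation examples; the variable names are only for illustration.

```ruby
# "\^" and "\-" are not recognized escapes, so the backslash is dropped
# when the literal is parsed, before count is ever called.
caret_same  = ("\^ej-m" == "^ej-m")
hyphen_same = ("ej\-m" == "ej-m")

# Doubling the backslash passes a real backslash through to count,
# which then treats the following ^ or - as an ordinary character.
escaped_caret  = "hello^world".count("\\^aeiou") # pattern seen by count: \^aeiou
escaped_hyphen = "hello-world".count("a\\-eo")   # pattern seen by count: a\-eo
```

Both comparisons are true, and both counts are 4, matching the values given in the documentation.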
I want to find the binary gap using Ruby regex
Say the input is 1000001001010011100000000000. Scanning from the left, I want the regex to match:
A. 1000001 should return 00000
B. 1001 should return 00
C. 101 should return 0
D. 1001 should return 00
My first attempt looks like this, but it misses B and D
Update
A binary gap within a positive integer N is any maximal sequence of consecutive zeros that is surrounded by ones at both ends in the binary representation of N.
I think what you are looking for is:
/1(0+)(?=1)/
The problem with your pattern is that you consume the "closing 1". As a consequence, the next search starts after this "closing 1".
But if you use a lookahead (a zero-width assertion that doesn't consume characters and only tests what follows), the "closing 1" isn't consumed and you get the desired result, because the next search starts right after the last zero.
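The difference is easy to see side by side. In this sketch /1(0+)1/ is only an assumed stand-in for a pattern that consumes the closing 1 (the asker's actual pattern is not shown); the second pattern is the lookahead version.

```ruby
str = "1000001001010011100000000000"

# Assumed stand-in: the closing 1 is consumed, so it cannot open the next gap.
consuming = str.scan(/1(0+)1/).flatten

# Lookahead: the closing 1 is only tested, so it can also open the next gap.
lookahead = str.scan(/1(0+)(?=1)/).flatten
```

Here consuming is ["00000", "0"] (gaps B and D are skipped), while lookahead is ["00000", "00", "0", "00"].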
Note that if you don't need the zeros to be enclosed between ones, you can also simply use: /0+/
Other way: if you are sure that the string only contains 1s and 0s, you can also use the (non-)word-boundary assertion \B with this pattern: 1\K0++\B
R = /
(?= # start a positive lookahead
1 # match a one
(0+) # match one or more zeros in capture group 1
1 # match a one
) # end positive lookahead
/x # free-spacing regex definition mode
str = "1000001001010011100000000000"
arr = []
str.scan(R) { |m| arr << [m.first, Regexp.last_match.begin(0)+1] }
arr
#=> [["00000", 1], ["00", 7], ["0", 10], ["00", 12]]
The elements of arr correspond to all substrings of one or more "0"'s of str that are preceded and followed by 1. The first element of each pair is the substring, the second is the offset into str where the substring begins.
Here's a second example.
str = "10011001010101001110001000100101"
arr = []
str.scan(R) { |m| arr << [m.first, Regexp.last_match.begin(0)+1] }
arr
#=> [["00", 1], ["00", 5], ["0", 8], ["0", 10], ["0", 12], ["00", 14],
# ["000", 19], ["000", 23], ["00", 27], ["0", 30]]
Note that one must use a positive lookahead, rather than a positive lookbehind, as (in Ruby) the latter does not permit variable-length strings (i.e., 0+).
@Stefan, in a comment, suggested an improvement:
R = /
(?<=1) # match a one in a positive lookbehind
0+ # match one or more zeros
(?=1) # match a one in a positive lookahead
/x # free-spacing regex definition mode
str = "1000001001010011100000000000"
arr = []
str.scan(R) { |m| arr << [m, Regexp.last_match.begin(0)] }
arr
#=> [["00000", 1], ["00", 7], ["0", 10], ["00", 12]]
This is similar to what @Casimir suggests (/1(0+)(?=1)/), except that by putting the first 1 in a positive lookbehind there's no need for the capture group.
Here is another way that does not use a regex.
str = "1000001001010011100000000000"
(0..str.size-3).each_with_object([]) do |i,a|
next if str[i] == '0' || str[i+1] == '1'
ndx = str[i+2..-1].index('1')
a << [str[i+1, 1+ndx], i+1] if ndx
end
#=> [["00000", 1], ["00", 7], ["0", 10], ["00", 12]]
In order to get only the zeroes in between ones, you need to use regex lookbehind and lookahead:
(?<=1)0+(?=1)
After that you only need to get the max length element.
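A minimal sketch of that last step, using a lookbehind/lookahead pattern with String#scan and Enumerable#max_by. The variable names are illustrative, and treating "no gap" as length 0 is an assumption about what the caller wants.

```ruby
str = "1000001001010011100000000000"

gaps = str.scan(/(?<=1)0+(?=1)/)  # every run of zeros enclosed by ones
longest = gaps.max_by(&:length)   # nil when there is no gap at all
longest_length = longest ? longest.length : 0
```

For this input gaps is ["00000", "00", "0", "00"], so the longest gap is "00000" with length 5. The trailing zeros are ignored because no 1 follows them.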
DESCRIPTION:
The purpose of my code is to take as input a sequence of R's and C's and to store each number that comes after the character in its proper array.
For example, the input format is as follows: R1C4R2C5
Column Array: [4, 5] Row Array: [1, 2]
My problem is I am getting the output like this:
[" ", 1]
[" ", 4]
[" ", 2]
[" ", 5]
How do I get all the Row integers following R into one array, and all the Column integers following C into another, separate array? I do not want to create multiple arrays, just two.
Help!
CODE:
puts 'Please input: '
input = gets.chomp
word2 = input.scan(/.{1,2}/)
col = []
row = []
word2.each {|a| col.push(a.split(/C/)) if a.include? 'C' }
word2.each {|a| row.push(a.split(/R/)) if a.include? 'R' }
col.each do |num|
puts num.inspect
end
row.each do |num|
puts num.inspect
end
x = "R1C4R2C5"
col = []
row = []
x.chars.each_slice(2) { |u| u[0] == "R" ? row << u[1] : col << u[1] }
p col
p row
The main problem with your code is that you replicate operations for rows and columns. You want to write "DRY" code, which stands for "don't repeat yourself".
Starting with your code as the model, you can DRY it out by writing a method like this to extract the information you want from the input string, and invoke it once for rows and once for columns:
def doit(s, c)
...
end
Here s is the input string and c is the string "R" or "C". Within the method you want
to extract substrings that begin with the value of c and are followed by digits. Your decision to use String#scan was a good one, but you need a different regex:
def doit(s, c)
s.scan(/#{c}\d+/)
end
I'll explain the regex, but let's first try the method. Suppose the string is:
s = "R1C4R2C5"
Then
rows = doit(s, "R") #=> ["R1", "R2"]
cols = doit(s, "C") #=> ["C4", "C5"]
This is not quite what you want, but easily fixed. First, though, the regex. The regex first looks for a character #{c}. #{c} interpolates the value of the variable c into the regex as a literal character, which in this case will be "R" or "C". \d+ means the character #{c} must be followed by one or more digits 0-9, as many as are present before the next non-digit (here a "R" or "C") or the end of the string.
Now let's fix the method:
def doit(s, c)
a = s.scan(/#{c}\d+/)
b = a.map {|str| str[1..-1]}
b.map(&:to_i)
end
rows = doit(s, "R") #=> [1, 2]
cols = doit(s, "C") #=> [4, 5]
Success! As before, a => ["R1", "R2"] if c => "R" and a => ["C4", "C5"] if c => "C". a.map {|str| str[1..-1]} maps each element of a into a string comprised of all characters but the first (e.g., "R12"[1..-1] => "12"), so we have b => ["1", "2"] or b => ["4", "5"]. We then apply map once again to convert those strings to their Integer equivalents (Fixnum in older Rubies). The expression b.map(&:to_i) is shorthand for
b.map {|str| str.to_i}
The last computed quantity is returned by the method, so if it is what you want, as it is here, there is no need for a return statement at the end.
This can be simplified, however, in a couple of ways. Firstly, we can combine the last two statements by dropping the last one and changing the one above to:
a.map {|str| str[1..-1].to_i}
which also gets rid of the local variable b. The second improvement is to "chain" the two remaining statements, which also rids us of the other temporary variable:
def doit(s, c)
s.scan(/#{c}\d+/).map { |str| str[1..-1].to_i }
end
This is typical Ruby code.
Notice that by doing it this way, there is no requirement for row and column references in the string to alternate, and the numeric values can have arbitrary numbers of digits.
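A quick check of that claim, reusing the chained version of doit from above on a made-up input in which rows and columns do not alternate and some values have several digits:

```ruby
def doit(s, c)
  s.scan(/#{c}\d+/).map { |str| str[1..-1].to_i }
end

s = "R12R3C45R7" # made-up input: two rows in a row, multi-digit values

rows = doit(s, "R")
cols = doit(s, "C")
```

This gives rows => [12, 3, 7] and cols => [45].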
Here's another way to do the same thing, that some may see as being more Ruby-like:
s.scan(/[RC]\d+/).each_with_object([[],[]]) {|n,(r,c)|
(n[0]=='R' ? r : c) << n[1..-1].to_i}
Here's what's happening. Suppose:
s = "R1C4R2C5R32R4C7R18C6C12"
Then
a = s.scan(/[RC]\d+/)
#=> ["R1", "C4", "R2", "C5", "R32", "R4", "C7", "R18", "C6", "C12"]
scan uses the regex /[RC]\d+/ to extract substrings that begin with 'R' or 'C' followed by one or more digits up to the next letter or end of the string.
b = a.each_with_object([[],[]]) {|n,(r,c)|(n[0]=='R' ? r : c) << n[1..-1].to_i}
#=> [[1, 2, 32, 4, 18], [4, 5, 7, 6, 12]]
The row values are given by [1, 2, 32, 4, 18]; the column values by [4, 5, 7, 6, 12].
Enumerable#each_with_object (v1.9+) is passed an array comprised of two empty arrays, [[],[]], which it hands to the block on every iteration and returns at the end. The first subarray will contain the row values, the second, the column values. These two subarrays are represented by the block variables r and c, respectively.
The first element of a is "R1". This is represented in the block by the variable n. Since
"R1"[0] #=> "R"
"R1"[1..-1] #=> "1"
we execute
r << "1".to_i #=> [1]
so now
[r,c] #=> [[1],[]]
The next element of a is "C4", so we will execute:
c << "4".to_i #=> [4]
so now
[r,c] #=> [[1],[4]]
and so on.
rows, cols = "R1C4R2C5".scan(/R(\d+)C(\d+)/).flatten.partition.with_index {|_, index| index.even? }
> rows
=> ["1", "2"]
> cols
=> ["4", "5"]
Or
rows = "R1C4R2C5".scan(/R(\d+)/).flatten
=> ["1", "2"]
cols = "R1C4R2C5".scan(/C(\d+)/).flatten
=> ["4", "5"]
And to fix your code use:
word2.each {|a| col.push(a.delete('C')) if a.include? 'C' }
word2.each {|a| row.push(a.delete('R')) if a.include? 'R' }
m = /(.)(.)(\d+)(\d)/.match("THX1138.")
puts m[0] #=> HX1138
c = m.captures #=> ["H", "X", "113", "8"]
puts c[0] #=> H
puts m.begin(0) #=> 1
puts c[1] #=> X
puts m.begin(1) #=> 1
puts c[2] #=> 113
puts m.begin(2) #=> 2
I was expecting m.begin(1) to return 2 since X is two elements after the beginning of string.
I am reading the book The Well-Grounded Rubyist, which says
To get the information for capture n,
you provide n as the argument to begin
and/or end.
Similarly, I was expecting m.begin(2) to return 3.
Read carefully:
Returns the offset of the start of the nth element of the match array in the string.
So the match array is actually [HX1138,H,X,113,8]
So:
m.begin(0) => offset of HX1138 => 1 in "THX1138"
m.begin(1) => offset of H => 1 in "THX1138"
m.begin(2) => offset of X => 2 in "THX1138"
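The off-by-one comes from the two numbering schemes: m.captures starts at the first group, while m.begin(n) uses the same numbering as m[n], where 0 is the whole match. A small sketch that lists both side by side (the names all and offsets are just for illustration):

```ruby
m = /(.)(.)(\d+)(\d)/.match("THX1138.")

all = m.to_a                                  # whole match first, then each capture
offsets = (0...m.size).map { |n| m.begin(n) } # start offset of m[n] in the string

# captures[i] is the same element as m[i + 1], so its offset is m.begin(i + 1)
x_capture = m.captures[1]
x_offset  = m.begin(2)
```

Here all is ["HX1138", "H", "X", "113", "8"] and offsets is [1, 1, 2, 3, 6], so the offset of "X" (captures[1]) is m.begin(2), which is 2.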