I've read about the % notation, but I could not find an explanation of the following.
Example 1: The following code with % outputs i. Obviously % changes i to a string. But I am not sure what actually % is doing.
irb(main):200:0> [[1,2,3],[4,5,6]].each{ |row| p row.map{ |i| % i } }
["i", "i", "i"]
["i", "i", "i"]
=> [[1, 2, 3], [4, 5, 6]]
irb(main):201:0> [[1,2,3],[4,5,6]].each{ |row| p row.map{ |i| i } }
[1, 2, 3]
[4, 5, 6]
=> [[1, 2, 3], [4, 5, 6]]
Example 2: It seems that %2d is adding a space in front of each number. Again, I am not sure what %2d is doing.
irb(main):194:0> [[1,2,3],[4,5,6],[7,8,9]].each{ |row| p row.map{|i| "%2d" % i } }
[" 1", " 2", " 3"]
[" 4", " 5", " 6"]
[" 7", " 8", " 9"]
=> [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Where can I find the documentation about these?
Here is the doc - You may also create strings using %:.
There are two different types of % strings %q(...) behaves like a single-quote string (no interpolation or character escaping) while %Q behaves as a double-quote string.....
In your first example p row.map{|i| % i } as per the above doc % i creates a string "i".
Examples :-
[1, 2, 3].map { |i| % i } # => ["i", "i", "i"]
% i # => "i"
Just remember what the doc says:
Any combination of adjacent single-quote, double-quote, percent strings will be concatenated as long as a percent-string is not last.
From the wikipedia link
Any single non-alpha-numeric character can be used as the delimiter, %[including these], %?or these?,...
In your case the literal is %<space>i<space>: the space plays the same role as the delimiters in %[..], %?..? etc. from the link just above. That is why %<space>i<space> gives "i". (I wrote <space> to show where the spaces are.)
Read Kernel#format
Returns the string resulting from applying format_string to any additional arguments. Within the format string, any characters other than format sequences are copied to the result.
The syntax of a format sequence is as follows.
%[flags][width][.precision]type
A format sequence consists of a percent sign, followed by optional flags, width, and precision indicators, then terminated with a field type character. The field type controls how the corresponding sprintf argument is to be interpreted, while the flags modify that interpretation.
Your last question actually points to a method str % arg → new_str.
If IRB fooled you, as it fooled me while I was trying to understand % i, don't worry. Have a look at the question "why in IRB modulo string literal(%) is behaving differently ?". Matthew Kerwin gives a good answer there.
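To make the two meanings of % concrete, here is a small sketch that puts percent string literals next to String#% and Kernel#format. The variable names are only for illustration, and the delimiters are picked arbitrarily from the ones the docs allow.

```ruby
# Percent literals: any non-alphanumeric character can be the delimiter.
a = %[including these]   # same as "including these"
b = %|or these|          # same as "or these"

name = "world"
c = %q(hello #{name})    # %q behaves like single quotes: no interpolation
d = %Q(hello #{name})    # %Q behaves like double quotes: interpolates

# String#% and Kernel#format apply a format string to the arguments.
e = "%2d" % 7            # width 2, right-aligned  => " 7"
f = format("%2d", 42)    # already 2 wide          => "42"
g = "%-4s|" % "ab"       # '-' flag left-aligns    => "ab  |"
h = "%05.1f" % 3.14159   # zero-pad, 1 decimal     => "003.1"

p a, b, c, d, e, f, g, h
```

So "%2d" does not add a fixed number of spaces; it pads the number on the left until the field is at least two characters wide.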
I'm trying to write code that simulates writing a text message using a multi-tap telephone keypad in Ruby. This is the telephone keypad:
1      2      3
       ABC    DEF
4      5      6
GHI    JKL    MNO
7      8      9
PQRS   TUV    WXYZ
       0
       (space)
I tried to define it in Ruby as: (doesn't work)
"0" = [" "] # (adds a space)
"1" = [""] # (adds nothing)
"2" = ["a", "b", "c"]
"3" = ["d", "e", "f"]
"4" = ["g", "h", "i"]
"5" = ["j", "k", "l"]
"6" = ["m", "n", "o"]
"7" = ["p", "q", "r", "s"]
"8" = ["t", "u", "v"]
"9" = ["w", "x", "y", "z"]
I will explain how it works with two examples. First I will send the string goat. To send the g I press 4 once. Next, to send o I press 6 three times (as pressing 6 once would send m and pressing 6 twice would send n). For a press 2 once and for t press 8 once. We therefore would send
466628
g  oat
Next, consider cake. By the same procedure we would send
22225533
  ca k e
Here there is a problem. When decoding this there are several possibilities for 2222. It could be aaaa, bb and so on. To overcome this ambiguity a "pause", represented as a space, is inserted after each string of digits that is followed by a string of the same digit. For cake, therefore, we would write
222 25533
  c a k e
I already have a hash with the numbers and their corresponding letters, and I know that I have to group the numbers by how many times they repeat. But I do not know which method to use for it.
Also, do I have to use the same logic in case I need to encode (number to letter)?
(I had the encoding part first when Cary Swoveland pointed out that you might want decoding. The answer now contains both ways and became quite long, I hope you don't mind)
Your example code doesn't work. You can't just assign to a string literal. However, you could use a hash like this to define your keypad in Ruby:
key_pad = {
  '0' => [' '],
  '1' => [], # <- you can leave this out
  '2' => %w[a b c],
  '3' => %w[d e f],
  '4' => %w[g h i],
  '5' => %w[j k l],
  '6' => %w[m n o],
  '7' => %w[p q r s],
  '8' => %w[t u v],
  '9' => %w[w x y z],
}
Decoding
To turn 222 25533 into cake, I'd start by splitting consecutive and space-delimited numbers. You could use a regex:
parts = '222 25533'.gsub(/(\d)\1*/).map(&:itself)
#=> ["222", "2", "55", "33"]
This can be converted to an array containing key/number-of-times pairs:
key_strokes = parts.map { |part| [part[0], part.length] }
#=> [["2", 3], ["2", 1], ["5", 2], ["3", 2]]
which can be converted to the letters using the keypad hash:
letters = key_strokes.map { |key, times| key_pad[key][times - 1] }
#=> ["c", "a", "k", "e"]
That - 1 is needed because array indices are zero-based. Finally, turn the letters into a word:
letters.join
#=> "cake"
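The decoding steps above can be folded into one method. This is only a sketch: decode_multitap and the KEYPAD constant are names I made up, the '1' key is left out because it has no letters, and pressing a key more times than it has letters is not handled (that position simply decodes to nothing).

```ruby
# Keypad table as in the answer; '1' carries no letters and is omitted.
KEYPAD = {
  '0' => [' '],
  '2' => %w[a b c], '3' => %w[d e f],
  '4' => %w[g h i], '5' => %w[j k l], '6' => %w[m n o],
  '7' => %w[p q r s], '8' => %w[t u v], '9' => %w[w x y z]
}.freeze

# Hypothetical helper: turns "222 25533" into "cake".
def decode_multitap(digits)
  # scan yields [key, extra_presses] per run; pauses (spaces) are skipped.
  digits.scan(/(\d)(\1*)/)
        .map { |key, extra| KEYPAD.fetch(key)[extra.length] }
        .join
end

puts decode_multitap('222 25533') # cake
puts decode_multitap('466628')    # goat
```

Capturing the repeated digits separately means the number of extra presses is already the zero-based index, so the - 1 is not needed here.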
Encoding
To convert characters to key strokes, I'd create a hash based on the keypad which maps each character to a key/number-of-times pair:
mapping = {}
key_pad.each do |key, values|
  values.each.with_index(1) do |char, times|
    mapping[char] = [key, times]
  end
end
mapping
#=> {
# " "=>["0", 1], "a"=>["2", 1], "b"=>["2", 2], "c"=>["2", 3], "d"=>["3", 1],
# "e"=>["3", 2], "f"=>["3", 3], "g"=>["4", 1], "h"=>["4", 2], "i"=>["4", 3],
# "j"=>["5", 1], "k"=>["5", 2], "l"=>["5", 3], "m"=>["6", 1], "n"=>["6", 2],
# "o"=>["6", 3], "p"=>["7", 1], "q"=>["7", 2], "r"=>["7", 3], "s"=>["7", 4],
# "t"=>["8", 1], "u"=>["8", 2], "v"=>["8", 3], "w"=>["9", 1], "x"=>["9", 2],
# "y"=>["9", 3], "z"=>["9", 4]
# }
In the above hash, "c"=>["2", 3] means that in order to get c you have to press the 2 key 3 times. To render the sequence for a single key in Ruby, we can utilize String#* which repeats a string:
key, times = mapping['c']
key #=> '2'
times #=> 3
key * times
#=> '222'
Getting the key strokes for an entire word (or sentence) is a matter of mapping each character to its respective hash value:
parts = 'cake'.each_char.map { |char| mapping[char] }
#=> [["2", 3], ["2", 1], ["5", 2], ["3", 2]]
To render the actual sequence, we have to first group consecutive runs of the same key:
chunks = parts.chunk_while { |(a, _), (b, _)| a == b }.to_a
#=> [
# [["2", 3], ["2", 1]],
# [["5", 2]],
# [["3", 2]]
# ]
We can now join identical key strokes via space and the chunks without space:
chunks.map { |chunk| chunk.map { |k, t| k * t }.join(' ') }.join
#=> "222 25533"
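Put together, the encoding steps might look like the sketch below. The names LETTERS_BY_KEY, KEYS_FOR_CHAR and encode_multitap are my own, and the mapping stores the finished digit string per character (for example "c" => "222") instead of a key/times pair, which is a small simplification of the hash shown above.

```ruby
# Letters per key, written as strings to keep the table compact.
LETTERS_BY_KEY = {
  '0' => ' ',
  '2' => 'abc', '3' => 'def',
  '4' => 'ghi', '5' => 'jkl', '6' => 'mno',
  '7' => 'pqrs', '8' => 'tuv', '9' => 'wxyz'
}.freeze

# "a" => "2", "b" => "22", "c" => "222", ...
KEYS_FOR_CHAR = LETTERS_BY_KEY.each_with_object({}) do |(key, letters), h|
  letters.each_char.with_index(1) { |char, times| h[char] = key * times }
end

# Hypothetical helper: turns "cake" into "222 25533".
def encode_multitap(text)
  text.each_char
      .map { |char| KEYS_FOR_CHAR.fetch(char) }  # KeyError on unknown chars
      .chunk_while { |a, b| a[0] == b[0] }       # runs on the same key
      .map { |run| run.join(' ') }               # pause inside a run
      .join                                      # no pause between runs
end

puts encode_multitap('cake') # 222 25533
puts encode_multitap('goat') # 466628
```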
You are given:
arr = [
["0", [" "]],
["1", [""]],
["2", ["a", "b", "c"]],
["3", ["d", "e", "f"]],
["4", ["g", "h", "i"]],
["5", ["j", "k", "l"]],
["6", ["m", "n", "o"]],
["7", ["p", "q", "r", "s"]],
["8", ["t", "u", "v"]],
["9", ["w", "x", "y", "z"]]
]
I have been puzzled by the inclusion of
["1", [""]],
which seems to serve no purpose. It would have a purpose, however, if instead of
222 25533
to represent the string, "cake", we used
222125533
That is, if two successive characters are represented by strings of the same digit (such as "222" and "2"), they are to be separated by a "1", rather than by a "pause", expressed as a space. If that were done we could encode and decode strings as follows.
Encoding
CHAR_TO_DIGITS = arr.each_with_object({}) do |(num, a),h|
  a.each.with_index(1) { |ltr,i| h[ltr] = num * i }
end
#=> {" "=>"0", ""=>"1", "a"=>"2", "b"=>"22", "c"=>"222",
# "d"=>"3", "e"=>"33", "f"=>"333", "g"=>"4", "h"=>"44",
# "i"=>"444", "j"=>"5", "k"=>"55", "l"=>"555", "m"=>"6",
# "n"=>"66", "o"=>"666", "p"=>"7", "q"=>"77", "r"=>"777",
# "s"=>"7777", "t"=>"8", "u"=>"88", "v"=>"888", "w"=>"9",
# "x"=>"99", "y"=>"999", "z"=>"9999"}
def encode(plain_text)
  plain_text.each_char.with_object('') do |c,s|
    digits = CHAR_TO_DIGITS[c]
    s << '1' if !s.empty? && digits[0] == s[-1]
    s << digits
  end
end
Then
encoded_1 = encode "cake"
#=> "222125533"
encoded_2 = encode "my dog has fleas"
#=> "69990366640442777703335553327777"
Decoding
Decoding is even easier.
DIGITS_TO_CHAR = CHAR_TO_DIGITS.invert
#=> {"0"=>" ", "1"=>"", "2"=>"a", "22"=>"b", "222"=>"c",
# "3"=>"d", "33"=>"e", "333"=>"f", "4"=>"g", "44"=>"h",
# "444"=>"i", "5"=>"j", "55"=>"k", "555"=>"l", "6"=>"m",
# "66"=>"n", "666"=>"o", "7"=>"p", "77"=>"q", "777"=>"r",
# "7777"=>"s", "8"=>"t", "88"=>"u", "888"=>"v", "9"=>"w",
# "99"=>"x", "999"=>"y", "9999"=>"z"}
def decode(encoded_text)
  encoded_text.gsub(/(\d)\1*/, DIGITS_TO_CHAR)
end
Then
decode encoded_1
#=> "cake"
decode encoded_2
#=> "my dog has fleas"
This uses the form of String#gsub that employs a hash to make substitutions. See also Hash#invert.
I'd start with converting "222 25533" into an array [[2,3],[2,1],[5,2],[3,2]] where the first number represents a digit and the second is a number of its occurrences.
Having this you can easily find letters from the keypad.
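One possible way to build that array, as a sketch (the variable names are mine): split on the pauses, then group each piece into runs of the same digit.

```ruby
input = "222 25533"

pairs = input.split.flat_map do |group|
  # Within one space-delimited group, collect runs of identical digits.
  group.each_char
       .chunk_while { |a, b| a == b }
       .map { |run| [run.first.to_i, run.size] }
end

p pairs # [[2, 3], [2, 1], [5, 2], [3, 2]]
```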
I want to find the binary gap using Ruby regex
Say 1000001001010011100000000000, From left I want to use regex to match
A. 1000001 should return 00000
B. 1001 should return 00
C. 101 should return 0
D. 1001 should return 00
My first attempt looks like this, but it's missing B and D
Update
A binary gap within a positive integer N is any maximal sequence of consecutive zeros that is surrounded by ones at both ends in the binary representation of N.
I think what you are looking for is:
/1(0+)(?=1)/
The problem with your pattern is that you consume the "closing 1". As a consequence, the next search starts after this "closing 1".
But if you use a lookahead (that is, a zero-width assertion that doesn't consume characters and only tests what comes after), the "closing 1" isn't consumed and you get the desired result, because the next search starts after the last zero.
Note that if you don't need the zeros to be enclosed between ones, you can also simply use: /0+/
Other way: if you are sure that the string only contains 1s and 0s, you can also use the (non-)word-boundary assertion \B with this pattern: 1\K0++\B
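To see the difference, here is a quick comparison on the string from the question. The first pattern is only my guess at a "consuming" attempt (the original attempt is not shown); the second is the lookahead version.

```ruby
str = "1000001001010011100000000000"

# Consuming the closing 1: the next search starts after it, so gaps that
# share that 1 as their opening 1 are skipped.
consuming = str.scan(/1(0+)1/).flatten

# Lookahead: the closing 1 is only tested, not consumed, so it can open
# the next gap.
lookahead = str.scan(/1(0+)(?=1)/).flatten

p consuming # ["00000", "0"]
p lookahead # ["00000", "00", "0", "00"]
```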
R = /
    (?=    # start a positive lookahead
      1    # match a one
      (0+) # match one or more zeros in capture group 1
      1    # match a one
    )      # end positive lookahead
    /x     # free-spacing regex definition mode
str = "1000001001010011100000000000"
arr = []
str.scan(R) { |m| arr << [m.first, Regexp.last_match.begin(0)+1] }
arr
#=> [["00000", 1], ["00", 7], ["0", 10], ["00", 12]]
The elements of arr correspond to all substrings of one or more "0"'s of str that are preceded and followed by 1. The first element of each pair is the substring, the second is the offset into str where the substring begins.
Here's a second example.
str = "10011001010101001110001000100101"
arr = []
str.scan(R) { |m| arr << [m.first, Regexp.last_match.begin(0)+1] }
arr
#=> [["00", 1], ["00", 5], ["0", 8], ["0", 10], ["0", 12], ["00", 14],
# ["000", 19], ["000", 23], ["00", 27], ["0", 30]]
Note that one must use a positive lookahead, rather than a positive lookbehind, as (in Ruby) the latter does not permit variable-length strings (i.e., 0+).
@Stefan, in a comment, suggested an improvement:
R = /
    (?<=1) # match a one in a positive lookbehind
    0+     # match one or more zeros
    (?=1)  # match a one in a positive lookahead
    /x     # free-spacing regex definition mode
str = "1000001001010011100000000000"
arr = []
str.scan(R) { |m| arr << [m, Regexp.last_match.begin(0)] }
arr
#=> [["00000", 1], ["00", 7], ["0", 10], ["00", 12]]
This is similar to what @Casimir suggests (/1(0+)(?=1)/), except that by putting the first 1 in a positive lookbehind there's no need for the capture group.
Here is another way that does not use a regex.
str = "1000001001010011100000000000"
(0..str.size-3).each_with_object([]) do |i,a|
  next if str[i] == '0' || str[i+1] == '1'
  ndx = str[i+2..-1].index('1')
  a << [str[i+1, 1+ndx], i+1] if ndx
end
#=> [["00000", 1], ["00", 7], ["0", 10], ["00", 12]]
In order to get only the zeroes in between ones, you need to use regex lookbehind and lookahead:
(?<=1)0+(?=1)
After that you only need to get the element with the maximum length.
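As a rough sketch of the whole thing for an integer N, assuming the lookbehind/lookahead form (?<=1)0+(?=1): binary_gap is a made-up helper name, and returning 0 when there is no gap is my assumption about the desired behaviour.

```ruby
# Hypothetical helper: length of the longest run of zeros enclosed by ones
# in the binary representation of a positive integer n.
def binary_gap(n)
  n.to_s(2)                      # e.g. 9 => "1001"
   .scan(/(?<=1)0+(?=1)/)        # zeros between ones, ones not consumed
   .map(&:length)
   .max || 0                     # no gap at all => 0 (assumed)
end

puts binary_gap(0b1000001001010011100000000000) # 5
puts binary_gap(9)                              # 2
puts binary_gap(32)                             # 0
```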
I have a string which contains a 2-D array.
b= "[[1, 2, 3], [4, 5, 6]]"
c = b.gsub(/(\[\[)/,"[").gsub(/(\]\])/,"]")
The above is how I decide to flatten it to:
"[1, 2, 3], [4, 5, 6]"
Is there a way to replace the leftmost and rightmost brackets without doing a double gsub call? I'm doing a deeper dive into regular expressions and would like to see different alternatives.
Sometimes, the string may be in the correct format as comma delimited 1-D arrays.
The gsub method accepts a hash, and anything that matches your regular expression will be replaced using the keys/values in that hash, like so:
b = "[[1, 2, 3], [4, 5, 6]]"
c = b.gsub(/\[\[|\]\]/, '[[' => '[', ']]' => ']')
That may look a little jumbled, and in practice I'd probably define the list of swaps on a different line. But this does what you were looking for with one gsub, in a more intuitive way.
Another option is to take advantage of the fact that gsub also accepts a block:
c = b.gsub(/\[\[|\]\]/){|matched_value| matched_value[0]}
Here we match any double opening/closing square brackets, and just take the first character of any match. (Plain Ruby strings have no first method; that one comes from ActiveSupport, so [0] is used here.) We can clean up the regex:
c = b.gsub(/\[{2}|\]{2}/){|matched_value| matched_value[0]}
This is a more succinct way to specify that we want to match exactly two opening brackets, or exactly two closing brackets. We can also refine the block, using String#chr, which returns the first character:
c = b.gsub(/\[{2}|\]{2}/, &:chr)
Here we're using some Ruby shorthand. If you only need to call a simple method on the object passed into a block, you can use the &: notation to do this. I think I've gotten it about as short and sweet as I can. Happy coding!
\[(?=\[)|(?<=\])\]
You can try this. Replace with an empty string. See demo.
http://regex101.com/r/hQ1rP0/25
Don't even bother with a regular expression, just do a simple string slice:
b= "[[1, 2, 3], [4, 5, 6]]"
b[1 .. -2] # => "[1, 2, 3], [4, 5, 6]"
the string may be in the correct format as comma delimited 1D arrays
Then sense whether it is and conditionally modify it:
b= "[[1, 2, 3], [4, 5, 6]]"
b = b[1 .. -2] if b[0, 2] == '[[' # => "[1, 2, 3], [4, 5, 6]"
Regular expressions aren't universal hammers, and not everything is a nail to be hit with one.
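If you want this wrapped up so it is safe to call on either format, something like the following sketch would do. strip_outer_brackets is a name I invented, and it also checks the closing brackets, which goes slightly beyond the snippet above.

```ruby
# Hypothetical helper: drops one outer pair of brackets only when the
# string really starts with "[[" and ends with "]]".
def strip_outer_brackets(str)
  str.start_with?('[[') && str.end_with?(']]') ? str[1..-2] : str
end

puts strip_outer_brackets("[[1, 2, 3], [4, 5, 6]]") # [1, 2, 3], [4, 5, 6]
puts strip_outer_brackets("[1, 2, 3], [4, 5, 6]")   # unchanged
```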
To "squeeze" consecutive occurrences of a specific character set, you can use tr_s:
"[[1,2],[3,4]]".tr_s('[]','[]')
=> "[1,2],[3,4]"
You're saying "translate all runs of square bracket characters to one of that character". To do the same thing with regular expressions and gsub, you can do:
"[[1,2],[3,4]]".gsub(/(\[|\])+/,'\1')
I'm learning ruby and can't figure out what's the problem here.
formatter = "%s %s %s %s"
puts formatter = % [1, 2, 3, 4]
Result:
ex8.rb:3: syntax error, unexpected tINTEGER, expecting $end
puts formatter = % [1, 2, 3, 4]
^
You either a) Don't need that = sign:
formatter = "%s %s %s %s"
puts formatter % [1, 2, 3, 4]
or b) need to assign the result to formatter differently:
formatter = "%s %s %s %s"
puts formatter = formatter % [1, 2, 3, 4]
or
formatter = "%s %s %s %s"
formatter = formatter % [1, 2, 3, 4]
puts formatter
The former answer for b will assign the result to formatter and then output the result of that assignment, which will be the right-hand side. I'd recommend the latter (and you could of course condense the top two lines into a single line) just because it's clearer.
Edit:
Also, if you check the code in Learn Ruby the Hard Way, they're not reassigning anything to formatter. The point is that you can supply any four-item array via formatter % and it will produce the text content of those four items. I see it's just dipping into Ruby methods (and you may be unfamiliar with printf), but the following are equivalent:
puts formatter % [1, 2, 3, 4]
puts formatter.%([1, 2, 3, 4])
# And the very retro
puts sprintf(formatter, 1, 2, 3, 4)
In other words, operators are just methods with a little syntactic sugar: you can use forms like %= to assign the result, and you don't need the . separating the object and its method. You can look up % in Ruby's documentation like any other method.
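A small sketch of that sugar, with variable names chosen just for the example:

```ruby
formatter = "%s %s %s %s"

a = formatter % [1, 2, 3, 4]        # operator form
b = formatter.%([1, 2, 3, 4])       # the same call written as a method
c = format(formatter, 1, 2, 3, 4)   # Kernel#format, an alias of sprintf

line = "%s and %s"
line %= ["one", "two"]              # shorthand for line = line % [...]

puts a, b, c, line
```

The first three all print 1 2 3 4, and line ends up as "one and two".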
It's not completely clear what it is that you are trying to do. Maybe this?
formatter = "%s %s %s %s"
puts formatter % [1, 2, 3, 4]
# >> 1 2 3 4
Well, these are called format specifications with arguments:
"I got the following values: %s, %d, %d and %d" % ["Tom", 2, 3, 4]
=> "I got the following values: Tom, 2, 3 and 4"
"%05d" % 123
=> "00123"
More you can find at http://ruby-doc.org/core-1.9.3/String.html#method-i-25
Given two strings of equal length such that
s1 = "ACCT"
s2 = "ATCT"
I would like to find the positions where these strings differ. This is what I have so far (please suggest a better way of doing it; I bet there is one):
z = s1.chars.zip(s2.chars).each_with_index.map { |(a, b), index| index + 1 if a != b }.compact
z is an array of the (1-based) positions where the two strings differ. In this case z is [2].
Imagine that I add a new string
s3 = "AGCT"
and I wish to compare it with the others and see where the three strings differ. We could take the same approach as above, but this time
s1.chars.zip(s2.chars,s3.chars)
#=> [["A", "A", "A"], ["C", "T", "G"], ["C", "C", "C"], ["T", "T", "T"]]
returns an array of arrays. With two strings I was relying on just comparing two chars for equality, but this becomes overwhelming as I add more strings and as the strings become longer.
Running
s1.chars.zip(s2.chars,s3.chars).map{|item| item.uniq}
#=> [["A"], ["C", "T", "G"], ["C"], ["T"]]
can help reduce redundancy and show the positions where all strings agree (a non-empty subarray of size 1). I could then print out the indices and contents of the subarrays that are of size > 1.
s1.chars.zip(s2.chars,s3.chars).map{|item| item.uniq}.each_with_index.map{|a,index| [index+1,a] unless a.size== 1}.compact.map{|h| Hash[*h]}
#=> [{2=>["C", "T", "G"]}]
I feel that this will glide to a halt or get slow as I increase the number of strings and as the string lengths get longer. What are some alternative ways of optimally doing this?
Thank you.
Here's where I'd start. I'm purposely using different strings to make it easier to see the differences:
str1 = 'jackdaws love my giant sphinx of quartz'
str2 = 'jackdaws l0ve my gi4nt sphinx 0f qu4rtz'
To get the first string's characters:
str1.each_char.with_index.to_a - str2.each_char.with_index.to_a
=> [["o", 10], ["a", 19], ["o", 30], ["a", 35]]
To get the second string's characters:
str2.each_char.with_index.to_a - str1.each_char.with_index.to_a
=> [["0", 10], ["4", 19], ["0", 30], ["4", 35]]
There will be a little slow down as the strings get bigger, but it won't be bad.
EDIT: Added more info.
If you have an arbitrary number of strings, and need to compare them all, use Array#combination:
str1 = 'ACCT'
str2 = 'ATCT'
str3 = 'AGCT'
require 'pp'
pp [str1, str2, str3].combination(2).to_a
>> [["ACCT", "ATCT"], ["ACCT", "AGCT"], ["ATCT", "AGCT"]]
In the above output you can see that combination cycles through the array, returning the various n sized combinations of the array elements.
pp [str1, str2, str3].combination(2).map{ |a,b| a.each_char.with_index.to_a - b.each_char.with_index.to_a }
>> [[["C", 1]], [["C", 1]], [["T", 1]]]
Using combination's output you could cycle through the array, comparing all the elements against each other. So, in the above returned array, in the "ACCT" and "ATCT" pair, 'C' was the difference between the two, located at position 1 in the string. Similarly, in "ACCT" and "AGCT" the difference is "C" again, in position 1. Finally for 'ATCT' and 'AGCT' it's 'T' at position 1.
Because we already saw in the longer string samples that the code will return multiple changed characters, this should get you pretty close.
Solution 1
strings = %w[ACCT ATCT AGCT]
First, join the strings, and make a hash of all the positions for each character.
joined = strings.join
positions = (0...joined.length).group_by{|i| joined[i]}
# => {"A"=>[0, 4, 8], "C"=>[1, 2, 6, 10], "T"=>[3, 5, 7, 11], "G"=>[9]}
Then, group the indices by their corresponding position within each string, and remove those that are repeated as many times as the number of strings. This part is a variant of an algorithm that Jorg suggests.
length = strings.first.length
n = strings.length
diff = Hash[positions.map{|k, v|
  [k, v.group_by{|i| i % length}.reject{|i, is| is.length == n}.keys]
}]
This will give something like:
diff
# => {"A"=>[], "C"=>[1], "T"=>[1], "G"=>[1]}
which means that, "A" appears in the same positions in all strings, and "C", "T", and "G" differ at position 1 (count starts from 0) of the strings.
If you simply want to know the positions where the strings differ, do
diff["G"] | diff["A"] | diff["C"] | diff["T"]
# or diff["G"] | diff["A"] | diff["C"]
# => [1]
Solution 2
Note that, if you maintain an array of the indices where a pairwise comparison fails and keep adding indices to it, comparing s1 against the rest (s2, s3, ...) will suffice.
length = s1.length
diff = []
[s2, s3, ...].each{|s| diff += (0...length).reject{|i| s1[i] == s[i]}}
Explanation in a bit more detail
Suppose
s1 = 'GGGGGGGGG'
s2 = 'GGGCGGCGG'
s3 = 'GGGAGGCGG'
After s1 and s2 are compared, we have the set of indices [3, 6] that represents where they differ. Now, when we add s3 into consideration, it does not matter whether we compare it with s1 or with s2 because, if s1[i] and s2[i] are different, then i is already included in the set [3, 6], so it makes no difference whether either of them differs from s3[i]: i is in the set either way. On the other hand, if s1[i] and s2[i] are the same, it also makes no difference which one of them we compare with s3[i]. Therefore, pairwise comparison of s1 with s2, s3, ... is enough.
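Here is a runnable sketch of Solution 2 under the assumption that all strings have the same length. differing_positions is a name I made up, and it collects each index once instead of appending to an array.

```ruby
# Hypothetical helper: 0-based positions where at least one of the other
# strings differs from the first one. Assumes equal-length strings.
def differing_positions(first, *others)
  (0...first.length).reject do |i|
    others.all? { |s| s[i] == first[i] }
  end
end

p differing_positions('GGGGGGGGG', 'GGGCGGCGG', 'GGGAGGCGG') # [3, 6]
p differing_positions('ACCT', 'ATCT', 'AGCT')                # [1]
```

Each string is walked once, so the work grows linearly with both the number of strings and their length.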
You almost certainly don't want to be doing this analysis with your own code. Rather, you want to be handing it off to an existing multiple sequence alignment tool, like Clustal.
I realise this is not an answer to your question, but I hope it's a solution to your problem!