How to implement CYK parsing algorithm in Ruby?

I am trying to implement the CYK algorithm in Ruby according to the pseudocode from Wikipedia. My implementation fails to produce the correct parse table. In the method given below, grammar is an instance of my own grammar class. Here is the code:
# checks whether a grammar accepts given string
# assumes input grammar to be in CNF
def self.parse(grammar, string)
n = string.length
r = grammar.nonterminals.size
# create n x n x r matrix
tbl = Array.new(n) { |_| Array.new(n) { |_| Array.new(r, false) } }
(0...n).each { |s|
grammar.rules.each { |rule|
# check if rule is unit production: A -> b
next unless rule.rhs.size == 1
unit_terminal = rule.rhs[0]
if unit_terminal.value == string[s]
v = grammar.nonterminals.index(rule.lhs)
tbl[0][s][v] = true
end
}
}
(1...n).each { |l|
(0...n - l + 1).each { |s|
(0..l - 1).each { |p|
# enumerate over A -> B C rules, where A, B and C are
# indices in array of NTs
grammar.rules.each { |rule|
next unless rule.rhs.size == 2
a = grammar.nonterminals.index(rule.lhs)
b = grammar.nonterminals.index(rule.rhs[0])
c = grammar.nonterminals.index(rule.rhs[1])
if tbl[p][s][b] and tbl[l - p][s + p][c]
tbl[l][s][a] = true
end
}
}
}
}
v = grammar.nonterminals.index(grammar.start_sym)
return tbl[n - 1][0][v]
end
I tested it with this simple example:
grammar:
A -> B C
B -> 'x'
C -> 'y'
string: 'xy'
The parse table tbl was the following:
[[[false, true, false], [false, false, true]],
[[false, false, false], [false, false, false]]]
The problem definitely lies in the second part of the algorithm - substrings of length larger than 1. The first layer (tbl[0]) contains correct values.
Help much appreciated.

The problem lies in the translation from the 1-based arrays in the pseudocode to the 0-based arrays in your code.
It becomes obvious when you look at the first indices in the condition tbl[p][s][b] and tbl[l-p][s+p][c] in the very first run of the loop. The pseudocode checks tbl[1] and tbl[1] and your code checks tbl[0] and tbl[1].
I think you have to make the 0-based correction where you index the array, not in the ranges for l and p. Otherwise the index arithmetic (l - p and s + p) comes out wrong.
This should work:
(2..n).each do |l|
(0...n - l + 1).each do |s|
(1..l - 1).each do |p|
grammar.rules.each do |rule|
next unless rule.rhs.size == 2
a = grammar.nonterminals.index(rule.lhs)
b = grammar.nonterminals.index(rule.rhs[0])
c = grammar.nonterminals.index(rule.rhs[1])
if tbl[p - 1][s][b] and tbl[l - p - 1][s + p][c]
tbl[l - 1][s][a] = true
end
end
end
end
end
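To check the corrected indexing end to end, here is a minimal self-contained sketch. It does not use the asker's grammar class: the rule format (an array of [lhs, rhs] pairs) and the name cyk_accepts? are assumptions made for illustration only, and each table cell holds a list of nonterminals instead of a boolean vector.

```ruby
# Hypothetical grammar format: each rule is [lhs, rhs], where rhs holds
# either one terminal (a String) or two nonterminals (Symbols), i.e. CNF.
def cyk_accepts?(rules, start_sym, string)
  n = string.length
  return false if n.zero?

  # tbl[l - 1][s] lists the nonterminals deriving the substring of
  # length l that starts at index s (lengths are 1-based, hence l - 1).
  tbl = Array.new(n) { Array.new(n) { [] } }

  # Substrings of length 1: rules of the form A -> 'b'.
  string.each_char.with_index do |ch, s|
    rules.each do |lhs, rhs|
      tbl[0][s] << lhs if rhs.size == 1 && rhs[0] == ch
    end
  end

  # Longer substrings: a prefix of length split plus the remainder.
  (2..n).each do |l|
    (0..n - l).each do |s|
      (1..l - 1).each do |split|
        rules.each do |lhs, rhs|
          next unless rhs.size == 2
          b, c = rhs
          next unless tbl[split - 1][s].include?(b)
          next unless tbl[l - split - 1][s + split].include?(c)
          tbl[l - 1][s] << lhs unless tbl[l - 1][s].include?(lhs)
        end
      end
    end
  end

  tbl[n - 1][0].include?(start_sym)
end

# The example grammar from the question.
RULES = [
  [:A, [:B, :C]],
  [:B, ['x']],
  [:C, ['y']]
].freeze

puts cyk_accepts?(RULES, :A, 'xy') # true
puts cyk_accepts?(RULES, :A, 'yx') # false
```

The loop ranges keep the pseudocode's 1-based lengths, and the subtraction of 1 happens only at the point where the table is indexed.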

Related

How to optimize code - it works, but I know I'm missing much learning

The exercise I'm working on asks "Write a method, coprime?(num_1, num_2), that accepts two numbers as args. The method should return true if the only common divisor between the two numbers is 1."
I've written a method to complete the task, first by finding all the factors then sorting them and looking for duplicates. But I'm looking for suggestions on areas I should consider to optimize it.
The code works, but it is just not clean.
def factors(num)
return (1..num).select { |n| num % n == 0}
end
def coprime?(num_1, num_2)
num_1_factors = factors(num_1)
num_2_factors = factors(num_2)
all_factors = num_1_factors + num_2_factors
new = all_factors.sort
dups = 0
new.each_index do |i|
dups += 1 if new[i] == new[i+1]
end
if dups > 1
false
else
true
end
end
p coprime?(25, 12) # => true
p coprime?(7, 11) # => true
p coprime?(30, 9) # => false
p coprime?(6, 24) # => false
You could use Euclid's algorithm to find the GCD, then check whether it's 1.
def gcd a, b
while a % b != 0
a, b = b, a % b
end
return b
end
def coprime? a, b
gcd(a, b) == 1
end
p coprime?(25, 12) # => true
p coprime?(7, 11) # => true
p coprime?(30, 9) # => false
p coprime?(6, 24) # => false
You can just use Integer#gcd:
def coprime?(num_1, num_2)
num_1.gcd(num_2) == 1
end
You don't need to compare all the factors, just the prime ones. Ruby comes with a Prime class:
require 'prime'
def prime_numbers(num_1, num_2)
# a common prime factor can be no larger than the smaller number
Prime.each([num_1, num_2].min).to_a
end
def factors(num, prime_numbers)
prime_numbers.select {|n| num % n == 0}
end
def coprime?(num_1, num_2)
prime_numbers = prime_numbers(num_1, num_2)
# & returns the intersection of 2 arrays (https://stackoverflow.com/a/5678143)
(factors(num_1, prime_numbers) & factors(num_2, prime_numbers)).length == 0
end

Code wars: Flap Display with while loops

I'm trying to work through a level 5 kata by using while loops. Essentially the problem is to turn each letter rotors[n] number of times and then move on to the next rotors number until you get an output word.
flap_display(["CAT"],[1,13,27])
should output ["DOG"]
Here's what I have so far
def flap_display(lines, rotors)
stuff = "ABCDEFGHIJKLMNOPQRSTUVWXYZ?!@#&()|<>.:=-+*/0123456789"
i = 0
j = 0
new_word = lines
while i < rotors.length
while j < new_word[0].length
new_word[0][j] = stuff[stuff.index(new_word[0][j]) + rotors[i]]
j += 1
end
i += 1
j = 0
end
new_word
end
This technically traverses the stuff string and assigns the right letters. However it fails two important things: it does not skip each letter when it rotates to the correct position (C should stop rotating when it hits D, A when it hits O etc) and it does not account for reaching the end of the stuff list and eventually returns a nil value for stuff[stuff.index(new_word[0][j]) + rotors[i]]. How can I fix these two problems using basic loops and enumerables or maybe a hash?
A fuller statement of the problem is given here. This is one Ruby-like way it could be done.
FLAPS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ?!@#&()|<>.:=-+*/0123456789"
NBR_FLAPS = FLAPS.size
def flap_display(str, rot)
rot_cum = rot.each_with_object([]) { |n,a| a << a.last.to_i + n }
str.gsub(/./) { |c| FLAPS[(c.ord + rot_cum.shift - 65) % NBR_FLAPS] }
end
flap_display("CAT", [1,13,27])
#=> "DOG"
flap_display("DOG", [-1,-13,-27])
#=> "CAT"
flap_display("CAT", [5,37,24])
#=> "H*&"
'A'.ord #=> 65 and rot_cum contains the cumulative values of rot:
arr = [1, 13, 27]
rot_cum = arr.each_with_object([]) { |n,a| a << a.last.to_i + n }
#=> [1, 14, 41]
I've written a.last.to_i rather than a.last to deal with the case where a is empty, so a.last #=> nil, meaning a.last.to_i => nil.to_i => 0. See NilClass#to_i. Those opposed to such trickery could write:
rot_cum = arr.drop(1).each_with_object([arr.first]) { |n,a| a << a.last + n }
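For comparison, the same idea can be sketched with a plain while loop, as the question asked. This is only a sketch under a few assumptions: the names FLAP_CHARS and flap_display_loop are made up here, the flap alphabet is assumed to contain "@#" after "!", the input is an array of lines as in the question, and there is at least one rotor value per character. Looking each character up with index (rather than subtracting 65 from ord) lets it handle input characters other than A-Z, and the modulo wraps around the end of the alphabet instead of returning nil.

```ruby
# Assumed flap alphabet, including the space after Z.
FLAP_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ?!@#&()|<>.:=-+*/0123456789"

def flap_display_loop(lines, rotors)
  lines.map do |line|
    result = String.new
    total = 0 # running sum: each rotor also turns every flap to its right
    i = 0
    while i < line.length
      total += rotors[i]
      # Look up the current position, then wrap around with modulo.
      pos = (FLAP_CHARS.index(line[i]) + total) % FLAP_CHARS.size
      result << FLAP_CHARS[pos]
      i += 1
    end
    result
  end
end

p flap_display_loop(["CAT"], [1, 13, 27])    # ["DOG"]
p flap_display_loop(["DOG"], [-1, -13, -27]) # ["CAT"]
```

Building a new string per line also avoids mutating the caller's input, which the original attempt did through new_word = lines.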

How can I convert a human-readable number to a computer-readable number in Ruby?

I'm working in Ruby with an array that contains a series of numbers in human-readable format (e.g., 2.5B, 1.27M, 600,000, where "B" stands for billion, "M" stands for million). I'm trying to convert all elements of the array to the same format.
Here is the code I've written:
array.each do |elem|
if elem.include? 'B'
elem.slice! "B"
elem = elem.to_f
elem = (elem * 1000000000)
else if elem.include? 'M'
elem.slice! "M"
elem = elem.to_f
elem = (elem * 1000000)
end
end
When I inspect the elements of the array using puts(array), however, the numbers appear with the "B" and "M" sliced off but the multiplication conversion does not appear to have been applied (e.g., the numbers now read 2.5, 1.27, 600,000, instead of 2500000000, 1270000, 600,000).
What am I doing wrong?
First thing to note is that else if in ruby is elsif. See http://www.tutorialspoint.com/ruby/ruby_if_else.htm
Here is a working function for you to try out:
def convert_array_items_from_human_to_integers(array)
array.each_with_index do |elem,i|
if elem.include? 'B'
elem.slice! "B"
elem = elem.to_f
elem = (elem * 1000000000)
elsif elem.include? 'M'
elem.slice! "M"
elem = elem.to_f
elem = (elem * 1000000)
end
array[i] = elem
end
return array
end
Calling convert_array_items_from_human_to_integers(["2.5B", "1.2M"])
returns [2500000000.0, 1200000.0]
Another variation:
array = ['2.5B', '1.27M', '$600000']
p array.each_with_object([]) { |i, a|
i = i.gsub('$', '')
a << if i.include? 'B'
i.to_f * 1E9
elsif i.include? 'M'
i.to_f * 1E6
else
i.to_f
end
}
#=> [2500000000.0, 1270000.0, 600000.0]
Try this:
array.map do |elem|
elem = elem.gsub('$','')
if elem.include? 'B'
elem.to_f * 1000000000
elsif elem.include? 'M'
elem.to_f * 1000000
else
elem.to_f
end
end
This uses map instead of each to return a new array. Your attempt reassigns the block variable, which leaves the elements of the original array untouched (except for the slice!, which modifies the string in place). You can dispense with the slicing in the first place, since to_f simply ignores any trailing non-numeric characters.
EDIT:
If you have leading characters such as $2.5B, as your question title indicates (but not your example), you'll need to strip those explicitly. But your sample code doesn't handle those either, so I assume that's not an issue.
Expanding a bit on pjs' answer:
array.each do |elem|
elem is a local variable pointing to each array element, one at a time. When you do this:
elem.slice! "B"
you are sending a message to that array element telling it to slice the B. And you're seeing that in the end result. But when you do this:
elem = elem.to_f
now you've reassigned your local variable elem to something completely new. You haven't reassigned what's in the array, just what elem is.
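The difference between rebinding the block variable and mutating the object it refers to can be seen in a small sketch (the variable names values and converted are chosen here for illustration):

```ruby
# dup the literals so that mutating them is safe in any Ruby version
values = ["2.5B", "1.27M"].map(&:dup)

# Reassignment only rebinds the block-local name; the array is untouched.
values.each { |elem| elem = elem.to_f }
p values # ["2.5B", "1.27M"]

# slice! mutates the string object itself, so the array shows the change.
values.each { |elem| elem.slice!("B") }
p values # ["2.5", "1.27M"]

# map collects the block's return values into a new array.
converted = values.map { |elem| elem.to_f }
p converted # [2.5, 1.27]
```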
Here's how I'd go about it:
ARY = %w[2.5B 1.27M 600,000]
def clean_number(s)
s.gsub(/[^\d.]+/, '')
end
ARY.map{ |v|
case v
when /b$/i
clean_number(v).to_f * 1_000_000_000
when /m$/i
clean_number(v).to_f * 1_000_000
else
clean_number(v).to_f
end
}
# => [2500000000.0, 1270000.0, 600000.0]
The guts of the code are in the case statement. A simple check for the multiplier allows me to strip the undesired characters and multiply by the right value.
Normally we could use to_f to find the floating-point number to be multiplied for strings like '1.2', but it breaks down for things like '$1.2M' because of the "$". The same thing is true for embedded commas marking thousands:
'$1.2M'.to_f # => 0.0
'1.2M'.to_f # => 1.2
'6,000'.to_f # => 6.0
'6000'.to_f # => 6000.0
To fix the problem for simple strings containing just the value, it's not necessary to do anything fancier than stripping undesirable characters using gsub(/[^\d.]+/, ''):
'$1.2M'.gsub(/[^\d.]+/, '') # => "1.2"
'1.2M'.gsub(/[^\d.]+/, '') # => "1.2"
'6,000'.gsub(/[^\d.]+/, '') # => "6000"
'6000'.gsub(/[^\d.]+/, '') # => "6000"
[^\d.] means "anything that is NOT a digit or '.'".
Be careful how you convert your decimal values to integers. You could end up throwing away important precision:
'0.2M'.gsub(/[^\d.]+/, '').to_f * 1_000_000 # => 200000.0
'0.2M'.gsub(/[^\d.]+/, '').to_i * 1_000_000 # => 0
('0.2M'.gsub(/[^\d.]+/, '').to_f * 1_000_000).to_i # => 200000
Of course all this breaks down if your string is more complex than a simple number and multiplier. It's easy to break down a string and identify those sort of sub-strings, but that's a different question.
I would do it like this:
Code
T, M, B = 1_000, 1_000_000, 1_000_000_000
def convert(arr)
arr.map do |n|
m = n.gsub(/[^\d.TMB]/,'')
m.to_f * (m[-1][/[TMB]/] ? Object.const_get(m[-1]) : 1)
end
end
Example
arr = %w[$2.5B 1.27M 22.5T, 600,000]
convert(arr)
# => [2500000000.0, 1270000.0, 22500.0, 600000.0]
Explanation
The line
m = n.gsub(/[^\d.TMB]/,'')
# => ["2.5B", "1.27M", "22.5T", "600000"]
merely eliminates unwanted characters.
m.to_f * (m[-1][/[TMB]/] ? Object.const_get(m[-1]) : 1)
returns the product of the string converted to a float and a constant given by the last character of the string, if that character is T, M or B, else 1.
Actual implementation might be like this:
class A
T, M, B = 1_000, 1_000_000, 1_000_000_000
def doit(arr)
c = self.class.constants.map(&:to_s).join
arr.map do |n|
m = n.gsub(/[^\d.#{c}]/,'')
m.to_f * (m[-1][/[#{c}]/] ? self.class.const_get(m[-1]) : 1)
end
end
end
If we wished to change the reference for 1,000 from T to K and add T for trillion, we would need only change
T, M, B = 1_000, 1_000_000, 1_000_000_000
to
K, M, B, T = 1_000, 1_000_000, 1_000_000_000, 1_000_000_000_000
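A similar effect can be had without looking up constants by name, by keeping the suffixes in a hash. This is a sketch rather than part of the answer above; MULTIPLIERS and to_number are hypothetical names, and the suffix set (K, M, B) is an assumption.

```ruby
# Suffix table; add or change entries here to support other suffixes.
MULTIPLIERS = {
  'K' => 1_000,
  'M' => 1_000_000,
  'B' => 1_000_000_000
}.freeze

def to_number(str)
  # Keep only digits, the decimal point and known suffix letters.
  cleaned = str.gsub(/[^\d.KMB]/i, '').upcase
  # to_f ignores the trailing suffix; fetch falls back to 1 without one.
  cleaned.to_f * MULTIPLIERS.fetch(cleaned[-1], 1)
end

p to_number('$2.5B')
p to_number('600,000')
p to_number('3k')
```

Hash#fetch with a default of 1 covers both plain numbers and empty strings, since the last character is then a digit or nil.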

Ruby Pascal's triangle generator with memoization

I am attempting to memoize my implementation of a Pascal's triangle generator, as a Ruby learning experiment. I have the following working code:
module PascalMemo
@cache = {}
def PascalMemo::get(r,c)
if @cache[[r,c]].nil? then
if c == 0 || c == r then
@cache[[r,c]] = 1
else
@cache[[r,c]] = PascalMemo::get(r - 1, c) + PascalMemo::get(r - 1, c - 1)
end
end
@cache[[r,c]]
end
end
def pascal_memo (r,c)
PascalMemo::get(r,c)
end
Can this be made more concise? Specifically, can I create a globally-scoped function with a local closure more simply than this?
def pascal_memo
cache = [[1]]
get = lambda { |r, c|
( cache[r] or cache[r] = [1] + [nil] * (r - 1) + [1] )[c] or
cache[r][c] = get.(r - 1, c) + get.(r - 1, c - 1)
}
end
p = pascal_memo
p.( 10, 7 ) #=> 120
Please note that the above construct does achieve memoization; it is not just a simple recursive method.
Can this be made more concise?
It seems pretty clear, IMO, and wrapping it in a module is usually a good instinct.
can I create a globally-scoped function with a local closure more simply than this?
Another option would be a recursive lambda:
memo = {}
pascal_memo = lambda do |r, c|
if memo[[r,c]].nil?
if c == 0 || c == r
memo[[r,c]] = 1
else
memo[[r,c]] = pascal_memo[r - 1, c] + pascal_memo[r - 1, c - 1]
end
end
memo[[r,c]]
end
pascal_memo[10, 2]
# => 45
I have found a way to accomplish what I want that is slightly more satisfactory, since it produces a function rather than a lambda:
class << self
cache = {}
define_method :pascal_memo do |r,c|
cache[[r,c]] or
(if c == 0 or c == r then cache[[r,c]] = 1 else nil end) or
cache[[r,c]] = pascal_memo(r-1,c) + pascal_memo(r-1,c-1)
end
end
This opens up the metaclass/singleton class for the main object, then uses define_method to add a new method that closes over the cache variable, which then falls out of scope for everything except the pascal_memo method.
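Another common memoization idiom, sketched here as an alternative rather than as any of the answers above, is a Hash with a default block that computes and stores missing entries. The constant name PASCAL is made up for this example, and unlike the closure versions the cache is visible wherever the constant is.

```ruby
# The default block runs only on a cache miss and stores its result,
# so each [row, column] pair is computed once.
PASCAL = Hash.new do |cache, (r, c)|
  cache[[r, c]] =
    if c == 0 || c == r
      1
    else
      cache[[r - 1, c]] + cache[[r - 1, c - 1]]
    end
end

def pascal_memo(r, c)
  PASCAL[[r, c]]
end

p pascal_memo(10, 7) # 120
p pascal_memo(10, 2) # 45
```

The recursion happens through the hash lookups themselves, so very large row numbers can still exhaust the stack on a cold cache.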

Ruby array with an extra state

I'm trying to go through an array and add a second dimension for true and false values in Ruby.
For example, I will be pushing arrays onto another array, so that it looks like this:
a = [[1,2,3,4],[5]]
I would like to go through each array inside of "a" and be able to mark a state of true or false for each individual value, similar to a Map in Java.
Any ideas? Thanks.
You're better off starting with this:
a = [{ 1 => false, 2 => false, 3 => false, 4 => false }, { 5 => false }]
Then you can just flip the booleans as needed. Otherwise you will have to pollute your code with a bunch of tests to see if you have an Integer (1, 2, ...; called Fixnum before Ruby 2.4) or a Hash ({1 => true}) before you can test the flag's value.
Hashes preserve insertion order in Ruby 1.9 and later, so you wouldn't lose your ordering by switching to hashes.
You can convert your array to this form with one of these:
a = a.map { |x| Hash[x.zip([false] * x.length)] }
# or
a = a.map { |x| x.each_with_object({}) { |i,h| h[i] = false } }
And if using nil to mean "unvisited" makes more sense than starting with false then:
a = a.map { |x| Hash[x.zip([nil] * x.length)] }
# or
a = a.map { |x| x.each_with_object({}) { |i,h| h[i] = nil } }
Some useful references:
Hash[]
each_with_object
zip
Array *
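Put together, the hash-based approach might be used like this. It is a small sketch; the variable names states and marked are chosen for illustration, and it assumes the values within each inner array are unique, since they become hash keys.

```ruby
a = [[1, 2, 3, 4], [5]]

# One hash per inner array, every value starting out as false.
states = a.map { |xs| xs.each_with_object({}) { |x, h| h[x] = false } }

# Flip the flag for the value 3 in the first group.
states[0][3] = true

# Collect the values currently marked true in the first group.
marked = states[0].select { |_value, flag| flag }.keys

p states # [{1=>false, 2=>false, 3=>true, 4=>false}, {5=>false}]
p marked # [3]
```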
If what you are trying to do is simply tag specific elements in the member arrays with boolean values, it is just a simple matter of doing the following:
current_value = a[i][j]
a[i][j] = [current_value, true_or_false]
For example if you have
a = [[1,2,3,4],[5]]
Then if you say
a[0][2] = [a[0][2],true]
then a becomes
a = [[1,2,[3,true],4],[5]]
You can roll this into a method
def tag_array_element(a, i, j, boolean_value)
a[i][j] = [a[i][j], boolean_value]
end
You might want to enhance this a little so you don't tag a specific element twice. :) To do so, just check if a[i][j] is already an array.
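One way that check could look is sketched below. This variant redefines tag_array_element so that tagging an element a second time updates the flag instead of nesting another pair; it assumes the untagged values are never arrays themselves.

```ruby
def tag_array_element(a, i, j, boolean_value)
  if a[i][j].is_a?(Array)
    # Already tagged: only replace the flag.
    a[i][j][1] = boolean_value
  else
    # First tag: wrap the value together with its flag.
    a[i][j] = [a[i][j], boolean_value]
  end
  a
end

a = [[1, 2, 3, 4], [5]]
tag_array_element(a, 0, 2, true)
p a # [[1, 2, [3, true], 4], [5]]
tag_array_element(a, 0, 2, false)
p a # [[1, 2, [3, false], 4], [5]]
```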
Replace x % 2 == 0 with the actual operation you want for the mapping:
>> xss = [[1,2,3,4],[5]]
>> xss.map { |xs| xs.map { |x| {x => x % 2 == 0} } }
#=> [[{1=>false}, {2=>true}, {3=>false}, {4=>true}], [{5=>false}]]
