Simple question here. I have never programmed in Ruby, so I thought I'd ask here to confirm whether I'm even close to the solution.
Challenge:
Problem Definition: This Ruby method should ensure that the word "Twitter" is spelt correctly.
def fix_spelling(name)
if name = "twittr"
name = "twitter"
else
fix_spelling(name)
end
return "name"
end
I checked how to build methods in Ruby and came up with the solution shown further down.
The problems I identified:
the method is being called inside the function so it will never print anything.
the return is actually returning the string "name" rather than the variable.
def fix_spelling(name)
if name = "twittr"
name = "twitter"
end
return name
end
puts fix_spelling("twittr")
Would this be correct?
Printing:
def fix_spelling(name)
if name == "twittr"
name = "twitter"
end
return name
end
puts fix_spelling(name = "twittr");
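The difference between the two attempts comes down to `=` (assignment) versus `==` (comparison). Here is a minimal sketch of that difference; the method names are hypothetical and not part of the original challenge:

```ruby
# Hypothetical names, for illustration only.

# Uses `=`: the condition assigns "twittr" to name, and a string is always
# truthy, so every input ends up as "twitter".
def fix_with_assignment(name)
  if (name = "twittr")
    name = "twitter"
  end
  name
end

# Uses `==`: only the misspelling is replaced.
def fix_with_comparison(name)
  name = "twitter" if name == "twittr"
  name
end

p fix_with_assignment("facebook")  # => "twitter"
p fix_with_comparison("facebook")  # => "facebook"
p fix_with_comparison("twittr")    # => "twitter"
```

Ruby may print a warning about `=` inside a conditional for the first method, which is a useful hint that a comparison was probably intended.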
Fixing and Shortening the Original Code
A much shorter and more idiomatic version of your current solution looks like this:
def fix_spelling name
name == 'twittr' ? 'twitter' : name
end
# validate inputs
p %w[twitter twittr twit].map { |word| fix_spelling word }
#=> ["twitter", "twitter", "twit"]
However, this essentially just returns name for any other value than twittr, whether it's spelled correctly or not. If that's what you expect, fine. Otherwise, you'll need to develop a set of case statements or return values that can "correct" all sorts of other misspellings. You might also consider using the Levenshtein distance or other heuristic for fuzzy matching rather than using fixed strings or regular expressions to map your inputs to outputs.
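If a fixed list of misspellings is enough, one way to avoid a long chain of case statements is a lookup table. This is only a sketch: the `CORRECTIONS` table, its entries, and the method name are assumptions made up for illustration.

```ruby
# Hypothetical table of known misspellings; extend it as needed.
CORRECTIONS = {
  'twittr'  => 'twitter',
  'twiter'  => 'twitter',
  'tiwtter' => 'twitter'
}.freeze

# Returns the correction if one is known, otherwise the input unchanged.
def fix_spelling_from_table(name)
  CORRECTIONS.fetch(name, name)
end

p %w[twitter twittr twiter twit].map { |word| fix_spelling_from_table(word) }
#=> ["twitter", "twitter", "twitter", "twit"]
```

Like the ternary version, this still passes unknown words through untouched, so it only helps with misspellings you can list in advance.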
Fuzzy Matching
Consider this alternative approach, which uses a gem to determine if the Damerau-Levenshtein edit distance is ~50% of the length of your correctly-spelled word, allows for additional words, and returns the original word bracketed by question marks when it can't be corrected:
require 'damerau-levenshtein'
WORD_LIST = %w[Facebook Twitter]
def autocorrect word
WORD_LIST.map do |w|
max_dist = (w.length / 2).round
return w if DamerauLevenshtein.distance(w, word) <= max_dist
end
'?%s?' % word
end
# validate inputs
p %w[twitter twittr twit facebk].map { |word| autocorrect word }
#=> ["Twitter", "Twitter", "?twit?", "Facebook"]
This isn't really a "spellchecker in a box," but provides a foundation for a more flexible framework if that's where you're going with this. There are a lot of edge cases such as correct-word mapping, capitalization, word stemming, and abbreviations (think "fb" for Facebook) that I'm excluding from the scope of this answer, but edit distance will certainly get you further along towards a comprehensive auto-correct solution than the original example would. Your mileage may certainly vary.
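To make "edit distance" a little more concrete, here is a rough pure-Ruby sketch of plain Levenshtein distance. Note that this is not what the gem computes: it ignores transpositions (the "Damerau" part) and is only meant to show the idea. The method name is my own.

```ruby
# Plain Levenshtein distance: the minimum number of single-character
# insertions, deletions, and substitutions needed to turn a into b.
# Keeps only the previous row of the dynamic-programming table.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    curr = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      curr << [prev[j] + 1,            # deletion
               curr[j - 1] + 1,        # insertion
               prev[j - 1] + cost].min # substitution (or match)
    end
    prev = curr
  end
  prev.last
end

p levenshtein('twittr', 'twitter') # => 1
p levenshtein('twit', 'twitter')   # => 3
p levenshtein('kitten', 'sitting') # => 3
```

The comparison is case-sensitive, so you would probably want to downcase both words before measuring.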
I am new to Ruby. I am trying to create a report_checker function that checks how often the words "Green", "Red", and "Amber" appear and returns the counts in the format: "Green: 2/nAmber: 1/nRed: 1".
If a word is not one of the three mentioned, it is replaced with the word 'Unaccounted', but the number of times it appears is still counted.
My code is returning repeats, e.g. if I give it the input report_checker("Green, Amber, Green"), it returns "Green: 2/nAmber: 1/nGreen: 2" as opposed to "Green: 2/nAmber: 1".
Also, it doesn't count the number of times an unaccounted word appears. Any guidance on where I am going wrong?
def report_checker(string)
array = []
grading = ["Green", "Amber", "Red"]
input = string.tr(',', ' ').split(" ")
input.each do |x|
if grading.include?(x)
array.push( "#{x}: #{input.count(x)}")
else
x = "Unaccounted"
array.push( "#{x}: #{input.count(x)}")
end
end
array.join("/n")
end
report_checker("Green, Amber, Green")
I tried pushing the words into separate words and returning the expected word with its count
There are a lot of things you can do here to steer this into more idiomatic Ruby:
# Set is part of the standard library; older Rubies need an explicit require.
require 'set'
# Use a constant, as this never changes, and a Set, since you only care
# about inclusion, not order. Calling #include? on a Set is always
# quick, while on a longer array it can be very slow.
GRADING = Set.new(%w[ Green Amber Red ])
def report_checker(string)
# Do this as a series of transformations:
# 1. More lenient splitting on either comma or space, with optional leading
# and trailing spaces.
# 2. Conversion of invalid inputs into 'Unaccounted'
# 3. Grouping together of identical inputs via the #itself method
# 4. Combining these remapped strings into a single string
string.split(/\s*[,\s]\s*/).map do |input|
if (GRADING.include?(input))
input
else
'Unaccounted'
end
end.group_by(&:itself).map do |input, samples|
"#{input}: #{samples.length}"
end.join("\n")
end
report_checker("Green, Amber, Green, Orange")
One thing you'll come to learn about Ruby is that simple mappings like this translate into very simple Ruby code. This might look a bit daunting now if you're not used to it, but keep in mind that each component of that transformation isn't that complex, and further, that you can run the chain up to any given point to see what's going on, or even use .tap { |v| p v } in the middle to inspect what's flowing through there.
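As a rough illustration of the .tap trick, here is a cut-down chain (without the 'Unaccounted' remapping) that prints each intermediate value. The `lines` variable name is just for this sketch.

```ruby
lines = "Green, Amber, Green, Orange"
  .split(/[,\s]+/)
  .tap { |v| p v }   # prints the array of words
  .group_by(&:itself)
  .tap { |v| p v }   # prints the hash of word => occurrences
  .map { |word, samples| "#{word}: #{samples.length}" }

p lines
#=> ["Green: 2", "Amber: 1", "Orange: 1"]
```

Because tap yields the receiver and then returns it unchanged, you can add or remove these lines without affecting the result.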
Taking this further into the Ruby realm, you'd probably want to use symbols, as in :green and :amber, as these are very tidy as things like Hash keys: { green: 0, amber: 2 } etc.
While this is done as a single method, it might make sense to split this into two concerns: One focused on computing the report itself, as in to a form like { green: 2, amber: 1, unaccounted: 1 } and a second that can convert reports of that form into the desired output string.
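A sketch of that two-step split might look like the following. The names `KNOWN_GRADES`, `tally_report`, and `format_report` are made up for illustration, and downcasing the input (which makes matching case-insensitive) is an assumption on my part, not something the question asked for.

```ruby
# Hypothetical names; grades are stored as symbols.
KNOWN_GRADES = %i[green amber red].freeze

# Step 1: compute the report as a Hash of symbol => count.
def tally_report(string)
  words = string.split(/[,\s]+/).reject(&:empty?)
  words.each_with_object(Hash.new(0)) do |word, counts|
    key = word.downcase.to_sym
    key = :unaccounted unless KNOWN_GRADES.include?(key)
    counts[key] += 1
  end
end

# Step 2: turn a report Hash into the desired output string.
def format_report(counts)
  counts.map { |grade, n| "#{grade.to_s.capitalize}: #{n}" }.join("\n")
end

report = tally_report("Green, Amber, Green, Orange")
p report
#=> {:green=>2, :amber=>1, :unaccounted=>1}  (newer Rubies print {green: 2, ...})
puts format_report(report)
```

Keeping the counting separate means you can test it on its own and change the output format later without touching it.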
There are lots and lots of ways to accomplish your end goal in Ruby. I won't go over those, but I will take a moment to point out a few key issues with your code in order to show you where the most notable problems are and how to fix them with as few changes as I can personally think of:
Issue #1:
if grading.include?(x)
array.push( "#{x}: #{input.count(x)}")
This results in a new array element being added each and every time grading includes x. This explains why you are getting repeated array elements ("Green: 2/nAmber: 1/nGreen: 2"). My suggested fix for this issue is to use the uniq method in the last line of your method definition. This will remove any duplicated array elements.
Issue #2
else
x = "Unaccounted"
array.push( "#{x}: #{input.count(x)}")
The reason you're not seeing any quantity for your "Unaccounted" elements is that you re-define x as the string "Unaccounted" and then count how often that string occurs in input. The problem here is that input does not actually include any instances of "Unaccounted", so your count is always going to be 0. My suggested fix for this is to take the array difference input - grading, whose length tells you exactly how many "Unaccounted" elements there actually are.
Issue #3 ??
I'm assuming you meant to include a newline and not a forward slash (/) followed by a literal "n" (n). My suggested fix for this of course is to use a proper newline (\n). If my assumption is incorrect, just ignore that part.
After all changes, your minimally modified code would look like this:
def report_checker(string)
array = []
grading = ["Green", "Amber", "Red"]
input = string.tr(',', ' ').split(" ")
input.each do |x|
if grading.include?(x)
array.push( "#{x}: #{input.count(x)}")
else
array.push( "Unaccounted: #{(input-grading).length}")
end
end
array.uniq.join("\n")
end
report_checker("Green, Amber, Green, Yellow, Blue, Blue")
#=>
Green: 2
Amber: 1
Unaccounted: 3
Again, I'm not suggesting that this is the most effective or efficient approach. I'm just giving you some minor corrections to work with so you can take baby steps if so desired.
Try the code below.
Keep your display logic outside of the method.
def report_checker(string, grading = %w[ Green Amber Red ])
data = string.split(/\s*[,\s]\s*/)
unaccounted = data - grading
(data - unaccounted).tally.merge('Unaccounted' => unaccounted.count)
end
result = report_checker("Green, Amber, Green, Orange, Yellow")
result.each { |k,v| puts "#{k} : #{v}"}
Output
Green : 2
Amber : 1
Unaccounted : 2
Given a single letter (string), say "a", I want to convert this into its corresponding control code, i.e. "\ca" - or equivalently (in alternate syntax) - "\C-a", ?\ca, "\x01", "\u0001"
I was hoping there'd be some "nice", clean way of doing this conversion, but I can't figure it out.
An obvious first attempt might be to try something like:
def convert_to_control_code(letter)
"\c#{letter}"
end
...But this does not work, since this will always return "\u0003{letter}" (where "\u0003" is the control code "\c#").
My current solution is simply to "brute force" it by doing the following:
def convert_to_control_code(letter)
(0..255).detect { |x| x.chr =~ Regexp.new("\\c#{letter}") }.chr
end
However, I can't help but feel there's a "right" way of doing this!
Edit:
Here's another, non brute-force solution I've come up with, that seems to work:
def convert_to_control_code(letter)
(letter.ord % 32).chr
end
This looks much nicer, but also very hacky!
You can write it as:
def convert_to_control_code(letter)
eval "?\\C-#{letter.chr}"
end
convert_to_control_code(97) # => "\u0001"
convert_to_control_code(98) # => "\u0002"
One possibility is to do the same as Ruby itself does. It might look something like this:
def convert_to_control(letter)
letter = letter.chr # ensure we are only dealing with a single char
return 0177.chr if letter == '?'
raise 'an error' unless letter.ascii_only? # or do something else
(letter.ord & 0x9f).chr
end
You might want to change the encoding of the result depending on what you are doing.
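For what it's worth, the reason the masking and the % 32 approaches agree for ASCII letters is that control characters occupy code points 0 to 31, so only the low five bits of the letter matter. A small sketch to check that, assuming single ASCII letters as input (the helper name is hypothetical):

```ruby
# Hypothetical helper; assumes a single ASCII letter as input.
def control_code_for(letter)
  (letter.ord & 0x1f).chr # keep the low five bits: 'a' (0x61) -> 0x01
end

p control_code_for('a') == "\ca"   # => true
p control_code_for('A') == "\x01"  # => true

# Masking and modulo give the same result for every lowercase letter.
same = ('a'..'z').all? { |l| control_code_for(l) == (l.ord % 32).chr }
p same # => true
```

This deliberately skips the special case for '?' (which maps to DEL, 0177) and the non-ASCII check from the answer above.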
Question
I need to search a given web page for a particular node when given the exact HTML as a string. For instance, if given:
url = "https://www.wikipedia.org/"
node_to_find = "<title>Wikipedia</title>"
I want to "select" the node on the page (and eventually return its children and sibling nodes). I'm having trouble with the Nokogiri docs, and how to exactly go about this. It seems as though, most of the time, people want to use Xpath syntax or the #css method to find nodes that satisfy a set of conditions. I want to use the HTML syntax and just find the exact match within a webpage.
Possible start of a solution?
If I create two Nokogiri::HTML::DocumentFragment objects, they look similar but do not match due to the memory id being different. I think this might be a precursor to solving it?
irb(main):018:0> n = Nokogiri::HTML::DocumentFragment.parse("<title>Wikipedia</title>").child
=> #<Nokogiri::XML::Element:0x47e7e4 name="title" children=[#<Nokogiri::XML::Text:0x47e08c "Wikipedia">]>
irb(main):019:0> n.class
=> Nokogiri::XML::Element
Then I create a second one using the exact same arguments. Compare them - it returns false:
irb(main):020:0> x = Nokogiri::HTML::DocumentFragment.parse("<title>Wikipedia</title>").child
=> #<Nokogiri::XML::Element:0x472958 name="title" children=[#<Nokogiri::XML::Text:0x4724a8 "Wikipedia">]>
irb(main):021:0> n == x
=> false
So I'm thinking that if I can somehow create a method that can find matches like this, then I can perform operations on that node. In particular - I want to find the descendants (children and next sibling).
EDIT: I should mention that I have a method in my code that creates a Nokogiri::HTML::Document object from a given URL. So - that will be available to compare with.
class Page
attr_accessor :url, :node, :doc, :root
def initialize(params = {})
@url = params.fetch(:url, "").to_s
@node = params.fetch(:node, "").to_s
@doc = parse_html(@url)
end
def parse_html(url)
Nokogiri::HTML(open(url).read)
end
end
As suggested by commenter @August, you could use Node#traverse to see if the string representation of any node matches the string form of your target node.
def find_node(html_document, html_fragment)
matching_node = nil
html_document.traverse do |node|
matching_node = node if node.to_s == html_fragment.to_s
end
matching_node
end
Of course, this approach is fraught with problems that boil down to the canonical representation of the data (do you care about attribute ordering? specific syntax items like quotation marks? whitespace?).
[Edit] Here's a prototype of converting an arbitrary HTML element to an XPath expression. It needs some work but the basic idea (match any element with the node name, specific attributes, and possibly text child) should be a good starting place.
def html_to_xpath(html_string)
node = Nokogiri::HTML::fragment(html_string).children.first
has_more_than_one_child = (node.children.size > 1)
has_non_text_child = node.children.any? { |x| x.type != Nokogiri::XML::Node::TEXT_NODE }
if has_more_than_one_child || has_non_text_child
raise ArgumentError.new('element may only have a single text child')
end
xpath = "//#{node.name}"
node.attributes.each do |_, attr|
xpath += "[@#{attr.name}='#{attr.value}']" # TODO: escaping.
end
xpath += "[text()='#{node.children.first.to_s}']" unless node.children.empty?
xpath
end
html_to_xpath('<title>Wikipedia</title>') # => "//title[text()='Wikipedia']"
html_to_xpath('<div id="foo">Foo</div>') # => "//div[@id='foo'][text()='Foo']"
html_to_xpath('<div><br/></div>') # => ArgumentError: element may only have a single text child
It seems possible that you could build an XPath from any HTML fragment (e.g. not restricted to those with only a single text child, per my prototype above) but I'll leave that as an exercise for the reader ;-)
I've spent a few hours searching for a way to push an array into another array or into a hash. Apologies in advance if the formatting of this question is a bit messy. This is the first time I've asked a question on StackOverflow so I'm trying to get the hang of styling my questions properly.
I have to write some code to make the following unit test pass:
class TestNAME < Test::Unit::TestCase
def test_directions()
assert_equal(Lexicon.scan("north"), [['direction', 'north']])
result = Lexicon.scan("north south east")
assert_equal(result, [['direction', 'north'],
['direction', 'south'],
['direction', 'east']])
end
end
The simplest thing I've come up with is below. The first part passes, but then the second part is not returning the expected result when I run rake test.
Instead of returning:
[["direction", "north"], ["direction", "south"], ["direction",
"east"]]
it's returning:
["north", "south", "east"]
Although, if I print the result of y as a string to the console, I get 3 separate arrays that are not contained within another array (as below). Why hasn't it printed the outermost square brackets of the array, y?
["direction", "north"]
["direction", "south"]
["direction", "east"]
Below is the code I've written in an attempt to pass the test unit above:
class Lexicon
def initialize(stuff)
@words = stuff.split
end
def self.scan(word)
if word.include?(' ')
broken_words = word.split
broken_words.each do |word|
x = ['direction']
x.push(word)
y = []
y.push(x)
end
else
return [['direction', word]]
end
end
end
Any feedback about this will be much appreciated. Thank you all so much in advance.
What you're seeing is the result of each, which returns the thing being iterated over, or in this case, broken_words. What you want is collect which returns the transformed values. Notice in your original, y is never used, it's just thrown out after being composed.
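A quick way to see the difference is to capture what each call returns. This is just a sketch with made-up variable names:

```ruby
words = %w[north south east]

# each discards the block's result and returns the receiver itself.
from_each = words.each { |word| ['direction', word] }

# collect (alias: map) returns a new array of the block's results.
from_collect = words.collect { |word| ['direction', word] }

p from_each
#=> ["north", "south", "east"]
p from_collect
#=> [["direction", "north"], ["direction", "south"], ["direction", "east"]]
```

Since a method returns the value of its last expression, ending scan with an each block is why the test saw the plain list of words.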
Here's a fixed up version:
class Lexicon
def initialize(stuff)
@words = stuff.split
end
def self.scan(word)
broken_words = word.split(/\s+/)
broken_words.collect do |word|
[ 'direction', word ]
end
end
end
It's worth noting a few things were changed here:
Splitting on an arbitrary number of spaces rather than one.
Simplifying to a single case instead of two.
Eliminating the redundant return statement.
One thing you might consider is using a data structure like { direction: word } instead. That makes referencing values a lot easier since you'd do entry[:direction] avoiding the ambiguous entry[1].
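As a sketch of what that could look like (note that this would no longer satisfy the test in the question, which expects nested arrays; the method name is hypothetical):

```ruby
# Hypothetical variant that returns hashes instead of two-element arrays.
def scan_to_hashes(str)
  str.split.map { |word| { direction: word } }
end

entries = scan_to_hashes("north south east")
p entries.first[:direction] # => "north"
p entries.map { |entry| entry[:direction] }
#=> ["north", "south", "east"]
```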
If you're not instantiating Lexicon objects, you can use a Module which may make it more clear that you're not instantiating objects.
Also, there is no need to use an extra variable (i.e. broken_words), and I prefer the { } block syntax over the do..end syntax for functional blocks vs. iterative blocks.
module Lexicon
def self.scan str
str.split.map {|word| [ 'direction', word ] }
end
end
UPDATE: based on Cary's comment (I assume he meant split when he said scan), I've removed the superfluous argument to split.
I am writing a small project in Ruby that takes all of the words from a website and then sorts them short to long.
To verify that what gets sorted is actually valid English, I am comparing an array of scraped words to the basic Unix/OS X words file.
The method to do this is the spell_check method. The problem is that when used on small arrays it works fine, but with larger ones it will let non-words through. Any ideas?
def spell_check (words_array)
dictionary = IO.read "./words.txt"
dictionary = dictionary.split
dictionary.map{|x| x.strip }
words_array.each do |word|
if !(dictionary.include? word)
words_array.delete word
end
end
return words_array
end
I simplified your code, maybe this will work?
def spell_check(words)
lines = IO.readlines('./words.txt').map { |line| line.strip }
words.reject { |word| !lines.include? word }
end
I noticed that you were trying to modify the words_array while you were simultaneously iterating over it with each:
words_array.each do |word|
if !(dictionary.include? word)
words_array.delete word # Delete the word from words_array while iterating!
end
end
I'm not sure if this is the case in Ruby, but in other programming languages like Java and C#, trying to modify a collection, while you're iterating over it at the same time, invalidates the iteration, and either produces unexpected behavior, or just throws an error. Maybe this was your problem with your original code?
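For reference, Ruby does not raise an error here. Array#each walks the array by index, so when you delete the current element, the following elements shift left and the next one gets skipped. A small sketch with a made-up dictionary and word list shows a non-word slipping through:

```ruby
require 'set'

# Hypothetical data for illustration.
dictionary = Set.new(%w[apple pear])
scraped    = %w[xx yy apple zz pear]

# Deleting while iterating: "yy" is never visited, because it slides into
# the slot that each has already passed after "xx" is removed.
buggy = scraped.dup
buggy.each { |word| buggy.delete(word) unless dictionary.include?(word) }
p buggy
#=> ["yy", "apple", "pear"]

# Building a new array avoids the problem entirely.
safe = scraped.select { |word| dictionary.include?(word) }
p safe
#=> ["apple", "pear"]
```

That would also explain why small inputs looked fine: the bug only shows up when two non-words sit next to each other.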
Also, the return statement was unnecessary in your original code, because the value of the last expression evaluated in a Ruby method is returned automatically (unless an explicit return runs first). It's idiomatic Ruby to leave it out in such cases.