Cucumber Ruby read Word Doc or Text - ruby

I'm writing tests that will be confirming a lot of text on a page. It's for Terms and Conditions pages, Cookies, Privacy Policy etc. Not what I'd like to do but it's a requirement I can't avoid. I've heard that Cucumber can open a text file like .txt or .doc and compare it with the text on screen.
I've tried to find any reference to this but have come up short. Is anyone able to point me in the right direction please?
Thanks

Cucumber/aruba has a Given step that goes like this :
Given a file named "foo" with:
"""
hello world
"""
You would then be able to check that your webpage has that content with:
Then I should see "hello world"
And your step :
Then /^I should see "([^"]*)"$/ do |text|
  expect(page).to have_content(text)
end

I continued to play and found this works for a .txt file:
File.foreach(doc) do |line|
line = line.strip
within_window(->{ page.title == 'Cookies' }) do
page_content = find('#main').text
page_content.gsub!('‘',"'")
page_content.gsub!('’',"'")
expect(page_content).to have_text line
end
end
end
doc is my filename variable. I used strip to remove the newline characters created when pasting into the txt file. The cookies open up in a new tab so I navigated to that. I had invalid UTF-8 characters so just used gsub on those.
I'm sure there's a much better way to do this but this works for now
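The comparison above can be separated from the browser so it is easier to check on its own. This is only a sketch under assumptions: `normalize_quotes` and `missing_lines` are hypothetical helper names, not part of Cucumber or Capybara, and only the two curly single quotes from the snippet above are normalized.

```ruby
# Hypothetical helpers; not part of Cucumber or Capybara.

# Replace curly single quotes with plain apostrophes.
def normalize_quotes(text)
  text.tr("‘’", "''")
end

# Return every non-blank line of expected_text that does not
# appear in page_text after quote normalization.
def missing_lines(expected_text, page_text)
  page = normalize_quotes(page_text)
  expected_text.each_line
               .map { |line| normalize_quotes(line.strip) }
               .reject(&:empty?)
               .reject { |line| page.include?(line) }
end
```

Inside a step, something like expect(missing_lines(File.read(doc), find('#main').text)).to be_empty would then report all missing lines at once instead of stopping at the first one.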

I'd recommend using approval tests. A person approves the original text, then the automated test verifies that it has not changed. If the license text changes, the test fails. If the change is intentional, the tester approves the new text, and the check is automated again from then on.
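A rough sketch of the approval idea, assuming plain text files on disk. `verify_approved` and the `.approved.txt` / `.received.txt` naming are made up for illustration and are not the API of any particular approval-testing gem.

```ruby
require "fileutils"

# Hypothetical helper illustrating the approval-test idea.
# Returns true when received_text matches the approved file.
def verify_approved(name, received_text, dir: "approvals")
  FileUtils.mkdir_p(dir)
  approved_path = File.join(dir, "#{name}.approved.txt")
  received_path = File.join(dir, "#{name}.received.txt")

  if File.exist?(approved_path) && File.read(approved_path) == received_text
    # Nothing changed, so remove any stale received file.
    File.delete(received_path) if File.exist?(received_path)
    return true
  end

  # Save what was seen so a person can review it and, if it is
  # correct, rename it to the .approved.txt file.
  File.write(received_path, received_text)
  false
end
```

In a step definition the received text would be whatever the page shows, for example find('#main').text, and the step would fail when the helper returns false.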

Related

Ruby sub not applying in the output file using IO.write

I have an issue about getting the same result in the output txt file that I get applying the sub method in a string. So the thing is when I apply the following code in a single string I get the \n before the capital letter in the middle of the string:
line3= "We were winning The Home Secretary played a
important role."
line3.sub(/(?<!^) *+(?=[A-Z])/, "\n")
=> "We were winning\nThe Home Secretary played a\n important role."
But if I apply the following code the txt file that I get doesn't have any \n before the capital letter.
old= File.readlines("Modificado word.txt")
second= old.join
third= second.sub(/(?<!^) *+(?=[A-Z])/, "\n")
new= IO.write("new.txt", third)
I've tried multiple ways of encoding (surely in the wrong way) because I thought the issue might be there, but none of them worked. I also tried gsub, but that didn't work either.
Ok, I've got the solution. I don't know why, but the txt file was in an encoding that readlines could not read properly, so I copied all the content into a new txt file created from scratch, and it worked :)

How to align horizontally a multiline CLI help text with OptionParser

I am trying to make the output of my CLI Ruby gem my_command --help cleaner.
There are some CLI options and flags that need a couple of sentences to explain them. I haven't found a way to properly align this text in its column when the explanation is too long to fit inside a regular terminal width.
I want to have something like this:
ab --help as an example, note how some flags have a multiple line explanations with a proper alignment.
Right now, I am doing something like this in OptionParser to keep text aligned in their column in case we need multiple lines to explain something:
opts.on("-d", "--directory PATH", String, "Directory to save the downloaded files into\n\t\t\t\t Default is ./websites/ plus the domain name") do |t|
options[:directory] = t
end
It's working, but it doesn't seem optimal nor clean to have \t everywhere to force formatting. Plus, I can see cases where it's not being formatted correctly in other terminal configurations.
How can I align horizontally a multiline CLI help text with OptionParser in a clean way?
You can force line breaks without needing to add tabs by adding more parameters to opts.on:
opts.on("-d", "--directory PATH", String,
"Directory to save the downloaded files into",
"Default is ./websites/ plus the domain name") do |t|
options[:directory] = t
end
This isn't very clearly documented in official documentation, but you can see it used in the complete example.
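A small self-contained sketch of this approach. The banner text and the `options` hash are assumptions for illustration; each extra description string passed to opts.on is printed on its own line in the description column.

```ruby
require "optparse"

options = {}
parser = OptionParser.new do |opts|
  opts.banner = "Usage: my_command [options]"

  # Each additional string becomes a separate, aligned help line.
  opts.on("-d", "--directory PATH", String,
          "Directory to save the downloaded files into",
          "Default is ./websites/ plus the domain name") do |path|
    options[:directory] = path
  end
end

help_text = parser.help
puts help_text
```

Because OptionParser does the padding itself, the alignment does not depend on how a terminal renders tab characters.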

Ignore commented out code when using YARD

I have some Ruby code that looks like this:
# some_string = "{really?}"
where the curly braces need to be part of the string. This line is commented out code that I'd like to remain there. I'm additionally using YARD to document code, so when I run yard doc it (naturally) throws a warning about being unable to link "really".
Is there a way I can tell YARD to ignore commented out code?
On the one hand, YARD is documented as supporting Rdoc markup. And Rdoc is documented to support a couple of ways to hide parts.
RDoc stops processing comments if it finds a comment line starting
with -- right after the # character (otherwise, it will be treated as
a rule if it has three dashes or more). This can be used to separate
external from internal comments, or to stop a comment being associated
with a method, class, or module. Commenting can be turned back on with
a line that starts with ++.
:stopdoc: / :startdoc:
Stop and start adding new documentation elements to the current
container. For example, if a class has a number of constants that you
don’t want to document, put a :stopdoc: before the first, and a
:startdoc: after the last. If you don’t specify a :startdoc: by the end
of the container, disables documentation for the rest of the current
file.
Source
On the other hand, I have never persuaded Rdoc or YARD to follow that markup. If your luck is better than mine, you can stop reading here.
If you, too, can't persuade YARD to follow that markup, I think your best bet might be to cut that line, and commit the file with a distinctive commit message--one that you'll be able to find easily by grepping the source control logs.
Finally, rake lets you transform text (code) files in arbitrary ways. You can write a Rakefile to delete lines before processing them through YARD.
$ cat silly-ruby-file.src
class Something
def this_method
end
def that_method
# some_string = "{really?}" # Hide me
end
end
I appended the text # Hide me; it's a lot easier to filter that specific text than it is to filter commented lines of arbitrary code.
$ cat Rakefile
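task :default => "silly-ruby-file.rb"
# Rebuild the .rb file whenever the .src file changes.
file "silly-ruby-file.rb" => "silly-ruby-file.src" do
  sh "grep -v '# Hide me' silly-ruby-file.src > silly-ruby-file.rb"
end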
This tells rake to run grep, copying all lines except those that have the text "# Hide me" to stdout, which is redirected to "silly-ruby-file.rb".
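The same filtering can be done in plain Ruby if shelling out to grep is not wanted. This is a sketch only: `HIDE_MARKER` and `strip_hidden_lines` are hypothetical names, and the marker text is the one chosen above.

```ruby
# Hypothetical marker; any distinctive trailing comment would do.
HIDE_MARKER = "# Hide me"

# Return the source with every line containing the marker removed.
def strip_hidden_lines(source)
  source.each_line.reject { |line| line.include?(HIDE_MARKER) }.join
end
```

A file task could then call File.write("silly-ruby-file.rb", strip_hidden_lines(File.read("silly-ruby-file.src"))) in place of the sh line.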

Regex Markdown Header

I'm trying to create a regular (ruby) expression which checks for multiple conditions. I use this regex to replace the content of my object. My regex is close to finished, except two problems I'm facing with regard to markdown.
First off, headers are giving me trouble. For example, I don't want to replace the word "Hi" with "Hello" if "Hi" is in a header.
Hi John <== # should not change
==================
Text: Hi, how are you? <== # Should be: Hello, how are you? after substitution
Or:
#### Hi Peter <== # should not change
Text: Hi, how are you? <== # Should be: Hello, how are you? after substitution
Question: How can I escape markdown headers within my regex? I've tried negative lookbehind and lookahead assertions, but to no avail.
My second problem should be quite easy, but somehow I'm struggling. If a word is italic (for example _hi_), I want to find and replace it without changing the underscores. I can find the word with this regex:
\b[_]*hi[_]*\b
Question 2: But if I would replace it, I would also change the underscores. Is there a way to only detect the word itself and replace it, while still using word boundaries?
Code Example
@website.autolinks.all.each do |autolink|
  autolink.name # for example returns "Iphone5"
  autolink.url # for example returns "http://www.apple.com"
  regex = /\b(?<!##\s)(?<![\d.\[])([_]*)#{autolink.name}([_]*)(?![\d'"<\/a>])\b/
  if @permalink.blog_entry.content.match(regex)
    @permalink.blog_entry.content.gsub!(regex, "[#{autolink.name}](#{autolink.url})")
  end
end
Example text
Iphone5
==============
Iphone5 is the best mobile phone there is, even though the people at Samsung probably think, or perhaps only hope that their Samsung Galaxy S3 is better.
#### Samsung Galaxy S3?
Yes, that's the name of the newest Samsung phone.
This will result in a text with HTML tags, but when I use my regex my content uses Markdown syntax (used before the markdown converter).
Regexes work best when they do one clear thing. If you have multiple conditions, your code should usually reflect that by dividing the processing into steps.
In this case, you have two clear steps:
Use a simple regex or other logic to skip over the header portion of the message.
Once you know you are in the content, use another regex to process the content.
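The two steps above can be sketched as a line-by-line pass. This is an illustration under assumptions, not the asker's code: `replace_outside_headers` is a hypothetical helper, and it only recognizes "#"-style headers and headers underlined with = or - characters.

```ruby
# Hypothetical helper: replace `word` with `replacement` in body text,
# leaving Markdown header lines untouched.
def replace_outside_headers(markdown, word, replacement)
  lines = markdown.lines
  pattern = /\b#{Regexp.escape(word)}\b/
  underline = /\A(=+|-+)\s*\z/

  lines.each_with_index.map do |line, i|
    next_line = lines[i + 1].to_s
    # Step 1: decide whether this line belongs to a header.
    header = line.match?(/\A[#]{1,6}\s/) ||  # "#### Title"
             next_line.match?(underline) ||  # title above "====="
             line.match?(underline)          # the "=====" line itself
    # Step 2: only touch content lines.
    header ? line : line.gsub(pattern, replacement)
  end.join
end
```

For example, replace_outside_headers(text, "Hi", "Hello") would leave "#### Hi Peter" alone while changing "Text: Hi, how are you?".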
I've found a solution:
regex = /(?<!##\s)(?<![\d.\[a-z])#{autolink.name}(?![\d'"a-z<\/a>])(?!.*\n(==|--))/i
if @permalink.blog_entry.content.match(regex)
  @permalink.blog_entry.content.gsub!(regex, "[\\0](#{autolink.url})")
end

How do I extract links from HTML using regex?

I want to extract links from google.com; My HTML code looks like this:
<a href="http://www.test.com/" class="l"
It took me around five minutes to find a regex that works using www.rubular.com.
It is:
"(.*?)" class="l"
The code is:
require "open-uri"
url = "http://www.google.com/search?q=ruby"
source = URI.open(url).read # Kernel#open no longer accepts URLs in Ruby 3.0+
links = source.scan(/"(.*?)" class="l"/)
links.each { |link| puts "#{link}" }
The problem is that it is not outputting the website links.
Those links actually have class=l, not class="l". By the way, to figure this out I added some logging to the script so that you can see the output at various stages and debug it. I searched for the string you were expecting to find and didn't find it, which is why your regex failed. So I looked for the string you actually wanted and changed the regex accordingly. Debugging skills are handy.
require "open-uri"
url = "http://www.google.com/search?q=ruby"
source = URI.open(url).read # Kernel#open no longer accepts URLs in Ruby 3.0+
puts "--- PAGE SOURCE ---"
puts source
links = source.scan(/<a.+?href="(.+?)".+?class=l/)
puts "--- FOUND THIS MANY LINKS ---"
puts links.size
puts "--- PRINTING LINKS ---"
links.each do |link|
puts "- #{link}"
end
I also improved your regex. You are looking for some text that starts with the opening of an a tag (<a), then some characters of some sort that you don't care about (.+?), an href attribute (href="), the contents of the href attribute that you want to capture ((.+?)), some spaces or other attributes (.+?), and lastly the class attribute (class=l).
I have .+? in three places there. The . means any character, the + means there must be one or more of the thing right before it, and the ? means that the .+ should try to match as short a string as possible.
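A short illustration of what the trailing ? changes. The HTML string and its example URLs are made up for this sketch.

```ruby
# Made-up sample markup with two links on one line.
html = '<a href="http://a.example/" class=l>A</a> ' \
       '<a href="http://b.example/" class=l>B</a>'

# Greedy: .+ runs to the last possible closing quote,
# so both links end up inside a single match.
greedy = html.scan(/href="(.+)"/).flatten

# Lazy: .+? stops at the first closing quote it can,
# so each href is captured separately.
lazy = html.scan(/href="(.+?)"/).flatten

puts greedy.size
puts lazy.inspect
```

The greedy version returns one long string, while the lazy version returns the two URLs on their own.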
To put it bluntly, the problem is that you're using regexes. HTML is what is known as a context-free language, while regular expressions can only recognize the class of languages known as regular languages.
What you should do is send the page data to a parser that can handle HTML code, such as Hpricot, and then walk the parse tree you get from the parser.
What am I doing wrong?
You're trying to parse HTML with regex. Don't do that. Regular expressions cannot cover the range of syntax allowed even by valid XHTML, let alone real-world tag soup. Use an HTML parser library such as Hpricot.
FWIW, when I fetch ‘http://www.google.com/search?q=ruby’ I do not receive ‘class="l"’ anywhere in the returned markup. Perhaps it depends on which local Google you are using and/or whether you are logged in or otherwise have a Google cookie. (Your script, like me, would not.)
