Download PDF files that are href links using Ruby Mechanize

Using Ruby Mechanize, I have successfully submitted input values to a form and can get the resulting page based on the search criteria. The resulting page has PDF files as href links that I need to download.
The href attribute has the value:
href='xxx.do?FILENAME=path/abc.pdf&SEARCHTEXT=aaa&ID=123_4'
where SEARCHTEXT is the text originally entered as input. When I click the link manually, the PDF opens in a new window with the URL http://someip:8080/xxx/temp/123_4, which contains the same ID seen in the href attribute. The actual filename, however, is different and is of the form xxx.123_2_.doc. My code below returns a 0-byte file:
scraper.pluggable_parser.pdf = Mechanize::FileSaver
File.open('n1pdf.pdf', 'wb'){|f| f << scraper.get(alink).body}
where alink=http://someip:8080/xxx/temp/123_4
If I use
File.open("new.pdf", "w") do |f|
uri = URI(alink)
f << Net::HTTP.get(uri)
end
I get an HTTP Not Found error.
I am not sure if I am doing this correctly. Is ID a session ID that is generated dynamically? All PDF files on the resulting page have this ID with _1, _2, _3 appended as the filename (or URL).
Please note that whenever I manually click and open a PDF file and then hardcode that URL in my code, the file downloads, but it does not when my code dynamically extracts the ID value and assigns it to alink. I am not sure if this is related to cookies. Kindly help. Thank you.

Make sure it's the right absolute URL:
# a is the Nokogiri <a> node for the PDF link; its href is relative to the current page
uri = scraper.page.uri.merge(a[:href])
puts uri # just check to be sure
File.open('n1pdf.pdf', 'wb'){|f| f << scraper.get(uri).body}
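A relative href like the one in the question is resolved against the URL of the page it was found on, not against the server root. The sketch below uses only Ruby's standard uri library to show what merge produces. The results-page URL http://someip:8080/xxx/search.do is a hypothetical stand-in, since the question does not show it.

```ruby
require 'uri'

# Hypothetical URL of the search-results page (not given in the question).
page_uri = URI('http://someip:8080/xxx/search.do')

# The href value quoted in the question.
href = 'xxx.do?FILENAME=path/abc.pdf&SEARCHTEXT=aaa&ID=123_4'

# A relative reference is resolved against the page's directory (/xxx/),
# keeping the page's scheme, host and port; the query string is preserved.
pdf_uri = page_uri.merge(href)
puts pdf_uri
```

Passing the merged URI to scraper.get keeps the request inside the same Mechanize agent, which sends the cookies it collected while submitting the form. A standalone Net::HTTP.get call does not share those cookies, which may matter if the server ties the generated IDs to the session.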

Related

How to assert that image has been uploaded with Capybara?

Let's say I have a form where a user can create a post with an image attached to it. I want to make sure the attached image is displayed on the next page:
visit url
fill_in the_form
click_on 'Create'
assert_selector '.post'
post = Post.first
img = page.find '.post .image'
assert_equal post.file.thumb.url, URI(img[:src]).path
But I'm told asserting against database objects in system tests is to be avoided. What do I do then?
So long as there's no "complex" file renaming happening on the backend, you already know the uploaded filename when populating the form:
fill_in the_form
Therefore, you could assert that the page contains an image with this name (perhaps using XPath).
If there is trivial file renaming (e.g. replacing spaces with hyphens), then you could either (ideally) just choose a filename that does not change, or reproduce the renaming in your test.
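A minimal sketch of the "reproduce the renaming" option, in plain Ruby so it runs without a browser. It assumes the backend's only change is replacing spaces with hyphens and that the stored name appears at the end of the image URL (a version prefix such as thumb_ is tolerated because only the ending is checked). The helper names expected_upload_name and shows_uploaded_image? are hypothetical.

```ruby
require 'uri'

# Hypothetical mirror of the backend's trivial renaming rule:
# spaces become hyphens, everything else is kept.
def expected_upload_name(original_filename)
  File.basename(original_filename).tr(' ', '-')
end

# True when the path of the image URL ends with the renamed upload, so a
# versioned name such as "thumb_my-photo.jpg" still matches.
def shows_uploaded_image?(img_src, original_filename)
  URI(img_src).path.end_with?(expected_upload_name(original_filename))
end
```

In the system test this could replace the database lookup, e.g. assert shows_uploaded_image?(page.find('.post .image')[:src], 'my photo.jpg'). Because it only checks the ending, a distinctive fixture filename keeps false matches unlikely.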

Ruby Watir -- Trying to loop through links in cnn.com and click each one of them

I have created this method to loop through the links in a certain div on the website. The purpose of the method is to collect the links, insert them into an array, and then click each one of them.
require 'watir-webdriver'
require 'watir-webdriver/wait'
site = Watir::Browser.new :chrome
url = "http://www.cnn.com/"
site.goto url
box = Array.new
container = site.div(class: "column zn__column--idx-1")
wanted_links = container.links
box << wanted_links
wanted_links.each do |link|
link.click
site.goto url
site.div(id: "nav__plain-header").wait_until_present
end
site.close
So far I am only able to click the first link; then I get this error message:
unable to locate element, using {:element=>#<Selenium::WebDriver::Element:0x634e0a5400fdfade id="0.06177683611003881-3">} (Watir::Exception::UnknownObjectException)
I am very new to ruby. I appreciate any help. Thank you.
The problem is that once you navigate to another page, all of the element references (i.e. those in wanted_links) become stale. Even if you return to the same page, Watir/Selenium does not know it is the same page and does not know where the stored elements are.
If you are going to navigate away, you need to collect all of the data you need first. In this case, you just need the href values.
# Collect the href of each link
wanted_links = container.links.map(&:href)
# You have each page URL, so you can navigate directly without returning to the homepage
wanted_links.each do |link|
  site.goto link # link is an href string, so there are no stale elements
end
In the event that the links do not navigate directly to a page (e.g. they execute JavaScript when clicked), you will need to collect enough data to re-locate the elements later. What you use as the locator will depend on what is known to be static/unique. As an example, I will assume that the link text is a good locator.
# Collect the text of each link
wanted_links = container.links.map(&:text)
# Iterate through the links
wanted_links.each do |link_text|
container = site.div(class: "column zn__column--idx-1")
container.link(text: link_text).click
site.back
end

Content-Disposition inline filename issue with IE

I am displaying a PDF inline in the browser, fetched from an API, using an .aspx page.
When saving the PDF in Chrome/Firefox, the browser takes the filename from the header ("Content-Disposition", "inline;filename=xyz.pdf").
But when saving the PDF in IE, it does not read the filename from the header ("Content-Disposition", "inline;filename=xyz.pdf"); instead it uses the .aspx page name.
Technical details
I have an xyz.aspx page.
The xyz.aspx page invokes an API to get a document.
The document downloaded from the API is then sent to the browser inline to display the PDF.
I am setting the response headers as below and writing the file bytes.
HttpContext.Current.Response.ClearHeaders();
Response.AddHeader("Content-Disposition", "inline;filename=\"" + Name + "\"");
HttpContext.Current.Response.ContentType = "application/pdf";
Issue:
When saving the opened PDF in IE, it uses xyz.aspx instead of the name from the header.
Requirement:
When the PDF is saved in IE, it needs to be saved with the PDF's name.
I have searched a lot, and everyone says this is IE behavior. I hope someone knows a solution.
Note: I have to display the PDF in the browser and then save it, not download it using "attachment".
It is true that some versions of IE can't handle ("Content-Disposition", "inline;filename=...").
This is because filename=... was originally intended for the attachment disposition. Not all browser-based PDF viewers can handle it.
The only solution I see is to allow access via a different URL.
Suppose you have a route to the PDF like /pdf/view. If you change it to /pdf/view/filename and configure your application to handle this route the same way as /pdf/view, your problem is solved.
You can also rewrite the download URL on the web server.
Depending on your web server, there are various ways of doing this.
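A sketch of that route idea, in Ruby like the other examples on this page: the filename becomes the last path segment, percent-encoded so that names with spaces or non-ASCII characters still form a valid URL, and the server reads it back from the path. The /pdf/view prefix comes from the example above; pdf_view_path and filename_from_path are hypothetical helpers, and wiring the route itself depends on your framework.

```ruby
require 'erb'
require 'uri'

# Builds /pdf/view/<filename>, percent-encoding the name so spaces and
# non-ASCII characters survive in a URL path (space becomes %20).
def pdf_view_path(filename)
  "/pdf/view/#{ERB::Util.url_encode(filename)}"
end

# Server side: the last path segment is only the display name. The handler
# can serve the same PDF it would serve for /pdf/view and ignore this value.
def filename_from_path(path)
  URI.decode_www_form_component(path.split('/').last)
end
```

The Content-Disposition header can stay as it is; the point is that a viewer which ignores it still sees a URL whose last segment is the real filename.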
I have also tried all kinds of headers and methods.
In the end, my solution was
private FileResult ReturnStreamAsPdf(string fileName, Stream stream)
{
ContentDisposition cd = new ContentDisposition
{
FileName = fileName,
Inline = true // false = prompt the user for downloading; true = browser to try to show the file inline
};
Response.Headers.Add("Content-Disposition", cd.ToString());
Response.Headers.Add("X-Content-Type-Options", "nosniff");
return new FileStreamResult(stream, MediaTypeNames.Application.Pdf);
}
and the Route Attribute on the method:
[Route("api/getpdfticket/{ticketnumber}")]
public async Task<FileResult> GetPdfTicket(string ticketnumber)
And the href:
href="@($"/api/getpdfticket/{product.TicketNumber}.pdf")"
It seems like Microsoft is still inventing their own standards:
http://test.greenbytes.de/tech/tc2231/#inlwithasciifilenamepdf
PDF Handler : content-disposition filename

Ruby Watir-webdriver saving image when navigating directly to the image

I'm trying to grab a set of information from a series of pages that are loaded via JS. To accomplish that, I'm using watir-webdriver to load the pages and Nokogiri to parse them. This works well; however, I also need to grab a picture off each page. The path of the picture is generated when the page loads, so I wrote the following to create an array of relative URLs to the images and navigate directly to the absolute URL of the entry at index 1 of the array, which is always the image I want.
img_srcs = $page_html.css('img').map{ |i| i['src'] } #generates an array of relative urls pointing to every image
imageURL= "website.com" + img_srcs[1].gsub("..","").to_s #takes the relative URL of the image at index position 1 and converts it to an absolute URL
$browser.goto(imageURL)
How can I save this image, which the browser has loaded directly? Any help would be appreciated, and please let me know if anything is unclear.
Edit:
I've now added the following code
image_source = $browser.image(:class => "decoded").image.src
File.open("#{$imageID}.txt", "w") do |f|
f.write open(image_source).read
f.close
end
However, I'm getting the error
C:/Ruby192/lib/ruby/gems/1.9.1/gems/watir-webdriver-0.6.4/lib/watir-webdriver/elements/element.rb:490:in 'assert_exists': unable to locate element, using {:tag_name=>"img"} (Watir::Exception::UnknownObjectException)
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/watir-webdriver-0.6.4/lib/watir-webdriver/attribute_helper.rb:71:in 'block in define_string_attribute'
from 12.rb:121:in 'imageDownload'
from 12.rb:134:in 'navAndGrab'
from 12.rb:137:in '<main>'
When you do:
$browser.image(:class => "decoded").image.src
You are looking for the html:
<img class="decoded">
<img src="what_you_want"></img>
</img>
I am guessing your html is not like that, hence you get the exception regarding finding the image within the image.
You probably just want the first image with class decoded (remove the second .image):
image_source = $browser.image(:class => "decoded").src
Or maybe you want the full list of images and then get the first one:
image_source = $browser.images(:class => "decoded").first.src
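Once the src is known, the browser does not need to navigate to the image at all. A sketch assuming Ruby 2.5+ (for URI.open from open-uri) and an image that can be fetched without the browser's session cookies. URI.join resolves a raw, possibly relative src taken from Nokogiri (e.g. ../images/pic.png) against the page URL, instead of stripping ".." with gsub; the bytes are then written in binary mode, because text mode ("w") translates newlines on Windows and corrupts image data. absolute_image_url and save_image are hypothetical helper names, and the page URL must include a scheme such as http://.

```ruby
require 'open-uri'
require 'uri'

# Resolve a raw (possibly relative) src attribute against the URL of the
# page it was found on, e.g. http://website.com/a/b/page.html.
def absolute_image_url(page_url, src)
  URI.join(page_url, src).to_s
end

# Copy the bytes at +source+ (an http/https URL, or a local file path)
# into +dest+. Binary mode on both sides keeps image data intact.
def save_image(source, dest)
  URI.open(source, 'rb') { |io| File.binwrite(dest, io.read) }
  dest
end
```

With the corrected Watir lookup, the result could be passed in directly, e.g. save_image($browser.image(:class => "decoded").src, "#{$imageID}.png"), assuming that src is an absolute http(s) URL and the file really is a PNG.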

Selenium 2.0 Webdriver & Ruby, link element methods other than .text? Navigate.to links in array?

I'm a bit further along in converting some sample test/specs from Watir to Selenium. After my last question here and suggested response, I began using Selenium 2.0 with WebDriver instead of Selenium 1.
The example in question deals with gathering all links within a table into an array -- that part is complete. However, once the links are in the array, the only meaningful way that I can interact with them appears to be .text. Calling @driver.navigate.to with an element of the array gives a URL format error in the browser, and neither link.href nor .src is a valid method.
The Watir implementation gathered these links (pages added by users via CMS), stored them in an array and then visited each page one by one, submitting a lead form. I believe I could get this to work using Selenium and revisiting the "home" page that contains all of the links between lead form submissions, but that could mean hundreds of extra page loads, cached or not.
The code so far:
@countries = Array.new
@browser.navigate.to "http://www.testingdomain#{$env}.com/global"
@browser.find_elements(:xpath, "//table[@class='global-list']//a").each do |link|
  @countries << [link.text, link.href] ## The original WATIR line that needs an update
end #links
@countries.uniq! #DEBUG for false CMS content
The closest item I could find in the selenium-webdriver documentation was the element's attribute(name) method, but again, I was unsure which attributes it accepts.
I was not sure of the format for use with the attribute method, but after some experimenting I was able to move past this step.
@countries = Array.new
@browser.navigate.to "http://www.testingdomain#{$env}.com/global"
@browser.find_elements(:xpath, "//table[@class='global-list']//a").each do |link|
  text = link.attribute("text")
  href = link.attribute("href")
  @countries << [text, href]
end #links
@countries.uniq! #DEBUG for false CMS content
Looks like you found your answer to your question on your own.
Indeed, element.attribute allows you to pull any HTML attribute a tag could possibly have. Since you wanted the URL of the element, you used element.attribute('href') to return the element's href="" attribute. The same can be done for any other attribute, including class, id, style, etc.
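Putting the two steps from the question together: a sketch that collects [text, href] pairs with text and attribute('href'), then visits each URL directly with navigate.to, so the page listing the links is loaded only once instead of between every lead form. It uses Selenium's method names but relies only on duck typing, so it can be exercised with stand-in objects; collect_links and visit_each are hypothetical helpers.

```ruby
# Collect [text, href] pairs from link elements. Any object that responds to
# #text and #attribute(name) works, e.g. Selenium::WebDriver::Element.
# Duplicate pairs (e.g. from repeated CMS content) are dropped.
def collect_links(elements)
  elements.map { |link| [link.text, link.attribute('href')] }.uniq
end

# Visit every collected href directly via driver.navigate.to, yielding the
# pair after each page load (e.g. to fill in and submit the lead form).
def visit_each(driver, links)
  links.each do |text, href|
    driver.navigate.to(href)
    yield text, href if block_given?
  end
end
```

With the question's code this would look like links = collect_links(@browser.find_elements(:xpath, "//table[@class='global-list']//a")) followed by visit_each(@browser, links) { |country, url| ... }.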
