Selenium: automatic download

Selenium: automatic download - windows

I am using Selenium with Python.
I am stuck at the final step: I can open the browser, log in, and open the download link, but the file does not download automatically.
According to the documentation, I just need to call fp.set_preference("browser.helperApps.neverAsk.saveToDisk","...") with the right values, but it still does not work.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import os
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",2)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir", os.getcwd())
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/EDI-X12, application/EDIFACT, application/javascript, application/octet-stream, application/ogg, application/pdf, application/xhtml+xml, application/x-shockwave-flash, application/json, application/xml, application/zip, audio/mpeg, audio/x-ms-wma, audio/vnd.rn-realaudio, audio/x-wav, image/gif, image/jpeg, image/png, image/tiff, image/vnd.microsoft.icon, image/vnd.djvu, image/svg+xml, multipart/mixed, multipart/alternative, multipart/related, text/css, text/csv, text/html, text/javascript(obsolete), text/plain, text/xml, video/mpeg, video/mp4, video/quicktime, video/x-ms-wmv, video/x-msvideo, video/x-flv, video/webm, application/vnd.oasis.opendocument.text, application/vnd.oasis.opendocument.spreadsheet, application/vnd.oasis.opendocument.presentation, application/vnd.oasis.opendocument.graphics, application/vnd.ms-excel, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, application/vnd.ms-powerpoint, application/vnd.openxmlformats-officedocument.presentationml.presentation, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/vnd.mozilla.xul+xml")
browser = webdriver.Firefox(firefox_profile=fp)
browser.get('https://www4.webcas.net/mail02/menu')
browser.find_element_by_xpath("//input[@type='text']").send_keys('login')
browser.find_element_by_xpath("//input[@type='password']").send_keys('pass' + Keys.RETURN)
time.sleep(2) # delays for 2 seconds
browser.get('https://www4.webcas.net/mail02/fm/onetime-ticket?to=enquete')
time.sleep(2) # delays for 2 seconds
browser.get('https://www4.webcas.net/form02/operator/formulator/download?enquete_id=4770')
time.sleep(2) # delays for 2 seconds
In the end, I always get the Firefox popup "would you like to open or save".
Is there anything I am doing wrong?

I found the solution.
The method is correct, but the parameter values were not.
I had to check the file
C:\Users\User\AppData\Roaming\Mozilla\Firefox\Profiles\tofzlgfm.default\mimeTypes.rdf
where I found the following lines:
<RDF:Description RDF:about="urn:mimetype:text/*"
NC:value="text/*"
NC:editable="true"
NC:fileExtensions="zip"
NC:description="WinZip File">
<NC:handlerProp RDF:resource="urn:mimetype:handler:text/*"/>
</RDF:Description>
So I just added text/* to the other values in my browser.helperApps.neverAsk.saveToDisk line.
I hope this will be useful to someone.
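As a minimal sketch of the fix described above, the preferences can be built from a Python list so that adding text/* is a one-line change. The helper names build_download_prefs and apply_prefs are hypothetical, and the MIME types other than text/* are only examples. The only Selenium call assumed is set_preference, as used in the question.

```python
import os


def build_download_prefs(download_dir, mime_types):
    """Return Firefox preferences that save the given MIME types without asking."""
    return {
        "browser.download.folderList": 2,  # 2 = use the custom directory below
        "browser.download.manager.showWhenStarting": False,
        "browser.download.dir": download_dir,
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def apply_prefs(profile, prefs):
    """Copy every preference onto an object exposing set_preference(name, value)."""
    for name, value in prefs.items():
        profile.set_preference(name, value)
    return profile


# text/* is the type found in mimeTypes.rdf; the other entries are examples.
MIME_TYPES = ["application/zip", "application/octet-stream", "text/csv", "text/*"]
PREFS = build_download_prefs(os.getcwd(), MIME_TYPES)

# With Selenium installed, as in the question:
#   fp = apply_prefs(webdriver.FirefoxProfile(), PREFS)
#   browser = webdriver.Firefox(firefox_profile=fp)
```

Which MIME type the server actually sends varies by site, so the list above should be adapted to what mimeTypes.rdf or the response headers show.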

I was having the same problem with PDF files. To work around it, I also added the following to the profile:
fp.set_preference("browser.helperApps.neverAsk.openFile", "application/pdf")
fp.set_preference("pdfjs.disabled", True)
So in your case you should change the file types to match the ones you want, and you can ignore the second line. Hope this helps, cheers.
Bonus point: to increase performance, disable image loading with
fp.set_preference("permissions.default.image", 2)

Related

Unable to see Response Body in JMeter, View Results Tree listener

When I run an HTTP Request against a page that should return a response body (I know it works because I already tried it in Postman), the sampler returns a 200 OK code, but the response body in the View Results Tree listener is empty. Why does this happen?
I use macOS, and I installed JMeter with Brew. I've already tried adding the following to the user.properties file:
jmeter.save.saveservice.output_format=xml
jmeter.save.saveservice.response_data=true
jmeter.save.saveservice.samplerData=true
jmeter.save.saveservice.requestHeaders=true
jmeter.save.saveservice.url=true
jmeter.save.saveservice.responseHeaders=true
It looks like this:

I had this exact issue, and the fix was to change the Java version: either Java 9 or, in my case, Java 8. That fixed the issue, and I am now able to see the response body and response headers.

The changes you've made have no impact on the View Results Tree listener output; they only affect .jtl results files.
Try the following:
Run your JMeter test in command-line non-GUI mode like
jmeter -n -t test.jmx -l result.xml
and open the result.xml file with your favourite text or XML viewer/editor. You should see something like:
<?xml version="1.0" encoding="UTF-8"?>
<testResults version="1.2">
<httpSample t="93" it="0" lt="93" ct="42" ts="1568029799118" s="true" lb="HTTP Request" rc="200" rm="OK" tn="Thread Group 1-1" dt="text" by="759" sby="139" ng="1" na="1">
<responseData class="java.lang.String">{
"userId": 1,
"id": 1,
"title": "delectus aut autem",
"completed": false
}</responseData>
<java.net.URL>http://jsonplaceholder.typicode.com/todos/1</java.net.URL>
</httpSample>
</testResults>
where the responseData tag contains the XML-escaped response data. If there is data in the file, most probably something is wrong with your JMeter installation; try re-installing it by downloading JMeter from the official website, as the Brew package might be broken.
Check the jmeter.log file contents: if anything goes wrong, JMeter normally writes a log message there.
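Instead of reading result.xml by eye, the response bodies can be pulled out with a few lines of Python. This is only a sketch: the function name response_bodies and the shortened SAMPLE document are made up for illustration, and it assumes the XML layout shown above (httpSample elements, or sample elements for non-HTTP samplers, each with a responseData child).

```python
import xml.etree.ElementTree as ET


def response_bodies(jtl_xml):
    """Yield (label, response_code, body) for each sample in a JMeter XML result."""
    root = ET.fromstring(jtl_xml)
    for sample in root.iter():
        if sample.tag not in ("httpSample", "sample"):
            continue
        # findtext returns "" when responseData is missing or empty
        body = sample.findtext("responseData", default="")
        yield sample.get("lb"), sample.get("rc"), body


# Shortened stand-in for a real result.xml, for illustration only.
SAMPLE = """<testResults version="1.2">
<httpSample lb="HTTP Request" rc="200" rm="OK">
<responseData class="java.lang.String">{"id": 1}</responseData>
</httpSample>
</testResults>"""

for label, code, body in response_bodies(SAMPLE):
    print(label, code, body or "<empty body>")
```

For a real run, the contents of result.xml can be read in binary mode and passed to response_bodies in place of SAMPLE. An empty body here would mean the data is missing from the file itself, not just from the listener.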

I see you didn't load the generated *.jtl file in your "View Results Tree" panel. You should browse to that file and open it to see the results. Remember: you must add the listener and run the tests in non-GUI mode once; the *.jtl file will then contain the results. Open them here:
And I think the correct results are not shown.

Weblogic 10.3.6 javascript MIME type issue in chrome

Hi, I am using Chrome version 49.0.2623.87 (64-bit) on a Mac.
When I try to load CKEditor in Chrome, I get the following error:
Refused to execute script from 'ckeditor/ckeditor.js' because its MIME type ('') is not executable, and strict MIME type checking is enabled.
It works fine in Firefox and Safari.

The other answer works; however, the entries must be put in the file as
key=value
key=value
For example:
XML=text/xml
js=text/javascript
More info for configuration: https://www.techpaste.com/2014/06/steps-configure-mime-types-weblogic-11g/

This is a problem with WebLogic Server: the server does not set any MIME type in the response when the browser requests ckeditor.js. To resolve this issue, create a file named mimemappings.properties in the config directory of your server.
Place the following content in the file:
XML = text/xml
js = text/javascript

download image from url using python urllib but receiving HTTP Error 403: Forbidden

I want to download an image file from a URL using the Python module "urllib.request". This works for some websites (e.g. mangastream.com) but not for another (mangadoom.co), which returns the error "HTTP Error 403: Forbidden". What could be the problem in the latter case, and how can I fix it?
I am using Python 3.4 on OS X.
import urllib.request
# does not work
img_url = 'http://mangadoom.co/wp-content/manga/5170/886/005.png'
img_filename = 'my_img.png'
urllib.request.urlretrieve(img_url, img_filename)
At the end of the error message it says:
...
HTTPError: HTTP Error 403: Forbidden
However, it works for another website
# work
img_url = 'http://img.mangastream.com/cdn/manga/51/3140/006.png'
img_filename = 'my_img.png'
urllib.request.urlretrieve(img_url, img_filename)
I have tried the solutions from the posts below, but none of them works on mangadoom.co.
Downloading a picture via urllib and python
How do I copy a remote image in python?
The solution here also does not fit, because my case is downloading an image.
urllib2.HTTPError: HTTP Error 403: Forbidden
A non-Python solution is also welcome. Any suggestion will be much appreciated.

This website is blocking the user-agent used by urllib, so you need to change it in your request. Unfortunately I don't think urlretrieve supports this directly.
I recommend the excellent requests library; the code becomes:
import requests
import shutil
r = requests.get('http://mangadoom.co/wp-content/manga/5170/886/005.png', stream=True)
if r.status_code == 200:
    with open("img.png", 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)
Note that this website does not seem to block the requests user-agent. If it ever needs to be changed, that is easy:
r = requests.get('http://mangadoom.co/wp-content/manga/5170/886/005.png',
                 stream=True, headers={'User-agent': 'Mozilla/5.0'})
Also relevant : changing user-agent in urllib

You can build an opener. Here's the example:
import urllib.request
opener=urllib.request.build_opener()
opener.addheaders=[('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
urllib.request.install_opener(opener)
url=''
local=''
urllib.request.urlretrieve(url,local)
By the way, the following two snippets do the same thing:
(without an opener)
req=urllib.request.Request(url,data,hdr)
html=urllib.request.urlopen(req)
(with an opener built)
html=opener.open(url,data,timeout)
However, we cannot add headers when we call:
urllib.request.urlretrieve()
So in this case, we have to build an opener.
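If installing a global opener is not wanted, a possible alternative is to skip urlretrieve and stream the response of a Request that carries its own header. This is a sketch rather than a tested fix for mangadoom.co: the helper names build_request and download are hypothetical, and the "Mozilla/5.0" value is just an example of a browser-like user-agent.

```python
import shutil
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Create a request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, filename):
    """Stream the response body of `url` into `filename` and return the filename."""
    with urllib.request.urlopen(build_request(url)) as response, \
            open(filename, "wb") as out:
        shutil.copyfileobj(response, out)
    return filename


# Example call, using the URL from the question:
#   download('http://mangadoom.co/wp-content/manga/5170/886/005.png', 'my_img.png')
```

The header applies only to this one request, so other urllib calls in the same program keep their default behaviour.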

I tried wget with the URL in a terminal and it works:
wget -O out_005.png http://mangadoom.co/wp-content/manga/5170/886/005.png
so my workaround is to use the script below, which works too.
import os
out_image = 'out_005.png'
url = 'http://mangadoom.co/wp-content/manga/5170/886/005.png'
os.system("wget -O {0} {1}".format(out_image, url))

Getting random "read_nonblock': end of file reached (EOFError)" with Net::HTTP.start

When I execute the following code...
http = Net::HTTP.start('jigsaw.w3.org')
http.request_post('/css-validator/validator', ' ', 'Content-type' => "multipart/form-data")
...then I very often get the following error:
EOFError: end of file reached
from /Users/josh/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/net/protocol.rb:153:in `read_nonblock'
Is this only me? What could be the problem? Sometimes it seems to work, but most of the time it doesn't.

The problem seems to be on the side of the host:
Loading http://jigsaw.w3.org/css-validator/DOWNLOAD.html manually in a browser results most of the time in "no data received" at the moment.
I'm trying to set up the downloadable command line version of the validator on my local machine and use this. More info here: How can I validate CSS on internal web pages?

HTML 5 Cache manifest gets cached itself

My problem is that the cache.manifest file itself seems to get cached: changes to the file are not noticed by (Mobile) Safari, so it never updates and always shows the last cached files.
I tried to avoid this using an .htaccess file in the same directory as the cache.manifest file:
ExpiresActive On
ExpiresDefault "access"
That didn't help, so I turned cache.manifest into a PHP file that sends the following headers:
header("Expires: Mon, 26 Jul 1990 05:00:00 GMT");
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: no-store, no-cache, must-revalidate");
header("Cache-Control: post-check=0, pre-check=0", false);
header("Pragma: no-cache");
header('Content-Type: text/cache-manifest');
Does anyone have other ideas for making sure the manifest file itself is always retrieved?
Works on: Safari (Desktop), Chrome (Samsung Galaxy Tab v10.1), Firefox
Fails on: Chrome, Safari (iOS)
Renamed the cache.manifest.php back to cache.manifest and added the following lines to the .htaccess
<IfModule mod_expires.c>
Header set Cache-Control "public"
ExpiresActive on
# cache.manifest needs re-requests in FF 3.6 (thx Remy ~Introducing HTML5)
ExpiresByType text/cache-manifest "access plus 0 seconds"
</IfModule>
If I change the revision comment within the cache.manifest and refresh it on Safari (iOS) it still shows me the old file. I am clueless.

According to the HTML5 documentation, if an application cache manifest file is byte-for-byte the same as a previous one, regardless of HTTP cache headers for expiry/etc, it is considered to not require an update.
You need to include a comment at the bottom of your cache manifest file with the timestamp of the most recently modified file, e.g.:
# last modified: Thu, 30 Jun 2011 01:19:46 GMT
This breaks the byte-for-byte identity when the list of files stays the same but some of them are updated.

As alluded to in other answers, cache manifests are a real pain to deal with.
I've tweaked a PHP manifest "build" script for my HTML5 notepad app.
Tested and working on Chrome, Firefox, IE8+, Android and iOS.
It's open source and available here: https://github.com/JasonHanley/note5/blob/master/build.php
I also use the ExpiresByType text/cache-manifest "access plus 0 seconds" in my .htaccess and I believe that is necessary in addition to generated manifest timestamps.

I've just stumbled onto this one myself, and in a similar vein to SimpleCoder's suggestion I'd suggest that, if you are using Apache, you can generate the cache.manifest using Server Side Includes, e.g.:
CACHE MANIFEST
# <!--#flastmod file="index.html"-->
# <!--#flastmod file="whatever.js"-->
# <!--#flastmod file="whatever.css"-->
whatever.js
whatever.css
That way, whenever any of those files are updated, the manifest will change automatically. You may also need to enable includes for that file and disable caching, e.g. with an Apache config something like:
Alias /whatever /var/www/whatever
<Directory /var/www/whatever>
Options +Includes
AddHandler server-parsed .manifest
</Directory>
CacheDisable /whatever/ihealth.manifest
Check your server logs to make sure you're returning the file with a "200 OK" rather than a "304 Not Modified".

The cache manifest is a terrible piece of technology.
The browser is not caching the manifest; it simply sees no change in the manifest itself, which is what you are observing. Try adding a random comment or two to your manifest (prefix comments with #) and then see if it works.
Just modifying files that the manifest references won't trigger the browser to redownload the manifest. If this is what you were hoping for, then try this: Use a PHP file to generate your manifest. Of course, use header to set the proper MIME type. After you have echoed out all of your resources, echo out the hash of the timestamp of all of those resources. That way, if one of them is modified, the manifest file changes. This is what I'm using:
// Collect a list of resources we need to check (customize to your needs)
$files = array(
    "/scripts/script1.js",
    "/scripts/script2.js",
    "/scripts/script3.js",
    "/scripts/script4.js",
    "/css/style.css"
);
$filetime = 0;
foreach ($files as $file) {
    $filetime += filemtime($file);
}
// This echoes out the hash of the filetimes as a comment
echo "#" . sha1($filetime);
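The same idea can be sketched in Python for a build step that writes the manifest to disk. This is an illustration, not a port of the PHP above: the function name build_manifest and the root parameter are hypothetical, and it hashes each file name together with its modification time instead of summing the timestamps.

```python
import hashlib
import os


def build_manifest(files, root="."):
    """Return manifest text ending in a comment that changes when any file changes."""
    digest = hashlib.sha1()
    for name in files:
        mtime = os.path.getmtime(os.path.join(root, name))
        # Hash name and mtime together so a change to any file alters the digest
        digest.update(f"{name}:{mtime}".encode("utf-8"))
    lines = ["CACHE MANIFEST", *files, "# " + digest.hexdigest()]
    return "\n".join(lines) + "\n"


# Example call, with file names borrowed from the answers above:
#   print(build_manifest(["whatever.js", "whatever.css"], root="/var/www/whatever"))
```

Because the trailing comment depends only on names and modification times, the manifest stays byte-for-byte identical until one of the listed files is touched.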
