Automatic download file from web page - windows

I am looking for a way to automatically download a file from a website.
Currently the process is entirely manual and tedious.
I go to a web page and enter my login and password.
A pop-up opens, where I have to click a download button to save a .zip file.
Do you have any advice on how I could automate this task?
I am on Windows 7 and can mainly use MS-DOS batch scripts or Python, but I am open to other ideas.

You can use Selenium WebDriver to automate the download. In Java, the snippet below sets the Firefox download preferences:
FirefoxProfile profile = new FirefoxProfile();
profile.setPreference("browser.download.folderList", 2);
profile.setPreference("browser.download.manager.showWhenStarting", false);
profile.setPreference("browser.download.dir", "C:\\downloads");
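// saveToDisk (rather than openFile) is the preference that suppresses the save dialog; the list should include the MIME type the server sends for the .zip.
profile.setPreference("browser.helperApps.neverAsk.saveToDisk","application/zip,application/octet-stream,text/csv,application/x-msexcel,application/excel,application/x-excel,application/vnd.ms-excel,text/html,text/plain,application/msword,application/xml");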
To handle the pop-up when it appears, use the java.awt.Robot class:
Robot robot = new Robot();
robot.keyPress(KeyEvent.VK_DOWN);
robot.keyRelease(KeyEvent.VK_DOWN);
robot.keyPress(KeyEvent.VK_ENTER);
robot.keyRelease(KeyEvent.VK_ENTER);
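Since the question mentions Python, the same download preferences can probably be set from Selenium's Python bindings. The following is only a sketch: the helper names firefox_download_prefs and make_firefox are made up for illustration, the default MIME types are a guess at what the server sends for the .zip, and it uses browser.helperApps.neverAsk.saveToDisk on the assumption that the goal is to save the file without a dialog.

```python
def firefox_download_prefs(download_dir, mime_types):
    """Firefox preferences that save the given MIME types without prompting."""
    return {
        "browser.download.folderList": 2,  # 2 = use browser.download.dir
        "browser.download.manager.showWhenStarting": False,
        "browser.download.dir": download_dir,
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def make_firefox(download_dir,
                 mime_types=("application/zip", "application/octet-stream")):
    """Start Firefox with the preferences above (hypothetical helper)."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    prefs = firefox_download_prefs(download_dir, mime_types)
    for name, value in prefs.items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

Calling make_firefox(r"C:\downloads") should then return a driver that can log in and click the download button without the save dialog appearing, provided the server's Content-Type is in the list.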

You'll want to take a look at requests (to fetch the HTML and the file) and Beautiful Soup (to parse the HTML and find the links).
requests has built-in auth: http://docs.python-requests.org/en/latest/
Beautiful Soup is quite easy to use: http://www.crummy.com/software/BeautifulSoup/bs4/doc/
Pseudocode: use requests to authenticate and download the site's HTML. Parse it and go through the links. If a link meets the criteria, save it in a list; otherwise continue. Once all the links have been scraped, go through the list and download each file with requests: req = requests.get('url_to_file_here', auth=('username', 'password')); if req.status_code == 200, the file is in req.content (use the bytes in req.content rather than req.text, since a .zip is binary).
If you can post a link to the site you want to download from, maybe we can help more.
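The pseudocode above could be sketched roughly as follows. This is an illustration under several assumptions: the site accepts HTTP basic auth (a form-based login would instead need a requests.Session that posts the login form first), the wanted links end in .zip, and the names LinkCollector, find_zip_links and download_zips are invented here. To keep the parsing step free of third-party dependencies, the sketch swaps in the standard library's html.parser where the answer suggests Beautiful Soup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_zip_links(html, base_url):
    """Return absolute URLs of all links whose href ends in .zip."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, h) for h in collector.hrefs
            if h.lower().endswith(".zip")]


def download_zips(page_url, username, password, dest_dir="."):
    """Fetch the page with basic auth and save every linked .zip file."""
    import os
    import requests  # third-party: pip install requests

    auth = (username, password)  # a tuple, as requests expects
    page = requests.get(page_url, auth=auth, timeout=30)
    page.raise_for_status()
    for url in find_zip_links(page.text, page_url):
        resp = requests.get(url, auth=auth, timeout=60)
        resp.raise_for_status()
        target = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        with open(target, "wb") as fh:
            fh.write(resp.content)  # raw bytes; .text would corrupt a zip
```

Keeping the link filtering in its own function makes it easy to check against a saved copy of the page before pointing the script at the live site.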

Is it possible to force fail a recaptcha v2 for testing purposes? (I.e. pretend to be a robot)

I'm implementing an invisible reCAPTCHA as per the instructions in the documentation: reCAPTCHA V2 documentation
I've managed to implement it without any problems. But, what I'd like to know is whether I can simulate being a robot for testing purposes?
Is there a way to force the reCAPTCHA to respond as if it thought I was a robot?
Thanks in advance for any assistance.
In DevTools, open Settings, then Devices, and add a custom device with any name and a user agent string of Googlebot/2.1.
Finally, in Device Mode, choose that device from the drop-down at the left of the top bar (the default is Responsive).
You can test the captcha at https://www.google.com/recaptcha/api2/demo?invisible=true
(This is a demo of the invisible reCAPTCHA. You can remove the invisible URL parameter to test with the captcha button.)
You can use a Chrome extension such as Modify Headers and add a User-Agent header like Googlebot/2.1 (+http://www.google.com/bot.html).
For Firefox, if you don't want to install any add-ons, you can easily change the user agent manually:
Enter about:config into the URL bar and press Return.
Search for "useragent" (one word), just to check what is already there.
Create a new string preference (right-click somewhere in the window) named "general.useragent.override", with the string value "Googlebot/2.1" (or any other value you want to test with).
I tried this with reCAPTCHA v3, and it does indeed return a score of 0.1.
Don't forget to remove this preference from about:config when you are done testing!
I found this method here (it is an Apple OS article, but the Firefox method also works on Windows): http://osxdaily.com/2013/01/16/change-user-agent-chrome-safari-firefox/
I find that if you click on the reCAPTCHA logo rather than the checkbox, it tends to fail.
This is because bots look for clickable hitboxes. The checkbox is an image, as is the "I'm not a robot" text, and bots can't properly process images as text, but they can process clickable hitboxes. reCAPTCHA tells them to click; it just doesn't tell them where.
Click as far away from the checkbox as possible while keeping your mouse cursor inside the reCAPTCHA widget. You will then most likely fail it (it will bring up the challenge where you have to identify pictures).
The pictures are there because, as I said, bots can't process images and recognize things like cars.
Yes, it is possible to force-fail a reCAPTCHA v2 for testing purposes.
There are two ways to do it.
First way:
You need Firefox for this. Submit a simple form request and wait for the response.
After the response arrives, click the refresh button. Firefox will show a prompt saying "To display this page, Firefox must send information that will repeat any action (such as a search or order confirmation) that was performed earlier." Then click "Resend".
The browser then sends the previous "g-recaptcha-response" key again, and this fails your reCAPTCHA, because a response token can only be verified once.
Second way:
You can make a simple POST request from any application; on Linux, for example, you can use curl.
Just make sure you specify all your form fields and the request headers and, most importantly, POST one field named "g-recaptcha-response" with any random value.
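As a rough illustration of the second way, the request can also be built with Python's standard library instead of curl. The function name build_bogus_captcha_request, the form URL and the field names in the usage note are placeholders, not part of any real form; the only fixed name is the g-recaptcha-response field mentioned above.

```python
import secrets
from urllib.parse import urlencode
from urllib.request import Request


def build_bogus_captcha_request(form_url, fields):
    """Build a form POST whose g-recaptcha-response is a random token."""
    data = dict(fields)
    # A random value that the server-side verification should reject.
    data["g-recaptcha-response"] = secrets.token_urlsafe(32)
    body = urlencode(data).encode("ascii")
    return Request(
        form_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending it is then a matter of calling urllib.request.urlopen(build_bogus_captcha_request("http://localhost:8000/contact", {"email": "test@example.com"})) against your own test server and checking that the submission is rejected.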
None of the proposed answers worked for me. I wrote a simple Node.js script that opens a browser window with a page. reCAPTCHA detects the automated browser and shows the challenge. The script is below:
const puppeteer = require('puppeteer');
// Open the page in a visible, automation-controlled browser window.
const testReCaptcha = async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto('http://yourpage.com');
};
testReCaptcha();
Don't forget to install Puppeteer by running npm i puppeteer, and change yourpage.com to your page's address.

Headless browser using jmeter

I tried to use jp@gc - HtmlUnit Driver Config to create a headless browser test in JMeter, but I get this error:
Response message: com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: "getComputedStyle" is not defined.
From what I read online, jp@gc - HtmlUnit Driver Config doesn't support JavaScript. Is there a way to fix this in JMeter, or is there another option for headless browser testing? I have a Linux server as the load injector.
Update:
I have a WebDriver Sampler that opens the Google page:
WDS.sampleResult.sampleStart()
WDS.browser.get('http://google.com')
WDS.sampleResult.sampleEnd()
I have also downloaded PhantomJS, but when I run the test nothing shows up in the report. Should I add any other config?
HtmlUnit does not support JavaScript very well.
I have run many tests with each driver, and I can say that PhantomJS is the best one: it has good JS/CSS support and a good renderer for taking nice screenshots.
In code you can use it like this (you can download it from http://phantomjs.org/download.html; phantomjs-1.9.8 is very stable):
DesiredCapabilities caps = new DesiredCapabilities();
caps.setJavascriptEnabled(true);
caps.setCapability("takesScreenshot", true);
caps.setCapability(
    PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY,
    "your custom path\\phantomjs.exe"
);
WebDriver driver = new PhantomJSDriver(caps);
If you want to do that via the JMeter GUI, add a JSR223 Sampler before your Logic Controller and put the following in its script panel:
org.openqa.selenium.Capabilities caps = new org.openqa.selenium.remote.DesiredCapabilities();
((org.openqa.selenium.remote.DesiredCapabilities) caps).setJavascriptEnabled(true);
((org.openqa.selenium.remote.DesiredCapabilities) caps).setCapability("takesScreenshot", true);
((org.openqa.selenium.remote.DesiredCapabilities) caps).setCapability(
    org.openqa.selenium.phantomjs.PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY,
    "your custom path\\phantomjs.exe");
org.openqa.selenium.WebDriver driver = new org.openqa.selenium.phantomjs.PhantomJSDriver(caps);
org.apache.jmeter.threads.JMeterContextService.getContext().getCurrentSampler().getThreadContext()
    .getVariables().putObject(com.googlecode.jmeter.plugins.webdriver.config.WebDriverConfig.BROWSER, driver);
Don't hesitate to ask if you need more information.

How to convert IHTMLImgElement to image

I am automating Internet Explorer using SHDocVW.dll and MSHTML with C#, and I wish to save an image from the page to the disk (JPEG format).
I can't use the WebClient class to download the image; if I do it, I end up downloading the site's login page. I can't print the screen either, because the browser has to remain invisible during this process, running in the background.
I have tried to do the following:
IHTMLImgElement imgElement = ...;
IHTMLControlRange imgRange = ...;
imgRange.add(imgElement as IHTMLControlElement);
imgRange.execCommand( "copy", false, null );
This does nothing. I am not able to extract anything from the clipboard. None of the solutions I found worked for me.
Your WebClient approach is probably missing cookies. See "How do I log into a site with WebClient?" for an example that handles cookies.
Your code looks fine, except that the user has to change the security settings to enable clipboard access. If the image is cached on disk, you can parse the page for the image location and then dig the file out of the WinINet cache.

How to edit the url in current browser using Watin

I need to navigate to a new URL in the currently open browser using WatiN code. Let me know if anyone has tried the same scenario. I also need to get the URL of the currently open browser.
using (IE browser = new IE())
{
    // Navigate, then read back the URL the browser ended up on.
    browser.GoTo("www.google.co.uk");
    string currentUrl = browser.Url;
}
If the browser is already open, use the static AttachTo method:
http://watinandmore.blogspot.com/2010/01/browserattachto-and-iattachto.html
HTH!

Register an application to a URL protocol (all browsers) via installer

I know this can be accomplished with a simple registry change as long as IE or Firefox is being used. However, I am wondering if there is a reliable way to do so for other browsers.
I am specifically looking for a way to do this via an installer, so editing a preference inside a specific browser will not cut it.
Here is the best I can come up with:
IE: http://msdn.microsoft.com/en-us/library/aa767914(VS.85).aspx
Firefox: http://kb.mozillazine.org/Register_protocol
Chrome: Since every other browser seems to support the same convention, I filed a bug for Chrome.
Opera: I can't find any documentation, but it appears to follow the same method as IE/Firefox (see above links).
Safari: Same as Opera: it works, but I can't find any documentation on it.
Yes. Here is how to do it with Firefox:
http://kb.mozillazine.org/Register_protocol
and Opera:
http://www.opera.com/support/kb/view/535/
If someone is looking for a solution for an intranet website (for all browsers, not only IE) that contains hyperlinks to shared server folders (as in my case), this is a possible solution:
1. Register a protocol (URI scheme) via the registry (I suppose this can be done for all corporate users), for example a "myfile:" scheme. (Thanks to Greg Dean's answer.)
2. The hyperlink's href attribute should then look like this:
<a href='myfile:\\mysharedserver\sharedfolder\' target='_self'>Shared server</a>
3. Write a console application that passes its argument on to Windows Explorer (see step 1 for an example of such an application).
This is a piece of my test app:
const string prefix = "myfile:";
static string ProcessInput(string s)
{
    // TODO Verify and validate the input
    // string as appropriate for your application.
    if (s.StartsWith(prefix))
        s = s.Substring(prefix.Length);
    s = System.Net.WebUtility.UrlDecode(s);
    Process.Start("explorer", s);
    return s;
}
I think this app can easily be installed by your admins for all intranet users :)
I couldn't get the scheme to open such links in Explorer without this separate app.
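For the registry step, the layout described in the MSDN article linked in the question is a key named after the scheme, an empty "URL Protocol" value, and a shell\open\command subkey holding the handler's command line. As a sketch only, an installer or admin script could generate an importable .reg file like this; the function names and the handler path in the usage note are hypothetical.

```python
def reg_escape(value):
    """Escape backslashes and double quotes for a .reg string value."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def protocol_reg_file(scheme, handler_exe):
    """Return .reg file text that maps scheme: URLs to handler_exe."""
    # %1 is replaced by Windows with the full URL that was clicked.
    command = '"{}" "%1"'.format(handler_exe)
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[HKEY_CLASSES_ROOT\\{}]".format(scheme),
        '@="URL:{} Protocol"'.format(scheme),
        '"URL Protocol"=""',
        "",
        "[HKEY_CLASSES_ROOT\\{}\\shell\\open\\command]".format(scheme),
        '@="{}"'.format(reg_escape(command)),
        "",
    ]
    return "\r\n".join(lines)
```

For example, protocol_reg_file("myfile", r"C:\Tools\OpenShare.exe") produces text that can be saved and imported with regedit. Writing under HKEY_CLASSES_ROOT needs administrator rights; a per-user install would target HKEY_CURRENT_USER\Software\Classes instead.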
