Impossible to configure Selenium + Cntlm proxy in Python? (Firefox)

Is it impossible for people behind an authenticating proxy (with a domain) to use Selenium in Python, going through a CNTLM proxy on localhost?
Nothing seems to work, even though I set the proxy settings like this:
from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType

myProxy = "localhost:3128"
proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'httpProxy': myProxy,
    'ftpProxy': myProxy,
    'sslProxy': myProxy,
    'noProxy': ''  # set this value as desired
})
self.browser = webdriver.Firefox(proxy=proxy)
I also tried creating a custom Firefox profile with the CNTLM proxy configured inside it and forcing that profile on my Firefox webdriver in Python, but nothing works.
Is that even possible?
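For what it's worth, the profile-based route usually looks like the sketch below with older Selenium releases (a sketch, assuming CNTLM is listening on localhost:3128; since CNTLM handles the NTLM authentication, Firefox itself needs no credentials):

from selenium import webdriver

# Point Firefox at the local CNTLM listener via explicit profile preferences.
profile = webdriver.FirefoxProfile()
profile.set_preference('network.proxy.type', 1)  # 1 = manual proxy configuration
profile.set_preference('network.proxy.http', 'localhost')
profile.set_preference('network.proxy.http_port', 3128)
profile.set_preference('network.proxy.ssl', 'localhost')
profile.set_preference('network.proxy.ssl_port', 3128)
profile.set_preference('network.proxy.no_proxies_on', '')  # proxy everything
browser = webdriver.Firefox(firefox_profile=profile)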

Related

proxy.pac - exception for images

I'm a web developer and I use squid as a proxy, which I entered in firefox as the proxy server.
So when I enter http://www.example.com in firefox, I see the site on my local machine, by having configured squid accordingly.
Now the problem is that some of our customers have GBs of images, and it's a pain to load them all onto my machine. So basically I want to use my offline webpage but load the images from the live server, so I don't have a broken site without images.
In order to do this I've tried to create a proxy.pac and configured it this way:
function FindProxyForURL(url, host) {
    if (shExpMatch(url, "*.jpg")) {
        return "DIRECT";
    } else {
        return "PROXY 192.168.178.31:3128; DIRECT";
    }
}
Unfortunately it doesn't really work. What am I doing wrong, and how can I achieve my goal?
According to the Mozilla document on PAC files:
The path and query components of https:// URLs are stripped. In Chrome, you can disable this by setting PacHttpsUrlStrippingEnabled to false, in Firefox the preference is network.proxy.autoconfig_url.include_path.
What this means is that when you enter a URL such as https://www.example.com/image.jpg, what gets passed to the PAC script is the URL https://www.example.com. As a result, you're never going to enter the first branch of your if statement.
In Firefox, you can change this by going to the about:config page and setting network.proxy.autoconfig_url.include_path to true.
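With that preference set to true, the full URL (path included) reaches the PAC script again, so the original extension match starts working; a lightly hardened variant (the extra extensions are just an illustration):

function FindProxyForURL(url, host) {
    // Requires network.proxy.autoconfig_url.include_path = true in Firefox;
    // otherwise url arrives stripped to scheme://host and never matches below.
    if (shExpMatch(url, "*.jpg") || shExpMatch(url, "*.jpeg") || shExpMatch(url, "*.png")) {
        return "DIRECT"; // fetch images straight from the live server
    }
    return "PROXY 192.168.178.31:3128; DIRECT"; // everything else through squid
}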

Vue Cli Webpack Proxy

I've been developing with the vue-cli and the Webpack template. Everything works flawlessly but I'm having some issues using a custom host. Right now Webpack listens on localhost:8080 (or similar) and I want to be able to use a custom domain such as http://project.dev. Has anybody figured this out?
This might be where the problem resides:
https://github.com/chimurai/http-proxy-middleware
I also added this to the proxyTable:
proxyTable: { 'localhost:8080' : 'http://host.dev' } and it gives me a console response [HPM] Proxy Created / -> http://host.dev
Any advice, direction or suggestion would be great!
Update
I successfully added a proxy to my Webpack project this way:
var proxyMiddleware = require('http-proxy-middleware') // 0.x API: the module exports the factory directly

var mProxy = proxyMiddleware('/', {
  target: 'http://something.dev',
  changeOrigin: true,
  logLevel: 'debug'
})
app.use(mProxy)
This seems to work, but still not on port 80.
Console Log:
[HPM] Proxy created: / -> http://something.dev
I can assume the proxy is working! But my assets are not loaded when I access the URL.
It's important to note that I'm used to working with MAMP, which uses port 80. So the only way I can run this proxy on port 80 is to shut down MAMP and set the port to 80. It seems to work, but when I reload the page with the proxy URL, there is a little delay while it tries to resolve, and then the console outputs this:
[HPM] GET / -> http://mmm-vue-ktest.dev
[HPM] PROXY ERROR: ECONNRESET. something.dev -> http://something.dev/
And this displays in the browser:
Error occured while trying to proxy to: mmm-vue-ktest.dev/
The proxy table is for forwarding requests to another server, like a development API server.
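For example, the proxyTable in the template's config/index.js keys on request paths, not hostnames (a sketch; '/api' and the target are placeholders):

proxyTable: {
  '/api': {
    target: 'http://host.dev', // requests to /api/* are forwarded here
    changeOrigin: true         // rewrite the Host header to match the target
  }
}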
If you want the webpack dev server to run on this address, you have to add it to your OS's hosts file. Vue or webpack can't do this; it's the job of your OS.
Google will have simple guides for every OS.
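For example, a line like this in /etc/hosts (on Windows: C:\Windows\System32\drivers\etc\hosts) points the custom domain at your own machine:

127.0.0.1   project.dev

The dev server still listens on whatever port it is configured for, so the site is then reached at http://project.dev:8080 unless you also change the port to 80.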

How to use multiple proxy when crawling with scrapy + splash?

We crawl with scrapy + splash and we want to use multiple proxies. But Splash only supports a single proxy per profile: https://splash.readthedocs.io/en/stable/api.html#proxy-profiles.
[proxy]
; required
host=proxy.crawlera.com
port=8010
; optional, default is no auth
username=username
password=password
; optional, default is HTTP. Allowed values are HTTP and SOCKS5
type=HTTP
How can we use multiple proxies when crawling with scrapy + splash?
There are several options:
use multiple proxy profiles (as Rafael Almeida suggested in a comment);
pass a different proxy URL with each request (see http://splash.readthedocs.io/en/stable/api.html#arg-proxy and the sketch after this list);
write a Splash Lua script and use request:set_proxy in the splash:on_request callback - there is an example in the docs. This way you can set a different proxy for different requests initiated by a page, not only a single proxy per rendered page. I'm not aware of a way to do that in other browser automation tools like PhantomJS or Selenium.
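As a sketch of the second option, assuming the scrapy-splash plugin and a purely hypothetical PROXIES pool, the proxy can be rotated per request through the Splash render argument:

import itertools

import scrapy
from scrapy_splash import SplashRequest

# Hypothetical proxy pool; cycle through it so each request uses the next proxy.
PROXIES = itertools.cycle([
    'http://user:password@proxy1.example.com:8010',
    'http://user:password@proxy2.example.com:8010',
])

class RotatingProxySpider(scrapy.Spider):
    name = 'rotating_proxy'
    start_urls = ['http://example.com']

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url,
                callback=self.parse,
                args={'proxy': next(PROXIES)},  # per-request 'proxy' render argument
            )

    def parse(self, response):
        yield {'url': response.url, 'title': response.css('title::text').get()}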

Setting up a proxy with htmlunit

I am new to HtmlUnit, and have almost no knowledge of programming.
On a CentOS web server (for www.mydomain.com), I am trying to create a proxy.
I want my server to be used as a proxy for requests coming to www.mydomain.com, and to send the response as an HTML snapshot.
I saw something like
SocketAddress addr = new InetSocketAddress("xxx.xxx.xx.xxx", 8888);
Proxy proxy = new Proxy(Proxy.Type.HTTP, addr); // or Proxy.Type.SOCKS
URL url = new URL("http://mydomain.com/test");
URLConnection conn = url.openConnection(proxy);
But I wonder where to set that up.
Can this be done entirely from Apache?
I don't see any configuration file as such for HtmlUnit.
BTW, I have installed HtmlUnit (using the jpackage repo).
HtmlUnit is a GUI-less browser for Java programs. Where did you see the code that you have included? From memory, I cannot recall the API even having a Proxy class. You can certainly point the WebClient at a proxy by passing in a ProxyConfig; and that is only there to declutter the WebClient class.
Are you looking for a proxy server, or a way to simulate a browser?
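For completeness, the ProxyConfig route mentioned above looks roughly like this (a sketch; constructor signatures differ a little between HtmlUnit versions, and the host/port are placeholders):

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.ProxyConfig;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitProxyExample {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            // Route all WebClient traffic through an HTTP proxy.
            webClient.getOptions().setProxyConfig(
                    new ProxyConfig("xxx.xxx.xx.xxx", 8888));
            HtmlPage page = webClient.getPage("http://mydomain.com/test");
            System.out.println(page.getTitleText());
        }
    }
}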

watir-webdriver change proxy while keeping browser open

I am using the Watir-Webdriver library in Ruby to check some pages. I know I can connect through a proxy using
profile = Selenium::WebDriver::Firefox::Profile.new # create a new profile
profile.proxy = Selenium::WebDriver::Proxy.new( # create proxy data for the profile
  :http => proxyadress,
  :ftp => nil,
  :ssl => nil,
  :no_proxy => nil
)
browser = Watir::Browser.new :firefox, :profile => profile # create a browser window with this profile
browser.goto "http://www.example.com"
browser.close
However, when I want to connect to the same page multiple times using different proxies, I have to create a new browser for every proxy. Loading (and unloading) the browser takes quite some time.
So, my question: is there any way, using webdriver in Ruby, to change the proxy address Firefox connects through while keeping the browser open?
If you want to test whether a page is blocked when accessed through a proxy server, you can do that through a headless library. I recently had success using mechanize. You can probably use net/http as well.
I am still not sure why you need to change the proxy server for a current session.
require 'mechanize' # note: lowercase file name

session = Mechanize.new
session.set_proxy(host, port, user, pass)
session.user_agent_alias = 'Mac Safari' # use the built-in alias rather than a raw UA string
session.agent.robots = true # observe robots.txt rules
response = session.get(url)
puts response.code
You need to supply the proxy host/port/user/pass (user/pass are optional) and the URL. If you get an exception, then the response.code is probably not friendly.
You may need to use an OS-level automation tool to automate going through the Firefox menus to change the setting as a user would.
For Windows users there is the option of either the new RAutomation tool or AutoIt. Both can be used to automate things at the OS UI level, which would let you go into the browser settings and change the proxy there.
Still, I'd think that if you are checking a large number of sites, the overhead of changing the proxy settings would not be much compared to all of the site navigation, waiting for pages to load, etc.
Unless you are currently taking a 'row traverse' approach and changing proxy settings multiple times for each site you are checking? If that's the case, I would move toward more of a by-column method (if we presume each column is a proxy and each row is a site): fire up the browser for one proxy, check all the sites, then change the proxy and re-check all the sites. That way you'd only change the proxy settings once per proxy, which should not add much overhead to your script (see the sketch below).
It might mean a little more work storing and then reporting results at the end (if you had been writing them out a line at a time), but that's what hashes or arrays are for.
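In watir-webdriver terms, that by-column loop might look like this sketch (the proxy list and URLs are hypothetical; the profile setup is the same as in the question):

require 'watir-webdriver'

proxies = ['1.2.3.4:8080', '5.6.7.8:8080'] # hypothetical proxy pool
urls    = ['http://www.example.com', 'http://www.example.org']
results = Hash.new { |h, k| h[k] = {} }   # proxy => { url => result }

proxies.each do |proxyaddress|
  profile = Selenium::WebDriver::Firefox::Profile.new
  profile.proxy = Selenium::WebDriver::Proxy.new(:http => proxyaddress)
  browser = Watir::Browser.new :firefox, :profile => profile
  urls.each do |url|
    browser.goto url
    results[proxyaddress][url] = browser.title # or whatever check you need
  end
  browser.close # one browser start/stop per proxy, not per site
end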
