I am currently using Selenium to crawl a website.
I have tested setting a proxy server in Selenium.
But now I want to use a paid rental proxy server, and I got a trial IP address whose format looks like this: IP:PORT:USER:PASS.
I don't know how to set USER:PASS.
The provider didn't know how to set it up in Selenium either.
So I don't know what I can do now.
With a random proxy, this worked fine:
proxy_host = '185.186.61.44'
proxy_port = '11334'
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument("--proxy-server=http://#{proxy_host}:#{proxy_port}")
So I wanted to set something like this.
proxy_host = '185.186.61.44'
proxy_port = '12323'
proxy_user = "7a2345129"
proxy_pass = "easdga341d4"
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--proxy-server=http://#{proxy_host}:#{proxy_port}:#{proxy_user}:#{proxy_pass}")
But I found that it is not that easy; the solutions I have read use Puppeteer.
I wonder if there is any solution for my case.
If anybody has any clues, I would love to hear them.
Thank you.
Selenium 4 added support for basic auth, which at the time of writing is Chrome-specific.
See here for more details.
To specify basic auth credentials:
driver.devtools.new
driver.register(username: 'username', password: 'password')
Example using scraperapi.com as the proxy:
require 'selenium-webdriver'
proxy = Selenium::WebDriver::Proxy.new(
http: 'proxy-server.scraperapi.com:8001',
ssl: 'proxy-server.scraperapi.com:8001'
)
cap = Selenium::WebDriver::Remote::Capabilities.chrome(proxy: proxy)
options = Selenium::WebDriver::Chrome::Options.new(
args: [
'--no-sandbox',
'--headless',
'--disable-dev-shm-usage',
'--single-process',
'--ignore-certificate-errors'
]
)
driver = Selenium::WebDriver.for(:chrome, capabilities: [options,cap])
driver.devtools.new
driver.register(username: 'scraperapi', password: 'xxxx')
driver.navigate.to("http://httpbin.org/ip")
puts "content: #{driver.page_source}"
The Chrome::Options above are specific to my use case, except for the ignore-certificate-errors option, which is needed to handle HTTPS traffic through scraperapi's proxies.
My Gemfile had:
gem 'selenium-devtools', '~> 0.91.0'
gem 'selenium-webdriver', '~> 4.1'
The format of a proxy URL is usually
proto://USER:PASS@host.domain.tld:port
So your code should look like this:
proxy_host = '185.186.61.44'
proxy_port = '12323'
proxy_user = "7a2345129"
proxy_pass = "easdga341d4"
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--proxy-server=http://#{proxy_user}:#{proxy_pass}@#{proxy_host}:#{proxy_port}")
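A minimal sketch of turning the provider's IP:PORT:USER:PASS string into the `http://USER:PASS@host:port` form, percent-encoding the credentials in case they contain `:` or `@`. The helper names (`parse_proxy_spec`, `escape_userinfo`, `proxy_url`) are hypothetical. This only builds the string; whether a given browser honors credentials inside `--proxy-server` is a separate question (Chrome is commonly reported to ignore them, which is why the `register` approach above exists).

```ruby
# Percent-encode a userinfo component; RFC 3986 "unreserved" characters stay as-is.
def escape_userinfo(str)
  str.gsub(/[^A-Za-z0-9\-._~]/) { |ch| ch.bytes.map { |b| format("%%%02X", b) }.join }
end

# Split a provider string in the IP:PORT:USER:PASS format.
# The limit of 4 keeps any ':' inside the password intact.
def parse_proxy_spec(spec)
  host, port, user, pass = spec.split(":", 4)
  raise ArgumentError, "expected IP:PORT:USER:PASS, got #{spec.inspect}" if pass.nil?
  { host: host, port: Integer(port), user: user, pass: pass }
end

# Assemble scheme://USER:PASS@host:port from the parsed parts.
def proxy_url(spec, scheme: "http")
  parts = parse_proxy_spec(spec)
  user = escape_userinfo(parts[:user])
  pass = escape_userinfo(parts[:pass])
  "#{scheme}://#{user}:#{pass}@#{parts[:host]}:#{parts[:port]}"
end
```

For example, `proxy_url("185.186.61.44:12323:7a2345129:easdga341d4")` yields the same string the interpolated `--proxy-server` argument builds by hand.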
Related
Very odd behavior from Firefox (geckodriver) with Selenium WebDriver: given exactly the same parameters as Chrome (chromedriver), geckodriver seems to ignore the proxy settings and connect directly to the Internet. Is there a reason geckodriver behaves this way, or a way to work around it?
def selenium_chrome_full(proxy, url)
options = Selenium::WebDriver::Chrome::Options.new(args: ["start-maximized", "--proxy-server=%s" % proxy])
driver = Selenium::WebDriver.for(:chrome, capabilities: options)
driver.get(url)
end
def selenium_firefox_full(proxy, url)
options = Selenium::WebDriver::Firefox::Options.new(args: ["start-maximized", "--proxy-server=%s" % proxy])
driver = Selenium::WebDriver.for(:firefox, capabilities: options)
driver.get(url)
end
proxy = "0.0.0.0:8080"
url = "https://www.google.com/search?client=firefox-b-1-d&q=whatsmyip"
selenium_firefox_full(proxy, url)
selenium_chrome_full(proxy, url)
For a bit of additional context, I also ran a debugger to check the value of options after initialization. For Firefox and Chrome respectively, they are:
#<Selenium::WebDriver::Firefox::Options:0x000000010814b390 @debugger_address=nil, @options={:args=>["start-maximized", "--proxy-server=0.0.0.0:8080"], :browser_name=>"firefox", :prefs=>{}}, @profile=nil>
#<Selenium::WebDriver::Chrome::Options:0x0000000102362150 @options={:args=>["start-maximized", "--proxy-server=0.0.0.0:8080"], :prefs=>{}, :emulation=>{}, :local_state=>{}, :exclude_switches=>[], :perf_logging_prefs=>{}, :window_types=>[], :browser_name=>"chrome"}, @profile=nil, @logging_prefs={}, @encoded_extensions=[], @extensions=[]>
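Firefox does not appear to recognize Chrome's `--proxy-server` command-line switch, so the argument is silently ignored; Firefox takes its proxy settings from preferences instead. A minimal sketch, assuming a plain `host:port` proxy, that builds the manual-proxy preferences (`firefox_proxy_prefs` is a hypothetical helper name):

```ruby
# Translate "host:port" into Firefox's manual proxy preferences.
def firefox_proxy_prefs(proxy)
  host, port = proxy.split(":", 2)
  port = Integer(port)
  {
    "network.proxy.type" => 1,        # 1 = manual proxy configuration
    "network.proxy.http" => host,
    "network.proxy.http_port" => port,
    "network.proxy.ssl" => host,      # route HTTPS through the same proxy
    "network.proxy.ssl_port" => port
  }
end

# Usage with selenium-webdriver (not executed here):
#   options = Selenium::WebDriver::Firefox::Options.new
#   firefox_proxy_prefs(proxy).each { |name, value| options.add_preference(name, value) }
#   driver = Selenium::WebDriver.for(:firefox, capabilities: options)
```

An alternative worth trying is passing a `Selenium::WebDriver::Proxy` as a capability, as the scraperapi example earlier does for Chrome.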
I have a server working that looks a little like this:
require "socket"
require "openssl"
require "thread"
listeningPort = Integer(ARGV[0])
server = TCPServer.new(listeningPort)
sslContext = OpenSSL::SSL::SSLContext.new
sslContext.cert = OpenSSL::X509::Certificate.new(File.open("cert.pem"))
sslContext.key = OpenSSL::PKey::RSA.new(File.open("priv.pem"))
sslServer = OpenSSL::SSL::SSLServer.new(server, sslContext)
puts "Listening on port #{listeningPort}"
loop do
connection = sslServer.accept
Thread.new {...}
end
When I connect with TLS 1.3 and provide a client cert, I can see that it's working when I verify the cert in the SSL context, but peer_cert is never set on the connection; only the context receives a session.
Do I need to upgrade to TLS manually to access the client's cert?
The reason I want it is that, on the Gemini protocol, I can restrict content or authenticate users by looking at the cert.
After a lot of reading in the OpenSSL docs, I found a solution:
I set sslContext.verify_mode = OpenSSL::SSL::VERIFY_PEER and added a verification callback:
sslContext.verify_callback = proc do |_a, _b|
true
end
This behaves like VERIFY_NONE, but it does request the peer certificate, which the server won't do when the mode is set to VERIFY_NONE, as the documentation states: https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_verify.html
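A self-contained sketch of the VERIFY_PEER plus permissive `verify_callback` approach: it generates throwaway self-signed certificates (the CNs "localhost" and "gemini-client" are arbitrary), runs a one-shot server on a random local port, and reads `peer_cert` on the accepted connection. The helper names are hypothetical, and the client skips server verification only because the certificates are throwaway.

```ruby
require "openssl"
require "socket"

# Throwaway self-signed certificate and key.
def self_signed(common_name)
  key = OpenSSL::PKey::RSA.new(2048)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2 # X.509 v3
  cert.serial = 1
  cert.subject = cert.issuer = OpenSSL::X509::Name.parse("/CN=#{common_name}")
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600
  cert.sign(key, OpenSSL::Digest.new("SHA256"))
  [cert, key]
end

# Server context that requests a client certificate but accepts any,
# including self-signed ones, so the application can inspect it later.
def server_context(cert, key)
  ctx = OpenSSL::SSL::SSLContext.new
  ctx.cert = cert
  ctx.key = key
  ctx.verify_mode = OpenSSL::SSL::VERIFY_PEER
  ctx.verify_callback = proc { |_preverify_ok, _store_ctx| true }
  ctx
end

# One handshake; returns the client-cert subject as seen by the server.
def peer_subject_round_trip
  server_cert, server_key = self_signed("localhost")
  client_cert, client_key = self_signed("gemini-client")

  tcp_server = TCPServer.new("127.0.0.1", 0)
  port = tcp_server.addr[1]
  ssl_server = OpenSSL::SSL::SSLServer.new(tcp_server, server_context(server_cert, server_key))

  server = Thread.new do
    conn = ssl_server.accept                 # handshake (incl. client cert) completes here
    subject = conn.peer_cert&.subject&.to_s  # nil if the client sent no cert
    conn.puts(subject.to_s)
    conn.close
  end

  client_ctx = OpenSSL::SSL::SSLContext.new
  client_ctx.cert = client_cert
  client_ctx.key = client_key
  client_ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE # throwaway server cert only
  client = OpenSSL::SSL::SSLSocket.new(TCPSocket.new("127.0.0.1", port), client_ctx)
  client.sync_close = true
  client.connect
  line = client.gets.to_s.chomp
  client.close
  server.join
  ssl_server.close
  line
end
```

In the original accept loop, the equivalent step is reading `connection.peer_cert` inside the thread after `sslServer.accept` returns.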
I have the following configuration:
Net::HTTP.ssl_context_accessor 'ssl_version'
@http = Net::HTTP.new(@url.host, 443)
@http.ssl_version = :SSLv2
@http.use_ssl = true
@http.verify_mode = OpenSSL::SSL::VERIFY_NONE
@http.set_debug_output $stderr
@http.open_timeout = 10
@http.read_timeout = 10
And then I use the @http object to make a request_get like this:
path = "/login.cgi?username=#{@url.user}&password=#{@url.password}"
debug("Making request #{@http.address}")
response = @http.request_get(path)
debug("#{response.body}")
@cookie = response.get_fields('set-cookie').first.split('; ')[0]
Puppet.debug('Cookie got!')
The server is supposed to return a cookie, but the only debug output I get is:
Debug: Making request server.com
opening connection to server.com...
opened
And it hangs there forever (not even raising a timeout).
I'm very new to Ruby, and this code was taken from other Stack Overflow questions and was supposed to work.
I've been searching Google but haven't found anything similar. Any idea?
Changing the SSL version to SSLv3 and replacing the request_get method with a POST solved the problem.
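A small sketch of extracting the cookie from the response. `get_fields` returns an Array of header values (or nil), so the first entry has to be taken before splitting off the `name=value` part. `first_cookie_pair` is a hypothetical helper name.

```ruby
require "net/http"

# Set-Cookie can appear several times, so get_fields returns an Array (or nil).
# Take the first header and keep only its "name=value" part.
def first_cookie_pair(response)
  fields = response.get_fields("set-cookie")
  return nil if fields.nil? || fields.empty?
  fields.first.split("; ").first
end

# With the POST variant from the answer, this would be used roughly as:
#   response = @http.request_post(path, data)
#   @cookie  = first_cookie_pair(response)
```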
We are developing a WP8 app that requires push notifications.
To test it, we have run the push notification POST request with the curl command line, making sure that it actually connects, authenticates with the client SSL certificate, and sends the correct data. We know for a fact that this works, as we are receiving pushes on the devices.
This is the CURL command we have been using for testing purposes:
curl --cert client_cert.pem -v -H "Content-Type:text/xml" -H "X-WindowsPhone-Target:Toast" -H "X-NotificationClass:2" -X POST -d "<?xml version='1.0' encoding='utf-8'?><wp:Notification xmlns:wp='WPNotification'><wp:Toast><wp:Text1>My title</wp:Text1><wp:Text2>My subtitle</wp:Text2></wp:Toast></wp:Notification>" https://db3.notify.live.net/unthrottledthirdparty/01.00/AAF9MBULkDV0Tpyj24I3bzE3AgAAAAADCQAAAAQUZm52OkE1OUZCRDkzM0MyREY1RkE
Of course our SSL cert is needed to actually use the URL, but I was hoping someone else has done this and can see what we are doing wrong.
Now, our problem is that we need to make this work with Ruby instead, something we have been unable to get to work so far.
We have tried using HTTParty with no luck, and also net/http directly without any luck.
Here is a very simple HTTParty test script I have used to test with:
require "httparty"
payload = "<?xml version='1.0' encoding='utf-8'?><wp:Notification xmlns:wp='WPNotification'><wp:Toast><wp:Text1>My title</wp:Text1><wp:Text2>My subtitle</wp:Text2></wp:Toast></wp:Notification>"
uri = "https://db3.notify.live.net/unthrottledthirdparty/01.00/AAF9MBULkDV0Tpyj24I3bzE3AgAAAAADCQAAAAQUZm52OkE1OUZCRDkzM0MyREY1RkE"
opts = {
body: payload,
headers: {
"Content-Type" => "text/xml",
"X-WindowsPhone-Target" => "Toast",
"X-NotificationClass" => "2"
},
debug_output: $stderr,
pem: File.read("/Users/kenny/Desktop/client_cert.pem"),
ssl_ca_file: '/usr/local/opt/curl-ca-bundle/share/ca-bundle.crt'
}
resp = HTTParty.post uri, opts
puts resp.code
This seems to connect over SSL properly, but then the Microsoft IIS server returns 403 for some reason we don't understand.
Here is essentially the same thing I've tried using net/http:
require "net/http"
url = URI.parse "https://db3.notify.live.net/unthrottledthirdparty/01.00/AAF9MBULkDV0Tpyj24I3bzE3AgAAAAADCQAAAAQUZm52OkE1OUZCRDkzM0MyREY1RkE"
payload = "<?xml version='1.0' encoding='utf-8'?><wp:Notification xmlns:wp='WPNotification'><wp:Toast><wp:Text1>My title</wp:Text1><wp:Text2>My subtitle</wp:Text2></wp:Toast></wp:Notification>"
pem_path = "./client_cert.pem"
cert = File.read pem_path
http = Net::HTTP.new url.host, url.port
http.use_ssl = true
http.cert = OpenSSL::X509::Certificate.new cert
http.key = OpenSSL::PKey::RSA.new cert
http.ca_path = '/etc/ssl/certs' if File.exist?('/etc/ssl/certs') # Ubuntu
http.ca_file = '/usr/local/opt/curl-ca-bundle/share/ca-bundle.crt' if File.exist?('/usr/local/opt/curl-ca-bundle/share/ca-bundle.crt') # Mac OS X
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
r = Net::HTTP::Post.new url.path
r.body = payload
r.content_type = "text/xml"
r["X-WindowsPhone-Target"] = "toast"
r["X-NotificationClass"] = "2"
http.start do
resp = http.request r
puts resp.code, resp.body
end
Like the HTTParty version, this also returns 403.
I'm starting to get the feeling that this won't actually work with net/http, but I've also seen a few examples of code claiming to work, but I can't see any difference compared to what we have tested with here.
Does anyone know how to fix this? Is it possible? Should I use libcurl instead perhaps? Or even do a system call to curl? (I may have to do the last one as an interim solution if we can't get this to work soon).
Any input is greatly appreciated!
Thanks,
Kenny
Try using a tool like http://mitmproxy.org to compare the requests sent by your code and by curl.
For example, in addition to the headers you specify, curl also sends User-Agent and Accept headers; Microsoft servers may be checking for these for some reason.
If this does not help, then it's SSL-related.
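A minimal sketch of a request that mirrors the curl command's headers, including an explicit `User-Agent`. Note that `Net::HTTP` already sends `Accept: */*` and `User-Agent: Ruby` by default; the curl-style User-Agent string below is a placeholder (copy the real one from `curl -v`), and `build_push_request` is a hypothetical helper.

```ruby
require "net/http"
require "uri"

# Build a POST carrying the same headers as the curl command.
def build_push_request(url, payload, user_agent: "curl/7.x (placeholder)")
  uri = URI.parse(url)
  req = Net::HTTP::Post.new(uri.request_uri)
  req.body = payload
  req["Content-Type"] = "text/xml"
  req["X-WindowsPhone-Target"] = "Toast" # same casing as in the curl command
  req["X-NotificationClass"] = "2"
  req["User-Agent"] = user_agent          # overrides Net::HTTP's default "Ruby"
  req["Accept"] = "*/*"
  req
end
```

Sending it through the `http` object configured with the client certificate (`http.start { |h| h.request(req) }`) and comparing both requests in mitmproxy would show which header, if any, differs.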
I'm looking for a way to use a different IP address for each GET request with the standard Net::HTTP library. The server has 5 IP addresses, and I'm assuming that some APIs block access when the per-IP request limit is reached. So the only way around that is to use another server (IP). I can't find anything about it in the Ruby docs.
For example, curl lets you bind to a specific IP address (in PHP):
$req = curl_init($url);
curl_setopt($req, CURLOPT_INTERFACE, 'ip.address.goes.here');
$result = curl_exec($req);
Is there any way to do this with the Net::HTTP library? The alternative is Curb (Ruby libcurl bindings), but that will be the last thing I try.
Suggestions / Ideas?
P.S. The solution with Curb (quick-and-dirty test, IPs replaced):
require 'rubygems'
require 'curb'
ip_addresses = [
'1.1.1.1',
'2.2.2.2',
'3.3.3.3',
'4.4.4.4',
'5.5.5.5'
]
ip_addresses.each do |address|
url = 'http://www.ip-adress.com/'
c = Curl::Easy.new(url)
c.interface = address
c.perform
ip = c.body_str.scan(/<h2>My IP address is: ([\d\.]{1,})<\/h2>/).first
puts "for #{address} got response: #{ip}"
end
I know this is old, but hopefully someone else finds this useful, as I needed it today. You can do the following:
http = Net::HTTP.new(uri.host, uri.port)
http.local_host = ip
response = http.request(request)
Note that I don't believe you can use Net::HTTP.start, as it doesn't accept local_host as an option.
There is in fact a way to do this if you monkey-patch TCPSocket:
https://gist.github.com/800214
Curb is awesome, but it won't work with JRuby, so I've been looking into alternatives...
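A self-contained sketch of the `local_host` approach: a tiny one-shot HTTP server on a random local port echoes the client's source IP, and `Net::HTTP` binds its outgoing socket to `127.0.0.1` before connecting. The helper names are hypothetical; in real use `local_ip` must be one of the addresses assigned to the machine (for example, one of the server's 5 IPs, rotated per request).

```ruby
require "net/http"
require "socket"

# GET `path` from host:port, binding the outgoing socket to `local_ip`.
def get_from(local_ip, host, port, path = "/")
  http = Net::HTTP.new(host, port, nil) # nil: ignore any *_proxy environment settings
  http.local_host = local_ip            # must be an address assigned to this machine
  http.open_timeout = 5
  http.read_timeout = 5
  http.request(Net::HTTP::Get.new(path))
end

# One-shot HTTP server that replies with the client's source IP.
def echo_source_ip_server
  server = TCPServer.new("127.0.0.1", 0)
  port = server.addr[1]
  thread = Thread.new do
    client = server.accept
    loop do                              # skip request line and headers
      line = client.gets
      break if line.nil? || line == "\r\n"
    end
    body = client.remote_address.ip_address
    client.write("HTTP/1.1 200 OK\r\nContent-Length: #{body.bytesize}\r\n" \
                 "Connection: close\r\n\r\n#{body}")
    client.close
    server.close
  end
  [port, thread]
end
```

Rotating addresses would then be a matter of calling `get_from(ip_addresses[i % ip_addresses.size], ...)` per request.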
It doesn't look like you can do it with Net::HTTP. Here's the source:
http://github.com/ruby/ruby/blob/trunk/lib/net/http.rb
Line 644 is where the connection is opened:
s = timeout(@open_timeout) { TCPSocket.open(conn_address(), conn_port()) }
The third and fourth arguments to TCPSocket.open are local_address and local_port, and since they're not specified, it's not possible. Looks like you'll have to go with curb.
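The third and fourth `TCPSocket` arguments can be demonstrated directly. A minimal sketch, assuming a loopback listener stands in for the remote API, that binds the source address to `127.0.0.1` and lets the OS pick the port (`0`); `connect_from` is a hypothetical helper name.

```ruby
require "socket"

# TCPSocket.new(remote_host, remote_port, local_host, local_port):
# the last two arguments choose the source address and port (0 = any free port).
def connect_from(local_ip, remote_host, remote_port)
  TCPSocket.new(remote_host, remote_port, local_ip, 0)
end
```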
Of course you can. I did it as below:
# remote_host can be IP or hostname
uri = URI.parse( "http://" + remote_host )
http = Net::HTTP.new( uri.host, uri.port )
request = Net::HTTP::Get.new(uri.request_uri)
request.initialize_http_header( { "Host" => domain })
response = http.request( request )
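The snippet above connects to `remote_host` directly and overrides the `Host` header, so the remote server picks the virtual host named by `domain`; the outgoing source address is still chosen by the OS unless `local_host` is set. A minimal sketch of building such a request without sending it (the documentation IP `192.0.2.10` and the helper name are placeholders):

```ruby
require "net/http"
require "uri"

# Connect to `address` (e.g. an IP) while asking for the virtual host `domain`.
def request_with_host(address, domain, path = "/")
  uri = URI.parse("http://#{address}#{path}")
  req = Net::HTTP::Get.new(uri.request_uri, "Host" => domain)
  [Net::HTTP.new(uri.host, uri.port), req]
end

# Usage: http, req = request_with_host("192.0.2.10", "example.com")
#        response = http.request(req)
```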