I am trying to use the Hugging Face pipeline behind a proxy.
Consider the following lines of code:
from transformers import pipeline
sentimentAnalysis_pipeline = pipeline("sentiment-analysis")
The above code gives the following error.
HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /distilbert-base-uncased-finetuned-sst-2-english/resolve/main/config.json (Caused by ProxyError('Your proxy appears to only use HTTP and not HTTPS, try changing your proxy URL to be HTTP. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#https-proxy-error-http-proxy', SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1091)'))))
I tried to check the proxy settings on my machine (OS: Windows Server 2016 Datacenter) using the following code:
import urllib.request
print(urllib.request.getproxies())
The output is as follows:
{'http': 'http://12.10.10.12:8080', 'https': 'https://12.10.10.12:8080', 'ftp': 'ftp://12.10.10.12:8080'}
However, as per the urllib3 documentation, the above setting is incompatible, and the problem lies in the https entry:
{
"http": "http://127.0.0.1:8888",
"https": "https://127.0.0.1:8888" # <--- This setting is the problem!
}
and the right setting is
{ # Everything is good here! :)
"http": "http://127.0.0.1:8888",
"https": "http://127.0.0.1:8888"
}
How can we change the proxy setting from "https": "https://127.0.0.1:8888" to "https": "http://127.0.0.1:8888" on Windows?
I tried setting a Windows environment variable named "https_proxy" with the value http://127.0.0.1:8888. However, it is not working.
I found the solution and it is pretty simple. Include the following lines in your Python script/notebook, changing the proxy URL and port to match your setup. I hope it helps someone in the community.
import os
os.environ['HTTP_PROXY'] = 'http://proxy_url:proxy_port'
os.environ['HTTPS_PROXY'] = 'http://proxy_url:proxy_port'
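You can verify that the override is picked up by re-reading the proxy map with the standard library (a quick sketch; 127.0.0.1:8888 is a placeholder for your actual proxy address):

```python
import os
import urllib.request

# Point both schemes at the proxy's plain-HTTP endpoint; note that the
# HTTPS_PROXY value deliberately uses an http:// URL, as the urllib3
# documentation quoted above requires.
os.environ['HTTP_PROXY'] = 'http://127.0.0.1:8888'
os.environ['HTTPS_PROXY'] = 'http://127.0.0.1:8888'

# Environment variables take precedence over registry-derived settings
# in urllib.request.getproxies(), so this reflects the override.
print(urllib.request.getproxies()['https'])  # http://127.0.0.1:8888
```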
Method 1 (recommended): pypac
pip install pypac
Monkeypatch the requests library:
import requests
import pypac
def request(method, url, **kwargs):
    with pypac.PACSession() as session:
        return session.request(method=method, url=url, **kwargs)

requests.request = request
transformers should now work as expected.
from transformers import pipeline
sentimentAnalysis_pipeline = pipeline("sentiment-analysis")
Method 2: Disable SSL verification
WARNING: This method could expose you to malicious attacks.
Disabling SSL verification is a bad idea in general, but the story is even worse here because (as far as I know) transformers may download code and exec it. This opens the door for a man-in-the-middle attacker to execute arbitrary code on your machine.
This is probably a very bad idea unless you really know what you're doing, or you don't care at all about your machine's security. Otherwise, do not use this method.
import requests
import functools
requests.request = functools.partial(requests.request, verify=False)
Explanation
Setting the HTTP_PROXY and HTTPS_PROXY environment variables might not be enough to get through your corporate firewall. It may be using a .pac file to autoconfigure the proxy on your machine. Browsers pick this file up and use it automatically, as do some development tools (e.g. JetBrains IDEs). The requests library does not appear to do so.
Fortunately, there is a library called PyPAC that does this for you. But you'll need to monkeypatch requests to use PyPAC's request method rather than its own.
You don't need to patch requests.get since that function delegates to requests.request anyway.
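The delegation idea can be shown in miniature. This is a toy sketch of the pattern, not the real requests internals: a high-level get helper looks up the low-level request function dynamically, so rebinding the one name reroutes both:

```python
# Toy model of wrapper-to-core delegation: get() forwards to request(),
# so replacing the request() binding also changes what get() does.
def request(method, url, **kwargs):
    return f"original: {method} {url}"

def get(url, **kwargs):
    # Looks up the module-level name `request` at call time.
    return request("GET", url, **kwargs)

def patched_request(method, url, **kwargs):
    return f"patched: {method} {url}"

request = patched_request  # monkeypatch the low-level function

print(get("https://huggingface.co"))  # prints "patched: GET https://huggingface.co"
```

This only works because both names resolve in the same namespace; the sketch models the assumption behind patching requests.request, rather than quoting the library's actual source.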
Related
I am trying to download a file from a URL using Selenium and Firefox on Python 3, but I get an error in the geckodriver log file:
(firefox:13723): Gtk-WARNING **: 11:12:39.178: Theme parsing error: <data>:1:77: Expected ')' in color definition
1546945960048 Marionette INFO Listening on port 40601
1546945960132 Marionette WARN TLS certificate errors will be ignored for this session
console.error: BroadcastService:
receivedBroadcastMessage: handler for
remote-settings/monitor_changes
threw error:
Message: Error: Polling for changes failed: NetworkError when attempting to fetch resource..
Stack:
remoteSettingsFunction/remoteSettings.pollChanges#resource://services-settings/remote-settings.js:188:13
I use geckodriver version 0.22 and Firefox version 65.0. I am on Ubuntu 18 (SSH only).
geckodriver is in /usr/bin and has all the needed rights.
I have read on Google that this might be because of CORS. But I don't really get what CORS is or how to fix it (if that is the real problem).
Here is my code:
from os import getcwd
from pyvirtualdisplay import Display
from selenium import webdriver
# start the virtual display
display = Display(visible=0, size=(800, 600))
display.start()
# configure firefox profile to automatically save csv files in the current directory
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", getcwd())
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/csv")
driver = webdriver.Firefox(firefox_profile=fp)
page = "https://www.thinkbroadband.com/download"
driver.get(page)
driver.find_element_by_xpath("//*[@id='main-col']/div/div/div[8]/p[2]/a[1]").click()
Do you guys have any idea?
This error message...
Message: Error: Polling for changes failed: NetworkError when attempting to fetch resource..
...implies that there was a NetworkError while attempting to fetch a resource.
The main issue here is probably related to Cross-Origin Resource Sharing (CORS):
Cross-Origin Resource Sharing (CORS) is a mechanism that uses additional HTTP headers to tell a browser to let a web application running at one origin (domain) have permission to access selected resources from a server at a different origin. A web application makes a cross-origin HTTP request when it requests a resource that has a different origin (domain, protocol, and port) than its own origin.
An example of a cross-origin request: The frontend JavaScript code for a web application served from http://domain-a.com uses XMLHttpRequest to make a request for http://api.domain-b.com/data.json.
For security reasons, browsers restrict cross-origin HTTP requests initiated from within scripts. For example, XMLHttpRequest and the Fetch API follow the same-origin policy. This means that a web application using those APIs can only request HTTP resources from the same origin the application was loaded from, unless the response from the other origin includes the right CORS headers.
Modern browsers handle the client-side components of cross-origin sharing, including headers and policy enforcement. But this new standard means servers have to handle new request and response headers.
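On the server side, opting in to cross-origin access comes down to adding a response header. A minimal stdlib sketch (the origin https://domain-a.com is the hypothetical frontend from the example above, not a real deployment):

```python
from http.server import BaseHTTPRequestHandler

class CORSHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b'{"ok": true}'
        self.send_response(200)
        # Without this header, browsers block cross-origin XHR/fetch
        # reads of the response under the same-origin policy.
        self.send_header("Access-Control-Allow-Origin", "https://domain-a.com")
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet
```

Note that this is background on CORS itself; as explained below, the actual fix for the Selenium error is elsewhere.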
Solution
You need to induce a WebDriverWait for the desired element to be clickable, and you can use the following solution:
Code Block:
from selenium import webdriver
from os import getcwd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# configure firefox profile to automatically save csv files in the current directory
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", getcwd())
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/csv")
driver = webdriver.Firefox(firefox_profile=fp, executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get("https://www.thinkbroadband.com/download")
WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='specific-download-headline' and contains(., 'Extra Small File (5MB)')]//following::p[1]/a"))).click()
Reference: How to resolve “TypeError: NetworkError when attempting to fetch resource.”
I got the same error. After updating geckodriver to version 0.24.0 (2019-01-28), it worked fine for me. Try this:
xxxxx:~$ geckodriver --version
geckodriver 0.24.0 ( 2019-01-28)
I'm trying to get a TYPO3 (6.2) instance running behind a (forwarding) proxy (Squid). I have set
'HTTP' => array(
    'adapter' => 'curl',
    'proxy_host' => 'my.local.proxy.ip',
    'proxy_port' => '8080',
)
as well as
'SYS' => array(
    'curlProxyServer' => 'http://my.local.proxy.ip:8080',
    'curlUse' => '1'
)
The proxy doesn't ask for credentials.
When I try to update the extension list, I get the error message
Update Extension List
Could not access remote resource http://repositories.typo3.org/mirrors.xml.gz.
If I try "Get preconfigured distribution", it says
1342635425
Could not access remote resource http://repositories.typo3.org/mirrors.xml.gz.
According to the proxy log, the server doesn't even try to connect to the proxy.
I can easily download the file using wget on the command line.
OK, I've investigated the issue a bit more, and from what I can tell, TYPO3 doesn't even try to connect anywhere.
I used tcpdump and Wireshark to analyze the network traffic. The site claims to have tried sending an HTTP request to repositories.typo3.org, so I'd expect to find either a proxy connection attempt, or a DNS query followed by an attempt to connect to that IP. (Of course, the latter is known not to work.) However, none of this happens.
I've tried some slight changes in the variable curlProxyServer. The documentation clearly states
String: Proxyserver as http://proxy:port/. Deprecated since 4.6 - will be removed in TYPO3 CMS 7. See below for http options.
So I tried adding the trailing "/" and removing the "http://", with no change. I'm confident there's no problem whatsoever regarding the proxy, as the proxy isn't even contacted and has been working perfectly fine for everything else for years.
The error message comes from \TYPO3\CMS\Extensionmanager\Utility\Repository\Helper::fetchFile(). This one uses \TYPO3\CMS\Core\Utility\GeneralUtility::getUrl() to get the actual file content.
According to your settings, it should use the first part of the function, because curlUse is set and the URL starts with http or https.
So what you would need to do now is to throw some debug lines in the code and check at what point the request goes wrong.
Looking at the source code, three possibilities come to mind:
The curl proxy parameter does not support a scheme, so it should be 'curlProxyServer' => 'my.local.proxy.ip:8080'.
Some redirect does not work.
Your proxy has problems with HTTPS, because the TYPO3 TER should be queried over HTTPS.
I am trying to proxy the traffic of a Ruby application over a SOCKS proxy using Ruby 2.0 and SOCKSify 1.5.0.
require 'socksify/http'
uri = URI.parse("www.example.org")
proxy_addr = "127.0.0.1"
proxy_port = 14000
puts Net::HTTP.SOCKSProxy(proxy_addr, proxy_port).get(uri)
This is the minimal working example. Obviously it doesn't work, but I think it should. I receive no error messages executing the file; it doesn't stop, so I have to abort it manually. I tried the solution after I found it in this answer (the code in that answer is different, but as mentioned above, I first adapted it to match my existing non-proxy code and afterwards reduced it).
The proxies work; I tested both Tor and an ssh -D connection on my own webserver and other websites.
As RubyForge seems to no longer exist, I can't access the SOCKSify documentation on it. I think the version might be outdated and not work with Ruby 2.0, or something like that.
What am I doing wrong here? Or is there an alternative to SOCKSify?
Checking the documentation for Net::HTTP proxies gives an example we can base our code on. Also note the addition of the .body method, also found in the documentation.
Try this code:
require 'socksify/http'
uri = URI.parse('http://www.example.org/')
proxy_addr = '127.0.0.1'
proxy_port = 14000
Net::HTTP.SOCKSProxy(proxy_addr, proxy_port).start(uri.host, uri.port) do |http|
  puts http.get(uri.path).body
end
Is it impossible for people behind an authenticated proxy (with a domain) to use Selenium in Python, going through a CNTLM proxy on localhost?
Nothing seems to be working, even though I set the proxy settings like this:
from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType

myProxy = "localhost:3128"
proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'httpProxy': myProxy,
    'ftpProxy': myProxy,
    'sslProxy': myProxy,
    'noProxy': ''  # set this value as desired
})
self.browser = webdriver.Firefox(proxy=proxy)
I also tried to create a custom Firefox profile, configuring the CNTLM proxy inside it and forcing that profile on my Firefox webdriver in Python, but nothing works.
Is that even possible?
When accessing an XML-RPC service using xmlrpc/client in Ruby, it throws an OpenSSL::SSL::SSLError when the server certificate is not valid. How can I make it ignore this error and proceed with the connection?
Turns out it's like this:
xmlrpc = ::XMLRPC::Client.new("foohost")
xmlrpc.instance_variable_get(:@http).instance_variable_set(:@verify_mode, OpenSSL::SSL::VERIFY_NONE)
That works with Ruby 1.9.2, but it is clearly poking at internals, so the real answer is: "the API doesn't provide such a mechanism, but here's a hack".
Actually, the client has been updated; now one has direct access to the HTTP connection:
https://bugs.ruby-lang.org/projects/ruby-trunk/repository/revisions/41286/diff/lib/xmlrpc/client.rb
xmlrpc.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
But it is better to set ca_file or ca_path.
Still, I see no option to apply such a config to the _async calls.
Update: I found a workaround by monkey-patching the client object:
xmlrpc_client.http.ca_file = @options[:ca_file]
xmlrpc_client.instance_variable_set(:@ca_file, @options[:ca_file])
def xmlrpc_client.net_http(host, port, proxy_host, proxy_port)
  h = Net::HTTP.new host, port, proxy_host, proxy_port
  h.ca_file = @ca_file
  h
end
So you need both the older approach and the monkey patching. We also add an instance variable; otherwise the new method cannot see the actual value.