Is it possible to create a forwarding HTTPS proxy (not a reverse proxy) that can:
block some URLs based on a URL regexp (ads, flash, movies, ...)
cache images based on a URL regexp
It seems to me that in the usual case it is impossible because the HTTPS stream is encrypted and there's no way to process or alter it.
But this case is special: it is a proxy for a web crawler. I don't need HTTPS at all, but some sites allow access only via HTTPS, and I have to support them somehow.
So, would something like this be possible?
Crawler --http--> Proxy --https--> Site
The proxy would then decrypt the HTTPS stream and post-process it. Would that work? Are there any docs or details about such an approach?
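The Crawler --http--> Proxy --https--> Site idea sidesteps the "HTTPS is opaque" problem because the proxy is itself the TLS client for the upstream request: it receives the decrypted response and hands it to the crawler over plain HTTP. Below is a minimal standard-library Python sketch of that idea, not a production proxy. It assumes the crawler sends absolute-form requests (GET http://host/path) to the proxy and that https:// is always the right upstream scheme; the block and cache patterns, the port 8080, the 30-second timeout and the in-memory cache are hypothetical placeholders.

```python
import re
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit, urlunsplit

# Hypothetical rules: ad hosts/paths and flash/movie files are blocked, images are cached.
BLOCK_RE = re.compile(r"//ads?\.|/ads?/|\.(swf|flv|mp4|avi)(\?|$)", re.IGNORECASE)
CACHE_RE = re.compile(r"\.(png|jpe?g|gif|webp)(\?|$)", re.IGNORECASE)

# url -> (status, content type, body); unbounded and in-memory, for illustration only.
_image_cache: dict[str, tuple[int, str, bytes]] = {}


def decide(url: str) -> str:
    """Return 'block', 'cache' or 'fetch' for a URL requested by the crawler."""
    if BLOCK_RE.search(url):
        return "block"
    if CACHE_RE.search(url):
        return "cache"
    return "fetch"


def to_https(url: str) -> str:
    """Rewrite http://host[:80]/path?q into https://host/path?q for the upstream request."""
    parts = urlsplit(url)
    netloc = parts.netloc
    if parts.port == 80:
        netloc = netloc.rsplit(":", 1)[0]
    return urlunsplit(("https", netloc, parts.path, parts.query, parts.fragment))


class CrawlerProxy(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        url = self.path  # a client using a proxy sends the absolute URL: http://host/path
        if not url.startswith("http://"):
            self.send_error(400, "Expected an absolute http:// URL")
            return
        action = decide(url)
        if action == "block":
            self.send_error(403, "Blocked by proxy rule")
            return
        if action == "cache" and url in _image_cache:
            self._reply(*_image_cache[url])
            return
        try:
            # The proxy is the TLS client here, so it sees the decrypted response.
            with urllib.request.urlopen(to_https(url), timeout=30) as resp:
                entry = (resp.status, resp.headers.get("Content-Type", "application/octet-stream"), resp.read())
        except urllib.error.HTTPError as err:  # 4xx/5xx: pass the upstream answer through
            entry = (err.code, err.headers.get("Content-Type", "text/plain"), err.read())
        except urllib.error.URLError as err:  # DNS failure, refused connection, etc.
            self.send_error(502, f"Upstream error: {err.reason}")
            return
        if action == "cache" and entry[0] == 200:
            _image_cache[url] = entry
        self._reply(*entry)

    def _reply(self, status: int, content_type: str, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Point the crawler's HTTP proxy setting at 127.0.0.1:8080 (hypothetical port).
    ThreadingHTTPServer(("127.0.0.1", 8080), CrawlerProxy).serve_forever()
```

Because the crawler only ever speaks plain HTTP to the proxy, no CONNECT handling or certificate juggling is needed. The cache is a plain dict with no size limit or expiry; a real deployment would bound it and respect Cache-Control.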
Pretty sure Apache 2.2 provides this functionality with mod_proxy in conjunction with mod_ssl and mod_cache.
Note: blocking is done using the 'ProxyBlock' directive in mod_proxy.
Related
I have a use case where I need to put a middle server (a relay or tunnel) into the network path, with the following points:
I have a web server running. Let's say that when I hit an API endpoint /request hosted on my web server, it creates a POST request to https://www.google.com and returns the response through the endpoint.
I want a middle server (a proxy or similar) that this POST request goes through, instead of going out directly from my web server;
the call goes to the middle server, and I get the same response I was getting directly.
For this, the Squid proxy worked for me.
I came across NGINX, but we cannot use NGINX as a forward proxy. Here are some other observations that might be useful in this regard:
Squid is also configured through a conf file, similar to NGINX.
HTTPS traffic is encrypted, so the proxy server needs to do more work to inspect HTTPS requests.
To intercept them and apply ACL rules, you need a dummy certificate that the proxy presents so it can act as the owner of the content requested through it.
A list of rules can then be added to squid.conf to do the filtering.
I hope this is useful to anyone trying to achieve something similar.
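On the web-server side, making the outbound POST go through Squid only requires pointing the HTTP client at the proxy. A minimal Python sketch with urllib; the proxy address squid.internal:3128 is a hypothetical placeholder (3128 is Squid's default port). Because the target is https://, urllib asks the proxy for a CONNECT tunnel, so Squid relays the encrypted stream as-is; the dummy certificate only becomes necessary if you want Squid to intercept and inspect that traffic.

```python
import urllib.request

# Hypothetical address of the Squid instance; 3128 is Squid's default http_port.
SQUID_URL = "http://squid.internal:3128"


def build_proxied_opener(proxy_url: str = SQUID_URL) -> urllib.request.OpenerDirector:
    """Return an opener that sends both http:// and https:// requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def post_via_proxy(url: str, body: bytes, proxy_url: str = SQUID_URL) -> bytes:
    """POST body to url through the proxy and return the response body."""
    opener = build_proxied_opener(proxy_url)
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with opener.open(request, timeout=30) as resp:
        return resp.read()


# Usage (needs a reachable proxy):
#   post_via_proxy("https://www.google.com/", b"q=test")
```

The endpoint handler for /request would call post_via_proxy instead of opening the connection itself; nothing else in the web server changes.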
Let's say I have a Kubernetes Job that makes HTTPS requests to changing URLs, and I want to allow only specific URLs and block all other requests. My idea is to deploy an Https-Proxy-Pod and use NetworkPolicies to make sure the Job-Pod can only communicate with the Https-Proxy-Pod. See the following sketch for a better understanding:
[Image: sketch of the https-proxy sidecar deployment]
I know how to set that up, but I have no idea which HTTPS proxy to use. As far as I understand, Envoy is not a suitable solution for what I want to do: https://github.com/envoyproxy/envoy/issues/1606
Does anyone have a better solution, or can you tell me which proxy to use?
Mitmproxy is an open source tool that you can use to filter HTTP and HTTPS requests transparently using the Python scripting language.
There's also a quite detailed tutorial on how to use it
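For the Kubernetes case, it helps to know what a forward proxy can see without decrypting anything: an HTTPS client first sends CONNECT host:443, so the proxy learns the destination host name but not the URL path. Allowlisting by host therefore works with any CONNECT-capable proxy, while allowlisting full URLs needs TLS interception of the kind mitmproxy performs. A minimal sketch of the host check in Python; the allowlist entries and the 443-only policy are hypothetical examples.

```python
import fnmatch

# Hypothetical allowlist. "*.example.com" matches any subdomain but not example.com itself.
ALLOWED_HOSTS = ["api.github.com", "*.example.com"]


def connect_target_allowed(target: str, allowed: list[str] = ALLOWED_HOSTS) -> bool:
    """Decide whether a CONNECT target such as 'api.github.com:443' may be tunnelled."""
    host, _, port = target.rpartition(":")
    host = host.strip("[]").lower().rstrip(".")  # normalise IPv6 brackets, case, trailing dot
    if port != "443":  # hypothetical policy: only HTTPS tunnels are allowed
        return False
    return any(fnmatch.fnmatchcase(host, pattern) for pattern in allowed)
```

The proxy would run this on the CONNECT line and answer 403 instead of opening the tunnel when it returns False; the NetworkPolicy then guarantees the Job has no other way out.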
HTTP proxy with SSL and DNS support.
I must be missing some key concepts about proxying, because I cannot grasp this. I am looking to run a simple HTTP or HTTPS proxy without interfering with SSL: a fully transparent proxy that passes all the traffic through to the browser connected via the HTTP or HTTPS proxy, without modifying or intercepting any packets. I haven't been able to find any code online, or I'm not using the right keywords.
For example: in the browser, I enter server.someVPN.com:80 in the HTTP proxy field, and as soon as I try to visit a website, it prompts for authentication. After that it works perfectly with any domain, any security, any SSL, with no further steps needed. Most VPN providers offer this.
How is this possible? It even resolves DNS itself; I thought that with a transparent proxy, DNS resolution was left to the client. I'd prefer a Node.js solution, but any language works.
Please don't propose solutions such as SOCKS5, sock forwarding, DNS overriding or CA-based MITM. Since HTTP/1.1 supports CONNECT, this should be easy.
I'm not looking to proxy specific domains; I'm looking for an all-inclusive solution, just like most VPN providers offer.
Edit: I found the answer too quickly; admins, feel free to delete this question.
The way it works is that the browser knows it is talking to a proxy server. For example, if the browser wants to connect to https://www.example.com, it sends CONNECT www.example.com:443 HTTP/1.1 to the proxy server; the proxy server resolves www.example.com via DNS, opens a TCP connection to www.example.com port 443, and relays the TCP stream transparently to the client.
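The CONNECT exchange described above fits in a few dozen lines. This is an illustrative Python asyncio sketch, not how Squid or any particular module implements it; it has no authentication and no restriction on which hosts may be reached, so it listens only on 127.0.0.1, and the port 8080 is a hypothetical choice. Note that the proxy, not the browser, resolves the host name, which is why DNS "just works" for the client.

```python
import asyncio


def parse_connect(request_line: str) -> tuple[str, int]:
    """Split 'CONNECT host:port HTTP/1.1' into (host, port); raise ValueError otherwise."""
    method, target, _version = request_line.split()
    if method.upper() != "CONNECT":
        raise ValueError(f"unsupported method {method!r}")
    host, _, port = target.rpartition(":")
    if not host:
        raise ValueError(f"missing port in {target!r}")
    return host.strip("[]"), int(port)


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass
    finally:
        writer.close()


async def handle_client(client_reader: asyncio.StreamReader, client_writer: asyncio.StreamWriter) -> None:
    request_line = (await client_reader.readline()).decode("latin-1").strip()
    # Discard the remaining request headers (Host, Proxy-Authorization, ...).
    while (await client_reader.readline()) not in (b"\r\n", b"\n", b""):
        pass
    try:
        host, port = parse_connect(request_line)
    except ValueError:
        client_writer.write(b"HTTP/1.1 400 Bad Request\r\n\r\n")
        client_writer.close()
        return
    try:
        # DNS resolution and the TCP connect happen here, on the proxy.
        upstream_reader, upstream_writer = await asyncio.open_connection(host, port)
    except OSError:
        client_writer.write(b"HTTP/1.1 502 Bad Gateway\r\n\r\n")
        client_writer.close()
        return
    client_writer.write(b"HTTP/1.1 200 Connection Established\r\n\r\n")
    await client_writer.drain()
    # From here on the bytes are the browser's TLS session; relay them untouched.
    await asyncio.gather(
        pipe(client_reader, upstream_writer),
        pipe(upstream_reader, client_writer),
    )


async def main(port: int = 8080) -> None:
    server = await asyncio.start_server(handle_client, "127.0.0.1", port)
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())
```

Since the proxy never sees anything but opaque bytes after the 200 response, it works "with any domain, any security, any SSL", exactly as the question observed.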
I don't know of any Node.js solution. Common proxy servers include Squid, Privoxy and Apache Traffic Server.
See also: https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/CONNECT
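The authentication prompt mentioned in the question is also plain HTTP: the proxy answers the first request with 407 Proxy Authentication Required and a Proxy-Authenticate header, the browser shows its login dialog, and then it retries with a Proxy-Authorization header. A minimal Python sketch of that check for the Basic scheme, which a CONNECT handler would run before opening the tunnel; the single hard-coded account is a hypothetical placeholder.

```python
import base64
import hmac

# Hypothetical single account; a real proxy would check a user store.
EXPECTED_CREDENTIALS = b"user:secret"

# Sent instead of "200 Connection Established" when credentials are missing or wrong.
AUTH_REQUIRED = (
    b"HTTP/1.1 407 Proxy Authentication Required\r\n"
    b'Proxy-Authenticate: Basic realm="proxy"\r\n'
    b"Content-Length: 0\r\n"
    b"\r\n"
)


def proxy_auth_ok(headers: dict[str, str]) -> bool:
    """Check a Basic Proxy-Authorization header (header names are case-insensitive)."""
    value = next((v for k, v in headers.items() if k.lower() == "proxy-authorization"), "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        credentials = base64.b64decode(token.strip(), validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    # Constant-time comparison so response timing does not reveal how much matched.
    return hmac.compare_digest(credentials, EXPECTED_CREDENTIALS)
```

Basic credentials are only base64-encoded, so with an HTTP proxy on port 80 they cross the network in the clear; that is a property of the scheme, not of this sketch.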
Found the solution right after I asked...
This module works perfectly https://github.com/mpangrazzi/harrier
Does exactly what I was asking for.
How do NetNanny or K9 Web Protection set up a web proxy without configuring the browsers?
How can it be done?
By using WinSock directly, or working at the NDIS or hardware-driver level, and then filtering at those levels, just as any firewall software does. NDIS is the easy way.
Download this ISO image: http://www.microsoft.com/downloads/en/confirmation.aspx?displaylang=en&FamilyID=36a2630f-5d56-43b5-b996-7633f2ec14ff
It has a bunch of samples and tools to help you build what you want.
After you mount it (or burn it to a CD) and install it, go to this folder:
c:\WinDDK\7600.16385.1\src\network\ndis\
I think what you need is a transparent proxy that supports WCCP.
Take a look at the squid-cache FAQ page
and the Wikipedia entry for WCCP.
With that setup, you just need some firewall configuration, and all your web traffic will be handled by the transparent proxy, with no setup needed in your browser.
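In the firewall-redirect setup, the proxy receives connections whose destination was rewritten to point at itself, so it has to ask the kernel where each client was originally going. On Linux with netfilter NAT (an iptables REDIRECT rule), that information is available through the SO_ORIGINAL_DST socket option, which matters most for traffic with no usable Host header, such as HTTPS. A minimal IPv4-only Python sketch; the option value 80 is taken from <linux/netfilter_ipv4.h> because Python's socket module does not export it, and the port 3129 and the iptables rule in the comment are hypothetical examples.

```python
import socket
import struct

# From <linux/netfilter_ipv4.h>; Python's socket module does not define this constant.
SO_ORIGINAL_DST = 80


def parse_sockaddr_in(raw: bytes) -> tuple[str, int]:
    """Decode a 16-byte struct sockaddr_in: family (skipped), port, IPv4 address, padding."""
    port, packed_ip = struct.unpack("!2xH4s8x", raw)
    return socket.inet_ntoa(packed_ip), port


def original_destination(conn: socket.socket) -> tuple[str, int]:
    """Ask netfilter where a REDIRECTed TCP connection was originally headed (Linux, IPv4)."""
    raw = conn.getsockopt(socket.IPPROTO_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)


if __name__ == "__main__":
    # Assumes a rule such as:
    #   iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 3129
    with socket.create_server(("0.0.0.0", 3129)) as server:
        while True:
            conn, client = server.accept()
            with conn:
                print(client, "was heading to", original_destination(conn))
```

Squid's intercept mode does this lookup for you; the sketch only shows why no browser configuration is needed once the firewall rule is in place.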
NetNanny is not a proxy. It is tied to the host machine and browser (and possibly other applications as well), and it filters all incoming and outgoing "content" of the machine or application.
Essentially, NetNanny is a content-control system as opposed to a destination-control system (a proxy).
The easiest way to divert all traffic for a certain site to some other address is to change the hosts file on the local host.
You might want to have a look at the explanation here: http://www.fiddlertool.com/fiddler/help/hookup.asp
This is how Fiddler2 inserts a proxy between most apps and the internet without modifying the apps (although much of the page explains cases where the default setup fails). It does not answer how NetNanny, K9, etc. work, though; as noted above, they do a little more and may be more intrusive.
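The hookup that page describes relies on the system proxy setting: Fiddler registers itself (by default on 127.0.0.1:8888) as the WinINET proxy, and every application that asks the OS for its proxy configuration starts routing through it. Python's urllib is one such application: on Windows it reads proxy environment variables first and then the same registry settings WinINET uses. A small sketch of how an app picks the setting up; what system_proxies() returns depends entirely on the machine it runs on.

```python
import urllib.request


def system_proxies() -> dict[str, str]:
    """Proxy map as the platform advertises it.

    Environment variables (http_proxy, https_proxy, ...) win; otherwise urllib falls back
    to the Windows registry (the WinINET settings Fiddler edits) or macOS System Configuration.
    """
    return urllib.request.getproxies()


def fetch_like_a_typical_app(url: str) -> bytes:
    # ProxyHandler() with no argument calls getproxies() itself, so an app written this way
    # is redirected through Fiddler without any change to its own code.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler())
    with opener.open(url, timeout=10) as resp:
        return resp.read()


if __name__ == "__main__":
    print(system_proxies())
```

Applications that open raw sockets or ignore the system setting are not affected, which is one reason products like NetNanny work at a lower level.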
I believe you are looking for Browser Helper Objects (BHOs). These little gizmos capture ALL browser communication, so they can remove ads from the HTML (good gizmo), redirect every second click to a spam site (bad gizmo), or just capture every URL you type and send it home, like all the web toolbars do.
What you want to do is route all outgoing HTTP(S) requests from your LAN through an intercepting proxy (like Squid). This is the setup for a transparent web proxy.
There are different ways to do this, although I've only ever set it up on OpenBSD and Linux, using Squid as the proxy.
At a high level you have a firewall with rules to send all externally bound http traffic to a local squid server. The Squid server is configured to:
accept all http requests
forward the requests on to the real external hosts
cache the reply
forward the reply back to the requestor on the local lan
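The accept/forward/cache/reply cycle above hides one detail: an intercepted browser does not know a proxy exists, so it sends origin-form requests (GET /path) instead of the absolute URLs it would send to a configured proxy, and the proxy has to rebuild the URL from the Host header. A minimal Python sketch of that step and of a reply cache; the fetch function is injected so the sketch stays network-free, and ReplyCache is a hypothetical helper, not Squid's caching logic.

```python
from typing import Callable, Mapping


def absolute_url(request_target: str, headers: Mapping[str, str]) -> str:
    """Turn an intercepted request target into the URL to fetch upstream."""
    if request_target.startswith("http://"):
        return request_target  # absolute form: the client was configured to use a proxy
    host = next((v for k, v in headers.items() if k.lower() == "host"), None)
    if not host:
        raise ValueError("intercepted request has no Host header")
    return f"http://{host}{request_target}"


class ReplyCache:
    """Forward each URL upstream once and serve later requests for it from memory."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch
        self._store: dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        if url not in self._store:
            self._store[url] = self._fetch(url)  # forward to the real external host
        return self._store[url]  # reply to the requester on the LAN
```

Access rules slot in naturally between absolute_url and ReplyCache.get: once the full URL is known, the proxy can refuse it before anything is forwarded.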
You can then add more granular rules in Squid to control access to websites, filter content, etc.
I'm pretty sure you can also get this functionality in various networking gear. I bet F5 has some products that do some or all of what I described, and probably Cisco as well. There are probably other proxies besides Squid that you could use too.
PS. I have no idea if this is how K9 Web Protection or NetNanny works.
Squid can provide an intercepting proxy for the HTTP and HTTPS ports without configuring the browsers, and it also supports WCCP.
So I have a custom proxy that is written in ruby using mongrel to handle some fairly complex caching logic. This works great for both http and ftp requests, however since mongrel is not designed to handle https requests, I wish to front the whole thing with apache and make use of the ProxyRemote command to pass through to mongrel for https requests.
Mirroring certain site directory structures is easily accomplished with the ProxyPass and ProxyPassReverse directives in Apache, but I don't see a way to do this with ProxyRemote.
The problem is that mongrel does not handle CONNECT requests, which clients make to establish a secure connection. So while I can handle HTTPS requests within the proxy itself, actually using the proxy directly for an HTTPS request is not supported.
It seems that the simplest solution would be to have apache handle the https request and then simply pass the http request itself (minus the CONNECT) to mongrel and have it handle it appropriately and return it to apache and then to the client.
So my question is: is there a way to make ProxyRemote work the same way ProxyPass does with HTTP requests (i.e. pass an unencrypted request to mongrel)?
Just use ProxyPass and ProxyPassReverse. The connection between your reverse proxy (Apache) and your mongrel will be normal plain HTTP :), no magic necessary (especially not CONNECT; AFAIK that's only possible for forward proxies, but I'm not sure).
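What Apache does in the ProxyPass setup is terminate TLS and talk plain HTTP to the backend. A byte-level Python sketch of that idea, far simpler than mod_proxy (it does not parse or rewrite HTTP at all); the backend address 127.0.0.1:3000 and the certificate file names are hypothetical placeholders.

```python
import asyncio
import ssl

# Hypothetical address of the plain-HTTP mongrel proxy.
BACKEND = ("127.0.0.1", 3000)


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes until EOF, then close the destination."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass
    finally:
        writer.close()


async def handle(client_reader, client_writer, backend=BACKEND) -> None:
    # asyncio finishes the TLS handshake before calling this handler, so client_reader
    # yields decrypted HTTP that can be sent to the backend unchanged.
    try:
        backend_reader, backend_writer = await asyncio.open_connection(*backend)
    except OSError:
        client_writer.close()
        return
    await asyncio.gather(
        pipe(client_reader, backend_writer),
        pipe(backend_reader, client_writer),
    )


def tls_context(certfile: str, keyfile: str) -> ssl.SSLContext:
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.load_cert_chain(certfile, keyfile)
    return ctx


async def main() -> None:
    ctx = tls_context("server.crt", "server.key")  # hypothetical file names
    server = await asyncio.start_server(handle, "0.0.0.0", 443, ssl=ctx)
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())
```

Because TLS ends at the front server, the backend only ever sees ordinary requests and never needs to understand CONNECT.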
Hmm, have you tried doing so?
I've been using Apache to handle the HTTPS and just pass the requests on with the old default .htaccess mod_rewrite rules.