HTTP request through proxy in Erlang

I would like to write a generic server that can spawn many HTTP/HTTPS requests through proxies, with every request going through a different proxy. How can I do that? It seems possible to tunnel all traffic through one specific proxy, but I would like to change the proxy on every request (e.g. reading from a file containing lines like "109.121.144.17:8008").
Does anybody know how to do this? I tried the standard httpc module, but I cannot find any information on how to use it this way.

Have you considered using ibrowse? You can specify the proxy settings on each request like:
7> ibrowse:send_req("http://www.google.com/", [], get, [],
       [{proxy_user, "XXXXX"},
        {proxy_password, "XXXXX"},
        {proxy_host, "proxy"},
        {proxy_port, 8080}], 1000).

Related

Create a multi-website proxy with `http-proxy`

I'm using node-http-proxy to run a proxy website. I would like to proxy any target website that the user chooses, similarly to what's done by https://www.proxysite.com/, https://www.croxyproxy.com/ or https://hide.me/en/proxy.
How would one achieve this with node-http-proxy?
Idea #1: use a ?target= query param.
My first naive idea was to add a query param to the proxy, so that the proxy can read it and forward the request to that target.
Code-wise, it would more or less look like this (assuming we deploy it to http://myproxy.com):
// Assumed imports, not shown in the original snippet (httpProxyMiddleware is presumably next-http-proxy-middleware):
import type { NextApiRequest, NextApiResponse } from 'next';
import httpProxyMiddleware from 'next-http-proxy-middleware';

const BASE_URL = 'https://myproxy.com';

// handler is the unique handler of all routes.
async function handler(
  req: NextApiRequest,
  res: NextApiResponse
): Promise<void> {
  try {
    const url = new URL(req.url ?? '', BASE_URL); // For example: `https://myproxy.com?target=https://google.com`
    const targetURLStr = url.searchParams.get('target'); // Get `?target=` query param.
    return httpProxyMiddleware(req, res, {
      changeOrigin: true,
      target: targetURLStr,
    });
  } catch (err) {
    res.status(500).json({ error: (err as Error).message });
  }
}

// Assumed: exported as a Next.js API route.
export default handler;
Problem: If I deploy this code to myproxy.com, and load https://myproxy.com?target=https://google.com, then google.com is loaded, but:
if I click a link to Google Images, it loads https://myproxy.com/images instead of https://myproxy.com?target=https://google.com/images, because relative links resolve against the proxy's origin and the ?target= query param is lost (see also "URL as query param in proxy, how to navigate?")
Idea #2: use cookies
The second idea is to read the ?target= query param as above, store its hostname in a cookie, and then proxy all resources to the cookie's hostname.
So, for example, the user wants to access https://google.com/a/b?c=d via the proxy. The flow is:
go to https://myproxy.com?target=${encodeURIComponent('https://google.com/a/b?c=d')}
proxy reads the ?target= query param, sets the hostname (https://google.com) in a cookie
proxy redirects to https://myproxy.com/a/b?c=d (307 redirect)
the proxy sees a new request and, since the cookie is set, proxies it through node-http-proxy using the cookie's target.
Code-wise, it would look like: https://gist.github.com/throwaway34241/de8a623c1925ce0acd9d75ff10746275
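For illustration only (the gist above is the actual code), here is a rough sketch of that flow, assuming Next.js API routes, the same httpProxyMiddleware helper as in Idea #1, and a hypothetical proxy-target cookie name:

// Rough sketch of the cookie flow described above (assumed names; see the gist for the real code).
import type { NextApiRequest, NextApiResponse } from 'next';
import httpProxyMiddleware from 'next-http-proxy-middleware';

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  const url = new URL(req.url ?? '', 'https://myproxy.com');
  const target = url.searchParams.get('target');

  if (target) {
    // Steps 1-3: remember the target's origin in a cookie, then 307-redirect to the same path without ?target=.
    const targetURL = new URL(target);
    res.setHeader('Set-Cookie', `proxy-target=${encodeURIComponent(targetURL.origin)}; Path=/`);
    res.redirect(307, targetURL.pathname + targetURL.search);
    return;
  }

  // Step 4: proxy every other request to the origin stored in the cookie.
  const origin = req.cookies['proxy-target'];
  if (!origin) {
    res.status(400).json({ error: 'No proxy target set' });
    return;
  }
  return httpProxyMiddleware(req, res, {
    changeOrigin: true,
    target: decodeURIComponent(origin),
  });
}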
Problem: This works very well, but only for one target site at a time. If I open one browser tab with https://myproxy.com?target=https://google.com, and another tab with https://myproxy.com?target=https://facebook.com, then:
first it'll set the cookie to https://google.com, and I can navigate in the 1st tab correctly
then I go to the 2nd tab (without closing the 1st one), it'll set the cookie to https://facebook.com, and I can navigate Facebook on the 2nd tab correctly
but then if I go back to the first tab, it'll proxy Google resources through Facebook, because the cookie has been overwritten.
I'm a bit out of ideas, and am wondering how those generic proxy websites do it. Ideally, I would not want to parse the HTML of the target website.
The idea of a proxy is to intercept the client's requests (either by port or by a backend API), extract the URLs of the requested resources, modify them, make those requests itself against the origin servers, then modify the responses and send them back to the client.
Your first approach does all of this except modifying the responses before sending them back.
One way to do this is to rewrite all links in the resources returned through the proxy so that they contain your own web address, and only then send them back to the client, as in the sketch below.
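A minimal sketch of that rewriting step, assuming a node-http-proxy server with a fixed target; the selfHandleResponse/proxyRes approach is one way to get at the response body, not the only one, and it ignores compressed responses:

// Illustrative only: rewrite absolute links in HTML responses so they point back at the proxy.
import http from 'http';
import httpProxy from 'http-proxy';

const TARGET = 'https://google.com';        // origin being proxied (assumed)
const PROXY_ORIGIN = 'https://myproxy.com'; // the proxy's own origin (assumed)

const proxy = httpProxy.createProxyServer({
  target: TARGET,
  changeOrigin: true,
  selfHandleResponse: true, // we write the (rewritten) body ourselves
});

proxy.on('proxyRes', (proxyRes, req, res) => {
  const chunks: Buffer[] = [];
  proxyRes.on('data', (chunk) => chunks.push(chunk));
  proxyRes.on('end', () => {
    let body = Buffer.concat(chunks).toString('utf8');
    if ((proxyRes.headers['content-type'] || '').includes('text/html')) {
      // Naive rewrite: make absolute links to the target point at the proxy instead.
      body = body.split(TARGET).join(PROXY_ORIGIN);
    }
    const headers = { ...proxyRes.headers };
    delete headers['content-length']; // the body length may have changed
    res.writeHead(proxyRes.statusCode ?? 200, headers);
    res.end(body);
  });
});

http.createServer((req, res) => proxy.web(req, res)).listen(8080);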
Another way is to wrap the target site in a frame, as most web proxy sites do, and have a script crawl the page and replace all links.
There is a small problem, though: JavaScript-based requests are mostly hardcoded in the scripts, and they are not easy to rewrite.
Your second approach sounds as if it would work better, but that is just a hunch; I can't say anything concrete. Implement a tab activity checker so you can switch the cookie to whichever tab is active; see the "how to tell if browser tab is active" discussion about that, and the sketch below.
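As a minimal client-side sketch of such a tab activity checker (assuming each proxied page knows its own target, e.g. injected by the proxy, and the hypothetical proxy-target cookie name from the sketch above):

// Illustrative: when this tab becomes visible again, re-assert its own target in the shared cookie.
const MY_TARGET = 'https://google.com'; // assumed to be injected per page by the proxy

document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'visible') {
    document.cookie = `proxy-target=${encodeURIComponent(MY_TARGET)}; path=/`;
  }
});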

How to use multiple proxies when crawling with Scrapy + Splash?

We crawl with Scrapy + Splash and we want to use multiple proxies, but Splash only supports a single proxy (https://splash.readthedocs.io/en/stable/api.html#proxy-profiles):
[proxy]
; required
host=proxy.crawlera.com
port=8010
; optional, default is no auth
username=username
password=password
; optional, default is HTTP. Allowed values are HTTP and SOCKS5
type=HTTP
How can we use multiple proxies when crawling with Scrapy + Splash?
There are several options:
use multiple profiles (as Rafael Almeida suggested in a comment);
pass a different proxy URL with each request (see http://splash.readthedocs.io/en/stable/api.html#arg-proxy);
write a Splash Lua script and use request:set_proxy in the splash:on_request callback - there is an example in the docs. This way you can set a different proxy for different requests initiated by a page, not just a single proxy per rendered page. I'm not aware of a way to do that in other browser automation tools like PhantomJS or Selenium.

How to enable CORS on Sonatype Nexus?

I want to develop a monitoring web app for different things, with AngularJS as the frontend. One of the core elements is showing an overview of Nexus artifacts/repositories.
When I request the REST API, I get the following error back:
No 'Access-Control-Allow-Origin' header is present on the requested resource.
Origin 'http://localhost:9090' is therefore not allowed access.
To fix this error, I need to modify the response headers to enable CORS.
It would be great if anyone is familiar with that type of problem and could give me an answer!
The CORS headers have to be present in the response of the system you are trying to invoke. (They are checked on the client side, i.e. the browser in this case; you could instead make those calls from your own backend, where the headers are ignored, but that could become quite hard to maintain.) To change those headers you'll need a proxy. So your application will not call the URL directly like
fetch("http://localhost:9090/api/sometest")
There are at least two ways. One is to add a proxy directly in front of the Nexus server and modify the headers for everyone; I do not really recommend this, for security reasons. :)
The other, more maintainable, solution is to go through the local domain of the monitoring web app as follows:
fetch("/proxy/nexus/api/sometest")
To achieve this you need to set up a proxy where your application is running. It can map the different services you depend on and modify the headers if necessary.
I do not know which HTTP server your application is going to use, but here are some proxy configuration examples on the topic:
For Apache HTTPD mod_proxy you could use a configuration similar to this:
ProxyPass "/proxy/nexus/" "http://localhost:9090/"
ProxyPassReverse "/proxy/nexus/" "http://localhost:9090/"
It may also be necessary to handle cookies, so you may want to take a look at the following directives:
ProxyPassReverseCookiePath
ProxyPassReverseCookieDomain
For an Nginx location block you could use something like the following:
location /proxy/nexus/ {
proxy_pass http://localhost:9090/;
}
For Node.js, see the node-http-proxy documentation: https://github.com/nodejitsu/node-http-proxy
// Assumes `proxy` is a node-http-proxy instance; `streamify` (not shown in the answer)
// converts the buffered raw request body back into a readable stream.
const httpProxy = require('http-proxy');
const proxy = httpProxy.createProxyServer();

module.exports = (req, res, next) => {
  proxy.web(req, res, {
    target: 'http://localhost:4003/',
    buffer: streamify(req.rawBody)
  }, next);
};
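For completeness, one hypothetical way to mount that middleware under the /proxy/nexus/ path with Express (the module path, mount path and port below are assumptions, not part of the answer):

// Hypothetical Express wiring (assumed, not from the answer above):
const express = require('express');
const app = express();

// './nexus-proxy' is assumed to export the middleware shown above.
app.use('/proxy/nexus', require('./nexus-proxy'));
app.listen(3000);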

Requiring an API call to use HTTPS with Apigee

I'm building a proxy using Apigee that transmits sensitive data. I need to ensure that clients of this API use HTTPS.
I've coded a RaiseFault policy that does something like this:
proxies/default.xml
<PreFlow name="PreFlow">
  <Request>
    <Step>
      <FaultRules/>
      <Name>Require-HTTPS</Name>
      <Condition>request.scheme != "https"</Condition>
    </Step>
  </Request>
  <Response/>
</PreFlow>
policies/Require-HTTPS.xml
<RaiseFault async="false" continueOnError="false" enabled="true" name="Require-HTTPS">
  <DisplayName>Require-HTTPS</DisplayName>
  <FaultRules/>
  <Properties/>
  <FaultResponse>
    <Set>
      <Headers/>
      <Payload contentType="application/json">\{
        "status" : 400,
        "message" : "Sensitive transactions may only be executed over HTTPS"
      }
      </Payload>
      <StatusCode>400</StatusCode>
      <ReasonPhrase>Requires HTTPS</ReasonPhrase>
    </Set>
  </FaultResponse>
  <IgnoreUnresolvedVariables>true</IgnoreUnresolvedVariables>
</RaiseFault>
The problem is that the fault is always raised, whether I access over HTTP or HTTPS.
I can see in the debugging console that the condition in proxies.xml always resolves to true, whether I use HTTP or HTTPS to access the API. In fact, the request.scheme always seems to be HTTP.
However, when accessing over HTTPS I do see the following header, which is not present when using HTTP:
X-Forwarded-Proto : https
Can I depend on this header to enforce HTTPS only access to my API? Or is there some other recommended way to do this?
You might try detecting the virtual host rather than the scheme by using the virtualhost.name variable (default or secure).
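For example, something like the following condition (using the virtualhost.name variable mentioned above; the exact syntax is worth double-checking against the Apigee flow-variable reference) could replace the request.scheme check:
<Condition>virtualhost.name != "secure"</Condition>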
However, I suggest you create two proxies -- one for HTTPS and one for HTTP. This removes the possibility of a consumer slipping past your conditional roadblock. Click on the word "default" under Proxy to edit the entire proxy file and scroll down to the HTTPProxyConnection:
<HTTPProxyConnection>
  <BasePath>/testme</BasePath>
  <VirtualHost>default</VirtualHost>
  <VirtualHost>secure</VirtualHost>
</HTTPProxyConnection>
Just remove the VirtualHost entry for default and the consumer will no longer be able to connect over plain HTTP. I believe you can then create a second API proxy with the same path, only this time remove "secure" and create a RaiseFault without a condition. Worst case, you can rely on the Apigee error if you just disable default in your API.

HTTP GET request with a separate entity body in JMeter?

I want to send a JSON payload with an HTTP GET request, but I want to prevent it from being visible in the URL.
GET http://<domain>/school/search.json
{
  schoolId: ["S1", "S2", "S3"],
  location: "Pune"
}
How can I achieve this in Apache JMeter?
GET implies visible in the URL; what exactly do you want to do?
Sending body data along with an HTTP GET request has been available for the default (HttpClient4) implementation since ver. 3.1 (see Bugzilla #60358), and the request-retrying behavior for both PUT and GET with a body was fixed in ver. 3.2 (see Bugzilla #60837).
Just as an additional note: you will likely encounter problems if you have caches/proxies in your setup and plan to take advantage of them.
