I'm using node-http-proxy to run a proxy website. I would like to proxy any target website that the user chooses, similarly to what's done by https://www.proxysite.com/, https://www.croxyproxy.com/ or https://hide.me/en/proxy.
How would one achieve this with node-http-proxy?
Idea #1: use a ?target= query param.
My first naive idea was to add a query param to the proxy, so that the proxy can read it and forward the request to that target.
Code-wise, it would more or less look like this (assuming we deploy it to https://myproxy.com):
// Assumption: the proxying itself is done with the `next-http-proxy-middleware` package.
import type { NextApiRequest, NextApiResponse } from 'next';
import httpProxyMiddleware from 'next-http-proxy-middleware';

const BASE_URL = 'https://myproxy.com';
// handler is the unique handler of all routes.
async function handler(
req: NextApiRequest,
res: NextApiResponse
): Promise<void> {
try {
const url = new URL(req.url, BASE_URL); // For example: `https://myproxy.com?target=https://google.com`
const targetURLStr = url.searchParams.get('target'); // Get `?target=` query param.
return httpProxyMiddleware(req, res, {
changeOrigin: true,
target: targetURLStr,
});
} catch (err) {
res.status(500).json({ error: (err as Error).message });
}
}
Problem: If I deploy this code to myproxy.com, and load https://myproxy.com?target=https://google.com, then google.com is loaded, but:
if I click a link to Google Images, it loads https://myproxy.com/images instead of https://myproxy.com?target=https://google.com/images, because links on the proxied page resolve against the proxy's origin, so navigation breaks (see also the "URL as query param in proxy, how to navigate?" question)
Idea #2: use cookies
The second idea is to read the ?target= query param as above, store the target's origin in a cookie, and proxy all subsequent requests to the origin stored in the cookie.
So for example user wants to access https://google.com/a/b?c=d via the proxy. The flow is:
go to https://myproxy.com?target=${encodeURIComponent('https://google.com/a/b?c=d')}
the proxy reads the ?target= query param and stores the target origin (https://google.com) in a cookie
the proxy redirects to https://myproxy.com/a/b?c=d (307 redirect)
on each new request, since the cookie is set, the proxy forwards the request through node-http-proxy to the cookie's target.
Code-wise, it would look like: https://gist.github.com/throwaway34241/de8a623c1925ce0acd9d75ff10746275
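For reference, a minimal sketch of that flow (the gist has the full version), assuming the same Next.js API route setup as Idea #1 and a hypothetical proxy-target cookie name:

// Minimal sketch of Idea #2. "proxy-target" is a hypothetical cookie name;
// the real code is in the gist above.
import httpProxyMiddleware from 'next-http-proxy-middleware';

const BASE_URL = 'https://myproxy.com';

export default async function handler(req, res) {
  const url = new URL(req.url, BASE_URL);
  const targetParam = url.searchParams.get('target');

  if (targetParam) {
    // Steps 1-3: remember the target origin, then redirect to the bare path.
    const target = new URL(targetParam);
    res.setHeader('Set-Cookie', `proxy-target=${encodeURIComponent(target.origin)}; Path=/`);
    res.redirect(307, `${target.pathname}${target.search}`);
    return;
  }

  // Step 4: proxy every other request to the origin stored in the cookie.
  const targetOrigin = decodeURIComponent(req.cookies['proxy-target'] || '');
  return httpProxyMiddleware(req, res, {
    changeOrigin: true,
    target: targetOrigin,
  });
}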
Problem: This works very well, but only for one target site at a time. If I open one browser tab with https://myproxy.com?target=https://google.com, and another tab with https://myproxy.com?target=https://facebook.com, then:
first it'll set the cookie to https://google.com, and I can navigate correctly in the 1st tab
then I go to the 2nd tab (without closing the 1st one), it'll set the cookie to https://facebook.com, and I can navigate Facebook correctly on the 2nd tab
but if I then go back to the first tab, it'll proxy Google resources through Facebook, because the cookie has been overwritten.
I'm a bit out of ideas, and am wondering how those generic proxy websites are doing. Ideally, I would not want to parse the HTML of the target website.
The idea of a proxy is to intercept the client's requests (either at the network level or via backend APIs), extract the URLs of the requested resources, modify them, make those requests itself against the origin servers, and then modify the responses and send them back to the client.
Your first approach does all of this except modifying the responses before sending them back.
One way to do this is to rewrite all links in the resources returned by the proxy so that they point at your own address, and only then send them back as responses to the client.
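A very rough sketch of that rewriting step could look like the following, using node-http-proxy's selfHandleResponse option and proxyRes event. Names like PROXY_BASE and TARGET_ORIGIN are placeholders, and compressed responses, relative URLs, cookies and redirects are deliberately ignored here:

// Rough sketch only: rewrite absolute links in proxied HTML so they point
// back at the proxy. PROXY_BASE / TARGET_ORIGIN are illustrative placeholders.
const http = require('http');
const httpProxy = require('http-proxy');

const TARGET_ORIGIN = 'https://google.com';
const PROXY_BASE = 'https://myproxy.com';

const proxy = httpProxy.createProxyServer({
  target: TARGET_ORIGIN,
  changeOrigin: true,
  selfHandleResponse: true,                   // we send the (rewritten) body ourselves
  headers: { 'accept-encoding': 'identity' }, // ask the target for an uncompressed body
});

proxy.on('proxyRes', (proxyRes, req, res) => {
  const chunks = [];
  proxyRes.on('data', (chunk) => chunks.push(chunk));
  proxyRes.on('end', () => {
    // Naive string replace, no real HTML parsing.
    const body = Buffer.concat(chunks).toString('utf8').split(TARGET_ORIGIN).join(PROXY_BASE);
    const headers = { ...proxyRes.headers };
    delete headers['content-length'];         // length changed after rewriting
    res.writeHead(proxyRes.statusCode, headers);
    res.end(body);
  });
});

http.createServer((req, res) => proxy.web(req, res)).listen(8080);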
Another way is to wrap the target site in a frame, as most web proxy sites do, and have a script crawl the page and replace all links.
There is a small problem though: JavaScript-issued requests are mostly hardcoded in the scripts, and it is not an easy job to replace them.
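A sketch of what that in-page rewriting script might look like (PROXY_BASE is again just a placeholder for your proxy's address):

// Illustrative only: point every anchor on the proxied page back at the proxy.
const PROXY_BASE = 'https://myproxy.com';

document.addEventListener('DOMContentLoaded', () => {
  for (const a of document.querySelectorAll('a[href]')) {
    // Resolve relative links against the current page before rewriting.
    const absolute = new URL(a.getAttribute('href'), location.href).href;
    a.href = `${PROXY_BASE}?target=${encodeURIComponent(absolute)}`;
  }
  // Requests issued from JavaScript (fetch/XHR) are not covered by this,
  // which is exactly the hard part mentioned above.
});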
Your second approach sounds as if it would work better, but that is just a hunch, nothing concrete I can say. Implement a tab-activity checker so you can switch the cookie to whatever target the active tab is using; please check the how-to-tell-if-browser-tab-is-active discussion about that.
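Such a checker could be as simple as the sketch below, where /activate-target is a hypothetical endpoint on the proxy that re-sets the target cookie:

// Sketch only: "/activate-target" is a hypothetical proxy endpoint.
// When this tab becomes visible again, tell the proxy which target it uses
// so the server can point the cookie back at this tab's site.
const MY_TARGET = 'https://google.com'; // whatever this tab is proxying

document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'visible') {
    fetch(`/activate-target?target=${encodeURIComponent(MY_TARGET)}`);
  }
});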
Related
I have created 2 versions of the website, one for desktop and another for mobile. When users point their browser to www.example.com, I want my server to serve them a different website based on the HTTP User-Agent.
I don't want to use responsive design because my design, page layout and content are quite different between desktop and mobile. Furthermore, we may want to play around with search crawlers by having another rule that serves a plain HTML website.
I wonder, can I configure such a rule in my web server? Or on Cloudflare?
You can detect the user's device by checking the User-Agent HTTP header for first-time visitors, or a cookie for returning visitors, then use a Cloudflare Worker script that acts as a reverse proxy, forwarding requests to either the desktop or the mobile version of the website/app.
import isMobile from "ismobilejs";
export default {
fetch(req) {
const device = isMobile(req.headers.get("user-agent"));
// TODO: Also check cookies (for returning visitors)
const { pathname, search } = new URL(req.url);
const targetUrl = device.phone
? `https://m.example.com${pathname}${search}`
: `https://example.com${pathname}${search}`;
return fetch(targetUrl, req);
}
};
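For the cookie part mentioned in the TODO, the returning-visitor check could look roughly like this (a sketch only; prefers-mobile is an arbitrary cookie name, not something the starter kit provides):

import isMobile from "ismobilejs";

export default {
  fetch(req) {
    // A "prefers-mobile" cookie (arbitrary name) overrides the User-Agent
    // check for returning visitors.
    const cookies = req.headers.get("cookie") || "";
    const pref = cookies.match(/(?:^|;\s*)prefers-mobile=(\w+)/);
    const useMobile = pref
      ? pref[1] === "1"
      : isMobile(req.headers.get("user-agent")).phone;

    const { pathname, search } = new URL(req.url);
    const targetUrl = useMobile
      ? `https://m.example.com${pathname}${search}`
      : `https://example.com${pathname}${search}`;
    return fetch(targetUrl, req);
  }
};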
References
https://github.com/kaimallea/isMobile — User-Agent parser
https://github.com/kriasoft/cloudflare-starter-kit — Cloudflare Workers Starter Kit
Set Proxy per page in a Puppeteer Browser
I'm using a for...of loop to create a new page for each automated instance, but after both pages load and take a screenshot, whichever instance starts automating first takes over, and only that automation takes place.
From what I've seen, setting flags is only doable when creating a new browser, e.g.:
const browser = await puppeteer.launch({args:['--proxy-server=ip:port']});
I can't seem to find any docs about setting it per page.
I made a module that does this. It's called puppeteer-page-proxy.
It supports setting a proxy for an entire page, or if you like, it can set a different proxy for each request.
First install it:
npm i puppeteer-page-proxy
Then require it:
const useProxy = require('puppeteer-page-proxy');
Using it is easy.
Set proxy for an entire page:
await useProxy(page, 'http://127.0.0.1:8000');
If you want a different proxy for each request, then you can simply do this:
await page.setRequestInterception(true);
page.on('request', req => {
useProxy(req, 'socks5://127.0.0.1:9000');
});
Then if you want to be sure that your page's IP has changed, you can look it up:
const data = await useProxy.lookup(page);
console.log(data.ip);
It supports http, https, socks4 and socks5 proxies, and it also supports authentication if that is needed:
const proxy = 'http://login:pass@127.0.0.1:8000';
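Putting it all together, a typical flow might look like this (a sketch; the proxy address is just a placeholder):

const puppeteer = require('puppeteer');
const useProxy = require('puppeteer-page-proxy');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Route everything this page does through the proxy (placeholder address).
  await useProxy(page, 'http://127.0.0.1:8000');

  await page.goto('https://example.com');

  // Optional: confirm the page's outgoing IP has actually changed.
  const data = await useProxy.lookup(page);
  console.log(data.ip);

  await browser.close();
})();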
Repository:
https://github.com/Cuadrix/puppeteer-page-proxy
Can't get my head around $urlRouterProvider...
Basically, whenever I go to a link it should load the associated view and controller. That part works.
$urlRouterProvider.when("/","/home")
$urlRouterProvider.otherwise("/error")
$stateProvider.state('views', {
url: "/:view",
templateUrl: function(stateParams, formResolver) {
return "views/" + stateParams.view + "/" + stateParams.view + "-view.html";
},
controllerProvider: function($stateParams) {
return "" + $stateParams.view + "Ctrl";
}
});
So whenever the user goes to http://localhost:3030/#/foo, it loads "views/foo/foo-view.html" with the controller "fooCtrl", goes to home by default, and to error in all other cases.
That is cool. What I need, though, is that whenever the user goes to http://localhost:3030/#/auth, it redirects to "/auth" on the server, skipping the $stateProvider. Currently it sees that as a state and tries to find the corresponding view and controller.
If you need to redirect them to the server, you need to leave out the #/ part of the URL.
The browser ignores the #/ portion of the URL, which is how AngularJS is able to let the page you serve from localhost:3030/#/ handle the request. This is essentially still just a request for localhost:3030/.
If you want to do a true redirect or navigation to /auth on your server, ignore state for that request: you want your browser to make a straight-up HTTP request pointed directly at your server. Use /auth as the action in your form, or post to /auth from within your controller. When you are done on the server, redirect the user back to your Angular application.
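For instance (a sketch, not tied to the poster's exact setup; the module and controller names are placeholders):

// Sketch: trigger a real browser navigation to /auth so ui-router never sees it.
// Could equally be a plain <form action="/auth" method="post"> in the template.
angular.module('myApp').controller('loginCtrl', function ($window) {
  this.login = function () {
    // Full-page request straight to the server, skipping the #/ client-side routing.
    $window.location.href = '/auth';
  };
});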
Remember as well that you need some mechanism for your AngularJS application to know when the user has logged in. In our applications, we have the server set a cookie with a JWT token in it, which the AngularJS application then uses to retrieve the user information. This way the AngularJS application can tell when the user is really logged in (vs. a user simply going to a URL that represents a logged-in state).
How to prevent IE from caching the request sent to the server?
I tried setting "Cache-Control: no-cache" in the HTTP response object, but IE is still caching my request data.
Please find my project details below:
In my application I am sending a login request to the server. After I log in, if I take a memory dump using the WinHex tool, I am able to see the password details in memory.
I am clearing the dialog reference as well, but the request data is still getting cached.
Please suggest a workaround for this.
You could try to add a parameter with a random value to your URL; this prevents the URL from always being the same.
Example:
Normal URL:
www.test.com/test.php
Fake different URL:
www.test.com/test.php?_dc=12353somerandomval
Make sure the _dc parameter always has a different value; you can, for example, use the JavaScript Date object for this (it returns the current time in milliseconds, which will virtually always be different):
params: {
_dc : new Date().getTime()
}
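The same idea without a framework, with the value appended to the URL by hand:

// Append a cache-busting _dc value by hand (same idea as above).
var url = 'http://www.test.com/test.php?_dc=' + new Date().getTime();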
In a project I did a while back I had the exact same issue. I searched around and saw a few things that recommended adding a timestamp to the request, and that works too, but this was the most elegant way that worked for me:
$(document).ready(function () {
$.ajaxSetup({
cache: false
});
});
In my XPage I have an xe:djxDataGrid (dojox.grid.DataGrid) which uses an xe:restService, which seems to use dojox.data.JsonRestStore.
Everything works fine without proxy but my client accesses the application via a proxy because of corporate policy. After a user updates data in the DataGrid it shows old values when accessed behind the proxy.
When the REST Control/JsonRestStore sends an ajax GET request to get data, there is no Cache-Control parameter in the request headers, and Domino does not place an Expires parameter in the response headers. I believe that's why the old version of the GET response gets cached by the proxy.
We have tried to disable cache in browsers but that does not help which indicates the proxy is caching the requests.
I believe this could be solved either by:
Setting Cache-Control parameter in request headers OR
Setting Expires parameter in response headers
But I haven't found a way to set either of these. For the XPage itself, Domino sets an Expires: -1 response header, but not for the ajax GET request, which is:
/mypage.xsp/?$$viewid=!ddrg6o7q1z!&$$axtarget=view:_id1:_id2:callback1:restService1
This returns the JSON data to JsonRestStore and gets cached by the proxy.
One option is to try to get an exception added to the proxy so that requests to this site would bypass the proxy cache, but exceptions are generally not easy to get through.
Any ideas? Thanks.
Update1
My colleague suggested that I could intercept the XHR GET requests made by dojox.data.JsonRestStore and add a time parameter to the URL to prevent caching. Here is my question about that:
Prevent cache in every Dojo xhr request on page
Update2
@SvenHasselbach has a great solution for preventing caching for all XHRs:
http://openntf.org/XSnippets.nsf/snippet.xsp?id=cache-prevention-for-dojo-xhr-requests
It seems to work perfectly: the &dojo.preventCache= parameter is added to the URLs, and the requests seem to return correct JSON with this parameter too. But the DataGrid stops working when I use that code; every XHR causes an error.
Tried with Firefox and Chrome. The first page of data still loads because the XHR interception is not yet in place, but the subsequent pages show only "..." in each cell.
The solution is Sven Hasselbach's code in the comment section of Julian Buss's blog, which needs to be slightly modified.
I changed xhrPost to xhrGet and did not place the code in dojo.addOnLoad; when placed there, it was not effective for the first XHR made by the DataGrid/Store.
I also removed the headers modification because it overrides the existing headers. When the REST control requests data from the server with xhrGet, the URL is always the same and the requested rows are specified in an HTTP header like this:
Range: items=0-9
This header (and others) disappears when the original code is used. To just add headers, we would have to take the existing headers from args and append to them. I didn't see a need for that, because it should be enough to add the parameter to the URL. Here is the extremely simple code I'm using:
if( !(dojo._xhrGet )) {
dojo._xhrGet = dojo.xhrGet;
}
dojo.xhrGet = function (args) {
args['preventCache'] = true;
return dojo._xhrGet(args);
}
Now I'm getting all rows, and all XHR GET URLs have the &dojo.preventCache= parameter, which is exactly what I wanted. Next we'll test in the customer environment to see if this solves their problem.
Update
As Julian points out in his blog, I could also use a Web Site Rule to set Expires or Cache-Control HTTP response headers.
Update
The customer reports it's working now for them!