I'm investigating the use of http-proxy-middleware / node-http-proxy as a reverse proxy. Does anyone know if this is really possible?
I've already setup http-proxy-middleware so that I can proxy a request through it (the results are displayed in an iframe), and I'm also able to modify the request headers and html results. Specifically, I'm setting the host/origin headers and rewriting the result to change embedded links so that they go through the proxy as well.
But, some links are generated by js, and rewriting javascript responses seems to be very difficult to do correctly.
Is there a way to do this without rewriting links? I.e., is there any method to configure the iframe to automatically send all requests through the proxy?
Or maybe this is not really possible, and I'd need to use a full proxy like Squid?
Thanks!
This does seem to be possible, to a limited extent. http-proxy-middleware can be configured to edit response headers and to also rewrite the response body, so that links can be rewritten to use the proxy URL. XmlHttpRequest and fetch() requests can also be intercepted to rewrite the requests to use the proxy URL.
Use of a reverse proxy should be 100% transparent to clients and your application code, with zero code changes. So perhaps it is a design problem where I can clarify requirements for you.
URL DESIGN
As an API example, I might design URLs as follows for an API:
Public URL: https://api.mycompany.com/products
Internal URL: https://productservice.internal.com:3000
Note that the public URL of the API is actually that of a route within the reverse proxy.
An internet client would only ever use the public URL. If the internal API ever returns URLs to internet clients, it needs to be configured to use the public URL.
REVERSE PROXIES
The most mature options are probably the nginx based ones, which provide both declarative routing and also the ability to write any logic you like via plugins. There are plenty of examples in Curity guides, which may make you aware of some use cases
A mainstream option is to use the proxy-pass directive to route to an internal URL. The same pattern should work for the node RP you mention, though for simple tasks no custom logic should be needed.
Header configuration is a common thing to do in the RP, eg to ensure that the component receives the original client's IP address, rather than that of the RP, but that is often optional.
MISBEHAVING BACKEND COMPONENT
Perhaps this is the root of the problem - if a website returns the internal URL, eg in redirects or image URLs, then it is wrong. Many tech stacks will have a property such as BaseUrl that fixes this.
I have an AJAX site and I'm using hashbangs (#!) in my urls with the intention of then providing the correct HTML versions when google bots replace the #! with ?_escaped_fragment_.
How do I go about routing/proxying/redirecting the url with _escaped_fragment_ to the corresponding HTML pages? I can't find documentation on this part of the process specifically, and my first thought was that I should be using a 301 or 302 redirect, but I was told that wasn't the case, albeit not given any more info.
You can't use htaccess or redirects at all. Everything after the # in the URL isn't even sent to the server. The URL fragment is entirely client side. You'll need to use some kind of javascript solution to look at the fragment, and make whatever appropriate AJAX call to the server and load the content you get back.
I'm looking to implement the Google crawlable AJAX states as described here:
http://code.google.com/web/ajaxcrawling/docs/getting-started.html
Essentially this requires specifying your AJAX states with a #!state value at the end of the url.
This should then be passed to the application server (PHP in my case) as part of the query string eg.
http://www.example.com/#!open would become http://www.example.com/?_escaped_fragment_=open
Unfortunately i'm having trouble figuring out how to implement this via mod_rewrite on Apache 2. Can anyone offer some help?
Cheers
James
The RFC 2396 section 4 says:
When a URI reference is used to perform a retrieval action on the
identified resource, the optional fragment identifier, separated from
the URI by a crosshatch ("#") character, consists of additional
reference information to be interpreted by the user agent after the
retrieval action has been successfully completed. As such, it is not
part of a URI, but is often used in conjunction with a URI.
That is, the fragment won't be visible for the web server so you'll have to look for some other method as mod_rewrite is a no go.
Depending on what language you are familiar with, you can install HTMLUnit if you're a Java dev or you could try to write a proxy and use it to fetch the parsed content by, i.e. Jaxer or a Firefox instance. I used Jaxer and is pretty easy to implement the crawlable ajax pages after you get used with the Jaxer API (which isn't complicated at all)
Hi I have a problem where I'm setting up an internet Kiosk in a public place and when a user goes to a certain URL I want it to redirect to another particular URL.
For example I want it setup so that if a user goes to www.example.com/step1 I want the browser to automatically go to www.example.com/step2
The only restriction here is that it has to work on Windows due to hardware limitations.
Does anybody know how I could do this?
Thanks
A couple of ways to do it:
Implement a Browser Helper Object, catch the BeforeNavigate event, cancel the navigation and direct it somewhere else.
Use a specialized proxy server that responds to a request for the first URL by returning a redirect to the second, and passes all other requests through.
You could try to modify the Hosts file. In Windows, I think it is found in WINDOWS/system32/drivers/.
It can be used to redirect a request for one IP address to another.
Hope that's useful.
When I look at Amazon.com and I see their URL for pages, it does not have .htm, .html or .php at the end of the URL.
It is like:
http://www.amazon.com/books-used-books-textbooks/b/ref=topnav_storetab_b?ie=UTF8&node=283155
Why and how? What kind of extension is that?
Your browser doesn't care about the extension of the file, only the content type that the server reports. (Well, unless you use IE because at Microsoft they think they know more about what you're serving up than you do). If your server reports that the content being served up is Content-Type: text/html, then your browser is supposed to treat it like it's HTML no matter what the file name is.
Typically, it's implemented using a URL rewriting scheme of some description. The basic notion is that the web should be moving to addressing resources with proper URIs, not classic old URLs which leak implementation detail, and which are vulnerable to future changes as a result.
A thorough discussion of the topic can be found in Tim Berners-Lee's article Cool URIs Don't Change, which argues in favour of reducing the irrelevant cruft in URIs as a means of helping to avoid the problems that occur when implementations do change, and when resources do move to a different URL. The article itself contains good general advice on planning out a URI scheme, and is well worth a read.
More specifically than most of these answers:
Web content doesn't use the file extension to determine what kind of file is being served (unless you're Internet Explorer). Instead, they use the Content-type HTTP header, which is sent down the wire before the content of the image, HTML page, download, or whatever. For example:
Content-type: text/html
denotes that the page you are viewing should be interpreted as HTML, and
Content-type: image/png
denotes that the page is a PNG image.
Web servers often use the file extension if the file is served directly from disk to determine what Content-type to assign, but web applications can also generate pages with any Content-type they like in response to a request. No matter the filename's structure or extension, so long as the actual content of the page matches with the declared Content-type, the data renders as intended.
For websites that use Apache, they are probably using mod_rewrite that enables them to rewrite URLS (and make them more user and SEO friendly)
You can read more here http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html
and here http://www.sitepoint.com/article/apache-mod_rewrite-examples/
EDIT: There are rewriting modules for IIS as well.
Traditionally the file extension represents the file that is being served.
For example
http://someserver/somepath/image.jpg
Later that same approach was used to allow a script process the parameter
http://somerverser/somepath/script.php?param=1234&other=7890
In this case the file was a php script that process the "request" and presented a dinamically created file.
Nowadays, the applications are much more complex than that ( namely amazon that you metioned )
Then there is no a single script that handles the request ( but a much more complex app wit several files/methods/functions/object etc ) , and the url is more like the entry point for a web application ( it may have an script behind but that another thing ) so now web apps like amazon, and yes stackoverflow don't show an file in the URL but anything comming is processed by the app in the server side.
websites urls without file extension?
Here I questions represents the webapp and 322747 the parameter
I hope this little explanation helps you to understand better all the other answers.
Well how about a having an index.html file in the directory and then you type the path into the browser? I see that my Firefox and IE7 both put the trailing slash in automatically, I don't have to type it. This is more suited to people like me that do not think every single url on earth should invoke php, perl, cgi and 10,000 other applications just in order to sent a few kilobytes of data.
A lot of people are using an more "RESTful" type architecture... or at least, REST-looking URLs.
This site (StackOverflow) dosn't show a file extension... it's using ASP.NET MVC.
Depending on the settings of your server you can use (or not) any extension you want. You could even set extensions to be ".JamesRocks" but it won't be very helpful :)
Anyways just in case you're new to web programming all that gibberish on the end there are arguments to a GET operation, and not the page's extension.
A number of posts have mentioned this, and I'll weigh in. It absolutely is a URL rewriting system, and a number of platforms have ways to implement this.
I've worked for a few larger ecommerce sites, and it is now a very important part of the web presence, and offers a number of advantages.
I would recommend taking the technology you want to work with, and researching samples of the URL rewriting mechanism for that platform. For .NET, for example, there google 'asp.net url rewriting' or use an add-on framework like MVC, which does this functionality out of the box.
In Django (a web application framework for python), you design the URLs yourself, independent of any file name, or even any path on the server for that matter.
You just say something like "I want /news/<number>/ urls to be handled by this function"