I'm getting different counts for the Google +1 button on the https and http versions of the same page.
Has anyone experienced this behaviour and has any suggestions on how to resolve it?
I am using the typical implementation shown here.
I also found this link regarding canonical URLs, but I don't think it is relevant in this case, as I have no alternative canonical URL, only the same content served over the two different protocols.
Pages referred to by different URLs are treated as different pages; one can serve different content via HTTP and HTTPS. So if you want a single counter, you need to specify the canonical URL (with or without https, you decide) in the counter parameters.
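If you are using the standard +1 button markup, one way to pin the count to a single URL is the data-href attribute. A minimal sketch (example.com and the path are placeholders; whether you pick the http or https form is your decision):

    <script type="text/javascript" src="https://apis.google.com/js/plusone.js"></script>
    <div class="g-plusone" data-href="http://example.com/your-page"></div>

With data-href set, both the http and https versions of the page should report the count for that one URL.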
I'm investigating the use of http-proxy-middleware / node-http-proxy as a reverse proxy. Does anyone know if this is really possible?
I've already set up http-proxy-middleware so that I can proxy a request through it (the results are displayed in an iframe), and I'm also able to modify the request headers and HTML results. Specifically, I'm setting the host/origin headers and rewriting the result to change embedded links so that they go through the proxy as well.
But, some links are generated by js, and rewriting javascript responses seems to be very difficult to do correctly.
Is there a way to do this without rewriting links? I.e., is there any method to configure the iframe to automatically send all requests through the proxy?
Or maybe this is not really possible, and I'd need to use a full proxy like Squid?
Thanks!
This does seem to be possible, to a limited extent. http-proxy-middleware can be configured to edit response headers and to rewrite the response body, so that links can be rewritten to use the proxy URL. XMLHttpRequest and fetch() requests can also be intercepted to rewrite the requests to use the proxy URL.
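For reference, body rewriting with http-proxy-middleware looks roughly like this. It is only a sketch, assuming v2 of the library hosted in an Express app; example.com, the /proxy mount path, and the port are placeholders:

    const express = require('express');
    const { createProxyMiddleware, responseInterceptor } = require('http-proxy-middleware');

    const app = express();

    // Everything under /proxy is forwarded to the upstream site.
    app.use('/proxy', createProxyMiddleware({
      target: 'https://example.com',   // placeholder upstream
      changeOrigin: true,              // rewrite the Host header to the target
      selfHandleResponse: true,        // required so we can rewrite the body
      onProxyRes: responseInterceptor(async (responseBuffer, proxyRes, req, res) => {
        const contentType = proxyRes.headers['content-type'] || '';
        if (!contentType.includes('text/html')) return responseBuffer;
        // Point absolute links in the HTML back through the proxy.
        return responseBuffer
          .toString('utf8')
          .replace(/https:\/\/example\.com/g, '/proxy');
      }),
    }));

    app.listen(3000);

As noted above, this covers links in the HTML; links generated at runtime by JavaScript still escape this kind of rewriting.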
Use of a reverse proxy should be 100% transparent to clients and your application code, with zero code changes. So perhaps this is a design problem, and clarifying the requirements may help.
URL DESIGN
As an example, I might design the URLs for an API as follows:
Public URL: https://api.mycompany.com/products
Internal URL: https://productservice.internal.com:3000
Note that the public URL of the API is actually that of a route within the reverse proxy.
An internet client would only ever use the public URL. If the internal API ever returns URLs to internet clients, it needs to be configured to use the public URL.
REVERSE PROXIES
The most mature options are probably the nginx-based ones, which provide both declarative routing and the ability to write any logic you like via plugins. There are plenty of examples in the Curity guides, which may make you aware of some use cases.
A mainstream option is to use the proxy_pass directive to route to an internal URL. The same pattern should work for the node RP you mention, though for simple tasks no custom logic should be needed.
Header configuration is a common thing to do in the RP, e.g. to ensure that the backend component receives the original client's IP address rather than that of the RP, but that is often optional.
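As a concrete illustration of the proxy_pass pattern, here is a minimal nginx sketch using the example hostnames above; the forwarded headers are the common conventions, so adjust them to whatever your backend actually reads:

    # Route the public API path to the internal service
    location /products {
        proxy_pass https://productservice.internal.com:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }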
MISBEHAVING BACKEND COMPONENT
Perhaps this is the root of the problem: if a website returns the internal URL, e.g. in redirects or image URLs, then it is wrong. Many tech stacks have a property such as BaseUrl that fixes this.
I have a problem with a Domino Web Server hosting an XPages application that sits behind a reverse proxy. The proxy forwards all requests from a URL like https://organization/test_server/ to the Domino Web Server.
This breaks all links in the application, and I don't know how to fix it.
For example, a login attempt will be redirected by the server to https://organization/names.nsf?Login instead of https://organization/test_server/names.nsf?Login.
Have you any idea how to fix it?
When using a reverse proxy, we recommend keeping the original URL unchanged. Many redirects, Ajax requests, and cookies are closely tied to the URL, so if the URL changes you almost always have to modify code.
This problem is especially serious in Domino, because a lot of its JavaScript code uses absolute paths, for example /names.nsf. By comparison, Java applications generally use relative paths (for example ../login).
The practical way to keep the URL unchanged is to map the original Domino server's domain name (for example test.domino.xxx) to the reverse proxy server, and have the proxy use the HTTP Host header to decide which backend server to forward to, without adding an extra path (such as /test_server).
In IBM WebSEAL, for example, this configuration is called a virtual host junction.
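If the proxy in front happens to be nginx, the same idea (routing on the Host header rather than an extra path segment) looks roughly like this sketch; test.domino.xxx is the example name from above, the backend hostname is a placeholder, and TLS certificate directives are omitted:

    server {
        listen 443 ssl;
        server_name test.domino.xxx;

        location / {
            proxy_pass http://domino-backend-host;   # placeholder for the internal Domino server
            proxy_set_header Host $host;
        }
    }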
Did you create a site document on the Domino server?
I have solved the problem with some effort.
First I moved the database into the folder /test_server/.
Then I changed all static HTML links to use /test_server/.
With the option xsp.application.context.proxy=test_server I changed the paths for internal XPages and Extension Library resources.
Finally I had to add some substitution rules on the Domino side to prevent duplicated paths like /test_server/test_server/.
Now it seems to be working well.
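For anyone following the same route, this is roughly how the pieces fit together; the substitution pattern shown is only an illustration of the duplicated-path cleanup described above, not a copy of my exact rule:

    # xsp.properties (application level)
    xsp.application.context.proxy=test_server

    # Domino Web Site Rule document, type "Substitution"
    Incoming URL pattern:  /test_server/test_server/*
    Replacement pattern:   /test_server/*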
The proposal from the proxy team was to use URL rewriting on the web server. That can certainly be done with Domino, but it requires developing a DSAPI add-on (a DLL written in C), and that doesn't look like an easy task to me.
Anyway, thanks a lot for your help!
I'm thinking of using a structure like this for accessing a hypothetical page:
/foo/ID/some-friendly-string
The key part here is "ID", which identifies the page, so everything that isn't the ID is only relevant to SEO. I also want every URL that isn't exactly "/foo/ID/some-friendly-string" to redirect to the original link. E.g.:
/foo/ID ---> /foo/ID/some-friendly-string
/foo/ID/some-friendly-string-blah ---> /foo/ID/some-friendly-string
But what if these links somehow get "polluted" around the internet and spiders start accessing them with URLs like "/foo/ID/some-friendly-string-blah-blah-pollution"? I don't even know if this can happen, but if, say, some bad actor decided to post thousands of such "different" links on well-known forums, then Google would find thousands of "different" URLs 301-redirecting to the same page.
In such a case, would there be some sort of penalty, or is it all the same to Google as long as the endpoint is unique and no content is duplicated?
I might be a little paranoid about this, but it's just my nature to investigate exploitable situations :)
Thanks for your thoughts
Your approach of using a 301 redirect is correct.
301 redirects are very useful if people access your site through several different URLs.
For instance, your page for a given ID can be accessed in multiple ways, say:
http://yoursite.com/foo/ID
http://yoursite.com/foo/ID/some-friendly-string (preferred)
http://yoursite.com/foo/ID/some-friendly-string-blah
http://yoursite.com/some-friendly-string-blah-blah-pollution
It is a good idea to pick one of those URLs (you have decided to be http://yoursite.com/foo/ID/some-friendly-string) as your preferred URL and use 301 redirects to send traffic from the other URLs to your preferred one.
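In application code the redirect is usually just: look up the preferred slug for the ID and 301 every other variant to it. A rough sketch of the idea, assuming a Node/Express app (the in-memory slug lookup stands in for whatever database you actually use):

    const express = require('express');
    const app = express();

    // Hypothetical slug lookup; in a real app this would query the database.
    const slugsById = { '42': 'some-friendly-string' };
    const getSlugById = (id) => slugsById[id];

    app.get('/foo/:id/:slug?', (req, res) => {
      const preferredSlug = getSlugById(req.params.id);
      if (!preferredSlug) return res.status(404).send('Not found');
      if (req.params.slug !== preferredSlug) {
        // /foo/42, /foo/42/blah, polluted variants... all collapse onto one URL
        return res.redirect(301, `/foo/${req.params.id}/${preferredSlug}`);
      }
      res.send('The page itself');
    });

    app.listen(3000);

Because every variant ends up at a single preferred URL, the "polluted" links only ever add redirects, never duplicate content.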
I would also recommend adding a canonical link to the HEAD section of the page, e.g.
<link rel="canonical" href="http://yourwebsite.com/foo/ID/some-friendly-string"/>
You can get more details on 301 redirects in:
Google Webmaster Tools - Configuration > Change of Address
Google Webmaster Tools Documentation - 301 redirects
I hope that will help you out with your decisions on redirects.
EDIT
I forgot to mention a very good example, namely Stack Overflow. The URL of this question is
http://stackoverflow.com/questions/14318239/seo-301-redirect-limits, but you can access it with http://stackoverflow.com/questions/14318239/blahblah and you will be redirected to the original URL.
I'm looking to implement the Google crawlable AJAX states as described here:
http://code.google.com/web/ajaxcrawling/docs/getting-started.html
Essentially this requires specifying your AJAX states with a #!state value at the end of the URL.
This should then be passed to the application server (PHP in my case) as part of the query string, e.g.
http://www.example.com/#!open would become http://www.example.com/?_escaped_fragment_=open
Unfortunately I'm having trouble figuring out how to implement this via mod_rewrite on Apache 2. Can anyone offer some help?
Cheers
James
RFC 2396, section 4, says:
When a URI reference is used to perform a retrieval action on the
identified resource, the optional fragment identifier, separated from
the URI by a crosshatch ("#") character, consists of additional
reference information to be interpreted by the user agent after the
retrieval action has been successfully completed. As such, it is not
part of a URI, but is often used in conjunction with a URI.
That is, the fragment won't be visible to the web server, so you'll have to look for some other method, as mod_rewrite alone is a no-go.
Depending on what language you are familiar with, you can use HTMLUnit if you're a Java dev, or you could write a proxy and use it to fetch the rendered content via, e.g., Jaxer or a Firefox instance. I used Jaxer, and it's pretty easy to implement crawlable AJAX pages once you get used to the Jaxer API (which isn't complicated at all).
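One nuance worth adding: while the #! fragment itself never reaches Apache, the crawler's rewritten request does arrive with _escaped_fragment_ as an ordinary query-string parameter, so that form can be matched and routed to whatever server-side snapshot you generate with the tools above. A sketch of a document-root .htaccess, where snapshot.php is a hypothetical script that renders the requested state:

    RewriteEngine On
    # Googlebot requests /?_escaped_fragment_=open instead of /#!open
    RewriteCond %{QUERY_STRING} (?:^|&)_escaped_fragment_=([^&]*)
    RewriteRule ^$ /snapshot.php?state=%1 [L]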
When I look at Amazon.com, their page URLs do not have .htm, .html, or .php at the end.
It is like:
http://www.amazon.com/books-used-books-textbooks/b/ref=topnav_storetab_b?ie=UTF8&node=283155
Why and how? What kind of extension is that?
Your browser doesn't care about the extension of the file, only the content type that the server reports. (Well, unless you use IE, because at Microsoft they think they know more about what you're serving up than you do.) If your server reports that the content being served is Content-Type: text/html, then your browser is supposed to treat it as HTML no matter what the file name is.
Typically, it's implemented using a URL rewriting scheme of some description. The basic notion is that the web should be moving to addressing resources with proper URIs, not classic old URLs which leak implementation detail, and which are vulnerable to future changes as a result.
A thorough discussion of the topic can be found in Tim Berners-Lee's article Cool URIs Don't Change, which argues in favour of reducing the irrelevant cruft in URIs as a means of helping to avoid the problems that occur when implementations do change, and when resources do move to a different URL. The article itself contains good general advice on planning out a URI scheme, and is well worth a read.
More specifically than most of these answers:
Browsers don't use the file extension to determine what kind of file is being served (unless you're using Internet Explorer). Instead, they use the Content-type HTTP header, which is sent down the wire before the content of the image, HTML page, download, or whatever. For example:
Content-type: text/html
denotes that the page you are viewing should be interpreted as HTML, and
Content-type: image/png
denotes that the page is a PNG image.
Web servers often use the file extension, if the file is served directly from disk, to determine what Content-type to assign, but web applications can generate pages with any Content-type they like in response to a request. No matter the filename's structure or extension, as long as the actual content of the page matches the declared Content-type, the data renders as intended.
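To see that the extension really is irrelevant, here is a tiny sketch of a handler serving HTML from a URL with no extension at all, using Node's built-in http module; the path simply echoes the Amazon example above:

    const http = require('http');

    http.createServer((req, res) => {
      // No file on disk, no extension in the URL: the header alone decides
      // how the browser treats the response.
      if (req.url.startsWith('/books-used-books-textbooks')) {
        res.writeHead(200, { 'Content-Type': 'text/html' });
        res.end('<h1>Books</h1>');
      } else {
        res.writeHead(200, { 'Content-Type': 'text/plain' });
        res.end('Just plain text');
      }
    }).listen(8080);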
Websites that use Apache are probably using mod_rewrite, which enables them to rewrite URLs (and make them more user- and SEO-friendly).
You can read more here http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html
and here http://www.sitepoint.com/article/apache-mod_rewrite-examples/
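A typical .htaccess sketch of the idea (product.php and the URL shape are placeholders, not anything Amazon actually uses):

    RewriteEngine On
    # /products/123/anything  ->  handled internally by product.php?id=123
    RewriteRule ^products/([0-9]+)(/.*)?$ product.php?id=$1 [L,QSA]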
EDIT: There are rewriting modules for IIS as well.
Traditionally the file extension represents the file that is being served.
For example
http://someserver/somepath/image.jpg
Later the same approach was used to let a script process parameters:
http://somerverser/somepath/script.php?param=1234&other=7890
In this case the file was a PHP script that processed the request and produced a dynamically created page.
Nowadays, applications are much more complex than that (Amazon, which you mentioned, for instance).
There is no longer a single script handling the request, but a much more complex app with several files/methods/functions/objects etc., and the URL is more like the entry point for a web application (it may still have a script behind it, but that's another matter). So web apps like Amazon, and yes Stack Overflow, don't show a file in the URL; everything coming in is processed by the app on the server side.
Take this question's own URL as an example: "questions" represents the web app and 322747 is the parameter, while the friendly text after it is just the slug.
I hope this little explanation helps you understand all the other answers better.
Well, how about having an index.html file in the directory and then typing the path into the browser? I see that my Firefox and IE7 both add the trailing slash automatically; I don't have to type it. This is more suited to people like me who don't think every single URL on earth should invoke PHP, Perl, CGI and 10,000 other applications just to send a few kilobytes of data.
A lot of people are using a more "RESTful" type of architecture, or at least REST-looking URLs.
This site (Stack Overflow) doesn't show a file extension; it's using ASP.NET MVC.
Depending on the settings of your server you can use (or not use) any extension you want. You could even set the extension to ".JamesRocks", but it won't be very helpful :)
Anyway, just in case you're new to web programming, all that gibberish at the end is just arguments to a GET request, not the page's extension.
A number of posts have mentioned this, and I'll weigh in. It absolutely is a URL rewriting system, and a number of platforms have ways to implement this.
I've worked for a few larger ecommerce sites, and it is now a very important part of the web presence, and offers a number of advantages.
I would recommend taking the technology you want to work with and researching samples of the URL rewriting mechanism for that platform. For .NET, for example, Google 'asp.net url rewriting' or use an add-on framework like MVC, which provides this functionality out of the box.
In Django (a web application framework for Python), you design the URLs yourself, independent of any file name, or even any path on the server for that matter.
You just say something like "I want /news/<number>/ URLs to be handled by this function".