I have this URL on my site: www.example.com/yyy.gif.
It is actually NOT a direct link to the image but an HTML page containing the image (the direct URL to said image is www.example.com/files/yyy.gif). I want to keep it that way.
THE PROBLEM
When the link (www.example.com/yyy.gif) is posted somewhere (forums, comments on various websites), it is quite common that the site's script assumes it is a direct link to an image and tries to display it as one (<img src="www.example.com/yyy.gif">), which leads to a broken image on their site.
THE QUESTION
Is there a way to detect these cases and automatically reroute them to the direct image URL? Keep in mind that visitors should be able to open the original URL without being redirected.
You could check the $_SERVER['HTTP_REFERER'] variable to see if the request originates from your own site.
If it doesn't, output the appropriate image headers and then the image itself instead of what you would normally serve.
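A rough sketch of that idea, written here as a hypothetical Node/Express handler rather than the PHP the answer implies (the file paths and the domain check are made up):

    const express = require('express');
    const path = require('path');
    const app = express();

    app.get('/yyy.gif', (req, res) => {
        const referer = req.get('Referer') || '';

        if (referer.indexOf('example.com') !== -1) {
            // Request came from a page on our own site: serve the usual HTML page.
            res.sendFile(path.join(__dirname, 'pages', 'yyy.html'));
        } else {
            // Any other referer: assume a hotlink and send the image file itself.
            // Note: a visitor typing the URL directly sends no referer at all,
            // which is one weakness of this check (see the Accept test below).
            res.sendFile(path.join(__dirname, 'files', 'yyy.gif'));
        }
    });

    app.listen(8080);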
I think I got it!
So I checked what request headers are generated when the same URL is accessed by clicking on a link and when it is loaded via an <img> tag.
The only header that differs is Accept. When the URL was accessed via <img>, it was */* (Chrome, IE, Safari) or image/png,image/*;q=0.8,*/*;q=0.5 (Firefox).
But when accessed by clicking on a link (or just opening the URL directly), it is something like this: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
So in my script I check the Accept header for 'text/html' before outputting any content. If it is present, I output the HTML version; otherwise I serve the image directly.
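A minimal sketch of that check, again as a hypothetical Node/Express handler (the file paths are placeholders):

    const express = require('express');
    const path = require('path');
    const app = express();

    app.get('/yyy.gif', (req, res) => {
        const accept = req.get('Accept') || '';

        if (accept.indexOf('text/html') !== -1) {
            // A real navigation (link click or address bar): browsers advertise
            // text/html in Accept, so serve the HTML page.
            res.sendFile(path.join(__dirname, 'pages', 'yyy.html'));
        } else {
            // An <img> tag: Accept is */* or image/..., so serve the image itself.
            res.sendFile(path.join(__dirname, 'files', 'yyy.gif'));
        }
    });

    app.listen(8080);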
And it seems to be working exactly as I wanted (yay!).
Could there be any pitfalls that I should be aware of?
I feel like I should be able to figure this out, but I can't.
This image attempts to download (in the browsers I tested: Safari and Chrome):
https://d3i71xaburhd42.cloudfront.net/9470b0dc3daccafa53ebe8f54d5bfed00afce2ce/29-Figure13-1.png
while this one (and most other images) automatically appears in the browser:
https://i.imgur.com/TvqM9Gp.png
These two images are completely arbitrary examples; there are obviously many more one could point to. In my experience most images display automatically in the browser, but occasionally one insists on downloading to a 'downloads' folder or another familiar location, and I'm not quite sure why that is.
What causes some images to download while others are automatically opened by and are viewable in the browser?
This is usually caused by one of two HTTP response headers.
Content-Disposition
This header tells the browser whether the content of the response should be displayed inline (the default) or treated as an attachment. The latter causes the browser to download the response.
Content-Type
This tells the browser what kind of content the response contains. Depending on the content type, the browser knows how the response should be handled. For example, text/html will cause the browser to treat the response as HTML and render it as such, text/plain will cause the response to be displayed as a simple text file, image/jpeg will cause the response to be displayed as an image, and application/octet-stream tells the browser "this is arbitrary binary data", which generally causes the browser to download the file. The list of MIME types goes on and on.
If an image is downloaded instead of displayed in the browser and it doesn't have a Content-Disposition response header set to attachment, it usually means that the Content-Type isn't set correctly. For the first image you provided, the Content-Type is set to binary/octet-stream (not a standard MIME type), so the browser will not treat it like an image.
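For illustration, here is how those two headers interact when serving an image, sketched as a hypothetical Node/Express route (the file name and path are made up):

    const express = require('express');
    const fs = require('fs');
    const path = require('path');
    const app = express();

    app.get('/photo.png', (req, res) => {
        // Content-Type tells the browser what the body is; image/png renders inline.
        res.setHeader('Content-Type', 'image/png');

        // Uncommenting this would make the very same bytes download instead:
        // res.setHeader('Content-Disposition', 'attachment; filename="photo.png"');

        fs.createReadStream(path.join(__dirname, 'photo.png')).pipe(res);
    });

    app.listen(8080);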
I have tried to set my site up (http://www.diablo3values.com) according to the guidelines set out here: https://developers.google.com/webmasters/ajax-crawling/ However, it appears that Google has updated its index (because I see the revisions to the meta description tags), but the AJAX content does not show up in the index.
I am trying to use the “Handle pages without hash fragments” option.
If you view either of the following:
http://www.diablo3values.com/?_escaped_fragment_=
http://www.diablo3values.com/about?_escaped_fragment_=
you will correctly see the HTML snapshot with my content. (Those are the two pages I am most concerned about.)
Any ideas? Am I doing something wrong? How do you get Google to correctly recognize the tag?
I'm typing this as an answer, since it got a little too long to be a comment.
First of all, your links seem to point to localhost:8080/about, and not /about, which is probably why Google doesn't index them in the first place.
Second, here's my experience with pushstate URLs and Google AJAX crawling:
My experience is that AJAX crawling with pushstate URLs is handled a little differently by Google than with hashbang URLs. Since Google won't know that your URL is a pushstate URL (it looks just like a regular URL), you need to add <meta name="fragment" content="!"> to all your pages, not only the "root" page. Google also doesn't seem to know that the pages are part of the same application, so it treats every page as a separate AJAX application. The Googlebot will therefore never build a navigation structure inside _escaped_fragment_, like _escaped_fragment_=/about, as it would with a hashbang URL (#!/about). Instead, it will request /about?_escaped_fragment_= (which you apparently already have set up). This goes for all your "deep links": instead of /?_escaped_fragment_=/thelink, Google will always request /thelink?_escaped_fragment_=.
But as I said initially, the reason it doesn't work for you is probably that you have localhost:8080 URLs in the HTML generated for _escaped_fragment_.
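To illustrate the mechanics, a server might branch on the _escaped_fragment_ query parameter roughly like this (a hypothetical Node/Express sketch; renderSnapshot stands in for whatever actually produces your HTML snapshots):

    const express = require('express');
    const path = require('path');
    const app = express();

    // Stand-in for whatever produces your HTML snapshots (server-side
    // templates, a headless browser, etc.); entirely hypothetical.
    function renderSnapshot(pagePath) {
        return '<html><body><!-- pre-rendered content for ' + pagePath + ' --></body></html>';
    }

    app.get('*', (req, res) => {
        if ('_escaped_fragment_' in req.query) {
            // Googlebot rewrites a pushstate URL like /about (whose page carries
            // <meta name="fragment" content="!">) into /about?_escaped_fragment_=
            // and expects the crawlable snapshot in return.
            res.send(renderSnapshot(req.path));
        } else {
            // Normal browsers get the JavaScript application shell.
            res.sendFile(path.join(__dirname, 'index.html'));
        }
    });

    app.listen(8080);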
Googlebot only knows to crawl the escaped fragment if your URLs conform to the hashbang convention. As users navigate your site, your URLs need to be:
http://www.diablo3values.com/
http://www.diablo3values.com/#!contact
http://www.diablo3values.com/#!about
Googlebot actually needs to see these URLs in the source code so that it can follow them. Then it knows to download the following URLs:
http://www.diablo3values.com/?_escaped_fragment_=contact
http://www.diablo3values.com/?_escaped_fragment_=about
On your site you appear to be loading a new page on each click, and then loading the content of each page via AJAX too. This is not how I would expect an AJAX site to work. Usually the purpose of using AJAX is that the user never has to load a whole new page: when the user clicks, the new content section is loaded and inserted into the current page. You serve the navigation once, and after that you only serve escaped fragments of the content.
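Something closer to this, sketched with jQuery (the selectors, data attributes, and fragment URLs are all made up for illustration):

    // Serve the navigation once; afterwards, clicks replace only the content area.
    $(document).ready(function () {
        $('nav a').on('click', function (e) {
            e.preventDefault();
            var page = $(this).data('page');      // e.g. "about" or "contact"
            window.location.hash = '!' + page;    // URL becomes /#!about
            $('#content').load('/fragments/' + page + '.html');
        });
    });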
Just to be sure: say a page has 3 images. The first refers to "/images/1.jpg", the second to "/images/2.jpg", and the third to "/images/1.jpg" again. When the page is sent to the browser, will the browser make a new request to the server for each image? And if an image has already been requested (as in my example, where the same image appears twice), will it request it again, or will it know that this image/URL has already been loaded and just retrieve it from the cache?
Which leads to my second question: is there a way to save this image to the computer with JavaScript/jQuery (with the download box opening, as if you were downloading a file) straight from the cache, without having to request it again from the server?
I don't know if I am being really clear, but in short: I want to save an image from the page using the browser's cache, not by requesting it from the server again.
Browsers generally cache what they can, according to what the HTTP response headers say. That is, servers ultimately control what browsers can (or should) cache, so it's server configuration that usually controls such things.
This applies not only to images but all content: HTML pages, CSS, JavaScript, etc.
It all depends on how the server sends the image the first time (with or without caching headers).
If you have caching enabled on your browser, the browser will usually check your cache before requesting the file from the server.
The browser should take care of it; it won't continually re-request the same file.
Typically, the browser will see that two images have the same source and therefore only download it once.
However, if the same image is requested again later, the browser will send an If-Modified-Since header to the server. The server can then respond with 304 Not Modified, at which point the browser uses its local copy instead of downloading the file again.
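A sketch of that revalidation exchange from the server side (a hypothetical Node/Express route; real servers like Apache or nginx do this for static files automatically):

    const express = require('express');
    const fs = require('fs');
    const path = require('path');
    const app = express();

    app.get('/images/1.jpg', (req, res) => {
        const file = path.join(__dirname, 'images', '1.jpg');
        const mtime = fs.statSync(file).mtime;

        // If the browser's cached copy is still current, answer 304 with no
        // body; the browser then reuses its local copy instead of downloading.
        if (req.get('If-Modified-Since') === mtime.toUTCString()) {
            return res.status(304).end();
        }

        res.setHeader('Content-Type', 'image/jpeg');
        res.setHeader('Last-Modified', mtime.toUTCString());
        fs.createReadStream(file).pipe(res);
    });

    app.listen(8080);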
I couldn't really word the title very well, but here's my problem: I've got a webpage that reads from a database each time the user clicks a button, and part of the page's content is then replaced.
Because it is an AJAX load, everything is done in the background, and so the URL stays the same. This wasn't a problem at all until I realised that I will want a different Facebook comments box for each set of content that is loaded - so if someone comments, it is posted to their Facebook profile, people click on the link, and are then taken to that specific content.
So... what I need is some way of referencing each set of content, and I've found a site that does exactly that (I'm sure there are a lot of them).
Here's the link.
Each set of content has a different 'hash code' (I don't know the actual name for it) appended to the URL - in this case the code is "#1922934". This allows people to post links to that specific set of content on Facebook etc., and also allows a different Facebook comment box for each set of content.
Does anyone know how such a set-up can be achieved or how these 'hash codes' work?
Here's a Wikipedia article on it:
http://en.wikipedia.org/wiki/Fragment_identifier
The main idea is that URI fragments are used because changing them doesn't cause a page reload. They can also be used to refer to anchors on a web page.
What I would do is, on page load, use JavaScript to read the URI fragment (location.hash), then make a request to your server to load the comments etc. The URI fragment cannot be read by the server; it is only available to the client (browser).
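A minimal sketch of that pattern in plain JavaScript (the /content endpoint and the #content element are hypothetical):

    // On page load, read the fragment (e.g. "#1922934") and fetch the matching
    // content; the fragment itself is never sent to the server.
    window.addEventListener('load', function () {
        var id = window.location.hash.slice(1);   // strip the leading '#'
        if (id) {
            loadContent(id);
        }
    });

    // When the user switches content, update the fragment; no reload happens,
    // and the resulting URL (e.g. /page#1922934) can be shared on Facebook.
    function showContent(id) {
        window.location.hash = id;
        loadContent(id);
    }

    // Hypothetical AJAX helper: fetch the content block for the given id
    // and insert it into a #content element on the page.
    function loadContent(id) {
        var xhr = new XMLHttpRequest();
        xhr.open('GET', '/content?id=' + encodeURIComponent(id));
        xhr.onload = function () {
            document.getElementById('content').innerHTML = xhr.responseText;
        };
        xhr.send();
    }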
Sounds like you want something like SammyJS.
If you click on a result in Google Instant, the referer sent by your browser to the destination website contains a bunch of parameters, including the all-important q=[autocompleted query].
But you're coming from a page whose URL is simply http://www.google.com/ with a bunch of stuff after the # character, i.e. an on-page anchor.
So the browser appears to be sending a URL as the referer which is different from the URL of the page that you were viewing when you clicked.
There doesn't seem to be an additional redirection, so how on earth do they do that?
Most of the time, a Google search result actually sends you to a Google redirect page rather than directly to the target page. They use JavaScript to switch the target of the link onmousedown as you click on it.
You can see this effect by clicking and holding on a search result link and watching your status bar.
This isn't specific to Google Instant, they've been doing it for quite a long time on their standard results pages.
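The trick can be approximated in a few lines of plain JavaScript (the selector and the /redirect endpoint are hypothetical; Google's real redirect URLs are of the form /url?q=...):

    // The visible link target is the real URL, so the status bar looks clean...
    var links = document.querySelectorAll('a.result');

    for (var i = 0; i < links.length; i++) {
        links[i].addEventListener('mousedown', function (e) {
            // ...but at the last moment the href is swapped for a redirect URL
            // that records the click before forwarding to the original target.
            var a = e.currentTarget;
            a.href = '/redirect?to=' + encodeURIComponent(a.href);
        });
    }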
The page anchor part of the URL can be manipulated client-side without a new request to the server. Even for static anchor links (e.g. <a href="#foo">Section Foo</a>), clicking on them does not cause a new request to be sent to the server; it is processed entirely within the browser.
The JavaScript used by Google to make Google Instant work simply alters the anchor programmatically before making a request to the server.
Which Google are you using?
My URL after searching is this:
http://www.google.es/#sclient=psy&hl=es&q=something+to+search&aq=f&aqi=g4g-o1&aql=&oq=&gs_rfai=&pbx=1&fp=b0....
It does include the q= part.