get list of webpages that contain an image

How can I get a list of webpages that contain the image in question?
Photobucket has a stats option which lets you see what websites have embedded your image. How do they do that?

I'd assume Photobucket checks the web server logs, looking for the referer in any request for a specific image.
For any HTTP request, the browser also sends the so-called referer, which contains the URL that "triggered" the request. If someone clicks on a link to webpage B in webpage A, the browser not only requests the linked webpage from the server of webpage B, but also sends the referer along, containing the URL of the "linking webpage" A. The same goes for images embedded in a webpage: the request for the image also contains the URL of the embedding webpage, so the server can log which pages embed an image.
Of course, the referer can be suppressed by privacy tools in the user's browser, so the method will not be completely accurate, but in most cases it's sufficient.
See also http://en.wikipedia.org/wiki/HTTP_referrer

My guess is that they're seeing what web pages are pulling the embedded image by parsing the server logs.
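
To make the log-parsing idea concrete, here is a minimal sketch. It assumes the server writes the common "combined" access log format, where the referer is the quoted field after the response size. The function name, the sample log lines and the example.org/example.net URLs are made up for illustration; this is not how Photobucket is known to do it.

```python
import re
from collections import Counter

# Matches the request line, status, size and referer of a combined-format log entry.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

def embedding_pages(log_lines, image_path):
    """Count the referers seen on requests for image_path."""
    pages = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match or match.group("path") != image_path:
            continue
        referer = match.group("referer")
        # "-" (or an empty value) means the browser sent no referer.
        if referer and referer != "-":
            pages[referer] += 1
    return pages

# Hypothetical log excerpt, for illustration only.
sample_log = [
    '203.0.113.5 - - [10/Oct/2010:13:55:36 +0000] "GET /images/cat.jpg HTTP/1.1" 200 2326 "http://blog.example.org/post/1" "Mozilla/5.0"',
    '203.0.113.9 - - [10/Oct/2010:13:56:01 +0000] "GET /images/cat.jpg HTTP/1.1" 200 2326 "http://forum.example.net/thread/42" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2010:13:57:12 +0000] "GET /images/cat.jpg HTTP/1.1" 200 2326 "http://blog.example.org/post/1" "Mozilla/5.0"',
    '203.0.113.8 - - [10/Oct/2010:13:58:40 +0000] "GET /images/cat.jpg HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '203.0.113.8 - - [10/Oct/2010:13:59:02 +0000] "GET /images/dog.jpg HTTP/1.1" 200 1811 "http://other.example.com/" "Mozilla/5.0"',
]

for page, hits in embedding_pages(sample_log, "/images/cat.jpg").most_common():
    print(hits, page)
```

Requests without a referer are skipped, which is why the resulting list can only ever be a lower bound on the pages that embed the image.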

Related

Custom title and image for Facebook share button on AJAX result

This question exists in different flavors, but not for AJAX pages.
I use AJAX to pull a single video into my page and I want a custom FB share button for it. Everything I've read so far says that FB pulls the required title and image from meta tags in the page's <head> section (og:image and og:title).
I've tried to change the meta properties when the AJAX call returns, before rendering the share button. This hasn't worked. It uses the values that were present upon initial page load. I have yet to encounter a single answer to this question.
Are there data attributes I can add to the 'fb-like' div to specify a custom title and image (similar to data-href)?
Thanks!

You need an individual URL for each individual piece of content that you want to share. Open Graph objects (and simple shared links “become” such, automatically) are identified by their URL (og:url).
Now if your whole page is built on AJAX, you still need to create such individual URLs somehow – the Facebook scraper tool does not “speak” JavaScript, and relies solely on the OG meta information that the server delivers for any URL it requests.
Since the hash part of a URL is only of relevance client-side (and does not even get sent to the server), “typical” AJAX URLs that rely on it to tell the client which piece of content to load in the background are no good here.
So if you want to share two pieces of content (videos) as http://www.example.com/?v=vid1 and http://www.example.com/?v=vid2, then you have to make sure that your server delivers the meta data for each video under its respective URL.
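
As a rough server-side sketch of that last point: the function below builds the OG meta tags for whichever video the requested URL names, so the scraper gets different metadata for ?v=vid1 and ?v=vid2. The VIDEOS catalogue, the titles and the thumbnail URLs are hypothetical; a real site would look these up in its own data store and emit the tags inside the <head> it serves for that URL.

```python
from html import escape

# Hypothetical catalogue; titles and thumbnail URLs are made up for illustration.
VIDEOS = {
    "vid1": {"title": "First video", "image": "http://www.example.com/thumbs/vid1.jpg"},
    "vid2": {"title": "Second video", "image": "http://www.example.com/thumbs/vid2.jpg"},
}

def og_meta_tags(video_id, base_url="http://www.example.com/"):
    """Return the Open Graph meta tags to serve for the URL of one video."""
    video = VIDEOS[video_id]
    properties = {
        "og:url": f"{base_url}?v={video_id}",
        "og:title": video["title"],
        "og:image": video["image"],
    }
    # Escape attribute values so titles containing quotes cannot break the markup.
    return "\n".join(
        f'<meta property="{name}" content="{escape(value, quote=True)}" />'
        for name, value in properties.items()
    )

print(og_meta_tags("vid1"))
```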

How are images handled by the browser and how to save them without reloading?

Just to be sure: say you load a page that has 3 images. The first refers to "/images/1.jpg", the second to "/images/2.jpg" and the third to "/images/1.jpg" again. When the page is sent to the browser, will the browser make a new request to the server for each image? And if an image has already been requested (as in my example, where the same image appears twice), will it request it again, or will it know that this image/URL has already been loaded and just retrieve it from the cache?
Which leads to my second question: is there a way to save this image to the computer with JavaScript/jQuery (with the download box opening as if you were downloading a file) from the cache, without having to request it again from the server?
I don't know if I'm being clear, but in short, I want to save an image on the page from the cache rather than request a download from the server.

Browsers generally cache what they can, according to what the HTTP response headers say. That is, servers ultimately control what browsers can (or should) cache, so it's server configuration that usually controls such things.
This applies not only to images but all content: HTML pages, CSS, JavaScript, etc.

It all depends on how the server sends the image the first time (with or without caching headers).
If you have caching enabled on your browser, the browser will usually check your cache before requesting the file from the server.

The browser should take care of it. It won't continually re-request the same file.

Typically, the browser will see that two images have the same source and therefore only download it once.
However, if the same image is requested again later, the browser will send an If-Modified-Since header to the server. The server can then respond with 304 Not Modified, at which point the browser uses the local copy to "download instantly".
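
The server side of that exchange can be sketched roughly as follows. The respond function and its parameters are hypothetical names, and the sketch only covers the If-Modified-Since comparison (real servers also handle ETag / If-None-Match and cache lifetime headers).

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def respond(last_modified, if_modified_since=None):
    """Return (status, headers) for a GET of a file last changed at last_modified (aware, UTC)."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header and send the file
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds before comparing.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers  # browser keeps using its cached copy
    return 200, headers  # the body would be sent along with these headers

modified = datetime(2010, 10, 9, 10, 0, 0, tzinfo=timezone.utc)
print(respond(modified))
print(respond(modified, "Sat, 09 Oct 2010 10:00:00 GMT"))
```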

plain http image on https/ssl page = warning

I've found that plain http images on an https/ssl page can't be displayed without warnings. Is there any way to display a picture from another http:// website on your https:// website without warnings? (Suppose you have permission to display that picture on your website.)
Chrome puts a yellow triangle on the SSL lock icon: "...However, this page includes other resources, that are not secure..."
IE displays a warning when a page loads: "Do you want to view only the webpage content that was delivered securely?"
So, how can I display a picture on an https:// page if it is hosted on another web server?

You can use the information in this article on Encosia. Basically, you have to use the // syntax for your URLs so that they use the same protocol as the page in all cases. For example, if the page was requested over https, the following
//ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js
will hit Google's CDN using the https protocol. However, if you don't have control over the other server, I think you're out of luck. If you do have control over the other server, I'd recommend using the method described in the article above, which means allowing your content server to serve both protocols.
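
To illustrate how a // URL picks up the protocol of the page it appears on, here is a small sketch using Python's standard urljoin, which applies the same relative-reference rules a browser does. The page URLs are placeholders.

```python
from urllib.parse import urljoin

# The protocol-relative reference from the example above.
SCRIPT = "//ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js"

def resolve(page_url, reference):
    """Resolve a reference found on page_url the way a browser would."""
    return urljoin(page_url, reference)

# The same reference resolves to https on a secure page and to http on a plain one.
print(resolve("https://www.example.com/checkout", SCRIPT))
print(resolve("http://www.example.com/checkout", SCRIPT))
```

This only avoids the warning if the other server actually answers on https, which is why control over that server matters.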

Hotlink redirection

I have this URL: www.example.com/yyy.gif at my site.
It is actually NOT a direct link to the image but an HTML page containing the image (the direct URL to said image is www.example.com/files/yyy.gif). I want to keep it that way.
THE PROBLEM
When the link (www.example.com/yyy.gif) is posted somewhere (forums, comments on various websites), it is quite common that their script assumes it is a direct link to an image and tries to display it as an image (<img src="www.example.com/yyy.gif">), which leads to a broken image on their site.
THE QUESTION
Is there a way to detect these cases and automatically reroute them to the direct image URL? Keep in mind that visitors should be able to open the original URL without being redirected.

You could check the $_SERVER['HTTP_REFERER'] variable to see if the url originates from your site.
If it doesn't, output some headers for the image and then the image itself instead of what you would normally do.

I think I got it!
So I checked what request headers are generated when the same URL is accessed by clicking on a link and when it is loaded using the <img> tag.
The only parameter that differs is "Accept". When the URL was accessed by <img> it was */* (Chrome, IE, Safari) or image/png,image/*;q=0.8,*/*;q=0.5 (FF).
But when accessed by clicking on a link (or just opening the URL directly), it is something like this text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
So I'm checking for 'text/html' in my script before outputting any content. If it is present, I output the html version; otherwise go for the image directly.
And it seems to be working exactly as I wanted (yay!).
Could there be any pitfalls that I should be aware of?
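
For reference, the check described above could look roughly like this. The function name is made up, and the sketch ignores q-values; it simply asks whether text/html is listed explicitly in the Accept header.

```python
def wants_html(accept_header):
    """Return True if the Accept header explicitly lists text/html."""
    # Strip parameters such as ";q=0.9" and compare media types exactly.
    media_types = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    return "text/html" in media_types

# Header values quoted in the post above.
print(wants_html("*/*"))
print(wants_html("image/png,image/*;q=0.8,*/*;q=0.5"))
print(wants_html("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
```

A client that sends no Accept header at all, or only */*, would get the image under this rule, so it is worth checking how the crawlers and tools you care about behave.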

How does Google Instant change the referer sent by the browser?

If you click on a result in Google Instant, the referer sent by your browser to the destination website contains a bunch of parameters, including the all-important q=[autocompleted query].
But you're coming from a page whose URL is simply http://www.google.com/ with a bunch of stuff after the # character, i.e. as an on-page anchor.
So the browser appears to be sending a URL as the referer which is different from the URL of the page that you were viewing when you clicked.
There doesn't seem to be an additional redirection, so how on earth do they do that?

Most of the time, a Google search result actually sends you to a Google redirect page rather than directly to the target page. They use JavaScript to switch the target of the link onmousedown as you click on it.
You can see this effect by clicking and holding on the search result link and watching your status bar.
This isn't specific to Google Instant; they've been doing it for quite a long time on their standard results pages.

The page anchor part of the URL can be manipulated client-side without a new request to the server. Even when talking about static anchor links (e.g. Section Foo), clicking on them does not cause a new request to be sent to the server; it is processed completely within the browser.
The JavaScript being used by Google to make Google Instant work is simply altering the anchor programmatically before making a request to the server.
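
A small sketch of the point about anchors: everything after the # is kept by the client and never forms part of the request, although client-side code can still read it. The helper name and the sample URL are illustrative only.

```python
from urllib.parse import parse_qs, urlsplit

def request_target(url):
    """Return the part of url that is actually sent in the HTTP request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target  # the fragment (after '#') is deliberately left out

url = "http://www.google.com/#hl=en&q=something+to+search"
print(request_target(url))                   # what the server sees
print(parse_qs(urlsplit(url).fragment)["q"])  # what a script on the page can read
```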

What Google are you using?
My URL after searching is this:
http://www.google.es/#sclient=psy&hl=es&q=something+to+search&aq=f&aqi=g4g-o1&aql=&oq=&gs_rfai=&pbx=1&fp=b0....
It does include the q= part
