Google Chrome Malware Warning when including images from image search API - ruby

I'm using the Google and Bing image search APIs to provide a way for users of my web app to search for images to include in the documents they create in the app. A (rare?) problem I encountered today: a result from either Bing or Google (I'm going to assume Bing) caused Google Chrome's malware warning to go off.
Is there any good way to avoid this that I'm not aware of, aside from only using the Google Image API (which is being deprecated!)? I assume Google filters out results from sites it thinks contain malware.
There doesn't seem to be any performant way on my end to check these results before displaying them, and I'm worried that less savvy users will think my site is at fault (not to mention being unable to make the warning go away).
I guess I'm also making the assumption here that images from random Internet sites are okay to include in the page as long as they are returned by these APIs...I do copy them over to our own S3 account a few minutes after they are added to the document in case they are changed/removed on the external site...
EDIT: The result is indeed being included from the Bing API, and it is from thefatlossauthority.com.
I would prefer a solution based in Ruby, but given a general solution I'm more than willing to implement it myself.
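One general approach is to screen each result URL against Google's Safe Browsing Lookup API before rendering it. Here is a minimal Ruby sketch assuming the v4 `threatMatches:find` endpoint and a hypothetical `SAFE_BROWSING_API_KEY` environment variable; none of these names come from the question, and the API shape may differ from what was available at the time, so check the current Safe Browsing docs:

```ruby
require "net/http"
require "json"
require "uri"

API_KEY = ENV.fetch("SAFE_BROWSING_API_KEY", "YOUR_KEY") # hypothetical env var name

# Build the threatMatches:find request body for a batch of URLs.
def threat_request_body(urls)
  {
    client: { clientId: "my-image-app", clientVersion: "1.0" },
    threatInfo: {
      threatTypes:      ["MALWARE", "SOCIAL_ENGINEERING"],
      platformTypes:    ["ANY_PLATFORM"],
      threatEntryTypes: ["URL"],
      threatEntries:    urls.map { |u| { url: u } }
    }
  }
end

# POST the batch to the Lookup API and return the subset of URLs it flags.
def flagged_urls(urls)
  uri = URI("https://safebrowsing.googleapis.com/v4/threatMatches:find?key=#{API_KEY}")
  res = Net::HTTP.post(uri, threat_request_body(urls).to_json,
                       "Content-Type" => "application/json")
  matches = JSON.parse(res.body)["matches"] || []
  matches.map { |m| m.dig("threat", "url") }.uniq
end

# results.reject! { |url| flagged_urls([url]).any? }  # screen before rendering
```

Batching the whole result page into one request keeps this reasonably performant, and the check can run asynchronously before results are shown.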

Related

AdSense on history.pushState enabled page

First off, I know this has been discussed over and over again. But let's take this as a "late 2012 edition" since things tend to change rapidly on the internet.
I have this web page which is a "classical" web page with full page refreshes. Every internal click produces new content. We can show AdSense ads this way without a problem.
Now I started looking into "ajaxifying" (PJAX) the whole page for performance reasons (I've actually made a prototype version and it works superbly). The whole thing works only on browsers that support history.pushState, and whenever a user clicks on an internal link, an AJAX request is triggered that fetches only the content part of the page (everything between the header and footer) and replaces the old content with it.
The end result is that the user is presented with a brand new page (including the changed URL and what not) and only the mechanism for delivering the page has changed (full reload vs. AJAX). As far as Google (and older browsers) is concerned this is still a regular page with regular links (progressive enhancement and all that).
And yet there isn't a way to display AdSense, what with the document.write's and AdSense's TOS ruining the party.
My question: is there a Google-approved way to display AdSense ads on a page like this (I'm not interested in hacks that will get us banned, and I haven't found one)? Or if there isn't, does Google have any plans to support this in the future (again, I haven't found anything related to this)?
update
After some more digging around I came across Google DFP, which seems to support async loading of ads. But I'm not sure I can load AdSense ads through it dynamically without breaking the TOS. I'm 100% sure I can load other ads this way, but not AdSense. Could somebody clear this up for me?
According to this page, when loading AdSense ads through DFP you are subject to both the DFP and AdSense terms. So I guess if you are following the current AdSense terms you are not allowed to do what you are talking about... at the same time Google provides a rather easy method to do exactly what you want with DFP...
It's still a grey area...

Direct URL to "I'm Feeling Lucky" for images

I have a website for book reviews. I offer a link to the Amazon entry of the books. I discovered after a bit of research that the direct URL for Google's "I'm Feeling Lucky" is:
http://www.google.com/search?hl=en&q=TITLE+AUTHOR+amazon&btnI=745
This works like magic because then I don't have to manually include the Amazon link in my database, and it links directly to the Amazon page (works 99.99% of the time).
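Building that URL in Ruby is a one-liner; this small sketch (the method name `amazon_lucky_url` is just illustrative) escapes the query so spaces become the `+` separators the URL above uses:

```ruby
require "cgi"

# Assemble the "I'm Feeling Lucky" URL for a given title and author.
# CGI.escape turns spaces into "+" and escapes any other unsafe characters.
def amazon_lucky_url(title, author)
  query = CGI.escape("#{title} #{author} amazon")
  "http://www.google.com/search?hl=en&q=#{query}&btnI=745"
end

amazon_lucky_url("Dune", "Frank Herbert")
# => "http://www.google.com/search?hl=en&q=Dune+Frank+Herbert+amazon&btnI=745"
```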
I was wondering if there was an equivalent for images (whether Google or some alternative) to retrieve an image URL based on keywords only (for the purpose of getting the book cover image).
There's no such thing for Google Images, but you might be able to use another web service to do what you want. I noticed that when you're searching for a book, the first image result isn't always its cover. Sometimes it's a photo of the author, sometimes it's some image from a review of the book, so you can hardly rely on that.
It should not be hard to parse the Amazon page and get the image and link, but Google also has a Books API that returns all the information about a book in JSON format; you can try it online in the API Explorer (the covers are in the results too, and you can click "Execute" to run an example).
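As a rough Ruby sketch of that approach, this queries the public Books API `volumes` endpoint and digs out the first result's thumbnail; the `intitle:`/`inauthor:` operators and the response path are taken from the Books API docs, but verify them against the current reference before relying on this:

```ruby
require "net/http"
require "json"
require "uri"

# Build the Books API query URL for a title/author pair.
def books_query_url(title, author)
  URI("https://www.googleapis.com/books/v1/volumes?" +
      URI.encode_www_form(q: "intitle:#{title} inauthor:#{author}", maxResults: 1))
end

# Fetch the first matching volume's thumbnail cover URL (nil if none found).
def cover_url(title, author)
  data = JSON.parse(Net::HTTP.get(books_query_url(title, author)))
  data.dig("items", 0, "volumeInfo", "imageLinks", "thumbnail")
end

# cover_url("Dune", "Frank Herbert")  # network call; returns a books.google.com image URL
```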
Unfortunately, the public Google search engine doesn't support that. You should use the Custom Search API to implement such a feature in your application. Alternatively, use XGoogle (an unofficial Python wrapper for Google Search services; see the google_dl tool for an example).
Other suggestions are to use:
YQL by Yahoo (see yql-tables repo at GitHub for examples).
Use alternative search engines.
E.g. in Wolfram Alpha you can type "show image of laptop" and it'll give you the first popular picture; however, you need to use the Wolfram|Alpha APIs or some script (see this ChatBot for an example) to pick up the direct link.

How do you find Wp7 App Rank without resorting to 3rd Party Apps or Websites

How/where is the WP7 app rank stored and calculated? A number of sites and apps display it, but where are they getting this information from?
A code example (it's got to be a screen-scrape from one of the official sites?) would be interesting where applicable.
Take a look at Brandon Watson's post on crawling the Windows Phone Marketplace as a good starting place: http://catalog.zune.net provides XML data on the marketplace, and the information you're looking for (the application rank) can be retrieved by specifying orderby as downloadRank as he's done; the first app returned is #1, then #2, etc.
http://catalog.zune.net/v3.2/en-us/apps?clientType=WinMobile%207.1&store=Zest&orderby=downloadRank
Note that this post is a little old, so you may need to play around with the query parameters (like the clientType) to make sure you're getting the right data back. This post might also be helpful.
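A minimal Ruby sketch of that scrape, using the catalog URL above: it parses titles out of the feed and derives the rank from position, since the feed is ordered by downloadRank. Note the real feed is namespaced Atom, so the simplified `//entry/title` path is an assumption you may need to adjust (and the endpoint itself may no longer respond):

```ruby
require "net/http"
require "rexml/document"
require "uri"

CATALOG_URL = "http://catalog.zune.net/v3.2/en-us/apps" \
              "?clientType=WinMobile%207.1&store=Zest&orderby=downloadRank"

# Entries come back already ordered by downloadRank, so an app's rank
# is just its 1-based position among the feed entries.
def ranked_titles(xml)
  doc = REXML::Document.new(xml)
  doc.get_elements("//entry/title").map.with_index(1) { |el, rank| [rank, el.text] }
end

# ranked_titles(Net::HTTP.get(URI(CATALOG_URL)))  # live call against the catalog
```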

Why use a Google Sitemap?

I've played around with Google Sitemaps on a couple sites. The lastmod, changefreq, and priority parameters are pretty cool in theory. But in practice I haven't seen these parameters affect much.
And most of my sites don't have a Google Sitemap and that has worked out fine. Google still crawls the site and finds all of my pages. The old meta robot and robots.txt mechanisms still work when you don't want a page (or directory) to be indexed. And I just leave every other page alone and as long as there's a link to it Google will find it.
So what reasons have you found to write a Google Sitemap? Is it worth it?
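For reference, the three parameters in question live on each URL entry; a minimal sitemap per the sitemaps.org protocol looks like this (the URL is a made-up example):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/reviews/dune</loc>
    <lastmod>2012-10-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```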
From the FAQ:
Sitemaps are particularly helpful if:
Your site has dynamic content.
Your site has pages that aren't easily discovered by Googlebot during the crawl process—for example, pages featuring rich AJAX or images.
Your site is new and has few links to it. (Googlebot crawls the web by following links from one page to another, so if your site isn't well linked, it may be hard for us to discover it.)
Your site has a large archive of content pages that are not well linked to each other, or are not linked at all.
It also allows you to provide more granular information to Google about the relative importance of pages in your site and how often the spider should come back. And, as mentioned above, if Google deems your site important enough to show sitelinks under it in the search results, you can control what appears via the sitemap.
I believe the "special links" in search results are generated from the google sitemap.
What do I mean by "special link"? Search for "apache", below the first result (Apache software foundation) there are two columns of links ("Apache Server", "Tomcat", "FAQ").
I guess it helps Google prioritize their crawl? In practice I was involved in a project where we used the gzipped large version of it, and it helped massively. And AFAIK there is a nice integration with Webmaster Tools as well.
I am also curious about the topic, but does it cost anything to generate a sitemap?
In theory, anything that costs nothing and may have a potential gain, even if very small or very remote, can be defined as "worth it".
In addition, Google says: "Tell us about your pages with Sitemaps: which ones are the most important to you and how often they change. You can also let us know how you would like the URLs we index to appear." (Webmaster Tools)
I don't think that the bold statement above is possible with the traditional mechanisms that search engines use to discover URLs.

How to know the quantity of users with images turned off in their browser?

I'm working on a quite popular website which looks good if the user has the "Load images" option turned on in his browser's settings.
When you try to open the website with images turned off, it becomes unusable: many components won't work, because the user won't see "important" buttons (we don't use standard OS buttons).
So we can't understand and measure the negative business impact of this mistake (absent alt/title attributes).
We can't set a priority for this task, because we don't know how many such users come to our website.
Please give me some advice on how this problem can be solved.
Look in the logs for how many hits you get on a page without the subsequent requests from the browser for the other images.
Of course the browser might have images cached, so look for the first time you get a hit.
You can even use IP address for this, since it's OK if you throw out good data (that is, hits that are new that you disregard). The question is just: Of the hits you know are first-time, how many don't get images?
If this is a public page (i.e. not a web application that you've logged in to), also disregard search engine bots to the greatest extent possible; they usually won't retrieve images.
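The log approach above can be sketched in a few lines of Ruby. This rough version assumes a combined-format access log named `access.log` and a fixed set of image extensions (both assumptions; adjust to your setup), and ignores the caching/first-hit and bot caveats, which you would layer on top:

```ruby
# Match requests for image assets by file extension (an assumption --
# extend this list to cover whatever your pages actually embed).
IMAGE_RE = /\.(png|jpe?g|gif)\b/i

# Given access-log lines, return the fraction of client IPs that
# requested pages but never requested a single image.
def images_off_ratio(lines)
  seen_page  = {}
  seen_image = {}
  lines.each do |line|
    ip   = line.split(" ").first
    path = line[/"(?:GET|POST) (\S+)/, 1]  # request path from the quoted request line
    next unless ip && path
    if path =~ IMAGE_RE
      seen_image[ip] = true
    else
      seen_page[ip] = true
    end
  end
  return 0.0 if seen_page.empty?
  seen_page.keys.count { |ip| !seen_image[ip] }.to_f / seen_page.size
end

# images_off_ratio(File.foreach("access.log"))
```

Using IPs as visitor keys overcounts users behind NAT and undercounts users on dynamic IPs, so treat the result as a rough bound, as the answer above suggests.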
